Accessing Java files within .tar files using Java

I am creating a Java application that will take in .tar files and retrieve all the files from these .tar files. I know you can use GZIPInputStream to get the files from the .tar files, but is it possible to get these files from the GZIPInputStream as a FileInputStream, something like below?
InputStream is = new GZIPInputStream(new FileInputStream(file));

Yes, it is possible, because the GZIPInputStream constructor accepts any InputStream as its parameter (in that sense GZIPInputStream is a wrapper around another InputStream). After that, all you need to do is follow one of the extraction examples you can find on the internet.

GZIPInputStream extends InputStream (indirectly, via InflaterInputStream), so in my opinion it's OK. Try this:
InputStream in = new java.util.zip.GZIPInputStream(new FileInputStream(file));
byte[] buffer = new byte[1024];
int read;
// read() returns -1 at end of stream
while ((read = in.read(buffer, 0, buffer.length)) != -1)
{
// do something with buffer[0..read)
}
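Note that the bytes coming out of the GZIPInputStream are still a tar stream, so something has to walk the tar headers to find the individual files. The sketch below is only an illustration of that layering, using nothing but the JDK: it assumes a plain ustar-style archive (512-byte headers, entry name in bytes 0-99, octal size in bytes 124-135) and ignores long-name extensions, links and other special entries. The class and method names are made up for this example; in real code a library such as commons-compress is the more usual choice.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: lists entry names of a gzip-compressed tar stream.
final class TarGzLister {

    static List<String> listEntries(InputStream raw) throws IOException {
        List<String> names = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new GZIPInputStream(raw))) {
            byte[] header = new byte[512];
            while (true) {
                try {
                    in.readFully(header);
                } catch (EOFException endOfStream) {
                    break; // stream ended without the usual zero blocks
                }
                if (header[0] == 0) {
                    break; // an all-zero block marks the end of the archive
                }
                names.add(field(header, 0, 100));
                long size = Long.parseLong(field(header, 124, 12).trim(), 8);
                // Entry data is padded to a multiple of 512 bytes.
                skipFully(in, (size + 511) / 512 * 512);
            }
        }
        return names;
    }

    // Reads a NUL-terminated ASCII field from a header block.
    private static String field(byte[] block, int off, int len) {
        int end = off;
        while (end < off + len && block[end] != 0) {
            end++;
        }
        return new String(block, off, end - off, StandardCharsets.US_ASCII);
    }

    // Discards exactly n bytes; InputStream.skip alone may skip fewer.
    private static void skipFully(InputStream in, long n) throws IOException {
        byte[] scratch = new byte[8192];
        while (n > 0) {
            int r = in.read(scratch, 0, (int) Math.min(scratch.length, n));
            if (r < 0) {
                throw new EOFException("truncated tar entry");
            }
            n -= r;
        }
    }
}
```

Called as TarGzLister.listEntries(new FileInputStream(file)), this returns the entry names; to extract a file you would copy the entry bytes somewhere instead of skipping them.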

Related

Upload to S3 using Gzip in Java

I'm new to Java and I'm trying to upload a large file (~10GB) to Amazon S3. Could anyone please help me with how to use a GZIP output stream for it?
I've been through some documentation but got confused about byte streams and GZIP streams. Must they be used together? Can anyone help me with this piece of code?
Thanks in advance.
Have a look at this:
Is it possible to gzip and upload this string to Amazon S3 without ever being written to disk?
ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
GZIPOutputStream gzipOut = new GZIPOutputStream(byteOut);
// write your stuff
gzipOut.close(); // finishes the GZIP stream so the trailer reaches byteOut
byte[] bites = byteOut.toByteArray();
// write the bites to the Amazon stream
Since it's a large file, you might want to have a look at multipart upload.
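To make the relationship between the two stream types concrete, here is a minimal, JDK-only sketch of the in-memory approach. The class name is invented for the example, and it is only suitable for payloads that fit in memory, which a ~10GB file does not. The key detail is that the GZIPOutputStream must be closed (or finished) before the byte array is taken, otherwise the GZIP trailer is missing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: compresses a byte array to GZIP format in memory.
final class InMemoryGzip {

    static byte[] gzip(byte[] plain) throws IOException {
        // The byte stream is the destination; the GZIP stream wraps it
        // and compresses whatever is written through it.
        ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
        try (GZIPOutputStream gzipOut = new GZIPOutputStream(byteOut)) {
            gzipOut.write(plain);
        } // closing writes the GZIP trailer into byteOut
        return byteOut.toByteArray();
    }
}
```

The returned array could then be wrapped in a ByteArrayInputStream and handed to whatever upload call you use.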
This question could have been more specific and there are several ways to achieve this. One approach might look like the below.
The example depends on the commons-io and commons-compress libraries, and uses classes from the java.nio.file package.
public static void compressAndUpload(AmazonS3 s3, InputStream in)
throws IOException
{
// Create temp file
Path tmpPath = Files.createTempFile("prefix", "suffix");
// Create and write to gzip compressor stream; closing it writes the gzip trailer
try (OutputStream out = Files.newOutputStream(tmpPath);
GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(out)) {
IOUtils.copy(in, gzOut);
}
// Read content from temp file
InputStream fileIn = Files.newInputStream(tmpPath);
long size = Files.size(tmpPath);
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentType("application/x-gzip");
metadata.setContentLength(size);
// Upload file to S3
s3.putObject(new PutObjectRequest("bucket", "key", fileIn, metadata));
}
Buffering, error handling, closing of the remaining streams and deletion of the temp file are omitted for brevity.

Need to convert AssetInputStream to FileInputStream

I have implemented a data structure that works on my computer, and now I am trying to port it to my Android application. I open a raw .dat resource and get an InputStream, but I need a FileInputStream:
FileInputStream fip = (FileInputStream) context.getResources().openRawResource(fileID);
FileChannel fc = fip.getChannel();
long bytesSizeOfFileChannel = fc.size();
MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_ONLY, 0L, bytesSizeOfFileChannel);
...
The code above throws the following exception, since an InputStream cannot be cast to a FileInputStream, but that's just what I need:
java.lang.ClassCastException: android.content.res.AssetManager$AssetInputStream cannot be cast to java.io.FileInputStream
All my code is built on using this FileChannel with a FileInputStream, so I want to keep using it. Is there a way to take the InputStream from context.getResources().openRawResource(fileID) and convert it into a FileChannel?
Somewhat relevant posts, in which I could not find a working solution for my case on Android:
How to convert InputStream to FileInputStream
Converting inputStream to FileInputStream?
Using FileChannel to write any InputStream?
A resource isn't a file. Ergo it can't be used as a memory-mapped file. If you have resources that are so enormous they need to be memory-mapped, they probably shouldn't be resources at all. And if they are small, memory mapping brings no advantages.
This might be late, but I think you can indirectly get a FileInputStream from an InputStream. What I suggest is this: get the input stream from the resource, create a temp file, and get a FileOutputStream for it. Then read the InputStream and copy it to the FileOutputStream.
Now the temp file has the contents of your resource file, and you can create a FileInputStream from this file.
I don't know if this particular solution is useful to you, but I think it can be used in other situations.
As an example, if your file is in the assets folder, you get an InputStream and then a FileInputStream using this method:
InputStream is = getAssets().open("video.3gp");
File tempfile = File.createTempFile("tempfile", ".3gp", getDir("filez", 0));
FileOutputStream os = new FileOutputStream(tempfile);
byte[] buffer = new byte[16000];
int length = 0;
while ((length = is.read(buffer)) != -1) {
os.write(buffer, 0, length);
}
os.close(); // make sure the copy is complete before reopening the file
is.close();
FileInputStream fis = new FileInputStream(tempfile);
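If the goal is the MappedByteBuffer from the question, the same temp-file idea can be wrapped up as below. This is just a sketch using java.io and java.nio classes; the class and method names are invented, and on Android the tempDir argument would presumably be something like context.getCacheDir(). Passing null uses the default temp directory.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper: copies any InputStream to a temp file and maps it.
final class StreamToMappedBuffer {

    static MappedByteBuffer mapCopy(InputStream source, File tempDir) throws IOException {
        File temp = File.createTempFile("resource", ".dat", tempDir);
        temp.deleteOnExit();

        // Step 1: copy the stream's bytes into a real file.
        try (InputStream in = source; OutputStream out = new FileOutputStream(temp)) {
            byte[] buffer = new byte[16 * 1024];
            int length;
            while ((length = in.read(buffer)) != -1) {
                out.write(buffer, 0, length);
            }
        }

        // Step 2: a real file can be opened as a FileInputStream and mapped.
        // The mapping stays valid after the channel is closed.
        try (FileInputStream fis = new FileInputStream(temp);
             FileChannel fc = fis.getChannel()) {
            return fc.map(FileChannel.MapMode.READ_ONLY, 0L, fc.size());
        }
    }
}
```

Keep in mind the point made in the other answer: this costs a full copy of the resource, so it only pays off if the rest of the code really depends on FileChannel.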

How to obtain File handle to a resource on runtime classpath?

I have a situation where I need to scan the runtime classpath for a resource file (say, res/config/meta.cfg), and then create a File handle for it. The best I've been able to come up with is:
// This file is located inside a JAR that is on the runtime classpath.
String fileName = "res/config/meta.cfg";
try {
InputStream inStream = ClassLoader.getSystemResourceAsStream(fileName);
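// "${java.io.tmpdir}" is not expanded by String.format; read the system property instead
File file = new File(System.getProperty("java.io.tmpdir"), fileName);
file.getParentFile().mkdirs(); // fileName contains subdirectories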
FileOutputStream foutStream = null;
foutStream = new FileOutputStream(file);
int read = 0;
byte[] bytes = new byte[1024];
while((read = inStream.read(bytes)) != -1)
foutStream.write(bytes, 0, read);
foutStream.close();
return file;
} catch (Exception exc) {
throw new RuntimeException(exc);
}
So essentially, read in the resource as an InputStream, and then write the stream to a temp file (under the java.io.tmpdir directory) so that we can obtain a valid File handle for it.
This seems like going 3 sides around the barn. Is there a better/easier/more elegant way of doing this? Thanks in advance!
No.
Of course you can (and probably should) use a library to copy the InputStream's content to a file but that obviously is not the point of your question.
The classpath does not consist of directories only; resources can be inside archives (typically JARs) or on servers, and may not exist as something that can be accessed via a java.io.File object.
Typically the core problem is the use of java.io.File objects where an InputStream would be sufficient. Sometimes you can't do anything about it when using a third-party library, but it is a hint that the library designers didn't work very carefully. If you need the file handle in your own code, you should have another look at why it can't be an InputStream. Most of the time it can.
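For the cases where a File really is unavoidable, the copy itself does not need a hand-written loop or a third-party library on Java 7 and later, since java.nio.file.Files can do it. A minimal sketch, with an invented class name and an arbitrary "resource-" prefix:

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: materializes a stream (e.g. a classpath resource) as a temp file.
final class ResourceToTempFile {

    static File toTempFile(InputStream resource, String suffix) throws IOException {
        Path temp = Files.createTempFile("resource-", suffix);
        try (InputStream in = resource) {
            // createTempFile already created the file, so it must be replaced.
            Files.copy(in, temp, StandardCopyOption.REPLACE_EXISTING);
        }
        File file = temp.toFile();
        file.deleteOnExit();
        return file;
    }
}
```

For the question's case the call would look like ResourceToTempFile.toTempFile(ClassLoader.getSystemResourceAsStream("res/config/meta.cfg"), ".cfg"), after checking that the stream is not null (which is what you get when the resource is not found).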

ZipFile is throwing error but ZipInputStream is able to decompress the archive

I am experiencing strange behavior with java.util.zip.*
I have a zip file, and upon decompressing it the following things happen:
ZipFile zipfile = new ZipFile(file, ZipFile.OPEN_READ);
This is the exact error message:
java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:127)
at java.util.zip.ZipFile.<init>(ZipFile.java:143)
at com.basware.ExtractZip.unpack(ExtractZip.java:27)
at com.basware.ExtractZip.main(ExtractZip.java:17)
But if I use the following code it is able to open the archive without any errors
try {
// BUFFER is an int buffer-size constant defined elsewhere in the class
BufferedOutputStream dest = null;
File file = new File("File_Path");
FileInputStream fis = new FileInputStream(file);
ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
ZipEntry entry;
while ((entry = zis.getNextEntry()) != null) {
System.out.println("Extracting: " + entry);
int count;
byte[] data = new byte[BUFFER];
// write the files to the disk
FileOutputStream fos = new FileOutputStream(entry.getName());
dest = new BufferedOutputStream(fos, BUFFER);
while ((count = zis.read(data, 0, BUFFER)) != -1) {
dest.write(data, 0, count);
}
dest.flush();
dest.close();
}
zis.close();
} catch (IOException e) {
e.printStackTrace();
}
Please note that the files are compressed using WinZip.
My question is: since ZipFile and ZipInputStream are almost the same, why does ZipFile throw an exception, and why is it unable to decompress the archive?
EDIT: If I zip the file myself using the WinZip tool and then decompress it using the listed program, it works fine. The problem occurs specifically for archives coming from an external source (which claims to be using WinZip). On top of that, if I open the very same external archive using the WinZip tool, it shows and decompresses the files. But the Java code using ZipFile does not work at all.
EDIT: I am not able to figure out why the Java native code is not working for my ZIP archives, but Apache commons-compress solved my problem. It is working for me, as suggested by Ian Roberts.
ZipFile attempts to parse the "central directory" at the end of the zip in order to build up a data structure that allows you to access individual entries by name. ZipInputStream doesn't, it only looks at the local header of each entry as it reads through the file from top to bottom. So it looks like your file has good entries but a corrupted central directory for some reason.
There are a number of possibilities, for example issues with the encoding of non-ASCII characters in entry names, or if the zip has more than 64k entries. I would try the commons-compress implementation of ZipFile - even if it doesn't work it should give you a more specific error message than the "something is wrong" that you get from java.util.zip.
In addition to Ian Roberts's answer, if Java 7 is an option, you may wish to sidestep the older java.util.zip libraries in favor of using the ZIP filesystem provider.
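The difference between the two classes can be reproduced with a deliberately damaged archive. The following JDK-only sketch (all names invented for the illustration) builds a small zip in memory, cuts it off where the central directory begins, and then tries both APIs. It only demonstrates the mechanism described above; it does not prove that this is what is wrong with the archive in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class CentralDirectoryDemo {

    // Builds a one-entry zip archive in memory.
    static byte[] buildZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("hello.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Truncates the archive at the start of the central directory.
    // Assumes no archive comment, so the end record is the last 22 bytes
    // and holds the central directory offset at byte 16 (little-endian).
    static byte[] dropCentralDirectory(byte[] zip) {
        int eocd = zip.length - 22;
        int cdOffset = (zip[eocd + 16] & 0xff)
                | ((zip[eocd + 17] & 0xff) << 8)
                | ((zip[eocd + 18] & 0xff) << 16)
                | ((zip[eocd + 19] & 0xff) << 24);
        return Arrays.copyOf(zip, cdOffset);
    }

    // Reads entry names sequentially from the local headers.
    static List<String> namesViaStream(byte[] zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Returns false if ZipFile rejects the archive with a ZipException.
    static boolean opensAsZipFile(byte[] zip) throws IOException {
        Path tmp = Files.createTempFile("demo", ".zip");
        try {
            Files.write(tmp, zip);
            try (ZipFile zf = new ZipFile(tmp.toFile())) {
                return true;
            } catch (ZipException rejected) {
                return false;
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

With the truncated bytes, namesViaStream still finds hello.txt while opensAsZipFile returns false; with the intact bytes both succeed.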

What is the fastest way to extract 1 file from a zip file which contains a lot of files?

I tried the java.util.zip package, but it is too slow.
Then I found the LZMA SDK and 7z jbinding, but they are also lacking something.
The LZMA SDK does not provide any documentation or how-to tutorial, which is very frustrating. There is no javadoc.
The 7z jbinding does not provide a simple way to extract only 1 file; it only provides a way to extract all the content of the zip file. Moreover, it does not provide a way to specify a location for the unzipped file.
Any ideas, please?
What does your code with java.util.zip look like and how big of a zip file are you dealing with?
I'm able to extract a 4MB entry out of a 200MB zip file with 1,800 entries in roughly a second with this:
OutputStream out = new FileOutputStream("your.file");
FileInputStream fin = new FileInputStream("your.zip");
BufferedInputStream bin = new BufferedInputStream(fin);
ZipInputStream zin = new ZipInputStream(bin);
ZipEntry ze = null;
while ((ze = zin.getNextEntry()) != null) {
if (ze.getName().equals("your.file")) {
byte[] buffer = new byte[8192];
int len;
while ((len = zin.read(buffer)) != -1) {
out.write(buffer, 0, len);
}
out.close();
break;
}
}
I have not benchmarked the speed but with java 7 or greater, I extract a file as follows.
I would imagine that it's faster than the ZipFile API:
A short example extracting META-INF/MANIFEST.MF from a zip file test.zip:
// file to extract from zip file
String file = "MANIFEST.MF";
// location to extract the file to
File outputLocation = new File("D:/temp/", file);
// path to the zip file
Path zipFile = Paths.get("D:/temp/test.zip");
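// load zip file as filesystem (the Path-only overload needs Java 13+;
// the explicit ClassLoader argument also works on Java 7)
try (FileSystem fileSystem = FileSystems.newFileSystem(zipFile, (ClassLoader) null)) {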
// copy file from zip file to output location
Path source = fileSystem.getPath("META-INF/" + file);
Files.copy(source, outputLocation.toPath());
}
Use a ZipFile rather than a ZipInputStream.
Although the documentation does not indicate this (it's in the docs for JarFile), it should use random-access file operations to read the file. Since a ZIP file contains a directory at a known location, this means a LOT less IO has to happen to find a particular file.
Some caveats: to the best of my knowledge, the Sun implementation uses a memory-mapped file. This means that your virtual address space has to be large enough to hold the file as well as everything else in your JVM. Which may be a problem for a 32-bit server. On the other hand, it may be smart enough to avoid memory-mapping on 32-bit, or memory-map just the directory; I haven't tried.
Also, if you're using multiple files, be sure to use a try/finally to ensure that the file is closed after use.
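As a rough sketch of that suggestion (the class and method names are invented here), looking up one entry by name with ZipFile and copying it out could look like this. try-with-resources takes the place of the try/finally mentioned above.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: extracts a single named entry from a zip file.
final class SingleEntryExtractor {

    // Returns false if the archive has no entry with that name.
    static boolean extract(File zip, String entryName, Path target) throws IOException {
        try (ZipFile zipFile = new ZipFile(zip)) {
            // Looked up through the central directory, no sequential scan.
            ZipEntry entry = zipFile.getEntry(entryName);
            if (entry == null) {
                return false;
            }
            try (InputStream in = zipFile.getInputStream(entry)) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return true;
        }
    }
}
```

Unlike the ZipInputStream loop, this does not have to read through the entries that come before the one you want.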
