I'm looking for a way to extract a Zip file. So far I have tried java.util.zip and org.apache.commons.compress, but both produced corrupted output.
Basically, the input is a ZIP file containing a single .doc file.
java.util.zip: Output corrupted.
org.apache.commons.compress: Outputs a blank file, but 2 MB in size.
So far only commercial software like WinRAR works perfectly. Is there a Java library that does the same?
This is my method using the java.util.zip library:
public void extractZipNative(File fileZip)
{
    ZipInputStream zis = null; // must be initialised, it is checked in finally
    try {
        zis = new ZipInputStream(new FileInputStream(fileZip));
        ZipEntry ze = zis.getNextEntry();
        byte[] buffer = new byte[(int) ze.getSize()];
        FileOutputStream fos = new FileOutputStream(this.tempFolderPath + ze.getName());
        int len;
        while ((len = zis.read(buffer)) > 0)
        {
            fos.write(buffer);
        }
        fos.flush();
        fos.close();
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } finally
    {
        if (zis != null)
        {
            try {
                zis.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
Many thanks,
Mike
I think your input may have been compressed by some "incompatible" zip program such as 7-Zip.
First check whether it can be unpacked with a classic tool such as WinZip.
Java's zip handling copes well with archives that come from a "compatible" zip compressor.
It was an error in my code: I needed to pass the offset and length of the bytes to write, i.e. fos.write(buffer, 0, len) instead of fos.write(buffer).
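For reference, a corrected java.util.zip extraction loop might look like the minimal sketch below. It writes only the `len` bytes actually read, uses a fixed-size buffer instead of `ze.getSize()` (which returns -1 when the size is not known in advance), and handles every entry rather than just the first. `ZipExtractor` and `destDir` are illustrative names; the check that rejects entries escaping `destDir` ("zip slip") is an extra safeguard not present in the original code.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class; not part of any library.
final class ZipExtractor {

    static void extract(File zipFile, File destDir) throws IOException {
        String destPath = destDir.getCanonicalPath() + File.separator;
        byte[] buffer = new byte[8192]; // fixed size; ze.getSize() may be -1
        try (ZipInputStream zis = new ZipInputStream(new FileInputStream(zipFile))) {
            ZipEntry ze;
            while ((ze = zis.getNextEntry()) != null) {
                File out = new File(destDir, ze.getName());
                // Refuse entries such as "../evil.txt" that would land outside destDir.
                if (!out.getCanonicalPath().startsWith(destPath)) {
                    throw new IOException("Entry outside target dir: " + ze.getName());
                }
                if (ze.isDirectory()) {
                    out.mkdirs();
                    continue;
                }
                out.getParentFile().mkdirs();
                try (OutputStream fos = new FileOutputStream(out)) {
                    int len;
                    while ((len = zis.read(buffer)) != -1) {
                        fos.write(buffer, 0, len); // only the bytes actually read
                    }
                }
            }
        }
    }
}
```

Usage would be along the lines of `ZipExtractor.extract(new File("input.zip"), new File(tempFolderPath));`.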
This works for me:
ZipFile Vanilla = new ZipFile(new File("Vanilla.zip")); // zip file; must be in the working directory
Enumeration<? extends ZipEntry> entries = Vanilla.entries(); // all entries of the zip file
File folderw = new File("tkwgter5834"); // output directory
folderw.mkdirs(); // new File(...) alone does not create the directory
while (entries.hasMoreElements()) { // runs while there are entries left
    ZipEntry entry = entries.nextElement(); // next entry (file or directory)
    File target = new File(folderw, entry.getName());
    if (entry.isDirectory()) {
        target.mkdirs();
        continue;
    }
    target.getParentFile().mkdirs();
    try (InputStream stream = Vanilla.getInputStream(entry); // decompressed contents of this entry
         FileOutputStream outter = new FileOutputStream(target)) {
        outter.write(stream.readAllBytes()); // write this entry's bytes, not the whole zip's
    }
}
Vanilla.close();
Have you tried jUnrar? Perhaps it will work:
https://github.com/edmund-wagner/junrar
If that doesn't work either, I guess your archive is corrupted in some way.
If you know the environment your code will run in, I think you're much better off simply calling the system's unzip tool. It is likely to be much faster than anything you implement in Java.
I wrote the code to extract a zip file with nested directories and it ran slowly and took a lot of CPU. I wound up replacing it with this:
Runtime.getRuntime().exec(String.format("unzip %s -d %s", archive.getAbsolutePath(), basePath));
That works a lot better.
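If you go the external-tool route, passing the arguments as a list avoids a pitfall of `Runtime.exec(String)`, which splits the command on whitespace and therefore breaks on paths containing spaces; waiting for the process also lets you check unzip's exit status. A minimal sketch with `ProcessBuilder`, assuming an `unzip` executable is on the PATH; `SystemUnzip` is an illustrative name, and the `-o` flag (overwrite without prompting) is an addition so unzip never stops to ask a question.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; assumes an `unzip` executable is on the PATH.
final class SystemUnzip {

    // Each argument is a separate list element, so spaces in paths are safe.
    static List<String> command(File archive, File destDir) {
        return Arrays.asList("unzip", "-o", archive.getAbsolutePath(),
                "-d", destDir.getAbsolutePath());
    }

    static void unzip(File archive, File destDir) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(archive, destDir))
                .inheritIO()      // show unzip's output on this process's console
                .start();
        int exit = p.waitFor();   // block until unzip finishes
        if (exit != 0) {
            throw new IOException("unzip exited with status " + exit);
        }
    }
}
```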
I have followed the approach below to decompress a zip using Apache Commons Compress.
But since I am using an OutputStream and IOUtils.copy(ais, os) (code below) to unzip and copy each file, the timestamp is not preserved. Is there another way to copy the file directly from the zip so that its timestamp is preserved?
try (ArchiveInputStream ais =
         asFactory.createArchiveInputStream(
             new BufferedInputStream(
                 new FileInputStream(archive)))) {
    System.out.println("Extracting!");
    ArchiveEntry ae;
    while ((ae = ais.getNextEntry()) != null) {
        // check if file needs to be extracted {}
        if (!extract())
            continue;
        if (ae.isDirectory()) {
            File dir = new File(archive.getParentFile(), ae.getName());
            dir.mkdirs();
            continue;
        }
        File f = new File(archive.getParentFile(), ae.getName());
        File parent = f.getParentFile();
        parent.mkdirs();
        try (OutputStream os = new FileOutputStream(f)) {
            IOUtils.copy(ais, os);
            os.close();
        } catch (IOException innerIoe) {
            ...
        }
    }
    ais.close();
    if (!archive.delete()) {
        System.out.printf("Could not remove archive %s%n",
                archive.getName());
        archive.deleteOnExit();
    }
} catch (IOException ioe) {
    ...
}
EDIT: With the help of jbx's answer below, the following change makes it work:
IOUtils.copy(ais, os);
os.close();
f.setLastModified(ae.getLastModifiedDate().getTime()); // this line: restore the entry's timestamp after the file is closed
You could set the lastModifiedTime file attribute using NIO. Do it right after you write the file (after you close it): at that point the operating system will have set its last-modified time to the current time.
https://docs.oracle.com/javase/tutorial/essential/io/fileAttr.html
You will need to get the last-modified time from the zip file, so using NIO's Zip File System Provider to browse and extract files from the archive might be better than your current approach (unless the APIs you are using provide the same information).
https://docs.oracle.com/javase/7/docs/technotes/guides/io/fsp/zipfilesystemprovider.html
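A minimal sketch of that suggestion, using only the JDK's zip file system provider (Java 8+): each file is copied out of the archive, and its last-modified time is applied only after the copy has finished. `ZipFsExtract` is an illustrative name, and the check that rejects paths escaping the destination is an extra safeguard not mentioned in the answer.

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; not part of any library.
final class ZipFsExtract {

    // Extracts every entry of zipFile into destDir, keeping each file's timestamp.
    static void extract(Path zipFile, Path destDir) throws IOException {
        Path dest = destDir.toAbsolutePath().normalize();
        // The (ClassLoader) cast avoids an ambiguous overload on Java 13+.
        try (FileSystem zipFs = FileSystems.newFileSystem(zipFile, (ClassLoader) null)) {
            Path root = zipFs.getPath("/");
            List<Path> entries;
            try (Stream<Path> walk = Files.walk(root)) {
                entries = walk.collect(Collectors.toList());
            }
            for (Path entry : entries) {
                // Map the path inside the zip onto the local file system.
                Path target = dest.resolve(root.relativize(entry).toString()).normalize();
                if (!target.startsWith(dest)) {
                    throw new IOException("Entry escapes destination: " + entry);
                }
                if (Files.isDirectory(entry)) {
                    Files.createDirectories(target);
                    continue;
                }
                Files.createDirectories(target.getParent());
                Files.copy(entry, target, StandardCopyOption.REPLACE_EXISTING);
                // Set the timestamp after the copy is complete, so the write
                // itself does not overwrite it with the current time.
                FileTime modified = Files.getLastModifiedTime(entry);
                Files.setLastModifiedTime(target, modified);
            }
        }
    }
}
```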
I am aware that Oracle documents ZIP/GZIP compression and decompression methods on its website. But I have a scenario where I need to scan and find out whether any nested ZIPs/RARs are involved, as in the following case:
-MyFiles.zip
-MyNestedFiles.zip
-MyMoreNestedFiles.zip
-MoreProbably.zip
-Other_non_zips
-Other_non_zips
-Other_non_zips
I know that Apache Commons Compress and java.util.zip are the widely used packages, and that Commons Compress covers features missing from java.util.zip, e.g. character-encoding settings when writing zips. What I am not sure about is how to recurse through nested zip files, and the answers on SO are not very good examples of doing this. I tried the following code (which I got from an Oracle blog), but as I suspected, the nested directory recursion fails because it simply cannot find the files:
public static void processZipFiles(String pathName) throws Exception {
    ZipInputStream zis = null;
    InputStream is = null;
    try {
        ZipFile zipFile = new ZipFile(new File(pathName));
        String nestPathPrefix = zipFile.getName().substring(0, zipFile.getName().length() - 4);
        for (Enumeration e = zipFile.entries(); e.hasMoreElements();) {
            ZipEntry ze = (ZipEntry) e.nextElement();
            if (ze.getName().contains(".zip")) {
                is = zipFile.getInputStream(ze);
                zis = new ZipInputStream(is);
                ZipEntry zentry = zis.getNextEntry();
                while (zentry != null) {
                    System.out.println(zentry.getName());
                    zentry = zis.getNextEntry();
                    ZipFile nestFile = new ZipFile(nestPathPrefix + "\\" + zentry.getName());
                    if (zentry.getName().contains(".zip")) {
                        processZipFiles(nestPathPrefix + "\\" + zentry.getName());
                    }
                }
                is.close();
            }
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (is != null)
            is.close();
        if (zis != null)
            zis.close();
    }
}
Maybe I am doing something wrong, or using the wrong utilities. My objective is to identify whether any of the files, including those inside nested zip files, have file extensions that I do not allow. This is to make sure that I can prevent my users from uploading forbidden files even when they zip them. I also have the option to use Tika, which can do recursive parsing (using Jukka Zitting's solution), but I am not sure whether I can use the Metadata to do this detection the way I want.
Any help/suggestion is appreciated.
Using Commons Compress would be easier, not least because it has sensible shared interfaces across the various decompressors, which make life easier and allow handling other compression formats (e.g. tar) at the same time.
If you do want to use only the built-in Zip support, I'd suggest you do something like this:
File file = new File("outermost.zip");
try (FileInputStream input = new FileInputStream(file)) {
    check(input, file.toString());
}
public static void check(InputStream compressedInput, String name) throws IOException {
    // Not closed here: when recursing, compressedInput is the outer zip's stream.
    ZipInputStream input = new ZipInputStream(compressedInput);
    ZipEntry entry = null;
    while ((entry = input.getNextEntry()) != null) {
        System.out.println("Found " + entry.getName() + " in " + name);
        if (entry.getName().endsWith(".zip")) { // TODO Better checking
            check(input, name + "/" + entry.getName());
        }
    }
}
Your code fails because you're trying to open inner.zip inside outer.zip as a local file, but it doesn't exist as a standalone file. The code above treats any entry whose name ends with .zip as another zip file and recurses into it, reading it directly from the outer stream.
You probably want to use Commons Compress, though, so that you can handle alternative file names, other compression formats, etc.
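Applied to the original goal (rejecting forbidden extensions even inside nested zips), the same recursion with java.util.zip might be sketched as follows. The allow-list, the class name and the method names are illustrative assumptions; entries are judged by name only, so a renamed file would slip through, and RAR files are not opened.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: lists entries whose extension is not allowed,
// descending into nested .zip entries without writing anything to disk.
final class NestedZipScanner {

    // The caller owns (and closes) the zip stream.
    static List<String> findForbidden(InputStream zip, String name, Set<String> allowed)
            throws IOException {
        List<String> forbidden = new ArrayList<>();
        scan(zip, name, allowed, forbidden);
        return forbidden;
    }

    private static void scan(InputStream in, String name, Set<String> allowed,
                             List<String> forbidden) throws IOException {
        // Deliberately not closed: "in" may be the outer zip's entry stream.
        ZipInputStream zis = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            String path = name + "/" + entry.getName();
            String ext = extension(entry.getName());
            if (ext.equals("zip")) {
                scan(zis, path, allowed, forbidden); // recurse into the nested archive
            } else if (!allowed.contains(ext)) {
                forbidden.add(path);
            }
        }
    }

    // Lower-cased text after the last dot of the file name, or "" if none.
    private static String extension(String entryName) {
        int slash = entryName.lastIndexOf('/');
        int dot = entryName.lastIndexOf('.');
        return dot > slash ? entryName.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }
}
```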
I have successfully extracted a *.gz from a *.tgz, and now I have no idea how to extract the final files from it.
There are some options using third-party packages, but that's not an option for me: I need to use standard Java packages only.
I tried using the same function that I use for the *.tgz on the *.gz, but it doesn't work:
java.util.zip.ZipException: Not in GZIP format 1.gz
Here is the function that extracts *.tgz files:
public String ExtractFile(String path) {
    try {
        File newFile = new File(this.getFullPathWithoutExtension(path) + ".gz");
        GZIPInputStream gStream;
        FileOutputStream outStream;
        try (FileInputStream fileStream = new FileInputStream(path)) {
            gStream = new GZIPInputStream(fileStream);
            outStream = new FileOutputStream(newFile);
            byte[] buf = new byte[1024];
            int len;
            while ((len = gStream.read(buf)) > 0) {
                outStream.write(buf, 0, len);
            }
        }
        gStream.close();
        outStream.close();
        newFile.createNewFile();
        return newFile.getPath();
    } catch (Exception e) {
        System.out.print(e);
    }
    return null;
}
TL;DR: *.tgz files are extracted to *.gz files, but the *.gz files cannot be extracted.
A .tgz file wouldn't normally be extracted to a .gz file; it would be extracted to a .tar file. (A .gz file is gzipped; a .tar file is an uncompressed archive containing multiple files; a .tgz is a .tar file that has then been gzipped. You've already "undone" the gzipping, which is why running GZIPInputStream on the result fails with "Not in GZIP format".)
I don't think there's anything in Java's standard library to handle tar files, so you'll either have to revisit your "standard library only" decision or reimplement it yourself. The file format is publicly documented if you decide to do that.
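If you do reimplement it, the core of the format is small: a sequence of 512-byte headers (name at offset 0, octal size at offset 124, type flag at offset 156, and for POSIX ustar a "ustar" magic at 257 and a name prefix at 345), each followed by the file data padded to a multiple of 512 bytes, with zero-filled blocks marking the end. The sketch below uses only the standard library and is deliberately limited: checksums are not verified, and GNU/pax long-name extensions, links and base-256 sizes are not handled. `TinyTar` is a hypothetical name.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal tar reader (ustar and old v7 headers only).
final class TinyTar {
    private static final int BLOCK = 512;

    // Extracts regular files and directories into destDir; returns the files written.
    static List<Path> extract(InputStream tar, Path destDir) throws IOException {
        Path dest = destDir.toAbsolutePath().normalize();
        DataInputStream in = new DataInputStream(tar);
        List<Path> written = new ArrayList<>();
        byte[] header = new byte[BLOCK];
        while (true) {
            in.readFully(header);
            if (isZeroBlock(header)) {
                break; // end-of-archive marker
            }
            String name = field(header, 0, 100);
            if (field(header, 257, 6).startsWith("ustar")) {
                String prefix = field(header, 345, 155);
                if (!prefix.isEmpty()) {
                    name = prefix + "/" + name;
                }
            }
            String sizeField = field(header, 124, 12).trim();
            long size = sizeField.isEmpty() ? 0 : Long.parseLong(sizeField, 8);
            char type = (char) header[156];
            Path target = dest.resolve(name).normalize();
            if (!target.startsWith(dest)) {
                throw new IOException("Entry escapes destination: " + name);
            }
            if (type == '5') {                        // directory
                Files.createDirectories(target);
                copy(in, null, size);
            } else if (type == '0' || type == '\0') { // regular file
                Files.createDirectories(target.getParent());
                try (OutputStream out = Files.newOutputStream(target)) {
                    copy(in, out, size);
                }
                written.add(target);
            } else {                                  // links, devices, extensions: skip
                copy(in, null, size);
            }
            copy(in, null, (BLOCK - size % BLOCK) % BLOCK); // skip padding
        }
        return written;
    }

    // Copies exactly n bytes to out, or discards them when out is null.
    private static void copy(InputStream in, OutputStream out, long n) throws IOException {
        byte[] buf = new byte[8192];
        while (n > 0) {
            int r = in.read(buf, 0, (int) Math.min(buf.length, n));
            if (r == -1) {
                throw new EOFException("Truncated tar entry");
            }
            if (out != null) {
                out.write(buf, 0, r);
            }
            n -= r;
        }
    }

    // Reads a NUL-terminated ASCII header field.
    private static String field(byte[] h, int off, int len) {
        int end = off;
        while (end < off + len && h[end] != 0) {
            end++;
        }
        return new String(h, off, end - off, StandardCharsets.US_ASCII);
    }

    private static boolean isZeroBlock(byte[] b) {
        for (byte x : b) {
            if (x != 0) {
                return false;
            }
        }
        return true;
    }
}
```

Because it reads from any InputStream, it can be given new GZIPInputStream(new FileInputStream(path)) directly, so the intermediate .tar file never has to be written.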
I am using Commons Compress to zip multiple files and send them to the client from a Servlet.
The files could be any combination of file types (text, video, audio, archives, images, etc.). I take each file's InputStream and write it to the ServletOutputStream using IOUtils.copy(is, os).
The code usually works fine for any combination of documents, but when a request includes more than one zip file, I get java.io.IOException: Closed.
As a result, the created zip file is corrupted, even though its size is the sum of the individual file sizes (I am not using compression).
I tried creating the zip locally, passing a FileOutputStream instead of response.getOutputStream() to the ZipArchiveOutputStream constructor, and it succeeds.
So it looks like the problem is specific to ServletOutputStream.
Can anyone suggest a workaround?
Here is my code :
try (ZipArchiveOutputStream zos = new ZipArchiveOutputStream(response.getOutputStream())) {
    //get fileList
    for (File file : files) {
        addFileToZip(zos, file.getName(), new BufferedInputStream(new FileInputStream(file)));
    }
    zos.close();
}
public static void addFileToZip(ZipArchiveOutputStream zipOutputStream, String filename, InputStream inputStream) throws FileNotFoundException {
    if (zipOutputStream != null && inputStream != null) {
        try {
            zipOutputStream.putArchiveEntry(new ZipArchiveEntry(filename));
            IOUtils.copy(inputStream, zipOutputStream);
            logger.debug("fileAddedToZip :" + filename);
        } catch (IOException e) {
            logger.error("Error in adding file :" + filename, e);
        } finally {
            try {
                inputStream.close();
                zipOutputStream.closeArchiveEntry(); //**Starts to fail here after 1st zip is added**
            } catch (IOException e) {
                logger.error("Error in closing zip entry :" + filename, e);
            }
        }
    }
}
Here is the exception trace:
java.io.IOException: Closed
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:627)
at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:577)
at org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream.writeOut(ZipArchiveOutputStream.java:1287)
at org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream.writeOut(ZipArchiveOutputStream.java:1272)
at org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream.writeDataDescriptor(ZipArchiveOutputStream.java:997)
at org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream.closeArchiveEntry(ZipArchiveOutputStream.java:461)
at xxx.yyy.zzz.util.ZipUtils.addFileToZip(ZipUtils.java:110)
line 110 is zipOutputStream.closeArchiveEntry(); //**Starts to fail here after 1st zip is added**
Thanks in advance.
The problem is that you use try-with-resources, which automatically closes the stream you create in it, and you also close it manually. When try-with-resources then tries to close it, the stream is already closed, and that is when you get java.io.IOException: Closed.
If you use try-with-resources, you don't need to close the streams you create in it. Remove your manual zos.close() statement:
try (ZipArchiveOutputStream zos =
         new ZipArchiveOutputStream(response.getOutputStream())) {
    //get fileList
    for (File file : files) {
        addFileToZip(zos, file.getName(), new BufferedInputStream(new FileInputStream(file)));
    }
} // Here zos will be closed automatically!
Also note that closing zos also closes the servlet's underlying OutputStream, so you will not be able to add further entries afterwards; you have to add them all before it is closed.
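To illustrate that pattern (finish each entry, then let try-with-resources close the archive stream exactly once), here is a minimal sketch that swaps in the JDK's java.util.zip.ZipOutputStream for Commons Compress and writes to any OutputStream, such as the one returned by response.getOutputStream(). `ZipStreamer` is an illustrative name. Files that are themselves zips are simply added as ordinary entries.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: writes every file as one entry of a single zip stream.
final class ZipStreamer {

    static void writeZip(List<Path> files, OutputStream target) throws IOException {
        // try-with-resources closes zos (and therefore target) exactly once;
        // there is no extra zos.close() inside the block.
        try (ZipOutputStream zos = new ZipOutputStream(target)) {
            for (Path file : files) {
                zos.putNextEntry(new ZipEntry(file.getFileName().toString()));
                try (InputStream in = Files.newInputStream(file)) {
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        zos.write(buf, 0, n);
                    }
                }
                zos.closeEntry(); // finish this entry before starting the next
            }
        }
    }
}
```

If the caller needs to keep using `target` after the archive is complete, ZipOutputStream.finish() writes the remaining archive data without closing the underlying stream.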
I am trying to unzip a file in Java and add all its contents to an ArrayList rather than writing them to disk. The issue is that I pass in a path to a zip file, it reads the zip file and adds the files to the list, but when I come to process the files, they have strange paths inside my project directory that do not exist.
Please can someone help me here?
public void processZipFile(String path) {
    File file = new File(path);
    file.setReadable(true);
    ZipFile zip;
    ArrayList<File> files = new ArrayList<File>();
    try {
        zip = new ZipFile(file);
        Enumeration<ZipEntry> entries = (Enumeration<ZipEntry>) zip
                .entries();
        while (entries.hasMoreElements()) {
            ZipEntry entry = entries.nextElement();
            File f = new java.io.File(entry.getName());
            allFiles.add(f);
        }
    } catch (ZipException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
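AFAIK, entry.getName() won't return anything that can be used to open a file: it is the entry's path inside the archive, not a path on disk, so new File(entry.getName()) is resolved relative to the current working directory (hence the strange paths in your project directory). Remember that this is just a zip entry, not a physical file.
I suggest storing the InputStream for each entry in your list using zipFile.getInputStream(entry), then reading the contents from those streams, and finally closing them once you no longer need them. Keep the ZipFile open until then, since closing it also closes every stream it returned.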
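A minimal sketch of a variation on that idea: instead of keeping the streams, read each entry's bytes into memory straight away, so nothing depends on the ZipFile staying open. It assumes the entries are small enough to hold in memory; `ZipInMemory` and `readAll` are illustrative names.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: maps each entry name to its uncompressed bytes.
final class ZipInMemory {

    static Map<String, byte[]> readAll(File zip) throws IOException {
        Map<String, byte[]> contents = new LinkedHashMap<>(); // keeps enumeration order
        try (ZipFile zipFile = new ZipFile(zip)) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                try (InputStream in = zipFile.getInputStream(entry)) {
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = in.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                    // The key is the path inside the archive, not a path on disk.
                    contents.put(entry.getName(), buf.toByteArray());
                }
            }
        }
        return contents;
    }
}
```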