Extracting PDF inside a Zip inside a Zip

I have searched online and on Stack Overflow and could not find anything specific to this issue.
I am trying to extract a PDF file that is located in a zip file that is itself inside another zip file (nested zips).
Calling my extraction method again on the inner zip does not work, nor does changing the whole program to accept InputStreams instead of the approach shown below.
At this stage the .pdf file inside the nested zip is simply skipped.
public static void main(String[] args)
{
try
{
//Paths
String basePath = "C:\\Users\\user\\Desktop\\Scan\\";
File lookupDir = new File(basePath + "Data\\");
String doneFolder = basePath + "DoneUnzipping\\";
File[] directoryListing = lookupDir.listFiles();
for (int i = 0; i < directoryListing.length; i++)
{
if (directoryListing[i].isFile()) //there's definitely a file
{
//Save the current file's path
String pathOrigFile = directoryListing[i].getAbsolutePath();
Path origFileDone = Paths.get(pathOrigFile);
Path newFileDone = Paths.get(doneFolder + directoryListing[i].getName());
//unzip it
if(directoryListing[i].getName().toUpperCase().endsWith(ZIP_EXTENSION)) //ZIP files
{
unzip(directoryListing[i].getAbsolutePath(), DESTINATION_DIRECTORY + directoryListing[i].getName());
//move to the 'DoneUnzipping' folder
Files.move(origFileDone, newFileDone);
}
}
}
} catch (Exception e)
{
e.printStackTrace(System.out);
}
}
private static void unzip(String zipFilePath, String destDir)
{
//buffer for read and write data to file
byte[] buffer = new byte[BUFFER_SIZE];
try (ZipInputStream zis = new ZipInputStream(new FileInputStream(zipFilePath)))
{
FileInputStream fis = new FileInputStream(zipFilePath);
ZipEntry ze = zis.getNextEntry();
while(ze != null)
{
String fileName = ze.getName();
int index = fileName.lastIndexOf("/");
String newFileName = fileName.substring(index + 1);
File newFile = new File(destDir + File.separator + newFileName);
//Zips inside zips
if(fileName.toUpperCase().endsWith(ZIP_EXTENSION))
{
ZipInputStream innerZip = new ZipInputStream(zis);
ZipEntry innerEntry = null;
while((innerEntry = innerZip.getNextEntry()) != null)
{
System.out.println("The file: " + fileName);
if(fileName.toUpperCase().endsWith("PDF"))
{
FileOutputStream fos = new FileOutputStream(newFile);
int len;
while ((len = innerZip.read(buffer)) > 0)
{
fos.write(buffer, 0, len);
}
fos.close();
}
}
}
//close this ZipEntry
zis.closeEntry(); // java.io.IOException: Stream Closed
ze = zis.getNextEntry();
}
//close last ZipEntry
zis.close();
fis.close();
} catch (IOException e)
{
e.printStackTrace();
}
}

The solution to this is not as obvious as it seems. Despite writing a few zip utilities myself some time ago, getting zip entries from inside another zip file only seems obvious in retrospect
(and I also got the java.io.IOException: Stream Closed on my first attempt).
The Java classes for ZipFile and ZipInputStream really direct your thinking into using the file system, but it is not required.
The functions below will scan a parent-level zip file, and continue scanning until it finds an entry with a specified name. (Nearly) everything is done in-memory.
Naturally, this can be modified to use different search criteria, find multiple file types, etc. and take different actions, but this at least demonstrates the basic technique in question -- zip files inside of zip files -- no guarantees on other aspects of the code, and someone more savvy could most likely improve the style.
final static String ZIP_EXTENSION = ".zip";
public static byte[] getOnePDF() throws IOException
{
final File source = new File("/path/to/MegaData.zip");
final String nameToFind = "FindThisFile.pdf";
final ByteArrayOutputStream mem = new ByteArrayOutputStream();
try (final ZipInputStream in = new ZipInputStream(new BufferedInputStream(new FileInputStream(source))))
{
digIntoContents(in, nameToFind, mem);
}
// Save to disk, if you want
// copy(new ByteArrayInputStream(mem.toByteArray()), new FileOutputStream(new File("/path/to/output.pdf")));
// Otherwise, just return the binary data
return mem.toByteArray();
}
private static void digIntoContents(final ZipInputStream in, final String nameToFind, final ByteArrayOutputStream mem) throws IOException
{
ZipEntry entry;
while (null != (entry = in.getNextEntry()))
{
final String name = entry.getName();
// Found the file we are looking for
if (name.equals(nameToFind))
{
copy(in, mem);
return;
}
// Found another zip file
if (name.toUpperCase().endsWith(ZIP_EXTENSION.toUpperCase()))
{
digIntoContents(new ZipInputStream(new ByteArrayInputStream(getZipEntryFromMemory(in))), nameToFind, mem);
}
}
}
private static byte[] getZipEntryFromMemory(final ZipInputStream in) throws IOException
{
final ByteArrayOutputStream mem = new ByteArrayOutputStream();
copy(in, mem);
return mem.toByteArray();
}
// General purpose, reusable, utility function
// OK for binary data (bad for non-ASCII text, use Reader/Writer instead)
public static void copy(final InputStream from, final OutputStream to) throws IOException
{
final int bufferSize = 4096;
final byte[] buf = new byte[bufferSize];
int len;
while (0 < (len = from.read(buf)))
{
to.write(buf, 0, len);
}
to.flush();
}

Your question asks how to use Java (by implication on Windows) to extract a PDF from a zip inside another, outer zip.
On many systems, including Windows, this is a single-line command that depends on the location of the source and target folders. Using the shortest example, with the current Downloads folder as the working directory, it is as simple as this in a shell:
tar -xf "german (2).zip" && tar -xf "german.zip" && german.pdf
To run the command from Java on Windows, see
How do I execute Windows commands in Java?
The default PDF viewer can then open the result: Microsoft Edge on Windows, or in my case SumatraPDF.
There is generally no point in putting a PDF inside a zip because it cannot be opened in place there. So a single level of nesting would be advisable if it is needed for download or transport.
There is no need to add a password to the zip because PDF has its own password for opening. It is unwise to add two levels of complexity. Keep it simple.
If you have multiple zips nested inside multiple zips with multiple PDFs in each, then you have to be more specific by filtering names. However, avoid that extra onion skin where possible.
\Downloads>tar -xf "german (2).zip" "both.zip" && tar -xf "both.zip" "English language.pdf"
You could complicate that by working in memory or in a temp folder, but using the native file system is reliable and simple. Without Java, the fastest approach is to run
CD /D "C:/Users/user/Desktop/Scan/DoneUnzipping" && for %f in (..\Data\*.zip) do tar -xf "%f" "*.zip" && for %f in (*.zip) do tar -xf "%f" "*.pdf" && del "*.zip"
This will extract all inner zips into the working folder, then extract all PDFs, and remove the temporary inner zips. The source double zips will not be deleted, simply touched.

The line that causes your problem looks to be the auto-close block you have created when reading the inner zip:
try(ZipInputStream innerZip = new ZipInputStream(fis)) {
...
}
Several likely issues: firstly it is reading the wrong stream - fis not the existing zis.
Secondly, you shouldn't use try-with-resources for auto-close on innerZip as this implicitly calls innerZip.close() when exiting the block. If you view the source code of ZipInputStream via a good IDE you should see (eventually) that ZipInputStream extends InflaterInputStream which itself extends FilterInputStream. A call to innerZip.close() will close the underlying outer stream zis (fis in your case) hence stream is closed when you resume the next entry of the outer zip.
Therefore remove the try() block and add use of zis:
ZipInputStream innerZip = new ZipInputStream(zis);
Use a try-with-resources block only for the outermost file handling:
try (ZipInputStream zis = new ZipInputStream(new FileInputStream(zipFilePath))) {
ZipEntry ze = zis.getNextEntry();
...
}
Thirdly, you appear to be copying the wrong stream when extracting a PDF - use innerZip not outer zis. The code will never extract PDF as these 2 lines can never be true at the same time because a file ending ZIP will never end PDF too:
if(fileName.toUpperCase().endsWith(ZIP_EXTENSION)) {
...
// You want innerEntry.getName() here
if(fileName.toUpperCase().endsWith("PDF"))
You should be able to switch to one line Files.copy and make use of the PDF filename not zip filename:
if(innerEntry.getName().toUpperCase().endsWith("PDF")) {
Path newFile = Paths.get(destDir + '-'+innerEntry.getName().replace("/", "-"));
System.out.println("Files.copy to " + newFile);
Files.copy(innerZip, newFile);
}
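Putting those three fixes together, a minimal sketch could look like the following. It is an illustration under assumptions rather than a drop-in replacement for your program: the class name NestedPdfExtractor, the method names, and the "inner.zip!/a.pdf" key format are all made up for this example, and it collects the PDFs in memory instead of writing them to destDir. The inner ZipInputStream wraps the outer one and is never closed, so the outer stream stays usable for the next entry.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, named for this example only.
final class NestedPdfExtractor {

    /** Returns every PDF found at any nesting depth, keyed by an illustrative path-like name. */
    static Map<String, byte[]> extractPdfs(InputStream zipData) throws IOException {
        Map<String, byte[]> found = new LinkedHashMap<>();
        // Only the outermost stream is closed; that also closes zipData.
        try (ZipInputStream zis = new ZipInputStream(zipData)) {
            collect(zis, "", found);
        }
        return found;
    }

    private static void collect(ZipInputStream zis, String prefix, Map<String, byte[]> found)
            throws IOException {
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            String name = entry.getName();
            String upper = name.toUpperCase(Locale.ROOT);
            if (upper.endsWith(".ZIP")) {
                // Wrap the current entry's data. Deliberately NOT closed:
                // closing the wrapper would also close the outer stream.
                collect(new ZipInputStream(zis), prefix + name + "!/", found);
            } else if (upper.endsWith(".PDF")) {
                found.put(prefix + name, readCurrentEntry(zis));
            }
        }
    }

    // Reads the bytes of the entry the stream is currently positioned on.
    private static byte[] readCurrentEntry(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int len;
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len);
        }
        return out.toByteArray();
    }
}
```

To write the files to disk instead, the found.put(...) line could be swapped for the Files.copy call shown above.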

Related

Java program ignoring all the files inside the zip file [duplicate]

This question already has answers here:
How to unzip files recursively in Java?
(10 answers)
Closed last month.
I have a program that takes a zip folder path via the console and goes through each item inside that folder (every child item, children of children, etc.). But if it encounters a zip file inside, it ignores everything in that zip. I need to read everything, including files inside nested zip files.
Here is the method that goes through each item:
public static String[] getLogBuffers(String path) throws IOException//path is given via console
{
String zipFileName = path;
String destDirectory = path;
BufferedInputStream errorLogBuffer = null;
BufferedInputStream windowLogBuffer = null;
String strErrorLogFileContents="";
String strWindowLogFileContents="";
String[] errorString=new String[2];
byte[] buffer = new byte[1024];
ZipInputStream zis = new ZipInputStream(new FileInputStream(zipFileName));
ZipEntry zipEntry = zis.getNextEntry();
while (zipEntry != null)
{
String filePath = destDirectory + "/" + zipEntry.getName();
System.out.println("unzipping" + filePath);
if (!zipEntry.isDirectory())
{
if (zipEntry.getName().endsWith("errorlog.txt"))
{
ZipFile zipFile = new ZipFile(path);
InputStream errorStream = zipFile.getInputStream(zipEntry);
BufferedInputStream bufferedInputStream=new BufferedInputStream(errorStream);
byte[] contents = new byte[1024];
System.out.println("ERRORLOG NAME"+zipEntry.getName());
int bytesRead = 0;
while((bytesRead = bufferedInputStream.read(contents)) != -1) {
strErrorLogFileContents += new String(contents, 0, bytesRead);
}
}
if (zipEntry.getName().endsWith("windowlog.txt"))
{ ZipFile zipFile = new ZipFile(path);
InputStream windowStream = zipFile.getInputStream(zipEntry);
BufferedInputStream bufferedInputStream=new BufferedInputStream(windowStream);
byte[] contents = new byte[1024];
System.out.println("WINDOWLOG NAME"+zipEntry.getName());
int bytesRead = 0;
while((bytesRead = bufferedInputStream.read(contents)) != -1) {
strWindowLogFileContents += new String(contents, 0, bytesRead);
}
}
}
zis.closeEntry();
zipEntry = zis.getNextEntry();
}
errorString[0]=strErrorLogFileContents;
errorString[1]=strWindowLogFileContents;
zis.closeEntry();
zis.close();
System.out.println("Buffers ready");
return errorString;
}
Items accessed inside the parent zip folder (my console output):
unzippingC:logFolders/logX3.zip/logX3/
unzippingC:logFolders/logX3.zip/logX3/Anan/
unzippingC:logFolders/logX3.zip/logX3/Anan/errorreports/
unzippingC:logFolders/logX3.zip/logX3/Anan/errorreports/2021-11-23_103518.zip
unzippingC:logFolders/logX3.zip/logX3/Anan/errorreports/errorlog.txt
unzippingC:logX3.zip/logX3/Anan/errorreports/version.txt
unzippingC:logFolders/logX3.zip/logX3/Anan/errorreports/windowlog.txt
As you can see, the program only goes as far as 2021-11-23_103518.zip and then continues down another path, but 2021-11-23_103518.zip has child items (files) that I need to access.
I appreciate any help, thanks.

A zip file is not a folder. Although Windows treats a zip file as if it’s a folder,* it is not a folder. A .zip file is a single file with an internal table of entries, each containing compressed data.
Each inner .zip file you read requires a new ZipFile or ZipInputStream. There is no way around that.
You should not create new ZipFile instances to read the same .zip file’s entries. You only need one ZipFile object. You can go through its entries with its entries() method, and you can read each entry with the ZipFile’s getInputStream method.
(I wouldn’t be surprised if using multiple objects to read the same zip file were to run into file locking problems on Windows.)
try (ZipFile zipFile = new ZipFile(path))
{
Enumeration<? extends ZipEntry> entries = zipFile.entries();
while (entries.hasMoreElements())
{
ZipEntry zipEntry = entries.nextElement();
if (zipEntry.getName().endsWith("errorlog.txt"))
{
try (InputStream errorStream = zipFile.getInputStream(zipEntry))
{
// ...
}
}
}
}
Notice that no other ZipFile or ZipInputStream objects are created. Only zipFile reads and traverses the file. Also notice the use of a try-with-resources statement to implicitly close the ZipFile and the InputStream.
You should not use += to build a String. Doing so creates a lot of intermediate String objects which will have to be garbage collected, which can hurt your program’s performance. You should wrap each zip entry’s InputStream in an InputStreamReader, then use that Reader’s transferTo method to append to a single StringWriter that holds your combined log.
StringWriter strErrorLogFileContents = new StringWriter();
StringWriter strWindowLogFileContents = new StringWriter();
try (ZipFile zipFile = new ZipFile(path))
{
Enumeration<? extends ZipEntry> entries = zipFile.entries();
while (entries.hasMoreElements())
{
ZipEntry zipEntry = entries.nextElement();
if (zipEntry.getName().endsWith("errorlog.txt"))
{
try (Reader entryReader = new InputStreamReader(
zipFile.getInputStream(zipEntry),
StandardCharsets.UTF_8))
{
entryReader.transferTo(strErrorLogFileContents);
}
}
}
}
Notice the use of StandardCharsets.UTF_8. It is almost never correct to create a String from bytes without specifying the Charset. If you don’t provide the Charset, Java will use the system’s default Charset, which means your program will behave differently in Windows than it will on other operating systems.
If you are stuck with Java 8, you won’t have the transferTo method of Reader, so you will have to do the work yourself:
if (zipEntry.getName().endsWith("errorlog.txt"))
{
try (Reader entryReader = new BufferedReader(
new InputStreamReader(
zipFile.getInputStream(zipEntry),
StandardCharsets.UTF_8)))
{
int c;
while ((c = entryReader.read()) >= 0)
{
strErrorLogFileContents.write(c);
}
}
}
The use of BufferedReader means you don’t need to create your own array and implement bulk reads yourself. BufferedReader already does that for you.
As mentioned above, a zip entry which is itself an inner zip file requires a brand new ZipFile or ZipInputStream object to read it. I recommend copying the entry to a temporary file, since reading from a ZipInputStream made from another ZipInputStream is known to be slow, then deleting the temporary file after you’re done reading it.
try (ZipFile zipFile = new ZipFile(path))
{
Enumeration<? extends ZipEntry> entries = zipFile.entries();
while (entries.hasMoreElements())
{
ZipEntry zipEntry = entries.nextElement();
if (zipEntry.getName().endsWith(".zip"))
{
Path tempZipFile = Files.createTempFile(null, ".zip");
try (InputStream errorStream = zipFile.getInputStream(zipEntry))
{
Files.copy(errorStream, tempZipFile,
StandardCopyOption.REPLACE_EXISTING);
}
String[] logsFromZip = getLogBuffers(tempZipFile.toString());
strErrorLogFileContents.write(logsFromZip[0]);
strWindowLogFileContents.write(logsFromZip[1]);
Files.delete(tempZipFile);
}
}
}
Finally, consider creating a meaningful class for your return value. An array of Strings is difficult to understand. A caller won’t know that it always contains exactly two elements and won’t know what those two elements are. A custom return type would be pretty short:
public class Logs {
private final String errorLog;
private final String windowLog;
public Logs(String errorLog,
String windowLog)
{
this.errorLog = errorLog;
this.windowLog = windowLog;
}
public String getErrorLog()
{
return errorLog;
}
public String getWindowLog()
{
return windowLog;
}
}
As of Java 16, you can use a record to make the declaration much shorter:
public record Logs(String errorLog,
String windowLog)
{ }
Whether you use a record or write out the class, you can use it as a return type in your method:
public static Logs getLogBuffers(String path) throws IOException
{
// ...
return new Logs(
strErrorLogFileContents.toString(),
strWindowLogFileContents.toString());
}
* The Windows explorer shell’s practice of treating zip files as folders is a pretty bad user interface. I know I’m not the only one who thinks so. It often ends up making things more difficult for users instead of easier.

Java Reading from n-nested zips, modifying and writing to new zip preserving original structure

PROBLEM SOLVED IN EDIT 3
I've been struggling with this problem for some time. All of the questions here on SO or elsewhere on the internet seem to work only on 'shallow' structures with one zip inside another. However, I have a zip archive whose structure is more or less like this:
input.zip/
--1.zip/
--folder/
----2.zip/
------3.zip/
--------test/
----------some-other-folder/
----------archive.gz/
------------filte-to-parse
----------file-to-parse3.txt
------file-to-parse.txt
--4.zip/
------folder/
and so on. My code needs to handle N levels of zips while preserving the original structure of zips, gzips, folders and files. Using temporary files is forbidden due to lack of privileges (this is something I'm not willing to change).
This is the code I have written so far; however, ZipOutputStream seems to operate only on one (top) level: when directories contain files/dirs with exactly the same name, it throws Exception in thread "main" java.util.zip.ZipException: duplicate entry: folder/. It also skips empty directories (which is not expected). What I want to achieve is to somehow move my ZipOutputStream to a 'lower' level and operate on each of the zips. Maybe there's a better approach to the whole problem; any help would be appreciated. I need to perform certain text extraction/modification later, but I'm not starting on that until reading/writing the whole structure works properly. Thanks in advance for any help!
//constructor
private final File zipFile;
ArchiveResolver(String fileToHandle) {
this.zipFile = new File(Objects.requireNonNull(getClass().getClassLoader().getResource(fileToHandle)).getFile());
}
void resolveInputFile() throws Exception {
FileInputStream fileInputStream = new FileInputStream(this.zipFile);
FileOutputStream fileOutputStream = new FileOutputStream("out.zip");
ZipOutputStream zipOutputStream = new ZipOutputStream(fileOutputStream);
ZipInputStream zipInputStream = new ZipInputStream(fileInputStream);
zip(zipInputStream, zipOutputStream);
zipInputStream.close();
zipOutputStream.close();
}
// this one doesn't preserve internal structure(empty folders), but can work on each file
private void zip(ZipInputStream zipInputStream, ZipOutputStream zipOutputStream) throws IOException {
ZipEntry entry;
while ((entry = zipInputStream.getNextEntry()) != null) {
System.out.println(entry.getName());
byte[] buffer = new byte[1024];
int length;
if (entry.getName().endsWith(".zip")) {
// wrapping outer zip streams to inner streams making actual entries a new source
ZipInputStream innerZipInputStream = new ZipInputStream(zipInputStream);
ZipOutputStream innerZipOutputStream = new ZipOutputStream(zipOutputStream);
ZipEntry zipEntry = new ZipEntry(entry.getName());
// add new zip entry here to outer zipOutputStream: i.e. data.zip
zipOutputStream.putNextEntry(zipEntry);
// now treat this data.zip as parent and call recursively zipFolder on it
zip(innerZipInputStream, innerZipOutputStream);
// Finish internal stream work when innerZipOutput is done
innerZipOutputStream.finish();
// Close entry
zipOutputStream.closeEntry();
} else if (entry.isDirectory()) {
// putting new zip entry into output stream and adding extra '/' to make
// sure zipOutputStream will treat it as folder
ZipEntry zipEntry = new ZipEntry(entry.getName() + "/");
// this only should preserve internal structure
zipOutputStream.putNextEntry(zipEntry);
// reading everything from zipInputStream
while ((length = zipInputStream.read(buffer)) > 0) {
// sending it straight to zipOutputStream
zipOutputStream.write(buffer, 0, length);
}
zipOutputStream.closeEntry();
// This else will include checking if file is respectively:
// .gz file <- then open it, read from file inside, modify and save it
// .txt file <- also read, modify and preserve
} else {
// create new entry on top of this
ZipEntry zipEntry = new ZipEntry(entry.getName());
zipOutputStream.putNextEntry(zipEntry);
while ((length = zipInputStream.read(buffer)) > 0) {
zipOutputStream.write(buffer, 0, length);
}
zipOutputStream.closeEntry();
}
}
}
// This one preserves internal structure (empty folders and so)
// BUT! no work on each file is possible it just preserves everything as it is
private void zipWhole(ZipInputStream zipInputStream, ZipOutputStream zipOutputStream) throws IOException {
ZipEntry entry;
while ((entry = zipInputStream.getNextEntry()) != null) {
System.out.println(entry.getName());
byte[] buffer = new byte[1024];
int length;
zipOutputStream.putNextEntry(new ZipEntry(entry.getName()));
while ((length = zipInputStream.read(buffer)) > 0) {
zipOutputStream.write(buffer, 0, length);
}
zipOutputStream.closeEntry();
}
}
EDIT:
Updated my code to the newest version, still nothing to be proud of but did some changes however still not working... I've added here two very important comments about (in my opinion) code that fails. So I've tested two approaches - the first one is getting ZipInputStream from zipFile by using getInputStream(ZipEntry e); - throws Exception in thread "main" java.util.zip.ZipException: no current ZIP entry when I'm trying to put some entries to ZipOutputStream. The second approach focuses on "wrapping" ZipInputStream into one another -> this results in empty ZipInputStreams with no entries and application just goes through the files, list them (only top level of zips...) and finishes without saving anything into the out.zip file.
EDIT 2:
With a little suggestions from the people in the comments, I've decided to rewrite my code focusing to close, finish and closeEntry in appropriate places (I hope i did it better now). So right now I've achieved a little of something - code iterates through every entry, and saves it into out.zip file with proper zip packaging inside. Still skips empty folders tho, not sure why (I've checked some of the questions on stack and web, seems ok). Anyway thanks for help so far, I'll try to work this out and I'll keep this updated.
EDIT 3:
After a few approaches to the problem and some reading and refactoring, I've managed to solve it (however, there's still a problem when running this code on Linux: empty directories are skipped, which seems to be connected to the way certain OSes preserve file information?).
Here's the working solution:
void resolveInputFile() throws IOException {
FileInputStream fileInputStream = new FileInputStream(this.zipFile);
FileOutputStream fileOutputStream = new FileOutputStream("in.zip");
ZipOutputStream zipOutputStream = new ZipOutputStream(fileOutputStream);
ZipInputStream zipInputStream = new ZipInputStream(fileInputStream);
zip(zipInputStream, zipOutputStream);
zipInputStream.close();
zipOutputStream.close();
}
private void zip(ZipInputStream zipInputStream, ZipOutputStream zipOutputStream) throws IOException {
ZipEntry entry;
while ((entry = zipInputStream.getNextEntry()) != null) {
logger.info(entry.getName());
if (entry.getName().endsWith(".zip")) {
// If entry is zip, I create inner zip streams that wrap outer ones
ZipInputStream innerZipInputStream = new ZipInputStream(zipInputStream);
ZipOutputStream innerZipOutputStream = new ZipOutputStream(zipOutputStream);
ZipEntry zipEntry = new ZipEntry(entry.getName());
zipOutputStream.putNextEntry(zipEntry);
zip(innerZipInputStream, innerZipOutputStream);
//As mentioned in comments, proper streams needs to be properly closed/finished, I'm done writing to inner stream so I call finish() rather than close() which closes outer stream
innerZipOutputStream.finish();
zipOutputStream.closeEntry();
} else if (entry.getName().endsWith(".gz")) {
GZIPInputStream gzipInputStream = new GZIPInputStream(zipInputStream);
//small trap while using GZIP - to save it properly I needed to put new ZipEntry to outerZipOutputStream BEFORE creating GZIPOutputStream wrapper
ZipEntry zipEntry = new ZipEntry(entry.getName());
zipOutputStream.putNextEntry(zipEntry);
GZIPOutputStream gzipOutputStream = new GZIPOutputStream(zipOutputStream);
//To make it as as much efficient as possible I've used BufferedReader
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gzipInputStream));
long start = System.nanoTime();
logger.info("Started to process {}", zipEntry.getName());
String line;
while ((line = bufferedReader.readLine()) != null) {
//PROCESSING LINE BY LINE...
// write through the GZIP wrapper so the entry really contains gzip-compressed data
gzipOutputStream.write((line + "\n").getBytes());
}
logger.info("Processing of {} took {} milliseconds", entry.getName() ,(System.nanoTime() - start) / 1_000_000);
gzipOutputStream.finish();
zipOutputStream.closeEntry();
} else if (entry.getName().endsWith(".txt")) {
ZipEntry zipEntry = new ZipEntry(entry.getName());
zipOutputStream.putNextEntry(zipEntry);
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(zipInputStream));
long start = System.nanoTime();
logger.info("Started to process {}", zipEntry.getName());
String line;
while ((line = bufferedReader.readLine()) != null) {
//PROCESSING LINE BY LINE...
zipOutputStream.write((line + "\n").getBytes());
}
logger.info("Processing of {} took {} milliseconds", entry.getName() ,(System.nanoTime() - start) / 1_000_000);
zipOutputStream.closeEntry();
} else if (entry.isDirectory()) {
//Standard directory preserving
byte[] buffer = new byte[8192];
int length;
// Adding extra "/" to make sure it's dir
ZipEntry zipEntry = new ZipEntry(entry.getName() + "/");
zipOutputStream.putNextEntry(zipEntry);
while ((length = zipInputStream.read(buffer)) > 0) {
// sending it straight to zipOutputStream
zipOutputStream.write(buffer, 0, length);
}
zipOutputStream.closeEntry();
} else {
//In my case it probably will never be called but if there's some different file in here it will be preserved unchanged in the output file
byte[] buffer = new byte[8192];
int length;
ZipEntry zipEntry = new ZipEntry(entry.getName());
zipOutputStream.putNextEntry(zipEntry);
while ((length = zipInputStream.read(buffer)) > 0) {
zipOutputStream.write(buffer, 0, length);
}
zipOutputStream.closeEntry();
}
}
}
Thanks again for all the help and good advice.

There seems to be a lot of debugging and refactoring to be done there.
There's an obvious problem: you are either not closing your streams/entries or closing them in the wrong order. Buffered data will be lost and the central directory will not be written. (A complication is that Java streams unhelpfully close the stream they wrap, which is why there is finish() as well as close(), but the calls still need to be made in the correct order.)
Zip files have a flat structure rather than a real directory tree: the entire file path is included for each entry in both the local header and the central directory. A directory exists only as an entry whose name ends in "/".
The part of the Java zip library giving a random access interface uses memory mapped files, so you are stuck with streams for everything except, perhaps, the top level.
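To illustrate the finish-versus-close ordering, here is a small in-memory sketch. The class name NestedZipWriter and the entry names are invented for the example. The inner ZipOutputStream wraps the outer one; calling finish() on it writes the inner central directory without closing the outer stream, and only then is the outer entry closed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, named for this example only.
final class NestedZipWriter {

    /** Builds a zip holding "inner.zip" (which contains "hello.txt") and "readme.txt". */
    static byte[] buildNested() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // close() on the outermost stream writes the outer central directory.
        try (ZipOutputStream outer = new ZipOutputStream(sink)) {
            // 1. Open the outer entry that will hold the inner archive.
            outer.putNextEntry(new ZipEntry("inner.zip"));

            // 2. Write the inner archive straight into that entry.
            ZipOutputStream inner = new ZipOutputStream(outer);
            inner.putNextEntry(new ZipEntry("hello.txt"));
            inner.write("hello".getBytes(StandardCharsets.UTF_8));
            inner.closeEntry();

            // 3. finish(), not close(): this writes the inner central
            //    directory but leaves the outer stream open.
            inner.finish();

            // 4. Only now close the outer entry.
            outer.closeEntry();

            // The outer stream is still usable for further entries.
            outer.putNextEntry(new ZipEntry("readme.txt"));
            outer.write("outer".getBytes(StandardCharsets.UTF_8));
            outer.closeEntry();
        }
        return sink.toByteArray();
    }
}
```

Swapping steps 3 and 4, or calling inner.close() instead of inner.finish(), would be the kind of ordering mistake described above.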

Java ZipEntry and ZipOutputStream directory

I have this little piece of code
public void doBuild() throws IOException {
ZipEntry sourceEntry=new ZipEntry(sourcePath);
ZipEntry assetEntry=new ZipEntry(assetPath);
ZipOutputStream out = new ZipOutputStream(new FileOutputStream("output/"+workOn.getName().replaceAll(".bld"," ")+buildNR+".zip"));
out.putNextEntry(sourceEntry);
out.putNextEntry(assetEntry);
out.close();
System.err.println("Build success!");
increaseBuild();
}
So, if I run it, it runs through fine, creates the .zip and all, but the zip file is empty. sourceEntry and assetEntry are both directories. How could I get those directories into my .zip easily?
For those interested this is a MC mod build system and can be found at https://bitbucket.org/makerimages/makerbuild-system NOTE: the code above is not committed or pushed to there yet!!!!!!!!

Try something like this. The parameter useFullFileNames specifies whether you want to preserve the full names of the paths to the files which you're about to zip.
So if you have two files
/dir1/dir2/a.txt
and
/dir1/b.txt
then useFullFileNames specifies whether you want to see those original paths in the zip file, or just the two files with no paths, like this
a.txt
and
b.txt
in the root of the zip file which you create.
Note that in my example, the files which are zipped are actually read and then written to out. I think you're missing that part.
public static boolean createZip(String fNameZip, boolean useFullFileNames, String... fNames) throws Exception {
try {
int cntBufferSize = 256 * 1024;
BufferedInputStream origin = null;
FileOutputStream dest = new FileOutputStream(fNameZip);
ZipOutputStream out = new ZipOutputStream(new BufferedOutputStream(dest));
byte bBuffer[] = new byte[cntBufferSize];
File ftmp = null;
for (int i = 0; i < fNames.length; i++) {
if (fNames[i] != null) {
FileInputStream fi = new FileInputStream(fNames[i]);
origin = new BufferedInputStream(fi, cntBufferSize);
ftmp = new File(fNames[i]);
ZipEntry entry = new ZipEntry(useFullFileNames ? fNames[i] : ftmp.getName());
out.putNextEntry(entry);
int count;
while ((count = origin.read(bBuffer, 0, cntBufferSize)) != -1) {
out.write(bBuffer, 0, count);
}
origin.close();
}
}
out.close();
return true;
} catch (Exception e) {
return false;
}
}
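Since the question is about whole directories rather than a list of files, here is a hedged sketch of one way to do that with Files.walk. The class and method names (DirectoryZipper, zipDirectories) are made up for this example. It assumes each directory has a name (is not a filesystem root), that no two directories share the same name, and that the zip file is not written inside one of the directories being zipped. Each directory ends up under its own name in the archive, and empty folders are kept as entries ending in "/".

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, named for this example only.
final class DirectoryZipper {

    /** Zips each directory (recursively) under its own name into zipFile. */
    static void zipDirectories(Path zipFile, Path... dirs) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path dir : dirs) {
                addDirectory(out, dir);
            }
        }
    }

    private static void addDirectory(ZipOutputStream out, Path dir) throws IOException {
        Path base = dir.toAbsolutePath().normalize();
        String rootName = base.getFileName().toString();
        try (Stream<Path> paths = Files.walk(base)) {
            Iterator<Path> it = paths.iterator();
            while (it.hasNext()) {
                Path p = it.next();
                // Zip entry names always use '/' regardless of the OS separator.
                String relative = base.relativize(p).toString()
                        .replace(File.separatorChar, '/');
                String name = relative.isEmpty() ? rootName : rootName + "/" + relative;
                if (Files.isDirectory(p)) {
                    // A trailing '/' marks a directory entry; it has no data.
                    out.putNextEntry(new ZipEntry(name + "/"));
                } else {
                    out.putNextEntry(new ZipEntry(name));
                    Files.copy(p, out); // the file contents are actually written
                }
                out.closeEntry();
            }
        }
    }
}
```

With the paths from the question, a call might look like DirectoryZipper.zipDirectories(Paths.get("output/build.zip"), Paths.get(sourcePath), Paths.get(assetPath)), where the output file name is a placeholder.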

FileNotFoundException when trying to unzip an archive with java.util.zip.ZipFile

I have a silly problem i haven't been able to figure out. Can anyone help me?
My Code is as:
String zipname = "C:/1100.zip";
String output = "C:/1100";
BufferedInputStream bis = null;
BufferedOutputStream bos = null;
try {
ZipFile zipFile = new ZipFile(zipname);
Enumeration<?> enumeration = zipFile.entries();
while (enumeration.hasMoreElements()) {
ZipEntry zipEntry = (ZipEntry) enumeration.nextElement();
System.out.println("Unzipping: " + zipEntry.getName());
bis = new BufferedInputStream(zipFile.getInputStream(zipEntry));
int size;
byte[] buffer = new byte[2048];
It doesn't create a folder, but debugging shows all the contents being generated.
To create the folder I used this code:
if(!output.exists()){ output.mkdir();} // here i get an error saying filenotfoundexception
bos = new BufferedOutputStream(new FileOutputStream(new File(outPut)));
while ((size = bis.read(buffer)) != -1) {
bos.write(buffer, 0, size);
}
}
} catch (Exception ex) {
ex.printStackTrace();
} finally {
bos.flush();
bos.close();
bis.close();
}
My zip file contains images: a.jpg b.jpg... and in the same hierarchy, I have abc.xml.
I need to extract the content as is in the zip file.
Any helps here.

There are a few problems with your code: Where is outPut declared? output is not a File but a String, so exists() and mkdir() do not exist on it. Start by declaring output like this:
File output = new File("C:/1100");
Furthermore, outPut (with a capital P) is not declared. It should be something like output + File.separator + zipEntry.getName().
bos = new BufferedOutputStream(new FileOutputStream(output + File.separator + zipEntry.getName()));
Note that you don't need to pass a File to FileOutputStream; as the constructors in the documentation show, it also accepts a String path.
At this point, your code should work if your zip file does not contain any directories. However, when opening the output stream, if zipEntry.getName() has a directory component (for instance somedir/filename.txt), opening the stream will result in a FileNotFoundException, as the parent directory of the file you are trying to create does not exist. If you want to be able to handle such zip files, you will find your answer in: How to unzip files recursively in Java?
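For the directory case, a minimal sketch might look like this. The class name SafeUnzip and its method are invented for the example, and it uses java.nio.file rather than the stream classes from the question. It creates the missing parent directories before writing each file, and as an extra precaution that was not asked for, it rejects entries whose names would resolve outside the target folder.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, named for this example only.
final class SafeUnzip {

    /** Extracts zipFile into targetDir, creating directories as needed. */
    static void unzip(Path zipFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (ZipFile zip = new ZipFile(zipFile.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                Path dest = root.resolve(entry.getName()).normalize();
                // Refuse names such as "../evil.txt" that escape the target folder.
                if (!dest.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(dest);
                    continue;
                }
                // This is the step whose absence causes FileNotFoundException:
                // the parent folder must exist before the file is created.
                Files.createDirectories(dest.getParent());
                try (InputStream in = zip.getInputStream(entry)) {
                    Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

With the paths from the question, the call would be SafeUnzip.unzip(Paths.get("C:/1100.zip"), Paths.get("C:/1100")).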

How to use java.util.zip to archive/deflate string in java for use in Google Earth?

Use Case
I need to package up our kml which is in a String into a kmz response for a network link in Google Earth. I would like to also wrap up icons and such while I'm at it.
Problem
Using the implementation below I receive errors from both WinZip and Google Earth that the archive is corrupted or that the file cannot be opened respectively. The part that deviates from other examples I'd built this from are the lines where the string is added:
ZipEntry kmlZipEntry = new ZipEntry("doc.kml");
out.putNextEntry(kmlZipEntry);
out.write(kml.getBytes("UTF-8"));
Please point me in the right direction to correctly write the string so that it ends up in doc.kml in the resulting kmz file. I know how to write the string to a temporary file, but I would very much like to keep the operation in memory for understandability and efficiency.
private static final int BUFFER = 2048;
private static void kmz(OutputStream os, String kml)
{
try{
BufferedInputStream origin = null;
ZipOutputStream out = new ZipOutputStream(os);
out.setMethod(ZipOutputStream.DEFLATED);
byte data[] = new byte[BUFFER];
File f = new File("./icons"); //folder containing icons and such
String files[] = f.list();
if(files != null)
{
for (String file: files) {
LOGGER.info("Adding to KMZ: "+ file);
FileInputStream fi = new FileInputStream(file);
origin = new BufferedInputStream(fi, BUFFER);
ZipEntry entry = new ZipEntry(file);
out.putNextEntry(entry);
int count;
while((count = origin.read(data, 0, BUFFER)) != -1) {
out.write(data, 0, count);
}
origin.close();
}
}
ZipEntry kmlZipEntry = new ZipEntry("doc.kml");
out.putNextEntry(kmlZipEntry);
out.write(kml.getBytes("UTF-8"));
}
catch(Exception e)
{
LOGGER.error("Problem creating kmz file", e);
}
}
Bonus points for showing me how to put the supplementary files from the icons folder into a similar folder within the archive as opposed to at the same layer as the doc.kml.
Update Even when saving the string to a temp file the errors occur. Ugh.
Use Case Note The use case is for use in a web app, but the code to get the list of files won't work there. For details see how-to-access-local-files-on-server-in-jboss-application

You forgot to call close() on the ZipOutputStream. The best place to call it is the finally block of the try block where it was created.
Update: To create a folder, just prepend its name to the entry name.
ZipEntry entry = new ZipEntry("icons/" + file);
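Both points can be combined in a short in-memory sketch. The class name KmzWriter is made up, and passing the icons as a Map of file name to bytes is an assumption made so the example does not depend on how the files are located on the server. try-with-resources plays the role of the finally block: it closes the ZipOutputStream (and the wrapped OutputStream), which writes the central directory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, named for this example only.
final class KmzWriter {

    /**
     * Writes doc.kml plus the given icons (file name -> bytes) as a KMZ archive.
     * Closes os when done.
     */
    static void writeKmz(OutputStream os, String kml, Map<String, byte[]> icons)
            throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(os)) {
            out.putNextEntry(new ZipEntry("doc.kml"));
            out.write(kml.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();

            for (Map.Entry<String, byte[]> icon : icons.entrySet()) {
                // Prefixing the name places the entry in an "icons" folder.
                out.putNextEntry(new ZipEntry("icons/" + icon.getKey()));
                out.write(icon.getValue());
                out.closeEntry();
            }
        } // close() here writes the central directory; without it the archive is corrupt
    }
}
```

If the caller needs the underlying stream to stay open, calling out.finish() instead of closing is an alternative.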
