I get some very odd errors when using org.apache.commons.compress to read embedded archive files and I suspect it's my inexperience that is haunting me.
When running my code I get a variety of truncated zip file errors (along with other truncated file errors). I suspect the cause is my use of ArchiveInputStream:
private final void handleArchive(String fileName, ArchiveInputStream ais) {
ArchiveEntry archiveEntry = null;
try {
while((archiveEntry = ais.getNextEntry()) != null) {
byte[] buffer = new byte[1024];
while(ais.read(buffer) != -1) {
handleFile(fileName + "/" + archiveEntry.getName(), archiveEntry.getSize(), new ByteArrayInputStream(buffer));
}
}
} catch(IOException ioe) {
ioe.printStackTrace();
}
}
When I do this archiveEntry = ais.getNextEntry() does this effectively close my ais, and is there any way to read the bytes of embedded archive files using commons compress?
You seem to be doing something odd. For each archive entry, while you are still reading the archive, you recursively call your archive-reading method, which opens the next archive while the parent code is still handling the previous one.
You should read an archive entry completely before handling any new entry in your compressed file. Something like:
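ArArchiveEntry entry = (ArArchiveEntry) arInput.getNextEntry();
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
// Keep reading until the whole entry has been consumed
while (offset < content.length) {
int read = arInput.read(content, offset, content.length - offset);
if (read == -1) break; // stream ended before the declared size was reached
offset += read;
}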
as stated in the examples on the Apache Commons Compress site.
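The advice above can be sketched with only the JDK. This is a minimal illustration, not the Commons Compress API: it uses java.util.zip.ZipInputStream as a stand-in for ArchiveInputStream, and the class EmbeddedArchiveSketch and its methods are hypothetical names. Each entry is read to the end into memory first, and only then is an embedded archive opened from that private copy, so the parent stream is never touched while a child archive is being read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class EmbeddedArchiveSketch {

    /** Returns the path of every entry, descending into embedded .zip entries. */
    static List<String> listPaths(String archiveName, InputStream archiveBytes) throws IOException {
        List<String> paths = new ArrayList<>();
        walk(archiveName, archiveBytes, paths);
        return paths;
    }

    private static void walk(String archiveName, InputStream archiveBytes, List<String> paths)
            throws IOException {
        ZipInputStream in = new ZipInputStream(archiveBytes);
        ZipEntry entry;
        while ((entry = in.getNextEntry()) != null) {
            String path = archiveName + "/" + entry.getName();
            // Consume the whole entry before anything else uses the stream.
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int length;
            while ((length = in.read(buffer)) != -1) {
                content.write(buffer, 0, length); // write only the bytes actually read
            }
            paths.add(path);
            if (entry.getName().endsWith(".zip")) {
                // Recurse on the buffered copy, not on the parent stream.
                walk(path, new ByteArrayInputStream(content.toByteArray()), paths);
            }
        }
    }
}
```

Buffering every entry costs memory, so this only suits archives whose entries fit comfortably in the heap.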
Related
I have the following snippet, which checks whether a given zip entry is a directory; that part works fine. There is an additional requirement: I need to check the size of each image inside the folder within the zip file (an image must not be larger than 10 MB).
I have been going through some articles but couldn't find a scenario similar to mine.
An example structure for the zip file, with the folder and the size of each image inside it, is shown below:
XYZ.zip>023423>Bat1.jpg ->11MB
XYZ.zip>023423>Bat2.jpg ->5MB
XYZ.zip>023423>Bat3.jpg ->11MB
XYZ.zip>023423>Bat4.jpg ->10MB
Based on the scenario above, at the end of the execution I should get Bat1 and Bat3 as output, because their size is > 10 MB. Kindly advise.
private void isGivenZipInFolderStructure(ExcelImportCronJobModel
cronJob) {
try {
foldersInZip = new ArrayList<>();
if(cronJob.getReferencedContent() !=null) {
final ZipInputStream zip = new ZipInputStream(this.mediaService.getStreamFromMedia(cronJob.getReferencedContent()));
ZipEntry entry = null;
while ((entry = zip.getNextEntry()) != null) {
LOG.info("Size of the entry {}",entry.getSize());
if(entry.isDirectory()) {
foldersInZip.add(entry.getName().split(BunningsCoreConstants.FORWARD_SLASH)[0]);
}
}
}
} catch (IOException e) {
LOG.error("Error reading zip", e);
}
}
As mentioned in the comments, getSize() is not reliably set when reading from a ZipInputStream (it can return -1 until the entry has been read), unlike when using ZipFile. However, you could scan the stream contents yourself and track the size of each entry.
This method scans any ZIP passed in as InputStream which could be derived from a file or other downloaded source:
public static void scan(InputStream is) throws IOException {
System.out.println("==== scanning "+is);
ZipEntry ze;
// Uses ByteArrayOutputStream to determine the size of the entry
ByteArrayOutputStream bout = new ByteArrayOutputStream();
long maxSize = 10_000_000L;
try (ZipInputStream zis = new ZipInputStream(is)) {
while ((ze = zis.getNextEntry()) != null) {
bout.reset();
long size = zis.transferTo(bout);
System.out.println(ze.getName()
+(ze.isDirectory() ? " DIR" : " FILE["+size+"]")
+(size > maxSize ? " *** TOO BIG ***":"")
);
if (size > maxSize) {
// entry which is too big - do something / warning ...
} // else use content: byte[] content = bout.toByteArray();
}
}
}
This approach is not ideal for very large ZIP content, but it may be worth trying for your specific case - better to have a slow solution than none at all.
If there are really big entries in the ZIP you might also consider replacing the line long size = zis.transferTo(bout); with a call to your own method which does not transfer content but still returns the size - similar to implementation of InputStream.transferTo but commenting out the write().
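A minimal sketch of such a size-only method, assuming plain JDK streams. The class name StreamSizes and the method name countBytes are hypothetical; the loop mirrors what a transfer would do but discards the data instead of writing it anywhere.

```java
import java.io.IOException;
import java.io.InputStream;

final class StreamSizes {

    /** Reads the stream to its end and returns the number of bytes seen, storing none of them. */
    static long countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read; // count only; no write() call
        }
        return total;
    }
}
```

In the scan method above, long size = StreamSizes.countBytes(zis); could then stand in for the transferTo call, since a ZipInputStream reports end of stream at the end of the current entry.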
PROBLEM SOLVED IN EDIT 3
I've been struggling with this problem for some time. All of the questions here on SO or elsewhere on the internet seem to work only on 'shallow' structures with one zip inside another. However, I have a zip archive whose structure is more or less like this:
input.zip/
--1.zip/
--folder/
----2.zip/
------3.zip/
--------test/
----------some-other-folder/
----------archive.gz/
------------filte-to-parse
----------file-to-parse3.txt
------file-to-parse.txt
--4.zip/
------folder/
and so on. My code needs to handle N levels of zips while preserving the original structure of zips, gzips, folders and files. Using temporary files is forbidden because of a lack of privileges (this is something I'm not willing to change).
This is the code I have written so far. However, ZipOutputStream seems to operate only on one (top) level: when directories contain files/dirs with exactly the same name, it throws Exception in thread "main" java.util.zip.ZipException: duplicate entry: folder/. It also skips empty directories (which is not expected). What I want to achieve is to somehow move my ZipOutputStream to a 'lower' level and operate on each of the zips. Maybe there's a better approach to the whole problem; any help would be appreciated. I need to perform certain text extraction/modification later, but I'm not starting on that until reading/writing the whole structure works properly. Thanks in advance for any help!
//constructor
private final File zipFile;
ArchiveResolver(String fileToHandle) {
this.zipFile = new File(Objects.requireNonNull(getClass().getClassLoader().getResource(fileToHandle)).getFile());
}
void resolveInputFile() throws Exception {
FileInputStream fileInputStream = new FileInputStream(this.zipFile);
FileOutputStream fileOutputStream = new FileOutputStream("out.zip");
ZipOutputStream zipOutputStream = new ZipOutputStream(fileOutputStream);
ZipInputStream zipInputStream = new ZipInputStream(fileInputStream);
zip(zipInputStream, zipOutputStream);
zipInputStream.close();
zipOutputStream.close();
}
// this one doesn't preserve internal structure(empty folders), but can work on each file
private void zip(ZipInputStream zipInputStream, ZipOutputStream zipOutputStream) throws IOException {
ZipEntry entry;
while ((entry = zipInputStream.getNextEntry()) != null) {
System.out.println(entry.getName());
byte[] buffer = new byte[1024];
int length;
if (entry.getName().endsWith(".zip")) {
// wrapping outer zip streams to inner streams making actual entries a new source
ZipInputStream innerZipInputStream = new ZipInputStream(zipInputStream);
ZipOutputStream innerZipOutputStream = new ZipOutputStream(zipOutputStream);
ZipEntry zipEntry = new ZipEntry(entry.getName());
// add new zip entry here to outer zipOutputStream: i.e. data.zip
zipOutputStream.putNextEntry(zipEntry);
// now treat this data.zip as parent and call recursively zipFolder on it
zip(innerZipInputStream, innerZipOutputStream);
// Finish internal stream work when innerZipOutput is done
innerZipOutputStream.finish();
// Close entry
zipOutputStream.closeEntry();
} else if (entry.isDirectory()) {
// putting new zip entry into output stream and adding extra '/' to make
// sure zipOutputStream will treat it as folder
ZipEntry zipEntry = new ZipEntry(entry.getName() + "/");
// this only should preserve internal structure
zipOutputStream.putNextEntry(zipEntry);
// reading everything from zipInputStream
while ((length = zipInputStream.read(buffer)) > 0) {
// sending it straight to zipOutputStream
zipOutputStream.write(buffer, 0, length);
}
zipOutputStream.closeEntry();
// This else will include checking if file is respectively:
// .gz file <- then open it, read from file inside, modify and save it
// .txt file <- also read, modify and preserve
} else {
// create new entry on top of this
ZipEntry zipEntry = new ZipEntry(entry.getName());
zipOutputStream.putNextEntry(zipEntry);
while ((length = zipInputStream.read(buffer)) > 0) {
zipOutputStream.write(buffer, 0, length);
}
zipOutputStream.closeEntry();
}
}
}
// This one preserves internal structure (empty folders and so)
// BUT! no work on each file is possible it just preserves everything as it is
private void zipWhole(ZipInputStream zipInputStream, ZipOutputStream zipOutputStream) throws IOException {
ZipEntry entry;
while ((entry = zipInputStream.getNextEntry()) != null) {
System.out.println(entry.getName());
byte[] buffer = new byte[1024];
int length;
zipOutputStream.putNextEntry(new ZipEntry(entry.getName()));
while ((length = zipInputStream.read(buffer)) > 0) {
zipOutputStream.write(buffer, 0, length);
}
zipOutputStream.closeEntry();
}
}
EDIT:
I've updated my code to the newest version. It's still nothing to be proud of, and despite some changes it's still not working. I've added two very important comments about the code that (in my opinion) fails. I've tested two approaches. The first gets a ZipInputStream from zipFile by using getInputStream(ZipEntry e); it throws Exception in thread "main" java.util.zip.ZipException: no current ZIP entry when I try to put entries into the ZipOutputStream. The second approach "wraps" one ZipInputStream in another; this results in empty ZipInputStreams with no entries, and the application just goes through the files, lists them (only the top level of zips...) and finishes without saving anything into the out.zip file.
EDIT 2:
With a few suggestions from the people in the comments, I've decided to rewrite my code, focusing on calling close, finish and closeEntry in the appropriate places (I hope I did it better now). So far I've achieved something: the code iterates through every entry and saves it into the out.zip file with proper zip packaging inside. It still skips empty folders though, and I'm not sure why (I've checked some of the questions on Stack Overflow and the web, and it seems OK). Anyway, thanks for the help so far; I'll try to work this out and keep this updated.
EDIT 3:
After a few approaches to the problem and some reading and refactoring, I've managed to solve it (however, there's still a problem when running this code on Linux: empty directories are skipped, which seems to be connected to the way certain OSes preserve file information?).
Here's working solution:
void resolveInputFile() throws IOException {
FileInputStream fileInputStream = new FileInputStream(this.zipFile);
FileOutputStream fileOutputStream = new FileOutputStream("in.zip");
ZipOutputStream zipOutputStream = new ZipOutputStream(fileOutputStream);
ZipInputStream zipInputStream = new ZipInputStream(fileInputStream);
zip(zipInputStream, zipOutputStream);
zipInputStream.close();
zipOutputStream.close();
}
private void zip(ZipInputStream zipInputStream, ZipOutputStream zipOutputStream) throws IOException {
ZipEntry entry;
while ((entry = zipInputStream.getNextEntry()) != null) {
logger.info(entry.getName());
if (entry.getName().endsWith(".zip")) {
// If entry is zip, I create inner zip streams that wrap outer ones
ZipInputStream innerZipInputStream = new ZipInputStream(zipInputStream);
ZipOutputStream innerZipOutputStream = new ZipOutputStream(zipOutputStream);
ZipEntry zipEntry = new ZipEntry(entry.getName());
zipOutputStream.putNextEntry(zipEntry);
zip(innerZipInputStream, innerZipOutputStream);
//As mentioned in comments, proper streams needs to be properly closed/finished, I'm done writing to inner stream so I call finish() rather than close() which closes outer stream
innerZipOutputStream.finish();
zipOutputStream.closeEntry();
} else if (entry.getName().endsWith(".gz")) {
GZIPInputStream gzipInputStream = new GZIPInputStream(zipInputStream);
//small trap while using GZIP - to save it properly I needed to put new ZipEntry to outerZipOutputStream BEFORE creating GZIPOutputStream wrapper
ZipEntry zipEntry = new ZipEntry(entry.getName());
zipOutputStream.putNextEntry(zipEntry);
GZIPOutputStream gzipOutputStream = new GZIPOutputStream(zipOutputStream);
//To make it as efficient as possible I've used BufferedReader
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gzipInputStream));
long start = System.nanoTime();
logger.info("Started to process {}", zipEntry.getName());
String line;
while ((line = bufferedReader.readLine()) != null) {
//PROCESSING LINE BY LINE...
// Write through the GZIP wrapper so the entry really contains gzip-compressed data
gzipOutputStream.write((line + "\n").getBytes());
}
logger.info("Processing of {} took {} milliseconds", entry.getName() ,(System.nanoTime() - start) / 1_000_000);
gzipOutputStream.finish();
zipOutputStream.closeEntry();
} else if (entry.getName().endsWith(".txt")) {
ZipEntry zipEntry = new ZipEntry(entry.getName());
zipOutputStream.putNextEntry(zipEntry);
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(zipInputStream));
long start = System.nanoTime();
logger.info("Started to process {}", zipEntry.getName());
String line;
while ((line = bufferedReader.readLine()) != null) {
//PROCESSING LINE BY LINE...
zipOutputStream.write((line + "\n").getBytes());
}
logger.info("Processing of {} took {} milliseconds", entry.getName() ,(System.nanoTime() - start) / 1_000_000);
zipOutputStream.closeEntry();
} else if (entry.isDirectory()) {
//Standard directory preserving
byte[] buffer = new byte[8192];
int length;
// Adding extra "/" to make sure it's dir
ZipEntry zipEntry = new ZipEntry(entry.getName() + "/");
zipOutputStream.putNextEntry(zipEntry);
while ((length = zipInputStream.read(buffer)) > 0) {
// sending it straight to zipOutputStream
zipOutputStream.write(buffer, 0, length);
}
zipOutputStream.closeEntry();
} else {
//In my case it probably will never be called but if there's some different file in here it will be preserved unchanged in the output file
byte[] buffer = new byte[8192];
int length;
ZipEntry zipEntry = new ZipEntry(entry.getName());
zipOutputStream.putNextEntry(zipEntry);
while ((length = zipInputStream.read(buffer)) > 0) {
zipOutputStream.write(buffer, 0, length);
}
zipOutputStream.closeEntry();
}
}
}
Thanks again for all the help and good advice.
There seems to be a lot of debugging and refactoring to be done there.
There's an obvious problem that you are either not closing your streams/entries or doing so in the wrong order. Buffered data will get lost and the central directory not written. (There is a complication that Java streams unhelpfully close the stream they wrap, so there is finish vs close but it still needs to be done in the correct order).
Zip files have a flat structure: there is no directory tree, and the entire file path is included for each entry in both the local header and the central directory. A directory only shows up, if at all, as an entry whose name ends in "/".
The part of the Java zip library that gives a random-access interface (ZipFile) uses memory-mapped files, so you are stuck with streams for everything except, perhaps, the top level.
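The ordering point about finish versus close can be illustrated with a small in-memory sketch. It assumes only java.util.zip; the class NestedZipSketch and the entry names are hypothetical. The inner ZipOutputStream wraps the outer one, so it is ended with finish(), which writes the inner central directory without closing the outer stream; only afterwards is the outer entry closed, and the outer stream is closed last.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class NestedZipSketch {

    /** Builds a zip in memory that contains a directory entry and a nested zip. */
    static byte[] buildNested() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream outer = new ZipOutputStream(bytes)) {
            // A directory is just an entry whose name ends with "/".
            outer.putNextEntry(new ZipEntry("empty-dir/"));
            outer.closeEntry();

            // Open the outer entry first, then wrap the outer stream.
            outer.putNextEntry(new ZipEntry("inner.zip"));
            ZipOutputStream inner = new ZipOutputStream(outer);
            inner.putNextEntry(new ZipEntry("hello.txt"));
            inner.write("hello".getBytes(StandardCharsets.UTF_8));
            inner.closeEntry();
            inner.finish();     // writes the inner central directory, leaves outer open
            outer.closeEntry(); // only now is the "inner.zip" entry complete
        } // try-with-resources closes outer, which writes the outer central directory
        return bytes.toByteArray();
    }
}
```

Calling inner.close() instead of inner.finish() would also close the outer stream, and any later putNextEntry on it would fail.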
I am trying to zip directories from one area (sdCard/someFolder) into a second directory (sdCard/Download) until the .zip file reaches 5 MB. Then I want to create a new .zip file, fill that one up to 5 MB, and so on.
Currently, my code compresses files into .zip files successfully, but one of the .zip files always ends up corrupt. I see this when my for loop finishes the first File[] of 22 objects and starts the next directory with a File[] of 4 objects. I believe I am missing some cleanup of old OutputStreams. The out.putNextEntry() becomes null after the second attempt of the for loop. Any help would be appreciated.
private static void addDirToArchive(ZipOutputStream out, FileOutputStream destinationDir, File sdCardMNDLogs)
{
File[] listOfFiles = sdCardMNDLogs.listFiles();
BufferedInputStream origin = null;
Log.i(TAG3, "Reading directory: " + sdCardMNDLogs.getName());
try{
byte[] buffer = new byte[BUFFER];
for(int i = 0; i < listOfFiles.length; i++)
{
if(listOfFiles[i].isDirectory())
{
addDirToArchive(out, destinationDir, listOfFiles[i]);
continue;
}
try
{
FileInputStream fis = new FileInputStream(listOfFiles[i]);
origin = new BufferedInputStream(fis,BUFFER);
ZipEntry ze = new ZipEntry(listOfFiles[i].getName());
if(currentZipFileSize >= EMAIL_SIZE)
{
out.close();
Log.d(emailTAG, "Creating new zipfile: /Download/MND/nwdLogs_" + i);
out = new ZipOutputStream(new FileOutputStream(new File(sdCard.getAbsolutePath() + "/Download/MND/nwdLogs_ " + i + ".zip")));
currentZipFileSize = 0;
}
out.putNextEntry(ze);
int length;
Log.i(TAG3, "Adding file: " + listOfFiles[i].getName());
while((length = origin.read(buffer, 0, BUFFER)) != -1)
{
out.write(buffer, 0, length);
}
out.closeEntry();
origin.close();
currentZipFileSize = currentZipFileSize + ze.getCompressedSize();
}
catch(IOException ioe)
{
Log.e(TAG3, "IOException: " + ioe);
}
}
}
finally
{
try {
out.close();
} catch (IOException e)
{
e.printStackTrace();
}
}
}
FileOutputStream destinationDir = new FileOutputStream(sdCard.getAbsolutePath() + "/Download/Dir/nwdLogs.zip");
ZipOutputStream out = new ZipOutputStream(destinationDir);
currentZipFileSize = 0;
addDirToArchive(out, destinationDir, dirName);
out.close();
destinationDir.close();
I suspect that the problem is that you aren't calling out.close() before opening the next ZIP file. My understanding is that ZIP's index is only written when the ZIP is closed, so if you neglect to close the index will be missing: hence corruption.
Also, note that you don't need to close both fis and origin. Just close origin ... and it closes fis.
UPDATE - While you have fixed the original close bug, there are more:
You have added a finally block to close out. That is wrong. You don't want addDirToArchive to close out. That's the likely cause of your exceptions.
There are a couple of problems that happen after you have done this:
if (currentZipFileSize >= EMAIL_SIZE)
{
out.close();
out = new ZipOutputStream(new FileOutputStream(...));
currentZipFileSize = 0;
}
Since out is a local parameter, the caller does not see the change you make. Therefore:
when you call out.close() in the caller, you could be closing the original ZIP (already closed) ... not the current one;
if you called addDirToArchive(out, destinationDir, dirName) multiple times, in subsequent calls you could be passing a closed ZIP file.
Your exception handling is misguided (IMO). If there is an I/O error writing a file to the ZIP, you do NOT want to log a message and keep going. You want to bail out: either crash the app entirely, or stop doing what you are doing. In this case, your "stream is closed" exception is clearly caused by a bug in your code, and your exception handling is effectively telling the app to ignore it.
Some advice:
If you are splitting the responsibility for opening and closing resources across multiple methods you need to be VERY careful about what code has responsibility for closing what. You need to understand what you are doing.
Blindly applying (so called) "solutions" (like the finally stuff) ... 'cos someone says "XXX is best practice" or "always do XXX"... is going to get you into trouble. You need to 1) understand what the "solution" does, and 2) think about whether the solution actually does what you need.
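One way to make the ownership explicit is to let a single object hold the current ZipOutputStream, so no caller ever keeps a stale reference. The following is a rough sketch under stated assumptions, not the asker's code: RollingZipWriter is a hypothetical class, in-memory ByteArrayOutputStream targets stand in for the files under /Download, and the size check relies on ZipEntry.getCompressedSize() being filled in once closeEntry() has run. I/O errors propagate to the caller instead of being logged and ignored.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class RollingZipWriter implements AutoCloseable {
    private final long maxBytes;
    private final List<ByteArrayOutputStream> targets = new ArrayList<>();
    private ZipOutputStream current; // owned by this object only
    private long currentSize;

    RollingZipWriter(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Adds one entry, starting a new archive first if the current one is full. */
    void add(String name, byte[] data) throws IOException {
        if (current == null || currentSize >= maxBytes) {
            roll();
        }
        ZipEntry entry = new ZipEntry(name);
        current.putNextEntry(entry);
        current.write(data);
        current.closeEntry();
        currentSize += entry.getCompressedSize(); // known once the entry is closed
    }

    /** Closes the current archive (writing its index) and opens the next one. */
    private void roll() throws IOException {
        if (current != null) {
            current.close();
        }
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        targets.add(target);
        current = new ZipOutputStream(target);
        currentSize = 0;
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }

    /** Returns the bytes of every archive written so far. */
    List<byte[]> archives() {
        List<byte[]> result = new ArrayList<>();
        for (ByteArrayOutputStream target : targets) {
            result.add(target.toByteArray());
        }
        return result;
    }
}
```

A recursive directory walk would then call add(...) for each file and close the writer exactly once, in the code that created it.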
I have created an application that should extract single files from a tar archive. The application reads the *.tar properly, but when I try to extract files, it just creates new files with the correct filenames... The files are empty (0 KB). So I am probably just creating new files instead of extracting them...
I'm a total beginner at this...
for(TarArchiveEntry tae : tarEntries){
System.out.println(tarEntries.size());
try {
fOutput = new FileOutputStream(new File(tae.getFile(), tae.getName()));
byte[] buf = new byte[(int) tae.getSize()];
int len;
while ((len = tarFile.read(buf)) > 0) {
fOutput.write(buf, 0, len);
}
fOutput.close();
} catch (IOException e) {
e.printStackTrace();
}
}
Assuming tarFile is a TarArchiveInputStream you can only read an entry's content right after calling tarFile.getNextTarEntry().
The stream is processed sequentially, so when you invoke getNextTarEntry you skip over the content of the current entry right to the next entry. It looks as if you had read the whole archive in order to fill tarEntries in which case you've already read past the last entry and the stream is exhausted.
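The same sequential rule can be shown with the JDK alone. This sketch deliberately swaps in java.util.zip.ZipInputStream for TarArchiveInputStream, since both hand out entries one after another from a single stream; ExtractSketch and extract are hypothetical names. Each entry is copied to disk inside the loop, before the next call to getNextEntry() moves the stream on.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class ExtractSketch {

    /** Extracts every entry below targetDir and returns the number of files written. */
    static int extract(InputStream archive, Path targetDir) throws IOException {
        int files = 0;
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream in = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) {
                    // Reject names such as "../x" that would land outside targetDir.
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                    continue;
                }
                Files.createDirectories(target.getParent());
                // Copy now: the stream only exposes this entry's content
                // until the next getNextEntry() call.
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                files++;
            }
        }
        return files;
    }
}
```

Collecting the entries into a list first and reading the content afterwards, as in the question, leaves nothing to read because the stream has already reached its end.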
Use Case
I need to package up our kml which is in a String into a kmz response for a network link in Google Earth. I would like to also wrap up icons and such while I'm at it.
Problem
Using the implementation below I receive errors from both WinZip and Google Earth that the archive is corrupted or that the file cannot be opened respectively. The part that deviates from other examples I'd built this from are the lines where the string is added:
ZipEntry kmlZipEntry = new ZipEntry("doc.kml");
out.putNextEntry(kmlZipEntry);
out.write(kml.getBytes("UTF-8"));
Please point me in the right direction to correctly write the string so that it is in doc.kml in the resulting kmz file. I know how to write the string to a temporary file, but I would very much like to keep the operation in memory for understandability and efficiency.
private static final int BUFFER = 2048;
private static void kmz(OutputStream os, String kml)
{
try{
BufferedInputStream origin = null;
ZipOutputStream out = new ZipOutputStream(os);
out.setMethod(ZipOutputStream.DEFLATED);
byte data[] = new byte[BUFFER];
File f = new File("./icons"); //folder containing icons and such
String files[] = f.list();
if(files != null)
{
for (String file: files) {
LOGGER.info("Adding to KMZ: "+ file);
FileInputStream fi = new FileInputStream(file);
origin = new BufferedInputStream(fi, BUFFER);
ZipEntry entry = new ZipEntry(file);
out.putNextEntry(entry);
int count;
while((count = origin.read(data, 0, BUFFER)) != -1) {
out.write(data, 0, count);
}
origin.close();
}
}
ZipEntry kmlZipEntry = new ZipEntry("doc.kml");
out.putNextEntry(kmlZipEntry);
out.write(kml.getBytes("UTF-8"));
}
catch(Exception e)
{
LOGGER.error("Problem creating kmz file", e);
}
}
Bonus points for showing me how to put the supplementary files from the icons folder into a similar folder within the archive as opposed to at the same layer as the doc.kml.
Update Even when saving the string to a temp file the errors occur. Ugh.
Use Case Note The use case is for use in a web app, but the code to get the list of files won't work there. For details see how-to-access-local-files-on-server-in-jboss-application
You forgot to call close() on ZipOutputStream. Best place to call it is the finally block of the try block where it's been created.
Update: To create a folder, just prepend its name in the entry name.
ZipEntry entry = new ZipEntry("icons/" + file);
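Putting both points together, a minimal in-memory sketch might look like the following. It is an illustration under assumptions rather than a drop-in replacement: KmzSketch and writeKmz are hypothetical names, and the icons parameter (file name mapped to file bytes) stands in for reading the ./icons folder. try-with-resources plays the role of the finally block and guarantees that close() runs, which is what writes the central directory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class KmzSketch {

    /** Writes doc.kml plus the given icons (name to bytes) as a KMZ archive. */
    static void writeKmz(OutputStream os, String kml, Map<String, byte[]> icons)
            throws IOException {
        // Closing the ZipOutputStream writes the central directory; without it
        // the archive is reported as corrupt.
        try (ZipOutputStream out = new ZipOutputStream(os)) {
            out.putNextEntry(new ZipEntry("doc.kml"));
            out.write(kml.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();

            for (Map.Entry<String, byte[]> icon : icons.entrySet()) {
                // The "icons/" prefix is all it takes to place the file in a folder.
                out.putNextEntry(new ZipEntry("icons/" + icon.getKey()));
                out.write(icon.getValue());
                out.closeEntry();
            }
        }
    }
}
```

Note that closing the ZipOutputStream also closes the wrapped OutputStream; if the underlying stream has to stay open, calling finish() instead of close() ends the archive without closing it.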