java gzip can't keep original file's extension name


I'm using GZIPOutputStream to gzip an XML file into a .gz file, but after compressing I find that the extension of the XML file (.xml) is missing from the file inside the .gz. I need to keep the extension because the .gz file will be used by a third-party system, which expects to get a .xml file after unzipping it. Is there any solution for this? My test code is:
import java.io.*;
import java.util.zip.GZIPOutputStream;

public class GzipTest {
    public static void main(String[] args) {
        compress("D://test.xml", "D://test.gz");
    }

    private static boolean compress(String inputFileName, String targetFileName) {
        final int BUFFER = 1024 * 4;
        byte[] bArray = new byte[BUFFER];
        // try-with-resources closes both streams (output first) even if an exception occurs;
        // closing the GZIPOutputStream writes the gzip trailer and closes the FileOutputStream
        try (FileInputStream fins = new FileInputStream(new File(inputFileName));
             GZIPOutputStream zout = new GZIPOutputStream(new FileOutputStream(new File(targetFileName)))) {
            int number;
            while ((number = fins.read(bArray, 0, BUFFER)) != -1) {
                zout.write(bArray, 0, number);
            }
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }
}

Maybe I'm missing something, but when I've gzipped files in the past, say test.xml, the output I get is test.xml.gz. If you change the output filename to test.xml.gz, you should still preserve your original file extension.

Not sure what the problem is here; you are calling your own compress function
private static boolean compress(String inputFileName, String targetFileName)
with the following arguments
compress("D://test.xml", "D://test.gz");
Quite obviously you are going to lose the .xml portion of the filename, because you never include it in the target filename you pass to your method.

Your code is fine. Give the output file name as "D://test.xml.gz"; you left out the file extension (.xml).
Ex: compress("D://test.xml", "D://test.xml.gz");
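As a sketch of that fix (the class and method names GzipKeepName, gzipKeepingName and gunzip are made up for illustration), the .gz name can be derived from the source name so the extension is never lost, and decompression just strips the .gz suffix, which is what gunzip does by default. It assumes Java 7+ for java.nio.file:

```java
import java.io.*;
import java.nio.file.*;
import java.util.zip.*;

class GzipKeepName {
    // Gzips `source` into a sibling file named "<original name>.gz",
    // e.g. test.xml -> test.xml.gz, so the .xml extension survives.
    static Path gzipKeepingName(Path source) throws IOException {
        Path target = source.resolveSibling(source.getFileName() + ".gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            Files.copy(source, out); // stream the file's bytes into the compressor
        }
        return target;
    }

    // Reverse operation: decompresses "x.gz" into "x" in the same directory.
    static Path gunzip(Path gz) throws IOException {
        String name = gz.getFileName().toString();
        if (!name.endsWith(".gz")) {
            throw new IllegalArgumentException("not a .gz file: " + name);
        }
        Path target = gz.resolveSibling(name.substring(0, name.length() - 3));
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gz))) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

With the question's paths, GzipKeepName.gzipKeepingName(Paths.get("D://test.xml")) would produce D://test.xml.gz.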

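You can also put the file into an archive first (for example a tar archive written with TarArchiveOutputStream, a subclass of ArchiveOutputStream in Apache Commons Compress) and then gzip that. The archive entry keeps the file name, but the receiving system then has to untar as well.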

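Use ZipOutputStream with a ZipEntry instead of GZIPOutputStream, so that the original file name, including its extension, is kept. Note that this produces a .zip archive rather than a .gz file, so it only helps if the third-party system accepts zip.
Sample code:
try (ZipOutputStream zipOutStream = new ZipOutputStream(new FileOutputStream(zipFile));
     FileInputStream inStream = new FileInputStream(file)) { // stream to read the file
    ZipEntry entry = new ZipEntry(file.getName()); // entry name keeps the .xml extension
    zipOutStream.putNextEntry(entry); // start the entry
    inStream.transferTo(zipOutStream); // copy the file's bytes (Java 9+)
    zipOutStream.closeEntry(); // finish the entry; closing the stream writes the zip directory
}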

I created a copy of GZIPOutputStream and changed its header code so that it stores a different file name "in the gzip":
private final byte[] header = {
    (byte) GZIP_MAGIC,        // Magic number (short)
    (byte)(GZIP_MAGIC >> 8),  // Magic number (short)
    Deflater.DEFLATED,        // Compression method (CM)
    8,                        // Flags (FLG): 0x08 = FNAME, a file name follows the header
    0,                        // Modification time MTIME (int)
    0,                        // Modification time MTIME (int)
    0,                        // Modification time MTIME (int)
    0,                        // Modification time MTIME (int)
    0,                        // Extra flags (XFLG)
    0                         // Operating system (OS)
};
private void writeHeader() throws IOException {
    out.write(header);
    // FNAME: the stored file name, ISO 8859-1 encoded and zero-terminated
    out.write("myinternalfilename".getBytes(StandardCharsets.ISO_8859_1));
    out.write(new byte[] {0});
}
Info about gzip format: http://www.gzip.org/zlib/rfc-gzip.html#specification
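Since writeHeader() is private in GZIPOutputStream, an alternative to copying the class is to assemble the gzip member by hand from java.util.zip building blocks, following RFC 1952. The sketch below (hypothetical class GzipWithName; the OS byte 255, meaning "unknown", is my choice, while the answer above uses 0) writes the FNAME field, the raw-deflated data and the CRC-32/ISIZE trailer:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.*;

class GzipWithName {
    // Builds one gzip member whose header carries an original file name (FNAME).
    static byte[] gzipWithName(byte[] data, String fileName) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // ID1, ID2, CM=8 (deflate), FLG=0x08 (FNAME), MTIME=0 (4 bytes), XFL=0, OS=255 (unknown)
        out.write(new byte[] {0x1f, (byte) 0x8b, 8, 0x08, 0, 0, 0, 0, 0, (byte) 0xff});
        // FNAME: ISO 8859-1 bytes terminated by a zero byte
        out.write(fileName.getBytes(StandardCharsets.ISO_8859_1));
        out.write(0);
        // Compressed blocks: raw deflate (nowrap = true, so no zlib header or checksum)
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        DeflaterOutputStream deflate = new DeflaterOutputStream(out, deflater);
        deflate.write(data);
        deflate.finish(); // flush all compressed data without closing `out`
        deflater.end();
        // Trailer: CRC-32 of the uncompressed data, then ISIZE (length mod 2^32), little-endian
        CRC32 crc = new CRC32();
        crc.update(data);
        writeIntLE(out, (int) crc.getValue());
        writeIntLE(out, data.length);
        return out.toByteArray();
    }

    private static void writeIntLE(OutputStream out, int v) throws IOException {
        for (int shift = 0; shift < 32; shift += 8) {
            out.write((v >>> shift) & 0xff);
        }
    }
}
```

GZIPInputStream reads such a file normally but skips the stored name, and command-line gunzip only uses it when asked (gunzip -N), so whether the third-party system honors it depends on its decompressor.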

I had the same issue. I found that Apache Commons Compress has a similar class, GzipCompressorOutputStream, which accepts a GzipParameters object that lets you set the file name stored in the gzip header:
final File compressedFile = new File("test-outer.xml.gz");
final GzipParameters gzipParameters = new GzipParameters();
gzipParameters.setFilename("test-inner.xml");
try (final GzipCompressorOutputStream gzipOutputStream = new GzipCompressorOutputStream(new FileOutputStream(compressedFile), gzipParameters)) {
    Files.copy(Paths.get("test-inner.xml"), gzipOutputStream); // write the XML content (source path is an example)
}
Dependency:
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-compress</artifactId>
<version>1.8</version>
</dependency>

Related

Extracting PDF inside a Zip inside a Zip

I have checked everywhere online, including Stack Overflow, and could not find anything specific to this issue.
I am trying to extract a PDF file that is located in a zip file that is inside another zip file (nested zips).
Calling my extraction method again on the inner zip does not work, and neither does changing the whole program to accept InputStreams instead of the approach below.
The .pdf file inside the nested zip is just skipped at this stage.
public static void main(String[] args)
{
try
{
//Paths
String basePath = "C:\\Users\\user\\Desktop\\Scan\\";
File lookupDir = new File(basePath + "Data\\");
String doneFolder = basePath + "DoneUnzipping\\";
File[] directoryListing = lookupDir.listFiles();
for (int i = 0; i < directoryListing.length; i++)
{
if (directoryListing[i].isFile()) // there's definitely a file
{
//Save the current file's path
String pathOrigFile = directoryListing[i].getAbsolutePath();
Path origFileDone = Paths.get(pathOrigFile);
Path newFileDone = Paths.get(doneFolder + directoryListing[i].getName());
//unzip it
if(directoryListing[i].getName().toUpperCase().endsWith(ZIP_EXTENSION)) //ZIP files
{
unzip(directoryListing[i].getAbsolutePath(), DESTINATION_DIRECTORY + directoryListing[i].getName());
//move to the 'DoneUnzipping' folder
Files.move(origFileDone, newFileDone);
}
}
}
} catch (Exception e)
{
e.printStackTrace(System.out);
}
}
private static void unzip(String zipFilePath, String destDir)
{
//buffer for read and write data to file
byte[] buffer = new byte[BUFFER_SIZE];
try (ZipInputStream zis = new ZipInputStream(new FileInputStream(zipFilePath)))
{
FileInputStream fis = new FileInputStream(zipFilePath);
ZipEntry ze = zis.getNextEntry();
while(ze != null)
{
String fileName = ze.getName();
int index = fileName.lastIndexOf("/");
String newFileName = fileName.substring(index + 1);
File newFile = new File(destDir + File.separator + newFileName);
//Zips inside zips
if(fileName.toUpperCase().endsWith(ZIP_EXTENSION))
{
ZipInputStream innerZip = new ZipInputStream(zis);
ZipEntry innerEntry = null;
while((innerEntry = innerZip.getNextEntry()) != null)
{
System.out.println("The file: " + fileName);
if(fileName.toUpperCase().endsWith("PDF"))
{
FileOutputStream fos = new FileOutputStream(newFile);
int len;
while ((len = innerZip.read(buffer)) > 0)
{
fos.write(buffer, 0, len);
}
fos.close();
}
}
}
//close this ZipEntry
zis.closeEntry(); // java.io.IOException: Stream Closed
ze = zis.getNextEntry();
}
//close last ZipEntry
zis.close();
fis.close();
} catch (IOException e)
{
e.printStackTrace();
}
}

The solution to this is not as obvious as it seems. Despite having written a few zip utilities myself some time ago, getting zip entries from inside another zip file only seemed obvious in retrospect
(and I also got the java.io.IOException: Stream Closed on my first attempt).
The Java classes ZipFile and ZipInputStream really steer your thinking toward using the file system, but that is not required.
The functions below scan a parent-level zip file and keep scanning until they find an entry with a specified name. (Nearly) everything is done in memory.
Naturally, this can be modified to use different search criteria, find multiple file types, take different actions and so on, but it at least demonstrates the basic technique in question: zip files inside zip files. There are no guarantees on other aspects of the code, and someone more savvy could most likely improve the style.
final static String ZIP_EXTENSION = ".zip";
public static byte[] getOnePDF() throws IOException
{
final File source = new File("/path/to/MegaData.zip");
final String nameToFind = "FindThisFile.pdf";
final ByteArrayOutputStream mem = new ByteArrayOutputStream();
try (final ZipInputStream in = new ZipInputStream(new BufferedInputStream(new FileInputStream(source))))
{
digIntoContents(in, nameToFind, mem);
}
// Save to disk, if you want
// copy(new ByteArrayInputStream(mem.toByteArray()), new FileOutputStream(new File("/path/to/output.pdf")));
// Otherwise, just return the binary data
return mem.toByteArray();
}
private static void digIntoContents(final ZipInputStream in, final String nameToFind, final ByteArrayOutputStream mem) throws IOException
{
ZipEntry entry;
while (null != (entry = in.getNextEntry()))
{
final String name = entry.getName();
// Found the file we are looking for
if (name.equals(nameToFind))
{
copy(in, mem);
return;
}
// Found another zip file
if (name.toUpperCase().endsWith(ZIP_EXTENSION.toUpperCase()))
{
digIntoContents(new ZipInputStream(new ByteArrayInputStream(getZipEntryFromMemory(in))), nameToFind, mem);
}
}
}
private static byte[] getZipEntryFromMemory(final ZipInputStream in) throws IOException
{
final ByteArrayOutputStream mem = new ByteArrayOutputStream();
copy(in, mem);
return mem.toByteArray();
}
// General purpose, reusable, utility function
// OK for binary data (bad for non-ASCII text, use Reader/Writer instead)
public static void copy(final InputStream from, final OutputStream to) throws IOException
{
final int bufferSize = 4096;
final byte[] buf = new byte[bufferSize];
int len;
while (0 < (len = from.read(buf)))
{
to.write(buf, 0, len);
}
to.flush();
}

Your question asks how to use Java (by implication on Windows) to extract a PDF from a zip inside another, outer zip.
On many systems, including Windows, this is a single-line command whose exact form depends on the location of the source and target folders. Using the shortest example, the current Downloads folder, it is as simple as this in a shell:
tar -xf "german (2).zip" && tar -xf "german.zip" && german.pdf
To run the command from Java on Windows, see
How do I execute Windows commands in Java?
The last part opens the result in the default PDF viewer, such as Microsoft Edge, or in my case SumatraPDF.
There is generally no point in putting a PDF inside a zip, because it cannot be opened in there. So a single level of nesting is advisable if a zip is needed for download transport.
There is no need to add a password to the zip, because PDF has its own password protection for opening. It is therefore unwise to add two levels of complexity. Keep it simple.
If you have multiple zips nested inside multiple zips with multiple PDFs in each, then you have to be more specific by filtering on names. However, avoid that extra onion skin where possible.
\Downloads>tar -xf "german (2).zip" "both.zip" && tar -xf "both.zip" "English language.pdf"
You could complicate that by running it in memory or in a temp folder, but using the native file system is reliable and simple, so consider running it without Java, which is fastest:
CD /D "C:/Users/user/Desktop/Scan/DoneUnzipping" && for %f in (..\Data\*.zip) do tar -xf "%f" "*.zip" && for %f in (*.zip) do tar -xf "%f" "*.pdf" && del "*.zip"
This extracts all inner zips into the working folder, then extracts all PDFs and removes the intermediate zips. The source nested zips are not deleted, only read.

The line that causes your problem looks to be the auto-close (try-with-resources) block you created when reading the inner zip:
try(ZipInputStream innerZip = new ZipInputStream(fis)) {
...
}
There are several likely issues. Firstly, it is reading the wrong stream: fis rather than the existing zis.
Secondly, you shouldn't use try-with-resources on innerZip, as it implicitly calls innerZip.close() when exiting the block. If you view the source code of ZipInputStream in a good IDE, you will (eventually) see that ZipInputStream extends InflaterInputStream, which itself extends FilterInputStream. A call to innerZip.close() closes the underlying outer stream zis (fis in your case), hence the stream is closed when you move on to the next entry of the outer zip.
Therefore, remove the try() block and use zis:
ZipInputStream innerZip = new ZipInputStream(zis);
Use try-with-resources only for the outermost file handling:
try (ZipInputStream zis = new ZipInputStream(new FileInputStream(zipFilePath))) {
ZipEntry ze = zis.getNextEntry();
...
}
Thirdly, you appear to be copying the wrong stream when extracting a PDF: use innerZip, not the outer zis. Also, the code will never extract a PDF, because these two conditions can never be true at the same time; a file name ending in ZIP can never also end in PDF:
if(fileName.toUpperCase().endsWith(ZIP_EXTENSION)) {
...
// You want innerEntry.getName() here
if(fileName.toUpperCase().endsWith("PDF"))
You can switch to a one-line Files.copy and use the PDF's file name instead of the zip's:
if(innerEntry.getName().toUpperCase().endsWith("PDF")) {
Path newFile = Paths.get(destDir + '-'+innerEntry.getName().replace("/", "-"));
System.out.println("Files.copy to " + newFile);
Files.copy(innerZip, newFile);
}
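Putting those three fixes together, a minimal sketch might look like this (hypothetical helper findNestedPdfs, which collects the PDFs into a map instead of writing files, assuming Java 9+ for readAllBytes). The key points are that the inner ZipInputStream wraps the outer one and is never closed, and that the name test uses the inner entry:

```java
import java.io.*;
import java.util.*;
import java.util.zip.*;

class NestedZipPdfs {
    // Collects every *.pdf found in zip files nested one level inside `outer`.
    static Map<String, byte[]> findNestedPdfs(ZipInputStream outer) throws IOException {
        Map<String, byte[]> pdfs = new LinkedHashMap<>();
        ZipEntry ze;
        while ((ze = outer.getNextEntry()) != null) {
            if (ze.getName().toUpperCase().endsWith(".ZIP")) {
                // Deliberately NOT in try-with-resources: closing `inner`
                // would also close `outer` (FilterInputStream.close()).
                ZipInputStream inner = new ZipInputStream(outer);
                ZipEntry innerEntry;
                while ((innerEntry = inner.getNextEntry()) != null) {
                    // Test the INNER entry's name, not the outer zip's name
                    if (innerEntry.getName().toUpperCase().endsWith(".PDF")) {
                        pdfs.put(innerEntry.getName(), inner.readAllBytes());
                    }
                }
            }
            // Skips whatever was not consumed (e.g. the inner zip's central directory)
            outer.closeEntry();
        }
        return pdfs;
    }
}
```

To write files instead, replace the map with Files.copy(inner, target) as shown above.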

Zipping Files using util.zip No directory

I have the following situation:
I am able to zip my files with the following method:
public boolean generateZip(){
byte[] application = new byte[100000];
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// These are the files to include in the ZIP file
String[] filenames = new String[]{"/subdirectory/index.html", "/subdirectory/webindex.html"};
// Create a buffer for reading the files
try {
// Create the ZIP file
ZipOutputStream out = new ZipOutputStream(baos);
// Compress the files
for (int i=0; i<filenames.length; i++) {
byte[] filedata = VirtualFile.fromRelativePath(filenames[i]).content();
ByteArrayInputStream in = new ByteArrayInputStream(filedata);
// Add ZIP entry to output stream.
out.putNextEntry(new ZipEntry(filenames[i]));
// Transfer bytes from the file to the ZIP file
int len;
while ((len = in.read(application)) > 0) {
out.write(application, 0, len);
}
// Complete the entry
out.closeEntry();
in.close();
}
// Complete the ZIP file
out.close();
} catch (IOException e) {
System.out.println("There was an error generating ZIP.");
e.printStackTrace();
}
downloadzip(baos.toByteArray());
}
This works perfectly and I can download the xy.zip which contains the following directory and file structure:
subdirectory/
----index.html
----webindex.html
My aim is to completely leave out the subdirectory and the zip should only contain the two files. Is there any way to achieve this?
(I am using Java on Google App Engine).
Thanks in advance

If you are sure the files in the filenames array still have unique names once the directory is left out, change the line that constructs the ZipEntry objects:
String zipEntryName = new File(filenames[i]).getName();
out.putNextEntry(new ZipEntry(zipEntryName));
This uses java.io.File#getName(), which returns only the last element of the path (the file name without its directories).
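A self-contained sketch of the same idea, working on in-memory contents like the question does (the map of path to bytes stands in for VirtualFile, and FlatZip/zipFlat are hypothetical names). It strips everything up to the last '/' with plain string handling; if two paths flatten to the same name, ZipOutputStream.putNextEntry throws a ZipException ("duplicate entry"):

```java
import java.io.*;
import java.util.*;
import java.util.zip.*;

class FlatZip {
    // Zips in-memory files, storing each one under its base name only,
    // e.g. "/subdirectory/index.html" -> "index.html" (no directory entries).
    static byte[] zipFlat(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(baos)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                String path = file.getKey();
                String entryName = path.substring(path.lastIndexOf('/') + 1);
                // Throws ZipException("duplicate entry: ...") on a name clash
                out.putNextEntry(new ZipEntry(entryName));
                out.write(file.getValue());
                out.closeEntry();
            }
        } // close() writes the zip's central directory
        return baos.toByteArray();
    }
}
```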

You can use Apache Commons IO to list all your files and then read each one from an InputStream.
Replace the line below
String[] filenames = new String[]{"/subdirectory/index.html", "/subdirectory/webindex.html"};
with the following
Collection<File> files = FileUtils.listFiles(new File("/subdirectory"), new String[]{"html"}, true);
for (File file : files)
{
    try (FileInputStream fileStream = new FileInputStream(file))
    {
        byte[] filedata = IOUtils.toByteArray(fileStream);
        // From here you can proceed with your zipping; use file.getName() as the entry name to leave out the directory.
    }
}
Let me know if you have issues.

You could use the isDirectory() method on VirtualFile

FileNotFoundException when trying to unzip an archive with java.util.zip.ZipFile

I have a silly problem I haven't been able to figure out. Can anyone help me?
My code is:
String zipname = "C:/1100.zip";
String output = "C:/1100";
BufferedInputStream bis = null;
BufferedOutputStream bos = null;
try {
ZipFile zipFile = new ZipFile(zipname);
Enumeration<?> enumeration = zipFile.entries();
while (enumeration.hasMoreElements()) {
ZipEntry zipEntry = (ZipEntry) enumeration.nextElement();
System.out.println("Unzipping: " + zipEntry.getName());
bis = new BufferedInputStream(zipFile.getInputStream(zipEntry));
int size;
byte[] buffer = new byte[2048];
// It doesn't create a folder, but debugging shows all the contents being generated.
// In order to create a folder, I used this code:
if(!output.exists()){ output.mkdir();} // here I get an error saying FileNotFoundException
bos = new BufferedOutputStream(new FileOutputStream(new File(outPut)));
while ((size = bis.read(buffer)) != -1) {
bos.write(buffer, 0, size);
}
}
} catch (Exception ex) {
ex.printStackTrace();
} finally {
bos.flush();
bos.close();
bis.close();
}
My zip file contains images (a.jpg, b.jpg, ...) and, at the same level, abc.xml.
I need to extract the contents exactly as they are in the zip file.
Any help is appreciated.

There are a few problems with your code. output is not a File but a String, so exists() and mkdir() do not exist on it. Start by declaring output like this:
File output = new File("C:/1100");
Furthermore, outPut (with a capital P) is not declared. It should be something like output + File.separator + zipEntry.getName():
bos = new BufferedOutputStream(new FileOutputStream(output + File.separator + zipEntry.getName()));
Note that you don't need to pass a File to FileOutputStream; as the documentation shows, it also has a constructor that takes a String path.
At this point your code should work if your zip file does not contain directories. However, if zipEntry.getName() has a directory component (for instance somedir/filename.txt), opening the output stream will throw a FileNotFoundException, because the parent directory of the file you are trying to create does not exist. If you want to handle such zip files, you will find your answer in: How to unzip files recursively in Java?
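For reference, a minimal sketch of a ZipFile-based extractor that creates parent directories before writing each entry (hypothetical class Unzipper, using java.nio.file from Java 7+). As an extra safety check it also rejects entry names such as ../x that would resolve outside the destination folder:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.zip.*;

class Unzipper {
    // Extracts every entry of zipPath into destDir, creating parent
    // directories first so entries like "somedir/filename.txt" don't fail.
    static void unzip(Path zipPath, Path destDir) throws IOException {
        Path root = destDir.toAbsolutePath().normalize();
        try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                Path target = root.resolve(entry.getName()).normalize();
                // Refuse names like "../x" that would escape destDir
                if (!target.startsWith(root)) {
                    throw new IOException("Entry outside target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    try (InputStream in = zipFile.getInputStream(entry)) {
                        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            }
        }
    }
}
```

With the question's paths this would be Unzipper.unzip(Paths.get("C:/1100.zip"), Paths.get("C:/1100")).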

How to use java.util.zip to archive/deflate string in java for use in Google Earth?

Use Case
I need to package our KML, which is in a String, into a KMZ response for a network link in Google Earth. I would also like to include icons and such while I'm at it.
Problem
Using the implementation below, I receive errors from WinZip and Google Earth saying, respectively, that the archive is corrupt and that the file cannot be opened. The part that deviates from the other examples I built this from is the lines where the string is added:
ZipEntry kmlZipEntry = new ZipEntry("doc.kml");
out.putNextEntry(kmlZipEntry);
out.write(kml.getBytes("UTF-8"));
Please point me in the right direction to correctly write the string so that it ends up in doc.kml in the resulting KMZ file. I know how to write the string to a temporary file, but I would very much like to keep the operation in memory for understandability and efficiency.
private static final int BUFFER = 2048;
private static void kmz(OutputStream os, String kml)
{
try{
BufferedInputStream origin = null;
ZipOutputStream out = new ZipOutputStream(os);
out.setMethod(ZipOutputStream.DEFLATED);
byte data[] = new byte[BUFFER];
File f = new File("./icons"); //folder containing icons and such
String files[] = f.list();
if(files != null)
{
for (String file: files) {
LOGGER.info("Adding to KMZ: "+ file);
FileInputStream fi = new FileInputStream(file);
origin = new BufferedInputStream(fi, BUFFER);
ZipEntry entry = new ZipEntry(file);
out.putNextEntry(entry);
int count;
while((count = origin.read(data, 0, BUFFER)) != -1) {
out.write(data, 0, count);
}
origin.close();
}
}
ZipEntry kmlZipEntry = new ZipEntry("doc.kml");
out.putNextEntry(kmlZipEntry);
out.write(kml.getBytes("UTF-8"));
}
catch(Exception e)
{
LOGGER.error("Problem creating kmz file", e);
}
}
Bonus points for showing me how to put the supplementary files from the icons folder into a similar folder within the archive, as opposed to the same level as doc.kml.
Update: Even when saving the string to a temp file, the errors occur. Ugh.
Use Case Note: The use case is a web app, but the code that gets the list of files won't work there. For details, see how-to-access-local-files-on-server-in-jboss-application

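You forgot to call close() on the ZipOutputStream. The central directory at the end of a zip file is only written by finish() or close(), so without it WinZip and Google Earth see an incomplete, corrupt archive. The best place to call it is the finally block of the try block in which the stream is created.
Update: To create a folder, just prepend its name to the entry name: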
ZipEntry entry = new ZipEntry("icons/" + file);
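Combining both points, an in-memory sketch (hypothetical KmzWriter.kmz, which takes the icons as a name-to-bytes map rather than listing the ./icons folder, since the use-case note says that listing won't work in the web app). It uses finish() rather than close(): finish() writes the central directory but leaves the underlying stream open, which suits a response stream that the container closes:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.*;

class KmzWriter {
    // Writes doc.kml plus the given icons (file name -> bytes) as a KMZ (zip) to `os`,
    // placing the icons in an "icons/" folder inside the archive.
    static void kmz(OutputStream os, String kml, Map<String, byte[]> icons) throws IOException {
        ZipOutputStream out = new ZipOutputStream(os);
        out.putNextEntry(new ZipEntry("doc.kml"));
        out.write(kml.getBytes(StandardCharsets.UTF_8));
        out.closeEntry();
        for (Map.Entry<String, byte[]> icon : icons.entrySet()) {
            // The "icons/" prefix is what creates the folder inside the archive
            out.putNextEntry(new ZipEntry("icons/" + icon.getKey()));
            out.write(icon.getValue());
            out.closeEntry();
        }
        // finish() writes the central directory without closing `os`;
        // call out.close() instead if this method owns the stream
        out.finish();
    }
}
```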

Convert audio stream to WAV byte array in Java without temp file

Given an InputStream named in, which contains audio data in a compressed format (such as MP3 or OGG), I wish to create a byte array containing a WAV conversion of the input data. Unfortunately, if you try to do this, JavaSound gives you the following error:
java.io.IOException: stream length not specified
I managed to get it to work by writing the WAV to a temporary file and then reading it back in, as shown below:
AudioInputStream source = AudioSystem.getAudioInputStream(new BufferedInputStream(in, 1024));
AudioInputStream pcm = AudioSystem.getAudioInputStream(AudioFormat.Encoding.PCM_SIGNED, source);
AudioInputStream ulaw = AudioSystem.getAudioInputStream(AudioFormat.Encoding.ULAW, pcm);
File tempFile = File.createTempFile("wav", "tmp");
AudioSystem.write(ulaw, AudioFileFormat.Type.WAVE, tempFile);
// The fileToByteArray() method reads the file
// into a byte array; omitted for brevity
byte[] bytes = fileToByteArray(tempFile);
tempFile.delete();
return bytes;
This is obviously less desirable. Is there a better way?

The problem is that most AudioFileWriters need to know the file size in advance when writing to an OutputStream. Because you can't provide it, writing always fails. Unfortunately, the default Java Sound API implementation doesn't offer an alternative.
But you can try using the AudioOutputStream architecture from the Tritonus plugins (Tritonus is an open source implementation of the Java sound API): http://tritonus.org/plugins.html
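If you only need the built-in writer, another option is to give it a known length: read the converted stream fully into memory first, then wrap the bytes in a new AudioInputStream whose frame length is set. A sketch (hypothetical WavBytes.toWav, assuming Java 9+ for readAllBytes and an encoding with a fixed frame size, such as the PCM or ULAW streams in the question):

```java
import java.io.*;
import javax.sound.sampled.*;

class WavBytes {
    // Converts an AudioInputStream of unknown length into WAV bytes
    // without a temp file, by buffering the audio data in memory first.
    static byte[] toWav(AudioInputStream audio) throws IOException {
        AudioFormat format = audio.getFormat();
        byte[] data = audio.readAllBytes(); // reads until the converted stream ends
        // The WAVE writer refuses an OutputStream target when the frame length is
        // NOT_SPECIFIED ("stream length not specified"), so supply it explicitly.
        long frames = data.length / format.getFrameSize();
        AudioInputStream known = new AudioInputStream(new ByteArrayInputStream(data), format, frames);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        AudioSystem.write(known, AudioFileFormat.Type.WAVE, out);
        return out.toByteArray();
    }
}
```

With the question's streams, WavBytes.toWav(ulaw) would replace the temp-file code; the trade-off is that the whole clip is held in memory.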

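I notice this was asked a very long time ago. In case anyone new (using Java 7 or above) finds this thread: there is a newer, simpler way to read the file back via the Files.readAllBytes API, which can replace the omitted fileToByteArray() helper (the temporary file is still needed). See: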
How to convert .wav file into byte array?

Too late, I know, but I needed this, so here are my two cents on the topic. Note that this version writes the WAV to a file rather than returning a byte array.
public void UploadFiles(String fileName, byte[] bFile)
{
String uploadedFileLocation = "c:\\";
AudioInputStream source;
AudioInputStream pcm;
InputStream b_in = new ByteArrayInputStream(bFile);
source = AudioSystem.getAudioInputStream(new BufferedInputStream(b_in));
pcm = AudioSystem.getAudioInputStream(AudioFormat.Encoding.PCM_SIGNED, source);
File newFile = new File(uploadedFileLocation + fileName);
AudioSystem.write(pcm, Type.WAVE, newFile);
source.close();
pcm.close();
}

The issue is easy to solve if you prepare a class that creates the correct header for you. In my example, Example how to read audio input in wav, the data goes into a buffer; after that I create the header and have a complete WAV file in the buffer. No additional libraries are needed. Just copy the code from my example.
Example of how to use the class that creates the correct header in the buffer array:
public void run() {
try {
writer = new NewWaveWriter(44100);
byte[]buffer = new byte[256];
int res = 0;
while((res = m_audioInputStream.read(buffer)) > 0) {
writer.write(buffer, 0, res);
}
} catch (IOException e) {
System.out.println("Error: " + e.getMessage());
}
}
public byte[]getResult() throws IOException {
return writer.getByteBuffer();
}
You can find the NewWaveWriter class at my link.
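The linked NewWaveWriter class isn't reproduced here, but the header in question is the standard 44-byte RIFF/WAVE header for PCM. A minimal sketch that prepends one to raw little-endian PCM bytes (hypothetical WavHeader.wrapPcm; note that 8-bit WAV samples are unsigned while 16-bit ones are signed):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class WavHeader {
    // Prepends a canonical 44-byte RIFF/WAVE header to raw little-endian PCM bytes.
    static byte[] wrapPcm(byte[] pcm, int sampleRate, int channels, int bitsPerSample) {
        int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        ByteBuffer b = ByteBuffer.allocate(44 + pcm.length).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        b.putInt(36 + pcm.length);          // RIFF chunk size = total file size - 8
        b.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        b.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        b.putInt(16);                       // fmt chunk size for plain PCM
        b.putShort((short) 1);              // audio format 1 = PCM
        b.putShort((short) channels);
        b.putInt(sampleRate);
        b.putInt(sampleRate * blockAlign);  // byte rate
        b.putShort((short) blockAlign);
        b.putShort((short) bitsPerSample);
        b.put("data".getBytes(StandardCharsets.US_ASCII));
        b.putInt(pcm.length);               // data chunk size
        b.put(pcm);
        return b.array();
    }
}
```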

This is very simple...
File f = new File(exportFileName+".tmp");
File f2 = new File(exportFileName);
long l = f.length();
FileInputStream fi = new FileInputStream(f);
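// The length argument is in sample frames, not bytes (l/4 only works for 4-byte frames)
AudioInputStream ai = new AudioInputStream(fi, mainFormat, l / mainFormat.getFrameSize());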
AudioSystem.write(ai, Type.WAVE, f2);
fi.close();
f.delete();
The .tmp file contains raw audio data in mainFormat (no header); the result is a WAV file with a header.
