How do I write XML files directly to a zip archive?

How do I write XML files directly to a zip archive? - java

What is the proper way to write a list of XML files using JAXB directly to a zip archive without using a 3rd party library.
Would it be better to just write all the XML files to a directory and then zip?

As others pointed out, you can use the ZipOutputStream class to create a ZIP-file. The trick to put multiple files in a single ZIP-file is to use the ZipEntry descriptors prior to writing (marshalling) the JAXB XML data in the ZipOutputStream. So your code might look similar to this one:
JAXBElement jaxbElement1 = objectFactory.createRoot(rootType);
JAXBElement jaxbElement2 = objectFactory.createRoot(rootType);
ZipOutputStream zos = null;
try {
zos = new ZipOutputStream(new FileOutputStream("xml-file.zip"));
// add zip-entry descriptor
ZipEntry ze1 = new ZipEntry("xml-file-1.xml");
zos.putNextEntry(ze1);
// add zip-entry data
marshaller.marshal(jaxbElement1, zos);
ZipEntry ze2 = new ZipEntry("xml-file-2.xml");
zos.putNextEntry(ze2);
marshaller.marshal(jaxbElement2, zos);
zos.flush();
} finally {
if (zos != null) {
zos.close();
}
}

The "proper" way — without using a 3rd party library — would be to use java.util.zip.ZipOutputStream.
Personally, though, I prefer TrueZip.
TrueZIP is a Java based plug-in framework for virtual file systems (VFS) which provides transparent access to archive files as if they were just plain directories.

I don't know what JAXB has to do with anything, nor XML - file contents are file contents. Your question is really "How can I output characters directly to a zip archive"
To do that, open a ZipOututStream and use the API to create entries then write contents to each entry. Remember that a zip archive is like a series of named files within the archive.
btw, ZipOututStream is part of the JDK (ie it's not a "library")

Related

Is using the File class in Java how you create a file?

I need some serious help with concepts. I have been given background context on the class, specifically this:
I just need to understand the purpose of this class? Can I create a text file (or any other type of file) with its constructors? Is this just for handling files, if so, what does that mean?
Any help whatsoever will be greatly appreciated!
Thank you

You could use the java.io.File to create a file on the file system:
File myFile = new File("myFile.txt");
myFile.createNewFile();
Note that invoking the constructor won't create the file on the file system. To create an empty file, the createNewFile() method has to be invoked.
The File simply represents a abstraction of the file location, not the file itself. It comes with operations on the file identified by the path: exists(), delete(), length(), etc.
What you probably want is to use the classes that allow you to write content to a file:
If you are to write text, you should use the Writer interface.
If you are to write binary content, you should use the OutputStream interface.
The classes FileWriter and FileOutputStream are, respectively, the ones that link the File and Writer/OutputStream concepts together. Those classes create the file on the file-system for you.
FileWriter myFileWriter = null;
File myFile = new File("myFile.txt");
try {
// file is created on the file-system here
myFileWriter = new FileWriter(myFile);
myFileWriter.write("hello");
} finally {
if (myFileWriter != null) {
myFileWriter.close();
}
}

You can create a file using the File.createNewFile method, or, if you are using Java 7 or newer, using the newer Files.createFile method.
The difference between the old File and the new Path classes is that the former mixed a reference to a path to a file on the filsystem and operations you can do on it, and the latter is just representing the path itself but allows you to query it and analyze its structure.

Reading files from an embedded ZIP archive

I have a ZIP archive that's embedded inside a larger file. I know the archive's starting offset within the larger file and its length.
Are there any Java libraries that would enable me to directly read the files contained within the archive? I am thinking along the lines of ZipFile.getInputStream(). Unfortunately, ZipFile doesn't work for this use case since its constructors require a standalone ZIP file.
For performance reasons, I cannot copy the ZIP achive into a separate file before opening it.
edit: Just to be clear, I do have random access to the file.

I've come up with a quick hack (which needs to get sanitized here and there), but it reads the contents of files from a ZIP archive which is embedded inside a TAR. It uses Java6, FileInputStream, ZipEntry and ZipInputStream. 'Works on my local machine':
final FileInputStream ins = new FileInputStream("archive.tar");
// Zip starts at 0x1f6400, size is not needed
long toSkip = 0x1f6400;
// Safe skipping
while(toSkip > 0)
toSkip -= ins.skip(toSkip);
final ZipInputStream zipin = new ZipInputStream(ins);
ZipEntry ze;
while((ze = zipin.getNextEntry()) != null)
{
final byte[] content = new byte[(int)ze.getSize()];
int offset = 0;
while(offset < content.length)
{
final int read = zipin.read(content, offset, content.length - offset);
if(read == -1)
break;
offset += read;
}
// DEBUG: print out ZIP entry name and filesize
System.out.println(ze + ": " + offset);
}
zipin.close();

1.create FileInputStream fis=new FileInputStream(..);
position it at the start of embedded zipfile:
fis.skip(offset);
open ZipInputStream(fis)

I suggest using TrueZIP, it provides file system access to many kinds of archives. It worked well for me in the past.

If you're using Java SE 7, it provides a zip fie system which allows you to read/ write files in the zip directly: http://docs.oracle.com/javase/7/docs/technotes/guides/io/fsp/zipfilesystemprovider.html

I think apache commons compress may help you.
There is a class org.apache.commons.compress.archivers.zip.ZipArchiveEntry, which inherit java.util.zip.ZipEntry.
It has a method getDataOffset(), that can get the offset of data stream within the archive file.

7-zip-JavaBinding is a Java wrapper for the 7-zip C++ library.
The code snippets page in particular has some nice examples including printing a list of items in an archive, extracting a single file and opening multi-part archives.

Check whether zip4j helps you or not.
You can try PartInputStream to read zip file as per your use case.
I think it is better to create temp zip file and then accessing it.

What is the fastest way to extract 1 file from a zip file which contain a lot of file?

I tried the java.util.zip package, it is too slow.
Then I found LZMA SDK and 7z jbinding but they are also lacking something.
The LZMA SDK does not provide a kind of documentation/tutorial of how-to-use, it is very frustrating. No javadoc.
While the 7z jbinding does not provide a simple way to extract only 1 file, however, it only provide way to extract all the content of the zip file. Moreover, it does not provide a way to specify a location to place the unzipped file.
Any idea please?

What does your code with java.util.zip look like and how big of a zip file are you dealing with?
I'm able to extract a 4MB entry out of a 200MB zip file with 1,800 entries in roughly a second with this:
OutputStream out = new FileOutputStream("your.file");
FileInputStream fin = new FileInputStream("your.zip");
BufferedInputStream bin = new BufferedInputStream(fin);
ZipInputStream zin = new ZipInputStream(bin);
ZipEntry ze = null;
while ((ze = zin.getNextEntry()) != null) {
if (ze.getName().equals("your.file")) {
byte[] buffer = new byte[8192];
int len;
while ((len = zin.read(buffer)) != -1) {
out.write(buffer, 0, len);
}
out.close();
break;
}
}

I have not benchmarked the speed but with java 7 or greater, I extract a file as follows.
I would imagine that it's faster than the ZipFile API:
A short example extracting META-INF/MANIFEST.MF from a zip file test.zip:
// file to extract from zip file
String file = "MANIFEST.MF";
// location to extract the file to
File outputLocation = new File("D:/temp/", file);
// path to the zip file
Path zipFile = Paths.get("D:/temp/test.zip");
// load zip file as filesystem
try (FileSystem fileSystem = FileSystems.newFileSystem(zipFile)) {
// copy file from zip file to output location
Path source = fileSystem.getPath("META-INF/" + file);
Files.copy(source, outputLocation.toPath());
}

Use a ZipFile rather than a ZipInputStream.
Although the documentation does not indicate this (it's in the docs for JarFile), it should use random-access file operations to read the file. Since a ZIPfile contains a directory at a known location, this means a LOT less IO has to happen to find a particular file.
Some caveats: to the best of my knowledge, the Sun implementation uses a memory-mapped file. This means that your virtual address space has to be large enough to hold the file as well as everything else in your JVM. Which may be a problem for a 32-bit server. On the other hand, it may be smart enough to avoid memory-mapping on 32-bit, or memory-map just the directory; I haven't tried.
Also, if you're using multiple files, be sure to use a try/finally to ensure that the file is closed after use.

When creating a zip archive, what constitutes a duplicate entry

In a Java web application I am creating a zip file from various in-memory files (stored as byte[]).
Here's the key bit of code:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ZipOutputStream zos = new ZipOutputStream(baos);
for (//each member of a collection of objects) {
PDFDocument pdfDocument = //generate PDF for this member of the collection;
ZipEntry entry = new ZipEntry(pdfDocument.getFileName());
entry.setSize(pdfDocument.getBody().length);
zos.putNextEntry(entry);
zos.write(pdfDocument.getBody());//pdfDocument.getBody() returns byte[]
zos.closeEntry();
}
zos.close();
The problem: I'm sometimes getting a "ZipException: duplicate entry" when doing the "putNextEntry()" line.
The PDF files themselves will certainly be different, but they may have the same name ("PDF_File_for_John_Smith.pdf"). Is a name collision sufficient to cause this exception?

You can't store 2 entries with the same same name in a zip archive(in the same folder), much like you can't have 2 files with the same name in the same folder in a filesystem.
Edit; And while technically the zip file format allows this, the Java API for dealing with ZIP archives does not.

Yes -- you can use a directory structure inside your ZIP file if you need to hold multiple files with the same file name.

I believe so. Zip was originally intended to archive a directory structure, so it expects filenames to be unique. You could add directories to keep your files separated (and provide extra information to differentiate them, if you want).

decompress .gz file in batch

I have 100 of .gz files which I need to de-compress.
I have couple of questions
a) I am using the code given at http://www.roseindia.net/java/beginners/JavaUncompress.shtml to decompress the .gz file. Its working fine.
Quest:- is there a way to get the file name of the zipped file. I know that Zip class of Java gives of enumeration of entery file to work upon. This can give me the filename, size etc stored in .zip file. But, do we have the same for .gz files or does the file name is same as filename.gz with .gz removed.
b) is there another elegant way to decompress .gz file by calling the utility function in the java code. Like calling 7-zip application from your java class. Then, I don't have to worry about input/output stream.
Thanks in advance.
Kapil

a) Zip is an archive format, while gzip is not. So an entry iterator does not make much sense unless (for example) your gz-files are compressed tar files. What you want is probably:
File outFile = new File(infile.getParent(), infile.getName().replaceAll("\\.gz$", ""));
b) Do you only want to uncompress the files? If not you may be ok with using GZIPInputStream and read the files directly, i.e. without intermediate decompression.
But ok. Let's say you really only want to uncompress the files. If so, you could probably use this:
public static File unGzip(File infile, boolean deleteGzipfileOnSuccess) throws IOException {
GZIPInputStream gin = new GZIPInputStream(new FileInputStream(infile));
FileOutputStream fos = null;
try {
File outFile = new File(infile.getParent(), infile.getName().replaceAll("\\.gz$", ""));
fos = new FileOutputStream(outFile);
byte[] buf = new byte[100000];
int len;
while ((len = gin.read(buf)) > 0) {
fos.write(buf, 0, len);
}
fos.close();
if (deleteGzipfileOnSuccess) {
infile.delete();
}
return outFile;
} finally {
if (gin != null) {
gin.close();
}
if (fos != null) {
fos.close();
}
}
}

Regarding A, the gunzip command creates an uncompressed file with the original name minus the .gz suffix. See the man page.
Regarding B, Do you need gunzip specifically, or will another compression algorithm do? There's a java port of the LZMA compression algorithm used by 7zip to create .7z files, but it will not handle .gz files.

If you have a fixed number of files to decompress once, why don't you use existing tools for that?
As Paul Morie noticed, gunzip can do that:
for i in *.gz; do gunzip $i; done
And it would automatically name them, stripping .gz$
On windows, try winrar, probably, or gunzip from http://unxutils.sf.net

GZip is normally used only on single files, so it generally does not contain information about individual files. To bundle multiple files into one compressed archive, they are first combined into an uncompressed Tar file (with info about individual contents), and then compressed as a single file. This combination is called a Tarball.
There are libraries to extract the individual file info from a Tar, just as with ZipEntries. One example. You will first have to extract the .gz file into a temporary file in order to use it, or at least feed the GZipInputStream into the Tar library.
You may also call 7-Zip from the command line using Java. 7-Zip command-line syntax is here: 7-Zip Command Line Syntax. Example of calling the command shell from Java: Executing shell commands in Java. You will have to call 7-Zip twice: once to extract the Tar from the .tar.gz or .tgz file, and again to extract the individual files from the Tar.
Or, you could just do the easy thing and write a brief shell script or batch file to do your decompression. There's no reason to hammer a square peg in a round hole -- this is what batch files are made for. As a bonus, you can also feed them parameters, reducing the complexity of a java command line execution considerably, while still letting java control execution.

Have you tried
gunzip *.gz

.gz files (gzipped) can store the filename of a compressed file. So for example FuBar.doc can be saved inside myDocument.gz and with appropriate uncompression, the file can be restored to the filename FuBar.doc. Unfortunately, java.util.zip.GZIPInputStream does not support any way of reading the filename even if it is stored inside the archive.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

How do I write XML files directly to a zip archive? - java

What is the proper way to write a list of XML files using JAXB directly to a zip archive without using a 3rd party library. Would it be better to just write all the XML files to a directory and then zip?

Related

Is using the File class in Java how you create a file?

Reading files from an embedded ZIP archive

What is the fastest way to extract 1 file from a zip file which contain a lot of file?

When creating a zip archive, what constitutes a duplicate entry

decompress .gz file in batch

Categories

Resources