Best way to detect if a stream is zipped in Java

What is the best way to find out if a java.io.InputStream contains zipped data?

Introduction
Since all the answers are 5 years old, I feel obliged to write down how things stand today. I seriously doubt one should read the magic bytes of the stream! That is low-level code, which should generally be avoided.
Simple answer
miku writes:
If the Stream can be read via ZipInputStream, it should be zipped.
Yes, but for ZipInputStream "can be read" means that the first call to .getNextEntry() returns a non-null value. There is no need to rely on catching an exception. So instead of parsing magic bytes you can just do:
boolean isZipped = new ZipInputStream(yourInputStream).getNextEntry() != null;
And that's it!
General unzipping thoughts
In general, it turns out to be much more convenient to work with files than with streams when [un]zipping. There are several useful libraries, and ZipFile has more functionality than ZipInputStream. Handling of zip files is discussed here: What is a good Java library to zip/unzip files? So if you can work with files, you should!
Code sample
My application had to work with streams only, so this is the method I wrote for unzipping:
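import org.apache.commons.io.IOUtils;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
public boolean unzip(InputStream inputStream, File outputFolder) throws IOException {
    boolean isEmpty = true;
    // try-with-resources closes the stream even if extraction fails
    try (ZipInputStream zis = new ZipInputStream(inputStream)) {
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            isEmpty = false;
            File newFile = new File(outputFolder, entry.getName());
            // reject entries such as "../x" that would land outside outputFolder
            if (!newFile.getCanonicalPath().startsWith(outputFolder.getCanonicalPath() + File.separator)) {
                throw new IOException("Entry is outside the target folder: " + entry.getName());
            }
            if (entry.isDirectory()) {
                newFile.mkdirs();
                continue;
            }
            // mkdirs() returns false when the directory already exists,
            // so its result must not decide whether the file is written
            newFile.getParentFile().mkdirs();
            try (FileOutputStream fos = new FileOutputStream(newFile)) {
                IOUtils.copy(zis, fos);
            }
        }
    }
    return !isEmpty;
}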

The magic bytes for the ZIP format are 50 4B. You could test the stream (using mark and reset - you may need to buffer) but I wouldn't expect this to be a 100% reliable approach. There would be no way to distinguish it from a US-ASCII encoded text file that began with the letters PK.
The best way would be to provide metadata on the content format prior to opening the stream and then treat it appropriately.
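The mark/reset check described above could be sketched roughly as follows. This is only an illustration: the class and method names (ZipSignature, startsWithZipSignature) are made up for this example, and it tests for the four-byte local file header signature 50 4B 03 04 rather than just 50 4B. It assumes the caller passes a stream that supports mark/reset, for example a BufferedInputStream.

```java
import java.io.IOException;
import java.io.InputStream;

class ZipSignature {
    /**
     * Peeks at the first four bytes and then rewinds the stream.
     * Hypothetical helper; the stream must support mark/reset.
     */
    static boolean startsWithZipSignature(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("Wrap the stream in a BufferedInputStream first");
        }
        in.mark(4);
        try {
            byte[] head = new byte[4];
            int n = 0;
            // read() may return fewer bytes than requested, so loop until 4 bytes or EOF
            while (n < 4) {
                int r = in.read(head, n, 4 - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
            return n == 4
                    && head[0] == 0x50 && head[1] == 0x4B
                    && head[2] == 0x03 && head[3] == 0x04;
        } finally {
            in.reset(); // rewind so the caller can read the stream from the start
        }
    }
}
```

A match still only means the data starts like a ZIP local file header, so the caveat about false positives remains. An empty archive would also be missed by this sketch, since it starts with the end-of-central-directory signature 50 4B 05 06 instead.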

You could check that the first four bytes of the stream are the local file header signature, which starts the local file header that precedes every file in a ZIP file. The spec gives it as 50 4B 03 04.
A little test code shows this to work:
byte[] buffer = new byte[4];
try {
ZipOutputStream zos = new ZipOutputStream(new FileOutputStream("so.zip"));
ZipEntry ze = new ZipEntry("HelloWorld.txt");
zos.putNextEntry(ze);
zos.write("Hello world".getBytes());
zos.close();
FileInputStream is = new FileInputStream("so.zip");
is.read(buffer);
is.close();
}
catch(IOException e) {
e.printStackTrace();
}
for (byte b : buffer) {
System.out.printf("%H ",b);
}
Gave me this output:
50 4B 3 4

Not very elegant, but reliable:
If the Stream can be read via ZipInputStream, it should be zipped.

Checking the magic number may not be the right option.
Docx files (which are ZIP containers) also start with the same magic number, 50 4B 03 04.

Since .zip and .xlsx files share the same magic number, I couldn't tell a genuine zip file from a renamed .xlsx file.
So I used Apache Tika to determine the exact document type.
Even if the file has been renamed to .zip, Tika detects its real type.
Reference: https://www.baeldung.com/apache-tika

I combined the answers from @McDowell and @Innokenty into a small library function that you can paste into your project:
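public static boolean isZipStream(InputStream inputStream) {
    if (inputStream == null || !inputStream.markSupported()) {
        throw new IllegalArgumentException("InputStream must support mark/reset. Wrap it in a BufferedInputStream.");
    }
    boolean isZipped = false;
    inputStream.mark(2048);
    try {
        // Do not close the ZipInputStream: that would also close the underlying stream.
        isZipped = new ZipInputStream(inputStream).getNextEntry() != null;
    } catch (IOException ex) {
        // Cannot be opened as a zip.
    } finally {
        try {
            inputStream.reset(); // rewind even when the probe threw an exception
        } catch (IOException ex) {
            // The mark was invalidated, so the stream cannot be rewound.
        }
    }
    return isZipped;
}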
You can use the lib like this:
public static void main(String[] args) {
InputStream inputStream = new BufferedInputStream(...);
if (isZipStream(inputStream)) {
// do zip processing using inputStream
} else {
// do non-zip processing using inputStream
}
}

Related

Copied DocumentFile has different size and hash to original

I'm attempting to copy / duplicate a DocumentFile in an Android application, but upon inspecting the created duplicate, it does not appear to be exactly the same as the original (which is causing a problem, because I need to do an MD5 check on both files the next time a copy is called, so as to avoid overwriting the same files).
The process is as follows:
User selects a file from an ACTION_OPEN_DOCUMENT_TREE
Source file's type is obtained
New DocumentFile in target location is initialised
Contents of first file is duplicated into second file
The initial stages are done with the following code:
// Get the source file's type
String sourceFileType = MimeTypeMap.getSingleton().getExtensionFromMimeType(contextRef.getContentResolver().getType(file.getUri()));
// Create the new (empty) file
DocumentFile newFile = targetLocation.createFile(sourceFileType, file.getName());
// Copy the file
CopyBufferedFile(new BufferedInputStream(contextRef.getContentResolver().openInputStream(file.getUri())), new BufferedOutputStream(contextRef.getContentResolver().openOutputStream(newFile.getUri())));
The main copy process is done using the following snippet:
void CopyBufferedFile(BufferedInputStream bufferedInputStream, BufferedOutputStream bufferedOutputStream)
{
// Duplicate the contents of the temporary local File to the DocumentFile
try
{
byte[] buf = new byte[1024];
bufferedInputStream.read(buf);
do
{
bufferedOutputStream.write(buf);
}
while(bufferedInputStream.read(buf) != -1);
}
catch (IOException e)
{
e.printStackTrace();
}
finally
{
try
{
if (bufferedInputStream != null) bufferedInputStream.close();
if (bufferedOutputStream != null) bufferedOutputStream.close();
}
catch (IOException e)
{
e.printStackTrace();
}
}
}
The problem that I'm facing, is that although the file copies successfully and is usable (it's a picture of a cat, and it's still a picture of a cat in the destination), it is slightly different.
The file size has changed from 2261840 to 2262016 (+176)
The MD5 hash has changed completely
Is there something wrong with my copying code that is causing the file to change slightly?
Thanks in advance.

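Your copying code is incorrect. It is assuming (incorrectly) that each call to read will either return buf.length bytes or return -1. Whenever read returns fewer bytes, your loop still writes the whole 1024-byte buffer, including stale bytes from the previous read. The numbers are consistent with this: 2262016 is exactly 2209 × 1024, which is what you get if the final partial read of 848 bytes is written as a full buffer (+176).
What you should do is capture the number of bytes read in a variable each time, and then write exactly that number of bytes. Your code for closing the streams is verbose and (in theory; see note 1 below) buggy as well.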
Here is a rewrite that addresses both of those issues, and some others as well.
void copyBufferedFile(BufferedInputStream bufferedInputStream,
BufferedOutputStream bufferedOutputStream)
throws IOException
{
try (BufferedInputStream in = bufferedInputStream;
BufferedOutputStream out = bufferedOutputStream)
{
byte[] buf = new byte[1024];
int nosRead;
while ((nosRead = in.read(buf)) != -1) // read this carefully ...
{
out.write(buf, 0, nosRead);
}
}
}
As you can see, I have gotten rid of the bogus "catch and squash exception" handlers, and fixed the resource leak using Java 7+ try with resources.
There are still a couple of issues:
It is better for the copy function to take file name strings (or File or Path objects) as parameters and be responsible for opening the streams.
Given that you are doing block reads and writes, there is little value in using buffered streams. (Indeed, it might conceivably be making the I/O slower.) It would be better to use plain streams and make the buffer the same size as the default buffer size used by the Buffered* classes .... or larger.
If you are really concerned about performance, try using transferFrom as described here:
https://www.journaldev.com/861/java-copy-file
1 - In theory, if bufferedInputStream.close() throws an exception, the bufferedOutputStream.close() call will be skipped. In practice, it is unlikely that closing an input stream will throw an exception. But either way, the try-with-resources approach deals with this correctly, and far more concisely.
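For ordinary files, the suggestion to pass Path objects and let the copy function open the channels itself might look like the sketch below. The class name ChannelCopy is invented for this example. It assumes both files are reachable through java.nio.file, which is not the case for content URIs such as a DocumentFile; those still need the stream-based loop above.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelCopy {
    /** Copies source to target with FileChannel.transferFrom (hypothetical helper). */
    static void copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                // transferFrom may move fewer bytes than requested, so keep going
                long moved = out.transferFrom(in, position, size - position);
                if (moved <= 0) {
                    break; // source ended early; avoid spinning forever
                }
                position += moved;
            }
        }
    }
}
```

Because the channels are opened inside the method, try-with-resources closes both of them regardless of which step fails.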

Java 7 Deflating Files

I have a piece of code which uses the deflate algorithm to compress a file:
public static File compressOld(File rawFile) throws IOException
{
File compressed = new File(rawFile.getCanonicalPath().split("\\.")[0]
+ "_compressed." + rawFile.getName().split("\\.")[1]);
InputStream inputStream = new FileInputStream(rawFile);
OutputStream compressedWriter = new DeflaterOutputStream(new FileOutputStream(compressed));
byte[] buffer = new byte[1000];
int length;
while ((length = inputStream.read(buffer)) > 0)
{
compressedWriter.write(buffer, 0, length);
}
inputStream.close();
compressedWriter.close();
return compressed;
}
However, I'm not happy with the OutputStream copying loop since it's the "outdated" way of writing to streams. Instead, I want to use a Java 7 API method such as Files.copy:
public static File compressNew(File rawFile) throws IOException
{
File compressed = new File(rawFile.getCanonicalPath().split("\\.")[0]
+ "_compressed." + rawFile.getName().split("\\.")[1]);
OutputStream compressedWriter = new DeflaterOutputStream(new FileOutputStream(compressed));
Files.copy(compressed.toPath(), compressedWriter);
compressedWriter.close();
return compressed;
}
The latter method, however, does not work correctly: the compressed file is corrupted and only a few bytes are copied. Why?

I see mainly two problems.
You copy from the target instead of the source. I think the copying has to be changed to Files.copy(rawFile.toPath(), compressedWriter);.
The Javadoc of copy says: "Note that if the given output stream is Flushable then its flush method may need to invoked after this method completes so as to flush any buffered output." So, you have to call the flush-method of the OutputStream after copy.
Additionally there is one more point. The Javadoc of copy says:
It is strongly recommended that the output stream be promptly closed if an I/O error occurs.
You can close the OutputStream in a finally-block to make sure it happens in case of an error. Another possibility is to use try with resources that was introduced in Java 7.
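Putting these points together, a corrected version might look like the sketch below. It is not the asker's exact method: the class name DeflateCopy is invented here, and it takes both paths as parameters instead of deriving the target name from the source. Closing the DeflaterOutputStream through try-with-resources finishes the compression and flushes the remaining output, even if the copy fails.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.DeflaterOutputStream;

class DeflateCopy {
    /** Deflates rawFile into compressed (hypothetical helper). */
    static void compress(Path rawFile, Path compressed) throws IOException {
        try (OutputStream out = new DeflaterOutputStream(Files.newOutputStream(compressed))) {
            // copy from the source file, not from the target
            Files.copy(rawFile, out);
        } // close() finishes the deflater and flushes the buffered output
    }
}
```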

How to speed up download using Java I/O

I'm quite new to Java I/O, as I have never used it before, and I have written this to download an .mp4 file from www.kissanime.com.
The download is very, very slow at the moment (approximately 70-100kb/s) and I was wondering how I could speed it up. I don't really understand byte buffering, so any help with that would be appreciated. That may be my problem; I'm not sure.
Here's my code:
protected static boolean downloadFile(URL source, File dest) {
try {
URLConnection urlConn = source.openConnection();
urlConn.setConnectTimeout(1000);
urlConn.setReadTimeout(5000);
InputStream in = urlConn.getInputStream();
FileOutputStream out = new FileOutputStream(dest);
BufferedOutputStream bout = new BufferedOutputStream(out);
int fileSize = urlConn.getContentLength();
byte[] b = new byte[65536];
int bytesDownloaded = 0, len;
while ((len = in.read(b)) != -1 && bytesDownloaded < fileSize) {
bout.write(b, 0, len);
bytesDownloaded += len;
// System.out.println((double) bytesDownloaded / 1000000.0 + "mb/" + (double) fileSize / 1000000.0 + "mb");
}
bout.close();
} catch (IOException e) {
e.printStackTrace();
}
return true;
}
Thanks. Any further information will be provided upon request.
I can't find any questions on here related to downloading media files, and I'm sorry if this is deemed to be a duplicate.

Try using IOUtils.toByteArray. It takes an InputStream and returns an array with all its bytes. In my opinion, it's generally a good idea to check common utility packages like Apache Commons and Guava to see whether what you're trying to do has already been done.

If you want to save the contents of an InputStream to a file, use the following Apache Commons IO method:
FileUtils.copyInputStreamToFile()
public static void copyInputStreamToFile(InputStream source,
File destination)
throws IOException
Copies bytes from an InputStream source to a file destination. The directories up to destination will be created if they don't already exist. destination will be overwritten if it already exists. The source stream is closed.
Prefer a library for file and IO handling when one is available. There are other utility methods as well that you can explore:
IOUtils
FileUtils

Turns out that it was the vast number of redirects from the link that caused the download speed to be throttled. Thanks everyone who answered.

JaxRS create and return zip file from server

I want to create and return a zip file from my server using JaxRS. I don't think that I want to create an actual file on the server, if possible I would like to create the zip on the fly and pass that back to the client. If I create a huge zip file on the fly will I run out of memory if too many files are in the zip file?
Also I am not sure the most efficient way to do this. Here is what I was thinking but I am very rusty when it comes to input/output in java.
public Response getFiles() {
// These are the files to include in the ZIP file
String[] filenames = // ... bunch of filenames
byte[] buf = new byte[1024];
try {
// Create the ZIP file
ByteArrayOutputStream baos= new ByteArrayOutputStream();
ZipOutputStream out = new ZipOutputStream(new BufferedOutputStream(baos));
// Compress the files
for (String filename : filenames) {
FileInputStream in = new FileInputStream(filename);
// Add ZIP entry to output stream.
out.putNextEntry(new ZipEntry(filename));
// Transfer bytes from the file to the ZIP file
int len;
while ((len = in.read(buf)) > 0) {
out.write(buf, 0, len);
}
// Complete the entry
out.closeEntry();
in.close();
}
// Complete the ZIP file
out.close();
ResponseBuilder response = Response.ok(out); // Not a 100% sure this will work
response.type(MediaType.APPLICATION_OCTET_STREAM);
response.header("Content-Disposition", "attachment; filename=\"files.zip\"");
return response.build();
} catch (IOException e) {
}
}
Any help would be greatly appreciated.

There are two options:
1- Create the ZIP in a temporary directory and then send it to the client.
2- Use the OutputStream from the Response to send the ZIP directly to the client while you are creating it.
But never build a huge ZIP file in memory.

There's no need to build the whole ZIP file in memory before serving it to the client. There's also no need to create it in a temp directory in advance (especially because the IO might be really slow).
The key is to start streaming the "ZIP response" and generate the content on the fly.
Let's say we have a method aMethodReturningStream() that returns a Stream, and we want to turn each element into a file stored in the ZIP file, without keeping the bytes of every element in an intermediate representation such as a collection or an array.
Then pseudocode like this might help:
@GET
@Produces("application/zip")
public Response generateZipOnTheFly() {
StreamingOutput output = strOut -> {
try (ZipOutputStream zout = new ZipOutputStream(strOut)) {
aMethodReturningStream().forEach(singleStreamElement -> {
try {
ZipEntry zipEntry = new ZipEntry(createFileName(singleStreamElement));
FileTime fileTime = FileTime.from(singleStreamElement.getCreationTime());
zipEntry.setCreationTime(fileTime);
zipEntry.setLastModifiedTime(fileTime);
zout.putNextEntry(zipEntry);
zout.write(singleStreamElement.getBytes());
zout.flush();
} catch (IOException e) {
throw new RuntimeException(e);
}
});
}
};
return Response.ok(output)
.header("Content-Disposition", "attachment; filename=\"generated.zip\"")
.build();
}
This concept relies on passing a StreamingOutput to the Response builder. The StreamingOutput is not a full response/entity/body generated before sending the response, but a recipe used to generate the flow of bytes on the fly (here wrapped in a ZipOutputStream). If you're not sure about this, set a breakpoint on flush() and observe the download progress using e.g. wget.
The key thing to remember is that the stream is not a "wrapper" around pre-computed or pre-fetched items. It must be dynamic, e.g. wrapping a DB cursor or something similar, and it can be replaced by anything that streams data. That's why it cannot be a for-each loop iterating over an Element[] elems array (with each Element holding all its bytes), like
for (Element elem : elems)
if you'd like to avoid reading all items into the heap at once before streaming the ZIP.
(Please note that this is pseudocode; you might want to add better error handling and polish other details.)

Writing in the beginning of a text file Java

I need to write something at the beginning of a text file. I have a text file with content and I want to write something before this content. Say I have:
Good afternoon sir,how are you today?
I'm fine,how are you?
Thanks for asking,I'm great
After modifying, I want it to look like this:
Page 1-Scene 59
25.05.2011
Good afternoon sir,how are you today?
I'm fine,how are you?
Thanks for asking,I'm great
I just made up the content :) How can I modify a text file this way?

You can't really modify it that way - file systems don't generally let you insert data in arbitrary locations - but you can:
Create a new file
Write the prefix to it
Copy the data from the old file to the new file
Move the old file to a backup location
Move the new file to the old file's location
Optionally delete the old backup file
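The steps above can be sketched with java.nio.file roughly as follows. This is a simplified illustration: the class name Prepend is made up, the prefix is assumed to be UTF-8 text, and the backup steps are skipped by moving the new file straight over the old one.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class Prepend {
    /** Writes prefix followed by the old content to a new file, then swaps it in. */
    static void prepend(Path file, String prefix) throws IOException {
        // create the temp file next to the original so the final move stays on one file system
        Path dir = file.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "prepend", ".tmp");
        try (OutputStream out = Files.newOutputStream(temp)) {
            out.write(prefix.getBytes(StandardCharsets.UTF_8)); // write the prefix
            Files.copy(file, out);                              // copy the old data after it
        }
        // replace the old file with the new one (no backup is kept in this sketch)
        Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Files.copy streams the old content, so the whole file is never held in memory.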

In case it is useful to someone, here is the full source code of a method that prepends lines to a file using the Apache Commons IO library. The code does not read the whole file into memory, so it will work on files of any size.
public static void prependPrefix(File input, String prefix) throws IOException {
LineIterator li = FileUtils.lineIterator(input);
File tempFile = File.createTempFile("prependPrefix", ".tmp");
BufferedWriter w = new BufferedWriter(new FileWriter(tempFile));
try {
w.write(prefix);
while (li.hasNext()) {
w.write(li.next());
w.write("\n");
}
} finally {
IOUtils.closeQuietly(w);
LineIterator.closeQuietly(li);
}
FileUtils.deleteQuietly(input);
FileUtils.moveFile(tempFile, input);
}

I think what you want is random access. Check out the related java tutorial. However, I don't believe you can just insert data at an arbitrary point in the file; If I recall correctly, you'd only overwrite the data. If you wanted to insert, you'd have to have your code
copy a block,
overwrite with your new stuff,
copy the next block,
overwrite with the previously copied block,
return to 3 until no more blocks
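A rough sketch of that block-shifting idea with RandomAccessFile is shown below, for the special case of inserting at the very start of the file. The names InPlaceInsert and insertAtStart are invented for this example, and the block size is simply the length of the inserted data. It modifies the file in place, so an interruption halfway through leaves the file corrupted; the temp-file approach is safer.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

class InPlaceInsert {
    /** Inserts prefix at offset 0 by carrying each displaced block forward. */
    static void insertAtStart(Path file, byte[] prefix) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            long originalLength = raf.length();
            byte[] carry = prefix; // data waiting to be written at pos
            long pos = 0;
            while (carry.length > 0) {
                // copy the block that is about to be overwritten (may be shorter near the end)
                int toRead = (int) Math.min(carry.length, Math.max(0L, originalLength - pos));
                byte[] next = new byte[toRead];
                raf.seek(pos);
                raf.readFully(next);
                // overwrite it with the carried data, extending the file if needed
                raf.seek(pos);
                raf.write(carry);
                pos += carry.length;
                carry = next; // the displaced block becomes the next thing to write
            }
        }
    }
}
```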

As @atk suggested, java.nio.channels.SeekableByteChannel is a good interface, but it is only available from Java 1.7 onwards.
Update: If you don't mind using FileUtils, then use
String fileString = FileUtils.readFileToString(file);

This isn't a direct answer to the question, but often files are accessed via InputStreams. If this is your use case, then you can chain input streams via SequenceInputStream to achieve the same result. E.g.
InputStream inputStream = new SequenceInputStream(new ByteArrayInputStream("my line\n".getBytes()), new FileInputStream(new File("myfile.txt")));

I will leave this here in case anyone needs it:
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
try (FileInputStream fileInputStream1 = new FileInputStream(fileName1);
FileInputStream fileInputStream2 = new FileInputStream(fileName2)) {
while (fileInputStream2.available() > 0) {
byteArrayOutputStream.write(fileInputStream2.read());
}
while (fileInputStream1.available() > 0) {
byteArrayOutputStream.write(fileInputStream1.read());
}
}
try (FileOutputStream fileOutputStream = new FileOutputStream(fileName1)) {
byteArrayOutputStream.writeTo(fileOutputStream);
}

