Java: Stream Contents of Zipfile via HTTP

I have a fairly large amount of streamable data (>100 MB) which, for the sake of compression, I would like to host packed in a zip file on an HTTP server. The zip file contains a single file.
Is it possible for a Java client to stream the data via HTTP even though it is packed in a zip file?
According to Wikipedia, ZIPs are not sequentially...
http://en.wikipedia.org/wiki/ZIP_(file_format)#Structure
If this is still possible somehow, then how?
Edit, about gzip: as I said, I use a custom Java client (not a web browser). Is gzip available in the Java HTTP implementation?

Here's a snippet of code (that works) that the client can use to read from the zipped stream:
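import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

static void processZippedInputStream(InputStream in, String entryNameRegex)
        throws IOException
{
    ZipInputStream zin = new ZipInputStream(in);
    ZipEntry ze;
    // getNextEntry() positions the stream at the start of the next entry's data
    while ((ze = zin.getNextEntry()) != null)
    {
        if (ze.getName().matches(entryNameRegex))
        {
            // treat zin as a normal input stream - ie read() from it till "empty" etc
            break;
        }
        zin.closeEntry();
    }
    zin.close();
}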
The main difference with a normal InputStream is iterating through the entries. You may know, for example, that you want the first entry, so no need for the name matching parameter etc.
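For the single-entry case described in the question, the loop can be dropped entirely. The following is a minimal sketch, assuming the archive's first entry is the one you want; the class name FirstZipEntryReader and the method readFirstEntry are made up for illustration, and collecting the result into a byte[] is only done to keep the example short.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of the original answer.
final class FirstZipEntryReader {

    /**
     * Reads the first entry of a ZIP stream and returns its uncompressed bytes.
     * For large data, process each chunk inside the loop instead of collecting it.
     */
    static byte[] readFirstEntry(InputStream in) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry first = zin.getNextEntry();
            if (first == null) {
                throw new IOException("ZIP stream contains no entries");
            }
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            // read() returns -1 at the end of the current entry, not of the whole stream
            while ((n = zin.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
            return result.toByteArray();
        }
    }
}
```

Over HTTP, the argument would be the response body stream, for example the stream returned by URL.openStream().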

Java supports the gzip format with GZIPInputStream (decompressing) and GZIPOutputStream (compressing). Both zip and gzip use the same compression format internally (DEFLATE); the main difference is in the metadata: zip keeps its central directory at the end of the file, gzip keeps its header at the beginning (and gzip only supports one enclosed file easily).
For your use case of streaming one big file, gzip is the better choice, all the more so because you don't need access to the metadata.
I'm not sure whether HttpURLConnection sends Accept-Encoding: gzip and then inflates the content automatically when the server delivers it with Content-Encoding: gzip, but you can certainly do it manually if the server simply sends the .gz file as such (i.e. with Content-Encoding: identity).
(By the way, make sure to read from the stream with buffers that are not too small, as each inflate call has native call overhead, since Java's GZIPInputStream uses the native zlib implementation.)
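Doing the decompression manually could look roughly like the sketch below. It assumes the server delivers the raw .gz bytes; the class name GzipStreamConsumer, the method consume, and the 64 KiB buffer size are illustrative choices, not something prescribed by the JDK.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of the original answer.
final class GzipStreamConsumer {

    /** Decompresses a gzip stream chunk by chunk and returns the number of uncompressed bytes. */
    static long consume(InputStream gzipped) throws IOException {
        long total = 0;
        // The second constructor argument is the internal input buffer size of the inflater stream.
        try (InputStream in = new GZIPInputStream(new BufferedInputStream(gzipped), 64 * 1024)) {
            byte[] buffer = new byte[64 * 1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                // process buffer[0..n) here; this sketch only counts the bytes
                total += n;
            }
        }
        return total;
    }
}
```

Over HTTP, the argument could be something like new URL("http://example.com/data.gz").openStream(), where the URL is a placeholder.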

Would it make more sense to let the web server do the zipping? If you are simply trying to reduce the amount of bandwidth being used, rather than really wanting to store the file zipped up on the server, this would simply be a matter of configuration; for example, see:
http://tomcat.apache.org/tomcat-5.5-doc/config/http.html
for HTTP/1.1 GZIP compression. The server can force the response to the client to be zipped.
See also http://en.wikipedia.org/wiki/HTTP_compression.
The client will receive zipped packets and handle the unzipping. It should be possible to stream the file too, so the client doesn't need all the file before it can do something useful, because the server can zip individual chunks.

Yes, you can. Stream the zip and use application/zip as the MIME type.
If you actually want to stream music for playback on the other end, then it can't be done trivially, as you can only unpack once the entire zip is available on the client.
If size is your concern, you can either turn down your MP3 bit rate or use formats such as Ogg Vorbis.

Use GZIP and then you can stream. Gzip uses the default compression algorithm of zip anyway.

Related

How to completely abort the output stream download?

We're currently working on a service that archives data and returns it to the user as a ZipOutputStream. We're looking for a way to completely terminate the operation if something goes wrong on the server side. With our current implementation (just closing the response output stream), errors result in a malformed zip on the user's side, but the user can't tell whether the archive is malformed before attempting to unzip it. The desired behavior would be something like download termination: in a browser, for instance, it would result in an unsuccessful download indication (a red cross icon or something similar, depending on the browser) explicitly telling the user that something went wrong. We're using Spring Boot, so any Java code examples would really be appreciated, but if you know the underlying HTTP mechanism responsible for this kind of behavior and can point us in the right direction, that would be much appreciated too.
Here's what we have as of now (outputStream is the response output stream of a Spring REST controller, i.e. HttpServletResponse.getOutputStream()):
try (ZipOutputStream zipOutputStream = new ZipOutputStream(outputStream)) {
    try {
        for (ZipRecordFile fileInfo : zipRecord.listZipFileOverride()) {
            InputStream fileStream = getFileStream(fileInfo.s3region(), fileInfo.s3bucket(),
                    fileInfo.s3key());
            ZipEntry zipEntry = new ZipEntry(fileInfo.fileName());
            zipOutputStream.putNextEntry(zipEntry);
            fileStream.transferTo(zipOutputStream);
        }
    }
    catch (Exception e) {
        outputStream.close();
    }
}
There isn't a (clean) way to do what you want:
Once you have started writing the ZIP file to the output stream, it is too late to change the HTTP response code. The response code is sent at the start of response.
Therefore, there is no proper way for the HTTP server to tell the HTTP client: "Hey ... ignore that ZIP file I sent you 'cos it is corrupt".
So what are the alternatives?
1. On the server side, create the entire ZIP as an in-memory object or write it to a temporary file. If you succeed, send a 2xx response followed by the ZIP data. If you fail, send a 4xx or 5xx response.
The main problem is that you need enough memory or file system space to hold the ZIP file.
2. Redesign your HTTP API so that the client can send a second request to check if the first request's response contained a complete ZIP file.
3. You might be able to exploit MIME multipart encoding; see RFC 1341. Each part of a well-formed MIME multipart has a start marker and an end marker. What you could try is to have your web app construct the multipart stream containing the ZIP "by hand". If it decides it must abort the ZIP, it could just close the output stream without adding the required end marker.
The main problem with this is that you are depending on the HTTP stack on the client side to tell the browser (or whatever) that the multipart is corrupted. Furthermore, the browser (or whatever) must not pass the partial (i.e. corrupt) ZIP file on to the user. I'm not sure if you can rely on (particular) web browsers to do that.
4. If you are running the download via custom code on the client side, you could conceivably implement your own encapsulation protocol. The effect would be the same as for 3 ... but you wouldn't be abusing the MIME spec.
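The first alternative (build the whole ZIP before committing any response bytes) can be sketched as follows. This is only an illustration under assumptions: the class name BufferedZipBuilder, the method buildToTempFile, and the Map of entry names to source streams are hypothetical stand-ins for the S3 lookups in the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the original answer.
final class BufferedZipBuilder {

    /**
     * Writes all entries to a temporary ZIP file and returns its path.
     * On failure the partial file is deleted and the exception propagates,
     * so the caller can still choose an error status for the response.
     */
    static Path buildToTempFile(Map<String, InputStream> sources) throws IOException {
        Path tmp = Files.createTempFile("archive-", ".zip");
        try (ZipOutputStream zout = new ZipOutputStream(Files.newOutputStream(tmp))) {
            for (Map.Entry<String, InputStream> source : sources.entrySet()) {
                zout.putNextEntry(new ZipEntry(source.getKey()));
                try (InputStream in = source.getValue()) {
                    copy(in, zout);
                }
                zout.closeEntry();
            }
        } catch (IOException | RuntimeException e) {
            // The ZIP stream is already closed here, so the file can be removed safely.
            Files.deleteIfExists(tmp);
            throw e;
        }
        return tmp;
    }

    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }
}
```

In a controller, the idea would be to call this before touching the response: if it returns, send a 2xx status and copy the file to the response output stream; if it throws, send a 5xx status instead. The cost is the temporary disk space mentioned above.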

Stream multiple files over HttpUrlConnection

My application needs to transfer multiple files to an HTTP server (by opening an OutputStream from HttpURLConnection), but to avoid the overhead of connection establishment we would like to use only one connection. Would this be feasible?
Note: The data is created in real time, so we cannot add the files to one archive file and transfer it in one shot.
Thanks for your advice!
You're over-optimizing. HttpURLConnection already does TCP connection pooling behind the scenes. Just use a new URL, HttpURLConnection, OutputStream, etc., per file.
Having to output more than one file does not prevent you from using an archive format that can be written to your OutputStream in real time; zip is such a format.
The JDK has ZipOutputStream which can help you there; basically you can use it as such (code to set the HTTP headers not shown):
// out is your HttpUrlConnection's OutputStream
try (
    final ZipOutputStream zout = new ZipOutputStream(out);
) {
    addEntries(zout);
}
The addEntries() method would then create ZipEntry instances, one per file, and write the contents.
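One possible shape for addEntries() is sketched below. The answer does not show its body, so everything here is an assumption: the class name ZipEntryWriter is invented, and the extra Map parameter (entry name to file contents) is added only so that the example is self-contained.

```java
import java.io.IOException;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the original answer.
final class ZipEntryWriter {

    /** Writes one ZIP entry per map element; keys are entry names, values are file contents. */
    static void addEntries(ZipOutputStream zout, Map<String, byte[]> files) throws IOException {
        for (Map.Entry<String, byte[]> file : files.entrySet()) {
            zout.putNextEntry(new ZipEntry(file.getKey()));
            zout.write(file.getValue());
            // Finish this entry before the next one starts
            zout.closeEntry();
        }
    }
}
```

Because each entry is written as soon as its data exists, this fits data that is produced in real time; the archive is only completed when the ZipOutputStream is closed.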
Try Apache HttpClient. It supports the HTTP/1.1 keep-alive feature.
Reference: http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html
Fast read: http://en.wikipedia.org/wiki/HTTP_persistent_connection

How can I apply GZIP content-encoding to an HTTP file upload multipart/form-data POST

I have a custom Java console app I am writing to upload non-binary files to a Java app server I own. It performs an HTTPS multipart/form-data POST with the file to a REST API. While it works great for small files, I would like to apply GZIP content-encoding to the POST request so that it handles large files more efficiently.
Is there a Java library I can use to gzip the POST, including the file content, and then unzip it on the other side? I would like to avoid having to zip the file first and would rather rely on HTTP encoding to handle it.
To be pedantic, you wouldn't gzip the entire POST. You would just Gzip the content data, and then in your POST set the Content-Encoding as gzip.
You haven't posted your code (get it???), so some assumptions need to be made to give an example:
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;
...
final String yourData = "butts";
final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
// try-with-resources closes the stream, which also writes the gzip trailer
try (GZIPOutputStream gzipOutputStream = new GZIPOutputStream(byteArrayOutputStream)) {
    gzipOutputStream.write(yourData.getBytes(StandardCharsets.UTF_8));
}
final byte[] gzippedButts = byteArrayOutputStream.toByteArray();
/*
 * Now use the gzipped data as the data in your POST, and also
 * make sure to set the Content-Encoding of your HTTP POST to "gzip".
 */
Edit: Reading the question again, it sounds like OP wants a library that will abstract away all of the handling and just Gzip a request body under the hood. Unfortunately, I am not aware of any such library.
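If holding the whole compressed payload in a ByteArrayOutputStream is a concern for large files, the same GZIPOutputStream can wrap the destination stream directly. The following is a minimal sketch, not a library recommendation; the class name GzipStreamingCopy and the method gzipTo are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the original answer.
final class GzipStreamingCopy {

    /**
     * Compresses everything read from source and writes it to target
     * without buffering the whole payload in memory.
     * The target stream is left open so the caller decides when to close it.
     */
    static void gzipTo(InputStream source, OutputStream target) throws IOException {
        GZIPOutputStream gz = new GZIPOutputStream(target);
        byte[] buffer = new byte[8192];
        int n;
        while ((n = source.read(buffer)) != -1) {
            gz.write(buffer, 0, n);
        }
        // finish() writes the gzip trailer but does not close the underlying stream
        gz.finish();
        target.flush();
    }
}
```

With HttpURLConnection, the target would be the connection's output stream, typically combined with setChunkedStreamingMode(0) and setRequestProperty("Content-Encoding", "gzip"). This assumes the server is set up to decode gzip request bodies, which should be checked rather than taken for granted.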

How to know if downloaded file from url is not complete?

I'm using this great snippet from How to download and save a file from Internet using Java? to download a file from a URL:
URL website = new URL("http://www.website.com/information.asp");
ReadableByteChannel rbc = Channels.newChannel(website.openStream());
FileOutputStream fos = new FileOutputStream("information.html");
fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
But instead of Long.MAX_VALUE, I would prefer to limit the download to 2 MB for security reasons, so I replaced it with
fos.getChannel().transferFrom(rbc, 0, 2097152);
But now I'm wondering how I can handle the case where the file size is greater than 2 MB.
What can I do to check whether the file is corrupt or not?
Have you considered checking the Content-Length header as per the RFC? You could then check if this exceeds some acceptable value -- in your case 2MB -- and reject further processing. You could accomplish this with an initial HTTP HEAD request and then a GET if you're happy, or by reading the headers of just the GET response and proceeding with further streaming if acceptable.
Alternatively (but admittedly ugly), you could use a BufferedReader passing in a buffer of 2MB and comparing that with the headers.
As for corruption, you're better off using a checksum as stated in other comments. Of course, this requires you knowing the checksum for the resource up-front, and is not something you're likely to get from the HTTP response itself.
There are actually two aspects to this Question:
how do you know if you've downloaded the entire file, and
how do you know if what you have downloaded is corrupt.
First thing to note is that if you "chop" the file transfer at 2 MB, then if the apparent transferred file size is 2 MB you can be pretty sure that it won't be complete. (By the looks of it, your current code will give you the bytes after any transfer encoding has been decoded ... which simplifies things.)
Next thing to note is that an HTTP response will often include a Content-Length header that tells the client how many bytes of (transfer encoded) content to expect in the response body. However, that won't tell you if the bytes you actually received (after decoding) are actually correct. (And besides, this header is optional ... you can't rely on it being there.)
As @ato notes, you would be better off checking the Content-Length in the GET (or a HEAD) response before you actually try to read the data.
However, the only sure-fire way to know if you've got a complete / non-corrupt file is to check it against a checksum or (ideally) a crypto-hash that you obtained separately from the transfer. There is no standard way of obtaining a checksum or hash using the HTTP protocol.
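Both checks can be combined in a few lines. The sketch below is illustrative only: the class name LimitedDownload and its methods are hypothetical, SHA-256 is just one reasonable choice of hash, and the expected hash value has to come from somewhere other than the download itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the original answers.
final class LimitedDownload {

    /** Reads the whole stream, but fails as soon as it turns out to hold more than maxBytes. */
    static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            if (out.size() + n > maxBytes) {
                throw new IOException("Download exceeds limit of " + maxBytes + " bytes");
            }
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Returns the SHA-256 digest of data as a lowercase hex string. */
    static String sha256Hex(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Unlike silently truncating at 2 MB, throwing on overflow makes an oversized file distinguishable from a complete one, and comparing sha256Hex(...) against a separately obtained value covers the corruption case.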

The best way to send a file over a Network

I want to send a file to the browser via the REST interface.
Can you suggest the most efficient way to do it, keeping in mind the following?
Not much traffic.
I am fetching the file from HBase, which means that I get it as a byte array.
The files are not in any folder on the server. They can only be fetched from the HBase table.
The front end is PHP and I do not know PHP.
In the REST api you can just pass the byte array to Response and it takes care of itself.
Using the following code -
@Produces("image/jpg")
public Response getImage() {
    // <Fetch it from wherever you have it>
    return Response.ok(<byteArrayOfTheFile>).build();
}
Here is a case study from a web service through which I send files:
It is always good to encode the file content before sending it to the destination, where it will be decoded and read.
Sending a file as an attachment always leaves it open to the world because it is not encrypted. And if the network has high traffic, the chance of failure is high.
