How to know if a file downloaded from a URL is incomplete? - java

I'm using this great snippet from How to download and save a file from the Internet using Java? to download a file from a URL:
URL website = new URL("http://www.website.com/information.asp");
ReadableByteChannel rbc = Channels.newChannel(website.openStream());
FileOutputStream fos = new FileOutputStream("information.html");
fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
But instead of Long.MAX_VALUE, I'd prefer to limit the download to 2 MB for security reasons, so I replaced it with
fos.getChannel().transferFrom(rbc, 0, 2097152);
But now I'm wondering: how can I handle the case where the file size is greater than 2 MB?
What can I do to check if the file is corrupt or not?

Have you considered checking the Content-Length header as per the RFC? You could then check whether it exceeds some acceptable value, in your case 2 MB, and reject further processing. You could accomplish this with an initial HTTP HEAD request and then a GET if you're happy, or by reading the headers of just the GET response and proceeding with further streaming only if acceptable.
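A minimal sketch of the HEAD-first idea, assuming java.net.HttpURLConnection; treating a missing Content-Length as unacceptable is a policy choice here, not a rule:

import java.net.HttpURLConnection;
import java.net.URL;

public class SizeCheck {
    static final long MAX_BYTES = 2L * 1024 * 1024; // the 2 MB cap from the question

    // Issue a HEAD request and inspect Content-Length before doing the GET.
    // getContentLengthLong() returns -1 when the server omits the header.
    static boolean acceptableSize(URL url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("HEAD");
        try {
            long length = conn.getContentLengthLong();
            return length >= 0 && length <= MAX_BYTES;
        } finally {
            conn.disconnect();
        }
    }
}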
Alternatively (but admittedly ugly), you could use a BufferedReader, passing in a buffer of 2 MB, and compare that with the headers.
As for corruption, you're better off using a checksum as stated in other comments. Of course, this requires knowing the checksum for the resource up front, and it is not something you're likely to get from the HTTP response itself.

There are actually two aspects to this question:
1. how do you know if you've downloaded the entire file, and
2. how do you know if what you have downloaded is corrupt.
The first thing to note is that if you "chop" the file transfer at 2 MB, then an apparent transferred file size of exactly 2 MB is a strong sign that the file is incomplete. (By the looks of it, your current code gives you the bytes after any transfer encoding has been decoded, which simplifies things.)
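If you stay with transferFrom, a cheap truncation test (a sketch reusing rbc and fos from the question) is to ask for one byte more than the limit and inspect how much actually arrived:

long limit = 2L * 1024 * 1024; // the 2 MB cap
long copied = 0;
long n;
// transferFrom may return early, so loop until EOF or until the cap is passed.
while (copied <= limit
        && (n = fos.getChannel().transferFrom(rbc, copied, (limit + 1) - copied)) > 0) {
    copied += n;
}
if (copied > limit) {
    // More than 2 MB arrived: reject (and delete) the file rather than
    // keeping a silently truncated copy.
}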
The next thing to note is that an HTTP response will often include a Content-Length header that tells the client how many bytes of (transfer-encoded) content to expect in the response body. However, that won't tell you whether the bytes you actually received (after decoding) are correct. (And besides, this header is optional; you can't rely on it being there.)
As @ato notes, you would be better off checking the Content-Length in the GET (or a HEAD) response before you actually try to read the data.
However, the only sure-fire way to know if you've got a complete / non-corrupt file is to check it against a checksum or (ideally) a crypto-hash that you obtained separately from the transfer. There is no standard way of obtaining a checksum or hash using the HTTP protocol.
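For illustration, a sketch of the checksum approach using the JDK's MessageDigest; the expected SHA-256 hex digest has to come from somewhere other than the download itself:

import java.io.InputStream;
import java.net.URL;
import java.security.DigestInputStream;
import java.security.MessageDigest;

public class HashCheck {
    // Hash the bytes while downloading, then compare against the known digest.
    static boolean verify(URL url, String expectedHex) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new DigestInputStream(url.openStream(), md)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // reading is enough: the DigestInputStream updates md as a side effect
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString().equalsIgnoreCase(expectedHex);
    }
}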

Related

How to completely abort the output stream download?

We're currently working on a service that archives data and returns it to the user as a ZipOutputStream. What we're looking for is a way to completely terminate the operation if something goes wrong on the server side. With our current implementation (just closing the response output stream), errors result in a malformed zip on the user's side, and there is no way to tell whether the archive is malformed before attempting to unzip it. The desired behavior would be something like a download termination: from a browser perspective, for instance, it would result in an unsuccessful-download indication (a red cross icon or similar, depending on the browser) explicitly telling the user that something went wrong. We're using Spring Boot, so any Java code examples would really be appreciated, but if you know the underlying HTTP mechanism responsible for this kind of behavior and can point us in the right direction, that would be much appreciated too.
Here's what we have as of now (output being the response output stream of a Spring REST controller, i.e. HttpServletResponse.getOutputStream()):
try (ZipOutputStream zipOutputStream = new ZipOutputStream(outputStream)) {
    try {
        for (ZipRecordFile fileInfo : zipRecord.listZipFileOverride()) {
            InputStream fileStream = getFileStream(fileInfo.s3region(), fileInfo.s3bucket(),
                    fileInfo.s3key());
            ZipEntry zipEntry = new ZipEntry(fileInfo.fileName());
            zipOutputStream.putNextEntry(zipEntry);
            fileStream.transferTo(zipOutputStream);
        }
    }
    catch (Exception e) {
        outputStream.close();
    }
}
There isn't a (clean) way to do what you want:
Once you have started writing the ZIP file to the output stream, it is too late to change the HTTP response code. The response code is sent at the start of the response.
Therefore, there is no proper way for the HTTP server to tell the HTTP client: "Hey ... ignore that ZIP file I sent you 'cos it is corrupt".
So what are the alternatives?
1. On the server side, create the entire ZIP as an in-memory object or write it to a temporary file. If you succeed, send a 2xx response followed by the ZIP data; if you fail, send a 4xx or 5xx response. The main problem is that you need enough memory or file system space to hold the ZIP file. (A sketch of this approach follows the list.)
2. Redesign your HTTP API so that the client can send a second request to check whether the first request's response contained a complete ZIP file.
3. You might be able to exploit MIME multipart encoding; see RFC 1341. Each part of a well-formed MIME multipart has a start marker and an end marker. What you could try is to have your web-app construct the multipart stream containing the ZIP "by hand". If it decides it must abort the ZIP, it could just close the output stream without adding the required end marker. The main problem with this is that you are depending on the HTTP stack on the client side to tell the browser (or whatever) that the multipart is corrupted. Furthermore, the browser (or whatever) must not pass the partial (i.e. corrupt) ZIP file on to the user. I'm not sure if you can rely on (particular) web browsers to do that.
4. If you are running the download via custom code on the client side, you could conceivably implement your own encapsulation protocol. The effect would be the same as for 3, but you wouldn't be abusing the MIME spec.
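A sketch of option 1 with a temporary file; zipRecord, getFileStream and ZipRecordFile are taken from the question, while the mapping path and everything else here is hypothetical:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import javax.servlet.http.HttpServletResponse;
import org.springframework.web.bind.annotation.GetMapping;

@GetMapping("/archive") // hypothetical mapping
public void downloadArchive(HttpServletResponse response) throws Exception {
    Path tmp = Files.createTempFile("archive", ".zip");
    try {
        // Build the whole ZIP before touching the response, so a failure here
        // can still be turned into a clean error status.
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(tmp))) {
            for (ZipRecordFile fileInfo : zipRecord.listZipFileOverride()) {
                zos.putNextEntry(new ZipEntry(fileInfo.fileName()));
                try (InputStream in = getFileStream(fileInfo.s3region(),
                        fileInfo.s3bucket(), fileInfo.s3key())) {
                    in.transferTo(zos);
                }
                zos.closeEntry();
            }
        }
        // Only now commit to a 200: the ZIP is known to be complete.
        response.setStatus(HttpServletResponse.SC_OK);
        response.setContentType("application/zip");
        response.setContentLengthLong(Files.size(tmp));
        Files.copy(tmp, response.getOutputStream());
    } catch (Exception e) {
        response.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR);
    } finally {
        Files.deleteIfExists(tmp);
    }
}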

Streaming [Random Access] encrypted (AES-CTR) video on the fly using web proxy (nanoHTTPD)

I have an encrypted (128-AES-CTR-NoPadding) video residing on a server, which I need to decrypt as it downloads so that the user can stream it (in normal players/web).
I understand the components of this solution and how they should be put together to make this work. It partially works, but I just can't get the streaming right. I have been reading and learning from examples on this for the past week (most of which play a file on disk, which is not the case here) and have come to the conclusion that this is beyond me and I need some help.
Details
I am using a lightweight web server (nanoHttpd) acting as a proxy to download the encrypted data from the remote server and serve decrypted data. Below is the main code inside my NanoHTTPD.serve method.
// create urlConnection to the encrypted video file with proper headers (i.e. range headers) as per the request received by the proxy server
InputStream inputStream = new CipherInputStream(cipher, urlConnection.getInputStream());
return newChunkedResponse(status, contentType, inputStream);
So now, if I go to my NanoHttpd web server (http://localhost:9000), the file starts downloading, and after the download completes, the file is fully decrypted and playable as expected.
So this ensures that getting encrypted data from the server and serving decrypted data is working correctly.
But when any video player (HTML5, VLC) is asked to stream the video from that URL, it simply does not work.
If the above code in NanoHTTPD.serve is changed to
// create urlConnection to the clear-data video file with proper headers (i.e. range headers) as per the request received by the proxy server
InputStream inputStream = urlConnection.getInputStream();
return newChunkedResponse(status, contentType, inputStream);
and streaming is attempted from the aforementioned players, it works just fine.
So this ensures that the web proxy is correctly retrieving and feeding data.
Potential problem
To support range requests from the video player, we need to correctly skip to a block boundary that is a multiple of the cipher block size. So it's possible that when the video player requests data with a header like (range: bytes 34-44), the CipherInputStream fails to decrypt the data, since the input stream only contains the bytes from 34-44. But I am at a loss on how to do this with urlConnection.getInputStream() and CipherInputStream.
But even without this, it should at least start playing the first few seconds, because the first request the video player sends is (range: 0-), which means the input stream starts at index 0, so CipherInputStream should be able to decrypt and serve those initial bytes and the video should start playing.
I am at a complete loss because I don't know how to debug this. Any ideas or sample code are welcome; I'll try them out and post the results here.
I have figured this out. I'll post the solution here for others.
The problem here was the ranged requests. If the proxy does not send proper responses to these range requests, playback will fail. This can happen for a number of reasons:
1. Your requests to the remote server are missing proper range headers.
2. Your requests to the remote server return proper ranged data, but you are not decrypting it correctly. This was my case. The decryption process will of course vary from cipher to cipher. For me, using AES/CTR/NoPadding, the fix was to supply the correct IV for the offset. How to calculate the IV for an offset is described here.
As far as code samples go, I only had to add one line before
InputStream inputStream = new CipherInputStream(cipher, urlConnection.getInputStream());
return newChunkedResponse(status, contentType, inputStream);
which is
jumpToOffset(cipher,....);
After this, everything was working correctly, including seeking within the video.
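The answer doesn't show the body of jumpToOffset, but for AES/CTR the usual trick is to treat the IV as a big-endian counter, add offset / blockSize to it, and then discard the remainder of the partial block. A hypothetical sketch (the method name and signature are made up, not the answer's actual code):

import java.math.BigInteger;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Position an AES/CTR cipher at an arbitrary byte offset. Assumes a 16-byte
// block and an IV that is used as the initial big-endian counter value.
static Cipher jumpToOffset(SecretKey key, byte[] initialIv, long offset) throws Exception {
    final int blockSize = 16;
    BigInteger counter = new BigInteger(1, initialIv)
            .add(BigInteger.valueOf(offset / blockSize));
    byte[] counterBytes = counter.toByteArray();
    byte[] iv = new byte[blockSize];
    int n = Math.min(blockSize, counterBytes.length);
    // Right-align the counter into the 16-byte IV (overflow beyond 2^128 is dropped).
    System.arraycopy(counterBytes, counterBytes.length - n, iv, blockSize - n, n);
    Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
    cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
    // Consume and discard the keystream for the partial block before the offset.
    cipher.update(new byte[(int) (offset % blockSize)]);
    return cipher;
}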

HttpServletRequest.getInputStream() does not unwrap chunked HTTP request

I am in the process of sending an HTTP chunked request to an internal system. I've confirmed other factors are not at play by ensuring that I can send small messages without chunked encoding.
My process was basically to change the Transfer-Encoding header to chunked and remove the Content-Length header. Additionally, I am utilising an in-house ChunkedOutputStream which has been around for quite some time.
I am able to connect, obtain an output stream and send the data. The recipient then returns a 200 response so it seems the request was received and successfully handled. The endpoint receives the HTTP Request, and streams the data straight into a table (using HttpServletRequest.getInputStream()).
On inspecting the streamed data, I can see that the chunked encoding information in the stream has not been unwrapped/decoded by the Tomcat container automatically. I've been trawling the Tomcat HTTP Connector documentation and can't find anything that alludes to how a chunk-encoded message should be handled within an HttpServlet. I can't see other StackOverflow questions about this, so I suspect I am missing something basic.
My question boils down to:
Should Tomcat automatically decode the chunked encoding from my request and give me a "clean" InputStream when I call HttpServletRequest.getInputStream()?
If yes, is there configuration that needs to be updated to enable this functionality? Or am I sending something wrong in the headers that is causing it to return the non-decoded stream?
If no, is it common practice to wrap the input stream in a ChunkedInputStream or something similar when the Transfer-Encoding header is present?
This is solved. As expected, it was something basic in my case.
The legacy system I was using provided hand-rolled methods to simplify the process of opening an HTTP connection, sending the headers and then using an OutputStream to send the content via a POST. I didn't realise (it was in a rather obscure location) that the behind-the-scenes helpers were detecting that I was not specifying a Content-Length, and so they added the Transfer-Encoding: chunked header and wrapped the OutputStream in a ChunkedOutputStream. This resulted in me double-encoding the contents, hence my endpoint's (seeming) inability to decode it.
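For reference, a minimal sketch of the fixed approach using the plain JDK client (the endpoint URL and payload are made up): let the HTTP stack do the chunk framing and write only raw bytes yourself.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ChunkedPost {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8080/upload"); // hypothetical endpoint
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        // The JDK adds Transfer-Encoding: chunked and the chunk framing itself;
        // wrapping the stream in another chunked encoder would double-encode.
        conn.setChunkedStreamingMode(0);
        try (OutputStream out = conn.getOutputStream()) {
            out.write("payload".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}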
Case closed.

Unable to download more than 16144 characters when requesting JSON from an API

I have an Android application which downloads its information as JSON.
A typical JSON download is about 2,000-3,000 characters. But I wanted to stress it, so I created a larger file (~48,000 characters). As files go this is still small, under 50 KB.
The problem I have is that when downloading I only get 16144 characters of data. That is, reader.readLine() returns just one line containing 16144 characters, as does client.execute(request, new BasicResponseHandler());. Obviously, with only part of the file, my JSON parsing code fails quickly as it's not a valid JSON object.
There are no exceptions raised, so it's not an out-of-memory error. And the problem is repeatable on an HTC Desire (2.2) and a Galaxy Nexus (4.1.1), so it's not OS-specific either. I've tested the URL in a web browser and it works fine; all the JSON is available, so it's not a server error.
Question
Can anyone point out why it is downloading only 16144 characters, and how to make it download the whole file?
Method #1
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(uri);
HttpResponse response = client.execute(request);
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null)
{
    str.append(line);
}
in.close();
result.setJSONResult(str.toString());
Method #2
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(uri);
HttpResponse response = client.execute(request);
String json = client.execute(request, new BasicResponseHandler());
result.setJSONResult(json);
Note: the URL is on a LAN (http://192.168.0.99:8080...), so I've not included it as it won't be useful.
Update - Fixed
Fixed the problem. In the end I put it down to a file-transfer issue rather than memory limits of the phone. Whilst it worked on a PC (Chrome), I found it was broken in places other than Android: the website and other browsers (Safari) didn't work with the raw API call either. The underlying problem was the web server's proxy, nginx, which wanted to buffer larger responses (over 32 KB) but never had write permissions on the server folders it used for buffering. This meant it sent part of the file, started to buffer, and hit a critical error due to being unable to write. When it errored, it stopped sending the rest of the file, hence it stopping at an unusual number of bytes. Thanks for all your help!
It's because that's the maximum size a String can hold: either 2147483647 (2^31 - 1, the maximum size of an array by the Java specification; the String class uses an array for internal storage) or half your maximum heap size (since each character is two bytes), whichever is smaller.
And the heap size will probably be less than 40 KB.
You can use JsonReader instead of using a String to store the data from the web; see http://developer.android.com/reference/android/util/JsonReader.html
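A sketch of that approach, assuming the payload is a JSON array of objects (the "name" field is made up):

import android.util.JsonReader;
import java.io.InputStream;
import java.io.InputStreamReader;

// Parse the response as a stream so no single String ever has to hold the
// whole payload. Adjust the begin/end calls to match the real JSON shape.
static void readItems(InputStream in) throws Exception {
    JsonReader reader = new JsonReader(new InputStreamReader(in, "UTF-8"));
    try {
        reader.beginArray();
        while (reader.hasNext()) {
            reader.beginObject();
            while (reader.hasNext()) {
                if ("name".equals(reader.nextName())) { // hypothetical field
                    System.out.println(reader.nextString());
                } else {
                    reader.skipValue();
                }
            }
            reader.endObject();
        }
        reader.endArray();
    } finally {
        reader.close();
    }
}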
You are using a line-based reader to read data that is not line-based. When you call readLine, you are asking it to forcefully convert whatever it reads into a line of text. This mangles the data if it is not in fact a line of text.

Java: Stream Contents of Zipfile via HTTP

I have quite a large amount of streamable data (>100 MB) which, for the sake of compression, I would like to host packed in a zipfile on an HTTP server. So this zipfile contains a single file.
Now, is it possible for a Java client to stream the data via HTTP, even though it is packed in a zipfile?
According to Wikipedia, ZIP files are not laid out for sequential reading (the central directory sits at the end):
http://en.wikipedia.org/wiki/ZIP_(file_format)#Structure
If this is still possible somehow, then how?
Edit, about gzip: as I said, I use a custom Java client (not a web browser). Is gzip available in the Java HTTP implementation?
Here's a snippet of code (that works) that the client can use to read from the zipped stream:
static void processZippedInputStream(InputStream in, String entryNameRegex)
        throws IOException
{
    ZipInputStream zin = new ZipInputStream(in);
    ZipEntry ze;
    while ((ze = zin.getNextEntry()) != null)
    {
        if (ze.getName().matches(entryNameRegex))
        {
            // treat zin as a normal input stream - ie read() from it till "empty" etc
            break;
        }
        zin.closeEntry();
    }
    zin.close();
}
The main difference from a normal InputStream is iterating through the entries. You may know, for example, that you want the first entry, so there is no need for the name-matching parameter, etc.
Java supports the gzip format with GZIPInputStream (decompressing) and GZIPOutputStream (compressing). Both zip and gzip use the same compression format internally; the main difference is in the metadata: zip has it at the end of the file, gzip at the beginning (and gzip only easily supports one enclosed file).
For your case of streaming one big file, gzip is the better thing to do, even more so as you don't need access to the metadata.
I'm not sure whether HttpURLConnection sends Accept-Encoding: gzip and then handles inflating the content automatically if the server delivers it with Content-Encoding: gzip, but you can surely do it manually if the server simply sends the .gz file as such (i.e. with Content-Encoding: identity).
(By the way, make sure to read from the stream with buffers that are not too small, as each deflate call has native-call overhead, since Java's GZIPInputStream uses the native zlib implementation.)
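Putting those notes together, a sketch of the manual variant (the URL is made up, and the server is assumed to send the .gz bytes unmodified):

import java.io.BufferedInputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

public class GzipHttpStream {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://example.com/data.gz"); // hypothetical
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Decompress on the fly; the buffered wrapper keeps the number of
        // native inflate calls down, as noted above.
        try (InputStream in = new GZIPInputStream(
                new BufferedInputStream(conn.getInputStream(), 64 * 1024))) {
            byte[] buf = new byte[64 * 1024];
            long total = 0;
            for (int n; (n = in.read(buf)) != -1; ) {
                total += n; // hand the decompressed bytes to the consumer here
            }
            System.out.println("Decompressed " + total + " bytes");
        }
    }
}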
Would it make more sense to let the web server do the zipping? If you are simply trying to reduce the amount of bandwidth used, rather than really wanting to store the file zipped on the server, this is simply a matter of configuration; for example, see:
http://tomcat.apache.org/tomcat-5.5-doc/config/http.html
for HTTP/1.1 GZIP compression. The server can force the response to the client to be zipped.
See also http://en.wikipedia.org/wiki/HTTP_compression.
The client will receive compressed packets and handle the decompression. It should be possible to stream the file too, so the client doesn't need the whole file before it can do something useful, because the server can compress individual chunks.
Yes you can: stream the zip and use the MIME type application/zip.
If you actually want to play the streamed music on the other end, then it can't be done trivially, as you can only unpack once the entire zip is available on the client.
If size is your concern, you can either turn down your MP3 bit-rate or use formats such as Ogg Vorbis.
Use GZIP and then you can stream. Gzip uses the default compression algorithm of zip (DEFLATE) anyway.
