The piece of code below downloads a file from some URL and saves it to a local file. Piece of cake. What could possibly be wrong here?
protected long download(ProgressMonitor monitor) throws Exception {
    long size = 0;
    DataInputStream dis = new DataInputStream(is);
    int read = 0;
    byte[] chunk = new byte[chunkSize];
    while ((read = dis.read(chunk)) != -1) {
        os.write(chunk, 0, read);
        size += read;
        if (monitor != null)
            monitor.worked(read);
    }
    chunk = null;
    dis.close();
    os.flush();
    os.close();
    return size;
}
The reason I am posting a question here is that it works 99.999% of the time but doesn't work as expected whenever there is an antivirus or some other protection software installed on the computer running this code. I am blindly pointing a finger that way because whenever I stop (or disable) it, the code works perfectly again. The end result of such interference is that the MD5 of the downloaded file doesn't match the expected one, and a whole new saga begins.
So, the question is: is it really possible that some smart "protection" software would alter the actual stream coming from the URL without me knowing about it? And if yes, how do you deal with this? (Verified with Kaspersky and Norton products.)
EDIT-1:
Apparently I've got a handle on the problem, and it has nothing to do with antivirus software. The download takes place from an FTP server (FileZilla in particular) and we use the Apache Commons FTP client on the client side. What I did was go to the FTP server and terminate the connection (kick it out) in the middle of the download. I expected that is.read(..) would throw an IOException on the client side, but this never happened. Instead, is.read(..) returns -1, meaning that there is no more data coming from the stream. This is definitely unexpected and explains why I sometimes get partial files. It doesn't explain, however, why the data sometimes gets altered as well.
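Since read() can return -1 on a dropped connection, one way to catch partial files is to check the byte count and digest while copying. The following is only a sketch under assumptions: the class and method names are made up for illustration, and expectedSize and expectedMd5Hex are assumed to come from a trusted side channel such as a manifest.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class VerifiedCopy {

    /**
     * Copies in to out and fails loudly if the stream ended early or the
     * MD5 digest differs. expectedSize and expectedMd5Hex are hypothetical
     * inputs assumed to be known before the download starts.
     */
    static long copyVerified(InputStream in, OutputStream out,
                             long expectedSize, String expectedMd5Hex)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] chunk = new byte[8192];
        long size = 0;
        int read;
        while ((read = in.read(chunk)) != -1) {
            out.write(chunk, 0, read);
            md5.update(chunk, 0, read); // digest exactly the bytes written
            size += read;
        }
        out.flush();
        // A clean -1 does not prove the transfer was complete, so check.
        if (size != expectedSize) {
            throw new IOException("Truncated download: got " + size
                    + " of " + expectedSize + " bytes");
        }
        String actual = toHex(md5.digest());
        if (!actual.equalsIgnoreCase(expectedMd5Hex)) {
            throw new IOException("MD5 mismatch: expected " + expectedMd5Hex
                    + " but got " + actual);
        }
        return size;
    }

    /** Formats a digest as lowercase hex. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

A size mismatch points at truncation, while a matching size with a different digest points at altered content, which helps tell the two failure modes apart.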
Yeah, this happens to me all the time. In my case it's caused by transparent HTTP proxying by Websense on my corporate network. The worst problems are caused by the block page being returned with 200 OK.
Do you get the same or similar corruption every time? E.g., do you get some HTML explaining why the request was blocked? The best you can probably do is compare the first few bytes of the downloaded data to some text in the block page, and throw an exception in this case.
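The "compare the first few bytes" idea could look roughly like the sketch below. The class name, the 64-byte sniff window, and the two HTML markers are assumptions for illustration; a real block page may need a more specific marker, and this check would wrongly reject a download that is legitimately an HTML file.

```java
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class BlockPageSniffer {

    /** How many leading bytes are inspected (an arbitrary choice). */
    static final int SNIFF_BYTES = 64;

    /**
     * Throws if the stream starts with what looks like an HTML page.
     * The stream must have a pushback buffer of at least SNIFF_BYTES so
     * the inspected bytes can be handed back to the caller untouched.
     */
    static void rejectHtml(PushbackInputStream in) throws IOException {
        byte[] head = new byte[SNIFF_BYTES];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r == -1) {
                break; // stream is shorter than the sniff window
            }
            n += r;
        }
        if (n > 0) {
            in.unread(head, 0, n); // put the bytes back for the real copy
        }
        String prefix = new String(head, 0, n, StandardCharsets.ISO_8859_1)
                .trim().toLowerCase(Locale.ROOT);
        if (prefix.startsWith("<!doctype html") || prefix.startsWith("<html")) {
            throw new IOException("Response looks like an HTML block page, not the requested file");
        }
    }
}
```

Usage would be to wrap the download stream once, for example new PushbackInputStream(is, BlockPageSniffer.SNIFF_BYTES), call rejectHtml on it, and then run the normal copy loop on the same wrapped stream.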
Edit: based on your update, have you got the FTP client set to image/binary mode?
I am trying to publish a large video/image file from the local file system to an HTTP path, but I run into an out-of-memory error after some time.
Here is the code:
public boolean publishFile(URI publishTo, String localPath) throws Exception {
    InputStream istream = null;
    OutputStream ostream = null;
    boolean isPublishSuccess = false;
    URL url = makeURL(publishTo.getHost(), this.port, publishTo.getPath());
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    if (conn != null) {
        try {
            conn.setDoOutput(true);
            conn.setDoInput(true);
            conn.setRequestMethod("PUT");
            istream = new FileInputStream(localPath);
            ostream = conn.getOutputStream();
            int n;
            byte[] buf = new byte[4096];
            while ((n = istream.read(buf, 0, buf.length)) > 0) {
                ostream.write(buf, 0, n); //<--- ERROR happens on this line.......???
            }
            int rc = conn.getResponseCode();
            if (rc == 201) {
                isPublishSuccess = true;
            }
        } catch (Exception ex) {
            log.error(ex);
        } finally {
            if (ostream != null) {
                ostream.close();
            }
            if (istream != null) {
                istream.close();
            }
        }
    }
    return isPublishSuccess;
}
Here is the error I am getting:
Exception in thread "Thread-8773" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
    at sun.net.www.http.PosterOutputStream.write(PosterOutputStream.java:61)
    at com.test.HTTPClient.publishFile(HTTPClient.java:110)
    at com.test.HttpFileTransport.put(HttpFileTransport.java:97)
HttpURLConnection is buffering the entire request body in memory so that it can set the Content-Length header (per the HTTP spec).
One alternative, if your destination server supports it, is to use "chunked" transfers. This will buffer only a small portion of data at a time. However, not all services support it (Amazon S3, for example, doesn't).
Another alternative (and imo a better one) is to use Jakarta HttpClient. You can set the "entity" in a request from a file, and the connection code will set request headers appropriately.
Edit: nos commented that the OP could call HttpURLConnection.setFixedLengthStreamingMode(long length). I was unaware of this method; the int variant was added in 1.5 (the long overload followed in Java 7), and I haven't used this class since then.
However, I still suggest using Jakarta HttpClient, for the simple reason that it reduces the amount of code that the OP has to maintain. Code that is boilerplate, yet still has the potential for errors:
The OP correctly handles the loop to copy between input and output. Usually when I see an example of this, the poster either doesn't properly check the returned buffer size, or keeps re-allocating the buffers. Congratulations, but you now have to ensure that your successors take as much care.
The exception handling isn't quite so good. Yes, the OP remembers to close the connections in a finally block, and again, congratulations on that. Except that either of the close() calls could throw IOException, keeping the other from executing. And the method as a whole throws Exception, so that the compiler isn't going to help catch similar errors.
I count 31 lines of code to set up and execute the request (excluding the response code check and the URL computation, but including the try/catch/finally). With HttpClient, this would be somewhere in the range of a half dozen LOC.
Even if the OP had written this code perfectly, and refactored it into methods similar to those in Jakarta Commons IO, s/he shouldn't do that. This code has been written and tested by others. I know that it's a waste of my time to rewrite it, and suspect that it's a waste of the OP's time as well.
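For readers who do keep a hand-written copy loop, the close() problem described above can be avoided with try-with-resources. This is a sketch that assumes Java 7 or later, which is newer than the code in the question; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class SafeCopy {

    /**
     * Copies source to target and returns the number of bytes copied.
     * Both streams are closed in reverse order of declaration even if the
     * copy or one of the close() calls throws; a failure in close() that
     * follows an earlier exception is attached as a suppressed exception.
     */
    static long copy(Path source, Path target) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            byte[] buf = new byte[4096];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
            return total;
        }
    }
}
```

Declaring IOException instead of Exception also lets the compiler point out callers that forget to handle I/O failures.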
conn.setFixedLengthStreamingMode((int) new File(localPath).length()); // the int cast limits this to files under 2 GB; Java 7+ has a long overload
For buffering, you could wrap your streams in BufferedOutputStream and BufferedInputStream.
A good example of chunked uploading can be found in gdata-java-client.
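Put together, a streaming PUT with a known length might look like the sketch below. It assumes Java 7 or later (for the long overload and java.nio.file.Files); the class and method names are hypothetical, and the 201 status check from the question is left to the caller.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

final class StreamingPut {

    /**
     * Uploads the file with PUT and returns the HTTP status code.
     * Declaring the length up front lets HttpURLConnection send the
     * Content-Length header without buffering the whole body in memory.
     */
    static int put(URL url, Path file) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setDoOutput(true);
            conn.setRequestMethod("PUT");
            // Must be called before getOutputStream().
            conn.setFixedLengthStreamingMode(Files.size(file));
            try (InputStream in = Files.newInputStream(file);
                 OutputStream out = conn.getOutputStream()) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

When the length is not known in advance, setChunkedStreamingMode(int) is the corresponding call, provided the server accepts chunked request bodies.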
The problem is that the HttpURLConnection class is using a byte array to store your data. Presumably this video you are pushing is taking more memory than available. You have a few options here:
Increase the memory to your application. You can use the -Xmx1024m option to give 1GB of memory to your application. This will increase the amount of data you can store in memory.
If you still run out of memory, you might want to consider trying another library to push the video up that does not store the data all in memory at once. The Apache Commons HttpClient has such a feature. See this site for more information: http://hc.apache.org/httpclient-3.x/features.html. See this section for multi-part form upload of large files: http://hc.apache.org/httpclient-3.x/methods/multipartpost.html
For anything other than basic GET operations, the built-in java.net HTTP stuff isn't very good. Using Apache Commons HttpClient is recommended for this. It lets you do much more intuitive stuff like this:
HttpClient client = new HttpClient();
PutMethod put = new PutMethod(url);
put.setRequestEntity(new FileRequestEntity(localFile, contentType));
int responseCode = client.executeMethod(put);
which replaces a lot of your boiler-plate code.
HttpsURLConnection#setChunkedStreamingMode(1024 * 1024 * 10); //10MB chunk
This ensures that any file (of any size) is streamed over an HTTPS connection without internal buffering. It should be used when the file size or the content length is unknown.
Your problem is that you're trying to fit X video bytes into X/N bytes of RAM, where N > 1.
You either need to read the video into a smaller buffer and write it out as you go or make the file smaller or increase the memory available to your process.
Check your heap size. You can use -Xmx to increase it if you've taken the default.
I have an application that does a lot of work on S3, mostly downloading files from it. I am seeing a lot of these kinds of errors and I'd like to know if this is something in my code or if the service is really this unreliable.
The code I'm using to read from the S3 object stream is as follows:
public static final void write(InputStream stream, OutputStream output) {
    byte[] buffer = new byte[1024];
    int read = -1;
    try {
        while ((read = stream.read(buffer)) != -1) {
            output.write(buffer, 0, read);
        }
        stream.close();
        output.flush();
        output.close();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
This OutputStream is a new BufferedOutputStream( new FileOutputStream( file ) ). I am using the latest version of the Amazon S3 Java client and this call is retried four times before giving up. So, after trying this for 4 times it still fails.
Any hints or tips on how I could possibly improve this are appreciated.
I just managed to overcome a very similar problem. In my case the exception I was getting was identical; it happened for larger files but not for small files, and it never happened at all while stepping through the debugger.
The root cause of the problem was that the AmazonS3Client object was getting garbage collected in the middle of the download, which caused the network connection to break. This happened because I was constructing a new AmazonS3Client object with every call to load a file, while the preferred use case is to create a long-lasting client object that survives across calls - or at least is guaranteed to be around during the entirety of the download. So, the simple remedy is to make sure a reference to the AmazonS3Client is kept around so that it doesn't get GC'd.
A link on the AWS forums that helped me is here: https://forums.aws.amazon.com/thread.jspa?threadID=83326
For one reason or another, the network is closing the connection before the client has received all the data. That's what is going on.
Part of any HTTP response is the content length. Your code gets the header, which says how many bytes of data will follow, and then the connection drops before the client has read all of them, so the read bombs out with the exception.
I'd look at your OS/network/JVM connection timeout settings (though the JVM generally inherits from the OS in this situation). The key is to figure out which part of the network is causing the problem. Is it a machine-level setting that refuses to wait any longer for packets? Or are you using a non-blocking read with a timeout set in your code, which drops the connection and throws when no data has arrived from the server within that time?
Your best bet is to snoop the packet traffic at a low level and trace backwards to see where the connection drop is happening, or to raise the timeouts in the things you can control, like your software and the OS/JVM.
First of all, your code is operating entirely normally if (and only if) you suffer connectivity troubles between yourself and Amazon S3. As Michael Slade points out, standard connection-level debugging advice applies.
As to your actual source code, I note a few code smells you should be aware of. Annotating them directly in the source:
public static final void write(InputStream stream, OutputStream output) {
    byte[] buffer = new byte[1024]; // !! Abstract 1024 into a constant to make
                                    //    this easier to configure and understand.
    int read = -1;
    try {
        while ((read = stream.read(buffer)) != -1) {
            output.write(buffer, 0, read);
        }
        stream.close(); // !! Unexpected side effects: closing of your passed in
                        //    InputStream. This may have unexpected results if your
                        //    stream type supports reset, and currently carries no
                        //    visible documentation.
        output.flush(); // !! Violation of RAII. Refactor this into a finally block,
        output.close(); //    a la Reference 1 (below).
    } catch (IOException e) {
        throw new RuntimeException(e); // !! Possibly indicative of an outer
                                       //    try-catch block for RuntimeException.
                                       //    Consider keeping this as IOException.
    }
}
(Reference 1)
Otherwise, the code itself seems fine. IO exceptions should be expected occurrences in situations where you're connecting to a fickle remote host, and your best course of action is to draft a sane policy to cache and reconnect in these scenarios.
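A reconnect policy of the kind suggested here could be sketched as a small retry helper with exponential backoff. Everything in it is an assumption for illustration: the names Retry, IoTask, and withBackoff, and the choice to double the delay after each failed attempt. It is not part of the Amazon S3 client.

```java
import java.io.IOException;

final class Retry {

    /** A unit of I/O work that can fail with an IOException. */
    interface IoTask<T> {
        T run() throws IOException;
    }

    /**
     * Runs the task up to maxAttempts times, sleeping between failures and
     * doubling the delay each time. Rethrows the last IOException if every
     * attempt fails.
     */
    static <T> T withBackoff(IoTask<T> task, int maxAttempts, long initialDelayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.run();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // give the remote side time to recover
                    delay *= 2;
                }
            }
        }
        throw last;
    }
}
```

Each attempt should open a fresh object stream and a fresh output file, since a failed attempt leaves both in an unknown state.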
Try using wireshark to see what is happening on the wire when this happens.
Try temporarily replacing S3 with your own web server and see if the problem persists. If it does it's your code and not S3.
The fact that it's random suggests network issues between your host and some of the S3 hosts.
Also, in my experience, S3 may close slow connections.
I would take a very close look at the network equipment nearest your client app. This problem smacks of some network device dropping packets between you and the service. Look to see if there was a starting point when the problem first occurred. Was there any change like a firmware update to a router or replacement of a switch around that time?
Verify your bandwidth usage against the amount purchased from your ISP. Are there times of the day where you're approaching that limit? Can you obtain graphs of your bandwidth usage? See if the premature terminations can be correlated with high-bandwidth usage, particularly if it approaches some known limit. Does the problem seem to pick on smaller files and on large files only when they're almost finished downloading? Purchasing more bandwidth from your ISP may fix the problem.
I am trying to write a server that accepts files and writes them to a certain directory using DataInputStream and BufferedInputStream.
The server receives 'user name (string)', 'number of files (int)', 'file name (string)', 'size of each file (long)' and 'contents of the file, which are uninterpreted bytes (byte[])',
and if everything is successful, I am supposed to send back a boolean value.
But the problem is that it is not receiving the files correctly.
From time to time I get a 'broken pipe' error message, or the file is corrupted after I receive it.
Fixed the problem..
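One small thing which may be related to your problem. You should be decrementing your file size variable by the number of bytes actually read, instead of the number of bytes requested to be read. For the same reason, write only the bytes that were actually read, and stop if the stream ends early:
while (fileSize > 0) {
    if (fileSize < byteSize)
        byteSize = (int) fileSize;
    int byteRead = din.read(b, 0, byteSize);
    if (byteRead == -1)
        throw new EOFException("Sender closed the connection early");
    fos.write(b, 0, byteRead); // write only the bytes actually read
    fileSize -= byteRead;      // <-- See here
}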
You might be getting this error if the sender closes the connection while you are reading the input. It probably has nothing to do with your code. The sender might have timed out, closed the connection before the transfer finished, or any number of other things.
Take a look at this related question: How to fix java.net.SocketException: Broken pipe?
In my Android program, I have some code that downloads a file. This works fine, but since on a cell phone you can be disconnected at any time, I need to change it so it reconnects and resumes the download when you are halfway through and somebody calls, you lose cell reception, etc. I cannot figure out how to detect that the InputStream has stopped working. See the code below:
InputStream in = c.getInputStream();
byte[] buffer = new byte[8024];
int len1 = 0;
while ((len1 = in.read(buffer)) > 0) {
    Log("-" + len1 + "- Downloaded.");
    f.write(buffer, 0, len1);
    Thread.sleep(50);
}
When I lose internet connection, My log shows:
Log: -8024- Downloaded.
Log: -8024- Downloaded.
Log: -8024- Downloaded.
Log: -8024- Downloaded.
Log: -6024- Downloaded. (some lower number)
And then my program just hangs on the while ((len1 = in.read(buffer)) > 0) line. I need to make it so that when the internet connection is lost, I wait for it to come back and then resume the download.
Take a look here: http://developer.android.com/reference/java/nio/channels/SocketChannel.html
EDIT (based on comment):
http://www.jguru.com/faq/view.jsp?EID=72378
So, thoughts based on the above: you might put the reading in a thread and periodically check whether the thread has stopped reading data (probably by updating a shared variable). If it has, kill the connection and the thread and deal with it however you need to.
Another alternative is to not use HttpURLConnection and deal with the bits you need yourself.
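A different technique from the watchdog thread described above is to let the connection itself report the stall: URLConnection.setReadTimeout makes a blocked read() throw SocketTimeoutException, after which the download can be retried with an HTTP Range header. The sketch below assumes the server honours Range requests; the class and method names are hypothetical, and waiting for connectivity to return between attempts is left to the caller.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

final class ResumableDownload {

    /** Builds the Range header value for resuming after the given offset. */
    static String rangeHeader(long bytesAlreadySaved) {
        return "bytes=" + bytesAlreadySaved + "-";
    }

    /**
     * Makes one download attempt, continuing from whatever is already in
     * target. A stalled connection surfaces as SocketTimeoutException (an
     * IOException), so the caller can wait and call this method again.
     */
    static void attempt(URL url, File target, int timeoutMillis) throws IOException {
        long have = target.length(); // 0 if the file does not exist yet
        HttpURLConnection c = (HttpURLConnection) url.openConnection();
        c.setConnectTimeout(timeoutMillis);
        c.setReadTimeout(timeoutMillis);
        if (have > 0) {
            c.setRequestProperty("Range", rangeHeader(have));
        }
        try {
            int code = c.getResponseCode();
            // 206 means the server honoured the range; 200 means it is
            // sending the whole file again, so start over from scratch.
            boolean append = (code == HttpURLConnection.HTTP_PARTIAL);
            if (!append && code != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + code);
            }
            try (InputStream in = c.getInputStream();
                 OutputStream out = new FileOutputStream(target, append)) {
                byte[] buffer = new byte[8192];
                int len;
                while ((len = in.read(buffer)) != -1) {
                    out.write(buffer, 0, len);
                }
            }
        } finally {
            c.disconnect();
        }
    }
}
```

Because a premature end of stream can look like a normal -1, comparing the final file length with the expected size after the last attempt is still advisable.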