I'm trying to write a function that downloads a file from a given URL. The function produces a corrupt file unless I make the buffer an array of size 1 (as it is in the code below).
Both the ternary expression commented out above the buffer initialization (which I plan to use) and any hard-coded buffer size other than 1 produce a corrupted file.
Note: MAX_BUFFER_SIZE is a constant, defined as 8192 (2^13) in my code.
public static void downloadFile(String webPath, String localDir, String fileName) {
    try {
        File localFile;
        FileOutputStream writableLocalFile;
        InputStream stream;
        URL url = new URL(webPath);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        int size = connection.getContentLength(); //File size in bytes
        int read = 0; //Bytes read
        localFile = new File(localDir);
        //Ensure that directory exists, otherwise create it.
        if (!localFile.exists())
            localFile.mkdirs();
        //Ensure that file exists, otherwise create it.
        //Note that if we define the file path as we do below initially and call mkdirs() it will create a folder with the file name (i.e. test.exe). There may be a better alternative; revisit later.
        localFile = new File(localDir + fileName);
        if (!localFile.exists())
            localFile.createNewFile();
        writableLocalFile = new FileOutputStream(localFile);
        stream = connection.getInputStream();
        byte[] buffer;
        int remaining;
        while (read != size) {
            remaining = size - read; //Bytes still to be read
            //remaining > MAX_BUFFER_SIZE ? MAX_BUFFER_SIZE : remaining
            buffer = new byte[1]; //Adjust buffer size according to remaining data (to be read).
            read += stream.read(buffer); //Read buffer-size amount of bytes from the stream.
            writableLocalFile.write(buffer, 0, buffer.length); //Args: bytes to write, offset, number of bytes
        }
        System.out.println("Read " + read + " bytes.");
        writableLocalFile.close();
        stream.close();
    } catch (Throwable t) {
        t.printStackTrace();
    }
}
The reason I've written it this way is so I can show a real-time progress bar to the user while they download. I've removed that code to reduce clutter.
int len = stream.read(buffer); // len may be less than buffer.length
read += len;
writableLocalFile.write(buffer, 0, len);
You must not use buffer.length as the number of bytes to write; you need to use the return value of the read call, because read may return a short read, and then your buffer contains junk (zero bytes or data from previous reads) after the bytes actually read.
And instead of calculating the remaining bytes and sizing the buffer dynamically, just go for a fixed 16 KB buffer or something like that. The last read will simply be short, which is fine.
InputStream.read() may read fewer bytes than you requested, but you always append the whole buffer to the file. You need to capture the actual number of bytes read and append only those bytes to the file.
Additionally:
Watch for InputStream.read() to return -1 (EOF)
Server may return incorrect size. As such, the check read != size is dangerous. I would advise not to rely on the Content-Length HTTP field altogether. Instead, just keep reading from the input stream until you hit EOF.
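Putting that advice together, the download loop might look like this minimal sketch (the 16 KB buffer size is arbitrary, and the progress comment stands in for the removed progress-bar code):

byte[] buffer = new byte[16384]; // fixed-size buffer; the last read will simply be short
long read = 0;
int len;
while ((len = stream.read(buffer)) != -1) { // read until EOF instead of trusting Content-Length
    writableLocalFile.write(buffer, 0, len); // write only the bytes actually read
    read += len;
    // progress hook: if size > 0, the fraction done is read / (double) size
}
System.out.println("Read " + read + " bytes.");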
Related
I've been working on an app that moves files between two hosts, and while I got the transfer process to work (the code is still really messy, sorry about that; I'm still fixing it), I'm left wondering how exactly it handles the buffer. I'm fairly new to networking in Java, so I just don't want to end up with a "meh, I got it to work, so let's move on" attitude.
File sending code:
public void sendFile(String filepath, DataOutputStream dos) throws Exception {
    if (new File(filepath).isFile() && dos != null) {
        long size = new File(filepath).length();
        String strsize = Long.toString(size) + "\n";
        //System.out.println("File size in bytes: " + strsize);
        outToClient.writeBytes(strsize);
        FileInputStream fis = new FileInputStream(filepath);
        byte[] filebuffer = new byte[8192];
        while (fis.read(filebuffer) > 0) {
            dos.write(filebuffer);
            dos.flush();
        }
    }
}
File receiving code:
public void saveFile() throws Exception {
    String size = inFromServer.readLine();
    long longsize = Long.parseLong(size);
    //System.out.println(longsize);
    String tmppath = currentpath + "\\" + tmpdownloadname;
    DataInputStream dis = new DataInputStream(clientSocket.getInputStream());
    FileOutputStream fos = new FileOutputStream(tmppath);
    byte[] filebuffer = new byte[8192];
    int read = 0;
    int remaining = (int) longsize;
    while ((read = dis.read(filebuffer, 0, Math.min(filebuffer.length, remaining))) > 0) {
        //System.out.println(Math.min(filebuffer.length, remaining));
        //System.out.println(read);
        //System.out.println(remaining);
        remaining -= read;
        fos.write(filebuffer, 0, read);
    }
}
I'd like to know how exactly the buffers on both sides are handled so that wrong bytes are not written. (I know how the receiving code avoids that, but I'd still like to know how the byte array is handled.)
Does fis/dis always wait for the buffer to fill up completely? The receiving code always writes either the full array, or the remaining length if that is less than filebuffer.length, but what about fis in the sending code?
In fact, your code could have a subtle bug, exactly because of the way you handle buffers.
When you read a buffer from the original file, the read(byte[]) method returns the number of bytes actually read. There is no guarantee that, in fact, all 8192 bytes have been read.
Suppose you have a file with 10000 bytes. Your first read operation reads 8192 bytes. Your second read operation, however, will only read 1808 bytes. The third operation will return -1.
In the first read, you write exactly the bytes that you have read, because you read a full buffer. But in the second read, your buffer actually contains 1808 correct bytes, and the remaining 6384 bytes are wrong - they are still there from the previous read.
In this case you are lucky, because this only happens in the last buffer that you write. Thus, the fact that you stop reading on your client side when you reach the pre-sent length causes you to skip those 6384 wrong bytes which you shouldn't have sent anyway.
But in fact, there is no actual guarantee that reading from the file will return 8192 bytes even if the end was not reached yet. The method's contract does not guarantee that, and it's up to the OS and underlying file system. It could, for example, send you 5000 bytes in your first read, and 5000 in your second read. In this case, you would be sending 3192 wrong bytes in the middle of the file.
Therefore, your code should actually look like:
byte[] filebuffer = new byte[8192];
int read = 0;
while ((read = fis.read(filebuffer)) > 0) {
    dos.write(filebuffer, 0, read);
    dos.flush();
}
much like the code you have on the receiving side. This guarantees that only the actual bytes read will be written.
So there is nothing actually magical about the way buffers are handled. You give the stream a buffer, you tell it how much of the buffer it's allowed to fill, but there is no guarantee it will fill all of it. It may fill less and you have to take care and use only the portion it tells you it fills.
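As an aside, if you ever genuinely need a completely filled buffer, you don't have to loop by hand: DataInputStream.readFully does exactly that, blocking until the array is full and throwing EOFException if the stream ends first. A minimal sketch, reusing the names from the code above:

DataInputStream dis = new DataInputStream(clientSocket.getInputStream());
byte[] filebuffer = new byte[8192];
dis.readFully(filebuffer); // returns only once all 8192 bytes have arrived, or throws EOFException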
Another grave mistake you are making, though, is to just convert the long that you received into an int in this line:
int remaining = (int)longsize;
Files may be longer than an integer contains. Especially things like long videos etc. This is why you get that number as a long in the first place. Don't truncate it like that. Keep the remaining as long and change it to int only after you have taken the minimum (because you know the minimum will always be in the range of an int).
long remaining = longsize;
long fileBufferLen = filebuffer.length;
while ((read = dis.read(filebuffer, 0, (int) Math.min(fileBufferLen, remaining))) > 0) {
    ...
}
By the way, there is no real reason to use a DataOutputStream and DataInputStream for this. read(byte[]), read(byte[],int,int), write(byte[]), and write(byte[],int,int) are inherited from the underlying InputStream/OutputStream, so there is no reason not to use the socket's OutputStream/InputStream directly, or to wrap them in a BufferedInputStream/BufferedOutputStream. There is also no need to call flush until you have finished writing.
Also, do not forget to close at least your file input/output streams when you are done with them. You may want to keep the socket streams open for continued communication, but there is no need to keep the files themselves open; that may cause problems. Use try-with-resources to guarantee that they are closed.
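For example, a hedged sketch of the receiving side with try-with-resources and the socket stream used directly (variable names follow the code above; the socket stays open for further communication):

try (FileOutputStream fos = new FileOutputStream(tmppath)) { // file stream is closed automatically
    InputStream in = clientSocket.getInputStream(); // not closed here, so the socket stays usable
    byte[] filebuffer = new byte[8192];
    long remaining = longsize;
    int read;
    while (remaining > 0
            && (read = in.read(filebuffer, 0, (int) Math.min((long) filebuffer.length, remaining))) > 0) {
        fos.write(filebuffer, 0, read);
        remaining -= read;
    }
}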
I'm trying to make a file-to-hexadecimal converter (input: a file; output: a hex string of its contents).
The code I came up with is:
static String open2(String path) throws FileNotFoundException, IOException, OutOfMemoryError {
    System.out.println("BEGIN LOADING FILE");
    StringBuilder sb = new StringBuilder();
    //sb.ensureCapacity(2147483648);
    int size = 262144;
    FileInputStream f = new FileInputStream(path);
    FileChannel ch = f.getChannel();
    byte[] barray = new byte[size];
    ByteBuffer bb = ByteBuffer.wrap(barray);
    while (ch.read(bb) != -1) {
        //System.out.println(sb.capacity());
        sb.append(bytesToHex(barray));
        bb.clear();
    }
    System.out.println("FILE LOADED; BRING IT BACK");
    return sb.toString();
}
I am sure that "path" is a valid filename.
The problem is that with big files (>= 500 MB) the JVM throws an OutOfMemoryError: Java heap space at the StringBuilder.append call.
To write this code I followed some tips from http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly, but I hit a snag when I tried to preallocate space for the StringBuilder sb: "2147483648 is too big for an int".
If I want to use this code even with very big files (let's say up to 2 GB, if I really have to stop somewhere), what's the best way, in terms of speed, to output a hexadecimal string conversion of the file?
I'm now working on copying the converted string into a file. However, I'm having a problem: the empty part of the buffer gets written to the file after the EOF of the original one.
static String open3(String path) throws FileNotFoundException, IOException {
    System.out.println("BEGIN LOADING FILE (Hope this is the last change)");
    FileWriter fos = new FileWriter("HEXTMP");
    int size = 262144;
    FileInputStream f = new FileInputStream(path);
    FileChannel ch = f.getChannel();
    byte[] barray = new byte[size];
    ByteBuffer bb = ByteBuffer.wrap(barray);
    while (ch.read(bb) != -1) {
        fos.write(bytesToHex(barray));
        bb.clear();
    }
    fos.close();
    System.out.println("FILE LOADED; BRING IT BACK");
    return "HEXTMP";
}
Obviously the HEXTMP file created has a size that is a multiple of 256 KB, but if the input file is 257 KB the output will be a 512 KB file with a LOT of "000000" at the end.
I know I just have to create a last byte array with the cut-down length.
(I used a FileWriter because I wanted to write the string of hex; otherwise it would have just copied the file as-is.)
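For reference, since FileChannel.read returns the number of bytes it actually read, the final short read can be trimmed with Arrays.copyOf before converting; a minimal sketch reusing the loop above (bytesToHex is the poster's own helper):

int count;
while ((count = ch.read(bb)) != -1) {
    // convert only the first `count` bytes; Arrays.copyOf truncates the array to that length
    fos.write(bytesToHex(java.util.Arrays.copyOf(barray, count)));
    bb.clear();
}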
Why are you loading the complete file?
You can load a few bytes at a time into a buffer from the input file, process the bytes in the buffer, then write the processed bytes to the output file. Continue until all bytes from the input file have been processed.
FileInputStream fis = new FileInputStream("in file");
FileOutputStream fos = new FileOutputStream("out");
byte[] buffer = new byte[8192];
while (true) {
    int count = fis.read(buffer);
    if (count == -1)
        break;
    byte[] processed = processBytesToConvert(buffer, count);
    fos.write(processed);
}
fis.close();
fos.close();
So just read a few bytes into the buffer, convert them to a hex string, get the bytes of the converted hex string, write those bytes back to the file, and continue with the next few input bytes.
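The processBytesToConvert used above is a placeholder; a hypothetical hex-converting version (not from the original post) could look like:

// Hypothetical helper: converts the first `count` bytes of `buffer` into the bytes of a hex string.
static byte[] processBytesToConvert(byte[] buffer, int count) {
    StringBuilder sb = new StringBuilder(count * 2);
    for (int i = 0; i < count; i++) {
        sb.append(String.format("%02x", buffer[i] & 0xff)); // two hex digits per input byte
    }
    return sb.toString().getBytes(java.nio.charset.StandardCharsets.US_ASCII);
}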
The problem here is that you try to read the whole file and store it in memory.
You should use streams: read a chunk of your input file, convert it, and write it to the output file. That way your program can scale, whatever the size of the input file is.
The key is to read the file in chunks instead of reading all of it in one go. Depending on the use case you can vary the chunk size. For example, if you are building a hex viewer/editor, determine how much content is shown in the viewport and read only that much data from the file. If you are simply converting and dumping hex to another file, use any chunk size that is small enough to fit in memory but big enough for performance; this should be tunable over a few runs. Perhaps use the file-system NIO additions in Java 7 so that you can do all three tasks (reading, processing, and writing) concurrently. The link included in the question gives a good primer on reading files.
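On the Java 7 point, a simple sequential (not concurrent) chunked version using the java.nio.file API might look like this sketch (it assumes the hypothetical processBytesToConvert helper sketched above; imports java.io.* and java.nio.file.*):

try (InputStream in = Files.newInputStream(Paths.get(path));
     OutputStream out = Files.newOutputStream(Paths.get("HEXTMP"))) {
    byte[] chunk = new byte[262144]; // chunk size is tunable
    int count;
    while ((count = in.read(chunk)) != -1) {
        out.write(processBytesToConvert(chunk, count)); // convert only the bytes actually read
    }
}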
I'm implementing a file transfer server, and I've run into an issue sending a file larger than 2 GB over the network. The issue starts when I get the File I want to work with and try to read its contents into a byte[]. I have a for loop:
for (long i = 0; i < fileToSend.length(); i += PACKET_SIZE) {
    fileBytes = getBytesFromFile(fileToSend, i);
where getBytesFromFile() reads PACKET_SIZE bytes from fileToSend, which are then sent to the client inside the loop. getBytesFromFile() uses i as an offset; however, the offset parameter of FileInputStream.read() has to be an int. I'm sure there is a better way to read this file into the array; I just haven't found it yet.
I would prefer to not use NIO yet, although I will switch to using that in the future. Indulge my madness :-)
It doesn't look like you're reading data from the file properly. When reading data from a stream in Java, it's standard practice to read data into a buffer. The size of the buffer can be your packet size.
File fileToSend = //...
InputStream in = new FileInputStream(fileToSend);
OutputStream out = //...

byte[] buffer = new byte[PACKET_SIZE];
int read;
while ((read = in.read(buffer)) != -1) {
    out.write(buffer, 0, read);
}
in.close();
out.close();
Note that the size of the buffer array remains constant. But if the buffer cannot be filled (as when the end of the file is reached), the remaining elements of the array will still contain data from the previous read, so you must ignore those elements (this is what the out.write(buffer, 0, read) line in the code sample does).
Umm, realize that your handling of the variable i is not correct:

Iteration 0: i = 0
Iteration 1: i = PACKET_SIZE
...
Iteration n: i = PACKET_SIZE * n
I'm writing an Android application which copies files from the assets to a single file on the device's storage (no permission problems; the bytes do get from the assets to the storage). The file that I need to copy is larger than 1 MB, so I split it up into multiple asset files, and I copy them with something like:
try {
    out = new FileOutputStream(destination);
    for (InputStream file : files /* InputStreams from assets */) {
        copyFile(file);
        file.close();
    }
    out.close();
    System.out.println(bytesCopied); // shows 8716288
    System.out.println(new File(destination).length()); // shows 8749056
} catch (IOException e) {
    Log.e("ERROR", "Cannot copy file.");
    return;
}
Then, the copyFile() method:
private void copyFile(InputStream file) throws IOException {
    byte[] buffer = new byte[16384];
    int length;
    while ((length = file.read(buffer)) > 0) {
        out.write(buffer);
        bytesCopied += length;
        out.flush();
    }
}
The correct total number of bytes that the destination file should contain is 8716288 (that's what I get when I look at the original files and when I count the written bytes in the Android application), but new File(destination).length() shows 8749056.
What am I doing wrong?
The file becomes too large because you are not writing length bytes on each write; you are actually writing the whole buffer each time, which is buffer.length bytes long.
You should use the write(byte[] b, int off, int len) overload instead, to specify how many bytes in the buffer you want to be written on each iteration.
Didn't you mean to write
out.write(buffer, 0, length);
instead of
out.write(buffer);
Otherwise you would always write the complete buffer, even if fewer bytes were read. This may then lead to a larger file (filled with some garbage between your original data).
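Putting both corrections together, a fixed copyFile might look like this sketch (a single flush at the end suffices):

private void copyFile(InputStream file) throws IOException {
    byte[] buffer = new byte[16384];
    int length;
    while ((length = file.read(buffer)) > 0) {
        out.write(buffer, 0, length); // write only the bytes actually read
        bytesCopied += length;
    }
    out.flush(); // one flush at the end is enough
}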
I am currently analyzing firmware images which contain many different sections, one of which is a GZIP section.
I am able to know the location of the start of the GZIP section using magic number and the GZIPInputStream in Java.
However, I need to know the compressed size of the gzip section; GZIPInputStream only gives me the uncompressed data, and hence the uncompressed size.
Does anybody have an idea?
You can count the number of bytes read using a custom InputStream. You would need to force the stream to read one byte at a time to ensure you don't read more than you need.
You can wrap your current InputStream in this
class CountingInputStream extends InputStream {
    final InputStream is;
    int counter = 0;

    public CountingInputStream(InputStream is) {
        this.is = is;
    }

    @Override
    public int read() throws IOException {
        int read = is.read();
        if (read >= 0) counter++;
        return read;
    }
}
and then wrap it in a GZIPInputStream. The field counter will hold the number of bytes read.
To use this with BufferedInputStream you can do
InputStream is = new BufferedInputStream(new FileInputStream(filename));
// read some data or skip to where you want to start.
CountingInputStream cis = new CountingInputStream(is);
GZIPInputStream gzis = new GZIPInputStream(cis);
// read some compressed data
cis.read(...);
int dataRead = cis.counter;
In general, there is no easy way to tell the size of the gzipped data, other than just going through all the blocks.
gzip is a stream compression format, meaning that all the compressed data is written in a single pass. There is no way to stash the compressed size anywhere: it can't be in the header, since that would require more than one pass, and it's useless to put it in the trailer, since if you can locate the trailer, you already know the compressed size.