Counted bytes and total bytes differ - java

I'm writing an Android application which copies files from the assets to one file on the device's drive (no permission problems, bytes get from the assets to the drive). The file that I need to copy is larger than 1 MB, so I split it up into multiple files, and I copy them with something like:
try {
    out = new FileOutputStream(destination);
    for (InputStream file : files /* InputStreams from assets */) {
        copyFile(file);
        file.close();
    }
    out.close();
    System.out.println(bytesCopied); // shows 8716288
    System.out.println(new File(destination).length()); // shows 8749056
} catch (IOException e) {
    Log.e("ERROR", "Cannot copy file.");
    return;
}
Then, the copyFile() method:
private void copyFile(InputStream file) throws IOException {
    byte[] buffer = new byte[16384];
    int length;
    while ((length = file.read(buffer)) > 0) {
        out.write(buffer);
        bytesCopied += length;
        out.flush();
    }
}
The correct number of total bytes that the destination file should contain is 8716288 (that's what I get when I look at the original files and if I count the written bytes in the Android application), but new File(destination).length() shows 8749056.
What am I doing wrong?

The file becomes too large because you are not writing length bytes on each iteration; you are writing the whole buffer every time, which is buffer.length bytes.
You should use the write(byte[] b, int off, int len) overload instead, to specify how many bytes of the buffer should be written on each iteration.
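A corrected loop along these lines might look like the following (a sketch: out and bytesCopied are fields in the asker's class, so they are recreated here as a parameter and a return value, and the loop is exercised with in-memory streams):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyFix {
    // Copy everything from 'in' to 'out', writing only the bytes actually read.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[16384];
        long copied = 0;
        int length;
        while ((length = in.read(buffer)) != -1) {
            out.write(buffer, 0, length); // write 'length' bytes, not the whole buffer
            copied += length;
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20000]; // larger than one buffer, so the last read is short
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink);
        System.out.println(copied + " " + sink.size()); // the two counts now agree
    }
}
```

With the three-argument write, the counted bytes and the file length match even when the final read fills only part of the buffer.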

Didn't you mean to write
out.write(buffer, 0, length);
instead of
out.write(buffer);
Otherwise you always write the complete buffer, even when fewer bytes were read. This can lead to a larger file, with garbage interleaved between your original data.

Related

Downloaded files are corrupted when buffer length is > 1

I'm trying to write a function which downloads a file at a specific URL. The function produces a corrupt file unless I make the buffer an array of size 1 (as it is in the code below).
The commented-out ternary expression above the buffer initialization (which I plan to use), as well as any hard-coded size other than 1, produces a corrupted file.
Note: MAX_BUFFER_SIZE is a constant, defined as 8192 (2^13) in my code.
public static void downloadFile(String webPath, String localDir, String fileName) {
    try {
        File localFile;
        FileOutputStream writableLocalFile;
        InputStream stream;
        URL url = new URL(webPath);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        int size = connection.getContentLength(); // File size in bytes
        int read = 0; // Bytes read
        localFile = new File(localDir);
        // Ensure that directory exists, otherwise create it.
        if (!localFile.exists())
            localFile.mkdirs();
        // Ensure that file exists, otherwise create it.
        // Note that if we define the file path as we do below initially and call mkdirs(),
        // it will create a folder with the file name (i.e. test.exe).
        // There may be a better alternative; revisit later.
        localFile = new File(localDir + fileName);
        if (!localFile.exists())
            localFile.createNewFile();
        writableLocalFile = new FileOutputStream(localFile);
        stream = connection.getInputStream();
        byte[] buffer;
        int remaining;
        while (read != size) {
            remaining = size - read; // Bytes still to be read
            //remaining > MAX_BUFFER_SIZE ? MAX_BUFFER_SIZE : remaining
            buffer = new byte[1]; // Adjust buffer size according to remaining data (to be read).
            read += stream.read(buffer); // Read buffer-size amount of bytes from the stream.
            writableLocalFile.write(buffer, 0, buffer.length); // Args: buffer, offset, number of bytes to write
        }
        System.out.println("Read " + read + " bytes.");
        writableLocalFile.close();
        stream.close();
    } catch (Throwable t) {
        t.printStackTrace();
    }
}
The reason I've written it this way is so I may provide a real time progress bar to the user as they are downloading. I've removed it from the code to reduce clutter.
len = stream.read(buffer);
read += len;
writableLocalFile.write(buffer, 0, len);
You must not use buffer.length as the number of bytes read; you need to use the return value of the read call, because it might be a short read, in which case the rest of the buffer contains junk (zero bytes, or data left over from previous reads).
And instead of calculating the remaining bytes and using dynamic buffer sizes, just use a fixed buffer of 16 KB or so. The last read will be short, which is fine.
InputStream.read() may read fewer bytes than you requested, but you always append the whole buffer to the file. You need to capture the actual number of bytes read and append only those bytes to the file.
Additionally:
Watch for InputStream.read() to return -1 (EOF).
The server may return an incorrect size, so the check read != size is dangerous. I would advise against relying on the Content-Length HTTP header altogether; instead, just keep reading from the input stream until you hit EOF.
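Since the original goal was a real-time progress bar, the advice above can be combined into a loop that reads until EOF and reports cumulative progress after each chunk. This is a sketch: copyWithProgress is a name chosen here, and the network streams from the question are replaced with in-memory ones for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.function.IntConsumer;

public class ProgressCopy {
    // Read until EOF with a fixed buffer, reporting cumulative bytes after each chunk.
    static int copyWithProgress(InputStream in, OutputStream out, IntConsumer progress)
            throws IOException {
        byte[] buffer = new byte[16384];
        int total = 0;
        int len;
        while ((len = in.read(buffer)) != -1) { // stop on EOF, not on Content-Length
            out.write(buffer, 0, len); // write only the bytes actually read
            total += len;
            progress.accept(total); // e.g. update a progress bar here
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int n = copyWithProgress(new ByteArrayInputStream(new byte[40000]), sink,
                total -> System.out.println(total + " bytes so far"));
        System.out.println("done: " + n);
    }
}
```

If Content-Length is known and trusted, it can still be used to compute a percentage for display; it just should not control loop termination.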

Memory problems loading a file, plus converting into hex

I'm trying to make a file hexadecimal converter (input file -> output hex string of the file)
The code I came up with is
static String open2(String path) throws FileNotFoundException, IOException, OutOfMemoryError {
    System.out.println("BEGIN LOADING FILE");
    StringBuilder sb = new StringBuilder();
    //sb.ensureCapacity(2147483648);
    int size = 262144;
    FileInputStream f = new FileInputStream(path);
    FileChannel ch = f.getChannel();
    byte[] barray = new byte[size];
    ByteBuffer bb = ByteBuffer.wrap(barray);
    while (ch.read(bb) != -1) {
        //System.out.println(sb.capacity());
        sb.append(bytesToHex(barray));
        bb.clear();
    }
    System.out.println("FILE LOADED; BRING IT BACK");
    return sb.toString();
}
I am sure that "path" is a valid filename.
The problem is that with big files (>= 500 MB) the JVM throws an OutOfMemoryError: Java heap space on the StringBuilder.append call.
To write this code I followed some tips from http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly, but I ran into a problem when I tried to force a capacity for the StringBuilder sb: 2147483648 is too big for an int.
If I want this code to work even with very big files (let's say up to 2 GB, if I really have to stop somewhere), what is the best way to output a hexadecimal string conversion of the file in terms of speed?
I'm now working on writing the converted string to a file. The problem is that the stale contents of the buffer get written to the file after the EOF of the original one.
static String open3(String path) throws FileNotFoundException, IOException {
    System.out.println("BEGIN LOADING FILE (Hope this is the last change)");
    FileWriter fos = new FileWriter("HEXTMP");
    int size = 262144;
    FileInputStream f = new FileInputStream(path);
    FileChannel ch = f.getChannel();
    byte[] barray = new byte[size];
    ByteBuffer bb = ByteBuffer.wrap(barray);
    while (ch.read(bb) != -1) {
        fos.write(bytesToHex(barray));
        bb.clear();
    }
    System.out.println("FILE LOADED; BRING IT BACK");
    return "HEXTMP";
}
Obviously the HEXTMP file created has a size that is a multiple of 256 KB, but if the input file is 257 KB it will be written as if it were 512 KB, with a LOT of "000000" at the end.
I know I just have to create a final byte array with the cut-down length.
(I used a FileWriter because I wanted to write the string of hex; otherwise it would just have copied the file as-is.)
Why are you loading the complete file?
You can load a few bytes from the input file into a buffer, process the bytes in the buffer, then write the processed bytes to the output file. Continue until all bytes from the input file have been processed.
FileInputStream fis = new FileInputStream("in file");
FileOutputStream fos = new FileOutputStream("out");
byte[] buffer = new byte[8192];
while (true) {
    int count = fis.read(buffer);
    if (count == -1)
        break;
    byte[] processed = processBytesToConvert(buffer, count);
    fos.write(processed);
}
fis.close();
fos.close();
So just read a few bytes into the buffer, convert them to a hex string, get the bytes of the converted hex string, write those bytes back to a file, and continue with the next few input bytes.
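The bytesToHex conversion used in the question is not shown there; a minimal version that converts only the count bytes actually read (so stale bytes from a short read never leak into the output, which also fixes the trailing "000000" problem) might look like this sketch:

```java
public class HexChunk {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Convert only the first 'count' bytes of the buffer; anything past
    // 'count' is leftover data from a previous (or partial) read.
    static String bytesToHex(byte[] buffer, int count) {
        StringBuilder sb = new StringBuilder(count * 2);
        for (int i = 0; i < count; i++) {
            int b = buffer[i] & 0xFF; // mask to get the unsigned value
            sb.append(HEX[b >>> 4]).append(HEX[b & 0x0F]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] buf = new byte[]{0x0A, (byte) 0xFF, 0x00, 0x7F};
        System.out.println(bytesToHex(buf, 2)); // prints "0aff"; the trailing bytes are ignored
    }
}
```

The caller passes the return value of read (or of FileChannel.read) as count, so the last, short chunk is converted correctly.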
The problem here is that you try to read the whole file and store it in memory.
You should use streams: read a chunk of the input file, convert it, and write it to the output file. That way your program can scale, whatever the size of the input file is.
The key is to read the file in chunks instead of reading all of it in one go. Depending on the use case you can vary the chunk size. For example, if you are building a hex viewer/editor, determine how much content is shown in the viewport and read only that much data from the file. If you are simply converting and dumping hex to another file, use any chunk size that is small enough to fit in memory but big enough for performance; this can be tuned over a few runs. Perhaps use NIO in Java 7 so that the three tasks (reading, processing, and writing) can run concurrently. The link included in the question gives a good primer on reading files.

How to run methods against the entire bytes of a large file

My program needs to do calculations against the entire bytes of a file and it breaks whenever the file gets above a certain size.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
I know I can allocate the amount of memory to my program using command line switches, but I'm wondering if there is a more effective way of handling this in my program?
I'm basically trying to figure out a way to read the file in chunks and pass those chunks to another method and essentially rebuild the file in that method.
This is the problem method. I need these bytes to be used in another method.
This method converts the stream to a byte array:
private byte[] inputStreamToByteArray(InputStream inputStream) {
    BufferedInputStream bis = null;
    ByteArrayOutputStream baos = null;
    try {
        bis = new BufferedInputStream(inputStream);
        baos = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int nRead;
        while ((nRead = bis.read(buffer)) != -1) {
            baos.write(buffer, 0, nRead);
        }
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
    return baos.toByteArray();
}
This method checks the file type:
private final boolean isMyFileType(byte[] bytes) {
    // do stuff
    return theBoolean;
}
The reason it is breaking makes sense to me: the byte array ends up being gigantic if I have a gigantic file, AND I'm passing that gigantic byte array around.
My goal: I want to read the bytes from a file, determine what type of file it is using another method I wrote, and then run a compression/decompression method against those bytes.
I have most of this done; I just don't know how to handle file streams and large byte arrays effectively.
You are already using a BufferedInputStream. Use the mark method to place a mark in the stream; make sure the readlimit argument to mark is large enough for you to detect the file type. Read the first X bytes from the stream (but not more than readlimit) and try to figure out the content. Then call reset() to set the stream back to the beginning and continue with whatever you want to do with it.
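A sketch of that mark/reset pattern (the PNG magic number here is just an illustrative check; real type detection depends on the formats you care about):

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Arrays;

public class SniffType {
    // Peek at the first magic.length bytes, then rewind so the caller
    // can consume the stream from the beginning.
    static boolean startsWith(BufferedInputStream bis, byte[] magic) throws IOException {
        bis.mark(magic.length); // readlimit must cover every byte we peek at
        byte[] head = new byte[magic.length];
        int off = 0;
        while (off < head.length) { // read() may return fewer bytes than requested
            int n = bis.read(head, off, head.length - off);
            if (n == -1) break;
            off += n;
        }
        bis.reset(); // back to the mark, i.e. the start of the stream
        return off == magic.length && Arrays.equals(head, magic);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0x89, 'P', 'N', 'G', 13, 10, 26, 10, 1, 2, 3};
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        System.out.println(startsWith(in, new byte[]{(byte) 0x89, 'P', 'N', 'G'})); // true
        System.out.println(in.read()); // 137: the stream is still at the beginning
    }
}
```

After the type check, the same stream can be handed to the chunked compression/decompression step without re-opening the file.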

How to get a byte array from FileInputStream without OutOfMemory error

I have a FileInputStream which has 200MB of data. I have to retrieve the bytes from the input stream.
I'm using the below code to convert InputStream into byte array.
private byte[] convertStreamToByteArray(InputStream inputStream) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try {
        int i;
        while ((i = inputStream.read()) != -1) { // read() returns -1 at EOF; 0 is a valid byte value
            bos.write(i);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return bos.toByteArray();
}
I'm getting an OutOfMemoryError while converting such a large amount of data to a byte array.
Kindly let me know any possible solutions for converting an InputStream into a byte array.
Why do you want to hold the 200 MB file in memory? What are you going to do with the byte array?
If you are going to write it to an OutputStream, get the OutputStream ready first, then read the InputStream a chunk at a time, writing each chunk to the OutputStream as you go. You'll never store more than one chunk in memory.
eg:
public static void pipe(InputStream is, OutputStream os) throws IOException {
    int read = -1;
    byte[] buf = new byte[1024];
    try {
        while ((read = is.read(buf)) != -1) {
            os.write(buf, 0, read);
        }
    } finally {
        is.close();
        os.close();
    }
}
This code will take two streams and pipe one to the other.
An Android application has a limited heap size, which depends on the device. Currently most new devices have 64 MB, but it can be more or less depending on the manufacturer; I have seen devices come with 128 MB of heap memory.
So what does this really mean?
It simply means that, regardless of the available physical memory, your application is not allowed to grow beyond the allocated heap size.
From Android API level 11 you can request additional memory with the manifest attribute android:largeHeap="true", which doubles your heap size: if your device has 64 MB you get 128 MB, and with 128 MB you get 256 MB. This does not work on lower API levels.
I am not exactly sure what your requirement is, but if you are planning to send the file over HTTP, then read part of the file, send that data, and read again. You can follow the same procedure for file I/O as well. Just make sure not to use more memory than the available heap size, and to be extra cautious, leave some room for the rest of the application's execution.
Your problem is not how to convert an InputStream to a byte array, but that the array is too big to fit in memory. You don't have much choice but to find a way to process the bytes from the InputStream in smaller blocks.
You'll probably need to massively increase the heap size. Try running your Java virtual machine with the -Xms384m -Xmx384m flags (which specify a starting and maximum heap size of 384 MB, unless I'm wrong). See this for an old version of the available options: depending on the specific virtual machine and platform you may need to do some digging around, but -Xms and -Xmx should get you over that hump.
Now, you probably really SHOULDN'T read it all into a byte array, but if that's what your application needs, then...
try this code
private byte[] convertStreamToByteArray(InputStream inputStream) throws IOException {
    ByteArrayOutputStream byteOutStream = new ByteArrayOutputStream();
    int readByte = 0;
    byte[] buffer = new byte[2024];
    while (true) {
        readByte = inputStream.read(buffer);
        if (readByte == -1) {
            break;
        }
        byteOutStream.write(buffer, 0, readByte); // write only the bytes actually read
    }
    inputStream.close();
    byteOutStream.flush();
    byteOutStream.close();
    byte[] byteArray = byteOutStream.toByteArray();
    return byteArray;
}
Try reading chunks of data from the InputStream instead of the whole thing at once.

Reading and writing binary file in Java (seeing half of the file being corrupted)

I have some working code in python that I need to convert to Java.
I have read quite a few threads on this forum but could not find an answer. I am reading in a JPG image and converting it into a byte array, then writing that buffer to a different file. When I compare the files written by the Java and Python code, the bytes at the end do not match. Please let me know if you have a suggestion. I need to use the byte array to pack the image into a message that will be sent to a remote server.
Java code (Running on Android)
Reading the file:
File queryImg = new File(ImagePath);
int imageLen = (int)queryImg.length();
byte [] imgData = new byte[imageLen];
FileInputStream fis = new FileInputStream(queryImg);
fis.read(imgData);
Writing the file:
FileOutputStream f = new FileOutputStream(new File("/sdcard/output.raw"));
f.write(imgData);
f.flush();
f.close();
Thanks!
InputStream.read is not guaranteed to read any particular number of bytes and may read less than you asked it to. It returns the actual number read so you can have a loop that keeps track of progress:
public void pump(InputStream in, OutputStream out, int size) throws IOException {
    byte[] buffer = new byte[4096]; // Or whatever constant you feel like using
    int done = 0;
    while (done < size) {
        int read = in.read(buffer);
        if (read == -1) {
            throw new IOException("Something went horribly wrong");
        }
        out.write(buffer, 0, read);
        done += read;
    }
    // Maybe put cleanup code in here if you like, e.g. in.close, out.flush, out.close
}
I believe Apache Commons IO has classes for doing this kind of stuff so you don't need to write it yourself.
Your file length might be more than an int can hold; then you end up with a wrong array length and do not read the entire file into the buffer.
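Combining both answers, a read helper that loops until the requested size is reached and fails fast on files too large for a byte array might look like this (readFully is a name chosen here for illustration; it is tested against an in-memory stream):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFully {
    // Read exactly 'size' bytes, looping because a single read() may return fewer.
    static byte[] readFully(InputStream in, long size) throws IOException {
        if (size > Integer.MAX_VALUE) { // arrays are indexed by int
            throw new IOException("File too large for a single byte array: " + size);
        }
        byte[] data = new byte[(int) size];
        int done = 0;
        while (done < data.length) {
            int read = in.read(data, done, data.length - done);
            if (read == -1) {
                throw new IOException("Unexpected EOF after " + done + " bytes");
            }
            done += read;
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        byte[] src = {1, 2, 3, 4, 5};
        byte[] copy = readFully(new ByteArrayInputStream(src), src.length);
        System.out.println(copy.length); // 5
    }
}
```

Used in place of the single fis.read(imgData) call, this guarantees the whole image ends up in the array, or an exception is raised instead of silently truncating it.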
