My program needs to run calculations against the entire contents of a file, and it breaks whenever the file grows beyond a certain size:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
I know I can increase the amount of memory allocated to my program using command-line switches, but I'm wondering whether there is a more effective way of handling this in the program itself.
I'm basically trying to figure out a way to read the file in chunks, pass those chunks to another method, and essentially rebuild the file in that method.
This is the problem method; I need these bytes to be usable in another method.
This method converts the stream to a byte array:
private byte[] inputStreamToByteArray(InputStream inputStream) {
    BufferedInputStream bis = new BufferedInputStream(inputStream);
    // ByteArrayOutputStream takes no InputStream argument; the no-arg constructor is correct here
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try {
        byte[] buffer = new byte[1024];
        int nRead;
        while ((nRead = bis.read(buffer)) != -1) {
            baos.write(buffer, 0, nRead);
        }
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
    return baos.toByteArray();
}
This method checks the file type:
private final boolean isMyFileType(byte[] bytes) {
// do stuff
return theBoolean;
}
The reason it is breaking makes sense to me: the byte array ends up being gigantic if the file is gigantic, and I'm then passing that gigantic byte array around.
My goal: read the bytes from a file, determine what type of file it is using another method I wrote, then run a compression/decompression method against those bytes once the file type is known.
I have most of this working; I just don't know how to handle file streams and large byte arrays effectively.
You are already using a BufferedInputStream. Use the mark method to place a mark in the stream. Make sure the readlimit argument to mark is large enough for you to detect the file type. Read the first X bytes from the stream (but not more than readlimit) and try to figure out the content. Then call reset() to set the stream back to the beginning and continue with whatever you want to do with the stream.
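For illustration, a minimal sketch of that approach; the 64-byte header size and the method name detectAndProcess are assumptions, and isMyFileType is the checking method from the question:
private boolean detectAndProcess(InputStream inputStream) throws IOException {
    BufferedInputStream bis = new BufferedInputStream(inputStream);
    int headerSize = 64;                  // assumption: 64 bytes suffice to detect the type
    bis.mark(headerSize);                 // readlimit must cover everything read before reset()
    byte[] header = new byte[headerSize];
    int nRead = bis.read(header);         // may return fewer bytes than requested
    bis.reset();                          // rewind to the beginning of the stream
    if (nRead <= 0 || !isMyFileType(java.util.Arrays.copyOf(header, nRead))) {
        return false;
    }
    // from here, read bis in small chunks and compress/decompress incrementally
    return true;
}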
Related
I have a FileInputStream which has 200MB of data. I have to retrieve the bytes from the input stream.
I'm using the below code to convert InputStream into byte array.
private byte[] convertStreamToByteArray(InputStream inputStream) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try {
        int i;
        // read() returns -1 at end of stream; comparing with > 0 would stop early at the first zero byte
        while ((i = inputStream.read()) != -1) {
            bos.write(i);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return bos.toByteArray();
}
I'm getting an OutOfMemory exception while converting such a large amount of data to a byte array.
Kindly let me know any possible solutions to convert an InputStream to a byte array.
Why do you want to hold the 200MB file in memory? What are you going to do with the byte array?
If you are going to write it to an OutputStream, get the OutputStream ready first, then read the InputStream a chunk at a time, writing the chunk to the OutputStream as you go. You'll never store more than the chunk in memory.
For example:
public static void pipe(InputStream is, OutputStream os) throws IOException {
    int read = -1;
    byte[] buf = new byte[1024];
    try {
        while ((read = is.read(buf)) != -1) {
            os.write(buf, 0, read);
        }
    } finally {
        is.close();
        os.close();
    }
}
This code will take two streams and pipe one to the other.
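For example, copying one file to another with this method never holds more than the 1 KB buffer in memory (the file names here are placeholders):
pipe(new FileInputStream("in.dat"), new FileOutputStream("out.dat"));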
An Android application has limited heap memory, and the limit depends on the device. Currently most new devices have 64 MB, but it could be more or less depending on the manufacturer; I have seen devices come with 128 MB of heap memory.
So what does this really mean?
It simply means that regardless of the available physical memory, your application is not allowed to grow beyond the allocated heap size.
From Android API level 11 you can request additional memory with the manifest attribute android:largeHeap="true", which doubles your heap size. That simply means that if your device gives you 64 MB you will get 128 MB, and in the case of 128 MB you will get 256 MB. But this will not work on lower API levels.
I am not exactly sure what your requirement is, but if you are planning to send the data over HTTP, then read a piece of the file, send it, and read again, as sketched below. You can follow the same procedure for file I/O too. Just make sure not to use more memory than the available heap size, and to be extra cautious, leave some room for normal application execution.
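A rough sketch of that read-and-send loop, assuming a plain HttpURLConnection; the URL and the 8 KB buffer size are placeholders:
HttpURLConnection conn = (HttpURLConnection) new URL("http://example.com/upload").openConnection();
conn.setDoOutput(true);
conn.setChunkedStreamingMode(8192); // stream the request body instead of buffering it all in memory
try (InputStream in = new FileInputStream(file);
     OutputStream out = conn.getOutputStream()) {
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n); // only one buffer's worth is in memory at a time
    }
}
conn.getResponseCode(); // forces the request to complete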
Your problem is not how to convert an InputStream to a byte array, but that the array is too big to fit in memory. You don't have much choice but to find a way to process the bytes from the InputStream in smaller blocks.
You'll probably need to massively increase the heap size. Try running your Java virtual machine with the -Xms384m -Xmx384m flags (which specify a starting and maximum heap size of 384 megabytes, unless I'm wrong). See this for an old version of the available options: depending on the specific virtual machine and platform you may need to do some digging around, but -Xms and -Xmx should get you over that hump.
Now, you probably really SHOULDN'T read it into a byte array, but if that's what your application needs, then...
Try this code:
private byte[] convertStreamToByteArray(InputStream inputStream) throws IOException {
    ByteArrayOutputStream byteOutStream = new ByteArrayOutputStream();
    byte[] buffer = new byte[2024];
    int readByte;
    while ((readByte = inputStream.read(buffer)) != -1) {
        // write only the bytes actually read, not the entire buffer
        byteOutStream.write(buffer, 0, readByte);
    }
    inputStream.close();
    byteOutStream.flush();
    byteOutStream.close();
    return byteOutStream.toByteArray();
}
The idea is to read a chunk of data from the InputStream at a time.
I want to read a file into a byte array. So, I am reading it using:
int len1 = (int)(new File(filename).length());
FileInputStream fis1 = new FileInputStream(filename);
byte buf1[] = new byte[len1];
fis1.read(buf1);
However, it is really very slow. Can anyone suggest a faster approach (possibly the best one) to read a file into a byte array? I can use a Java library if needed.
Edit: Is there any benchmark showing which approach is faster (including the library approach)?
It is not very slow, or at least there is no way to make it much faster. BUT it is wrong. If the file is big enough, the read() method will not return all the bytes on the first call. This method returns the number of bytes it managed to read as its return value.
The right way is to call this method in a loop:
public static void copy(InputStream input, OutputStream output, int bufferSize)
        throws IOException {
    byte[] buf = new byte[bufferSize];
    int bytesRead = input.read(buf);
    while (bytesRead != -1) {
        output.write(buf, 0, bytesRead);
        bytesRead = input.read(buf);
    }
    output.flush();
}
Call it as follows:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
copy(new FileInputStream(myfile), baos, 4096); // buffer size of your choice
byte[] bytes = baos.toByteArray();
Something like this is implemented in a lot of packages, e.g. the FileUtils.readFileToByteArray() mentioned by @Andrey Borisov (+1).
EDIT
I think the reason for the slowness in your case is the fact that you create such a huge array. Are you sure you really need it? Try to rethink your design. I believe you do not have to read this file into an array and can process the data incrementally.
Apache commons-io: FileUtils.readFileToByteArray().
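For reference, a usage sketch (org.apache.commons.io.FileUtils; note this still loads the whole file into memory, so it only helps when the file comfortably fits in the heap):
byte[] bytes = FileUtils.readFileToByteArray(new File(filename));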
Using Base64 from Apache Commons:
public byte[] encode(File file) throws FileNotFoundException, IOException {
    byte[] encoded;
    try (FileInputStream fin = new FileInputStream(file)) {
        byte fileContent[] = new byte[(int) file.length()];
        fin.read(fileContent);
        encoded = Base64.encodeBase64(fileContent);
    }
    return encoded;
}
Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space
at org.apache.commons.codec.binary.BaseNCodec.encode(BaseNCodec.java:342)
at org.apache.commons.codec.binary.Base64.encodeBase64(Base64.java:657)
at org.apache.commons.codec.binary.Base64.encodeBase64(Base64.java:622)
at org.apache.commons.codec.binary.Base64.encodeBase64(Base64.java:604)
I'm making a small app for a mobile device.
You cannot just load the whole file into memory, like here:
byte fileContent[] = new byte[(int) file.length()];
fin.read(fileContent);
Instead, load the file chunk by chunk and encode it in parts. Base64 is a simple encoding; it is enough to load 3 bytes and encode them at a time (this will produce 4 bytes after encoding). For performance reasons, consider loading multiples of 3 bytes, e.g. 3000 bytes; that should be just fine. Also consider buffering the input file.
An example:
byte fileContent[] = new byte[3000];
int bytesRead;
try (FileInputStream fin = new FileInputStream(file)) {
    while ((bytesRead = fin.read(fileContent)) >= 0) {
        // encode only the bytes actually read; the last chunk may be shorter (java.util.Arrays)
        Base64.encodeBase64(Arrays.copyOf(fileContent, bytesRead));
    }
}
Note that you cannot simply append the results of Base64.encodeBase64() to one encoded byte array. Actually, it is not loading the file but encoding it to Base64 that causes the out-of-memory problem. This is understandable, because the Base64 version is bigger (and you already have a file occupying a lot of memory).
Consider changing your method to:
public void encode(File file, OutputStream base64OutputStream)
and sending Base64-encoded data directly to the base64OutputStream rather than returning it.
UPDATE: Thanks to @StephenC I developed a much easier version:
public void encode(File file, OutputStream base64OutputStream) throws IOException {
    InputStream is = new FileInputStream(file);
    OutputStream out = new Base64OutputStream(base64OutputStream);
    IOUtils.copy(is, out);
    is.close();
    out.close();
}
It uses Base64OutputStream, which translates the input to Base64 on the fly, and the IOUtils class from Apache Commons IO.
Note: you must close the FileInputStream and Base64OutputStream explicitly so that the trailing = padding is written if required; buffering is handled by IOUtils.copy().
Either the file is too big, or your heap is too small, or you've got a memory leak.
If this only happens with really big files, put something into your code to check the file size and reject files that are unreasonably big.
If this happens with small files, increase your heap size by using the -Xmx command line option when you launch the JVM. (If this is in a web container or some other framework, check the documentation on how to do it.)
If the problem recurs, especially with small files, the chances are that you've got a memory leak.
The other point that should be made is that your current approach entails holding two complete copies of the file in memory. You should be able to reduce the memory usage, though you'll typically need a stream-based Base64 encoder to do this. (It depends on which flavor of base64 encoding you are using ...)
This page describes a stream-based Base64 encoder / decoder library, and includes links to some alternatives.
Well, do not do it for the whole file at once.
Base64 works on 3 bytes at a time, so you can read your file in batches of a multiple of 3 bytes, encode them, and repeat until you finish the file:
// the base64 encoding - acceptable estimation of encoded size
StringBuilder sb = new StringBuilder((int) (file.length() / 3 * 4));
FileInputStream fin = null;
try {
    fin = new FileInputStream("some.file");
    // max size of buffer: a multiple of 3 so chunks align on Base64 boundaries
    int bSize = 3 * 512;
    byte[] buf = new byte[bSize];
    // actual number of bytes read into the buffer
    int len = 0;
    while ((len = fin.read(buf)) != -1) {
        // encode only the bytes actually read; the last chunk may be shorter (java.util.Arrays)
        byte[] encoded = Base64.encodeBase64(Arrays.copyOf(buf, len));
        // Although you might want to write the encoded bytes to another
        // stream, otherwise you'll run into the same problem again.
        sb.append(new String(encoded));
    }
} finally {
    if (null != fin) {
        fin.close();
    }
}
String base64EncodedFile = sb.toString();
You are not reading the whole file, just the first few KB. The read method returns how many bytes were actually read. You should call read in a loop until it returns -1 to be sure that you have read everything, as in the sketch below.
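A minimal sketch of that loop, reusing the fin and fileContent variables from the question's code:
int offset = 0;
while (offset < fileContent.length) {
    int n = fin.read(fileContent, offset, fileContent.length - offset);
    if (n == -1) {
        break; // end of stream reached before the array was full
    }
    offset += n;
}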
The file is too big for both it and its base64 encoding to fit in memory. Either
process the file in smaller pieces or
increase the memory available to the JVM with the -Xmx switch, e.g.
java -Xmx1024M YourProgram
This is the best code to upload an image of a larger size:
bitmap = Bitmap.createScaledBitmap(bitmap, 100, 100, true);
ByteArrayOutputStream stream = new ByteArrayOutputStream();
bitmap.compress(Bitmap.CompressFormat.PNG, 100, stream); // compress to whichever format you want
byte[] byte_arr = stream.toByteArray();
String image_str = Base64.encodeBytes(byte_arr);
Well, it looks like your file is too large to keep the multiple copies necessary for an in-memory Base64 encoding in the available heap memory at the same time. Given that this is for a mobile device, it's probably not possible to increase the heap, so you have two options:
make the file smaller (much smaller), or
do it in a stream-based way, so that you read from an InputStream one small part of the file at a time, encode it, and write it to an OutputStream, without ever keeping the entire file in memory.
In the Manifest, in the application tag, write the following:
android:largeHeap="true"
It worked for me.
Java 8 added Base64 methods, so Apache Commons is no longer needed to encode large files.
public static void encodeFileToBase64(String inputFile, String outputFile) {
    try (OutputStream out = Base64.getEncoder().wrap(new FileOutputStream(outputFile))) {
        Files.copy(Paths.get(inputFile), out);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}
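The decoding direction works the same way; here is a sketch mirroring the method above (the method name is mine, not a standard API):
public static void decodeBase64ToFile(String inputFile, String outputFile) {
    try (InputStream in = Base64.getDecoder().wrap(new FileInputStream(inputFile))) {
        Files.copy(in, Paths.get(outputFile), StandardCopyOption.REPLACE_EXISTING);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}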
I am currently analyzing firmware images which contain many different sections, one of which is a GZIP section.
I am able to know the location of the start of the GZIP section using magic number and the GZIPInputStream in Java.
However, I need to know the compressed size of the GZIP section; GZIPInputStream would only give me the uncompressed size.
Is there anybody who has an idea?
You can count the number of bytes read using a custom InputStream. You would need to force the stream to read one byte at a time to ensure you don't read more than you need.
You can wrap your current InputStream in this
class CountingInputStream extends InputStream {
    final InputStream is;
    int counter = 0;

    public CountingInputStream(InputStream is) {
        this.is = is;
    }

    public int read() throws IOException {
        int read = is.read();
        if (read >= 0) counter++;
        return read;
    }
}
and then wrap it in a GZIPInputStream. The field counter will hold the number of bytes read.
To use this with a BufferedInputStream you can do:
InputStream is = new BufferedInputStream(new FileInputStream(filename));
// read some data or skip to where you want to start.
CountingInputStream cis = new CountingInputStream(is);
GZIPInputStream gzis = new GZIPInputStream(cis);
// read some decompressed data through gzis; cis counts the compressed bytes consumed
gzis.read(...);
int dataRead = cis.counter;
In general, there is no easy way to tell the size of the gzipped data, other than just going through all the blocks.
gzip is a stream compression format, meaning that all the compressed data is written in a single pass. There is no way to stash the compressed size anywhere---it can't be in the header, since that would require more than one pass, and it's useless to have it at the trailer, since if you can locate the trailer, then you already know the compressed size.
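Putting the two answers together, a sketch that estimates the compressed size by draining one GZIP member (CountingInputStream is the class defined above, and is is assumed to be positioned at the GZIP magic number):
CountingInputStream cis = new CountingInputStream(is);
GZIPInputStream gzis = new GZIPInputStream(cis);
byte[] buf = new byte[4096];
while (gzis.read(buf) != -1) {
    // discard the decompressed data; we only care how much input was consumed
}
gzis.close();
int compressedSize = cis.counter; // an upper bound: the decompressor's internal buffer may read slightly ahead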
I have some working code in Python that I need to convert to Java.
I have read quite a few threads on this forum but could not find an answer. I am reading in a JPG image and converting it into a byte array. I then write this buffer to a different file. When I compare the files written by the Java and Python code, the bytes at the end do not match. Please let me know if you have a suggestion. I need to use the byte array to pack the image into a message that will be sent to a remote server.
Java code (Running on Android)
Reading the file:
File queryImg = new File(ImagePath);
int imageLen = (int)queryImg.length();
byte [] imgData = new byte[imageLen];
FileInputStream fis = new FileInputStream(queryImg);
fis.read(imgData);
Writing the file:
FileOutputStream f = new FileOutputStream(new File("/sdcard/output.raw"));
f.write(imgData);
f.flush();
f.close();
Thanks!
InputStream.read is not guaranteed to read any particular number of bytes and may read fewer than you asked for. It returns the actual number read, so you can have a loop that keeps track of progress:
public void pump(InputStream in, OutputStream out, int size) throws IOException {
    byte[] buffer = new byte[4096]; // or whatever constant you feel like using
    int done = 0;
    while (done < size) {
        int read = in.read(buffer);
        if (read == -1) {
            throw new IOException("Something went horribly wrong");
        }
        out.write(buffer, 0, read);
        done += read;
    }
    // maybe put cleanup code in here if you like, e.g. in.close, out.flush, out.close
}
I believe Apache Commons IO has classes for doing this kind of stuff so you don't need to write it yourself.
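For instance, IOUtils.copy(InputStream, OutputStream) from Apache Commons IO implements exactly this kind of loop:
int copied = IOUtils.copy(in, out); // copies in buffered chunks and returns the byte count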
Your file length might be more than an int can hold; then you end up with a wrong array length and hence don't read the entire file into the buffer. It is worth guarding against that, as sketched below.
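A possible guard, using the queryImg variable from the question (File.length() returns a long, while arrays are indexed by int):
long length = queryImg.length();
if (length > Integer.MAX_VALUE) {
    throw new IOException("File too large for a single byte array: " + length + " bytes");
}
byte[] imgData = new byte[(int) length];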