I am trying to read and write large files (larger than 100 MB) using BufferedInputStream and BufferedOutputStream, and I am getting a heap memory issue and an OOM exception.
The code looks like:
BufferedInputStream buffIn = new BufferedInputStream(iStream); // iStream is the InputStream object
BufferedOutputStream buffOut = new BufferedOutputStream(new FileOutputStream(file));
byte[] arr = new byte[1024 * 1024];
int available = -1;
while ((available = buffIn.read(arr)) > 0) {
    buffOut.write(arr, 0, available);
}
buffOut.flush();
buffOut.close();
My question is: when we use the BufferedOutputStream, does it hold everything in memory until the full file has been written out?
What is the best way to write large files using BufferedOutputStream?
There is nothing wrong with the code you have provided; your memory issues must lie elsewhere. The buffered streams have a fixed memory footprint (their internal buffer).
The easiest way to determine what caused an OOME, of course, is to have the OOME generate a heap dump and then examine that heap dump in a memory profiler.
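For what it's worth, here is a minimal sketch of the same copy using try-with-resources (same iStream and file as above); the only per-copy heap is the fixed 1 MB array plus the streams' internal buffers. To get the heap dump mentioned above, start the JVM with -XX:+HeapDumpOnOutOfMemoryError (optionally -XX:HeapDumpPath=<dir>) and open the resulting .hprof file in a profiler.
// Sketch only: same copy as above, with try-with-resources so both streams are
// flushed and closed even if an exception is thrown mid-copy.
try (BufferedInputStream buffIn = new BufferedInputStream(iStream);
     BufferedOutputStream buffOut = new BufferedOutputStream(new FileOutputStream(file))) {
    byte[] arr = new byte[1024 * 1024]; // fixed-size buffer, reused for every chunk
    int available;
    while ((available = buffIn.read(arr)) > 0) {
        buffOut.write(arr, 0, available);
    }
} // close() on the BufferedOutputStream flushes it first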
Related
I have a use case wherein I need to serve a large file [size ~ 2 GB] as a download to a website user through a Spring controller. The file is originally stored in S3.
I have tried the approach of reading the file from the S3ObjectInputStream and writing it directly to the HttpServletResponse output stream [using a byte array of 1 MB size].
Even though I am not bringing the file content into memory [apart from the fixed 1 MB buffer], the used heap memory value increased by roughly 100 MB. I have also verified that the HttpServletResponse output stream buffer stays fixed at 9 KB throughout the process.
Used memory is derived using: Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()
Is there a way to figure out what is actually causing the increase in used memory?
Also, is there a more memory-efficient way to write the S3 file content to the HttpServletResponse?
Note: As S3 pre-signed URLs can be accessed by anyone within the expiry time window, they cannot be used as an alternative to the above, for data security reasons.
Code Reference:
InputStream inputStream = s3Object.getObjectContent();
OutputStream outStream = httpServletResponse.getOutputStream();
byte[] buffer = new byte[1024 * 1024];
int lengthOfBytesRead;
while ((lengthOfBytesRead = inputStream.read(buffer)) != -1) {
    outStream.write(buffer, 0, lengthOfBytesRead);
}
Thanks!
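For reference, a minimal sketch of the same copy with try-with-resources, using the same s3Object and httpServletResponse as above; it does not change the fixed 1 MB buffer, but it guarantees the S3 stream (and its underlying HTTP connection) is released even if the client aborts the download.
// Sketch only: same 1 MB buffer; the container is left to close the response stream.
byte[] buffer = new byte[1024 * 1024];
try (InputStream inputStream = s3Object.getObjectContent()) {
    OutputStream outStream = httpServletResponse.getOutputStream();
    int lengthOfBytesRead;
    while ((lengthOfBytesRead = inputStream.read(buffer)) != -1) {
        outStream.write(buffer, 0, lengthOfBytesRead);
    }
    outStream.flush();
}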
My system throws the exception "java.lang.OutOfMemoryError: Java heap space" when it processes a huge file. I realized that StringWriter.toString() doubles the size of the data on the heap, so it could be causing the issue. How can I optimize the following block of code to avoid the Out Of Memory error?
public byte[] generateFromFo(final StringWriter foString) {
    try {
        StringReader foReader = new StringReader(foString.toString());
        ByteArrayOutputStream pdfWriter = new ByteArrayOutputStream();
        Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, fopFactory.newFOUserAgent(), pdfWriter);
        TRANSFORMER_FACTORY.newTransformer().transform(new StreamSource(foReader), new SAXResult(fop.getDefaultHandler()));
        LOG.debug("Completed rendering PDF output!");
        return pdfWriter.toByteArray();
    } catch (Exception e) {
        LOG.error("Error while generating PDF from FO", e);
        throw new AuditReportExportServiceException(AuditErrorCode.INTERNAL_ERROR, "Could not generate PDF from XSL-FO");
    }
}
Using an InputStream of bytes may reduce the memory used for foString by up to a factor of 2 (a char is 2 bytes).
A ByteArrayOutputStream resizes while it is being filled, so giving it an estimated initial capacity speeds things up and may prevent excessive resizing.
InputStream foReader = new ByteArrayInputStream(
        foString.toString().getBytes(StandardCharsets.UTF_8));
foString.close();
final int initialCapacity = 160 * 1024;
ByteArrayOutputStream pdfWriter = new ByteArrayOutputStream(initialCapacity);
Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, fopFactory.newFOUserAgent(),
        pdfWriter);
TRANSFORMER_FACTORY.newTransformer().transform(new StreamSource(foReader),
        new SAXResult(fop.getDefaultHandler()));
The best approach would be to change the API:
public void generateFromFo(final String foString, OutputStream pdfOut) { ... }
This would make the ByteArrayOutputStream superfluous, and you could stream directly to a file, a URL, or whatever.
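A minimal sketch of what that streaming API could look like, assuming the same fopFactory, TRANSFORMER_FACTORY, LOG, and exception types as in the question; the caller decides where the bytes go, e.g. generateFromFo(foString, new BufferedOutputStream(new FileOutputStream("report.pdf"))).
// Sketch, not the author's exact code: FOP writes straight into the caller's
// OutputStream, so no byte[] of the whole PDF is ever held in memory.
public void generateFromFo(final String foString, final OutputStream pdfOut) {
    try {
        InputStream foReader = new ByteArrayInputStream(foString.getBytes(StandardCharsets.UTF_8));
        Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, fopFactory.newFOUserAgent(), pdfOut);
        TRANSFORMER_FACTORY.newTransformer().transform(new StreamSource(foReader),
                new SAXResult(fop.getDefaultHandler()));
        LOG.debug("Completed rendering PDF output!");
    } catch (Exception e) {
        LOG.error("Error while generating PDF from FO", e);
        throw new AuditReportExportServiceException(AuditErrorCode.INTERNAL_ERROR,
                "Could not generate PDF from XSL-FO");
    }
}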
The FO document itself and the generated PDF also have issues:
image sizes (but remember the higher resolution needed for print)
some images can be nicely vectorized
repeated images, such as those in a page header, should be stored only once
fonts should ideally be the standard fonts; second best is embedding subsets (of the used characters)
the XML might be suboptimal and very repetitive
Broadly, you have two main options:
1. Increase the memory available to your process. The -Xmx option to Java sets this. You could pass e.g. -Xmx8G to ask for 8 GB of memory on a 64-bit system, if you have that much. Docs are here: http://docs.oracle.com/javase/7/docs/technotes/tools/windows/java.html#nonstandard
2. Change your code to "stream" the data through in smaller chunks, rather than trying to assemble the whole file into a byte[] in memory, as you have done here. You could change the output of your transformer to a FileOutputStream rather than a ByteArrayOutputStream and return a File rather than a byte[] from the code shown. Or, depending on what you do with the output of this method, you could return an InputStream and let the consumer receive the file data in a streaming fashion.
You may also need to change things so that the input to this method is consumed in a streaming fashion. How to do that depends on how the StringWriter foString was created; you may need to "pipe" an OutputStream into an InputStream to make this work (see the sketch after these options): https://docs.oracle.com/javase/7/docs/api/java/io/PipedInputStream.html
Option 1 is simpler; option 2 is probably better here.
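If the input does need to be consumed in a streaming fashion, here is a rough sketch of the piping idea from option 2. The writeFoDocument method is hypothetical (it stands for whatever currently fills the StringWriter), the output file name is only illustrative, and the same fopFactory and TRANSFORMER_FACTORY as in the question are assumed.
// Sketch only: one thread produces the FO document into a PipedOutputStream while
// another consumes it from the connected PipedInputStream, so the whole FO string
// never has to sit in memory at once.
PipedOutputStream foSink = new PipedOutputStream();
PipedInputStream foSource = new PipedInputStream(foSink, 64 * 1024); // 64 KB pipe buffer
Thread producer = new Thread(() -> {
    try (Writer writer = new OutputStreamWriter(foSink, StandardCharsets.UTF_8)) {
        writeFoDocument(writer); // hypothetical: whatever currently fills the StringWriter
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
});
producer.start();
try (OutputStream pdfOut = new BufferedOutputStream(new FileOutputStream("report.pdf"))) {
    Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, fopFactory.newFOUserAgent(), pdfOut);
    TRANSFORMER_FACTORY.newTransformer().transform(new StreamSource(foSource),
            new SAXResult(fop.getDefaultHandler()));
}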
I am trying to decompress a CSV file of the form name.csv.gz; I think it is something like 600 MB compressed and in the ballpark of 7 GB when decompressed.
byte[] buffer = new byte[4096];
try {
    GZIPInputStream gzis = new GZIPInputStream(new FileInputStream("/run/media/justin/DATA/2000000033673205_53848.TEST_SCHEDULE_GCO.20180706.090850.2000000033673205.x04q13.csv.gz"));
    FileOutputStream out = new FileOutputStream("/run/media/justin/DATA/unzipped.txt");
    int len;
    while ((len = gzis.read(buffer)) > 0) {
        out.write(buffer, 0, len);
    }
    gzis.close();
    out.close();
    System.out.println("DONE!!");
} catch (IOException e) {
    e.printStackTrace();
}
This is the code I am using to decompress it, and at the end I get the error "Unexpected end of ZLIB stream" and I am missing several million lines at the end of the file. I haven't found anything on Google that has led me in any prosperous direction, so any help is greatly appreciated!
Edit: I forgot a line of code at the top (*facepalm*). Also, I have increased the buffer size from 2048 to 4096 and I am getting more lines after decompression, so would I be correct in assuming that I just didn't allocate a large enough buffer? (Or is this a naive assumption?)
I have increased the buffer size from 2048 to 4096 and I am getting more lines after decompression, so would I be correct in assuming that I just didn't allocate a large enough buffer? (Or is this a naive assumption?)
This is not a problem with your buffer size; it is more a problem with the GZIPInputStream.read() method. The buffer size only determines how "often" the while-loop has to read and write: a bigger buffer means more bytes per iteration and fewer loop iterations.
Your problem is inside the GZIPInputStream class or has something to do with the files you are using; maybe try a smaller file first.
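To illustrate that point, here is a minimal sketch of the same loop with try-with-resources and shortened, illustrative paths; any reasonable buffer size produces the same output, only the number of iterations changes. If this still ends with "Unexpected end of ZLIB stream", the input .gz file itself is the likelier suspect.
// Sketch only: same decompression, any reasonable buffer size works; paths are illustrative.
byte[] buffer = new byte[4096];
try (GZIPInputStream gzis = new GZIPInputStream(
             new FileInputStream("/run/media/justin/DATA/input.csv.gz"));
     FileOutputStream out = new FileOutputStream("/run/media/justin/DATA/unzipped.txt")) {
    int len;
    while ((len = gzis.read(buffer)) > 0) {
        out.write(buffer, 0, len);
    }
    System.out.println("DONE!!");
} catch (IOException e) {
    e.printStackTrace();
}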
So I have created my own personal HTTP server in Java from scratch.
So far it is working fine, but with one major flaw.
When I try to pass big files to the browser I get a Java heap space error. I know how to work around this error through the JVM, but I am looking for a long-term solution.
// declare an integer for the byte length of the file
int length = (int) f.length();
// start the file input stream
FileInputStream fis = new FileInputStream(f);
// byte array with the length of the whole file
byte[] bytes = new byte[length];
// write the file until the stream is exhausted
while ((length = fis.read(bytes)) != -1) {
    write(bytes, 0, length);
}
flush();
// close the file input stream
fis.close();
This way sends the file to the browser successfully and streams it perfectly, but the issue is that I am creating a byte array with the length of the whole file, so when the file is very big I get the heap space error.
I have eliminated this issue by using a buffer, as shown below, and I don't get heap space errors anymore. BUT the way shown below does not stream the files to the browser correctly. It's as if the file bytes are being shuffled and sent to the browser all jumbled together.
final int bufferSize = 4096;
byte[] buffer = new byte[bufferSize];
FileInputStream fis = new FileInputStream(f);
BufferedInputStream bis = new BufferedInputStream(fis);
while (true) {
    int length = bis.read(buffer, 0, bufferSize);
    if (length < 0) break;
    write(buffer, 0, length);
}
flush();
bis.close();
fis.close();
Note 1: All the correct response headers are being sent to the browser.
Note 2: Both ways work perfectly on a computer browser, but only the first way works on a smartphone's browser (and sometimes it gives me the heap space error).
If someone knows how to send files to a browser and stream them correctly, I would be a very, very happy man.
Thank you in advance! :)
When reading from a BufferedInputStream you can let its buffer handle the buffering; there is no reason to read everything into a byte[] (and certainly not a byte[] the size of the entire File). Read one byte at a time and rely on the internal buffer of the stream. Something like:
FileInputStream fis = new FileInputStream(f);
BufferedInputStream bis = new BufferedInputStream(fis);
int abyte;
while ((abyte = bis.read()) != -1) {
    write(abyte);
}
Emm... As far as I can see, you are trying to use chunks in your code anyway.
As far as I remember, even the Apache HttpClient + FileUpload solution has a file size limit of about <= 2.1 GB or so (correct me if I am wrong), so this is a bit of a hard problem...
I haven't tried this solution myself yet, but as a test you could use java.io.RandomAccessFile in combination with File(Input/Output)Stream on the client and server, so that you do not read and write the whole file at once but rather a sequence of, let's say, <= 30 MB blocks, to avoid the annoying out-of-memory errors; see the sketch below. An example of using RandomAccessFile can be found here: https://examples.javacodegeeks.com/core-java/io/randomaccessfile/java-randomaccessfile-example/
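A rough sketch of the reading side of that idea follows; f is the file from the question, and send(...) is a hypothetical method standing in for whatever transfers one block to the other side.
// Sketch only: read a large file in fixed-size blocks with RandomAccessFile.
final int blockSize = 30 * 1024 * 1024; // <= 30 MB per block, as suggested above
byte[] block = new byte[blockSize];
try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
    long remaining = raf.length();
    while (remaining > 0) {
        int toRead = (int) Math.min(blockSize, remaining);
        raf.readFully(block, 0, toRead); // read exactly toRead bytes from the current position
        send(block, 0, toRead);          // hypothetical transfer of this one block
        remaining -= toRead;
    }
}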
But you still give very few details :( I mean, is your client supposed to be a common Java application or not?
If you have some additional information, please let me know.
Good luck :)
I was given a DataInputStream from an HDFS client for a large file (around 2 GB) and I need to store it as a file on my host.
I was thinking about using Apache Commons IOUtils and doing something like this...
File temp = getTempFile(localPath);
DataInputStream dis = HDFSClient.open(filepath); // around 2 GB file (zipped)
InputStream in = new BufferedInputStream(dis);
OutputStream out = new FileOutputStream(temp);
IOUtils.copy(in, out);
I was looking for other solutions that might work better than this approach. My major concern is the buffering: it happens both in the BufferedInputStream and inside IOUtils.copy ...
For files larger than 2 GB it is recommended to use IOUtils.copyLarge() (if we are speaking about the same IOUtils: org.apache.commons.io.IOUtils).
The copy in IOUtils uses a default buffer size of 4 KB (although you can specify another buffer size as a parameter).
The difference between copy() and copyLarge() is the return value.
For copy(), if the stream is bigger than 2 GB the copy will succeed, but the result is -1.
For copyLarge() the result is exactly the number of bytes you copied.
See more in the documentation here:
http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/IOUtils.html#copyLarge(java.io.InputStream,%20java.io.OutputStream)
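Putting this together with the question's code, a hedged sketch could look like the following. It assumes the same getTempFile and HDFSClient.open helpers from the question, plus an SLF4J-style LOG field; copyLarge() returns a long, so the byte count can be checked or logged.
import org.apache.commons.io.IOUtils;

File temp = getTempFile(localPath);
try (InputStream in = new BufferedInputStream(HDFSClient.open(filepath));
     OutputStream out = new BufferedOutputStream(new FileOutputStream(temp))) {
    long bytesCopied = IOUtils.copyLarge(in, out); // works for streams larger than 2 GB
    LOG.info("Copied {} bytes to {}", bytesCopied, temp); // assumes an SLF4J-style logger
}
The extra BufferedOutputStream is optional; IOUtils.copyLarge() already copies through its own small fixed-size buffer, so the double buffering the question worries about only costs one more small buffer, not heap proportional to the file size.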