Java OutOfMemoryError while merging large file parts from chunked files

I have a problem when a user uploads large files (> 1 GB) (I'm using the flow.js library): it creates hundreds of thousands of small chunk files (e.g. 100 KB each) inside a temporary directory, but then fails to merge them into a single file because of an OutOfMemoryError. This does not happen when the file is under 1 GB. I know it sounds tedious and you will probably suggest increasing -Xmx in my container, but I want another angle besides that.
Here is my code:
private void mergeFile(String identifier, int totalFile, String outputFile) throws AppException{
File[] fileDatas = new File[totalFile]; //we know the size of file here and create specific amount of the array
byte fileContents[] = null;
int totalFileSize = 0;
int filePartUploadSize = 0;
int tempFileSize = 0;
//I'm creating array of file and append the length
for (int i = 0; i < totalFile; i++) {
fileDatas[i] = new File(identifier + "." + (i + 1)); //identifier is the name of the file
totalFileSize += fileDatas[i].length();
}
try {
fileContents = new byte[totalFileSize];
InputStream inStream;
for (int j = 0; j < totalFile; j++) {
inStream = new BufferedInputStream(new FileInputStream(fileDatas[j]));
filePartUploadSize = (int) fileDatas[j].length();
inStream.read(fileContents, tempFileSize, filePartUploadSize);
tempFileSize += filePartUploadSize;
inStream.close();
}
} catch (FileNotFoundException ex) {
throw new AppException(AppExceptionCode.FILE_NOT_FOUND);
} catch (IOException ex) {
throw new AppException(AppExceptionCode.ERROR_ON_MERGE_FILE);
} finally {
write(fileContents, outputFile);
for (int l = 0; l < totalFile; l++) {
fileDatas[l].delete();
}
}
}
Please point out what is inefficient about this method. Once again: only large files cannot be merged with it; smaller ones (< 1 GB) work without any problem.
I would appreciate it if you did not suggest increasing the heap size, and instead showed me the fundamental error in this method. Thanks!

It's unnecessary to allocate the entire file size in memory by declaring a byte array of the entire size. Building the concatenated file in memory in general is totally unnecessary.
Just open an output stream for your target file, and then, for each file that you are combining to make it, read it as an input stream and write its bytes to the output stream, closing each one as you finish. When you're done with them all, close the output file. Total memory use will be a few thousand bytes for the buffer.
Also, don't do I/O operations in a finally block (other than closing resources and the like).
Here is a rough example you can play with.
import java.io.*;
import java.util.ArrayList;
import java.util.List;

List<File> files = new ArrayList<>(); // put your files here, in order
File output = new File("yourfilename");
byte[] buffer = new byte[8192]; // a fixed-size buffer is all the memory this needs
// try-with-resources closes every stream, even if an exception is thrown
try (OutputStream boss = new BufferedOutputStream(new FileOutputStream(output))) {
    for (File file : files) {
        try (InputStream bis = new BufferedInputStream(new FileInputStream(file))) {
            int count;
            // read() returns -1 at end of stream; that value must never be written out
            while ((count = bis.read(buffer)) != -1) {
                boss.write(buffer, 0, count);
            }
        }
    }
} catch (IOException e) {
    // do error handling stuff, log it maybe?
}

... show me the fundamental error of this method
The implementation flaw is that you are creating a byte array (fileContents) whose size is the total file size. If the total file size is too big, that will cause an OOME. Inevitably.
Solution - don't do that! Instead "stream" the file by reading from the "chunk" files and writing to the final file using a modest sized buffer.
There are other problems with your code too. For instance, it could leak file descriptors because you do not ensure that inStream is closed under all circumstances. Read up on the "try-with-resources" construct.
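For illustration, here is a minimal sketch of the question's mergeFile rewritten along those lines. It streams each chunk into the output with java.nio.file.Files.copy and uses try-with-resources. It assumes the same identifier + "." + n chunk naming as the question, and it throws IOException instead of the question's AppException (map it however your application needs). Chunks are deleted only after the merged file has been closed successfully.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ChunkMerger {

    // Streams chunk files identifier.1 .. identifier.N into outputFile.
    // Memory use is bounded by the stream buffers, not by the total file size.
    public static void mergeFile(String identifier, int totalFile, String outputFile) throws IOException {
        Path target = Paths.get(outputFile);
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            for (int i = 1; i <= totalFile; i++) {
                // Files.copy(Path, OutputStream) copies in small chunks and leaves 'out' open
                Files.copy(Paths.get(identifier + "." + i), out);
            }
        }
        // Reached only if every chunk was copied and the output was closed without error.
        for (int i = 1; i <= totalFile; i++) {
            Files.deleteIfExists(Paths.get(identifier + "." + i));
        }
    }
}
```

Because nothing ever holds more than a buffer's worth of data, the same code handles a 100 MB upload and a 10 GB upload with the same heap.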

Related

Java benchmark disk speed

I'm trying to get a reliable method of measuring disk read speed, but I am failing to take the cache out of the equation.
In "How to measure Disk Speed in Java for Benchmarking", an answer from simgineer provides a utility for exactly this, but for some reason I failed to replicate its behaviour, and running the utility does not yield anything precise either (for reads).
Following a suggestion in a different answer, making the test file bigger than the main memory size seems to work, but I cannot afford to spend four whole minutes for the system to allocate a 130 GB file. (Not writing anything to the file results in a sparse file and returns bogus times.)
A sufficient file size seems to be somewhere between Runtime.getRuntime().maxMemory() and Runtime.getRuntime().maxMemory()*2.
The source code of my current solution:
File file = new File(false ? "D:/work/bench.dat" : "./work/bench.dat");
RandomAccessFile wFile = null, rFile = null;
try {
System.out.println("Allocating test file ...");
int blockSize = 1024*1024;
long size = false ? 10L*1024L*(long)blockSize : Runtime.getRuntime().maxMemory()*2;
byte[] block = new byte[blockSize];
for(int i = 0; i<blockSize; i++) {
if(i % 2 == 0) block[i] = (byte) (i & 0xFF);
}
System.out.println("Writing ...");
wFile = new RandomAccessFile(file,"rw");
wFile.setLength(size);
for(long i = 0; i<size-blockSize; i+= blockSize) {
wFile.write(block);
}
wFile.close();
System.out.println("Running read test ...");
long t0 = System.nanoTime();
rFile = new RandomAccessFile(file,"r");
int blockCount = (int)(size/blockSize)-1;
Random rnd = new Random();
for(int i = 0; i<testCount; i++) {
rFile.seek((long)rnd.nextInt(blockCount)*(long)blockSize);
rFile.readFully(block, 0, blockSize);
}
rFile.close();
long t1 = System.nanoTime();
double readB = ((double)testCount*(double)blockSize);
double timeNs = (double)(t1-t0);
return (readB/(1024*1024))/(timeNs/(1000*1000*1000));
} catch (Exception e) {
Logger.logError("Failed to benchmark drive speed!", e);
return 0;
} finally {
if(wFile != null) {try {wFile.close();} catch (IOException e) {}}
if(rFile != null) {try {rFile.close();} catch (IOException e) {}}
if(file.exists()) {file.delete();}
}
I was hoping for a benchmark that finishes within seconds (caching the result for subsequent runs), with only the first execution being a bit slower.
I could technically crawl the filesystem and benchmark reads on files that are already on the drive, but that smells like a lot of undefined behaviour, and firewalls are not happy about it either.
Any other options left? (platform dependent libraries are off the table)
In the end, I decided to solve the problem by scouring the local work folder for files and loading those, hoping we packaged enough with the application to reach spec speeds. In my current test case the answer is luckily yes, but there are no guarantees, so I keep the approach from the question as a backup plan.
This is not exactly a perfect solution, but it somewhat works, reaching spec speeds at about 2000 test files. Bear in mind that this test cannot be rerun with the same results, as all the test files from the previous execution are now probably cached.
You can always call flushmem ( https://chadaustin.me/flushmem/ ) by Chad Austin, but that takes about as long to execute as the original approach, so I would advise simply caching the result of the first run and hoping for the best.
The code I used:
final int MIN_FILE_SIZE = 1024*10;
final int MAX_READ = 1024*1024*50;
final int FILE_COUNT_FRACTION = 4;
// Scour the location of the runtime for any usable files.
ArrayList<File> found = new ArrayList<>();
ArrayList<File> queue = new ArrayList<>();
queue.add(new File("./"));
while(!queue.isEmpty() && found.size() < testCount) {
File tested = queue.remove(queue.size()-1);
if(tested.isDirectory()) {
File[] children = tested.listFiles(); // null if the directory cannot be read
if (children != null) queue.addAll(Arrays.asList(children));
} else if(tested.length()>MIN_FILE_SIZE){
found.add(tested);
}
}
// If amount of found files is not sufficient, perform test with new file.
if(found.size() < testCount/FILE_COUNT_FRACTION) {
Logger.logInfo("Disk to CPU transfer benchmark failed to find "
+ "sufficient amount of files to read, slow version "
+ "will be performed!", found.size());
return benchTransferSlowDC(testCount);
}
System.out.println(found.size());
byte[] block = new byte[MAX_READ];
Collections.shuffle(found);
RandomAccessFile raf = null;
long readB = 0;
try {
long t0 = System.nanoTime();
for(int i = 0; i<Math.min(found.size(), testCount); i++) {
File file = found.get(i);
int size = (int) Math.min(file.length(), MAX_READ);
raf = new RandomAccessFile(file,"r");
raf.readFully(block, 0, size); // read() may return fewer bytes than requested
raf.close();
readB += size;
}
long t1 = System.nanoTime();
return ((double)readB/(1024*1024))/((double)(t1-t0)/(1000*1000*1000));
//return (double)(t1-t0) / (double)readB;
} catch (Exception e) {
Logger.logError("Failed to benchmark drive speed!", e);
if(raf != null) try {raf.close();} catch(Exception ex) {}
return 0;
}
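Following the advice above to cache the result of the first run, one possible sketch is a small wrapper that stores the measured MB/s in a file and reuses it on later runs. The cache file location and the BenchCache/cachedSpeed names are hypothetical; the benchmark itself is passed in as a DoubleSupplier (for example a lambda around the method above).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.DoubleSupplier;

public class BenchCache {

    // Returns the cached MB/s value if present; otherwise runs the benchmark once and stores it.
    public static double cachedSpeed(Path cacheFile, DoubleSupplier benchmark) throws IOException {
        if (Files.exists(cacheFile)) {
            String text = new String(Files.readAllBytes(cacheFile), StandardCharsets.UTF_8).trim();
            try {
                return Double.parseDouble(text);
            } catch (NumberFormatException e) {
                // corrupt cache file: fall through and measure again
            }
        }
        double speed = benchmark.getAsDouble();
        // The benchmark methods above return 0 on failure; do not cache a failed run.
        if (speed > 0) {
            Files.write(cacheFile, Double.toString(speed).getBytes(StandardCharsets.UTF_8));
        }
        return speed;
    }
}
```

Only the first call pays for the slow, uncached measurement; deleting the cache file forces a fresh run.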

Thread blocking sometimes while unzipping png files

I am developing an Android app that needs to download a zip file (around 1.5 MB max) containing a small number of logos (PNG files of 20-30 KB on average) from a web server.
I have encapsulated the process of downloading and unzipping the files into Android internal storage in an AsyncTask's doInBackground() method.
The issue I have is that the unZipIntoInternalStorage() method I have developed (pasted below) sometimes runs forever. Usually it takes around 900 ms to unzip and save the logos into internal storage, but for some unknown reason, around 1 in 4 executions blocks during this loop (and stays there "forever", taking more than 2 or 3 minutes to decompress all the PNG files):
while ((count = zipInputStream.read(buffer)) != -1) {
outputStream.write(buffer, 0, count);
}
Edit: After some logging and debugging, I found out that the line slowing down the execution so much is zipInputStream.read(buffer), inside the while condition. Any ideas why it sometimes runs extremely fast and other times extremely slowly?
Here is my complete method to unzip the downloaded files and save them into Android internal storage. I have also added the method where the zipInputStream is initialized from the downloaded zip file (both methods are executed inside doInBackground()):
private void unZipIntoInternalStorage(ZipInputStream zipInputStream) {
long start = System.currentTimeMillis();
Log.i(LOG_TAG, "Unzipping started ");
try {
File iconsDir = context.getDir("icons", Context.MODE_PRIVATE);
ZipEntry zipEntry;
byte[] buffer = new byte[1024];
int count;
FileOutputStream outputStream;
while ((zipEntry = zipInputStream.getNextEntry()) != null) {
File icon = new File(iconsDir, zipEntry.getName());
outputStream = new FileOutputStream(icon);
while ((count = zipInputStream.read(buffer)) != -1) {
outputStream.write(buffer, 0, count);
}
zipInputStream.closeEntry();
outputStream.close();
}
zipInputStream.close();
} catch (Exception e) {
Log.e(LOG_TAG + " Decompress", "unzip error ", e);
e.printStackTrace();
}
Log.i(LOG_TAG, "Unzipping completed time required: " + (System.currentTimeMillis() - start) + " ms");
}
private ZipInputStream httpDownloadIconsZip(String zipUrl) {
URLConnection urlConnection;
try {
URL finalUrl = new URL(zipUrl);
urlConnection = finalUrl.openConnection();
return new ZipInputStream(urlConnection.getInputStream());
} catch (IOException e) {
Log.e(LOG_TAG, Log.getStackTraceString(e));
return null;
}
}
To clarify: after testing and debugging this method several times, the blocking always happens in the nested while loop I described previously, but I can't find the reason (see the edit above).
I have also already tried this method with a BufferedOutputStream, with the same results: the nested while loop sometimes runs forever, and other times the unzipping succeeds in less than a second.
I hope I have been as clear as possible; I have spent long hours looking for possible causes of the issue in several posts about unzipping files and Java I/O methods, with no success.
Any help appreciated. Thanks
I would suspect the InputStream rather than the output to be the issue.
Try:
return new ZipInputStream(new BufferedInputStream(urlConnection.getInputStream()));
You can add an argument for setting buffer size, but default settings should be fine for your use case.
The problem is typically caused by small packet size, leading to one read forcing several IO operations.
Ideally, you also want to use a BufferedOutputStream: a read may return much less than 1 KB, but you still pay for a full I/O operation on each write.
As a general rule, remember that I/O is about 100 times slower than anything else you could do, and it often leads to the scheduler putting your task into a wait state. So use a buffered stream anywhere the stream is not backed by memory (i.e. basically always, except for in-memory streams such as StringBufferInputStream or ByteArrayInputStream).
In your case, due to the zip format, a single read could lead to any number of smaller reads on the actual network socket, as ZipInputStream parses and interprets the headers and contents of the compressed file.
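A minimal sketch of the unzip loop with both suggestions applied: the raw stream is wrapped in a BufferedInputStream before ZipInputStream sees it, and each output file goes through a BufferedOutputStream. The BufferedUnzip/unzip names and signature are hypothetical; on Android, targetDir would be the directory returned by context.getDir("icons", Context.MODE_PRIVATE) and raw would be urlConnection.getInputStream(). Buffering reduces the number of small socket reads; it does not by itself stop a read from blocking while the server is slow to send data.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class BufferedUnzip {

    // Returns the number of files extracted into targetDir.
    public static int unzip(InputStream raw, File targetDir) throws IOException {
        int files = 0;
        byte[] buffer = new byte[8192];
        // BufferedInputStream serves ZipInputStream's many small header/data reads from memory.
        try (ZipInputStream zip = new ZipInputStream(new BufferedInputStream(raw))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                File out = new File(targetDir, entry.getName());
                if (entry.isDirectory()) {
                    out.mkdirs();
                    continue;
                }
                out.getParentFile().mkdirs();
                // Buffered output: small reads no longer turn into one write call each.
                try (OutputStream os = new BufferedOutputStream(new FileOutputStream(out))) {
                    int count;
                    while ((count = zip.read(buffer)) != -1) {
                        os.write(buffer, 0, count);
                    }
                }
                zip.closeEntry();
                files++;
            }
        }
        return files;
    }
}
```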
I know this error. If your zip file is damaged and you try to unzip it with ZipInputStream, you will get an infinite loop, because the file has no EOF.
But if you unzip it with ZipFile, you can catch the exception!
// BUFFER (e.g. 8192) and TAG are assumed to be constants of the enclosing class.
public static boolean unZipByFilePath(String fileName, String unZipDir) {
    File outDir = new File(unZipDir);
    if (!outDir.exists()) {
        outDir.mkdirs();
    }
    try (ZipFile zipfile = new ZipFile(fileName)) {
        Enumeration<? extends ZipEntry> entries = zipfile.entries();
        byte[] data = new byte[BUFFER];
        while (entries.hasMoreElements()) {
            ZipEntry entry = entries.nextElement();
            File destFile = new File(outDir, entry.getName());
            if (entry.isDirectory()) {
                // directory entries have no content; a FileOutputStream on them would fail
                destFile.mkdirs();
                continue;
            }
            destFile.getParentFile().mkdirs();
            // both streams are closed even if an exception is thrown mid-copy
            try (InputStream is = new BufferedInputStream(zipfile.getInputStream(entry));
                 OutputStream dest = new BufferedOutputStream(new FileOutputStream(destFile), BUFFER)) {
                int count;
                while ((count = is.read(data, 0, BUFFER)) != -1) {
                    dest.write(data, 0, count);
                }
            }
        }
    } catch (IOException e) {
        Log.e(TAG, "unZipByFilePath failed : " + e.getMessage());
        return false;
    }
    return true;
}
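To see the behaviour this answer relies on, here is a small sketch (the ZipCheck/isReadableZip names are hypothetical) that simply tries to open a file with ZipFile. Because ZipFile reads the archive's central directory at the end of the file when it is opened, a truncated or non-zip file usually fails immediately with a ZipException, which you can catch before extracting anything.

```java
import java.io.File;
import java.io.IOException;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

public class ZipCheck {

    // Returns true if the file opens as a zip archive (central directory readable).
    public static boolean isReadableZip(File file) {
        try (ZipFile zip = new ZipFile(file)) {
            // Opening succeeded, so the central directory was found and parsed.
            return true;
        } catch (ZipException e) {
            return false; // not a zip, or the archive is truncated/damaged
        } catch (IOException e) {
            return false; // missing file or another I/O failure
        }
    }
}
```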

How to flush the ZipOutputStream periodically in Java

I am trying to archive a list of files in zip format and then let the user download it on the fly.
I am facing an out-of-memory issue when downloading a 1 GB zip.
Please help me resolve this without increasing the JVM heap size. I would like to flush the stream periodically.
I am trying to flush periodically, but this is not working for me.
Please find my code attached below:
try{
ServletOutputStream out = response.getOutputStream();
ZipOutputStream zip = new ZipOutputStream(out);
response.setContentType("application/octet-stream");
response.addHeader("Content-Disposition",
"attachment; filename=\"ResultFiles.zip\"");
//adding multiple files to zip
ZipUtility.addFileToZip("c:\\a", "print1.txt", zip);
ZipUtility.addFileToZip("c:\\a", "print2.txt", zip);
ZipUtility.addFileToZip("c:\\a", "print3.txt", zip);
ZipUtility.addFileToZip("c:\\a", "print4.txt", zip);
zip.flush();
zip.close();
out.close();
} catch (ZipException ex) {
System.out.println("zip exception");
} catch (Exception ex) {
System.out.println("exception");
ex.printStackTrace();
}
public class ZipUtility {
static public void addFileToZip(String path, String srcFile,
ZipOutputStream zip) throws Exception {
File file = new File(path + "\\" + srcFile);
boolean exists = file.exists();
if (exists) {
long fileSize = file.length();
int buffersize = (int) fileSize;
byte[] buf = new byte[buffersize];
int len;
FileInputStream fin = new FileInputStream(path + "\\" + srcFile);
zip.putNextEntry(new ZipEntry(srcFile));
int bytesread = 0, bytesBuffered = 0;
while ((bytesread = fin.read(buf)) > -1) {
zip.write(buf, 0, bytesread);
bytesBuffered += bytesread;
if (bytesBuffered > 1024 * 1024) { //flush after 1mb
bytesBuffered = 0;
zip.flush();
}
}
zip.closeEntry();
zip.flush();
fin.close();
}
}
}
}
You want to use chunked encoding to send a file that large; otherwise the servlet container will try to figure out the size of the data you are sending before sending it, so that it can set the Content-Length header. Since you are compressing files, you don't know the size of the data you're sending. Chunked encoding allows you to send the response in smaller pieces. Don't set the content length of the stream. You might try using curl or something similar to see the HTTP headers in the response you're getting from the server. If it isn't chunked, you'll want to figure out why, and research how to force the servlet container to use chunked encoding. You might have to add this to the response headers to make the servlet container send it chunked.
response.setHeader("Transfer-Encoding", "chunked");
The other option would be to compress the files into a temporary file with File.createTempFile(), and then send its contents. If you compress to a temp file first, you know how big the file is and can set the content length for the servlet.
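A minimal sketch of the temp-file approach, assuming the files to zip are given as a list of Paths (the TempZip/zipToTempFile names are hypothetical). Once the method returns, Files.size(path) gives the length to report (for example via response.setContentLengthLong(...) on Servlet 3.1 or later), and Files.copy(path, response.getOutputStream()) streams the archive; delete the temp file afterwards.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class TempZip {

    // Zips the given files into a new temporary file and returns its path.
    public static Path zipToTempFile(List<Path> sources) throws IOException {
        Path tmp = Files.createTempFile("ResultFiles", ".zip");
        try (ZipOutputStream zip = new ZipOutputStream(
                new BufferedOutputStream(Files.newOutputStream(tmp)))) {
            for (Path src : sources) {
                zip.putNextEntry(new ZipEntry(src.getFileName().toString()));
                Files.copy(src, zip); // streams the file; no whole-file byte[] is needed
                zip.closeEntry();
            }
        }
        return tmp;
    }
}
```

The trade-off is extra disk I/O in exchange for a known Content-Length, which also lets clients show download progress.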
I guess you are digging in the wrong direction. Try replacing the servlet output stream with a file stream and see if the issue is still there. I suspect your web container tries to collect the whole servlet output to calculate the Content-Length before sending the HTTP headers.
Another thing: you are calling close inside your try block. This leaves the chance that the stream stays open on your files if an exception is thrown, and it does NOT give the stream the chance to flush to the disk.
Always make sure your close is in a finally block (at least until you can get Java 7 with its try-with-resources block)
//build the byte buffer for transferring the data from the file
//to the zip.
final int BUFFER = 2048;
byte [] data = new byte[BUFFER];
File zipFile = new File("C:\\myZip.zip");
BufferedInputStream in = null;
ZipOutputStream zipOut = null;
try {
//create the out stream to send the file to and zip it.
//we want it buffered as that is more efficient.
FileOutputStream destination = new FileOutputStream(zipFile);
zipOut = new ZipOutputStream(new BufferedOutputStream(destination));
zipOut.setMethod(ZipOutputStream.DEFLATED);
//create the input stream (buffered) to read in the file so we
//can write it to the zip.
in = new BufferedInputStream(new FileInputStream(fileToZip), BUFFER);
//now "add" the file to the zip (in object speak only).
ZipEntry zipEntry = new ZipEntry(fileName);
zipOut.putNextEntry(zipEntry);
//now actually read from the file and write the file to the zip.
int count;
while((count = in.read(data, 0, BUFFER)) != -1) {
zipOut.write(data, 0, count);
}
}
catch (FileNotFoundException e) {
throw e;
}
catch (IOException e) {
throw e;
}
finally {
//whether we succeed or not, close the streams.
if(in != null) {
try {
in.close();
}
catch (IOException e) {
//note and do nothing.
e.printStackTrace();
}
}
if(zipOut != null) {
try {
zipOut.close();
}
catch (IOException e) {
//note and do nothing.
e.printStackTrace();
}
}
}
Now if you need to loop, you can just loop around the part that you need to add more files to. Perhaps pass in an array of files and loop over it. This code worked for me zipping a file up.
Don't size your buf based on the file size; use a fixed-size buffer.
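Applied to the question's ZipUtility.addFileToZip, that could look like the sketch below (same method signature; the 8 KB buffer size is an arbitrary choice). It also builds the path with new File(path, srcFile) instead of concatenating "\\", which works on any OS, and it drops the manual flush() calls, which are not needed to keep memory bounded once the buffer is small.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipUtility {

    private static final int BUFFER_SIZE = 8192; // fixed, independent of the file size

    // Copies one file into the zip with a small fixed buffer, so memory use stays
    // constant no matter how large the file is.
    public static void addFileToZip(String path, String srcFile, ZipOutputStream zip) throws IOException {
        File file = new File(path, srcFile);
        if (!file.exists()) {
            return;
        }
        byte[] buf = new byte[BUFFER_SIZE];
        zip.putNextEntry(new ZipEntry(srcFile));
        try (InputStream in = new FileInputStream(file)) {
            int len;
            while ((len = in.read(buf)) != -1) {
                zip.write(buf, 0, len);
            }
        }
        zip.closeEntry();
    }
}
```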

Java NIO and FileInputStream

Basically, I have this code to decompress a string that is stored in a file:
public static String decompressRawText(File inFile) {
InputStream in = null;
InputStreamReader isr = null;
StringBuilder sb = new StringBuilder(STRING_SIZE);
try {
in = new FileInputStream(inFile);
in = new BufferedInputStream(in, BUFFER_SIZE);
in = new GZIPInputStream(in, BUFFER_SIZE);
isr = new InputStreamReader(in);
int length = 0;
while ((length = isr.read(cbuf)) != -1) {
sb.append(cbuf, 0, length);
}
} catch (Exception e) {
e.printStackTrace();
} finally {
try {
in.close();
} catch (Exception e1) {
e1.printStackTrace();
}
}
return sb.toString();
}
Since physical I/O is quite time-consuming, and since the compressed versions of my files are all quite small (around 2K from 2M of text), is it possible to still do the above, but on a file that is already mapped into memory, possibly using Java NIO? Thanks
It won't make any difference, at least not much. Mapped files were about 20% faster for I/O the last time I looked. You still have to actually do the I/O: mapping just saves some data copying. I would look at increasing BUFFER_SIZE to at least 32k, and also the size of cbuf, which should be a local variable in this method rather than a member variable, so that it is thread-safe. It might also be worth not compressing files under a certain size threshold, say 10k.
Also you should be closing isr here, not in.
It might be worth trying putting another BufferedInputStream on top of the GZIPInputStream, as well as the one underneath it. Get it to do more at once.
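Putting those suggestions together, a minimal sketch might look like this. Assumptions: BUFFER_SIZE is 32 KB, the text is UTF-8 (the original uses the platform default charset), and I/O errors are propagated to the caller instead of being printed. The char buffer is local, the reader on top of the chain is closed, and there is a BufferedInputStream both below and above the GZIPInputStream.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class RawText {

    private static final int BUFFER_SIZE = 32 * 1024; // "at least 32k"

    public static String decompressRawText(File inFile) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] cbuf = new char[BUFFER_SIZE]; // local, so concurrent calls never share it
        // fileIn is a separate resource so it is closed even if the GZIP header is invalid;
        // closing isr closes the rest of the chain.
        try (InputStream fileIn = new FileInputStream(inFile);
             Reader isr = new InputStreamReader(
                     new BufferedInputStream(                                   // above the GZIP stream
                             new GZIPInputStream(
                                     new BufferedInputStream(fileIn, BUFFER_SIZE), // below it
                                     BUFFER_SIZE),
                             BUFFER_SIZE),
                     StandardCharsets.UTF_8)) {
            int length;
            while ((length = isr.read(cbuf)) != -1) {
                sb.append(cbuf, 0, length);
            }
        }
        return sb.toString();
    }
}
```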

Error while downloading a file in Java: java.net.SocketException: Connection reset

I'm trying to download a file using a socket and server in Java.
myClient = new Socket(address,port);
myClient.setSoTimeout(MyFileManager.TIME_OUT);
in = new DataInputStream(myClient.getInputStream());
out = new DataOutputStream(myClient.getOutputStream());
File requestedFile = new File(_fileManager.getDir()+fileName); //creating the new file
// requestedFile.createNewFile(); //now it does
fos = new FileOutputStream(requestedFile);
long size = in.readLong(); //get the size
for (int i=1; i<=size; i++) {
try {
fos.write(in.read());
}
catch (IOException e) {
e.printStackTrace();
}
}
I'm sending the other side the file size and then sending each byte. Right before the bytes end, it throws the above exception, saying "Connection reset".
What could be the problem?
Thank you!
Why do you think that this line returns the number of bytes in the stream??
long size = in.readLong(); //get the size
You should do in.read() until it returns -1.
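I think that the loop should be written in the conventional 0-based form:
for (int i=0; i<size; i++)...
(This runs exactly as many times as the original 1..size loop, so it is a readability change rather than a fix for the exception.)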
Also, you might want to keep reading until you hit the EOF rather than reading a specific number of bytes. See this tutorial to learn how :)
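For the length-prefixed protocol described in the question, a minimal sketch of the receiving side might look like this. It assumes the sender writes the size with DataOutputStream.writeLong followed by the raw bytes (the question says the size is sent first but doesn't show the sender). It copies in buffered blocks rather than one byte per call, and treats a -1 from read() before size bytes have arrived as an error instead of writing it to the file. The LengthPrefixed/receive names are hypothetical.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class LengthPrefixed {

    // Reads an 8-byte big-endian length (as written by DataOutputStream.writeLong),
    // then copies exactly that many bytes to out. Returns the length that was read.
    public static long receive(InputStream rawIn, OutputStream out) throws IOException {
        DataInputStream in = new DataInputStream(rawIn);
        long remaining = in.readLong();
        long total = remaining;
        byte[] buffer = new byte[8192];
        while (remaining > 0) {
            int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (n == -1) {
                // The question's loop passes -1 straight to fos.write(), which writes a 0xFF byte.
                throw new EOFException("Stream ended with " + remaining + " bytes still expected");
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
        out.flush();
        return total;
    }
}
```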
