I am trying to send chunks of a file from a server to multiple clients. When I try to send a 700 MB file, I get an "OutOfMemoryError: Java heap space" error. I am using NetBeans 7.1.2.
I also tried setting VM options in the project properties, but the same error still happens. I think the problem is with reading the entire file into memory. The code below works for files up to 300 MB. Please give me some suggestions.
Thanks in advance
public class SplitFile {
static int fileid = 0 ;
public static DataUnit[] getUpdatableDataCode(File fileName) throws FileNotFoundException, IOException{
int i = 0;
DataUnit[] chunks = new DataUnit[UAProtocolServer.singletonServer.cloudhosts.length];
FileInputStream fis;
long Chunk_Size = (fileName.length())/chunks.length;
int cursor = 0;
long fileSize = (long) fileName.length();
int nChunks = 0, read = 0;long readLength = Chunk_Size;
byte[] byteChunk;
try {
fis = new FileInputStream(fileName);
//StupidTest.size = (int)fileName.length();
while (fileSize > 0) {
System.out.println("loop"+ i);
if (fileSize <= Chunk_Size) {
readLength = (int) fileSize;
}
byteChunk = new byte[(int)readLength];
read = fis.read(byteChunk, 0, (int)readLength);
fileSize -= read;
// cursor += read;
assert(read==byteChunk.length);
long aid = fileid;
aid = aid<<32 | nChunks;
chunks[i] = new DataUnit(byteChunk,aid);
// Lister.add(chunks[i]);
nChunks++;
++i;
}
fis.close();
fis = null;
}catch(Exception e){
System.out.println("File splitting exception");
e.printStackTrace();
}
return chunks;
}
Reading in the whole file will definitely trigger an OutOfMemoryError as the file size grows. Raising the heap limit with -Xmx1024M may work as a temporary fix, but it is not a scalable solution. Also, no matter how you move your variables around (like creating the buffer outside the loop instead of inside it), you will get an OutOfMemoryError sooner or later. The only way to avoid it is not to read the complete file into memory.
If you have to use just memory, then one approach is to send each chunk off to the client as soon as it is read, so you don't have to keep all the chunks in memory:
instead of:
chunks[i] = new DataUnit(byteChunk,aid);
do:
sendChunkToClient(new DataUnit(byteChunk, aid));
But the above solution has the drawback that if an error happens partway through sending the chunks, you may have a hard time resuming or recovering from the point of failure.
Saving the chunks to temporary files like Ross Drew suggested is probably better and more reliable.
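A minimal sketch of the send-as-you-go idea, not the poster's actual protocol code. ChunkSink is a hypothetical callback standing in for sendChunkToClient, and ChunkStreamer and the chunk size are names chosen here for illustration. The point is that only one chunk is ever held in memory:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical callback standing in for sendChunkToClient(...) from the answer above.
interface ChunkSink {
    void accept(long chunkId, byte[] data) throws IOException;
}

final class ChunkStreamer {
    // Reads the stream in pieces of at most chunkSize bytes and hands each piece
    // to the sink right away, so only one chunk is held in memory at a time.
    // Returns the number of chunks delivered.
    static int streamChunks(InputStream in, int chunkSize, ChunkSink sink) throws IOException {
        byte[] buffer = new byte[chunkSize];
        int chunks = 0;
        while (true) {
            // read() may return fewer bytes than requested, so fill the buffer in a loop.
            int filled = 0;
            while (filled < chunkSize) {
                int n = in.read(buffer, filled, chunkSize - filled);
                if (n == -1) break;
                filled += n;
            }
            if (filled == 0) break; // end of stream, nothing left to send
            sink.accept(chunks, Arrays.copyOf(buffer, filled));
            chunks++;
            if (filled < chunkSize) break; // a short chunk means the stream has ended
        }
        return chunks;
    }
}
```

For a file, this would be called with a FileInputStream opened in a try-with-resources block. The sink could just as well write each chunk to a temporary file, which keeps the memory bound and makes resuming easier.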
How about creating the
byteChunk = new byte[(int)readLength];
outside of the loop and just reusing it, instead of creating a new byte array over and over, if it is always the same size?
Alternatively
You could write incoming data to a temporary file as it comes in instead of maintaining that huge array then process it once it's all arrived.
Also
If you are using it multiple times as an int, you should probably just cast readLength to an int outside the loop as well
int len = (int)readLength;
And Chunk_Size is a variable, right? By Java convention it should begin with a lowercase letter.
I have a series of objects stored within a file concatenated as below:
sizeOfFile1 || file1 || sizeOfFile2 || file2 ...
The size of the files are serialized long objects and the files are just the raw bytes of the files.
I am trying to extract the files from the input file. Below is my code:
FileInputStream fileInputStream = new FileInputStream("C:\\Test.tst");
ObjectInputStream objectInputStream = new ObjectInputStream(fileInputStream);
while (fileInputStream.available() > 0)
{
long size = (long) objectInputStream.readObject();
FileOutputStream fileOutputStream = new FileOutputStream("C:\\" + size + ".tst");
BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
int chunkSize = 256;
final byte[] temp = new byte[chunkSize];
int finalChunkSize = (int) (size % chunkSize);
final byte[] finalTemp = new byte[finalChunkSize];
while(fileInputStream.available() > 0 && size > 0)
{
if (fileInputStream.available() > finalChunkSize)
{
int i = fileInputStream.read(temp);
bufferedOutputStream.write(temp, 0, i);
size = size - i;
}
else
{
int i = fileInputStream.read(finalTemp);
bufferedOutputStream.write(finalTemp, 0, i);
size = 0;
}
}
bufferedOutputStream.close();
}
fileOutputStream.close();
My code fails after it reads the first sizeOfFile; it just reads the rest of the input file into one file when there are multiple files stored.
Can anyone see the issue here?
Regards.
Wrap it in a DataInputStream and use readFully(byte[]).
But I question the design. Serialization and random access do not mix. It sounds like you should be using a database.
NB you are misusing available(). See the method's Javadoc page. It is never correct to use it as a count of the total number of bytes in the stream. There are few if any correct uses of available(), and this isn't one of them.
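A rough sketch of the readFully approach, under one loud assumption: the sizes are written with DataOutputStream.writeLong (8 raw bytes), not as serialized Long objects through ObjectOutputStream as in the question. SizePrefixed and its method names are made up for this example. Note that nothing here calls available():

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

final class SizePrefixed {
    // Writes each payload as an 8-byte big-endian length followed by the raw bytes.
    static void writeAll(OutputStream out, List<byte[]> payloads) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        for (byte[] p : payloads) {
            data.writeLong(p.length);
            data.write(p);
        }
        data.flush();
    }

    // Reads records back until readLong() hits the end of the stream.
    static List<byte[]> readAll(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<byte[]> result = new ArrayList<>();
        while (true) {
            long size;
            try {
                size = data.readLong();
            } catch (EOFException e) {
                break; // no more records
            }
            if (size < 0 || size > Integer.MAX_VALUE - 8) {
                throw new IOException("Record too large for one array: " + size);
            }
            byte[] payload = new byte[(int) size];
            // readFully keeps reading until the array is full, or throws EOFException.
            data.readFully(payload);
            result.add(payload);
        }
        return result;
    }
}
```

This variant holds each record in memory, so it only suits records that comfortably fit in the heap.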
You could try NIO instead:
FileChannel roChannel = new RandomAccessFile(file, "r").getChannel();
ByteBuffer roBuf = roChannel.map(FileChannel.MapMode.READ_ONLY, 0, SIZE);
This maps only the first SIZE bytes of the file.
This uses DataInput to read the longs. In this particular case I am not using readFully(), as a segment might be too large to keep in memory:
// inputFile is the concatenated input file
DataInputStream in = new DataInputStream(new FileInputStream(inputFile));
byte[] buf = new byte[64*1024];
while(true) {
    long size;
    try { size = in.readLong(); } catch (EOFException e) { break; }
    OutputStream out = ...;
    while(size > 0) {
        // never read past the end of the current segment
        int len = (int) Math.min(buf.length, size);
        len = in.read(buf, 0, len);
        if (len == -1) throw new EOFException("Segment truncated");
        out.write(buf, 0, len);
        size -= len;
    }
    out.close();
}
in.close();
Save yourself a lot of trouble by doing one of these things:
Switch to using Avro; trust me, you would be crazy not to. It's easy to learn and will accommodate schema changes. Using ObjectXXXStream is one of the worst ideas ever: as soon as you change your schema, your old files are garbage.
or use Thrift
or use Hibernate (but this is probably not a great option; Hibernate takes a lot of time to learn and a lot of configuration)
If you really refuse to switch to Avro, I recommend reading up on Apache's IOUtils class. It has a method to copy from an input stream to an output stream, saving you a lot of headaches. Unfortunately what you want to do is a little more complicated, since you want the size prefixing each file. You might be able to use a combination of SequenceInputStream objects to do that.
There are also GZIPOutputStream and ZipOutputStream; both are part of the JDK (java.util.zip), so they need no extra jars on your classpath.
I'm not going to write an example because I honestly think you should just learn Avro or Thrift and use that.
I have wiki.txt file and its size is 50 MB.
I need to do several things with the file, so I thought the best way in terms of performance would be to load the file into memory. Is that correct?
This is the code that I have written:
File file = new File("wiki.txt");
FileInputStream fileInputStream = new FileInputStream(file);
FileChannel fileChannel = fileInputStream.getChannel();
MappedByteBuffer mapByteBuffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, file.length());
System.out.println((char)mapByteBuffer.get());
I get an error on this call: mapByteBuffer.get().
I tried a few variants of the get() function, but all of them fail, and e.getMessage() does not even give me an error message; it just returns null.
Another important point: my text file contains English words, and what I need to do is search it to check whether a given expression exists in the file.
Thank you.
I would suggest using a memory-mapped file (MappedByteBuffer), which lets you read the file directly from disk instead of loading it all into memory.
RandomAccessFile file = new RandomAccessFile("wiki.txt", "r");
FileChannel channel = file.getChannel();
// The file is opened read-only, so the mapping must be READ_ONLY; map the whole file
MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
And then you can read the buffer as usual.
My answers for point (1):
It depends on what you want to do with the file. If your processing doesn't involve rewinding (looking back at what was read before), it's best to just read the file as a stream and process it in one go, instead of loading it all into memory.
Even if you need random access across the file, you may want to work on it block by block, because loading everything may not scale well as the file grows.
Use RandomAccessFile if you are on Java 1.4 or above.
For random access, the operating system usually handles file buffer caching quite well, so you don't have to handle it yourself.
It is important to read the whole error, not just the message. Often the real information is in the exception's name, not the text associated with it.
You will get an error if the file is empty as there is no first byte.
Note: the approach you are using assumes ASCII 7-bit characters. If you want to assume ISO-8859-1 characters you can use (char) (byteBuffer.get() & 0xFF)
However, if you have plain text you may find that using strings is simpler and not much slower, e.g. you can read a 50 MB file as text in less than a second. I would only use a memory-mapped file if this takes far too long.
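As a rough illustration of the "just use strings" route, here is a small sketch that loads the file into a String and searches it. TextSearch and containsPhrase are hypothetical names, and ISO-8859-1 is an assumed encoding taken from the note above; use the file's real charset if it differs:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class TextSearch {
    // Loads the whole file into a String and reports whether the phrase occurs in it.
    // Fine for a file of tens of megabytes; not meant for multi-gigabyte inputs.
    static boolean containsPhrase(Path file, String phrase) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        return text.contains(phrase);
    }
}
```

If several searches are needed, it would make sense to keep the String around instead of re-reading the file each time.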
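I would suggest using BufferedReader. It is much faster and requires relatively few resources.
First count the number of lines:
InputStream is = new BufferedInputStream(new FileInputStream(filename));
byte[] chars = new byte[1024];
int numberOfChars;
int count = 0; // line breaks seen so far
boolean endsWithNewline = true; // stays true for an empty file, which then counts as 0 lines
while ((numberOfChars = is.read(chars)) != -1)
{
    for (int i = 0; i < numberOfChars; ++i)
    {
        if (chars[i] == '\n')
        {
            ++count;
        }
    }
    if (numberOfChars > 0)
    {
        endsWithNewline = (chars[numberOfChars - 1] == '\n');
    }
}
is.close();
if (!endsWithNewline) count++; // the last line has no trailing newline
return count; // number of lines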
Then read the lines:
BufferedReader in = new BufferedReader(new FileReader(fileName));
for (int i = 0; i < endLine; i++)
{
String oneLine = in.readLine();
}
You can then search these strings for whatever you need.
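A minimal sketch of searching line by line with a BufferedReader, without counting the lines first. LineSearch and firstLineContaining are names made up here, and ISO-8859-1 is an assumed encoding. Only one line is held in memory at a time, but a phrase that spans a line break will not be found:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LineSearch {
    // Returns the 1-based number of the first line containing the phrase, or -1 if none does.
    static int firstLineContaining(Path file, String phrase) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            String line;
            int lineNo = 0;
            while ((line = in.readLine()) != null) {
                lineNo++;
                if (line.contains(phrase)) {
                    return lineNo;
                }
            }
        }
        return -1;
    }
}
```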
Situation: I have an ArrayList<String> containing a bunch of links to images (http:/www.foo.com/bar/image1.jpg, http:/www.foo.com/bar/image2.png,... etc)
I have found a working piece of code in order to download them one by one:
public void run() {
try {
int counter = 1;
for (String image : imagesList) {
controller.setDownloadStatusTextArea("Downloading image " + counter + " of " + imagesList.size());
URL u = new URL(image);
URLConnection uc = u.openConnection();
String contentType = uc.getContentType();
int contentLength = uc.getContentLength();
InputStream raw = uc.getInputStream();
InputStream in = new BufferedInputStream(raw);
byte[] data = new byte[contentLength];
int bytesRead;
int offset = 0;
while (offset < contentLength) {
bytesRead = in.read(data, offset, data.length - offset);
if (bytesRead == -1)
break;
offset += bytesRead;
}
in.close();
if (offset != contentLength) {
throw new IOException("Only read " + offset + " bytes; Expected " + contentLength + " bytes");
}
String[] tmp = image.split("/");
String filename = tmp[tmp.length - 1];
FileOutputStream out = new FileOutputStream(filename);
out.write(data);
out.flush();
out.close();
counter++;
}
controller.setDownloadStatusTextArea("Download complete");
} catch (Exception ex) {
controller.setDownloadStatusTextArea("Download failed");
}
}
This is the first time I'm doing something like this in Java, and I have a feeling this code can be much more efficient by moving a bunch of variables outside of the for loop. But I'm not sure which can be safely moved outside without affecting the functionality and/or performance (both in a negative or positive way). An insight in this situation would be greatly appreciated.
Also: Can I specify where the files need to be downloaded to? Now they just appear in the project folder, I want the user to be able to change his download folder.
Thanks in advance.
This code can't be made much more time-efficient.
Think of it this way: even if you polished every last dispensable opcode out of it, the time it takes for the JVM to execute this portion of code is not significant at all. The real delay will be in waiting for the data to arrive through the network.
It could be more space-efficient, but I don't think it's necessary.
Edit: what you can do is download multiple images at the same time, using threads. If the code above looks complicated though, I would advise against it: take some more time to learn your way around the language.
You don't need to allocate a byte array for the whole image; you only need a small buffer, e.g. 8 kB.
Then, in a loop, read up to 8 kB from the connection and write it to the FileOutputStream.
To make the whole code simpler (kick out the loops), you can use e.g. Commons-IO (click on the FRAMES link to see the whole javadoc).
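The small-buffer loop described above can be sketched with plain JDK classes as follows. StreamCopy is a name chosen for this example, and the 8 kB buffer size is simply the figure suggested above:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopy {
    // Copies everything from in to out through a fixed 8 kB buffer.
    // Memory use stays constant no matter how large the download is.
    // Returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8 * 1024];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

In the download loop from the question, in would be uc.getInputStream() and out a FileOutputStream. Constructing the target as new File(downloadDir, filename), with downloadDir being a hypothetical user-chosen folder, would also control where the files end up.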
In a Swing application, to let the user select a directory, instantiate a JFileChooser and call setFileSelectionMode(JFileChooser.DIRECTORIES_ONLY) on it.
You could move all the variable declarations outside of the loop as long as you ensure they are properly initialized with each iteration. You won't save a lot of time relative to the time it will take to download and save the file though.
I want to find out what method is better of two that I have come up with for concatenating my text files in Java. If someone has some insight they can share about what goes on at the kernel level that explains the difference between these methods of writing to a FileChannel, I would greatly appreciate it.
From what I understand from documentation and other Stack Overflow conversations, the allocateDirect allocates space right on the drive, and mostly avoids using RAM. I have a concern that the ByteBuffer created with allocateDirect might have a potential to overflow or not be allocated if the File infile is large, say 1GB. I am guaranteed at this point in the development of our software that the File will be no larger than 2 GB; but there is potential in the future that it might be as big as 10 or 20GB.
I have observed that the transferFrom loop never goes through the loop more than once... so it seems to succeed in writing the entire infile at once; but I haven't tested it with files bigger than 60MB. I looped though, because the documentation specifies that there is no guarantee of how much will be written at once. With transferFrom only able to accept, on my system, an int32 as its count parameter, I won't be able to specify more than 2GB at a time be transferred... Again, kernel expertise would help me understand.
Thanks in advance for your help!!
Using a ByteBuffer:
boolean concatFiles(StringBuffer sb, File infile, File outfile) {
FileChannel inChan = null, outChan = null;
try {
ByteBuffer buff = ByteBuffer.allocateDirect((int)(infile.length() + sb.length()));
//write the stringBuffer so it goes in the output file first:
buff.put(sb.toString().getBytes());
//create the FileChannels:
inChan = new RandomAccessFile(infile, "r" ).getChannel();
outChan = new RandomAccessFile(outfile, "rw").getChannel();
//read the infile in to the buffer:
inChan.read(buff);
// prep the buffer:
buff.flip();
// write the buffer out to the file via the FileChannel:
outChan.write(buff);
inChan.close();
outChan.close();
} catch...etc
}
Using transferTo (or transferFrom):
boolean concatFiles(StringBuffer sb, File infile, File outfile) {
FileChannel inChan = null, outChan = null;
try {
//write the stringBuffer so it goes in the output file first:
PrintWriter fw = new PrintWriter(outfile);
fw.write(sb.toString());
fw.flush();
fw.close();
// create the channels appropriate for appending:
outChan = new FileOutputStream(outfile, true).getChannel();
inChan = new RandomAccessFile(infile, "r").getChannel();
long startSize = outfile.length();
long inFileSize = infile.length();
long bytesWritten = 0;
//set the position where we should start appending the data:
outChan.position(startSize);
long startByte = outChan.position();
while(bytesWritten < inFileSize){
// each pass continues right after the bytes already written
bytesWritten += outChan.transferFrom(inChan, startByte + bytesWritten, (int) inFileSize);
}
inChan.close();
outChan.close();
} catch ... etc
transferTo() can be far more efficient as there is less data copying, or none if it can all be done in the kernel. And if it isn't on your platform it will still use highly tuned code.
You do need the loop: one day it will iterate more than once, and your code will keep working.
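A hedged sketch of the transferTo loop using the NIO.2 FileChannel.open API. ChannelAppend and append are hypothetical names, and the target is positioned at its end explicitly instead of being opened in append mode. The count parameter is a long, so files above 2 GB need no special handling:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChannelAppend {
    // Appends the whole of source to the end of target and returns the bytes appended.
    static long append(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            out.position(out.size()); // start writing after the existing content
            long size = in.size();
            long position = 0;
            while (position < size) {
                // transferTo may move fewer bytes than requested, hence the loop.
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }
}
```

The header text from the StringBuffer would be written to the target first, for example with Files.write, before calling append.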
I have some working code in python that I need to convert to Java.
I have read quite a few threads on this forum but could not find an answer. I am reading in a JPG image and converting it into a byte array. I then write this buffer to a different file. When I compare the files written by the Java and Python code, the bytes at the end do not match. Please let me know if you have a suggestion. I need to use the byte array to pack the image into a message that needs to be sent over to a remote server.
Java code (Running on Android)
Reading the file:
File queryImg = new File(ImagePath);
int imageLen = (int)queryImg.length();
byte [] imgData = new byte[imageLen];
FileInputStream fis = new FileInputStream(queryImg);
fis.read(imgData);
Writing the file:
FileOutputStream f = new FileOutputStream(new File("/sdcard/output.raw"));
f.write(imgData);
f.flush();
f.close();
Thanks!
InputStream.read is not guaranteed to read any particular number of bytes and may read less than you asked it to. It returns the actual number read so you can have a loop that keeps track of progress:
public void pump(InputStream in, OutputStream out, int size) throws IOException {
    byte[] buffer = new byte[4096]; // Or whatever constant you feel like using
    int done = 0;
    while (done < size) {
        // Never ask for more than the bytes still expected
        int read = in.read(buffer, 0, Math.min(buffer.length, size - done));
        if (read == -1) {
            throw new IOException("Something went horribly wrong");
        }
        out.write(buffer, 0, read);
        done += read;
    }
    // Maybe put cleanup code in here if you like, e.g. in.close, out.flush, out.close
}
I believe Apache Commons IO has classes for doing this kind of stuff so you don't need to write it yourself.
Your file length might be more than an int can hold, and then you end up with the wrong array length, hence not reading the entire file into the buffer.
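Both points, short reads and lengths that overflow an int, can be covered in one small sketch. FullRead and readWholeFile are names invented for this example; DataInputStream.readFully does the looping that a single read() call does not:

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

final class FullRead {
    // Reads the entire file into a byte array, refusing files too large for one array.
    static byte[] readWholeFile(File file) throws IOException {
        long length = file.length();
        if (length > Integer.MAX_VALUE - 8) {
            throw new IOException("File too large for a single byte array: " + length);
        }
        byte[] data = new byte[(int) length];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            // readFully loops internally until the array is full, or throws EOFException.
            in.readFully(data);
        }
        return data;
    }
}
```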