I am getting an OutOfMemoryError. Why? I am using this code for logging. Is this approach correct?
Exceptions and closing of streams are handled in the parent methods.
private static void writeToFile(File file, FileWriter out, String message) throws IOException {
    if (file.exists() && file.isFile()) {
        if ((file.length() + message.getBytes().length) <= FILE_MAX_SIZE_B) {
            out.write(message);
        } else {
            int cutLength = (int) (file.length() + message.getBytes().length - FILE_MAX_SIZE_B);
            FileInputStream fileInputStream = new FileInputStream(file);
            BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(fileInputStream));
            char[] buf = new char[1024];
            int numRead = 0;
            StringBuffer text = new StringBuffer(1000);
            while ((numRead=bufferedReader.read(buf)) != -1) {
                text.append(buf,0,numRead);
            }
            String result = new String(text).substring(cutLength);
            result += message;
            FileWriter fileWriter = new FileWriter(file, appendToFile);
            writeToFile(file, fileWriter, result);
            bufferedReader.close();
        }
    }
}
EDIT:
I am using this method for writing my logs to a file. For example, in one second I can log 10 messages. I am getting the error on these lines:
while ((numRead=bufferedReader.read(buf)) != -1) {
text.append(buf,0,numRead);
}
My guess is that you are getting the OutOfMemoryError because you are reading the entire contents of the log file back into memory once it has gotten too close to its maximum size.
You could instead read and write it in smaller chunks, but that could be tricky since you have to avoid overwriting something you haven't already read.
Overall, this technique seems like a very inefficient method of maintaining the log data. Some alternative approaches off the top of my head:
(1) maintain a set of n log files, each with maximum size FILE_MAX_SIZE_B/n. When the first log fills up, open the next one for writing, and so on; when the last one fills up, go back to the first one. In this way you are discarding some of the oldest log data each time you switch files, but not all of it, while still maintaining your overall size limit (a sketch of this option follows the list).
(2) rotate the data within a single file. After each write, add a marker that indicates this is the end of the log stream. When the file has reached its maximum size, just start again at the beginning, overwriting the data that is there. The marker will tell you where the latest message is.
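Here is a minimal sketch of option (1). The LOG_COUNT constant, the currentLog index and the "log.N" file names are illustrative assumptions; FILE_MAX_SIZE_B is the limit from your code:
private static final int LOG_COUNT = 4;      // assumed number of rotating files
private static int currentLog = 0;           // index of the file currently being written

private static void writeRotated(String message) throws IOException {
    long partLimit = FILE_MAX_SIZE_B / LOG_COUNT;
    File f = new File("log." + currentLog);
    if (f.length() + message.getBytes().length > partLimit) {
        currentLog = (currentLog + 1) % LOG_COUNT; // move to the next slot
        f = new File("log." + currentLog);
        f.delete();                                // discard the oldest data
    }
    FileWriter out = new FileWriter(f, true);      // append mode
    try {
        out.write(message);
    } finally {
        out.close();
    }
}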
Try something like this:
void appendToFile(File f, CharSequence message, Charset cs, long maximumSize) throws IOException {
long available = maximumSize - f.length();
if (available > 0) {
FileOutputStream fos = new FileOutputStream(f, true);
try {
CharBuffer chars = CharBuffer.wrap(message);
ByteBuffer bytes = ByteBuffer.allocate(8 * 1024); // Re-used when encoding the string
CharsetEncoder enc = cs.newEncoder();
CoderResult res;
do {
res = enc.encode(chars, bytes, true);
bytes.flip();
long len = Math.min(available, bytes.remaining());
available -= len;
fos.write(bytes.array(), bytes.position(), (int) len);
bytes.clear();
} while (res == CoderResult.OVERFLOW && available > 0);
} finally {
fos.close();
}
}
}
Testable with this:
File f = new File(getCacheDir(), "tmp.txt");
f.delete();
// Or whatever charset you want.
Charset cs = Charset.forName("UTF-8");
int maxlen = 2 * 1024; // For this test, 2kb
try {
for (int i = 0; i < maxlen / 20; i++) {
// Write 30 characters for maxlen/20 times == guaranteed overflow
appendToFile(f, "123456789012345678901234567890", cs, maxlen);
System.out.println("Length=" + f.length());
}
} catch (Throwable t) {
t.printStackTrace();
}
f.delete();
Well, you're getting OOM because you're trying to load a huge file into memory.
Did you try opening it with the append option instead?
You get the OOME because you load the whole file and then take a substring of it. Instead, do a skip() on your input stream and read from there.
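A rough sketch of that suggestion, reusing the cutLength value from your code and treating the log as single-byte text (an assumption); the point is that only the tail of the file is ever held in memory:
FileInputStream in = new FileInputStream(file);
try {
    long skipped = 0;
    while (skipped < cutLength) {
        long n = in.skip(cutLength - skipped);   // skip() may skip fewer bytes than asked
        if (n <= 0) break;
        skipped += n;
    }
    byte[] rest = new byte[(int) (file.length() - cutLength)];
    int off = 0, r;
    while (off < rest.length && (r = in.read(rest, off, rest.length - off)) != -1) {
        off += r;
    }
    // 'rest' now holds only the tail of the old log; write it plus the new
    // message back out (ideally to a temporary file that you then rename).
} finally {
    in.close();
}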
Related
I am trying to create a GitHub webhook. It sends a payload every time I publish a new package to one of my repositories. My issue is that I cannot seem to read the whole body: it gets cut off at the same number of bytes each time. However, I can see the whole body if I read it using HttpServletRequest#getReader(). Is there something I am doing wrong when trying to read the input stream?
Here is the code for reading the body:
byte[] bodyBytes = new byte[request.getContentLength()];
System.out.println(request.getContentLength());
request.getInputStream().read(bodyBytes);
//System.out.println(request.getReader().readLine()); //works correctly
try (FileWriter fw = new FileWriter(new File("./payload.txt"))) {
for(byte i : bodyBytes)
fw.write("0x" + String.format("%02x ", i) + " ");
fw.write("\n\n\n");
fw.write(new String(bodyBytes));
}
As per the Javadocs, InputStream.read(byte[]) will read at least one byte, when available, and at most as many as the size of the byte array argument. It may read less for any reason, in which case you have to call it repeatedly to get the entire content. Simplest case: write to a ByteArrayOutputStream:
byte[] buf = new byte[1024];
int r;
InputStream is = request.getInputStream();
try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
while ((r = is.read(buf)) >= 0) {
baos.write(buf, 0, r);
}
// the bytes are now accessible:
byte[] entireContent = baos.toByteArray();
}
This is the principle; it has the disadvantage that it stores the entire content in memory. You may want to process each "batch" of the input and write it to the file instead of keeping it in memory, e.g.:
byte[] buf = new byte[1024];
int r, i;
InputStream is = request.getInputStream();
try (FileWriter fw = new FileWriter(new File("./payload.txt"))) {
while ((r = is.read(buf)) >= 0) {
for (i=0; i < r; i++) {
fw.write("0x" + String.format("%02x ", buf[i]) + " ");
}
// *************** NOTE ****************************
// Apparently you need the entire content as well, so
// this kind of streaming does not apply in this case.
// You have to store the entire content in memory.
// Keeping the code here as an example/reference.
}
}
I am trying to merge n pieces of a file into a single file, but my function behaves strangely. It is called x times over n seconds. Say I have 100 files to merge: in the first second I pass 5 files and merge them; in the next second the amount doubles to 10, where files 1-5 are the same as before and the rest are new. Most of the time it works, but at some point the result is zero bytes, while at other times it has the right size.
Could you help me spot the mistake in my function below?
public void mergeFile(List<String> fileList, int x) {
int count = 0;
BufferedOutputStream out = null;
try {
out = new BufferedOutputStream(new FileOutputStream("Test.doc"));
for (String file : fileList) {
InputStream in = new BufferedInputStream(new FileInputStream(file));
byte[] buff = new byte[1024];
in.read(buff);
out.write(buff);
in.close();
count++;
if (count == x) {
break;
}
}
out.flush();
out.close();
} catch (IOException e) {
e.printStackTrace();
}
}
*sorry for my English
in.read(buff);
Check the Javadoc. That method isn't guaranteed to fill the buffer. It returns a value telling you how many bytes it actually read; you're supposed to use that value, and in this situation you need it to decide how many bytes, if any, to write.
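Concretely, the per-file copy needs a loop that uses that return value (a minimal sketch, with in, out and buff as in your code):
byte[] buff = new byte[1024];
int n;
while ((n = in.read(buff)) != -1) {
    out.write(buff, 0, n);   // write only the bytes actually read
}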
You do not read the full file; you read at most 1024 bytes from each file. You need to loop the read as long as it returns data (or use something like Files.copy()).
BTW: you don't need a BufferedOutputStream if you copy with large buffers.
public void mergeFile(List<String> fileList, int x) throws IOException {
try (OutputStream out = new FileOutputStream("Test.doc")) {
int count=0;
for (String file : fileList) {
Files.copy(new File(file).toPath(), out);
count++;
if (count == x) {
break;
}
}
}
}
You also do not need to flush() if you close(). I am using try-with-resources here, so I don't need to close it explicitly. It is best to propagate the exceptions.
We have some Java code that processes a user-provided file by looping through the file using BufferedReader.readline() to read in each line.
The problem is that when the user uploads a file that has extremely long lines, like an arbitrary binary JPG or such, this can cause out-of-memory issues. Even the first readline() may not return. We want to reject the files with long lines before it OOMs.
Is there a standard Java idiom to handle this, or do we just change to read() and write our own safe version of readLine()?
You will need to read the file character by character (or chunk by chunk) yourself (via some form of read()), and then form the lines into Strings when you encounter a newline character. This way you can throw an Exception (avoiding the OOM error) if some maximum number of characters is hit before a newline is encountered.
If you use a Reader instance it should not be too difficult to implement this code, just read from the Reader into a buffer (which you allocate to your maximum possible line length), and then convert the buffer to String when you encounter a newline (or throw an exception if you don't).
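A very small sketch of that idea, with an assumed maxLineLength limit; the buffered ShortLineReader in the next answer is a more efficient variant of the same thing:
// Accumulate characters until '\n', failing fast on over-long lines.
static String readLimitedLine(Reader reader, int maxLineLength) throws IOException {
    StringBuilder line = new StringBuilder();
    int c;
    while ((c = reader.read()) != -1 && c != '\n') {
        if (line.length() >= maxLineLength) {
            throw new IOException("Line exceeds " + maxLineLength + " characters");
        }
        line.append((char) c);   // note: keeps any trailing '\r'; assumes '\n' termination
    }
    return (c == -1 && line.length() == 0) ? null : line.toString();
}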
There doesn't appear to be any way to set a line length limit for BufferedReader.readLine(), so it will accumulate the entire line before feeding it to your code, however long that line may be.
Therefore, you'll have to do the line-splitting part yourself, and give up once a line is too long.
You might use the following as a starting point:
class LineTooLongException extends Exception {}
class ShortLineReader implements AutoCloseable {
final Reader reader;
final char[] buf = new char[8192];
int nextIndex = 0;
int maxIndex = 0;
boolean eof;
public ShortLineReader(Reader reader) {
this.reader = reader;
}
public String readLine() throws IOException, LineTooLongException {
if (eof) {
return null;
}
for (;;) {
for (int i = nextIndex; i < maxIndex; i++) {
if (buf[i] == '\n') {
String result = new String(buf, nextIndex, i - nextIndex);
nextIndex = i + 1;
return result;
}
}
if (maxIndex - nextIndex > 6000) {
throw new LineTooLongException();
}
System.arraycopy(buf, nextIndex, buf, 0, maxIndex - nextIndex);
maxIndex -= nextIndex;
nextIndex = 0;
int c = reader.read(buf, maxIndex, buf.length - maxIndex);
if (c == -1) {
eof = true;
return new String(buf, nextIndex, maxIndex - nextIndex);
} else {
maxIndex += c;
}
}
}
@Override
public void close() throws Exception {
reader.close();
}
}
public class Test {
public static void main(String[] args) throws Exception {
File file = new File("D:\\t\\output.log");
// try (OutputStream fos = new BufferedOutputStream(new FileOutputStream(file))) {
// for (int i = 0; i < 10000000; i++) {
// fos.write(65);
// }
// }
try (ShortLineReader r = new ShortLineReader(new FileReader(file))) {
String s;
while ((s = r.readLine()) != null) {
System.out.println(s);
}
}
}
}
Note: This assumes unix-style line termination.
Use BufferedInputStream to read binary data rather than BufferedReader...
For example, if it is an image file, you can read it with ImageIO from either a File or an InputStream:
File file = new File("image.gif");
BufferedImage image = ImageIO.read(file);
InputStream is = new BufferedInputStream(new FileInputStream("image.gif"));
image = ImageIO.read(is);
hope it helps...
There doesn't appear to be a definite way but a few things you can do:
Check file headers. jMimeMagic seems to be a pretty good library for this purpose.
Check the type of characters the file contains. Essentially do statistical analysis on the first 'x' bytes of the file and use that to estimate the rest of the content.
Check for newlines '\n' or '\r' in the file; binary files usually won't contain newlines (a rough sketch of this check follows below).
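For the newline check, a rough heuristic might look like this (the 8000-byte sample size and the treatment of NUL bytes are arbitrary choices):
static boolean looksBinary(File f) throws IOException {
    byte[] sample = new byte[8000];
    int n;
    FileInputStream in = new FileInputStream(f);
    try {
        n = in.read(sample);          // one read of the first few KB is enough for a guess
    } finally {
        in.close();
    }
    if (n <= 0) return false;         // empty file: treat as text
    boolean sawNewline = false;
    for (int i = 0; i < n; i++) {
        if (sample[i] == 0) return true;            // NUL bytes almost never occur in text
        if (sample[i] == '\n' || sample[i] == '\r') sawNewline = true;
    }
    return !sawNewline;               // a long sample with no line breaks looks binary
}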
Hope that helps.
I'm dealing with the following code that is used to split a large file into a set of smaller files:
try {
    FileInputStream input = new FileInputStream(this.fileToSplit);
    BufferedInputStream iBuff = new BufferedInputStream(input);
    int i = 0;
    FileOutputStream output = new FileOutputStream(fileArr[i]);
    BufferedOutputStream oBuff = new BufferedOutputStream(output);
    int buffSize = 8192;
    byte[] buffer = new byte[buffSize];
    while (true) {
        if (iBuff.available() < buffSize) {
            byte[] newBuff = new byte[iBuff.available()];
            iBuff.read(newBuff);
            oBuff.write(newBuff);
            oBuff.flush();
            oBuff.close();
            break;
        }
        int r = iBuff.read(buffer);
        if (fileArr[i].length() >= this.partSize) {
            oBuff.flush();
            oBuff.close();
            ++i;
            output = new FileOutputStream(fileArr[i]);
            oBuff = new BufferedOutputStream(output);
        }
        oBuff.write(buffer);
    }
} catch (Exception e) {
    e.printStackTrace();
}
This is the weird behavior I'm seeing: when I run this code using a 3 GB file, the initial iBuff.available() call returns a value of approximately 2,100,000,000 and the code works fine. When I run this code on a 12 GB file, the initial iBuff.available() call only returns a value of 200,000,000 (which is smaller than the split file size of 500,000,000 and causes the processing to go awry).
I'm thinking this discrepancy in behavior has something to do with the fact that this is on 32-bit Windows. I'm going to run a couple more tests on a 4.5 GB file and a 3.5 GB file. If the 3.5 GB file works and the 4.5 GB one doesn't, that will further confirm the theory that it's a 32-bit vs 64-bit issue, since 4 GB would then be the threshold.
Well if you read the javadoc it quite clearly states:
Returns the number of bytes that can be read from this input stream without blocking (emphasis added by me)
So it's quite clear that what you want is not what this method offers. Depending on the underlying InputStream you may get problems much earlier (e.g. a stream over the network from a server that doesn't report the file size: you'd have to read and buffer the complete file just to return the "correct" available() count, which would take a lot of time. And what if you only want to read a header?)
So the correct way to handle this is to change your parsing method to be able to handle the file in pieces. Personally I don't see much reason at all to even use available() here - just calling read() and stopping as soon as read() returns -1 should work fine. Can be made more complicated if you want to assure that every file really contains blockSize byte - just add an internal loop if that scenario is important.
int blockSize = XXX;
byte[] buffer = new byte[blockSize];
int i = 0;
int read = in.read(buffer);
while(read != -1) {
out[i++].write(buffer, 0, read);
read = in.read(buffer);
}
There are few correct uses of available(), and this isn't one of them. You don't need all that junk. Memorize this:
int count;
byte[] buffer = new byte[8192]; // or more
while ((count = in.read(buffer)) > 0)
out.write(buffer, 0, count);
That's the canonical way to copy a stream in Java.
You should not use the InputStream.available() function at all. It is only needed in very special circumstances.
You should also not create byte arrays that are larger than 1 MB. It's a waste of memory. The commonly accepted way is to read a small block (4 kB up to 1 MB) from the source file and then store only as many bytes as you have read in the destination file. Do that until you have reached the end of the source file.
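In code, that block-copy idea looks roughly like this (a sketch only; iBuff, fileArr and partSize are the names from the question, and parts may come out slightly under partSize because a block is never split):
byte[] block = new byte[64 * 1024];
long writtenToPart = 0;
int i = 0, n;
OutputStream part = new FileOutputStream(fileArr[i]);
try {
    while ((n = iBuff.read(block)) != -1) {
        if (writtenToPart + n > partSize) {        // current part is full, start the next one
            part.close();
            part = new FileOutputStream(fileArr[++i]);
            writtenToPart = 0;
        }
        part.write(block, 0, n);                   // store only as many bytes as were read
        writtenToPart += n;
    }
} finally {
    part.close();
}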
available() isn't a measure of how much is still to be read, but rather of how much is guaranteed to be readable before the stream might hit EOF or block waiting for input.
And put the close() calls in finally blocks.
BufferedInputStream iBuff = new BufferedInputStream(input);
int i = 0;
FileOutputStream output;
BufferedOutputStream oBuff = null;
try {
    int buffSize = 8192;
    int offset = 0;
    byte[] buffer = new byte[buffSize];
    while (true) {
        int len = iBuff.read(buffer, offset, buffSize - offset);
        if (len == -1) { // EOF: write out the last partial chunk, if any
            if (offset > 0) {
                try {
                    output = new FileOutputStream(fileArr[i]);
                    oBuff = new BufferedOutputStream(output);
                    oBuff.write(buffer, 0, offset);
                } finally {
                    if (oBuff != null) {
                        oBuff.close();
                    }
                }
            }
            break;
        }
        offset += len;
        if (offset == buffSize) { // buffer full: write it out to the next file
            try {
                output = new FileOutputStream(fileArr[i]);
                oBuff = new BufferedOutputStream(output);
                oBuff.write(buffer);
            } finally {
                if (oBuff != null) {
                    oBuff.close();
                }
            }
            ++i;
            offset = 0; // start filling the buffer again for the next part
        }
    }//while
} finally {
    iBuff.close();
}
Here is some code that splits a file. If performance is critical to you, you can experiment with the buffer size.
package so6164853;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Formatter;
public class FileSplitter {
private static String printf(String fmt, Object... args) {
Formatter formatter = new Formatter();
formatter.format(fmt, args);
return formatter.out().toString();
}
/**
* @param outputPattern see {@link Formatter}
*/
public static void splitFile(String inputFilename, long fragmentSize, String outputPattern) throws IOException {
InputStream input = new FileInputStream(inputFilename);
try {
byte[] buffer = new byte[65536];
int outputFileNo = 0;
OutputStream output = null;
long writtenToOutput = 0;
try {
while (true) {
int bytesToRead = buffer.length;
if (bytesToRead > fragmentSize - writtenToOutput) {
bytesToRead = (int) (fragmentSize - writtenToOutput);
}
int bytesRead = input.read(buffer, 0, bytesToRead);
if (bytesRead != -1) {
if (output == null) {
String outputName = printf(outputPattern, outputFileNo);
outputFileNo++;
output = new FileOutputStream(outputName);
writtenToOutput = 0;
}
output.write(buffer, 0, bytesRead);
writtenToOutput += bytesRead;
}
if (output != null && (bytesRead == -1 || writtenToOutput == fragmentSize)) {
output.close();
output = null;
}
if (bytesRead == -1) {
break;
}
}
} finally {
if (output != null) {
output.close();
}
}
} finally {
input.close();
}
}
public static void main(String[] args) throws IOException {
splitFile("d:/backup.zip", 1440 << 10, "d:/backup.zip.part%04d");
}
}
Some remarks:
Only those bytes that have actually been read from the input file are written to one of the output files.
I left out the BufferedInputStream and BufferedOutputStream since their buffer size is only 8192 bytes, which is less than the buffer I use in the code.
As soon as I open a file, I make sure that it will be closed at the end, no matter what happens. (The finally blocks.)
The code contains only one call to input.read and only one call to output.write. This makes it easier to check for correctness.
The code for splitting a file does not catch the IOException, since it doesn't know what to do in such a case. It is just passed to the caller; maybe the caller knows how to handle it.
Both @ratchet and @Voo are correct.
As for what is happening:
int's max value is 2,147,483,647 (http://download.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html).
14 gigabytes is 15,032,385,536, which clearly doesn't fit in an int.
Note that according to the API Javadoc (http://download.oracle.com/javase/6/docs/api/java/io/BufferedInputStream.html#available%28%29), and as stated by @Voo, this doesn't break the method contract at all (it just isn't what you are looking for).
I have a log file which gets updated every second. I need to read the log file periodically, and once I do a read, I need to store the file pointer position at the end of the last line I read and in the next periodic read I should start from that point.
Currently, I am using a RandomAccessFile in Java, with the getFilePointer() method to get the offset value and the seek() method to go back to that offset position.
However, I have read in most articles, and even the Java doc recommends, using BufferedReader for efficient reading of a file. How can I achieve this (getting the file pointer and moving to the last line) using a BufferedReader, or is there any other efficient way to achieve this task?
A couple of ways that should work:
open the file using a FileInputStream, skip() the relevant number of bytes, then wrap the BufferedReader around the stream (via an InputStreamReader);
open the file (with either FileInputStream or RandomAccessFile), call getChannel() on the stream/RandomAccessFile to get an underlying FileChannel, call position() on the channel, then call Channels.newInputStream() to get an input stream from the channel, which you can pass to InputStreamReader -> BufferedReader.
I haven't honestly profiled these to see which is better performance-wise, but you should see which works better in your situation.
The problem with RandomAccessFile is essentially that its readLine() method is very inefficient. If it's convenient for you to read from the RAF and do your own buffering to split the lines, then there's nothing wrong with RAF per se; it's just that its readLine() is poorly implemented.
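A sketch of the first approach, assuming a logFile and a saved byte offset from the previous read (both hypothetical names):
FileInputStream fis = new FileInputStream(logFile);
long skipped = 0;
while (skipped < offset) {
    long n = fis.skip(offset - skipped);   // skip() may skip fewer bytes than requested
    if (n <= 0) break;
    skipped += n;
}
BufferedReader reader = new BufferedReader(new InputStreamReader(fis, "UTF-8"));
String line;
while ((line = reader.readLine()) != null) {
    // process the line; if you need an exact new offset, count the bytes yourself,
    // since readLine() discards the line terminator
}
reader.close();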
Neil Coffey's solution is good if you are reading fixed-length files. However, for files whose length keeps changing (data keeps coming in) there are some problems with using a BufferedReader directly on a FileInputStream or FileChannel input stream via an InputStreamReader. For example, consider these cases:
1) You want to read data from some offset up to the current file length, so you use a BufferedReader on the FileInputStream/FileChannel (via an InputStreamReader) and call its readLine method. But while you are busy reading, some data gets appended, which causes readLine to read more data than you expected (beyond the previous file length).
2) You finish the readLine calls, but when you query the current file length/channel position some data has just been appended, so the current file length/channel position has increased even though you have read less data than that.
In both of the above cases it is difficult to know how much data you have actually read (you cannot simply use the length of the data returned by readLine, because it skips characters such as carriage returns).
So it is better to read the data as buffered byte chunks yourself and wrap a BufferedReader around each chunk. I wrote some methods like this:
/** Read data from offset to length bytes in RandomAccessFile using BufferedReader
* @param offset
* @param length
* @param accessFile
* @throws IOException
*/
public static void readBufferedLines(long offset, long length, RandomAccessFile accessFile) throws IOException{
if(accessFile == null) return;
int bufferSize = BYTE_BUFFER_SIZE;// constant say 4096
if(offset < length && offset >= 0){
int index = 1;
long curPosition = offset;
/*
* iterate (length-from)/BYTE_BUFFER_SIZE times to read into buffer no matter where new line occurs
*/
while((curPosition + (index * BYTE_BUFFER_SIZE)) < length){
accessFile.seek(offset); // seek to last parsed data rather than last data read in to buffer
byte[] buf = new byte[bufferSize];
int read = accessFile.read(buf, 0, bufferSize);
index++;// Increment whether or not read successful
if(read > 0){
int lastnewLine = getLastLine(read,buf);
if(lastnewLine <= 0){ // no new line found in the buffer reset buffer size and continue
bufferSize = bufferSize+read;
continue;
}
else{
bufferSize = BYTE_BUFFER_SIZE;
}
readLine(buf, 0, lastnewLine); // read the lines from buffer and parse the line
offset = offset+lastnewLine; // update the last data read
}
}
// Read last chunk. The last chunk size in worst case is the total file when no newline occurs
if(offset < length){
accessFile.seek(offset);
byte[] buf = new byte[(int) (length-offset)];
int read = accessFile.read(buf, 0, buf.length);
if(read > 0){
readLine(buf, 0, read);
offset = offset+read; // update the last data read
}
}
}
}
private static void readLine(byte[] buf, int from , int lastnewLine) throws IOException{
String readLine = "";
BufferedReader reader = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(buf,from,lastnewLine) ));
while( (readLine = reader.readLine()) != null){
//do something with readLine
System.out.println(readLine);
}
reader.close();
}
private static int getLastLine(int read, byte[] buf) {
if(buf == null ) return -1;
if(read > buf.length) read = buf.length;
while( read > 0 && !(buf[read-1] == '\n' || buf[read-1] == '\r')) read--;
return read;
}
public static void main(String[] args) throws IOException {
RandomAccessFile accessFile = new RandomAccessFile("C:/sri/test.log", "r");
readBufferedLines(0, accessFile.length(), accessFile);
accessFile.close();
}
I had a similar problem, and I created this class to take lines from a BufferedReader and count how many bytes have been read so far using getBytes(). We assume the line separator is a single byte by default, and we re-instantiate the BufferedReader for seek() to work.
public class FileCounterIterator {
public Long position() {
return _position;
}
public Long fileSize() {
return _fileSize;
}
public FileCounterIterator newlineLength(Long newNewlineLength) {
this._newlineLength = newNewlineLength;
return this;
}
private Long _fileSize = 0L;
private Long _position = 0L;
private Long _newlineLength = 1L;
private RandomAccessFile fp;
private BufferedReader itr;
public FileCounterIterator(String filename) throws IOException {
fp = new RandomAccessFile(filename, "r");
_fileSize = fp.length();
this.seek(0L);
}
public FileCounterIterator seek(Long newPosition) throws IOException {
this.fp.seek(newPosition);
this._position = newPosition;
itr = new BufferedReader(new InputStreamReader(new FileInputStream(fp.getFD())));
return this;
}
public Boolean hasNext() throws IOException {
return this._position < this._fileSize;
}
public String readLine() throws IOException {
String nextLine = itr.readLine();
this._position += nextLine.getBytes().length + _newlineLength;
return nextLine;
}
}