I currently have 2 BufferedReaders initialized on the same text file. When I'm done reading the text file with the first BufferedReader, I use the second one to make another pass through the file from the top. Multiple passes through the same file are necessary.
I know about reset(), but it needs to be preceded with calling mark() and mark() needs to know the size of the file, something I don't think I should have to bother with.
Ideas? Packages? Libs? Code?
Thanks
TJ
The Buffered readers are meant to read a file sequentially. What you are looking for is the java.io.RandomAccessFile, and then you can use seek() to take you to where you want in the file.
The random access reader is implemented like so:
try{
String fileName = "c:/myraffile.txt";
File file = new File(fileName);
RandomAccessFile raf = new RandomAccessFile(file, "rw");
raf.readChar();
raf.seek(0);
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
The "rw" is a mode character which is detailed here.
The reason the sequential access readers are setup like this is so that they can implement their buffers and that things can not be changed beneath their feet. For example the file reader that is given to the buffered reader should only be operated on by that buffered reader. If there was another location that could affect it you could have inconsistent operation as one reader advanced its position in the file reader while the other wanted it to remain the same now you use the other reader and it is in an undetermined location.
What's the disadvantage of just creating a new BufferedReader to read from the top? I'd expect the operating system to cache the file if it's small enough.
If you're concerned about performance, have you proved it to be a bottleneck? I'd just do the simplest thing and not worry about it until you have a specific reason to. I mean, you could just read the whole thing into memory and then do the two passes on the result, but again that's going to be more complicated than just reading from the start again with a new reader.
The best way to proceed is to change your algorithm, in a way in which you will NOT need the second pass. I used this approach a couple of times, when I had to deal with huge (but not terrible, i.e. few GBs) files which didn't fit the available memory.
It might be hard, but the performance gain usually worths the effort
About mark/reset:
The mark method in BufferedReader takes a readAheadLimit parameter which limits how far you can read after a mark before reset becomes impossible. Resetting doesn't actually mean a file system seek(0), it just seeks inside the buffer. To quote the Javadoc:
readAheadLimit - Limit on the number of characters that may be read while still preserving the mark. After reading this many characters, attempting to reset the stream may fail. A limit value larger than the size of the input buffer will cause a new buffer to be allocated whose size is no smaller than limit. Therefore large values should be used with care.
"The whole business about mark() and reset() in BufferedReader smacks of poor design."
why don't you extend this class and have it do a mark() in the constructor() and then do a seek(0) in topOfFile() method.
BR,
~A
Related
I have extract my tables from my database in json file, now I want to read this files and remove all double quotes on them, seems easy and tried hundred of solutions, and some take me to the out of memory problems. I'm dealing with files that they have more than 1Gb size.The code that you will find below have a strange behaviour, and I don't understand why it return empty files
public void replaceDoubleQuotes(String fileName){
log.debug(" start formatting " + fileName + " ...");
File firstFile = new File ("C:/sqlite/db/tables/" + fileName);
String oldContent = "";
String newContent = "";
BufferedReader reader = null;
BufferedWriter writer = null;
FileWriter writerFile = null;
String stringQuotes = "\\\\\\\\\"";
try {
reader = new BufferedReader(new FileReader(firstFile));
writerFile = new FileWriter("C:/sqlite/db/tables/" + fileName);
writer = new BufferedWriter(writerFile);
while (( oldContent = reader.readLine()) != null ){
newContent = oldContent.replaceAll(stringQuotes, "");
writer.write(newContent);
}
writer.flush();
writer.close();
} catch (Exception e) {
log.error(e);
}
}
and when I try to use FileWriter(path,true) to write at the end of the file the program don't stop increasing the file memory till the hard disk will be full, thanks for help
ps : I also tried to use subString and append the new content and after the while I write the subString but also doesn't work
TL; DR;
Do not read and write the same file concurrently.
The issue
Your code starts reading, and then immediately truncates the file it is reading.
reader = new BufferedReader(new FileReader(firstFile));
writerFile = new FileWriter("C:/sqlite/db/tables/" + fileName);
writer = new BufferedWriter(writerFile);
The first line opens a read handle to the file.
The second line opens a write handle to the same file.
It is not very clear if you look at the documentation of FileWriter constructor, but when you do not use a constructor that allows you to specify the append parameter, then the value is false by default, meaning, you immediately truncate the file if it already exists.
At this point (line 2) you have just erased the file you were about to read. So you end up with an empty file.
What about using append=true
Well, then the file is not erased when it is created, which is "good". So you program starts reading the first line, and outputs (to the same file) the filtered version.
So each time a line is read, another is appended.
No wonder your program will never reach the end of the file : each time it advances a line, it creates another line to process. Generally speaking, you'll never reach end of file (well of course if the file is a single line to begin with, you might but that's a corner case).
The solution
Write to a temporary file, and IF (and only IF) you succed, then swap the files if you really need too.
An advantage of this solution : if for whatever reason your processe crahses, you'll have the original file untouched and you could retry later, which is usually a good thing. Your process is "repeatable".
A disadvantage : you'll need twice the space at some point. (Although you could compress the temp file and reduce this factor but still).
About out of memory issues
When working with arbitrarily large files, the path you chose (using buffered readers and writers) is the right one, because you only use one line-worth of memory at a time.
Therefore it generally avoids memory usage issues (unless of course, you have a file without line breaks, in which case it makes no difference at all).
Other solutions, involving reading the whole file at once, then performing the search/replace in memory, then writing the contents back do not scale that well, so it's good you avoided this kind of computation.
Not related but important
Check out the try with resources syntax to properly close your resources (reader / writer). Here you forgot to close the reader, and you are not closing the writer appropriately anyway (that is : in a finally clause).
Another thing : I'm pretty sure no java program written by a mere mortal will beat tools like sed or awk that are available on most unix platforms (and some more). Maybe you'd want to check if rolling your own in java is worth what is a shell one-liner.
#GPI already provided a great answer on why reading and writing concurrently is causing the issue you're experiencing. It is also worth noting that reading 1gb of data into heap at once can definitely cause a OutOfMemoryError if enough heap isn't allocated which is likely. To solve this problem you could use an InputStream and read chunks of the file at a time, then write to another file until the process is completed, and ultimately replace the existing file with the modified one and delete. With this approach you could even use a ForkJoinTask to help with this since it's such a large job.
Side note;
There may be a better solution than create new file, write to new file, replace existing, delete new file.
After playing with PrintWriter and files, I got a doubt about why do sometimes when I read my files immediately when I create them, there are inconsistencies, for example:
File file = new File("Items.txt");
int loopValue = 10;
try {
PrintWriter fout = new PrintWriter(file);
for (int i = 0; i < loopValue; i++) {
fout.print(i + " asdsadas" + System.lineSeparator());
}
//fout.flush(); <-- I know if I call flush or close this problem don't occur
//fout.close();
System.out.println("Here is the file:");
Scanner readFile = new Scanner(file);
while (readFile.hasNext()) {
System.out.println(readFile.nextLine());
}
} catch (FileNotFoundException e) {
System.err.println(e.getMessage());
}
If I run this code, I will read in console an empty file, something like this:
Here is the file:
But if I modify the loopValue to something like 10000, I will have something like this:
Here is the file:
0 asdsadas
1 asdsadas
2 asdsadas
...
... continues
...
9356 asdsadas
9357 asdsadas
9358 <--- here ends, note that it doesnt end in the value 9999
I know that if I call flush() or close() before read the file I can rid of this problem, but why is this happening? When do PrintWriter decide that is time to clean its buffer without I tell it when? and why when I close or flush the PrintWriter this problem won't happen?
Thanks!
The general concept and motivation behind the buffer for PrintWriter is that it is expensive to write something out to the console. Hence, by queueing up pending changes to be output, the program can run more efficiently. Imagine you had a Java program which were doing something very intensive from a CPU point of view, such as heavy calculations in a multithreaded application. Then, if you were also insisting that each call to PrintWriter.print() deliver its output immediately, the program could hang, and overall performance would decline.
If you insist on seeing the output from PrintWriter immediately after the call, then you can call flush() to achieve this. But as already mentioned, there could be a performance penalty under certain conditions.
Buffering is a basic and important technique for speeding I/O
When you call close it will releases all system resources associated with it here file will be used by Printwriter and has not been saved your changes till now until you flush it. So that Scanner which is trying to read file will get the old unchanged content.Now when you call flush it just flushes the file and forces the writer to write all buffered bytes till then stream buffers your input and than write to the file.Note here that you should use finally block to close the stream and note that close() by default flush the stream for you but here you want to use it in line you better use flush and than close it in finally block.
From your question I take it that you already know that there is (or at least may be) a buffer involved which gets flushed on flush and on close.
As to when flushing happens automatically, the JavaDoc on PrintWriter says:
Unlike the PrintStream class, if automatic flushing is enabled it will be done only when one of the println, printf, or format methods is invoked, rather than whenever a newline character happens to be output.
Now, how and if buffering happens depends on the underlying OutputStream that is used (can be specified via the constructor). If you use a BufferedOutputStream, you can specify the size of the buffer. Although it is not explicitly mentioned in the docs, flushing also happens when the buffer is full.
The PrintWriter constructor taking a File that you are using in your example says
Creates a new PrintWriter, without automatic line flushing, with the specified file. This convenience constructor creates the necessary intermediate OutputStreamWriter, which will encode characters using the default charset for this instance of the Java virtual machine.
without making any additional guarantees as to what OutputStreamWriter it will create and which settings it will use.
Sorry if this question is a dulplicate but I didn't get an answer I was looking for.
Java docs says this
In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders >and InputStreamReaders. For example,
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
will buffer the input from the specified file. Without buffering, each invocation of read() or readLine() could cause bytes to be read from the file, converted into characters, and then returned, which can be very inefficient.
My first question is If bufferedReader can read bytes then why can't we work on images which are in bytes using bufferedreader.
My second question is Does Bufferedreader store characters in BUFFER and what is the meaning of this line
will buffer the input from the specified file.
My third question is what is the meaning of this line
In general, each read request made of a Reader causes a corresponding read request to be >made of the underlying character or byte stream.
There are two questions here.
1. Buffering
Imagine you lived a mile from your nearest water source, and you drink a cup of water every hour. Well, you wouldn't walk all the way to the water for every cup. Go once a day, and come home with a bucket full of enough water to fill the cup 24 times.
The bucket is a buffer.
Imagine your village is supplied water by a river. But sometimes the river runs dry for a month; other times the river brings so much water that the village floods. So you build a dam, and behind the dam there is a reservoir. The reservoir fills up in the rainy season and gradually empties in the dry season. The village gets a steady flow of water all year round.
The reservoir is a buffer.
Data streams in computing are similar to both those scenarios. For example, you can get several kilobytes of data from a filesystem in a single OS system call, but if you want to process one character at a time, you need something similar to a reservoir.
A BufferedReader contains within it another Reader (for example a FileReader), which is the river -- and an array of bytes, which is the reservoir. Every time you read from it, it does something like:
if there are not enough bytes in the "reservoir" to fulfil this request
top up the "reservoir" by reading from the underlying Reader
endif
return some bytes from the "reservoir".
However when you use a BufferedReader, you don't need to know how it works, only that it works.
2. Suitability for images
It's important to understand that BufferedReader and FileReader are examples of Readers. You might not have covered polymorphism in your programming education yet, so when you do, remember this. It means that if you have code which uses FileReader -- but only the aspects of it that conform to Reader -- then you can substitute a BufferedReader and it will work just the same.
It's a good habit to declare variables as the most general class that works:
Reader reader = new FileReader(file);
... because then this would be the only change you need to add buffering:
Reader reader = new BufferedReader(new FileReader(file));
I took that detour because it's all Readers that are less suitable for images.
Reader has two read methods:
int read(); // returns one character, cast to an int
int read(char[] block); // reads into block, returns how many chars it read
The second form is unsuitable for images because it definitely reads chars, not ints.
The first form looks as if it might be OK -- after all, it reads ints. And indeed, if you just use a FileReader, it might well work.
However, think about how a BufferedReader wrapped around a FileReader will work. The first time you call BufferedReader.read(), it will call FileReader.read(buffer) to fill its buffer. Then it will cast the first char of the buffer back to an int, and return that.
Especially when you bring multi-byte charsets into the picture, that can cause problems.
So if you want to read integers, use InputStream not Reader. InputStream has int read(byte[] buf, int offset, int length) -- bytes are much more reliably cast back and forth from int than chars.
Readers (and Writers) in java are specialized classes for dealing with text (character) streams - the concept of a line is meaningless in any other type of stream.
for the general IO equivalent, have a look at BufferedInputStream
so, to answer your questions:
while the reader does eventually read bytes, it converts them to characters. it is not intended to read anything else (like images) - use the InputStream family of classes for that
a buffered reader will read large blocks of data from the underlying stream (which may be a file, socket, or anything else) into a buffer in memory and will then serve read requests from this buffer until the buffer is emptied. this behaviour of reading large chunks instead of smaller chucks every time improves performance.
it means that if you dont wrap a reader in a buffered reader then every time you want to read a single character, it will access the disk.network to get just the single character you want. doing I/O in such small chunks is usually terrible for performance.
Default behaviour is it will convert to character, but when you have an image you cannot have a character data, instead you need pixel of bytes data. So you cannot use it.
It is buffereing, means , it is reading a certain chunk of data in an char array. You can see this behaviour in the code:
public BufferedReader(Reader in) {
this(in, defaultCharBufferSize);
}
and the defaultCharBufferSize is as mentioned below:
private static int defaultCharBufferSize = 8192;
3 Every time you do read operation, it will be reading only one character.
So in a nutshell, buffred means, it will read few chunk of character data first that will keep in a char array and that will be processed and again it will read same chunk of data until it reaches end of stream
You can refer the following to get to know more
BufferedReader
I am learning how to use an InputStream. I was trying to use mark for BufferedInputStream, but when I try to reset I have these exceptions:
java.io.IOException: Resetting to invalid mark
I think this means that my mark read limit is set wrong. I actually don't know how to set the read limit in mark(). I tried like this:
is = new BufferedInputStream(is);
is.mark(is.available());
This is also wrong.
is.mark(16);
This also throws the same exception.
How do I know what read limit I am supposed to set? Since I will be reading different file sizes from the input stream.
mark is sometimes useful if you need to inspect a few bytes beyond what you've read to decide what to do next, then you reset back to the mark and call the routine that expects the file pointer to be at the beginning of that logical part of the input. I don't think it is really intended for much else.
If you look at the javadoc for BufferedInputStream it says
The mark operation remembers a point in the input stream and the reset operation causes all the bytes read since the most recent mark operation to be reread before new bytes are taken from the contained input stream.
The key thing to remember here is once you mark a spot in the stream, if you keep reading beyond the marked length, the mark will no longer be valid, and the call to reset will fail. So mark is good for specific situations and not much use in other cases.
This will read 5 times from the same BufferedInputStream.
for (int i=0; i<5; i++) {
inputStream.mark(inputStream.available()+1);
// Read from input stream
Thumbnails.of(inputStream).forceSize(160, 160).toOutputStream(out);
inputStream.reset();
}
The value you pass to mark() is the amount backwards that you will need to reset. if you need to reset to the beginning of the stream, you will need a buffer as big as the entire stream. this is probably not a great design as it will not scale well to large streams. if you need to read the stream twice and you don't know the source of the data (e.g. if it's a file, you could just re-open it), then you should probably copy it to a temp file so you can re-read it at will.
Is there a way to flush the input stream in Java, perhaps prior to closing it? In relation to
iteratively invoking the statements below, while reading several files on disk
InputStream fileStream = item.openStream();
fileStream.close;
InputStream cannot be flushed. Why do you want to do this?
OutputStream can be flushed as it implements the interface Flushable. Flushing makes IMHO only sense in scenarios where data is written (to force a write of buffered data). Please see the documentation of Flushable for all implementing classes.
This is an old question but appears to be the only one of its kind, and I think there is a valid use case for "flushing" an input stream.
In Java, if you are using a BufferedReader or BufferedInputStream (which I believe is a common case), "flushing" the stream can be considered to be equivalent to discarding all data currently in the buffer -- i.e., flushing the buffer.
For an example of when this might be useful, consider implementing a REPL for a programming language such as Python or similar.
In this case you might use a BufferedReader on System.in. The user enters a (possibly large) expression and hits enter. At this point, the expression (or some part of it) is stored in the buffer of your Reader.
Now, if a syntax error occurs somewhere within the expression, it will be caught and displayed to the user. However, the remainder of the expression still resides in the input buffer.
When the REPL loop continues, it will begin reading just beyond the point where the syntax error occurred, in the middle of some erroneous expression. This is likely not desirable. Rather, it would be better to simply discard the remainder of the buffer and continue with a "fresh start."
In this sense, we can use the BufferedReader API method ready() to discard any remaining buffered characters. The documentation reads:
"Tells whether this stream is ready to be read. A buffered character stream is ready if the buffer is not empty, or if the underlying character stream is ready."
Then we can define a method to "flush" a BufferedReader as:
void flush(BufferedReader input) throws IOException
{
while (input.ready())
input.read();
}
which simply discards all remaining data until the buffer is empty. We then call flush() after handling a syntax error (by displaying to the user). When the REPL loop resumes you have an empty buffer and thus do not get a pile of meaningless errors caused by the "junk" left in the buffer.