After playing with PrintWriter and files, I started wondering why there are sometimes inconsistencies when I read my files immediately after creating them. For example:
File file = new File("Items.txt");
int loopValue = 10;
try {
    PrintWriter fout = new PrintWriter(file);
    for (int i = 0; i < loopValue; i++) {
        fout.print(i + " asdsadas" + System.lineSeparator());
    }
    //fout.flush(); <-- I know that if I call flush() or close() this problem doesn't occur
    //fout.close();
    System.out.println("Here is the file:");
    Scanner readFile = new Scanner(file);
    while (readFile.hasNext()) {
        System.out.println(readFile.nextLine());
    }
} catch (FileNotFoundException e) {
    System.err.println(e.getMessage());
}
If I run this code, I read an empty file in the console, something like this:
Here is the file:
But if I change loopValue to something like 10000, I get something like this:
Here is the file:
0 asdsadas
1 asdsadas
2 asdsadas
...
... continues
...
9356 asdsadas
9357 asdsadas
9358 <--- it ends here; note that it doesn't end at the value 9999
I know that if I call flush() or close() before reading the file I can get rid of this problem, but why is this happening? When does PrintWriter decide it is time to empty its buffer if I don't tell it to? And why doesn't this problem happen when I close or flush the PrintWriter?
Thanks!
The general concept and motivation behind the buffer in PrintWriter is that it is expensive to write something out to the console or a file. Hence, by queueing up pending output, the program can run more efficiently. Imagine a Java program doing something very CPU-intensive, such as heavy calculations in a multithreaded application. If you also insisted that each call to PrintWriter.print() deliver its output immediately, the program could stall, and overall performance would decline.
If you insist on seeing the output from PrintWriter immediately after the call, then you can call flush() to achieve this. But as already mentioned, there could be a performance penalty under certain conditions.
Buffering is a basic and important technique for speeding up I/O.
When you call close(), the writer releases all system resources associated with it. Here the file is being used by the PrintWriter, and your changes are not saved until you flush it, so a Scanner trying to read the file will see the old, unchanged content. When you call flush(), you force the writer to write out all buffered bytes; until then, the stream buffers your output rather than writing it to the file. Note that you should use a finally block to close the stream, and that close() flushes the stream for you by default; but since here you want to read the file mid-stream, you are better off calling flush() at that point and then closing the writer in the finally block.
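To make that concrete, here is a minimal reworking of the snippet from the question (same file name and loop) that flushes before reading and closes the writer in a finally block:

import java.io.*;
import java.util.Scanner;

public class FlushBeforeRead {
    public static void main(String[] args) {
        File file = new File("Items.txt");
        PrintWriter fout = null;
        try {
            fout = new PrintWriter(file);
            for (int i = 0; i < 10; i++) {
                fout.print(i + " asdsadas" + System.lineSeparator());
            }
            fout.flush(); // force the buffered characters out to the file before reading it
            System.out.println("Here is the file:");
            Scanner readFile = new Scanner(file);
            while (readFile.hasNextLine()) {
                System.out.println(readFile.nextLine());
            }
            readFile.close();
        } catch (FileNotFoundException e) {
            System.err.println(e.getMessage());
        } finally {
            if (fout != null) {
                fout.close(); // close() flushes first, then releases the file handle
            }
        }
    }
}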
From your question I take it that you already know that there is (or at least may be) a buffer involved which gets flushed on flush and on close.
As to when flushing happens automatically, the JavaDoc on PrintWriter says:
Unlike the PrintStream class, if automatic flushing is enabled it will be done only when one of the println, printf, or format methods is invoked, rather than whenever a newline character happens to be output.
Now, how and whether buffering happens depends on the underlying OutputStream that is used (it can be specified via the constructor). If you use a BufferedOutputStream, you can specify the size of the buffer. Although it is not explicitly mentioned in the docs, flushing also happens when the buffer is full.
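For illustration, a minimal sketch of taking control of the buffer yourself (the file name and the 16 KB size are arbitrary choices):

import java.io.*;

public class ExplicitBufferSize {
    public static void main(String[] args) throws IOException {
        // A 16 KB buffer: the BufferedOutputStream flushes itself whenever
        // the buffer fills; the PrintWriter adds no automatic flushing here.
        PrintWriter out = new PrintWriter(
                new BufferedOutputStream(new FileOutputStream("Items.txt"), 16 * 1024));
        out.print("buffered until 16 KB accumulate, or until flush()/close()");
        out.close(); // flushes whatever is still buffered
    }
}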
The PrintWriter constructor taking a File that you are using in your example says:
Creates a new PrintWriter, without automatic line flushing, with the specified file. This convenience constructor creates the necessary intermediate OutputStreamWriter, which will encode characters using the default charset for this instance of the Java virtual machine.
without making any additional guarantees as to what OutputStreamWriter it will create and which settings it will use.
Related
When looking at the PrintWriter contract for the following constructor:
public PrintWriter(OutputStream out, boolean autoFlush)
Creates a new PrintWriter from an existing OutputStream. This convenience constructor creates the necessary intermediate OutputStreamWriter, which will convert characters into bytes using the default character encoding.
Parameters:
out - An output stream
autoFlush - A boolean; if true, the println, printf, or format methods will flush the output buffer
See Also:
OutputStreamWriter.OutputStreamWriter(java.io.OutputStream)
Notice the autoFlush flag only works on println, printf, and format. Now, I know that printf and format basically do the exact same thing as print except with more options, but I just don't see why they didn't include print as well in the contract. Why did they make this decision?
I suspect it's because the Java authors are making assumptions about performance:
Consider the following code:
public static void printArray(int[] array, PrintWriter writer) {
    for (int i = 0; i < array.length; i++) {
        writer.print(array[i]);
        if (i != array.length - 1) writer.print(',');
    }
}
You almost certainly would not want such a method to call flush() after every single call. It could be a big performance hit, especially for large arrays. And, if for some reason you did want that, you could just call flush yourself.
The idea is that the printf, format, and println methods are likely going to be printing a good chunk of text all at once, so it makes sense to flush after each one. But it would rarely, if ever, make sense to flush after only one or a few characters.
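A small sketch of the difference under autoFlush (the file name is arbitrary):

import java.io.*;

public class AutoFlushDemo {
    public static void main(String[] args) throws IOException {
        // autoFlush = true: println, printf and format flush; print does not.
        PrintWriter out = new PrintWriter(new FileOutputStream("autoflush.txt"), true);
        out.print("may sit in the buffer indefinitely");  // no flush happens here
        out.println(" ...until a println pushes it out"); // flushes the whole buffer
        out.printf("%d flushes as well%n", 42);           // so do printf and format
        out.close();
    }
}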
After some searching, I have found a citation for this reasoning (emphasis mine):
Most of the examples we've seen so far use unbuffered I/O. This means each read or write request is handled directly by the underlying OS. This can make a program much less efficient, since each such request often triggers disk access, network activity, or some other operation that is relatively expensive.
To reduce this kind of overhead, the Java platform implements buffered I/O streams. Buffered input streams read data from a memory area known as a buffer; the native input API is called only when the buffer is empty. Similarly, buffered output streams write data to a buffer, and the native output API is called only when the buffer is full.
<snip>
It often makes sense to write out a buffer at critical points, without waiting for it to fill. This is known as flushing the buffer.
Some buffered output classes support autoflush, specified by an optional constructor argument. When autoflush is enabled, certain key events cause the buffer to be flushed. For example, an autoflush PrintWriter object flushes the buffer on every invocation of println or format.
I would like to be able to spawn an external process from Java, and periodically write to its input and read the response as if it were the console. Much of the time, however, when I read the process' output, nothing is available. Is there a good practice to do this sort of thing (even avoiding it)?
Here's a stripped down example of what doesn't work:
import org.apache.commons.exec.*;
import java.io.*;
//...
CommandLine cl = CommandLine.parse("/usr/bin/awk {print($1-1)}");
System.out.println(java.util.Arrays.toString(cl.toStrings())); // toStrings() returns a String[]
Process proc = new ProcessBuilder(cl.toStrings()).start();
OutputStream os = proc.getOutputStream(); // avoiding *Buffered* classes
InputStream is = proc.getInputStream(); // to lessen buffering complications
os.write(("4" + System.getProperty("line.separator")).getBytes());
os.flush(); // Doesn't seem to flush.
// os.close(); // uncommenting works, but I'd like to keep the process running
System.out.println("reading");
System.out.println(is.read()); // read even one byte? usually is.available() -> 0
Strangely, if I wrap the OutputStream in a BufferedWriter, then I can read from some processes (cat), but not others (awk, grep).
Generally, the approach taken is sound. A few things, though:
InputStream.read() is a blocking method. It waits for input and is not CPU-intensive. You should build your reading loop around it...
More than one byte can be read at a time; just use read(byte[] buffer, int offset, int len)
Wrapping the input stream eases access: a BufferedReader (over an InputStreamReader) gives you readLine(); BufferedInputStream is the byte-oriented alternative.
Don't forget to use Process.waitFor().
Also, make sure that the external process writes to standard output (and not to standard error). Two possibilities here (a sketch combining these points follows the list):
use Process.getErrorStream() (and treat it as another InputStream)
change the command so it runs through a shell, e.g. sh -c "/usr/bin/awk '{print($1-1)}' 2>&1"; the 2>&1 redirection is shell syntax, so it has no effect when passed directly to ProcessBuilder. Alternatively, call ProcessBuilder.redirectErrorStream(true), which merges standard error into standard output.
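Putting those points together, here is a minimal sketch (it assumes a Unix-like system; cat is used because, as noted in the question, it echoes lines back promptly):

import java.io.*;

public class ProcessTalk {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("/bin/cat");
        pb.redirectErrorStream(true); // merge stderr into stdout without a shell
        Process proc = pb.start();

        BufferedWriter toProc = new BufferedWriter(
                new OutputStreamWriter(proc.getOutputStream()));
        BufferedReader fromProc = new BufferedReader(
                new InputStreamReader(proc.getInputStream()));

        toProc.write("4");
        toProc.newLine();
        toProc.flush();                          // push the line to the child process
        System.out.println(fromProc.readLine()); // blocks until the child responds

        toProc.close();  // EOF on stdin tells cat to exit
        proc.waitFor();  // reap the child process
    }
}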
I'm looking at an example which uses the following code:
try {
    BufferedWriter out = new BufferedWriter(new FileWriter("outfilename"));
    out.write("aString");
    out.close();
} catch (IOException e) {}
What's the advantage over doing
FileWriter fw = new FileWriter("outfilename");
I have tried both, and they seem comparable in speed when it comes to the task of appending to a file one line at a time.
The Javadoc provides a reasonable discussion on this subject:
In general, a Writer sends its output immediately to the underlying character or byte stream. Unless prompt output is required, it is advisable to wrap a BufferedWriter around any Writer whose write() operations may be costly, such as FileWriters and OutputStreamWriters. For example,

PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("foo.out")));

will buffer the PrintWriter's output to the file. Without buffering, each invocation of a print() method would cause characters to be converted into bytes that would then be written immediately to the file, which can be very inefficient.
If you're writing large blocks of text at once (like entire lines) then you probably won't notice a difference. If you have a lot of code that appends a single character at a time, however, a BufferedWriter will be much more efficient.
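A rough micro-benchmark sketch of the single-character case (the timings are illustrative only, and the file names are arbitrary):

import java.io.*;

public class WriterBench {
    public static void main(String[] args) throws IOException {
        int chars = 1000000;

        long t0 = System.nanoTime();
        Writer plain = new FileWriter("plain.txt");
        for (int i = 0; i < chars; i++) plain.write('x'); // one call per character
        plain.close();
        long t1 = System.nanoTime();

        Writer buffered = new BufferedWriter(new FileWriter("buffered.txt"));
        for (int i = 0; i < chars; i++) buffered.write('x');
        buffered.close();
        long t2 = System.nanoTime();

        System.out.println("FileWriter:     " + (t1 - t0) / 1000000 + " ms");
        System.out.println("BufferedWriter: " + (t2 - t1) / 1000000 + " ms");
    }
}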
Edit
As per andrew's comment below, the FileWriter actually uses its own fixed-size 1024-byte buffer. This was confirmed by looking at the source code. The BufferedWriter sources, on the other hand, show that it uses an 8192-byte buffer size by default, which can be configured by the user to any other desired size. So it seems like the benefits of BufferedWriter vs. FileWriter are limited to:
Larger default buffer size.
Ability to override/customize the buffer size.
And to further muddy the waters, the Java 6 implementation of OutputStreamWriter actually delegates to a StreamEncoder, which uses its own buffer with a default size of 8192 bytes. And the StreamEncoder buffer is user-configurable, although there is no way to access it directly through the enclosing OutputStreamWriter.
This is explained in the javadocs for OutputStreamWriter. A FileWriter does have a buffer (in the underlying OutputStreamWriter), but the character-encoding converter is invoked on each call to write(). Using an outer buffer avoids calling the converter so often.
http://download.oracle.com/javase/1.4.2/docs/api/java/io/OutputStreamWriter.html
A buffer's effectiveness is more easily seen when the load is high. Loop the out.write() call a couple of thousand times and you should see a difference.
For a few bytes passed in just one call, the BufferedWriter is probably even worse (because it probably still ends up calling down to the FileOutputStream).
Is there a way to flush the input stream in Java, perhaps prior to closing it? In relation to
iteratively invoking the statements below, while reading several files on disk
InputStream fileStream = item.openStream();
fileStream.close();
InputStream cannot be flushed. Why do you want to do this?
OutputStream can be flushed because it implements the Flushable interface. IMHO, flushing only makes sense in scenarios where data is written (to force a write of buffered data). Please see the documentation of Flushable for all implementing classes.
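A minimal sketch of the asymmetry (the file name is arbitrary): output can be flushed, while the closest thing for input is discarding pending data.

import java.io.*;

public class FlushableDemo {
    public static void main(String[] args) throws IOException {
        // OutputStream implements Flushable: buffered bytes can be forced out.
        OutputStream out = new BufferedOutputStream(new FileOutputStream("demo.txt"));
        out.write("hello".getBytes());
        out.flush(); // forces the buffered bytes to the file
        out.close();

        // InputStream has no flush(); the nearest notion is discarding input,
        // e.g. skipping whatever is immediately available.
        InputStream in = new FileInputStream("demo.txt");
        in.skip(in.available());
        in.close();
    }
}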
This is an old question but appears to be the only one of its kind, and I think there is a valid use case for "flushing" an input stream.
In Java, if you are using a BufferedReader or BufferedInputStream (which I believe is a common case), "flushing" the stream can be considered to be equivalent to discarding all data currently in the buffer -- i.e., flushing the buffer.
For an example of when this might be useful, consider implementing a REPL for a programming language such as Python or similar.
In this case you might use a BufferedReader on System.in. The user enters a (possibly large) expression and hits enter. At this point, the expression (or some part of it) is stored in the buffer of your Reader.
Now, if a syntax error occurs somewhere within the expression, it will be caught and displayed to the user. However, the remainder of the expression still resides in the input buffer.
When the REPL loop continues, it will begin reading just beyond the point where the syntax error occurred, in the middle of some erroneous expression. This is likely not desirable. Rather, it would be better to simply discard the remainder of the buffer and continue with a "fresh start."
In this sense, we can use the BufferedReader API method ready() to discard any remaining buffered characters. The documentation reads:
"Tells whether this stream is ready to be read. A buffered character stream is ready if the buffer is not empty, or if the underlying character stream is ready."
Then we can define a method to "flush" a BufferedReader as:
void flush(BufferedReader input) throws IOException
{
    while (input.ready())
        input.read();
}
which simply discards all remaining data until the buffer is empty. We then call flush() after handling a syntax error (by displaying to the user). When the REPL loop resumes you have an empty buffer and thus do not get a pile of meaningless errors caused by the "junk" left in the buffer.
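A toy usage sketch of that idea (the one-integer "evaluator" is purely illustrative): after a syntax error, flush() discards whatever the user typed ahead, so the next prompt starts clean.

import java.io.*;

public class ReplDemo {
    public static void main(String[] args) throws IOException {
        BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
        while (true) {
            System.out.print(">>> ");
            String line = input.readLine();
            if (line == null) break; // EOF: quit the REPL
            try {
                // Toy "evaluator": expects a single integer and doubles it.
                System.out.println(Integer.parseInt(line.trim()) * 2);
            } catch (NumberFormatException e) {
                System.out.println("syntax error: " + line);
                flush(input); // discard any typed-ahead input before re-prompting
            }
        }
    }

    static void flush(BufferedReader input) throws IOException {
        while (input.ready())
            input.read();
    }
}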
I currently have 2 BufferedReaders initialized on the same text file. When I'm done reading the text file with the first BufferedReader, I use the second one to make another pass through the file from the top. Multiple passes through the same file are necessary.
I know about reset(), but it needs to be preceded by a call to mark(), and mark() needs to know the size of the file, something I don't think I should have to bother with.
Ideas? Packages? Libs? Code?
Thanks
TJ
Buffered readers are meant to read a file sequentially. What you are looking for is java.io.RandomAccessFile; you can then use seek() to jump to wherever you want in the file.
The random access reader is implemented like so:
try {
    String fileName = "c:/myraffile.txt";
    File file = new File(fileName);
    RandomAccessFile raf = new RandomAccessFile(file, "rw");
    raf.readChar();
    raf.seek(0);
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
The "rw" is a mode string (here, open for reading and writing), as detailed in the RandomAccessFile documentation.
The reason the sequential-access readers are set up like this is so that they can implement their buffers and nothing can change beneath their feet. For example, the FileReader that is given to a BufferedReader should be operated on only by that BufferedReader. If another place could affect it, you could get inconsistent operation: one reader advances its position in the underlying FileReader while the other expects it to remain the same; then you use the other reader and it is at an undetermined location.
What's the disadvantage of just creating a new BufferedReader to read from the top? I'd expect the operating system to cache the file if it's small enough.
If you're concerned about performance, have you proved it to be a bottleneck? I'd just do the simplest thing and not worry about it until you have a specific reason to. I mean, you could just read the whole thing into memory and then do the two passes on the result, but again that's going to be more complicated than just reading from the start again with a new reader.
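A sketch of that simplest approach, opening a fresh reader for each pass (the file name and the per-pass work are placeholders):

import java.io.*;

public class TwoPasses {
    public static void main(String[] args) throws IOException {
        File file = new File("data.txt");

        // First pass: for example, count the lines.
        BufferedReader first = new BufferedReader(new FileReader(file));
        int lines = 0;
        while (first.readLine() != null) lines++;
        first.close();

        // Second pass: a brand-new reader starts from the top again.
        BufferedReader second = new BufferedReader(new FileReader(file));
        String line;
        while ((line = second.readLine()) != null) {
            System.out.println(lines + " total, current: " + line);
        }
        second.close();
    }
}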
The best way to proceed is to change your algorithm so that you do NOT need the second pass. I used this approach a couple of times, when I had to deal with huge (but not terrible, i.e. a few GB) files which didn't fit into the available memory.
It might be hard, but the performance gain is usually worth the effort.
About mark/reset:
The mark method in BufferedReader takes a readAheadLimit parameter which limits how far you can read after a mark before resetting becomes impossible. Resetting doesn't actually mean a file-system seek(0); it just seeks inside the buffer. To quote the Javadoc:
readAheadLimit - Limit on the number of characters that may be read while still preserving the mark. After reading this many characters, attempting to reset the stream may fail. A limit value larger than the size of the input buffer will cause a new buffer to be allocated whose size is no smaller than limit. Therefore large values should be used with care.
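A sketch of mark/reset in action, assuming data.txt exists and the first pass stays within the chosen limit:

import java.io.*;

public class MarkResetDemo {
    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader("data.txt"));
        reader.mark(1 << 20); // up to ~1M characters stay re-readable

        System.out.println("first pass:  " + reader.readLine());

        reader.reset(); // back to the mark; a buffer seek, not a file-system seek
        System.out.println("second pass: " + reader.readLine());

        reader.close();
    }
}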
"The whole business about mark() and reset() in BufferedReader smacks of poor design."
Why don't you extend this class and have it call mark() in the constructor and then do a reset() in a topOfFile() method? (BufferedReader has no seek(); reset() is the equivalent rewind here.) A sketch of this idea follows.
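(The class name and the readAheadLimit below are arbitrary choices; rewinding fails once more than LIMIT characters have been read.)

import java.io.*;

// Marks the start of the stream on construction so that topOfFile()
// can rewind to it later via reset().
public class RewindableReader extends BufferedReader {
    private static final int LIMIT = 1 << 20; // ~1M characters, arbitrary

    public RewindableReader(Reader in) throws IOException {
        super(in);
        mark(LIMIT);
    }

    public void topOfFile() throws IOException {
        reset(); // rewind to the mark set in the constructor
    }
}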
BR,
~A