flushing input stream: java - java

Is there a way to flush the input stream in Java, perhaps prior to closing it? In relation to
iteratively invoking the statements below, while reading several files on disk
InputStream fileStream = item.openStream();
fileStream.close;

InputStream cannot be flushed. Why do you want to do this?
OutputStream can be flushed as it implements the interface Flushable. Flushing makes IMHO only sense in scenarios where data is written (to force a write of buffered data). Please see the documentation of Flushable for all implementing classes.

This is an old question but appears to be the only one of its kind, and I think there is a valid use case for "flushing" an input stream.
In Java, if you are using a BufferedReader or BufferedInputStream (which I believe is a common case), "flushing" the stream can be considered to be equivalent to discarding all data currently in the buffer -- i.e., flushing the buffer.
For an example of when this might be useful, consider implementing a REPL for a programming language such as Python or similar.
In this case you might use a BufferedReader on System.in. The user enters a (possibly large) expression and hits enter. At this point, the expression (or some part of it) is stored in the buffer of your Reader.
Now, if a syntax error occurs somewhere within the expression, it will be caught and displayed to the user. However, the remainder of the expression still resides in the input buffer.
When the REPL loop continues, it will begin reading just beyond the point where the syntax error occurred, in the middle of some erroneous expression. This is likely not desirable. Rather, it would be better to simply discard the remainder of the buffer and continue with a "fresh start."
In this sense, we can use the BufferedReader API method ready() to discard any remaining buffered characters. The documentation reads:
"Tells whether this stream is ready to be read. A buffered character stream is ready if the buffer is not empty, or if the underlying character stream is ready."
Then we can define a method to "flush" a BufferedReader as:
void flush(BufferedReader input) throws IOException
{
while (input.ready())
input.read();
}
which simply discards all remaining data until the buffer is empty. We then call flush() after handling a syntax error (by displaying to the user). When the REPL loop resumes you have an empty buffer and thus do not get a pile of meaningless errors caused by the "junk" left in the buffer.

Related

In PrintWriter, why doesn't the print() function also auto-flush?

When looking at the PrintWriter contract for the following constructor:
public PrintWriter(OutputStream out, boolean autoFlush)
Creates a new PrintWriter from an existing OutputStream. This convenience constructor creates the necessary intermediate OutputStreamWriter, which will convert characters into bytes using the default character encoding.
Parameters:
out - An output stream
autoFlush - A boolean; if true, the println, printf, or format methods will flush the output buffer
See Also:
OutputStreamWriter.OutputStreamWriter(java.io.OutputStream)
Notice the autoFlush flag only works on println, printf, and format. Now, I know that printf and format basically do the exact same thing as print except with more options, but I just don't see why they didn't include print as well in the contract. Why did they make this decision?
I suspect it's because the Java authors are making assumptions about performance:
Consider the following code:
public static void printArray(int[] array, PrintWriter writer) {
for(int i = 0; i < array.length; i++) {
writer.print(array[i]);
if(i != array.length - 1) writer.print(',');
}
}
You almost certainly would not want such a method to call flush() after every single call. It could be a big performance hit, especially for large arrays. And, if for some reason you did want that, you could just call flush yourself.
The idea is that printf, format, and println methods are likely going to be printing a good chunk of text all at once, so it makes sense to flush after every one. But it would rarely, if ever, make sense flush after only 1 or a few characters.
After some searching, I have found a citation for this reasoning (emphasis mine):
Most of the examples we've seen so far use unbuffered I/O. This means each read or write request is handled directly by the underlying OS. This can make a program much less efficient, since each such request often triggers disk access, network activity, or some other operation that is relatively expensive.
To reduce this kind of overhead, the Java platform implements buffered I/O streams. Buffered input streams read data from a memory area known as a buffer; the native input API is called only when the buffer is empty. Similarly, buffered output streams write data to a buffer, and the native output API is called only when the buffer is full.
<snip>
It often makes sense to write out a buffer at critical points, without waiting for it to fill. This is known as flushing the buffer.
Some buffered output classes support autoflush, specified by an optional constructor argument. When autoflush is enabled, certain key events cause the buffer to be flushed. For example, an autoflush PrintWriter object flushes the buffer on every invocation of println or format.

InputStreamReader buffering

In the InputStreamReader class documentation it is declared that:
To enable the efficient conversion of bytes to characters, more bytes
may be read ahead from the underlying stream than are necessary to
satisfy the current read operation.
what does this statement mean ?
Does the implementation buffer some input data ?
If so, after the use of an InputStreamReader the current position, on the stream we are reading, may not be what we expect...
You have read the documentation correctly, and correctly deduced that you cannot reliably go back to reading the underlying input stream and expect to know where you are in the stream.

Can BufferedReader read bytes?

Sorry if this question is a dulplicate but I didn't get an answer I was looking for.
Java docs says this
In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders >and InputStreamReaders. For example,
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
will buffer the input from the specified file. Without buffering, each invocation of read() or readLine() could cause bytes to be read from the file, converted into characters, and then returned, which can be very inefficient.
My first question is If bufferedReader can read bytes then why can't we work on images which are in bytes using bufferedreader.
My second question is Does Bufferedreader store characters in BUFFER and what is the meaning of this line
will buffer the input from the specified file.
My third question is what is the meaning of this line
In general, each read request made of a Reader causes a corresponding read request to be >made of the underlying character or byte stream.
There are two questions here.
1. Buffering
Imagine you lived a mile from your nearest water source, and you drink a cup of water every hour. Well, you wouldn't walk all the way to the water for every cup. Go once a day, and come home with a bucket full of enough water to fill the cup 24 times.
The bucket is a buffer.
Imagine your village is supplied water by a river. But sometimes the river runs dry for a month; other times the river brings so much water that the village floods. So you build a dam, and behind the dam there is a reservoir. The reservoir fills up in the rainy season and gradually empties in the dry season. The village gets a steady flow of water all year round.
The reservoir is a buffer.
Data streams in computing are similar to both those scenarios. For example, you can get several kilobytes of data from a filesystem in a single OS system call, but if you want to process one character at a time, you need something similar to a reservoir.
A BufferedReader contains within it another Reader (for example a FileReader), which is the river -- and an array of bytes, which is the reservoir. Every time you read from it, it does something like:
if there are not enough bytes in the "reservoir" to fulfil this request
top up the "reservoir" by reading from the underlying Reader
endif
return some bytes from the "reservoir".
However when you use a BufferedReader, you don't need to know how it works, only that it works.
2. Suitability for images
It's important to understand that BufferedReader and FileReader are examples of Readers. You might not have covered polymorphism in your programming education yet, so when you do, remember this. It means that if you have code which uses FileReader -- but only the aspects of it that conform to Reader -- then you can substitute a BufferedReader and it will work just the same.
It's a good habit to declare variables as the most general class that works:
Reader reader = new FileReader(file);
... because then this would be the only change you need to add buffering:
Reader reader = new BufferedReader(new FileReader(file));
I took that detour because it's all Readers that are less suitable for images.
Reader has two read methods:
int read(); // returns one character, cast to an int
int read(char[] block); // reads into block, returns how many chars it read
The second form is unsuitable for images because it definitely reads chars, not ints.
The first form looks as if it might be OK -- after all, it reads ints. And indeed, if you just use a FileReader, it might well work.
However, think about how a BufferedReader wrapped around a FileReader will work. The first time you call BufferedReader.read(), it will call FileReader.read(buffer) to fill its buffer. Then it will cast the first char of the buffer back to an int, and return that.
Especially when you bring multi-byte charsets into the picture, that can cause problems.
So if you want to read integers, use InputStream not Reader. InputStream has int read(byte[] buf, int offset, int length) -- bytes are much more reliably cast back and forth from int than chars.
Readers (and Writers) in java are specialized classes for dealing with text (character) streams - the concept of a line is meaningless in any other type of stream.
for the general IO equivalent, have a look at BufferedInputStream
so, to answer your questions:
while the reader does eventually read bytes, it converts them to characters. it is not intended to read anything else (like images) - use the InputStream family of classes for that
a buffered reader will read large blocks of data from the underlying stream (which may be a file, socket, or anything else) into a buffer in memory and will then serve read requests from this buffer until the buffer is emptied. this behaviour of reading large chunks instead of smaller chucks every time improves performance.
it means that if you dont wrap a reader in a buffered reader then every time you want to read a single character, it will access the disk.network to get just the single character you want. doing I/O in such small chunks is usually terrible for performance.
Default behaviour is it will convert to character, but when you have an image you cannot have a character data, instead you need pixel of bytes data. So you cannot use it.
It is buffereing, means , it is reading a certain chunk of data in an char array. You can see this behaviour in the code:
public BufferedReader(Reader in) {
this(in, defaultCharBufferSize);
}
and the defaultCharBufferSize is as mentioned below:
private static int defaultCharBufferSize = 8192;
3 Every time you do read operation, it will be reading only one character.
So in a nutshell, buffred means, it will read few chunk of character data first that will keep in a char array and that will be processed and again it will read same chunk of data until it reaches end of stream
You can refer the following to get to know more
BufferedReader

Buffered Input Stream mark read limit

I am learning how to use an InputStream. I was trying to use mark for BufferedInputStream, but when I try to reset I have these exceptions:
java.io.IOException: Resetting to invalid mark
I think this means that my mark read limit is set wrong. I actually don't know how to set the read limit in mark(). I tried like this:
is = new BufferedInputStream(is);
is.mark(is.available());
This is also wrong.
is.mark(16);
This also throws the same exception.
How do I know what read limit I am supposed to set? Since I will be reading different file sizes from the input stream.
mark is sometimes useful if you need to inspect a few bytes beyond what you've read to decide what to do next, then you reset back to the mark and call the routine that expects the file pointer to be at the beginning of that logical part of the input. I don't think it is really intended for much else.
If you look at the javadoc for BufferedInputStream it says
The mark operation remembers a point in the input stream and the reset operation causes all the bytes read since the most recent mark operation to be reread before new bytes are taken from the contained input stream.
The key thing to remember here is once you mark a spot in the stream, if you keep reading beyond the marked length, the mark will no longer be valid, and the call to reset will fail. So mark is good for specific situations and not much use in other cases.
This will read 5 times from the same BufferedInputStream.
for (int i=0; i<5; i++) {
inputStream.mark(inputStream.available()+1);
// Read from input stream
Thumbnails.of(inputStream).forceSize(160, 160).toOutputStream(out);
inputStream.reset();
}
The value you pass to mark() is the amount backwards that you will need to reset. if you need to reset to the beginning of the stream, you will need a buffer as big as the entire stream. this is probably not a great design as it will not scale well to large streams. if you need to read the stream twice and you don't know the source of the data (e.g. if it's a file, you could just re-open it), then you should probably copy it to a temp file so you can re-read it at will.

Should I use DataInputStream or BufferedInputStream

I want to read each line from a text file and store them in an ArrayList (each line being one entry in the ArrayList).
So far I understand that a BufferedInputStream writes to the buffer and only does another read once the buffer is empty which minimises or at least reduces the amount of operating system operations.
Am I correct - do I make sense?
If the above is the case in what situations would anyone want to use DataInputStream. And finally which of the two should I be using and why - or does it not matter.
Use a normal InputStream (e.g. FileInputStream) wrapped in an InputStreamReader and then wrapped in a BufferedReader - then call readLine on the BufferedReader.
DataInputStream is good for reading primitives, length-prefixed strings etc.
The two classes are not mutually exclusive - you can use both of them if your needs suit.
As you picked up, BufferedInputStream is about reading in blocks of data rather than a single byte at a time. It also provides the convenience method of readLine(). However, it's also used for peeking at data further in the stream then rolling back to a previous part of the stream if required (see the mark() and reset() methods).
DataInputStream/DataOutputStream provides convenience methods for reading/writing certain data types. For example, it has a method to write/read a UTF String. If you were to do this yourself, you'd have to decide on how to determine the end of the String (i.e. with a terminator byte or by specifying the length of the string).
This is different from BufferedInputStream's readLine() which, as the method sounds like, only returns a single line. writeUTF()/readUTF() deal with Strings - that string can have as many lines it it as it wants.
BufferedInputStream is suitable for most text processing purposes. If you're doing something special like trying to serialize the fields of a class to a file, you'd want to use DataInput/OutputStream as it offers greater control of the data at a binary level.
Hope that helps.
You can always use both:
final InputStream inputStream = ...;
final BufferedInputStream bufferedInputStream =
new BufferedInputStream(inputStream);
final DataInputStream dataInputStream =
new DataInputStream(bufferedInputStream);
InputStream: Base class to read byte from stream (network or file ), provide ability to read byte from the stream and delete the end of the stream.
DataInputStream: To read data directly as a primitive datatype.
BufferInputStream: Read data from the input stream and use buffer to optimize the speed to access the data.
You shoud use DataInputStream in cases when you need to interpret the primitive types in a file written by a language other Java in platform-independent manner.
I would advocate using Jakarta Commons IO and the readlines() method (of whatever variety).
It'll look after buffering/closing etc. and give you back a list of text lines. I'll happily roll my own input stream wrapping with buffering etc., but nine times out of ten the Commons IO stuff works fine and is sufficient/more concise/less error prone etc.
The differences are:
The DataInputStream works with the binary data, while the BufferedReader work with character data.
All primitive data types can be handled by using the corresponding methods in DataInputStream class, while only string data can be read from BufferedReader class and they need to be parsed into the respective primitives.
DataInputStream is a part of filtered streams, while BufferedReader is not.
DataInputStream consumes less amount of memory space being it is a binary stream, whereas BufferedReader consumes more memory space being it is character stream.
The data to be handled is limited in DataInputStream, whereas the number of characters to be handled has wide scope in BufferedReader.

Categories