I'm having trouble understanding how buffers work in Java IO.
Excuse me if I don't express myself as clearly as I would like; I'm not
strong on all these concepts.
As I understand it, in Java there are readers/writers, for chars (meaning the
possibility of more than one byte per char and encoding options), and streams
for bytes.
Then there are some classes that use buffers.
I believe that a buffer is used mainly so that we can avoid unnecessary system
calls that would involve expensive operations, like accessing a slower device, by
storing more in memory and making the system call useful for more data.
The issue I have is that there seem to be buffering classes for both readers/writers and streams.
I would like to know whether buffering characters is enough, or whether, by the time
those bytes reach the stream class, they would already be flushed (on newlines,
for example, as some classes state).
The best I've found about this is this post
How do these different code snippets compare in regard to buffering?
Does autoflush thwart the intent of buffering?
Should there be only one buffer in play, and if so, where (writer or stream)
and why?:
// resolveSocket is some socket
PrintWriter out = new PrintWriter(
        resolveSocket.getOutputStream(),
        true);
// vs
PrintWriter out = new PrintWriter(
        new OutputStreamWriter(
                new BufferedOutputStream(resolveSocket.getOutputStream()),
                StandardCharsets.UTF_8),
        true);
My interest is first and foremost in understanding buffering better; the practical side comes after that.
Thank you all in advance.
EDIT: Found this other Stack Overflow question interesting and related.
Also this other one talks about BufferedOutputStream.
It may help you to understand the difference between a writer and a stream. An OutputStream is a binary sink. A Writer has a character encoding and knows about newlines. The OutputStreamWriter allows you to send character-encoded data and have it correctly translated to binary for consumption by an OutputStream. What you probably want is:
PrintWriter out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(resolveSocket.getOutputStream())));
This will allow you to use your PrintWriter to output chars, have it buffered by the BufferedWriter, and then translated by the OutputStreamWriter for consumption by the socket output stream.
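To make that concrete, here is a minimal sketch of the full chain with an explicit charset and autoflush. The host and port are hypothetical stand-ins for however resolveSocket is actually obtained:

import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ChainSketch {
    public static void main(String[] args) throws IOException {
        // Hypothetical endpoint; stands in for the question's resolveSocket.
        try (Socket resolveSocket = new Socket("example.com", 7)) {
            PrintWriter out = new PrintWriter(
                    new BufferedWriter(
                            new OutputStreamWriter(
                                    resolveSocket.getOutputStream(),
                                    StandardCharsets.UTF_8)),
                    true); // autoflush: each println() flushes the whole chain

            // Characters are encoded by the OutputStreamWriter, collected by the
            // BufferedWriter, and pushed to the socket when println() autoflushes.
            out.println("hello");
        }
    }
}

Note that autoflush only partially undoes the buffering: writes within a line still accumulate in the BufferedWriter, but every println() forces the chain to flush, so you trade some of the buffering benefit for prompt, line-by-line delivery.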
Related
I'm trying to read information sent by a client on Android using the TCP protocol. In my server I have this code:
InputStream input = clienteSocket.getInputStream();
int c = input.read();
c will contain the ASCII value that the client sent.
I also can get this by writing:
BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
I would like to know what is the difference between both methods.
You're comparing apples and oranges here.
Your first example reads one byte from the stream, unbuffered, and returns the value of that byte. (Adding 'ASCII number' to that adds no actual information.)
Your second example sets up a buffered reader, which can read chars from the stream, buffered, but it doesn't actually read anything.
You could set up two further examples:
InputStream is = new BufferedInputStream(socket.getInputStream());
int c = is.read();
This reads a byte, with buffering.
Reader reader = new InputStreamReader(socket.getInputStream());
int c = reader.read();
This reads a char, with a little buffering: not as much as BufferedReader provides.
The realistic choices are between the two buffered versions, for efficiency reasons as outlined by @StephenC, and the choice between them is dictated by whether you want bytes or chars.
The buffered approach is better because (in most cases) it reduces the number of syscalls that the JVM needs to make to the operating system. Since syscalls are relatively expensive, buffering generally gives you better performance.
In your specific example:
Each time you call read() on an unbuffered input stream, you make a syscall.
The first time you call read() (or another read operation) on a buffered input stream, it reads a number of bytes into an in-memory byte array. On the second, third, etc. calls to read(), the read will typically return a byte out of the in-memory buffer, without making a syscall.
In your example, the only case where using a buffered stream doesn't help would be if you are going to read only one byte from the socket, and then close it.
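As a hedged sketch of the buffered byte-oriented variant (the host, port and echo-style loop are purely illustrative):

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;

public class BufferedReadSketch {
    public static void main(String[] args) throws IOException {
        try (Socket clienteSocket = new Socket("example.com", 7)) { // hypothetical peer
            InputStream input = new BufferedInputStream(clienteSocket.getInputStream());
            int c;
            // Most read() calls are served from the in-memory buffer; a syscall
            // only happens when the buffer is empty and has to be refilled.
            while ((c = input.read()) != -1) {
                System.out.println(c); // the numeric value of each byte received
            }
        }
    }
}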
UPDATE
I didn't notice that you were comparing an unbuffered InputStream with a buffered >> Reader <<. As @EJP points out, this is "comparing apples and oranges". The functionality of the two versions is different: one reads bytes and the other reads characters.
(And if you don't understand that distinction ... and why it is an important distinction ... you would be advised to read the Java Tutorial lesson on Basic I/O. Particularly the sections on byte streams, character streams and buffered streams.)
I cannot really grasp the purpose of the FileReader and BufferedReader classes in Java.
At docs.oracle.com it is recommended to wrap a BufferedReader around a FileReader object because it is not efficient to use the FileReader directly. Where does the cost or overhead come from?
Let's say I have a text file that I want to read into my Java program using these classes:
I use FileReader and BufferedReader
FileReader fileReader = new FileReader(new File("text.txt")); // probably correct???
BufferedReader bufferedReader = new BufferedReader(fileReader);
1) What is the task of the FileReader object here? Is it responsible for making an I/O request via the OS to the file and thereafter reading bytes? What is the cost of this?
Is it true that the FileReader makes several I/O requests? Or does the cost arise when the FileReader object has to convert bytes to characters, character by character?
2) The task of the BufferedReader object (referring to the last sentence above): is its role simply to buffer arrays of incoming bytes and THEN convert those to characters?
Very grateful for answers
Edit: first of all, thanks for the incoming answers. But I should have mentioned that it's exactly this documentation I have studied. Call me stupid or something, but what is meant by "each read request"? WHEN is each read request made? How often?
In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders. For example,
This is mostly why I launched this question: it sounds as if the FileReader causes a lot of I/O requests, which slows everything down.
From the Oracle docs:
In general, each read request made of a Reader causes a corresponding
read request to be made of the underlying character or byte stream. It
is therefore advisable to wrap a BufferedReader around any Reader
whose read() operations may be costly, such as FileReaders and
InputStreamReaders. For example,
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
will buffer the input from the specified file. Without buffering, each
invocation of read() or readLine() could cause bytes to be read from
the file, converted into characters, and then returned, which can be
very inefficient.
So, as the document clearly suggests, wrapping a BufferedReader around a FileReader avoids reading data from the file over and over again. BufferedReader buffers the input.
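For example, a minimal sketch of reading a file line by line through the buffered wrapper (the file name is just the one from the question):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadLinesSketch {
    public static void main(String[] args) throws IOException {
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader("text.txt"))) {
            String line;
            // readLine() is served from BufferedReader's in-memory buffer; the
            // underlying FileReader is only asked for more data when that buffer
            // runs dry, so far fewer I/O requests reach the file.
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}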
Java has many I/O classes which may be combined. So you might see something like:
BufferedReader in = new BufferedReader(new InputStreamReader(
        new FileInputStream(new File("...")), "UTF-8"));
...
in.close(); // Closes all.
This allows flexible combination. So an XML parser can use a Reader and not care where the text comes from: a file, a URL, or memory. This is as opposed to "simpler" languages, where there is no choice of implementation (compare Map with its implementations HashMap and TreeMap).
Now, FileReader and FileWriter are old utility classes for reading from and writing to a file in the operating system's default encoding, so for local files only. They are not portable (!) to other operating systems, nor suitable when a fixed encoding is required.
To do this, FileReader extends InputStreamReader, which reads binary data (an InputStream), using a FileInputStream. It does just that.
However, it makes sense to use a larger in-memory buffer when reading, hence the advice, which is also long-standing. Remember that performance was an issue in the early days of Java.
The only advantage of a standalone FileReader would be in tight memory situations, maybe on a smartphone.
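If the platform-default encoding is the concern, here is a hedged sketch of the explicit-charset alternative (assuming Java 7+ and that UTF-8 is the encoding you actually want):

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ExplicitCharsetSketch {
    public static void main(String[] args) throws IOException {
        // Equivalent in spirit to BufferedReader + InputStreamReader + FileInputStream,
        // but the encoding is stated explicitly instead of relying on the platform
        // default that FileReader uses.
        try (BufferedReader in = Files.newBufferedReader(
                Paths.get("text.txt"), StandardCharsets.UTF_8)) {
            System.out.println(in.readLine());
        }
    }
}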
I am currently using buffered streams to read and write some files. In between I do some mathematical processing where a symbol is a byte.
To read:
InputStream input = new FileInputStream(outputname);
input.read(b, off, len); // i.e. read(byte[] b, int off, int len)
To write:
OutputStream output = new BufferedOutputStream(
        new FileOutputStream(outputname),
        OUTPUTBUFFERSIZE);
output.write((byte) byteinsideaint);
Now I need to add some header data and to support short symbols too. I want to use DataInputStream and DataOutputStream to avoid converting other types to bytes myself, and I am wondering what their performance is like.
Do I need to use
OutputStream output = new DataOutputStream(
        new BufferedOutputStream(
                new FileOutputStream(outputname),
                OUTPUTBUFFERSIZE));
to keep the advantages of buffering, or is it good enough to use
OutputStream output = new DataOutputStream(
        new FileOutputStream(outputname));
You should add a BufferedOutputStream in between. DataOutputStream does not implement any buffering of its own (which is good: separation of concerns), and its performance will be very poor without buffering on the underlying OutputStream. Even the simplest methods, like writeInt(), can result in four separate disk writes.
As far as I can see, only write(byte[], int, int) and writeUTF(String) write their data in one byte[] chunk. The others write primitive values (like int or double) byte by byte.
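A hedged sketch of the buffered arrangement; the file name, buffer size and header fields (a magic int and a short symbol count) are made up for illustration:

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class HeaderWriteSketch {
    private static final int OUTPUT_BUFFER_SIZE = 64 * 1024; // illustrative size

    public static void main(String[] args) throws IOException {
        try (DataOutputStream output = new DataOutputStream(
                new BufferedOutputStream(
                        new FileOutputStream("outfile.bin"), OUTPUT_BUFFER_SIZE))) {
            // Without the BufferedOutputStream in between, writeInt() alone could
            // translate into four separate single-byte writes to the file.
            output.writeInt(0xCAFEBABE); // hypothetical magic number
            output.writeShort(42);       // e.g. a symbol count
            output.write(7);             // one data byte, as in the question
        }
    }
}

Note that the variable is declared as DataOutputStream rather than OutputStream as in the question; otherwise writeInt() and writeShort() would not be accessible.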
You absolutely need the BufferedOutputStream in the middle.
I appreciate your concern about the performance, and I have 2 suggestions:
Shrink your streams with Java compression. A useful article can be found here.
Use composition instead of inheritance (which is recommended practice anyway). Create a Pipe which contains a PipedInputStream and a PipedOutputStream connected to each other, with getInputStream() and getOutputStream() methods (see the sketch below). You can't directly pass the Pipe object to something needing a stream, but you can pass the return values of its get methods.
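A rough sketch of the Pipe described above; the class and its getter names are just the ones suggested in the bullet, not an existing library type, and remember that piped streams are normally read and written from different threads:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Composition rather than inheritance: the Pipe owns both ends.
public class Pipe {
    private final PipedOutputStream out = new PipedOutputStream();
    private final PipedInputStream in;

    public Pipe() throws IOException {
        this.in = new PipedInputStream(out); // connect the two ends
    }

    public InputStream getInputStream() {
        return in;
    }

    public OutputStream getOutputStream() {
        return out;
    }
}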
I'm looking at the following example, which uses this code:
try {
    BufferedWriter out = new BufferedWriter(new FileWriter("outfilename"));
    out.write("aString");
    out.close();
} catch (IOException e) {}
What's the advantage over doing
FileWriter fw = new FileWriter("outfilename");
I have tried both, and they seem comparable in speed when it comes to the task of appending to a file one line at a time.
The Javadoc provides a reasonable discussion on this subject:
In general, a Writer sends its output immediately to the underlying
character or byte stream. Unless prompt output is required, it is
advisable to wrap a BufferedWriter around any Writer whose write()
operations may be costly, such as FileWriters and OutputStreamWriters.
For example,
PrintWriter out = new PrintWriter(new BufferedWriter(new
FileWriter("foo.out")));
will buffer the PrintWriter's output to the
file. Without buffering, each invocation of a print() method would
cause characters to be converted into bytes that would then be written
immediately to the file, which can be very inefficient.
If you're writing large blocks of text at once (like entire lines) then you probably won't notice a difference. If you have a lot of code that appends a single character at a time, however, a BufferedWriter will be much more efficient.
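For instance, a small sketch of the single-character case, reusing the question's file name; the loop is only there to exaggerate the per-write cost:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class CharAtATimeSketch {
    public static void main(String[] args) throws IOException {
        String text = "aString\n";
        try (BufferedWriter out = new BufferedWriter(new FileWriter("outfilename"))) {
            // With the BufferedWriter, each write() just lands in memory; the
            // encoder and the file are only touched when the buffer fills up
            // or the writer is flushed/closed.
            for (int i = 0; i < text.length(); i++) {
                out.write(text.charAt(i));
            }
        }
    }
}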
Edit
As per andrew's comment below, FileWriter actually uses its own fixed-size 1024-byte buffer. This was confirmed by looking at the source code. The BufferedWriter sources, on the other hand, show that it uses an 8192-byte buffer by default, which can be configured by the user to any other desired size. So it seems the benefits of BufferedWriter over FileWriter are limited to:
Larger default buffer size.
Ability to override/customize the buffer size.
And to further muddy the waters, the Java 6 implementation of OutputStreamWriter actually delegates to a StreamEncoder, which uses its own buffer with a default size of 8192 bytes. And the StreamEncoder buffer is user-configurable, although there is no way to access it directly through the enclosing OutputStreamWriter.
This is explained in the javadocs for OutputStreamWriter. A FileWriter does have a buffer (in the underlying OutputStreamWriter), but the character-encoding converter is invoked on each call to write(). Using an outer buffer avoids calling the converter so often.
http://download.oracle.com/javase/1.4.2/docs/api/java/io/OutputStreamWriter.html
A buffer's effectiveness is more easily seen when the load is high. Loop out.write() a couple of thousand times and you should see a difference.
For a few bytes passed in just one call, the BufferedWriter is probably even slightly worse (because it still has to call through to the underlying writer afterwards).
I want to read each line from a text file and store them in an ArrayList (each line being one entry in the ArrayList).
So far I understand that a BufferedInputStream fills its buffer and only does another read from the source once the buffer is empty, which minimises, or at least reduces, the number of operating-system operations.
Am I correct - do I make sense?
If the above is the case, in what situations would anyone want to use DataInputStream? And finally, which of the two should I be using, and why - or does it not matter?
Use a normal InputStream (e.g. FileInputStream) wrapped in an InputStreamReader and then wrapped in a BufferedReader - then call readLine on the BufferedReader.
DataInputStream is good for reading primitives, length-prefixed strings etc.
The two classes are not mutually exclusive - you can use both of them if your needs suit.
As you picked up, BufferedInputStream is about reading in blocks of data rather than a single byte at a time. It also lets you peek at data further in the stream and then roll back to an earlier part of the stream if required (see the mark() and reset() methods). Its character-stream counterpart, BufferedReader, adds the convenience method readLine().
DataInputStream/DataOutputStream provides convenience methods for reading/writing certain data types. For example, it has a method to write/read a UTF String. If you were to do this yourself, you'd have to decide on how to determine the end of the String (i.e. with a terminator byte or by specifying the length of the string).
This is different from BufferedReader's readLine(), which, as the name suggests, only returns a single line. writeUTF()/readUTF() deal with Strings, and such a string can have as many lines in it as it wants.
A buffered stream (or reader) is suitable for most text-processing purposes. If you're doing something special, like trying to serialize the fields of a class to a file, you'd want to use DataInput/OutputStream as it offers greater control of the data at a binary level.
Hope that helps.
You can always use both:
final InputStream inputStream = ...;
final BufferedInputStream bufferedInputStream =
        new BufferedInputStream(inputStream);
final DataInputStream dataInputStream =
        new DataInputStream(bufferedInputStream);
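Continuing that chain, a hedged sketch of reading a hypothetical record (an int followed by a writeUTF string) through both wrappers; the file name and layout are invented for the example and must match whatever the writer produced:

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class ChainedReadSketch {
    public static void main(String[] args) throws IOException {
        try (DataInputStream dataInputStream = new DataInputStream(
                new BufferedInputStream(new FileInputStream("record.bin")))) {
            int id = dataInputStream.readInt();      // primitives come from DataInputStream
            String name = dataInputStream.readUTF(); // length-prefixed string
            System.out.println(id + " " + name);     // block reads come from BufferedInputStream
        }
    }
}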
InputStream: the base class for reading bytes from a stream (network or file); it provides the ability to read bytes from the stream and to detect the end of the stream.
DataInputStream: reads data directly as primitive data types.
BufferedInputStream: reads data from the underlying input stream and uses a buffer to optimize the speed of access to the data.
You should use DataInputStream when you need to interpret primitive types in a file written in a platform-independent manner, possibly by a language other than Java.
I would advocate using Jakarta Commons IO and the readLines() method (of whatever variety).
It'll look after buffering, closing, etc. and give you back a list of text lines. I'd happily roll my own input-stream wrapping with buffering and so on, but nine times out of ten the Commons IO stuff works fine and is sufficient, more concise and less error-prone.
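As a rough sketch, assuming Commons IO 2.x is on the classpath and UTF-8 is the encoding you want:

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.commons.io.FileUtils;

public class CommonsIoSketch {
    public static void main(String[] args) throws IOException {
        // FileUtils.readLines handles opening, buffering and closing internally
        // and hands back each line as a list entry - essentially the
        // list-of-lines the question asks for.
        List<String> lines = FileUtils.readLines(new File("text.txt"), StandardCharsets.UTF_8);
        lines.forEach(System.out::println);
    }
}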
The differences are:
DataInputStream works with binary data, while BufferedReader works with character data.
All primitive data types can be read directly with the corresponding methods of the DataInputStream class, while BufferedReader only reads string data, which then has to be parsed into the respective primitives (see the sketch after this list).
DataInputStream is one of the filtered streams, while BufferedReader is not.
DataInputStream generally consumes less memory, since it is a binary stream, whereas BufferedReader consumes more, since it is a character stream with its own buffer.
The data DataInputStream can handle is limited to primitive types and UTF strings, whereas BufferedReader can handle arbitrary character data.
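A small sketch of the second point above: the same int read back from a binary file versus parsed out of a text file. Both file names and their contents are hypothetical:

import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;

public class BinaryVsTextSketch {
    public static void main(String[] args) throws IOException {
        // Binary: the int is read directly as 4 bytes (the file must have been
        // written with DataOutputStream.writeInt()).
        try (DataInputStream binary = new DataInputStream(new FileInputStream("numbers.bin"))) {
            System.out.println(binary.readInt());
        }

        // Text: BufferedReader only hands back strings, so the caller parses them.
        try (BufferedReader text = new BufferedReader(new FileReader("numbers.txt"))) {
            System.out.println(Integer.parseInt(text.readLine().trim()));
        }
    }
}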