Performance of DataInputStream\DataOutputStream

I am currently using buffered streams to read and write some files. In between, I do some mathematical processing where a symbol is a byte.
To read:
InputStream input = new FileInputStream(outputname);
input.read(b, off, len); // byte[] b, int off, int len
To write:
OutputStream output = new BufferedOutputStream(
new FileOutputStream(outputname),
OUTPUTBUFFERSIZE
);
output.write((byte)byteinsideaint);
Now I need to add some header data and support short symbols too. I want to use DataInputStream and DataOutputStream to avoid converting other types to bytes myself, and I am wondering what their performance is like.
Do I need to use
OutputStream output = new DataOutputStream(
new BufferedOutputStream(
new FileOutputStream(outputname),
OUTPUTBUFFERSIZE
)
);
to keep the advantages of the data buffering, or is it good enough to use
OutputStream output = new DataOutputStream(
new FileOutputStream(outputname)
);

You should add a BufferedOutputStream in between. DataOutputStream does not implement any buffering itself (which is good: separation of concerns), so its performance will be very poor unless the underlying OutputStream is buffered. Even the simplest methods like writeInt() can result in four separate disk writes.
As far as I can see, only write(byte[], int, int) and writeUTF(String) write their data in one byte[] chunk. The others write primitive values (like int or double) byte by byte.
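As a rough sketch of the layering recommended above, the following writes a small header and some short symbols through a DataOutputStream that sits on top of a BufferedOutputStream, then reads them back the same way. The class name, the MAGIC value, the header layout and the buffer size are assumptions made for illustration; they do not come from the question.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;

class BufferedDataStreams {
    // Hypothetical values, chosen only for this illustration.
    static final int MAGIC = 0x53594D42;
    static final int BUFFER_SIZE = 8192;

    // Writes a small header (magic number, symbol count) followed by the symbols.
    static void writeSymbols(File file, short[] symbols) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file), BUFFER_SIZE))) {
            out.writeInt(MAGIC);
            out.writeInt(symbols.length);
            for (short symbol : symbols) {
                out.writeShort(symbol);
            }
        } // closing the outermost stream flushes the buffer
    }

    // Reads the header back, validates it, then reads the symbols.
    static short[] readSymbols(File file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file), BUFFER_SIZE))) {
            if (in.readInt() != MAGIC) {
                throw new IOException("Unexpected header");
            }
            short[] symbols = new short[in.readInt()];
            for (int i = 0; i < symbols.length; i++) {
                symbols[i] = in.readShort();
            }
            return symbols;
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("symbols", ".bin");
        try {
            writeSymbols(file, new short[] {1, -2, 300});
            System.out.println(Arrays.toString(readSymbols(file)));
        } finally {
            file.delete();
        }
    }
}
```

Each writeInt() or writeShort() call lands in the in-memory buffer first, so the file is only touched when the buffer fills up or the stream is closed.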

You absolutely need the BufferedOutputStream in the middle.
I appreciate your concern about the performance, and I have 2 suggestions:
Shrink your streams with Java compression (the java.util.zip stream classes, for example).
Use composition instead of inheritance (which is recommended practice anyway). Create a Pipe class that contains a PipedInputStream and a PipedOutputStream connected to each other, with getInputStream() and getOutputStream() methods. You can't pass the Pipe object directly to something that needs a stream, but you can pass the return value of its get methods instead.

Related

What is the resource efficient way to generate inputstream?

I am new to Java IO. Currently, I have these lines of code, which generate an input stream from a string.
StringBuilder sb = new StringBuilder();
for(...){
sb.append(...);
}
String finalString = sb.toString();
byte[] objectBytes = finalString.getBytes(StandardCharsets.UTF_8);
InputStream inputStream = new ByteArrayInputStream(objectBytes);
Maybe I am misunderstanding something, but is there a better way to generate an InputStream from a String other than using getBytes()?
For instance, if the String is really large, say 50MB, and there is no room to create another copy of it (getBytes() needs another 50MB) due to resource constraints, it could potentially throw an out of memory error.
I just wanted to know whether the above lines of code are an efficient way to generate an InputStream from a String. For instance, is there a way to "stream" a String into an input stream without using additional memory? Something like a Reader-like abstraction on top of String?

I think what you're looking for is a StringReader which is defined as:
A character stream whose source is a string.
Note that StringReader is a Reader, so it hands out chars rather than bytes, and it reads straight from the existing String instead of copying it into a byte[]. Access is sequential, with skip(), mark() and reset() available for repositioning, so you can read the entire String char by char, or in chunks, if you prefer.
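A minimal sketch of reading a String through a StringReader in fixed-size chunks, so that only a small char[] is allocated on top of the String itself. The class and method names are hypothetical, and the "processing" here is just counting characters.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class StringReaderDemo {
    // Reads everything from the reader in chunks and returns the number of chars seen.
    static long countChars(Reader reader, int chunkSize) throws IOException {
        char[] buffer = new char[chunkSize];
        long total = 0;
        int n;
        while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
            total += n; // a real consumer would process buffer[0..n) here
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        try (Reader reader = new StringReader("some potentially very large string")) {
            System.out.println(countChars(reader, 8));
        }
    }
}
```

This only helps if the consumer can work with a Reader. If the consumer insists on an InputStream, the chars still have to be encoded to bytes somewhere.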

You are producing data (writing) and you want to consume that data (reading) almost immediately.
The Unix technique is to pipe the output of one process to the input of another process. In Java one also needs at least two threads, which synchronize on producing and consuming.
PipedInputStream in = new PipedInputStream();
PipedOutputStream out = new PipedOutputStream(in);
new Thread(() -> writeAllYouveGot(out)).start();
readAllYouveGot(in);
Here I started a Thread for writing with a Runnable that calls some self-defined method on out. Instead of using new Thread you might prefer an ExecutorService.
Piped I/O is seldom used, even though its asynchronous behaviour is a good fit here. One can even set the pipe's size on the PipedInputStream. The reason for the rare usage is the need for a second thread.
To complete things, one would probably wrap the binary Input/OutputStreams in new InputStreamReader(in, "UTF-8") and new OutputStreamWriter(out, "UTF-8").
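Putting the pieces of this answer together, a sketch could look as follows. The class name, the roundTrip method and the 4096-byte pipe size are assumptions for illustration; the producer simply writes one string, where real code would generate its data piece by piece.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class PipeDemo {
    // Sends text through a pipe: a producer thread writes, the calling thread reads.
    static String roundTrip(String text) throws IOException, InterruptedException {
        PipedInputStream in = new PipedInputStream(4096); // pipe size is an arbitrary choice
        PipedOutputStream out = new PipedOutputStream(in);

        Thread producer = new Thread(() -> {
            // Closing the writer closes the pipe, which signals end of stream to the reader.
            try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
                writer.write(text);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.start();

        StringBuilder result = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                result.append(buffer, 0, n);
            }
        }
        producer.join();
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello through the pipe"));
    }
}
```

The pipe holds at most its configured number of bytes at a time, so the producer blocks whenever the consumer falls behind. That is what keeps memory use bounded.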

Try something like this (no promises about typos:)
BufferedReader reader = new BufferedReader(new InputStreamReader(yourInputStream, Charset.defaultCharset()));
final char[] buffer = new char[8000];
int charsRead;
while (true) {
    charsRead = reader.read(buffer, 0, buffer.length);
    if (charsRead == -1) {
        break;
    }
    // Do something with the first charsRead chars of buffer
}
The InputStreamReader converts from byte to char, using the Charset. BufferedReader allows you to read blocks of char.
For really large input streams, you may want to process the input in chunks, rather than reading the entire stream into memory and then processing.

Difference between methods to read a byte from TCP server?

I'm trying to read information sent by a client on Android using the TCP protocol. In my server I have this code:
InputStream input = clienteSocket.getInputStream();
int c = input.read();
c will contain the ASCII number that the client sent.
I can also get this by writing:
BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
I would like to know what is the difference between both methods.

You're comparing apples and oranges here.
Your first example reads one byte from the stream, unbuffered, and returns the value of that byte. (Adding 'ASCII number' to that adds no actual information.)
Your second example sets up a buffered reader, which can read chars from the stream, buffered, but it doesn't actually read anything.
You could set up two further examples:
InputStream is = new BufferedInputStream(socket.getInputStream());
int c = is.read();
This reads a byte, with buffering.
Reader reader = new InputStreamReader(socket.getInputStream());
int c = reader.read();
This reads a char, with a little buffering: not as much as BufferedReader provides.
The realistic choices are between the two buffered versions, for efficiency reasons as outlined by @StephenC, and the choice between them is dictated by whether you want bytes or chars.

The buffered approach is better because it (in most cases) reduces the number of syscalls that the JVM needs to make to the operating system. Since syscalls are relatively expensive, buffering generally gives you better performance.
In your specific example:
Each time you call read() on an unbuffered input stream, you make a syscall.
The first time you call read() (or another read operation) on a buffered input stream, it reads a number of bytes into an in-memory byte array. On the second, third, etc. calls to read(), the read will typically return a byte out of the in-memory buffer, without making a syscall.
In your example, the only case where using a buffered stream doesn't help would be if you are going to read only one byte from the socket, and then close it.
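The difference described above can be made visible without a real socket. In the sketch below, a hypothetical CountingInputStream stands in for the socket stream and counts how often the underlying source is asked for data; each such call would correspond to a syscall on a real socket or file. The class names and the 8192-byte buffer size are assumptions for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadCallCounter {
    // Stand-in for a socket stream: counts the read requests that reach it.
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Reads dataSize bytes one at a time and reports how many requests hit the source.
    static int underlyingCalls(int dataSize, boolean buffered) throws IOException {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[dataSize]));
        InputStream in = buffered ? new BufferedInputStream(source, 8192) : source;
        while (in.read() != -1) {
            // consume byte by byte, as in the question
        }
        return source.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered: " + underlyingCalls(10_000, false));
        System.out.println("buffered:   " + underlyingCalls(10_000, true));
    }
}
```

Unbuffered, every read() reaches the source. With the buffer in place, the source only sees a handful of bulk reads.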
UPDATE
I didn't notice that you were comparing an unbuffered InputStream with a buffered >> Reader <<. As @EJP points out, this is "comparing apples and oranges". The functionality of the two versions is different. One reads bytes and the other reads characters.
(And if you don't understand that distinction ... and why it is an important distinction ... you would be advised to read the Java Tutorial lesson on Basic I/O. Particularly the sections on byte streams, character streams and buffered streams.)

How do buffers work in java IO?

I'm having trouble understanding how buffers work in Java IO.
Excuse me if I don't express myself as clearly as I would like; I'm not
strong on all these concepts.
As I understand it, in Java there are readers/writers for chars (meaning the
possibility of more than one byte per char, plus encoding options) and streams
for bytes.
Then there are some classes that use buffers.
I believe that a buffer is used mainly so that we can avoid unnecessary system
calls that would involve expensive operations, like accessing a slower device, by
storing more in memory and making each system call useful for more data.
The issue I have is that there seem to be buffering classes for both readers/writers and streams.
I would like to know if buffering characters is enough, or if, by the time those
bytes get to the streaming class, they would be flushed on for example newlines,
as some classes state.
The best I've found about this is this post
How do these different code snippets compare in regard to buffering?
Does autoflush thwart the intent of buffering?
Should there be only one buffer in play, and if so, where (writer or stream)
and why?:
// resolveSocket is some socket
PrintWriter out = new PrintWriter(
resolveSocket.getOutputStream(),
true);
// vs
PrintWriter out = new PrintWriter(
new OutputStreamWriter(
new BufferedOutputStream(resolveSocket.getOutputStream()),
StandardCharsets.UTF_8),
true);
My interest is first and foremost to understand buffering better, and practical only after that.
Thank you all in advance.
EDIT: Found this other stack overflow question interesting and related.
Also this other one talks about BufferedOutputStream.

It may help you to understand the difference between a writer and a stream. An OutputStream is a binary sink. A Writer works with characters rather than bytes. The OutputStreamWriter applies a character encoding, which lets you send character data and have it correctly translated to binary for consumption by an OutputStream. What you probably want is:
PrintWriter out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(resolveSocket.getOutputStream())));
This will allow you to use your PrintWriter to output chars, have it buffered by the BufferedWriter, and then translated by the OutputStreamWriter for consumption by the socket output stream.
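To see where the data sits in such a chain, here is a small sketch that swaps the socket for a ByteArrayOutputStream so the bytes that actually reach the sink can be counted. The class and method names are hypothetical, and the explicit UTF-8 charset is an addition for the example; the one-liner above relies on the platform default.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

class WriterBufferingDemo {
    // Returns how many bytes reached the sink before and after an explicit flush.
    static int[] sizesBeforeAndAfterFlush(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the socket stream
        PrintWriter out = new PrintWriter(
                new BufferedWriter(new OutputStreamWriter(sink, StandardCharsets.UTF_8)));

        out.print(text);          // buffered as chars, nothing is encoded or sent yet
        int before = sink.size();

        out.flush();              // pushes the chars through the encoder into the sink
        int after = sink.size();

        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] sizes = sizesBeforeAndAfterFlush("hello\n");
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```

Because this PrintWriter is created without autoflush, a request/response protocol over a socket has to call flush() explicitly after each message, otherwise the peer may wait for data that is still sitting in the buffer.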

Java: Possible to have more than one type of stream?

Wondering if one can do something like this successfully:
Socket s = new Socket("", 1234);
BufferedInputStream in = new BufferedInputStream(s.getInputStream());
BufferedOutputStream out = new BufferedOutputStream(s.getOutputStream());
ObjectInputStream oin = new ObjectInputStream(s.getInputStream());
ObjectOutputStream oout = new ObjectOutputStream(s.getOutputStream());
Or if there's perhaps a better way of doing it. I ask because I want to send raw data over the Buffered I/O streams and use the Object streams as a means of communicating details and establishing a protocol for connection for my program. Right now I'm trying to just use the Buffered streams and use byte arrays for my client/server protocol but I've hit a hiccup where the byte array I receive is not equal to what I expect it to be, so the == operator and .equals() method do not work for me.

You can't use a mix of streams because they are both buffered so you will get corruption and confusion.
Just use the ObjectStreams for everything.
In general, you should only read from or write to one Stream, Reader or Writer for any given underlying stream.
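A sketch of the "object streams for everything" approach: ObjectOutputStream also offers the DataOutput methods, so one stream can carry both a protocol message and a raw byte payload. A ByteArrayOutputStream stands in for the socket here, and the class name, the command string and the message layout (command, length, payload) are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

class SingleObjectStreamDemo {
    // Sends a command object followed by raw bytes over one stream and reads both back.
    static byte[] roundTrip(String command, byte[] payload)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for the socket
        try (ObjectOutputStream oout = new ObjectOutputStream(wire)) {
            oout.writeObject(command);      // protocol detail as an object
            oout.writeInt(payload.length);  // length prefix for the raw data
            oout.write(payload);            // raw bytes on the same stream
        }

        try (ObjectInputStream oin =
                new ObjectInputStream(new ByteArrayInputStream(wire.toByteArray()))) {
            String received = (String) oin.readObject();
            byte[] data = new byte[oin.readInt()];
            oin.readFully(data);            // blocks until the whole payload has arrived
            if (!received.equals(command)) {
                throw new IOException("Unexpected command: " + received);
            }
            return data;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] sent = {1, 2, 3};
        byte[] back = roundTrip("PUT", sent);
        // Arrays compare by identity with == and equals(); Arrays.equals compares contents.
        System.out.println(Arrays.equals(sent, back));
    }
}
```

The length prefix plus readFully() also avoids the common pitfall where a single read(byte[]) returns fewer bytes than were sent.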

Go take a look at How can I read different groups of data on the same InputStream, using different types of InputStreams for each of them? and see if my answer over there helps. It involves tagging the data in an ObjectStream in order to know if it's text or an object.

Should I use DataInputStream or BufferedInputStream

I want to read each line from a text file and store them in an ArrayList (each line being one entry in the ArrayList).
So far I understand that a BufferedInputStream fills a buffer and only does another read once the buffer is empty, which minimises or at least reduces the number of operating system operations.
Am I correct? Do I make sense?
If the above is the case, in what situations would anyone want to use DataInputStream? And finally, which of the two should I be using and why, or does it not matter?

Use a normal InputStream (e.g. FileInputStream) wrapped in an InputStreamReader and then wrapped in a BufferedReader - then call readLine on the BufferedReader.
DataInputStream is good for reading primitives, length-prefixed strings etc.
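The chain described above could be sketched like this. The class and method names are hypothetical, and UTF-8 is an assumed encoding; use whatever charset the file was actually written in.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineReaderDemo {
    // Reads a text file line by line into a list, one entry per line.
    static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) { // null signals end of file
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            readLines(new File(args[0])).forEach(System.out::println);
        }
    }
}
```

readLine() strips the line terminator, so each list entry holds just the text of the line.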

The two classes are not mutually exclusive - you can use both of them if your needs suit.
As you picked up, BufferedInputStream is about reading in blocks of data rather than a single byte at a time. It can also be used for peeking at data further along in the stream and then rolling back to a previous part of the stream if required (see the mark() and reset() methods). Note that the convenience method readLine() belongs to BufferedReader, not to BufferedInputStream.
DataInputStream/DataOutputStream provide convenience methods for reading/writing certain data types. For example, they have methods to write/read a UTF string. If you were to do this yourself, you'd have to decide how to determine the end of the string (i.e. with a terminator byte or by specifying the length of the string).
This is different from BufferedReader's readLine() which, as the method name suggests, only returns a single line. writeUTF()/readUTF() deal with whole Strings, and such a string can have as many lines in it as it wants.
BufferedReader is suitable for most text processing purposes. If you're doing something special like trying to serialize the fields of a class to a file, you'd want to use DataInputStream/DataOutputStream, as they offer greater control of the data at a binary level.
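To illustrate the writeUTF()/readUTF() point, here is a small sketch that round-trips a multi-line string through an in-memory byte array. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class UtfRoundTrip {
    // Encodes the string with writeUTF() (length prefix + modified UTF-8 bytes).
    static byte[] encode(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(text);
        }
        return bytes.toByteArray();
    }

    // Decodes a string previously written with writeUTF().
    static String decode(byte[] encoded) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "line one\nline two";
        byte[] encoded = encode(text);
        // The embedded newline survives, because the length prefix marks the end of the string.
        System.out.println(decode(encoded).equals(text));
    }
}
```

The length prefix is two bytes, which is why writeUTF() cannot handle strings whose encoded form exceeds 65535 bytes.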
Hope that helps.

You can always use both:
final InputStream inputStream = ...;
final BufferedInputStream bufferedInputStream =
new BufferedInputStream(inputStream);
final DataInputStream dataInputStream =
new DataInputStream(bufferedInputStream);

InputStream: Base class for reading bytes from a stream (network or file); provides the ability to read bytes from the stream and detect the end of the stream.
DataInputStream: Reads data directly as primitive data types.
BufferedInputStream: Reads data from the input stream and uses a buffer to speed up access to the data.

You should use DataInputStream when you need to interpret, in a platform-independent manner, the primitive types in a file written by a language other than Java.

I would advocate using Apache Commons IO (formerly Jakarta Commons IO) and the readLines() method (of whatever variety).
It'll look after buffering/closing etc. and give you back a list of text lines. I'll happily roll my own input stream wrapping with buffering etc., but nine times out of ten the Commons IO stuff works fine and is sufficient/more concise/less error prone etc.

The differences are:
DataInputStream works with binary data, while BufferedReader works with character data.
All primitive data types can be handled using the corresponding methods of the DataInputStream class, while only string data can be read from the BufferedReader class, and it needs to be parsed into the respective primitives.
DataInputStream is one of the filter streams (it extends FilterInputStream), while BufferedReader is not.
DataInputStream consumes less memory because it is a binary stream, whereas BufferedReader consumes more memory because it is a character stream.
The data to be handled is limited in DataInputStream, whereas the number of characters to be handled has wide scope in BufferedReader.
