Java I/O streams; what are the differences? - java

java.io has many different I/O streams, (FileInputStream, FileOutputStream, FileReader, FileWriter, BufferedStreams... etc.) and I am confused in determining the differences between them. What are some examples where one stream type is preferred over another, and what are the real differences between them?

Streams: one byte at a time. Good for binary data.
Readers/Writers: one character at a time. Good for text data.
Anything "Buffered": many bytes/characters at a time. Good almost all the time.

When learning Java I made this mental scheme about java.io:
Streams
byte oriented stream (8 bit)
good for binary data such as a Java .class file
good for "machine-oriented" data
Readers/Writers
char (utf-16) oriented stream (16 bit)
good for text such as a Java source
good for "human-oriented" data
Buffered
always useful unless proven otherwise

This is a big topic! I would recommend that you begin by reading I/O Streams:
An I/O Stream represents an input
source or an output destination. A
stream can represent many different
kinds of sources and destinations,
including disk files, devices, other
programs, and memory arrays.
Streams support many different kinds
of data, including simple bytes,
primitive data types, localized
characters, and objects. Some streams
simply pass on data; others manipulate
and transform the data in useful ways.

Separate each name into words: each capital is a different word.
File Input Stream is to get Input from a File using a Stream.
File Output Stream is to write Output to a File using a Stream
And so on and so forth
As mmyers wrote :
Streams: one byte at a time.
Readers/Writers: one character at a time.
Buffered*: many bytes/characters at a time.

The specialisations you mention are specific types used to provide a standard interface to a variety of data sources. For example, a FileInputStream and an ObjectInputStream will both implement the InputStream interface, but will operate on Files and Objects respectively.

Java input and output is defined in terms of an abstract concept called a “stream”, which is a sequence of data.
There are 2 kinds of streams.
Byte streams (8 bit bytes) Æ Abstract classes are: InputStream and OutputStream
Character streams (16 bit UNICODE) Æ Abstract classes are: Reader and Writer
java.io.* classes use the decorator design pattern. The decorator design pattern attaches
responsibilities to objects at runtime. Decorators are more flexible than inheritance because the inheritance
attaches responsibility to classes at compile time. The java.io.* classes use the decorator pattern to construct
different combinations of behavior at runtime based on some basic classes.
from the book Java/J2EE Job Interview Companion By K.Arulkumaran & A.Sivayini

Byte streams are mostly and widely used stream type in java 1.0 for both character and for byte. After java 1.0 it was deprecated and character streams plays a important role. ie., for example
BufferedReader will get the character from the source, and its constructor looks like
BufferedReader(Reader inputReader)..
Here Reader is an abstract class and the once of its concrete classes are InputStreamReader, which will converts bytes into characters and take input from the keyboard(System.in)...
BufferedReader : Contains internal Buffer that will read characters from the stream. Internal counter keeps track of next character to be supplied to the buffer thru read(). InputStreamReader will takes input as bytes and converts internally into characters.

Related

Types of Streams based on Data Flow?

Long Story Short:
How many types of Streams are there based on Data Flow in the java.io package?
Are they Byte Streams and Character Streams or Binary Streams and Character Streams?
Full Question:
https://youtu.be/v1_ATyL4CNQ?t=20m5s
Skip to 20:05
After watching this tutorial yesterday, I was left with the impression that there are 2 Types of Streams, based on Data Flow: BinaryStreams and CharacterStreams. Today after learning more about the topic, my new findings seem to contradict the old ones.
Most of the people on the internet divide the Streams into 2 types Byte Streams and Character Streams. However when searching oracle docs I found information about Binary Streams too, then I found information about Int, Double, Long Streams from the java.util.stream package.
I am sorry if I am asking a silly question, but I am really confused right now.
The naming is really confusing
The I/O streams (Byte Streams and Character Streams) are data streams are
completely unrelated to the new Stream API (java.util.stream). So we are talking here about two differents things with the same name "stream".
IO Streams : used on Input/output operations reading the data from a resource (input), or writing the data to a resource (output).
java.util.stream :include Stream,IntStream,DoubleStream... Stream definition by Oracle documentation is a sequence of elements supporting sequential and parallel aggregate operations.

Both Reader and Stream give the same result , what is the difference? [duplicate]

Today I got this question for which I think I answered very bad. I said stream is a data that flows and reader is a technique where we read from that is a static data. I know this is an awful answer, so please provide me the crisp difference and definitions between these two with example in Java.
Thanks.
An InputStream is byte-oriented. A Reader is character-oriented.
The javadocs are your friend, explaining the difference. Reader, InputStream
As others have said, the use cases for each are slightly different (even though they often can be used interchangeably)
Since readers are for reading characters, they are better when you are dealing with input that is of a textual nature (or data represented as characters). I say better because Readers (in the context of typical usage) are essentially streams with methods that easily facilitate reading character input.
Stream is for reading bytes, Reader is for reading characters. One character may take one byte or more, depending on character set.
Stream classes are byte-oriented classes, that mean all InputStream classes (Buffered and non-buffered) read data byte by byte from stream and all OutputStream(Buffered and non-buffered) classes writes data byte by byte to the stream. Stream classes are useful when you have small data or if you are dealing with binary files like images.
On the other handReader/Writer are character based classes. These classes read or write one character at time from or into stream. These classes extends either java.io.Reader (all character input classes) or java.io.Writer (all character output classes). These classes are useful if you are dealing with text file or other textual stream. These classes are also Buffered and Non-Buffered.

Reading different encoding from the same InputStream [duplicate]

I'm working through the problems in Programming Pearls, 2nd edition, Column 1. One of the problems involves writing a program that uses only around 1 megabyte of memory to store the contents of a file as a bit array with each bit representing whether or not a 7 digit number is present in the file. Since Java is the language I'm the most familiar with, I've decided to use it even though the author seems to have had C and C++ in mind.
Since I'm pretending memory is limited for the purpose of the problem I'm working on, I'd like to make sure the process of reading the file has no buffering at all.
I thought InputStreamReader would be a good solution, until I read this in the Java documentation:
To enable the efficient conversion of bytes to characters, more bytes may be read ahead from the underlying stream than are necessary to satisfy the current read operation.
Ideally, only the bytes that are necessary would be read from the stream -- in other words, I don't want any buffering.
One of the problems involves writing a program that uses only around 1 megabyte of memory to store the contents of a file as a bit array with each bit representing whether or not a 7 digit number is present in the file.
This implies that you need to read the file as bytes (not characters).
Assuming that you do have a genuine requirement to read from a file without buffering, then you should use the FileInputStream class. It does no buffering. It reads (or attempts to read) precisely the number of bytes that you asked for.
If you then need to convert those bytes to characters, you could do this by applying the appropriate String constructor to a byte or byte[]. Note that for multibyte character encodings such as UTF-8, you would need to read sufficient bytes to complete each character. Doing that without the possibility of read-ahead is a bit tricky ... and entails "knowledge* of the character encoding you are reading.
(You could avoid that knowledge by using a CharsetDecoder directly. But then you'd need to use the decode method that operates on Buffer objects, and that is a bit complicated too.)
For what it is worth, Java makes a clear distinction between stream-of-byte and stream-of-character I/O. The former is supported by InputStream and OutputStream, and the latter by Reader and Write. The InputStreamReader class is a Reader, that adapts an InputStream. You should not be considering using it for an application that wants to read stuff byte-wise.

Why to avoid using ByteStream much in Java

We shouldn't use byte Stream as Sun Doc says -
actually it represents a kind of low-level I/O that you should avoid.
What is actually low-level I/O and what is exact problem using byte stream.
So the Java docs say:
CopyBytes seems like a normal program, but it actually represents a
kind of low-level I/O that you should avoid. Since xanadu.txt contains
character data, the best approach is to use character streams, as
discussed in the next section. There are also streams for more
complicated data types. Byte streams should only be used for the most
primitive I/O.
The byte streams give you access to the file as it is. Just the bytes. No interpration of any kind. That means no character set conversion, no handling of ints or floats in binary or ascii representation, no dealing with byte orders, or any of that. The higher level streams provide some of these.
Of course a program that copies a file is actually a pretty good example of something that needs a raw byte stream, because it doesn't need or want to do any kind of intepretation of the data; it just wants to copy it verbatim.
So what the really mean is, use byte streams if you think you need them, but be sure you know what you are doing :)
The suggestion is in the context of reading a text file that is discussed in the tutorial. For that purpose it is better to use character streams to handle character set translation properly:
The Java platform stores character values using Unicode conventions.
Character stream I/O automatically translates this internal format to
and from the local character set.
A program that uses character streams in place of byte streams
automatically adapts to the local character set and is ready for
internationalization — all without extra effort by the programmer.

Should I use DataInputStream or BufferedInputStream

I want to read each line from a text file and store them in an ArrayList (each line being one entry in the ArrayList).
So far I understand that a BufferedInputStream writes to the buffer and only does another read once the buffer is empty which minimises or at least reduces the amount of operating system operations.
Am I correct - do I make sense?
If the above is the case in what situations would anyone want to use DataInputStream. And finally which of the two should I be using and why - or does it not matter.
Use a normal InputStream (e.g. FileInputStream) wrapped in an InputStreamReader and then wrapped in a BufferedReader - then call readLine on the BufferedReader.
DataInputStream is good for reading primitives, length-prefixed strings etc.
The two classes are not mutually exclusive - you can use both of them if your needs suit.
As you picked up, BufferedInputStream is about reading in blocks of data rather than a single byte at a time. It also provides the convenience method of readLine(). However, it's also used for peeking at data further in the stream then rolling back to a previous part of the stream if required (see the mark() and reset() methods).
DataInputStream/DataOutputStream provides convenience methods for reading/writing certain data types. For example, it has a method to write/read a UTF String. If you were to do this yourself, you'd have to decide on how to determine the end of the String (i.e. with a terminator byte or by specifying the length of the string).
This is different from BufferedInputStream's readLine() which, as the method sounds like, only returns a single line. writeUTF()/readUTF() deal with Strings - that string can have as many lines it it as it wants.
BufferedInputStream is suitable for most text processing purposes. If you're doing something special like trying to serialize the fields of a class to a file, you'd want to use DataInput/OutputStream as it offers greater control of the data at a binary level.
Hope that helps.
You can always use both:
final InputStream inputStream = ...;
final BufferedInputStream bufferedInputStream =
new BufferedInputStream(inputStream);
final DataInputStream dataInputStream =
new DataInputStream(bufferedInputStream);
InputStream: Base class to read byte from stream (network or file ), provide ability to read byte from the stream and delete the end of the stream.
DataInputStream: To read data directly as a primitive datatype.
BufferInputStream: Read data from the input stream and use buffer to optimize the speed to access the data.
You shoud use DataInputStream in cases when you need to interpret the primitive types in a file written by a language other Java in platform-independent manner.
I would advocate using Jakarta Commons IO and the readlines() method (of whatever variety).
It'll look after buffering/closing etc. and give you back a list of text lines. I'll happily roll my own input stream wrapping with buffering etc., but nine times out of ten the Commons IO stuff works fine and is sufficient/more concise/less error prone etc.
The differences are:
The DataInputStream works with the binary data, while the BufferedReader work with character data.
All primitive data types can be handled by using the corresponding methods in DataInputStream class, while only string data can be read from BufferedReader class and they need to be parsed into the respective primitives.
DataInputStream is a part of filtered streams, while BufferedReader is not.
DataInputStream consumes less amount of memory space being it is a binary stream, whereas BufferedReader consumes more memory space being it is character stream.
The data to be handled is limited in DataInputStream, whereas the number of characters to be handled has wide scope in BufferedReader.

Categories