What is the difference between OutputStream and Writer? - java

Can someone explain the difference between OutputStream and Writer? Which of these classes should I use?

Streams work at the byte level: they read (InputStream) and write (OutputStream) single bytes or arrays of bytes.
Reader/Writers add the concept of character on top of a stream. Since a character can only be translated to bytes by using an Encoding, readers and writers have an encoding component (that may be set automatically since Java has a default encoding property). The characters read (Reader) or written (Writer) are automatically converted to bytes by the encoding and sent to the stream.
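To make the role of the encoding concrete, here is a minimal sketch (the class and method names are made up for illustration). It pushes the same text through an OutputStreamWriter with different charsets and looks at how many bytes arrive in the underlying stream:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Writes text through a Writer and returns the bytes that reached the stream.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // four characters, the last one is e-acute
        System.out.println(encode(text, StandardCharsets.UTF_8).length);      // 5
        System.out.println(encode(text, StandardCharsets.ISO_8859_1).length); // 4
        System.out.println(encode(text, StandardCharsets.UTF_16BE).length);   // 8
    }
}
```

The characters are the same in all three cases; only the bytes that end up in the stream differ.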

OutputStream classes write to the target byte by byte, whereas Writer classes write to the target character by character.

An OutputStream is a stream that can write information. This is fairly general, so there are specialized OutputStreams for special purposes like writing to files. A stream can only write bytes or arrays of bytes.
Writers provide more flexibility in that they can write characters and even strings while taking a special encoding into account.
Which one to take is really a matter of what you want to write. If you do have bytes already, you can use the stream directly. If you have characters or strings, you either need to convert them to bytes yourself if you want to write them to a stream, or you need to use a Writer which does that job for you.

OutputStream uses bare bytes, whereas Writer uses encoded characters.

The Reader/Writer class hierarchy is character-oriented, and the InputStream/OutputStream class hierarchy is byte-oriented.
Basically there are two types of streams: byte streams, which handle streams of bytes, and character streams, which handle streams of characters. In the byte stream hierarchy, InputStream and OutputStream are the abstract classes at the top; in the character stream hierarchy, Reader and Writer are.
Cheers!!!

Related

Does a byte stream encode bytes to characters, or does it only operate on bytes?

We have byte streams and character streams. If you read some examples on the internet, you will find that a byte stream only operates on bytes and nothing more.
I once read that both kinds of stream encode bytes to characters depending on the encoding, e.g. UTF-8 for a byte stream and UTF-16 for a character stream. So both of them encode bytes to characters; if this is true, why is it written everywhere that a byte stream operates on bytes only? Can a byte stream read data other than bytes and then just convert it to bytes?
And then why do we need an encoding in a byte stream?
Some popular websites did not help me.
I once read that both kinds of stream encode bytes to characters depending on the encoding, e.g. UTF-8 for a byte stream and UTF-16 for a character stream. So both of them encode bytes to characters; if this is true, why is it written everywhere that a byte stream operates on bytes only? Can a byte stream read data other than bytes and then just convert it to bytes?
Everything in a typical modern computer has to be represented in bytes: a file holds a sequence of bytes, a network connection lets you send a sequence of bytes, a pointer identifies the location of a byte in memory, and so on. So a byte stream — an InputStream or OutputStream or the like — provides basic processing to let you read or write a sequence of bytes, no matter what kind of data is being represented by those bytes. The data might be text encoded as UTF-8 or UTF-16 or some other encoding, or it might be an image in a GIF or PNG or JPEG or other format, or it might be audio data or video data or a PDF or a Word document or . . . well, you get the idea.
A character stream — a Reader or Writer — provides a higher level of processing specifically for text data, so that you don't need to worry about the specific bytes being used to represent the characters, you just need to worry about the characters themselves. You just need to tell the character stream which character encoding to use (or let it use an appropriate default), and it can handle the rest from there.
But there's one big complication: Java didn't introduce this distinction until version 1.1, and because Java aims for a very high degree of backward-compatibility, there are some classes that survive from version 1.0 that kind of straddle the line. In particular, there is a PrintStream class that extends OutputStream and adds special 'print' methods that take more convenient types, such as String, and handle the character encoding internally. That PrintStream class has been there since version 1.0, and is still in wide use, especially because System.out and System.err are instances of it. (In theory, we should be using PrintWriter instead.)
And then why do we need an encoding in a byte stream?
We need a character encoding in whatever layer is converting between character sequences and byte sequences. Normally that layer is separate from the byte stream, but as I mentioned above, there are some holdovers from version 1.0 that handle the conversion themselves, which means they need to know which encoding to use.
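As a rough illustration of such a holdover, the sketch below (class and method names are hypothetical) prints a string through a PrintStream constructed with an explicit charset name and returns the bytes it produced. Although PrintStream is an OutputStream, it has to encode the String itself:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class PrintStreamEncodingDemo {
    // Prints text through a PrintStream that encodes with the named charset.
    static byte[] printWith(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, charsetName)) {
            out.print(text); // the String is encoded inside the byte stream class
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // a single character, e-acute
        System.out.println(printWith(text, "UTF-8").length);      // 2
        System.out.println(printWith(text, "ISO-8859-1").length); // 1
    }
}
```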
It is a fundamentally quite straightforward system, but due to some required existing knowledge and possible interactions of several parts it can be confusing.
Let's put down some fundamental truths/axioms:
An InputStream is fundamentally about reading bytes from somewhere.
An OutputStream is fundamentally about writing bytes to somewhere.
Reader/Writer are the equivalent of those two for chars/String/text.
In the Java world, as long as you handle only String (or its related types like StringBuilder, ...) you don't need to care about encoding. It will always look like UTF-16, but you might as well pretend no encoding happens.
If you only ever handle byte[] (and related types like ByteBuffer) then you also don't need to care about encoding.
The encoding only ever comes into play when you want to cross over from the byte[] world to the String world (or the other way around).
So some Writer classes like OutputStreamWriter take a Charset to construct. And that's precisely because it's one of those borders that I mention in the last point above: It's handling both String and byte[] (indirectly), because it is a Writer that writes to an OutputStream and for that to work it will need to convert the String that gets written to it into a byte[] that it can forward to the OutputStream.
Other Writers (such as StringWriter) don't transfer data between those two worlds: they take in String and produce String, so no conversion is necessary.
On the other side a ByteArrayInputStream is an InputStream that reads from a byte[], so again: both the input and the output live in "the same world", so no conversion is necessary and thus no Charset parameter exists.
tl;dr the "purity" of InputStream/OutputStream/Reader/Writer exists as long as you look only at those interfaces. When you look at specific implementations some of those will need to convert from the text world to the binary world (or vice versa) and those implementations will need to handle both worlds.
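A small sketch of the "same world" cases described above (the class and method names are invented for the example). Neither constructor takes a Charset, because nothing crosses the border between text and bytes:

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;

public class SameWorldDemo {
    // Text in, text out: StringWriter never needs to know about an encoding.
    static String textWorld(String text) {
        StringWriter writer = new StringWriter();
        writer.write(text);
        return writer.toString();
    }

    // Bytes in, bytes out: read() returns the first byte as an int in 0..255.
    static int byteWorld(byte[] data) {
        return new ByteArrayInputStream(data).read();
    }

    public static void main(String[] args) {
        System.out.println(textWorld("caf\u00e9").length());          // 4
        System.out.println(byteWorld(new byte[] { (byte) 0xE9 }));   // 233
    }
}
```

A class like OutputStreamWriter, by contrast, sits on the border and so does accept a Charset.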

Are `InputStream` and `Reader` essentially the same, and are `OutputStream` and `Writer` essentially the same?

In Java, InputStream and OutputStream deal with byte[], and Reader and Writer with char[].
Do their input or output byte[] and char[] essentially have the same values? (That is my impression, because a char and a byte in IO have the same value)
In other words, are InputStream and Reader essentially the same, and are OutputStream and Writer essentially the same?
They're not essentially the same, but they do the same sorts of things for different kinds of data.
InputStream and OutputStream work in bytes. You'd use them when dealing with non-textual information (such as an image).
Reader and Writer work in characters. You'd use them when dealing with textual information.
So "yes" and "no". :-) InputStream and Reader are both for reading information (a stream of bytes or a stream of characters, respectively), and OutputStream and Writer are both for writing information (a stream of bytes or a stream of characters, respectively). Which you use depends on what kind of data you're dealing with. The streams are byte-oriented. The readers/writers are character-oriented.
There are bridging classes between the two kinds of data:
InputStreamReader reads from an InputStream and converts bytes to characters using a Charset (one provided explicitly or by name).
OutputStreamWriter does the converse: Converts characters to bytes (again via a Charset) and writes the bytes to an OutputStream.
...but most Reader/Writer subclasses read from/write to sources/destinations that are already character-based, and so don't deal with bytes at all. For instance, StringReader reads characters from a string. Since the source (the string) is already character-based, the Reader doesn't ever deal with bytes, just characters.
Yes, you have the right idea. Standard classes InputStreamReader and OutputStreamWriter act as adapters from the byte stream interfaces to the character stream interfaces, requiring only that a Charset (typically UTF-8) is specified. That Charset will be used to convert the incoming bytes into Java's UTF-16 character type, so notably it is not true that the actual bytes read from an InputStream and Reader are always the same.
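To see that the values really can differ, here is a minimal sketch (hypothetical class and method names) that decodes the same two bytes with an InputStreamReader under two different charsets:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decodes the given bytes into a String, one char at a time.
    static String decode(byte[] data, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0xC3, (byte) 0xA9 }; // UTF-8 encoding of e-acute
        System.out.println(decode(data, StandardCharsets.UTF_8).length());      // 1
        System.out.println(decode(data, StandardCharsets.ISO_8859_1).length()); // 2
    }
}
```

Two bytes come out of the InputStream either way, but the Reader yields one char under UTF-8 and two under ISO-8859-1.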
InputStream is typically used for reading data of any type, while Reader is only appropriate for reading text data.

When should I use InputStreamReader and OutputStreamWriter?

From the Java Tutorial site, we know InputStreamReader and OutputStreamWriter can convert streams between bytes and characters.
InputStreamReader converts bytes read from input to characters, while OutputStreamWriter converts characters to bytes to output.
But when should I use these two classes?
We have InputStream/OutputStream for byte-by-byte input/output, and Reader/Writer for character-by-character input/output.
So when using InputStreamReader to input characters from byte stream, why not just use Reader class (or its sub classes) to read character directly? Why not use OutputStream instead of OutputStreamWriter to write bytes directly?
EDIT:
When do I need to convert streams between bytes and characters using InputStreamReader and OutputStreamWriter?
EDIT:
Under which circumstances should I care about encoding scheme?
To understand the purpose of this, you need to get the following firmly into your mind. In Java char and String are for "text" expressed as Unicode, and byte or byte[] are for binary data. Bytes are NOT text. Bytes can represent encoded text ... but they have to be decoded before you can use the char and String types on them.
So when using InputStreamReader to input characters from byte stream, why not just use Reader class (or its sub classes) to read character directly?
(InputStreamReader is a subclass of Reader, so it is not a case of "either ... or ...".)
The purpose of the InputStreamReader is to adapt an InputStream to a Reader. This adapter takes care of decoding the text from bytes to chars which contain Unicode code points [1].
So you would use it when you have an existing InputStream (e.g. from a socket) ... or when you need more control over the selection of the encoding scheme. (Re the latter - you can open a file directly using FileReader, but that implicitly uses the platform default encoding for the file. By using FileInputStream -> InputStreamReader you can specify the encoding scheme explicitly.)
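The FileInputStream -> InputStreamReader chain could look roughly like this. It is only a sketch: the class name, the helper methods and the temp-file name are assumptions made for the example, not part of any API.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetRead {
    // Reads the first line of a file, decoding its bytes with the given charset.
    static String readFirstLine(Path file, Charset charset) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file.toFile()), charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper for the demo: stores raw bytes in a temporary file.
    static Path writeTempFile(byte[] content) {
        try {
            Path file = Files.createTempFile("charset-demo", ".txt");
            file.toFile().deleteOnExit();
            return Files.write(file, content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "gr\u00fc\u00dfe";
        Path file = writeTempFile((text + "\n").getBytes(StandardCharsets.ISO_8859_1));
        // Naming the file's actual encoding recovers the original text.
        System.out.println(text.equals(readFirstLine(file, StandardCharsets.ISO_8859_1))); // true
    }
}
```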
Why not use OutputStream instead of OutputStreamWriter to write bytes directly?
It's encodings again. If you want to write text to an OutputStream, you have to encode it according to some encoding scheme; e.g.
os.write(str.getBytes("UTF-8"));
By using a Writer, you move the encoding into the output pipeline where it is less obtrusive, and can typically be done more efficiently.
[1] or more strictly, a 16-bit representation of Unicode code points.
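For comparison, here is a sketch of both routes from this answer side by side (class and method names are made up). Encoding by hand with getBytes and letting a Writer do it should put the same bytes into the stream:

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ManualVsWriter {
    // Route 1: encode the String yourself, then hand the bytes to the stream.
    static byte[] manual(String text) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        os.write(encoded, 0, encoded.length);
        return os.toByteArray();
    }

    // Route 2: let a Writer in the output pipeline do the encoding.
    static byte[] viaWriter(String text) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (Writer writer = new BufferedWriter(
                new OutputStreamWriter(os, StandardCharsets.UTF_8))) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return os.toByteArray();
    }

    public static void main(String[] args) {
        String text = "na\u00efve text";
        System.out.println(Arrays.equals(manual(text), viaWriter(text))); // true
    }
}
```

With the Writer, the code that produces text no longer needs to mention the encoding at every write call.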
Reader/Writer give you an API to read/write Strings from/to the stream, whereas InputStream/OutputStream don't provide read/write of Strings; instead they read/write byte by byte.
So if your program needs to read/write Strings, then I advise using Reader/Writer for simplicity.
Also, a Reader/Writer that wraps an InputStream/OutputStream adds a conversion step, so the streams can be a little faster if used directly.

Why character streams?

I understand that Java character streams wrap byte streams such that the underlying byte stream is interpreted as per the system default or an otherwise specifically defined character set.
My system's default char-set is UTF-8.
If I use a FileReader to read in a text file, everything looks normal as the default char-set is used to interpret the bytes from the underlying InputStreamReader. If I explicitly define an InputStreamReader to read the UTF-8 encoded text file in as UTF-16, everything obviously looks strange. Using a byte stream like FileInputStream and redirecting its output to System.out, everything looks fine.
So, my questions are:
Why is it useful to use a character stream?
Why would I use a character stream instead of directly using a byte stream?
When is it useful to define a specific char-set?
Code that deals with strings should only "think" in terms of text - for example, reading an input source line by line, you don't want to care about the nature of that source.
However, storage is usually byte-oriented - so you need to create a conversion between the byte-oriented view of a source (encapsulated by InputStream) and the character-oriented view of a source (encapsulated by Reader).
So a method which (say) counts the lines of text in an input source should take a Reader parameter. If you want to count the lines of text in two files, one of which is encoded in UTF-8 and one of which is encoded in UTF-16, you'd create an InputStreamReader around a FileInputStream for each file, specifying the appropriate encoding each time.
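A minimal sketch of such a line-counting method (class and method names are invented; in-memory sources stand in for the two files so the example is self-contained):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class LineCounter {
    // Counts lines of text; it neither knows nor cares how the text was encoded.
    static int countLines(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            int lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "one\ntwo\nthree".getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = "one\ntwo".getBytes(StandardCharsets.UTF_16);

        // The caller picks the encoding; a FileInputStream would go where the
        // ByteArrayInputStream is when reading real files.
        System.out.println(countLines(new InputStreamReader(
                new ByteArrayInputStream(utf8), StandardCharsets.UTF_8)));   // 3
        System.out.println(countLines(new InputStreamReader(
                new ByteArrayInputStream(utf16), StandardCharsets.UTF_16))); // 2
        System.out.println(countLines(new StringReader("just one line")));   // 1
    }
}
```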
(Personally I would avoid FileReader completely - before Java 11 it doesn't let you specify an encoding, which makes it useless IMO.)
An InputStream reads bytes, while a Reader reads characters. Because of the way bytes map to characters, you need to specify the character set (or encoding) when you create an InputStreamReader, the default being the platform character set.
When you are reading/writing text which contains characters which could be > 127, use a char stream. When you are reading/writing binary data, use a byte stream.
You can read text as binary if you wish, but unless you make a lot of assumptions it rarely gains you much.

Which class is used for writing characters rather than bytes?

Which class should be used in situations that require writing characters rather than bytes?
Please take a look at java.io.Writer and subclasses.
PrintWriter will be useful
http://download.oracle.com/javase/1.4.2/docs/api/java/io/PrintWriter.html
An important thing to know about I/O in Java is that streams (InputStream and OutputStream etc.) are used for reading and writing binary data (you read or write bytes exactly as they are in the file), and readers and writers (Reader and Writer etc.) are for reading and writing characters.
Readers and writers are a layer on top of streams. A Reader interprets the bytes from an InputStream using a character encoding (such as UTF-8, ISO-8859-1, US-ASCII) to convert them into characters, and a Writer uses a character encoding to turn characters into bytes.
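As an illustration of that layering, the sketch below (hypothetical class and method names) stacks a PrintWriter on an OutputStreamWriter on a byte stream, so the calling code only deals with characters:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PrintWriterDemo {
    // Writes each line as characters; the OutputStreamWriter turns them into bytes.
    static byte[] writeLines(Charset charset, String... lines) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintWriter out = new PrintWriter(new OutputStreamWriter(sink, charset))) {
            for (String line : lines) {
                out.print(line);
                out.print('\n'); // fixed separator so the output is the same on every platform
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = writeLines(StandardCharsets.UTF_8, "h\u00e9llo", "w\u00f6rld");
        // Decoding with the same charset gives the original text back.
        String roundTrip = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals("h\u00e9llo\nw\u00f6rld\n")); // true
    }
}
```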
