Which class should be used in situations that require writing characters rather than bytes?
Please take a look at java.io.Writer and subclasses.
PrintWriter will be useful:
http://download.oracle.com/javase/1.4.2/docs/api/java/io/PrintWriter.html
An important thing to know about I/O in Java is that streams (InputStream and OutputStream etc.) are used for reading and writing binary data (you read or write bytes exactly as they are in the file), and readers and writers (Reader and Writer etc.) are for reading and writing characters.
Readers and writers are a layer on top of streams. A Reader interprets the bytes from an InputStream using a character encoding (such as UTF-8, ISO-8859-1, US-ASCII) to convert them into characters, and a Writer uses a character encoding to turn characters into bytes.
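To make the layering concrete, here is a minimal sketch, assuming an in-memory byte array stands in for a file. The class name `DecodeDemo` and the helper `decode` are made up for illustration. It shows that the same two bytes become one character or two, depending on the charset handed to InputStreamReader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class DecodeDemo {

    // Reads every character from the bytes, decoding them with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) encoded as UTF-8 is the two bytes 0xC3 0xA9.
        byte[] utf8Bytes = {(byte) 0xC3, (byte) 0xA9};
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8).length());      // prints 1
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1).length()); // prints 2
    }
}
```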
Related
In Java, InputStream and OutputStream deal with byte[], and Reader and Writer with char[].
Do their input or output byte[] and char[] essentially have the same values? (That is my impression, because a char and a byte in IO have the same value)
In other words, are InputStream and Reader essentially the same, and are OutputStream and Writer essentially the same?
They're not essentially the same, but they do the same sorts of things for different kinds of data.
InputStream and OutputStream work in bytes. You'd use them when dealing with non-textual information (such as an image).
Reader and Writer work in characters. You'd use them when dealing with textual information.
So "yes" and "no". :-) InputStream and Reader are both for reading information (a stream of bytes or a stream of characters, respectively), and OutputStream and Writer are both for writing information (a stream of bytes or a stream of characters, respectively). Which you use depends on what kind of data you're dealing with. The streams are byte-oriented. The readers/writers are character-oriented.
There are bridging classes between the two kinds of data:
InputStreamReader reads from an InputStream and converts bytes to characters using a Charset (one provided explicitly or by name).
OutputStreamWriter does the converse: it converts characters to bytes (again via a Charset) and writes the bytes to an OutputStream.
...but most Reader/Writer subclasses read from/write to sources/destinations that are already character-based, and so don't deal with bytes at all. For instance, StringReader reads characters from a string. Since the source (the string) is already character-based, the Reader doesn't ever deal with bytes, just characters.
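A rough sketch of both cases, assuming in-memory data throughout (the class `BridgeDemo` and its method names are invented for this example): OutputStreamWriter needs a Charset because bytes come out the other end, while StringReader never sees a byte.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class BridgeDemo {

    // Characters in, bytes out: OutputStreamWriter encodes on the way to the stream.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and therefore flushed) before the bytes are collected.
        return sink.toByteArray();
    }

    // Characters in, characters out: StringReader involves no bytes and no charset.
    static int firstChar(String text) {
        try (StringReader reader = new StringReader(text)) {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("\u00e9", StandardCharsets.UTF_8).length);      // prints 2
        System.out.println(encode("\u00e9", StandardCharsets.ISO_8859_1).length); // prints 1
        System.out.println((char) firstChar("abc"));                               // prints a
    }
}
```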
Yes, you have the right idea. The standard classes InputStreamReader and OutputStreamWriter act as adapters from the byte stream interfaces to the character stream interfaces, requiring only that a Charset (typically UTF-8) is specified. That Charset is used to convert the incoming bytes into Java's UTF-16 character type, so, notably, the byte values read from an InputStream are not always the same as the char values read from a Reader over the same data.
InputStream is typically used for reading data of any type, while Reader is only appropriate for reading text data.
From the Java Tutorial site, we know InputStreamReader and OutputStreamWriter can convert streams between bytes and characters.
InputStreamReader converts bytes read from input to characters, while OutputStreamWriter converts characters to bytes to output.
But when should I use these two classes?
We have InputStream/OutputStream to input/output byte by byte, and Reader/Writer to input/output character by character.
So when using InputStreamReader to input characters from a byte stream, why not just use the Reader class (or its subclasses) to read characters directly? Why not use OutputStream instead of OutputStreamWriter to write bytes directly?
EDIT:
When do I need to convert streams between bytes and characters using InputStreamReader and OutputStreamWriter?
EDIT:
Under which circumstances should I care about encoding scheme?
To understand the purpose of this, you need to get the following firmly into your mind. In Java char and String are for "text" expressed as Unicode, and byte or byte[] are for binary data. Bytes are NOT text. Bytes can represent encoded text ... but they have to be decoded before you can use the char and String types on them.
So when using InputStreamReader to input characters from byte stream, why not just use Reader class (or its sub classes) to read character directly?
(InputStreamReader is a subclass of Reader, so it is not a case of "either ... or ...".)
The purpose of the InputStreamReader is to adapt an InputStream to a Reader. This adapter takes care of decoding the text from bytes to chars which contain Unicode codepoints [1].
So you would use it when you have an existing InputStream (e.g. from a socket) ... or when you need more control over the selection of the encoding scheme. (Re the latter - you can open a file directly using FileReader, but that implicitly uses the platform default encoding for the file. By using FileInputStream -> InputStreamReader you can specify the encoding scheme explicitly.)
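The FileInputStream -> InputStreamReader chain could look roughly like this. This is a sketch under stated assumptions: the class `ExplicitEncodingDemo`, the helper `writeThenRead`, and the temp-file prefix are all invented here, and a temporary file stands in for real data.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of any library.
final class ExplicitEncodingDemo {

    // Writes text to a temp file in one encoding, then reads it back
    // through FileInputStream -> InputStreamReader using another.
    static String writeThenRead(String text, Charset writeAs, Charset readAs) {
        try {
            Path file = Files.createTempFile("encoding-demo", ".txt");
            try {
                Files.write(file, text.getBytes(writeAs));
                try (Reader reader = new InputStreamReader(
                        new FileInputStream(file.toFile()), readAs)) {
                    StringBuilder sb = new StringBuilder();
                    int c;
                    while ((c = reader.read()) != -1) {
                        sb.append((char) c);
                    }
                    return sb.toString();
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        Charset latin1 = StandardCharsets.ISO_8859_1;
        // Same charset on both sides: the text survives.
        System.out.println(writeThenRead(text, latin1, latin1).equals(text));                 // prints true
        // Mismatched charsets: the accented character is not recovered.
        System.out.println(writeThenRead(text, latin1, StandardCharsets.UTF_8).equals(text)); // prints false
    }
}
```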
Why not use OutputStream instead of OutputStreamWriter to write bytes directly?
It's encodings again. If you want to write text to an OutputStream, you have to encode it according to some encoding scheme; e.g.
os.write(str.getBytes("UTF-8"));
By using a Writer, you move the encoding into the output pipeline where it is less obtrusive, and can typically be done more efficiently.
1 - or more strictly, a 16-bit representation of Unicode codepoints.
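To illustrate the point about moving the encoding into the output pipeline, here is a small side-by-side sketch. The class `WriterVersusBytesDemo` and both method names are hypothetical, and a ByteArrayOutputStream stands in for whatever stream you are really writing to.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
final class WriterVersusBytesDemo {

    // Encoding by hand: every call site has to remember the charset.
    static byte[] manual(String... parts) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try {
            for (String part : parts) {
                os.write(part.getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return os.toByteArray();
    }

    // Encoding in the pipeline: the charset is chosen once, when the Writer is built.
    static byte[] viaWriter(String... parts) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(os, StandardCharsets.UTF_8)) {
            for (String part : parts) {
                writer.write(part);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return os.toByteArray();
    }

    public static void main(String[] args) {
        byte[] a = manual("na", "\u00efve");
        byte[] b = viaWriter("na", "\u00efve");
        System.out.println(Arrays.equals(a, b)); // prints true
    }
}
```

Both approaches produce the same bytes here; the difference is where the encoding decision lives.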
Reader/Writer provide an API to read/write Strings from/to a stream, whereas InputStream/OutputStream do not read or write Strings; they read/write byte by byte.
So if your program needs to read/write Strings, then I advise using Reader/Writer for simplicity.
Also, Readers/Writers that wrap a byte stream use InputStream/OutputStream internally, so using the streams directly can be a little faster.
Can someone explain me the difference between OutputStream and Writer? Which of these classes should I work with?
Streams work at the byte level: they can read (InputStream) and write (OutputStream) bytes or arrays of bytes.
Readers/Writers add the concept of a character on top of a stream. Since characters can only be translated to and from bytes by using an encoding, readers and writers have an encoding component (which may be set automatically, since Java has a default encoding property). Characters written to a Writer are automatically encoded into bytes and sent to the stream; bytes read from the stream by a Reader are automatically decoded into characters.
OutputStream classes write to the target byte by byte, whereas Writer classes write to the target character by character.
An OutputStream is a stream that can write information. This is fairly general, so there are specialized OutputStream for special purposes like writing to files. A stream can only write arrays of bytes.
Writers provide more flexibility in that they can write characters and even strings while taking a special encoding into account.
Which one to take is really a matter of what you want to write. If you do have bytes already, you can use the stream directly. If you have characters or strings, you either need to convert them to bytes yourself if you want to write them to a stream, or you need to use a Writer which does that job for you.
OutputStream uses bare bytes, whereas Writer uses encoded characters.
The Reader/Writer class hierarchy is character-oriented, and the Input Stream/Output Stream class hierarchy is byte-oriented.
Basically there are two types of streams: byte streams, which handle streams of bytes, and character streams, which handle streams of characters. In the byte stream hierarchy, InputStream/OutputStream are the abstract classes at the top, while Reader/Writer are the abstract classes at the top of the character stream hierarchy.
I understand that Java character streams wrap byte streams such that the underlying byte stream is interpreted as per the system default or an otherwise specifically defined character set.
My system's default char-set is UTF-8.
If I use a FileReader to read in a text file, everything looks normal as the default char-set is used to interpret the bytes from the underlying InputStreamReader. If I explicitly define an InputStreamReader to read the UTF-8 encoded text file in as UTF-16, everything obviously looks strange. Using a byte stream like FileInputStream and redirecting its output to System.out, everything looks fine.
So, my questions are:
Why is it useful to use a character stream?
Why would I use a character stream instead of directly using a byte stream?
When is it useful to define a specific char-set?
Code that deals with strings should only "think" in terms of text - for example, reading an input source line by line, you don't want to care about the nature of that source.
However, storage is usually byte-oriented - so you need to create a conversion between the byte-oriented view of a source (encapsulated by InputStream) and the character-oriented view of a source (encapsulated by Reader).
So a method which (say) counts the lines of text in an input source should take a Reader parameter. If you want to count the lines of text in two files, one of which is encoded in UTF-8 and one of which is encoded in UTF-16, you'd create an InputStreamReader around a FileInputStream for each file, specifying the appropriate encoding each time.
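A minimal sketch of that line-counting idea, assuming two in-memory byte arrays stand in for the UTF-8 and UTF-16 files (the class `LineCounter` and its method names are invented for this example). The counting method only ever sees a Reader; the encoding is decided by whoever builds the InputStreamReader.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class LineCounter {

    // Knows nothing about bytes or encodings: it only consumes characters.
    static int countLines(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        int lines = 0;
        while (reader.readLine() != null) {
            lines++;
        }
        return lines;
    }

    // The caller picks the encoding when adapting the byte source to a Reader.
    static int countLines(byte[] encoded, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(encoded), charset)) {
            return countLines(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16);
        System.out.println(countLines(utf8, StandardCharsets.UTF_8));   // prints 3
        System.out.println(countLines(utf16, StandardCharsets.UTF_16)); // prints 3
    }
}
```

With real files you would pass a FileInputStream instead of the ByteArrayInputStream; the counting method stays unchanged.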
(Personally I would avoid FileReader completely. Before Java 11 it did not let you specify an encoding, which made it useless IMO.)
An InputStream reads bytes, while a Reader reads characters. Because of the way bytes map to characters, you need to specify the character set (or encoding) when you create an InputStreamReader, the default being the platform character set.
When you are reading/writing text which contains characters which could be > 127, use a char stream. When you are reading/writing binary data, use a byte stream.
You can read text as binary if you wish, but unless you make a lot of assumptions it rarely gains you much.
Here's just this example:
http://www.xyzws.com/Javafaq/how-to-use-httpurlconnection-post-data-to-web-server/139
Why does it feel so strange?
You are actually looking at two different kinds of stream.
The Writer / Reader classes and subclasses are for reading / writing character-based data. They take care of conversion between Java's internal UTF-16 representation of text and the character encoding used outside. The BufferedReader class adds a readLine() method that understands end-of-line markers.
The InputStream / OutputStream classes and subclasses are for reading and writing byte-based data without any assumptions about character encodings, or that the data is text. Since it eschews these assumptions, "line" has no clear meaning, and hence the BufferedInputStream class does not have a readLine() method.
(Incidentally, DataInputStream does have a readLine() method, but it is deprecated because it is broken. It makes assumptions about encodings, etc. that are invalid on some platforms!)
In your particular example, the code is asymmetric because the HTTP service it is designed to talk to is asymmetric. The service expects a request with binary content (encoded using the DataOutputStream wrapper), and delivers a response with text content. This is not particularly unusual ... or wrong.
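As a rough illustration of why readLine() lives on the character side, the sketch below wraps a byte source in an InputStreamReader and a BufferedReader before asking for lines. The class `ReadLineDemo` and the helper `lines` are made up for the example, and the input is an in-memory array rather than a real connection.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of any library.
final class ReadLineDemo {

    // Bytes only become "lines" once a charset has turned them into characters.
    static List<String> lines(byte[] bytes, Charset charset) {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Mixed line terminators: readLine() accepts both "\r\n" and "\n".
        byte[] data = "a\r\nb\nc".getBytes(StandardCharsets.UTF_8);
        System.out.println(lines(data, StandardCharsets.UTF_8)); // prints [a, b, c]
    }
}
```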
The strangeness of writing the "input" to a server to an "output" is merely a matter of perspective. In simple terms, an OutputStream / Writer is something you "write to" (i.e. a data sink) and an InputStream or Reader is something you "read from" (i.e. a data source). That's just the way it is, and it is not strange at all once you get used to it.
Actually, we don't. There is no method readLine defined in InputStream. It also operates on bytes only, just like OutputStream.
In the code you referenced, readLine is called on a BufferedReader.
Reader and Writer are for text data and operate on characters (and Strings), InputStream and OutputStream work with binary data (raw bytes). To convert between the two (i.e. wrap an InputStream into a Reader or an OutputStream into a Writer), you need to choose a character set.
I find it strange: why do we read from an InputStream rather than from the OutputStream?
That's just a matter of perspective.
An OutputStream or a Writer is where you write your output to.
An InputStream or a Reader is where you read your input from.
Of course, somewhere, on the other end of the stream, someone might treat your OutputStream as their InputStream ...
readLine does exactly what the name implies -- it reads a line of text until the end-of-line marker.
When you write to a stream, you already know where your line ends.
If you are looking for a way to write to streams in a more intuitive way, try PrintWriter.
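A small PrintWriter sketch, assuming a ByteArrayOutputStream stands in for the real output stream (the class `PrintWriterDemo` and the method `render` are invented for illustration). The PrintWriter sits on top of an OutputStreamWriter so that the charset is still chosen explicitly.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class PrintWriterDemo {

    // Writes formatted text through PrintWriter -> OutputStreamWriter -> OutputStream.
    static byte[] render() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintWriter out = new PrintWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8))) {
            out.print("total=");
            out.print(42);
            out.printf(" (%s)", "ok");
        }
        // Closing the PrintWriter flushes the encoded bytes into the sink.
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(new String(render(), StandardCharsets.UTF_8)); // prints total=42 (ok)
    }
}
```

Note that PrintWriter methods do not throw IOException; call checkError() if you need to know whether a write failed.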