InputStream reads one byte at a time, InputStreamReader converts bytes to characters and reads one character at a time, and Reader also reads one character at a time. So what is the difference between them?
The InputStreamReader handles the encoding. A character does not always fit into a byte (8 bits), and a byte value does not always map to the same char. The Java char, for example, uses 16 bits to encode a character, which makes it possible to represent a greater number of different characters.
Depending on the source of the InputStream, a character may be encoded with ASCII (1 byte), UTF-8 (1 to 4 bytes), UTF-16 (2 or 4 bytes), UTF-32 (4 bytes) or any other existing encoding. Given the right Charset, a Reader can convert the raw bytes into the corresponding Java characters.
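A minimal sketch of why the Charset matters: the same two raw bytes are decoded through an InputStreamReader twice, once as UTF-8 and once as ISO-8859-1. The class name CharsetDecodeDemo and the helper decode are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class CharsetDecodeDemo {
    // Reads every char the InputStreamReader produces for the given bytes.
    static String decode(byte[] raw, Charset charset) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(raw), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xC3, (byte) 0xA9}; // UTF-8 encoding of U+00E9
        String asUtf8 = decode(raw, StandardCharsets.UTF_8);
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);
        System.out.println(asUtf8.length());   // 1: both bytes form one char
        System.out.println(asLatin1.length()); // 2: each byte becomes its own char
    }
}
```

The bytes never change; only the Charset passed to the reader decides how many characters come out and which ones.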
From the JavaDocs:
Input Stream:
This abstract class is the superclass of all classes representing an input stream of bytes
Input Stream Reader:
a bridge from byte streams to character streams: It reads bytes and decodes them into characters using a specified charset
The stream just gives you the raw bytes, the reader can convert the raw bytes into characters for different encodings (ASCII/ISO/UTF).
http://download.oracle.com/javase/6/docs/api/java/io/InputStream.html
http://download.oracle.com/javase/6/docs/api/java/io/InputStreamReader.html
http://download.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html
InputStreamReader is an implementation of the abstract class Reader that reads characters from an InputStream, converting bytes according to a given charset. There are other implementations of Reader too, for example StringReader, which returns characters from a string and does not need any charset conversion.
Related
In Java, InputStream and OutputStream deal with byte[], and Reader and Writer with char[].
Do their input or output byte[] and char[] essentially have the same values? (That is my impression, because a char and a byte in IO have the same value)
In other words, are InputStream and Reader essentially the same, and are OutputStream and Writer essentially the same?
They're not essentially the same, but they do the same sorts of things for different kinds of data.
InputStream and OutputStream work in bytes. You'd use them when dealing with non-textual information (such as an image).
Reader and Writer work in characters. You'd use them when dealing with textual information.
So "yes" and "no". :-) InputStream and Reader are both for reading information (a stream of bytes or a stream of characters, respectively), and OutputStream and Writer are both for writing information (a stream of bytes or a stream of characters, respectively). Which you use depends on what kind of data you're dealing with. The streams are byte-oriented. The readers/writers are character-oriented.
There are bridging classes between the two kinds of data:
InputStreamReader reads from an InputStream and converts bytes to characters using a Charset (one provided explicitly or by name).
OutputStreamWriter does the converse: it converts characters to bytes (again via a Charset) and writes the bytes to an OutputStream.
...but most Reader/Writer subclasses read from/write to sources/destinations that are already character-based, and so don't deal with bytes at all. For instance, StringReader reads characters from a string. Since the source (the string) is already character-based, the Reader doesn't ever deal with bytes, just characters.
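To illustrate the point about sources, here is a small sketch in which the same helper consumes a StringReader (no bytes involved) and an InputStreamReader (bytes decoded on the way in). The class ReaderSourcesDemo and the method readAll are hypothetical names chosen for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class ReaderSourcesDemo {
    // Drains any Reader into a String; it does not care where the chars come from.
    static String readAll(Reader reader) {
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[256];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00E9";

        // Character-based source: no charset is needed.
        String fromString = readAll(new StringReader(text));

        // Byte-based source: the bridge class needs a charset to decode.
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        String fromBytes = readAll(
                new InputStreamReader(new ByteArrayInputStream(encoded), StandardCharsets.UTF_8));

        System.out.println(fromString.equals(fromBytes)); // true
    }
}
```

Code written against Reader works with either source; only the bridging class has to know about an encoding.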
Yes, you have the right idea. The standard classes InputStreamReader and OutputStreamWriter act as adapters from the byte stream interfaces to the character stream interfaces, requiring only that a Charset (typically UTF-8) is specified. That Charset is used to convert the incoming bytes into Java's UTF-16 char type, so notably the values read from an InputStream and from a Reader wrapping it are not always the same.
InputStream is typically used for reading data of any type, while Reader is only appropriate for reading text data.
From the Java Tutorial site, we know InputStreamReader and OutputStreamWriter can convert streams between bytes and characters.
InputStreamReader converts bytes read from input to characters, while OutputStreamWriter converts characters to bytes to output.
But when should I use these two classes?
We have InputStream/OutputStream for byte-by-byte input/output, and Reader/Writer for character-by-character input/output.
So when using InputStreamReader to input characters from a byte stream, why not just use the Reader class (or its subclasses) to read characters directly? Why not use OutputStream instead of OutputStreamWriter to write bytes directly?
EDIT:
When do I need to convert streams between bytes and characters using InputStreamReader and OutputStreamWriter?
EDIT:
Under which circumstances should I care about encoding scheme?
To understand the purpose of this, you need to get the following firmly into your mind. In Java char and String are for "text" expressed as Unicode, and byte or byte[] are for binary data. Bytes are NOT text. Bytes can represent encoded text ... but they have to be decoded before you can use the char and String types on them.
So when using InputStreamReader to input characters from byte stream, why not just use Reader class (or its sub classes) to read character directly?
(InputStreamReader is a subclass of Reader, so it is not a case of "either ... or ...".)
The purpose of the InputStreamReader is to adapt an InputStream to a Reader. This adapter takes care of decoding the text from bytes to chars which contain Unicode code points (see note 1).
So you would use it when you have an existing InputStream (e.g. from a socket) ... or when you need more control over the selection of the encoding scheme. (Re the latter: you can open a file directly using FileReader, but that implicitly uses the platform default encoding for the file. By using FileInputStream -> InputStreamReader you can specify the encoding scheme explicitly.)
Why not use OutputStream instead of OutputStreamWriter to write bytes directly?
It's encodings again. If you want to write text to an OutputStream, you have to encode it according to some encoding scheme; e.g.
os.write(str.getBytes("UTF-8"));
By using a Writer, you move the encoding into the output pipeline where it is less obtrusive, and can typically be done more efficiently.
1 - or more strictly, a 16-bit representation of Unicode codepoints.
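The two ways of writing text to an OutputStream described in this answer can be compared in a short sketch. It assumes UTF-8 and an in-memory ByteArrayOutputStream as the sink; WriterEncodingDemo and encodeWithWriter are illustrative names only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class WriterEncodingDemo {
    // Lets the Writer do the encoding as part of the output pipeline.
    static byte[] encodeWithWriter(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the writer above flushed any buffered chars into the sink.
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "a\u4F60\u4EEC";

        // Manual approach: encode first, then hand the bytes to the stream.
        byte[] manual = text.getBytes(StandardCharsets.UTF_8);

        // Writer approach: hand over chars and let the writer encode.
        byte[] viaWriter = encodeWithWriter(text, StandardCharsets.UTF_8);

        System.out.println(Arrays.equals(manual, viaWriter)); // true
    }
}
```

Both routes produce the same bytes here; the Writer version just keeps the encoding decision in one place instead of at every write call.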
Reader/Writer provide an API for reading and writing Strings and chars, whereas InputStream/OutputStream do not read or write Strings; they read and write bytes.
So if your program needs to read/write Strings, then I advise using Reader/Writer for simplicity.
Also, byte-backed readers and writers such as InputStreamReader and OutputStreamWriter use an InputStream/OutputStream internally, so the streams can be a little faster if used directly, because no encoding or decoding takes place.
First, I know the difference between a character and a byte.
A character is a symbol for something ("A", "中" or other); a byte is a concrete unit of size in a computer. The size of a character in a computer depends on the encoding.
But what exactly are a character stream and a byte stream? What specific types do they stand for? Is a byte stream a stream of bytes? If so, what is a stream of characters? My last question is, what type of stream does TCP transport?
Character Stream is a higher level concept than Byte Stream. A Character Stream is, effectively, a Byte Stream that has been wrapped with logic that allows it to output characters from a specific encoding; as opposed to one having to read bytes and decode the characters they represent.
An InputStream reads bytes, and a Reader reads characters.
Everything over TCP will natively be in bytes. If you know that the byte stream is representing characters, you can use an InputStreamReader to use the InputStream as a Reader.
TCP transports bytes of course. What these bytes represent is up to the protocol.
You can read about the relation between character and byte streams here: http://docs.oracle.com/javase/tutorial/i18n/text/stream.html
Practically, a character stream is an application-side abstraction over a byte stream, allowing you to read or write characters that are decoded from or encoded to bytes using various encodings.
Have a look at these:
Character Streams versus Byte Streams
Character and Byte Streams
And I assume TCP transports packets, i.e. a stream of bytes.
Character stream classes in Java are used to handle the input and output of characters (for example, Unicode text), whereas byte stream classes are used to handle the input and output of raw bytes. Byte streams have been available since Java 1.0; character streams were added in Java 1.1.
The difference between InputStream and InputStreamReader is that InputStream reads bytes, while InputStreamReader reads chars. For example, if the text in a file is abc, then both of them work fine. But if the text is a你们, which is composed of an a and two Chinese characters, then the InputStream does not work.
So we should use InputStreamReader, but my question is:
How does InputStreamReader recognize characters?
a is one byte, but a Chinese character is two bytes. Does it read a as one byte and recognize the other characters as two bytes each, or does the InputStreamReader read every character in this text as two bytes?
An InputStream reads raw octet (8 bit) data. In Java, the byte type is equivalent to the char type in C. In C, this type can be used to represent character data or binary data. In Java, the char type shares greater similarities with the C wchar_t type.
An InputStreamReader then will transform data from some encoding into UTF-16. If "a你们" is encoded as UTF-8 on disk, it will be the byte sequence 61 E4 BD A0 E4 BB AC. When you pass the InputStream to InputStreamReader with the UTF-8 encoding, it will be read as the char sequence 0061 4F60 4EEC.
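The byte and char sequences quoted above can be checked with a short sketch. It feeds the seven UTF-8 bytes through an InputStreamReader and prints each resulting char as four hex digits; the class Utf8WorkedExample and its method decodeToHex are names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class Utf8WorkedExample {
    // Decodes UTF-8 bytes and renders each resulting char as 4 hex digits.
    static String decodeToHex(byte[] utf8) {
        StringBuilder hex = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                if (hex.length() > 0) {
                    hex.append(' ');
                }
                hex.append(String.format("%04X", c));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        byte[] onDisk = {
            0x61,                                   // a
            (byte) 0xE4, (byte) 0xBD, (byte) 0xA0,  // first Chinese character
            (byte) 0xE4, (byte) 0xBB, (byte) 0xAC   // second Chinese character
        };
        System.out.println(decodeToHex(onDisk)); // 0061 4F60 4EEC
    }
}
```

The reader consumes one byte for a and three bytes for each Chinese character; the lead byte of each UTF-8 sequence tells the decoder how many bytes belong together.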
The character encoding API in Java contains the algorithms to perform this transformation. You can find a list of encodings supported by the Oracle JRE here. The ICU project is a good place to start if you want to understand the internals of how this works in practice.
As Alexander Pogrebnyak points out, you should almost always provide the encoding explicitly. Byte-to-char methods that do not specify an encoding rely on the JRE default, which depends on the operating system and user settings.
You have to give the reader a hint by providing the character set that your file is written in. E.g.
Reader reader =
    new InputStreamReader(
        new FileInputStream("/path/to/file"),
        "UTF-8"  // most likely the encoding of the file
    );
Without a hint it will use your platform default encoding, which in many cases is not what you want.
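A runnable variant of the snippet above, as a sketch under a few assumptions: it writes a temporary file itself so that the encoding is known to be UTF-8, and it uses the StandardCharsets.UTF_8 constant instead of the charset name. The class ExplicitCharsetFileDemo and the method roundTrip are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of any library.
class ExplicitCharsetFileDemo {
    // Writes one line as UTF-8 to a temp file, then reads it back with an
    // explicitly specified charset instead of the platform default.
    static String roundTrip(String line) {
        try {
            Path file = Files.createTempFile("charset-demo", ".txt");
            try {
                Files.write(file, line.getBytes(StandardCharsets.UTF_8));
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(
                                new FileInputStream(file.toFile()),
                                StandardCharsets.UTF_8))) {
                    return reader.readLine();
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "a\u4F60\u4EEC";
        System.out.println(original.equals(roundTrip(original))); // true
    }
}
```

Because the charset is passed explicitly, the result does not depend on the machine's default encoding.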
This link has a nice explanation of encodings: http://www.joelonsoftware.com/articles/Unicode.html
Please explain what Byte streams and Character streams are. What exactly do these mean? Is a Microsoft Word document Byte oriented or Character oriented?
Thanks
A stream is a way of sequentially accessing a file. A byte stream accesses the file byte by byte. A byte stream is suitable for any kind of file, but it is not quite appropriate for text files. For example, if the file uses a Unicode encoding in which a character is represented with two bytes, the byte stream will treat these bytes separately and you will need to do the conversion yourself.
A character stream will read a file character by character. A character stream needs to be given the file's encoding in order to work properly.
Although a Microsoft Word Document contains text, it can't be accessed with a character stream (it isn't a text file). You need to use a byte stream to access it.
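The two-bytes-per-character case mentioned above can be sketched by counting how many read() calls each kind of stream needs for the same data. The example assumes UTF-16BE (big-endian, no byte order mark) as the encoding; ByteVsCharCount and its two methods are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class ByteVsCharCount {
    // Counts how many single-byte reads the byte stream delivers.
    static int countByteReads(byte[] raw) {
        int count = 0;
        try (InputStream in = new ByteArrayInputStream(raw)) {
            while (in.read() != -1) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    // Counts how many single-char reads the character stream delivers.
    static int countCharReads(byte[] raw, Charset charset) {
        int count = 0;
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(raw), charset)) {
            while (reader.read() != -1) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] raw = "hi".getBytes(StandardCharsets.UTF_16BE); // 00 68 00 69
        System.out.println(countByteReads(raw));                            // 4
        System.out.println(countCharReads(raw, StandardCharsets.UTF_16BE)); // 2
    }
}
```

The byte stream hands back the two halves of each character separately; the character stream pairs them up because it was told the encoding.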
ByteStreams:
From oracle documentation page about byte streams:
Programs use byte streams to perform input and output of 8-bit bytes. All byte stream classes are descended from InputStream and OutputStream.
When to use:
Byte streams should only be used for the most primitive I/O
When not to use:
You should not use a byte stream to read character data,
e.g. to read a text file
Character Streams:
From oracle documentation page about character streams:
The Java platform stores character values using Unicode conventions. Character stream I/O automatically translates this internal format to and from the local character set.
All character stream classes are descended from Reader and Writer.
Character streams are often "wrappers" for byte streams. The character stream uses the byte stream to perform the physical I/O, while the character stream handles translation between characters and bytes.
There are two general-purpose byte-to-character "bridge" streams: InputStreamReader and OutputStreamWriter.
When to use:
To read character data, either from a Socket or from a text file
In Summary:
A byte stream reads and writes one byte at a time. We should avoid using byte streams when dealing with more sophisticated data.
Character streams and the other available streams should be used to handle sophisticated data.
1. Character-oriented streams are tied to a datatype: only character or string data can be read through them. Byte-oriented streams are not tied to any datatype; data of any type can be read as bytes, and it is up to you to interpret it.
2. Character-oriented streams read character by character, while byte-oriented streams read byte by byte.
3. Character-oriented streams use a character encoding scheme (for example UTF-8) to convert to and from Java's Unicode chars, while byte-oriented streams do not use any encoding scheme.
4. Character-oriented streams are also known as reader and writer streams.
Byte-oriented streams are known as input and output streams; DataInputStream and DataOutputStream are two examples.
Read this. It tells you about the difference between bytes and characters (as well as loads of other useful stuff)
A character stream reads a file character by character. Character streams read 16-bit Java chars, whereas byte streams read 8-bit bytes. Character streams translate between the bytes of an encoding and 16-bit chars. A character stream can support any character set, such as ASCII, UTF-8 or UTF-16. A byte stream does no decoding, so reading text through it byte by byte only works directly for single-byte encodings such as ASCII. The Java platform stores character values using Unicode conventions. Character stream I/O automatically translates this internal format to and from the local character set.
Unless you are working with binary data, such as image and sound files, you should use readers and writers to read and write information with character streams.