Why does the character stream read ints?

In the example given in the Oracle Java Tutorial, characters are read as integers.
Why and how does that work?
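FileReader inputStream = null;
FileWriter outputStream = null;
try {
    inputStream = new FileReader("xanadu.txt");
    outputStream = new FileWriter("characteroutput.txt");
    int c;
    while ((c = inputStream.read()) != -1) {
        outputStream.write(c);
    }
} finally {
    if (inputStream != null) inputStream.close();
    if (outputStream != null) outputStream.close();
}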

If read() returned a char, there would be no value left over to signal end of file.
By using the larger int type, it's possible to represent every possible character AND one more value that means end of file.
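To see why a char alone cannot carry the sentinel, here is a minimal sketch (the class name CharEof is hypothetical): narrowing -1 to char keeps only its low 16 bits, giving U+FFFF (65535), which is itself a legal char value. Kept as an int, -1 lies outside the whole char range.

```java
class CharEof {
    // Narrowing an int to char keeps only its low 16 bits.
    static char asChar(int value) {
        return (char) value;
    }

    // True when value lies outside the char range 0..65535,
    // so it can never be confused with real character data.
    static boolean isOutsideCharRange(int value) {
        return value < Character.MIN_VALUE || value > Character.MAX_VALUE;
    }

    public static void main(String[] args) {
        char eofAsChar = asChar(-1);
        System.out.println((int) eofAsChar);         // 65535
        System.out.println(eofAsChar == '\uffff');   // true: same as a legal char
        System.out.println(isOutsideCharRange(-1));  // true: -1 as an int is unambiguous
    }
}
```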

This is because characters ARE integers: each character is stored as a numeric Unicode value.

Basically, a char is an integer type. Try the following:
char c = 'c';
int i = c;
This will not cause a compile error, because a char is widened to an int automatically.

Behind the scenes in Java, a char is just a 16-bit unsigned value. An int is a 32-bit signed value.
The range of char is a subset of the range of int, and char values are interpreted as UTF-16 code units (the first 128 of which match the ASCII table).
Because of this relationship, the language makes it easy to convert between the two types.
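A small sketch of the two directions of conversion, with hypothetical helper names: char to int widens implicitly, while int to char needs an explicit cast and silently keeps only the low 16 bits.

```java
class CharIntConversions {
    // char -> int is a widening conversion: no cast needed.
    static int toInt(char c) {
        return c;
    }

    // int -> char is a narrowing conversion: the cast is required,
    // and only the low 16 bits of the int survive.
    static char toChar(int i) {
        return (char) i;
    }

    public static void main(String[] args) {
        System.out.println(toInt('A'));     // 65
        System.out.println(toChar(65));     // A
        System.out.println(toChar(65601));  // A, because 65601 = 65536 + 65
    }
}
```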

Well, if you read the documentation for Reader/Writer you can see the following explanation:
Writer Class - write Method
Writes a single character. The character to be written is contained
in the 16 low-order bits of the given integer value; the 16
high-order bits are ignored.
And the code simply does:
public void write(int c) throws IOException {
    synchronized (lock) {
        if (writeBuffer == null) {
            writeBuffer = new char[writeBufferSize];
        }
        writeBuffer[0] = (char) c;
        write(writeBuffer, 0, 1);
    }
}
So, in the case of Writer, as far as I can see, this could have been done with a char parameter.
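The "high-order bits are ignored" contract can be checked with any concrete Writer. This sketch assumes java.io.StringWriter as the target (the helper name writeOne is hypothetical); 0x10041 and 0x41 share the same low 16 bits, so both should come out as "A".

```java
import java.io.StringWriter;

class WriterLowBits {
    // Writes one int through Writer.write(int) and returns the recorded text.
    static String writeOne(int value) {
        StringWriter out = new StringWriter();
        out.write(value);  // per the Writer contract, only the 16 low-order bits are used
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeOne(0x41));     // A
        System.out.println(writeOne(0x10041));  // A as well: the 0x10000 bit is dropped
    }
}
```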
The Reader, on the other hand, has the responsibility in its read method of returning either a character or the end-of-stream indicator.
The documentation says:
Reader Class read Method
The character read, as an integer in the range 0 to 65535
or -1 if the end of the stream has been reached.
As such, a data type bigger than just a char is needed, and in this case int is used.
And it is implemented as follows:
public int read() throws IOException {
    char cb[] = new char[1];
    if (read(cb, 0, 1) == -1)
        return -1;
    else
        return cb[0];
}
So, this second case justifies the use of a bigger data type.
The reason why they use an int in both classes could be just a matter of consistency.
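The extra room in int is what lets read() report U+FFFF, the largest char, as 65535 while still keeping -1 free for end of stream. A minimal sketch with java.io.StringReader (the class and method names are illustrative, not part of the JDK):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ReaderSentinel {
    // Collects every value read() returns, including the final -1.
    static List<Integer> readAll(String text) throws IOException {
        List<Integer> values = new ArrayList<>();
        StringReader reader = new StringReader(text);
        int v;
        do {
            v = reader.read();
            values.add(v);
        } while (v != -1);
        return values;
    }

    public static void main(String[] args) throws IOException {
        // 'A' followed by U+FFFF, the largest char value
        System.out.println(readAll("A\uffff"));  // [65, 65535, -1]
    }
}
```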

Related

Is there some sort of functionality in Java that converts a char into a bit?

I'm trying to find a way to convert a char (Precondition is the char can only be '0' or '1') into an actual bit in Java. I'm not sure if Java has some built-in functionality for this, or if there is an algorithm that can be implemented to do so.
I need to implement the following method:
public void writeBit(char bit) {
    //PRE:bit == '0' || bit == '1'
    try {
    } catch (IOException e) {
        System.out.println(e);
    }
}
I cannot change the method structure in any way. I am implementing Huffman Encoding and have an array of Strings that represent the encodings for every character within an input file. For example, 'A' or array[65] contains the String: "01011". So if I see the letter A in my file, I need to use writeBit to write out A's respective String to a binary file. Every time I reach 8 bits (one byte) I will call writeByte to send those 8 bits to the binary file, then reset some sort of counter variable to 0 and continue.
What I'm stuck on is how I am supposed to convert the char bit into an actual bit, so that it can be properly written out to a binary file.

Java does not have a primitive data type representing a single bit. On many hardware architectures, it is not even possible to access memory with that granularity.
When you say "an actual bit", then, I can only assume that you mean an integer value that is either 0 or 1, as opposed to char values '0' and '1'. There are numerous ways to perform such a conversion, among them:
byte the_bit = (byte) (bit - '0');. This takes advantage of the fact that char is an integer type, and that the decimal digits zero and one are encoded in Java with consecutive character codes. (The cast is needed because bit - '0' has type int.)
byte the_bit = (byte) ((bit == '0') ? 0 : 1);. This just explicitly tests whether bit contains the value '0', evaluating to 0 if so or 1 if not.
It gets more complicated from there, for example:
byte the_bit = Byte.parseByte(String.valueOf(bit));. This converts the char to a string containing (only) that char, and then parses it as the string representation of a byte.
All of the above rely to one degree or another on the precondition given: that bit does not have any value other than '0' or '1'.
With that said, I think anything like this is probably the wrong approach for implementing a Huffman encoding, because Java Strings are an unusual and very heavyweight representation for the bit strings involved.
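For the writeBit part of the question, one common approach is to accumulate bits in an int and emit a byte every eight bits, using the digit conversion described above. This is a sketch, not the asker's class: it assumes the bits go to an OutputStream passed to a hypothetical BitWriter constructor, and out.write stands in for the writeByte helper, which the question does not show. Bits are packed most significant first.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the question's real class and its writeByte are not shown.
class BitWriter {
    private final OutputStream out;
    private int current = 0;  // bits collected so far for the next byte
    private int count = 0;    // number of bits currently held in 'current'

    BitWriter(OutputStream out) {
        this.out = out;
    }

    public void writeBit(char bit) {
        //PRE:bit == '0' || bit == '1'
        try {
            current = (current << 1) | (bit - '0');  // append the bit on the right
            count++;
            if (count == 8) {
                out.write(current);  // stands in for the question's writeByte
                current = 0;
                count = 0;
            }
        } catch (IOException e) {
            System.out.println(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        BitWriter writer = new BitWriter(buffer);
        for (char bit : "0100000101000010".toCharArray()) {
            writer.writeBit(bit);
        }
        System.out.println(new String(buffer.toByteArray(), StandardCharsets.US_ASCII));  // AB
    }
}
```

A complete encoder would also need to pad and flush a final partial byte when the input ends; this sketch simply leaves fewer than eight pending bits unwritten.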

You can use Integer.parseInt(String s, int radix) or Integer.parseUnsignedInt(String s, int radix) with radix 2 to convert a string of binary digits into a Java int.
public static void main(String[] args) {
    int num = Integer.parseInt("101010", 2);
    // print 42
    System.out.println(num);
}
And reversely with method Integer.toBinaryString(int i) you can generate the binary string representation:
// print 101010
System.out.println(Integer.toBinaryString(42));
Similarly you can use Byte.parseByte(String s, int radix) to parse a byte:
public static void main(String[] args) {
    byte num = Byte.parseByte("101010", 2);
    // print 42
    System.out.println(num);
}
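One caveat for Huffman-style output: Byte.parseByte accepts only values that fit in a signed byte (-128 to 127), so an 8-bit string such as "11111111" (255) throws NumberFormatException. A sketch of one workaround, parsing as an int and then narrowing (the helper name is hypothetical):

```java
class BinaryToByte {
    // Parses an 8-digit binary string; patterns above 127 wrap to negative bytes.
    static byte parseEightBits(String bits) {
        return (byte) Integer.parseInt(bits, 2);
    }

    public static void main(String[] args) {
        byte b = parseEightBits("11111111");
        System.out.println(b);         // -1 (same bit pattern as 255)
        System.out.println(b & 0xFF);  // 255
        try {
            Byte.parseByte("11111111", 2);
        } catch (NumberFormatException e) {
            System.out.println("Byte.parseByte rejects 255");
        }
    }
}
```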

How FileInputStream and FileOutputStream Works in Java?

I'm reading about input/output streams in the Java Tutorials docs. The tutorial uses this example:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class CopyBytes {
    public static void main(String[] args) throws IOException {
        FileInputStream in = null;
        FileOutputStream out = null;
        try {
            in = new FileInputStream("xanadu.txt");
            out = new FileOutputStream("outagain.txt");
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
        } finally {
            if (in != null) {
                in.close();
            }
            if (out != null) {
                out.close();
            }
        }
    }
}
Contents of xanadu.txt:
In Xanadu did Kubla Khan
A stately pleasure-dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
Output to outagain.txt file:
In Xanadu did Kubla Khan
A stately pleasure-dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
Why does the author use int c even though we are reading characters?
Why use -1 in the while condition?
How does the out.write(c) method convert the int back into characters?

1: Why does the author use int c, even though we are reading characters?
FileInputStream.read() returns one byte of data as an int. This works because a byte can be represented as an int without loss of precision. See this answer to understand why int is returned instead of byte.
2: Why use -1 in the while condition?
When the end of file is reached, -1 is returned.
3: How does out.write(c) convert the int back into characters, producing the same output in outagain.txt?
FileOutputStream.write() takes a byte parameter as an int. Since an int spans more values than a byte, the 24 high-order bits of the given int are ignored, making it a byte-compatible value: an int in Java is always 32 bits, so removing the 24 high-order bits leaves an 8-bit value, i.e. a byte.
I suggest you read the Javadocs for each of those methods carefully. For reference, they answer all of your questions:
read:
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.
write:
Writes the specified byte to this output stream. The general contract for write is that one byte is written to the output stream. The byte to be written is the eight low-order bits of the argument b. The 24 high-order bits of b are ignored.
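The low-order-bits rule is easy to observe with java.io.ByteArrayOutputStream, which implements the same write(int) contract. In this sketch (hypothetical class name), 0x141 and 0x41 differ only above the low eight bits, so both store the byte 65.

```java
import java.io.ByteArrayOutputStream;

class OutputStreamLowBits {
    // Writes one int via OutputStream.write(int) and returns the byte actually stored.
    static byte writeOne(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(value);  // only the eight low-order bits are kept
        return out.toByteArray()[0];
    }

    public static void main(String[] args) {
        System.out.println(writeOne(0x41));   // 65
        System.out.println(writeOne(0x141));  // 65: the 0x100 bit is ignored
    }
}
```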

Just read the docs.
here is the read method docs
http://docs.oracle.com/javase/7/docs/api/java/io/FileInputStream.html#read()
public int read()
throws IOException
Reads a byte of data from this input stream. This method blocks if no input is yet available.
Specified by:
read in class InputStream
Returns:
the next byte of data, or -1 if the end of the file is reached.
That int is your next byte of data.
Now, here are the answers.
1) When you assign a char to an int, the int holds the character's numeric code (its ASCII code, for ASCII characters).
If you are interested, here is the list of chars and their ASCII codes: https://www.cs.cmu.edu/~pattis/15-1XX/common/handouts/ascii.html
2) -1 is returned if the end of the file is reached, so that's a check for whether more data exists.
3) When you pass an ASCII code to out.write(), it writes that byte to the file, and the file then shows the corresponding char.

Java integer is equal to character?

I apologize if this question is a bit simplistic, but I'm somewhat puzzled as to why my professor has made the following the statement:
Notice that read() returns an integer value. Using an int as a return type allows read() to use -1 to indicate that it has reached the end of the stream. You will recall from your introduction to Java that an int is equal to a char which makes the use of the -1 convenient.
The professor was referencing the following sample code:
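import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class CopyBytes {
    public static void main(String[] args) throws IOException {
        FileInputStream in = null;
        FileOutputStream out = null;
        try {
            in = new FileInputStream("Independence.txt");
            // NOTE: opening the same file for output truncates it,
            // so the loop below finds nothing to read
            out = new FileOutputStream("Independence.txt");
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
        } finally {
            if (in != null) {
                in.close();
            }
            if (out != null) {
                out.close();
            }
        }
    }
}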
This is an advanced Java course, so obviously I've taken a few introductory courses prior to this one. Maybe I'm just having a "blonde moment" of sorts, but I'm not understanding in what context an integer could be equal to a character when making comparisons. The instance method read() returns an integer value when it comes to EOF. That I understand perfectly.
Can anyone shed light on the last part of that statement, that "an int is equal to a char"?

In Java, char is a more specific type of int. I can write:
char c = 65;
This code prints out "A". I need the cast there so Java knows I want the character representation and not the integer one.
public static void main(String... str) {
    System.out.println((char) 65);
}
You can look up the int to character mapping in an ASCII table.
And per your teacher, int allows for more values. Since -1 isn't a character value, it can serve as a flag value.

To a computer a character is just a number (that may at some point be mapped to a picture of a letter for display to the user). Languages usually have a special character type to distinguish between "just a number" and "a number that refers to a character", but inside, it's still just some sort of integer.
The reason why read() returns an int is to have "one extra value" to represent EOF. All the values of char are already defined to mean something else, so it uses a larger type to get more values.

It means your professor has been spending too much time programming in C. The definition of read for InputStream (and FileInputStream) is:
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned.
(See http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#read())
A char in Java, on the other hand, represents a Unicode character, and is treated as an integer in the range 0 to 65535. (In C, a char is an 8-bit integral value, either 0 to 255 or -128 to 127.)
Please note that in Java, a byte is actually an integer in the range -128 to 127; but the definition of read has been specified to avoid the problem, by decreeing that it will return 0 to 255 anyway. The javadoc is using "byte" in a loose sense here.
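The loose use of "byte" can be seen with java.io.ByteArrayInputStream. In this sketch (hypothetical class name), the stored byte 0xFF is -1 as a Java byte, yet read() reports it as 255, so -1 remains free to mean end of stream.

```java
import java.io.ByteArrayInputStream;

class ReadUnsignedByte {
    // Reads twice from a stream over the given bytes and returns both results.
    static int[] readTwice(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        return new int[] { in.read(), in.read() };
    }

    public static void main(String[] args) {
        byte stored = (byte) 0xFF;
        System.out.println(stored);     // -1 as a Java byte
        int[] values = readTwice(new byte[] { stored });
        System.out.println(values[0]);  // 255: read() reports the byte as 0..255
        System.out.println(values[1]);  // -1: end of stream
    }
}
```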

The char data type in Java is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).
The int data type in Java is a 32-bit signed two's complement integer. It has a minimum value of -2,147,483,648 and a maximum value of 2,147,483,647 (inclusive).
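Since char cannot be negative (its values run from 0 to 65,535) and an int can be negative, the possible values returned from Reader.read() range from -1 (to signify nothing left) to 65,535 (the max value of a char). InputStream.read(), as used in your professor's example, returns bytes, so its range is -1 to 255.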

What your professor is referring to is the fact that characters are just integers used in a special context. If we ignore Unicode and other encodings and focus on the old days of ASCII, there was an ASCII table (http://www.asciitable.com/). A string of characters is really just a sequence of integers; for example, TUV would be 84 followed by 85 followed by 86.
The 'char' type is an integer internally in the JVM and is more or less a hint that this integer should only be used in a character context.
You can even cast between them.
char a = (char) 65;
int i = (int) 'A';
Those two variables hold the same data in memory, but the compiler and JVM treat them slightly differently.
Because of this, read() returns an integer instead of char so as to allow a -1, which is not a valid character code. Values other than -1 can be cast to a char, while -1 indicates EOF.
Of course, Unicode changes all of this with multi-byte characters and code points. I'll leave that as an exercise for you.
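As a starting point for that exercise, a sketch with a hypothetical class name: a character outside the Basic Multilingual Plane, such as U+1F600, is stored as two chars (a surrogate pair), and Reader.read() returns those two UTF-16 code units one at a time rather than the whole code point.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class SurrogateDemo {
    // Returns every value Reader.read() yields for the text, excluding the final -1.
    static List<Integer> readUnits(String text) throws IOException {
        StringReader reader = new StringReader(text);
        List<Integer> units = new ArrayList<>();
        int unit;
        while ((unit = reader.read()) != -1) {
            units.add(unit);
        }
        return units;
    }

    public static void main(String[] args) throws IOException {
        String grin = "\uD83D\uDE00";  // U+1F600 stored as a surrogate pair
        System.out.println(grin.length());                             // 2
        System.out.println(grin.codePointCount(0, grin.length()));     // 1
        System.out.println(Integer.toHexString(grin.codePointAt(0)));  // 1f600
        for (int unit : readUnits(grin)) {
            System.out.println(Integer.toHexString(unit));             // d83d, then de00
        }
    }
}
```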

I am not sure what the professor means, but what it all comes down to is that computers only understand 1s and 0s. We don't understand 1s and 0s all that well, so we use a code system: first Morse code, then ASCII, now UTF-16... How large a number (int) can be varies from computer to computer; in the real world, integers are infinite, they just keep counting. char also has a size: in UTF-16, let's just say it's 16 bits (I will let you read up on that). So if char and int both took 16 bits, as the professor says, they would be the same (size), and reading 1 char would be the same as reading 1 int. By the way, to be politically correct, char is infinite as well: Chinese characters, French characters, and the character I just made up but can't post because it's not supported. So think of the code system for int and char: -1 as an int is EOF for chars (EOF = end of file). Good luck, I hope this helped. What I don't understand is: why read from and write to the same file?

Get bytes from the int returned from socket InputStream read()

I have an InputStream and I want to read each char until I find a comma "," from a socket.
Here's my code:
private static Packet readPacket(InputStream is) throws Exception
{
    int ch;
    Packet p = new Packet();
    String type = "";
    while((ch = is.read()) != 44) //44 is the "," in ISO-8859-1 codification
    {
        if(ch == -1)
            throw new IOException("EOF");
        type += new String(ch, "ISO-8859-1"); //<----DOES NOT COMPILE
    }
    ...
}
The String constructor does not take an int, only an array of bytes. I read the documentation and it says:
read():
Reads the next byte of data from the input stream.
How can I convert this int to byte then ? Is it using only the less significant bits (8 bits) of all 32 bits of the int ?
Since I'm working with Java, I want to keep it fully platform-compatible (little endian vs. big endian, etc.). What's the best approach here, and why?
PS: I don't want to use any ready-to-use classes like DataInputStream, etc.

The String constructor takes a byte[] (an array), so wrap the single byte in one:
type += new String(new byte[] { (byte) ch }, "ISO-8859-1");
By the way, it would be more elegant to use a StringBuilder for type and make use of its append methods. It's faster and also shows the intent better:
private static Packet readPacket(InputStream is) throws Exception {
    int ch;
    Packet p = new Packet();
    StringBuilder type = new StringBuilder();
    while((ch = is.read()) != 44) {
        if(ch == -1)
            throw new IOException("EOF");
        // NOTE: conversion from byte to char here is iffy, this works for ISO8859-1/US-ASCII
        // but fails horribly for UTF etc.
        type.append((char) ch);
    }
    String data = type.toString();
    ...
}
Also, to make it more flexible (e.g. work with other character encodings), your method would do better to take an InputStreamReader that handles the conversion from bytes to characters for you (take a look at the InputStreamReader(InputStream, Charset) constructor's javadoc).

For this you can use an InputStreamReader, which can read encoded character data from a raw byte stream:
InputStreamReader reader = new InputStreamReader(is, "ISO-8859-1");
You may now use reader.read(), which will consume the correct number of bytes from is, decode as ISO-8859-1, and return a Unicode code point that can be correctly cast to a char.
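A minimal sketch of that approach, with a hypothetical readField helper standing in for the asker's Packet parsing: the Reader decodes the bytes, so the cast to char is applied to decoded UTF-16 values rather than raw bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReadUntilComma {
    // Reads characters up to (not including) the first ',' using the given charset.
    // The reader is not closed here, since that would also close 'is'.
    static String readField(InputStream is, Charset charset) throws IOException {
        Reader reader = new InputStreamReader(is, charset);
        StringBuilder field = new StringBuilder();
        int ch;
        while ((ch = reader.read()) != ',') {
            if (ch == -1) {
                throw new IOException("EOF");
            }
            field.append((char) ch);  // ch is already a decoded UTF-16 value
        }
        return field.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "caf\u00e9,rest".getBytes(StandardCharsets.ISO_8859_1);
        String field = readField(new ByteArrayInputStream(wire), StandardCharsets.ISO_8859_1);
        System.out.println(field.length());  // 4
    }
}
```

One caveat: InputStreamReader reads ahead into an internal buffer, so bytes after the comma may already have been pulled from is. If the rest of the packet is read later, it should come from the same reader, not from the underlying stream.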
Edit: Responding to comment about not using any "ready-to-use" classes:
I don't know if InputStreamReader counts. If it does, check out Durandal's answer, which is sufficient for certain single byte encodings (like US-ASCII, arguably, or ISO-8859-1).
For multibyte encodings, if you do not want to use any other classes, you would first buffer all data into a byte[] array, then construct a String from that.
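A sketch of the buffer-then-decode idea that reads the raw stream directly (the helper name is hypothetical; only java.util.Arrays and String are used to hold and decode the bytes). It assumes the delimiter byte 0x2C can never appear inside a multibyte sequence, which holds for UTF-8 and ISO-8859-1 but not for every encoding, UTF-16 included.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BufferThenDecode {
    // Collects raw bytes up to the first ',' (0x2C), then decodes them in one step.
    // Assumes 0x2C never occurs inside a multibyte sequence (true for UTF-8).
    static String readField(InputStream is, Charset charset) throws IOException {
        byte[] buf = new byte[16];
        int len = 0;
        int b;
        while ((b = is.read()) != ',') {
            if (b == -1) {
                throw new IOException("EOF");
            }
            if (len == buf.length) {
                buf = Arrays.copyOf(buf, len * 2);  // grow the buffer when full
            }
            buf[len++] = (byte) b;  // read() gave 0..255; keep its low 8 bits
        }
        return new String(buf, 0, len, charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "h\u00e9llo,world".getBytes(StandardCharsets.UTF_8);
        String field = readField(new ByteArrayInputStream(wire), StandardCharsets.UTF_8);
        System.out.println(field.length());  // 5: the two bytes of the accented e became one char
    }
}
```

Unlike the Reader approach, this consumes exactly one byte past the field (the comma), which can matter when the same socket stream is read further.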
Edit: Responding to a related question in the comments on Abhishek's answer.
Q:
Abhishek wrote: Can you please enlighten me a little more? i have tried casting integer ASCII to character..it has worked..can you kindly tell where did i go wrong?
A:
You didn't go "wrong", per se. The reason ASCII works is the same reason that Brian pointed out that ISO-8859-1 works. US-ASCII is a single byte encoding, and bytes 0x00-0x7f have the same value as their corresponding Unicode code points. So a cast to char is conceptually incorrect, but in practice, since the values are the same, it works. Same with ISO-8859-1; bytes 0x00-0xff have the same value as their corresponding code points in that encoding. A cast to char would not work in e.g. IBM01141 (a single byte encoding but with different values).
And, of course, a single byte to char cast would not work for multibyte encodings like UTF-16, as more than one input byte must be read (a variable number, in fact) to determine the correct value of a corresponding char.
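The multibyte failure can be demonstrated concretely. This sketch uses UTF-8 rather than UTF-16 (an assumption for illustration; both are multibyte): "é" encodes as the two bytes 0xC3 0xA9, and casting each byte to char yields two wrong characters, while ISO-8859-1 bytes cast correctly. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

class ByteCastPitfall {
    // Turns each byte into a char by casting, as type.append((char) ch) does.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));  // same 0..255 value read() would return
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String e = "\u00e9";
        byte[] utf8 = e.getBytes(StandardCharsets.UTF_8);         // 0xC3 0xA9
        byte[] latin1 = e.getBytes(StandardCharsets.ISO_8859_1);  // 0xE9
        System.out.println(castEachByte(utf8).equals(e));    // false: two wrong chars
        System.out.println(castEachByte(latin1).equals(e));  // true: byte value == code point
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(e));  // true
    }
}
```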

type += new String(String.valueOf((char) ch).getBytes("ISO-8859-1"), "ISO-8859-1"); // cast first: String.valueOf(int) would give the number's digits

Partial answer: try replacing:
type += new String(ch, "ISO-8859-1");
with
type += (char) ch;
This works if you receive the ASCII value of the char: the code converts ASCII into a char by casting.
It's better to avoid lengthy code, and this works just fine. The read() method comes in several forms:
One way is: int b = inpstr.read();
Second: inpstr.read(byte[] b)
So it's up to you which method you want to use; they serve different purposes.

Why doesn't StringReader.read() return a byte?

I was using StringReader in a Data Structures assignment (Huffman codes), and was testing if the end of the string had been reached. I found that the int value that StringReader.read() returns is not -1, but 65535, so casting the result to a byte solved my infinite loop problem I was having.
Is this a bug in JDK, or is it common practice to cast values returned from Reader.read() calls to bytes? Or am I missing something?
The gist of my code was something like this:
StringReader sr = new StringReader("This is a test string");
char c;
do {
    c = sr.read();
//} while (c != -1); //<--Broken
} while ((byte)c != -1); //<--Works

In fact that doesn't even compile. I get:
Type mismatch: cannot convert from int to char
Since the sr.read() call returns an int I suggest you store it as such.
This compiles (and works as expected):
StringReader sr = new StringReader("This is a test string");
int i; // <-- changed from char
while ((i = sr.read()) != -1) { // <-- works :-)
    // ... and if you need a char, cast only after the -1 check
    char c = (char) i;
}
Why doesn't StringReader.read() return a byte?
Strings are composed of 16-bit Unicode characters. These won't fit in an 8-bit byte. One could argue that a char would have been enough, but then there would be no room for an indication that EOF has been reached.

Characters in Java are 2 bytes because they're encoded in UTF-16. This is why read() returns an int: a byte is not large enough.

char c = (char) -1;
System.out.println(""+c);
System.out.println(""+(byte)c);
Running this code should clear up your doubt.

A Java String is a sequence of chars, which are not bytes but values that represent UTF-16 code units. The semantics of read is to return the next atom from the input stream. In the case of a StringReader, the atomic component is a 16-bit value, which cannot be represented as a single byte.

StringReader#read returns an int value which is -1 if the end of the stream has been reached.
The problem in your code is that you already convert the int value to a char and test the char:
System.out.println("Is it still (-1)?: " + (int) ((char) -1));
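The question's (byte) c workaround has a hidden flaw of its own: it stops at any char whose low eight bits are all ones, such as 'ÿ' (U+00FF), not only at end of stream. A sketch comparing it with the int check (hypothetical class and method names):

```java
import java.io.IOException;
import java.io.StringReader;

class ByteCheckPitfall {
    // Mimics the question's workaround: squeeze read() into a char, then test (byte) c.
    static int countWithByteCheck(String text) throws IOException {
        StringReader sr = new StringReader(text);
        int count = 0;
        while (true) {
            char c = (char) sr.read();
            if ((byte) c == -1) {
                break;  // true at EOF, but also for any char whose low byte is 0xFF
            }
            count++;
        }
        return count;
    }

    // Tests the int from read() before any narrowing.
    static int countWithIntCheck(String text) throws IOException {
        StringReader sr = new StringReader(text);
        int count = 0;
        while (sr.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String text = "ab\u00ffcd";  // contains U+00FF
        System.out.println(countWithByteCheck(text));  // 2: stopped early
        System.out.println(countWithIntCheck(text));   // 5
    }
}
```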
