The following code within a program allows 90 to be assigned to the variable 'ch'. 'Z' is then printed to the console.
char ch;
ch = 90;
System.out.println(ch);
However, the following code from the same program does not compile. If reading into ch requires a cast to char, i.e. ch = (char) System.in.read();, then why is no cast needed when 90 is assigned to ch above? Why doesn't it have to be ch = (char) 90?
char ch;
ch = System.in.read();
The compiler knows that 90 is a valid value for char. However, System.in.read() can return any int, which may be out of the valid range for chars.
If you change 90 to 90000, the code won't compile:
char ch;
ch = 90000;
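A minimal sketch of the rule described above, for anyone who wants to try it: a constant that fits in a char is accepted without a cast, while an arbitrary int needs an explicit cast and keeps only its low 16 bits. The class and method names here are made up for illustration.

```java
// Hypothetical demo class; the names are illustrative only.
final class NarrowingDemo {
    // A constant int expression in the range 0..65535 can be assigned to a char without a cast.
    static char fromConstant() {
        char ch = 90;
        return ch;
    }

    // A non-constant int needs an explicit cast, which keeps only the low 16 bits.
    static char fromInt(int value) {
        return (char) value;
    }

    public static void main(String[] args) {
        System.out.println(fromConstant());        // prints Z
        System.out.println((int) fromInt(90000));  // prints 24464 (90000 - 65536)
        System.out.println((int) fromInt(-1));     // prints 65535
    }
}
```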
Whenever you are dealing with system I/O you need to ensure that you can handle all valid byte input values. However, you then need a mechanism to indicate that the stream has been exhausted. The javadoc for InputStream.read() (System.in is a global InputStream) says (emphasis added),
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned.
If you were to cast -1 to char you would get 65535 because char is unsigned. With byte, it's even worse in that -1 is a valid value. Regardless, you aren't reading char values; you are reading byte(s) encoded as int in the range 0-255 plus -1 to indicate end of stream. If you want char(s) I suggest you look at an InputStreamReader which is described in the javadoc as
An InputStreamReader is a bridge from byte streams to character streams
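Here is a rough sketch of that suggestion, assuming the input is UTF-8 (the charset is my assumption, not something the question states). The value returned by read() is kept in an int, compared against -1, and only then cast to char. The class and method names are hypothetical, and a ByteArrayInputStream stands in for System.in so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
final class ReadCharsDemo {
    // Decodes the whole stream as UTF-8 (an assumed encoding) and returns the text.
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int c; // int, so that -1 (end of stream) stays distinct from every char value
            while ((c = reader.read()) != -1) {
                sb.append((char) c); // safe: c is now known to be in 0..65535
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "Z\u00e9".getBytes(StandardCharsets.UTF_8); // 3 bytes, 2 chars
        System.out.println(readAll(new ByteArrayInputStream(data)).length()); // prints 2
    }
}
```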
I am unable to convert extended ASCII characters (those with a code greater than 128) into their codes.
I am using (int)'�' for the conversion but it is giving
� -- 65533
I am using the following java function:
static String decodeCandidateId2(String CandidateId ){
byte[] valueDecoded= Base64.decodeBase64(CandidateId.getBytes());
CandidateId=new String(valueDecoded);
String key="#!#&%$##&^%$";
String output="";
for(int i=0; i<CandidateId.length(); i++) {
int ascii=0;
ascii=(int)CandidateId.charAt(i)-((int)key.charAt((i-1) % key.length()));
output += Character.toString ((char) ascii);
}
return output;
}
If CandidateId = "VXJaWl9lVlV0XpRbZHVZWFpeXVuAV5NVW3BRW19XVQ=="
the current output is 12979#2248゚6#5854998ᄑ1゚07008921, but I need to get 12979#224866#5854998#1507008921 as output.
Can anyone please help me get the correct code?
Strictly speaking, there is no such thing as "Extended ASCII" - or there are several different colloquial definitions of this non-standard term. ASCII codes are from 0 to 127. Full stop.
The Java type char has values ranging from 0 to 65535, which are code points in the Basic Multilingual Plane of the Unicode character set.
Your encoding algorithm uses 16-bit subtraction. "Negative" values will be in the range 32768 to 65535 (since char values are unsigned). However, you seem to want to only deal with values in the range 0 to 255. To do that, you can force your arithmetic to be modulo 256 - e.g. by ANDing the result with 0xFF.
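To make the wraparound concrete, here is a small sketch comparing plain 16-bit char subtraction with the masked version. It only illustrates the arithmetic and is not a fix for the whole decoding function; the class and method names are hypothetical.

```java
// Hypothetical demo class; the names are illustrative only.
final class Mod256Demo {
    // Plain subtraction: a negative difference wraps around into the top of the 16-bit char range.
    static char subtract16(char value, char key) {
        return (char) (value - key);
    }

    // Masking with 0xFF keeps only the low 8 bits, i.e. arithmetic modulo 256.
    static char subtractMod256(char value, char key) {
        return (char) ((value - key) & 0xFF);
    }

    public static void main(String[] args) {
        // '#' is 35 and '&' is 38, so the difference is -3.
        System.out.println((int) subtract16('#', '&'));     // prints 65533
        System.out.println((int) subtractMod256('#', '&')); // prints 253
    }
}
```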
The problem I am facing occurs when I try to type cast some ASCII values to char.
For example:
(char)145 //returns ?
(char)129 //also returns ?
but it is supposed to return a different character. It happens to many other values as well.
I hope I have been clear enough.
ASCII is a 7-bit encoding system. Some programs even use this to detect whether a file is binary or textual. Characters below 32 are control characters and are used as directives (for instance new lines, print command).
The program, however, will still work. A char is simply stored as an unsigned 16-bit value, but the values in that range have no printable interpretation, so printing either value shows nothing useful. On the other hand, comparisons like (char) 145 == (char) 129 will still work (and return false), simply because to the processor a char is just a 16-bit integer.
If you are interested in converting your value such that only the lowest seven bits count (thus modifying the value so that it is in the valid ASCII range), you can use masking:
int value = 145;
value &= 0x7f;
char c = (char) value;
The char type is a 16-bit UTF-16 code unit, so you could write (char) 265 for c-with-circumflex. ASCII is 7 bits, 0 to 127.
String s = "" + ((char)145) + ((char)129);
The above is a string of two Unicode characters (each 2 bytes, UTF-16).
byte[] ascii = s.getBytes(StandardCharsets.US_ASCII); // 7-bit ASCII; both chars become '?'
String fromAscii = new String(ascii, StandardCharsets.US_ASCII); // "??"
byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1); // ISO-8859-1 (Latin-1)
byte[] cp1252 = s.getBytes(Charset.forName("windows-1252")); // Windows Latin-1
byte[] utf8 = s.getBytes(StandardCharsets.UTF_8); // No information loss.
String fromUtf8 = new String(utf8, StandardCharsets.UTF_8); // Equals the original string.
In Java, String/char/Reader/Writer handle text (in Unicode), whereas byte[]/InputStream/OutputStream handle binary data, i.e. bytes.
Bytes must always be associated with an encoding to give text.
Answer: as soon as text is converted to an encoding that cannot represent a given char, a question mark may be written in its place.
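The substitution is easy to reproduce with a round trip through different charsets. This is only a sketch with made-up class and method names; it relies on String.getBytes(Charset) replacing unmappable characters with the charset's replacement byte, which is '?' for US-ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
final class EncodingRoundTrip {
    // Encodes the text with the given charset, then decodes the bytes with the same charset.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String s = "" + (char) 145 + (char) 129;
        System.out.println(roundTrip(s, StandardCharsets.US_ASCII));            // prints ??
        System.out.println(roundTrip(s, StandardCharsets.UTF_8).equals(s));     // prints true
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);          // prints 4
    }
}
```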
These expressions evaluate to true:
((char) 145) == '\u0091';
((char) 129) == '\u0081';
These UTF-16 values map to the Unicode code points U+0091 and U+0081:
0091;<control>;Cc;0;BN;;;;;N;PRIVATE USE ONE;;;;
0081;<control>;Cc;0;BN;;;;;N;;;;;
These are both control characters without visible graphemes (the question mark acts as a substitution character) and one of them is private use so has no designated purpose. Neither is in the ASCII set.
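If you want to check this from code rather than from the Unicode data file, the standard Character class reports the general category. A small sketch, with hypothetical class and method names:

```java
// Hypothetical demo class; the names are illustrative only.
final class ControlCharDemo {
    // True when the value, cast to char, falls in Unicode general category Cc (control).
    static boolean isControl(int value) {
        return Character.getType((char) value) == Character.CONTROL;
    }

    public static void main(String[] args) {
        System.out.println(isControl(145)); // prints true
        System.out.println(isControl(129)); // prints true
        System.out.println(isControl(90));  // prints false ('Z' is an uppercase letter)
    }
}
```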
I notice that the following line of code appears a lot. (For example on this website.)
char ch = (char) System.in.read(); // uses a char, and requires a cast.
Now, to test for a particular character keystroke, an ASCII value, an escape sequence, etc.:
if (ch == 'a' || ch == 65 || ch == '\n' || ch == 13) System.out.print("true");
Does using a char as above provide any benefit over the following line of code, which uses an int?
int i = System.in.read(); // uses an int, which requires no cast.
The int variable "i" can be used in the same if statement as previously shown, above.
Neither approach is correct. The correct way to read characters from System.in is to use an InputStreamReader (or a Scanner if that provides the right functionality). The reason is that InputStream.read() reads a single byte, not characters, and some characters require reading more than one byte. You can also specify the character encoding to be used when converting bytes to characters.
Reader rdr = new InputStreamReader(System.in);
int i = rdr.read();
if (i == -1) {
// end of input
} else {
// normal processing; safe to cast i to char if convenient
}
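To see why the byte/character distinction matters, here is a sketch that reads the same data both ways. It assumes UTF-8 input (my assumption), uses a ByteArrayInputStream in place of System.in, and all names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
final class BytesVsCharsDemo {
    // Counts how many times InputStream.read() returns a value before -1.
    static int countBytes(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    // Counts how many chars a UTF-8 InputStreamReader produces from the same bytes.
    static int countChars(byte[] data) {
        int count = 0;
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            while (reader.read() != -1) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] data = "\u00e9".getBytes(StandardCharsets.UTF_8); // e-acute: one char, two bytes
        System.out.println(countBytes(data)); // prints 2
        System.out.println(countChars(data)); // prints 1
    }
}
```

Casting each byte from the first loop to char would yield two wrong characters instead of one correct one.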
There is no reason for the cast at all. This is fine:
int i = System.in.read();
if(i == 'a'){
// do something
}
You can do this, because 'a' is a value within the range of an int.
Also, be aware that doing the cast directly to a char may be problematic when reading files and such, because what InputStream.read() does is read a byte not a char. A char is two bytes wide.
I apologize if this question is a bit simplistic, but I'm somewhat puzzled as to why my professor has made the following the statement:
Notice that read() returns an integer value. Using an int as a return type allows read() to use -1 to indicate that it has reached the end of the stream. You will recall from your introduction to Java that an int is equal to a char which makes the use of the -1 convenient.
The professor was referencing the following sample code:
public class CopyBytes {
public static void main(String[] args) throws IOException {
FileInputStream in = null;
FileOutputStream out = null;
try {
in = new FileInputStream("Independence.txt");
out = new FileOutputStream("Independence.txt");
int c;
while ((c = in.read()) != -1) {
out.write(c);
}
} finally {
if (in != null) {
in.close();
}
if (out != null) {
out.close();
}
}
}
}
This is an advanced Java course, so obviously I've taken a few introductory courses prior to this one. Maybe I'm just having a "blonde moment" of sorts, but I'm not understanding in what context an integer could be equal to a character when making comparisons. The instance method read() returns an integer value when it comes to EOF. That I understand perfectly.
Can anyone shed light on the statement in bold?
In Java, char is effectively a narrower integer type than int. I can write:
char c = 65;
The following code prints "A". The cast is needed there so Java knows I want the character representation and not the integer one.
public static void main(String... str) {
System.out.println((char) 65);
}
You can look up the int to character mapping in an ASCII table.
And per your teacher, int allows for more values. Since -1 isn't a character value, it can serve as a flag value.
To a computer a character is just a number (that may at some point be mapped to a picture of a letter for display to the user). Languages usually have a special character type to distinguish between "just a number" and "a number that refers to a character", but inside, it's still just some sort of integer.
The reason why read() returns an int is to have "one extra value" to represent EOF. All the values of char are already defined to mean something else, so it uses a larger type to get more values.
It means your professor has been spending too much time programming in C. The definition of read for InputStream (and FileInputStream) is:
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned.
(See http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#read())
A char in Java, on the other hand, represents a Unicode character, and is treated as an integer in the range 0 to 65535. (In C, a char is an 8-bit integral value, either 0 to 255 or -128 to 127.)
Please note that in Java, a byte is actually an integer in the range -128 to 127; but the definition of read has been specified to avoid the problem, by decreeing that it will return 0 to 255 anyway. The javadoc is using "byte" in a loose sense here.
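A short sketch of that point, with hypothetical names: the Java byte 0xFF prints as -1, but read() hands it back as 255, so -1 is left free to mean end of stream.

```java
import java.io.ByteArrayInputStream;

// Hypothetical demo class; the names are illustrative only.
final class UnsignedByteDemo {
    // Reads a one-byte stream twice: first the data byte, then the end-of-stream marker.
    static int[] readTwice(byte b) {
        ByteArrayInputStream in = new ByteArrayInputStream(new byte[] { b });
        return new int[] { in.read(), in.read() };
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;
        System.out.println(b);          // prints -1 (Java bytes are signed)
        System.out.println(b & 0xFF);   // prints 255 (the unsigned view)
        int[] result = readTwice(b);
        System.out.println(result[0]);  // prints 255
        System.out.println(result[1]);  // prints -1
    }
}
```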
The char data type in Java is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).
The int data type in Java is a 32-bit signed two's complement integer. It has a minimum value of -2,147,483,648 and a maximum value of 2,147,483,647 (inclusive).
Since a char cannot be negative (it is a number between 0 and 65,535) and an int can be, -1 is available to signify that nothing is left. For InputStream.read() the possible return values are -1 to 255; for Reader.read(), which returns chars, they are -1 to 65,535 (the max value of a char).
What your professor is referring to is the fact that characters are just integers used in a special context. If we ignore Unicode and other encoding types and focus on the old days of ASCII, there was an ASCII table (http://www.asciitable.com/). A string of characters is really just a sequence of integers, for example, TUV would be 84 followed by 85 followed by 86.
The 'char' type is an integer internally in the JVM and is more or less a hint that this integer should only be used in a character context.
You can even cast between them.
char a = (char) 65;
int i = (int) 'A';
Those two variables hold the same data in memory, but the compiler and JVM treat them slightly differently.
Because of this, read() returns an integer instead of char so as to allow a -1, which is not a valid character code. Values other than -1 can be cast to a char, while -1 indicates EOF.
Of course, Unicode changes all of this with multi-byte character and code points. I'll leave that as an exercise to you.
I am not sure what the professor means, but it comes down to this: computers only understand 1s and 0s, and we don't read 1s and 0s all that well, so we use code systems: first Morse code, then ASCII, now UTF-16. In the real world integers are infinite; they just keep counting. On a computer an int has a fixed size (32 bits in Java), and a char has a fixed size too (16 bits in Java, as a UTF-16 code unit; I will let you read up on that). Every char value fits in an int, so in that sense reading one char is the same as reading one int, which may be what the professor means by calling them equal. By the way, strictly speaking the set of characters is open-ended as well: Chinese characters, French characters, and the character I just made up but can't post because it's not supported. So think of the code systems for int and char: the int -1 stands for EOF (end of file). Good luck, I hope this helped. What I don't understand is why the sample reads from and writes to the same file.
I was using StringReader in a Data Structures assignment (Huffman codes), and was testing whether the end of the string had been reached. I found that the int value that StringReader.read() returns is not -1, but 65535, so casting the result to a byte solved the infinite loop problem I was having.
Is this a bug in the JDK, or is it common practice to cast values returned from Reader.read() calls to bytes? Or am I missing something?
The gist of my code was something like this:
StringReader sr = new StringReader("This is a test string");
char c;
do {
c = sr.read();
//} while (c != -1); //<--Broken
} while ((byte)c != -1); //<--Works
In fact that doesn't even compile. I get:
Type mismatch: cannot convert from int to char
Since the sr.read() call returns an int I suggest you store it as such.
This compiles (and works as expected):
StringReader sr = new StringReader("This is a test string");
int i; // <-- changed from char
do {
i = sr.read();
// ... and if you need a char...
char c = (char) i;
} while (i != -1); // <-- works :-)
Why doesn't StringReader.read() return a byte?
Strings are composed of 16-bit Unicode characters. These won't fit in an 8-bit byte. One could argue that a char would have been enough, but then there is no room for providing an indication that the EOF is reached.
Characters in Java are 2 bytes because they're encoded in UTF-16. This is why read() returns an int: a byte is not large enough, and a char would leave no spare value for EOF.
char c = (char) -1;
System.out.println(""+c);
System.out.println(""+(byte)c);
This code should clear up your doubt.
A Java String is a sequence of chars, which are not bytes but values that represent UTF-16 code units. The semantics of read is to return the next atom from the input stream. In the case of a StringReader the atomic component is a 16-bit value, which cannot be represented as a single byte.
StringReader#read returns an int value which is -1 if the end of the stream has been reached.
The problem in your code is that you already convert the int value to a char and test the char:
System.out.println("Is it still (-1)?: " + (int) ((char) -1));
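To round this off, here is a sketch of why the (byte) cast only appeared to work. Truncating 65535 to a byte does give -1, but so does truncating any char whose low byte is 0xFF, such as U+00FF, so that loop would stop early on some strings. Keeping the result in an int avoids the problem. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; the names are illustrative only.
final class EofDemo {
    // Counts chars by keeping read()'s result as an int and comparing it with -1.
    static int countChars(String text) {
        StringReader sr = new StringReader(text);
        int count = 0;
        try {
            int i;
            while ((i = sr.read()) != -1) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    // True when truncating the char to a byte yields -1, i.e. when its low 8 bits are all ones.
    static boolean byteCastLooksLikeEof(char c) {
        return (byte) c == -1;
    }

    public static void main(String[] args) {
        System.out.println(countChars("This is a test string"));    // prints 21
        System.out.println(byteCastLooksLikeEof((char) -1));        // prints true
        System.out.println(byteCastLooksLikeEof('\u00ff'));         // prints true (a false EOF)
        System.out.println(byteCastLooksLikeEof('a'));              // prints false
    }
}
```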