Java integer is equal to character? - java

I apologize if this question is a bit simplistic, but I'm somewhat puzzled as to why my professor made the following statement:
Notice that read() returns an integer value. Using an int as a return type allows read() to use -1 to indicate that it has reached the end of the stream. You will recall from your introduction to Java that an int is equal to a char which makes the use of the -1 convenient.
The professor was referencing the following sample code:
public class CopyBytes {
public static void main(String[] args) throws IOException {
FileInputStream in = null;
FileOutputStream out = null;
try {
in = new FileInputStream("Independence.txt");
out = new FileOutputStream("Independence.txt");
int c;
while ((c = in.read()) != -1) {
out.write(c);
}
} finally {
if (in != null) {
in.close();
}
if (out != null) {
out.close();
}
}
}
}
This is an advanced Java course, so obviously I've taken a few introductory courses prior to this one. Maybe I'm just having a "blonde moment" of sorts, but I'm not understanding in what context an integer could be equal to a character when making comparisons. The instance method read() returns an integer value when it comes to EOF. That I understand perfectly.
Can anyone shed light on the statement in bold?

In Java, a char is essentially a narrower, unsigned integer type. I can write:
char c = 65;
The following code prints "A". The cast is needed so that Java prints the character representation and not the integer one.
public static void main(String... str) {
System.out.println((char) 65);
}
You can look up the int to character mapping in an ASCII table.
And per your teacher, int allows for more values. Since -1 isn't a character value, it can serve as a flag value.

To a computer a character is just a number (that may at some point be mapped to a picture of a letter for display to the user). Languages usually have a special character type to distinguish between "just a number" and "a number that refers to a character", but inside, it's still just some sort of integer.
The reason why read() returns an int is to have "one extra value" to represent EOF. All the values of char are already defined to mean something else, so it uses a larger type to get more values.

It means your professor has been spending too much time programming in C. The definition of read for InputStream (and FileInputStream) is:
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned.
(See http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#read())
A char in Java, on the other hand, represents a Unicode character, and is treated as an integer in the range 0 to 65535. (In C, a char is an 8-bit integral value, either 0 to 255 or -128 to 127.)
Please note that in Java, a byte is actually an integer in the range -128 to 127; but the definition of read has been specified to avoid the problem, by decreeing that it will return 0 to 255 anyway. The javadoc is using "byte" in a loose sense here.
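To make the 0 to 255 versus -1 distinction concrete, here is a minimal sketch. The class name ReadSentinelDemo and the helper readAll are made up for illustration, and a ByteArrayInputStream stands in for the FileInputStream from the question so that no file is needed.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;

class ReadSentinelDemo {
    // Collects every value returned by read(), including the final -1.
    // ByteArrayInputStream is used so the sketch needs no file on disk.
    static List<Integer> readAll(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        List<Integer> values = new ArrayList<>();
        int c;
        do {
            c = in.read();      // an int: 0..255 for data, -1 at end of stream
            values.add(c);
        } while (c != -1);
        return values;
    }

    public static void main(String[] args) {
        // 0xFF is -1 when stored in a Java byte, yet read() reports it as 255.
        byte[] data = { 65, (byte) 0xFF };
        System.out.println(readAll(data)); // prints [65, 255, -1]
    }
}
```

If read() returned a byte instead, the data byte 0xFF and the end-of-stream marker would both be -1 and could not be told apart.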

The char data type in Java is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).
The int data type in Java is a 32-bit signed two's complement integer. It has a minimum value of -2,147,483,648 and a maximum value of 2,147,483,647 (inclusive).
Since a char cannot be negative (it is a number between 0 and 65,535) and an int can be negative, the possible values returned from the method are -1 (to signify nothing left) to 65,535 (max value of a char).
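The -1 to 65,535 range describes character streams (java.io.Reader) rather than the byte stream in the question, whose read() stays within 0 to 255. A small sketch, assuming a StringReader as the source; the class name ReaderSentinelDemo and the helper readAll are invented for this example:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReaderSentinelDemo {
    // Collects every value returned by Reader.read(), including the final -1.
    static List<Integer> readAll(String text) {
        List<Integer> values = new ArrayList<>();
        try (Reader in = new StringReader(text)) {
            int c;
            do {
                c = in.read();  // an int: 0..65535 for a char, -1 at end of stream
                values.add(c);
            } while (c != -1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        // '\uFFFF' is the largest char value; it still differs from -1 as an int.
        System.out.println(readAll("A\uFFFF")); // prints [65, 65535, -1]
    }
}
```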

What your professor is referring to is the fact that characters are just integers used in a special context. If we ignore Unicode and other encoding types and focus on the old days of ASCII, there was an ASCII table (http://www.asciitable.com/). A string of characters is really just a sequence of integers, for example, TUV would be 84 followed by 85 followed by 86.
The 'char' type is an integer internally in the JVM and is more or less a hint that this integer should only be used in a character context.
You can even cast between them.
char a = (char) 65;
int i = (int) 'A';
Those two variables hold the same data in memory, but the compiler and JVM treat them slightly differently.
Because of this, read() returns an integer instead of char so as to allow a -1, which is not a valid character code. Values other than -1 can be cast to a char, while -1 indicates EOF.
Of course, Unicode changes all of this with multi-byte characters and code points. I'll leave that as an exercise to you.

I am not sure what the professor means, but what it all comes down to is that computers only understand 1s and 0s. We don't understand 1s and 0s all that well, so we use a code system: first Morse code, then ASCII, now UTF-16.
How large a number an int can hold varies from language to language (in Java it is fixed at 32 bits); in the real world integers are infinite, they just keep counting. A char also has a size: in UTF-16, let's just say it's 16 bits (I will let you read up on that). So both are fixed-size numbers, and in that sense, as the professor says, reading one char is much the same as reading one int.
By the way, to be strictly correct, the set of characters is open-ended as well: Chinese characters, French characters, and the character I just made up but can't post because it isn't supported. So think of the code system for int and char: the int -1 plays the role of the EOF "character" (EOF = end of file). Good luck, I hope this helped.
What I don't understand is reading and writing to the same file?

Related

Is there some sort of functionality in Java that converts a char into a bit?

I'm trying to find a way to convert a char (Precondition is the char can only be '0' or '1') into an actual bit in Java. I'm not sure if Java has some built-in functionality for this, or if there is an algorithm that can be implemented to do so.
I need to implement the following method:
public void writeBit(char bit) {
//PRE:bit == '0' || bit == '1'
try {
} catch (IOException e) {
System.out.println(e);
}
}
I cannot change the method structure in any way. I am implementing Huffman Encoding and have an array of Strings that represent the encodings for every character within an input file. For example, 'A' or array[65] contains the String: "01011". So if I see the letter A in my file, I need to use writeBit to write out A's respective String to a binary file. Every time I reach 8 bits (one byte) I will call writeByte to send those 8 bits to the binary file, then reset some sort of counter variable to 0 and continue.
What I'm stuck on is how I am supposed to convert the char bit into an actual bit, so that it can be properly written out to a binary file.
Java does not have a primitive data type representing a single bit. On many hardware architectures, it is not even possible to access memory with that granularity.
When you say "an actual bit", then, I can only assume that you mean an integer value that is either 0 or 1, as opposed to char values '0' and '1'. There are numerous ways to perform such a conversion, among them:
byte the_bit = (byte) (bit - '0');. This takes advantage of the fact that char is an integer type, and that the decimal digits zero and one are encoded in Java with consecutive character codes. The cast is needed because the subtraction produces an int.
byte the_bit = (byte) ((bit == '0') ? 0 : 1);. This just explicitly tests whether bit contains the value '0', evaluating to 0 if so or 1 if not.
It gets more complicated from there, for example:
byte the_bit = Byte.parseByte(String.valueOf(bit));. This converts the char to a string containing (only) that char, and then parses it as the string representation of a byte.
All of the above rely to one degree or another on the precondition given: that bit does not have any value other than '0' or '1'.
With that said, I think anything like this is probably the wrong approach for implementing a Huffman encoding, because Java Strings are an unlikely, very heavyweight, representation for the bit strings involved.
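As a rough sketch of how the '0'/'1' conversion could feed the "every 8 bits, write a byte" step from the question: the class BitPacker, its finish method, and the pack helper are all hypothetical, and a ByteArrayOutputStream stands in for the asker's binary file and writeByte, which are not shown.

```java
import java.io.ByteArrayOutputStream;

class BitPacker {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int current = 0; // bits collected so far, most significant first
    private int count = 0;   // how many bits are held in 'current'

    // PRE: bit == '0' || bit == '1'
    public void writeBit(char bit) {
        current = (current << 1) | (bit - '0'); // '0' -> 0, '1' -> 1
        count++;
        if (count == 8) {        // a full byte is ready
            out.write(current);
            current = 0;
            count = 0;
        }
    }

    // Pads a trailing partial byte with zero bits and returns all bytes written.
    public byte[] finish() {
        if (count > 0) {
            out.write(current << (8 - count));
            current = 0;
            count = 0;
        }
        return out.toByteArray();
    }

    // Convenience helper: packs a whole string of '0'/'1' characters.
    static byte[] pack(String bits) {
        BitPacker packer = new BitPacker();
        for (char bit : bits.toCharArray()) {
            packer.writeBit(bit);
        }
        return packer.finish();
    }

    public static void main(String[] args) {
        // "01011" (the code for 'A' in the question) followed by "010"
        System.out.println(pack("01011010")[0]); // prints 90
    }
}
```

Padding the last byte with zeros is only one possible convention; a real Huffman file format also has to record how many bits in the final byte are meaningful.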
You can use Integer.parseInt(String s, int radix) or Integer.parseUnsignedInt(String s, int radix) with radix 2 to convert a string of binary digits to a Java int.
public static void main(String[] args) {
int num = Integer.parseInt("101010", 2);
// print 42
System.out.println(num);
}
Conversely, with the method Integer.toBinaryString(int i) you can generate the binary string representation:
// print 101010
System.out.println(Integer.toBinaryString(42));
Similarly you can use Byte.parseByte(String s, int radix) to parse a byte:
public static void main(String[] args) {
byte num = Byte.parseByte("101010", 2);
// print 42
System.out.println(num);
}

convert extended ascii character into code in java

I am unable to convert extended ASCII characters (those with codes greater than 128) into their codes.
I am using (int)'�' for the conversion, but it is giving
� -- 65533
I am using the following java function:
static String decodeCandidateId2(String CandidateId ){
byte[] valueDecoded= Base64.decodeBase64(CandidateId.getBytes());
CandidateId=new String(valueDecoded);
String key="#!#&%$##&^%$";
String output="";
for(int i=0; i<CandidateId.length(); i++) {
int ascii=0;
ascii=(int)CandidateId.charAt(i)-((int)key.charAt((i-1) % key.length()));
output += Character.toString ((char) ascii);
}
return output;
}
if CandidateId = "VXJaWl9lVlV0XpRbZHVZWFpeXVuAV5NVW3BRW19XVQ=="
Current output = 12979#2248゚6#5854998ᄑ1゚07008921, but I need to get 12979#224866#5854998#1507008921 as output.
Can anyone please help me get the correct code?
Strictly speaking, there is no such thing as "Extended ASCII" - or there are several different colloquial definitions of this non-standard term. ASCII codes are from 0 to 127. Full stop.
The Java type char has values ranging from 0 to 65535, which are code points in the Basic Multilingual Plane of the Unicode character set.
Your encoding algorithm uses 16-bit subtraction. "Negative" values will be in the range 32768 to 65535 (since char values are unsigned). However, you seem to want to only deal with values in the range 0 to 255. To do that, you can force your arithmetic to be modulo 256 - e.g. by ANDing the result with 0xFF.
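A small sketch of the modulo-256 idea, with a made-up helper name (subtractMod256) and sample characters chosen only for illustration. Whether this alone produces the expected output for the asker's data is not established here.

```java
class Mod256Demo {
    // Subtracts key from value and keeps only the low 8 bits (0..255).
    static int subtractMod256(char value, char key) {
        return (value - key) & 0xFF;
    }

    public static void main(String[] args) {
        // '!' is 33 and '#' is 35, so the plain difference is -2.
        System.out.println((int) (char) ('!' - '#'));  // prints 65534 (16-bit wraparound)
        System.out.println(subtractMod256('!', '#'));  // prints 254 (8-bit wraparound)
        System.out.println((char) subtractMod256('V', '#')); // prints 3
    }
}
```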

Why a character Array accepts integer values in Java?

I'm a beginner in Java; previously I learned C. Please take a look at the following code segments.
char Character;
int Number = 27;
Character = Number;
System.out.println(Character);
The above code does not compile; the error is reported as "Loss of Information".
Whereas the following code...
char Character = 'F';
int Number;
Number = Character;
System.out.println(Number);
The above code compiles, but the output is "70", not "F".
Also take a look on the following code...
char [] arrayCh = new char [3];
arrayCh [0] = 27;
System.out.println(arrayCh[0]);
The above code compiles, but it prints an unfamiliar symbol...
I know about ASCII values and about the memory sizes: 'char' takes 16 bits and 'int' takes 32 bits. Therefore an integer value can't be assigned to a character variable, whereas a character value can be assigned to an integer variable as its "ASCII" value.
My question is: why does a 'char' array accept 'int' values? Could anyone explain?
A char is a 2-byte unsigned integer. 27 is an integer literal that is in the range of a char, so the compiler lets you assign it to a char. An int variable, by contrast, is not a compile-time constant, so assigning it to a char needs an explicit cast.
'F' is a character literal that represents the character F, which has the decimal value 70 in the Unicode standard. So, assigning 'F' to an integer is the same thing as assigning 70.
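The cases from the question can be put side by side in a short sketch. The class name CharAssignmentDemo and the helpers toInt and toChar are invented for this example.

```java
class CharAssignmentDemo {
    // Widening char -> int is implicit.
    static int toInt(char c) {
        return c;
    }

    // Narrowing int -> char needs an explicit cast when the value
    // is not a compile-time constant.
    static char toChar(int n) {
        return (char) n;
    }

    public static void main(String[] args) {
        char fromLiteral = 27;         // compiles: 27 is a constant that fits in a char
        final int constant = 70;
        char fromConstant = constant;  // compiles: a final int with a constant initializer
        int number = 70;
        // char broken = number;       // would not compile: number is not a constant
        char fromVariable = toChar(number);
        char[] arrayCh = new char[3];
        arrayCh[0] = 27;               // same rule as fromLiteral

        System.out.println(fromConstant);       // prints F
        System.out.println(fromVariable);       // prints F
        System.out.println(toInt('F'));         // prints 70
        System.out.println((int) arrayCh[0]);   // prints 27
        System.out.println((int) fromLiteral);  // prints 27
    }
}
```

Printing arrayCh[0] without the int cast emits the control character with code 27 (ESC), which is the "unfamiliar symbol" from the question.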

Converting a int to char and then back to int - doesn't give same result always

I am trying to get a char from an int value > 0xFFFF. But instead, I always get back the same char value, that when cast to an int, prints the value 65535 (0xFFFF).
I couldn't understand why it is generating symbols for unicode > 0xFFFF.
int hex = 0x10FFFF;
char c = (char)hex;
System.out.println((int)c);
I expected the output to be 0x10FFFF. Instead, the output comes back as 65535.
This is because, while an int is 4 bytes, a char is only 2 bytes. Thus, you can't represent all values in a char that you can in an int. Using a standard unsigned integer representation, you can only represent the range of values from 0 to 2^16 - 1 == 65535 in a 2-byte value, so if you convert any number outside that range to a 2-byte value and back, you'll lose data.
An int is 4 bytes; a char is 2 bytes.
Your number is well within the range an int can hold, but outside the range a char can hold.
So when you cast that number to a char, the high-order bits were discarded and only the low 16 bits survived. For 0x10FFFF those low 16 bits are 0xFFFF, which is why it printed 65535.
Your number was too big to be a char, which is 2 bytes, but small enough to fit in an int, which is 4 bytes. The cast does not clamp to the largest char value; it keeps the low 16 bits, and for 0x10FFFF those happen to be 0xFFFF (65535), which is also the largest value a char can hold. If a char were big enough to hold your number, converting it back to an int would have returned 1114111, the decimal value of 0x10FFFF.
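A quick way to see what the cast does with out-of-range values is to try a few beyond the one in the question. This is only a sketch; the class NarrowingDemo and the helper roundTrip are made-up names.

```java
class NarrowingDemo {
    // Casts an int to char and widens it back to int.
    static int roundTrip(int value) {
        return (char) value; // the cast keeps only the low 16 bits
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(0x10FFFF)); // prints 65535 (low 16 bits are 0xFFFF)
        System.out.println(roundTrip(0x12345));  // prints 9029  (low 16 bits are 0x2345)
        System.out.println(roundTrip(0x10041));  // prints 65    (low 16 bits are 0x0041, 'A')
        System.out.println(roundTrip(-1));       // prints 65535
    }
}
```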
Unfortunately, I think you were expecting a Java char to be the same thing as a Unicode code point. They are not the same thing.
The Java char, as already expressed by other answers, can only support code points that can be represented in 16 bits, whereas Unicode needs 21 bits to support all code points.
In other words, a Java char on its own, only supports Basic Multilingual Plane characters (code points <= 0xFFFF). In Java, if you want to represent a Unicode code point that is in one of the extended planes (code points > 0xFFFF), then you need surrogate characters, or a pair of characters to do that. This is how UTF-16 works. And, internally, this is how Java strings work as well. Just for fun, run the following snippet to see how a single Unicode code point is actually represented by 2 characters if the code point is > 0xFFFF:
// Printing string length for a string with
// a single unicode code point: 0x22BED.
System.out.println("𢯭".length()); // prints 2, because it uses a surrogate pair.
If you want to safely convert an int value that represents a Unicode code point to a char (or chars to be more exact), and then convert it back to an int code point, you will have to use code like this:
public static void main(String[] args) {
int hex = 0x10FFFF;
System.out.println(Character.isSupplementaryCodePoint(hex)); // prints true because hex > 0xFFFF
char[] surrogateChars = Character.toChars(hex);
int codePointConvertedBack = Character.codePointAt(surrogateChars, 0);
System.out.println(codePointConvertedBack); // prints 1114111
}
Alternatively, instead of manipulating char arrays, you can use a String, like this:
public static void main(String[] args) {
int hex = 0x10FFFF;
System.out.println(Character.isSupplementaryCodePoint(hex)); // prints true because hex > 0xFFFF
String s = new String(new int[] {hex}, 0, 1);
int codePointConvertedBack = s.codePointAt(0);
System.out.println(codePointConvertedBack); // prints 1114111
}
For further reading: Java Character Class

parseInt on a string of 8 bits returns a negative value when the first bit is 1

I've got a huge string of bits (with some \n in it too) that I pass as a parameter to a method, which should isolate the bits 8 by 8, and convert them all to bytes using parseInt().
Thing is, every time the substring of 8 bits starts with a 1, the resulting byte is a negative number. For example, the first substring is '10001101', and the resulting byte is -115. I can't seem to figure out why, can someone help? It works fine with other substrings.
Here's my code, if needed :
static String bitsToBytes(String geneString) {
String geneString_temp = "", sub;
for(int i = 0; i < geneString.length(); i = i+8) {
sub = geneString.substring(i, i+8);
if (sub.indexOf("\n") != -1) {
if (sub.indexOf("\n") != geneString.length())
sub = sub.substring(0, sub.indexOf("\n")) + sub.substring(sub.indexOf("\n")+1, sub.length()) + geneString.charAt(i+9);
}
byte octet = (byte) Integer.parseInt(sub, 2);
System.out.println(octet);
geneString_temp = geneString_temp + octet;
}
geneString = geneString_temp + "\n";
return geneString;
}
In Java, byte is a signed type, meaning that when the most significant bit is set to 1, the number is interpreted as negative.
This is precisely what happens when you print your byte here:
System.out.println(octet);
Since PrintStream does not have an overload of println that takes a single byte, the overload that takes an int gets called. Since octet's most significant bit is set to 1, the number gets sign-extended by replicating its sign bit into bits 9..32, resulting in printout of a negative number.
byte is a signed two's complement integer. So this is a normal behavior: the two's complement representation of a negative number has a 1 in the most-significant bit. You could think of it like a sign bit.
If you don't like this, you can use the following idiom:
System.out.println( octet & 0xFF );
This will pass the byte as an int while preventing sign extension. You'll get an output as if it were unsigned.
Java has no unsigned byte type (char is the only unsigned integral primitive), so the only other thing you could do is store the numbers in a wider representation, e.g. short.
In Java, byte, short, int, and long are all signed, and the most significant bit is the sign bit.
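Note that parseInt itself is not the culprit: Integer.parseInt("10001101", 2) returns 141. The value becomes negative only when it is cast to byte, so switching to parseUnsignedInt would not change the result here. Keep the value in an int, or mask it with & 0xFF when you print it.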
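Putting the pieces from these answers together, a minimal sketch using the substring from the question. The class UnsignedByteDemo and the helpers toByte and unsigned are hypothetical names; Byte.toUnsignedInt requires Java 8 or later.

```java
class UnsignedByteDemo {
    // Parses a string of binary digits and narrows the result to a byte.
    static byte toByte(String bits) {
        return (byte) Integer.parseInt(bits, 2);
    }

    // Widens a byte to int without sign extension.
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(Integer.parseInt("10001101", 2)); // prints 141
        byte octet = toByte("10001101");
        System.out.println(octet);                     // prints -115
        System.out.println(unsigned(octet));           // prints 141
        System.out.println(Byte.toUnsignedInt(octet)); // prints 141
    }
}
```

The bits stored in the byte are the same in every case; only the interpretation during widening to int differs.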
