I'm looking for an explanation of Java's behavior when handling the following scenarios. I understand that the ASCII table is arranged so that the value of the character '5' is five positions greater than that of '0'. This allows calculations to be done on the chars without converting to an int, as seen in the first example.
What I don't understand is why Java seems to be inconsistent about when it uses a char's ASCII value and when it does calculations on the chars as though they were plain integers.
int x = '5' - '0';
output x = 5;
int x = '5'
output x = 53;
Now for some examples, that introduce confusion.
int x = '0' + 1 - '5'
output x = -4
int y = '5' - '0' + '1'
output 54
int y = '5' - 0 + '1'
output 102
Java seems to be doing an implicit type conversion, but how does Java infer which representation of the int/char it should be using?
Just write out each char's conversion to its ASCII code (shown below your statements):
int x = '0' + 1 - '5'
48 + 1 - 53 = -4
int y = '5' - 0 + '1'
53 - 0 + 49 = 102
int y = '5' - '0' + '1'
53 - 48 + 49 = 54
Notice it is consistent: each int remains an int, and each char is converted to its ASCII code.
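If you want to check this quickly, printing the expressions directly shows the same numbers (each char operand is promoted to its code before the arithmetic):
System.out.println('0' + 1 - '5');   // 48 + 1 - 53  = -4
System.out.println('5' - 0 + '1');   // 53 - 0  + 49 = 102
System.out.println('5' - '0' + '1'); // 53 - 48 + 49 = 54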
This might seem inconsistent, but in reality it is consistent.
int x = '5' - '0';
output x = 5; because behind the scenes the ASCII codes are '5' = 53 and '0' = 48.
Hence
int x = '5'
output x = 53;
You might be confusing the representation with the value. The values never change, so when you perform arithmetic it will always be that '5' == 53 and not 5. For how values are displayed, see the JLS on primitive-to-String conversion.
Arithmetic on integral types is carried out in int for most calculations, so the char operands are promoted to int.
System.out.println('5' + '0');
>>> 101
System.out.println((char)('5' + '0'));
>>> e
Both results have the same numeric value, but one is displayed as a character because it has been cast to character.
Java seems to be doing an implicit type conversion, but how does Java infer which representation of the int/char it should be using?
It's actually quite simple. char is one of the numeric types in Java, see 4.2.1. Integral Types and Values:
The values of the integral types are integers in the following ranges:
[...]
For char, from '\u0000' to '\uffff' inclusive, that is, from 0 to 65535
All operations on integer types are carried out either with int- or long- precision, see 4.2.2. Integer Operations:
If an integer operator other than a shift operator has at least one operand of type long, then the operation is carried out using 64-bit precision, and the result of the numerical operator is of type long. If the other operand is not long, it is first widened (§5.1.5) to type long by numeric promotion (§5.6).
Otherwise, the operation is carried out using 32-bit precision, and the result of the numerical operator is of type int. If either operand is not an int, it is first widened to type int by numeric promotion.
Note the last sentence: this defines the conversion to be applied; it is called "numeric promotion".
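A minimal sketch of what that promotion means in practice (the variable names are just for illustration): the result of char minus char has static type int, so assigning it to a char needs a cast unless the expression is a compile-time constant.
char a = '5', b = '0';
int diff = a - b;                                  // binary numeric promotion: char - char -> int
System.out.println(diff);                          // 5
System.out.println(((Object) (a - b)).getClass()); // class java.lang.Integer (the result is an int)
// char c = a - b;         // does not compile: possible lossy conversion from int to char
char c = (char) (a - b);   // an explicit narrowing cast is required here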
char '0' does not equal int 0. The binary representation of char '0' occupies 16 bits:
0000 0000 0011 0000
while the binary representation of int 0 occupies 32 bits:
0000 0000 0000 0000 0000 0000 0000 0000
When you sum a char and an int, the char will be promoted to int first.
For example, the Unicode code point of char '5' is U+0035, in binary 0000 0000 0011 0101. It is promoted to int by prepending 16 zero bits, giving 0000 0000 0000 0000 0000 0000 0011 0101, which is the int 53 in decimal.
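A quick way to see that promotion yourself (a small sketch, just printing the values):
char five = '5';
System.out.println((int) five);                    // 53
System.out.println(Integer.toBinaryString(five));  // 110101 (leading zeros omitted)
System.out.println(five + 0);                      // 53: the char is promoted to int before the addition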
Related
I have a Python background and I don't understand how byte casting returns the decimal value of a char according to ASCII.
Here are some code examples:
// C#
string s = "abc123éé";
int[] amount = new int[255];
for (int i = 0; i < s.Length; i++){
    amount[(byte)s[i] - (byte)'0']++;
}
Looking at the first iteration, the cast is applied to the char 'a' and it returns 97.
// Java
char a = 'a';
System.out.println((byte)a);
The same happens in Java; it returns 97 too. But in Python 3, it does not return the decimal value of the char.
>>> a = bytes("a", encoding="utf-8")
>>> a
b'a'
And now if we're coming to my questions:
How / why does byte casting work like this?
I know that byte's value range is -128 to 127 but char's is 0 to 255. Why doesn't it give an exception even though 'é' has the value 233?
What's the difference with Python at this point?
Answering only for Java; I do not use Python:
How / why does byte casting work like this?
It is specified by the Java Language Specification, mostly JLS-5.1.3: "...A narrowing conversion of a char to an integral type T likewise simply discards all but the n lowest order bits, where n is the number of bits used to represent type T. In addition to a possible loss of information about the magnitude of the numeric value, this may cause the resulting value to be a negative number, even though chars represent 16-bit unsigned integer values..."
("Why?" because it is so specified)
I know that byte's value range is -128 to 127 but char's is 0 to 255. Why doesn't it give an exception even though 'é' has the value 233?
Wrong, chars are 0 to 65535 (or '\u0000' to '\uFFFF'), see JLS-4.2.1
No reason for an exception; it will result in the byte value -23 (same low-order bits as 'é', i.e. int 233)
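A small sketch of that narrowing conversion (the variable names are only for the example):
char e = 'é';                 // code point 233 (0x00E9)
byte b = (byte) e;            // narrowing keeps only the lowest 8 bits
System.out.println((int) e);  // 233
System.out.println(b);        // -23 (0xE9 read as a signed byte)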
I have to pass on the last point/question; I do not know enough Python
I have the following code:
int a=-12;
char b=(char) a;
System.out.println(b);
System.out.println(b+0);
It first prints out some empty character and then a number 65524. If I change a to, say, 16 the displayed number becomes 65520. If a is -2, the number is 65534.
If the number is positive and small enough, it prints a character from the Unicode table, and the printed number (b + 0) is the same as a if everything is OK, or it is that strange number from above if it's not all right (if a is too big).
For instance, for a = 8451 it returns the ℃ (Degree Celsius) character and a itself (8451), but if a = 84510 it returns some strange Chinese symbol and a number different from a (18974). If a is even bigger, a = 845100, it returns an empty symbol and 58668.
The question is, where do those numbers come from?
I've tried to find the answer, but wasn't lucky so far.
EDIT Since int to char is a narrowing conversion, please consider the same question with byte a. Too large numbers are obviously impossible now, but I wonder what happens with negatives - converting byte a = -x to char gives the same weird numbers as with int a.
int is a signed type which has 4 bytes (32 bits). The two's complement representation of -12 is
11111111 11111111 11111111 11110100.
char is an unsigned type which has 2 bytes (16 bits).
int a=-12;
char b=(char)a;
The cast keeps only the 16 low-order bits, so b is
11111111 11110100
which, read as an unsigned 16-bit value, is 65524.
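A short sketch that makes the bit pattern visible:
int a = -12;
char b = (char) a;                              // keeps the low 16 bits of -12
System.out.println((int) b);                    // 65524
System.out.println(Integer.toBinaryString(b));  // 1111111111110100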
I think it's because char in Java is an unsigned "double byte" (16-bit) integer: 2^16 = 65536 possible values, while negative ints are stored in two's complement (binary subtraction).
Because char is unsigned, all 16 bits are used to represent the value, so when you give a negative number it wraps around and you get (65536 + a), for example:
When int a = -16; you get 65536 - 16 = 65520 (in binary: 1111 1111 1111 0000)
When int a = -2; you get 65536 - 2 = 65534 (in binary: 1111 1111 1111 1110)
When int a = 84510; you exceed the limit of 65536 for char, so you are left with 18974 (84510 - 65536 = 18974).
You get a character from the Unicode table, I guess because it's the character set or code page you defined.
When you cast you should pay attention to the range of values of the data types you cast, in this case, the difference between int and char.
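To see the 65536 + a wraparound (and the modulo behaviour for values above 65535), a small sketch:
System.out.println((char) -16 + 0);     // 65520 == 65536 + (-16)
System.out.println((char) -2 + 0);      // 65534 == 65536 + (-2)
System.out.println((char) 84510 + 0);   // 18974 == 84510 - 65536
System.out.println((char) 845100 + 0);  // 58668 == 845100 - 12 * 65536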
Is this a bug in the Java API?
int i = 0xD3951892;
System.out.println(i); // -745203566
String binString = Integer.toBinaryString(i);
int radix = 2;
int j = Integer.valueOf(binString, radix );
Assertions.assertThat(j).isEqualTo(i);
I expect the assertion to pass without question. But it throws the exception below:
java.lang.NumberFormatException: For input string: "11010011100101010001100010010010"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:495)
at java.lang.Integer.valueOf(Integer.java:556)
at com.zhugw.temp.IntegerTest.test_valueof_binary_string(IntegerTest.java:14)
So if I have a binary string, e.g. 11010011100101010001100010010010, how can I get its decimal value (-745203566) in Java? Do I have to do it myself and write code to implement the conversion?
Integer.valueOf(String, int radix) and Integer.parseInt(String, int radix) will only parse numbers of value -2 147 483 648 to 2 147 483 647, i.e. the values of 32-bit signed integers.
These functions cannot interpret two's complement numbers for binary (radix = 2), because the string being passed can be of any length, and so a leading 1 could be part of the number or the sign bit. I guess Java's developers decided that the most logical way to proceed is to never accept two's complement, rather than assume that a 32nd bit is a sign bit.
They read your input binary string as unsigned 3 549 763 730 (bigger than max int value). To read a negative value, you'd want to give a positive binary number with a - sign in front. For example for -5:
Integer.parseInt("1011", 2); // 11
// Even if you extended the 1s to try and make two's complement of 5,
// it would always read it as a positive binary value
Integer.parseInt("-101", 2); // -5, this is right
Solutions:
I suggest, first, that if you can store it as a positive number with extra sign information on your own (e.g. a - symbol), do that. For example:
String binString;
if(i < 0)
    binString = "-" + Integer.toBinaryString(-i);
else // positive i
    binString = Integer.toBinaryString(i);
If you need to use signed binary strings, in order to take a negative number in binary two's complement form (as a string) and parse it to an int, I suggest you take the two's complement manually, convert that into int, and then correct the sign. Recall that two's complement = one's complement + 1, and one's complement is just reverse each bit.
As an example implementation:
String binString = "11010011100101010001100010010010";
StringBuilder onesComplementBuilder = new StringBuilder();
for(char bit : binString.toCharArray()) {
    // if bit is '0', append a 1; if bit is '1', append a 0.
    onesComplementBuilder.append((bit == '0') ? 1 : 0);
}
String onesComplement = onesComplementBuilder.toString();
System.out.println(onesComplement); // should be the NOT of binString
int converted = Integer.valueOf(onesComplement, 2);
// two's complement = one's complement + 1. This is the positive value
// of our original binary string, so make it negative again.
int value = -(converted + 1);
You could also write your own version of Integer.parseInt for 32-bit two's complement binary numbers. This, of course, assumes you're not using Java 8 and can't just use Integer.parseUnsignedInt, which @llogiq pointed out while I was typing this.
EDIT: You could also use Long.parseLong(String, 2) first, then keep the low 32 bits (mask with 0xFFFFFFFF) and narrow the long down to an int. Faster to write, probably faster code.
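For what it's worth, a minimal sketch of that Long.parseLong route (the narrowing cast to int keeps the low 32 bits, which has the same effect as the mask):
String binString = "11010011100101010001100010010010";
long asLong = Long.parseLong(binString, 2);  // 3549763730, fits comfortably in a long
int value = (int) asLong;                    // keep the low 32 bits -> -745203566
System.out.println(value);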
The API docs for Integer.toBinaryString(..) explicitly state:
The value of the argument can be recovered from the returned string s by calling Integer.parseUnsignedInt(s, 8).
(as of Java 8u25) I think this is a documentation error, and it should read Integer.parseUnsignedInt(s, 2). Note the Unsigned. This is because the toBinaryString output will include the sign bit.
Edit: Note that even though this looks like it would produce an unsigned value, it isn't. This is because Java does not really have a notion of unsigned values, only a few static methods to work with ints as if they were unsigned.
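On Java 8 and later the round trip then looks like this (a small sketch mirroring the code from the question):
int i = 0xD3951892;                      // -745203566
String s = Integer.toBinaryString(i);    // "11010011100101010001100010010010"
int j = Integer.parseUnsignedInt(s, 2);  // parses the 32-bit pattern as unsigned, stores it back in an int
System.out.println(j == i);              // true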
public class bitwise_operator {
    public static void main(String[] args) {
        int var1 = 42;
        int var2 = ~var1;
        System.out.println(var1 + " " + var2);
    }
}
The above code produces 42 -43 as the output.
As far as my understanding goes, the unary NOT operator (~) inverts all of the bits of its operand.
Now, 42 in binary is 00101010. Using the ~ operator, we get the inverted value of 42, i.e. 11010101.
If you convert the preceding binary value back to decimal, the output should be something else, not -43.
I tried my luck with different numbers to observe the pattern and found that the output is one more than the initial value with a leading minus (-) sign before it, as seen in the above case.
For example:
if num is 45 // Output is 45 -46
if num is 1001 // Output is 1001 -1002
Could someone please explain how the Unary Not Operator (~) works internally to output such results?
You are using a signed integer type, which is stored in two's complement.
Your result is correct: 11010101 is in fact -43:
-2^7 + 2^6 + 2^4 + 2^2 + 2^0 = -128 + 64 + 16 + 4 + 1 = -128 + 85 = -43
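You can see the full 32-bit pattern with Integer.toBinaryString (a small sketch; the 8-bit picture in the question is just the low byte of this):
int var1 = 42;
System.out.println(Integer.toBinaryString(var1));   // 101010
System.out.println(Integer.toBinaryString(~var1));  // 11111111111111111111111111010101
System.out.println(~var1);                          // -43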
This is what's known as two's complement, and it is how integers and all fixed-point numbers work in Java, C, C++, etc.:
-x = ~x + 1
So, for example, -1 (0xFFFF as a 16-bit value) complemented bitwise gives 0x0000; plus 1 = 0x0001, which is -(-1).
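A one-line check of that identity (just a sketch):
int x = 42;
System.out.println(-x == ~x + 1);  // true for every int value, including Integer.MIN_VALUE
System.out.println(~x + 1);        // -42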
This method works in C, C++ and Java. I would like to know the science behind it.
The value of a char can be 0-255, where the different characters are mapped to one of these values. The numeric digits are stored in order, '0' through '9', but they are not stored as the first ten char values; that is, the character '0' does not have an ASCII value of 0. The char value 0 is almost always the \0 null character.
Without knowing anything else about ASCII, it's pretty straightforward how subtracting the character '0' from any other numeric character results in the integer value that the original character represents.
So, it's simple math:
'0' - '0' = 0 // Char value of character 0 minus char value of character 0
// In ASCII, that is equivalent to this:
48 - 48 = 0 // '0' has a value of 48 on ASCII chart
So, similarly, I can do integer math with any of the numeric chars...
(('3' - '0') + ('5' - '0') - ('2' - '0')) + '0' = '6'
The difference between '3', '5', or '2' and '0' on the ASCII chart is exactly the face value we typically think of when we see that numeric digit. Subtracting the char '0' from each, combining them, and then adding '0' back at the end gives us the char value that represents the result of doing that simple math.
The code snippet above emulates 3 + 5 - 2, but in ASCII, it's actually doing this:
((51 - 48) + (53 - 48) - (50 - 48)) + 48 = 54
Because on the ASCII chart:
'0' = 48
'2' = 50
'3' = 51
'5' = 53
'6' = 54
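The same arithmetic written out as Java, for concreteness (a small sketch; the values are identical in C, since both use ASCII-compatible digit codes):
char result = (char) ((('3' - '0') + ('5' - '0') - ('2' - '0')) + '0');
System.out.println(result);        // 6
System.out.println((int) result);  // 54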
In C the + and - operators apply integer promotion*1 to their arguments, thus the result of subtracting (or adding) two chars is an int*2.
From the C-Standard:
5.1.2.3 Program execution
[...]
10 EXAMPLE 2 In executing the fragment
char c1, c2;
/* ... */
c1 = c1 + c2;
the ‘‘integer promotions’’ require that the abstract machine promote the value of each variable to int size and then add the two ints [...]
Applying this to the OP's implicitly given use case of
char c = 42;
... = c - '0';
This would lead to the above being the same as:
... = (int) c - (int) '0'; /* The second cast is redundant, as per Jens' comment. */
      ^         ^
      +-- int --+
*1: If the operator's arguments have a lower rank than int, they are promoted to int.
*2: char has a lower rank than int.
There's no change going on. '0' is an int in C. It is a fancy way to write 48 (assuming ASCII).
You can convince yourself of this fact by computing the size of '0':
printf ("sizeof '0' is %zu\n", sizeof '0');
printf ("sizeof(char) is %zu\n", sizeof(char));
which in the first line very likely prints 4 (or 2), but probably not 1 like in the second line (again: in C; it's different for C++).
The numeric constant 0, without any qualifications, has type int. The result of the binary subtraction operation on a char and an int also has type int due to the usual type promotion process.