Int to char conversion, wrong number - java

I have the following code:
int a=-12;
char b=(char) a;
System.out.println(b);
System.out.println(b+0);
It first prints out some empty character and then the number 65524. If I change a to, say, -16, the displayed number becomes 65520. If a is -2, the number is 65534.
If the number is positive and small enough, it prints a character from the Unicode table and the character's number (which is the same as a). If a is too big, I get one of those strange numbers again.
For instance, for a = 8451 it prints the ℃ (degree Celsius) character and a itself (8451), but for a = 84510 it prints some Chinese symbol and a number different from a (18974). If a is even bigger, a = 845100, it prints an empty symbol and 58668.
The question is, where do those numbers come from?
I've tried to find the answer, but haven't had any luck so far.
EDIT Since int to char is a narrowing conversion, please consider the same question with byte a. Too large numbers are obviously impossible now, but I wonder what happens with negatives - converting byte a = -x to char gives the same weird numbers as with int a.

int is a signed type that occupies 4 bytes (32 bits). The two's complement representation of -12 is
11111111 11111111 11111111 11110100 .
char is an unsigned type that occupies 2 bytes (16 bits).
int a=-12;
char b=(char)a;
The cast keeps only the low 16 bits, so the bit pattern of b is
11111111 11110100
which, read as an unsigned 16-bit value, is 65524.
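A quick way to check this yourself (a small sketch; Integer.toBinaryString is used purely to show the bit pattern):
int a = -12;
char b = (char) a;                              // narrowing keeps only the low 16 bits
System.out.println((int) b);                    // 65524
System.out.println(Integer.toBinaryString(b));  // 1111111111110100
System.out.println(a & 0xFFFF);                 // 65524 again: same low 16 bits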

I think it's because char in Java is an unsigned "double byte" integer. A byte is 8 bits, so a double byte is 16 bits, giving 2^16 = 65536 possible values, and negative values wrap around via two's complement (a binary subtraction, in effect).
Because the type is unsigned, all 16 bits are used to represent the value, so when you pass in a negative number it wraps around and you get (65536 + a), for example:
When int a = -16; you get 65536 - 16 = 65520 (in binary: 1111 1111 1111 0000)
When int a = -2; you get 65536 - 2 = 65534 (in binary: 1111 1111 1111 1110)
When int a = 84510; you exceed the limit of 65536 for char, so you are left with 18974 (84510 - 65536 = 18974).
I guess you get a character from the Unicode table because of the character set or code page you have configured.
When you cast, pay attention to the value ranges of the data types involved, in this case the difference between int and char.
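As a sketch, the wrap-around rule can be checked for all the values from the question; the modulo expression below is just one way of spelling out "reduce into the 0..65535 range":
for (int a : new int[] { -16, -2, 8451, 84510, 845100 }) {
    int wrapped = ((a % 65536) + 65536) % 65536;   // equals 65536 + a for small negative a
    System.out.println((int) (char) a + " == " + wrapped);
}
// prints 65520 == 65520, 65534 == 65534, 8451 == 8451, 18974 == 18974, 58668 == 58668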

Why byte casting returns decimal value of char?

I have a Python background and I don't understand how byte casting returns the decimal value of a char according to ASCII.
Here are some code examples:
// C#
string s = "abc123éé";
int[] x = new int[255];
for (int i = 0; i < s.Length; i++) {
    x[(byte)s[i] - (byte)'0']++;   // count occurrences, indexed by offset from '0'
}
If we look at the first iteration, the cast is applied to the char 'a' and it returns 97.
// Java
char a = 'a';
System.out.println((byte)a);
Same in Java; it returns 97 too. But in Python 3, it does not return the decimal value of the char.
>>> a = bytes("a", encoding="utf-8")
>>> a
b'a'
Now, coming to my questions:
How/why does byte casting work like this?
I know that byte's value range is -128 to 127 but char's is 0 to 255. How does it not throw an exception even though the value of 'é' is 233?
What's the difference with Python at this point?
Only for Java, I do not use Python:
How/why does byte casting work like this?
It is specified by the Java Language Specification, mostly JLS-5.1.3: "...A narrowing conversion of a char to an integral type T likewise simply discards all but the n lowest order bits, where n is the number of bits used to represent type T. In addition to a possible loss of information about the magnitude of the numeric value, this may cause the resulting value to be a negative number, even though chars represent 16-bit unsigned integer values..."
("Why?" because it is so specified)
I know that byte's value range is -128 to 127 but char's is 0 to 255. How does it not throw an exception even though the value of 'é' is 233?
Wrong: chars are 0 to 65535 (or '\u0000' to '\uFFFF'), see JLS-4.2.1.
There is no reason for an exception; it results in the byte value -23 (the same low-order bits as 'é', i.e. int 233).
I have to pass on the last point/question; I do not know enough Python.
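To illustrate the point about 'é' (a small sketch; how the char itself would display depends on your console encoding):
char e = 'é';                    // code point 233 (0x00E9)
System.out.println((int) e);     // 233
System.out.println((byte) e);    // -23: only the low 8 bits (0xE9) survive, read as a signed byte
System.out.println((byte) 'a');  // 97: still positive because it fits in 7 bits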

Java implicit conversion between int and char [duplicate]

This question already has answers here:
Calculating with the char variable in java
(2 answers)
Closed 4 years ago.
I'm looking for an explanation of Java's behavior when handling the following scenarios. I understand that the ASCII table is arranged so that the value of the character '5' is five positions greater than '0'. This allows calculations to be done on the char without converting to an int, as seen in the first example.
What I don't understand is why Java seems to inconsistently handle when to provide a value from an ASCII table and when to do a calculation on the chars as though they were integers.
int x = '5' - '0';
output x = 5;
int x = '5'
output x = 53;
Now for some examples, that introduce confusion.
int x = '0' + 1 - '5'
output x = -4
int y = '5' - '0' + '1'
output 54
int y = '5' - 0 + '1'
output 102
Java seems to be doing an implicit type conversion, but how is Java inferring which representation of the int/char should it be using?
Just write out the char-to-ASCII-code conversion (below your statements):
int x = '0' + 1 - '5'
48 + 1 - 53 = -4
int y = '5' - 0 + '1'
53 - 0 + 49 = 102
int y = '5' - '0' + '1'
53 - 48 + 49 = 54
Notice it's consistent: each int remains an int and each char is converted to its ASCII code.
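You can verify those three lines directly; each char operand is promoted to int before the arithmetic:
System.out.println('0' + 1 - '5');    // 48 + 1 - 53 = -4
System.out.println('5' - 0 + '1');    // 53 - 0 + 49 = 102
System.out.println('5' - '0' + '1');  // 53 - 48 + 49 = 54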
This might seem inconsistent, but in reality it is consistent.
int x = '5' - '0';
output x = 5, because behind the scenes the ASCII codes are '5' = 53 and '0' = 48.
Hence
int x = '5'
output x = 53;
You might be confusing the representation with the value. The values never change, so when you perform arithmetic it will always be the case that '5' == 53, not 5. For how the result is displayed, see the JLS on primitive-to-String conversion.
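A minimal example of the representation/value distinction (String conversion versus numeric promotion):
char c = '5';
System.out.println(c);        // String conversion of the char: prints 5
System.out.println(c + 0);    // numeric promotion to int: prints 53
System.out.println("" + c);   // String concatenation: prints 5 again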
Integer arithmetic is promoted to int for most calculations.
System.out.println('5' + '0');
>>> 101
System.out.println((char)('5' + '0'));
>>> e
Both results have the same numeric value, but one is displayed as a character because it has been cast to character.
Java seems to be doing an implicit type conversion, but how is Java inferring which representation of the int/char should it be using?
It's actually quite simple. char is one of the numeric types in Java, see 4.2.1. Integral Types and Values:
The values of the integral types are integers in the following ranges:
[...]
For char, from '\u0000' to '\uffff' inclusive, that is, from 0 to 65535
All operations on integer types are carried out either with int- or long- precision, see 4.2.2. Integer Operations:
If an integer operator other than a shift operator has at least one operand of type long, then the operation is carried out using 64-bit precision, and the result of the numerical operator is of type long. If the other operand is not long, it is first widened (§5.1.5) to type long by numeric promotion (§5.6).
Otherwise, the operation is carried out using 32-bit precision, and the result of the numerical operator is of type int. If either operand is not an int, it is first widened to type int by numeric promotion.
Note the last sentence: this defines the conversion to be applied, it is called "numeric promotion".
char '0' does not equal int 0. char '0''s binary representation occupies 16 bits:
0000 0000 0011 0000
while int 0's binary representation occupies 32 bits:
0000 0000 0000 0000 0000 0000 0000 0000
When you sum a char and an int, the char will be promoted to int first.
For example, char '5''s Unicode code point is U+0035, in binary 0000 0000 0011 0101; it is promoted to int by prepending 16 zeros, 0000 0000 0000 0000 0000 0000 0011 0101, and that int is 53 in decimal.
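A short sketch of that promotion, using Integer.toBinaryString just to show the value (leading zeros are not printed):
char five = '5';                                   // U+0035, value 53
int promoted = five;                               // widening: 16 zero bits are prepended
System.out.println(promoted);                      // 53
System.out.println(Integer.toBinaryString(five));  // 110101
System.out.println('5' + '0');                     // both chars promoted to int: 101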

byte to char conversion in Java gives an unpredictable result

Hi, I just came across a question about conversion from byte to char.
The code was like this:
byte b=(byte)-1;
System.out.println(b);
char c=(char) b;
System.out.println(c);
int i=c;
System.out.println(i);
What I understood is: when we convert int -1 to byte, it takes the 8-bit two's complement of +1, so the value is 1111 1111. When we convert that to char, it appends 1s or 0s based on the MSB, and from char to int there is just a widening conversion. But I got output like this:
-1
?
65535
I don't get why it prints "?" in the second place. Please help me out with this.

Java Integer.parseInt() for 32-bit signed binary string throws NumberFormatException

Is this a bug in the Java API?
int i = 0xD3951892;
System.out.println(i); // -745203566
String binString = Integer.toBinaryString(i);
int radix = 2;
int j = Integer.valueOf(binString, radix );
Assertions.assertThat(j).isEqualTo(i);
I expect the assertion to pass without question, but it throws the exception below:
java.lang.NumberFormatException: For input string: "11010011100101010001100010010010"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:495)
at java.lang.Integer.valueOf(Integer.java:556)
at com.zhugw.temp.IntegerTest.test_valueof_binary_string(IntegerTest.java:14)
So if I have a binary String, e.g. 11010011100101010001100010010010, how can I get its decimal value (-745203566) in Java? Do I have to do it myself and write code to implement the conversion?
Integer.valueOf(String, int radix) and Integer.parseInt(String, int radix) will only parse numbers of value -2 147 483 648 to 2 147 483 647, i.e. the values of 32-bit signed integers.
These functions cannot interpret two's complement numbers for binary (radix = 2), because the string being passed can be of any length, and so a leading 1 could be part of the number or the sign bit. I guess Java's developers decided that the most logical way to proceed is to never accept two's complement, rather than assume that a 32nd bit is a sign bit.
They read your input binary string as unsigned 3 549 763 730 (bigger than max int value). To read a negative value, you'd want to give a positive binary number with a - sign in front. For example for -5:
Integer.parseInt("1011", 2); // 11
// Even if you extended the 1s to try and make two's complement of 5,
// it would always read it as a positive binary value
Integer.parseInt("-101", 2); // -5, this is right
Solutions:
I suggest, first, that if you can store it as a positive number with extra sign information on your own (e.g. a - symbol), do that. For example:
String binString;
if (i < 0)
    binString = "-" + Integer.toBinaryString(-i);
else // positive i
    binString = Integer.toBinaryString(i);
If you need to use signed binary strings, in order to take a negative number in binary two's complement form (as a string) and parse it to an int, I suggest you take the two's complement manually, convert that into an int, and then correct the sign. Recall that two's complement = one's complement + 1, and one's complement is just flipping each bit.
As an example implementation:
String binString = "11010011100101010001100010010010";
StringBuilder onesComplementBuilder = new StringBuilder();
for (char bit : binString.toCharArray()) {
    // if bit is '0', append a 1; if bit is '1', append a 0
    onesComplementBuilder.append((bit == '0') ? 1 : 0);
}
String onesComplement = onesComplementBuilder.toString();
System.out.println(onesComplement); // should be the NOT of binString
int converted = Integer.valueOf(onesComplement, 2);
// two's complement = one's complement + 1. This is the positive value
// of our original binary string, so make it negative again.
int value = -(converted + 1);
You could also write your own version of Integer.parseInt for 32-bit two's complement binary numbers. This, of course, assumes you're not using Java 8 and can't just use Integer.parseUnsignedInt, which #llogiq pointed out while I was typing this.
EDIT: You could also use Long.parseLong(String, 2) first, then calculate the two's complement (and mask it by 0xFFFFFFFF), then downgrade the long down to int. Faster to write, probably faster code.
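A minimal sketch of that Long.parseLong idea; note that simply narrowing the long to an int already keeps only the low 32 bits, so no explicit mask or manual two's complement step is needed:
String binString = "11010011100101010001100010010010";
long asLong = Long.parseLong(binString, 2);  // 3549763730, fits comfortably in a long
int value = (int) asLong;                    // narrowing keeps the low 32 bits
System.out.println(value);                   // -745203566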
The API docs for Integer.toBinaryString(..) explicitly state:
The value of the argument can be recovered from the returned string s by calling Integer.parseUnsignedInt(s, 8).
(as of Java 8u25) I think this is a documentation error, and it should read Integer.parseUnsignedInt(s, 2). Note the Unsigned. This is because the toBinaryString output will include the sign bit.
Edit: Note that even though this looks like it would produce an unsigned value, it isn't. This is because Java does not really have a notion of unsigned values, only a few static methods to work with ints as if they were unsigned.

Hex representation of bytes in a String

I read somewhere that the string 0123456789ABCDEFFEDCBA987654321089ABCDEF01234567 is 192 bits (24 bytes). It's written that it is a "hex representation of bytes".
I need help on this concept.
PS: This is the secret key of the TripleDES algorithm.
In hexadecimal numbers, you have 16 different digits. These are written using first the ordinary 10 symbols used for decimal digits, 0 through 9. Then the first six letters of the Latin alphabet are used, i.e. A through F.
Since each digit represents a value in the range 0 through F, i.e. one of sixteen possibilities, it holds four bits of information. Thus, in a long string of hex digits, you can compute the total number of bits of information as just four times the number of digits present.
Your example string, "0123456789ABCDEFFEDCBA987654321089ABCDEF01234567", is 48 digits. This means it is a 48 * 4 = 192 bit number, in hexadecimal form.
If you're interested in viewing this large number as a sequence of bytes, just take pairs of digits, since each byte is 8 bits. The first (counting from the left) few bytes then become 0x01, 0x23, 0x45, and so on.
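If you want to see those bytes in code, here is a small sketch that splits the hex string into pairs of digits (the variable names are just for illustration):
String hex = "0123456789ABCDEFFEDCBA987654321089ABCDEF01234567";
byte[] key = new byte[hex.length() / 2];     // 48 digits -> 24 bytes -> 192 bits
for (int i = 0; i < key.length; i++) {
    key[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
}
System.out.printf("0x%02X 0x%02X 0x%02X%n", key[0], key[1], key[2]);  // 0x01 0x23 0x45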
It's just a big number. The only difference from the numbers you are used to (such as "192") is that it's written using the hexadecimal number system instead of the decimal number system. The hexadecimal number system uses 16 digits (0-9 and A-F) instead of the 10 you are used to (0-9).
That particular number is equivalent to 27898229935051914480226618602452055732231960245295072615 in decimal notation.
Joachim already explained the theoretical concept. If you want to play around with such numbers yourself in Java, then take a look at java.math.BigInteger.
E.g., to convert your hexadecimal number to the decimal or any other system:
// the "radix" is 16 because the string represents a hexadecimal number
BigInteger bi = new BigInteger(
"0123456789ABCDEFFEDCBA987654321089ABCDEF01234567", 16);
// print the number in decimal (digits 0-9)
System.out.println(bi.toString(10));
// print the number in octal (digits 0-7)
System.out.println(bi.toString(8));
0123456789ABCDEF FEDCBA9876543210 89ABCDEF01234567
Three (hex) keys, i.e. 3 * 8 bytes, or 3 * 8 * 8 = 192 bits.
Each character in hexadecimal corresponds to 4 bits. So for your example, there are 48 characters and 48 * 4 = 192 bits.
With regard to your second question: How does the JVM distinguish between numbers in different bases?
It does not and cannot do so! You as the programmer have to help the JVM. If you write small constants, you use a prefix to signal either decimal, octal, or hexadecimal base:
// A leading zero signals a constant in octal base;
// octal 46 is decimal 38
final int n1 = 046;
// A leading "0x" signals a constant in hexadecimal base;
// hex 3f is decimal 63
final int n2 = 0x3f;
// No prefix refers to a regular decimal number
final int n3 = 12;
There are prefixes only for octal and hexadecimal (plain decimal needs none) because these are the bases programmers use most often. Note that there was no prefix for binary constants before Java 7 introduced the 0b prefix.
If you use the java.math.BigInteger class like I did in my previous reply to your first question, then you have to specify the base in the constructor:
// input numbers in octal, decimal, and hexadecimal;
// Java needs your help to recognize the base!
final BigInteger b8 = new BigInteger("12345", 8);
final BigInteger b10 = new BigInteger("12345", 10);
final BigInteger b16 = new BigInteger("12345", 16);
// output them in decimal system
System.out.println(b8.toString()); // prints "5349"
System.out.println(b10.toString()); // prints "12345"
System.out.println(b16.toString()); // prints "74565"
