Byte to char conversion in Java gives unpredictable result - java

Hi, I just came across a question about converting a byte to a char and then to an int.
The code was like this:
byte b=(byte)-1;
System.out.println(b);
char c=(char) b;
System.out.println(c);
int i=c;
System.out.println(i);
What I understood is that when we cast the int -1 to byte, we get the 8-bit two's complement of 1, so the value is 1111 1111. When we convert that to char, it is extended with 1s or 0s depending on the MSB, and from char to int there is just a widening conversion. But I got output like this:
-1
?
65535
I don't understand why it prints "?" on the second line. Please help me out with this.
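For reference, here is a minimal sketch of what the three conversions do (the class and method names are made up for illustration). The cast from byte to char first sign-extends the byte to an int and then keeps the low 16 bits, so (byte)-1 becomes '\uFFFF'. U+FFFF is not an assigned character, so a console typically cannot display it and prints a substitute such as "?"; the exact glyph depends on the console encoding, which is an assumption here.

```java
class ByteToCharDemo {
    // byte -> char: the byte is sign-extended to int, then the low 16 bits are kept.
    static char signExtended(byte b) {
        return (char) b;
    }

    // Masking with 0xFF first treats the byte as an unsigned value (0..255).
    static char zeroExtended(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) -1;                          // bits: 1111 1111
        char c = signExtended(b);                    // bits: 1111 1111 1111 1111
        System.out.println((int) c);                 // 65535
        System.out.println(Integer.toHexString(c));  // ffff
        System.out.println((int) zeroExtended(b));   // 255
    }
}
```

If the intent was to treat the byte as an unsigned value, masking with 0xFF before the cast gives '\u00FF' (255) instead of '\uFFFF'.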

Related

Why byte casting returns decimal value of char?

I have a Python background and I don't understand how a byte cast returns the decimal value of a char according to ASCII.
Here are some code examples:
// C#
string s = "abc123éé";
int[] amount = new int[255];
for (int i = 0; i < s.Length; i++){
amount[(byte)s[i] - (byte)'0']++;
}
If we look at the first iteration, the cast is applied to the char 'a' and it returns 97.
// Java
char a = 'a';
System.out.println((byte)a);
Java returns 97 as well. But in Python 3, converting to bytes does not give the decimal value of the char.
>>> a = bytes("a", encoding="utf-8")
>>> a
b'a'
And now to my questions:
How / Why does byte casting work like this?
I know that byte's value range is -128 to 127 but char's is 0 to 255. Why doesn't it throw an exception even though the value of 'é' is 233?
What is different in Python at this point?
Answering only for Java, since I do not use Python:
How / Why does byte casting work like this?
It is specified by the Java Language Specification, mostly JLS-5.1.3: "...A narrowing conversion of a char to an integral type T likewise simply discards all but the n lowest order bits, where n is the number of bits used to represent type T. In addition to a possible loss of information about the magnitude of the numeric value, this may cause the resulting value to be a negative number, even though chars represent 16-bit unsigned integer values..."
("Why?" Because it is so specified.)
I know that byte's value range is -128 to 127 but char's is 0 to 255. Why doesn't it throw an exception even though the value of 'é' is 233?
That is wrong: chars range from 0 to 65535 ('\u0000' to '\uFFFF'), see JLS-4.2.1.
There is no reason for an exception; the cast results in the byte value -23 (the same low 8 bits as 'é', or int 233).
I must pass on the last question; I do not know enough Python.
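A small sketch of the narrowing described by JLS-5.1.3, for the Java side only (class and method names are hypothetical). It uses Unicode escapes so the result does not depend on the source file encoding.

```java
class CharToByteDemo {
    // char -> byte keeps only the low 8 bits of the 16-bit char.
    static byte narrow(char ch) {
        return (byte) ch;
    }

    // Reinterprets the byte's 8 bits as an unsigned value (0..255).
    static int unsignedValue(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(narrow('a'));                      // 97
        System.out.println(narrow('\u00E9'));                 // -23 ('é' is 233 = 0xE9)
        System.out.println(unsignedValue(narrow('\u00E9')));  // 233
        System.out.println(narrow('\u20AC'));                 // -84 (low 8 bits of 0x20AC are 0xAC)
    }
}
```

The last line shows the loss of information: any char above 255 keeps only its low byte, and no exception is thrown.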

JAVA Byte Manipulation

I want to read a binary file and do some manipulation on each byte. I want to test that I am manipulating the bytes correctly. I want to set a byte variable1 to "00000000" and then another byte variable2 set at "00001111" and OR them newvariable = variable1|variable2, shift the newvariable << 4 bits and then print out the int value.
byte a = 00000000;
//Convert first oneByte to 4 bits and then xor with a;
byte b = 00001111;
byte c = (byte)(a|b);
c = c << 4;
System.out.println("byte= " + c + "\n");
I am not sure why I keep getting "incompatible types: possible lossy conversion from int to byte"
You need to put a '0b' in front of those numbers to express binary constants. The number 00001111 is interpreted as a literal in octal, which is 585 in decimal. The max byte is 127 (since it's signed). Try 0b00001111 instead.
As literals, those will still be int, and the results of a|b and c << 4 are int as well, so depending on where you do the assignment, you may also need to explicitly cast down to byte, e.g. c = (byte)(c << 4).
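A minimal sketch of the corrected snippet, assuming the goal is simply to OR two bytes and shift the result left by 4 (the class and method names are made up for illustration):

```java
class BinaryLiteralDemo {
    // a | b and << both produce an int, so the result is cast back to byte.
    static byte orAndShift(byte a, byte b, int shift) {
        return (byte) ((a | b) << shift);
    }

    public static void main(String[] args) {
        byte a = 0b00000000;           // binary literal, value 0
        byte b = 0b00001111;           // binary literal, value 15
        byte c = orAndShift(a, b, 4);  // bits: 1111 0000
        System.out.println("byte= " + c);               // byte= -16
        System.out.println("unsigned= " + (c & 0xFF));  // unsigned= 240
    }
}
```

The value prints as -16 because byte is signed; masking with 0xFF shows the same bits as the unsigned value 240.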

Int to char conversion, wrong number

I have the following code:
int a=-12;
char b=(char) a;
System.out.println(b);
System.out.println(b+0);
It first prints out some empty character and then the number 65524. If I change a to, say, -16 the displayed number becomes 65520. If a is -2, the number is 65534.
If the number is positive and small enough, it prints a character from the Unicode table and returns the character's number (which is the same as a) if everything is OK, and a strange number like the ones above if it's not all right (if a is too big).
For instance, for a = 8451 it returns the ℃ (Degree Celsius) character and a itself (8451), but if a=84510 it returns some strange Chinese symbol and a number different from a (18974). If a is even bigger, a=845100, it returns an empty symbol and 58668.
The question is, where do those numbers come from?
I've tried to find the answer, but wasn't lucky so far.
EDIT Since int to char is a narrowing conversion, please consider the same question with byte a. Too large numbers are obviously impossible now, but I wonder what happens with negatives - converting byte a = -x to char gives the same weird numbers as with int a.
int is a signed type with 4 bytes (32 bits). The two's complement representation of -12 is
11111111 11111111 11111111 11110100 .
char is an unsigned type with 2 bytes (16 bits).
int a=-12;
char b=(char)a;
The cast keeps only the low 16 bits, so the binary representation of b is
11111111 11110100
which is equivalent to 65524
I think it's because char in Java is an unsigned two-byte integer. A byte is 8 bits, so two bytes are 16 bits, which gives 2 to the power of 16 = 65536 possible values, and a negative int is stored in two's complement.
Because the type is unsigned, all 16 bits are used for the magnitude, so a negative number wraps around and you get (65536 + a), for example:
When int a = -16; you get 65536 - 16 = 65520 (in binary: 1111 1111 1111 0000)
When int a = -2; you get 65536 - 2 = 65534 (in binary: 1111 1111 1111 1110)
When int a = 84510; you exceed the maximum char value of 65535, so you are left with 18974 (84510 - 65536 = 18974).
You get a character from the Unicode table because a Java char is a UTF-16 code unit; how it is displayed presumably depends on the console's encoding and font.
When you cast you should pay attention to the range of values of the data types you cast, in this case, the difference between int and char.
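The arithmetic above can be checked with a short sketch (class and method names are hypothetical). Casting an int to char keeps the low 16 bits, which is the same as taking the value modulo 65536 with a non-negative result:

```java
class IntToCharDemo {
    // Value of the char produced by the narrowing cast.
    static int viaCast(int a) {
        return (char) a;
    }

    // Same result computed arithmetically: a mod 65536, always in 0..65535.
    static int viaModulo(int a) {
        return Math.floorMod(a, 65536);
    }

    public static void main(String[] args) {
        int[] samples = {-12, -16, -2, 8451, 84510, 845100};
        for (int a : samples) {
            // Prints 65524, 65520, 65534, 8451, 18974, 58668 (each value twice).
            System.out.println(a + " -> " + viaCast(a) + " / " + viaModulo(a));
        }
    }
}
```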

creating hex byte array in java

Here is the problem, I need to do it in Java:
I need to create a byte array with hex values, to send via socket to a device. message format is something like this
STX cmd1 Arg1 , cmd2 ETX Checksum // Any number of commands and arguments
Example :
STX A 1 ETX 148 // 1 and 148 are in decimal STX is 0x02 and ETX is 0x03 , not text STX and ETX.
The byte array which is to be generated for the above example is this :
STX A 1 ETX 148
{(byte)0x2,(byte)0x41,(byte)0x31,(byte)0x3, (byte)0x94}
Can you please help me. How do I do convert these numbers/characters and assign to byte array?
Unless I'm mistaken, you're already heading in the right direction.
A few things to know: an unsigned byte goes from 0 to 255 (0x00 to 0xFF). In Java, byte is a signed type and goes from -128 to +127.
System.out.println(Byte.MIN_VALUE); // -128
System.out.println(Byte.MAX_VALUE); // +127
If fields 3 & 5 are ints, casting them to bytes is fine, but know that anything over +127 will wrap into the negative range when cast to a byte.
System.out.println((byte)0x94); // -108
System.out.println((byte)148); // -108
If you're wanting the actual positive value of the byte you can AND each byte against 0xFF.
System.out.println(((byte)-108) & 0xFF); // +148
System.out.println(((byte)-1) & 0xFF); // +255
Couldn't you just use what Sotirios Delimanolis proposed in the comments and put your char variables in there?
char a = 'A';
char b = '1';
byte[] buffer = {(byte)0x2, (byte)a, (byte)b, (byte)0x3, (byte)0x94};
Or am I missing something here?
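Putting the pieces together, here is a rough sketch of building such a frame from a string payload. The helper name buildFrame and its parameters are made up for illustration, the payload is assumed to be plain ASCII, and the checksum is taken as a given value because the question does not say how it is computed.

```java
import java.util.Arrays;

class FrameDemo {
    static final byte STX = 0x02;
    static final byte ETX = 0x03;

    // Hypothetical helper: STX + payload characters + ETX + checksum byte.
    static byte[] buildFrame(String payload, int checksum) {
        byte[] frame = new byte[payload.length() + 3];
        frame[0] = STX;
        for (int i = 0; i < payload.length(); i++) {
            frame[i + 1] = (byte) payload.charAt(i);  // ASCII chars fit in one byte
        }
        frame[frame.length - 2] = ETX;
        frame[frame.length - 1] = (byte) checksum;    // 148 is stored as -108 (0x94)
        return frame;
    }

    public static void main(String[] args) {
        byte[] frame = buildFrame("A1", 148);
        System.out.println(Arrays.toString(frame));   // [2, 65, 49, 3, -108]
        StringBuilder hex = new StringBuilder();
        for (byte b : frame) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());    // 02 41 31 03 94
    }
}
```

The negative value in the array is only how Java prints a signed byte; the bits sent over the socket are still 0x94.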

parseInt on a string of 8 bits returns a negative value when the first bit is 1

I've got a huge string of bits (with some \n in it too) that I pass as a parameter to a method, which should isolate the bits 8 by 8, and convert them all to bytes using parseInt().
Thing is, every time the substring of 8 bits starts with a 1, the resulting byte is a negative number. For example, the first substring is '10001101', and the resulting byte is -115. I can't seem to figure out why, can someone help? It works fine with other substrings.
Here's my code, if needed :
static String bitsToBytes(String geneString) {
String geneString_temp = "", sub;
for(int i = 0; i < geneString.length(); i = i+8) {
sub = geneString.substring(i, i+8);
if (sub.indexOf("\n") != -1) {
if (sub.indexOf("\n") != geneString.length())
sub = sub.substring(0, sub.indexOf("\n")) + sub.substring(sub.indexOf("\n")+1, sub.length()) + geneString.charAt(i+9);
}
byte octet = (byte) Integer.parseInt(sub, 2);
System.out.println(octet);
geneString_temp = geneString_temp + octet;
}
geneString = geneString_temp + "\n";
return geneString;
}
In Java, byte is a signed type, meaning that when the most significant bit is set to 1, the number is interpreted as negative.
This is precisely what happens when you print your byte here:
System.out.println(octet);
Since PrintStream does not have an overload of println that takes a single byte, the overload that takes an int gets called. Since octet's most significant bit is set to 1, the number gets sign-extended by replicating its sign bit into bits 9..32, resulting in printout of a negative number.
byte is a signed two's complement integer. So this is a normal behavior: the two's complement representation of a negative number has a 1 in the most-significant bit. You could think of it like a sign bit.
If you don't like this, you can use the following idiom:
System.out.println( octet & 0xFF );
This will pass the byte as an int while preventing sign extension. You'll get an output as if it were unsigned.
Java has no unsigned byte type (char is the only unsigned integral type), so the only other thing you could do is store the numbers in a wider representation, e.g. short.
In Java, byte, short, int and long are all signed, and the most significant bit is the sign bit.
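Note that Integer.parseInt(sub, 2) itself returns a positive int (141 for "10001101"); the value only becomes negative when it is cast to byte, so switching to parseUnsignedInt would not change the result. Keep the value in an int, or mask the byte with 0xFF, if you need the range 0 to 255.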
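A minimal sketch that shows where the sign appears for the substring from the question (the class and method names are hypothetical):

```java
class ParseBitsDemo {
    // Parses a string of up to 8 bits as a base-2 number.
    static int parseOctet(String bits) {
        return Integer.parseInt(bits, 2);
    }

    public static void main(String[] args) {
        int parsed = parseOctet("10001101");
        byte octet = (byte) parsed;                      // narrowing keeps the low 8 bits
        System.out.println(parsed);                      // 141
        System.out.println(octet);                       // -115
        System.out.println(octet & 0xFF);                // 141
        System.out.println(Byte.toUnsignedInt(octet));   // 141
    }
}
```

Byte.toUnsignedInt (available since Java 8) does the same thing as the & 0xFF idiom.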
