Wrapper's parseXXX() for signed binary misunderstanding


Let's take Byte.parseByte() as an example of the wrappers' parseXXX() methods.
From parseByte(String s, int radix)'s JavaDoc:
Parses the string argument as a signed byte in the radix specified by
the second argument.
But that's not quite true if radix = 2. For example, the 8-bit binary literal of -128 is 10000000:
byte b = (byte) 0b10000000;
So the following should be true:
byte b = Byte.parseByte("10000000", 2);
but unfortunately, it throws NumberFormatException, and instead I have to do it as follows:
byte b = Byte.parseByte("-10000000", 2);
Here parseByte() parses the binary string as sign-magnitude (an explicit sign plus the magnitude), whereas I expected it to parse it as signed binary (two's complement, i.e. the MSB is the sign bit).
Am I wrong about this?

Am I wrong about this?
Yes. The Javadoc says nothing about 2's-complement. Indeed, it explicitly states how it recognises negative values (i.e. a - prefix, so effectively "human-readable" sign-magnitude).
Think about it another way. If parseByte interpreted radix-2 as 2's-complement, what would you want it to do for radix-10 (or indeed, any other radix)? For consistency, it would have to be 10's-complement, which would be inconvenient, I can assure you!

This is because, for parseByte, "10000000" is a positive value (128), which does not fit into the byte range -128 to 127. But we can parse a two's-complement binary string representation with BigInteger:
byte b = new BigInteger("10000000", 2).byteValue();
This gives the expected result, -128.
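A small sketch contrasting the two readings of the same bit string. The class and helper names are made up for illustration; Byte.parseByte, Integer.parseInt and BigInteger.byteValue are standard JDK methods. Both helpers simply keep the low 8 bits, so, as written, they do not reject strings longer than eight digits.

```java
import java.math.BigInteger;

public class ParseByteDemo {
    // Lets BigInteger parse the bits as a non-negative number, then keeps
    // only the low 8 bits, which yields the two's-complement byte.
    static byte viaBigInteger(String bits) {
        return new BigInteger(bits, 2).byteValue();
    }

    // Same idea without BigInteger: an int easily holds 0..255, and the
    // narrowing cast to byte keeps only the low 8 bits.
    static byte viaIntCast(String bits) {
        return (byte) Integer.parseInt(bits, 2);
    }

    public static void main(String[] args) {
        // parseByte reads "10000000" as +128, which is out of byte range.
        try {
            Byte.parseByte("10000000", 2);
        } catch (NumberFormatException e) {
            System.out.println("parseByte rejected 10000000: " + e.getMessage());
        }
        // With an explicit minus sign, magnitude 128 gives -128 (sign-magnitude reading).
        System.out.println(Byte.parseByte("-10000000", 2)); // -128
        System.out.println(viaBigInteger("10000000"));      // -128
        System.out.println(viaIntCast("10000000"));         // -128
        System.out.println(viaIntCast("11111111"));         // -1
    }
}
```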

Related

In Java, how come the read() method of FileInputStream does not throw "incompatible types: possible lossy conversion"?

I am currently going through the Java I/O tutorial and having a hard time understanding the read() method of the FileInputStream class. I know that, per the documentation, the read() method reads a "byte" of data from the stream and returns an integer representing the byte (between 0 and 255), or -1 if it reaches the end of the file.
Byte in Java has a range from -128 to 127, so how come when I edit xanadu.txt and add the ASCII symbol "ƒ" (which has a decimal value of 131), Java does not complain by throwing an error that the value 131 is out of the range defined by byte (-128 to 127)? When I try to test this using literals, I get two different results.
The following works:
byte b = 120;
int c = b;
System.out.println((char)c);
Output: x
But this does NOT work (even though it works when added to xanadu.txt):
byte b = 131;
int c = b;
System.out.println((char)c);
Output: error: incompatible types: possible lossy conversion from int to byte
byte b = 131;
I tried explicitly casting using byte: (how is this possible?)
byte b = (byte)131;
int c = b;
System.out.println((char)c);
Output: ﾃ
I am a total newbie when it comes to I/O streams; somebody please help me understand it.
EDIT: It turns out my knowledge of type casting was lacking, specifically the difference between "widening" and "narrowing". Reading up more on these concepts helped me understand why the explicit (aka narrowing) cast works.
Allow me to explain: look at the third code block, where I explicitly cast the literal 131 to byte. Written as a 32-bit signed two's-complement int, 131 is 00000000 00000000 00000000 10000011, i.e. 4 bytes. Java's byte type can only hold an 8-bit signed two's-complement integer, so 131 is out of range, which is why the second block fails with "possible lossy conversion from int to byte". When we explicitly cast it to byte, we 'chop off' the upper 24 bits (the correct term is 'narrowing'), leaving 10000011, which is -125 in decimal. Since -125 is in the range -128 to 127, byte has no problem storing it. When I then store the byte in int c, an implicit ("widening") conversion takes place: the 8-bit -125 (10000011) is sign-extended to the equivalent 32-bit -125 (11111111 11111111 11111111 10000011). Finally, System.out.println prints (char)c, another explicit ("narrowing") conversion, this time from a 32-bit signed int to a 16-bit unsigned char. It keeps the low 16 bits, 11111111 10000011 (0xFF83), and Java displays that char as ﾃ.
In conclusion, it helps to convert everything into binary form and go from there, but make sure you understand character encodings and two's complement.
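The chain of conversions described in the edit can be printed bit by bit. This is a sketch; the class name and the bits() padding helper are illustrative, and the char is printed as its numeric value to sidestep console-encoding issues (0xFF83 is U+FF83, the ﾃ seen in the question).

```java
public class NarrowingDemo {
    // Renders the low 'width' bits of value as a zero-padded binary string.
    static String bits(int value, int width) {
        String s = Integer.toBinaryString(value);
        if (s.length() > width) {
            s = s.substring(s.length() - width); // keep only the low 'width' bits
        }
        return String.format("%" + width + "s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        int literal = 131;       // 00000000 00000000 00000000 10000011
        byte b = (byte) literal; // narrowing: keeps 10000011, which is -125
        int c = b;               // widening: sign-extends to 0xFFFFFF83, still -125
        char ch = (char) c;      // narrowing: keeps the low 16 bits, 0xFF83

        System.out.println(bits(literal, 32));
        System.out.println(bits(b, 8) + " = " + b);
        System.out.println(bits(c, 32) + " = " + c);
        System.out.println(bits(ch, 16) + " = " + (int) ch); // 65411
    }
}
```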

I don't know where you got the value 131 from, but as far as I am concerned, LATIN SMALL LETTER F WITH HOOK (ƒ) is not in the original ASCII character set but in extended ASCII (code page 437), with a decimal value of 159. (In windows-1252, however, it is 0x83, i.e. 131.) In UTF-16 (how Java chars are encoded) it is U+0192 (decimal value 402).
First, ensure that your text files are encoded in extended ASCII, and not UTF-8 (which is the most likely encoding). Then you can use a FileInputStream to read the file, and you will get 159.
Note that 159 is outside the range of the Java byte type. This is fine, because read returns an int. If the text file is encoded in UTF-8, however, ƒ is encoded in 2 bytes, and read returns them as two separate values, one byte per call.
Your second code block doesn't work because as you said, byte goes from -128 to 127, so 131 obviously doesn't fit.
Your third code block forces 131 into a byte, which causes overflow, and the value "wraps around" to -125. b and c are both -125. When you cast this to a char it becomes 65411: c holds -125 as the 32-bit pattern 0xFFFFFF83 (the sign bit was extended when the byte was widened to int), and the narrowing cast keeps the low 16 bits, 0xFF83, which char treats as an unsigned value.
The reason why this all works when you use FileInputStream.read instead of doing these conversions yourself, is because read actually returns an int, not a byte. It's just that the int it returns will always be in the range -1 ~ 255. This is why we say "read returns a byte", but its actual return type is int.
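To see the int return values without creating a file, the sketch below uses ByteArrayInputStream, which shares the InputStream.read() contract with FileInputStream. The class and readAll helper are illustrative. It shows one byte 0x83 coming back as 131, and ƒ (U+0192) encoded as UTF-8 coming back as two values.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadDemo {
    // Collects every value returned by read() until the end of the stream.
    static List<Integer> readAll(InputStream in) {
        List<Integer> values = new ArrayList<>();
        try {
            int next;
            while ((next = in.read()) != -1) { // -1 signals end of stream
                values.add(next);              // otherwise always 0..255
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        // A single byte 0x83 (131), which is how windows-1252 encodes U+0192.
        byte[] singleByte = { (byte) 0x83 };
        System.out.println(readAll(new ByteArrayInputStream(singleByte))); // [131]

        // U+0192 in UTF-8 is two bytes, 0xC6 0x92, so read() returns two values.
        byte[] utf8 = "\u0192".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(utf8)));       // [198, 146]
    }
}
```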

byte b = 131; // this is 8 bits type, but >8 bits value
int c = b; // this is 32 bits type
System.out.println((char)c); // this is 16 bits type
Output: error: incompatible types: possible lossy conversion from int to byte
byte b = 131;
The binary (two's-complement) encoding of 131 is:
131 = 2^7 + 2^1 + 2^0 = 1000 0011
In an 8-bit byte, the 2^7 bit is the sign bit.
131 won't fit in a signed byte without overflowing in the two's-complement representation used for signed types: the highest bit (the sign bit) is set, and it gets extended when casting from byte to int.
The Java compiler notices that 131 won't fit properly in a byte which leads to the error.

java - Why is 0x000F stored as unsigned?

I was reading through examples trying to understand how to convert signed bytes to unsigned integer counter parts.
The most popular method that I have come across is:
a & 0xFF
Where a is the signed byte.
My question is why is 0xFF stored as unsigned? Are all hex values stored as unsigned? If so why?
And how does "and"-ing turn off the sign bit in the sign integer?
It would be great if someone could break down the process step by step.

You probably saw this in code that converted a byte to an integer, where they wanted to treat the byte as an unsigned value in the range 0-255. It does not apply to integers in general. If you want to make an integer "unsigned", you can do:
int unsignedA = a & 0x7FFFFFFF;
This will ensure that unsignedA is positive - but it does that by chopping off the high bit, so for example if a was -1, then unsignedA is Integer.MAX_VALUE.
There is no way to turn a 32-bit signed Java integer into a 32-bit unsigned Java integer because there is no datatype in Java for a 32-bit unsigned integer. The only unsigned integral datatype in Java is 16 bits long: char.
If you want to store a 32-bit unsigned integral value in Java, you need to store it in a long:
long unsignedA = a & 0xFFFFFFFFL;
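The difference between the two masks can be checked directly. This is a sketch with illustrative method names; Integer.toUnsignedLong (Java 8+) is the JDK method that performs the same reinterpretation as the long mask.

```java
public class UnsignedIntDemo {
    // Clears the sign bit: the result is non-negative, but it is NOT the
    // unsigned reading of the bits (-1 becomes Integer.MAX_VALUE).
    static int clearSignBit(int a) {
        return a & 0x7FFFFFFF;
    }

    // Reinterprets all 32 bits as an unsigned value stored in a long.
    // The L suffix matters: without it, 0xFFFFFFFF is the int -1.
    static long asUnsigned(int a) {
        return a & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        System.out.println(clearSignBit(-1));           // 2147483647
        System.out.println(asUnsigned(-1));             // 4294967295
        System.out.println(Integer.toUnsignedLong(-1)); // 4294967295
    }
}
```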

To elaborate on Erwin's answer about converting a byte to an integer: In Java, byte is a signed integer type. That means it has values in the range -128 to 127. If you say:
byte a;
int b;
a = -64;
b = a;
The language will preserve the value; that is, it will set b to -64.
But if you really want to convert your byte to a value from 0 to 255 (which I guess you call the "unsigned counterpart" of the byte value), you can use a & 0xFF. Here's what happens:
Java does not do arithmetic directly on byte or short types. So when it sees a & 0xFF, it converts both sides to an int. The hex value of a, which is a byte, looks like
a = C0
When it's converted to a 32-bit integer, the value (-64) has to be preserved, so that means the 32-bit integer has to have 1 bits in the upper 24 bits. Thus:
a = C0
(int)a = FFFFFFC0
But then you "and" it with 0xFF:
a = C0
(int)a = FFFFFFC0
& 000000FF
--------
a & FF = 000000C0
And the result is an integer in the range 0 to 255.
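The walkthrough above, reproduced in code: printing the promoted int in hex shows the sign-extended FFFFFFC0, and the mask leaves C0 (192). The class and method names are illustrative; Byte.toUnsignedInt is the Java 8+ JDK helper that does the same thing.

```java
public class ByteMaskDemo {
    // a is promoted to int (sign-extended) before the &, and the mask
    // then keeps only the low 8 bits, giving a value in 0..255.
    static int unsigned(byte a) {
        return a & 0xFF;
    }

    public static void main(String[] args) {
        byte a = -64;                                         // 0xC0
        System.out.println(Integer.toHexString(a));           // ffffffc0
        System.out.println(Integer.toHexString(unsigned(a))); // c0
        System.out.println(unsigned(a));                      // 192
        System.out.println(Byte.toUnsignedInt(a));            // 192
    }
}
```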

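In Java, decimal literals (1, 42, etc.) are positive unless you explicitly write a minus sign. Hex, octal and binary int literals, though, are just bit patterns: 0xFF is 255 because it only fills the low 8 bits of the 32-bit int, while 0xFFFFFFFF sets the sign bit and is -1.
This previous question answers your question about converting to unsigned: Understanding Java unsigned numbers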

Java vs. C#: BigInteger hex string yields different result?

Question:
This code in Java:
BigInteger mod = new BigInteger("86f71688cdd2612ca117d1f54bdae029", 16);
produces (in java) the number
179399505810976971998364784462504058921
However, when I use C#,
BigInteger mod = BigInteger.Parse("86f71688cdd2612ca117d1f54bdae029", System.Globalization.NumberStyles.HexNumber); // base 16
I don't get the same number; I get:
-160882861109961491465009822969264152535
However, when I create the number directly from decimal, it works
BigInteger mod = BigInteger.Parse("179399505810976971998364784462504058921");
I tried converting the hex string in a byte array and reversing it, and creating a biginteger from the reversed array, just in case it's a byte array with different endianness, but that didn't help...
I also encountered the following problem when converting Java-Code to C#:
Java
BigInteger k0 = new BigInteger(byte[]);
to get the same number in C#, I must reverse the array because of different Endianness in the biginteger implementation
C# equivalent:
BigInteger k0 = new BigInteger(byte[].Reverse().ToArray());

Here's what MSDN says about BigInteger.Parse:
If value is a hexadecimal string, the Parse(String, NumberStyles) method interprets value as a negative number stored by using two's complement representation if its first two hexadecimal digits are greater than or equal to 0x80. In other words, the method interprets the highest-order bit of the first byte in value as the sign bit. To make sure that a hexadecimal string is correctly interpreted as a positive number, the first digit in value must have a value of zero. For example, the method interprets 0x80 as a negative value, but it interprets either 0x080 or 0x0080 as a positive value.
So, add a 0 in front of the parsed hexadecimal number to force an unsigned interpretation.
As for round-tripping a big integer represented by a byte array between Java and C#, I'd advise against that, unless you really have to. But both implementations happen to use a compatible two's complement representation, if you fix the endianness issue.
MSDN says:
The individual bytes in the array returned by this method appear in little-endian order. That is, the lower-order bytes of the value precede the higher-order bytes. The first byte of the array reflects the first eight bits of the BigInteger value, the second byte reflects the next eight bits, and so on.
Java docs say:
Returns a byte array containing the two's-complement representation of this BigInteger. The byte array will be in big-endian byte-order: the most significant byte is in the zeroth element.
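A Java-side sketch of both points. parseHexSigned is a hypothetical helper that emulates the two's-complement reading the quoted MSDN text describes (negative when the first hex digit is 8 or above); it assumes a non-empty string with no sign or "0x" prefix. The byte-array part shows why Java's toByteArray() yields 17 bytes here (a leading 0x00 keeps the value positive) and reverses them into the little-endian order the .NET constructor expects.

```java
import java.math.BigInteger;

public class HexTwosComplementDemo {
    // Emulates the C# reading: if the top bit of the first hex digit is set,
    // treat the whole string as a negative two's-complement number.
    static BigInteger parseHexSigned(String hex) {
        BigInteger unsigned = new BigInteger(hex, 16);
        boolean signBitSet = Character.digit(hex.charAt(0), 16) >= 8;
        return signBitSet ? unsigned.subtract(BigInteger.ONE.shiftLeft(4 * hex.length())) : unsigned;
    }

    // Reverses a byte array: big-endian (Java) <-> little-endian (.NET).
    static byte[] reversed(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[in.length - 1 - i];
        }
        return out;
    }

    public static void main(String[] args) {
        String hex = "86f71688cdd2612ca117d1f54bdae029";
        System.out.println(new BigInteger(hex, 16));   // Java: always non-negative
        System.out.println(parseHexSigned(hex));       // C#-style reading: negative
        System.out.println(parseHexSigned("0" + hex)); // leading 0 keeps it positive

        // 0x86 has its top bit set, so toByteArray() prepends a 0x00 sign byte.
        byte[] bigEndian = new BigInteger(hex, 16).toByteArray();
        System.out.println(bigEndian.length + " " + bigEndian[0]); // 17 0
        byte[] littleEndian = reversed(bigEndian);
        System.out.println(littleEndian[littleEndian.length - 1]); // 0: sign byte now last
    }
}
```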

JAVA: why binary literal for byte with negative sign is being considered as integer type?

I can't understand the following behavior.
I'm trying to declare byte mask using binary literal:
byte mask = 0b1111_1111;
But that's not possible, because I get the following error message:
Type mismatch: cannot convert from int to byte
The most interesting thing is that when I try to declare the mask directly, in decimal representation
byte mask = -1;
I get no error, but these two representations should be absolutely equal!
What am I doing wrong?
Thanks in advance.

You can safely assign values from -2^7 to 2^7-1 (-128 to 127) to a byte, since it is 8 bits,
whereas 0b1111_1111 = 255,
so you need a cast there:
byte mask = (byte) 0b1111_1111;

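The value 0b1111_1111 is equal to 255, outside the byte's range of [-128, 127] (because it is signed). Use:
byte mask = (byte) (0b1111_1111 & 0xff);
The narrowing cast will remove the (all-zero) high bits and fit 8 bits into 8 bits without regard for sign. The parentheses matter: in (byte)0b1111_1111&0xff the cast binds first, and the & produces the int 255 again, which does not compile.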

Your "byte mask" is equivalent to 0xff or 255, which are too large to fit in an 8-bit signed byte, not -1, because the literal in the code is an int. If the value is within the range of a smaller type, the compiler can safely stuff it in there, but it can't safely assign a value outside the range -128..127 to a byte variable, and you'll need a cast.

All integer literals are of type int unless they carry an L suffix (making them long); literals with a decimal point, an exponent ('e'), or an F/D suffix are floating-point.

You can do a type cast like this:
byte mask = (byte) 0b1111_1111;
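A compact sketch of the answers above; the class and constant names are illustrative. It shows that the cast version and -1 hold the same bit pattern, and that a binary literal that fits in 0..127 needs no cast.

```java
public class ByteLiteralDemo {
    // byte bad = 0b1111_1111;                  // does not compile: int constant 255 is outside -128..127
    static final byte MASK = (byte) 0b1111_1111; // explicit narrowing keeps the low 8 bits: -1
    static final byte MINUS_ONE = -1;            // compiles: the int constant -1 fits in a byte
    static final byte MAX = 0b0111_1111;         // compiles: 127 fits, so no cast is needed

    public static void main(String[] args) {
        System.out.println(MASK == MINUS_ONE); // true: both are the bit pattern 11111111
        System.out.println(MASK & 0xFF);       // 255: the unsigned reading of the same bits
        System.out.println(MAX);               // 127
    }
}
```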

Declaring an unsigned int in Java

Is there a way to declare an unsigned int in Java?
Or the question may be framed as this as well:
What is the Java equivalent of unsigned?
Just to give you the context: I was looking at Java's implementation of String.hashCode(). I wanted to test the possibility of collision if the hash were a 32-bit unsigned int.

Java does not have a datatype for unsigned integers.
You can define a long instead of an int if you need to store large values.
You can also use a signed integer as if it were unsigned. The benefit of two's complement representation is that most operations (such as addition, subtraction, multiplication, and left shift) are identical on a binary level for signed and unsigned integers. A few operations (division, right shift, comparison, and casting), however, are different. As of Java SE 8, new methods in the Integer class allow you to fully use the int data type to perform unsigned arithmetic:
In Java SE 8 and later, you can use the int data type to represent an unsigned 32-bit integer, which has a minimum value of 0 and a maximum value of 2^32-1. Use the Integer class to use int data type as an unsigned integer. Static methods like compareUnsigned, divideUnsigned etc have been added to the Integer class to support the arithmetic operations for unsigned integers.
Note that int variables are still signed when declared but unsigned arithmetic is now possible by using those methods in the Integer class.
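A short sketch of the Java 8 unsigned helpers on Integer mentioned above (the class name is illustrative). The int keeps the same bits throughout; only the methods interpret them as unsigned. The last line ties back to the question's String.hashCode() context.

```java
public class UnsignedArithmeticDemo {
    public static void main(String[] args) {
        int max = Integer.parseUnsignedInt("4294967295");        // bit pattern of -1
        System.out.println(max);                                 // -1 (signed view)
        System.out.println(Integer.toUnsignedString(max));       // 4294967295
        System.out.println(Integer.divideUnsigned(max, 2));      // 2147483647
        System.out.println(Integer.remainderUnsigned(max, 10));  // 5
        System.out.println(Integer.compareUnsigned(max, 1) > 0); // true
        System.out.println(max + 1);                             // 0: wraps modulo 2^32
        // A hash code viewed as an unsigned 32-bit value:
        System.out.println(Integer.toUnsignedString("hello".hashCode()));
    }
}
```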

Whether a value in an int is signed or unsigned depends on how the bits are interpreted - Java interprets bits as a signed value (it doesn't have unsigned primitives).
If you have an int that you want to interpret as an unsigned value (e.g. you read an int from a DataInputStream that you know should be interpreted as an unsigned value) then you can do the following trick.
int fourBytesIJustRead = someObject.getInt();
long unsignedValue = fourBytesIJustRead & 0xffffffffL;
Note that it is important that the hex literal is a long literal, not an int literal, hence the 'L' at the end.

We needed unsigned numbers to model MySQL's unsigned TINYINT, SMALLINT, INT, BIGINT in jOOQ, which is why we have created jOOU, a minimalistic library offering wrapper types for unsigned integer numbers in Java. Example:
import static org.joou.Unsigned.*;
// and then...
UByte b = ubyte(1);
UShort s = ushort(1);
UInteger i = uint(1);
ULong l = ulong(1);
All of these types extend java.lang.Number and can be converted into higher-order primitive types and BigInteger. Hope this helps.
(Disclaimer: I work for the company behind these libraries)

For unsigned numbers you can use these classes from Guava library:
UnsignedInteger
UnsignedLong
They support various operations:
plus
minus
times
mod
dividedBy
What seems to be missing at the moment are bit shift operators. If you need those, you can use BigInteger from Java.

Perhaps this is what you meant?
long getUnsigned(int signed) {
return signed >= 0 ? signed : 2 * (long) Integer.MAX_VALUE + 2 + signed;
}
getUnsigned(0) → 0
getUnsigned(1) → 1
getUnsigned(Integer.MAX_VALUE) → 2147483647
getUnsigned(Integer.MIN_VALUE) → 2147483648
getUnsigned(Integer.MIN_VALUE + 1) → 2147483649

Use char for 16 bit unsigned integers.

There are good answers here, but I don't see any demonstrations of bitwise operations. As Visser (the currently accepted answer) says, Java integers are signed by default (Java 8 added unsigned arithmetic methods to Integer and Long, but I have never used them). Without further ado, let's do it...
RFC 868 Example
What happens if you need to write an unsigned integer to IO? Practical example is when you want to output the time according to RFC 868. This requires a 32-bit, big-endian, unsigned integer that encodes the number of seconds since 12:00 A.M. January 1, 1900. How would you encode this?
Make your own unsigned 32-bit integer like this:
Declare a byte array of 4 bytes (32 bits):
byte[] my32BitUnsignedInteger = new byte[4]; // represents the time (s)
This initializes the array to zeros (see Are byte arrays initialised to zero in Java?). It must be the primitive byte[]: a Byte[] would be filled with null instead. Now you have to fill each byte in the array in big-endian order (or little-endian if you want to wreak havoc). Assume you have a long called secondsSince1900 containing the time (longs are 64 bits in Java), which only uses the lower 32 bits, and that you've handled the fact that Date counts from 12:00 A.M. January 1, 1970. Then you can use bitwise AND to extract bits from it and shift those bits into the lowest 8 positions, the only ones that survive the coercion to byte, in big-endian order.
my32BitUnsignedInteger[0] = (byte) ((secondsSince1900 & 0x00000000FF000000L) >> 24); // first array element holds the most significant byte: extract bits 24-31, then shift them into the low 8 bits, the only ones the byte cast keeps
my32BitUnsignedInteger[1] = (byte) ((secondsSince1900 & 0x0000000000FF0000L) >> 16);
my32BitUnsignedInteger[2] = (byte) ((secondsSince1900 & 0x000000000000FF00L) >> 8);
my32BitUnsignedInteger[3] = (byte) (secondsSince1900 & 0x00000000000000FFL); // no shift needed
Our my32BitUnsignedInteger is now equivalent to an unsigned 32-bit, big-endian integer that adheres to the RFC 868 standard. Yes, the long datatype is signed, but we ignored that fact because we assumed that secondsSince1900 only uses the lower 32 bits. Because the long is coerced into a byte, all bits above the lowest 8 (everything above 2^7, i.e. beyond the last two hex digits) are discarded.
Source referenced: Java Network Programming, 4th Edition.
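A self-contained sketch of the same RFC 868 encoding. It shifts first and lets the (byte) cast drop the high bits, which is equivalent to masking then shifting, and cross-checks the result against java.nio.ByteBuffer, which writes big-endian by default. The 2,208,988,800-second offset between 1900 and 1970 is the figure RFC 868 gives; the class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class Rfc868Demo {
    // Seconds from 1900-01-01T00:00Z to 1970-01-01T00:00Z, per RFC 868.
    static final long SECONDS_1900_TO_1970 = 2_208_988_800L;

    // Encodes the low 32 bits of secondsSince1900 as 4 big-endian bytes.
    static byte[] encode(long secondsSince1900) {
        byte[] out = new byte[4];
        out[0] = (byte) (secondsSince1900 >>> 24); // the cast keeps only the low 8 bits
        out[1] = (byte) (secondsSince1900 >>> 16);
        out[2] = (byte) (secondsSince1900 >>> 8);
        out[3] = (byte) secondsSince1900;
        return out;
    }

    // Reads the 4 bytes back as an unsigned value; & 0xFFL undoes sign extension.
    static long decode(byte[] b) {
        return ((b[0] & 0xFFL) << 24) | ((b[1] & 0xFFL) << 16) | ((b[2] & 0xFFL) << 8) | (b[3] & 0xFFL);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000 + SECONDS_1900_TO_1970;
        byte[] wire = encode(now);
        byte[] viaBuffer = ByteBuffer.allocate(4).putInt((int) now).array();
        System.out.println(Arrays.equals(wire, viaBuffer)); // true
        System.out.println(decode(wire) == now);            // true until the 32-bit field wraps in 2036
    }
}
```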

It seems that you can handle the signing problem by doing a bitwise AND on the values before you use them:
Example (header is a byte[] and header[0] is 0x86):
System.out.println("Integer "+(int)header[0]+" = "+((int)header[0]&0xff));
Result:
Integer -122 = 134

I just made this piece of code, which converts "this.altura" from a negative to a positive number. Hope this helps someone in need.
if(this.altura < 0){
String aux = Integer.toString(this.altura);
char aux2[] = aux.toCharArray();
aux = "";
for(int con = 1; con < aux2.length; con++){
aux += aux2[con];
}
this.altura = Integer.parseInt(aux);
System.out.println("New Value: " + this.altura);
}

You can use the Math.abs(number) function. It returns the absolute value, which is non-negative for every int except Integer.MIN_VALUE.
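A quick sketch of the one edge case: -2147483648 has no positive int counterpart, so Math.abs returns it unchanged. The string-stripping approach above fails on the same value, since Integer.parseInt("2147483648") throws NumberFormatException. Widening to long first gives the correct magnitude; absAsLong is an illustrative helper, and Java 15 and later also offer Math.absExact, which throws ArithmeticException instead of overflowing.

```java
public class AbsDemo {
    // Widening to long first gives the correct magnitude for every int,
    // including Integer.MIN_VALUE, whose magnitude 2^31 does not fit in an int.
    static long absAsLong(int x) {
        return Math.abs((long) x);
    }

    public static void main(String[] args) {
        System.out.println(Math.abs(-42));                // 42
        System.out.println(Math.abs(Integer.MIN_VALUE));  // -2147483648: overflows back to itself
        System.out.println(absAsLong(Integer.MIN_VALUE)); // 2147483648
    }
}
```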
