Java - Masking out sign bit causes unexpected behavior - java

I have the following value:
int x = -51232;
Java integers are 32 bits, so in binary this should be the following:
10000000000000001100100000100000
The sign bit on the left is set to 1 since x is negative.
Then I do the operation
x = (x & Integer.MAX_VALUE);
Integer.MAX_VALUE is 2147483647 and in binary that would be:
01111111111111111111111111111111
0 on the left because the value is positive.
So why does x & Integer.MAX_VALUE yield 2147432416? The AND operator should only retrieve bits that x and Integer.MAX_VALUE have in common, which should be equivalent to -x (since they do not share the same sign bit).
What's going on here?

Your misunderstanding is caused by lack of knowledge regarding how negative integers are represented in binary in Java. You should read about 2's complement.
10000000000000001100100000100000 is not the binary representation of -51232.
11111111111111110011011111100000 is.
And when you run bitwise AND, you get:
11111111111111110011011111100000 (-51232)
01111111111111111111111111111111 (Integer.MAX_VALUE)
--------------------------------
01111111111111110011011111100000 (2147432416)
Here's the binary representation of -51232 next to the binary representation of 51232. You can see that their sum is 232. that's always the case with 2's complement, for any pair of ints x and -x.
00000000000000001100100000100000 (-51232)
11111111111111110011011111100000 (51232)

Integers are stored in two complement: https://en.m.wikipedia.org/wiki/Two%27s_complement.
Thus while the leftmost bit indicate negative numbers, it is not really a sign bit as it would be for floating point numbers in common representation.
The main reason of this representation is that it make it easy to do addition and substraction among other at the hardware level.
One should not that with two complement notation, -1 is represented by all bit set to 1.

Related

Long Representation vs Double representation of positive and negative zero in java

I was wondering about the differences between positive and negative zero in different numeric types.
I understand the IEEE-754 for floating point arithmetic and bit representation in double precision so the following didn't come as a surprise
double posz = 0.0;
double negz = -0.0;
System.out.println(Long.toBinaryString(Double.doubleToLongBits(posz)));
System.out.println(Long.toBinaryString(Double.doubleToLongBits(negz)));
// output
>>> 0
>>> 1000000000000000000000000000000000000000000000000000000000000000
What did surprise me and showed me that im clueless about the bit representation of long type in java is that even if i shift right (unsigned >>>) then the binary representation of both positive and negative zero is the same
long posz = 0L;
long negz = -0L;
for (int i = 63; i >= 0; i--) {
System.out.print((posz >>> i) & 1);
}
System.out.println();
for (int i = 63; i >= 0; i--) {
System.out.print((negz >>> i) & 1);
}
// output
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> 0000000000000000000000000000000000000000000000000000000000000000
so i am wondering what does java do from a bit representation when i write the following
long posz = 0L;
long negz = -0L;
Does the compiler understand that they are both zero and disregards the sign (and so assignes 0 to the sign bit) or is there other magic here?
or is there other magic here?
Yes. 2's complement.
2's complement is a bit magical. It accomplishes 2 major objectives. Before getting into that, let's first stew on the notion of negative zero for a moment.
Negative zero is kinda weird. Why does it exist at all?
Negative zero isn't actually a thing. Ask any mathematician "Hey, so, what's up with negative zero?" and they'll just look at you in befuddlement. It's not a thing. Mathematically, 0 and -0 are utterly identical. Not just 'nearly identical', but 100%, fully, in all possible ways, identical. We don't generally want our numbers to be capable of representing both 5.0 as well as 5.00 - as those two are entirely, 100%, identical. If you don't think that a value system ought to waste bits trying to differentiate between 5.0 and 5.00, then it's equally bizarro to want the ability to represent -0.0 and +0.0 as distinct entities.
So, wanting -0 in the first place is kinda weird. All the numeric primitives (long, int, short, byte, and I guess char which is technically numeric too) all cannot represent this number. Instead, long z = -0 boils down to:
Take the constant "0".
Apply the 'negate' operation to this number (- is a unary operator. Just like 2+5 makes the system calculate the binary operation of "addition" on elements 2 and 5, -x makes the system calculate the unary operation of "negation" on element x. Applying the negation operation to 0 produces 0. It's no different from writing, say, int x = 5 + 0;. That +0 part doesn't do anything. The - in front of -0 doesn't do anything. In contrast to -0.0 where it does do something (gets you negative zero, the double value, instead of positive zero).
Store this result in z (so, just 0 then).
There is no way to tell if that minus is there. They both result in ALL ZERO bits, and hence, there is no way for the computer to tell if you initialized that variable with the expression -0 or with +0. Again in contrast to double where as you noticed there's a bit different.
So why does double have it then?
Let's stew a bit on the notion of doubles and IEEE-754 math.
A double takes 64 bits. From basic pure mathematical principles then, a double is as incapable of representing more than 2^64 different possible values you are capable of breaking the speed of light or making 1+1=3.
And yet, a double aims to represent all numbers. There are way more numbers between 0 and 1 than 2^64 options (in fact, an infinite amount of numbers exist between 0 and 1), and that's just 0 to 1.
So, how doubles actually work is different. A few less than 2^64 numbers are chosen from the entire number line. Let's call these the blessed numbers.
The blessed numbers are not equally distributed. The closer you are to 1, the more blessed numbers exist. In other words, the distance between 2 blessed numbers increases as you move away from 1. For example, if you go from, say, 1e100 (a 1 with a hundred zeroes) and want to find the next blessed number, it's quite a ways. It's in fact higher than 1.0! - 1e100+1 is in fact 1e100 again, because the way double math works is that after every single last mathematical operation you to do them, the end result is rounded to the nearest blessed number.
Let's try it!
double d = 1e100;
System.out.println(d);
System.out.println(d + 1);
// prints: 1.0E100
// 1.0E100
But that means.. double values don't actually represent a single number!!. What any given double represents is in fact this concept:
An unknown number whose value lies between [D - 𝛿, D + 𝛿], where D is the blessed number that is closed to this unknown number this value represents, and, and 𝛿 is half of the distance between D and the next nearest blessed number on either side.
Given that usually 𝛿 is incredibly small, this is 'good enough'. But this weirdness does explain why you really, really do not want any business at all with double if accuracy is important (such as with currencies. Don't store those in doubles, ever!)
Given that, what does -0.0 represent? not actually just 0. It represents, specifically: An unknown number whose value lies between [-𝛿, 0] where 0 is real zero (and this, has no sign), and 𝛿 is Double.MIN_VALUE: the smallest non-zero positive number representable with a double.
That's why -0.0 and +0.0 both exist: They are in fact different concepts. Rarely relevant, but sometimes it is. In contrast to e.g. long where 5 just means 5 and not "between 4.5 and 5.5", because longs fundamentally don't recognize that fractional parts exist in the first place. Given that 5 just means 5, then 0 just means 0, and there is no such thing as negative zero in the first place.
Now we get to 2's complement
2's complement is a cool system. It has two neat properties:
It only has the one zero.
It does not matter if you treat the bit sequence as signed-by-way-of-2s-complement or as unsigned, for the purposes of the operations: Addition, Substraction, Increment, Decrement, zero-check. The modifications you do to the bits to implement those operations is identical.
It DOES matter for greater than, less than, and divide.
2's complement works like this: To negate a number, take all bits and flip them (i.e. do a NOT operation on the bits). Then, add 1.
Let's try it!
int x = 5;
int y = -x;
for (int i = 31; i >= 0; i--) {
System.out.print((x >>> i) & 1);
}
System.out.println();
for (int i = 31; i >= 0; i--) {
System.out.print((y >>> i) & 1);
}
System.out.println();
// prints 00000000000000000000000000000101
// 11111111111111111111111111111011
As we can see, the 'flip all bits and add 1' algorithm was applied.
2s complement is, of course, reversible: If you do 'flip all bits and add 1' twice in a row you get the same number out.
Now let's try -0. 0 is 32 0 bits, then flip them all, then add 1:
00000000000000000000000000000000
11111111111111111111111111111111 // flip all
100000000000000000000000000000000 // add 1
00000000000000000000000000000000 // that 1 fell off
and because ints can only store 32 bits, that final '1' falls off of the end. And we're left with zero again.
Now let's go with bytes ( abit smaller) and try to add, say, 200 and 50 together.
11001000 // 200 in binary
00110010 // 50 in binary
-------- +
11111010 // 250 in binary.
now let's instead go: Oh wait, whoops, that was an error, actually these numbers are in 2s complement. That wasn't 200, nono. 11001000 is a bit sequence that actually means (let's apply the 'flip all bits, add 1' scheme: 00111000 - it's actually -56. So the operation was meant to represent '-56 + 50'. Which is -6. -6 in binary is (write out 6, flip bits, add 1):
00000110
11111001
11111010
hey now, look at that, nothing changed! It's the same result! So, when the computer does x + y, where x and y are numbers, the computer does not care. Whether x is "an unsigned number" or "a signed with 2s complement number", the operation is identical.
That's why 2s complement is applied. It makes math MUCH faster. The CPU doesn't have to futz about with branching out to deal with sign bits.
In this sense it is more correct to say that in java, int, long, char, byte and short are neither signed nor unsigned, they just are. At least for the purposes of +, -, ++, and --. No the idea that int is signed is fundamentally a property of e.g. System.out.println(int) - that method chooses to render the bitsequence 11111111111111111111111111111111 as "-1" instead of as 4294967296.
long has no such thing as negative zero. Only float and double have a different representation of positive and negative zero.

Converting Two's complement to denary in Java

I am writing a program to do number system conversions.
The problem I am having is when I input a two's complement number Java is taking it in as a positive binary number for example if I input 11111101 as -3 instead the program takes in 253. how can i make it so the program takes in -3.
Though not addressing the title directly, sign-extending your input is a step towards that goal. Here are some ways to sign-extend that work in a more general case, not just when the input is 8 or 16 or 32 bits.
Relying on arithmetic right shift: (x << k) >> k where k is the number of "unused" bits. So for example to extend an 8-bit number into a 32-bit number, k would be 24.
A XOR trick: (x ^ m) - m, where m is a mask that has a 1 exactly where the signbit of your input is. Eg for 8-bit input, m = 1 << 7. Advantage is that it depends only on the input, not the type you're converting to - for Java that's not really a consideration, but that makes it attractive for use in templated C++ code.
Explicit wrapping, like jsheeran's comment.

Unary "~" operator - What exactly is happening here?

I recently did a Java course (1 week crash course), and we covered some binary mathematics.
This unary ~ operator (tilde I think it's called?) was explained to us thus:
It inverts the bit pattern turning every "0" into a "1" and every "1" into a "0".
e.g. There are 8 bits to a byte. If you have the following byte: 00000000 the inverted value would change to become 11111111.
The above explanation is clear and concise, and totally makes sense to me. Until, that is, I try to implement it.
Given this:
byte x = 3;
byte y = 5;
System.out.println(~x);
System.out.println(~y);
The output is:
-4
-6
I'm very confused about how this happens.
If +3 in binary is 11, then the inversion of this would be 00, which clearly isn't -3.
But as there are 8 bits in a byte, then shouldn't the binary representation of +3 be written as 00000011?
Which would invert to become 11111100. Converted back to decimal value this would be 252.
If however you write the +3 as 011, then it does indeed convert to 100, which is +4, but then how do you know it's a negative number?
How about if you try 0011, which converts to 1100, which if you use the first bit as a sign, then it does indeed become -4.
Ah - so at this point I thought I was getting somewhere.
But then I got to the second value of y = 5.
How do we write this? Using the same logic, +5 converts to binary 0101, which inverts to 1010.
And it's around now that I'm horribly confused. This looks to represent either a signed value of -2, or an unsigned value of +10 decimal? Neither of which are the -6 I'm getting printed out.
Again, if I increase the length up to the 8 digits of a byte, +5 is 00000101, which inverted becomes 11111010. And I really can't find a way to turn this into -6.
Does anyone out there understand this, as I have no idea what is happening here and the more numbers I print out the more confused I become.
Google doesn't seem to come up with anything much on this - maybe it doesn't like looking at little operator signs.. :-(
See this demonstration: -
3 -> 0011
~3 -> 1100 -> -4 (2's complement)
5 -> 0101
~5 -> 1010 -> -6 (2's complement)
Since signed integers are stored as 2's complement, taking 2's complement of 1100 gives you 4. Now since 1100 is a negative number. So, the result is -4. Same is the case with 1010.
1100
0011 - 1's complement
0100 - 2's complement - value = 4 (take negative)
From wikipedia: In two's complement notation, a non-negative number is represented by its ordinary binary representation; in this case, the most significant bit is 0. The two's complement operation is the negation operation, so negative numbers are represented by the two's complement of the absolute value.
To get the two's complement of a binary number, the bits are inverted, or "flipped", by using the bitwise NOT operation; the value of 1 is then added to the resulting value, ignoring the overflow which occurs when taking the two's complement of 0. http://en.wikipedia.org/wiki/Two%27s_complement
So if you have 0101 which is +5 the inverse of that is 1010, which is -5.
You don't really read the 010 as a 5 though, but when you see the 1 at the beginning, you know that to get the number you have to invert the rest of the digits again to get the positive number which you want to negate. If that makes sense.
It's a bit of an alien concept if you have not worked with it before. It's certainly not the way that decimal numbers work, but it is actually simple once you see what happening.
A value of 8 decimal is written as 01010, which negates to 10101. The first digit (1) means it's negative, and then you flip the rest back to get the numeric value: 1010.
One thing to remember is that Two's complement is not the same as ordinary old binary system counting. In normal binary the value of 10101 (which in Two's complement is -8 as above) is of course 21. I guess this is where the confusion comes - how do you tell the difference by looking at them? You must know which representation has been used in order to decide what the value of the number actually is. There is also One's complement which differs slightly.
A good tutorial on binary maths, including One's and Two's complement is given here. http://www.math.grin.edu/~rebelsky/Courses/152/97F/Readings/student-binary
Signed integers are almost universally stored using twos complement. This means inverting the bits (taking the one's complement) and adding one. This way you don't have two representations of integer zero (+0 and -0), and certain signed operations become easier to implement in hardware.
Java uses signed numbers in Two's complement. Your reasoning would be correct in C or other languages when using types as "unsigned int" or "unsigned char".

why Integer.MAX_VALUE + 1 == Integer.MIN_VALUE?

System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE);
is true.
I understand that integer in Java is 32 bit and can't go above 231-1, but I can't understand why adding 1 to its MAX_VALUE results in MIN_VALUE and not in some kind of exception. Not mentioning something like transparent conversion to a bigger type, like Ruby does.
Is this behavior specified somewhere? Can I rely on it?
Because the integer overflows. When it overflows, the next value is Integer.MIN_VALUE. Relevant JLS
If an integer addition overflows, then the result is the low-order bits of the mathematical sum as represented in some sufficiently large two's-complement format. If overflow occurs, then the sign of the result is not the same as the sign of the mathematical sum of the two operand values.
The integer storage gets overflowed and that is not indicated in any way, as stated in JSL 3rd Ed.:
The built-in integer operators do not indicate overflow or underflow in any way. Integer operators can throw a NullPointerException if unboxing conversion (§5.1.8) of a null reference is required. Other than that, the only integer operators that can throw an exception (§11) are the integer divide operator / (§15.17.2) and the integer remainder operator % (§15.17.3), which throw an ArithmeticException if the right-hand operand is zero, and the increment and decrement operators ++(§15.15.1, §15.15.2) and --(§15.14.3, §15.14.2), which can throw an OutOfMemoryError if boxing conversion (§5.1.7) is required and there is not sufficient memory available to perform the conversion.
Example in a 4-bits storage:
MAX_INT: 0111 (7)
MIN_INT: 1000 (-8)
MAX_INT + 1:
0111+
0001
----
1000
You must understand how integer values are represented in binary form, and how binary addition works. Java uses a representation called two's complement, in which the first bit of the number represents its sign. Whenever you add 1 to the largest java Integer, which has a bit sign of 0, then its bit sign becomes 1 and the number becomes negative.
This links explains with more details: http://www.cs.grinnell.edu/~rebelsky/Espresso/Readings/binary.html#integers-in-java
--
The Java Language Specification treats this behavior here: http://docs.oracle.com/javase/specs/jls/se6/html/expressions.html#15.18.2
If an integer addition overflows, then the result is the low-order bits of the mathematical sum as represented in some sufficiently large two's-complement format. If overflow occurs, then the sign of the result is not the same as the sign of the mathematical sum of the two operand values.
Which means that you can rely on this behavior.
On most processors, the arithmetic instructions have no mode to fault on an overflow. They set a flag that must be checked. That's an extra instruction so probably slower. In order for the language implementations to be as fast as possible, the languages are frequently specified to ignore the error and continue. For Java the behaviour is specified in the JLS. For C, the language does not specify the behaviour, but modern processors will behave as Java.
I believe there are proposals for (awkward) Java SE 8 libraries to throw on overflow, as well as unsigned operations. A behaviour, I believe popular in the DSP world, is clamp the values at the maximums, so Integer.MAX_VALUE + 1 == Integer.MAX_VALUE [not Java].
I'm sure future languages will use arbitrary precision ints, but not for a while yet. Requires more expensive compiler design to run quickly.
The same reason why the date changes when you cross the international date line: there's a discontinuity there. It's built into the nature of binary addition.
This is a well known issue related to the fact that Integers are represented as two's complement down at the binary layer. When you add 1 to the max value of a two's complement number you get the min value. Honestly, all integers behaved this way before java existed, and changing this behavior for the Java language would have added more overhead to integer math, and confused programmers coming from other languages.
When you add 3 (in binary 11) to 1 (in binary 1), you must change to 0 (in binary 0) all binary 1 starting from the right, until you got 0, which you should change to 1. Integer.MAX_VALUE has all places filled up with 1 so there remain only 0s.
Easy to understand with byte example=>
byte a=127;//max value for byte
byte b=1;
byte c=(byte) (a+b);//assigns -128
System.out.println(c);//prints -128
Here we are forcing addition and casting it to be treated as byte.
So what will happen is that when we reach 127 (largest possible value for a byte) and we add plus 1 then the value flips (as shown in image) from 127 and it becomes -128.
The value starts circling around the type.
Same is for integer.
Also integer + integer stays integer ( unlike byte + byte which gets converted to int [unless casted forcefully as above]).
int int1=Integer.MAX_VALUE+1;
System.out.println(int1); //prints -2147483648
System.out.println(Integer.MIN_VALUE); //prints -2147483648
//below prints 128 as converted to int as not forced with casting
System.out.println(Byte.MAX_VALUE+1);
Cause overflow and two-compliant nature count goes on "second loop", we was on far most right position 2147483647 and after summing 1, we appeared at far most left position -2147483648, next incrementing goes -2147483647, -2147483646, -2147483645, ... and so forth to the far most right again and on and on, its nature of summing machine on this bit depth.
Some examples:
int a = 2147483647;
System.out.println(a);
gives: 2147483647
System.out.println(a+1);
gives: -2147483648 (cause overflow and two-compliant nature count goes on "second loop", we was on far most right position 2147483647 and after summing 1, we appeared at far most left position -2147483648, next incrementing goes -2147483648, -2147483647, -2147483646, ... and so fores to the far most right again and on and on, its nature of summing machine on this bit depth)
System.out.println(2-a);
gives:-2147483645 (-2147483647+2 seems mathematical logical)
System.out.println(-2-a);
gives: 2147483647 (-2147483647-1 -> -2147483648, -2147483648-1 -> 2147483647 some loop described in previous answers)
System.out.println(2*a);
gives: -2 (2147483647+2147483647 -> -2147483648+2147483646 again mathematical logical)
System.out.println(4*a);
gives: -4 (2147483647+2147483647+2147483647+2147483647 -> -2147483648+2147483646+2147483647+2147483647 -> -2-2 (according to last answer) -> -4)`

Why do integers in Java integer not use all the 32 or 64 bits?

I was looking into 32-bit and 64-bit. I noticed that the range of integer values that can stored in 32 bits is ±4,294,967,295 but the Java int is also 32-bit (If I am not mistaken) and it stores values up to ±2 147 483 648. Same thing for long, it stores values from 0 to ±2^63 but 64-bit stores ±2^64 values. How come these values are different?
Integers in Java are signed, so one bit is reserved to represent whether the number is positive or negative. The representation is called "two's complement notation." With this approach, the maximum positive value represented by n bits is given by
(2 ^ (n - 1)) - 1
and the corresponding minimum negative value is given by
-(2 ^ (n - 1))
The "off-by-one" aspect to the positive and negative bounds is due to zero. Zero takes up a slot, leaving an even number of negative numbers and an odd number of positive numbers. If you picture the represented values as marks on a circle—like hours on a clock face—you'll see that zero belongs more to the positive range than the negative range. In other words, if you count zero as sort of positive, you'll find more symmetry in the positive and negative value ranges.
To learn this representation, start small. Take, say, three bits and write out all the numbers that can be represented:
0
1
2
3
-4
-3
-2
-1
Can you write the three-bit sequence that defines each of those numbers? Once you understand how to do that, try it with one more bit. From there, you imagine how it extends up to 32 or 64 bits.
That sequence forms a "wheel," where each is formed by adding one to the previous, with noted wraparound from 3 to -4. That wraparound effect (which can also occur with subtraction) is called "modulo arithemetic."
In 32 bit you can store 2^32 values. If you call these values 0 to 4294967295 or -2147483648 to +2147483647 is up to you. This difference is called "signed type" versus "unsigned type". The language Java supports only signed types for int. Other languages have different types for an unsigned 32bit type.
NO laguage will have a 32bit type for ±4294967295, because the "-" part would require another bit.
That's because Java ints are signed, so you need one bit for the sign.

Categories