I've been digging into the hash table source code
and found how the index is computed:
int index = (hash & 0x7FFFFFFF) % tab.length;
I don't understand why bitwise AND is used here.
If we convert 0x7FFFFFFF to binary, we get 111 1111 1111 1111 1111 1111 1111 1111 (31 ones).
As far as I know, bitwise AND gives 1 only if both bits are 1.
So if we take some object's hash code, for example 2314539, convert it to binary and apply &, we actually get the same number:
2314539 = 0000 0000 0010 0011 0101 0001 0010 1011
  0000 0000 0010 0011 0101 0001 0010 1011   (2314539)
& 0111 1111 1111 1111 1111 1111 1111 1111   (0x7FFFFFFF)
= 0000 0000 0010 0011 0101 0001 0010 1011   (2314539)
As you can see, this operation doesn't change anything. So what's the point here?
Let us start with the meaning of remainder (%) in Java. According to JLS 15.17.3:
The remainder operation for operands that are integers after binary numeric promotion (§5.6.2) produces a result value such that (a/b)*b+(a%b) is equal to a.
It follows from this rule that the result of the remainder operation can be negative only if the dividend is negative, and can be positive only if the dividend is positive. Moreover, the magnitude of the result is always less than the magnitude of the divisor.
Suppose that the index was computed as index = hash % tab.length. If that were so, a negative value for hash (the dividend) would result in a negative value for index.
But we are going to use index to subscript tab, so it must lie between 0 (inclusive) and tab.length (exclusive).
Instead, the actual calculation maps hash to a non-negative number first by masking out the sign bit. Then it performs the remainder operation.
So what's the point here?
Your worked example was for a positive hash value. The & does make a difference for negative hash values.
The point is to avoid a negative hash value giving a negative index value which will lead to an ArrayIndexOutOfBoundsException.
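A minimal sketch of the difference, using a made-up negative hash code and a hypothetical table length of 11 (neither value comes from the Hashtable source). The method names are also invented for illustration:

```java
public class HashIndexDemo {
    // Index without the mask: % keeps the sign of the dividend, so this can be negative.
    static int unmaskedIndex(int hash, int length) {
        return hash % length;
    }

    // Index as quoted in the question: clear the sign bit first, then take the remainder.
    static int maskedIndex(int hash, int length) {
        return (hash & 0x7FFFFFFF) % length;
    }

    public static void main(String[] args) {
        int hash = -2314539; // hypothetical negative hash code
        int length = 11;     // hypothetical table length
        System.out.println(unmaskedIndex(hash, length)); // -7: would throw ArrayIndexOutOfBoundsException
        System.out.println(maskedIndex(hash, length));   // 6: always in [0, length)
    }
}
```

Note that the mask is not the same as Math.abs: it maps -2314539 to 2145169109 rather than 2314539, but any non-negative value works for picking a bucket. Math.abs would also be unsafe here, because Math.abs(Integer.MIN_VALUE) is still negative.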
Related
Suppose we have int n = 2^31 (written in Java as 1 << 31, since 2 ^ 31 is XOR); then n - 1 = 0111 1111 1111 1111 1111 1111 1111 1111, which is what I get locally.
My guess: convert n to long first, subtract, then truncate the result to fit into an int.
int n = 1 << 31; // 2^31 does not fit in an int, so n == Integer.MIN_VALUE
System.out.println(n);
System.out.println(Integer.toBinaryString(n - 1) + " : " + Integer.bitCount(n - 1));
System.out.println(n - 1);
// output
-2147483648
1111111111111111111111111111111 : 31
2147483647
But I couldn't find any specification confirming my guess. Is there one?
From the Wikipedia article on integer overflow:
When an arithmetic operation produces a result larger than the maximum above for an N-bit integer, an overflow reduces the result to modulo N-th power of 2, retaining only the least significant bits of the result and effectively causing a wrap around.
If my guess is totally wrong, then how does it actually work? Is there a link I can refer to?
Any help will be appreciated :)
That's just how arithmetic in two's complement works.
Subtracting 1 from 2^31 (as an int, the bit pattern of -2147483648) is the same as adding -1 to it, as per JLS §15.18.2:
For both integer and floating-point subtraction, it is always the case that a-b produces the same result as a+(-b).
And also:
If an integer addition overflows, then the result is the low-order bits of the mathematical sum as represented in some sufficiently large two's-complement format. If overflow occurs, then the sign of the result is not the same as the sign of the mathematical sum of the two operand values.
Now we can calculate the sum of 2^31 and -1 in binary. 2^31 is one 1 followed by 31 zeroes, which is -2147483648 in two's complement. -1 in two's complement is 32 ones, so we have:
  1000 0000 0000 0000 0000 0000 0000 0000   (-2147483648)
+ 1111 1111 1111 1111 1111 1111 1111 1111   (-1)
= 1 0111 1111 1111 1111 1111 1111 1111 1111 (mathematical sum, 33 bits)
As you can see, the carry out of the leftmost bit does not fit in 32 bits, and according to the second excerpt only the low-order bits are kept:
0111 1111 1111 1111 1111 1111 1111 1111
which is 2147483647.
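The asker's "compute in long, then cut down to int" guess and plain int subtraction produce the same result, because both keep the low-order 32 bits of the exact value (narrowing long to int discards all but the low 32 bits, per JLS §5.1.3). This does not mean the JVM actually widens to long internally; it only shows the two routes agree. A small sketch with made-up method names:

```java
public class WrapAroundDemo {
    // Plain int subtraction: wraps around, keeping the low-order 32 bits (JLS 15.18.2).
    static int intSubtract(int a, int b) {
        return a - b;
    }

    // The asker's guess: subtract exactly in long, then narrow back to int.
    static int viaLong(int a, int b) {
        return (int) ((long) a - b);
    }

    public static void main(String[] args) {
        int n = 1 << 31; // Integer.MIN_VALUE
        System.out.println(intSubtract(n, 1)); // 2147483647
        System.out.println(viaLong(n, 1));     // 2147483647
    }
}
```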
To optimize my C++ code, I'm trying to use right shifts in some cases. Here is an example:
int main()
{
int i = (1 - 2) >> 31; // sizeof(int) == 4
...
...
}
I printed i and got -1. This means that 1s rather than 0s are used to fill the vacated positions when the number is negative. In other words, -1 >> 31 works as below:
1111...1 <--- the result of (1 - 2), which is -1
1111...1 <--- -1 >> 31, 1 is used to fill in the empty position
I just want to know whether this behavior is clearly defined.
If it is UB in C++, what about Java?
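It is not undefined behavior, but it is implementation-defined. (C++20, which requires two's complement representation, later made it well-defined: E1 >> E2 is E1 / 2^E2, rounded toward negative infinity.)
According to C++03 §5.8/3, which defines right shifting: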
The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 divided by the quantity 2 raised to the power E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
By default, int is signed. The standard only guarantees a range of at least -32767 to 32767 (a sign bit plus 15 value bits), but with the 4-byte int from your example the range is -2147483648 to 2147483647. The leftmost bit acts as the sign indicator, and on virtually all current hardware negative ints are represented in two's complement. Taking your example, here is how -1 is represented.
4 Bytes = 32 bits
0000 0000 0000 0000 0000 0000 0000 0000
1 is represented as:
0000 0000 0000 0000 0000 0000 0000 0001
Then we invert the digits. 0 becomes 1, 1 becomes 0.
1111 1111 1111 1111 1111 1111 1111 1110
Then we add 1.
1111 1111 1111 1111 1111 1111 1111 1111
This is how -1 is represented.
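Java requires int to be a 32-bit two's complement type, so the invert-and-add-1 recipe above can be checked there. A sketch; negate and bits32 are hypothetical helper names, not library methods:

```java
public class TwosComplementNegation {
    // Invert every bit, then add 1: the two's complement negation recipe.
    static int negate(int x) {
        return ~x + 1;
    }

    // Left-pad Integer.toBinaryString with zeros to show all 32 bits.
    static String bits32(int x) {
        return String.format("%32s", Integer.toBinaryString(x)).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(bits32(1));         // 00000000000000000000000000000001
        System.out.println(bits32(~1));        // 11111111111111111111111111111110
        System.out.println(bits32(negate(1))); // 11111111111111111111111111111111
        System.out.println(negate(1));         // -1
    }
}
```

One edge case: negate(Integer.MIN_VALUE) returns Integer.MIN_VALUE, because its positive counterpart does not fit in 32 bits.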
If the right shift of a negative number shifts 1s into the highest bit positions (as it did in your test), then on a two's complement representation it behaves as an arithmetic shift: right-shifting by N gives the same result as dividing by 2^N and rounding toward negative infinity. So shifting -1 right still gives -1.
Now take another number.
For example,
to represent -3 as an 8-bit two's complement number, start with 3:
0000 0011
Then we invert the digits.
1111 1100
Then we add 1.
1111 1101
1111 1101 represents -3 in decimal. An arithmetic right shift by 1 gives 1111 1110, which represents -2. This matches dividing -3 by 2^1: the exact result -1.5, rounded toward negative infinity, is -2.
In Java, the behavior of >> is well-defined for negative numbers (see below).
In C++, the behavior of >> for negative numbers is implementation-defined rather than undefined (see the answer by rsp).
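Since Java specifies >> as an arithmetic shift on every platform, the -3 example can be checked directly there. The sketch below (class and method names are made up) also shows why >> 1 is not a drop-in replacement for / 2 on negative numbers: / truncates toward zero, while >> rounds toward negative infinity like Math.floorDiv.

```java
public class ArithmeticShiftDemo {
    // Java's >> is an arithmetic shift: vacated high bits are filled with copies of the sign bit.
    static int shiftRight(int n, int s) {
        return n >> s;
    }

    public static void main(String[] args) {
        byte minusThree = -3; // 1111 1101 as an 8-bit two's complement value
        // The byte is promoted to int before shifting; casting back keeps the low 8 bits.
        byte shifted = (byte) shiftRight(minusThree, 1);
        System.out.println(shifted);              // -2, i.e. 1111 1110
        System.out.println(-3 / 2);               // -1: integer division truncates toward zero
        System.out.println(Math.floorDiv(-3, 2)); // -2: rounds toward negative infinity, like >>
        System.out.println(shiftRight(-1, 31));   // -1
    }
}
```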
Quoting Java Language Specification, §15.19. Shift Operators:
The value of n >> s is n right-shifted s bit positions with sign-extension. The resulting value is floor(n / 2^s). For non-negative values of n, this is equivalent to truncating integer division, as computed by the integer division operator /, by two to the power s.
The value of n >>> s is n right-shifted s bit positions with zero-extension, where:
If n is positive, then the result is the same as that of n >> s.
If n is negative and the type of the left-hand operand is int, then the result is equal to that of the expression (n >> s) + (2 << ~s).
If n is negative and the type of the left-hand operand is long, then the result is equal to that of the expression (n >> s) + (2L << ~s).
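A small sketch contrasting >> and >>> on a negative int and checking the quoted identity for every shift distance from 0 to 31 (identityHolds is a made-up name; the identity only applies to negative n):

```java
public class UnsignedShiftDemo {
    // JLS 15.19 identity for negative int n: n >>> s == (n >> s) + (2 << ~s)
    static boolean identityHolds(int n, int s) {
        return (n >>> s) == (n >> s) + (2 << ~s);
    }

    public static void main(String[] args) {
        int n = -8;
        System.out.println(n >> 1);  // -4: sign-extension
        System.out.println(n >>> 1); // 2147483644: zero-extension
        for (int s = 0; s < 32; s++) {
            if (!identityHolds(n, s)) {
                throw new AssertionError("identity fails for s = " + s);
            }
        }
        System.out.println("identity holds for s = 0..31");
    }
}
```

The 2 << ~s term works because only the low five bits of an int shift distance are used, so for 0 <= s <= 31 it equals 2^(32 - s) wrapped to 32 bits (0 when s is 0).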
System.out.println((-1<<31));
Why does this output -2147483648?
I know -1 << 31 gives 10000000000000000000000000000000, so I expected the answer to be (int) Math.pow(2, 31), which equals 2147483648.
-1 << 31 gives 10000000000000000000000000000000, which is -2147483648, not 2147483648. Note that the leftmost bit is the sign bit, so if it's 1, this is a negative number.
BTW, 1<<31 would also give you -2147483648, since 2147483648 is higher than Integer.MAX_VALUE. On the other hand, 1L<<31 would give you 2147483648, since the result would be a long.
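A quick sketch confirming these results. As an aside, casting Math.pow(2, 31) to int would not give 2147483648 either: narrowing a double that is too large for int yields Integer.MAX_VALUE (JLS §5.1.3), so it saturates instead of wrapping.

```java
public class ShiftOverflowDemo {
    public static void main(String[] args) {
        System.out.println(-1 << 31);                        // -2147483648
        System.out.println(1 << 31);                         // -2147483648
        System.out.println(1L << 31);                        // 2147483648 (long arithmetic)
        System.out.println((-1 << 31) == Integer.MIN_VALUE); // true
        // A double too large for int saturates to Integer.MAX_VALUE when cast:
        System.out.println((int) Math.pow(2, 31));           // 2147483647
    }
}
```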
I know -1 << 31 gives 10000000000000000000000000000000, so I expected the answer to be (int) Math.pow(2, 31), which equals 2147483648.
That would be the case if int were an unsigned type, but int is a signed two's complement type.
You are correct that the bit pattern is what you say; however, since int is a signed two's complement primitive, the value of an n-bit pattern is x(0) * 2^0 + x(1) * 2^1 + ... + x(n-2) * 2^(n-2) - x(n-1) * 2^(n-1) (note the minus on the last term), where x(y) is the value of the y-th bit, counting from 0.
Hence your result.
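The weighted-sum formula can be evaluated directly. A sketch with a hypothetical helper, signedValue, that reads a 32-bit pattern bit by bit and gives the top bit a negative weight; Integer.toUnsignedLong (Java 8+) shows what the same bits would mean if int were unsigned:

```java
public class SignedFromBits {
    // x(0)*2^0 + ... + x(30)*2^30 - x(31)*2^31: the top bit has negative weight.
    static long signedValue(int bits) {
        long value = 0;
        for (int y = 0; y < 31; y++) {
            value += (long) ((bits >>> y) & 1) << y;
        }
        value -= (long) ((bits >>> 31) & 1) << 31;
        return value;
    }

    public static void main(String[] args) {
        int pattern = -1 << 31; // 1000 0000 0000 0000 0000 0000 0000 0000
        System.out.println(signedValue(pattern));            // -2147483648
        System.out.println(Integer.toUnsignedLong(pattern)); // 2147483648 if read as unsigned
    }
}
```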
Nowadays, most architectures store signed integers in two's complement (see Wikipedia), and Java requires it for its integer types.
So your result is correct: with the sign bit set and all remaining bits zero, the two's complement pattern is the most negative number for that data type.
Thinking in two's complement:
-1 is represented by 1111 1111 1111 1111 1111 1111 1111 1111
Shifting left by 31 yields
1000 0000 0000 0000 0000 0000 0000 0000, which represents -2147483648.
I have read here (https://stackoverflow.com/a/27762490/4415632) that when integer overflow occurs, the most significant bits are simply cut off.
However, I have also read here (https://stackoverflow.com/a/27747180/3808877) that when overflow occurs, "the value becomes the minimum value of the type, and start counting up again." Which one is correct, or are both answers correct? If so, can anyone show me why those two interpretations are equivalent to each other?
Both are correct; it depends on context. One is the result of a cast and the other is the result of overflow. Those are different operations. For example, casting Long.MAX_VALUE to an int is a cast operation:
System.out.println((int) Long.MAX_VALUE); // <-- -1
If you overflow an int by adding one to Integer.MAX_VALUE, then:
System.out.println(Integer.MAX_VALUE + 1); // <-- Integer.MIN_VALUE
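Although a cast and an overflowing addition are different operations, both keep only the low-order 32 bits of the exact value, which is why the two descriptions agree. A minimal sketch:

```java
public class CastVersusOverflow {
    public static void main(String[] args) {
        // Cast: narrowing long -> int keeps the low-order 32 bits.
        System.out.println((int) Long.MAX_VALUE);                 // -1
        // Overflow: int addition also keeps the low-order 32 bits.
        System.out.println(Integer.MAX_VALUE + 1);                // -2147483648
        // Computing exactly in long and then casting gives the same int.
        long exact = (long) Integer.MAX_VALUE + 1;                // 2147483648
        System.out.println((int) exact == Integer.MAX_VALUE + 1); // true
    }
}
```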
Both interpretations are correct, because they are actually the same.
Let's look at the maths to see why.
Java stores values in byte, short, char, int and long in a format called two's complement.
In case of byte, short, int and long it is signed, in case of char it is unsigned.
One of the attributes of the two's complement format is that for most operations it does not matter whether the value is interpreted as signed or unsigned as the resulting bit pattern would be the same.
To keep things short, I'll explain it using byte, but the other types follow the same scheme.
A byte has 8 bits. The topmost bit is interpreted as the sign bit. So, the bit pattern goes like this:
snnn nnnn
Splitting the bits into two groups of 4 is purely for readability; each group is called a nibble and can be represented by a single hexadecimal digit.
So there are 8 bits in a byte, and each bit can be 0 or 1. This gives 2^8 = 256 different values that can be stored in a byte.
Here are some sample values:
0000 0000 -> 0
0000 0001 -> 1
0000 0010 -> 2
0100 0000 -> 64
0111 1111 -> 127
1000 0000 -> -128
1111 1110 -> -2
1111 1111 -> -1
When the sign bit is set, the two's complement value is obtained by reading the 8 bits as an unsigned number and subtracting the range, which for a byte is 256. For example, 1111 1110 reads as 254 unsigned, and 254 - 256 = -2.
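A sketch of the same decoding in Java, whose byte type is signed. Byte.toUnsignedInt (Java 8+) gives the unsigned reading; signedFromUnsigned is a made-up helper that applies the "subtract 256" rule:

```java
public class ByteDecoding {
    // Signed value of an 8-bit pattern: its unsigned value, minus 256 if the sign bit is set.
    static int signedFromUnsigned(int unsigned8) {
        return unsigned8 >= 128 ? unsigned8 - 256 : unsigned8;
    }

    public static void main(String[] args) {
        byte b = (byte) 0b1111_1110;                 // bit pattern 1111 1110
        System.out.println(b);                       // -2 (Java bytes are signed)
        System.out.println(Byte.toUnsignedInt(b));   // 254
        System.out.println(signedFromUnsigned(254)); // -2, i.e. 254 - 256
    }
}
```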
Now let's see what happens if you take -1 and add 1.
1111 1111       -1 / 255
+ 0000 0001        1 / 1
--------------
= 1 0000 0000      0 / 256   intermediate result (signed / unsigned)
= 0000 0000        0 / 0     result after dropping excess leading bits
There is an overflow. The result would need 9 bits now, but the byte only has 8 bits, so the most significant bit is lost.
Let's look at another example, -1 plus -1.
1111 1111 -1 / 255
+ 1111 1111 -1 / 255
--------------
= 1 1111 1110 -2 / 510 intermediate result
= 1111 1110 -2 / 254 result after dropping excess leading bits
Or this one, 127 plus 5:
0111 1111        127 / 127
+ 0000 0101        5 / 5
--------------
= 1000 0100     -124 / 132
In the first two examples the excess leading bit is dropped; in the last one, the carry runs into the sign bit. Either way only the low 8 bits are kept, and reading them as a signed value is what produces the effect of overflow "starting to count from the minimum value again".
I'd add another option: a processor trap. Some processors can generate a trap on integer overflow. Where available, this feature can usually be enabled in user mode by setting a bit in the processor status register.
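The three byte additions can be reproduced in Java. Byte operands are promoted to int before +, so the sketch casts the sum back to byte, which keeps only the low 8 bits, just like the hand calculations (addBytes is a made-up name):

```java
public class ByteOverflow {
    // Add in int, then narrow to byte: everything above the low 8 bits is dropped.
    static byte addBytes(byte a, byte b) {
        return (byte) (a + b);
    }

    public static void main(String[] args) {
        System.out.println(addBytes((byte) -1, (byte) 1));  // 0
        System.out.println(addBytes((byte) -1, (byte) -1)); // -2
        System.out.println(addBytes((byte) 127, (byte) 5)); // -124
    }
}
```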
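Java itself offers no way to enable such a trap: the JLS states that the built-in integer operators do not indicate overflow or underflow in any way. If you want overflow to fail loudly instead of wrapping, Math.addExact (Java 8+) throws an ArithmeticException. A minimal sketch; checkedAdd is a hypothetical wrapper:

```java
public class CheckedAddition {
    // Returns a + b, or throws ArithmeticException if the int result overflows.
    static int checkedAdd(int a, int b) {
        return Math.addExact(a, b);
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE + 1); // -2147483648: silent wrap-around
        try {
            checkedAdd(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```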
Is there a formula to calculate what the overflow of a Java int would be?
Example: if I add 1 to Integer.MAX_VALUE; the answer is not 2147483648, but rather -2147483648.
Question: if I wanted to calculate what Java would print for a value larger than Integer.MAX_VALUE (2^31 - 1), is there an easy mathematical expression (theoretical, not in code)?
((x + 2^31) mod 2^32) - 2^31
Is this what you're looking for? That should be the result of any mathematical operation on a machine that uses 32-bit signed 2's complement integers. That is, if the mathematical value of an operation returns x, the above formula gives the integer that would actually be stored (if the operation doesn't fault, and it's not a "saturating" operation).
Note that I'm using "mod" with a mathematical definition, not the way the % operator works in Java or C. That is, A mod B, where A and B are integers and B > 0, always returns an integer in the range 0 .. B-1, e.g. (-1) mod 5 = 4. More specifically, A mod B = A - B*floor(A/B).
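A sketch of the formula in Java, using BigInteger so that x may lie far outside the long range. BigInteger.mod always returns a non-negative result, so it matches the mathematical mod described above rather than %. The name wrap is made up; for comparison, BigInteger.intValue() keeps only the low-order 32 bits and should agree with it.

```java
import java.math.BigInteger;

public class WrapFormula {
    private static final BigInteger TWO_31 = BigInteger.ONE.shiftLeft(31);
    private static final BigInteger TWO_32 = BigInteger.ONE.shiftLeft(32);

    // ((x + 2^31) mod 2^32) - 2^31, always in [-2^31, 2^31), so intValueExact never throws.
    static int wrap(BigInteger x) {
        return x.add(TWO_31).mod(TWO_32).subtract(TWO_31).intValueExact();
    }

    public static void main(String[] args) {
        BigInteger tooBig = BigInteger.valueOf(Integer.MAX_VALUE).add(BigInteger.ONE);
        System.out.println(wrap(tooBig));                           // -2147483648
        System.out.println(tooBig.intValue());                      // -2147483648
        System.out.println(wrap(BigInteger.valueOf(-2147483649L))); // 2147483647
    }
}
```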
In Java, an int is 32 bits, but it is also signed, which means the most significant bit is the sign bit: 1 means negative and 0 means positive. Because of this, the largest number is 2147483647 (0111 1111 1111 1111 1111 1111 1111 1111). If you add 1, it becomes 1000 0000 0000 0000 0000 0000 0000 0000, which translates to -2147483648. For any values larger than that, you would need to use a long.
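I believe this would work, assuming val is a long holding the mathematical result:
int expected = (int) (Math.floorMod(val + (1L << 31), 1L << 32) - (1L << 31));
Equivalently, since narrowing a long to int keeps the low-order 32 bits:
int expected = (int) val;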