Java: shift distance for int restricted to 31 bits

Any idea why shift distance for int in java is restricted to 31 bits (5 lower bits of the right hand operand)?
http://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.19
x >>> n
I found a similar question, java Bit operations >>> shift, but nobody pointed out the right answer.

The shift distance is restricted to 31 bits because a Java int has 32 bits. Shifting an int by 32 or more bits would always produce the same value (either 0 or 0xFFFFFFFF, depending on the initial value and the shift operation you use).

It's a design decision, but it seems a bit unfortunate, at least for some use cases. First, some terminology: let's call the approach of treating every shift amount of at least the width of the shifted word as producing zero the saturating approach, and the Java approach of using only the bottom 5 (or 6 for long) bits to define the shift amount the mod approach.
You can look at the problem by listing the useful shift values. Those are shift amounts that result in unique output values1. If you take >>>, the interesting values are 0 through 32 inclusive. 0 results in an unchanged value, and 32 results in 0. Shifting by more than 32 would again produce the same result as 32, sure - but Java doesn't even let you shift by 32: it stops at 31! A shift by 32 will, perhaps unexpectedly, leave your value unchanged.
In many uses of >>> a shift by 32 simply cannot occur, or the Java behavior happens to work. In other cases, however, the natural result is 32, and you must special-case it to get zero.
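A quick illustration of the surprise (a minimal sketch; the variable name is arbitrary):

int x = 0xFFFF0000;            // some 32-bit pattern, == -65536
System.out.println(x >>> 31);  // 1: the former sign bit lands in the LSB
System.out.println(x >>> 32);  // -65536: 32 & 0x1F == 0, so the value is unchanged
System.out.println(x >>> 33);  // 2147450880: same as x >>> 1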
As to why they chose that design: it probably helped that the common PC hardware at the time (x86, just like today) implements shifts in exactly that way (using only the low 5 bits of the shift amount for 32-bit shifts, and the low 6 for 64-bit ones). So the shifts can be directly mapped to hardware without any special cases, conditional moves or branches2.
Furthermore, for hardware that doesn't implement those semantics by default, it is easy to get the Java semantics with a simple mask: shiftAmount & 0x1F. That's going to be fast on all hardware. The reverse mapping - implementing saturating shifts on hardware that doesn't support them - is more complex: you may need a costly compare-and-branch, some bit twiddling hacks, or predicated moves to handle the > 31 case.
Finally, the mod approach is quite natural for many algorithms. For example, if you are implementing a bitmap structure, addressable per-bit, a good implementation may be to have an array of integers, with each integer representing 32 bits. Internally to index into the Nth bit, you would break N down into two parts - the high 27 bits would find the word in the array the bit is in, and the low 5 bits would pick the bit out of the word. To pick the bit out of the word (e.g., to move it to the LSB), you might do:
int val = (word >>> (index & 0x1F)) & 1;
That sets val to 1 if the bit was set, 0 otherwise. However, because of the way the Java >>> operator was specified, you don't need the & 0x1F part at all, because it is already implied in the mod definition! So you can omit it, and indeed the JDK's BitSet uses exactly that trick.
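A minimal sketch of such a bitmap (a hypothetical class, not the JDK's BitSet, but relying on the same trick):

class SimpleBitmap {
    private final int[] words;

    SimpleBitmap(int nBits) {
        words = new int[(nBits + 31) >>> 5]; // one int per 32 bits
    }

    void set(int index) {
        words[index >>> 5] |= 1 << index;    // << masks the amount mod 32 implicitly
    }

    boolean get(int index) {
        // No "& 0x1F" on the shift amount: the mod definition of >>> implies it
        return ((words[index >>> 5] >>> index) & 1) != 0;
    }
}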
1 Granted, any value without a 1 in the MSB may not produce unique values under >>>, once all the 1s get shifted off, so let's just talk about any value with a leading one.
2 For what it's worth, I checked ARM and the semantics are even weirder: for variable shifts, the bottom eight bits of the shift amount are used. So the shift is a weird hybrid - it is effectively a saturating shift once you exceed 31, but only up to 255, at which point it loops around and suddenly has non-zero values for the next 31 values, etc.

Related

Explanation for computing the nearest power of two greater than the argument using bit twiddling

This is a well-known function to compute the next nearest power of two for positive arguments. However, I don't have much experience in bit twiddling to grok the logic/theory behind it. Would you kindly explain why and how it works? Particularly, the choice of 1, 2, 4, 8, 16 for shifting, and what would be used if the range of the argument were greater, for example for a long? Why a logical shift instead of an arithmetic one, and finally, what does ORing the shifted arg accomplish?
static int lowestPowerOfTwoGreaterThan(int arg) {
    arg |= (arg >>> 1);
    arg |= (arg >>> 2);
    arg |= (arg >>> 4);
    arg |= (arg >>> 8);
    arg |= (arg >>> 16);
    return ++arg;
}
It's really simple if you track the changes to the value. A power of two has only one set bit, e.g. 100, 10000000, 10000000000, which means that a power of two, minus one, is a sequence of ones, e.g. 10000 - 1 = 1111. So what the function does is change any number into a sequence of ones (without moving its highest 1 bit) and then add one: e.g. it changes 10000001000111001 (66105) to 11111111111111111 (131071) and adds one to make 100000000000000000 (131072).
First, it ORs the value with itself shifted 1 bit to the right. This lengthens every run of 1s in the value.
   10000001000111001
OR 01000000100011100
   =================
   11000001100111101
You now notice that each run of zeroes is preceded by at least two ones, so instead of shifting by one again, we can speed up the process by shifting by two bits instead of one.
   11000001100111101
OR 00110000011001111
   =================
   11110001111111111
Now, each run of zeroes is preceded by at least four ones, so we shift by four this time, and OR the values again.
   11110001111111111
OR 00001111000111111
   =================
   11111111111111111
Repeating this logic, the next shift distance will be 8, then 16 (stop here for 32-bit values), then 32 (stop here for 64-bit values). For this example, the result remains unchanged for further shifts since it is already a sequence of ones.
This method changes any binary number to a sequence of ones. Adding 1 to this, as stated before, produces the next greatest power of two.
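To answer the question about long: the same smearing works for 64-bit values with one extra step, since the runs of ones now need to cover up to 64 bits (a sketch under that assumption):

static long lowestPowerOfTwoGreaterThan(long arg) {
    arg |= (arg >>> 1);
    arg |= (arg >>> 2);
    arg |= (arg >>> 4);
    arg |= (arg >>> 8);
    arg |= (arg >>> 16);
    arg |= (arg >>> 32); // the extra step needed for 64-bit values
    return ++arg;
}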

Can we store integers in less than 4 bytes?

I understand that using a short in Java we can store a minimum value of -32,768 and a maximum value of 32,767 (inclusive).
And using an int we can store a minimum value of -2^31 and a maximum value of 2^31-1
Question: suppose I have an int[] numbers and the numbers I need to store are positive, up to 10 million.
Is it possible to somehow store these numbers without having to use 4 bytes for each? I am wondering if, for a specific "small" range, there might be some hack/trick so that I could use less memory than numbers.length * 4.
You could attempt to use a smaller number of bits by using masking or bit-operations to represent each number, and then perform a sign-extension later on if you wish to get the full number of bits. This kind of operation is done on a system-architecture level in nearly all computer systems today.
It may help you to research 2's Complement, which seems to be what you are going for... And possibly Sign Extension for good measure.
Typically, in high-level languages an int is represented by the basic size of the processor register, e.g. 8, 16, 32, or 64 bits.
If you use a 2's-Complement method, you could easily account for the full spectrum of positive and negative numbers if needed. This is also very easy on the hardware, because you only have to invert all the bits and then add 1, which may prove to give you a big performance increase over other possible methods.
How 2's complement works: to get -N, invert all bits of N and then add 1. That is, take the 1's complement of N and add 1 to it. For example, with 8-bit words:
 9 = 00001001
-9 = 11110111 (11110110 + 1)
This is easy and efficient in hardware (invert, then add 1). An n-bit word can be used to represent numbers from -2^(n-1) to +(2^(n-1) - 1).
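In Java this is directly observable, since ~ is the bitwise inverse (a quick check):

int n = 9;
System.out.println(~n + 1);                     // -9: invert all bits, then add 1
System.out.println(Integer.toBinaryString(-9)); // 11111111111111111111111111110111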
UPDATE: Bit-operations to represent larger numbers.
If you are trying to represent a larger number, say 1,000,000 as in your comment, you can use a bitwise left-shift operation to scale your current number up by the appropriate power of 2.
9 (base 10): 00000000000000000000000000001001 (base 2)
--------------------------------
9 << 2 (base 10): 00000000000000000000000000100100 (base 2) = 36 (base 10)
You could also try the zero-fill right shift, >>>:
This operator shifts the first operand the specified number of bits to the right. Excess bits shifted off to the right are discarded, and zero bits are shifted in from the left. The sign bit becomes 0, so the result is always non-negative.
For non-negative numbers, zero-fill right shift and sign-propagating right shift yield the same result. For example, 9 >>> 2 yields 2, the same as 9 >> 2:
9 (base 10): 00000000000000000000000000001001 (base 2)
--------------------------------
9 >>> 2 (base 10): 00000000000000000000000000000010 (base 2) = 2 (base 10)
However, this is not the case for negative numbers. For example, -9 >>> 2 yields 1073741821, which is different than -9 >> 2 (which yields -3):
-9 (base 10): 11111111111111111111111111110111 (base 2)
--------------------------------
-9 >>> 2 (base 10): 00111111111111111111111111111101 (base 2) = 1073741821 (base 10)
As others have stated in the comments, you could actually hamper your overall performance in the long-run if you are attempting to manipulate data that is not specifically word/double/etc-aligned. This is because your hardware will have to work a bit harder to try and piece together what you truly need.
Just another thought: one parameter is the range of the numbers you have, but other properties can also help save storage. For example, when you know that each number is divisible by 8, you need not store the lower 3 bits, since you know they are 0 all the time (a sketch follows below). (This is how the JVM stores "compressed" references.)
Or, to take another possible scenario: when you store prime numbers, all of them (except 2) will be odd, so there is no need to store the lowest bit, as it is always 1; of course you need to handle 2 separately. A similar trick is used in floating-point representations: since the first bit of the mantissa of a nonzero (normalized) number is always 1, it is not stored at all, thus increasing precision by 1 bit.
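A tiny sketch of the divisible-by-8 trick mentioned above (assuming every value is known to be a multiple of 8):

int value = 1040;            // known to be divisible by 8
int stored = value >>> 3;    // 130: the three always-zero low bits are dropped
int restored = stored << 3;  // 1040: exact round trip, since value % 8 == 0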
One solution is to use bit manipulation and use a number of bits of your choosing to store a single number. Say you choose to use 5 bits: you can then pack six such numbers into 4 bytes. You need to pack and unpack the bits into an integer when operations need to be done.
You need to decide if you want to deal with negative numbers in which case you need to store a sign bit.
To make it easier to use, you need to create a class that will conceal the nitty-gritty details via get and store operations.
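A minimal sketch of such a class for the asker's case: packing at byte granularity is simpler than arbitrary bit widths, and since 10 million < 2^24, three bytes per value suffice (the class name is hypothetical, and values are assumed non-negative):

// Packs each value into 3 bytes instead of 4; valid for 0 <= value < 2^24
class Int24Array {
    private final byte[] data;

    Int24Array(int length) {
        data = new byte[length * 3];
    }

    void set(int i, int value) {
        int p = i * 3;
        data[p]     = (byte) value;
        data[p + 1] = (byte) (value >>> 8);
        data[p + 2] = (byte) (value >>> 16);
    }

    int get(int i) {
        int p = i * 3;
        return (data[p] & 0xFF)
             | (data[p + 1] & 0xFF) << 8
             | (data[p + 2] & 0xFF) << 16;
    }
}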
In light of the questions about performance, as is often the case, we are trading space for performance or vice versa. Depending on the situation, various optimization techniques can be used to minimize the number of CPU cycles.
That said, is there a need for such optimization in the first place? If so, is it at the memory level or storage level? Could we use a generic mechanism such as compression to take care of this instead of using special techniques?

Alternative for checking modulo before using `>>`?

In Java, C#, JavaScript:
AFAIU, >> is a right-shift operator which can also deal with signed numbers.
There is no problem with:
12 >> 2 --> 3
And also for a signed number:
-12 >> 2 --> -3
But when the exact quotient is not an integer, the results are different:
10 >> 2 --> 2
whereas -10 >> 2 --> -3
I'm fully aware of why this happens (via two's complement), but:
Question:
Does it mean that when I use the fastest division ever, >>, I must check that, e.g.:
10 % 4 is not zero?
Am I missing something here?
You can use methods like Integer.numberOfTrailingZeros() and Long.numberOfTrailingZeros() to tell if shifting will be accurate or truncated.
You can also use bitwise AND to test the last bits, for example testing the last 4 bits:
int i = 543; // binary 1000011111
if ((i & 0x0f) == 0)
    System.out.println("Last 4 bits are zeros!"); // not printed for 543
Although note that it's not worth using bit shift for "fast" division. You're not going to outsmart the compiler because most of today's compilers are intelligent enough to optimize these cases.
More on this: Is multiplication and division using shift operators in C actually faster?
Edit:
The answer to your question is that bit shifting is not defined as "the fastest division ever"; it is defined as what its name says: bit shifting, which in the case of negative numbers gives (or might give) a different result.
You're not missing anything. If your input can be negative, your 2 options are:
Either check the value and, if it might give a different result, correct it or fall back to division. A simple check is to test whether it's negative, or to test the last bits (described above).
Completely avoid using bit shift for division purposes.

What is the purpose of the unsigned right shift operator ">>>" in Java?

I understand what the unsigned right shift operator ">>>" in Java does, but why do we need it, and why do we not need a corresponding unsigned left shift operator?
The >>> operator lets you treat int and long as 32- and 64-bit unsigned integral types, which are missing from the Java language.
This is useful when you shift something that does not represent a numeric value. For example, you could represent a black and white bitmap image using 32-bit ints, where each int encodes 32 pixels on the screen. If you need to scroll the image to the right, you would prefer the bits on the left of an int to become zeros, so that you could easily put in the bits from the adjacent ints:
int shiftBy = 3;
int[] imageRow = ...
int carry = 0;
// The last shiftBy bits are set to 1, the remaining ones are zero
int mask = (1 << shiftBy) - 1;
for (int i = 0; i != imageRow.length; i++) {
    // Cut out the shiftBy bits on the right
    int nextCarry = imageRow[i] & mask;
    // Do the shift, and move the carry into the freed upper bits
    imageRow[i] = (imageRow[i] >>> shiftBy) | (carry << (32 - shiftBy));
    // Prepare the carry for the next iteration of the loop
    carry = nextCarry;
}
The code above does not pay attention to the content of the upper three bits, because the >>> operator makes them zero.
There is no corresponding unsigned left-shift operator because left-shift operations on signed and unsigned data types are identical.
>>> is also the safe and efficient way of finding the rounded mean of two (large) integers:
int mid = (low + high) >>> 1;
If integers high and low are close to the largest machine integer, the above will be correct but
int mid = (low + high) / 2;
can get a wrong result because of overflow.
Here's an example use, fixing a bug in a naive binary search.
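A quick demonstration of the overflow (the values are arbitrary large ints):

int low = 2_000_000_000, high = 2_100_000_000;
System.out.println((low + high) / 2);   // -97483648: the int sum wrapped around
System.out.println((low + high) >>> 1); // 2050000000: correct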
Basically this has to do with signed (numeric) shifts versus unsigned shifts (normally pixel-related stuff).
Since the left shift doesn't deal with the sign bit anyhow, a <<< would be the same thing as <<.
Either way, I have yet to meet anyone who needed to use >>>, but I'm sure they are out there doing amazing things.
As you have just seen, the >> operator automatically fills the high-order bit with its previous contents each time a shift occurs. This preserves the sign of the value. However, sometimes this is undesirable. For example, if you are shifting something that does not represent a numeric value, you may not want sign extension to take place. This situation is common when you are working with pixel-based values and graphics. In these cases you will generally want to shift a zero into the high-order bit no matter what its initial value was. This is known as an unsigned shift. To accomplish this, you will use Java's unsigned right-shift operator, >>>, which always shifts zeros into the high-order bit.
Further reading:
http://henkelmann.eu/2011/02/01/java_the_unsigned_right_shift_operator
http://www.java-samples.com/showtutorial.php?tutorialid=60
The signed right-shift operator is useful if one has an int that represents a number and one wishes to divide it by a power of two, rounding toward negative infinity. This can be nice when doing things like scaling coordinates for display; not only is it faster than division, but coordinates which differ by the scale factor before scaling will differ by one pixel afterward. If instead of using shifting one uses division, that won't work. When scaling by a factor of two, for example, -1 and +1 differ by two, and should thus differ by one afterward, but -1/2=0 and 1/2=0. If instead one uses signed right-shift, things work out nicely: -1>>1=-1 and 1>>1=0, properly yielding values one pixel apart.
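In code (a small check of the claim):

System.out.println(-1 / 2);  // 0
System.out.println( 1 / 2);  // 0  -- one pixel apart collapses to zero apart
System.out.println(-1 >> 1); // -1
System.out.println( 1 >> 1); // 0  -- still one pixel apart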
The unsigned operator is useful either in cases where either the input is expected to have exactly one bit set and one will want the result to do so as well, or in cases where one will be using a loop to output all the bits in a word and wants it to terminate cleanly. For example:
void processBitsLsbFirst(int n, BitProcessor whatever)
{
    while (n != 0)
    {
        whatever.processBit(n & 1);
        n >>>= 1;
    }
}
If the code were to use a signed right-shift operation and were passed a negative value, it would output 1's indefinitely. With the unsigned-right-shift operator, however, the most significant bit ends up being interpreted just like any other.
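For instance, if BitProcessor were a functional interface with a single processBit method (an assumption; it is not a JDK type), passing -1 would visit all 32 bits exactly once:

// Hypothetical usage: prints thirty-two 1s, then the loop terminates
processBitsLsbFirst(-1, bit -> System.out.print(bit));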
The unsigned right-shift operator may also be useful when a computation would, arithmetically, yield a positive number between 0 and 4,294,967,295 and one wishes to divide that number by a power of two. For example, when computing the sum of two int values which are known to be positive, one may use (n1+n2)>>>1 without having to promote the operands to long. Also, if one wishes to divide a positive int value by something like pi without using floating-point math, one may compute ((value*5468522205L) >>> 34) [(1L<<34)/pi is 5468522204.61, which rounded up yields 5468522205]. For dividends over 1686629712, the computation of value*5468522205L would yield a "negative" value, but since the arithmetically-correct value is known to be positive, using the unsigned right-shift would allow the correct positive number to be used.
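A quick check of those numbers, using the answer's own constant:

int value = 2_000_000_000;
long product = value * 5468522205L;           // wraps past Long.MAX_VALUE, appears negative
System.out.println(product >>> 34);           // 636619772
System.out.println((long) (value / Math.PI)); // 636619772, for comparison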
A normal right shift >> of a negative number will keep it negative. I.e. the sign bit will be retained.
An unsigned right shift >>> will shift the sign bit too, replacing it with a zero bit.
There is no need to have the equivalent left shift because there is only one sign bit and it is the leftmost bit so it only interferes when shifting right.
Essentially, the difference is that one preserves the sign bit, the other shifts in zeros to replace the sign bit.
For positive numbers they act identically.
For an example of using both >> and >>> see BigInteger shiftRight.
In the Java domain, for most typical applications, the way to avoid overflow is to use casting or BigInteger, such as promoting int to long as in the previous examples.
int hiint = 2147483647;
System.out.println("mean hiint+hiint/2 = " + ((long) hiint + (long) hiint) / 2);
System.out.println("mean hiint*2/2 = " + ((long) hiint * 2L) / 2);
BigInteger bhiint = BigInteger.valueOf(2147483647);
System.out.println("mean bhiint+bhiint/2 = " + bhiint.add(bhiint).divide(BigInteger.valueOf(2)));

Why do integers in Java not use all of the 32 or 64 bits?

I was looking into 32-bit and 64-bit values. I noticed that the range of integer values that can be stored in 32 bits is ±4,294,967,295, but the Java int is also 32-bit (if I am not mistaken) and it stores values only up to ±2,147,483,648. Same thing for long: it stores values from 0 to ±2^63, but 64 bits can store ±2^64 values. How come these values are different?
Integers in Java are signed, so one bit is reserved to represent whether the number is positive or negative. The representation is called "two's complement notation." With this approach, the maximum positive value represented by n bits is given by
(2 ^ (n - 1)) - 1
and the corresponding minimum negative value is given by
-(2 ^ (n - 1))
The "off-by-one" aspect to the positive and negative bounds is due to zero. Zero takes up a slot, leaving an even number of negative numbers and an odd number of positive numbers. If you picture the represented values as marks on a circle—like hours on a clock face—you'll see that zero belongs more to the positive range than the negative range. In other words, if you count zero as sort of positive, you'll find more symmetry in the positive and negative value ranges.
To learn this representation, start small. Take, say, three bits and write out all the numbers that can be represented:
0
1
2
3
-4
-3
-2
-1
Can you write the three-bit sequence that defines each of those numbers? Once you understand how to do that, try it with one more bit. From there, you imagine how it extends up to 32 or 64 bits.
That sequence forms a "wheel," where each value is formed by adding one to the previous, with the noted wraparound from 3 to -4. That wraparound effect (which can also occur with subtraction) is called "modulo arithmetic."
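For reference, the eight three-bit two's-complement patterns (the answer to the exercise above):

000 =  0    100 = -4
001 =  1    101 = -3
010 =  2    110 = -2
011 =  3    111 = -1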
In 32 bits you can store 2^32 values. Whether you call these values 0 to 4294967295 or -2147483648 to +2147483647 is up to you. This difference is called "signed type" versus "unsigned type". Java supports only signed types for int; other languages have separate types for an unsigned 32-bit value.
No language will have a 32-bit type for ±4294967295, because the "-" part would require another bit.
That's because Java ints are signed, so you need one bit for the sign.
