Use of << and >>> in a hash function

Use of << and >>> in a hash function - java

I have a project in C where I need to created a suitable hash function for void pointers which could contain alphanumeric chars, ints or just plain ol' chars.
I need to use a polynomial hash fuction, where instead of multiplying by a constant, I should use a cyclic shift of partial sums by a fixed number of bits.
In this page here
, there's the java code(I assume this is java because of the use of strings):
static int hashCode(String s) {
int h = 0;
for (int i = 0; i < s.length(); i++) {
h = (h << 5) | (h >>> 27); // 5-bit cyclic shift of the running sum
h += (int) s.charAt(i); // add in next character
}
return h;
}
What exactly is this line, below, doing?
h = (h << 5) | (h >>> 27); // 5-bit cyclic shift of the running sum
Yes, the comment says 5bit cyclic shift, but how does the <<, | and >>> operands work in this regard? I've never seen or used any of them before.

As it says, it's a 5-bit cyclic left shift. This means that all the bits are shifted left, with the bit "shifted off" added to the right side, five times.
The code replaces the value of h with the value of two bit patterns ORed together. The first bit pattern is the original value shifted left 5 bits. The second value is the original value shifted right 27 bits.
The left shift of 5 bits puts all the bits but the leftmost five in their final position. The leftmost 5 bits get "shifted out" by that shift and replaced with zeroes as the rightmost bits of the output. The right shift of 27 bits put the leftmost five bits in their final position as the rightmost bits, shifting in zeroes for the leftmost 27 bits. ORing them together produces the desired output.
The >>> is Java's unsigned shift operation. In C or C++, you'd just use >>.

Related

The method add mod 2^512

It's add modulo 2^512. Could you explain me why we doing here >>8 and then &oxFF?
I know i'm bad in math.
int AddModulo512(int []a, int []b)
{
int i = 0, t = 0;
int [] result = new int [a.length];
for(i = 63; i >= 0; i--)
{
t = (a[i]) + (int) (b[i]) + (t >> 8);
result[i] = (t & 0xFF); //?
}
return result;
}

The mathematical effect of a bitwise shift right (>>) on an integer is to divide by two (truncating any remainder). By shifting right 8 times, you divide by 2^8, or 256.
The bitwise & with 0xFF means that the result will be limited to the first byte, or a range of 0-255.
Not sure why it references modulo 512 when it actually divides by 256.

It looks like you have 64 ints in each array, but your math is modulo 2^512. 512 divided by 64 is 8, so you are only using the least significant 8 bits in each int.
Here, t is used to store an intermediate result that may be more than 8 bits long.
In the first loop, t is 0, so it doesn't figure in the addition in the first statement. There's nothing to carry yet. But the addition may result in a value that needs more than 8 bits to store. So, the second line masks out the least significant 8 bits to store in the current result array. The result is left intact to the next loop.
What does the previous value of t do in the next iteration? It functions as a carry in the addition. Bit-shifting it to the right 8 positions makes any bits beyond 8 in the previous loop's result into a carry into the current position.
Example, with just 2-element arrays, to illustrate the carrying:
[1, 255] + [1, 255]
First loop:
t = 255 + 255 + (0) = 510; // 1 11111110
result[i] = 510 & 0xFF = 254; // 11111110
The & 0xFF here takes only the least significant 8 bits. In the analogy with normal math, 9 + 9 = 18, but in an addition problem with many digits, we say "8 carry the 1". The bitmask here performs the same function as extracting the "8" out of 18.
Second loop:
// 1 11111110 >> 8 yields 0 00000001
t = 1 + 1 + (510 >> 8) = 1 + 1 + 1 = 3; // The 1 from above is carried here.
result[i] = 3 & 0xFF = 3;
The >> 8 extracts the possible carry amount. In the analogy with normal math, 9 + 9 = 18, but in an addition problem with many digits, we say "8 carry the 1". The bit shift here performs the same function as extracting the "1" out of 18.
The result is [3, 254].
Notice how any carry leftover from the last iteration (i == 0) is ignored. This implements the modulo 2^512. Any carryover from the last iteration represents 2^512 and is ignored.

>> is a bitwise shift.
The signed left shift operator "<<" shifts a bit pattern to the left,
and the signed right shift operator ">>" shifts a bit pattern to the
right. The bit pattern is given by the left-hand operand, and the
number of positions to shift by the right-hand operand. The unsigned
right shift operator ">>>" shifts a zero into the leftmost position,
while the leftmost position after ">>" depends on sign extension.
& is a bitwise and
The bitwise & operator performs a bitwise AND operation.
https://docs.oracle.com/javase/tutorial/java/nutsandbolts/op3.html
http://www.tutorialspoint.com/java/java_bitwise_operators_examples.htm

>> is the bitshift operator
0xFF is the hexadecimal literal for 255.

I think your question misses a very important part, the data format, i.e. how data are stored in a[] and b[]. To solve this question, I make some assumptions:
Since it's modulo arithmetic, a, b <= 2^512. Thus, a and b have 512 bits.
Since a and b have 64 elements, only 8 right-most bits of each elements are used. In other words, a[i], b[i] <= 256.
Then, what remains is very straightforward. Just consider each a[i] and b[i] as a digit (each digit is 8-bit) in a base 2^512 addition and then perform addition by adding digit-by-digit from right-to-left.
t is the carry variable which stores the value (with carry) of the addition at the last digit. t>>8 throws a way the right-most 8 bits that has been used for the last addition which is used as carry for the current addition. (t & 0xFF) gets the right-most 8 bits of t which is used for the current digit.
Since it's modulo addition, the final carry is thrown away.

Understanding unsigned right shift

The JLS 15.19 describes the formula for >>> operator.
The value of n >>> s is n right-shifted s bit positions with
zero-extension, where:
If n is positive, then the result is the same as that of n >> s.
If n is negative and the type of the left-hand operand is int, then
the result is equal to that of the expression (n >> s) + (2 << ~s).
If n is negative and the type of the left-hand operand is long, then
the result is equal to that of the expression (n >> s) + (2L << ~s).
Why does n >>> s = (n >> s) + (2 << ~s), where ~s = 31 - s for int and ~s = 63 - s for long?

If n is negative it means that the sign bit is set.
>>> s means shift s places to the right introducing zeros into the vacated slots.
>> s means shift s places to the right introducing copies of the sign bit into the vacated slots.
E.g.
10111110000011111000001111100000 >>> 3 == 00010111110000011111000001111100
10111110000011111000001111100000 >> 3 == 11110111110000011111000001111100
Obviously if n is not negative, n >> s and n >>> s are the same. If n is negative, the difference will consist of s ones at the left followed by all zeros.
In other words:
(n >>> s) + X == n >> s (*)
where X consists of s ones followed by 32 - s zeros.
Because there are 32 - s zeros in X, the right-most one in X occurs in the position of the one in 1 << (32 - s), which is equal to 2 << (31 - s), which is the same as 2 << ~s (because ~s == -1 - s and shift amounts work modulo 32 for ints).
Now what happens when you add 2 << ~s to X? You get zero! Let's demonstrate this in the case s == 7. Notice that the the carry disappears off the left.
11111110000000000000000000000000
+ 00000010000000000000000000000000
________________________________
00000000000000000000000000000000
It follows that -X == 2 << ~s. Therefore adding -X to both sides of (*) we get
n >>> s == (n >> s) + (2 << ~s)
For long it's exactly the same, except that shift amounts are done modulo 64, because longs have 64 bits.

Here is some additional context that will help you understand pbabcdefp's answer if you don't already know the basics he assumes:
To understand bitwise operators you must think about numbers as strings of binary digits, eg. 20 = 00010100 and -4 = 11111100 (For the sake of clarity and not having to write so many digits I will be writing all binary numbers as bytes; ints are the same but four times as long). If you are unfamiliar with binary and binary operations, you can read more here. Note how the first digit is special: It makes numbers negative, as if it had a place value (remember elementary math, the ones/tens/hundreds places?) of the most negative number possible, so Byte.MIN_VALUE = -128 = 1000000, and setting any other bit to 1 always increases the number. To easily read a negative number such as 11110011, know that -1 = 11111111, then read the 0s as if they were 1s in a positive number, then that number is how far away you are from -1. So 11110011 = -1 - 00001100 = -1 - 12 = -13.
Also understand that ~s is bitwise NOT: It takes all the digits and flips them, this is actually equivalent to ~s = -1 - s. Eg ~5 (00000101) is -6 (11111010). Observe how my suggested method for reading negative binary numbers is simply a trick to be able to read the bitwise NOT of the number rather than the number itself, which is easier for negative numbers close to zero because those numbers have fewer 0s than 1s.

Create and fill a variable-sized binary truth table

I am trying solve the 0/1 Knapsack problem through brute force. The simplest (it seems) way to do it would be to set up a 2d matrix with 1's and 0's signifying present and non-present in the knapsack, respectively. The parameters would be the number of items (ie: columns), so then the rows should be 2^numOfItems. But since the number of items isn't constant, I can't think of how to fill the matrix. I was told that bit-shifting would work, but I do not understand how that works. Can someone point me in the right direction?
EDIT: by truth table I mean the 'A' part of one of these: http://www.johnloomis.org/ece314/notes/devices/binary_to_BCD/bcd03.png

You don't have to store all the bit sequences in a matrix, it's unnecessary and will waste way too much memory. You can simply use an integer to denote the current set. The integer will go from 0 to 2^n-1 where n is the number of elements that you can choose from. Here's the basic idea.
int max = (1 << n);
for(int set = 0; set < max; set++)
{
for(int e = 0; e < n; e++)
{
if((set & (1 << e)) != 0)
//eth bit is 1 means that the eth item is in our set
else
// eth element will not be put in the knapsack
}
}
The algorithm relies on logical left bit shifting. (1 << n) means that we will shift 1, n positions to the left by padding zeros to the right side of the number. So for example, if we represent 1 as an 8-bit number 00000001, (1 << 1) == 00000010, (1 << 2) == 00000100, etc. The bitwise-and operator is an operator that takes two arguments, and "ands" every two bits that have the same index. So if we have 2 bit-strings of length n each, bit zero will be anded with bit 0, bit 1 with bit 1, etc. The output of & is a 1 if and only if both bits are 1s, otherwise it's 0. Why is this useful?? we need it to test bits. For example, assume that we have some set represented as a bit-string, and we want to determine if the ith bit in the bit-set is one or a zero. We can do that by using a shift left operation followed by a bitwise-and operation.
Example
Set = 00101000
we want to test Set(3) (remember that the rightmost bit is bit 0)
We can do that by shifting 1 3 places to the left, so it becomes 00001000. Then we "and" the shifted 1 with the set
00101000
&
00001000
---------
00001000
As you can see, if the bit I am testing is a 1, then the output of the & will be non zero, otherwise it'll be zero.

What does this binary documentation mean?

I'm trying to decode somebody's byte array and I'm stuck at this part:
&lt state &gt ::= "01" <i>(2 bits) for A</i>
"10" <i>(2 bits) for B</i>
"11" <i>(2 bits) for C</i>
I think this wants me to look at the next 2 bits of the next byte. Would that mean the least or most significant digits of the byte? I suppose I would just throw away the last 6 bits if it means the least significant?
I found this code for looking at the bits of a byte:
for (int i = 0; i < byteArray.Length; i++)
{
byte b = byteArray[i];
byte mask = 0x01;
for (int j = 0; j < 8; j++)
{
bool value = b & mask;
mask << 1;
}
}
Can someone expand on what this does exactly?

Just to give you a start:
To extract individual bits of a byte, you use "&", called the bitwise and operator. The bitwise and operation means "preserve all bits which are set on both sides". E.g. when you calculate the bitwise-and of two bytes, e.g. 00000011 & 00000010, then the result is 00000010, because only the bit at the second last position is set in both sides.
In java programming language, the very same example looks like this:
int a = 3;
int b = 2;
int bitwiseAndResult = a & b; // bitwiseAndResult will be equal to 2 after this
Now to examine if the n'th bit of some int is set, you can do this:
int intToExamine = ...;
if ((intToExamine >> n)) & 1 != 0) {
// here we know that the n'th bit was set
}
The >> is called the bitshift operator. It simply shifts the bits from left to right, like this: 00011010 >> 2 will have the result 00000110.
So from the above you can see that for extracting the n'th bit of some value, you first shift the n'th bit to position 0 (note that the first bit is bit 0, not bit 1), and then you use the bitwise and operator (&) to only keep that bit 0.
Here are some simple examples of bitwise and bit shift operators:
http://www.tutorialspoint.com/java/java_bitwise_operators_examples.htm

Difference between >>> and >>

What is the difference between >>> and >> operators in Java?

>> is arithmetic shift right, >>> is logical shift right.
In an arithmetic shift, the sign bit is extended to preserve the signedness of the number.
For example: -2 represented in 8 bits would be 11111110 (because the most significant bit has negative weight). Shifting it right one bit using arithmetic shift would give you 11111111, or -1. Logical right shift, however, does not care that the value could possibly represent a signed number; it simply moves everything to the right and fills in from the left with 0s. Shifting our -2 right one bit using logical shift would give 01111111.

>>> is unsigned-shift; it'll insert 0. >> is signed, and will extend the sign bit.
JLS 15.19 Shift Operators
The shift operators include left shift <<, signed right shift >>, and unsigned right shift >>>.
The value of n>>s is n right-shifted s bit positions with sign-extension.
The value of n>>>s is n right-shifted s bit positions with zero-extension.
System.out.println(Integer.toBinaryString(-1));
// prints "11111111111111111111111111111111"
System.out.println(Integer.toBinaryString(-1 >> 16));
// prints "11111111111111111111111111111111"
System.out.println(Integer.toBinaryString(-1 >>> 16));
// prints "1111111111111111"
To make things more clear adding positive counterpart
System.out.println(Integer.toBinaryString(121));
// prints "1111001"
System.out.println(Integer.toBinaryString(121 >> 1));
// prints "111100"
System.out.println(Integer.toBinaryString(121 >>> 1));
// prints "111100"
Since it is positive both signed and unsigned shifts will add 0 to left most bit.
Related questions
Right Shift to Perform Divide by 2 On -1
Is shifting bits faster than multiplying and dividing in Java? .NET?
what is c/c++ equivalent way of doing ‘>>>’ as in java (unsigned right shift)
Negative logical shift
Java’s >> versus >>> Operator?
What is the difference between the Java operators >> and >>>?
Difference between >>> and >> operators
What’s the reason high-level languages like C#/Java mask the bit shift count operand?
1 >>> 32 == 1

>>> will always put a 0 in the left most bit, while >> will put a 1 or a 0 depending on what the sign of it is.

They are both right-shift, but >>> is unsigned
From the documentation:
The unsigned right shift operator ">>>" shifts a zero into the leftmost position, while the leftmost position after ">>" depends on sign extension.

The logical right shift (v >>> n) returns a value in which the bits in v have been shifted to the right by n bit positions, and 0's are shifted in from the left side. Consider shifting 8-bit values, written in binary:
01111111 >>> 2 = 00011111
10000000 >>> 2 = 00100000
If we interpret the bits as an unsigned nonnegative integer, the logical right shift has the effect of dividing the number by the corresponding power of 2. However, if the number is in two's-complement representation, logical right shift does not correctly divide negative numbers. For example, the second right shift above shifts 128 to 32 when the bits are interpreted as unsigned numbers. But it shifts -128 to 32 when, as is typical in Java, the bits are interpreted in two's complement.
Therefore, if you are shifting in order to divide by a power of two, you want the arithmetic right shift (v >> n). It returns a value in which the bits in v have been shifted to the right by n bit positions, and copies of the leftmost bit of v are shifted in from the left side:
01111111 >> 2 = 00011111
10000000 >> 2 = 11100000
When the bits are a number in two's-complement representation, arithmetic right shift has the effect of dividing by a power of two. This works because the leftmost bit is the sign bit. Dividing by a power of two must keep the sign the same.

Read more about Bitwise and Bit Shift Operators
>> Signed right shift
>>> Unsigned right shift
The bit pattern is given by the left-hand operand, and the number of positions to shift by the right-hand operand. The unsigned right shift operator >>> shifts a zero into the leftmost position,
while the leftmost position after >> depends on sign extension.
In simple words >>> always shifts a zero into the leftmost position whereas >> shifts based on sign of the number i.e. 1 for negative number and 0 for positive number.
For example try with negative as well as positive numbers.
int c = -153;
System.out.printf("%32s%n",Integer.toBinaryString(c >>= 2));
System.out.printf("%32s%n",Integer.toBinaryString(c <<= 2));
System.out.printf("%32s%n",Integer.toBinaryString(c >>>= 2));
System.out.println(Integer.toBinaryString(c <<= 2));
System.out.println();
c = 153;
System.out.printf("%32s%n",Integer.toBinaryString(c >>= 2));
System.out.printf("%32s%n",Integer.toBinaryString(c <<= 2));
System.out.printf("%32s%n",Integer.toBinaryString(c >>>= 2));
System.out.printf("%32s%n",Integer.toBinaryString(c <<= 2));
output:
11111111111111111111111111011001
11111111111111111111111101100100
111111111111111111111111011001
11111111111111111111111101100100
100110
10011000
100110
10011000

The right shift logical operator (>>> N) shifts bits to the right by N positions, discarding the sign bit and padding the N left-most bits with 0's. For example:
-1 (in 32-bit): 11111111111111111111111111111111
after a >>> 1 operation becomes:
2147483647: 01111111111111111111111111111111
The right shift arithmetic operator (>> N) also shifts bits to the right by N positions, but preserves the sign bit and pads the N left-most bits with 1's. For example:
-2 (in 32-bit): 11111111111111111111111111111110
after a >> 1 operation becomes:
-1: 11111111111111111111111111111111

>> Signed right shift
>>> Unsigned right shift
Example:-
byte x, y; x=10; y=-10;
SOP("Bitwise Left Shift: x<<2 = "+(x<<2));
SOP("Bitwise Right Shift: x>>2 = "+(x>>2));
SOP("Bitwise Zero Fill Right Shift: x>>>2 = "+(x>>>2));
SOP("Bitwise Zero Fill Right Shift: y>>>2 = "+(y>>>2));
output would be :-
Bitwise Left Shift: x<<2 = 40
Bitwise Right Shift: x>>2 = 2
Bitwise Zero Fill Right Shift: x>>>2 = 2
Bitwise Zero Fill Right Shift: y>>>2 = 1073741821

>>(signed) will give u different result for 8 >> 2, -8 >> 2.
right shift of 8
8 = 1000 (In Binary)
perform 2 bit right shift
8 >> 2:
1000 >> 2 = 0010 (equivalent to 2)
right shift of -8
8 = 1000 (In Binary)
1's complement = 0111
2's complement:
0111 + 1 = 1000
Signed bit = 1
perform 2 bit right shift (on 2's co result)
8 >> 2:
1000 >> 2 = 1110 (equivalent to -2)
>>(unsigned) will give u same result for 8 >>> 2, -8 >>> 2.
unsigned right shift of 8
8 = 1000
8 >>> 2 = 0010
unsigned right shift of -8
-8 = 1000
-8 >>> 2 = 0010

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Use of << and >>> in a hash function - java

Related

The method add mod 2^512

Understanding unsigned right shift

Create and fill a variable-sized binary truth table

What does this binary documentation mean?

Difference between >>> and >>

Categories

Resources