Non cryptographic hashing in Java using unsigned integers - java

I am looking for a hashing function that can be used for non cryptographic purposes in Java. The challenge is that most of the hashing functions return signed integer values (-,0,+) that cannot be used as identifier in every context (for example negative integers cannot be used in URLs). One solution to this problem is that I come up with is to use a 32 bit signed int and convert it to a 32 bit unsigned int and store it in a long. This works pretty well. However, 32 bit random information makes hash collisions too frequent in our setup. One way of solving this is to use a 64 bit hashing function (same SipHash works fine) and convert that signed integer to unsigned by shifting one to the right and having 0 in the MSB position. I was trying to achieve that with the Java >> operator but the results does not make sense.
//Using Guava
private final static HashFunction hashFunction = Hashing.sipHash24();
private static int getRandomInt() {
return hashFunction.newHasher().putLong(rnd.nextLong()).hash().asInt();
}
private static long getRandomLong(){
return hashFunction.newHasher().putLong(rnd.nextLong()).hash().asLong();
}
Bitshifting:
System.out.println(Long.toBinaryString(-2147483648L >> 1));
1111111111111111111111111111111111000000000000000000000000000000
What am I missing and how could I have a 62 bit unsigned integer hash value stored in a 64 bit int (long) in Java?
UPDATE1:
After doing some research I finally found a way to correctly display the effect of >>> on a Long value:
System.out.println(
String.format("%64s", Long.toBinaryString(-2147483648L))
.replace(' ', '0'));
System.out.println(
String.format("%64s", Long.toBinaryString(-2147483648L >>> 1))
.replace(' ', '0'));
1111111111111111111111111111111110000000000000000000000000000000
0111111111111111111111111111111111000000000000000000000000000000

a >> b
shifts a to the right by b bits. On the left it repeats the bit that was already there (sign extending!).
Examples:
101010 >> 1 = 110101
010101 >> 1 = 001010
a >>> b
also shifts a to the right by b bits, but doesn't sign extend. It always adds in zeros on the left:
101010 >>> 1 = 010101
010101 >>> 1 = 001010

Related

Storing unsigned long value in two 16bit register

I want to store unsigned long value in two 16bit register.For example if I have long value (-2,147,483,648 to 2,147,483,647) then I'm using formula like:
v[0] = myValue % 65536
v[1] = myValue / 65536
To get value from register
outVal = reg0 + (reg1 * 65536)
But how to do for unsigned long which value range is from 0 to 4,294,967,295?
As commenter harold pointed out already, your formula doesn't even work correctly for negative numbers.
You should do it bitwise instead of using math to avoid surprises (and speed things up in case the compiler didn't optimize it for you already).
Splitting:
v[0] = myValue & 0xFFFF
v[1] = myValue >> 16 // this implicitly cuts off the lower 16 bits
// by shifting them away into the nirvana
Joining:
outVal = reg0 | (reg1 << 16)
This now applies to both signed and unsigned (provided that all your variables have the same "sign type").
Legend, in case your language (which you didn't specify) uses different operators:
& is bitwise AND, | is bitwise OR, << and >> are bitwise shifting left/right (SHL/SHR), 0x marks a hexadecimal literal (you could use 65536 instead of 0xFFFF, but I think the hex literal makes it clearer where this magic number comes from).

Sign extension, bit shifting in JAVA. Help understanding a C-code bit

I have the following C-code (from FFMPEG):
static inline av_const int sign_extend(int val, unsigned bits)
{
unsigned shift = 8 * sizeof(int) - bits;
union { unsigned u; int s; } v = { (unsigned) val << shift };
return v.s >> shift;
}
I'm trying to reproduce this in JAVA. But I have difficulties understanding this. No matter how I toss the bits around, I don't get very close.
As for the value parameter: it takes unsigned byte value as int.
Bits parameter: 4
If the value is 255 and bits is 4. It returns -1. I can't reproduce this in JAVA. Sorry for such fuzzy question. But can you help me understand this code?
The big picture is that I'm trying to encode EA ADPCM audio in JAVA. In FFMPEG:
https://gitorious.org/ffmpeg/ffmpeg/source/c60caa5769b89ab7dc4aa41a21f87d5ee177bd30:libavcodec/adpcm.c#L981
Strictly speaking, the result of running this code with this input data has unspecified results because signed bitshift in C is only properly defined in circumstances that this scenario does not meet. From the C99 standard:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has unsigned type or if E1 has signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2E2. If E1 has a signed type and negative value, the resulting value is implementation-defined.
(Emphasis mine)
But let's assume that your implementation defines signed right shift to extend the sign, meaning that the space on the left will be filled with ones if the sign bit is set and zeroes otherwise; the ffmpeg code clearly expects this to be the case. The following is happening: shift has the value of 28 (assuming 32-bit integers). In binary notation:
00000000 00000000 00000000 11111111 = val
11110000 00000000 00000000 00000000 = (unsigned) val << shift
Note that when interpreting (unsigned) val << shift as signed integer, as the code proceeds to do (assuming two's complement representation, as today's computers all use1), the sign bit of that integer is set, so a signed shift to the right fills up with zeroes from the left, and we get
11110000 00000000 00000000 00000000 = v.s
11111111 11111111 11111111 11111111 = v.s >> shift
...and in two's complement representation, that is -1.
In Java, this trick works the same way -- except better, because there the behavior is actually guaranteed. Simply:
public static int sign_extend(int val, int bits) {
int shift = 32 - bits; // int always has 32 bits in Java
int s = val << shift;
return s >> shift;
}
Or, if you prefer:
public static int sign_extend(int val, int bits) {
int shift = 32 - bits;
return val << shift >> shift;
}
1 Strictly speaking, this conversion does not have a well-defined value in the C standard either, for historical reasons. There used to be computers that used different representations, and the same bit pattern with a set sign bit has a completely different meaning in (for example) signed magnitude representation.
The reason why the code looks so odd is that C language is full of undefined behaviours that in Java are well-defined. For example in C bit-shifting a signed integer left so that the sign-bit changes is undefined behaviour and at that point the program can do anything - whatever the compiler causes the program to do - crash, print 42, make true = false, anything can happen, and the compiler still compiled it correctly.
Now the code uses a 1 trick to shift the integer left: it uses an union that lays the bytes of members top of each other - making an unsigned and an signed integer to occupy the same bytes; the bitshift is defined with the unsigned integer; so we do the unsigned shift using it; then shift back using the signed shift (the code assumes that the right shift of a negative number produces properly sign-extended negative numbers, which is also not guaranteed by standard but usually these kinds of libraries have a configuration utility that can refuse compilation on such a quite esoteric platform; likewise this program assumes that CHAR_BIT is 8; however C only makes a guarantee that a char is at least 8 bits wide.
In Java, you do not need anything like a union to accomplish this; instead you do:
static int signExtend(int val, int bits) {
int shift = 32 - bits; // fixed size
int v = val << shift;
return v >> shift;
}
In Java the width of an int is always 32 bits; << can be used for both signed and unsigned shift; and there is no undefined behaviour for extending to the sign bit; >> can be used for signed shift (>>> would be unsigned).
given this code:
static inline av_const int sign_extend(int val, unsigned bits)
{
unsigned shift = 8 * sizeof(int) - bits;
union { unsigned u; int s; } v = { (unsigned) val << shift };
return v.s >> shift;
}
the 'static' modifier says the function is not visible outside the current file.
The 'inline' modifier is a 'request' to the compiler to place the code 'inline' whereever the function is called rather than having a separate function with the associated call/return code sequences
the 'sign_extend' is the name of the function
in C, a right shift, for a signed value will propagate the sign bit,
In C, a right shift, for a unsigned value will zero fill.
It looks like your java is doing the zero fill.
regarding this line:
unsigned shift = 8 * sizeof(int) - bits;
on a 32bit machine, an integer is 32 bits and size of int is 4
so the variable 'shift' will contain (8*4)-bits
regarding this line:
union { unsigned u; int s; } v = { (unsigned) val << shift };
left shift of unsigned will shift the bits left,
with the upper bits being dropped into the bit bucket
and the lower bits being zero filled.
regarding this line:
return v.s >> shift;
this shifts the bits back to their original position,
while propagating the (new) sign bit

Convert 32 bit fixed point value

In my Java application I need to interpret a 32 Bit Fixed Point value. The number format is as follows: The first 15 bits describe the places before the comma/point, the 16th bit represents the sign of the value and the following 16 bits describe the decimal places (1/2,1/4,1/8,1/16,...).
The input is a byte array with four values. The order of the bits in the byte array is little endian.
How can I convert such a number into Java float ?
Just do exactly what it says. Assume x is the 32bit fixed point number as int.
So, put the bits after the point, after the point, and don't use the sign here:
float f = (float)(x & 0x7fff_ffff) / (float)(1 << 16);
Put back the sign:
return x < 0 ? -f : f;
You will lose some precision. A float does not have 31 bits of precision, but your input does. It's easily adapted to doubles though.
Since the sign bit is apparently really in the middle, first get it out:
int sign = x & (1 << 16);
Join the two runs of non-sign bits:
x = (x & 0xFFFF) | ((x >> 1) & 0x7fff0000);
Then do more or less the old thing:
float f = (float)x / (float)(1 << 16);
return sign == 0 ? f : -f;
In case your input is little endian format, use the following approach to generate x:
int x = ByteBuffer.wrap(weirdFixedPoint).order(ByteOrder.LITTLE_ENDIAN).getInt();
where weirdFixedPoint is the byte array containing the 32 bit binary representation.

Bitwise op unexpectedly goes negative

Can someone please explain to me why I'm getting these results?
public static int ipv4ToInt(String address) {
int result = 0;
// iterate over each octet
for(String part : address.split(Pattern.quote("."))) {
// shift the previously parsed bits over by 1 byte
result = result << 8;
System.out.printf("shift = %d\n", result);
// set the low order bits to the current octet
result |= Integer.parseInt(part);
System.out.printf("result = %d\n", result);
}
return result;
}
For ipv4ToInt("10.35.41.134"), I get:
shift = 0
result = 10
shift = 2560
result = 2595
shift = 664320
result = 664361
shift = 170076416
result = 170076550
10.35.41.134 = 170076550
This is the same result that I get when I do the math myself.
For ipv4ToInt("192.168.0.1"), I get:
shift = 0
result = 192
shift = 49152
result = 49320
shift = 12625920
result = 12625920
shift = -1062731776
result = -1062731775
192.168.0.1 = -1062731775
For this one, when I do the math manually, I get 3232235521.
Interestingly:
3232235521 = 11000000101010000000000000000001
And when I enter 1062731775 into my Windows calc and hit the +/- button, I get:
-1062731775 = 11111111111111111111111111111111 11000000101010000000000000000001
The function still works for my purposes, but I'm just really curious to know why on earth result is going negative when I do that last bit shift?
Because of bit overflow in your case!
In Java, the integer is also 32 bits and range is from -2,147,483,648 to 2,147,483,647.
12625920 << 8 crosses the limit of 2^31-1 and hence,the result turns negative...
The result just overturns from -ve side and hence,whatever range is left from the positive side is accompanied by that much from negative side!!!
As suggested by everyone,you should use long variable to avoid overflow!
All primitives in java are signed - They can be positive or negative. The highest bit is used to control this, so when it gets set, the number becomes negative.
Try using a long instead - that will give you the wanted result, as longs can be much larger without overflow being hit.
11000000101010000000000000000001 is 32 bits. The value it represents depends on how you interpret those bits:
as a 32bit signed integer, it is negative (because the left-most bit is 1)
as a 32bit unsigned integer, it is positive (like every other unsigned integer)
as the lowers bits of a long (signed or unsigned), it would give a positive value (if the left-most bits stay 0)
Java uses signed integers, so if you print the value, you'll see a negative number. That does not mean your bits are wrong.
It matters only for printing, you can use these ways for printing a positive integer:
1) change result to long (java will do the work for you).
2) handle signed int as unsigned int by yourself (possibly make a method int2string(int n) or toString(int n).
3) Integer.toUnsignedString(int) is you are using java 8.
As far as I understand, your are parsing an IP address, remember, java use signed data types, you can't have a 32 bits integer.
change result to long to fix that.

Converting Little Endian to Big Endian

All,
I have been practicing coding problems online. Currently I am working on a problem statement Problems where we need to convert Big Endian <-> little endian. But I am not able to jot down the steps considering the example given as:
123456789 converts to 365779719
The logic I am considering is :
1 > Get the integer value (Since I am on Windows x86, the input is Little endian)
2 > Generate the hex representation of the same.
3 > Reverse the representation and generate the big endian integer value
But I am obviously missing something here.
Can anyone please guide me. I am coding in Java 1.5
Since a great part of writing software is about reusing existing solutions, the first thing should always be a look into the documentation for your language/library.
reverse = Integer.reverseBytes(x);
I don't know how efficient this function is, but for toggling lots of numbers, a ByteBuffer should offer decent performance.
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
...
int[] myArray = aFountOfIntegers();
ByteBuffer buffer = ByteBuffer.allocate(myArray.length*Integer.BYTES);
buffer.order(ByteOrder.LITTLE_ENDIAN);
for (int x:myArray) buffer.putInt(x);
buffer.order(ByteOrder.BIG_ENDIAN);
buffer.rewind();
int i=0;
for (int x:myArray) myArray[i++] = buffer.getInt(x);
As eversor pointed out in the comments, ByteBuffer.putInt() is an optional method, and may not be available on all Java implementations.
The DIY Approach
Stacker's answer is pretty neat, but it is possible to improve upon it.
reversed = (i&0xff)<<24 | (i&0xff00)<<8 | (i&0xff0000)>>8 | (i>>24)&0xff;
We can get rid of the parentheses by adapting the bitmasks. E. g., (a & 0xFF)<<8 is equivalent to a<<8 & 0xFF00. The rightmost parentheses were not necessary anyway.
reversed = i<<24 & 0xff000000 | i<<8 & 0xff0000 | i>>8 & 0xff00 | i>>24 & 0xff;
Since the left shift shifts in zero bits, the first mask is redundant. We can get rid of the rightmost mask by using the logical shift operator, which shifts in only zero bits.
reversed = i<<24 | i>>8 & 0xff00 | i<<8 & 0xff0000 | i>>>24;
Operator precedence here, the gritty details on shift operators are in the Java Language Specification
Check this out
int little2big(int i) {
return (i&0xff)<<24 | (i&0xff00)<<8 | (i&0xff0000)>>8 | (i>>24)&0xff;
}
The thing you need to realize is that endian swaps deal with the bytes that represent the integer. So the 4 byte number 27 looks like 0x0000001B. To convert that number, it needs to go to 0x1B000000... With your example, the hex representation of 123456789 is 0x075BCD15 which needs to go to 0x15CD5B07 or in decimal form 365779719.
The function Stacker posted is moving those bytes around by bit shifting them; more specifically, the statement i&0xff takes the lowest byte from i, the << 24 then moves it up 24 bits, so from positions 1-8 to 25-32. So on through each part of the expression.
For example code, take a look at this utility.
Java primitive wrapper classes support byte reversing since 1.5 using reverseBytes method.
Short.reverseBytes(short i)
Integer.reverseBytes(int i)
Long.reverseBytes(long i)
Just a contribution for those who are looking for this answer in 2018.
I think this can also help:
int littleToBig(int i)
{
int b0,b1,b2,b3;
b0 = (i&0x000000ff)>>0;
b1 = (i&0x0000ff00)>>8;
b2 = (i&0x00ff0000)>>16;
b3 = (i&0xff000000)>>24;
return ((b0<<24)|(b1<<16)|(b2<<8)|(b3<<0));
}
Just use the static function (reverseBytes(int i)) in java which is under Integer Wrapper class
Integer i=Integer.reverseBytes(123456789);
System.out.println(i);
output:
365779719
the following method reverses the order of bits in a byte value:
public static byte reverseBitOrder(byte b) {
int converted = 0x00;
converted ^= (b & 0b1000_0000) >> 7;
converted ^= (b & 0b0100_0000) >> 5;
converted ^= (b & 0b0010_0000) >> 3;
converted ^= (b & 0b0001_0000) >> 1;
converted ^= (b & 0b0000_1000) << 1;
converted ^= (b & 0b0000_0100) << 3;
converted ^= (b & 0b0000_0010) << 5;
converted ^= (b & 0b0000_0001) << 7;
return (byte) (converted & 0xFF);
}

Categories