Specifically I want to take this number:
x = 1452610545672622396
and perform
x ^= (x << 21) // In Python I do x ^= (x << 21) & 0xffffffffffffffff
I want to get: -6403331237455490756, which is what I get in Java
instead of: 12043412836254060860, which is what I get in Python (which is what I don't want)
EDIT: In Java I do:
long x = 1452610545672622396L;
x ^= (x << 21);
You can get a 64-bit signed integer like Java's long by using ctypes.c_longlong, which keeps only the low 64 bits. See the example below:
from ctypes import c_longlong as ll
x = 1452610545672622396
output = ll(x ^ (x << 21))
print(output.value)  # the signed 64-bit value
print(output.__class__)
The difference comes from overflow. A Java long is 64 bits wide, so bits shifted past bit 63 are discarded, while Python integers have no size limit. Note that Java's Long wrapper class is also limited to 64 bits; if you want Java to keep the extra bits as Python does, use BigInteger.
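For reference, the two numbers in the question are the same 64-bit pattern read in two ways. The following is a minimal Java sketch (the class and method names are made up for illustration) that shows the signed result and its unsigned reading. Long.toUnsignedString and Long.parseUnsignedLong require Java 8 or later.

```java
final class XorShiftSigned {
    /** One xorshift step on a signed 64-bit long; bits shifted past bit 63 are dropped. */
    static long step(long x) {
        return x ^ (x << 21);
    }

    public static void main(String[] args) {
        long x = step(1452610545672622396L);
        System.out.println(x);                         // signed view, as Java prints a long
        System.out.println(Long.toUnsignedString(x));  // the same bits read as unsigned
        // The other direction: the masked Python value parsed back into a signed long
        System.out.println(Long.parseUnsignedLong("12043412836254060860"));
    }
}
```

In Python, the equivalent conversion is to subtract 2**64 from the masked value whenever it is at or above 2**63.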
Is it valid to use XOR shift to produce a usable checksum? I can't find any evidence that it collides more than, say, CRC32.
I ran a simulation on 10 million randomly generated byte arrays of length 8 to 32, and the hash32 method below actually produced 2% fewer collisions than CRC32.
Also, the code seems to run about 40x faster than Java's built-in java.util.zip.CRC32 class.
public static long hash64( byte[] bytes )
{
long x = 1;
for ( int i = 0; i < bytes.length; i++ )
{
x ^= bytes[ i ];
x ^= ( x << 21 );
x ^= ( x >>> 35 );
x ^= ( x << 4 );
}
return x;
}
public static int hash32( byte[] bytes )
{
int x = 1;
for ( int i = 0; i < bytes.length; i++ )
{
x ^= bytes[ i ];
x ^= ( x << 13 );
x ^= ( x >>> 17 );
x ^= ( x << 5 );
}
return x;
}
Yes, if all you need is a simple file checksum, it's a completely valid alternative, but it's not the best solution.
CRCs are optimized for reliably detecting burst errors, not for collision resistance or uniform distribution. CRC-32 may superficially appear to work as a general hash function or a checksum, but it readily fails avalanche and collision tests, as you've seen in your test. CRC is also quite slow because it must implement polynomial division, which requires expensive operations even when heavily optimized into shift operations. Table-driven versions of CRC that use lookup tables (LUTs) are also slow in managed languages such as Java because of the bounds check performed under the hood on each lookup.
Your solution is to take Xorshift, a pseudorandom number generator (PRNG), and transform it into a hash function. On the surface, this may seem to pass basic collision tests, but it is not a very good choice. Its avalanche behavior is quite poor, so there is a greater-than-chance probability of collisions that your tests aren't sensitive enough to find. It is also sub-optimal because it reads only one byte at a time. Better solutions exist with comparable performance.
A much better choice is 64-bit MurmurHash3, which performs quite well in Java when sufficiently optimized. It may even be faster than your solution for large inputs. I also recommend reading Bret Mulvey's article on hash functions, which explains how hash functions are constructed and tested in a digestible way.
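One way to look at avalanche behavior yourself is to flip one input bit at a time and count how many output bits change. The sketch below is only an illustration under assumptions: the class name, the helper averageFlippedBits, and the trial parameters are made up, and hash32 is copied from the question. An ideal 32-bit hash would change about 16 output bits on average.

```java
import java.util.Random;

final class AvalancheSketch {
    /** hash32 as posted in the question. */
    static int hash32(byte[] bytes) {
        int x = 1;
        for (int i = 0; i < bytes.length; i++) {
            x ^= bytes[i];
            x ^= (x << 13);
            x ^= (x >>> 17);
            x ^= (x << 5);
        }
        return x;
    }

    /** Average number of output bits that change when a single input bit is flipped. */
    static double averageFlippedBits(int trials, int length, long seed) {
        Random rnd = new Random(seed);
        byte[] data = new byte[length];
        long flipped = 0;
        long samples = 0;
        for (int t = 0; t < trials; t++) {
            rnd.nextBytes(data);
            int base = hash32(data);
            for (int bit = 0; bit < length * 8; bit++) {
                data[bit / 8] ^= (byte) (1 << (bit % 8));   // flip one input bit
                flipped += Integer.bitCount(base ^ hash32(data));
                data[bit / 8] ^= (byte) (1 << (bit % 8));   // restore it
                samples++;
            }
        }
        return (double) flipped / samples;
    }

    public static void main(String[] args) {
        System.out.println(averageFlippedBits(10_000, 16, 42L));
    }
}
```

The further the printed average is from 16, the weaker the mixing. A fuller test would also look at each input/output bit pair separately.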
In my Java application I need to interpret a 32 Bit Fixed Point value. The number format is as follows: The first 15 bits describe the places before the comma/point, the 16th bit represents the sign of the value and the following 16 bits describe the decimal places (1/2,1/4,1/8,1/16,...).
The input is a byte array with four values. The order of the bits in the byte array is little endian.
How can I convert such a number into a Java float?
Just do exactly what it says. Assume x is the 32-bit fixed-point number as an int.
If the sign were the top bit, you would divide by 2^16 to move the low 16 bits behind the binary point, leaving the sign bit out of the division:
float f = (float)(x & 0x7fff_ffff) / (float)(1 << 16);
Put back the sign:
return x < 0 ? -f : f;
You will lose some precision. A float does not have 31 bits of precision, but your input does. It's easily adapted to doubles though.
Since the sign bit is apparently really in the middle, first get it out:
int sign = x & (1 << 16);
Join the two runs of non-sign bits:
x = (x & 0xFFFF) | ((x >> 1) & 0x7fff0000);
Then do more or less the old thing:
float f = (float)x / (float)(1 << 16);
return sign == 0 ? f : -f;
In case your input is little endian format, use the following approach to generate x:
int x = ByteBuffer.wrap(weirdFixedPoint).order(ByteOrder.LITTLE_ENDIAN).getInt();
where weirdFixedPoint is the byte array containing the 32 bit binary representation.
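Putting the pieces together, a self-contained sketch of the sign-in-the-middle variant might look like the following. The class name and the sample bytes are made up for illustration; the bit layout is the one described in the question (15 integer bits, then the sign bit, then 16 fraction bits).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class WeirdFixedPoint {
    /** Decodes 4 little-endian bytes laid out as [15 integer bits][sign bit][16 fraction bits]. */
    static float decode(byte[] raw) {
        int x = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt();
        boolean negative = (x & (1 << 16)) != 0;                 // sign sits at bit 16
        // Join the fraction (bits 0-15) with the integer part (bits 17-31 moved down by one)
        int magnitude = (x & 0xFFFF) | ((x >>> 1) & 0x7FFF0000);
        float f = magnitude / 65536.0f;                          // scale by 2^16
        return negative ? -f : f;
    }

    public static void main(String[] args) {
        // Integer part 1, fraction 0.5, sign clear: 0x00028000 stored little endian
        System.out.println(decode(new byte[] {0x00, (byte) 0x80, 0x02, 0x00}));
        // Same magnitude with the sign bit set: 0x00038000
        System.out.println(decode(new byte[] {0x00, (byte) 0x80, 0x03, 0x00}));
    }
}
```

Switching the return type and the divisor to double avoids the precision loss mentioned above.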
I have two separate applications, one in Java and the other in C++. I am using MurmurHash3 in both. However, in C++ I get a different result from Java for the same string.
Here is the one from C++: https://code.google.com/p/smhasher/source/browse/trunk/MurmurHash3.cpp?r=144
I am using the following function:
void MurmurHash3_x86_32 ( const void * key, int len,
uint32_t seed, void * out )
Here is the one for Java: http://search-hadoop.com/c/HBase:hbase-common/src/main/java/org/apache/hadoop/hbase/util/MurmurHash3.java||server+void+%2522hash
There are many versions of the same Java code above.
This is how I am making a call for Java:
String s = new String("b2622f5e1310a0aa14b7f957fe4246fa");
System.out.println(MurmurHash3.murmurhash3_x86_32(s.getBytes(), 0, s.length(), 2147368987));
The output I get from Java:
-1868221715
The output I get from C++:
3297211900
When I tested for some other sample strings like
"7c6c5be91430a56187060e06fd64dcb8" and "7e7e5f2613d0a2a8c591f101fe8c7351" they match in Java and C++.
Any pointers are appreciated
There are two problems I can see. First, C++ is using uint32_t, and giving you a value of 3,297,211,900. This number is larger than can fit in a signed 32-bit int, and Java uses only signed integers. However, -1,868,221,715 is not equal to 3,297,211,900, even accounting for the difference between signed and unsigned ints.
(In Java 8 they have added Integer.toUnsignedString(int), which will convert a signed 32-bit int to its unsigned string representation. In earlier versions of Java, you can cast the int to a long and then mask off the high bits: ((long) i) & 0xffffffffL.)
The second problem is that you are using the wrong version of getBytes(). The one that takes no argument converts a Unicode String to a byte[] using the default platform encoding, which may vary depending on how your system is set up. It could be giving you UTF-8, Latin1, Windows-1252, KOI8-R, Shift-JIS, EBCDIC, etc.
Never, ever, ever call the no arguments version of String.getBytes(), under any circumstances. It should be deprecated, decimated, defenestrated, destroyed, and deleted.
Use s.getBytes("UTF-8") (or whatever encoding you're expecting to get) instead.
As the Zen of Python says, "Explicit is better than implicit."
I can't tell if there may be any other problems beyond these two.
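Both points can be checked with a few lines of Java. This is a sketch with made-up class and helper names, and it uses StandardCharsets.UTF_8 (Java 7 and later) as an alternative to passing the charset name as a string.

```java
import java.nio.charset.StandardCharsets;

final class UnsignedCompare {
    /** Reads a Java int as the unsigned 32-bit value C++ would print for the same bits. */
    static long asUnsigned(int hash) {
        return hash & 0xFFFFFFFFL;
    }

    /** Encodes with an explicit charset instead of the platform default. */
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int javaHash = -1868221715;   // value reported for Java in the question
        long cppHash = 3297211900L;   // value reported for C++ in the question
        System.out.println(Integer.toUnsignedString(javaHash)); // Java 8+
        System.out.println(asUnsigned(javaHash) == cppHash);    // are the bit patterns equal?
        System.out.println((int) cppHash);                      // C++ value read as a signed int
        System.out.println(utf8("b2622f5e1310a0aa14b7f957fe4246fa").length);
    }
}
```

For the values in the question, the unsigned reading of the Java result still differs from the C++ result, so signedness alone does not explain the mismatch.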
I had the same problem as you, although my Java version of MurmurHash3 is different from yours. After making some changes to the C++ version of MurmurHash3, I got the two versions to generate the same hash values. Here is my solution, which you can use to check whether it also works for you.
Probably the biggest difference between the Java and C++ versions lies in the right-shift operation (Java has both >> and >>>, while C++ has only >>). Integers in Java are all signed, while in C++ you can use signed or unsigned integers. In Java, >> is an arithmetic right shift and >>> is a logical right shift. In C++, >> is a logical shift on unsigned types and, on most compilers, an arithmetic shift on signed types.
The original C++ version of MurmurHash3 uses unsigned integers. To generate negative hash values as in Java, first change all the unsigned uint32_t types in the C++ code to the signed int32_t. Then locate each >>> in the Java code and adjust the corresponding >> in C++ so that it still shifts logically. For me, that meant changing from:
inline uint32_t rotl32 ( uint32_t x, int8_t r )
{
return (x << r) | (x >> (32 - r));
}
to:
inline int32_t rotl32 ( int32_t x, int8_t r )
{
return (x << r) | (int32_t)((uint32_t)x >> (32 - r)); //similar to >>> in Java
}
and from:
FORCE_INLINE uint32_t fmix32 ( uint32_t h )
{
h ^= h >> 16;
h *= 0x85ebca6b;
h ^= h >> 13;
h *= 0xc2b2ae35;
h ^= h >> 16;
return h;
}
to:
FORCE_INLINE int32_t fmix32 ( int32_t h )
{
h ^= (int32_t)((uint32_t)h >> 16); // similar to >>> in Java
h *= 0x85ebca6b;
h ^= (int32_t)((uint32_t)h >> 13);
h *= 0xc2b2ae35;
h ^= (int32_t)((uint32_t)h >> 16);
return h;
}
In this way, my two versions of Murmurhash3 in Java and C++ generate the same hash value.
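For comparison, the Java counterparts of these two helpers are usually written with >>> and a rotate. The sketch below is an assumption about what a typical Java port looks like, not the code behind the link in the question, and the class name is made up.

```java
final class Murmur3Helpers {
    /** Rotate left; >>> keeps the right shift logical, like uint32_t in the original C++. */
    static int rotl32(int x, int r) {
        return (x << r) | (x >>> (32 - r));
    }

    /** Finalization mix; int multiplication wraps modulo 2^32 just as uint32_t does. */
    static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        System.out.println(rotl32(0x80000001, 1));                    // top bit wraps around to bit 0
        System.out.println(Integer.toHexString(fmix32(0x12345678)));
    }
}
```

Integer.rotateLeft(x, r) in the standard library does the same job as rotl32 here.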
I have written the following three functions for my project:
WORD shuffling(WORD x)
{
// WORD - 4 bytes - 32 bits
//given input - a0,a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15- b0,b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15
//output required - a0,b0,a1,b1,a2,b2,a3,b3,a4,b4,a5,b5,a6,b6,a7,b7 - a8,b8,a9,b9,a10,b10,a11,b11,a12,b12,a13,b13,a14,b14,a15,b15
x = (x & 0X0000FF00) << 8 | (x >> 8) & 0X0000FF00 | x & 0XFF0000FF;
x = (x & 0X00F000F0) << 4 | (x >> 4) & 0X00F000F0 | x & 0XF00FF00F;
x = (x & 0X0C0C0C0C) << 2 | (x >> 2) & 0X0C0C0C0C | x & 0XC3C3C3C3;
x = (x & 0X22222222) << 1 | (x >> 1) & 0X22222222 | x & 0X99999999;
return x;
}
WORD t_function(WORD n)
{
WORD t_result=0;
WORD64 var = 2*((n*n)& 0xFFFFFFFF)+n; // masking with 0xFFFFFFFF keeps n*n mod 2^32, a 32-bit word
t_result = (WORD) ((var)& 0xFFFFFFFF);
return t_result;
}
WORD lfsr(WORD t_result)
{
WORD returnValue = t_result;
WORD flag = 0;
flag = returnValue & 0x80000000; // Checking if MSB is 1 or 0
// Left shift the input
returnValue = returnValue << 1;
// If MSB is 1 then XOR the result with the primitive polynomial
if(flag > 0)
{
returnValue = returnValue ^ 0x4C11DB7;
}
return returnValue;
}
WORD - unsigned long
This code is in C. Now I have to implement it in Java. The code compiles and runs fine, but in C I used unsigned long, and in Java I have used int since I am operating on 32 bits at a time. The problem is that whenever a result goes outside the range of int, the Java output deviates from the output of the C code. Is there a way to reproduce the unsigned long range of values in Java?
Update – Java 8 can treat signed int & long as if unsigned
In Java, the primitive integer data types (byte, short, int, and long) are signed (positive or negative).
As of Java 8 both int and long can be treated explicitly as if they are unsigned. Officially a feature now, but kind of a hack nonetheless. Some may find it useful in certain limited circumstances. See the Java Tutorial.
int: By default, the int data type is a 32-bit signed two's complement integer, which has a minimum value of -2³¹ and a maximum value of 2³¹-1. In Java SE 8 and later, you can use the int data type to represent an unsigned 32-bit integer, which has a minimum value of 0 and a maximum value of 2³²-1. Use the Integer class to use int data type as an unsigned integer. See the section The Number Classes for more information. Static methods like compareUnsigned, divideUnsigned etc have been added to the Integer class to support the arithmetic operations for unsigned integers.
long: The long data type is a 64-bit two's complement integer. The signed long has a minimum value of -2⁶³ and a maximum value of 2⁶³-1. In Java SE 8 and later, you can use the long data type to represent an unsigned 64-bit long, which has a minimum value of 0 and a maximum value of 2⁶⁴-1. Use this data type when you need a range of values wider than those provided by int. The Long class also contains methods like compareUnsigned, divideUnsigned etc to support arithmetic operations for unsigned long.
I am not necessarily recommending this approach. I’m merely making you aware of the option.
Short answer: Java has no unsigned integer type apart from the 16-bit char. long in C is 32 bits on 32-bit systems, but Java's long is 64 bits, so you can use it as a replacement and mask results with 0xFFFFFFFFL (at least that would solve the overflow problem). If you need even wider integers, use the BigInteger class.
Look over Java's Primitive Data Types. If you need something bigger than a long, try a BigInteger.
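Another option is to stay with int. Two's complement addition, multiplication, left shift, and the bitwise operators produce the same 32 bits whether the value is read as signed or unsigned, so only right shifts, comparisons, and printing need care. The sketch below is one possible port of the three functions under that assumption, with a 32-bit WORD; the class and method names are made up.

```java
final class UnsignedPort {
    static int shuffling(int x) {
        // >>> instead of >> so that zeros, not copies of the sign bit, are shifted in
        x = (x & 0x0000FF00) << 8 | (x >>> 8) & 0x0000FF00 | x & 0xFF0000FF;
        x = (x & 0x00F000F0) << 4 | (x >>> 4) & 0x00F000F0 | x & 0xF00FF00F;
        x = (x & 0x0C0C0C0C) << 2 | (x >>> 2) & 0x0C0C0C0C | x & 0xC3C3C3C3;
        x = (x & 0x22222222) << 1 | (x >>> 1) & 0x22222222 | x & 0x99999999;
        return x;
    }

    static int tFunction(int n) {
        // int arithmetic already wraps modulo 2^32, so no 64-bit temporary is needed
        return 2 * (n * n) + n;
    }

    static int lfsr(int t) {
        int result = t << 1;
        if (t < 0) {              // a negative int means the MSB is set
            result ^= 0x04C11DB7;
        }
        return result;
    }

    public static void main(String[] args) {
        int r = lfsr(tFunction(shuffling(0xFFFF0000)));
        System.out.println(Integer.toHexString(r));      // the raw 32 bits
        System.out.println(Integer.toUnsignedString(r)); // Java 8+: prints like C's %lu would
    }
}
```

On older Java versions, `r & 0xFFFFFFFFL` gives the same unsigned value as a long for printing or comparison.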
All,
I have been practicing coding problems online. Currently I am working on a problem where we need to convert big endian <-> little endian. But I am not able to jot down the steps for the given example:
123456789 converts to 365779719
The logic I am considering is :
1 > Get the integer value (Since I am on Windows x86, the input is Little endian)
2 > Generate the hex representation of the same.
3 > Reverse the representation and generate the big endian integer value
But I am obviously missing something here.
Can anyone please guide me? I am coding in Java 1.5.
Since a great part of writing software is about reusing existing solutions, the first thing should always be a look into the documentation for your language/library.
reverse = Integer.reverseBytes(x);
I don't know how efficient this function is, but for toggling lots of numbers, a ByteBuffer should offer decent performance.
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
...
int[] myArray = aFountOfIntegers();
ByteBuffer buffer = ByteBuffer.allocate(myArray.length*Integer.BYTES);
buffer.order(ByteOrder.LITTLE_ENDIAN);
for (int x:myArray) buffer.putInt(x);
buffer.order(ByteOrder.BIG_ENDIAN);
buffer.rewind();
for (int i = 0; i < myArray.length; i++) myArray[i] = buffer.getInt(); // relative get; getInt(x) would read at byte index x
As eversor pointed out in the comments, ByteBuffer.putInt() is an optional operation: it throws ReadOnlyBufferException on a read-only buffer. A buffer obtained from ByteBuffer.allocate() is writable, so this is not a concern here.
The DIY Approach
Stacker's answer is pretty neat, but it is possible to improve upon it.
reversed = (i&0xff)<<24 | (i&0xff00)<<8 | (i&0xff0000)>>8 | (i>>24)&0xff;
We can get rid of the parentheses by adapting the bitmasks. E. g., (a & 0xFF)<<8 is equivalent to a<<8 & 0xFF00. The rightmost parentheses were not necessary anyway.
reversed = i<<24 & 0xff000000 | i<<8 & 0xff0000 | i>>8 & 0xff00 | i>>24 & 0xff;
Since the left shift shifts in zero bits, the first mask is redundant. We can get rid of the rightmost mask by using the logical shift operator, which shifts in only zero bits.
reversed = i<<24 | i>>8 & 0xff00 | i<<8 & 0xff0000 | i>>>24;
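This works without parentheses because of Java's operator precedence: shifts bind tighter than &, which in turn binds tighter than |. The gritty details on shift operators are in the Java Language Specification.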
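A quick way to convince yourself that the shortened expression still does the same job is to compare it with Integer.reverseBytes on a few values, including negative ones. This is a small sketch with made-up class and method names.

```java
final class ByteSwapCheck {
    /** The fully masked and parenthesized version. */
    static int swapMasked(int i) {
        return (i & 0xff) << 24 | (i & 0xff00) << 8 | (i & 0xff0000) >> 8 | (i >> 24) & 0xff;
    }

    /** The shortened version that relies on precedence and on >>>. */
    static int swapShort(int i) {
        return i << 24 | i >> 8 & 0xff00 | i << 8 & 0xff0000 | i >>> 24;
    }

    public static void main(String[] args) {
        int[] samples = {123456789, 27, 0, -1, 0x80000000, 0x12345678};
        for (int s : samples) {
            boolean same = swapMasked(s) == Integer.reverseBytes(s)
                        && swapShort(s) == Integer.reverseBytes(s);
            System.out.println(Integer.toHexString(s) + " -> "
                    + Integer.toHexString(swapShort(s)) + " " + same);
        }
    }
}
```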
Check this out
int little2big(int i) {
return (i&0xff)<<24 | (i&0xff00)<<8 | (i&0xff0000)>>8 | (i>>24)&0xff;
}
The thing you need to realize is that endian swaps deal with the bytes that represent the integer. So the 4 byte number 27 looks like 0x0000001B. To convert that number, it needs to go to 0x1B000000... With your example, the hex representation of 123456789 is 0x075BCD15 which needs to go to 0x15CD5B07 or in decimal form 365779719.
The function Stacker posted is moving those bytes around by bit shifting them; more specifically, the statement i&0xff takes the lowest byte from i, the << 24 then moves it up 24 bits, so from positions 1-8 to 25-32. So on through each part of the expression.
For example code, take a look at this utility.
Java primitive wrapper classes support byte reversing since 1.5 using reverseBytes method.
Short.reverseBytes(short i)
Integer.reverseBytes(int i)
Long.reverseBytes(long i)
Just a contribution for those who are looking for this answer in 2018.
I think this can also help:
int littleToBig(int i)
{
int b0,b1,b2,b3;
b0 = (i&0x000000ff)>>0;
b1 = (i&0x0000ff00)>>8;
b2 = (i&0x00ff0000)>>16;
b3 = (i&0xff000000)>>>24; // unsigned shift, so a set top bit is not sign-extended
return ((b0<<24)|(b1<<16)|(b2<<8)|(b3<<0));
}
Just use the static method reverseBytes(int i) of the Integer wrapper class:
Integer i=Integer.reverseBytes(123456789);
System.out.println(i);
output:
365779719
The following method reverses the order of bits in a byte value:
public static byte reverseBitOrder(byte b) {
int converted = 0x00;
converted ^= (b & 0b1000_0000) >> 7;
converted ^= (b & 0b0100_0000) >> 5;
converted ^= (b & 0b0010_0000) >> 3;
converted ^= (b & 0b0001_0000) >> 1;
converted ^= (b & 0b0000_1000) << 1;
converted ^= (b & 0b0000_0100) << 3;
converted ^= (b & 0b0000_0010) << 5;
converted ^= (b & 0b0000_0001) << 7;
return (byte) (converted & 0xFF);
}
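The same result can also be obtained from the standard library. The sketch below uses Integer.reverse, which reverses all 32 bits of an int, and then keeps the top byte; the class and method names are made up. A loop-based reference is included so that the two can be compared over every byte value.

```java
final class BitReverseCheck {
    /** Reverses the 8 bits of b by reversing the promoted int and taking its top byte. */
    static byte viaIntegerReverse(byte b) {
        return (byte) (Integer.reverse(b) >>> 24);
    }

    /** Straightforward loop: move bit k of the input to bit 7 - k of the output. */
    static byte viaLoop(byte b) {
        int out = 0;
        for (int k = 0; k < 8; k++) {
            out |= ((b >> k) & 1) << (7 - k);
        }
        return (byte) out;
    }

    public static void main(String[] args) {
        boolean allEqual = true;
        for (int v = 0; v < 256; v++) {
            allEqual &= viaIntegerReverse((byte) v) == viaLoop((byte) v);
        }
        System.out.println(allEqual);
    }
}
```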