Using XOR Shift as a faster CRC32 checksum? - java

Is it valid to use XOR shift to produce a usable checksum? I can't find any evidence that it collides more than say CRC32.
I ran a simulation on 10 million randomly generated byte arrays of length 8 to 32, and the hash32 method below actually produced 2% fewer collisions than CRC32.
Also, the code seems to run about 40x faster than Java's built-in java.util.zip.CRC32 class.
public static long hash64( byte[] bytes )
{
long x = 1;
for ( int i = 0; i < bytes.length; i++ )
{
x ^= bytes[ i ];
x ^= ( x << 21 );
x ^= ( x >>> 35 );
x ^= ( x << 4 );
}
return x;
}
public static int hash32( byte[] bytes )
{
int x = 1;
for ( int i = 0; i < bytes.length; i++ )
{
x ^= bytes[ i ];
x ^= ( x << 13 );
x ^= ( x >>> 17 );
x ^= ( x << 5 );
}
return x;
}

Yes, if all you need is a simple file checksum, it's a completely valid alternative, but it's not the best solution.
CRCs are optimized for reliably detecting burst errors, not for collision resistance or uniform distribution. CRC-32 may superficially appear to work as a general hash function or a checksum, but it readily fails avalanche and collision tests, as you've seen in your test. A software CRC is also fairly slow because it implements polynomial division, which is expensive even when heavily optimized into shift operations. Table-driven versions that use lookup tables (LUTs) can also be slow in Java, because each lookup incurs an array bounds check unless the JIT compiler manages to eliminate it.
Your solution takes Xorshift, a pseudorandom number generator (PRNG), and turns it into a hash function. On the surface, this may seem to pass basic collision tests, but it is not a very good choice. Its avalanche behavior is quite poor, so there is a greater-than-chance probability of collisions that your tests aren't sensitive enough to find. It is also sub-optimal because it reads only one byte at a time. Better solutions exist with comparable performance.
A much better choice is 64-bit MurmurHash3, which performs quite well in Java when sufficiently optimized. It may even be faster than your solution for large inputs. I also recommend reading Bret Mulvey's article on Hash Functions. It explains how hash functions are constructed and tested in a digestible way.
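One way to probe the avalanche behavior mentioned above is to flip a single input bit and count how many output bits change; an ideal 32-bit hash changes about 16 of them on average. The following is only a minimal sketch, not a rigorous test: the class name AvalancheSketch, the method averageFlippedBits, and the trial parameters are made up for illustration, and hash32 is copied from the question.

```java
import java.util.Random;

class AvalancheSketch {
    // hash32 exactly as posted in the question.
    static int hash32(byte[] bytes) {
        int x = 1;
        for (int i = 0; i < bytes.length; i++) {
            x ^= bytes[i];
            x ^= (x << 13);
            x ^= (x >>> 17);
            x ^= (x << 5);
        }
        return x;
    }

    // Average number of output bits that change when one random input bit is flipped.
    // The parameters (trials, length, seed) are hypothetical knobs for this sketch.
    static double averageFlippedBits(int trials, int length, long seed) {
        Random rnd = new Random(seed);
        long flipped = 0;
        for (int t = 0; t < trials; t++) {
            byte[] data = new byte[length];
            rnd.nextBytes(data);
            int before = hash32(data);
            int bit = rnd.nextInt(length * 8);           // pick one input bit
            data[bit >>> 3] ^= (byte) (1 << (bit & 7));  // flip it
            flipped += Integer.bitCount(before ^ hash32(data));
        }
        return (double) flipped / trials;
    }

    public static void main(String[] args) {
        System.out.println(averageFlippedBits(100_000, 16, 42L));
    }
}
```

Running the same measurement against another hash function would give a baseline to compare the number with.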

Related

Explanation of BigInteger 'multiplyToLen' Function

While working on a large integer implementation of my own, I looked through Java's BigInteger source in order to gain further understanding of multiplication algorithms, and focused mainly on multiplyToLen().
Overall, the function seems to take the general grade-school multiplication approach, but I cannot understand key parts of it.
First, the algorithm goes through this first loop, where x and y are the two numbers being multiplied, and z is the product:
int xstart = xlen - 1;
int ystart = ylen - 1;
...
for (int j=ystart, k=ystart+1+xstart; j >= 0; j--, k--) {
long product = (y[j] & LONG_MASK) * (x[xstart] & LONG_MASK) + carry;
z[k] = (int)product;
carry = product >>> 32;
}
z[xstart] = (int)carry;
Then it goes on to the next loop, which seems a lot closer to the grade-school algorithm.
for (int i = xstart-1; i >= 0; i--) {
carry = 0;
for (int j=ystart, k=ystart+1+i; j >= 0; j--, k--) {
long product = (y[j] & LONG_MASK) * (x[i] & LONG_MASK) +
(z[k] & LONG_MASK) + carry;
z[k] = (int)product;
carry = product >>> 32;
}
z[i] = (int)carry;
}
I have tried tracing both loops using decimal numbers to no avail, and I cannot grasp the function of the first loop versus the second loop.
What part of the multiplication algorithm is being done in the first loop?
The first loop multiplies the least significant 32-bit word of x (x[xstart]) by each word of y in turn, and stores the lower 32 bits of each result in the result array z. The upper 32 bits are used as the carry for the next, more significant word of y.
The other loops do almost the same, but they have to add the results to the integers already stored in the z array, so they are not as simple as the first one.
The bit fiddling with the longs and the LONG_MASK is only there to treat the integers as unsigned 32 bit values (Java does generally not know unsigned integers) by promoting them to 64 bit integers and then masking the lower 32 bits to get unsigned 32 bit values. The 64 bit multiplication results disregard any overflow in bit 63. The lower bits are stored (loop 1) or added (other loops) to the already calculated results from previous loops, found in z. The top 32 bits are used as carry for the next iteration.
This is how it is generally done. My Delphi code for BigIntegers does the same, and IIRC, that is also the algorithm that Knuth shows in his Art Of Computer Programming (vol II).
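The explanation above can be sketched as a single pair of nested loops. This is an illustrative version, not the JDK's code: the class SchoolbookMultiply and the helper toBigInteger are hypothetical names, and the arrays are assumed to hold big-endian unsigned 32-bit words. Because z starts out as all zeros, the first pass over y adds nothing to what is already stored, which is why that pass can be written as a simpler, separate loop.

```java
import java.math.BigInteger;

class SchoolbookMultiply {
    static final long LONG_MASK = 0xffffffffL;

    // Multiplies two magnitudes stored as big-endian arrays of unsigned 32-bit words.
    static int[] multiply(int[] x, int[] y) {
        int[] z = new int[x.length + y.length];
        for (int i = x.length - 1; i >= 0; i--) {
            long carry = 0;
            for (int j = y.length - 1, k = j + 1 + i; j >= 0; j--, k--) {
                // Unsigned 32x32 multiply, plus the word already in z, plus the carry.
                long product = (y[j] & LONG_MASK) * (x[i] & LONG_MASK)
                        + (z[k] & LONG_MASK) + carry;
                z[k] = (int) product;     // low 32 bits stay in this position
                carry = product >>> 32;   // high 32 bits move to the next position
            }
            z[i] = (int) carry;
        }
        return z;
    }

    // Helper for checking results: interprets the words as one non-negative number.
    static BigInteger toBigInteger(int[] words) {
        BigInteger result = BigInteger.ZERO;
        for (int word : words) {
            result = result.shiftLeft(32).or(BigInteger.valueOf(word & LONG_MASK));
        }
        return result;
    }
}
```

Comparing toBigInteger(multiply(x, y)) with the product of the two converted operands is a quick way to check the carry handling.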

Is there even an algorithm for 2^(n) - 1 which lies in Theta Ө(1)?

I have a question about an algorithm I'm supposed to "invent" or "find". It should calculate 2^(n) - 1 in three variants: one in Ө(n^n), one in Ө(1), and one in Ө(n).
I thought about it for several hours but couldn't find a solution for the first two (the Ө(n) one was the easiest in my opinion, and I've posted it below). I'm not skilled enough to come up with a very slow and a very fast algorithm.
So far my algorithms are (In Pseudocode):
The one for Ө(n)
int f(int n) {
if(n == 0) then return 0
if(n == 1) then return 1
int number = 2
while(n > 1) {
number = number * 2
n--
}
number = number - 1
return number
}
A simple and fairly obvious one that uses recursion, though I don't know how fast it is (it would be nice if someone could tell me):
int f(int n) {
if(n==0) then return 0
if(n==1) then return 1
return 3*f(n-1) - 2*f(n-2)
}
Assuming n is not bounded by any constant (and the output is not a simple int, but a data type that can hold arbitrarily large integers), there is no algorithm that yields 2^n - 1 in Ө(1), because the size of the output itself is Ө(n) bits. Suppose such an algorithm existed and always finished in fewer than C operations. For n = C+1, it would need C+1 operations just to write the output, which contradicts the assumption that C is an upper bound. So there is no such algorithm.
For Ө(n^n), if you have a more efficient algorithm (Ө(n) for example), you can make a pointless loop that runs extra n^n iterations and do nothing important, it will make your algorithm Ө(n^n).
There is also a Ө(log(n)*M(n)) algorithm, using exponentiation by squaring and then simply subtracting 1 from the result. Here M(x) is the complexity of your multiplication operator for numbers containing x digits.
As commented by @kajacx, you can even improve this last approach by using a Fourier-transform-based multiplication.
Something like:
HugeInt h = 1;
h = h << n;
h = h - 1;
Obviously HugeInt is pseudo-code for an integer type that can be of arbitrary size allowing for any n.
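In Java, java.math.BigInteger can play the role of HugeInt. The sketch below is one possible rendering of the shift-and-subtract idea, next to a loop that doubles n times; the class and method names are made up for illustration.

```java
import java.math.BigInteger;

class PowerOfTwoMinusOne {
    // 2^n - 1 using one shift and one subtraction on an arbitrary-precision integer.
    static BigInteger byShift(int n) {
        return BigInteger.ONE.shiftLeft(n).subtract(BigInteger.ONE);
    }

    // The same value computed with n doublings, in the spirit of the loop in the question.
    static BigInteger byDoubling(int n) {
        BigInteger number = BigInteger.ONE;
        for (int i = 0; i < n; i++) {
            number = number.add(number);
        }
        return number.subtract(BigInteger.ONE);
    }

    public static void main(String[] args) {
        System.out.println(byShift(10));               // 1023
        System.out.println(byShift(100).bitLength());  // 100
    }
}
```

Note that the result for n occupies n bits, so even the shift version has to touch an amount of memory that grows with n.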
=====
Look at amit's answer instead!
the Ө(n^n) is too tricky for me, but a real Ө(1) algorithm on any "binary" architecture would be:
return n-1 bits filled with 1
(assuming your architecture can allocate and fill n-1 bits in constant time)
;)

Efficient BigInteger multiplication modulo n in Java

I can calculate the multiplication of two BigIntegers (say a and b) modulo n.
This can be done by:
a.multiply(b).mod(n);
However, assuming that a and b are of the same order of magnitude as n, this means that during the calculation an intermediate BigInteger is created whose length (in bytes) is about twice that of n.
I wonder whether there is a more efficient implementation that I can use. Something like modMultiply that is implemented like modPow (which I believe does not calculate the power and then the modulo).
I can only think of
a.mod(n).multiply(b.mod(n)).mod(n)
and you seem already to be aware of this.
BigInteger has a toByteArray() method, but internally it uses ints, so n must be quite large for this to have an effect. Cryptographic key-generation code might do this kind of work.
Furthermore, if you think of short-cutting the multiplication, you'll get something like the following:
public static BigInteger multiply(BigInteger a, BigInteger b, int mod) {
if (a.signum() == -1) {
return multiply(a.negate(), b, mod).negate();
}
if (b.signum() == -1) {
return multiply(a, b.negate(), mod).negate();
}
int n = (32 - Integer.numberOfLeadingZeros(mod - 1) + 7) / 8; // size of mod in bytes (bit length, not bitCount).
byte[] aa = a.toByteArray(); // Highest byte at [0] !!
int na = Math.min(n, aa.length); // Heuristic.
byte[] bb = b.toByteArray();
int nb = Math.min(n, bb.length); // Heuristic.
byte[] prod = new byte[n];
for (int ia = 0; ia < na; ++ia) {
int m = ia + nb >= n ? n - ia - 1 : nb; // Heuristic.
for (int ib = 0; ib < m; ++ib) {
int p = (0xFF & aa[aa.length - 1 - ia]) * (0xFF & bb[bb.length - 1 - ib]);
addByte(prod, ia + ib, p & 0xFF);
if (ia + ib + 1 < n) {
addByte(prod, ia + ib + 1, (p >> 8) & 0xFF);
}
}
}
// Still need to do an expensive mod:
return new BigInteger(prod).mod(BigInteger.valueOf(mod));
}
private static void addByte(byte[] prod, int i, int value) {
while (value != 0 && i < prod.length) {
value += prod[prod.length - 1 - i] & 0xFF;
prod[prod.length - 1 - i] = (byte) value;
value >>= 8;
++i;
}
}
That code does not look appetizing. BigInteger has the problem of exposing the internal value only as big-endian byte[] where the first byte is the most significant one.
Much better would be to have the digits in base N. That is not unimaginable: if N is a power of 2 some nice optimizations are feasible.
(BTW the code is untested - as it does not seem convincingly faster.)
First, the bad news: I couldn't find any existing Java libraries that provided this functionality.
I couldn't find any pure-Java big integer libraries ... apart from java.math.BigInteger.
There are Java / JNI wrappers for the GMP library, but GMP doesn't implement this either.
So what are your options?
Maybe there is some pure-Java library that I missed.
Maybe there is some other native (C / C++) big integer library that supports this operation ... though you may need to write your own JNI wrappers.
You should be able to implement such a method for yourself, by copying the source code of java.math.BigInteger and adding an extra custom method. Alternatively, it looks like you could extend it.
Having said that, I'm not sure that there is a "substantially faster" algorithm for computing a * b mod n in Java, or any other language. (Apart from special cases; e.g. when n is a power of 2).
Specifically, the "Montgomery Reduction" approach wouldn't help for a single multiplication step. (The Wikipedia page says: "Because numbers have to be converted to and from a particular form suitable for performing the Montgomery step, a single modular multiplication performed using a Montgomery step is actually slightly less efficient than a "naive" one.")
So maybe the most effective way to speedup the computation would be to use the JNI wrappers for GMP.
You can use generic maths, like:
(A*B) mod N = ((A mod N) * (B mod N)) mod N
It may be more CPU intensive, but one should choose between CPU and memory, right?
If we are talking about modular arithmetic then indeed Montgomery reduction may be what you need. Don't know any out of box solutions though.
You can write a BigInteger multiplication as a standard long multiplication in a very large base -- for example, in base 2^32. It is fairly straightforward. If you want only the result modulo n, then it is advantageous to choose a base that is a factor of n or of which n is a factor. Then you can ignore all but one or a few of the lowest-order result (Big)digits as you perform the computation, saving space and maybe time.
That's most practical if you know n in advance, of course, but such pre-knowledge is not essential. It's especially nice if n is a power of two, and it's fairly messy if n is neither a power of 2 nor smaller than the maximum operand handled directly by the system's arithmetic unit, but all of those cases can be handled in principle.
If you must do this specifically with Java BigInteger instances, however, then be aware that any approach not provided by the BigInteger class itself will incur overhead for converting between internal and external representations.
Maybe this:
static BigInteger multiply(BigInteger c, BigInteger x)
{
BigInteger sum = BigInteger.ZERO;
BigInteger addOperand;
for (int i=0; i < FIELD_ELEMENT_BIT_SIZE; i++)
{
if (c.testBit(i))
addOperand = x;
else
addOperand = BigInteger.ZERO;
sum = add(sum, addOperand);
x = modOrder(x.shiftLeft(1)); // double x (mod the order) for the next, more significant bit of c
}
return sum;
}
with the following helper functions:
static BigInteger add(BigInteger a, BigInteger b)
{
return modOrder(a.add(b));
}
static BigInteger modOrder(BigInteger n)
{
return n.remainder(FIELD_ORDER);
}
To be honest though, I'm not sure if this is really efficient at all since none of these operations are performed in-place.
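A self-contained sketch of the shift-and-add idea might look like the following. It is an illustration under assumptions, not a tuned implementation: the modulus is passed in as a parameter m instead of coming from a FIELD_ORDER constant, the operands are assumed to be non-negative, and the class and method names are hypothetical. The addend is doubled and reduced at each step, so no intermediate value grows much beyond the size of m.

```java
import java.math.BigInteger;

class ShiftAddModMultiply {
    // Computes (c * x) mod m for non-negative c and x and positive m,
    // reducing after every addition and every doubling.
    static BigInteger mulMod(BigInteger c, BigInteger x, BigInteger m) {
        BigInteger sum = BigInteger.ZERO;
        BigInteger addend = x.mod(m);
        for (int i = 0; i < c.bitLength(); i++) {
            if (c.testBit(i)) {
                sum = sum.add(addend).mod(m);
            }
            addend = addend.shiftLeft(1).mod(m); // weight of the next bit of c
        }
        return sum;
    }

    public static void main(String[] args) {
        BigInteger a = new BigInteger("123456789012345678901234567890");
        BigInteger b = new BigInteger("987654321098765432109876543210");
        BigInteger m = new BigInteger("1000000000000000000000007");
        System.out.println(mulMod(a, b, m).equals(a.multiply(b).mod(m)));
    }
}
```

Each step allocates new BigInteger objects, so this trades one large intermediate product for many small ones and is not necessarily faster than a.multiply(b).mod(n).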

One-byte bool. Why?

In C++, why does a bool require one byte to store true or false where just one bit is enough for that, like 0 for false and 1 for true? (Why does Java also require one byte?)
Secondly, how much safer is it to use the following?
struct Bool {
bool trueOrFalse : 1;
};
Thirdly, even if it is safe, is the above bit-field technique really going to help? I have heard that it saves space, but that the compiler-generated code to access such fields is bigger and slower than the code generated to access the primitives.
Why does a bool require one byte to store true or false where just one bit is enough
Because every object in C++ must be individually addressable* (that is, you must be able to have a pointer to it). You cannot address an individual bit (at least not on conventional hardware).
How much safer is it to use the following?
It's "safe", but it doesn't achieve much.
is the above field technique really going to help?
No, for the same reasons as above ;)
but still compiler generated code to access them is bigger and slower than the code generated to access the primitives.
Yes, this is true. On most platforms, this requires accessing the containing byte (or int or whatever), and then performing bit-shifts and bit-mask operations to access the relevant bit.
If you're really concerned about memory usage, you can use a std::bitset in C++ or a BitSet in Java, which pack bits.
* With a few exceptions.
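For the Java side, a minimal sketch of using java.util.BitSet as a packed collection of flags could look like this. The class name BitSetSketch and the helper withBitsSet are made up for illustration; set, get, flip, and cardinality are standard BitSet methods.

```java
import java.util.BitSet;

class BitSetSketch {
    // Builds a BitSet in which exactly the given indices are true.
    static BitSet withBitsSet(int... indices) {
        BitSet bits = new BitSet();
        for (int index : indices) {
            bits.set(index);
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet flags = withBitsSet(0, 5, 4095);
        flags.flip(5);                            // bit 5 goes back to false
        System.out.println(flags.get(4095));      // true
        System.out.println(flags.cardinality());  // 2
    }
}
```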
Using a single bit is much slower and much more complicated to allocate. In C/C++ there is no way to get the address of one bit so you wouldn't be able to do &trueOrFalse as a bit.
Java has BitSet and EnumSet, which both use bitmaps. If you have a very small number of flags, it may not make much difference. For example, objects have to be at least byte-aligned, and in HotSpot they are 8-byte aligned (in C++ a new object can be 8- to 16-byte aligned). This means saving a few bits might not save any space.
In Java at least, Bits are not faster unless they fit in cache better.
public static void main(String... ignored) {
BitSet bits = new BitSet(4000);
byte[] bytes = new byte[4000];
short[] shorts = new short[4000];
int[] ints = new int[4000];
for (int i = 0; i < 100; i++) {
long bitTime = timeFlip(bits) + timeFlip(bits);
long bytesTime = timeFlip(bytes) + timeFlip(bytes);
long shortsTime = timeFlip(shorts) + timeFlip(shorts);
long intsTime = timeFlip(ints) + timeFlip(ints);
System.out.printf("Flip time bits %.1f ns, bytes %.1f, shorts %.1f, ints %.1f%n",
bitTime / 2.0 / bits.size(), bytesTime / 2.0 / bytes.length,
shortsTime / 2.0 / shorts.length, intsTime / 2.0 / ints.length);
}
}
private static long timeFlip(BitSet bits) {
long start = System.nanoTime();
for (int i = 0, len = bits.size(); i < len; i++)
bits.flip(i);
return System.nanoTime() - start;
}
private static long timeFlip(short[] shorts) {
long start = System.nanoTime();
for (int i = 0, len = shorts.length; i < len; i++)
shorts[i] ^= 1;
return System.nanoTime() - start;
}
private static long timeFlip(byte[] bytes) {
long start = System.nanoTime();
for (int i = 0, len = bytes.length; i < len; i++)
bytes[i] ^= 1;
return System.nanoTime() - start;
}
private static long timeFlip(int[] ints) {
long start = System.nanoTime();
for (int i = 0, len = ints.length; i < len; i++)
ints[i] ^= 1;
return System.nanoTime() - start;
}
prints
Flip time bits 5.0 ns, bytes 0.6, shorts 0.6, ints 0.6
for sizes of 40000 and 400K
Flip time bits 6.2 ns, bytes 0.7, shorts 0.8, ints 1.1
for 4M
Flip time bits 4.1 ns, bytes 0.5, shorts 1.0, ints 2.3
and 40M
Flip time bits 6.2 ns, bytes 0.7, shorts 1.1, ints 2.4
If you want to store only one bit of information, there is nothing more compact than a char, which is the smallest addressable memory unit in C/C++. (Depending on the implementation, a bool might have the same size as a char but it is allowed to be bigger.)
A char is guaranteed by the C standard to hold at least 8 bits, but it can also consist of more. The exact number is available via the CHAR_BIT macro defined in limits.h (in C) or climits (C++). Today, CHAR_BIT == 8 is most common, but you cannot rely on it. It is guaranteed to be 8, however, on POSIX-compliant systems and on Windows.
Though it is not possible to reduce the memory footprint for a single flag, it is of course possible to combine multiple flags. Besides doing all bit operations manually, there are some alternatives:
If you know the number of bits at compile time
bitfields (as in your question). But beware, the ordering of fields is not guaranteed, which may result in portability issues.
std::bitset
If you know the size only at runtime
boost::dynamic_bitset
If you have to deal with large bitvectors, take a look at the BitMagic library. It supports compression and is heavily tuned.
As others have pointed out already, saving a few bits is not always a good idea. Possible drawbacks are:
Less readable code
Reduced execution speed because of the extra extraction code.
For the same reason, increases in code size, which may outweigh the savings in data consumption.
Hidden synchronization issues in multithreaded programs. For example, flipping two different bits by two different threads may result in a race condition. In contrast, it is always safe for two threads to modify two different objects of primitive types (e.g., char).
Typically, it makes sense when you are dealing with huge data because then you will benefit from less pressure on memory and cache.
Why don't you just store the state in a byte? I haven't actually tested the code below, but it should give you an idea. You can even use a short or an int for 16 or 32 states. I believe I have a working Java example as well; I'll post it when I find it.
__int8 state = 0x0;
bool getState(int bit)
{
return (state & (1 << bit)) != 0x0;
}
void setAllOnline(bool online)
{
state = -online;
}
void reverseState(int bit)
{
state ^= (1 << bit);
}
Alright, here's the Java version. I've stored the state in an int, since, if I remember correctly, even a byte would take up 4 bytes anyway. And this obviously isn't being used as an array.
public class State
{
private int STATE;
public State() {
STATE = 0x0;
}
public State(int previous) {
STATE = previous;
}
/*
* #Usage - Used along side the #setMultiple(int, boolean);
* #Returns the value of a single bit.
*/
public static int valueOf(int bit)
{
return 1 << bit;
}
/*
* #Usage - Used along side the #setMultiple(int, boolean);
* #Returns the value of an array of bits.
*/
public static int valueOf(int... bits)
{
int value = 0x0;
for (int bit : bits)
value |= (1 << bit);
return value;
}
/*
* #Returns the value currently stored or the values of all 32 bits.
*/
public int getValue()
{
return STATE;
}
/*
* #Usage - Turns all bits online or offline.
* #Return - <TRUE> if all states are online. Otherwise <FALSE>.
*/
public boolean setAll(boolean online)
{
STATE = online ? -1 : 0;
return online;
}
/*
* #Usage - sets multiple bits at once to a specific state.
* #Warning - DO NOT SET BITS TO THIS! Use setMultiple(State.valueOf(#), boolean);
* #Return - <TRUE> if states were set to online. Otherwise <FALSE>.
*/
public boolean setMultiple(int value, boolean online)
{
STATE |= value;
if (!online)
STATE ^= value;
return online;
}
/*
* #Usage - sets a single bit to a specific state.
* #Return - <TRUE> if this bit was set to online. Otherwise <FALSE>.
*/
public boolean set(int bit, boolean online)
{
STATE |= (1 << bit);
if(!online)
STATE ^= (1 << bit);
return online;
}
/*
* #return = the new current state of this bit.
* #Usage = Good for situations that are reversed.
*/
public boolean reverse(int bit)
{
return ((STATE ^= (1 << bit)) & (1 << bit)) != 0;
}
/*
* #return = <TRUE> if this bit is online. Otherwise <FALSE>.
*/
public boolean online(int bit)
{
int value = 1 << bit;
return (STATE & value) == value;
}
/*
* #return = a String contains full debug information.
*/
@Override
public String toString()
{
StringBuilder sb = new StringBuilder();
sb.append("TOTAL VALUE: ");
sb.append(STATE);
for (int i = 0; i < 0x20; i++)
{
sb.append("\nState(");
sb.append(i);
sb.append("): ");
sb.append(online(i));
sb.append(", ValueOf: ");
sb.append(State.valueOf(i));
}
return sb.toString();
}
}
Also, I should point out that you really shouldn't use a special class for this; just store the variable within the class that will most likely be using it. If you plan to have hundreds or even thousands of Boolean values, consider packing them into an array of ints.
E.g. the below example.
boolean[] states = new boolean[4096];
can be converted into the below.
int[] states = new int[128];
Now you're probably wondering how to access index 4095 in an array of 128 elements. First, 4095 is shifted 5 bits to the right, which is the same as dividing by 32 and rounding down: 4095 / 32 = 127, so we are at index 127 of the array. Then we perform 4095 & 31, which reduces it to a value between 0 and 31. This masking trick only works with masks that are a power of two minus 1, e.g. 0, 1, 3, 7, 15, 31, 63, 127, 255, 511, 1023, etc.
So now we can access the bit at that position. As you can see, this is very compact and beats having 4096 booleans in a file :) It also gives much faster reads and writes to a binary file. I have no idea what this BitSet stuff is, but it looks like complete garbage to me; since byte, short, int, and long are already sets of bits, you might as well use them as is rather than create some complex class to access the individual bits, which is what I could gather from reading a few posts.
boolean getState(int index)
{
return (states[index >> 5] & 1 << (index & 0x1F)) != 0x0;
}
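Put together, the index arithmetic described above might be wrapped up as follows. This is only a sketch: the class PackedFlags and its method names are hypothetical, and no range checking is done beyond what the array itself provides.

```java
class PackedFlags {
    private final int[] words;

    // One int holds 32 flags, so round the requested size up to a multiple of 32.
    PackedFlags(int size) {
        words = new int[(size + 31) >>> 5];
    }

    // index >> 5 selects the int, index & 0x1F selects the bit inside it.
    boolean get(int index) {
        return (words[index >> 5] & (1 << (index & 0x1F))) != 0;
    }

    void set(int index, boolean on) {
        if (on) {
            words[index >> 5] |= (1 << (index & 0x1F));
        } else {
            words[index >> 5] &= ~(1 << (index & 0x1F));
        }
    }

    int wordCount() {
        return words.length;
    }

    public static void main(String[] args) {
        PackedFlags states = new PackedFlags(4096);
        states.set(4095, true);
        System.out.println(states.wordCount());  // 128
        System.out.println(states.get(4095));    // true
    }
}
```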
Further information...
Basically if the above was a bit confusing here's a simplified version of what's happening.
The types "byte", "short", "int", "long" all are data types which have different ranges.
You can view this link: http://msdn.microsoft.com/en-us/library/s3f49ktz(v=vs.80).aspx
To see the data ranges of each.
So a byte is equal to 8 bits. So an int which is 4 bytes will be 32 bits.
There is no built-in operator for raising a value to the Nth power. However, bit shifting gives us powers of two: 1 << N equals 1 * 2^N, and 2 << N equals 2 * 2^N. So to get a power of two, always use "1 << N".
Since an int has 32 bits, we can treat each bit as an individually indexed flag.
To keep things simple, think of the "&" operator as a way to check whether a value contains the bits of another value. Take the value 31. To get 31 we add bits 0 through 4, which are 1, 2, 4, 8, and 16. 31 & 16 returns 16 because bit 4 (2^4 = 16) is set in 31. 31 & 20 checks bits 2 and 4 and returns 20, since both are set (2^2 + 2^4 = 4 + 16 = 20). 31 & 48 checks bits 4 and 5. Bit 5 is not set in 31, so this returns only 16, not 0. So when checking multiple bits at once, you must compare the result against the full mask rather than checking whether it is non-zero.
The below will verify if an individual bit is at 0 or 1. 0 being false, and 1 being true.
bool getState(int bit)
{
return (state & (1 << bit)) != 0x0;
}
The below is example of checking two values if they contain those bits. Think of it like each bit is represented as 2^BIT so when we do
I'll quickly go over some of the operators. We've just recently explained the "&" operator slightly. Now for the "|" operator.
When performing the following
int value = 31;
value |= 16;
value |= 16;
value |= 16;
value |= 16;
The value will still be 31. This is because bit 4 or 2^4=16 is already turned on or set to 1. So performing "|" returns that value with that bit turned on. If it's already turned on no changes are made. We utilize "|=" to actually set the variable to that returned value.
Instead of doing -> "value = value | 16;". We just do "value |= 16;".
Now let's look a bit further into how the "&" and "|" can be utilized.
/*
* This contains bits 0,1,2,3,4,8,9 turned on.
*/
const int CHECK = 1 | 2 | 4 | 8 | 16 | 256 | 512;
/*
* This is some value where we add bits 0 through 9, but we skip 0 and 8.
*/
int value = 2 | 4 | 8 | 16 | 32 | 64 | 128 | 512;
So when we perform the below code.
int return_code = value & CHECK;
The return code will be 2 + 4 + 8 + 16 + 512 = 542
So we were checking for 799 but received 542. This is because bits 0 and 8 are offline; together they equal 256 + 1 = 257, and 799 - 257 = 542.
The above is a great way to check several flags at once. Say we were making a video game and wanted to know whether any of a set of buttons were pressed: we could test all of those bits with a single check, which is far more efficient than performing a Boolean check on every single state.
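The same check, written out in Java with the numbers used above. The class MaskCheck and the helpers anySet and allSet are hypothetical names for this sketch; they show the difference between "at least one of these bits is on" and "every one of these bits is on".

```java
class MaskCheck {
    // Bits 0, 1, 2, 3, 4, 8, 9 turned on: 799.
    static final int CHECK = 1 | 2 | 4 | 8 | 16 | 256 | 512;

    // True if at least one bit of mask is set in value.
    static boolean anySet(int value, int mask) {
        return (value & mask) != 0;
    }

    // True only if every bit of mask is set in value.
    static boolean allSet(int value, int mask) {
        return (value & mask) == mask;
    }

    public static void main(String[] args) {
        int value = 2 | 4 | 8 | 16 | 32 | 64 | 128 | 512;
        System.out.println(value & CHECK);         // 542
        System.out.println(anySet(value, CHECK));  // true
        System.out.println(allSet(value, CHECK));  // false, bits 0 and 8 are missing
    }
}
```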
Now let's say we have Boolean value which is always reversed.
Normally you'd do something like
bool state = false;
state = !state;
Well this can be done with bits as well utilizing the "^" operator.
Just as we performed "1 << N" to choose the whole value of that bit. We can do the same with the reverse. So just like we showed how "|=" stores the return we will do the same with "^=". So what this does is if that bit is on we turn it off. If it's off we turn it on.
void reverseState(int bit)
{
state ^= (1 << bit);
}
You can even have it return the current state. If you wanted it to return the previous state just swap "!=" to "==". So what this does is performs the reversal then checks the current state.
bool reverseAndGet(int bit)
{
return ((state ^= (1 << bit)) & (1 << bit)) != 0x0;
}
Multi-bit values (rather than single-bit bools) can also be packed into an int. Let's say we normally write out our coordinate position like the below.
int posX = 0;
int posY = 0;
int posZ = 0;
Now let's say these never went past 1023, so 0 through 1023 is the full range for each of them. I chose 1023 on purpose: as previously mentioned, you can use the "&" operator to force a value into the range 0 to 2^N - 1. With a range of 0 through 1023, "value & 1023" always yields a value between 0 and 1023 without any explicit range checks. Keep in mind that this only works with powers of two minus one: 2^10 - 1 = 1024 - 1 = 1023.
E.g. no more if (value >= 0 && value <= 1023).
So 2^10 = 1024, which requires 10 bits in order to hold a number between 0 and 1023.
Three values need 10 x 3 = 30 bits, which fits in the 32 bits of an int.
So we can perform the following. The bit offsets are 0, 10, and 20. The 0 is there to show that x is shifted by nothing: 2^0 = 1, so x * 1 = x. We need y << 10 because x uses up the low 10 bits (0 through 1023), so y must be multiplied by 1024 to keep the values from overlapping. z then needs to be multiplied by 2^20, which is 1,048,576.
int position = (x << 0) | (y << 10) | (z << 20);
This makes comparisons fast.
We can now do
return this.position == position;
as opposed to
return this.x == x && this.y == y && this.z == z;
Now what if we wanted the actual positions of each?
For the x we simply do the following.
int getX()
{
return position & 1023;
}
Then for the y we need to perform a right bit shift and then AND it.
int getY()
{
return (position >> 10) & 1023;
}
As you may guess the Z is the same as the Y, but instead of 10 we use 20.
int getZ()
{
return (position >> 20) & 1023;
}
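As a self-contained Java sketch, the packing and unpacking steps above could be combined like this. The class name PackedPosition and the pack method are made up for illustration; each coordinate is masked to 10 bits before it is shifted into place, so out-of-range inputs wrap around instead of spilling into a neighbouring field.

```java
class PackedPosition {
    static final int MASK = 1023; // 2^10 - 1, ten bits per coordinate

    // Packs x into bits 0-9, y into bits 10-19, and z into bits 20-29.
    static int pack(int x, int y, int z) {
        return (x & MASK) | ((y & MASK) << 10) | ((z & MASK) << 20);
    }

    static int getX(int position) {
        return position & MASK;
    }

    static int getY(int position) {
        return (position >> 10) & MASK;
    }

    static int getZ(int position) {
        return (position >> 20) & MASK;
    }

    public static void main(String[] args) {
        int position = pack(1023, 7, 512);
        System.out.println(getX(position)); // 1023
        System.out.println(getY(position)); // 7
        System.out.println(getZ(position)); // 512
    }
}
```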
I hope whoever views this will find it worth while information :).
If you really want to use 1 bit, you can use a char to store 8 booleans and bit-shift to get the value of the one you want. I doubt it will be faster, and it's probably going to give you a lot of headaches working that way, but technically it's possible.
On a side note, an attempt like this could prove useful for systems that don't have a lot of memory available for variables but do have more processing power than you need. I highly doubt you will ever need it though.

How to get around for loop bottleneck for constant time operations?

Working on a rules agnostic poker simulator for fun. Testing bottlenecks in enumeration, and for hands that would always get pulled from the "unique" array, I found an interesting bottleneck. I measured the average computation time of running each of the variations below 1,000,000,000 times and then took the best of 100 repetitions of that to allow JIT and Hotspot to work their magic. What I found was there's a difference in computation time (6ns vs 27ns) between
public int getRank7(int ... cards) {
int q = (cards[0] >> 16) | (cards[1] >> 16) | (cards[2] >> 16) | (cards[3] >> 16) | (cards[4] >> 16) | (cards[5] >> 16) | (cards[6] >> 16);
int product = ((cards[0] & 0xFF) * (cards[1] & 0xFF) * (cards[2] & 0xFF) * (cards[3] & 0xFF) * (cards[4] & 0xFF) * (cards[5] & 0xFF) * (cards[6] & 0xFF));
if(flushes[q] > 0) return flushes[q];
if(unique[q] > 0) return unique[q];
int x = Arrays.binarySearch(products, product);
return rankings[x];
}
and
public int getRank(int ... cards) {
int q = 0;
long product = 1;
for(int c : cards) {
q |= (c >> 16);
product *= (c & 0xFF);
}
if(flushes[q] > 0) return flushes[q];
if(unique[q] > 0) return unique[q];
int x = Arrays.binarySearch(products, product);
return rankings[x];
}
The issue is definitely the for loop, not the addition of handling multiplication at the top of the function. I'm a little baffled by this since I'm running the same number of operations in each scenario... I realized I'd always have 6 or more cards in this function so I brought things closer together by changing it to
public int getRank(int c0, int c1, int c2, int c3, int c4, int c5, int ... cards)
But I'm going to have the same bottleneck as the number of cards goes up. Is there any way to get around this fact, and if not, could somebody explain to me why a for loop for the same number of operations is so much slower?
I think you'll find that the big difference is branching. Your for loop scenario requires a check and conditional branch on each iteration of the for loop. Your CPU will try and predict which branch will be taken, and pipeline instructions accordingly, but when it mispredicts (at least once per function call, as the loop terminates), the pipeline stalls, which is very expensive.
One thing to try would be a regular for loop with a fixed upper bound (rather than one based on the length of the array); the JIT compiler may unroll such a loop, which would result in the same sequence of operations as your more efficient version.
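Over an array, the enhanced for loop does not create an iterator; the compiler translates it into an ordinary indexed loop. It still carries the usual loop overhead, which is relatively expensive when you only have a handful of items.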
It'd be interesting to see what your timings are if you wrote a traditional for loop:
for (int i = 0; i < cards.length; ++i)
{
q |= (cards[i] >> 16);
product *= (cards[i] & 0xFF);
}
But even that is likely to be slightly slower than the first example, because there's some loop overhead (incrementing the index, comparing it against the length, and branching to the beginning of the loop).
In any case, the loop overhead adds an increment, a comparison, and a branch to each iteration. And that comparison could very well require a pointer de-reference to get to cards.length. It's quite plausible that the loop overhead is much more expensive than the work you're doing in the loop.
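To compare the two shapes side by side, the sketch below computes the same q and product once with an indexed loop (with the array length read once into a local) and once fully unrolled for seven cards. It makes no claim about timings, which would have to be measured with a proper harness such as JMH. The class and method names are hypothetical, the results are returned as a two-element long array purely for easy comparison, and the lookup tables from the question are left out.

```java
class LoopForms {
    // Indexed loop; cards.length is read once into n.
    static long[] withIndexedLoop(int... cards) {
        int q = 0;
        long product = 1;
        for (int i = 0, n = cards.length; i < n; i++) {
            q |= cards[i] >> 16;
            product *= cards[i] & 0xFF;
        }
        return new long[] { q, product };
    }

    // Hand-unrolled version for exactly seven cards: no counter, compare, or branch.
    static long[] unrolledSeven(int c0, int c1, int c2, int c3, int c4, int c5, int c6) {
        int q = (c0 >> 16) | (c1 >> 16) | (c2 >> 16) | (c3 >> 16)
                | (c4 >> 16) | (c5 >> 16) | (c6 >> 16);
        long product = (long) (c0 & 0xFF) * (c1 & 0xFF) * (c2 & 0xFF) * (c3 & 0xFF)
                * (c4 & 0xFF) * (c5 & 0xFF) * (c6 & 0xFF);
        return new long[] { q, product };
    }
}
```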
