What is an efficient alternative for an extremely large HashMap?


I'm trying to break a symmetric encryption using a 'meet-in-the-middle' attack. For this I need to store 2**32 integer-integer pairs. I'm storing the mapping from a 4-byte cyphertext to a 4-byte key.
At first I tried using an array, but then I realized that you cannot have such a big array in java (the max size is bound by Integer.MAX_VALUE).
Now I'm using a HashMap, but this gets way too slow when the map gets large, even when increasing the max memory to 8GB with -Xmx8192M.
What is an efficient alternative for an extremely large HashMap?
This is the code I'm currently using to populate my hashmap:
HashMap<Integer, Integer> map = new HashMap<>(Integer.MAX_VALUE);
// Loop until integer overflow
for (int k = 1; k != 0; k++)
map.put(encrypt_left(c, k), k);
I haven't seen this code finish, even after letting it run for hours. Progress logging shows that the first 2**24 values are created in 22s, but then the performance quickly decreases.

I'm storing the mapping from a 4-byte cyphertext to a 4-byte key.
Conveniently, 4 bytes is an int. As you observed, array sizes are limited by Integer.MAX_VALUE. That suggests you can use an array, but there's a minor hangup: ints are signed, while array indices must be >= 0.
So you create two arrays: one for the positive cyphertexts, and one for the negative cyphertexts. Then you just need to make sure that you've given the JVM enough heap.
How much heap is that?
4 bytes * Integer.MAX_VALUE * 2 arrays
= 17179869176 bytes
= ~16.0 gigabytes.

When building a rainbow table, consider the size of the data you are going to produce. Consider also that this problem can be solved without vast amounts of RAM, by using files instead of keeping everything in memory. Typically you build files whose size fits your file buffer, for example 4096 or 8192 bytes. To look up an encrypted value, you divide its byte offset by the file buffer's size to find the file, load that file, and read the key at the offset modulo the buffer size.
The tricky part is that the files need to be laid out by encrypted data, not by key. So you start with dummy files and write each key at the position given by its encrypted data.
So let's say your key is 1026 and the encrypted data is 126. The file to write 1026 to is 0.rbt, because 126 * 4 bytes / 4096 = 0 (integer division). The position within that file is 126 * 4 = 504 bytes.
And of course you need the nio classes for that.
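As a rough sketch of the file layout described above, the following uses a single file instead of many buffer-sized files, which is a simplification on my part. The class name FileTable and the helper onTempFile are hypothetical. A key of 0 is treated as "nothing stored here", which assumes (as in the question's loop) that 0 is never a valid key.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** File-backed table: the 4-byte key lives at byte offset (ciphertext * 4). Not thread-safe. */
final class FileTable implements AutoCloseable {
    private final FileChannel channel;
    private final ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);

    FileTable(Path file) {
        try {
            channel = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.READ, StandardOpenOption.WRITE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical helper: opens a table on a fresh temporary file. */
    static FileTable onTempFile() {
        try {
            return new FileTable(Files.createTempFile("table", ".rbt"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static long offset(int ciphertext) {
        // Treat the ciphertext as unsigned so negative ints map to valid offsets.
        return (ciphertext & 0xFFFFFFFFL) * Integer.BYTES;
    }

    void put(int ciphertext, int key) {
        try {
            buf.clear();
            buf.putInt(key);
            buf.flip();
            long pos = offset(ciphertext);
            while (buf.hasRemaining()) {
                pos += channel.write(buf, pos); // positional write, may be partial
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the stored key, or 0 if nothing was written at that position. */
    int get(int ciphertext) {
        try {
            buf.clear();
            long pos = offset(ciphertext);
            while (buf.hasRemaining()) {
                int n = channel.read(buf, pos + buf.position());
                if (n < 0) {
                    return 0; // position lies beyond the end of the file
                }
            }
            buf.flip();
            return buf.getInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the numbers from the example, put(126, 1026) writes four bytes at offset 504, and get(126) reads them back. Random single-int writes like this will be slow on real disks, so batching by file, as described above, is probably still worthwhile.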

Following the advice of @MattBall, I implemented my own BigArray, which composes an array with a 32-bit index from 4 separate arrays.
Running this without the suggested JVM arguments will cause an OutOfMemoryError. Using this with the suggested JVM arguments but with too little RAM will probably cause your machine to crash.
/**
* Array that holds 2**32 integers, implemented as four arrays with 30-bit indices.
* <p>
* Requires 16 GB RAM solely for the array allocation.
* <p>
* Example JVM Arguments: <code>-Xmx22000M -Xms17000M</code>
* <p>
* This sets the max memory to 22,000 MB and the initial memory to 17,000 MB
* <p>
* WARNING: don't use these settings if your machine does not have this much RAM.
*
* @author popovitsj
*/
public class BigArray
{
private int[] a_00 = new int[1 << 30];
private int[] a_01 = new int[1 << 30];
private int[] a_10 = new int[1 << 30];
private int[] a_11 = new int[1 << 30];
private static final int A_00 = 0;
private static final int A_01 = 1 << 30;
private static final int A_10 = 1 << 31;
private static final int A_11 = 3 << 30;
private static final int A_30 = A_01 - 1;
public void set(int index, int value)
{
getArray(index)[index & A_30] = value;
}
public int get(int index)
{
return getArray(index)[index & A_30];
}
private int[] getArray(int index)
{
switch (index & A_11)
{
case (A_00):
return a_00;
case (A_01):
return a_01;
case (A_10):
return a_10;
default:
return a_11;
}
}
}
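The same idea can be written with the chunk size as a parameter, which makes it possible to try it out on a small heap. This is only a sketch: the class name ChunkedIntArray and its constructor parameters are my own, not part of the code above. With indexBits = 32 and chunkBits = 30 it allocates the same four 2**30 arrays as BigArray.

```java
/** Int array addressed by an unsigned index, split into equally sized chunks. */
final class ChunkedIntArray {
    private final int chunkBits;
    private final int offsetMask;
    private final int[][] chunks;

    /**
     * @param indexBits number of index bits to support (32 covers every int)
     * @param chunkBits number of index bits addressed inside one chunk (1..30)
     */
    ChunkedIntArray(int indexBits, int chunkBits) {
        if (chunkBits < 1 || chunkBits > 30 || indexBits > 32
                || indexBits < chunkBits || indexBits - chunkBits > 30) {
            throw new IllegalArgumentException("unsupported bit split");
        }
        this.chunkBits = chunkBits;
        this.offsetMask = (1 << chunkBits) - 1;
        this.chunks = new int[1 << (indexBits - chunkBits)][1 << chunkBits];
    }

    void set(int index, int value) {
        // The unsigned shift selects the chunk; the mask selects the slot in it.
        chunks[index >>> chunkBits][index & offsetMask] = value;
    }

    int get(int index) {
        return chunks[index >>> chunkBits][index & offsetMask];
    }
}
```

For example, new ChunkedIntArray(16, 14) holds 65536 ints in four chunks of 16384. Because of the unsigned shift, negative indices land in the upper chunks when indexBits is 32.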

This is a big data problem, or in this case more of a big memory problem. The computation should be done in memory for performance. Use a Hazelcast distributed map. It is very easy to use and performs well.
You can use two or more machines for your problem.
Sample usage :
HazelcastInstance hzInstance = Hazelcast.newHazelcastInstance();
Map<Integer, Integer> map = hzInstance.getMap("map1");
map.put(x,y);
..

Related

What is the space complexity of bitset in this scenario

I am doing a leetcode problem where I have to find the duplicate of an array of size [1-N] inclusive and came upon this solution:
public int findDuplicate(int[] nums) {
BitSet bit = new BitSet();
for(int num : nums) {
if(!bit.get(num)) {
bit.set(num);
} else {
return num;
}
}
return -1;
}
I'm assuming the use of BitSet here is similar to using a boolean[] to keep track of whether we have seen the current number before. So my question is: what is the space complexity of this? The runtime seems to be O(n), where n is the size of the input array. Would the same be true for the space complexity?
Link to problem : https://leetcode.com/problems/find-the-duplicate-number/

Your BitSet creates an underlying long[] to store the values. Reading the code of BitSet#set, I would say it's safe to say that the array will never be larger than max(nums) / 64 * 2 = max(nums) / 32. Since long has a fixed size, this comes down to O(max(nums)). If nums contains large values, you can do better with a hash map.
I'm trying this out with simple code, and it seems to corroborate my reading of the code.
BitSet bitSet = new BitSet();
bitSet.set(100);
System.out.println(bitSet.toLongArray().length); // 2 (max(nums) / 32 = 3.125)
bitSet.set(64000);
System.out.println(bitSet.toLongArray().length); // 1001 (max(nums) / 32 = 2000)
bitSet.set(100_000);
System.out.println(bitSet.toLongArray().length); // 1563 (max(nums) / 32 = 3125)
Note that the 2 factor I added is conservative, in general it will be a smaller factor, that's why my formula consistently over-estimates the actual length of the long array, but never by more than a factor of 2. This is the code in Bitset that made me add it:
private void ensureCapacity(int wordsRequired) {
if (words.length < wordsRequired) {
// Allocate larger of doubled size or required size
int request = Math.max(2 * words.length, wordsRequired);
words = Arrays.copyOf(words, request);
sizeIsSticky = false;
}
}
In summary, I would say the bit set is only a good idea if you have reason to believe the values are small relative to how many of them there are. For example, if you have only two values but they are over a billion in value, you will needlessly allocate an array of several million elements.
Additionally, even when values remain small, this solution does extra work for sorted arrays, because BitSet#set has to reallocate and copy the array each time the current capacity is exceeded. As ensureCapacity above shows, the capacity at least doubles on each reallocation, so the copying is amortized rather than quadratic in max(nums), but it is still avoidable. To skip it entirely, first find the maximum, allocate the necessary length in the BitSet, and only then go through the array.
At this point, using a map is simpler and fits all situations. If speed really matters, my bet is that the Bitset will beat a map under specific conditions (lots of values, but small, and by pre-sizing the bit set as described).
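A minimal sketch of the pre-sizing idea, assuming all values are non-negative and below Integer.MAX_VALUE. The class name DuplicateFinder is made up for illustration.

```java
import java.util.BitSet;

final class DuplicateFinder {
    /** Returns the first value seen twice, or -1 if there is none. */
    static int findDuplicate(int[] nums) {
        // First pass: find the largest value so the BitSet can be sized once.
        int max = 0;
        for (int num : nums) {
            max = Math.max(max, num);
        }
        // Assumption: max < Integer.MAX_VALUE, otherwise max + 1 overflows.
        BitSet seen = new BitSet(max + 1);
        // Second pass: no reallocation can happen inside this loop.
        for (int num : nums) {
            if (seen.get(num)) {
                return num;
            }
            seen.set(num);
        }
        return -1;
    }
}
```

The space used is still proportional to max(nums), so this only pays off when the values are small compared to the length of the array.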

Why is the java vector API so slow compared to scalar?

I recently decided to play around with Java's new incubated vector API, to see how fast it can get. I implemented two fairly simple methods, one for parsing an int and one for finding the index of a character in a string. In both cases, my vectorized methods were incredibly slow compared to their scalar equivalents.
Here's my code:
public class SIMDParse {
private static IntVector mul = IntVector.fromArray(
IntVector.SPECIES_512,
new int[] {0, 0, 0, 0, 0, 0, 1000000000, 100000000, 10000000, 1000000, 100000, 10000, 1000, 100, 10, 1},
0
);
private static byte zeroChar = (byte) '0';
private static int width = IntVector.SPECIES_512.length();
private static byte[] filler;
static {
filler = new byte[16];
for (int i = 0; i < 16; i++) {
filler[i] = zeroChar;
}
}
public static int parseInt(String str) {
boolean negative = str.charAt(0) == '-';
byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
if (negative) {
bytes[0] = zeroChar;
}
bytes = ensureSize(bytes, width);
ByteVector vec = ByteVector.fromArray(ByteVector.SPECIES_128, bytes, 0);
vec = vec.sub(zeroChar);
IntVector ints = (IntVector) vec.castShape(IntVector.SPECIES_512, 0);
ints = ints.mul(mul);
return ints.reduceLanes(VectorOperators.ADD) * (negative ? -1 : 1);
}
public static byte[] ensureSize(byte[] arr, int per) {
int mod = arr.length % per;
if (mod == 0) {
return arr;
}
int length = arr.length - (mod);
length += per;
byte[] newArr = new byte[length];
System.arraycopy(arr, 0, newArr, per - mod, arr.length);
System.arraycopy(filler, 0, newArr, 0, per - mod);
return newArr;
}
public static byte[] ensureSize2(byte[] arr, int per) {
int mod = arr.length % per;
if (mod == 0) {
return arr;
}
int length = arr.length - (mod);
length += per;
byte[] newArr = new byte[length];
System.arraycopy(arr, 0, newArr, 0, arr.length);
return newArr;
}
public static int indexOf(String s, char c) {
byte[] b = s.getBytes(StandardCharsets.UTF_8);
int width = ByteVector.SPECIES_MAX.length();
byte bChar = (byte) c;
b = ensureSize2(b, width);
for (int i = 0; i < b.length; i += width) {
ByteVector vec = ByteVector.fromArray(ByteVector.SPECIES_MAX, b, i);
int pos = vec.compare(VectorOperators.EQ, bChar).firstTrue();
if (pos != width) {
return pos + i;
}
}
return -1;
}
}
I fully expected my int parsing to be slower, since it won't ever be handling more than the vector size can hold (an int can never be more than 10 digits long).
By my benchmarks, parsing 123 as an int 10k times took 3081 microseconds for Integer.parseInt, and 80601 microseconds for my implementation. Searching for 'a' in a very long string ("____".repeat(4000) + "a" + "----".repeat(193)) took 7709 microseconds, compared to 7 for String#indexOf.
Why is it so unbelievably slow? I thought the entire point of SIMD is that it's faster than the scalar equivalents for tasks like these.

You picked something SIMD is not great at (string->int), and something that JVMs are very good at optimizing out of loops. And you made an implementation with a bunch of extra copying work if the inputs aren't exact multiples of the vector width.
I'm assuming your times are totals (for 10k repeats each), not a per-call average.
7 us is impossibly fast for that.
"____".repeat(4000) is 16k bytes before the 'a', which I assume is what you're searching for. Even a well-tuned / unrolled memchr (aka indexOf) running at 2x 32-byte vectors per clock cycle, on a 4GHz CPU, would take 625 us for 10k reps. (16000B / (64B/c) * 10000 reps / 4000 MHz). And yes, I'd expect a JVM to either call the native memchr or use something equally efficient for a commonly-used core library function like String#indexOf. For example, glibc's avx2 memchr is pretty well-tuned with loop unrolling; if you're on Linux, your JVM might be calling it.
Built-in String indexOf is also something the JIT "knows about". It's apparently able to hoist it out of loops when it can see that you're using the same string repeatedly as input. (But then what's it doing for the rest of those 7 us? I guess doing a not-quite-so-great memchr and then doing an empty 10k iteration loop at 1/clock could take about 7 microseconds, especially if your CPU isn't as fast as 4GHz.)
See Idiomatic way of performance evaluation? - if doubling the repeat-count to 20k doesn't double the time, your benchmark is broken and not measuring what you think it does.
Your manual SIMD indexOf is very unlikely to get optimized out of a loop. It makes a copy of the whole array every time, if the size isn't an exact multiple of the vector width!! (In ensureSize2). The normal technique is to fall back to scalar for the last size % width elements, which is obviously much better for large arrays. Or even better, do an unaligned load that ends at the end of the array (if the total size is >= vector width) for something where overlap with previous work is not a problem.
A decent memchr on modern x86 (using an algorithm like your indexOf without unrolling) should go at about 1 vector (16/32/64 bytes) per maybe 1.5 clock cycles, with data hot in L1d cache, without loop unrolling or anything. (Checking both the vector compare and the pointer bound as possible loop exit conditions takes extra asm instructions vs. a simple strlen, but see this answer for some microbenchmarks of a simple hand-written strlen that assumes aligned buffers). Your indexOf loop probably bottlenecks on front-end throughput on a CPU like Skylake, with its pipeline width of 4 uops/clock.
So let's guess that your implementation takes 1.5 cycles per 16 byte vector, if perhaps you're on a CPU without AVX2? You didn't say.
16kB / 16B = 1000 vectors. At 1 vector per 1.5 clocks, that's 1500 cycles. On a 3GHz machine, 1500 cycles takes 500 ns = 0.5 us per call, or 5000 us per 10k reps. But since 16194 bytes isn't a multiple of 16, you're also copying the whole thing every call, so that costs some more time, and could plausibly account for your 7709 us total time.
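The two back-of-the-envelope estimates above use the same formula, which can be written out as a small helper. The class and parameter names are mine, and the inputs are the guesses from the text, not measurements.

```java
final class ThroughputEstimate {
    /**
     * Estimated total time in microseconds for scanning `bytes` bytes `reps` times.
     *
     * @param vectorBytes     bytes consumed per vector step
     * @param cyclesPerVector clock cycles spent per vector step
     * @param clockMHz        CPU clock in MHz, i.e. cycles per microsecond
     */
    static double totalMicros(double bytes, double vectorBytes,
                              double cyclesPerVector, double reps, double clockMHz) {
        double cyclesPerCall = bytes / vectorBytes * cyclesPerVector;
        return cyclesPerCall * reps / clockMHz;
    }
}
```

totalMicros(16000, 64, 1.0, 10000, 4000) gives the 625 us lower bound for a well-tuned memchr (two 32-byte vectors per cycle at 4 GHz), and totalMicros(16000, 16, 1.5, 10000, 3000) gives the 5000 us guess for the manual 16-byte loop at 3 GHz.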
What SIMD is good for
for tasks like these.
No, "horizontal" stuff like ints.reduceLanes is something SIMD is generally slow at. And even with something like How to implement atoi using SIMD? using x86 pmaddwd to multiply and add pairs horizontally, it's still a lot of work.
Note that to make the elements wide enough to multiply by place-values without overflow, you have to unpack, which costs some shuffling. ints.reduceLanes takes about log2(elements) shuffle/add steps, and if you're starting with 512-bit AVX-512 vectors of int, the first 2 of those shuffles are lane-crossing, 3 cycle latency (https://agner.org/optimize/). (Or if your machine doesn't even have AVX2, then a 512-bit integer vector is actually 4x 128-bit vectors. And you had to do separate work to unpack each part. But at least the reduction will be cheap, just vertical adds until you get down to a single 128-bit vector.)

Hmm. I found this post because I've hit something strange with the Vector performance for something that it should ostensibly be ideal for: multiplying two double arrays.
static private void doVector(int iteration, double[] input1, double[] input2, double[] output) {
Instant start = Instant.now();
for (int i = 0; i < SPECIES.loopBound(ARRAY_LENGTH); i += SPECIES.length()) {
DoubleVector va = DoubleVector.fromArray(SPECIES, input1, i);
DoubleVector vb = DoubleVector.fromArray(SPECIES, input2, i);
va.mul(vb);
System.arraycopy(va.mul(vb).toArray(), 0, output, i, SPECIES.length());
}
Instant finish = Instant.now();
System.out.println("vector duration " + iteration + ": " + Duration.between(start, finish).getNano());
}
The species length comes out at 4 on my machine (CPU is Intel i7-7700HQ at 2.8 GHz).
On my first attempt the execution was taking more than 15 milliseconds (compared with 0 for the scalar equivalent), even with a tiny array length (8 elements). On a hunch I added the iteration index to see whether something had to warm up, and indeed the first iteration ALWAYS takes ages (44 ms for 65536 elements). Whilst most of the other iterations report zero time, a few take around 15 ms, but they are randomly distributed (i.e. not always the same iteration index on each run). I sort of expect that, because I'm measuring wall-clock time while other things are running on the machine.
However, overall for an array size of 65536 elements, and 32 iterations, the total duration for the vector approach is 2-3 times longer than that for the scalar one.

Creating combinations of a BitSet

Assume I have a Java BitSet. I now need to make combinations of the BitSet such that only Bits which are Set can be flipped. i.e. only need combinations of Bits which are set.
E.g. BitSet - 1010, Combinations - 1010, 1000, 0010, 0000
BitSet - 1100, Combinations - 1100, 1000, 0100, 0000
I can think of a few solutions E.g. I can take combinations of all 4 bits and then XOR the combinations with the original Bitset. But this would be very resource-intensive for large sparse BitSets. So I was looking for a more elegant solution.

It appears that you want to get the power set of the bit set. There is already an answer here about how to get the power set of a Set<T>. Here, I will show a modified version of the algorithm shown in that post, using BitSets:
private static Set<BitSet> powerset(BitSet set) {
Set<BitSet> sets = new HashSet<>();
if (set.isEmpty()) {
sets.add(new BitSet(0));
return sets;
}
int head = set.nextSetBit(0);
BitSet rest = set.get(0, set.size());
rest.clear(head);
for (BitSet s : powerset(rest)) {
BitSet newSet = s.get(0, s.size());
newSet.set(head);
sets.add(newSet);
sets.add(s);
}
return sets;
}

You can perform the operation in a single linear pass instead of recursion, if you realize the integer numbers are a computer’s intrinsic variant of “on off” patterns and iterating over the appropriate integer range will ultimately produce all possible permutations. The only challenge in your case, is to transfer the densely packed bits of an integer number to the target bits of a BitSet.
Here is such a solution:
static List<BitSet> powerset(BitSet set) {
int nBits = set.cardinality();
if(nBits > 30) throw new OutOfMemoryError(
"Not enough memory for "+BigInteger.ONE.shiftLeft(nBits)+" BitSets");
int max = 1 << nBits;
int[] targetBits = set.stream().toArray();
List<BitSet> sets = new ArrayList<>(max);
for(int onOff = 0; onOff < max; onOff++) {
BitSet next = new BitSet(set.size());
for(int bitsToSet = onOff, ix = 0; bitsToSet != 0; ix++, bitsToSet>>>=1) {
if((bitsToSet & 1) == 0) {
int skip = Integer.numberOfTrailingZeros(bitsToSet);
ix += skip;
bitsToSet >>>= skip;
}
next.set(targetBits[ix]);
}
sets.add(next);
}
return sets;
}
It uses an int value for the iteration, which is already enough to represent all permutations that can ever be stored in one of Java’s builtin collections. If your source BitSet has 32 one bits, the 2³² possible combinations not only require a hundred GB of heap, but also a collection supporting 2³² elements, i.e. a size not representable as int.
So the code above terminates early if the number exceeds the capabilities, without even trying. You could rewrite it to use a long or even BigInteger instead, to keep it busy in such cases, until it will fail with an OutOfMemoryError anyway.
For the working cases, the int solution is the most efficient variant.
Note that the code returns a List rather than a HashSet to avoid the costs of hashing. The values are already known to be unique and hashing would only pay off if you want to perform lookups, i.e. call contains with another BitSet. But to test whether an existing BitSet is a permutation of your input BitSet, you wouldn’t even need to generate all permutations, a simple bit operation, e.g. andNot would tell you that already. So for storing and iterating the permutations, an ArrayList is more efficient.
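The andNot check mentioned above could look roughly like this. The class and method names are illustrative only.

```java
import java.util.BitSet;

final class BitSetSubset {
    /** True if every bit set in candidate is also set in source. */
    static boolean isSubset(BitSet candidate, BitSet source) {
        BitSet extra = (BitSet) candidate.clone(); // leave the argument untouched
        extra.andNot(source);   // clears every bit that source also has
        return extra.isEmpty(); // any bit left over is one that source lacks
    }
}
```

For a source of 1010 this accepts 1010, 1000, 0010 and 0000, and rejects anything with another bit set, without generating the power set first.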

Convert a String to array of bits

I would like to convert a String consisting of 0's and 1's to an array of bits.
The String is of length ~30000 and is sparse (mostly 0s, few 1s)
For example, given a string
"00000000100000000010000100000000001000"
I would like to convert it to an array of bits which will store
[00000000100000000010000100000000001000]
I am thinking of using BitSet or OpenBitSet
Is there a better way? The use case is to perform logical OR efficiently.
I am thinking along these lines
final OpenBitSet logicalOrResult = new OpenBitSet();
for (final String line : lines) {
    // Size the bit set up front to avoid reallocations while filling it.
    final OpenBitSet myBitArray = new OpenBitSet(line.length());
    int pos = 0;
    for (final char c : line.toCharArray()) {
        if (c == '1') {
            myBitArray.set(pos);
        }
        pos++;
    }
    logicalOrResult.or(myBitArray);
}

BigInteger can parse it and store it, and do bitwise operations:
BigInteger x = new BigInteger(bitString, 2);
BigInteger y = new BigInteger(otherBitString, 2);
x = x.or(y);
System.out.println(x.toString(2));

A BitSet ranging over values between 0 and 30000 requires a long array of size less than 500, so you can assume that BitSet.or (or the respective OpenBitSet method) will be sufficiently fast, despite the sparsity. It looks like OpenBitSet has better performance than BitSet, but apart from this it doesn't really matter which you use, both will implement or efficiently. However, be sure to pass the length of the String to the (Open)BitSet constructor to avoid reallocations of the internal long array during construction!
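Here is a small sketch of that approach with java.util.BitSet, including the pre-sizing. The class BitStrings and its method names are made up for this example, and bit i corresponds to the character at position i of the string.

```java
import java.util.BitSet;

final class BitStrings {
    /** Parses a string of '0' and '1' characters into a pre-sized BitSet. */
    static BitSet parse(String bits) {
        BitSet set = new BitSet(bits.length()); // avoids regrowing the long[]
        // Jump from one '1' to the next, which suits sparse input.
        for (int i = bits.indexOf('1'); i >= 0; i = bits.indexOf('1', i + 1)) {
            set.set(i);
        }
        return set;
    }

    /** Logical OR over all lines. */
    static BitSet orAll(Iterable<String> lines) {
        BitSet result = new BitSet();
        for (String line : lines) {
            result.or(parse(line));
        }
        return result;
    }
}
```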
If your strings are much longer and your sparsity is extreme, you could also consider storing them as a sorted list of Integers (or ints, if you use a library like Trove), representing the indices which contain a 1. A bitwise or can be implemented in a merge(sort)-like fashion, which is quite efficient (time O(n + m), where n, m are the numbers of ones in each string). I suspect that in your scenario it will be slower than the BitSet approach though.
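The merge-style OR on sorted index lists might be sketched as follows, using plain int[] arrays rather than Trove. It assumes both inputs are ascending and free of duplicates. The name SparseBits is hypothetical.

```java
import java.util.Arrays;

final class SparseBits {
    /** Union of two ascending, duplicate-free index arrays in O(n + m). */
    static int[] or(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                out[n++] = a[i++];
            } else if (a[i] > b[j]) {
                out[n++] = b[j++];
            } else {            // same index in both inputs: emit it once
                out[n++] = a[i++];
                j++;
            }
        }
        while (i < a.length) out[n++] = a[i++];
        while (j < b.length) out[n++] = b[j++];
        return Arrays.copyOf(out, n); // trim the unused tail
    }
}
```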

You can iterate through each character:
boolean[] bits = new boolean[str.length()];
for (int i = 0; i < str.length(); i++) {
    // Any character other than '1' is treated as a 0 bit.
    bits[i] = str.charAt(i) == '1';
}
If you want to be memory efficient, you could try RLE (Run Length Encoding).
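A minimal sketch of run-length encoding for such a string, assuming it only contains '0' and '1'. The convention that the first run counts zeros (and may be 0 long) is my own choice, as is the class name.

```java
import java.util.ArrayList;
import java.util.List;

final class RunLength {
    /** Lengths of alternating runs, starting with the run of '0's (possibly 0 long). */
    static List<Integer> encode(String bits) {
        List<Integer> runs = new ArrayList<>();
        char current = '0';
        int length = 0;
        for (int i = 0; i < bits.length(); i++) {
            char c = bits.charAt(i);
            if (c == current) {
                length++;
            } else {
                runs.add(length); // close the finished run
                current = c;
                length = 1;
            }
        }
        runs.add(length); // close the last run
        return runs;
    }
}
```

For mostly-zero input of length ~30000 with a handful of ones, this yields a list with roughly two entries per 1 bit.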

One-byte bool. Why?

In C++, why does a bool require one byte to store true or false where just one bit is enough for that, like 0 for false and 1 for true? (Why does Java also require one byte?)
Secondly, how much safer is it to use the following?
struct Bool {
bool trueOrFalse : 1;
};
Thirdly, even if it is safe, is the above field technique really going to help? Since I have heard that we save space there, but still compiler generated code to access them is bigger and slower than the code generated to access the primitives.

Why does a bool require one byte to store true or false where just one bit is enough
Because every object in C++ must be individually addressable* (that is, you must be able to have a pointer to it). You cannot address an individual bit (at least not on conventional hardware).
How much safer is it to use the following?
It's "safe", but it doesn't achieve much.
is the above field technique really going to help?
No, for the same reasons as above ;)
but still compiler generated code to access them is bigger and slower than the code generated to access the primitives.
Yes, this is true. On most platforms, this requires accessing the containing byte (or int or whatever), and then performing bit-shifts and bit-mask operations to access the relevant bit.
If you're really concerned about memory usage, you can use a std::bitset in C++ or a BitSet in Java, which pack bits.
* With a few exceptions.

Using a single bit is much slower and much more complicated to allocate. In C/C++ there is no way to get the address of one bit so you wouldn't be able to do &trueOrFalse as a bit.
Java has BitSet and EnumSet, which both use bitmaps. If you have a very small number of flags, it may not make much difference: objects have to be at least byte aligned, and in HotSpot they are 8-byte aligned (in C++ a new object can be 8- to 16-byte aligned). This means saving a few bits might not save any space.
In Java at least, Bits are not faster unless they fit in cache better.
public static void main(String... ignored) {
BitSet bits = new BitSet(4000);
byte[] bytes = new byte[4000];
short[] shorts = new short[4000];
int[] ints = new int[4000];
for (int i = 0; i < 100; i++) {
long bitTime = timeFlip(bits) + timeFlip(bits);
long bytesTime = timeFlip(bytes) + timeFlip(bytes);
long shortsTime = timeFlip(shorts) + timeFlip(shorts);
long intsTime = timeFlip(ints) + timeFlip(ints);
System.out.printf("Flip time bits %.1f ns, bytes %.1f, shorts %.1f, ints %.1f%n",
bitTime / 2.0 / bits.size(), bytesTime / 2.0 / bytes.length,
shortsTime / 2.0 / shorts.length, intsTime / 2.0 / ints.length);
}
}
private static long timeFlip(BitSet bits) {
long start = System.nanoTime();
for (int i = 0, len = bits.size(); i < len; i++)
bits.flip(i);
return System.nanoTime() - start;
}
private static long timeFlip(short[] shorts) {
long start = System.nanoTime();
for (int i = 0, len = shorts.length; i < len; i++)
shorts[i] ^= 1;
return System.nanoTime() - start;
}
private static long timeFlip(byte[] bytes) {
long start = System.nanoTime();
for (int i = 0, len = bytes.length; i < len; i++)
bytes[i] ^= 1;
return System.nanoTime() - start;
}
private static long timeFlip(int[] ints) {
long start = System.nanoTime();
for (int i = 0, len = ints.length; i < len; i++)
ints[i] ^= 1;
return System.nanoTime() - start;
}
prints
Flip time bits 5.0 ns, bytes 0.6, shorts 0.6, ints 0.6
for sizes of 40000 and 400K
Flip time bits 6.2 ns, bytes 0.7, shorts 0.8, ints 1.1
for 4M
Flip time bits 4.1 ns, bytes 0.5, shorts 1.0, ints 2.3
and 40M
Flip time bits 6.2 ns, bytes 0.7, shorts 1.1, ints 2.4

If you want to store only one bit of information, there is nothing more compact than a char, which is the smallest addressable memory unit in C/C++. (Depending on the implementation, a bool might have the same size as a char but it is allowed to be bigger.)
A char is guaranteed by the C standard to hold at least 8 bits, however, it can also consist of more. The exact number is available via the CHAR_BIT macro defined in limits.h (in C) or climits (C++). Today, it is most common that CHAR_BIT == 8 but you cannot rely on it (see here). It is guaranteed to be 8, however, on POSIX compliant systems and on Windows.
Though it is not possible to reduce the memory footprint for a single flag, it is of course possible to combine multiple flags. Besides doing all bit operations manually, there are some alternatives:
If you know the number of bits at compile time
bitfields (as in your question). But beware, the ordering of fields is not guaranteed, which may result in portability issues.
std::bitset
If you know the size only at runtime
boost::dynamic_bitset
If you have to deal with large bitvectors, take a look at the BitMagic library. It supports compression and is heavily tuned.
As others have pointed out already, saving a few bits is not always a good idea. Possible drawbacks are:
Less readable code
Reduced execution speed because of the extra extraction code.
For the same reason, increases in code size, which may outweigh the savings in data consumption.
Hidden synchronization issues in multithreaded programs. For example, flipping two different bits by two different threads may result in a race condition. In contrast, it is always safe for two threads to modify two different objects of primitive types (e.g., char).
Typically, it makes sense when you are dealing with huge data because then you will benefit from less pressure on memory and cache.
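The synchronization point applies to Java as well: two threads doing a plain `flags ^= mask` on the same int can lose updates. One way around it, sketched here with java.util.concurrent.atomic.AtomicInteger, is to make each flip an atomic read-modify-write. The class AtomicFlags and its demo method are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Up to 32 flags packed in one int, safe to flip from several threads. */
final class AtomicFlags {
    private final AtomicInteger bits = new AtomicInteger();

    void flip(int bit) {
        int mask = 1 << bit;
        bits.getAndUpdate(v -> v ^ mask); // retried until the update applies cleanly
    }

    boolean get(int bit) {
        return (bits.get() & (1 << bit)) != 0;
    }

    /** Hypothetical demo: two threads flip different bits of the same word. */
    static AtomicFlags demo(int flipsPerThread) {
        AtomicFlags flags = new AtomicFlags();
        Thread t0 = new Thread(() -> {
            for (int i = 0; i < flipsPerThread; i++) flags.flip(0);
        });
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < flipsPerThread; i++) flags.flip(1);
        });
        t0.start();
        t1.start();
        try {
            t0.join();
            t1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return flags;
    }
}
```

With an odd flipsPerThread, both bits end up set on every run. The atomic update adds to the extraction cost listed above, so it is part of the same trade-off.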

Why don't you just store the state in a byte? I haven't actually tested the code below, but it should give you an idea. You can even use a short or an int for 16 or 32 states. I believe I have a working Java example as well. I'll post it when I find it.
__int8 state = 0x0;
bool getState(int bit)
{
return (state & (1 << bit)) != 0x0;
}
void setAllOnline(bool online)
{
state = -online;
}
void reverseState(int bit)
{
state ^= (1 << bit);
}
Alright, here's the Java version. I've stored it in an int, since, if I remember correctly, even a byte would occupy 4 bytes anyway. And this obviously isn't meant to be used as an array.
public class State
{
private int STATE;
public State() {
STATE = 0x0;
}
public State(int previous) {
STATE = previous;
}
/*
* #Usage - Used along side the #setMultiple(int, boolean);
* #Returns the value of a single bit.
*/
public static int valueOf(int bit)
{
return 1 << bit;
}
/*
* #Usage - Used along side the #setMultiple(int, boolean);
* #Returns the value of an array of bits.
*/
public static int valueOf(int... bits)
{
int value = 0x0;
for (int bit : bits)
value |= (1 << bit);
return value;
}
/*
* #Returns the value currently stored or the values of all 32 bits.
*/
public int getValue()
{
return STATE;
}
/*
* #Usage - Turns all bits online or offline.
* #Return - <TRUE> if all states are online. Otherwise <FALSE>.
*/
public boolean setAll(boolean online)
{
STATE = online ? -1 : 0;
return online;
}
/*
* #Usage - sets multiple bits at once to a specific state.
* #Warning - DO NOT SET BITS TO THIS! Use setMultiple(State.valueOf(#), boolean);
* #Return - <TRUE> if states were set to online. Otherwise <FALSE>.
*/
public boolean setMultiple(int value, boolean online)
{
STATE |= value;
if (!online)
STATE ^= value;
return online;
}
/*
* #Usage - sets a single bit to a specific state.
* #Return - <TRUE> if this bit was set to online. Otherwise <FALSE>.
*/
public boolean set(int bit, boolean online)
{
STATE |= (1 << bit);
if(!online)
STATE ^= (1 << bit);
return online;
}
/*
* #return = the new current state of this bit.
* #Usage = Good for situations that are reversed.
*/
public boolean reverse(int bit)
{
return ((STATE ^= (1 << bit)) & (1 << bit)) != 0;
}
/*
* #return = <TRUE> if this bit is online. Otherwise <FALSE>.
*/
public boolean online(int bit)
{
int value = 1 << bit;
return (STATE & value) == value;
}
/*
* #return = a String contains full debug information.
*/
@Override
public String toString()
{
StringBuilder sb = new StringBuilder();
sb.append("TOTAL VALUE: ");
sb.append(STATE);
for (int i = 0; i < 0x20; i++)
{
sb.append("\nState(");
sb.append(i);
sb.append("): ");
sb.append(online(i));
sb.append(", ValueOf: ");
sb.append(State.valueOf(i));
}
return sb.toString();
}
}
Also I should point out that you really shouldn't utilize a special class for this, but to just have the variable stored within the class that'll be most likely utilizing it. If you plan to have 100's or even 1000's of Boolean values consider an array of bytes.
E.g. the below example.
boolean[] states = new boolean[4096];
can be converted into the below.
int[] states = new int[128];
Now you're probably wondering how you'll access index 4095 in an array of 128 elements. Simplified, this is what happens: 4095 is shifted 5 bits to the right, which is the same as dividing by 32 and rounding down, so 4095 / 32 = 127 and we are at index 127 of the array. Then we perform 4095 & 31, which masks it to a value between 0 and 31. This masking only works with powers of two minus 1, e.g. 0, 1, 3, 7, 15, 31, 63, 127, 255, 511, 1023, etc.
So now we can access the bit at that position. As you can see this is very very compact and beats having 4096 booleans in a file :) This will also provide a much faster read/write to a binary file. I have no idea what this BitSet stuff is, but it looks like complete garbage and since byte,short,int,long are already in their bit forms technically you might as well use them as is. Then creating some complex class to access the individual bits from memory which is what I could grasp from reading a few posts.
boolean getState(int index)
{
return (states[index >> 5] & 1 << (index & 0x1F)) != 0x0;
}
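To go with the getter above, here is a sketch of the matching setter and flip for an int[] used as packed bits. The class name PackedBits is made up. The indexing is the same `index >> 5` and `index & 0x1F` scheme.

```java
/** Booleans packed 32 per int, addressed by a flat index. */
final class PackedBits {
    private final int[] words;

    PackedBits(int size) {
        words = new int[(size + 31) >>> 5]; // round up to whole 32-bit words
    }

    boolean get(int index) {
        return (words[index >> 5] & (1 << (index & 0x1F))) != 0;
    }

    void set(int index, boolean on) {
        int mask = 1 << (index & 0x1F);
        if (on) {
            words[index >> 5] |= mask;  // turn the bit on
        } else {
            words[index >> 5] &= ~mask; // turn the bit off
        }
    }

    void flip(int index) {
        words[index >> 5] ^= 1 << (index & 0x1F);
    }

    int wordCount() {
        return words.length;
    }
}
```

new PackedBits(4096) allocates 128 ints, matching the conversion from boolean[4096] described above.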
Further information...
Basically if the above was a bit confusing here's a simplified version of what's happening.
The types "byte", "short", "int", "long" all are data types which have different ranges.
You can view this link: http://msdn.microsoft.com/en-us/library/s3f49ktz(v=vs.80).aspx
To see the data ranges of each.
So a byte is equal to 8 bits. So an int which is 4 bytes will be 32 bits.
Now there isn't an integer operator for raising a value to the Nth power. However, thanks to bit shifting we can get powers of two cheaply. Performing 1 << N equates to 1 * 2^N, and 2 << N equates to 2 * 2^N. So to compute a power of two, always do "1 << N".
Now we know that an int has 32 bits, so we can use each bit as a flag and simply index them from 0 to 31.
To keep things simple, think of the "&" operator as a way to check whether a value contains the bits of another value. Say we have the value 31. To get to 31 we add bits 0 through 4, which are 1, 2, 4, 8, and 16. When we perform 31 & 16, this returns 16, because bit 4 (2^4 = 16) is present in 31. Now say we perform 31 & 20, which checks whether bits 2 and 4 are present. This returns 20, since both bits are there: 2^2 + 2^4 = 4 + 16 = 20. Finally, say we perform 31 & 48, which checks for bits 4 and 5. Bit 5 is not in 31, so this returns only 16; it does not return 0. So when checking multiple bits at once, you must check that the result equals the mask itself, instead of only checking whether it equals 0.
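The difference between "all of these bits" and "any of these bits" can be captured in two small helpers. This is just an illustration in Java, and the names are mine.

```java
final class Masks {
    /** True only if every bit of mask is set in value. */
    static boolean hasAll(int value, int mask) {
        return (value & mask) == mask;
    }

    /** True if at least one bit of mask is set in value. */
    static boolean hasAny(int value, int mask) {
        return (value & mask) != 0;
    }
}
```

With the numbers above, hasAll(31, 20) is true, while hasAll(31, 48) is false even though hasAny(31, 48) is true, because 31 & 48 is 16 rather than 48.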
The below will verify if an individual bit is at 0 or 1. 0 being false, and 1 being true.
boolean getState(int bit)
{
    return (state & (1 << bit)) != 0x0;
}
The same approach works for checking whether a value contains several bits. Think of each bit as being represented by 2^BIT.
I'll quickly go over some of the other operators. We've just briefly covered the "&" operator; now for the "|" operator.
When performing the following
int value = 31;
value |= 16;
value |= 16;
value |= 16;
value |= 16;
The value will still be 31. This is because bit 4 (2^4 = 16) is already turned on, i.e. set to 1. Performing "|" returns the value with that bit turned on; if it's already on, nothing changes. We use "|=" to store the returned value back in the variable.
Instead of writing "value = value | 16;" we just write "value |= 16;".
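A short sketch of the same point, using a made-up setBit helper: OR-ing in a bit that is already on changes nothing, while OR-ing in a bit that is off adds it.

```java
// Hypothetical demo: setting a bit with "|" is safe to repeat.
final class OrDemo {
    // Returns value with the given bit turned on.
    static int setBit(int value, int bit) {
        return value | (1 << bit);
    }

    public static void main(String[] args) {
        int value = 15;            // bits 0 through 3
        value = setBit(value, 4);  // bit 4 was off, so it gets added
        System.out.println(value); // 31
        value = setBit(value, 4);  // bit 4 is already on, nothing changes
        System.out.println(value); // 31
    }
}
```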
Now let's look a bit further into how the "&" and "|" can be utilized.
/*
 * This contains bits 0,1,2,3,4,8,9 turned on.
 */
final int CHECK = 1 | 2 | 4 | 8 | 16 | 256 | 512;
/*
 * This is some value where we add bits 0 through 9, but skip 0 and 8.
 */
int value = 2 | 4 | 8 | 16 | 32 | 64 | 128 | 512;
So when we perform the below code.
int return_code = value & CHECK;
The return code will be 2 + 4 + 8 + 16 + 512 = 542.
So we were checking for 799 but received 542. This is because bits 0 and 8 are off; together they equal 256 + 1 = 257, and 799 - 257 = 542.
The above is a great way to check, say in a video game, whether any of a set of buttons is pressed. We can check all of those bits with a single operation, which is far more efficient than performing a Boolean check on every single state.
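The CHECK example can be reproduced as below. The missingBits helper is an addition for illustration: it uses "& ~value" to report which of the checked bits were off, which is where the 257 comes from.

```java
// Hypothetical demo reproducing the CHECK example and finding the missing bits.
final class CheckBits {
    // Bits 0,1,2,3,4,8,9 turned on (799).
    static final int CHECK = 1 | 2 | 4 | 8 | 16 | 256 | 512;

    // Returns the bits of check that are NOT set in value.
    static int missingBits(int value, int check) {
        return check & ~value;
    }

    public static void main(String[] args) {
        // Bits 1 through 7 and bit 9 (766).
        int value = 2 | 4 | 8 | 16 | 32 | 64 | 128 | 512;
        System.out.println(value & CHECK);             // 542
        System.out.println(missingBits(value, CHECK)); // 257, bits 0 and 8
        System.out.println((value & CHECK) == CHECK);  // false, not all bits present
    }
}
```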
Now let's say we have a Boolean value which is repeatedly reversed.
Normally you'd do something like
boolean state = false;
state = !state;
Well, this can be done with bits as well, using the "^" operator.
Just as we used "1 << N" to pick the value of a bit, we can use it to reverse that bit. And just as "|=" stores the result, so does "^=". If the bit is on we turn it off; if it's off we turn it on.
void reverseState(int bit)
{
    state ^= (1 << bit);
}
You can even have it return the current state. This performs the reversal and then checks the current state. If you want it to return the previous state instead, just swap "!=" for "==".
boolean reverseAndGet(int bit)
{
    return ((state ^= (1 << bit)) & (1 << bit)) != 0x0;
}
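Here is a sketch that puts the toggle methods into a class so they can be tried out. The class name ToggleFlags and the reverseAndGetPrevious method are assumptions; the latter is just the "==" variant described above.

```java
// Hypothetical holder for 32 flags that can be flipped with "^=".
final class ToggleFlags {
    private int state;

    boolean getState(int bit) {
        return (state & (1 << bit)) != 0;
    }

    // Flips the bit and returns its new state.
    boolean reverseAndGet(int bit) {
        return ((state ^= (1 << bit)) & (1 << bit)) != 0;
    }

    // Flips the bit and returns the state it had before the flip.
    boolean reverseAndGetPrevious(int bit) {
        return ((state ^= (1 << bit)) & (1 << bit)) == 0;
    }

    public static void main(String[] args) {
        ToggleFlags flags = new ToggleFlags();
        System.out.println(flags.reverseAndGet(3));         // true, bit 3 is now on
        System.out.println(flags.reverseAndGet(3));         // false, bit 3 is off again
        System.out.println(flags.reverseAndGetPrevious(3)); // false, it was off before this flip
        System.out.println(flags.getState(3));              // true
    }
}
```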
Storing multiple values that aren't single bits (i.e. not bools) in an int can also be done. Let's say we normally write out our coordinate position like below.
int posX = 0;
int posY = 0;
int posZ = 0;
Now let's say these never went past 1023, so 0 through 1023 is the full range for all of them. I chose 1023 on purpose: as previously mentioned, you can use the "&" operator to force a value into the range 0 to 2^N - 1. So if your range is 0 through 1023, we can perform "value & 1023" and the result will always be between 0 and 1023 without any index parameter checks. Keep in mind, as previously mentioned, that this only works with powers of two minus one: 2^10 - 1 = 1024 - 1 = 1023.
E.g. no more if (value >= 0 && value <= 1023).
So 2^10 = 1024, which means 10 bits are required to hold a number between 0 and 1023.
So 10 x 3 = 30, which is less than or equal to 32, so an int is sufficient for holding all these values.
So we can perform the following. Each value is shifted by the number of bits used before it: 0, 10 and 20. I put the 0 there to show visually that 2^0 = 1, so # * 1 = #. We need y << 10 because x uses up 10 bits (0 through 1023), so we multiply y by 1024 to keep the values unique. Then z needs to be multiplied by 2^20, which is 1,048,576.
int position = (x << 0) | (y << 10) | (z << 20);
This makes comparisons fast.
We can now do
return this.position == position;
as opposed to
return this.x == x && this.y == y && this.z == z;
Now what if we wanted the actual positions of each?
For the x we simply do the following.
int getX()
{
    return position & 1023;
}
Then for the y we need to perform a right bit shift and then AND it.
int getY()
{
    return (position >> 10) & 1023;
}
As you may guess, the z is the same as the y, but instead of 10 we use 20.
int getZ()
{
    return (position >> 20) & 1023;
}
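Putting the packing and the three getters together gives a round trip like the sketch below. The class name PackedPosition and the static method style are assumptions, and masking the inputs with "& 1023" in pack is an extra safeguard added here so that an out-of-range value cannot spill into a neighbouring field.

```java
// Hypothetical helper: packs three 10-bit coordinates (0..1023) into one int.
final class PackedPosition {
    private static final int MASK = 1023; // 2^10 - 1

    static int pack(int x, int y, int z) {
        // x takes bits 0-9, y bits 10-19, z bits 20-29.
        return (x & MASK) | ((y & MASK) << 10) | ((z & MASK) << 20);
    }

    static int getX(int position) {
        return position & MASK;
    }

    static int getY(int position) {
        return (position >> 10) & MASK;
    }

    static int getZ(int position) {
        return (position >> 20) & MASK;
    }

    public static void main(String[] args) {
        int position = pack(5, 700, 1023);
        System.out.println(getX(position)); // 5
        System.out.println(getY(position)); // 700
        System.out.println(getZ(position)); // 1023
    }
}
```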
I hope whoever views this will find it worthwhile information :).

If you really want to use 1 bit, you can use a char to store 8 booleans, and bitshift to get the value of the one you want. I doubt it will be faster, and it's probably going to give you a lot of headaches working that way, but technically it's possible.
On a side note, an approach like this could prove useful for systems that don't have a lot of memory available for variables but do have more processing power than you need. I highly doubt you will ever need it though.
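A minimal sketch of that idea in Java, assuming a byte is used instead of a char (a Java byte is 8 bits, while a Java char is 16). The class and method names are made up for the example. Note the cast back to byte, which is needed because Java promotes the operands to int.

```java
// Hypothetical helper: stores 8 booleans in a single byte (bits 0..7).
final class ByteFlags {
    // Returns flags with the given bit turned on or off.
    static byte set(byte flags, int bit, boolean on) {
        int mask = 1 << bit;
        return (byte) (on ? (flags | mask) : (flags & ~mask));
    }

    // Shifts the wanted bit down to position 0 and tests it.
    static boolean get(byte flags, int bit) {
        return ((flags >> bit) & 1) != 0;
    }

    public static void main(String[] args) {
        byte flags = 0;
        flags = set(flags, 7, true);
        System.out.println(get(flags, 7)); // true
        System.out.println(get(flags, 6)); // false
    }
}
```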
