Improvement of Algorithm: Counting set bits in Byte-Arrays

We store knowledge in byte arrays as bits. Counting the number of set bits is pretty slow. Any suggestion to improve the algorithm is welcome:
public static int countSetBits(byte[] array) {
int setBits = 0;
if (array != null) {
for (int byteIndex = 0; byteIndex < array.length; byteIndex++) {
for (int bitIndex = 0; bitIndex < 7; bitIndex++) {
if (getBit(bitIndex, array[byteIndex])) {
setBits++;
}
}
}
}
return setBits;
}
public static boolean getBit(int index, final byte b) {
byte t = setBit(index, (byte) 0);
return (b & t) > 0;
}
public static byte setBit(int index, final byte b) {
return (byte) ((1 << index) | b);
}
Counting the bits of a byte array of length 156'564 takes 300 ms, which is too much!

Try Integer.bitCount to obtain the number of bits set in each byte. It will be more efficient if you can switch from a byte array to an int array. If this is not possible, you could also construct a look-up table for all 256 byte values to quickly look up the count rather than iterating over individual bits.
And if it's always the whole array's count you're interested in, you could wrap the array in a class that stores the count in a separate integer whenever the array changes. (edit: Or, indeed, as noted in comments, use java.util.BitSet.)
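A minimal sketch of both suggestions, assuming the data stays in a byte[]. The class and method names are made up for illustration. Note the `& 0xFF` mask: Java bytes are signed, so a negative byte would otherwise be sign-extended and either index outside the table or be over-counted by Integer.bitCount.

```java
final class ByteArrayBitCounter {
    // Precomputed number of set bits for every unsigned byte value 0..255.
    private static final byte[] BIT_COUNTS = new byte[256];

    static {
        for (int value = 0; value < 256; value++) {
            BIT_COUNTS[value] = (byte) Integer.bitCount(value);
        }
    }

    /** Counts set bits with one table lookup per byte. */
    static int countWithTable(byte[] array) {
        if (array == null) {
            return 0;
        }
        int setBits = 0;
        for (byte b : array) {
            setBits += BIT_COUNTS[b & 0xFF]; // mask because bytes are signed
        }
        return setBits;
    }

    /** Counts set bits with Integer.bitCount, masking off the sign extension. */
    static int countWithBitCount(byte[] array) {
        if (array == null) {
            return 0;
        }
        int setBits = 0;
        for (byte b : array) {
            setBits += Integer.bitCount(b & 0xFF);
        }
        return setBits;
    }
}
```

Which of the two is faster depends on the JVM and hardware, so it is worth measuring both on real data.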

I would use the same global loop but instead of looping inside each byte I would simply use a (precomputed) array of size 256 mapping bytes to their bit count. That would probably be very efficient.
If you need even more speed, then you should separately maintain the count and increment it and decrement it when setting bits (but that would mean a big additional burden on those operations so I'm not sure it's applicable for you).
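To illustrate the idea of maintaining the count separately, here is a rough sketch of a wrapper that updates a counter on every set and clear. CountedBitArray and its methods are hypothetical names, not an existing API, and the sketch assumes single-threaded use and 8 bits per byte.

```java
final class CountedBitArray {
    private final byte[] bytes;
    private int setBits; // kept in sync by set() and clear()

    CountedBitArray(int bitCapacity) {
        bytes = new byte[(bitCapacity + 7) / 8];
    }

    boolean get(int bitIndex) {
        return (bytes[bitIndex >>> 3] & (1 << (bitIndex & 7))) != 0;
    }

    void set(int bitIndex) {
        if (!get(bitIndex)) { // only count a real 0 -> 1 transition
            bytes[bitIndex >>> 3] |= (byte) (1 << (bitIndex & 7));
            setBits++;
        }
    }

    void clear(int bitIndex) {
        if (get(bitIndex)) { // only count a real 1 -> 0 transition
            bytes[bitIndex >>> 3] &= (byte) ~(1 << (bitIndex & 7));
            setBits--;
        }
    }

    /** Returns the number of set bits without scanning the array. */
    int cardinality() {
        return setBits;
    }
}
```

The extra read in set() and clear() is the additional burden mentioned above; whether it pays off depends on how often you count compared to how often you write.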
Another solution would be based on the BitSet implementation: it uses an array of long (not bytes), and here is how it counts:
int sum = 0;
for (int i = 0; i < wordsInUse; i++)
sum += Long.bitCount(words[i]);
return sum;
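The same word-at-a-time idea can be tried on an existing byte[] without converting it to a BitSet. The following is only a sketch under that assumption: it reads eight bytes at a time through a ByteBuffer and handles the leftover bytes individually. The class name is invented for the example. Byte order does not matter here because only the number of set bits is used.

```java
import java.nio.ByteBuffer;

final class LongWiseBitCounter {
    /** Counts set bits eight bytes at a time, then finishes the tail byte by byte. */
    static int countSetBits(byte[] array) {
        if (array == null) {
            return 0;
        }
        ByteBuffer buffer = ByteBuffer.wrap(array);
        int setBits = 0;
        while (buffer.remaining() >= Long.BYTES) {
            setBits += Long.bitCount(buffer.getLong());
        }
        while (buffer.hasRemaining()) {
            setBits += Integer.bitCount(buffer.get() & 0xFF); // mask the sign extension
        }
        return setBits;
    }
}
```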

I would use:
byte[] yourByteArray = ...
BitSet bitset = BitSet.valueOf(yourByteArray); // java.util.BitSet
int setBits = bitset.cardinality();
I haven't measured it, but I expect it to be faster than what you have. Let me know?
Your method would look like
public static int countSetBits(byte[] array) {
return BitSet.valueOf(array).cardinality();
}
You say:
We store knowledge in byte arrays as bits.
I would recommend using a BitSet for that. It gives you convenient methods, and you seem to be interested in bits, not bytes, so it is a much more appropriate data type than a byte[]. (Internally it uses a long[].)

By far the fastest way is to count the set bits in "parallel". The result is called the Hamming weight,
and as far as I know this method is implemented in Integer.bitCount(int i).
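For reference, one well-known formulation of the parallel count for a 32-bit value looks like the sketch below. It is written from the general technique, not copied from the JDK, so treat it as an illustration of the idea and prefer Integer.bitCount in real code. The class name is made up.

```java
final class HammingWeight {
    /** Counts the set bits of i by summing neighbouring bit groups in parallel. */
    static int bitCount(int i) {
        i = i - ((i >>> 1) & 0x55555555);                // 2-bit sums
        i = (i & 0x33333333) + ((i >>> 2) & 0x33333333); // 4-bit sums
        i = (i + (i >>> 4)) & 0x0F0F0F0F;                // 8-bit sums
        return (i * 0x01010101) >>> 24;                  // add the four bytes together
    }
}
```

Each step adds adjacent groups of bits at once, so the whole word is counted in a fixed number of operations instead of one iteration per bit.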

As per my understanding,
1 byte = 8 bits
So if the byte array size is n, isn't the total number of bits n*8?
Please correct me if my understanding is wrong
Thanks
Vinod

Related

Creating combinations of a BitSet

Assume I have a Java BitSet. I now need to make combinations of the BitSet such that only Bits which are Set can be flipped. i.e. only need combinations of Bits which are set.
For Eg. BitSet - 1010, Combinations - 1010, 1000, 0010, 0000
BitSet - 1100, Combination - 1100, 1000, 0100, 0000
I can think of a few solutions E.g. I can take combinations of all 4 bits and then XOR the combinations with the original Bitset. But this would be very resource-intensive for large sparse BitSets. So I was looking for a more elegant solution.

It appears that you want to get the power set of the bit set. There is already an answer here about how to get the power set of a Set<T>. Here, I will show a modified version of the algorithm shown in that post, using BitSets:
private static Set<BitSet> powerset(BitSet set) {
Set<BitSet> sets = new HashSet<>();
if (set.isEmpty()) {
sets.add(new BitSet(0));
return sets;
}
int head = set.nextSetBit(0); // lowest set bit; the set is known to be non-empty here
BitSet rest = set.get(0, set.size());
rest.clear(head);
for (BitSet s : powerset(rest)) {
BitSet newSet = s.get(0, s.size());
newSet.set(head);
sets.add(newSet);
sets.add(s);
}
return sets;
}

You can perform the operation in a single linear pass instead of recursion, if you realize the integer numbers are a computer’s intrinsic variant of “on off” patterns and iterating over the appropriate integer range will ultimately produce all possible permutations. The only challenge in your case, is to transfer the densely packed bits of an integer number to the target bits of a BitSet.
Here is such a solution:
static List<BitSet> powerset(BitSet set) {
int nBits = set.cardinality();
if(nBits > 30) throw new OutOfMemoryError(
"Not enough memory for "+BigInteger.ONE.shiftLeft(nBits)+" BitSets");
int max = 1 << nBits;
int[] targetBits = set.stream().toArray();
List<BitSet> sets = new ArrayList<>(max);
for(int onOff = 0; onOff < max; onOff++) {
BitSet next = new BitSet(set.size());
for(int bitsToSet = onOff, ix = 0; bitsToSet != 0; ix++, bitsToSet>>>=1) {
if((bitsToSet & 1) == 0) {
int skip = Integer.numberOfTrailingZeros(bitsToSet);
ix += skip;
bitsToSet >>>= skip;
}
next.set(targetBits[ix]);
}
sets.add(next);
}
return sets;
}
It uses an int value for the iteration, which is already enough to represent all permutations that can ever be stored in one of Java’s builtin collections. If your source BitSet has 32 one bits, the 2³² possible combinations do not only require a hundred GB heap, but also a collection supporting 2³² elements, i.e. a size not representable as int.
So the code above terminates early if the number exceeds the capabilities, without even trying. You could rewrite it to use a long or even BigInteger instead, to keep it busy in such cases, until it will fail with an OutOfMemoryError anyway.
For the working cases, the int solution is the most efficient variant.
Note that the code returns a List rather than a HashSet to avoid the costs of hashing. The values are already known to be unique and hashing would only pay off if you want to perform lookups, i.e. call contains with another BitSet. But to test whether an existing BitSet is a permutation of your input BitSet, you wouldn’t even need to generate all permutations, a simple bit operation, e.g. andNot would tell you that already. So for storing and iterating the permutations, an ArrayList is more efficient.
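The andNot check mentioned above could look roughly like this. The helper name isSubsetOf is made up for the example. It works on a copy because andNot modifies the BitSet it is called on.

```java
import java.util.BitSet;

final class BitSetSubsets {
    /** Returns true if every bit set in candidate is also set in source. */
    static boolean isSubsetOf(BitSet candidate, BitSet source) {
        BitSet extra = (BitSet) candidate.clone(); // andNot mutates its receiver
        extra.andNot(source);                      // keeps only bits that source lacks
        return extra.isEmpty();
    }
}
```

For the source 1010, the candidates 1000 and 0000 pass the check, while 0100 does not.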

Byte to "Bit"array

A byte is the smallest numeric datatype Java offers, but yesterday I came into contact with byte streams for the first time. At the beginning of every packet a marker byte is sent that gives further instructions on how to handle the packet. Every bit of the byte has a specific meaning, so I need to split the byte into its 8 bits.
You could probably convert the byte to a boolean array or create a switch for every case, but that certainly can't be the best practice.
How is this possible in Java, and why are there no bit datatypes in Java?

Because no bit data type exists on the physical computer. The smallest unit you can allocate on most modern computers is a byte, also known as an octet or 8 bits. When you display a single bit, you are really just pulling that bit out of the byte with arithmetic and putting it into a new byte, which still uses 8 bits of space. You can put bit data inside a byte, but it will be stored as at least a single byte no matter what programming language you use.

You could load the byte into a BitSet. This abstraction hides the gory details of manipulating single bits.
import java.util.BitSet;
public class Bits {
public static void main(String[] args) {
byte[] b = new byte[]{10};
BitSet bitset = BitSet.valueOf(b);
System.out.println("Length of bitset = " + bitset.length());
for (int i=0; i<bitset.length(); ++i) {
System.out.println("bit " + i + ": " + bitset.get(i));
}
}
}
$ java Bits
Length of bitset = 4
bit 0: false
bit 1: true
bit 2: false
bit 3: true
You can ask for any bit, but the length tells you that all the bits past length() - 1 are set to 0 (false):
System.out.println("bit 75: " + bitset.get(75));
bit 75: false

Have a look at java.util.BitSet.
You might use it to interpret the byte read and can use the get method to check whether a specific bit is set like this:
byte b = (byte) stream.read(); // read() returns an int (-1 at end of stream), so a cast is needed
final BitSet bitSet = BitSet.valueOf(new byte[]{b});
if (bitSet.get(2)) {
state.activateComponentA();
} else {
state.deactivateComponentA();
}
state.setFeatureBTo(bitSet.get(1));
On the other hand, you can create your own bitmask easily and convert it to a byte array (or just byte) afterwards:
final BitSet output = BitSet.valueOf(ByteBuffer.allocate(1));
output.set(3, state.isComponentXActivated());
if (state.isY){
output.set(4);
}
final byte[] bytes = output.toByteArray(); // empty when no bit is set
final byte w = bytes.length == 0 ? 0 : bytes[0];

How is this possible in Java, and why are there no bit datatypes in Java?
There are no bit data types in most languages, and most CPU instruction sets have few (if any) instructions dedicated to addressing single bits. You can think of the lack of these as a trade-off between (language or CPU) complexity and need.
Manipulating a single bit can be thought of as a special case of manipulating multiple bits, and languages as well as CPUs are equipped for the latter.
Very common operations like testing, setting, clearing, inverting as well as exclusive or are all supported on the integer primitive types (byte, short/char, int, long), operating on all bits of the type at once. By choosing the parameters appropriately you can select which bits to operate on.
If you think about it, a byte array is a bit array where the bits are grouped in packages of 8. Addressing a single bit in the array is relatively simple using logical operators (AND &, OR |, XOR ^ and NOT ~).
For example, testing if bit N is set in a byte can be done using a logical AND with a mask where only the bit to be tested is set:
public boolean testBit(byte b, int n) {
int mask = 1 << n; // equivalent of 2 to the nth power
return (b & mask) != 0;
}
Extending this to a byte array is no magic either, each byte consists of 8 bits, so the byte index is simply the bit number divided by 8, and the bit number inside that byte is the remainder (modulo 8):
public boolean testBit(byte[] array, int n) {
int index = n >>> 3; // divide by 8
int mask = 1 << (n & 7); // n modulo 8
return (array[index] & mask) != 0;
}
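Setting, clearing and inverting a bit in a byte array follow the same index and mask arithmetic. The helpers below are a sketch with made-up names, using the same layout as testBit above (bit n lives in byte n / 8 at position n % 8).

```java
final class ByteArrayBits {
    /** Sets bit n to 1 using OR with a single-bit mask. */
    static void setBit(byte[] array, int n) {
        array[n >>> 3] |= (byte) (1 << (n & 7));
    }

    /** Sets bit n to 0 using AND with the inverted mask. */
    static void clearBit(byte[] array, int n) {
        array[n >>> 3] &= (byte) ~(1 << (n & 7));
    }

    /** Inverts bit n using XOR with a single-bit mask. */
    static void flipBit(byte[] array, int n) {
        array[n >>> 3] ^= (byte) (1 << (n & 7));
    }
}
```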

Here is a sample, I hope useful for you!
DatagramSocket socket = new DatagramSocket(6160, InetAddress.getByName("0.0.0.0"));
socket.setBroadcast(true);
while (true) {
byte[] recvBuf = new byte[26];
DatagramPacket packet = new DatagramPacket(recvBuf, recvBuf.length);
socket.receive(packet);
String bitArray = toBitArray(recvBuf);
System.out.println(Integer.parseInt(bitArray.substring(0, 8), 2)); // convert first byte binary to decimal
System.out.println(Integer.parseInt(bitArray.substring(8, 16), 2)); // convert second byte binary to decimal
}
public static String toBitArray(byte[] byteArray) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < byteArray.length; i++) {
sb.append(String.format("%8s", Integer.toBinaryString(byteArray[i] & 0xFF)).replace(' ', '0'));
}
return sb.toString();
}

Convert a String to array of bits

I would like to convert a String consisting of 0's and 1's to an array of bits.
The String is of length ~30000 and is sparse (mostly 0s, few 1s)
For example, given a string
"00000000100000000010000100000000001000"
I would like to convert it to an array of bits which will store
[00000000100000000010000100000000001000]
I am thinking of using BitSet or OpenBitSet
Is there a better way? The use case is to perform logical OR efficiently.
I am thinking along these lines
final OpenBitSet logicalOrResult = new OpenBitSet();
for (final String line : lines) {
    final OpenBitSet myBitArray = new OpenBitSet(line.length());
    int pos = 0;
    for (final char c : line.toCharArray()) {
        if (c == '1') {
            myBitArray.set(pos); // only '1' characters set a bit
        }
        pos++;
    }
    logicalOrResult.or(myBitArray);
}

BigInteger can parse it and store it, and do bitwise operations:
BigInteger x = new BigInteger(bitString, 2);
BigInteger y = new BigInteger(otherBitString, 2);
x = x.or(y);
System.out.println(x.toString(2));

A BitSet ranging over values between 0 and 30000 requires a long array of size less than 500, so you can assume that BitSet.or (or the respective OpenBitSet method) will be sufficiently fast, despite the sparsity. It looks like OpenBitSet has better performance than BitSet, but apart from this it doesn't really matter which you use, both will implement or efficiently. However, be sure to pass the length of the String to the (Open)BitSet constructor to avoid reallocations of the internal long array during construction!
If your strings are much longer and your sparsity is extreme, you could also consider storing them as a sorted list of Integers (or ints, if you use a library like Trove), representing the indices which contain a 1. A bitwise or can be implemented in a merge(sort)-like fashion, which is quite efficient (time O(n + m), where n, m are the numbers of ones in each string). I suspect that in your scenario it will be slower than the BitSet approach though.
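A possible sketch of the BitSet approach with java.util.BitSet, presized from the string length as suggested. The class and method names are invented, and the sketch assumes that bit index i corresponds to the character at position i in the string. Because the strings are sparse, it jumps from one '1' to the next with indexOf instead of inspecting every character in Java code.

```java
import java.util.BitSet;

final class BitStrings {
    /** Parses a string of '0' and '1' characters; bit i mirrors character i. */
    static BitSet parse(String bits) {
        BitSet result = new BitSet(bits.length()); // presize to avoid reallocation
        for (int i = bits.indexOf('1'); i >= 0; i = bits.indexOf('1', i + 1)) {
            result.set(i);
        }
        return result;
    }

    /** ORs all lines together into a single BitSet. */
    static BitSet orAll(Iterable<String> lines) {
        BitSet combined = new BitSet();
        for (String line : lines) {
            combined.or(parse(line));
        }
        return combined;
    }
}
```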

You can iterate through each character:
boolean[] bits = new boolean[str.length()];
for (int i = 0; i < str.length(); i++) {
    // array elements default to false, so only '1' needs to be recorded
    bits[i] = str.charAt(i) == '1';
}
If you want to be memory efficient, you could try RLE (Run Length Encoding).

Efficient way of altering data in an array with threads

I've been trying to figure out the most efficient way for many threads to alter a very big byte array at the bit level. For ease of explanation, I'll base the question on a multithreaded Sieve of Eratosthenes. The code should not be expected to be complete, as I'll omit parts that aren't directly related. The sieve also won't be fully optimised, as that's not the direct question. The sieve saves which values are primes in a byte array, where each byte contains 7 numbers (we can't alter the first bit because all types are signed).
Let's say our goal is to find all the primes below 1 000 000 000 (1 billion). As a result, we would need a byte array of length 1 000 000 000 / 7 + 1, or 142 857 143 (about 143 million).
class Prime {
int max = 1000000000;
byte[] b = new byte[(max/7)+1];
Prime() {
for(int i = 0; i < b.length; i++) {
b[i] = (byte)127; //Setting all values to 1 at start
}
findPrimes();
}
/*
* Calling remove will set the bit value associated with the number
* to 0 signaling that isn't an prime
*/
void remove(int i) {
int j = i/7; //gets which array index to access
b[j] = (byte) (b[j] & ~(1 << (i%7)));
}
void findPrimes() {
remove(1); //1 is not a prime and we wanna remove it from the start
int prime = 2;
while (prime*prime < max) {
for(int i = prime*2; i < max; i = prime + i) {
remove(i);
}
prime = nextPrime(prime); //This returns the next prime from the list
}
}
... //Omitting code, not relevant to question
}
Now we have a basic outline in which something runs through all numbers in a certain multiplication table and calls remove to set the bit that belongs to each number to 0 once we know it isn't a prime.
Now, to up the ante, we create threads that do the checking for us. We split the work so that each thread takes part of the removal from the table. For example, with 4 threads running through the multiplication table for the prime 2, we would assign thread 1 everything in the 8 times table with a starting offset of 2, that is 4, 10, 18, ...; the second thread gets an offset of 4, so it goes through 6, 14, 22, ..., and so on. Each thread then calls remove on its own numbers.
Now to the real question. As most can see, while the prime is less than 7, multiple threads will access the same array index. While running through 2, for example, thread 1, thread 2 and thread 3 will all try to access b[0] to alter the byte, which causes a race condition that we don't want.
The question therefore is: what's the best way of optimising access to the byte array?
So far the thoughts I've had for it are:
Putting synchronized on the remove method. This would obviously be very easy to implement, but it is a horrible idea because it would remove any gain from having threads.
Creating a mutex array equal in size to the byte array. To enter an index, a thread would need the mutex at the same index. This would be fairly fast but would require another very big array in memory, which might not be the best way to do it.
Limiting the numbers stored in each byte to the prime number we start running on. So if we start on 2, we would have 2 numbers per byte. This would, however, increase our array length to 500 000 000 (500 million).
Are there other ways of doing this in a fast and optimal way without overusing the memory?
(This is my first question here, so I tried to be as detailed and thorough as possible, but I would accept any comments on how I can improve the question: too much detail, needs more detail, etc.)

You can use an array of atomic integers for this. Unfortunately there isn't a getAndAND, which would be ideal for your remove() function, but you can CAS in a loop:
java.util.concurrent.atomic.AtomicIntegerArray aia;
....
void remove(int i) {
    int j = i / 32; // index of the int that holds bit i
    boolean updated; // declared outside the loop so the while condition can see it
    do {
        int oldVal = aia.get(j);
        int newVal = oldVal & ~(1 << (i % 32));
        updated = aia.weakCompareAndSet(j, oldVal, newVal);
    } while (!updated);
}
Basically you keep trying to adjust the slot to remove that bit, but you only succeed if nobody else modifies it out from under you. Safe, and likely to be very efficient. weakCompareAndSet is basically an abstracted load-link/store-conditional instruction.
BTW, there's no reason not to use the sign bit.
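For completeness, here is a small self-contained sketch of this approach that can be run with several threads. On Java 8 and later, AtomicIntegerArray.getAndUpdate wraps a retry loop of this kind, so the sketch uses it instead of a hand-written loop. AtomicBitField and sieveConcurrently are hypothetical names, and using one thread per prime is only a simplification for the demonstration.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

final class AtomicBitField {
    private final AtomicIntegerArray words;

    AtomicBitField(int bitCount) {
        words = new AtomicIntegerArray((bitCount + 31) / 32);
        for (int w = 0; w < words.length(); w++) {
            words.set(w, -1); // start with all 32 bits set, sign bit included
        }
    }

    /** Atomically clears bit i; concurrent updates to the same word are retried. */
    void remove(int i) {
        int mask = ~(1 << (i % 32));
        words.getAndUpdate(i / 32, old -> old & mask);
    }

    boolean isSet(int i) {
        return (words.get(i / 32) & (1 << (i % 32))) != 0;
    }

    /** Demo: one thread per prime clears that prime's multiples, starting at 2 * prime. */
    static AtomicBitField sieveConcurrently(int bitCount, int... primes) {
        AtomicBitField field = new AtomicBitField(bitCount);
        Thread[] workers = new Thread[primes.length];
        for (int t = 0; t < primes.length; t++) {
            final int prime = primes[t];
            workers[t] = new Thread(() -> {
                for (int i = 2 * prime; i < bitCount; i += prime) {
                    field.remove(i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return field;
    }
}
```

Calling AtomicBitField.sieveConcurrently(100, 2, 3, 5, 7) leaves exactly the primes below 100 set among the indices from 2 upwards, even though the threads frequently touch the same word.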

I think you could avoid synchronizing threads...
For example, this task:
for(int i = prime*2; i < max; i = prime + i) {
remove(i);
}
it could be partitioned into small tasks:
int totalPos = max / 8; // number of bytes in the virtual array
int partitionSize = totalPos / thread_pool; // bytes handled by each thread
for (int i = 0; i < thread_pool; i++) {
    removeAll(prime, partitionSize * i * 8, (i + 1) * partitionSize * 8);
}
....
// no collisions, provided remove() packs 8 numbers per byte so every range ends on a byte boundary
void removeAll(int prime, int initial, int max) {
    int k = (initial + prime - 1) / prime; // first multiple at or after initial
    if (k < 2) k = 2;
    for (int i = k * prime; i < max; i = i + prime) {
        remove(i);
    }
}
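A runnable sketch of this partitioning idea follows. It assumes 8 numbers per byte (unlike the 7 used in the question), plain threads instead of a pool, and a bit count well below 2^31 so the int arithmetic cannot overflow. All names are made up. Each thread owns a contiguous block of bytes, so no two threads ever write the same array element, and join() makes the writes visible afterwards.

```java
final class PartitionedRemoval {
    /** Clears bit i, assuming 8 numbers per byte (bit i lives in bits[i / 8]). */
    static void remove(byte[] bits, int i) {
        bits[i >>> 3] &= (byte) ~(1 << (i & 7));
    }

    /** Clears every multiple of prime in [from, to), never touching prime itself. */
    static void removeAll(byte[] bits, int prime, int from, int to) {
        int k = Math.max(2, (from + prime - 1) / prime); // first multiple at or after from
        for (int i = k * prime; i < to; i += prime) {
            remove(bits, i);
        }
    }

    /** Splits the array into byte-aligned blocks and clears multiples in parallel. */
    static void removeMultiples(byte[] bits, int prime, int threadCount) {
        int bytesPerThread = (bits.length + threadCount - 1) / threadCount;
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            int from = Math.min(bits.length, t * bytesPerThread) * 8;
            int to = Math.min(bits.length, (t + 1) * bytesPerThread) * 8;
            workers[t] = new Thread(() -> removeAll(bits, prime, from, to));
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }
}
```

Rounding the start of each range up to the next multiple matters: rounding down could produce a multiple that lies in the previous thread's block.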

Current best way to populate mixed type byte array

I'm trying to send and receive a byte stream in which certain ranges of bytes represent different pieces of data. I've found ways to convert single primitive datatypes into bytes, but I'm wondering if there's a straightforward way to place certain pieces of data into specified byte regions.
For example, I might need to produce or read something like the following:
byte 1 - int
byte 2-5 - int
byte 6-13 - double
byte 14-21 - double
byte 25 - int
byte 26-45 - string
Any suggestions would be appreciated.

Try DataOutputStream/DataInputStream or, for arrays, the ByteBuffer class.
For storing the integer in X bytes, you may use the following method. If you think it is badly named, you may use the much less descriptive i2os name which is used in several (crypto) algorithm descriptions. Note that the returned octet string uses Big Endian encoding of unsigned ints, which you should specify for your protocol.
public static byte[] positiveIntegerToOctetString(
final long value, final int octets) {
if (value < 0) {
throw new IllegalArgumentException("Cannot encode negative values");
}
if (octets < 1) {
throw new IllegalArgumentException("Cannot encode a number in negative or zero octets");
}
final int longSizeBytes = Long.SIZE / Byte.SIZE;
final int byteBufferSize = Math.max(octets, longSizeBytes);
final ByteBuffer buf = ByteBuffer.allocate(byteBufferSize);
for (int i = 0; i < byteBufferSize - longSizeBytes; i++) {
buf.put((byte) 0x00);
}
buf.mark();
buf.putLong(value);
// more bytes than long encoding
if (octets >= longSizeBytes) {
return buf.array();
}
// less bytes than long encoding (reset to mark first)
buf.reset();
for (int i = 0; i < longSizeBytes - octets; i++) {
if (buf.get() != 0x00) {
throw new IllegalArgumentException("Value does not fit in " + octets + " octet(s)");
}
}
final byte[] result = new byte[octets];
buf.get(result);
return result;
}
EDIT: before storing the string, decide on a padding mechanism (spaces are the most common choice) and a character encoding, e.g. String.getBytes(Charset.forName("ASCII")) or "Latin-1". Those are the most common encodings with a single byte per character. Calculating the size of "UTF-8" is slightly more difficult (encode first, then add 0x20 valued bytes at the end using ByteBuffer).

You may want to consider having a constant size for each data type. For example, the 32-bit Java int will take up 4 bytes, a long will take 8, etc. In fact, if you use Java's DataInputStream and DataOutputStream, you'll basically be doing that anyway. They have really nice methods like readInt/writeInt, etc.
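As a rough sketch of the fixed-size approach with ByteBuffer, the layout from the question could be written as follows. Several things here are assumptions rather than anything stated in the question: the field names, big-endian byte order (ByteBuffer's default), ASCII text padded with spaces, and bytes 22-24 being left as zero because the question's layout does not assign them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RecordCodec {
    static final int RECORD_SIZE = 45;
    static final int TEXT_SIZE = 20;

    /** Packs the fields at the fixed offsets of the layout (1-based byte numbers in comments). */
    static byte[] encode(int kind, int id, double x, double y, int flag, String text) {
        ByteBuffer buf = ByteBuffer.allocate(RECORD_SIZE); // big-endian by default
        buf.put((byte) kind);  // byte 1
        buf.putInt(id);        // bytes 2-5
        buf.putDouble(x);      // bytes 6-13
        buf.putDouble(y);      // bytes 14-21
        buf.position(24);      // skip bytes 22-24, which stay zero
        buf.put((byte) flag);  // byte 25
        byte[] raw = text.getBytes(StandardCharsets.US_ASCII);
        byte[] padded = new byte[TEXT_SIZE];
        Arrays.fill(padded, (byte) ' '); // pad with spaces, truncate if too long
        System.arraycopy(raw, 0, padded, 0, Math.min(raw.length, TEXT_SIZE));
        buf.put(padded);       // bytes 26-45
        return buf.array();
    }

    static int decodeId(byte[] record) {
        return ByteBuffer.wrap(record).getInt(1); // absolute read at 0-based offset 1
    }

    static double decodeX(byte[] record) {
        return ByteBuffer.wrap(record).getDouble(5);
    }

    static String decodeText(byte[] record) {
        return new String(record, 25, TEXT_SIZE, StandardCharsets.US_ASCII).trim();
    }
}
```

The absolute getInt(index) and getDouble(index) reads make it easy to pull a single field out of a received record without decoding everything before it.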
