Convert a String to array of bits - java

I would like to convert a String consisting of 0's and 1's to an array of bits.
The String is of length ~30000 and is sparse (mostly 0s, few 1s)
For example, given a string
"00000000100000000010000100000000001000"
I would like to convert it to an array of bits which will store
[00000000100000000010000100000000001000]
I am thinking of using BitSet or OpenBitSet.
Is there a better way? The use case is to perform logical OR efficiently.
I am thinking along these lines:
final OpenBitSet logicalOrResult = new OpenBitSet();
for (final String line : lines) {
    final OpenBitSet myBitArray = new OpenBitSet();
    int pos = 0;
    for (final char c : line.toCharArray()) {
        if (c == '1') {
            myBitArray.set(pos); // only '1' characters set a bit
        }
        pos++;
    }
    logicalOrResult.or(myBitArray);
}

BigInteger can parse it and store it, and do bitwise operations:
BigInteger x = new BigInteger(bitString, 2);
BigInteger y = new BigInteger(otherBitString, 2);
x = x.or(y);
System.out.println(x.toString(2));

A BitSet ranging over values between 0 and 30000 requires a long array of size less than 500, so you can assume that BitSet.or (or the respective OpenBitSet method) will be sufficiently fast, despite the sparsity. It looks like OpenBitSet has better performance than BitSet, but apart from this it doesn't really matter which you use, both will implement or efficiently. However, be sure to pass the length of the String to the (Open)BitSet constructor to avoid reallocations of the internal long array during construction!
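A minimal sketch of the pre-sized approach with java.util.BitSet, assuming character i of the string maps to bit i and the input contains only '0' and '1'. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

class BitStringOr {
    // Parses a string of '0'/'1' characters; character i becomes bit i.
    static BitSet parse(String bits) {
        // Passing the length up front avoids regrowing the internal long[].
        BitSet result = new BitSet(bits.length());
        // indexOf locates the next '1', so only set bits touch the BitSet.
        for (int i = bits.indexOf('1'); i >= 0; i = bits.indexOf('1', i + 1)) {
            result.set(i);
        }
        return result;
    }

    // ORs all lines together into one accumulator.
    static BitSet orAll(List<String> lines) {
        BitSet acc = new BitSet();
        for (String line : lines) {
            acc.or(parse(line));
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(orAll(Arrays.asList("0010", "1000", "0010"))); // {0, 2}
    }
}
```

The printed form lists the indices of the set bits rather than the 0/1 pattern.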
If your strings are much longer and your sparsity is extreme, you could also consider storing them as a sorted list of Integers (or ints, if you use a library like Trove), representing the indices which contain a 1. A bitwise or can be implemented in a merge(sort)-like fashion, which is quite efficient (time O(n + m), where n, m are the numbers of ones in each string). I suspect that in your scenario it will be slower than the BitSet approach though.
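The merge-like OR on sorted index lists could look roughly like this. It is only a sketch, assuming each input is a strictly increasing int[] of the positions that hold a 1; the names are hypothetical.

```java
import java.util.Arrays;

class SparseOr {
    // a and b are strictly increasing arrays of indices whose bit is 1.
    // Returns the sorted union in O(a.length + b.length) time.
    static int[] or(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                out[k++] = a[i++];
            } else if (a[i] > b[j]) {
                out[k++] = b[j++];
            } else {            // same index in both: emit it once
                out[k++] = a[i++];
                j++;
            }
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return Arrays.copyOf(out, k); // trim unused tail
    }

    public static void main(String[] args) {
        int[] result = or(new int[] {1, 5, 9}, new int[] {5, 7});
        System.out.println(Arrays.toString(result)); // [1, 5, 7, 9]
    }
}
```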

You can iterate through each character:
boolean[] bits = new boolean[str.length()];
for (int i = 0; i < str.length(); i++) {
    // charAt returns a primitive char, so compare with == against a char literal
    bits[i] = str.charAt(i) == '1';
}
If you want to be memory efficient, you could try RLE (Run Length Encoding).
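One possible shape for the RLE idea, as a sketch only: store the lengths of alternating runs, always starting with a run of zeros (which may be empty). It assumes the string contains nothing but '0' and '1'; the class name and the encoding convention are my own choices.

```java
import java.util.ArrayList;
import java.util.List;

class RunLength {
    // Returns run lengths alternating between runs of '0' and runs of '1'.
    // The first entry is always the leading '0' run and may be 0.
    static List<Integer> encode(String bits) {
        List<Integer> runs = new ArrayList<>();
        char current = '0';
        int length = 0;
        for (int i = 0; i < bits.length(); i++) {
            char c = bits.charAt(i);
            if (c == current) {
                length++;
            } else {
                runs.add(length); // close the previous run
                current = c;
                length = 1;
            }
        }
        runs.add(length); // close the final run
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(encode("0001100")); // [3, 2, 2]
    }
}
```

For a sparse string this keeps roughly two numbers per 1 bit instead of one entry per character.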

Related

What is the space complexity of bitset in this scenario

I am doing a leetcode problem where I have to find the duplicate of an array of size [1-N] inclusive and came upon this solution:
public int findDuplicate(int[] nums) {
BitSet bit = new BitSet();
for(int num : nums) {
if(!bit.get(num)) {
bit.set(num);
} else {
return num;
}
}
return -1;
}
The use of BitSet here, I'm assuming, is similar to using a boolean[] to keep track of whether we saw the current number previously. So my question is: what is the space complexity of this? The runtime seems to be O(n), where n is the size of the input array. Would the same be true for the space complexity?
Link to problem : https://leetcode.com/problems/find-the-duplicate-number/
Your BitSet creates an underlying long[] to store the values. Reading the code of BitSet#set, I think it is safe to say that the array will never be larger than max(nums) / 64 * 2 = max(nums) / 32. Since long has a fixed size, this comes down to O(max(nums)). If nums contains large values, you can do better with a hash map.
I'm trying this out with simple code, and it seems to corroborate my reading of the code.
BitSet bitSet = new BitSet();
bitSet.set(100);
System.out.println(bitSet.toLongArray().length); // 2 (max(nums) / 32 = 3.125)
bitSet.set(64000);
System.out.println(bitSet.toLongArray().length); // 1001 (max(nums) / 32 = 2000)
bitSet.set(100_000);
System.out.println(bitSet.toLongArray().length); // 1563 (max(nums) / 32 = 3125)
Note that the factor of 2 I added is conservative; in general the factor will be smaller. That is why my formula consistently over-estimates the actual length of the long array, but never by more than a factor of 2. This is the code in BitSet that made me add it:
private void ensureCapacity(int wordsRequired) {
if (words.length < wordsRequired) {
// Allocate larger of doubled size or required size
int request = Math.max(2 * words.length, wordsRequired);
words = Arrays.copyOf(words, request);
sizeIsSticky = false;
}
}
In summary, I would say the bit set is only a good idea if you have reason to believe the values themselves are smaller than the number of values you have. For example, if you have only two values but they are over a billion in value, you will needlessly allocate an array of millions of elements.
Additionally, even in cases where values remain small, this solution does extra work for sorted arrays because BitSet#set has to reallocate and copy the array repeatedly as the highest index grows. Since ensureCapacity at least doubles the array each time, the total copying cost is amortized linear in max(nums) rather than quadratic, but it is still avoidable: first find the maximum, allocate the necessary length in the BitSet, and only then go through the array.
At this point, using a map is simpler and fits all situations. If speed really matters, my bet is that the BitSet will beat a map under specific conditions (lots of values, but small, and by pre-sizing the bit set as described).
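A sketch of the pre-sized variant: one pass to find the maximum, then one pass to detect the duplicate. It assumes all values are non-negative and well below Integer.MAX_VALUE; the class name is hypothetical.

```java
import java.util.BitSet;

class FindDuplicate {
    // Returns the first value seen twice, or -1 if there is none.
    static int findDuplicate(int[] nums) {
        // First pass: find the largest value so the BitSet never has to grow.
        int max = 0;
        for (int num : nums) {
            max = Math.max(max, num);
        }
        BitSet seen = new BitSet(max + 1);
        // Second pass: a bit that is already set marks a duplicate.
        for (int num : nums) {
            if (seen.get(num)) {
                return num;
            }
            seen.set(num);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(findDuplicate(new int[] {1, 3, 4, 2, 2})); // 2
    }
}
```

The space is still O(max(nums)) bits; only the reallocation during the scan goes away.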

Creating combinations of a BitSet

Assume I have a Java BitSet. I now need to make combinations of the BitSet such that only Bits which are Set can be flipped. i.e. only need combinations of Bits which are set.
For Eg. BitSet - 1010, Combinations - 1010, 1000, 0010, 0000
BitSet - 1100, Combination - 1100, 1000, 0100, 0000
I can think of a few solutions E.g. I can take combinations of all 4 bits and then XOR the combinations with the original Bitset. But this would be very resource-intensive for large sparse BitSets. So I was looking for a more elegant solution.
It appears that you want to get the power set of the bit set. There is already an answer here about how to get the power set of a Set<T>. Here, I will show a modified version of the algorithm shown in that post, using BitSets:
private static Set<BitSet> powerset(BitSet set) {
Set<BitSet> sets = new HashSet<>();
if (set.isEmpty()) {
sets.add(new BitSet(0));
return sets;
}
Integer head = set.nextSetBit(0);
BitSet rest = set.get(0, set.size());
rest.clear(head);
for (BitSet s : powerset(rest)) {
BitSet newSet = s.get(0, s.size());
newSet.set(head);
sets.add(newSet);
sets.add(s);
}
return sets;
}
You can perform the operation in a single linear pass instead of recursion, if you realize the integer numbers are a computer’s intrinsic variant of “on off” patterns and iterating over the appropriate integer range will ultimately produce all possible permutations. The only challenge in your case, is to transfer the densely packed bits of an integer number to the target bits of a BitSet.
Here is such a solution:
static List<BitSet> powerset(BitSet set) {
int nBits = set.cardinality();
if(nBits > 30) throw new OutOfMemoryError(
"Not enough memory for "+BigInteger.ONE.shiftLeft(nBits)+" BitSets");
int max = 1 << nBits;
int[] targetBits = set.stream().toArray();
List<BitSet> sets = new ArrayList<>(max);
for(int onOff = 0; onOff < max; onOff++) {
BitSet next = new BitSet(set.size());
for(int bitsToSet = onOff, ix = 0; bitsToSet != 0; ix++, bitsToSet>>>=1) {
if((bitsToSet & 1) == 0) {
int skip = Integer.numberOfTrailingZeros(bitsToSet);
ix += skip;
bitsToSet >>>= skip;
}
next.set(targetBits[ix]);
}
sets.add(next);
}
return sets;
}
It uses an int value for the iteration, which is already enough to represent all permutations that can ever be stored in one of Java’s builtin collections. If your source BitSet has 31 one bits, the 2³¹ possible combinations would not only require a hundred GB of heap, but also a collection supporting 2³¹ elements, i.e. a size not representable as int.
So the code above terminates early if the number exceeds the capabilities, without even trying. You could rewrite it to use a long or even BigInteger instead, to keep it busy in such cases, until it will fail with an OutOfMemoryError anyway.
For the working cases, the int solution is the most efficient variant.
Note that the code returns a List rather than a HashSet to avoid the costs of hashing. The values are already known to be unique and hashing would only pay off if you want to perform lookups, i.e. call contains with another BitSet. But to test whether an existing BitSet is a permutation of your input BitSet, you wouldn’t even need to generate all permutations, a simple bit operation, e.g. andNot would tell you that already. So for storing and iterating the permutations, an ArrayList is more efficient.
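The andNot check mentioned above could be sketched like this: a candidate belongs to the generated combinations exactly when it has no bit set outside the source. The helper name is made up.

```java
import java.util.BitSet;

class SubsetCheck {
    // True if every bit set in candidate is also set in source.
    static boolean isSubset(BitSet candidate, BitSet source) {
        BitSet extra = (BitSet) candidate.clone(); // do not modify the argument
        extra.andNot(source);                      // clears every bit that source has
        return extra.isEmpty();                    // anything left lies outside source
    }

    public static void main(String[] args) {
        BitSet source = new BitSet();
        source.set(1);
        source.set(3);
        BitSet candidate = new BitSet();
        candidate.set(3);
        System.out.println(isSubset(candidate, source)); // true
    }
}
```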

Using an array of precomputed powers to extract roots

I am writing a program that solves a sum of tenth powers problem and I need to have a fast algorithm to find n^10 as well as n^(1/10) For natural n<1 000 000. I am precomputing an array of powers, so n^10 (array lookup) takes O(1). For n^(1/10) I am doing a binary search. Is there any way to accelerate extraction of a root beyond that? For example, making an array and filling elements with corresponding roots if the index is a perfect power or leaving zero otherwise would give O(1), but I will run out of memory. Is there a way to make root extraction faster than O(log(n))?
Why should the array of roots run out of memory? If it is the same size as the array of powers, it will fit using the same datatypes. However, for the powers, (10^6)^10 = 10^60, which does not fit into a long variable, so you need to use BigInteger or BigDecimal types. In case your number n is bigger than the biggest array size n_max your memory can afford, you can divide n by n_max until it fits, i.e. split n = n_max^m*k, where m is a natural number and k < n_max:
public class Roots
{
static final int N_MAX = 1_000_000;
double[] roots = new double[N_MAX+1];
Roots() {for (int i = 0; i <= N_MAX; i++) {roots[i] = Math.pow(i, 0.1);}}
double root(long n)
{
int m = 0;
while (n > N_MAX)
{
n /= N_MAX;
m++;
}
return (Math.pow(roots[N_MAX],m)*roots[(int)n]); // in a real case you would precompute pow(roots[N_MAX],m) as well
}
static public void main(String[] args)
{
Roots root = new Roots();
System.out.println(root.root(1000));
System.out.println(root.root(100_000_000_000_000L));
}
}
Apart from a LUT, I can think of two options to speed this up:
1. Use binary search without multiplication
If you are using bignums, then the 10th-root binary search is no longer O(log(n)), because the basic operations used in it are no longer O(1). For example +,-,<<,>>,|,&,^,>=,<=,>,<,==,!= will become O(b), and * will be O(b^2) or O(b.log(b)), where b=log(n), depending on the algorithm used (or even the operand magnitude). So a naive binary search for root finding will be, in the better case, O(log^2(n).log(log(n))).
To speed it up, you can try not to use multiplication. Yes, it is possible, and the final complexity will be O(log^2(n)). Take a look at:
How to get a square root for 32 bit input in one clock cycle only?
to see how to achieve this. The only difference is that you are solving different equations:
x1 = x0+m
x1^10 = f(x0,m)
If you obtain x1^10 = f(x0,m) algebraically, then each multiplication inside translates to bit-shifts and adds. For example 10*x = (x<<1) + (x<<3). The LUT is not necessary, as you can iterate it during the binary search.
I imagine that f(x0,m) will contain lower powers of x0, so compute all the needed powers analogously, so that the final result involves no powering. Sorry, too lazy to do that for you; you can use some math app for that, like Derive for Windows.
2. Use pow(x,y) = x^y = exp2(y*log2(x))
So x^0.1 = exp2(log2(x)/10). But you would need bigdecimals for this (or fixed point). See how I do it here:
How can I write a power function myself?
For more ideas see this:
Power by squaring for negative exponents
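A related shortcut, sketched under the assumption that n fits comfortably in the range of a double (true for n up to 10^60): let Math.pow give a floating-point estimate of the root, then correct it exactly with BigInteger. This is a swapped-in technique, not the fixed-point approach from the links above, and the class and method names are made up.

```java
import java.math.BigInteger;

class TenthRoot {
    // Returns floor(n^(1/10)) for n >= 0.
    // Assumes n is small enough that n.doubleValue() is finite.
    static long floorTenthRoot(BigInteger n) {
        if (n.signum() < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        // Floating-point estimate; may be off by one because of rounding.
        long r = Math.round(Math.pow(n.doubleValue(), 0.1));
        // Exact correction: step down while r^10 > n, then up while (r+1)^10 <= n.
        while (r > 0 && BigInteger.valueOf(r).pow(10).compareTo(n) > 0) {
            r--;
        }
        while (BigInteger.valueOf(r + 1).pow(10).compareTo(n) <= 0) {
            r++;
        }
        return r;
    }

    // n is a perfect tenth power exactly when the floor root reproduces it.
    static boolean isPerfectTenthPower(BigInteger n) {
        return BigInteger.valueOf(floorTenthRoot(n)).pow(10).equals(n);
    }

    public static void main(String[] args) {
        System.out.println(floorTenthRoot(BigInteger.valueOf(1024))); // 2
    }
}
```

The correction loops normally run zero or one iteration, so the cost is dominated by a couple of BigInteger.pow calls.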

Byte to "Bit"array

A byte is the smallest numeric datatype Java offers, but yesterday I came in contact with byte streams for the first time. At the beginning of every packet a marker byte is sent which gives further instructions on how to handle the packet. Every bit of the byte has a specific meaning, so I need to split the byte into its 8 bits.
You could probably convert the byte to a boolean array or create a switch for every case, but that certainly can't be the best practice.
How is this possible in Java? Why are there no bit datatypes in Java?
Because there is no bit data type that exists on the physical computer. The smallest allotment you can allocate on most modern computers is a byte, which is also known as an octet or 8 bits. When you display a single bit you are really just pulling that bit out of the byte with arithmetic and adding it to a new byte, which still uses an 8-bit space. If you want to put bit data inside a byte you can, but it will be stored as at least a single byte no matter what programming language you use.
You could load the byte into a BitSet. This abstraction hides the gory details of manipulating single bits.
import java.util.BitSet;
public class Bits {
public static void main(String[] args) {
byte[] b = new byte[]{10};
BitSet bitset = BitSet.valueOf(b);
System.out.println("Length of bitset = " + bitset.length());
for (int i=0; i<bitset.length(); ++i) {
System.out.println("bit " + i + ": " + bitset.get(i));
}
}
}
$ java Bits
Length of bitset = 4
bit 0: false
bit 1: true
bit 2: false
bit 3: true
You can ask for any bit, but the length tells you that all the bits past length() - 1 are set to 0 (false):
System.out.println("bit 75: " + bitset.get(75));
bit 75: false
Have a look at java.util.BitSet.
You might use it to interpret the byte read and can use the get method to check whether a specific bit is set like this:
byte b = (byte) stream.read(); // read() on an InputStream returns an int, so cast to byte
final BitSet bitSet = BitSet.valueOf(new byte[]{b});
if (bitSet.get(2)) {
state.activateComponentA();
} else {
state.deactivateComponentA();
}
state.setFeatureBTo(bitSet.get(1));
On the other hand, you can create your own bitmask easily and convert it to a byte array (or just byte) afterwards:
final BitSet output = BitSet.valueOf(ByteBuffer.allocate(1));
output.set(3, state.isComponentXActivated());
if (state.isY){
output.set(4);
}
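// toByteArray() returns an empty array when no bit is set, so guard the index
final byte[] bytes = output.toByteArray();
final byte w = bytes.length == 0 ? 0 : bytes[0];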
How is this possible in Java? Why are there no bit datatypes in Java?
There are no bit data types in most languages. And most CPU instruction sets have few (if any) instructions dedicated to addressing single bits. You can think of the lack of these as a trade-off between (language or CPU) complexity and need.
Manipulating a single bit can be thought of as a special case of manipulating multiple bits, and languages as well as CPUs are equipped for the latter.
Very common operations like testing, setting, clearing, inverting as well as exclusive or are all supported on the integer primitive types (byte, short/char, int, long), operating on all bits of the type at once. By choosing the parameters appropriately you can select which bits to operate on.
If you think about it, a byte array is a bit array where the bits are grouped in packages of 8. Addressing a single bit in the array is relatively simple using the bitwise operators (AND &, OR |, XOR ^ and NOT ~).
For example, testing if bit N is set in a byte can be done using a bitwise AND with a mask where only the bit to be tested is set:
public boolean testBit(byte b, int n) {
int mask = 1 << n; // equivalent of 2 to the nth power
return (b & mask) != 0;
}
Extending this to a byte array is no magic either, each byte consists of 8 bits, so the byte index is simply the bit number divided by 8, and the bit number inside that byte is the remainder (modulo 8):
public boolean testBit(byte[] array, int n) {
int index = n >>> 3; // divide by 8
int mask = 1 << (n & 7); // n modulo 8
return (array[index] & mask) != 0;
}
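The other operations mentioned above (setting, clearing, inverting) follow the same masking pattern. A minimal sketch for a single byte, with hypothetical helper names, assuming n is in the range 0..7:

```java
class ByteBits {
    // OR with the mask forces bit n to 1.
    static byte setBit(byte b, int n) {
        return (byte) (b | (1 << n));
    }

    // AND with the inverted mask forces bit n to 0.
    static byte clearBit(byte b, int n) {
        return (byte) (b & ~(1 << n));
    }

    // XOR with the mask flips bit n.
    static byte toggleBit(byte b, int n) {
        return (byte) (b ^ (1 << n));
    }

    public static void main(String[] args) {
        byte marker = 0b0000_1010;
        System.out.println(setBit(marker, 0));   // 11
        System.out.println(clearBit(marker, 1)); // 8
    }
}
```

The casts are needed because Java promotes byte operands to int before applying the bitwise operators.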
Here is a sample that I hope is useful for you!
DatagramSocket socket = new DatagramSocket(6160, InetAddress.getByName("0.0.0.0"));
socket.setBroadcast(true);
while (true) {
byte[] recvBuf = new byte[26];
DatagramPacket packet = new DatagramPacket(recvBuf, recvBuf.length);
socket.receive(packet);
String bitArray = toBitArray(recvBuf);
System.out.println(Integer.parseInt(bitArray.substring(0, 8), 2)); // convert first byte binary to decimal
System.out.println(Integer.parseInt(bitArray.substring(8, 16), 2)); // convert second byte binary to decimal
}
public static String toBitArray(byte[] byteArray) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < byteArray.length; i++) {
sb.append(String.format("%8s", Integer.toBinaryString(byteArray[i] & 0xFF)).replace(' ', '0'));
}
return sb.toString();
}

Improvement of Algorithm: Counting set bits in Byte-Arrays

We store knowledge in byte arrays as bits. Counting the number of set bits is pretty slow. Any suggestion to improve the algorithm is welcome:
public static int countSetBits(byte[] array) {
    int setBits = 0;
    if (array != null) {
        for (int byteIndex = 0; byteIndex < array.length; byteIndex++) {
            // a byte has 8 bits, so visit bit indices 0..7
            for (int bitIndex = 0; bitIndex < 8; bitIndex++) {
                if (getBit(bitIndex, array[byteIndex])) {
                    setBits++;
                }
            }
        }
    }
    return setBits;
}
public static boolean getBit(int index, final byte b) {
    byte t = setBit(index, (byte) 0);
    // compare with != 0: for index 7 the masked value is negative
    return (b & t) != 0;
}
public static byte setBit(int index, final byte b) {
    return (byte) ((1 << index) | b);
}
To count the bits of a byte array of length of 156'564 takes 300 ms, that's too much!
Try Integer.bitCount to obtain the number of bits set in each byte (mask each byte with 0xFF first, because a negative byte is sign-extended when it is widened to int). It will be more efficient if you can switch from a byte array to an int array. If this is not possible, you could also construct a look-up table for all 256 byte values to quickly look up the count rather than iterating over individual bits.
And if it's always the whole array's count you're interested in, you could wrap the array in a class that stores the count in a separate integer whenever the array changes. (edit: Or, indeed, as noted in comments, use java.util.BitSet.)
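A sketch of such a wrapper, assuming all writes go through its set method so the cached count never goes stale. The class name and API are invented for illustration.

```java
class CountedBits {
    private final byte[] data;
    private int count; // number of set bits, kept in sync by set()

    CountedBits(int bitLength) {
        data = new byte[(bitLength + 7) >>> 3]; // round up to whole bytes
    }

    boolean get(int n) {
        return (data[n >>> 3] & (1 << (n & 7))) != 0;
    }

    void set(int n, boolean value) {
        if (get(n) == value) {
            return; // nothing changes, so the count stays valid
        }
        data[n >>> 3] ^= (byte) (1 << (n & 7)); // flip the bit
        count += value ? 1 : -1;
    }

    // O(1): no scan over the array is needed.
    int cardinality() {
        return count;
    }

    public static void main(String[] args) {
        CountedBits bits = new CountedBits(20);
        bits.set(3, true);
        bits.set(15, true);
        System.out.println(bits.cardinality()); // 2
    }
}
```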
I would use the same global loop but instead of looping inside each byte I would simply use a (precomputed) array of size 256 mapping bytes to their bit count. That would probably be very efficient.
If you need even more speed, then you should separately maintain the count and increment it and decrement it when setting bits (but that would mean a big additional burden on those operations so I'm not sure it's applicable for you).
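The precomputed table could be built and used roughly as follows. This is a sketch with a hypothetical class name; the table is filled once in a static initializer.

```java
class BitCountTable {
    // COUNTS[v] holds the number of set bits in the byte value v (0..255).
    private static final byte[] COUNTS = new byte[256];

    static {
        // The count of i is the count of i without its lowest bit, plus that bit.
        for (int i = 1; i < 256; i++) {
            COUNTS[i] = (byte) (COUNTS[i >>> 1] + (i & 1));
        }
    }

    static int countSetBits(byte[] array) {
        if (array == null) {
            return 0;
        }
        int total = 0;
        for (byte b : array) {
            total += COUNTS[b & 0xFF]; // mask so negative bytes index 128..255
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countSetBits(new byte[] {(byte) 0xFF, 0x0A, 0})); // 10
    }
}
```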
Another solution would be based on BitSet implementation : it uses an array of long (and not bytes) and here's how it counts :
int sum = 0;
for (int i = 0; i < wordsInUse; i++)
    sum += Long.bitCount(words[i]);
return sum;
I would use:
byte[] yourByteArray = ...
BitSet bitset = BitSet.valueOf(yourByteArray); // java.util.BitSet
int setBits = bitset.cardinality();
I don't know if it's faster, but I think it will be faster than what you have. Let me know?
Your method would look like
public static int countSetBits(byte[] array) {
return BitSet.valueOf(array).cardinality();
}
You say:
We store knowledge in byte arrays as bits.
I would recommend to use a BitSet for that. It gives you convenient methods, and you seem to be interested in bits, not bytes, so it is a much more appropriate data type compared to a byte[]. (Internally it uses a long[]).
By far the fastest way is to count the set bits in "parallel". The resulting count is called the Hamming weight,
and as far as I know this is what Integer.bitCount(int i) implements.
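To apply this to a byte[] without converting the storage, one option is to read the array eight bytes at a time and let Long.bitCount do the work. A sketch only, with a made-up class name; the byte order does not matter for counting.

```java
import java.nio.ByteBuffer;

class WordBitCount {
    static int countSetBits(byte[] array) {
        if (array == null) {
            return 0;
        }
        ByteBuffer buf = ByteBuffer.wrap(array);
        int total = 0;
        // Count eight bytes per step while a full long is available.
        while (buf.remaining() >= Long.BYTES) {
            total += Long.bitCount(buf.getLong());
        }
        // Handle the last 0..7 bytes one at a time.
        while (buf.hasRemaining()) {
            total += Integer.bitCount(buf.get() & 0xFF);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countSetBits(new byte[] {0x0A, 0x01})); // 3
    }
}
```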
As per my understanding,
1 byte = 8 bits.
So if the byte array size = n, then isn't the total number of bits = n*8?
Please correct me if my understanding is wrong.
Thanks
Vinod
