Shifting BigInteger of Java by long variable

Shifting BigInteger of Java by long variable - java

I know there are methods shiftLeft(int n) and shiftRight(int n) for BigInteger class which only takes int type as an argument but I have to shift it by a long variable. Is there any method to do it?

BigInteger can only have Integer.MAX_VALUE bits. Shifting right by more than this will always be zero. Shift left any value but zero will be an overflow.
From the Javadoc
* BigInteger constructors and operations throw {#code ArithmeticException} when
* the result is out of the supported range of
* -2<sup>{#code Integer.MAX_VALUE}</sup> (exclusive) to
* +2<sup>{#code Integer.MAX_VALUE}</sup> (exclusive).
If you need more than 2 billion bits to represent your value, you have a fairly usual problem, BigInteger wasn't designed for.
If you need to do bit manipulation on a very large scale, I suggest having an BitSet[] This will allow up to 2 bn of 2 bn bit sets, more than your addressable memory.
yes the long variable might go up to 10^10
For each 10^10 bit number you need 1.25 TB of memory. For this size of data, you may need to store it off heap, we have a library which persist this much data in a single memory mapping without using much heap, but you need to have this much space free on a single disk at least. https://github.com/OpenHFT/Chronicle-Bytes

BigInteger does not support values where long shift amounts would be appropriate. I tried
BigInteger a = BigInteger.valueOf(2).pow(Integer.MAX_VALUE);
and I got the following exception:
Exception in thread "main" java.lang.ArithmeticException: BigInteger would overflow supported range.

Since 2 ^ X is equal to 10 ^ (X * ln(2) / ln(10)), we can calculate for X = 10 ^ 10:
2 ^ (10 ^ 10) = 10 ^ 3,010,299,956.63981195...
= 10 ^ 3,010,299,956 * 10 ^ 0.63981195...
= 4.3632686... * 10 ^ 3,010,299,956
Meaning 4 followed by more than 3 billion more digits.
That's a very large number and will take some doing storing that to full precision.

Related

How to implement Karatsuba multiplication using bit manipulation

I'm implementing Karatsuba multiplication in Scala (my choice) for an online course. Considering the algorithm is meant to multiply large numbers, I chose the BigInt type which is backed by Java BigInteger. I'd like to implement the algorithm efficiently, which using base 10 arithmetic is copied below from Wikipedia:
procedure karatsuba(num1, num2)
if (num1 < 10) or (num2 < 10)
return num1*num2
/* calculates the size of the numbers */
m = max(size_base10(num1), size_base10(num2))
m2 = floor(m/2)
/* split the digit sequences in the middle */
high1, low1 = split_at(num1, m2)
high2, low2 = split_at(num2, m2)
/* 3 calls made to numbers approximately half the size */
z0 = karatsuba(low1, low2)
z1 = karatsuba((low1 + high1), (low2 + high2))
z2 = karatsuba(high1, high2)
return (z2 * 10 ^ (m2 * 2)) + ((z1 - z2 - z0) * 10 ^ m2) + z0
Given that BigInteger is internally represented as an int[], if I can calculate m2 in terms of the int[], I can use bit shifting to extract the lower and higher halves of the number. Similarly, the last step can be achieved by bit shifting too.
However, it's easier said than done, as I can't seem to wrap my head around the logic. For example, if the max number is 999, the binary representation is 1111100111, lower half is 99 = 1100011, upper half is 9 = 1001. How do I get the above split?
Note:
There is an existing question that shows how to implement using arithmetic on BigInteger, but not bit shifting. Hence, my question is not a duplicate.

To be able to use bit shifting to do the splits and recombination, the base needs to be a power of two. Using two itself, as in the linked answer, is probably reasonable. Then the "length" of the inputs can be found directly with bitLength, and the split could be implemented as:
// x = a + 2^N b
BigInteger b = x.shiftRight(N);
BigInteger a = x.subtract(b.shiftLeft(N));
Where N is the size that a will have in bits.
Given that BigInteger is implemented with 32bit limbs, it makes sense to use 2³² as the base, ensuring that the big shifts involve only the movement of whole integers, and not also the slower code path where the BigInteger is shifted by a value between 1 and 31. This could be accomplished by rounding N to a multiple of 32.
The specific constant in this line,
if (N <= 2000) return x.multiply(y); // optimize this parameter
Should probably not be trusted too much, given that comment. For performance there should be some bound though, otherwise the recursive splitting goes too deeply. For example, when the size of the numbers is 32 or less, it's clearly better to just multiply, but probably a good cut-off is much higher. In this source of BigInteger itself, the cutoff is expressed in terms of the number of limbs instead of bits, and set to 80 (so 2560 bits) - it also has an other threshold above which it switches to 3-way Toom-Cook multiplication instead of Karatsuba multiplication.

Calculating number of bits and number of words BigInteger

While converting a String into BigInteger, Java internally calculates the number of bits and then the number of words(each word is a group of 9 integers i think) in a BigInteger as can be seen here from Line 325 to Line 327. numWords is used then to create an array that can accomodate that BigInteger.
I don't understand the logic used for calculating numBits in line 325 and then the logic for numWords in Line 326.
Logically i think that for the string "123456789", numWords should be 1 and for "12345678912",numWords should be 2 , but that's not always the case. For example for "12345678912345678912", numWords should be 3, but it comes out to be 2.
Can anyone please explain the logic used in line 325 and 326?

To represent decimal number of numDigits as binary number, it requires
numDigits * Math.log(10) / Math.log(2)
bits.
int numBits = (int)(((numDigits * bitsPerDigit[radix]) >>> 10) + 1);
In the calculation above bitsPerDigit[10] is 3402.
Math.log(10) / Math.log(2) * Math.pow(2, 10) = 3401.6543691646593

In Java, BigIntegers are not stored as strings or bytes with a digit each. They are stored as an array of 32-bit integers, which together form the so-called magnitude of the BigInteger. There can be no leading zero integers(*), so the BigInteger is stored as compactly as possible.
The "words" mentioned are these 32-bit integers. They are not groups of 9 digits, they are used in full, so each bit counts.
So you just have to know how many 32-bit integers are stored, which is the length of the internal array times 32. But the top integer can still have leading zeroes, so you must get the number of leading zeroes of that top integer and subtract them from the obtained product, in pseudo-code:
numBits = internalArray.length * 32 - numberOfLeadingZeroBits(internalArray[0]);
Note that the internal array is stored with the top integer at the lowest address (I have no idea why that is), so the top integer is at index 0 of the array.
(*) In reality, the above is a little more complicated, since the top item may be stored at an offset from the start of the array (probably to make certain calculations easier), but to understand the mechanism, you can pretend there are no extra integers.

Words doesn't refer to words as you know it - it's referring to words as memory blocks.
https://en.wikipedia.org/wiki/Word_(computer_architecture)

Check division by 3 with binary operations?

I've read this interesting answer about "Checking if a number is divisible by 3"
Although the answer is in Java , it seems to work with other languages also.
Obviously we can do :
boolean canBeDevidedBy3 = (i % 3) == 0;
But the interesting part was this other calculation :
boolean canBeDevidedBy3 = ((int) (i * 0x55555556L >> 30) & 3) == 0;
For simplicity :
0x55555556L = "1010101010101010101010101010110"
Nb
There's also another method to check it :
One can determine if an integer is divisible by 3 by counting the 1
bits at odd bit positions, multiply this number by 2, add the number
of 1-bits at even bit positions add them to the result and check if
the result is divisible by 3
For example :
9310 ( is divisible by 3)
010111012
It has 2 bits in the odd places and 4 bits at the even places ( place is the zero based of the base 2 digit location)
So 2*1 + 4 = 6 which is divisible by 3.
At first I thought those 2 methods are related but I didn't find how.
Question
How does
boolean canBeDevidedBy3 = ((int) (i * 0x55555556L >> 30) & 3) == 0;
— actually determines if i%3==0 ?

Whenever you add 3 to a number, what you do is to add binary 11. Whatever the original value of the number, this will maintain the invariant that twice the number of 1 bits at odd positions, plus the number of 1 bits at even positions, will also be divisible by 3.
You can see that in this way. Let's call the value of the above expression c. You're adding 1 to an odd position, and 1 to an even position. When you add 1 to an even position, either the bit you've added 1 to was set or unset. If it was unset, you'll increase the value of c by 1, because you've added a new 1 in an odd position. If it was previously set, you'll flip that bit, but add a 1 in an even position (from the carry). This means that you initially decrease c by 1, but now when you add the 1 in the even position, you increase c by 2, so overall you've increased c by 2.
Of course, this carry bit might also get added to a bit that's already set, in which case we need to check that this part still increases c by 2: you'll remove a 1 in an even position (decreasing c by 2), and then add a 1 in an odd position (increasing c by 1), meaning that we've in fact decreased c by 1. That is the same as increasing c by 2, though, if we're working modulo 3.
A more formal version of this would be structured as a proof by induction.

The two methods do not appear to be related. The bit-wise method seems to be related to certain methods for the efficient computation of modulo b-1 when using digit base b, known in decimal arithmetic as "casting out nines".
The multiplication-based method is directly based on the definition of division when accomplished by multiplication with the reciprocal. Letting / denote mathematical division, we have
int_quot = (int)(i / 3)
frac_quot = i / 3 - int_quot = i / 3 - (int)(i / 3)
i % 3 = 3 * frac_quot = 3 * (i / 3 - (int)(i / 3))
The fractional portion of the mathematical quotient translates directly into the remainder of integer division: If the fraction is 0, the remainder is 0, if the fraction is 1/3 the remainder is 1, if the fraction is 2/3 the remainder is 2. This means we only need to examine the fractional portion of the quotient.
Instead of dividing by 3, we can multiply by 1/3. If we perform the computation in a 32.32 fixed-point format, 1/3 corresponds to 232*1/3 which is a number between 0x55555555 and 0x55555556. For reasons that will become apparent shortly, we use the overestimation here, that is the rounded-up result 0x555555556.
When we multiply 0x55555556 by i, the most significant 32 bits of the full 64-bit product will contain the integral portion of the quotient (int)(i * 1/3) = (int)(i / 3). We are not interested in this integral portion, so we neither compute nor store it. The lower 32-bits of the product is one of the fractions 0/3, 1/3, 2/3 however computed with a slight error since our value of 0x555555556 is slightly larger than 1/3:
i = 1: i * 0.55555556 = 0.555555556
i = 2: i * 0.55555556 = 0.AAAAAAAAC
i = 3: i * 0.55555556 = 1.000000002
i = 4: i * 0.55555556 = 1.555555558
i = 5: i * 0.55555556 = 1.AAAAAAAAE
If we examine the most significant bits of the three possible fraction values in binary, we find that 0x5 = 0101, 0xA = 1010, 0x0 = 0000. So the two most significant bits of the fractional portion of the quotient correspond exactly to the desired modulo values. Since we are dealing with 32-bit operands, we can extract these two bits with a right shift by 30 bits followed by a mask of 0x3 to isolate two bits. I think the masking is needed in Java as 32-bit integers are always signed. For uint32_t operands in C/C++ the shift alone would suffice.
We now see why choosing 0x55555555 as representation of 1/3 wouldn't work. The fractional portion of the quotient would turn into 0xFFFFFFF*, and since 0xF = 1111 in binary, the modulo computation would deliver an incorrect result of 3.
Note that as i increases in magnitude, the accumulated error from the imprecise representation of 1/3 affects more and more bits of the fractional portion. In fact, exhaustive testing shows that the method only works for i < 0x60000000: beyond that limit the error overwhelms the most significant fraction bits which represent our result.

Floating point notation representation in java specification

Here: http://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.2.3
it says that:
The finite nonzero values of any floating-point value set can all be expressed in the form s · m · 2^(e - N + 1), where s is +1 or -1, m is a positive integer less than 2^N, and e is an integer between Emin = -(2^(K-1)-2) and Emax = 2^(K-1)-1, inclusive, and where N and K are parameters that depend on the value set.
and there is a table below:
Parameter float
N 24
K 8
So let's say N = 24 and K = 8 then we can have the following value from the formula:
s · 2^N · 2^(2^(K-1)-1 - N + 1) which gives us according to values specified in the table:
s * 2^24 * 2^(127 - 24) which is equal to s * 2^127. But float has only 32 bits so it's not possible to store in it such a big number.
So it's obvious that initial formula should be read in a different way. How then?
Also in javadoc for Float max value: http://docs.oracle.com/javase/7/docs/api/java/lang/Float.html#MAX_VALUE
it says:
A constant holding the largest positive finite value of type float, (2-2^-23)·2^127
This also doesn't make sense, as resulting value is much larger than 2^32 - which is possible the biggest value that can be stored in float variable. So again, I'm misreading this notation. So how it should be read?

The idea with the floating point notation is to store a much larger range of numbers than can be stored in the same space (bytes) with the integer representation. So, for example, you say that the "resulting value is much larger than 2^32". But, that would only be a problem if we're storing a typical binary number as one computes in a typical math class.
Instead, floating point representations break those 32 bytes into two main parts:
- significand
- exponent
For simplicity, imagine that 3 bytes are used for the significand and 1 byte for the exponent. Also assume that each of these is your typical binary integer style of representation. So, the three bytes can have a value 2^24, or 2^23 if you want to keep one bit for the sign.
However, the other byte can store up to 2^7 (if you want a sign there too).
So, you could express 500^100, by storing the 500 in the three bytes and the 100 in the 1 byte.
Essentially, one cannot store every number precisely. One changes it into significant form and one can store as many significant digits as the portion reserved for the significand (3 bytes in this example).
Rather than try to explain the complications, check this Wikipedia article for more.

Java get Long larger than 1000000000

in this simple code i can not get Long larger than 1000000000. lenght of that is 10 char and i want to get larger than such as 15 character.
long value = nextLong(rand,1000000000);
long nextLong(Random rng, long n) {
long bits, val;
do {
bits = (rng.nextLong() << 1) >>> 1;
val = bits % n;
} while (bits-val+(n-1) < 0L);
return val;
}

Your long constant is missing an L suffix:
long value = nextLong(rand,100000000000000L);
I want to get larger than such as 15 character.
Java's long has range of –9223372036854775808 to 9223372036854775807 (18 full digits + top digit in the range 0..8), which is sufficient to cover the range that you need to cover. If you need 19 decimal digits or more, you would need to use BigInteger.

You should be able to use BigInt.
Import using:
import java.math.BigInteger;
declare like this:
BigInteger myBigInt = new BigInteger("123456789123456789");

Increase your limit value of 'n'. Since you are limiting the generated random value by performing a modulo 'n', obviously the generated value needs to be less than 'n'. Since your limit is a long, you can increase that limit to allow for 15 digit results without other changes.
However I am not sure what you are trying to accomplish with the loop in the nextLong function. It will only loop when bits > ( Long.MAX - n + 1 ).

I get the feeling that you're limiting yourself by your own modulo operation.
Remember that modulo division is the same as short division - the kind we used back in third grade. That is, instead of dividing out the entire number, we take the whole portion and the remainder.
So, let's take a simple example (a power of 10, since you're using one as well):
99 / 10 = 9 remainder 9
That is to say, if I divide 99 into 10 using short division, I will be able to divide it evenly 9 times, with 9 bits left over. Notice that the left-over is an order of magnitude shorter of what I'm dividing into.
This scales up with higher orders of divisors:
999 / 10 = 99 remainder 9
9999 / 10 = 999 remainder 9
99999 / 10 = 9999 remainder 9
...and so forth. Notice that our remainder is always an order of magnitude below our dividend. This makes sense, since if it were larger than our dividend, it'd be another value we could add to the quotient, and not the remainder.
Now, we come back to your example. You're taking a long value, which can be several orders of magnitude larger or smaller than your passed in value of a billion (which fits fine into an int, and is promoted to a long when you call your method).
The ultimate issue comes down to this:
val = bits % n;
...where bits is some arbitrary long value that could be greater than n.
Remember what we discovered above with the short division above? That's right - your resulting val will be an order of magnitude below your n value - that is to say, it will never be larger than or equal to n.
I'm not entirely sure what it is you're trying to accomplish, so I don't have The Right Thing™ for you to do. But I'd recommend that you re-evaluate the purpose of that modulo operation.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.