Implementation of java.util.Random.nextInt

Implementation of java.util.Random.nextInt - java

This function is from java.util.Random. It returns a pseudorandom int uniformly distributed between 0 and the given n. Unfortunately I did not get it.
public int nextInt(int n) {
if (n <= 0)
throw new IllegalArgumentException("n must be positive");
if ((n & -n) == n) // i.e., n is a power of 2
return (int)((n * (long)next(31)) >> 31);
int bits, val;
do {
bits = next(31);
val = bits % n;
} while (bits - val + (n-1) < 0);
return val;
}
My questions are:
Why does it treat the case where n is a power of two specially ? Is it just for performance ?
Why doest it reject numbers that bits - val + (n-1) < 0 ?

It does this in order to assure an uniform distribution of values between 0 and n. You might be tempted to do something like:
int x = rand.nextInt() % n;
but this will alter the distribution of values, unless n is a divisor of 2^31, i.e. a power of 2. This is because the modulo operator would produce equivalence classes whose size is not the same.
For instance, let's suppose that nextInt() generates an integer between 0 and 6 inclusive and you want to draw 0,1 or 2. Easy, right?
int x = rand.nextInt() % 3;
No. Let's see why:
0 % 3 = 0
1 % 3 = 1
2 % 3 = 2
3 % 3 = 0
4 % 3 = 1
5 % 3 = 2
6 % 3 = 0
So you have 3 values that map on 0 and only 2 values that map on 1 and 2. You have a bias now, as 0 is more likely to be returned than 1 or 2.
As always, the javadoc documents this behaviour:
The hedge "approximately" is used in the foregoing description only
because the next method is only approximately an unbiased source of
independently chosen bits. If it were a perfect source of randomly
chosen bits, then the algorithm shown would choose int values from the
stated range with perfect uniformity.
The algorithm is slightly tricky. It rejects values that would result
in an uneven distribution (due to the fact that 2^31 is not divisible
by n). The probability of a value being rejected depends on n. The
worst case is n=2^30+1, for which the probability of a reject is 1/2,
and the expected number of iterations before the loop terminates is 2.
The algorithm treats the case where n is a power of two specially: it
returns the correct number of high-order bits from the underlying
pseudo-random number generator. In the absence of special treatment,
the correct number of low-order bits would be returned. Linear
congruential pseudo-random number generators such as the one
implemented by this class are known to have short periods in the
sequence of values of their low-order bits. Thus, this special case
greatly increases the length of the sequence of values returned by
successive calls to this method if n is a small power of two.
The emphasis is mine.

next generates random bits.
When n is a power of 2, a random integer in that range can be generated just by generating random bits (I assume that always generating 31 and throwing some away is for reproducibility). This code path is simpler and I guess it's a more commonly used case so it's worth making a special "fast path" for this case.
When n isn't a power of 2, it throws away numbers at the "top" of the range so that the random number is evenly distributed. E.g. imagine we had n=3, and imagine we were using 3 bits rather than 31 bits. So bits is a randomly generated number between 0 and 7. How can you generate a fair random number there? Answer: if bits is 6 or 7, we throw it away and generate a new one.

Related

Concatenation of Consecutive Binary Numbers, Modulo Operation discussion behavior

Here I am contributing the Java Solution for the problem.
Concatenation of Consecutive Binary Numbers:
Given an integer n, return the decimal value of the binary string formed by concatenating the binary representations of 1 to n in order, modulo 10^9 + 7.
class Solution {
public int concatenatedBinary(int n) {
long sum = 0;
for(int i = 1; i <= n; i++){
sum = ((sum << Integer.toBinaryString(i).length()) + i) % 1000000007;
}
return (int) sum;
}
}
Now I have a doubt that is when we are modulating at each step within for loop. It will not impact the result till 1000000007 but after that, it will change the sum variable, and this cycle will repeat. Now, why doesn't this modulo impacting the overall result? Thanks in advance.

Let's take a simpler problem: Take the number 1000, write it as bits, then take the number 1001, write that as bits, concatenate the two, what's that, in decimal?
1000 in bits is 11 1110 1000
1001 in bits is 11 1110 1001
Thus, the answer'd be 1111 1010 0011 1110 1001, or 1025001.
But, let's do a more mathy take on this: "concatenate the two" boils down to: "Shift the bits in the first number to the left to make enough room, then add the second number". And "shift left by X" is the same as 'multiply by 2X'. Just like if I have the number '1234', and I tell you to 'shift that left by 2 spots', it's the same as multiplying by 100: That turns it into 123400, which is 1234*100, and 100 is just 102. So, 'shift left by X spots' is the same as 'multiply by bX' where b is the 'base' of the number system we use; 2 in binary, 10 in decimal.
Thus, a different way to state the same result is: 'Take the number 1000, multiply it by 210, add 1001 to it. Sure enough: 1000 * 2^10 + 1001 is indeed 1025001.
Thus, a single 'loop' in your algorithm is effectively: Take the result we have so far, multiply it by 2 a bunch of times (X times, where X is the position of the highest 1-bit in the number we're processing this loop), then add the new number.
So, it's just multiplication and addition.
Modulo has the property that it is stable for those operations.
Consider basic math: You were probably taught about the number line. A horizontal line, infinite in size.
A modulo system is no different, except the number line is a big loop. It's a circle. In modulo 1000000007 space, the numbers '5 and 6' are just as adjacent as the numbers '0 and 1000000006' are.
Given, on the normal number line, a * b = c, modulo has the property that this also means that (a%Z * b%Z)%Z = c%Z for any Z. The same goes for addition; if a + b = c, then (a%Z + b%Z)%Z = c%Z is also true. You can try a bunch of numbers and witness this, or try to prove this yourself, or search the web for proof of this property.
Example:
12 * 18 = 216
(12%7 * 18%7)%7 = 216%7
Yup, that checks out:
5 * 4 = 20
20%7 = 6.
216%7 is also 6.
Thus:
Your question boils down to a lot of applications of multiplying and addition.
multiply and add translate to modulo math without issue.
Therefore, your algorithm works.

Effiecient Algorithm for Finding if a Very Big Number is Divisible by 7

So this was a question on one of the challenges I came across in an online competition, a few days ago.
Question:
Accept two inputs.
A big number of N digits,
The number of questions Q to be asked.
In each of the question, you have to find if the number formed by the string between indices Li and Ri is divisible by 7 or not.
Input:
First line contains the number consisting on N digits. Next line contains Q, denoting the number of questions. Each of the next Q lines contains 2 integers Li and Ri.
Output:
For each question, print "YES" or "NO", if the number formed by the string between indices Li and Ri is divisible by 7.
Constraints:
1 ≤ N ≤ 105
1 ≤ Q ≤ 105
1 ≤ Li, Ri ≤ N
Sample Input:
357753
3
1 2
2 3
4 4
Sample Output:
YES
NO
YES
Explanation:
For the first query, number will be 35 which is clearly divisible by 7.
Time Limit: 1.0 sec for each input file.
Memory Limit: 256 MB
Source Limit: 1024 KB
My Approach:
Now according to the constraints, the maximum length of the number i.e. N can be upto 105. This big a number cannot be fitted into a numeric data structure and I am pretty sure thats not the efficient way to go about it.
First Try:
I thought of this algorithm to apply the generic rules of division to each individual digit of the number. This would work to check divisibility amongst any two numbers, in linear time, i.e. O(N).
static String isDivisibleBy(String theIndexedNumber, int divisiblityNo){
int moduloValue = 0;
for(int i = 0; i < theIndexedNumber.length(); i++){
moduloValue = moduloValue * 10;
moduloValue += Character.getNumericValue(theIndexedNumber.charAt(i));
moduloValue %= divisiblityNo;
}
if(moduloValue == 0){
return "YES";
} else{
return "NO";
}
}
But in this case, the algorithm has to also loop through all the values of Q, which can also be upto 105.
Therefore, the time taken to solve the problem becomes O(Q.N) which can also be considered as Quadratic time. Hence, this crossed the given time limit and was not efficient.
Second Try:
After that didn't work, I tried searching for a divisibility rule of 7. All the ones I found, involved calculations based on each individual digit of the number. Hence, that would again result in a Linear time algorithm. And hence, combined with the number of Questions, it would amount to Quadratic Time, i.e. O(Q.N)
I did find one algorithm named Pohlman–Mass method of divisibility by 7, which suggested
Using quick alternating additions and subtractions: 42,341,530
-> 530 − 341 = 189 + 42 = 231 -> 23 − (1×2) = 21 YES
But all that did was, make the time 1/3rd Q.N, which didn't help much.
Am I missing something here? Can anyone help me find a way to solve this efficiently?
Also, is there a chance this is a Dynamic Programming problem?

There are two ways to go through this problem.
1: Dynamic Programming Approach
Let the input be array of digits A[N].
Let N[L,R] be number formed by digits L to R.
Let another array be M[N] where M[i] = N[1,i] mod 7.
So M[i+1] = ((M[i] * 10) mod 7 + A[i+1] mod 7) mod 7
Pre-calculate array M.
Now consider the expression.
N[1,R] = N[1,L-1] * 10R-L+1 + N[L,R]
implies (N[1,R] mod 7) = (N[1,L-1] mod 7 * (10R-L+1mod 7)) + (N[L,R] mod 7)
implies N[L,R] mod 7 = (M[R] - M[L-1] * (10R-L+1 mod 7)) mod 7
N[L,R] mod 7 gives your answer and can be calculated in O(1) as all values on right of expression are already there.
For 10R-L+1 mod 7, you can pre-calculate modulo 7 for all powers of 10.
Time Complexity :
Precalculation O(N)
Overall O(Q) + O(N)
2: Divide and Conquer Approach
Its a segment tree solution.
On each tree node you store the mod 7 for the number formed by digits in that node.
And the expression given in first approach can be used to find the mod 7 of parent by combining the mod 7 values of two children.
The time complexity of this solution will be O(Q log N) + O(N log N)

Basically you want to be able to to calculate the mod 7 of any digits given the mod of the number at any point.
What you can do is to;
record the modulo at each point O(N) for time and space. Uses up to 100 KB of memory.
take the modulo at the two points and determine how much subtracting the digits before the start would make e.g. O(N) time and space (once not per loop)
e.g. between 2 and 3 inclusive
357 % 7 = 0
3 % 7 = 3 and 300 % 7 = 6 (the distance between the start and end)
and 0 != 6 so the number is not a multiple of 7.
between 4 and 4 inclusive
3577 % 7 == 0
357 % 7 = 0 and 0 * 10 % 7 = 0
as 0 == 0 it is a multiple of 7.

You first build a list of digits modulo 7 for each number starting with 0 offset (like in your case, 0%7, 3%7, 35%7, 357%7...) then for each case of (a,b) grab digits[a-1] and digits[b], then multiply digits[b] by 1-3-2-6-4-5 sequence of 10^X modulo 7 defined by (1+b-a)%6 and compare. If these are equal, return YES, otherwise return NO. A pseudocode:
readString(big);
Array a=[0]; // initial value
Array tens=[1,3,2,6,4,5]; // quick multiplier lookup table
int d=0;
int l=big.length;
for (int i=0;i<l;i++) {
int c=((int)big[i])-48; // '0' -> 0, and "big" has characters
d=(3*d+c)%7;
a.push(d); // add to tail
}
readInt(q);
for (i=0;i<q;i++) {
readInt(li);
readInt(ri); // get question
int left=(a[li-1]*tens[(1+ri-li)%6])%7;
if (left==a[ri]) print("YES"); else print("NO");
}
A test example:
247761901
1
5 9
61901 % 7=0. Calculating:
a = [0 2 3 2 6 3 3 4 5 2]
li = 5
ri = 9
left=(a[5-1]*tens[(1+9-5)%6])%7 = (6*5)%7 = 30%7 = 2
a[ri]=2
Answer: YES

Generating random integer between 1 and infinity

I would like to create an integer value between 1 and infinity. I want to have a probability distribution where the smaller the number is, the higher the chance it is generated.
I generate a random value R between 0 and 2.
Take the series
I want to know the smallest m with which my sum is bigger than R.
I need a fast way to determine m. This is would be pretty straightforward if i had R in binary, since m would be equal to the number of 1's my number has in a row from the most significant bit, plus one.
There is an upper limit on the integer this method can generate: integer values have an upper limit and double precision can also only reach so high in the [0;2[ interval. This is irrelevant, however, since it depends on the accuracy of the data representation method.
What would be the fastest way to determine m?

Set up the inequality
R <= 2 - 2**-m
Isolate the term with m
2**-m <= 2 - R
-m <= log2(2-R)
m >= -log2(2-R).
So it looks like you want ceiling(-log2(2-R)). This is basically an exponential distribution with discretization -- the algorithm for an exponential is -ln(1-U)/rate, where U is a Uniform(0,1) and 1/rate is the desired mean.

I think, straightforward solution will be OK as this series converges really fast:
if (r >= 2)
throw new IllegalArgumentException();
double exp2M = 1 / (2 - r);
int x = (int)exp2M;
int ans = 0;
while (x > 0) {
++ans;
x >>= 2;
}
return ans;

How Java processes for overflow integers [duplicate]

This question already has answers here:
Why do these two multiplication operations give different results?
(2 answers)
Closed 9 years ago.
Now signed_int max value is 2,147,483,647 i.e. 2^31 and 1 bit is sign bit, so
when I run long a = 2,147,483,647 + 1;
It gives a = -2,147,483,648 as answer.. This hold good.
But, 24*60*60*1000*1000 = 86400000000 (actually)...
In java, 24*60*60*1000*1000 it equals to 500654080..
I understand that it is because of overflow in integer, but what processing made this value come, What logic was used to get that number by Java. I also refered here.

Multiplication is executed from left to right like this
int x = 24 * 60;
x = x * 60;
x = x * 1000;
x = x * 1000;
first 3 operations produce 86400000 which still fits into Integer.MAX_VALUE. But the last operation produces 86400000000 which is 0x141dd76000 in hex. Bytes above 4 are truncated and we get 0x1dd76000. If we print it
System.out.println(0x1dd76000);
the result will be
500654080

This is quite subtle: when writing long a = 2147483647 + 1, the right hand side is computed first using ints since you have supplied int literals. But that will clock round to a negative (due to overflow) before being converted to a long. So the promotion from int to long is too late for you.
To circumvent this behaviour, you need to promote at least one of the arguments to a long literal by suffixing an L.
This applies to all arithmetic operations using literals (i.e. also your multiplication): you need to promote one of them to a long type.
The fact that your multiplication answer is 500654080 can be seen by looking at
long n = 24L*60*60*1000*1000;
long m = n % 4294967296L; /* % is extracting the int part so m is 500654080
n.b. 4294967296L is 2^32 (using OP notation, not XOR). */
What's happening here is that you are going 'round and round the clock' with the int type. Yes, you are losing the carry bits but that doesn't matter with multiplication.

As the range of int is -2,147,483,648 to 2,147,483,647.
So, when you keep on adding numbers and its exceed the maximum limit it start gain from the left most number i.e. -2,147,483,648, as it works as a cycle. That you had already mentioned in your question.
Similarly when you are computing 24*60*60*1000*1000 which should result 86400000000 as per Maths.
But actually what happens is somehow as follows:
86400000000 can be written as 2147483647+2147483647+2147483647+2147483647+..36 times+500654080
So, after adding 2147483647 for 40 times results 0 and then 500654080 is left which ultimately results in 500654080.
I hope its clear to you.

Add L in your multiplicatoin. If you add L than it multiply you in Long range otherwise in Integer range which overflow. Try to multiply like this.
24L*60*60*1000*1000
This give you a right answer.

An Integer is 32 bit long. Lets take for example a number that is 4 bit long for the sake of simplicity.
It's max positive value would be:
0111 = 7 (first bit is for sign; 0 means positive, 1 means negative)
0000 = 0
It's min negative value would be:
1111 = -8 (first bit is for sign)
1000 = -1
Now, if we call this type fbit, fbit_max is equal to 7.
fbit_max + 1 = -8
because bitwise 0111 + 1 = 1111
Therefore, the span of fbit_min to fbit_max is 16. From -8 to 7.
If you would multiply something like 7*10 and store it in fbit, the result would be:
fbit number = 7 * 10 (actually 70)
fbit number = 7 (to get to from zero to max) + 16 (min to max) + 16 (min to max) + 16 (min to max) + 15 (the rest)
fbit number = 6

24*60*60*1000*1000 = 86400000000
Using MOD as follows: 86400000000 % 2147483648 = 500654080

Issue with implementation of Fermat's little therorm

Here's my implementation of Fermat's little theorem. Does anyone know why it's not working?
Here are the rules I'm following:
Let n be the number to test for primality.
Pick any integer a between 2 and n-1.
compute a^n mod n.
check whether a^n = a mod n.
myCode:
int low = 2;
int high = n -1;
Random rand = new Random();
//Pick any integer a between 2 and n-1.
Double a = (double) (rand.nextInt(high-low) + low);
//compute:a^n = a mod n
Double val = Math.pow(a,n) % n;
//check whether a^n = a mod n
if(a.equals(val)){
return "True";
}else{
return "False";
}
This is a list of primes less than 100000. Whenever I input in any of these numbers, instead of getting 'true', I get 'false'.
The First 100,008 Primes
This is the reason why I believe the code isn't working.

In java, a double only has a limited precision of about 15 to 17 digits. This means that while you can compute the value of Math.pow(a,n), for very large numbers, you have no guarantee you'll get an exact result once the value has more than 15 digits.
With large values of a or n, your computation will exceed that limit. For example
Math.pow(3, 67) will have a value of 9.270946314789783e31 which means that any digit after the last 3 is lost. For this reason, after applying the modulo operation, you have no guarantee to get the right result (example).
This means that your code does not actually test what you think it does. This is inherent to the way floating point numbers work and you must change the way you hold your values to solve this problem. You could use long but then you would have problems with overflows (a long cannot hold a value greater than 2^64 - 1 so again, in the case of 3^67 you'd have another problem.
One solution is to use a class designed to hold arbitrary large numbers such as BigInteger which is part of the Java SE API.

As the others have noted, taking the power will quickly overflow. For example, if you are picking a number n to test for primality as small as say, 30, and the random number a is 20, 20^30 = about 10^39 which is something >> 2^90. (I took the ln of 10^39).
You want to use BigInteger, which even has the exact method you want:
public BigInteger modPow(BigInteger exponent, BigInteger m)
"Returns a BigInteger whose value is (this^exponent mod m)"
Also, I don't think that testing a single random number between 2 and n-1 will "prove" anything. You have to loop through all the integers between 2 and n-1.

#evthim Even if you have used the modPow function of the BigInteger class, you cannot get all the prime numbers in the range you selected correctly. To clarify the issue further, you will get all the prime numbers in the range, but some numbers you have are not prime. If you rearrange this code using the BigInteger class. When you try all 64-bit numbers, some non-prime numbers will also write. These numbers are as follows;
341, 561, 645, 1105, 1387, 1729, 1905, 2047, 2465, 2701, 2821, 3277, 4033, 4369, 4371, 4681, 5461, 6601, 7957, 8321, 8481, 8911, 10261, 10585, 11305, 12801, 13741, 13747, 13981, 14491, 15709, 15841, 16705, 18705, 18721, 19951, 23001, 23377, 25761, 29341, ...
https://oeis.org/a001567
161038, 215326, 2568226, 3020626, 7866046, 9115426, 49699666, 143742226, 161292286, 196116194, 209665666, 213388066, 293974066, 336408382, 376366, 666, 566, 566, 666 2001038066, 2138882626, 2952654706, 3220041826, ...
https://oeis.org/a006935
As a solution, make sure that the number you tested is not in this list by getting a list of these numbers from the link below.
http://www.cecm.sfu.ca/Pseudoprimes/index-2-to-64.html
The solution for C # is as follows.
public static bool IsPrime(ulong number)
{
return number == 2
? true
: (BigInterger.ModPow(2, number, number) == 2
? (number & 1 != 0 && BinarySearchInA001567(number) == false)
: false)
}
public static bool BinarySearchInA001567(ulong number)
{
// Is number in list?
// todo: Binary Search in A001567 (https://oeis.org/A001567) below 2 ^ 64
// Only 2.35 Gigabytes as a text file http://www.cecm.sfu.ca/Pseudoprimes/index-2-to-64.html
}

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.