So I've implemented my own little RSA algorithm and in the course of that I wrote a function to find large prime numbers.
First I wrote a function prime? that tests for primality, and then I wrote two versions of a prime-searching function. In the first version I just test random BigIntegers until I hit a prime. In the second version I sample a random BigInteger and then increment it until I find a prime.
(import 'java.util.Random)

(defn resampling []
  (let [rnd (Random.)]
    (->> (repeatedly #(BigInteger. 512 rnd))
         (take-while (comp not prime?))
         (count))))
(defn incrementing []
  (->> (BigInteger. 512 (Random.))
       (iterate inc)
       (take-while (comp not prime?))
       (count)))
(let [n 100]
  {:resampling   (/ (reduce + (repeatedly n resampling)) n)
   :incrementing (/ (reduce + (repeatedly n incrementing)) n)})
Running this code yielded averages of 332.41 for the resampling function and 310.74 for the incrementing function.
Now the first number makes complete sense to me. The prime number theorem states that the n-th prime is about n*ln(n) in size (where ln is the natural logarithm). So the distance between adjacent primes is approximately n*ln(n) - (n-1)*ln(n-1) ≈ (n - (n-1))*ln(n) = ln(n), since ln(n) ≈ ln(n-1) for large n. Since I'm sampling 512-bit integers, I'd expect the distance between primes to be in the vicinity of ln(2^512) = 512*ln(2) ≈ 354.89. Therefore random sampling should take about 354.89 attempts on average before hitting a prime, which fits my measurement quite nicely.
The puzzle for me is why the incrementing function takes just about as many steps. If I imagine throwing a dart at a number line where primes are spaced 355 units apart, it should take only about half that many steps on average to walk up to the next prime, since on average I'd land halfway between two primes.
(The code for prime? is a little lengthy. You can take a look at it here.)
You assume that primes are evenly distributed, but that does not seem to be the case.
Consider the following hypothetical scenario: if primes always came in pairs, for example 10...01 and 10...03, with the next pair 2*ln(n) away, then this distribution would make no difference to the sampling algorithm. For the incrementing algorithm, however, the probability of starting inside such a pair is almost 0, so on average it would have to walk half of the big gap, which is ln(n).
In a nutshell: to correctly estimate the behavior of the incrementing algorithm, it is not enough to know the average distance between primes. A uniformly random starting point is more likely to land in a long gap than in a short one, so the expected walk to the next prime is approximately E[g^2]/(2*E[g]) rather than E[g]/2, and the two coincide only when all gaps are equal.
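To see the effect numerically, here is a minimal simulation sketch (small primes below an arbitrary bound stand in for 512-bit ones): it sieves the primes, then compares half the average prime gap with the measured average walk from a uniformly random start to the next prime.

import java.util.Random;

// Sketch: demonstrate the length-biased gap effect on primes below a small bound.
public class GapBias {
    public static void main(String[] args) {
        int limit = 10_000_000;
        boolean[] composite = new boolean[limit];
        for (int p = 2; (long) p * p < limit; p++)   // sieve of Eratosthenes
            if (!composite[p])
                for (int q = p * p; q < limit; q += p)
                    composite[q] = true;

        long gapSum = 0, gapSqSum = 0, gapCount = 0; // gap statistics
        for (int prev = 2, n = 3; n < limit; n++)
            if (!composite[n]) {
                long g = n - prev;
                gapSum += g;
                gapSqSum += g * g;
                gapCount++;
                prev = n;
            }

        Random rnd = new Random();
        long walkSum = 0;
        int trials = 1_000_000;
        for (int t = 0; t < trials; t++) {           // walk from a random start to the next prime
            int n = 2 + rnd.nextInt(limit - 2);
            int steps = 0;
            while (n + steps < limit && composite[n + steps]) steps++;
            walkSum += steps;
        }

        System.out.printf("half the average gap: %.2f%n", 0.5 * gapSum / gapCount);
        System.out.printf("E[g^2] / (2 E[g])   : %.2f%n", (double) gapSqSum / (2.0 * gapSum));
        System.out.printf("measured average    : %.2f%n", (double) walkSum / trials);
    }
}

The measured walk should track roughly E[g^2]/(2*E[g]) rather than half the average gap, which is precisely why knowing only the average distance between primes is not enough.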
So my question is about Java, but it could be in any programming language. There is this declaration:
Random rnd = new Random();
We want to get a random number in the range 0 to x.
I want to know if there is any mathematical difference between the following:
rnd.nextInt() % x;
and
rnd.nextInt(x)
The main question is: is one of these solutions more random than the other? Is one solution more appropriate or "correct" than the other? If they are equivalent, I would be happy to see a mathematical proof of it.
Welcome to "mathematical insight" with "MS Paint".
So, from a statistical standpoint, it depends on the distribution of the numbers being generated. First of all, we'll treat the probability of any one number coming up as an independent event (i.e., disregarding the seed, which RNG is used, etc.). Following that, a modulus simply takes a range of numbers (e.g. a from N, where 0 <= a < N) and subdivides them based on the divisor (the x in a % x). While the numbers are technically from a discrete population (integers), the range of integers for a probability mass function would be so large that it would end up looking like a continuous graph anyhow. So let's consider the probability distribution function for a range of numbers.
If your random number generator doesn't generate with a uniform distribution across the range of numbers (i.e. one where every number is equally likely to come up), then the modulo would (potentially) be folding together the results of a non-uniform distribution. When you consider the individual integers in those ranges as discrete (and individual) outcomes, the probability of any number i (0 <= i < x) being the result is the sum of the probabilities of the individual raw values (i_1 + i_2 + ... + i_(N/x)) that map to it. Overlaying the subdivisions of the ranges on top of each other makes it plain that for non-symmetric distributions, a modulo will generally not result in equally likely outcomes.
Remember, the likelihood of an outcome i is obtained by summing the likelihoods of the individual numbers (i_1, ..., i_(N/x)) in the range N that could produce i. For further clarity, if your range N doesn't divide evenly by the modular divisor x, there will always be N % x outcomes that have one additional integer mapping onto them. This means that most modulus divisors that aren't a power of 2 (and similarly, ranges that are not a multiple of their divisor) will be skewed towards their lower results, even with a uniform distribution.
So to summarize the point, Random#nextInt(int bound) takes all of these things (and more!) into consideration, and will consistently produce an outcome with uniform probability across the range of bound. Random#nextInt() % bound is only a halfway step that works in some specific scenarios. To your teacher's point, I would argue it's more likely you'll see some specific subset of numbers when using the modulus approach, not less.
new Random(x) just creates the Random object with the given seed; it does not itself yield a random value.
I presume you are asking what the difference is between nextInt() % x and nextInt(x).
The difference is as follows.
nextInt(x)
nextInt(x) yields a random number n where 0 ≤ n < x, evenly distributed.
nextInt() % x
nextInt() % x yields a random number in the full integer range¹ and then applies modulo x. The full integer range includes negative numbers, so the result can also be negative. In other words, the range is −x < n < x.
Furthermore, in the vast majority of cases the distribution is not even. nextInt() has 2^32 possible outcomes, but, for simplicity's sake, let's assume it has 2^4 = 16, and let's choose an x less than 16, say x = 10.
The 16 possible outcomes are 0, 1, 2, ..., 14, 15. After applying modulo 10, the results are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5. That means the numbers 0 through 5 are twice as likely to occur as the numbers 6 through 9.
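To make this concrete, here is a minimal sketch in Java where nextInt(16) stands in for the 16-outcome generator:

import java.util.Random;

// Sketch: a scaled-down model of modulo bias, with 16 equally likely raw
// outcomes reduced modulo 10.
public class ModuloBias {
    public static void main(String[] args) {
        Random rnd = new Random();
        int[] counts = new int[10];
        int samples = 1_600_000;
        for (int i = 0; i < samples; i++) {
            int raw = rnd.nextInt(16);   // stands in for the full 2^32 range of nextInt()
            counts[raw % 10]++;          // the modulo folds 16 outcomes onto 10 residues
        }
        // 0..5 each receive two of the 16 raw values, 6..9 only one,
        // so expect roughly 200,000 hits versus roughly 100,000.
        for (int r = 0; r < 10; r++) {
            System.out.printf("%d: %d%n", r, counts[r]);
        }
    }
}

The counts for 0 through 5 should come out at about twice the counts for 6 through 9.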
As we see, nextInt() % x has two problems:
Range is not as required.
Uneven distribution.
So you should definitely use nextInt(int bound) here. If the requirement is to get only unique numbers, you must exclude the numbers already drawn from the number generator. See also Generating Unique Random Numbers in Java.
¹ According to the Javadoc.
Specifically, if used in the form of:
Random.nextFloat() * N;
can I expect a highly randomized distribution of values from 0 to N?
Would it be better to do something like this?
Random.nextInt(N) * Random.nextFloat();
A single random number from a good generator (and java.util.Random is a good one) will be evenly distributed across the range... it will have a mean and median value of 0.5*N. 1/4 of the numbers will be less than 0.25*N and 1/4 of the numbers will be larger than 0.75*N, etc.
If you then multiply this by another random number (whose mean value is 0.5), you will end up with a random number with a mean value of 0.25*N and a median value of 0.187*N... So half your numbers are less than 0.187*N! 1/4 of the numbers will be under 0.0677*N! And only 1/4 of the numbers will be over 0.382*N. (Numbers obtained experimentally by looking at 1,000,000 random numbers generated as the product of two other random numbers, and analyzing them.)
This is probably not what you want.
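Here is a minimal sketch along the lines of that experiment (N = 1000 and the sample count are arbitrary choices):

import java.util.Arrays;
import java.util.Random;

// Sketch: measure the distribution of nextInt(N) * nextFloat().
public class ProductDistribution {
    public static void main(String[] args) {
        int N = 1000;
        int samples = 1_000_000;
        Random rnd = new Random();
        double[] values = new double[samples];
        double sum = 0;
        for (int i = 0; i < samples; i++) {
            values[i] = rnd.nextInt(N) * rnd.nextFloat();   // product of two random numbers
            sum += values[i];
        }
        Arrays.sort(values);
        System.out.printf("mean:   %.1f (expect about 0.25 * N)%n", sum / samples);
        System.out.printf("median: %.1f (expect about 0.187 * N)%n", values[samples / 2]);
    }
}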
First of all, Random in Java doesn't contain a rand() method. See the docs. I think you meant the Random.next() method. As for your question, the documentation says that nextFloat() is implemented like this:
public float nextFloat() {
    return next(24) / ((float)(1 << 24));
}
So you don't need to use anything else.
Random#nextFloat() will give you an evenly distributed number between 0 and 1.
If you take an even distribution and multiply it by N, you scale the distribution up evenly. So you get a random number between 0 and N evenly distributed.
If you multiply this by another random number between 0 and N, then you'll get an uneven distribution. Since multiplying by N gives you an even distribution between 0 and N, multiplying by a random number between 0 and N must give you a result that is less than or equal to what multiplying by N itself would give. So your numbers are smaller on average.
I want to generate a 160-bit prime number in Java. I know that I'll have to loop through candidate 160-bit numbers and, for any candidate n, check whether it is divisible by any prime less than sqrt(n), or use a primality test like Miller-Rabin. My questions are:
Is there any specific library which does this?
Is there any other (better) way to do this?
BigInteger.probablePrime(160, new Random()) generates a BigInteger that is almost certainly prime -- the probability that it is not a prime is less than the probability that you will get struck by lightning. In general, BigInteger already has heavily tested and optimized primality testing operations built in.
For what it's worth, the reason this won't take forever is that by the prime number theorem, a randomly chosen n-bit number has probability proportional to 1/n of being prime, so on average you only need to try O(n) different random n-bit numbers before you'll find one that's prime.
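A minimal usage sketch; SecureRandom here is a substitution appropriate when the prime feeds into key generation:

import java.math.BigInteger;
import java.security.SecureRandom;

public class Prime160 {
    public static void main(String[] args) {
        // probablePrime returns a BigInteger of the requested bit length that is
        // prime with overwhelming probability (per the Javadoc, the chance of a
        // composite result does not exceed 2^-100).
        BigInteger p = BigInteger.probablePrime(160, new SecureRandom());
        System.out.println(p + " (" + p.bitLength() + " bits)");
    }
}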
What is the most efficient way to compute the product
a * b^2 * c^3 * d^4 * e^5 * ...
assuming that squaring costs about half as much as multiplication? The number of operands is less than 100.
Is there a simple algorithm also for the case that the multiplication time is proportional to the square of operand length (as with java.math.BigInteger)?
The first (and only) answer is perfect w.r.t. the number of operations.
Funnily enough, when applied to sizable BigIntegers, this part doesn't matter at all. Even computing a·bb·ccc·dddd·eeeee one factor at a time without any optimizations takes about the same time.
Most of the time gets spent in the final multiplication (BigInteger implements none of the smarter algorithms like Karatsuba, Toom–Cook, or FFT, so the time is quadratic). What's important is assuring that the intermediate multiplicands are about the same size, i.e., given numbers p, q, r, s of about the same size, computing (pq) (rs) is usually faster than ((pq) r) s. The speed ratio seems to be about 1:2 for some dozens of operands.
Update
In Java 8, there are both Karatsuba and Toom–Cook multiplications in BigInteger.
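Here is a rough timing sketch for comparing the two multiplication orders on your own JVM (the operand sizes and count are arbitrary):

import java.math.BigInteger;
import java.util.Random;

public class BalancedProduct {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        BigInteger[] xs = new BigInteger[64];
        for (int i = 0; i < xs.length; i++) {
            xs[i] = new BigInteger(1 << 15, rnd);      // 64 factors of 32768 bits each
        }

        long t0 = System.nanoTime();
        BigInteger sequential = BigInteger.ONE;
        for (BigInteger x : xs) {
            sequential = sequential.multiply(x);       // ((a*b)*c)*d ... : operands grow lopsided
        }
        long t1 = System.nanoTime();
        BigInteger balanced = product(xs, 0, xs.length);   // (a*b)*(c*d) ... : similar sizes
        long t2 = System.nanoTime();

        System.out.printf("sequential: %d ms, balanced: %d ms, results equal: %b%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sequential.equals(balanced));
    }

    static BigInteger product(BigInteger[] xs, int lo, int hi) {
        if (hi - lo == 1) return xs[lo];
        int mid = (lo + hi) / 2;
        return product(xs, lo, mid).multiply(product(xs, mid, hi));
    }
}

The observed ratio will differ between JDK versions, since Java 8 added the subquadratic algorithms.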
I absolutely don't know if this is the optimal approach (although I think it is asymptotically optimal), but you can do it all in O(N) multiplications. You group the arguments of a * b^2 * c^3 like this: c * (c*b) * (c*b*a). In pseudocode:
result = 1
accum  = 1
for i in 1 .. n:                  # walk the arguments from last (highest exponent) to first
    accum  = accum * arg[n - i]   # arg is 0-indexed: arg[n-1], arg[n-2], ..., arg[0]
    result = result * accum
I think it is asymptotically optimal, because you have to use N-1 multiplications just to multiply N input arguments.
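For reference, the same grouping as a runnable Java sketch with BigInteger (the class and method names are made up for illustration):

import java.math.BigInteger;

public class RisingPowerProduct {
    // Computes arg[0]^1 * arg[1]^2 * ... * arg[n-1]^n with about 2n multiplications:
    // for (a, b, c) this evaluates c * (c*b) * (c*b*a) = a * b^2 * c^3.
    static BigInteger product(BigInteger[] arg) {
        BigInteger result = BigInteger.ONE;
        BigInteger accum = BigInteger.ONE;
        for (int i = arg.length - 1; i >= 0; i--) {   // last argument has the highest exponent
            accum = accum.multiply(arg[i]);           // accum = arg[n-1] * ... * arg[i]
            result = result.multiply(accum);
        }
        return result;
    }

    public static void main(String[] args) {
        BigInteger[] xs = { BigInteger.valueOf(2), BigInteger.valueOf(3), BigInteger.valueOf(5) };
        System.out.println(product(xs));   // 2^1 * 3^2 * 5^3 = 2250
    }
}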
As mentioned in the Oct 26 '12 edit:
With multiplication time superlinear in the size of the operands, it is advantageous to keep the operands of long operations similar in size (especially if the only Toom-Cook variant available is toom-2, i.e. Karatsuba). Short of a full optimisation, putting the operands in a queue that allows popping them in order of increasing (significant) length looks like a decent shot from the hip.
Then again, there are special cases: 0, powers of 2, multiplications where one factor is (otherwise) "trivial" ("long-by-single-digit multiplication", linear in sum of factor lengths).
And squaring is simpler/faster than general multiplication (question suggests assuming ½), which would suggest the following strategy:
in a pre-processing step, count trailing zeroes weighted by exponent
result 0 if encountering a 0
remove trailing zeroes, discard resulting values of 1
result 1 if no values left
find and combine values occurring more than once
set up a queue allowing extraction of the "shortest" number. For each pair (number, exponent), insert the factors exponentiation by squaring would multiply
optional: combine "trivial factors" (see above) and re-insert
Not sure how to go about this. Say factors of length 12 were trivial, and the initial factors are of length 1, 2, …, 10, 11, 12, …, n. Optimally, you combine 1+10, 2+9, … for 7 trivial factors from 12. Combining shortest-first gives 3, 6, 9, 12, for 8 from 12.
extract the shortest pair of factors, multiply, and re-insert (see the sketch after this list)
once there is just one number, the result is that with the zeroes from the first step tacked on
(If factorisation were cheap, it would have to happen quite early to get the most out of cheap squaring.)
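Here is a minimal sketch of just the shortest-first queue step (it leaves out the squaring and the special-case handling described above):

import java.math.BigInteger;
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Random;

// Sketch: keep the factors in a priority queue ordered by bit length and
// repeatedly multiply the two shortest, so the operands of long
// multiplications stay similar in size.
public class ShortestFirstProduct {
    public static void main(String[] args) {
        Random rnd = new Random();
        PriorityQueue<BigInteger> queue =
                new PriorityQueue<>(Comparator.comparingInt(BigInteger::bitLength));
        for (int i = 1; i <= 20; i++) {
            queue.add(new BigInteger(1000 * i, rnd));   // factors of varying length
        }

        while (queue.size() > 1) {
            queue.add(queue.poll().multiply(queue.poll()));   // combine the two shortest
        }

        System.out.println("result has " + queue.poll().bitLength() + " bits");
    }
}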
I am trying to write a program to find the largest prime factor of a very large number, and have tried several methods with varying success. All of the ones I have found so far have been unbelievably slow. I had a thought, and am wondering if this is a valid approach:
long number = input;
while (notPrime(number))
{
    number = number / getLowestDivisiblePrimeNumber();
}
return number;
This approach would take an input, and would do the following:
200 -> 100 -> 50 -> 25 -> 5 (return)
90 -> 45 -> 15 -> 5 (return)
It divides currentNum repeatedly by the smallest prime that divides it (most often 2 or 3) until currentNum itself is prime (no prime less than or equal to the square root of currentNum divides it), and assumes this is the largest prime factor of the original input.
Will this always work? If not, can someone give me a counterexample?
EDIT: By very large, I mean about 2^40, or 10^11.
The method will work, but will be slow. "How big are your numbers?" determines the method to use:
Less than 2^16 or so: Lookup table.
Less than 2^70 or so: Sieve of Atkin. This is an optimized version of the more well known Sieve of Eratosthenes. Edit: Richard Brent's modification of Pollard's rho algorithm may be better in this case.
Less than 10^50: Lenstra elliptic curve factorization
Less than 10^100: Quadratic Sieve
More than 10^100: General Number Field Sieve
This will always work because of the unique prime factorization theorem: dividing out the smallest prime factor each time leaves the product of the remaining, larger prime factors, so when the remainder is itself prime, it is the largest prime factor.
Certainly it will work (see Mark Byers' answer), but for "very large" inputs it may take far too long. You should note that your call to getLowestDivisiblePrimeNumber() conceals another loop, so this runs at O(N^2), and that depending on what you mean by "very large" it may have to work on BigNums which will be slow.
You could speed it up a little by noting that your algorithm never needs to check factors smaller than the last one found.
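Here is a minimal sketch of trial division with that speed-up built in; since the divisor only ever increases, each d that divides n is automatically prime (all smaller factors were already divided out):

public class LargestPrimeFactor {
    static long largestPrimeFactor(long n) {
        long largest = 1;
        for (long d = 2; d * d <= n; d++) {
            while (n % d == 0) {   // divide out every occurrence of d
                largest = d;
                n /= d;
            }
        }
        return n > 1 ? n : largest;   // whatever remains above 1 is the largest prime factor
    }

    public static void main(String[] args) {
        System.out.println(largestPrimeFactor(600851475143L));   // prints 6857
    }
}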
You are trying to find the prime factors of a number. What you are proposing will work, but will still be slow for large numbers.... you should be thankful for this, since most modern security is predicated on this being a difficult problem.
From a quick search I just did, the fastest known way to factor a number is by using the Elliptic Curve Method.
You could try throwing your number at this demo: http://www.alpertron.com.ar/ECM.HTM .
If that convinces you, you could try either stealing the code (that's no fun, they provide a link to it!) or reading up on the theory of it elsewhere. There's a Wikipedia article about it here: http://en.wikipedia.org/wiki/Lenstra_elliptic_curve_factorization but I'm too stupid to understand it. Thankfully, it's your problem, not mine! :)
The thing with Project Euler is that there is usually an obvious brute-force method to do the problem, which will take just about forever. As the questions become more difficult, you will need to implement clever solutions.
One way you can solve this problem is to use a loop that always finds the smallest (positive integer) factor of a number. When the smallest factor of a number is that number, then you've found the greatest prime factor!
Detailed Algorithm description:
You can do this by keeping three variables:
The number you are trying to factor (A)
A current divisor store (B)
A largest divisor store (C)
Initially, let (A) be the number you are interested in; in this case, it is 600851475143. Then let (B) be 2. Check whether (A) is divisible by (B). If it is divisible, divide (A) by (B), store (B) in (C) if it is larger than the current value of (C), reset (B) to 2, and go back to checking whether (A) is divisible by (B). Else, if (A) is not divisible by (B), increment (B) by 1 and check again. Run the loop until (A) is 1; the value in (C) will then be the largest prime divisor of 600851475143.
There are numerous ways you could make this more efficient: instead of incrementing to the next integer, you could increment to the next prime, and instead of keeping a largest-divisor store, you could just return the current number once its only divisor is itself. However, the algorithm described above will run in seconds regardless.
The implementation in Python (version 2) is as follows:
def lpf(x):
    lpf = 2
    while x > lpf:
        if x % lpf == 0:
            x = x / lpf   # strip off the smallest factor found (use x // lpf in Python 3)
            lpf = 2       # and start scanning from 2 again
        else:
            lpf += 1
    print("Largest Prime Factor: %d" % lpf)

def main():
    x = long(raw_input("Input long int:"))   # Python 2; in Python 3 use int(input(...))
    lpf(x)
    return 0

if __name__ == '__main__':
    main()
Example: Let's find the largest prime factor of 105 using the method described above.
Let (A) = 105. (B) = 2 (we always start with 2), and we don't have a value for (C) yet.
Is (A) divisible by (B)? No. Increment (B) by 1: (B) = 3. Is (A) divisible by (B)? Yes (105/3 = 35). The largest divisor found so far is 3, so let (C) = 3. Update (A) = 35. Reset (B) = 2.
Now, is (A) divisible by (B)? No. Increment (B) by +1: (B) = 3. Is (A) divisible by (B)? No. Increment (B) by +1: (B) = 4. Is (A) divisible by (B)? No. Increment (B) by +1: (B) = 5. Is (A) divisible by (B)? Yes. (35/5 = 7). The largest divisor we found previously is stored in (C). (C) is currently 3. 5 is larger than 3, so we update (C) = 5. We update (A)=7. We reset (B)=2.
Then we repeat the process for (A), but we will just keep incrementing (B) until (B) = (A), because 7 is prime and has no divisors other than itself and 1. (We could already stop when (B) > (A)/2, since a number cannot have an integer divisor greater than half of itself; the smallest possible divisor, other than 1, of any number is 2!)
So at that point we return (A) = 7.
Try doing a few of these by hand, and you'll get the hang of the idea.