Finding Big-O for a Rabin-Miller test - java

So I'm trying to find out what Big-O is for a Rabin-Miller test and I've done some research on it but I can't really find a good explanation for it. The thing that confuses me the most is this part:
while (!isPrime(n)) {
    n = new BigInteger(bit, new Random());
}
This is a piece of my main program where I keep generating a new number until I find a prime and then it exits the loop. How can I estimate Big-O when I don't know how many times the while loop will run?

Well, when calculating Big-O there are three things to do: find the best case, find the worst case, and find the average case.
In this instance, the worst-case Big-O would clearly be O(infinity), which is reached in the highly unlikely event that n is not initially prime and none of the newly generated values of n ever turn out to be prime either.
The best-case Big-O would be the same as the Big-O of your isPrime() method. This is because the best case is when n is initially a prime number, so the while loop never executes at all. One thing to note is that your while-loop condition does two things: it calls the isPrime() method and checks whether the resulting boolean is true. So to find the Big-O, you multiply the Big-O of the isPrime() method by the Big-O of the boolean check, which is O(1). Therefore your best case is the same as the Big-O of your isPrime() method, since 1*x = x. I do not know how you wrote your isPrime() method, so I cannot tell you the best-case Big-O.
The average case, however, is harder to find here because you're dealing with random numbers. Since we are dealing with random numbers, the average case can be worked out using something called expected analysis. To do this, however, we need to know the range of the random numbers. The Java API says that the BigInteger constructor you're using generates numbers uniformly in the range 0 to 2^bit - 1, inclusive (non-negative numbers only). Since I don't know what the value of bit is in your code, I cannot give you the average case, but I hope you'll be able to calculate that yourself.
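For what it's worth, here is a rough sketch of my own (not from your question) for checking that estimate empirically. It assumes the prime number theorem's density of roughly 1 / ln(2^bit) for primes below 2^bit, and uses isProbablePrime as a stand-in for your isPrime(), so the expected number of loop iterations is roughly bit * ln 2:

import java.math.BigInteger;
import java.util.Random;

// Rough sketch: counts how many candidates are generated before one is prime.
// isProbablePrime(40) stands in for the question's isPrime() (an assumption),
// and a single run can vary a lot around the expectation.
static long countTrials(int bit) {
    Random rnd = new Random();
    BigInteger n = new BigInteger(bit, rnd);   // uniform in [0, 2^bit - 1]
    long trials = 1;
    while (!n.isProbablePrime(40)) {
        n = new BigInteger(bit, rnd);
        trials++;
    }
    System.out.println("expected about " + Math.round(bit * Math.log(2))
            + " trials, observed " + trials);
    return trials;
}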
If you have any questions just ask!

Related

Complexity in the best, average, and worst cases

I created this method in Java that indicates whether an integer array is sorted or not. What is its complexity? I think the best case is O(1) and the worst case is O(n); what is the average case?
static boolean order(int[] a) {
    for (int i = 0; i < a.length - 1; i++) {
        if (a[i] > a[i + 1]) return false;
    }
    return true;
}
You didn't say anything about your input, so suppose it's totally random. Then for any pair of neighbouring elements there is a 50% chance that they are in order. That means we make at least 1 comparison with probability 1, at least 2 with probability 0.5, at least 3 with probability 0.25, and in general at least k with probability 2^(1-k). Let's calculate the expected number of comparisons:
E = sum over k >= 1 of 2^(1-k) = 1 + 1/2 + 1/4 + ...
I didn't know how to calculate the sum of this series by hand, so I used Wolfram Alpha and got the answer: 2, so it's a constant.
So as I understand it, the average case for random input is O(1).
I'm not sure this is the correct way to calculate average complexity, but it seems fine to me.
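As a quick sanity check (my own sketch, not part of the answer above), you can measure how many comparisons order() makes on average for random arrays. The measured average is a small constant and, more importantly, it does not grow with n, which is exactly what the O(1) claim means:

import java.util.Random;

// Empirical check: average number of comparisons the order() loop performs
// on random input, averaged over many trials.
public class AverageComparisons {
    public static void main(String[] args) {
        Random rnd = new Random();
        int trials = 100000, n = 1000;
        long comparisons = 0;
        for (int t = 0; t < trials; t++) {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = rnd.nextInt();
            for (int i = 0; i < a.length - 1; i++) {   // same loop as order()
                comparisons++;
                if (a[i] > a[i + 1]) break;            // where order() returns false
            }
        }
        System.out.println("average comparisons: " + (double) comparisons / trials);
    }
}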
Complexity is usually quoted in worst case, which in your case is O(n).

Debugging of a recursive algorithm

My question is whether there are some smart ways of debugging complicated recursive algorithms.
Assume that we have a complicated one (not a simple case where a recursion counter is decreased on each nested call).
I mean something like recursive traversal of a graph where loops are possible.
I need to check that I am not getting into an endless loop somewhere, and doing this with just a debugger gives no certain answer (because I cannot tell whether the algorithm is in an endless loop or is just processing as it should).
It's hard to explain without a concrete example. But what I need is...
'to check that endless loops don't occur in, let's say, a complicated recursive algorithm'.
You need to form a theory for why you think the algorithm does terminate. Ideally, prove the theory as a mathematical theorem.
You can look for a function of the problem state that decreases on each recursive call. For example, see the following discussion of Ackermann's function, from Wikipedia:
It may not be immediately obvious that the evaluation of A(m, n) always terminates. However, the recursion is bounded because in each recursive application either m decreases, or m remains the same and n decreases. Each time that n reaches zero, m decreases, so m eventually reaches zero as well. (Expressed more technically, in each case the pair (m, n) decreases in the lexicographic order on pairs, which is a well-ordering, just like the ordering of single non-negative integers; this means one cannot go down in the ordering infinitely many times in succession.) However, when m decreases there is no upper bound on how much n can increase — and it will often increase greatly.
That is the type of reasoning you should be thinking of applying to your algorithm.
If you cannot find any way to prove your algorithm terminates, consider looking for a variation whose termination you can prove. It is not always possible to decide whether an arbitrary program terminates or not. The trick is to write algorithms you can prove terminate.
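For the graph-traversal case mentioned in the question, the usual decreasing measure is the number of vertices not yet visited. A minimal sketch of my own (not the asker's code) that makes this explicit with a visited set:

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// The visited set grows on every call, so the set of unvisited vertices strictly
// shrinks - that is the decreasing measure that guarantees termination even when
// the graph contains cycles.
static void dfs(Map<Integer, List<Integer>> graph, int node, Set<Integer> visited) {
    if (!visited.add(node)) {
        return;   // already seen: a cycle cannot recurse forever
    }
    for (int next : graph.getOrDefault(node, Collections.emptyList())) {
        dfs(graph, next, visited);
    }
}

Calling dfs(graph, start, new HashSet<>()) then visits each vertex at most once.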
Best is proving finiteness with pre- and postconditions, variants and invariants. If you can specify a (virtual) expression whose value strictly decreases on every call (and cannot decrease forever), you have a guarantee.
This is the same as proving loops to be finite. Furthermore, it might make complex algorithms more tractable.
You need to count the depth of the recursive calls ... and then throw an exception if that depth reaches a certain threshold.
For example:
void TheMethod(object[] otherParameters, int recursiveCallDepth)
{
    if (recursiveCallDepth > 100) {
        throw new Exception("....");
    }
    TheMethod(otherParameters, ++recursiveCallDepth);
}
If you want to check for endless loops, write a System.out.println("no its not endless"); on the line right after the call to the recursive function.
If the recursion were endless, this statement would never be printed; otherwise you will see the output.
One suggestion is the following:
If you have an endless loop, then in the graph case you will obtain a path with more vertices than the total number of vertices in the graph. Assuming that the number of vertices in the graph is a global variable (which, I think, is the most common case), you can set a conditional breakpoint at the beginning of the recursion that fires when the depth is already above the total number of vertices.
Here is a link describing how to set conditional breakpoints for Java in Eclipse.

Finding a prime number at least 100 digits long that contains 273042282802155991

I am new to Java and one of my class assignments is to find a prime number at least 100 digits long that contains the digit sequence 273042282802155991.
I have this so far, but when I compile and run it, it seems to be stuck in a continuous loop.
I'm not sure if I've done something wrong.
public static void main(String[] args) {
    BigInteger y = BigInteger.valueOf(304877713615599127L);
    System.out.println(RandomPrime(y));
}

public static BigInteger RandomPrime(BigInteger x)
{
    BigInteger i;
    for (i = BigInteger.valueOf(2); i.compareTo(x) < 0; i.add(i)) {
        if ((x.remainder(i).equals(BigInteger.ZERO))) {
            x.divide(i).equals(x);
            i.subtract(i);
        }
    }
    return i;
}
Since this is homework ...
There is a method on BigInteger that tests for primality. This is much, much faster than attempting to factorize a number. (If you take an approach that involves attempting to factorize 100-digit numbers you will fail. Factorization is believed to be computationally hard; certainly, there is no known polynomial-time solution.)
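The method being referred to is presumably BigInteger.isProbablePrime(certainty), where the chance of a composite being reported as prime is at most 2^-certainty. A tiny example:

import java.math.BigInteger;

public class PrimeCheck {
    public static void main(String[] args) {
        // 2^127 - 1, a known Mersenne prime
        BigInteger candidate = new BigInteger("170141183460469231731687303715884105727");
        // error probability for a composite is at most 2^-64 with certainty 64
        System.out.println(candidate.isProbablePrime(64));   // prints true
    }
}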
The question is asking for a prime number that contains a given sequence of digits when it is represented as a sequence of decimal digits.
The approach of generating "random" primes and then testing whether they contain those digits is infeasible. (Some simple high-school maths tells you that the probability that a randomly generated 100-digit number contains a given 18-digit sequence is roughly 82 / 10^18. And you haven't tested for primality yet ...)
But there's another way to do it ... think about it!
Only start writing code once you've figured out in your head how your algorithm will work, and done the mental estimates to confirm that it will give an answer in a reasonable length of time.
When I say infeasible, I mean infeasible for you. Given a large enough number of computers, enough time and some high-powered mathematics, it may be possible to do some of these things. Thus, technically they may be computationally feasible. But they are not feasible as a homework exercise. I'm sure that the point of this exercise is to get you to think about how to do this the smart way ...
One tip is that these statements do nothing:
x.divide(i).equals(x);
i.subtract(i);
Same with part of your for loop:
i.add(i)
They don't modify the instances themselves, but return new values - values that you're failing to check and do anything with. BigIntegers are "immutable". They can't be changed - but they can be operated upon and return new values.
If you actually wanted to do something like this, you would have to do:
i = i.add(i);
Also, why would you subtract i from i? Wouldn't you always expect this to be 0?
You need to implement/use the Miller-Rabin algorithm. See the Handbook of Applied Cryptography, chapter 4.24:
http://www.cacr.math.uwaterloo.ca/hac/about/chap4.pdf
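For completeness, here is my own rough sketch of the Miller-Rabin test on top of BigInteger (it is not the HAC reference pseudocode and not tuned for production use):

import java.math.BigInteger;
import java.util.Random;

// Rough sketch of the Miller-Rabin test. Write n - 1 = d * 2^s with d odd,
// then check random bases; a single "witness" proves n composite.
public class MillerRabin {
    public static boolean isProbablyPrime(BigInteger n, int rounds, Random rnd) {
        BigInteger two = BigInteger.valueOf(2);
        if (n.compareTo(BigInteger.valueOf(3)) <= 0) return n.compareTo(BigInteger.ONE) > 0; // 2, 3
        if (!n.testBit(0)) return false;                      // even and > 2: composite

        BigInteger nMinusOne = n.subtract(BigInteger.ONE);
        int s = nMinusOne.getLowestSetBit();                  // n - 1 = d * 2^s
        BigInteger d = nMinusOne.shiftRight(s);

        for (int r = 0; r < rounds; r++) {
            BigInteger a;                                     // random base in [2, n - 2]
            do {
                a = new BigInteger(n.bitLength(), rnd);
            } while (a.compareTo(two) < 0 || a.compareTo(n.subtract(two)) > 0);

            BigInteger x = a.modPow(d, n);
            if (x.equals(BigInteger.ONE) || x.equals(nMinusOne)) continue;

            boolean composite = true;
            for (int i = 1; i < s; i++) {
                x = x.multiply(x).mod(n);
                if (x.equals(nMinusOne)) { composite = false; break; }
            }
            if (composite) return false;                      // witness found
        }
        return true;                                          // probably prime
    }
}

In practice you would normally just call BigInteger.isProbablePrime, which already implements a probabilistic primality test for you.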

Repetition algorithm problem, Java

Write an algorithm that reads an undetermined number of values m, all positive integers, one at a time. If m is even, determine how many divisors it has and print that information. If m is odd, compute and print the factorization of m.
How do I do that? I'm entirely confused by this problem and need some pointers.
You need to have a loop which repeatedly calls a method readAndWorkWithNumber().
This method
reads a number m (might call another method to do this)
checks if m is odd or even
if odd, calls factorize(m).
if even, calls countFactors(m).
The last two methods should then do what their names say, and output the result. (Alternatively, return it and output it in readAndWorkWithNumber.)
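A minimal sketch of that structure (my own code, with assumed details such as reading from System.in and stopping when the input runs out or a non-positive value appears):

import java.util.Scanner;

public class Repetition {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        while (readAndWorkWithNumber(in)) {
            // keep going until the input ends or a non-positive value is read
        }
    }

    static boolean readAndWorkWithNumber(Scanner in) {
        if (!in.hasNextInt()) return false;
        int m = in.nextInt();
        if (m <= 0) return false;                 // assumed stop condition
        if (m % 2 == 0) {
            System.out.println(m + " has " + countFactors(m) + " divisors");
        } else {
            System.out.println(m + " = " + factorize(m));
        }
        return true;
    }

    static int countFactors(int m) {
        int count = 0;
        for (int d = 1; d <= m; d++) {
            if (m % d == 0) count++;
        }
        return count;
    }

    static String factorize(int m) {
        StringBuilder sb = new StringBuilder();
        for (int p = 2; m > 1; p++) {
            while (m % p == 0) {
                sb.append(sb.length() == 0 ? "" : " * ").append(p);
                m /= p;
            }
        }
        return sb.length() == 0 ? "1" : sb.toString();
    }
}

For example, the input 12 9 0 prints that 12 has 6 divisors and that 9 = 3 * 3, then stops.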

BigO running time on some methods

OK, these are all pretty simple methods, and there are a few of them, so I didn't want to create multiple questions when they are all about the same thing. Big-O is my weakness. I just can't figure out how they come up with these answers. Is there any way you can give me some insight into your thinking for analyzing running times of some of these methods? How do you break it down? How should I think when I see something like these? (Specifically the second one, I don't get how that's O(1).)
function f1:
loop 3 times
loop n times
Therefore O(3*n) which is effectively O(n).
function f2:
loop 50 times
O(50) is effectively O(1).
We know it will loop 50 times because the loop keeps taking away n/50 until nothing is left; that takes exactly 50 iterations, since n - (n/50)*50 = 0.
function f3:
loop n times
loop n times
Therefore O(n^2).
function f4:
recurse n times
You know this because the worst case is n = high - low + 1. Disregard the +1.
That means that n = high - low.
To terminate, arr[hi] * arr[low] > 10 must hold.
Assume that this doesn't occur until low has been incremented all the way up to high.
Since low starts at 0, that means n = high - 0 and we must recurse up to n times.
function f5:
loops ceil(log_2(n)) times
We know this because of the m/=2.
For example, let n = 10. log_2(10) ≈ 3.3, the ceiling of which is 4.
10 / 2 = 5
5 / 2 = 2.5
2.5 / 2 = 1.25
1.25 / 2 = 0.625
In total, there are 4 iterations.
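The original functions aren't reproduced here, so purely as an assumption, loops with the shapes described for f2 and f5 would look something like this:

// Assumed shapes only - not the code from the question.

// f2-like: i advances by n/50 each pass, so the body runs 50 times no matter how
// large n is, which is why the analysis calls it O(1). (Note: if n < 50, then
// n/50 == 0 and the loop never terminates - see the infinite-loop remark below.)
static void f2Like(int n) {
    for (int i = 0; i < n; i += n / 50) {
        // constant work per iteration
    }
}

// f5-like: m is halved each pass, so the loop runs about log2(n) times, i.e. O(log n).
static void f5Like(int n) {
    for (int m = n; m >= 1; m /= 2) {
        // constant work per iteration
    }
}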
You get an n^2 analysis when performing a loop within a loop, such as the third method.
However, the first method doesn't get an n^2 timing analysis because the first loop is defined as running only three times. This makes the timing for the first one 3n, but we don't care about constant factors for Big-O.
The second one introduces an interesting case where, despite having a single loop, the timing analysis is still O(1). This is because if you were to chart the time it takes to perform this method, it wouldn't grow with n the way an O(n) method does; the runtime stays flat, which becomes obvious for larger numbers.
For the fourth method, you have O(n) timing because your recursive function call is passing lo + 1. This is similar to using a for loop and incrementing with lo++/++lo.
The last one has O(log n) timing because you're dividing your variable by two each time. Just remember that anything that reminds you of a binary search will have log n timing.
There is also another trick to timing analysis. Say you had a loop within a loop, and within each of the two loops you were reading lines from a file or popping elements off a stack. This would actually only be an O(n) method, because a file only has a certain number of lines you can read, and a stack only has a certain number of elements you can pop off.
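A small illustration of that last point (my own example, not from the question): the loops are nested, but the inner loop can only pop each element once in total, so the whole method is O(n) rather than O(n^2):

import java.util.ArrayDeque;
import java.util.Deque;

// Nested loops, but each element is pushed once and popped at most once,
// so the total work done by the inner loop across all outer iterations is O(n).
static void drain(int[] items) {
    Deque<Integer> stack = new ArrayDeque<>();
    for (int x : items) stack.push(x);

    for (int i = 0; i < items.length; i++) {                  // outer loop
        while (!stack.isEmpty() && stack.peek() % 2 == 0) {   // inner loop
            stack.pop();                                      // bounded by total pushes
        }
        if (!stack.isEmpty()) stack.pop();
    }
}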
The general idea of big-O notation is this: it gives a rough answer to the question "If you're given a set of N items, and you have to perform some operation repeatedly on these items, how many times will you need to perform this operation?" I say a rough answer, because it (most of the time) doesn't give a precise answer of "5*N+35", but just "N". It's like a ballpark. You don't really care about the precise answer, you just want to know how bad it will get when N gets large. So answers like O(N), O(N*N), O(logN) and O(N!) are typical, because they each represent sort of a "class" of answers, which you can compare to each other. An algorithm with O(N) will perform way better than an algorithm with O(N*N) when N gets large enough, it doesn't matter how lengthy the operation is itself.
So I break it down thus: First identify what the N will be. In the examples above it's pretty obvious - it's the size of the input array, because that determines how many times we will loop. Sometimes it's not so obvious, and sometimes you have multiple input data, so instead of just N you also get M and other letters (and then the answer is something like O(N*M*M)).
Then, when I have my N figured out, I try to identify the loop which depends on N. Actually, these two things often get identified together, as they are pretty much tied together.
And, lastly of course, I have to figure out how many iterations the program will make depending on N. And to make it easier, I don't really try to count them exactly, I just try to recognize the typical answers - O(1), O(N), O(N*N), O(logN), O(N!) or perhaps some other power of N. O(N!) is actually pretty rare, because it's so inefficient that implementing it would be pointless.
If you get an answer of something like N*N+N+1, then just discard the smaller terms, because, again, when N gets large, the others don't matter anymore. And ignore any fixed number of repetitions of the operation: O(5*N) is the same as O(N), because it's the ballpark we're looking for.
Added: as asked in the comments, here is the analysis of the first two methods.
The first one is easy. There are only two loops, the inner one is O(N), and the outer one just repeats it 3 times. So it's still O(N). (Remember: O(3N) = O(N).)
The second one is tricky. I'm not really sure about it. After looking at it for a while I understood why it loops at most 50 times. Since this is not dependent on N at all, it counts as O(1). However, if you were to pass it, say, an array of only 10 items, all positive, it would go into an infinite loop. That's O(∞), I guess. So which one is it? I don't know...
I don't think there's a formal way of determining the big-O number for an algorithm. It's like the halting problem. In fact, come to think of it, if you could universally determine the big-O for a piece of code, you could also determine if it ever halts or not, thus contradicting the halting problem. But that's just my musings.
Typically I just go by... dunno, sort of a "gut feeling". Once you "get" what the Big-O represents, it becomes pretty intuitive. But for complicated algorithms it's not always possible to determine. Take Quicksort for example. On average it's O(N*logN), but depending on the data it can degrade to O(N*N). The questions you'll get on the test though should have clear answers.
The second one is 50 because big O is a function of the length of the input. That is, if the input size changes from 1 million to 1 billion, the runtime should increase by a factor of 1,000 if the function is O(N) and by a factor of 1 million if it's O(n^2). However, the second function runs in time 50 regardless of the input length, so it's O(1). Technically it would be O(50), but constants don't matter for big O.
