Integer range in square root algorithm from the Cracking the Coding Interview book - java

There is an algorithm in Java for computing an integer square root in the Cracking the Coding Interview book, shown below:
int sqrt(int n) {
    return sqrt_helper(n, 1, n);
}
int sqrt_helper(int n, int min, int max) {
    if (max < min) return -1;
    int guess = (min + max) / 2;
    if (guess * guess == n) {
        return guess;
    } else if (guess * guess < n) {
        return sqrt_helper(n, guess + 1, max);
    } else {
        return sqrt_helper(n, min, guess - 1);
    }
}
The question is:
Since min and max are ints, they can take any value in the integer range, e.g. max = Integer.MAX_VALUE.
So how can we avoid worrying about guess = (min + max) / 2 overflowing the allowed range, and about guess * guess doing the same?

There are simple ways of getting around that problem (like min + (max - min) / 2).
The more serious integer overflow problem is guess * guess. You could change the test to compare guess with n / guess, which is slower but normally won't overflow. Or you could use a bit hack to find a better starting point (clz is useful here, if you have it), since you should be able to find a guess whose square is within the range of representable integers.
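Putting those two fixes together, here is a minimal sketch (my own rewrite in the book's style, not code from the book, assuming n >= 1 so the division by guess is safe). The extra n % guess check is needed because integer division rounds down:
static int sqrt(int n) {
    return sqrtHelper(n, 1, n);
}

static int sqrtHelper(int n, int min, int max) {
    if (max < min) return -1;
    int guess = min + (max - min) / 2;   // cannot overflow, unlike (min + max) / 2
    if (guess > n / guess) {             // means guess * guess > n, tested without overflow
        return sqrtHelper(n, min, guess - 1);
    } else if (guess == n / guess && n % guess == 0) {
        return guess;                    // exact integer square root
    } else {                             // remaining case: guess * guess < n
        return sqrtHelper(n, guess + 1, max);
    }
}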
An interviewer might be even more impressed if you were able to provide the Newton-Raphson algorithm, which converges extremely rapidly.
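For reference, a rough Newton-Raphson sketch for the integer case (my own illustration, not code from the book or this answer): starting from any x >= sqrt(n), the iteration x <- (x + n/x) / 2 decreases monotonically to floor(sqrt(n)).
static int newtonSqrt(int n) {
    if (n <= 0) return (n == 0) ? 0 : -1;
    long x = n;                           // any start >= sqrt(n) works
    long next = (x + n / x) / 2;
    while (next < x) {                    // strictly decreasing until floor(sqrt(n)) is reached
        x = next;
        next = (x + n / x) / 2;
    }
    return (x * x == n) ? (int) x : -1;   // mirror the book's -1 for non-perfect squares
}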

Since you mention "Cracking the Coding Interview"...
Typically in the context of the average coding interview one wouldn't worry about implementation-specific details like this. The interviewer is trying to confirm basic competency and understanding - they'll rarely want to run your code, and it should be a given for both of you that the algorithm will break down at the extreme limits of your language's basic data types. If the interviewer asks specifically about limitations, then you could briefly mention that in this language the function will fail once intermediate values such as min + max or guess * guess exceed Integer.MAX_VALUE.
The limitation will apply to almost any algorithm you write for a coding interview, and no reasonable interviewer would expect you to specifically design your solution to mitigate this kind of edge case. I would find it extremely off-putting if I asked a candidate to write a function that produces Fibonacci numbers and they spent time trying to optimize the case where the output exceeds 16 digit values.
If for some reason you needed to find the square root of extremely large values using this algorithm in a real life scenario, I'd expect you'd have to implement it using a generic big number library for your particular language. That being said, I wouldn't roll my own square root algorithm for any real use case under almost any circumstance.

You'd have to confirm your language. But in pseudo-code you can do something like:
int guess = ((min.ToBigInt() + max.ToBigInt()) / 2).ToInt()

Related

Calculate nCr(n,m) mod k for large n efficiently

I need to calculate nCr(n,m) % k for large n (n <= 10^7) efficiently.
Here is my try:
int choose(int n, int m, int k) {
    if (n == m || m == 0)
        return 1 % k;
    return (choose(n-1, m-1, k) + choose(n-1, m, k)) % k;
}
It calculates nCr(n,m) % k by exploiting Pascal's identity.
This is too inefficient for large n (try choose(100, 12, 223092870)), and I'm not sure whether it can be sped up by memoization or whether some totally different number-theoretic approach is necessary.
I need this to execute efficiently for large numbers, which is why I'm not sure memoization is the solution.
Note: k doesn't have to be a prime!
Since nPr has an explicit formula, nPr(n, m) = n!/(n-m)!, you should definitely try to use that instead. My tips would be:
Remember that n! = n*(n-1)*...*2*1
Notice that a while loop (yes, loop, not recursion ^^) could greatly optimize the calculation (the division cancels out lots of factors, leaving you with a multiplication nPr(n, m) = n*(n-1)*...*(n-m+2)*(n-m+1))
Finally, you should calculate the modulo after calculating nPr(n, m), to avoid redundant modulo operations.
If it helps, you could try formulating a loop invariant, which is pretty much a statement that should be true for all valid values of n and m.
Hope this helped :)
EDIT
I realized you said nCr after I wrote my answer. For nCr, you could add another while-loop after calculating nPr, that simply calculates m!, divide nPr by m!, and then modulo THAT answer instead. All in all, this would yield an O(n) algorithm, which is pretty scalable. It uses very little memory as well.
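A sketch of that loop-based idea (hedged: the divide-as-you-go detail is mine, where the answer above divides by m! at the end; either way this only works while the intermediate values fit in a long, so it does not reach the question's n = 10^7):
static long chooseMod(long n, long m, long k) {
    if (m > n - m) m = n - m;              // C(n, m) == C(n, n-m)
    long result = 1;
    for (long i = 1; i <= m; i++) {
        // after this step, result == C(n-m+i, i), which is always an exact integer
        result = result * (n - m + i) / i;
    }
    return result % k;
}
With this, the choose(100, 12, 223092870) example from the question returns instantly.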
This comes up now and then in programming competitions; one common way of solving it is using Lucas' theorem and the Chinese Remainder Theorem.
@DAle posted a useful resource with the details: http://fishi.devtail.io/weblog/2015/06/25/computing-large-binomial-coefficients-modulo-prime-non-prime/
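For the prime-modulus building block, here is a minimal sketch of Lucas' theorem (my own illustration of the linked technique; the non-prime case additionally needs the prime-power and CRT machinery described in the article). C(n, m) mod p is the product of C(n_i, m_i) mod p over the base-p digits n_i, m_i of n and m:
static long lucas(long n, long m, long p) {
    long result = 1;
    while (m > 0) {
        result = result * smallChoose(n % p, m % p, p) % p;
        n /= p;
        m /= p;
    }
    return result;
}

// C(a, b) mod p for digits a, b < p, via the multiplicative formula;
// fine for small p, while a large p would call for precomputed factorials
static long smallChoose(long a, long b, long p) {
    if (b > a) return 0;
    long num = 1, den = 1;
    for (long i = 0; i < b; i++) {
        num = num * ((a - i) % p) % p;
        den = den * ((i + 1) % p) % p;
    }
    return num * modPow(den, p - 2, p) % p;  // Fermat inverse, valid since p is prime
}

static long modPow(long base, long exp, long mod) {
    long result = 1;
    base %= mod;
    while (exp > 0) {
        if ((exp & 1) == 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}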

Modular arithmetic: Division over factorials % Prime

I want to efficiently calculate ((X+Y)!/(X!Y!)) % P (where P is a prime, like 10^9+7).
This discussion gives some insights on distributing modulo over division.
My concern is that a modular inverse does not always exist for a given number.
Basically, I am looking for a code implementation of solving this problem.
For multiplication it is very straightforward:
public static int mod_mul(int Z, int X, int Y, int P)
{
    // Z = (X+Y); this computes Z! % P, where P is the prime
    long result = 1;
    while (Z > 1)
    {
        result = (result * Z) % P;
        Z--;
    }
    return (int) result;
}
I also realize that many factors can be cancelled in the division (before taking the modulus), but as the number of divisors increases, I'm finding it difficult to come up with an efficient algorithm for the division (looping over List(factors(X) + factors(Y) + ...) to see which factors divide the current multiplying factor of the numerator).
Edit: I don't want to use BigInt solutions.
Is there any Java/Python based solution, or any standard algorithm/library, for cancellation of factors (in case the inverse option is not fool-proof) or for approaching this type of problem?
((X+Y)!/(X!Y!)) is a low-level way of spelling a binomial coefficient ((X+Y)-choose-X). And while you didn't say so in your question, a comment in your code implies that P is prime. Put those two together, and Lucas's theorem applies directly: http://en.wikipedia.org/wiki/Lucas%27_theorem.
That gives a very simple algorithm based on the base-P representations of X+Y and X. Whether BigInts are required is impossible to guess because you didn't give any bounds on your arguments, beyond that they're ints. Note that your sample mod_mul code may not work at all if, e.g., P is greater than the square root of the maximum int (because result * Z may overflow then).
It's binomial coefficients - C(x+y,x).
You can calculate it differently C(n,m)=C(n-1,m)+C(n-1,m-1).
If you are OK with time complexity O(x*y), the code will be much simpler.
http://en.wikipedia.org/wiki/Combination
What you need here is a way to do it efficiently:
C(n,k) = C(n-1,k) + C(n-1,k-1)
Use dynamic programming to calculate this efficiently in a bottom-up fashion:
C(n,k) % P = ((C(n-1,k)) % P + (C(n-1,k-1)) % P) % P
Therefore F(n,k) = (F(n-1,k) + F(n-1,k-1)) % P
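A hedged sketch of that bottom-up DP (my own code, affordable when n*k is modest, matching the O(x*y) comment above):
static int binomialMod(int n, int k, int p) {
    int[][] f = new int[n + 1][k + 1];
    for (int i = 0; i <= n; i++) {
        f[i][0] = 1 % p;                    // C(i, 0) = 1
        for (int j = 1; j <= Math.min(i, k); j++) {
            // F(i, j) = (F(i-1, j) + F(i-1, j-1)) % P
            f[i][j] = (f[i - 1][j] + f[i - 1][j - 1]) % p;
        }
    }
    return f[n][k];
}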
Another, faster approach:
C(n,k) = C(n-1,k-1) * n / k
F(n,k) = ((F(n-1,k-1) * n) % P * inv(k) % P) % P
inv(k) % P means the modular inverse of k.
Note: evaluate C(n,n-k) when n-k < k, because C(n,n-k) = C(n,k).
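The inv(k) % P piece can be implemented with Fermat's little theorem when P is prime and k is not a multiple of P (true here, since P = 10^9+7): inv(k) = k^(P-2) mod P. A small sketch (my own code, not from the answer):
static long modInverse(long k, long p) {
    return modPow(k % p, p - 2, p);        // k^(p-2) mod p, by Fermat's little theorem
}

static long modPow(long base, long exp, long mod) {
    long result = 1;
    base %= mod;
    while (exp > 0) {
        if ((exp & 1) == 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}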

Complexity of algorithm implementing Newton's method in finding square root

I have written a Java program to calculate the square root of a user-defined number using Newton's method. The main operations of the algorithm go like this:
answer = guess - ((guess * guess - inputNumber) / (2 * guess));
while (Math.abs(answer * answer - inputNumber) > leniency) {
    guess = answer;
    answer = guess - ((guess * guess - inputNumber) / (2 * guess));
}
I'm now seeking to find the complexity of the algorithm (yup it's homework), and have read up from here that the time complexity of Newton's method is O(log(n) * F(x)).
However, from the above code snippet, I have interpreted the time complexity to be:
O(1 + ∑_{i=1..n} 1) = O(1 + n) = O(n)
Not sure what I'm getting wrong here, but I can't seem to understand the disparity in big Os even after reading wiki's explanation.
Also, I am assuming that "complexity of algorithm" is synonymous to "time complexity". Is it right to do so?
Would really appreciate help in explaining this paradox, as I'm a newbie student with a few 'touch and go' programming modules worth of background.
Thanks in advance :)
The problem is that you actually know nothing about n in your calculation - you don't say what it should be. When you calculate the actual error of the next iteration of the algorithm (do it!), you'll see that, e.g., if the input a is at least 1 and the error is less than 1, you basically double the number of correct digits every iteration. So to get p decimal places, you have to perform log(p) iterations.
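A tiny demonstration of that doubling (my own snippet, not the answer's): printing the error after each Newton step for sqrt(2) shows the exponent roughly doubling each iteration.
public class NewtonDemo {
    public static void main(String[] args) {
        double inputNumber = 2.0;
        double guess = 2.0;
        for (int iter = 1; iter <= 5; iter++) {
            guess = guess - ((guess * guess - inputNumber) / (2 * guess));
            System.out.printf("iter %d: error = %.2e%n", iter, Math.abs(guess - Math.sqrt(2.0)));
        }
        // errors shrink like e, e^2, e^4, ...: quadratic convergence,
        // so the iteration count depends on the digits wanted, not on n
    }
}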

library for integer factorization in java or scala

There are a lot of questions about how to implement factorization, however for production use, I would rather use an open source library to get something efficient and well tested right away.
The method I am looking for looks like this:
static int[] getPrimeFactors(int n)
it would return {2,2,3} for n=12
A library may also have an overload for handling long or even BigInteger types
The question is not about a particular application; it is about having a library which handles this problem well. Many people argue that different implementations are needed depending on the range of the numbers; in this regard, I would expect the library to select the most reasonable method at runtime.
By efficient I don't mean "world's fastest" (I would not work on the JVM for that...), I just mean dealing with the int and long range within a second rather than an hour.
It depends what you want to do. If your needs are modest (say, you want to solve Project Euler problems), a simple implementation of Pollard's rho algorithm will find factors up to ten or twelve digits instantly; if that's what you want, let me know, and I can post some code. If you want a more powerful factoring program that's written in Java, you can look at the source code behind Dario Alpern's applet; I don't know about a test suite, and it's really not designed with an open api, but it does have lots of users and is well tested. Most of the heavy-duty open-source factoring programs are written in C or C++ and use the GMP big-integer library, but you may be able to access them via your language's foreign function interface; look for names like gmp-ecm, msieve, pari or yafu. If those don't satisfy you, a good place to ask for more help is the Mersenne Forum.
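For a flavour of what such a Pollard's rho implementation looks like, here is a hedged toy sketch (my own code, not the answer's offer, assuming n >= 1): it strips factors of 2, splits composites with Floyd-cycle rho, and leans on BigInteger.isProbablePrime to stay short. It is a demonstration, not production code.
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

static List<Long> primeFactors(long n) {
    List<Long> factors = new ArrayList<>();
    while (n % 2 == 0) { factors.add(2L); n /= 2; }   // rho needs an odd number
    factorInto(n, factors);
    Collections.sort(factors);
    return factors;
}

static void factorInto(long n, List<Long> factors) {
    if (n == 1) return;
    if (BigInteger.valueOf(n).isProbablePrime(30)) { factors.add(n); return; }
    long d = rho(n);                                   // some non-trivial factor
    factorInto(d, factors);
    factorInto(n / d, factors);
}

// Floyd-cycle rho with f(x) = x^2 + c mod n; retries with a new c on failure
static long rho(long n) {
    for (long c = 1; ; c++) {
        long x = 2, y = 2, d = 1;
        while (d == 1) {
            x = step(x, c, n);
            y = step(step(y, c, n), c, n);
            d = gcd(Math.abs(x - y), n);
        }
        if (d != n) return d;
    }
}

static long step(long x, long c, long n) {
    // (x * x + c) mod n via BigInteger, since x * x can overflow a long
    return BigInteger.valueOf(x).multiply(BigInteger.valueOf(x))
            .add(BigInteger.valueOf(c)).mod(BigInteger.valueOf(n)).longValue();
}

static long gcd(long a, long b) { return b == 0 ? a : gcd(b, a % b); }
primeFactors(12) returns [2, 2, 3], matching the question's example.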
If you want to solve your problem, rather than get what you are asking for, you want a table. You can precompute it using silly slow methods, store it, and then look up the factors for any number in microseconds. In particular, you want a table where the smallest factor is listed in an index corresponding to the number--much more memory efficient if you use trial division to remove a few of the smallest primes--and then walk your way down the table until you hit a 1 (meaning no more divisors; what you have left is prime). This will take only two bytes per table entry, which means you can store everything on any modern machine more hefty than a smartphone.
I can demonstrate how to create this if you're interested, and show how to check that it is correct with greater reliability than you could hope to achieve with an active community and unit tests of a complex algorithm (unless you ran the algorithm to generate this table and verified that it was all ok).
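A hedged reconstruction of that table idea (my own code; the answer's two-bytes-per-entry layout would use a short[] with the smallest primes stripped by trial division first, while this sketch keeps a plain int[] for clarity):
import java.util.ArrayList;
import java.util.List;

// spf[i] = smallest prime factor of i, built sieve-style
static int[] buildSmallestFactorTable(int limit) {
    int[] spf = new int[limit + 1];
    for (int i = 2; i <= limit; i++) {
        if (spf[i] == 0) {                       // i is prime
            for (long j = i; j <= limit; j += i) {
                if (spf[(int) j] == 0) spf[(int) j] = i;
            }
        }
    }
    return spf;
}

// factoring is then just repeated lookup and division, microseconds per call
static List<Integer> factorsFromTable(int n, int[] spf) {
    List<Integer> factors = new ArrayList<>();
    while (n > 1) {
        factors.add(spf[n]);
        n /= spf[n];
    }
    return factors;                              // e.g. n = 12 gives [2, 2, 3]
}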
I need them for testing if a polynomial is primitive or not.
This is faster than trying to find the factors of all the numbers.
public static boolean gcdIsOne(int[] nums) {
    // any common divisor of all the numbers must divide the smallest positive one,
    // so it is enough to test the divisors of that value
    int smallest = Integer.MAX_VALUE;
    for (int num : nums) {
        if (num > 0 && num < smallest)
            smallest = num;
    }
    if (smallest == Integer.MAX_VALUE || smallest == 1)
        return true;
    // trial-divide by 2 and the odd numbers up to sqrt(smallest); when i divides
    // smallest, also try the cofactor smallest / i, so that prime divisors
    // above sqrt(smallest) are not missed
    for (int i = 2; i * i <= smallest; i = (i == 2 ? 3 : i + 2)) {
        if (dividesAll(nums, i))
            return false;
        if (smallest % i == 0 && dividesAll(nums, smallest / i))
            return false;
    }
    return !dividesAll(nums, smallest);   // smallest itself may be the common divisor
}

private static boolean dividesAll(int[] nums, int d) {
    for (int num : nums)
        if (num % d != 0)
            return false;
    return true;
}
I tried writing this function in Scala. Here is my result:
def getPrimeFactors(i: Int) = {
  def loop(i: Int, mod: Int, primes: List[Int]): List[Int] = {
    if (i < 2) primes // might be i == 1 as well and means we are done
    else {
      if (i % mod == 0) loop(i / mod, mod, mod :: primes)
      else loop(i, mod + 1, primes)
    }
  }
  loop(i, 2, Nil).reverse
}
I tried to make it as functional as possible.
if (i % mod == 0) loop(i / mod, mod, mod :: primes) checks whether we found a divisor. If we did, we add it to primes and divide i by mod.
If we did not find a new divisor, we just increase the divisor.
loop(i, 2, Nil).reverse starts the recursion and puts the result into increasing order.

What is a sensible prime for hashcode calculation?

Eclipse 3.5 has a very nice feature to generate Java hashCode() functions. It would generate, for example (slightly shortened):
class HashTest {
    int i;
    int j;
    public int hashCode() {
        final int prime = 31;
        int result = prime + i;
        result = prime * result + j;
        return result;
    }
}
(If you have more attributes in the class, result = prime * result + attribute.hashCode(); is repeated for each additional attribute. For ints .hashCode() can be omitted.)
This seems fine except for the choice of 31 as the prime. It was probably taken from the hashCode implementation of Java's String, where it was used for performance reasons that are long gone after the introduction of hardware multipliers. Here you get many hashcode collisions for small values of i and j: for example, (0,0) and (-1,31) have the same value. I think that is a Bad Thing(TM), since small values occur often. For String.hashCode you'll also find many short strings with the same hashcode, for instance "Ca" and "DB". This problem disappears if you take a large prime and choose it right.
So my question: what is a good prime to choose? What criteria do you apply to find it?
This is meant as a general question - so I do not want to give a range for i and j. But I suppose in most applications relatively small values occur more often than large values. (If you have large values the choice of the prime is probably unimportant.) It might not make much of a difference, but a better choice is an easy and obvious way to improve this - so why not do it? Commons lang HashCodeBuilder also suggests curiously small values.
(Clarification: this is not a duplicate of Why does Java's hashCode() in String use 31 as a multiplier? since my question is not concerned with the history of the 31 in the JDK, but on what would be a better value in new code using the same basic template. None of the answers there try to answer that.)
I recommend using 92821. Here's why.
To give a meaningful answer to this you have to know something about the possible values of i and j. The only thing I can think of in general is that in many cases small values will be more common than large values. (The odds of 15 appearing as a value in your program are much better than, say, 438281923.) So it seems a good idea to make the smallest hashcode collision as large as possible by choosing an appropriate prime. For 31 this is rather bad: already for i=-1 and j=31 you have the same hash value as for i=0 and j=0.
Since this is interesting, I've written a little program that searched the whole int range for the best prime in this sense. That is, for each prime I searched for the minimum value of Math.abs(i) + Math.abs(j) over all values of i,j that have the same hashcode as 0,0, and then took the prime where this minimum value is as large as possible.
Drumroll: the best prime in this sense is 486187739 (with the smallest collision being i=-25486, j=67194). Nearly as good and much easier to remember is 92821 with the smallest collision being i=-46272 and j=46016.
If you give "small" another meaning and want to be the minimum of Math.sqrt(i*i+j*j) for the collision as large as possible, the results are a little different: the best would be 1322837333 with i=-6815 and j=70091, but my favourite 92821 (smallest collision -46272,46016) is again almost as good as the best value.
I do acknowledge that it is quite debatable whether these calculation make much sense in practice. But I do think that taking 92821 as prime makes much more sense than 31, unless you have good reasons not to.
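For the curious, a hedged sketch of that kind of search (my reconstruction, not the author's actual program): with the two-int template above, h(i, j) = p*p + p*i + j in wrapping 32-bit arithmetic, so a collision with (0, 0) forces j ≡ -p*i (mod 2^32). That leaves exactly one int j per i, and a linear scan suffices.
static void smallestCollision(int p) {
    long best = Long.MAX_VALUE;
    int bestI = 0, bestJ = 0;
    for (int i = 1; i <= best; i++) {        // cost >= i, so we can stop once i > best
        int j = (int) (-(long) p * i);       // the unique colliding j for this i
        long cost = i + Math.abs((long) j);
        if (cost < best) { best = cost; bestI = i; bestJ = j; }
    }
    // the mirrored pair (-bestI, -bestJ) collides as well
    System.out.println(p + ": i=" + bestI + ", j=" + bestJ + ", |i|+|j|=" + best);
}
smallestCollision(31) finds (1, -31), i.e. the (-1, 31) collision mentioned earlier, and smallestCollision(92821) reproduces the 46272/46016 pair.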
Actually, if you take a prime so large that it comes close to INT_MAX, you have the same problem because of modulo arithmetic. If you expect to hash mostly strings of length 2, perhaps a prime near the square root of INT_MAX would be best; if the strings you hash are longer, it doesn't matter so much and collisions are unavoidable anyway...
Collisions may not be such a big issue... The primary goal of the hash is to avoid using equals for 1:1 comparisons.
If you have an implementation where equals is "generally" extremely cheap for objects that have colliding hashes, then this is not an issue (at all).
In the end, what the best way of hashing is depends on what you are comparing. In the case of an int pair (as in your example), using basic bitwise operators (such as & or ^) could be sufficient.
You need to define your range for i and j. You could use a prime number for both.
public int hashCode() {
    // http://primes.utm.edu/curios/ ;)
    return 97654321 * i ^ 12356789 * j;
}
I'd choose 7243. Large enough to avoid collisions with small numbers. Doesn't overflow to small numbers quickly.
I just want to point out that hashcode has nothing to do with primes.
In the JDK implementation:
for (int i = 0; i < value.length; i++) {
    h = 31 * h + val[i];
}
I found that if you replace 31 with 27, the results are very similar.
