Modular arithmetic: Division over factorials % Prime - java

I want to efficiently calculate ((X+Y)!/(X!Y!))% P (P is like 10^9+7)
This discussion gives some insights on distributing modulo over division.
My concern is it's not necessary that a modular inverse always exists for a number.
Basically, I am looking for a code implementation of solving the problem.
For multiplication it is very straightforward:
public static int mod_mul(int Z, int X, int Y, int P)
{
    // Z = (X+Y), the number whose factorial we need; P is the prime
    long result = 1;
    while (Z > 1)
    {
        result = (result * Z) % P;
        Z--;
    }
    return (int) result; // result < P, so the narrowing cast is safe
}
I also realize that many factors can be cancelled in the division (before taking the modulus), but as the number of divisors increases, I'm finding it difficult to come up with an efficient algorithm for the division (looping over List(factors(X)+factors(Y)...) to see which one divides the current factor of the numerator).
Edit: I don't want to use BigInt solutions.
Is there any Java/Python based solution, or any standard algorithm/library, for cancelling the factors (if the inverse option is not foolproof) or for approaching this type of problem?

((X+Y)!/(X!Y!)) is a low-level way of spelling a binomial coefficient ((X+Y)-choose-X). And while you didn't say so in your question, a comment in your code implies that P is prime. Put those two together, and Lucas's theorem applies directly: http://en.wikipedia.org/wiki/Lucas%27_theorem.
That gives a very simple algorithm based on the base-P representations of X+Y and X. Whether BigInts are required is impossible to guess because you didn't give any bounds on your arguments, beyond that they're ints. Note that your sample mod_mul code may not work at all if, e.g., P is greater than the square root of the maximum int (because result * Z may overflow then).
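As a rough illustration of the base-P idea, here is a minimal sketch of Lucas's theorem in Java. The class and method names are made up for this example, and it assumes P is prime and smaller than 2^31 so that every intermediate product fits in a long. Each base-P digit pair is handled with the multiplicative formula and a Fermat inverse, which is one possible choice and not the only one.

```java
final class LucasBinomial {
    // Modular exponentiation by repeated squaring.
    // Assumes 1 <= mod < 2^31 so that products of two residues fit in a long.
    static long modPow(long base, long exp, long mod) {
        long result = 1 % mod;
        base %= mod;
        while (exp > 0) {
            if ((exp & 1) == 1) result = result * base % mod;
            base = base * base % mod;
            exp >>= 1;
        }
        return result;
    }

    // C(n, k) mod p for 0 <= n < p. Every factor of k! is below p,
    // so k! is invertible modulo the prime p.
    static long smallBinomial(long n, long k, long p) {
        if (k < 0 || k > n) return 0;
        k = Math.min(k, n - k);
        long num = 1, den = 1;
        for (long i = 1; i <= k; i++) {
            num = num * (n - k + i) % p;
            den = den * i % p;
        }
        // Fermat's little theorem: den^(p-2) is the inverse of den modulo p.
        return num * modPow(den, p - 2, p) % p;
    }

    // Lucas's theorem: multiply the binomials of matching base-p digits of n and k.
    static long binomial(long n, long k, long p) {
        if (k < 0 || k > n) return 0;
        long result = 1;
        while (k > 0) {
            result = result * smallBinomial(n % p, k % p, p) % p;
            if (result == 0) return 0;
            n /= p;
            k /= p;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(binomial(10, 3, 7));
        System.out.println(binomial(40, 20, 1_000_000_007L));
    }
}
```

Note that when X+Y is itself smaller than P there is only one base-P digit, so the theorem reduces to a single direct computation; the digit decomposition mainly pays off for small primes.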

It's binomial coefficients - C(x+y,x).
You can calculate it differently C(n,m)=C(n-1,m)+C(n-1,m-1).
If you are OK with time complexity O(x*y), the code will be much simpler.
http://en.wikipedia.org/wiki/Combination
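The recurrence above can be sketched in Java roughly as follows. This is only an illustration: the class name is invented, it keeps a single row of Pascal's triangle to save memory, and it assumes P fits in an int so that sums of two residues fit in a long. It needs no division at all, so the question of whether an inverse exists never comes up.

```java
final class PascalBinomial {
    // C(n, k) mod p using C(n, k) = C(n-1, k) + C(n-1, k-1).
    // Time O(n * k), memory O(k).
    static int binomial(int n, int k, int p) {
        if (k < 0 || k > n) return 0;
        long[] row = new long[k + 1];
        row[0] = 1 % p;
        for (int i = 1; i <= n; i++) {
            // Walk right to left so row[j - 1] still holds the previous row's value.
            for (int j = Math.min(i, k); j >= 1; j--) {
                row[j] = (row[j] + row[j - 1]) % p;
            }
        }
        return (int) row[k];
    }

    public static void main(String[] args) {
        int x = 3, y = 2;
        // (X+Y)! / (X! Y!) is C(X+Y, X)
        System.out.println(binomial(x + y, x, 1_000_000_007));
    }
}
```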

Here is a way to do what you need efficiently:
C(n,k) = C(n-1,k) + C(n-1,k-1)
Use dynamic programming to compute this bottom-up:
C(n,k)%P = ((C(n-1,k))%P + (C(n-1,k-1))%P)%P
Therefore F(n,k) = (F(n-1,k)+F(n-1,k-1))%P
Another, faster approach:
C(n,k) = C(n-1,k-1)*n/k
F(n,k) = ((F(n-1,k-1)*n)%P*inv(k)%P)%P
Here inv(k) is the modular inverse of k modulo P, which exists whenever P is prime and k is not a multiple of P.
Note: evaluate C(n,n-k) instead if n-k < k, because C(n,n-k) = C(n,k).
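A minimal sketch of the faster approach, under the assumption that P is prime, P < 2^31, and min(k, n-k) < P so that every inv(i) exists. The names are hypothetical, and the inverse is computed with Fermat's little theorem (i^(P-2) mod P), which is one common way to obtain it.

```java
final class BinomialModPrime {
    // Modular exponentiation by repeated squaring; assumes 1 <= mod < 2^31.
    static long modPow(long base, long exp, long mod) {
        long result = 1 % mod;
        base %= mod;
        while (exp > 0) {
            if ((exp & 1) == 1) result = result * base % mod;
            base = base * base % mod;
            exp >>= 1;
        }
        return result;
    }

    // Inverse of k modulo the prime p, valid when k is not a multiple of p.
    static long inverse(long k, long p) {
        return modPow(k, p - 2, p);
    }

    // Applies F(m, i) = F(m-1, i-1) * m * inv(i) for i = 1..k with m = n-k+i.
    static long binomial(long n, long k, long p) {
        if (k < 0 || k > n) return 0;
        k = Math.min(k, n - k); // C(n, k) = C(n, n-k)
        long result = 1 % p;
        for (long i = 1; i <= k; i++) {
            result = result * ((n - k + i) % p) % p * inverse(i, p) % p;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(binomial(10, 3, 1_000_000_007L)); // prints 120
    }
}
```

This runs in O(k log P) time. If many queries share the same P, precomputing factorials and inverse factorials is a common alternative.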

Related

Is there a way to pow 2 BigInteger Numbers in java?

I have to pow a bigInteger number with another BigInteger number.
Unfortunately, only one BigInteger.pow(int) is allowed.
I have no clue on how I can solve this problem.
I have to pow a bigInteger number with another BigInteger number.
No, you don't.
You read a crypto spec and it seemed to say that. But that's not what it said; you didn't read carefully enough. The mathematical 'universe' that the math in the paper / spec you're reading operates in is different from normal math. It's a modulo-space. All operations are implicitly performed modulo X, where X is some number the crypto algorithm explains.
You can do that just fine.
Alternatively, the spec is quite clear and says something like: C = (A^B) % M and you've broken that down in steps (... first, I must calculate A to the power of B. I'll worry about what the % M part is all about later). That's not how that works - you can't lop that operation into parts. (A^B) % M is quite doable, and has its own efficient algorithm. (A^B) is simply not calculable without a few years worth of the planet's entire energy and GDP output.
The reason I know that must be what you've been reading, is because (A ^ B) % M is a common operation in crypto. (Well, that, and the simple fact that A^B can't be done).
Just to be crystal clear: When I say impossible, I mean it in the same way 'travelling faster than the speed of light' is impossible. It's a law in the physics sense of the word: If you really just want to do A^B and not in a modspace where B is so large it doesn't fit in an int, a computer cannot calculate it, and the result will be gigabytes large. int can hold about 9 digits worth. Just for fun, imagine doing X^Y where both X and Y are 20 digit numbers.
The result would have 10^21 digits.
That's roughly equal to the total amount of disk space available worldwide. 10^12 is a terabyte. You're asking to calculate a number where, forget about calculating it, merely storing it requires one thousand million harddisks each of 1TB.
Thus, I'm 100% certain that you do not want what you think you want.
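To make the point about (A^B) % M concrete, here is a small sketch of square-and-multiply. It is for illustration only (the class name is invented, and BigInteger.modPow already does this job in the JDK): the intermediate values never grow beyond roughly twice the size of M, and the loop runs once per bit of B.

```java
import java.math.BigInteger;

final class SquareAndMultiply {
    // Computes (base^exp) mod m for exp >= 0 and m > 0 without ever forming base^exp.
    static BigInteger modPow(BigInteger base, BigInteger exp, BigInteger m) {
        if (exp.signum() < 0 || m.signum() <= 0) {
            throw new IllegalArgumentException("need exp >= 0 and m > 0");
        }
        BigInteger result = BigInteger.ONE.mod(m);
        base = base.mod(m);
        while (exp.signum() > 0) {
            if (exp.testBit(0)) {
                result = result.multiply(base).mod(m); // use this bit of the exponent
            }
            base = base.multiply(base).mod(m);         // square for the next bit
            exp = exp.shiftRight(1);
        }
        return result;
    }

    public static void main(String[] args) {
        BigInteger a = new BigInteger("123456789012345678901234567890");
        BigInteger b = new BigInteger("987654321098765432109876543210");
        BigInteger m = BigInteger.valueOf(1_000_000_007L);
        System.out.println(modPow(a, b, m).equals(a.modPow(b, m)));
    }
}
```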
TIP: If you can't follow the math (which is quite bizarre; it's not like you get modulo-space math in your basic AP math class!), generally rolling your own implementation of a crypto algorithm isn't going to work out. The problem with crypto is, if you mess up, often a unit test cannot catch it. No; someone will hack your stuff and then you know, and that's a high price to pay. Rely on experts to build the algorithm, spend your time ensuring the protocol is correct (which is still quite difficult to get right, don't take that lightly!). If you insist, make dang sure you have a heap of plaintext+keys / encrypted (or plaintext / hashed, or whatever it is you're doing) pairs to test against, and assume that whatever you wrote, even if it passes those tests, is still insecure because e.g. it is trivial to leak the key out of your algorithm using timing attacks.
Since you want to use it in a modulo operation with a prime number anyway, as @Progman said in the comments, you can use modPow()
Below is an example code:
// Create BigInteger objects
BigInteger biginteger1, biginteger2, exponent, result;
//prime number
int pNumber = 5;
// Initializing all BigInteger Objects
biginteger1 = new BigInteger("23895");
biginteger2 = BigInteger.valueOf(pNumber);
exponent = new BigInteger("15");
// Perform modPow operation on the objects and exponent
result = biginteger1.modPow(exponent, biginteger2);

Worst case time complexity of Math.sqrt in java

We have a test exercise where you need to find out whether a given number N is the square of another number or not, with the smallest time complexity.
I wrote:
public static boolean what2(int n) {
double newN = (double)n;
double x = Math.sqrt(newN);
int y = (int)x;
if (y * y == n)
return false;
else
return true;
}
I looked online and specifically on SO to try and find the complexity of sqrt but couldn't find it. This SO post is for C# and says it's O(1), and this Java post says it's O(1) but could potentially iterate over all doubles.
I'm trying to understand the worst time complexity of this method. All other operations are O(1) so this is the only factor.
Would appreciate any feedback!
Using the floating point conversion is OK because java's int type is 32 bits and java's double type is the IEEE 64 bit format that can represent all values of 32 bit integers exactly.
If you were to implement your function for long, you would need to be more careful because many large long values are not represented exactly as doubles, so taking the square root and converting it to an integer type might not yield the actual square root.
All operations in your implementation execute in constant time, so the complexity of your solution is indeed O(1).
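For the long case mentioned above, one possible sketch is to take the floating-point root as a first guess and then correct it with exact integer arithmetic. The names here are hypothetical, and the constant 3037000499 is floor(sqrt(Long.MAX_VALUE)), used to keep r * r from overflowing.

```java
final class PerfectSquare {
    // Largest r with r * r <= Long.MAX_VALUE.
    static final long MAX_ROOT = 3037000499L;

    // Floor of the square root of n, for n >= 0.
    static long floorSqrt(long n) {
        long r = (long) Math.sqrt((double) n); // may be off by one for large n
        if (r > MAX_ROOT) r = MAX_ROOT;
        while (r * r > n) r--;                                 // guess too high
        while (r < MAX_ROOT && (r + 1) * (r + 1) <= n) r++;    // guess too low
        return r;
    }

    static boolean isPerfectSquare(long n) {
        if (n < 0) return false;
        long r = floorSqrt(n);
        return r * r == n;
    }

    public static void main(String[] args) {
        System.out.println(isPerfectSquare(16));
        System.out.println(isPerfectSquare(9223372030926249001L));
    }
}
```

The correction loops run at most a handful of times because the floating-point guess is already within a small distance of the true root, so the method stays effectively constant time.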
If I understood the question correctly, the Java instruction can be converted by just-in-time-compilation to use the native fsqrt instruction (however I don't know whether this is actually the case), which, according to this table, uses a bounded number of processor cycles, which means that the complexity would be O(1).
Java's Math.sqrt actually delegates to StrictMath.sqrt. The source code of one of its implementations can be found here; looking at the sqrt function, the complexity appears to be constant time. Look at the while(r != 0) loop inside.

Fermat's factorization method not functioning

I'm working on a program to compare different algorithms for factorization of large integers. One of the algorithms I'm including in the comparison is Fermat's factorization method. The algorithm seems to work just fine for small numbers, but when I get to larger numbers I get weird results.
Here's my code:
public void fermat(long n)
{
ArrayList<Long> factors = new ArrayList<Long>();
a = (long)Math.ceil(Math.sqrt(n));
b = a*a - n;
b_root = (long)(Math.sqrt(b)+0.5);
while(b_root*b_root != b)
{
a++;
b = a*a - n;
b_root = (long)(Math.sqrt(b)+0.5);
}
factors.add(a-b_root);
factors.add(a+b_root);
}
Now, when I try to factor 42139523531366663 I get the resulting factors 6194235479 and 2984853201, which is incorrect since 6194235479 * 2984853201 = 18488883597240918279. I figured that I got this result because somewhere in the algorithm I got to a point where the numbers became too big for a long or something similar, so the algorithm got a bit messed up because of that. I added a check which calculated the product of the two factors and compared with the input value, so that I'd get an alert if the factorization was faulty:
long x,y;
x = factors.get(0);
y = factors.get(1);
if(x*y!=n)
System.out.println("Faulty factorization.");
Interestingly enough, the check passed as true and I didn't get the alert. I tried just printing the result of the multiplication and this actually resulted in the input value. So my question is why does my program behave like this, and what can I do about it?
It looks like there is an overflow in a long somewhere, because longs have 64 bits and
42139523531366663 + 2^64 = 18488883597240918279
For sufficiently large numbers, you may need to switch to using BigInteger.
Is it because there's an error in multiplying large numbers too?
That may be a valid enough reason. The check itself multiplies two longs, so the product wraps around modulo 2^64 in the same way. This is what makes the program think that its factorization is right, but when you actually multiply the numbers without using the program, you discover the error.
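A rough sketch of the same method with BigInteger, so that a*a - n can no longer wrap around. It assumes Java 9 or later (for BigInteger.sqrt) and an odd n greater than 1; the class name and the array return type are choices made for this example, not part of the original code.

```java
import java.math.BigInteger;

final class FermatFactor {
    // Returns {a - b, a + b} with (a - b) * (a + b) == n, for odd n > 1.
    static BigInteger[] factor(BigInteger n) {
        BigInteger a = n.sqrt();                 // floor(sqrt(n))
        if (a.multiply(a).compareTo(n) < 0) {
            a = a.add(BigInteger.ONE);           // round up to ceil(sqrt(n))
        }
        while (true) {
            BigInteger b2 = a.multiply(a).subtract(n);
            BigInteger b = b2.sqrt();
            if (b.multiply(b).equals(b2)) {      // a^2 - n is a perfect square
                return new BigInteger[] { a.subtract(b), a.add(b) };
            }
            a = a.add(BigInteger.ONE);
        }
    }

    public static void main(String[] args) {
        BigInteger[] f = factor(BigInteger.valueOf(5959));
        System.out.println(f[0] + " * " + f[1]); // prints 59 * 101
    }
}
```

This removes the overflow but not the cost: Fermat's method can still need a very large number of iterations when the two factors are far apart. If you prefer to stay with long, Math.multiplyExact throws an ArithmeticException on overflow instead of silently wrapping.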

Finding a prime number at least a 100 digits long that contains 273042282802155991

I am new to Java and one of my class assignments is to find a prime number at least 100 digits long that contains the numbers 273042282802155991.
I have this so far but when I compile it and run it it seems to be in a continuous loop.
I'm not sure if I've done something wrong.
public static void main(String[] args) {
BigInteger y = BigInteger.valueOf(304877713615599127L);
System.out.println(RandomPrime(y));
}
public static BigInteger RandomPrime(BigInteger x)
{
BigInteger i;
for (i = BigInteger.valueOf(2); i.compareTo(x)<0; i.add(i)) {
if ((x.remainder(i).equals(BigInteger.ZERO))) {
x.divide(i).equals(x);
i.subtract(i);
}
}
return i;
}
Since this is homework ...
There is a method on BigInteger that tests for primality. This is much much faster than attempting to factorize a number. (If you take an approach that involves attempting to factorize 100 digit numbers you will fail. Factorization is believed to be hard, although it is not known to be NP-complete. Certainly, there is no known polynomial time solution on a classical computer.)
The question is asking for a prime number that contains a given sequence of digits when it is represented as a sequence of decimal digits.
The approach of generating "random" primes and then testing if they contain those digits is infeasible. (Some simple high-school maths tells you that the probability that a randomly generated 100 digit number contains a given 18 digit sequence is roughly 83 / 10^18, and you haven't tested for primality yet ...)
But there's another way to do it ... think about it!
Only start writing code once you've figured out in your head how your algorithm will work, and done the mental estimates to confirm that it will give an answer in a reasonable length of time.
When I say infeasible, I mean infeasible for you. Given a large enough number of computers, enough time and some high-powered mathematics, it may be possible to do some of these things. Thus, technically they may be computationally feasible. But they are not feasible as a homework exercise. I'm sure that the point of this exercise is to get you to think about how to do this the smart way ...
One tip is that these statements do nothing:
x.divide(i).equals(x);
i.subtract(i);
Same with part of your for loop:
i.add(i)
They don't modify the instances themselves, but return new values - values that you're failing to check and do anything with. BigIntegers are "immutable". They can't be changed - but they can be operated upon and return new values.
If you actually wanted to do something like this, you would have to do:
i = i.add(i);
Also, why would you subtract i from i? Wouldn't you always expect this to be 0?
You need to implement or use the Miller-Rabin algorithm:
Handbook of Applied Cryptography
chapter 4.24
http://www.cacr.math.uwaterloo.ca/hac/about/chap4.pdf
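For reference, BigInteger already provides probabilistic primality testing through isProbablePrime and nextProbablePrime, so it is not strictly necessary to write the test by hand. The sketch below shows one possible construction, which is an assumption of this example and not something stated in the answers: use the required digits as the leading digits, pad with zeros to the desired length, and take the next probable prime. Primes are dense enough that only the trailing digits change.

```java
import java.math.BigInteger;

final class PrimeWithDigits {
    // Returns a probable prime with totalDigits digits that starts with the given digits.
    static BigInteger find(String digits, int totalDigits) {
        StringBuilder sb = new StringBuilder(digits);
        while (sb.length() < totalDigits) {
            sb.append('0');                      // pad on the right with zeros
        }
        return new BigInteger(sb.toString()).nextProbablePrime();
    }

    public static void main(String[] args) {
        BigInteger p = find("273042282802155991", 100);
        System.out.println(p);
        System.out.println(p.isProbablePrime(40));
    }
}
```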

Compute weighted averages for large numbers

I'm trying to get the weighted average of a few numbers. Basically I have:
Price - 134.42
Quantity - 15236545
There can be as few as one or two or as many as fifty or sixty pairs of prices and quantities. I need to figure out the weighted average of the price. Basically, the weighted average should give very little weight to pairs like
Price - 100000000.00
Quantity - 3
and more to the pair above.
The formula I currently have is:
((price)(quantity) + (price)(quantity) + ...)/totalQuantity
So far I have this done:
double optimalPrice = 0;
int totalQuantity = 0;
double rolling = 0;
System.out.println(rolling);
Iterator it = orders.entrySet().iterator();
while(it.hasNext()) {
System.out.println("inside");
Map.Entry order = (Map.Entry)it.next();
double price = (Double)order.getKey();
int quantity = (Integer)order.getValue();
System.out.println(price + " " + quantity);
rolling += price * quantity;
totalQuantity += quantity;
System.out.println(rolling);
}
System.out.println(rolling);
return rolling/totalQuantity;
The problem is I very quickly max out the "rolling" variable.
How can I actually get my weighted average?
A double can hold a pretty large number (about 1.7 x 10^308, according to the docs), but you probably shouldn't use it for values where exact precision is required (such as monetary values).
Check out the BigDecimal class instead. This question on SO talks about it in more detail.
One solution is to use java.math.BigInteger for both rolling and totalQuantity, and only divide them at the end. This has better numerical stability, as you only have a single floating-point division at the end and everything else is integer operations.
BigInteger is basically unbounded so you shouldn't run into any overflows.
EDIT: Sorry, only upon re-reading I've noticed your price is a double anyway. Maybe it's worth circumventing this by multiplying it with 100 and then converting to BigInteger - since I see in your example it has precisely 2 digits right of the decimal point - and then divide it by 100 at the end, although it's a bit of a hack.
For maximum flexibility, use BigDecimal for rolling, and BigInteger for totalQuantity. After dividing (note, you have it backwards; it should be rolling / totalQuantity), you can either return a BigDecimal, or use doubleValue at a loss of precision.
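A minimal sketch of that combination, assuming prices arrive as decimal strings and quantities as longs (both assumptions of this example, as are the names). The division uses MathContext.DECIMAL128 so that a non-terminating quotient is rounded instead of throwing an ArithmeticException.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

final class ExactWeightedAverage {
    // Weighted average of prices[i] with weights quantities[i].
    static BigDecimal average(String[] prices, long[] quantities) {
        BigDecimal rolling = BigDecimal.ZERO;
        BigInteger totalQuantity = BigInteger.ZERO;
        for (int i = 0; i < prices.length; i++) {
            BigInteger q = BigInteger.valueOf(quantities[i]);
            // Exact product; no rounding happens until the final division.
            rolling = rolling.add(new BigDecimal(prices[i]).multiply(new BigDecimal(q)));
            totalQuantity = totalQuantity.add(q);
        }
        return rolling.divide(new BigDecimal(totalQuantity), MathContext.DECIMAL128);
    }

    public static void main(String[] args) {
        String[] prices = { "134.42", "100000000.00" };
        long[] quantities = { 15236545L, 3L };
        System.out.println(average(prices, quantities));
    }
}
```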
At any given point, you have recorded both the total value ax + by + cz + ... = pq and the total weight a + b + c + ... = p. Knowing both then gives you the average value pq/p = q. The problem is that pq and p are large sums that overflow, even though you just want the moderately sized q.
The next step adds, for example, a weight of r and a value s. You want to find the new average (pq + rs) / (p + r) by using only the value of q, which can only happen if p and pq somehow "annihilate" by being in the numerator and denominator of the same fraction. That's impossible, as I'll show.
The value that you need to add in this iteration is, naturally,
(pq + rs) / (p + r) - q
Which can't be simplified to a point where p*q and p disappear. You can also find
(pq + rs) / (q(p + r))
the factor by which you'd multiply q in order to get the next average; but again, pq and p remain. So there's no clever solution.
Others have mentioned arbitrary-precision variables, and that's a good solution here. The sizes of p and pq grow linearly with the number of entries, and the memory usage and calculation speed of integers/floats grow logarithmically with the size of the values. So performance is O(log(n)), unlike the disaster that it would be if p were somehow the product of many numbers.
First, I don't see how you could be "maxing out" the rolling variable. As @Ash points out, it can represent values up to about 1.7 x 10^308. The only possibility I can think of is that you have some bad values in your input. (Perhaps the real problem is that you are losing precision ...)
Second, your use of a Map to represent orders is strange and probably broken. The way you are currently using it, you cannot represent orders involving two or more items with the same price.
Your final result is just a weighted average of prices, so presumably you don't need to follow the rules used when calculating account balances etc. If I am correct about the above, then you don't need to use BigDecimal; double will suffice.
The problem of overflow can be solved by storing a "running average" and updating it with each new entry. Namely, let
a_n = (sum_{i=1}^n x_i * w_i) / (sum_{i=1}^n w_i)
for n = 1, ..., N. You start with a_1 = x_1 and then add
d_n := a_{n+1} - a_n
to it. The formula for d_n is
d_n = w_{n+1} * (x_{n+1} - a_n) / W_{n+1}
where W_n := sum_{i=1}^n w_i. You need to keep track of W_n, but this problem can be solved by storing it as double (it will be OK as we're only interested in the average). You can also normalize the weights: if you know that all your weights are multiples of 1000, just divide them by 1000.
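A small sketch of such a running average in Java, with hypothetical names. Each update applies a += w * (x - a) / W, where W is the total weight including the new entry; this follows from a_{n+1} = (W_n * a_n + w_{n+1} * x_{n+1}) / W_{n+1}. The large sum of price times quantity is never formed.

```java
final class RunningWeightedAverage {
    private double average = 0.0;      // a_n, the average so far
    private double totalWeight = 0.0;  // W_n, the sum of weights so far

    // Folds one (value, weight) pair into the average; weight must be positive.
    void add(double value, double weight) {
        if (weight <= 0) {
            throw new IllegalArgumentException("weight must be positive");
        }
        totalWeight += weight;
        average += weight * (value - average) / totalWeight;
    }

    double get() {
        return average;
    }

    public static void main(String[] args) {
        RunningWeightedAverage avg = new RunningWeightedAverage();
        avg.add(134.42, 15236545);
        avg.add(100000000.00, 3);
        System.out.println(avg.get());
    }
}
```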
To get additional accuracy, you can use compensated summation.
Preemptive explanation: it is OK to use floating point arithmetic here. double has relative precision of 2E-16. The OP is averaging positive numbers, so there will be no cancellation error. What the proponents of arbitrary precision arithmetic don't tell you is that, leaving aside rounding rules, in the cases when it does give you lots of additional precision over IEEE754 floating point arithmetic, this will come at significant memory and performance cost. Floating point arithmetic was designed by very smart people (Prof. Kahan, among others), and if there was a way of cheaply increasing arithmetic precision over what is offered by floating point, they'd do it.
Disclaimer: if your weights are completely crazy (one is 1, another is 10000000), then I am not 100% sure if you will get satisfying accuracy, but you can test it on some example when you know what the answer should be.
Do two loops: compute totalQuantity first in the first loop. Then in the second loop accumulate price * (quantity / totalQuantity).
