Is there a way to pow two BigInteger numbers in Java?

I have to raise a BigInteger to the power of another BigInteger.
Unfortunately, only BigInteger.pow(int) is available.
I have no clue how to solve this problem.

I have to raise a BigInteger to the power of another BigInteger.
No, you don't.
You read a crypto spec and it seemed to say that. But that's not what it said; you didn't read carefully enough. The mathematical 'universe' that the math in the paper / spec you're reading operates in is different from normal math. It's a modulo-space. All operations are implicitly performed modulo X, where X is some number the crypto algorithm explains.
Computing A^B modulo X can be done just fine.
Alternatively, the spec is quite clear and says something like: C = (A^B) % M, and you've broken that down into steps (... first, I must calculate A to the power of B. I'll worry about what the % M part is all about later). That's not how that works - you can't chop that operation into parts. (A^B) % M is quite doable and has its own efficient algorithm. (A^B) on its own is simply not calculable without a few years' worth of the planet's entire energy and GDP output.
The reason I know that must be what you've been reading, is because (A ^ B) % M is a common operation in crypto. (Well, that, and the simple fact that A^B can't be done).
Just to be crystal clear: when I say impossible, I mean it in the same way 'travelling faster than the speed of light' is impossible. It's a law in the physics sense of the word. If you really just want A^B, not in a mod-space, with a B so large it doesn't fit in an int, a computer cannot calculate it, and the result would be gigabytes large at the very least. An int can hold about 9 digits' worth. Just for fun, imagine computing X^Y where both X and Y are 20-digit numbers.
The result would have about 10^21 digits.
That's roughly equal to the total amount of disk space available worldwide. 10^12 bytes is a terabyte. You're asking to calculate a number where, never mind calculating it, merely storing it would require a thousand million hard disks of 1 TB each.
Thus, I'm 100% certain that you do not want what you think you want.
TIP: If you can't follow the math (which is understandable; modulo-space math isn't something you get in a basic AP math class!), rolling your own implementation of a crypto algorithm generally isn't going to work out. The problem with crypto is that if you mess up, a unit test often cannot catch it. No; someone will hack your stuff and then you'll know, and that's a high price to pay. Rely on experts to build the algorithm, and spend your time ensuring the protocol is correct (which is still quite difficult to get right; don't take that lightly!). If you insist, make dang sure you have a heap of plaintext+key / encrypted (or plaintext / hashed, or whatever it is you're doing) pairs to test against, and assume that whatever you wrote, even if it passes those tests, is still insecure, because e.g. it is trivial to leak the key out of your algorithm via timing attacks.

Since you want to use it in a modulo operation with a prime number anyway, as @Progman said in the comments, you can use modPow().
Below is some example code:
import java.math.BigInteger;

// Create BigInteger objects
BigInteger base, modulus, exponent, result;

// Prime number to use as the modulus
int pNumber = 5;

// Initializing all BigInteger objects
base = new BigInteger("23895");
modulus = BigInteger.valueOf(pNumber);
exponent = new BigInteger("15");

// Computes (base^exponent) mod modulus; modPow never materializes base^exponent
result = base.modPow(exponent, modulus);
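Note that with these particular values the result is 0: 23895 is divisible by 5, so every power of it is congruent to 0 mod 5. Pick a base that isn't a multiple of the modulus to get a more interesting result.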

Related

Multiplication should be suboptimal. Why is it used in hashCode?

Hash functions are incredibly useful and versatile. In general, they are used to map a space to a much smaller one. Of course, that means that two objects may hash to the same value (a collision), but this is unavoidable when you are reducing the space (pigeonhole principle).
The efficiency of the function largely depends on the size of the hash space.
It comes as a surprise, then, that a lot of Java hashCode implementations use multiplication to produce the hash code of a new object, e.g. as follows (see creating-a-hashcode-method-java):
@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((email == null) ? 0 : email.hashCode());
    result = prime * result + (int) (id ^ (id >>> 32));
    result = prime * result + ((name == null) ? 0 : name.hashCode());
    return result;
}
If we want to mix two hashcodes in the same range, xor should be much better than addition, and is, I think, traditionally used. If we wanted to increase the space, shifting by some bits and then xoring would still, imho, make sense. I guess multiplying by 31 is almost the same as shifting the hash by 5 and then subtracting it, but it should be much less efficient...
As it is the recommended approach, though, I think I am missing something. So my question is: why would this be?
Notes:
I am not asking why we use a prime. It is pretty clear that if we used multiplication, we should go with a prime. However multiplying by any number, even a prime, should still be suboptimal to xor. That is why e.g. all these other non-cryptographic hash functions - as well as most cryptographic - use xor and not multiplications...
Admittedly, I have no hard evidence (apart from all those well-known hash functions) that xor is better. In fact, precisely because multiplying by a prime and summing is so widely accepted, I suspect it is just as good, and in practice better. I am asking why this is...
The int type in Java can be used to represent any whole number from -2147483648 to 2147483647.
Sometimes the hashcode of an object may be its memory address (which makes sense and is efficient in a lot of situations), e.g. if inherited from Object.
The answer to this is a mixture of different factors:
On modern architecture, the time taken to perform a multiplication versus a shift may not end up being measurable overall within a given pipeline of instructions-- it has more to do with the availability of the relevant execution unit on the CPU than the "raw" time taken;
In practice when integrating with standard collections libraries in day-to-day programming, it's often more important that a hash function is correct, "good enough" and easy to automate in an IDE than for it to be as perfect as possible;
The collections libraries generally add secondary hash functions and potentially other techniques behind the scenes to overcome some of the weaknesses of what would otherwise be a poor hash function;
With resizable collections, an effective hash function has the goal of dispersing its hashes across the available range for arbitrary sizes of hash tables (though as I say, it will get help from the built-in secondary function): multiplying by a "magic" constant is often a cheap way to achieve this (or, even if multiplication turned out to be a bit more expensive than a shift: still cheap enough, given the benefit); addition rather than XOR may help to allow this 'avalanche' effect slightly. (In most practical cases, you will probably find that they work equally well.)
You can generally assume that the JIT compiler "knows" about equivalences such as shifting 5 places and subtracting the original value ((x << 5) - x) rather than multiplying by 31. Just because you write "*31" in the source code doesn't mean it will literally be compiled to a multiplication instruction. (In practice it might be, though, because despite what you'd think, the multiply instruction may well be "faster" on average on the architecture in question... It's usually better to make your code stick to the required logic and let the JIT compiler handle low-level optimisations in a case such as this.)
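One concrete weakness of plain xor that the multiply-and-add scheme avoids: xor is symmetric, so swapping two fields yields the same combined hash, and equal fields cancel to zero. A minimal sketch (the field values are made up for illustration):

public class HashMixDemo {
    // Symmetric combiner: xorMix(a, b) == xorMix(b, a), and xorMix(a, a) == 0
    static int xorMix(int h1, int h2) {
        return h1 ^ h2;
    }

    // Order-sensitive combiner, same shape as the generated hashCode above
    static int primeMix(int h1, int h2) {
        int result = 1;
        result = 31 * result + h1;
        result = 31 * result + h2;
        return result;
    }

    public static void main(String[] args) {
        int first = "alice".hashCode();
        int last = "brown".hashCode();
        System.out.println(xorMix(first, last) == xorMix(last, first));     // always true
        System.out.println(primeMix(first, last) == primeMix(last, first)); // false for these values
    }
}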

Generating Random Hash Functions for LSH Minhash Algorithm

I'm programming a minhashing algorithm in Java that requires me to generate an arbitrary number of random hash functions (240 hash functions in my case), and run any number of integers through them (2000 at the moment).
In order to do that, I've been generating random numbers a, b, and c (from the range 1 - 2001) for each of the 240 hash functions. Then, my hash function returns h = ((a*x) + b) % c, where h is the return value and x is one of the integers run through it.
Is this an efficient implementation of random hashing, or is there a more common/acceptable way to do it?
This post was asking a similar question, but I'm still somewhat confused by the wording of the answer: Minhash implementation how to find hash functions for permutations
When I was working with Bloom filters a few years ago, I ran across an article that describes how to generate multiple hash functions very simply, with a minimum of code. The method it describes works very well. See Less Hashing, Same Performance: Building a Better Bloom Filter.
The basic idea is to create two hash functions, call them h1 and h2, with which you can then simulate multiple hash functions, g_1 through g_k, using the formula:
g_i(x) = h1(x) + i * h2(x)
where i varies from 1 to k (the number of hash functions you want).
The paper is well worth reading, even if you decide not to implement his idea. Although after reading it I can't imagine not wanting to implement it. It made my Bloom filter code a whole lot more tractable and didn't negatively impact performance.
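A rough Java illustration of that trick (the two base mixers below are my own choice of common integer finalizers, not something the paper prescribes):

public class TwoHashFamily {
    // Two fixed, independent-ish integer mixers serving as h1 and h2
    static int h1(int x) {
        x ^= x >>> 16;
        x *= 0x85ebca6b;
        x ^= x >>> 13;
        return x;
    }

    static int h2(int x) {
        x ^= x >>> 16;
        x *= 0xc2b2ae35;
        x ^= x >>> 13;
        return x;
    }

    // g_i(x) = h1(x) + i * h2(x)
    static int g(int i, int x) {
        return h1(x) + i * h2(x);
    }

    public static void main(String[] args) {
        int k = 240; // number of simulated hash functions, as in the question
        for (int i = 1; i <= k; i++) {
            int hash = g(i, 1234); // use this e.g. to track per-function minima for minhashing
        }
    }
}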
So the method that I described above was almost correct. The numbers a and b should be randomly generated. However, c needs to be a prime number that is slightly larger than the maximum possible value of x. Once those numbers have been chosen, finding hash value h using h = ((a*x)+b) % c is the standard, accepted way to generate hash functions.
Also, a and b should be random numbers from the range 1 to c-1.
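Putting that together, a minimal sketch using the numbers from the question (x goes up to 2000, so the prime c = 2003 just above it is one valid choice):

import java.util.Random;

public class RandomHashFunction {
    final int a, b, c;

    RandomHashFunction(Random rnd, int c) {
        this.c = c;                      // prime slightly larger than the max x
        this.a = 1 + rnd.nextInt(c - 1); // random a in [1, c-1]
        this.b = 1 + rnd.nextInt(c - 1); // random b in [1, c-1]
    }

    int hash(int x) {
        return (a * x + b) % c; // no overflow: a, b, x all stay near 2000
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        RandomHashFunction[] fns = new RandomHashFunction[240];
        for (int i = 0; i < fns.length; i++) {
            fns[i] = new RandomHashFunction(rnd, 2003);
        }
        System.out.println(fns[0].hash(1234));
    }
}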

Finding a prime number at least 100 digits long that contains 273042282802155991

I am new to Java, and one of my class assignments is to find a prime number at least 100 digits long that contains the digit sequence 273042282802155991.
I have this so far, but when I compile and run it, it seems to get stuck in an endless loop.
I'm not sure if I've done something wrong.
public static void main(String[] args) {
    BigInteger y = BigInteger.valueOf(304877713615599127L);
    System.out.println(RandomPrime(y));
}

public static BigInteger RandomPrime(BigInteger x)
{
    BigInteger i;
    for (i = BigInteger.valueOf(2); i.compareTo(x) < 0; i.add(i)) {
        if ((x.remainder(i).equals(BigInteger.ZERO))) {
            x.divide(i).equals(x);
            i.subtract(i);
        }
    }
    return i;
}
Since this is homework ...
There is a method on BigInteger that tests for primality. This is much, much faster than attempting to factorize a number. (If you take an approach that involves attempting to factorize 100-digit numbers, you will fail. Factorization is believed to be computationally intractable; certainly, there is no known polynomial-time solution.)
The question is asking for a prime number that contains a given sequence of digits when it is represented as a sequence of decimal digits.
The approach of generating "random" primes and then testing whether they contain those digits is infeasible. (Some simple high-school maths tells you that the probability that a randomly generated 100-digit number contains a given 18-digit sequence is ... 82 / 10^18. And you haven't even tested for primality yet ...)
But there's another way to do it ... think about it!
Only start writing code once you've figured out in your head how your algorithm will work, and done the mental estimates to confirm that it will give an answer in a reasonable length of time.
When I say infeasible, I mean infeasible for you. Given a large enough number of computers, enough time and some high-powered mathematics, it may be possible to do some of these things. Thus, technically they may be computationally feasible. But they are not feasible as a homework exercise. I'm sure that the point of this exercise is to get you to think about how to do this the smart way ...
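For reference, the primality test mentioned above is BigInteger.isProbablePrime(int certainty). A quick illustration (the candidate value here is arbitrary):

import java.math.BigInteger;

public class PrimalityDemo {
    public static void main(String[] args) {
        BigInteger candidate = new BigInteger("273042282802155991");
        // With certainty = 50, the chance of a wrong "true" is at most 2^-50
        System.out.println(candidate.isProbablePrime(50));
    }
}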
One tip is that these statements do nothing:
x.divide(i).equals(x);
i.subtract(i);
Same with part of your for loop:
i.add(i)
They don't modify the instances themselves, but return new values - values that you're discarding instead of checking and doing anything with. BigIntegers are "immutable". They can't be changed - but they can be operated upon, returning new values.
If you actually wanted to do something like this, you would have to do:
i = i.add(i);
Also, why would you subtract i from i? Wouldn't you always expect this to be 0?
You need to implement/use the Miller-Rabin algorithm. See the Handbook of Applied Cryptography, chapter 4, Algorithm 4.24:
http://www.cacr.math.uwaterloo.ca/hac/about/chap4.pdf

What's the benefit of seeding a random number generator with only prime numbers?

While conducting some experiments in Java, my project supervisor reminded me to seed each iteration of the experiment with a different number. He also mentioned that I should use prime numbers for the seed values. This got me thinking: why primes? Why not any other number as the seed? Also, why must the prime number be sufficiently big? Any ideas? I would've asked him this myself, but it's 4 am here right now, everyone's asleep, I just remembered this question, and I'm burning to know the answer (I'm sure you know the feeling).
It would be nice if you could provide some references, I'm very interested in the math/concept behind all this!
EDIT:
I'm using java.util.Random.
FURTHER EDIT:
My professor comes from a C background, but I'm using Java. Don't know if that helps. It appears that using primes is his idiosyncrasy, but I think we've unearthed some interesting answers about generating random numbers. Thanks to everyone for the effort!
Well, one glance at the implementation shows that he can't have any real reason for that claim. Why? Because this is what the set-seed function looks like:
synchronized public void setSeed(long seed) {
    seed = (seed ^ multiplier) & mask;
    this.seed.set(seed);
    haveNextNextGaussian = false;
}
And that's exactly what's called from the constructor. So even if you give it a prime, it won't use it as-is; if anything, you'd have to pick a seed s where (s ^ multiplier) & mask results in a prime ;)
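In other words, any long is an equally valid seed: the scramble maps seeds one-to-one, so primality plays no role. A quick sanity check, as a minimal sketch:

import java.util.Random;

public class SeedDemo {
    public static void main(String[] args) {
        Random primeSeeded = new Random(104729L); // 104729 is prime
        Random otherSeeded = new Random(104730L); // 104730 is not

        // Both produce perfectly ordinary, uniformly distributed streams;
        // nothing distinguishes the prime-seeded one.
        for (int i = 0; i < 5; i++) {
            System.out.println(primeSeeded.nextInt(100) + " " + otherSeeded.nextInt(100));
        }
    }
}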
Java uses the usual linear congruential method, i.e.:
x_{n+1} = (a * x_n + c) mod m, with 2 <= a < m and 0 <= c < m.
To get a maximal period, c and m have to be relatively prime, and a few other rather obscure conditions must hold, plus there are a few tips on how to get a practically useful version. Knuth covers all of that in detail in The Art of Computer Programming, Volume 2 ;)
But anyhow, the seed doesn't influence the quality of the generator at all. Even if the implementation were a Lehmer generator, it would obviously make sure that N is prime (otherwise the algorithm is practically useless, and I'd wager not uniformly distributed if all random values had to be coprime to a non-prime N), which makes the point moot.
If the generator is a Lehmer generator, then the seed and the modulus must be co-prime; see the wiki page. One way to ensure they are co-prime is to start with a prime number.
If you are talking about java.util.Random, or one of its subclasses in the Oracle runtime, there's no reason for this. It's just a whim of your supervisor.

How to handle multiplication of numbers close to 1

I have a bunch of floating point numbers (Java doubles), most of which are very close to 1, and I need to multiply them together as part of a larger calculation. I need to do this a lot.
The problem is that while Java doubles have no problem with a number like:
0.0000000000000000000000000000000001 (1.0E-34)
they can't represent something like:
1.0000000000000000000000000000000001
As a consequence I lose precision rapidly (the limit seems to be around 1.000000000000001 for Java's doubles).
I've considered just storing the numbers with 1 subtracted, so for example 1.0001 would be stored as 0.0001 - but the problem is that to multiply them together again I have to add 1 and at this point I lose precision.
To address this I could use BigDecimals to perform the calculation (convert to BigDecimal, add 1.0, then multiply), and then convert back to doubles afterwards, but I have serious concerns about the performance implications of this.
Can anyone see a way to do this that avoids using BigDecimal?
Edit for clarity: This is for a large-scale collaborative filter, which employs a gradient descent optimization algorithm. Accuracy is an issue because often the collaborative filter is dealing with very small numbers (such as the probability of a person clicking on an ad for a product, which may be 1 in 1000, or 1 in 10000).
Speed is an issue because the collaborative filter must be trained on tens of millions of data points, if not more.
Yep: because
(1 + x) * (1 + y) = 1 + x + y + x*y
In your case, x and y are very small, so x*y is going to be far smaller - way too small to influence the results of your computation. So as far as you're concerned,
(1 + x) * (1 + y) = 1 + x + y
This means you can store the numbers with 1 subtracted, and instead of multiplying, just add them up. As long as the results are always much less than 1, they'll be close enough to the mathematically precise results that you won't care about the difference.
EDIT: Just noticed: you say most of them are very close to 1. Obviously this technique won't work for numbers that are not close to 1 - that is, if x and y are large. But if one is large and one is small, it might still work; you only care about the magnitude of the product x*y. (And if both numbers are not close to 1, you can just use regular Java double multiplication...)
Perhaps you could use logarithms?
Logarithms conveniently reduce multiplication to addition.
Also, to take care of the initial precision loss, there is the function log1p (it exists in C/C++, and in Java as Math.log1p), which returns log(1+x) without any precision loss. (e.g. log1p(1e-30) returns 1e-30 for me.)
Then you can use expm1 to recover the offset of the actual result from 1.
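In Java these are available as Math.log1p and Math.expm1. A minimal sketch of the idea, representing each factor (1 + x) by its offset x:

public class LogProductDemo {
    // Given offsets x[i], computes the offset from 1 of the product of all (1 + x[i])
    static double productOffset(double[] offsets) {
        double logSum = 0.0;
        for (double x : offsets) {
            logSum += Math.log1p(x); // log(1 + x) without rounding x away at 1
        }
        return Math.expm1(logSum);   // exp(logSum) - 1, again avoiding the loss at 1
    }

    public static void main(String[] args) {
        double[] offsets = {1e-8, 2e-6, 3e-7, 4e-5};
        System.out.println(productOffset(offsets)); // ~4.231009302e-05
    }
}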
Isn't this sort of situation exactly what BigDecimal is for?
Edited to add:
"Per the second-last paragraph, I would prefer to avoid BigDecimals if possible for performance reasons." – sanity
"Premature optimization is the root of all evil" - Knuth
There is a simple solution practically made to order for your problem. You are concerned it might not be fast enough, so you want to do something complicated that you think will be faster. The Knuth quote gets overused sometimes, but this is exactly the situation he was warning against. Write it the simple way. Test it. Profile it. See if it's too slow. If it is then start thinking about ways to make it faster. Don't add all this additional complex, bug-prone code until you know it's necessary.
Depending on where the numbers are coming from and how you are using them, you may want to use rationals instead of floats. Not the right answer for all cases, but when it is the right answer there's really no other.
If rationals don't fit, I'd endorse the logarithms answer.
Edit in response to your edit:
If you are dealing with numbers representing low response rates, do what scientists do:
Represent them as the excess / deficit (normalize out the 1.0 part)
Scale them. Think in terms of "parts per million" or whatever is appropriate.
This will leave you dealing with reasonable numbers for calculations.
It's worth noting that you are testing the limits of your hardware rather than of Java. Java uses the 64-bit floating point of your CPU.
I suggest you test the performance of BigDecimal before you assume it won't be fast enough for you. You can still do tens of thousands of calculations per second with BigDecimal.
As David points out, you can just add the offsets up.
(1+x) * (1+y) = 1 + x + y + x*y
However, it seems risky to choose to drop out the last term. Don't. For example, try this:
x = 1e-8
y = 2e-6
z = 3e-7
w = 4e-5
What is (1+x)*(1+y)*(1+z)*(1+w)? In double precision, I get:
(1+x)*(1+y)*(1+z)*(1+w)
ans =
1.00004231009302
However, see what happens if we just do the simple additive approximation.
1 + (x+y+z+w)
ans =
1.00004231
We lost the low order bits that may have been important. This is only an issue if some of the differences from 1 in the product are at least sqrt(eps), where eps is the precision you are working in.
Try this instead (MATLAB-style notation):
f = @(u,v) u + v + u*v;
result = f(x,y);
result = f(result,z);
result = f(result,w);
1+result
ans =
1.00004231009302
As you can see, this gets us back to the double precision result. In fact, it is a bit more accurate, since the internal value of result is 4.23100930230249e-05.
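The same pairwise combine translates directly to Java (a sketch; the names are mine):

public class OffsetProduct {
    // (1 + u) * (1 + v) = 1 + (u + v + u*v), so combine offsets without ever forming 1 + x
    static double combine(double u, double v) {
        return u + v + u * v;
    }

    public static void main(String[] args) {
        double x = 1e-8, y = 2e-6, z = 3e-7, w = 4e-5;
        double result = combine(combine(combine(x, y), z), w);
        System.out.println(result);     // ~4.23100930230249e-05, low-order bits intact
        System.out.println(1 + result); // 1.00004231009302
    }
}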
If you really need the precision, you will have to use something like BigDecimal, even if it's slower than double.
If you don't really need the precision, you could perhaps go with David's answer. But even if you use multiplications a lot, avoiding BigDecimal might be premature optimization, so BigDecimal might be the way to go anyway.
When you say "most of which are very close to 1", how many, exactly?
Maybe you could have an implicit offset of 1 in all your numbers and just work with the fractions.
