Java BigInteger, cut off last digit - java

Fairly easy, if the BigInteger number is 543 I want it to cut off the last digit so that it is 54.
Two easy ways to do this can be :
Use strings, get substring and create new biginteger with the new value.
Use BigIntegers divide method with number 10. ( 543 / 10 = 54.3 => 54 )
The thing is I will be performing this a lot of times with large integers of course.
My guess is that playing around with strings will be slower, but then again I haven't used BigIntegers much and have no idea how expensive the "divide" operation is.
Speed is essential here. What is the fastest way to implement this (memory is no problem, only speed)?
Other solutions are also welcome.

Divide by 10 is most likely going to be faster.
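As a minimal sketch of the two approaches from the question (the class and method names are made up for illustration), dropping the last digit might look like this:

```java
import java.math.BigInteger;

// Hypothetical helper class; the names are illustrative, not from the question.
class LastDigit {
    // Approach 2 from the question: integer division by ten.
    // BigInteger.divide truncates toward zero, so 543 -> 54 and -543 -> -54.
    static BigInteger dropByDivide(BigInteger n) {
        return n.divide(BigInteger.TEN);
    }

    // Approach 1 from the question: round trip through a decimal string.
    // Assumes n has at least two digits; a one-digit value would leave an
    // empty string (or a lone "-"), which the BigInteger constructor rejects.
    static BigInteger dropBySubstring(BigInteger n) {
        String s = n.toString();
        return new BigInteger(s.substring(0, s.length() - 1));
    }

    public static void main(String[] args) {
        BigInteger n = new BigInteger("543");
        System.out.println(dropByDivide(n));    // 54
        System.out.println(dropBySubstring(n)); // 54
    }
}
```

Both methods return the same value for inputs with two or more digits; only the division variant also handles single-digit inputs.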

Dividing by 10 is much faster than using a substring operation. With the following benchmark, division comes out about 161 times faster for me (the ratio is proportional to the bit count).
// Needs: import java.math.BigInteger; import java.util.Random;
long divTime = 0;
long substrTime = 0;
final int bitsCount = 1000;
final Random rng = new Random();
for (int i = 0; i < 1000; ++i) {
    long t1, t2;
    BigInteger random = new BigInteger(bitsCount, rng);
    // nanoTime gives finer resolution than currentTimeMillis for short operations
    t1 = System.nanoTime();
    random.divide(BigInteger.TEN);
    t2 = System.nanoTime();
    divTime += (t2 - t1);
    t1 = System.nanoTime();
    String str = random.toString();
    new BigInteger(str.substring(0, str.length() - 1));
    t2 = System.nanoTime();
    substrTime += (t2 - t1);
}
System.out.println("Divide: " + divTime);
System.out.println("Substr: " + substrTime);
// Floating-point ratio; Math.max guards against dividing by zero
System.out.println("Ratio: " + ((double) substrTime / Math.max(1, divTime)));

If you keep a static BigInteger that holds the number 10 (the class already provides one as BigInteger.TEN) and use that to divide by 10, that will potentially be the fastest way to do this. It beats creating a temporary new BigInteger every time.
The problem with substring is that you are essentially creating a new String every single time, and that is much slower, not to mention the cost of walking through the string to build its substring.

The fastest way is dividing the number by 10 with an efficient internal division implementation. The internals of that operation are behind the scenes but certainly non-trivial since the number is stored base-2.

The fastest possible implementation would probably be to use a data type whose internal representation uses base 10, i.e. some sort of BCD. Then, division by 10 would simply mean dropping the last byte (or even just incrementing/decrementing an index if you implement it the right way).
Of course, you'd have to implement all arithmetic and other operations you need from scratch, making this a lot of work.
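To make the idea concrete, here is a rough sketch of such a base-10 container. Everything in it (the class name, the one-digit-per-byte layout, the unsigned-only constructor) is an assumption for illustration, and it implements nothing beyond dropping the last digit:

```java
// Hypothetical decimal-digit container; only the "drop last digit" idea is shown.
class DecimalDigits {
    private final byte[] digits; // one decimal digit per byte, most significant first
    private int length;          // number of digits currently in use

    // Assumes a non-empty string of ASCII digits with no sign.
    DecimalDigits(String decimal) {
        digits = new byte[decimal.length()];
        for (int i = 0; i < digits.length; i++) {
            digits[i] = (byte) (decimal.charAt(i) - '0');
        }
        length = digits.length;
    }

    // Division by ten: shrink the logical length; no digit is copied or moved.
    void dropLastDigit() {
        if (length > 1) {
            length--;
        } else {
            digits[0] = 0; // a single digit divided by ten is zero
        }
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('0' + digits[i]));
        }
        return sb.toString();
    }
}
```

Dropping a digit here is a constant-time index update, which is the whole appeal; the cost is that addition, multiplication and everything else would have to be written by hand.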

It's probably premature to even be asking this question. Do it the obvious way (divide by ten), then benchmark it, and optimize it if you need to. Converting to a string representation and back will be much slower.

The toString() alone is probably slower than the substring.

Various people have said that dividing by 10 will be faster than converting to a string and taking the substring. To understand why, just think about the computation involved in converting from a BigInteger to a String, and vice versa. For example:
/* simplified pseudo code for converting +ve numbers to strings */
StringBuilder sb = new StringBuilder();
while (number != 0) {
    digit = number % 10;
    sb.append((char) (digit + '0'));
    number = number / 10;
}
// digits were produced least significant first, so reverse them
return sb.reverse().toString();
The important thing to note is that converting from a number to a string entails repeatedly dividing by 10. Indeed the number of divisions is proportional to log10(number). Going in the other direction involves log10(number) multiplications. It should be obvious that this is much more computation than a single division by 10.

If performance is crucial... don't use Java.
In languages which compile to machine code (for instance C or C++) the integer divide is quicker by a huge factor. String operations use (or can use) memory allocations and are therefore slow.
My bet is that in Java integer divisions will be quicker too. Otherwise the VM implementation is really weird.

Related

Is there a way to pow 2 BigInteger Numbers in java?

I have to pow a bigInteger number with another BigInteger number.
Unfortunately, the only method available is BigInteger.pow(int), which takes an int exponent.
I have no clue on how I can solve this problem.
I have to pow a bigInteger number with another BigInteger number.
No, you don't.
You read a crypto spec and it seemed to say that. But that's not what it said; you didn't read carefully enough. The mathematical 'universe' that the math in the paper / spec you're reading operates in is different from normal math. It's a modulo-space. All operations are implicitly performed modulo X, where X is some number the crypto algorithm explains.
You can do that just fine.
Alternatively, the spec is quite clear and says something like: C = (A^B) % M and you've broken that down in steps (... first, I must calculate A to the power of B. I'll worry about what the % M part is all about later). That's not how that works - you can't lop that operation into parts. (A^B) % M is quite doable, and has its own efficient algorithm. (A^B) is simply not calculable without a few years worth of the planet's entire energy and GDP output.
The reason I know that must be what you've been reading, is because (A ^ B) % M is a common operation in crypto. (Well, that, and the simple fact that A^B can't be done).
Just to be crystal clear: When I say impossible, I mean it in the same way 'travelling faster than the speed of light' is impossible. It's a law in the physics sense of the word: if you really just want to do A^B outside of a modspace, with B so large that it doesn't fit in an int, a computer cannot calculate it, and the result would be gigabytes large or more. An int can hold about 9 digits' worth. Just for fun, imagine doing X^Y where both X and Y are 20 digit numbers.
The result would have 10^21 digits.
That's roughly equal to the total amount of disk space available worldwide. 10^12 is a terabyte. You're asking to calculate a number where, forget about calculating it, merely storing it requires one thousand million harddisks each of 1TB.
Thus, I'm 100% certain that you do not want what you think you want.
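The size estimate above can be checked with a one-line formula: X^Y has about Y * log10(X) decimal digits. The sketch below (class and method names are hypothetical) evaluates that in double arithmetic, so it is only good for orders of magnitude:

```java
// Hypothetical helper for estimating how many decimal digits x^y would have.
class DigitEstimate {
    // floor(y * log10(x)) + 1 is the digit count of x^y for x, y >= 1.
    // Computed in doubles, so large results are approximate.
    static double digitsOfPower(double x, double y) {
        return Math.floor(y * Math.log10(x)) + 1;
    }

    public static void main(String[] args) {
        // Smallest and (nearly) largest 20-digit operands.
        System.out.printf("%.1e%n", digitsOfPower(1e19, 1e19));
        System.out.printf("%.1e%n", digitsOfPower(1e20, 1e20));
    }
}
```

For 20-digit operands the estimate lands between roughly 2 x 10^20 and 2 x 10^21 digits, in line with the order of magnitude quoted above.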
TIP: If you can't follow the math (which is quite bizarre; it's not like you get modulo-space math in your basic AP math class!), generally rolling your own implementation of a crypto algorithm isn't going to work out. The problem with crypto is, if you mess up, often a unit test cannot catch it. No; someone will hack your stuff and then you know, and that's a high price to pay. Rely on experts to build the algorithm, spend your time ensuring the protocol is correct (which is still quite difficult to get right, don't take that lightly!). If you insist, make dang sure you have a heap of plaintext+keys / encrypted (or plaintext / hashed, or whatever it is you're doing) pairs to test against, and assume that whatever you wrote, even if it passes those tests, is still insecure because e.g. it is trivial to leak the key out of your algorithm using timing attacks.
Since you want to use it in a modulo operation with a prime number anyway, like @Progman said in the comments, you can use modPow().
Below is some example code:
// Create BigInteger objects
BigInteger biginteger1, biginteger2, exponent, result;
// prime number used as the modulus
int pNumber = 5;
// Initializing all BigInteger objects
biginteger1 = new BigInteger("23895");
biginteger2 = BigInteger.valueOf(pNumber);
exponent = new BigInteger("15");
// Perform modPow operation on the objects and exponent
result = biginteger1.modPow(exponent, biginteger2);
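For intuition about why (A^B) % M is tractable when A^B is not, here is a sketch of the classic square-and-multiply method. It is not how BigInteger.modPow is implemented internally (that is not claimed here), the class name is made up, and it is not constant-time, so it is for illustration only; real code should call modPow.

```java
import java.math.BigInteger;

// Illustrative only: use BigInteger.modPow in real code.
class ModPowSketch {
    // Computes (base^exponent) mod modulus by square-and-multiply.
    // Assumes exponent >= 0 and modulus > 0.
    static BigInteger modPow(BigInteger base, BigInteger exponent, BigInteger modulus) {
        BigInteger result = BigInteger.ONE.mod(modulus); // also covers modulus == 1
        BigInteger b = base.mod(modulus);
        // Walk the exponent's bits from most to least significant.
        for (int i = exponent.bitLength() - 1; i >= 0; i--) {
            result = result.multiply(result).mod(modulus);  // square
            if (exponent.testBit(i)) {
                result = result.multiply(b).mod(modulus);   // multiply
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BigInteger a = new BigInteger("23895");
        BigInteger e = new BigInteger("15");
        BigInteger m = BigInteger.valueOf(5);
        // Both lines should print the same value.
        System.out.println(modPow(a, e, m));
        System.out.println(a.modPow(e, m));
    }
}
```

Every intermediate value is reduced modulo M, so nothing ever grows beyond about twice the bit length of M, and the loop runs once per bit of the exponent rather than once per unit of it.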

Why is Math.pow(int,int) slower than my naive implementation?

Yesterday I saw a question asking why Math.pow(int,int) is so slow, but the question was poorly worded and showed no research effort, so it was quickly closed.
I did a little test of my own and found that the Math.pow method actually did run extremely slow compared to my own naive implementation (which isn't even a particularly efficient implementation) when dealing with integer arguments. Below is the code I ran to test this:
class PowerTest {
    public static double myPow(int base, int exponent) {
        if(base == 0) return 0;
        if(exponent == 0) return 1;
        int absExponent = (exponent < 0)? exponent * -1 : exponent;
        double result = base;
        for(int i = 1; i < absExponent; i++) {
            result *= base;
        }
        if(exponent < 1) result = 1 / result;
        return result;
    }
    public static void main(String args[]) {
        long startTime, endTime;
        startTime = System.nanoTime();
        for(int i = 0; i < 5000000; i++) {
            Math.pow(2,2);
        }
        endTime = System.nanoTime();
        System.out.printf("Math.pow took %d milliseconds.\n", (endTime - startTime) / 1000000);
        startTime = System.nanoTime();
        for(int i = 0; i < 5000000; i++) {
            myPow(2,2);
        }
        endTime = System.nanoTime();
        System.out.printf("myPow took %d milliseconds.\n", (endTime - startTime) / 1000000);
    }
}
On my computer (Linux on an Intel x86_64 CPU), the output almost always reported that Math.pow took 10ms while myPow took 2ms. This occasionally fluctuated by a millisecond here or there, but Math.pow ran about 5x slower on average.
I did some research and, according to grepcode, Math.pow only offers a method with type signature of (double, double), and it defers that to the StrictMath.pow method which is a native method call.
The fact that the Math library only offers a pow function that deals with doubles seems to indicate a possible answer to this question. Obviously, a power algorithm that must handle the possibility of a base or exponent of type double is going to take longer to execute than my algorithm which only deals with integers. However, in the end, it boils down to architecture-dependent native code (which almost always runs faster than JVM byte code, probably C or assembly in my case). It seems that at this level, an optimization would be made to check the data type and run a simpler algorithm if possible.
Given this information, why does the native Math.pow method consistently run much slower than my un-optimized and naive myPow method when given integer arguments?
As others have said, you cannot just ignore the use of double, as floating point arithmetic will almost certainly be slower. However, this is not the only reason - if you change your implementation to use them, it is still faster.
This is because of two things: the first is that 2^2 (exponent, not xor) is a very quick calculation to perform, so your algorithm is fine to use for that - try using two values from Random#nextInt (or nextDouble) and you'll see that Math#pow is actually much quicker.
The other reason is that calling native methods has overhead, which is actually meaningful here, because 2^2 is so quick to calculate, and you are calling Math#pow so many times. See What makes JNI calls slow? for more on this.
There is no pow(int,int) function. You are comparing apples to oranges with your simplifying assumption that floating point numbers can be ignored.
Math.pow is slow because it deals with an equation in the generic sense, using fractional powers to raise it to the given power. It's the lookup it has to go through when computing that takes more time.
Simply multiplying numbers together is often faster, since plain multiplication of primitives compiles down to native machine instructions and avoids the overhead of a library call.
Edit: It may also be worth noting that the Math functions use doubles, which can also take longer than using ints.
Math.pow(x, y) is probably implemented as exp(y log x). This allows for fractional exponents and is remarkably fast.
But you'll be able to beat this performance by writing your own version if you only require small positive integral arguments.
Arguably Java could make that check for you, but there would be a point for large integers where the built-in version would be faster. It would also have to define an appropriate integral return type and the risk of overflowing that is obvious. Defining the behaviour around a branch region would be tricky.
By the way, your integral type version could be faster. Do some research on exponentiation by squaring.
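A minimal sketch of exponentiation by squaring for integer arguments follows. The class name and the choice of a long result are assumptions; the method makes no attempt to detect overflow.

```java
// Hypothetical integer power helper using exponentiation by squaring.
class IntPow {
    // Needs O(log exponent) multiplications instead of O(exponent).
    // The result wraps around silently if it does not fit in a long.
    static long pow(long base, int exponent) {
        if (exponent < 0) {
            throw new IllegalArgumentException("negative exponent");
        }
        long result = 1;
        long b = base;
        int e = exponent;
        while (e > 0) {
            if ((e & 1) == 1) {
                result *= b; // this bit of the exponent is set
            }
            b *= b;          // square for the next bit
            e >>= 1;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pow(2, 10)); // 1024
    }
}
```

For an exponent of 2, as in the benchmark from the question, this does about the same work as the naive loop; the gain only shows up as the exponent grows.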

What do I do when a Long isn't long enough (Java)?

So, I'm working through the bottom of Project Euler. There are a lot of great problems there (most of which are way above my level). But I very quickly came across a major problem:
Java will not do huge numbers.
Now, if I had the ability to make fast programs taking alternative routes, this would not be a problem for me. Unfortunately, I am not that person.
Take, for example, Problem 20. It asks for the digit sum of 100!. I hadn't even started on the digit parsing; just my code to find the factorial failed.
long onehfact = 1;
for(int i = 1; i <= 100; i++){
    onehfact = onehfact * i;
    System.out.println(onehfact);
}
This worked for about 20 iterations, but then started giving random-ish 19 digit numbers. By iteration 75, it was just giving me zero.
100! is definitely not zero.
I need a way to have incredibly large numbers, and scientific notation will not work for my purposes. Is there a large variable type that I could use, that can hold numbers as large as 100-150 digits? Is there any other solution that would have the same result?
There is a class called BigInteger that will allow you to handle and perform operations on numbers of any size. It will allocate a variable amount of memory for the digits.
Here is the documentation. The class is in the java.math package:
http://docs.oracle.com/javase/7/docs/api/java/math/BigInteger.html
I worked through some problems in Project Euler in Java, too. For some of the other problems you are dealing with large non-integer numbers and you need the related class java.math.BigDecimal:
http://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html
You can use BigInteger to satisfy your requirement.
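// Needs: import java.math.BigInteger;
BigInteger onehfact = BigInteger.valueOf(1L);
for(long i = 1; i <= 100; i++){
    // multiply() takes a BigInteger, so wrap the loop counter
    onehfact = onehfact.multiply(BigInteger.valueOf(i));
    System.out.println(onehfact);
}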
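Putting the pieces together, a sketch of the factorial plus the digit sum the problem asks for could look like the following. The class and method names are made up, and the digit sum simply walks the decimal string, which is fine for a number of this size.

```java
import java.math.BigInteger;

// Hypothetical helper for Project Euler style factorial digit sums.
class FactorialDigits {
    // n! for n >= 0, computed with arbitrary-precision arithmetic.
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    // Sum of the decimal digits of a non-negative value.
    static int digitSum(BigInteger value) {
        int sum = 0;
        for (char c : value.toString().toCharArray()) {
            sum += c - '0';
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(factorial(10));           // 3628800
        System.out.println(digitSum(factorial(10))); // 27
    }
}
```

Calling digitSum(factorial(100)) works the same way; the intermediate value has 158 digits, far beyond what a long can hold.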

Which is more intensive - Modulo vs String<->Int conversion?

Consider the following 2 (sort of) pseudo code blocks (my question is specific to Java though) -
// BLOCK 1 (using modulo)
for (int i = 0; i < N; i++) { // N is in the order of millions
    int temp1 = a % 10;
    int temp2 = a / 10;
    // more code adding temp1 & temp2, etc.
}
// BLOCK 2 (using int <-> char conversions)
for (int i = 0; i < N; i++) { // N is in the order of millions
    if (a > 9) {
        String t1 = String.valueOf(a);
        Integer temp1 = Integer.parseInt(String.valueOf(t1.charAt(0)));
        Integer temp2 = Integer.parseInt(String.valueOf(t1.charAt(1)));
    }
    // more code adding temp1 & temp2, etc.
}
My question is - keeping everything else the same, compare the 2 ways of obtaining temp1 & temp2 in the 2 code blocks above.
Which one is more intensive? I think it should be the modulo, but I am not sure, because I believe string<->int<->char conversions are not CPU intensive. I could be entirely wrong and hence I am posting this question.
(PS. Most of you may have guessed that I am trying to get the 1st and 2nd digits of a 2-digit number and you're correct.)
I think you'll find that converting an integer into a string actually calls the modulo (or division) function quite a bit anyway, so I suspect the string one will be slower. That's on top of the fact that you're doing work with classes rather than primitive types.
But you should measure, not guess! Without measurement, optimisation is a hit-and-miss operation.
Try each of your proposed solutions in a ten-million-iteration loop and time them. That's a good start.
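A rough timing loop along those lines is sketched below. The class and method names are hypothetical, the inputs are restricted to two-digit numbers as in the question, and a hand-rolled loop like this gives only a coarse indication because of JIT warm-up; a harness such as JMH is more reliable.

```java
// Hypothetical micro-benchmark; treat the timings as a rough indication only.
class DigitTiming {
    // Sum of the two digits of a two-digit number, using arithmetic.
    static int sumByModulo(int a) {
        return a / 10 + a % 10;
    }

    // Same result via a String round trip.
    static int sumByString(int a) {
        String s = String.valueOf(a);
        return Character.getNumericValue(s.charAt(0))
             + Character.getNumericValue(s.charAt(1));
    }

    public static void main(String[] args) {
        final int n = 10_000_000;
        long sink = 0; // accumulate results so the JIT cannot discard the work

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) sink += sumByModulo(10 + i % 90);
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) sink += sumByString(10 + i % 90);
        long t2 = System.nanoTime();

        System.out.println("modulo: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("string: " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("checksum: " + sink);
    }
}
```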
The only way to know is test it in your environment.
Short answer: modulo is clearly faster because you are working with an int.
Long answer: Block 1 uses primitives and block 2 uses classes, which adds overhead. It is not very significant for something this simple, but it will still require more CPU cycles. You can have a look at this nice answer about primitives in Java.

java unique number less than 12 characters

I have a use case which involves generating a number which a user enters into a website to link a transaction to their account.
So I have the following code which generates a random 12 digit number:
public String getRedemptionCode(long utid, long userId) {
    long nano = System.nanoTime();
    long temp = nano + utid + 1232;
    long redemptionCode = temp + userId + 5465;
    if (redemptionCode < 0) {
        redemptionCode = Math.abs(redemptionCode);
    }
    String redemptionCodeFinal = StringUtils.rightPad(String.valueOf(redemptionCode), 12, '1');
    redemptionCodeFinal = redemptionCodeFinal.substring(0, 12);
    return redemptionCodeFinal;
}
This method takes in two params which are generated by a DB.
What I need to understand is:
Is this random? I have a test which ran this method 1 million times and it always seems to be random.
Can I cut this down to 8 characters?
No, it is neither unique nor random.
It is not "random" in the sense of highly entropic / uncorrelated with other values.
The only source of non-determinism is System.nanoTime, so all the entropy comes from a few of the least significant bits of the system clock. Simply adding numbers like 1232 and 5465 does not make the result less correlated with subsequent results.
Is this random? I have a test which ran this method 1 million times and it always seems to be random.
If this code is used in multiple threads on the same machine, or on multiple machines with synced clocks, you will see duplicates more quickly.
Since there is low entropy, you are likely to see duplicates by random chance fairly quickly. Math.se addresses the likelihood depending on how many of these you generate.
Can I cut this down to 8 characters?
Only if you don't lose entropy. Consider two ways of truncating a timestamp:
long time = ...; // Least significant bits have randomness.
String s = "" + time;
// Cut off the right-most, most entropic bits
String bad = s.substring(0, 8);
// Cut off the left-most, least entropic bits
String better = s.substring(s.length() - 8);
Since it is a straightforward calculation from an increasing counter, an attacker who can try multiple times can predict the value produced in a way that they would not be able to had you used a crypto-strong random number generator like java.security.SecureRandom.
Is this random?
You are asking, is your function based on System.nanoTime() a random number generator (RNG)?
A random number generator is, by definition, a generator that produces numbers lacking any pattern.
So, are numbers returned from your function without any pattern?
No, they have an easily-observable pattern, because they depend on System.nanoTime() (system clock).
Can I cut this down to 8 characters?
Yes, you can, but it's still not random. Adding constants or padding won't help either.
Use SecureRandom instead.
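A minimal sketch of that suggestion follows. The class and method names are hypothetical, and the code only produces unpredictable digits; it does not guarantee uniqueness, so a check against already issued codes (for example a unique constraint in the database) is assumed to exist elsewhere.

```java
import java.security.SecureRandom;

// Hypothetical code generator; uniqueness must be enforced separately.
class RedemptionCodes {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns `length` random decimal digits; leading zeros are allowed.
    static String randomDigits(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('0' + RANDOM.nextInt(10)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(randomDigits(12)); // 12-digit code
        System.out.println(randomDigits(8));  // 8-digit code
    }
}
```

An 8-digit code has only 10^8 possible values, so collisions and guessing become realistic much sooner than with 12 digits.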
