Fast integer division in Java

It is well known that integer division is a slow operation (typically several times slower than integer multiplication). But if one needs to perform many divide operations with a fixed divisor, it is possible to do some preconditioning on the divisor and replace "/" with a multiplication and a few bit operations (Chapter 10 in Hacker's Delight).
As I've tested, if the divisor is a compile-time constant (e.g. static final long DIVISOR = 12345L;), the JVM will do the trick itself and replace all divisions by DIVISOR with multiplication and bit operations. I'm interested in the same kind of trick, but when the divisor is known only at runtime.
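To illustrate the compile-time-constant case mentioned above (a minimal sketch of my own, not from the original post; HotSpot is free to strength-reduce the constant division):
static final long DIVISOR = 12345L;

static long reduceByConstant(long x) {
    // with a constant divisor the JIT can emit a multiply-and-shift sequence instead of a div
    return x / DIVISOR;
}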
By contrast, when the divisor is only known at runtime, the following (slow) method:
void reduceArraySlow(long[] data, long denominator) {
    for (int i = 0; i < data.length; ++i)
        data[i] = data[i] / denominator;
}
can be replaced with something like:
void reduceArrayFast(long[] data, long denominator) {
    SomeMagicStructure magic = computeMagic(denominator);
    for (int i = 0; i < data.length; ++i)
        // computes data[i] / denominator
        data[i] = doFastDivision(data[i], magic);
}
which should do the job much faster, since all / operations are replaced with faster operations (and also because division is not pipelined in CPUs).

There is the well-known C/C++ libdivide library for fast integer division, and there is my adaptation of that library for Java, libdivide4j.
Fast division with libdivide4j looks as follows:
void reduceArrayFast(long[] data, long denominator) {
    FastDivision.Magic magic = FastDivision.magicSigned(denominator);
    for (int i = 0; i < data.length; ++i)
        // computes data[i] / denominator
        data[i] = FastDivision.divideSignedFast(data[i], magic);
}
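To give a feel for what such a magic structure encodes, here is my own illustration of the multiply-and-shift idea from Hacker's Delight (not libdivide4j's actual internals): the classic fixed-divisor example of dividing a non-negative int by 3 without a division instruction.
static int divideBy3(int n) {
    // 0xAAAAAAABL is ceil(2^33 / 3); for 0 <= n < 2^31 the product fits in a long,
    // and the shift by 33 recovers exactly n / 3
    return (int) ((n * 0xAAAAAAABL) >>> 33);
}
The Magic object returned by magicSigned plays the analogous role of a precomputed multiplier/shift pair, computed at runtime for an arbitrary divisor.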
A simple benchmark
public void benchmark() throws Exception {
    Random rnd = new Random();
    int nIterations = 10000;
    // let the JIT warm up and optimize
    for (int att = 0; att < nIterations; att++) {
        long[] data = new long[1000];
        for (int i = 0; i < data.length; i++)
            data[i] = rnd.nextLong();
        long denominator = rnd.nextLong();

        long[] slow = data.clone();
        long start = System.nanoTime();
        reduceArraySlow(slow, denominator);
        long slowTime = System.nanoTime() - start;

        long[] fast = data.clone();
        start = System.nanoTime();
        reduceArrayFast(fast, denominator);
        long fastTime = System.nanoTime() - start;

        Assert.assertArrayEquals(slow, fast);

        // print last 100 timings (JVM already warmed up)
        if (att > nIterations - 100) {
            System.out.println("\"/\" operation: " + slowTime);
            System.out.println("Fast division: " + fastTime);
            System.out.println("");
        }
    }
}
shows the following timings (in nanoseconds) for the ordinary / operation and for fast division (Core i7, JDK 8, 64-bit):
"/" operation: 13233
Fast division: 5957
"/" operation: 13148
Fast division: 5103
"/" operation: 13587
Fast division: 6188
"/" operation: 14173
Fast division: 6773
...

Related

Does Java have a limit on loop cycles?

I solved Project Euler problem #14 (https://projecteuler.net/problem=14) in Java, but when I run it in PowerShell, it stops iterating at exactly i = 113383 every time. I rewrote the solution in Python, and it works perfectly fine, albeit slowly. According to my (identical) Python solution, the answer is that the number that produces the longest chain is 837799 and the chain is 524 operations long.
Why does the Java solution not finish the for-loop? Is there some kind of limit in Java on how long it can stay in a loop? I cannot come up with any other explanation. Java code below. I wrote the System.out.println(i) there just to see what is going on.
class ProjectEuler14 {
    public static void main(String[] args) {
        int largestNumber = 1;
        int largestChain = 1;
        int currentNumber;
        int chainLength;
        for (int i = 2; i < 1000000; i++) {
            System.out.println(i);
            currentNumber = i;
            chainLength = 0;
            while (currentNumber != 1) {
                if (currentNumber % 2 == 0) currentNumber /= 2;
                else currentNumber = 3 * currentNumber + 1;
                chainLength++;
            }
            if (chainLength > largestChain) {
                largestChain = chainLength;
                largestNumber = i;
            }
        }
        System.out.println("\n\nThe number under million that produces the "
                + "longest chain is " + largestNumber
                + " and the chain's length is " + largestChain);
    }
}
It's not the for loop, it's the while loop: the condition currentNumber != 1 stays true forever.
In Java, an int is specifically defined as an integral number between -2^31 and 2^31 - 1, inclusive, and operations 'roll over'. Try it:
int x = Integer.MAX_VALUE; // i.e. 2^31 - 1 (note that ^ in Java is XOR, not exponentiation)
x++;
System.out.println(x);
This prints a large negative number (in fact, precisely -2^31).
That same rollover is happening in your algorithm, and that's why it never finishes.
A trivial solution is to 'upgrade' to longs; they are just as fast, really (yay 64-bit processors!) and use 64 bits, giving them a range of -2^63 to 2^63 - 1.
Python silently scales its numbers up into slower representations; Java makes different choices (and, for crypto and other purposes, that rollover behaviour is in fact desired).
If you want to go even further, you can always use BigInteger, which grows as much as you need forever (becoming slower and taking more memory as it goes).
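For illustration only (my own sketch, not part of the original answer), the odd/even step with BigInteger could look like this:
// requires import java.math.BigInteger
BigInteger current = BigInteger.valueOf(i);
BigInteger three = BigInteger.valueOf(3);
while (!current.equals(BigInteger.ONE)) {
    if (!current.testBit(0))                          // even
        current = current.shiftRight(1);              // divide by 2
    else
        current = current.multiply(three).add(BigInteger.ONE);
    chainLength++;
}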
To detect that rollover occurred: the 3*x + 1 operation would then produce a number that is lower than the original, and you can check for that.
replace:
else currentNumber = 3 * currentNumber + 1;
with:
else {
    int newNumber = currentNumber * 3 + 1;
    if (newNumber < currentNumber)
        throw new IllegalStateException("Overflow has occurred; 3 * " + currentNumber + " + 1 exceeds int's capacity.");
    currentNumber = newNumber;
}
and rerun it. You'll see your app nicely explain itself.
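On Java 8 and later, Math.multiplyExact and Math.addExact perform the same overflow check for you (a small sketch along the same lines, not the original poster's code):
else {
    // throws ArithmeticException instead of silently rolling over
    currentNumber = Math.addExact(Math.multiplyExact(3, currentNumber), 1);
}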
currentNumber is exceeding the size of an int; use long instead.
You have an int overflow problem.
Change int to long:
long largestNumber = 1;
long largestChain = 1;
long currentNumber;
long chainLength;
for (int i = 2; i < 1000000; i++) {
    //System.out.println(i);
    currentNumber = i;
    chainLength = 0;
    while (currentNumber != 1) {
        //System.out.println("# = " + currentNumber);
        if (currentNumber % 2 == 0) {
            currentNumber /= 2;
        } else {
            currentNumber = (3 * currentNumber) + 1;
        }
        chainLength++;
    }
    // System.out.println("################################ " + i);
    if (chainLength > largestChain) {
        largestChain = chainLength;
        largestNumber = i;
    }
}
System.out.println("\n\nThe number under million that produces the "
        + "longest chain is " + largestNumber
        + " and the chain's length is " + largestChain);

Time to loop using long and double

The following piece of code takes different amounts of time with long and double, and I am not able to understand why there is a difference in the timing.
public static void main(String[] args) {
    long j = 1000000000;
    double k = 1000000000;
    long t1 = System.currentTimeMillis();
    for (int index = 0; index < j; index++) {
    }
    long t2 = System.currentTimeMillis();
    for (int index = 0; index < k; index++) {
    }
    long t3 = System.currentTimeMillis();
    long longTime = t2 - t1;
    long doubleTime = t3 - t2;
    System.out.println("Time to loop long :: " + longTime);
    System.out.println("Time to loop double :: " + doubleTime);
}
Output:
Time to loop long :: 2322
Time to loop double :: 1510
long is taking more time than double. I have a 64-bit Windows operating system and 64-bit Java.
When I modified my code and added casts of the long and the double to int, like this:
public static void main(String[] args) {
    long j = 1000000000;
    double k = 1000000000;
    long t1 = System.currentTimeMillis();
    for (int index = 0; index < (int) j; index++) {
    }
    long t2 = System.currentTimeMillis();
    for (int index = 0; index < (int) k; index++) {
    }
    long t3 = System.currentTimeMillis();
    long longTime = t2 - t1;
    long doubleTime = t3 - t2;
    System.out.println("Time to loop long :: " + longTime);
    System.out.println("Time to loop double :: " + doubleTime);
}
The time got reduced, but there is still a difference in the timing; this time double takes more time than long (the opposite of the first case).
Output:
Time to loop long :: 760
Time to loop double :: 1030
Firstly, a long is a 64-bit integer and a double is a 64-bit floating point number. The timing difference will likely be due to the difference in optimisation between integer arithmetic and floating point arithmetic in your CPU's ALU.
Secondly, in your second run, each for loop still evaluates the stop condition on every iteration, so you're casting from a long and a double to an int respectively on every pass. If you pre-cast the value to an int before the loop's condition, you should get more consistent times:
int j_int = (int) j;
for(int index = 0; index < j_int; index++) { /* Body */ }
int k_int = (int) k;
for(int index = 0; index < k_int; index++) { /* Body */ }
In general, casting from long to int is simpler than from double to int.
The reason is that long and int are both whole numbers, represented in memory simply by their binary value (plus one bit for the sign).
Casting one to the other is quite straightforward: just "crop" or "extend" the memory area (handling signs correctly).
However, doubles are floating-point numbers, and their binary representation is a bit more complicated, using a sign, a mantissa and an exponent.
Casting from there to whole numbers is thus more complicated, as it requires conversion from one binary format to the other first.
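As a small illustration of that difference (my own sketch; the behaviour follows Java's narrowing conversions), long-to-int simply keeps the low 32 bits, while double-to-int has to decode the sign/exponent/mantissa and round toward zero:
System.out.println((int) 1_000_000_000L);      // 1000000000 (value fits; the low 32 bits are kept)
System.out.println((int) ((1L << 40) + 5));    // 5 (only the low 32 bits survive)
System.out.println((int) 1_000_000_000.0);     // 1000000000 (decoded from sign/exponent/mantissa)
System.out.println((int) 3.99);                // 3 (double-to-int truncates toward zero)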

Why is multiplying many times faster than taking the square root?

I have several questions about the following algorithms for telling whether a number is prime; I also know that the sieve of Eratosthenes would give a faster answer.
1. Why is it faster to compute i * i (sqrt(n) times) than to compute sqrt(n) just once?
2. Why is Math.sqrt() faster than my sqrt() method?
3. What is the complexity of these algorithms: O(n), O(sqrt(n)), O(n log(n))?
public class Main {
    public static void main(String[] args) {
        // Case 1 comparing Algorithms
        long startTime = System.currentTimeMillis(); // Start Time
        for (int i = 2; i <= 100000; ++i) {
            if (isPrime1(i))
                continue;
        }
        long stopTime = System.currentTimeMillis(); // End Time
        System.out.printf("Duration: %4d ms. while (i*i <= N) Algorithm\n",
                stopTime - startTime);

        // Case 2 comparing Algorithms
        startTime = System.currentTimeMillis();
        for (int i = 2; i <= 100000; ++i) {
            if (isPrime2(i))
                continue;
        }
        stopTime = System.currentTimeMillis();
        System.out.printf("Duration: %4d ms. while (i <= sqrt(N)) Algorithm\n",
                stopTime - startTime);

        // Case 3 comparing Algorithms
        startTime = System.currentTimeMillis();
        for (int i = 2; i <= 100000; ++i) {
            if (isPrime3(i))
                continue;
        }
        stopTime = System.currentTimeMillis();
        System.out.printf(
                "Duration: %4d ms. s = sqrt(N) while (i <= s) Algorithm\n",
                stopTime - startTime);

        // Case 4 comparing Algorithms
        startTime = System.currentTimeMillis();
        for (int i = 2; i <= 100000; ++i) {
            if (isPrime4(i))
                continue;
        }
        stopTime = System.currentTimeMillis();
        System.out.printf(
                "Duration: %4d ms. s = Math.sqrt(N) while (i <= s) Algorithm\n",
                stopTime - startTime);
    }

    public static boolean isPrime1(int n) {
        for (long i = 2; i * i <= n; i++) {
            if (n % i == 0)
                return false;
        }
        return true;
    }

    public static boolean isPrime2(int n) {
        for (long i = 2; i <= sqrt(n); i++) {
            if (n % i == 0)
                return false;
        }
        return true;
    }

    public static boolean isPrime3(int n) {
        double s = sqrt(n);
        for (long i = 2; i <= s; i++) {
            if (n % i == 0)
                return false;
        }
        return true;
    }

    public static boolean isPrime4(int n) {
        // Proving which is faster: my sqrt method or Java's sqrt
        double s = Math.sqrt(n);
        for (long i = 2; i <= s; i++) {
            if (n % i == 0)
                return false;
        }
        return true;
    }

    public static double abs(double n) {
        return n < 0 ? -n : n;
    }

    public static double sqrt(double n) {
        // Newton's method, from the book Algorithms, 4th edition,
        // by Robert Sedgewick and Kevin Wayne
        if (n < 0)
            return Double.NaN;
        double err = 1e-15;
        double p = n;
        while (abs(p - n / p) > err * n)
            p = (p + n / p) / 2.0;
        return p;
    }
}
Here is also a link to my code: http://ideone.com/Fapj1P
1. Why is it faster to compute i*i sqrt(n) times than sqrt(n) just once?
Look at the complexities below; the difference is the additional cost of computing the square root on every iteration.
2. Why is Math.sqrt() faster than my sqrt() method?
Math.sqrt() delegates the call to StrictMath.sqrt, which is implemented in hardware or native code.
3. What is the complexity of these algorithms?
The complexity of each function you described:
i=2 .. i*i < n                       O(sqrt(n))
i=2 .. sqrt(n)                       O(sqrt(n) * log(n))
i=2 .. sqrt (by Newton's method)     O(sqrt(n)) + O(log(n))
i=2 .. sqrt (by Math.sqrt)           O(sqrt(n))
Newton's method's complexity from
http://en.citizendium.org/wiki/Newton%27s_method#Computational_complexity
Squaring a number is effectively an integer operation while sqrt is floating point. Given the run-time cost of casting and floating-point computation, the results you have observed are not surprising.
The Wikipedia page on sqrt http://en.wikipedia.org/wiki/Square_root has a nice section on computation.
As for the faster method, I trust you can investigate the (sub)linear run-time behaviour yourself.
On the note of run-times, you might like this little piece of code I wrote up to demonstrate the number of calls made to functions during iteration; you may find it, or something similar to it in Java, useful as you think about this sort of thing: gist.github.com/Krewn/1ea0c788ac7210efc475
edit: here is a nice explanation of integer sqrt run-times: http://www.codecodex.com/wiki/Calculate_an_integer_square_root
edit: for timings on a Core 2, see: How slow (how many cycles) is calculating a square root?
Please include output in your post when possible
Related to your question, although approaching primes in a different way:
def getPrimes(n):
    primes = [2]
    # instantiates our list to a list of one element, 2
    k = 3
    while len(primes) < n:
        # python uses the prefix function len(var) for lists, dictionaries and strings
        k2 = 0
        isprime = True
        # vacuously true assumption that every number is prime unless proven otherwise
        while primes[k2] ** 2 <= k:  # <> this is where you are substituting sqrt with sq <> #
            if k % primes[k2] == 0:
                isprime = False
                break
            k2 += 1
        if isprime:
            primes.append(k)
        k += 2
    return primes

print(getPrimes(30))

Project Euler #6 Two codes, different answers ONLY for big inputs. Why?

Here are two pieces of code for solving Project Euler problem 6. Why do they agree for small inputs but give different answers once I make the number larger (100,000)?
The sum of the squares of the first ten natural numbers is
1² + 2² + ... + 10² = 385.
The square of the sum of the first ten natural numbers is
(1 + 2 + ... + 10)² = 55² = 3025.
Hence the difference between the sum of the squares of the first ten natural numbers and the square of the sum is 3025 − 385 = 2640.
Find the difference between the sum of the squares of the first one hundred natural numbers and the square of the sum.
Code 1:
public class Problem_Six_V2 {
    public static void main(String[] args) {
        long limit = 100000;
        long sum = (limit * (limit + 1)) / 2;
        long sumOfSqr = (long) ((((2 * limit) * limit) + ((2 * limit) * 1) + (1 * limit) + (1 * 1)) * limit) / 6;
        System.out.println(Math.pow(sum, 2) + " " + sumOfSqr);
        System.out.println(Math.pow(sum, 2) - sumOfSqr);
    }
}
^^^ Outputs = 2.500016666416665E19
Here's code two:
public class Problem_Six {
    public static void main(String[] args) {
        long sum = 0;
        long sumSqr = 0;
        long sumOfSqr = 0;
        for (long i = 1; i <= 100000; i++) {
            sum += i;
            sumOfSqr += Math.pow(i, 2);
        }
        sumSqr = (long) Math.pow(sum, 2);
        System.out.println(sumSqr + " " + sumOfSqr);
        System.out.println(sumSqr - sumOfSqr);
    }
}
^^ Outputs = 9223038698521425807
I guess it's something to do with the types being used, but they seem similar in both pieces of code... hmm.
Math.pow(i, 2) accepts doubles as parameters. Doubles are not 100% precise; you lose precision. Stick to operations on int/long only. The answer is pretty small and fits even into an int.
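As an aside (my own illustration, not part of the original answer): with a limit of 100,000 the sum is 5,000,050,000 and its square is about 2.5e19, which no longer fits in a long. Casting such a double to long saturates at Long.MAX_VALUE, which is what happens to sumSqr in the second program:
double sumSquared = Math.pow(5000050000.0, 2);          // 2.50005000025E19, beyond the long range
System.out.println((long) sumSquared);                  // 9223372036854775807, i.e. Long.MAX_VALUE
System.out.println(Long.MAX_VALUE - 333338333350000L);  // 9223038698521425807, the output above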
Not sure why you use 100000 as your limit, problem 6 has 100 as a limit.
In Java, when results of integer arithmetic don't fit into int variables, you should use long; when they don't fit even into long variables, you should use BigInteger.
But avoid doubles: they are not precise enough for this kind of task.
Here is your program corrected.
import java.math.BigInteger;

public class Problem_Six {
    public static void main(String[] args) {
        BigInteger sum = BigInteger.ZERO;
        BigInteger sumSqr = BigInteger.ZERO;
        BigInteger sumOfSqr = BigInteger.ZERO;
        for (long i = 1; i <= 100000; i++) {
            sum = sum.add(BigInteger.valueOf(i));
            sumOfSqr = sumOfSqr.add(BigInteger.valueOf(i * i));
        }
        sumSqr = sum.multiply(sum);
        System.out.println(sumSqr + " " + sumOfSqr);
        System.out.println(sumSqr.subtract(sumOfSqr).toString());
        // System.out.println(Long.MAX_VALUE);
    }
}

Approximate median of an immutable array

I need to find a median value of an array of doubles (in Java) without modifying it (so selection is out) or allocating a lot of new memory. I also don't care to find the exact median, but within 10% is fine (so if median splits the sorted array 40%-60% it's fine).
How can I achieve this efficiently?
Taking into account suggestions from rfreak, ILMTitan and Peter I wrote this code:
public static double median(double[] array) {
    final int smallArraySize = 5000;
    final int bigArraySize = 100000;
    if (array.length < smallArraySize + 2) { // small size, so can just sort
        double[] arr = array.clone();
        Arrays.sort(arr);
        return arr[arr.length / 2];
    } else if (array.length > bigArraySize) { // large size, don't want to make passes
        double[] arr = new double[smallArraySize + 1];
        int factor = array.length / arr.length;
        for (int i = 0; i < arr.length; i++)
            arr[i] = array[i * factor];
        return median(arr);
    } else { // average size, can sacrifice time for accuracy
        final int buckets = 1000;
        final double desiredPrecision = .005; // in percent
        final int maxNumberOfPasses = 10;
        int[] histogram = new int[buckets + 1];
        int acceptableMin, acceptableMax;
        double min, max, range, scale,
                medianMin = -Double.MAX_VALUE, medianMax = Double.MAX_VALUE;
        int sum, numbers, bin, neighborhood = (int) (array.length * 2 * desiredPrecision);
        for (int r = 0; r < maxNumberOfPasses; r++) { // enter search for number around median
            max = -Double.MAX_VALUE;
            min = Double.MAX_VALUE;
            numbers = 0;
            for (int i = 0; i < array.length; i++)
                if (array[i] > medianMin && array[i] < medianMax) {
                    if (array[i] > max) max = array[i];
                    if (array[i] < min) min = array[i];
                    numbers++;
                }
            if (min == max) return min;
            if (numbers <= neighborhood) return (medianMin + medianMax) / 2;
            acceptableMin = (int) (numbers * (50d - desiredPrecision) / 100);
            acceptableMax = (int) (numbers * (50d + desiredPrecision) / 100);
            range = max - min;
            scale = range / buckets;
            for (int i = 0; i < array.length; i++)
                histogram[(int) ((array[i] - min) / scale)]++;
            sum = 0;
            for (bin = 0; bin <= buckets; bin++) {
                sum += histogram[bin];
                if (sum > acceptableMin && sum < acceptableMax)
                    return ((.5d + bin) * scale) + min;
                if (sum > acceptableMax) break; // one bin has too many values
            }
            medianMin = ((bin - 1) * scale) + min;
            medianMax = (bin * scale) + min;
            for (int i = 0; i < histogram.length; i++)
                histogram[i] = 0;
        }
        return .5d * medianMin + .5d * medianMax;
    }
}
Here I take into account the size of the array. If it's small, then just sort and get the true median. If it's very large, sample it and get the median of the samples, and otherwise iteratively bin the values and see if the median can be narrowed down to an acceptable range.
I don't have any problems with this code. If someone sees something wrong with it, please let me know.
Thank you.
Assuming you mean median and not average. Also assuming you are working with fairly large double[], or memory wouldn't be an issue for sorting a copy and performing an exact median. ...
With minimal additional memory overhead you could probably run an O(n) algorithm that would get in the ballpark. I'd try this and see how accurate it is.
Two passes.
In the first pass, find the min and max. Create a set of buckets that represent evenly spaced number ranges between the min and max. Then make a second pass and count how many numbers fall in each bucket. You should then be able to make a reasonable estimate of the median. Using 1000 buckets would only cost about 4 KB if you use an int[] to store them. The math should be fast.
The only question is accuracy, and I think you should be able to tune the number of buckets to get in the error range for your data sets.
I'm sure someone with a better math/stats background than I could provide a precise size to get the error range you are looking for.
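A minimal sketch of that two-pass bucket idea (my own illustration; the names, bucket count, and the choice to return the midpoint of the bucket holding the middle element are assumptions, not code from the answer):
// Approximate the median in two passes: one to find the range, one to histogram it.
// Assumes data is non-empty; accuracy is roughly (max - min) / buckets.
static double approxMedian(double[] data, int buckets) {
    double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
    for (double v : data) {                      // pass 1: find min and max
        if (v < min) min = v;
        if (v > max) max = v;
    }
    if (min == max) return min;
    int[] counts = new int[buckets];
    double scale = (max - min) / buckets;
    for (double v : data) {                      // pass 2: count values per bucket
        int bin = (int) ((v - min) / scale);
        if (bin >= buckets) bin = buckets - 1;   // the max value lands in the last bucket
        counts[bin]++;
    }
    int half = data.length / 2, seen = 0;
    for (int bin = 0; bin < buckets; bin++) {
        seen += counts[bin];
        if (seen > half)                         // this bucket holds the middle element
            return min + (bin + 0.5) * scale;
    }
    return max;                                  // not reached for non-empty data
}
In practice you would tune buckets (or repeat the passes on the winning bucket) to hit the desired accuracy.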
Pick a small number of array elements at random, and find the median of those.
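A minimal sketch of that idea (my own illustration; the sample size and names are arbitrary assumptions), using java.util.Random and java.util.Arrays:
static double sampleMedian(double[] data, int sampleSize, Random rnd) {
    double[] sample = new double[sampleSize];
    for (int i = 0; i < sampleSize; i++)
        sample[i] = data[rnd.nextInt(data.length)];   // sample with replacement
    Arrays.sort(sample);
    return sample[sampleSize / 2];
}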
Following on from the OP's question about how to extract N values from a much larger array:
The following code shows how long it takes to find the median of a large array and then how long it takes to find the median of a fixed-size selection of values. The fixed-size selection has a fixed cost, but it becomes increasingly inaccurate as the size of the original array grows.
The following prints
Avg time 17345 us. median=0.5009231700563378
Avg time 24 us. median=0.5146687617507585
The code:
double[] nums = new double[100 * 1000 + 1];
for (int i = 0; i < nums.length; i++) nums[i] = Math.random();

{
    int runs = 200;
    double median = 0;
    long start = System.nanoTime();
    for (int r = 0; r < runs; r++) {
        double[] arr = nums.clone();
        Arrays.sort(arr);
        median = arr[arr.length / 2];
    }
    long time = System.nanoTime() - start;
    System.out.println("Avg time " + time / 1000 / runs + " us. median=" + median);
}
{
    int runs = 20000;
    double median = 0;
    long start = System.nanoTime();
    for (int r = 0; r < runs; r++) {
        double[] arr = new double[301]; // fixed size to sample.
        int factor = nums.length / arr.length; // take every nth value.
        for (int i = 0; i < arr.length; i++)
            arr[i] = nums[i * factor];
        Arrays.sort(arr);
        median = arr[arr.length / 2];
    }
    long time = System.nanoTime() - start;
    System.out.println("Avg time " + time / 1000 / runs + " us. median=" + median);
}
To meet your requirement of not creating objects, I would put the fixed size array in a ThreadLocal so there is no ongoing object creation. You adjust the size of the array to suit how fast you want the function to be.
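A small sketch of that ThreadLocal idea (my own illustration; the buffer size of 301 just mirrors the example above):
private static final ThreadLocal<double[]> SAMPLE =
        ThreadLocal.withInitial(() -> new double[301]);   // reused per thread, no per-call allocation

static double medianOfSample(double[] nums) {
    double[] arr = SAMPLE.get();
    int factor = nums.length / arr.length;   // take every nth value
    for (int i = 0; i < arr.length; i++)
        arr[i] = nums[i * factor];
    Arrays.sort(arr);
    return arr[arr.length / 2];
}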
1) How much is "a lot of new memory"? Does it preclude a sorted copy of the data, or of references to the data?
2) Is your data repetitive (are there many distinct values)? If yes, then your answer to (1) is less likely to cause problems, because you may be able to do something with a lookup map and an array: e.g. a Map, an array of short, and a suitably tweaked comparison object.
3) The typical case for your "close to the mean" approximation is more likely to be O(n.log(n)). Most sort algorithms only degrade to O(n^2) with pathological data. Additionally, the exact median is only going to be (typically) O(n.log(n)), assuming you can afford a sorted copy.
4) Random sampling (a la dan04) is more likely to be accurate than choosing values near the mean, unless your distribution is well behaved. For example, Poisson and log-normal distributions both have medians that differ from their means.
