import java.io.*;
import java.util.ArrayList;
public class Ristsumma {
static long numberFromFile;
static long sum1, sum2;
static long number, number2;
static long variable, variable2;
static long counter;
public static void main(String args[]) throws IOException{
try{
BufferedReader br = new BufferedReader(new FileReader("ristsis.txt"));
numberFromFile = Long.parseLong(br.readLine());
br.close();
}catch(Exception e){
e.printStackTrace();
}
variable=numberFromFile;
ArrayList<Long> numbers = new ArrayList<Long>();
while (variable > 0){
number = variable %10;
variable/=10;
numbers.add(number);
}
for (int i=0; i< numbers.size(); i++) {
sum1 += numbers.get(i);
}
ArrayList<Long> numbers2 = new ArrayList<Long>();
for(long s=1; s<numberFromFile; s++){
variable2=s;
number2=0;
sum2=0;
while (variable2 > 0){
number2 = variable2 %10;
variable2/=10;
numbers2.add(number2);
}
for (int i=0; i< numbers2.size(); i++) {
sum2 += numbers2.get(i);
}
if(sum1==sum2){
counter+=1;
}
numbers2.clear();
}
PrintWriter pw = new PrintWriter("ristval.txt", "UTF-8");
pw.println(counter);
pw.close();
}
}
So I have this code. It reads a number from a file and adds up its digits (for example, if the number is 123, it computes 1+2+3=6). In the second half it goes through all numbers from 1 up to the number in the file and counts how many of them have the same digit sum. If the number is 123, the sum is 6 and the answer the code writes is 9 (because 6, 15, 24, 33, 42, 51, 60, 105 and 114 give the same sum). The code works, but my problem is that when the number in the file is, for example, 2 222 222 222, it takes almost half an hour to get the answer. How can I make this run faster?
Remove unnecessary creation of lists
You are unnecessarily creating lists
ArrayList<Long> numbers = new ArrayList<Long>();
while (variable > 0){
number = variable %10;
variable/=10;
numbers.add(number);
}
for (int i=0; i< numbers.size(); i++) {
sum1 += numbers.get(i);
}
Here you create an ArrayList just to temporarily hold Longs; you can eliminate the list entirely:
while (variable > 0){
number = variable %10;
variable/=10;
sum1 += number;
}
The same applies to the other ArrayList, numbers2.
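A minimal sketch of the list-free version, keeping the brute-force loop but using only local variables. The class name DigitSumCount and the helper names digitSum and countMatches are made up for illustration; they are not in the original code.

```java
public final class DigitSumCount {

    /** Returns the sum of the decimal digits of a non-negative number. */
    public static long digitSum(long value) {
        long sum = 0;
        while (value > 0) {
            sum += value % 10; // take the last digit
            value /= 10;       // drop the last digit
        }
        return sum;
    }

    /** Counts the numbers in [1, limit) whose digit sum equals that of limit. */
    public static long countMatches(long limit) {
        long target = digitSum(limit);
        long counter = 0;
        for (long s = 1; s < limit; s++) {
            if (digitSum(s) == target) {
                counter++;
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        // The question's example: 123 has digit sum 6, matched by 9 smaller numbers.
        System.out.println(countMatches(123)); // prints 9
    }
}
```

This is still linear in the input number; it only removes the per-iteration list and boxing overhead.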
Presize ArrayLists
We have already eliminated the ArrayLists, but if we hadn't, we could improve speed by presizing them:
ArrayList<Long> numbers = new ArrayList<Long>(someGuessAsToSize);
Your guess doesn't have to be correct, since the ArrayList will still resize automatically, but if it is approximately right the code will be faster because the ArrayList will not have to resize periodically.
General style
You are holding lots of (what should be) method variables as fields
static long numberFromFile;
static long sum1, sum2;
static long number, number2;
static long variable, variable2;
static long counter;
This is unlikely to affect performance, but it is unusual, makes the code less readable, and opens the door to hidden side effects.
Your problem is intriguing - it got me wondering how much faster it would run with threads.
Here is a threaded implementation that splits the task of calculating the second problem across threads. My laptop only has two cores so I have set the threads to 4.
public static void main(String[] args) throws Exception {
final long in = 222222222;
final long target = calcSum(in);
final ExecutorService executorService = Executors.newFixedThreadPool(4);
final Collection<Future<Integer>> futures = Lists.newLinkedList();
final int chunk = 100;
for (long i = in; i > 0; i -= chunk) {
futures.add(executorService.submit(new Counter(i > chunk ? i - chunk : 0, i, target)));
}
long res = 0;
for (final Future<Integer> f : futures) {
res += f.get();
}
System.out.println(res);
executorService.shutdown();
executorService.awaitTermination(1, TimeUnit.DAYS);
}
public static final class Counter implements Callable<Integer> {
private final long start;
private final long end;
private final long target;
public Counter(long start, long end, long target) {
this.start = start;
this.end = end;
this.target = target;
}
@Override
public Integer call() throws Exception {
int count = 0;
for (long i = start; i < end; ++i) {
if (calcSum(i) == target) {
++count;
}
}
return count;
}
}
public static long calcSum(long num) {
long sum = 0;
while (num > 0) {
sum += num % 10;
num /= 10;
}
return sum;
}
It calculates the solution with 222 222 222 as an input in a few seconds.
I optimised the calculation of the sum to remove all the Lists that you were using.
EDIT
I added some timing code using Stopwatch and tried with and without @Ingo's optimisation, using 222222222 * 100 as the input number.
Without the optimisation the code takes 35 seconds. Changing the calc method to:
public static long calcSum(long num, final long limit) {
long sum = 0;
while (num > 0) {
sum += num % 10;
if (limit > 0 && sum > limit) {
break;
}
num /= 10;
}
return sum;
}
With the optimisation added, the code takes 28 seconds.
Note that this is a highly non-scientific benchmark, as I didn't warm up the JIT or run multiple trials (partly because I'm lazy and partly because I'm busy).
EDIT
Fiddling with the chunk size gives fairly different results too. With a chunk of 1000 time drops to around 17 seconds.
EDIT
If you want to be really fancy you can use a ForkJoinPool:
public static void main(String[] args) throws Exception {
final long in = 222222222;
final long target = calcSum(in);
final ForkJoinPool forkJoinPool = new ForkJoinPool();
final ForkJoinTask<Integer> result = forkJoinPool.submit(new Counter(0, in, target));
System.out.println(result.get());
forkJoinPool.shutdown();
forkJoinPool.awaitTermination(1, TimeUnit.DAYS);
}
public static final class Counter extends RecursiveTask<Integer> {
private static final long THRESHOLD = 1000;
private final long start;
private final long end;
private final long target;
public Counter(long start, long end, long target) {
this.start = start;
this.end = end;
this.target = target;
}
@Override
protected Integer compute() {
if (end - start < THRESHOLD) {
return computeDirectly();
}
long mid = start + (end - start) / 2;
final Counter low = new Counter(start, mid, target);
final Counter high = new Counter(mid, end, target);
low.fork();
final int highResult = high.compute();
final int lowResult = low.join();
return highResult + lowResult;
}
private Integer computeDirectly() {
int count = 0;
for (long i = start; i < end; ++i) {
if (calcSum(i) == target) {
++count;
}
}
return count;
}
}
public static long calcSum(long num) {
long sum = 0;
while (num > 0) {
sum += num % 10;
num /= 10;
}
return sum;
}
On a different (much faster) computer this runs in under a second, as compared to 2.8 seconds for the original approach.
You spend most of the time checking numbers that are failing the test. However, as Ingo observed, if you have a number ab, then (a-1)(b+1) has the same sum as ab. Instead of checking all numbers, you can generate them:
Let's say our number is 2 222; its digit sum is 8.
Approach #1: bottom up
We now generate the number starting with the smallest (we pad with zeroes for reading convenience): 0008. The next one is 0017, the next are 0026, 0035, 0044, 0053, 0062, 0071, 0080, 0107 and so on. The problematic part is finding the first number that has this sum.
Approach #2: top down
We start at 2222, the next lower number is 2213, then 2204, 2150, 2141, and so on. Here you don't have the problem that you need to find the lowest number.
I don't have time to write code now, but there should be an algorithm to realize both approaches, that does not involve trying out all numbers.
For a number abc, (a)(b-1)(c+1) is the next lower number, while (a)(b+1)(c-1) is the next higher number. The only interesting/difficult thing is when you need to overflow because b==9 or c==9, or b==0, c==0. The next bigger number if b==9 is (a+1)(9)(c-1) if c>0, and (a)(8)(0) if c==0. Now go make your algorithm, these examples should be enough.
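For counting only (which is all the question asks for), the matching numbers do not have to be generated one by one. The sketch below is not the step-by-step generation described above but a related technique, a digit-by-digit count, swapped in for illustration. The class and method names are hypothetical, and it assumes n >= 1.

```java
public final class DigitSumDp {

    /** Counts x in [1, n) whose digit sum equals the digit sum of n (assumes n >= 1). */
    public static long countBelow(long n) {
        String digits = Long.toString(n);
        int len = digits.length();
        int target = 0;
        for (int i = 0; i < len; i++) {
            target += digits.charAt(i) - '0';
        }

        // ways[k][s] = number of k-digit strings (leading zeros allowed) with digit sum s
        long[][] ways = new long[len + 1][target + 1];
        ways[0][0] = 1;
        for (int k = 1; k <= len; k++) {
            for (int s = 0; s <= target; s++) {
                for (int d = 0; d <= 9 && d <= s; d++) {
                    ways[k][s] += ways[k - 1][s - d];
                }
            }
        }

        // Walk the digits of n from the left. At each position, try every smaller
        // digit and count the ways to fill the remaining positions freely.
        long count = 0;
        int remaining = target; // digit sum still to be distributed
        for (int pos = 0; pos < len; pos++) {
            int digit = digits.charAt(pos) - '0';
            int free = len - pos - 1;
            for (int d = 0; d < digit; d++) {
                count += ways[free][remaining - d];
            }
            remaining -= digit;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countBelow(123)); // prints 9, as in the question
    }
}
```

The work grows with the number of digits rather than with the number itself, so an input like 2 222 222 222 needs only a few thousand additions.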
Observe that you don't need to store the individual digits at all.
Instead, all you're interested in is the actual sum of the digits.
Considering this, a method like
static int diagsum(long number) { ... }
would be great. If it is simple enough, the JIT could inline it, or at least optimize it better than your spaghetti code.
Then again, you could benefit from another method that stops computing the digit sum at some limit. For example, when you have
2222222222
the sum is 20, and that means that you need not compute any other sum that is greater than 20. For example:
45678993
Instead, you could just stop after the last 3 digits (which your division method yields first), because 9+9+3 is 21, and this is already greater than 20.
===================================================================
Another optimization:
If you have some number:
123116
it is immediately clear that all unique permutations of those 6 digits have the same digit sum, thus
321611, 231611, ... are solutions
Then, for any pair of individual digits ab, a transformed number would contain (a+1)(b-1) and (a-1)(b+1) in the same place, as long as a+1, ... are still in the range 0..9. Apply recursively to get even more numbers.
You can then turn to numbers with fewer digits. Obviously, to have the same digit sum, you must combine 2 digits of the original number, if possible, for example
5412 => 912, 642, 741, 552, 561, 543
etc.
Apply the same algorithm recursively as above, until no transformations and combinations are possible.
=========
It must be said, though, that the above idea would take lots of memory, because one must maintain a Set-like data structure to take care of duplicates. However, for 987_654_321 we already get 39_541_589 results, and probably many more with even greater numbers. Thus it is questionable whether the effort of actually doing it the combinatorial way is worth it.
Say I want to go through a loop a billion times how could I optimize the loop to get my results faster?
As an example:
double randompoint;
for(long count =0; count < 1000000000; count++) {
randompoint = (Math.random() * 1) + 0; //generate a random point
if(randompoint <= .75) {
var++;
}
}
I was reading up on vectorization, but I'm not quite sure how to go about it. Any ideas?
Since Java is cross-platform, you pretty much have to rely on the JIT to vectorize. In your case it can't, since each iteration depends heavily on the previous one (due to how the RNG works).
However, there are two other major ways to improve your computation.
The first is that this work is very amenable to parallelization. The technical term is embarrassingly parallel. This means that multithreading will give a perfectly linear speedup over the number of cores.
The second is that Math.random() is written to be multithreading safe, which also means that it's slow because it needs to use atomic operations. This isn't helpful, so we can skip that overhead by using a non-threadsafe RNG.
I haven't written much Java since 1.5, but here's a dumb implementation:
import java.util.*;
import java.util.concurrent.*;
class Foo implements Runnable {
private long count;
private double threshold;
private long result;
public Foo(long count, double threshold) {
this.count = count;
this.threshold = threshold;
}
public void run() {
ThreadLocalRandom rand = ThreadLocalRandom.current();
for(long l=0; l<count; l++) {
if(rand.nextDouble() < threshold)
result++;
}
}
public static void main(String[] args) throws Exception {
long count = 1000000000;
double threshold = 0.75;
int cores = Runtime.getRuntime().availableProcessors();
long sum = 0;
List<Foo> list = new ArrayList<Foo>();
List<Thread> threads = new ArrayList<Thread>();
for(int i=0; i<cores; i++) {
// TODO: account for count%cores!=0
Foo t = new Foo(count/cores, threshold);
list.add(t);
Thread thread = new Thread(t);
thread.start();
threads.add(thread);
}
for(Thread t : threads) t.join();
for(Foo f : list) sum += f.result;
System.out.println(sum);
}
}
You can also optimize and inline the random generator, to avoid going via doubles. Here it is with code taken from the ThreadLocalRandom docs:
public void run() {
long seed = new Random().nextLong();
long limit = (long) ((1L<<48) * threshold);
for(long i=0; i<count; i++) {
seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);
if (seed < limit) ++result;
}
}
However, the best approach is to work smarter, not harder. As the number of events increases, the distribution of the count (a binomial distribution) tends towards a normal distribution. This means that for your huge range, you can randomly generate a number with such a distribution and scale it:
import java.util.Random;
class StayInSchool {
public static void main(String[] args) {
System.out.println(coinToss(1000000000, 0.75));
}
static long coinToss(long iterations, double threshold) {
double mean = threshold * iterations;
double stdDev = Math.sqrt(threshold * (1-threshold) * iterations);
double p = new Random().nextGaussian();
return (long) (p*stdDev + mean);
}
}
Here are the timings on my 4 core system (including VM startup) for these approaches:
Your baseline: 20.9s
Single threaded ThreadLocalRandom: 6.51s
Single threaded optimized random: 1.75s
Multithreaded ThreadLocalRandom: 1.67s
Multithreaded optimized random: 0.89s
Generating a gaussian: 0.14s
I'm having trouble with a multithreaded Java program.
The program splits the sum of an array of integers across multiple threads and then adds up the partial sums of the slices.
The problem is that the computing time does not decrease as the number of threads increases (I know that beyond a certain number of threads the computation becomes slower than with fewer threads). I expect to see the execution time decrease before that limit is reached (the benefit of parallel execution). I use the variable fake in the run method to make the time "readable".
public class MainClass {
private final int MAX_THREAD = 8;
private final int ARRAY_SIZE = 1000000;
private int[] array;
private SimpleThread[] threads;
private int numThread = 1;
private int[] sum;
private int start = 0;
private int totalSum = 0;
long begin, end;
int fake;
MainClass() {
fillArray();
for(int i = 0; i < MAX_THREAD; i++) {
threads = new SimpleThread[numThread];
sum = new int[numThread];
begin = (long) System.currentTimeMillis();
for(int j = 0 ; j < numThread; j++) {
threads[j] = new SimpleThread(start, ARRAY_SIZE/numThread, j);
threads[j].start();
start+= ARRAY_SIZE/numThread;
}
for(int k = 0; k < numThread; k++) {
try {
threads[k].join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
end = (long) System.currentTimeMillis();
for(int g = 0; g < numThread; g++) {
totalSum+=sum[g];
}
System.out.printf("Result with %d thread-- Sum = %d Time = %d\n", numThread, totalSum, end-begin);
numThread++;
start = 0;
totalSum = 0;
}
}
public static void main(String args[]) {
new MainClass();
}
private void fillArray() {
array = new int[ARRAY_SIZE];
for(int i = 0; i < ARRAY_SIZE; i++)
array[i] = 1;
}
private class SimpleThread extends Thread{
int start;
int size;
int index;
public SimpleThread(int start, int size, int sumIndex) {
this.start = start;
this.size = size;
this.index = sumIndex;
}
public void run() {
for(int i = start; i < start+size; i++)
sum[index]+=array[i];
for(long i = 0; i < 1000000000; i++) {
fake++;
}
}
}
}
[Screenshot: unexpected result]
As a general rule, you won't get a speedup from multi-threading if the "work" performed by each thread is less than the overheads of using the threads.
One of the overheads is the cost of starting a new thread. This is surprisingly high. Each time you start a thread the JVM needs to perform syscalls to allocate the thread stack memory segment and the "red zone" memory segment, and initialize them. (The default thread stack size is typically 500KB or 1MB.) Then there are further syscalls to create the native thread and schedule it.
In this example, you have 1,000,000 elements to sum and you divide this work among N threads. As N increases, the amount of work performed by each thread decreases.
It is not hard to see that the time taken to sum 1,000,000 elements is going to be less than the time needed to start 4 threads ... just based on counting the memory read and write operations. Then you need to take into account that the child threads are created one at a time by the parent thread.
If you do the analysis completely, it is clear that there is a point at which adding more threads actually slows down the computation, even if you have enough cores to run all threads in parallel. And your benchmarking seems to suggest [1] that that point is around 2 threads.
By the way, there is a second reason why you may not get as much speedup as you expect for a benchmark like this one. The "work" that each thread is doing is basically scanning a large array. Reading and writing arrays will generate requests to the memory system. Ideally, these requests will be satisfied by the (fast) on-chip memory caches. However, if you try to read / write an array that is larger than the memory cache, then many / most of those requests turn into (slow) main memory requests. Worse still, if you have N cores all doing this then you can find that the number of main memory requests is too much for the memory system to keep up .... and the threads slow down.
The bottom line is that multi-threading does not automatically make an application faster, and it certainly won't if you do it the wrong way.
In your example:
the amount of work per thread is too small compared with the overheads of creating and starting threads, and
memory bandwidth effects are likely to be a problem if you can "factor out" the thread creation overheads
[1] I don't understand the point of the "fake" computation. It probably invalidates the benchmark, though it is possible that the JIT compiler optimizes it away.
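One common way to take the thread creation cost out of each run is to start the threads once and reuse them. The sketch below assumes a fixed-size ExecutorService for that purpose; the class name PooledSum and its sum method are hypothetical, and it illustrates the idea rather than repairing the benchmark above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public final class PooledSum {

    /** Sums data in roughly `parts` chunks on an existing pool (assumes parts >= 1). */
    public static long sum(int[] data, int parts, ExecutorService pool) {
        int chunk = (data.length + parts - 1) / parts; // ceiling division
        List<Future<Long>> futures = new ArrayList<>();
        for (int from = 0; from < data.length; from += chunk) {
            final int lo = from;
            final int hi = Math.min(data.length, from + chunk);
            Callable<Long> task = () -> {
                long partial = 0;
                for (int i = lo; i < hi; i++) {
                    partial += data[i];
                }
                return partial;
            };
            futures.add(pool.submit(task));
        }
        long total = 0;
        try {
            for (Future<Long> f : futures) {
                total += f.get(); // wait for each partial sum
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        Arrays.fill(data, 1);
        ExecutorService pool = Executors.newFixedThreadPool(4); // threads are created once
        try {
            for (int parts = 1; parts <= 8; parts++) {
                System.out.println(parts + " parts: " + sum(data, parts, pool));
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Even with a reused pool, summing a million ints is so cheap that task hand-off and memory bandwidth may still dominate, so a speedup is not guaranteed.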
Why sum is wrong sometimes?
Because ARRAY_SIZE/numThread is an integer division, any fractional part is discarded (e.g. 1000000/3 = 333333, not 333333.33...). The slices then do not cover the whole array, so the sum may be less than 1000000, depending on the divisor.
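A small sketch of one way to avoid the lost elements: hand the remainder out one element at a time to the first few slices. The class SliceBounds and its bounds method are hypothetical names, and parts is assumed to be at least 1.

```java
public final class SliceBounds {

    /**
     * Returns parts + 1 boundaries; slice j covers indices [bounds[j], bounds[j + 1]).
     * The first (size % parts) slices each get one extra element.
     */
    public static int[] bounds(int size, int parts) {
        int[] bounds = new int[parts + 1];
        int base = size / parts;  // minimum slice length
        int extra = size % parts; // elements left over after integer division
        for (int j = 0; j < parts; j++) {
            bounds[j + 1] = bounds[j] + base + (j < extra ? 1 : 0);
        }
        return bounds;
    }

    public static void main(String[] args) {
        int[] b = bounds(1_000_000, 3);
        // Slices: [0, 333334), [333334, 666667), [666667, 1000000)
        System.out.println(b[1] + " " + b[2] + " " + b[3]);
    }
}
```

In the question's code, thread j would then be started with start = bounds[j] and size = bounds[j + 1] - bounds[j].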
Why the time taken is increasing as the number of threads increases?
Because in the run function of each thread you do this:
for(long i = 0; i < 1000000000; i++) {
fake++;
}
I do not understand what you mean by this statement in your question:
I use the variable fake in run method to make time "readable".
Whatever the intent, every thread has to increment the fake variable 1000000000 times.
As a side note, for what you're trying to do there is the Fork/Join-Framework. It allows you easily split tasks recursively and implements an algorithm which will distribute your workload automatically.
There is a guide available here; its example is very similar to your case, which boils down to a RecursiveTask like this:
class Adder extends RecursiveTask<Integer>
{
private int[] toAdd;
private int from;
private int to;
/** Add the numbers in the given array */
public Adder(int[] toAdd)
{
this(toAdd, 0, toAdd.length);
}
/** Add the numbers in the given array between the given indices;
internal constructor to split work */
private Adder(int[] toAdd, int fromIndex, int upToIndex)
{
this.toAdd = toAdd;
this.from = fromIndex;
this.to = upToIndex;
}
/** This is the work method */
@Override
protected Integer compute()
{
int amount = to - from;
int result = 0;
if (amount < 500)
{
// base case: add ints and return the result
for (int i = from; i < to; i++)
{
result += toAdd[i];
}
}
else
{
// array too large: split it into two parts and distribute the actual adding
int newEndIndex = from + (amount / 2);
Collection<Adder> invokeAll = invokeAll(Arrays.asList(
new Adder(toAdd, from, newEndIndex),
new Adder(toAdd, newEndIndex, to)));
for (Adder a : invokeAll)
{
result += a.join();
}
}
return result;
}
}
To actually run this, you can use
RecursiveTask<Integer> adder = new Adder(fillArray(ARRAY_LENGTH));
int result = ForkJoinPool.commonPool().invoke(adder);
Starting threads is heavy and you'll only see the benefit of it on large processes that don't compete for the same resources (none of it applies here).
I wrote a small program to find the first 5 Taxicab numbers (so far only 6 are known) by checking each integer from 2 to 5E+15. The definition of Taxicab numbers is here.
However, my program took 8 minutes just to reach 3E+7. Since Taxicab(3) is in the order of 8E+7, I hesitate to let it run any further without optimizing it first.
I'm using NetBeans 8 on Ubuntu 16.10 on a HP 8560w, i7 2600qm quad core, 16GB RAM. However, Java only uses 1 core, to a maximum of 25% total CPU power, even when given Very High Priority. How do I fix this?
public class Ramanujan
{
public static void main(String[] args)
{
long limit;
//limit = 20;
limit = 500000000000000000L;
int order = 1;
for (long testCase = 2; testCase < limit; testCase++)
{
if (isTaxicab(testCase, order))
{
System.out.printf("Taxicab(%d) = %d*****************************\n",
order, testCase);
order++;
}
else
{
if (testCase%0x186a0 ==0) //Prints every 100000 iterations to track progress
{
//To track progress
System.out.printf("%d \n", testCase);
}
}
}
}
public static boolean isTaxicab(long testCase, int order)
{
int way = 0; //Number of ways that testCase can be expressed as sum of 2 cube numbers.
long i = 1;
long iUpperBound = (long) (1+Math.cbrt(testCase/2));
//If testCase = i*i*i + j*j*j AND i<=j
//then i*i*i cant be > testCase/2
//No need to test beyond that
while (i < iUpperBound)
{
if ( isSumOfTwoCubes(testCase, i) )
{
way++;
}
i++;
}
return (way >= order);
}
public static boolean isSumOfTwoCubes(long testCase,long i)
{
boolean isSum = false;
long jLowerBound = (long) Math.cbrt(testCase -i*i*i);
for (long j = jLowerBound; j < jLowerBound+2; j++)
{
long sumCubes = i*i*i + j*j*j;
if (sumCubes == testCase)
{
isSum = true;
break;
}
}
return isSum;
}
}
The program itself will only ever use one core until you parallelize it.
You need to learn how to use Threads.
Your problem is embarrassingly parallel. Parallelizing too much (i.e. creating too many threads) will be detrimental because each thread creates an overhead, so you need to be careful regarding exactly how you parallelize.
If it was up to me, I would initialize a list of worker threads where each thread effectively performs isTaxicab() and simply assign a single testCase to each worker as it becomes available.
You would want to code such that you can easily experiment with the number of workers.
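As a rough sketch of the idea, the candidates can be spread over the available cores with a parallel LongStream, which runs on the common fork/join pool, instead of hand-managed worker threads. That is a swapped-in technique, not the worker list described above. The class ParallelTaxicab and the methods ways and firstWithWays are hypothetical names, and ways is a simplified stand-in for isTaxicab.

```java
import java.util.stream.LongStream;

public final class ParallelTaxicab {

    /** Number of ways n can be written as i^3 + j^3 with 1 <= i <= j. */
    public static int ways(long n) {
        int ways = 0;
        for (long i = 1; 2 * i * i * i <= n; i++) { // i <= j implies 2 * i^3 <= n
            long rest = n - i * i * i;
            long j = Math.round(Math.cbrt((double) rest));
            if (j >= i && j * j * j == rest) {
                ways++;
            }
        }
        return ways;
    }

    /** Smallest n in [2, limit) expressible in at least `order` ways, or -1 if none. */
    public static long firstWithWays(int order, long limit) {
        return LongStream.range(2, limit)
                .parallel()                    // split the range across cores
                .filter(n -> ways(n) >= order) // each candidate is tested independently
                .findFirst()                   // respects encounter order, so the smallest match wins
                .orElse(-1L);
    }

    public static void main(String[] args) {
        System.out.println(firstWithWays(2, 2_000)); // prints 1729
    }
}
```

The number of worker threads can be varied for experiments by running the stream inside a custom ForkJoinPool or by setting the common pool's parallelism property.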
I am trying to solve a problem on Codeforces, and I get a "Time limit exceeded" judgment. The only time-consuming operation is calculating the sum of a big array, so I've tried to optimize it, but with no result.
What I want: Optimize the next function:
//array could be Integer.MAX_VALUE length
private long canocicalSum(int[] array) {
int sum = 0;
for (int i = 0; i < array.length; i++)
sum += array[i];
return sum;
}
Question1 [main]: Is it possible to optimize canonicalSum?
I've tried: avoiding operations on very big numbers by using auxiliary data. For instance, I convert array1[100] to array2[10], where array2[i] = array1[10*i] + array1[10*i+1] + ... + array1[10*i+9].
private long optimizedSum(int[] array, int step) {
do {
array = sumItr(array, step);
} while (array.length != 1);
return array[0];
}
private int[] sumItr(int[] array, int step) {
int length = array.length / step + 1;
boolean needCompensation = (array.length % step == 0) ? false : true;
int aux[] = new int[length];
for (int i = 0, auxSum = 0, auxPointer = 0; i < array.length; i++) {
auxSum += array[i];
if ((i + 1) % step == 0) {
aux[auxPointer++] = auxSum;
auxSum = 0;
}
if (i == array.length - 1 && needCompensation) {
aux[auxPointer++] = auxSum;
}
}
return aux;
}
Problem: But it appears that canonicalSum is ten times faster than optimizedSum. Here is my test:
@Test
public void sum_comparison() {
final int ARRAY_SIZE = 100000000;
final int STEP = 1000;
int[] array = genRandomArray(ARRAY_SIZE);
System.out.println("Start canonical Sum");
long beg1 = System.nanoTime();
long sum1 = canocicalSum(array);
long end1 = System.nanoTime();
long time1 = end1 - beg1;
System.out.println("canon:" + TimeUnit.MILLISECONDS.convert(time1, TimeUnit.NANOSECONDS) + "milliseconds");
System.out.println("Start optimizedSum");
long beg2 = System.nanoTime();
long sum2 = optimizedSum(array, STEP);
long end2 = System.nanoTime();
long time2 = end2 - beg2;
System.out.println("custom:" + TimeUnit.MILLISECONDS.convert(time2, TimeUnit.NANOSECONDS) + "milliseconds");
assertEquals(sum1, sum2);
assertTrue(time2 <= time1);
}
private int[] genRandomArray(int size) {
int[] array = new int[size];
Random random = new Random();
for (int i = 0; i < array.length; i++) {
array[i] = random.nextInt();
}
return array;
}
Question2: Why is optimizedSum slower than canonicalSum?
As of Java 9, vectorisation of this operation has been implemented but disabled, based on benchmarks measuring the all-in cost of the code plus its compilation. Depending on your processor, this leads to the relatively entertaining result that if you introduce artificial complications into your reduction loop, you can trigger autovectorisation and get a quicker result! So the fastest code, for now, assuming numbers small enough not to overflow, is:
public int sum(int[] data) {
int value = 0;
for (int i = 0; i < data.length; ++i) {
value += 2 * data[i];
}
return value / 2;
}
This isn't intended as a recommendation! This is more to illustrate that the speed of your code in Java is dependent on the JIT, its trade-offs, and its bugs/features in any given release. Writing cute code to optimise problems like this is at best vain and will put a shelf life on the code you write. For instance, had you manually unrolled a loop to optimise for an older version of Java, your code would be much slower in Java 8 or 9 because this decision would completely disable autovectorisation. You'd better really need that performance to do it.
Question1 [main]: Is it possible to optimize canonicalSum?
Yes, it is. But I have no idea with what factor.
Some things you can do are:
use the parallel pipelines introduced in Java 8. The processor has instructions for summing two arrays (and more) in parallel. This can be observed in Octave: summing two vectors with ".+" (parallel addition) or "+" is far faster than using a loop.
use multithreading. You could use a divide and conquer algorithm. Maybe like this:
divide the array into 2 or more
keep dividing recursively until you get an array with manageable size for a thread.
start computing the sum for the sub arrays (divided arrays) with separate threads.
finally add the sum generated (from all the threads) for all sub arrays together to produce final result
maybe unrolling the loop would help a bit, too. By loop unrolling I mean reducing the steps the loop will have to make by doing more operations in the loop manually.
An example from http://en.wikipedia.org/wiki/Loop_unwinding :
for (int x = 0; x < 100; x++)
{
delete(x);
}
becomes
for (int x = 0; x < 100; x+=5)
{
delete(x);
delete(x+1);
delete(x+2);
delete(x+3);
delete(x+4);
}
but, as mentioned, this must be done with caution and profiling, since the JIT can probably perform this kind of optimization itself.
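For the first suggestion in the list above, a minimal sketch with the Java 8 stream API might look as follows. The class name StreamSum is hypothetical. Widening to long before summing avoids int overflow; whether the parallel variant is actually faster depends on array size and hardware, so measure it.

```java
import java.util.Arrays;

public final class StreamSum {

    /** Sequential sum, widened to long so large totals do not overflow. */
    public static long sum(int[] array) {
        return Arrays.stream(array).asLongStream().sum();
    }

    /** Same sum, but the stream is split across the common fork/join pool. */
    public static long parallelSum(int[] array) {
        return Arrays.stream(array).parallel().asLongStream().sum();
    }

    public static void main(String[] args) {
        int[] data = {Integer.MAX_VALUE, Integer.MAX_VALUE, 1};
        System.out.println(sum(data));         // prints 4294967295
        System.out.println(parallelSum(data)); // prints 4294967295
    }
}
```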
An implementation of mathematical operations using the multithreaded approach can be seen here.
The example implementation with the Fork/Join framework introduced in Java 7, which basically does what the divide-and-conquer algorithm above does, would be:
public class ForkJoinCalculator extends RecursiveTask<Double> {
public static final long THRESHOLD = 1_000_000;
private final SequentialCalculator sequentialCalculator;
private final double[] numbers;
private final int start;
private final int end;
public ForkJoinCalculator(double[] numbers, SequentialCalculator sequentialCalculator) {
this(numbers, 0, numbers.length, sequentialCalculator);
}
private ForkJoinCalculator(double[] numbers, int start, int end, SequentialCalculator sequentialCalculator) {
this.numbers = numbers;
this.start = start;
this.end = end;
this.sequentialCalculator = sequentialCalculator;
}
@Override
protected Double compute() {
int length = end - start;
if (length <= THRESHOLD) {
return sequentialCalculator.computeSequentially(numbers, start, end);
}
ForkJoinCalculator leftTask = new ForkJoinCalculator(numbers, start, start + length/2, sequentialCalculator);
leftTask.fork();
ForkJoinCalculator rightTask = new ForkJoinCalculator(numbers, start + length/2, end, sequentialCalculator);
Double rightResult = rightTask.compute();
Double leftResult = leftTask.join();
return leftResult + rightResult;
}
}
Here we develop a RecursiveTask splitting an array of doubles until
the length of a subarray doesn't go below a given threshold. At this
point the subarray is processed sequentially applying on it the
operation defined by the following interface
The interface used is this:
public interface SequentialCalculator {
double computeSequentially(double[] numbers, int start, int end);
}
And the usage example:
public static double varianceForkJoin(double[] population){
final ForkJoinPool forkJoinPool = new ForkJoinPool();
double total = forkJoinPool.invoke(new ForkJoinCalculator(population, new SequentialCalculator() {
@Override
public double computeSequentially(double[] numbers, int start, int end) {
double total = 0;
for (int i = start; i < end; i++) {
total += numbers[i];
}
return total;
}
}));
final double average = total / population.length;
double variance = forkJoinPool.invoke(new ForkJoinCalculator(population, new SequentialCalculator() {
@Override
public double computeSequentially(double[] numbers, int start, int end) {
double variance = 0;
for (int i = start; i < end; i++) {
variance += (numbers[i] - average) * (numbers[i] - average);
}
return variance;
}
}));
return variance / population.length;
}
If you want to add N numbers then the runtime is O(N). So in this aspect your canonicalSum can not be "optimized".
What you can do to reduce runtime is make the summation parallel. I.e. break the array to parts and pass it to separate threads and in the end sum the result returned by each thread.
Update: This assumes a multicore system, but there is a Java API to get the number of cores.
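The API in question is presumably Runtime.availableProcessors(). A tiny sketch follows; the class name CoreCount and the method workerCount are hypothetical.

```java
public final class CoreCount {

    /** Number of worker threads to use: one per processor available to the JVM. */
    public static int workerCount() {
        // availableProcessors() may change during the JVM's lifetime, so it is queried on demand.
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    public static void main(String[] args) {
        System.out.println("worker threads: " + workerCount());
    }
}
```

The value counts logical processors, so on a CPU with hyper-threading it can be higher than the number of physical cores.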
I'm trying to make a decent Java program that generates the primes from 1 to N (mainly for Project Euler problems).
At the moment, my algorithm is as follows:
Initialise an array of booleans (or a bitarray if N is sufficiently large) so they're all false, and an array of ints to store the primes found.
Set an integer, s equal to the lowest prime, (ie 2)
While s is <= sqrt(N)
Set all multiples of s (starting at s^2) to true in the array/bitarray.
Find the next smallest index in the array/bitarray which is false, use that as the new value of s.
Endwhile.
Go through the array/bitarray, and for every value that is false, put the corresponding index in the primes array.
Now, I've tried skipping over numbers not of the form 6k + 1 or 6k + 5, but that only gives me a ~2x speed-up, whilst I've seen programs run orders of magnitude faster than mine (albeit with very convoluted code), such as the one here.
What can I do to improve?
Edit: Okay, here's my actual code (for N of 1E7):
int l = 10000000, n = 2, sqrt = (int) Math.sqrt(l);
boolean[] nums = new boolean[l + 1];
int[] primes = new int[664579];
while(n <= sqrt){
for(int i = 2 * n; i <= l; nums[i] = true, i += n);
for(n++; nums[n]; n++);
}
for(int i = 2, k = 0; i < nums.length; i++) if(!nums[i]) primes[k++] = i;
Runs in about 350ms on my 2.0GHz machine.
While s is <= sqrt(N)
One mistake people often make in such algorithms is not precomputing the square root.
while (s <= sqrt(N)) {
is much, much slower than
int limit = sqrt(N);
while (s <= limit) {
But generally speaking, Eiko is right in his comment. If you want people to offer low-level optimisations, you have to provide code.
update Ok, now about your code.
You may notice that the number of iterations in your code is only a little bigger than 'l'. (You can put a counter inside the first 'for' loop; it will be just 2-3 times bigger.) And, obviously, the complexity of your solution can't be less than O(l) (you can't have fewer than 'l' iterations).
What can make a real difference is accessing memory efficiently. Note that the guy who wrote that article tries to reduce storage size not just because he's memory-greedy. Making arrays compact lets you use the cache better and thus increases speed.
I just replaced boolean[] with int[] and achieved an immediate 2x speed gain (and 8x less memory, since each int packs 32 flags). And I didn't even try to do it efficiently.
update2
That's easy. You just replace every assignment a[i] = true with a[i/32] |= 1 << (i%32) and each read operation a[i] with (a[i/32] & (1 << (i%32))) != 0. And boolean[] a with int[] a, obviously.
From the first replacement it should be clear how it works: if a[i] was true in the boolean version, then the integer a[i/32] has a 1 bit at position i%32 (an int in Java has exactly 32 bits, as you know).
You can go further and replace i/32 with i >> 5, i%32 with i&31. You can also precompute all 1 << j for each j between 0 and 31 in array.
But sadly, I don't think you could get close to C with this in Java. Not to mention that the guy uses many other tricky optimizations, and I agree that his code would've been worth a lot more if he had commented it.
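The replacements described above can be sketched as follows. This is a minimal illustration rather than a tuned implementation: the class and method names (BitPackedSieve, sieve, isSet, countPrimes) are made up for this example, and it assumes the limit is far enough below Integer.MAX_VALUE that i += n cannot overflow.

```java
class BitPackedSieve {
    // Bit i of the packed array is 1 once i has been crossed off as composite.
    static int[] sieve(int limit) {
        int[] composite = new int[(limit >> 5) + 1]; // i >> 5 is i / 32
        int sqrt = (int) Math.sqrt(limit);           // precomputed once
        for (int n = 2; n <= sqrt; n++) {
            if (isSet(composite, n)) continue;       // n is composite, skip it
            for (int i = n * n; i <= limit; i += n) {
                composite[i >> 5] |= 1 << (i & 31);  // i & 31 is i % 32
            }
        }
        return composite;
    }

    // Read operation: the replacement for a[i] on a boolean[].
    static boolean isSet(int[] bits, int i) {
        return (bits[i >> 5] & (1 << (i & 31))) != 0;
    }

    static int countPrimes(int limit) {
        int[] composite = sieve(limit);
        int count = 0;
        for (int i = 2; i <= limit; i++) {
            if (!isSet(composite, i)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Prints 664579, the size of the primes array in the question.
        System.out.println(countPrimes(10_000_000));
    }
}
```

Whether this beats boolean[] on a given machine depends on cache sizes and the JIT, so it is worth timing both versions.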
Using the BitSet will use less memory. The Sieve algorithm is rather trivial, so you can simply "set" the bit positions on the BitSet, and then iterate to determine the primes.
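A rough sketch of that idea, using only java.util.BitSet methods from the standard library (set, nextClearBit, cardinality). The class and method names are invented for this example, and a set bit here means "composite", which is one possible convention, not the only one.

```java
import java.util.BitSet;

class BitSetSieve {
    // Returns a BitSet in which bit i is set when i is NOT prime (0 <= i <= limit).
    static BitSet sieve(int limit) {
        BitSet composite = new BitSet(limit + 1);
        composite.set(0, 2); // marks 0 and 1 as non-prime (toIndex is exclusive)
        // nextClearBit jumps straight to the next number not yet crossed off.
        for (int n = 2; (long) n * n <= limit; n = composite.nextClearBit(n + 1)) {
            for (int i = n * n; i <= limit; i += n) {
                composite.set(i);
            }
        }
        return composite;
    }

    // Iterates over the clear bits to collect the primes up to limit.
    static int[] primes(int limit) {
        if (limit < 2) return new int[0];
        BitSet composite = sieve(limit);
        int[] result = new int[limit + 1 - composite.cardinality()];
        int k = 0;
        for (int i = composite.nextClearBit(2); i <= limit; i = composite.nextClearBit(i + 1)) {
            result[k++] = i;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(primes(10_000_000).length);
    }
}
```

This assumes the limit stays well below Integer.MAX_VALUE so that i += n does not overflow.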
Did you also make the array smaller while skipping numbers not of the form 6k+1 and 6k+5?
I only tested with ignoring numbers of the form 2k and that gave me ~4x speed up (440 ms -> 120 ms):
int l = 10000000, n = 1, sqrt = (int) Math.sqrt(l);
int m = l/2;
boolean[] nums = new boolean[m + 1];
int[] primes = new int[664579];
int i, k;
while (n <= sqrt) {
int x = (n<<1)+1;
for (i = n+x; i <= m; nums[i] = true, i+=x);
for (n++; nums[n]; n++);
}
primes[0] = 2;
for (i = 1, k = 1; i < nums.length; i++) {
if (!nums[i])
primes[k++] = (i<<1)+1;
}
The following is from my Project Euler library. It's a slight variation of the Sieve of Eratosthenes. I'm not sure, but I think it's called the Euler sieve.
1) It uses a BitSet (so 1/8th the memory)
2) It only uses the BitSet for odd numbers (another 1/2, hence 1/16th)
Note: the inner loop (for multiples) begins at "n*n" rather than "2*n", and only odd multiples are crossed off, by stepping in increments of "2*n"; hence the speed-up.
private void beginSieve(int mLimit)
{
primeList = new BitSet(mLimit>>1);
primeList.set(0,primeList.size(),true);
int sqroot = (int) Math.sqrt(mLimit);
primeList.clear(0);
for(int num = 3; num <= sqroot; num+=2)
{
if( primeList.get(num >> 1) )
{
int inc = num << 1;
for(int factor = num * num; factor < mLimit; factor += inc)
{
//if( ((factor) & 1) == 1)
//{
primeList.clear(factor >> 1);
//}
}
}
}
}
and here's the function to check if a number is prime...
public boolean isPrime(int num)
{
if( num < maxLimit)
{
if( (num & 1) == 0)
return ( num == 2);
else
return primeList.get(num>>1);
}
return false;
}
You could do the step of "putting the corresponding index in the primes array" while you are detecting them, taking out a run through the array, but that's about all I can think of right now.
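A sketch of what that might look like: each number is stored in the primes array at the moment it is found unmarked, so no second pass over the flags is needed. The names here are hypothetical, and the primes array is simply over-allocated and trimmed at the end rather than sized exactly as in the question.

```java
import java.util.Arrays;

class CollectWhileSieving {
    static int[] primesUpTo(int limit) {
        if (limit < 2) return new int[0];
        boolean[] composite = new boolean[limit + 1];
        // At most about half of the numbers up to limit can be prime.
        int[] primes = new int[limit / 2 + 1];
        int k = 0;
        for (int n = 2; n <= limit; n++) {
            if (composite[n]) continue;
            primes[k++] = n; // record the prime as soon as it is detected
            // A long index avoids overflow in n * n; the loop body is
            // skipped automatically once n exceeds sqrt(limit).
            for (long i = (long) n * n; i <= limit; i += n) {
                composite[(int) i] = true;
            }
        }
        return Arrays.copyOf(primes, k); // trim unused slots
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(primesUpTo(50)));
    }
}
```

The outer loop now runs all the way to the limit instead of stopping at the square root, so this trades the separate collection pass for a longer main loop; any gain is likely to be small.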
I wrote a simple sieve implementation recently for the fun of it using BitSet (everyone says not to, but it's the best off the shelf way to store huge data efficiently). The performance seems to be pretty good to me, but I'm still working on improving it.
import java.util.BitSet;
public class HelloWorld {
private static int LIMIT = 2140000000;//Integer.MAX_VALUE broke things.
private static BitSet marked;
public static void main(String[] args) {
long startTime = System.nanoTime();
init();
sieve();
long estimatedTime = System.nanoTime() - startTime;
System.out.println((float)estimatedTime/1000000000); //23.835363 seconds
System.out.println(marked.size()); //1070000000 ~= 127MB
}
private static void init()
{
double size = LIMIT * 0.5 - 1;
marked = new BitSet();
marked.set(0,(int)size, true);
}
private static void sieve()
{
int i = 0;
int cur = 0;
int add = 0;
int pos = 0;
while(((i<<1)+1)*((i<<1)+1) < LIMIT)
{
pos = i;
if(marked.get(pos++))
{
cur = pos;
add = (cur<<1);
pos += add*cur + cur - 1;
while(pos < marked.length() && pos > 0)
{
marked.clear(pos++);
pos += add;
}
}
i++;
}
}
private static void readPrimes()
{
int pos = 0;
while(pos < marked.length())
{
if(marked.get(pos++))
{
System.out.print((pos<<1)+1);
System.out.print("-");
}
}
}
}
With smaller LIMITs (say 10,000,000 which took 0.077479s) we get much faster results than the OP.
I bet java's performance is terrible when dealing with bits...
Algorithmically, the link you point out should be sufficient
Have you tried googling, e.g. for "java prime numbers"? I did and dug up this simple improvement:
http://www.anyexample.com/programming/java/java_prime_number_check_%28primality_test%29.xml
Surely you can find more on Google.
Here is my code for the Sieve of Eratosthenes; this is actually the most efficient version I could come up with:
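static final int MAX = 1000000;
static int[] p = new int[MAX]; // p[n] != 0 means n has been crossed off
static int[] prime = new int[MAX / 10]; // 78498 primes below 1,000,000 fit here
static void sieve()
{
int i, j, k = 1;
p[0] = p[1] = 1; // 0 and 1 are not prime
prime[0] = 2; // the only even prime is stored up front
for (i = 3; i * i <= MAX; i += 2)
{
if (p[i] != 0) // an int is not a boolean in Java
continue;
// start at i*i and step by 2*i so only odd multiples are visited
for (j = i * i; j < MAX; j += 2 * i)
p[j] = 1;
}
for (i = 3; i < MAX; i += 2)
{
if (p[i] == 0)
prime[k++] = i;
}
}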