int n, k;
int count = 0, diff;
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String[] input;
input = br.readLine().split(" ");
n = Integer.parseInt(input[0]);
int[] a = new int[n];
k = Integer.parseInt(input[1]);
input = br.readLine().split(" ");
for (int i = 0; i < n; i++) {
a[i] = Integer.parseInt(input[i]);
for (int j = 0; j < i; j++) {
diff = a[j] - a[i];
if (diff == k || -diff == k) {
count++;
}
}
}
System.out.print(count);
This is a sample program where I count the pairs with a particular difference k; n can be up to 100,000.
The problem now is to decrease the execution time of this program. How can I make it better to reduce the running time?
Thanks in advance for suggestions.
Read the numbers from a file and put them in a Map (numbers as keys, their frequencies as values). Iterate over them once, and for each number check if the map contains that number with k added. If so, increase your counter. If you use a HashMap it's O(n) that way, instead of your algorithm's O(n^2).
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
int k = Integer.parseInt(br.readLine().split(" ")[1]);
Map<Integer, Integer> readNumbers = new HashMap<Integer, Integer>();
for (String aNumber : br.readLine().split(" ")) {
Integer num = Integer.parseInt(aNumber);
Integer freq = readNumbers.get(num);
readNumbers.put(num, freq == null ? 1 : freq + 1);
}
int count = 0;
for (Integer aNumber : readNumbers.keySet()) {
int freq = readNumbers.get(aNumber);
if (k == 0) {
count += freq * (freq - 1) / 2;
} else if (readNumbers.containsKey(aNumber + k)) {
count += freq * readNumbers.get(aNumber + k);
}
}
System.out.print(count);
EDIT: fixed for duplicates and k = 0.
Here is a comparison of @Socha23's solution using HashMap, a Trove TIntIntHashMap, and the original nested-loop solution.
For 100,000 numbers I got the following (without the reading and parsing)
For 100 unique values, k=10
Set: 89,699,743 took 0.036 ms
Trove Set: 89,699,743 took 0.017 ms
Loops: 89,699,743 took 3623.2 ms
For 1000 unique values, k=10
Set: 9,896,049 took 0.187 ms
Trove Set: 9,896,049 took 0.193 ms
Loops: 9,896,049 took 2855.7 ms
The code
import gnu.trove.TIntIntHashMap;
import gnu.trove.TIntIntProcedure;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
class Main {
public static void main(String... args) throws Exception {
Random random = new Random(1);
int[] a = new int[100 * 1000];
int k = 10;
for (int i = 0; i < a.length; i++)
a[i] = random.nextInt(100);
for (int i = 0; i < 5; i++) {
testSet(a, k);
testTroveSet(a, k);
testLoops(a, k);
}
}
private static void testSet(int[] a, int k) {
Map<Integer, Integer> readNumbers = new HashMap<Integer, Integer>();
for (int num : a) {
Integer freq = readNumbers.get(num);
readNumbers.put(num, freq == null ? 1 : freq + 1);
}
long start = System.nanoTime();
int count = 0;
for (Integer aNumber : readNumbers.keySet()) {
if (readNumbers.containsKey(aNumber + k)) {
count += (readNumbers.get(aNumber) * readNumbers.get(aNumber + k));
}
}
long time = System.nanoTime() - start;
System.out.printf("Set: %,d took %.3f ms%n", count, time / 1e6);
}
private static void testTroveSet(int[] a, final int k) {
final TIntIntHashMap readNumbers = new TIntIntHashMap();
for (int num : a)
readNumbers.adjustOrPutValue(num, 1,1);
long start = System.nanoTime();
final int[] count = { 0 };
readNumbers.forEachEntry(new TIntIntProcedure() {
@Override
public boolean execute(int key, int keyCount) {
count[0] += readNumbers.get(key + k) * keyCount;
return true;
}
});
long time = System.nanoTime() - start;
System.out.printf("Trove Set: %,d took %.3f ms%n", count[0], time / 1e6);
}
private static void testLoops(int[] a, int k) {
long start = System.nanoTime();
int count = 0;
for (int i = 0; i < a.length; i++) {
for (int j = 0; j < i; j++) {
int diff = a[j] - a[i];
if (diff == k || -diff == k) {
count++;
}
}
}
long time = System.nanoTime() - start;
System.out.printf("Loops: %,d took %.1f ms%n", count, time / 1e6);
}
private static long free() {
return Runtime.getRuntime().freeMemory();
}
}
Since split() uses regular expressions to split a string, you should measure whether StringTokenizer would speed things up.
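For illustration, here is a minimal sketch (class name is mine) of reading the two input lines from the question with StringTokenizer instead of split:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.StringTokenizer;

class TokenizerInput {
    // Reads "n k" on the first line and n integers on the second, without any regex.
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        StringTokenizer first = new StringTokenizer(br.readLine(), " ");
        int n = Integer.parseInt(first.nextToken());
        int k = Integer.parseInt(first.nextToken());
        StringTokenizer second = new StringTokenizer(br.readLine(), " ");
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = Integer.parseInt(second.nextToken());
        }
        System.out.println("read " + a.length + " numbers, k=" + k);
    }
}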
You are trying to find elements whose difference is k. Try this:
Sort the array.
You can then do it in one pass after sorting by keeping two pointers and advancing one of them depending on whether the current difference is bigger or smaller than k, as sketched below.
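A minimal sketch of that sort-plus-two-pointers pass (class and method names are mine); it assumes k > 0, with k == 0 handled separately as in the map-based answer:
import java.util.Arrays;

class TwoPointerDiffCount {
    // Counts index pairs i < j with |a[i] - a[j]| == k, assuming k > 0.
    // (k == 0 would need the freq * (freq - 1) / 2 handling from the map answer.)
    static long countPairsWithDiff(int[] a, int k) {
        Arrays.sort(a);
        long count = 0;
        int lo = 0, hi = 0;                              // both pointers only move forward
        for (int i = 0; i < a.length; i++) {
            long target = (long) a[i] + k;               // partner value we are looking for
            while (lo < a.length && a[lo] < target) lo++;  // first index with value >= target
            while (hi < a.length && a[hi] <= target) hi++; // first index with value >  target
            count += hi - lo;                            // occurrences of the partner value
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countPairsWithDiff(new int[] {1, 5, 3, 4, 2}, 2)); // prints 3
    }
}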
A sparse map for the values, with their frequency of occurrence.
SortedMap<Integer, Integer> a = new TreeMap<Integer, Integer>();
for (int i = 0; i < n; ++i) {
    int value = input[i];                // input[] holds the already parsed numbers
    Integer old = a.put(value, 1);
    if (old != null) {
        a.put(value, old.intValue() + 1);
    }
}
for (Map.Entry<Integer, Integer> entry : a.entrySet()) {
    Integer freq = a.get(entry.getKey() + k);
    if (freq != null) {                  // guard against null: key + k may not be present
        count += entry.getValue() * freq; // N values x M values further on.
    }
}
This is O(n log n) because of the TreeMap, still far better than the quadratic nested loops.
Should this be too costly, you could sort the input array and do something similar.
I don't understand why you have one loop inside another. It's O(n^2) that way.
You also mingle reading in this array of ints with getting this count. I'd separate the two - read the whole thing in and then sweep through and get the difference count.
Perhaps I'm misunderstanding what you're doing, but it feels like you're re-doing a lot of work in that inner loop.
Why not use the java.util.Scanner class instead of BufferedReader?
For example:
Scanner sc = new Scanner(System.in);
int number = sc.nextInt();
This may work faster, as there are fewer wrappers involved. See this link.
Use sets and maps, as other users have already explained, so I won't reiterate their suggestions again.
I will suggest something else.
Stop using String.split. It compiles and uses a regular expression.
String.split has this line in it: Pattern.compile(expr).split(this).
If you want to split on a single character, you could write your own function and it would be much faster. I believe Guava (formerly the Google Collections API) has a splitter that splits on characters without using a regular expression.
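As an illustration of the hand-written approach, a single-character split based on indexOf (names are mine), with no regex involved:
import java.util.ArrayList;
import java.util.List;

class CharSplit {
    // Splits on a single character without any regex; returns the pieces in order.
    static List<String> splitOnChar(String s, char sep) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = s.indexOf(sep, start)) >= 0) {
            parts.add(s.substring(start, idx));
            start = idx + 1;
        }
        parts.add(s.substring(start));   // trailing piece after the last separator
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOnChar("3 1 4 1 5", ' '));   // [3, 1, 4, 1, 5]
    }
}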
Related
So I am trying to understand how ForkJoinPool works. I am trying to achieve better performance using it for a large array of about 2 million elements and then adding their reciprocals. I understand that ForkJoinPool.commonPool().invoke(task); calls compute(), which forks the task into two tasks if it is not small enough, computes them and then joins them. So far, we are using two cores.
But if I want to execute this on multiple cores, how do I do that and achieve 4 times better performance than the usual single-threaded run? Below is my code for the default ForkJoinPool:
@Override
protected void compute() {
// TODO
if (endIndexExclusive - startIndexInclusive <= seq_count) {
for (int i = startIndexInclusive; i < endIndexExclusive; i++)
value += 1 / input[i];
} else {
ReciprocalArraySumTask left = new ReciprocalArraySumTask(startIndexInclusive,
(endIndexExclusive + startIndexInclusive) / 2, input);
ReciprocalArraySumTask right = new ReciprocalArraySumTask((endIndexExclusive + startIndexInclusive) / 2,
endIndexExclusive, input);
left.fork();
right.compute();
left.join();
value = left.value + right.value;
}
}
}
protected static double parArraySum(final double[] input) {
assert input.length % 2 == 0;
double sum = 0;
// Compute sum of reciprocals of array elements
ReciprocalArraySumTask task = new ReciprocalArraySumTask(0, input.length, input);
ForkJoinPool.commonPool().invoke(task);
return task.getValue();
}
//Here I am trying to achieve with 4 cores
protected static double parManyTaskArraySum(final double[] input,
final int numTasks) {
double sum = 0;
System.out.println("Total tasks = " + numTasks);
System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", String.valueOf(numTasks));
// Compute sum of reciprocals of array elements
int chunkSize = ReciprocalArraySum.getChunkSize(numTasks, input.length);
System.out.println("Chunk size = " + chunkSize);
ReciprocalArraySumTask task = new ReciprocalArraySumTask(0, input.length, input);
ForkJoinPool pool = new ForkJoinPool();
// pool.
ForkJoinPool.commonPool().invoke(task);
return task.getValue();
}
You want to use 4 cores, but you are submitting a job that will only keep two of them busy. In the following example, the getChunkStartInclusive and getChunkEndExclusive methods give the beginning and ending indexes of each chunk. I believe the following code can solve your problem and give you an idea for the implementation.
protected static double parManyTaskArraySum(final double[] input,
final int numTasks) {
double sum = 0;
System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", String.valueOf(numTasks));
List<ReciprocalArraySumTask> ts = new ArrayList<ReciprocalArraySumTask>(numTasks);
int i;
for (i = 0; i < numTasks - 1 ; i++) {
ts.add(new ReciprocalArraySumTask(getChunkStartInclusive(i,numTasks,input.length),getChunkEndExclusive(i,numTasks,input.length),input));
ts.get(i).fork();
}
ts.add( new ReciprocalArraySumTask(getChunkStartInclusive(i, numTasks, input.length), getChunkEndExclusive(i, numTasks, input.length), input));
ts.get(i).compute();
for (int j = 0; j < numTasks - 1; j++) {
ts.get(j).join();
}
for (int j = 0; j < numTasks; j++) {
sum += ts.get(j).getValue();
}
return sum;
}
This is my approach:
The threshold is the limit at which compute() starts calculating sequentially and stops stacking recursive calls; this works better if each processor is used twice or more (there is a limit, of course), which is why I use numTasks * 2.
protected static double parManyTaskArraySum(final double[] input,
final int numTasks) {
int start;
int end;
int size = input.length;
int threshold = size / (numTasks * 2);
List<ReciprocalArraySumTask> actions = new ArrayList<>();
for (int i = 0; i < numTasks; i++) {
start = getChunkStartInclusive(i, numTasks, size);
end = getChunkEndExclusive(i, numTasks, size);
actions.add(new ReciprocalArraySumTask(start, end, input, threshold, i));
}
ForkJoinTask.invokeAll(actions);
return actions.stream().map(ReciprocalArraySumTask::getValue).reduce(new Double(0), Double::sum);
}
I am trying to find frequency of a longest substring in a huge string.
'Huge string' can be up to 2M characters long, only a-z
'Substring' may be between 100k to 2M characters long
'Substring' is always same or smaller size than 'Huge string'
Currently, I am using the following method, which I created:
public static int[] countSubstringOccurence(String input, int substringLength) {
// input = from 100 000 to 2 000 000 alphanumeric characters long string;
// substringLength = from 100 000 to 2 000 000, always smaller than input
LinkedHashMap<String, Integer> substringOccurence = new LinkedHashMap<>();
int l;
for (int i = 0; i < (input.length() - substringLength) + 1; i++) {
String substring = input.substring(i, i + substringLength);
if (substringOccurence.containsKey(substring)) {
l = substringOccurence.get(substring);
substringOccurence.put(substring, ++l);
} else {
substringOccurence.put(substring, 1);
}
}
List<Integer> substringOccurenceList = new ArrayList<>(substringOccurence.values());
int numberOfUniqueSubstrings = substringOccurenceList.size();
int numberOfOccurenciesOfMostCommonSubstrings = 0;
int numberOfSubstringsOfMostCommonSubstring = 0;
for (int i: substringOccurenceList) {
if (i > numberOfOccurenciesOfMostCommonSubstrings) {
numberOfOccurenciesOfMostCommonSubstrings = i;
numberOfSubstringsOfMostCommonSubstring = 1;
} else if (i == numberOfOccurenciesOfMostCommonSubstrings) {
numberOfSubstringsOfMostCommonSubstring++;
}
}
return new int[] {
numberOfUniqueSubstrings,
numberOfOccurenciesOfMostCommonSubstrings,
numberOfSubstringsOfMostCommonSubstring
};
}
Later I convert this to an ArrayList and iterate through the whole list to find how many substrings there are and how many times they occur.
But after around 4,000 to 8,000 iterations I get a java.lang.OutOfMemoryError (which I expect, since the process takes over 2 GB of memory at this point; I know storing this amount of strings in memory can take up to 2 TB in edge cases). I tried using a SHA-1 hash as the key, which works, but it takes far more time, there are possible collisions, and I think there might be a better way to do this, but I can't think of any "better" optimization.
Thank you for any kind of help.
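For reference, the SHA-1-keyed variant described above might look roughly like the sketch below (names are mine); it trades the memory of full-substring keys for hashing time and a small collision risk:
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class HashedSubstringCount {
    // Counts each length-substringLength window, keyed by its SHA-1 digest instead of
    // the substring itself. Distinct substrings could still collide, so the counts are
    // only probabilistically exact.
    static Map<ByteBuffer, Integer> countWindows(String input, int substringLength)
            throws NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        Map<ByteBuffer, Integer> counts = new HashMap<>();
        for (int i = 0; i + substringLength <= input.length(); i++) {
            byte[] digest = sha1.digest(
                    input.substring(i, i + substringLength).getBytes(StandardCharsets.US_ASCII));
            ByteBuffer key = ByteBuffer.wrap(digest);   // ByteBuffer gives value-based equals/hashCode
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println(countWindows("abcabc", 3).size());   // 3 distinct windows
    }
}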
EDIT
Here is some example input => output:
f("abcabc", 3) => 3 2 1
f("abcdefghijklmnopqrstuvwqyzab", 3) => 26 1 26
f("abcdefghijklmnopqrstuvwqyzab", 2) => 26 2 1
I've changed the code to this:
public static int[] countSubstringOccurence(String text, int substringLength) {
int textLength = text.length();
int numberOfUniqueSubstrings = 0;
List<Integer> substrIndexes = new ArrayList<>();
for (int i = 0; i < (textLength - substringLength) + 1; i++) {
boolean doesNotExists = true;
for (int j = i + 1; j < (textLength - substringLength) + 1; j++) {
String actualSubstr = text.substring(i, i + substringLength);
String indexSubstr = text.substring(j, j + substringLength);
if (actualSubstr.equals(indexSubstr)) {
doesNotExists = false;
substrIndexes.add(j);
}
}
if (doesNotExists) {
numberOfUniqueSubstrings++;
substrIndexes.add(i);
}
}
LinkedHashMap<Integer, Integer> substrCountMap = new LinkedHashMap<>();
for (int i : substrIndexes) {
String substr = text.substring(i, i + substringLength);
int lastIndex = 0;
int count = 0;
while (lastIndex != -1) {
lastIndex = text.indexOf(substr, lastIndex);
if (lastIndex != -1) {
count++;
lastIndex += substr.length();
}
}
substrCountMap.put(i, count);
}
List<Integer> substrCountList = new ArrayList<>(substrCountMap.values());
int numberOfOccurenciesOfMostCommonSubstrings = 0;
int numberOfSubstringsOfMostCommonSubstring = 0;
for (int count : substrCountList) {
if (count > numberOfOccurenciesOfMostCommonSubstrings) {
numberOfOccurenciesOfMostCommonSubstrings = count;
numberOfSubstringsOfMostCommonSubstring = 1;
} else if (count == numberOfOccurenciesOfMostCommonSubstrings) {
numberOfSubstringsOfMostCommonSubstring++;
}
}
return new int[] {
numberOfUniqueSubstrings,
numberOfOccurenciesOfMostCommonSubstrings,
numberOfSubstringsOfMostCommonSubstring
};
}
This code does not crash, it's just really, really slow (I guess it's at least O(n^2)). Can anyone think of a faster way?
It would be great if it could fit under 1 GB of RAM and run in under 15 minutes on a CPU comparable to an i3-3xxx. I am done for today.
Run it on Java 6. Not kidding!
In Java 6, substring() does NOT copy the characters; it only stores a reference to the backing character array plus an offset and a length.
Just use the StringTokenizer class and extract each word. Then store each word in a String array whose size is given by the tokenizer's countTokens() method;
then you can easily calculate the frequency of a given word.
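A minimal sketch of that suggestion (class and method names are mine), counting whitespace-separated words:
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

class WordFrequency {
    // Tokenizes the text on whitespace and counts how often each word occurs.
    static Map<String, Integer> wordCounts(String text) {
        StringTokenizer st = new StringTokenizer(text);
        String[] words = new String[st.countTokens()];   // size known up front, as suggested
        for (int i = 0; i < words.length; i++) {
            words[i] = st.nextToken();
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCounts("to be or not to be"));   // {not=1, to=2, or=1, be=2}
    }
}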
The input n is of the order 10^18, and the output should be the sum of all numbers up to n whose number of set bits is exactly 2. For example, 5 is 101 in binary, which has 2 set bits. For n = 1234567865432784, how can I optimize the code below?
import java.util.Scanner;
class TestClass
{
public static void main(String args[])
{
long N,s=0L;
Scanner sc = new Scanner(System.in);
N=sc.nextLong();
for(long j = 1; j<=N; j++)
{
long b = j;
int count = 0;
while(b!=0)
{
b = b & (b-1);
count++;
}
if(count == 2)
{
s+=j;
count = 0;
}
else
{
count = 0;
continue;
}
}
System.out.println(s%1000000007);
s=0L;
}
}
Java has a function
if (Integer.bitCount(i) == 2) { ...
However, consider for a moment: that is a lot of numbers to inspect.
What about generating all numbers that have just two bits set?
Setting the ith and jth bit of n:
int n = (1 << i) | (1 << j); // i != j
Now you are looking at a couple of thousand steps at most (one per pair of bit positions), instead of N steps.
As this is homework, my advice:
Try to turn the problem around, do the least work, take a step back, find the intelligent approach, search the math core. And enjoy.
Next time, do not rob yourself of those moments of success.
As you have probably had enough time to think about Joop Eggen's suggestion,
here is how I would do it (which is what Joop described, I think):
import java.util.Scanner;
public class Program {
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
long n = sc.nextLong();
long sum = 0;
for (int firstBitIndex = 0; firstBitIndex < 64; firstBitIndex++) {
long firstBit = 1L << firstBitIndex;
if (firstBit >= n)
break;
for (int secondBitIndex = firstBitIndex + 1; secondBitIndex < 64; secondBitIndex++) {
long value = firstBit | (1L << secondBitIndex);
if (value > n)
break;
sum = (sum + value) % 1000000007; // reduce as we go: the raw sum can overflow a long for n near 10^18
}
}
System.out.println(sum % 1000000007);
sc.close();
}
}
Java provides the class BigInteger, which includes a method nextProbablePrime(). This means you could do something like this:
BigInteger n = new BigInteger(stringInputN);
BigInteger test = BigInteger.valueOf(2);
BigInteger total = BigInteger.valueOf(0);
while (test.compareTo(n) < 0){
total = total.add(test);
test = test.nextProbablePrime();
}
System.out.println(total);
This has an extremely low (but nonzero) probability of getting the wrong answer, so you might want to run it twice just to double-check. It should be faster than iterating over every number by hand, though.
I've been playing around with the Project Euler challenges to help improve my knowledge of Java. In particular, I wrote the following code for problem 14, which asks you to find the longest Collatz chain which starts at a number below 1,000,000. It works on the assumption that subchains are incredibly likely to arise more than once, and by storing them in a cache, no redundant calculations are done.
Collatz.java:
import java.util.HashMap;
public class Collatz {
private HashMap<Long, Integer> chainCache = new HashMap<Long, Integer>();
public void initialiseCache() {
chainCache.put((long) 1, 1);
}
private long collatzOp(long n) {
if(n % 2 == 0) {
return n/2;
}
else {
return 3*n +1;
}
}
public int collatzChain(long n) {
if(chainCache.containsKey(n)) {
return chainCache.get(n);
}
else {
int count = 1 + collatzChain(collatzOp(n));
chainCache.put(n, count);
return count;
}
}
}
ProjectEuler14.java:
public class ProjectEuler14 {
public static void main(String[] args) {
Collatz col = new Collatz();
col.initialiseCache();
long limit = 1000000;
long temp = 0;
long longestLength = 0;
long index = 1;
for(long i = 1; i < limit; i++) {
temp = col.collatzChain(i);
if(temp > longestLength) {
longestLength = temp;
index = i;
}
}
System.out.println(index + " has the longest chain, with length " + longestLength);
}
}
This works. And according to the "measure-command" command from Windows Powershell, it takes roughly 1708 milliseconds (1.708 seconds) to execute.
However, after reading through the forums, I noticed that some people, who had written seemingly naive code, which calculate each chain from scratch, seemed to be getting much better execution times than me. I (conceptually) took one of the answers, and translated it into Java:
NaiveProjectEuler14.java:
public class NaiveProjectEuler14 {
public static void main(String[] args) {
int longest = 0;
int numTerms = 0;
int i;
long j;
for (i = 1; i <= 10000000; i++) {
j = i;
int currentTerms = 1;
while (j != 1) {
currentTerms++;
if (currentTerms > numTerms){
numTerms = currentTerms;
longest = i;
}
if (j % 2 == 0){
j = j / 2;
}
else{
j = 3 * j + 1;
}
}
}
System.out.println("Longest: " + longest + " (" + numTerms + ").");
}
}
On my machine, this also gives the correct answer, but it gives it in 502 milliseconds (0.502 seconds), roughly a third of the running time of my original program. At first I thought that maybe there was a small overhead in creating a HashMap, and that the times taken were too small to draw any conclusions. However, if I increase the upper limit from 1,000,000 to 10,000,000 in both programs, NaiveProjectEuler14 takes 4709 milliseconds (4.709 seconds), whilst ProjectEuler14 takes a whopping 25324 milliseconds (25.324 seconds)!
Why does ProjectEuler14 take so long? The only explanation I can fathom is that storing huge amounts of pairs in the HashMap data structure is adding a huge overhead, but I can't see why that should be the case. I've also tried recording the number of (key, value) pairs stored during the course of the program (2,168,611 pairs for the 1,000,000 case, and 21,730,849 pairs for the 10,000,000 case) and supplying a little over that number to the HashMap constructor so that it only has to resize itself at most once, but this does not seem to affect the execution times.
Does anyone have any rationale for why the memoized version is a lot slower?
There are some reasons for that unfortunate reality:
Instead of containsKey, do an immediate get and check for null
The code uses an extra method to be called
The map stores wrapped objects (Integer, Long) for primitive types
The JIT compiler translating byte code to machine code can do more with calculations
The caching does not cover a large fraction of the work, unlike, say, a memoized Fibonacci
A comparable version would be:
public static void main(String[] args) {
int longest = 0;
int numTerms = 0;
int i;
long j;
Map<Long, Integer> map = new HashMap<>();
for (i = 1; i <= 10000000; i++) {
j = i;
Integer terms = map.get((long) i); // the map keys are Long, so widen i before the lookup
if (terms != null) {
continue;
}
int currentTerms = 1;
while (j != 1) {
currentTerms++;
if (currentTerms > numTerms){
numTerms = currentTerms;
longest = i;
}
if (j % 2 == 0){
j = j / 2;
// Maybe check the map only here
Integer m = map.get(j);
if (m != null) {
currentTerms += m;
break;
}
}
else{
j = 3 * j + 1;
}
}
map.put(j, currentTerms);
}
System.out.println("Longest: " + longest + " (" + numTerms + ").");
}
This does not really do adequate memoization. For increasing parameters, not checking the map after the 3*j+1 step somewhat decreases the misses (but might also skip memoized values).
Memoization pays off when there is heavy calculation per call. If the function takes long because of deep recursion rather than per-call calculation, the memoization overhead per function call counts against you.
I am trying to solve a problem on Codeforces and I get a Time Limit Exceeded verdict. The only time-consuming operation is calculating the sum of a big array. So I've tried to optimize it, but with no result.
What I want: to optimize the following function:
//array could be Integer.MAX_VALUE length
private long canonicalSum(int[] array) {
    long sum = 0; // accumulate in a long: an int sum can overflow for large arrays
    for (int i = 0; i < array.length; i++)
        sum += array[i];
return sum;
}
Question1 [main]: Is it possible to optimize canonicalSum?
What I've tried: to avoid operations with very big numbers, I decided to use auxiliary data. For instance, I convert array1[100] into array2[10], where array2[i] = array1[10*i] + array1[10*i+1] + ... + array1[10*i+9].
private long optimizedSum(int[] array, int step) {
do {
array = sumItr(array, step);
} while (array.length != 1);
return array[0];
}
private int[] sumItr(int[] array, int step) {
int length = array.length / step + 1;
boolean needCompensation = (array.length % step == 0) ? false : true;
int aux[] = new int[length];
for (int i = 0, auxSum = 0, auxPointer = 0; i < array.length; i++) {
auxSum += array[i];
if ((i + 1) % step == 0) {
aux[auxPointer++] = auxSum;
auxSum = 0;
}
if (i == array.length - 1 && needCompensation) {
aux[auxPointer++] = auxSum;
}
}
return aux;
}
Problem: But it appears that canonicalSum is ten times faster than optimizedSum. Here is my test:
@Test
public void sum_comparison() {
final int ARRAY_SIZE = 100000000;
final int STEP = 1000;
int[] array = genRandomArray(ARRAY_SIZE);
System.out.println("Start canonical Sum");
long beg1 = System.nanoTime();
long sum1 = canonicalSum(array);
long end1 = System.nanoTime();
long time1 = end1 - beg1;
System.out.println("canon:" + TimeUnit.MILLISECONDS.convert(time1, TimeUnit.NANOSECONDS) + "milliseconds");
System.out.println("Start optimizedSum");
long beg2 = System.nanoTime();
long sum2 = optimizedSum(array, STEP);
long end2 = System.nanoTime();
long time2 = end2 - beg2;
System.out.println("custom:" + TimeUnit.MILLISECONDS.convert(time2, TimeUnit.NANOSECONDS) + "milliseconds");
assertEquals(sum1, sum2);
assertTrue(time2 <= time1);
}
private int[] genRandomArray(int size) {
int[] array = new int[size];
Random random = new Random();
for (int i = 0; i < array.length; i++) {
array[i] = random.nextInt();
}
return array;
}
Question 2: Why does optimizedSum work slower than canonicalSum?
As of Java 9, vectorisation of this operation has been implemented but disabled, based on benchmarks measuring the all-in cost of the code plus its compilation. Depending on your processor, this leads to the relatively entertaining result that if you introduce artificial complications into your reduction loop, you can trigger autovectorisation and get a quicker result! So the fastest code, for now, assuming numbers small enough not to overflow, is:
public int sum(int[] data) {
int value = 0;
for (int i = 0; i < data.length; ++i) {
value += 2 * data[i];
}
return value / 2;
}
This isn't intended as a recommendation! This is more to illustrate that the speed of your code in Java is dependent on the JIT, its trade-offs, and its bugs/features in any given release. Writing cute code to optimise problems like this is at best vain and will put a shelf life on the code you write. For instance, had you manually unrolled a loop to optimise for an older version of Java, your code would be much slower in Java 8 or 9 because this decision would completely disable autovectorisation. You'd better really need that performance to do it.
Question1 [main]: Is it possible to optimize canonicalSum?
Yes, it is. But I have no idea by what factor.
Some things you can do are:
use the parallel pipelines introduced in Java 8. The processor has instructions for doing a parallel sum of two arrays (and more). This can be observed in Octave: when you sum two vectors with ".+" (parallel addition) or "+", it is way faster than using a loop. (A minimal parallel-stream sketch follows after this list.)
use multithreading. You could use a divide and conquer algorithm. Maybe like this:
divide the array into 2 or more
keep dividing recursively until you get an array with manageable size for a thread.
start computing the sum for the sub arrays (divided arrays) with separate threads.
finally add the sum generated (from all the threads) for all sub arrays together to produce final result
maybe unrolling the loop would help a bit, too. By loop unrolling I mean reducing the steps the loop will have to make by doing more operations in the loop manually.
An example from http://en.wikipedia.org/wiki/Loop_unwinding :
for (int x = 0; x < 100; x++)
{
delete(x);
}
becomes
for (int x = 0; x < 100; x+=5)
{
delete(x);
delete(x+1);
delete(x+2);
delete(x+3);
delete(x+4);
}
but as mentioned this must be done with caution and profiling since the JIT could do this kind of optimizations itself probably.
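For the first point in the list above, a minimal parallel-stream sketch (class and method names are mine), widening to long so the sum cannot overflow:
import java.util.Arrays;

class ParallelSum {
    // Sums the array with a parallel IntStream, widening to long to avoid overflow.
    static long parallelSum(int[] array) {
        return Arrays.stream(array).parallel().asLongStream().sum();
    }

    public static void main(String[] args) {
        int[] data = new int[10_000_000];
        Arrays.fill(data, 3);
        System.out.println(parallelSum(data));   // 30000000
    }
}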
An implementation of mathematical operations for the multithreaded approach can be seen here.
The example implementation with the Fork/Join framework introduced in java 7 that basically does what the divide and conquer algorithm above does would be:
public class ForkJoinCalculator extends RecursiveTask<Double> {
public static final long THRESHOLD = 1_000_000;
private final SequentialCalculator sequentialCalculator;
private final double[] numbers;
private final int start;
private final int end;
public ForkJoinCalculator(double[] numbers, SequentialCalculator sequentialCalculator) {
this(numbers, 0, numbers.length, sequentialCalculator);
}
private ForkJoinCalculator(double[] numbers, int start, int end, SequentialCalculator sequentialCalculator) {
this.numbers = numbers;
this.start = start;
this.end = end;
this.sequentialCalculator = sequentialCalculator;
}
@Override
protected Double compute() {
int length = end - start;
if (length <= THRESHOLD) {
return sequentialCalculator.computeSequentially(numbers, start, end);
}
ForkJoinCalculator leftTask = new ForkJoinCalculator(numbers, start, start + length/2, sequentialCalculator);
leftTask.fork();
ForkJoinCalculator rightTask = new ForkJoinCalculator(numbers, start + length/2, end, sequentialCalculator);
Double rightResult = rightTask.compute();
Double leftResult = leftTask.join();
return leftResult + rightResult;
}
}
Here we develop a RecursiveTask that splits an array of doubles until the length of a subarray does not exceed a given threshold. At that point the subarray is processed sequentially, applying to it the operation defined by the interface below.
The interface used is this:
public interface SequentialCalculator {
double computeSequentially(double[] numbers, int start, int end);
}
And the usage example:
public static double varianceForkJoin(double[] population){
final ForkJoinPool forkJoinPool = new ForkJoinPool();
double total = forkJoinPool.invoke(new ForkJoinCalculator(population, new SequentialCalculator() {
@Override
public double computeSequentially(double[] numbers, int start, int end) {
double total = 0;
for (int i = start; i < end; i++) {
total += numbers[i];
}
return total;
}
}));
final double average = total / population.length;
double variance = forkJoinPool.invoke(new ForkJoinCalculator(population, new SequentialCalculator() {
@Override
public double computeSequentially(double[] numbers, int start, int end) {
double variance = 0;
for (int i = start; i < end; i++) {
variance += (numbers[i] - average) * (numbers[i] - average);
}
return variance;
}
}));
return variance / population.length;
}
If you want to add N numbers then the runtime is O(N). So in this aspect your canonicalSum can not be "optimized".
What you can do to reduce the runtime is make the summation parallel, i.e. break the array into parts, pass them to separate threads, and in the end add up the results returned by each thread, as sketched below.
Update: this assumes a multicore system, but there is a Java API to get the number of cores.
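A minimal sketch of that idea (class and method names are mine), using Runtime.availableProcessors() to size a fixed thread pool:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadedSum {
    // Splits the array into one chunk per core and sums the chunks on a thread pool.
    static long parallelSum(int[] array) throws InterruptedException, ExecutionException {
        int cores = Runtime.getRuntime().availableProcessors();   // the "number of cores" API
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        int chunk = (array.length + cores - 1) / cores;
        List<Future<Long>> parts = new ArrayList<>();
        for (int start = 0; start < array.length; start += chunk) {
            final int from = start;
            final int to = Math.min(start + chunk, array.length);
            Callable<Long> task = () -> {
                long s = 0;                       // each thread sums its own slice
                for (int i = from; i < to; i++) s += array[i];
                return s;
            };
            parts.add(pool.submit(task));
        }
        long total = 0;
        for (Future<Long> f : parts) total += f.get();   // wait for and add each partial sum
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[1_000_000];
        java.util.Arrays.fill(data, 2);
        System.out.println(parallelSum(data));   // 2000000
    }
}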