Java: how to optimize sum of a big array

I'm trying to solve a problem on Codeforces, and I get a Time Limit Exceeded verdict. The only time-consuming operation is calculating the sum of a big array, so I've tried to optimize it, but with no result.
What I want: optimize the following function:
// array could be Integer.MAX_VALUE length
private long canonicalSum(int[] array) {
    int sum = 0; // note: an int accumulator wraps on overflow, exactly as optimizedSum's int arrays do
    for (int i = 0; i < array.length; i++)
        sum += array[i];
    return sum;
}
Question1 [main]: Is it possible to optimize canonicalSum?
What I've tried: avoiding operations on very big numbers, so I decided to use auxiliary data. For instance, I convert array1[100] to array2[10], where array2[i] = array1[10*i] + array1[10*i+1] + ... + array1[10*i+9].
private long optimizedSum(int[] array, int step) {
    do {
        array = sumItr(array, step);
    } while (array.length != 1);
    return array[0];
}
private int[] sumItr(int[] array, int step) {
    int length = array.length / step + 1;
    boolean needCompensation = array.length % step != 0;
    int[] aux = new int[length];
    for (int i = 0, auxSum = 0, auxPointer = 0; i < array.length; i++) {
        auxSum += array[i];
        if ((i + 1) % step == 0) {
            aux[auxPointer++] = auxSum;
            auxSum = 0;
        }
        if (i == array.length - 1 && needCompensation) {
            aux[auxPointer++] = auxSum;
        }
    }
    return aux;
}
Problem: it turns out that canonicalSum is ten times faster than optimizedSum. Here is my test:
@Test
public void sum_comparison() {
    final int ARRAY_SIZE = 100000000;
    final int STEP = 1000;
    int[] array = genRandomArray(ARRAY_SIZE);

    System.out.println("Start canonical Sum");
    long beg1 = System.nanoTime();
    long sum1 = canonicalSum(array);
    long end1 = System.nanoTime();
    long time1 = end1 - beg1;
    System.out.println("canon: " + TimeUnit.MILLISECONDS.convert(time1, TimeUnit.NANOSECONDS) + " milliseconds");

    System.out.println("Start optimizedSum");
    long beg2 = System.nanoTime();
    long sum2 = optimizedSum(array, STEP);
    long end2 = System.nanoTime();
    long time2 = end2 - beg2;
    System.out.println("custom: " + TimeUnit.MILLISECONDS.convert(time2, TimeUnit.NANOSECONDS) + " milliseconds");

    assertEquals(sum1, sum2);
    assertTrue(time2 <= time1);
}
private int[] genRandomArray(int size) {
    int[] array = new int[size];
    Random random = new Random();
    for (int i = 0; i < array.length; i++) {
        array[i] = random.nextInt();
    }
    return array;
}
Question2: Why does optimizedSum run slower than canonicalSum?

As of Java 9, vectorisation of this operation has been implemented but disabled, based on benchmarks measuring the all-in cost of the code plus its compilation. Depending on your processor, this leads to the relatively entertaining result that if you introduce artificial complications into your reduction loop, you can trigger autovectorisation and get a quicker result! So the fastest code, for now, assuming numbers small enough not to overflow, is:
public int sum(int[] data) {
    int value = 0;
    for (int i = 0; i < data.length; ++i) {
        // the artificial *2 complication is what triggers autovectorisation here;
        // int overflow makes this unsafe for large values
        value += 2 * data[i];
    }
    return value / 2;
}
This isn't intended as a recommendation! This is more to illustrate that the speed of your code in Java is dependent on the JIT, its trade-offs, and its bugs/features in any given release. Writing cute code to optimise problems like this is at best vain and will put a shelf life on the code you write. For instance, had you manually unrolled a loop to optimise for an older version of Java, your code would be much slower in Java 8 or 9 because this decision would completely disable autovectorisation. You'd better really need that performance to do it.

Question1 [main]: Is it possible to optimize canonicalSum?
Yes, it is, but I can't say by what factor.
Some things you can do are:
use the parallel streams introduced in Java 8 (a one-line sketch follows after these suggestions). The processor has instructions for summing arrays in parallel (SIMD). This can be observed in Octave, where summing two vectors with the vectorized "+" operator is far faster than using a loop.
use multithreading. You could use a divide and conquer algorithm. Maybe like this:
divide the array into 2 or more parts
keep dividing recursively until you get arrays of a manageable size for a thread
start computing the sums for the sub-arrays (divided arrays) in separate threads
finally, add the sums produced by all the threads together to produce the final result
maybe unrolling the loop would help a bit, too. By loop unrolling I mean reducing the number of iterations by manually doing more operations per iteration.
An example from http://en.wikipedia.org/wiki/Loop_unwinding :
for (int x = 0; x < 100; x++) {
    delete(x);
}
becomes
for (int x = 0; x < 100; x += 5) {
    delete(x);
    delete(x + 1);
    delete(x + 2);
    delete(x + 3);
    delete(x + 4);
}
but as mentioned this must be done with caution and profiling, since the JIT may well perform this kind of optimization itself.
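As a concrete illustration of the first suggestion above, Java 8 can parallelize the reduction in one line with streams. This is my own sketch, not code from the question; it widens each int to long so the total cannot overflow:

import java.util.Arrays;

public class ParallelSum {
    // Splits the array across the common ForkJoinPool and reduces the
    // partial sums; asLongStream() widens each int so the sum cannot overflow.
    static long parallelSum(int[] array) {
        return Arrays.stream(array).parallel().asLongStream().sum();
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(new int[] {1, 2, 3, 4, 5})); // 15
    }
}

For a plain sum the sequential Arrays.stream(array).asLongStream().sum() is often just as fast; the parallel version only pays off for very large arrays.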
An implementation of mathematical operations using the multithreaded approach can be seen here.
An example implementation with the Fork/Join framework introduced in Java 7, which does essentially what the divide and conquer algorithm above describes, would be:
import java.util.concurrent.RecursiveTask;

public class ForkJoinCalculator extends RecursiveTask<Double> {

    public static final long THRESHOLD = 1_000_000;

    private final SequentialCalculator sequentialCalculator;
    private final double[] numbers;
    private final int start;
    private final int end;

    public ForkJoinCalculator(double[] numbers, SequentialCalculator sequentialCalculator) {
        this(numbers, 0, numbers.length, sequentialCalculator);
    }

    private ForkJoinCalculator(double[] numbers, int start, int end, SequentialCalculator sequentialCalculator) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
        this.sequentialCalculator = sequentialCalculator;
    }

    @Override
    protected Double compute() {
        int length = end - start;
        if (length <= THRESHOLD) {
            return sequentialCalculator.computeSequentially(numbers, start, end);
        }
        // Fork the left half asynchronously, compute the right half in this thread.
        ForkJoinCalculator leftTask = new ForkJoinCalculator(numbers, start, start + length / 2, sequentialCalculator);
        leftTask.fork();
        ForkJoinCalculator rightTask = new ForkJoinCalculator(numbers, start + length / 2, end, sequentialCalculator);
        Double rightResult = rightTask.compute();
        Double leftResult = leftTask.join();
        return leftResult + rightResult;
    }
}
Here we develop a RecursiveTask that splits an array of doubles until the length of a subarray goes below a given threshold. At that point the subarray is processed sequentially, applying to it the operation defined by the following interface:
public interface SequentialCalculator {
    double computeSequentially(double[] numbers, int start, int end);
}
And the usage example:
public static double varianceForkJoin(double[] population) {
    final ForkJoinPool forkJoinPool = new ForkJoinPool();
    double total = forkJoinPool.invoke(new ForkJoinCalculator(population, new SequentialCalculator() {
        @Override
        public double computeSequentially(double[] numbers, int start, int end) {
            double total = 0;
            for (int i = start; i < end; i++) {
                total += numbers[i];
            }
            return total;
        }
    }));
    final double average = total / population.length;
    double variance = forkJoinPool.invoke(new ForkJoinCalculator(population, new SequentialCalculator() {
        @Override
        public double computeSequentially(double[] numbers, int start, int end) {
            double variance = 0;
            for (int i = start; i < end; i++) {
                variance += (numbers[i] - average) * (numbers[i] - average);
            }
            return variance;
        }
    }));
    return variance / population.length;
}

If you want to add N numbers, the runtime is O(N), so in that respect your canonicalSum cannot be "optimized".
What you can do to reduce the runtime is to parallelize the summation, i.e. break the array into parts, pass them to separate threads, and at the end sum the results returned by each thread.
Update: this assumes a multicore system, but there is a Java API to get the number of cores (see the sketch below).
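The API in question is Runtime.availableProcessors(); a minimal sketch of using it to size a thread pool:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CoreCount {
    public static void main(String[] args) {
        // Number of logical cores available to the JVM.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Cores: " + cores);
        // A reasonable default size for the summation worker pool.
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        pool.shutdown();
    }
}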

Related

How to use ForkJoinPool to use multiple cores in java?

So I am trying to understand how ForkJoinPool works. I am trying to achieve better performance using this for a large array of about 2 million elements, adding the reciprocal of each element. I understand that ForkJoinPool.commonPool().invoke(task); calls compute(), which forks the task into two tasks if it is not small enough, then computes and joins them. So far, we are using two cores.
But if I want to execute this on multiple cores, how do I do that and achieve 4 times better performance than the usual single-threaded run? Below is my code for the default ForkJoinPool:
@Override
protected void compute() {
    if (endIndexExclusive - startIndexInclusive <= seq_count) {
        // Small enough: sum the reciprocals directly.
        for (int i = startIndexInclusive; i < endIndexExclusive; i++)
            value += 1 / input[i];
    } else {
        // Split in half: fork the left task, compute the right in this thread.
        ReciprocalArraySumTask left = new ReciprocalArraySumTask(startIndexInclusive,
                (endIndexExclusive + startIndexInclusive) / 2, input);
        ReciprocalArraySumTask right = new ReciprocalArraySumTask((endIndexExclusive + startIndexInclusive) / 2,
                endIndexExclusive, input);
        left.fork();
        right.compute();
        left.join();
        value = left.value + right.value;
    }
}

protected static double parArraySum(final double[] input) {
    assert input.length % 2 == 0;
    double sum = 0;
    // Compute sum of reciprocals of array elements
    ReciprocalArraySumTask task = new ReciprocalArraySumTask(0, input.length, input);
    ForkJoinPool.commonPool().invoke(task);
    return task.getValue();
}

// Here I am trying to achieve this with 4 cores
protected static double parManyTaskArraySum(final double[] input,
        final int numTasks) {
    double sum = 0;
    System.out.println("Total tasks = " + numTasks);
    System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", String.valueOf(numTasks));
    // Compute sum of reciprocals of array elements
    int chunkSize = ReciprocalArraySum.getChunkSize(numTasks, input.length);
    System.out.println("Chunk size = " + chunkSize);
    ReciprocalArraySumTask task = new ReciprocalArraySumTask(0, input.length, input);
    ForkJoinPool pool = new ForkJoinPool();
    // pool.
    ForkJoinPool.commonPool().invoke(task);
    return task.getValue();
}
You want to use 4 cores, but you are submitting a job that will only keep two busy. In the following example, the getChunkStartInclusive and getChunkEndExclusive methods give the beginning and ending indexes of each chunk. I believe the following code can solve your problem and give you some implementation ideas.
protected static double parManyTaskArraySum(final double[] input,
        final int numTasks) {
    double sum = 0;
    System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", String.valueOf(numTasks));
    List<ReciprocalArraySumTask> ts = new ArrayList<ReciprocalArraySumTask>(numTasks);
    int i;
    // Fork all chunks but the last...
    for (i = 0; i < numTasks - 1; i++) {
        ts.add(new ReciprocalArraySumTask(getChunkStartInclusive(i, numTasks, input.length),
                getChunkEndExclusive(i, numTasks, input.length), input));
        ts.get(i).fork();
    }
    // ...and compute the last chunk in the current thread.
    ts.add(new ReciprocalArraySumTask(getChunkStartInclusive(i, numTasks, input.length),
            getChunkEndExclusive(i, numTasks, input.length), input));
    ts.get(i).compute();
    for (int j = 0; j < numTasks - 1; j++) {
        ts.get(j).join();
    }
    for (int j = 0; j < numTasks; j++) {
        sum += ts.get(j).getValue();
    }
    return sum;
}
This is my approach:
The threshold is the point at which compute stops forking recursive tasks and starts calculating directly. This works better if each processor gets two or more chunks (up to a limit, of course), which is why I use numTasks * 2.
protected static double parManyTaskArraySum(final double[] input,
        final int numTasks) {
    int start;
    int end;
    int size = input.length;
    int threshold = size / (numTasks * 2);
    List<ReciprocalArraySumTask> actions = new ArrayList<>();
    for (int i = 0; i < numTasks; i++) {
        start = getChunkStartInclusive(i, numTasks, size);
        end = getChunkEndExclusive(i, numTasks, size);
        actions.add(new ReciprocalArraySumTask(start, end, input, threshold, i));
    }
    ForkJoinTask.invokeAll(actions);
    return actions.stream().map(ReciprocalArraySumTask::getValue).reduce(0.0, Double::sum);
}

Why sentinel search slower than linear?

I decided to reduce the number of comparisons required to find an element in an array. Here we replace the last element of the list with the search element itself and run a while loop to see if any copy of the search element exists in the list, quitting the loop as soon as we find it. See the code snippet for clarification.
import java.util.Random;

public class Search {

    public static void main(String[] args) {
        int n = 10000000;
        int key = 10000;
        int[] arr = generateRandomSize(n);
        long start = System.nanoTime();
        int find = sentinels(arr, key);
        long end = System.nanoTime();
        System.out.println(find);
        System.out.println(end - start);

        arr = generateRandomSize(n);
        start = System.nanoTime();
        find = linear(arr, key);
        end = System.nanoTime();
        System.out.println(find);
        System.out.println(end - start);
    }

    public static int[] generateRandomSize(int n) {
        int[] arr = new int[n];
        Random rand = new Random();
        for (int i = 0; i < n; ++i) {
            arr[i] = rand.nextInt(5000);
        }
        return arr;
    }

    public static int linear(int[] a, int key) {
        for (int i = 0; i < a.length; ++i) {
            if (a[i] == key) {
                return i;
            }
        }
        return -1;
    }

    public static int sentinels(int[] a, int key) {
        int n = a.length;
        int last = a[n - 1];
        a[n - 1] = key;   // plant the sentinel
        int i = 0;
        while (a[i] != key) {
            ++i;
        }
        a[n - 1] = last;  // restore the original last element
        if ((i < n - 1) || a[n - 1] == key) {
            return i;
        }
        return -1;
    }
}
So with sentinel search we are not doing 10000000 comparisons like i < arr.length. But why does linear search always show better performance?
You'd have to look at the bytecode, and even deeper, to see what HotSpot makes of this. But I am quite sure that this statement is not true:
using sentinel search we are not doing 10000000 comparisons like i <
arr.length
Why? Because when you access a[i], i has to be bounds checked. In the linear case on the other hand, the optimiser can deduce that it can omit the bounds check since it "knows" that i>=0 (because of the loop structure) and also i<arr.length because it has already been tested in the loop condition.
So the sentinel approach just adds overhead.
This makes me think of a smart C++ optimisation ("Template Meta Programming" and "Expression Templates") I did about 20 years ago. It led to faster execution times (at the cost of much higher compilation times), but after the next compiler version was released I discovered that the new version optimised the original source to the exact same assembly. In short, I should have used my time differently and stayed with the more readable (= easier to maintain) version of the code.
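If you want to verify the bounds-check claim empirically, a harness like JMH avoids the pitfalls of one-shot System.nanoTime timing (no JIT warm-up, dead-code elimination). A minimal sketch of my own, assuming the JMH dependency is on the classpath and reusing the Search methods above:

import java.util.Random;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class SearchBench {

    int[] arr;
    int key = 10000;

    @Setup
    public void setup() {
        // Same data shape as the question: 10M ints below 5000, so the key is absent.
        arr = new int[10_000_000];
        Random rand = new Random(42);
        for (int i = 0; i < arr.length; i++) {
            arr[i] = rand.nextInt(5000);
        }
    }

    @Benchmark
    public int linear() {
        return Search.linear(arr, key);
    }

    @Benchmark
    public int sentinels() {
        return Search.sentinels(arr, key);
    }
}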

Hashmap memoization slower than directly computing the answer

I've been playing around with the Project Euler challenges to help improve my knowledge of Java. In particular, I wrote the following code for problem 14, which asks you to find the longest Collatz chain which starts at a number below 1,000,000. It works on the assumption that subchains are incredibly likely to arise more than once, and by storing them in a cache, no redundant calculations are done.
Collatz.java:
import java.util.HashMap;

public class Collatz {

    private HashMap<Long, Integer> chainCache = new HashMap<Long, Integer>();

    public void initialiseCache() {
        chainCache.put((long) 1, 1);
    }

    private long collatzOp(long n) {
        if (n % 2 == 0) {
            return n / 2;
        } else {
            return 3 * n + 1;
        }
    }

    public int collatzChain(long n) {
        if (chainCache.containsKey(n)) {
            return chainCache.get(n);
        } else {
            int count = 1 + collatzChain(collatzOp(n));
            chainCache.put(n, count);
            return count;
        }
    }
}
}
ProjectEuler14.java:
public class ProjectEuler14 {

    public static void main(String[] args) {
        Collatz col = new Collatz();
        col.initialiseCache();
        long limit = 1000000;
        long temp = 0;
        long longestLength = 0;
        long index = 1;
        for (long i = 1; i < limit; i++) {
            temp = col.collatzChain(i);
            if (temp > longestLength) {
                longestLength = temp;
                index = i;
            }
        }
        System.out.println(index + " has the longest chain, with length " + longestLength);
    }
}
This works, and according to the Measure-Command cmdlet in Windows PowerShell it takes roughly 1708 milliseconds (1.708 seconds) to execute.
However, after reading through the forums, I noticed that some people, who had written seemingly naive code that calculates each chain from scratch, seemed to be getting much better execution times than me. I (conceptually) took one of the answers and translated it into Java:
NaiveProjectEuler14.java:
public class NaiveProjectEuler14 {

    public static void main(String[] args) {
        int longest = 0;
        int numTerms = 0;
        int i;
        long j;
        for (i = 1; i <= 10000000; i++) {
            j = i;
            int currentTerms = 1;
            while (j != 1) {
                currentTerms++;
                if (currentTerms > numTerms) {
                    numTerms = currentTerms;
                    longest = i;
                }
                if (j % 2 == 0) {
                    j = j / 2;
                } else {
                    j = 3 * j + 1;
                }
            }
        }
        System.out.println("Longest: " + longest + " (" + numTerms + ").");
    }
}
On my machine, this also gives the correct answer, and it gives it in 502 milliseconds (0.502 seconds), roughly a third of the time taken by my original program. At first I thought that maybe there was a small overhead in creating a HashMap, and that the times taken were too small to draw any conclusions. However, if I increase the upper limit from 1,000,000 to 10,000,000 in both programs, NaiveProjectEuler14 takes 4709 milliseconds (4.709 seconds), whilst ProjectEuler14 takes a whopping 25324 milliseconds (25.324 seconds)!
Why does ProjectEuler14 take so long? The only explanation I can fathom is that storing huge amounts of pairs in the HashMap data structure is adding a huge overhead, but I can't see why that should be the case. I've also tried recording the number of (key, value) pairs stored during the course of the program (2,168,611 pairs for the 1,000,000 case, and 21,730,849 pairs for the 10,000,000 case) and supplying a little over that number to the HashMap constructor so that it only has to resize itself at most once, but this does not seem to affect the execution times.
Does anyone have any rationale for why the memoized version is a lot slower?
There are some reasons for that unfortunate reality:
Instead of containsKey, do an immediate get and check for null
The code makes an extra method call
The map stores wrapped objects (Integer, Long) for primitive types
The JIT compiler, translating bytecode to machine code, can do more with plain calculations
The cache hit rate is not as high as in cases like Fibonacci
A comparable version would be:
public static void main(String[] args) {
    int longest = 0;
    int numTerms = 0;
    int i;
    long j;
    Map<Long, Integer> map = new HashMap<>();
    for (i = 1; i <= 10000000; i++) {
        j = i;
        // The key type is Long, so widen i before the lookup;
        // map.get(i) with an int i would box to Integer and never match.
        Integer terms = map.get((long) i);
        if (terms != null) {
            continue;
        }
        int currentTerms = 1;
        while (j != 1) {
            currentTerms++;
            if (currentTerms > numTerms) {
                numTerms = currentTerms;
                longest = i;
            }
            if (j % 2 == 0) {
                j = j / 2;
                // Maybe check the map only here
                Integer m = map.get(j);
                if (m != null) {
                    currentTerms += m;
                    break;
                }
            } else {
                j = 3 * j + 1;
            }
        }
        map.put((long) i, currentTerms); // memoize the chain length for the starting value
    }
    System.out.println("Longest: " + longest + " (" + numTerms + ").");
}
This does not really do adequate memoization. For increasing parameters, not checking the map after the 3*j+1 step somewhat decreases the misses (but may also skip memoized values).
Memoization pays off when there is heavy calculation per call. If the function takes long because of deep recursion rather than calculation, the memoization overhead per function call counts against you.
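One way to keep memoization while avoiding the boxing and hashing overhead is a plain int[] cache indexed by the starting value, falling back to raw iteration for intermediate values above the limit. This is my own sketch, not the original poster's code:

public class ArrayCachedCollatz {
    public static void main(String[] args) {
        final int limit = 1_000_000;
        int[] cache = new int[limit]; // cache[n] = chain length of n, 0 = unknown
        cache[1] = 1;
        int longest = 0, numTerms = 0;
        for (int i = 2; i < limit; i++) {
            long j = i;
            int steps = 0;
            // Walk until we reach a value whose chain length is already cached.
            while (j >= limit || cache[(int) j] == 0) {
                j = (j % 2 == 0) ? j / 2 : 3 * j + 1;
                steps++;
            }
            cache[i] = steps + cache[(int) j];
            if (cache[i] > numTerms) {
                numTerms = cache[i];
                longest = i;
            }
        }
        System.out.println("Longest: " + longest + " (" + numTerms + ").");
    }
}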

How to run this code faster?

import java.io.*;
import java.util.ArrayList;

public class Ristsumma {

    static long numberFromFile;
    static long sum1, sum2;
    static long number, number2;
    static long variable, variable2;
    static long counter;

    public static void main(String[] args) throws IOException {
        try {
            BufferedReader br = new BufferedReader(new FileReader("ristsis.txt"));
            numberFromFile = Long.parseLong(br.readLine());
            br.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        variable = numberFromFile;
        ArrayList<Long> numbers = new ArrayList<Long>();
        while (variable > 0) {
            number = variable % 10;
            variable /= 10;
            numbers.add(number);
        }
        for (int i = 0; i < numbers.size(); i++) {
            sum1 += numbers.get(i);
        }
        ArrayList<Long> numbers2 = new ArrayList<Long>();
        for (long s = 1; s < numberFromFile; s++) {
            variable2 = s;
            number2 = 0;
            sum2 = 0;
            while (variable2 > 0) {
                number2 = variable2 % 10;
                variable2 /= 10;
                numbers2.add(number2);
            }
            for (int i = 0; i < numbers2.size(); i++) {
                sum2 += numbers2.get(i);
            }
            if (sum1 == sum2) {
                counter += 1;
            }
            numbers2.clear();
        }
        PrintWriter pw = new PrintWriter("ristval.txt", "UTF-8");
        pw.println(counter);
        pw.close();
    }
}
So I have this code. It takes a number from a file and adds up its digits (for example, if the number is 123 it computes 1+2+3=6). In the second half it goes through all numbers from 1 up to the number from the file and counts how many of them have the same digit sum. If the number is 123, the sum is 6 and the answer the code writes is 9 (because 6, 15, 24, 33, 42, 51, 60, 105 and 114 give the same sum). The code works, but my problem is that when the number from the file is, for example, 2 222 222 222, it takes almost half an hour to get the answer. How can I make this run faster?
Remove unnecessary creation of lists
You are unnecessarily creating lists
ArrayList<Long> numbers = new ArrayList<Long>();
while (variable > 0) {
    number = variable % 10;
    variable /= 10;
    numbers.add(number);
}
for (int i = 0; i < numbers.size(); i++) {
    sum1 += numbers.get(i);
}
Here you create an ArrayList just to temporarily hold Longs; you can eliminate the entire list:
while (variable > 0) {
    number = variable % 10;
    variable /= 10;
    sum1 += number;
}
The same goes for the other ArrayList, numbers2.
Presize ArrayLists
We have already eliminated the ArrayLists, but if we hadn't, we could improve speed by presizing them:
ArrayList<Long> numbers = new ArrayList<Long>(someGuessAsToSize);
It isn't necessary that your guess be correct; the ArrayList will still resize automatically, but if the guess is approximately correct you will speed up the code because the ArrayList will not have to periodically resize itself.
General style
You are holding lots of (what should be) method variables as fields
static long numberFromFile;
static long sum1, sum2;
static long number, number2;
static long variable, variable2;
static long counter;
This is unlikely to affect performance, but it is an unusual thing to do and makes the code less readable, with the potential for "hidden effects".
Your problem is intriguing - it got me wondering how much faster it would run with threads.
Here is a threaded implementation that splits the task of calculating the second problem across threads. My laptop only has two cores so I have set the threads to 4.
public static void main(String[] args) throws Exception {
    final long in = 222222222;
    final long target = calcSum(in);
    final ExecutorService executorService = Executors.newFixedThreadPool(4);
    // Lists.newLinkedList() is from Guava; new LinkedList<>() works just as well.
    final Collection<Future<Integer>> futures = Lists.newLinkedList();
    final int chunk = 100;
    for (long i = in; i > 0; i -= chunk) {
        futures.add(executorService.submit(new Counter(i > chunk ? i - chunk : 0, i, target)));
    }
    long res = 0;
    for (final Future<Integer> f : futures) {
        res += f.get();
    }
    System.out.println(res);
    executorService.shutdown();
    executorService.awaitTermination(1, TimeUnit.DAYS);
}
public static final class Counter implements Callable<Integer> {

    private final long start;
    private final long end;
    private final long target;

    public Counter(long start, long end, long target) {
        this.start = start;
        this.end = end;
        this.target = target;
    }

    @Override
    public Integer call() throws Exception {
        int count = 0;
        for (long i = start; i < end; ++i) {
            if (calcSum(i) == target) {
                ++count;
            }
        }
        return count;
    }
}
public static long calcSum(long num) {
    long sum = 0;
    while (num > 0) {
        sum += num % 10;
        num /= 10;
    }
    return sum;
}
It calculates the solution with 222 222 222 as an input in a few seconds.
I optimised the calculation of the sum to remove all the Lists that you were using.
EDIT
I added some timing code using Stopwatch and tried with and without @Ingo's optimisation, using 222222222 * 100 as the input number.
Without the optimisation the code takes 35 seconds. Changing the calc method to:
public static long calcSum(long num, final long limit) {
    long sum = 0;
    while (num > 0) {
        sum += num % 10;
        // Bail out early once the partial sum exceeds the target.
        if (limit > 0 && sum > limit) {
            break;
        }
        num /= 10;
    }
    return sum;
}
With the optimisation added, the code takes 28 seconds.
Note that this is a highly non-scientific benchmark, as I didn't warm up the JIT or run multiple trials (partly because I'm lazy and partly because I'm busy).
EDIT
Fiddling with the chunk size gives fairly different results too. With a chunk of 1000, the time drops to around 17 seconds.
EDIT
If you want to be really fancy you can use a ForkJoinPool:
public static void main(String[] args) throws Exception {
    final long in = 222222222;
    final long target = calcSum(in);
    final ForkJoinPool forkJoinPool = new ForkJoinPool();
    final ForkJoinTask<Integer> result = forkJoinPool.submit(new Counter(0, in, target));
    System.out.println(result.get());
    forkJoinPool.shutdown();
    forkJoinPool.awaitTermination(1, TimeUnit.DAYS);
}

public static final class Counter extends RecursiveTask<Integer> {

    private static final long THRESHOLD = 1000;

    private final long start;
    private final long end;
    private final long target;

    public Counter(long start, long end, long target) {
        this.start = start;
        this.end = end;
        this.target = target;
    }

    @Override
    protected Integer compute() {
        if (end - start < THRESHOLD) {
            return computeDirectly();
        }
        long mid = start + (end - start) / 2;
        final Counter low = new Counter(start, mid, target);
        final Counter high = new Counter(mid, end, target);
        low.fork();
        final int highResult = high.compute();
        final int lowResult = low.join();
        return highResult + lowResult;
    }

    private Integer computeDirectly() {
        int count = 0;
        for (long i = start; i < end; ++i) {
            if (calcSum(i) == target) {
                ++count;
            }
        }
        return count;
    }
}

public static long calcSum(long num) {
    long sum = 0;
    while (num > 0) {
        sum += num % 10;
        num /= 10;
    }
    return sum;
}
On a different (much faster) computer this runs in under a second, as compared to 2.8 seconds for the original approach.
You spend most of the time checking numbers that are failing the test. However, as Ingo observed, if you have a number ab, then (a-1)(b+1) has the same sum as ab. Instead of checking all numbers, you can generate them:
Let's say our number is 2 222; the sum is 8.
Approach #1: bottom up
We now generate the numbers starting with the smallest (padded with zeroes for readability): 0008. The next one is 0017, then come 0026, 0035, 0044, 0053, 0062, 0071, 0080, 0107 and so on. The problematic part is finding the first number that has this sum.
Approach #2: top down
We start at 2222, the next lower number is 2213, then 2204, 2150, 2141, and so on. Here you don't have the problem that you need to find the lowest number.
I don't have time to write code now, but there should be an algorithm that realizes both approaches without trying out all numbers.
For a number abc, (a)(b-1)(c+1) is the next lower number, while (a)(b+1)(c-1) is the next higher number. The only interesting/difficult part is when you need to carry or borrow because b==9 or c==9, or b==0, c==0. The next bigger number if b==9 is (a+1)(9)(c-1) if c>0, and (a+1)(8)(0) if c==0. Now go make your algorithm; these examples should be enough.
Observe that you don't need to store the individual digits at all.
Instead, all you're interested in is the actual sum of the digits.
Considering this, a method like
static int diagsum(long number) { ... }
would be great. If it is easy enough, the JIT could inline it, or at least optimize it better than your spaghetti code.
Then again, you could benefit from another method that stops computing the digit sum at some limit. For example, when you have
2222222222
the sum is 20, and that means that you need not compute any other sum that is greater than 20. For example:
45678993
Here you could just stop after you have the last 3 digits (which you get first by your division method), because 9+9+3 is 21 and this is already greater than 20.
===================================================================
Another optimization:
If you have some number:
123116
it is immediately clear that all unique permutations of those 6 digits have the same digit sum, thus
321611, 231611, ... are solutions
Then, for any pair of individual digits ab, a transformed number would contain (a+1)(b-1) and (a-1)(b+1) in the same places, as long as a+1 and so on are still in the range 0..9. Apply this recursively to get even more numbers.
You can then turn to numbers with fewer digits. Obviously, to have the same digit sum, you must combine 2 digits of the original number, if possible, for example
5412 => 912, 642, 741, 552, 561, 543
etc.
Apply the same algorithm recursively as above, until no transformations and combinations are possible.
=========
It must be said, though, that the above idea would take lots of memory, because one must maintain a Set-like data structure to take care of duplicates. However, for 987_654_321 we already get 39_541_589 results, and probably many more with even greater numbers. Thus it is questionable whether the effort to actually do it the combinatorial way is worth it.

java - how to reduce execution time for this program [closed]

int n, k;
int count = 0, diff;
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String[] input;
input = br.readLine().split(" ");
n = Integer.parseInt(input[0]);
int[] a = new int[n];
k = Integer.parseInt(input[1]);
input = br.readLine().split(" ");
for (int i = 0; i < n; i++) {
    a[i] = Integer.parseInt(input[i]);
    for (int j = 0; j < i; j++) {
        diff = a[j] - a[i];
        if (diff == k || -diff == k) {
            count++;
        }
    }
}
System.out.print(count);
This is a sample program that prints the count of pairs with a particular difference, where n can be up to 100000.
The problem now is to decrease the execution time of this program. How can I make it better to reduce the running time?
Thanks in advance for suggestions.
Read the numbers from a file and put them in a Map (numbers as keys, their frequencies as values). Iterate over them once, and for each number check if the map contains that number with k added. If so, increase your counter. If you use a HashMap it's O(n) that way, instead of your algorithm's O(n^2).
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
int k = Integer.parseInt(br.readLine().split(" ")[1]);
Map<Integer, Integer> readNumbers = new HashMap<Integer, Integer>();
for (String aNumber : br.readLine().split(" ")) {
    Integer num = Integer.parseInt(aNumber);
    Integer freq = readNumbers.get(num);
    readNumbers.put(num, freq == null ? 1 : freq + 1);
}
int count = 0;
for (Integer aNumber : readNumbers.keySet()) {
    int freq = readNumbers.get(aNumber);
    if (k == 0) {
        // pairs within the same value: freq choose 2
        count += freq * (freq - 1) / 2;
    } else if (readNumbers.containsKey(aNumber + k)) {
        count += freq * readNumbers.get(aNumber + k);
    }
}
System.out.print(count);
EDIT: fixed for duplicates and k = 0.
Here is a comparison of @Socha23's solution using HashMap, Trove's TIntIntHashMap, and the original solution.
For 100,000 numbers I got the following (excluding the reading and parsing):
For 100 unique values, k=10
Set: 89,699,743 took 0.036 ms
Trove Set: 89,699,743 took 0.017 ms
Loops: 89,699,743 took 3623.2 ms
For 1000 unique values, k=10
Set: 9,896,049 took 0.187 ms
Trove Set: 9,896,049 took 0.193 ms
Loops: 9,896,049 took 2855.7 ms
The code:
import gnu.trove.TIntIntHashMap;
import gnu.trove.TIntIntProcedure;

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class Main {

    public static void main(String... args) throws Exception {
        Random random = new Random(1);
        int[] a = new int[100 * 1000];
        int k = 10;
        for (int i = 0; i < a.length; i++)
            a[i] = random.nextInt(100);
        for (int i = 0; i < 5; i++) {
            testSet(a, k);
            testTroveSet(a, k);
            testLoops(a, k);
        }
    }

    private static void testSet(int[] a, int k) {
        Map<Integer, Integer> readNumbers = new HashMap<Integer, Integer>();
        for (int num : a) {
            Integer freq = readNumbers.get(num);
            readNumbers.put(num, freq == null ? 1 : freq + 1);
        }
        long start = System.nanoTime();
        int count = 0;
        for (Integer aNumber : readNumbers.keySet()) {
            if (readNumbers.containsKey(aNumber + k)) {
                count += (readNumbers.get(aNumber) * readNumbers.get(aNumber + k));
            }
        }
        long time = System.nanoTime() - start;
        System.out.printf("Set: %,d took %.3f ms%n", count, time / 1e6);
    }

    private static void testTroveSet(int[] a, final int k) {
        final TIntIntHashMap readNumbers = new TIntIntHashMap();
        for (int num : a)
            readNumbers.adjustOrPutValue(num, 1, 1);
        long start = System.nanoTime();
        final int[] count = { 0 };
        readNumbers.forEachEntry(new TIntIntProcedure() {
            @Override
            public boolean execute(int key, int keyCount) {
                count[0] += readNumbers.get(key + k) * keyCount;
                return true;
            }
        });
        long time = System.nanoTime() - start;
        System.out.printf("Trove Set: %,d took %.3f ms%n", count[0], time / 1e6);
    }

    private static void testLoops(int[] a, int k) {
        long start = System.nanoTime();
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < i; j++) {
                int diff = a[j] - a[i];
                if (diff == k || -diff == k) {
                    count++;
                }
            }
        }
        long time = System.nanoTime() - start;
        System.out.printf("Loops: %,d took %.1f ms%n", count, time / 1e6);
    }

    private static long free() {
        return Runtime.getRuntime().freeMemory();
    }
}
Since split() uses regular expressions to split a string, you should measure whether StringTokenizer would speed things up (a sketch follows).
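For illustration, a StringTokenizer variant splits on a literal delimiter with no regex machinery at all (my own sketch with made-up input):

import java.util.StringTokenizer;

public class TokenizerDemo {
    public static void main(String[] args) {
        String line = "3 7 11 15";
        // Splits on the literal space character; no Pattern is compiled.
        StringTokenizer st = new StringTokenizer(line, " ");
        int sum = 0;
        while (st.hasMoreTokens()) {
            sum += Integer.parseInt(st.nextToken());
        }
        System.out.println(sum); // 36
    }
}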
You are trying to find elements whose difference is k. Try this:
Sort the array.
You can then count the pairs in one pass by keeping two pointers and advancing one of them depending on whether the difference is bigger or smaller than k, as in the sketch below.
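A sketch of that two-pointer pass (my own illustration, assuming k > 0; it counts each matching pair once and leaves duplicate-heavy inputs as an exercise):

import java.util.Arrays;

public class TwoPointerDiff {
    static int countPairsWithDiff(int[] a, int k) {
        Arrays.sort(a);
        int count = 0, left = 0, right = 1;
        while (right < a.length) {
            int diff = a[right] - a[left];
            if (diff == k) {        // found a pair, advance both pointers
                count++;
                left++;
                right++;
            } else if (diff < k) {  // difference too small, widen the window
                right++;
            } else {                // difference too big, shrink the window
                left++;
            }
            if (left == right) {    // keep the pointers distinct
                right++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countPairsWithDiff(new int[] {1, 5, 3, 4, 2}, 2)); // 3
    }
}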
A sparse map of the values, with their frequency of occurrence:
SortedMap<Integer, Integer> a = new TreeMap<Integer, Integer>();
for (int i = 0; i < n; ++i) {
    int value = input[i];
    Integer old = a.put(value, 1);
    if (old != null) {
        a.put(value, old.intValue() + 1);
    }
}
for (Map.Entry<Integer, Integer> entry : a.entrySet()) {
    Integer freq = a.get(entry.getKey() + k);
    if (freq != null) { // guard against a missing partner value
        count += entry.getValue() * freq; // N values x M values further on.
    }
}
This is O(n log n) because of the TreeMap (a HashMap would make it O(n)).
Should this be too costly, you could sort the input array and do something similar.
I don't understand why you have one loop inside another. It's O(n^2) that way.
You also mingle reading in this array of ints with getting the count. I'd separate the two: read the whole thing in, then sweep through and get the difference count.
Perhaps I'm misunderstanding what you're doing, but it feels like you're re-doing a lot of work in that inner loop.
Why not use the java.util.Scanner class instead of BufferedReader?
For example:
Scanner sc = new Scanner(System.in);
int number = sc.nextInt();
This may work faster, as there are fewer wrappers involved. See this link.
Use sets and maps, as other users have already explained, so I won't reiterate their suggestions.
I will suggest something else.
Stop using String.split. It compiles and uses a regular expression.
String.split has this line in it: Pattern.compile(expr).split(this).
If you want to split along a single character, you could write your own function and it would be much faster. I believe Guava (the ex-Google collections API) has a String split function that splits on characters without using a regular expression (see the sketch below).
