ForkJoinPool parallelism=1 deadlock - java

I'm using the jsr166y ForkJoinPool to distribute computational tasks amongst threads. But I clearly must be doing something wrong.
My tasks seem to work flawlessly if I create the ForkJoinPool with parallelism > 1 (the default is Runtime.availableProcessors(); I've been running with 2-8 threads). But if I create the ForkJoinPool with parallelism = 1, I see deadlocks after an unpredictable number of iterations.
Yes - setting parallelism = 1 is a strange practice. In this case, I'm profiling a parallel algorithm as thread-count increases, and I want to compare the parallel version, run with a single thread, to a baseline serial implementation, so as to accurately ascertain the overhead of the parallel implementation.
Below is a simple example that illustrates the issue I'm seeing. The 'task' is a dummy iteration over a fixed array, divided recursively into 16 subtasks.
If run with THREADS = 2 (or more), it runs reliably to completion, but if run with THREADS = 1, it invariably deadlocks. After an unpredictable number of iterations, the main loop hangs in ForkJoinPool.invoke(), waiting on task.join(), and the worker thread exits.
I'm running with JDK 1.6.0_21 and 1.6.0_22 under Linux, and using a version of jsr166y downloaded a few days ago from Doug Lea's website (http://gee.cs.oswego.edu/dl/concurrency-interest/index.html)
Any suggestions for what I'm missing? Many thanks in advance.
package concurrent;
import jsr166y.ForkJoinPool;
import jsr166y.RecursiveAction;
public class TestFjDeadlock {
private final static int[] intArray = new int[256 * 1024];
private final static float[] floatArray = new float[256 * 1024];
private final static int THREADS = 1;
private final static int TASKS = 16;
private final static int ITERATIONS = 10000;
public static void main(String[] args) {
// Initialize the array
for (int i = 0; i < intArray.length; i++) {
intArray[i] = i;
}
ForkJoinPool pool = new ForkJoinPool(THREADS);
// Run through ITERATIONS loops, subdividing the iteration into TASKS F-J subtasks
for (int i = 0; i < ITERATIONS; i++) {
pool.invoke(new RecursiveIterate(0, intArray.length));
}
pool.shutdown();
}
private static class RecursiveIterate extends RecursiveAction {
final int start;
final int end;
public RecursiveIterate(final int start, final int end) {
this.start = start;
this.end = end;
}
@Override
protected void compute() {
if ((end - start) <= (intArray.length / TASKS)) {
// We've reached the subdivision limit - iterate over the arrays
for (int i = start; i < end; i += 3) {
floatArray[i] += i + intArray[i];
}
} else {
// Subdivide and start new tasks
final int mid = (start + end) >>> 1;
invokeAll(new RecursiveIterate(start, mid), new RecursiveIterate(mid, end));
}
}
}
}

This looks like a bug in ForkJoinPool. Everything I can see in the usage of the class fits your example. The only other possibility might be that one of your tasks is throwing an exception and dying abnormally (although that should still be handled).
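One way to narrow this down, assuming a Java 7 or later runtime is available (the question itself runs on JDK 1.6 with jsr166y), is to run the same task shape on the JDK's built-in java.util.concurrent.ForkJoinPool with parallelism = 1. The sketch below is only a diagnostic aid; the class and method names (FjSingleThreadCheck, run) are made up for illustration. If it completes reliably, that would point at the jsr166y snapshot rather than at the task code.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Same task shape as the question, but on the JDK's own ForkJoinPool (Java 7+).
// Class and method names here are hypothetical.
class FjSingleThreadCheck {
    static final int[] INTS = new int[256 * 1024];
    static final float[] FLOATS = new float[256 * 1024];
    static final int TASKS = 16;

    static class Iterate extends RecursiveAction {
        final int start;
        final int end;

        Iterate(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        protected void compute() {
            if (end - start <= INTS.length / TASKS) {
                // Subdivision limit reached: do the dummy work directly.
                for (int i = start; i < end; i += 3) {
                    FLOATS[i] += i + INTS[i];
                }
            } else {
                // Split in half and run both halves as subtasks.
                int mid = (start + end) >>> 1;
                invokeAll(new Iterate(start, mid), new Iterate(mid, end));
            }
        }
    }

    /** Invokes the root task `iterations` times; returns how many invocations completed. */
    static int run(int parallelism, int iterations) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        int completed = 0;
        try {
            for (int i = 0; i < iterations; i++) {
                pool.invoke(new Iterate(0, INTS.length));
                completed++;
            }
        } finally {
            pool.shutdown();
        }
        return completed;
    }

    public static void main(String[] args) {
        System.out.println("completed: " + run(1, 10_000));
    }
}
```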

Related

Performance of Parallel Sieve of Eratosthenes

I'm trying to modify the sequential "Sieve of Eratosthenes" algorithm in order to take advantage of multiple cores. My goal was to increase performance relative to the vanilla algorithm, but all of my attempts have been futile...
Here's what I have thus far:
public class ParallelSieve implements SieveCalculator
{
private int nThreads;
public ParallelSieve(int nThreads) {
this.nThreads = nThreads;
}
@Override
public SieveResult calculate(int ceiling) {
if (ceiling < Primes.MIN) {
return SieveResult.emptyResult();
}
ThreadSafeBitSet isComposite = new ThreadSafeBitSet(ceiling + 1);
ForkJoinPool threadPool = new ForkJoinPool(nThreads);
for (int n = Primes.MIN; n * n <= ceiling; ++n) {
if (isComposite.get(n)) {
continue;
}
int from = n * n;
int to = (ceiling / n) * n;
threadPool.invoke(new RecursivelyMarkSieve(isComposite, from, to, n));
}
threadPool.shutdown();
return new SieveResult(isComposite);
}
private class RecursivelyMarkSieve extends RecursiveAction
{
private static final int THRESHOLD = 1000;
private final ThreadSafeBitSet isComposite;
private final int from, to, step;
RecursivelyMarkSieve(ThreadSafeBitSet isComposite, int from, int to, int step) {
this.isComposite = isComposite;
this.from = from;
this.to = to;
this.step = step;
}
@Override
protected void compute() {
int workload = (to - from) / step + 1;
if (workload <= THRESHOLD) {
for (int index = from; index <= to; index += step) {
isComposite.set(index);
}
return;
}
int middle = (to - from) / (2 * step);
int leftSplit = from + middle * step;
int rightSplit = from + (middle + 1) * step;
ForkJoinTask.invokeAll(
new RecursivelyMarkSieve(isComposite, from, leftSplit, step),
new RecursivelyMarkSieve(isComposite, rightSplit, to, step)
);
}
}
}
My thought process was, once a prime value is found, we can break up the work of marking its multiples via a thread pool. I was drawn to the ForkJoinPool because I can limit the number of threads being used, and easily submit it custom, recursive tasks that break up the work of marking multiples. Still, my solution is too slow! Any suggestions?
With all prospective multi-threading solutions you have to balance the advantage to be gained by multiplying the amount of processing available against the overheads of administering the multi-threaded solution.
In particular:
There is some overhead to starting threads.
If you synchronize or use a thread-safe class (which has synchronization built in) there is the overhead of synchronization, plus the fact that while using synchronized methods you are possibly funnelling the solution back down to a single thread.
Looking at your solution, the actual logic (the compute method) does very little computation, but it accesses the thread-safe bit set and it starts a new thread. So the overheads will far outweigh the actual logic.
To use multi-threading effectively you need to figure out how to split up your task such that there is a significant amount of work to be done by each thread and the use of synchronized data structures is limited. You can't invoke a new thread for each integer you come across.
There is a lot online on how to parallelize the sieve of Eratosthenes, so I suggest looking at how others have tackled the problem.
The general paradigm today is "map-reduce". Split the problem-set into chunks. Process each chunk separately. Collate the results back together again. Repeat and/or recurse.
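A minimal sketch of that chunked approach, applied to the sieve as a segmented sieve. This is one possible arrangement and not the asker's code: the class SegmentedSieveSketch, its method names and the segmentSize parameter are all hypothetical, and it only counts primes instead of building the question's SieveResult. The small primes up to sqrt(ceiling) are found sequentially; each task then marks composites in its own private boolean array, so no thread-safe bit set is needed, and the per-segment counts are combined at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: segmented sieve with one task per segment.
class SegmentedSieveSketch {

    /** Plain sequential sieve, used only for the small range [2, limit]. */
    static List<Integer> basePrimes(int limit) {
        boolean[] composite = new boolean[limit + 1];
        List<Integer> primes = new ArrayList<>();
        for (int n = 2; n <= limit; n++) {
            if (composite[n]) continue;
            primes.add(n);
            for (long m = (long) n * n; m <= limit; m += n) composite[(int) m] = true;
        }
        return primes;
    }

    /** Counts primes in [from, to] using a private array; no shared mutable state. */
    static int countSegment(long from, long to, List<Integer> base) {
        boolean[] composite = new boolean[(int) (to - from + 1)];
        for (int p : base) {
            // First multiple of p inside the segment, but never below p * p.
            long first = Math.max((long) p * p, ((from + p - 1) / p) * p);
            for (long m = first; m <= to; m += p) composite[(int) (m - from)] = true;
        }
        int count = 0;
        for (boolean c : composite) if (!c) count++;
        return count;
    }

    /** Counts primes in [2, ceiling], splitting (sqrt(ceiling), ceiling] into segments. */
    static int countPrimes(int ceiling, int nThreads, int segmentSize) {
        if (ceiling < 2) return 0;
        int root = (int) Math.sqrt(ceiling);
        while ((long) (root + 1) * (root + 1) <= ceiling) root++; // guard against rounding
        while ((long) root * root > ceiling) root--;
        List<Integer> base = basePrimes(root);
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Integer>> parts = new ArrayList<>();
            for (long lo = (long) root + 1; lo <= ceiling; lo += segmentSize) {
                final long from = lo;
                final long to = Math.min(lo + segmentSize - 1, ceiling);
                parts.add(pool.submit(() -> countSegment(from, to, base)));
            }
            int count = base.size();
            for (Future<Integer> part : parts) count += part.get(); // collate the results
            return count;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(1_000_000, 4, 1 << 16)); // prints 78498
    }
}
```

The segment size is a tuning knob: large enough that each task does real work, small enough that a segment's array fits comfortably in cache. Whether this beats the sequential sieve on a given machine still has to be measured.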

Java ThreadPool limit maximum thread ever created

I am trying to write a Java multithreaded program performing a multiplication on 2 matrices given as a file and using a limited total of threads used.
For example if I set a number of thread at 16 I want my threadpool to be able to reuse those 16 threads until all the tasks are done.
However I end up with a larger execution time for a larger number of threads and I am having a hard time trying to understand why.
Runnable:
class Task implements Runnable
{
int _row = 0;
int _col = 0;
public Task(int row, int col)
{
_row = row;
_col = col;
}
@Override
public void run()
{
Application.multMatrix(_row, _col);
}
}
Application:
public class Application
{
private static Scanner sc = new Scanner(System.in);
private static int _A[][];
private static int _B[][];
private static int _C[][];
public static void main(final String [] args) throws InterruptedException
{
ExecutorService executor = Executors.newFixedThreadPool(16);
ThreadPoolExecutor pool = (ThreadPoolExecutor) executor;
_A = readMatrix();
_B = readMatrix();
_C = new int[_A.length][_B[0].length];
long startTime = System.currentTimeMillis();
for (int x = 0; x < _C.length; x++)
{
for (int y = 0; y < _C[0].length; y++)
{
executor.execute(new Task(x, y));
}
}
long endTime = System.currentTimeMillis();
executor.shutdown();
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.HOURS);
System.out.printf("Calculation Time: %d ms\n" , endTime - startTime);
}
public static void multMatrix(int row, int col)
{
int sum = 0;
for (int i = 0; i < _B.length; i++)
{
sum += _A[row][i] * _B[i][col];
}
_C[row][col] = sum;
}
...
}
The matrix calculations and workload sharing seem correct, so the problem might come from a bad use of the ThreadPool.
Context switching takes time.
If you have 8 cores and you are executing 8 threads they all can work simultaneously and as soon as one finishes it will be reused.
On the other hand, if you have 16 threads for 8 cores, the threads will compete for processor time, the scheduler will switch between them, and your total time will grow to execution time + context switching.
The more the threads the more the context switching and hence the time increases.
Those threads are already being reused to execute the tasks, that's the expected behaviour of ThreadPoolExecutor.
http://www.codejava.net/java-core/concurrency/java-concurrency-understanding-thread-pool-and-executors
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html
You're getting a higher computation time as you increase the number of threads because the time needed to create them is greater than the performance improvement that concurrency gives when executing such relatively short tasks.
Use submit instead of execute
Make a list of returned Futures so that you can wait for them.
List<Future<?>> futures = new ArrayList<>();
futures.add(executor.submit(new Task(x, y)));
Then just wait for these futures to complete.
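Put together, the submit-and-wait pattern might look like the sketch below. It is an illustration under assumptions, not the asker's program: the class MatrixMultiplySketch is hypothetical, the matrices are passed in as arguments instead of being read from a file, and each task computes a whole row instead of a single cell so that every task carries more work. Note that the end time should only be taken after every Future.get() has returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one task per result row, completion awaited via Futures.
class MatrixMultiplySketch {

    static int[][] multiply(int[][] a, int[][] b, int nThreads) {
        int rows = a.length;
        int cols = b[0].length;
        int inner = b.length;
        int[][] c = new int[rows][cols];
        ExecutorService executor = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int x = 0; x < rows; x++) {
                final int row = x;
                futures.add(executor.submit(() -> {
                    // Each task writes only to its own row of c.
                    for (int col = 0; col < cols; col++) {
                        int sum = 0;
                        for (int i = 0; i < inner; i++) {
                            sum += a[row][i] * b[i][col];
                        }
                        c[row][col] = sum;
                    }
                }));
            }
            // Block until every row has been computed.
            for (Future<?> f : futures) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
        return c;
    }

    public static void main(String[] args) {
        int[][] a = {{1, 2}, {3, 4}};
        int[][] b = {{5, 6}, {7, 8}};
        long start = System.nanoTime();
        int[][] c = multiply(a, b, 4);
        long elapsedMicros = (System.nanoTime() - start) / 1_000; // measured after all tasks finished
        System.out.println(Arrays.deepToString(c) + " in " + elapsedMicros + " us");
    }
}
```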

Sum of range multithreading

My program is trying to sum a range with a given number of threads in order to run it in parallel, but it seems that with just one thread it runs better than with 4 (I have an 8-core CPU). It is my first time working with multithreading in Java, so maybe I have a problem in my code that makes it take longer?
My benchmarks(sum of range 0-10000) done for the moment are:
1 thread: 1350 microsecs (average)
2 thread: 1800 microsecs (average)
4 thread: 2400 microsecs (average)
8 thread: 3300 microsecs (average)
Thanks in advance!
/*
Compile: javac RangeSum.java
Execute: java RangeSum nThreads initRange finRange
*/
import java.util.ArrayList;
import java.util.concurrent.*;
public class RangeSum implements Runnable {
private int init;
private int end;
private int id;
static public int out = 0;
Object lock = new Object();
public synchronized static void increment(int partial) {
out = out + partial;
}
public RangeSum(int init,int end) {
this.init = init;
this.end = end;
}//parameters to pass in threads
// the function called for each thread
public void run() {
int partial = 0;
for(int k = this.init; k < this.end; k++)
{
partial = k + partial + 1;
}
increment(partial);
}//thread: sum its id to the out variable
public static void main(String args[]) throws InterruptedException {
final long startTime = System.nanoTime()/1000;//start time: microsecs
//get command line values for
int NumberOfThreads = Integer.valueOf(args[0]);
int initRange = Integer.valueOf(args[1]);
int finRange = Integer.valueOf(args[2]);
//int[] out = new int[NumberOfThreads];
// an array of threads
ArrayList<Thread> Threads = new ArrayList<Thread>(NumberOfThreads);
// spawn the threads / CREATE
for (int i = 0; i < NumberOfThreads; i++) {
int initial = i*finRange/NumberOfThreads;
int end = (i+1)*finRange/NumberOfThreads;
Threads.add(i, new Thread(new RangeSum(initial,end)));
Threads.get(i).start();
}
// wait for the threads to finish / JOIN
for (int i = 0; i < NumberOfThreads; i++) {
try {
Threads.get(i).join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
System.out.println("All threads finished!");
System.out.println("Total range sum: " + out);
final long endTime = System.nanoTime()/1000;//end time
System.out.println("Time elapsed: "+(endTime - startTime));
}
}
Your workload is entirely in-memory, non-blocking computation. As a general principle, in this kind of scenario a single thread will complete the work faster than multiple threads.
Multiple threads tend to interfere with the L1/L2 CPU caching and incur additional overhead for context switching.
Specifically, with regard to your code, you initialize final long startTime = System.nanoTime()/1000; too early and measure thread setup time as well as the actual time it takes the threads to complete. It's probably better to set up your Threads list first and then:
final long startTime =...
for (int i = 0; i < NumberOfThreads; i++) {
Threads.get(i).start();
}
But really, in this case, the expectation that multiple threads will improve processing time is not warranted.

Not expected result with multithread programming

I'm having trouble with a multithreaded Java program.
The program splits the sum of an array of integers across multiple threads and then adds up the partial sums of the slices.
The problem is that the computing time does not decrease as the number of threads increases (I know there is a limit beyond which more threads make the computation slower than fewer threads). I expect to see the execution time decrease before that limit is reached (the benefit of parallel execution). I use the variable fake in run method to make time "readable".
public class MainClass {
private final int MAX_THREAD = 8;
private final int ARRAY_SIZE = 1000000;
private int[] array;
private SimpleThread[] threads;
private int numThread = 1;
private int[] sum;
private int start = 0;
private int totalSum = 0;
long begin, end;
int fake;
MainClass() {
fillArray();
for(int i = 0; i < MAX_THREAD; i++) {
threads = new SimpleThread[numThread];
sum = new int[numThread];
begin = (long) System.currentTimeMillis();
for(int j = 0 ; j < numThread; j++) {
threads[j] = new SimpleThread(start, ARRAY_SIZE/numThread, j);
threads[j].start();
start+= ARRAY_SIZE/numThread;
}
for(int k = 0; k < numThread; k++) {
try {
threads[k].join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
end = (long) System.currentTimeMillis();
for(int g = 0; g < numThread; g++) {
totalSum+=sum[g];
}
System.out.printf("Result with %d thread-- Sum = %d Time = %d\n", numThread, totalSum, end-begin);
numThread++;
start = 0;
totalSum = 0;
}
}
public static void main(String args[]) {
new MainClass();
}
private void fillArray() {
array = new int[ARRAY_SIZE];
for(int i = 0; i < ARRAY_SIZE; i++)
array[i] = 1;
}
private class SimpleThread extends Thread{
int start;
int size;
int index;
public SimpleThread(int start, int size, int sumIndex) {
this.start = start;
this.size = size;
this.index = sumIndex;
}
public void run() {
for(int i = start; i < start+size; i++)
sum[index]+=array[i];
for(long i = 0; i < 1000000000; i++) {
fake++;
}
}
}
Unexpected Result Screenshot
As a general rule, you won't get a speedup from multi-threading if the "work" performed by each thread is less than the overheads of using the threads.
One of the overheads is the cost of starting a new thread. This is surprisingly high. Each time you start a thread the JVM needs to perform syscalls to allocate the thread stack memory segment and the "red zone" memory segment, and initialize them. (The default thread stack size is typically 500KB or 1MB.) Then there are further syscalls to create the native thread and schedule it.
In this example, you have 1,000,000 elements to sum and you divide this work among N threads. As N increases, the amount of work performed by each thread decreases.
It is not hard to see that the time taken to sum 1,000,000 elements is going to be less than the time needed to start 4 threads ... just based on counting the memory read and write operations. Then you need to take into account that the child threads are created one at a time by the parent thread.
If you do the analysis completely, it is clear that there is a point at which adding more threads actually slows down the computation even if you have enough cores to run all threads in parallel. And your benchmarking seems to suggest [1] that that point is around 2 threads.
By the way, there is a second reason why you may not get as much speedup as you expect for a benchmark like this one. The "work" that each thread is doing is basically scanning a large array. Reading and writing arrays will generate requests to the memory system. Ideally, these requests will be satisfied by the (fast) on-chip memory caches. However, if you try to read / write an array that is larger than the memory cache, then many / most of those requests turn into (slow) main memory requests. Worse still, if you have N cores all doing this then you can find that the number of main memory requests is too much for the memory system to keep up .... and the threads slow down.
The bottom line is that multi-threading does not automatically make an application faster, and it certainly won't if you do it the wrong way.
In your example:
the amount of work per thread is too small compared with the overheads of creating and starting threads, and
memory bandwidth effects are likely to be a problem if you can "factor out" the thread creation overheads.
1 - I don't understand the point of the "fake" computation. It probably invalidates the benchmark, though it is possible that the JIT compiler optimizes it away.
Why sum is wrong sometimes?
Because ARRAY_SIZE/numThread is an integer division, any fractional part (e.g. 1000000/3 = 333333.33...) is truncated. Each slice is therefore slightly too short, the last few elements are never visited, and the sum can be less than 1000000 depending on the divisor.
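One way to avoid losing the remainder is to compute every slice boundary directly from the total length, so that the last slice always ends exactly at the end of the array. The helper below is a hypothetical sketch (SliceBounds and bounds are not names from the question):

```java
import java.util.Arrays;

// Hypothetical helper: slice j covers the half-open range [b[j], b[j + 1]).
class SliceBounds {

    /** Returns count + 1 boundaries that together cover [0, length) with no gaps. */
    static int[] bounds(int length, int count) {
        int[] b = new int[count + 1];
        for (int j = 0; j <= count; j++) {
            // Multiply before dividing (in long) so the truncation error does not accumulate.
            b[j] = (int) ((long) length * j / count);
        }
        return b;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(bounds(1_000_000, 3)));
        // prints [0, 333333, 666666, 1000000]
    }
}
```

Thread j would then sum the indices from b[j] up to, but not including, b[j + 1], so the slice sizes differ by at most one element and nothing is skipped.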
Why the time taken is increasing as the number of threads increases?
Because in the run function of each thread you do this:
for(long i = 0; i < 1000000000; i++) {
fake++;
}
In your question you say:
I use the variable fake in run method to make time "readable".
I do not understand what that means, but every thread needs to increment your fake variable 1000000000 times.
As a side note, for what you're trying to do there is the Fork/Join-Framework. It allows you easily split tasks recursively and implements an algorithm which will distribute your workload automatically.
There is a guide available here; its example is very similar to your case, which boils down to a RecursiveTask like this:
class Adder extends RecursiveTask<Integer>
{
private int[] toAdd;
private int from;
private int to;
/** Add the numbers in the given array */
public Adder(int[] toAdd)
{
this(toAdd, 0, toAdd.length);
}
/** Add the numbers in the given array between the given indices;
internal constructor to split work */
private Adder(int[] toAdd, int fromIndex, int upToIndex)
{
this.toAdd = toAdd;
this.from = fromIndex;
this.to = upToIndex;
}
/** This is the work method */
@Override
protected Integer compute()
{
int amount = to - from;
int result = 0;
if (amount < 500)
{
// base case: add ints and return the result
for (int i = from; i < to; i++)
{
result += toAdd[i];
}
}
else
{
// array too large: split it into two parts and distribute the actual adding
int newEndIndex = from + (amount / 2);
Collection<Adder> invokeAll = invokeAll(Arrays.asList(
new Adder(toAdd, from, newEndIndex),
new Adder(toAdd, newEndIndex, to)));
for (Adder a : invokeAll)
{
result += a.join();
}
}
return result;
}
}
To actually run this, you can use
RecursiveTask<Integer> adder = new Adder(fillArray(ARRAY_LENGTH));
int result = ForkJoinPool.commonPool().invoke(adder);
Starting threads is heavy and you'll only see the benefit of it on large processes that don't compete for the same resources (none of it applies here).

Multithreading only .4 of a second faster?

so for my programming class we have to do the following:
Fill an integer array with 5 million integers ranging from 0-9.
Then find the number of times each number (0-9) occurs and display this.
We have to measure the time it takes to count the occurrences, both single-threaded and multi-threaded. Currently I average 9.3 ms single-threaded and 8.9 ms multi-threaded with 8 threads on my 8-core CPU. Why is this?
Currently, for multithreading, I have one array filled with numbers and am calculating lower and upper bounds for each thread to count occurrences. Here is my current attempt:
public void createThreads(int divisionSize) throws InterruptedException {
threads = new Thread[threadCount];
for(int i = 0; i < threads.length; i++) {
final int lower = (i*divisionSize);
final int upper = lower + divisionSize - 1;
threads[i] = new Thread(new Runnable() {
long start, end;
@Override
public void run() {
start = System.nanoTime();
for(int i = lower; i <= upper; i++) {
occurences[numbers[i]]++;
}
end = System.nanoTime();
milliseconds += (end-start)/1000000.0;
}
});
threads[i].start();
threads[i].join();
}
}
Could anyone shed some light? Cheers.
You are essentially doing all the work sequentially because each thread you create you immediately join it.
Move the threads[i].join() outside the main construction loop into its own loop. While you're at it, you should probably also start all of the threads outside of that loop, because starting them while new threads are still being created is not a good idea: creating threads takes time.
class ThreadTester {
private final int threadCount;
private final int numberCount;
int[] numbers = new int[5_000_000];
AtomicIntegerArray occurences;
Thread[] threads;
AtomicLong milliseconds = new AtomicLong();
public ThreadTester(int threadCount, int numberCount) {
this.threadCount = threadCount;
this.numberCount = numberCount;
occurences = new AtomicIntegerArray(numberCount);
threads = new Thread[threadCount];
Random r = new Random();
for (int i = 0; i < numbers.length; i++) {
numbers[i] = r.nextInt(numberCount);
}
}
public void createThreads() throws InterruptedException {
final int divisionSize = numbers.length / threadCount;
for (int i = 0; i < threads.length; i++) {
final int lower = (i * divisionSize);
final int upper = lower + divisionSize - 1;
threads[i] = new Thread(new Runnable() {
@Override
public void run() {
long start = System.nanoTime();
for (int i = lower; i <= upper; i++) {
occurences.addAndGet(numbers[i], 1);
}
long end = System.nanoTime();
milliseconds.addAndGet(end - start);
}
});
}
}
private void startThreads() {
for (Thread thread : threads) {
thread.start();
}
}
private void finishThreads() throws InterruptedException {
for (Thread thread : threads) {
thread.join();
}
}
public long test() throws InterruptedException {
createThreads();
startThreads();
finishThreads();
return milliseconds.get();
}
}
public void test() throws InterruptedException {
for (int threads = 1; threads < 50; threads++) {
ThreadTester tester = new ThreadTester(threads, 10);
System.out.println("Threads=" + threads + " ns=" + tester.test());
}
}
Note that even here the fastest solution uses one thread, but you can clearly see that an even number of threads does it quicker, as I am using an i5, which has 2 cores but works as 4 via hyperthreading.
Interestingly though, as suggested by @biziclop, removing all contention between threads by giving each thread its own occurrences array gives a more expected result:
The other answers all explored the immediate problems with your code, I'll give you a different angle: one that's about design of multi-threading in general.
The idea of parallel computing speeding up calculations depends on the assumption that the small bits you broke the problem up into can indeed be run in parallel, independently of each other.
And at first glance, your problem is exactly like that, chop the input range up into 8 equal parts, fire up 8 threads and off they go.
There is a catch though:
occurences[numbers[i]]++;
The occurences array is a resource shared by all threads, and therefore you must control access to it to ensure correctness: either by explicit synchronization (which is slow) or something like an AtomicIntegerArray. But the Atomic* classes are only really fast if access to them is rarely contested. And in your case access will be contested a lot, because most of what your inner loop does is incrementing the number of occurrences.
So what can you do?
The problem is caused partly by the fact that occurences is such a small structure (an array with only 10 elements, regardless of input size), so threads will continuously try to update the same element. But you can turn that to your advantage: make all the threads keep their own separate tally, and when they have all finished, just add up their results. This will add a small, constant overhead to the end of the process but will make the calculations go truly parallel.
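A minimal sketch of the separate-tally idea, using plain threads. The class LocalTallySketch and its count method are hypothetical names, and the input is assumed to contain only values in the range 0 to buckets - 1. Each thread increments a private array, and the main thread merges the arrays only after join() has returned, so neither synchronization nor atomics are needed in the hot loop.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: every thread counts into its own array; results are merged after join().
class LocalTallySketch {

    static int[] count(int[] numbers, int buckets, int threadCount) {
        int[][] partial = new int[threadCount][];
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            // Boundaries computed from the total length so no element is skipped.
            final int lower = (int) ((long) numbers.length * t / threadCount);
            final int upper = (int) ((long) numbers.length * (t + 1) / threadCount);
            threads[t] = new Thread(() -> {
                int[] local = new int[buckets]; // private to this thread: no contention
                for (int i = lower; i < upper; i++) {
                    local[numbers[i]]++;
                }
                partial[id] = local; // published to the main thread via join()
            });
        }
        for (Thread thread : threads) {
            thread.start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // Small, constant-size merge step.
        int[] total = new int[buckets];
        for (int[] local : partial) {
            for (int d = 0; d < buckets; d++) {
                total[d] += local[d];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] numbers = new int[5_000_000];
        Random r = new Random();
        for (int i = 0; i < numbers.length; i++) {
            numbers[i] = r.nextInt(10);
        }
        System.out.println(Arrays.toString(count(numbers, 10, 8)));
    }
}
```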
The join method makes one thread wait for the completion of another, so in your loop the second thread starts only after the first has finished.
Join each thread only after you have started all of them.
public void createThreads(int divisionSize) throws InterruptedException {
threads = new Thread[threadCount];
for(int i = 0; i < threads.length; i++) {
final int lower = (i*divisionSize);
final int upper = lower + divisionSize - 1;
threads[i] = new Thread(new Runnable() {
long start, end;
@Override
public void run() {
start = System.nanoTime();
for(int i = lower; i <= upper; i++) {
occurences[numbers[i]]++;
}
end = System.nanoTime();
milliseconds += (end-start)/1000000.0;
}
});
threads[i].start();
}
for(int i = 0; i < threads.length; i++) {
threads[i].join();
}
}
Also, there seems to be a race condition in the code at occurences[numbers[i]]++.
So most probably, if you update the code and use more threads, the output won't be correct. You should use an AtomicIntegerArray: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/AtomicIntegerArray.html
Use an ExecutorService with Callable and invoke all the tasks; then you can safely aggregate the results. Also use TimeUnit for manipulating elapsed time (sleep, joining, waiting, conversion, ...).
Start by defining the task with its input/output:
class Task implements Callable<Task> {
// input
int[] source;
int sliceStart;
int sliceEnd;
// output
int[] occurences = new int[10];
String runner;
long elapsed = 0;
Task(int[] source, int sliceStart, int sliceEnd) {
this.source = source;
this.sliceStart = sliceStart;
this.sliceEnd = sliceEnd;
}
@Override
public Task call() {
runner = Thread.currentThread().getName();
long start = System.nanoTime();
try {
compute();
} finally {
elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
}
return this;
}
void compute() {
for (int i = sliceStart; i < sliceEnd; i++) {
occurences[source[i]]++;
}
}
}
Then let's define some variables to manage the parameters:
// Parameters
int size = 5_000_000;
int parallel = Runtime.getRuntime().availableProcessors();
int slices = parallel;
Then generate random input:
// Generated source
int[] source = new int[size];
ThreadLocalRandom random = ThreadLocalRandom.current();
for (int i = 0; i < source.length; i++) source[i] = random.nextInt(10);
Start timing total computation and prepare tasks:
long start = System.nanoTime();
// Prepare tasks
List<Task> tasks = new ArrayList<>(slices);
int sliceSize = source.length / slices;
for (int sliceStart = 0; sliceStart < source.length;) {
int sliceEnd = Math.min(sliceStart + sliceSize, source.length);
Task task = new Task(source, sliceStart, sliceEnd);
tasks.add(task);
sliceStart = sliceEnd;
}
Execute all the tasks on the thread pool (don't forget to shut it down!):
// Execute tasks
ExecutorService executor = Executors.newFixedThreadPool(parallel);
try {
executor.invokeAll(tasks);
} finally {
executor.shutdown();
}
Once the tasks have completed, just aggregate the data:
// Collect data
int[] occurences = new int[10];
for (Task task : tasks) {
for (int i = 0; i < occurences.length; i++) {
occurences[i] += task.occurences[i];
}
}
Finally you can output computation result:
// Display result
long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
System.out.printf("Computation done in %tT.%<tL%n", calendar(elapsed));
System.out.printf("Results: %s%n", Arrays.toString(occurences));
You can also output partial computations:
// Print debug output
int idxSize = (String.valueOf(size).length() * 4) / 3;
String template = "Slice[%," + idxSize + "d-%," + idxSize + "d] computed in %tT.%<tL by %s: %s%n";
for (Task task : tasks) {
System.out.printf(template, task.sliceStart, task.sliceEnd, calendar(task.elapsed), task.runner, Arrays.toString(task.occurences));
}
Which gives on my workstation:
Computation done in 00:00:00.024
Results: [500159, 500875, 500617, 499785, 500017, 500777, 498394, 498614, 499498, 501264]
Slice[ 0-1 250 000] computed in 00:00:00.013 by pool-1-thread-1: [125339, 125580, 125338, 124888, 124751, 124608, 124463, 124351, 125023, 125659]
Slice[1 250 000-2 500 000] computed in 00:00:00.014 by pool-1-thread-2: [124766, 125423, 125111, 124756, 125201, 125695, 124266, 124405, 125083, 125294]
Slice[2 500 000-3 750 000] computed in 00:00:00.013 by pool-1-thread-3: [124903, 124756, 124934, 125640, 124954, 125452, 124556, 124816, 124737, 125252]
Slice[3 750 000-5 000 000] computed in 00:00:00.014 by pool-1-thread-4: [125151, 125116, 125234, 124501, 125111, 125022, 125109, 125042, 124655, 125059]
The small trick to convert elapsed milliseconds into a stopwatch calendar:
static final TimeZone UTC= TimeZone.getTimeZone("UTC");
public static Calendar calendar(long millis) {
Calendar calendar = Calendar.getInstance(UTC);
calendar.setTimeInMillis(millis);
return calendar;
}
