My program sums a range of integers using a given number of threads so it can run in parallel, but it seems to run faster with just one thread than with four (I have an 8-core CPU). This is my first time working with multithreading in Java, so maybe there is a problem in my code that makes it take longer?
My benchmarks so far (sum of the range 0-10000) are:
1 thread: 1350 microsecs (average)
2 threads: 1800 microsecs (average)
4 threads: 2400 microsecs (average)
8 threads: 3300 microsecs (average)
Thanks in advance!
/*
Compile: javac RangeSum.java
Execute: java RangeSum nThreads initRange finRange
*/
import java.util.ArrayList;
import java.util.concurrent.*;
public class RangeSum implements Runnable {
private int init;
private int end;
private int id;
static public int out = 0;
Object lock = new Object();
public synchronized static void increment(int partial) {
out = out + partial;
}
public RangeSum(int init,int end) {
this.init = init;
this.end = end;
}//parameters to pass in threads
// the function called for each thread
public void run() {
int partial = 0;
for(int k = this.init; k < this.end; k++)
{
partial = k + partial + 1;
}
increment(partial);
}//thread: adds its partial sum to the shared out variable
public static void main(String args[]) throws InterruptedException {
final long startTime = System.nanoTime()/1000;//start time: microsecs
//get command line values for
int NumberOfThreads = Integer.valueOf(args[0]);
int initRange = Integer.valueOf(args[1]);
int finRange = Integer.valueOf(args[2]);
//int[] out = new int[NumberOfThreads];
// an array of threads
ArrayList<Thread> Threads = new ArrayList<Thread>(NumberOfThreads);
// spawn the threads / CREATE
for (int i = 0; i < NumberOfThreads; i++) {
int initial = i*finRange/NumberOfThreads;
int end = (i+1)*finRange/NumberOfThreads;
Threads.add(i, new Thread(new RangeSum(initial,end)));
Threads.get(i).start();
}
// wait for the threads to finish / JOIN
for (int i = 0; i < NumberOfThreads; i++) {
try {
Threads.get(i).join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
System.out.println("All threads finished!");
System.out.println("Total range sum: " + out);
final long endTime = System.nanoTime()/1000;//end time
System.out.println("Time elapsed: "+(endTime - startTime));
}
}
Your workload is entirely in-memory, non-blocking computation; as a general principle, in this kind of scenario a single thread will often complete the work faster than multiple threads.
Multiple threads tend to interfere with L1/L2 CPU caching and incur additional overhead for context switching.
Specifically, with regard to your code: you initialize final long startTime = System.nanoTime()/1000; too early, so you measure thread setup time as well as the actual time it takes the threads to complete. It is probably better to set up your Threads list first and then:
final long startTime = ...
for (int i = 0; i < NumberOfThreads; i++) {
    Threads.get(i).start();
}
But really, in this case, the expectation that multiple threads will improve processing time is not warranted.
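For completeness, here is a minimal sketch of how the timed section of main could look once the threads are built up front (it reuses the RangeSum class and variable names from the question; main already declares throws InterruptedException, so join needs no extra handling):
ArrayList<Thread> Threads = new ArrayList<Thread>(NumberOfThreads);
// Build all the threads first, before any timing.
for (int i = 0; i < NumberOfThreads; i++) {
    int initial = i * finRange / NumberOfThreads;
    int end = (i + 1) * finRange / NumberOfThreads;
    Threads.add(new Thread(new RangeSum(initial, end)));
}
// Then start the clock, start the threads, and join them, so only the
// run/join phase is measured rather than thread construction.
final long startTime = System.nanoTime() / 1000;
for (Thread t : Threads) t.start();
for (Thread t : Threads) t.join();
final long endTime = System.nanoTime() / 1000;
System.out.println("Total range sum: " + out);
System.out.println("Time elapsed: " + (endTime - startTime));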
Related
I am trying to write a multithreaded Java program that performs a multiplication of 2 matrices given as a file, using a limited total number of threads.
For example, if I set the number of threads to 16, I want my thread pool to reuse those 16 threads until all the tasks are done.
However, I end up with a larger execution time for a larger number of threads, and I am having a hard time understanding why.
Runnable:
class Task implements Runnable
{
int _row = 0;
int _col = 0;
public Task(int row, int col)
{
_row = row;
_col = col;
}
@Override
public void run()
{
Application.multiply(_row, _col);
}
}
Application:
public class Application
{
private static Scanner sc = new Scanner(System.in);
private static int _A[][];
private static int _B[][];
private static int _C[][];
public static void main(final String [] args) throws InterruptedException
{
ExecutorService executor = Executors.newFixedThreadPool(16);
ThreadPoolExecutor pool = (ThreadPoolExecutor) executor;
_A = readMatrix();
_B = readMatrix();
_C = new int[_A.length][_B[0].length];
long startTime = System.currentTimeMillis();
for (int x = 0; x < _C.length; x++)
{
for (int y = 0; y < _C[0].length; y++)
{
executor.execute(new Task(x, y));
}
}
long endTime = System.currentTimeMillis();
executor.shutdown();
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.HOURS);
System.out.printf("Calculation Time: %d ms\n" , endTime - startTime);
}
public static void multiply(int row, int col)
{
int sum = 0;
for (int i = 0; i < _B.length; i++)
{
sum += _A[row][i] * _B[i][col];
}
_C[row][col] = sum;
}
...
}
The matrix calculation and workload sharing seem correct, so it might come from a bad use of the thread pool.
Context switching takes time.
If you have 8 cores and you are executing 8 threads, they can all work simultaneously, and as soon as one finishes its core can be reused.
On the other hand, if you have 16 threads for 8 cores, the threads will compete for processor time, the scheduler will keep switching between them, and your total time becomes execution time plus context-switching overhead.
The more threads you add, the more context switching occurs, and hence the time increases.
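As a rough mitigation (a sketch, not a guaranteed optimum), you could size the pool to the number of available cores instead of a hard-coded 16:
// CPU-bound tasks gain little from having more threads than cores; the extra
// threads mostly add scheduling and context-switch overhead.
int cores = Runtime.getRuntime().availableProcessors();
ExecutorService executor = Executors.newFixedThreadPool(cores);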
Those threads are already being reused to execute the tasks; that is the expected behaviour of ThreadPoolExecutor.
http://www.codejava.net/java-core/concurrency/java-concurrency-understanding-thread-pool-and-executors
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html
You're getting a higher computation time as you increase the number of threads because the time needed to create them is greater than the performance improvement that concurrency gives for such relatively short tasks.
Use submit instead of execute
Make a list of returned Futures so that you can wait for them.
List<Future<?>> futures = new ArrayList<>();
futures.add(executor.submit(new Task(x, y)));
Then just wait for these futures to complete.
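Putting it together, a minimal sketch of the pattern inside the question's main (note that Future.get also throws ExecutionException, which main would need to declare or catch):
long startTime = System.currentTimeMillis();
List<Future<?>> futures = new ArrayList<Future<?>>();
for (int x = 0; x < _C.length; x++) {
    for (int y = 0; y < _C[0].length; y++) {
        futures.add(executor.submit(new Task(x, y)));
    }
}
// Block until every task has actually finished before stopping the clock;
// otherwise only the submission time is measured.
for (Future<?> future : futures) {
    future.get();
}
long endTime = System.currentTimeMillis();
executor.shutdown();
System.out.printf("Calculation Time: %d ms\n", endTime - startTime);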
Say I want to go through a loop a billion times; how could I optimize the loop to get my results faster?
As an example:
double randompoint;
for(long count =0; count < 1000000000; count++) {
randompoint = (Math.random() * 1) + 0; //generate a random point
if(randompoint <= .75) {
var++;
}
}
I was reading up on vectorization, but I'm not quite sure how to go about it. Any ideas?
Since Java is cross-platform, you pretty much have to rely on the JIT to vectorize. In your case it can't, since each iteration depends heavily on the previous one (due to how the RNG works).
However, there are two other major ways to improve your computation.
The first is that this work is very amenable to parallelization. The technical term is embarrassingly parallel. This means that multithreading will give a perfectly linear speedup over the number of cores.
The second is that Math.random() is written to be thread-safe, which also means that it's slow because it needs to use atomic operations. We don't need that safety here, so we can skip the overhead by using a non-thread-safe RNG.
I haven't written much Java since 1.5, but here's a dumb implementation:
import java.util.*;
import java.util.concurrent.*;
class Foo implements Runnable {
private long count;
private double threshold;
private long result;
public Foo(long count, double threshold) {
this.count = count;
this.threshold = threshold;
}
public void run() {
ThreadLocalRandom rand = ThreadLocalRandom.current();
for(long l=0; l<count; l++) {
if(rand.nextDouble() < threshold)
result++;
}
}
public static void main(String[] args) throws Exception {
long count = 1000000000;
double threshold = 0.75;
int cores = Runtime.getRuntime().availableProcessors();
long sum = 0;
List<Foo> list = new ArrayList<Foo>();
List<Thread> threads = new ArrayList<Thread>();
for(int i=0; i<cores; i++) {
// TODO: account for count%cores!=0
Foo t = new Foo(count/cores, threshold);
list.add(t);
Thread thread = new Thread(t);
thread.start();
threads.add(thread);
}
for(Thread t : threads) t.join();
for(Foo f : list) sum += f.result;
System.out.println(sum);
}
}
You can also optimize and inline the random generator, to avoid going via doubles. Here it is with code taken from the java.util.Random docs:
public void run() {
long seed = new Random().nextLong();
long limit = (long) ((1L<<48) * threshold);
for(int i=0; i<count; i++) {
seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);
if (seed < limit) ++result;
}
}
However, the best approach is to work smarter, not harder. As the number of trials increases, the binomial distribution of the count tends towards a normal distribution. This means that for your huge range, you can generate a normally distributed random number and scale it:
import java.util.Random;
class StayInSchool {
public static void main(String[] args) {
System.out.println(coinToss(1000000000, 0.75));
}
static long coinToss(long iterations, double threshold) {
double mean = threshold * iterations;
double stdDev = Math.sqrt(threshold * (1-threshold) * iterations);
double p = new Random().nextGaussian();
return (long) (p*stdDev + mean);
}
}
Here are the timings on my 4 core system (including VM startup) for these approaches:
Your baseline: 20.9s
Single threaded ThreadLocalRandom: 6.51s
Single threaded optimized random: 1.75s
Multithreaded ThreadLocalRandom: 1.67s
Multithreaded optimized random: 0.89s
Generating a gaussian: 0.14s
I have code like the below. In a loop it executes the method "process". It runs sequentially. I want to run this method in parallel, but all calls should be finished within the loop so that I can sum the results in the next loop, i.e. even if it runs in parallel, all calls should finish before the second for loop executes.
How can I solve this in JDK 1.7, not JDK 1.8?
public static void main(String s[]){
int arrlen = 10;
int arr[] = new int[arrlen] ;
int t =0;
for(int i=0;i<arrlen;i++){
arr[i] = i;
t = process(arr[i]);
arr[i] = t;
}
int sum =0;
for(int i=0;i<arrlen;i++){
sum += arr[i];
}
System.out.println(sum);
}
public static int process(int arr){
return arr*2;
}
The example below might help you. I have used the fork/join framework to do that.
For a small array size like your example, the conventional method might be faster, and I suspect the fork/join approach may even take slightly longer. But for larger sizes or heavier processing, the fork/join framework is suitable. Even Java 8 parallel streams use the fork/join framework underneath.
import java.util.concurrent.RecursiveAction;
public class ForkMultiplier extends RecursiveAction {
int[] array;
int threshold = 3;
int start;
int end;
public ForkMultiplier(int[] array,int start, int end) {
this.array = array;
this.start = start;
this.end = end;
}
protected void compute() {
if (end - start < threshold) {
computeDirectly();
} else {
int middle = (end + start) / 2;
ForkMultiplier f1= new ForkMultiplier(array, start, middle);
ForkMultiplier f2= new ForkMultiplier(array, middle, end);
invokeAll(f1, f2);
}
}
protected void computeDirectly() {
for (int i = start; i < end; i++) {
array[i] = array[i] * 2;
}
}
}
Your main class would look like this:
public static void main(String s[]){
int arrlen = 10;
int arr[] = new int[arrlen] ;
for(int i=0;i<arrlen;i++){
arr[i] = i;
}
ForkJoinPool pool = new ForkJoinPool();
pool.invoke(new ForkMultiplier(arr, 0, arr.length));
int sum =0;
for(int i=0;i<arrlen;i++){
sum += arr[i];
}
System.out.println(sum);
}
You basically need to combine Executors and Futures, both of which have existed since Java 1.5 (see the Java documentation).
In the following example, I've created a main class that uses another helper class that acts like the processor you want to parallelize.
The main class is split into 3 steps:
Creates the processes pool and executes tasks in parallel.
Waits for all tasks to finish their work.
Collects the results from tasks.
For didactic reasons, I've added some logging and, more importantly, a random waiting time in each process's business logic, simulating a time-consuming algorithm run by the Process class.
The maximum waiting time for each process is 2 seconds, which is also the highest waiting time for step 2, even if you increase the number of parallel tasks (just try changing the variable totalTasks of the following code to test it).
Here the Main class:
package com.example;
import java.util.ArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class Main
{
public static void main(String[] args) throws InterruptedException, ExecutionException
{
int totalTasks = 100;
ExecutorService newFixedThreadPool = Executors.newFixedThreadPool(totalTasks);
System.out.println("Step 1 - Starting parallel tasks");
ArrayList<Future<Integer>> tasks = new ArrayList<Future<Integer>>();
for (int i = 0; i < totalTasks; i++) {
tasks.add(newFixedThreadPool.submit(new Process(i)));
}
long ts = System.currentTimeMillis();
System.out.println("Step 2 - Wait for processes to finish...");
boolean tasksCompleted;
do {
tasksCompleted = true;
for (Future<Integer> task : tasks) {
if (!task.isDone()) {
tasksCompleted = false;
Thread.sleep(10);
break;
}
}
} while (!tasksCompleted);
System.out.println(String.format("Step 2 - End in '%.3f' seconds", (System.currentTimeMillis() - ts) / 1000.0));
System.out.println("Step 3 - All processes finished to run, let's collect results...");
Integer sum = 0;
for (Future<Integer> task : tasks) {
sum += task.get();
}
System.out.println(String.format("Total final sum is: %d", sum));
}
}
Here the Process class:
package com.example;
import java.util.concurrent.Callable;
public class Process implements Callable<Integer>
{
private Integer value;
public Process(Integer value)
{
this.value = value;
}
public Integer call() throws Exception
{
Long sleepTime = (long)(Math.random() * 2000);
System.out.println(String.format("Starting process with value %d, sleep time %d", this.value, sleepTime));
Thread.sleep(sleepTime);
System.out.println(String.format("Stopping process with value %d", this.value));
return value * 2;
}
}
Hope this helps.
So for my programming class we have to do the following:
Fill an integer array with 5 million integers ranging from 0-9.
Then find the number of times each number (0-9) occurs and display this.
We have to measure the time it takes to count the occurrences for both single-threaded and multi-threaded runs. Currently I average 9.3 ms single-threaded and 8.9 ms multi-threaded with 8 threads on my 8-core CPU. Why is this?
Currently, for multithreading, I have one array filled with numbers and I calculate lower and upper bounds for each thread to count occurrences. Here is my current attempt:
public void createThreads(int divisionSize) throws InterruptedException {
threads = new Thread[threadCount];
for(int i = 0; i < threads.length; i++) {
final int lower = (i*divisionSize);
final int upper = lower + divisionSize - 1;
threads[i] = new Thread(new Runnable() {
long start, end;
@Override
public void run() {
start = System.nanoTime();
for(int i = lower; i <= upper; i++) {
occurences[numbers[i]]++;
}
end = System.nanoTime();
milliseconds += (end-start)/1000000.0;
}
});
threads[i].start();
threads[i].join();
}
}
Could anyone shed some light? Cheers.
You are essentially doing all the work sequentially, because you immediately join each thread you create.
Move the threads[i].join() outside the main construction loop into its own loop. While you're at it, you should probably also start all of the threads outside the construction loop, since starting them while new threads are still being created is not ideal because creating threads takes time.
class ThreadTester {
private final int threadCount;
private final int numberCount;
int[] numbers = new int[5_000_000];
AtomicIntegerArray occurences;
Thread[] threads;
AtomicLong milliseconds = new AtomicLong();
public ThreadTester(int threadCount, int numberCount) {
this.threadCount = threadCount;
this.numberCount = numberCount;
occurences = new AtomicIntegerArray(numberCount);
threads = new Thread[threadCount];
Random r = new Random();
for (int i = 0; i < numbers.length; i++) {
numbers[i] = r.nextInt(numberCount);
}
}
public void createThreads() throws InterruptedException {
final int divisionSize = numbers.length / threadCount;
for (int i = 0; i < threads.length; i++) {
final int lower = (i * divisionSize);
final int upper = lower + divisionSize - 1;
threads[i] = new Thread(new Runnable() {
@Override
public void run() {
long start = System.nanoTime();
for (int i = lower; i <= upper; i++) {
occurences.addAndGet(numbers[i], 1);
}
long end = System.nanoTime();
milliseconds.addAndGet(end - start);
}
});
}
}
private void startThreads() {
for (Thread thread : threads) {
thread.start();
}
}
private void finishThreads() throws InterruptedException {
for (Thread thread : threads) {
thread.join();
}
}
public long test() throws InterruptedException {
createThreads();
startThreads();
finishThreads();
return milliseconds.get();
}
}
public void test() throws InterruptedException {
for (int threads = 1; threads < 50; threads++) {
ThreadTester tester = new ThreadTester(threads, 10);
System.out.println("Threads=" + threads + " ns=" + tester.test());
}
}
Note that even here the fastest solution uses one thread, but you can clearly see that an even number of threads does it quicker, as I am using an i5 which has 2 cores but presents 4 via hyper-threading.
Interestingly though, as suggested by @biziclop, removing all contention between the threads by giving each thread its own occurences array gives a much more expected result.
The other answers have all explored the immediate problems with your code; I'll give you a different angle, one about the design of multithreading in general.
The idea of parallel computing speeding up calculations depends on the assumption that the small bits you broke the problem up into can indeed be run in parallel, independently of each other.
And at first glance, your problem is exactly like that: chop the input range up into 8 equal parts, fire up 8 threads, and off they go.
There is a catch though:
occurences[numbers[i]]++;
The occurences array is a resource shared by all threads, and therefore you must control access to it to ensure correctness: either with explicit synchronization (which is slow) or with something like an AtomicIntegerArray. But the Atomic* classes are only really fast if access to them is rarely contended, and in your case access will be contended a lot, because most of what your inner loop does is increment the number of occurrences.
So what can you do?
The problem is caused partly by the fact that occurences is such a small structure (an array with only 10 elements, regardless of input size) that the threads will continuously try to update the same elements. But you can turn that to your advantage: have every thread keep its own separate tally and, when they have all finished, just add up their results, as sketched below. This adds a small, constant overhead at the end of the process but lets the calculation run truly in parallel.
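A minimal sketch of that idea, assuming the numbers field from the question and a method that, like createThreads, declares throws InterruptedException (the other names are illustrative):
final int threadCount = 8; // or Runtime.getRuntime().availableProcessors()
final int[][] localCounts = new int[threadCount][10]; // one private tally per thread
Thread[] workers = new Thread[threadCount];
int division = numbers.length / threadCount;
for (int t = 0; t < threadCount; t++) {
    final int id = t;
    final int lower = t * division;
    final int upper = (t == threadCount - 1) ? numbers.length : lower + division;
    workers[t] = new Thread(new Runnable() {
        @Override
        public void run() {
            // No sharing and no atomics: each thread writes only to its own row.
            for (int i = lower; i < upper; i++) {
                localCounts[id][numbers[i]]++;
            }
        }
    });
}
for (Thread w : workers) w.start();
for (Thread w : workers) w.join();
// Merge the private tallies once at the end; this is the only serial step.
int[] occurences = new int[10];
for (int[] local : localCounts) {
    for (int digit = 0; digit < 10; digit++) {
        occurences[digit] += local[digit];
    }
}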
The join method allows one thread to wait for the completion of another, so in your code the second thread starts only after the first has finished.
Join each thread only after you have started all the threads.
public void createThreads(int divisionSize) throws InterruptedException {
threads = new Thread[threadCount];
for(int i = 0; i < threads.length; i++) {
final int lower = (i*divisionSize);
final int upper = lower + divisionSize - 1;
threads[i] = new Thread(new Runnable() {
long start, end;
@Override
public void run() {
start = System.nanoTime();
for(int i = lower; i <= upper; i++) {
occurences[numbers[i]]++;
}
end = System.nanoTime();
milliseconds += (end-start)/1000000.0;
}
});
threads[i].start();
}
for(int i = 0; i < threads.length; i++) {
threads[i].join();
}
}
Also, there seems to be a race condition in the code at occurences[numbers[i]]++
So most probably, if you update the code and use more threads, the output won't be correct. You should use an AtomicIntegerArray: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/AtomicIntegerArray.html
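A minimal sketch of that change, assuming the same loop bounds as in the question:
// Shared counter with atomic per-index increments, which removes the data race.
AtomicIntegerArray occurences = new AtomicIntegerArray(10);
// Inside each thread's run():
for (int i = lower; i <= upper; i++) {
    occurences.incrementAndGet(numbers[i]);
}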
Use an ExecutorService with Callable and invoke all the tasks; then you can safely aggregate the results. Also use TimeUnit for elapsed-time manipulations (sleeping, joining, waiting, conversion, ...).
Start by defining the task with its input/output:
class Task implements Callable<Task> {
// input
int[] source;
int sliceStart;
int sliceEnd;
// output
int[] occurences = new int[10];
String runner;
long elapsed = 0;
Task(int[] source, int sliceStart, int sliceEnd) {
this.source = source;
this.sliceStart = sliceStart;
this.sliceEnd = sliceEnd;
}
@Override
public Task call() {
runner = Thread.currentThread().getName();
long start = System.nanoTime();
try {
compute();
} finally {
elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
}
return this;
}
void compute() {
for (int i = sliceStart; i < sliceEnd; i++) {
occurences[source[i]]++;
}
}
}
Then let's define some variables to manage the parameters:
// Parameters
int size = 5_000_000;
int parallel = Runtime.getRuntime().availableProcessors();
int slices = parallel;
Then generate random input:
// Generated source
int[] source = new int[size];
ThreadLocalRandom random = ThreadLocalRandom.current();
for (int i = 0; i < source.length; i++) source[i] = random.nextInt(10);
Start timing the total computation and prepare the tasks:
long start = System.nanoTime();
// Prepare tasks
List<Task> tasks = new ArrayList<>(slices);
int sliceSize = source.length / slices;
for (int sliceStart = 0; sliceStart < source.length;) {
int sliceEnd = Math.min(sliceStart + sliceSize, source.length);
Task task = new Task(source, sliceStart, sliceEnd);
tasks.add(task);
sliceStart = sliceEnd;
}
Execute all the tasks on that threading configuration (don't forget to shut the executor down!):
// Execute tasks
ExecutorService executor = Executors.newFixedThreadPool(parallel);
try {
executor.invokeAll(tasks);
} finally {
executor.shutdown();
}
Once the tasks have completed, just aggregate the data:
// Collect data
int[] occurences = new int[10];
for (Task task : tasks) {
for (int i = 0; i < occurences.length; i++) {
occurences[i] += task.occurences[i];
}
}
Finally, you can output the computation result:
// Display result
long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
System.out.printf("Computation done in %tT.%<tL%n", calendar(elapsed));
System.out.printf("Results: %s%n", Arrays.toString(occurences));
You can also output partial computations:
// Print debug output
int idxSize = (String.valueOf(size).length() * 4) / 3;
String template = "Slice[%," + idxSize + "d-%," + idxSize + "d] computed in %tT.%<tL by %s: %s%n";
for (Task task : tasks) {
System.out.printf(template, task.sliceStart, task.sliceEnd, calendar(task.elapsed), task.runner, Arrays.toString(task.occurences));
}
Which gives on my workstation:
Computation done in 00:00:00.024
Results: [500159, 500875, 500617, 499785, 500017, 500777, 498394, 498614, 499498, 501264]
Slice[ 0-1 250 000] computed in 00:00:00.013 by pool-1-thread-1: [125339, 125580, 125338, 124888, 124751, 124608, 124463, 124351, 125023, 125659]
Slice[1 250 000-2 500 000] computed in 00:00:00.014 by pool-1-thread-2: [124766, 125423, 125111, 124756, 125201, 125695, 124266, 124405, 125083, 125294]
Slice[2 500 000-3 750 000] computed in 00:00:00.013 by pool-1-thread-3: [124903, 124756, 124934, 125640, 124954, 125452, 124556, 124816, 124737, 125252]
Slice[3 750 000-5 000 000] computed in 00:00:00.014 by pool-1-thread-4: [125151, 125116, 125234, 124501, 125111, 125022, 125109, 125042, 124655, 125059]
And the small trick to convert elapsed millis into a stopwatch-style Calendar:
static final TimeZone UTC= TimeZone.getTimeZone("UTC");
public static Calendar calendar(long millis) {
Calendar calendar = Calendar.getInstance(UTC);
calendar.setTimeInMillis(millis);
return calendar;
}
I'm using the jsr166y ForkJoinPool to distribute computational tasks amongst threads. But I clearly must be doing something wrong.
My tasks seem to work flawlessly if I create the ForkJoinPool with parallelism > 1 (the default is Runtime.availableProcessors(); I've been running with 2-8 threads). But if I create the ForkJoinPool with parallelism = 1, I see deadlocks after an unpredictable number of iterations.
Yes, setting parallelism = 1 is a strange practice. In this case, I'm profiling a parallel algorithm as the thread count increases, and I want to compare the parallel version, run with a single thread, to a baseline serial implementation, so as to accurately ascertain the overhead of the parallel implementation.
Below is a simple example that illustrates the issue I'm seeing. The 'task' is a dummy iteration over a fixed array, divided recursively into 16 subtasks.
If run with THREADS = 2 (or more), it runs reliably to completion, but if run with THREADS = 1, it invariably deadlocks. After an unpredictable number of iterations, the main loop hangs in ForkJoinPool.invoke(), waiting on task.join(), and the worker thread exits.
I'm running with JDK 1.6.0_21 and 1.6.0_22 under Linux, and using a version of jsr166y downloaded a few days ago from Doug Lea's website (http://gee.cs.oswego.edu/dl/concurrency-interest/index.html)
Any suggestions for what I'm missing? Many thanks in advance.
package concurrent;
import jsr166y.ForkJoinPool;
import jsr166y.RecursiveAction;
public class TestFjDeadlock {
private final static int[] intArray = new int[256 * 1024];
private final static float[] floatArray = new float[256 * 1024];
private final static int THREADS = 1;
private final static int TASKS = 16;
private final static int ITERATIONS = 10000;
public static void main(String[] args) {
// Initialize the array
for (int i = 0; i < intArray.length; i++) {
intArray[i] = i;
}
ForkJoinPool pool = new ForkJoinPool(THREADS);
// Run through ITERATIONS loops, subdividing the iteration into TASKS F-J subtasks
for (int i = 0; i < ITERATIONS; i++) {
pool.invoke(new RecursiveIterate(0, intArray.length));
}
pool.shutdown();
}
private static class RecursiveIterate extends RecursiveAction {
final int start;
final int end;
public RecursiveIterate(final int start, final int end) {
this.start = start;
this.end = end;
}
@Override
protected void compute() {
if ((end - start) <= (intArray.length / TASKS)) {
// We've reached the subdivision limit - iterate over the arrays
for (int i = start; i < end; i += 3) {
floatArray[i] += i + intArray[i];
}
} else {
// Subdivide and start new tasks
final int mid = (start + end) >>> 1;
invokeAll(new RecursiveIterate(start, mid), new RecursiveIterate(mid, end));
}
}
}
}
It looks like a bug in the ForkJoinPool. Everything I can see in the usage of the class fits your example. The only other possibility might be one of your tasks throwing an exception and dying abnormally (although that should still be handled).
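If you want to rule out the exception theory, one diagnostic sketch (not a fix) is to wrap the compute body from your example in a try/catch so that anything thrown on a worker thread is printed before that thread dies:
@Override
protected void compute() {
    try {
        if ((end - start) <= (intArray.length / TASKS)) {
            // We've reached the subdivision limit - iterate over the arrays
            for (int i = start; i < end; i += 3) {
                floatArray[i] += i + intArray[i];
            }
        } else {
            // Subdivide and start new tasks
            final int mid = (start + end) >>> 1;
            invokeAll(new RecursiveIterate(start, mid), new RecursiveIterate(mid, end));
        }
    } catch (Throwable t) {
        // Log from the worker thread itself, in case the exception is otherwise swallowed.
        t.printStackTrace();
        throw new RuntimeException(t);
    }
}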