core-count tasks CompletableFuture slower than parallelStream

My PC has four cores (FYI).
CompletableFuture uses ForkJoinPool.commonPool(), as its official documentation points out:
All async methods without an explicit Executor argument are performed using the ForkJoinPool.commonPool() (unless it does not support a parallelism level of at least two, in which case, a new Thread is created to run each task).
I debugged and found the following code in CompletableFuture.supplyAsync(Supplier<U> supplier):
private static final boolean useCommonPool =
(ForkJoinPool.getCommonPoolParallelism() > 1);
/**
* Default executor -- ForkJoinPool.commonPool() unless it cannot
* support parallelism.
*/
private static final Executor asyncPool = useCommonPool ?
ForkJoinPool.commonPool() : new ThreadPerTaskExecutor();
This means it also uses ForkJoinPool.commonPool(), just as parallelStream always does. So why is parallelStream quicker here?
I printed the thread names and found that only three threads are used with CompletableFuture:
private static int concurrencyGet() {
List<CompletableFuture<Integer>> futureList = IntStream.rangeClosed(0, 10).boxed()
.map(i -> CompletableFuture.supplyAsync(() -> getNumber(i)))
.collect(Collectors.toList());
return futureList.stream().map(future -> future.join()).reduce(0, Integer::sum);
}
But parallelStream uses four, including the main thread.
My guess is that ForkJoinPool.getCommonPoolParallelism() is only three, because the main thread accounts for one of the four cores, and since CompletableFuture.supplyAsync() is asynchronous, the main thread does not run any of the tasks.
The parallelStream, however, uses all four, since it is not asynchronous and the main thread takes part in the work.
Is this correct? Is there any official documentation on this?
Thanks for the help.
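The guess above can be checked empirically. The following is a minimal sketch, not code from the question: the class name ThreadUsageDemo, the helper slowIdentity (a stand-in for the undefined getNumber) and the task count of 11 are illustrative assumptions. It records which threads execute the work for supplyAsync versus a parallel stream. The exact sets depend on the machine and JDK, but in current OpenJDK implementations the supplyAsync suppliers never run on a thread that merely calls join(), whereas a parallel stream's terminal operation lets the calling thread process part of the elements itself.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ThreadUsageDemo {

    // Hypothetical stand-in for getNumber(i): parks about 20 ms so that tasks overlap.
    static int slowIdentity(int i) {
        LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(20));
        return i;
    }

    // Names of the threads that executed the supplyAsync suppliers.
    public static Set<String> threadsUsedBySupplyAsync(int tasks) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        List<CompletableFuture<Integer>> futures = IntStream.range(0, tasks).boxed()
                .map(i -> CompletableFuture.supplyAsync(() -> {
                    names.add(Thread.currentThread().getName());
                    return slowIdentity(i);
                }))
                .collect(Collectors.toList());
        // The calling thread only waits here; it does not run the suppliers itself.
        futures.forEach(CompletableFuture::join);
        return names;
    }

    // Names of the threads that processed elements of a parallel stream.
    public static Set<String> threadsUsedByParallelStream(int tasks) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.range(0, tasks).parallel()
                .map(i -> {
                    names.add(Thread.currentThread().getName());
                    return slowIdentity(i);
                })
                .sum();
        return names;
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors   = " + Runtime.getRuntime().availableProcessors());
        System.out.println("commonPoolParallelism = " + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("supplyAsync threads:    " + threadsUsedBySupplyAsync(11));
        System.out.println("parallelStream threads: " + threadsUsedByParallelStream(11));
    }
}
```

With default settings, the common pool's parallelism is typically availableProcessors() - 1, so three on a four-core machine; the parallel stream can additionally use the calling thread.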

Following is how I understood it from Venkat Subramaniam's talk on Parallel and Asynchronous Programming with Streams and CompletableFuture:
As CompletableFuture also utilizes ForkJoinPool.commonPool(), it may also use the main thread, and it does under certain circumstances.
Given the following example:
public static void main(String[] args) {
CompletableFuture<Integer> future = CompletableFuture.supplyAsync(() -> numberSupplier());
future.thenAccept(i -> System.out.println("f: " + i + " - " + Thread.currentThread()));
sleep(100); //wait for async operations to finish before exiting
}
private static Integer numberSupplier() {
Integer n = 2;
System.out.println("c: " + n + " - " + Thread.currentThread());
sleep(19);
return n;
}
private static void sleep(int millis) {
try {
Thread.sleep(millis);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
you might get a console output like this:
c: 2 - Thread[ForkJoinPool.commonPool-worker-1,5,main]
f: 2 - Thread[ForkJoinPool.commonPool-worker-1,5,main]
Both the supplyAsync(..) and the thenAccept(..) parts are executed by a worker thread from the ForkJoinPool.
However, if the Supplier<Integer> given to supplyAsync(..) is so fast that it has already finished when thenAccept(..) is invoked, then that second part may just as well be executed in the main thread:
private static Integer numberSupplier() {
Integer n = 2;
//System.out.println("c: " + n + " - " + Thread.currentThread());
//sleep(19);
return n;
}
Output:
f: 2 - Thread[main,5,main]
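The rule behind this output can be isolated deterministically. This is a sketch, with the class name CompletedStageDemo chosen for illustration: when thenApply is invoked on a future that is already complete (here created with CompletableFuture.completedFuture), the function runs immediately in the thread that calls thenApply, while thenApplyAsync without an executor hands it to the default async executor instead.

```java
import java.util.concurrent.CompletableFuture;

public class CompletedStageDemo {

    // thenApply on an already-completed future: the function runs in the caller's thread.
    public static String threadForThenApply() {
        CompletableFuture<Integer> done = CompletableFuture.completedFuture(2);
        return done.thenApply(i -> Thread.currentThread().getName()).join();
    }

    // thenApplyAsync without an executor: the function is handed to the default async executor.
    public static String threadForThenApplyAsync() {
        CompletableFuture<Integer> done = CompletableFuture.completedFuture(2);
        return done.thenApplyAsync(i -> Thread.currentThread().getName()).join();
    }

    public static void main(String[] args) {
        System.out.println("thenApply:      " + threadForThenApply());      // main
        System.out.println("thenApplyAsync: " + threadForThenApplyAsync()); // a pool thread
    }
}
```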

Related

How to force CompletableFuture.thenApply() to run on the same thread that ran the previous stage?

Here's a short code version of the problem I'm facing:
public static void main(String[] args) {
CompletableFuture.supplyAsync(() -> {
/*
try {
Thread.sleep(2000);
} catch (InterruptedException ignored) {}
*/
//System.out.println("supplyAsync: " + Thread.currentThread().getName());
return 1;
})
.thenApply(i -> {
System.out.println("apply: " + Thread.currentThread().getName());
return i + 1;
})
.thenAccept((i) -> {
System.out.println("accept: " + Thread.currentThread().getName());
System.out.println("result: " + i);
}).join();
}
This is the output that I get:
apply: main
accept: main
result: 2
I'm surprised to see main there! I expected something like the following, which is what happens when I uncomment the Thread.sleep() call, or even just the single println statement:
supplyAsync: ForkJoinPool.commonPool-worker-1
apply: ForkJoinPool.commonPool-worker-1
accept: ForkJoinPool.commonPool-worker-1
result: 2
I understand thenApplyAsync() will make sure it won't run on the main thread, but I want to avoid passing the data returned by the supplier from the thread that ran supplyAsync to the thread that's going to run thenApply and the other subsequent thens in the chain.

The method thenApply evaluates the function in the caller’s thread because the future has already been completed. Of course, when you insert a sleep into the supplier, the future has not been completed by the time thenApply is called. Even a print statement might slow down the supplier enough for the main thread to invoke thenApply and thenAccept before the future completes. But this is not reliable behavior; you may get different results when running the code repeatedly.
Not only does the future not remember which thread completed it, there is no way to tell an arbitrary thread to execute a particular code. The thread might be busy with something else, being entirely uncooperative, or even have terminated in the meanwhile.
Just consider
ExecutorService s = Executors.newSingleThreadExecutor();
CompletableFuture<Integer> cf = CompletableFuture.supplyAsync(() -> {
System.out.println("supplyAsync: " + Thread.currentThread().getName());
return 1;
}, s);
s.shutdown();
s.awaitTermination(1, TimeUnit.DAYS);
cf.thenApply(i -> {
System.out.println("apply: " + Thread.currentThread().getName());
return i + 1;
})
.thenAccept((i) -> {
System.out.println("accept: " + Thread.currentThread().getName());
System.out.println("result: " + i);
}).join();
How could we expect the functions passed to thenApply and thenAccept to be executed in the already terminated pool’s worker thread?
We could also write
CompletableFuture<Integer> cf = new CompletableFuture<>();
Thread t = new Thread(() -> {
System.out.println("completing: " + Thread.currentThread().getName());
cf.complete(1);
});
t.start();
t.join();
System.out.println("completer: " + t.getName() + " " + t.getState());
cf.thenApply(i -> {
System.out.println("apply: " + Thread.currentThread().getName());
return i + 1;
})
.thenAccept((i) -> {
System.out.println("accept: " + Thread.currentThread().getName());
System.out.println("result: " + i);
}).join();
which will print something like
completing: Thread-0
completer: Thread-0 TERMINATED
apply: main
accept: main
result: 2
Obviously, we can’t insist on this thread processing the subsequent stages.
But even when the thread is a still-alive worker thread of a pool, it doesn’t know that it has completed a future, nor does it have any notion of “processing subsequent stages”. Following the Executor abstraction, it has just received an arbitrary Runnable from the queue, and after processing it, it proceeds with its main loop, fetching the next Runnable from the queue.
So once the first future has been completed, the only way to tell it to do the work of completing other futures is by enqueuing the tasks. This is what happens when using thenApplyAsync with the same pool specified, or when performing all actions with the …Async methods without an executor, i.e. using the default pool.
When you use a single-threaded executor for all …Async methods, you can be sure that all actions are executed by the same thread, but they will still go through the pool’s queue. Even then, it’s actually the main thread that enqueues the dependent actions in the case of an already completed future, so a thread-safe queue, and hence synchronization overhead, is unavoidable.
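A minimal sketch of that setup, assuming Executors.newSingleThreadExecutor() as the shared executor (the class and method names are illustrative). Every stage is passed to the same executor through the …Async variants, so all of them report the same worker thread, even though each stage still goes through the executor's queue as described above.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SingleThreadChain {

    // Runs a three-stage chain on one single-threaded executor and returns the thread names seen.
    public static Set<String> threadsUsedByChain() throws InterruptedException {
        ExecutorService single = Executors.newSingleThreadExecutor();
        Set<String> names = ConcurrentHashMap.newKeySet();
        try {
            int result = CompletableFuture.supplyAsync(() -> {
                        names.add(Thread.currentThread().getName());
                        return 1;
                    }, single)
                    .thenApplyAsync(i -> {
                        names.add(Thread.currentThread().getName());
                        return i + 1;
                    }, single)
                    .thenApplyAsync(i -> {
                        names.add(Thread.currentThread().getName());
                        return i * 10;
                    }, single)
                    .join();
            System.out.println("result: " + result); // 20
        } finally {
            single.shutdown();
            single.awaitTermination(1, TimeUnit.SECONDS);
        }
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(threadsUsedByChain()); // e.g. [pool-1-thread-1]
    }
}
```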
But note that even if you manage to create the chain of dependent actions first, before a single worker thread processes them all sequentially, this overhead is still there. Each future’s completion is done by storing the new state in a thread safe way, making the result potentially visible to all other threads, and atomically checking whether a concurrent completion (e.g. a cancelation) has happened in the meanwhile. Then, the dependent action(s) chained by other threads will be fetched, of course, in a thread safe way, before they are executed.
All these actions with synchronization semantics make it unlikely that there are benefits of processing the data by the same thread when having a chain of dependent CompletableFutures.
The only way to get actual local processing, potentially with performance benefits, is to use
CompletableFuture.runAsync(() -> {
System.out.println("supplyAsync: " + Thread.currentThread().getName());
int i = 1;
System.out.println("apply: " + Thread.currentThread().getName());
i = i + 1;
System.out.println("accept: " + Thread.currentThread().getName());
System.out.println("result: " + i);
}).join();
Or, in other words, if you don’t want detached processing, don’t create detached processing stages in the first place.

Thread used for Java CompletableFuture composition?

I'm starting to be comfortable with Java CompletableFuture composition, having worked with JavaScript promises. Basically, the composition just schedules the chained commands on the indicated executor. But I'm unsure which thread is running when the composition is performed.
Let's say I have two executors, executor1 and executor2; for simplicity let's say they are separate thread pools. I schedule a CompletableFuture (to use a very loose description):
CompletableFuture<Foo> futureFoo = CompletableFuture.supplyAsync(this::getFoo, executor1);
Then when that is done I transform the Foo to Bar using the second executor:
CompletableFuture<Bar> futureBar = futureFoo.thenApplyAsync(this::fooToBar, executor2);
I understand that getFoo() will be called from a thread in the executor1 thread pool. I understand that fooToBar() will be called from a thread in the executor2 thread pool.
But what thread is used for the actual composition, i.e. after getFoo() finishes and futureFoo is complete, but before the fooToBar() command gets scheduled on executor2? In other words, what thread actually runs the code to schedule the second command on the second executor?
Is the scheduling performed as part of the same thread in executor1 that called getFoo()? If so, would this completable future composition be equivalent to my simply scheduling fooToBar() manually myself in the first command in the executor1 task?

This is intentionally unspecified. In practice, it will be handled by the same code that also handles the chained operations when the variants without the Async suffix are invoked and exhibits similar behavior.
So when we use the following test code:
CompletableFuture.supplyAsync(() -> {
LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(1));
return "";
}, r -> new Thread(r, "A").start())
.thenAcceptAsync(s -> {}, r -> {
System.out.println("scheduled by " + Thread.currentThread());
new Thread(r, "B").start();
});
it will likely print
scheduled by Thread[A,5,main]
as the thread that completed the previous stage was used to schedule the depending action.
However, when we use
CompletableFuture<String> first = CompletableFuture.supplyAsync(() -> "",
r -> new Thread(r, "A").start());
LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(1));
first.thenAcceptAsync(s -> {}, r -> {
System.out.println("scheduled by " + Thread.currentThread());
new Thread(r, "B").start();
});
it will likely print
scheduled by Thread[main,5,main]
as by the time the main thread invokes thenAcceptAsync, the first future is already completed and the main thread will schedule the action itself.
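The second scenario can be made deterministic by starting from an already-completed future. In this sketch, the recording executor is a hypothetical helper that notes which thread calls execute and then runs the task inline; since the source stage is already complete, the thread calling thenAcceptAsync is the one that submits the action. As this answer stresses, this is observed behavior of the OpenJDK implementation, not something the documentation guarantees.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;

public class SchedulingThreadDemo {

    // Returns the name of the thread that invoked Executor.execute for the dependent action.
    public static String schedulingThread() {
        AtomicReference<String> scheduler = new AtomicReference<>();
        // Hypothetical recording executor: remembers the submitting thread, then runs the task inline.
        Executor recording = task -> {
            scheduler.set(Thread.currentThread().getName());
            task.run();
        };
        CompletableFuture<String> first = CompletableFuture.completedFuture("");
        first.thenAcceptAsync(s -> { }, recording).join();
        return scheduler.get();
    }

    public static void main(String[] args) {
        System.out.println("scheduled by " + schedulingThread()); // main, when called from main
    }
}
```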
But that is not the end of the story. When we use
CompletableFuture<String> first = CompletableFuture.supplyAsync(() -> {
LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(5));
return "";
}, r -> new Thread(r, "A").start());
Set<String> s = ConcurrentHashMap.newKeySet();
Runnable submitter = () -> {
String n = Thread.currentThread().getName();
do {
for(int i = 0; i < 1000; i++)
first.thenAcceptAsync(x -> s.add(n+" "+Thread.currentThread().getName()),
Runnable::run);
} while(!first.isDone());
};
Thread b = new Thread(submitter, "B");
Thread c = new Thread(submitter, "C");
b.start();
c.start();
b.join();
c.join();
System.out.println(s);
It may not only print the combinations B A and C A from the first scenario and B B and C C from the second. On my machine it reproducibly also prints the combinations B C and C B indicating that an action passed to thenAcceptAsync by one thread got submitted to the executor by the other thread calling thenAcceptAsync with a different action at the same time.
This matches the scenarios for the thread evaluating the function passed to thenApply (without the Async) described in this answer. As said at the beginning, that was what I expected, as both things are likely handled by the same code. But unlike the thread evaluating the function passed to thenApply, the thread invoking the execute method on the Executor is not even mentioned in the documentation. So in theory, another implementation could use an entirely different thread, one neither calling a method on the future nor completing it.

At the end is a simple program that mirrors your code snippet and lets you play with it.
The output confirms that the executor you supply is called upon to complete each stage when the condition it is waiting on is ready (unless you explicitly call complete early enough, in which case it happens in the thread calling complete). The get() on a Future blocks until the Future is finished.
Supply an argument and there are two executors, executor1 and executor2; supply no arguments and there is just one executor. The output is either (same executor: things are run as separate tasks in the same executor, sequentially):
In thread Thread[main,5,main] - getFoo
In thread Thread[main,5,main] - getFooToBar
In thread Thread[pool-1-thread-1,5,main] - Supplying Foo
In thread Thread[pool-1-thread-1,5,main] - fooToBar
In thread Thread[main,5,main] - Completed
or (two executors: things again run sequentially, but on different executors):
In thread Thread[main,5,main] - getFoo
In thread Thread[main,5,main] - getFooToBar
In thread Thread[pool-1-thread-1,5,main] - Supplying Foo
In thread Thread[pool-2-thread-1,5,main] - fooToBar
In thread Thread[main,5,main] - Completed
Remember: the tasks handed to the executors can start immediately in another thread (in this example, getFoo was called before we even got to setting up getFooToBar).
Code follows:
package your.test;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;
import java.util.function.Supplier;
public class TestCompletableFuture {
private static void dumpWhichThread(final String msg) {
System.err.println("In thread " + Thread.currentThread().toString() + " - " + msg);
}
private static final class Foo {
final int i;
Foo(int i) {
this.i = i;
}
}
public static Supplier<Foo> getFoo() {
dumpWhichThread("getFoo");
return new Supplier<Foo>() {
@Override
public Foo get() {
dumpWhichThread("Supplying Foo");
return new Foo(10);
}
};
}
private static final class Bar {
final String j;
public Bar(final String j) {
this.j = j;
}
}
public static Function<Foo, Bar> getFooToBar() {
dumpWhichThread("getFooToBar");
return new Function<Foo, Bar>() {
@Override
public Bar apply(Foo t) {
dumpWhichThread("fooToBar");
return new Bar("" + t.i);
}
};
}
public static void main(final String args[]) throws InterruptedException, ExecutionException, TimeoutException {
final TestCompletableFuture obj = new TestCompletableFuture();
// No arguments: one shared executor; any argument: two separate executors
obj.running(args.length == 0);
}
private String running(final boolean sameExecutor) throws InterruptedException, ExecutionException, TimeoutException {
final ExecutorService executor1 = Executors.newSingleThreadExecutor();
final ExecutorService executor2 = sameExecutor ? executor1 : Executors.newSingleThreadExecutor();
try {
CompletableFuture<Foo> futureFoo = CompletableFuture.supplyAsync(getFoo(), executor1);
CompletableFuture<Bar> futureBar = futureFoo.thenApplyAsync(getFooToBar(), executor2);
// Try putting a complete here before the get ..
return futureBar.get(50, TimeUnit.SECONDS).j;
}
finally {
dumpWhichThread("Completed");
// Shut the executors down; otherwise their non-daemon threads keep the JVM alive
executor1.shutdown();
executor2.shutdown();
}
}
}
Which thread triggers the Bar stage to progress? In the above, it's executor1. In general, the thread completing the future (i.e. giving it a value) is what releases the stages depending on it. If you completed futureFoo immediately on the main thread, the main thread would be the one triggering it.
So you have to be careful with this. If you have N things all waiting on future results but use only a single-threaded executor, then the first one scheduled will block that executor until it completes. You can extrapolate to M threads and N futures: it can decay into M blocked threads that prevent the rest of the work from progressing.
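The blocking hazard described above can be reproduced in a few lines. This sketch (class name and the 200 ms timeout are illustrative) runs a task on a single-threaded executor that schedules a dependent computation on the same executor and then waits for it. The only thread is busy waiting, so the nested task cannot start and the wait times out; with an unbounded get() it would hang forever.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class StarvationDemo {

    // Returns true if the outer task timed out waiting for work queued behind it.
    public static boolean nestedWaitTimesOut() throws InterruptedException {
        ExecutorService single = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> outer = single.submit(() -> {
                // Queued behind the task that currently occupies the only thread.
                CompletableFuture<Integer> inner = CompletableFuture.supplyAsync(() -> 42, single);
                return inner.get(200, TimeUnit.MILLISECONDS);
            });
            try {
                outer.get();
                return false;
            } catch (ExecutionException e) {
                return e.getCause() instanceof TimeoutException;
            }
        } finally {
            single.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("timed out: " + nestedWaitTimesOut()); // true
    }
}
```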

Java 8 Concurrency Simplest Canonical Form for Basic Task

I have two questions:
1. What is the simplest canonical form for running a Callable as a task in Java 8, capturing and processing the result?
2. In the example below, what is the best/simplest/clearest way to hold the main process open until all the tasks have completed?
Here's the example I have so far -- is this the best approach in Java 8 or is there something more basic?
import java.util.*;
import java.util.concurrent.*;
import java.util.function.*;
public class SimpleTask implements Supplier<String> {
private SplittableRandom rand = new SplittableRandom();
final int id;
SimpleTask(int id) { this.id = id; }
@Override
public String get() {
try {
TimeUnit.MILLISECONDS.sleep(rand.nextInt(50, 300));
} catch(InterruptedException e) {
System.err.println("Interrupted");
}
return "Completed " + id + " on " +
Thread.currentThread().getName();
}
public static void main(String[] args) throws Exception {
for(int i = 0; i < 10; i++)
CompletableFuture.supplyAsync(new SimpleTask(i))
.thenAccept(System.out::println);
System.in.read(); // Or else program ends too soon
}
}
Is there a simpler and clearer Java-8 way to do this? And how do I eliminate the System.in.read() in favor of a better approach?

The canonical way to wait for the completion of multiple CompletableFuture instances is to create a new one depending on all of them via CompletableFuture.allOf. You can use this new future to wait for its completion or schedule new follow-up actions, just like with any other CompletableFuture:
CompletableFuture.allOf(
IntStream.range(0,10).mapToObj(SimpleTask::new)
.map(s -> CompletableFuture.supplyAsync(s).thenAccept(System.out::println))
.toArray(CompletableFuture<?>[]::new)
).join();
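If the results themselves are needed rather than just printed, one common pattern, sketched here with an illustrative class name and plain strings instead of SimpleTask, is to keep the list of futures, wait on allOf, and then read each result with join(), which no longer blocks at that point.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class AllOfResults {

    public static List<String> runAll(int n) {
        List<CompletableFuture<String>> futures = IntStream.range(0, n)
                .mapToObj(id -> CompletableFuture.supplyAsync(() -> "Completed " + id))
                .collect(Collectors.toList());
        // allOf completes once every input future has completed; its own value is Void.
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> futures.stream()
                        .map(CompletableFuture::join) // all done, so join returns immediately
                        .collect(Collectors.toList()))
                .join();
    }

    public static void main(String[] args) {
        System.out.println(runAll(3)); // [Completed 0, Completed 1, Completed 2]
    }
}
```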
Of course, it always gets simpler if you forego assigning a unique id to each task. Since your first question was about Callable, I’ll demonstrate how you can easily submit multiple similar tasks as Callables via an ExecutorService:
ExecutorService pool = Executors.newCachedThreadPool();
pool.invokeAll(Collections.nCopies(10, () -> {
LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(
ThreadLocalRandom.current().nextInt(50, 300)));
final String s = "Completed on "+Thread.currentThread().getName();
System.out.println(s);
return s;
}));
pool.shutdown();
The executor service returned by Executors.newCachedThreadPool() is unshared and won’t stay alive, even if you forget to invoke shutdown(), but it can take up to one minute before all threads are terminated then.
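To also capture and process the results, note that invokeAll returns a List<Future<T>> in the same order as the submitted tasks and only returns once all of them are done. A sketch relying on those documented semantics (the class name and task bodies are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class InvokeAllResults {

    public static List<String> runAll(int n) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            List<Callable<String>> tasks = IntStream.range(0, n)
                    .<Callable<String>>mapToObj(id -> () -> "Completed " + id)
                    .collect(Collectors.toList());
            // Blocks until every task has completed (normally or exceptionally).
            List<Future<String>> futures = pool.invokeAll(tasks);
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // already done; a task's failure surfaces as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(3)); // [Completed 0, Completed 1, Completed 2]
    }
}
```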
Since your first question literally was: “What is the simplest canonical form for running a Callable as a task in Java 8, capturing and processing the result?”, the answer might be that the simplest form is still invoking its call() method directly, e.g.
Callable<String> c = () -> {
LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(
ThreadLocalRandom.current().nextInt(50, 300)));
return "Completed on "+Thread.currentThread().getName();
};
String result = c.call();
System.out.println(result);
There’s no simpler way…

Consider collecting the futures into a list. Then you can use join() on each future to await their completion in the current thread:
List<CompletableFuture<Void>> futures = IntStream.range(0,10)
.mapToObj(id -> supplyAsync(new SimpleTask(id)).thenAccept(System.out::println))
.collect(toList());
futures.forEach(CompletableFuture::join);

Custom thread pool in Java 8 parallel stream

Is it possible to specify a custom thread pool for Java 8 parallel stream? I can not find it anywhere.
Imagine that I have a server application and I would like to use parallel streams. But the application is large and multi-threaded, so I want to compartmentalize it. I do not want a slow-running task in one module of the application to block tasks from another module.
If I cannot use different thread pools for different modules, it means I cannot safely use parallel streams in most real-world situations.
Try the following example. There are some CPU intensive tasks executed in separate threads.
The tasks leverage parallel streams. The first task is broken, so each step takes 1 second (simulated by thread sleep). The issue is that other threads get stuck and wait for the broken task to finish. This is a contrived example, but imagine a servlet app and someone submitting a long-running task to the shared fork-join pool.
public class ParallelTest {
public static void main(String[] args) throws InterruptedException {
ExecutorService es = Executors.newCachedThreadPool();
es.execute(() -> runTask(1000)); //incorrect task
es.execute(() -> runTask(0));
es.execute(() -> runTask(0));
es.execute(() -> runTask(0));
es.execute(() -> runTask(0));
es.execute(() -> runTask(0));
es.shutdown();
es.awaitTermination(60, TimeUnit.SECONDS);
}
private static void runTask(int delay) {
range(1, 1_000_000).parallel().filter(ParallelTest::isPrime).peek(i -> Utils.sleep(delay)).max()
.ifPresent(max -> System.out.println(Thread.currentThread() + " " + max));
}
public static boolean isPrime(long n) {
return n > 1 && rangeClosed(2, (long) sqrt(n)).noneMatch(divisor -> n % divisor == 0);
}
}

There actually is a trick for executing a parallel operation in a specific fork-join pool. If you execute it as a task in a fork-join pool, it stays there and does not use the common one.
final int parallelism = 4;
ForkJoinPool forkJoinPool = null;
try {
forkJoinPool = new ForkJoinPool(parallelism);
final List<Integer> primes = forkJoinPool.submit(() ->
// Parallel task here, for example
IntStream.range(1, 1_000_000).parallel()
.filter(PrimesPrint::isPrime)
.boxed().collect(Collectors.toList())
).get();
System.out.println(primes);
} catch (InterruptedException | ExecutionException e) {
throw new RuntimeException(e);
} finally {
if (forkJoinPool != null) {
forkJoinPool.shutdown();
}
}
The trick is based on ForkJoinTask.fork which specifies: "Arranges to asynchronously execute this task in the pool the current task is running in, if applicable, or using the ForkJoinPool.commonPool() if not inForkJoinPool()"
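To check where the work actually runs, the trick can be combined with recording thread names. This is a sketch: CustomPoolPrimes, its isPrime helper and the range are illustrative, and it relies on the implementation behavior discussed in this thread rather than on a stream-level guarantee. On current JDKs, the recorded names should all belong to the custom pool (ForkJoinPool-N-worker-M), not to the common pool.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CustomPoolPrimes {

    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Counts primes below 'limit' with a parallel stream started from inside a custom pool,
    // recording every thread that evaluated the filter.
    public static long countPrimes(int parallelism, int limit, Set<String> threadNames)
            throws InterruptedException, ExecutionException {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(() -> IntStream.range(1, limit).parallel()
                    .filter(n -> {
                        threadNames.add(Thread.currentThread().getName());
                        return isPrime(n);
                    })
                    .count()).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Set<String> names = ConcurrentHashMap.newKeySet();
        System.out.println(countPrimes(4, 100_000, names)); // 9592
        System.out.println(names);
    }
}
```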

The parallel streams use the default ForkJoinPool.commonPool, which by default has one thread fewer than you have processors, as returned by Runtime.getRuntime().availableProcessors() (this means that parallel streams leave one processor for the calling thread).
For applications that require separate or custom pools, a ForkJoinPool may be constructed with a given target parallelism level; by default, equal to the number of available processors.
This also means if you have nested parallel streams or multiple parallel streams started concurrently, they will all share the same pool. Advantage: you will never use more than the default (number of available processors). Disadvantage: you may not get "all the processors" assigned to each parallel stream you initiate (if you happen to have more than one). (Apparently you can use a ManagedBlocker to circumvent that.)
To change the way parallel streams are executed, you can either
submit the parallel stream execution to your own ForkJoinPool: yourFJP.submit(() -> stream.parallel().forEach(doSomething)).get(); or
you can change the size of the common pool using system properties: System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "20") for a target parallelism of 20 threads.
Example of the latter on my machine which has 8 processors. If I run the following program:
long start = System.currentTimeMillis();
IntStream s = IntStream.range(0, 20);
//System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "20");
s.parallel().forEach(i -> {
try { Thread.sleep(100); } catch (Exception ignore) {}
System.out.print((System.currentTimeMillis() - start) + " ");
});
The output is:
215 216 216 216 216 216 216 216 315 316 316 316 316 316 316 316 415 416 416 416
So you can see that the parallel stream processes 8 items at a time, i.e. it uses 8 threads. However, if I uncomment the commented line, the output is:
215 215 215 215 215 216 216 216 216 216 216 216 216 216 216 216 216 216 216 216
This time, the parallel stream has used 20 threads and all 20 elements in the stream have been processed concurrently.

As an alternative to the trick of triggering the parallel computation inside your own forkJoinPool, you can also pass that pool to the CompletableFuture.supplyAsync method, as in:
ForkJoinPool forkJoinPool = new ForkJoinPool(2);
CompletableFuture<List<Integer>> primes = CompletableFuture.supplyAsync(() ->
//parallel task here, for example
range(1, 1_000_000).parallel().filter(PrimesPrint::isPrime).collect(toList()),
forkJoinPool
);

The original solution (setting the ForkJoinPool common parallelism property) no longer works. Looking at the links in the original answer, an update which breaks this has been backported to Java 8. As mentioned in the linked threads, this solution was not guaranteed to work forever. Given that, the solution is the forkJoinPool.submit(...).get() approach discussed in the accepted answer. I think the backport also fixes the unreliability of that approach.
ForkJoinPool fjpool = new ForkJoinPool(10);
System.out.println("stream.parallel");
IntStream range = IntStream.range(0, 20);
fjpool.submit(() -> range.parallel()
.forEach((int theInt) ->
{
try { Thread.sleep(100); } catch (Exception ignore) {}
System.out.println(Thread.currentThread().getName() + " -- " + theInt);
})).get();
System.out.println("list.parallelStream");
int [] array = IntStream.range(0, 20).toArray();
List<Integer> list = new ArrayList<>();
for (int theInt: array)
{
list.add(theInt);
}
fjpool.submit(() -> list.parallelStream()
.forEach((theInt) ->
{
try { Thread.sleep(100); } catch (Exception ignore) {}
System.out.println(Thread.currentThread().getName() + " -- " + theInt);
})).get();

We can change the default parallelism using the following property:
-Djava.util.concurrent.ForkJoinPool.common.parallelism=16
which allows the common pool to use more parallelism.

To measure the actual number of used threads, you can check Thread.activeCount():
Runnable r = () -> IntStream
.range(-42, +42)
.parallel()
.map(i -> Thread.activeCount())
.max()
.ifPresent(System.out::println);
ForkJoinPool.commonPool().submit(r).join();
new ForkJoinPool(42).submit(r).join();
On a 4-core CPU, this can produce output like:
5 // common pool
23 // custom pool
Without .parallel() it gives:
3 // common pool
4 // custom pool

Until now, I used the solutions described in the answers of this question. Now, I came up with a little library called Parallel Stream Support for that:
ForkJoinPool pool = new ForkJoinPool(NR_OF_THREADS);
ParallelIntStreamSupport.range(1, 1_000_000, pool)
.filter(PrimesPrint::isPrime)
.collect(toList())
But as @PabloMatiasGomez pointed out in the comments, there are drawbacks regarding the splitting mechanism of parallel streams, which depends heavily on the size of the common pool. See Parallel stream from a HashSet doesn't run in parallel.
I am using this solution only to have separate pools for different types of work but I can not set the size of the common pool to 1 even if I don't use it.

Note:
There appears to be a fix implemented in JDK 10 that ensures the Custom Thread Pool uses the expected number of threads.
Parallel stream execution within a custom ForkJoinPool should obey the parallelism
https://bugs.openjdk.java.net/browse/JDK-8190974

If you don't want to rely on implementation hacks, there's always a way to achieve the same by implementing custom collectors that will combine map and collect semantics... and you wouldn't be limited to ForkJoinPool:
list.stream()
.collect(parallel(i -> process(i), executor, 4))
.join()
Luckily, it's done already here and available on Maven Central:
http://github.com/pivovarit/parallel-collectors
Disclaimer: I wrote it and take responsibility for it.
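For readers who prefer not to add a dependency, the underlying idea (mapping on an arbitrary executor with a bounded number of threads, then collecting in input order) can be approximated with the JDK alone. This is a hypothetical helper, mapInParallel, not the parallel-collectors API, and it creates one fixed thread pool per call for simplicity:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ParallelMap {

    // Hypothetical helper: applies fn to every element on at most maxThreads threads,
    // preserving the input order in the result.
    public static <T, R> List<R> mapInParallel(List<T> input, Function<? super T, ? extends R> fn, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            // collect() forces every task to be submitted before we start waiting.
            List<CompletableFuture<R>> futures = input.stream()
                    .map(t -> CompletableFuture.<R>supplyAsync(() -> fn.apply(t), pool))
                    .collect(Collectors.toList());
            return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(mapInParallel(Arrays.asList(1, 2, 3, 4, 5), x -> x * x, 2)); // [1, 4, 9, 16, 25]
    }
}
```

Unlike a parallel stream, the executor here can be any ExecutorService, which is the point the answer makes about not being limited to ForkJoinPool.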

Get abacus-common. The number of threads can be specified for a parallel stream. Here is the sample code:
LongStream.range(4, 1_000_000).parallel(threadNum)...
Disclosure: I'm the developer of abacus-common.

If you don't need a custom ThreadPool but you rather want to limit the number of concurrent tasks, you can use:
List<Path> paths = List.of("/path/file1.csv", "/path/file2.csv", "/path/file3.csv").stream().map(e -> Paths.get(e)).collect(toList());
List<List<Path>> partitions = Lists.partition(paths, 4); // Guava method
partitions.forEach(group -> group.parallelStream().forEach(csvFilePath -> {
// do your processing
}));
(The duplicate question asking for this is locked, so please bear with me here.)
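If Guava is not available, the partitioning step can be written by hand. The helper below is a hypothetical stand-in for Lists.partition that returns sublist views of consecutive chunks (the last chunk may be shorter); as in the answer, each chunk is then processed with a parallel stream before the next chunk starts, so at most one chunk's worth of elements is in flight at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Partitions {

    // Hypothetical replacement for Guava's Lists.partition: consecutive sublist views of size 'size'.
    public static <T> List<List<T>> partition(List<T> list, int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < list.size(); from += size) {
            chunks.add(list.subList(from, Math.min(from + size, list.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3, 4, 5);
        // Elements within a chunk run in parallel; the chunks themselves run one after another.
        partition(items, 2).forEach(chunk -> chunk.parallelStream()
                .forEach(i -> System.out.println(i + " on " + Thread.currentThread().getName())));
        System.out.println(partition(items, 2)); // [[1, 2], [3, 4], [5]]
    }
}
```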

Here is how I set the max thread count flag mentioned above programmatically, and a code snippet to verify that the parameter is honored:
System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "2");
Set<String> threadNames = Stream.iterate(0, n -> n + 1)
.parallel()
.limit(100000)
.map(i -> Thread.currentThread().getName())
.collect(Collectors.toSet());
System.out.println(threadNames);
// Output -> [ForkJoinPool.commonPool-worker-1, Test worker, ForkJoinPool.commonPool-worker-3]

If you don't mind using a third-party library, with cyclops-react you can mix sequential and parallel Streams within the same pipeline and provide custom ForkJoinPools. For example
ReactiveSeq.range(1, 1_000_000)
.foldParallel(new ForkJoinPool(10),
s->s.filter(i->true)
.peek(i->System.out.println("Thread " + Thread.currentThread().getId()))
.max(Comparator.naturalOrder()));
Or if we wished to continue processing within a sequential Stream
ReactiveSeq.range(1, 1_000_000)
.parallel(new ForkJoinPool(10),
s->s.filter(i->true)
.peek(i->System.out.println("Thread " + Thread.currentThread().getId())))
.map(this::processSequentially)
.forEach(System.out::println);
[Disclosure I am the lead developer of cyclops-react]

I tried the custom ForkJoinPool as follows to adjust the pool size:
private static Set<String> ThreadNameSet = ConcurrentHashMap.newKeySet(); // thread-safe: pool workers add to it concurrently
private static Callable<Long> getSum() {
List<Long> aList = LongStream.rangeClosed(0, 10_000_000).boxed().collect(Collectors.toList());
return () -> aList.parallelStream()
.peek((i) -> {
String threadName = Thread.currentThread().getName();
ThreadNameSet.add(threadName);
})
.reduce(0L, Long::sum);
}
private static void testForkJoinPool() {
final int parallelism = 10;
ForkJoinPool forkJoinPool = null;
Long result = 0L;
try {
forkJoinPool = new ForkJoinPool(parallelism);
result = forkJoinPool.submit(getSum()).get(); //this makes it an overall blocking call
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
} finally {
if (forkJoinPool != null) {
forkJoinPool.shutdown(); //always remember to shutdown the pool
}
}
out.println(result);
out.println(ThreadNameSet);
}
Here is the output saying the pool is using more threads than the default 4.
50000005000000
[ForkJoinPool-1-worker-8, ForkJoinPool-1-worker-9, ForkJoinPool-1-worker-6, ForkJoinPool-1-worker-11, ForkJoinPool-1-worker-10, ForkJoinPool-1-worker-1, ForkJoinPool-1-worker-15, ForkJoinPool-1-worker-13, ForkJoinPool-1-worker-4, ForkJoinPool-1-worker-2]
But there is an oddity: when I tried to achieve the same result using ThreadPoolExecutor as follows:
BlockingDeque<Runnable> blockingDeque = new LinkedBlockingDeque<>(1000);
ThreadPoolExecutor fixedSizePool = new ThreadPoolExecutor(10, 20, 60, TimeUnit.SECONDS, blockingDeque, new MyThreadFactory("my-thread"));
but I failed.
It will only start the parallelStream in a new thread and then everything else is just the same, which again proves that the parallelStream will use the ForkJoinPool to start its child threads.
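A minimal sketch to make the difference visible, assuming a plain Executors.newFixedThreadPool in place of the custom MyThreadFactory (the class and method names here are hypothetical). It records every thread that touches a stream element. Per the ForkJoinTask.fork() documentation, subtasks are forked into the pool the current task runs in, or into ForkJoinPool.commonPool() when the caller is not a ForkJoinPool thread, so the ThreadPoolExecutor run should show no custom-pool workers at all.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class WhichPoolDemo {

    // Runs a parallel stream and records the name of every thread that processed an element.
    static Set<String> threadsUsedByParallelStream() {
        Set<String> names = ConcurrentHashMap.newKeySet(); // safe for concurrent adds
        IntStream.range(0, 100_000).parallel()
                .peek(i -> names.add(Thread.currentThread().getName()))
                .sum(); // terminal op that visits every element
        return names;
    }

    // Starts the stream from a ThreadPoolExecutor thread: forks go to the common pool.
    static Set<String> fromThreadPoolExecutor() throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(10);
        try {
            return executor.submit(WhichPoolDemo::threadsUsedByParallelStream).get();
        } finally {
            executor.shutdown();
        }
    }

    // Starts the stream from a worker of a dedicated ForkJoinPool: forks stay in that pool.
    static Set<String> fromCustomForkJoinPool() throws Exception {
        ForkJoinPool pool = new ForkJoinPool(10);
        try {
            return pool.submit(WhichPoolDemo::threadsUsedByParallelStream).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("ThreadPoolExecutor: " + fromThreadPoolExecutor());
        System.out.println("ForkJoinPool:       " + fromCustomForkJoinPool());
    }
}
```

With default thread factories, the first line typically lists a "pool-N-thread-M" thread plus "ForkJoinPool.commonPool-worker-..." threads, while the second lists "ForkJoinPool-N-worker-..." threads.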

I made a utility method that runs a task in parallel, with an argument that defines the maximum number of threads.
public static void runParallel(final int maxThreads, Runnable task) throws RuntimeException {
ForkJoinPool forkJoinPool = null;
try {
forkJoinPool = new ForkJoinPool(maxThreads);
forkJoinPool.submit(task).get();
} catch (InterruptedException | ExecutionException e) {
throw new RuntimeException(e);
} finally {
if (forkJoinPool != null) {
forkJoinPool.shutdown();
}
}
}
It creates a ForkJoinPool with the maximum number of allowed threads and shuts it down after the task completes (or fails).
Usage is as follows:
final int maxThreads = 4;
runParallel(maxThreads, () ->
IntStream.range(1, 1_000_000).parallel()
.filter(PrimesPrint::isPrime)
.boxed().collect(Collectors.toList()));
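Since the usage example computes a list of primes but runParallel takes a Runnable, the result is discarded. A variant that hands the result back is sketched below; the names callParallel and isPrime are hypothetical, and isPrime is a simple trial-division stand-in for the poster's PrimesPrint::isPrime. It also restores the caller's interrupt flag when interrupted, which is the usual idiom when wrapping InterruptedException.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelRunner {

    // Hypothetical helper: like runParallel, but returns the task's result.
    public static <T> T callParallel(int maxThreads, Callable<T> task) {
        ForkJoinPool forkJoinPool = new ForkJoinPool(maxThreads);
        try {
            // submit(Callable) returns a ForkJoinTask<T>; get() blocks until it finishes
            return forkJoinPool.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the caller's interrupt status
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause()); // unwrap the task's own exception
        } finally {
            forkJoinPool.shutdown();
        }
    }

    // Trial-division primality test standing in for PrimesPrint::isPrime.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> primes = callParallel(4, () ->
                IntStream.range(1, 100).parallel()
                        .filter(ParallelRunner::isPrime)
                        .boxed()
                        .collect(Collectors.toList()));
        System.out.println(primes.size() + " primes below 100"); // prints "25 primes below 100"
    }
}
```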

The (currently) accepted answer is partly wrong. It is not sufficient to just submit() the parallel stream to the dedicated fork-join pool. In that case, the stream uses that pool's threads, but additionally the common fork-join pool and even the calling thread to handle the stream's workload, seemingly up to the size of the common fork-join pool. The behaviour is a bit odd, but definitely not what is required.
To actually restrict the work completely to the dedicated pool, you must wrap it in a CompletableFuture:
final int parallelism = 4;
ForkJoinPool forkJoinPool = null;
try {
forkJoinPool = new ForkJoinPool(parallelism);
final List<Integer> primes = CompletableFuture.supplyAsync(() ->
// Parallel task here, for example
IntStream.range(1, 1_000_000).parallel()
.filter(PrimesPrint::isPrime)
.boxed().collect(Collectors.toList()),
forkJoinPool) // <- passes dedicated fork-join pool as executor
.join(); // <- Wait for result from forkJoinPool
System.out.println(primes);
} finally {
if (forkJoinPool != null) {
forkJoinPool.shutdown();
}
}
With this code, all operations stay in forkJoinPool on both Java 8u352 and Java 17.0.1.
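To check this on your own JDK, here is a small sketch (class, field, and method names are hypothetical) that records which threads touch stream elements when the stream is started through CompletableFuture.supplyAsync with a dedicated pool, then reports whether common-pool workers or the calling thread showed up. Because the behaviour described above may differ between JDK versions, it prints the findings rather than asserting them.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class DedicatedPoolCheck {

    // Names of all threads that processed an element (thread-safe set).
    static final Set<String> THREADS = ConcurrentHashMap.newKeySet();

    static long sumInPool(int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return CompletableFuture.supplyAsync(() ->
                    LongStream.rangeClosed(1, 1_000_000).parallel()
                            .peek(i -> THREADS.add(Thread.currentThread().getName()))
                            .sum(),
                    pool) // dedicated pool passed as the executor
                    .join(); // wait for the result in the calling thread
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long sum = sumInPool(4);
        String caller = Thread.currentThread().getName();
        System.out.println("sum = " + sum);
        System.out.println("threads = " + THREADS);
        System.out.println("common pool used: "
                + THREADS.stream().anyMatch(n -> n.startsWith("ForkJoinPool.commonPool")));
        System.out.println("calling thread used: " + THREADS.contains(caller));
    }
}
```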

How to transfer values between multiple threads in java

In Java:
r1=complexCalc1();
r2=complexCalc2();
r3=complexCalc3();
r4=complexCalc4();
r5=complexCalc5();
return r1+r2+r3+r4+r5;
Assume running times of:
complexCalc1() -> 5 mins
complexCalc2() -> 3 mins
complexCalc3() -> 2 mins
complexCalc4() -> 4 mins
complexCalc5() -> 9 mins
If this program ran sequentially, it would take 23 minutes to calculate r1+r2+r3+r4+r5. If each function ran in parallel, i.e. each complexCalc() function in a separate thread, the total time taken would be 9 minutes (the duration of the slowest call).
My question is how to achieve this. I tried several methods, but I still can't figure out anything concrete.
Thanks in advance.

A rough draft of the solution, using only standard Java API, looks like this:
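import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class Main {
// Simulates a long calculation: sleeps for the given number of minutes, then returns a fixed result
private static Callable<Integer> createCalculationSimulator(final int result, final int minutesToWait) {
return new Callable<Integer>() {
@Override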
public Integer call() throws Exception {
Thread.sleep(minutesToWait*60*1000L);
return result;
}
};
}
public static void main(String[] args) throws Exception {
final ExecutorService executorService = Executors.newFixedThreadPool(5); // one thread per task, so all five run at the same time
final long startTime = System.currentTimeMillis();
final List<Future<Integer>> results = executorService.invokeAll(
Arrays.asList(
createCalculationSimulator(1, 5),
createCalculationSimulator(2, 3),
createCalculationSimulator(3, 2),
createCalculationSimulator(4, 4),
createCalculationSimulator(5, 9)));
int resultSum = 0;
for (final Future<Integer> result : results) {
resultSum += result.get();
}
final long endTime = System.currentTimeMillis();
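System.out.println("The end result is " + resultSum + ". Time needed = " + (endTime - startTime)/1000 + " seconds.");
executorService.shutdown(); // the pool's threads are non-daemon, so without this the JVM keeps running
}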
}

If you can divide the task into logically independent tasks (which I believe you can, as you already indicated), then it is fairly easy with Java 5+:
1. Implement each task in its own Callable.
2. Submit all of them to the executor with ExecutorService.invokeAll(...).
3. That call returns a List of Futures, which you keep; make sure all of them have completed (see the API).
Note:
- Initialize the thread pool size to the number of cores (of course, tune it after you profile).
- If you can have an external dependency, I suggest the Guava library, which simplifies the use of Executors.
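The same values can also be handed back from the worker threads with CompletableFuture (Java 8+). A minimal sketch, assuming the durations are scaled down from minutes to milliseconds and that simulatedCalc (a hypothetical name) stands in for the complexCalcN() methods: each value is computed on a pool thread and passed back to the calling thread through join(), so the total time is roughly that of the slowest task. Because these simulated tasks only sleep, the pool is sized to the number of tasks rather than to the number of cores.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

public class ParallelSum {

    // Hypothetical stand-in for complexCalcN(): sleeps, then returns a value.
    static Supplier<Integer> simulatedCalc(int result, long millisToWait) {
        return () -> {
            try {
                Thread.sleep(millisToWait);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return result;
        };
    }

    static int sumInParallel(ExecutorService executor) {
        // All five tasks start immediately on the executor's threads.
        List<CompletableFuture<Integer>> futures = Arrays.asList(
                CompletableFuture.supplyAsync(simulatedCalc(1, 500), executor),
                CompletableFuture.supplyAsync(simulatedCalc(2, 300), executor),
                CompletableFuture.supplyAsync(simulatedCalc(3, 200), executor),
                CompletableFuture.supplyAsync(simulatedCalc(4, 400), executor),
                CompletableFuture.supplyAsync(simulatedCalc(5, 900), executor));
        // join() waits for each value; the overall wait is about that of the slowest task
        return futures.stream().mapToInt(CompletableFuture::join).sum();
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(5);
        try {
            long start = System.currentTimeMillis();
            int sum = sumInParallel(executor);
            System.out.println("sum = " + sum + ", took ~"
                    + (System.currentTimeMillis() - start) + " ms");
        } finally {
            executor.shutdown();
        }
    }
}
```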
