CompletableFuture single task that continues with many parallel tasks - java

I have the following code:
return CompletableFuture.supplyAsync(() -> {
return foo; // some custom object
})
.thenAccept(foo -> {
// ??? need to spawn N async parallel jobs that works on 'foo'
});
In english: the first task creates the foo object asynchronously; and then I need to run N parallel processes on it.
Is there a better way to do this then:
...
CompletableFuture[] parallel = new CompletableFuture[N];
for (int i = 0; i < N; i++) {
parallel[i] = CompletableFuture.runAsync(() -> {
work(foo);
});
}
CompletableFuture.allOf(parallel).join();
...
I don't like this as one thread gets locked while waiting N jobs to finish.

You can chain as many independent jobs as you like on a particular prerequisite job, e.g.
CompletableFuture<Foo> base=CompletableFuture.supplyAsync(() -> new Foo());
Collections.nCopies(N, base).forEach(f -> f.thenAcceptAsync(foo -> work(foo)));
will spawn N parallel jobs, invoking work(foo) concurrently, after the completion of the initial job which provides the Foo instance.
But keep in mind, that the underlying framework will consider the number of available CPU cores to size the thread pool actually executing the parallel jobs, so if N > #cores, some of these jobs may run one after another.
If the work is I/O bound, thus, you want to have a higher number of parallel threads, you have to specify your own executor.
The nCopies/forEach is not necessary, a for loop would do as well, but it provides a hint of how to handle subsequent jobs, that depend on the completion of all these parallel jobs:
CompletableFuture<Foo> base=CompletableFuture.supplyAsync(() -> new Foo());
CompletableFuture<Void> all = CompletableFuture.allOf(
Collections.nCopies(N, base).stream()
.map(f -> f.thenAcceptAsync(foo -> work(foo)))
.toArray(CompletableFuture<?>[]::new));
Now you can use all to check for the completion of all jobs or chain additional actions.

Since CompletableFuture.allOf already returns another CompletableFuture<Void>a you can just do another .thenAccept on it and extract the returned values from the CFs in parallel in the callback, that way you avoid calling join

Related

Java CompletableFuture allOf approach

I am trying to run 3 operations in parallel using the CompletableFuture approach. Now these 3 operations return different types so need to retrieve the data separately. Here is what i am trying to do:
CompletableFuture<List<A>> aFuture = CompletableFuture.supplyAsync (() -> getAList());
CompletableFuture<Map<String,B> bFuture = CompletableFuture.supplyAsync (() -> getBMap());
CompletableFuture<Map<String,C> cFuture = CompletableFuture.supplyAsync (() -> getCMap());
CompletableFuture<Void> combinedFuture =
CompletableFuture.allOf (aFuture, bFuture, cFuture);
combinedFuture.get(); (or join())
List<A> aData = aFuture.get(); (or join)
Map<String, C> bData = bFuture.get(); (or join)
Map<String, C> cData = cFuture.get(); (or join)
This does the job and works but i am trying to understand if we need to do these gets/joins on combined future as well as individual ones and if there is a better way to do this.
Also i tried using then whenComplete() approach but then the variables i want to assign the returned data are inside the method so i am getting a "The final local variable cannot be assigned, since it is defined in an enclosing type in Java" error and i don't want to move them to the class level.
looking for some expert/alternate opinions. Thank you in advance
SG
Calling get or join just implies “wait for the completion”. It has no influence of the completion itself.
When you call CompletableFuture.supplyAsync(() -> getAList()), this method will submit the evaluation of the supplier to the common pool immediately. The caller’s only influence on the execution of getAList() is the fact that the JVM will terminate if there are no non-daemon threads running. This is a common error in simple test programs, incorporating a main method that doesn’t wait for completion. Otherwise, the execution of getAList() will complete, regardless of whether its result will ever be queried.
So when you use
CompletableFuture<List<A>> aFuture = CompletableFuture.supplyAsync(() -> getAList());
CompletableFuture<Map<String,B>> bFuture=CompletableFuture.supplyAsync(() -> getBMap());
CompletableFuture<Map<String,C>> cFuture=CompletableFuture.supplyAsync(() -> getCMap());
List<A> aData = aFuture.join();
Map<String, B> bData = bFuture.join();
Map<String, C> cData = cFuture.join();
The three subsequent supplyAsync calls ensure that the three operations might run concurrently. The three join() calls only wait for the result and when the third join() returned, you know that all three operations are completed. It’s possible that the first join() returns at a time when aFuture has been completed, but either or both of the other operations are still running, but that doesn’t matter for three independent operations.
When you execute CompletableFuture.allOf(aFuture, bFuture, cFuture).join(); before the individual join() calls, it ensures that all three operations completed before the first individual join() call, but as said, it has no impact when all three operations are independent and you’re not relying on some side effect of their execution (which you shouldn’t in general).
The actual purpose of allOf is to construct a new future when you do not want to wait for the result immediately. E.g.
record Result(List<A> aData, Map<String, B> bData, Map<String, C> cData) {}
CompletableFuture<Result> r = CompletableFuture.allOf(aFuture, bFuture, cFuture)
.thenApply(v -> new Result(aFuture.join(), bFuture.join(), cFuture.join()));
// return r or pass it to some other code...
here, the use of allOf is preferable to, e.g.
CompletableFuture<Result> r = CompletableFuture.supplyAsync(
() -> new Result(aFuture.join(), bFuture.join(), cFuture.join()));
because the latter might block a worker thread when join() is called from the supplier. The underlying framework might compensate when it detects this, e.g. start a new thread, but this is still an expensive operation. In contrast, the function chained to allOf is only evaluated after all futures completed, so all embedded join() calls are guaranteed to return immediately.
For a small number of futures, there’s still an alternative to allOf, e.g.
var r = aFuture.thenCompose(a ->
bFuture.thenCombine(cFuture, (b, c) -> new Result(a, b, c)));

Java Spring How to multithread operations and wait for completion

So I have a get operation where a user supplys a list of items.
For each item in the list I need to make an API call which is what takes the most time in my application.
Lets say the flow is List -> Operation A -> Operation B ( API Call ) -> Operation C
So instead of processing all the items in the list sequentially, I was thinking if I could create multiple threads. All the threads would have to finish Operation B before Operation C starts. Operations A and C don't need to be multithreaded.
Is there a straightforward way to do this?
You could try to use CompletableFuture for this, something along the lines of:
ExecutorService executorService = Executors.newFixedThreadPool(3);
CompletableFuture<?>[] all =
youList.stream()
.map(x -> CompletableFuture.supplyAsync(() -> operationA(x))
.thenApplyAsync(YourClass::operationB)
.thenApplyAsync(YouClass::operationC))
.toArray(x -> new CompletableFuture[0]);
CompletableFuture.allOf(all);

Thread vs CompletableFuture

What is the advantage of passing code directly to thread vs using CompletableFuture instead?
Thread thread = new Thread(() -> {do something});
thread.start();
VS
CompletableFuture<Void> cf1 =
CompletableFuture.runAsync(() -> {do something});
CompletableFuture.runAsync(...) runs the Runnable in the forkJoin-Pool which is managed, while new Thread() creates a new thread which you have to manage.
What does "is managed" mean, it's pre-allocated and the threads are shared in the JVM. When the runnable is completed, the thread can be reused for other runnables. This makes better usage of resources, especially as thread instantiation is an expensive operation - not only the object, but also some extra non-heap memory - the thread stack - has to be allocated.
#Gerald Mücke already mentioned the important difference:
CompletableFuture.runAsync(...) runs the Runnable in the forkJoin-Pool which is managed, while new Thread() creates a new thread which you have to manage.
CompletableFuture will use threads managed by a ThreadPool (default or customized).
However, I think the following two points are also should be considered.
First
CompletableFuture has so many easily understandable methods to chain different asynchronous computations together, making it much easier to introduce asynchrony than directly using Thread.
CompletableFuture[] futures = IntStream.rangeClosed(0, LEN).boxed()
.map(i -> CompletableFuture.supplyAsync(() -> runStage1(i), EXECUTOR_SERVICE))
.map(future -> future.thenCompose(i -> CompletableFuture.supplyAsync(() -> runStage2(i), EXECUTOR_SERVICE)))
.toArray(size -> new CompletableFuture[size]);
CompletableFuture.allOf(futures).join();
Second
You should never forget to handle exceptions; with CompletableFuture, you can directly handle them like this:
completableFuture.handle((s, e) -> e.toString()).join()
or take advantage of them this way to interrupt the computation:
completableFuture.completeExceptionally(new RuntimeException("Calculation failed!"));
while you will easily encounter some serious problems using Thread.
CompletableFuture is a promise which uses default ForkJoinPool (thread pool of size equal to number of CPU's) unless provided with another ThreadPool. A thread pool will contain n or more number of Thread.

Ordered execution of many CompletableFuture.allof() while staying non-blocking

I have this case where there is 10 or more tasks that are grouped into many groups. Inside these groups everything should run concurrently, but because each group needs the results of the previous group (with exception of the first group), I need to run them in an orderly fashion (Tasks inside a group don't need to run in order).
The tasks themselves are querying data from database then apply some transformation and save it back to database.
Task 1.1 // This group run first
Task 1.2
Task 2.1 // Waiting results from group 1
Task 2.2
Task 2.3
Task 3.1 // Waiting results from group 2
I was thinking to use list of allOf(), iterate it then explicitly call get() for each of that allOf(), but it will block which I don't want it to happen, so my question is, how to execute many allOf() in order? Is iteven possible to use only CompletableFuture here?
When you use allOf(), it returns a CompletableFuture that will complete only when all of the given completion stages are completed.
If you chain calls from the returned future, they are thus guaranteed that a call to get() on any of the completion stages passed to allOf() will never block (since they are already completed).
// First group
CompletableFuture<Integer> task11 = CompletableFuture.supplyAsync(() -> 1);
CompletableFuture<Integer> task12 = CompletableFuture.supplyAsync(() -> 42);
CompletableFuture<Integer> task13 = CompletableFuture.supplyAsync(() -> 1729);
// this one will complete after all tasks from the first group complete
CompletableFuture<Void> allFirstTasks = CompletableFuture.allOf(task11, task12, task13);
// Second group will be child tasks from the first group
CompletableFuture<Integer> task21 = allFirstTasks.thenApply(__ ->
task11.join() + task12.join() + task13.join() // will not block
);
Note: using join() instead of get() to avoid handling of checked exceptions.

How does CompletableFuture know that tasks are independent?

Imagine that we have the following dummy code:
CompletableFuture<BigInteger> cf1 = CompletableFuture.supplyAsync(() -> BigInteger.valueOf(2L));
CompletableFuture<BigInteger> cf2 = CompletableFuture.supplyAsync(() -> BigInteger.valueOf(3L));
cf1.thenCombine(cf2, (x, y) -> x.add(y)).thenAccept(System.out::println);
Does JVM know that cf1 and cf2 carry independent threads in this case? And what will change if threads will be dependent (for example, use one connection to database)?
More general, how does CompletableFuture synchronize threads?
A CompletableFuture has no relation to any thread. It is just a holder for a result retrieved asynchronously with methods to operate on that result.
The static supplyAsync and runAsync methods are just helper methods. The javadoc of supplyAsync states
Returns a new CompletableFuture that is asynchronously completed by a
task running in the ForkJoinPool.commonPool() with the value obtained
by calling the given Supplier.
This is more or less equivalent to
Supplier<R> sup = ...;
CompletableFuture<R> future = new CompletableFuture<R>();
ForkJoinPool.commonPool().submit(() -> {
try {
R result = sup.get();
future.complete(result);
} catch (Throwable e) {
future.completeExceptionally(e);
}
});
return future;
The CompletableFuture is returned, even allowing you to complete it before the task submitted to the pool.
More general, how does CompletableFuture synchronize threads?
It doesn't, since it doesn't know which threads are operating on it. This is further hinted at in the javadoc
Since (unlike FutureTask) this class has no direct control over the
computation that causes it to be completed, cancellation is treated as
just another form of exceptional completion. Method cancel has the
same effect as completeExceptionally(new CancellationException()).
Method isCompletedExceptionally() can be used to determine if a
CompletableFuture completed in any exceptional fashion.
CompletableFuture objects do not control processing.
I don't think that a CompletableFuture (CF) "synchronizes threads". It uses the executor you have provided or the common pool if you have not provided one.
When you call supplyAsync, the CF submits the various tasks to that pool which in turns manages the underlying threads to execute the tasks.
It doesn't know, nor does it try to synchronize anything. It is still the client's responsibility to properly synchronize access to mutable shared data.

Categories