Java CompletableFuture allOf approach - java

I am trying to run 3 operations in parallel using the CompletableFuture approach. Now these 3 operations return different types so need to retrieve the data separately. Here is what i am trying to do:
CompletableFuture<List<A>> aFuture = CompletableFuture.supplyAsync (() -> getAList());
CompletableFuture<Map<String,B> bFuture = CompletableFuture.supplyAsync (() -> getBMap());
CompletableFuture<Map<String,C> cFuture = CompletableFuture.supplyAsync (() -> getCMap());
CompletableFuture<Void> combinedFuture =
CompletableFuture.allOf (aFuture, bFuture, cFuture);
combinedFuture.get(); (or join())
List<A> aData = aFuture.get(); (or join)
Map<String, C> bData = bFuture.get(); (or join)
Map<String, C> cData = cFuture.get(); (or join)
This does the job and works but i am trying to understand if we need to do these gets/joins on combined future as well as individual ones and if there is a better way to do this.
Also i tried using then whenComplete() approach but then the variables i want to assign the returned data are inside the method so i am getting a "The final local variable cannot be assigned, since it is defined in an enclosing type in Java" error and i don't want to move them to the class level.
looking for some expert/alternate opinions. Thank you in advance
SG

Calling get or join just implies “wait for the completion”. It has no influence of the completion itself.
When you call CompletableFuture.supplyAsync(() -> getAList()), this method will submit the evaluation of the supplier to the common pool immediately. The caller’s only influence on the execution of getAList() is the fact that the JVM will terminate if there are no non-daemon threads running. This is a common error in simple test programs, incorporating a main method that doesn’t wait for completion. Otherwise, the execution of getAList() will complete, regardless of whether its result will ever be queried.
So when you use
CompletableFuture<List<A>> aFuture = CompletableFuture.supplyAsync(() -> getAList());
CompletableFuture<Map<String,B>> bFuture=CompletableFuture.supplyAsync(() -> getBMap());
CompletableFuture<Map<String,C>> cFuture=CompletableFuture.supplyAsync(() -> getCMap());
List<A> aData = aFuture.join();
Map<String, B> bData = bFuture.join();
Map<String, C> cData = cFuture.join();
The three subsequent supplyAsync calls ensure that the three operations might run concurrently. The three join() calls only wait for the result and when the third join() returned, you know that all three operations are completed. It’s possible that the first join() returns at a time when aFuture has been completed, but either or both of the other operations are still running, but that doesn’t matter for three independent operations.
When you execute CompletableFuture.allOf(aFuture, bFuture, cFuture).join(); before the individual join() calls, it ensures that all three operations completed before the first individual join() call, but as said, it has no impact when all three operations are independent and you’re not relying on some side effect of their execution (which you shouldn’t in general).
The actual purpose of allOf is to construct a new future when you do not want to wait for the result immediately. E.g.
record Result(List<A> aData, Map<String, B> bData, Map<String, C> cData) {}
CompletableFuture<Result> r = CompletableFuture.allOf(aFuture, bFuture, cFuture)
.thenApply(v -> new Result(aFuture.join(), bFuture.join(), cFuture.join()));
// return r or pass it to some other code...
here, the use of allOf is preferable to, e.g.
CompletableFuture<Result> r = CompletableFuture.supplyAsync(
() -> new Result(aFuture.join(), bFuture.join(), cFuture.join()));
because the latter might block a worker thread when join() is called from the supplier. The underlying framework might compensate when it detects this, e.g. start a new thread, but this is still an expensive operation. In contrast, the function chained to allOf is only evaluated after all futures completed, so all embedded join() calls are guaranteed to return immediately.
For a small number of futures, there’s still an alternative to allOf, e.g.
var r = aFuture.thenCompose(a ->
bFuture.thenCombine(cFuture, (b, c) -> new Result(a, b, c)));

Related

Correct use of CompletableFuture.anyOf()

I want to make parallel calls to different external services and proceed with the first successful response.
Successful here means that the result returned by the service has certain values in certain fields.
However, for CompletableFuture, everything other than an exception is success. So, even for for business failures, I have to throw an exception to signal non-success to CompletableFuture. This feels wrong, ideally I would want to provide a boolean to indicate business success/failure. Is there a better way to signal business failures?
The second question I have is, how do I make sure I don't run out of threads due to the abandoned CompletableFutures that would keep running even after CompletableFutures.anyOf() returns. Ideally I want to force stop the threads but as per the thread below, the best I can do is cancel the downstream operations.
How to cancel Java 8 completable future?
When you call CompletableFuture#cancel, you only stop the downstream
part of the chain. Upstream part, i. e. something that will eventually
call complete(...) or completeExceptionally(...), doesn't get any
signal that the result is no more needed.
I can force stop treads by providing my own ExecutorService and calling shutdown()/shutdownNow() on it after CompletableFuture.anyOf() returns.
However I am not sure about the implications of creating a new ExecutorService instance for each request.
This is perhaps an attempt to look away from anyOf(), as there's just no easy way to cancel other tasks with it.
What this is doing is create and start async tasks, then keep a reference to the future along with an object that would be used to effectively terminate other tasks, which of course is dependent on your actual code/task. In this example, I'm just returning the input string (hence the type Pair<CompletableFuture<String>, String>); your code will probably have something like a request object.
ExecutorService exec = Executors.newFixedThreadPool(5);
List<String> source = List.of();
List<Pair<CompletableFuture<String>, String>> futures = source.stream()
.map(s -> Pair.of(CompletableFuture.supplyAsync(() -> s.toUpperCase(), exec), s))
.collect(Collectors.toList());
Consumer<Pair<CompletableFuture<String>, String>> cancelOtherTasksFunction = pair -> {
futures.stream()
.filter(future -> future != pair)
.forEach(future -> {
future.getLeft().cancel(true); // cancel the future
// future.getRight().doSomething() // cancel actual task
});
};
AtomicReference<String> result = new AtomicReference<>();
futures.forEach(future -> future.getLeft()
.thenAccept(s -> {
if (null == result.compareAndExchangeRelease(null, s)) {
// the first success is recorded
cancelOtherTasksFunction.accept(future);
}
}));
I suppose you can (should?) create a class to hold the future and the task object (such as an http request you could cancel) to replace the Pair.
Note:
if (null == result.compareAndExchangeRelease(null, s)) is only valid if your future returns a non-null value.
The first valid response will be in result, but you still need to test for blocking before returning it, although I suppose it should work as other tasks are being cancelled (that's the theory).
You may decide to make futures.forEach part of the stream pipeline above, just be careful to force all tasks to be submitted (which collect(Collectors.toList()) does).

Is the writer's reason correct for using thenCompose and not thenComposeAsync

This question is different from this one Difference between Java8 thenCompose and thenComposeAsync because I want to know what is the writer's reason for using thenCompose and not thenComposeAsync.
I was reading Modern Java in action and I came across this part of code on page 405:
public static List<String> findPrices(String product) {
ExecutorService executor = Executors.newFixedThreadPool(10);
List<Shop> shops = Arrays.asList(new Shop(), new Shop());
List<CompletableFuture<String>> priceFutures = shops.stream()
.map(shop -> CompletableFuture.supplyAsync(() -> shop.getPrice(product), executor))
.map(future -> future.thenApply(Quote::parse))
.map(future -> future.thenCompose(quote ->
CompletableFuture.supplyAsync(() -> Discount.applyDiscount(quote), executor)))
.collect(toList());
return priceFutures.stream()
.map(CompletableFuture::join).collect(toList());
}
Everything is Ok and I can understand this code but here is the writer's reason for why he didn't use thenComposeAsync on page 408 which I can't understand:
In general, a method without the Async suffix in its name executes
its task in the same threads the previous task, whereas a method
terminating with Async always submits the succeeding task to the
thread pool, so each of the tasks can be handled by a
different thread. In this case, the result of the second
CompletableFuture depends on the first,so it makes no difference to
the final result or to its broad-brush timing whether you compose the
two CompletableFutures with one or the other variant of this method
In my understanding with the thenCompose( and thenComposeAsync) signatures as below:
public <U> CompletableFuture<U> thenCompose(
Function<? super T, ? extends CompletionStage<U>> fn) {
return uniComposeStage(null, fn);
}
public <U> CompletableFuture<U> thenComposeAsync(
Function<? super T, ? extends CompletionStage<U>> fn) {
return uniComposeStage(asyncPool, fn);
}
The result of the second CompletableFuture can depends on the previous CompletableFuture in many situations(or rather I can say almost always), should we use thenCompose and not thenComposeAsync in those cases?
What if we have blocking code in the second CompletableFuture?
This is a similar example which was given by person who answered similar question here: Difference between Java8 thenCompose and thenComposeAsync
public CompletableFuture<String> requestData(Quote quote) {
Request request = blockingRequestForQuote(quote);
return CompletableFuture.supplyAsync(() -> sendRequest(request));
}
To my mind in this situation using thenComposeAsync can make our program faster because here blockingRequestForQuote can be run on different thread. But based on the writer's opinion we should not use thenComposeAsync because it depends on the first CompletableFuture result(that is Quote).
My question is:
Is the writer's idea correct when he said :
In this case, the result of the second
CompletableFuture depends on the first,so it makes no difference to
the final result or to its broad-brush timing whether you compose the
two CompletableFutures with one or the other variant of this method
TL;DR It is correct to use thenCompose instead of thenComposeAsync here, but not for the cited reasons. Generally, the code example should not be used as a template for your own code.
This chapter is a recurring topic on Stackoverflow for reasons we can best describe as “insufficient quality”, to stay polite.
In general, a method without the Async suffix in its name executes its task in the same threads the previous task, …
There is no such guaranty about the executing thread in the specification. The documentation says:
Actions supplied for dependent completions of non-async methods may be performed by the thread that completes the current CompletableFuture, or by any other caller of a completion method.
So there’s also the possibility that the task is performed “by any other caller of a completion method”. An intuitive example is
CompletableFuture<X> f = CompletableFuture.supplyAsync(() -> foo())
.thenApply(f -> f.bar());
There are two threads involved. One that invokes supplyAsync and thenApply and the other which will invoke foo(). If the second completes the invocation of foo() before the first thread enters the execution of thenApply, it is possible that the future is already completed.
A future does not remember which thread completed it. Neither does it have some magic ability to tell that thread to perform an action despite it might be busy with something else or even have terminated since then. So it should be obvious that calling thenApply on an already completed future can’t promise to use the thread that completed it. In most cases, it will perform the action immediately in the thread that calls thenApply. This is covered by the specification’s wording “any other caller of a completion method”.
But that’s not the end of the story. As this answer explains, when there are more than two threads involved, the action can also get performed by another thread calling an unrelated completion method on the future at the same time. This may happen rarely, but it’s possible in the reference implementation and permitted by the specification.
We can summarize it as: Methods without Async provides the least control over the thread that will perform the action and may even perform it right in the calling thread, leading to synchronous behavior.
So they are best when the executing thread doesn’t matter and you’re not hoping for background thread execution, i.e. for short, non-blocking operations.
whereas a method terminating with Async always submits the succeeding task to the thread pool, so each of the tasks can be handled by a different thread. In this case, the result of the second CompletableFuture depends on the first, …
When you do
future.thenCompose(quote ->
CompletableFuture.supplyAsync(() -> Discount.applyDiscount(quote), executor))
there are three futures involved, so it’s not exactly clear, which future is meant by “second”. supplyAsync is submitting an action and returning a future. The submission is contained in a function passed to thenCompose, which will return another future.
If you used thenComposeAsync here, you only mandated that the execution of supplyAsync has to be submitted to the thread pool, instead of performing it directly in the completing thread or “any other caller of a completion method”, e.g. directly in the thread calling thenCompose.
The reasoning about dependencies makes no sense here. “then” always implies a dependency. If you use thenComposeAsync here, you enforced the submission of the action to the thread pool, but this submission still won’t happen before the completion of future. And if future completed exceptionally, the submission won’t happen at all.
So, is using thenCompose reasonable here? Yes it is, but not for the reasons given is the quote. As said, using the non-async method implies giving up control over the executing thread and should only be used when the thread doesn’t matter, most notably for short, non-blocking actions. Calling supplyAsync is a cheap action that will submit the actual action to the thread pool on its own, so it’s ok to perform it in whatever thread is free to do it.
However, it’s an unnecessary complication. You can achieve the same using
future.thenApplyAsync(quote -> Discount.applyDiscount(quote), executor)
which will do exactly the same, submit applyDiscount to executor when future has been completed and produce a new future representing the result. Using a combination of thenCompose and supplyAsync is unnecessary here.
Note that this very example has been discussed in this Q&A already, which also addresses the unnecessary segregation of the future operations over multiple Stream operations as well as the wrong sequence diagram.
What a polite answer from Holger! I am really impressed he could provide such a great explanation and at the same time staying in bounds of not calling the author plain wrong. I want to provide my 0.02$ here too, a little, after reading the same book and having to scratch my head twice.
First of all, there is no "remembering" of which thread executed which stage, neither does the specification make such a statement (as already answered above). The interesting part is even in the cited above documentation:
Actions supplied for dependent completions of non-async methods may be performed by the thread that completes the current CompletableFuture, or by any other caller of a completion method.
Even that ...completes the current CompletableFuture part is tricky. What if there are two threads that try to call complete on a CompletableFuture, which thread will run all the dependent actions? The one that has actually completed it? Or any other? I wrote a jcstress test that is very non-intuitive when looking at the results:
#JCStressTest
#State
#Outcome(id = "1, 0", expect = Expect.ACCEPTABLE, desc = "executed in completion thread")
#Outcome(id = "0, 1", expect = Expect.ACCEPTABLE, desc = "executed in the other thread")
#Outcome(id = "0, 0", expect = Expect.FORBIDDEN)
#Outcome(id = "1, 1", expect = Expect.FORBIDDEN)
public class CompletableFutureWhichThread1 {
private final CompletableFuture<String> future = new CompletableFuture<>();
public CompletableFutureWhichThread1() {
future.thenApply(x -> action(Thread.currentThread().getName()));
}
volatile int x = -1; // different default to not mess with the expected result
volatile int y = -1; // different default to not mess with the expected result
volatile int actor1 = 0;
volatile int actor2 = 0;
private String action(String threadName) {
System.out.println(Thread.currentThread().getName());
// same thread that completed future, executed action
if ("actor1".equals(threadName) && actor1 == 1) {
x = 1;
return "action";
}
// same thread that completed future, executed action
if ("actor2".equals(threadName) && actor2 == 1) {
x = 1;
return "action";
}
y = 1;
return "action";
}
#Actor
public void actor1() {
Thread.currentThread().setName("actor1");
boolean completed = future.complete("done-actor1");
if (completed) {
actor1 = 1;
} else {
actor2 = 1;
}
}
#Actor
public void actor2() {
Thread.currentThread().setName("actor2");
boolean completed = future.complete("done-actor2");
if (completed) {
actor2 = 1;
}
}
#Arbiter
public void arbiter(II_Result result) {
if (x == 1) {
result.r1 = 1;
}
if (y == 1) {
result.r2 = 1;
}
}
}
After running this, both 0, 1 and 1, 0 are seen. You do not need to understand very much about the test itself, but it proves a rather interesting point.
You have a CompletableFuture future that has a future.thenApply(x -> action(...)); attached to it. There are two threads (actor1 and actor2) that both, at the same time, compete with each other into completing it (the specification says that only one will be successful). The results show that if actor1 called complete, but does not actually complete the CompletableFuture (actor2 did), it can still do the actual work in action. In other words, a thread that completed a CompletableFuture is not necessarily the thread that executes the dependent actions (those thenApply for example). This was rather interesting for me to find out, though it makes sense.
Your reasonings about speed are a bit off. When you dispatch your work to a different thread, you usually pay a penalty for that. thenCompose vs thenComposeAsync is about being able to predict where exactly is your work going to happen. As you have seen above you can not do that, unless you use the ...Async methods that take a thread pool. Your natural question should be : "Why do I care where it is executed?".
There is an internal class in jdk's HttpClient called SelectorManager. It has (from a high level) a rather simple task: it reads from a socket and gives "responses" back to the threads that wait for a http result. In essence, this is a thread that wakes up all interested parties that wait for some http packets. Now imagine that this particular thread does internally thenCompose. Now also imagine that your chain of calls looks like this:
httpClient.sendAsync(() -> ...)
.thenApply(x -> foo())
where foo is a method that never finishes (or takes a lot of time to finish). Since you have no idea in which thread the actual execution is going to happen, it can, very well, happen in SelectorManager thread. Which would be a disaster. Everyone other http calls would stale, because this thread is busy now. Thus thenComposeAsync: let the configured pool do the work/waiting if needed, while the SelectorManager thread is free to do its work.
So the reasons that the author gives are plain wrong.

Concurrently run a Void CompletionStage but ignore result

I have two completionStages method calls that each call a remote service if a condition is not met. They are both pretty long running processes and we need to decrease latency. I also do not care for the secondFuture's response. It could return CompletionStage<Void> as I only care if the method runs before we exit the main method. An added complexity is that injectedClass2.serviceCall also throws a really important exception (404 StatusRuntimeException) that needs to be surfaced to the client.
How do I ensure that the first and second future run asynchronously (not dependant on each other) meanwhile the second future surfaces its error codes and exceptions for the client.
Main Method below is my best attempt at this. It works, but I am looking to learn a better implementation that takes advantage of completables/streams,etc.
try {
.
.
.
CompletionStage<Response> firstFuture;
CompletionStage<Response> secondFuture = CompletableFuture.completedFuture(Response.default());
if (condition) {
firstFuture = legacyImplThing.resolve(param1, param2);
} else {
firstFuture =
injectedClass1.longRunningOp(param1, param2);
secondFuture = injectedClass2.serviceCall(param1, param2, someOtherData);
}
final CompletionStage<MainMethodResponse> response =
CompletableFutures.combine(firstFuture, secondFuture, (a, b) -> a)
.thenApply(
v -> ServiceResponse.newBuilder().setParam(v.toString()).build());
handleResponse(response, responseObserver);
} catch (Exception e) {
responseObserver.onError(e);
}
Maybe out of scope, how would one test/check that two completionStages were run concurrently?
EDIT: CompletableFutures.combine() is a third-party library method and not part of the java.util.concurrent package.
Chaining other stages does not alter the previous stages. In other words, the parallelism is entirely outside your control, as it has been determined already.
More specifically, when you invoke injectedClass1.longRunningOp(param1, param2), the implementation of the method longRunningOp decides, how the returned future will be completed. Likewise, when you call injectedClass2.serviceCall(param1, param2, someOtherData), the implementation of serviceCall will determine the returned future’s completion. Both methods could use the same executor behind the scenes or entirely different approaches.
The only scenario where you can influence the parallelism, is that both methods perform the actual operation in the caller’s thread, to eventually return an already completed future. In this case, you would have to wrap each call into another asynchronous operation to let them run in parallel. But it would be a strange design to return a future when performing a lengthy operation in the caller’s thread.
Your code
CompletableFutures.combine(firstFuture, secondFuture, (a, b) -> a)
does not match the documented API. A valid call would be
firstFuture.thenCombine(secondFuture, (a, b) -> a)
In this case, you are not influencing the completions of firstFuture or secondFuture. You are only specifying what should happen after both futures have been completed.
There is, by the way, no reason to specify a trivial function like (a, b) -> a in thenCombine, just to chain another thenApply. You can use
firstFuture.thenCombine(secondFuture,
(v, b) -> ServiceResponse.newBuilder().setParam(v.toString()).build())
in the first place.

CompletableFuture single task that continues with many parallel tasks

I have the following code:
return CompletableFuture.supplyAsync(() -> {
return foo; // some custom object
})
.thenAccept(foo -> {
// ??? need to spawn N async parallel jobs that works on 'foo'
});
In english: the first task creates the foo object asynchronously; and then I need to run N parallel processes on it.
Is there a better way to do this then:
...
CompletableFuture[] parallel = new CompletableFuture[N];
for (int i = 0; i < N; i++) {
parallel[i] = CompletableFuture.runAsync(() -> {
work(foo);
});
}
CompletableFuture.allOf(parallel).join();
...
I don't like this as one thread gets locked while waiting N jobs to finish.
You can chain as many independent jobs as you like on a particular prerequisite job, e.g.
CompletableFuture<Foo> base=CompletableFuture.supplyAsync(() -> new Foo());
Collections.nCopies(N, base).forEach(f -> f.thenAcceptAsync(foo -> work(foo)));
will spawn N parallel jobs, invoking work(foo) concurrently, after the completion of the initial job which provides the Foo instance.
But keep in mind, that the underlying framework will consider the number of available CPU cores to size the thread pool actually executing the parallel jobs, so if N > #cores, some of these jobs may run one after another.
If the work is I/O bound, thus, you want to have a higher number of parallel threads, you have to specify your own executor.
The nCopies/forEach is not necessary, a for loop would do as well, but it provides a hint of how to handle subsequent jobs, that depend on the completion of all these parallel jobs:
CompletableFuture<Foo> base=CompletableFuture.supplyAsync(() -> new Foo());
CompletableFuture<Void> all = CompletableFuture.allOf(
Collections.nCopies(N, base).stream()
.map(f -> f.thenAcceptAsync(foo -> work(foo)))
.toArray(CompletableFuture<?>[]::new));
Now you can use all to check for the completion of all jobs or chain additional actions.
Since CompletableFuture.allOf already returns another CompletableFuture<Void>a you can just do another .thenAccept on it and extract the returned values from the CFs in parallel in the callback, that way you avoid calling join

How does CompletableFuture know that tasks are independent?

Imagine that we have the following dummy code:
CompletableFuture<BigInteger> cf1 = CompletableFuture.supplyAsync(() -> BigInteger.valueOf(2L));
CompletableFuture<BigInteger> cf2 = CompletableFuture.supplyAsync(() -> BigInteger.valueOf(3L));
cf1.thenCombine(cf2, (x, y) -> x.add(y)).thenAccept(System.out::println);
Does JVM know that cf1 and cf2 carry independent threads in this case? And what will change if threads will be dependent (for example, use one connection to database)?
More general, how does CompletableFuture synchronize threads?
A CompletableFuture has no relation to any thread. It is just a holder for a result retrieved asynchronously with methods to operate on that result.
The static supplyAsync and runAsync methods are just helper methods. The javadoc of supplyAsync states
Returns a new CompletableFuture that is asynchronously completed by a
task running in the ForkJoinPool.commonPool() with the value obtained
by calling the given Supplier.
This is more or less equivalent to
Supplier<R> sup = ...;
CompletableFuture<R> future = new CompletableFuture<R>();
ForkJoinPool.commonPool().submit(() -> {
try {
R result = sup.get();
future.complete(result);
} catch (Throwable e) {
future.completeExceptionally(e);
}
});
return future;
The CompletableFuture is returned, even allowing you to complete it before the task submitted to the pool.
More general, how does CompletableFuture synchronize threads?
It doesn't, since it doesn't know which threads are operating on it. This is further hinted at in the javadoc
Since (unlike FutureTask) this class has no direct control over the
computation that causes it to be completed, cancellation is treated as
just another form of exceptional completion. Method cancel has the
same effect as completeExceptionally(new CancellationException()).
Method isCompletedExceptionally() can be used to determine if a
CompletableFuture completed in any exceptional fashion.
CompletableFuture objects do not control processing.
I don't think that a CompletableFuture (CF) "synchronizes threads". It uses the executor you have provided or the common pool if you have not provided one.
When you call supplyAsync, the CF submits the various tasks to that pool which in turns manages the underlying threads to execute the tasks.
It doesn't know, nor does it try to synchronize anything. It is still the client's responsibility to properly synchronize access to mutable shared data.

Categories