How does CompletableFuture know that tasks are independent? - java

Imagine that we have the following dummy code:
CompletableFuture<BigInteger> cf1 = CompletableFuture.supplyAsync(() -> BigInteger.valueOf(2L));
CompletableFuture<BigInteger> cf2 = CompletableFuture.supplyAsync(() -> BigInteger.valueOf(3L));
cf1.thenCombine(cf2, (x, y) -> x.add(y)).thenAccept(System.out::println);
Does the JVM know that cf1 and cf2 run on independent threads in this case? And what changes if the threads are dependent (for example, if they share one database connection)?
More generally, how does CompletableFuture synchronize threads?

A CompletableFuture has no relation to any thread. It is just a holder for a result retrieved asynchronously with methods to operate on that result.
The static supplyAsync and runAsync methods are just helper methods. The javadoc of supplyAsync states
Returns a new CompletableFuture that is asynchronously completed by a
task running in the ForkJoinPool.commonPool() with the value obtained
by calling the given Supplier.
This is more or less equivalent to
Supplier<R> sup = ...;
CompletableFuture<R> future = new CompletableFuture<R>();
ForkJoinPool.commonPool().submit(() -> {
    try {
        R result = sup.get();
        future.complete(result);
    } catch (Throwable e) {
        future.completeExceptionally(e);
    }
});
return future;
The CompletableFuture is returned right away, which even allows you to complete it yourself before the task submitted to the pool does.
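To make that last point concrete, here is a minimal sketch in which the caller completes the future while the pool task is still held back. The class and method names are hypothetical, and the latch exists only to make the ordering deterministic for the demonstration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; names are illustrative only.
class EarlyCompletionSketch {

    // Completes the future by hand while the pool task is still blocked.
    static String completeBeforeTask() {
        CountDownLatch release = new CountDownLatch(1);
        CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
            try {
                release.await(); // hold the pool task back until the caller is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "value from pool task";
        });
        // The first completion wins; complete() reports whether this call was it.
        boolean callerWon = future.complete("value from caller");
        release.countDown(); // the pool task now finishes, but its value is discarded
        return future.join() + " (callerWon=" + callerWon + ")";
    }

    public static void main(String[] args) {
        System.out.println(completeBeforeTask());
    }
}
```

The future does not care which thread supplied its value; the supplier keeps running to its end and its late result is simply dropped.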
More generally, how does CompletableFuture synchronize threads?
It doesn't, since it doesn't know which threads are operating on it. This is further hinted at in the javadoc
Since (unlike FutureTask) this class has no direct control over the
computation that causes it to be completed, cancellation is treated as
just another form of exceptional completion. Method cancel has the
same effect as completeExceptionally(new CancellationException()).
Method isCompletedExceptionally() can be used to determine if a
CompletableFuture completed in any exceptional fashion.
CompletableFuture objects do not control processing.

I don't think that a CompletableFuture (CF) "synchronizes threads". It uses the executor you have provided or the common pool if you have not provided one.
When you call supplyAsync, the CF submits the task to that pool, which in turn manages the underlying threads that execute the tasks.

It doesn't know, nor does it try to synchronize anything. It is still the client's responsibility to properly synchronize access to mutable shared data.
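As a small illustration of that responsibility, the sketch below lets two asynchronous tasks update one shared counter. The class and method names are made up for the example. The counter is an AtomicInteger so that no increments are lost; a plain int field updated with ++ from both tasks would not be safe.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; names are illustrative only.
class SharedStateSketch {

    // Two tasks increment the same counter; thread safety comes from
    // AtomicInteger, not from CompletableFuture.
    static int countWithAtomic(int incrementsPerTask) {
        AtomicInteger counter = new AtomicInteger();
        Runnable work = () -> {
            for (int i = 0; i < incrementsPerTask; i++) {
                counter.incrementAndGet();
            }
        };
        CompletableFuture<Void> first = CompletableFuture.runAsync(work);
        CompletableFuture<Void> second = CompletableFuture.runAsync(work);
        CompletableFuture.allOf(first, second).join(); // wait for both tasks
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithAtomic(10_000));
    }
}
```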

Related

Thread vs Runnable vs CompletableFuture in Java multi threading

I am trying to implement multithreading in my Spring Boot app. I am just a beginner in multithreading in Java, and after searching and reading articles on various pages, I need clarification on the following points:
As far as I see, I can use Thread, Runnable or CompletableFuture in order to implement multi threading in a Java app. CompletableFuture seems a newer and cleaner way, but Thread may have more advantages. So, should I stick to CompletableFuture or use all of them based on the scenario?
Basically I want to send 2 concurrent requests to the same service method by using CompletableFuture:
CompletableFuture<Integer> future1 = fetchAsync(1);
CompletableFuture<Integer> future2 = fetchAsync(2);
Integer result1 = future1.get();
Integer result2 = future2.get();
How can I send these requests concurrently and then return a result based on the following conditions:
if the first result is not null, return result and stop process
if the first result is null, return the second result and stop process
How can I do this? Should I use CompletableFuture.anyOf() for that?
CompletableFuture is a tool which settles atop the Executor/ExecutorService abstraction, which has implementations dealing with Runnable and Thread. You usually have no reason to deal with Thread creation manually. If you find CompletableFuture unsuitable for a particular task you may try the other tools/abstractions first.
If you want to proceed with the first (in the sense of faster) non‑null result, you can use something like
CompletableFuture<Integer> future1 = fetchAsync(1);
CompletableFuture<Integer> future2 = fetchAsync(2);
Integer result = CompletableFuture.anyOf(future1, future2)
.thenCompose(i -> i != null?
CompletableFuture.completedFuture((Integer)i):
future1.thenCombine(future2, (a, b) -> a != null? a: b))
.join();
anyOf allows you to proceed with the first result, but regardless of its actual value. So to use the first non‑null result we need to chain another operation which will resort to thenCombine if the first result is null. This will only complete when both futures have been completed but at this point we already know that the faster result was null and the second is needed. The overall code will still result in null when both results were null.
Note that anyOf accepts arbitrarily typed futures and results in a CompletableFuture<Object>. Hence, i is of type Object and a type cast needed. An alternative with full type safety would be
CompletableFuture<Integer> future1 = fetchAsync(1);
CompletableFuture<Integer> future2 = fetchAsync(2);
Integer result = future1.applyToEither(future2, Function.identity())
.thenCompose(i -> i != null?
CompletableFuture.completedFuture(i):
future1.thenCombine(future2, (a, b) -> a != null? a: b))
.join();
which requires us to specify a function which we do not need here, so this code resorts to Function.identity(). You could also just use i -> i to denote an identity function; that’s mostly a stylistic choice.
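For experimentation, the type-safe variant can be wrapped in a small generic helper. This is only a sketch: firstNonNull and the surrounding class are hypothetical names, and already completed or never completed futures stand in for fetchAsync.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical demo class; names are illustrative only.
class FirstNonNullSketch {

    // Yields the faster result if it is non-null, otherwise whichever of the
    // two results is non-null (or null if both are null).
    static <T> CompletableFuture<T> firstNonNull(CompletableFuture<T> f1,
                                                 CompletableFuture<T> f2) {
        return f1.applyToEither(f2, Function.<T>identity())
                 .thenCompose(r -> r != null
                         ? CompletableFuture.completedFuture(r)
                         : f1.thenCombine(f2, (a, b) -> a != null ? a : b));
    }

    static String demo() {
        // The second future never completes, yet the non-null first result is used.
        CompletableFuture<Integer> pending = new CompletableFuture<>();
        Integer fast = firstNonNull(CompletableFuture.completedFuture(1), pending).join();
        // The first result is null, so the helper falls back to the second one.
        Integer fallback = firstNonNull(CompletableFuture.<Integer>completedFuture(null),
                                        CompletableFuture.completedFuture(2)).join();
        // Both results are null, so the overall result is null as well.
        Integer none = firstNonNull(CompletableFuture.<Integer>completedFuture(null),
                                    CompletableFuture.<Integer>completedFuture(null)).join();
        return fast + ", " + fallback + ", " + none;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```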
Note that most complications stem from the design that tries to avoid blocking threads by always chaining a dependent operation to be executed when the previous stage has been completed. The examples above follow this principle as the final join() call is only for demonstration purposes; you can easily remove it and return the future, if the caller expects a future rather than being blocked.
If you are going to perform the final blocking join() anyway, because you need the result value immediately, you can also use
Integer result = future1.applyToEither(future2, Function.identity()).join();
if(result == null) {
Integer a = future1.join(), b = future2.join();
result = a != null? a: b;
}
which might be easier to read and debug. This ease of use is the motivation behind the upcoming Virtual Threads feature. When an action is running on a virtual thread, you don’t need to avoid blocking calls. So with this feature, if you still need to return a CompletableFuture without blocking your caller thread, you can use
CompletableFuture<Integer> resultFuture = future1.applyToEitherAsync(future2, r-> {
if(r != null) return r;
Integer a = future1.join(), b = future2.join();
return a != null? a: b;
}, Executors.newVirtualThreadPerTaskExecutor());
By requesting a virtual thread for the dependent action, we can use blocking join() calls within the function without hesitation which makes the code simpler, in fact, similar to the previous non-asynchronous variant.
In all cases, the code will provide the faster result if it is non‑null, without waiting for the completion of the second future. But it does not stop the evaluation of the unnecessary future. Stopping an already ongoing evaluation is not supported by CompletableFuture at all. You can call cancel(…) on it, but this will only set the completion state (result) of the future to “exceptionally completed with a CancellationException”.
So whether you call cancel or not, the already ongoing evaluation will continue in the background and only its final result will be ignored.
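The following sketch shows this behavior under controlled conditions. The class and method names and the latches are made up for the demonstration: the supplier is cancelled while it is running, is never interrupted, and still runs to its end.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; names are illustrative only.
class CancelDoesNotInterruptSketch {

    // Returns true if the supplier ran to its end, uninterrupted, although
    // the future had already been cancelled.
    static boolean taskFinishesDespiteCancel() {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch proceed = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);

        CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
            started.countDown();
            try {
                proceed.await(); // would throw if cancel(true) interrupted this thread
            } catch (InterruptedException e) {
                interrupted.set(true);
            }
            finished.countDown();
            return "ignored result";
        });
        try {
            started.await();      // make sure the evaluation is already ongoing
            future.cancel(true);  // only changes the completion state of the future
            proceed.countDown();  // let the supplier continue
            boolean ranToEnd = finished.await(5, TimeUnit.SECONDS);
            return future.isCancelled() && ranToEnd && !interrupted.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(taskFinishesDespiteCancel());
    }
}
```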
This might be acceptable for some operations. If not, you would have to change the implementation of fetchAsync significantly. You could use an ExecutorService directly and submit an operation to get a Future which supports cancellation with interruption.
But it also requires the operation’s code to be sensitive to interruption, to have an actual effect:
When calling blocking operations, use those methods that may abort and throw an InterruptedException and do not catch-and-continue.
When performing a long-running, computationally intensive task, poll Thread.interrupted() occasionally and bail out when it returns true.
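A minimal sketch of the ExecutorService route, with hypothetical class and method names: the task blocks in an interruptible call, and Future.cancel(true) on the future returned by submit interrupts it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; names are illustrative only.
class InterruptibleTaskSketch {

    // Returns true if the running task observed the interruption caused by cancel(true).
    static boolean cancelInterruptsTask() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch neverReleased = new CountDownLatch(1);
        CountDownLatch sawInterrupt = new CountDownLatch(1);
        try {
            Future<?> future = executor.submit(() -> {
                started.countDown();
                try {
                    neverReleased.await(); // blocking call that responds to interruption
                } catch (InterruptedException e) {
                    sawInterrupt.countDown(); // bail out instead of catch-and-continue
                }
            });
            started.await();     // wait until the task is actually running
            future.cancel(true); // interrupts the worker thread
            return sawInterrupt.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(cancelInterruptsTask());
    }
}
```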
So, should I stick to CompletableFuture or use all of them based on the scenario?
Use the one that is most appropriate to the scenario. Obviously, we can't be more specific unless you explain the scenario.
There are various factors to take into account. For example:
Thread + Runnable doesn't have a natural way to wait for / return a result. (But it is not hard to implement.)
Repeatedly creating bare Thread objects is inefficient because thread creation is expensive. Thread pooling is better but you shouldn't implement a thread pool yourself.
Solutions that use an ExecutorService take care of thread pooling and allow you to use Callable and return a Future. But for a once-off async computation this might be over-kill.
Solutions that involve CompletableFuture allow you to compose and combine asynchronous tasks. But if you don't need to do that, using CompletableFuture may be overkill.
As you can see ... there is no single correct answer for all scenarios.
Should I use CompletableFuture.anyOf() for that?
No. The logic of your example requires that you must have the result for future1 to determine whether or not you need the result for future2. So the solution is something like this:
Integer i1 = future1.get();
if (i1 == null) {
return future2.get();
} else {
future2.cancel(true);
return i1;
}
Note that the above works with plain Future as well as CompletableFuture. If you were using CompletableFuture because you thought that anyOf was the solution, then you didn't need to do that. Calling ExecutorService.submit(Callable) will give you a Future ...
It will be more complicated if you need to deal with exceptions thrown by the tasks and/or timeouts. In the former case, you need to catch ExecutionException and then extract its cause to get the exception thrown by the task.
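A rough sketch of that unwrapping step might look as follows. The class, the helper method and the failing task are hypothetical and exist only for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; names are illustrative only.
class ExecutionCauseSketch {

    // Waits for the future and reports either its value or the task's own exception.
    static String describeOutcome(Future<?> future) {
        try {
            return "ok: " + future.get();
        } catch (ExecutionException e) {
            Throwable cause = e.getCause(); // the exception thrown inside the task
            return "failed: " + cause.getClass().getSimpleName() + ": " + cause.getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    static String demo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> failingTask = () -> {
                throw new IllegalStateException("boom");
            };
            return describeOutcome(executor.submit(failingTask));
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```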
There is also the caveat that the second computation may ignore the interrupt and continue on regardless.
So, should I stick to CompletableFuture or use all of them based on the scenario?
Well, they all have different purposes and you'll probably use them all either directly or indirectly:
Thread represents a thread and while it can be subclassed in most cases you shouldn't do so. Many frameworks maintain thread pools, i.e. they spin up several threads that then can take tasks from a task pool. This is done to reduce the overhead that thread creation brings as well as to reduce the amount of contention (many threads and few cpu cores mean a lot of context switches so you'd normally try to have fewer threads that just work on one task after another).
Runnable was one of the first interfaces to represent tasks that a thread can work on. Another is Callable which has 2 major differences to Runnable: 1) it can return a value while Runnable has void and 2) it can throw checked exceptions. Depending on your case you can use either but since you want to get a result, you'll more likely use Callable.
CompletableFuture and Future are basically a way for cross-thread communication, i.e. you can use those to check whether the task is done already (non-blocking) or to wait for completion (blocking).
So in many cases it's like this:
you submit a Runnable or Callable to some executor
the executor maintains a pool of Threads to execute the tasks you submitted
the executor returns a Future (one implementation being CompletableFuture) for you to check on the status and results of the task without having to synchronize yourself.
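The three steps above can be sketched as follows, with made-up class and method names. Note that the Future handed back by submit on a standard thread pool is a plain Future rather than a CompletableFuture.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; names are illustrative only.
class ExecutorFutureSketch {

    static int submitAndWait() {
        // The executor owns and reuses the threads.
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            Callable<Integer> task = () -> 6 * 7;           // a task that returns a value
            Future<Integer> future = executor.submit(task); // handle to the pending result
            return future.get();                            // block until the task is done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(submitAndWait());
    }
}
```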
However, there may be other cases where you directly provide a Runnable to a Thread or even subclass Thread but nowadays those are far less common.
How can I do this? Should I use CompletableFuture.anyOf() for that?
CompletableFuture.anyOf() wouldn't work since you'd not be able to determine which of the 2 you'd pass in was successful first.
Since you're interested in result1 first (which btw can't be null if the type is int) you basically want to do the following:
Integer result1 = future1.get(); //block until result 1 is ready
if( result1 != null ) {
return result1;
} else {
return future2.get(); //result1 was null so wait for result2 and return it
}
You'd not want to call future2.get() right away since that would block until both are done. Instead, you're first interested in future1 only, so if that produces a result you wouldn't have to wait for future2 to ever finish.
Note that the code above doesn't handle exceptional completions and there's also probably a more elegant way of composing the futures like you want but I don't remember it atm (if I do I'll add it as an edit).
Another note: you could call future2.cancel() if result1 isn't null but I'd suggest you first check whether cancelling would even work (e.g. you'd have a hard time really cancelling a webservice request) and what the results of interrupting the service would be. If it's ok to just let it complete and ignore the result that's probably the easier way to go.

Java CompletableFuture allOf approach

I am trying to run 3 operations in parallel using the CompletableFuture approach. Now these 3 operations return different types so need to retrieve the data separately. Here is what i am trying to do:
CompletableFuture<List<A>> aFuture = CompletableFuture.supplyAsync(() -> getAList());
CompletableFuture<Map<String,B>> bFuture = CompletableFuture.supplyAsync(() -> getBMap());
CompletableFuture<Map<String,C>> cFuture = CompletableFuture.supplyAsync(() -> getCMap());
CompletableFuture<Void> combinedFuture =
    CompletableFuture.allOf(aFuture, bFuture, cFuture);
combinedFuture.get(); // or join()
List<A> aData = aFuture.get(); // or join()
Map<String, B> bData = bFuture.get(); // or join()
Map<String, C> cData = cFuture.get(); // or join()
This does the job and works but i am trying to understand if we need to do these gets/joins on combined future as well as individual ones and if there is a better way to do this.
I also tried using the whenComplete() approach, but the variables I want to assign the returned data to are then inside the method, so I am getting a "The final local variable cannot be assigned, since it is defined in an enclosing type in Java" error, and I don't want to move them to the class level.
looking for some expert/alternate opinions. Thank you in advance
SG
Calling get or join just implies “wait for the completion”. It has no influence on the completion itself.
When you call CompletableFuture.supplyAsync(() -> getAList()), this method will submit the evaluation of the supplier to the common pool immediately. The caller’s only influence on the execution of getAList() is the fact that the JVM will terminate if there are no non-daemon threads running. This is a common error in simple test programs, incorporating a main method that doesn’t wait for completion. Otherwise, the execution of getAList() will complete, regardless of whether its result will ever be queried.
So when you use
CompletableFuture<List<A>> aFuture = CompletableFuture.supplyAsync(() -> getAList());
CompletableFuture<Map<String,B>> bFuture=CompletableFuture.supplyAsync(() -> getBMap());
CompletableFuture<Map<String,C>> cFuture=CompletableFuture.supplyAsync(() -> getCMap());
List<A> aData = aFuture.join();
Map<String, B> bData = bFuture.join();
Map<String, C> cData = cFuture.join();
The three subsequent supplyAsync calls ensure that the three operations might run concurrently. The three join() calls only wait for the result and when the third join() returned, you know that all three operations are completed. It’s possible that the first join() returns at a time when aFuture has been completed, but either or both of the other operations are still running, but that doesn’t matter for three independent operations.
When you execute CompletableFuture.allOf(aFuture, bFuture, cFuture).join(); before the individual join() calls, it ensures that all three operations completed before the first individual join() call, but as said, it has no impact when all three operations are independent and you’re not relying on some side effect of their execution (which you shouldn’t in general).
The actual purpose of allOf is to construct a new future when you do not want to wait for the result immediately. E.g.
record Result(List<A> aData, Map<String, B> bData, Map<String, C> cData) {}
CompletableFuture<Result> r = CompletableFuture.allOf(aFuture, bFuture, cFuture)
.thenApply(v -> new Result(aFuture.join(), bFuture.join(), cFuture.join()));
// return r or pass it to some other code...
Here, the use of allOf is preferable to, e.g.
CompletableFuture<Result> r = CompletableFuture.supplyAsync(
() -> new Result(aFuture.join(), bFuture.join(), cFuture.join()));
because the latter might block a worker thread when join() is called from the supplier. The underlying framework might compensate when it detects this, e.g. start a new thread, but this is still an expensive operation. In contrast, the function chained to allOf is only evaluated after all futures completed, so all embedded join() calls are guaranteed to return immediately.
For a small number of futures, there’s still an alternative to allOf, e.g.
var r = aFuture.thenCompose(a ->
bFuture.thenCombine(cFuture, (b, c) -> new Result(a, b, c)));
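Both ways of building the combined future can be tried side by side in a self-contained sketch. All names and sample data below are hypothetical, and a plain holder class stands in for the record so that the code also compiles on older Java versions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical demo class; names and data are illustrative only.
class CombineThreeSketch {

    // Plain holder used in place of a record.
    static final class Combined {
        final List<String> names;
        final Map<String, Integer> ages;
        final Map<String, Boolean> flags;

        Combined(List<String> names, Map<String, Integer> ages, Map<String, Boolean> flags) {
            this.names = names;
            this.ages = ages;
            this.flags = flags;
        }

        String summary() {
            return names + " " + ages + " " + flags;
        }
    }

    // allOf variant: the join() calls run only after all three futures completed.
    static CompletableFuture<Combined> viaAllOf(CompletableFuture<List<String>> a,
                                                CompletableFuture<Map<String, Integer>> b,
                                                CompletableFuture<Map<String, Boolean>> c) {
        return CompletableFuture.allOf(a, b, c)
                .thenApply(v -> new Combined(a.join(), b.join(), c.join()));
    }

    // thenCompose/thenCombine variant: no join() needed at all.
    static CompletableFuture<Combined> viaCompose(CompletableFuture<List<String>> a,
                                                  CompletableFuture<Map<String, Integer>> b,
                                                  CompletableFuture<Map<String, Boolean>> c) {
        return a.thenCompose(x -> b.thenCombine(c, (y, z) -> new Combined(x, y, z)));
    }

    static String demo(boolean useAllOf) {
        CompletableFuture<List<String>> a =
                CompletableFuture.supplyAsync(() -> Arrays.asList("ann", "bob"));
        CompletableFuture<Map<String, Integer>> b =
                CompletableFuture.supplyAsync(() -> Collections.singletonMap("ann", 30));
        CompletableFuture<Map<String, Boolean>> c =
                CompletableFuture.supplyAsync(() -> Collections.singletonMap("ann", true));
        CompletableFuture<Combined> combined = useAllOf ? viaAllOf(a, b, c) : viaCompose(a, b, c);
        return combined.join().summary(); // blocking here only to show the result
    }

    public static void main(String[] args) {
        System.out.println(demo(true));
        System.out.println(demo(false));
    }
}
```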

Is the writer's reason correct for using thenCompose and not thenComposeAsync

This question is different from this one Difference between Java8 thenCompose and thenComposeAsync because I want to know what is the writer's reason for using thenCompose and not thenComposeAsync.
I was reading Modern Java in action and I came across this part of code on page 405:
public static List<String> findPrices(String product) {
ExecutorService executor = Executors.newFixedThreadPool(10);
List<Shop> shops = Arrays.asList(new Shop(), new Shop());
List<CompletableFuture<String>> priceFutures = shops.stream()
.map(shop -> CompletableFuture.supplyAsync(() -> shop.getPrice(product), executor))
.map(future -> future.thenApply(Quote::parse))
.map(future -> future.thenCompose(quote ->
CompletableFuture.supplyAsync(() -> Discount.applyDiscount(quote), executor)))
.collect(toList());
return priceFutures.stream()
.map(CompletableFuture::join).collect(toList());
}
Everything is Ok and I can understand this code but here is the writer's reason for why he didn't use thenComposeAsync on page 408 which I can't understand:
In general, a method without the Async suffix in its name executes
its task in the same threads the previous task, whereas a method
terminating with Async always submits the succeeding task to the
thread pool, so each of the tasks can be handled by a
different thread. In this case, the result of the second
CompletableFuture depends on the first,so it makes no difference to
the final result or to its broad-brush timing whether you compose the
two CompletableFutures with one or the other variant of this method
As I understand it, the thenCompose and thenComposeAsync signatures are as below:
public <U> CompletableFuture<U> thenCompose(
Function<? super T, ? extends CompletionStage<U>> fn) {
return uniComposeStage(null, fn);
}
public <U> CompletableFuture<U> thenComposeAsync(
Function<? super T, ? extends CompletionStage<U>> fn) {
return uniComposeStage(asyncPool, fn);
}
The result of the second CompletableFuture can depend on the previous CompletableFuture in many situations (or rather, I can say almost always). Should we use thenCompose and not thenComposeAsync in those cases?
What if we have blocking code in the second CompletableFuture?
This is a similar example which was given by person who answered similar question here: Difference between Java8 thenCompose and thenComposeAsync
public CompletableFuture<String> requestData(Quote quote) {
Request request = blockingRequestForQuote(quote);
return CompletableFuture.supplyAsync(() -> sendRequest(request));
}
To my mind in this situation using thenComposeAsync can make our program faster because here blockingRequestForQuote can be run on different thread. But based on the writer's opinion we should not use thenComposeAsync because it depends on the first CompletableFuture result(that is Quote).
My question is:
Is the writer's idea correct when he said :
In this case, the result of the second
CompletableFuture depends on the first,so it makes no difference to
the final result or to its broad-brush timing whether you compose the
two CompletableFutures with one or the other variant of this method
TL;DR It is correct to use thenCompose instead of thenComposeAsync here, but not for the cited reasons. Generally, the code example should not be used as a template for your own code.
This chapter is a recurring topic on Stackoverflow for reasons we can best describe as “insufficient quality”, to stay polite.
In general, a method without the Async suffix in its name executes its task in the same threads the previous task, …
There is no such guarantee about the executing thread in the specification. The documentation says:
Actions supplied for dependent completions of non-async methods may be performed by the thread that completes the current CompletableFuture, or by any other caller of a completion method.
So there’s also the possibility that the task is performed “by any other caller of a completion method”. An intuitive example is
CompletableFuture<X> f = CompletableFuture.supplyAsync(() -> foo())
    .thenApply(v -> v.bar());
There are two threads involved. One that invokes supplyAsync and thenApply and the other which will invoke foo(). If the second completes the invocation of foo() before the first thread enters the execution of thenApply, it is possible that the future is already completed.
A future does not remember which thread completed it. Neither does it have some magic ability to tell that thread to perform an action despite it might be busy with something else or even have terminated since then. So it should be obvious that calling thenApply on an already completed future can’t promise to use the thread that completed it. In most cases, it will perform the action immediately in the thread that calls thenApply. This is covered by the specification’s wording “any other caller of a completion method”.
But that’s not the end of the story. As this answer explains, when there are more than two threads involved, the action can also get performed by another thread calling an unrelated completion method on the future at the same time. This may happen rarely, but it’s possible in the reference implementation and permitted by the specification.
We can summarize it as: Methods without Async provide the least control over the thread that will perform the action and may even perform it right in the calling thread, leading to synchronous behavior.
So they are best when the executing thread doesn’t matter and you’re not hoping for background thread execution, i.e. for short, non-blocking operations.
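The difference can be observed with a small sketch. The class, method and thread names are hypothetical. The first method relies on the behavior of the reference implementation described above (it is not a guarantee of the specification), while the second one uses an Async method with an explicit executor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; names are illustrative only.
class WhichThreadSketch {

    // Non-async method on an already completed future: the reference
    // implementation runs the function right in the calling thread.
    static boolean syncRunsInCaller() {
        Thread caller = Thread.currentThread();
        CompletableFuture<Thread> ranOn = CompletableFuture.completedFuture("done")
                .thenApply(v -> Thread.currentThread());
        return ranOn.join() == caller;
    }

    // Async method with an explicit executor: the function is submitted to
    // that executor, so it runs on the executor's thread.
    static String asyncRunsInExecutor() {
        ExecutorService executor =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "sketch-worker"));
        try {
            return CompletableFuture.completedFuture("done")
                    .thenApplyAsync(v -> Thread.currentThread().getName(), executor)
                    .join();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(syncRunsInCaller());
        System.out.println(asyncRunsInExecutor());
    }
}
```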
whereas a method terminating with Async always submits the succeeding task to the thread pool, so each of the tasks can be handled by a different thread. In this case, the result of the second CompletableFuture depends on the first, …
When you do
future.thenCompose(quote ->
CompletableFuture.supplyAsync(() -> Discount.applyDiscount(quote), executor))
there are three futures involved, so it’s not exactly clear, which future is meant by “second”. supplyAsync is submitting an action and returning a future. The submission is contained in a function passed to thenCompose, which will return another future.
If you used thenComposeAsync here, you only mandated that the execution of supplyAsync has to be submitted to the thread pool, instead of performing it directly in the completing thread or “any other caller of a completion method”, e.g. directly in the thread calling thenCompose.
The reasoning about dependencies makes no sense here. “then” always implies a dependency. If you use thenComposeAsync here, you enforced the submission of the action to the thread pool, but this submission still won’t happen before the completion of future. And if future completed exceptionally, the submission won’t happen at all.
So, is using thenCompose reasonable here? Yes it is, but not for the reasons given in the quote. As said, using the non-async method implies giving up control over the executing thread and should only be used when the thread doesn’t matter, most notably for short, non-blocking actions. Calling supplyAsync is a cheap action that will submit the actual action to the thread pool on its own, so it’s ok to perform it in whatever thread is free to do it.
However, it’s an unnecessary complication. You can achieve the same using
future.thenApplyAsync(quote -> Discount.applyDiscount(quote), executor)
which will do exactly the same, submit applyDiscount to executor when future has been completed and produce a new future representing the result. Using a combination of thenCompose and supplyAsync is unnecessary here.
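A quick way to convince yourself is to run both forms against the same executor. In this sketch, applyDiscount is a trivial stand-in for Discount.applyDiscount, and all other names are hypothetical as well.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; names are illustrative only.
class ComposeVsApplyAsyncSketch {

    // Stand-in for Discount.applyDiscount(quote).
    static String applyDiscount(String quote) {
        return quote + " (discounted)";
    }

    // The form used in the book: thenCompose wrapping supplyAsync.
    static String viaThenCompose(Executor executor) {
        return CompletableFuture.supplyAsync(() -> "quote", executor)
                .thenCompose(q ->
                        CompletableFuture.supplyAsync(() -> applyDiscount(q), executor))
                .join();
    }

    // The simpler form: thenApplyAsync submits the function to the executor itself.
    static String viaThenApplyAsync(Executor executor) {
        return CompletableFuture.supplyAsync(() -> "quote", executor)
                .thenApplyAsync(q -> applyDiscount(q), executor)
                .join();
    }

    static boolean sameResult() {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            return viaThenCompose(executor).equals(viaThenApplyAsync(executor));
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sameResult());
    }
}
```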
Note that this very example has been discussed in this Q&A already, which also addresses the unnecessary segregation of the future operations over multiple Stream operations as well as the wrong sequence diagram.
What a polite answer from Holger! I am really impressed he could provide such a great explanation and at the same time staying in bounds of not calling the author plain wrong. I want to provide my 0.02$ here too, a little, after reading the same book and having to scratch my head twice.
First of all, there is no "remembering" of which thread executed which stage, neither does the specification make such a statement (as already answered above). The interesting part is even in the cited above documentation:
Actions supplied for dependent completions of non-async methods may be performed by the thread that completes the current CompletableFuture, or by any other caller of a completion method.
Even that ...completes the current CompletableFuture part is tricky. What if there are two threads that try to call complete on a CompletableFuture, which thread will run all the dependent actions? The one that has actually completed it? Or any other? I wrote a jcstress test that is very non-intuitive when looking at the results:
@JCStressTest
@State
@Outcome(id = "1, 0", expect = Expect.ACCEPTABLE, desc = "executed in completion thread")
@Outcome(id = "0, 1", expect = Expect.ACCEPTABLE, desc = "executed in the other thread")
@Outcome(id = "0, 0", expect = Expect.FORBIDDEN)
@Outcome(id = "1, 1", expect = Expect.FORBIDDEN)
public class CompletableFutureWhichThread1 {
private final CompletableFuture<String> future = new CompletableFuture<>();
public CompletableFutureWhichThread1() {
future.thenApply(x -> action(Thread.currentThread().getName()));
}
volatile int x = -1; // different default to not mess with the expected result
volatile int y = -1; // different default to not mess with the expected result
volatile int actor1 = 0;
volatile int actor2 = 0;
private String action(String threadName) {
System.out.println(Thread.currentThread().getName());
// same thread that completed future, executed action
if ("actor1".equals(threadName) && actor1 == 1) {
x = 1;
return "action";
}
// same thread that completed future, executed action
if ("actor2".equals(threadName) && actor2 == 1) {
x = 1;
return "action";
}
y = 1;
return "action";
}
@Actor
public void actor1() {
Thread.currentThread().setName("actor1");
boolean completed = future.complete("done-actor1");
if (completed) {
actor1 = 1;
} else {
actor2 = 1;
}
}
@Actor
public void actor2() {
Thread.currentThread().setName("actor2");
boolean completed = future.complete("done-actor2");
if (completed) {
actor2 = 1;
}
}
@Arbiter
public void arbiter(II_Result result) {
if (x == 1) {
result.r1 = 1;
}
if (y == 1) {
result.r2 = 1;
}
}
}
After running this, both 0, 1 and 1, 0 are seen. You do not need to understand very much about the test itself, but it proves a rather interesting point.
You have a CompletableFuture future that has a future.thenApply(x -> action(...)); attached to it. There are two threads (actor1 and actor2) that both, at the same time, compete with each other to complete it (the specification says that only one will be successful). The results show that if actor1 called complete, but did not actually complete the CompletableFuture (actor2 did), it can still do the actual work in action. In other words, the thread that completed a CompletableFuture is not necessarily the thread that executes the dependent actions (the thenApply function, for example). This was rather interesting for me to find out, though it makes sense.
Your reasoning about speed is a bit off. When you dispatch your work to a different thread, you usually pay a penalty for that. thenCompose vs thenComposeAsync is about being able to predict where exactly your work is going to happen. As you have seen above, you cannot do that unless you use the ...Async methods that take a thread pool. Your natural question should be: "Why do I care where it is executed?".
There is an internal class in jdk's HttpClient called SelectorManager. It has (from a high level) a rather simple task: it reads from a socket and gives "responses" back to the threads that wait for a http result. In essence, this is a thread that wakes up all interested parties that wait for some http packets. Now imagine that this particular thread does internally thenCompose. Now also imagine that your chain of calls looks like this:
httpClient.sendAsync(() -> ...)
.thenApply(x -> foo())
where foo is a method that never finishes (or takes a lot of time to finish). Since you have no idea in which thread the actual execution is going to happen, it can very well happen in the SelectorManager thread, which would be a disaster. Every other HTTP call would stall, because this thread is busy now. Thus thenComposeAsync: let the configured pool do the work/waiting if needed, while the SelectorManager thread is free to do its work.
So the reasons that the author gives are plain wrong.

Java CompletableFuture assigning executor

I am confused about how to specify the executor for a CompletableFuture. I am not sure how to tell a CompletableFuture to run on a particular executor. Thanks in advance.
//Suppose I have an executor
ExecutorService myExecutor=Executors.newFixedThreadPool(2);
//If I create a future like this
CompletableFuture.runAsync(() -> {
//Do something
}, myExecutor); // I can pass the executor here to tell the future to use it
//But I do not know where to put executor if I create my future in method style like this
private final CompletableFuture<Void> myMethod(String something) {
//Do something
return null;
}
//and use it like this
.thenCompose(this::myMethod); //How can I specify the executor in this case?
In your example, you have 3 CompletableFutures that are at play:
the one returned by runAsync()
the one returned by myMethod()
the one returned by thenCompose()
You also have 4 tasks that need to be run:
the one passed to runAsync() will be executed on the given executor and handle future 1;
the one that calls myMethod() from thenCompose() to create future 2 can be run on any executor, use thenComposeAsync() to explicitly choose one;
the one that will complete future 2 returned by myMethod() – this will be controlled inside myMethod() itself;
the one that will complete future 3 returned by thenCompose() – this is handled internally and will depend on execution order (e.g. if myMethod() returns an already completed future, it will also complete the former).
As you can see, several tasks and executors are involved, but you can always control the executors used in dependent stages using *Async() variants. The only case where you don't really control it is the 4th case, but it is a cheap operation as long as dependent stages use the *Async() variants as well.
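For the second task in that list, a sketch with thenComposeAsync could look like this. The class, the thread name and the body of myMethod are hypothetical; myMethod merely reports the thread it was called on.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; names are illustrative only.
class ComposeExecutorSketch {

    // Stand-in for the asker's myMethod: returns a future holding the name
    // of the thread that invoked it.
    static CompletableFuture<String> myMethod(String something) {
        return CompletableFuture.completedFuture(Thread.currentThread().getName());
    }

    static String whereDidMyMethodRun() {
        ExecutorService myExecutor =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "my-executor-thread"));
        try {
            return CompletableFuture.supplyAsync(() -> "input") // runs in the common pool
                    // The call to myMethod itself is submitted to myExecutor.
                    .thenComposeAsync(ComposeExecutorSketch::myMethod, myExecutor)
                    .join();
        } finally {
            myExecutor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(whereDidMyMethodRun());
    }
}
```

Whatever asynchronous work myMethod starts internally still uses whichever executor myMethod chooses; the executor passed to thenComposeAsync only governs the call to myMethod itself.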
You can do something like this:
ExecutorService es = Executors.newFixedThreadPool(4);
List<Runnable> tasks = getTasks();
CompletableFuture<?>[] futures = tasks.stream()
.map(task -> CompletableFuture.runAsync(task, es))
.toArray(CompletableFuture[]::new);
CompletableFuture.allOf(futures).join();
es.shutdown();

How to cancel Java 8 completable future?

I am playing with Java 8 completable futures. I have the following code:
CountDownLatch waitLatch = new CountDownLatch(1);
CompletableFuture<?> future = CompletableFuture.runAsync(() -> {
try {
System.out.println("Wait");
waitLatch.await(); //cancel should interrupt
System.out.println("Done");
} catch (InterruptedException e) {
System.out.println("Interrupted");
throw new RuntimeException(e);
}
});
sleep(10); //give it some time to start (ugly, but works)
future.cancel(true);
System.out.println("Cancel called");
assertTrue(future.isCancelled());
assertTrue(future.isDone());
sleep(100); //give it some time to finish
Using runAsync I schedule execution of code that waits on a latch. Next I cancel the future, expecting an InterruptedException to be thrown inside. But it seems that the thread remains blocked on the await call and the InterruptedException is never thrown, even though the future is cancelled (the assertions pass). Equivalent code using ExecutorService works as expected. Is it a bug in CompletableFuture or in my example?
When you call CompletableFuture#cancel, you only stop the downstream part of the chain. The upstream part, i.e. something that will eventually call complete(...) or completeExceptionally(...), doesn't get any signal that the result is no longer needed.
What are those 'upstream' and 'downstream' things?
Let's consider the following code:
CompletableFuture
.supplyAsync(() -> "hello") //1
.thenApply(s -> s + " world!") //2
.thenAccept(s -> System.out.println(s)); //3
Here, the data flows from top to bottom: it is created by the supplier, modified by the function, and consumed by println. The part above a particular step is called upstream, and the part below it is downstream. E.g. steps 1 and 2 are upstream of step 3.
Here's what happens behind the scenes. This is not precise; rather, it's a convenient mental model of what's going on.
Supplier (step 1) is being executed (inside the JVM's common ForkJoinPool).
The result of the supplier is then being passed by complete(...) to the next CompletableFuture downstream.
Upon receiving the result, that CompletableFuture invokes next step - a function (step 2) which takes in previous step result and returns something that will be passed further, to the downstream CompletableFuture's complete(...).
Upon receiving the step 2 result, the step 3 CompletableFuture invokes the consumer, System.out.println(s). After the consumer has finished, the downstream CompletableFuture will receive its value, (Void) null.
As we can see, each CompletableFuture in this chain has to know which downstream futures are waiting for the value to be passed to their complete(...) (or completeExceptionally(...)). But a CompletableFuture doesn't have to know anything about its upstream (or upstreams, as there might be several).
Thus, calling cancel() upon step 3 doesn't abort steps 1 and 2, because there's no link from step 3 to step 2.
The assumption is that if you're using CompletableFuture, your steps are small enough that there's no harm if a couple of extra steps get executed.
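The behaviour described above can be observed with a small sketch. The latches and the class name are assumptions added for illustration; they only make the timing deterministic.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class DownstreamCancel {

    // Returns true if the upstream supplier ran to the end although the
    // downstream future was cancelled while the supplier was still running.
    static boolean upstreamFinishedAfterCancel() throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch upstreamDone = new CountDownLatch(1);

        CompletableFuture<String> upstream = CompletableFuture.supplyAsync(() -> {
            started.countDown();
            try {
                release.await(); // stays blocked until the test releases it
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            upstreamDone.countDown();
            return "hello";
        });
        CompletableFuture<String> downstream = upstream.thenApply(s -> s + " world!");

        started.await();
        downstream.cancel(true); // affects only the downstream future
        release.countDown();     // the supplier is still alive and continues

        boolean finished = upstreamDone.await(5, TimeUnit.SECONDS);
        return finished && downstream.isCancelled() && !upstream.isCancelled();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(upstreamFinishedAfterCancel());
    }
}
```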
If you want cancellation to be propagated upstream, you have two options:
Implement this yourself: create a dedicated CompletableFuture (named, say, cancelled) which is checked after every step (something like step.applyToEither(cancelled, Function.identity()))
Use a reactive stack like RxJava 2, Project Reactor (Flux) or Akka Streams
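A rough sketch of the first option, assuming every step in the chain produces a String so that one shared cancelled future can be raced against each step. The class name, the latch and the step2Ran flag are illustrative only. Note that the supplier that is already running is not stopped; only the steps after the applyToEither(...) are skipped.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

public class CancelSignal {

    // Returns true if cancelling the shared signal failed the chain
    // without the second step's function ever being invoked.
    static boolean laterStepsSkipped() {
        CompletableFuture<String> cancelled = new CompletableFuture<>(); // shared cancel signal
        CountDownLatch release = new CountDownLatch(1);
        AtomicBoolean step2Ran = new AtomicBoolean(false);

        CompletableFuture<String> step1 = CompletableFuture.supplyAsync(() -> {
            try {
                release.await(); // simulates a step that is still busy
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            return "hello";
        }).applyToEither(cancelled, Function.identity()); // whichever completes first wins

        CompletableFuture<String> step2 = step1.thenApply(s -> {
            step2Ran.set(true);
            return s + " world!";
        });

        cancelled.cancel(false); // step1 and step2 now complete exceptionally
        release.countDown();     // the supplier still finishes; its result is discarded

        return step2.isCompletedExceptionally() && !step2Ran.get();
    }

    public static void main(String[] args) {
        System.out.println(laterStepsSkipped());
    }
}
```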
Apparently, it's intentional. The Javadoc for the method CompletableFuture::cancel states:
[Parameters:] mayInterruptIfRunning - this value has no effect in this implementation because interrupts are not used to control processing.
Interestingly, the method ForkJoinTask::cancel uses almost the same wording for the parameter mayInterruptIfRunning.
I have a guess on this issue:
interruption is intended to be used with blocking operations, like sleep, wait or I/O operations,
but neither CompletableFuture nor ForkJoinTask are intended to be used with blocking operations.
Instead of blocking, a CompletableFuture should create a new CompletionStage, and CPU-bound tasks are a prerequisite for the fork-join model. So, using interruption with either of them would defeat their purpose. On the other hand, it might add complexity that is not required if they are used as intended.
If you actually want to be able to interrupt a running task, then you have to use a plain Future (e.g. as returned by ExecutorService.submit(Callable<T>)), not a CompletableFuture. As pointed out in the answer by nosid, CompletableFuture ignores the mayInterruptIfRunning argument of cancel(true): the future is marked as cancelled, but the running task is not interrupted.
My suspicion is that the JDK team did not implement interruption because:
Interruption was always hacky, difficult for people to understand, and difficult to work with. The Java I/O system is not even interruptible, despite calls to InputStream.read() being blocking calls! (And the JDK team have no plans to make the standard I/O system interruptible again, like it was in the very early Java days.)
The JDK team have been trying very hard to phase out old broken APIs from the early Java days, such as Object.finalize(), Object.wait(), Thread.stop(), etc. I believe Thread.interrupt() is considered to be in the category of things that must be eventually deprecated and replaced. Therefore, newer APIs (like ForkJoinPool and CompletableFuture) are already not supporting it.
CompletableFuture was designed for building DAG-structured pipelines of operations, similar to the Java Stream API. It's very difficult to succinctly describe how interruption of one node of a dataflow DAG should affect execution in the rest of the DAG. (Should all concurrent tasks be canceled immediately, when any node is interrupted?)
I suspect the JDK team just didn't want to deal with getting interruption right, given the levels of internal complexity that the JDK and libraries have reached these days. (The internals of the lambda system -- ugh.)
One very hacky way around this would be to have each CompletableFuture export a reference to itself to an externally-visible AtomicReference, then the Thread reference could be interrupted directly when needed from another external thread. Or if you start all the tasks using your own ExecutorService, in your own ThreadPool, you can manually interrupt any or all the threads that were started, even if CompletableFuture refuses to trigger interruption via cancel(true). (Note though that CompletableFuture lambdas cannot throw checked exceptions, so if you have an interruptible wait in a CompletableFuture, you'll have to re-throw as an unchecked exception.)
More simply, you could just declare an AtomicReference<Boolean> cancel = new AtomicReference<>() in an external scope, and periodically check this flag from inside each CompletableFuture task's lambda.
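A minimal sketch of the flag approach. It swaps in AtomicBoolean for the AtomicReference<Boolean> mentioned above, since it expresses the same idea more directly; the class name, the loop body and the latch are assumptions for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class CancelFlag {

    // Starts a task that loops until the external flag is set and
    // returns the number of iterations it managed to do.
    static long runUntilCancelled() throws InterruptedException {
        AtomicBoolean cancel = new AtomicBoolean(false);
        CountDownLatch started = new CountDownLatch(1);

        CompletableFuture<Long> future = CompletableFuture.supplyAsync(() -> {
            long iterations = 0;
            started.countDown();
            while (!cancel.get()) { // periodic check of the external flag
                iterations++;       // stands in for one unit of real work
            }
            return iterations;
        });

        started.await();
        cancel.set(true);     // ask the task to stop
        return future.join(); // the task ends normally after its next check
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Iterations before stop: " + runUntilCancelled());
    }
}
```

The task completes normally here rather than being marked as cancelled; what to return or throw once the flag is seen is a design choice left to the task.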
You could also try setting up a DAG of Future instances rather than a DAG of CompletableFuture instances, that way you can exactly specify how exceptions and interruption/cancellation in any one task should affect the other currently-running tasks. I show how to do this in my example code in my question here, and it works well, but it's a lot of boilerplate.
You need an alternative implementation of CompletionStage to accomplish true thread interruption. I've just released a small library that serves exactly this purpose - https://github.com/vsilaev/tascalate-concurrent
The call to wait will still block even if Future.cancel(..) is called. As mentioned by others the CompletableFuture will not use interrupts to cancel the task.
According to the javadoc of CompletableFuture.cancel(..):
mayInterruptIfRunning this value has no effect in this implementation because interrupts are not used to control processing.
Even if the implementation did trigger an interrupt, the task would only notice it when it reaches a blocking operation or checks the status via Thread.interrupted().
Instead of interrupting the Thread, which might not always be easy to do, you may have checkpoints in your operation where you can gracefully terminate the current task. This can be done in a loop over the elements to be processed, or you can check the cancel status before each step of the operation and throw a CancellationException yourself.
The tricky part is to get a reference of the CompletableFuture within the task in order to call Future.isCancelled(). Here is an example of how it can be done:
public abstract class CancelableTask<T> {
    private CompletableFuture<T> task;

    // Runs compute() and routes any failure into the future.
    private T run() {
        try {
            return compute();
        } catch (Throwable e) {
            task.completeExceptionally(e);
        }
        return null; // ignored: the future is already completed exceptionally
    }

    protected abstract T compute() throws Exception;

    // Call this from compute() at suitable checkpoints.
    protected boolean isCancelled() {
        Future<T> future = task;
        return future != null && future.isCancelled();
    }

    public Future<T> start() {
        synchronized (this) {
            if (task != null) throw new IllegalStateException("Task already started.");
            task = new CompletableFuture<>();
        }
        return task.completeAsync(this::run); // completeAsync requires Java 9+
    }
}
Edit: Here is the improved CancelableTask version as a static factory:
public static <T> CompletableFuture<T> supplyAsync(Function<Future<T>, T> operation) {
CompletableFuture<T> future = new CompletableFuture<>();
return future.completeAsync(() -> operation.apply(future));
}
Here is the test method:
@Test
void testFuture() throws InterruptedException {
CountDownLatch started = new CountDownLatch(1);
CountDownLatch done = new CountDownLatch(1);
AtomicInteger counter = new AtomicInteger();
Future<Object> future = supplyAsync(task -> {
started.countDown();
while (!task.isCancelled()) {
System.out.println("Count: " + counter.getAndIncrement());
}
System.out.println("Task cancelled");
done.countDown();
return null;
});
// wait until the task is started
assertTrue(started.await(5, TimeUnit.SECONDS));
future.cancel(true);
System.out.println("Cancel called");
assertTrue(future.isCancelled());
assertTrue(future.isDone());
assertTrue(done.await(5, TimeUnit.SECONDS));
}
If you really want to use interrupts in addition to the CompletableFuture, then you can pass a custom Executor to CompletableFuture.completeAsync(..) where you create your own Thread, override cancel(..) in the CompletableFuture and interrupt your Thread.
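One possible shape of that idea, as a sketch only: a subclass that starts its own thread through the Executor passed to completeAsync(..) (Java 9+) and interrupts that thread in cancel(..). The class name InterruptibleFuture, the start(..) factory and the demo method are hypothetical, and this is not how the JDK itself behaves.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class InterruptibleFuture<T> extends CompletableFuture<T> {
    private volatile Thread worker;

    // Hypothetical factory: runs the supplier on a dedicated thread that is remembered.
    public static <T> InterruptibleFuture<T> start(Supplier<T> supplier) {
        InterruptibleFuture<T> future = new InterruptibleFuture<>();
        future.completeAsync(supplier, command -> {
            Thread t = new Thread(command, "interruptible-worker");
            t.setDaemon(true);
            future.worker = t; // set before the thread is started
            t.start();
        });
        return future;
    }

    @Override
    public boolean cancel(boolean mayInterruptIfRunning) {
        boolean cancelled = super.cancel(mayInterruptIfRunning);
        Thread t = worker;
        if (cancelled && mayInterruptIfRunning && t != null) {
            t.interrupt(); // the part that CompletableFuture itself leaves out
        }
        return cancelled;
    }

    // Returns true if cancel(true) really interrupted the blocked task.
    static boolean demo() throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch interrupted = new CountDownLatch(1);
        CountDownLatch never = new CountDownLatch(1);

        InterruptibleFuture<String> future = start(() -> {
            started.countDown();
            try {
                never.await(); // blocks until interrupted
                return "done";
            } catch (InterruptedException e) {
                interrupted.countDown();
                throw new IllegalStateException(e);
            }
        });

        started.await();
        future.cancel(true);
        return future.isCancelled() && interrupted.await(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```

Dependent stages created from such a future (thenApply and so on) are plain CompletableFuture instances unless newIncompleteFuture() is overridden as well, so cancelling them would not interrupt anything.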
The CancellationException is part of the internal ForkJoin cancel routine. The exception will come out when you retrieve the result of future:
try { future.get(); }
catch (Exception e){
System.out.println(e.toString());
}
Took a while to see this in a debugger. The JavaDoc is not that clear on what is happening or what you should expect.
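To make the expectation concrete, here is a small sketch of what get() reports after a cancel. The cancelled future itself throws CancellationException, while a stage that depends on it throws ExecutionException with the CancellationException as its cause. The class and method names are assumptions for illustration.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class CancelledGet {

    // Describes how get() ends for the given future.
    static String describe(CompletableFuture<?> future) throws InterruptedException {
        try {
            future.get();
            return "completed normally";
        } catch (CancellationException e) {
            return "CancellationException";
        } catch (ExecutionException e) {
            return "ExecutionException caused by " + e.getCause().getClass().getSimpleName();
        }
    }

    static String[] demo() throws InterruptedException {
        CompletableFuture<String> source = new CompletableFuture<>();
        CompletableFuture<String> dependent = source.thenApply(String::trim);
        source.cancel(true);
        return new String[] { describe(source), describe(dependent) };
    }

    public static void main(String[] args) throws InterruptedException {
        for (String line : demo()) {
            System.out.println(line);
        }
        // CancellationException
        // ExecutionException caused by CancellationException
    }
}
```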
