Ordered execution of many CompletableFuture.allOf() while staying non-blocking - java

I have a case where there are 10 or more tasks that are divided into several groups. Inside each group everything should run concurrently, but because each group needs the results of the previous group (with the exception of the first group), the groups themselves must run in order (tasks inside a group don't need to run in order).
The tasks themselves query data from a database, apply some transformation, and save it back to the database.
Task 1.1 // This group run first
Task 1.2
Task 2.1 // Waiting results from group 1
Task 2.2
Task 2.3
Task 3.1 // Waiting results from group 2
I was thinking of building a list of allOf() futures, iterating over it, and explicitly calling get() on each one, but that would block, which I don't want. So my question is: how can I execute many allOf() in order? Is it even possible using only CompletableFuture?

When you use allOf(), it returns a CompletableFuture that will complete only when all of the given completion stages are completed.
Any stage you chain onto the returned future therefore runs only after all of them have completed, so a call to get() on any of the completion stages passed to allOf() will never block inside that stage (they are already completed).
// First group
CompletableFuture<Integer> task11 = CompletableFuture.supplyAsync(() -> 1);
CompletableFuture<Integer> task12 = CompletableFuture.supplyAsync(() -> 42);
CompletableFuture<Integer> task13 = CompletableFuture.supplyAsync(() -> 1729);
// this one will complete after all tasks from the first group complete
CompletableFuture<Void> allFirstTasks = CompletableFuture.allOf(task11, task12, task13);
// Second group will be child tasks from the first group
CompletableFuture<Integer> task21 = allFirstTasks.thenApply(__ ->
task11.join() + task12.join() + task13.join() // will not block
);
Note: using join() instead of get() to avoid handling of checked exceptions.
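To extend this to any number of groups without a blocking get(), each group's allOf() future can be chained to the next group with thenCompose. The following is only a sketch under assumptions: the class name GroupedTasks, the helpers runGroup, runAll and demoGroups, and the integer "tasks" are hypothetical stand-ins for the real database work.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

final class GroupedTasks {

    // Starts every task of one group concurrently. The returned future completes
    // with all results of the group once the whole group has finished.
    static CompletableFuture<List<Integer>> runGroup(
            List<Integer> previous, List<Function<List<Integer>, Integer>> tasks) {
        List<CompletableFuture<Integer>> started = tasks.stream()
                .map(task -> CompletableFuture.supplyAsync(() -> task.apply(previous)))
                .collect(Collectors.toList());
        return CompletableFuture.allOf(started.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> started.stream()
                        .map(CompletableFuture::join) // already completed, does not block
                        .collect(Collectors.toList()));
    }

    // Chains the groups: group N+1 is only started after group N has completed.
    static CompletableFuture<List<Integer>> runAll(
            List<List<Function<List<Integer>, Integer>>> groups) {
        CompletableFuture<List<Integer>> chain = CompletableFuture.completedFuture(List.of());
        for (List<Function<List<Integer>, Integer>> group : groups) {
            chain = chain.thenCompose(previous -> runGroup(previous, group));
        }
        return chain;
    }

    static int sum(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    // Hypothetical demo data: each task receives the results of the previous group.
    static List<List<Function<List<Integer>, Integer>>> demoGroups() {
        List<Function<List<Integer>, Integer>> group1 =
                List.of(prev -> 1, prev -> 42, prev -> 1729);
        List<Function<List<Integer>, Integer>> group2 =
                List.of(prev -> sum(prev), prev -> sum(prev) * 2);
        List<Function<List<Integer>, Integer>> group3 =
                List.of(prev -> sum(prev));
        return List.of(group1, group2, group3);
    }

    public static void main(String[] args) {
        // join() is used here only so the demo prints before the JVM exits;
        // real code would return the future or chain further stages on it.
        System.out.println(runAll(demoGroups()).join()); // prints [5316]
    }
}
```

The loop only wires up stages; nothing in runAll waits for a result, so the calling thread is free as soon as the chain is built.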

Related

Correct use of CompletableFuture.anyOf()

I want to make parallel calls to different external services and proceed with the first successful response.
Successful here means that the result returned by the service has certain values in certain fields.
However, for CompletableFuture, everything other than an exception is success. So, even for business failures, I have to throw an exception to signal non-success to CompletableFuture. This feels wrong; ideally I would want to provide a boolean to indicate business success/failure. Is there a better way to signal business failures?
The second question I have is: how do I make sure I don't run out of threads due to the abandoned CompletableFutures that keep running even after CompletableFuture.anyOf() returns? Ideally I want to force stop the threads, but as per the thread below, the best I can do is cancel the downstream operations.
How to cancel Java 8 completable future?
When you call CompletableFuture#cancel, you only stop the downstream
part of the chain. Upstream part, i. e. something that will eventually
call complete(...) or completeExceptionally(...), doesn't get any
signal that the result is no more needed.
I can force stop threads by providing my own ExecutorService and calling shutdown()/shutdownNow() on it after CompletableFuture.anyOf() returns.
However I am not sure about the implications of creating a new ExecutorService instance for each request.
This is perhaps an attempt to look away from anyOf(), as there's just no easy way to cancel other tasks with it.
What this is doing is create and start async tasks, then keep a reference to the future along with an object that would be used to effectively terminate other tasks, which of course is dependent on your actual code/task. In this example, I'm just returning the input string (hence the type Pair<CompletableFuture<String>, String>); your code will probably have something like a request object.
ExecutorService exec = Executors.newFixedThreadPool(5);
List<String> source = List.of();
List<Pair<CompletableFuture<String>, String>> futures = source.stream()
.map(s -> Pair.of(CompletableFuture.supplyAsync(() -> s.toUpperCase(), exec), s))
.collect(Collectors.toList());
Consumer<Pair<CompletableFuture<String>, String>> cancelOtherTasksFunction = pair -> {
futures.stream()
.filter(future -> future != pair)
.forEach(future -> {
future.getLeft().cancel(true); // cancel the future
// future.getRight().doSomething() // cancel actual task
});
};
AtomicReference<String> result = new AtomicReference<>();
futures.forEach(future -> future.getLeft()
.thenAccept(s -> {
if (null == result.compareAndExchangeRelease(null, s)) {
// the first success is recorded
cancelOtherTasksFunction.accept(future);
}
}));
I suppose you can (should?) create a class to hold the future and the task object (such as an http request you could cancel) to replace the Pair.
Note:
if (null == result.compareAndExchangeRelease(null, s)) is only valid if your future returns a non-null value.
The first valid response will be in result, but you still need to test for blocking before returning it, although I suppose it should work as other tasks are being cancelled (that's the theory).
You may decide to make futures.forEach part of the stream pipeline above, just be careful to force all tasks to be submitted (which collect(Collectors.toList()) does).
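On the first question (signalling a business failure without throwing), one option is to complete a separate result future only when a response passes the business check. This is a rough sketch, not a drop-in solution: the class FirstAccepted, the method firstMatching and the predicate are hypothetical names, and cancel(false) still does not stop work that is already running, as the quoted thread explains.

```java
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class FirstAccepted {

    // Completes with the first result that passes the business check. If every
    // candidate finishes without an acceptable result, it completes exceptionally.
    static <T> CompletableFuture<T> firstMatching(
            List<CompletableFuture<T>> candidates, Predicate<? super T> isBusinessSuccess) {
        CompletableFuture<T> winner = new CompletableFuture<>();

        // One check stage per candidate; a business failure simply does nothing.
        List<CompletableFuture<Void>> checks = candidates.stream()
                .map(candidate -> candidate.thenAccept(value -> {
                    if (isBusinessSuccess.test(value)) {
                        winner.complete(value); // no effect if a winner already exists
                    }
                }))
                .collect(Collectors.toList());

        // After all checks are done (normally or not), fail unless somebody won.
        CompletableFuture.allOf(checks.toArray(new CompletableFuture<?>[0]))
                .whenComplete((ignored, error) -> winner.completeExceptionally(
                        new NoSuchElementException("no acceptable response")));

        // Once there is a winner, cancel the remaining futures (downstream only).
        winner.thenRun(() -> candidates.forEach(c -> c.cancel(false)));
        return winner;
    }

    public static void main(String[] args) {
        CompletableFuture<String> a = CompletableFuture.supplyAsync(() -> "error: quota");
        CompletableFuture<String> b = CompletableFuture.supplyAsync(() -> "ok: 42");
        // Blocking only so the demo prints before the JVM exits.
        System.out.println(firstMatching(List.of(a, b), s -> s.startsWith("ok")).join());
    }
}
```

The failure branch is attached to the check stages rather than to the candidates themselves, so it cannot fire before the last candidate's check has had a chance to declare a winner.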

Java CompletableFuture allOf approach

I am trying to run 3 operations in parallel using the CompletableFuture approach. Now these 3 operations return different types so need to retrieve the data separately. Here is what i am trying to do:
CompletableFuture<List<A>> aFuture = CompletableFuture.supplyAsync(() -> getAList());
CompletableFuture<Map<String, B>> bFuture = CompletableFuture.supplyAsync(() -> getBMap());
CompletableFuture<Map<String, C>> cFuture = CompletableFuture.supplyAsync(() -> getCMap());
CompletableFuture<Void> combinedFuture =
    CompletableFuture.allOf(aFuture, bFuture, cFuture);
combinedFuture.get(); // or join()
List<A> aData = aFuture.get(); // or join()
Map<String, B> bData = bFuture.get(); // or join()
Map<String, C> cData = cFuture.get(); // or join()
This does the job and works but i am trying to understand if we need to do these gets/joins on combined future as well as individual ones and if there is a better way to do this.
Also, I tried the whenComplete() approach, but the variables I want to assign the returned data to are local to the method, so I get a "The final local variable cannot be assigned, since it is defined in an enclosing type" error, and I don't want to move them to the class level.
looking for some expert/alternate opinions. Thank you in advance
SG
Calling get or join just means “wait for the completion”. It has no influence on the completion itself.
When you call CompletableFuture.supplyAsync(() -> getAList()), this method will submit the evaluation of the supplier to the common pool immediately. The caller’s only influence on the execution of getAList() is the fact that the JVM will terminate if there are no non-daemon threads running. This is a common error in simple test programs, incorporating a main method that doesn’t wait for completion. Otherwise, the execution of getAList() will complete, regardless of whether its result will ever be queried.
So when you use
CompletableFuture<List<A>> aFuture = CompletableFuture.supplyAsync(() -> getAList());
CompletableFuture<Map<String,B>> bFuture=CompletableFuture.supplyAsync(() -> getBMap());
CompletableFuture<Map<String,C>> cFuture=CompletableFuture.supplyAsync(() -> getCMap());
List<A> aData = aFuture.join();
Map<String, B> bData = bFuture.join();
Map<String, C> cData = cFuture.join();
The three consecutive supplyAsync calls allow the three operations to run concurrently. The three join() calls only wait for the results, and when the third join() returns, you know that all three operations have completed. It’s possible that the first join() returns at a time when aFuture has completed but either or both of the other operations are still running; that doesn’t matter for three independent operations.
When you execute CompletableFuture.allOf(aFuture, bFuture, cFuture).join(); before the individual join() calls, it ensures that all three operations completed before the first individual join() call, but as said, it has no impact when all three operations are independent and you’re not relying on some side effect of their execution (which you shouldn’t in general).
The actual purpose of allOf is to construct a new future when you do not want to wait for the result immediately. E.g.
record Result(List<A> aData, Map<String, B> bData, Map<String, C> cData) {}
CompletableFuture<Result> r = CompletableFuture.allOf(aFuture, bFuture, cFuture)
.thenApply(v -> new Result(aFuture.join(), bFuture.join(), cFuture.join()));
// return r or pass it to some other code...
here, the use of allOf is preferable to, e.g.
CompletableFuture<Result> r = CompletableFuture.supplyAsync(
() -> new Result(aFuture.join(), bFuture.join(), cFuture.join()));
because the latter might block a worker thread when join() is called from the supplier. The underlying framework might compensate when it detects this, e.g. start a new thread, but this is still an expensive operation. In contrast, the function chained to allOf is only evaluated after all futures completed, so all embedded join() calls are guaranteed to return immediately.
For a small number of futures, there’s still an alternative to allOf, e.g.
var r = aFuture.thenCompose(a ->
bFuture.thenCombine(cFuture, (b, c) -> new Result(a, b, c)));
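For reference, here is a self-contained version of the allOf variant that can be compiled and run. It is a sketch with made-up data: the class CombineDemo, the method load and the record components are hypothetical placeholders for getAList(), getBMap() and getCMap().

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

final class CombineDemo {

    record Result(List<String> names, Map<String, Integer> ages, Map<String, String> cities) {}

    // Builds the combined future without blocking the calling thread.
    static CompletableFuture<Result> load() {
        CompletableFuture<List<String>> names =
                CompletableFuture.supplyAsync(() -> List.of("ann", "bob"));
        CompletableFuture<Map<String, Integer>> ages =
                CompletableFuture.supplyAsync(() -> Map.of("ann", 31, "bob", 27));
        CompletableFuture<Map<String, String>> cities =
                CompletableFuture.supplyAsync(() -> Map.of("ann", "Oslo", "bob", "Lima"));

        return CompletableFuture.allOf(names, ages, cities)
                // Runs only after all three have completed, so join() returns immediately.
                .thenApply(ignored -> new Result(names.join(), ages.join(), cities.join()));
    }

    public static void main(String[] args) {
        // Blocking here only so the demo prints before the JVM exits.
        Result result = load().join();
        System.out.println(result.names() + " " + result.ages().get("bob"));
    }
}
```

Because the local futures are only read inside the lambda, this also sidesteps the "final local variable cannot be assigned" problem from the question: the values travel in the Result, not through assignments to outer variables.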

Are `thenRunAsync(...)` and `CompletableFuture.runAsync(() -> { ... });` related at all?

I need to perform some extra tasks but let the original thread finish up, e.g. send back an HTTP response.
I think I can just do this:
return mainTasksFuture.thenApply(response -> {
CompletableFuture.runAsync(() -> {
// extra tasks
});
return response;
});
But I remembered there's a thenRunAsync. Is
return mainTasksFuture.thenApply(response -> {
return response;
}).thenRunAsync(() -> {
// extra tasks
});
basically another way to do the same thing? In other words, are the then*Async methods terminators (completion methods) that return the previous chain's result in the original thread, then spawn a new thread to execute the rest?
I'm almost certain the answer is no. It just seems it might be that purely based on method names, to someone new to CompletableFutures. I wanted a confirmation though, in case what I'm reading about ForkJoinPool.commonPool is actually saying what I'm doubting, just in a different way.
You wrote
It just *seems* it might be that purely based on method names, to someone new to CompletableFutures.
Well, the method names correctly reflect what the methods do. Both, runAsync and thenRunAsync initiate the asynchronous execution of a Runnable and return a future, which will be completed when the asynchronous execution has finished. So the similarity in the names is justified.
It’s your code which is fundamentally different.
In this variant
return mainTasksFuture.thenApply(response -> {
CompletableFuture.runAsync(() -> {
// extra tasks
});
return response;
});
you are ignoring the future returned by runAsync entirely, so the future returned by thenApply will be completed as soon as the asynchronous operation has been triggered. The caller can retrieve the result value while the “extra tasks” are still running concurrently.
In contrast, with
return mainTasksFuture.thenApply(response -> {
return response;
}).thenRunAsync(() -> {
// extra tasks
});
the thenApply is entirely obsolete as it doesn’t do anything. But you are returning the future returned by thenRunAsync, which will be completed when the asynchronous execution of the Runnable has finished and has the type CompletableFuture<Void>, as the runnable does not produce a value (the future will be completed with null). In the exceptional case, it would get completed with the exception of mainTasksFuture, but in the successful case, it does not pass through the result value.
If the first variant matches your actual intention (the caller should not depend on the completion of the extra tasks), simply don’t model them as a dependency:
mainTasksFuture.thenRunAsync(() -> {
// extra tasks
});
return mainTasksFuture; // does not depend on the completion of extra task
Otherwise, stay with variant 2 (minus obsolete things)
return mainTasksFuture.thenRunAsync(() -> {
// extra tasks
}); // depends on the completion of extra task but results in (Void)null
if you don’t need the result value. Otherwise, you can use
return mainTasksFuture.thenApplyAsync(response -> {
// extra tasks
return response;
}); // depends on the completion of extra task and returns original result
it would be the same as with
return mainTasksFuture.thenCompose(response ->
CompletableFuture.runAsync(() -> {
// extra tasks
}).thenApply(_void -> response));
which does not ignore the future returned by runAsync, but there’s no advantage in this complication, compared to thenApplyAsync.
Another alternative would be
return mainTasksFuture.whenComplete((response,failure) -> {
if(failure == null) {
// extra tasks
}
});
as the future returned by whenComplete will get completed with the original future’s result when the extra tasks have been completed. But the function is always evaluated, even when the original future completed exceptionally, so it needs another conditional if that’s not desired.
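The three main variants can be compared side by side in a small program. This is only an illustrative sketch: the class ExtraTaskVariants and its method names are hypothetical, and the "extra task" is reduced to a Runnable.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

final class ExtraTaskVariants {

    // Variant A: the extra task is a side branch; the caller gets the original future.
    static CompletableFuture<String> sideBranch(CompletableFuture<String> mainTasks, Runnable extra) {
        mainTasks.thenRunAsync(extra);
        return mainTasks;
    }

    // Variant B: the returned future waits for the extra task and carries no value.
    static CompletableFuture<Void> dependent(CompletableFuture<String> mainTasks, Runnable extra) {
        return mainTasks.thenRunAsync(extra);
    }

    // Variant C: waits for the extra task and still passes the response through.
    static CompletableFuture<String> passThrough(CompletableFuture<String> mainTasks, Runnable extra) {
        return mainTasks.thenApplyAsync(response -> {
            extra.run();
            return response;
        });
    }

    public static void main(String[] args) {
        AtomicBoolean ran = new AtomicBoolean();
        CompletableFuture<String> mainTasks = CompletableFuture.supplyAsync(() -> "response");

        System.out.println(dependent(mainTasks, () -> ran.set(true)).join()); // prints null
        System.out.println(ran.get());                                        // prints true
        System.out.println(passThrough(mainTasks, () -> { }).join());         // prints response
        System.out.println(sideBranch(mainTasks, () -> { }).join());          // prints response
    }
}
```

Only variant A lets the caller see the response without waiting for the extra task; B and C both make the returned future depend on it.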
Both runAsync and thenRunAsync execute the Runnable task asynchronously; in the words of the documentation:
executes the given action using this stage's default asynchronous execution facility
Question : In other words, are the then*Async methods terminators (completion methods) that return the previous chain's result in the original thread, then spawn a new thread to execute the rest?
Answer: No. From the documentation: "One stage's execution may be triggered by completion of a single stage, or both of two stages, or either of two stages." So what the caller gets back depends on how the chain is written. In your case (using thenRunAsync), the second stage is triggered only after the first stage completes; its Runnable neither receives the first stage's result nor returns a value, so the future you return carries no result.
Interface CompletionStage
One stage's execution may be triggered by completion of a single stage, or both of two stages, or either of two stages. Dependencies on a single stage are arranged using methods with prefix then. Those triggered by completion of both of two stages may combine their results or effects, using correspondingly named methods. Those triggered by either of two stages make no guarantees about which of the results or effects are used for the dependent stage's computation.
There is also a slight difference between first example and second example
Example 1: The Runnable is started asynchronously from inside the thenApply function, before the result is returned, so the rest of the Function from thenApply and the Runnable from runAsync may execute concurrently.
return mainTasksFuture.thenApply(response -> {
CompletableFuture.runAsync(() -> {
// extra tasks
});
return response;
});
Example 2: The Runnable from thenRunAsync is executed only after the Function from thenApply has completed.
return mainTasksFuture.thenApply(response -> {
return response;
}).thenRunAsync(() -> {
// extra tasks
});

Can we improve performance on lists other than java 8 parallel streams

I have to dump data from somewhere by calling a REST API which returns a List.
First I have to get a List of objects from one REST API. I then used a parallel stream and went through each item with forEach.
For each element I have to call another API to get data, which again returns a list, and save that list by calling yet another REST API.
This takes around 1 hour for the 6000 records from step 1.
I tried like below:
restApiMethodWhichReturns6000Records
.parallelStream().forEach(id ->{
anotherMethodWhichgetsSomeDataAndPostsToOtherRestCall(id);
});
public void anotherMethodWhichgetsSomeDataAndPostsToOtherRestCall(String id) {
restApiToPostData(url,methodThatGetsListOfData(id));
}
parallelStream can cause unexpected behavior sometimes. It uses the common ForkJoinPool, so if you have parallel streams somewhere else in the code, long-running tasks in one stream can hold up the others. Even within the same stream, if some tasks take a long time, all the worker threads can end up blocked.
There is a good discussion of this on Stack Overflow, including some tricks for assigning a task-specific ForkJoinPool.
First of all, make sure your REST service is non-blocking.
One more thing you can do is to play with the pool size by supplying -Djava.util.concurrent.ForkJoinPool.common.parallelism=4 to the JVM.
If the API calls are blocking, then even when you run them in parallel you will only be able to do a few calls at the same time.
I would try out a solution using CompletableFuture.
The code would be something like this:
List<CompletableFuture<?>> apiCallsFutures = restApiMethodWhichReturns6000Records
    .stream()
    .map(id -> CompletableFuture.supplyAsync(() -> getListOfData(id)) // map the "get list of data" call to a CompletableFuture
        .thenApply(listOfData -> callAPItoPOSTData(url, listOfData))) // when the get call is complete, the POST call can be performed
    .collect(Collectors.toList());
CompletableFuture<?>[] completableFutures = apiCallsFutures.toArray(new CompletableFuture<?>[0]); // CompletableFuture.allOf accepts only arrays :(
CompletableFuture<Void> all = CompletableFuture.allOf(completableFutures); // combine all the futures
all.get(); // wait for all calls to complete (they were already started by supplyAsync)
For more details about CompletableFutures, have a look over: https://www.baeldung.com/java-completablefuture
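Since the REST calls are most likely blocking, the common pool (sized to the number of CPU cores) limits how many run at once. A dedicated executor passed to supplyAsync lifts that limit. The sketch below is an assumption-laden illustration: BulkTransfer, fetchData, postData and the pool size are hypothetical, and the two helpers merely stand in for the real REST calls.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

final class BulkTransfer {

    // Hypothetical stand-in for the blocking "get list of data" REST call.
    static List<String> fetchData(String id) {
        return List.of(id + "-a", id + "-b");
    }

    // Hypothetical stand-in for the blocking POST call; returns the number of items sent.
    static int postData(List<String> data) {
        return data.size();
    }

    // Runs fetch + post for every id on a dedicated pool and returns the total posted.
    static int transferAll(List<String> ids, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Integer>> calls = ids.stream()
                    .map(id -> CompletableFuture
                            .supplyAsync(() -> fetchData(id), pool)
                            .thenApplyAsync(BulkTransfer::postData, pool))
                    .collect(Collectors.toList());
            // Wait once, at the very end, for the whole batch.
            CompletableFuture.allOf(calls.toArray(new CompletableFuture<?>[0])).join();
            return calls.stream().mapToInt(CompletableFuture::join).sum();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(transferAll(List.of("1", "2", "3"), 2)); // prints 6
    }
}
```

The right pool size depends on how many concurrent requests the remote services tolerate, so it is worth measuring rather than guessing.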

CompletableFuture single task that continues with many parallel tasks

I have the following code:
return CompletableFuture.supplyAsync(() -> {
return foo; // some custom object
})
.thenAccept(foo -> {
// ??? need to spawn N async parallel jobs that works on 'foo'
});
In English: the first task creates the foo object asynchronously, and then I need to run N parallel processes on it.
Is there a better way to do this than:
...
CompletableFuture[] parallel = new CompletableFuture[N];
for (int i = 0; i < N; i++) {
parallel[i] = CompletableFuture.runAsync(() -> {
work(foo);
});
}
CompletableFuture.allOf(parallel).join();
...
I don't like this as one thread gets locked while waiting N jobs to finish.
You can chain as many independent jobs as you like on a particular prerequisite job, e.g.
CompletableFuture<Foo> base=CompletableFuture.supplyAsync(() -> new Foo());
Collections.nCopies(N, base).forEach(f -> f.thenAcceptAsync(foo -> work(foo)));
will spawn N parallel jobs, invoking work(foo) concurrently, after the completion of the initial job which provides the Foo instance.
But keep in mind, that the underlying framework will consider the number of available CPU cores to size the thread pool actually executing the parallel jobs, so if N > #cores, some of these jobs may run one after another.
If the work is I/O bound, and you therefore want a higher number of parallel threads, you have to specify your own executor.
The nCopies/forEach is not necessary, a for loop would do as well, but it provides a hint of how to handle subsequent jobs, that depend on the completion of all these parallel jobs:
CompletableFuture<Foo> base=CompletableFuture.supplyAsync(() -> new Foo());
CompletableFuture<Void> all = CompletableFuture.allOf(
Collections.nCopies(N, base).stream()
.map(f -> f.thenAcceptAsync(foo -> work(foo)))
.toArray(CompletableFuture<?>[]::new));
Now you can use all to check for the completion of all jobs or chain additional actions.
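Putting the two remarks together (own executor, follow-up chained on all), a runnable sketch could look like this. The names FanOut and fanOut, the counter-based "work" and the pool size are assumptions made for illustration only.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

final class FanOut {

    // Registers `jobs` dependent jobs that start once `base` completes. The returned
    // future completes with the accumulated value after all of them have finished.
    static CompletableFuture<Integer> fanOut(
            CompletableFuture<Integer> base, int jobs, ExecutorService pool) {
        AtomicInteger counter = new AtomicInteger();
        CompletableFuture<?>[] parallel = IntStream.range(0, jobs)
                .mapToObj(i -> base.thenAcceptAsync(value -> counter.addAndGet(value), pool))
                .toArray(CompletableFuture<?>[]::new);
        // Chained instead of joined: no thread waits for the N jobs.
        return CompletableFuture.allOf(parallel).thenApply(ignored -> counter.get());
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        CompletableFuture<Integer> base = CompletableFuture.supplyAsync(() -> 5, pool);
        fanOut(base, 10, pool)
                .thenAccept(total -> System.out.println("total = " + total)) // total = 50
                .join(); // only so the demo finishes before the pool is shut down
        pool.shutdown();
    }
}
```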
Since CompletableFuture.allOf already returns another CompletableFuture<Void>, you can just chain another .thenAccept on it and extract the returned values from the individual futures in the callback; that way you avoid a blocking call to join.
