So I have a get operation where a user supplies a list of items.
For each item in the list I need to make an API call which is what takes the most time in my application.
Let's say the flow is List -> Operation A -> Operation B (API call) -> Operation C.
So instead of processing all the items in the list sequentially, I was wondering whether I could create multiple threads. All the threads would have to finish Operation B before Operation C starts. Operations A and C don't need to be multithreaded.
Is there a straightforward way to do this?
You could try to use CompletableFuture for this, something along the lines of:
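ExecutorService executorService = Executors.newFixedThreadPool(3);
CompletableFuture<?>[] all =
    yourList.stream()
        // run each stage on the dedicated pool rather than the common pool
        .map(x -> CompletableFuture.supplyAsync(() -> operationA(x), executorService)
            .thenApplyAsync(YourClass::operationB, executorService)
            .thenApplyAsync(YourClass::operationC, executorService))
        .toArray(CompletableFuture[]::new);
CompletableFuture.allOf(all).join(); // wait until every chain has finished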
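Note that a per-item chain like this starts operation C for an item as soon as that item's operation B has finished. If C really must wait until every B has finished, as the question asks, one option is to keep A and C on the calling thread and run only B in parallel. This is a minimal sketch, not a definitive implementation: operationA, operationB and operationC are hypothetical stand-ins that just transform strings, and the pool size of 3 is arbitrary.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class BarrierBeforeC {
    // Hypothetical stand-ins for the three operations in the question.
    static String operationA(String item) { return item.trim(); }
    static String operationB(String item) { return item.toUpperCase(); } // imagine a slow API call
    static String operationC(String item) { return item + "!"; }

    static List<String> process(List<String> items) {
        ExecutorService pool = Executors.newFixedThreadPool(3); // arbitrary size
        try {
            // A runs sequentially on the caller; B is submitted to the pool per item.
            List<CompletableFuture<String>> bResults = items.stream()
                    .map(BarrierBeforeC::operationA)
                    .map(a -> CompletableFuture.supplyAsync(() -> operationB(a), pool))
                    .collect(Collectors.toList());

            // Barrier: block until every B has completed.
            CompletableFuture.allOf(bResults.toArray(new CompletableFuture[0])).join();

            // C runs sequentially on the caller; join() returns immediately here.
            return bResults.stream()
                    .map(CompletableFuture::join)
                    .map(BarrierBeforeC::operationC)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }
}
```

The result list keeps the order of the input list, because the futures are collected in encounter order regardless of which API call finishes first.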
I want to make parallel calls to different external services and proceed with the first successful response.
Successful here means that the result returned by the service has certain values in certain fields.
However, for CompletableFuture, everything other than an exception is success. So, even for business failures, I have to throw an exception to signal non-success to CompletableFuture. This feels wrong; ideally I would want to provide a boolean to indicate business success/failure. Is there a better way to signal business failures?
The second question I have is: how do I make sure I don't run out of threads due to the abandoned CompletableFutures that would keep running even after CompletableFuture.anyOf() returns? Ideally I want to force stop the threads, but as per the thread below, the best I can do is cancel the downstream operations.
How to cancel Java 8 completable future?
When you call CompletableFuture#cancel, you only stop the downstream
part of the chain. Upstream part, i. e. something that will eventually
call complete(...) or completeExceptionally(...), doesn't get any
signal that the result is no more needed.
I can force stop threads by providing my own ExecutorService and calling shutdown()/shutdownNow() on it after CompletableFuture.anyOf() returns.
However I am not sure about the implications of creating a new ExecutorService instance for each request.
This is perhaps an attempt to look away from anyOf(), as there's just no easy way to cancel other tasks with it.
What this is doing is create and start async tasks, then keep a reference to the future along with an object that would be used to effectively terminate other tasks, which of course is dependent on your actual code/task. In this example, I'm just returning the input string (hence the type Pair<CompletableFuture<String>, String>); your code will probably have something like a request object.
ExecutorService exec = Executors.newFixedThreadPool(5);
List<String> source = List.of();
List<Pair<CompletableFuture<String>, String>> futures = source.stream()
.map(s -> Pair.of(CompletableFuture.supplyAsync(() -> s.toUpperCase(), exec), s))
.collect(Collectors.toList());
Consumer<Pair<CompletableFuture<String>, String>> cancelOtherTasksFunction = pair -> {
futures.stream()
.filter(future -> future != pair)
.forEach(future -> {
future.getLeft().cancel(true); // cancel the future
// future.getRight().doSomething() // cancel actual task
});
};
AtomicReference<String> result = new AtomicReference<>();
futures.forEach(future -> future.getLeft()
.thenAccept(s -> {
if (null == result.compareAndExchangeRelease(null, s)) {
// the first success is recorded
cancelOtherTasksFunction.accept(future);
}
}));
I suppose you can (should?) create a class to hold the future and the task object (such as an http request you could cancel) to replace the Pair.
Note:
if (null == result.compareAndExchangeRelease(null, s)) is only valid if your future returns a non-null value.
The first valid response will be in result, but you still need to test for blocking before returning it, although I suppose it should work as other tasks are being cancelled (that's the theory).
You may decide to make futures.forEach part of the stream pipeline above, just be careful to force all tasks to be submitted (which collect(Collectors.toList()) does).
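The same idea can also cover the first question (signalling a business failure without throwing): complete a separate result future only when a response passes a business check. The following is a sketch under assumptions, not a definitive implementation. The method name firstMatching, the isSuccess predicate and the Supplier-based calls are all hypothetical stand-ins for the real service calls.

```java
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class FirstBusinessSuccess {
    // Hypothetical helper: completes with the first response accepted by isSuccess.
    static <T> CompletableFuture<T> firstMatching(
            List<Supplier<T>> calls, Predicate<T> isSuccess, Executor exec) {
        CompletableFuture<T> result = new CompletableFuture<>();
        if (calls.isEmpty()) {
            result.completeExceptionally(new NoSuchElementException("no calls given"));
            return result;
        }
        AtomicInteger remaining = new AtomicInteger(calls.size());
        List<CompletableFuture<T>> futures = calls.stream()
                .map(call -> CompletableFuture.supplyAsync(call, exec))
                .collect(Collectors.toList());
        for (CompletableFuture<T> future : futures) {
            future.whenComplete((value, error) -> {
                // A response counts only if it passes the business check.
                if (error == null && isSuccess.test(value)) {
                    result.complete(value); // no-op if another call already won
                }
                // After the last call finishes, fail the result if nobody won.
                if (remaining.decrementAndGet() == 0) {
                    result.completeExceptionally(
                            new NoSuchElementException("no successful response"));
                }
            });
        }
        // Once decided, cancel the rest; this does not interrupt running tasks.
        result.whenComplete((value, error) -> futures.forEach(f -> f.cancel(true)));
        return result;
    }
}
```

With this shape, a business failure is just a value the predicate rejects, and an exception is reserved for the case where no call produced an acceptable response. As noted in the quoted thread, cancel only marks the futures as cancelled; stopping the underlying work still needs something task-specific, such as aborting the request.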
I am trying to run 3 operations in parallel using the CompletableFuture approach. These 3 operations return different types, so I need to retrieve the data separately. Here is what I am trying to do:
CompletableFuture<List<A>> aFuture = CompletableFuture.supplyAsync(() -> getAList());
CompletableFuture<Map<String, B>> bFuture = CompletableFuture.supplyAsync(() -> getBMap());
CompletableFuture<Map<String, C>> cFuture = CompletableFuture.supplyAsync(() -> getCMap());
CompletableFuture<Void> combinedFuture =
    CompletableFuture.allOf(aFuture, bFuture, cFuture);
combinedFuture.get(); // or join()
List<A> aData = aFuture.get(); // or join()
Map<String, B> bData = bFuture.get(); // or join()
Map<String, C> cData = cFuture.get(); // or join()
This does the job and works, but I am trying to understand whether we need to do these gets/joins on the combined future as well as on the individual ones, and whether there is a better way to do this.
I also tried the whenComplete() approach, but the variables I want to assign the returned data to are inside the method, so I get a "The final local variable cannot be assigned, since it is defined in an enclosing type in Java" error, and I don't want to move them to the class level.
Looking for some expert/alternate opinions. Thank you in advance.
SG
Calling get or join just implies “wait for the completion”. It has no influence on the completion itself.
When you call CompletableFuture.supplyAsync(() -> getAList()), this method will submit the evaluation of the supplier to the common pool immediately. The caller’s only influence on the execution of getAList() is the fact that the JVM will terminate if there are no non-daemon threads running. This is a common error in simple test programs, incorporating a main method that doesn’t wait for completion. Otherwise, the execution of getAList() will complete, regardless of whether its result will ever be queried.
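The test-program pitfall mentioned above can be sketched as follows. The slowTask method is a hypothetical stand-in for getAList(), and the 200 ms delay is arbitrary.

```java
import java.util.concurrent.CompletableFuture;

class WaitInMain {
    // Hypothetical slow task standing in for getAList().
    static String slowTask() {
        try {
            Thread.sleep(200); // arbitrary delay
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "done";
    }

    public static void main(String[] args) {
        // The task is submitted to the common pool right here.
        CompletableFuture<String> future = CompletableFuture.supplyAsync(WaitInMain::slowTask);
        // Without join(), main could return while the task is still running on a
        // daemon worker thread, and the JVM could exit before the task finishes.
        System.out.println(future.join());
    }
}
```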
So when you use
CompletableFuture<List<A>> aFuture = CompletableFuture.supplyAsync(() -> getAList());
CompletableFuture<Map<String,B>> bFuture=CompletableFuture.supplyAsync(() -> getBMap());
CompletableFuture<Map<String,C>> cFuture=CompletableFuture.supplyAsync(() -> getCMap());
List<A> aData = aFuture.join();
Map<String, B> bData = bFuture.join();
Map<String, C> cData = cFuture.join();
The three consecutive supplyAsync calls allow the three operations to run concurrently. The three join() calls only wait for the results, and when the third join() has returned, you know that all three operations have completed. It’s possible that the first join() returns at a time when aFuture has been completed but either or both of the other operations are still running, but that doesn’t matter for three independent operations.
When you execute CompletableFuture.allOf(aFuture, bFuture, cFuture).join(); before the individual join() calls, it ensures that all three operations completed before the first individual join() call, but as said, it has no impact when all three operations are independent and you’re not relying on some side effect of their execution (which you shouldn’t in general).
The actual purpose of allOf is to construct a new future when you do not want to wait for the result immediately. E.g.
record Result(List<A> aData, Map<String, B> bData, Map<String, C> cData) {}
CompletableFuture<Result> r = CompletableFuture.allOf(aFuture, bFuture, cFuture)
.thenApply(v -> new Result(aFuture.join(), bFuture.join(), cFuture.join()));
// return r or pass it to some other code...
Here, the use of allOf is preferable to, e.g.
CompletableFuture<Result> r = CompletableFuture.supplyAsync(
() -> new Result(aFuture.join(), bFuture.join(), cFuture.join()));
because the latter might block a worker thread when join() is called from the supplier. The underlying framework might compensate when it detects this, e.g. start a new thread, but this is still an expensive operation. In contrast, the function chained to allOf is only evaluated after all futures completed, so all embedded join() calls are guaranteed to return immediately.
For a small number of futures, there’s still an alternative to allOf, e.g.
var r = aFuture.thenCompose(a ->
bFuture.thenCombine(cFuture, (b, c) -> new Result(a, b, c)));
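As a self-contained illustration of the thenCompose/thenCombine variant, here is a sketch with hypothetical data: the getAList, getBMap and getCMap bodies and the element types are made up for the example, and a plain holder class is used in place of the record so that it also compiles on older Java versions.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

class CombineThree {
    // Plain holder class; a record would work just as well on newer Java versions.
    static final class Result {
        final List<String> aData;
        final Map<String, Integer> bData;
        final Map<String, Double> cData;

        Result(List<String> aData, Map<String, Integer> bData, Map<String, Double> cData) {
            this.aData = aData;
            this.bData = bData;
            this.cData = cData;
        }
    }

    // Hypothetical stand-ins for the three independent operations.
    static List<String> getAList() { return List.of("a1", "a2"); }
    static Map<String, Integer> getBMap() { return Map.of("b", 1); }
    static Map<String, Double> getCMap() { return Map.of("c", 2.0); }

    static CompletableFuture<Result> loadAll() {
        // All three operations are submitted before anything is combined.
        CompletableFuture<List<String>> aFuture = CompletableFuture.supplyAsync(CombineThree::getAList);
        CompletableFuture<Map<String, Integer>> bFuture = CompletableFuture.supplyAsync(CombineThree::getBMap);
        CompletableFuture<Map<String, Double>> cFuture = CompletableFuture.supplyAsync(CombineThree::getCMap);

        // No join() inside the callbacks: each value is handed over on completion.
        return aFuture.thenCompose(a ->
                bFuture.thenCombine(cFuture, (b, c) -> new Result(a, b, c)));
    }
}
```

The caller can chain further stages on the returned future, or call join() once at the point where the combined result is actually needed.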
I have to dump data from a source by calling a REST API that returns a List.
First I have to get a List of objects from one REST API. I then use a parallel stream and go through each item with forEach.
For each element I have to call another API to get the data, which again returns a list, and save that list by calling a third REST API.
This takes around 1 hour for the 6000 records from step 1.
I tried like below:
restApiMethodWhichReturns6000Records
    .parallelStream().forEach(id -> {
        anotherMethodWhichgetsSomeDataAndPostsToOtherRestCall(id);
    });
public void anotherMethodWhichgetsSomeDataAndPostsToOtherRestCall(String id) {
    restApiToPostData(url, methodThatGetsListOfData(id));
}
parallelStream can sometimes cause unexpected behavior. It uses the common ForkJoinPool, so if you have parallel streams elsewhere in the code, long-running tasks in one can block the others. Even within the same stream, if some tasks are time-consuming, all the worker threads can end up blocked.
There is a good discussion of this on Stack Overflow, which also shows some tricks for assigning a task-specific ForkJoinPool.
First of all make sure your REST service is non-blocking.
One more thing you can do is to play with pool size by supplying -Djava.util.concurrent.ForkJoinPool.common.parallelism=4 to JVM.
If the API calls are blocking, then even when you run them in parallel, you will be able to do just a few calls in parallel.
I would try out a solution using CompletableFuture.
The code would be something like this:
List<CompletableFuture<?>> apiCallsFutures = restApiMethodWhichReturns6000Records
    .stream()
    .map(id -> CompletableFuture.supplyAsync(() -> getListOfData(id)) // map each "get list of data" call to a CompletableFuture
        .thenApply(listOfData -> callAPItoPOSTData(url, listOfData))) // once the get call completes, the post call can be performed
    .collect(Collectors.toList());
CompletableFuture<?>[] completableFutures = apiCallsFutures.toArray(new CompletableFuture[0]); // CompletableFuture.allOf accepts only arrays :(
CompletableFuture<Void> all = CompletableFuture.allOf(completableFutures); // combine all the futures
all.get(); // wait for all calls to finish (they were already started by supplyAsync)
For more details about CompletableFutures, have a look over: https://www.baeldung.com/java-completablefuture
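Because supplyAsync without an executor argument runs on the common ForkJoinPool, blocking REST calls are still limited by that pool's size. One way around this is to pass a dedicated thread pool. The following is a sketch under assumptions: getListOfData and postData are hypothetical stand-ins for the real REST calls, and the right pool size depends on what the remote services can handle.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class DedicatedPoolCalls {
    // Hypothetical stand-ins for the two blocking REST calls.
    static List<String> getListOfData(String id) { return List.of(id + "-x", id + "-y"); }
    static int postData(List<String> data) { return data.size(); } // pretends to POST, returns item count

    static int copyAll(List<String> ids, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Integer>> futures = ids.stream()
                    .map(id -> CompletableFuture
                            .supplyAsync(() -> getListOfData(id), pool)              // GET on the dedicated pool
                            .thenApplyAsync(DedicatedPoolCalls::postData, pool))     // POST once the GET is done
                    .collect(Collectors.toList());

            // Wait for every GET/POST chain, then add up the per-id counts.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return futures.stream().mapToInt(CompletableFuture::join).sum();
        } finally {
            pool.shutdown();
        }
    }
}
```

With blocking calls, the number of requests in flight is bounded by the threads argument rather than by the number of CPU cores.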
I have a case where there are 10 or more tasks that are divided into several groups. Inside each group everything should run concurrently, but because each group needs the results of the previous group (with the exception of the first group), the groups must run in order (tasks inside a group don't need to run in order).
The tasks themselves are querying data from database then apply some transformation and save it back to database.
Task 1.1 // This group run first
Task 1.2
Task 2.1 // Waiting results from group 1
Task 2.2
Task 2.3
Task 3.1 // Waiting results from group 2
I was thinking of using a list of allOf() futures, iterating over it and explicitly calling get() on each one, but that would block, which I don't want. So my question is: how can I execute many allOf() calls in order? Is it even possible to use only CompletableFuture here?
When you use allOf(), it returns a CompletableFuture that will complete only when all of the given completion stages are completed.
If you chain calls from the returned future, they are thus guaranteed that a call to get() on any of the completion stages passed to allOf() will never block (since they are already completed).
// First group
CompletableFuture<Integer> task11 = CompletableFuture.supplyAsync(() -> 1);
CompletableFuture<Integer> task12 = CompletableFuture.supplyAsync(() -> 42);
CompletableFuture<Integer> task13 = CompletableFuture.supplyAsync(() -> 1729);
// this one will complete after all tasks from the first group complete
CompletableFuture<Void> allFirstTasks = CompletableFuture.allOf(task11, task12, task13);
// Second group will be child tasks from the first group
CompletableFuture<Integer> task21 = allFirstTasks.thenApply(__ ->
task11.join() + task12.join() + task13.join() // will not block
);
Note: using join() instead of get() to avoid handling of checked exceptions.
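To run several groups one after another without blocking, each group's allOf can be chained with thenCompose, so that the tasks of the next group are only created once the previous group has completed. This is a sketch with hypothetical integer-valued tasks in place of the real database work:

```java
import java.util.concurrent.CompletableFuture;

class GroupsInOrder {
    static CompletableFuture<Integer> run() {
        // Group 1: both tasks start immediately and run concurrently.
        CompletableFuture<Integer> t11 = CompletableFuture.supplyAsync(() -> 1);
        CompletableFuture<Integer> t12 = CompletableFuture.supplyAsync(() -> 2);
        CompletableFuture<Integer> group1 = CompletableFuture.allOf(t11, t12)
                .thenApply(v -> t11.join() + t12.join()); // will not block

        // Group 2: its tasks are submitted only after group 1 has completed.
        CompletableFuture<Integer> group2 = group1.thenCompose(sum1 -> {
            CompletableFuture<Integer> t21 = CompletableFuture.supplyAsync(() -> sum1 * 10);
            CompletableFuture<Integer> t22 = CompletableFuture.supplyAsync(() -> sum1 * 100);
            return CompletableFuture.allOf(t21, t22)
                    .thenApply(v -> t21.join() + t22.join()); // will not block
        });

        // Group 3: a single task that needs the result of group 2.
        return group2.thenCompose(sum2 -> CompletableFuture.supplyAsync(() -> sum2 + 1));
    }
}
```

Nothing in run() waits; the caller gets a future for the last group and decides where, if anywhere, to call join().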
I have the following code:
return CompletableFuture.supplyAsync(() -> {
return foo; // some custom object
})
.thenAccept(foo -> {
// ??? need to spawn N async parallel jobs that works on 'foo'
});
In English: the first task creates the foo object asynchronously, and then I need to run N parallel processes on it.
Is there a better way to do this than:
...
CompletableFuture[] parallel = new CompletableFuture[N];
for (int i = 0; i < N; i++) {
parallel[i] = CompletableFuture.runAsync(() -> {
work(foo);
});
}
CompletableFuture.allOf(parallel).join();
...
I don't like this as one thread gets locked while waiting N jobs to finish.
You can chain as many independent jobs as you like on a particular prerequisite job, e.g.
CompletableFuture<Foo> base=CompletableFuture.supplyAsync(() -> new Foo());
Collections.nCopies(N, base).forEach(f -> f.thenAcceptAsync(foo -> work(foo)));
will spawn N parallel jobs, invoking work(foo) concurrently, after the completion of the initial job which provides the Foo instance.
But keep in mind that the underlying framework considers the number of available CPU cores when sizing the thread pool that actually executes the parallel jobs, so if N > #cores, some of these jobs may run one after another.
If the work is I/O bound, and you therefore want a higher number of parallel threads, you have to specify your own executor.
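Specifying your own executor could look like the following sketch. The one-thread-per-job pool sizing is only an assumption for illustration, and incrementing a counter stands in for the real work(foo).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class FanOutWithExecutor {
    // Returns a future holding the number of jobs that ran; n must be at least 1.
    static CompletableFuture<Integer> fanOut(int n) {
        ExecutorService ioPool = Executors.newFixedThreadPool(n); // assumed sizing: one thread per job
        AtomicInteger done = new AtomicInteger();

        CompletableFuture<String> base = CompletableFuture.supplyAsync(() -> "foo", ioPool);
        CompletableFuture<?>[] jobs = new CompletableFuture<?>[n];
        for (int i = 0; i < n; i++) {
            // The executor argument keeps these jobs off the common pool.
            jobs[i] = base.thenAcceptAsync(foo -> done.incrementAndGet(), ioPool);
        }
        return CompletableFuture.allOf(jobs)
                .thenApply(v -> done.get())
                .whenComplete((count, error) -> ioPool.shutdown()); // release the threads when finished
    }
}
```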
The nCopies/forEach is not necessary, a for loop would do as well, but it provides a hint of how to handle subsequent jobs, that depend on the completion of all these parallel jobs:
CompletableFuture<Foo> base=CompletableFuture.supplyAsync(() -> new Foo());
CompletableFuture<Void> all = CompletableFuture.allOf(
Collections.nCopies(N, base).stream()
.map(f -> f.thenAcceptAsync(foo -> work(foo)))
.toArray(CompletableFuture<?>[]::new));
Now you can use all to check for the completion of all jobs or chain additional actions.
Since CompletableFuture.allOf already returns another CompletableFuture<Void>, you can just chain another .thenAccept on it and extract the returned values from the parallel CFs in the callback; that way you avoid blocking on join.
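A sketch of that pattern, using thenApply instead of thenAccept so that the collected values are passed on as a result. The helper name allAsList is hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

class CollectAfterAllOf {
    // Hypothetical helper: turns a list of futures into a future of the list of values.
    static <T> CompletableFuture<List<T>> allAsList(List<CompletableFuture<T>> futures) {
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                // Runs only after every future has completed, so join() returns immediately.
                .thenApply(v -> futures.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }
}
```

If any of the input futures completes exceptionally, the returned future completes exceptionally as well and the callback is skipped.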