Most efficient way to stream on list of Futures - java

I'm calling an async client method by streaming over a list of objects. The method returns Future.
What's the best way to iterate over the list of Futures returned after the call (so as to process those Future which comes first)?
Note: The async client only returns Future not CompletableFuture.
Following is the code:
List<Future<Object>> listOfFuture = objectsToProcess.parallelStream()
.map((object) -> {
/* calling an async client returning a Future<Object> */ })
.collect(Collectors.toList());

Having this list of List<Future<Object>>, I would submit it to a custom pool, instead of using the default stream parallel processing.
That is because the stream api uses a common pool for parallel processing and you will call get on those Futures(if it takes significant time for processing) - you will block all other stream operations that use parallel operations within your application until this one is done.
This would a bit like this:
forJoinPool.submit( () -> list.stream().parallel().map(future -> future.get()).collect(Collectors.toList())).get();
I would go with a custom pool like shown here

Related

Composing *many* completable futures

All the available literature shows CompletableFuture composition where a single future is composed with another using thenCompose to return a CompletableFuture<ResultType>.
In my circumstance I have a single initial seedFuture which yields many pieces of data instead of one, each of which requires further async work.
final CompletableFuture<List<NeedsMoreAsyncWork>> seedFuture = supplyAsync(() -> {
// do async work
}, someExecutor);
Can this be expressed using a fluent API, so the end result is a List<ComposableFuture<ResultType>>? Or must I revert to using a for loop?
The completable future API only has a method to run the futures in parallel: CompletableFuture.allOf(...).
If you need to run the futures in the order they appear in the list, then the only option I'm aware of is a custom loop:
List<Object> objects = ...
CompletableFuture loop = CompletableFuture.completedFuture(null);
for (Object object : objects) {
loop = loop.thenCompose( r -> future(object));
}
The main issue with this solution is that if you have many futures, it could cause a StackOverflow exception. You can prevent it using the Trampoline pattern. An implementation of this pattern is available via com.ibm.sync:asyncutil. The project is on GitHub.

Correct use of CompletableFuture.anyOf()

I want to make parallel calls to different external services and proceed with the first successful response.
Successful here means that the result returned by the service has certain values in certain fields.
However, for CompletableFuture, everything other than an exception is success. So, even for for business failures, I have to throw an exception to signal non-success to CompletableFuture. This feels wrong, ideally I would want to provide a boolean to indicate business success/failure. Is there a better way to signal business failures?
The second question I have is, how do I make sure I don't run out of threads due to the abandoned CompletableFutures that would keep running even after CompletableFutures.anyOf() returns. Ideally I want to force stop the threads but as per the thread below, the best I can do is cancel the downstream operations.
How to cancel Java 8 completable future?
When you call CompletableFuture#cancel, you only stop the downstream
part of the chain. Upstream part, i. e. something that will eventually
call complete(...) or completeExceptionally(...), doesn't get any
signal that the result is no more needed.
I can force stop treads by providing my own ExecutorService and calling shutdown()/shutdownNow() on it after CompletableFuture.anyOf() returns.
However I am not sure about the implications of creating a new ExecutorService instance for each request.
This is perhaps an attempt to look away from anyOf(), as there's just no easy way to cancel other tasks with it.
What this is doing is create and start async tasks, then keep a reference to the future along with an object that would be used to effectively terminate other tasks, which of course is dependent on your actual code/task. In this example, I'm just returning the input string (hence the type Pair<CompletableFuture<String>, String>); your code will probably have something like a request object.
ExecutorService exec = Executors.newFixedThreadPool(5);
List<String> source = List.of();
List<Pair<CompletableFuture<String>, String>> futures = source.stream()
.map(s -> Pair.of(CompletableFuture.supplyAsync(() -> s.toUpperCase(), exec), s))
.collect(Collectors.toList());
Consumer<Pair<CompletableFuture<String>, String>> cancelOtherTasksFunction = pair -> {
futures.stream()
.filter(future -> future != pair)
.forEach(future -> {
future.getLeft().cancel(true); // cancel the future
// future.getRight().doSomething() // cancel actual task
});
};
AtomicReference<String> result = new AtomicReference<>();
futures.forEach(future -> future.getLeft()
.thenAccept(s -> {
if (null == result.compareAndExchangeRelease(null, s)) {
// the first success is recorded
cancelOtherTasksFunction.accept(future);
}
}));
I suppose you can (should?) create a class to hold the future and the task object (such as an http request you could cancel) to replace the Pair.
Note:
if (null == result.compareAndExchangeRelease(null, s)) is only valid if your future returns a non-null value.
The first valid response will be in result, but you still need to test for blocking before returning it, although I suppose it should work as other tasks are being cancelled (that's the theory).
You may decide to make futures.forEach part of the stream pipeline above, just be careful to force all tasks to be submitted (which collect(Collectors.toList()) does).

Can we improve performance on lists other than java 8 parallel streams

I have to dump data from somewhere by calling rest API which returns List.
First i have to get some List object from one rest api. Now used parallel stream and gone through each item with forEach.
Now on for each element i have to call some other api to get the data which returns again list and save the same list by calling another rest api.
This is taking around 1 Hour for 6000 records of step 1.
I tried like below:
restApiMethodWhichReturns6000Records
.parallelStream().forEach(id ->{
anotherMethodWhichgetsSomeDataAndPostsToOtherRestCall(id);
});
public void anotherMethodWhichgetsSomeDataAndPostsToOtherRestCall(String id) {
sestApiToPostData(url,methodThatGetsListOfData(id));
}
parallelStream can cause unexpected behavior some times. It uses a common ForkJoinPool. So if you have parallel streams somewhere else in the code, it may have a blocking nature for long running tasks. Even in the same stream if some tasks are time taking, all the worker threads will be blocked.
A good discussion on this stackoverflow. Here you see some tricks to assign task specific ForkJoinPool.
First of all make sure your REST service is non-blocking.
One more thing you can do is to play with pool size by supplying -Djava.util.concurrent.ForkJoinPool.common.parallelism=4 to JVM.
IF the API calls are blocking, even when you run them in parallel, you will be able to do just a few calls in parallel.
I would try out a solution using CompletableFuture.
The code would be something like this:
List<CompletableFuture>> apiCallsFutures = restApiMethodWhichReturns6000Records
.stream()
.map(id -> CompletableFuture.supplyAsync(() -> getListOfData(id)) // Mapping the get list of data call to a Completable Future
.thenApply(listOfData -> callAPItoPOSTData(url, listOfData)) // when the get list call is complete, the post call can be performed
.collect(Collectors.toList());
CompletableFuture[] completableFutures = apiCallsFutures.toArray(new CompletableFuture[apiCallsFutures.size()]); // CompletableFuture.allOf accepts only arrays :(
CompletableFuture<Void> all = CompletableFuture.allOf(completableFutures); // Combine all the futures
all.get(); // perform calls
For more details about CompletableFutures, have a look over: https://www.baeldung.com/java-completablefuture

Using Threads in Loop

I have a for loop which needs to execute 36000 times
for(int i=0;i<36000;i++)
{
}
Whether its possible to use Multiple threads inorder to execute the loop faster at the same time
Please suggest how to use it.
If you want a more explicit method, you can use thread pools with Thread, Callable or Runnable. See my answere here for examples:
Java : a method to do multiple calculations on arrays quickly
Thread won't naturally exit at end of run()
I do not recommend using Java's Fork/Join as they are not that great as they were hyped to be. Performance is pretty bad. Instead, I would use Java 8's map and parallel streams if you want to make it easy. You have several options using this method.
IntStream.range(1, 4)
.mapToObj(i -> "testing " + i)
.forEach(System.out::println);
You would want to call map( lambda ). Java 8 finally brings lambda functions. It is possible to feed the stream one huge list, but there will be a performance impact. IntStream.range will do what you want. Then you need to figure out which of the new functions you want to use like filter, map, count, sum, reduce, etc. You may have to tell it that you want it to be a parallel stream. See these links.
https://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html
http://winterbe.com/posts/2014/07/31/java8-stream-tutorial-examples/
Classic method and still has the best performance is to do it yourself using a thread pool:
Basically, you would create a Runnable (does not return something) or Callable (returns a result) object that will do some work on one of the treads in the pool. The pool with handle scheduling, which is great for us. Java has several options on the pool you use. You can create a Runnable/Callable in a loop, then submit that into the pool. The pool immediately returns a Future object that represents the task. You can add that Future to an ArrayList if you have many of these. After adding all the futures to the list, loop through them and call future.get(), which will wait for the end of execution. See the linked example above, which does not use a list, but does everything else I said.

How does CompletableFuture know that tasks are independent?

Imagine that we have the following dummy code:
CompletableFuture<BigInteger> cf1 = CompletableFuture.supplyAsync(() -> BigInteger.valueOf(2L));
CompletableFuture<BigInteger> cf2 = CompletableFuture.supplyAsync(() -> BigInteger.valueOf(3L));
cf1.thenCombine(cf2, (x, y) -> x.add(y)).thenAccept(System.out::println);
Does JVM know that cf1 and cf2 carry independent threads in this case? And what will change if threads will be dependent (for example, use one connection to database)?
More general, how does CompletableFuture synchronize threads?
A CompletableFuture has no relation to any thread. It is just a holder for a result retrieved asynchronously with methods to operate on that result.
The static supplyAsync and runAsync methods are just helper methods. The javadoc of supplyAsync states
Returns a new CompletableFuture that is asynchronously completed by a
task running in the ForkJoinPool.commonPool() with the value obtained
by calling the given Supplier.
This is more or less equivalent to
Supplier<R> sup = ...;
CompletableFuture<R> future = new CompletableFuture<R>();
ForkJoinPool.commonPool().submit(() -> {
try {
R result = sup.get();
future.complete(result);
} catch (Throwable e) {
future.completeExceptionally(e);
}
});
return future;
The CompletableFuture is returned, even allowing you to complete it before the task submitted to the pool.
More general, how does CompletableFuture synchronize threads?
It doesn't, since it doesn't know which threads are operating on it. This is further hinted at in the javadoc
Since (unlike FutureTask) this class has no direct control over the
computation that causes it to be completed, cancellation is treated as
just another form of exceptional completion. Method cancel has the
same effect as completeExceptionally(new CancellationException()).
Method isCompletedExceptionally() can be used to determine if a
CompletableFuture completed in any exceptional fashion.
CompletableFuture objects do not control processing.
I don't think that a CompletableFuture (CF) "synchronizes threads". It uses the executor you have provided or the common pool if you have not provided one.
When you call supplyAsync, the CF submits the various tasks to that pool which in turns manages the underlying threads to execute the tasks.
It doesn't know, nor does it try to synchronize anything. It is still the client's responsibility to properly synchronize access to mutable shared data.

Categories