Composing *many* completable futures


All the available literature shows CompletableFuture composition where a single future is composed with another using thenCompose to return a CompletableFuture<ResultType>.
In my circumstance I have a single initial seedFuture which yields many pieces of data instead of one, each of which requires further async work.
final CompletableFuture<List<NeedsMoreAsyncWork>> seedFuture = supplyAsync(() -> {
// do async work
}, someExecutor);
Can this be expressed using a fluent API, so the end result is a List<CompletableFuture<ResultType>>? Or must I revert to using a for loop?

The CompletableFuture API offers only one built-in way to combine many futures that run concurrently: CompletableFuture.allOf(...), which completes once all of the given futures have completed.
If you need to run the futures in the order they appear in the list, then the only option I'm aware of is a custom loop:
List<Object> objects = ...
CompletableFuture<?> loop = CompletableFuture.completedFuture(null);
for (Object object : objects) {
    // start the next future only after the previous one has completed
    loop = loop.thenCompose(r -> future(object));
}
The main issue with this solution is that, with many futures, the long chain of completions can cause a StackOverflowError. You can prevent it with the trampoline pattern. An implementation of this pattern is available in com.ibm.async:asyncutil. The project is on GitHub.
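For the parallel case from the question, the fan-out can stay fluent by combining thenCompose with allOf. The following is only a minimal sketch: the class FanOutExample, the method fanOut, and lengthAsync (a stand-in for the real per-item async work) are hypothetical names, and it yields a single CompletableFuture<List<Integer>> rather than a list of futures.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class FanOutExample {

    // Hypothetical stand-in for the per-item async work in the question.
    static CompletableFuture<Integer> lengthAsync(String item, ExecutorService executor) {
        return CompletableFuture.supplyAsync(item::length, executor);
    }

    // Starts one future per seed item, then gathers all results into one future.
    static CompletableFuture<List<Integer>> fanOut(CompletableFuture<List<String>> seedFuture,
                                                   ExecutorService executor) {
        return seedFuture.thenCompose(items -> {
            List<CompletableFuture<Integer>> futures = items.stream()
                    .map(item -> lengthAsync(item, executor))
                    .collect(Collectors.toList());
            return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                    // join() does not block here: allOf only completes after every future has.
                    .thenApply(ignored -> futures.stream()
                            .map(CompletableFuture::join)
                            .collect(Collectors.toList()));
        });
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            CompletableFuture<List<String>> seed =
                    CompletableFuture.supplyAsync(() -> List.of("a", "bb", "ccc"), executor);
            System.out.println(fanOut(seed, executor).join());
        } finally {
            executor.shutdown();
        }
    }
}
```

With the stand-in work above, main prints [1, 2, 3]; the results keep the order of the seed list because the futures are joined in list order, regardless of which one finishes first.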

Related

Correct use of CompletableFuture.anyOf()

I want to make parallel calls to different external services and proceed with the first successful response.
Successful here means that the result returned by the service has certain values in certain fields.
However, for CompletableFuture, everything other than an exception is success. So, even for business failures, I have to throw an exception to signal non-success to CompletableFuture. This feels wrong; ideally I would provide a boolean to indicate business success or failure. Is there a better way to signal business failures?
The second question I have is: how do I make sure I don't run out of threads due to the abandoned CompletableFutures that keep running even after CompletableFuture.anyOf() returns? Ideally I want to force-stop the threads, but as per the thread below, the best I can do is cancel the downstream operations.
How to cancel Java 8 completable future?
When you call CompletableFuture#cancel, you only stop the downstream
part of the chain. Upstream part, i. e. something that will eventually
call complete(...) or completeExceptionally(...), doesn't get any
signal that the result is no more needed.
I can force-stop threads by providing my own ExecutorService and calling shutdown()/shutdownNow() on it after CompletableFuture.anyOf() returns.
However, I am not sure about the implications of creating a new ExecutorService instance for each request.

This approach deliberately steps away from anyOf(), as there's just no easy way to cancel the other tasks with it.
It creates and starts the async tasks, then keeps a reference to each future along with an object that can be used to actually terminate the other tasks, which of course depends on your actual code/task. In this example, I'm just returning the input string (hence the type Pair<CompletableFuture<String>, String>); your code will probably have something like a request object.
ExecutorService exec = Executors.newFixedThreadPool(5);
List<String> source = List.of();
List<Pair<CompletableFuture<String>, String>> futures = source.stream()
        .map(s -> Pair.of(CompletableFuture.supplyAsync(() -> s.toUpperCase(), exec), s))
        .collect(Collectors.toList());
Consumer<Pair<CompletableFuture<String>, String>> cancelOtherTasksFunction = pair -> {
    futures.stream()
            .filter(future -> future != pair)
            .forEach(future -> {
                future.getLeft().cancel(true); // cancel the future
                // future.getRight().doSomething() // cancel actual task
            });
};
AtomicReference<String> result = new AtomicReference<>();
futures.forEach(future -> future.getLeft()
        .thenAccept(s -> {
            if (null == result.compareAndExchangeRelease(null, s)) {
                // the first success is recorded
                cancelOtherTasksFunction.accept(future);
            }
        }));
I suppose you can (should?) create a class to hold the future and the task object (such as an http request you could cancel) to replace the Pair.
Notes:
The check if (null == result.compareAndExchangeRelease(null, s)) is only valid if your futures return non-null values.
The first valid response will end up in result, but you still need to wait until it has been set before returning it, although I suppose this should work as the other tasks are being cancelled (that's the theory).
You may decide to make futures.forEach part of the stream pipeline above; just be careful to force all tasks to be submitted first (which collect(Collectors.toList()) does).
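On the first question (signalling business failure without exceptions), one option is to keep the success test in a plain Predicate and let a separate "winner" future pick the first response that passes it. This is a minimal sketch, not the approach from the answer above: the class FirstBusinessSuccess and the method firstMatching are hypothetical names, and cancel(true) only cancels the futures themselves, not the work that is already running.

```java
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class FirstBusinessSuccess {

    // Completes with the first response accepted by the predicate,
    // or fails once every future has finished without an accepted response.
    static <T> CompletableFuture<T> firstMatching(List<CompletableFuture<T>> futures,
                                                  Predicate<T> isBusinessSuccess) {
        CompletableFuture<T> winner = new CompletableFuture<>();

        // One check per future; complete() is a no-op if another response already won.
        List<CompletableFuture<Void>> checks = futures.stream()
                .map(future -> future.thenAccept(value -> {
                    if (isBusinessSuccess.test(value)) {
                        winner.complete(value);
                    }
                }))
                .collect(Collectors.toList());

        // Runs only after every check has run, so it cannot beat a late success.
        CompletableFuture.allOf(checks.toArray(new CompletableFuture[0]))
                .whenComplete((ignored, error) -> winner.completeExceptionally(
                        new NoSuchElementException("no business-successful response")));

        // Once there is a winner, cancel whatever is still pending.
        winner.thenRun(() -> futures.forEach(future -> future.cancel(true)));
        return winner;
    }

    public static void main(String[] args) {
        CompletableFuture<String> slow = new CompletableFuture<>();
        CompletableFuture<String> rejected = CompletableFuture.completedFuture("REJECTED");
        CompletableFuture<String> accepted = CompletableFuture.completedFuture("OK:service-b");

        CompletableFuture<String> winner = firstMatching(
                List.of(slow, rejected, accepted), response -> response.startsWith("OK"));

        System.out.println(winner.join());
        System.out.println(slow.isCancelled());
    }
}
```

Here main prints OK:service-b and then true: the rejected response is simply skipped, and the still-pending future is cancelled once a winner exists. Stopping the underlying work would still need a task-specific hook, as discussed above.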

Can we improve performance on lists other than java 8 parallel streams

I have to dump data from somewhere by calling a REST API which returns a List.
First I have to get a List of objects from one REST API. I then use a parallel stream and go through each item with forEach.
For each element I have to call another API to get data, which again returns a list, and save that list by calling yet another REST API.
This takes around 1 hour for the 6000 records from step 1.
I tried like below:
restApiMethodWhichReturns6000Records
        .parallelStream().forEach(id -> {
            anotherMethodWhichgetsSomeDataAndPostsToOtherRestCall(id);
        });
public void anotherMethodWhichgetsSomeDataAndPostsToOtherRestCall(String id) {
    restApiToPostData(url, methodThatGetsListOfData(id));
}

parallelStream can sometimes cause unexpected behavior. It uses the common ForkJoinPool, so if you have parallel streams somewhere else in the code, long-running tasks in one stream can block the others. Even within the same stream, if some tasks take a long time, all the worker threads can end up blocked.
There is a good discussion of this on Stack Overflow, which also shows some tricks for assigning a task-specific ForkJoinPool.
First of all, make sure your REST service is non-blocking.
One more thing you can do is play with the pool size by passing -Djava.util.concurrent.ForkJoinPool.common.parallelism=4 to the JVM.

If the API calls are blocking, then even when you run them in parallel you will only be able to make a few calls at the same time.
I would try out a solution using CompletableFuture.
The code would be something like this:
List<CompletableFuture<?>> apiCallsFutures = restApiMethodWhichReturns6000Records
        .stream()
        .map(id -> CompletableFuture.supplyAsync(() -> getListOfData(id)) // map the "get list of data" call to a CompletableFuture
                .thenApply(listOfData -> callAPItoPOSTData(url, listOfData))) // when the get call completes, the post call can be performed
        .collect(Collectors.toList());
CompletableFuture[] completableFutures = apiCallsFutures.toArray(new CompletableFuture[apiCallsFutures.size()]); // CompletableFuture.allOf accepts only arrays :(
CompletableFuture<Void> all = CompletableFuture.allOf(completableFutures); // combine all the futures
all.get(); // block until every call has completed (the calls were already started by supplyAsync)
For more details about CompletableFutures, have a look over: https://www.baeldung.com/java-completablefuture
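Because supplyAsync without an executor uses the common ForkJoinPool, blocking calls are usually better served by a dedicated, bounded thread pool. The sketch below assumes exactly that; the class BlockingCallPipeline, the method processAll, and the fetch/post functions (stand-ins for the two blocking REST calls) are hypothetical, and the pool size is something to tune against what the remote services tolerate.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

class BlockingCallPipeline {

    // fetch and post are hypothetical stand-ins for the blocking GET and POST calls.
    static <D, R> List<R> processAll(List<String> ids,
                                     Function<String, D> fetch,
                                     Function<D, R> post,
                                     int poolSize) {
        // Dedicated pool: blocking calls no longer occupy the common ForkJoinPool.
        ExecutorService ioPool = Executors.newFixedThreadPool(poolSize);
        try {
            // Collecting first submits every task before we start waiting on any of them.
            List<CompletableFuture<R>> futures = ids.stream()
                    .map(id -> CompletableFuture
                            .supplyAsync(() -> fetch.apply(id), ioPool)
                            .thenApplyAsync(post, ioPool))
                    .collect(Collectors.toList());
            // Wait for each result in input order.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            ioPool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> posted = BlockingCallPipeline.<String, Integer>processAll(
                List.of("a", "bb", "ccc"),
                id -> id + "!",   // pretend GET: returns the data for an id
                String::length,   // pretend POST: returns a status for the data
                8);
        System.out.println(posted);
    }
}
```

With the pretend calls above, main prints [2, 3, 4]. At most poolSize calls are in flight at any time, so the pool size is the knob that trades throughput against load on the remote services.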

Most efficient way to stream on list of Futures

I'm calling an async client method while streaming over a list of objects. The method returns a Future.
What's the best way to iterate over the list of Futures returned by those calls, so as to process whichever Future completes first?
Note: The async client only returns Future not CompletableFuture.
Following is the code:
List<Future<Object>> listOfFuture = objectsToProcess.parallelStream()
.map((object) -> {
/* calling an async client returning a Future<Object> */ })
.collect(Collectors.toList());

Given this List<Future<Object>>, I would submit the work to a custom pool instead of using the default parallel stream processing.
That is because the Stream API uses the common pool for parallel processing, and you will call get on those Futures. If they take significant time to complete, you will block all other parallel stream operations in your application until this one is done.
It would look a bit like this (future.get() throws checked exceptions, so real code needs a try/catch inside the lambda):
forkJoinPool.submit(() -> list.stream().parallel().map(future -> future.get()).collect(Collectors.toList())).get();
I would go with a custom pool along these lines.
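A compilable version of that idea might look like the sketch below. The class DedicatedPoolAwait and the methods awaitAll and getUnchecked are hypothetical names. It also relies on the widely used but undocumented behavior that a parallel stream started from inside a ForkJoinPool task runs in that pool rather than in the common pool, so treat it as an assumption to verify on your JDK.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

class DedicatedPoolAwait {

    // Wraps the checked exceptions of Future.get() so it can be used inside a stream.
    static <T> T getUnchecked(Future<T> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Blocks on all futures from a dedicated pool instead of the common pool.
    static <T> List<T> awaitAll(List<Future<T>> futures, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            ForkJoinTask<List<T>> task = pool.submit(() -> futures.parallelStream()
                    .map(future -> getUnchecked(future))
                    .collect(Collectors.toList()));
            return task.join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Future<String> first = CompletableFuture.completedFuture("a");
        Future<String> second = CompletableFuture.completedFuture("b");
        System.out.println(awaitAll(List.of(first, second), 2));
    }
}
```

Here main prints [a, b]. Note that this returns results in list order once everything is done; it does not hand results over as each Future completes, which a plain Future cannot signal without polling or blocking.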

Combine a list of Observables and wait until all completed

TL;DR
How to convert Task.whenAll(List<Task>) into RxJava?
My existing code uses Bolts to build up a list of asynchronous tasks and waits until all of those tasks finish before performing other steps. Essentially, it builds up a List<Task> and returns a single Task which is marked as completed when all tasks in the list complete, as per the example on the Bolts site.
I'm looking to replace Bolts with RxJava and I'm assuming this method of building up a list of async tasks (size not known in advance) and wrapping them all into a single Observable is possible, but I don't know how.
I've tried looking at merge, zip, concat etc., but can't get them to work on the List<Observable> that I'd be building up, as they all seem geared to working on just two Observables at a time, if I understand the docs correctly.
I'm trying to learn RxJava and am still very new to it so forgive me if this is an obvious question or explained in the docs somewhere; I have tried searching. Any help would be much appreciated.

You can use flatMap in case you have dynamic tasks composition. Something like this:
public Observable<Boolean> whenAll(List<Observable<Boolean>> tasks) {
    return Observable.from(tasks)
            // execute in parallel
            .flatMap(task -> task.observeOn(Schedulers.computation()))
            // wait until all tasks are executed
            // be aware: all your observables should emit the onComplete event,
            // otherwise you will wait forever
            .toList()
            // could implement more intelligent logic, e.g. check that everything is successful
            .map(results -> true);
}
Another good example of parallel execution
Note: I do not really know your requirements for error handling. For example, what to do if only one task fails. I think you should verify this scenario.

It sounds like you're looking for the Zip operator.
There are a few different ways of using it, so let's look at an example. Say we have a few simple observables of different types:
Observable<Integer> obs1 = Observable.just(1);
Observable<String> obs2 = Observable.just("Blah");
Observable<Boolean> obs3 = Observable.just(true);
The simplest way to wait for them all is something like this:
Observable.zip(obs1, obs2, obs3, (Integer i, String s, Boolean b) -> i + " " + s + " " + b)
.subscribe(str -> System.out.println(str));
Note that in the zip function, the parameters have concrete types that correspond to the types of the observables being zipped.
Zipping a list of observables is also possible, either directly:
List<Observable<?>> obsList = Arrays.asList(obs1, obs2, obs3);
Observable.zip(obsList, (i) -> i[0] + " " + i[1] + " " + i[2])
.subscribe(str -> System.out.println(str));
...or by wrapping the list into an Observable<Observable<?>>:
Observable<Observable<?>> obsObs = Observable.from(obsList);
Observable.zip(obsObs, (i) -> i[0] + " " + i[1] + " " + i[2])
.subscribe(str -> System.out.println(str));
However, in both of these cases, the zip function can only accept a single Object[] parameter, since neither the types of the observables in the list nor their number are known in advance. This means that the zip function would have to check the number of parameters and cast them accordingly.
Regardless, all of the above examples will eventually print 1 Blah true
EDIT: When using Zip, make sure that the Observables being zipped all emit the same number of items. In the above examples all three observables emitted a single item. If we were to change them to something like this:
Observable<Integer> obs1 = Observable.from(new Integer[]{1,2,3}); //Emits three items
Observable<String> obs2 = Observable.from(new String[]{"Blah","Hello"}); //Emits two items
Observable<Boolean> obs3 = Observable.from(new Boolean[]{true,true}); //Emits two items
Then 1, Blah, true and 2, Hello, true would be the only items passed into the zip function(s). The item 3 would never be zipped since the other observables have completed.

Of the suggestions proposed, zip() actually combines observable results with each other, which may or may not be what is wanted, but was not asked in the question. In the question, all that was wanted was execution of each of the operations, either one-by-one or in parallel (which was not specified, but linked Bolts example was about parallel execution). Also, zip() will complete immediately when any of the observables complete, so it's in violation of the requirements.
For parallel execution of Observables, flatMap() presented in the other answer is fine, but merge() would be more straightforward. Note that merge will exit on an error from any of the Observables; if you would rather postpone the exit until all observables have finished, you should be looking at mergeDelayError().
For one-by-one execution, I think the Observable.concat() static method should be used. Its javadoc states:
concat(java.lang.Iterable<? extends Observable<? extends T>> sequences)
Flattens an Iterable of Observables into one Observable, one after the other, without interleaving them
which sounds like what you're after if you don't want parallel execution.
Also, if you're only interested in the completion of your task, not return values, you should probably look into Completable instead of Observable.
TLDR: for one-by-one execution of tasks with an onComplete event once they have all completed, I think Completable.concat() is best suited. For parallel execution, Completable.merge() or Completable.mergeDelayError() sounds like the solution. The former stops immediately on an error in any completable; the latter executes them all even if one of them has an error, and only then reports the error.

With Kotlin
Observable.zip(obs1, obs2, BiFunction { t1 : Boolean, t2:Boolean ->
})
It's important to set the types of the function's arguments, or you will get compilation errors.
The type of the last argument changes with the number of arguments:
BiFunction for 2
Function3 for 3
Function4 for 4
...

You probably looked at the zip operator that works with 2 Observables.
There is also the static method Observable.zip. It has one form which should be useful for you:
zip(java.lang.Iterable<? extends Observable<?>> ws, FuncN<? extends R> zipFunction)
You can check out the javadoc for more.

I'm writing some computation-heavy code in Kotlin with RxJava Observables and RxKotlin. I want to observe a list of observables until they complete, and in the meantime get updates with the progress and the latest result. At the end it returns the best calculation result. An extra requirement was to run the Observables in parallel to use all my CPU cores. I ended up with this solution:
@Volatile var results: MutableList<CalculationResult> = mutableListOf()
fun doALotOfCalculations(listOfCalculations: List<Calculation>): Observable<Pair<String, CalculationResult>> {
return Observable.create { subscriber ->
Observable.concatEager(listOfCalculations.map { calculation: Calculation ->
doCalculation(calculation).subscribeOn(Schedulers.computation()) // function doCalculation returns an Observable with only one result
}).subscribeBy(
onNext = {
results.add(it)
subscriber.onNext(Pair("A calculation is ready", it))
},
onComplete = {
subscriber.onNext(Pair("Finished: ${results.size}", findBestCalculation(results)))
subscriber.onComplete()
},
onError = {
subscriber.onError(it)
}
)
}
}

I had a similar problem: I needed to fetch search items from a REST call while also integrating saved suggestions from a RecentSearchProvider.AUTHORITY, and combine them into one unified list. I was trying to use @MyDogTom's solution, but unfortunately there is no Observable.from in RxJava 2. After some research I found a solution that worked for me.
fun getSearchedResultsSuggestions(context : Context, query : String) : Single<ArrayList<ArrayList<SearchItem>>>
{
val fetchedItems = ArrayList<Observable<ArrayList<SearchItem>>>(0)
fetchedItems.add(fetchSearchSuggestions(context,query).toObservable())
fetchedItems.add(getSearchResults(query).toObservable())
return Observable.fromArray(fetchedItems)
.flatMapIterable { data->data }
.flatMap {task -> task.observeOn(Schedulers.io())}
.toList()
.map { ArrayList(it) }
}
I created an observable from the array of observables that contain the lists of suggestions and the results from the internet, depending on the query. After that you just go over those tasks with flatMapIterable and run them using flatMap, placing the results in a list, which can later be fed into a RecyclerView.

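If you use Project Reactor, you can use Mono.when.
Mono.when(publisher1, publisher2)
        // Mono.when yields a Mono<Void>, which never emits a value,
        // so react to completion instead of mapping a value
        .doOnSuccess(ignored -> System.out.println("everything is done!"))
        .block();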

Can RxJava reduce() be unsafe when parallelized?

I want to use the reduce() operation on observable to map it to a Guava ImmutableList, since I prefer it so much more to the standard ArrayList.
Observable<String> strings = ...
Observable<ImmutableList<String>> captured = strings.reduce(ImmutableList.<String>builder(), (b,s) -> b.add(s))
.map(ImmutableList.Builder::build);
captured.forEach(i -> System.out.println(i));
Simple enough. But suppose I somewhere scheduled the observable strings in parallel with multiple threads or something. Would this not derail the reduce() operation and possibly cause a race condition? Especially since the ImmutableList.Builder would be vulnerable to that?

The problem lies in the shared state between realizations of the chain. This is pitfall # 8 in my blog:
Shared state in an Observable chain
Let's assume you are dissatisfied with the performance or the type of the List the toList() operator returns and you want to roll your own aggregator instead of it. For a change, you want to do this by using existing operators and you find the operator reduce():
Observable<Vector<Integer>> list = Observable
.range(1, 3)
.reduce(new Vector<Integer>(), (vector, value) -> {
vector.add(value);
return vector;
});
list.subscribe(System.out::println);
list.subscribe(System.out::println);
list.subscribe(System.out::println);
When you run the 'test' calls, the first prints what you'd expect, but the second prints a vector where the range 1-3 appears twice and the third subscribe prints 9 elements!
The problem is not with the reduce() operator itself but with the expectation surrounding it. When the chain is established, the new Vector passed in is a 'global' instance and will be shared between all evaluation of the chain.
Naturally, there is a way of fixing this without implementing an operator for the whole purpose (which should be quite simple if you see the potential in the previous CounterOp):
Observable<Vector<Integer>> list2 = Observable
.range(1, 3)
.reduce((Vector<Integer>)null, (vector, value) -> {
if (vector == null) {
vector = new Vector<>();
}
vector.add(value);
return vector;
});
list2.subscribe(System.out::println);
list2.subscribe(System.out::println);
list2.subscribe(System.out::println);
You need to start with null and create a vector inside the accumulator function, which now isn't shared between subscribers.
Alternatively, you can look into the collect() operator which has a factory callback for the initial value.
The rule of thumb here is that whenever you see an aggregator-like operator taking some plain value, be cautious as this 'initial value' will most likely be shared across all subscribers and if you plan to consume the resulting stream with multiple subscribers, they will clash and may give you unexpected results or even crash.

According to the Observable contract, an observable must not make onNext calls in parallel, so you have to modify your strings Observable to respect this. You can use the serialize operator to achieve this.
