Combine a list of Observables and wait until all completed - java

TL;DR
How to convert Task.whenAll(List<Task>) into RxJava?
My existing code uses Bolts to build up a list of asynchronous tasks and waits until all of those tasks finish before performing other steps. Essentially, it builds up a List<Task> and returns a single Task which is marked as completed when all tasks in the list complete, as per the example on the Bolts site.
I'm looking to replace Bolts with RxJava and I'm assuming this method of building up a list of async tasks (size not known in advance) and wrapping them all into a single Observable is possible, but I don't know how.
I've tried looking at merge, zip, concat etc., but I can't get them to work on the List<Observable> that I'd be building up, as they all seem geared to working on just two Observables at a time, if I understand the docs correctly.
I'm trying to learn RxJava and am still very new to it so forgive me if this is an obvious question or explained in the docs somewhere; I have tried searching. Any help would be much appreciated.

You can use flatMap if the set of tasks is composed dynamically. Something like this:
public Observable<Boolean> whenAll(List<Observable<Boolean>> tasks) {
return Observable.from(tasks)
//execute in parallel
.flatMap(task -> task.observeOn(Schedulers.computation()))
//wait until all tasks are executed
//be aware: all your observables should emit an onComplete event,
//otherwise you will wait forever
.toList()
//could implement more intelligent logic. eg. check that everything is successful
.map(results -> true);
}
Another good example of parallel execution
Note: I do not really know your requirements for error handling. For example, what to do if only one task fails. I think you should verify this scenario.

It sounds like you're looking for the Zip operator.
There are a few different ways of using it, so let's look at an example. Say we have a few simple observables of different types:
Observable<Integer> obs1 = Observable.just(1);
Observable<String> obs2 = Observable.just("Blah");
Observable<Boolean> obs3 = Observable.just(true);
The simplest way to wait for them all is something like this:
Observable.zip(obs1, obs2, obs3, (Integer i, String s, Boolean b) -> i + " " + s + " " + b)
.subscribe(str -> System.out.println(str));
Note that in the zip function, the parameters have concrete types that correspond to the types of the observables being zipped.
Zipping a list of observables is also possible, either directly:
List<Observable<?>> obsList = Arrays.asList(obs1, obs2, obs3);
Observable.zip(obsList, (i) -> i[0] + " " + i[1] + " " + i[2])
.subscribe(str -> System.out.println(str));
...or by wrapping the list into an Observable<Observable<?>>:
Observable<Observable<?>> obsObs = Observable.from(obsList);
Observable.zip(obsObs, (i) -> i[0] + " " + i[1] + " " + i[2])
.subscribe(str -> System.out.println(str));
However, in both of these cases, the zip function can only accept a single Object[] parameter, since neither the types of the observables in the list nor their number are known in advance. This means that the zip function would have to check the number of parameters and cast them accordingly.
Regardless, all of the above examples will eventually print 1 Blah true.
EDIT: When using Zip, make sure that the Observables being zipped all emit the same number of items. In the above examples all three observables emitted a single item. If we were to change them to something like this:
Observable<Integer> obs1 = Observable.from(new Integer[]{1,2,3}); //Emits three items
Observable<String> obs2 = Observable.from(new String[]{"Blah","Hello"}); //Emits two items
Observable<Boolean> obs3 = Observable.from(new Boolean[]{true,true}); //Emits two items
Then 1, Blah, True and 2, Hello, True would be the only items passed into the zip function(s). The item 3 would never be zipped, since the other observables have completed.
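To make the pairing rule concrete, here is a minimal plain-Java sketch that models zip with lists instead of Observables. It is not RxJava code: zipModel and ZipLengthDemo are hypothetical names, and the lists only stand in for the items each observable would emit.

```java
import java.util.ArrayList;
import java.util.List;

class ZipLengthDemo {

    // List-based model of zip: combines items index by index and stops
    // as soon as the shortest input runs out.
    static List<String> zipModel(List<Integer> first, List<String> second, List<Boolean> third) {
        int pairs = Math.min(first.size(), Math.min(second.size(), third.size()));
        List<String> zipped = new ArrayList<>();
        for (int i = 0; i < pairs; i++) {
            zipped.add(first.get(i) + " " + second.get(i) + " " + third.get(i));
        }
        return zipped;
    }

    public static void main(String[] args) {
        List<String> zipped = zipModel(
                List.of(1, 2, 3),          // three items
                List.of("Blah", "Hello"),  // two items
                List.of(true, true));      // two items
        System.out.println(zipped); // [1 Blah true, 2 Hello true]
    }
}
```

The third Integer is dropped because there is nothing left to pair it with, which is the same reason the item 3 never reaches the zip function.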

Of the suggestions proposed, zip() actually combines observable results with each other, which may or may not be what is wanted, but was not asked in the question. In the question, all that was wanted was execution of each of the operations, either one-by-one or in parallel (which was not specified, but linked Bolts example was about parallel execution). Also, zip() will complete immediately when any of the observables complete, so it's in violation of the requirements.
For parallel execution of Observables, the flatMap() presented in the other answer is fine, but merge() would be more straightforward. Note that merge will exit on an error from any of the Observables; if you would rather postpone the exit until all observables have finished, you should be looking at mergeDelayError().
For one-by-one execution, I think the static method Observable.concat() should be used. Its javadoc states:
concat(java.lang.Iterable<? extends Observable<? extends T>> sequences)
Flattens an Iterable of Observables into one Observable, one after the other, without interleaving them
which sounds like what you're after if you don't want parallel execution.
Also, if you're only interested in the completion of your task, not return values, you should probably look into Completable instead of Observable.
TLDR: for one-by-one execution of tasks and an onComplete event when they have all finished, I think Completable.concat() is best suited. For parallel execution, Completable.merge() or Completable.mergeDelayError() sounds like the solution. The former stops immediately on an error from any Completable; the latter executes them all even if one of them fails, and only then reports the error.

With Kotlin
Observable.zip(obs1, obs2, BiFunction { t1 : Boolean, t2:Boolean ->
})
It's important to set the types of the function's arguments, or you will get compilation errors.
The type of the last argument changes with the number of Observables:
BiFunction for 2
Function3 for 3
Function4 for 4
...

You probably looked at the zip operator that works with 2 Observables.
There is also the static method Observable.zip. It has one form which should be useful for you:
zip(java.lang.Iterable<? extends Observable<?>> ws, FuncN<? extends R> zipFunction)
You can check out the javadoc for more.

I'm writing some computation-heavy code in Kotlin with RxJava Observables and RxKotlin. I want to wait for a list of observables to complete while getting updates on the progress and the latest result in the meantime. At the end it returns the best calculation result. An extra requirement was to run the Observables in parallel to use all my CPU cores. I ended up with this solution:
@Volatile var results: MutableList<CalculationResult> = mutableListOf()
fun doALotOfCalculations(listOfCalculations: List<Calculation>): Observable<Pair<String, CalculationResult>> {
return Observable.create { subscriber ->
Observable.concatEager(listOfCalculations.map { calculation: Calculation ->
doCalculation(calculation).subscribeOn(Schedulers.computation()) // function doCalculation returns an Observable with only one result
}).subscribeBy(
onNext = {
results.add(it)
subscriber.onNext(Pair("A calculation is ready", it))
},
onComplete = {
subscriber.onNext(Pair("Finished: ${results.size}", findBestCalculation(results)))
subscriber.onComplete()
},
onError = {
subscriber.onError(it)
}
)
}
}

I had a similar problem: I needed to fetch search items from a REST call while also integrating saved suggestions from a RecentSearchProvider.AUTHORITY, and combine them into one unified list. I was trying to use @MyDogTom's solution; unfortunately, there is no Observable.from in RxJava 2 and later. After some research I found a solution that worked for me.
fun getSearchedResultsSuggestions(context : Context, query : String) : Single<ArrayList<ArrayList<SearchItem>>>
{
val fetchedItems = ArrayList<Observable<ArrayList<SearchItem>>>(0)
fetchedItems.add(fetchSearchSuggestions(context,query).toObservable())
fetchedItems.add(getSearchResults(query).toObservable())
return Observable.fromArray(fetchedItems)
.flatMapIterable { data->data }
.flatMap {task -> task.observeOn(Schedulers.io())}
.toList()
.map { ArrayList(it) }
}
I created an observable from the list of observables, which contains the lists of suggestions and results from the internet, depending on the query. After that you just go over those tasks with flatMapIterable and run them using flatMap, then place the results in an array, which can later be displayed in a RecyclerView.

If you use Project Reactor, you can use Mono.when.
Mono.when(publisher1, publisher2)
.map(i-> {
System.out.println("everything is done!");
return i;
}).block()

Related

Spring WebFlux (Reactor) combine multiple non-blocking methods

Using non-blocking calls, I want to generically take a Mono and call a method that returns a Flux and, for each item in the Flux, call a method that returns a Mono, to return a Flux of an aggregate object of Bar + Foo + More that has as many elements as the Flux method returns (will return).
As a concrete example:
Methods:
Flux<Bar> getBarsByFoo(Foo foo);
Mono<More> getMoreByBar(Bar bar);
Combined getCombinedFrom(Bar bar, Foo foo, More more);
Working code section:
Flux<Combined> getCombinedByFoo(Foo foo) {
getBarsByFoo(foo)...
}
From a blocking perspective what I want to accomplish is:
List<Combined> getCombinedByFoo(Foo foo) {
List<Bar> bars = getBarsByFoo(foo);
List<Combined> combinedList = new ArrayList<>(bars.size());
for (Bar bar: bars) {
More more = getMoreByBar(bar);
combinedList.add(getCombinedFrom(bar, foo, more));
}
return combinedList;
}
Any help on which Flux and Mono methods to use would be appreciated. I am still learning to change my brain into non-blocking thinking. Conceptually, I think there is a function to apply to each element (Bar) from getBarsByFoo(Foo foo) to somehow map that to the combined element...
I like to think about Reactor programming as a flow of operations (as in flow programming), as a chain/DAG of operations.
In your case, you want to:
map each emitted Bar object to a Combined object.
Along the way, you need to use/call another publisher to fetch additional information:
you need to wait for it to complete so you can fetch its output value. In the case of Monads/streams, there's a flatMap operation for it.
flatMap waits for (or you can say that it extracts) a different publisher value to integrate it in the current chain of operations. I think it is called flatMap because in a sense, we break a level of hierarchy to flatten two nested publishers/monads in a single merged one.
The following example shows a reactive version of your method (for a less verbose version, see Toerktumlare's answer):
Flux<Combined> combine(Foo foo) {
Flux<Bar> bars = getBarsByFoo(foo);
Flux<Combined> result = bars.flatMap(bar -> {
Mono<More> nextMore = getMoreByBar(bar);
Mono<Combined> next = nextMore.map(more -> getCombinedFrom(bar, foo, more));
return next;
});
return result;
}
If you get your foo object through a Mono, you can just call flatMapMany on it:
Mono<Foo> nextFoo = ...;
Flux<Combined> combined = nextFoo.flatMapMany(foo -> combine(foo));
WARNING
flatMap is very powerful: it can trigger concurrent execution of the provided operation. In your case, it means that many getMoreByBar(bar) operations can be launched at the same time. But it is a double-edged sword, because it means that:
the ordering of elements is not preserved (or at least, there's no guarantee)
in a resource-constrained system, having multiple operations launched at the same time could hurt performance or cause harm to the system (like too many open files, etc.)
The default concurrency is quite high (256) and can be controlled in different ways:
flatMap accepts an optional concurrency argument, to adapt the number of tasks allowed to run at the same time.
There are other operators that flatten publishers but manage work differently, like concatMap: it enforces sequential execution (and therefore preserves ordering) of mapping tasks.
Something like this:
Flux<Combined> getCombinedByFoo(Foo foo) {
return getBarsByFoo(foo)
.flatMap(bar -> getMoreByBar(bar)
// getCombinedFrom returns a plain Combined, not a publisher, so map is used here
.map(more -> getCombinedFrom(bar, foo, more)));
}
I don't have an IDE to check this; I wrote it freehand, but it should be something like this.

How to wait for completion of multiple Observables in RXJava?

I have multiple tasks, each being executed on Schedulers.newThread(). The task is a method, which returns Observable<Long>.
The overall structure looks like this:
public void createAndPerformOperations(int mDataStructureSize, int operationsAmount) {
disposables.add(Single.fromCallable(() ->
(new OperationsFactory()).getOperations((new DataStructureFactory()).getMaps(mDataStructureSize)))
.subscribeOn(Schedulers.newThread())
.observeOn(AndroidSchedulers.mainThread())
.subscribe(operations -> {
for (int i = 0; i < operations.size(); i++) {
performOperation(operations.get(i), i, operationsAmount);
}
}));
}
private void performOperation(Operation operation, int id, int operationsAmount) {
disposables.add(Observable.defer(() -> operation
.executeAndReturnUptime(operationsAmount))
.subscribeOn(Schedulers.newThread())
.observeOn(AndroidSchedulers.mainThread())
.subscribe(upTime -> uptimeStream.onNext(new Pair<>(id, upTime))));
}
The uptimeStream is a PublishSubject, which is observed in a Fragment.
I need to somehow call the PublishSubject's observer onComplete method, but only when all tasks are finished and all Pairs are consumed.
Although the number of tasks is known, I'm trying to avoid implementing any type of hardcoded counter for consumed items, since the code should be reusable if that number changes. I've tried calling the onComplete method directly from different places, but nothing seems to really work. The main problem is that I don't need just a confirmation that all tasks are finished; I also need to process all the values in the parent Fragment.
I'm fine with removing PublishSubject in general, if another solution works.
UPD: I'm aware of this solution, but I can't find Observable.from in RxJava 3, and in this solution .zip() doesn't work for me, since I can't manually list all the observables and I need to include an ID field besides the actual return value.
UPD2: I deleted the HashMap thing because I've got to thinking, and the main point here is to still get the results one by one, and execute only onComplete after the tasks are completed. So the solution with fromIterable does not really fit there.

Composing *many* completable futures

All the available literature shows CompletableFuture composition where a single future is composed with another using thenCompose to return a CompletableFuture<ResultType>.
In my circumstance I have a single initial seedFuture which yields many pieces of data instead of one, each of which requires further async work.
final CompletableFuture<List<NeedsMoreAsyncWork>> seedFuture = supplyAsync(() -> {
// do async work
}, someExecutor);
Can this be expressed using a fluent API, so the end result is a List<CompletableFuture<ResultType>>? Or must I revert to using a for loop?
The CompletableFuture API only has a method for waiting on futures that run in parallel: CompletableFuture.allOf(...).
If you need to run the futures in the order they appear in the list, then the only option I'm aware of is a custom loop:
List<Object> objects = ...
CompletableFuture loop = CompletableFuture.completedFuture(null);
for (Object object : objects) {
loop = loop.thenCompose( r -> future(object));
}
The main issue with this solution is that if you have many futures, it could cause a StackOverflowError. You can prevent it using the trampoline pattern. An implementation of this pattern is available via com.ibm.sync:asyncutil. The project is on GitHub.
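For the parallel case, one possible fluent sketch combines thenCompose with CompletableFuture.allOf. Everything specific here is an assumption made for illustration: NeedsMoreAsyncWork is modeled as String, ResultType as Integer, and process is a hypothetical stand-in for the follow-up async work.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class FanOutDemo {

    // Hypothetical follow-up step: stands in for the "further async work" per item.
    static CompletableFuture<Integer> process(String item, ExecutorService executor) {
        return CompletableFuture.supplyAsync(item::length, executor);
    }

    // Composes the seed future with one follow-up future per element and
    // completes with all results, in the order of the seed list.
    static CompletableFuture<List<Integer>> fanOut(
            CompletableFuture<List<String>> seedFuture, ExecutorService executor) {
        return seedFuture.thenCompose(items -> {
            List<CompletableFuture<Integer>> futures = items.stream()
                    .map(item -> process(item, executor))
                    .collect(Collectors.toList());
            return CompletableFuture
                    .allOf(futures.toArray(new CompletableFuture<?>[0]))
                    // Every future is complete at this point, so join() does not block.
                    .thenApply(ignored -> futures.stream()
                            .map(CompletableFuture::join)
                            .collect(Collectors.toList()));
        });
    }

    static List<Integer> runDemo() {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            CompletableFuture<List<String>> seed =
                    CompletableFuture.supplyAsync(() -> List.of("a", "bb", "ccc"), executor);
            return fanOut(seed, executor).join();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // [1, 2, 3]
    }
}
```

This yields a single CompletableFuture<List<ResultType>> rather than a list of futures. If any follow-up future fails, the combined future completes exceptionally once all of them have finished.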

Can RxJava reduce() be unsafe when parallelized?

I want to use the reduce() operation on observable to map it to a Guava ImmutableList, since I prefer it so much more to the standard ArrayList.
Observable<String> strings = ...
Observable<ImmutableList<String>> captured = strings.reduce(ImmutableList.<String>builder(), (b,s) -> b.add(s))
.map(ImmutableList.Builder::build);
captured.forEach(i -> System.out.println(i));
Simple enough. But suppose I somewhere scheduled the observable strings in parallel with multiple threads or something. Would this not derail the reduce() operation and possibly cause a race condition? Especially since the ImmutableList.Builder would be vulnerable to that?
The problem lies in the shared state between realizations of the chain. This is pitfall # 8 in my blog:
Shared state in an Observable chain
Let's assume you are dissatisfied with the performance or the type of the List the toList() operator returns and you want to roll your own aggregator instead of it. For a change, you want to do this by using existing operators and you find the operator reduce():
Observable<Vector<Integer>> list = Observable
.range(1, 3)
.reduce(new Vector<Integer>(), (vector, value) -> {
vector.add(value);
return vector;
});
list.subscribe(System.out::println);
list.subscribe(System.out::println);
list.subscribe(System.out::println);
When you run the 'test' calls, the first prints what you'd expect, but the second prints a vector where the range 1-3 appears twice and the third subscribe prints 9 elements!
The problem is not with the reduce() operator itself but with the expectation surrounding it. When the chain is established, the new Vector passed in is a 'global' instance and will be shared between all evaluation of the chain.
Naturally, there is a way of fixing this without implementing an operator for the whole purpose (which should be quite simple if you see the potential in the previous CounterOp):
Observable<Vector<Integer>> list2 = Observable
.range(1, 3)
.reduce((Vector<Integer>)null, (vector, value) -> {
if (vector == null) {
vector = new Vector<>();
}
vector.add(value);
return vector;
});
list2.subscribe(System.out::println);
list2.subscribe(System.out::println);
list2.subscribe(System.out::println);
You need to start with null and create a vector inside the accumulator function, which now isn't shared between subscribers.
Alternatively, you can look into the collect() operator which has a factory callback for the initial value.
The rule of thumb here is that whenever you see an aggregator-like operator taking some plain value, be cautious as this 'initial value' will most likely be shared across all subscribers and if you plan to consume the resulting stream with multiple subscribers, they will clash and may give you unexpected results or even crash.
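The same pitfall can be reproduced without RxJava. The sketch below is an analogy using java.util.stream, not the Observable API: each loop iteration plays the role of one subscription, and SeedReuseDemo and its method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class SeedReuseDemo {

    // Re-runs the same reduction three times with ONE shared seed instance,
    // mirroring three subscriptions to a chain built with reduce(seed, ...).
    static List<Integer> sharedSeedSizes() {
        List<Integer> sharedSeed = new ArrayList<>();
        List<Integer> sizes = new ArrayList<>();
        for (int run = 0; run < 3; run++) {
            List<Integer> result = Stream.of(1, 2, 3).reduce(
                    sharedSeed,
                    (list, value) -> { list.add(value); return list; },
                    (left, right) -> left); // combiner is not invoked on a sequential stream
            sizes.add(result.size());
        }
        return sizes;
    }

    // Uses a factory (ArrayList::new) so that every run gets a fresh container.
    static List<Integer> freshSeedSizes() {
        List<Integer> sizes = new ArrayList<>();
        for (int run = 0; run < 3; run++) {
            List<Integer> result = Stream.of(1, 2, 3)
                    .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
            sizes.add(result.size());
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(sharedSeedSizes()); // [3, 6, 9]
        System.out.println(freshSeedSizes());  // [3, 3, 3]
    }
}
```

The shared seed keeps growing across runs, while the factory-based version starts from an empty container each time, which is the behaviour a factory callback for the initial value provides.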
According to the Observable contract, an observable must not make onNext calls in parallel, so you have to modify your strings Observable to respect this. You can use the serialize operator to achieve this.

How does RxJava Observable "Iteration" work?

I started to play around with RxJava and ReactFX, and I became pretty fascinated with it. But as I'm experimenting I have dozens of questions and I'm constantly researching for answers.
One thing I'm observing (no pun intended) is of course lazy execution. With my exploratory code below, I noticed nothing gets executed until the merge.subscribe(pet -> System.out.println(pet)) is called. But what fascinated me is when I subscribed a second subscriber merge.subscribe(pet -> System.out.println("Feed " + pet)), it fired the "iteration" again.
What I'm trying to understand is the behavior of the iteration. It does not seem to behave like a Java 8 stream that can only be used once. Is it literally going through each String one at a time and posting it as the value for that moment? And do any new subscribers following any previously fired subscribers receive those items as if they were new?
public class RxTest {
public static void main(String[] args) {
Observable<String> dogs = Observable.from(ImmutableList.of("Dasher", "Rex"))
.filter(dog -> dog.matches("D.*"));
Observable<String> cats = Observable.from(ImmutableList.of("Tabby", "Grumpy Cat", "Meowmers", "Peanut"));
Observable<String> ferrets = Observable.from(CompletableFuture.supplyAsync(() -> "Harvey"));
Observable<String> merge = dogs.mergeWith(cats).mergeWith(ferrets);
merge.subscribe(pet -> System.out.println(pet));
merge.subscribe(pet -> System.out.println("Feed " + pet));
}
}
Observable<T> represents a monad, a chained operation, not the execution of the operation itself. It is a descriptive language, rather than the imperative style you're used to. To execute an operation, you .subscribe() to it. Every time you subscribe, a new execution stream is created from scratch. Do not confuse streams with threads, as subscriptions are executed synchronously unless you specify a thread change with .subscribeOn() or .observeOn(). You chain new elements to any existing operation/monad/Observable to add new behaviour, like changing threads, filtering, accumulation, transformation, etc. If your observable is an expensive operation that you don't want to repeat on every subscription, you can prevent re-execution by using .cache().
To make any asynchronous/synchronous Observable<T> operation into a synchronous inlined one, use .toBlocking() to change its type to BlockingObservable<T>. Instead of .subscribe() it contains new methods to execute operations on each result with .forEach(), or coerce with .first().
Observables are a good tool because they're mostly* deterministic (same inputs always yield same outputs unless you're doing something wrong), reusable (you can send them around as part of a command/policy pattern) and for the most part ignore concurrency because they should not rely on shared state (a.k.a. doing something wrong). BlockingObservables are good if you're trying to bring an observable-based library into imperative code, or just executing an operation on an Observable that you are 100% confident is well managed.
Architecting your application around these principles is a change of paradigm that I can't really cover in this answer.
*There are breaches like Subject and Observable.create() that are needed to integrate with imperative frameworks.
