Using non-blocking calls, I want to generically take a Mono, call a method that returns a Flux, and for each item in that Flux, call a method that returns a Mono, ultimately producing a Flux of aggregate objects (Bar + Foo + More) with as many elements as the Flux-returning method will emit.
As a concrete example:
Methods:
Flux<Bar> getBarsByFoo(Foo foo);
Mono<More> getMoreByBar(Bar bar);
Combined getCombinedFrom(Bar bar, Foo foo, More more);
Working code section:
Flux<Combined> getCombinedByFoo(Foo foo) {
getBarsByFoo(foo)...
}
From a blocking perspective what I want to accomplish is:
List<Combined> getCombinedByFoo(Foo foo) {
List<Bar> bars = getBarsByFoo(foo);
List<Combined> combinedList = new ArrayList<>(bars.size());
for (Bar bar: bars) {
More more = getMoreByBar(bar);
combinedList.add(getCombinedFrom(bar, foo, more));
}
return combinedList;
}
Any help on which Flux and Mono methods to use would be appreciated. I am still learning to shift my brain into non-blocking thinking. Conceptually, I think there is a function to apply to each element (Bar) emitted by getBarsByFoo(Foo foo) to somehow map it to the combined element...
I like to think about Reactor programming as a flow of operations (as in flow-based programming), as a chain/DAG of operations.
In your case, you want to:
map each emitted Bar object to a Combined object.
Along the way, you need to use/call another publisher to fetch additional information:
you need to wait for it to complete so you can fetch its output value. In the case of monads/streams, there's a flatMap operation for that.
flatMap waits for (or you could say extracts) another publisher's value to integrate it into the current chain of operations. I think it is called flatMap because, in a sense, we break a level of hierarchy to flatten two nested publishers/monads into a single merged one.
The following example shows a reactive version of your method (for a less verbose version, see Toerktumlare's answer):
Flux<Combined> combine(Foo foo) {
    Flux<Bar> bars = getBarsByFoo(foo);
    Flux<Combined> result = bars.flatMap(bar -> {
        Mono<More> nextMore = getMoreByBar(bar);
        Mono<Combined> next = nextMore.map(more -> getCombinedFrom(bar, foo, more));
        return next;
    });
    return result;
}
If you get your foo object through a Mono, you can just call flatMapMany on it:
Mono<Foo> nextFoo = ...;
Flux<Combined> combined = nextFoo.flatMapMany(foo -> combine(foo));
WARNING
flatMap is very powerful: it can trigger concurrent execution of the provided operation. In your case, it means that many getMoreByBar(bar) operations can be launched at the same time. But it is a double-edged sword, because it also means that:
ordering of elements is not preserved (or at least, there is no guarantee)
in a resource-constrained system, having many operations launched at the same time could hurt performance or cause harm to the system (too many open files, etc.)
The default concurrency level is quite high (256) and can be controlled in different ways:
flatMap accepts an optional concurrency argument to adapt the number of tasks allowed to run at the same time.
There are other operators that flatten publishers but manage work differently, such as concatMap: it enforces sequential execution (and therefore preserves ordering) of the mapping tasks.
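As a sketch, both options applied to the method above (the concurrency value of 4 is purely illustrative):
Flux<Combined> bounded = getBarsByFoo(foo)
        // At most 4 getMoreByBar calls in flight at a time; ordering not guaranteed.
        .flatMap(bar -> getMoreByBar(bar)
                .map(more -> getCombinedFrom(bar, foo, more)), 4);

Flux<Combined> ordered = getBarsByFoo(foo)
        // One inner publisher at a time; source ordering preserved.
        .concatMap(bar -> getMoreByBar(bar)
                .map(more -> getCombinedFrom(bar, foo, more)));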
Something like this:
Flux<Combined> getCombinedByFoo(Foo foo) {
    return getBarsByFoo(foo)
        .flatMap(bar -> getMoreByBar(bar)
            // getCombinedFrom returns a plain Combined, so map (not flatMap) fits here
            .map(more -> getCombinedFrom(bar, foo, more)));
}
I don't have a way to check this; I wrote it freehand, but it should be something like this.
Related
I had a discussion with a colleague that we should not be using setters inside stream.map(), like the solution suggested here - https://stackoverflow.com/a/35377863/1552771
There is a comment on that answer that discourages using map this way, but no reason has been given as to why it is a bad idea. Can someone provide a possible scenario why this can break?
I have seen some discussions where people talk about concurrent modification of the collection itself, by adding or removing items from it, but are there any negatives to using map just to set some values on a data object?
Using side effects in map, such as invoking a setter, has a lot in common with using peek for non-debugging purposes, which has been discussed in In Java streams is peek really only for debugging?
This answer has a very good general advice:
Don't use the API in an unintended way, even if it accomplishes your immediate goal. That approach may break in the future, and it is also unclear to future maintainers.
Whereas the other answer names the associated practical problems, I have to cite myself:
The important thing you have to understand, is that streams are driven by the terminal operation. The terminal operation determines whether all elements have to be processed or any at all.
When you place an operation with a side effect into a map function, you have a specific expectation about on which elements it will be performed and perhaps even how it will be performed, e.g. in which order. Whether the expectation will be fulfilled, depends on other subsequent Stream operations and perhaps even on subtle implementation details.
To show some examples:
IntStream.range(0, 10) // outcome changes with Java 9
.mapToObj(i -> System.out.append("side effect on "+i+"\n"))
.count();
IntStream.range(0, 2) // outcome changes with Java 10 (or 8u222)
.flatMap(i -> IntStream.range(i * 5, (i+1) * 5 ))
.map(i -> { System.out.println("side effect on "+i); return i; })
.anyMatch(i -> i > 3);
IntStream.range(0, 10) // outcome may change with every run
.parallel()
.map(i -> { System.out.println("side effect on "+i); return i; })
.anyMatch(i -> i > 6);
Further, as already mentioned in the linked answer, even if you have a terminal operation that processes all elements and is ordered, there is no guarantee about the processing order (or concurrency for parallel streams) of intermediate operations.
The code may happen to do the desired thing when you have a stream with no duplicates, a terminal operation that processes all elements, and a map function that only calls a trivial setter; but the code has so many dependencies on subtle surrounding conditions that it will become a maintenance nightmare. This brings us back to the first quote about using an API in an unintended way.
I think the real issue here is that it is just bad practice and violates the intended use of the capability. For example, one can also accomplish the same thing with filter. This perverts its use and also makes the code confusing or at best, unnecessarily verbose.
public static void main(String[] args) {
    List<MyNumb> foo = IntStream.range(1, 11)
            .mapToObj(MyNumb::new)
            .collect(Collectors.toList());
    System.out.println(foo);

    foo = foo.stream()
            .filter(i -> {
                i.value *= 10;
                return true;
            })
            .collect(Collectors.toList());
    System.out.println(foo);
}

class MyNumb {
    int value;

    MyNumb(int v) {
        value = v;
    }

    public String toString() {
        return Integer.toString(value);
    }
}
So going back to the original example. One does not need to use map at all, resulting in the following rather ugly mess.
foos = foos.stream()
        .filter(foo -> {
            boolean b = foo.isBlue();
            if (b) {
                foo.setTitle("Some value");
            }
            return b;
        })
        .collect(Collectors.toList());
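A cleaner alternative, as a sketch: keep the predicate pure and move the mutation out of the pipeline, separating selection from modification:
// Select with a pure predicate first; mutate afterwards, outside the stream.
List<Foo> blueFoos = foos.stream()
        .filter(Foo::isBlue)
        .collect(Collectors.toList());
blueFoos.forEach(foo -> foo.setTitle("Some value"));
foos = blueFoos;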
Streams are not just a new set of APIs that makes things easier for you; they also bring the functional programming paradigm with them.
And the functional programming paradigm's most important aspect is using pure functions for computations. A pure function is one whose output depends only on its input.
So, basically, the Streams API should be used with stateless, side-effect-free, pure functions.
Quoting things from Joshua Bloch's Effective Java (3rd Edition)
If you’re new to streams, it can be difficult to get the hang of them. Merely expressing your computation as a stream pipeline can be hard. When you succeed, your program will run, but you may realize little if any benefit. Streams isn’t just an API, it’s a paradigm based on functional programming. In order to obtain the expressiveness, speed, and in some cases parallelizability that streams have to offer, you have to adopt the paradigm as well as the API. The most important part of the streams paradigm is to structure your computation as a sequence of transformations where the result of each stage is as close as possible to a pure function of the result of the previous stage. A pure function is one whose result depends only on its input: it does not depend on any mutable state, nor does it update any state. In order to achieve this, any function objects that you pass into stream operations, both intermediate and terminal, should be free of side-effects.
Occasionally, you may see streams code that looks like this snippet, which builds a frequency table of the words in a text file:
// Uses the streams API but not the paradigm--Don't do this!
Map<String, Long> freq = new HashMap<>();
try (Stream<String> words = new Scanner(file).tokens()) {
    words.forEach(word -> {
        freq.merge(word.toLowerCase(), 1L, Long::sum);
    });
}
What’s wrong with this code? After all, it uses streams, lambdas, and method references, and gets the right answer. Simply put, it’s not streams code at all; it’s iterative code masquerading as streams code. It derives no benefits from the streams API, and it’s (a bit) longer, harder to read, and less maintainable than the corresponding iterative code. The problem stems from the fact that this code is doing all its work in a terminal forEach operation, using a lambda that mutates external state (the frequency table). A forEach operation that does anything more than present the result of the computation performed by a stream is a “bad smell in code,” as is a lambda that mutates state. So how should this code look?
// Proper use of streams to initialize a frequency table
Map<String, Long> freq;
try (Stream<String> words = new Scanner(file).tokens()) {
freq = words
.collect(groupingBy(String::toLowerCase, counting()));
}
Just to name a few:
map() with a setter is interfering (it modifies the initial data), while the spec requires a non-interfering function. For more details read this post.
map() with a setter is stateful (your logic may depend on the initial value of the field you're updating), while the spec requires a stateless function
even if you're not interfering with the collection that you're iterating over, the side effect of the setter is unnecessary
setters in map may mislead future maintainers of the code
etc...
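For contrast, a side-effect-free version of the earlier MyNumb example, as a sketch: map each element to a new object instead of mutating fields in place.
// Pure mapping: builds new MyNumb instances, leaving the originals untouched.
foo = foo.stream()
        .map(n -> new MyNumb(n.value * 10))
        .collect(Collectors.toList());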
So, I'm trying to work with WebFlux, and I've got a scenario: "check if an object exists; if so, do stuff; otherwise, indicate an error".
That can be written in reactor as:
public Mono<Void> handleObjectWithSomeId(Mono<IdType> id) {
    return id
        .flatMap(repository::exists) // repository.exists returns Mono<Boolean>
        .flatMap(e -> e ? Mono.just(e) : Mono.error(new DoesntExistException()))
        .then(
            // someBusinessLogic returns plain void, so fromRunnable wraps it
            Mono.fromRunnable(this::someBusinessLogic)
        );
}
or as:
public Mono<Void> handleObjectWithSomeId(Mono<IdType> id) {
    return id
        .flatMap(repository::exists) // repository.exists returns Mono<Boolean>
        .flatMap(e -> e ? Mono.just(e) : Mono.error(new DoesntExistException()))
        // someBusinessLogic returns void, so the lambda wraps the call and re-emits e
        .map(e -> { someBusinessLogic(); return e; })
        .then();
}
Let's assume that the return type of someBusinessLogic cannot be changed and has to be a plain void, not Mono<Void>.
In both cases, if the object doesn't exist, an appropriate Mono.error(...) will be produced.
While I understand that then and flatMap have different semantics, effectively I get the same result. Even though in the second case I'm using flatMap against its meaning, I get to skip the ad-hoc Mono.fromRunnable wrapper in favor of a simple map with an ignored argument (which seems more readable). My point is, both approaches have advantages and disadvantages when it comes to readability and code quality.
So, here's a summary:
Using then
pros
is semantically correct
cons
in many cases (like above) requires wrapping in ad-hoc Mono/Flux
Using flatMap
pros
simplifies continued "happy scenario" code
cons
is semantically incorrect
What are other pros/cons of both approaches? What should I take under consideration when choosing an operator?
I've found this Reactor issue, which states that there is no real difference in speed.
TL;DR: If you care about the result of the previous computation, you can use map(), flatMap(), or another map variant. Otherwise, if you just want the previous stream to finish, use then().
You can see a detailed log of the execution for yourself by placing a .log() call in both methods:
public Mono<Void> handleObjectWithSomeId(Mono<IdType> id) {
return id.log()
.flatMap(...)
...;
}
Like all other operations in Project Reactor, the semantics for then() and flatMap() are already defined. The context mostly defines how these operators should work together to solve your problem.
Let's consider the context you provided in the question. What flatMap() does is, whenever it gets an event, it executes the mapping function asynchronously.
Since we have a Mono<> after the last flatMap() in the question, it will provide the result of the previous single computation, which we ignore. Note that if we had a Flux<> instead, the computation would be done for every element.
On the other hand, then() doesn't care about the preceding sequence of events. It just cares about the completion event:
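A minimal sketch of that behavior (values are illustrative):
Flux.just(1, 2, 3)
    .doOnNext(i -> System.out.println("emitted " + i))
    // then() drops every onNext from the source; it only waits for the
    // source to complete, then switches to the provided Mono.
    .then(Mono.fromRunnable(() -> System.out.println("source completed")))
    .subscribe();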
That's why, in your example, it doesn't matter very much which one you use. However, in other contexts you might choose accordingly.
You might also find the Which operator do I need? section of the Project Reactor reference documentation helpful.
When experimenting with CompletableFuture, I was wondering if the given code is safe.
CompletableFuture<Integer> foo = CompletableFuture.supplyAsync(() -> 42);
foo.thenApply((bar) -> {
System.out.println("bar " + bar);
return bar;
})
.acceptEither(foo.thenApply((baz) -> {
System.out.println("baz " + baz);
return baz;
}),
(z) -> System.out.println("finished processing of " + z));
It works, printing
bar 42
baz 42
finished processing of 42
Is it safe/a good idea to call thenApply or other methods more than once on a given instance of CompletableFuture?
What you describe is not a “reuse” of a CompletableFuture, as it still performs only one action and completes at most once.
You are just registering multiple dependent stages, which is completely within the foreseen usage. At no point does the documentation suggest that dependent stages have to form a single linear chain. The use case is already described by the interface CompletionStage<T>:
A stage of a possibly asynchronous computation, that performs an action or computes a value when another CompletionStage completes. A stage completes upon termination of its computation, but this may in turn trigger other dependent stages.
Note the use of “other dependent stages” rather than “another dependent stage”. The entire documentation consistently uses the plural when it comes to dependent stages. This also applies to the documentation of the CompletableFuture<T> implementation class.
After all, when you have two actions a and b which have no dependency on each other but both depend on c, the only thing that makes sense is to chain both to c, rather than create a chain like c → a → b or c → b → a, as the latter would inhibit the concurrent execution of the independent actions a and b, which is the entire point of the concurrency API.
Note that when modelling the dependency this way, there is no guarantee of the outcome you have shown. The two printing actions “bar” and “baz” have no dependency on each other, only on the supplier of the 42 value, and the action that prints “finished processing” is scheduled for the completion of either of them, not both. So besides the indeterminate order of the “bar” and “baz” output, you could also see a log like “bar, finished processing, baz” or “baz, finished processing, bar”.
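If the final action were supposed to wait for both printing actions instead of either one, the dependency could be expressed with thenAcceptBoth; a sketch (variable names are illustrative):
CompletableFuture<Integer> source = CompletableFuture.supplyAsync(() -> 42);
CompletableFuture<Integer> bar = source.thenApply(v -> {
    System.out.println("bar " + v);
    return v;
});
CompletableFuture<Integer> baz = source.thenApply(v -> {
    System.out.println("baz " + v);
    return v;
});
// Runs only after BOTH dependent stages have completed.
bar.thenAcceptBoth(baz, (x, y) -> System.out.println("finished processing of " + x));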
I want to use the reduce() operation on an observable to map it to a Guava ImmutableList, since I prefer it so much more than the standard ArrayList.
Observable<String> strings = ...
Observable<ImmutableList<String>> captured = strings.reduce(ImmutableList.<String>builder(), (b,s) -> b.add(s))
.map(ImmutableList.Builder::build);
captured.forEach(i -> System.out.println(i));
Simple enough. But suppose I somewhere scheduled the observable strings in parallel with multiple threads or something. Would this not derail the reduce() operation and possibly cause a race condition? Especially since the ImmutableList.Builder would be vulnerable to that?
The problem lies in the shared state between realizations of the chain. This is pitfall #8 in my blog:
Shared state in an Observable chain
Let's assume you are dissatisfied with the performance or the type of the List the toList() operator returns and you want to roll your own aggregator instead of it. For a change, you want to do this by using existing operators and you find the operator reduce():
Observable<Vector<Integer>> list = Observable
.range(1, 3)
.reduce(new Vector<Integer>(), (vector, value) -> {
vector.add(value);
return vector;
});
list.subscribe(System.out::println);
list.subscribe(System.out::println);
list.subscribe(System.out::println);
When you run the 'test' calls, the first prints what you'd expect, but the second prints a vector where the range 1-3 appears twice and the third subscribe prints 9 elements!
The problem is not with the reduce() operator itself but with the expectation surrounding it. When the chain is established, the new Vector passed in is a 'global' instance and will be shared between all evaluations of the chain.
Naturally, there is a way of fixing this without implementing an operator for the whole purpose (which should be quite simple if you see the potential in the previous CounterOp):
Observable<Vector<Integer>> list2 = Observable
.range(1, 3)
.reduce((Vector<Integer>)null, (vector, value) -> {
if (vector == null) {
vector = new Vector<>();
}
vector.add(value);
return vector;
});
list2.subscribe(System.out::println);
list2.subscribe(System.out::println);
list2.subscribe(System.out::println);
You need to start with null and create a vector inside the accumulator function, which now isn't shared between subscribers.
Alternatively, you can look into the collect() operator, which has a factory callback for the initial value.
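A sketch with collect(), assuming the RxJava 1.x signature collect(stateFactory, collector): the factory runs once per subscriber, so each subscription gets its own container.
Observable<Vector<Integer>> list3 = Observable
        .range(1, 3)
        .collect(() -> new Vector<Integer>(), (vector, value) -> vector.add(value));

list3.subscribe(System.out::println); // [1, 2, 3]
list3.subscribe(System.out::println); // [1, 2, 3] again; no shared state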
The rule of thumb here is that whenever you see an aggregator-like operator taking some plain value, be cautious as this 'initial value' will most likely be shared across all subscribers and if you plan to consume the resulting stream with multiple subscribers, they will clash and may give you unexpected results or even crash.
According to the Observable contract, an observable must not make onNext calls in parallel, so you have to modify your strings Observable to respect this. You can use the serialize operator to achieve this.
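Applied to the question's pipeline, a sketch (note that the shared ImmutableList.Builder would still be reused across subscribers, as the other answer explains):
Observable<ImmutableList<String>> captured = strings
        .serialize() // forces sequential onNext delivery to the accumulator
        .reduce(ImmutableList.<String>builder(), (b, s) -> b.add(s))
        .map(ImmutableList.Builder::build);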
I started to play around with RxJava and ReactFX, and I became pretty fascinated with it. But as I'm experimenting I have dozens of questions and I'm constantly researching for answers.
One thing I'm observing (no pun intended) is of course lazy execution. With my exploratory code below, I noticed nothing gets executed until the merge.subscribe(pet -> System.out.println(pet)) is called. But what fascinated me is when I subscribed a second subscriber merge.subscribe(pet -> System.out.println("Feed " + pet)), it fired the "iteration" again.
What I'm trying to understand is the behavior of the iteration. It does not seem to behave like a Java 8 stream that can only be used once. Is it literally going through each String one at a time and posting it as the value for that moment? And do any new subscribers following any previously fired subscribers receive those items as if they were new?
public class RxTest {
public static void main(String[] args) {
Observable<String> dogs = Observable.from(ImmutableList.of("Dasher", "Rex"))
.filter(dog -> dog.matches("D.*"));
Observable<String> cats = Observable.from(ImmutableList.of("Tabby", "Grumpy Cat", "Meowmers", "Peanut"));
Observable<String> ferrets = Observable.from(CompletableFuture.supplyAsync(() -> "Harvey"));
Observable<String> merge = dogs.mergeWith(cats).mergeWith(ferrets);
merge.subscribe(pet -> System.out.println(pet));
merge.subscribe(pet -> System.out.println("Feed " + pet));
}
}
Observable<T> represents a monad, a chained operation, not the execution of the operation itself. It is descriptive language, rather than the imperative style you're used to. To execute an operation, you .subscribe() to it. Every time you subscribe, a new execution stream is created from scratch. Do not confuse streams with threads: subscriptions are executed synchronously unless you specify a thread change with .subscribeOn() or .observeOn(). You chain new elements onto any existing operation/monad/Observable to add new behaviour, like changing threads, filtering, accumulation, transformation, etc. If your observable is an expensive operation you don't want to repeat on every subscription, you can prevent re-creation by using .cache().
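Applied to the question's code, a minimal sketch (replacing the original merge definition):
// cache() runs the chain once on first subscription and replays the
// recorded items to every later subscriber instead of re-executing.
Observable<String> merge = dogs.mergeWith(cats).mergeWith(ferrets).cache();
merge.subscribe(pet -> System.out.println(pet));           // executes the chain
merge.subscribe(pet -> System.out.println("Feed " + pet)); // replays cached pets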
To turn any asynchronous/synchronous Observable<T> operation into a synchronous, inlined one, use .toBlocking() to change its type to BlockingObservable<T>. Instead of .subscribe(), it provides methods to execute operations on each result with .forEach(), or to coerce the result with .first().
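For example, reusing the merge observable from the question (a sketch):
merge.toBlocking().forEach(System.out::println); // blocks until the observable completes
String firstPet = merge.toBlocking().first();    // blocks until the first item arrives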
Observables are a good tool because they're mostly* deterministic (the same inputs always yield the same outputs unless you're doing something wrong), reusable (you can pass them around as part of a command/policy pattern), and can for the most part ignore concurrency because they should not rely on shared state (a.k.a. doing something wrong). BlockingObservables are good if you're trying to bring an observable-based library into an imperative language, or just executing an operation on an Observable that you have 100% confidence is well managed.
Architecting your application around these principles is a change of paradigm that I can't really cover on this answer.
*There are breaches like Subject and Observable.create() that are needed to integrate with imperative frameworks.