I've found a lot of answers regarding RxJava, but I want to understand how it works in Reactor.
My current understanding is very vague: I tend to think of map as being synchronous and flatMap as asynchronous, but I can't really get my head around it.
Here is an example:
files.flatMap { it ->
    Mono.just(Paths.get(UPLOAD_ROOT, it.filename()).toFile())
        .map { destFile ->
            destFile.createNewFile()
            destFile
        }
        .flatMap(it::transferTo)
}.then()
I have files (a Flux<FilePart>) and I want to copy them to some UPLOAD_ROOT on the server.
This example is taken from a book.
I can change all the .map to .flatMap and vice versa and everything still works. I wonder what the difference is.
map is for synchronous, non-blocking, 1-to-1 transformations
flatMap is for asynchronous (non-blocking) 1-to-N transformations
The difference is visible in the method signature:
map takes a Function<T, U> and returns a Flux<U>
flatMap takes a Function<T, Publisher<V>> and returns a Flux<V>
That's the major hint: you can pass a Function<T, Publisher<V>> to a map, but it wouldn't know what to do with the Publishers, and that would result in a Flux<Publisher<V>>, a sequence of inert publishers.
On the other hand, flatMap expects a Publisher<V> for each T. It knows what to do with it: subscribe to it and propagate its elements in the output sequence. As a result, the return type is Flux<V>: flatMap will flatten each inner Publisher<V> into the output sequence of all the Vs.
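To see that difference in types concretely, here is a minimal sketch (the "user-" mapping is made up purely to show the shapes):
Flux<String> ids = Flux.just("1", "2", "3");

// map with a Publisher-returning function: a sequence of inert publishers
Flux<Mono<String>> nested = ids.map(id -> Mono.just("user-" + id));

// flatMap subscribes to each inner Mono and flattens its element into the output
Flux<String> flat = ids.flatMap(id -> Mono.just("user-" + id));

flat.subscribe(System.out::println); // prints user-1, user-2, user-3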
About the 1-N aspect:
for each <T> input element, flatMap maps it to a Publisher<V>. In some cases (e.g. an HTTP request), that publisher will emit only one item, in which case we're pretty close to an async map.
But that's the degenerate case. The generic case is that a Publisher can emit multiple elements, and flatMap works just as well.
As an example, imagine you have a reactive database and you flatMap from a sequence of user IDs, with a request that returns a user's set of Badge. You end up with a single Flux<Badge> of all the badges of all these users.
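A sketch of that case, assuming a hypothetical reactive badgeRepository whose findBadgesByUserId returns a Flux<Badge> (the names are invented for illustration):
Flux<String> userIds = Flux.just("u1", "u2", "u3");

// flatMap subscribes to each inner Flux<Badge> and merges it into one output sequence.
// Note that flatMap merges eagerly, so badges of different users may interleave.
Flux<Badge> allBadges = userIds.flatMap(id -> badgeRepository.findBadgesByUserId(id));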
Is map really synchronous and non-blocking?
Yes: it is synchronous in the way the operator applies it (a simple method call, and then the operator emits the result) and non-blocking in the sense that the function itself shouldn't block the operator calling it. In other words, it shouldn't introduce latency. That's because a Flux is still asynchronous as a whole. If it blocks mid-sequence, it will impact the rest of the Flux processing, or even other Flux.
If your map function is blocking/introduces latency but cannot be converted to return a Publisher, consider publishOn/subscribeOn to offload that blocking work to a separate thread.
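For instance, a minimal sketch, assuming Reactor 3.3+ (where Schedulers.boundedElastic() is available) and a made-up blocking method blockingReadFile:
import reactor.core.publisher.Flux;
import reactor.core.scheduler.Schedulers;

Flux<String> contents = Flux.just("a.txt", "b.txt")
    .publishOn(Schedulers.boundedElastic()) // downstream operators now run on elastic threads
    .map(name -> blockingReadFile(name));   // the blocking work no longer stalls the source thread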
The flatMap method is similar to the map method, with the key difference that the function you provide to it should return a Mono<T> or Flux<T>.
Using the map method would result in a Mono<Mono<T>>
whereas using flatMap results in a Mono<T>.
For example, it is useful when you have to make a network call to retrieve data, with a Java API that returns a Mono, and then another network call that needs the result of the first one.
// Signature of the HttpClient.get method
Mono<JsonObject> get(String url);

// The two URLs to call
String firstUserUrl = "my-api/first-user";
String userDetailsUrl = "my-api/users/details/"; // needs the id at the end

// Example with map
Mono<Mono<JsonObject>> result = HttpClient.get(firstUserUrl)
    .map(user -> HttpClient.get(userDetailsUrl + user.getId()));
// This results in a Mono<Mono<...>> because HttpClient.get(...) returns a Mono

// Same example with flatMap
Mono<JsonObject> bestResult = HttpClient.get(firstUserUrl)
    .flatMap(user -> HttpClient.get(userDetailsUrl + user.getId()));
// Now the result has the type we expected
Also, it allows for handling errors precisely:
public class UserApi {

    private HttpClient httpClient;

    Mono<User> findUser(String username) {
        String queryUrl = "http://my-api-address/users/" + username;
        return Mono.fromCallable(() -> httpClient.get(queryUrl))
            .flatMap(response -> {
                if (response.statusCode == 404) return Mono.error(new NotFoundException("User " + username + " not found"));
                else if (response.statusCode == 500) return Mono.error(new InternalServerErrorException());
                else if (response.statusCode != 200) return Mono.error(new Exception("Unknown error calling my-api"));
                return Mono.just(response.data);
            });
    }
}
How map() works internally in Reactor.
First, create a Player class:
@Data
@AllArgsConstructor
public class Player {
    String firstName;
    String lastName;
}
Now create some instances of the Player class:
Flux<Player> players = Flux.just(
        "Zahid Khan",
        "Arif Khan",
        "Obaid Sheikh")
    .map(fullname -> {
        String[] split = fullname.split("\\s");
        return new Player(split[0], split[1]);
    });
StepVerifier.create(players)
.expectNext(new Player("Zahid", "Khan"))
.expectNext(new Player("Arif", "Khan"))
.expectNext(new Player("Obaid", "Sheikh"))
.verifyComplete();
What's important to understand about map() is that the mapping is
performed synchronously, as each item is published by the source Flux.
If you want to perform the mapping asynchronously, you should consider
the flatMap() operation.
How flatMap() works internally.
Flux<Player> players = Flux.just(
        "Zahid Khan",
        "Arif Khan",
        "Obaid Sheikh")
    .flatMap(fullname ->
        Mono.just(fullname)
            .map(p -> {
                String[] split = p.split("\\s");
                return new Player(split[0], split[1]);
            })
            .subscribeOn(Schedulers.parallel()));
List<Player> playerList = Arrays.asList(
    new Player("Zahid", "Khan"),
    new Player("Arif", "Khan"),
    new Player("Obaid", "Sheikh"));

StepVerifier.create(players)
    .expectNextMatches(playerList::contains)
    .expectNextMatches(playerList::contains)
    .expectNextMatches(playerList::contains)
    .verifyComplete();
Internally in flatMap(), a map() operation is performed on the Mono to transform the String into a Player. Furthermore, subscribeOn() indicates that each subscription should take place on a parallel thread. In the absence of subscribeOn(), flatMap() acts synchronously.
map is for synchronous, non-blocking, one-to-one transformations,
while flatMap is for asynchronous (non-blocking) one-to-many transformations.
Related
I have a class with a collection of Seed elements. One of Seed's methods returns Optional<Pair<Boolean, String>>.
I'm trying to loop over all seeds, find whether any boolean value is true and, at the same time, build a set of all the String values. Given input in the form Optional<Pair<Boolean, String>>, the output should be Optional<Signal>, where Signal is like:
class Signal {
    public boolean exposure;
    public Set<String> alarms;
    // constructor and getters (can add anything to this class, it's just a bag)
}
This is what I currently have that works:
// Seed::hadExposure yields Optional<Pair<Boolean, String>> where Pair have key/value or left/right
public Optional<Signal> withExposure() {
if (seeds.stream().map(Seed::hadExposure).flatMap(Optional::stream).findAny().isEmpty()) {
return Optional.empty();
}
final var exposure = seeds.stream()
.map(Seed::hadExposure)
.flatMap(Optional::stream)
.anyMatch(Pair::getLeft);
final var alarms = seeds.stream()
.map(Seed::hadExposure)
.flatMap(Optional::stream)
.map(Pair::getRight)
.filter(Objects::nonNull)
.collect(Collectors.toSet());
return Optional.of(new Signal(exposure, alarms));
}
Now I have time to make it better, because Seed::hadExposure could become an expensive call, so I was trying to see if I could do all of this in a single pass. I've tried (some suggestions from previous questions) with reduce, using collectors (Collectors.collectingAndThen, Collectors.partitioningBy, etc.), but nothing has worked so far.
It's possible to do this in a single stream() expression using map to convert the non-empty exposure to a Signal and then a reduce to combine the signals:
Signal signal = exposures.stream()
.map(exposure ->
new Signal(
exposure.getLeft(),
exposure.getRight() == null
? Collections.emptySet()
: Collections.singleton(exposure.getRight())))
.reduce(
new Signal(false, new HashSet<>()),
(leftSig, rightSig) -> {
HashSet<String> alarms = new HashSet<>();
alarms.addAll(leftSig.alarms);
alarms.addAll(rightSig.alarms);
return new Signal(
leftSig.exposure || rightSig.exposure, alarms);
});
However, if you have a lot of alarms it would be expensive because it creates a new Set and adds the new alarms to the accumulated alarms for each exposure in the input.
In a language that was designed from the ground-up to support functional programming, like Scala or Haskell, you'd have a Set data type that would let you efficiently create a new set that's identical to an existing set but with an added element, so there'd be no efficiency worries:
filteredSeeds.foldLeft((false, Set[String]())) { (result, exposure) =>
(result._1 || exposure.getLeft, result._2 + exposure.getRight)
}
But Java doesn't come with anything like that out of the box.
You could create just a single Set for the result and mutate it in your stream's reduce expression, but some would regard that as poor style because you'd be mixing a functional paradigm (map/reduce over a stream) with a procedural one (mutating a set).
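If you do accept that mix, the idiomatic place for it is the three-argument collect(), which confines the mutation to accumulators the stream itself creates (so it stays safe even for parallel streams). A sketch, reusing the Signal class and the exposures stream from above:
Signal signal = exposures.stream().collect(
    () -> new Signal(false, new HashSet<>()),   // supplier: a fresh accumulator
    (acc, e) -> {                               // accumulator: mutate in place
        acc.exposure = acc.exposure || e.getLeft();
        if (e.getRight() != null) acc.alarms.add(e.getRight());
    },
    (left, right) -> {                          // combiner: only used for parallel streams
        left.exposure = left.exposure || right.exposure;
        left.alarms.addAll(right.alarms);
    });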
Personally, in Java, I'd just ditch the functional approach and use a for loop in this case. It'll be less code, more efficient, and IMO clearer.
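For instance, a single-pass loop version of withExposure() under the same assumptions (Seed::hadExposure invoked exactly once per seed):
boolean exposure = false;
boolean anyPresent = false;
Set<String> alarms = new HashSet<>();
for (Seed seed : seeds) {
    Optional<Pair<Boolean, String>> result = seed.hadExposure(); // the expensive call, once per seed
    if (result.isPresent()) {
        anyPresent = true;
        exposure = exposure || result.get().getLeft();
        if (result.get().getRight() != null) alarms.add(result.get().getRight());
    }
}
return anyPresent ? Optional.of(new Signal(exposure, alarms)) : Optional.empty();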
If you have enough space to store an intermediate result, you could do something like:
List<Pair<Boolean, String>> exposures =
seeds.stream()
.map(Seed::hadExposure)
.flatMap(Optional::stream)
.collect(Collectors.toList());
Then you'd only be calling the expensive Seed::hadExposure method once per item in the input list.
We have some code that is given a Flux<Event> containing all events. Clients
then request a Flux<Event> for a subset of these events.
The code does something like:
// Note: Kotlin code, but this question is not Kotlin-specific
/**
* All incoming events
*/
private val allEvents: Flux<Event> = ...
/**
* Returns a Flux of the events with the matching key.
*/
fun eventsForKey(key: String): Flux<Event> {
return allEvents.filter { event ->
event.key == key
}
}
So we've got allEvents, which has all of the incoming events, and the
eventsForKey function is called (potentially many times) to create a
Flux<Event> of only the events with the key specified. There are potentially
a lot of these filtered Flux instances that are alive concurrently.
My concern is that this is effectively doing a linear search for which
"sub-Flux" to deliver each event to. That is, if there are n sub-Flux
instances alive at a given moment, and a single event arrives, the event will
be tested against all n filter predicates.
What I want is a something that will let me specify an input Flux and a key
function, and then (repeatedly) obtain an output Flux for any given key
value. Each sub-Flux would behave just like the filtered ones above, but
instead of executing n predicate checks for each event, each event would
result in one key computation and a single dictionary lookup for the outgoing
Flux. Events that don't match an existing sub-Flux should be discarded,
just as they would be with a filter.
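To pin down that shape, here is a rough sketch of the wanted behavior (written against Reactor 3.4+'s Sinks API purely for illustration; it ignores error/completion propagation, cleanup of unused keys, and backpressure beyond the buffer):
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Sinks;

class Demux<K, T> {
    private final Map<K, Sinks.Many<T>> sinks = new ConcurrentHashMap<>();

    Demux(Flux<T> source, Function<T, K> keyFn) {
        source.subscribe(event -> {
            Sinks.Many<T> sink = sinks.get(keyFn.apply(event)); // one key computation + one lookup
            if (sink != null) sink.tryEmitNext(event);          // no registered sub-Flux: discard
        });
    }

    Flux<T> forKey(K key) { // obtainable on demand, before any matching event arrives
        return sinks.computeIfAbsent(key, k -> Sinks.many().multicast().onBackpressureBuffer())
                    .asFlux();
    }
}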
I found Flux.groupBy (which is also the accepted answer to this related
question) but:
Its return type is the unwieldy Flux<GroupedFlux<K,T>>:
I don't want the sub-Flux for a group to come into existence when its
first event appears. I need to be able to obtain a Flux for a given key on
demand, which is potentially before any events matching that key have
arrived.
I also don't want to have to deal with groups that no downstream consumer
has asked for. Events that don't match a key downstream consumers have
asked for should just be filtered out.
Its documentation states:
Note that groupBy works best with a low cardinality of groups, so chose
your keyMapper function accordingly.
I'm not sure if "a low cardinality of groups" means each "group" needs to be
small, or if the number of groups needs to be small. (and I don't know what
"small" means in this context.) I am specifically trying to deal with a
situation where the number of sub-Flux instances may be large.
Does Reactor provide a way to efficiently demultiplex a Flux like this?
Your question sounded very interesting to me and I was playing with this. This solution might not be elegant, but I simply wanted to share!
Your requirement sounds like you need some stateful predicate for filtering events before sub-fluxing, to avoid every subscriber having to do the filtering on its own.
In that case, we need to maintain a list/set somewhere to hold the allowed events. [In my example, I am going to assume I have a flux of strings where the first character is the event key, based on the other answer you have included in your question.]
// map for char and the corresponding flux
private static final Map<Character, Flux<String>> CHAR_FLUX = new HashMap<>();

// allowed chars; empty initially
private static final List<Character> ALLOWED_CHARS = new ArrayList<>();

// stateful predicate
private static final Predicate<Character> IS_ALLOWED = c -> {
    System.out.println("IS_ALLOWED check : " + c);
    return ALLOWED_CHARS.contains(c);
};

Flux<GroupedFlux<Character, String>> groupedFluxFlux = Flux.just(
        "a1", "b1", "c1", "a2", "b2", "c2", "a3", "b3", "c3", "a4", "b4", "c4",
        "a1", "b1", "c1", "a2", "b2", "c2", "a3", "b3", "c3", "a4", "b4", "c4")
    .delayElements(Duration.ofMillis(1000))
    .filter(s -> IS_ALLOWED.test(s.charAt(0))) // check if it is allowed
    .groupBy(s -> s.charAt(0)) // grouping happens only for the allowed keys
    .cache();
groupBy returns a UnicastProcessor, which can be consumed by only one subscriber. In your case, if you expect more than one subscriber for the same key, then we need this map; otherwise it is not required.
Your eventsForKey method would return the key value from the map after adding it to the list/set.
// here the filter runs once per subscriber; it does not run for every event
ALLOWED_CHARS.add('a');
return CHAR_FLUX.computeIfAbsent('a', k ->
    Flux.defer(() -> groupedFluxFlux
            .filter(gf -> gf.key() == 'a')
            .flatMap(Function.identity()))
        .cache());
Assumptions:
You have a limited set of events (low cardinality). Otherwise the list/map might grow, and the groupedFlux might also not perform very well.
To do it properly probably takes a better understanding of the core Reactor framework than I personally have, but it seems that you want a single Subscriber and multiple Publishers driven by a HashMap. A decorated Subscriber should be easy enough in concept:
class DeMuxedSubscriber<T> implements Subscriber<T> {
    Map<T, SimplePublisher<T>> mapPublishers = new HashMap<>();

    @Override
    public void onSubscribe(Subscription s) {
        s.request(Long.MAX_VALUE);
    }

    @Override
    public void onNext(T s) {
        if (mapPublishers.get(s) != null)
            mapPublishers.get(s).subscriber.onNext(s);
    }

    @Override
    public void onError(Throwable t) {
        mapPublishers.values().forEach(sp -> sp.subscriber.onError(t));
    }

    @Override
    public void onComplete() {
        mapPublishers.values().forEach(sp -> sp.subscriber.onComplete());
    }

    public Publisher<T> getPublisher(T s) {
        mapPublishers.putIfAbsent(s, new SimplePublisher<T>());
        return mapPublishers.get(s);
    }
}
And there is probably a class somewhere that handles being a publisher just fine but this will suffice to illustrate:
class SimplePublisher<T> implements Publisher<T> {
    Subscriber<? super T> subscriber;

    @Override
    public void subscribe(Subscriber<? super T> s) {
        subscriber = s;
    }
}
And then you can make a simple example to use it. This all seems a bit awkward, and the example DeMuxedSubscriber shown here ignores backpressure, but hey, details:
Flux<String> wordFlux = Flux.generate(() -> 0, (i, sink) -> {
    if (i >= 100) {
        sink.complete(); // stop here; calling next() after complete() would throw
        return i;
    }
    i = i + 1;
    sink.next(Integer.toString(largestPrimeFactor(i)));
    return i;
});
DeMuxedSubscriber<String> deMuxedSubscriber = new DeMuxedSubscriber<>();
Flux.from(deMuxedSubscriber.getPublisher("3")).subscribe(System.out::println);
Flux.from(deMuxedSubscriber.getPublisher("5")).subscribe(System.out::println);
wordFlux.subscribe(deMuxedSubscriber);
I have the following stream code:
List<Data> results = items.stream()
.map(item -> requestDataForItem(item))
.filter(data -> data.isValid())
.collect(Collectors.toList());
Data requestDataForItem(Item item) {
// call another service here
}
The problem is that I want to call
requestDataForItem only when all elements in the stream are valid.
For example,
if the first item is invalid I don't want to make the call for any element in the stream.
There is .allMatch in the stream API,
but it returns a boolean.
I want to do the same as .allMatch, then
.collect the result when everything matched.
Also, I want to process the stream only once;
with two loops it would be easy.
Is this possible with the Java Streams API?
This would be a job for Java 9:
List<Data> results = items.stream()
.map(item -> requestDataForItem(item))
.takeWhile(data -> data.isValid())
.collect(Collectors.toList());
This operation will stop at the first invalid element. In a sequential execution, this implies that no subsequent requestDataForItem calls are made. In a parallel execution, some additional elements might get processed concurrently, before the operation stops, but that’s the price for efficient parallel processing.
In either case, the result list will only contain the elements before the first encountered invalid element and you can easily check using results.size() == items.size() whether all elements were valid.
In Java 8, there is no such simple method, and using an additional library or rolling your own implementation of takeWhile wouldn't pay off considering how simple the non-stream solution is:
List<Data> results = new ArrayList<>();
for(Item item: items) {
Data data = requestDataForItem(item);
if(!data.isValid()) break;
results.add(data);
}
You could theoretically use .allMatch then collect if .allMatch returns true, but then you'd be processing the collection twice. There's no way to do what you're trying to do with the streams API directly.
You could create a method to do this for you and simply pass your collection to it as opposed to using the stream API. This is slightly less elegant than using the stream API but more efficient as it processes the collection only once.
List<Data> results = getAllIfValid(
    items.stream()
        .map(item -> requestDataForItem(item))
        .collect(Collectors.toList())
);
public List<Data> getAllIfValid(List<Data> items) {
List<Data> results = new ArrayList<>();
for (Data d : items) {
if (!d.isValid()) {
return new ArrayList<>();
}
results.add(d);
}
return results;
}
This will return all the results if every element passes and only processes the items collection once. If any fail the isValid() check, it'll return an empty list as you want all or nothing. Simply check to see if the returned collection is empty to see whether or not all items passed the isValid() check.
Implement a two-step process:
Test whether allMatch returns true.
If it does, do the collect with a second stream, as sketched below.
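A literal sketch of that two-step approach (note that it calls requestDataForItem twice per item, so it only pays off when that call is cheap or its results are cached):
boolean allValid = items.stream()
    .map(item -> requestDataForItem(item))
    .allMatch(Data::isValid);

List<Data> results = allValid
    ? items.stream().map(item -> requestDataForItem(item)).collect(Collectors.toList())
    : Collections.emptyList();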
Try this.
List<Data> result = new ArrayList<>();
boolean allValid = items.stream()
.map(item -> requestDataForItem(item))
.allMatch(data -> data.isValid() && result.add(data));
if (!allValid)
result.clear();
I wasn't sure how exactly to frame this question, so bear with me...
1) Is there a better (aka more "proper") way to instantiate a Stream of optional elements, other than adding null and subsequently filtering out null's?
Stream.of( ... ,
person.likesRed() ? Color.RED : null)
.filter(Objects::nonNull)
...
2) Secondly, is there a way to "inline" the following orElseGet function into the parent Stream/map?
.map(p -> ofNullable(p.getFavouriteColours()).orElseGet(fallbackToDefaultFavouriteColours))
The full (contrived) example:
import static java.util.Optional.ofNullable;

public Response getFavouriteColours(final String personId) {
    Person person = personService.findById(personId);

    Supplier<List<String>> fallbackToDefaultFavouriteColours = () ->
        Stream.of(
                Color.BLUE,
                Color.GREEN,
                person.likesRed() ? Color.RED : null)
            .filter(Objects::nonNull)
            .map(Color::getName)
            .collect(Collectors.toList());

    return ofNullable(person)
        .map(p -> ofNullable(p.getFavouriteColours()).orElseGet(fallbackToDefaultFavouriteColours))
        .map(Response::createSuccess)
        .orElseGet(Response::createNotFound);
}
A cleaner expression would be
Stream.concat(Stream.of(Color.BLUE, Color.GREEN),
person.likesRed()? Stream.of(Color.RED): Stream.empty())
This isn’t simpler than your original expression, but it doesn’t create the bad feeling of inserting something just to filter it out afterwards or, more abstract, of discarding an already known information that has to be reconstructed afterwards.
There is even a technical difference. The expression above creates a Stream that has a known size, which can be used to optimize certain operations. In contrast, the variant using filter only has an estimated size, which will be the number of elements before filtering, but not a known exact size.
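One way to observe that difference, if you're curious, is via Spliterator.getExactSizeIfKnown(), which returns -1 when the SIZED characteristic is lost:
long concatSize = Stream.concat(Stream.of("a", "b"), Stream.of("c"))
    .spliterator().getExactSizeIfKnown();   // 3: concatenating two sized streams stays sized

long filteredSize = Stream.of("a", null, "b")
    .filter(Objects::nonNull)
    .spliterator().getExactSizeIfKnown();   // -1: filter drops the SIZED characteristic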
The surrounding code can be greatly simplified by not overusing Optional:
public Response getFavouriteColours(final String personId) {
Person person = personService.findById(personId);
if(person == null) return Response.createNotFound();
List<String> favouriteColours = person.getFavouriteColours();
if(favouriteColours == null)
favouriteColours = Stream.concat(
Stream.of(Color.BLUE, Color.GREEN),
person.likesRed()? Stream.of(Color.RED): Stream.empty())
.map(Color::getName)
.collect(Collectors.toList());
return Response.createSuccess(favouriteColours);
}
Even the Stream operation itself is not simpler than a conventional imperative code here:
public Response getFavouriteColours(final String personId) {
Person person = personService.findById(personId);
if(person==null) return Response.createNotFound();
List<String> favouriteColours = person.getFavouriteColours();
if(favouriteColours==null) {
favouriteColours=new ArrayList<>();
Collections.addAll(favouriteColours, Color.BLUE.getName(), Color.GREEN.getName());
if(person.likesRed()) favouriteColours.add(Color.RED.getName());
}
return Response.createSuccess(favouriteColours);
}
though it’s likely that a more complex example would benefit from the Stream API use, whereas the use of Optional is unlikely to get better with more complex operations. A chain of Optional operations can simplify the code if all absent values or filter mismatches within the chain are supposed to be handled the same way at the end of the chain. If, however, like in your example (and most real life scenarios) every absent value should get a different treatment or be reported individually, using Optional, especially the nested use of Optionals, does not improve the code.
Using a contrived example to illustrate my question, I have an Observable of a type of composite object:
Observable<CategoryPayload>
public class CategoryPayload {
    public List<Category> categories;
    // other meta data and getters
}

public class Category {
    public Integer id;
    // other meta data and getters
}
I need to filter out certain categories based on id so I end up doing something like:
Observable<CategoryPayload> categoryObservable = service.getCategoryPayload();

// use flatMap to transform the Observable into multiple
mSubscription.add(
    categoryObservable.flatMap(new Func1<CategoryPayload, Observable<Category>>() {
        public Observable<Category> call(CategoryPayload categoryPayload) {
            return Observable.from(categoryPayload.categories);
        }
    }).filter(new Func1<Category, Boolean>() {
        public Boolean call(Category category) {
            return category.id != SOME_BANNED_CATEGORY_ID;
        }
    }).toList()
    .subscribe(mObserver));
Please forgive the contrived code. I am really just trying to understand whether it is a correct use of RX to flatten out my observable and then filter it in the way that I am doing above.
You are using Rx.Observable filter method to filter over a List. This is an anti-pattern because Lists are Iterables, which are the dual to Observable. Hence what you really want is a filter function for Lists, instead of converting an Iterable to Observable.
You can either use Guava's filter functions for collections, or Kotlin's built-in functions for Iterables (would require rewriting in Kotlin), or Xtend's equivalent to Kotlin's (would require rewriting in Xtend), or writing the manual mutation (with for loop) in Java.
Overall, you would .map over Observable<CategoryPayload> and inside the map do the filtering over List<Category>.
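A minimal sketch of that shape, filtering the list with a plain loop inside map (reusing the classes from the question; RxJava 1.x):
Observable<List<Category>> filtered = service.getCategoryPayload()
    .map(payload -> {
        List<Category> kept = new ArrayList<>();
        for (Category c : payload.categories) {
            if (c.id != SOME_BANNED_CATEGORY_ID) kept.add(c); // drop the banned category
        }
        return kept;
    });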
I don't see any problem with using RxJava. If you expect a single result from getCategoryPayload() or you don't care if multiple lists of categories get into the same aggregated list, then your example is okay.
mSubscriptions.add(
service.getCategoryPayload()
.flatMapIterable(p -> p.categories)
.filter(c -> c.id != SOME_BANNED_CATEGORY_ID)
.toList()
.subscribe(mObserver)
);
Otherwise, if you want to keep the payloads intact but trim the categories, you can use any fluent Iterable API (Guava, IxJava):
mSubscriptions.add(
service.getCategoryPayload()
.map(p -> {
Ix.from(p.categories).filter(c -> c.id == SOME_BANNED_CATEGORY_ID).removeAll();
return p.categories; // or just return p;
})
.subscribe(mObserver)
);
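If you'd rather avoid an extra library, List.removeIf (Java 8+) achieves the same in-place trimming, assuming p.categories is a mutable list:
mSubscriptions.add(
    service.getCategoryPayload()
        .map(p -> {
            p.categories.removeIf(c -> c.id == SOME_BANNED_CATEGORY_ID); // trim in place
            return p;
        })
        .subscribe(mObserver)
);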