I have a Flux of strings. For each string, I have to make a remote call. The problem is that the method making the remote call returns a Mono of the response (which makes sense, since a single request produces a single response).
What should be the correct pattern to handle such cases? One solution I can think of is to make serial (or parallel) calls for the stream elements, reduce the responses to a single one, and return it.
Here's the code:
fluxObj.flatMap(a -> makeRemoteCall(a)) // converts the Mono of the response into a Flux
       .reduce(...)
I am unable to wrap my head around what happens inside the flatMap. The makeRemoteCall method returns a Mono, but flatMap returns a Flux of the response. First, why is this happening? Second, does it mean that the returned Flux contains a single response object (the one that was returned in the Mono)?
If the mapper Function returns a Mono, then it means that there will be (at most) one derived value for each source element in the Flux.
Having the Function return:
an empty Mono (e.g. Mono.empty()) for a given value means that this source value is "ignored"
a valued Mono (like in your example) means that this source value is asynchronously mapped to another specific value
a Flux with several derived values for a given value means that this source value is asynchronously mapped to several values
For instance, given the following flatMap:
Flux.just("A", "B")
.flatMap(v -> Mono.just("value" + v))
Subscribing to the above Flux<String> and printing the emitted elements would yield:
valueA
valueB
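To illustrate the other two cases from the list above, here is a small sketch (the mapping logic is made up purely for illustration): returning Mono.empty() drops an element, while returning a Flux fans one element out into several:
// Empty Mono: "B" is ignored, so this prints valueA and valueC
Flux.just("A", "B", "C")
    .flatMap(v -> "B".equals(v) ? Mono.<String>empty() : Mono.just("value" + v))
    .subscribe(System.out::println);

// Flux with several derived values: each element maps to two, printing A1, A2, B1, B2
Flux.just("A", "B")
    .flatMap(v -> Flux.just(v + "1", v + "2"))
    .subscribe(System.out::println);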
Another fun example: with delays, one can get out-of-order results, like this:
Flux.just(300, 100)
.flatMap(delay -> Mono.delay(Duration.ofMillis(delay))
.thenReturn(delay + "ms")
)
would result in a Flux<String> that yields:
100ms
300ms
If you look at the flatMap documentation, you can find answers to your questions:
Transform the elements emitted by this Flux asynchronously into Publishers, then flatten these inner publishers into a single Flux, sequentially and preserving order using concatenation.
Long story short:
@Test
public void testFlux() {
    Flux<String> oneString = Flux.just("1");

    oneString
        .flatMap(s -> testMono(s))
        .collectList()
        .subscribe(integers -> System.out.println("elements:" + integers));
}

private Mono<Integer> testMono(String s) {
    return Mono.just(Integer.valueOf(s + "0"));
}
The mapper is s -> testMono(s), where testMono(s) is a Publisher (in your case makeRemoteCall(a)); it transforms each element of my oneString from String to Integer.
I collected Flux result to List, and printed it. Console output:
elements:[10]
This means the resulting Flux after the flatMap operator contains just one element.
Related
Let's say I have both var a = Flux.just("A", "B", "C") and var b = Flux.just("B", "C", "D")
I want to intersect both variables, and the result should be the equivalent of a set intersection.
Something like a.intersect(b) or Flux.intersect(a, b) that would result in (Flux of) ["B", "C"]
I could not find any operation that does this, any ideas?
You could use join, filter, map and groupBy like so:
// Join the fluxes into tuples
a.join(b, s -> Flux.never(), s -> Flux.never(), Tuples::of)
    // Keep only the matching pairs
    .filter(t -> t.getT1().equals(t.getT2()))
    // Revert to a single value
    .map(Tuple2::getT1)
    // Remove duplicates
    .groupBy(f -> f)
    .map(GroupedFlux::key)
    .subscribe(System.out::println);
This results in a single subscription to each Flux and will also work with duplicates.
Or you could write your own intersect method
public <T> Flux<T> intersect(Flux<T> f1, Flux<T> f2) {
    return f1.join(f2, f -> Flux.never(), f -> Flux.never(), Tuples::of)
        .filter(t -> t.getT1().equals(t.getT2()))
        .map(Tuple2::getT1)
        .groupBy(f -> f)
        .map(GroupedFlux::key);
}
// Use it on its own
intersect(a, b).subscribe(System.out::println);

// Or with an existing flux
a.transform(f -> intersect(f, b)).subscribe(System.out::println);
My favoured approach would be something like:
Flux.merge(a, b)
    .groupBy(Function.identity())
    .filterWhen(g -> g.count().map(count -> count > 1))
    .map(g -> g.key())
    .subscribe(System.out::print); // Prints "BC"
(If a or b might contain duplicates, replace the first line with Flux.merge(a.distinct(), b.distinct()).)
Each publisher is only played once, and it's trivial to expand it to more than two publishers if necessary.
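For example, extending it to three publishers is just a matter of merging all of them and requiring the group count to match the number of sources. A rough sketch (the third Flux c and its values are made up for illustration; assumes each source is duplicate-free):
Flux<String> a = Flux.just("A", "B", "C");
Flux<String> b = Flux.just("B", "C", "D");
Flux<String> c = Flux.just("C", "D", "E");

Flux.merge(a, b, c)
    .groupBy(Function.identity())
    .filterWhen(g -> g.count().map(count -> count == 3)) // keep keys seen in all three sources
    .map(g -> g.key())
    .subscribe(System.out::print); // Prints "C"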
I like efficiency, so I like to use what is proven without overly depending on streaming (or fluxing) operations.
The disadvantage of this is the need to collect one of the fluxes into a sorted list. Perhaps you know in advance which Flux is shorter. It seems to me, however, that you are going to have to do such a thing no matter what, since you have to compare each element of Flux A against all elements in Flux B (or at least until you find a match).
So, collect Flux A into a sorted list and then there is no reason not to use Collections::binarySearch on your collected/sorted flux.
a.collectSortedList()
    .flatMapMany(sortedA -> b.filter(be -> Collections.binarySearch(sortedA, be) >= 0))
    .subscribe(System.out::println);
I understood that when I run an Akka Stream graph, it will materialise the rightmost component.
But doing this:
Source.range(1,100).to(Sink.reduce((a,b) -> a+b)).run(materializer);
will materialise NotUsed, even though the rightmost component is a Sink that returns an integer.
However, doing the same with runWith works fine:
Source.range(1, 100).runWith(Sink.reduce((a, b) -> a + b), materializer)
.thenAccept(value -> LOGGER.info("The final value is {}", value));
What is it that I didn't understand well about the run method?
By default, to retains the materialized value of the stream operator that calls that method. In your example...
Source.range(1, 100).to(Sink.reduce((a, b) -> a + b)).run(materializer);
//                   ^
...the Source invokes to, so calling run on the stream returns the materialized value of the Source, which is NotUsed, and ignores the materialized value of the Sink. This is the equivalent of running source.toMat(sink, Keep.left()).
In contrast, calling runWith instead of to and run in this case returns the materialized value of the Sink, because runWith is a shorthand way of using Keep.right().
From the documentation:
final CompletionStage<Integer> sum = tweets.map(t -> 1).runWith(sumSink, system);
runWith() is a convenience method that automatically ignores the materialized value of any other operators except those appended by the runWith() itself. In the above example it translates to using Keep.right as the combiner for materialized values.
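So if you prefer the to style but still want the Sink's materialized value, a minimal sketch (assuming the same source, sink, and materializer as in the question) is to combine the materialized values explicitly with Keep.right():
// Keep the Sink's materialized value instead of the default Keep.left()
CompletionStage<Integer> sum =
    Source.range(1, 100)
        .toMat(Sink.reduce((a, b) -> a + b), Keep.right())
        .run(materializer);

sum.thenAccept(value -> LOGGER.info("The final value is {}", value));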
I am using RxJava/Kotlin's Observable#take() to get the first 50 items from the list. But the #take() operator is not behaving as it should per the Rx docs.
In Rx docs, #take() is defined as:
"Emit only the first n items emitted by an Observable"
I have a function where the pageSize argument is 50, and the initial size of the list (seen at a breakpoint) is 300.
After #take(50) is applied to that Observable, at the next breakpoint I still get the full-size list, i.e. size = 300.
But just to check whether something was wrong with the debugger or the Observable, I tried taking only the items whose displayName contains "9", and this time I got the expected result: a smaller list with "9" in each displayName field.
I believe RxJava/Kotlin's #take() operator is not that crazy and it's just me.
take behaves correctly as it will give you only 50 List<FollowersEntry> "marbles". Based on your screenshots and wording, I guess you wanted 50 FollowersEntry. There is a fundamental logical difference between a container of objects and the objects themselves. RxJava sees only an object sequence of type List<> but it can't know about the nested objects you intended to work with.
Therefore, you either have to use it.take(50) inside map (or whatever the Kotlin collections function is) or unroll the sequence of lists into sequence of entries via flatMapIterable:
getFollowers()
.flatMapIterable(entry -> entry)
.take(50 /* entries this time, not lists */)
Take a good look at the return type of your method - Single<List<FollowersEntity>>. The Observable returned from remoteFollowersService.getFollowers() is not an Observable that emits 300 FollowersEntity items - it is an Observable that emits a single item, and that single item is a List containing 300 FollowersEntity items. In other words you need to call take on the list, not on the observable.
return remoteFollowersService.getFollowers()
.map { val size = it.size; it } // for debugging
.map { it.take(pageSize) }
.map { val size = it.size; it } // for debugging
.map { it.filter { item -> item.displayName.contains("9") } }
.single(emptyList())
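If you want the stream itself to emit at most 50 entries rather than trimming the list inside map, a sketch in Java (assuming getFollowers() returns a Single<List<FollowersEntity>>, as noted above) would unroll the list, take, and re-collect:
Single<List<FollowersEntity>> firstPage =
    remoteFollowersService.getFollowers()      // emits one List with all entries
        .flattenAsObservable(list -> list)     // unroll the List into individual entries
        .take(50)                              // take() now counts entries, not lists
        .toList();                             // re-collect the first 50 into a List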
I am having trouble comprehending why parallel stream and stream are giving a different result for the exact same statement.
List<String> list = Arrays.asList("1", "2", "3");
String resultParallel = list.parallelStream().collect(StringBuilder::new,
(response, element) -> response.append(" ").append(element),
(response1, response2) -> response1.append(",").append(response2.toString()))
.toString();
System.out.println("ResultParallel: " + resultParallel);
String result = list.stream().collect(StringBuilder::new,
(response, element) -> response.append(" ").append(element),
(response1, response2) -> response1.append(",").append(response2.toString()))
.toString();
System.out.println("Result: " + result);
ResultParallel: 1, 2, 3
Result: 1 2 3
Can somebody explain why this is happening and how I get the non-parallel version to give the same result as the parallel version?
The Java 8 Stream.collect method has the following signature:
<R> R collect(Supplier<R> supplier,
              BiConsumer<R, ? super T> accumulator,
              BiConsumer<R, R> combiner);
where the BiConsumer<R, R> combiner is called only for parallel streams (in order to combine partial results into a single container); therefore the output of your first code snippet is:
ResultParallel: 1, 2, 3
In the sequential version the combiner doesn't get called (see this answer), therefore the following statement is ignored:
(response1, response2) -> response1.append(",").append(response2.toString())
and the result is different:
1 2 3
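If you want to see for yourself that the combiner never runs on the sequential path, a quick sketch is to make the combiner throw; only the parallel version of this pipeline will fail:
// Sequential: the combiner below is never invoked, so this prints " 1 2 3"
String seq = Arrays.asList("1", "2", "3").stream().collect(
        StringBuilder::new,
        (sb, e) -> sb.append(" ").append(e),
        (l, r) -> { throw new AssertionError("combiner called"); })
    .toString();
System.out.println(seq);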
How to fix it? Check @Eugene's answer or this question and its answers.
To understand why this is going wrong, consider this from the javadoc.
accumulator - an associative, non-interfering, stateless function that must fold an element into a result container.
combiner - an associative, non-interfering, stateless function that accepts two partial result containers and merges them, which must be compatible with the accumulator function. The combiner function must fold the elements from the second result container into the first result container.
What this is saying is that it should not matter whether the elements are collected by "accumulating" or "combining" or some combination of the two. But in your code, the accumulator and the combiner concatenate using a different separator. They are not "compatible" in the sense required by the javadoc.
That leads to inconsistent results depending on whether sequential or parallel streams are used.
In the parallel case, the stream is split into substreams [1] to be handled by different threads. This leads to a separate collection for each substream. The collections are then combined.
In the sequential case, the stream is not split. Instead, the stream is simply accumulated into a single collection, and no combining needs to take place.
Observations:
In general, for a stream of this size performing a simple transformation, parallelStream() is liable to make things slower.
In this specific case, the bottleneck with the parallelStream() version will be the combining step. That is a serial step, and it performs the same amount of copying as the entire serial pipeline. So, in fact, parallelization is definitely going to make things slower.
In fact, the lambdas do not behave correctly. They add an extra space at the start, and double some spaces if the combiner is used. A more correct version would be:
String result = list.stream().collect(
        StringBuilder::new,
        (b, e) -> b.append(b.length() == 0 ? "" : " ").append(e),
        (l, r) -> l.append(l.length() == 0 ? "" : " ").append(r)).toString();
A joining collector (Collectors.joining) is a far simpler and more efficient way to concatenate streams of strings. (Credit: @Eugene)
[1] In this case, the substreams each have only one element. For a longer list, you would typically get as many substreams as there are worker threads, and the substreams would contain multiple elements.
As a side note, even if you replace , with a space in the combiner, your results are still going to differ (slightly altered the code to make it more readable):
String resultParallel = list.parallelStream().collect(
StringBuilder::new,
(builder, elem) -> builder.append(" ").append(elem),
(left, right) -> left.append(" ").append(right)).toString();
String result = list.stream().collect(
StringBuilder::new,
(builder, elem) -> builder.append(" ").append(elem),
(left, right) -> left.append(" ").append(right)).toString();
System.out.println("ResultParallel: ->" + resultParallel + "<-"); // -> 1 2 3 4<-
System.out.println("Result: ->" + result + "<-"); // -> 1 2 3 4<-
Notice how the parallel result has a few too many spaces.
The javadoc has the hint:
combiner... must be compatible with the accumulator function
If you want to join, there are simpler options like:
String.join(",", yourList)
yourList.stream().collect(Collectors.joining(","))
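For completeness, a minimal sketch showing that the joining collector gives the same result for sequential and parallel streams, because its accumulator and combiner are designed to be compatible:
List<String> list = Arrays.asList("1", "2", "3");
// Both lines print "1,2,3": Collectors.joining handles the separator consistently in both modes
System.out.println(list.stream().collect(Collectors.joining(",")));
System.out.println(list.parallelStream().collect(Collectors.joining(",")));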
This question already has answers here:
Why filter() after flatMap() is "not completely" lazy in Java streams?
Consider the following code:
urls.stream()
.flatMap(url -> fetchDataFromInternet(url).stream())
.filter(...)
.findFirst()
.get();
Will fetchDataFromInternet be called for the second url when the first one was enough?
I tried with a smaller example, and it looks like it works as expected, i.e. it processes the data one by one. But can this behavior be relied on? If not, does calling .sequential() before .flatMap(...) help?
Stream.of("one", "two", "three")
.flatMap(num -> {
System.out.println("Processing " + num);
// return FetchFromInternetForNum(num).data().stream();
return Stream.of(num);
})
.peek(num -> System.out.println("Peek before filter: "+ num))
.filter(num -> num.length() > 0)
.peek(num -> System.out.println("Peek after filter: "+ num))
.forEach(num -> {
System.out.println("Done " + num);
});
Output:
Processing one
Peek before filter: one
Peek after filter: one
Done one
Processing two
Peek before filter: two
Peek after filter: two
Done two
Processing three
Peek before filter: three
Peek after filter: three
Done three
Update: Using the official Oracle JDK 8, if the implementation matters.
Answer:
Based on the comments and the answers below, flatMap is partially lazy, i.e. it reads the first stream fully and moves on to the next one only when required. Reading a single stream is eager, but reading multiple streams is lazy.
If this behavior is intended, the API should let the function return an Iterable instead of a stream.
In other words: link
Under the current implementation, flatMap is eager, like any other stateful intermediate operation (such as sorted and distinct). And it's very easy to prove:
int result = Stream.of(1)
.flatMap(x -> Stream.generate(() -> ThreadLocalRandom.current().nextInt()))
.findFirst()
.get();
System.out.println(result);
This never finishes as flatMap is computed eagerly. For your example:
urls.stream()
.flatMap(url -> fetchDataFromInternet(url).stream())
.filter(...)
.findFirst()
.get();
It means that for each url, the flatMap will block all other operations that come after it, even if you care about a single result. So suppose that for a single url your fetchDataFromInternet(url) generates 10_000 lines: your findFirst will have to wait for all 10_000 to be computed, even though you care about only one.
EDIT
This is fixed in Java 10, where we get our laziness back: see JDK-8075939
EDIT 2
This is fixed in Java 8 too (8u222): JDK-8225328
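If you are stuck on a JDK without these fixes, one possible workaround (a sketch only; fetchDataFromInternet, matchesCriteria and the Data type are stand-ins for the asker's code) is to keep the short-circuiting outside flatMap, so each url is fetched only if the previous ones produced no match:
Optional<Data> first = urls.stream()
        .map(url -> fetchDataFromInternet(url).stream()   // one sub-stream per url
                .filter(d -> matchesCriteria(d))
                .findFirst())                              // short-circuits within this url's data
        .filter(Optional::isPresent)                       // the outer findFirst stops fetching further urls
        .map(Optional::get)
        .findFirst();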
It's not clear why you set up an example that does not address the actual question you're interested in. If you want to know whether the processing is lazy when applying a short-circuiting operation like findFirst(), then use an example with findFirst() instead of forEach, which processes all elements anyway. Also, put the logging statement right into the function whose evaluation you want to track:
Stream.of("hello", "world")
.flatMap(s -> {
System.out.println("flatMap function evaluated for \""+s+'"');
return s.chars().boxed();
})
.peek(c -> System.out.printf("processing element %c%n", c))
.filter(c -> c>'h')
.findFirst()
.ifPresent(c -> System.out.printf("found an %c%n", c));
flatMap function evaluated for "hello"
processing element h
processing element e
processing element l
processing element l
processing element o
found an l
This demonstrates that the function passed to flatMap gets evaluated lazily as expected, while the elements of the returned (sub-)stream are not evaluated as lazily as possible, as already discussed in the Q&A you have linked yourself.
So, regarding your fetchDataFromInternet method that gets invoked from the function passed to flatMap, you will get the desired laziness. But not for the data it returns.
Today I also stumbled upon this bug. The behavior is not so straightforward, because a simple case like the one below works fine, but similar production code doesn't work.
stream(spliterator).map(o -> o).flatMap(Stream::of).flatMap(Stream::of).findAny()
For those who cannot wait another couple of years for migration to JDK 10, there is an alternative, truly lazy stream. It doesn't support parallel execution. It was intended for JavaScript translation, but it worked out for me, because the interface is the same.
StreamHelper is collection-based, but it is easy to adapt a Spliterator.
https://github.com/yaitskov/j4ts/blob/stream/src/main/java/javaemul/internal/stream/StreamHelper.java