How to chain asynchronous operations using Java RX Observable?

I want to make an HTTP request repeatedly and act on the result. I start with public Observable<NewsItem> fetchItems(NewsFeed feed). One request returns a few news items, but I decided to flatten the stream into individual items.
The idea was to use Observable.interval() to make the request multiple times, then combine the resulting Observables into one.
Observable
    .interval(timePerItem, TimeUnit.MILLISECONDS)
    .map(i -> feed)
    .map(f -> fetchItems(f))
    .subscribe(result -> System.out.println(result));
But the result is Observable<Observable<NewsItem>>, not Observable<NewsItem>. How do I merge them?
I have found the merge() operator (RxJava doc: Merge), but it does not seem to fit the use case.
In a previous version I used CompletableFuture<List<NewsItem>> fetchNewsItems(), but I wasn't able to fit it into the Observable chain.
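(Side note: such a future can be bridged into an Observable chain, since CompletableFuture implements Future. A minimal sketch, assuming RxJava 1.x, whose Observable.from(Future) blocks until the future completes:)
// Observable.from(Future) emits the future's single value on subscription;
// flatMapIterable then flattens the List<NewsItem> into individual items.
Observable<NewsItem> items =
        Observable.from(fetchNewsItems())         // Observable<List<NewsItem>>
                  .flatMapIterable(list -> list); // Observable<NewsItem>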

Not sure if I understand the problem, but aren't you just looking for flatMap?
Observable
    .interval(timePerItem, TimeUnit.MILLISECONDS)
    .flatMap(i -> fetchItems(feed))
    .subscribe(result -> System.out.println(result));
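For reference, the merge() operator mentioned in the question does fit as well: flatMap(f) is essentially map(f) followed by a merge. A sketch using the Observable.merge overload that flattens an Observable<Observable<T>>:
// Same result as flatMap above: map produces Observable<Observable<NewsItem>>,
// merge flattens it back into a single Observable<NewsItem>.
Observable.merge(
        Observable.interval(timePerItem, TimeUnit.MILLISECONDS)
                  .map(i -> fetchItems(feed)))
    .subscribe(result -> System.out.println(result));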

Related

Reactor Flux subscriber stream stopped when using reduce on flatMap

I want to change my code to use a single subscriber. Currently I have:
auctionFlux.window(Duration.ofSeconds(120), Duration.ofSeconds(120)).subscribe(
    s -> s.groupBy(Auction::getItem).subscribe(
        longAuctionGroupedFlux -> longAuctionGroupedFlux.reduce(new ItemDumpStats(), this::calculateStats)
    ));
This code works correctly, and the reduce method is very simple. I tried to change it to use a single subscriber:
auctionFlux.window(Duration.ofSeconds(120), Duration.ofSeconds(120))
    .flatMap(window -> window.groupBy(Auction::getItem))
    .flatMap(longAuctionGroupedFlux -> longAuctionGroupedFlux.reduce(new ItemDumpStats(), this::calculateStats))
    .subscribe(itemDumpStatsMono -> log.info(itemDumpStatsMono.toString()));
This version does not work: no errors and no results. After debugging I found that it gets stuck on the second flatMap, while reducing the stream. I think the problem is in the flatMap merging, which hangs while resolving the Mono. Does anyone know how to fix this and use only a single subscriber?
To replicate, you can use another class or create one. With a small input it works, but with a bigger one it dies:
List<Auction> auctionList = new ArrayList<>();
for (int i = 0; i < 100000; i++) {
    Auction a = new Auction((long) i, "test");
    a.setItem((long) (i % 50));
    auctionList.add(a);
}
Flux.fromIterable(auctionList)
    .groupBy(Auction::getId)
    .flatMap(longAuctionGroupedFlux ->
        longAuctionGroupedFlux.reduce(new ItemDumpStats(), (itemDumpStats, auction) -> itemDumpStats))
    .collectList()
    .subscribe(itemDumpStats -> System.out.println(itemDumpStats.toString()));
With this approach the result is instant, but I'm using 3 subscribers:
Flux.fromIterable(auctionList)
    .groupBy(Auction::getId)
    .subscribe(auctionIdAuctionGroupedFlux ->
        auctionIdAuctionGroupedFlux
            .reduce(new ItemDumpStats(), (itemDumpStats, auction) -> itemDumpStats)
            .subscribe(itemDumpStats -> System.out.println(itemDumpStats.toString())));
I think the behavior you described is related to the interaction between groupBy and flatMap when they are chained.
Check groupBy documentation. It states that:
The groups need to be drained and consumed downstream for groupBy to work correctly. Notably when the criteria produces a large amount of groups, it can lead to hanging if the groups are not suitably consumed downstream (eg. due to a flatMap with a maxConcurrency parameter that is set too low).
By default, maxConcurrency (flatMap) is set to 256 (I checked the source code of Reactor 3.2.2). So selecting more than 256 groups may cause the execution to hang (particularly when all execution happens on the same thread).
The following code helps in understanding what happens when you chain the operators groupBy and flatMap:
@Test
public void groupAndFlatmapTest() {
    val groupCount = 257;
    val groupSize = 513;
    val list = rangeClosed(1, groupSize * groupCount).boxed().collect(Collectors.toList());
    val source = Flux.fromIterable(list)
        .groupBy(i -> i % groupCount)
        .flatMap(Flux::collectList);
    StepVerifier.create(source).expectNextCount(groupCount).expectComplete().verify();
}
The execution of this code hangs. Changing groupCount to 256 or less makes the test pass (for every value of groupSize).
So, regarding your original problem, it is very likely that you are creating a large number of groups with your key selector Auction::getItem.
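If a large number of groups is expected, one option that follows from the documentation quote above is to raise flatMap's concurrency explicitly via its overload that takes a concurrency argument. A minimal sketch against the question's pipeline (the value 1024 is an assumption; pick something above the worst-case group count):
// Both flatMaps get a concurrency higher than the expected number of groups,
// so every GroupedFlux can be subscribed to and drained downstream.
auctionFlux.window(Duration.ofSeconds(120), Duration.ofSeconds(120))
    .flatMap(window -> window.groupBy(Auction::getItem), 1024)
    .flatMap(grouped -> grouped.reduce(new ItemDumpStats(), this::calculateStats), 1024)
    .subscribe(stats -> log.info(stats.toString()));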
Adding parallel fixed the problem, but I'm still looking for an answer to why reduce slows flatMap down so dramatically.

Why do flatmapped Java streams behave differently with iterator vs foreach? [duplicate]

This question already has answers here:
Why filter() after flatMap() is "not completely" lazy in Java streams?
(8 answers)
Closed 3 years ago.
Consider the following code:
urls.stream()
.flatMap(url -> fetchDataFromInternet(url).stream())
.filter(...)
.findFirst()
.get();
Will fetchDataFromInternet be called for second url when the first one was enough?
I tried with a smaller example and it looks like it works as expected, i.e. it processes data one by one. But can this behavior be relied on? If not, does calling .sequential() before .flatMap(...) help?
Stream.of("one", "two", "three")
.flatMap(num -> {
System.out.println("Processing " + num);
// return FetchFromInternetForNum(num).data().stream();
return Stream.of(num);
})
.peek(num -> System.out.println("Peek before filter: "+ num))
.filter(num -> num.length() > 0)
.peek(num -> System.out.println("Peek after filter: "+ num))
.forEach(num -> {
System.out.println("Done " + num);
});
Output:
Processing one
Peek before filter: one
Peek after filter: one
Done one
Processing two
Peek before filter: two
Peek after filter: two
Done two
Processing three
Peek before filter: three
Peek after filter: three
Done three
Update: I'm using the official Oracle JDK 8, if the implementation matters.
Answer:
Based on the comments and the answers below, flatMap is partially lazy: it reads the first stream fully and only moves on to the next one when required. Reading a single stream is eager, but moving across multiple streams is lazy.
If this behavior is intended, the API should let the function return an Iterable instead of a stream.
In other words: link
Under the current implementation, flatMap is eager, like any other stateful intermediate operation (such as sorted and distinct). And it's very easy to prove:
int result = Stream.of(1)
.flatMap(x -> Stream.generate(() -> ThreadLocalRandom.current().nextInt()))
.findFirst()
.get();
System.out.println(result);
This never finishes as flatMap is computed eagerly. For your example:
urls.stream()
.flatMap(url -> fetchDataFromInternet(url).stream())
.filter(...)
.findFirst()
.get();
It means that for each url, the flatMap will block all other operations that come after it, even if you care about a single result. So suppose that from a single url your fetchDataFromInternet(url) generates 10_000 lines: your findFirst will have to wait for all 10_000 to be computed, even though you care about only one.
EDIT
This is fixed in Java 10, where we get our laziness back: see JDK-8075939
EDIT 2
This is fixed in Java 8 too (8u222): JDK-8225328
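For a JDK without either fix, a common workaround is to keep the fetch inside map and short-circuit on Optionals instead of using flatMap. A sketch, reusing the question's urls and fetchDataFromInternet; the element type Data and the matches predicate are hypothetical names:
// Each url is fetched and filtered inside map(), so the pipeline stops after
// the first url that yields a match; later urls are never fetched.
Data first = urls.stream()
        .map(url -> fetchDataFromInternet(url).stream()
                .filter(d -> matches(d)) // hypothetical predicate
                .findFirst())
        .filter(Optional::isPresent)
        .map(Optional::get)
        .findFirst()
        .get();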
It's not clear why you set up an example that does not address the actual question you're interested in. If you want to know whether the processing is lazy when applying a short-circuiting operation like findFirst(), then use an example with findFirst() instead of forEach, which processes all elements anyway. Also, put the logging statement right into the function whose evaluation you want to track:
Stream.of("hello", "world")
.flatMap(s -> {
System.out.println("flatMap function evaluated for \""+s+'"');
return s.chars().boxed();
})
.peek(c -> System.out.printf("processing element %c%n", c))
.filter(c -> c>'h')
.findFirst()
.ifPresent(c -> System.out.printf("found an %c%n", c));
flatMap function evaluated for "hello"
processing element h
processing element e
processing element l
processing element l
processing element o
found an l
This demonstrates that the function passed to flatMap gets evaluated lazily, as expected, while the elements of the returned (sub-)stream are not evaluated as lazily as possible, as already discussed in the Q&A you have linked yourself.
So, regarding your fetchDataFromInternet method that gets invoked from the function passed to flatMap, you will get the desired laziness. But not for the data it returns.
Today I also stumbled upon this bug. The behavior is not so straightforward, because a simple case like the one below works fine, but similar production code doesn't:
stream(spliterator).map(o -> o).flatMap(Stream::of).flatMap(Stream::of).findAny()
For those who cannot wait another couple of years for the migration to JDK 10, there is an alternative, truly lazy stream. It doesn't support parallel execution. It was written for JavaScript translation, but it worked for me, since the interface is the same.
StreamHelper is collection-based, but it is easy to adapt it to a Spliterator.
https://github.com/yaitskov/j4ts/blob/stream/src/main/java/javaemul/internal/stream/StreamHelper.java

CompletableFuture: How to apply a function to multiple CompletableFutures?

Suppose I have 3 downloads, framed as completable futures:
CompletableFuture<Doc> dl1 = CompletableFuture.supplyAsync(() -> download("file1"));
CompletableFuture<Doc> dl2 = CompletableFuture.supplyAsync(() -> download("file2"));
CompletableFuture<Doc> dl3 = CompletableFuture.supplyAsync(() -> download("file3"));
Then all of them need to be handled the same way:
CompletableFuture<String> s1 = dl1.thenApply(Doc::getFilename);
CompletableFuture<String> s2 = dl2.thenApply(Doc::getFilename);
CompletableFuture<String> s3 = dl3.thenApply(Doc::getFilename);
And you can imagine multiple functions to be applied, all in parallel.
According to the DRY principle, this example seems inappropriate. So I'm looking for a solution to define only 1 workflow that is executed 3 times, in parallel.
How can this be accomplished?
I tried allOf, but that has two problems: 1) it starts blocking, and 2) its return type is CompletableFuture<Void>, which can only run further actions rather than handle the individual results.
Stream.of("file1", "file2", "file3") // or your input in any other format, that can easily be transformed to a stream...
// .parallel() // well... depends...
.map(s -> CompletableFuture.supplyAsync(() -> download(s)))
.map(dl -> dl.thenApply(Doc::getFilename))
.map(CompletableFuture::join) // if you want to have all the results collected
.collect(Collectors.toList());
Of course the two map calls can also be combined. But at least you do not write everything x times... If you do not like collecting to a List, you can also call something else on the stream, e.g. .forEach(System.out::println). The .forEach has the benefit that the consumer is called as soon as a response is available.
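One caveat worth noting: streams are lazy, so in the sequential variant each supplyAsync is immediately followed by its own join, and the downloads effectively run one after another. Collecting the futures first ensures they all start before anything blocks; a sketch assuming the question's download and Doc:
// Materialize the futures first so all downloads run concurrently,
// then join them in a second pass.
List<CompletableFuture<String>> futures = Stream.of("file1", "file2", "file3")
        .map(s -> CompletableFuture.supplyAsync(() -> download(s)))
        .map(dl -> dl.thenApply(Doc::getFilename))
        .collect(Collectors.toList());

List<String> results = futures.stream()
        .map(CompletableFuture::join) // safe to block: all work already started
        .collect(Collectors.toList());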
Or the classic: just use a loop and a list/array for your input. But then you may need to take care of more things than you would with streams.

how to use java parallel stream instead of executorThreadsPool?

I want to write a test that executes many parallel calls to my API.
ExecutorService executor = Executors.newCachedThreadPool();
final int numOfUsers = 10;
for (int i = 0; i < numOfUsers; i++) {
    executor.execute(() -> {
        final Device device1 = getFirstDevice();
        final ResponseDto responseDto = devicesServiceLocal.acquireDevice(device1.uuid, 4738);
        if (responseDto.status == Status.SUCCESS) {
            successCount.incrementAndGet();
        }
    });
}
I know I can do it using an executor thread pool, as above. I could also have created it with a Java 8 parallel stream, like this:
devicesList.parallelStream()
    .map(device -> do something)
But how can I do it on one device? Meaning, I want several calls to acquire the same device,
something like this:
{{device}}.parallelStream().execute(myAction).times(10)
Yes it can, but...
You would think
Stream.generate(() -> device)
    .limit(10)
    .parallel()
    .forEach(device -> device.execute());
should do the job. But NO, because reason (I really do not know why, no clue).
If I let device.execute() wait a second and then print something, the stream prints something every second, 10 times in total. So it isn't parallel at all; not what you want.
Google is my friend, and I found a lot of articles that warn against parallelStream. My eye fell on points 8 and 9 of http://blog.jooq.org/2014/06/13/java-8-friday-10-subtle-mistakes-when-using-the-streams-api/. Point 8 said that if it is backed by a collection, you'll have to sort it and it will magically work, so:
Stream.generate(() -> device)
    .limit(10)
    .sorted((a, b) -> 0) // sort it (kind of), what??
    .parallel()
    .forEach(device -> device.execute());
And now it prints something 8 times after one second, and 2 more times after another second. I have 8 cores, so that is what we (kind of) expect.
I used .forEach() in my stream, but at first I was (like your example) using .map(). .map() didn't print a thing: the stream was never consumed (see point 9 in the linked article).
So beware when working with streams, especially parallel ones. You have to be sure your stream is consumed, that it is finite (.limit()), and that it is actually running in parallel. Streams are weird; I suggest keeping your working solution.
Note: if device.execute() is a blocking operation (IO, networking, ...), you will never have more tasks executing at the same time than your number of cores (in my case 8).
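If you need more simultaneous blocking calls than cores, a widely used (though undocumented) workaround is to run the parallel stream from inside a dedicated java.util.concurrent.ForkJoinPool, since a parallel stream uses the pool it is invoked from. A sketch; the pool size of 32 is an arbitrary assumption:
// The stream's tasks run on the custom pool's 32 workers instead of the
// common pool, so up to 32 blocking device calls can be in flight at once.
ForkJoinPool pool = new ForkJoinPool(32);
try {
    pool.submit(() -> IntStream.range(0, 10)
            .parallel()
            .forEach(i -> getDevice().execute()))
        .get(); // block until the whole stream is done
} catch (InterruptedException | ExecutionException e) {
    throw new RuntimeException(e);
} finally {
    pool.shutdown();
}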
Update (thanks to Holger):
Holger gave an elegant alternative:
IntStream.range(0, 10)
    .parallel()
    .mapToObj(i -> getDevice())
    .forEach(device -> device.execute());
// Or shorter:
IntStream.range(0, 10)
    .parallel()
    .forEach(i -> getDevice().execute());
which is just like a parallel for-loop (and it works).

groupBy, filter and memory leak in Rx

According to the documentation of groupBy:
Note: A GroupedObservable will cache the items it is to emit until such time as it is subscribed to. For this reason, in order to avoid memory leaks, you should not simply ignore those GroupedObservables that do not concern you. Instead, you can signal to them that they may discard their buffers by applying an operator like take(0) to them.
There's a RxJava tutorial which says:
Internally, every Rx operator does 3 things
It subscribes to the source and observes the values.
It transforms the observed sequence according to the operator's purpose.
It pushes the modified sequence to its own subscribers, by calling onNext, onError and onCompleted.
Let's take a look at the following code block which extracts only even numbers from range(0, 10):
Observable.range(0, 10)
.groupBy(i -> i % 2)
.filter(g -> g.getKey() % 2 == 0)
.flatMap(g -> g)
.subscribe(System.out::println, Throwable::printStackTrace);
My questions are:
Does it mean filter operator already implies a subscription to every group resulted from groupBy or just the Observable<GroupedObservable> one?
Will there be a memory leak in this case? If so,
How do I properly discard those groups? Replace filter with a custom operator that does a take(0) followed by a return Observable.empty()? You may ask why I don't just return take(0) directly: it's because filter doesn't necessarily follow right after groupBy, but can be anywhere in the chain and involve more complex conditions.
Apart from the memory leak, the current implementation may end up hanging completely due to internal request coordination problems.
Note that with take(0), the group may be recreated all the time. I'd instead use ignoreElements, which drops values: no items reach flatMap, and the group itself won't be recreated over and over.
Your suspicions are correct: to properly handle the grouped observable, each of the inner observables (g) must be subscribed to. Since filter subscribes only to the outer observable, it's a bad idea here. Just do what you need in the flatMap, using ignoreElements to filter out undesired groups.
Observable.range(0, 10)
    .groupBy(i -> i % 2)
    .flatMap(g -> {
        if (g.getKey() % 2 == 0)
            return g;
        else
            return g.ignoreElements();
    })
    .subscribe(System.out::println, Throwable::printStackTrace);
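A hedged note for RxJava 2: there, ignoreElements() returns a Completable, so it has to be converted back before flatMap can merge it. A sketch of the equivalent:
// RxJava 2 sketch: Completable.toObservable() turns the element-free group
// back into an Observable so both branches return an ObservableSource.
Observable.range(0, 10)
    .groupBy(i -> i % 2)
    .flatMap(g -> g.getKey() % 2 == 0
            ? g
            : g.ignoreElements().<Integer>toObservable())
    .subscribe(System.out::println, Throwable::printStackTrace);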
