CompletableFuture: How to apply a function to multiple CompletableFutures? - java

Suppose I have 3 downloads, framed as completable futures:
CompletableFuture<Doc> dl1 = CompletableFuture.supplyAsync(() -> download("file1"));
CompletableFuture<Doc> dl2 = CompletableFuture.supplyAsync(() -> download("file2"));
CompletableFuture<Doc> dl3 = CompletableFuture.supplyAsync(() -> download("file3"));
Then all of them need to be handled the same way
CompletableFuture<String> s1 = dl1.thenApply(Doc::getFilename);
CompletableFuture<String> s2 = dl2.thenApply(Doc::getFilename);
CompletableFuture<String> s3 = dl3.thenApply(Doc::getFilename);
And you can imagine multiple functions to be applied, all in parallel.
According to the DRY principle, this repetition seems inappropriate. So I'm looking for a way to define a single workflow that is executed 3 times, in parallel.
How can this be accomplished?
I tried allOf, but that has two problems: 1) it ends up blocking, and 2) its result type is CompletableFuture<Void>, so afterwards you can only run something instead of handling the individual results.

Stream.of("file1", "file2", "file3") // or your input in any other format that can easily be transformed to a stream...
        // .parallel() // well... depends...
        .map(s -> CompletableFuture.supplyAsync(() -> download(s)))
        .map(dl -> dl.thenApply(Doc::getFilename))
        .map(CompletableFuture::join) // if you want to have all the results collected
        .collect(Collectors.toList());
Of course the two map calls can also be combined. But at least you do not write everything x times... If you do not like collecting to a List, you can call something else on the stream instead, e.g. .forEach(System.out::println). The .forEach has the benefit that the consumer is called as soon as each response is available.
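One caveat, as a sketch reusing the question's download and Doc: a sequential stream is lazy, so with join in the same pipeline each download is joined before the next supplyAsync is even created, serializing the downloads. Collecting the futures first keeps them all in flight before anything blocks:
// Phase 1: kick off all downloads; nothing blocks here.
List<CompletableFuture<String>> futures = Stream.of("file1", "file2", "file3")
        .map(s -> CompletableFuture.supplyAsync(() -> download(s)))
        .map(dl -> dl.thenApply(Doc::getFilename))
        .collect(Collectors.toList());

// Phase 2: now it is safe to block; every download is already running.
List<String> filenames = futures.stream()
        .map(CompletableFuture::join)
        .collect(Collectors.toList());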
Or the classic: just use a loop and a list/array for your input, though you may need to take care of more details than you would with streams; see the sketch below.
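For completeness, a minimal sketch of that loop variant (same download and Doc as above):
List<CompletableFuture<String>> futures = new ArrayList<>();
for (String name : Arrays.asList("file1", "file2", "file3")) {
    futures.add(CompletableFuture.supplyAsync(() -> download(name))
            .thenApply(Doc::getFilename));
}
futures.forEach(f -> System.out.println(f.join())); // blocks per future, in encounter order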

Related

Intersect operation with Flux - Project Reactor

Let's say I have both var a = Flux.just("A", "B", "C") and var b = Flux.just("B", "C", "D")
I want to intersect the two, and the result should be the equivalent of a set intersection.
Something like a.intersect(b) or Flux.intersect(a, b) that would result in a (Flux of) ["B", "C"].
I could not find any operator that does this; any ideas?
You could use join, filter, map and groupBy like so:
// Join the fluxes into tuples
a.join(b, s -> Flux.never(), s -> Flux.never(), Tuples::of)
        // Keep only the matching pairs
        .filter(t -> t.getT1().equals(t.getT2()))
        // Collapse back to a single value
        .map(Tuple2::getT1)
        // Remove duplicates
        .groupBy(f -> f)
        .map(GroupedFlux::key)
        .subscribe(System.out::println);
This results in a single subscription to each source and also works with duplicates.
Or you could write your own intersect method
public <T> Flux<T> intersect(Flux<T> f1, Flux<T> f2) {
    return f1.join(f2, f -> Flux.never(), f -> Flux.never(), Tuples::of)
            .filter(t -> t.getT1().equals(t.getT2()))
            .map(Tuple2::getT1)
            .groupBy(f -> f)
            .map(GroupedFlux::key);
}
// Use on its own
intersect(a, b).subscribe(System.out::println);
// Or with an existing flux
a.transform(f -> intersect(a, f)).subscribe(System.out::println);
My favoured approach would be something like:
Flux.merge(a, b)
        .groupBy(Function.identity())
        .filterWhen(g -> g.count().map(l -> l > 1))
        .map(g -> g.key())
        .subscribe(System.out::print); // Prints "BC"
(If a or b might contain duplicates, replace the first line with Flux.merge(a.distinct(), b.distinct()).)
Each publisher is only played once, and it's trivial to expand it to more than two publishers if necessary.
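To illustrate, a sketch for three sources, assuming a hypothetical var c = Flux.just("B", "C", "E") alongside a and b (each source already distinct): an element is in the intersection exactly when its group contains one occurrence per source.
Flux.merge(a, b, c)
        .groupBy(Function.identity())
        .filterWhen(g -> g.count().map(n -> n == 3)) // seen in all three sources
        .map(GroupedFlux::key)
        .subscribe(System.out::println); // Prints "B" and "C"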
I like efficiency, so I like to use what is proven without overly depending on streaming (or fluxing) operations.
The disadvantage of this is the need to collect one of the fluxes into a sorted list. Perhaps you know in advance that one Flux is shorter. It seems to me, however, that you are going to have to do something like this no matter what, since you have to compare each element of Flux A against all elements of Flux B (or at least until you find a match).
So, collect Flux A into a sorted list, and then there is no reason not to use Collections::binarySearch on the collected/sorted result:
a.collectSortedList()
        .flatMapMany(sortedA -> b.filter(be -> Collections.binarySearch(sortedA, be) >= 0))
        .subscribe(System.out::println);

Reactor Flux subscriber stream stopped when using reduce on flatMap

I want to change my code to use a single subscriber. Now I have:
auctionFlux.window(Duration.ofSeconds(120), Duration.ofSeconds(120))
        .subscribe(s -> s.groupBy(Auction::getItem)
                .subscribe(longAuctionGroupedFlux ->
                        longAuctionGroupedFlux.reduce(new ItemDumpStats(), this::calculateStats)));
This code works correctly; the reduce method is very simple. I tried to change my code to use a single subscriber:
auctionFlux.window(Duration.ofSeconds(120), Duration.ofSeconds(120))
        .flatMap(window -> window.groupBy(Auction::getItem))
        .flatMap(longAuctionGroupedFlux -> longAuctionGroupedFlux.reduce(new ItemDumpStats(), this::calculateStats))
        .subscribe(itemDumpStatsMono -> log.info(itemDumpStatsMono.toString()));
This is my code, and it is not working: no errors and no results. After debugging I found that the code gets stuck on the second flatMap when reducing the stream. I think the problem is in the flatMap merging, stuck on resolving the Mono. Does someone know how to fix this problem and use only a single subscriber?
To replicate it, you can use another class or create one. With a small input it works, but with a bigger one it dies:
List<Auction> auctionList = new ArrayList<>();
for (int i = 0; i < 100000; i++) {
    Auction a = new Auction((long) i, "test");
    a.setItem((long) (i % 50));
    auctionList.add(a);
}
Flux.fromIterable(auctionList)
        .groupBy(Auction::getId)
        .flatMap(longAuctionGroupedFlux ->
                longAuctionGroupedFlux.reduce(new ItemDumpStats(), (itemDumpStats, auction) -> itemDumpStats))
        .collectList()
        .subscribe(itemDumpStats -> System.out.println(itemDumpStats.toString()));
With this approach the result is instant, but I am using 3 subscribers:
Flux.fromIterable(auctionList)
        .groupBy(Auction::getId)
        .subscribe(auctionIdAuctionGroupedFlux ->
                auctionIdAuctionGroupedFlux
                        .reduce(new ItemDumpStats(), (itemDumpStats, auction) -> itemDumpStats)
                        .subscribe(itemDumpStats -> System.out.println(itemDumpStats.toString())));
I think the behavior you described is related to the interaction between groupBy chained with flatMap.
Check groupBy documentation. It states that:
The groups need to be drained and consumed downstream for groupBy to work correctly. Notably when the criteria produces a large amount of groups, it can lead to hanging if the groups are not suitably consumed downstream (eg. due to a flatMap with a maxConcurrency parameter that is set too low).
By default, flatMap's maxConcurrency is set to 256 (I checked the source of Reactor 3.2.2). So selecting more than 256 groups may cause the execution to hang (particularly when all execution happens on the same thread).
The following code helps in understanding what happens when you chain the operators groupBy and flatMap:
@Test
public void groupAndFlatmapTest() {
    // val is Lombok; rangeClosed is statically imported from java.util.stream.IntStream
    val groupCount = 257;
    val groupSize = 513;
    val list = rangeClosed(1, groupSize * groupCount).boxed().collect(Collectors.toList());
    val source = Flux.fromIterable(list)
            .groupBy(i -> i % groupCount)
            .flatMap(Flux::collectList);
    StepVerifier.create(source).expectNextCount(groupCount).expectComplete().verify();
}
The execution of this code hangs. Changing groupCount to 256 or less makes the test pass (for every value of groupSize).
So, regarding your original problem, it is very possible that you are creating a large amount of groups with your key-selector Auction::getItem.
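Given that diagnosis, one thing worth trying (a sketch, not a verified fix, with a made-up bound on the number of groups) is the flatMap overload that takes the concurrency explicitly, set above the expected number of distinct items:
int maxGroups = 1024; // assumption: upper bound on distinct Auction items per window

auctionFlux.window(Duration.ofSeconds(120), Duration.ofSeconds(120))
        .flatMap(window -> window.groupBy(Auction::getItem), maxGroups)
        .flatMap(group -> group.reduce(new ItemDumpStats(), this::calculateStats), maxGroups)
        .subscribe(stats -> log.info(stats.toString()));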
Adding parallel fixed the problem, but I am still looking for an answer as to why reduce slows flatMap down so dramatically.

how to use java parallel stream instead of executorThreadsPool?

I want to write a test that executes many parallel calls to my API. I know I can do it using an executor thread pool, like this:
ExecutorService executor = Executors.newCachedThreadPool();
final int numOfUsers = 10;
for (int i = 0; i < numOfUsers; i++) {
    executor.execute(() -> {
        final Device device1 = getFirstDevice();
        final ResponseDto responseDto = devicesServiceLocal.acquireDevice(device1.uuid, 4738);
        if (responseDto.status == Status.SUCCESS) {
            successCount.incrementAndGet();
        }
    });
}
I could also have created it with a Java 8 parallel stream:
devicesList.parallelStream()
        .map(device -> /* do something */);
How can I do it on one device? Meaning, I want a few calls to acquire the same device. Something like this:
{{device}}.parallelStream().execute(myAction).times(10)
Yes it can, but...
You would think
Stream.generate(() -> device)
        .limit(10)
        .parallel()
        .forEach(d -> d.execute());
should do the job. But NO, it does not (presumably because Stream.generate produces a stream of unknown size, which splits poorly for parallel execution; I honestly had no clue at the time).
If I make device.execute() wait a second and then print something, the stream prints one line per second, ten times in total. So it isn't parallel at all, which is not what you want.
Google is my friend, and I found a lot of articles that warn against parallelStream. My eye fell on http://blog.jooq.org/2014/06/13/java-8-friday-10-subtle-mistakes-when-using-the-streams-api/, numbers 8 and 9. Number 8 says that if the stream is backed by a collection you'll have to sort it, and then it will magically work, so:
Stream.generate(() -> device)
        .limit(10)
        .sorted((a, b) -> 0) // "Sort" it (kind of)... what??
        .parallel()
        .forEach(d -> d.execute());
And now it prints 8 lines after one second and the remaining 2 after another second. I have 8 cores, so that is (kind of) what we expect.
I used .forEach() in my stream, but at first I was (like your example) using .map(). .map() didn't print anything: the stream was never consumed (see number 9 in the linked article).
So beware when working with streams, especially parallel ones. You have to be sure your stream is consumed, that it is finite (.limit()), that it actually runs in parallel, and so on. Streams are weird; I suggest keeping your working solution.
Note: if device.execute() is a blocking operation (IO, networking, ...), you will never have more than your number of cores (in my case 8) tasks executing at the same time; a common workaround is sketched after the update below.
Update (thanks to Holger):
Holger gave an elegant alternative:
IntStream.range(0, 10)
        .parallel()
        .mapToObj(i -> getDevice())
        .forEach(device -> device.execute());

// Or shorter:
IntStream.range(0, 10)
        .parallel()
        .forEach(i -> getDevice().execute());
which is just like a parallel for-loop (and it works).
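Regarding the note above about blocking calls being capped at the number of cores: a widely used (though not officially documented) workaround is to run the parallel stream inside a dedicated ForkJoinPool. A sketch, reusing getDevice() from above and a hypothetical pool size of 10:
ForkJoinPool pool = new ForkJoinPool(10); // assumption: we want 10 blocking calls in flight
try {
    pool.submit(() -> IntStream.range(0, 10)
                    .parallel()
                    .forEach(i -> getDevice().execute()))
            .get(); // block until every task has finished
} catch (InterruptedException | ExecutionException e) {
    throw new RuntimeException(e);
} finally {
    pool.shutdown();
}
Tasks spawned by a parallel stream run in the pool of the thread that starts the terminal operation, which is what makes this trick work (observed behavior rather than a documented guarantee).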

Is there a way to zip two streams?

This question arose from the answer to another question, where map and reduce were suggested to calculate a sum concurrently.
In that question there is a complexCalculation(e), but now I was wondering how to parallelize even further by splitting the calculation into two parts, so that complexCalculation(e) = part1(e) * part2(e). I wonder whether it is possible to calculate part1 and part2 on a collection concurrently (using map() again) and then zip the two resulting streams, so that the i-th elements of both streams are combined with the function *, making the resulting stream equal to the one obtained by mapping complexCalculation(e) over the collection. In code this would look like:
Stream map1 = bigCollection.parallelStream().map(e -> part1(e));
Stream map2 = bigCollection.parallelStream().map(e -> part2(e));
// preferably map1 and map2 are computed concurrently...
Stream result = map1.zip(map2, (e1, e2) -> e1 * e2);
result.equals(bigCollection.map(e -> complexCalculation(e))); //should be true
So my question is: does there exist some functionality like the zip function I tried to describe here?
parallelStream() is not guaranteed to process elements in the order they were submitted. This means you cannot assume that two parallel streams can be zipped together like this.
Your original bigCollection.map(e -> complexCalculation(e)) is likely to be faster, unless your collection is actually smaller than the number of CPUs you have.
If you really want to parallelize part1 and part2 (for example your bigCollection has very few elements, less than CPU cores), you can do the following trick. Suppose you have two methods part1 and part2 in the current class:
public long part1(Type t) { ... }
public long part2(Type t) { ... }
Create a stream of two functions created from these methods and process it in parallel like this:
bigCollection.parallelStream()
        .map(e -> Stream.<ToLongFunction<Type>>of(this::part1, this::part2)
                .parallel()
                .mapToLong(fn -> fn.applyAsLong(e))
                .reduce(1, (a, b) -> a * b))
        . // continue the outer stream operations
However, this is a very rare case. As @PeterLawrey noted, if your outer collection is big enough there is no need to parallelize part1 and part2: you will already be handling the separate elements in parallel.
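As for the literal question: the JDK streams API has no zip operation; third-party libraries provide one (e.g. Guava's Streams.zip), and a common index-based workaround (a sketch, assuming part1/part2 return long and the intermediate results fit in memory) looks like this:
// Each pass is parallel internally; the two passes run one after the other.
List<Long> p1 = bigCollection.parallelStream().map(e -> part1(e)).collect(Collectors.toList());
List<Long> p2 = bigCollection.parallelStream().map(e -> part2(e)).collect(Collectors.toList());

List<Long> result = IntStream.range(0, p1.size())
        .parallel()
        .mapToObj(i -> p1.get(i) * p2.get(i)) // zip by index
        .collect(Collectors.toList());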

How to chain asynchronous operations using Java RX Observable?

I want to make an HTTP request repeatedly and act on the result. I start with public Observable<NewsItem> fetchItems(NewsFeed feed). One request fetches a few news items, but I decided to flatten it.
The idea was to use Observable.interval() to make the request multiple times, then combine the resulting Observables into one.
Observable
        .interval(timePerItem, TimeUnit.MILLISECONDS)
        .map(i -> feed)
        .map(feed -> fetchItems(feed))
        .subscribe(result -> System.out.println(result));
But the result is Observable<Observable<NewsItem>>, not Observable<NewsItem>. How do I merge them?
I have found the merge() operator (RxJava doc: Merge), but it does not seem to fit the use case.
In a previous version I used CompletableFuture<List<NewsItem>> fetchNewsItems(), but I wasn't able to fit it into the Observable chain.
Not sure if I understand the problem, but aren't you just looking for flatMap?
Observable
        .interval(timePerItem, TimeUnit.MILLISECONDS)
        .flatMap(i -> fetchItems(feed))
        .subscribe(result -> System.out.println(result));
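For what it's worth, flatMap is equivalent to a map producing inner Observables followed by merging them, which is the combination the question was reaching for. A sketch keeping the original map:
Observable
        .merge(Observable
                .interval(timePerItem, TimeUnit.MILLISECONDS)
                .map(i -> fetchItems(feed)))
        .subscribe(result -> System.out.println(result));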
