Let's say we have:
a list of URLs, which is the source for our Multi
as a first step, we grab the HTML of each page using an HTTP client call
then we try to find some specific tag and grab its content
then we store what we found in a database
Now we have 3 steps here. Is there a way these steps can be run in parallel? I mean, after some time it should be grabbing HTML while simultaneously processing the HTML and extracting tag content from an earlier item, and also simultaneously saving already-processed items to the database (hopefully it's obvious what I mean here). This way we get pipelined, parallel processing. By default, from what I can see, Mutiny does it serially.
Here is an example:
@Test
public void test3() {
    Multi<String> source = Multi.createFrom().items("a", "b", "c");
    source
            .onItem().transform(i -> trans(i, "-step1"))
            .onItem().transform(i -> trans(i, "-step2"))
            .onItem().transform(i -> trans(i, "-step3"))
            .subscribe().with(item -> System.out.println("Subscriber received " + item));
}
private String trans(String s, String add) {
    int t = new Random().nextInt(4) * 1000;
    try {
        System.out.println("Sleeping for '" + s + "' milliseconds: " + t);
        Thread.sleep(t);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    return s + add;
}
Now this produces the following console output:
Sleeping for 'a' milliseconds: 2000
Sleeping for 'a-step1' milliseconds: 3000
Sleeping for 'a-step1-step2' milliseconds: 3000
Subscriber received a-step1-step2-step3
Sleeping for 'b' milliseconds: 0
Sleeping for 'b-step1' milliseconds: 0
Sleeping for 'b-step1-step2' milliseconds: 0
Subscriber received b-step1-step2-step3
Sleeping for 'c' milliseconds: 1000
Sleeping for 'c-step1' milliseconds: 3000
Sleeping for 'c-step1-step2' milliseconds: 3000
Subscriber received c-step1-step2-step3
One can see that it's not running in parallel. What did I miss here?
As #jponge mentioned, you can collect your items into a List<Uni<String>>
and then call
Uni.combine().all().unis(listOfUnis).onItem().subscribe().with()
List<Uni<String>> listOfUnis = new ArrayList<>();
Multi<String> source = Multi.createFrom().items("a", "b", "c");
source
        .onItem().invoke(i -> listOfUnis.add(
                Uni.createFrom().item(i)
                        .onItem().transform(s -> trans(s, "-step1"))
                        .onItem().transform(s -> trans(s, "-step2"))
                        .onItem().transform(s -> trans(s, "-step3"))))
        .subscribe().with(i -> {}); // subscribe to the Multi to fill the list,
                                    // but do not subscribe to the individual Unis here
One more note here: if you are going to make HTTP calls, you'd better add
.emitOn(someBlockingPoolExecutor)
since you don't want to block Netty threads while waiting for HTTP calls to complete.
This is expected: Multi processes items as a stream.
If you want to make parallel operations (say, launch 10 HTTP requests) you should combine Uni, see https://smallrye.io/smallrye-mutiny/guides/combining-items
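In plain JDK terms (no Mutiny required), the fan-out idea behind Uni.combine() can be sketched with CompletableFuture: each item gets its own asynchronous three-step pipeline, and the pipelines for different items overlap on a shared pool. The class and method names below are illustrative, not part of any library.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class ParallelPipelines {
    static String trans(String s, String add) { return s + add; }

    // Each item gets its own step1 -> step2 -> step3 pipeline; pipelines for
    // different items run concurrently on the shared pool.
    static List<String> runPipelines(List<String> items) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<CompletableFuture<String>> futures = items.stream()
                    .map(item -> CompletableFuture
                            .supplyAsync(() -> trans(item, "-step1"), pool)
                            .thenApplyAsync(s -> trans(s, "-step2"), pool)
                            .thenApplyAsync(s -> trans(s, "-step3"), pool))
                    .collect(Collectors.toList());
            // Analogous to Uni.combine().all(): wait until every pipeline is done.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        runPipelines(List.of("a", "b", "c"))
                .forEach(r -> System.out.println("Subscriber received " + r));
    }
}
```

With three pool threads, the sleep-free pipelines above complete almost immediately; with real HTTP calls, item "b" would start its step1 while "a" is still in step2, which is the overlap the question asks for.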
Related
I have some function that returns a Flux<Integer>. This flux is hot; it is emitting live data. After some time of execution, I want to block until the next Integer is emitted, and assign it to a variable. This Integer may not be the first and will not be the last.
I considered blockFirst(), but this would block indefinitely, as the Flux has already emitted an Integer. I do not care if the Integer is the first or last in the Flux; I just want to block until the next Integer is emitted and assign it to a variable.
How would I do this? I think I could subscribe to the Flux and, after the first value, unsubscribe, but I am hoping there is a better way to do this.
It depends on the replay/buffering behavior of your hot flux. Both blockFirst() and the next() operator do the same thing: they wait for the first value received in the current subscription.
It is very important to understand that, because in the case of hot fluxes, subscription is independent of source data emission. The first value is not necessarily the first value emitted upstream. It is the first value received by your current subscriber, and that depends on the upstream flow behavior.
In the case of hot fluxes, how they pass values to their subscribers depends on both their buffering and broadcast strategies. For this answer, I will focus only on the buffering aspect:
If your hot flux does not buffer any emitted value (e.g. Flux.share(), Sinks.multicast().directBestEffort()), then both the blockFirst() and next().block() operators meet your requirement: they wait for the next emitted live value in a blocking fashion.
NOTE: next() has the advantage that it can become non-blocking if you replace block() with cache() and subscribe().
If your upstream flux does buffer some past values, then your subscriber / downstream flow will not only receive the live stream. Before it, it will receive (part of) the upstream history. In that case, you will have to use a more advanced strategy to skip values until the one you want.
From your question, I would say that skipping values until an elapsed time has passed can be done using skipUntilOther(Mono.delay(wantedDuration)).
But be careful, because the delay starts from your subscription, not from upstream subscription (to do so, you would require the upstream to provide timed elements, and switch to another strategy).
It is also important to know that Reactor forbids calling block() from some Threads (the one used by non-elastic schedulers).
Let's demonstrate all of that with code. In the program below, there are four examples:
Use next/blockFirst directly on a non-buffering hot flux
Use skipUntilOther on a buffering hot flux
Show that blocking can fail sometimes
Avoid the blocking operation entirely
All examples are commented for clarity, and launched sequentially in a main function:
import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
public class HotFlux {
public static void main(String[] args) throws Exception {
System.out.println("== 1. Hot flux without any buffering");
noBuffer();
System.out.println("== 2. Hot flux with buffering");
withBuffer();
// REMINDER: block operator is not always accepted by Reactor
System.out.println("== 3. block called from a wrong context");
blockFailsOnSomeSchedulers();
System.out.println("== 4. Next value without blocking");
avoidBlocking();
}
static void noBuffer() {
// Prepare a hot flux thanks to share().
var hotFlux = Flux.interval(Duration.ofMillis(100))
.share();
// Prepare an operator that fetches the next value from the live stream after a delay.
var nextValue = Mono.delay(Duration.ofMillis(300))
.then(hotFlux.next());
// Launch live data emission
var livestream = hotFlux.subscribe(i -> System.out.println("Emitted: "+i));
try {
// Trigger value fetching after a delay
var value = nextValue.block();
System.out.println("next() -> " + value);
// Immediately try to block until next value is available
System.out.println("blockFirst() -> " + hotFlux.blockFirst());
} finally {
// stop live data production
livestream.dispose();
}
}
static void withBuffer() {
// Prepare a hot flux replaying all values emitted in the past to each subscriber
var hotFlux = Flux.interval(Duration.ofMillis(100))
.cache();
// Launch live data emission
var livestream = hotFlux.subscribe(i -> System.out.println("Emitted: "+i));
try {
// Wait half a second, then get next emitted value.
var value = hotFlux.skipUntilOther(Mono.delay(Duration.ofMillis(500)))
.next()
.block();
System.out.println("skipUntilOther + next: " + value);
// block first can also be used
value = hotFlux.skipUntilOther(Mono.delay(Duration.ofMillis(500)))
.blockFirst();
System.out.println("skipUntilOther + blockFirst: " + value);
} finally {
// stop live data production
livestream.dispose();
}
}
public static void blockFailsOnSomeSchedulers() throws InterruptedException {
var hotFlux = Flux.interval(Duration.ofMillis(100)).share();
var barrier = new CountDownLatch(1);
var forbiddenInnerBlock = Mono.delay(Duration.ofMillis(200))
// This block will fail, because delay op above is scheduled on parallel scheduler by default.
.then(Mono.fromCallable(() -> hotFlux.blockFirst()))
.doFinally(signal -> barrier.countDown());
forbiddenInnerBlock.subscribe(value -> System.out.println("Block success: "+value),
err -> System.out.println("BLOCK FAILED: "+err.getMessage()));
barrier.await();
}
static void avoidBlocking() throws InterruptedException {
var hotFlux = Flux.interval(Duration.ofMillis(100)).share();
var asyncValue = hotFlux.skipUntilOther(Mono.delay(Duration.ofMillis(500)))
.next()
// timed() will let us verify that the value has been fetched once and then cached properly
.timed()
.cache();
asyncValue.subscribe(); // launch immediately
// Barrier is required because we're in a main/test program. If you intend to reuse the mono in a bigger scope, you do not need it.
CountDownLatch barrier = new CountDownLatch(2);
// We will see that both subscribe methods return the same timestamped value, because it has been launched previously and cached
asyncValue.subscribe(value -> System.out.println("Get value (1): "+value), err -> barrier.countDown(), () -> barrier.countDown());
asyncValue.subscribe(value -> System.out.println("Get value (2): "+value), err -> barrier.countDown(), () -> barrier.countDown());
barrier.await();
}
}
This program gives the following output:
== 1. Hot flux without any buffering
Emitted: 0
Emitted: 1
Emitted: 2
Emitted: 3
next() -> 3
Emitted: 4
blockFirst() -> 4
== 2. Hot flux with buffering
Emitted: 0
Emitted: 1
Emitted: 2
Emitted: 3
Emitted: 4
Emitted: 5
skipUntilOther + next: 5
Emitted: 6
Emitted: 7
Emitted: 8
Emitted: 9
Emitted: 10
Emitted: 11
skipUntilOther + blockFirst: 11
== 3. block called from a wrong context
BLOCK FAILED: block()/blockFirst()/blockLast() are blocking, which is not supported in thread parallel-6
== 4. Next value without blocking
Get value (1): Timed(4){eventElapsedNanos=500247504, eventElapsedSinceSubscriptionNanos=500247504, eventTimestampEpochMillis=1674831275873}
Get value (2): Timed(4){eventElapsedNanos=500247504, eventElapsedSinceSubscriptionNanos=500247504, eventTimestampEpochMillis=1674831275873}
When I run the code below, I expect to see both subscribers getting their own elastic thread. However, they do not consistently do so. For example, on my system, if I uncomment the Thread.sleep(100) line, my print statements indicate that the consumers are receiving their data on the main thread. If I leave it commented, I see the output I would expect: each consumer receives its data on its own elastic thread. Why do I see this nondeterministic behavior? How am I abusing the API?
List<FluxSink<String>> holder = new ArrayList<>();
ConnectableFlux<String> connect = Flux.<String>create(holder::add).replay();
connect.connect();
Flux<String> flux = connect.subscribeOn(Schedulers.elastic());
FluxSink<String> sink = holder.get(0);
flux.subscribe(c -> {
System.out.println(Thread.currentThread().getName() + ", " +
"consumer 1 says " + c);
});
flux.subscribe(c -> {
System.out.println(Thread.currentThread().getName() + ", " +
"consumer 2 says " + c);
});
// When uncommented, subscribers receive on the main thread, as if I had
// chosen Schedulers.immediate(); when left commented, they receive on
// elastic threads.
//
// Thread.sleep(100);
sink.next("Hi!");
sink.complete();
Thread.sleep(1000);
//output with Thread.sleep(100):
// main, consumer 1 says Hi!
// main, consumer 2 says Hi!
//
//output without Thread.sleep(100):
// elastic-2, consumer 1 says Hi!
// elastic-3, consumer 2 says Hi!
What I'd like to achieve is a hot stream with multiple subscribers and each subscriber on its own thread.
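That goal can be illustrated with plain JDK building blocks (a hypothetical class, not Reactor API): give each subscriber its own single-thread executor, so every delivery of a hot value happens on a dedicated per-subscriber thread, with no race between subscription setup and emission.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class PerSubscriberThreads {
    // Minimal hot-source sketch: values pushed via next() are delivered to
    // every subscriber on that subscriber's own single-thread executor.
    static class HotSource<T> {
        private final List<ExecutorService> executors = new CopyOnWriteArrayList<>();
        private final List<Consumer<T>> subscribers = new CopyOnWriteArrayList<>();

        void subscribe(Consumer<T> consumer) {
            executors.add(Executors.newSingleThreadExecutor());
            subscribers.add(consumer);
        }

        void next(T value) {
            for (int i = 0; i < subscribers.size(); i++) {
                Consumer<T> consumer = subscribers.get(i);
                executors.get(i).execute(() -> consumer.accept(value));
            }
        }

        void shutdown() { executors.forEach(ExecutorService::shutdown); }
    }

    public static void main(String[] args) throws InterruptedException {
        HotSource<String> source = new HotSource<>();
        source.subscribe(c -> System.out.println(
                Thread.currentThread().getName() + ", consumer 1 says " + c));
        source.subscribe(c -> System.out.println(
                Thread.currentThread().getName() + ", consumer 2 says " + c));
        source.next("Hi!");
        Thread.sleep(100);   // let the deliveries print before shutting down
        source.shutdown();
    }
}
```

Because each subscriber owns its executor from the moment subscribe() returns, there is no window where an emission can sneak in on the producer's thread, which is the race the Thread.sleep(100) in the question papers over.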
I have a simple Flux:
Flux<Long> flux = Flux.generate(
AtomicLong::new,
(state, sink) -> {
long i = state.getAndIncrement();
sink.next(i);
if (i == 3) sink.complete();
return state;
}, (state) -> System.out.println("state: " + state));
This works as expected in a single thread:
flux.subscribe(System.out::println);
The output is
0 1 2 3 state: 4
But when I switch to parallel:
flux.parallel().runOn(Schedulers.elastic()).subscribe(System.out::println);
The state Consumer, which should print state: <number>, isn't invoked. I just see:
0 3 2 1
Is it a bug or expected feature?
I'm not a reactive expert, but after digging into the source code it seems the behavior is by design: creating a ParallelFlux has the side effect of suppressing the call to the state Consumer. If you want to go parallel and still get the state Consumer invoked, you can use:
flux.publishOn(Schedulers.elastic()).subscribe(System.out::println);
I wanted to launch a bunch of processes to work simultaneously, so I've created an array of processes and then launched them like this:
Process[] p_ids= new Process[ids.length];
int index = 0;
//launching the processes
for (int id: ids) {
String runProcessCommand = createCommandForProcess();
System.out.println(runProcessCommand);
try {
p_ids[index] = Runtime.getRuntime().exec(runProcessCommand);
index++;
} catch (IOException e) {
    e.printStackTrace();
}
}
After that I wanted to wait for all of them to finish, so I took the same array of processes and iterated over every process in it; in each iteration I wait for the current process to finish, or for a specific timeout to pass, like this:
for (Process p_id : p_ids) {
    try {
        // timeout for waiting on a process: waitFor returns false when the
        // timeout elapses; it does not throw
        boolean finished = p_id.waitFor(timeoutForSingleProcess, TimeUnit.HOURS);
        if (!finished) {
            System.err.println("One of the processes exceeded the timeout limit");
        }
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
The problem is that I want to give a total_time_out: a fixed timeout that applies to each of the processes from the moment they are launched.
Say I have process1, process2, process3 and I give a timeout of 1 hour: if any of them (1, 2 or 3) takes more than an hour to finish, I want the timeout to kick in.
The problem with my code is that each timeout only starts counting down when its turn arrives in the loop (and not at the same time as for the other processes). I.e., if process1 takes half an hour and process2 takes 1 hour, the two processes are launched at the same time, but process2's timeout only starts counting half an hour after its launch (because we waited half an hour for process1 before moving to process2). That way a timeout that should have fired is missed.
Is there a process pool or something like that which could help me?
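One way to get the behavior described above with only the JDK (the helper name is made up for illustration): compute a single absolute deadline when the processes start, and give each waitFor only the time remaining until that deadline. Slow earlier processes then consume the shared budget instead of extending later timeouts.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

public class DeadlineWait {
    // Waits for every process against ONE shared deadline. Each waitFor call
    // gets only the time remaining, so a slow earlier process does not push
    // back the timeout of later ones. Returns true if all finished in time.
    static boolean waitForAllWithDeadline(List<Process> processes,
                                          long timeout, TimeUnit unit)
            throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        boolean allFinished = true;
        for (Process p : processes) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0 || !p.waitFor(remaining, TimeUnit.NANOSECONDS)) {
                p.destroyForcibly();   // deadline reached: kill the straggler
                allFinished = false;
            }
        }
        return allFinished;
    }
}
```

Since all processes run concurrently while the loop waits, checking them one by one against the same deadline is enough; no extra threads are needed.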
I'm messing around with RxJava, and I want to stream a thousand consecutive integers. Then I want to asynchronously split them into odd and even streams, and then print them asynchronously.
However, I'm getting nothing printed out, or at least very partial output. What am I missing? Did I schedule incorrectly? Or is the console having multithreading issues in Eclipse?
public static void main(String[] args) {
List<Integer> values = IntStream.range(0,1000).mapToObj(i -> Integer.valueOf(i)).collect(Collectors.toList());
Observable<Integer> ints = Observable.from(values).subscribeOn(Schedulers.computation());
Observable<Integer> evens = ints.filter(i -> Math.abs(i) % 2 == 0);
Observable<Integer> odds = ints.filter(i -> Math.abs(i) % 2 != 0);
evens.subscribe(i -> System.out.println(i + " IS EVEN " + Thread.currentThread().getName()));
odds.subscribe(i -> System.out.println(i + " IS ODD " + Thread.currentThread().getName()));
}
You are starting the processing pipeline using Schedulers.computation(), which runs daemon threads. Thus when your main thread finishes, those threads are terminated before your observable is fully processed.
So if you would like to see your results printed, you could have your main thread wait for the results (e.g. via Thread.sleep), or subscribe on the calling thread by removing subscribeOn. There is also the option of creating your own scheduler that runs non-daemon threads.
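The "wait for completion" fix can be sketched with just the JDK (the class and method names here are illustrative): a CountDownLatch lets the main thread block until the daemon worker has actually finished, instead of guessing a Thread.sleep duration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class DaemonWait {
    // Runs the filtering work on a daemon thread (like Schedulers.computation()
    // does) and blocks the caller until it finishes.
    static List<Integer> evensUpTo(int n) throws InterruptedException {
        List<Integer> evens = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                if (i % 2 == 0) evens.add(i);
            }
            done.countDown();
        });
        worker.setDaemon(true);   // daemon: killed as soon as main exits
        worker.start();
        done.await();             // the fix: wait explicitly for completion
        return evens;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(evensUpTo(10));   // prints [0, 2, 4, 6, 8]
    }
}
```

Without the await() call, main could return before the daemon thread runs, and the JVM would kill the worker mid-stream, which mirrors the missing output in the question.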