Mono.when firing before publishers complete - java

Sample code:
Flux<Integer> fluxSrc = Flux.<Integer>create(e -> {
            e.next(1);
            try {
                Thread.sleep(500);
            } catch (InterruptedException e1) {
                throw new RuntimeException(e1);
            }
            e.complete();
        })
        .publishOn(Schedulers.single())
        .publish().autoConnect(2);
Flux<Integer> fluxA = fluxSrc
        .publishOn(Schedulers.single())
        .map(j -> 10 + j);
fluxA.subscribe(System.out::println);
Mono<Integer> monoB = fluxSrc
        .publishOn(Schedulers.single())
        .reduce(20, (j, k) -> {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e1) {
                throw new RuntimeException(e1);
            }
            return j + k;
        });
monoB.subscribe(System.out::println);
Mono.when(fluxA, monoB)
        .block();
System.out.println("After");
This produces the following output:
11
After
21
Why does it not wait for both publishers (fluxA and monoB) to complete? How should I structure the code so I make sure all publishers complete before After is reached?

By using .publish(), fluxSrc is turned into a hot flux. Consider:
Hot publishers, on the other hand, do not depend on any number of
subscribers. They might start publishing data right away and would
continue doing so whenever a new Subscriber comes in (in which case
said subscriber would only see new elements emitted after it
subscribed). For hot publishers, something does indeed happen before
you subscribe.
(https://projectreactor.io/docs/core/release/reference/#reactor.hotCold)
One way to fix it is to get rid of publish and operate on a cold stream. Another is to change .autoConnect(2) to .autoConnect(3), because you want processing to start when the third subscription, Mono.when(fluxA, monoB).block(), is reached (the previous ones are fluxA.subscribe and monoB.subscribe).
Edit:
Mono.when did wait for its sources to finish, but it received the onComplete signals from the earlier subscriptions.
What probably happened is:
fluxA was subscribed by fluxA.subscribe(System.out::println), emitted 11 and printed it.
monoB was subscribed by monoB.subscribe(System.out::println) and started its reduction.
Mono.when was subscribed, which triggered the "multicasting": both publishers were subscribed a second time.
The first reduction started; its result will be 21.
Another reduction started and finished immediately with result 20 (it reduced an empty stream, because the only item from fluxSrc had already been consumed by the other reduction).
fluxA sent onComplete to both of its subscribers.
monoB sent onComplete with the reduction result 20. That result went to the subscription made by Mono.when, which is why it wasn't printed.
Both publishers had sent onComplete to the Mono.when subscription, so After was printed.
Around that time the first reduction completed with value 21, which was passed to monoB.subscribe(System.out::println).
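A minimal sketch of the first fix above (drop publish() and work with a cold source), assuming Reactor 3 on the classpath. It also moves the printing into doOnNext so that Mono.when makes the only subscription to each publisher; the sleeps are removed, and events are collected into a list instead of printed so the ordering is easy to check. The class and method names are illustrative.

```java
// Assumption: Reactor 3 (reactor-core) is on the classpath.
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class WhenDemo {

    static List<String> run() {
        // Thread-safe list: the doOnNext callbacks run on the single() scheduler thread.
        List<String> events = new CopyOnWriteArrayList<>();

        // Cold source (no publish()): every subscriber gets its own 1 followed by completion.
        Flux<Integer> fluxSrc = Flux.<Integer>create(sink -> {
            sink.next(1);
            sink.complete();
        }).publishOn(Schedulers.single());

        // Side effects go in doOnNext rather than in separate subscribe() calls,
        // so Mono.when below is the only subscriber of each publisher.
        Flux<Integer> fluxA = fluxSrc
                .map(j -> 10 + j)
                .doOnNext(v -> events.add("A:" + v));
        Mono<Integer> monoB = fluxSrc
                .reduce(20, Integer::sum)
                .doOnNext(v -> events.add("B:" + v));

        // block() returns only once both fluxA and monoB have completed.
        Mono.when(fluxA, monoB).block();
        events.add("After");
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because each publisher now has exactly one subscriber, the completion that Mono.when waits for comes from the same subscription that produces 11 and 21, so "After" can only be recorded last.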

Related

Leverage PriorityBlockingQueue to build a producer-consumer pattern in Java Reactor

In my project, a Spring scheduler periodically scans the DB for "TO BE DONE" tasks and distributes them to a task consumer for subsequent handling. The current implementation puts a Reactor Sink between the producer and the consumer.
Sinks.Many<Task> taskSink = Sinks.many().multicast().onBackpressureBuffer(1000, false);
Producer:
Flux<Date> dates = loadDates();
dates.filterWhen(...)
.concatMap(date -> taskManager.getTaskByDate(date))
.doOnNext(taskSink::tryEmitNext)
.subscribe();
Consumer:
taskProcessor.process(taskSink.asFlux())
.subscribeOn(Schedulers.boundedElastic())
.subscribe();
Using a Sink works fine in most cases. But when the system is under heavy load, the system maintainers want to know:
How many tasks are still sitting in the Sink?
Is it possible to clear all tasks within the Sink?
Is it possible to prioritize tasks within the Sink?
Unfortunately, a plain Sink cannot fulfill these needs.
So I created a wrapper class that combines a Map and a PriorityBlockingQueue. I referenced the implementation from this link https://stackoverflow.com/a/71009712/19278017.
After that, the original producer-consumer code was revised as below:
Task queue:
MergingQueue<Task> taskQueue = new PriorityMergingQueue();
Producer:
Flux<Date> dates = loadDates();
dates.filterWhen(...)
.concatMap(date -> taskManager.getTaskByDate(date))
.doOnNext(taskQueue::enqueue)
.subscribe();
Consumer:
taskProcessor.process(Flux.<Task>create(sink -> {
    sink.onRequest(n -> {
        try {
            long remaining = n;
            while (!sink.isCancelled() && remaining > 0) {
                // Waits up to 1 second for the next task
                Task task = taskQueue.poll(1, TimeUnit.SECONDS);
                if (task != null) {
                    sink.next(task);
                    remaining--;
                }
            }
        } catch (Exception e) {
            ....
        }
    });
}))
.subscribeOn(Schedulers.boundedElastic())
.subscribe();
I have some questions:
Could calling .poll() be an issue? I ran into a thread hang during longevity testing, and I'm not sure whether it is caused by the poll() call.
Is there an alternative solution in Reactor that works like a PriorityBlockingQueue?
The goal of reactive programming is to avoid blocking operations. PriorityBlockingQueue.poll() will cause issues as it will block the thread waiting for the next element.
There is, however, an alternative in Reactor: the unicast version of Sinks.Many allows an arbitrary Queue to be used for buffering, via Sinks.many().unicast().onBackpressureBuffer(Queue<T>). By using a PriorityQueue instantiated outside of the Sink, you can fulfill all three requirements.
Here is a short demo where I emit a Task every 100ms:
public record Task(int prio) {}
private static void log(Object message) {
System.out.println(LocalTime.now(ZoneOffset.UTC).truncatedTo(ChronoUnit.MILLIS) + ": " + message);
}
public void externalBufferDemo() throws InterruptedException {
Queue<Task> taskQueue = new PriorityQueue<>(Comparator.comparingInt(Task::prio).reversed());
Sinks.Many<Task> taskSink = Sinks.many().unicast().onBackpressureBuffer(taskQueue);
taskSink.asFlux()
.delayElements(Duration.ofMillis(100))
.subscribe(task -> log(task));
for (int i = 0; i < 10; i++) {
taskSink.tryEmitNext(new Task(i));
}
// Show amount of tasks sitting in the Sink:
log("Nr of tasks in sink: " + taskQueue.size());
// Clear all tasks in the sink after 350ms:
Thread.sleep(350);
taskQueue.clear();
log("Nr of tasks after clear: " + taskQueue.size());
Thread.sleep(1500);
}
Output:
09:41:11.347: Nr of tasks in sink: 9
09:41:11.450: Task[prio=0]
09:41:11.577: Task[prio=9]
09:41:11.687: Task[prio=8]
09:41:11.705: Nr of tasks after clear: 0
09:41:11.799: Task[prio=7]
Note that delayElements has an internal queue of size 1, which is why Task 0 was picked up before Task 1 was emitted, and why Task 7 was picked up after the clear.
If multicast is required, you can transform your flux using one of the many operators enabling multicasting.
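A sketch of that multicasting step, assuming Reactor 3.4+ (where Sinks was introduced): publish().autoConnect(2) sits between the priority-ordered unicast sink and two downstream subscribers. The class name and the choice of two subscribers are illustrative.

```java
// Assumption: Reactor 3.4+ (reactor-core) is on the classpath.
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;
import java.util.concurrent.CopyOnWriteArrayList;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Sinks;

public class MulticastPrioritySink {

    record Task(int prio) {}

    static List<String> run() {
        // External PriorityQueue as in the demo above: highest prio is polled first.
        Queue<Task> taskQueue = new PriorityQueue<>(Comparator.comparingInt(Task::prio).reversed());
        Sinks.Many<Task> taskSink = Sinks.many().unicast().onBackpressureBuffer(taskQueue);

        // Buffer a few tasks before anyone subscribes; they wait in taskQueue.
        for (int i = 0; i < 5; i++) {
            taskSink.tryEmitNext(new Task(i));
        }
        taskSink.tryEmitComplete();

        // publish() is the unicast sink's single subscriber; autoConnect(2) waits
        // for two downstream subscribers before draining the queue to both of them.
        Flux<Task> shared = taskSink.asFlux().publish().autoConnect(2);

        List<String> seen = new CopyOnWriteArrayList<>();
        shared.subscribe(t -> seen.add("first:" + t.prio()));
        shared.subscribe(t -> seen.add("second:" + t.prio()));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The unicast sink still has only one subscriber (the publish operator), so the external taskQueue can be inspected or cleared exactly as in the demo.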

Parallelizing deserialization step

There is the following pipeline:
item is produced (the producer is external to the pipeline);
item is deserialized (JSON to Java object);
item is processed;
At the moment it all happens synchronously in a single thread:
while(producer.next()) {
var item = gson.deserialize(producer.item());
processItem(item);
}
Or schematically:
PRODUCER (sync) -> DESERIALIZATION (sync) -> CONSUMER (sync)
The concern is that the deserialization step has no side effects and could be parallelized, saving some wall-clock time.
The overall code should look like the following:
var pipeline = new Pipeline<Item>();
pipeline.setProducer(producer);
pipeline.setDeserialization(gson::deserialize);
pipeline.setConsumer(item -> {
...
});
pipeline.run();
Or schematically:
                -> DESERIALIZATION ->
                -> DESERIALIZATION ->
                -> DESERIALIZATION ->
PRODUCER (sync) -> ...  (parallel)  -> CONSUMER (sync)
                -> DESERIALIZATION ->
                -> DESERIALIZATION ->
                -> DESERIALIZATION ->
Important notice. Deserialized items should be produced:
synchronously;
in the same order the original producer produces encoded items.
Q. Is there a standardized way to code such a pipeline?
Try
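while (producer.next()) {
    var raw = producer.item(); // read on the producer thread, before handing off
    CompletableFuture.supplyAsync(() -> gson.deserialize(raw))
        .thenAcceptAsync(item -> processItem(item)); // note: processing order is not guaranteed
}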
One way you can achieve your pattern is to:
Construct a multi-threaded executor to process the decoding requests
Have a consumer queue; each time you submit an item to be decoded, also add the corresponding Future object to the consumer queue
Have a consumer thread sit waiting to take items off the queue [which therefore consumes them in the order they were posted], call the corresponding get() method [which waits for the item to be decoded]
So the 'consumer' would look like this:
BlockingQueue<Future<Item>> consumerQueue = new LinkedBlockingDeque<>();
Thread consumerThread = new Thread(() -> {
    try {
        while (true) {
            Future<Item> item = consumerQueue.take();
            try {
                // Get the next decoded item that's ready
                Item decodedItem = item.get();
                // 'Consume' the item
                ...
            } catch (ExecutionException ex) {
                // decoding this item failed: handle or log it here
            }
        }
    } catch (InterruptedException irr) {
        // interrupted: stop consuming
    }
});
consumerThread.start();
Meanwhile, the 'producer' end, with its multi-threaded 'decoder', would look like this:
ExecutorService decoder = Executors.newFixedThreadPool(4);
while (producer.hasNext()) {
    Item item = producer.next();
    // Submit the decode job for asynchronous processing;
    // the Future yields 'item' once decode() has run
    Future<Item> p = decoder.submit(() -> {
        item.decode();
    }, item);
    // Also queue this decode job for future consumption once complete
    consumerQueue.add(p);
}
As a separate matter, I wonder if you will actually see much benefit in practice, since by insisting on consumption in the same order, you are inherently introducing a serial condition on the process. But technically, this is one way that you could achieve what you are after.
P.S. If you didn't want a separate consumer thread, then the same 'producer' thread could poll the queue for completed items and execute in line.
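Putting the pieces of this answer together, here is a self-contained sketch using only the JDK. Integer::parseInt stands in for gson deserialization and a List<String> stands in for the producer (both assumptions). A bounded queue and an end-of-stream sentinel are added (also assumptions, not part of the answer) so the producer cannot race arbitrarily far ahead and the consumer knows when to stop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

public class OrderedParallelPipeline {

    // Deserializes items on `threads` workers but hands them to `consumer`
    // one at a time, on a single thread, in the producer's original order.
    public static <R, T> void run(Iterable<R> producer, Function<R, T> deserialize,
                                  Consumer<T> consumer, int threads) throws InterruptedException {
        ExecutorService decoders = Executors.newFixedThreadPool(threads);
        // Bounded: the producer blocks in put() once this many decodes are queued.
        BlockingQueue<Future<T>> inFlight = new ArrayBlockingQueue<>(threads * 4);
        // Sentinel telling the consumer that the producer is finished.
        Future<T> endOfStream = CompletableFuture.completedFuture(null);

        Thread consumerThread = new Thread(() -> {
            try {
                while (true) {
                    Future<T> next = inFlight.take(); // FIFO = submission order
                    if (next == endOfStream) {
                        return;
                    }
                    try {
                        consumer.accept(next.get()); // waits for this item's decode only
                    } catch (ExecutionException e) {
                        // Illustrative policy: report and skip items that failed to decode.
                        System.err.println("skipping item: " + e.getCause());
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumerThread.start();

        try {
            for (R raw : producer) {
                inFlight.put(decoders.submit(() -> deserialize.apply(raw)));
            }
            inFlight.put(endOfStream);
            consumerThread.join();
        } finally {
            decoders.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> consumed = new ArrayList<>();
        // Integer::parseInt stands in for gson deserialization (assumption).
        run(List.of("1", "2", "3", "4", "5", "6", "7", "8"), Integer::parseInt, consumed::add, 4);
        System.out.println(consumed); // [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```

The queue capacity of threads * 4 is an arbitrary choice; it bounds memory while still giving the decoders enough work queued ahead of a slow consumer.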

Observable.timer rxjava not working for exponential back off

I am trying to set up an exponential back off via an Observable.timer if the network is down or if a given service is down. I have a retryWhen when there are errors.
I have two issues. First, I cannot get the timer to work: no matter what delay I set, it always runs immediately. From what I understand of the docs, it should wait for the delay and then send a complete, but when I look at the logs I see no delay.
Second, because I wanted to get the retry value when it is returned, I used subscribe to get it; however, when an Observable.error is returned, it throws an exception when I do the calculations. For the second issue, I plan to check the type of Observable and act depending on the type.
Any ideas on what I may be doing wrong would be great.
return Observable.zip(
locationObservable,
oAdapterService.getIssuerInformation(sponsorCode),
oAdapterService.getOfferInformation(sponsorCode, activity.getOfferCode()),
(LocationInfo a, IssuerInfo b, OfferInfo c) -> {
OAdapterUtil.setLocationInfo(activity, a);
OAdapterUtil.setIssuerInfo(activity, b);
OAdapterUtil.setOfferInfo(activity, c);
return activity;
})
.retryWhen(errors -> errors.zipWith(Observable.range(1, maxRetries), (error, retries) -> {
if (retries++ < maxRetries) {
log.debug("Issues with Service call for transaction ID {} with initiator ID {}, retry count {}"
,activity.getTransactionId(),activity.getInitiatorId() ,retries);
return Observable.just(retries);
}
log.error("Tried to call Service {} time(s) for for transaction ID {} with initiator ID {}, error is {} "
,maxRetries,activity.getTransactionId(),activity.getInitiatorId(),error);
return Observable.error(error);
}
).flatMap(x -> {
log.debug("X value in flat map is {}",x.toString());
x.subscribe(currentValue -> {
log.debug("X value in subscribe is with subscribe {}",currentValue.toString());
double retryCount = Double.parseDouble(currentValue.toString()) + 2.0 ;
log.debug("retry count {}",retryCount);
long exponentialBackOff =(long)Math.pow(2.0, retryCount);
log.debug("exp back off {}",exponentialBackOff);
// Observable.timer(exponentialBackOff, TimeUnit.SECONDS);
});
Observable.timer(10, TimeUnit.SECONDS);
return x;
// Observable.timer(backoffPeriod, TimeUnit.MILLISECONDS);
}
));
You have an orphan line of code:
Observable.timer(10, TimeUnit.SECONDS);
The only thing this line of code does is to create an observable. The result is discarded because nothing is done with it.
If you need to back off, then do:
return x.delay(10, TimeUnit.SECONDS);
inside the flatMap() operator. Remove the x.subscribe(...) call; any logging should be done before returning.
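A sketch of the whole retryWhen with the delay actually wired in, assuming RxJava 3 (package io.reactivex.rxjava3). Instead of subscribing inside flatMap, each error is zipped with an attempt number and turned into either a returned Observable.timer (retry after the delay) or Observable.error (give up). The backoff formula 2^(attempt + 2) seconds loosely mirrors the question's calculation; withBackoff and backoffSeconds are illustrative names.

```java
// Assumption: RxJava 3 is on the classpath (package io.reactivex.rxjava3).
import io.reactivex.rxjava3.core.Observable;
import java.util.concurrent.TimeUnit;

public class BackoffRetry {

    // Illustrative backoff: 2^(attempt + 2) seconds -> 8s, 16s, 32s, ...
    static long backoffSeconds(int attempt) {
        return 1L << (attempt + 2);
    }

    // Retries `source` up to maxRetries times, waiting backoffSeconds(n) before retry n.
    static <T> Observable<T> withBackoff(Observable<T> source, int maxRetries) {
        return source.retryWhen(errors -> errors
                // Number each error; the extra slot lets error number maxRetries + 1
                // through so it can be turned into a terminal failure.
                .zipWith(Observable.range(1, maxRetries + 1), (error, attempt) ->
                        attempt > maxRetries
                                ? Observable.<Long>error(error)
                                // Returning the timer (not just creating it) is what makes
                                // retryWhen wait before resubscribing.
                                : Observable.timer(backoffSeconds(attempt), TimeUnit.SECONDS))
                .flatMap(signal -> signal));
    }
}
```

In the question's code this would wrap the Observable.zip(...) result, with the logging moved into doOnError or into the zipWith lambda before the timer is returned.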

How to create blocking backpressure with rxjava Flowables?

I have a function that returns a Flowable which continually reads from a database and emits what it reads.
public Flowable<String> scan() {
    Flowable<String> flow = Flowable.create((FlowableOnSubscribe<String>) emitter -> {
        Result result = new Result();
        while (!result.hasData()) {
            result = request.query(skip, limit);
            partialResult.getResult()
                .getFeatures().forEach(feature -> emitter.onNext(feature));
        }
    }, BackpressureStrategy.BUFFER)
        .subscribeOn(Schedulers.io());
    return flow;
}
Then I have another object that can call this method.
myObj.scan()
.parallel()
.runOn(Schedulers.computation())
.map(feature -> {
//Heavy Computation
})
.sequential()
.blockingSubscribe(msg -> {
logger.debug("Successfully processed " + msg);
}, (e) -> {
logger.error("Failed to process features because of error with scan", e);
});
My heavy computation section could potentially take a very long time. So long in fact that there is a good chance that the database requests will load the whole database into memory before the consumer finishes the first couple entries.
I have read up on backpressure with rxjava but the only 4 options essentially make me drop data or replace it with the last.
Is there a way to make emitter.onNext(feature) block until there is more room in the Flowable?
I.e., I want to treat the Flowable as a blocking queue where a push sleeps if the queue is at capacity.
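Rather than blocking the emitter, one way to get this behavior in RxJava is Flowable.generate (also recommended in the next answer): the generator runs only while there is downstream demand, so the database is queried one page at a time as the slow consumer catches up. A sketch, assuming RxJava 3; PageSource is a hypothetical stand-in for request.query(skip, limit), and the page size is up to you.

```java
// Assumption: RxJava 3 is on the classpath (package io.reactivex.rxjava3).
import io.reactivex.rxjava3.core.Flowable;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class PagedScan {

    // Hypothetical stand-in for the question's request.query(skip, limit).
    public interface PageSource {
        List<String> query(int skip, int limit);
    }

    // Cursor state carried from one generator call to the next.
    static final class Cursor {
        int skip;
        final Deque<String> buffered = new ArrayDeque<>();
    }

    public static Flowable<String> scan(PageSource source, int pageSize) {
        // generate() invokes the lambda only while downstream has outstanding demand,
        // and each invocation emits at most one item.
        return Flowable.generate(Cursor::new, (cursor, emitter) -> {
            if (cursor.buffered.isEmpty()) {
                // Fetch the next page only after the previous one has been handed out.
                List<String> page = source.query(cursor.skip, pageSize);
                cursor.skip += page.size();
                cursor.buffered.addAll(page);
            }
            if (cursor.buffered.isEmpty()) {
                emitter.onComplete(); // empty page: nothing left to read
            } else {
                emitter.onNext(cursor.buffered.poll());
            }
            return cursor;
        });
    }
}
```

The existing .parallel().runOn(...) pipeline can consume scan(...) unchanged; its bounded prefetch limits how many pages are read ahead of the heavy computation.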

Converting an Observable to a Flowable with backpressure in RxJava2

I am observing the lines produced by a NetworkResource, wrapping it in an Observable.create. Here is the code, missing try/catch and cancellation for simplicity:
fun linesOf(resource: NetworkResource): Observable<String> =
Observable.create { emitter ->
while (!emitter.isDisposed) {
val line = resource.readLine()
Log.i(TAG, "Emitting: $line")
emitter.onNext(line)
}
}
The problem is that later I want to turn it into a Flowable using observable.toFlowable(LATEST) to add backpressure in case my consumer can't keep up, but depending on how I do it, the consumer stops receiving items after item 128.
A) this way everything works:
val resource = ...
linesOf(resource)
.subscribeOn(Schedulers.io())
.observeOn(AndroidSchedulers.mainThread())
.toFlowable(BackpressureStrategy.LATEST)
.subscribe { Log.i(TAG, "Consuming: $it") }
B) here the consumer gets stuck after 128 items (but the emitting continues):
val resource = ...
linesOf(resource)
.toFlowable(BackpressureStrategy.LATEST)
.subscribeOn(Schedulers.io())
.observeOn(AndroidSchedulers.mainThread())
.subscribe { Log.i(TAG, "Consuming: $it") } // <-- stops after 128
In option A) everything works without any issues, and I can see the Emitting: ... log side by side with the Consuming: ... log.
In option B) I can see the Emitting: ... log message happily emitting new lines, but I stop seeing the Consuming: ... log message after item 128, even though the emitting continues.
Question: Can someone help me understand why this happens?
First of all, you are using the wrong type and wrong operator. Using Flowable removes the need for conversion. Using Flowable.generate gets you backpressure:
Flowable<String> lines = Flowable.generate(emitter -> {
    // invoked repeatedly while there is downstream demand, at most one item per call
    String line = resource.readLine();
    if (line == null) {
        emitter.onComplete();
    } else {
        emitter.onNext(line);
    }
});
Second, the reason your version hangs is a same-pool deadlock caused by subscribeOn. Requests from downstream are scheduled behind your eager emission loop and cannot take effect, so delivery stops at the default 128 elements. Use Flowable.subscribeOn(scheduler, false) to avoid this case.
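A sketch of option B with that fix applied, assuming RxJava 3. A counter loop stands in for NetworkResource (hypothetical), and Schedulers.single() stands in for AndroidSchedulers.mainThread() so it runs off Android; per the explanation above, the same pipeline without the false argument would stall after 128 items.

```java
// Assumption: RxJava 3 is on the classpath (package io.reactivex.rxjava3).
import io.reactivex.rxjava3.core.BackpressureStrategy;
import io.reactivex.rxjava3.core.Observable;
import io.reactivex.rxjava3.schedulers.Schedulers;
import java.util.List;

public class SubscribeOnWithoutRequestOn {

    // Hypothetical stand-in for linesOf(resource): an eager loop that emits until disposed.
    static Observable<Long> counter() {
        return Observable.create(emitter -> {
            long i = 0;
            while (!emitter.isDisposed()) {
                emitter.onNext(i++);
            }
        });
    }

    static List<Long> firstN(int n) {
        return counter()
                .toFlowable(BackpressureStrategy.LATEST)
                // requestOn = false: downstream requests are forwarded directly instead of
                // being scheduled on the io worker, which the emission loop never releases.
                .subscribeOn(Schedulers.io(), false)
                .observeOn(Schedulers.single()) // stand-in for AndroidSchedulers.mainThread()
                .take(n)
                .toList()
                .blockingGet();
    }

    public static void main(String[] args) {
        System.out.println(firstN(300).size()); // 300
    }
}
```

Once take(n) is satisfied, the cancellation travels upstream and disposes the emitter, which ends the loop on the io thread.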
