Proper handling of Backpressure and Concurrency with RxJava2

I have a service with which I have registered a callback, and now I want to expose it as a Flowable with certain requirements/limitations:
The thread receiving the callback should not be blocked (work should be handed off to a different thread/scheduler specified by the observer).
There should not be any exceptions thrown because consumers downstream are slow.
Multiple consumers can subscribe to it independently of each other.
Consumers can choose to buffer all the items so that none of them are lost; however, the items should not be buffered in the 'producer' class.
Below is what I have currently
class MyBroadcaster {
    private PublishProcessor<Packet> packets = PublishProcessor.create();
    private Flowable<Packet> backpressuredPackets = packets.onBackpressureLatest();

    public MyBroadcaster() {
        // this is actually different to my exact use but same conceptually
        registerCallback(packets::onNext);
    }

    public Flowable<Packet> observeAllPacketsOn(Scheduler scheduler) {
        return backpressuredPackets.observeOn(scheduler);
    }
}
I'm not sure if this actually fits my requirements. There's a note on the onBackpressureLatest javadoc regarding observeOn that I don't understand:
Note that due to the nature of how backpressure requests are propagated through subscribeOn/observeOn, requesting more than 1 from downstream doesn't guarantee a continuous delivery of onNext events
And I have other questions:
Does the onBackpressureLatest call make it so that the items are no longer multicasted?
How can I test my requirements?
Bonus: If I have multiple such publishers (in the same class or elsewhere), what is the best way to make the same pattern reusable? Create my own Flowable with delegation/extra methods?

I'm not sure if this actually fits my requirements.
It does not. Apply onBackpressureLatest or onBackpressureBuffer, each followed by observeOn, in a lossy observeSomePacketsOn and a lossless observeAllPacketsOn respectively.
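For illustration, a minimal sketch of what those two methods could look like inside MyBroadcaster (observeSomePacketsOn follows the naming used in this answer; only observeAllPacketsOn appears in the question):
// lossless: each subscriber gets its own unbounded buffer, so no packets are lost
public Flowable<Packet> observeAllPacketsOn(Scheduler scheduler) {
    return packets.onBackpressureBuffer().observeOn(scheduler);
}

// lossy: a slow subscriber only sees the latest packet once it catches up
public Flowable<Packet> observeSomePacketsOn(Scheduler scheduler) {
    return packets.onBackpressureLatest().observeOn(scheduler);
}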
Does the onBackpressureLatest call make it so that the items are no longer multicasted?
The multicasting is done by PublishProcessor; different subscribers establish a channel to it independently, and the onBackpressureXXX and observeOn operators take effect on a per-subscriber basis.
How can I test my requirements?
Subscribe through the lossy or lossless Flowable with a TestSubscriber (Flowable.test()), feed a known set of Packets into packets and see whether all of them arrived, either via TestSubscriber.assertValueCount() or TestSubscriber.values(). The lossy one should receive between 1 and N values, and the lossless one should have all N values after a grace period.
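A rough sketch of the lossless test (the onPacketReceived hook and the Packet constructor are hypothetical stand-ins for however packets enter the broadcaster in your system):
MyBroadcaster broadcaster = new MyBroadcaster();

TestSubscriber<Packet> ts = broadcaster
        .observeAllPacketsOn(Schedulers.single())
        .test();

for (int i = 0; i < 1000; i++) {
    broadcaster.onPacketReceived(new Packet(i)); // hypothetical test hook driving the callback
}

ts.awaitCount(1000)        // grace period for the observeOn thread to drain
  .assertValueCount(1000)  // lossless variant: nothing may be dropped
  .assertNotComplete();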
Bonus: If I have multiple such publishers (in same class or elsewhere) , what is the best way to make the same pattern reusable. Create my own Flowable with delegation/extra methods?
You could turn the observeAllPacketsOn into a FlowableTransformer and instead of a method call on MyBroadcaster, use compose, for example:
class MyTransformers {
    public static <T> FlowableTransformer<T, T> lossyObserveOn(Scheduler s) {
        return f -> f.onBackpressureLatest().observeOn(s);
    }
}

// getPacketFlow() here stands for a getter on MyBroadcaster exposing the raw Flowable<Packet>
new MyBroadcaster().getPacketFlow()
        .compose(MyTransformers.lossyObserveOn(scheduler))
        .subscribe(/* ... */);

Related

Why do some Reactor operators request far more elements than they are interested in?

I have the following code:
Flux<String> flux = Flux.<String>never()
.doOnRequest(n -> System.out.println("Requested " + n));
It is a Flux that never emits any signal, but reports demand to the console.
Each of the following 3 lines
flux.take(3).next().block();
flux.next().block();
flux.blockFirst();
produces this output:
Requested 9223372036854775807
Looking at the code, I see the following.
BlockingSingleSubscriber (works in both the Flux#blockFirst() and Mono#block() cases):
public final void onSubscribe(Subscription s) {
    this.s = s;
    if (!cancelled) {
        s.request(Long.MAX_VALUE);
    }
}
MonoNext.NextSubscriber:
public void request(long n) {
    if (WIP.compareAndSet(this, 0, 1)) {
        s.request(Long.MAX_VALUE);
    }
}
FluxTake.TakeSubscriber:
public void request(long n) {
    if (wip == 0 && WIP.compareAndSet(this, 0, 1)) {
        if (n >= this.n) {
            s.request(Long.MAX_VALUE);
        } else {
            s.request(n);
        }
        return;
    }
    s.request(n);
}
So Flux#blockFirst(), Flux#next() and Mono#block() always signal an unbounded demand to their upstream, and Flux#take() can do the same under some circumstances.
But Flux#blockFirst(), Flux#next() and Mono#block() each need at most one element from their upstream, and Flux#take() needs at most this.n.
Also, Flux#take() javadoc says the following:
Note that this operator doesn't manipulate the backpressure requested amount.
Rather, it merely lets requests from downstream propagate as is and cancels once
N elements have been emitted. As a result, the source could produce a lot of
extraneous elements in the meantime. If that behavior is undesirable and you do
not own the request from downstream (e.g. prefetching operators), consider
using {@link #limitRequest(long)} instead.
The question is: why do they signal an unbounded demand when they know the limit upfront? I had an impression that reactive backpressure was about only asking for what you are ready to consume. But in reality, it often works like this: shout 'produce all you can' to the upstream, and then cancel the subscription once satisfied. In cases when it is costly to produce gazillion records upstream this seems simply wasteful.
tl;dr - Requesting only what you need is usually ideal in a pull based system, but is very rarely ideal in a push based system.
I had an impression that reactive backpressure was about only asking for what you are ready to consume.
Not quite, it's what you are able to consume. The difference is subtle, but important.
In a pull based system, you'd be entirely correct - requesting more values than you know you'll ever need would almost never be a good thing, as the more values you request, the more work needs to happen to produce those values.
But note that reactive streams are inherently push based, not pull based. Most reactive frameworks, reactor included, are built with this in mind - and while hybrid, or pull based semantics are possible (using Flux.generate() to produce elements one at a time on demand for example) this is very much a secondary use case. The norm is to have a publisher that has a bunch of data it needs to offload, and it "wants" to push that to you as quickly as possible to be rid of it.
This is important as it flips the view as to what's ideal from a requesting perspective. It no longer becomes a question of "What's the most I'll ever need", but instead "What's the most I can ever deal with" - the bigger the number, the better.
As an example, let's say I have a database query returning 2000 records connected to a flux - but I only want 1. If I have a publisher that's pushing these 2000 records, and I call request(1), then I'm not "helping" things at all - I haven't caused less processing on the database side, those records are already there and waiting. Since I've only requested 1 however, the publisher must then decide whether it can buffer the remaining records, or it's best to skip some or all of them, or it should throw an exception if it can't keep up, or something else entirely. Whatever it does, I'm actually causing more work, and possibly even an exception in some cases, by requesting fewer records.
Granted, this is not always desirable - perhaps those extra elements in the Flux really do cause extra processing that's wasteful, perhaps network bandwidth is a primary concern, etc. In that case, you'd want to explicitly call limitRequest(). In most cases though, that's probably not the behaviour you're after.
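As a small illustration of that difference, reusing the flux from the question (limitRequest() caps the demand that reaches the upstream, whereas take() lets the default unbounded request through, matching the TakeSubscriber code above):
Flux<String> flux = Flux.<String>never()
        .doOnRequest(n -> System.out.println("Requested " + n));

flux.take(3).subscribe();         // prints "Requested 9223372036854775807"
flux.limitRequest(1).subscribe(); // prints "Requested 1"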
(For completeness sake, the best scenario is of course to limit the data at source - put a LIMIT 1 on your database query if you only want a single value for instance. Then you don't have to worry about any of this stuff. But, of course, in real-world usage that's not always possible.)

Java Flux vs. Observable/BehaviorSubject

My question is whether or not Flux has the ability to behave like an Observable or BehaviorSubject. I think I get the gist of what a Flux does and how, but every tutorial I see creates a Flux of static content, i.e. some pre-existing array of numbers which are finite in nature.
However, I want my Flux to be a stream of unknown values over time... like an Observable or BehaviorSubject. With those, you can create a method like setNextValue(String value), and pump those values to all subscribers of the Observable/BehaviorSubject etc.
Is this possible with a Flux? Or does the Flux have to be composed of an Observable type stream of values first?
Update
I answered my own question with an implementation down below. The accepted answer will probably lead down the same path, but in a slightly more complicated way.
every tutorial I see creates a Flux of static content, i.e. some pre-existing array of numbers which are finite in nature.
You'll see this because most tutorials focus on how to manipulate & use a Flux - but the implication here (that you can just use a Flux with static, fixed-length content) is both unfortunate, and wrong. It's much more powerful than that, and using it with such static content is almost certainly not how you see it used in the real-world.
There are essentially three different ways of instantiating a Flux to emit elements dynamically as you describe:
However, I want my Flux to be a stream of unknown values over time... like an Observable or BehaviorSubject. With those, you can create a method like setNextValue(String value), and pump those values to all subscribers of the Observable/BehaviorSubject etc.
Absolutely - have a look at Flux.push(). This exposes an emitter, and emitter.next(value) can be called whenever you wish. This stream can go on for as long as you want it to (infinitely, if desired.) Flux.create() is essentially the multi-threaded variant of Flux.push(), which may also be of use.
Flux.generate() may also be worth a look - this is a bit like an "on-demand" version of Flux.push(), where you only emit the next element via a callback when the downstream consumer requests it, rather than emitting whenever you want to. This isn't always practical, but it makes sense to use this method if the use-case makes it feasible, as it respects backpressure and thus can be guaranteed not to overwhelm the consumer with more requests than it can handle.
This can be achieved like this (wrapped in a class here so it compiles; the class name is illustrative):
class StatusBroadcaster {

    private EmitterProcessor<String> processor;
    private FluxSink<String> statusSink;
    private Flux<String> status;

    public StatusBroadcaster() {
        this.processor = EmitterProcessor.create();
        this.statusSink = this.processor.sink(FluxSink.OverflowStrategy.BUFFER);
        this.status = this.processor.publish().autoConnect();
    }

    public Flux<String> getStatus() {
        return this.status;
    }

    public void setStatus(String status) {
        this.statusSink.next(status);
    }
}

What kind of object is a Reactive Java Subscription?

In Reactive Java, we're told that the .subscribe() call returns "a Subscription reference". But Subscription is an interface, not a class. So what kind of object are we handed that implements this interface? Do we have any control over this?
There is the class Subscriptions that can create and return several different kinds of Subscription, but what does one do with them? If I write
Subscription mSub = Subscriptions.create(<some Action0>);
mSub = someObservable.subscribe();
won't my just-created Subscription simply be overwritten by whatever the .subscribe() call returns? How do you use a Subscription you create?
(On a somewhat related note, what is the point of Subscriptions.unsubscribed(), which "returns a Subscription to which unsubscribe does nothing, as it is already unsubscribed." Huh?)
Short answer: You shouldn't care.
Longer answer: a subscription gives you two methods:
unsubscribe(), which causes the subscription to terminate.
isUnsubscribed(), which checks whether that has already happened.
You can use these methods to a) check whether an Observable chain terminated and b) to cause it to terminate prematurely, for example if the user switched to a different Activity.
That's it. You aren't exposed to the internals on purpose. Also, do you notice that there's no resubscribe method? That's because if you want to restart the operation, you need to resubscribe to the Observable, giving you a new Subscription.
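A typical usage sketch (RxJava 1.x; someObservable and render() are placeholders):
Subscription subscription = someObservable.subscribe(value -> render(value));

// later, e.g. when the user switches to a different Activity:
if (!subscription.isUnsubscribed()) {
    subscription.unsubscribe(); // terminates the chain; re-subscribe to the Observable to restart it
}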
As you know, Subscriptions are used to keep references to ongoing Observables, mainly for resource management. For example, in Android applications, when you change an Activity (screen) you flush the old Activity's Observables. In this scenario, Subscription instances are given by .subscribe() (as you mentioned) and stored. So, for what reason would one create a Subscription directly, especially Subscriptions.unsubscribed()? I encountered two cases:
Default implementation: avoid a declaration like Subscription mSub; that would be filled in later and could cause an NPE. This is especially true if you use Kotlin, which requires property initialization.
Testing
On a somewhat related note, what is the point of Subscriptions.unsubscribed(), which "returns a Subscription to which unsubscribe does nothing, as it is already unsubscribed." Huh?
In 1.x, Subscriptions.unsubscribed() is used to return a Subscription instance indicating that the operation has already completed (or never ran in the first place) by the time control is returned to your code from RxJava. Since being unsubscribed is a stateless, constant condition, the returned Subscription is a singleton, because just by looking at the Subscription interface there is no (reasonable) way to distinguish one completed/unsubscribed Subscription from another.
In 2.x, there is a public and internal version of its equivalent interface, Disposable. The internal version is employed mostly to swap out a live Disposable with a terminated one, avoiding NullPointerException and null checks in general and to help the GC somewhat.
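For reference, a small 2.x sketch of that idea using the public Disposables helpers (just an illustration of the terminated-placeholder pattern):
Disposable live = Disposables.empty();    // a fresh, not-yet-disposed placeholder
Disposable done = Disposables.disposed(); // the shared "already terminated" instance

System.out.println(done.isDisposed());    // true
System.out.println(live.isDisposed());    // false, until live.dispose() is called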
what does one do with them?
Usually you don't need to worry about Subscriptions.create(); it is provided for the case where you have a resource you'd like to attach to the lifecycle of your end subscriber:
FileReader file = new FileReader("file.txt");

readLines(file)
    .map(line -> line.length())
    .reduce(0, (a, b) -> a + b)
    .subscribe(new Subscriber<Integer>() {
        {
            // attach the resource to this subscriber's lifecycle
            add(Subscriptions.create(() -> {
                Closeables.closeSilently(file); // utility from Guava
            }));
        }

        @Override
        public void onNext(Integer length) {
            // process
        }

        // onError(), onCompleted()
    });
This example demonstrates one way of using it; the same thing can nonetheless be expressed with using instead:
Observable.using(
    () -> new FileReader("file.txt"), // + try { } catch { }
    file -> readLines(file).map(...).reduce(...),
    file -> Closeables.closeSilently(file)
)
.subscribe(...);

How does RxJava Observable "Iteration" work?

I started to play around with RxJava and ReactFX, and I became pretty fascinated with it. But as I'm experimenting I have dozens of questions and I'm constantly researching for answers.
One thing I'm observing (no pun intended) is of course lazy execution. With my exploratory code below, I noticed nothing gets executed until merge.subscribe(pet -> System.out.println(pet)) is called. But what fascinated me is that when I subscribed a second subscriber, merge.subscribe(pet -> System.out.println("Feed " + pet)), it fired the "iteration" again.
What I'm trying to understand is the behavior of the iteration. It does not seem to behave like a Java 8 stream that can only be used once. Is it literally going through each String one at a time and posting it as the value for that moment? And do any new subscribers following any previously fired subscribers receive those items as if they were new?
public class RxTest {
    public static void main(String[] args) {
        Observable<String> dogs = Observable.from(ImmutableList.of("Dasher", "Rex"))
                .filter(dog -> dog.matches("D.*"));
        Observable<String> cats = Observable.from(ImmutableList.of("Tabby", "Grumpy Cat", "Meowmers", "Peanut"));
        Observable<String> ferrets = Observable.from(CompletableFuture.supplyAsync(() -> "Harvey"));

        Observable<String> merge = dogs.mergeWith(cats).mergeWith(ferrets);

        merge.subscribe(pet -> System.out.println(pet));
        merge.subscribe(pet -> System.out.println("Feed " + pet));
    }
}
Observable<T> represents a monad, a chained operation, not the execution of the operation itself. It is descriptive language, rather than the imperative you're used to. To execute an operation, you .subscribe() to it. Every time you subscribe, a new execution stream is created from scratch. Do not confuse streams with threads: subscriptions are executed synchronously unless you specify a thread change with .subscribeOn() or .observeOn().
You chain new elements onto any existing operation/monad/Observable to add new behaviour, like changing threads, filtering, accumulation, transformation, etc. In case your Observable is an expensive operation you don't want to repeat on every subscription, you can prevent recreation by using .cache().
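To illustrate that last point in the question's style of code (RxJava 1.x; the doOnSubscribe logging is just there to show when an execution actually starts):
Observable<String> pets = Observable.from(ImmutableList.of("Dasher", "Rex"))
        .doOnSubscribe(() -> System.out.println("subscribed"));

pets.subscribe(System.out::println); // prints "subscribed", then the pets
pets.subscribe(System.out::println); // prints "subscribed" again: a fresh execution

Observable<String> cached = pets.cache();
cached.subscribe(System.out::println); // subscribes upstream once more, then caches the values
cached.subscribe(System.out::println); // replays the cached values, no new "subscribed"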
To make any asynchronous/synchronous Observable<T> operation into a synchronous, inlined one, use .toBlocking() to change its type to BlockingObservable<T>. Instead of .subscribe(), it contains new methods to execute operations on each result with .forEach(), or to coerce a single value with .first().
Observables are a good tool because they're mostly* deterministic (the same inputs always yield the same outputs unless you're doing something wrong), reusable (you can send them around as part of a command/policy pattern) and for the most part ignore concurrency, because they should not rely on shared state (a.k.a. doing something wrong). BlockingObservables are good if you're trying to bring an observable-based library into an imperative language, or just executing an operation on an Observable that you have 100% confidence is well managed.
Architecting your application around these principles is a change of paradigm that I can't really cover on this answer.
*There are breaches like Subject and Observable.create() that are needed to integrate with imperative frameworks.

How to Assign a Thread to a Type of Data?

I have a stream of sequential data with two types.
And, there are an observer subscribed (listen) for each type of data.
Observer observer1 = new Observer() {
    @Override
    public void next(Data data) {
        // Do something
    }
};

Observer observer2 = new Observer() {
    @Override
    public void next(Data data) {
        // Do something
    }
};
feed.subscribe(observer1, "Type1");
feed.subscribe(observer2, "Type2");
When a piece of data of Type1 is received, the next() method of observer1 will be called, and the same goes for Type2 and observer2.
And since the data is sequential, when we have data of Type1 and Type2, the second one has to wait until the first one has been processed (it is blocked).
Now I want to use multi-threading to solve this problem. However, I don't want to create a thread for each piece of data I receive. I want the observers to run in parallel.
For example, if I have this sequence:
type1, type1, type1, type2, ...
The second and third pieces of data should wait for the first one to be processed (because they are Type1), but the fourth one should be picked up by observer2 and processed.
Note that the API I am using only allows me to subscribe; it handles calling the next() methods.
The solution that comes to my mind is to have a mapping between threads and observers. So, on each call of the next() method, instead of creating a new thread, I can resume the thread assigned to that type. Or, if there is a thread running for observer1, I want all the data of Type1 to stay in a queue waiting for that thread to process it.
Is there any design pattern or data structure to solve this problem in Java?
This sounds like an ideal fit for the producer/consumer pattern. There would be two consumers, one for each data type. There would also be a producer thread that is the subscriber to the data feed. For each data element received, it places a unit of work on the queue for the appropriate consumer. The Wikipedia article on the pattern has a sample Java implementation, and lots of other resources are available on the web.
This is a typical problem solved with queues. There is a single thread that reads the stream, N queues (one per data type; two in your case) and N worker threads. The reader thread places each "message" in the correct queue, and each worker thread polls its corresponding queue, processes the next available item and repeats.
Reference: see the implementations of java.util.Queue for Java >= 1.5.
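A minimal sketch of that queue-per-type approach with java.util.concurrent.BlockingQueue (Data is the type from the question; process() stands in for whatever each observer does):
Map<String, BlockingQueue<Data>> queues = new HashMap<>();
queues.put("Type1", new ArrayBlockingQueue<>(1024));
queues.put("Type2", new ArrayBlockingQueue<>(1024));

// one worker per type: ordering is preserved within a type, while the types run in parallel
queues.forEach((type, queue) -> new Thread(() -> {
    while (!Thread.currentThread().isInterrupted()) {
        try {
            Data data = queue.take(); // blocks until an item is available
            process(type, data);      // hypothetical per-type handler
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}, type + "-worker").start());

// inside the feed's next() callback, just route by type without blocking on processing:
// queues.get(type).put(data);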
