I have been looking at new rx java 2 and I'm not quite sure I understand the idea of backpressure anymore...
I'm aware that we have Observable that does not have backpressure support and Flowable that has it.
So based on example, lets say I have flowable with interval:
Flowable.interval(1, TimeUnit.MILLISECONDS, Schedulers.io())
.observeOn(AndroidSchedulers.mainThread())
.subscribe(new Consumer<Long>() {
#Override
public void accept(Long aLong) throws Exception {
// do smth
}
});
This is going to crash after around 128 values, and thats pretty obvious I am consuming slower than getting items.
But then we have the same with Observable
Observable.interval(1, TimeUnit.MILLISECONDS, Schedulers.io())
.observeOn(AndroidSchedulers.mainThread())
.subscribe(new Consumer<Long>() {
#Override
public void accept(Long aLong) throws Exception {
// do smth
}
});
This will not crash at all, even when I put some delay on consuming it still works. To make Flowable work lets say I put onBackpressureDrop operator, crash is gone but not all values are emitted either.
So the base question I can not find answer currently in my head is why should I care about backpressure when I can use plain Observable still receive all values without managing the buffer? Or maybe from the other side, what advantages do backpressure give me in favour of managing and handling the consuming?
What backpressure manifests in practice is bounded buffers, Flowable.observeOn has a buffer of 128 elements that gets drained as fast as the dowstream can take it. You can increase this buffer size individually to handle bursty source and all the backpressure-management practices still apply from 1.x. Observable.observeOn has an unbounded buffer that keeps collecting the elements and your app may run out of memory.
You may use Observable for example:
handling GUI events
working with short sequences (less than 1000 elements total)
You may use Flowable for example:
cold and non-timed sources
generator like sources
network and database accessors
Backpressure is when your observable (publisher) is creating more events than your subscriber can handle. So you can get subscribers missing events, or you can get a huge queue of events which just leads to out of memory eventually. Flowable takes backpressure into consideration. Observable does not. Thats it.
it reminds me of a funnel which when it has too much liquid overflows. Flowable can help with not making that happen:
with tremendous backpressure:
but with using flowable, there is much less backpressure :
Rxjava2 has a few backpressure strategies you can use depending on your usecase. by strategy i mean Rxjava2 supplies a way to handle the objects that cannot be processed because of the overflow (backpressure).
here are the strategies.
I wont go through them all, but for example if you want to not worry about the items that are overflowed you can use a drop strategy like this:
observable.toFlowable(BackpressureStrategy.DROP)
As far as i know there should be a 128 item limit on the queue, after that there can be a overflow (backpressure). Even if its not 128 its close to that number. Hope this helps someone.
if you need to change the buffer size from 128 it looks like it can be done
like this (but watch any memory constraints:
myObservable.toFlowable(BackpressureStrategy.MISSING).buffer(256); //but using MISSING might be slower.
in software developement usually back pressure strategy means your telling the emitter to slow down a bit as the consumer cannot handle the velocity your emitting events.
The fact that your Flowable crashed after emitting 128 values without backpressure handling doesn't mean it will always crash after exactly 128 values: sometimes it will crash after 10, and sometimes it will not crash at all. I believe this is what happened when you tried the example with Observable - there happened to be no backpressure, so your code worked normally, next time it may not. The difference in RxJava 2 is that there is no concept of backpressure in Observables anymore, and no way to handle it. If you're designing a reactive sequence that will probably require explicit backpressure handling - then Flowable is your best choice.
Related
So what I have understood from the docs is that parallel Flux is that essentially divided the flux elements into separate rails.(Essentially something like grouping). And as far as thread is considered, it would be the job of schedulers. So let's consider a situation like this. And all this will be run on the same scheduler instance provided via runOn() methods.
Let's consider a situation like below:
Mono<Response> = webClientCallAPi(..) //function returning Mono from webclient call
Now let's say we make around 100 calls
Flux.range(0,100).subscribeOn(Schedulers.boundedElastic()).flatMap(i -> webClientCallApi(i)).collecttoList() // or subscribe somehow
and if we use paralleFlux:
Flux.range(0,100).parallel().runOn(Schedulers.boundedElastic()).flatMap(i -> webClientCallApi(i)).sequential().collecttoList();
So if my understanding is correct, it pretty much seems to be similar. So what are the advantages of ParallelFlux over Flux and when should you use parallelFlux over flux?
In practice, you'll likely very rarely need to use a parallel flux, including in this example.
In your example, you're firing off 100 web service calls. Bear in mind the actual work needed to do this is very low - you generate and fire off an asynchronous request, and then some time later you receive a response back. In between that request & response you're not doing any work at all, it simply takes a tiny amount of CPU resources when each request is sent, and another tiny about when each response is received. (This is one of the core advantages of using an asynchronous framework to make your web requests, you're not tying up any threads while the request is in-flight.)
If you split this flux and run it in parallel, you're saying that you want these tiny amounts of CPU resources to be split so they can run simultaneously, on different CPU cores. This makes absolutely no sense - the overhead of splitting the flux, running it in parallel and then combining it later is going to be much, much greater than just leaving it to execute on a normal, sequential scheduler.
On the other hand, let's say I had a Flux<Integer> and I wanted to check if each of those integers was a prime for example - or perhaps a Flux<String> of passwords that I wanted to check against a BCrypt hash. Those sorts of operations are genuinely CPU intensive, so in that case a parallel flux, used to split execution across cores, could make a lot of sense. In reality though, those situations occur quite rarely in the normal reactor use cases.
(Also, just as a closing note, you almost always want to use Schedulers.parallel() with a parallel flux, not Schedulers.boundedElastic().)
I've just discovered the joys of RxJava and its 10000 public methods, but I am struggling to understand how (and if) we should incorporate async apis into reactive pipelines.
To give an example, let's say I'm building a pipeline that:
takes keys from some cold source (or hot, in which case let's say we already have a way of dealing with an overactive source)
fetches data for those keys using an asynchronous client (or just applies any kind of async processing)
batches the data and
saves it into storage.
If we had a blocking api for step #2, it might look something like this.
source.map((key) -> client.callBlocking(key))
.buffer(500, TimeUnit.MILLISECONDS, 100)
.subscribe(dataList -> storage.batchSave(dataList));
With a couple more calls, we could parallelise this, making it so that 100 threads are waiting on client.callBlocking at any given time.
But what if the api we have is already asynchronous and we want to make use of that? I imagine the same pipeline would look something like this
source.magicMethod(new Processor() {
// When downstream requests more items
public void request(int count) {
upstream.request(count);
}
// When upstream delivers us an item
public void onNext(Object key) {
client.callAsync(key)
.onResult((data) -> downstream.onNext(data));
}
})
.buffer(500, TimeUnit.MILLISECONDS, 100)
.subscribe(data -> storage.batchSave(data));
What I want to know is which method is magicMethod. Or perhaps this is a terrible idea to incorporate async calls into a pipeline and we should never ever. (There is also a question of pre-fetching, so that downstream code does not necessarily have to wait for data after requesting it, but let's put that aside for now)
Note that this is not a question about parallelism. The second version could run perfectly well in a single thread (plus whatever threads the client may or may not be using under the hood)
Also, while the question is about RxJava, I'd be just as happy to see an answer using Reactor.
Thanks for helping a poor old reactive noob :)
The versions of buffer operator that don't operate on time honour backpressure as per JavaDoc:
http://reactivex.io/RxJava/2.x/javadoc/io/reactivex/Flowable.html#buffer-int-
However, any version of buffer that involves time based buffers doesn't support backpressure, like this one
http://reactivex.io/RxJava/2.x/javadoc/io/reactivex/Flowable.html#buffer-long-java.util.concurrent.TimeUnit-int-
I understand this comes from the fact that once the time is ticking, you can't stop it similarly to, for example interval operator, that doesn't support backpressure either for the same reason.
What I want is a buffering operator that is both size and time based and fully supports backpressure by propagating the backpressure signals to BOTH the upstream AND the time ticking producer, something like this:
someFlowable()
.buffer(
Flowable.interval(1, SECONDS).onBackpressureDrop(),
10
);
So now I could drop the tick on backpressure signals.
Is this something currently achievable in rxJava2? How about Project-Reactor?
I've encountered the problem recently and here is my implementation. It can be used like this:
Flowable<List<T>> bufferedFlow = (some flowable of T)
.compose(new BufferTransformer(1, TimeUnit.MILLISECONDS, 8))
It supports backpressure by the count you've specified.
Here is the implementation: https://gist.github.com/driventokill/c49f86fb0cc182994ef423a70e793a2d
I had problems with solution from https://stackoverflow.com/a/55136139/6719538 when used DisposableSubscriber as subscribers, and as far as I can see this transformer don't consider calls Suscription#request from downstream subscribers (it could overflow them). I create my version that was tested in production - BufferTransformerHonorableToBackpressure.java. fang-yang - great respect for idea.
It's been a while, but I had a look at this again and somehow it struck me that this:
public static <T> FlowableTransformer<T, List<T>> buffer(
int n, long period, TimeUnit unit)
{
return o ->
o.groupBy(__ -> 1)
.concatMapMaybe(
gf ->
gf.take(n)
.take(period, SECONDS)
.toList()
.filter(l -> !l.isEmpty())
);
}
is pretty much doing what I described.
That, if I am correct is fully backpressured and will either buffer n items or after specified time if enough items haven't been collected
I had another go at it that lead to a quite overengineered solution that seems to be working (TM)
The requirements:
A buffering operator that releases a buffer after a time interval elapses, or the buffer reaches maximum size, whichever happens first
The operator has to be fully backpressured, that is, if requests cease from downstream, the buffer operator should not emit data nor should it raise any exception (like the starndard Flowable.buffer(interval, TimeUnit) operator does. The operator should not consume its source/upstream in an unbounded mode either
Do this with composing existing/implemented operators.
Why would anyone want it?:
The need for such operator came when I wanted to implement a buffering on an infinite/long running stream. I wanted to buffer for efficiency, but the standard Flowable.buffer(n) is not suitable here since an "infinite" stream can emit k < n elements and then not emit items for a long time. Those k elements are trapped in buffer(n). So adding timeout would do the job, but in rxJava2 the buffer operator with timeout doesn't honor backpressure and buffering/dropping or any other built in strategy is not good enough.
The solution outline:
The solution is based on generateAsync and partialCollect operators, both implemented in https://github.com/akarnokd/RxJava2Extensions project. The rest is starndard RxJava2.
First wrap all the values from upstream in a container class C
Then merge that stream with a stream which source is using generateAsync. That stream uses switchMap to emit instancesof C that are effectively timeout signals.
The two merged streams are flowing into partialCollect that holds a reference to an "API" object to emit items into the generateAsync upstream. This is a sort of feedback loop that goes from paritialCollect via the "API" object to generateAsync that feeds back to partialCollect. In this way partialCollect can upon receiving the first element in a buffer emit a signal that will effectively start a timeout. If the buffer doesn't fill before the timeout, it will cause an instance of empty C (not containing any value) flowing back into partialCollect. It will detect it as a timeout signal and release the aggregated buffer downstream. If the buffer is released because of reaching its maximum size, it will be released and the next item will kick off another timeout. Any timeout signal (an instance of empty C) arriving late, aka after the buffer has been released because of reaching maximum size will be ignored. It is possible, because it's the partialCollect that instantiate and sends out the timeout signal item that will potentially flow back to it. Checking the identity of that item allows to detect a late vs legitimate timeout signal.
The code:
https://gist.github.com/artur-jablonski/5eb2bb470868d9eeeb3c9ee247110d4a
Observable and Flowable interfaces seem to be identical. Why Flowable was introduced in RxJava 2.0? When should I prefer to use Flowable over Observable?
As stated in the documentation:
A small regret about introducing backpressure in RxJava 0.x is that
instead of having a separate base reactive class, the Observable
itself was retrofitted. The main issue with backpressure is that many
hot sources, such as UI events, can't be reasonably backpressured and
cause unexpected MissingBackpressureException (i.e., beginners don't
expect them).
We try to remedy this situation in 2.x by having
io.reactivex.Observable non-backpressured and the new
io.reactivex.Flowable be the backpressure-enabled base reactive class.
Use Observable when you have relatively few items over time (<1000) and/or there's no risk of producer overflooding consumers and thus causing OOM.
Use Flowable when you have relatively large amount of items and you need to carefully control how Producer behaves in order to to avoid resource exhaustion and/or congestion.
Backpressure
When you have an observable which emits items so fast that consumer can’t keep up with the flow leading to the existence of emitted but unconsumed items.
How unconsumed items, which are emitted by observables but not consumed by subscribers, are managed and controlled is what backpressure strategy deals with.
Ref link
I'm trying to familiarise myself with the problem of reactive backpressure handling, specifically by reading this wiki: https://github.com/ReactiveX/RxJava/wiki/Backpressure
In the buffer paragraph, we have this more involved example code:
// we have to multicast the original bursty Observable so we can use it
// both as our source and as the source for our buffer closing selector:
Observable<Integer> burstyMulticast = bursty.publish().refCount();
// burstyDebounced will be our buffer closing selector:
Observable<Integer> burstyDebounced = burstMulticast.debounce(10, TimeUnit.MILLISECONDS);
// and this, finally, is the Observable of buffers we're interested in:
Observable<List<Integer>> burstyBuffered = burstyMulticast.buffer(burstyDebounced);
If I understand correctly, we're effectively debouncing the bursty source stream by generating a debounced signal stream for the buffer operator.
But why do we need to use the publish and refcount operators here? What problem would it cause if we'd just drop them? The comments don't make it much clearer for me, aren't RxJava Observables up to multicasting by default?
The answer lies in the difference between hot and cold observables.
Buffer operator combines the 2 streams and has no way to know they have a common source (in your case). When activated (subscribed to), it'll subscribe to them both, which in return will trigger 2 distinct subscriptions to your original input.
Now 2 things can happen, either the input is a hot observable, and the subscription has no effect but to register the listener, and everything will work as expected, or it's a cold observable, and each subscription will result in potentially distinct and desynchronized streams.
For instance a cold observable can be one which performs a network request when subscribed, and notified the result. Not calling publish on it means 2 requests will be done.
Publish+refcount/connect is the usual way to transform a cold observable into a hot one, making sure a single subscribe will happen, and all streams will behave identically.