Best implementation of REAL backpressure in RxJava

Well, backpressure in RxJava is not real backpressure, but only ignoring some sets of elements.
But what if I cannot lose any elements and I need to slow the emission somehow?
RxJava cannot affect element emission, so the developer needs to implement it himself. But how?
The simplest way that comes to mind is to use a counter, incremented on emission and decremented on completion.
Like this:
public static void sleep(int ms) {
    try {
        Thread.sleep(ms);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}

public static void main(String[] args) throws InterruptedException {
    AtomicInteger counter = new AtomicInteger();
    Scheduler sA = Schedulers.from(Executors.newFixedThreadPool(1));
    Scheduler sB = Schedulers.from(Executors.newFixedThreadPool(5));

    Observable.create(s -> {
        while (!s.isUnsubscribed()) {
            if (counter.get() < 100) {
                s.onNext(Math.random());
                counter.incrementAndGet();
            } else {
                sleep(100);
            }
        }
    }).subscribeOn(sA)
      .flatMap(r ->
          Observable.just(r)
              .subscribeOn(sB)
              .doOnNext(x -> sleep(1000))
              .doOnNext(x -> counter.decrementAndGet())
      )
      .subscribe();
}
But I think this approach is very poor. Are there any better solutions?

Well, backpressure in RxJava is not real backpressure
RxJava's backpressure implementation is a non-blocking cooperation between subsequent producers and consumers through a request channel. The consumer asks for some amount of elements via request(), and the producer creates/generates/emits at most that amount of items via onNext, sometimes with delays between the onNext calls.
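To see that request channel in action, here is a minimal sketch in RxJava 1.x terms (the sleep helper from the question stands in for slow processing): the Subscriber requests the next element only when it has finished the previous one, so the producer can never run ahead.
Observable.range(1, 1000)
    .subscribe(new Subscriber<Integer>() {
        @Override
        public void onStart() {
            request(1); // ask for the first element only
        }

        @Override
        public void onNext(Integer value) {
            sleep(100); // simulate slow consumption
            request(1); // ready: ask for exactly one more
        }

        @Override
        public void onError(Throwable e) {
            e.printStackTrace();
        }

        @Override
        public void onCompleted() {
            System.out.println("Done");
        }
    });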
but only ignoring some sets of elements.
This happens only when you explicitly tell RxJava to drop any overflow.
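For instance, onBackpressureDrop is such an explicit opt-in; without it (or a sibling like onBackpressureBuffer or onBackpressureLatest), a non-backpressured source fails with MissingBackpressureException instead of silently losing data. A minimal sketch:
Observable.interval(1, TimeUnit.MILLISECONDS) // fast source, no backpressure support
    .onBackpressureDrop()                     // overflow is discarded only because we asked for it
    .observeOn(Schedulers.computation())      // bounded buffer in front of a slow consumer
    .subscribe(v -> sleep(100));              // sleep helper from the question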
RxJava cannot affect element emission, so the developer needs to implement it himself. But how?
Using Observable.create requires advanced knowledge of how non-blocking backpressure can be implemented, and it is practically not recommended to library users. RxJava has plenty of ways to give you backpressure-enabled flows without complications:
Observable.range(1, 100)
    .map(v -> Math.random())
    .subscribeOn(sA)
    .flatMap(v ->
        Observable.just(v).subscribeOn(sB)
            .doOnNext(x -> sleep(1000))
    )
    .subscribe();
or
Observable.create(SyncOnSubscribe.createStateless(
    o -> o.onNext(Math.random())
))
.subscribeOn(sA)
...
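For completeness, a fuller version of that SyncOnSubscribe variant might look like this (a sketch; SyncOnSubscribe invokes the lambda exactly once per requested element, which is what makes it backpressure-aware):
SyncOnSubscribe<Void, Double> source = SyncOnSubscribe.createStateless(
    o -> o.onNext(Math.random()) // called once per requested element
);

Observable.create(source)
    .subscribeOn(sA)
    .flatMap(v -> Observable.just(v)
        .subscribeOn(sB)
        .doOnNext(x -> sleep(1000)))
    .subscribe();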

As you noted yourself, this actually has nothing to do with RxJava.
If you must process all events eventually, but you want to do that at your own pace, use queues:
ExecutorService emitter = Executors.newSingleThreadExecutor();
ScheduledExecutorService workers = Executors.newScheduledThreadPool(4);
BlockingQueue<String> events = new LinkedBlockingQueue<>();

emitter.submit(() -> {
    System.out.println("I'll send 100 events as fast as I can");
    for (int i = 0; i < 100; i++) {
        try {
            events.put(UUID.randomUUID().toString());
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
});

workers.scheduleWithFixedDelay(
    () -> {
        String result = null;
        try {
            result = events.take();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.out.println(String.format("I don't care, got %s only now", result));
    }, 0, 1, TimeUnit.SECONDS
);

Related

Project Reactor: buffer with parallel execution

I need to copy data from one source (in parallel) to another in batches.
I did this:
Flux.generate((SynchronousSink<String> sink) -> {
    try {
        String val = dataSource.getNextItem();
        if (val == null) {
            sink.complete();
            return;
        }
        sink.next(val);
    } catch (InterruptedException e) {
        sink.error(e);
    }
})
.parallel(4)
.runOn(Schedulers.parallel())
.doOnNext(dataTarget::write)
.sequential()
.blockLast();

class dataSource {
    public Item getNextItem() {
        //...
    }
}

class dataTarget {
    public void write(List<Item> items) {
        //...
    }
}
It receives data in parallel, but writes one item at a time.
I need to collect items in batches (say, of 10 items) and then write the whole batch.
How can I do that?
UPDATE:
The main idea is that the source is a messaging system (e.g. RabbitMQ or NATS) that is suited to efficiently sending messages one by one, while the target is a database which is more efficient at inserting a batch.
So the final result should be like this: I receive messages in parallel until the buffer fills up, then I write the whole buffer into the database in one shot.
It's easy to do in regular Java, but in the case of streams I don't get how to do it: how to buffer the data, and how to pause the reader until the writer is ready for the next part.
All you need is Flux#buffer(int maxSize) operator:
Flux.generate((SynchronousSink<String> sink) -> {
    try {
        String val = dataSource.getNextItem();
        if (val == null) {
            sink.complete();
            return;
        }
        sink.next(val);
    } catch (InterruptedException e) {
        sink.error(e);
    }
})
.buffer(10) // Flux<List<String>>
.flatMap(dataTarget::write)
.blockLast();

class DataTarget {
    public Mono<Void> write(List<String> items) {
        return reactiveDbClient.insert(items);
    }
}
Here, buffer collects items into multiple Lists of 10 items (batches). You do not need to use the parallel scheduler: flatMap will run these inner operations asynchronously. See Understanding Reactive's .flatMap() Operator.
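If the database can only tolerate a few simultaneous batch inserts, note that flatMap also has an overload with a concurrency argument; a sketch (the limit of 4 is arbitrary, Flux.range stands in for the generate source, and dataTarget.write is the Mono-returning method above):
Flux.range(1, 100)
    .map(String::valueOf)
    .buffer(10)                                   // Flux<List<String>>
    .flatMap(batch -> dataTarget.write(batch), 4) // at most 4 in-flight batch inserts
    .blockLast();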
You need to do your heavy work in individual Publishers, which will be materialized in flatMap() in parallel. Like this:
Flux.generate((SynchronousSink<String> sink) -> {
    try {
        String val = dataSource.getNextItem();
        if (val == null) {
            sink.complete();
            return;
        }
        sink.next(val);
    } catch (InterruptedException e) {
        sink.error(e);
    }
})
.parallel(4)
.runOn(Schedulers.parallel())
.flatMap(item -> Mono.fromRunnable(() -> dataTarget.write(item))) // fromRunnable, since write returns void
.sequential()
.blockLast();
The best approach (from an algorithmic point of view) is to use a ring buffer and the microbatching technique. Writes to the ring buffer are done from RabbitMQ, one by one (or multiple in parallel). The reading thread (a single one) takes all the messages present at the start of a batch at once, inserts them into the database, and repeats. "All at once" means a single message (if there is only one) or a bunch of them (if they accumulated while the last insert took long enough).
This technique is also used in JDBC batching (if I remember correctly) and can be implemented easily using the LMAX Disruptor library in Java.
A sample project (using Reactor's Flux and System.out.println) can be found at https://github.com/luvarqpp/reactorBatch
Core code:
final Random rnd = new Random(); // needed for the simulated insert delay below

final Flux<String> stringFlux = Flux.interval(Duration.ofMillis(1)).map(x -> "Msg number " + x);
final Flux<List<String>> stringFluxMicrobatched = stringFlux
    .bufferTimeout(100, Duration.ofNanos(1));

stringFluxMicrobatched.subscribe(strings -> {
    // Batch insert into DB
    System.out.print("Inserting in batch " + strings.size() + " strings.");
    try {
        // Inserting into db is simulated by 10 to 40 ms sleep here...
        Thread.sleep(rnd.nextInt(30) + 10);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    System.out.println(" ... Done");
});
Please feel welcome to edit and improve this post with the name of the technique and references. This is a community wiki...

How to check when all CompletableFuture are done?

I have a Stream<Item> which I'm mapping to a CompletableFuture<ItemResult>.
What I'd like to do is to know when all the futures are completed.
One may suggest to:
collect all the futures to an array and use CompletableFuture.allOf(). This is somewhat problematic since there could be hundreds of thousands of items
just continue with forEach(CompletableFuture::join). This is problematic too, as calling forEach with join will just block the stream and it will essentially be serial processing, not concurrent
inject a poisoned item at the end of the stream. This could work but it's not that elegant in my view
check if the executor queue is empty - this is quite limiting because I might use more than one executor in the future. Also, the queue can be momentarily empty
monitor the database instead and check the number of new items
I feel like all the suggested solutions aren't good enough.
What is the appropriate way to monitor the futures?
Thanks
EDIT:
another (vague) idea I had in mind is to use a counter and wait for it to go down to zero. But again, I'd need to check that it's not just momentarily 0...
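For what it's worth, that momentary-zero problem can be avoided by seeding the counter with one extra "still submitting" unit that is released only after the whole stream has been consumed; a sketch, where items is the original Stream<Item> and processAsync is a placeholder for the real Item-to-future mapping:
AtomicLong pending = new AtomicLong(1); // the extra 1 guards against a momentary zero
CompletableFuture<Void> allDone = new CompletableFuture<>();
Runnable arrive = () -> {
    if (pending.decrementAndGet() == 0) {
        allDone.complete(null);
    }
};

items.forEach(item -> {
    pending.incrementAndGet();
    processAsync(item).whenComplete((result, error) -> arrive.run());
});
arrive.run();   // release the "still submitting" guard
allDone.join(); // blocks until every future has completed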
Disclaimer: I'm not sure whether Phaser is the right tool here, and if yes, whether it's better to have one root with multiple children or to chain them like I'm proposing below, so feel free to correct me.
Here's one approach that uses Phaser.
A Phaser has a limited number of parties (65535), so we need to create a new child Phaser if that limit is about to be reached:
private Phaser register(Phaser phaser) {
    if (phaser.getRegisteredParties() < 65534) {
        // warning: side-effect,
        // conflicts with AtomicReference#updateAndGet recommendation,
        // might not fit well if the Stream is parallel:
        phaser.register();
        return phaser;
    } else {
        return new Phaser(phaser, 1);
    }
}
Register each CompletableFuture against that Phaser chain, and deregister once done:
private void register(CompletableFuture<?> future, AtomicReference<Phaser> phaser) {
    Phaser registeredPhaser = phaser.updateAndGet(this::register);
    future
        .thenRun(registeredPhaser::arriveAndDeregister)
        .exceptionally(e -> {
            // log e?
            registeredPhaser.arriveAndDeregister();
            return null;
        });
}
Wait for all futures to be finished:
private <T> void await(Stream<CompletableFuture<T>> futures) {
    Phaser rootPhaser = new Phaser(1);
    AtomicReference<Phaser> phaser = new AtomicReference<>(rootPhaser);
    futures.forEach(future -> register(future, phaser));
    rootPhaser.arriveAndAwaitAdvance();
    rootPhaser.arriveAndDeregister();
}
Example:
ExecutorService executor = Executors.newFixedThreadPool(500);

// creating fake stream with 500,000 futures:
Stream<CompletableFuture<Integer>> stream = IntStream
    .rangeClosed(1, 500_000)
    .mapToObj(i -> CompletableFuture.supplyAsync(() -> {
        try {
            TimeUnit.MILLISECONDS.sleep(10);
            if (i % 50_000 == 0) {
                System.out.println(Thread.currentThread().getName() + ": " + i);
            }
            return i;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }, executor));

// usage:
await(stream);
System.out.println("Done");
Outputs:
pool-1-thread-348: 50000
pool-1-thread-395: 100000
pool-1-thread-333: 150000
pool-1-thread-30: 200000
pool-1-thread-120: 250000
pool-1-thread-10: 300000
pool-1-thread-241: 350000
pool-1-thread-340: 400000
pool-1-thread-283: 450000
pool-1-thread-176: 500000
Done

Force a wait or sleep during stream

Let's say I have:
list.stream()
    .map(someService::someRateLimitedApiCall) // does not implement Runnable
    .filter(Optional::isPresent)
    .map(Optional::get)
    .sleep(1000) // is something like this possible?
    .min...;
The API service only allows a limited number of transactions per second, and I am seeking to introduce a delay between calls.
If not, is there a way to add an executor with a fixed delay within the iteration of the stream?
(To be clear, I am not violating the terms of the external API and will not abuse the service.)
Rather than use peek, why not just put the delay in the map operation which calls the API?
.map(e -> {
    try {
        Thread.sleep(1000);
    } catch (InterruptedException ex) {
        return Optional.empty();
    }
    return someRateLimitedApiCall(e);
})
The simple solution (without parallel streams) was to use peek, as multiple commenters suggested. Since it requires a Consumer:
.peek(i -> {
    try {
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
})

RxJava OutOfMemory

EDITED: see this question which is more clear and precise:
RxJava flatMap and backpressure strange behavior
I'm currently writing a data synchronization job with RxJava, and I'm quite a novice with reactive programming and especially the RxJava library.
My job is quite simple: I have a list of element IDs, I call a webservice to get each element by ID, do some processing, and make multiple calls to push data to the DB.
I load the data from the WS with 1 io thread and push the data to the DB with multiple io threads.
However, I always end up with an OutOfMemory error.
I thought at first that loading the data from the WS was faster than storing it in the DB.
But as both the WS call and the DB call are synchronous, shouldn't they exert backpressure on each other?
Thank you for your help.
My code pretty much look like this:
@Test
public void test() {
    int MAX_CONCURRENT_LOAD = 1;
    int MAX_CONCURRENT_STORE = 2;

    List<Integer> ids = IntStream.range(0, 10000).boxed().collect(Collectors.toList());

    Observable.from(ids)
        .flatMap(this::produce, MAX_CONCURRENT_LOAD)
        .flatMap(this::consume, MAX_CONCURRENT_STORE)
        .toBlocking().forEach(s -> System.out.println("Value " + s));

    System.out.println("Finished");
}

private Observable<Integer> produce(final int value) {
    return Observable.<Integer>create(s -> {
        try {
            if (!s.isUnsubscribed()) {
                Thread.sleep(500); // Here I call WS to retrieve data
                s.onNext(value);
                s.onCompleted();
            }
        } catch (Exception e) {
            s.onError(e);
        }
    }).subscribeOn(Schedulers.io());
}

private Observable<Boolean> consume(Integer value) {
    return Observable.<Boolean>create(s -> {
        try {
            if (!s.isUnsubscribed()) {
                Thread.sleep(10000); // Here I call DB to store data
                s.onNext(true);
                s.onCompleted();
            }
        } catch (Exception e) {
            s.onNext(false);
            s.onCompleted();
        }
    }).subscribeOn(Schedulers.io());
}
It seems your WS is poll-based, so if you use fromCallable instead of your custom Observable, you get proper backpressure:
return Observable.<Integer>fromCallable(() -> {
    Thread.sleep(500); // Here I call WS to retrieve data
    return value;
}).subscribeOn(Schedulers.io());
Otherwise, if you have blocking WS and blocking database, you can use them to backpressure each other:
ids.map(id -> db.store(ws.get(id)))
    .subscribeOn(Schedulers.io())
    .toBlocking().subscribe(...)
and potentially leave off subscribeOn and toBlocking as well.
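Spelled out with a concrete consumer, that blocking variant might look like this (a sketch; ws.get and db.store stand for the blocking webservice and database calls from the question, and db.store is assumed to return the stored value):
Observable.from(ids)
    .map(id -> db.store(ws.get(id))) // each blocking call naturally throttles the next
    .subscribeOn(Schedulers.io())    // keep the blocking work off the caller thread
    .toBlocking()
    .subscribe(stored -> System.out.println("Stored " + stored));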

Stream generate at fixed rate

I'm using Stream.generate to get data from Instagram. As Instagram limits calls per hour, I want generate to run less frequently than every 2 seconds.
I've chosen this title because I moved from ScheduledExecutorService.scheduleAtFixedRate and that's what I was searching for. I do realise that stream intermediate operations are lazy and cannot be called on a schedule. If you have a better idea for the title, let me know.
So, again: I want to have at least a 2 second delay between generations.
My attempt, which doesn't take into consideration the time consumed by the operations after generate (which might take longer than 2s):
Stream.generate(() -> {
    List<MediaFeedData> feedDataList = null;
    while (feedDataList == null) {
        try {
            Thread.sleep(2000);
            feedDataList = newData();
        } catch (InstagramException e) {
            notifyError(e.getMessage());
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
    return feedDataList;
})
A solution would be to decouple the generator from the Stream, for example using a BlockingQueue
final BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(100);

ScheduledExecutorService scheduler = new ScheduledThreadPoolExecutor(1);
scheduler.scheduleAtFixedRate(() -> {
    // Generate new data every 2s, regardless of their processing rate
    ThreadLocalRandom random = ThreadLocalRandom.current();
    queue.offer(random.nextInt(10));
}, 0, 2, TimeUnit.SECONDS);

Stream.generate(() -> {
    try {
        // Accept new data if ready, or wait for some more to be generated
        return queue.take();
    } catch (InterruptedException e) {}
    return -1;
}).forEach(System.out::println);
If the data processing takes more than 2s, new data will be enqueued and wait to be consumed. If it takes less than 2s, the take method in the generator will wait for new data to be produced by the scheduler.
This way, you are guaranteed to make fewer than N calls per hour to Instagram!
As far as I understand, your question is about solving two problems:
waiting at a fixed rate rather than a fixed delay
creating a stream for an unknown number of items which allows processing until some point of time (i.e. is not infinite)
You can solve the first task by using a deadline-based waiting and the second by implementing a Spliterator:
Stream<List<MediaFeedData>> stream = StreamSupport.stream(
    new Spliterators.AbstractSpliterator<List<MediaFeedData>>(Long.MAX_VALUE, 0) {
        long lastTime = System.currentTimeMillis();

        @Override
        public boolean tryAdvance(Consumer<? super List<MediaFeedData>> action) {
            if (quitCondition()) return false;
            List<MediaFeedData> feedDataList = null;
            while (feedDataList == null) {
                lastTime += TimeUnit.SECONDS.toMillis(2);
                while (System.currentTimeMillis() < lastTime)
                    LockSupport.parkUntil(lastTime);
                try {
                    feedDataList = newData();
                } catch (InstagramException e) {
                    notifyError(e.getMessage());
                    if (QUIT_ON_EXCEPTION) return false;
                }
            }
            action.accept(feedDataList);
            return true;
        }
    }, false);
Make a Timer and a semaphore. The timer raises the semaphore every 2 seconds, and in the stream you wait on every call for the semaphore.
This keeps the waits to the specified minimum (2 s), and - funnily - would even work with .parallel().
private final Semaphore tickingSemaphore = new Semaphore(1, true);
In its own thread:
Stream.generate(() -> {
    tickingSemaphore.acquireUninterruptibly(); // plain acquire() would force handling InterruptedException here
    ...
});
In the timer:
tickingSemaphore.release();
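Assembled into one runnable sketch (using a ScheduledExecutorService in place of a raw Timer, with fetchData as a placeholder for the Instagram call):
import java.util.concurrent.*;
import java.util.stream.Stream;

public class RateLimitedGenerate {
    private static final Semaphore tickingSemaphore = new Semaphore(1, true);

    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(tickingSemaphore::release, 2, 2, TimeUnit.SECONDS);

        Stream.generate(() -> {
            tickingSemaphore.acquireUninterruptibly(); // wait for the next tick
            return fetchData();                        // placeholder for newData()
        }).limit(5).forEach(System.out::println);

        timer.shutdownNow();
    }

    private static String fetchData() {
        return "data @ " + System.currentTimeMillis();
    }
}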
