EDITED: see this question, which is clearer and more precise:
RxJava flatMap and backpressure strange behavior
I'm currently writing a data synchronization job with RxJava, and I'm quite new to reactive programming and especially to the RxJava library.
My job is quite simple: I have a list of element IDs, I call a web service to get each element by ID, do some processing, and make multiple calls to push the data to a DB.
I load the data from the WS with 1 io thread and push the data to the DB with multiple io threads.
However, I always end up with an OutOfMemory error.
At first I thought that loading the data from the WS was faster than storing it in the DB.
But since both the WS call and the DB call are synchronous, shouldn't they exert backpressure on each other?
Thank you for your help.
My code looks pretty much like this:
@Test
public void test() {
int MAX_CONCURRENT_LOAD = 1;
int MAX_CONCURRENT_STORE = 2;
List<Integer> ids = IntStream.range(0, 10000).boxed().collect(Collectors.toList());
Observable.from(ids)
.flatMap(this::produce, MAX_CONCURRENT_LOAD)
.flatMap(this::consume, MAX_CONCURRENT_STORE)
.toBlocking().forEach(s -> System.out.println("Value " + s));
System.out.println("Finished");
}
private Observable<Integer> produce(final int value) {
return Observable.<Integer>create(s -> {
try {
if (!s.isUnsubscribed()) {
Thread.sleep(500); //Here I call WS to retrieve data
s.onNext(value);
s.onCompleted();
}
} catch (Exception e) {
s.onError(e);
}
}).subscribeOn(Schedulers.io());
}
private Observable<Boolean> consume(Integer value) {
return Observable.<Boolean>create(s -> {
try {
if (!s.isUnsubscribed()) {
Thread.sleep(10000); //Here I call DB to store data
s.onNext(true);
s.onCompleted();
}
} catch (Exception e) {
s.onNext(false);
s.onCompleted();
}
}).subscribeOn(Schedulers.io());
}
It seems your WS is poll-based, so if you use fromCallable instead of your custom Observable, you get proper backpressure:
return Observable.<Integer>fromCallable(() -> {
Thread.sleep(500); //Here I call WS to retrieve data
return value;
}).subscribeOn(Schedulers.io());
Otherwise, if both the WS and the database calls are blocking, you can let them backpressure each other:
ids.map(id -> db.store(ws.get(id)))
.subscribeOn(Schedulers.io())
.toBlocking().subscribe(...)
and potentially leave off subscribeOn and toBlocking as well.
I need to copy data from one source (in parallel) to another in batches.
I did this:
Flux.generate((SynchronousSink<String> sink) -> {
try {
String val = dataSource.getNextItem();
if (val == null) {
sink.complete();
return;
}
sink.next(val);
} catch (InterruptedException e) {
sink.error(e);
}
})
.parallel(4)
.runOn(Schedulers.parallel())
.doOnNext(dataTarget::write)
.sequential()
.blockLast();
class dataSource{
public Item getNextItem(){
//...
}
}
class dataTarget{
public void write(List<Item> items){
//...
}
}
It receives data in parallel, but writes one item at a time.
I need to collect them in batches (say, 10 items) and then write each batch.
How can I do that?
UPDATE:
The main idea is that the source is a messaging system (e.g. RabbitMQ or NATS) that is suited to sending messages efficiently one by one, but the target is a database, which is more efficient at inserting a batch.
So the final result should be: I receive messages in parallel until the buffer is full, then I write the whole buffer to the database in one shot.
It's easy to do in regular Java, but with streams I don't see how to do it: how to buffer the data, and how to pause the reader until the writer is ready for the next part.
All you need is the Flux#buffer(int maxSize) operator:
Flux.generate((SynchronousSink<String> sink) -> {
try {
String val = dataSource.getNextItem();
if (val == null) {
sink.complete();
return;
}
sink.next(val);
} catch (InterruptedException e) {
sink.error(e);
}
})
.buffer(10) //Flux<List<String>>
.flatMap(dataTarget::write)
.blockLast();
class DataTarget{
public Mono<Void> write(List<String> items){
return reactiveDbClient.insert(items);
}
}
Here, buffer collects the items into Lists of up to 10 items (batches). You do not need the parallel scheduler: flatMap subscribes to each Mono returned by write, so with a reactive DB client the writes run asynchronously. See Understanding Reactive’s .flatMap() Operator.
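To make the batching semantics concrete, here is a minimal plain-Java sketch of the grouping that buffer(10) performs. It is not Reactor code: the class BufferSketch and its buffer helper are hypothetical names made up for illustration, and unlike Flux#buffer the helper collects everything eagerly instead of emitting each batch as soon as it fills.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class BufferSketch {

    /** Groups the source into lists of at most maxSize items, preserving order. */
    static <T> List<List<T>> buffer(Iterator<T> source, int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        List<T> current = new ArrayList<>(maxSize);
        while (source.hasNext()) {
            current.add(source.next());
            if (current.size() == maxSize) {   // batch is full: hand it over, start a new one
                batches.add(current);
                current = new ArrayList<>(maxSize);
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);              // trailing partial batch
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            items.add("item-" + i);
        }
        for (List<String> batch : buffer(items.iterator(), 10)) {
            System.out.println("Writing batch of " + batch.size());
        }
    }
}
```

With 25 items and a batch size of 10, the last batch holds the 5 leftover items, which is also what you get from buffer when the source completes.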
You need to do your heavy work in individual Publishers, which flatMap() will subscribe to in parallel. Like this:
Flux.generate((SynchronousSink<String> sink) -> {
try {
String val = dataSource.getNextItem();
if (val == null) {
sink.complete();
return;
}
sink.next(val);
} catch (InterruptedException e) {
sink.error(e);
}
})
.parallel(4)
.runOn(Schedulers.parallel())
.flatMap(item -> Mono.fromCallable(() -> dataTarget.write(item)))
.sequential()
.blockLast();
The best approach (from an algorithmic point of view) is to use a ring buffer with a micro-batching technique. Writes to the ring buffer come from RabbitMQ, one by one (or several in parallel). A single reading thread takes all messages that are present when a batch starts, inserts them into the database, and repeats. "All at once" means a single message if only one is waiting, or a bunch of them if they accumulated while the previous insert was running.
This technique is also used in JDBC (if I remember correctly) and can be implemented easily in Java with the LMAX Disruptor library.
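For readers without Disruptor at hand, the micro-batching loop can be sketched with a plain BlockingQueue standing in for the ring buffer. This is an assumption-laden illustration, not the Disruptor API: MicroBatchSketch, nextBatch and the maxBatch cap are hypothetical names, and the cap is an extra safety limit that the description above does not require.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class MicroBatchSketch {

    /**
     * Blocks until at least one message is available, then also takes whatever
     * else has accumulated in the meantime (at most maxBatch messages in total).
     */
    static List<String> nextBatch(BlockingQueue<String> queue, int maxBatch) {
        List<String> batch = new ArrayList<>();
        try {
            batch.add(queue.take());            // wait for the first message
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag, return an empty batch
            return batch;
        }
        queue.drainTo(batch, maxBatch - 1);     // non-blocking: grab what piled up
        return batch;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 25; i++) {
            queue.add("Msg number " + i);
        }
        // Single reader: each iteration is one batch insert.
        while (!queue.isEmpty()) {
            List<String> batch = nextBatch(queue, 10);
            System.out.println("Inserting in batch " + batch.size() + " strings.");
        }
    }
}
```

The batch size adapts by itself: if the insert is fast, batches stay small; if it is slow, more messages accumulate and the next batch is larger.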
A sample project (using Reactor's Flux and System.out.println) can be found at https://github.com/luvarqpp/reactorBatch
Core code:
final Flux<String> stringFlux = Flux.interval(Duration.ofMillis(1)).map(x -> "Msg number " + x);
final Flux<List<String>> stringFluxMicrobatched = stringFlux
.bufferTimeout(100, Duration.ofNanos(1));
stringFluxMicrobatched.subscribe(strings -> {
// Batch insert into DB
System.out.print("Inserting in batch " + strings.size() + " strings.");
try {
// Inserting into db is simulated by 10 to 40 ms sleep here...
Thread.sleep(rnd.nextInt(30) + 10);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println(" ... Done");
});
Please feel welcome to edit and improve this post with name of technique and references. This is community wiki...
I'm trying to understand how to apply backpressure in Spring WebFlux. I understand the theory of backpressure, but I can't reproduce it, so I don't fully understand it.
Let's take the following example:
public void test() throws InterruptedException {
EmitterProcessor<String> processor = EmitterProcessor.create();
new Thread(() -> {
int i = 0;
while(runThread) {
try {
Thread.sleep(100);
} catch (InterruptedException ignored) {
}
processor.onNext("Value: " + i);
i++;
}
processor.onComplete();
}).start();
processor
.subscribe(makeSubscriber("FIRST - "), Throwable::printStackTrace);
}
private Consumer<String> makeSubscriber(String label) {
return v -> {
System.out.println(label + v);
try {
Thread.sleep(1000);
} catch (InterruptedException ignored) {
}
};
}
I have created a Hot Flux in the form of an EmitterProcessor and in a separate thread I start producing data for it.
A bit lower, I subscribe to it. The subscriber is slower than the rate at which elements are being produced, so the issues should start to occur, right?
But the subscriber logic is run on the producer thread. When I call processor.onNext(), it synchronously calls all the subscribers, so if the subscribers are slow, the publisher is slowed down as well. So, then backpressure doesn't even seem useful.
I have also tried making two Spring Boot WebFlux applications, one with a Flux endpoint and one that consumes the endpoint, so I can be certain the consumer runs on a separate thread. But then, any attempt I make at backpressure in the consumer does nothing. There is no buffer being filled, there is nothing being dropped or anything!
Can anyone give me a concrete example of backpressure? Preferably in Spring WebFlux but I'll take any reactive Java library.
The documentation for the variant of the subscribe method you have chosen reads:
The subscription will request an unbounded demand (Long.MAX_VALUE).
That is, you switched off backpressure yourself.
To use backpressure, subscribe with Flux.subscribe(Subscriber) and request items explicitly.
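As a rough illustration of bounded demand, here is a sketch that uses the JDK's java.util.concurrent.Flow interfaces (Java 9+), which mirror the Reactive Streams Subscriber/Subscription contract. It is not Reactor code, and BoundedDemandSketch and OneAtATime are made-up names. In Reactor itself the usual route is to extend BaseSubscriber and call request(n) from hookOnSubscribe and hookOnNext.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

class BoundedDemandSketch {

    /** Requests exactly one item at a time instead of Long.MAX_VALUE. */
    static class OneAtATime implements Flow.Subscriber<Integer> {
        final List<Integer> received = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);                 // initial demand: a single item
        }
        @Override public void onNext(Integer item) {
            received.add(item);           // slow work would go here
            subscription.request(1);      // ask for the next item only when ready
        }
        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    static List<Integer> run(int count) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        OneAtATime subscriber = new OneAtATime();
        try {
            // Small per-subscriber buffer so the producer feels the backpressure.
            try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>(executor, 4)) {
                publisher.subscribe(subscriber);
                for (int i = 0; i < count; i++) {
                    publisher.submit(i);  // blocks while the subscriber's buffer is full
                }
            }                             // close() signals onComplete after buffered items
            subscriber.done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdown();
        }
        return subscriber.received;
    }

    public static void main(String[] args) {
        System.out.println("Received " + run(20).size() + " items");
    }
}
```

Here the producer thread is the one that gets slowed down, because submit blocks once the small buffer is full and the subscriber has not requested more.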
I have a function that returns a Flowable which continually reads from a database and emits the results.
public Flowable<String> scan() {
Flowable<String> flow = Flowable.create((FlowableOnSubscribe<String>) emitter -> {
Result result = new Result();
while (!result.hasData()) {
result = request.query(skip, limit);
partialResult.getResult()
.getFeatures().forEach(feature -> emitter.onNext(feature));
}
}, BackpressureStrategy.BUFFER)
.subscribeOn(Schedulers.io());
return flow;
}
Then I have another object that can call this method.
myObj.scan()
.parallel()
.runOn(Schedulers.computation())
.map(feature -> {
//Heavy Computation
})
.sequential()
.blockingSubscribe(msg -> {
logger.debug("Successfully processed " + msg);
}, (e) -> {
logger.error("Failed to process features because of error with scan", e);
});
My heavy computation section could potentially take a very long time. So long in fact that there is a good chance that the database requests will load the whole database into memory before the consumer finishes the first couple entries.
I have read up on backpressure in RxJava, but the only 4 options essentially make me drop data or replace it with the latest value.
Is there a way to make the emit call (emitter.onNext(feature) in RxJava terms) block until there is more room in the Flowable?
I.e., I want to treat the Flowable as a blocking queue where a push sleeps while the queue is at capacity.
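To pin down the semantics being asked for, here is a plain-Java sketch of a bounded hand-off where the producer sleeps while the queue is full. It is not an RxJava solution, only an illustration of the desired behavior: BlockingHandoffSketch, process, and the use of an empty Optional as the end-of-data marker are all assumptions made for this example, and toUpperCase stands in for the heavy computation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BlockingHandoffSketch {

    static List<String> process(List<String> features, int capacity) {
        // At most `capacity` features are ever held in memory between reader and consumer.
        BlockingQueue<Optional<String>> queue = new ArrayBlockingQueue<>(capacity);

        Thread reader = new Thread(() -> {
            try {
                for (String feature : features) {
                    queue.put(Optional.of(feature)); // sleeps here while the queue is full
                }
                queue.put(Optional.empty());         // end-of-data marker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.setDaemon(true);
        reader.start();

        List<String> processed = new ArrayList<>();
        try {
            Optional<String> next;
            while ((next = queue.take()).isPresent()) {
                processed.add(next.get().toUpperCase()); // stand-in for the heavy computation
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("a", "b", "c"), 1));
    }
}
```

The capacity argument plays the role of the "room in the Flowable": the reader cannot run ahead of the consumer by more than that many items.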
I am just learning and trying to apply CompletableFuture to my problem statement. I have a list of items I am iterating over.
Prop is a class with only two attributes prop1 and prop2, respective getters and setters.
List<Prop> result = new ArrayList<>();
for ( Item item : items ) {
item.load();
Prop temp = new Prop();
// once the item is loaded, get its properties
temp.setProp1(item.getProp1());
temp.setProp2(item.getProp2());
result.add(temp);
}
return result;
However, item.load() here is a blocking call, so I was thinking of using CompletableFuture, something like below:
for (Item item : items) {
CompletableFuture<Prop> prop = CompletableFuture.supplyAsync(() -> {
try {
item.load();
return item;
} catch (Exception e) {
logger.error("Error");
return null;
}
}).thenApply(item1 -> {
try {
Prop temp = new Prop();
// once the item is loaded, get its properties
temp.setProp1(item.getProp1());
temp.setProp2(item.getProp2());
return temp;
} catch (Exception e) {
logger.error("Error");
return null; // the lambda must return a value on every path
}
});
}
But I am not sure how I can wait for all the items to be loaded and then aggregate and return their result.
I may be completely wrong in the way of implementing CompletableFutures since this is my first attempt. Please pardon any mistake. Thanks in advance for any help.
There are two issues with your approach of using CompletableFuture.
First, you say item.load() is a blocking call, so the CompletableFuture’s default executor is not suitable for it, as it tries to achieve a level of parallelism matching the number of CPU cores. You could solve this by passing a different Executor to CompletableFuture’s asynchronous methods, but your load() method doesn’t return a value that your subsequent operations rely on. So the use of CompletableFuture complicates the design without a benefit.
You can perform the load() invocations asynchronously and wait for their completion just using an ExecutorService, followed by the loop as-is (without the already performed load() operation, of course):
ExecutorService es = Executors.newCachedThreadPool();
es.invokeAll(items.stream()
.map(i -> Executors.callable(i::load))
.collect(Collectors.toList()));
es.shutdown();
List<Prop> result = new ArrayList<>();
for(Item item : items) {
Prop temp = new Prop();
// once the item is loaded, get its properties
temp.setProp1(item.getProp1());
temp.setProp2(item.getProp2());
result.add(temp);
}
return result;
You can control the level of parallelism through the choice of executor, e.g. you could use Executors.newFixedThreadPool(numberOfThreads) instead of the unbounded thread pool.
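If you still prefer CompletableFuture, the waiting and aggregating part could look like the sketch below, with an explicit executor as mentioned above. The Item and Prop classes here are hypothetical stand-ins for the ones in the question (load() just fills in two fields), and LoadAllSketch and loadAll are made-up names. Note that join() rethrows a failed load() wrapped in a CompletionException.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class LoadAllSketch {

    // Hypothetical stand-in for the question's Item; load() pretends to block.
    static class Item {
        private final String name;
        private String prop1;
        private String prop2;
        Item(String name) { this.name = name; }
        void load() { prop1 = name + "-1"; prop2 = name + "-2"; }
        String getProp1() { return prop1; }
        String getProp2() { return prop2; }
    }

    // Hypothetical stand-in for the question's Prop.
    static class Prop {
        private String prop1;
        private String prop2;
        void setProp1(String v) { prop1 = v; }
        void setProp2(String v) { prop2 = v; }
        String getProp1() { return prop1; }
        String getProp2() { return prop2; }
    }

    static List<Prop> loadAll(List<Item> items, int threads) {
        ExecutorService es = Executors.newFixedThreadPool(threads);
        try {
            // First start every load; collecting here ensures all futures are submitted.
            List<CompletableFuture<Prop>> futures = items.stream()
                    .map(item -> CompletableFuture.supplyAsync(() -> {
                        item.load();
                        Prop temp = new Prop();
                        temp.setProp1(item.getProp1());
                        temp.setProp2(item.getProp2());
                        return temp;
                    }, es))
                    .collect(Collectors.toList());
            // Then wait for each one, keeping the original order.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            es.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Prop> result = loadAll(List.of(new Item("a"), new Item("b")), 2);
        System.out.println(result.size() + " items loaded");
    }
}
```

The two-step shape matters: joining inside the same stream that creates the futures would make the loads run one after another.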
Well, backpressure in RxJava is not real backpressure, but only ignoring some sets of elements.
But what if I cannot lose any elements and I need to slow emission down somehow?
RxJava cannot affect element emission, so the developer needs to implement it himself. But how?
The simplest way that comes to mind is to use a counter that is incremented on emission and decremented on completion.
Like this:
public static void sleep(int ms) {
try {
Thread.sleep(ms);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
public static void main(String[] args) throws InterruptedException {
AtomicInteger counter = new AtomicInteger();
Scheduler sA = Schedulers.from(Executors.newFixedThreadPool(1));
Scheduler sB = Schedulers.from(Executors.newFixedThreadPool(5));
Observable.create(s -> {
while (!s.isUnsubscribed()) {
if (counter.get() < 100) {
s.onNext(Math.random());
counter.incrementAndGet();
} else {
sleep(100);
}
}
}).subscribeOn(sA)
.flatMap(r ->
Observable.just(r)
.subscribeOn(sB)
.doOnNext(x -> sleep(1000))
.doOnNext(x -> counter.decrementAndGet())
)
.subscribe();
}
But I think this way is very poor. Are there any better solutions?
"Well, backpressure in RxJava is not real backpressure"
RxJava's backpressure implementation is a non-blocking cooperation between subsequent producers and consumers through a request channel. The consumer asks for some number of elements via request(), and the producer creates/generates/emits at most that many items via onNext, sometimes with delays between onNexts.
"but only ignoring some sets of elements."
This happens only when you explicitly tell RxJava to drop any overflow.
"RxJava cannot affect element emission, so developer needs to implement it by himself. But how?"
Using Observable.create requires advanced knowledge of how non-blocking backpressure can be implemented, and in practice it is not recommended for library users. RxJava has plenty of ways to give you backpressure-enabled flows without complications:
Observable.range(1, 100)
.map(v -> Math.random())
.subscribeOn(sA)
.flatMap(v ->
Observable.just(v).subscribeOn(sB)
.doOnNext(x -> sleep(1000))
)
.subscribe();
or
Observable.create(SyncOnSubscribe.createStateless(
o -> o.onNext(Math.random())
))
.subscribeOn(sA)
...
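To show the request channel from this answer in isolation, here is a small single-threaded sketch built on the JDK's java.util.concurrent.Flow interfaces (Java 9+), which follow the same protocol. It is not RxJava's implementation: RequestChannelSketch, RangePublisher and trace are hypothetical names, the publisher assumes count > 0, and overflow of the demand counter is ignored for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicLong;

class RequestChannelSketch {

    /** Emits 0..count-1, but never more items than the subscriber has requested. */
    static class RangePublisher implements Flow.Publisher<Integer> {
        private final int count;

        RangePublisher(int count) {
            if (count <= 0) throw new IllegalArgumentException("count must be positive");
            this.count = count;
        }

        @Override public void subscribe(Flow.Subscriber<? super Integer> subscriber) {
            subscriber.onSubscribe(new Flow.Subscription() {
                private final AtomicLong requested = new AtomicLong();
                private volatile boolean done;
                private int next;

                @Override public void request(long n) {
                    if (n <= 0) {
                        done = true;
                        subscriber.onError(new IllegalArgumentException("n must be positive"));
                        return;
                    }
                    // Only the call that raises demand from zero runs the emit loop;
                    // re-entrant calls from onNext just add to the outstanding demand.
                    if (requested.getAndAdd(n) != 0) return;
                    do {
                        if (done) return;
                        subscriber.onNext(next++);
                        if (next == count) {
                            done = true;
                            subscriber.onComplete();
                            return;
                        }
                    } while (requested.decrementAndGet() != 0);
                }

                @Override public void cancel() { done = true; }
            });
        }
    }

    /** Subscribes with a demand of `batch` items at a time and logs the exchange. */
    static List<String> trace(int count, int batch) {
        List<String> log = new ArrayList<>();
        new RangePublisher(count).subscribe(new Flow.Subscriber<Integer>() {
            private Flow.Subscription subscription;
            private int receivedInBatch;

            @Override public void onSubscribe(Flow.Subscription s) {
                subscription = s;
                log.add("request " + batch);
                s.request(batch);
            }
            @Override public void onNext(Integer item) {
                log.add("onNext " + item);
                if (++receivedInBatch == batch) {   // batch consumed: ask for the next one
                    receivedInBatch = 0;
                    log.add("request " + batch);
                    subscription.request(batch);
                }
            }
            @Override public void onError(Throwable t) { log.add("onError"); }
            @Override public void onComplete() { log.add("onComplete"); }
        });
        return log;
    }

    public static void main(String[] args) {
        trace(3, 2).forEach(System.out::println);
    }
}
```

Nothing is dropped and nothing blocks: the producer simply stops emitting when the outstanding demand reaches zero and resumes on the next request().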
As you noted yourself, this actually has nothing to do with RxJava.
If you must process all events eventually, but you want to do that at your own pace, use queues:
ExecutorService emiter = Executors.newSingleThreadExecutor();
ScheduledExecutorService workers = Executors.newScheduledThreadPool(4);
BlockingQueue<String> events = new LinkedBlockingQueue<>();
emiter.submit(() -> {
System.out.println("I'll send 100 events as fast as I can");
for (int i = 0; i < 100; i++) {
try {
events.put(UUID.randomUUID().toString());
} catch (InterruptedException e) {
e.printStackTrace();
}
}
});
workers.scheduleWithFixedDelay(
() -> {
String result = null;
try {
result = events.take();
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println(String.format("I don't care, got %s only now", result));
}, 0, 1, TimeUnit.SECONDS
);