I have a Flowable that we are returning in a function that will continually read from a database and add it to a Flowable.
public void scan() {
Flowable<String> flow = Flowable.create((FlowableOnSubscribe<String>) emitter -> {
Result result = new Result();
while (!result.hasData()) {
result = request.query(skip, limit);
partialResult.getResult()
.getFeatures().forEach(feature -> emmitter.emit(feature));
}
}, BackpressureStrategy.BUFFER)
.subscribeOn(Schedulers.io());
return flow;
}
Then I have another object that can call this method.
myObj.scan()
.parallel()
.runOn(Schedulers.computation())
.map(feature -> {
//Heavy Computation
})
.sequential()
.blockingSubscribe(msg -> {
logger.debug("Successfully processed " + msg);
}, (e) -> {
logger.error("Failed to process features because of error with scan", e);
});
My heavy computation section could potentially take a very long time. So long in fact that there is a good chance that the database requests will load the whole database into memory before the consumer finishes the first couple entries.
I have read up on backpressure with rxjava but the only 4 options essentially make me drop data or replace it with the last.
Is there a way to make it so that when I call emmitter.emit(feature) the call blocks until there is more room in the Flowable?
I.E I want to treat the Flowable as a blocking queue where push will sleep if the queue is past the capacity.
Related
In my project, there is a Spring scheduler periodically scans "TO BE DONE" tasks from DB, then distributing them to task consumer for subsequent handling. So, the current implementation is to construct a Reactor Sinks between producer and consumer.
Sinks.Many<Task> taskSink = Sinks.many().multicast().onBackpressureBuffer(1000, false);
Producer:
Flux<Date> dates = loadDates();
dates.filterWhen(...)
.concatMap(date -> taskManager.getTaskByDate(date))
.doOnNext(taskSink::tryEmitNext)
.subscribe();
Consumer:
taskProcessor.process(taskSink.asFlux())
.subscribeOn(Schedulers.boundedElastic())
.subscribe();
By using Sink, it works fine for most of cases. But when the system under heavy load, system maintainer would want to know:
How many tasks still sitting in the Sink?
If it is possible to clear all tasks within the Sink.
If it is possible to prioritize tasks within the Sink.
Unfortunately, Sink it's impossible to fulfill all the needs mentioned above.
So, I created a wrapper class that includes a Map and PriorityBlockingQueue. I refrerenced the implementation from this link https://stackoverflow.com/a/71009712/19278017.
After that, the original producer-consumer code revised as below:
Task queue:
MergingQueue<Task> taskQueue = new PriorityMergingQueue();
Producer:
Flux<Date> dates = loadDates();
dates.filterWhen(...)
.concatMap(date -> taskManager.getTaskByDate(date))
.doOnNext(taskQueue::enqueue)
.subscribe();
Consumer:
taskProcessor.process(Flux.create((sink) -> {
sink.onRequest(n -> {
Task task;
try {
while(!sink.isCancel() && n > 0) {
if(task = taskQueue.poll(1, TimeUnit.SECOND) != null) {
sink.next(task);
n--;
}
} catch() {
....
})
.subscribeOn(Schedulers.boundedElastic())
.subscribe();
I got some questions as below:
Will that be an issue the code doing a .poll()? Since, I came across thread hang issue during the longevity testing. Just not sure if it's due to the poll() call.
Is there any alternative solution in Reactor, which works like a PriorityBlockingQueue?
The goal of reactive programming is to avoid blocking operations. PriorityBlockingQueue.poll() will cause issues as it will block the thread waiting for the next element.
There is however an alternative solution in Reactor: the unicast version of Sinks.Many allows using an arbitrary Queue for buffering using Sinks.many().unicast().onBackPressureBuffer(Queue<T>). By using a PriorityQueue instanced outside of the Sink, you can fulfill all three requirements.
Here is a short demo where I emit a Task every 100ms:
public record Task(int prio) {}
private static void log(Object message) {
System.out.println(LocalTime.now(ZoneOffset.UTC).truncatedTo(ChronoUnit.MILLIS) + ": " + message);
}
public void externalBufferDemo() throws InterruptedException {
Queue<Task> taskQueue = new PriorityQueue<>(Comparator.comparingInt(Task::prio).reversed());
Sinks.Many<Task> taskSink = Sinks.many().unicast().onBackpressureBuffer(taskQueue);
taskSink.asFlux()
.delayElements(Duration.ofMillis(100))
.subscribe(task -> log(task));
for (int i = 0; i < 10; i++) {
taskSink.tryEmitNext(new Task(i));
}
// Show amount of tasks sitting in the Sink:
log("Nr of tasks in sink: " + taskQueue.size());
// Clear all tasks in the sink after 350ms:
Thread.sleep(350);
taskQueue.clear();
log("Nr of tasks after clear: " + taskQueue.size());
Thread.sleep(1500);
}
Output:
09:41:11.347: Nr of tasks in sink: 9
09:41:11.450: Task[prio=0]
09:41:11.577: Task[prio=9]
09:41:11.687: Task[prio=8]
09:41:11.705: Nr of tasks after clear: 0
09:41:11.799: Task[prio=7]
Note that delayElements has an internal queue of size 1, which is why Task 0 was picked up before Task 1 was emitted, and why Task 7 was picked up after the clear.
If multicast is required, you can transform your flux using one of the many operators enabling multicasting.
I need to copy date from one source (in parallel) to another with batches.
I did this:
Flux.generate((SynchronousSink<String> sink) -> {
try {
String val = dataSource.getNextItem();
if (val == null) {
sink.complete();
return;
}
sink.next(val);
} catch (InterruptedException e) {
sink.error(e);
}
})
.parallel(4)
.runOn(Schedulers.parallel())
.doOnNext(dataTarget::write)
.sequential()
.blockLast();
class dataSource{
public Item getNextItem(){
//...
}
}
class dataTarget{
public void write(List<Item> items){
//...
}
}
It receives data in parallel, but writes one at a time.
I need to collect them in batches (like by 10 items) and then write the batch.
How can I do that?
UPDATE:
The main idea that the source is the messaging system (i.e. rabbitmq or nats) that's suitable to efficiently send messages one by one, but the target is the database which is more efficient on inserting a batch.
So the final result should be like — I receive messages in parallel until buffer is not filled up, then I write all the buffer into database by one shot.
It's easy to do in regular java, but in case of streams — I don't get how to do it. How to buffer the data and how to pause the reader till the writer is not ready to get next part.
All you need is Flux#buffer(int maxSize) operator:
Flux.generate((SynchronousSink<String> sink) -> {
try {
String val = dataSource.getNextItem();
if (val == null) {
sink.complete();
return;
}
sink.next(val);
} catch (InterruptedException e) {
sink.error(e);
}
})
.buffer(10) //Flux<List<String>>
.flatMap(dataTarget::write)
.blockLast();
class DataTarget{
public Mono<Void> write(List<String> items){
return reactiveDbClient.insert(items);
}
}
Here, buffer collects items into multiple List's of 10 items(batches). You do not need to use parallel scheduler. The flatmap will run these operations asynchronously. See Understanding Reactive’s .flatMap() Operator.
You need to do your heavy work in individual Publisher-s which will be materialized in flatMap() in parallel. Like this
Flux.generate((SynchronousSink<String> sink) -> {
try {
String val = dataSource.getNextItem();
if (val == null) {
sink.complete();
return;
}
sink.next(val);
} catch (InterruptedException e) {
sink.error(e);
}
})
.parallel(4)
.runOn(Schedulers.parallel())
.flatMap(item -> Mono.fromCallable(() -> dataTarget.write(item)))
.sequential()
.blockLast();
Best approach (from algorithmic view) is to have ringbuffer and use microbatching technique. Writes to ringbuffer is done from rabbitmq, one-by-one (or multiple in parallel). Reading thread (single only) would get all messages at once (presented at a time of batch start), insert them into database and do it again... All at once means single message (if there is only one), or bunch of them (if they have been accumulated while duration of last insert was long enough to).
This technique is used also in jdbc (if I remember correctly) and can be implemented easily using lmax disruptor library in java.
Sample project (using ractor /Flux/ and System.out.println) can be found on https://github.com/luvarqpp/reactorBatch
Core code:
final Flux<String> stringFlux = Flux.interval(Duration.ofMillis(1)).map(x -> "Msg number " + x);
final Flux<List<String>> stringFluxMicrobatched = stringFlux
.bufferTimeout(100, Duration.ofNanos(1));
stringFluxMicrobatched.subscribe(strings -> {
// Batch insert into DB
System.out.print("Inserting in batch " + strings.size() + " strings.");
try {
// Inserting into db is simulated by 10 to 40 ms sleep here...
Thread.sleep(rnd.nextInt(30) + 10);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println(" ... Done");
});
Please feel welcome to edit and improve this post with name of technique and references. This is community wiki...
There is the following pipeline:
item is produced (the producer is external to the pipeline);
item is deserialized (JSON to Java object);
item is processed;
At the moment it all happens synchronously in a single thread:
while(producer.next()) {
var item = gson.deserialize(producer.item());
processItem(item);
}
Or schematically:
PRODUCER -> DESERIALIZATION -> CONSUMER
(sync) (sync) (sync)
The concern is that the deserialization step has no side-effects and could be parallelized saving some world time.
The overall code should like the following:
var pipeline = new Pipeline<Item>();
pipeline.setProducer(producer);
pipeline.setDeserialization(gson::deserialize);
pipeline.setConsumer(item -> {
...
});
pipeline.run();
Or schematically:
-> DESERIALIZATION
-> DESERIALIZATION
-> DESERIALIZATION
PRODUCER -> ... -> CONSUMER
-> DESERIALIZATION
-> DESERIALIZATION
-> DESERIALIZATION
(sync) (parallel) (sync)
Important notice. Deserialized items should be produced:
synchronously;
in the same order the original producer produces encoded items.
Q. Is there a standardized way to code such a pipeline?
Try
while(producer.next()) {
CompletableFuture.supplyAsync(()-> gson.deserialize(producer.item()))
.thenRunAsync(item->processItem(item));
}
One way you can achieve your pattern is to:
Construct a multi-threaded executor to process the decoding requests
Have a consumer queue; each time you submit an item to be decoded, also add the corresponding Future object to the consumer queue
Have a consumer thread sit waiting to take items off the queue [which therefore consumes them in the order they were posted], call the corresponding get() method [which waits for the item to be decoded]
So the 'consumer' would look like this:
BlockingQueue<Future<Item>> consumerQueue = new LinkedBlockingDeque<>();
Thread consumerThread = new Thread(() -> {
try {
while (true) {
Future<Item> item = consumerQueue.take();
try {
// Get the next decoded item that's ready
Item decodedItem = item.get();
// 'Consume' the item
...
} catch (ExecutionException ex) {
}
}
} catch (InterruptedException irr) {
}
});
consumerThread.start()
Meanwhile, the 'producer' end, with its multi-threaded 'decoder', would look like this:
ExecutorService decoder = Executors.newFixedThreadPool(4);
while (!producer.hasNext()) {
Item item = producer.next()
// Submit the decode job for asynchronous processing
Future<Item> p = decoder.submit(() -> {
item.decode();
}, item);
// Also queue this decode job for future consumption once complete
consumerQueue.add(p);
}
As a separate matter, I wonder if you will actually see much benefit in practice, since by insisting on consumption in the same order, you are inherently introducing a serial condition on the process. But technically, this is one way that you could achieve what you are after.
P.S. If you didn't want a separate consumer thread, then the same 'producer' thread could poll the queue for completed items and execute in line.
Good time of day everyone.
I wonder if it's possible to somehow emit element of a Flowable on a different thread than next ones.
For example I have a hot in-memory cache of database objects and I don't want to go to io thread to get elements from there.
Whant I want to do is basically:
if (cache.contains(e)) {
emiter.emit(cache.get(e));
} else {
Io.post(() -> emiter.emit(db.get(e)));
}
I need the same Flowable to use different threads.
I haven't found a way to do this so far.Is it possible?
Consider following method:
private Flowable<String> getDbOnlyIfNotCached(String key) {
if (cache.contains(key)) {
return Flowable.just(cache.get(key));
} else {
return Flowable.fromCallable(() -> db.get(key))
.subscribeOn(Schedulers.io());
}
}
If cache.contains(key) is true, everything will run in the calling thread. If the value is not cached, db.get(key) will be called using the io scheduler.
Update: Examples in Android
You can use above method like this:
getDbOnlyIfNotCached("hit")
.subscribe(s -> {
// If "hit" is cached, this will be executed in the current thread.
Log.d(TAG, Thread.currentThread().getName());
});
getDbOnlyIfNotCached("miss")
.subscribe(s -> {
// If "miss" is cached, this will be executed in another thread.
Log.d(TAG, Thread.currentThread().getName());
});
Or you can use it in a Flowable chain using flatMap.
Flowable.just("hello")
./* some other operators */
.flatMap(s -> getDbOnlyIfNotCached(s))
// If "hit" is cached, chain still runs in the current thread.
.subscribe(s -> {
Log.d(TAG, s + " " + Thread.currentThread().getName());
});
Flowable.just("miss")
./* some other operators */
.flatMap(s -> getDbOnlyIfNotCached(s))
// If "miss" is cached, chain switches to another thread.
.subscribe(s -> {
Log.d(TAG, Thread.currentThread().getName());
});
If you want to observe on the main thread, then specify observeOn at the end of the chain.
Flowable.just("miss")
./* some other operators */
.flatMap(s -> getDbOnlyIfNotCached(s))
// if "miss" is cached, chain switches to another thread.
.observeOn(AndroidSchedulers.mainThread())
// Now switched to the main thread.
.subscribe(s -> {
Log.d(TAG, Thread.currentThread().getName());
});
I am trying to set up an exponential back off via an Observable.timer if the network is down or if a given service is down. I have a retryWhen when there are errors.
I have two issue, I cannot get the timer to work, no matter the time set, it always runs immediately. From what I know in the docs it should run the delay then send a complete, but when I look at the logs, I see no delay.
Second is because of I wanted to get the value of the retry when it is returned I used subscribe to get it, however when Observable error is returned it throws an exception when I do the calculations. For the second issue, I plan to do a check on the type of Observable and action it depending on the type.
If I could get ideas on what I may be doing wrong that would be great
return Observable.zip(
locationObservable,
oAdapterService.getIssuerInformation(sponsorCode),
oAdapterService.getOfferInformation(sponsorCode, activity.getOfferCode()),
(LocationInfo a, IssuerInfo b, OfferInfo c) -> {
OAdapterUtil.setLocationInfo(activity, a);
OAdapterUtil.setIssuerInfo(activity, b);
OAdapterUtil.setOfferInfo(activity, c);
return activity;
})
.retryWhen(errors -> errors.zipWith(Observable.range(1, maxRetries), (error, retries) -> {
if (retries++ < maxRetries) {
log.debug("Issues with Service call for transaction ID {} with initiator ID {}, retry count {}"
,activity.getTransactionId(),activity.getInitiatorId() ,retries);
return Observable.just(retries);
}
log.error("Tried to call Service {} time(s) for for transaction ID {} with initiator ID {}, error is {} "
,maxRetries,activity.getTransactionId(),activity.getInitiatorId(),error);
return Observable.error(error);
}
).flatMap(x -> {
log.debug("X value in flat map is {}",x.toString());
x.subscribe(currentValue -> {
log.debug("X value in subscribe is with subscribe {}",currentValue.toString());
double retryCount = Double.parseDouble(currentValue.toString()) + 2.0 ;
log.debug("retry count {}",retryCount);
long exponentialBackOff =(long)Math.pow(2.0, retryCount);
log.debug("exp back off {}",exponentialBackOff);
// Observable.timer(exponentialBackOff, TimeUnit.SECONDS);
});
Observable.timer(10, TimeUnit.SECONDS);
return x;
// Observable.timer(backoffPeriod, TimeUnit.MILLISECONDS);
}
));
You have an orphan line of code:
Observable.timer(10, TimeUnit.SECONDS);
The only thing this line of code does is to create an observable. The result is discarded because nothing is done with it.
If you need to back off, then do:
return x.delay(10, TimeUnit.SECONDS);
inside of the flatMap() operator. Remove the x.subscriber(); any logging should be done before returning.