Reactor - Delay Flux elements in case of processing errors - java

I have a problem similar to this question, which has no accepted answer. I have researched it and did not find a satisfactory answer.
I have a reactive Kafka consumer (Spring Reactor) with a poll amount 'x', and the application pushes the polled messages to a reactive endpoint using the reactive WebClient. The issue is that the external service can perform differently over time, so I have to adjust the Kafka consumer to poll fewer messages (or have backpressure kick in) when the circuit breaker opens because we see a lot of failures. Is there a way in current Reactor to automatically:
React when the circuit breaker is in the open state and reduce the poll amount or slow down consumption.
Increase the poll amount to its previous value when the circuit is closed (the external service would be scaled up if it goes down).
I do not want to use delayElements or delayUntil, since these are mostly static in nature and I want the application to react at runtime. How can I configure this end-to-end backpressure? I would provide the values for the consumer when the circuit is closed, partially closed and open in the app configs.

As backpressure is based on the slowness of the consumer, one way to achieve this is to convert certain exception types into a delay. You can use onErrorResume for this purpose, as demonstrated below:
long start = System.currentTimeMillis();
Flux.range(1, 1000)
    .doOnNext(item -> System.out.println("Elapsed " + (System.currentTimeMillis() - start) + " millis for item: " + item))
    .flatMap(item -> process(item).onErrorResume(this::slowDown), 5) // concurrency limit for demo
    .blockLast();
System.out.println("Flow took " + (System.currentTimeMillis() - start) + " milliseconds.");
private Mono<Integer> process(Integer item) {
    // simulate error for some items
    if (item >= 50 && item <= 100) {
        return Mono.error(new RuntimeException("Downstream failed."));
    }
    // normal processing
    return Mono.delay(Duration.ofMillis(10))
        .thenReturn(item);
}
private Mono<Integer> slowDown(Throwable e) {
    if (e instanceof RuntimeException) { // you could check for circuit breaker exception
        return Mono.delay(Duration.ofMillis(1000)).then(Mono.empty()); // delay to slow down
    }
    return Mono.empty(); // no delay for other errors
}
If you check the output of this code, you can see a slowdown between items 50 and 100, while it runs at regular speed before and after.
Note that my example does not use Kafka. As you are using the reactor-kafka library, which honors backpressure, it should work the same way as this dummy example.
Also, as the Flux might process items concurrently, the slowdown is not immediate: it will try to process some additional items before properly slowing down.
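The question asks for different values per circuit state, taken from the app configs. A minimal sketch of that lookup follows, assuming three states and delays supplied by configuration. CircuitState and SlowDownPolicy are hypothetical names used only for illustration; they are not part of Reactor or of any circuit breaker library.

```java
import java.time.Duration;
import java.util.EnumMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical circuit states; a real circuit breaker library supplies its own type. */
enum CircuitState { CLOSED, HALF_OPEN, OPEN }

/** Hypothetical holder for the per-state delays read from application config. */
final class SlowDownPolicy {
    private final Map<CircuitState, Duration> delays = new EnumMap<>(CircuitState.class);

    SlowDownPolicy(Duration closed, Duration halfOpen, Duration open) {
        delays.put(CircuitState.CLOSED, Objects.requireNonNull(closed));
        delays.put(CircuitState.HALF_OPEN, Objects.requireNonNull(halfOpen));
        delays.put(CircuitState.OPEN, Objects.requireNonNull(open));
    }

    /** Returns how long a failed item should hold its flatMap slot in the given state. */
    Duration delayFor(CircuitState state) {
        return delays.get(state);
    }
}
```

With something like this in place, the slowDown method from the answer could return Mono.delay(policy.delayFor(state)).then(Mono.empty()), where state comes from whatever circuit breaker the application uses. Because each delayed item occupies one of the flatMap concurrency slots, a longer delay reduces demand on the upstream consumer.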

Related

WebFlux backoff and multi-threading in a kafka consumer flow

I have a Kafka consumer written in Java and Spring Boot.
I am using WebFlux to call a third-party server to trigger some actions (and waiting for the result, of course).
This server has a rate limit that prevents me from making many requests in a short time.
To prevent failures, I intend to keep retrying the call using the WebFlux backoff:
webClientBuilder.build()
    .get()
    ...
    .retryWhen(getRetryPolicyOnTooManyRequests())
    ...
private RetryBackoffSpec getRetryPolicyOnTooManyRequests() {
    return Retry.backoff(20, Duration.ofSeconds(retryBackoffMinimumSeconds))
        .filter(this::is429Error);
}
private boolean is429Error(Throwable throwable) {
    return throwable instanceof WebClientResponseException
        && ((WebClientResponseException) throwable).getStatusCode() == HttpStatus.TOO_MANY_REQUESTS;
}
My questions are about the behavior I should expect from my Kafka consumer:
What will happen when one of my calls is backing off? Will I be blocking the thread? Will a new thread be opened to process another message?
If I have the default consumer configuration (max.poll.records=500, max.poll.interval.ms=30000) and my backoff time gets to 5 minutes, will the Kafka group be rebalanced?
If so, is there a smarter way to tackle this issue so that I won't get rebalanced each time, other than just putting a very high number in max.poll.interval.ms?
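Whether a rebalance happens depends on whether the time between two polls can exceed max.poll.interval.ms, so it helps to estimate how long a full retry sequence may take. The sketch below is only arithmetic under stated assumptions: each wait doubles from the minimum backoff and is capped at a maximum. Reactor's Retry.backoff additionally applies random jitter, so real totals will differ. BackoffBudget and its methods are hypothetical names, not part of Reactor or Kafka.

```java
import java.time.Duration;

/** Hypothetical helper that estimates how long an exponential backoff sequence may take. */
final class BackoffBudget {
    private BackoffBudget() {}

    /**
     * Sums the waits of a doubling backoff: minBackoff, 2x, 4x, ... with each wait
     * capped at maxBackoff. Jitter is deliberately ignored in this estimate.
     */
    static Duration totalWait(int maxAttempts, Duration minBackoff, Duration maxBackoff) {
        Duration total = Duration.ZERO;
        Duration next = minBackoff.compareTo(maxBackoff) > 0 ? maxBackoff : minBackoff;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            total = total.plus(next);
            Duration doubled = next.multipliedBy(2);
            next = doubled.compareTo(maxBackoff) > 0 ? maxBackoff : doubled;
        }
        return total;
    }

    /** True when the estimated wait alone is longer than the configured poll interval. */
    static boolean exceedsPollInterval(Duration totalWait, long maxPollIntervalMs) {
        return totalWait.toMillis() > maxPollIntervalMs;
    }
}
```

For example, BackoffBudget.totalWait(3, Duration.ofSeconds(1), Duration.ofHours(1)) adds 1 s + 2 s + 4 s. If the estimate exceeds the poll interval and the retries hold up polling, raising the interval, lowering the attempt count, or capping the backoff are the knobs to weigh.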

Rabbit MQ doesn't flush acks?

The problem appeared in logs: Consumer failed to start in 60000 milliseconds; does the task executor have enough threads to support the container concurrency?
We try to open handlers for around 50 queues dynamically with SimpleMessageListenerContainer.addQueueNames(), then the application is started. It consumes some messages, but the RabbitMQ admin panel shows that they are unacked. After a significant amount of time, messages stack up to 6 unacked messages per queue (the queues have a fairly low number of messages per minute), which sums to 300 messages in total; then something happens and they all become consumed and acked. While messages are unacked, the container seems to be trying to load another consumer until it hits the limit.
We rely on AUTO acknowledgment mode now; when it was MANUAL, it was fine.
There are two questions:
What can be the reason for unacked messages? Is there any flushing mechanism that doesn't trigger often?
What do I do with "not enough threads" message?
Those two seem to be closely related to one another.
Here's the setup:
@Bean
fun queueMessageListenerContainer(
    connectionFactory: ConnectionFactory,
    retryOperationsInterceptor: RetryOperationsInterceptor,
    vehicleQueueListenerFactory: QueueListenerFactory,
): SimpleMessageListenerContainer {
    return SimpleMessageListenerContainer().also {
        it.connectionFactory = connectionFactory
        it.setConsumerTagStrategy { queueName -> consumerTag(queueName) }
        it.setMessageListener(vehicleQueueListenerFactory.create())
        it.setConcurrentConsumers(2)
        it.setMaxConcurrentConsumers(5)
        it.setListenerId("queue-consumer")
        it.setAdviceChain(retryOperationsInterceptor)
        it.setRecoveryInterval(RABBIT_HEARTH_BEAT.toMillis())
        //had 10-100 threads, didn't help
        it.setTaskExecutor(rabbitConsumersExecutorService)
        // AUTO suppose to set ack for the messages, right?
        it.acknowledgeMode = AcknowledgeMode.AUTO
    }
}
@Bean
fun connectionFactory(rabbitProperties: RabbitProperties): AbstractConnectionFactory {
    val rabbitConnectionFactory = com.rabbitmq.client.ConnectionFactory().also { connectionFactory ->
        connectionFactory.isAutomaticRecoveryEnabled = true
        connectionFactory.isTopologyRecoveryEnabled = true
        connectionFactory.networkRecoveryInterval = RABBIT_HEARTH_BEAT.toMillis()
        connectionFactory.requestedHeartbeat = RABBIT_HEARTH_BEAT.toSeconds().toInt()
        // was up to 100 connections, didn't help
        connectionFactory.setSharedExecutor(rabbitConnectionExecutorService)
        connectionFactory.host = rabbitProperties.host
        connectionFactory.port = rabbitProperties.port ?: connectionFactory.port
    }
    return CachingConnectionFactory(rabbitConnectionFactory)
        .also {
            it.cacheMode = rabbitProperties.cache.connection.mode
            it.connectionCacheSize = rabbitProperties.cache.connection.size
            it.setConnectionNameStrategy { "simulation-gateway:${springProfiles.firstOrNull()}:event-consumer" }
        }
}
class QueueListenerFactory {
    // an explicit return type is needed because the body returns a value
    fun create(): MessageListener {
        return MessageListener {
            try {
                // no ack, rely on AUTO acknowledgement mode
                handle()
            } catch (e: Throwable) {
                ...
            }
        }
    }
}
Okay, I figured out what the problem was. Basically, it couldn't start all of the queue consumers in time: not only is this a slow process for SimpleMessageListenerContainer, but we also called addQueueNames one queue at a time.
userRepository.findAll()
    .map { user -> queueName(user) }
    .onEach { queueName ->
        simpleContainerListener.addQueueNames(queueName)
    }
But the following line of the documentation for SimpleMessageListenerContainer went unnoticed:
The existing consumers will be cancelled after they have processed any pre-fetched messages and new consumers will be created
Which means that what actually happened was the recreation of (1, 2, ... N) consumers. What made it even worse is that when a request came from the API, we made exactly the same simpleContainerListener.addQueueNames(queueName) call after handling the request, which recreated all of the consumers again.
Also, the recreation of the consumers was the reason why AUTO acknowledgement didn't work: threads were hanging while trying to build enough consumers before the timeout.
I fixed this by adding a DirectMessageListenerContainer to handle recently added queues, which is blazing fast compared to SimpleMessageListenerContainer for the particular case of adding just one new consumer.
DirectMessageListenerContainer(connectionFactory).also {
    it.setConsumerTagStrategy { queueName -> consumerTag(queueName, RECENT_CONSUMER_TAG) }
    it.setMessageListener(ListenerFactory.create())
    it.setListenerId("queue-consumer-recent")
    it.setAdviceChain(retryOperationsInterceptor)
    it.setTaskExecutor(recentQueuesTaskExecutor)
    it.acknowledgeMode = AcknowledgeMode.AUTO
}
The downside is that DirectMessageListenerContainer uses 1 thread per queue on every instance. This is exactly why I didn't want to use it in the first place, but using both DirectMessageListenerContainer for recent queues and SimpleMessageListenerContainer for already existing queues significantly reduces the number of threads required to handle those queues. As far as I understand, heavy use of DirectMessageListenerContainer will eventually lead to OOM, so the next step can be to transfer queues from the direct to the simple listener container in batches.
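Since every addQueueNames call makes the simple container recreate its consumers, moving queues in batches means one call per batch rather than one per queue. A rough sketch of that batching follows, written in Java rather than the Kotlin of the setup above. QueueContainer is a hypothetical stand-in for the single varargs method used here, and QueueBatcher is an illustrative helper, not a Spring AMQP class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the one container method this sketch relies on. */
interface QueueContainer {
    void addQueueNames(String... queueNames);
}

/** Hypothetical helper that groups queue names so the container is touched once per batch. */
final class QueueBatcher {
    private QueueBatcher() {}

    /** Splits the names into consecutive batches of at most batchSize elements. */
    static List<List<String>> partition(List<String> names, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int from = 0; from < names.size(); from += batchSize) {
            int to = Math.min(from + batchSize, names.size());
            batches.add(new ArrayList<>(names.subList(from, to)));
        }
        return batches;
    }

    /** Adds all names with one container call per batch and returns the number of calls made. */
    static int addInBatches(QueueContainer container, List<String> names, int batchSize) {
        int calls = 0;
        for (List<String> batch : partition(names, batchSize)) {
            container.addQueueNames(batch.toArray(new String[0]));
            calls++;
        }
        return calls;
    }
}
```

With 50 queues and a batch size of 20, this makes 3 calls instead of 50. The same idea applies at startup: collecting all queue names first and passing them in a single call avoids the repeated consumer recreation described above.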

How to limit concurrent http requests with Mono & Flux

I want to use Flux to limit the number of concurrent HTTP requests made from a List of Mono.
When some requests complete (responses received), the service should send further requests so that the number of in-flight requests stays at 15.
A single request returns a list and triggers further requests depending on the result.
At this point, I want to send the requests with limited concurrency, because too many HTTP requests from the consumer side put the remote server in trouble.
I used flatMapMany as shown below.
public Flux<JsonNode> syncData() {
    return service1
        .getData(param1)
        .flatMapMany(res -> {
            List<Mono<JsonNode>> totalTask = new ArrayList<>();
            Map<String, Object> originData = service2.getDataFromDB(param2);
            res.withArray("data").forEach(row -> {
                String id = row.get("id").asText();
                if (originData.containsKey(id)) {
                    totalTask.add(service1.updateRequest(param3));
                } else {
                    totalTask.add(service1.deleteRequest(param4));
                }
                originData.remove(id);
            });
            for (left) {
                totalTask.add(service1.createRequest(param5));
            }
            return Flux.merge(totalTask);
        });
}
void syncData() {
    syncDataService.syncData().????;
}
I tried chaining .window(15), but it doesn't work. All the requests are sent simultaneously.
How can I handle Flux for my goal?
I am afraid Project Reactor doesn't provide any implementation of either rate or time limit.
However, you can find a number of third-party libraries that provide such functionality and are compatible with Project Reactor. As far as I know, resilience4j-reactor supports that and is also compatible with Spring and Spring Boot.
The RateLimiterOperator checks if a downstream subscriber/observer can acquire a permission to subscribe to an upstream Publisher. If the rate limit would be exceeded, the RateLimiterOperator could either delay requesting data from the upstream or it can emit a RequestNotPermitted error to the downstream subscriber.
RateLimiter rateLimiter = RateLimiter.ofDefaults("name");
Mono.fromCallable(backendService::doSomething)
    .transformDeferred(RateLimiterOperator.of(rateLimiter))
More about RateLimiter module itself here: https://resilience4j.readme.io/docs/ratelimiter
You can use limitRate on a Flux. You will probably need to restructure your code a bit, but see the docs here: https://projectreactor.io/docs/core/release/api/reactor/core/publisher/Flux.html#limitRate-int-
flatMap takes a concurrency parameter: https://projectreactor.io/docs/core/release/api/reactor/core/publisher/Flux.html#flatMap-java.util.function.Function-int-
Mono<User> getById(int userId) { ... }
Flux.just(1, 2, 3, 4).flatMap(client::getById, 2)
will limit the number of concurrent requests to 2.
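The concurrency argument acts like a fixed number of permits: a new request starts only when an earlier one has finished. To make that bound observable without extra dependencies, here is a JDK-only sketch that uses a Semaphore instead of Reactor. It blocks threads, so it illustrates the limit rather than offering a reactive replacement. BoundedConcurrencyDemo and the 20 ms sleep standing in for an HTTP call are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo: runs simulated requests with a fixed number of permits. */
final class BoundedConcurrencyDemo {
    private BoundedConcurrencyDemo() {}

    /** Runs taskCount simulated requests with at most limit in flight; returns the observed peak. */
    static int runAndMeasurePeak(int taskCount, int limit) {
        Semaphore permits = new Semaphore(limit);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(taskCount);
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < taskCount; i++) {
                futures.add(CompletableFuture.runAsync(
                        () -> simulatedRequest(permits, inFlight, peak), pool));
            }
            // wait until every simulated request has finished
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    private static void simulatedRequest(Semaphore permits, AtomicInteger inFlight, AtomicInteger peak) {
        try {
            permits.acquire(); // wait for a free slot
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        try {
            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            Thread.sleep(20); // stands in for the HTTP round trip
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            inFlight.decrementAndGet();
            permits.release(); // free the slot for the next request
        }
    }
}
```

Calling BoundedConcurrencyDemo.runAndMeasurePeak(10, 2) never reports a peak above 2. In the question's code, the Reactor equivalent would presumably be to replace Flux.merge(totalTask) with Flux.fromIterable(totalTask).flatMap(mono -> mono, 15), assuming the request Monos are cold so that nothing is sent before flatMap subscribes to them.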

How to fix "MissingBackpressureException"

I am using RxJava2 Flowables by subscribing to a stream of events from a PublishSubject. It's being used in an enterprise-level application and we don't have the choice of dropping any events.
I am using RxJava version 2.2.8.
I am using BackpressureStrategy.BUFFER as I don't want to lose any of my events.
Also, I buffer again for 50000 events or 3 minutes, whichever comes first. I do this because I want to consolidate events and then process them.
But I get the following error within a few minutes of running:
io.reactivex.exceptions.MissingBackpressureException: Could not emit buffer due to lack of requests
at io.reactivex.internal.subscribers.QueueDrainSubscriber.fastPathOrderedEmitMax(QueueDrainSubscriber.java:121)
at io.reactivex.internal.operators.flowable.FlowableBufferTimed$BufferExactBoundedSubscriber.run(FlowableBufferTimed.java:569)
at io.reactivex.Scheduler$Worker$PeriodicTask.run(Scheduler.java:479)
at io.reactivex.internal.schedulers.ScheduledRunnable.run(ScheduledRunnable.java:66)
I tried increasing the buffer size by setting the following property, but there is no change in the behavior.
System.setProperty("rx2.buffer-size", "524288");
Also, if I buffer for longer than 3 minutes, I get the exception after a much longer time, probably because my downstream performs better when the events are consolidated more. However, I don't have that choice because these are live events and need processing immediately (within 3-5 minutes).
I also tried Thread.sleep() before invoking "subscription.next" in case of error, but I still get the same results.
keySubject.hide()
    .toFlowable(BackpressureStrategy.BUFFER)
    .parallel()
    .runOn(Schedulers.computation())
    .map(e -> e.getContents())
    .flatMap(s -> Flowable.fromIterable(s))
    .sequential()
    .buffer(3, TimeUnit.MINUTES, 50000)
    .subscribe(new Subscriber<List<String>>() {
        @Override
        public void onSubscribe(Subscription var1) {
            innerSubscription = var1;
            innerSubscription.request(1L);
        }
        @Override
        public void onNext(List<String> logs) {
            // request on the stored subscription instance, not on the Subscription type
            innerSubscription.request(1L);
            /// Do some logic here
        }
        // onError and onComplete are not shown in the original snippet
    });
I want to know how to handle the backpressure to avoid this exception. Is this exception caused by the ".buffer" method?
Is there a way for me to check the status of these buffers? Also, why do I still get the exception after the same amount of time even if I increase rx2.buffer-size? Ideally, the system should run longer with a higher buffer size if the exception occurs because the buffer is getting full.
Any help on the reason for this message "Could not emit buffer due to lack of requests at " will be great.
The thing is, why do you use a subject that isn't backpressure-aware? Are you using that as a poor man's event bus? Also, assuming e.getContents() is a simple getter I believe you can replace this whole block
    .toFlowable(BackpressureStrategy.BUFFER)
    .parallel()
    .runOn(Schedulers.computation())
    .map(e -> e.getContents())
    .flatMap(s -> Flowable.fromIterable(s))
    .sequential()
    .buffer(3, TimeUnit.MINUTES, 50000)
    .subscribe(new Subscriber<List<String>>() { ... });
with
    .flatMapIterable(e -> e.getContents())
    .buffer(3, TimeUnit.MINUTES, 50000)
    .rebatchRequests(1)
    .observeOn(Schedulers.computation())
    .doOnNext(s -> /* Do some logic here */)
    .subscribe();

Project Reactor async send email with retry on error

I need to send some data after a user has registered. I want to make the first attempt in the main thread, but if there are any errors, I want to retry 5 times at 10-minute intervals.
@Override
public void sendRegisterInfo(MailData data) {
    Mono.just(data)
        .doOnNext(this::send)
        .doOnError(ex -> logger.warn("Main queue {}", ex.getMessage()))
        .doOnSuccess(d -> logger.info("Send mail to {}", d.getRecipient()))
        .onErrorResume(ex -> retryQueue(data))
        .subscribe();
}
private Mono<MailData> retryQueue(MailData data) {
    return Mono.just(data)
        .delayElement(Duration.of(10, ChronoUnit.MINUTES))
        .doOnNext(this::send)
        .doOnError(ex -> logger.warn("Retry queue {}", ex.getMessage()))
        .doOnSuccess(d -> logger.info("Send mail to {}", d.getRecipient()))
        .retry(5); // no subscribe() here: it returns a Disposable, and the caller subscribes via onErrorResume
}
It works.
But I have some questions:
1. Is it correct to perform the operation in the doOnNext function?
2. Is it correct to use delayElement to add a delay between executions?
3. Is the thread blocked while waiting for the delay?
4. What is the best practice for retrying on error with a delay between retries?
1. doOnXXX for logging is fine. But for the actual element processing, you should prefer flatMap over doOnNext (assuming your processing is asynchronous or can be converted to return a Flux/Mono).
2. This is correct. Another way is to turn the code around and start from a Flux.interval, but here delayElement is better IMO.
3. The delay runs on a separate thread/scheduler (by default, Schedulers.parallel()), so it does not block the main thread.
4. There's actually a Retry builder dedicated to that kind of use case in the reactor-extra addon: https://github.com/reactor/reactor-addons/blob/master/reactor-extra/src/main/java/reactor/retry/Retry.java Newer reactor-core versions (3.3.4 and later) ship a similar builder as reactor.util.retry.Retry, for example Retry.fixedDelay(5, Duration.ofMinutes(10)) passed to retryWhen.
