How to delay retry by 4 hours on SQS? - java

TL;DR: how to mimic RabbitMQ's scheduling functionality while keeping the consumer:
stateless
free from managing scheduled messages
free from useless retries of scheduled messages between receiving the message and finally consuming it at the correct scheduled time
I have a single SQS queue with default properties on creation. The average time a consumer takes to process a message is 1~2s, but a few messages need to be processed twice, within a 4h window. These messages are called B; the others are called A.
Suppose I have my queue with the following messages: A1, A2, B1, A3, B2 (5 messages, max 10s to consume them all) at the start of this table:
time     | what should happen
---------|-------------------
now      | consumer connected to queue
now+10s  | all As were consumed successfully and deleted from the queue;
         | Bs had their unsuccessful first try and are now waiting for their retry in 4h
between  | nothing happens since no new messages arrived and old ones are waiting
now+4h4s | Bs successfully consumed on their second try and, as a result, deleted from the queue
I have a Spring application where I can throw exceptions when I find a type B message. For simplicity and scalability, I want to have one single thread consuming messages, taking 1~2s to consume each message.
This way, I cannot hang message processing as this answer suggested. I also don't need SQS' Delivery delay, since it postpones only messages arriving at the queue, not retries. If possible, I would like to keep using long polling with @JmsListener and avoid keeping any state in my application's memory. I want to avoid this if possible.

I would write a small AWS Lambda function that gets invoked every ~minute. That function would get a message off the (hopefully FIFO-type) SQS queue and check the time it was added. If it was added >= 4 hours ago, it would delete it from the incoming queue and add it to the delayed-by-4-hours queue, which your application could listen to. If it moved a message, it would continue doing so until the next message isn't 4 hours old yet. Increase/decrease the frequency of the Lambda to increase the granularity of how 'tight' to 4 hours you are, at the added expense of running the Lambda more often.
Here is a quick link to an example of an AWS Lambda function using SQS: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs-example.html
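A minimal sketch of such a handler (AWS SDK for Java v1, triggered by a CloudWatch Events schedule) could look like the following; the queue URLs, the 4-hour threshold constant and the class name are assumptions for illustration, not part of the linked example:

import java.util.List;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.ScheduledEvent;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class MoveAgedMessagesHandler implements RequestHandler<ScheduledEvent, Void> {

    private static final long FOUR_HOURS_MS = 4 * 60 * 60 * 1000L;
    private final AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
    private final String holdQueueUrl = System.getenv("HOLD_QUEUE_URL");       // where B messages wait
    private final String delayedQueueUrl = System.getenv("DELAYED_QUEUE_URL"); // queue the application listens to

    @Override
    public Void handleRequest(ScheduledEvent event, Context context) {
        while (true) {
            List<Message> messages = sqs.receiveMessage(new ReceiveMessageRequest(holdQueueUrl)
                    .withMaxNumberOfMessages(1)
                    .withAttributeNames("SentTimestamp")).getMessages();
            if (messages.isEmpty()) {
                return null;                          // nothing left on the hold queue
            }
            Message head = messages.get(0);
            long sentAt = Long.parseLong(head.getAttributes().get("SentTimestamp"));
            if (System.currentTimeMillis() - sentAt < FOUR_HOURS_MS) {
                return null;                          // head message is not 4 hours old yet
            }
            sqs.sendMessage(delayedQueueUrl, head.getBody());          // hand it to the application's queue
            sqs.deleteMessage(holdQueueUrl, head.getReceiptHandle());  // and remove it from the hold queue
        }
    }
}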

You could send message B to a Step Functions state machine and put a wait state in to wait for 4 hours before sending it to the queue. The state machine would keep the state for you, and you can send messages directly to SQS from Step Functions so you don't need to write any code.

Since I was using JmsListener with setSessionAcknowledgeMode(Session.CLIENT_ACKNOWLEDGE), I decided to run this at the end of the consumer for re-processable messages:
myAmazonSqsInstance.sendMessage(new SendMessageRequest()
        .withQueueUrl(queueUrl)              // must be the full queue URL, not just the queue name
        .withMessageBody(myMessageWithText)
        .withDelaySeconds(900));             // 900s = 15min, the maximum per-message delay SQS allows
This way the message is consumed successfully, but a new message with the same body is produced on the queue. That message will be consumed in 15 min and, due to my business logic, fail again. There will be 16 failures (16 * 15min = 4h) until it is finally consumed without producing a new message.
Although this is not exactly what I asked for, and it's similar to the other answers (only the tech stack is different), I decided to write it down here to make a Java solution available.
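For context, a rough sketch of how this re-send can sit inside the listener; isTypeB, businessSaysRetryLater, process and queueUrl are assumed helper names, not part of the original code:

@JmsListener(destination = "myQueue")
public void onMessage(javax.jms.TextMessage message) throws javax.jms.JMSException {
    String body = message.getText();
    if (isTypeB(body) && businessSaysRetryLater(body)) {
        // instead of throwing, re-enqueue a copy delayed by the 15-minute maximum
        myAmazonSqsInstance.sendMessage(new SendMessageRequest()
                .withQueueUrl(queueUrl)
                .withMessageBody(body)
                .withDelaySeconds(900));
    } else {
        process(body);
    }
    message.acknowledge(); // CLIENT_ACKNOWLEDGE: this delivery is removed either way
}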

Related

Does Apache Camel skip deleting a message read from a queue if processing takes too long?

I have a single Apache Camel consumer that reads messages from an SQS standard queue and processes them.
Some of the messages take very little time to be processed (1-2 min), whereas some messages take a long time (70-80 min).
I have not explicitly specified a visibility timeout in the Camel configurations, so each message read is given a visibility timeout of 30s, as specified in the queue configuration.
Messages that take a short time to be processed are working fine, but for messages that take a long time, Apache Camel is not deleting them once it has finished processing.
So the messages become available in the queue again, and Camel reads the message again, processes it for a long time and this cycle repeats.
I understand that the messages are reappearing in the queue because the visibility timeout is low, but my concern is -
1. Why doesn't Camel delete the message as soon as processing is finished for these long running jobs?
2. Why are messages for short jobs being deleted correctly, even though the short-running jobs also exceed the visibility timeout of 30 sec?
There are no errors or exceptions occurring anywhere. I even made a sample POC job that does nothing else but wait for a long time (85 minutes) and then prints a success message. What I notice is that the job completes successfully, but Camel doesn't delete the message. Why?
I fixed this using a configuration option in Camel - 'extendMessageVisibility'.
When we set this option to true, Camel runs a background task that keeps extending the visibility timeout of the message until it gets processed by the consumer.
Again, my real problem wasn't that the message became visible on the queue again, but that it wasn't being deleted by the Camel consumer.
However, based on my observation, I see that applying the extendMessageVisibility option has led to Camel keeping track of the message and then deleting it correctly.
We can configure the queue URI like this:
String queueUri = "aws-sqs://" + queueName + "?amazonSQSClient=#client&extendMessageVisibility=true&visibilityTimeout=60";
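For reference, a minimal sketch of a route using that URI (inside a RouteBuilder's configure() method); the queue name, the registered client bean name "client" and the handler bean are assumptions:

from("aws-sqs://" + queueName
        + "?amazonSQSClient=#client&extendMessageVisibility=true&visibilityTimeout=60")
    // Camel keeps extending the visibility timeout in the background while this runs,
    // so long-running processing no longer lets the message reappear on the queue
    .to("bean:longRunningJobHandler");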

ActiveMQ does not balance messages after some time

I'm using ActiveMQ (5.14.5) with Camel (2.13.4) because I still need Java 6.
I have a queue and 15 consumers. The messages sent to them are request-reply.
When I start the consumers, the messages are distributed one per consumer as soon as they arrive but, after some time, only one consumer receives the messages; the others stay idle and a lot of messages stay pending.
The consumers have this configuration:
concurrentConsumers=15&maxMessagesPerTask=1&destination.consumer.prefetchSize=0&transferException=true
The time spent processing each message can vary a lot because of our business rules, so I don't know if ActiveMQ has some rule that manages slow consumers and redirects everything to the one that is more "efficient".
The behaviour I was expecting is that all arriving messages start to be processed until all the consumers are busy, but that is not what is happening.
Anybody knows what is happening?
Your configuration has two eye-catching settings:
maxMessagesPerTask=1
If you did not intend to configure auto-scaling of the thread pool, you should remove this setting completely. It is unlimited by default, and it sets how long threads are kept for processing (scaling the thread pool up/down).
See also the Spring Docs about this setting
prefetchSize=0
Have you tried setting this to 1 so that every consumer just gets 1 message at a time?
The AMQ docs say about the prefetchSize:
Large prefetch values are recommended for high performance with high message volumes. However, for lower message volumes, where each message takes a long time to process, the prefetch should be set to 1. This ensures that a consumer is only processing one message at a time. Specifying a prefetch limit of zero, however, will cause the consumer to poll for messages, one at a time, instead of the message being pushed to the consumer.
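For illustration, a hedged sketch of the adjusted consumer endpoint (the queue name and handler bean are assumptions): maxMessagesPerTask removed and prefetchSize set to 1 instead of 0, as suggested above:

from("activemq:queue:myQueue"
        + "?concurrentConsumers=15"
        + "&destination.consumer.prefetchSize=1"
        + "&transferException=true")
    // with prefetch 1, each of the 15 consumers takes exactly one message at a time,
    // so a slow consumer no longer accumulates messages the others could work on
    .to("bean:requestReplyHandler");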

Size of event bus in vert.x

I am using Vert.x to read a file, transform it and then push it to Kafka.
I am using 2 verticles, without any worker threads (I don't want to change the order of the logs in the file).
Verticle 1 : Read the file and filter
Verticle 2 : Publish to kafka
Each file contains approximately 120000 lines.
However, I observed that after some time I stop seeing logs from verticle 1.
I suspect that the event bus is getting full, so the consumer is still consuming, but the producer thread is waiting for the event bus to empty.
So my questions are:
1. What is the default size of event bus? In Docs it says
DEFAULT_ACCEPT_BACKLOG
The default accept backlog = 1024
2. How do I confirm my suspicion that publisher thread is blocked?
Vert.x uses Netty's SingleThreadEventLoop internally for its event bus; the maximum number of pending tasks allowed is Integer.MAX_VALUE, which is roughly 2 billion messages.
You may have to try VertxOptions.setWarningExceptionTime(long warningExceptionTime) and set the value lower than the default (5s) to see whether there is any warning about a blocked thread.
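For example (a sketch assuming Vert.x 3.x, where this value is given in nanoseconds):

VertxOptions options = new VertxOptions()
        .setWarningExceptionTime(TimeUnit.SECONDS.toNanos(1)); // warn after 1s instead of the 5s default
Vertx vertx = Vertx.vertx(options); // blocked event-loop threads are then logged with a stack trace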
To complement @iwat's answer, in the version I am using it looks like the max size is read from a system property:
protected static final int DEFAULT_MAX_PENDING_TASKS = Math.max(16, SystemPropertyUtil.getInt("io.netty.eventLoop.maxPendingTasks", 2147483647));
So you can control the size of the queues in front of the Verticles by setting that system property.
If the event bus is full (the queue in NioEventLoop reaches the max size), tasks will be rejected. So if you hit that, you should start to see error responses to your messages; you should not see any blocked producers.
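For example (the property name is taken from the constant above; the cap value is arbitrary):

// must be set before Vert.x (and therefore Netty) is initialised
System.setProperty("io.netty.eventLoop.maxPendingTasks", "100000");
Vertx vertx = Vertx.vertx();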
I'm not sure the accept-backlog setting has any effect on the event bus; given the documentation it might have something to do with the NetServer, but from a short scan of the code I haven't found any use of it in the event bus.
The event bus, however, delivers messages immediately; they don't get queued up anywhere (at least that's what I understand from the code). So regarding your first question: it doesn't have any size, at least not when running locally (I don't know about the clustered version, but I assume that doesn't apply in your case anyway).
Confirming that an (event loop) thread is actually blocked is easy: there should be tons of exceptions in your log stating that the event loop is blocked.
I guess your problem is somewhere else, but that's actually hard to tell without any code or meaningful logs.

How to monitor an activeMQ's queue arrival and dispatching times of messages?

Is there a way (third-party software or programming) to monitor the time a message arrives at a specific queue and the time it is consumed?
Something like: a message arrived at 17:14:22 565 and was consumed at 17:14:22 598, or the message was enqueued for N milliseconds.
I have read about the Statistics plugin, but it just gives max and min times of enqueued messages.
You can use http://activemq.apache.org/advisory-message.html
The first example below is to be notified when a message is delivered to the broker.
The second example is to be notified when the message is consumed.
AdvisorySupport.getMessageDeliveredAdvisoryTopic()
AdvisorySupport.getMessageConsumedAdvisoryTopic()
See the example below to get access to Message properties like creation time, or the in/out time when a message arrived at or left the broker.
Here is the list of properties http://activemq.apache.org/activemq-message-properties.html
Spring JMS Producer and Consumer interaction
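A hedged sketch of subscribing to both advisory topics (the queue name and the already-created JMS session are assumptions for illustration):

ActiveMQQueue monitored = new ActiveMQQueue("MY.QUEUE");
MessageConsumer delivered = session.createConsumer(
        AdvisorySupport.getMessageDeliveredAdvisoryTopic(monitored));
MessageConsumer consumed = session.createConsumer(
        AdvisorySupport.getMessageConsumedAdvisoryTopic(monitored));
delivered.setMessageListener(msg -> {
    try {
        System.out.println("delivered to broker at " + msg.getJMSTimestamp());
    } catch (JMSException e) {
        e.printStackTrace();
    }
});
consumed.setMessageListener(msg -> {
    try {
        System.out.println("consumed, advisory sent at " + msg.getJMSTimestamp());
    } catch (JMSException e) {
        e.printStackTrace();
    }
});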
One way is to write your own plugin. (http://activemq.apache.org/developing-plugins.html)
It's quite simple, and the effect is similar to changing the ActiveMQ broker code.
You can extend the BrokerFilter class and override its methods, like postProcessDispatch() and send(). Then you can record the time, or whatever you want, in your own code.
I wrote a simple example (https://github.com/lcy362/FoxActivemqPlugin/blob/b54d375a6a91a9ec418e779deb69a8b11f7d985a/src/main/java/com/mallow/activemq/FoxBrokerPlugin.java), hoping that's helpful.
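A hedged sketch of such a filter (the class name and log messages are illustrative; the linked example is more complete):

public class TimingBrokerFilter extends BrokerFilter {

    public TimingBrokerFilter(Broker next) {
        super(next);
    }

    @Override
    public void send(ProducerBrokerExchange exchange, Message message) throws Exception {
        // called when a message arrives at the broker
        System.out.println(message.getMessageId() + " arrived at " + System.currentTimeMillis());
        super.send(exchange, message);
    }

    @Override
    public void postProcessDispatch(MessageDispatch dispatch) {
        // called after a message has been dispatched to a consumer
        System.out.println(dispatch.getMessage().getMessageId() + " dispatched at " + System.currentTimeMillis());
        super.postProcessDispatch(dispatch);
    }
}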

Kafka - Delayed Queue implementation using high level consumer

I want to implement a delayed consumer using the high-level consumer API.
Main idea:
produce messages by key (each msg contains a creation timestamp); this makes sure that each partition has messages ordered by produced time.
auto.commit.enable=false (will explicitly commit after each message process)
consume a message
check message timestamp and check if enough time has passed
process message (this operation will never fail)
commit 1 offset
while (it.hasNext()) {
    val msg = it.next().message()
    // check the timestamp in msg to see whether the delay period has been exceeded
    while (!delayedPeriodPassed(msg)) {
        waitSomeTime() // Thread.sleep or something....
    }
    // certain that the msg was delayed and can now be handled
    Try { process(msg) } // processing the msg will never fail the consumer
    consumer.commitOffsets // commit each msg
}
Some concerns about this implementation:
committing each offset might slow ZK down
can consumer.commitOffsets throw an exception? If yes, I will consume the same message twice (can be solved with idempotent messages)
a problem with waiting a long time without committing the offset: for example, if the delay period is 24 hours, I will get the next message from the iterator, sleep for 24 hours, then process and commit (ZK session timeout?)
how can the ZK session be kept alive without committing new offsets? (setting a high zookeeper.session.timeout.ms can result in a dead consumer without recognising it)
any other problems I'm missing?
Thanks!
One way to go about this would be to use a different topic where you push all messages that are to be delayed. If all delayed messages should be processed after the same time delay, this will be fairly straightforward:
while (it.hasNext()) {
    val message = it.next().message()
    if (shouldBeDelayed(message)) {
        val delay = 24 hours
        val delayTo = getCurrentTime() + delay
        putMessageOnDelayedQueue(message, delay, delayTo)
    } else {
        process(message)
    }
    consumer.commitOffset()
}
All regular messages will now be processed as soon as possible, while those that need a delay get put on another topic.
The nice thing is that we know that the message at the head of the delayed topic is the one that should be processed first, since its delayTo value will be the smallest. Therefore we can set up another consumer that reads the head message, checks whether the timestamp is in the past and, if so, processes the message and commits the offset. If not, it does not commit the offset and instead just sleeps until that time:
while (it.hasNext()) {
    val delayedMessage = it.peek().message()
    if (delayedMessage.delayTo < getCurrentTime()) {
        val readMessage = it.next().message()
        process(readMessage.originalMessage)
        consumer.commitOffset()
    } else {
        delayProcessingUntil(delayedMessage.delayTo)
    }
}
In case there are different delay times you could partition the topic by delay (e.g. 24 hours, 12 hours, 6 hours). If the delay time is more dynamic than that, it becomes a bit more complex. You could solve it by introducing two delay topics. Read all messages off delay topic A and process all the messages whose delayTo value is in the past. Among the others, just find the one with the closest delayTo and put them on topic B. Sleep until the closest one should be processed and do it all in reverse, i.e. process messages from topic B and put the ones that shouldn't yet be processed back on topic A.
To answer your specific questions (some have been addressed in the comments to your question)
Commit each offset might slow ZK down
You could consider switching to storing the offsets in Kafka (a feature available from 0.8.2; check out the offsets.storage property in the consumer config). A sketch of the relevant properties follows at the end of this answer.
Can consumer.commitOffsets throw an exception? if yes, I will consume the same message twice (can solve with idempotent messages)
I believe it can, if it is not able to communicate with the offset storage for instance. Using idempotent messages solves this problem though, as you say.
Problem waiting long time without committing the offset, for example delay period is 24 hours, will get next from iterator, sleep for 24 hours, process and commit (ZK session timeout?)
This won't be a problem with the above outlined solution unless the processing of the message itself takes more than the session timeout.
How can the ZK session be kept alive without committing new offsets? (Setting a high zookeeper.session.timeout.ms can result in a dead consumer without recognizing it.)
Again with the above you shouldn't need to set a long session timeout.
Any other problems I'm missing?
There always are ;)
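For reference, a hedged sketch of the old high-level consumer properties mentioned above (values are illustrative):

Properties props = new Properties();
props.put("zookeeper.connect", "localhost:2181");
props.put("group.id", "delayed-consumer");
props.put("auto.commit.enable", "false"); // commit explicitly after each processed message, as in the question
props.put("offsets.storage", "kafka");    // 0.8.2+: store offsets in Kafka instead of ZooKeeper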
Use Tibco EMS or other JMS queues. They have retry delay built in. Kafka may not be the right design choice for what you are doing.
I would suggest another route in your cases.
It doesn't make sense to address the waiting time in the main thread of the consumer. This would be an anti-pattern in how queues are used. Conceptually, you need to process the messages as fast as possible and keep the queue at a low loading factor.
Instead, I would use a scheduler that schedules a job for each message you need to delay. This way you can process the queue and create asynchronous jobs that will be triggered at predefined points in time.
The downside of using this technique is that it is sensitive to the state of the JVM that holds the scheduled jobs in memory. If that JVM fails, you lose the scheduled jobs and you don't know whether the tasks were executed or not.
There are scheduler implementations, though, that can be configured to run in a cluster environment, thus keeping you safe from JVM crashes.
Take a look at this java scheduling framework: http://www.quartz-scheduler.org/
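For example, a hedged sketch using Quartz (ProcessMessageJob, messageBody and the 24-hour delay are assumptions):

// one-shot job carrying the message payload, fired 24 hours from now
JobDetail job = JobBuilder.newJob(ProcessMessageJob.class)
        .usingJobData("payload", messageBody)
        .build();
Trigger trigger = TriggerBuilder.newTrigger()
        .startAt(DateBuilder.futureDate(24, DateBuilder.IntervalUnit.HOUR))
        .build();
scheduler.scheduleJob(job, trigger);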
We had the same issue during one of our tasks. Although it was eventually solved without using delayed queues, while exploring the solution the best approach we found was to use the pause and resume functionality provided by the KafkaConsumer API. This approach and its motivation are well described here: https://medium.com/naukri-engineering/retry-mechanism-and-delay-queues-in-apache-kafka-528a6524f722
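A hedged sketch of the pause/resume idea with the modern KafkaConsumer API (the notDueYet and process helpers are assumptions):

for (ConsumerRecord<String, String> record : records) {
    TopicPartition tp = new TopicPartition(record.topic(), record.partition());
    if (notDueYet(record)) {
        consumer.pause(Collections.singleton(tp)); // stop fetching from this partition
        consumer.seek(tp, record.offset());        // re-read this record after resuming
    } else {
        process(record);
    }
}
// later, once enough time has passed for a paused partition:
// consumer.resume(Collections.singleton(tp));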
A keyed list on a schedule, or its Redis alternative, may be the best approach.
