RabbitMQ Delivery Acknowledgement Timeout - java

I am using a managed RabbitMQ cluster through AWS Amazon MQ. If the consumers finish their work quickly, everything works fine. However, in a few scenarios some consumers take more than 30 minutes to complete their processing.
In those scenarios, RabbitMQ removes the consumer and makes the same messages visible in the queue again. Because of this, another consumer picks them up and starts processing, and it keeps happening in a loop. The same transaction therefore gets executed again and again, and I am losing the consumer as well.
I am not setting any AcknowledgeMode, so I believe it is AUTO by default and has a 30-minute limit.
Is there any way to increase the delivery acknowledgement timeout for AUTO mode?
Or please let me know if anyone has any other solution for this.

Reply From AWS Support:
The consumer timeout is now configurable, but the change can be made only by the service team. The change will be permanent, irrespective of version.
So you may update RabbitMQ to the latest version; there is no need to stick with 3.8.11. Provide your broker details and the desired timeout, and they should be able to do it for you.

This is the response from AWS support.
From my understanding, I see that your workload is currently affected by the consumer_timeout parameter that was introduced in v3.8.15.
We have had a number of reach-outs due to this. Unfortunately, the service team has confirmed that while they can manually edit rabbitmq.conf, the change will be overwritten on the next reboot or failover and is therefore not a recommended solution. It would also mean that all security patching on the brokers where a manual change is applied would have to be paused. Currently, the service does not support custom user configurations for RabbitMQ from this configuration file, but the team has confirmed they are looking to address this in the future; however, they are not able to give an ETA on when this will be available.
From the RabbitMQ GitHub, it seems this was added for quorum queues in v3.8.15 (https://github.com/rabbitmq/rabbitmq-server/releases/tag/v3.8.15), but it seems to apply to all consumers (https://github.com/rabbitmq/rabbitmq-server/pull/2990).
Unfortunately, RabbitMQ itself does not support downgrades (https://www.rabbitmq.com/upgrade.html).
Thus the recommended workaround and safest action from the service team, as of now, is to create a new broker on an older version (3.8.11) and set auto minor version upgrade to false so that it won't be upgraded.
Then export the configuration from the existing RabbitMQ instance, import it into the new instance, and use that instance going forward.
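For reference, on a self-managed RabbitMQ broker (where rabbitmq.conf is under your control, unlike Amazon MQ at the time of this exchange) the same timeout can be raised directly. A minimal sketch, assuming RabbitMQ 3.8.15 or later; the value is in milliseconds and the two-hour figure is only an example:

# rabbitmq.conf - raise the delivery acknowledgement timeout to 2 hours
consumer_timeout = 7200000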

Related

Kafka consumer group not rebalancing when increasing partitions

I have a situation where in my dev environment, my Kafka consumer groups will rebalance and distribute partitions to consumer instances just fine after increasing the partition count of a subscribed topic.
However, when we deploy our product into its Kubernetes environment, we aren't seeing the consumer groups rebalance after increasing the partition count of the topic. Kafka recognizes the increase, which can be seen in the server logs or by describing the topic from the command line. However, the consumer groups won't rebalance and recognize the new partitions. From my local testing, Kafka respects metadata.max.age.ms (default 5 minutes), but in Kubernetes the group never rebalances.
I don't know if it affects anything but we're using static membership.
The consumers are written in Java and use the standard Kafka Java library. No messages are flowing through Kafka, and adding messages doesn't help. I don't see anything special in the server or consumer configurations that differs from my dev environment. Is anyone aware of any configurations that may affect this behavior?
Update:
The issue was only occurring for a new topic. At first, the consumer application was starting before the producer application (which is intended to create the topic), so the consumer was auto-creating the topic. In this scenario, the topic defaulted to 1 partition. When the producer application started, it updated the partition count per its configuration. After that update, we never saw a rebalance.
Next we tried disabling consumer auto topic creation to address this. This prevented the consumer application from auto-creating the topic on subscription. Yet even after the topic was created by the producer app, the consumer group never rebalanced, so the consumer application would sit idle.
According to the documentation I've found, and testing in my dev environment, both of these situations should trigger a rebalance. For whatever reason we don't see that happen in our deployments. My temporary workaround was just to ensure that the topic is already created before allowing my consumers to subscribe. I don't like it, but it works for now. I suspect the different behavior is due to my dev environment running a single Kafka broker vs. the Kubernetes deployments running a cluster, but that's just a guess.
Kafka defaults to refreshing topic metadata only every 5 minutes, so it will not detect partition changes immediately, as you've noticed. The deployment method of your app shouldn't matter, as long as network requests are properly reaching the broker.
Also, check your partition assignment strategy to see if it's using sticky assignment. This will depend on which version of the client you're using, as the defaults changed around 2.7, I think.
"No messages are flowing through Kafka"
If there's no data on the new partitions, there's no real need to rebalance to consume from them.
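A minimal sketch of the consumer settings mentioned above, using the standard Kafka Java client; the broker address, group id, and topic are placeholders, the values are only illustrative, and CooperativeStickyAssignor assumes a 2.4+ client:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class RebalanceTuningExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        // Refresh topic metadata every 30 s instead of the 5-minute default,
        // so newly added partitions are noticed sooner.
        props.put(ConsumerConfig.METADATA_MAX_AGE_CONFIG, "30000");

        // Make the assignment strategy explicit to see whether sticky assignment is in play.
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic"));
    }
}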

How to make ActiveMQ detect duplicate message from message publisher (Idempotent producer)

Does ActiveMQ support an idempotent producer? I know Camel has an idempotent consumer pattern to detect and handle duplicate messages, but I'm wondering if this can be prevented at the source (the producer).
Here is a little background. I have applications that are horizontally scaled and access the same database. There is one particular table that maintains the status of a particular process. These horizontally scaled applications should be able to read the status and invoke another process; however, only one of them should invoke it. Each application periodically polls the database and posts a message to a message broker once the required condition is met. But I want only one of the load-balanced applications to post the message.
One crude approach I'm thinking of is the following (a rough sketch follows below).
On Machine 1:
1. Read the database to check whether the necessary condition is met.
2. Before posting the message to the broker, write a record to another status table with a unique key that identifies the process, and commit. If this operation fails due to a unique key constraint violation, it means the process on another machine succeeded in posting the message.
3. Post the message to the broker.
4. If posting the message fails for some reason, delete the row from the status table based on the unique key / primary key.
The same operations can be performed by the same application running on machine 2, 3, 4, etc.
Below is one pitfall I quickly noticed with this approach.
Assume that Machine 1 is able to complete step 2 but fails at step 3 and continues with step 4. Meanwhile Machine 2, having failed at step 2, will move on without attempting to read the status again and post the message.
To address this, I need to retry step 3 until the message is successfully posted to the broker.
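A rough sketch of steps 2-4 in plain JDBC and JMS, just to make the idea concrete; the table name PROCESS_STATUS, the column process_key, and the method parameters are hypothetical, and it assumes auto-commit is disabled on the JDBC connection:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Session;

public class StatusClaimPublisher {

    // Claim the process via a unique-key insert (step 2), publish (step 3),
    // and release the claim if publishing fails (step 4).
    public static void claimAndPublish(Connection db, Session session,
                                       MessageProducer producer,
                                       String processKey, String payload) throws SQLException {
        try (PreparedStatement ins = db.prepareStatement(
                "INSERT INTO PROCESS_STATUS (process_key) VALUES (?)")) {
            ins.setString(1, processKey);
            ins.executeUpdate();
            db.commit();                 // step 2: this node won the claim
        } catch (SQLException e) {
            // In this sketch any failure (notably a unique-key violation) means
            // another machine holds the claim and will post the message.
            return;
        }
        try {
            producer.send(session.createTextMessage(payload));   // step 3
        } catch (JMSException e) {
            try (PreparedStatement del = db.prepareStatement(
                    "DELETE FROM PROCESS_STATUS WHERE process_key = ?")) {
                del.setString(1, processKey);
                del.executeUpdate();
                db.commit();             // step 4: release the claim so a retry can happen
            }
        }
    }
}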
Another option is to use the Camel idempotent consumer pattern (https://camel.apache.org/components/latest/eips/idempotentConsumer-eip.html), but that is essentially a filter on the consumer side. Though it would serve my purpose, is there a similar approach available out of the box on the message-publishing side?
I wonder whether this approach is even correct, whether there is a better alternative, or whether there are existing libraries that can be used to perform this kind of locking mechanism across JVMs, either local or remote.
It's not clear what version of ActiveMQ you're using (i.e. ActiveMQ 5.x or ActiveMQ Artemis) so I'll try to address this issue for both.
ActiveMQ 5.x doesn't have any built-in support for detecting duplicates sent from clients. However, you could potentially implement this feature using a broker plugin. The only challenge I see here is configuring, managing, and monitoring the cache of duplicate IDs.
ActiveMQ Artemis does have built-in support for detecting duplicates sent from clients. You can read more about duplicate detection in the documentation. Since the broker supports this behavior natively, it provides clean configuration, management, and monitoring.
In either case you'll need to set a special header on each message with "a unique key that identifies the process" just like you would for your potential database solution. Furthermore, using the broker as the duplicate detector is much simpler overall.
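For the Artemis case, a minimal JMS sketch of setting that header on each message; the property name _AMQ_DUPL_ID is the duplicate-detection header Artemis inspects by default, while the broker URL, queue name, and key value are placeholders:

import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class DuplicateAwareProducer {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(session.createQueue("process.status"));

            TextMessage message = session.createTextMessage("process-complete");
            // The unique key that identifies the process; a second message sent with the
            // same _AMQ_DUPL_ID value is ignored by the broker.
            message.setStringProperty("_AMQ_DUPL_ID", "process-42");
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}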
If you're currently using ActiveMQ 5.x but want to move to ActiveMQ Artemis in order to use the duplicate detection feature you don't necessarily need to update your clients as ActiveMQ Artemis fully supports the OpenWire protocol used by 5.x clients. You should just be able to point them to the new instance of ActiveMQ Artemis and have everything work.

Kafka old consumer rebalance issue

In our system we're using an older version of Kafka (0.9.0.1) and the old Scala consumer API in a Tomcat application.
Everything works fine most of the time, but sometimes, when the servers where the consumers run are heavily utilised by other tasks in the app, the consumers become unresponsive. This triggers, as expected, a rebalance, and that consumer is removed from its partitions and other consumers are used.
My question is whether there is an easy way for the consumer to re-register itself when it comes back up.
I know that the old consumers store the partition consumer details in ZooKeeper, and I was thinking we could have a task that periodically checks whether our consumer is registered there and restarts the consumer if not, but I'm not sure exactly what we should check. Can anyone point me to some documentation about the data Kafka stores in ZooKeeper (I haven't found anything in the official documentation, sadly)?
Basically, what you want is fixed assignments, so that consumer groups never rebalance. If there were a way to disable automatic consumer rebalancing in the old Scala client, or maybe even increase the rebalance timeout to a much higher value, that could also work, but I couldn't find how to do that with the old Scala consumer.
However, it is possible to assign fixed topic/partitions when using the newer Java consumer, which is also available in that same 0.9 Kafka version. Look for "Subscribing To Specific Partitions" in the Javadocs:
https://kafka.apache.org/090/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
Subscribing To Specific Partitions
In the previous examples we subscribed to the topics we were interested in and let Kafka give our particular process a fair share of the partitions for those topics. This provides a simple load balancing mechanism so multiple instances of our program can divide up the work of processing records.
In this mode the consumer will just get the partitions it subscribes to, and if the consumer instance fails no attempt will be made to rebalance partitions to other instances.
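For illustration, a minimal sketch of that manual-assignment mode with the newer Java consumer (this API has been present since 0.9; the broker address, topic, and partition are placeholders):

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class FixedAssignmentConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // assign() pins this instance to partition 0 of "my-topic"; no group
        // rebalancing is performed for manually assigned partitions.
        consumer.assign(Arrays.asList(new TopicPartition("my-topic", 0)));

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}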

Delayed Queue implementation in Storm – Kafka, Cassandra, Redis or Beanstalk?

I have a Storm topology that processes messages from Kafka and makes HTTP calls / saves to Cassandra based on the task at hand. I process the messages as soon as they arrive. However, a few messages are not processed completely because of the responses from external sources such as an HTTP server. I would like to implement an exponential backoff mechanism to retry after some time in case the HTTP server does not respond or returns an error. I can think of a few ways to achieve this and would like to know which of them is the better solution, or whether there is another fault-tolerant solution I could use. Since this is used to implement exponential backoff, each message will have a different delay time.
1. Send it to another topic in Kafka, which is consumed later. This is my preferred solution. I know we can use Kafka offsets to consume a message at a later stage; however, I could not find documentation or sample code for doing this. It would be really helpful if anyone could help me out with it.
2. Write the message to Cassandra/Redis and write a scheduler that fetches the messages which are not yet processed and are ready to be consumed, and sends them to Kafka so that my Storm topology can consume them. (This is an existing solution in another, non-Storm legacy project.)
3. Send it to Beanstalk with a delay. (Also an existing solution in another non-Storm legacy project; however, I would like to avoid this and use it only if I run out of options.)
While this is pretty much what I would like to do, I am not able to find documentation to implement delayProcessingUntil as mentioned in "Kafka - Delayed Queue implementation using high level consumer".
I have done scheduled jobs from a data store and delays using Beanstalk in the past, but I would prefer to use Kafka.
The Kafka spout has exponential backoff message retry built in. You can configure the initial delay, delay multiplier, and maximum delay through the spout configuration. If there is an error in the bolt, you can call collector.fail(input). After that, you just leave it to the spout to do the retry.
https://github.com/apache/storm/blob/v0.10.0/external/storm-kafka/src/jvm/storm/kafka/ExponentialBackoffMsgRetryManager.java
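A rough sketch of wiring those retry settings into the spout, assuming the storm-kafka module from that 0.10.x line; the ZooKeeper address, topic, and ids are placeholders, and the retryInitialDelayMs / retryDelayMultiplier / retryDelayMaxMs field names should be checked against your exact storm-kafka version:

import backtype.storm.spout.SchemeAsMultiScheme;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class RetryingKafkaSpoutFactory {

    // Builds a Kafka spout whose failed tuples are retried with exponential backoff.
    public static KafkaSpout build() {
        BrokerHosts hosts = new ZkHosts("localhost:2181");
        SpoutConfig spoutConfig = new SpoutConfig(hosts, "my-topic", "/kafka-spout", "my-spout-id");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        spoutConfig.retryInitialDelayMs = 1000;      // first retry after 1 second
        spoutConfig.retryDelayMultiplier = 2.0;      // double the delay on each failure
        spoutConfig.retryDelayMaxMs = 60 * 1000;     // cap the delay at 1 minute

        return new KafkaSpout(spoutConfig);
    }
}

In the bolt, calling collector.fail(input) hands the tuple back to the spout, which then applies this retry schedule.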
I think your use case describes the need for a database rather than a queue. You want to temporarily store records until their time and then remove them so they don't show up in future searches. Trying to do that in a queue would be awkward at best, as your analysis shows.
I suggest you create another column family in Cassandra to hold these delayed requests. You'd store the request itself along with a time to retry. Whether you'd want to also have a time series of failed HTTP attempts and related data is up to you. As a delayed request is finally fulfilled, you'd delete the corresponding row from the CF. The search for delayed requests is straightforward, too.
Of course, any database, even a file on the local drive or in HDFS could work, too.
You might be interested in the Kafka Retry project https://github.com/IBM/kafka-retry. It provides a delayed retry queue using a single retry topic.

Detecting ActiveMQ flow control

I have a production system that uses ActiveMQ (5.3.2) to send messages from server A to server B. A few weeks ago, the system inexplicably started taking 10+ seconds to send a message. After a reboot of the producer, the system worked fine.
After investigating, I'm pretty sure this is due to producer flow control (I have a fairly standard ActiveMQ setup). The day before this happened, my consumer software had been acting erratically (for other reasons) and had even stopped accepting connections for a while, so I'm guessing that triggered it. (It does puzzle me that the requests were still being throttled a day later.)
Question: how can I confirm that the requests were being throttled? I took a heap dump of the server; is there data in memory I can look for?
Edit: I've found the following:
WireFormatNegotiator.tcpNoDelayEnabled=false for one of the three WireFormatNegotiator instances in memory. I'm trying to figure out what sets this.
And second (and more importantly), is there a way I can use JMX to tell whether messages are being throttled? I'd like to set up a Nagios alert to notify me if this happens again in the future. Which property should I check with JMX?
You can configure the broker so that producer sends fail with javax.jms.ResourceAllocationException exceptions, which the client can then detect, log, etc. Just set one of the following in the broker's systemUsage configuration:
<systemUsage>
  <systemUsage sendFailIfNoSpaceAfterTimeout="3000">
    ...
  </systemUsage>
</systemUsage>

...or set sendFailIfNoSpace="true" on the inner systemUsage element instead.
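On the producer side, a minimal sketch of detecting that condition with the 5.x JMS client (the broker URL and queue name are placeholders):

import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.ResourceAllocationException;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class FlowControlAwareProducer {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory("tcp://serverB:61616");
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(session.createQueue("test.queue"));
            try {
                producer.send(session.createTextMessage("payload"));
            } catch (ResourceAllocationException e) {
                // Thrown when sendFailIfNoSpace / sendFailIfNoSpaceAfterTimeout is configured
                // and the broker's usage limits are exhausted -- log it or raise an alert here.
                System.err.println("Broker is applying flow control: " + e.getMessage());
            }
        } finally {
            connection.close();
        }
    }
}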
