How to make ActiveMQ detect duplicate messages from a message publisher (idempotent producer) - java

Does ActiveMQ support an idempotent producer? I know Camel has an idempotent consumer pattern to detect and handle duplicate messages, but I'm wondering if this can be prevented at the source (the producer).
Here is a little background. I have horizontally scaled applications accessing the same database. One particular table maintains the status of a particular process. Each of these applications should be able to read the status and invoke another process, but only one of them should actually invoke it. The application periodically polls the database and, once the required condition is met, posts a message to a message broker. I want only one of the load-balanced applications to post that message.
One crude approach I'm considering:
On Machine 1:
1. Read the database to check whether the necessary condition is met.
2. Before posting the message to the broker, write a record to another status table with a unique key that identifies the process, and commit. If this insert fails due to a unique key constraint violation, a process on another machine has already succeeded in posting the message.
3. Post the message to the broker.
4. If posting the message fails for some reason, delete the record from the status table using the unique key/primary key.
The same sequence is performed by the same application running on machines 2, 3, 4, etc.
Below is one pitfall I quickly noticed with this approach.
Suppose Machine 1 completes step 2 but fails at step 3 and continues with step 4. Meanwhile Machine 2, having failed at step 2, moves on without attempting to read the status again, so the message never gets posted.
To address this, I need to retry step 3 until the message is successfully posted to the broker (see the sketch below).
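A minimal sketch of steps 1-4 above, assuming plain JDBC with auto-commit off; the BrokerSender callback stands in for the actual post, and the table and column names are illustrative:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class SingleWriterGuard {

    /** Hypothetical callback that actually posts the message to the broker. */
    public interface BrokerSender {
        void send(String processKey) throws Exception;
    }

    // Step 2: try to claim the process by inserting a unique key.
    // A constraint violation means another machine already claimed it.
    static boolean claim(Connection db, String processKey) throws SQLException {
        try (PreparedStatement ps = db.prepareStatement(
                "INSERT INTO process_status (process_key) VALUES (?)")) {
            ps.setString(1, processKey);
            ps.executeUpdate();
            db.commit();
            return true;
        } catch (SQLIntegrityConstraintViolationException e) {
            db.rollback();
            return false;
        }
    }

    // Steps 3 and 4: post, and release the claim if posting fails.
    static void postOnce(Connection db, BrokerSender broker, String processKey)
            throws Exception {
        if (!claim(db, processKey)) {
            return; // another machine won the race
        }
        try {
            broker.send(processKey); // step 3; retry internally until it succeeds
        } catch (Exception e) {
            try (PreparedStatement ps = db.prepareStatement(
                    "DELETE FROM process_status WHERE process_key = ?")) {
                ps.setString(1, processKey); // step 4: release the claim
                ps.executeUpdate();
                db.commit();
            }
            throw e;
        }
    }
}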
Another option is to use the https://camel.apache.org/components/latest/eips/idempotentConsumer-eip.html pattern, but that is essentially a filter on the consumer side. Though it would serve my purpose, is there a similar approach available out of the box on the message publishing side?
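For reference, that consumer-side filter looks roughly like this (a sketch; the header name and cache size are illustrative, and the repository's package varies by Camel version):

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.support.processor.idempotent.MemoryIdempotentRepository;

public class DedupRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("activemq:queue:status")
            // Skip any message whose processKey header has been seen already.
            .idempotentConsumer(header("processKey"),
                    MemoryIdempotentRepository.memoryIdempotentRepository(200))
            .to("bean:invokeProcess");
    }
}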
I wonder whether this approach is even correct, whether there is a better alternative, or whether there are existing libraries that provide this kind of locking mechanism across JVMs, either local or remote.

It's not clear what version of ActiveMQ you're using (i.e. ActiveMQ 5.x or ActiveMQ Artemis), so I'll try to address this issue for both.
ActiveMQ 5.x doesn't have any built-in support for detecting duplicates sent from clients. However, you could potentially implement this feature using a broker plugin. The only challenge I see here is configuring, managing, and monitoring the cache of duplicate IDs.
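For what it's worth, such a 5.x plugin could look roughly like this (a sketch, not a tested implementation; the property name, cache size, and eviction policy are all illustrative, and a real cache would need the management hooks mentioned above):

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

import org.apache.activemq.broker.Broker;
import org.apache.activemq.broker.BrokerFilter;
import org.apache.activemq.broker.ProducerBrokerExchange;
import org.apache.activemq.command.Message;

public class DuplicateFilterBroker extends BrokerFilter {

    // Bounded, insertion-ordered cache of recently seen duplicate IDs.
    private final Set<Object> seenIds = Collections.newSetFromMap(
            Collections.synchronizedMap(new LinkedHashMap<Object, Boolean>() {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Object, Boolean> eldest) {
                    return size() > 10_000; // evict the oldest entries
                }
            }));

    public DuplicateFilterBroker(Broker next) {
        super(next);
    }

    @Override
    public void send(ProducerBrokerExchange exchange, Message message) throws Exception {
        Object dupId = message.getProperty("dupId"); // hypothetical property name
        if (dupId == null || seenIds.add(dupId)) {
            super.send(exchange, message); // first sighting: pass it on
        }
        // otherwise silently drop the duplicate
    }
}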
ActiveMQ Artemis does have built-in support for detecting duplicates sent from clients. You can read more about duplicate detection in the documentation. Since the broker supports this behavior natively, it provides clean configuration, management, and monitoring.
In either case you'll need to set a special header on each message with "a unique key that identifies the process," just like you would for your potential database solution. Overall, using the broker as the duplicate detector is much simpler.
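With Artemis and a JMS client, that header is the string property _AMQ_DUPL_ID (the constant Message.HDR_DUPLICATE_DETECTION_ID in the Artemis API). A minimal sketch; the broker URL, queue name, and ID value are illustrative:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class IdempotentProducer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        try (Connection connection = factory.createConnection()) {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(session.createQueue("status.queue"));

            TextMessage message = session.createTextMessage("condition-met");
            // The unique key identifying the process; the broker drops any
            // later message carrying the same _AMQ_DUPL_ID.
            message.setStringProperty("_AMQ_DUPL_ID", "process-42");
            producer.send(message);
        }
    }
}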
If you're currently using ActiveMQ 5.x but want to move to ActiveMQ Artemis in order to use the duplicate detection feature, you don't necessarily need to update your clients, as ActiveMQ Artemis fully supports the OpenWire protocol used by 5.x clients. You should be able to simply point them at the new ActiveMQ Artemis instance and have everything work.

Related

RabbitMQ Delivery Acknowledgement Timeout

I am using a managed RabbitMQ cluster through AWS Amazon MQ. If the consumers finish their work quickly, everything works fine. However, in a few scenarios some consumers take more than 30 minutes to complete their processing.
In those scenarios, RabbitMQ removes the consumer and makes the same messages visible again in the queue. Because of this, another consumer picks them up and starts processing, and this happens in a loop. As a result the same transaction gets executed repeatedly, and I lose the consumer as well.
I am not setting any AcknowledgeMode, so I believe it's AUTO by default, which has the 30-minute limit.
Is there any way to increase the Delivery Acknowledgement Timeout for AUTO mode?
Or please let me know if anyone has any other solutions for this.
Reply From AWS Support:
Consumer timeout is now configurable, but the change can be made only by the service team. The change will be permanent irrespective of version.
So you may update RabbitMQ to the latest version; there is no need to stick with 3.8.11. Provide your broker details and desired timeout, and they should be able to do it for you.
This is the response from AWS support.
From my understanding, I see that your workload is currently affected by the consumer_timeout parameter that was introduced in v3.8.15.
We have had a number of reach-outs due to this. Unfortunately, the service team has confirmed that while they can manually edit rabbitmq.conf, the change will be overwritten on the next reboot or failover, so this is not a recommended solution. It would also mean that all security patching on brokers with a manual change applied would have to be paused. Currently the service does not support custom user configurations for RabbitMQ via this configuration file. They have confirmed they are looking to address this in the future, but are not able to give an ETA on when it will be available.
From the RabbitMQ GitHub, it seems this was added for quorum queues in v3.8.15 (https://github.com/rabbitmq/rabbitmq-server/releases/tag/v3.8.15), but it appears to apply to all consumers (https://github.com/rabbitmq/rabbitmq-server/pull/2990).
Unfortunately, RabbitMQ itself does not support downgrades (https://www.rabbitmq.com/upgrade.html).
Thus the recommended workaround, and the safest action from the service team as of now, is to create a new broker on an older version (3.8.11) and set auto minor version upgrade to false so that it won't be upgraded.
Then export the configuration from the existing RabbitMQ instance, import it into the new instance, and use that instance going forward.
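For comparison, on a self-managed RabbitMQ broker (3.8.15 or later), where rabbitmq.conf is under your control, the same change is a single line; the value is in milliseconds, and the one-hour figure here is illustrative:

# rabbitmq.conf
consumer_timeout = 3600000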

Delete mail using Camel without the consumer

Hi all,
In the software I'm developing, I have different Camel routes that work on data which is (in this case) loaded from an IMAP server using the camel-mail component.
Each of those routes does something with the data and then gives the data to the next route. They are dynamically configured at runtime.
In between those routes is an embedded ActiveMQ server, which each route uses to load the data from and save the data to (for the next route to pick it up).
Because of this structure I'm having a special case with the camel-mail consumer.
When a mail is loaded and sent to the first ActiveMQ queue, it is immediately deleted/marked as read (depending on the settings on the mail consumer), but the actual processing of the mail has not concluded yet, as the subsequent routes still have to process it.
This is a simplified view:
from("imaps://imap.server.com?...")
// Format mail in a way the other routes understand
.to("activemq:queue1"); // After this the mail is delete on the imap server
from("activemq:queue1")
// do some processing
.to("activemq:queue2");
from("activemq:queue2")
// Do some final processing
.to("..."); // NOW the mail should be delete on the imap server
This issue is even more of a problem with the error handling I do.
Every route in this "chain" sends failed exchanges to a deadLetterQueue on the ActiveMQ server. This way there is one error-handling route, which picks up the failed exchanges and deals with them, no matter where the crash happened.
In case there is a problem, I want the email on the IMAP server to be handled differently (maybe even do nothing and try again on the next poll).
As Camel's MEP returns the exchange to the (mail) consumer when the route ends, i.e. when the exchange is handed to the queue, I can't use the consumer to delete the mails after the whole process has ended.
Unfortunately I also don't see a delete option on the mail producer (which makes sense, I guess, because that's not how IMAP works).
I could also use SMTP for this if that's necessary.
Does anybody have an idea how I could achieve this using no connector other than the Camel mail component to connect to the mail server?
Greets and thanks in advance
Chris
Edit:
Adding the parameter exchangePattern=InOut to the JMS endpoints (.to("activemq:queue1?exchangePattern=InOut")) makes the mail component wait for the whole process to finish.
The problem with that is that we lose the big advantage of ActiveMQ: all routes being independent of each other. This is important so we don't run into issues with consuming the mail when a later route takes a long time to process, which is very likely to happen.
So ideally we'd find a solution where the mail is deleted without any component waiting for something else to finish.
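For reference, the workaround from the edit applied to the simplified view above (a sketch; whether the intermediate hops also need InOut depends on your setup):

from("imaps://imap.server.com?...")
// Format mail in a way the other routes understand
.to("activemq:queue1?exchangePattern=InOut"); // mail consumer now blocks here
from("activemq:queue1")
// do some processing
.to("activemq:queue2?exchangePattern=InOut"); // keep the request/reply chain going
from("activemq:queue2")
// Do some final processing
.to("..."); // only now does the mail consumer complete and delete the mail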

Delayed Queue implementation in Storm – Kafka, Cassandra, Redis or Beanstalk?

I have a Storm topology that processes messages from Kafka and, depending on the task at hand, makes an HTTP call or saves to Cassandra. I process messages as soon as they arrive. However, a few messages are not processed completely due to the response from external sources such as an HTTP endpoint. I would like to implement an exponential backoff mechanism for retries, in case the HTTP server does not respond or returns an error, so the message is retried after some time. I can think of a few ways to achieve this; I would like to know which of them is the better solution, and whether there is another fault-tolerant approach I could use. Since this implements an exponential backoff, each message will have a different delay time.
1. Send it to another topic in Kafka which is consumed later. This is my preferred solution. I know we can use Kafka offsets to consume a message at a later stage, but I could not find documentation or sample code for doing this. It would be really helpful if anyone could help me out.
2. Write the message to Cassandra/Redis and use a scheduler to fetch the messages which are not yet processed and are ready to be consumed, and send them to Kafka so that my Storm topology can consume them. (This is an existing solution in another legacy, non-Storm project.)
3. Send it to Beanstalk with a delay. (Also an existing solution in another legacy, non-Storm project; I would like to avoid this and use it only if I run out of options.)
While this is pretty much what I would like to do, I am not able to find documentation to implement delayProcessingUntil as mentioned in Kafka - Delayed Queue implementation using high level consumer.
I have done scheduled jobs from a data store and delays using Beanstalk in the past, but I would prefer to use Kafka.
The Kafka spout has exponential backoff message retry built in. You can configure the initial delay, delay multiplier, and maximum delay through the spout configuration. If there is an error in the bolt, you can call collector.fail(input). After that, just leave it to the spout to do the retry.
https://github.com/apache/storm/blob/v0.10.0/external/storm-kafka/src/jvm/storm/kafka/ExponentialBackoffMsgRetryManager.java
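A sketch of wiring this up, assuming the old storm-kafka module (0.10.x, matching the link above); topic, ZooKeeper details, and delays are illustrative:

import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;
import storm.kafka.BrokerHosts;
import storm.kafka.SpoutConfig;
import storm.kafka.ZkHosts;

public class BackoffExample {

    // Spout side: configure the built-in exponential backoff retry.
    public static SpoutConfig spoutConfig() {
        BrokerHosts hosts = new ZkHosts("zookeeper:2181");
        SpoutConfig cfg = new SpoutConfig(hosts, "events", "/kafka-spout", "retry-demo");
        cfg.retryInitialDelayMs = 1_000;        // first retry after 1s
        cfg.retryDelayMultiplier = 2.0;         // double the delay each attempt
        cfg.retryDelayMaxMs = 10 * 60 * 1_000;  // cap at 10 minutes
        return cfg;
    }

    // Bolt side: fail the tuple on an error so the spout replays it later.
    public static class HttpBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            try {
                callExternalService(input); // hypothetical HTTP call
                collector.ack(input);
            } catch (Exception e) {
                collector.fail(input);      // spout schedules a delayed retry
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // no output fields in this sketch
        }

        private void callExternalService(Tuple input) {
            // real HTTP call goes here
        }
    }
}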
I think your use case describes the need for a database rather than a queue. You want to temporarily store records until their time comes and then remove them so they don't show up in future searches. Trying to do that in a queue would be awkward at best, as your analysis shows.
I suggest you create another column family in Cassandra to hold these delayed requests. You'd store the request itself along with a time to retry. Whether you'd want to also have a time series of failed HTTP attempts and related data is up to you. As a delayed request is finally fulfilled, you'd delete the corresponding row from the CF. The search for delayed requests is straightforward, too.
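A sketch of that column family and the due-rows query, assuming the DataStax Java driver 4.x and a local node; the keyspace, table, and column names are illustrative:

import com.datastax.oss.driver.api.core.CqlSession;

public class DelayedRequestStore {
    public static void main(String[] args) {
        // Assumes a local Cassandra node and an existing "retry_ks" keyspace.
        try (CqlSession session = CqlSession.builder()
                .withKeyspace("retry_ks")
                .build()) {

            // One row per delayed request, clustered by its retry time.
            session.execute(
                "CREATE TABLE IF NOT EXISTS delayed_requests ("
                + " bucket text, retry_at timestamp, request text,"
                + " PRIMARY KEY (bucket, retry_at))");

            // A scheduler periodically pulls the rows that are due,
            // re-submits them to Kafka, and deletes each row once fulfilled.
            session.execute(
                "SELECT request FROM delayed_requests"
                + " WHERE bucket = 'b0' AND retry_at <= toTimestamp(now())");
        }
    }
}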
Of course, any database, even a file on the local drive or in HDFS could work, too.
You might be interested in the Kafka Retry project https://github.com/IBM/kafka-retry. It provides a delayed retry queue using a single retry topic.

Tib RV - listing all the processes that are publishing to a given topic

We have RV messaging systems publishing and receiving messages. Recently some underlying JARs were upgraded - these are serialization JARs used by all publishers and subscribers. However, it seems that some of the publishers are still referencing old versions of the serialization JARs, and therefore the receivers fail when trying to deserialize received messages.
Obviously, restarting these publisher services should fix the problem. However, how do I identify all publishers sending messages to a particular topic? Is there some RV admin way of listing all the processes that are publishing to a given topic?
I just gave a similar answer on another question:
There is a really great tool for this called Rai Insight.
Basically, it can sit on a box and silently listen to all the multicast data and present statistics, even in real time. We used it to monitor traffic flow spikes with just a few seconds' delay.
It can give you traffic statistics broken down by multicast group, service number, or even sending machine: traffic flow peak/average, retransmission rate peak/average. Anything you can think of.
It will also give you per-service per-topic information.

How is it possible to reliably send JMS message? (fail over MessageProducer.send() errors)

Is it possible to reliably send a JMS message to a destination? By reliably I mean ensuring that if, e.g., the MessageProducer.send() call fails for some reason, it will be retried automatically. I realize that a transacted session may use .recover() as a last resort, but what about retrying? E.g. I have an intermittent network failure between when the session was established and when I attempted to send a message. How would recover() help in this case?
As far as I know, JMS does not support such behavior. You could search among vendor-specific extensions but, IMHO, it is unlikely that you will find something that suits your needs.
I see only two solutions to your problem:
Implement it yourself. You can manage the JMS session manually, catch any exception, and, if needed, use the "set rollback only" function of the transaction manager to invalidate the transaction. (A retry sketch follows below.)
Use a local queue to store messages and a background service to move them to the target remote queue. Note that many queue managers support this, e.g. the store-and-forward queues of ActiveMQ. Obviously, this way your transaction boundary will not include the remote queue.
I know that the 2nd solution is not a full answer to your problem but, many times, it is sufficient.
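A minimal sketch of the hand-rolled retry from option 1, assuming a non-transacted session; note that retrying after an ambiguous failure can produce the "functionally duplicate" messages described in the answer below:

import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageProducer;

public class RetryingSender {
    // Retries the send with exponential backoff until it succeeds or
    // maxAttempts is exhausted.
    public static void sendWithRetry(MessageProducer producer, Message message, int maxAttempts)
            throws JMSException, InterruptedException {
        long backoffMs = 500;
        for (int attempt = 1; ; attempt++) {
            try {
                producer.send(message);
                return;
            } catch (JMSException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the failure
                }
                Thread.sleep(backoffMs);
                backoffMs = Math.min(backoffMs * 2, 30_000); // cap the delay
            }
        }
    }
}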
JMS doesn't specify the behavior you are looking for. In fact, JMS specifically addresses problems due to network failure by noting that you may get the same message twice and calls this a "functionally duplicate" message since from the point of view of the JMS broker, it has only been delivered once.
Since this is not part of JMS your answer lies in the different vendor implementations. For example, WebSphere MQ has a feature called "Multi-Instance Queue Manager" as of v7.0.1. A v7.0.1 client application will automatically retry the connection and even follow the QMgr from the primary to the secondary node in the event of a failure. The application blocks while this occurs and is not aware of the failover.
However, even with this behavior, your app still needs to code for the failure. For example, if using the WMQ automatic reconnect (or any provider's reconnect for that matter) you probably want to tune the length of time the app might block waiting to recover the connection so that the user doesn't experience an indefinite hang. When the call unblocks, the transaction is rolled back and any retry must occur in the code. This is appropriate since the transaction is associated with a connection that is no longer valid.
