We have a use case that holds active jobs in a RabbitMQ job queue. Clients, when they are free, pull jobs from this queue. Pretty normal. But in our case, we do not ACK the jobs. We let them sit in the Unacked state so that if a client dies, the job goes to a pre-defined dead-letter queue. We then have a process that pulls messages from the dead-letter queue and decides either to requeue them to the original job queue or to discard them.
This has worked well for a long time. Now that we have upgraded to a newer version of RabbitMQ, we are getting disconnects with PRECONDITION_FAILED because the default acknowledgement timeout of 30 minutes has expired.
Beyond removing this from the server, does anyone know a way to configure this on a per-message level?
Some might say to just ACK the job and use a handler to return it to the dead-letter queue if needed. Sorry, but that will not work for us.
So, any thoughts?
No, there is no per-message way to configure this at this time. You should configure the default timeout to be greater than the longest expected job duration. Note that if you are using quorum queues, this may cause disk usage to grow, because the log files cannot be compacted while messages are outstanding.
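For example, on recent releases the default can be raised in rabbitmq.conf (the value is in milliseconds; the two-hour figure below is only an illustration, pick something longer than your slowest job). On older 3.8.x releases this key may only be settable via advanced.config:

    # rabbitmq.conf -- delivery acknowledgement timeout, in milliseconds
    consumer_timeout = 7200000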
We may make this timeout configurable in a more granular way, so please keep an eye on future RabbitMQ releases for that.
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
I have an architecture with multiple pods subscribing to a GCP Topic.
Every pod handles messages while it's up but is not interested in receiving messages it missed when it was not up.
In ActiveMQ this was done with non-persistent messages, but I don't see an equivalent in GCP.
The only thing I have thought of is the message retention duration, which has a minimum of 10 minutes.
Is this possible in GCP, and where can it be configured?
There is no option in Cloud Pub/Sub to disable storage. You have two options.
As you suggest, set the message retention duration to the minimum, 10 minutes. This does mean you'll get messages that are up to ten minutes old when the pod starts up. The disadvantage to this approach is that if there is an issue with your pod and it falls more than ten minutes behind in processing messages, then it will not get those messages, even when it is up.
Use the seek operation to seek forward to the current timestamp. When a pod starts up, the first thing it can do is issue a seek, which acknowledges all messages published before the provided timestamp. Note that this operation is eventually consistent, so you may still get some older messages when you initially start your subscriber (or, if you are using push, once your endpoint is up). Also keep in mind that seek is an administrative operation and is therefore limited to 6,000 operations per minute (100 operations per second). If you have a lot of pods that restart often, you may run into this limit.
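As an illustration of the second option, a minimal sketch using the Java client library; the project and subscription names are placeholders:

    import com.google.cloud.pubsub.v1.SubscriptionAdminClient;
    import com.google.protobuf.Timestamp;
    import com.google.pubsub.v1.SeekRequest;
    import com.google.pubsub.v1.SubscriptionName;

    public final class SeekToNow {
        public static void main(String[] args) throws Exception {
            try (SubscriptionAdminClient admin = SubscriptionAdminClient.create()) {
                long nowMillis = System.currentTimeMillis();
                Timestamp now = Timestamp.newBuilder()
                        .setSeconds(nowMillis / 1000)
                        .setNanos((int) ((nowMillis % 1000) * 1_000_000))
                        .build();
                // Acknowledges everything published to this subscription before "now".
                admin.seek(SeekRequest.newBuilder()
                        .setSubscription(
                                SubscriptionName.of("my-project", "my-subscription").toString())
                        .setTime(now)
                        .build());
            }
        }
    }

Running this once at pod startup, before creating the subscriber, implements the "skip everything I missed" behavior described above.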
I am using a managed RabbitMQ cluster through AWS Amazon MQ. If the consumers finish their work quickly, everything works fine. However, in a few scenarios some consumers take more than 30 minutes to complete their processing.
In those scenarios, RabbitMQ removes the consumer and makes the same messages visible again in the queue. Because of this, another consumer picks them up and starts processing, and this repeats in a loop: the same transaction gets executed again, and I lose the consumer as well.
I am not setting any AcknowledgeMode, so I believe it is AUTO by default, with a 30-minute limit.
Is there any way to increase the Delivery Acknowledgement Timeout for AUTO mode?
Or please let me know if anyone has any other solutions for this.
Reply from AWS Support:
The consumer timeout is now configurable, but the change can be made only by the service team. The change is permanent, irrespective of version.
So you can update RabbitMQ to the latest version; there is no need to stick with 3.8.11. Provide them with your broker details and the desired timeout, and they should be able to apply it for you.
This is the response from AWS support.
From my understanding, your workload is currently affected by the consumer_timeout parameter that was introduced in v3.8.15.
We have had a number of reach-outs about this. Unfortunately, the service team has confirmed that while they can manually edit rabbitmq.conf, the change will be overwritten on the next reboot or failover, so this is not a recommended solution. It would also mean that all security patching on the brokers where a manual change is applied would have to be paused. Currently, the service does not support custom user configurations for RabbitMQ via this configuration file. The team has confirmed they are looking to address this in the future; however, they are not able to give an ETA on when it will be available.
From the RabbitMQ GitHub, it seems this was added for quorum queues in v3.8.15 (https://github.com/rabbitmq/rabbitmq-server/releases/tag/v3.8.15), but it appears to apply to all consumers (https://github.com/rabbitmq/rabbitmq-server/pull/2990).
Unfortunately, RabbitMQ itself does not support downgrades (https://www.rabbitmq.com/upgrade.html )
Thus the recommended workaround, and the safest action from the service team as of now, is to create a new broker on an older version (3.8.11) and set auto minor version upgrade to false, so that it won't be upgraded.
Then export the configuration from the existing RabbitMQ instance, import it into the new instance, and use that instance going forward.
I've been browsing the forums for the last few days and have tried almost everything I could find, but without any luck.
The situation: inside our Java web application we have ActiveMQ 5.7 (I know it's very old; eventually we will upgrade to a newer version, but for various reasons that isn't possible right now). We have one broker and multiple consumers.
When I start the servers (I have tried 2, 3, 4, and more), everything is fine. The servers communicate with each other, and QUEUE messages are consumed instantly. But when I leave the servers idle (for example, to finally catch some sleep ;) ), that is no longer the case. Messages are stuck in the database and are not being consumed. The only way to get them delivered is to restart the server.
Part of my configuration (we keep it in a properties file; this is the current state, though I have tried many different combinations):
BrokerServiceURI=broker:(tcp://0.0.0.0:{0})/{1}?persistent=true&useJmx=false&populateJMSXUserID=false&useShutdownHook=false&deleteAllMessagesOnStartup=false&enableStatistics=true
ConnectionFactoryURI=failover://({0})?initialReconnectDelay=100&timeout=6000
ConnectionFactoryServerURI=tcp://{0}:{1}?keepAlive=true&soTimeout=100&wireFormat.cacheEnabled=false&wireFormat.tightEncodingEnabled=false&wireFormat.maxInactivityDuration=0
BrokerService.startAsync=true
BrokerService.networkConnectorStartAsync=true
BrokerService.keepDurableSubsActive=false
Do you have a clue what might be going on?
I can't actually tell the reason from the description above, but I can list a few checks that are fresh in my mind. Please confirm whether the following are valid for you:
Can you check the consumer connections?
Are the consumer sessions still active?
If all the consumer connections are up, check a thread dump to see whether the active consumer threads (I'm assuming you created consumer threads; correct me if I'm wrong) are in the RUNNING or WAITING state. (This happened to me: all the consumers were up, but another thread in the server was holding a lock on the Logger while posting a message to Slack, so the consumer threads sat in the WAITING state.) See the sketch after this list for one way to capture this.
Check the dispatch queue size for each consumer, and compare it with that consumer's prefetch limit.
Is there a JMSXGroupID you are assigning to each message?
Can you tell us a little more about your consumer/producer/broker configurations?
We are running a high-throughput system that uses TIBCO EMS JMS to pass large numbers of messages between our main server and our client connections. We've gathered some statistics and determined that JMS is causing a lot of latency. How can we make TIBCO JMS more performant? Are there any resources that give a good discussion of this topic?
Using non-persistent messages is one option if you don't need persistence.
Note that even if you do need persistence, it is sometimes better to use non-persistent messages and, in case of a crash, perform a different recovery action (such as resending all messages).
This is relevant if:
crashes are rare (as the recovery takes time)
you can easily detect a crash
you can handle duplicate messages (you may not know exactly which messages were delivered before the crash)
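As a minimal plain-JMS sketch of the non-persistent option (the Session and Queue wiring is assumed to exist elsewhere, and a real sender would reuse the producer across sends):

    import javax.jms.DeliveryMode;
    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;

    public final class NonPersistentSend {
        // Send a message the broker will not write to disk:
        // faster, but lost if the broker crashes before delivery.
        static void send(Session session, Queue queue, String body) throws JMSException {
            MessageProducer producer = session.createProducer(queue);
            try {
                producer.setDeliveryMode(DeliveryMode.NON_PERSISTENT);
                producer.send(session.createTextMessage(body));
            } finally {
                producer.close();
            }
        }
    }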
EMS also provides some mechanisms that are persistent but less bulletproof than classic guaranteed delivery.
These include:
instead of "exactly once" message delivery, you can use "at least once" or "at most once" delivery;
you can use the prefetch mechanism, which causes the client to fetch messages into memory before your application requests them.
EMS should not be the bottleneck. I've done testing, and we achieved very high throughput on our server.
You need to determine where the bottleneck is. Is the problem in the producer of the message or in the consumer? Are messages piling up on the queue?
What type of scenario are you running: pub/sub or request/reply?
Are temporary queues piling up? Too many temporary queues can cause performance issues, mostly when they linger because something wasn't closed properly.
Are you publishing to a topic with durable subscribers? If so, try bridging the topic to a queue and reading from that. Durable subscribers can also cause a performance hiccup, since the server needs to track who has copies of all messages.
Ensure that your sending process has one session and makes multiple calls through that session. Don't open a new session for each operation; re-use where possible. Do the same for the consumer (see the sketch after this list).
Make sure you CLOSE when you are done. EMS doesn't clean things up for you, so if you make a connection and just kill your app, the connection is still there, consuming resources.
Review your tolerance for lost messages in the event of a crash. If you are doing client acknowledge and it doesn't matter if you crash while processing a message, switch to auto acknowledge. Also, I believe that with TEMS (Tibco EMS for WCF) there is a problem with session acknowledgement, where a message is only acknowledged once the whole message is processed; we switched from client ACK to dups-OK and it worked better.
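For the session re-use point, a minimal plain-JMS sketch (the ConnectionFactory and Queue are assumed to be obtained elsewhere, e.g. via JNDI; names are illustrative):

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;

    public final class ReusedSessionSender {
        private final Connection connection;
        private final Session session;
        private final MessageProducer producer;

        // One connection/session/producer created up front and reused for every send.
        ReusedSessionSender(ConnectionFactory factory, Queue queue) throws JMSException {
            connection = factory.createConnection();
            session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            producer = session.createProducer(queue);
            connection.start();
        }

        void send(String body) throws JMSException {
            producer.send(session.createTextMessage(body)); // no per-send setup cost
        }

        void close() throws JMSException {
            connection.close(); // closes the session and producer too
        }
    }

Creating the connection, session, and producer once and reusing them avoids the per-send setup cost that kills throughput when a fresh session is opened for every operation.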
My Java EE application sends JMS messages to a queue continuously, but sometimes the consumer application stops receiving them. This causes the JMS queue to grow very large, or even fill up, which brings down the server.
My server is JBoss or WebSphere. Do the application servers provide a strategy to remove "timed-out" JMS messages?
What is a good strategy for handling a large JMS queue? Thanks!
With any asynchronous messaging you must deal with the "fast producer/slow consumer" problem. There are a number of ways to deal with this.
Add consumers. With WebSphere MQ you can trigger a queue based on depth. Some shops use this to add new consumer instances as queue depth grows. Then as queue depth begins to decline, the extra consumers die off. In this way, consumers can be made to automatically scale to accommodate changing loads. Other brokers generally have similar functionality.
Make the queue and underlying file system really large. This method attempts to absorb peaks in workload entirely in the queue. This is after all what queuing was designed to do in the first place. Problem is, it doesn't scale well and you must allocate disk that 99% of the time will be almost empty.
Expire old messages. If the messages have an expiry set then you can cause them to be cleaned up. Some JMS brokers will do this automatically while on others you may need to browse the queue in order to cause the expired messages to be deleted. Problem with this is that not all messages lose their business value and become eligible for expiry. Most fire-and-forget messages (audit logs, etc.) fall into this category.
Throttle back the producer. When the queue fills, nothing can put new messages to it. In WebSphere MQ, the producing application then receives a return code indicating that the queue is full. If the application distinguishes between fatal and transient errors, it can pause and retry (see the sketch after this list).
The key to successfully implementing any of these is that your system be allowed to provide "soft" errors that the application will respond to. For example, many shops will raise the MAXDEPTH parameter of a queue the first time they get a QFULL condition. If the queue depth exceeds the size of the underlying file system the result is that instead of a "soft" error that impacts a single queue the file system fills and the entire node is affected. You are MUCH better off tuning the system so that the queue hits MAXDEPTH well before the file system fills but then also instrumenting the app or other processes to react to the full queue in some way.
But no matter what else you do, option #4 above is mandatory. No matter how much disk you allocate, how many consumer instances you deploy, or how quickly you expire messages, there is always a possibility that your consumers won't keep up with message production. When this happens, your producer app should throttle back, or raise an alarm and stop, or do anything other than hang or die. Asynchronous messaging is only asynchronous up to the point where you run out of space to queue messages. After that, your apps are synchronous and must handle that situation gracefully, even if that means a (graceful) shutdown.
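As an illustration of option #4, a sketch of a producer that backs off and retries. How "queue full" surfaces is broker-specific (in WebSphere MQ it is reason code 2053, MQRC_Q_FULL), and treating every JMSException as transient here is a simplification; a real implementation would inspect the linked exception:

    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Session;

    public final class ThrottlingSender {
        // Retry with exponential backoff when the broker rejects a send.
        static void sendWithBackoff(Session session, MessageProducer producer,
                                    String body, int maxAttempts)
                throws JMSException, InterruptedException {
            long delayMillis = 500;
            for (int attempt = 1; ; attempt++) {
                try {
                    producer.send(session.createTextMessage(body));
                    return;
                } catch (JMSException e) {
                    if (attempt >= maxAttempts) {
                        throw e; // give up: raise an alarm / shut down gracefully
                    }
                    Thread.sleep(delayMillis);                        // throttle back
                    delayMillis = Math.min(delayMillis * 2, 30_000);  // cap the backoff
                }
            }
        }
    }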
Sure!
http://download.oracle.com/docs/cd/E17802_01/products/products/jms/javadoc-102a/index.html
MessageProducer#setTimeToLive(long) does what you want: the provider uses the producer's time-to-live to compute each message's JMSExpiration header when the message is sent. (Setting JMSExpiration directly on a Message has no effect; the provider overwrites it on send.)
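For example (the producer is assumed to be created elsewhere):

    import javax.jms.JMSException;
    import javax.jms.MessageProducer;

    public final class ExpiringProducer {
        // Every message sent by this producer after the call is discarded
        // by the broker if it is not consumed within 60 seconds.
        static void enableExpiry(MessageProducer producer) throws JMSException {
            producer.setTimeToLive(60_000L); // time to live, in milliseconds
        }
    }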