Spring AMQP stuck queue due to unack'd message - java

I am using a SimpleMessageListenerContainer and had problems that every hour or so the queue would get stuck and nothing would be processed due to an unack'd message.
I am sure this is due to an error that isn't being caught properly, but I can't trace the issue.
I have set the acknowledge mode to NONE and this "fixed" the issue, but it is really just hiding it. Also, if I want to throw an AmqpException and re-queue the message, this doesn't work with the acknowledge mode set to NONE.
My question is: how can I trace why the queue gets stuck? Is there a way to see the payload of the unack'd message? Or is there an acknowledge mode that does not require acknowledgements but still re-queues a message if an exception is thrown?
Here is how I am registering a listener:
final SimpleMessageListenerContainer container = new SimpleMessageListenerContainer();
container.setConnectionFactory(connectionFactory);
container.setQueueNames(queueName);
container.setMessageListener(new MQMessageListenerWrapper(listener));
container.setAcknowledgeMode(AcknowledgeMode.NONE);
container.start();
Thanks.

My best guess is your consumer thread is hung someplace upstream of the listener. When control is returned to the container, the message is ack'd or rejected; it can't be left in an unack'd state if the thread returned to the container.
Use jstack <pid> to find out where the consumer thread is stuck.
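If attaching jstack is awkward (for example inside a container image without the JDK tools), roughly the same information can be collected from inside the JVM. This is only a minimal sketch using Thread.getAllStackTraces(); the class name ConsumerThreadDump and the name prefix are illustrative assumptions, so check what your container's consumer threads are actually called before filtering on them.

```java
import java.util.Map;

final class ConsumerThreadDump {

    /** Returns a jstack-like dump of every live thread whose name starts with namePrefix. */
    static String dump(String namePrefix) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            if (!thread.getName().startsWith(namePrefix)) {
                continue; // not one of the threads we are interested in
            }
            out.append('"').append(thread.getName()).append("\" state=")
               .append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }
}
```

Logging ConsumerThreadDump.dump(prefix) from a scheduled task or an admin endpoint while the queue is stuck should show the frame where the consumer thread is blocked.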
You are correct: NONE is just masking the issue.

When the queue gets stuck, look at the connections listening on that specific queue. It could be a sign of some sort of deadlock scenario caused by 2 (or more) consumer threads listening on the same queue and therefore being blocked by rabbit.

This was an issue within my code that I finally tracked down as it only occurred in a rare instance.
This had nothing to do with Spring AMQP or RabbitMQ just my bad coding :-)

Related

How to handle RMQ connection loss?

We're using Java RMQ client in Scala and we're experiencing some issues on DEV environment. We have this fallback strategy set up:
def addConnectionShutdownListener(connection: Connection): Unit ={
connection.addShutdownListener { cause: ShutdownSignalException =>
logger.error(s"Error on RMQ connection: ${cause.getMessage}", cause)
if (exitOnFail) {
logger.info("Terminating process with RMQ consumer is shut down")
System.exit(1)
}
else if (retryOnFail) {
logger.info(s"Retrying to connect")
retryCreatingConnection(1)
}
}
}
addConnectionShutdownListener(rmqConnection)
In a similar fashion, I added channel connection shutdown listener.
So there are 2 strategies which we use (and modify through config)
exit on fail
retry on fail
I set up the exit on fail strategy and sometimes it works correctly. When an error happens I see the line "Terminating process with RMQ consumer is shut down" in the log, and the service is restarted correctly (the kubernetes pod is shut down and started again automatically). I disabled RMQ auto recovery because it didn't work at all.
The problem is that sometimes some queues are left without consumers and messages pile up and hang, but this error message does not appear in the log. It's really hard to test, since I don't know what circumstances occurred on our DEV environment.
What could happen?
Is there a better way to handle a connection loss, or to be more precise - to handle a scenario when consumers are somehow detached from queue?
Thanks in advance,
Amer

ActiveMQ - cannot rollback non-transacted session with INDIVIDUAL_ACK

Is it possible to roll back an asynchronously processed message in ActiveMQ? I'm consuming the next message while the first one is still being processed, so when I try to roll back the first message on another thread (not from the ActiveMQ pool), I get the above error. Alternatively, should I send the message to the DLQ manually?
Message error handling can work a couple ways:
1. Broker-side 'redelivery policy'. The client invokes a rollback n times (the default is usually 6 retries), after which the broker automatically moves the message to a Dead Letter Queue (DLQ).
2. Client-side. The application consumes the message and then produces it to the DLQ itself.
Option #1 is good for unplanned/planned outages-- database down, etc. Where you want automatic retry. The re-delivery policy can also be configured when the client connects to the broker.
Option #2 is good for 'bad data' scenarios where you know the message will never be able to be processed. This is ideal, because you can move the message on the 1st consumption and not have to reject the message n number of times.
When you combine infinite retry with #1 and include #2 in your application flow, you can have a robust process flow of automatic retry, and move-bad-data-out-of-the-way-quickly. Best of breed =)
ActiveMQ Redelivery policy
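To make the combination of the two options concrete, here is a small in-memory model of the flow. It is not ActiveMQ API: in a real setup the broker counts redeliveries and owns the DLQ, and the names RetryOrDeadLetter, PoisonMessageException and Handler are made up for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class RetryOrDeadLetter {

    /** Thrown by a handler when the message can never be processed (bad data). */
    static final class PoisonMessageException extends RuntimeException {
        PoisonMessageException(String message) { super(message); }
    }

    interface Handler { void handle(String body); }

    /** Stand-in for the dead letter queue. */
    final Deque<String> deadLetters = new ArrayDeque<>();
    private final int maxRedeliveries;

    RetryOrDeadLetter(int maxRedeliveries) { this.maxRedeliveries = maxRedeliveries; }

    /** Delivers one message and returns the number of delivery attempts that were made. */
    int deliver(String body, Handler handler) {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                handler.handle(body);
                return attempts;                  // processed successfully
            } catch (PoisonMessageException e) {
                deadLetters.add(body);            // bad data: dead-letter on first sight
                return attempts;
            } catch (RuntimeException e) {
                if (attempts > maxRedeliveries) { // transient failure, retries exhausted
                    deadLetters.add(body);
                    return attempts;
                }
                // otherwise fall through and redeliver
            }
        }
    }
}
```

With maxRedeliveries set to 2, a handler that always fails with an ordinary exception is tried three times (one delivery plus two redeliveries) before the message is dead-lettered, while a handler that throws PoisonMessageException dead-letters it after a single attempt.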

spring rabbit exception in consumer issue

I have a spring rabbit consumer:
public class SlackIdle1Consumer extends AbstractMessageConsumer {
    @Override
    public void process(Message amqpMessage, Channel channel) throws Exception {
        /* A very bad exception is thrown here.
           It causes the amqp message to be rejected, and if no other consumer is
           available and the error still persists, the message begins looping over and over.
           When the error is fixed, those messages are processed,
           but the result of that processing may be harmful.
        */
    }
}
And somewhere inside, an exception happens. Let's imagine this is a bad exception, i.e. a development logic error. The amqp message then begins to spin indefinitely, and when the error is fixed and the consumer restarted, all the old messages are processed. That's bad, because logic and data may have changed since those messages were sent.
So the question is: how to fix this situation properly? Should I wrap all my code to try-catch clause or will I have to develop 'checks' in each consumer to prevent consistency issues in my app?
There are several options:
1. Set the container's defaultRequeueRejected property to false so failed messages are always rejected (discarded or sent to a dead letter exchange depending on queue configuration).
2. If you want some exceptions to be retried and others not, then add a try catch and throw an AmqpRejectAndDontRequeueException to reject those you don't want retried.
3. Add a custom ErrorHandler to the container, to do the same thing as #2 - determine which exceptions you want retried - documentation here.
4. Add a retry advice with a recoverer - the default recoverer simply logs the error, the RejectAndDontRequeueRecoverer causes the message to be rejected after retries are exhausted, the RepublishMessageRecoverer is used to write to a queue with additional diagnostics in headers - documentation here.
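For the try/catch and ErrorHandler options, the core of the decision is a predicate over the exception. A rough sketch in plain Java follows; the class RequeueDecision and the particular list of "programming error" types are assumptions for illustration, not something Spring AMQP prescribes.

```java
final class RequeueDecision {

    /**
     * Walks the cause chain and returns false (reject, do not requeue) as soon as
     * any cause looks like a programming error that a retry cannot fix.
     */
    static boolean shouldRequeue(Throwable failure) {
        for (Throwable cause = failure; cause != null; cause = cause.getCause()) {
            if (cause instanceof IllegalArgumentException
                    || cause instanceof NullPointerException
                    || cause instanceof ClassCastException) {
                return false; // retrying would just loop on the same bug
            }
        }
        return true; // assume anything else might be transient
    }
}
```

Inside the listener's catch block you would then throw AmqpRejectAndDontRequeueException when shouldRequeue(...) returns false, and rethrow the original exception otherwise.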

How to temporarily disable a message listener

What would be a nice and good way to temporarily disable a message listener? The problem I want to solve is:
A JMS message is received by a message listener
I get an error when trying to process the message.
I wait for my system to get ready again to be able to process the message.
Until my system is ready, I don't want any more messages, so...
...I want to disable the message listener.
My system is ready for processing again.
The failed message gets processed, and the JMS message gets acknowledged.
Enable the message listener again.
Right now, I'm using Sun App Server. I disabled the message listener by setting it to null in the MessageConsumer, and enabled it again using setMessageListener(myOldMessageListener), but after this I don't get any more messages.
How about if you don't return from the onMessage() listener method until your system is ready to process messages again? That'll prevent JMS from delivering another message on that consumer.
That's the async equivalent of not calling receive() in a synchronous case.
There's no multi-threading for a given JMS session, so the pipeline of messages is held up until the onMessage() method returns.
I'm not familiar with the implications of dynamically calling setMessageListener(). The javadoc says there's undefined behavior if called "when messages are being consumed by an existing listener or sync consumer". If you're calling from within onMessage(), it sounds like you're hitting that undefined case.
There are start/stop methods at the Connection level, if that's not too coarse-grained for you.
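One way to hold up onMessage() until the system is ready is a small gate that the listener waits on before it processes and returns. This is a sketch with java.util.concurrent only; ProcessingGate is a hypothetical helper, not part of JMS.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class ProcessingGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition opened = lock.newCondition();
    private boolean isOpen = true;

    /** Call when the system becomes unavailable. */
    void close() { set(false); }

    /** Call when the system is ready again; wakes up every waiting listener thread. */
    void open() { set(true); }

    private void set(boolean value) {
        lock.lock();
        try {
            isOpen = value;
            if (value) {
                opened.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    /** Blocks until the gate is open or the timeout elapses; returns true if it is open. */
    boolean awaitOpen(long timeout, TimeUnit unit) {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (!isOpen) {
                if (nanos <= 0L) {
                    return false; // timed out while still closed
                }
                nanos = opened.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

In onMessage() you would loop on gate.awaitOpen(...) (logging on each timeout) and only process the message once it returns true. Whatever detects the outage calls gate.close(), and the recovery check calls gate.open().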
Problem solved by a workaround replacing the message listener by a receive() loop, but I'm still interested in how to disable a message listener and enable it shortly again.
That looks to me like the messages are being delivered but nothing is happening with them because you have no listener attached. It's been a while since I've done anything with JMS but don't you want to have the message sent to the dead letter queue or something while you fix the system, and then move the messages back onto the original queue once you're ready for processing again?
On WebLogic you can set up max retries, an error queue to handle messages that exceed the max retry limit, and other parameters. I'm not certain off the top of my head, but you also might be able to specify a wait period. All this is available to you in the admin console. I'd look at the admin for the JMS provider you've got and see if it can do something similar.
In JBoss the following code will do the trick:
MBeanServer mbeanServer = MBeanServerLocator.locateJBoss();
ObjectName objName = new ObjectName("jboss.j2ee:ear=MessageGateway.ear,jar=MessageGateway-EJB.jar,name=MessageSenderMDB,service=EJB3");
JMSContainerInvokerMBean invoker = (JMSContainerInvokerMBean) MBeanProxy.get(JMSContainerInvokerMBean.class, objName, mbeanServer);
invoker.stop(); //Stop MDB
invoker.start(); //Start MDB
I think you can call
messageConsumer.setMessageListener(null);
inside your MessageListener implementation and schedule the reestablishment task (for example in ScheduledExecutorService). This task should call
connection.stop();
messageConsumer.setMessageListener(YOUR_NEW_LISTENER);
connection.start();
and it will work. The start() and stop() methods restart the delivery structures (not the TCP connection).
Read the Javadoc https://docs.oracle.com/javaee/7/api/javax/jms/Connection.html#stop--
Temporarily stops a connection's delivery of incoming messages. Delivery can be restarted using the connection's start method. When the connection is stopped, delivery to all the connection's message consumers is inhibited: synchronous receives block, and messages are not delivered to message listeners.
To temporarily stop a connection's delivery of incoming messages, use the stop() method of the Connection interface: https://docs.oracle.com/javaee/7/api/javax/jms/Connection.html#stop--
Just don't call connection.stop() from within the MessageListener: according to the JMS spec, you will get a deadlock or an exception. Instead, call connection.stop() from a different thread; you just need to synchronize the MessageListener with the thread that suspends the connection.

Is this a realistic expectation of a distributed mechanism?

I've been evaluating ActiveMQ as a candidate message broker. I've written some test code to try and get an understanding of ActiveMQ's performance limitations.
I can produce a failure state in the broker by sending messages as fast as possible like this:
try {
    while (true) {
        // Random payload of up to 16 KB
        byte[] payload = new byte[(int) (Math.random() * 16384)];
        BytesMessage message = session.createBytesMessage();
        message.writeBytes(payload);
        producer.send(message);
    }
} catch (JMSException ex) { ... }
I was surprised that the line
producer.send(message);
blocks when the broker enters a failed state. I was hoping that some exception would be thrown, so there would be some indication that the broker has failed.
I realize that my test code is spamming the broker, and I expect the broker to fail. However, I would prefer that the broker failed "loudly" as opposed to simply blocking.
Is this an unrealistic expectation?
Update:
Uri's answer references an ActiveMQ bug report that was filed in March. The bug description includes a proposal that sounds like what I'm looking for: "if the request on the transport had a timeout (this is to catch failure scenarios, so something that's not expected to reasonably happen), things would have errored out rather than building waiting threads."
However, after 8 months the bug is currently unassigned with a single vote. So I guess the question still stands, is this something ActiveMQ should (will?) implement?
You are testing the 'slow consumer' and producer flow control issue that all message brokers have to deal with. Do you want to fail producers, block them, or spool to disk?
Basically the out of the box default in ActiveMQ is to block producers. But you can configure message cursors to spool to disk.
BTW you've not said if you are using queues/topics or persistent/non-persistent; if you are using non persistent topics there are other strategies you can use for discarding messages etc.
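The difference between "block the producer" and "fail the producer" can be illustrated in-process with a bounded buffer. This is only an analogue written against java.util.concurrent, not ActiveMQ configuration; BoundedSender and its timeout parameter are invented for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

final class BoundedSender {
    private final BlockingQueue<byte[]> buffer;
    private final long timeoutMillis;

    BoundedSender(int capacity, long timeoutMillis) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Waits up to timeoutMillis for space, then fails loudly instead of blocking forever.
     * Using buffer.put(payload) here instead would give the "block the producer" behaviour.
     */
    void send(byte[] payload) {
        try {
            if (!buffer.offer(payload, timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("buffer full for " + timeoutMillis + " ms");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for space", e);
        }
    }
}
```

With a capacity of 1 and nothing draining the buffer, the first send() succeeds and the second throws once the timeout has elapsed, which is the "loud" failure the question asks for.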
Apparently there's a known issue, not sure if it's been fixed:
https://issues.apache.org/activemq/browse/AMQ-1625
Not sure about ActiveMQ config, but other JMS providers have various configuration options, so you may be able to get ActiveMQ to do as you wish in that situation.
I know Fiorano has options to specify whether providers block or not in this situation.
