spring rabbit exception in consumer issue - java

I have a spring rabbit consumer:
public class SlackIdle1Consumer extends AbstractMessageConsumer {

    @Override
    public void process(Message amqpMessage, Channel channel) throws Exception {
        /* A "very bad" exception is thrown here.
           It causes the AMQP message to be rejected; if no other consumer is available
           and the error persists, the message keeps looping over and over.
           When the error is eventually fixed, those old messages are processed,
           and the result of that processing may be harmful. */
    }
}
Somewhere inside, an exception happens. Let's imagine it is a bad exception - a logic error introduced during development. The AMQP message then begins to spin indefinitely, and when the error is fixed and the consumer restarted, all the old messages get processed. That is bad, because the logic and data may have changed since those messages were sent.
So the question is: how do I fix this situation properly? Should I wrap all my code in a try-catch block, or will I have to develop 'checks' in each consumer to prevent consistency issues in my app?

There are several options:
1. Set the container's defaultRequeueRejected property to false so failed messages are always rejected (discarded or sent to a dead letter exchange, depending on the queue configuration).
2. If you want some exceptions to be retried and others not, add a try/catch and throw an AmqpRejectAndDontRequeueException to reject those you don't want retried (see the sketch after this list).
3. Add a custom ErrorHandler to the container to do the same thing as #2 - determine which exceptions you want retried - documentation here.
4. Add a retry advice with a recoverer - the default recoverer simply logs the error, the RejectAndDontRequeueRecoverer causes the message to be rejected after retries are exhausted, and the RepublishMessageRecoverer is used to write to a queue with additional diagnostics in headers - documentation here.
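A minimal sketch of options 1 and 2, assuming a container and consumer set up like the ones in the question (BusinessValidationException is a hypothetical stand-in for any exception you never want retried):
// Option 1: reject failed messages instead of requeueing them (container configuration).
container.setDefaultRequeueRejected(false);

// Option 2: reject selectively from inside the consumer.
@Override
public void process(Message amqpMessage, Channel channel) throws Exception {
    try {
        // ... business logic that may throw ...
    } catch (BusinessValidationException e) {
        // This error will never succeed on retry, so reject without requeueing.
        throw new AmqpRejectAndDontRequeueException(e);
    }
    // Other exceptions propagate and are handled according to the container settings.
}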

Related

Spring Cloud Stream - notice and handle errors in broker

I am fairly new to developing distributed applications with messaging, and to Spring Cloud Stream in particular. I am currently wondering about best practices on how to deal with errors on the broker side.
In our application, we need to both consume and produce messages from/to multiple sources/destinations like this:
Consumer side
For consuming, we have defined multiple @Beans of type java.util.function.Consumer. The configuration for those looks like this:
spring.cloud.stream.bindings.consumeA-in-0.destination=inputA
spring.cloud.stream.bindings.consumeA-in-0.group=$Default
spring.cloud.stream.bindings.consumeB-in-0.destination=inputB
spring.cloud.stream.bindings.consumeB-in-0.group=$Default
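For reference, one of these consumer beans looks roughly like this (a sketch; the payload type is illustrative, and the bean name must match the binding prefix consumeA):
// A sketch of one of the functional consumer beans referenced above.
@Bean
public Consumer<String> consumeA() {
    return payload -> {
        // handle a message arriving on the 'inputA' destination
        System.out.println("consumeA received: " + payload);
    };
}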
This part works quite well - when starting the application, the exchanges "inputA" and "inputB" as well as the queues "inputA.$Default" and "inputB.$Default" with the corresponding bindings are automatically created in RabbitMQ.
Also, in case of an error (e.g. a queue is suddenly not available), the application gets notified immediately with a QueuesNotAvailableException and continuously tries to re-establish the connection.
My only question here is: Is there some way to handle this exception in code? Or, what are best practices to deal with failures like this on broker side?
Producer side
This one is more problematic. Producing messages is triggered by some internal logic, so we cannot use function @Beans here. Instead, we currently rely on StreamBridge to send messages. The problem is that this approach does not trigger the creation of exchanges and queues on startup. So when our code calls streamBridge.send("outputA", message), the message is sent (the result is true), but it just disappears into the void, since RabbitMQ automatically drops unroutable messages.
I found that with this configuration, I can at least get RabbitMQ to create exchanges and queues as soon as the first message is sent:
spring.cloud.stream.source=produceA;produceB
spring.cloud.stream.default.producer.requiredGroups=$Default
spring.cloud.stream.bindings.produceA-out-0.destination=outputA
spring.cloud.stream.bindings.produceB-out-0.destination=outputB
I need to use streamBridge.send("produceA-out-0", message) in code to make it work, which is not too great since it means having explicit configuration hardcoded, but at least it works.
I also tried to implement the producer in a Reactor style as described in this answer, but in this case the exchange/queue is also not created on application startup, and the sent message just disappears even though the return status of the sending method is "OK".
Failures on the broker side are not registered at all with this approach - when I simulate one, e.g. by deleting the queue or the exchange, it is not registered by the application. Only when another message is sent do I get the following in the logs:
ERROR 21804 --- [127.0.0.1:32404] o.s.a.r.c.CachingConnectionFactory : Shutdown Signal: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no exchange 'produceA-out-0' in vhost '/', class-id=60, method-id=40)
Still, the result of StreamBridge#send was true in this case, but we need to know that sending actually failed at this point (we persist the state of the sent object based on this boolean return value). Is there any way to accomplish that?
Any other suggestions on how to make this producer scenario more robust? Best practices?
EDIT
I found an interesting solution to the producer problem using correlations:
...
CorrelationData correlation = new CorrelationData(UUID.randomUUID().toString());
messageHeaderAccessor.setHeader(AmqpHeaders.PUBLISH_CONFIRM_CORRELATION, correlation);
Message<String> message = MessageBuilder.createMessage(payload, messageHeaderAccessor.getMessageHeaders());
boolean sent = streamBridge.send(channel, message);
try {
    final CorrelationData.Confirm confirm = correlation.getFuture().get(30, TimeUnit.SECONDS);
    if (correlation.getReturned() == null && confirm.isAck()) {
        // success logic
    } else {
        // failed logic
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    // failed logic
} catch (ExecutionException | TimeoutException e) {
    // failed logic
}
using these additional configurations:
spring.cloud.stream.rabbit.default.producer.useConfirmHeader=true
spring.rabbitmq.publisher-confirm-type=correlated
spring.rabbitmq.publisher-returns=true
This seems to work quite well, although I'm still clueless about the return value of StreamBridge#send: it is always true, and I cannot find information on the cases in which it would be false. But the rest is fine; I can get information on issues with the exchange or the queue from the correlation or the confirm.
But this solution is very much focused on RabbitMQ, which causes two problems:
- our application should be able to connect to different brokers (e.g. Azure Service Bus)
- in tests we use the Kafka binder, and I don't know how to configure the application context to make it work in this case, too
Any help would be appreciated.
On the consumer side, you can listen for an event such as the ListenerContainerConsumerFailedEvent.
https://docs.spring.io/spring-amqp/docs/current/reference/html/#consumer-events
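A minimal sketch of such an event listener (assuming component scanning picks it up; the class and logger names are illustrative):
// Spring AMQP publishes ListenerContainerConsumerFailedEvent when a listener container's consumer fails.
@Component
public class ConsumerFailureEvents {

    private static final Logger log = LoggerFactory.getLogger(ConsumerFailureEvents.class);

    @EventListener
    public void onConsumerFailed(ListenerContainerConsumerFailedEvent event) {
        // React here: alert, mark the binding unhealthy, trigger a health check, etc.
        log.error("Consumer failed (fatal={}): {}", event.isFatal(), event.getReason(), event.getThrowable());
    }
}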
On the producer side, producers only know about exchanges, not any queues bound to them; hence the requiredGroups property which causes the queue to be bound.
You only need spring.cloud.stream.default.producer.requiredGroups=$Default - you can send to arbitrary destinations using the StreamBridge and the infrastructure will be created.
@SpringBootApplication
public class So70769305Application {

    public static void main(String[] args) {
        SpringApplication.run(So70769305Application.class, args);
    }

    @Bean
    ApplicationRunner runner(StreamBridge bridge) {
        return args -> bridge.send("foo", "test");
    }
}
spring.cloud.stream.default.producer.requiredGroups=$Default

ActiveMQ - cannot rollback non-transacted session INDIVIDUAL_ACK

Is it possible to roll back an async-processed message in ActiveMQ? I'm consuming the next message while the first one is still processing, so when I try to roll back the first message on another thread (not the ActiveMQ pool), I'm getting the above error. Should I eventually send the message to the DLQ manually?
Message error handling can work a couple of ways:
1. Broker-side 'redelivery policy', where the client rolls back n times (the default is usually 6 retries) and the broker then automatically moves the message to a Dead Letter Queue (DLQ).
2. Client-side, where the application consumes the message and then produces it to the DLQ itself.
Option #1 is good for unplanned/planned outages - database down, etc. - where you want automatic retry. The redelivery policy can also be configured when the client connects to the broker (a configuration sketch follows below).
Option #2 is good for 'bad data' scenarios where you know the message will never be able to be processed. This is ideal because you can move the message on the first consumption and not have to reject it n times.
When you combine infinite retry with #1 and include #2 in your application flow, you get a robust process of automatic retry while moving bad data out of the way quickly. Best of breed =)
ActiveMQ Redelivery policy
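A rough sketch of configuring the client-side redelivery policy from option #1 (assuming the standard ActiveMQ Java client; the values are illustrative):
ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");

RedeliveryPolicy policy = factory.getRedeliveryPolicy();
policy.setMaximumRedeliveries(6);        // once exhausted, the message is moved to the DLQ
policy.setInitialRedeliveryDelay(1000);  // wait 1s before the first redelivery
policy.setUseExponentialBackOff(true);
policy.setBackOffMultiplier(2.0);        // double the delay on each subsequent redelivery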

Spring AMQP stuck queue due to unack'd message

I am using a SimpleMessageListenerContainer and had problems where, every hour or so, the queue would get stuck and nothing would be processed due to an unack'd message.
I am sure this is due to an error that isn't being caught properly, but I can't trace the issue.
I have set the acknowledge mode to NONE and this "fixed" the issue, but it is really just hiding it. Also, if I want to throw an AmqpException and re-queue the message, this doesn't work with the acknowledge mode set to NONE.
My question is: how can I trace the issue with the queue getting stuck? Is there a way to see the payload of the unack'd message? Or is there an acknowledgement mode that does not require acknowledgements but still re-queues messages if an exception is thrown?
Here is how I am registering a listener:
final SimpleMessageListenerContainer container = new SimpleMessageListenerContainer();
container.setConnectionFactory(connectionFactory);
container.setQueueNames(queueName);
container.setMessageListener(new MQMessageListenerWrapper(listener));
container.setAcknowledgeMode(AcknowledgeMode.NONE);
container.start();
Thanks.
My best guess is your consumer thread is hung someplace upstream of the listener. When control is returned to the container, the message is ack'd or rejected; it can't be left in an unack'd state if the thread returned to the container.
Use jstack <pid> to find out where the consumer thread is stuck.
You are correct NONE is just masking the issue.
When the queue gets stuck, look at the connections listening on that specific queue. It could be a sign of some sort of deadlock scenario caused by 2 (or more) consumer threads listening on the same queue and therefore being blocked by RabbitMQ.
This was an issue within my code that I finally tracked down as it only occurred in a rare instance.
This had nothing to do with Spring AMQP or RabbitMQ just my bad coding :-)

How to configure Camel's RedeliveryPolicy retriesExhaustedLogLevel?

I have set up an errorHandler in a Camel route that will retry a message several times before sending the message to a dead letter channel (an activemq queue in this case). What I would like is to see an ERROR log when the message failed to be retried the max number of times and was then sent to the dead letter queue.
Looking at the docs for error handling and dead letter channels, it seems that there are 2 options available on the RedeliveryPolicy: retriesAttemptedLogLevel and retriesExhaustedLogLevel. Supposedly by default the retriesExhaustedLogLevel is already set at LoggingLevel.ERROR, but it does not appear to actually log anything when it has expended all retries and routes the message to the dead letter channel.
Here is my errorHandler definition via Java DSL.
.errorHandler(this.deadLetterChannel(MY_ACTIVE_MQ_DEAD_LETTER)
.useOriginalMessage()
.maximumRedeliveries(3)
.useExponentialBackOff()
.retriesExhaustedLogLevel(LoggingLevel.ERROR)
.retryAttemptedLogLevel(LoggingLevel.WARN))
I have explicitly set the level to ERROR now and it still does not appear to log out anything (to any logging level). On the other hand, retryAttemptedLogLevel is working just fine and will log to the appropriate LoggingLevel (ie, I could set retryAttemptedLogLevel to LoggingLevel.ERROR and see the retries as ERROR logs). However I only want a single ERROR log in the event of exhaustion, instead of an ERROR log for each retry when a subsequent retry could potentially succeed.
Maybe I am missing something, but it seems that the retriesExhaustedLogLevel does not do anything...or does not log anything if the ErrorHandler is configured as a DeadLetterChannel. Is there a configuration that I am still needing, or does this feature of RedeliveryPolicy not execute for this specific ErrorHandlerFactory?
I could also set up a route to send my exhausted messages that simply logs and routes to my dead letter channel, but I would prefer to try and use what is already built into the ErrorHandler if possible.
EDIT: I updated the ErrorHandler's DeadLetterChannel to be a direct endpoint and left the 2 logLevel configs the same. I got the 3 retry-attempted WARN logs, but no ERROR log telling me the retries were exhausted. I did, however, set up a small route listening on the direct dead letter endpoint that logs, and that is working.
Not a direct solution to my desire to have the ERROR log on exhaustion, but it is an acceptable workaround for now.
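Roughly what that workaround looks like (a sketch, assuming it lives in a RouteBuilder#configure(); the direct endpoint name is illustrative):
errorHandler(deadLetterChannel("direct:deadLetter")
    .useOriginalMessage()
    .maximumRedeliveries(3)
    .useExponentialBackOff()
    .retryAttemptedLogLevel(LoggingLevel.WARN));

// Small route on the dead letter endpoint: log the exhaustion once, then forward to the real DLQ.
from("direct:deadLetter")
    .log(LoggingLevel.ERROR, "Retries exhausted for ${exchangeId}: ${exception.message}")
    .to(MY_ACTIVE_MQ_DEAD_LETTER);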
Please try with this code:
.errorHandler(deadLetterChannel("kafka:sample-dead-topic")
    .maximumRedeliveries(4)
    .redeliveryDelay(60000)
    .retryAttemptedLogLevel(LoggingLevel.WARN)
    .retriesExhaustedLogLevel(LoggingLevel.ERROR)
    .logHandled(true)
    .allowRedeliveryWhileStopping(true)
    .logRetryStackTrace(true)
    .logExhausted(true)
    .logStackTrace(true)
    .logExhaustedMessageBody(true)
)
The retry is configured with a 1 minute interval.
The Camel application logged the errors for every retry with detailed information.

Why do my RabbitMQ channels keep closing?

I'm debugging some Java code that uses Apache POI to pull data out of Microsoft Office documents. Occasionally, it encounters a large document and POI crashes when it runs out of memory. At that point, it tries to publish the error to RabbitMQ, so that other components can know that this step failed and take the appropriate actions. However, when it tries to publish to the queue, it gets a com.rabbitmq.client.AlreadyClosedException (clean connection shutdown; reason: Attempt to use closed channel).
Here's the error handler code:
try {
    // Extraction and indexing code
}
catch (Throwable t) {
    // Something went wrong! We'll publish the error and then move on with
    // our lives
    System.out.println("Error received when indexing message: ");
    t.printStackTrace();
    System.out.println();
    String error = PrintExc.format(t);
    message.put("error", error);
    if (mime == null) {
        mime = "application/vnd.unknown";
    }
    message.put("mime", mime);
    publish("IndexFailure", "", MessageProperties.PERSISTENT_BASIC, message);
}
For completeness, here's the publish method:
private void publish(String exch, String route,
        AMQP.BasicProperties props, Map<String, Object> message) throws Exception {
    chan.basicPublish(exch, route, props,
            JSONValue.toJSONString(message).getBytes());
}
I can't find any code within the try block that appears to close the RabbitMQ channel. Are there any circumstances in which the channel could be closed implicitly?
EDIT: I should note that the AlreadyClosedException is thrown by the basicPublish call inside publish.
An AMQP channel is closed on a channel error. Two common things that can cause a channel error:
Trying to publish a message to an exchange that doesn't exist
Trying to publish a message with the immediate flag set when no bound queue has an active consumer
I would look into setting up a ShutdownListener on the channel you're using to publish, via addShutdownListener(), to catch the shutdown event and see what caused it (a sketch follows below).
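A minimal sketch of that, assuming chan is the com.rabbitmq.client.Channel used in publish() above and a client version where ShutdownListener is a functional interface:
chan.addShutdownListener(cause -> {
    // 'cause' is a ShutdownSignalException; its reason is the AMQP method that closed the channel,
    // e.g. channel.close with reply-code=404 for a missing exchange.
    System.err.println("Channel shut down: " + cause.getReason());
    System.err.println("Initiated by application: " + cause.isInitiatedByApplication());
});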
Another reason in my case was that I had mistakenly acknowledged a message twice. This led to RabbitMQ errors in the log like this after the second acknowledgment:
=ERROR REPORT==== 11-Dec-2012::09:48:29 ===
connection <0.6792.0>, channel 1 - error:
{amqp_error,precondition_failed,"unknown delivery tag 1",'basic.ack'}
After I removed the duplicate acknowledgement, the errors went away, the channel did not close anymore, and the AlreadyClosedException was gone.
I'd like to add this information for other users who will be searching for this topic.
Another possible reason for receiving a channel closed exception is when publishers and consumers access the channel/queue with different queue declarations/settings.
Publisher
channel.queueDeclare("task_queue", durable, false, false, null);
Worker
channel.queueDeclare("task_queue", false, false, false, null);
From RabbitMQ Site
RabbitMQ doesn't allow you to redefine an existing queue with different parameters and will return an error to any program that tries to do that
Apparently, there are many reasons for the AMQP connection and/or channels to close abruptly. In my case, there were too many unacknowledged messages on the queue because the consumer didn't specify the prefetch_count, so the connection was getting terminated every ~1 min. Limiting the number of unacknowledged messages by setting the consumer's prefetch count to a non-zero value fixed the problem.
channel.basicQos(100);
For those who wonder why their consuming channels are closing, check if you try to Ack or Nack a delivery more than once.
In the rabbitmq log you would see messages like:
operation basic.ack caused a channel exception precondition_failed:
unknown delivery tag ...
I also had this problem. The reason in my case was that I first declared the queue with durable = false, and I got this error message in the log file when I switched durable to true:
"inequivalent arg 'durable' for queue 'logsQueue' in vhost '/':
received 'true' but current is 'false'"
Then I changed the name of the queue and it worked for me. I assumed that the RabbitMQ server keeps a record of the declared queues somewhere and cannot change their status from durable to non-durable and vice versa.
Again, I made durable=false for the new queue, and this time I got this error:
"inequivalent arg 'durable' for queue 'logsQueue1' in vhost '/':
received 'false' but current is 'true'"
My assumption was right. When I listed the queues on the RabbitMQ server with:
rabbitmqctl list_queues
I saw both queues in the server.
To summarize, the 2 solutions are:
1. renaming the queue, which is not a good solution
2. resetting RabbitMQ with:
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app
