I have a frequent Channel shutdown: connection error issues (under 24.133.241:5671 thread, name is truncated) in RabbitMQ Java client (my producer and consumer are far apart). Most of the time consumer is automatically restarted as I have enabled heartbeat (15 seconds). However, there were some instances only Channel shutdown: connection error but no Consumer raised exception and no Restarting Consumer (under cTaskExecutor-4 thread).
My current workaround is to restart my application. Anyone can shed some light on this matter?
2017-03-20 12:42:38.856 ERROR 24245 --- [24.133.241:5671] o.s.a.r.c.CachingConnectionFactory
: Channel shutdown: connection error
2017-03-20 12:42:39.642 WARN 24245 --- [cTaskExecutor-4] o.s.a.r.l.SimpleMessageListenerCont
ainer : Consumer raised exception, processing can restart if the connection factory supports
it
...
2017-03-20 12:42:39.642 INFO 24245 --- [cTaskExecutor-4] o.s.a.r.l.SimpleMessageListenerCont
ainer : Restarting Consumer: tags=[{amq.ctag-4CqrRsUP8plDpLQdNcOjDw=21-05060179}], channel=Ca
ched Rabbit Channel: AMQChannel(amqp://21-05060179#10.24.133.241:5671/,1), conn: Proxy#7ec317
54 Shared Rabbit Connection: SimpleConnection#44bac9ec [delegate=amqp://21-05060179#10.24.133
.241:5671/], acknowledgeMode=NONE local queue size=0
Generally, this is due to the consumer thread being "stuck" in user code somewhere, so it can't react to the broken connection.
If you have network issues, perhaps it's stuck reading or writing to a socket; make sure you have timeouts set for any I/O operations.
Next time it happens take a thread dump to see what the consumer threads are doing.
Related
Faced an issue on prod system where 1 message was left unacked for 30 mins which lead to consumer being shutdown. Now I have added shutdownlisteners as described in rabbit mq docs -
https://rabbitmq.github.io/rabbitmq-java-client/api/4.x.x/com/rabbitmq/client/ShutdownListener.html
if (cause.isHardError()) {
log.error("Connection error with cause : {}", cause);
Connection conn = (Connection) cause.getReference();
if (!cause.isInitiatedByApplication()) {
Method reason = cause.getReason();
log.error("Rabbit Mq Consumer Connection Shutdown : {} {}", reason, cause);
}
} else{
Channel ch = (Channel)cause.getReference();
log.error("Channel error details : {}", ch);
}
});
The issue is it's not getting invoked at all in testing. I tried triggering it through 2 ways-
Through unacked delivery timeout. Basically threw a general excption and never acked it(these were the original conditions of the bug). However, this didn't work.
I used channel.close() to shutdown the consumer but still didn't receive an event.
Looking for any way to replicate the issue I faced and test/trigger the shutdownlisteners. Thanks
On one of our platforms, HDFS namenode is shutting down with following error message every 1 or 3 days
FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(390)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [<ip1>:<port>,<ip2>:<port>, etc], stream=QuorumOutputStream starting at txid 29873171))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:109)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:525)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:385)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:521)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:710)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:188)
at java.lang.Thread.run(Thread.java:748)
Before this FATAL log we can see following kind of logs, on which we can detect a degradation of response time
WARN client.QuorumJournalManager (QuorumCall.java:waitFor(185)) - Waited 18014 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [<ip1>:<port>,<ip2>:<port>]
Have you already encountered this problem, and do you have any advices to fix it ?
We have already:
checked that our VMs are time synchronized
detected that when the problem occurs a burst of data on the network is in progress, without detecting the root cause yet
checked our network devices. Except a problem on a port which goes from UP state to DOWN state quickly that we are going to fix, the network seems correct
Thanks in advance
One of my KafkaStream apps gives me the following error:
Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms
It is not likely a permission or connectivity issue. Because it consumes a batch of messages (maybe till the last message available) then I see Informed to shut down in logs and then State transition from RUNNING to PENDING_SHUTDOWN. Then I get the timeout error.
It is a simple single-threaded app with code as simple as this:
builder.stream("prod.company.scores",
Consumed.with(ScoreSerde.getGenericKeySerde(), ScoreSerde.getEnvelopeSerde()))
.filter((key,value) -> isRecordNew(value))
.filter((key,value) -> isScoreNew(value))
.filter((key,value) -> isScoreRedeem(value))
.peek(foreachAction)
.to("kafka-consumer.score.redeem");
INFO KafkaProducer: Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. this message does not shutdown the application. Need to check for other problems.
Jms outbound-channel-adapter works perfectly fine but I see intermittently this error in the log, however, MQ message still gets delivered.
2019-06-07 10:16:22 [JMSCCThreadPoolWorker-5] INFO o.s.j.c.CachingConnectionFactory - Encountered a JMSException - resetting the underlying JMS Connection
com.ibm.msg.client.jms.DetailedJMSException: JMSWMQ1107: A problem with this connection has occurred.
at com.ibm.msg.client.wmq.common.internal.Reason.reasonToException(Reason.java:578)
at com.ibm.msg.client.wmq.common.internal.Reason.createException(Reason.java:214)
at com.ibm.msg.client.wmq.internal.WMQConnection.consumer(WMQConnection.java:794)
at com.ibm.mq.jmqi.remote.api.RemoteHconn.callEventHandler(RemoteHconn.java:2903)
at com.ibm.mq.jmqi.remote.api.RemoteHconn.driveEventsEH(RemoteHconn.java:628)
at com.ibm.mq.jmqi.remote.impl.RemoteDispatchThread.processHconn(RemoteDispatchThread.java:691)
at com.ibm.mq.jmqi.remote.impl.RemoteDispatchThread.run(RemoteDispatchThread.java:233)
at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.runTask(WorkQueueItem.java:263)
at com.ibm.msg.client.commonservices.workqueue.SimpleWorkQueueItem.runItem(SimpleWorkQueueItem.java:99)
at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.run(WorkQueueItem.java:284)
at com.ibm.msg.client.commonservices.workqueue.WorkQueueManager.runWorkQueueItem(WorkQueueManager.java:312)
at com.ibm.msg.client.commonservices.j2se.workqueue.WorkQueueManagerImplementation$ThreadPoolWorker.run(WorkQueueManagerImplementation.java:1214)
Caused by: com.ibm.mq.MQException: JMSCMQ0001: WebSphere MQ call failed with compcode '2' ('MQCC_FAILED') reason '2009' ('MQRC_CONNECTION_BROKEN').
...This is errorChannel configuration:
<int:header-enricher id="errorMsg.HeaderEnricher"
input-channel="errorChannel"
output-channel="omniAlertsJmsErrorChannel">
...and jms outbound-channel-adapter configured:
<int-jms:outbound-channel-adapter
id="jmsOutToNE" channel="umpAlertNotificationJMSChannel"
destination="senderTopic"
jms-template="jmsQueueTemplate"
>
I am expecting omniAlertsJmsErrorChannel to receive the MessageHandlingException which is not happening from jmsOutToNE adapter. All other channel/flow errors are being routed to omniAlertsJmsErrorChannel.
Also, wondering if there jms outbound-channel-adapter is retrying internally when com.ibm.mq.MQException happens and it gets successful in the subsequent try?
The errorChannel is out of use in any MessageHandler. Their exception are just thrown to the caller. To handle those exception you need to consider adding a request-handler-advice-chain with the ExpressionEvaluatingRequestHandlerAdvice. Or if we talk about a retry, you also can add to that chain a RequestHandlerRetryAdvice.
See more info in the Reference Manual.
Not sure though why your message is still delivered to the MQ. There is no any retry out-of-the-box in the <int-jms:outbound-channel-adapter. It might a behavior of the MQ Connection Factory adapter in the IBM library.
When a certain event arrives, I stop my rabbitmq listner using container.stop(); and after the needed job is done, I restart it with container.start(), but when a new event arrives, I get following error:
Exception in thread "SimpleAsyncTaskExecutor-1" 2016-04-22T16:20:53.646 WARN 15336 --- [cTaskExecutor-1] o.s.a.r.l.SimpleMessageListenerContainer : Consumer raised exception, processing can restart if the connection factory supports it
java.lang.NullPointerException: null
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.isActive(SimpleMessageListenerContainer.java:756) ~[spring-rabbit-1.4.3.RELEASE.jar:na]
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.access$600(SimpleMessageListenerContainer.java:82) ~[spring-rabbit-1.4.3.RELEASE.jar:na]
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1100) ~[spring-rabbit-1.4.3.RELEASE.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
It's actually a harmless (but scary) log messsage, but it is fixed in 1.5.3.
As mentioned in this answer:
It's generally better to stop the container on a separate thread.
because the container can't fully stop until the listener exits.