Faced an issue on prod system where 1 message was left unacked for 30 mins which lead to consumer being shutdown. Now I have added shutdownlisteners as described in rabbit mq docs -
https://rabbitmq.github.io/rabbitmq-java-client/api/4.x.x/com/rabbitmq/client/ShutdownListener.html
if (cause.isHardError()) {
log.error("Connection error with cause : {}", cause);
Connection conn = (Connection) cause.getReference();
if (!cause.isInitiatedByApplication()) {
Method reason = cause.getReason();
log.error("Rabbit Mq Consumer Connection Shutdown : {} {}", reason, cause);
}
} else{
Channel ch = (Channel)cause.getReference();
log.error("Channel error details : {}", ch);
}
});
The issue is it's not getting invoked at all in testing. I tried triggering it through 2 ways-
Through unacked delivery timeout. Basically threw a general excption and never acked it(these were the original conditions of the bug). However, this didn't work.
I used channel.close() to shutdown the consumer but still didn't receive an event.
Looking for any way to replicate the issue I faced and test/trigger the shutdownlisteners. Thanks
Related
We have multiple application servers in a cluster connecting to an instance of ActiveMQ Artemis 2.17.0. One of the applications will act as master and the remaining application servers will act as slave nodes.
The master will try to do drop the queue and recreates the queue before producing or consuming the messages as part of the data processing cycle.
Below is the exception stack we are observing in Artemis logs
2022-01-03 19:14:55,603 WARN [org.apache.activemq.artemis.core.server] Errors occurred during the buffering operation : ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=AMQ229025: Cannot delete queue -1.36.1.level-1.queue on binding -1.36.1.level-1.queue - it has consumers = org.apache.activemq.artemis.core.postoffice.impl.LocalQueueBinding]
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.destroyQueue(ActiveMQServerImpl.java:2338) [artemis-server-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.destroyQueue(ActiveMQServerImpl.java:2306) [artemis-server-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.destroyQueue(ActiveMQServerImpl.java:2297) [artemis-server-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.destroyQueue(ActiveMQServerImpl.java:2277) [artemis-server-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.destroyQueue(ActiveMQServerImpl.java:2270) [artemis-server-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.core.protocol.openwire.OpenWireConnection.removeDestination(OpenWireConnection.java:1062) [artemis-openwire-protocol-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.core.protocol.openwire.OpenWireConnection$CommandProcessor.processRemoveDestination(OpenWireConnection.java:1206) [artemis-openwire-protocol-2.17.0.jar:2.17.0]
at org.apache.activemq.command.DestinationInfo.visit(DestinationInfo.java:124) [activemq-client-5.16.0.jar:5.16.0]
at org.apache.activemq.artemis.core.protocol.openwire.OpenWireConnection.act(OpenWireConnection.java:323) [artemis-openwire-protocol-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.utils.actors.Actor.doTask(Actor.java:33) [artemis-commons-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) [artemis-commons-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.17.0.jar:2.17.0]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) [artemis-commons-2.17.0.jar:2.17.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_292]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_292]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.17.0.jar:2.17.0]
Could you tell me when this would happen and how to avoid it?
Please find below sample code used in dropping the queue:
public void dropQueue() throws JMSException {
try {
if (connection instanceof ActiveMQConnection && destination instanceof ActiveMQDestination) {
((ActiveMQConnection)connection).destroyDestination((ActiveMQDestination)destination);
} else {
log.info("Dropping queue : " + queueName + " not supported.");
}
} catch(JMSException e) {
throw e;
}
}
The broker is logging this WARN message (and not deleting the queue) because the queue you are trying to delete has consumers on it just as the WARN message indicates:
Cannot delete queue...it has consumers
You can avoid this either by not deleting the queue or removing all the consumers from that queue before deleting it.
In general it's not a good idea for clients to delete queues on the broker since queues are shared resources and may be used by other clients.
Context
I have a webservice writing a test id in a queue. Then, a listener reads the queue, searches the test and starts it. During those steps, it writes updates of the test in the database in order to be shown to the user. To be more precise: the test is launched in a docker container and at the end of it, I want to update the status of the test to FINISHED. For that, I use the docker java library with a callback.
Problem
When it comes to call the callback, I receive multiple error messages on the call to update the test (but it happens only once, if I try twice the second time works, but it still writes a lot of error messages from the transaction manager).
Here are the error messages logged:
2020-11-20 09:20:43,639 WARN [docker-java-stream--1032099154] (org.jboss.jca.core.connectionmanager.listener.TxConnectionListener) IJ000305: Connection error occured: org.jboss.jca.core.connectionmanager.listener.TxConnectionListener#600268e6[state=NORMAL managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection#236f1a69 connection handles=1 lastReturned=1605860423264 lastValidated=1605860242146 lastCheckedOut=1605860443564 trackByTx=true pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool#2efb0d3b mcp=SemaphoreConcurrentLinkedQueueManagedConnectionPool#3e8e7a62[pool=ApplicationDS] xaResource=LocalXAResourceImpl#482fdad2[connectionListener=600268e6 connectionManager=4c83f895 warned=false currentXid=null productName=Oracle productVersion=Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.3.0.0.0 jndiName=java:/ApplicationDS] txSync=TransactionSynchronization#1387480544{tx=Local transaction (delegate=TransactionImple < ac, BasicAction: 0:ffffac110002:-50a6b0bf:5fb6bdb9:73db4 status: ActionStatus.ABORTING >, owner=Local transaction context for provider JBoss JTA transaction provider) wasTrackByTx=true enlisted=true cancel=false}]: java.sql.SQLRecoverableException: IO Error: Socket read interrupted
2020-11-20 09:20:43,647 INFO [docker-java-stream--1032099154] (org.jboss.jca.core.connectionmanager.listener.TxConnectionListener) IJ000302: Unregistered handle that was not registered: org.jboss.jca.adapters.jdbc.jdk8.WrappedConnectionJDK8#4f3c1cb2 for managed connection: org.jboss.jca.adapters.jdbc.local.LocalManagedConnection#236f1a69
2020-11-20 09:20:43,656 WARN [docker-java-stream--1032099154] (com.arjuna.ats.jta) ARJUNA016031: XAOnePhaseResource.rollback for < formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac110002:-50a6b0bf:5fb6bdb9:73db4, node_name=1, branch_uid=0:ffffac110002:-50a6b0bf:5fb6bdb9:73db8, subordinatenodename=null, eis_name=java:/ApplicationDS > failed with exception: org.jboss.jca.core.spi.transaction.local.LocalXAException: IJ001160: Could not rollback local transaction
Caused by: org.jboss.jca.core.spi.transaction.local.LocalResourceException: IO Error: Socket read interrupted
at org.jboss.ironjacamar.jdbcadapters#1.4.22.Final//org.jboss.jca.adapters.jdbc.local.LocalManagedConnection.rollback(LocalManagedConnection.java:139)
...
Caused by: java.sql.SQLRecoverableException: IO Error: Socket read interrupted
at com.oracle.jdbc//oracle.jdbc.driver.T4CConnection.doRollback(T4CConnection.java:1140)
...
Caused by: java.io.InterruptedIOException: Socket read interrupted
at com.oracle.jdbc//oracle.net.nt.TimeoutSocketChannel.handleInterrupt(TimeoutSocketChannel.java:258)
...
Explanation
At the beginning, I thought about a connection problem, maybe the the transaction is no more available at the callback time (because the docker run took too long), maybe it has to be invalidated.
But at the end, as written in the console, it came from an interruption of the thread when it tries to acquire the lock to update the test and I discovered where this interruption came from: I took a look at the method executeAndStream in DefaultInvocationBuilder from the docker java library and I discovered this:
Thread thread = new Thread(() -> {
Thread streamingThread = Thread.currentThread();
try (DockerHttpClient.Response response = execute(request)) {
callback.onStart(() -> {
streamingThread.interrupt();
response.close();
});
sourceConsumer.accept(response);
callback.onComplete();
} catch (Exception e) {
callback.onError(e);
}
}, "docker-java-stream-" + Objects.hashCode(request));
thread.setDaemon(true);
thread.start();
And here it is, the closable given to onStart interrupts the thread. After that, I discovered in the method onComplete from ResultCallbackTemplate (that I was extending for my callback) a close on that closable:
#Override
public void onComplete() {
try {
close();
} catch (IOException e) {
throw new RuntimeException(e);
}
}
Resolution
The problem finally came from the following code I wrote:
#Override
public void onComplete() {
super.onComplete();
updateTest(FINISHED);
}
I was calling the onComplete method from the parent without knowing what is does and as usual, first before doing anything else. To correct that, I only had to call the super method at the end:
#Override
public void onComplete() {
updateTest(FINISHED);
super.onComplete();
}
I have an odd issue with my Tomcat + Spring websocket application. When a user disconnects without sending a "closing" signal, due to power loss or a plug pull, the thread will block about 10 seconds later.
The thread blocks on this function :
org.springframework.web.socket.WebSocketSession.sendMessage(WebSocketMessage<?> wsm) throws IOException;
I have tried putting a line in my AppConfig to try and set a timeout of 3 seconds but it does not seem to work properly as the block seems to go on for upwards of 15 minutes before throwing an exception.
#Bean(name="servletServerContainerFactoryBean")
public int maxSessionIdleTimeout() {
return 3000;
}
Here is the stack trace after an eventual SocketTimeoutException
Step: 2304
SendB -> test isOpen -> sendMes -> Done -> Finished Send.
SendB -> test2 isOpen -> sendMes -> User closed connection during packet sending: s01
Propogating exception up!
java.io.IOException: java.net.SocketTimeoutException
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:315)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:250)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendPartialString(WsRemoteEndpointImplBase.java:223)
at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText(WsRemoteEndpointBasic.java:49)
at org.springframework.web.socket.adapter.standard.StandardWebSocketSession.sendTextMessage(StandardWebSocketSession.java:197)
at org.springframework.web.socket.adapter.AbstractWebSocketSession.sendMessage(AbstractWebSocketSession.java:102)
at org.infpls.noxio.game.module.game.session.NoxioSession.sendPacket(NoxioSession.java:40)
at org.infpls.noxio.game.module.game.dao.lobby.GameLobby.step(GameLobby.java:117)
at org.infpls.noxio.game.module.game.dao.lobby.GameLobby$GameLoop.run(GameLobby.java:274)
Caused by: java.net.SocketTimeoutException
at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.doWrite(WsRemoteEndpointImplServer.java:81)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.writeMessagePart(WsRemoteEndpointImplBase.java:494)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:309)
... 8 more
## CRITICAL ## Ejecting player: test2 :: Exception thrown on packet send...
java.io.IOException: java.net.SocketTimeoutException
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:315)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:250)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendPartialString(WsRemoteEndpointImplBase.java:223)
at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText(WsRemoteEndpointBasic.java:49)
at org.springframework.web.socket.adapter.standard.StandardWebSocketSession.sendTextMessage(StandardWebSocketSession.java:197)
at org.springframework.web.socket.adapter.AbstractWebSocketSession.sendMessage(AbstractWebSocketSession.java:102)
at org.infpls.noxio.game.module.game.session.NoxioSession.sendPacket(NoxioSession.java:40)
at org.infpls.noxio.game.module.game.dao.lobby.GameLobby.step(GameLobby.java:117)
at org.infpls.noxio.game.module.game.dao.lobby.GameLobby$GameLoop.run(GameLobby.java:274)
Caused by: java.net.SocketTimeoutException
at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.doWrite(WsRemoteEndpointImplServer.java:81)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.writeMessagePart(WsRemoteEndpointImplBase.java:494)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:309)
... 8 more
Finished Send. Step Finished.
Having threads be blocked for 15 minutes at a time is a major problem. Any info on why this happens and how to fix it would be greatly appreciated. Thank you!
Found an answer finally. It's actually a system setting.
Source
How many times to retry before deciding that something is wrong and
it is necessary to report this suspicion to network layer. Minimal RFC
value is 3, it is default, which corresponds to 3sec-8min depending on
RTO.
/proc/sys/net/ipv4/tcp_retries2
I have a frequent Channel shutdown: connection error issues (under 24.133.241:5671 thread, name is truncated) in RabbitMQ Java client (my producer and consumer are far apart). Most of the time consumer is automatically restarted as I have enabled heartbeat (15 seconds). However, there were some instances only Channel shutdown: connection error but no Consumer raised exception and no Restarting Consumer (under cTaskExecutor-4 thread).
My current workaround is to restart my application. Anyone can shed some light on this matter?
2017-03-20 12:42:38.856 ERROR 24245 --- [24.133.241:5671] o.s.a.r.c.CachingConnectionFactory
: Channel shutdown: connection error
2017-03-20 12:42:39.642 WARN 24245 --- [cTaskExecutor-4] o.s.a.r.l.SimpleMessageListenerCont
ainer : Consumer raised exception, processing can restart if the connection factory supports
it
...
2017-03-20 12:42:39.642 INFO 24245 --- [cTaskExecutor-4] o.s.a.r.l.SimpleMessageListenerCont
ainer : Restarting Consumer: tags=[{amq.ctag-4CqrRsUP8plDpLQdNcOjDw=21-05060179}], channel=Ca
ched Rabbit Channel: AMQChannel(amqp://21-05060179#10.24.133.241:5671/,1), conn: Proxy#7ec317
54 Shared Rabbit Connection: SimpleConnection#44bac9ec [delegate=amqp://21-05060179#10.24.133
.241:5671/], acknowledgeMode=NONE local queue size=0
Generally, this is due to the consumer thread being "stuck" in user code somewhere, so it can't react to the broken connection.
If you have network issues, perhaps it's stuck reading or writing to a socket; make sure you have timeouts set for any I/O operations.
Next time it happens take a thread dump to see what the consumer threads are doing.
I have an application that is attempting to call a service and the other service appears to be timing out. The problem is my application does not receive any timeout exceptions, although I do see an error printed out to the console:
[7/8/13 12:39:32:360 EDT] 00000005 TimeoutManage I WTRN0006W: Transaction 0000013FBF252E43000000010000000CE81CB4935851D5C13DECD3DBB2D463F0DBECAEE60000013FBF252E43000000010000000CE81CB4935851D5C13DECD3DBB2D463F0DBECAEE600000001 has timed out after 120 seconds.
[7/8/13 12:39:32:360 EDT] 00000005 TimeoutManage I WTRN0124I: When the timeout occurred the thread with which the transaction is, or was most recently, associated was Thread[WebContainer : 1,5,main]. The stack trace of this thread when the timeout occurred was:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:196)
com.ibm.io.async.AbstractAsyncFuture.waitForCompletion(AbstractAsyncFuture.java:334)
com.ibm.io.async.AsyncFuture.getByteCount(AsyncFuture.java:218)
com.ibm.ws.tcp.channel.impl.AioSocketIOChannel.readAIOSync(AioSocketIOChannel.java:215)
com.ibm.ws.tcp.channel.impl.AioTCPReadRequestContextImpl.processSyncReadRequest(AioTCPReadRequestContextImpl.java:182)
com.ibm.ws.tcp.channel.impl.TCPReadRequestContextImpl.read(TCPReadRequestContextImpl.java:111)
com.ibm.ws.http.channel.outbound.impl.HttpOutboundServiceContextImpl.parseResponseMessageSync(HttpOutboundServiceContextImpl.java:1657)
com.ibm.ws.http.channel.outbound.impl.HttpOutboundServiceContextImpl.readSyncResponse(HttpOutboundServiceContextImpl.java:725)
com.ibm.ws.http.channel.outbound.impl.HttpOutboundServiceContextImpl.startResponseReadSync(HttpOutboundServiceContextImpl.java:1775)
com.ibm.ws.http.channel.outbound.impl.HttpOutboundServiceContextImpl.finishRequestMessage(HttpOutboundServiceContextImpl.java:1195)
com.ibm.ws.websvcs.transport.http.out.HttpOutSyncWriter.finishBufferRequest(HttpOutSyncWriter.java:94)
com.ibm.ws.websvcs.transport.http.out.HttpOutWriter.writeBuffer(HttpOutWriter.java:136)
com.ibm.ws.websvcs.transport.http.out.HttpOutByteBufferOutputStream.finish(HttpOutByteBufferOutputStream.java:468)
com.ibm.ws.websvcs.transport.http.SOAPOverHTTPSender.sendChunkedRequest(SOAPOverHTTPSender.java:890)
com.ibm.ws.websvcs.transport.http.SOAPOverHTTPSender.sendSOAPRequest(SOAPOverHTTPSender.java:807)
com.ibm.ws.websvcs.transport.http.SOAPOverHTTPSender.send(SOAPOverHTTPSender.java:611)
com.ibm.ws.websvcs.transport.http.HTTPTransportSender.invoke(HTTPTransportSender.java:364)
org.apache.axis2.engine.AxisEngine.send(AxisEngine.java:531)
org.apache.axis2.description.OutInAxisOperationClient.send(OutInAxisOperation.java:401)
org.apache.axis2.description.OutInAxisOperationClient.executeImpl(OutInAxisOperation.java:228)
org.apache.axis2.client.OperationClient.execute(OperationClient.java:163)
org.apache.axis2.jaxws.core.controller.impl.AxisInvocationController.execute(AxisInvocationController.java:581)
org.apache.axis2.jaxws.core.controller.impl.AxisInvocationController.doInvoke(AxisInvocationController.java:130)
org.apache.axis2.jaxws.core.controller.impl.InvocationControllerImpl.invoke(InvocationControllerImpl.java:93)
org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.invokeSEIMethod(JAXWSProxyHandler.java:364)
org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.invoke(JAXWSProxyHandler.java:185)
The client is created with these settings:
bindProvider.getRequestContext().put(com.ibm.wsspi.webservices.Constants.RESPONSE_TIMEOUT_PROPERTY , connectionProperties.getProperty(MyService.TIME_OUT));
bindProvider.getRequestContext().put(com.ibm.wsspi.webservices.Constants.CONNECTION_TIMEOUT_PROPERTY , connectionProperties.getProperty(MyService.TIME_OUT));
bindProvider.getRequestContext().put(com.ibm.wsspi.webservices.Constants.READ_TIMEOUT_PROPERTY , connectionProperties.getProperty(MyService.TIME_OUT));
MyService.TIME_OUT has a value of 20000 and I have verified that it is being set correctly.
The code that catches calls the service looks like this:
try
{
response = ((MyServicePortType) myService).doWebServiceOperation(request);
}
catch (Throwable e) //I know, catch Throwable is not very good but right now I'd be happy to catch ANYthing here!
{
log.error("Webservice reported error",e);
}
Even though I've changed my catch block to catch anything, I still don't catch any exceptions. WebSphere detects a transaction timeout, but I don't know why the application doesn't detect a timeout in the web service call. Is there something I'm missing that would cause a proper timeout exception to be thrown so that I can catch it and send the message to the application framework?
Well now I feel silly.
It seems that for WebSphere, these properties (RESPONSE_TIMEOUT_PROPERTY, CONNECTION_TIMEOUT_PROPERTY, etc...) should have their values set in seconds, and I was using milliseconds based on what I'd seen in online examples (that clearly were not intended for WebSphere).
Changing 20000 to 20 has resolved this problem.
The page that suggested I should be assuming seconds instead of milliseconds is this one: http://pic.dhe.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.express.doc/info/exp/ae/rwbs_httptransportprop.html