In JMS it is easy to find out if a connection is lost: an exception happens. But how do I find out when the connection is there again?
Scenario: I use JMS to communicate with my server. Now my connection breaks (the server is down), which results in an exception. So far so good. If the server comes up again and the connection is reestablished, how do I know that?
I don't see any listeners that would give me that information.
Ahhh...the old exception handling/reconnection conundrum.
There are some transport providers that will automatically reconnect your application for you, and some that make the app drive reconnection. In general, automatic reconnection hides the exception from the application. The downside is that you don't want the app to hang forever if all the remote messaging nodes are down, so ultimately you must include some reconnection logic.
Now here's the interesting part: how do you handle the exceptions in a provider-neutral way? The JMS exception is practically worthless. For example, a "security exception" can mean that the Java security policies are too restrictive, that the file system permissions are too restrictive, that the LDAP credentials failed, that the connection to the transport failed, that the open of the queue or topic failed, or any of dozens of other security-related problems. It's the linked exception, with the details from the transport provider, that really helps debug the problem. My clients have generally taken one of three different approaches here...
1. Treat all errors the same: close all objects and reinitialize them. This is JMS-portable.
2. Allow the app to inspect the linked exceptions to distinguish between fatal and transient errors (e.g. auth error vs. queue full). Not provider-portable; see the sketch at the end of this answer.
3. Provider-specific error-handling classes: a hybrid of the other two.
In your case, the queue and topic objects are probably only valid in the context of the original connection. Assuming a provider that reconnects automatically, the fact that you got an exception means the reconnect failed and the context for the queue and topic objects could not be restored. Close all objects and reconnect.
Whether you want to do something more provider-specific, such as distinguishing between transient and permanent errors, is one of those "it depends" things, and you'll have to figure that out on a case-by-case basis.
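To make approach 2 concrete, here is a minimal, hedged sketch of routing on the linked exception. Which linked exception types count as transient is provider-specific, and scheduleReconnect()/shutdownAndAlert() are hypothetical helpers:

public void onException(JMSException e) {
    Exception linked = e.getLinkedException(); // provider-specific detail, may be null
    if (linked instanceof java.io.IOException) {
        scheduleReconnect();   // transport-level failure: likely transient
    } else {
        shutdownAndAlert(e);   // e.g. auth failure: treat as fatal
    }
}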
The best way to monitor for connection exceptions is to set an exception listener, for example:
ConnectionFactory connectionFactory = (ConnectionFactory) context.lookup("jmsContextName");
connection = connectionFactory.createConnection();
connection.setExceptionListener(new ExceptionListener() {
    @Override
    public void onException(JMSException exception) {
        logger.error("ExceptionListener triggered: " + exception.getMessage(), exception);
        try {
            Thread.sleep(5000); // Wait 5 seconds (JMS server restarted?)
            restartJMSConnection();
        } catch (InterruptedException e) {
            logger.error("Error pausing thread: " + e.getMessage());
        }
    }
});
connection.start();
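For completeness, a minimal sketch of what the restartJMSConnection() helper above might do, assuming connection, connectionFactory and the listener are instance fields; this is illustrative, not a fixed recipe:

private void restartJMSConnection() {
    try {
        if (connection != null) {
            connection.close(); // also closes its sessions, producers and consumers
        }
    } catch (JMSException e) {
        logger.error("Error closing stale connection: " + e.getMessage());
    }
    try {
        connection = connectionFactory.createConnection();
        connection.setExceptionListener(exceptionListener); // re-register the listener field
        connection.start();
        // ...recreate sessions, producers and consumers here...
    } catch (JMSException e) {
        logger.error("Reconnect failed, will retry on the next trigger", e);
    }
}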
The JMS spec does not describe any transport protocol, and it says nothing about connections (e.g. whether the broker should keep them alive or establish a new connection for every session). So I think what you mean by
Now my connection breaks (server is down), which results in an exception.
is that you are trying to send a message and are getting a JMSException.
I think the only way to see if the broker is up is to try to send a message.
Your only option in the case of a connection-related JMSException is to attempt to reestablish the connection in your exception handler and retry the operation.
Related
In my Java application I am using the failover transport to connect to a local ActiveMQ broker:
failover:(tcp://0.0.0.0:61616)
I create one single connection that I reuse in the rest of the application:
ActiveMQConnection connection = (ActiveMQConnection) connectionFactory.createConnection();
In another part of the application, when I receive some external call, I need to send a message to the broker, so I create a new Session:
Session locSession = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
When the broker is down my app tries to reconnect to the broker forever (this is the expected behavior I really want to have).
However, the problem is that if the broker is down and I receive a call that invokes connection.createSession(false, Session.AUTO_ACKNOWLEDGE), my app hangs forever on that line, waiting to reconnect to the broker before creating the session.
Please, do you know any way to check, before I execute createSession, whether the connection object is still trying to reconnect or is really connected? If I could know this, I could skip creating the session when the app is not connected to the broker (only trying to reconnect), and therefore avoid hanging on connection.createSession forever (I would raise an exception instead).
I wasn't able to find any property or method on ActiveMQConnection to gather this information.
The failover: URL provides a startupMaxReconnectAttempts setting to prevent infinite retry when connecting to the broker the first time.
Also note: if you want an exception to bubble up, that conflicts with the requirement of infinite retry. You would need to adjust the failover settings to match your intended behavior by setting a maximum retry count or maximum retry time; the transport will then throw an exception and unblock your caller.
For example, you could indicate you only want to retry for 5 minutes, then receive an exception to handle in the code to prevent the infinite blocking.
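As a hedged illustration (the option values here are arbitrary; check the failover transport reference for your ActiveMQ version), the URL might look like this:

// Illustrative failover options: bound reconnect attempts and make blocked
// operations fail with an exception instead of hanging forever.
String url = "failover:(tcp://0.0.0.0:61616)"
        + "?maxReconnectAttempts=10"        // give up after 10 reconnect attempts
        + "&startupMaxReconnectAttempts=5"  // bound the very first connect too
        + "&timeout=3000";                  // fail blocked sends after 3 seconds
ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory(url);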
Thank you all for your help and suggestions. They helped me a lot in re-focusing the problem.
However, I found the answer to my question using the method getTransport().isConnected().
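For anyone landing here later, a minimal sketch of that check (the exception type and message are my own choice, not a convention of the API):

// Sketch: skip session creation while the failover transport is reconnecting.
if (connection.getTransport().isConnected()) {
    Session locSession = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
    // ...use the session...
} else {
    throw new IllegalStateException("Broker unreachable: failover transport is still reconnecting");
}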
In our Java mail application (using the JavaMail API) we first connect to the mail server, fetch messages, process the headers, and then afterwards process the message bodies and attachments over POP3, as usual:
Session session = Session.getInstance(props, null);
Store store = session.getStore(urln); // urln: the URLName of the POP3 server
store.connect();
Folder f = store.getFolder("INBOX");
f.open(Folder.READ_ONLY);
Message[] messages = f.getMessages();
for (Message m : messages) {
    if (!store.isConnected()) {
        // raise exception
    }
    processSubject();
    processFrom();
    processBodyAndAttachments();
    // ...
}
The implementation works fine in most environments, but at some customers the store connection gets lost during the processing in the for loop. We can see the raised exception in the logs. My questions:
1. AFAIK, the mail server can sometimes reject new connections, but does it terminate currently living connections (maybe because of too many connections, or does it disconnect old ones to give access to new ones)?
2. When the store is disconnected, does the folder get closed too? Is it better to check the folder?
3. The connection may be lost anywhere in the for loop, and it does not seem to be good practice to put an isConnected check everywhere in the loop; it would make the code dirty and also cause performance issues. Is it good practice to wrap the loop in a try/catch block and check for IOExceptions (Folder closed)? Or other suggestions? Which exceptions should be handled? There may be cases where the message is not parseable but the connection is healthy.
4. What about adding a disconnect listener?
Network connections can be broken for a variety of reasons. Your program always has to be prepared for the connection to drop at any time.
With POP3, there is only one connection, so if the connection is dropped the store should be disconnected and the folder should be closed.
If the Folder is open, check the Folder. Otherwise check the Store.
You need a strategy for handling failures. If you keep track of what messages have been successfully processed you may be able to restart the processing at the next message after a failure. A lot of the details depend on your environment and your application requirements.
A disconnect listener won't make this easier.
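As a rough sketch of that restart strategy (assuming message numbering stays stable across POP3 reconnects in your environment, which you should verify; process() stands in for your per-message work):

int next = 0;          // index of the first message not yet fully processed
boolean done = false;
while (!done) {
    try {
        store.connect();                       // assumes the store is disconnected here
        Folder folder = store.getFolder("INBOX");
        folder.open(Folder.READ_ONLY);
        Message[] messages = folder.getMessages();
        for (; next < messages.length; next++) {
            process(messages[next]);           // hypothetical per-message handler
        }
        done = true;
        folder.close(false);
        store.close();
    } catch (MessagingException e) {
        // Connection dropped (FolderClosedException extends MessagingException):
        // loop around, reconnect, and resume at index 'next'.
    }
}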
I am using the RabbitMQ 3.4.1 Java client library and am not able to get the auto-recovery mechanism to work.
This is how I am creating the RabbitMQ connection factory:
factory = new ConnectionFactory();
factory.setUsername(userName);
factory.setPassword(password);
factory.setVirtualHost(virtualHost);
factory.setAutomaticRecoveryEnabled(true);
factory.setNetworkRecoveryInterval(5);   // milliseconds
factory.setRequestedHeartbeat(3);        // seconds
After publishing a message, if I shut down the RabbitMQ broker and bring it up again, I expect the recovery mechanism to kick in and restore the connection to a 'sane' state. But I get the error below:
com.rabbitmq.client.AlreadyClosedException: connection is already closed due to connection error; protocol method: #method<connection.close>(reply-code=320, reply-text=CONNECTION_FORCED - broker forced connection closure with reason 'shutdown', class-id=0, method-id=0)
at com.rabbitmq.client.impl.AMQChannel.ensureIsOpen(AMQChannel.java:190) ~[amqp-client-3.4.1.jar:na]
at com.rabbitmq.client.impl.AMQChannel.transmit(AMQChannel.java:291) ~[amqp-client-3.4.1.jar:na]
at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:654) ~[amqp-client-3.4.1.jar:na]
at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:631) ~[amqp-client-3.4.1.jar:na]
at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:622) ~[amqp-client-3.4.1.jar:na]
Am I missing anything here? The only way I found to work around this problem is to register a ShutdownListener and re-initialize the RabbitMQ connection factory, connection, and channels.
Also, to answer chrislott's comment: I do see the auto recovery kicking in. I create an exchange using a temporary channel:
Channel channel = connection.createChannel();
channel.exchangeDeclare(exchangeName, exchangeType, durable);
channel.close();
And I see the exception below when it is trying to recover the topology:
Caught an exception when recovering topology Caught an exception while recovering exchange testSuccessfulInitVirtualHost_Exchange: channel is already closed due to clean channel shutdown; protocol method: #method<channel.close>(reply-code=200, reply-text=OK, class-id=0, method-id=0)
com.rabbitmq.client.TopologyRecoveryException: Caught an exception while recovering exchange testSuccessfulInitVirtualHost_Exchange: channel is already closed due to clean channel shutdown; protocol method: #method<channel.close>(reply-code=200, reply-text=OK, class-id=0, method-id=0)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.recoverExchanges(AutorecoveringConnection.java:482)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.recoverEntities(AutorecoveringConnection.java:467)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.beginAutomaticRecovery(AutorecoveringConnection.java:411)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.access$000(AutorecoveringConnection.java:52)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection$1.shutdownCompleted(AutorecoveringConnection.java:351)
at com.rabbitmq.client.impl.ShutdownNotifierComponent.notifyListeners(ShutdownNotifierComponent.java:75)
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:574)
The above exception is not seen if I do not close the channel that's used for creating the exchange.
My reading of the RabbitMQ ConnectionFactory#setAutomaticRecoveryEnabled(Boolean) method is that it primarily enables recovery from NETWORK failure.
Here's a nice discussion: https://www.rabbitmq.com/api-guide.html
For example, if your machine loses its route to the broker for a period of time, perhaps due to a switch or other failure, then automatic recovery can re-establish the connection, etc. The doc doesn't say anything about surviving a broker shutdown/restart, so I don't think your expectation is reasonable.
IMHO, to recover from a broker restart, the shutdown-listener approach seems solid.
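As a hedged sketch of that approach (the rebuild helper is hypothetical; it would recreate the factory, connection and channels, much as the question describes):

connection.addShutdownListener(new ShutdownListener() {
    @Override
    public void shutdownCompleted(ShutdownSignalException cause) {
        if (!cause.isInitiatedByApplication()) {
            // Broker closed the connection (e.g. restart): rebuild everything.
            reinitConnectionAndChannels(); // hypothetical helper
        }
    }
});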
Normally the RabbitMQ client should handle recovery itself; you shouldn't reimplement the same thing manually. At least try using Lyra.
I had some problems during failover testing. Connections tended to hang forever on broker restart, so a shutdown signal exception was the last thing in the logs. I fixed it by setting:
factory.setConnectionTimeout(20000); // milliseconds
Recovery also didn't play well with temporary queues. If you have those, you will probably have to do some additional handling (again, try Lyra first).
What's the correct way of handling a websocket error besides logging it?
Regarding onError(), the Endpoint documentation states that:
Developers may implement this method when the web socket session creates some kind of error that is not modeled in the web socket protocol. This may for example be a notification that an incoming message is too big to handle, or that the incoming message could not be encoded.
There are a number of categories of exception that this method is (currently) defined to handle:
- connection problems, for example, a socket failure that occurs before the web socket connection can be formally closed. These are modeled as SessionExceptions
- runtime errors thrown by developer created message handlers calls
- conversion errors encoding incoming messages before any message handler has been called. These are modeled as DecodeExceptions
Are all of these types of exceptions fatal, causing the websocket to close?
Should the onError() method close the websocket (call Session.close()) if an error occurs?
So far, I assumed that it's my responsibility to cleanly close the session, informing the client about the close reason. This is why my onError() tried invoking session.close() if session.isOpen() returned true, but this caused Tomcat (8.0.15) to throw a NullPointerException:
...
Caused by: java.lang.NullPointerException
at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.onWritePossible(WsRemoteEndpointImplServer.java:96)
at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.doWrite(WsRemoteEndpointImplServer.java:81)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.writeMessagePart(WsRemoteEndpointImplBase.java:444)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessage(WsRemoteEndpointImplBase.java:335)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.startMessageBlock(WsRemoteEndpointImplBase.java:264)
at org.apache.tomcat.websocket.WsSession.sendCloseMessage(WsSession.java:536)
at org.apache.tomcat.websocket.WsSession.doClose(WsSession.java:464)
at org.apache.tomcat.websocket.WsSession.close(WsSession.java:441)
at my.package.MyEndpoint.onWebSocketError(MyEndpoint.java:229)
... 18 more
Is this a Tomcat bug, a misunderstanding on my part, or both?
Edit: It seems that the Java EE websocket example dukeeetf2 assumes that errors are fatal and that there's no need to close the session. The errors are logged, and the session is removed:
@OnError
public void error(Session session, Throwable t) {
    /* Remove this connection from the queue */
    queue.remove(session);
    logger.log(Level.INFO, t.toString());
    logger.log(Level.INFO, "Connection error.");
}
An @OnError method invocation does not mean that the Session will be closed; you can do whatever you want, it depends on the contract specified by your application.
The stack trace from the Tomcat implementation looks like a bug.
As for the dukeeetf2 sample: this code seems to contain other assumptions. The endpoints do not throw exceptions, so everything caught here comes from the underlying WebSocket framework implementation. That does not really mean there is a "connection error"; I would maybe close right away (if that is how I want my application to handle errors); as written, this implementation could result in open connections without any messages.
I see this is a bit dated, but I ended up here today while looking for this information.
Depending on how you rely on the state of the websocket, you need to close the session manually, at least for the javax.websocket implementation.
In my case, the error was causing a problem for the websession client administration implementation, so I closed the session as in the example above.
I think it depends on what you need, but the implementation certainly does not close the session for you.
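For reference, a hedged sketch of an @OnError handler that closes the session explicitly; the close code and reason phrase are arbitrary choices:

@OnError
public void onError(Session session, Throwable t) {
    logger.log(Level.WARNING, "WebSocket error", t);
    try {
        if (session.isOpen()) {
            // Send the client a close frame with an explicit reason.
            session.close(new CloseReason(
                    CloseReason.CloseCodes.UNEXPECTED_CONDITION, "Internal error"));
        }
    } catch (IOException e) {
        logger.log(Level.FINE, "Failed to close session after error", e);
    }
}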
I thought Spymemcached attempts to reestablish the connection to the server when the connection is lost.
But I am seeing something different; I wonder what I misunderstand or what I am doing wrong. Here is some sample code:
MemcachedClient c = new MemcachedClient(AddrUtil.getAddresses("server:11211"));
while (true) {
    try {
        System.out.println(c.get("hatsts"));
        Thread.sleep(10000);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
It runs initially without problem. Then I pull the network plug. Subsequently, the client detects a network failure and throws the following exception:
net.spy.memcached.OperationTimeoutException: Timeout waiting for value
But then, when I re-establish the network, the client does not recover and keeps throwing the exception, even after 5 minutes. I tried Spymemcached 2.10.6 and 2.9.0.
What am I missing?
The problem here is that because you pulled the network cable, the TCP socket on your client still thinks the connection is valid. The TCP keepalive varies from operating system to operating system and can be as high as 30 minutes. As a result, the application (in this case Spymemcached) is not notified by the TCP layer that the connection is no longer valid, and therefore Spymemcached doesn't try to reconnect.
The way Spymemcached detects this situation is by counting the number of consecutive operation timeouts. The last time I checked, the default value was 99. Once that many ops time out, Spymemcached will reconnect. You can change this value in the ConnectionFactory if you want to set it to some other value. There's a function called getContinuousTimeout(), which is where Spymemcached gets 99 from by default. You can construct your own ConnectionFactory with the ConnectionFactoryBuilder.
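A hedged sketch of lowering that threshold via the builder (the setter name matches the Spymemcached versions I've used; the value 10 is arbitrary):

// Sketch: reconnect after fewer consecutive timeouts than the default.
ConnectionFactory cf = new ConnectionFactoryBuilder()
        .setTimeoutExceptionThreshold(10) // reconnect after 10 consecutive timeouts
        .build();
MemcachedClient c = new MemcachedClient(cf, AddrUtil.getAddresses("server:11211"));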
Hopefully this is enough information to answer your question and get you going in the right direction. If not, let me know and I can add some more details.