I have a simple JMS application deployed on OC4J on an AIX server. In the application I listen to some queues and send to other queues on a WebSphere MQ installation running on an AS/400 server.
The problem is that my connections to these queues are terminated/closed when they stay idle for some time, with the error MQJMS1016 (this is not the problem). When that happens I attempt to recover the connection, and it works; however, the old connection is stuck at the MQ and will not terminate until it is closed manually.
The recovery code goes as follows:
public void recover() {
    cleanup();
    init();
}

public void cleanup() {
    if (session != null) {
        try {
            session.close();
        } catch (JMSException e) {
        }
    }
    if (connection != null) {
        try {
            connection.close();
        } catch (JMSException e) {
        }
    }
}

public void init() {
    // typical initialization of the connection, session and queue...
}
MQJMS1016 is an internal error and indicates that the connection loss is due to something wrong with the code or with WMQ itself. Tuning the channels will help, but you really need to get to the root cause of why the app is spewing orphaned connections fast enough to exhaust all available channels.
The first thing I'd want to do is check the versions of WMQ and of the WMQ client that are running. If this is new development, be sure you are using the WMQ v7 client, because v6 is end-of-life as of Sept 2011. The v7 client works with v6 QMgrs until you are able to upgrade those as well. Once you get to a v7 client and QMgr, there are quite a few channel tuning and reconnection options available to you.
The WMQ v7 client download is here: http://bit.ly/bXM0q3
Also, note that the reconnect logic in the code above does not sleep between attempts. If a client fires connection requests at a high rate, it can overload the WMQ listener and mount a very effective DoS attack. It is recommended to sleep a few seconds between attempts, as in the sketch below.
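A minimal sketch of that, reusing the recover()/cleanup()/init() methods from the question (the 5-second pause and the 10-attempt cap are illustrative values, not recommendations):

public void recover() {
    cleanup();
    boolean connected = false;
    for (int attempt = 0; attempt < 10 && !connected; attempt++) {
        try {
            Thread.sleep(5000); // give the QMgr a few seconds before retrying
            init();
            connected = true;
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // shutting down, stop retrying
            return;
        } catch (Exception e) {
            // log the failure (including any linked exception) and loop for another try
        }
    }
}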
Finally, please, PLEASE print the linked exceptions in your JMSException catch blocks. If you have a problem with a JMS transport provider, the JMS linked exception will contain the low-level error information. In the case of WMQ it contains the reason code, such as 2035 MQRC_NOT_AUTHORIZED or 2033 MQRC_NO_MSG_AVAILABLE. Here's an example:
try {
    // ... code that might throw a JMSException ...
} catch (JMSException je) {
    System.err.println("caught " + je);
    Exception e = je.getLinkedException();
    if (e != null) {
        System.err.println("linked exception: " + e);
    } else {
        System.err.println("No linked exception found.");
    }
}
If you get an error at 2am some night, your WMQ administrator will thank you for the linked exceptions.
Since the orphaned connections (the connections stuck on the MQ side) do not affect message processing (i.e. they do not consume messages), we left things as they were until the maximum number of connections allowed on the MQ was reached.
At that point the recovery did not work anymore, and the MQ administrator had to clean up the orphaned connections manually. The good news, however, is that searching for this particular problem led to an issue reported on the IBM support site:
check here
We are using Hazelcast distributed lock and cache functions in our products. The distributed locking is vitally important for our business logic.
Currently we are using embedded mode (each application node is also a Hazelcast cluster member). We are going to switch to client-server mode.
The problem we have noticed with client-server mode is that once the cluster is down for a period, after several attempts the clients are destroyed and any objects (maps, sets, etc.) that were retrieved from that client are no longer usable.
Also, the client instance does not recover even after the Hazelcast cluster comes back up (we receive HazelcastInstanceNotActiveException).
I know that this issue has been addressed several times and ended up being a feature request:
issue1
issue2
issue3
My question: what should the strategy be to recover the client? Currently we are planning to enqueue a task in the client process, as shown below. Based on a condition it will try to restart the client instance.
We check whether the client is running or not via the clientInstance.getLifecycleService().isRunning() check.
Here is the task code:
private class ClientModeHazelcastInstanceReconnectorTask implements Runnable {
    @Override
    public void run() {
        try {
            HazelCastService hazelcastService = HazelCastService.getInstance();
            HazelcastInstance clientInstance = hazelcastService.getHazelcastInstance();
            boolean running = clientInstance.getLifecycleService().isRunning();
            if (!running) {
                logger.info("Current clientInstance is NOT running. Trying to start hazelcastInstance from ClientModeHazelcastInstanceReconnectorTask...");
                hazelcastService.startHazelcastInstance(HazelcastOperationMode.CLIENT);
            }
        } catch (Exception ex) {
            logger.error("Error occurred in ClientModeHazelcastInstanceReconnectorTask !!!", ex);
        }
    }
}
Is this approach suitable? I also tried listening for lifecycle events, but could not make it work via events; a sketch of what I tried is below.
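For reference, the listener registration looked roughly like this (a rough sketch using the Hazelcast 3.x com.hazelcast.core.LifecycleListener / LifecycleEvent API; CLIENT_CONNECTED and CLIENT_DISCONNECTED are the states reported for client instances):

clientInstance.getLifecycleService().addLifecycleListener(new LifecycleListener() {
    @Override
    public void stateChanged(LifecycleEvent event) {
        if (event.getState() == LifecycleEvent.LifecycleState.CLIENT_DISCONNECTED) {
            logger.info("Hazelcast client disconnected from the cluster.");
        } else if (event.getState() == LifecycleEvent.LifecycleState.CLIENT_CONNECTED) {
            logger.info("Hazelcast client (re)connected to the cluster.");
        }
    }
});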
Regards
In Hazelcast 3.9 we changed the way connection and reconnection work in clients. You can read about the new behavior in the docs: http://docs.hazelcast.org/docs/3.9.1/manual/html-single/index.html#configuring-client-connection-strategy
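As a rough sketch (the exact values are up to you), the 3.9 client connection strategy is configured on the ClientConfig, for example:

ClientConfig clientConfig = new ClientConfig();
// Let the client start even before a cluster connection is available,
// and keep reconnecting in the background instead of shutting down.
clientConfig.getConnectionStrategyConfig().setAsyncStart(true);
clientConfig.getConnectionStrategyConfig().setReconnectMode(ClientConnectionStrategyConfig.ReconnectMode.ASYNC);
HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);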
I hope this helps.
In Hazelcast 3.10 you may increase the connection attempt limit from 2 (the default) to the maximum:
ClientConfig clientConfig = new ClientConfig();
clientConfig.getNetworkConfig().setConnectionAttemptLimit(Integer.MAX_VALUE);
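If it is useful, the wait between attempts can also be stretched on the same network config (a sketch; setConnectionAttemptPeriod exists on the 3.x ClientNetworkConfig and takes milliseconds):

clientConfig.getNetworkConfig().setConnectionAttemptPeriod(5000); // wait 5 seconds between attempts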
I'm using Qpid Proton (proton-j-0.13.0) to send messages over AMQP to an ActiveMQ 5.12.0 queue. On a development machine, where ActiveMQ and the Java program run on the same machine, this works fine. In a test environment, where ActiveMQ runs on a separate server, we see the send() method hang in 15 to 20 percent of the cases. The CPU also stays around 100% while send() hangs. When the send() succeeds, it completes within 0.1 seconds.
Statements to perform a send are similar to this:
final Messenger messenger = Messenger.Factory.create();
messenger.start();
messenger.put(message); // one message of 1 KByte
messenger.send(1);
messenger.stop();
I'm aware Messenger.send(int n) is a blocking method. However, I don't know why it would block my calls. I can add a timeout and try to resend the message, but that's a workaround instead of a proper solution.
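That workaround would look roughly like this (a sketch, assuming Messenger.setTimeout(int) bounds blocking calls such as send(); the 5-second value is arbitrary):

void sendWithTimeout(final Message message) throws IOException {
    final Messenger messenger = Messenger.Factory.create();
    messenger.setTimeout(5000); // assumption: caps how long blocking calls may wait
    messenger.start();
    try {
        messenger.put(message);
        messenger.send(1); // should give up after the timeout instead of hanging forever
    } finally {
        messenger.stop();
    }
}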
Statements to receive the sent messages from ActiveMQ are similar to this:
this.messenger = Messenger.Factory.create();
this.messenger.start();
this.messenger.subscribe(this.address);

while (this.isRunning) {
    try {
        this.messenger.recv(1);
        while (this.messenger.incoming() > 0) {
            final Message message = this.messenger.get();
            this.messageListener.onMessage(message);
        }
    } catch (final Exception e) {
        LOGGER.error("Exception while receiving messages", e);
    }
}
Am I missing something simple, being a Qpid newbie? Could this be configuration in ActiveMQ? Is it normal to add a timeout and retry? Any help to resolve this would be appreciated.
How can I disconnect a Netty client from the server so that it executes the handlerRemoved method on the server side and completely stops running? I tried using group.shutdownGracefully(), but the client stays connected to the server. Is there a method I am missing?
I also noticed that when I try to connect to a server that is not reachable (connection refused), the next time I connect to a real server it connects, but it does not send or receive any more messages.
You seem to be new to network programming in general.
I am new to Netty myself, so please don't take anything I say as 100% true, and especially not as anywhere near 100% efficient.
One basic fact of network programming is that the client and server are not directly linked (obviously). In order to execute a method on the server, you need to send a message from the client to the server. For instance:
what you have on the client:

// on shutdown
{
    workerGroup.shutdownGracefully();
    bossGroup.shutdownGracefully();
}

what you want:

{
    yourchannelname.writeAndFlush("bye" + "\r\n");
    workerGroup.shutdownGracefully();
    bossGroup.shutdownGracefully();
}
and when the server receives the bye command:
// If the user typed the 'bye' command, wait until the server closes
// the connection.
if ("bye".equals(line.toLowerCase())) {
    ch.closeFuture().sync();
    break;
}
} // end of the read loop

// this is for safety reasons, it is optional-ish
// Wait until all messages are flushed before closing the channel.
if (lastWriteFuture != null) {
    lastWriteFuture.sync();
}

// what you already have
} finally {
    // The connection is closed automatically on shutdown.
    group.shutdownGracefully();
}
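On the server side, a handler along these lines would then close the channel when it sees the bye command, which is what finally makes handlerRemoved run there (a rough sketch; the class name is made up, and it assumes a StringDecoder/StringEncoder pipeline as in the Netty examples):

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;

public class ByeCommandHandler extends SimpleChannelInboundHandler<String> {

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, String msg) {
        if ("bye".equals(msg.trim().toLowerCase())) {
            // Closing the channel tears down the pipeline, which in turn
            // triggers handlerRemoved on this (server) side.
            ctx.close();
        }
    }

    @Override
    public void handlerRemoved(ChannelHandlerContext ctx) {
        System.out.println("Client " + ctx.channel().remoteAddress() + " disconnected.");
    }
}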
Hope this helped. I've never answered a question on Stack Overflow before, so I hope I at least sound like I know what I'm doing :P
I've created an MMO for the Android phone and use a Java server with TCP/IP sockets. Everything generally works fine, but after about a day of clients logging on and off my network becomes extremely laggy -- even if there aren't clients connected. NETSTAT shows no lingering connections, but there is obviously something terribly wrong going on.
If I do a full reboot everything magically is fine again, but this isn't a tenable solution for the long-term. This is what my disconnect method looks like (on both ends):
public final void disconnect()
{
    Alive = false;
    Log.write("Disconnecting " + _socket.getRemoteSocketAddress());
    try
    {
        _socket.shutdownInput();
    }
    catch (final Exception e)
    {
        Log.write(e);
    }
    try
    {
        _socket.shutdownOutput();
    }
    catch (final Exception e)
    {
        Log.write(e);
    }
    try
    {
        _input.close();
    }
    catch (final Exception e)
    {
        Log.write(e);
    }
    try
    {
        _output.close();
    }
    catch (final Exception e)
    {
        Log.write(e);
    }
    try
    {
        _socket.close();
    }
    catch (final Exception e)
    {
        Log.write(e);
    }
}
_input and _output are a BufferedInputStream and a BufferedOutputStream spawned from the socket. According to the documentation, calling shutdownInput() and shutdownOutput() shouldn't be necessary, but I'm throwing everything I possibly can at this.
I instantiate the sockets with default settings -- I'm not touching soLinger, KeepAlive, noDelay or anything like that. I do not have any timeouts set on send/receive. I've tried using WireShark but it reveals nothing unusual, just like NETSTAT.
I'm pretty desperate for answers on this. I've put a lot of effort into this project and am frustrated with what appears to be a serious hidden flaw in Java's default TCP implementation.
Get rid of shutdownInput() and shutdownOutput() and all the closes except the close of the BufferedOutputStream, plus a subsequent close on the socket itself in a finally block as belt and braces. You are shutting down and closing everything else before the output stream, which prevents it from flushing. Closing the output stream flushes it and closes the socket. That's all you need.
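Reusing the field names from the question, the trimmed-down method would look roughly like this (a sketch of the advice above, not the poster's actual code):

public final void disconnect()
{
    Alive = false;
    Log.write("Disconnecting " + _socket.getRemoteSocketAddress());
    try
    {
        _output.close(); // flushes the buffered data and closes the underlying socket
    }
    catch (final Exception e)
    {
        Log.write(e);
    }
    finally
    {
        try
        {
            _socket.close(); // belt and braces in case the stream close failed early
        }
        catch (final Exception e)
        {
            Log.write(e);
        }
    }
}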
OP here, unable to comment on original post.
Restarting the server process does not appear to resolve the issue. The network remains very "laggy" even several minutes after shutting down the server entirely.
By "laggy" I mean the connection becomes extremely slow with both up and down traffic. Trying to load websites, or upload to my FTP, is painfully slow like I'm on a 14.4k modem (I'm on a 15mbs fiber). Internet Speed Tests don't even work when it is in this state -- I get an error about not finding the file, when the websites eventually load up.
All of this instantly clears up after a reboot, and only after a reboot.
I modified my disconnect method as EJP suggested, but the problem persists.
Server runs on a Windows 7 installation, latest version of Java / Java SDK. The server has 16 GB of RAM, although it's possible I'm not allocating it properly for the JVM to use fully. No stray threads or processes appear to be present. I'll see what JVISUALVM says.
Nothing unusual in JVISUALVM: 10 MB heap, 50% CPU use, 3160 objects (expected), 27 live threads out of 437 started. The server has been running for about 18 hours; loading CNN's front page takes about a minute, and the speed test I normally use (first hit when googling Speed Test) won't even load the page. NETSTAT shows no lingering connections. Ran all up-to-date antivirus. The server has run 24/7 in the past without any issues; it is only since I started running this Java server on it that this began to happen.
In JMS it is easy to find out if a connection is lost: an exception happens. But how do I find out when the connection is there again?
Scenario: I use JMS to communicate with my server. Now my connection breaks (the server is down), which results in an exception. So far so good. When the server is up again and the connection is re-established, how do I know that?
I don't see any listeners that would provide such information.
Ahhh...the old exception handling/reconnection conundrum.
There are some transport providers that will automatically reconnect your application for you, and some that make the app drive the reconnection. In general, automatic reconnection hides the exception from the application. The downside is that you don't want the app to hang forever if all the remote messaging nodes are down, so ultimately you must include some reconnection logic.
Now here's the interesting part - how do you handle the exceptions in a provider neutral way? The JMS exception is practically worthless. For example, a "security exception" can be that the Java security policies are too restrictive, that the file system permissions are too restrictive, that the LDAP credentials failed, that the connection to the transport failed, that the open of the queue or topic failed or any of dozens of other security-related problems. It's the linked exception that has the details from the transport provider that really help debug the problem. My clients have generally taken one of three different approaches here...
1. Treat all errors the same. Close all objects and reinitialize them. This is JMS-portable.
2. Allow the app to inspect the linked exceptions to distinguish between fatal and transient errors (i.e. auth error vs. queue full). Not provider-portable; see the sketch after this list.
3. Provider-specific error-handling classes. A hybrid of the other two.
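A rough sketch of option 2, assuming WebSphere MQ is the provider and its client classes (com.ibm.mq.MQException, with a public reasonCode field) are on the classpath; the two reason codes shown are just examples:

try {
    // ... some JMS operation ...
} catch (JMSException je) {
    Exception linked = je.getLinkedException();
    if (linked instanceof com.ibm.mq.MQException) {
        int reason = ((com.ibm.mq.MQException) linked).reasonCode;
        if (reason == 2053) {        // MQRC_Q_FULL: transient, back off and retry
            // retry later
        } else if (reason == 2035) { // MQRC_NOT_AUTHORIZED: fatal, close everything and alert someone
            // give up and escalate
        }
    } else {
        // unknown provider or no linked exception: treat as fatal, close and reinitialize
    }
}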
In your case, the queue and topic objects are probably only valid in the context of the original connection. Assuming a provider that reconnects automatically, the fact that you got an exception means the reconnect failed and the context for the queue and topic objects could not be restored. Close all objects and reconnect.
Whether you want to do something more provider-specific such as distinguish between transient and permanent errors is one of those "it depends" things and you'll have to figure that out on a case-by-case basis.
The best way to monitor for connection exceptions is to set an exception listener, for example:
ConnectionFactory connectionFactory = (ConnectionFactory) context.lookup("jmsContextName");
connection = connectionFactory.createConnection();
connection.setExceptionListener(new ExceptionListener() {
    @Override
    public void onException(JMSException exception) {
        logger.error("ExceptionListener triggered: " + exception.getMessage(), exception);
        try {
            Thread.sleep(5000); // wait 5 seconds (JMS server restarted?)
            restartJSMConnection();
        } catch (InterruptedException e) {
            logger.error("Error pausing thread: " + e.getMessage());
        }
    }
});
connection.start();
The JMS spec does not describe any transport protocol; it says nothing about connections (i.e. whether the broker should keep them alive or establish a new connection for every session). So I think what you mean by
Now my connection breaks (the server is down), which results in an exception.
is that you are trying to send a message and you are getting a JMSException.
I think the only way to see whether the broker is up is to try to send a message.
Your only option in the case of a connection-related JMSException is to attempt to re-establish the connection in your exception handler and retry the operation.
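A minimal sketch of that pattern (sendMessage() and reconnect() are placeholders for your own send and connection-setup logic, and the retry count is arbitrary):

private static final int MAX_RETRIES = 3;

void sendWithRetry(String text) throws JMSException {
    JMSException lastError = null;
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
        try {
            sendMessage(text); // the normal send path
            return;
        } catch (JMSException je) {
            lastError = je;    // remember the failure, rebuild the connection and try again
            reconnect();
        }
    }
    throw lastError;           // still failing after all retries
}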