Tyrus sometimes gets stuck when trying to connect - Java

I'm using tyrus-standalone-client-1.12.jar to maintain a connection to a WebSocket server (or set of servers) I have no control over. I create a ClientManager instance that I configure and then use clientManager.asyncConnectToServer(this, new URI(server)), where this is an instance of a class with annotated methods like @OnOpen, @OnMessage and so on.
I also have a ClientManager.ReconnectHandler registered that handles onDisconnect and onConnectFailure and of course outputs debug messages.
Most of the time it connects just fine, but especially when the server has issues and I lose the connection, reconnecting sometimes doesn't work.
I first noticed it when I simply returned true in onDisconnect and it sometimes just wouldn't reconnect (which the ReconnectHandler should then have done for me, and usually did). The rest of the program keeps running just fine, but my WebSocket client just doesn't do anything after the debug message in onDisconnect.
Since then I have changed it so that the ReconnectHandler only schedules another asyncConnectToServer call (on a delay), so that I can switch to another server (I couldn't find a way to do that with the ReconnectHandler alone). Even then, asyncConnectToServer sometimes just seems to do nothing. I'm sure it does something, but it doesn't output the debug message in onConnectFailure and doesn't call onOpen either, even after hours, so the client ends up stuck there.
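The delayed reconnect described above can be sketched with a plain ScheduledExecutorService from the JDK. This is a minimal illustration, not the Tyrus API itself: the actual clientManager.asyncConnectToServer(...) call is represented by a Runnable, since the endpoint and URI details aren't shown in the question.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Schedules a reconnect attempt after a delay, outside the ReconnectHandler's
// built-in retry, so the target URI can be swapped before the next attempt.
public class DelayedReconnector {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // 'connectAttempt' stands in for clientManager.asyncConnectToServer(...).
    public void scheduleReconnect(Runnable connectAttempt, long delaySeconds) {
        scheduler.schedule(connectAttempt, delaySeconds, TimeUnit.SECONDS);
    }

    public void shutdown() {
        scheduler.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        DelayedReconnector r = new DelayedReconnector();
        CountDownLatch attempted = new CountDownLatch(1);
        r.scheduleReconnect(attempted::countDown, 0); // zero delay for the demo
        System.out.println("reconnect attempted: "
                + attempted.await(5, TimeUnit.SECONDS));
        r.shutdown();
    }
}
```

Using a single-threaded scheduler also serializes reconnect attempts, so two overlapping attempts can't race each other.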
Again, this isn't always the case. It can reconnect just fine several times, triggered both by onDisconnect and by onConnectFailure, and then suddenly hang on one attempt. When I had two instances of the program running at the same time, both reconnected a few times and then both hung on asyncConnectToServer at the same reconnect attempt, which seems to indicate that it is caused by some state of the server or connection.
One time it even failed to connect when initially connecting (not reconnecting), during the same period in which the server seemed to have issues.
Does anyone have an idea what could cause this?
Am I missing some property to set a connection attempt timeout?
Am I missing some way to retrieve connection failure info other than ReconnectHandler.onConnectFailure?
Am I even allowed to reuse the same ClientManager to connect several times (after the previous connection closed)?
Could there be something in my client endpoint implementation that somehow prevents onOpen or onConnectFailure from being called?
It's kind of hard to tell what's going on without getting any error message and without being able to reproduce it. I used JConsole's Detect Deadlock button on the program with a client hanging like this and it didn't detect anything (not sure if it would, but I thought I'd mention it).
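JConsole's deadlock button only finds monitor and ownable-synchronizer deadlocks, so a thread that is simply parked waiting forever won't show up there. A full thread dump of the hanging process can show where the client's worker thread is actually blocked; as a sketch, the JDK's ThreadMXBean can produce one programmatically (jstack on the command line gives the same information).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Prints a stack trace for every live thread, similar to what jstack produces.
// Useful for seeing where a stuck worker thread is actually waiting.
public class ThreadDumper {
    public static String dumpAllThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : bean.dumpAllThreads(true, true)) {
            sb.append(info.toString());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dump = dumpAllThreads();
        // The main thread itself should always appear in the dump.
        System.out.println(dump.contains("main") ? "dump ok" : "dump missing main");
    }
}
```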


Getting MQ error (reason 2594) when trying to connect to MQ manager after first few messages

I am upgrading a standalone Java app that uses IBM MQ to send messages to a local WebSphere 8.5 server. The existing app uses a bunch of different JARs for the MQ code (mq, mqbind, mqjms, connector-api, jms).
For the new one I saw that there is now an all-encompassing "allclient" MQ JAR (https://mvnrepository.com/artifact/com.ibm.mq/com.ibm.mq.allclient/9.2.0.0) so I decided to use that.
It appears to work just fine for the first few messages, but after sending 4-5 messages, all subsequent messages will then fail with a code 2594 (https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_9.1.0/com.ibm.mq.tro.doc/q120510_.htm):
Caused by: com.ibm.mq.jmqi.JmqiException: CC=2;RC=2594;AMQ9204: Connection to host 'localhost(5558)' rejected. [1=com.ibm.mq.jmqi.JmqiException[CC=2;RC=2594;AMQ9503: Channel negotiation failed. [3=WAS.JMS.SVRCONN ]],3=localhost(5558),5=RemoteConnection.initSess]
at com.ibm.mq.jmqi.remote.api.RemoteFAP$Connector.jmqiConnect(RemoteFAP.java:13588)
at com.ibm.mq.jmqi.remote.api.RemoteFAP$Connector.access$100(RemoteFAP.java:13125)
at com.ibm.mq.jmqi.remote.api.RemoteFAP.jmqiConnect(RemoteFAP.java:1430)
at com.ibm.mq.jmqi.remote.api.RemoteFAP.jmqiConnect(RemoteFAP.java:1389)
at com.ibm.mq.ese.jmqi.InterceptedJmqiImpl.jmqiConnect(InterceptedJmqiImpl.java:377)
at com.ibm.mq.ese.jmqi.ESEJMQI.jmqiConnect(ESEJMQI.java:562)
at com.ibm.mq.MQSESSION.MQCONNX_j(MQSESSION.java:916)
at com.ibm.mq.MQManagedConnectionJ11.<init>(MQManagedConnectionJ11.java:240)
On the server side, I get the following in the console:
CWSIC3712E: A WebSphere MQ client, previously connected from host 127.0.0.1:58963 on transport chain InboundBasicMQLink, has been disconnected because of exception java.io.IOException: Async IO operation failed (1), reason: RC: 55 The specified network resource or device is no longer available.
After this error occurs, any subsequent attempt to send a message fails with the same error. I have to restart the app, at which point the same thing repeats: the first 4-5 messages send before the failures begin. If I switch back to the old JARs without changing the code, I'm able to send an unlimited number of messages without any issues.
The reason code is confusing to me ("An MQCONN or MQCONNX call was issued from a client connected application, but it failed to agree a password protection algorithm with the queue manager.") because if it's truly a password issue, why do the first couple of messages send without issue? It does not seem to be an issue with closing/disconnecting the queue/manager, because I wait a few seconds between each send and can breakpoint/println and see that they are closed each time before the next send.
Any ideas?
I found a partial workaround to this problem:
In the Java code, for our MQMessage object, we set a "replyToQueueName". If I remove that setter, the problem seems to go away (we can send as many messages as we want without error).
I'm not certain why that works in this particular case. The failure occurs on the MQQueueManager construction, which is much higher up in the code than the replyToQueueName setter. That, combined with the bug only occurring after 4-5 messages are sent, seems to indicate that something is not being "closed" properly, but as far as I know there is no way to "close" a message, and we are already closing/disconnecting the manager and the queue.

How to obtain the status of an instance of java.rmi.Remote

I have an instance of a class A that implements java.rmi.Remote.
In order to check the health of the connection to the RMI Server, I invoke a custom-made, trivial member function of the instance of A and see if an Exception is thrown. That's not really elegant. Therefore my question:
Is there any native way to check if the connection is available for method invocation on the instance of A, i.e. without the need to actually try to call a member function?
A special case is: Should the RMI server be restarted during the lifetime of the instance of A on the client side, then the instance of A becomes invalid and defunct (although the server might be back up and healthy).
From the Java RMI FAQ:
F.1 At what point is there a "live" connection between the client and
the server and how are connections managed?
When a client does a "lookup" operation, a connection is made to the
rmiregistry on the specified host. In general, a new connection may or
may not be created for a remote call. Connections are cached by the
Java RMI transport for future use, so if a connection is free to the
right destination for a remote call, then it is used. A client cannot
explicitly close a connection to a server, since connections are
managed at the Java RMI transport level. Connections will time out if
they are unused for a period of time.
Your questions:
Is there any native way to check if the connection is available for
method invocation on the instance of A, i.e. without the need to
actually try to call a member function?
This question boils down to how to check programmatically whether a given server/system is up. That has already been answered several times here and on several other forums; one such question is Checking if server is online from Java code.
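The approach in that linked question boils down to attempting a TCP connect with a timeout. A minimal, stdlib-only sketch:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Attempts a plain TCP connect to the given host/port with a timeout.
// Caveat: this only proves the port accepts connections; it does not prove
// that the RMI remote object is exported and healthy.
public class ServerProbe {
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo: probe a listening socket that we open ourselves.
        try (ServerSocket server = new ServerSocket(0)) {
            System.out.println("open port: "
                    + isReachable("127.0.0.1", server.getLocalPort(), 1000));
        }
    }
}
```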
A special case is: Should the RMI server be restarted during the
lifetime of the instance of A on the client side, then the instance of
A becomes invalid and defunct (although the server might be back up
and healthy).
Then again, the answer is pretty easy: if the instance was busy performing a step within a remote method invocation, a connection-related exception would be thrown immediately.
Again, from RMI FAQ, D.8 Why can't I get an immediate notification when a client crashes? :
If a TCP connection is held open between the client and the server
throughout their interaction, then the server can detect the client
reboot (I'm adding here: and vice versa) when a later attempt to write to the connection
fails (including the hourly TCP keepalive packet, if enabled).
However, Java RMI is designed not to require such permanent
connections forever between client and server (or peers), as it impairs
scalability and doesn't help very much.
Given that it is absolutely impossible to instantly determine when a
network peer crashes or becomes otherwise unavailable, you must decide
how your application should behave when a peer stops responding.
The lookup will keep working as long as the server is up and doesn't go down while the client is invoking the remote method. You must decide how your application should behave if the peer restarts. Additionally, there is no concept of a session in RMI.
I hope this answers all of your questions.
Your question is founded on a fallacy.
Knowing the status in advance doesn't help you in the slightest. The status test is followed by a timing window which is followed by your use of the server. During the timing window, the status can change. The server could be up when you test and down when you use. Or it could be down when you test and up when you use.
The correct way to determine whether any resource is available is to try to use it. This applies to input files, RMI servers, Web systems, ...
Should the RMI server be restarted during the lifetime of the instance of A on the client side, then the instance of A becomes invalid and defunct (although the server might be back up and healthy).
In this case you will get either a java.rmi.ConnectException or a java.rmi.NoSuchObjectException depending on whether the remote object restarted on a different port or the same port.
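The stale-stub failure mode described here is usually handled by catching the RemoteException and re-doing the registry lookup once. A generic sketch follows; since the question doesn't show the registry details, the lookup is represented by a Supplier (in real code it would be something like Naming.lookup(url)), and the RemoteRetry/RemoteCall names are my own.

```java
import java.rmi.RemoteException;
import java.util.function.Supplier;

// Retries a remote call once after re-fetching the stub, which covers the
// NoSuchObjectException / ConnectException cases a server restart produces.
public class RemoteRetry {
    @FunctionalInterface
    public interface RemoteCall<S, T> {
        T apply(S stub) throws RemoteException;
    }

    public static <S, T> T callWithRelookup(Supplier<S> lookup,
                                            RemoteCall<S, T> call) throws RemoteException {
        S stub = lookup.get();
        try {
            return call.apply(stub);
        } catch (RemoteException first) {
            stub = lookup.get();     // e.g. re-do Naming.lookup(url)
            return call.apply(stub); // a second failure propagates to the caller
        }
    }

    public static void main(String[] args) throws RemoteException {
        // Demo: the first call always fails, the retry after "re-lookup" succeeds.
        boolean[] failedOnce = {false};
        String result = callWithRelookup(
                () -> "stub",
                s -> {
                    if (!failedOnce[0]) {
                        failedOnce[0] = true;
                        throw new RemoteException("stale stub");
                    }
                    return "ok";
                });
        System.out.println("result: " + result);
    }
}
```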

Some Spring WebSocket Sessions never disconnect

I have a WebSocket solution for duplex communication between mobile apps and a Java backend system. I am using Spring WebSockets with STOMP. I have implemented a ping-pong solution to keep the websockets open longer than 30 seconds, because I need longer sessions than that. Sometimes I get these errors in the logs, which seem to come from checkSession() in Spring's SubProtocolWebSocketHandler.
server.log: 07:38:41,090 ERROR [org.springframework.web.socket.messaging.SubProtocolWebSocketHandler] (ajp-http-executor-threads - 14526905) No messages received after 60205 ms. Closing StandardWebSocketSession[id=214a10, uri=/base/api/websocket].
They are not very frequent, but they happen every day, and the 60-second timeout seems appropriate since it's hardcoded into the Spring class mentioned above. But after running the application for a while I start getting large numbers of these really long-lived 'timeouts':
server.log: 00:09:25,961 ERROR [org.springframework.web.socket.messaging.SubProtocolWebSocketHandler] (ajp-http-executor-threads - 14199679) No messages received after 208049286 ms. Closing StandardWebSocketSession[id=11a9d9, uri=/base/api/websocket].
And at about this time the application starts experiencing problems.
I've been trying to search for this behavior but haven't found it anywhere on the web. Has anyone seen this problem before, found a solution, or can explain it to me?
We found some things:
We have added our own ping/pong functionality on STOMP level that runs every 30 seconds.
The mobile client had a bug that caused it to keep replying to the pings even in screensaver mode. This meant that the websocket was never closed and never timed out.
On each pong message that the server received, the Spring check found that no 'real' messages had been received for a very long time and wrote the log entry. It then tries to close the websocket with this code:
session.close(CloseStatus.SESSION_NOT_RELIABLE);
but I suspect this doesn't close the session correctly. And even if it did, the mobile clients would try to reconnect. So after 30 more seconds another pong message is sent to the server, causing yet another one of these log entries to be written. And so on, forever...
The solution was to write some server-side code to close old websockets based on this project and also to fix the bug in the mobile clients that made them respond to ping/pong even when being in screensaver mode.
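The server-side cleanup described above amounts to tracking the last real (non-pong) message per session and periodically sweeping for sessions that have been idle too long. A simplified, framework-free sketch, with session IDs standing in for the actual WebSocketSession objects (the SessionSweeper name and 30-second limit are illustrative):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Tracks the last time each session sent a "real" message (pongs excluded)
// and reports sessions idle longer than the allowed limit, so the server
// can close them itself instead of relying on the container's check.
public class SessionSweeper {
    final Map<String, Instant> lastRealMessage = new ConcurrentHashMap<>();
    private final Duration maxIdle;

    public SessionSweeper(Duration maxIdle) {
        this.maxIdle = maxIdle;
    }

    // Call this from the handler for application messages, not for pongs.
    public void onRealMessage(String sessionId) {
        lastRealMessage.put(sessionId, Instant.now());
    }

    // Run periodically; close (and forget) every session returned here.
    public List<String> staleSessions(Instant now) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Instant> e : lastRealMessage.entrySet()) {
            if (Duration.between(e.getValue(), now).compareTo(maxIdle) > 0) {
                stale.add(e.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        SessionSweeper sweeper = new SessionSweeper(Duration.ofSeconds(30));
        sweeper.onRealMessage("fresh");
        // Simulate a session whose last real message was a minute ago.
        sweeper.lastRealMessage.put("idle", Instant.now().minusSeconds(60));
        System.out.println("stale: " + sweeper.staleSessions(Instant.now()));
    }
}
```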
Oh, one thing that might be good for other people to know: clients should never be trusted. We saw that they could sometimes send multiple requests for websockets within one millisecond, so make sure to handle these 'duplicate requests' somehow!
I am also facing the same problem.
netstat output on Linux shows the TCP connections and their status as below:
1 LISTEN
13 ESTABLISHED
67 CLOSE_WAIT
67 TCP connections are in CLOSE_WAIT, waiting to be closed, but they never get closed.

Client and Server connection fails around 4/5 of the time

For a while now I have been playing around with networking between my computer and my Android phone, using Java for both the server and the client app.
When connected, it works as it should, both ends sending and receiving correctly. However, I'm having trouble getting them to connect reliably, even on a LAN: the client usually times out when connecting, particularly when getting the ObjectInputStream.
I have tried increasing and decreasing the timeouts on both the client and server sockets, with no better result; only worse.
I'm not exactly certain what information anyone would need in order to help, so if I'm missing something from this text, please tell me and I'll provide what I've got.
I would just like some tips as to what may be wrong with my code or program flow that causes this problem. Are there problems with the timing of the connection? In that case, why does it work when I get the ObjectOutputStream from the socket?
In case it is of help, I've provided what I'd imagine is the most important parts of the connection classes on the bottom of the question post.
If this question is asked in the wrong way or is just too specific to my own project, please tell me and I'll remove it and try again.
ClientConnection.java - The class handling all connection on the Android client.
Server.java - Handles connecting clients and requests.
Connection.java - Currently just a storage class but will later also be used to check if the server should send a connection check and make sure the client is still connected.
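One detail worth checking, given that the hang happens while getting the ObjectInputStream: the new ObjectInputStream(...) constructor blocks until it has read the serialization stream header from the peer, so both sides must create their ObjectOutputStream first and flush it before constructing their ObjectInputStream. A self-contained sketch of the safe ordering, using piped streams in place of the actual sockets:

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Demonstrates the required ordering for paired object streams:
// create the ObjectOutputStream first and flush it, so the peer's
// ObjectInputStream constructor can read the stream header and return.
public class StreamOrdering {
    public static void main(String[] args) throws Exception {
        PipedOutputStream clientToServer = new PipedOutputStream();
        PipedInputStream serverIn = new PipedInputStream(clientToServer);

        Thread client = new Thread(() -> {
            try {
                ObjectOutputStream out = new ObjectOutputStream(clientToServer);
                out.flush();             // pushes the stream header to the server
                out.writeObject("hello");
                out.flush();
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        client.start();

        // This would block indefinitely if the client never sent the header.
        ObjectInputStream in = new ObjectInputStream(serverIn);
        System.out.println("received: " + in.readObject());
        client.join();
    }
}
```

If both sides construct their ObjectInputStream before either has flushed an ObjectOutputStream, each end waits forever for the other's header, which looks exactly like a connect timeout.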

How to fix error: [BEA][SQLServer JDBC Driver]No more data available to read

My Java application uses DB connection pooling. One piece of functionality started failing today with this error:
[BEA][SQLServer JDBC Driver]No more data available to read
This doesn't occur daily. Once I restart my application server, things look fine for some days, and then the error comes back.
Has anyone encountered this error? The reasons might vary, but I would like to know the various possible reasons so I can mitigate my issue.
Is it possible that the database or network connection has briefly had an outage? You might expect any currently open result sets then to become invalid with resulting errors.
I've never seen this particular error, but then I don't work with BEA or SQL Server, but a quick google does show other folks suggesting such a cause.
When you're using a connection pool, if you do get such a glitch, all connections in the pool become "stale" or invalid. My application server (WebSphere) has the option to discard the entire connection pool after particular errors are detected. The result is that one unlucky request sees the error, but subsequent requests get a new connection and recover. If you don't discard the whole pool, you get a failure as each stale connection is used and discarded.
I suggest you investigate a) whether your app server has such a capability, and b) how your application responds if the database is bounced; if that replicates the error, then maybe you've found the cause.
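The "discard stale connections" behavior described above can also be approximated in application code by validating each connection before handing it out; JDBC 4's Connection.isValid(int) is the standard hook for the check. A simplified pool sketch with a pluggable validity test so it stays self-contained (the ValidatingPool name is mine; real pools expose this as options like test-on-borrow or purge policies):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Minimal borrow logic: hand out a pooled connection only if it passes a
// validity check (in real code, c -> c.isValid(5) on java.sql.Connection);
// stale ones are dropped and a fresh connection is created instead.
public class ValidatingPool<C> {
    private final Deque<C> idle = new ArrayDeque<>();
    private final Supplier<C> factory;
    private final Predicate<C> valid;

    public ValidatingPool(Supplier<C> factory, Predicate<C> valid) {
        this.factory = factory;
        this.valid = valid;
    }

    public synchronized C borrow() {
        while (!idle.isEmpty()) {
            C candidate = idle.pop();
            if (valid.test(candidate)) {
                return candidate;   // reuse a healthy connection
            }                       // else: silently drop the stale one
        }
        return factory.get();       // pool empty or all stale: make a new one
    }

    public synchronized void release(C connection) {
        idle.push(connection);
    }

    public static void main(String[] args) {
        // Demo with strings standing in for connections: "stale" fails the check.
        ValidatingPool<String> pool =
                new ValidatingPool<>(() -> "fresh", c -> !c.equals("stale"));
        pool.release("stale");
        System.out.println("borrowed: " + pool.borrow());
    }
}
```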
