I have a fairly complex WebSocket-based application running on an up-to-date Tomcat 8 server.
At fairly random intervals I get this exception, simultaneously for all connected users.
java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:275)
at org.apache.tomcat.websocket.PerMessageDeflate.sendMessagePart(PerMessageDeflate.java:377)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:284)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:258)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendPartialBytes(WsRemoteEndpointImplBase.java:161)
at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendBinary(WsRemoteEndpointBasic.java:56)
at org.springframework.web.socket.adapter.standard.StandardWebSocketSession.sendBinaryMessage(StandardWebSocketSession.java:202)
at org.springframework.web.socket.adapter.AbstractWebSocketSession.sendMessage(AbstractWebSocketSession.java:105)
at org.infpls.royale.server.game.session.RoyaleSession.sendImmiediate(RoyaleSession.java:69)
at org.infpls.royale.server.game.session.SessionThread.run(SessionThread.java:46)
After this exception is thrown, the web socket is left in a PARTIAL_WRITING state and disconnects on the next write attempt.
I've seen it happen 15 minutes after starting Tomcat and I've seen it happen after idling on the server for 8 hours. I cannot find any correlation between what users are doing on the server and when this exception is thrown.
The problem seems to originate fairly deep inside the Tomcat/Spring NIO code and I am not sure how to debug this.
Any help would be greatly appreciated!
I took a shot in the dark after reading some loosely related issues from other people: instead of passing the ByteBuffer off to the client threads to send it, I now make a full copy of the byte buffer for each thread that sends it.
I also switched from using websocket.sendBinary(ByteBuffer bb) to using websocket.sendBinary(byte[] bb) instead.
That seems to have done the trick, as I have not seen this bug happen again while running the server in production mode under pretty heavy load for 12 hours. If I find further information about this I will update it here.
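For reference, here is a minimal sketch of that copy-per-send approach, assuming a Spring WebSocketSession as in the stack trace above (the helper class name is made up, not from the real application):

import java.io.IOException;
import java.nio.ByteBuffer;
import org.springframework.web.socket.BinaryMessage;
import org.springframework.web.socket.WebSocketSession;

class BroadcastHelper {
    // Give every session its own copy of the payload so that no two sender
    // threads ever share (and concurrently mutate the position/limit of) one ByteBuffer.
    static void send(WebSocketSession session, ByteBuffer shared) throws IOException {
        byte[] copy = new byte[shared.remaining()];
        shared.duplicate().get(copy); // duplicate() leaves the original buffer's position untouched
        session.sendMessage(new BinaryMessage(copy));
    }
}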
I am developing a NodeMCU WebSocket server with an Android client app written in Java. I successfully created the client and connected to the server through a WebSocket client service. I can detect that the server has failed/closed when sending data, but I can't detect it at the time of failure; that is, if the server is powered off I can't know until some data is sent. How can I know about the server failure at the time of the failure? I am using the OkHttp 4.1.0 library. Can anyone help?
How to know about the server failure at the time of the failure, using the OkHttp 4.1.0 library?
You can't. It's not possible, but there are workarounds; see below.
Why isn't it possible? Internally, the internet is packet switched, which means data is first gathered up into packets, and then these packets are sent.
Most of the stuff you do on the web feels like it is 'streams' instead (you send 1 character, and one character arrives on the other side). But that's all based on protocols that are built on top of the packet nature of the internet.
When you have an open connection between 2 computers via the internet, no data is actually being sent, at all. It's not like you have a line reserved. Old telephone networks did work like that: When you dialled somebody, you got a dedicated line, and once the line got interrupted, you'd hear beeps to indicate this.
That is not how the internet works. Those wires and everything in between have no idea that there is an open connection at all. That's just some bits in memory on your computer and on the server which lets them identify certain packets as part of the longer conversation those 2 machines were having, is all.
Thus we arrive at why this isn't possible: given that no packets are flowing whatsoever until one side actually sends data to the other, it is impossible to tell the difference between 'no data being sent right now' and 'somebody tripped over the power cable in the server park'. That's why you don't get that info until you send something, and the only reason you get it then is that when you send something, the protocol dictates that the server sends back a confirmation of receiving what you sent. If that confirmation takes too long, your computer will resend the data a few times just in case the packet got lost somewhere, will eventually give up and conclude that the server can no longer be reached, crashed, or lost power, and only then do you get the IOException.
Workarounds
A simple one is to upgrade your own protocol: Dictate that the server or client (doesn't matter who takes the responsibility to do this) sends a do-nothing message at least once a minute. You can then conclude after not receiving that for 100 seconds or so that the connection is probably dead. You can start a timer for 100 seconds, reset it every time you receive any data whatsoever. If the timer ever runs out? Connection is likely dead.
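As a rough sketch of that timer (the class and method names here are made up, not from any particular library):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Declare the connection dead if nothing at all has been received for 100 seconds.
class ConnectionWatchdog {
    private static final long TIMEOUT_MS = 100_000;
    private final AtomicLong lastReceived = new AtomicLong(System.currentTimeMillis());
    private final ScheduledExecutorService checker = Executors.newSingleThreadScheduledExecutor();

    void start(Runnable onDead) {
        checker.scheduleAtFixedRate(() -> {
            if (System.currentTimeMillis() - lastReceived.get() > TIMEOUT_MS) {
                onDead.run(); // e.g. close the socket and start a reconnect attempt
                checker.shutdown();
            }
        }, 10, 10, TimeUnit.SECONDS);
    }

    // Call this from the message handler for *any* inbound data, including the do-nothing messages.
    void touch() {
        lastReceived.set(System.currentTimeMillis());
    }
}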
There is a version of this idea built into the protocol that lets you make connections that feel like streams of data. That protocol is called TCP/IP, and the feature is called keepalive.
The problem is, you possibly don't get to dictate the TCP/IP settings for your websocket connection. If you can, you can turn on keepalive (for example, in Java you use Socket to make raw TCP/IP connections, and it has a .setKeepAlive(true) method). Check the API to see whether you can get at the socket, or otherwise scan the docs for 'keepalive' and see if there's anything there.
I bet there won't be, which means you have to use the trick I mentioned above: Update your server code to use a timer to send a 'hello!' 60 seconds after any conversation, and update your client code to give up on the connection once 100 seconds have passed (give it 40 additional seconds; sometimes the internet gets a little backed up or servers get a little busy).
UPDATE:
My goal is to learn what factors could overwhelm my little Tomcat server, and when some exception happens, what I can do to resolve or remediate it without moving my server to a better machine. This is not a real app in a production environment, just my own experiment. (Besides changes on the server side, I may also do something on the client side.)
Both my client and server are very simple: the server only checks the URL format and sends a 201 code if it is correct. Each request sent from my client only includes a simple JSON body. There is no database involved. The two machines (t2.micro) run only the client and the server respectively.
My client is an OkHttpClient(). To avoid timeout exceptions, I already set the timeouts to 1,000,000 milliseconds via setConnectTimeout, setReadTimeout, and setWriteTimeout. I also went to $CATALINA/conf/server.xml on my server and set connectionTimeout="-1" (infinite).
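Roughly, the client configuration described above looks like this (the class name is just for illustration):

import java.util.concurrent.TimeUnit;
import okhttp3.OkHttpClient;

class StressClient {
    // OkHttp client with the very large timeouts mentioned in the question.
    static OkHttpClient build() {
        return new OkHttpClient.Builder()
                .connectTimeout(1_000_000, TimeUnit.MILLISECONDS)
                .readTimeout(1_000_000, TimeUnit.MILLISECONDS)
                .writeTimeout(1_000_000, TimeUnit.MILLISECONDS)
                .build();
    }
}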
ORIGINAL POST:
I'm trying to stress my server by having a client launch 3000+ threads that send HTTP requests to it. The client and server reside on different EC2 instances.
Initially I encountered some timeout issues, but after I set the connection, read, and write timeouts to larger values, those were resolved. However, with the same setup, I'm now getting a java.net.ConnectException: Failed to connect to my_host_ip:8080 exception, and I do not know its root cause. I'm new to multithreading and distributed systems; can anyone please give me some insight into this exception?
Below are some screenshots from my EC2 instances (1. Client, 2. Server); the images are not reproduced here.
Having gone through a similar exercise in the past, I can say that there is no definitive answer to the problem of scaling.
Here are some general troubleshooting steps that may lead to more specific information. I would suggest running tests in which you tweak a few parameters each time and measure the changes in CPU, logs, etc.
Please share what values you have set for the timeouts. Increasing a timeout can cause your server (or client) to run out of threads quickly (because each thread can be busy for longer). Question the need for increasing the timeout: is there any processing that slows your server down?
Check application logs, JVM usage, memory usage on the client and Server. There will be some hints there.
Your client seems to be hitting 99%+ and then coming down. This implies that there could be a problem on the client side, in that it maxes out during the test. You might want to resize your client so it can do more.
Look at open file handles. The number should be sufficiently high.
Tomcat has a limit on the number of threads it uses to handle load (the maxThreads attribute on the Connector in server.xml, 200 by default); if required, change it to handle more. Although the CPU doesn't actually max out on the server side, so it is unlikely that this is the problem.
If you use a database, check its performance as well. Also check the JDBC connection settings; there is thread and timeout configuration at the JDBC level too.
Is response compression set up on Tomcat? It will give much better throughput on the server, especially if the data sent back by each request is more than a few KBs.
--------Update----------
Based on the update to the question, a few more thoughts.
Since the application is fairly simple, the way to stress the server is to start low and increase the load in increments whilst monitoring various things (CPU, memory, JVM usage, file handle count, network I/O).
The increments of load should be spread over several runs.
Start with something as low as 100 parallel threads.
Record as much information as you can after each run and if the server holds up well, increase load.
Suggested increments 100, 200, 500, 1000, 1500, 2000, 2500, 3000.
At some level you will see that the server can no longer take it. That would be your breaking point.
As you increase the load and monitor, you will likely discover patterns that suggest tuning specific parameters. Each tuning attempt should then be tested again at the same level of multithreading; any improvement will be obvious from the monitoring.
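A hedged sketch of one such load step, using OkHttp as in the question (the class name, URL handling, and JSON body are placeholders; it assumes the server answers 201 as described):

import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

// Run this once per load level (100, 200, 500, ...) and record the failure count
// together with the server-side metrics gathered during the run.
class LoadStep {
    static void run(OkHttpClient client, String url, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        AtomicInteger failures = new AtomicInteger();
        RequestBody body = RequestBody.create(
                MediaType.get("application/json; charset=utf-8"), "{\"ping\":true}");
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                Request request = new Request.Builder().url(url).post(body).build();
                try (Response response = client.newCall(request).execute()) {
                    if (response.code() != 201) failures.incrementAndGet();
                } catch (IOException e) {
                    failures.incrementAndGet();
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
        pool.shutdown();
        System.out.println(threads + " threads -> " + failures.get() + " failed requests");
    }
}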
I have a websocket solution for duplex communication between mobile apps and a java backend system. I am using Spring WebSockets, with STOMP. I have implemented a ping-pong solution to keep websockets open longer than 30 seconds because I need longer sessions than that. Sometimes I get these errors in the logs, which seem to come from checkSession() in Spring's SubProtocolWebSocketHandler.
server.log: 07:38:41,090 ERROR [org.springframework.web.socket.messaging.SubProtocolWebSocketHandler] (ajp-http-executor-threads - 14526905) No messages received after 60205 ms. Closing StandardWebSocketSession[id=214a10, uri=/base/api/websocket].
They are not very frequent, but they happen every day, and the time of 60 seconds seems appropriate since it's hardcoded into the Spring class mentioned above. But then, after running the application for a while, I start getting large amounts of these really long-lived 'timeouts':
server.log: 00:09:25,961 ERROR [org.springframework.web.socket.messaging.SubProtocolWebSocketHandler] (ajp-http-executor-threads - 14199679) No messages received after 208049286 ms. Closing StandardWebSocketSession[id=11a9d9, uri=/base/api/websocket].
And at about this time the application starts experiencing problems.
I've been trying to search for this behavior but haven't found it anywhere on the web. Has anyone seen this problem before? Does anyone know a solution, or can anyone explain it to me?
We found some things:
We have added our own ping/pong functionality on STOMP level that runs every 30 seconds.
The mobile clients had a bug that caused them to keep replying to the pings even when going into screen-saving mode. This meant that the websocket was never closed or timed out.
On each pong message that the server received, the Spring check found that no 'real' messages had been received for a very long time and triggered the log line to be written. It then tries to close the websocket with this code:
session.close(CloseStatus.SESSION_NOT_RELIABLE);
but I suspect this doesn't close the session correctly. And even if it did, the mobile clients would try to reconnect. So when 30 more seconds have passed, another pong message is sent to the server, causing yet another one of these log lines to be written. And so on, forever...
The solution was to write some server-side code to close old websockets, based on this project, and also to fix the bug in the mobile clients that made them respond to ping/pong even while in screensaver mode.
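As a rough illustration of what that server-side cleanup can look like (the class name, the idle threshold, and the choice of close status are my own, not from the linked project):

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.springframework.web.socket.CloseStatus;
import org.springframework.web.socket.WebSocketSession;

// Track the last 'real' message per session and force-close sessions that have
// only been ponging for too long. closeStaleSessions() is meant to be run from
// a scheduled task, e.g. once a minute.
class StaleSessionReaper {
    private static final long MAX_IDLE_MS = 10 * 60 * 1000;
    private final Map<WebSocketSession, Long> lastRealMessage = new ConcurrentHashMap<>();

    void onRealMessage(WebSocketSession session) {
        lastRealMessage.put(session, System.currentTimeMillis());
    }

    void closeStaleSessions() {
        long now = System.currentTimeMillis();
        lastRealMessage.forEach((session, last) -> {
            if (now - last > MAX_IDLE_MS) {
                try {
                    session.close(CloseStatus.GOING_AWAY);
                } catch (IOException ignored) {
                    // the session may already be dead; nothing more we can do here
                }
                lastRealMessage.remove(session);
            }
        });
    }
}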
Oh, one thing that might be good for other people to know: clients should never be trusted. We saw that they could sometimes send multiple requests for websockets within one millisecond, so make sure to handle these 'duplicate requests' in some way!
I am also facing the same problem.
netstat output on Linux shows the TCP connections and their status as below:
1 LISTEN
13 ESTABLISHED
67 CLOSE_WAIT
67 TCP connections are waiting to be closed, but they never get closed.
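For background, CLOSE_WAIT means the remote end has already closed its side of the connection, but the local application has not yet called close() on its socket. A generic (plain-socket, not WebSocket-specific) sketch of a read loop that does not leak such sockets:

import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;

class CloseWaitFreeReader {
    static void readAll(String host, int port) throws IOException {
        try (Socket socket = new Socket(host, port);
             InputStream in = socket.getInputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                // handle the n bytes that were read
            }
        } // read() returned -1: the peer closed; try-with-resources closes our side,
          // so the socket does not linger in CLOSE_WAIT
    }
}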
I have an app which uses an instance of the Socket class to communicate with a server.
I use the streams returned by socket.getInputStream() and socket.getOutputStream() to read and write data.
When my Android app is always "active" (not minimized), there is no problem with the communication. It does not matter how long the connection lasts.
When I "pause" the application and re-open it quickly, everything still works fine.
However, when I pause the application for about 5 minutes and re-open it, the InputStream shows strange behavior: it stops reading anything. I get timeout errors instead of the data sent by the server.
The connection is still alive, the server is able to write and read. isInputShutdown() on the client-side returns false.
Using a network analysis tool, I can also see that the data sent by the server in fact reaches the client, but somehow it is never picked up by the InputStream...
However, writing data from the client to the server using the OutputStream works fine.
Maybe it's worth mentioning that the socket object and the streams are declared as static to be accessible for all the activities of the app. But as I don't have any problems with the OutputStream, I cannot imagine that this could be the reason.
The only workaround I have at this point is to close the whole socket and connect a new one to the server. But this is causing unnecessary network traffic because I have to handshake again. It would be better not to do it this way.
If anyone has had similar experience and found a solution, I would be really happy if you could share it with me.
You should create a Service that runs in the background and maintains the socket connection with the server.
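A minimal sketch of such a service (the host, port, and restart policy are placeholders; the service also has to be declared in AndroidManifest.xml, and the socket work has to stay off the main thread):

import android.app.Service;
import android.content.Intent;
import android.os.IBinder;
import java.net.Socket;

public class SocketService extends Service {
    private Thread worker;

    @Override
    public int onStartCommand(Intent intent, int flags, int startId) {
        if (worker == null) {
            worker = new Thread(() -> {
                try (Socket socket = new Socket("example.com", 1234)) {
                    // read from socket.getInputStream() / write to socket.getOutputStream() here
                } catch (Exception e) {
                    // log the error and schedule a reconnect as appropriate
                }
            });
            worker.start();
        }
        return START_STICKY; // ask the system to recreate the service if it gets killed
    }

    @Override
    public IBinder onBind(Intent intent) {
        return null; // started (not bound) service, so no binder is needed in this sketch
    }
}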
As Kevin Krumwiede pointed out by referring to this post (Strange behavior of socket outputstream android): when sending data every X minutes (e.g. 3 or 4), everything still works as it should, even after 30 minutes of being 'paused'.
I had hoped that Socket.setKeepAlive(true) would be enough to keep the connection alive so I don't have to cause too much unnecessary network traffic, but in my particular case it does not help.
Sending 1 byte of 'garbage' every X minutes 'solves' the problem.
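Something along these lines is enough for that (the class name and the period are placeholders; the server must know to ignore the extra byte):

import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Writes one 'garbage' byte every few minutes so the connection never looks idle
// to whatever is silently dropping it.
class KeepAliveWriter {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    void start(OutputStream out, long periodMinutes) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                out.write(0);
                out.flush();
            } catch (IOException e) {
                scheduler.shutdown(); // the connection is gone; stop trying
            }
        }, periodMinutes, periodMinutes, TimeUnit.MINUTES);
    }
}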
We've been having this problem for a long time and still cannot find out where the problem is. Our application uses RTMP for video streaming, and if the web client cannot connect it falls back to RTMPT (RTMP over HTTP). This causes the video to freeze after a couple of seconds of playback.
I have already found some forums where people seem to be having the same issue, but none of the proposed solutions worked. One suggestion was to turn off video recording, but it didn't help. I have also read that it seems to be a thread problem in Red5, but before hacking into Red5 I would like to know whether somebody has a patch or anything else that fixes this.
One more thing: we've been testing this on Macs, in case that matters. Thank you very much in advance.
The very first thing you should look at is the Red5 error log.
Also, Red5 occasionally produces output that might not be in the log but only on plain stdout.
There is a red5-debug.sh or red5-highpref.sh that outputs/logs everything to a file called std.out.
You should use those logs to start your analysis. Possibly you will already see something in them, for example exceptions like:
broken pipe
connection closed due to too long xxx
handshake error
encoding issue in packet xyz
unexpected connection closed
call xyz cannot be handled
too many connections
heap space error
too many open files
Some of them are operating system specific, like for example the number of open files. Some are not.
Also, it is very important that you are using the latest revision of Red5 and not an old version. You did not tell us which version you are using.
However, just from symptoms like video freezes, occasional disconnects, or similar, you won't be able to start a real analysis of the problem.
Sebastian
Were you still connected to the server when the video froze, or had the connection already dropped? I am not sure, but I think the connection closed, which caused the stream to freeze. Just check in the Red5 access logs whether there are any entries for 'idle' packets (possibly after one or more 'send' packets, and more than one of them).
Another thing you could look at is your web server log files, because RTMPT runs over HTTP. I once had a problem with an anti-DDoS program on the server: RTMPT makes many connections one after another, and these TCP connections stay alive for about 4 minutes by default. You can easily end up with hundreds of connections at the same time, which gets seen as a DDoS attack, and as a result the client's IP address gets banned.