We've been having this problem for a long time and still cannot find out where the problem is. Our application uses RTMP for video streaming, and if the web client cannot connect it falls back to RTMPT (RTMP over HTTP). This causes the video to freeze after a couple of seconds of playback.
I have already found some forums where people seem to be having the same issue, but none of the proposed solutions worked. One suggestion was to turn off video recording, but that didn't help. I have also read that it may be a threading problem in Red5, but before hacking into Red5 I would like to know whether somebody has a patch or anything else that fixes this.
One more thing: we've been testing this on Macs, if that counts. Thank you very much in advance.
The very first thing you should look at is the Red5 error log.
Also, Red5 occasionally produces output that is not in the log but goes to plain stdout.
There is a red5-debug.sh (or red5-highperf.sh) that outputs/logs everything to a file called std.out.
You should use those logs to start your analysis; you may well already see something in them, for example exceptions like:
broken pipe
connection closed due to too long xxx
handshake error
encoding issue in packet xyz
unexpected connection closed
call xyz cannot be handled
too many connections
heap space error
too many open files
Some of them are operating-system specific, like the number of open files; some are not.
It is also very important that you are using the latest revision of Red5 and not an old version. You did not tell us which version you are using.
However, from symptoms like video freezes, *occasional disconnects* or similar alone, you won't be able to start a real analysis of the problem.
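To start, a quick scan of the logs for the errors listed above might look like this (the paths are examples; adjust them to your installation):

```shell
# Search the Red5 log and redirected stdout for the usual failure messages.
grep -inE 'broken pipe|handshake|too many (connections|open files)|heap space|connection closed' \
  log/red5.log std.out 2>/dev/null || echo "no known error patterns found"
```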
Sebastian
Were you still connected to the server when the video froze, or had the connection already dropped? I am not sure, but I think the connection was closed, which caused the stream to freeze. Check Red5's access logs for 'idle' packets (possibly following one or more 'send' packets, and more than one in a row).
Another thing to look at is your web server's log files, because RTMPT runs over HTTP. I once had a problem with an anti-DDoS program on the server: RTMPT opens many connections in quick succession, and these TCP connections stay alive for about four minutes by default. You can easily end up with hundreds of connections at the same time, which gets interpreted as a DDoS attack, and as a result the client's IP address gets banned.
I have a Java program that uploads new/changed files to my Web site via FTP. It currently uses the Apache Commons Net Library, version 3.8.0.
After moving to a new city, the program, which I’ve been using for almost 20 years, began failing. It still connects to the FTP server and signs in successfully. But when it tries to upload a file, it pauses for 20-30 seconds, then fails. It always fails on the first file, 100% of the time.
The failing call is org.apache.commons.net.ftp.FTPClient.storeFile(). The documentation says storeFile() returns true if it completed successfully and false if not. Curiously, the method is also documented to throw various forms of IOException; the documentation doesn't say when or why the method decides to return a boolean versus throwing an exception.
My problem is that storeFile() returns false (indicating failure) and never throws an exception, so I have no error message to tell me what caused the failure. The file name and path look OK. The web hosting company tried to determine why the failure occurs, but was unsuccessful.
This problem has been going on for weeks now. Anyone have any ideas on how to debug this?
If the problem started when you moved to a new city, and you can still open the control connection, the most likely culprit is a change in your underlying ISP or network that is blocking the data connection from opening.
FTP uses port 21 for the control connection, which is normally allowed by all networks; a new, random, unprivileged port is then negotiated over the control connection and used for the actual data transfers. I bet your storeFile() is trying to open a data connection and hitting a block, which is probably causing a timeout. You may be interpreting this as "never throws an exception", but in reality it might throw a timeout exception after you sit around and wait long enough.
One thing I would recommend is to make your FTP client use PASSIVE mode for the data transfer; this is designed into the protocol to avoid exactly these problems. You can read about it in detail on the wiki https://en.wikipedia.org/wiki/File_Transfer_Protocol under "Communications and Data Transfer".
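With Commons Net that is a one-line change. A sketch (the host, credentials, and file names are placeholders); note that getReplyString() also finally gives you the server's actual error text when storeFile() returns false:

```java
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.commons.net.ftp.FTP;
import org.apache.commons.net.ftp.FTPClient;

public class PassiveUpload {
    public static void main(String[] args) throws Exception {
        FTPClient ftp = new FTPClient();
        ftp.connect("ftp.example.com");          // placeholder host
        ftp.login("user", "password");           // placeholder credentials
        ftp.enterLocalPassiveMode();             // client opens the data connection outbound
        ftp.setFileType(FTP.BINARY_FILE_TYPE);
        try (InputStream in = new FileInputStream("local.html")) {
            if (!ftp.storeFile("remote.html", in)) {
                // the server's reply finally tells you *why* the upload failed
                System.err.println("Upload failed: " + ftp.getReplyString());
            }
        }
        ftp.logout();
        ftp.disconnect();
    }
}
```

Because the client dials out for the data connection in passive mode, it usually passes through NAT and consumer firewalls that would silently drop the server's inbound active-mode connection.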
I have a fairly complex websocket based application running on an up to date Tomcat 8 server.
At fairly random intervals I get this exception, simultaneously for all connected users.
java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:275)
at org.apache.tomcat.websocket.PerMessageDeflate.sendMessagePart(PerMessageDeflate.java:377)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:284)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:258)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendPartialBytes(WsRemoteEndpointImplBase.java:161)
at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendBinary(WsRemoteEndpointBasic.java:56)
at org.springframework.web.socket.adapter.standard.StandardWebSocketSession.sendBinaryMessage(StandardWebSocketSession.java:202)
at org.springframework.web.socket.adapter.AbstractWebSocketSession.sendMessage(AbstractWebSocketSession.java:105)
at org.infpls.royale.server.game.session.RoyaleSession.sendImmiediate(RoyaleSession.java:69)
at org.infpls.royale.server.game.session.SessionThread.run(SessionThread.java:46)
After this exception is thrown, the web socket is left in a PARTIAL_WRITING state and disconnects on the next write attempt.
I've seen it happen 15 minutes after starting Tomcat and I've seen it happen after idling on the server for 8 hours. I cannot find any correlation to what users are doing on the server and when this exception is thrown.
The problem seems to be happening fairly deep into Spring/Java NIO code and I am not sure how to debug this.
Any help would be greatly appreciated!
I took a shot in the dark after reading some loosely related issues from other people: instead of handing the same ByteBuffer to the client threads to send, I made a full copy of the buffer for each thread that would send it.
I also switched from websocket.sendBinary(ByteBuffer bb) to websocket.sendBinary(byte[] bb).
That seems to have done the trick: I have not seen this bug again while running the server in production under fairly heavy load for 12 hours. If I find further information about this I will update it here.
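The per-thread deep copy can be sketched like this (the class and method names are mine). The idea is that a read-only duplicate snapshots the source's position and limit, so concurrent senders never race on the shared buffer's cursor:

```java
import java.nio.ByteBuffer;

public final class BufferCopies {
    private BufferCopies() {}

    // Deep-copies the readable bytes of src without touching its position or
    // limit, so each sender thread gets its own fully independent buffer.
    public static ByteBuffer copyForSend(ByteBuffer src) {
        ByteBuffer view = src.asReadOnlyBuffer();        // private position/limit snapshot
        ByteBuffer copy = ByteBuffer.allocate(view.remaining());
        copy.put(view);                                   // advances only the view, not src
        copy.flip();                                      // ready for the write path
        return copy;
    }
}
```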
I have a websocket solution for duplex communication between mobile apps and a java backend system. I am using Spring WebSockets, with STOMP. I have implemented a ping-pong solution to keep websockets open longer than 30 seconds because I need longer sessions than that. Sometimes I get these errors in the logs, which seem to come from checkSession() in Spring's SubProtocolWebSocketHandler.
server.log: 07:38:41,090 ERROR [org.springframework.web.socket.messaging.SubProtocolWebSocketHandler] (ajp-http-executor-threads - 14526905) No messages received after 60205 ms. Closing StandardWebSocketSession[id=214a10, uri=/base/api/websocket].
They are not very frequent, but they happen every day, and the 60-second figure seems right since it's hardcoded into the Spring class mentioned above. But after the application has been running for a while, I start getting large numbers of these really long-lived 'timeouts':
server.log: 00:09:25,961 ERROR [org.springframework.web.socket.messaging.SubProtocolWebSocketHandler] (ajp-http-executor-threads - 14199679) No messages received after 208049286 ms. Closing StandardWebSocketSession[id=11a9d9, uri=/base/api/websocket].
And at about this time the application starts experiencing problems.
I've been trying to search for this behavior but haven't found it described anywhere on the web. Has anyone seen this problem before, know a solution, or can explain it to me?
We found some things:
We have added our own ping/pong functionality on STOMP level that runs every 30 seconds.
The mobile client had a bug that caused them to keep replying to the pings even when going into screensaving mode. This meant that the websocket was never closed or timed out.
On each pong message the server received, the Spring check found that no 'real' messages had been received for a very long time and wrote that log line. It then tries to close the websocket with this code:
session.close(CloseStatus.SESSION_NOT_RELIABLE);
but I suspect this doesn't close the session correctly. And even if it did, the mobile clients would try to reconnect. So when 30 more seconds have passed another pong message is sent to the server causing yet another one of these logs to be written. And so on forever...
The solution was to write some server-side code to close old websockets based on this project and also to fix the bug in the mobile clients that made them respond to ping/pong even when being in screensaver mode.
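The server-side sweep can be reduced to a small, self-contained sketch (names are mine; the real version would look up each stale id and call WebSocketSession.close(CloseStatus) on it, and would deliberately *not* count pongs as activity, since that is exactly what masked the bug above):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Tracks the last time each session sent a *real* message and reports
// which sessions have gone quiet for longer than the timeout.
public final class StaleSessionTracker {
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final long timeoutMs;

    public StaleSessionTracker(long timeoutMs) { this.timeoutMs = timeoutMs; }

    // Call on every inbound application-level message (not on pongs).
    public void touch(String sessionId, long nowMs) { lastSeen.put(sessionId, nowMs); }

    // Run periodically; returns the ids whose sessions should be closed.
    public List<String> evictStale(long nowMs) {
        List<String> stale = new ArrayList<>();
        lastSeen.forEach((id, seen) -> { if (nowMs - seen > timeoutMs) stale.add(id); });
        stale.forEach(lastSeen::remove);
        return stale;
    }
}
```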
Oh, one thing that might be good for other people to know is that clients should never be trusted and we saw that they could sometimes send multiple request for websockets within one millisecond, so make sure to handle these 'duplicate requests' some way!
I am also facing the same problem.
netstat on Linux shows the TCP connections and their states as below:
1 LISTEN
13 ESTABLISHED
67 CLOSE_WAIT
67 TCP connections are waiting to be closed but never actually get closed.
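A per-state tally like the one above comes from piping netstat through awk (the state is the sixth column of `netstat -ant` output):

```shell
# Count TCP connections grouped by state, most common first.
netstat -ant 2>/dev/null | awk '/^tcp/ {print $6}' | sort | uniq -c | sort -rn
```

A large and growing CLOSE_WAIT count means the remote side has closed the connection but the local application has never called close() on its socket, i.e. a descriptor leak in the application rather than something the OS can be tuned around.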
Whenever our application handles a large number of HTTP requests, the error "too many open files" shows up in the logs. I am sure the error is related to the sockets, each of which creates a new file descriptor; see the stack trace below:
[java.net.Socket.createImpl(Socket.java:447),
java.net.Socket.getImpl(Socket.java:510),
java.net.Socket.setSoTimeout(Socket.java:1101),
org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:122),
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148),
org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149),
org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121),
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:561),
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415),
I read on the internet that I should use
EntityUtils.consume(entity);
httpClient.getConnectionManager().shutdown();
and after that the errors were reduced but not eliminated. I have a feeling that my cleanup isn't enough to release all of the file descriptors. Right now I am looking for answers other than raising the ulimit, because the application will be deployed on servers that we are not allowed to reconfigure.
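While the load runs, you can watch descriptor usage directly to confirm whether the leak is still there (the process name below is a placeholder):

```shell
pid=$(pgrep -f your-app.jar | head -1)   # placeholder pattern; adjust to your process
ls "/proc/$pid/fd" 2>/dev/null | wc -l   # descriptors currently open by that process
ulimit -n                                # the per-process limit in this shell
```

If the first number climbs steadily toward the second under load and never comes back down, connections are not being released.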
Since you are using Linux, there are some configuration settings that might be changed to solve the problem. First of all, what is happening? After you close the socket in Java, the operating system puts it into the TIME_WAIT state. That is because something might still be sent to the socket, so the OS keeps it open for a while to make sure all remaining packets (the ones still in flight, and any responses they require) are received.
As far as I know it's not an optimal solution to this problem, but it works. You can set tcp_tw_recycle and tcp_tw_reuse to 1 to allow fast reuse of sockets by the OS (note that tcp_tw_recycle is known to break clients behind NAT and was removed entirely in Linux 4.12, so prefer tcp_tw_reuse). How it is done depends on the Linux version you have. For instance, in Fedora I could do something like:
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
to set those temporarily (until reboot). It is up to you to find out how to set them permanently, since I'm not very strong at Linux administration.
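For what it's worth, the persistent form of those settings normally lives in /etc/sysctl.conf (the exact file is distro-dependent) and is applied with sysctl -p:

```
# /etc/sysctl.conf -- read at boot, or applied immediately via `sysctl -p`
net.ipv4.tcp_tw_reuse = 1
# tcp_tw_recycle is risky behind NAT and was dropped from modern kernels
net.ipv4.tcp_tw_recycle = 1
```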
EDIT: I'll say it once more: this is not an optimal solution. Think about what can be changed in the application before messing with OS configuration.
A driver from one of our vendors suddenly started failing last week, on a machine where it has worked for almost a year. The last two descriptive messages output by the driver are now three minutes apart (whereas before they were seconds). Reverse-engineering the code, I've found that the three minute delays happen during calls to java.rmi.Naming.bind().
What's my starting point for troubleshooting this and getting more information out of the call to bind()? It does not return a success or failure indication, and it does not appear to throw any exceptions. I assume the consistent three-minute delay means some timeout is being hit during the process, but then why is there no indication of failure?
This is almost certainly a DNS problem. Check the DNS lookup and reverse lookup times. Naming.bind() itself is trivial, at both ends.
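A quick way to measure both directions of the lookup (the target hostname is whatever your RMI URL names; "localhost" here is just a stand-in):

```java
import java.net.InetAddress;

public class DnsTiming {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost"; // stand-in host
        long t0 = System.nanoTime();
        InetAddress addr = InetAddress.getByName(host);        // forward lookup
        long t1 = System.nanoTime();
        String reverse = addr.getCanonicalHostName();          // reverse lookup
        long t2 = System.nanoTime();
        System.out.printf("forward: %s -> %s in %d ms%n",
                host, addr.getHostAddress(), (t1 - t0) / 1_000_000);
        System.out.printf("reverse: %s -> %s in %d ms%n",
                addr.getHostAddress(), reverse, (t2 - t1) / 1_000_000);
    }
}
```

If either direction takes on the order of minutes rather than milliseconds, you have found the source of the delay.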
Answering my own question:
Java RMI logs through java.util.logging, so configure java.util.logging to see what RMI is doing, and you can debug the problem from there.
(In my case, we use log4j, so I found a library to route java.util.logging messages to log4j.)
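For reference, the JDK's RMI implementation uses loggers under the sun.rmi.* namespace; a minimal logging.properties to surface them (passed with -Djava.util.logging.config.file=logging.properties) might look like:

```
handlers = java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level = FINE
sun.rmi.level = FINE
sun.rmi.transport.tcp.level = FINE
```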
With logging enabled, we discovered that RMI was spending three minutes trying to connect to a wrong IP address that it thought was localhost, thanks to a wrong entry somebody had added to our hosts file.