Websocket blocks without timing out on client plug pull/power loss - java

I have an odd issue with my Tomcat + Spring WebSocket application. When a user disconnects without sending a "closing" signal, because of power loss or a plug pull, the sending thread blocks about 10 seconds later.
The thread blocks on this method:
org.springframework.web.socket.WebSocketSession.sendMessage(WebSocketMessage<?> wsm) throws IOException;
I tried adding a bean to my AppConfig to set a timeout of 3 seconds, but it does not seem to work: the block goes on for upwards of 15 minutes before an exception is thrown.
@Bean(name="servletServerContainerFactoryBean")
public int maxSessionIdleTimeout() {
    return 3000;
}
Here is the stack trace after an eventual SocketTimeoutException
Step: 2304
SendB -> test isOpen -> sendMes -> Done -> Finished Send.
SendB -> test2 isOpen -> sendMes -> User closed connection during packet sending: s01
Propogating exception up!
java.io.IOException: java.net.SocketTimeoutException
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:315)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:250)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendPartialString(WsRemoteEndpointImplBase.java:223)
at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText(WsRemoteEndpointBasic.java:49)
at org.springframework.web.socket.adapter.standard.StandardWebSocketSession.sendTextMessage(StandardWebSocketSession.java:197)
at org.springframework.web.socket.adapter.AbstractWebSocketSession.sendMessage(AbstractWebSocketSession.java:102)
at org.infpls.noxio.game.module.game.session.NoxioSession.sendPacket(NoxioSession.java:40)
at org.infpls.noxio.game.module.game.dao.lobby.GameLobby.step(GameLobby.java:117)
at org.infpls.noxio.game.module.game.dao.lobby.GameLobby$GameLoop.run(GameLobby.java:274)
Caused by: java.net.SocketTimeoutException
at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.doWrite(WsRemoteEndpointImplServer.java:81)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.writeMessagePart(WsRemoteEndpointImplBase.java:494)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:309)
... 8 more
## CRITICAL ## Ejecting player: test2 :: Exception thrown on packet send...
java.io.IOException: java.net.SocketTimeoutException
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:315)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:250)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendPartialString(WsRemoteEndpointImplBase.java:223)
at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText(WsRemoteEndpointBasic.java:49)
at org.springframework.web.socket.adapter.standard.StandardWebSocketSession.sendTextMessage(StandardWebSocketSession.java:197)
at org.springframework.web.socket.adapter.AbstractWebSocketSession.sendMessage(AbstractWebSocketSession.java:102)
at org.infpls.noxio.game.module.game.session.NoxioSession.sendPacket(NoxioSession.java:40)
at org.infpls.noxio.game.module.game.dao.lobby.GameLobby.step(GameLobby.java:117)
at org.infpls.noxio.game.module.game.dao.lobby.GameLobby$GameLoop.run(GameLobby.java:274)
Caused by: java.net.SocketTimeoutException
at org.apache.tomcat.websocket.server.WsRemoteEndpointImplServer.doWrite(WsRemoteEndpointImplServer.java:81)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.writeMessagePart(WsRemoteEndpointImplBase.java:494)
at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendMessageBlock(WsRemoteEndpointImplBase.java:309)
... 8 more
Finished Send. Step Finished.
Having threads be blocked for 15 minutes at a time is a major problem. Any info on why this happens and how to fix it would be greatly appreciated. Thank you!

Found an answer finally. It's actually a system setting.
Source
How many times to retry before deciding that something is wrong and
it is necessary to report this suspicion to network layer. Minimal RFC
value is 3, it is default, which corresponds to 3sec-8min depending on
RTO.
/proc/sys/net/ipv4/tcp_retries2
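To see which value a given host is using, the setting can be read straight from procfs. The following is a minimal sketch, assuming a Linux host where the file is readable; the class and method names (TcpRetries2, parse, readCurrent) are made up for illustration. The Linux kernel documentation gives 15 as the default for tcp_retries2 and describes that as a hypothetical timeout of about 924.6 seconds, which is in line with the roughly 15 minute block described in the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalInt;

final class TcpRetries2 {
    // Path quoted in the answer above.
    static final Path SETTING = Paths.get("/proc/sys/net/ipv4/tcp_retries2");

    // The file holds a single integer followed by a newline.
    static int parse(String content) {
        return Integer.parseInt(content.trim());
    }

    // Returns the current value, or empty if the file is missing or unreadable
    // (for example when not running on Linux).
    static OptionalInt readCurrent() {
        try {
            if (!Files.isReadable(SETTING)) {
                return OptionalInt.empty();
            }
            byte[] raw = Files.readAllBytes(SETTING);
            return OptionalInt.of(parse(new String(raw, StandardCharsets.UTF_8)));
        } catch (IOException | NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt value = readCurrent();
        System.out.println(value.isPresent()
                ? "tcp_retries2 = " + value.getAsInt()
                : "tcp_retries2 is not available on this host");
    }
}
```

Note that this is a host-wide kernel setting: changing it needs root privileges and affects every TCP connection on the machine, not just the WebSocket connections.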

Related

HDFS: namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(390)) - Error: flush failed for required journal

On one of our platforms, the HDFS namenode shuts down with the following error message every 1 or 3 days:
FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(390)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [<ip1>:<port>,<ip2>:<port>, etc], stream=QuorumOutputStream starting at txid 29873171))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:109)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:525)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:385)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:521)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:710)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:188)
at java.lang.Thread.run(Thread.java:748)
Before this FATAL log we see logs of the following kind, which show a degradation in response time:
WARN client.QuorumJournalManager (QuorumCall.java:waitFor(185)) - Waited 18014 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [<ip1>:<port>,<ip2>:<port>]
Have you already encountered this problem, and do you have any advice on how to fix it?
We have already:
checked that our VMs are time synchronized
detected that when the problem occurs a burst of data on the network is in progress, without detecting the root cause yet
checked our network devices. Apart from one port that quickly flaps from UP to DOWN, which we are going to fix, the network seems fine
Thanks in advance

ERROR Error cleaning broadcast Exception [duplicate]

I get the following error while running my Spark Streaming application. It is a large application running multiple stateful (with mapWithState) and stateless operations. The error is difficult to isolate because Spark itself hangs, and the only error we see is in the Spark log, not in the application log.
The error happens only after about 4-5 minutes, with a micro-batch interval of 10 seconds.
I am using Spark 1.6.1 on an Ubuntu server with Kafka-based input and output streams.
Please note that I cannot provide a minimal code sample that reproduces this bug: it does not occur in unit test cases, and the application itself is very large.
Any direction you can give to solve this issue will be helpful. Please let me know if I can provide any more information.
Error inline below:
[2017-07-11 16:15:15,338] ERROR Error cleaning broadcast 2211 (org.apache.spark.ContextCleaner)
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:136)
at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:228)
at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:77)
at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:233)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:189)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:180)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:180)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1180)
at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:173)
at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:68)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
Your exception message clearly says that this is an RPC timeout: the default of 120 seconds for spark.rpc.askTimeout was exceeded. Adjust it to a value suited to your workload.
Please see the 1.6 configuration.
Your error message org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds].
and
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) confirm that.
For a better understanding, please see the code below from RpcTimeout.scala:
/**
 * Wait for the completed result and return it. If the result is not available within this
 * timeout, throw a [[RpcTimeoutException]] to indicate which configuration controls the timeout.
 * @param awaitable the `Awaitable` to be awaited
 * @throws RpcTimeoutException if after waiting for the specified time `awaitable`
 * is still not ready
 */
def awaitResult[T](awaitable: Awaitable[T]): T = {
  try {
    Await.result(awaitable, duration)
  } catch addMessageIfTimeout
}
Also see my answer in another context

Again ERROR java.util.concurrent.TimeoutException: android.view.ThreadedRenderer.finalize() timed out after 10 seconds

Please don't mark as duplicate
I get this error very often on crashlytics, and I can't find the problem. I saw it here or here but I don't have any WebViews in my app.
My app has some AsyncTasks that process data: they select from the DB, compare, and call a listener to update the UI when done (I check whether the activity is still visible before updating the UI). This processing can take a long time if the user has 10k records to compare.
I should mention that the error happens on devices from different manufacturers (mostly Samsung) and on different Android versions (mostly Android 6.0+).
I get this error many times in slightly different forms, for example:
File: Daemons.java:217
java.util.concurrent.TimeoutException: android.view.ThreadedRenderer.finalize() timed out after 10 seconds
1 at android.view.ThreadedRenderer.nDeleteProxy(Native Method)
2 at android.view.ThreadedRenderer.finalize(ThreadedRenderer.java:459)
3 at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:217)
4 at java.lang.Daemons$FinalizerDaemon.run(Daemons.java:200)
5 at java.lang.Thread.run(Thread.java:818)
File: Daemons.java:206
java.util.concurrent.TimeoutException: android.view.ThreadedRenderer.finalize() timed out after 10 seconds
1 at android.view.ThreadedRenderer.nDeleteProxy(Native Method)
2 at android.view.ThreadedRenderer.finalize(ThreadedRenderer.java:449)
3 at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:206)
4 at java.lang.Daemons$FinalizerDaemon.run(Daemons.java:189)
5 at java.lang.Thread.run(Thread.java:818)
File: Daemons.java:210
java.util.concurrent.TimeoutException: android.view.ThreadedRenderer.finalize() timed out after 15 seconds
1 at android.view.ThreadedRenderer.nDeleteProxy(Native Method)
2 at android.view.ThreadedRenderer.finalize(ThreadedRenderer.java:427)
3 at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:210)
4 at java.lang.Daemons$FinalizerDaemon.run(Daemons.java:193)
5 at java.lang.Thread.run(Thread.java:818)
etc...
As one of the posts you link to describes, this exception is generated when garbage collection takes too long for an object with a native code implementation. I ran into this exception and found a solution by looking at the source code that produces it, java.lang.Daemons$FinalizerDaemon.finalizerTimedOut(), which you can see here.
In finalizerTimedOut(), if Thread.getDefaultUncaughtExceptionHandler() == null, it creates and logs a TimeoutException and exits with a non-zero code (System.exit(2);)! To the user, this will look like an FC (force close). To prevent this, you need to call Thread.setDefaultUncaughtExceptionHandler() to catch and handle this exception.
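A handler along those lines might look like the sketch below. It is only an illustration under stated assumptions: the class name FinalizerTimeoutGuard is made up, the thread name "FinalizerWatchdogDaemon" is assumed to be the thread that reports the timeout and should be checked against your own crash reports, and silently ignoring the exception is a design choice that can hide real problems. Everything that is not a finalizer timeout is passed on to whatever handler was installed before (for example the one from your crash reporting library).

```java
import java.util.concurrent.TimeoutException;

final class FinalizerTimeoutGuard {
    // Assumption: name of the thread that reports finalizer timeouts.
    static final String WATCHDOG_THREAD = "FinalizerWatchdogDaemon";

    // True if the exception looks like the finalizer timeout described above.
    static boolean isFinalizerTimeout(Thread thread, Throwable error) {
        return error instanceof TimeoutException
                && WATCHDOG_THREAD.equals(thread.getName());
    }

    // Wraps an existing handler: finalizer timeouts are logged and dropped,
    // every other exception is forwarded to the previous handler (if any).
    static Thread.UncaughtExceptionHandler wrap(Thread.UncaughtExceptionHandler previous) {
        return (thread, error) -> {
            if (isFinalizerTimeout(thread, error)) {
                System.err.println("Ignoring finalizer timeout: " + error.getMessage());
                return;
            }
            if (previous != null) {
                previous.uncaughtException(thread, error);
            }
        };
    }

    // Call once at startup, e.g. from Application.onCreate() on Android.
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler(
                wrap(Thread.getDefaultUncaughtExceptionHandler()));
    }
}
```

Calling FinalizerTimeoutGuard.install() after any crash reporting library has registered its own handler keeps that library in the chain for all other exceptions.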

Unable to get resource from jedis

After my application has been running for around 5 minutes, I get this error.
I keep getting it even though I return the resource after use.
I built jedis-2.2.2-SNAPSHOT.jar from the Jedis code base, since it's not released yet.
I set minIdle = 100, maxIdle = 200 and maxActive = 200. At the time of this exception, my application had 122 connections to Redis.
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
at redis.clients.util.Pool.getResource(Pool.java:42)
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:442)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:360)
at redis.clients.util.Pool.getResource(Pool.java:40)
... 6 more
Did you check that Redis is still up and running?
If it is not, investigate why it died.
Try redis-cli in a terminal if you can. The "info" command will give you more details.

WebService times out, but client receives no exception

I have an application that is attempting to call a service and the other service appears to be timing out. The problem is my application does not receive any timeout exceptions, although I do see an error printed out to the console:
[7/8/13 12:39:32:360 EDT] 00000005 TimeoutManage I WTRN0006W: Transaction 0000013FBF252E43000000010000000CE81CB4935851D5C13DECD3DBB2D463F0DBECAEE60000013FBF252E43000000010000000CE81CB4935851D5C13DECD3DBB2D463F0DBECAEE600000001 has timed out after 120 seconds.
[7/8/13 12:39:32:360 EDT] 00000005 TimeoutManage I WTRN0124I: When the timeout occurred the thread with which the transaction is, or was most recently, associated was Thread[WebContainer : 1,5,main]. The stack trace of this thread when the timeout occurred was:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:196)
com.ibm.io.async.AbstractAsyncFuture.waitForCompletion(AbstractAsyncFuture.java:334)
com.ibm.io.async.AsyncFuture.getByteCount(AsyncFuture.java:218)
com.ibm.ws.tcp.channel.impl.AioSocketIOChannel.readAIOSync(AioSocketIOChannel.java:215)
com.ibm.ws.tcp.channel.impl.AioTCPReadRequestContextImpl.processSyncReadRequest(AioTCPReadRequestContextImpl.java:182)
com.ibm.ws.tcp.channel.impl.TCPReadRequestContextImpl.read(TCPReadRequestContextImpl.java:111)
com.ibm.ws.http.channel.outbound.impl.HttpOutboundServiceContextImpl.parseResponseMessageSync(HttpOutboundServiceContextImpl.java:1657)
com.ibm.ws.http.channel.outbound.impl.HttpOutboundServiceContextImpl.readSyncResponse(HttpOutboundServiceContextImpl.java:725)
com.ibm.ws.http.channel.outbound.impl.HttpOutboundServiceContextImpl.startResponseReadSync(HttpOutboundServiceContextImpl.java:1775)
com.ibm.ws.http.channel.outbound.impl.HttpOutboundServiceContextImpl.finishRequestMessage(HttpOutboundServiceContextImpl.java:1195)
com.ibm.ws.websvcs.transport.http.out.HttpOutSyncWriter.finishBufferRequest(HttpOutSyncWriter.java:94)
com.ibm.ws.websvcs.transport.http.out.HttpOutWriter.writeBuffer(HttpOutWriter.java:136)
com.ibm.ws.websvcs.transport.http.out.HttpOutByteBufferOutputStream.finish(HttpOutByteBufferOutputStream.java:468)
com.ibm.ws.websvcs.transport.http.SOAPOverHTTPSender.sendChunkedRequest(SOAPOverHTTPSender.java:890)
com.ibm.ws.websvcs.transport.http.SOAPOverHTTPSender.sendSOAPRequest(SOAPOverHTTPSender.java:807)
com.ibm.ws.websvcs.transport.http.SOAPOverHTTPSender.send(SOAPOverHTTPSender.java:611)
com.ibm.ws.websvcs.transport.http.HTTPTransportSender.invoke(HTTPTransportSender.java:364)
org.apache.axis2.engine.AxisEngine.send(AxisEngine.java:531)
org.apache.axis2.description.OutInAxisOperationClient.send(OutInAxisOperation.java:401)
org.apache.axis2.description.OutInAxisOperationClient.executeImpl(OutInAxisOperation.java:228)
org.apache.axis2.client.OperationClient.execute(OperationClient.java:163)
org.apache.axis2.jaxws.core.controller.impl.AxisInvocationController.execute(AxisInvocationController.java:581)
org.apache.axis2.jaxws.core.controller.impl.AxisInvocationController.doInvoke(AxisInvocationController.java:130)
org.apache.axis2.jaxws.core.controller.impl.InvocationControllerImpl.invoke(InvocationControllerImpl.java:93)
org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.invokeSEIMethod(JAXWSProxyHandler.java:364)
org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.invoke(JAXWSProxyHandler.java:185)
The client is created with these settings:
bindProvider.getRequestContext().put(com.ibm.wsspi.webservices.Constants.RESPONSE_TIMEOUT_PROPERTY , connectionProperties.getProperty(MyService.TIME_OUT));
bindProvider.getRequestContext().put(com.ibm.wsspi.webservices.Constants.CONNECTION_TIMEOUT_PROPERTY , connectionProperties.getProperty(MyService.TIME_OUT));
bindProvider.getRequestContext().put(com.ibm.wsspi.webservices.Constants.READ_TIMEOUT_PROPERTY , connectionProperties.getProperty(MyService.TIME_OUT));
MyService.TIME_OUT has a value of 20000 and I have verified that it is being set correctly.
The code that calls the service looks like this:
try
{
    response = ((MyServicePortType) myService).doWebServiceOperation(request);
}
catch (Throwable e) //I know, catch Throwable is not very good but right now I'd be happy to catch ANYthing here!
{
    log.error("Webservice reported error",e);
}
Even though I've changed my catch block to catch anything, I still don't catch any exceptions. WebSphere detects a transaction timeout, but I don't know why the application doesn't detect a timeout in the web service call. Is there something I'm missing that would cause a proper timeout exception to be thrown so that I can catch it and send the message to the application framework?
Well now I feel silly.
It seems that for WebSphere, these properties (RESPONSE_TIMEOUT_PROPERTY, CONNECTION_TIMEOUT_PROPERTY, etc...) should have their values set in seconds, and I was using milliseconds based on what I'd seen in online examples (that clearly were not intended for WebSphere).
Changing 20000 to 20 has resolved this problem.
The page that suggested I should be assuming seconds instead of milliseconds is this one: http://pic.dhe.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.express.doc/info/exp/ae/rwbs_httptransportprop.html
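If the timeout is kept in milliseconds elsewhere in the configuration, one way to avoid this mix-up is to convert it explicitly before handing it to the request context. This is a minimal sketch; the class and method names (TimeoutUnits, millisToSecondsProperty) are hypothetical, and it assumes the value is stored as a string property as in the question.

```java
import java.util.concurrent.TimeUnit;

final class TimeoutUnits {
    // Converts a millisecond value held as a string property into whole seconds,
    // again as a string. Fractions of a second are truncated.
    static String millisToSecondsProperty(String millis) {
        long ms = Long.parseLong(millis.trim());
        return Long.toString(TimeUnit.MILLISECONDS.toSeconds(ms));
    }

    public static void main(String[] args) {
        // With the value from the question: 20000 ms becomes "20".
        System.out.println(millisToSecondsProperty("20000"));
        // Hypothetical use with the request context from the question:
        // bindProvider.getRequestContext().put(
        //     com.ibm.wsspi.webservices.Constants.RESPONSE_TIMEOUT_PROPERTY,
        //     millisToSecondsProperty(connectionProperties.getProperty(MyService.TIME_OUT)));
    }
}
```

Making the unit conversion explicit in one place keeps the intent visible if the same property is later reused with an API that expects milliseconds.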
