Spring Boot Redis operation throws broken pipe error - java

We use Redis in a Spring Boot project. After the service has been running for a while, a Redis operation may throw a broken pipe error, although sometimes it succeeds. Restarting the service resolves the problem, but that is not a good solution.
I can't tell why it happens. It seems that some Redis connections in the pool are unusable but are not closed and evicted from the pool.
My questions are:
possible reason causing the broken pipe error?
if there isn't redis operation for a long period, will the idle connection in the pool become unusable?
will the connection be closed and evicted from pool when broken pipe error happen?
pom.xml:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
application.yml:
spring:
  redis:
    database: 0
    host: ${REDIS_HOST:127.0.0.1}
    password: ${REDIS_PASSWORD:password}
    port: ${REDIS_PORT:6379}
    timeout: ${REDIS_TIMEOUT:1000}
    pool:
      max-active: ${REDIS_MAX_ACTIVE:100}
      max-wait: ${REDIS_MAX_WAIT:500}
      max-idle: ${REDIS_MAX_IDLE:20}
      min-idle: ${REDIS_MIN_IDLE:5}
error message:
org.springframework.data.redis.RedisConnectionFailureException: java.net.SocketException: Broken pipe (Write failed); nested exception is redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketException: Broken pipe (Write failed)
at org.springframework.data.redis.connection.jedis.JedisExceptionConverter.convert(JedisExceptionConverter.java:67) ~[spring-data-redis-1.7.6.RELEASE.jar!/:na]
at org.springframework.data.redis.connection.jedis.JedisExceptionConverter.convert(JedisExceptionConverter.java:41) ~[spring-data-redis-1.7.6.RELEASE.jar!/:na]
at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:37) ~[spring-data-redis-1.7.6.RELEASE.jar!/:na]
at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:37) ~[spring-data-redis-1.7.6.RELEASE.jar!/:na]
at org.springframework.data.redis.connection.jedis.JedisConnection.convertJedisAccessException(JedisConnection.java:212) ~[spring-data-redis-1.7.6.RELEASE.jar!/:na]
at org.springframework.data.redis.connection.jedis.JedisConnection.hSet(JedisConnection.java:2810) ~[spring-data-redis-1.7.6.RELEASE.jar!/:na]
at org.springframework.data.redis.core.DefaultHashOperations$9.doInRedis(DefaultHashOperations.java:173) ~[spring-data-redis-1.7.6.RELEASE.jar!/:na]
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:204) ~[spring-data-redis-1.7.6.RELEASE.jar!/:na]
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:166) ~[spring-data-redis-1.7.6.RELEASE.jar!/:na]
at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:88) ~[spring-data-redis-1.7.6.RELEASE.jar!/:na]
at org.springframework.data.redis.core.DefaultHashOperations.put(DefaultHashOperations.java:170) ~[spring-data-redis-1.7.6.RELEASE.jar!/:na]
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketException: Broken pipe (Write failed)
at redis.clients.jedis.Connection.flush(Connection.java:291) ~[jedis-2.8.2.jar!/:na]
at redis.clients.jedis.Connection.getIntegerReply(Connection.java:220) ~[jedis-2.8.2.jar!/:na]
at redis.clients.jedis.BinaryJedis.hset(BinaryJedis.java:749) ~[jedis-2.8.2.jar!/:na]
at org.springframework.data.redis.connection.jedis.JedisConnection.hSet(JedisConnection.java:2808) ~[spring-data-redis-1.7.6.RELEASE.jar!/:na]
... 115 common frames omitted
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.8.0_111]
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) ~[na:1.8.0_111]
at java.net.SocketOutputStream.write(SocketOutputStream.java:153) ~[na:1.8.0_111]
at redis.clients.util.RedisOutputStream.flushBuffer(RedisOutputStream.java:52) ~[jedis-2.8.2.jar!/:na]
at redis.clients.util.RedisOutputStream.flush(RedisOutputStream.java:216) ~[jedis-2.8.2.jar!/:na]
at redis.clients.jedis.Connection.flush(Connection.java:288) ~[jedis-2.8.2.jar!/:na]
... 118 common frames omitted

Answering my own question:
Why does the broken pipe error happen?
TransactionSynchronizationManager holds the RedisConnection bound to the thread and won't close it or return it to the pool; see RedisTemplate.java and RedisConnectionUtils.java. After the Redis server is restarted, any operation on the RedisConnection held by the thread throws a broken pipe error.
How to resolve it?
Add a try/catch around every Redis operation. If the error happens, unbind the connection from the thread so that a new connection can be obtained from the pool, then execute the Redis operation again.
private static final ExceptionTranslationStrategy EXCEPTION_TRANSLATION =
        new FallbackExceptionTranslationStrategy(JedisConverters.exceptionConverter());
public Object req(RedisRequest req) {
    try {
        return req.request();
    } catch (Exception ex) {
        if (ex instanceof NullPointerException) {
            throw ex;
        }
        DataAccessException exception = EXCEPTION_TRANSLATION.translate(ex);
        if (exception instanceof RedisConnectionFailureException) {
            RedisConnectionUtils.unbindConnection(factory);
            /** retry again */
            return req.request();
        } else {
            throw ex;
        }
    }
}

This can happen for any number of reasons. One of them is using a long-lived connection (e.g. connecting to Redis on application start and then using that connection over and over).
Some things you can do:
Reconnect if the connection is broken (this needs some try/catch logic to keep the errors from propagating to your application logic).
Or, a better way, use the pool's validation settings:
TestOnBorrow - sends a PING request when you ask for the resource.
TestOnReturn - sends a PING when you return a resource to the pool.
TestWhileIdle - sends periodic PINGs to idle resources in the pool.
Connect at the moment you need the connection and disconnect afterwards.
Regarding
if there isn't redis operation for a long period, will the idle connection in the pool become unusable?
maxIdle is the maximum number of connections allowed to sit idle in the pool at any given time; idle connections beyond that number are closed instead of being kept in the pool.
I don't know of a reason why an idle connection would become unusable. In any case, the approaches above get around the problem.
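To make the TestOnBorrow idea concrete, here is a minimal sketch of a pool that validates a connection before handing it out. It is plain Java and not the Jedis or commons-pool2 implementation: ValidatingPool, borrow and giveBack are hypothetical names, and the validator stands in for "send PING and check the reply". With Jedis you would normally just enable the corresponding setting on the pool configuration (for example setTestOnBorrow(true)) rather than write this yourself.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical minimal pool illustrating validate-on-borrow; real pools do far more. */
final class ValidatingPool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;     // creates a fresh connection
    private final Predicate<T> validator;  // stands in for "send PING, expect PONG"

    ValidatingPool(Supplier<T> factory, Predicate<T> validator) {
        this.factory = factory;
        this.validator = validator;
    }

    /** Returns a connection that passed validation, creating a new one if none is usable. */
    synchronized T borrow() {
        T candidate;
        while ((candidate = idle.pollFirst()) != null) {
            if (validator.test(candidate)) {
                return candidate;
            }
            // Validation failed: the stale connection is dropped, never handed to the caller.
        }
        return factory.get();
    }

    /** Puts a connection back into the idle set. */
    synchronized void giveBack(T connection) {
        idle.addFirst(connection);
    }

    synchronized int idleCount() {
        return idle.size();
    }
}
```

The point of the sketch is that a connection broken by a server restart fails validation inside borrow() and is replaced there, so the caller never sees the broken pipe. The cost is one extra round trip per borrow, which is why validating only idle connections in the background (TestWhileIdle) is the usual lighter-weight alternative.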


Getting reactor.pool.PoolShutdownException during save in database

Service is using org.springframework.r2dbc.core.DatabaseClient with reactor-pool and r2dbc-mysql driver.
I'm doing inserts in the database every 5-10 seconds (50-100 insert statements) and randomly after every 2-3 hours I'm getting reactor.pool.PoolShutdownException: Pool has been shut down, what might be the reason for this behavior?
Dependency versions:
r2dbc-pool: 0.8.8.RELEASE
r2dbc-mysql: 0.8.2
spring-r2dbc: 5.3.15
Stacktrace:
org.springframework.dao.DataAccessResourceFailureException: Failed to obtain R2DBC Connection; nested exception is reactor.pool.PoolShutdownException: Pool has been shut down
at org.springframework.r2dbc.connection.ConnectionFactoryUtils.lambda$getConnection$0(ConnectionFactoryUtils.java:88)
at reactor.core.publisher.Mono.lambda$onErrorMap$31(Mono.java:3733)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:94)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:106)
at reactor.core.publisher.FluxRetry$RetrySubscriber.onError(FluxRetry.java:95)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onError(MonoFlatMap.java:172)
at reactor.pool.AbstractPool$Borrower.fail(AbstractPool.java:477)
at reactor.pool.SimpleDequePool.doAcquire(SimpleDequePool.java:264)
at reactor.pool.AbstractPool$Borrower.request(AbstractPool.java:432)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:110)
at reactor.pool.SimpleDequePool$QueueBorrowerMono.subscribe(SimpleDequePool.java:676)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
at reactor.core.publisher.FluxRetry$RetrySubscriber.resubscribe(FluxRetry.java:117)
at reactor.core.publisher.MonoRetry.subscribeOrReturn(MonoRetry.java:50)
at reactor.core.publisher.Mono.subscribe(Mono.java:4385)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:103)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onError(MonoFlatMap.java:172)
at reactor.core.publisher.FluxMap$MapSubscriber.onError(FluxMap.java:132)
at reactor.core.publisher.Operators.error(Operators.java:198)
at reactor.core.publisher.MonoError.subscribe(MonoError.java:53)
at reactor.core.publisher.MonoDeferContextual.subscribe(MonoDeferContextual.java:55)
at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
at reactor.core.publisher.FluxUsingWhen.subscribe(FluxUsingWhen.java:104)
at reactor.core.publisher.Flux.subscribe(Flux.java:8469)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:255)
at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:51)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:157)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
at reactor.core.publisher.MonoZip$ZipCoordinator.signal(MonoZip.java:251)
at reactor.core.publisher.MonoZip$ZipInner.onNext(MonoZip.java:336)
at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:74)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
at reactor.core.publisher.MonoCompletionStage.lambda$subscribe$0(MonoCompletionStage.java:83)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.whenComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.whenComplete(Unknown Source)
at reactor.core.publisher.MonoCompletionStage.subscribe(MonoCompletionStage.java:58)
at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
at reactor.core.publisher.MonoZip.subscribe(MonoZip.java:128)
at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
at reactor.core.publisher.MonoZip.subscribe(MonoZip.java:128)
at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
at reactor.core.publisher.MonoSubscribeOn$SubscribeOnSubscriber.run(MonoSubscribeOn.java:126)
at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84)
at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Basically, it happens when you have too many pending connection acquisitions. Example: your connection pool size is 100 but you are trying to do 500 parallel inserts, so 400 of them will be in pending status.
In this situation, reactor-pool disposes of the connection pool. To avoid the issue, I control the number of parallel executions.
I see two ways to handle this case:
The first (the right way, in my opinion) is to control the flow of incoming messages by specifying the concurrency parameter on the operator (concurrency <= pool size):
flux
    //some intermediate operators
    .flatMap({ databaseOperation(it) }, poolSize)
In this case, you won't have more parallel executions than your connection pool can afford.
The second is to use the delayUntil operator, which delays elements until there are unused connections (you can use the connection pool metrics to find that out). I wouldn't recommend this approach, because you can end up with an out-of-memory exception if you are not controlling back-pressure, or you will have to drop some items if the buffer overflows.
A method that delays messages until there are available connections:
fun <T> Flux<T>.delayUntilHasAvailableConnections(pool: ConnectionPool): Flux<T> {
    val hasAvailableConnections = Supplier {
        val metrics = pool.metrics.get()
        metrics.pendingAcquireSize() <= metrics.maxAllocatedSize
    }
    val connectionsExistsMono = Mono.fromSupplier(hasAvailableConnections)
    val hasConnectionMono = connectionsExistsMono
        .handle { hasConnections, sink: SynchronousSink<Boolean> ->
            if (hasConnections) {
                sink.next(hasConnections)
            } else {
                sink.error(RuntimeException("No Connections"))
            }
        }.retryWhen(Retry.fixedDelay(Long.MAX_VALUE, Duration.ofSeconds(5)))
    return delayUntil { hasConnectionMono }
}
Usage:
flux
    //some intermediate operators
    .delayUntilHasAvailableConnections(connectionPool)
    .flatMap { databaseOperation(it) }
As Vladlen pointed out, reactor-pool can refuse new acquire requests if there is already a large queue waiting for DB connections. In a Spring application with spring-r2dbc, this functionality is disabled by default and all connection attempts get queued.
Nevertheless, I got the same exception in my Spring application. My case was a more special condition, but in case someone else stumbles upon this: the Spring actuator health check also checks connectivity to the database. If your application is deployed to Kubernetes, the request to the /actuator/health endpoint may not return in time if there is a large queue in front of the DB pool. This causes the Kubernetes readiness/liveness probes to fail with "Connection Timed out", making Kubernetes think the application is unhealthy (which is true, in a sense). In the end, Kubernetes kills the application, and all existing connections to the DB pool get terminated with the above exception.
My solution was to manually limit the load, similar to what Vladlen pointed out:
val coroutineLimit = Semaphore(dbPoolSize)
workItems.forEach {
    launch {
        coroutineLimit.withPermit {
            // My DB operation
        }
    }
}
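For readers not using Kotlin coroutines, the same load-limiting idea can be sketched in plain Java with java.util.concurrent.Semaphore. This is only an illustration of capping the number of operations in flight at the pool size: BoundedDbWork and runAll are hypothetical names, each Runnable stands in for one DB operation, and a reactive application would normally use the flatMap concurrency argument instead of blocking threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: runs tasks with at most poolSize of them in flight at once. */
final class BoundedDbWork {
    /** Runs every task and returns the highest number of tasks observed running together. */
    static int runAll(List<Runnable> tasks, int poolSize) {
        Semaphore permits = new Semaphore(poolSize);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (Runnable task : tasks) {
                futures.add(CompletableFuture.runAsync(() -> {
                    permits.acquireUninterruptibly();   // wait for a free "connection slot"
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        task.run();                     // stands in for the DB operation
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release();
                    }
                }, executor));
            }
            // Block until every task has finished (join rethrows the first failure, if any).
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        } finally {
            executor.shutdown();
        }
        return peak.get();
    }
}
```

Calling BoundedDbWork.runAll(tasks, dbPoolSize) keeps the excess work waiting on the semaphore rather than in the pool's acquire queue, which is the same effect the withPermit block above achieves.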

java.net.BindException: Address already in use: connect (I did read the others before posting)

I know there are a ton of these posts, but this one is a little different. We use vendor code for part of our data processing system, and part of that system sends emails to clients when certain events take place on data insertion or deletion. Recently we started getting "address already in use" exceptions. We checked the repository history, and nothing has changed in our code for this system in the last 6 months. We have already tried the typical solutions for this issue, including increasing the number of connections allowed to the port, with little success.
We had a meeting with the vendor, and I asked whether anything had changed in their code and whether they could assure us that all connections in their code are explicitly closed. They indicated that they explicitly close all sockets. However, they didn't show us the code, so we have no way to know whether this is true other than taking their word for it.
So the only thing I can think of is to keep increasing the number of connections to the port until we stop getting bind exceptions. What is the industry standard for the maximum number of connections to port 25; is there one? If anyone has any other suggestions, I would greatly appreciate them. Thanks so much in advance, Robert
20210505112127.716 ERROR m.fiserv.ppx.business.notification.EmailNotifier : MessagingException from notify
javax.mail.MessagingException: Could not connect to SMTP host: SERVER.URL.COM, port: 25;
nested exception is:
java.net.BindException: Address already in use: connect
at com.sun.mail.smtp.SMTPTransport.openServer(SMTPTransport.java:1545)
at com.sun.mail.smtp.SMTPTransport.protocolConnect(SMTPTransport.java:453)
Caused by:
java.net.BindException: Address already in use: connect
at java.net.DualStackPlainSocketImpl.connect0(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:90)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:380)
20210505131529.950 ERROR erv.ppx.web.controller.AuditReportViewController : Error while generating HTML
net.sf.jasperreports.engine.JRException: Error writing to OutputStream writer : CorpAdminAuditReport
at net.sf.jasperreports.engine.export.JRHtmlExporter.exportReport(JRHtmlExporter.java:496)
at com.fiserv.ppx.web.controller.AuditReportViewController.generateReport(AuditReportViewController.java:184)
Caused by:
com.ibm.wsspi.webcontainer.ClosedConnectionException: OutputStream encountered error during write
at com.ibm.ws.webcontainer.channel.WCCByteBufferOutputStream.write(WCCByteBufferOutputStream.java:188)
at com.ibm.ws.webcontainer.srt.SRTOutputStream.write(SRTOutputStream.java:97)
20210505140706.240 ERROR com.fiserv.ppx.business.db.DBConnectionUtil : Exception in getting for AppServer connection from DataSource.
com.ibm.websphere.ce.cm.ConnectionWaitTimeoutException: J2CA1010E: Connection not available; timed out waiting for 180,005 seconds.
at com.ibm.ws.rsadapter.AdapterUtil.toSQLException(AdapterUtil.java:1680)
at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDataSource.java:661)
at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDataSource.java:611)
Caused by:
com.ibm.websphere.ce.j2c.ConnectionWaitTimeoutException: J2CA1010E: Connection not available; timed out waiting for 180,005 seconds.
at com.ibm.ejs.j2c.FreePool.createOrWaitForConnection(FreePool.java:1781)
at com.ibm.ejs.j2c.PoolManager.reserve(PoolManager.java:3834)
at com.ibm.ejs.j2c.PoolManager.reserve(PoolManager.java:3082)
20210505140731.341 ERROR com.fiserv.ppx.business.db.DBConnectionUtil : Exception in getting for AppServer connection from DataSource.
com.ibm.websphere.ce.cm.ConnectionWaitTimeoutException: J2CA1010E: Connection not available; timed out waiting for 180,010 seconds.
at com.ibm.ws.rsadapter.AdapterUtil.toSQLException(AdapterUtil.java:1680)
at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDataSource.java:661)
at com.ibm.ws.rsadapter.jdbc.WSJdbcDataSource.getConnection(WSJdbcDataSource.java:611)
Caused by:
com.ibm.websphere.ce.j2c.ConnectionWaitTimeoutException: J2CA1010E: Connection not available; timed out waiting for 180,010 seconds.
at com.ibm.ejs.j2c.FreePool.createOrWaitForConnection(FreePool.java:1781)
at com.ibm.ejs.j2c.PoolManager.reserve(PoolManager.java:3904)
at com.ibm.ejs.j2c.PoolManager.reserve(PoolManager.java:3082)
20210505140731.341 ERROR com.fiserv.ppx.sso.controller.SSOController : SSO Configuration error
java.lang.NullPointerException
at com.fiserv.ppx.business.db.PPXDbTransactionManager.<init>(PPXDbTransactionManager.java:60)
at com.fiserv.ppx.sso.impl.SSOLoginAuthenticator.authenticateSSOUser(SSOLoginAuthenticator.java:157)

Can a SSLHandshakeException be a retry-able exception?

I have a service-to-service connection that is intermittently throwing SSLHandshakeExceptions from a jersey client.
public static class MyClientFilter extends ClientFilter {
    @Override
    public ClientResponse handle(ClientRequest cr) throws ClientHandlerException {
        try {
            return getNext().handle(cr);
        } catch (ClientHandlerException e) {
            Throwable rootCause = e.getCause() != null ? e.getCause() : e;
            if (ConnectException.class.isInstance(rootCause) ||
                    SocketException.class.isInstance(rootCause) ||
                    SSLHandshakeException.class.isInstance(rootCause) //maybe?
            ) {
                //do some retry logic
            }
            throw e; // not retried: propagate the original failure
        }
    }
}
The fact that it is only happening intermittently (very rarely) says to me that my certificates and TLS are all configured correctly. In my client I am attempting to retry connections if they fail due to connection or socket exceptions. I am considering making an SSLHandshakeException also a retry-able exception because in my case it seems like it should be, but I am wondering if an SSLHandshakeException could be caused by a connection or socket issue and, if so, is there a way to tell?
Update:
The message of the exception seems to indicate that it could be a connection issue that is not related to SSL configuration:
Caused by: javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1002)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1397)
at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1564)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:347)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:249)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
... 44 common frames omitted
Can a SSLHandshakeException be a retry-able exception?
It is not entirely clear what you are asking:
Does SSLHandshakeException itself retry? No, of course not.
Are you permitted to retry a connection attempt following an SSLHandshakeException? Yes, you are permitted to retry.
Is it advisable to retry? It will probably just fail again, but it depends on what is causing the connection to fail.
Is it advisable to retry repeatedly? Definitely not.
Really what this boils down to is diagnosing the cause of the connection failures. To do this you will need to enable client-side debug logging for the SSL connections.
A common cause for this kind of problem is that the client and server cannot negotiate a mutually acceptable SSL/TLS protocol version or cryptographic suite. This typically happens when one end is using an old SSL / TLS stack that is (by current standards) insecure. If this is the root cause then retrying won't help.
It is also possible ... but extremely unlikely ... that the server or the network "glitched" at just the wrong time.
The message of the exception seems to indicate that it could be a connection issue that is not related to SSL configuration.
Actually, I doubt it. It is standard behavior for a server to simply close the connection if the negotiation has failed; see RFC 8446 Section 4.1 for the details. The client will see that as a broken connection.
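If you decide to retry anyway, keep it strictly bounded. The following is a sketch, not a recommendation: BoundedRetry, isRetryable and call are hypothetical names, and treating an SSLHandshakeException as transient only when its message contains "Remote host closed connection" is an assumption taken from the stack trace in the question. Matching on exception messages is brittle, so enable SSL debug logging first and confirm the actual cause.

```java
import java.net.SocketException;
import java.util.function.Supplier;
import javax.net.ssl.SSLHandshakeException;

/** Hypothetical helper: a small, bounded retry for failures that may be transient. */
final class BoundedRetry {
    /** Walks the cause chain looking for a failure that could plausibly be transient. */
    static boolean isRetryable(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof SocketException) {
                return true; // also covers ConnectException, which extends SocketException
            }
            if (t instanceof SSLHandshakeException) {
                String message = t.getMessage();
                // Assumption: only a dropped connection is worth retrying, not a failed negotiation.
                if (message != null && message.contains("Remote host closed connection")) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Calls the action up to maxAttempts times, sleeping a little longer after each failure. */
    static <T> T call(Supplier<T> action, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts || !isRetryable(e)) {
                    throw e; // out of attempts, or not a transient failure
                }
                try {
                    Thread.sleep(backoffMillis * attempt);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```

Inside the filter from the question this would wrap the getNext().handle(cr) call, for example BoundedRetry.call(() -> getNext().handle(cr), 3, 200L). A handshake failure caused by a protocol or cipher mismatch is not retried by this sketch, in line with the point above that retrying will not help in that case.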

how long can Spring JdbcTemplate wait for an oracle stored procedure to finish

My Java code looks like this:
logger.info("start");
getJdbcTemplate().execute("call " + procedureName + "()");
and I got this exception:
org.springframework.dao.DataAccessResourceFailureException: StatementCallback; SQL [call PRMI_UPDATE_USER_LOGIN_INFO()]; Io ERROR: Connection reset; nested exception is java.sql.SQLException: Io ERROR: Connection reset
at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:257)
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:72)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:407)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:428)
Maybe it's caused by the long time waiting. I found that it printed "start" in the log, and about 5 minutes later I got the exception.
Update on 2013-03-13:
I got that exception not only when calling the Oracle stored procedure but also in Druid's JdbcUtils.close(...):
com.alibaba.druid.util.JdbcUtils.close:81 - close connection error
java.sql.SQLRecoverableException: Io Error: Connection reset
at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:101)
at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:133)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:199)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:263)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:521)
at oracle.jdbc.driver.T4CConnection.logoff(T4CConnection.java:500)
at oracle.jdbc.driver.PhysicalConnection.close(PhysicalConnection.java:3509)
at com.alibaba.druid.filter.FilterChainImpl.connection_close(FilterChainImpl.java:167)
at com.alibaba.druid.filter.stat.StatFilter.connection_close(StatFilter.java:254)
at com.alibaba.druid.filter.FilterChainImpl.connection_close(FilterChainImpl.java:163)
at com.alibaba.druid.proxy.jdbc.ConnectionProxyImpl.close(ConnectionProxyImpl.java:115)
at com.alibaba.druid.util.JdbcUtils.close(JdbcUtils.java:79)
at com.alibaba.druid.pool.DruidDataSource.shrink(DruidDataSource.java:1876)
at com.alibaba.druid.pool.DruidDataSource$DestroyConnectionThread.run(DruidDataSource.java:1694)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at oracle.net.ns.DataPacket.send(DataPacket.java:150)
at oracle.net.ns.NetOutputStream.flush(NetOutputStream.java:180)
at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:169)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:117)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:92)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:77)
at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1034)
at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1010)
at oracle.jdbc.driver.T4C7Ocommoncall.receive(T4C7Ocommoncall.java:97)
at oracle.jdbc.driver.T4CConnection.logoff(T4CConnection.java:487)
Druid's JdbcUtils.close method is quite simple:
public static void close(Connection x) {
    if (x == null) {
        return;
    }
    try {
        x.close();
    } catch (Exception e) {
        LOG.debug("close connection error", e);
    }
}
The source code is at:
https://github.com/alibaba/druid/blob/master/src/main/java/com/alibaba/druid/util/JdbcUtils.java
It should wait as long as needed. Forget about the various hacks that try to "detect" a deadlock based on a timeout.
You should also find some ORA-XXXX error. "Io ERROR: Connection reset" does not look like an Oracle error message; there should be an error number attached to it.
A timeout of 5 minutes is a very strange value. Theoretically it can also be set up on the database side, as the profile parameter CPU_PER_CALL, but in that case you should get the error ORA-02393: exceeded call limit on CPU usage, and your connection should NOT be lost.
Theoretically you could also have problems with dead connection detection, but a 5-minute timeout is too short for that.
Another possible source is an ORA-600 error, which is an Oracle internal error: maybe your session process crashed and the TCP connection was lost as a result.
You should contact your local DBAs and ask them for cooperation. They will be able to help you better than anonymous people on an Internet forum.
Maybe it's caused by the long time waiting
No, it is not caused by that.
As the Javadoc says about DataAccessResourceFailureException:
Data access exception thrown when a resource fails completely: for
example, if we can't connect to a database using JDBC.

What's the difference between "java.io.IOException: Connection timed out" and "SocketTimeoutException: Read timed out"

If I set SoTimeout on a socket and read from it, then when the read takes longer than the timeout I get a "SocketTimeoutException: Read timed out".
Here is the stack trace in my case:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:277)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:527)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:462)
But here I encountered "IOException: Connection timed out", and I don't know how it happened.
Stack trace:
java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:277)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readInt(DataInputStream.java:370)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:527)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:462)
Can someone tell me the difference between the two exceptions? Thanks.
A connection timeout means you attempted to connect to the remote IP/port pair and failed to do so: it did not answer at all. Another possible error at that stage would be connection refused, in which this pair is available but rejected your connection attempt. Both of these errors appear on the initial setup of a socket. Note that these errors only occur with TCP, since a TCP connection requires the establishment of a session.
When you have a socket read timeout, it means you are connected, but failed to read data in time. Timeouts on sockets are configurable. You may also get a connection reset error, which means you did connect successfully, but the other end decided that after all you're not worth it :p
Simple answer:
In one case (Connection timed out) your application cannot connect to the server in a timely manner. In the other case (Read timed out) the connection can be established but during read the connection times out.
'Connection timed out' after the connect phase means that something has gone seriously wrong with the connection and it must be closed. 'Read timeout' just means that no data arrived within the specified receive timeout period: it isn't fatal.
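The read-timeout half of this is easy to reproduce locally. The sketch below (ReadTimeoutDemo and its method are hypothetical names) opens a loopback server that accepts a connection but never writes, sets SO_TIMEOUT on the client, and shows that the blocked read fails with SocketTimeoutException while the socket itself stays open. It also shows where the separate connect timeout is set. It does not reproduce "Connection timed out", which depends on the network or the OS giving up on the connection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical demo of a read timeout against a peer that never sends anything. */
final class ReadTimeoutDemo {
    /** Returns true if the read timed out and the client socket was still open afterwards. */
    static boolean readTimesOutButSocketStaysOpen(int readTimeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            // Connect timeout: bounds how long establishing the connection may take.
            client.connect(new InetSocketAddress(server.getInetAddress(), server.getLocalPort()), 1000);
            // Read timeout (SO_TIMEOUT): bounds how long a single read() may block.
            client.setSoTimeout(readTimeoutMillis);
            try (Socket accepted = server.accept()) { // the peer accepts but never writes
                InputStream in = client.getInputStream();
                try {
                    in.read();
                    return false; // data or end-of-stream arrived, which this demo does not expect
                } catch (SocketTimeoutException expected) {
                    // Not fatal: the connection is intact and could be read from again.
                    return !client.isClosed() && client.isConnected();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling ReadTimeoutDemo.readTimesOutButSocketStaysOpen(200) blocks for roughly the timeout and then reports that the socket survived, which matches the last answer: a read timeout only says that no data arrived in time.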
