How to handle in Reactor Netty an io.netty.channel.ConnectTimeoutException

How to handle in Reactor Netty an io.netty.channel.ConnectTimeoutException - java

I'm trying to use Reactor Netty TcpClient in reactive way to interact with hosts, that may be unreachable. Here is an example of a channel initialization logic:
ConnectionProvider connectionProvider = ConnectionProvider.fixed("fixed", 50);
TcpClient.create(connectionProvider)
.host(host).port(port)
.wiretap(true)
.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 50)
.doOnConnect(x -> log.trace("Connect to {}:{}", host, port))
.doOnConnected(conn -> log.trace("Connected {}", conn.channel()))
.connect()
.subscribe(this::utilizeConnection);
the output, that i receiving :
2019-09-04 08:23:13.612 TRACE 71988 --- [ioEventLoop-4-3] c.c.pcb.poc.network.tcp.NettyTcpSender : Connect to 192.168.88.210:2000
2019-09-04 08:23:13.684 WARN 71988 --- [actor-tcp-nio-4] io.netty.util.concurrent.DefaultPromise : An exception was thrown by reactor.netty.resources.PooledConnectionProvider$DisposableAcquire.operationComplete()
reactor.core.Exceptions$ErrorCallbackNotImplemented: io.netty.channel.ConnectTimeoutException: connection timed out: /192.168.88.210:2000
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: /192.168.88.210:2000
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:267) ~[netty-transport-4.1.36.Final.jar:4.1.36.Final]
at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) ~[netty-common-4.1.36.Final.jar:4.1.36.Final]
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127) ~[netty-common-4.1.36.Final.jar:4.1.36.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) [netty-common-4.1.36.Final.jar:4.1.36.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:405) [netty-common-4.1.36.Final.jar:4.1.36.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) [netty-common-4.1.36.Final.jar:4.1.36.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.36.Final.jar:4.1.36.Final]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Assembly trace from producer [reactor.core.publisher.MonoCreate] :
reactor.core.publisher.Mono.create(Mono.java:183)
reactor.netty.resources.PooledConnectionProvider.acquire(PooledConnectionProvider.java:130)
Error has been observed by the following operator(s):
|_ Mono.create ⇢ reactor.netty.resources.PooledConnectionProvider.acquire(PooledConnectionProvider.java:130)
|_ Mono.doOnSubscribe ⇢ reactor.netty.tcp.TcpClientDoOn.connect(TcpClientDoOn.java:58)
The 'inbound' and 'outbound' are having a dedicated method to handle their errors, but they works on top of a Connection instance that won't be created if you got the 'connection timeout'.
I tried:
The exception, that i receiving is wrapped in 'ErrorCallbackNotImplemented'. But I wasn't able to find any way to implement any 'ErrorCallback'
The log contains a warning message from 'io.netty.util.concurrent.DefaultPromise' . but I wasn't able to find a way how to make own Promise to handle it in a right way.
No any configurations i've found that may somehow intercept connection timeouts.
workaround. The blocked approach to create a connection ( .block() instead of .subscribe()) will allow me to catch any Connection creating exceptions within plain try-catch block, but i'll lose the benefits of reactive approach with such workaround.
Do somebody may suggest me at least something to help me to find a right way to handle a 'io.netty.channel.ConnectTimeoutException'?

Do not forget to implement your error callback
Usually reactor.core.Exceptions$ErrorCallbackNotImplemented happens when there is subscription over labmda based .subscribe method (same for Mono and Flux).
If you are going to look at the sources here and here, you will find the place where reactor.core.Exceptions$ErrorCallbackNotImplemented is thrown!
Action Points
In order to handle the original io.netty.channel.ConnectTimeoutException I would recommend looking at Handling Errors section of the original Project Reactor documentation

Related

Webflux, reactor connection reset by peer unhandled

So basically I am simulating a 'Connection reset by peer' locally using Toxiproxy or Wiremock (same behaviour for both of them) and I get the exception below.
The tricky part is that it will bypass any defined exception handlers and there's no way to revert the flow if there's a failure.
I tried configuring the WebClient as mentioned here: https://github.com/reactor/reactor-netty/issues/388 by either removing the connection pool or simply setting the HttpClient keepAlive property to false.
I defined a global exception handling mechanism by extending ErrorWebExceptionHandler but it doesn't go through.
Do you have any suggestions on how to manage this one? Or a proper configuration which I could try for either TcpClient / HttpClient or WebClient? What am I missing?
There must be a way to handle this properly..
[26/04/22 20:02:28] lvl=ERROR [ioEventLoopGroup-4-3] r.n.r.PooledConnectionProvider - [id: 0xaf559a88, L:0.0.0.0/0.0.0.0:56755 ! R:localhost/127.0.0.1:8989] Pooled connection observed an error
java.io.IOException: Connection reset by peer
at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276)
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:233)
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:223)
at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:358)
at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:247)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1140)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:347)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:697)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:632)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:549)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:511)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)

Why errorChannel is not getting called in case of MQException from int-jms:outbound-channel-adapter?

Jms outbound-channel-adapter works perfectly fine but I see intermittently this error in the log, however, MQ message still gets delivered.
2019-06-07 10:16:22 [JMSCCThreadPoolWorker-5] INFO o.s.j.c.CachingConnectionFactory - Encountered a JMSException - resetting the underlying JMS Connection
com.ibm.msg.client.jms.DetailedJMSException: JMSWMQ1107: A problem with this connection has occurred.
at com.ibm.msg.client.wmq.common.internal.Reason.reasonToException(Reason.java:578)
at com.ibm.msg.client.wmq.common.internal.Reason.createException(Reason.java:214)
at com.ibm.msg.client.wmq.internal.WMQConnection.consumer(WMQConnection.java:794)
at com.ibm.mq.jmqi.remote.api.RemoteHconn.callEventHandler(RemoteHconn.java:2903)
at com.ibm.mq.jmqi.remote.api.RemoteHconn.driveEventsEH(RemoteHconn.java:628)
at com.ibm.mq.jmqi.remote.impl.RemoteDispatchThread.processHconn(RemoteDispatchThread.java:691)
at com.ibm.mq.jmqi.remote.impl.RemoteDispatchThread.run(RemoteDispatchThread.java:233)
at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.runTask(WorkQueueItem.java:263)
at com.ibm.msg.client.commonservices.workqueue.SimpleWorkQueueItem.runItem(SimpleWorkQueueItem.java:99)
at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.run(WorkQueueItem.java:284)
at com.ibm.msg.client.commonservices.workqueue.WorkQueueManager.runWorkQueueItem(WorkQueueManager.java:312)
at com.ibm.msg.client.commonservices.j2se.workqueue.WorkQueueManagerImplementation$ThreadPoolWorker.run(WorkQueueManagerImplementation.java:1214)
Caused by: com.ibm.mq.MQException: JMSCMQ0001: WebSphere MQ call failed with compcode '2' ('MQCC_FAILED') reason '2009' ('MQRC_CONNECTION_BROKEN').
...This is errorChannel configuration:
<int:header-enricher id="errorMsg.HeaderEnricher"
input-channel="errorChannel"
output-channel="omniAlertsJmsErrorChannel">
...and jms outbound-channel-adapter configured:
<int-jms:outbound-channel-adapter
id="jmsOutToNE" channel="umpAlertNotificationJMSChannel"
destination="senderTopic"
jms-template="jmsQueueTemplate"
>
I am expecting omniAlertsJmsErrorChannel to receive the MessageHandlingException which is not happening from jmsOutToNE adapter. All other channel/flow errors are being routed to omniAlertsJmsErrorChannel.
Also, wondering if there jms outbound-channel-adapter is retrying internally when com.ibm.mq.MQException happens and it gets successful in the subsequent try?

The errorChannel is out of use in any MessageHandler. Their exception are just thrown to the caller. To handle those exception you need to consider adding a request-handler-advice-chain with the ExpressionEvaluatingRequestHandlerAdvice. Or if we talk about a retry, you also can add to that chain a RequestHandlerRetryAdvice.
See more info in the Reference Manual.
Not sure though why your message is still delivered to the MQ. There is no any retry out-of-the-box in the <int-jms:outbound-channel-adapter. It might a behavior of the MQ Connection Factory adapter in the IBM library.

camel : stackoverflow error when route is called recursively

I have a camel route which calls itself until a certain condition is met. Basically the idea is to implement the retry of route. When the application is deployed I am getting stackoverflow error when retries happen for a long period.
[Camel (camel-1) thread #1 - Multicast] ERROR com.application.RouteName.lambda$configure$0 - Exception occurred during execution on the exchange: Exchange[ID-batchrater-310822922-1-383133832-34058-1530798326741-0-6]
org.apache.camel.CamelExecutionException: Exception occurred during execution on the exchange: Exchange[ID-batchrater-310822922-1-383133832-34058-1530798326741-0-6]
at org.apache.camel.util.ObjectHelper.wrapCamelExecutionException(ObjectHelper.java:1779)
at org.apache.camel.impl.DefaultExchange.setException(DefaultExchange.java:351)
.
.
.
.
Caused by: java.lang.StackOverflowError: null
at org.springframework.beans.factory.support.AbstractBeanFactory.transformedBeanName(AbstractBeanFactory.java:1117)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:239)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202)
at org.springframework.context.support.AbstractApplicationContext.getBean(AbstractApplicationContext.java:1084)
at org.apache.camel.spring.spi.ApplicationContextRegistry.lookupByNameAndType(ApplicationContextRegistry.java:47)
at org.apache.camel.impl.PropertyPlaceholderDelegateRegistry.lookupByNameAndType(PropertyPlaceholderDelegateRegistry.java:63)
at org.apache.camel.component.bean.BeanInfo.createParameterMappingStrategy(BeanInfo.java:177)
at org.apache.camel.component.bean.BeanInfo.<init>(BeanInfo.java:99)
I believe the stackoverflow error is due to the recursive call of route and I changed the route structure and now redelivery is handled by the retryDelivery mechanisms available in camel onException() . And my number of retries can be infinite until the condition is met.
I need to know will there be any chance of stackOverFlow again with this approach.

No this is the right approach to handle error handling redeliveries with the onException and other error handling features. Using the loop EIP leads to longer stack-frames and should not be used for looping very long. So you did the right fix.

trying to find db connection leak in my code, using Spring / JPA / Hikari

I've got a problem with a Spring web application that periodically runs into an error fetching a connection from my connection pool. Eventually in the logs I see entries like:
Caused by: javax.persistence.PersistenceException: org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.
Only way to recover I've found once it hits this point is to restart Tomcat.
I think the most likely explanation is I have some code somewhere that is not properly cleaning up its connection - returning it to Hikari, leaving something open so Spring can't clean it up, etc.
To troubleshoot I've set my hikari config leakDetectionThreshold to 5000ms and enabled logging. After that, I see log entries like
2018-04-24 19:53:56 WARN ProxyLeakTask:87 - Connection leak detection
triggered for org.postgresql.jdbc.PgConnection#664ec666, stack trace
follows
java.lang.Exception: Apparent connection leak detected
at org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection(DatasourceConnectionProviderImpl.java:122)
at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:35)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:99)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:129)
at org.hibernate.engine.jdbc.internal.StatementPreparerImpl.connection(StatementPreparerImpl.java:47)
at org.hibernate.engine.jdbc.internal.StatementPreparerImpl$5.doPrepare(StatementPreparerImpl.java:146)
at org.hibernate.engine.jdbc.internal.StatementPreparerImpl$StatementPreparationTemplate.prepareStatement(StatementPreparerImpl.java:172)
at org.hibernate.engine.jdbc.internal.StatementPreparerImpl.prepareQueryStatement(StatementPreparerImpl.java:148)
at org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:1940)
at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1909)
at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1887)
at org.hibernate.loader.Loader.doQuery(Loader.java:932)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:349)
at org.hibernate.loader.Loader.doList(Loader.java:2615)
at org.hibernate.loader.Loader.doList(Loader.java:2598)
at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2430)
at org.hibernate.loader.Loader.list(Loader.java:2425)
at org.hibernate.loader.custom.CustomLoader.list(CustomLoader.java:335)
at org.hibernate.internal.SessionImpl.listCustomQuery(SessionImpl.java:2129)
at org.hibernate.internal.AbstractSharedSessionContract.list(AbstractSharedSessionContract.java:981)
at org.hibernate.query.internal.NativeQueryImpl.doList(NativeQueryImpl.java:147)
at org.hibernate.query.internal.AbstractProducedQuery.list(AbstractProducedQuery.java:1398)
at org.hibernate.query.internal.AbstractProducedQuery.getSingleResult(AbstractProducedQuery.java:1444)
at sun.reflect.GeneratedMethodAccessor191.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.orm.jpa.SharedEntityManagerCreator$DeferredQueryInvocationHandler.invoke(SharedEntityManagerCreator.java:379)
at com.sun.proxy.$Proxy163.getSingleResult(Unknown Source)
at com.mycompany.web.jpa.util.DBHelper.getPagedMappedDbResults(DBHelper.java:76)
at com.mycompany.web.jpa.repository.TaskRepositoryImpl.findTaskDetailsByStepIdAndIdIn(TaskRepositoryImpl.java:245)
......
So it is detecting a possible leak. Could be a false positive I suppose? But this is also the only class in my app that is doing database access outside of the standard service/repository pattern often used in Spring apps, so it seems like a likely culprit, and it's my best lead at the moment.
Anyway, the last piece of non library code I see in the trace (ie stuff I wrote, so most likely to be the cause of the leak!) is my DBHelper::getPagedMappedDbResults method, relevant bit included here:
Query q = entityManager.createNativeQuery(countQueryText);
setQueryParameters(q, parameters);
long numActualResults = 0;
try {
numActualResults = ((Number)q.getSingleResult()).longValue(); // line 76
} catch (Exception e) {
System.out.println("just in case: " + e);
}
So basically I create a Query object from my EntityManager instance, set some parameters, and run it to get some results.
Is there something I need to be doing with a Query object when I'm done with it? q.cleanup()? I don't see anything like this from reading the docs, but am I not doing good housekeeping on this resource?
The entityManager itself is created from an #Autowired annotation. My understanding is if I didn't "new" it to instantiate it and instead let the Spring framework autowire it, then Spring will do whatever cleanup is necessary. Is that right? Or do I need to be doing some cleanup after I use the entityManager?
Version details:
Tomcat 8 / Java 8
Spring 5.0.0.RELEASE
Spring Data Kay-RELEASE
Hibernate 5.2.3.Final
Hikari 2.4.5
Any advice or suggestions would be greatly appreciated, thanks!

What is the query? Is it heavy? Maybe you have deadlock here? Connection management looks fine. You do not acquire connection explicitly, so no need to release it. The query might be long running so Hibernate is not able to complete it and release the connection.
Also, you can check the number of open connections on the DB side. Do some analysis on that side as well.

When Hystrix short-circuits and the circuit is OPEN it never closes even after the the offending service is back up

I am having an Issue where when the hystrix circuit breaker trips, it does not close ever again. I have turned logging to debug and I do not see it trying to allow a test request through in which case it seems to me It will never close since its only supposed to close when a test execution goes through successfully indicating that the offending service is now healthy. According to the documentation the Circuit break configuration defaults should be working but I can not seem to tell why the test request is never allowed through.
2016-02-18 09:00:38,782 noodle-soup-service application-akka.actor.default-dispatcher-7 ERROR akka.actor.OneForOneStrategy - CallServiceCommand short-circuited and fallback failed.
com.netflix.hystrix.exception.HystrixRuntimeException: CallServiceCommand short-circuited and fallback failed.
at com.netflix.hystrix.AbstractCommand$16.call(AbstractCommand.java:816) ~[com.netflix.hystrix.hystrix-core-1.4.23.jar:1.4.23]
at com.netflix.hystrix.AbstractCommand$16.call(AbstractCommand.java:790) ~[com.netflix.hystrix.hystrix-core-1.4.23.jar:1.4.23]
at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$1.onError(OperatorOnErrorResumeNextViaFunction.java:99) ~[io.reactivex.rxjava-1.1.0.jar:1.1.0]
at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:71) ~[io.reactivex.rxjava-1.1.0.jar:1.1.0]
...
Caused by: java.lang.RuntimeException: Hystrix circuit short-circuited and is OPEN
at com.netflix.hystrix.AbstractCommand$1.call(AbstractCommand.java:414) ~[com.netflix.hystrix.hystrix-core-1.4.23.jar:1.4.23]
... 38 common frames omitted

In the Akka error handling strategy, I needed to schedule a new test call so that eventually when it succeeds, then Hystrix can close the Circuit.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

How to handle in Reactor Netty an io.netty.channel.ConnectTimeoutException - java

Related

Webflux, reactor connection reset by peer unhandled

Why errorChannel is not getting called in case of MQException from int-jms:outbound-channel-adapter?

camel : stackoverflow error when route is called recursively

trying to find db connection leak in my code, using Spring / JPA / Hikari

When Hystrix short-circuits and the circuit is OPEN it never closes even after the the offending service is back up

Categories

Resources