What does Retry mean in the context of the Java Couchbase SDK?

I am using the Java Couchbase SDK in my application. While setting up the DefaultCouchbaseEnvironment, I came across the RetryStrategy property. I am currently using the default configuration, for which the retry strategy is BestEffortRetryStrategy. According to the documentation:
BestEffortRetryStrategy will retry the operation until it either succeeds or the maximum request lifetime is reached
By default the maximum request lifetime is 75 seconds.
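For reference, the environment is set up roughly like this (a sketch for SDK 2.x; the values shown just make the defaults explicit, and the host and bucket names are placeholders):

import com.couchbase.client.core.retry.BestEffortRetryStrategy;
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.env.CouchbaseEnvironment;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;

CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
        .retryStrategy(BestEffortRetryStrategy.INSTANCE) // the default strategy
        .maxRequestLifetime(75000)                       // the default: 75s, in milliseconds
        .build();
Bucket bucket = CouchbaseCluster.create(env, "127.0.0.1").openBucket("my-bucket");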
What I want to understand is what "retry" means here. Does it mean retrying the request whenever an exception occurs, or does it mean retrying to dispatch the request to a node that can process it, and keeping that up for as long as 75 seconds?
I have been looking at my application logs for different exceptions to understand this, and I could see that TemporaryFailureException wasn't retried, while in some instances RequestCancelledException was thrown after 75 seconds. Is it fair to assume that Couchbase retries a request in order to assign it to a node that can process it, but does not retry on exceptions once the request reaches that node?
Stack trace for TemporaryFailureException:
stackTrace: com.couchbase.client.java.error.TemporaryFailureException: null
at com.couchbase.client.java.bucket.api.Mutate$2$1.call(Mutate.java:246)
at com.couchbase.client.java.bucket.api.Mutate$2$1.call(Mutate.java:220)
at rx.internal.operators.OnSubscribeMap$MapSubscriber.onNext(OnSubscribeMap.java:69)
at rx.observers.Subscribers$5.onNext(Subscribers.java:235)
at rx.internal.operators.OnSubscribeDoOnEach$DoOnEachSubscriber.onNext(OnSubscribeDoOnEach.java:101)
at rx.internal.producers.SingleProducer.request(SingleProducer.java:65)
at rx.Subscriber.setProducer(Subscriber.java:211)
at rx.internal.operators.OnSubscribeMap$MapSubscriber.setProducer(OnSubscribeMap.java:102)
at rx.Subscriber.setProducer(Subscriber.java:205)
at rx.internal.operators.OnSubscribeMap$MapSubscriber.setProducer(OnSubscribeMap.java:102)
at rx.Subscriber.setProducer(Subscriber.java:205)
at rx.Subscriber.setProducer(Subscriber.java:205)
at rx.subjects.AsyncSubject.onCompleted(AsyncSubject.java:103)
at com.couchbase.client.core.endpoint.AbstractGenericHandler.completeResponse(AbstractGenericHandler.java:508)
at com.couchbase.client.core.endpoint.AbstractGenericHandler.access$000(AbstractGenericHandler.java:86)
at com.couchbase.client.core.endpoint.AbstractGenericHandler$1.call(AbstractGenericHandler.java:526)
at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java)
at java.lang.Thread.run(Thread.java:748)
Caused by: rx.exceptions.OnErrorThrowable$OnNextValue: OnError while emitting onNext value: com.couchbase.client.core.message.kv.UpsertResponse.class
at rx.exceptions.OnErrorThrowable.addValueAsLastCause(OnErrorThrowable.java:118)
at rx.internal.operators.OnSubscribeMap$MapSubscriber.onNext(OnSubscribeMap.java:73)
... 21 common frames omitted

BestEffortRetryStrategy should retry until the request is cancelled by the timeout.
FailFastRetryStrategy should not retry. It should fail immediately.
If you got a TemporaryFailureException and were using BestEffortRetryStrategy, that operation should have been retried. If you had one that was not retried, can you share the stack trace?
Mike

Related

CXF cannot set ExceptionHandler for DefaultConnectingIOReactor

I use CXF with default clients for my async HTTP requests. In some cases, the ConnectingIOReactor hits a RuntimeException and the reactor breaks.
org.apache.http.nio.reactor.IOReactorException: I/O dispatch worker terminated abnormally
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:359)
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: HostnameVerifier, socket reset for TTL
2 lines skipped for [org.apache.cxf]
at org.apache.http.nio.conn.ssl.SSLIOSessionStrategy$1.verify(SSLIOSessionStrategy.java:188)
at org.apache.http.nio.reactor.ssl.SSLIOSession.doHandshake(SSLIOSession.java:371)
at org.apache.http.nio.reactor.ssl.SSLIOSession.isAppInputReady(SSLIOSession.java:541)
at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:120)
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
... 1 more
Whenever the IOReactor throws an exception, I always get the message "Request cannot be executed; I/O reactor status: STOPPED", and I cannot send requests again until the application is restarted.
I solved my problem by copying AsyncHTTPConduitFactory and setting the exception handler on the IOReactor manually. However, I couldn't find any way to set this exception handler via properties, or without copying the whole class.
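For reference, the manual wiring amounts to something like this (a sketch against HttpCore NIO; the reactor construction details depend on the CXF/HttpAsyncClient version):

import java.io.IOException;

import org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor;
import org.apache.http.impl.nio.reactor.IOReactorConfig;
import org.apache.http.nio.reactor.IOReactorExceptionHandler;

// Constructor declares the checked IOReactorException.
DefaultConnectingIOReactor ioReactor = new DefaultConnectingIOReactor(IOReactorConfig.DEFAULT);
ioReactor.setExceptionHandler(new IOReactorExceptionHandler() {
    @Override
    public boolean handle(IOException ex) {
        return true; // keep the reactor running instead of shutting it down
    }

    @Override
    public boolean handle(RuntimeException ex) {
        return true; // e.g. the hostname-verifier RuntimeException above
    }
});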
Is there any way to set the exception handler in a parameterized way?

Lettuce RedisCache throws java.util.concurrent.RejectedExecutionException Thread limit exceeded replacing blocked worker

I am using Spring Boot with Redis cache and the default Lettuce configuration, and I receive the following RejectedExecutionException after the server has been up for a few minutes:
org.springframework.data.redis.RedisSystemException: Unknown redis exception; nested exception is java.util.concurrent.RejectedExecutionException: Thread limit exceeded replacing blocked worker
at org.springframework.data.redis.FallbackExceptionTranslationStrategy.getFallback(FallbackExceptionTranslationStrategy.java:53)
at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:43)
at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:257)
at org.springframework.data.redis.connection.lettuce.LettuceStringCommands.convertLettuceAccessException(LettuceStringCommands.java:718)
at org.springframework.data.redis.connection.lettuce.LettuceStringCommands.get(LettuceStringCommands.java:63)
at org.springframework.data.redis.connection.DefaultedRedisConnection.get(DefaultedRedisConnection.java:210)
at org.springframework.data.redis.cache.DefaultRedisCacheWriter.lambda$get$1(DefaultRedisCacheWriter.java:109)
at org.springframework.data.redis.cache.DefaultRedisCacheWriter.execute(DefaultRedisCacheWriter.java:242)
at org.springframework.data.redis.cache.DefaultRedisCacheWriter.get(DefaultRedisCacheWriter.java:109)
at org.springframework.data.redis.cache.RedisCache.lookup(RedisCache.java:82)
at org.springframework.cache.support.AbstractValueAdaptingCache.get(AbstractValueAdaptingCache.java:58)
at org.springframework.cache.interceptor.AbstractCacheInvoker.doGet(AbstractCacheInvoker.java:73)
at org.springframework.cache.interceptor.CacheAspectSupport.findInCaches(CacheAspectSupport.java:525)
at org.springframework.cache.interceptor.CacheAspectSupport.findCachedItem(CacheAspectSupport.java:490)
at org.springframework.cache.interceptor.CacheAspectSupport.execute(CacheAspectSupport.java:372)
at org.springframework.cache.interceptor.CacheAspectSupport.execute(CacheAspectSupport.java:316)
at org.springframework.cache.interceptor.CacheInterceptor.invoke(CacheInterceptor.java:61)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:688)
at my.controller.MyController.lambda$myFunc$0(MyController.java:60)
at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: java.util.concurrent.RejectedExecutionException: Thread limit exceeded replacing blocked worker
at java.util.concurrent.ForkJoinPool.tryCompensate(ForkJoinPool.java:2011)
at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3310)
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1775)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at io.lettuce.core.protocol.AsyncCommand.await(AsyncCommand.java:81)
at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:112)
at io.lettuce.core.FutureSyncInvocationHandler.handleInvocation(FutureSyncInvocationHandler.java:62)
at io.lettuce.core.internal.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:80)
at com.sun.proxy.$Proxy168.get(Unknown Source)
at org.springframework.data.redis.connection.lettuce.LettuceStringCommands.get(LettuceStringCommands.java:61)
... 21 common frames omitted
It seems that Lettuce uses the common ForkJoinPool, which DeferredResult uses as well, and all the requests and connections are choking the pool (please correct me if I'm wrong). What is the recommended approach? Should I move Lettuce to a different pool? If so, how? Please let me know if there is any other configuration or information I can provide.
Lettuce uses Netty's EventLoops as its threading infrastructure.
What happens here is that your task is executed on a ForkJoin pool. Lettuce uses CompletableFuture to return a handle for async result processing, and the synchronous API calls CompletableFuture.get(timeout, TimeUnit) to await command completion. Calling a blocking method on a ForkJoin pool involves the ManagedBlocker, which may try to compensate by switching to a different thread that can proceed with work.
If too many threads are awaiting command completion, the pool hits its thread limit and you eventually end up with RejectedExecutionException.
I suggest using a different execution pool for the lambda that you're invoking.
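For example, something along these lines (a sketch; myService, key, and deferredResult stand in for your controller's actual code):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// A dedicated pool for blocking cache lookups, so threads blocked in Lettuce's
// sync API no longer compete with work scheduled on ForkJoinPool.commonPool().
ExecutorService cacheExecutor = Executors.newFixedThreadPool(16);

CompletableFuture
        .supplyAsync(() -> myService.getCachedValue(key), cacheExecutor) // blocking call runs here
        .whenComplete((value, ex) -> {
            if (ex != null) {
                deferredResult.setErrorResult(ex);
            } else {
                deferredResult.setResult(value);
            }
        });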

How to handle akka AskTimeoutException when submitting a Flink job

Flink 1.5.3: when I submit a Flink job to the Flink cluster (on YARN), it always throws an AskTimeoutException. In the Flink configuration file I have set the parameter akka.ask.timeout: 1000s, but the exception shown below is still thrown.
In other words, I have already increased that timeout parameter, and it does not help.
org.apache.flink.runtime.rest.handler.RestHandlerException: Job submission failed.
at org.apache.flink.runtime.rest.handler.job.JobSubmitHandler.lambda$handleRequest$2(JobSubmitHandler.java:116)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1851759541]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
... 21 more
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1851759541]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
... 9 more
So is there any solution to avoid this issue?
The timeout for the communication between the REST handlers and the Flink cluster is controlled by web.timeout. The timeout is specified in milliseconds, so you would need to set web.timeout: 1000000 in your flink-conf.yaml if you want to wait 1000s.
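For example, in flink-conf.yaml:

# REST handler <-> cluster communication timeout, in milliseconds (here: 1000s)
web.timeout: 1000000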
Moreover, it would be good to check in the cluster entrypoint logs why the job submission takes so long; usually it should not take longer than 10 seconds.

Random Exception: Futures timed out after Exception in Spark Jobs

I am getting the following error when running a Spark job on Spark 2.0.
The error is random in nature and does not occur every time.
Once the tasks are created, most of them complete properly, while a few hang and throw the following error after a while.
I have tried increasing the properties spark.executor.heartbeatInterval and spark.network.timeout, but to no avail.
17/07/23 20:46:35 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;@597e9d16,BlockManagerId(driver, 128.164.190.35, 38337))] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:518)
at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:547)
at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547)
at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1857)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:547)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:81)
... 14 more
Yes, the problem was indeed due to GC, as it kept pausing the tasks; changing the default GC to G1GC reduced the problem. Thanks.
-XX:+UseG1GC
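For example, the flag can be passed through Spark's standard extra-JVM-options settings (a sketch; your-application.jar is a placeholder):

spark-submit \
  --conf "spark.driver.extraJavaOptions=-XX:+UseG1GC" \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC" \
  your-application.jar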
https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html

GC overhead limit exceeded when using Spring Integration executor channel

I got the exception below. I suspect the heap was full, so the GC error was thrown. Kindly share any other perspective on the application design described below.
2017:06:07 21:18:36.275 [loginputtaskexecutor-7] ERROR o.s.i.handler.LoggingHandler - org.springframework.messaging.MessageHandlingException: nested exception is java.lang.IllegalStateException: Cannot process message
at org.springframework.integration.handler.MethodInvokingMessageProcessor.processMessage(MethodInvokingMessageProcessor.java:96)
at org.springframework.integration.handler.ServiceActivatingHandler.handleRequestMessage(ServiceActivatingHandler.java:89)
at org.springframework.integration.handler.AbstractReplyProducingMessageHandler.handleMessageInternal(AbstractReplyProducingMessageHandler.java:109)
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:127)
at org.springframework.integration.dispatcher.AbstractDispatcher.tryOptimizedDispatch(AbstractDispatcher.java:116)
at org.springframework.integration.dispatcher.UnicastingDispatcher.doDispatch(UnicastingDispatcher.java:148)
at org.springframework.integration.dispatcher.UnicastingDispatcher.access$000(UnicastingDispatcher.java:53)
at org.springframework.integration.dispatcher.UnicastingDispatcher$3.run(UnicastingDispatcher.java:129)
at org.springframework.integration.util.ErrorHandlingTaskExecutor$1.run(ErrorHandlingTaskExecutor.java:55)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Cannot process message
at org.springframework.integration.util.MessagingMethodInvokerHelper.processInternal(MessagingMethodInvokerHelper.java:333)
at org.springframework.integration.util.MessagingMethodInvokerHelper.process(MessagingMethodInvokerHelper.java:155)
at org.springframework.integration.handler.MethodInvokingMessageProcessor.processMessage(MethodInvokingMessageProcessor.java:93)
... 11 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
Application flow in detail:
A Spring Integration application is built to listen for messages from ActiveMQ. After a message is consumed, it is handed over to an input channel (an executor channel) whose subscriber is a service activator. In the service activator the message is converted to JSON and then stored in Cassandra. A transaction is declared on the service activator method.
With this design I intended to break the message transaction flow by using an executor channel: once a message is consumed it is handed over to the executor channel and the transaction ends; from then on, the executor channel's threads take care of performing parallel writes to Cassandra.
Is there a better way to write a large volume of data to Cassandra as fast as possible using Java Spring Integration?
If the data sink can't keep up, add a limit to the queue size in the TaskExecutor and use a CallerRunsPolicy or CallerBlocksPolicy when the queue is full.
That will naturally throttle the workload to the rate the sink can deal with.
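A minimal sketch of that configuration, assuming Spring's ThreadPoolTaskExecutor (the bean name and sizes are illustrative; CallerBlocksPolicy lives in org.springframework.integration.util):

import java.util.concurrent.ThreadPoolExecutor;

import org.springframework.context.annotation.Bean;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Bean
public ThreadPoolTaskExecutor loginputTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(8);
    executor.setMaxPoolSize(8);
    // Bound the queue; the default capacity is effectively unbounded.
    executor.setQueueCapacity(1000);
    // When the queue is full, run the task on the caller (the JMS listener thread),
    // which throttles consumption to the rate Cassandra can sustain.
    // Alternative: new org.springframework.integration.util.CallerBlocksPolicy(maxWaitMillis)
    executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
    return executor;
}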
