Maximum number of retries possible concurrently in resilience4j - java

I implemented the retry mechanism offered by resilience4j locally, in a project that makes HTTP calls asynchronously, and I can see that the HTTP calls are being retried properly. Because the calls are asynchronous, many HTTP calls are in flight at once, and several of them may be in a retry at the same time. However, I only ever see 7-9 retries being attempted at any one time. Why is there a cap on this, and is it possible to configure it?
Let's say I have a method like this (pseudocode):
@Async
@Retry(name = "retryA", fallbackMethod = "fallbackRetry")
public ResponseObj getExternalHttpResponse(String payload) {
    ClientResponse resp = webUtils.postRequest(payload);
    boolean validatePredicate = responsePredicate.test(resp);
    if (!validatePredicate) {
        // throwing makes resilience4j treat this call as failed and retry it
        throw new PredicateValidationFailedException();
    }
    return new ResponseObj(resp);
}
Whenever failures occur, the logs continuously show only 7-9 entries such as "attempt 1 failed", "attempt 2 failed" at a time. Why is this capped at 7-9 and not more?
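One plausible explanation, which the thread itself does not confirm: for a synchronous method, resilience4j's retry waits between attempts on the calling thread, so the number of calls that can be "in retry" at once is bounded by the size of the `@Async` executor's pool. Spring Boot's auto-configured task executor defaults to 8 core threads (property `spring.task.execution.pool.core-size`), which would match a cap of roughly 7-9. The plain-Java sketch below (the class name, pool size, and timings are hypothetical) shows the effect with a fixed pool of 8 threads: 20 failing calls are submitted, yet no more than 8 are ever retrying simultaneously.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: every task simulates a call that keeps failing and waits
// between attempts on its own worker thread, as a blocking (synchronous) retry does.
class RetryConcurrencyDemo {

    /** Returns the largest number of tasks that were retrying at the same moment. */
    static int maxConcurrentRetries(int poolSize, int tasks) {
        AtomicInteger inRetry = new AtomicInteger(); // tasks currently retrying
        AtomicInteger peak = new AtomicInteger();    // highest value observed
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                peak.accumulateAndGet(inRetry.incrementAndGet(), Math::max);
                try {
                    for (int attempt = 1; attempt <= 3; attempt++) {
                        Thread.sleep(50); // failed attempt + wait duration, blocking this thread
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inRetry.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 20 failing calls, but only 8 worker threads: at most 8 can retry at once.
        System.out.println("peak concurrent retries: " + maxConcurrentRetries(8, 20));
    }
}
```

If this is indeed the cause, a larger pool (via the property above or a custom executor bean) should raise the cap, at the cost of more threads sitting in retry waits.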

Related

WebFlux backoff and multi-threading in a kafka consumer flow

I have a Kafka consumer written in Java and Spring Boot.
I am using WebFlux to call a third-party server that triggers some actions (and, of course, I wait for the result).
This server has a rate limit that prevents me from making many requests in a short time.
To prevent failures, I intend to keep retrying the call using WebFlux backoff:
webClientBuilder.build()
        .get()
        ...
        .retryWhen(getRetryPolicyOnTooManyRequests())
        ...

private RetryBackoffSpec getRetryPolicyOnTooManyRequests() {
    // up to 20 retries with exponential backoff, only for HTTP 429 responses
    return Retry.backoff(20, Duration.ofSeconds(retryBackoffMinimumSeconds))
            .filter(this::is429Error);
}

private boolean is429Error(Throwable throwable) {
    return throwable instanceof WebClientResponseException
            && ((WebClientResponseException) throwable).getStatusCode() == HttpStatus.TOO_MANY_REQUESTS;
}
My questions are about the behavior I should expect from my Kafka consumer:
What will happen while one of my calls is backing off? Will I be blocking the thread? Will a new thread be opened to process another message?
If I use the default consumer configuration (max.poll.records=500, max.poll.interval.ms=300000) and my backoff time reaches 5 minutes, will the consumer group be rebalanced?
If so, is there a smarter way to tackle this issue so I won't get rebalanced each time, other than just putting a very high number in max.poll.interval.ms?
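A rough back-of-the-envelope check, under stated assumptions: if the n-th retry waits about `min * 2^(n-1)` (ignoring jitter and any `maxBackoff` cap), the cumulative wait after n retries is `min * (2^n - 1)`. The sketch below (hypothetical class and method names) finds the first retry at which the cumulative wait alone exceeds `max.poll.interval.ms` (300000 ms by default). With a 1-second minimum this happens at the 9th retry, well before the 20 allowed retries.

```java
// Hypothetical helper: cumulative exponential backoff, ignoring jitter and maxBackoff.
class BackoffBudget {

    /** Returns the 1-based retry number at which the total wait first exceeds limitMs, or -1. */
    static int firstRetryExceeding(long minBackoffMs, int maxRetries, long limitMs) {
        long total = 0;
        for (int n = 1; n <= maxRetries; n++) {
            long delay = minBackoffMs << (n - 1); // min * 2^(n-1)
            total += delay;
            if (total > limitMs) {
                return n;
            }
        }
        return -1; // all retries fit within the limit
    }

    public static void main(String[] args) {
        // 1 s minimum backoff, 20 retries, default max.poll.interval.ms = 300000
        System.out.println(firstRetryExceeding(1000, 20, 300_000)); // 9
    }
}
```

Note that Reactor schedules each backoff delay on a Scheduler instead of sleeping a thread, so the backoff itself does not hold a thread; whether the Kafka consumer thread is held (and `poll()` is delayed) depends on whether the listener waits for the result, for example with `block()`.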

What should I try next in order to minimise/eliminate java.net.SocketTimeoutException: timeout spring retry

I get lots of events to process from RabbitMQ; they are forwarded to service1, which processes the data and then makes an internal call to another microservice, service2. However, I frequently get java.net.SocketTimeoutException: timeout when I call service2. As a first attempt, I increased the timeout from 2 s to 10 s; this reduced the timeout exceptions, but many of them remain.
The second change I made was to replace Spring's deprecated retry method with retryWhen, introducing backoff and a jitter factor, as shown below:
.retryWhen(Retry.backoff(ServiceUtils.NUM_RETRIES, Duration.ofSeconds(2)).jitter(0.50)
        .onRetryExhaustedThrow((retryBackoffSpec, retrySignal) ->
                // the generator returns the exception; Reactor then emits it as the error
                new ServiceException(
                        ErrorBo.builder()
                                .message("Service failed to process after max retries")
                                .build())))
.onErrorResume(error -> {
    // return and print the error only if all the retries have been exhausted
    log.error(error.getMessage() + ". Error occurred while generating pdf");
    return Mono.error(ServiceUtils
            .returnServiceException(ServiceErrorCodes.SERVICE_FAILURE,
                    "Service failed to process after max retries, failed to generate PDF"));
})
);
So my questions are:
Some service calls succeed and some fail. Does this mean there is still a bottleneck somewhere, perhaps on the server side, so that not all requests get processed?
Do I still need to increase the timeout limit, if possible?
How do I make sure that no java.net.SocketTimeoutException: timeout occurs?
This issue started recently, and there seem to be no changes to ports or any connection-level settings.
Still, what should I check to make sure the connection-level settings are correct? Could someone please guide me on this?
Thanks in advance.
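To see how a spec like `Retry.backoff(NUM_RETRIES, Duration.ofSeconds(2)).jitter(0.50)` spreads retries out, here is a simplified plain-Java model. It is an assumption-laden sketch, not Reactor's exact algorithm (Reactor additionally clamps the result to its min and max backoff): the n-th delay is `2 s * 2^(n-1)`, randomly scaled by a factor in `[1 - jitter, 1 + jitter)`. Spreading retries this way keeps many failed requests from hitting service2 again at the same instant, which matters if the timeouts come from server-side saturation. The class and method names are hypothetical.

```java
import java.util.Random;

// Simplified model of exponential backoff with proportional jitter (not Reactor's exact code).
class JitteredBackoff {

    /** Delay in ms before retry number n (1-based). */
    static long delayMs(long baseMs, int n, double jitter, Random random) {
        long nominal = baseMs << (n - 1);                            // base * 2^(n-1)
        double factor = 1 + jitter * (2 * random.nextDouble() - 1);  // in [1-jitter, 1+jitter)
        return Math.round(nominal * factor);
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        for (int n = 1; n <= 4; n++) {
            // with base 2000 ms and jitter 0.5: retry 1 in [1000, 3000], retry 2 in [2000, 6000], ...
            System.out.println("retry " + n + ": " + delayMs(2000, n, 0.5, random) + " ms");
        }
    }
}
```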

How to simulate timeout in response to a Rest request in Spring?

I have a REST API implemented with Spring Boot 2. To check some client behavior on timeout, how can I simulate that condition in my testing environment? The server should receive the request normally and process it (in production, timeouts happen due to random network slowdowns and large response payloads).
Would adding a long sleep be the proper simulation technique? Is there any better method to really have the server "drop" the response?
Needing sleeps to test your code is considered bad practice. Instead, you want to replicate the exception you receive from the timeout, e.g. the ResourceAccessException (wrapping a java.net.SocketTimeoutException) that RestTemplate.exchange throws.
Then you can write a test as such:
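import static org.mockito.Mockito.when;
import java.net.SocketTimeoutException;
import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.mockito.Mock;
import org.mockito.junit.MockitoJUnitRunner;
import org.springframework.web.client.ResourceAccessException;
import org.springframework.web.client.RestTemplate;
@RunWith(MockitoJUnitRunner.class)
public class FooTest {
    @Mock
    RestTemplate restTemplate;
    @Before
    public void setup() {
        // RestTemplate wraps I/O timeouts in ResourceAccessException; Mockito rejects a checked SocketTimeoutException here
        when(restTemplate.exchange(...))
                .thenThrow(new ResourceAccessException("I/O error", new SocketTimeoutException("Read timed out")));
    }
    @Test
    public void test() {
        // TODO
    }
}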
That way you won't be twiddling your thumbs waiting for something to happen.
Sleep is one way to do it, but if you're writing dozens of tests like that then having to wait for a sleep will create a really long-running test suite.
The alternative would be to change the timeout threshold on the client side for testing. If in production your client is supposed to wait 5 seconds for a response, then in tests change it to 0.5 seconds (assuming your server takes longer than that to respond) while keeping the same error handling logic.
The latter might not work in all scenarios, but it will definitely save you from having a test suite that takes 10+ mins to run.
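A sketch of that lower-threshold approach in plain Java, without Spring (the class name and timeout values are hypothetical): a local server socket accepts the connection but never writes a response, and the client uses a deliberately short read timeout. The client then gets a real `java.net.SocketTimeoutException` after roughly the configured timeout, with no sleep in the test code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

class SilentServerTimeoutDemo {

    /** Sends a GET to a server that never answers; returns true if the read timed out. */
    static boolean readTimesOut(int readTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The OS completes the TCP handshake for queued connections even though
        // accept() is never called, so the request is sent but nobody ever replies.
        try (ServerSocket silent = new ServerSocket(0, 50, loopback);
             Socket client = new Socket(loopback, silent.getLocalPort())) {
            client.setSoTimeout(readTimeoutMs); // short client-side read timeout for the test
            OutputStream out = client.getOutputStream();
            out.write("GET / HTTP/1.1\r\nHost: localhost\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            try {
                client.getInputStream().read(); // blocks until data arrives or the timeout fires
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        boolean timedOut = readTimesOut(200);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("timed out: " + timedOut + " after ~" + elapsedMs + " ms");
    }
}
```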
You can do what I did in my case.
When our application runs in production, we keep polling trades from an API, and sometimes the connection drops with an SSLProtocolException.
What we did:
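int retryCount = 5;
int count = 0;
while (true) {
    count++;
    try {
        // send an API request here
        break; // success: stop retrying
    } catch (Exception e) {
        if (count == retryCount) {
            throw e; // give up after retryCount attempts
        }
        // wait before trying to reconnect (2 s is an arbitrary example value)
        Thread.sleep(2000);
    }
}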
Similarly, in your case, catch the exception that is thrown, put the thread to sleep for a while, and then try to connect again. You can adjust retryCount to your requirements; in my case it was 5.

Apache HttpClient: why doesn't it retry on timeout?

In Apache HttpClient 4.3, DefaultHttpRequestRetryHandler contains this code:
if (exception instanceof InterruptedIOException) {
    // Timeout
    return false;
}
It won't retry on a timeout. What's the reason? Sometimes the network is unstable and I just want to retry the connection. I can use my own RetryHandler, but I want to make sure there is no problem with retrying on a timeout.
It won't retry if it's timeout. What's the reason?
Why should it? A timeout usually defines the maximum period of inactivity between two consecutive operations. Why should the request be retried if it timed out in the first place? If you are willing to wait longer for the operation to complete, you should use a greater timeout value.
This helped me. I wanted the opposite, to disable retries, and the code below does that:
DefaultHttpClient httpClient = new DefaultHttpClient();
// retryCount = 0 disables retries; true = requestSentRetryEnabled
DefaultHttpRequestRetryHandler retryHandler = new DefaultHttpRequestRetryHandler(0, true);
httpClient.setHttpRequestRetryHandler(retryHandler);
Thanks
In production code I have used a custom RetryHandler that mimics DefaultHttpRequestRetryHandler but also retries on two exceptions we were getting regularly: ConnectTimeoutException and HttpHostConnectException. These exceptions were thrown frequently after a 15 s timeout. A connection should be established in under a second, so we now retry up to 3 times with a 5 s timeout, and this has led to a large increase in connections succeeding on the second attempt.
We are still looking into why these connection requests aren’t being made in a timely manner between our azure app service and on-prem services.
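The custom-handler idea above can be approximated in plain Java without HttpClient. This is a sketch, not the poster's handler, and the class name and values are hypothetical: use a short connect timeout and retry only connection-phase failures, which are safe to retry because no request has been sent yet.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ConnectRetry {

    /**
     * Tries to connect up to maxAttempts times, each with a short connect timeout.
     * Only connect failures (timed out or refused) are retried: nothing has been
     * sent yet, so a retry cannot duplicate a request.
     */
    static Socket connect(InetSocketAddress address, int maxAttempts, int connectTimeoutMs) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Socket socket = new Socket();
            try {
                socket.connect(address, connectTimeoutMs);
                return socket; // connected: the caller owns and closes the socket
            } catch (SocketTimeoutException | ConnectException e) {
                socket.close();
                last = e; // connect timed out or was refused: try again
            } catch (IOException e) {
                socket.close();
                throw e; // any other I/O error is not retried
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        // e.g. 3 attempts with a 5 s connect timeout, as described above
        try (Socket s = connect(new InetSocketAddress("example.com", 80), 3, 5000)) {
            System.out.println("connected: " + s.isConnected());
        }
    }
}
```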

Akka and Ask Pattern. When Actor is abruptly stopped can i return Future?

I currently have code that dispatches a request using the Ask Pattern. The dispatched request creates an Akka actor that sends an HTTP request and then returns the response. I'm using Akka's circuit breaker API to manage issues with the upstream web services I call.
If the circuit breaker is in the open state, all subsequent requests fail fast, which is the desired effect. However, when the actor fails fast it just throws a CircuitBreakerOpenException and stops the actor, but control does not return to the code that made the initial request until an AskTimeoutException is generated.
This is the code which dispatches the request
Timeout timeout = new Timeout(Duration.create(10, SECONDS));
Future<Object> future = Patterns.ask(myActor, argMessage, timeout);
Response res = (Response ) Await.result(future, timeout.duration());
This is the circuit breaker code:
getSender().tell(breaker.callWithSyncCircuitBreaker(new Callable<Obj>() {
        @Override
        public Obj call() throws Exception {
            return fetch(message);
        }
    }), getSelf()
);
getContext().stop(getSelf());
When this block executes and the circuit is open, it fails fast by throwing an exception. However, I want to return control to the code that handles the future without having to wait for a timeout.
Is this possible?
When an actor fails out and is restarted, if it was processing a message, no response will be automatically sent to that sender. If you want to send that sender a message on that particular failure then catch that exception explicitly and respond back to that sender with a failed result, making sure to capture the sender first before you go into any future callbacks to avoid closing over this mutable state. You could also try to do this in the preRestart, but that's not very safe as by that time the sender might have changed if you are using futures inside the actor.
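The "reply with a failure instead of staying silent" idea, shown with plain `CompletableFuture` rather than Akka. This is an analogy only: in Akka the corresponding step is to send the captured sender an `akka.actor.Status.Failure`, which fails the ask future. The hypothetical `dispatch` worker below always answers, completing the caller's future exceptionally as soon as its call fails, so the caller's 10-second wait returns immediately instead of timing out.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class FailFastReply {

    /** Hypothetical worker: runs the call and always replies, with either a value or the failure. */
    static <T> CompletableFuture<T> dispatch(Callable<T> call) {
        CompletableFuture<T> reply = new CompletableFuture<>();
        CompletableFuture.runAsync(() -> {
            try {
                reply.complete(call.call());
            } catch (Exception e) {
                // Reply with the failure instead of letting the caller wait for its timeout.
                reply.completeExceptionally(e);
            }
        });
        return reply;
    }

    public static void main(String[] args) throws InterruptedException, TimeoutException {
        long start = System.nanoTime();
        try {
            dispatch(() -> { throw new IllegalStateException("circuit open"); })
                    .get(10, TimeUnit.SECONDS);
        } catch (ExecutionException e) {
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("failed fast after " + ms + " ms: " + e.getCause().getMessage());
        }
    }
}
```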
