I have a Java-based server managed by a Kubernetes cluster. It's a distributed environment where the number of instances is set to 4 to handle millions of requests per minute.
The issue I am facing is that Kubernetes tries to balance the cluster and, in the process, kills a pod and moves it to another node, but pending HTTP GET and POST requests get lost.
What Kubernetes feature or architectural solution would let me retry a request that is stuck or failed?
UPDATE:
I have two Kubernetes Service configurations:
LoadBalancer (backed by an AWS ELB): for external-facing traffic
ClusterIP: for the internal microservice architecture
Kubernetes gives you the means to handle pod terminations gracefully via SIGTERM and preStop hooks. There are several articles on this, e.g. Graceful shutdown of pods with Kubernetes. In your Java app, you should listen for SIGTERM and shut the server down gracefully (most HTTP frameworks have this shutdown functionality built in).
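For illustration, here is a minimal sketch of that pattern using the JDK's built-in HttpServer; a real framework will expose its own drain/stop calls instead:

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class GracefulShutdown {

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/", exchange -> {
            byte[] body = "ok".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();

        // The JVM runs shutdown hooks on SIGTERM, which is what the kubelet
        // sends before killing the pod. stop(10) refuses new exchanges and
        // waits up to 10 seconds for in-flight ones, so pending GET/POST
        // requests are drained rather than lost.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> server.stop(10)));
    }
}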
The issue I am facing is that Kubernetes tries to balance the cluster and, in the process, kills a pod and moves it to another node
Now this sounds a little suspicious: in general, K8s only evicts and reschedules pods onto different nodes under specific circumstances, for example when a node is running out of resources to serve the pod. If your pods are frequently being rescheduled, that is generally a sign that something else is going on, so you should determine the root cause (if you have resource limits set in your deployment spec, make sure your service container isn't exceeding them - this is a common problem with JVM containers).
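A quick way to verify this inside the container is to log the heap ceiling the JVM actually sees; if this number (plus off-heap overhead such as metaspace and thread stacks) is close to the pod's memory limit, OOM kills are likely:

public class HeapCheck {
    public static void main(String[] args) {
        // Compare this against the container's memory limit; the JVM does not
        // automatically leave room for its non-heap memory on top of it.
        long maxHeapMib = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap visible to the JVM: " + maxHeapMib + " MiB");
    }
}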
Finally, HTTP retries are inherently unsafe for non-idempotent requests (such as POST), so you can't just retry any failed request without knowing the logical implications. In any case, retries generally happen on the client side, not the server side, so it's not a flag you can set in K8s to enable them.
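As a sketch of what client-side retries can look like for idempotent requests (the URL, attempt count, and backoff here are illustrative):

import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class IdempotentRetry {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://my-service/health")) // illustrative URL
                .GET() // only retry idempotent methods like GET
                .build();

        int maxAttempts = 3;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() < 500) {
                    System.out.println("Got: " + response.statusCode());
                    break; // success or a client error; don't retry
                }
            } catch (IOException e) {
                if (attempt == maxAttempts) throw e; // give up
            }
            Thread.sleep(100L * attempt); // simple linear backoff
        }
    }
}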
A service mesh solves the particular issue that you are facing.
There are different service meshes available. General features of a service mesh are:
Load balancing
Fine-grained traffic policies
Service discovery
Service monitoring
Tracing
Routing
Some service mesh implementations:
Istio
Envoy
Linkerd
For the retry feature specifically, see Linkerd: https://linkerd.io/2/features/retries-and-timeouts/
Related
I am having difficulty with one of my applications that exposes a gRPC service. The issue we started with was that during rolling updates we were getting failures from our service due to service unavailability, which was not expected, since we have multiple nodes and deploy them one by one.
We are using Consul for service discovery, and our application logic is to start the gRPC service (call the start method on the Server class), wait one second, and then register with Consul. The conclusion we came to is that even after we call the server's start method, it takes some time before the server is actually ready to serve RPC calls. This delay is apparently more than our 1s delay, so we register the service with Consul before the server is actually ready, hence the errors.
What I am looking for is a way to check the server's readiness before registering it with Consul, so that the server only receives RPC calls when it is actually ready. Does anyone know of a way that could be useful in this case?
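One common approach, sketched here for illustration, is the gRPC health checking protocol: poll the standard grpc.health.v1 service until it reports SERVING, and only then register. This assumes grpc-java with the grpc-services health classes on the classpath; the Consul call is a placeholder:

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.StatusRuntimeException;
import io.grpc.health.v1.HealthCheckRequest;
import io.grpc.health.v1.HealthCheckResponse;
import io.grpc.health.v1.HealthGrpc;

public class ReadinessGate {

    // Polls the server's standard health service until it reports SERVING.
    static void awaitReady(String host, int port) throws InterruptedException {
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress(host, port)
                .usePlaintext()
                .build();
        HealthGrpc.HealthBlockingStub health = HealthGrpc.newBlockingStub(channel);
        HealthCheckRequest request = HealthCheckRequest.newBuilder()
                .setService("") // empty string = overall server health
                .build();
        while (true) {
            try {
                if (health.check(request).getStatus()
                        == HealthCheckResponse.ServingStatus.SERVING) {
                    break;
                }
            } catch (StatusRuntimeException e) {
                // Server not accepting RPCs yet; fall through and retry.
            }
            Thread.sleep(100);
        }
        channel.shutdown();
        // Only now register with Consul (placeholder):
        // consulClient.register(...);
    }
}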
Sorry, it turned out the server was ready to take the requests. The failing requests were actually due to a timeout (set on the Envoy load balancer): the DB pool (Hikari) was initially taking more time to serve requests, hence the failures.
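For that particular cold-start, one mitigation (a sketch; the JDBC URL and pool size are illustrative) is to borrow a connection from Hikari before registering the service, so the first real request doesn't pay the pool's initialization cost:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolWarmup {

    static HikariDataSource createWarmPool() throws Exception {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db:5432/app"); // illustrative
        config.setMaximumPoolSize(10);
        HikariDataSource ds = new HikariDataSource(config);

        // Borrowing (and returning) one connection up front guarantees at
        // least one live connection exists before traffic arrives.
        ds.getConnection().close();
        return ds;
    }
}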
For the gRPC service, client-side load balancing is used.
Channel creation
ManagedChannel channel = ManagedChannelBuilder
        .forTarget("host1:port,host2:port,host3:port")
        .nameResolverFactory(new CustomNameResolverProvider())
        .loadBalancerFactory(RoundRobinBalancerFactory.getInstance())
        .usePlaintext(true)
        .build();

Use this channel to create the stub.
Problem
If one of the services [host1] goes down, will the stub handle this scenario and stop sending further requests to service [host1]?
As per documentation at https://grpc.io/blog/loadbalancing
A thick client approach means the load balancing smarts are implemented in the client. The client is responsible for keeping track of available servers, their workload, and the algorithms used for choosing servers. The client typically integrates libraries that communicate with other infrastructures such as service discovery, name resolution, quota management, etc.
So is it the responsibility of the ManagedChannel class to maintain the list of active servers, or does the application code need to maintain the list and create a new ManagedChannel instance with the active server list every time?
Test Result
As per our test, if one of the services goes down there is no impact on load balancing, and all requests are processed correctly.
So can it be assumed that either the stub or the ManagedChannel class maintains the active server list?
An answer backed by documentation would be highly appreciated.
Load balancers generally handle nodes going down. Even when managed by an external service, nodes can crash abruptly, and load balancers want to avoid those nodes. So all load balancer implementations for gRPC I'm aware of avoid failing calls when a backend is down.
Pick-first (the default) iterates through the addresses until one works. Round-robin only round-robins over working connections. So what you're describing should work fine.
I will note that your approach does have one downfall: you can't change the servers while the process is running. Removing broken backends is one thing, but adding new working backends is another. If your load is ever too high, you may not be able to address the issue by adding more workers, because even if you add more workers your clients won't connect to them.
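One way around that static list, sketched here with the current grpc-java API (the DNS name is illustrative), is to resolve backends through DNS, so re-resolution can pick up newly added servers without recreating the channel:

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class DnsRoundRobinChannel {

    static ManagedChannel create() {
        // The DNS resolver re-queries the name over time (and on connection
        // failures), so backends added behind my-service.example.com become
        // visible to the round_robin balancer on the existing channel.
        return ManagedChannelBuilder
                .forTarget("dns:///my-service.example.com:50051")
                .defaultLoadBalancingPolicy("round_robin")
                .usePlaintext()
                .build();
    }
}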
RabbitMQ RPC
I decided to use RabbitMQ RPC as described here.
My Setup
Incoming web requests (on Tomcat) will dispatch RPC requests over RabbitMQ to different services and assemble the results. I use one reply queue with one custom consumer that listens to all RPC responses and collects them, keyed by their correlation ID, in a simple hash map. Nothing fancy there.
This works great in a simple integration test at controller level.
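For reference, the single-reply-queue collector described above might look roughly like this (a sketch against the current RabbitMQ Java client; names and payload types are illustrative):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.DeliverCallback;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class RpcResponseCollector {

    private final ConcurrentMap<String, CompletableFuture<byte[]>> pending =
            new ConcurrentHashMap<>();

    // Register interest in a correlation ID before publishing the request.
    public CompletableFuture<byte[]> expect(String correlationId) {
        CompletableFuture<byte[]> future = new CompletableFuture<>();
        pending.put(correlationId, future);
        return future;
    }

    // One consumer on the shared reply queue dispatches by correlation ID.
    public void attach(Channel channel, String replyQueue) throws Exception {
        DeliverCallback onDelivery = (consumerTag, delivery) -> {
            String correlationId = delivery.getProperties().getCorrelationId();
            CompletableFuture<byte[]> future = pending.remove(correlationId);
            if (future != null) {
                future.complete(delivery.getBody());
            }
        };
        channel.basicConsume(replyQueue, true, onDelivery, consumerTag -> { });
    }
}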
Problem
When I try to do this in a web project deployed on Tomcat, Tomcat refuses to shut down. jstack and some debugging showed me that a thread is spawned to listen for the RPC response and is blocking Tomcat from shutting down gracefully. I guess this is because the thread is created at application level instead of request level and is not managed by Tomcat. When I set breakpoints in Servlet.destroy() or ServletContextListener.contextDestroyed(ServletContextEvent sce), they are not reached, so I see no way to clean things up manually.
Alternative
As an alternative, I could use a new reply queue (and a simple QueueingConsumer) for each web request. I've tested this; it works, and Tomcat shuts down as it should. But I'm wondering if this is the way to go. Can a RabbitMQ cluster deal with thousands (or even millions) of short-lived queues/consumers? I can imagine the queues aren't that big, but still: constantly broadcasting to all cluster nodes, the total memory footprint...
Question
So in short, is it wise to create a queue for each incoming web request, or how should I set up RabbitMQ with one queue and consumer so Tomcat can shut down gracefully?
I found a solution for my problem:
The Java client creates its own threads, but there is the possibility to pass your own ExecutorService when creating a new connection. Doing so in the ServletContextListener.contextInitialized() method, one can keep track of the ExecutorService and shut it down manually in the ServletContextListener.contextDestroyed() method:
executorService.shutdown();
executorService.awaitTermination(20, TimeUnit.SECONDS);
I used Executors.newCachedThreadPool(), as the threads have many short executions, and they get cleaned up after being idle for more than 60s.
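Put together, the listener might look roughly like this (a sketch; the host is illustrative, and newConnection(ExecutorService) is the hook mentioned above):

import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

public class RabbitLifecycleListener implements ServletContextListener {

    private ExecutorService executorService;
    private Connection connection;

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        try {
            executorService = Executors.newCachedThreadPool();
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost"); // illustrative
            // The client now runs its consumer threads on a pool we control.
            connection = factory.newConnection(executorService);
        } catch (Exception e) {
            throw new IllegalStateException("RabbitMQ connection failed", e);
        }
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        try {
            connection.close();
            executorService.shutdown();
            executorService.awaitTermination(20, TimeUnit.SECONDS);
        } catch (Exception ignored) {
            // Best effort during shutdown.
        }
    }
}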
This is the link to the RabbitMQ Google group thread (thanks to Michael Klishin for pointing me in the right direction).
I have a relatively simple Java service that fetches information from various SOAP web services, using Apache CXF 2.5.2 under the hood. The service launches 20 worker threads to churn through 1000-8000 requests every hour, and each request can make 2-5 web service calls depending on the nature of the request.
Setup
I am using connection pooling on the web service connections
The connection timeout is set to 2 seconds in order to tackle the volume of requests efficiently
All connections go out through an HTTP proxy
20 worker threads
A grunty 16-CPU box
The problem is that I am starting to see 'connect time out' errors in the logs, quite a large number of them, and it seems the application is also affecting the machine's network performance: curl from the command line takes >5 seconds just to establish a connection to the same web services. However, when I stop the service application, curl performance improves drastically to <5 ms.
How have other people tackled this situation using CXF? Did it work, or did they switch to a different library? If you were to start from scratch, how would you design for 'small payload, high frequency' transactions?
We once had a problem similar to yours, where requests took a very long time to complete. It is not a CXF issue; any web service stack will slow down under very frequent requests.
To solve this, we implemented a JMS message-driven bean (EJB). The flow was as follows: when users send a request to the web service, the request is put onto a JMS queue, so the response to the user comes back very quickly and the request is left to be processed in the background. The users can later check the state of their operations: still queued, processing, completed successfully, or failed for some reason.
If I had to design a high-frequency transaction application, I would definitely use JMS for it; a sketch follows below.
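A minimal sketch of that pattern as a message-driven bean (the queue lookup and payload handling are illustrative):

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

// Drains the request queue in the background, decoupled from the web tier.
@MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType",
                propertyValue = "javax.jms.Queue"),
        @ActivationConfigProperty(propertyName = "destinationLookup",
                propertyValue = "jms/SoapRequestQueue") // illustrative
})
public class SoapRequestProcessor implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            String payload = ((TextMessage) message).getText();
            // Call the slow SOAP backend here and persist the outcome so the
            // user can later poll for queued / processing / completed / failed.
            process(payload);
        } catch (JMSException e) {
            // Rethrowing lets the container redeliver or dead-letter the message.
            throw new IllegalStateException(e);
        }
    }

    private void process(String payload) {
        // Placeholder for the actual web service call.
    }
}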
Hope this helps.
I have a central load-balancing server and several application servers running on Apache Tomcat. The load-balancing server receives requests and forwards them to the application servers in round-robin fashion. If one of these application servers goes down, the load-balancing server should stop forwarding requests to it.
My current solution is to ping the application servers every few minutes and, if I don't receive a response, remove them from the list of available servers. Is there a better way to monitor the status of these servers? Should I ping more often, or should the application servers constantly inform the load-balancing server?
Execute a null transaction on it regularly. Pinging really isn't enough; it only exercises the TCP/IP stack, and I have seen operating systems in states where TCP/IP was up but no applications were, and sometimes not even the rest of the OS stack itself. Executing a transaction exercises everything. Include the database in the null transaction.
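For example, each application server could expose a null-transaction endpoint that the load balancer polls instead of pinging (a sketch; the JNDI name is illustrative):

import java.io.IOException;
import java.sql.Connection;
import java.sql.Statement;
import javax.annotation.Resource;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.sql.DataSource;

@WebServlet("/health")
public class NullTransactionServlet extends HttpServlet {

    @Resource(name = "jdbc/appDb") // illustrative JNDI name
    private DataSource dataSource;

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // Exercise the full stack, including the database, not just TCP/IP.
        try (Connection connection = dataSource.getConnection();
             Statement statement = connection.createStatement()) {
            statement.execute("SELECT 1");
            resp.setStatus(HttpServletResponse.SC_OK);
            resp.getWriter().write("OK");
        } catch (Exception e) {
            resp.setStatus(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
        }
    }
}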
First, ensure your server is protected against DDoS attacks. Then, based on your application's average connection time, tune the keep-alive timeout.
You should also read about the Apache prefork MPM; I think it will give you the best solution.