For a gRPC service, client-side load balancing is used.
Channel creation
ManagedChannelBuilder.forTarget("host1:port,host2:port,host3:port").nameResolverFactory(new CustomNameResolverProvider()).loadBalancerFactory(RoundRobinLoadBalancerFactory.getInstance()).usePlaintext(true).build();
Use this channel to create the stub.
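For illustration, a minimal sketch of the same setup in grpc-java (hedged: defaultLoadBalancingPolicy("round_robin") is the newer replacement for the loadBalancerFactory(...) call above, and GreeterGrpc stands in for whatever generated stub the application actually uses):

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class ChannelSetup {
    public static void main(String[] args) {
        // The multi-host target string is parsed by the question's CustomNameResolverProvider,
        // which would be registered with the channel builder (or NameResolverRegistry in newer versions).
        ManagedChannel channel = ManagedChannelBuilder
                .forTarget("host1:port,host2:port,host3:port")
                .defaultLoadBalancingPolicy("round_robin")   // newer equivalent of loadBalancerFactory(RoundRobin...)
                .usePlaintext()
                .build();

        // The stub is created once from this channel and reused for every call;
        // GreeterGrpc is a placeholder for the application's own generated service class.
        // GreeterGrpc.GreeterBlockingStub stub = GreeterGrpc.newBlockingStub(channel);
    }
}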
Problem
If one of the services [host1] goes down, will the stub handle this scenario and stop sending further requests to [host1]?
As per the documentation at https://grpc.io/blog/loadbalancing:
A thick client approach means the load balancing smarts are
implemented in the client. The client is responsible for keeping track
of available servers, their workload, and the algorithms used for
choosing servers. The client typically integrates libraries that
communicate with other infrastructures such as service discovery, name
resolution, quota management, etc.
So is it the responsibility of the ManagedChannel class to maintain the list of active servers, or does the application code need to maintain the active server list and create a new ManagedChannel instance each time with the updated list?
Test Result
As per the test, if one of the services goes down there is no impact on load balancing and all requests are processed correctly.
So can it be assumed that either the stub or the ManagedChannel class handles the active server list?
An answer with documentation would be highly appreciated.
Load Balancers generally handle nodes going down. Even when managed by an external service, nodes can crash abruptly and Load Balancers want to avoid those nodes. So all Load Balancer implementations for gRPC I'm aware of avoid failing calls when a backend is down.
Pick First (the default) iterates through the addresses until one works. Round Robin only round-robins over working connections. So what you're describing should work fine.
I will note that your approach does have one downfall: you can't change the servers while the process is running. Removing broken backends is one thing, but adding new working backends is another. If your load is ever too high, you may not be able to address the issue by adding more workers, because even if you do add them, your clients won't connect to them.
Related
We are running a setup in production where gRPC clients talk to servers via a proxy in between (image attached).
The client is written in Java and the server is written in Go. We are using round_robin as the load-balancing policy in the client. Despite this, we have observed some bizarre behaviour. When our proxy servers scale in, i.e. reduce from say 4 to 3, the resolver kicks in and the request load from our clients gets distributed equally across all of our proxies. But when the proxy servers scale out, i.e. increase from 4 to 8, the new proxy servers don't get any requests from the clients, which leads to a skewed distribution of request load on our proxy servers. Is there any configuration that we can do to avoid this?
We tried setting the networkaddress.cache.ttl property to 60 seconds in the JVM args, but even this didn't help.
You need to cycle the sticky gRPC connections using the keepalive and keepalive timeout configuration in the gRPC client.
Please have a look at this - gRPC connection cycling
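For reference, the client-side keepalive knobs mentioned above look roughly like this in grpc-java (a sketch; the target name and the intervals are made-up examples, and forcing periodic re-connection usually also involves a max-connection-age setting on the server or proxy side):

import java.util.concurrent.TimeUnit;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class KeepaliveChannel {
    public static ManagedChannel create() {
        return ManagedChannelBuilder
                .forTarget("dns:///my-proxy.example.com:443")  // hypothetical target
                .defaultLoadBalancingPolicy("round_robin")
                .keepAliveTime(30, TimeUnit.SECONDS)     // send a keepalive ping after 30s of inactivity
                .keepAliveTimeout(10, TimeUnit.SECONDS)  // drop the connection if the ping is not answered in 10s
                .keepAliveWithoutCalls(true)             // keep pinging even when no RPCs are active
                .build();
    }
}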
Both round_robin and pick_first perform name resolution only once. They are intended for thin, user-facing clients (Android, desktop) that have a relatively short lifetime, so sticking to a particular (set of) backend connection(s) is not a problem there.
If your client is a server app, then you should rather be using grpclb or the newer xDS: they automatically re-resolve available backends when needed. To enable them you need to add a runtime dependency in your client on grpc-grpclb or grpc-xds respectively.
grpclb does not need any additional configuration or setup, but has limited functionality. Each client process will have its own load-balancer+resolver instance. Backends are obtained via repeated DNS resolution by default.
xDS requires an external Envoy instance/service from which it obtains available backends.
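To make that concrete, a hedged sketch of what enabling these policies can look like in a grpc-java client (the artifact coordinates and service names are examples, and the exact setup varies by version):

// Build file (sketch): add a runtime dependency on io.grpc:grpc-grpclb or io.grpc:grpc-xds.
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class LbChannels {
    public static void main(String[] args) {
        // grpclb: backend addresses come from repeated DNS resolution of this (example) name.
        ManagedChannel grpclbChannel = ManagedChannelBuilder
                .forTarget("dns:///my-service.example.com:443")
                .defaultLoadBalancingPolicy("grpclb")
                .build();

        // xDS: the xds: scheme becomes available once grpc-xds is on the classpath;
        // the bootstrap file pointing at the xDS/Envoy control plane is configured separately.
        ManagedChannel xdsChannel = ManagedChannelBuilder
                .forTarget("xds:///my-service")
                .build();
    }
}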
I have a Java-based server managed by a Kubernetes cluster. It's a distributed environment where the number of instances is set to 4, to handle millions of requests per minute.
The issue that I am facing is that Kubernetes tries to balance the cluster and in the process kills a pod and moves it to another node, but there are pending HTTP GET and POST requests that get lost.
What Kubernetes feature or architectural solution would let me retry if a request is stuck or has failed?
UPDATE:
I have two Kubernetes Service configurations:
LoadBalancer (with AWS ELB): for external-facing traffic
ClusterIP: for the internal microservice-based architecture
Kubernetes gives you the means to gracefully handle pod terminations via SIGTERM and preStop hooks. There are several articles on this, e.g. Graceful shutdown of pods with Kubernetes. In your Java app, you should listen for SIGTERM and gracefully shutdown the server (most http frameworks have this "shutdown" functionality built in them).
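As an illustration, a minimal sketch of listening for SIGTERM from Java via a JVM shutdown hook (this uses the JDK's built-in HttpServer purely as an example; the 25-second drain window is an arbitrary choice that should stay below the pod's terminationGracePeriodSeconds):

import java.io.IOException;
import java.net.InetSocketAddress;
import com.sun.net.httpserver.HttpServer;

public class GracefulShutdown {
    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/", exchange -> {
            byte[] body = "ok".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();

        // SIGTERM (what Kubernetes sends on pod termination) triggers JVM shutdown hooks.
        // stop(25) stops accepting new connections and gives in-flight exchanges up to 25s to drain.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> server.stop(25)));
    }
}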
The issue that I am facing is that Kubernetes tries to balance the cluster and in the process kills a pod and moves it to another node
Now this sounds a little suspicious - in general K8s only evicts and reschedules pods on different nodes under specific circumstances, for example when a node is running out of resources to serve the pod. If your pods are frequently getting rescheduled, this is generally a sign that something else is happening, so you should probably determine the root cause (if you have resource limits set in your deployment spec make sure your service container isn't exceeding those - this is a common problem with JVM containers).
Finally, HTTP retries are inherently unsafe for non-idempotent requests (POST/PUT), so you can't just retry on any failed request without knowing the logical implications. In any case, retries generally happen on the client side, not server, so it's not a flag you can set in K8s to enable them.
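To make the idempotency point concrete, a hedged client-side sketch that retries GETs only (JDK 11+ java.net.http; the attempt count and backoff are arbitrary):

import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public final class IdempotentRetry {
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    // Retry only GETs: they are idempotent, so repeating them on a connection failure is safe.
    static HttpResponse<String> getWithRetry(URI uri, int attempts) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        IOException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
            } catch (IOException e) {
                last = e;                     // connection reset, pod gone, etc.
                Thread.sleep(200L * (i + 1)); // crude linear backoff
            }
        }
        throw last;
    }
}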
A service mesh solves the particular issue that you are facing.
There are various service meshes available. General features of a service mesh are:
Load balancing
Fine-grained traffic policies
Service discovery
Service monitoring
Tracing
Routing
Some service mesh implementations:
Istio
Envoy
Linkerd
Linkerd: https://linkerd.io/2/features/retries-and-timeouts/
I am writing a Java/Scala Akka proof of concept and currently I am fumbling with the actor concept in a cluster environment.
Specification
I have a specific situation where a system sends the same messages to multiple nodes. My job is to not drop any of those messages and pass only 1 message to a backend system. Like a unique filter with load balancing/fail-over capabilities.
Idea
I was thinking of using 2 "frontend" actors on 2 nodes; the system would send messages to a frontend router (let's say round-robin), which sends to the frontend actors, which in turn send to the backend.
The fallback solution would be an only-the-leader-sends-to-backend setup where all nodes get the same message and only the leader passes it forward.
Problem
The problem I am facing (see code) is that I want the router to use existing frontend actors as routees on the cluster. This fails in the sample code because the router looks for the routees by routees-path (config setting) only locally, doesn't find any and dies.
I haven't had success with the config where the router deploys routees on the cluster nodes either. It would always deploy them locally.
I have sample code here: http://ge.tt/2UHUqoQ/v/0?c. There are 2 entry points:
* TransformationSample.App2 - run two instances, with command-line param 2551 and 2552 respectively (the seed nodes)
* TransformationSample.App1 - run one instance with no command-line params
App1 is the one that tries to create a router and communicate with it, but the router terminates because it can't find the frontend routees locally. I have the issue pinned to the createRoutees method (line 178) of the akka.cluster.routing.ClusterRouteeProvider class: https://github.com/akka/akka/blob/releasing-2.1.0-RC1/akka-cluster/src/main/scala/akka/cluster/routing/ClusterRouterConfig.scala.
In closing
I am probably doing something wrong here, so please excuse my Scala (this is the first project I am writing in it).
The reason I want this router setup to work is that the next step of the proof of concept is to load balance the backend system with a similar setup, where the frontend actors would communicate with a (separate) backend cluster router that sends work round-robin to backend actors.
Is this over-engineered? We have to have fail-over for the front part and load balancing on the back part.
First, what kind of actor are you using? Scala and Akka actors are different from each other.
If you are using an Akka actor, try using the remote actor system, which is really good, especially if you have a DB installed.
Two questions about EC2 ELB:
First, how do I properly run JMeter tests? I've found the following: http://osdir.com/ml/jmeter-user.jakarta.apache.org/2010-04/msg00203.html, which basically says to set -Dsun.net.inetaddr.ttl=0 when starting JMeter (which is easy), and the second point it makes is that the routing is per IP, not per request. So aside from starting a farm of JMeter instances I don't see how to get around that. Any ideas are welcome, or possibly I'm mis-reading the explanation(?)
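For reference, that TTL setting controls the JVM's own DNS cache; roughly (a sketch using standard JDK properties):

// Command line, as the linked post suggests:  jmeter -Dsun.net.inetaddr.ttl=0
// Programmatic equivalent, set before any lookups happen:
import java.security.Security;

public class DisableDnsCache {
    public static void main(String[] args) {
        // 0 = do not cache successful lookups, so each new connection can see fresh ELB IPs.
        Security.setProperty("networkaddress.cache.ttl", "0");
    }
}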
Also, I have a web service that makes a server-side call to another web service in Java (both are behind ELBs), so I'm using HttpClient and its MultiThreadedHttpConnectionManager, where I provide a large-ish connections-per-host value in the connection manager. And I'm wondering if that will break the ELB's load-balancing behaviour, because the connections are cached (and also because the requests all originate from the same machine). I can switch to using a new HttpClient each time (kind of lame), but that doesn't get around the fact that all requests originate from a small number of hosts.
Backstory: I'm in the process of perf testing a service using ELB on EC2 and the traffic is not distributing evenly (most traffic to 1-2 nodes, almost no traffic to 1 node, no traffic at all to a 4th node). And so the issues above are the possible culprits I've identified.
I have had very similar problems. One thing is that the ELB does not scale well under burst load, so when you are trying to test it, it does not scale up immediately - it takes a lot of time for it to ramp up. Another drawback is the fact that it uses a CNAME for the DNS lookup. This alone is going to slow you down. There are more performance issues you can research.
My recommendation is to use HAProxy. You have much more control, and you will like the performance. I have been very happy with it. I use Heartbeat to set up a redundant server and I am good to go.
Also, if you plan on doing SSL with the ELB, you will suffer more, because I found the performance to be below par.
I hope that helps some. When it comes down to it, AWS has told me personally that load testing the ELB does not really work, and if you are planning on launching with a large amount of load, you need to tell them so they can scale you up ahead of time.
You don't say how many JMeter instances you're running, but in my experience it should be around 2x the number of AZs you're scaling across. Even then, you will probably see unbalanced loads - it is very unusual to see the load balanced exactly across your back-end fleet.
You can help (a bit) by running your JMeter instances in different regions.
Another factor is the duration of your test. ELBs do take some time to scale up - you can generally tell how many instances are running by doing an nslookup against the ELB name. Understand your scaling patterns and build tests around them. (So if it takes 20 minutes to add another instance to the ELB pool, include a 25-30 minute warm-up in your test.) You can also get AWS to "pre-warm" the ELB pool if necessary.
If your ELB pool size is sufficient for your test, and you can verify that the pool does not change during a test run, you can always try running your tests directly against the ELB IPs - i.e. manually balancing the traffic.
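For example, the nslookup check mentioned above can also be done from Java (the hostname is a made-up example; each returned address is one active ELB node):

import java.net.InetAddress;
import java.net.UnknownHostException;

public class ElbLookup {
    public static void main(String[] args) throws UnknownHostException {
        // Prints one line per IP the ELB name currently resolves to.
        for (InetAddress address : InetAddress.getAllByName("my-elb-123456.us-east-1.elb.amazonaws.com")) {
            System.out.println(address.getHostAddress());
        }
    }
}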
I'm not sure what you expect to happen with the 2nd tier of calls - if you're opening a connection and re-using it, there's obviously no way to have that traffic spread across instances without closing and re-opening the connection. Are these calls running on the same set of servers, or a different set? You could create an internal ELB and use that endpoint to connect to, but I'm not sure that would help in the scenario you've described.
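If the pooled connections are the concern, one hedged option with the Commons HttpClient 3.x manager mentioned in the question is to close idle connections periodically, so their replacements get re-balanced by the ELB (the pool size and idle timeout here are arbitrary examples):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;

public class PooledClient {
    public static HttpClient create() {
        MultiThreadedHttpConnectionManager manager = new MultiThreadedHttpConnectionManager();
        manager.getParams().setDefaultMaxConnectionsPerHost(20);  // the "connections per host" sizing from the question

        // Periodically drop connections idle for 30s+ so new ones get a fresh ELB routing decision.
        ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();
        cleaner.scheduleAtFixedRate(
                () -> manager.closeIdleConnections(TimeUnit.SECONDS.toMillis(30)),
                30, 30, TimeUnit.SECONDS);

        return new HttpClient(manager);
    }
}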
Given the following scenario, please suggest a way to implement memcache in my application.
Currently, I have 10 web servers, all running the same application, and a load balancer that decides which web server each request is sent to.
On each web server, I maintain a local cache, i.e. there is some class XYZ which controls the MySQL table xyz, and this class has an initialize method which warms up the local cache.
Now, suppose the web servers are X, Y, Z. The load balancer sends a request to X, and this request adds some values to the db and updates X's cache. Then the same request is sent by the load balancer to Y. But since server Y does not have the value in its cache, it hits the database.
So, given this scenario, how should I implement memcache in my application so that I can minimize db hits?
Should I have a separate memcache server, so that all 10 web servers get their cached data from that memcache server?
One workaround (not ideal, though) would be to implement sticky sessions on the load balancer so that requests from one user always go to the same server (for the duration of their session). This doesn't help much if a server dies or you need cached data shared between sessions (but it is easy and quick to do if your load balancer supports it).
Otherwise the better solution is to use something like memcached (or membase if you're feeling adventurous). Memcached can either be deployed on each of your servers or on separate servers (use multiple servers to avoid the problem of one server dying and taking your cache with it). Then, on each of your application servers, specify in your memcached client the connection details for all of the memcached servers (put them in the same order on each server and use a consistent hashing algorithm in the memcached client options to determine which server(s) a cache key will go to); a sketch of this follows.
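A sketch of that client-side configuration, using the spymemcached client (one Java option among several; the server addresses are examples):

import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.ConnectionFactoryBuilder.Locator;
import net.spy.memcached.DefaultHashAlgorithm;
import net.spy.memcached.MemcachedClient;

public class CacheClientFactory {
    public static MemcachedClient create() throws java.io.IOException {
        // Same server list, in the same order, on every web server; consistent (ketama) hashing
        // keeps a given key on the same memcached node no matter which web server asks for it.
        return new MemcachedClient(
                new ConnectionFactoryBuilder()
                        .setLocatorType(Locator.CONSISTENT)
                        .setHashAlg(DefaultHashAlgorithm.KETAMA_HASH)
                        .build(),
                AddrUtil.getAddresses("cache1.example.com:11211 cache2.example.com:11211"));
    }
}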
In short - you now have a working memcached set-up (as long as you use it sensibly inside your application).
There are plenty of memcached tutorials out there that can help with the finer points of doing all this but hopefully my post will give you some general direction.