Resolution of Server IPs in gRPC

Resolution of Server IPs in gRPC - java

We are running a setup on production where grpc clients are talking to servers via proxy in between (image attached)
The client is written in java and server is written in go. We are using the load balancing property as round_robin in the client. Despite this, we have observed some bizarre behaviour. When our proxy servers scale in i.e reduce from let's say 4 to 3, then resolver gets into action and the request load from our clients gets distributed equally to all of our proxies, but when the proxy servers scale out i.e increase from 4 to 8, then the new proxy servers don't get any requests from the clients which leads to a skewed distribution of request load on our proxy servers. Is there any configuration that we can do to avoid this?
We tried setting a property named networkaddress.cache.ttl to 60 seconds in the JVM ARGS but even this didn't help.

You need to cycle the sticky gRPC connections using the keepalive and keepalive timeout configuration in the gRPC client.
Please have a look at this - gRPC connection cycling

both round_robin and pick_first perform name resolution only once. They are intended for thin, user-facing clients (android, desktop) that have relatively short life-time, so sticking to a particular (set of) backend connection(s) is not a problem then.
If your client is a server app, then you should be rather be using grpclb or the newer xDS: they automatically re-resolve available backends when needed. To enable them you need to add runtime dependency in your client to grpc-grpclb or grpc-xds respectively.
grpclb does not need any additional configuration or setup, but has limited functionality. Each client process will have its own load-balancer+resolver instance. backends are obtained via repeated DNS resolution by default.
xDS requires an external envoy instance/service from which it obtains available backends.

Related

Why do we use port number localhost: 8080? Why don't we use a port number when using www.example.com?

When I use a Spring Boot app in local it uses, localhost:8080. When it is pushed to Pivotal Cloud Foundry, it has some route https://my-app.xyz-domain.com and we can access the URL without a port, what is happening behind the scene?
Please help me understand.

There is a default port number for each protocol which is used by the browser if none is specified. For https it is 443, for http 80 and for telnet 23.
On Unix and similar systems like Linux, those are often not available to a developer so other ports are used but then they have to be specified. 8080 is often available and looks like 80.

On CloudFoundry, your application is actually still running on localhost:8080. The reason that you can access your application through https://my-app.xyz-domain.com is that the platform handles routing the traffic from that URL to your application.
The way this works is as follows:
You deploy your application. It's run by the foundation in a container. The container is assigned a port, which it provides to the application through the $PORT env variable (this can technically change, but it's been 8080 for a long time). Your application then listens on localhost:$PORT or effectively localhost:8080.
The platform also runs Envoy in your container. It's configured to listen for incoming HTTP and HTTPS requests, and it will proxy that traffic to your application on localhost:$PORT.
Using the cf cli, you map a route to your application. This is a logical rule that tells the platform what external traffic should go to your application. A route can consist of a hostname, domain, and/or path. For example, my-cool-app.example.com or my-cool-app.example.com/foo. For a route to work, the domain must have its DNS directed to the platform.
When an end-user accesses the route that you mapped, the DNS resolves to the platform and the traffic is directed to the external load balancers (sometimes TCP/layer4, sometimes HTTPS/layer7) that sit in front of the platform. These proxies do not have knowledge of CF, they just proxy incoming traffic.
Traffic from the external load balancers is spread across the set of the platform Gorouters. The Gorouter is a second layer of proxies, but these proxies have knowledge of the platform, specifically, all of the routes that have been mapped on the platform and where those applications actually live.
When a request comes to Gorouter, it will recognize the route like my-cool-app.example.com and look up the location of the container where that app is running. Traffic from Gorouter is then proxied to the Envoy which is running in the app container. This ties into step two as the Envoy will route that traffic to your application.
All in total, incoming requests are routed like this:
Client/Browser -> External LBs -> Gorouters -> Envoy -> Application

First, you should change the port to 80 or 443, because HTTP corresponds to 80, and HTTPS corresponds to 443. Then, you should set the domain name to resolve to the current host, so that you can access the current application through the domain name. In addition, if you want to set the local domain name, then The hosts file should be modified.

GRPC client side load balancing : One of node goes down

For Grpc service client side load balancing is used.
Channel creation
ManageChannelBuilder.forTarget("host1:port,host2:port,host3:port").nameResolverFactory(new CustomNameResolverProvider()).loadBalancerFactory(RoundRobinBalancerFactory.getInstance()).usePlaintText(true).build();
Use this channel to create stub.
Problem
If one of the service [host1] goes down then whether stub will handle this scenario and not send any further request to service [host1] ?
As per documentation at https://grpc.io/blog/loadbalancing
A thick client approach means the load balancing smarts are
implemented in the client. The client is responsible for keeping track
of available servers, their workload, and the algorithms used for
choosing servers. The client typically integrates libraries that
communicate with other infrastructures such as service discovery, name
resolution, quota management, etc.
So is it the responsibility of ManagedChannel class to maintain list of active server or application code needs to maintain list of active server list and create instance of ManagedChannel every time with active server list ?
Test Result
As per test if one of the service goes down there is no impact on load balancing and all request are processed correctly.
So can it be assumed that either stub or ManagedChannel class handle active server list ?
Answer with documentation will be highly appreciated.

Load Balancers generally handle nodes going down. Even when managed by an external service, nodes can crash abruptly and Load Balancers want to avoid those nodes. So all Load Balancer implementations for gRPC I'm aware of avoid failing calls when a backend is down.
Pick First (the default), iterates through the addresses until one works. Round Robin only round robins over working connections. So what you're describing should work fine.
I will note that your approach does have one downfall: you can't change the servers while the process is running. Removing broken backends in one thing, but adding new working backends is another. If your load is ever too high, you may not be able to address the issue by adding more workers because even if you add more workers your clients won't connect to them.

What cometd configurations to use to reduce 402 error occurrences?

We have implemented a Java servlet running on JBoss container that uses CometD long-polling. This has been implemented in a few organizations without any issue, but in a recent implementation there are functional issues which appear to be related to the network setup of this organization.
Specifically, around 5% of the time, the connect requests are getting back 402 errors:
{"id":"39","error":"402::Unknown client","successful":false,"advice":{"interval":0,"reconnect":"handshake"},"channel":"/meta/connect"}
Getting this organization to address network performance is a significant challenge, so we are looking at a way to tune the implementation to reduce these issues.
Which cometd configuration parameters can be updated to improve this?
maxinterval, timeout, multiSessionInverval, etc?
Thank you!

The "402 unknown client" error is due to the fact that the server does not see /meta/connect heartbeat messages from the client and expires the correspondent session on the server. This is typically due to network issues.
Once the client network is restored, the client sends a /meta/connect heartbeat message but the server doesn't have the correspondent session, hence the 402.
The parameter that controls the server side expiration of sessions is maxInterval, documented here: https://docs.cometd.org/current/reference/#_java_server.
By default is 10 seconds. If you increase it, it means you are retaining in the server memory sessions for a longer time, so you need to take that into account.

Apache Http Client and Load Balancers

After spending a few hours reading the Http Client documentation and source code I have decided that I should definitely ask for help here.
I have a load balancer server using a round-robin algorithm somewhat like this
+---> RESTServer1
client --> load balancer +---> RESTServer2
+---> RESTServer3
Our client is using HttpClient to direct requests to our load balancer server, which in turn round-robins the requests to the corresponding RESTServer.
Now, Apache HttpClient creates, by default, a pool of connections (2 per route by default). This connections are by default persistent connections since I am using Http v1.1 and my servers are emitting Connection: Keep-Alive headers.
So, the problems is that since HttpClient creates this persistent connections, then those connections are no longer subject to round-robing algorithm at the balancer level. They always hit the same server every time.
This creates two problems:
I can see that sometimes one or more of the balanced servers are overloaded with traffic, whereas one ore more of the other servers are idle; and
even if I take one of my REST servers out of the balancer, it stills receives requests while the persistent connections are alive.
Definitely this is not the intended behavior.
I suppose I could force a Connection: close header in my responses, or I could run HttpClient without a connection pool or with a NoConnectionReuseStrategy. But the documentation for HttpClient states that the idea behind the use of a pool is to improve performance by avoiding having to open a socket every time and doing all the TPC handshaking and related stuff. So, I have to conclude that the use of a connection pool is beneficial to the performance of my applications.
So my question here, is there a way to use persistent connections with a load-balancer in the way or am I forced to use non-persistent connections for this scenario?
I want the performance that comes with reusing connections, but I want them properly load-balanced. Any thoughts on how I can configure this scenario with Apache Http Client if at all possible?

Your question is perhaps more related to your load balancer configuration and the style of load balancing. There are several ways:
HTTP Redirection
LB acts as a reverse proxy
Pure packet forwarding
In scenarios 1 and 3 you do not have a chance with persistent connections. If your load balancer acts like a reverse proxy, there might be a way to achieve persistent connections with balancing. "Dumb" balancers, like SMTP or LDAP selects the target per TCP connection, not on a request basis.
For example the Apache HTTPd server with the balancer module (see http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html) can dispatch every request (even on persistent connections) to a different server.
Also check, that you do not receive a balancer cookie which might be session persistent so that the cause is not the persistent connection but a balancer cookie.
HTH, Mark

+1 to #mp911de answer
One can also make the scenarios 1 and 3 work reasonably well by limiting the total time to live of persistent connections to some short period time, say 15 seconds. This way connections would live long enough to get re-used during periods of activity and short enough to go away during periods of relative inactivity.

How to implement Memcache in my application

Given the following scenario, please suggest me a way to implement memcache in my application.
Currently, I have 10 webservers in which the same application is being run and a load balancer to decide upon to which web server the request be sent.
On each webserver, I am maintaining a local cache i.e. there is some class XYZ which controls the MySQL table xyz and this class has initialize method
which will warm up the local cache.
Now, suppose the webservers are X,Y,Z. The load balancer sends a request to X and this request adds some values to db & updates the cache. Again the same request was sent by the load balancer to Y. But since server Y doesnot have the value in the cache, it hits the database.
So, given this scenario, how should I implement memcache in my application so that I could minimize db hits.
Should I have a separate memcache server and all the other 10 webservers will get the cached data from this memcacher server?

One work around (not ideal though), would be to implement sticky session on the load balancer so that request from one user always go through to the same server (for the duration of their session). This doesn't help much if a server dies or you need cached data shared between sessions (but it is easy and quick to do if your load balancer supports it).
Otherwise the better solution is to use something like memcached (or membase if your feeling adventurous). Memcached can either be deployed on each or your servers or on separate servers (use multiple servers to avoid the problem of one servers dying and taking your cache with it). Then on each of your application servers specify in your memcached client connection details for all of the memcached servers (put them in the same order on each server and use a consistent hashing algorithm in the memcached client options to determine on which server(s) the cache key will go).
In short - you now have a working memcached set-up (as long as you use it sensibly inside your application)
There are plenty of memcached tutorials out there that can help with the finer points of doing all this but hopefully my post will give you some general direction.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.