My understanding is that all implementations of ClientConnectionManager persist connections based on route. This results in essentially no persistent connections if a proxy is involved. For example, if the HttpClient needs to visit 1000 different domains via an HTTP proxy with a fixed IP, it has to establish at least 1000 connections to the proxy instead of creating 1 persistent connection to the proxy and reusing it for all 1000 requests.
I'm simulating multiple users visiting thousands of domains (fake domains, all DNS-resolved to a couple of IPs; the resolving happens after the proxy, so it has nothing to do with HttpClient). The above behavior quickly uses up all available ports on localhost as I increase the number of users and domains, and Address Bind errors occur as a result.
Is there a way to make HttpClient persist connections on a per-proxy basis? I.e., an HttpClient would only maintain a specified number of connections to a given proxy.
After intensive research, it seems that Apache HttpClient doesn't support this behavior out of the box. I had to modify the HttpClient/HttpCore source in order to get this feature, i.e., maintain persistent connections based only on the local address and the first proxy address.
The classes I modified are:
org.apache.http.conn.routing.HttpRoute and
org.apache.http.conn.routing.BasicRouteDirector.
Basically I changed the hashCode and equals methods in HttpRoute (which is used as the key into the hashtable for persistent connection lookup), so the lookup doesn't consider the target address when a proxy is involved.
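To give an idea, the change boils down to something like the sketch below. This is only an illustration: originalEquals, originalHashCode and equalOrBothNull are placeholders for the existing HttpRoute logic, and the exact getters may differ between versions.

    // Sketch only: modified methods inside org.apache.http.conn.routing.HttpRoute.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof HttpRoute)) return false;
        HttpRoute that = (HttpRoute) obj;
        if (this.getProxyHost() != null && that.getProxyHost() != null) {
            // Proxied routes: key only on local address + first proxy hop, ignore the target host.
            return equalOrBothNull(this.getLocalAddress(), that.getLocalAddress())
                && this.getProxyHost().equals(that.getProxyHost());
        }
        return originalEquals(that);   // direct routes keep the original comparison
    }

    @Override
    public int hashCode() {
        if (this.getProxyHost() != null) {
            int hash = this.getProxyHost().hashCode();
            if (this.getLocalAddress() != null) {
                hash = 31 * hash + this.getLocalAddress().hashCode();
            }
            return hash;
        }
        return originalHashCode();     // direct routes keep the original hash
    }

    private static boolean equalOrBothNull(Object a, Object b) {
        return a == null ? b == null : a.equals(b);
    }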
Initial test results of the above modification show about a 100x improvement in request throughput in my scenario. So far it works fine for me.
Kevin
We are running a setup in production where gRPC clients talk to servers via a proxy in between (image attached).
The client is written in Java and the server is written in Go. We are using round_robin as the load-balancing policy in the client. Despite this, we have observed some bizarre behaviour. When our proxy servers scale in, i.e. reduce from, say, 4 to 3, the resolver gets into action and the request load from our clients gets distributed equally across all of our proxies. But when the proxy servers scale out, i.e. increase from 4 to 8, the new proxy servers don't get any requests from the clients, which leads to a skewed distribution of request load across our proxy servers. Is there any configuration we can apply to avoid this?
We tried setting the networkaddress.cache.ttl property to 60 seconds in the JVM args, but even this didn't help.
You need to cycle the sticky gRPC connections using the keepalive and keepalive timeout configuration in the gRPC client.
Please have a look at this - gRPC connection cycling
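Roughly, the client-side configuration looks like this. It is only a sketch: the target and the timeout values are illustrative, and the exact builder methods depend on your grpc-java version.

    import java.util.concurrent.TimeUnit;
    import io.grpc.ManagedChannel;
    import io.grpc.ManagedChannelBuilder;

    public class ChannelFactory {
        // Sketch: client-side keepalive settings (target and values are illustrative).
        public static ManagedChannel create() {
            return ManagedChannelBuilder
                    .forTarget("dns:///my-proxy.example.com:443")  // hypothetical proxy endpoint
                    .defaultLoadBalancingPolicy("round_robin")
                    .keepAliveTime(30, TimeUnit.SECONDS)     // send an HTTP/2 PING after 30s of inactivity
                    .keepAliveTimeout(10, TimeUnit.SECONDS)  // close the connection if the PING is not acked within 10s
                    .idleTimeout(60, TimeUnit.SECONDS)       // tear the transport down after 60s without RPCs
                    .build();
        }
    }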
Both round_robin and pick_first perform name resolution only once. They are intended for thin, user-facing clients (Android, desktop) that have relatively short lifetimes, so sticking to a particular (set of) backend connection(s) is not a problem there.
If your client is a server app, then you should rather be using grpclb or the newer xDS: they automatically re-resolve available backends when needed. To enable them you need to add a runtime dependency on grpc-grpclb or grpc-xds, respectively, to your client.
grpclb does not need any additional configuration or setup, but it has limited functionality. Each client process will have its own load-balancer + resolver instance. Backends are obtained via repeated DNS resolution by default.
xDS requires an external Envoy instance/service from which it obtains the available backends.
I'd like to know if HttpClient is thread-safe, i.e. can it be used by 2 threads without problems?
Reading the class code, it's not clear to me, as some fields are thread-safe and some are not.
There have been questions that went unanswered:
Is Jetty HttpClient(9.3.8.v20160314) thread-safe and can be used repeatedly?
And some questions seem to say it is:
How to isolate Jetty HttpClient for multiple users?
The design of Jetty's HttpClient is for it to be treated the same as a Web Browser with multiple tabs open.
It is, by design, multi-threaded.
You are encouraged to only have 1 HttpClient instance started and use it repeatedly for all requests you want to make to as many servers as you want.
The Request object you create for each request is unique for that request only and cannot be reused.
There is also an active ConnectionPool maintained for each destination so that a minimum number of open connections is kept, which is especially useful if you are talking to a secure SSL/TLS server (as the initial handshake to establish the connection is often the slowest part of that request/response exchange).
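As a rough sketch of that usage pattern (Jetty 9.x API; the URLs are placeholders):

    import org.eclipse.jetty.client.HttpClient;
    import org.eclipse.jetty.client.api.ContentResponse;

    public class SharedClientExample {
        public static void main(String[] args) throws Exception {
            // One shared, thread-safe HttpClient for the whole application.
            HttpClient httpClient = new HttpClient();
            httpClient.start();                       // start once, at application startup
            try {
                // Requests are created per call and are never reused; the client itself is shared.
                ContentResponse r1 = httpClient.GET("http://example.com/");
                ContentResponse r2 = httpClient.newRequest("http://example.org/api")
                        .method("GET")
                        .send();
                System.out.println(r1.getStatus() + " " + r2.getStatus());
            } finally {
                httpClient.stop();                    // stop once, at application shutdown
            }
        }
    }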
We have a Spring Boot (with Zuul) app using the default embedded Tomcat (I think). It has many clients implemented with different technologies and languages. And we have a problem with too many ports in TIME_WAIT: i.e. too many socket connections are opened/closed relative to the expected request behavior, which should keep connections alive most of the time.
By retrieving the HttpRequest object in the deployed API, I can get information from the request headers. This way I can track the HTTP protocol used (HTTP/1.1) and header parameters such as keep-alive (which, if present, is redundant with the use of HTTP/1.1).
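For reference, the header inspection I mean looks roughly like this (a simplified sketch; the controller and endpoint names are illustrative):

    import javax.servlet.http.HttpServletRequest;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class ConnectionInfoController {

        // Sketch: log the protocol and Connection header seen on each request.
        @GetMapping("/ping")
        public String ping(HttpServletRequest request) {
            String protocol = request.getProtocol();              // e.g. "HTTP/1.1"
            String connection = request.getHeader("Connection");  // e.g. "keep-alive", may be null
            System.out.println("protocol=" + protocol + ", connection=" + connection);
            return "pong";
        }
    }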
=> I would like to track opened and closed socket connections, but I don't see how.
Intermediate information would be better than nothing.
Note: I found some tutorials on a similar topic for spring-websocket, but we don't use it.
After spending a few hours reading the HttpClient documentation and source code, I have decided that I should definitely ask for help here.
I have a load balancer server using a round-robin algorithm, somewhat like this:
+---> RESTServer1
client --> load balancer +---> RESTServer2
+---> RESTServer3
Our client is using HttpClient to direct requests to our load balancer server, which in turn round-robins the requests to the corresponding RESTServer.
Now, Apache HttpClient creates, by default, a pool of connections (2 per route by default). These connections are persistent by default, since I am using HTTP/1.1 and my servers are emitting Connection: Keep-Alive headers.
So the problem is that since HttpClient creates these persistent connections, those connections are no longer subject to the round-robin algorithm at the balancer level. They always hit the same server every time.
This creates two problems:
I can see that sometimes one or more of the balanced servers are overloaded with traffic, whereas one or more of the other servers are idle; and
even if I take one of my REST servers out of the balancer, it still receives requests while the persistent connections are alive.
Definitely this is not the intended behavior.
I suppose I could force a Connection: close header in my responses, or I could run HttpClient without a connection pool or with a NoConnectionReuseStrategy. But the documentation for HttpClient states that the idea behind the pool is to improve performance by avoiding having to open a socket every time and do all the TCP handshaking and related work. So I have to conclude that the use of a connection pool is beneficial to the performance of my applications.
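(For reference, disabling reuse would look roughly like the sketch below; this is the approach I'd rather avoid because of the per-request connection cost.)

    import org.apache.http.impl.NoConnectionReuseStrategy;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    public class NonPersistentClient {
        // Sketch: an HttpClient 4.x instance that never reuses connections,
        // so every request opens (and closes) its own socket.
        public static CloseableHttpClient create() {
            return HttpClients.custom()
                    .setConnectionReuseStrategy(NoConnectionReuseStrategy.INSTANCE)
                    .build();
        }
    }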
So my question here is: is there a way to use persistent connections with a load balancer in this way, or am I forced to use non-persistent connections for this scenario?
I want the performance that comes with reusing connections, but I want them properly load-balanced. Any thoughts on how I can configure this scenario with Apache HttpClient, if at all possible?
Your question is perhaps more related to your load balancer configuration and the style of load balancing. There are several ways:
HTTP Redirection
LB acts as a reverse proxy
Pure packet forwarding
In scenarios 1 and 3 you do not have a chance with persistent connections. If your load balancer acts as a reverse proxy, there might be a way to achieve persistent connections with balancing. "Dumb" balancers, such as those for SMTP or LDAP, select the target per TCP connection, not on a per-request basis.
For example the Apache HTTPd server with the balancer module (see http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html) can dispatch every request (even on persistent connections) to a different server.
Also check that you do not receive a balancer cookie that might be session-persistent, so that the cause is not the persistent connection but a balancer cookie.
HTH, Mark
+1 to mp911de's answer
One can also make scenarios 1 and 3 work reasonably well by limiting the total time to live of persistent connections to some short period of time, say 15 seconds. This way connections live long enough to get re-used during periods of activity and are short-lived enough to go away during periods of relative inactivity.
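With HttpClient 4.4+ that limit can be set when building the client, roughly like this (the 15-second value is just an example):

    import java.util.concurrent.TimeUnit;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    public class ShortLivedPoolClient {
        // Sketch: a pooled client whose connections are discarded after 15 seconds,
        // regardless of keep-alive, so the balancer periodically sees fresh connections.
        public static CloseableHttpClient create() {
            return HttpClients.custom()
                    .setConnectionTimeToLive(15, TimeUnit.SECONDS)
                    .build();
        }
    }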
I am in the process of building a client-server application and I would really like some advice on how to design the server-database connection part.
Let's say the basic idea is the following:
Client authenticates himself on the server.
Client sends a request to server.
Server stores client's request to the local database.
In terms of Java Objects we have
Client Object
Server Object
Database Object
So when a client connects to the server, a session is created between them through which all the data is exchanged. Now what bothers me is whether I should create a database object/connection for each client session or whether I should create one database object that handles all requests.
Thus the two concepts are
Create one database object that handles all client requests
For each client-server session create a database object that is used exclusively for the client.
Going with option 1, I guess that all methods would have to become synchronized in order to avoid one client thread overwriting the variables of another. However, making them synchronized will be time-consuming in the case of many concurrent requests, as each request will be queued until the running one is completed.
Going with option 2 seems a more appropriate solution, but creating a database object for every client-server session is memory-consuming, and creating a database connection for each client could again lead to problems when the number of concurrently connected users is large.
These are just my thoughts, so please add any comments that may help with the decision.
Thank you
Option 3: use a connection pool. Every time you want to connect to the database, you get a connection from the pool. When you're done with it, you close the connection to give it back to the pool.
That way, you can
have several clients accessing the database concurrently (your option 1 doesn't allow that)
have a reasonable number of connections opened and avoid bringing the database to its knees or run out of available connections (your option 2 doesn't allow that)
avoid opening new database connections all the time (your option 2 doesn't allow that). Opening a connection is a costly operation.
Basically all server apps use this strategy. All Java EE servers come with a connection pool. You can also use it in Java SE applications, by using a pool as a library (HikariCP, Tomcat connection pool, etc.)
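A minimal sketch of that with HikariCP (the JDBC URL, credentials and table are placeholders):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    public class PooledDatabase {
        private final HikariDataSource dataSource;

        public PooledDatabase() {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl("jdbc:postgresql://localhost:5432/mydb"); // placeholder URL
            config.setUsername("app");                                  // placeholder credentials
            config.setPassword("secret");
            config.setMaximumPoolSize(10); // upper bound on concurrent DB connections
            this.dataSource = new HikariDataSource(config);
        }

        public void saveRequest(String clientId, String payload) throws Exception {
            // Borrow a connection from the pool; closing it returns it to the pool.
            try (Connection conn = dataSource.getConnection();
                 PreparedStatement ps = conn.prepareStatement(
                         "INSERT INTO requests (client_id, payload) VALUES (?, ?)")) {
                ps.setString(1, clientId);
                ps.setString(2, payload);
                ps.executeUpdate();
            }
        }
    }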
I would suggest a third option: database connection pooling. This way you create a specified number of connections and hand out the first free connection as soon as one becomes available. This gives you the best of both worlds: there will almost always be free connections available quickly, and you keep the number of connections to the database at a reasonable level. There are plenty of out-of-the-box Java connection pooling solutions, so have a look online.
Just use connection pooling and go with option 2. There are quite a few - C3P0, BoneCP, DBCP. I prefer BoneCP.
Neither option is a good solution.
Problem with Option 1:
You already stated the problems with synchronization when there are multiple threads. But apart from that there are many other problems, such as transaction management (when are you going to commit your connection?) and security (all clients can see uncommitted values), just to state a few.
Problem with Option 2:
Two of the biggest problems with this are:
It takes a lot of time to create a new connection each and every time, so performance will become an issue.
Database connections are extremely expensive resources and should be used in limited numbers. If you start creating DB connections for every client you will soon run out of them, even though most of those connections would not be actively used. You will also see your application's performance drop.
The Connection Pooling Option
That is why almost all client-server applications go with the connection pooling solution. You have a set of connections in the pool which are obtained and released appropriately. Almost all Java frameworks have sophisticated connection pooling solutions.
If you are not using any JDBC framework (most use Spring JDBC/Hibernate), read the following article:
http://docs.oracle.com/javase/jndi/tutorial/ldap/connect/pool.html
If you are using any of the popular Java Frameworks like Spring, I would suggest you use Connection Pooling provided by the framework.