Apache HttpClient connection configuration - java

I am trying to set up an HttpClient through the HttpClientBuilder. I also had a look at the HttpClientConnectionManager, and here the confusion started.
On the ConnectionManager, or more exactly the PoolingHttpClientConnectionManager, there are methods to:
close expired connections
close idle connections
When is a connection considered expired?
When is it idle?
What happens when a connection from the pool is closed? Is it ensured that connections are recreated when needed?

HTTP is based on TCP, which ensures that packets are sent and received in the correct order and requests retransmissions if packets get lost along the way. A TCP connection is started with a TCP handshake consisting of SYN, SYN-ACK and ACK messages, and it is ended with a FIN, FIN-ACK and ACK sequence (see the TCP state diagram on Wikipedia).
Although HTTP is a request-response protocol, opening and closing connections is quite costly, and so HTTP/1.1 allows reusing existing connections. With the header Connection: keep-alive a client (e.g. a browser) asks the server to keep the connection open. A server can literally have thousands and thousands of open connections at the same time. In order to avoid draining the server's resources, connections are usually time-limited: via socket timeouts, idle connections or connections with certain connection issues (broken internet access, ...) are automatically closed by the server after some predefined time.
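As an illustration, here is a hypothetical keep-alive exchange; the Keep-Alive response header and its timeout/max values are server-dependent and not guaranteed to be present:

GET /index.html HTTP/1.1
Host: example.com
Connection: keep-alive

HTTP/1.1 200 OK
Content-Type: text/html
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive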
Plenty of HTTP implementations, such as Apache's HTTP client 4.4 and beyond, check the status of a connection only when they are about to use it.
The handling of stale connections was changed in version 4.4. Previously, the code would check every connection by default before re-using it. The code now only checks the connection if the elapsed time since the last use of the connection exceeds the timeout that has been set. The default timeout is set to 2000ms (Source)
If a connection has not been used for some time, the client may therefore not have read the server's FIN and may still consider the connection open when it was actually closed by the server some time ago. Such a connection is expired and usually called half-closed; it may therefore be collected by the pool.
Note that if you send requests including a Connection: close HTTP header, the connection should be closed right after the client has received the response.
The state of open connections can be checked via netstat, which should be present on most modern operating systems. I recently had to check one of our HTTP clients, which was managed through a third-party library that did not propagate the Connection: Close header properly and therefore led to plenty of half-closed connections.
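For example, on a Unix-like system something along these lines lists sockets sitting in CLOSE_WAIT, i.e. the peer has sent its FIN but the local side has not yet closed (exact flags and state names vary by platform):

netstat -an | grep CLOSE_WAIT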

According to: https://hc.apache.org/httpcomponents-client-4.5.x/current/tutorial/html/connmgmt.html#d5e418
HttpClient tries to mitigate the problem by testing whether the
connection is 'stale', that is no longer valid because it was closed
on the server side, prior to using the connection for executing an
HTTP request. The stale connection check is not 100% reliable. The
only feasible solution that does not involve a one thread per socket
model for idle connections is a dedicated monitor thread used to evict
connections that are considered expired due to a long period of
inactivity. The monitor thread can periodically call
ClientConnectionManager#closeExpiredConnections() method to close all
expired connections and evict closed connections from the pool. It can also optionally call ClientConnectionManager#closeIdleConnections() method to close all connections that have been idle over a given period of time.
The difference between expired and idle is that an expired connection has been closed on the server side, while an idle connection is not necessarily closed on the server side; it has merely been unused for a period of time. When a pooled connection is closed, it is evicted from the pool; its slot becomes free again, and the pool creates a new connection on demand.
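For illustration, here is a minimal sketch (assuming the 4.3+ API; the intervals are arbitrary) that drives both eviction calls from a scheduled task:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
ScheduledExecutorService evictor = Executors.newSingleThreadScheduledExecutor();
evictor.scheduleAtFixedRate(() -> {
    cm.closeExpiredConnections();                  // evict expired (half-closed) connections
    cm.closeIdleConnections(30, TimeUnit.SECONDS); // evict connections idle for over 30 s
}, 5, 5, TimeUnit.SECONDS);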

Related

Disconnecting a single client disconnects many other clients

I’m testing a Diffusion solution in our pre-production environment. The solution gives anonymous clients 10 minutes of free access before they have to authenticate or be disconnected. This works fine in development and early testing, but in pre-production, when one client is disconnected we see many simultaneous disconnections of other clients without cause. With the logging set to FINEST, the log file says:
2016-03-21 11:57:36.557|DEBUG|Diffusion: InboundThreadPool Thread_4||NIOBufferedChannel#52e2a219[connected local=/10.0.4.1:8080 remote=/10.0.1.99:58673] : Closed(UNEXPECTED_ERROR) Unexpected error EOF|com.pushtechnology.diffusion.io.message.MessageChannelException
2016-03-21 11:57:36.558|DEBUG|Diffusion: InboundThreadPool Thread_4||Java Client 50328FF242799CD4-000000000000015A AWAITING_RECONNECTION#10.0.1.99: State changed from CONNECTED to AWAITING_RECONNECTION.|com.pushtechnology.diffusion.clients.impl.ClientImpl
2016-03-21 11:57:36.558|DEBUG|Diffusion: InboundThreadPool Thread_4||Java Client 50328FF242799CD4-000000000000015A AWAITING_RECONNECTION#10.0.1.99: CONNECTION_LOST keeping alive for 60000 ms.|com.pushtechnology.diffusion.clients.impl.ClientImpl
The affected clients are always browsers, not smartphones. Often older browsers such as IE9.
I'm guessing that your pre-production environment has a load balancer which is set to use connection pooling. Versions of IE prior to v10 did not support WebSockets, so they'll be using XHR long polling. Your smartphone clients will be using WebSockets, so they will be unaffected.
The manual has this to say in the section "Considerations when using load balancers":
Do not use connection pooling for connections between the load balancer and the Diffusion server. If multiple client connections are multiplexed through a single server-side connection, this can cause client connections to be prematurely closed.
In Diffusion, a client is associated with a single TCP/HTTP connection for the lifetime of that connection. If a Diffusion server closes a client, the connection is also closed. Diffusion makes no distinction between a single client connection and a multiplexed connection, so when a client sharing a multiplexed connection closes, the connection between the load balancer and Diffusion is closed, and subsequently all of the client-side connections multiplexed through that server-side connection are closed.
To illustrate the problem: when a Diffusion server has a direct connection to each member of its audience Alice, Bob and Charlie, closing Bob's connection is straightforward.
When a connection-pooling middle box (a proxy or load balancer) enters the mix, closing Bob's connection results in disconnection for Alice and Charlie as well.
So, whereas connection pooling is a good idea for regular HTTP servers, it is problematic for Diffusion servers serving an audience of XHR polling clients when they need to disconnect individual clients.

Apache HttpClient 4.3 - setting connection idle timeout

What's the shortest way to configure a connection idle timeout on Apache HttpClient version 4.3?
I've looked in the documentation and couldn't find anything. My goal is to reduce open connections to a minimum after the server's peak.
For example, in Jetty Client 8.x you can call httpClient.setIdleTimeout: http://download.eclipse.org/jetty/stable-8/apidocs/org/eclipse/jetty/client/HttpClient.html#setIdleTimeout(long)
The timeout is set in the RequestConfig, so you could set the default when the HttpClientBuilder is called.
For example, assuming your timeout variable is in seconds, you could create your custom RequestConfig like this:
RequestConfig config = RequestConfig.custom()
        .setSocketTimeout(timeout * 1000)   // socket read timeout, in milliseconds
        .setConnectTimeout(timeout * 1000)  // connection establishment timeout, in milliseconds
        .build();
You could then build your HttpClient setting the default RequestConfig like this:
CloseableHttpClient httpClient = HttpClients.custom()
        .setDefaultRequestConfig(config)
        .build();
You can't set an idle connection timeout in the config for Apache HTTP Client. The reason is that there is a performance overhead in doing so.
The documentation clearly states why and gives an example of an idle connection monitor implementation you can copy. Essentially this is another thread that you run to periodically call closeIdleConnections on the HttpClientConnectionManager; see the sketch after the quoted passage below.
http://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html
One of the major shortcomings of the classic blocking I/O model is that the network socket can react to I/O events only when blocked in an I/O operation. When a connection is released back to the manager, it can be kept alive however it is unable to monitor the status of the socket and react to any I/O events. If the connection gets closed on the server side, the client side connection is unable to detect the change in the connection state (and react appropriately by closing the socket on its end).
HttpClient tries to mitigate the problem by testing whether the connection is 'stale', that is no longer valid because it was closed on the server side, prior to using the connection for executing an HTTP request. The stale connection check is not 100% reliable and adds 10 to 30 ms overhead to each request execution. The only feasible solution that does not involve a one thread per socket model for idle connections is a dedicated monitor thread used to evict connections that are considered expired due to a long period of inactivity. The monitor thread can periodically call ClientConnectionManager#closeExpiredConnections() method to close all expired connections and evict closed connections from the pool. It can also optionally call ClientConnectionManager#closeIdleConnections() method to close all connections that have been idle over a given period of time.
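For reference, a minimal sketch of such a monitor thread, closely modeled on the example in the tutorial linked above (the class name and intervals are illustrative):

import java.util.concurrent.TimeUnit;
import org.apache.http.conn.HttpClientConnectionManager;

public class IdleConnectionMonitorThread extends Thread {

    private final HttpClientConnectionManager connMgr;
    private volatile boolean shutdown;

    public IdleConnectionMonitorThread(HttpClientConnectionManager connMgr) {
        this.connMgr = connMgr;
    }

    @Override
    public void run() {
        try {
            while (!shutdown) {
                synchronized (this) {
                    wait(5000); // check every 5 seconds
                    // evict connections the server has already closed
                    connMgr.closeExpiredConnections();
                    // optionally evict connections idle for over 30 seconds
                    connMgr.closeIdleConnections(30, TimeUnit.SECONDS);
                }
            }
        } catch (InterruptedException ex) {
            // terminate
        }
    }

    public void shutdown() {
        shutdown = true;
        synchronized (this) {
            notifyAll();
        }
    }
}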

Apache HttpClient: How to auto close connections by server's keep-alive time?

Apache HttpClient 4.3b2, HttpCore 4.3.
I use PoolingHttpClientConnectionManager to manage 5 connections concurrently:
PoolingHttpClientConnectionManager connectionManager;
HttpClient httpclient;
connectionManager = new PoolingHttpClientConnectionManager();
connectionManager.setDefaultMaxPerRoute(5);
httpclient = HttpClientBuilder.create().setConnectionManager(connectionManager).build();
The server has a 5-second keep-alive time.
When the server initiates the connection close, the connection stays in the FIN_WAIT2 state until I execute connectionManager.shutdown(), connectionManager.closeExpiredConnections() or connectionManager.closeIdleConnections(5, TimeUnit.SECONDS) manually; the server is waiting for a FIN packet. How can I automatically close connections on the client side after the server starts the closing process?
When I make requests from the Chrome browser, the server's socket moves to the TIME_WAIT state when it closes the connection by keep-alive (the FIN_WAIT2 state passes very quickly). How can I get the same behavior with Apache HttpClient?
This problem is explained in detail in the HttpClient tutorial:
One of the major shortcomings of the classic blocking I/O model is that the network socket can react to I/O events only when blocked in an I/O operation. When a connection is released back to the manager, it can be kept alive however it is unable to monitor the status of the socket and react to any I/O events. If the connection gets closed on the server side, the client side connection is unable to detect the change in the connection state (and react appropriately by closing the socket on its end).
If you want expired connections to get pro-actively evicted from the connection pool there is no way around running an additional thread enforcing a connection eviction policy that suits your application.
The PoolingHttpClientConnectionManager class has a method setValidateAfterInactivity that sets the period of connection inactivity in milliseconds. If this period has been exceeded, the connection pool revalidates the connection before passing it to HttpClient.
This method is available since v4.4.
In prior versions the RequestConfig.Builder.setStaleConnectionCheckEnabled method could be used instead.
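For example, with 4.4+ (the 1000 ms threshold is illustrative):

import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

PoolingHttpClientConnectionManager connectionManager = new PoolingHttpClientConnectionManager();
// revalidate pooled connections that have been inactive for more than 1 second
connectionManager.setValidateAfterInactivity(1000);
CloseableHttpClient httpclient = HttpClientBuilder.create()
        .setConnectionManager(connectionManager)
        .build();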
I found this question multiple times while working on an Apache HttpClient 5 based client implementation, trying to figure out whether an idle HTTP connection monitor is still required.
Apparently, since Apache HttpClient 4.4 there is an IdleConnectionEvictor (org.apache.http.impl.client.IdleConnectionEvictor in 4.x, org.apache.hc.client5.http.impl.IdleConnectionEvictor in 5.x) which does exactly the thing described in the HttpClient tutorial (but is not mentioned in the tutorial).
I thought it might be useful for others to be aware of this as well.
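In 4.5.x the builder can even set the evictor thread up for you; a brief sketch, assuming the evictExpiredConnections/evictIdleConnections builder methods and an illustrative 30-second idle limit:

import java.util.concurrent.TimeUnit;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

CloseableHttpClient client = HttpClients.custom()
        .evictExpiredConnections()                  // background thread evicts expired connections
        .evictIdleConnections(30, TimeUnit.SECONDS) // ... and connections idle for over 30 seconds
        .build();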

Why does DefaultHttpClient send data over a half-closed socket?

I'm using DefaultHttpClient with a ThreadSafeClientConnManager on Android (2.3.x) to send HTTP requests to my REST server (embedded Jetty).
After ~200 seconds of idle time, the server closes the TCP connection with a [FIN]. The Android client responds with an [ACK]. This should and does leave the socket in a half-closed state (server is still listening, but can't send data).
I would expect that when the client tries to use that connection again (via HttpClient.execute), DefaultHttpClient would detect the half-closed state, close the socket on the client side (thus sending its [FIN/ACK] to finalize the close), and open a new connection for the request. But there's the rub.
Instead, it sends the new HTTP request over the half-closed socket. Only after sending is the half-closed state detected and the socket closed on the client side (with the [FIN] sent to the server). Of course, the server can't respond to the request (it had already sent its [FIN]), so the client thinks the request failed and automatically retries via a new socket/connection.
The end result is that the server sees and processes two copies of the request.
Any ideas on how to fix this? (My server does the correct thing with the second copy, but I'm annoyed that the payload is transmitted twice.)
Shouldn't DefaultHttpClient detect that the socket was closed when it first tries to write the new HTTP packet, close that socket immediately, and start a new one? I'm baffled as to how a new HTTP request is sent on a socket minutes after the server sent a [FIN].
This is a general limitation of blocking I/O in Java. There is simply no way of finding out whether the opposite endpoint has closed the connection other than by attempting to read from the socket. Apache HttpClient works around this problem by employing the so-called stale connection check, which is essentially a very brief read operation. However, the check can be, and often is, disabled. In fact it is often advisable to have it disabled due to the extra latency it introduces. I have no idea how exactly the version of HttpClient shipped with Android behaves in this regard, but you could try explicitly enabling the check by using the appropriate config parameter.
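If I remember correctly, on the pre-4.3 style API that Android's client derives from, the parameter could be toggled roughly like this (a sketch only; verify against the HttpClient version actually shipped on your Android release):

import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.params.BasicHttpParams;
import org.apache.http.params.HttpConnectionParams;
import org.apache.http.params.HttpParams;

HttpParams params = new BasicHttpParams();
// perform the brief read-based stale check before reusing a pooled connection
HttpConnectionParams.setStaleCheckingEnabled(params, true);
DefaultHttpClient client = new DefaultHttpClient(params);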
A better solution to this problem might be evicting connections from the connection pool that have been idle over a particular period of time (say, 150 seconds).
http://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html#d5e652

JVM-based longpoll/comet client: routers killing idle connections

I currently have a JVM-based network client that does an HTTP long poll (aka Comet) request using the standard java.net.HttpURLConnection. I have the timeout set very high for the connection (1 hour). For most users it works fine, but some users do not receive the data sent from the server and eventually time out after 1 hour.
My theory is that a (NAT) router is timing out and discarding their connections because they are idle too long before the server sends any data.
My questions then are:
Can I enable TCP keep-alive for the connections used by java.net.HttpURLConnection? I could not find a way to do this.
Is there a different API (than HttpURLConnection) I should be using instead?
Other solutions?
java.net.HttpURLConnection handles the Keep-Alive header transparently; it can be controlled, and it is on by default. But your problem is not in Keep-Alive, which is a higher-level flag indicating that the server should not close the connection after handling the first request but rather wait for the next one.
In your case, probably something at a lower level of the OSI stack interrupts the connection. Because keeping an open but idle TCP connection for such a long period of time is never a good choice (the FTP protocol, with its two open connections, one for commands and one for data, has the same problem), I would rather implement some sort of disconnect/retry fail-safe procedure on the client side.
In fact, a safe limit would probably be just a few minutes, not hours. Simply disconnect from the HTTP server proactively every 60 seconds or 5 minutes, as sketched below. That should do the trick.
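A rough sketch of such a client-side fail-safe with HttpURLConnection (the URL handling and the 5-minute window are placeholders):

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URL;

void pollForever(URL url) throws IOException {
    while (true) {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setReadTimeout(5 * 60 * 1000); // reconnect proactively after 5 minutes of silence
        try (InputStream in = conn.getInputStream()) {
            // ... read and process the long-poll response ...
        } catch (SocketTimeoutException e) {
            // no data arrived within the window; fall through and reconnect
        } finally {
            conn.disconnect();
        }
    }
}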
There does not appear to be a way to turn on TCP keep-alive for HttpURLConnection.
Apache HttpComponents will be an option when version 4.2 comes out with TCP keep-alive support.
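(For reference, later 4.x releases expose this via SocketConfig; a minimal sketch assuming the 4.3+ API. Note that the keep-alive probe interval itself is controlled by the operating system, not by HttpClient.)

import org.apache.http.config.SocketConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

SocketConfig socketConfig = SocketConfig.custom()
        .setSoKeepAlive(true) // enable SO_KEEPALIVE on sockets created by the client
        .build();
CloseableHttpClient client = HttpClients.custom()
        .setDefaultSocketConfig(socketConfig)
        .build();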
