Flume's HttpSource: is the Jetty server multithreaded? - java

I've been looking a bit into Flume's HttpSource internals, trying to figure out how the Jetty server is used.
I've seen that a single-element list of Connectors is used; this Connector listens for incoming HTTP connections on the configured HTTP host and port. Then a Context is created for the root path, and an HttpServlet is added to this Context containing the logic to be executed when a connection is received. Finally, the Jetty server is started.
Connector[] connectors = new Connector[1];
if (sslEnabled) {
    SslSocketConnector sslSocketConnector = new HTTPSourceSocketConnector(excludedProtocols);
    ...
    connectors[0] = sslSocketConnector;
} else {
    SelectChannelConnector connector = new SelectChannelConnector();
    ...
    connectors[0] = connector;
}
connectors[0].setHost(host);
connectors[0].setPort(port);
srv.setConnectors(connectors);
try {
    org.mortbay.jetty.servlet.Context root = new org.mortbay.jetty.servlet.Context(
            srv, "/", org.mortbay.jetty.servlet.Context.SESSIONS);
    root.addServlet(new ServletHolder(new FlumeHTTPServlet()), "/");
    HTTPServerConstraintUtil.enforceConstraints(root);
    srv.start();
    ...
My question is, given the above implementation: does such a Jetty server create a thread for each incoming HTTP connection? Or does a unique HttpServlet serve all the requests, one by one, sequentially?
Thanks for helping!

First of note: org.mortbay.jetty means you are using a very old version of Jetty, probably Jetty 5 or Jetty 6. Those were EOL'd (End of Life'd) way back in 2010 (and earlier).
Back in the Jetty 6 days, there was a ThreadPool that was used on demand, and depending on your Connector type it would either result in a thread per connection (known as blocking connectors) or a thread per NIO selection (in this case a single connection may be served by many threads over its lifetime, but never more than one active per connection).
Starting with Jetty 8 and Servlet async, this threading model was refactored to further favor asynchronous request processing.
With Jetty 9, all blocking connectors were dropped in favor of fully async processing of the request, its input streams, and its output streams.
The current model is for a ThreadPool of threads to be used, on demand, only when needed by a connection (this could be for processing the request or the response, reading the request body content, writing the response body content, active WebSocket streaming, etc.).
This model is preferred for SPDY and HTTP/2 based support, where you have multiple requests per physical connection. But know that in those models it's quite possible to have multiple active threads per physical connection, depending on the behavior of your servlets.
Also, the web application itself can choose to spin up more threads for its own processing, such as via the servlet async processing behaviors, or to initiate outgoing requests to other services, or to process other tasks that are unrelated to a specific request / response context.
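To make that shared-pool model concrete, here is a minimal embedded Jetty 9 sketch; the pool sizes and port are illustrative assumptions, not values Flume actually uses:

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

public class SharedPoolExample {
    public static void main(String[] args) throws Exception {
        // One shared pool serves all connections; a thread is borrowed only
        // while there is actual work (accept, parse, dispatch, write) to do.
        QueuedThreadPool threadPool = new QueuedThreadPool(200, 8); // maxThreads, minThreads
        Server server = new Server(threadPool);

        ServerConnector connector = new ServerConnector(server); // NIO, fully async
        connector.setPort(8080);
        server.addConnector(connector);

        server.start();
        server.join();
    }
}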

Related

send keep alive on long asynchronous request in spring server

I have a controller in Spring that receives a POST request and handles it asynchronously (using a DeferredResult object as the return value).
The response for this request is written as bytes directly to the HTTP stream (HttpServletResponse.getWriter().print()), and when it's done writing it sets a result on the DeferredResult object to close the connection.
I'm writing my response in stream chunks.
I have an issue with this request handling because the client closes the connection if I don't write to it for 1 minute. (I can write some chunks and then stop writing for 1 minute, so the connection gets closed in the middle of my procedure.)
I want to control the connection-closing procedure: I want to send a keep-alive when I'm not writing any data to the stream, so that the connection won't be closed until I decide to close it from the server side.
I couldn't find out how to get control of the connection from the controller on the server.
Please assist.
Thanks.
There is no such thing as a "keep alive" during an ongoing request or response in HTTP that could help with idle timeouts while receiving a request or response.
HTTP keep-alive is only about keeping the TCP connection open after a response in order to process more requests on the same connection. TCP keep-alive is instead used to detect connection loss without a TCP shutdown, and it can also be used to prevent idle timeouts in stateful packet filters (as used in firewalls or NAT routers) between client and server. It does not prevent idle timeouts at the application level, though, since it does not transport any data visible to the application level.
Note that the way you want to use HTTP is contrary to how HTTP was originally designed. It was designed for a client sending a full request and the server sending a full response immediately, not for the server sending some parts of the response, idling for some time, and then sending some more. The proper way to implement such behavior would be WebSockets. With WebSockets, both client and server can send new messages at any time (i.e. no request-response schema), and the protocol also supports keep-alive messages. If WebSockets are not an option, you can instead implement a polling client which regularly polls for new data from the server with a new request.
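If you go the WebSocket route, the protocol has built-in ping/pong control frames for exactly this purpose. Below is a minimal sketch using the standard JSR-356 API; the endpoint path and the 30-second interval are assumptions for illustration:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import javax.websocket.OnClose;
import javax.websocket.OnOpen;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

@ServerEndpoint("/progress") // hypothetical path
public class ProgressEndpoint {
    private static final ScheduledExecutorService pinger =
            Executors.newSingleThreadScheduledExecutor();
    private volatile ScheduledFuture<?> heartbeat;

    @OnOpen
    public void onOpen(Session session) {
        // Send a protocol-level ping every 30 s; browsers answer with a pong
        // automatically, so no client-side code is needed.
        heartbeat = pinger.scheduleAtFixedRate(() -> {
            try {
                session.getBasicRemote().sendPing(ByteBuffer.allocate(0));
            } catch (IOException e) {
                heartbeat.cancel(false); // connection is gone
            }
        }, 30, 30, TimeUnit.SECONDS);
    }

    @OnClose
    public void onClose(Session session) {
        heartbeat.cancel(false);
    }
}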
I ran into a similar need just recently. The server code executes a long-running operation that can take as long as 30 minutes to return, and the client times out long before that. The solution was to have the long-running operation send periodic "keep alive" packets of data to the client via a "callback" argument provided by the request handler method.
The callback is nothing more than a function (think of a lambda in Java) that takes as a parameter the "keep alive" data packet to send to the client, and then writes that packet to the client via the java.io.PrintWriter reference that you can get from javax.servlet.http.HttpServletResponse. The code below is the handler method that does this. I had to refactor the code in the call hierarchy to accept this new "callback" parameter until it could reach the method performing the long-running operation, and inside that code I invoke the callback every so often, for example every time 10 records are processed. Note that the code below is Groovy (scripting code on top of Java that runs on the JVM) and the server-side framework is Spring:
...
@Autowired
DataImporter dataImporter

@PostMapping("/my/endpoint")
void importData(@RequestBody MyDto myDto, HttpServletResponse response) {
    // Callback to allow servant code deep in the call hierarchy to report
    // back to the client any arbitrary message
    Closure<Void> callback = { String str ->
        response.writer.print str
        response.writer.flush()
    }
    // This leads to the code that is performing a long running operation.
    // Using this "hook" that code has a direct connection to the client
    // whereby it can send packets of data to keep the connection from
    // timing out.
    dataImporter.importData(myDto, callback)
}
}
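For readers not using Groovy, here is a rough plain-Java sketch of the same callback pattern; DataImporter, MyDto, and the endpoint path are carried over from the Groovy example above and are hypothetical:

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Consumer;
import javax.servlet.http.HttpServletResponse;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ImportController {

    @Autowired
    DataImporter dataImporter; // hypothetical service from the example above

    @PostMapping("/my/endpoint")
    public void importData(@RequestBody MyDto myDto, HttpServletResponse response) {
        // Each invocation of the callback pushes a small packet to the
        // client, keeping the connection from idling out.
        Consumer<String> callback = str -> {
            try {
                response.getWriter().print(str);
                response.getWriter().flush();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
        dataImporter.importData(myDto, callback);
    }
}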

Can't use two connectors (http and https) in Jetty v.9.4.3

When I add two connectors to the embedded Jetty server I can use neither HTTP nor HTTPS - the browser/curl simply gets stuck. The code I use to create the embedded Jetty is approximately the following (it is based on this example - http://self-learning-java-tutorial.blogspot.de/2015/10/jetty-configuring-many-connectors.html):
HttpConfiguration httpConfiguration = new HttpConfiguration();
httpConfiguration.setRequestHeaderSize(requestHeaderSize);
ServerConnector httpConnector = new ServerConnector(server, 1, -1,
        new HttpConnectionFactory(httpConfiguration));
httpConnector.setPort(getPort());
httpConnector.setReuseAddress(true);
httpConnector.setIdleTimeout(maxTimeout);
server.addConnector(httpConnector);

HttpConfiguration httpsConfiguration = new HttpConfiguration();
httpsConfiguration.setSecureScheme("https");
httpsConfiguration.setSecurePort(securePort);
httpsConfiguration.addCustomizer(new SecureRequestCustomizer());
ServerConnector sslConnector = new ServerConnector(server,
        new SslConnectionFactory(sslContextFactory, HttpVersion.HTTP_1_1.asString()),
        new HttpConnectionFactory(httpsConfiguration));
sslConnector.setPort(securePort);
sslConnector.setIdleTimeout(maxTimeout);
sslConnector.setReuseAddress(true);
server.addConnector(sslConnector);

ServletContextHandler servContext = new ServletContextHandler(ServletContextHandler.NO_SESSIONS);
servContext.setContextPath("/");
server.setHandler(servContext);
server.start();
I turned on debug logs inside org.eclipse.jetty and on any request I get the following:
Selector loop woken up from select, 0/1 selected [] [io.ManagedSelector][jetty-default-3]
Running action org.eclipse.jetty.io.ManagedSelector$Accept#4278b8a5 [][io.ManagedSelector] [jetty-default-3]
Queued change org.eclipse.jetty.io.ManagedSelector$CreateEndPoint#535fb063 on org.eclipse.jetty.io.ManagedSelector#3959754c id=3 keys=2 selected=0 [] [io.ManagedSelector] [jetty-default-3]
EatWhatYouKill#1289003f/org.eclipse.jetty.io.ManagedSelector$SelectorProducer#7ff1b622/PRODUCING/0/1->PRODUCING/0/1 PEC org.eclipse.jetty.io.ManagedSelector$CreateEndPoint#535fb063 [] [strategy.EatWhatYouKill] [jetty-default-3]
Selector loop waiting on select [] [io.ManagedSelector] [jetty-default-3]
When only one connector is added everything works as expected.
P.S. The SO questions ""Selector loop waiting on select" when running multiple test cases which use wiremock stubs" and "Jetty+Jersey infinite loop with curl post query" don't give any answer other than that it's a Jetty bug fixed in 9.3 (I use 9.4.3).
Embedded Jetty supports as many connectors on one server as you can dream up.
There is no technical limitation in Jetty (the only limitations that exist are in the OS and networking stacks of your environment).
It's important to note that you have to have a sane HttpConfiguration setup, as the configurations can refer to each other's connectors (this is for "is secure" behavior, security constraints, etc.).
While it is possible to have multiple connectors that simply are not aware of each other, this is not the general use case.
When using HTTPS (aka HTTP over TLS/SSL), the choice of certificates (sizes, types, algorithms, etc.) and cipher suite selections will impact your ability to connect to that HTTPS connector.
Note that HTTPS is TLS (not SSL), and Jetty can use the ALPN extension to TLS, which allows the client to negotiate the next protocol to actually use (be it HTTP/1.x or HTTP/2 or whatever your configured next-protocol list is).
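As a hedged illustration of a "sane" setup against the Jetty 9.4 API, here is a minimal two-connector sketch where the plain HTTP configuration points at the TLS connector's port, so secure redirects and constraints line up; ports and keystore values are placeholders:

import org.eclipse.jetty.http.HttpVersion;
import org.eclipse.jetty.server.Connector;
import org.eclipse.jetty.server.HttpConfiguration;
import org.eclipse.jetty.server.HttpConnectionFactory;
import org.eclipse.jetty.server.SecureRequestCustomizer;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.server.SslConnectionFactory;
import org.eclipse.jetty.util.ssl.SslContextFactory;

public class TwoConnectors {
    public static void main(String[] args) throws Exception {
        Server server = new Server();

        // Plain HTTP connector; its config names the secure scheme/port so
        // Jetty knows where "secure" requests live.
        HttpConfiguration httpConfig = new HttpConfiguration();
        httpConfig.setSecureScheme("https");
        httpConfig.setSecurePort(8443); // must match the TLS connector below
        ServerConnector http = new ServerConnector(server,
                new HttpConnectionFactory(httpConfig));
        http.setPort(8080);

        // TLS connector derived from the same base configuration.
        HttpConfiguration httpsConfig = new HttpConfiguration(httpConfig);
        httpsConfig.addCustomizer(new SecureRequestCustomizer());
        SslContextFactory ssl = new SslContextFactory();
        ssl.setKeyStorePath("/path/to/keystore"); // placeholder
        ssl.setKeyStorePassword("changeit");      // placeholder
        ServerConnector https = new ServerConnector(server,
                new SslConnectionFactory(ssl, HttpVersion.HTTP_1_1.asString()),
                new HttpConnectionFactory(httpsConfig));
        https.setPort(8443);

        server.setConnectors(new Connector[] { http, https });
        server.start();
        server.join();
    }
}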
Here are a few examples of multiple connectors in embedded Jetty:
eclipse/jetty.project - embedded/ManyConnectors.java
eclipse/jetty.project - embedded/LikeJettyXml.java
jetty-project/embedded-jetty-cookbook - ConnectorSpecificContexts.java
jetty-project/embedded-jetty-cookbook - ConnectorSpecificWebapps.java
jetty-project/embedded-jetty-cookbook - SecuredRedirectHandlerExample.java
jetty-project/embedded-jetty-cookbook - ServletTransportGuaranteeExample.java

java httpclient connectionpool lease vs keep alive

I have used Apache HttpClient 4.5 in production for a while now, but recently, with the addition of a new use case, the system started failing.
We have multiple services that communicate through REST web services; the client is a wrapper around Apache HttpClient 4.5.
Say I have service A communicating with service B. The communication works correctly until I restart service B. The next call I initiate from service A to service B fails due to a timeout. After doing some research I found that the underlying TCP connection is reused for performance reasons (no repeated TCP handshake, etc.). Since the server has been restarted, the underlying TCP connection is stale.
After reading the documentation, I found that I can expire my connections after n seconds. Say I restart service B; then calls will fail for the first n seconds, but after that the connection is rebuilt. This is the keep-alive strategy I implemented:
connManager = new PoolingHttpClientConnectionManager();
connManager.setMaxTotal(100);
connManager.setDefaultMaxPerRoute(10);
ConnectionKeepAliveStrategy keepAliveStrategy = new DefaultConnectionKeepAliveStrategy() {
    @Override
    public long getKeepAliveDuration(HttpResponse response, HttpContext context) {
        long keepAliveDuration = super.getKeepAliveDuration(response, context);
        if (keepAliveDuration == -1) {
            keepAliveDuration = 45 * 1000; // 45 seconds
        }
        return keepAliveDuration;
    }
};
CloseableHttpClient closeableHttpClient = HttpClients.custom()
        .setConnectionManager(connManager)
        .setKeepAliveStrategy(keepAliveStrategy)
        .build();
I am just wondering if this is correct usage of this library. Is this the way it is meant to work, or am I making everything overly complex?
Not sure it's 100% the same scenario, but here are my 2 cents:
We had a similar issue (broken connections in the pool after a period of inactivity). When we were using an older version of HttpClient (3.x), we used the http.connection.stalecheck manager parameter, taking a minor performance hit over the possibility of getting an IOException when a connection that had been closed server-side was used.
After upgrading to 4.4+, this approach was deprecated and we started using setValidateAfterInactivity, which is a middle ground between per-call validation and the runtime-error scenario:
PoolingHttpClientConnectionManager poolingConnManager = new PoolingHttpClientConnectionManager();
poolingConnManager.setValidateAfterInactivity(5000);
void o.a.h.i.c.PoolingHttpClientConnectionManager.setValidateAfterInactivity(int ms)
Defines period of inactivity in milliseconds after which persistent connections must be re-validated prior to being leased to the consumer. Non-positive value passed to this method disables connection validation. This check helps detect connections that have become stale (half-closed) while kept inactive in the pool.
If you also control the consumed API, you can adapt the keep-alive strategy to the timing your client uses. We're using AWS CloudFront + ELBs with connection draining for deregistered instances to ensure kept-alive connections are fully closed when performing a rolling upgrade. I guess as long as the connections are guaranteed to be kept alive for, say, 30 seconds, any value below that passed to the connection manager will ensure that the validity check mitigates any runtime I/O errors which are purely related to stale/expired connections.
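As a sketch of how the pieces can be combined (HttpClient 4.4+ assumed; thresholds are examples), the builder's background eviction can complement the on-lease validation:

import java.util.concurrent.TimeUnit;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
// Re-validate any connection that sat idle for more than 5 s before leasing it.
cm.setValidateAfterInactivity(5000);

CloseableHttpClient client = HttpClients.custom()
        .setConnectionManager(cm)
        .evictExpiredConnections()                  // drop connections past their Keep-Alive / TTL
        .evictIdleConnections(30, TimeUnit.SECONDS) // close long-idle ones in the background
        .build();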

How to setup timeout for ejb lookup in websphere 7.0

I have developed a standalone Java SE client which performs an EJB lookup to a remote server and executes its methods. The server application is on EJB 3.0.
Under some strange, magical, but rare situations my program hangs indefinitely. Looking into the issue, it seems that while looking up the EJB on the server, I never get a response from the server and the call also never times out.
I would like to know if there is a property or any other way to set a timeout for the lookup on the client or the server side.
There is a very nice article that discusses ORB configuration best practices at developerWorks here. I'm quoting the three different settings that can be configured at the client (you, while doing a lookup and executing a method at a remote server):
Connect timeout: Before the client ORB can even send a request to a server, it needs to establish an IIOP connection (or re-use an existing one). Under normal circumstances, the IIOP and underlying TCP connect operations should complete very fast. However, contention on the network or another unforeseen factor could slow this down. The default connect timeout is indefinite, but the ORB custom property com.ibm.CORBA.ConnectTimeout (in seconds) can be used to change the timeout.

Locate request timeout: Once a connection has been established and a client sends an RMI request to the server, then LocateRequestTimeout can be used to limit the time for the CORBA LocateRequest (a CORBA "ping") for the object. As a result, the LocateRequestTimeout should be less than or equal to the RequestTimeout because it is a much shorter operation in terms of data sent back and forth. Like the RequestTimeout, the LocateRequestTimeout defaults to 180 seconds.

Request timeout: Once the client ORB has an established TCP connection to the server, it will send the request across. However, it will not wait indefinitely for a response; by default it will wait for 180 seconds. This is the ORB request timeout interval. This can typically be lowered, but it should be in line with the expected application response times from the server.
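For completeness, here is a hedged sketch of setting those ORB properties from a standalone client before the first lookup; the host, port, and JNDI name are placeholders, and the property names are the WebSphere-specific ones quoted above:

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.InitialContext;

// Timeouts in seconds, per the article quoted above.
System.setProperty("com.ibm.CORBA.ConnectTimeout", "10");
System.setProperty("com.ibm.CORBA.LocateRequestTimeout", "30");
System.setProperty("com.ibm.CORBA.RequestTimeout", "60");

Hashtable<String, String> env = new Hashtable<>();
env.put(Context.INITIAL_CONTEXT_FACTORY,
        "com.ibm.websphere.naming.WsnInitialContextFactory");
env.put(Context.PROVIDER_URL, "corbaloc:iiop:app-server-host:2809"); // placeholder
Context ctx = new InitialContext(env);
Object home = ctx.lookup("ejb/MyRemoteHome"); // placeholder JNDI name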
You can try the following code, which performs the task and then waits at most the time specified.
Future<Object> future = executorService.submit(new Callable<Object>() {
    public Object call() {
        return lookup(JNDI_URL);
    }
});
try {
    Object result = future.get(20L, TimeUnit.SECONDS); // wait at most 20 seconds
} catch (TimeoutException ex) {
    logger.log(LogLevel.ERROR, "lookup timed out");
    return;
} catch (InterruptedException | ExecutionException ex) {
    logger.log(LogLevel.ERROR, ex.getMessage());
    return;
}
Also, the task can be cancelled by future.cancel(true).
Remote JNDI uses the ORB, so the only option available is com.ibm.CORBA.RequestTimeout, but that will have an effect on all remote calls. As described in the 7.0 InfoCenter, the default value is 180 (3 minutes).

How can I ensure that my HttpClient 4.1 does not leak sockets?

My server uses data from an internal web service to construct its response, on a per request basis. I'm using Apache HttpClient 4.1 to make the requests. Each initial request will result in about 30 requests to the web service. Of these, 4 - 8 will end up with sockets stuck in CLOSE_WAIT, which never get released. Eventually these stuck sockets exceed my ulimit and my process runs out of file descriptors.
I don't want to just raise my ulimit (1024), because that will just mask the problem.
The reason I've moved to HttpClient is that java.net.HttpURLConnection was behaving the same way.
I have tried moving to a SingleClientConnManager per request, and calling client.getConnectionManager().shutdown() on it, but sockets still end up stuck.
Should I be trying to solve this so that I end up with 0 open sockets while there are no running requests, or should I be concentrating on request persistence and pooling?
For clarity I'm including some details which may be relevant:
OS: Ubuntu 10.10
JRE: 1.6.0_22
Language: Scala 2.8
Sample code:
val cleaner = Executors.newScheduledThreadPool(1)
private val client = {
  val ssl_ctx = SSLContext.getInstance("TLS")
  val managers = Array[TrustManager](TrustingTrustManager)
  ssl_ctx.init(null, managers, new java.security.SecureRandom())
  val sslSf = new org.apache.http.conn.ssl.SSLSocketFactory(ssl_ctx, SSLSocketFactory.ALLOW_ALL_HOSTNAME_VERIFIER)
  val schemeRegistry = new SchemeRegistry()
  schemeRegistry.register(new Scheme("https", 443, sslSf))
  val connection = new ThreadSafeClientConnManager(schemeRegistry)
  object clean extends Runnable {
    override def run = {
      connection.closeExpiredConnections
      connection.closeIdleConnections(30, SECONDS)
    }
  }
  cleaner.scheduleAtFixedRate(clean, 10, 10, SECONDS)
  val httpClient = new DefaultHttpClient(connection)
  httpClient.getCredentialsProvider().setCredentials(new AuthScope(AuthScope.ANY), new UsernamePasswordCredentials(username, password))
  httpClient
}

val get = new HttpGet(uri)
val entity = client.execute(get).getEntity
val stream = entity.getContent
val justForTheExample = IOUtils.toString(stream)
stream.close()
Test: netstat -a | grep {myInternalWebServiceName} | grep CLOSE_WAIT
(Lists sockets for my process that are in CLOSE_WAIT state)
Post comment discussion:
This code now demonstrates correct usage.
One needs to pro-actively evict expired / idle connections from the connection pool, as in the blocking I/O model connections cannot react to I/O events unless they are being read from / written to. For details see
http://hc.apache.org/httpcomponents-client-dev/tutorial/html/connmgmt.html#d4e631
I've marked oleg's answer as correct, as it highlights an important usage point about HttpClient's connection pooling.
To answer my specific original question, though, which was "Should I be trying to solve for 0 unused sockets or trying to maximize pooling?"
Now that the pooling solution is in place and working correctly the application throughput has increased by about 150%. I attribute this to not having to renegotiate SSL and multiple handshakes, instead reusing persistent connections in accordance with HTTP 1.1.
It is definitely worth working to utilize pooling as intended, rather than trying to hack around with calling ThreadSafeClientConnManager.shutdown() after each request etcetera. If, on the other hand, you were calling arbitrary hosts and not reusing routes the way I am you might easily find that it becomes necessary to do that sort of hackery, as the JVM might surprise you with the long life of CLOSE_WAIT designated sockets if you're not garbage collecting very often.
I had the same issue and solved it using the suggestion found here. The author touches on some TCP basics:
When a TCP connection is about to close, its finalization is negotiated by both parties. Think of it as breaking a contract in a civilized manner. Both parties sign the paper and it’s all good. In geek talk, this is done via the FIN/ACK messages. Party A sends a FIN message to indicate it wants to close the socket. Party B sends an ACK saying it received the message and is considering the demand. Party B then cleans up and sends a FIN to Party A. Party A responds with the ACK and everyone walks away.
The problem comes in when B doesn't send its FIN. A is kinda stuck waiting for it. It has initiated its finalization sequence and is waiting for the other party to do the same.
He then mentions RFC 2616, 14.10 to suggest setting up an HTTP header to solve this issue:
postMethod.addHeader("Connection", "close");
Honestly, I don't really know the implications of setting this header. But it did stop CLOSE_WAIT from happening in my unit tests.
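For reference, here is a minimal sketch of that header with the 4.1-era API used in the question (the URL is a placeholder). Draining the entity is what actually lets HttpClient release the underlying socket:

import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

DefaultHttpClient client = new DefaultHttpClient();
HttpPost post = new HttpPost("http://internal-service/resource"); // placeholder
post.addHeader("Connection", "close"); // ask the server to FIN after responding
HttpResponse response = client.execute(post);
try {
    // Fully consuming (or discarding) the entity returns the connection,
    // so it is not left half-open in CLOSE_WAIT.
    EntityUtils.consume(response.getEntity());
} finally {
    client.getConnectionManager().shutdown();
}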
