What is the difference between thread per connection vs thread per request? - java

Can you please explain the two threading strategies that have been implemented in various servlet containers:
Thread per connection
Thread per request
Which of the above two strategies scales better and why?

Which of the above two strategies scales better and why?
Thread-per-request scales better than thread-per-connection.
Java threads are rather expensive, typically reserving about 1 MB for each thread's stack, whether the thread is active or idle. If you give each connection its own thread, the thread will typically sit idle between successive requests on the connection. Ultimately the framework needs to either stop accepting new connections (because it can't create any more threads) or start disconnecting old connections (which leads to connection churn if / when the users wake up).
An HTTP connection requires significantly fewer resources than a thread stack, although there is a limit of about 64K open connections per IP address, due to the way that TCP/IP port numbers work.
By contrast, in the thread-per-request model, a thread is associated with a connection only while a request is being processed. That usually means that the service needs fewer threads to handle the same number of users. And since threads use significant resources, fewer threads mean that the service will be more scalable.
(And note that thread-per-request does not mean that the framework has to close the TCP connection between HTTP requests.)
Having said that, the thread-per-request model is not ideal when there are long pauses during the processing of each request. (And it is especially non-ideal when the service uses the Comet approach, which involves keeping the reply stream open for a long time.) To support this, the Servlet 3.0 spec provides an "asynchronous servlet" mechanism which allows a servlet's request-handling method to suspend its association with the current request thread. This releases the thread to go and process another request.
If the web application can be designed to use the "asynchronous" mechanism, it is likely to be more scalable than either thread-per-request or thread-per-connection.
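For illustration, here is a minimal sketch of a Servlet 3.0 asynchronous servlet; the servlet name, URL pattern, and pool size are assumptions for the example, not part of the spec.

import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/slow", asyncSupported = true)
public class SlowServlet extends HttpServlet {
    // application-owned pool for the long-running work; size is illustrative
    private final ExecutorService workers = Executors.newFixedThreadPool(10);

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        AsyncContext ctx = req.startAsync(); // suspend: detaches from the request thread
        workers.submit(() -> {
            try {
                // ... long-running work would happen here ...
                ctx.getResponse().getWriter().println("done");
            } catch (IOException e) {
                // error handling omitted in this sketch
            } finally {
                ctx.complete(); // finish the response
            }
        });
        // returning here releases the container's request thread immediately
    }
}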
FOLLOWUP
Let's assume a single webpage with 1000 images. This results in 1001 HTTP requests. Further, let's assume HTTP persistent connections are used. With the TPR strategy, this will result in 1001 thread pool management operations (TPMO). With the TPC strategy, this will result in 1 TPMO... Now depending on the actual costs of a single TPMO, I can imagine scenarios where TPC may scale better than TPR.
I think there are some things you haven't considered:
The web browser faced with lots of URLs to fetch to complete a page may well open multiple connections.
With TPC and persistent connections, the thread has to wait for the client to receive the response and send the next request. This wait time could be significant if the network latency is high.
The server has no way of knowing when a given (persistent) connection can be closed. If the browser doesn't close it, it could "linger", tying down the TPC thread until the server times out the connection.
The TPMO overheads are not huge, especially when you separate the pool overheads from the context switch overheads. (You need to do that, since TPC is going to incur context switches on a persistent connection; see above.)
My feeling is that these factors are likely to outweigh the TPMO saving of having one thread dedicated to each connection.

HTTP 1.1 - Has support for persistent connections, which means more than one request/response can be received/sent using the same HTTP connection.
So, to run the requests received over the same connection in parallel, a new thread is created for each request.
HTTP 1.0 - In this version only one request was received per connection, and the connection was closed after sending the response. So only one thread was created per connection.

Thread per connection is the concept of reusing the same HTTP connection for multiple requests (keep-alive).
Thread per request will create a thread for each request from a client. The server can create as many threads as there are requests.

Thread per request will create a thread for each HTTP request the server receives.
Thread per connection will reuse the same HTTP connection for multiple requests (keep-alive), a.k.a. HTTP persistent connection,
but please note that this is supported only from HTTP 1.1 onwards.
Thread per request is faster, as most web containers use thread pooling.
You should base the maximum number of parallel threads on the number of cores of your server: more cores allow more parallel threads.
See here how to configure the thread pool:
Tomcat 6: http://tomcat.apache.org/tomcat-6.0-doc/config/executor.html
Tomcat 7: http://tomcat.apache.org/tomcat-7.0-doc/config/executor.html
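As a rough illustration of sizing a worker pool by core count (the multiplier is an assumption and a starting point for benchmarking, not a rule; in Tomcat the equivalent knob is the maxThreads attribute of the Executor element documented above):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // CPU-bound work: roughly one thread per core;
        // work that blocks on IO: a small multiple is a common starting point
        ExecutorService pool = Executors.newFixedThreadPool(cores * 2);
        System.out.println("pool size: " + (cores * 2));
        pool.shutdown();
    }
}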

Thread per request should be better, because it reuses threads while some clients are idle. If you have a lot of concurrent users, they can be served with a smaller number of threads, and having an equal number of threads would be more expensive. There is also one more consideration: we do not know whether the user is still working with the application, so we can't know when to destroy a thread. With a thread-per-request mechanism, we just use a thread pool.

Related

Asynchronous JAX-RS

My doubt is regarding the working of the Asynchronous JAX-RS which is kind of new to me and I'm trying to grasp its advantage.
What I understand is that the client sends a request, the request is delegated from the request thread to a worker thread, and once the processing is completed the response is sent back to the client using the AsyncResponse. What I've also understood is that throughout the process the client waits for a response from the server. (So as far as the client is concerned, it's the same as a normal synchronous request.)
It is also stated that, as the request is handed off to a worker thread for further processing, the I/O threads are free to accept new connections.
What I did not understand is: the client is still waiting for a response, and therefore an active connection is still maintained between the client and server. Is this not maintained in an I/O thread? What does "suspended" mean here?
Also, even if the I/O thread is released because the processing is delegated to a worker thread, the connection with the client is still up, so how can the server accept more and more connections?
And my next question is about the thread pools used here. Are the I/O threads and worker threads from different pools? Are the worker/processor threads not coming from a pool managed by the server?
Because of my failure to understand this, my next question is: having a separate pool for the I/O and the processing, with the client connection still up, is the same as having the I/O thread blocked with the processing inside, right?
I haven't grasped this concept very well.
The thread pools in use in this scenario are:
The request processing pool, managed by the JAX-RS container (e.g. Jersey), and
The worker thread pool, typically managed by your own code.
There is possibly an IO thread associated with the connection, but that's an implementation detail that doesn't affect this.
When you use AsyncResponse, as soon as you return from your handle method, the request processing thread (from pool #1) is freed and can be used by the container to handle another request.
On to your questions:
"... how can the server accept more and more connections?" - it can accept more connections than if you do not use AsyncResponse for long-running requests, because you are freeing up one of your limited resources (threads in thread pool #1). The connection itself is not freed, and connections are also limited resources, so you can run out of those still (as well as possibly being limited by CPU or memory).
"are the worker/processor threads not coming from a pool managed by the server?" - not normally. See the example in the link from Paul Samsotha here - the application code creates a new thread itself. This is probably not how you would do it, you would most likely want to use an ExecutorService or similar, but the point is that you manage the "worker thread" yourself. The exception here is if you use Jersey's #ManagedAsync annotation.
Your last question should be answered by my answer to #1 - at a lower level there is still a resource associated with a connection, even if you are using AsyncResponse, but AsyncResponse does free up the container's request processing threads, which can be more limited in number than the maximum number of connections. You may choose to handle this problem by changing the server configuration instead of using AsyncResponse, but AsyncResponse has two advantages - it is under the application's control, and it is per-request instead of per-server.
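A minimal sketch of this pattern, assuming JAX-RS 2.0 with an application-owned executor (class name, path, and pool size are illustrative assumptions):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.container.AsyncResponse;
import javax.ws.rs.container.Suspended;

@Path("/slow")
public class SlowResource {
    // worker pool managed by the application, not by the JAX-RS container
    private static final ExecutorService WORKERS = Executors.newFixedThreadPool(10);

    @GET
    public void get(@Suspended final AsyncResponse asyncResponse) {
        WORKERS.submit(() -> {
            String result = doLongRunningWork(); // hypothetical long-running call
            asyncResponse.resume(result);        // resumes the suspended request
        });
        // returning here frees the container's request processing thread (pool #1)
    }

    private String doLongRunningWork() {
        return "done";
    }
}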

Does AsyncHttpClient know how many threads to allocate for all the HTTP requests

I'm evaluating AsyncHttpClient for big loads (~1M HTTP requests).
For each request I would like to invoke a callback using the AsyncCompletionHandler which will just insert the result into a blocking queue
My question is: if I'm sending asynchronous requests in a tight loop, how many threads will the AsyncHttpClient use? (I know you can set the max but apparently you take a risk of losing requests, I've seen it here)
I'm currently using the Netty implementation with these versions:
async-http-client v1.9.33
netty v3.10.5.Final
I don't mind using other versions if there are any optimizations in later versions.
EDIT:
I read that Netty uses the reactor pattern for reacting to HTTP responses, which means it allocates a very small number of threads to act as selectors. This also means that the number of allocated threads doesn't increase with high request volume. However, that contradicts the need to set the max number of connections.
Can anyone explain what I'm missing?
Thanks in advance
The AsyncHttpClient client (and other non-blocking IO clients, for that matter) doesn't need to allocate a thread per request, and the client need not resize its thread pool even if you bombard it with requests. You do initiate many connections if you don't use HTTP keep-alive or call multiple hosts, but it can all be handled by a single-threaded client (there may be more than one IO thread, depending on the implementation).
However, it's always a good idea to limit the max requests per host, and max requests per domain, to avoid overloading a service on a specific host, or a site, and avoid getting blocked. This is why HTTP clients add a maxConnectionsPerXxx setting.
AHC has two types of threads:
For I/O operation. On your screen, it's AsyncHttpClient-x-x threads. AHC creates 2*core_number of those.
For timeouts. On your screen, it's AsyncHttpClient-timer-1-1 thread. Should be only one.
And as you mentioned:
maxConnections just means the number of open connections, which does not directly affect the number of threads
Source: issue on GitHub: https://github.com/AsyncHttpClient/async-http-client/issues/1658
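To illustrate the callback-into-a-blocking-queue pattern from the question, here is a sketch against the com.ning AHC 1.9 API it mentions (the URL, loop count, queue capacity, and per-host limit are illustrative assumptions; check the Builder method names against your exact version):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import com.ning.http.client.AsyncCompletionHandler;
import com.ning.http.client.AsyncHttpClient;
import com.ning.http.client.AsyncHttpClientConfig;
import com.ning.http.client.Response;

public class CallbackIntoQueue {
    public static void main(String[] args) throws Exception {
        final BlockingQueue<String> results = new ArrayBlockingQueue<>(1_000_000);
        AsyncHttpClientConfig config = new AsyncHttpClientConfig.Builder()
                .setMaxConnectionsPerHost(100) // per-host limit, as advised above
                .build();
        AsyncHttpClient client = new AsyncHttpClient(config);
        for (int i = 0; i < 1000; i++) { // tight loop of asynchronous requests
            client.prepareGet("http://example.com/")
                  .execute(new AsyncCompletionHandler<Response>() {
                      @Override
                      public Response onCompleted(Response response) throws Exception {
                          results.put(response.getResponseBody()); // hand off to consumers
                          return response;
                      }
                  });
        }
        // ... consumer threads drain 'results' here; close the client when done ...
    }
}

Note that the loop above runs on a single caller thread; the handful of AsyncHttpClient-x-x IO threads invoke the callbacks, so the thread count stays flat regardless of request volume.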

Request processing threads vs custom threads for async processing vs web server performance

Question: When I create a custom thread to handle an incoming HTTP request in an asynchronous way, am I actually harming the performance due to the introduction of too many threads?
More insight:
Let's say incoming requests require some heavy database operation to be performed. The web server is heavily loaded, and at any given moment 10 request processing threads are constantly busy processing requests. The server has 10 cores, so let's assume 1 thread per core is running.
The requests are processed in a synchronous way: each request processing thread handles the job from arrival till completion. There is some waiting on the database required, though.
A possible "improvement" would be to change the flow a bit: instead of the request processing thread handling the whole request, an additional thread is created to handle the heavy database operation, and the request processing thread is released early.
This raises concerns:
- now much more than 10 threads on 10 cores are required.
- context switching will degrade performance
What I mean by request processing thread:
http://docs.oracle.com/cd/E19146-01/821-1834/geeie/index.html
What I mean by custom thread:
// assumes an application-owned pool, e.g.:
// private final ExecutorService executorService = Executors.newFixedThreadPool(10);
void handleHttpMethod() {
    // request processing thread running here
    executorService.submit(new DBTask()); // hand off the heavy DB work
    // request processing thread exits here and returns to the container's pool
}
I know this is a bit of an open question, but I am really interested in your feedback and comments.
[EDIT]
Even more details:
I'm running a web application deployed to a GlassFish 3 server which handles ~1000 requests/sec. Each request involves some DB operation (storing some data, with no need to wait for the result) which is heavy compared to the other logic performed within the request. I'm trying to understand how going async may affect web server performance, as request processing threads will now have to share the CPU with my custom threads created to handle the DB operation.
Note: below assumes that your DB runs on different host, so DB queries won't take CPU from your web-serving threads.
If DB query times vary by client, you can definitely improve throughput (in terms of the number of served requests) by offloading DB requests to "custom" threads.
E.g., if client1Query runs for 10 seconds and there are no free threads, client2Query, which would need only 1 second to complete, will need to wait till either client1Query finishes, or some other thread becomes available.

Java pooling connection optimization

What are the common guidelines/advice for configuring, in Java, an HTTP connection pool to support a huge number of concurrent HTTP calls to the same server? I mean:
max total connections
max default connection per route
reuse strategy
keep alive strategy
keep alive duration
connection timeout
....
(I am using Apache HttpComponents 4.3, but I am open to exploring new solutions)
In order to be more clear, this is my situation:
I developed a REST resource that needs to perform about 10 http calls to AWS CloudSearch in order to obtain search results to be collected in a final result (that I really cannot obtain through a single query).
The whole operation must take less than 0.25 seconds. So, I run http calls in parallel in 10 different threads.
During a benchmarking test, I noticed that with a few concurrent requests (5), my objective is reached. But increasing concurrent requests to 30, there is a tremendous degradation of performance due to the connection time, which takes about 1 second. With few concurrent requests, instead, the connection time is about 150 ms (to be more precise, the first connection takes 1 second, and all the following connections take about 150 ms). I can ensure that CloudSearch returns its response in less than 15 ms, so there is a problem somewhere in my connection pool.
Thank you!
The amount of threads/connections that are best for your implementation depend on that implementation (which you did not post), but here are some guidelines as requested:
If those threads never block at all, you should have as many threads as cores (Runtime.getRuntime().availableProcessors(); this includes hyper-threaded cores), simply because more than 100% CPU usage isn't possible.
If your threads rarely block, cores * 2 is a good start for benchmarking.
If your threads frequently block, you absolutely need to benchmark your application with various settings to find the best solution for your implementation, OS and hardware.
Now the best case is obviously the first one, but to get there, you need to remove blocking from your code as much as you can. Java can do this for IO operations if you use the NIO package in non-blocking mode (which is not how the Apache package does it).
Then you have 1 thread that waits on a selector and wakes as soon as any data is ready to be sent or read. This thread then only copies the data from its source to the destination and returns to the selector. In case of a read (incoming data), this destination is a blocking queue, on which as many threads as there are cores wait. One of those threads then pulls out the received data and processes it, now without any blocking.
You can then use the length of the blocking queue to adjust how many parallel requests are reasonable for your task and hardware.
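A bare-bones sketch of that selector loop using java.nio, shown here for a server socket (the port, buffer size, and queue hand-off are illustrative assumptions; error handling and write-readiness are omitted):

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SelectorLoop {
    public static void main(String[] args) throws Exception {
        BlockingQueue<byte[]> inbound = new LinkedBlockingQueue<>(); // worker threads drain this
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select(); // block until some channel is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel ch = server.accept();
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    ByteBuffer buf = ByteBuffer.allocate(4096);
                    SocketChannel ch = (SocketChannel) key.channel();
                    if (ch.read(buf) > 0) {
                        buf.flip();
                        byte[] data = new byte[buf.remaining()];
                        buf.get(data);
                        inbound.put(data); // hand off to the worker pool, return to selector
                    }
                }
            }
        }
    }
}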
The first connection takes >1 second because it actually has to look up the address via DNS. All other connections are put on hold for the moment, as there is no sense in doing this twice. You can circumvent that by either calling the IP directly (probably not good if you talk to a load balancer) or by "warming up" the connections with an initial request. Any new connection afterwards will use the cached DNS result, but it still needs to perform other initializations, so reusing connections as much as you can will reduce latency a lot. With NIO this is a very easy task.
In addition there are HTTP multi-requests (pipelining): you make one connection, send several requests without waiting for each response, and get the responses back over "the same line". This massively reduces connection overhead, but it needs to be supported by the server.
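For the Apache HttpComponents 4.3 setup the question mentions, a minimal pooled-client sketch (all numbers are illustrative starting points, not recommendations):

import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class PooledClient {
    public static CloseableHttpClient create() {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(30);           // max total connections in the pool
        cm.setDefaultMaxPerRoute(30); // all calls go to the same host here
        RequestConfig rc = RequestConfig.custom()
                .setConnectTimeout(200) // ms; fail fast given the 250 ms budget
                .setSocketTimeout(200)  // ms
                .build();
        return HttpClients.custom()
                .setConnectionManager(cm)
                .setDefaultRequestConfig(rc)
                .build();
    }
}

Sharing one client (and therefore one pool) across your 10 threads lets the connections be reused, which is likely where the 150 ms-to-1 s connection times you measured are going.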

Servers and threading models

I am troubled with the following concept:
Most books/docs describe how robust servers are multithreaded, and that the most common approach is to start a new thread to serve each new client, i.e. a thread is dedicated to each new connection. But how is this actually implemented in big systems? If we have a server that accepts requests from 100,000 clients, does it start 100,000 threads? Is this realistic? Aren't there limits on how many threads can run on a server? Additionally, doesn't the overhead of context switching and synchronization degrade performance? Is it implemented as a mix of queues and threads? In that case, is the number of queues fixed? Can anybody enlighten me on this, and perhaps give me a good reference that describes these approaches?
Thanks!
The common method is to use thread pools. A thread pool is a collection of already created threads. When a new request gets to the server it is assigned a spare thread from the pool. When the request is handled, the thread is returned to the pool.
The number of threads in a pool is configured depending on the characteristics of the application. For example, if you have an application that is CPU bound you will not want too many threads since context switches will decrease performance. On the other hand, if you have a DB or IO bound application you want more threads since much time is spent waiting. Hence, more threads will utilize the CPU better.
Google "thread pools" and you will for sure find much to read about the concept.
Also read up on the SEDA (staged event-driven architecture) pattern.
In addition to the answers above, I should note that really high-performance servers with many incoming connections attempt not to spawn a thread per connection, but use IO completion ports, select() and other asynchronous techniques for working with multiple sockets in one thread. And of course special attention must be paid to ensure that problems with one request or one socket won't block other sockets in the same thread.
Also, thread management consumes CPU time, so threads should not be spawned for each connection or each client request.
In most systems a thread pool is used. This is a pool of available threads that wait for incoming requests. The number of threads can grow to a configured maximum number, depending on the number of simultaneous requests that come in and the characteristics of the application.
If a request arrives, an unoccupied thread is requested from the thread pool. This thread is then dedicated to handling the request until the request finishes. When that happens, the thread is returned to the thread pool to handle another request.
Since there is only a limited number of threads, in most server systems one should attempt to make the lifetime of requests as short as possible. The less time a request needs to execute, the sooner a thread can be reused for a new request.
If requests come in while all threads are occupied, most servers implement a queueing mechanism for requests. Of course the size of the queue is also limited, so when more requests arrive than can be queued, new requests will be denied.
One other reason for having a thread pool instead of starting threads for each request is that starting a new thread is an expensive operation. It's better to have a number of threads started beforehand and to reuse them than to start new threads all the time.
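Putting those pieces together, a sketch of a bounded pool with a bounded request queue and an explicit rejection policy (all sizes are illustrative assumptions):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedServerPool {
    public static ThreadPoolExecutor create() {
        return new ThreadPoolExecutor(
                10,                                     // core pool size, kept alive
                200,                                    // configured maximum pool size
                60, TimeUnit.SECONDS,                   // idle timeout for extra threads
                new ArrayBlockingQueue<Runnable>(1000), // bounded queue for waiting requests
                new ThreadPoolExecutor.AbortPolicy()    // deny requests when the queue is full
        );
    }
}

One ThreadPoolExecutor quirk worth knowing: threads beyond the core size are only created once the queue is full, so the queue capacity and the maximum pool size interact.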
To get network servers to handle lots of concurrent connections there are several approaches (mostly divided up in "one thread per connection" and "several connections per thread" categories), take a look at the C10K page, which is a great resource on this topic, discussing and comparing a lot of approaches and linking to further resources on them.
Creating 10k threads is not likely to be efficient in most cases, but can be done and would work.
If you needed to serve 10k clients at once, doing so on a single machine would be unlikely but possible.
Depending on the client side implementation, it may be that the 10,000 clients do not need to maintain an open TCP connection - depending on the purpose, the protocol design can greatly improve the efficiency of implementation.
I think the appropriate solution for high scale systems is probably extremely domain-specific, and if you wanted a suggestion you'd have to explain more about your problem domain.
