I am troubled with the following concept:
Most books/docs describe how robust servers are multithreaded and that the most common approach is to start a new thread to serve each new client. E.g. a thread is dedicated to each new connection. But how is this actually implemented in big systems? If we have a server that accepts requests from 100000 clients, it has started 100000 threads? Is this realistic? Aren't there limits on how many threads can run in a server? Additionally the overhead of context switching and synchronization, doesn't it degrade performance? Is it implemented as a mix of queues and threads? In this case is the number of queues fixed? Can anybody enlighten me on this, and perhaps give me a good reference that describes these?
Thanks!
The common method is to use thread pools. A thread pool is a collection of already created threads. When a new request gets to the server it is assigned a spare thread from the pool. When the request is handled, the thread is returned to the pool.
The number of threads in a pool is configured depending on the characteristics of the application. For example, if you have an application that is CPU bound you will not want too many threads since context switches will decrease performance. On the other hand, if you have a DB or IO bound application you want more threads since much time is spent waiting. Hence, more threads will utilize the CPU better.
Google "thread pools" and you will for sure find much to read about the concept.
Also Read up on the SEDA pattern link , link
In addition to the answers above I should notice, that really high-performance servers with many incoming connections attempt not to spawn a thread per each connection but use IO Completion Ports, select() and other asynchronous techniques for working with multiple sockets in one thread. And of course special attention must be paid to ensure that problems with one request or one socket won't block other sockets in the same thread.
Also thread management consumes CPU time, so threads should not be spawned for each connection or each client request.
In most systems a thread pool is used. This is a pool of available threads that wait for incoming requests. The number of threads can grow to a configured maximum number, depending on the number of simultaneous requests that come in and the characteristics of the application.
If a requests arrives, an unoccupied thread is requested from the thread pool. This thread is then dedicated to handling the request until the request finishes. When that happens, the thread is returned to the thread pool to handle another request.
Since there is only a limited number of threads, in most server systems one should attempt to make the lifetime of requests as short as possible. The less time a request needs to execute, the sooner a thread can be reused for a new request.
If requests come in while all threads are occupied, most servers implement a queueing mechanism for requests. Of course the size of the queue is also limited, so when more requests arrive than can be queued, new requests will be denied.
One other reason for having a thread pool instead of starting threads for each request is that starting a new thread is an expensive operation. It's better to have a number of threads started beforehand and reusing them then starting new threads all the time.
To get network servers to handle lots of concurrent connections there are several approaches (mostly divided up in "one thread per connection" and "several connections per thread" categories), take a look at the C10K page, which is a great resource on this topic, discussing and comparing a lot of approaches and linking to further resources on them.
Creating 10k threads is not likely to be efficient in most cases, but can be done and would work.
If you needed to serve 10k clients at once, doing so on a single machine would be unlikely but possible.
Depending on the client side implementation, it may be that the 10,000 clients do not need to maintain an open TCP connection - depending on the purpose, the protocol design can greatly improve the efficiency of implementation.
I think the appropriate solution for high scale systems is probably extremely domain-specific, and if you wanted a suggestion you'd have to explain more about your problem domain.
Related
I have a network server that has a few dozen backend controllers, each which process a different user request (i.e., user clicking something on the website).
Each controller makes network calls to a handful of services to get the data it needs. Each network call takes somewhere around 200ms. These network calls can be done in parallel, so I want to launch one thread for each of them and then collect at the end - maximizing parallelization. (5 network calls in parallel takes 200ms, where 5 in sequence will take 1000ms).
However I am unsure of best practice to design the thread management strategy.
Should I have one threadpool with say 1000 (arbitrary number for example) threads in it, and each controller draws from that pool?
Should I not have a threadpool at all and create new threads in each controller as I need them? This option seems "dumb" but I wonder - how much is the cost in CPU cycles of creating a thread, compared to waiting for network response? Quite minimal.
Should I have one threadpool per controller? (Meaning dozens of threadpools, each with around 5 or 6 threads for that specific controller).
Seeking pros / cons of each strategy, best practices, or an alternate strategy I haven't considered.
I'm evaluating AsyncHttpClient for big loads (~1M HTTP requests).
For each request I would like to invoke a callback using the AsyncCompletionHandler which will just insert the result into a blocking queue
My question is: if I'm sending asynchronous requests in a tight loop, how many threads will the AsyncHttpClient use? (I know you can set the max but apparently you take a risk of losing requests, I've seen it here)
I'm currently using the Netty implementation with these versions:
async-http-client v1.9.33
netty v3.10.5.Final
I don't mind using other versions if there are any optimization in later versions
EDIT:
I read that Netty uses the reactor pattern for reacting to HTTP responses which means it allocates a very small number of threads to act as selectors. This also means that the number of allocated threads doesn't increase with high requests volume. However that contradicts the need to set the max number of connections.
Can anyone explain what I'm missing?
Thanks in advance
The AsyncHttpClient client (and other non-blocking IO clients for the matter), don't need to allocate a thread per request, and the client need not resize its thread pool even if you bombard it with requests. You do initiate many connections if you don't use HTTP keep-alive, or call multiple hosts, but it can all be handled by a single threaded client (there may be more than one IO thread, depending on the implementation).
However, it's always a good idea to limit the max requests per host, and max requests per domain, to avoid overloading a service on a specific host, or a site, and avoid getting blocked. This is why HTTP clients add a maxConnectionsPerXxx setting.
AHC has two types of threads:
For I/O operation. On your screen, it's AsyncHttpClient-x-x threads. AHC creates 2*core_number of those.
For timeouts. On your screen, it's AsyncHttpClient-timer-1-1 thread. Should be only one.
And as you mentioned:
maxConnections just means number of open connections which does not
directly affect the number of threads
Source: issue on GitHub: https://github.com/AsyncHttpClient/async-http-client/issues/1658
Which are the commons guidelines/advices to configure, in Java, a http connection pool to support huge number of concurrent http calls to the same server? I mean:
max total connections
max default connection per route
reuse strategy
keep alive strategy
keep alive duration
connection timeout
....
(I am using Apache http components 4.3, but I am available to explore new solutions)
In order to be more clear, this is my situation:
I developed a REST resource that needs to perform about 10 http calls to AWS CloudSearch in order to obtain search results to be collected in a final result (that I really cannot obtain through a single query).
The whole operation must take less than 0.25 seconds. So, I run http calls in parallel in 10 different threads.
During a benchamarking test, I noticed that with few concurrent request, 5, my objective is reached. But, increasing concurrent requests to 30, there is a tremendous degradation of performance due to the connection time that takes about 1 second. With few concurrent requests, instead, the connection time is about 150 ms (to be more precise, the first connection takes 1 second, all the following connections take about 150 ms). I can ensure that CloudSearch returns its response in less than 15 ms, so there is a problem somewhere in my connection pool.
Thank you!
The amount of threads/connections that are best for your implementation depend on that implementation (which you did not post), but here are some guidelines as requested:
If those threads never block at all, you should have as many threads as cores (Runtime.availableCores(), this will include hyperthread-cores). Simply because more than 100% CPU usage isn't possible.
If your threads rarely block, cores * 2 is a good start for benchmarking.
If your threads frequently block, you absolutely need to benchmark your application with various settings to find the best solution for your implementation, OS and hardware.
Now the most optimal case is obviously the first one, but to get to this one, you need to remove blocking from your code as much as you can. Java can do this for IO operations if you use the NIO package in non-blocking mode (which is not how the Apache package does it).
Then you have 1 thread that waits on a selector and awakes as soon as any data is ready to be sent or read. This thread then only copies the data from it's source to the destination and returns to the selector. In case of a read (incoming data), this destination is a blocking queue, on which core amount of threads wait. One of those threads will then pull out the received data and process it, now without any blocking.
You can then use the length of the blocking queue to adjust how many parallel requests are reasonable for your task and hardware.
The first connection takes >1 second, because it actually has to look-up the address via DNS. All other connections are put on hold for the moment, as there is no sense in doing this twice. You can circumvent that by either calling the IP (probably not good if you talk to a load-balancer) or by "warming-up" the connections with an initial request. Any new connection afterwards will use the cached DNS result, but still needs to perform other initializations, so reusing connections as much as you can will reduce latency a lot. With NIO this is a very easy task.
In addition there are HTTP-multi-requests, that is: you make one connection but request several URLs in one request and get several responses over "the same line". This massively reduces connection overhead, but needs to be supported by the server.
Can you please explain the two methodologies which has been implemented in various servlet implementations:
Thread per connection
Thread per request
Which of the above two strategies scales better and why?
Which of the above two strategies scales better and why?
Thread-per-request scales better than thread-per-connection.
Java threads are rather expensive, typically using a 1Mb memory segment each, whether they are active or idle. If you give each connection its own thread, the thread will typically sit idle between successive requests on the connection. Ultimately the framework needs to either stop accepting new connections ('cos it can't create any more threads) or start disconnecting old connections (which leads to connection churn if / when the user wakes up).
HTTP connection requires significantly less resources than a thread stack, although there is a limit of 64K open connections per IP address, due to the way that TCP/IP works.
By contrast, in the thread-per-request model, the thread is only associated while a request is being processed. That usually means that the service needs fewer threads to handle the same number of users. And since threads use significant resources, that means that the service will be more scalable.
(And note that thread-per-request does not mean that the framework has to close the TCP connection between HTTP request ...)
Having said that, the thread-per-request model is not ideal when there are long pauses during the processing of each request. (And it is especially non-ideal when the service uses the comet approach which involves keeping the reply stream open for a long time.) To support this, the Servlet 3.0 spec provides an "asynchronous servlet" mechanism which allows a servlet's request method to suspend its association with the current request thread. This releases the thread to go and process another request.
If the web application can be designed to use the "asynchronous" mechanism, it is likely to be more scalable than either thread-per-request or thread-per-connection.
FOLLOWUP
Let's assume a single webpage with 1000 images. This results in 1001 HTTP requests. Further let's assume HTTP persistent connections is used. With the TPR strategy, this will result in 1001 thread pool management operations (TPMO). With the TPC strategy, this will result in 1 TPMO... Now depending on the actual costs for a single TPMO, I can imagine scenarios where TPC may scale better then TPR.
I think there are some things you haven't considered:
The web browser faced with lots of URLs to fetch to complete a page may well open multiple connections.
With TPC and persistent connections, the thread has to wait for the client to receive the response and send the next request. This wait time could be significant if the network latency is high.
The server has no way of knowing when a given (persistent) connection can be closed. If the browser doesn't close it, it could "linger", tying down the TPC thread until the server times out the connection.
The TPMO overheads are not huge, especially when you separate the pool overheads from the context switch overheads. (You need to do that, since TPC is going to incur context switches on a persistent connections; see above.)
My feeling is that these factors are likely to outweigh the TPMO saving of having one thread dedicated to each connection.
HTTP 1.1 - Has support for persistent connections which means more than one request/response can be received/sent using the same HTTP connection.
So to run those requests received using the same connection in parallel a new Thread is created for each request.
HTTP 1.0 - In this version only one request was received using the connection and the connection was closed after sending the response. So only one thread was created for one connection.
Thread per connection is the Concept of reusing the same HTTP Connection from multiple requests(keep-alive).
Thread per requestwill create a thread for each request from a client.Server can create a number of threads as per request.
Thread per request will create a thread for each HTTP Request the server receives .
Thread per connection will reuse the same HTTP Connection from multiple requests(keep-alive) .AKA HTTP persistent connection
but please note that this supported only from HTTP 1.1
Thread Per Request is faster as most web container use Thread Pooling.
The number of maximum parallel connections you should set on the number of cores on your server.
More cores => more parallel threads .
See here how to configure...
Tomcat 6: http://tomcat.apache.org/tomcat-6.0-doc/config/executor.html
Tomcat 7: http://tomcat.apache.org/tomcat-7.0-doc/config/executor.html
Example
Thread per request should be better, because it reuses threads, while some client may be idle. If you have a lot of concurrent users, they could be served with a less number of threads and having equal number of thread would be more expensive. There is also one more consideration - we do not know if user is still working with application, so we cant know when to destroy a thread. With a thread per request mechanism, we just use a thread pool.
Let's say I'm running a server, and set client SocketChannels that I accept as non blocking, and read them through a thread pool's threads. But what does that buy me? I anyway need to read the full client request before processing it, which means I need to make multiple read calls.
I've also come across articles saying that threads should block naturally so it gives a chance to other threads to run. However this won't happen in the aforementioned case as these threads will not block.
So how would non blocking IO be efficient? How to make sense of this all? Some multi-core CPU angle to it perhaps? But how?
EDIT: found a pretty good link that explains it programmatically:
http://rox-xmlrpc.sourceforge.net/niotut/
The problem using blocking IO starts when you want to scale your server program. You'd have to hold a blocking thread-per-request. Many many requests will introduce man many threads. This might make some hard time for a server application that serves thousands and more of IO involving concurrent requests.
Using nio non-blocking IO, this request-to-thread coupling is redundant. You can use any thread to complete the IO operation of any request. This lets you use the great pooling pattern for your IO handling threads, and decrease significantly the thread creation and management overhead. On the other hand, you'd have to work harder to sustain data consistency, but that would be the price of scalability.
Unless you want to use busy waiting (which sounds unlikely) if you want to use non-blocking you usually use a small number of threads (may be only one) and a Selector.
If you are going to use blocking IO, that is when you dedicate one or two threads per connection.