In the Apache HttpComponents documentation there is a statement:
"Contrary to the popular belief, the performance of NIO in terms of raw data throughput is significantly lower than that of blocking I/O."
Is that true? Can someone explain this in more detail? And what is a typical use case where
request / response handling needs to be decoupled?
Non-blocking IO should be used when you can accept the request, dispatch it for processing on some other execution context (a different thread, an RPC call to another server, some other async mechanism) and release the web server's thread to handle more incoming requests. When processing completes, a response-handling thread is invoked and sends the response to the client.
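As a rough illustration of that hand-off, here is a minimal sketch using only the JDK. The class name `DecoupledHandler`, the pool sizes and the `process` method are made up for the example; they stand in for whatever your server and business logic actually provide.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class DecoupledHandler {
    // Hypothetical pools: one for slow processing, one for building/sending responses.
    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final ExecutorService responders = Executors.newFixedThreadPool(2);

    // Returns immediately, so the calling (web-server) thread can accept more requests.
    CompletableFuture<String> handle(String request) {
        return CompletableFuture
                .supplyAsync(() -> process(request), workers)               // runs on a worker thread
                .thenApplyAsync(result -> "200 OK: " + result, responders); // runs on a responder thread
    }

    // Stand-in for the real work (database call, RPC, and so on).
    private String process(String request) {
        return request.toUpperCase();
    }

    void shutdown() {
        workers.shutdown();
        responders.shutdown();
    }
}
```

The caller gets a `CompletableFuture` back right away; the response is produced later on a different thread than the one that accepted the request.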
I would recommend reading the Netty documentation for a better understanding of the concept.
As for higher throughput: when your server sends/receives large amounts of data, all those context switches and passing data between threads can really hurt overall performance. Think of it like this: you receive a large request (a PUT request with a large file). All you need to do is save it to disk and return OK. Tossing it between threads could result in a few more memory-copy operations than would have been needed had you just written it to disk in the same thread. Handling this operation asynchronously would not improve performance either: though you could release the request-handling thread back to the web server's thread pool and let it process other requests, your main performance bottleneck is disk IO, and in this case trying to save more files simultaneously would only make things slower.
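For the "just save it to disk in the same thread" case, a minimal sketch could look like the following. `UploadSaver` and `save` are hypothetical names; the request body is assumed to be available as an `InputStream`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class UploadSaver {
    // Streams the request body to disk on the calling thread and returns the byte count.
    // No hand-off to another thread, so no extra buffering or copying between threads.
    static long save(InputStream body, Path target) {
        try {
            return Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```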
I hope I was clear enough. Please feel free to ask more questions in comments if you need more explanations.
The first statement is true only when the number of concurrent requests is relatively small (tens rather than thousands). It's all about using many threads (blocking) instead of one or a few threads (non-blocking). Let's say you want to write an application which only downloads a file from a remote server. If your application needs to download only one file at a time, you need only one thread. But if you have a crawler which runs thousands of HTTP requests, then you need thousands of threads (or a limited number of threads plus NIO instead). With that many threads the problem is context switching, which can slow down your application dramatically (therefore NIO is better for this number of concurrent requests).
But let's get back to your question. Why can NIO be slower in terms of raw data throughput? The reason is the amount of CPU time used by NIO-driven applications. In the blocking model your code does only one thing: it waits for data (it executes the recv() operation in a loop). In an NIO application the logic is much more complicated: in a loop the code uses the selector to select a set of keys (which involves the epoll_wait system call on Linux with the Oracle JVM), then iterates through the set, picks up the channel for every key and then reads the data from the channel (read() operation in OS). In the standard blocking model all you do is call the recv() system function. In summary: an NIO-driven application in this case uses more CPU time and generates more mode switches because of the higher number of system calls (by mode switch I mean the switch from user mode to kernel mode). Therefore the time needed to download the file will be higher.
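To make the difference in per-read work concrete, here is a small sketch of both loops. It uses a `java.nio.channels.Pipe` instead of a socket so it runs without a network; `ReadLoops` and `pipeWith` are hypothetical names for the demo, and the sketch shows only the shape of the two loops, not a benchmark.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

class ReadLoops {
    // Blocking style: one read() call per chunk and nothing else.
    static long readBlocking(ReadableByteChannel ch) {
        try {
            ByteBuffer buf = ByteBuffer.allocate(8192);
            long total = 0;
            int n;
            while ((n = ch.read(buf)) != -1) {
                total += n;
                buf.clear();
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Selector style: select(), walk the key set, then read() for each ready key.
    static long readWithSelector(Pipe.SourceChannel ch) {
        try (Selector selector = Selector.open()) {
            ch.configureBlocking(false);
            ch.register(selector, SelectionKey.OP_READ);
            ByteBuffer buf = ByteBuffer.allocate(8192);
            long total = 0;
            boolean open = true;
            while (open) {
                selector.select(); // extra system call before every batch of reads
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isReadable()) {
                        int n = ((ReadableByteChannel) key.channel()).read(buf);
                        if (n == -1) {
                            key.cancel();
                            open = false;
                        } else {
                            total += n;
                        }
                        buf.clear();
                    }
                }
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: a pipe preloaded with n bytes whose write end is already closed.
    // Keep n small so the write does not block on a full pipe buffer.
    static Pipe.SourceChannel pipeWith(int n) {
        try {
            Pipe pipe = Pipe.open();
            ByteBuffer data = ByteBuffer.allocate(n);
            while (data.hasRemaining()) {
                pipe.sink().write(data);
            }
            pipe.sink().close();
            return pipe.source();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both methods return the same byte count; the selector version simply does more bookkeeping and at least one more system call per iteration to get there.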
Related
An Angular 4 application sends a list of records to a Java Spring MVC application that has been deployed in a WebSphere 8 servlet container. The list is then inserted into a temp table. After the batch insert, a procedure call is made in order to do some calculations and return results. Depending on the size of the list that was inserted into the temp table it may take anywhere between: 3000ms( N ~ 500 ), 6000ms( N ~ 1000 ), 50,000+ms ( N > 2000 ).
My approach would be to create chunks of data and simultaneously send them to the database for processing. After the threads (Futures) return results I would aggregate them and return them to the client. To sum up, I would split a synchronous call into multiple asynchronous processes (executed simultaneously) and return to the client over the same thread that initiated the HTTP call and landed in my controller.
Everything would be fine, and I would not be asking this question, if a more experienced colleague of mine did not strongly disagree with this approach. His reasoning is that this approach is prone to exceptions due to thread interrupts / timeouts / semaphores and so on. He goes as far as saying that multithreading should be avoided within a web container because it can crash the servlet container if it runs out of threads.
He proposes that we should have the browser send multiple AJAX requests and aggregate/present the data in chunks.
Can you please help me understand which approach is better and why?
I would say that your approach is much better.
Threads created by application logic aren't application container threads and are limited only by the operating system, whereas each AJAX request uses a thread from the application container. So the second approach reduces throughput and increases the chance of hitting the container's thread limit, while the first does not. Performance should also be considered, because it's much cheaper to create a thread than to send a request over the network. Plus each network request uses additional resources for authentication/authorization/encryption etc.
It's definitely harder to write correct multithreaded code, and it can easily be error-prone. However, that shouldn't stop you from doing it, because concurrency can significantly increase your performance. It's pretty straightforward to handle interrupts and timeouts using Future, and you certainly don't need semaphores here.
Exposing this logic to the client looks like a breach of encapsulation. Imagine that you use a REST API which forces you to send multiple requests by splitting your data into chunks. What chunk size should I use? How do I deal with timeouts/interrupts? How many requests should I send? etc. You will have almost the same challenges in both approaches, but it's much easier to deal with them using tools designed for this, such as ExecutorService and Future.
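A minimal sketch of the chunk-and-aggregate idea with `ExecutorService` and `Future` might look like this. The names `ChunkedProcessor` and `processChunk`, the pool size and the summing logic are placeholders; in the real application `processChunk` would do the temp-table insert and procedure call, and the pool would normally be a shared, container-managed one rather than created per request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ChunkedProcessor {
    // Splits records into chunks, processes them in parallel, aggregates on the calling thread.
    static long process(List<Integer> records, int chunkSize, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // assumed size, tune for your DB
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < records.size(); i += chunkSize) {
                List<Integer> chunk = records.subList(i, Math.min(i + chunkSize, records.size()));
                futures.add(pool.submit(() -> processChunk(chunk)));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                // Bounded wait per chunk: a stuck chunk surfaces as a TimeoutException.
                total += f.get(timeoutMillis, TimeUnit.MILLISECONDS);
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted while waiting for chunks", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("chunk failed or timed out", e);
        } finally {
            pool.shutdownNow(); // cancel anything still running
        }
    }

    // Stand-in for the real per-chunk work.
    static long processChunk(List<Integer> chunk) {
        long sum = 0;
        for (int v : chunk) {
            sum += v;
        }
        return sum;
    }
}
```

The controller thread blocks only in `Future.get`, so interrupts and timeouts are handled in one place and no semaphores are involved.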
I am researching if it is possible to have multiple threads output to elasticsearch concurrently using the transport client and bulk upload apis. Specifically, I want to have multiple transport clients or bulk upload api instances run on their own threads and handle input to elasticsearch. My specific reason for wanting to do this is so I can create a load balancing algorithm to handle a very large number of json messages efficiently. I have been googling for some time and can't find any documentation on this type of thing, or anyone else asking similar questions. Additionally, I am new to elasticsearch. Does anyone have any insight on this, some literature they could share, or a good place to start? Thanks.
An idea on how you can achieve this is to have a static class that acts as a wrapper for an elastic Client object. You can then spawn several threads in whatever code you are executing using the ExecutorService. The ExecutorService includes many utility methods, detailed in the link, that might help you manage your processing. These threads would then call into the static class to get the client object when doing processing, prepare their bulk requests, and then send them.
If you are lazy, you can just have loops that execute indefinitely and have sleep calls to help prevent overloading.
A few caveats to watch out for:
1) Be very mindful of Elasticsearch's Thread pool and queue sizes. Do not submit data to ES faster than your hardware can handle. If you are submitting data to ES too fast such that you are overloading the queue, bulk requests will be aborted. Do not increase the bulk queue size unless you need to and know your hardware can keep up and prevent overload. Increasing the queue size if you are running into roadblocks will only delay the inevitable. If you are overloading the bulk, include a way to throttle requests in your code.
2) Partition up your bulk requests by type/index. I am not 100% sure how ES handles bulk requests under the hood, but I have noticed some inconsistent behavior in the queue size when shoving tons of requests to different indexes in one bulk request. It would make sense that Elasticsearch partitions up the requests to prevent tons of useless seqs and optimize shard/node traversal, but I have noticed that the queue size goes up much quicker if you mix.
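One way to build the throttling from caveat 1 into your own code is to cap the number of bulk requests in flight. The sketch below is an assumption-laden illustration: `ThrottledBulkSender` is a made-up class, and `sendBulk` is a hypothetical callback that would wrap whatever Elasticsearch client call you actually use; no Elasticsearch API is shown here.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class ThrottledBulkSender<T> {
    private final Semaphore inFlight;         // caps batches that are queued or being sent
    private final ExecutorService pool;
    private final Consumer<List<T>> sendBulk; // hypothetical: wraps the real bulk call

    ThrottledBulkSender(int maxInFlight, int threads, Consumer<List<T>> sendBulk) {
        this.inFlight = new Semaphore(maxInFlight);
        this.pool = Executors.newFixedThreadPool(threads);
        this.sendBulk = sendBulk;
    }

    // Blocks the producer once maxInFlight batches are pending, instead of flooding the cluster.
    void submit(List<T> batch) {
        inFlight.acquireUninterruptibly();
        pool.execute(() -> {
            try {
                sendBulk.accept(batch);
            } finally {
                inFlight.release();
            }
        });
    }

    // Stops accepting work and waits for pending batches; returns false on timeout or interrupt.
    boolean shutdownAndWait(long timeoutMillis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because the producer blocks in `submit`, back-pressure propagates upstream to whatever is reading the JSON messages, which is usually what you want when the cluster cannot keep up.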
I'm writing a Java TCP/HTTP server that needs to handle thousands of connections through a non-blocking I/O selector. So I'm trying to handle all connections inside the same selector thread, but some requests may take a long time to complete. What can I do in that situation? Go back to using threads?
There are three ways of doing this, not counting of course the old school one-thread-per-connection way, which as you know does not scale:
1) You basically use a concurrent queue (e.g. CoralQueue) to distribute the requests' work (not the requests themselves) to a fixed number of threads that will execute in parallel. Let's say you have 1000 simultaneous connections. Instead of having 1000 threads you can analyze how many available CPU cores your machine has and choose a much smaller number of threads. The flow would be:
request -> selector -> demux -> worker threads -> mux -> selector -> response
2) Like @David Schwartz said, you can make your outbound network calls through the same selector (and thread) that's receiving requests. They will be asynchronous network calls that would never block the selector main thread. You can see some source code for this solution in this article.
3) You can use a distributed system architecture, where the blocking operation is performed in a separate node. So your server would just pass asynchronous messages to the nodes responsible for the heavy-duty task, and wait but never block. For more information about how asynchronous message queues work you can check this article.
The bottom line is that if you are doing I/O you most certainly should go with option #2. If you are doing CPU computations, you most certainly should go with option #1. If you never want to care about that you should think in terms of a distributed system and go with option #3.
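The demux/mux flow of the first option can be sketched with plain JDK queues. To be clear, this uses `java.util.concurrent.LinkedBlockingQueue` as a stand-in, not CoralQueue, and the names `DemuxMuxSketch`, `run` and `handle` are invented for the example. The calling thread plays the role of the selector thread; a real selector loop would poll the result queue between `select()` calls rather than block on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class DemuxMuxSketch {
    // Distributes work to a fixed number of workers and collects the results.
    // Responses come back in completion order, not request order.
    static List<String> run(List<String> requests, int workerCount) {
        BlockingQueue<String> work = new LinkedBlockingQueue<>();    // demux side
        BlockingQueue<String> results = new LinkedBlockingQueue<>(); // mux side
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        results.put(handle(work.take()));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // interrupted means: shut down
                }
            });
            t.setDaemon(true);
            t.start();
            workers.add(t);
        }

        // "Selector thread": hand off the work, then gather one response per request.
        work.addAll(requests);
        List<String> responses = new ArrayList<>();
        try {
            for (int i = 0; i < requests.size(); i++) {
                responses.add(results.take()); // a real selector would use poll() here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            for (Thread t : workers) {
                t.interrupt();
            }
        }
        return responses;
    }

    // Stand-in for the long-running, CPU-bound part of a request.
    static String handle(String request) {
        return "done:" + request;
    }
}
```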
Disclaimer: I'm one of the developers of CoralQueue.
What are the common guidelines for configuring, in Java, an HTTP connection pool to support a huge number of concurrent HTTP calls to the same server? I mean:
max total connections
max default connection per route
reuse strategy
keep alive strategy
keep alive duration
connection timeout
....
(I am using Apache HttpComponents 4.3, but I am open to exploring other solutions)
In order to be more clear, this is my situation:
I developed a REST resource that needs to perform about 10 http calls to AWS CloudSearch in order to obtain search results to be collected in a final result (that I really cannot obtain through a single query).
The whole operation must take less than 0.25 seconds. So, I run http calls in parallel in 10 different threads.
During a benchmarking test, I noticed that with few concurrent requests (5) my objective is reached. But when increasing concurrent requests to 30, there is a tremendous degradation of performance due to the connection time, which takes about 1 second. With few concurrent requests, instead, the connection time is about 150 ms (to be more precise, the first connection takes 1 second, all the following connections take about 150 ms). I can confirm that CloudSearch returns its response in less than 15 ms, so there is a problem somewhere in my connection pool.
Thank you!
The number of threads/connections that is best for your implementation depends on that implementation (which you did not post), but here are some guidelines as requested:
If those threads never block at all, you should have as many threads as cores (Runtime.getRuntime().availableProcessors(), which includes hyperthreaded cores), simply because more than 100% CPU usage isn't possible.
If your threads rarely block, cores * 2 is a good start for benchmarking.
If your threads frequently block, you absolutely need to benchmark your application with various settings to find the best solution for your implementation, OS and hardware.
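As a starting point for such benchmarking, the first two guidelines can be written down as a small helper. `PoolSizing` and its methods are hypothetical names; the factor of 2 is just the rule of thumb from the list above, not a measured value.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolSizing {
    // Logical cores visible to the JVM (includes hyperthreads).
    static int cores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Never-blocking threads: one per core. Rarely-blocking threads: start at cores * 2.
    static int startingSize(boolean threadsSometimesBlock) {
        return threadsSometimesBlock ? cores() * 2 : cores();
    }

    // Builds a fixed pool with the starting size; adjust after measuring.
    static ExecutorService newPool(boolean threadsSometimesBlock) {
        return Executors.newFixedThreadPool(startingSize(threadsSometimesBlock));
    }
}
```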
Now the most optimal case is obviously the first one, but to get to this one, you need to remove blocking from your code as much as you can. Java can do this for IO operations if you use the NIO package in non-blocking mode (which is not how the Apache package does it).
Then you have 1 thread that waits on a selector and wakes up as soon as any data is ready to be sent or read. This thread then only copies the data from its source to the destination and returns to the selector. In case of a read (incoming data), this destination is a blocking queue, on which as many threads as you have cores wait. One of those threads will then pull out the received data and process it, now without any blocking.
You can then use the length of the blocking queue to adjust how many parallel requests are reasonable for your task and hardware.
The first connection takes >1 second, because it actually has to look-up the address via DNS. All other connections are put on hold for the moment, as there is no sense in doing this twice. You can circumvent that by either calling the IP (probably not good if you talk to a load-balancer) or by "warming-up" the connections with an initial request. Any new connection afterwards will use the cached DNS result, but still needs to perform other initializations, so reusing connections as much as you can will reduce latency a lot. With NIO this is a very easy task.
In addition there are HTTP-multi-requests, that is: you make one connection but request several URLs in one request and get several responses over "the same line". This massively reduces connection overhead, but needs to be supported by the server.
Let's say I'm running a server, and set client SocketChannels that I accept as non blocking, and read them through a thread pool's threads. But what does that buy me? I anyway need to read the full client request before processing it, which means I need to make multiple read calls.
I've also come across articles saying that threads should block naturally so it gives a chance to other threads to run. However this won't happen in the aforementioned case as these threads will not block.
So how would non blocking IO be efficient? How to make sense of this all? Some multi-core CPU angle to it perhaps? But how?
EDIT: found a pretty good link that explains it programmatically:
http://rox-xmlrpc.sourceforge.net/niotut/
The problem with blocking IO starts when you want to scale your server program. You'd have to hold a blocking thread per request. Many, many requests will introduce many, many threads. This can cause trouble for a server application that serves thousands or more concurrent requests involving IO.
With NIO non-blocking IO, this request-to-thread coupling is unnecessary. You can use any thread to complete the IO operation of any request. This lets you use the pooling pattern for your IO handling threads and significantly decrease the thread creation and management overhead. On the other hand, you'd have to work harder to maintain data consistency, but that is the price of scalability.
Unless you want to use busy waiting (which sounds unlikely), if you want to use non-blocking IO you usually use a small number of threads (maybe only one) and a Selector.
If you are going to use blocking IO, that is when you dedicate one or two threads per connection.