An Angular 4 application sends a list of records to a Java Spring MVC application deployed in a WebSphere 8 servlet container. The list is inserted into a temp table. After the batch insert, a stored procedure is called to do some calculations and return results. Depending on the size of the list inserted into the temp table, this may take anywhere between 3,000 ms (N ~ 500), 6,000 ms (N ~ 1000), and 50,000+ ms (N > 2000).
My approach would be to split the data into chunks and send them to the database for processing simultaneously. After the threads (Futures) return their results, I would aggregate them and send the response back to the client. To sum up, I would split a single synchronous call into multiple asynchronous, simultaneously executed processes, and return the result to the client over the same thread that initiated the HTTP call, i.e. the one that landed in my controller.
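Concretely, here is a minimal sketch of what I have in mind, using ExecutorService and Future; the `Record`/`Result` types, the pool size of 4, and `callStoredProcedure` are placeholders for our real code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedProcessor {

    // Placeholder types standing in for the real record/result classes.
    static class Record {}
    static class Result {}

    private static final int CHUNK_SIZE = 500;
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    public List<Result> process(List<Record> records) throws Exception {
        // Split the incoming list into fixed-size chunks, one task per chunk.
        List<Callable<List<Result>>> tasks = new ArrayList<>();
        for (int i = 0; i < records.size(); i += CHUNK_SIZE) {
            final List<Record> chunk =
                    records.subList(i, Math.min(i + CHUNK_SIZE, records.size()));
            tasks.add(() -> callStoredProcedure(chunk));
        }

        // Run the chunks simultaneously; invokeAll blocks the request thread
        // until every task has completed, then we aggregate the results.
        List<Result> aggregated = new ArrayList<>();
        for (Future<List<Result>> f : pool.invokeAll(tasks)) {
            aggregated.addAll(f.get());
        }
        return aggregated;
    }

    private List<Result> callStoredProcedure(List<Record> chunk) {
        // Insert the chunk into the temp table and call the procedure here.
        return Collections.emptyList();
    }
}
```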
Everything would be fine and I would not be asking this question if a more experienced colleague of mine did not strongly disagree with this approach. His reasoning is that this approach is prone to exceptions due to thread interrupts, timeouts, semaphores, and so on. He goes as far as saying that multithreading should be avoided within a web container because it can crash the servlet container if it runs out of threads.
He proposes that we should have the browser send multiple AJAX requests and aggregate/present the data in chunks.
Can you please help me understand which approach is better and why?
I would say that your approach is much better.
Threads created by application logic aren't application container threads; they are limited only by the operating system. Each AJAX request, on the other hand, uses a thread from the application container, so the second approach reduces throughput and increases the chance of hitting the container's thread limit, while the first one does not. Performance should also be considered: it's much cheaper to create a thread than to send a request over the network, and each network request uses additional resources for authentication, authorization, encryption, etc.
It's definitely harder to write correct multithreaded code, and it is error-prone. However, that shouldn't stop you, because concurrency can significantly improve your performance. It's pretty straightforward to handle interrupts and timeouts using Future, and you certainly don't need semaphores here.
Exposing this logic to the client looks like a breach of encapsulation. Imagine a REST API that forces you to split your data into chunks and send multiple requests: what chunk size should I use? How do I deal with timeouts and interrupts? How many requests should I send? And so on. You will face almost the same challenges with both approaches, but it's much easier to deal with them using libraries designed for exactly this, such as ExecutorService and Future.
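To illustrate how straightforward the timeout/interrupt handling is, a minimal sketch (the 10-second limit and the sleeping task are arbitrary stand-ins):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<String> future = pool.submit(() -> {
            Thread.sleep(500); // stands in for a slow DB call
            return "done";
        });
        try {
            // Wait at most 10 seconds for the result.
            System.out.println(future.get(10, TimeUnit.SECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker thread
            System.err.println("chunk timed out, cancelled");
        } catch (ExecutionException e) {
            System.err.println("chunk failed: " + e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```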
Related
I use AWS API Gateway integrated with AWS Lambda (Java), but I'm seeing some serious problems with this approach. The concept of removing the server and having your app scale out of the box is really nice, but here are the problems I'm facing. My lambda does two simple things: validate the payload received from the client, then send it to a Kinesis stream for further processing by another lambda. (You will ask why I don't send directly to the stream and use only one lambda for all of the operations. Let's just say that I want to separate the logic, have a layer of abstraction, and also be able to tell the client that he's sending invalid data.)
In the implementation of the lambda I integrated Spring DI. So far so good. I started performance testing: I simulated 50 concurrent users making 4 requests each, with 5 seconds between requests. So what happened: on the lambda's cold start I initialize the Spring application context, but having so many simultaneous requests while the lambda is not yet started seems to do some strange things. Here's a screenshot of the times the context took to initialize.
What we can see from the screenshot is that the context initialization times vary widely. My assumption of what is happening is that when so many requests arrive and there is no "active" lambda, it initializes a lambda container for each of them and at the same time "blocks" some of them (the ones with the big times of 18s) until the others that already started are ready. So maybe it has some internal limit on the number of containers it can start at the same time. The problem is that if you don't have evenly distributed traffic, this will happen from time to time and some requests will time out. We don't want that to happen.
So the next thing was to run some tests without the Spring container, as my thought was "OK, the initialization is heavy, let's just use plain old Java object initialization." Unfortunately, the same thing happened (maybe it just shaved the ~3s of container initialization off some of the requests). Here is a more detailed screenshot of the test data:
So I logged the whole lambda execution time (from construction to the end), the Kinesis client initialization, and the actual sending of the data to the stream, as these are the heaviest operations in the lambda. We still have these big times of 18s or so, but the interesting thing is that the times are somehow proportional. If the whole lambda takes 18s, around 7-8s is the client initialization, 6-7s is sending the data to the stream, and the remaining 4-5s is spent on the other operations in the lambda, which for the moment is only validation. On the other hand, if we take one of the small times (which means it reuses an already started lambda), e.g. 820ms, the Kinesis client initialization takes 100ms, the data sending 340ms, and the validation 400ms. So this again pushes me towards the thought that internally it sleeps because of some limits. The next screenshot shows what happens on the next round of requests, when the lambda is already started:
So we don't have these big times; yes, we still have a relatively big delta in some of the requests (which is also strange to me), but things look much better.
So I'm looking for clarification from someone who actually knows what is happening under the hood, because this is not good behavior for a serious application that uses the cloud precisely for its "unlimited" possibilities.
And another question relates to another lambda limit: 200 concurrent invocations across all lambdas within an account in a region. For me this is also a big limitation for a large application with lots of traffic. Since my business case at the moment (I don't know about the future) is more or less fire-and-forget, I'm starting to think of changing the logic so that the gateway sends the data directly to the stream and the other lambda takes care of the validation and further processing. Yes, I lose the current abstraction (which I don't need at the moment), but I increase the application's availability many times over. What do you think?
The lambda execution time spikes to 18s because AWS launches new containers with your code to handle the incoming requests. The bootstrap time is ~18s.
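A common mitigation, sketched below under the assumption that you use the AWS SDK v1 Kinesis client, is to create heavy objects once per container (e.g. in a static field) so that only the first request on a fresh container pays the initialization cost; the stream name and partition key here are placeholders:

```java
import java.nio.ByteBuffer;

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordRequest;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class IngestHandler implements RequestHandler<String, String> {

    // Created once per container (during cold start) and reused by every
    // subsequent invocation routed to this container.
    private static final AmazonKinesis KINESIS =
            AmazonKinesisClientBuilder.defaultClient();

    private static final String STREAM = "my-stream"; // placeholder stream name

    @Override
    public String handleRequest(String payload, Context context) {
        // validation would go here
        PutRecordRequest request = new PutRecordRequest()
                .withStreamName(STREAM)
                .withPartitionKey("shard-key") // placeholder partition key
                .withData(ByteBuffer.wrap(payload.getBytes()));
        KINESIS.putRecord(request);
        return "OK";
    }
}
```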
Assigning more RAM can significantly improve the performance of your lambda function, because with more RAM you also get more CPU and networking throughput!
"And another question relates to another lambda limit: 200 concurrent invocations across all lambdas within an account in a region."
You can ask AWS Support to increase that limit. I asked to increase it to 10,000 invocations/second and AWS Support did it quickly!
You can proxy straight to the Kinesis stream via API Gateway. You lose some control in terms of validation and transformation, but you won't have the cold-start latency you're seeing from Lambda.
You can use an API Gateway mapping template to transform the data, and if validation is important, you could potentially do it in the processing Lambda on the other side of the stream.
I am researching whether it is possible to have multiple threads write to Elasticsearch concurrently using the transport client and the bulk upload APIs. Specifically, I want to have multiple transport clients or bulk upload API instances, each running on its own thread, handling input to Elasticsearch. My specific reason for wanting this is to build a load-balancing algorithm that handles a very large number of JSON messages efficiently. I have been googling for some time and can't find any documentation on this type of thing, or anyone else asking similar questions. Additionally, I am new to Elasticsearch. Does anyone have any insight, some literature they could share, or a good place to start? Thanks.
An idea on how you can achieve this is to have a static class that acts as a wrapper for an Elasticsearch Client object. You can then spawn several threads from whatever code you are executing using an ExecutorService, which includes many utility methods, detailed in the link, that might help you manage your processing. These threads would then call into the static class to get the client object, prepare their bulk requests, and send them.
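A minimal sketch of that setup, assuming the 6.x transport client, a local node on port 9300, and a hypothetical index `my-index`:

```java
import java.net.InetAddress;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public class EsBulkDemo {

    // One shared client for the whole JVM; TransportClient is safe to use
    // from many threads at once.
    private static TransportClient client;

    static synchronized TransportClient client() throws Exception {
        if (client == null) {
            client = new PreBuiltTransportClient(Settings.EMPTY)
                    .addTransportAddress(new TransportAddress(
                            InetAddress.getByName("localhost"), 9300));
        }
        return client;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<String> docs = Arrays.asList("{\"msg\":\"a\"}", "{\"msg\":\"b\"}");

        // Each worker thread builds and sends its own bulk request.
        Future<BulkResponse> result = pool.submit(() -> {
            BulkRequestBuilder bulk = client().prepareBulk();
            for (String json : docs) {
                bulk.add(client().prepareIndex("my-index", "_doc")
                        .setSource(json, XContentType.JSON));
            }
            return bulk.get(); // synchronous send
        });

        System.out.println("failures: " + result.get().hasFailures());
        pool.shutdown();
    }
}
```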
If you are lazy, you can just have loops that execute indefinitely, with sleep calls to help prevent overloading.
A few caveats to watch out for:
1) Be very mindful of Elasticsearch's thread pool and queue sizes. Do not submit data to ES faster than your hardware can handle. If you submit data so fast that you overload the queue, bulk requests will be rejected. Do not increase the bulk queue size unless you need to and you know your hardware can keep up and prevent overload; increasing the queue size when you are running into roadblocks only delays the inevitable. If you are overloading the bulk queue, include a way to throttle requests in your code (see the sketch after this list).
2) Partition your bulk requests by type/index. I am not 100% sure how ES handles bulk requests under the hood, but I have noticed some inconsistent behavior in the queue size when shoving tons of requests to different indexes in one bulk request. It would make sense for Elasticsearch to partition the requests to prevent lots of useless seeks and to optimize shard/node traversal, but I have noticed that the queue size grows much quicker if you mix.
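On the throttling point in caveat 1, one option, assuming the same transport client, is to let `BulkProcessor` batch the work and retry rejected bulks with exponential backoff rather than hand-rolling sleep calls; a sketch:

```java
import org.elasticsearch.action.bulk.BackoffPolicy;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;

public class ThrottledBulk {

    public static BulkProcessor build(Client client) {
        return BulkProcessor.builder(client, new BulkProcessor.Listener() {
                    @Override
                    public void beforeBulk(long id, BulkRequest request) {}

                    @Override
                    public void afterBulk(long id, BulkRequest request, BulkResponse response) {
                        if (response.hasFailures()) {
                            System.err.println("bulk had failures: "
                                    + response.buildFailureMessage());
                        }
                    }

                    @Override
                    public void afterBulk(long id, BulkRequest request, Throwable failure) {
                        System.err.println("bulk failed: " + failure);
                    }
                })
                .setBulkActions(1000)     // flush every 1000 docs
                .setConcurrentRequests(1) // at most one bulk in flight
                // on rejection: wait 100ms, back off exponentially, 3 retries
                .setBackoffPolicy(BackoffPolicy.exponentialBackoff(
                        TimeValue.timeValueMillis(100), 3))
                .build();
    }
}
```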
In one of my interviews I was asked how servlets work, and I told them that the servlet container creates a thread for every request. He then asked: if we take a popular site like Facebook, which gets a huge number of requests, allocating a thread to each request wouldn't be a good approach, so how do they handle that many requests? I thought of a thread pool, but I don't know whether that is the right approach. Could someone please explain how so many requests are handled in a servlet container?
Two approaches here that complement each other:
Yes, limit the number of threads to a fixed number and pre-create them in a pool, thus avoiding the costly process of re-creating them every time. I think Apache's HTTP server works this way (see the sketch after this list).
You can always throw more machines at the problem. Large sites always use clusters of web servers, thus balancing the load.
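For the first approach, here is a minimal sketch of the pooled thread-per-request pattern; it uses plain sockets rather than a real servlet container, purely to illustrate how a fixed pool caps concurrency:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PooledServer {
    public static void main(String[] args) throws IOException {
        // A fixed pool: at most 200 requests are processed concurrently;
        // further requests queue up instead of spawning unbounded threads.
        ExecutorService pool = Executors.newFixedThreadPool(200);
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                Socket socket = server.accept();
                pool.submit(() -> handle(socket)); // reuse a pooled thread
            }
        }
    }

    private static void handle(Socket socket) {
        try (Socket s = socket) {
            // read the request from s and write a response here
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```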
I have a List which contains a lot of objects.
The problem is that I have to process these objects (processing includes cloning, deep copying, making DB calls, running business logic, etc.).
Doing this in a normal, first-come-first-served fashion is really time-consuming, and in a web application it generally results in transaction timeouts on the server side (as this processing is async from the client's perspective).
How do I process those objects so as to take minimal time and not overload the DB?
I'm using Java 7 in a server environment.
I'm already using a messaging solution, RabbitMQ, which gets me the item and its quantity. The problem occurs when I try to deep-copy items to mimic real items (per the business logic, every item should be uniquely processed) and save them to the DB.
After some discussions, the viable solution is to use an ABQ (ArrayBlockingQueue) processed by a pool of threads; a sketch follows the list of benefits below.
Following are the anticipated benefits:
1) We won't have to manage the third-party queues we created, e.g. RabbitMQ.
2) At any point in time the blocking queue won't hold all the items to be processed, since the consumer threads will be processing them simultaneously, so it will leave a smaller memory footprint.
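A minimal, Java 7-compatible sketch of the ABQ setup; the `Item` type, queue capacity, and pool size are placeholders:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AbqPipeline {

    static class Item {} // placeholder for the real item type

    // Bounded queue: producers block when it is full, which caps memory use.
    private final BlockingQueue<Item> queue = new ArrayBlockingQueue<Item>(1000);
    private final ExecutorService consumers = Executors.newFixedThreadPool(8);

    public void start() {
        for (int i = 0; i < 8; i++) {
            consumers.submit(new Runnable() {
                @Override
                public void run() {
                    try {
                        while (!Thread.currentThread().isInterrupted()) {
                            Item item = queue.take(); // blocks until an item arrives
                            process(item);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // exit cleanly
                    }
                }
            });
        }
    }

    public void submit(Item item) throws InterruptedException {
        queue.put(item); // blocks if the queue is full (natural back-pressure)
    }

    private void process(Item item) {
        // deep copy, run business logic, and save to the DB here
    }
}
```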
@cody123 I'm using Spring Batch for retry mechanisms in this case.
After another round of profiling I found that the bottleneck was the DB connection pool, which had a low maximum number of connections.
I deduced this by running the same transaction without the DB connection pool; it went perfectly well and completed without any exception.
However, combining the previous approach, i.e. managing an ABQ with light commits, with an HA DB will be the best solution.
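For reference, if the connection pool ceiling is the bottleneck, raising it is usually one configuration line. The pool implementation wasn't named above, so this example only assumes HikariCP with a placeholder JDBC URL and credentials:

```java
import javax.sql.DataSource;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolConfig {
    public static DataSource dataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/app"); // placeholder URL
        config.setUsername("app");    // placeholder credentials
        config.setPassword("secret");
        // Size the pool to match the DB's capacity and the consumer thread
        // count; too small a value serializes the worker threads.
        config.setMaximumPoolSize(32);
        return new HikariDataSource(config);
    }
}
```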
In the Apache HttpComponents documentation there is this statement:
"Contrary to the popular belief, the performance of NIO in terms of raw data throughput is significantly lower than that of blocking I/O."
Is that true? Can someone explain this in more detail? And what is a typical use case where "request / response handling needs to be decoupled"?
Non-blocking IO should be used when you can handle the request, dispatch it for processing on some other execution context (a different thread, an RPC call to another server, some other async mechanism), and release the web server's thread to handle more incoming requests. When the processing of the response completes, a response-handling thread is invoked, and it sends the response to the client.
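In servlet terms, a minimal sketch of this decoupling using the Servlet 3.0 async API; the worker pool size and `doHeavyWork` are placeholders:

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import javax.servlet.AsyncContext;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/work", asyncSupported = true)
public class AsyncServlet extends HttpServlet {

    private final ExecutorService worker = Executors.newFixedThreadPool(16);

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Detach from the container thread; it returns to the pool immediately.
        final AsyncContext ctx = req.startAsync();
        worker.submit(new Runnable() {
            @Override
            public void run() {
                try {
                    String result = doHeavyWork(); // placeholder for the slow part
                    ctx.getResponse().getWriter().write(result);
                } catch (IOException e) {
                    // log the failure, then complete below
                } finally {
                    ctx.complete(); // send the response to the client
                }
            }
        });
    }

    private String doHeavyWork() {
        return "done";
    }
}
```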
I would recommend reading the Netty documentation for a better understanding of the concept.
As for higher throughput: when your server sends/receives large amounts of data, all those context switches and the passing of data between threads can really hurt overall performance. Think of it like this: you receive a large request (a PUT with a large file). All you need to do is save it to disk and return OK. Tossing it between threads could add a few more memory-copy operations than would have been needed if you had just written it to disk on the same thread. And handling this operation in an async manner would not improve performance: although you could release the request-handling thread back to the web server's thread pool and let it process other requests, your main performance bottleneck is disk IO, and in this case, trying to save more files simultaneously would only make things slower.
I hope I was clear enough. Please feel free to ask more questions in comments if you need more explanations.
The first statement is true only when the number of concurrent requests is relatively small (in the tens rather than thousands). It's all about using many threads (blocking) instead of one or a few threads (non-blocking). Say you want to write an application that only downloads a file from a remote server. If your application needs to download only one file at a time, you need only one thread. But if you have a crawler running thousands of HTTP requests, then you need thousands of threads (or a limited number of threads plus NIO instead). For such a large number of threads the problem is context switching, which can slow down your application dramatically (so for that number of concurrent requests, NIO is better).
But let's get back to your question: why can NIO be slower in terms of raw data throughput? The reason is the amount of CPU time used by NIO-driven applications. In the blocking model your code does only one thing: wait for data (it executes the recv() operation in a loop). In an NIO application the logic is much more complicated: in a loop, the code uses the selector to select a set of keys (which involves the epoll_wait system call on Linux with the Oracle JVM), then iterates through the set, picks up a channel for each key, and then reads the data from the channel (a read() operation in the OS). In the standard blocking model, all you do is execute the recv() system function. In summary: an NIO-driven application in this case uses more CPU time and generates more mode-switch operations due to the higher number of system calls (by mode switch I mean the switch from user to kernel mode). Therefore the time needed to download the file will be higher.
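To make the difference concrete, here is a sketch contrasting the two loops described above; both are skeletal stand-ins, not complete clients:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class IoLoops {

    // Blocking model: one thread, one system call (recv) per iteration.
    static void blockingLoop(Socket socket) throws IOException {
        InputStream in = socket.getInputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            // consume n bytes
        }
    }

    // NIO model: selector bookkeeping plus the read itself; more system
    // calls (select/epoll_wait + read), hence more user/kernel mode switches.
    static void nioLoop(Selector selector) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(8192);
        while (true) {
            selector.select(); // epoll_wait on Linux
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isReadable()) {
                    SocketChannel channel = (SocketChannel) key.channel();
                    buf.clear();
                    channel.read(buf); // the actual read()
                    // consume buf
                }
            }
        }
    }
}
```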