Asynchronous JAX-RS - java

My question is about how asynchronous JAX-RS works; the feature is new to me and I'm trying to grasp its advantage.
What I understand is that the client sends a request, the request is delegated from the request thread to a worker thread, and once processing is complete the response is sent back to the client via the AsyncResponse. What I've also understood is that throughout the process the client waits for a response from the server. (So as far as the client is concerned, it's the same as a normal synchronous request.)
The documentation also states that the request is handed off from the request thread to a worker thread for further processing, and that with this approach the I/O threads are free to accept new connections.
What I did not understand is this: the client is still waiting for a response, so an active connection is still maintained between the client and the server. Is this connection not maintained by an I/O thread? What does "suspended" mean here?
Also, even if the I/O thread is released because the processing is delegated to a worker, the connection with the client is still open, so how can the server accept more and more connections?
My next question is about the thread pools used here. Are the I/O threads and the worker threads from different pools? Are the worker/processor threads not coming from a pool managed by the server?
Because I fail to understand this, my next thought is: with the client connection still open, isn't having separate pools for I/O and for processing the same as having the I/O thread blocked while it does the processing?
I haven't grasped this concept very well.

The thread pools in use in this scenario are:
1. The request processing pool, managed by the JAX-RS container (e.g. Jersey), and
2. The worker thread pool, typically managed by your own code.
There is possibly an IO thread associated with the connection, but that's an implementation detail that doesn't affect this.
When you use AsyncResponse, as soon as you return from your handle method, the request processing thread (from pool #1) is freed and can be used by the container to handle another request.
On to your questions:
"... how can the server accept more and more connections?" - it can accept more connections than if you do not use AsyncResponse for long-running requests, because you are freeing up one of your limited resources (threads in thread pool #1). The connection itself is not freed, and connections are also limited resources, so you can run out of those still (as well as possibly being limited by CPU or memory).
"are the worker/processor threads not coming from a pool managed by the server?" - not normally. See the example in the link from Paul Samsotha here - the application code creates a new thread itself. This is probably not how you would do it, you would most likely want to use an ExecutorService or similar, but the point is that you manage the "worker thread" yourself. The exception here is if you use Jersey's #ManagedAsync annotation.
Your last question should be answered by my answer to #1 - at a lower level there is still a resource associated with a connection, even if you are using AsyncResponse, but AsyncResponse does free up the container's request processing threads, which can be more limited in number than the maximum number of connections. You may choose to handle this problem by changing the server configuration instead of using AsyncResponse, but AsyncResponse has two advantages - it is under the application's control, and it is per-request instead of per-server.
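To make the second point concrete, here is a minimal sketch (not the example from the linked answer) of a resource that suspends the request with AsyncResponse and completes it from an application-managed ExecutorService; the resource path, parameter and helper method are invented for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.container.AsyncResponse;
import javax.ws.rs.container.Suspended;

@Path("/orders")
public class OrderResource {

    // Worker pool managed by the application, not by the JAX-RS container.
    private static final ExecutorService WORKERS = Executors.newFixedThreadPool(10);

    @GET
    @Path("/{id}")
    public void getOrder(@PathParam("id") String id,
                         @Suspended AsyncResponse asyncResponse) {
        // Returning from this method frees the container's request-processing thread.
        WORKERS.submit(() -> {
            try {
                String result = loadOrderSlowly(id);   // long-running work
                asyncResponse.resume(result);          // sends the response to the waiting client
            } catch (Exception e) {
                asyncResponse.resume(e);               // maps the error to a response
            }
        });
    }

    private String loadOrderSlowly(String id) throws InterruptedException {
        Thread.sleep(5_000);                           // stand-in for slow I/O
        return "order " + id;
    }
}
```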

Related

Tomcat Thread Model - Does a thread in Thread per request model handle all work related to that request?

I understand that Servlet containers use the "thread per request" model, but my question is, will the thread handling the request do all the steps below?
Obtain a thread from the pool to handle the request and pass the HTTP request and HTTP response objects to the Servlet's service method.
Invoke service/dao/ logic which could potentially involve delay since I/O operation is done in DB.
Return the Http response
Return the thread to the Container Thread pool
My main question is: if the I/O operation in step 2 takes a huge amount of time, will the Servlet container run out of threads from the pool? Or does the container use one thread just to handle the request and then delegate the work to another thread to do the rest? Also, I heard that nowadays they are changing the model to a threaded model with NIO operations? Thank you.
will the same thread be used for everything ?
TL;DR - No.
Once the Servlet Container (Catalina) spins up the thread for a request, that thread is deallocated/exited right after the request-response cycle is finished (that is, when the corresponding HTTP request handler Servlet method returns).
If your service (DAO/logic/whatever) layer blocks the thread, which eventually blocks the web layer (doGet(), doPost(), etc.), the browser will go idle, awaiting the response (for either the default or a configured time), and Catalina (the Servlet Container) will block that thread only (other requests may still arrive successfully);
The I/O (or, to be specific, request-response) timeout will either be the default (which is 60 seconds, but depends on the Tomcat version) or configured by you;
The design of delegating each discrete incoming HTTP message to a separate thread has a sole and simple purpose: to process the request-response cycles in isolation.
Head First Servlets & JSP:
The Container automatically creates a new Java thread for every servlet request it receives. When the servlet’s done running the HTTP service method for that client’s request, the thread completes (i.e. dies).
Update to your updated question
my question is, will the thread handling the request do all the below steps?
TL;DR - No again.
Servlet objects live in the container, which runs in a completely separate thread.
When the HTTP message (request, in this case) hits the Servlet-mapped endpoint, this happens:
Servlet Container creates HttpServletResponse and HttpServletRequest objects;
Container allocates (creates) a new thread for that request and response objects (important: in order to isolate client-server communication);
Container then passes those request and response objects to the servlet thread;
Container then calls the Servlet API's service() method and, depending on the type of the incoming message (GET, POST, etc.), it invokes the corresponding method (doGet(), doPost(), etc.);
Container DOES NOT CARE whatever levels or layers of architecture you have - DAO, Repository, Service, Cherry, Apple or whatever. It will wait until the corresponding HTTP request handler method finishes (accordingly, if something blocks it, the container will block that thread);
When the handler method returns, the thread is deallocated (a minimal sketch of this flow follows below).
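To make the blocking behaviour concrete, here is a minimal sketch of a servlet whose doGet() holds the container's request thread for as long as the work takes; the class name and the slow lookup are made-up stand-ins.

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class OrderServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // This runs on the thread the container allocated for this request.
        String order = slowDatabaseLookup(req.getParameter("id")); // blocks the request thread
        resp.getWriter().write(order);
        // Only when this method returns is the thread handed back to the pool.
    }

    private String slowDatabaseLookup(String id) {
        try {
            Thread.sleep(2_000); // stand-in for a slow DB / I/O call
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "order " + id;
    }
}
```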
Answering your further questions
My main question is, if somehow the I/O operation in step 2 takes a huge amount of time, will the Servlet container run out of threads from the pool?
Theoretically it can; however, that would mean all 200 threads are blocked at the same time, and in that case (if the default configuration is maintained) the container will not accept any other requests (until some thread is deallocated).
This, however, can be configured with the maxThreads attribute, and you can choose the threshold number of request-processing threads allowed in Tomcat.
Or does the Container use one thread/threads just to handle the request and then delegate the work to another thread to do the rest of the work?
We have answered this above.
Also I heard that nowadays they are changing the model to a Threaded Model with NIO operations?
NIO is a specific configuration, and it can facilitate poller threads, which are used to handle multiple connections per thread simultaneously; however, this is a big and completely different topic. For further reading, have a look at this and this.
PLEASE, make sure that your future posts are not too broad, containing 10 different questions in a single post.

Async HTTP request vs HTTP requests on new thread

I have 2 microservices (A and B).
A has an endpoint which accepts POST requests. When users make a POST request, this happens:
Service A takes the object from the POST request body and stores it in a database.
Service A converts the object to a different object. And the new object gets sent to service B via Jersey HTTP client.
Step 2 takes place on a Java thread pool I have created (Executors.newCachedThreadPool). By doing step 2 on a new thread, the response time of service A's endpoint is not affected.
However, if service B is taking long to respond, service A can potentially create too many threads when it is receiving many POST requests. To help fix this, I can use a fixed thread pool (Executors.newFixedThreadPool).
In addition to the fixed thread pool, should I also use an asynchronous non-blocking HTTP client? Such as the one here: https://hc.apache.org/httpcomponents-asyncclient-dev/. The Jersey HTTP client that I use is blocking.
It seems like it is right to use the async HTTP client. But if I switch to a fixed thread pool, I think the async HTTP client won't provide a significant benefit - am I wrong in thinking this?
Even if you use a fixed thread pool, all the threads in it will be blocked on step 2, meaning they won't do any meaningful work - they just wait for your API to return a response, which is not pragmatic resource management. In this case, you will only be able to handle a limited number of incoming requests, since the threads in the pool will always be busy instead of handling new requests.
In the case of a non-blocking client, you are blocking just a single thread (let's call it the dispatcher thread), which is responsible for sending and waiting for all the requests/responses. It runs in a "while loop" (you could call it an event loop) and checks whether the packets of a response have been received, so that worker threads can pick the response up.
In the latter scenario, you get a larger number of available threads ready to do some meaningful work, so your throughput will be increased.
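As a rough sketch of that non-blocking style, here is how the call to service B could look with the JDK 11 HttpClient (used here only because it ships with the JDK; the Apache async client mentioned in the question works on the same principle). The service-B URL, class and method names are assumptions for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AsyncForwarder {

    private final HttpClient client = HttpClient.newHttpClient();

    public void forwardToServiceB(String jsonBody) {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://service-b.internal/objects"))   // hypothetical endpoint
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();

        // sendAsync() returns immediately; no caller thread sits blocked waiting for service B.
        client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
              .thenAccept(resp -> System.out.println("Service B answered: " + resp.statusCode()))
              .exceptionally(ex -> { ex.printStackTrace(); return null; });
        // The calling thread returns here; the response is handled later on the
        // client's internal threads when service B replies.
    }
}
```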
The difference is that with a sync client, the service A thread will open a connection to the step 2 endpoint and wait for a response. Making the step 2 implementation async and just returning 200 directly (or whatever) will help decrease the waiting time, but the thread will still be making the connection and waiting for the response.
With a non-blocking client instead, the call itself will be done by another thread, so everything is untied from the service A thread. Also, the system can make use of that thread for other work until it gets a response from service B and needs to resume.
The idea is that your origin threads will not sit idle waiting for responses, but will instead be reused to do other work in between.
The reason to use a non-blocking HTTP-Client is to prevent too much CPU from being used on thread-switching. If you already solve that problem by limiting the amount of background threads, then non-blocking IO won't provide any noticeable benefits.
There is another problem with your setup: it is very vulnerable to DDOS attacks (intentional or accidental ones). If someone calls your service very often, it will internally create a huge workload that will keep the service busy for a long time. You will definitely need to limit the background task queue (which is a supported feature of ThreadPoolExecutor) and return 503 (or equivalent) if there are too many pending tasks.
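A hedged sketch of that suggestion - a fixed pool with a bounded queue whose rejections the endpoint can translate into HTTP 503 - might look like this; the pool and queue sizes are arbitrary examples, and sendToServiceB stands for the forwarding task.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedForwarder {

    private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
            10, 10,                              // fixed number of worker threads
            0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(100),       // bounded backlog of pending tasks
            new ThreadPoolExecutor.AbortPolicy() // throw instead of queueing without limit
    );

    /** Returns true if the task was accepted, false if the caller should answer 503. */
    public boolean submit(Runnable sendToServiceB) {
        try {
            pool.execute(sendToServiceB);
            return true;
        } catch (RejectedExecutionException tooBusy) {
            return false; // caller maps this to HTTP 503 Service Unavailable
        }
    }
}
```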

What is the difference between thread per connection vs thread per request?

Can you please explain the two methodologies which have been implemented in various servlet implementations:
Thread per connection
Thread per request
Which of the above two strategies scales better and why?
Which of the above two strategies scales better and why?
Thread-per-request scales better than thread-per-connection.
Java threads are rather expensive, typically using about 1MB of memory each (for the thread stack), whether they are active or idle. If you give each connection its own thread, the thread will typically sit idle between successive requests on the connection. Ultimately the framework needs to either stop accepting new connections (because it can't create any more threads) or start disconnecting old connections (which leads to connection churn if/when the user wakes up).
An HTTP connection requires significantly fewer resources than a thread stack, although there is a limit of roughly 64K open connections per IP address, due to the way that TCP/IP works.
By contrast, in the thread-per-request model, the thread is only associated while a request is being processed. That usually means that the service needs fewer threads to handle the same number of users. And since threads use significant resources, that means that the service will be more scalable.
(And note that thread-per-request does not mean that the framework has to close the TCP connection between HTTP requests ...)
Having said that, the thread-per-request model is not ideal when there are long pauses during the processing of each request. (And it is especially non-ideal when the service uses the comet approach which involves keeping the reply stream open for a long time.) To support this, the Servlet 3.0 spec provides an "asynchronous servlet" mechanism which allows a servlet's request method to suspend its association with the current request thread. This releases the thread to go and process another request.
If the web application can be designed to use the "asynchronous" mechanism, it is likely to be more scalable than either thread-per-request or thread-per-connection.
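For reference, a minimal sketch of the Servlet 3.0 asynchronous mechanism mentioned above could look like the following; the URL pattern, pool size and sleep are illustrative assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/slow", asyncSupported = true)
public class SlowServlet extends HttpServlet {

    private static final ExecutorService WORKERS = Executors.newFixedThreadPool(8);

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        AsyncContext ctx = req.startAsync();      // detach from the container's request thread
        WORKERS.submit(() -> {
            try {
                Thread.sleep(3_000);              // stand-in for a long pause
                ctx.getResponse().getWriter().write("done");
            } catch (Exception e) {
                // real code would log and set an error status here
            } finally {
                ctx.complete();                   // finish the response
            }
        });
        // Returning here frees the request thread to serve other requests.
    }
}
```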
FOLLOWUP
Let's assume a single web page with 1000 images. This results in 1001 HTTP requests. Further, let's assume HTTP persistent connections are used. With the TPR strategy, this will result in 1001 thread pool management operations (TPMO). With the TPC strategy, this will result in 1 TPMO... Now, depending on the actual cost of a single TPMO, I can imagine scenarios where TPC may scale better than TPR.
I think there are some things you haven't considered:
The web browser faced with lots of URLs to fetch to complete a page may well open multiple connections.
With TPC and persistent connections, the thread has to wait for the client to receive the response and send the next request. This wait time could be significant if the network latency is high.
The server has no way of knowing when a given (persistent) connection can be closed. If the browser doesn't close it, it could "linger", tying down the TPC thread until the server times out the connection.
The TPMO overheads are not huge, especially when you separate the pool overheads from the context switch overheads. (You need to do that, since TPC is going to incur context switches on a persistent connections; see above.)
My feeling is that these factors are likely to outweigh the TPMO saving of having one thread dedicated to each connection.
HTTP 1.1 - Has support for persistent connections which means more than one request/response can be received/sent using the same HTTP connection.
So, to run the requests received over the same connection in parallel, a new thread is created for each request.
HTTP 1.0 - In this version only one request was received using the connection and the connection was closed after sending the response. So only one thread was created for one connection.
Thread per connection is the concept of reusing the same HTTP connection for multiple requests (keep-alive).
Thread per request will create a thread for each request from a client. The server can create a number of threads, one per request.
Thread per request will create a thread for each HTTP request the server receives.
Thread per connection will reuse the same HTTP connection for multiple requests (keep-alive), a.k.a. an HTTP persistent connection,
but please note that this is supported only from HTTP 1.1.
Thread per request is faster, as most web containers use thread pooling.
You should set the maximum number of parallel connections based on the number of cores on your server.
More cores => more parallel threads.
See here how to configure...
Tomcat 6: http://tomcat.apache.org/tomcat-6.0-doc/config/executor.html
Tomcat 7: http://tomcat.apache.org/tomcat-7.0-doc/config/executor.html
Thread per request should be better, because it reuses threads while some clients may be idle. If you have a lot of concurrent users, they can be served with a smaller number of threads, and having an equal number of threads would be more expensive. There is also one more consideration - we do not know whether the user is still working with the application, so we can't know when to destroy a thread. With a thread-per-request mechanism, we just use a thread pool.

Servers and threading models

I am troubled by the following concept:
Most books/docs describe how robust servers are multithreaded and that the most common approach is to start a new thread to serve each new client, i.e. a thread is dedicated to each new connection. But how is this actually implemented in big systems? If we have a server that accepts requests from 100,000 clients, does it start 100,000 threads? Is this realistic? Aren't there limits on how many threads can run on a server? Additionally, doesn't the overhead of context switching and synchronization degrade performance? Is it implemented as a mix of queues and threads? In that case, is the number of queues fixed? Can anybody enlighten me on this, and perhaps give me a good reference that describes these?
Thanks!
The common method is to use thread pools. A thread pool is a collection of already created threads. When a new request gets to the server it is assigned a spare thread from the pool. When the request is handled, the thread is returned to the pool.
The number of threads in a pool is configured depending on the characteristics of the application. For example, if you have an application that is CPU bound you will not want too many threads since context switches will decrease performance. On the other hand, if you have a DB or IO bound application you want more threads since much time is spent waiting. Hence, more threads will utilize the CPU better.
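As a rough illustration of that trade-off, a commonly cited sizing heuristic (added here as an assumption, not part of the original answer) scales the pool with the ratio of wait time to compute time; the numbers below are made up.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    public static void main(String[] args) {
        // Heuristic: poolSize ~ cores * (1 + waitTime / computeTime),
        // so I/O-heavy workloads get more threads than CPU-bound ones.
        int cores = Runtime.getRuntime().availableProcessors();
        double waitTime = 50;      // ms spent waiting on DB/IO per task (illustrative)
        double computeTime = 5;    // ms of actual CPU work per task (illustrative)

        int poolSize = (int) (cores * (1 + waitTime / computeTime));
        System.out.println("suggested pool size: " + poolSize);

        ExecutorService dbPool = Executors.newFixedThreadPool(poolSize);
        dbPool.shutdown();
    }
}
```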
Google "thread pools" and you will for sure find much to read about the concept.
Also read up on the SEDA pattern.
In addition to the answers above, I should note that really high-performance servers with many incoming connections try not to spawn a thread per connection, but use I/O completion ports, select() and other asynchronous techniques for working with multiple sockets in one thread. And of course special attention must be paid to ensuring that problems with one request or one socket won't block other sockets in the same thread.
Also, thread management consumes CPU time, so threads should not be spawned for each connection or each client request.
In most systems a thread pool is used. This is a pool of available threads that wait for incoming requests. The number of threads can grow to a configured maximum number, depending on the number of simultaneous requests that come in and the characteristics of the application.
If a request arrives, an unoccupied thread is requested from the thread pool. This thread is then dedicated to handling the request until the request finishes. When that happens, the thread is returned to the thread pool to handle another request.
Since there is only a limited number of threads, in most server systems one should attempt to make the lifetime of requests as short as possible. The less time a request needs to execute, the sooner a thread can be reused for a new request.
If requests come in while all threads are occupied, most servers implement a queueing mechanism for requests. Of course the size of the queue is also limited, so when more requests arrive than can be queued, new requests will be denied.
One other reason for having a thread pool instead of starting a thread for each request is that starting a new thread is an expensive operation. It's better to have a number of threads started beforehand and to reuse them than to start new threads all the time.
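A bare-bones sketch of this model - one acceptor loop handing each accepted connection to a pre-created pool of worker threads - might look like this; the port and pool size are arbitrary, and real servers add queue limits and timeouts.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadPoolServer {

    public static void main(String[] args) throws IOException {
        ExecutorService pool = Executors.newFixedThreadPool(50); // pre-created worker threads

        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                Socket client = server.accept();          // acceptor thread only accepts
                pool.execute(() -> handle(client));       // a spare pool thread serves the client
            }
        }
    }

    private static void handle(Socket client) {
        try (client) {
            // read the request and write a response here; when this returns,
            // the thread goes back to the pool for the next request
        } catch (IOException e) {
            // log and drop the connection
        }
    }
}
```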
To get network servers to handle lots of concurrent connections there are several approaches (mostly divided up in "one thread per connection" and "several connections per thread" categories), take a look at the C10K page, which is a great resource on this topic, discussing and comparing a lot of approaches and linking to further resources on them.
Creating 10k threads is not likely to be efficient in most cases, but can be done and would work.
If you needed to serve 10k clients at once, doing so on a single machine would be unlikely but possible.
Depending on the client side implementation, it may be that the 10,000 clients do not need to maintain an open TCP connection - depending on the purpose, the protocol design can greatly improve the efficiency of implementation.
I think the appropriate solution for high scale systems is probably extremely domain-specific, and if you wanted a suggestion you'd have to explain more about your problem domain.

Logic for controlling concurrency in a block/method

1) My environment is a web application; I am developing a servlet to receive requests.
A) In some block/method I want to limit concurrency to no greater than 5.
B) If there are already 5 requests in that block, a newcomer must wait up to 60 seconds and then an error is thrown.
C) If there are more than 30 sleeping/waiting requests, the 31st request should be rejected with an error.
How do I do this?
2) (Optional question) From the above, I have to distribute the control logic to all clustered hosts.
I plan to use Hazelcast to share the control state (e.g. the current counter).
I see they provide BlockingQueue & ExecutorService but I have no idea how to use them in my case.
Please recommend if you have an idea.
For A take a look at this: http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/Semaphore.html
For B take a look at Object.wait() and Object.notify()
C should be easy if you have A and B.
The answers by @Roman and @David Soroko say how to do this within a servlet (as the OP asked).
However, this approach has the problem that Tomcat has to allocate a thread to each request so that they can participate in the queuing/timeout logic implemented by the servlet. Each of those threads uses memory and other resources. This does not scale well. And if you don't configure enough threads, requests will be either dropped by the Tomcat request dispatcher or queued/timed out using different logic.
An alternative approach is to use a non-servlet architecture in the webserver; e.g. Grizzly and more specifically Grizzly Comet. This is a big topic, and frankly I don't know enough about it to go deeply into the implementation details.
EDIT - In the servlet model, every request is allocated to a single thread for its entire lifetime. For example, in a typical "server push" model, each active client has an outstanding HTTP request asking the server for more data. When new data arrives in the server, the server sends a response and the client immediately sends a new request. In the classic servlet implementation model, this means that the server has to have a request "in progress" ... and a thread ... for each active client, even though most of the threads are just waiting for data to arrive.
In a scalable architecture, you would detach the request from the thread so that the thread could be used for processing another request. Later (e.g. when the data "arrived" in the "server push" example), the request would be attached to a thread (possibly a different one) to continue processing. In Grizzly, I understand that this is done using an event-based processing model, but I imagine that you could also use a coroutine-based model.
Try semaphores:
A counting semaphore. Conceptually, a semaphore maintains a set of permits. Each acquire() blocks if necessary until a permit is available, and then takes it. Each release() adds a permit, potentially releasing a blocking acquirer. However, no actual permit objects are used; the Semaphore just keeps a count of the number available and acts accordingly.
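A rough single-JVM sketch of requirements A-C using tryAcquire with a timeout plus a simple waiting counter might look like this; the clustered/Hazelcast variant is not shown, and the class name and exception type are illustrative assumptions (the limits come from the question).

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrencyGate {

    private final Semaphore permits = new Semaphore(5);       // A: at most 5 inside the block
    private final AtomicInteger waiting = new AtomicInteger();

    public void execute(Runnable criticalWork) throws InterruptedException {
        if (waiting.incrementAndGet() > 30) {                  // C: reject the 31st waiter
            waiting.decrementAndGet();
            throw new IllegalStateException("too many waiting requests");
        }
        try {
            // B: wait up to 60 seconds for a permit, then fail
            if (!permits.tryAcquire(60, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for a permit");
            }
        } finally {
            waiting.decrementAndGet();
        }
        try {
            criticalWork.run();                                // the guarded block/method
        } finally {
            permits.release();
        }
    }
}
```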
