I have 2 microservices (A and B).
A has an endpoint which accepts POST requests. When users make a POST request, this happens:
1. Service A takes the object from the POST request body and stores it in a database.
2. Service A converts the object to a different object, and the new object gets sent to service B via the Jersey HTTP client.
Step 2 takes place on a Java thread pool I have created (Executors.newCachedThreadPool). By doing step 2 on a new thread, the response time of service A's endpoint is not affected.
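Roughly, the relevant part of service A looks like this today (a simplified sketch; the request/payload types, repository, convert() and the URL are placeholders):

    // imports: java.util.concurrent.*, javax.ws.rs.*, javax.ws.rs.client.*, javax.ws.rs.core.*
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final Client jerseyClient = ClientBuilder.newClient();

    @POST
    public Response handlePost(MyRequest body) {
        repository.save(body);                        // step 1: store the object in the database
        pool.submit(() -> {                           // step 2: convert and forward on a pool thread
            ServiceBPayload payload = convert(body);
            jerseyClient.target("http://service-b/events")
                        .request()
                        .post(Entity.json(payload));  // blocking: the pool thread waits for B's response
        });
        return Response.ok().build();                 // the endpoint responds without waiting for B
    }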
However, if service B is taking a long time to respond, service A can potentially create too many threads when it is receiving many POST requests. To help fix this, I can use a fixed thread pool (Executors.newFixedThreadPool).
In addition to the fixed thread pool, should I also use an asynchronous non-blocking HTTP client? Such as the one here: https://hc.apache.org/httpcomponents-asyncclient-dev/. The Jersey HTTP client that I use is blocking.
It seems like it is right to use the async HTTP client. But if I switch to a fixed thread pool, I think the async HTTP client won't provide a significant benefit - am I wrong in thinking this?
Even if you use a fixed thread pool, all of its threads will be blocked on step 2, meaning they won't do any meaningful work - they will just wait for the remote API to return a response, which is not pragmatic resource management. In that case you will only be able to handle a limited number of incoming requests, since the threads in the pool will always be busy instead of handling new requests.
With a non-blocking client, you block just one single thread (let's call it the dispatcher thread), which is responsible for sending and waiting on all of the requests/responses. It runs in a "while loop" (you could call it an event loop) and checks whether responses have arrived and are ready to be picked up by worker threads.
In the latter scenario you get a larger number of threads available to do meaningful work, so your throughput increases.
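For illustration, with an asynchronous client such as Apache HttpAsyncClient 4.x (the one linked in the question), the calling thread only registers a callback and moves on; the URL below is a placeholder:

    // imports: org.apache.http.HttpResponse, org.apache.http.client.methods.HttpPost,
    //          org.apache.http.concurrent.FutureCallback, org.apache.http.impl.nio.client.*
    CloseableHttpAsyncClient client = HttpAsyncClients.createDefault();
    client.start();

    client.execute(new HttpPost("http://service-b/events"), new FutureCallback<HttpResponse>() {
        @Override public void completed(HttpResponse response) { /* handle B's response */ }
        @Override public void failed(Exception ex)              { /* handle the failure */ }
        @Override public void cancelled()                       { /* handle cancellation */ }
    });
    // execute() returns immediately; a small internal I/O reactor waits for the response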
The difference is that with a synchronous client, service A's thread opens the connection to the step 2 endpoint and waits for the response. Making the step 2 endpoint's implementation asynchronous so that it just returns 200 (or whatever) right away will help decrease the waiting time, but A's thread is still the one making the connection and waiting for the response.
With a non-blocking client, the call itself is handled by another thread, so everything is untied from service A's thread. The system can also use that thread for other work until the response from service B arrives and the work needs to resume.
The idea is that your original threads do not sit idle waiting for responses, but are instead reused for other work in between.
The reason to use a non-blocking HTTP client is to prevent too much CPU from being spent on thread switching. If you already solve that problem by limiting the number of background threads, then non-blocking IO won't provide any noticeable benefit.
There is another problem with your setup: it is very vulnerable to DDOS attacks (intentional or accidental ones). If someone calls your service very often, it will internally create a huge workload that keeps the service busy for a long time. You will definitely need to limit the background task queue (which ThreadPoolExecutor supports via a bounded work queue and a rejection policy) and return 503 (or an equivalent) if there are too many pending tasks.
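A rough sketch of that idea (the pool and queue sizes are arbitrary; sendToServiceB() and payload stand in for your own code):

    // imports: java.util.concurrent.*
    ThreadPoolExecutor forwarder = new ThreadPoolExecutor(
            10, 10, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(100),               // bounded task queue
            new ThreadPoolExecutor.AbortPolicy());       // reject instead of growing without bound

    try {
        forwarder.execute(() -> sendToServiceB(payload));
    } catch (RejectedExecutionException e) {
        // queue is full: tell the caller to back off (inside the endpoint method)
        return Response.status(Response.Status.SERVICE_UNAVAILABLE).build();
    }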
Related
In a web scenario, I use Spring Boot to handle requests. For each request, the main thread sends several requests to other servers to acquire data (this step uses a thread pool to run asynchronously), and finally the main thread "get"s all the data and returns.
So I wonder what the advantages of Vert.x are in this scenario? Both approaches use multiple threads for asynchronous tasks - will performance be better if I replace the thread pool with Vert.x?
It is not the same: Vert.x does not create a new thread for each request.
Vert.x has something called an event loop, which is managed using a few Java threads.
The number of threads used by Vert.x increases as the number of verticles you deploy increases, but it will still be much lower.
Also, code in Vert.x is reactive and is executed based on events, so if you are waiting for an event (for example an HTTP response), your code will be invoked when the response is available and no thread is blocked waiting for it.
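As a rough sketch (assuming Vert.x 4 and the vertx-web-client module; host, port and path are placeholders), the send() call registers a handler and returns immediately, so no thread sits blocked waiting for the response:

    // imports: io.vertx.core.Vertx, io.vertx.ext.web.client.WebClient
    Vertx vertx = Vertx.vertx();
    WebClient client = WebClient.create(vertx);

    client.get(8080, "other-server", "/data")
          .send()                                                       // returns a Future immediately
          .onSuccess(resp -> System.out.println(resp.bodyAsString()))   // runs on the event loop later
          .onFailure(Throwable::printStackTrace);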
Check this for more information:
https://alexey-soshin.medium.com/understanding-vert-x-event-loop-46373115fb3e
My doubt is about how asynchronous JAX-RS works; it is fairly new to me and I'm trying to grasp its advantage.
What I understand is that the client sends a request, the request is delegated from the requestor thread to a worker thread, and once the processing is completed the response is sent back to the client using the AsyncResponse. What I've also understood is that throughout the process the client waits for a response from the server. (So as far as the client is concerned, it's the same as a normal synchronous request.)
It also states that the request is handed off to a worker thread for further processing, and that with this approach the I/O threads are free to accept new connections.
What I did not understand is this: the client is still waiting for a response, and therefore an active connection is still maintained between the client and the server. Is that connection not maintained by an I/O thread? What does "suspended" mean here?
Also, even if the I/O thread is released because the processing is delegated to a worker thread, the connection with the client is still open - so how can the server accept more and more connections?
My next question is about the thread pools used here. Are the I/O threads and worker threads from different pools? Are the worker/processor threads not coming from a pool managed by the server?
Because I haven't fully understood this, my next thought is: with the client connection still open, isn't having separate pools for I/O and for processing the same as having the I/O thread blocked while it does the processing itself?
I haven't grasped this concept very well.
The thread pools in use in this scenario are:
The request processing pool, managed by the JAX-RS container (e.g. Jersey), and
The worker thread pool, typically managed by your own code.
There is possibly an IO thread associated with the connection, but that's an implementation detail that doesn't affect this.
When you use AsyncResponse, as soon as you return from your handle method, the request processing thread (from pool #1) is freed and can be used by the container to handle another request.
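A minimal sketch of that flow (the worker pool and doLongRunningWork() stand in for your own code):

    // imports: javax.ws.rs.*, javax.ws.rs.container.*, java.util.concurrent.*
    @Path("slow")
    public class SlowResource {
        private final ExecutorService workers = Executors.newFixedThreadPool(4);   // pool #2, managed by you

        @GET
        public void get(@Suspended final AsyncResponse asyncResponse) {
            workers.submit(() -> asyncResponse.resume(doLongRunningWork()));        // pool #2 produces the response
            // returning here frees the container's request-processing thread (pool #1)
        }

        private String doLongRunningWork() { return "done"; }                       // placeholder for the real work
    }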
On to your questions:
"... how can the server accept more and more connections?" - it can accept more connections than if you do not use AsyncResponse for long-running requests, because you are freeing up one of your limited resources (threads in thread pool #1). The connection itself is not freed, and connections are also limited resources, so you can run out of those still (as well as possibly being limited by CPU or memory).
"are the worker/processor threads not coming from a pool managed by the server?" - not normally. See the example in the link from Paul Samsotha here - the application code creates a new thread itself. This is probably not how you would do it, you would most likely want to use an ExecutorService or similar, but the point is that you manage the "worker thread" yourself. The exception here is if you use Jersey's #ManagedAsync annotation.
Your last question should be answered by my answer to #1 - at a lower level there is still a resource associated with a connection, even if you are using AsyncResponse, but AsyncResponse does free up the container's request processing threads, which can be more limited in number than the maximum number of connections. You may choose to handle this problem by changing the server configuration instead of using AsyncResponse, but AsyncResponse has two advantages - it is under the application's control, and it is per-request instead of per-server.
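For completeness, the Jersey-managed variant mentioned above looks roughly like this; with @ManagedAsync, Jersey runs the method on its own internal executor instead of one you create:

    // imports: org.glassfish.jersey.server.ManagedAsync, javax.ws.rs.*, javax.ws.rs.container.*
    @GET
    @ManagedAsync
    public void get(@Suspended final AsyncResponse asyncResponse) {
        asyncResponse.resume(doLongRunningWork());   // already off the request-processing thread
    }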
We have a somewhat unique case where we need to interface with an outside API that requires us to long poll their endpoint for what they call real time events.
The thing is we may have as many as 80,000 people/devices hitting this endpoint at any given time, listening for events, 1 connection per device/person.
When a client makes a request from our Spring service to long poll for events, our service then in turn makes an async call to the outside API to long poll for events. The outside API has defined the minimum long poll timeout may be set to 180 seconds.
So here we have a situation where a thread pool with a queue will not work, because if we have a thread pool with something like (5 min, 10 max, 10 queue), then the 10 requests currently being worked on may hog the spotlight and the 10 in the queue will not get a chance until one of the current 10 is done.
We need serve-it-or-fail-it behavior (we will put load balancers etc. behind it), but we don't want to leave a client hanging without actual polling happening.
We have been looking into using DeferredResult for this, and returning that from the controller.
Something to the tune of
    @RequestMapping(value = "test/deferredResult", method = RequestMethod.GET)
    public DeferredResult<ResponseEntity> testDeferredResult() {
        final DeferredResult<ResponseEntity> deferredResult = new DeferredResult<ResponseEntity>();
        CompletableFuture.supplyAsync(() -> testService.test())
                .whenCompleteAsync((result, throwable) -> deferredResult.setResult(result));
        return deferredResult;
    }
I am questioning whether I am on the right path, and also whether I should provide an executor (and what kind of executor and configuration) to the CompletableFuture.supplyAsync() method to best accomplish our task.
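Concretely, I am wondering about something along these lines (the executor and its size, and the timeout value, are just placeholders):

    private final ExecutorService pollExecutor = Executors.newFixedThreadPool(50);   // placeholder sizing

    @RequestMapping(value = "test/deferredResult", method = RequestMethod.GET)
    public DeferredResult<ResponseEntity> testDeferredResult() {
        // timeout a bit above the 180-second upstream long poll
        final DeferredResult<ResponseEntity> deferredResult = new DeferredResult<>(200_000L);
        CompletableFuture.supplyAsync(() -> testService.test(), pollExecutor)
                .whenComplete((result, throwable) -> deferredResult.setResult(result));
        return deferredResult;
    }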
I have read various articles, posts, and such and am wanting to see if anyone has any knowledge that might help our specific situation.
The problem you are describing does not sound like one that can be solved nicely if you are using blocking IO. So you are on the right path, because DeferredResult allows you to produce the result using any thread, without blocking the servlet-container thread.
With regards to calling a long-polling API upstream, you need a NIO solution as well. If you use a Netty client, you can manage several thousand sockets using a single thread. When the NIO selector in Netty detects data, you get a channel callback that eventually delegates to a thread in the Netty worker thread pool, where you can call deferredResult.setResult. If you don't do blocking IO, the worker pool is usually sized to the number of CPU cores; otherwise you may need more threads.
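As a rough sketch of that shape, using the Netty-based AsyncHttpClient library (assumed here; the URL is a placeholder and error handling is minimal):

    // imports: org.asynchttpclient.*, org.springframework.http.ResponseEntity,
    //          org.springframework.web.context.request.async.DeferredResult
    AsyncHttpClient httpClient = Dsl.asyncHttpClient();     // backed by a small Netty event-loop group

    DeferredResult<ResponseEntity> deferredResult = new DeferredResult<>();
    httpClient.prepareGet("https://outside-api.example.com/events?timeout=180")
              .execute()
              .toCompletableFuture()
              .whenComplete((response, error) -> {
                  if (error != null) {
                      deferredResult.setErrorResult(error);
                  } else {
                      deferredResult.setResult(ResponseEntity.ok(response.getResponseBody()));
                  }
              });
    // no thread is parked for the 180-second poll; the callback runs on a Netty worker thread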
There are still a number of challenges.
You probably need more than one server (or network interface) since there are only 65K ports.
Sockets in Java do not have write timeouts, so if a client refuses to read data from the socket, and you send more data than your socket buffer can hold, you would block the Netty worker thread(s) and then everything would stop (a reverse slow-loris attack). This is a classic problem in large async setups, and one of the reasons for using frameworks like Hystrix (by Netflix).
I have a camel instance with a Netty endpoint that consolidates many incoming requests to send to a single receiver. More specifically, this is a web service whereby each incoming SOAP request results in a Producer.sendBody() into the camel subsystem. The processing of each request involves different routes, but they will all end up in the single Netty endpoint to send on to the next-level server. All is fine, as long as I only have a handful of incoming requests at any one time. If I start having more than 100 simultaneous requests, though, I get this exception:
java.lang.IllegalStateException: Queue full
at java.util.AbstractQueue.add(AbstractQueue.java:71) ~[na:1.6.0_24]
at java.util.concurrent.ArrayBlockingQueue.add(ArrayBlockingQueue.java:209) [na:1.6.0_24]
at org.apache.camel.impl.DefaultServicePool.release(DefaultServicePool.java:95) [camel-core-2.9.2.jar:2.9.2]
at org.apache.camel.impl.ProducerCache$1.done(ProducerCache.java:297) ~[camel-core-2.9.2.jar:2.9.2]
at org.apache.camel.processor.SendProcessor$2$1.done(SendProcessor.java:120) ~[camel-core-2.9.2.jar:2.9.2]
at org.apache.camel.component.netty.handlers.ClientChannelHandler.messageReceived(ClientChannelHandler.java:162) ~[camel-netty-2.9.2.jar:2.9.2]
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) ~[netty-3.3.1.Final.jar:na]
This is coming from the DefaultServicePool that's used by the Netty component. The DefaultServicePool uses an ArrayBlockingQueue as the backend to the queue and it sets it to a default capacity of 100 Producers. It uses a service pool for performance reasons, to avoid having to keep creating and destroying often-reused producers. Fair enough. Unfortunately, I'm not getting the logic on how it is implemented.
This all starts in ProducerCache::doInAsyncProducer, which starts off by calling doGetProducer. Said method attempts to acquire a Producer from the pool and, if that fails, it creates a new Producer using endpoint.getProducer(). It then makes sure that the service pool exists using pool.addAndAcquire. That done, it returns to the calling function. The doInAsyncProducer does its thing until it's finished, in which case it calls the done processor. At this point, we're completely done processing the exchange, so it releases the Producer back to the pool using pool.release
Here is where the rubber hits the road. The DefaultServicePool::release method inserts the Producer into the ArrayBlockingQueue backend using an add. This is where my java.lang.IllegalStateException is coming from.
Why? Well, let's look through a use case. I have 101 simultaneous incoming requests. Each of them hits the Netty endpoint at roughly the same time. The very first creates the service pool with the capacity of 100 but it's empty to start. In fact, each of the 101 requests will create a new Producer from the endpoint.getProducer; each will verify that they don't exceed the capacity of the service pool (which is empty); and each will continue on to send to the server. After each finishes, it tries to do a pool.release. The first 100 will succeed, since the pool capacity hasn't been reached. The 101st request will attempt to add to the queue and will fail, since the queue is full!
Is that right? If I'm reading that correctly, then this code will always fail whenever there are more than 100 simultaneous requests. My service needs to support upwards of 10,000 simultaneous requests, so that's just not going to fly.
It seems like a more stable solution might be to:
Pre-allocate all 100 Producers on initialization
Block during acquire until a Producer is available
Absolutely do not create your own non-pool Producers if using a ServicePool
In the meantime, I'm thinking of throttling incoming requests.
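Specifically, I'm thinking of something like Camel's Throttler EIP (inside a RouteBuilder; the endpoints and numbers here are made up):

    from("direct:incoming")
        .throttle(100)                                   // at most 100 exchanges...
        .timePeriodMillis(1000)                          // ...per second
        .to("netty:tcp://next-level-server:9000");       // placeholder for the single receiver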
What I'm hoping for with this question is to learn if I'm reading that logic correctly and to see if it can get changed. Or, am I using it wrong? Is there a better way to handle this type of thing?
Yes the logic should IMHO be improved. I have logged a ticket to improve this.
https://issues.apache.org/jira/browse/CAMEL-5703
1) My environment is a web application, and I am developing a servlet to receive requests.
A) In some block/method I want to limit concurrency to no more than 5.
B) If there are already 5 requests in that block, a new request must wait up to 60 seconds; if it still cannot enter, an error is thrown.
C) If there are already 30 sleeping/waiting requests, the 31st request gets an error.
How do I do this?
2) (Optional question) Building on the above, I have to distribute the control logic to all clustered hosts.
I plan to use Hazelcast to share the control state (e.g. the current counter).
I see they provide BlockingQueue and ExecutorService, but I have no idea how to use them in my case.
Please advise if you have an idea.
For A take a look at this: http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/Semaphore.html
For B take a look at Object.wait() and Object.notify()
C should be easy if you have A and B.
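Putting A, B and C together, a rough sketch (the limits are the ones from the question; the counting of waiters is simplified):

    // imports: java.util.concurrent.Semaphore, java.util.concurrent.TimeUnit,
    //          java.util.concurrent.atomic.AtomicInteger
    private static final Semaphore permits = new Semaphore(5);          // A: at most 5 inside the block
    private static final AtomicInteger waiting = new AtomicInteger();   // C: how many are currently waiting

    void guardedBlock() throws InterruptedException {
        if (waiting.incrementAndGet() > 30) {            // C: the 31st waiter is rejected straight away
            waiting.decrementAndGet();
            throw new IllegalStateException("too many waiting requests");
        }
        boolean acquired;
        try {
            acquired = permits.tryAcquire(60, TimeUnit.SECONDS);   // B: wait at most 60 seconds
        } finally {
            waiting.decrementAndGet();
        }
        if (!acquired) {
            throw new IllegalStateException("timed out waiting for a permit");
        }
        try {
            // ... the protected work goes here ...
        } finally {
            permits.release();
        }
    }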
The answers by @Roman and @David Soroko say how to do this within a servlet (as the OP asked).
However, this approach has the problem that Tomcat has to allocate a thread to each request so that they can participate in the queuing / timeout logic implemented by the servlet. Each of those threads uses memory and other resources. This does not scale well. And if you don't configure enough threads, requests will either be dropped by the Tomcat request dispatcher or queued / timed out using different logic.
An alternative approach is to use a non-servlet architecture in the webserver; e.g. Grizzly and more specifically Grizzly Comet. This is a big topic, and frankly I don't know enough about it to go deeply into the implementation details.
EDIT - In the servlet model, every request is allocated to a single thread for its entire lifetime. For example, in a typical "server push" model, each active client has an outstanding HTTP request asking the server for more data. When new data arrives in the server, the server sends a response and the client immediately sends a new request. In the classic servlet implementation model, this means that the server has to have a request "in progress" ... and a thread ... for each active client, even though most of the threads are just waiting for data to arrive.
In a scalable architecture, you would detach the request from the thread so that the thread could be used for processing another request. Later (e.g. when the data "arrived" in the "server push" example), the request would be attached to a thread (possibly a different one) to continue processing. In Grizzly, I understand that this is done using an event-based processing model, but I imagine that you could also use a coroutine-based model as well.
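As an illustration of detaching the request from the thread, the Servlet 3.0 async API (a later alternative to Grizzly Comet) expresses the same idea; this is only a sketch:

    // imports: java.io.IOException, javax.servlet.*, javax.servlet.annotation.WebServlet, javax.servlet.http.*
    @WebServlet(urlPatterns = "/push", asyncSupported = true)
    public class PushServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
            AsyncContext ctx = req.startAsync();      // detach the request from the current thread
            ctx.setTimeout(30000);
            ctx.start(() -> {                         // container-managed task; doGet() returns right away
                try {
                    ctx.getResponse().getWriter().write("data");   // write when the data is ready
                } catch (IOException ignored) {
                }
                ctx.complete();                       // finish the response
            });
        }
    }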
Try semaphores:
A counting semaphore. Conceptually, a semaphore maintains a set of permits. Each acquire() blocks if necessary until a permit is available, and then takes it. Each release() adds a permit, potentially releasing a blocking acquirer. However, no actual permit objects are used; the Semaphore just keeps a count of the number available and acts accordingly.