I have written a Vert.x HTTP server in Java. When the client sends requests faster than the server can process them, the server-side request queue slowly fills up. Eventually the JVM runs out of memory because of all the accumulating requests.
Can I set a capacity on the Vert.x request queue?
I would like to set one or more of the following:
A maximum number of queued requests
A maximum size (in bytes) of all queued requests
When either of these limits is violated by an incoming request, I would like to immediately respond with 503 Service Unavailable.
AFAIK there's no built-in way to accomplish this. However, this type of back pressure should still be achievable by normal means. The approach would look like this:
When an HTTP request is received, immediately forward it via a message to a separate request-handling verticle on the event bus and increment an outstanding request counter.
Perform request handling logic in that verticle and respond to the event bus message once complete.
Once the HTTP server verticle receives a response from the request handler verticle, decrement the request counter and send the appropriate response.
Add a request counter check to your HTTP server handler to check the outstanding request count and respond with an appropriate error if the queue grows too large.
This is a common pattern in Vert.x that essentially just separates the request handling logic from the HTTP request handler. Forwarding the request on the event bus as a JsonObject ensures that requests are quickly queued in the event bus. You can use that queue to track the number of outstanding requests as outlined above.
Note also that you can scale your HTTP server across multiple verticle instances in order to handle more requests. In this case you can either use static variables or shared data to share the request counter across the instances.
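Here is a minimal sketch of that counter check in the HTTP server verticle (single-instance case), assuming Vert.x 4; the event-bus address request.process, the MAX_QUEUED limit and the message contents are made-up names for illustration:

import io.vertx.core.AbstractVerticle;
import io.vertx.core.http.HttpServerRequest;
import io.vertx.core.json.JsonObject;

import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: "request.process" and MAX_QUEUED are illustrative names.
public class HttpServerVerticle extends AbstractVerticle {

  private static final int MAX_QUEUED = 1000;
  private final AtomicInteger outstanding = new AtomicInteger();

  @Override
  public void start() {
    vertx.createHttpServer()
        .requestHandler(this::handle)
        .listen(8080);
  }

  private void handle(HttpServerRequest req) {
    // Reject immediately if too many requests are already queued/in flight.
    if (outstanding.incrementAndGet() > MAX_QUEUED) {
      outstanding.decrementAndGet();
      req.response().setStatusCode(503).end("Service Unavailable");
      return;
    }
    // Forward the request to the request-handling verticle via the event bus.
    JsonObject msg = new JsonObject().put("path", req.path());
    vertx.eventBus().<JsonObject>request("request.process", msg)
        .onComplete(reply -> {
          // The handler verticle has replied: decrement and respond.
          outstanding.decrementAndGet();
          if (reply.succeeded()) {
            req.response().end(reply.result().body().encode());
          } else {
            req.response().setStatusCode(500).end();
          }
        });
  }
}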
I have a queue and a consumer written in Java for this queue. After consuming, we execute an HTTP call to a downstream partner; this is a one-way asynchronous call. After we execute this request, the downstream partner will send an HTTP request back to our system with the response to the initial asynchronous call. This response is needed by the same thread that executed the initial asynchronous call. This means we need to expose an endpoint within the thread so the downstream system can call it and send the response back. I would like to know how I can implement a requirement like this.
PS: We could also have the same response sent to a different web service and update a database row with the response. But I'm not sure how to stop the main thread and listen to the database row when the response is needed.
Hope you understood what I want with this requirement.
My response is based on some assumptions. (I didn't wait for you to respond to my comment, since I found the problem had some other interesting features anyhow.)
the downstream partner will send an HTTP request back to our system
This necessitates that you have a listening port (ie, a server) running on this side. This server could be in the same JVM or a different one. But...
This response is needed for the same thread
This is a little confusing, because at a high level reusing the thread itself is not usually what we're interested in; what we reuse is the object (no matter on which thread it runs). To reuse threads, you may consider using an ExecutorService. What you might try to do is outlined in the following steps.
Here are the steps:
"Queue Item Consumer" consumes item from the queue and sends the request to the downstream system.
This instance of the "Queue Item Consumer" is cached for handling the request from the downstream system.
There is a listener running at some port within the same JVM to which the downstream system sends its request.
The listener forwards this request to the "right" cached instance of "Queue Item Consumer" (you have to figure out a way to do this based on your caching mechanism). Maybe some header has to be present in the request from the downstream system to identify the right handler on this side (one possible handoff is sketched below).
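For illustration, one way to sketch that cache and handoff, assuming the downstream system echoes back a correlation id; all class and method names here are made up:

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry shared between the "Queue Item Consumer" and the callback listener.
public class PendingCallRegistry {

  private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

  // Called by the consumer right before it sends the downstream request.
  public CompletableFuture<String> register(String correlationId) {
    CompletableFuture<String> future = new CompletableFuture<>();
    pending.put(correlationId, future);
    return future;
  }

  // Called by the listener when the downstream system posts the response back.
  public void complete(String correlationId, String responseBody) {
    CompletableFuture<String> future = pending.remove(correlationId);
    if (future != null) {
      future.complete(responseBody);
    }
  }
}

// In the consumer: call register(correlationId) before sending the downstream request,
// then block on the returned future, e.g. future.get(30, TimeUnit.SECONDS), until the
// listener calls registry.complete(...) with the same id.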
Hope this works for you.
I need to write a web application which receives a lot of HTTP requests and takes a long time (30s to 2min) to process each request (in turn making other network requests) before returning a response.
Because there would be a lot of requests coming in and those connections are held open, I'm thinking of going down an event-driven route, which leads me to think Netty is appropriate.
If each request takes a long time to process, is that going to block netty's processing? Or can I receive a request and then asynchronously process it before returning a result to the request's connection?
As long as you don't block the event loop, you will be able to serve a significant number of concurrent requests (depending on the available memory and the size of the context you're holding for each request).
What you need to do is make sure you're making the outbound network requests in a non-blocking manner. This normally looks something like this (in your Netty inbound handler):
// assuming remoteTarget.getStuff() performs the outbound call without blocking
CompletableFuture<YourResultType> future = remoteTarget.getStuff();
// write (and flush) the result back on the channel once it arrives
future.thenAccept(ctx::writeAndFlush);
You need to hold a reference to the context/channel if you're doing this outside of the handler, of course.
Note that this is a simplified answer. If you're making several outbound requests and have some business logic, you need to stitch your code properly using continuations on the futures, or whatever non-blocking model you are using.
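To make that concrete, here is a slightly fuller sketch of such an inbound handler; RemoteTarget and getStuff() stand in for whatever non-blocking client you actually use:

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.handler.codec.http.DefaultFullHttpResponse;
import io.netty.handler.codec.http.FullHttpRequest;
import io.netty.handler.codec.http.FullHttpResponse;
import io.netty.handler.codec.http.HttpHeaderNames;
import io.netty.handler.codec.http.HttpResponseStatus;
import io.netty.handler.codec.http.HttpVersion;

import java.util.concurrent.CompletableFuture;

// Placeholder for whatever non-blocking client performs the outbound call.
interface RemoteTarget {
  CompletableFuture<String> getStuff(String uri);
}

public class AsyncInboundHandler extends SimpleChannelInboundHandler<FullHttpRequest> {

  private final RemoteTarget remoteTarget;

  public AsyncInboundHandler(RemoteTarget remoteTarget) {
    this.remoteTarget = remoteTarget;
  }

  @Override
  protected void channelRead0(ChannelHandlerContext ctx, FullHttpRequest request) {
    // Start the slow outbound work without blocking the event loop.
    remoteTarget.getStuff(request.uri()).whenComplete((result, error) -> {
      // This callback may run on another thread; writeAndFlush is safe because
      // Netty schedules the write back onto the channel's event loop.
      FullHttpResponse response = new DefaultFullHttpResponse(
          HttpVersion.HTTP_1_1,
          error == null ? HttpResponseStatus.OK : HttpResponseStatus.BAD_GATEWAY);
      response.headers().set(HttpHeaderNames.CONTENT_LENGTH, 0);
      ctx.writeAndFlush(response);
    });
  }
}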
We have an old monolith system that is unstable: 95% of the requests are processed within 500 ms, but the other 5% take more than 10 seconds and the connection times out. I would like to make our service more resilient. The communication is done through REST; the client (C) calls our service (S), which calls the monolith.
Our current approach is to use an async HTTP client with an exponential backoff retry mechanism, but this will cause performance issues as the traffic increases.
My idea is to have a synchronous HTTP call in S with a timeout of 500 ms and a fallback method that adds a task to a queue for retrying the HTTP request in the future, while returning a 202 to C along with a link to check the status of the task, something like /queue/task-123. I know that I need to make the service S exposes to C idempotent, so I will have to check the queue every time I receive a new request from C to be sure that I do not have duplicate tasks.
Questions:
Is there a better approach to solve my problem?
Is a task in a queue the best way to handle a retry in a REST endpoint?
Our stack: Java with Spring Boot, and for the queue I'm thinking of RabbitMQ.
Have the requests to S create Futures for the AsyncHttpResponse, and send them to an Executor with a thread pool large enough to accommodate your load, but not so large that it will swamp your monolith. That way when things start failing, it will not snowball on you, and the other requests can queue. You could still have retries in this model, but have them controlled outside the Future so that successful requests can get through before the retries.
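A rough sketch of that idea with a plain fixed pool; the pool size, the 500 ms budget and callMonolith() are assumptions standing in for your real client code:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical gateway in S for calls to the monolith.
public class MonolithGateway {

  // Large enough for normal load, small enough not to swamp the monolith.
  private final ExecutorService pool = Executors.newFixedThreadPool(20);

  public String callWithTimeout(String payload) throws Exception {
    Future<String> future = pool.submit(() -> callMonolith(payload));
    try {
      // Matches the 500 ms budget from the question.
      return future.get(500, TimeUnit.MILLISECONDS);
    } catch (TimeoutException slow) {
      future.cancel(true);
      // Fallback: enqueue a retry task (e.g. on RabbitMQ) and answer 202 with a
      // status link; the retry is controlled outside this Future, as suggested above.
      throw slow;
    }
  }

  private String callMonolith(String payload) {
    // Placeholder for the actual HTTP call to the monolith.
    return "response for " + payload;
  }
}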
I have 2 microservices (A and B).
A has an endpoint which accepts POST requests. When users make a POST request, this happens:
Service A takes the object from the POST request body and stores it in a database.
Service A converts the object to a different object. And the new object gets sent to service B via Jersey HTTP client.
Step 2 takes place on a Java thread pool I have created (Executors.newCachedThreadPool). By doing step 2 on a new thread, the response time of service A's endpoint is not affected.
However, if service B is taking a long time to respond, service A can potentially create too many threads when it is receiving many POST requests. To help fix this, I can use a fixed thread pool (Executors.newFixedThreadPool).
In addition to the fixed thread pool, should I also use an asynchronous non-blocking HTTP client? Such as the one here: https://hc.apache.org/httpcomponents-asyncclient-dev/. The Jersey HTTP client that I use is blocking.
It seems like it is right to use the async HTTP client. But if I switch to a fixed thread pool, I think the async HTTP client won't provide a significant benefit - am I wrong in thinking this?
Even if you use a fixed thread pool, all the threads in it will be blocked on step 2, meaning they won't do any meaningful work: they will just wait for the API to return a response, which is not pragmatic resource management. In that case you will only be able to handle a limited number of incoming requests, since the threads in the pool will always be busy waiting instead of handling new requests.
In the case of a non-blocking client, you block just one single thread (let's call it the dispatcher thread), which is responsible for sending all the requests and waiting for the responses. It runs in a "while loop" (you could call it an event loop) and checks whether the response packets have been received, so that completed responses are ready to be picked up by worker threads.
In the latter scenario you have a larger number of threads available to do meaningful work, so your throughput will be increased.
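For illustration, a minimal use of the Apache async client linked in the question (HttpAsyncClient 4.x); the URL and the callback bodies are placeholders:

import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.concurrent.FutureCallback;
import org.apache.http.impl.nio.client.CloseableHttpAsyncClient;
import org.apache.http.impl.nio.client.HttpAsyncClients;

public class AsyncCallExample {

  public static void main(String[] args) throws Exception {
    CloseableHttpAsyncClient client = HttpAsyncClients.createDefault();
    client.start();

    // The calling thread returns immediately; the client's I/O dispatcher thread
    // waits for the response and invokes the callback when it arrives.
    HttpGet request = new HttpGet("http://service-b/endpoint"); // placeholder URL
    client.execute(request, new FutureCallback<HttpResponse>() {
      @Override public void completed(HttpResponse response) {
        System.out.println("Service B answered: " + response.getStatusLine());
      }
      @Override public void failed(Exception ex) {
        System.err.println("Call failed: " + ex);
      }
      @Override public void cancelled() {
        System.err.println("Call cancelled");
      }
    });

    Thread.sleep(2000); // keep the demo JVM alive long enough for the callback
    client.close();
  }
}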
The difference is that with a sync client, the service A thread will open a connection to the step 2 endpoint and wait for a response. Making the step 2 implementation async, so that it just returns 200 directly (or whatever), will help reduce the waiting time; but the thread is still opening the connection and waiting for the response.
With a non-blocking client instead, the step 2 call itself is done by another thread, so everything is untied from the service A thread. The system can also make use of that thread for other work until it gets a response from service B and needs to resume.
The idea is that your origin threads will not sit idle waiting for responses, but will instead be reused to do other work in between.
The reason to use a non-blocking HTTP client is to prevent too much CPU from being used on thread switching. If you already solve that problem by limiting the number of background threads, then non-blocking IO won't provide any noticeable benefit.
There is another problem with your setup: it is very vulnerable to DDoS attacks (intentional or accidental ones). If someone calls your service very often, it will internally create a huge workload that will keep the service busy for a long time. You will definitely need to limit the background task queue (which a ThreadPoolExecutor supports via a bounded work queue) and return 503 (or equivalent) if there are too many pending tasks.
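A small sketch of that limit; the pool size, queue capacity and sendToServiceB() are assumptions to tune for your own setup:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: bound the background task queue and turn rejection into a 503.
public class BoundedForwarder {

  private final ExecutorService background = new ThreadPoolExecutor(
      4, 4, 0L, TimeUnit.MILLISECONDS,
      new ArrayBlockingQueue<>(200),          // at most 200 pending tasks
      new ThreadPoolExecutor.AbortPolicy());  // reject instead of queueing forever

  /** Returns the HTTP status the endpoint should answer with. */
  public int forward(Object convertedObject) {
    try {
      background.submit(() -> sendToServiceB(convertedObject));
      return 202; // accepted for background delivery
    } catch (RejectedExecutionException tooBusy) {
      return 503; // queue is full, tell the caller to back off
    }
  }

  private void sendToServiceB(Object payload) {
    // Placeholder for the actual HTTP call to service B.
  }
}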
We have a requirement that imposes a design constraint, and it is a show stopper. Here it is:
The sender thread puts requests onto the messaging queue. The input source is a text file that contains 10 million requests.
The receiver thread polls the responses from another queue and writes them to another output file.
Design constraints:
The receiver thread has to write both the request and the response to the output file. How is this possible?
No database should be used.
Caching the request before sending it and updating the cache entry after the corresponding response has been received cannot be used because of the performance bottleneck.
In a few cases, a timeout occurs if the response is delayed for a very long time.
Please advise.
Since you have just one Receiver thread, it is guaranteed that only one request will be processed at a time.
Having the sender thread write the request and response is probably not the most elegant design, but you could certainly have the Receiver thread write the {request, response} tuple. The Receiver thread could also write the request before it starts processing and the response after it is done. It will have the same result as what you are aiming for.
If you share more details about your design, I can provide more specific help.
Several ideas for solutions:
The receiver thread could look up the original request from the file. This requires responses to have some form of unique correlation id.
The handler threads could add the original request to the response. This makes the message size bigger but avoids the need for a correlation id. It also requires configuration/code changes to the handler threads.
The sender thread could duplicate the request on a secondary local queue, and the receiver thread looks up the original request for a received response on this queue. Responses might not be received in the same order as the requests were sent, so the receiver thread might need to 'walk the queue' to find the request, which is not very efficient.
The last solution is effectively a form of caching. You state this is not allowed for performance reasons, though I don't understand exactly why. Local caching is fast, and there should not be a large number of queued requests in the cache at any given time, since the sending, handling and receiving are all asynchronous.
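To make that concrete, a small sketch of such a local cache; the correlation id and the String payloads are assumptions about the message format:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the local "pending requests" cache discussed above.
public class PendingRequestCache {

  private final Map<String, String> pendingRequests = new ConcurrentHashMap<>();

  // Sender thread: remember the request body under its correlation id before sending.
  public void remember(String correlationId, String requestBody) {
    pendingRequests.put(correlationId, requestBody);
  }

  // Receiver thread: pull the original request when its response arrives,
  // then write the {request, response} pair to the output file.
  public String takeRequestFor(String correlationId) {
    return pendingRequests.remove(correlationId);
  }

  public int size() {
    return pendingRequests.size(); // should stay small if responses keep up with requests
  }
}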