How to limit by code the number of concurrent requests to a web application, say to 3 requests? Am I supposed to put each servlet class into a thread and create a global counter (by creating a new class)?
You typically rely on the web container to limit the number of concurrent requests, e.g. by setting a limit on the number of worker threads or connections in the web container's configuration.
Apparently, if a Tomcat server gets more requests than it can handle, it will send generic 503 responses. For more information:
https://tomcat.apache.org/tomcat-8.5-doc/config/http.html - explains where / how the configs can be set
Tomcat responding HTTP 503 - gives an example of what would happen ...
But how can I display to the users that the web application has reached its limit (like 3 requests)?
If you want to limit specific request types and display specific responses to the user, then you will probably need to implement this within each servlet, using a counter and so on (a small sketch is shown after the caveats below).
But the problem with trying to do "nice" things when the server is overloaded is that doing the nice things tends to increase the load. This is particularly important:
when your server is grossly inadequate for the actual load from (real) users, or
when someone is DoS'ing you, either accidentally or deliberately.
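For illustration only, here is a minimal sketch of the counter idea mentioned above, done as a servlet filter with a semaphore rather than per-servlet code; the class name, the limit of 3, and the 503-plus-message response are all assumptions for the example:

    import java.io.IOException;
    import java.util.concurrent.Semaphore;
    import javax.servlet.*;
    import javax.servlet.http.HttpServletResponse;

    // Allows at most 3 requests to be processed at once and answers the
    // rest with a 503 and a human-readable message.
    public class ConcurrencyLimitFilter implements Filter {
        private final Semaphore permits = new Semaphore(3);

        @Override
        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            if (!permits.tryAcquire()) {
                HttpServletResponse response = (HttpServletResponse) res;
                response.setStatus(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
                response.getWriter().write("The application is busy, please try again shortly.");
                return;
            }
            try {
                chain.doFilter(req, res);
            } finally {
                permits.release();
            }
        }

        @Override public void init(FilterConfig config) { }
        @Override public void destroy() { }
    }

The filter would still need to be mapped to the URLs you want to throttle, e.g. in web.xml or with @WebFilter.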
Related
I have an application, call it Service 1, that potentially makes a lot of the same requests to another application, call it Service 2. As an example, x number of people use Service 1 and that results in x requests (which are the exact same request) to Service 2. Each response is cached in Service 1.
Currently, we have a synchronized method that checks whether or not the same request has been made within a certain time threshold. The problem we are having is that when the server is under a heavy load, that synchronized method locks up the threads, Kubernetes can't perform liveness checks, and so Kubernetes restarts the service. The reason we want to prevent duplicate requests is twofold: 1) we don't want to hammer Service 2, and 2) if we are already making the request we don't want to make it again, just wait for the result that will already be coming back.
What is the fastest, most scalable solution to not making duplicate requests without locking up and taking down the server?
FWIW, my experience with rx-java specifically is very limited, so I'm not entirely confident how applicable this is for your case. This is a solution I've used several times with Scala and I know Java itself does have analogous constructs that would allow the same approach.
A solution I have used in the past that has worked very well for me involves using Futures. It does not remove duplication entirely, but it does remove duplication per requesting server. The approach involves using a TTL cache in which we store the Future object that does or will contain the result of a request we want to deduplicate on. It is stored under a key that can determine the uniqueness of the request, such as the different parameters that might be applicable.
So let's say you have a method that you call to fetch the response from Service 2 and returns it as a Future. As an example we'll say getPage which has one parameter, an integer, which is the page you'd like to fetch.
When a request begins and we're about to call getPage with the page number of 2, we check the cache for a key like "getPage:2". This won't contain anything for the first request, so we call getPage(2) which returns a Future[SomeResponseObject]. We set "getPage:2" in the TTL Cache to the Future object. When another request comes in that may spawn a duplicate request, the same cache check happens, however, there's a Future object already in the cache. We get this future and add a response listener to be invoked when the response is available, or in Scala, simply .map() on it.
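A rough sketch of that pattern in plain Java, assuming CompletableFuture in place of a Scala Future and ignoring TTL eviction (a real implementation would use an expiring cache); getPage and SomeResponseObject come from the description above, everything else is illustrative:

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class PageClient {
        // In a real setup this would be a TTL cache; a plain map is used here
        // only to keep the sketch self-contained.
        private final ConcurrentMap<String, CompletableFuture<SomeResponseObject>> inFlight =
                new ConcurrentHashMap<>();

        public CompletableFuture<SomeResponseObject> getPageDeduplicated(int page) {
            String key = "getPage:" + page;
            // computeIfAbsent ensures only the first caller actually starts the request;
            // everyone else gets the same Future and simply waits on it.
            return inFlight.computeIfAbsent(key, k -> getPage(page));
        }

        // Stand-in for the real call to Service 2.
        private CompletableFuture<SomeResponseObject> getPage(int page) {
            return CompletableFuture.supplyAsync(() -> new SomeResponseObject(page));
        }

        // Minimal placeholder for the response type mentioned above.
        public static class SomeResponseObject {
            final int page;
            SomeResponseObject(int page) { this.page = page; }
        }
    }

Every subsequent caller either finds the in-flight Future or the already-completed one, so at most one call to Service 2 is started per key per instance.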
This has a few advantages. If your request is slow or there are highly duplicative requests even in a small time frame, many requests to Service 1 are serviced by a single response from Service 2.
Secondly, once the request to Service 2 has come back, assuming you have a window in which the response is still valid, the response is already available immediately and no request is necessary at all.
If your Service 2 request takes 50ms, and your response can be considered valid for 5 seconds, all requests happening to the same server in the first 50ms are serviced at ms 50 when the response is returned, and from that point forward for the remaining 4950 ms already have access to the response.
As I alluded to earlier, the effectiveness here is tied to how many instances of Service 1 are running. The number of duplicate requests at any time is linear in the number of servers running.
This is a mostly lock-free way to achieve this. I say mostly because some synchronization is necessary in the TTL cache itself to make sure the request is only started once, but it has never been an issue for performance in my experience.
As an extension of this, you can potentially use something like Redis to cache responses from Service 2 if it has long-ish response times, and have your getPage equivalent first check a Redis cache for the serialized response (and write an expiring value if one wasn't there). This allows you to further reduce requests to Service 2 by having a more globally shared cached value, but having a second caching layer does add some complexity and potential for issues.
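Sketching that second layer very roughly, with the Redis interaction reduced to hypothetical helpers backed by a plain map so the example stays self-contained (a real version would call your actual Redis client and rely on its expiry):

    import java.util.Optional;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class SharedCachedClient {
        // Stand-in for a shared cache such as Redis; redisGet/redisSetExpiring are
        // hypothetical helpers, not a real client API.
        private final ConcurrentMap<String, String> fakeRedis = new ConcurrentHashMap<>();

        Optional<String> redisGet(String key) { return Optional.ofNullable(fakeRedis.get(key)); }
        void redisSetExpiring(String key, String value, int ttlSeconds) { fakeRedis.put(key, value); }

        // Stand-in for the real call to Service 2.
        CompletableFuture<String> getPage(int page) {
            return CompletableFuture.supplyAsync(() -> "page " + page);
        }

        public CompletableFuture<String> getPageWithSharedCache(int page) {
            String key = "getPage:" + page;
            // Serve from the shared cache when possible, so an instance never
            // re-asks Service 2 for a response another instance already fetched.
            return redisGet(key)
                    .map(CompletableFuture::completedFuture)
                    .orElseGet(() -> getPage(page).thenApply(response -> {
                        redisSetExpiring(key, response, 5);
                        return response;
                    }));
        }
    }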
An Angular 4 application sends a list of records to a Java Spring MVC application that has been deployed in a WebSphere 8 servlet container. The list is then inserted into a temp table. After the batch insert, a procedure call is made in order to do some calculations and return results. Depending on the size of the list that was inserted into the temp table, it may take anywhere between 3000ms (N ~ 500), 6000ms (N ~ 1000), and 50,000+ms (N > 2000).
My approach would be to create chunks of data and simultaneously send them to the database for processing. After the threads (Futures) return results, I would aggregate them and return them to the client. To sum up, I would split a synchronous call into multiple asynchronous processes (executed simultaneously) and return to the client over the same thread that initiated the HTTP call and landed in my controller.
Everything would be fine and I would not be asking this question if a more experienced colleague of mine were not strongly disagreeing with this approach. His reasoning is that this approach is prone to exceptions due to thread interrupts / timeouts / semaphores and so on. He goes as far as saying that multithreading should be avoided within a web container because it can crash the servlet container in case it runs out of threads.
He proposes that we should have the browser send multiple AJAX requests and aggregate/present the data in chunks.
Can you please help me understand which approach is better and why?
I would say that your approach is much better.
Threads created by application logic aren't application container threads and are limited only by the operating system, while each AJAX request uses a thread from the application container. So the second approach reduces throughput and increases the possibility of reaching the application container's limit, while the first one does not. Performance should also be considered, because it's much cheaper to create a thread than to send a request over the network. Plus, each network request uses additional resources for authentication/authorization/encryption etc.
It's definitely harder to write correct multithreaded code and it can easily lead to errors. However, that shouldn't stop you from doing it, because concurrency can significantly increase your performance. It's pretty straightforward to handle interrupts and timeouts using Future, and you certainly don't need semaphores here.
Exposing this logic to the client looks like breaking encapsulation. Imagine that you use a REST API which forces you to send multiple requests by splitting your data into chunks. What chunk size should I use? How do I deal with timeouts/interrupts? How many requests should I send? etc. You will have almost the same challenges in both approaches, but it's much easier to deal with them using libraries designed specifically for this, like ExecutorService and Future.
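To make the comparison concrete, here is a minimal sketch of the server-side chunking idea, assuming the chunks can be processed independently; ChunkedProcessor, processChunk, the pool size, and the timeout are illustrative names and numbers, not anything prescribed by either approach:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    public class ChunkedProcessor {
        private final ExecutorService pool = Executors.newFixedThreadPool(4);

        // Splits the records into chunks, processes them in parallel, and
        // aggregates the partial results on the thread that handles the HTTP call.
        public List<Result> process(List<Record> records, int chunkSize)
                throws InterruptedException, ExecutionException, TimeoutException {
            List<Future<List<Result>>> futures = new ArrayList<>();
            for (int i = 0; i < records.size(); i += chunkSize) {
                final List<Record> chunk = records.subList(i, Math.min(i + chunkSize, records.size()));
                futures.add(pool.submit(() -> processChunk(chunk)));
            }
            List<Result> aggregated = new ArrayList<>();
            for (Future<List<Result>> future : futures) {
                // A timeout keeps one slow chunk from hanging the HTTP request forever.
                aggregated.addAll(future.get(30, TimeUnit.SECONDS));
            }
            return aggregated;
        }

        // Stand-in for the batch insert plus the procedure call.
        private List<Result> processChunk(List<Record> chunk) {
            return new ArrayList<>();
        }

        public static class Record { }
        public static class Result { }
    }

The controller would call process(...) on the request thread, so the client still sees a single synchronous HTTP response.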
I have a resource, say a #POST method serving the clients. It doesn't run on any external parameters, not even the caller URL (we're leaving that to the firewall) or the user authentication.
However, we don't want to handle user requests simultaneously. When a request1 is being processed and the method hasn't just yet returned, a request2 coming in should receive a response of status 309 (or whatever status code applies) and shouldn't get served.
Is there a way of doing this without getting into anything on the server back-end side like multithreading?
I'm using Tomcat 8. The application will be deployed on JBoss, however this shouldn't affect the outcome(?). I used Jersey 1.19 for coding the resource.
This is a Q relevant to How to ignore multiple clicks from an impatient user?.
TIA.
Depending on what you want to achieve, yes, it is possible to reject additional requests while a service is "in use." I don't know if it's possible at the servlet level; servlet containers are designed to handle as many requests concurrently as possible (typically one thread per request) so that, say, if one user requests something simple and another requests something difficult, the simple request can be handled while the difficult request is processing.
The primary reason you would probably NOT want to return an HTTP error code simply because a service is in use is that the service didn't error; it was simply in use. Imagine trying to use a restroom that someone else was using and instead of "in use" the restroom said "out of order."
Another reason to think twice about a service that rejects requests while it is processing any other request is that it will not scale. Period. You will have some users have their requests accepted and others have their requests rejected, seemingly at random, and the ratio will tilt toward more rejections the more users the service has. Think of calling into the radio station to try to be the 9th caller, getting a busy tone, and then calling back again and again until you get through. This works for trying to win free tickets to a concert, but would not work well for a business you were a customer of.
That said, here are some ways I might approach handling expensive, possibly duplicate, requests.
If you're trying to avoid multiple identical/simultaneous requests from an impatient user, you most likely have a UX problem (e.g. a web button doesn't seem to respond when clicked because of processing lag). I'd implement a loading mask or something similar to prevent multiple clicks and to communicate that the user's request has been received and is processing. Loading/processing masks have the added benefit of giving users an abstract feeling of ease and confidence that the service is indeed working as expected.
If there is some reason out of your control why multiple identical requests might get triggered coming from the same source, I'd opt for a cache that returns the processed result to all requests, but only processes the first request (and retrieves the response from the cache for all other requests).
If you really really want to return errors, implement a singleton service that remembers a cache of some number of requests, detects duplicates, and handles them appropriately.
Remember that if your use case is indeed multiple clicks from a browser, you likely want to respond to the last request sent, not the first. If a user has clicked twice, the browser will register the error response first (it will come back immediately as a response to the last click). This can further undermine the UX: a single click results in a delay, but two clicks results in an error.
But before implementing a service that returns an error, consider the following: what if two different users request the same resource at the same time? Should one really get an error response? What if the quantity of requests increases during certain times? Do you really want to return errors to what amounts to a random selection of consumers of the service?
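If you do decide to go the rejection route described in the "really really want to return errors" option above, a minimal sketch with Jersey/JAX-RS annotations and a one-permit semaphore might look like this; the resource path, the 503 status, and the class name are all assumptions for the example:

    import java.util.concurrent.Semaphore;
    import javax.ws.rs.POST;
    import javax.ws.rs.Path;
    import javax.ws.rs.core.Response;

    @Path("/work")
    public class BusyResource {
        // One permit: only a single request is processed at a time.
        private static final Semaphore GATE = new Semaphore(1);

        @POST
        public Response doWork() {
            if (!GATE.tryAcquire()) {
                // Reject instead of queueing; 503 is used here as a stand-in
                // for whatever status code you ultimately settle on.
                return Response.status(Response.Status.SERVICE_UNAVAILABLE)
                               .entity("Another request is already being processed.")
                               .build();
            }
            try {
                // ... the actual expensive work goes here ...
                return Response.ok("done").build();
            } finally {
                GATE.release();
            }
        }
    }

Note that this gate is per JVM; if the application runs on several nodes, a shared lock would be needed to enforce the limit globally.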
In one of my interviews I was asked how servlets work, and I told them that for every request the servlet container creates a thread. They then asked: if we take a popular site like Facebook, which gets a huge number of requests, allocating a thread to each request wouldn't be a good approach, so how do they handle that many requests? I thought of a thread pool, but I do not know whether that is the approach. Can someone please explain how so many requests are handled in a servlet container?
Two approaches here that complement each other:
Yes, limit the number of threads to a fixed number and pre-create them in a pool, thus avoiding the costly process of re-creating them for every request (a small sketch follows below). I think Apache's HTTP server works this way.
You can always throw more machines at the problem. Large sites always use clusters of web servers, thus balancing the load.
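A tiny sketch of the pooling idea from the first point, using a plain fixed-size ExecutorService to stand in for a container's worker-thread pool:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class RequestPoolDemo {
        public static void main(String[] args) {
            // A fixed pool of worker threads: work beyond the pool size
            // queues up instead of spawning new threads.
            ExecutorService workers = Executors.newFixedThreadPool(10);
            for (int i = 0; i < 100; i++) {
                final int requestId = i;
                workers.submit(() -> handleRequest(requestId));
            }
            workers.shutdown();
        }

        private static void handleRequest(int requestId) {
            System.out.println("Handling request " + requestId
                    + " on " + Thread.currentThread().getName());
        }
    }

A servlet container does essentially the same with its worker-thread pool (e.g. Tomcat's maxThreads), queueing or rejecting connections beyond that limit.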
I'm testing a Google Web Toolkit application and having some performance issue with multiple RPC calls. The structure of the app is:
User submits a query
Initial query is serviced by a single server-side servlet
Once initial reply received, multiple components are subsequently updated by iterating over each component and calling an update method, passing it the results of the initial query
Each component's update method does some work on the data passed to it, in addition to potentially calling other server-side services
On success of these calls, the component is updated in the UI.
With the initial query service and 1 component (effectively running sequentially), response time is fast. However, adding any other components (e.g. initial query service + 2 components, these 2 components calling asynchronously) hugely impacts the response time.
Is there any way to improve / rectify this?
Example: (IQS = initial query, C1 = component 1, C2 = component 2, C1S = comp. 1 service, C2S = component 2 service)
Initial Query + 1 component
IQS, returned - propagating results, 1297273015477
C1, Sending server request,1297273015477
C1S, Sending back..., 1297273016486
C1, Receiving Complete, 1297273016522 (total time from initial call - 1045ms)
Initial Query + 2 components
IQS, returned - propagating results, 1297272667185
C1, Sending server request,1297272667185
C2, Sending server request,1297272668132
C1S, Sending back..., 1297272668723
C2S, Sending back..., 1297272669371
C1, Back at client to process, 1297272671077 (total time from initial call - 3892ms)
C2, Back at client to process, 1297272674518 (total time from initial call - 6386ms)
Thanks in advance.
Paul
I think you need to make your analysis more fine-grained: in the data provided you have established that the client started the 2nd component call and got a response back 6386ms later. Some of this was:
1. Going over the wire
2. Being received at the server
3. Processed at the server (this could be broken down, as well)
4. Sent back over the wire
The GWT-RPC service really only has to do with 1 and 4. Do you know how long each step takes?
Well, I think your problem is not directly related to GWT, because I have used multiple RPC calls at the same time and my application's performance did not degrade. I think that you may have server-side synchronization issues.
The overhead of HTTP with cookies, and the sequencing of some of these calls (rather than firing all the requests when the user is switching to another part of the application), is part of the reason why they seem to slow things down. E.g. a user requests a page; once that page's widgets are in place they fire requests for the data they're supposed to show, possibly making decisions to add more widgets based on that data (but hopefully passing the data into those widgets).
You might be looking for some tools that help you to create batched rpc calls like: gwt-dispatch. I don't think there's anything automatic.
A low-tech way to get more information is to put basic timing logging into every RPC to see how long they take. Create a new Date() at the top, subtract its ms from a new Date()'s ms at the end, and print it to stdout or use Log.info() or whatever.
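For example, something along these lines inside each RPC method; the class and method names are only illustrative:

    import java.util.Date;

    public class TimedService {
        public String fetchData(String query) {
            long start = new Date().getTime();
            try {
                return doActualWork(query);
            } finally {
                // Crude per-call timing; print to stdout or use Log.info() instead.
                System.out.println("fetchData took " + (new Date().getTime() - start) + " ms");
            }
        }

        private String doActualWork(String query) {
            return "result for " + query;
        }
    }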
For something more industrial-strength, I used the SpringSource tc Server combined with Chrome's Speed Tracer in order to get a full-stack view of what calls were taking what amount of time, and what was actually able to happen in parallel. Not trivial to set up, but once I did I was able to zero in on the real issues (in my case it was getting tons of unnecessary information from Hibernate queries) very quickly.
Here's the basic info we used:
Download the tc Server Developer Edition (free)
http://www.springsource.com/products/tc-server-developer-edition-preview
NOTE: Do not even THINK about installing in a directory structure that has spaces.....
Installing tc Server: Main Steps
http://static.springsource.com/projects/tc-server/2.1/getting-started/html/ch06s02.html#install-developer-edition
Viewing Spring Insight Data In Google Speed Tracer
http://static.springsource.com/projects/tc-server/2.0/devedition/html/ch04s04.html
The URL is now localhost:8080 instead of the old port address for the other installation of Tomcat.
One more detail: you'll need to make a .war file and deploy it to the Tomcat directory. (You're not getting perf data in dev mode, but rather on a locally GWT-compiled release.)
-- Andrew # learnvc.com