Utility of asynchronous programming in web frameworks (Java)

In both the C# and Java worlds, we can implement web controllers with either synchronous or asynchronous code. Here is a sketch of the syntax on each platform:
Java Spring MVC
@Controller
public class ExampleController {

    @GetMapping
    public ResponseObject doSomething() {
        // ... synchronous work ...
        return new ResponseObject();
    }

    @GetMapping
    public CompletableFuture<ResponseObject> doSomethingAsync() {
        // ... asynchronous work ...
        return CompletableFuture.completedFuture(new ResponseObject());
    }
}
C# ASP.NET MVC
[Controller]
public class ExampleController : Controller {

    public ActionResult DoSomething() {
        // ... synchronous work ...
        return Json(new ResponseObject());
    }

    public async Task<ActionResult> DoSomethingAsync() {
        // ... asynchronous work ...
        return Json(new ResponseObject());
    }
}
From my current knowledge, this is what happens on both platforms when a request is received:
In the synchronous case, the thread dispatched to serve the web request stays busy for the whole duration of the request, which could take some time. If the number of threads the server allocates for workers gets exhausted, new clients have to wait (call it the Slashdot effect, or a DoS).
In the asynchronous case, the worker thread determines that the operation is deferred, hands the duty of serving the request to a task executor, and moves on to serve a new request.
In both cases, a single request gets no performance increase if its body is simply copied and pasted between the synchronous and asynchronous versions. I mean, if you don't parallelize I/O operations in code (e.g. using threads or more Futures within the method body), you get no performance improvement.
The question
I simply want to ask, from a top-level view, what is the real advantage of employing such an asynchronous style of developing web services, considering that a massive number of requests will simply exhaust the worker executor's capacity instead of the application server's pool? For example, in Spring for Java, a thread in the DispatcherServlet is released in favour of a thread pool that has limited capacity. In .NET, tasks and threads are slightly different, but the task executor still has limited capacity.
In other words, wouldn't it be simpler to tune the application server's thread pool capacity rather than using an additional layer of abstraction?
Or is there something I don't yet know about asynchronous programming? Why can't I see its benefits in these kinds of scenarios?
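To make the handoff concrete, here is a minimal, framework-free sketch using only `java.util.concurrent` (no Spring; all names and timings are illustrative): a deliberately tiny "container" pool serves many requests because each worker only registers an asynchronous operation instead of blocking on it.

```java
import java.util.*;
import java.util.concurrent.*;

public class AsyncHandoffDemo {
    // Serve n "requests" with only two worker threads; each request's real
    // work is completed later by a timer standing in for non-blocking I/O.
    static List<String> serveAll(int n) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(2);          // tiny "container" pool
        ScheduledExecutorService io = Executors.newScheduledThreadPool(1);  // fake I/O completion

        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int id = i;
            CompletableFuture<String> response = new CompletableFuture<>();
            pending.add(response);
            workers.submit(() ->
                    // The worker only registers the operation and returns at
                    // once; it never sleeps waiting for the result.
                    io.schedule(() -> response.complete("response " + id),
                            100, TimeUnit.MILLISECONDS));
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).get();
        workers.shutdown();
        io.shutdown();
        List<String> out = new ArrayList<>();
        for (CompletableFuture<String> f : pending) out.add(f.join());
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serveAll(10).size() + " requests served by 2 workers");
    }
}
```

All ten "requests" complete after roughly one 100 ms round even though only two worker threads exist; a blocking version would need five rounds. That gap, not per-request speed, is what the async style buys.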

Related

Does the use of Spring WebFlux's WebClient in a blocking application design cause a larger use of resources than RestTemplate?

I am working on several Spring Boot applications which follow the traditional thread-per-request pattern. We are using spring-boot-webflux to obtain WebClient for the RESTful integration between the applications. Hence our application design requires that we block on the publisher right after receiving a response.
Recently, we've been discussing whether we are unnecessarily spending resources by using a reactive module in our otherwise blocking application design. As I understand it, WebClient makes use of the event loop by assigning a worker thread to perform the reactive actions on the event loop. So using WebClient with .block() would park the original thread while assigning another thread to perform the HTTP request. Compared to the alternative, RestTemplate, it seems WebClient would spend additional resources by using the event loop.
Is it correct that partially introducing spring-webflux in this way leads to additional resource consumption while yielding no positive contribution to performance, whether single-threaded or concurrent? We are not expecting ever to upgrade our current stack to be fully reactive, so the argument of gradual migration does not apply.
In this presentation Rossen Stoyanchev from the Spring team explains some of these points.
WebClient will use a limited number of threads - 2 per core for a total of 12 threads on my local machine - to handle all requests and their responses in the application. So if your application receives 100 requests and makes one request to an external server for each, WebClient will handle all of those using those threads in a non-blocking / asynchronous manner.
Of course, as you mention, once you call block your original thread will block, so it would be 100 threads + 12 threads for a total of 112 threads to handle those requests. But keep in mind that these 12 threads do not grow in number as you make more requests, and that they don't do the I/O heavy lifting, so it's not like WebClient is spawning threads to actually perform the requests or keeping them busy in a thread-per-request fashion.
I'm not sure whether a thread under block behaves the same as one making a blocking call through RestTemplate - it seems to me that in the former the thread should be inactive, waiting for the NIO call to complete, while in the latter the thread should be handling I/O work itself, so maybe there's a difference there.
It gets interesting if you begin using the reactor goodies, for example handling requests that depend on one another, or many requests in parallel. Then WebClient definitely gets an edge as it'll perform all concurrent actions using the same 12 threads, instead of using a thread per request.
As an example, consider this application:
@SpringBootApplication
public class SO72300024 {

    private static final Logger logger = LoggerFactory.getLogger(SO72300024.class);

    public static void main(String[] args) {
        SpringApplication.run(SO72300024.class, args);
    }

    @RestController
    @RequestMapping("/blocking")
    static class BlockingController {

        @GetMapping("/{id}")
        String blockingEndpoint(@PathVariable String id) throws Exception {
            logger.info("Got request for {}", id);
            Thread.sleep(1000);
            return "This is the response for " + id;
        }

        @GetMapping("/{id}/nested")
        String nestedBlockingEndpoint(@PathVariable String id) throws Exception {
            logger.info("Got nested request for {}", id);
            Thread.sleep(1000);
            return "This is the nested response for " + id;
        }
    }

    @Bean
    ApplicationRunner run() {
        return args -> {
            Flux.just(callApi(), callApi(), callApi())
                    .flatMap(responseMono -> responseMono)
                    .collectList()
                    .block()
                    .stream()
                    .flatMap(Collection::stream)
                    .forEach(logger::info);
            logger.info("Finished");
        };
    }

    private Mono<List<String>> callApi() {
        WebClient webClient = WebClient.create("http://localhost:8080");
        logger.info("Starting");
        return Flux.range(1, 10).flatMap(i ->
                webClient
                        .get().uri("/blocking/{id}", i)
                        .retrieve()
                        .bodyToMono(String.class)
                        .doOnNext(resp -> logger.info("Received response {} - {}", i, resp))
                        .flatMap(resp -> webClient.get().uri("/blocking/{id}/nested", i)
                                .retrieve()
                                .bodyToMono(String.class)
                                .doOnNext(nestedResp -> logger.info("Received nested response {} - {}", i, nestedResp))))
                .collectList();
    }
}
If you run this app, you can see that all 30 requests are handled immediately in parallel by the same 12 (in my computer) threads. Neat! If you think you can benefit from such kind of parallelism in your logic, it's probably worth it giving WebClient a shot.
If not, while I wouldn't actually worry about the "extra resource spending" given the reasons above, I don't think it would be worth adding the whole reactor/webflux dependency for this - besides the extra baggage, in day-to-day operations it's a lot simpler to reason about and debug RestTemplate and the thread-per-request model.
Of course, as others have mentioned, you ought to run load tests to have proper metrics.
According to the official Spring documentation, RestTemplate is in maintenance mode and will probably not be supported in future versions:
As of 5.0 this class is in maintenance mode, with only minor requests
for changes and bugs to be accepted going forward. Please, consider
using the org.springframework.web.reactive.client.WebClient which has
a more modern API and supports sync, async, and streaming scenarios
As for system resources, it really depends on your use case, and I would recommend running some performance tests, but it seems that at low workloads a blocking client could perform better owing to its dedicated thread per connection. As load increases, NIO clients tend to perform better.
Update - Reactive API vs Http Client
It's important to understand the difference between the Reactive API (Project Reactor) and the HTTP client. Although WebClient uses the Reactive API, it doesn't add any additional concurrency until we explicitly use operators like flatMap or delay that can schedule execution on different thread pools. If we just use
webClient
    .get()
    .uri("<endpoint>")
    .retrieve()
    .bodyToMono(String.class)
    .block();
the code executes on the caller thread, the same as with a blocking client.
If we enable debug logging for this code, we will see that WebClient code is executed on the caller thread but for network operations execution will be switched to reactor-http-nio-... thread.
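The same caller-thread / I/O-thread split can be observed with a plain JDK CompletableFuture; in this sketch the single-thread executor merely stands in for the reactor-http-nio pool, and the thread name is made up:

```java
import java.util.concurrent.*;

public class ThreadSwitchDemo {
    // Returns the name of the thread that ran the "network" operation.
    static String networkThreadName() throws Exception {
        ExecutorService nio = Executors.newSingleThreadExecutor(
                r -> new Thread(r, "fake-http-nio"));
        try {
            // supplyAsync runs on the "I/O" pool; get() plays the role of
            // block(): the caller thread just parks, idle, until completion.
            return CompletableFuture
                    .supplyAsync(() -> Thread.currentThread().getName(), nio)
                    .get();
        } finally {
            nio.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("caller=" + Thread.currentThread().getName()
                + " network-op=" + networkThreadName());
    }
}
```

The composition of the pipeline happens on the caller thread; only the deferred work runs on the "I/O" thread, which mirrors what the debug logs show for WebClient.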
The main difference is that internally WebClient uses asynchronous client based on non-blocking IO (NIO). These clients use Reactor pattern (event loop) to maintain a separate thread pool(s) which allow you to handle a large number of concurrent connections.
The purpose of I/O reactors is to react to I/O events and to dispatch
event notifications to individual I/O sessions. The main idea of I/O
reactor pattern is to break away from the one thread per connection
model imposed by the classic blocking I/O model.
By default, Reactor Netty is used, but you could consider the Jetty Reactive HttpClient, Apache HttpComponents (async), or even the AWS Common Runtime (CRT) HTTP Client if you create the required adapter (not sure one already exists).
In general, you can see a trend across the industry toward async I/O (NIO), because it is more resource-efficient for applications under high load.
In addition, to use resources efficiently the whole flow must be async. By using block() we implicitly reintroduce the thread-per-connection approach, which eliminates most of the benefits of NIO. At the same time, using WebClient with block() can be considered a first step in migrating to a fully reactive application.
Great question.
Last week we considered migrating from RestTemplate to WebClient.
This week, I started testing the performance of the blocking WebClient against RestTemplate and, to my surprise, RestTemplate performed better in scenarios where the response payloads were large. The difference was considerable, with RestTemplate taking less than half the time to respond and using fewer resources.
I'm still carrying out the performance tests; I have now started testing with a wider range of concurrent users.
The application is MVC, using Spring 5.3.19 and Spring Boot 2.6.7.
For performance testing I'm using JMeter, and for health checks VisualVM/JConsole.

quarkus reactive mutiny thread pool management

Background: I'm just this week getting started with Quarkus, though I've worked with some streaming platforms before (especially http4s/fs2 in scala).
Working with quarkus reactive (with mutiny) and any reactive database client (mutiny reactive postgres, reactive elasticsearch, etc.) I'm a little confused how to correctly manage blocking calls and thread pools.
The Quarkus documentation suggests that imperative or CPU-intensive code be annotated with @Blocking to ensure it is shifted to a worker pool so it does not block the IO pool. This makes sense.
Consider the following:
public class Stuff {

    // imperative, CPU intensive
    public static int cpuIntensive(String arg) { /* ... */ }

    // blocking IO
    public static List<Integer> fetchFromDb() { /* ... */ }

    // reactive IO
    public static Multi<String> fetchReactive() { /* ... */ }

    // reactive IO with CPU processing
    public static Multi<String> fetchReactiveCpuIntensive(String arg) {
        return fetchReactive()                                          // reactive fetch
                .map(fetched -> String.valueOf(cpuIntensive(arg + fetched))); // CPU intensive work
    }
}
It's not clear to me what happens in each of the above cases, and where they get executed, if they are called from a resteasy-reactive REST endpoint without the @Blocking annotation.
Presumably it's safe to use any reactive client in a reactive REST endpoint without @Blocking. But does wrapping a blocking call in a Uni accomplish as much for 'unsafe' code? That is, will anything returning a Multi/Uni effectively run on the worker pool?
(I'll open follow-up posts about finer control of thread pools as I don't see any way to 'shift' reactive IO calls to a separate pool than cpu-intensive work, which would be optimal.)
Edit
This question might imply I'm asking about return types (Uni/Multi vs direct objects), but it's really about the ability to select the thread pool in use at any given time. This Mutiny page on imperative-to-reactive actually somewhat answers my question, along with the Mutiny infrastructure docs, which state that "the default executor is already configured to use the Quarkus worker thread pool.", and the Mutiny thread-control docs handle the rest, I think.
So my understanding is this:
If I have an endpoint which can conditionally return something non-blocking (e.g. a local non-blocking cache hit), then I can effectively return any way I want on the IO thread. But if said cache misses, I can either call a reactive client directly or use Mutiny to run a blocking action on the Quarkus worker pool. Similarly, Mutiny provides control to execute any given stream on a specific thread pool (executor).
And reactive clients (or anything effectively running on the non-IO pool) are safe to call, because the IO loop is just subscribing to data emitted by the worker pool.
Lastly, it seems like I could configure a cpu-bound pool separately from an IO-bound worker pool and explicitly provide them as executors to whichever emitters I need. So ... I think I'm all set now.
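That last idea, separating a CPU-bound pool from an IO-bound pool, can be sketched with plain CompletableFuture instead of Mutiny. The pool names are made up, and thenApplyAsync plays roughly the role of Mutiny's emitOn in shifting downstream work to a chosen executor:

```java
import java.util.concurrent.*;

public class PoolSeparationDemo {
    static final ExecutorService IO_POOL =
            Executors.newFixedThreadPool(4, r -> new Thread(r, "io-pool"));
    static final ExecutorService CPU_POOL =
            Executors.newFixedThreadPool(2, r -> new Thread(r, "cpu-pool"));

    // The (fake) blocking fetch runs on the I/O pool, and the CPU-heavy
    // mapping step runs on the CPU pool, so neither starves the other.
    static CompletableFuture<String> fetchThenProcess() {
        return CompletableFuture
                .supplyAsync(() -> "data@" + Thread.currentThread().getName(), IO_POOL)
                .thenApplyAsync(d -> d + " processed@" + Thread.currentThread().getName(), CPU_POOL);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchThenProcess().get());
        IO_POOL.shutdown();
        CPU_POOL.shutdown();
    }
}
```

Each stage reports the pool it ran on, so the output shows the fetch on io-pool and the processing on cpu-pool.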
This is a very good question!
The return type of a RESTEasy Reactive endpoint does not have any effect on which thread the endpoint will be served on.
The only thing that determines the thread is the presence of @Blocking / @NonBlocking.
The reason for this is simple: By just using the return type, it is not possible to know if the operation actually takes a long time to finish (and thus block the thread).
A non-reactive return type for example does not imply that the operation is CPU intensive (as you could for example just be returning some canned JSON response).
A reactive type on the other hand provides no guarantee that the operation is non-blocking, because as you mention, a user could simply wrap a blocking operation with a reactive return type.

Async API design client

Let's say I create an async REST API in Spring MVC with Java 8's CompletableFuture.
How is this called by the client? If it's non-blocking, does the endpoint return something before processing? I.e.
@RequestMapping("/") // GET method
public CompletableFuture<String> meth() throws InterruptedException {
    Thread.sleep(10000);
    String result = "lol";
    return CompletableFuture.completedFuture(result);
}
How does this exactly work? (This code above is just a randomly made code I just thought of).
When I send a GET request from, say, Google Chrome at localhost:3000/, what happens? I'm a newbie to async APIs and would like some help.
No, the client doesn't know it's asynchronous. It'll have to wait for the result normally. It's just the server side that benefits from freeing up a worker thread to handle other requests.
In this version it's pointless, because CompletableFuture.completedFuture() creates a completed Future immediately.
However in a more complex piece of code, you might return a Future that is not yet complete. Spring will not send the response body until some other thread calls complete() on this Future.
Why not just use a new thread? Well, you could - but in some situations it might be more efficient not to. For example you might put a task into an Executor to be handled by a small pool of threads.
Or you might fire off a JMS message asking for the request to be handled by a completely separate machine. A different part of your program will respond to incoming JMS messages, find the corresponding Future and complete it. There is no need for a thread dedicated to this HTTP request to be alive while the work is being done on another system.
Very simple example:
@RequestMapping("/employeenames/{id}")
public CompletableFuture<String> getName(@PathVariable String id) {
    CompletableFuture<String> future = new CompletableFuture<>();
    database.asyncSelect(
        name -> future.complete(name),
        "select name from employees where id = ?",
        id
    );
    return future;
}
I've invented a plausible-ish API for an asynchronous database client here: asyncSelect(Consumer<String> callback, String preparedstatement, String... parameters). The point is that it fires off the query, then does not block the thread waiting for the DB to respond. Instead it leaves a callback (name -> future.complete(name)) for the DB client to invoke when it can.
This is not about improving API response times -- we do not send an HTTP response until we have a payload to provide. This is about using the resources on the server more efficiently, so that while we're waiting for the database to respond it can do other things.
There is a related, but different, concept of async REST, in which the server responds with 202 Accepted and a header like Location: /queue/12345, allowing the client to poll for the result. But this isn't what the code you asked about does.
CompletableFuture was introduced by Java to make handling complex asynchronous programming easier. It lets the programmer combine and cascade async calls, and offers the static utility methods runAsync and supplyAsync to abstract away the manual creation of threads.
These methods dispatch tasks to Java’s common thread pool by default or a custom thread pool if provided as an optional argument.
If a CompletableFuture is returned by an endpoint method and complete() is never called, the request will hang until it times out.
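The complete-from-elsewhere pattern can be reproduced with nothing but the JDK. The asyncSelect below is a stand-in for the invented database API above: the callback fires from a common-pool thread, and no request thread ever blocks waiting for the "database".

```java
import java.util.concurrent.*;
import java.util.function.Consumer;

public class CallbackCompletionDemo {
    // Stand-in for an async DB client: it invokes the callback from its own
    // thread (here the JDK common pool) once the "result" is available.
    static void asyncSelect(Consumer<String> callback, String id) {
        CompletableFuture.runAsync(() -> callback.accept("name-for-" + id));
    }

    static CompletableFuture<String> getName(String id) {
        CompletableFuture<String> future = new CompletableFuture<>();
        asyncSelect(future::complete, id); // no thread blocks waiting for the DB
        return future;                     // the "response" goes out when complete() fires
    }

    public static void main(String[] args) throws Exception {
        System.out.println(getName("42").get()); // → name-for-42
    }
}
```

In a real Spring MVC endpoint the framework, not main, would subscribe to the returned future and write the HTTP response when it completes.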

Spring MVC Rest Services - Number of Threads (Controller Instances)

In our application we want to achieve higher throughput so I just want to know how threading works in Spring MVC controllers.
Thanks in advance for your help.
This helped me
http://community.jaspersoft.com/wiki/how-increase-maximum-thread-count-tomcat-level
A web application is hosted in an application server (like Tomcat). Usually the application server manages a thread pool, and every request is handled by a thread.
The web application doesn't have to worry about this thread pool. The size of the thread pool is a parameter of the application server.
To achieve higher throughput you really need to identify the bottleneck.
(According my experience, the size of the thread pool of the application server is rarely the root cause of performance problem.)
Note that the "number of controller instances" is normally one, i.e. a controller is usually a singleton shared/used by all threads, and therefore a controller must be thread-safe.
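A small, self-contained demonstration of why that matters (plain JDK; the Controller class here is just a stand-in for a singleton Spring controller): an unsynchronized counter field can lose updates under concurrent requests, while an AtomicInteger cannot.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class SingletonControllerDemo {
    // One shared instance, like a Spring singleton controller.
    static class Controller {
        int unsafeCount = 0;                                  // racy: lost updates possible
        final AtomicInteger safeCount = new AtomicInteger();  // thread-safe

        void handleRequest() {
            unsafeCount++;                // read-modify-write, not atomic
            safeCount.incrementAndGet();  // atomic
        }
    }

    public static void main(String[] args) throws Exception {
        Controller c = new Controller();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 10_000; i++) pool.submit(c::handleRequest);
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // safeCount is always 10000; unsafeCount may come out lower under contention
        System.out.println("safe=" + c.safeCount.get() + " unsafe=" + c.unsafeCount);
    }
}
```

Any state a controller keeps in fields is shared by every request thread, so it must either be immutable or guarded like safeCount here.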
Let us specify the question a little more: an application of interest, implementing a REST controller, is deployed on a typical multithreaded application server (possibly running other things as well). Q: Is there concurrency in the handling of separate requests to the mapped methods of the controller?
I'm not authoritative on this subject, but it is of high importance (in particular: should single-threaded logic be applied to REST controller code?).
Edit: the answer below is WRONG. Concurrent calls to different methods of the same controller are handled concurrently, and so all shared resources they use (services, repositories, etc.) must ensure thread safety. For some reason, however, calls handled by the same method of the controller are serialized (or: so it appears to me as of now).
The small test below shows that even though subsequent (and rapid) calls to the mapped methods of the controller are indeed handled by different threads, single-threaded logic applies (i.e. there is no concurrency "out of the box").
Let us take the controller:
AtomicInteger count = new AtomicInteger();

@RequestMapping(value = {"/xx/newproduct"})
@ResponseBody
public Answer newProduct() {
    Integer atCount = count.incrementAndGet();
    // Any delay/work would do here
    try {
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    Answer ans = new Answer("thread:" + Thread.currentThread().getName()
            + " controller:" + this, atCount);
    count.decrementAndGet();
    return ans;
}
and launch 10 rapid (almost concurrent w.r.t. the 1000ms sleep time) REST requests, e.g. by the AngularJS code
$scope.newProd = function (cnt) {
var url = $scope.M.dataSource + 'xx/newproduct';
for(var i=0; i<cnt; ++i) {
$http.get(url).success(function(data){
console.log(data);
});
}
};
(the Answer just carries a String and an Integer; making count static would not change anything). What happens is that all requests become pending concurrently, but the responses come back sequentially, exactly 1 s apart, and none has atCount > 1. They do come from different threads, though.
Edit: This shows that concurrent calls to the same method/route are serialized. However, by adding a second method to the controller we can easily verify that calls to it are handled concurrently with calls to the first method, and hence multithreaded logic for handling requests is mandatory out of the box.
To profit from multithreading one should therefore, it seems, employ traditional explicit methods, such as launching any non-trivial work as a Runnable on an Executor.
Basically this has nothing to do with Spring. Usually each request is forked into a separate thread. So the usual thing to do here is finding the bottleneck.
However there is a possibility that badly written beans that share state over thread boundaries and therefore need to be synchronized might have a very bad effect.

How do we start a thread from a servlet?

What's the recommended way of starting a thread from a servlet?
Example: One user posts a new chat message to a game room. I want to send a push notification to all other players connected to the room, but it doesn't have to happen synchronously. Something like:
public class MyChatServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest request,
                          HttpServletResponse response) {
        // Update the database with the new chat message.
        final String msg = ...;
        putMsgInDatabaseForGameroom(msg);

        // Now spawn a thread which will deal with communicating
        // with Apple's APNs service; this can be done async.
        new Thread(() -> {
            talkToApple(msg);
            someOtherUnimportantStuff(msg);
        }).start();

        // We can send a reply back to the caller now.
        // ...
    }
}
I'm using Jetty, but I don't know if the web container really matters in this case.
Thanks
What's the recommended way of starting a thread from a servlet?
You should be very careful when writing threaded code in a servlet, because mistakes (like memory leaks or missing synchronization) can cause bugs that are very hard to reproduce, or bring down the whole server.
You can start a thread with the start() method.
But as far as I know, I would recommend startAsync (Servlet 3.0).
I found a helpful link for you: Click.
but I don't know if the web container really matters in this case.
Yes, it matters. Most web servers (Java and otherwise, including JBoss) follow a "one thread per request" model, i.e. each HTTP request is fully processed by exactly one thread.
This thread will often spend most of its time waiting for things like DB requests. The web container will create new threads as necessary.
Hope this helps.
I would use a ThreadPoolExecutor and submit the tasks to it. The executor can be configured with a fixed/varying number of threads, and with a work queue that can be bounded or not.
The advantages:
The total number of threads (as well as the queue size) can be bounded, so you have good control on resource consumption.
Threads are pooled, eliminating the overhead of thread starting per request
You can choose a task rejection policy (Occurs when the pool is at full capacity)
You can easily monitor the load on the pool
The executor mechanism supports convenient ways of tracking the asynchronous operation (using Future)
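A hedged sketch of such a configuration (plain java.util.concurrent; the pool sizes, queue bound, and rejection policy are illustrative choices, not recommendations):

```java
import java.util.concurrent.*;

public class BoundedPoolDemo {
    public static void main(String[] args) throws Exception {
        // 2-4 threads, a queue of at most 100 waiting tasks, and
        // CallerRunsPolicy as the rejection policy: overflow work runs on the
        // submitting (request) thread, which naturally throttles intake
        // instead of spawning unbounded threads or dropping tasks.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4,
                30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(100),
                new ThreadPoolExecutor.CallerRunsPolicy());

        // A Future lets the servlet track the async push notification if needed.
        Future<String> push = pool.submit(() -> "notification sent");
        System.out.println(push.get());
        System.out.println("queued=" + pool.getQueue().size()
                + " active~" + pool.getActiveCount());
        pool.shutdown();
    }
}
```

In a servlet you would create one such pool per application (e.g. in a ServletContextListener) and shut it down on context destroy, rather than building it per request.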
In general, that is the way; you can start any thread anywhere in a servlet web application.
But in particular, you should protect your JVM from starting too many threads per HTTP request. Someone may send a lot (or very, very many) of requests, and at some point your JVM will probably stop with an out-of-memory error or something similar.
So the better choice is to use one of the queues found in the java.util.concurrent package.
One option would be to use ExecutorService and its implementations, like ThreadPoolExecutor, to re-use the pooled threads and thus reduce creation overhead.
You can also use JMS to queue your tasks for later execution.
