Background: I'm just this week getting started with Quarkus, though I've worked with some streaming platforms before (especially http4s/fs2 in scala).
Working with Quarkus reactive (with Mutiny) and any reactive database client (the Mutiny reactive PostgreSQL client, reactive Elasticsearch, etc.), I'm a little confused about how to correctly manage blocking calls and thread pools.
The Quarkus documentation suggests that imperative or CPU-intensive code be annotated with @Blocking to ensure it is shifted to a worker pool and does not block the IO pool. This makes sense.
Consider the following:
public class Stuff {
    // imperative, CPU-intensive
    public static int cpuIntensive(String arg) { /* ... */ }

    // blocking IO
    public static List<Integer> fetchFromDb() { /* ... */ }

    // reactive IO
    public static Multi<String> fetchReactive() { /* ... */ }

    // reactive IO with CPU processing
    public static Multi<String> fetchReactiveCpuIntensive(String arg) {
        return fetchReactive() // reactive fetch
            .map(fetched -> String.valueOf(cpuIntensive(arg + fetched))); // CPU-intensive work
    }
}
It's not clear to me what happens in each of the above cases, and where each gets executed, if called from a RESTEasy Reactive REST endpoint without the @Blocking annotation.
Presumably it's safe to use any reactive client in a reactive REST endpoint without @Blocking. But does wrapping a blocking call in a Uni accomplish as much for 'unsafe' code? That is, will anything returning a Multi/Uni effectively run on the worker pool?
(I'll open follow-up posts about finer control of thread pools, as I don't see any way to 'shift' reactive IO calls to a separate pool from CPU-intensive work, which would be optimal.)
Edit
This question might imply I'm asking about return types (Uni/Multi vs direct objects), but it's really about the ability to select the thread pool in use at any given time. This Mutiny page on imperative-to-reactive somewhat answers my question, actually, along with the Mutiny infrastructure docs, which state that "the default executor is already configured to use the Quarkus worker thread pool."; the Mutiny thread control docs handle the rest, I think.
So my understanding is this:
If I have an endpoint which can conditionally return something non-blocking (e.g. a local non-blocking cache hit), then I can effectively return any way I want on the IO thread. But if said cache misses, I can either call a reactive client directly or use Mutiny to run a blocking action on the Quarkus worker pool. Similarly, Mutiny provides control to execute any given stream on a specific thread pool (executor).
And reactive clients (or anything effectively running on the non-IO pool) are safe to call because the IO loop is just subscribing to data emitted by the worker pool.
Lastly, it seems like I could configure a CPU-bound pool separately from an IO-bound worker pool and explicitly provide them as executors to whichever emitters I need. So ... I think I'm all set now.
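To make that concrete, here's a minimal sketch of the cache-hit/miss pattern above (CachedLookup, cache, and blockingDbFetch are hypothetical names, not from any Quarkus API): a hit resolves immediately on the calling IO thread, while a miss shifts the lazily-wrapped blocking call onto the Quarkus worker pool via runSubscriptionOn.

import io.smallrye.mutiny.Uni;
import io.smallrye.mutiny.infrastructure.Infrastructure;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CachedLookup {
    private final Map<String, String> cache = new ConcurrentHashMap<>(); // hypothetical local non-blocking cache

    public Uni<String> lookup(String key) {
        String hit = cache.get(key);
        if (hit != null) {
            return Uni.createFrom().item(hit); // resolved immediately, stays on the calling IO thread
        }
        return Uni.createFrom().item(() -> blockingDbFetch(key)) // lazy: nothing runs until subscription
                .runSubscriptionOn(Infrastructure.getDefaultWorkerPool()); // subscribe (and run) on the worker pool
    }

    private String blockingDbFetch(String key) { /* hypothetical blocking IO */ return "value for " + key; }
}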
This is a very good question!
The return type of a RESTEasy Reactive endpoint does not have any effect on which thread the endpoint will be served on.
The only thing that determines the thread is the presence of @Blocking / @NonBlocking.
The reason for this is simple: by looking at the return type alone, it is not possible to know whether the operation actually takes a long time to finish (and would thus block the thread).
A non-reactive return type, for example, does not imply that the operation is CPU intensive (you could simply be returning some canned JSON response).
A reactive type on the other hand provides no guarantee that the operation is non-blocking, because as you mention, a user could simply wrap a blocking operation with a reactive return type.
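To illustrate, here is a minimal sketch (the resource and slowBlockingCall are made up, not from the Quarkus docs): the reactive return type alone keeps the endpoint on the IO thread; only the annotation moves it to a worker thread.

import io.smallrye.common.annotation.Blocking;
import io.smallrye.mutiny.Uni;
import javax.ws.rs.GET;
import javax.ws.rs.Path;

@Path("/example")
public class ExampleResource {

    @GET
    @Path("/wrapped")
    public Uni<String> wrapped() {
        // Still served on the IO thread despite the reactive return type;
        // the blocking call inside will stall the event loop.
        return Uni.createFrom().item(() -> slowBlockingCall());
    }

    @GET
    @Path("/safe")
    @Blocking // the annotation, not the return type, moves execution to a worker thread
    public String safe() {
        return slowBlockingCall();
    }

    private String slowBlockingCall() { /* hypothetical blocking work */ return "done"; }
}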
Related
I am working on several Spring Boot applications which have the traditional pattern of thread-per-request. We are using spring-boot-webflux to acquire WebClient to perform our RESTful integration between the applications. Hence our application design requires that we block the publisher right after receiving a response.
Recently, we've been discussing whether we are unnecessarily spending resources by using a reactive module in our otherwise blocking application design. As I've understood it, WebClient makes use of the event loop by assigning a worker thread to perform the reactive actions in the event loop. So using WebClient with .block() would sleep the original thread while assigning another thread to perform the HTTP request. Compared to the alternative RestTemplate, it seems like WebClient would spend additional resources by using the event loop.
Is it correct that partially introducing spring-webflux in this way leads to additional spent resources while not yielding any positive contribution to performance, whether single-threaded or concurrent? We are not expecting to ever upgrade our current stack to be fully reactive, so the argument of gradual migration does not apply.
In this presentation Rossen Stoyanchev from the Spring team explains some of these points.
WebClient will use a limited number of threads - 2 per core for a total of 12 threads on my local machine - to handle all requests and their responses in the application. So if your application receives 100 requests and makes one request to an external server for each, WebClient will handle all of those using those threads in a non-blocking / asynchronous manner.
Of course, as you mention, once you call block your original thread will block, so it would be 100 threads + 12 threads for a total of 112 threads to handle those requests. But keep in mind that these 12 threads do not grow in size as you make more requests, and that they don't do I/O heavy lifting, so it's not like WebClient is spawning threads to actually perform the requests or keeping them busy on a thread-per-request fashion.
I'm not sure whether a thread under block() behaves the same as one making a blocking call through RestTemplate - it seems to me that in the former the thread should be inactive waiting for the NIO call to complete, while in the latter the thread should be handling I/O work, so maybe there's a difference there.
It gets interesting if you begin using the reactor goodies, for example handling requests that depend on one another, or many requests in parallel. Then WebClient definitely gets an edge as it'll perform all concurrent actions using the same 12 threads, instead of using a thread per request.
As an example, consider this application:
import java.util.Collection;
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

@SpringBootApplication
public class SO72300024 {

    private static final Logger logger = LoggerFactory.getLogger(SO72300024.class);

    public static void main(String[] args) {
        SpringApplication.run(SO72300024.class, args);
    }

    @RestController
    @RequestMapping("/blocking")
    static class BlockingController {

        @GetMapping("/{id}")
        String blockingEndpoint(@PathVariable String id) throws Exception {
            logger.info("Got request for {}", id);
            Thread.sleep(1000);
            return "This is the response for " + id;
        }

        @GetMapping("/{id}/nested")
        String nestedBlockingEndpoint(@PathVariable String id) throws Exception {
            logger.info("Got nested request for {}", id);
            Thread.sleep(1000);
            return "This is the nested response for " + id;
        }
    }

    @Bean
    ApplicationRunner run() {
        return args -> {
            Flux.just(callApi(), callApi(), callApi())
                    .flatMap(responseMono -> responseMono)
                    .collectList()
                    .block()
                    .stream()
                    .flatMap(Collection::stream)
                    .forEach(logger::info);
            logger.info("Finished");
        };
    }

    private Mono<List<String>> callApi() {
        WebClient webClient = WebClient.create("http://localhost:8080");
        logger.info("Starting");
        return Flux.range(1, 10).flatMap(i ->
                webClient
                        .get().uri("/blocking/{id}", i)
                        .retrieve()
                        .bodyToMono(String.class)
                        .doOnNext(resp -> logger.info("Received response {} - {}", i, resp))
                        .flatMap(resp -> webClient.get().uri("/blocking/{id}/nested", i)
                                .retrieve()
                                .bodyToMono(String.class)
                                .doOnNext(nestedResp -> logger.info("Received nested response {} - {}", i, nestedResp))))
                .collectList();
    }
}
If you run this app, you can see that all 30 requests are handled immediately in parallel by the same 12 (on my computer) threads. Neat! If you think you can benefit from this kind of parallelism in your logic, it's probably worth giving WebClient a shot.
If not, while I wouldn't actually worry about the "extra resource spending" given the reasons above, I don't think it would be worth adding the whole Reactor/WebFlux dependency for this - besides the extra baggage, in day-to-day operations it should be a lot simpler to reason about and debug RestTemplate and the thread-per-request model.
Of course, as others have mentioned, you ought to run load tests to have proper metrics.
According to the official Spring documentation for RestTemplate, it is in maintenance mode and will probably not be supported in future versions.
As of 5.0 this class is in maintenance mode, with only minor requests
for changes and bugs to be accepted going forward. Please, consider
using the org.springframework.web.reactive.client.WebClient which has
a more modern API and supports sync, async, and streaming scenarios.
As for system resources, it really depends on your use case and I would recommend running some performance tests, but it seems that for low workloads a blocking client could have better performance, owing to its dedicated thread per connection. As load increases, the NIO clients tend to perform better.
Update - Reactive API vs Http Client
It's important to understand the difference between the Reactive API (Project Reactor) and the HTTP client. Although WebClient uses the Reactive API, it doesn't add any additional concurrency until we explicitly use operators like flatMap or delay that can schedule execution on different thread pools. If we just use
webClient
.get()
.uri("<endpoint>")
.retrieve()
.bodyToMono(String.class)
.block()
the code will be executed on the caller thread, the same as for a blocking client.
If we enable debug logging for this code, we will see that the WebClient code is executed on the caller thread, but for network operations execution is switched to a reactor-http-nio-... thread.
The main difference is that internally WebClient uses asynchronous client based on non-blocking IO (NIO). These clients use Reactor pattern (event loop) to maintain a separate thread pool(s) which allow you to handle a large number of concurrent connections.
The purpose of I/O reactors is to react to I/O events and to dispatch
event notifications to individual I/O sessions. The main idea of I/O
reactor pattern is to break away from the one thread per connection
model imposed by the classic blocking I/O model.
By default, Reactor Netty is used, but you could consider the Jetty Reactive HttpClient, Apache HttpComponents (async), or even the AWS Common Runtime (CRT) HTTP Client if you create the required adapter (not sure one already exists).
In general, you can see the trend across the industry to use async I/O (NIO) because it is more resource-efficient for applications under high load.
In addition, to handle resources efficiently the whole flow must be async. By using block() we implicitly reintroduce the thread-per-connection approach, which eliminates most of the benefits of NIO. At the same time, using WebClient with block() can be considered a first step toward migrating to a fully reactive application.
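To illustrate that last point, here is a hedged sketch (the endpoint path is made up) contrasting a fully reactive exchange with a block() bridge:

import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

public class BlockVsSubscribe {
    private final WebClient webClient = WebClient.create("http://localhost:8080");

    public void demo() {
        Mono<String> response = webClient.get()
                .uri("/resource") // hypothetical endpoint
                .retrieve()
                .bodyToMono(String.class);

        // Fully reactive: the caller thread is released immediately;
        // a reactor-http-nio thread completes the exchange and runs the callback.
        response.subscribe(body -> System.out.println("got " + body));

        // Blocking bridge: the caller thread parks until the NIO exchange finishes,
        // reintroducing thread-per-connection semantics for this caller.
        String body = response.block();
        System.out.println("blocked for " + body);
    }
}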
Great question.
Last week we considered migrating from RestTemplate to WebClient.
This week I started testing the performance of the blocking WebClient against RestTemplate and, to my surprise, RestTemplate performed better in scenarios where the response payloads were large. The difference was considerable, with RestTemplate taking less than half the time to respond and using fewer resources.
I'm still carrying out the performance tests; I have now started testing with a wider range of concurrent users.
The application is MVC and uses Spring 5.3.19 and Spring Boot 2.6.7.
For performance testing I'm using JMeter, and for health checks VisualVM/JConsole.
Let's say I create an async REST API in Spring MVC with Java 8's CompletableFuture.
How is this called by the client? If it's non-blocking, does the endpoint return something before processing finishes? I.e.
#RequestMapping("/") //GET method
public CompletableFuture<String> meth(){
thread.sleep(10000);
String result = "lol";
return CompletableFuture.completedFuture(result);
}
How exactly does this work? (The code above is just something I made up on the spot.)
When I send a GET request from, say, Google Chrome at localhost:3000/, what happens? I'm a newbie to async APIs and would like some help.
No, the client doesn't know it's asynchronous. It'll have to wait for the result normally. It's just the server side that benefits from freeing up a worker thread to handle other requests.
In this version it's pointless, because CompletableFuture.completedFuture() creates a completed Future immediately.
However in a more complex piece of code, you might return a Future that is not yet complete. Spring will not send the response body until some other thread calls complete() on this Future.
Why not just use a new thread? Well, you could - but in some situations it might be more efficient not to. For example you might put a task into an Executor to be handled by a small pool of threads.
Or you might fire off a JMS message asking for the request to be handled by a completely separate machine. A different part of your program will respond to incoming JMS messages, find the corresponding Future and complete it. There is no need for a thread dedicated to this HTTP request to be alive while the work is being done on another system.
Very simple example:
#RequestMapping("/employeenames/{id}")
public CompletableFuture<String> getName(#PathVariable String id){
CompletableFuture<String> future = new CompletableFuture<>();
database.asyncSelect(
name -> future.complete(name),
"select name from employees where id = ?",
id
);
return future;
}
I've invented a plausible-ish API for an asynchronous database client here: asyncSelect(Consumer<String> callback, String preparedStatement, String... parameters). The point is that it fires off the query, then does not block the thread waiting for the DB to respond. Instead it leaves a callback (name -> future.complete(name)) for the DB client to invoke when it can.
This is not about improving API response times -- we do not send an HTTP response until we have a payload to provide. This is about using the resources on the server more efficiently, so that while we're waiting for the database to respond it can do other things.
There is a related, but different, concept of async REST, in which the server responds with 202 Accepted and a header like Location: /queue/12345, allowing the client to poll for the result. But this isn't what the code you asked about does.
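For contrast, here's a minimal sketch of that 202 Accepted pattern (QueueService and the paths are all hypothetical):

import java.net.URI;
import java.util.Optional;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
public class ReportController {
    interface QueueService {                    // hypothetical job queue
        String enqueue(String request);         // returns a job id
        Optional<String> result(String jobId);  // empty until the job finishes
    }

    private final QueueService queueService;

    ReportController(QueueService queueService) {
        this.queueService = queueService;
    }

    @PostMapping("/reports")
    ResponseEntity<Void> submit(@RequestBody String request) {
        String jobId = queueService.enqueue(request);
        return ResponseEntity.accepted()                 // 202 Accepted
                .location(URI.create("/queue/" + jobId)) // where to poll
                .build();
    }

    @GetMapping("/queue/{jobId}")
    ResponseEntity<String> poll(@PathVariable String jobId) {
        return queueService.result(jobId)
                .map(ResponseEntity::ok)                             // finished: 200 with the payload
                .orElseGet(() -> ResponseEntity.accepted().build()); // still processing: 202 again
    }
}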
CompletableFuture was introduced in Java 8 to make complex asynchronous programming easier to handle. It lets the programmer combine and cascade async calls, and offers the static utility methods runAsync and supplyAsync to abstract away the manual creation of threads.
These methods dispatch tasks to Java's common ForkJoinPool by default, or to a custom thread pool if provided as an optional argument.
If a CompletableFuture is returned by an endpoint method and complete() is never called on it, the request will hang until it times out.
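A small sketch of the pool-selection point (fetchFromDb is a hypothetical blocking lookup):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SupplyAsyncPools {
    public static void main(String[] args) {
        ExecutorService dbPool = Executors.newFixedThreadPool(10); // dedicated pool for blocking work

        // Runs on ForkJoinPool.commonPool() by default.
        CompletableFuture<String> onCommonPool =
                CompletableFuture.supplyAsync(() -> fetchFromDb("id-1"));

        // Runs on the supplied executor instead.
        CompletableFuture<String> onCustomPool =
                CompletableFuture.supplyAsync(() -> fetchFromDb("id-2"), dbPool);

        System.out.println(onCommonPool.join() + " / " + onCustomPool.join());
        dbPool.shutdown();
    }

    private static String fetchFromDb(String id) { /* hypothetical blocking lookup */ return "row-" + id; }
}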
We have a somewhat unique case where we need to interface with an outside API that requires us to long poll their endpoint for what they call real time events.
The thing is we may have as many as 80,000 people/devices hitting this endpoint at any given time, listening for events, 1 connection per device/person.
When a client makes a request from our Spring service to long poll for events, our service then in turn makes an async call to the outside API to long poll for events. The outside API has defined the minimum long poll timeout may be set to 180 seconds.
So here we have a situation where a thread pool with a queue will not work, because if we have a thread pool with something like (5 min, 10 max, 10 queue), the 10 threads being served may hog the spotlight, and the 10 in the queue will not get a chance until one of the current 10 is done.
We need a serve-it-or-fail-it model (we will put load balancers etc. behind it), but we don't want to leave a client hanging without actual polling happening.
We have been looking into using DeferredResult for this, and returning that from the controller.
Something to the tune of
@RequestMapping(value = "test/deferredResult", method = RequestMethod.GET)
DeferredResult<ResponseEntity> testDeferredResult() {
    final DeferredResult<ResponseEntity> deferredResult = new DeferredResult<ResponseEntity>();
    CompletableFuture.supplyAsync(() -> testService.test())
            .whenCompleteAsync((result, throwable) -> deferredResult.setResult(result));
    return deferredResult;
}
I am questioning whether I am on the right path, and also whether I should provide an executor (and, if so, what kind of executor and configuration) to the CompletableFuture.supplyAsync() method to best accomplish our task.
I have read various articles, posts, and such and am wanting to see if anyone has any knowledge that might help our specific situation.
The problem you are describing does not sound like one that can be solved nicely if you are using blocking IO. So you are on the right path, because DeferredResult allows you to produce the result using any thread, without blocking the servlet-container thread.
With regards to calling a long-polling API upstream, you need a NIO solution as well. If you use a Netty client, you can manage several thousand sockets using a single thread. When the NIO selector in Netty detects data, you get a channel callback and eventually delegate to a thread in the Netty worker thread pool, where you can call deferredResult.setResult. If you don't do blocking IO, the worker pool is usually sized to the number of CPU cores; otherwise you may need more threads.
There are still a number of challenges.
You probably need more than one server (or network interface) since there are only 65K ports.
Sockets in Java do not have write timeouts, so if a client refuses to read data from the socket and you send more data than your socket buffer holds, you will block the Netty worker thread(s) and then everything stops (a reverse slow-loris attack). This is a classic problem in large async setups, and one of the reasons for using frameworks like Hystrix (by Netflix).
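As a hedged sketch of that setup (the upstream URL, path, and timeout value are assumptions), a Netty-backed WebClient can complete the DeferredResult from a callback, so no servlet thread waits out the 180-second upstream poll:

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.context.request.async.DeferredResult;
import org.springframework.web.reactive.function.client.WebClient;

@RestController
public class EventsController {
    private final WebClient webClient = WebClient.create("https://outside-api.example.com");

    @GetMapping("/events")
    DeferredResult<ResponseEntity<String>> events() {
        // Time out slightly after the upstream's 180s long-poll window.
        DeferredResult<ResponseEntity<String>> result = new DeferredResult<>(190_000L);
        webClient.get().uri("/longpoll/events") // hypothetical upstream path
                .retrieve()
                .toEntity(String.class)
                .subscribe(result::setResult, result::setErrorResult); // completes on a Reactor NIO thread
        return result;
    }
}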
In both the C# and Java worlds, we can implement web controllers with either synchronous or asynchronous code. The following is a syntax example.
Java Spring MVC
@Controller
public class Controller {
    @GetMapping
    public ResponseObject doSomething() {
        // ...
        return new ResponseObject();
    }

    public Future<ResponseObject> doSomethingAsync() {
        // ...
        return AsyncTask.forValue(new ResponseObject());
    }
}
C# ASP.NET MVC
[Controller]
public class Controller {
public ActionResult DoSomething(){
// ...
return Json(new ResponseObject());
}
public async Task<ActionResult> DoSomethingAsync(){
// ...
return Json(new ResponseObject());
}
}
From my current knowledge, this is what happens on both platforms when a request is received
In the synchronous case, the thread that was dispatched to serve the web request stays busy serving it, which can take some time; if the number of threads the server allocates for workers is exhausted, new clients have to wait (call it the Slashdot effect or a DoS)
In the asynchronous case, the worker thread determines that the operation is deferred, assigns a task executor the duty of serving the request, and then serves a new request
In either case, a single request does not get a performance increase if its body is basically copied and pasted between the synchronous and asynchronous versions. I mean, if you don't parallelize I/O operations in code (e.g. using threads or more Futures within the method body), you don't get any performance improvement
The question
I simply want to ask, from a top-level view, what the real advantage is of employing such an asynchronous way of developing web services, considering that a massive number of requests will simply exhaust the worker executor's capacity instead of the application server's pool. For example, in Spring for Java, a thread in the DispatcherServlet will be released in favour of a thread pool that has limited capacity. In .NET, tasks and threads are slightly different, but the task executor still has limited capacity.
In other words, wouldn't it be simpler to tune the application server's thread pool capacity rather than using an additional layer of abstraction?
Or is there something I don't know yet about asynchronous programming? Why can't I understand its benefits in these kinds of scenarios?
I need to use the memcached Java API in my Scala/Akka code. This API gives you both synchronous and asynchronous methods. The asynchronous ones return java.util.concurrent.Future. There was a question here about dealing with Java Futures in Scala: How do I wrap a java.util.concurrent.Future in an Akka Future?. However, in my case I have two options:
Using the synchronous API and wrapping the blocking code in a future marked as blocking:
Future {
blocking {
cache.get(key) //synchronous blocking call
}
}
Using the asynchronous Java API and polling the Java Future every n ms to check if it has completed (as described in one of the answers to the linked question above).
Which one is better? I am leaning towards the first option because polling can dramatically impact response times. Shouldn't the blocking { } block prevent blocking of the whole pool?
I always go with the first option, but I am doing it in a slightly different way: I don't use the blocking feature. (Actually, I have not thought about it yet.) Instead, I provide a custom execution context to the Future that wraps the synchronous blocking call. So it looks basically like this:
val ecForBlockingMemcachedStuff = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(100)) // whatever number you think is appropriate
// i create a separate ec for each blocking client/resource/api i use
Future {
cache.get(key) //synchronous blocking call
}(ecForBlockingMemcachedStuff) // or mark the execution context implicit. I like to mention it explicitly.
So all the blocking calls use a dedicated execution context (= thread pool), separate from your main execution context, which is responsible for the non-blocking stuff.
This approach is also explained in an online training video for Play/Akka provided by Typesafe. There is a video in lesson 4 about how to handle blocking calls. It is explained by Nilanjan Raychaudhuri (I hope I spelled that correctly), who is a well-known author of Scala books.
Update: I had a discussion with Nilanjan on Twitter. He explained the difference between the blocking approach and a custom ExecutionContext. The blocking feature just creates a special ExecutionContext that provides a naive answer to the question of how many threads you will need: it spawns a new thread whenever all the other existing threads in the pool are busy. So it is actually an uncontrolled ExecutionContext; it could create lots of threads and lead to problems like an out-of-memory error. The solution with the custom execution context is actually better, because it makes this problem obvious. Nilanjan also added that you need to consider circuit breaking for the case where this pool gets overloaded with requests.
TLDR: Yeah, blocking calls suck. Use a custom/dedicated ExecutionContext for blocking calls. Also consider circuit breaking.
The Akka documentation provides a few suggestions on how to deal with blocking calls:
In some cases it is unavoidable to do blocking operations, i.e. to put
a thread to sleep for an indeterminate time, waiting for an external
event to occur. Examples are legacy RDBMS drivers or messaging APIs,
and the underlying reason is typically that (network) I/O occurs under
the covers. When facing this, you may be tempted to just wrap the
blocking call inside a Future and work with that instead, but this
strategy is too simple: you are quite likely to find bottlenecks or
run out of memory or threads when the application runs under increased
load.
The non-exhaustive list of adequate solutions to the "blocking problem" includes the following suggestions:
- Do the blocking call within an actor (or a set of actors managed by a router), making sure to configure a thread pool which is either dedicated for this purpose or sufficiently sized.
- Do the blocking call within a Future, ensuring an upper bound on the number of such calls at any point in time (submitting an unbounded number of tasks of this nature will exhaust your memory or thread limits).
- Do the blocking call within a Future, providing a thread pool with an upper limit on the number of threads which is appropriate for the hardware on which the application runs.
- Dedicate a single thread to manage a set of blocking resources (e.g. a NIO selector driving multiple channels) and dispatch events as they occur as actor messages.
The first possibility is especially well-suited for resources which
are single-threaded in nature, like database handles which
traditionally can only execute one outstanding query at a time and use
internal synchronization to ensure this. A common pattern is to create
a router for N actors, each of which wraps a single DB connection and
handles queries as sent to the router. The number N must then be tuned
for maximum throughput, which will vary depending on which DBMS is
deployed on what hardware.