Spring WebFlux perform parallel HTTP requests and deserialize the response


I have a List<String> containing URLs and I would like to perform a GET request for each URL in that List.
Those requests should be made in parallel. After all the requests are done I would like to have a List<CustomModel> containing all the deserialized responses.
So I created a method to make the HTTP request
public Flux<JsonNode> performGetRequest(String url) {
    WebClient webClient = WebClient.create(String.format("%s%s", API_BASE_URL, url));
    return webClient.get()
            .retrieve()
            .bodyToFlux(JsonNode.class);
}
The above method is called this way
public List<CustomModel> fetch(List<String> urls) {
    return Flux.fromIterable(urls)
            .parallel()
            .runOn(Schedulers.boundedElastic())
            .flatMap(this::performGetRequest)
            .flatMap(jsonNode -> Flux.fromIterable(customDeserialize(jsonNode)))
            .sequential()
            .collectList()
            .flatMapMany(Flux::fromIterable)
            .collectList()
            .block();
}
For each response, I am using a custom method to deserialize the response
private List<CustomModel> customDeserialize(final JsonNode jsonNodeResponse) {
    List<CustomModel> customModelList = new ArrayList<>();
    for (JsonNode block : jsonNodeResponse) {
        // deserialize the response, create an instance of CustomModel class
        // and add it to customModelList
    }
    return customModelList;
}
The problem is that even though I use the parallel() method, the whole process is probably not running in parallel. The time it takes to complete indicates that I am doing something wrong.
Am I missing something?

Since you are calling block(), I'm going to assume you are running an MVC servlet application that uses WebClient only for REST calls.
If you are not running a full WebFlux application, your application will start up a single event loop that processes all scheduled events. If you are running a full WebFlux application, you get as many event loops as there are cores on the machine.
On the use of parallel(), the Reactor documentation says:
To obtain a ParallelFlux, you can use the parallel() operator on any Flux. By itself, this method does not parallelize the work. Rather, it divides the workload into “rails” (by default, as many rails as there are CPU cores).
In order to tell the resulting ParallelFlux where to run each rail (and, by extension, to run rails in parallel) you have to use runOn(Scheduler). Note that there is a recommended dedicated Scheduler for parallel work: Schedulers.parallel().
You are using the boundedElastic scheduler, which is not optimised for parallel work.
But I want to point out something very important: you are doing async I/O, not parallel work. You will most likely not see any performance gain from running in parallel, since most of your I/O consists of firing off a request and then just waiting for a response.
ParallelFlux will ensure that all CPU cores are being used, but there are also some penalties. There is a setup time to get all cores ready to do work, and the work itself is not CPU-intensive: the threads just fire off, say, 1000 requests, and then they are all done and have to wait for responses.
Workers need to be set up on the cores, and the information needs to be sent to each core, retrieved, etc.
parallel() gains most of its benefits when you have CPU-intensive work, where each event needs to perform heavy computations on multiple cores. But for async work a regular Flux will most likely be enough.
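To make the "async I/O, not parallel work" point concrete, here is a minimal JDK-only sketch (no Reactor or WebClient involved). It assumes a hypothetical fakeRequest helper that stands in for a non-blocking HTTP call by completing a CompletableFuture from a timer after a fixed delay. All simulated requests are driven by a single thread, yet they overlap, because waiting for a response does not occupy a thread.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class AsyncIoSketch {

    // Hypothetical stand-in for a non-blocking HTTP call: the future completes
    // after delayMs, and no thread is held while the "request" is in flight.
    static CompletableFuture<String> fakeRequest(ScheduledExecutorService timer, int id, long delayMs) {
        CompletableFuture<String> response = new CompletableFuture<>();
        timer.schedule(() -> response.complete("response-" + id), delayMs, TimeUnit.MILLISECONDS);
        return response;
    }

    // Fires `count` simulated requests using one timer thread and returns the elapsed time in ms.
    static long runAll(int count, long delayMs) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            long start = System.nanoTime();
            List<CompletableFuture<String>> inFlight = IntStream.range(0, count)
                    .mapToObj(i -> fakeRequest(timer, i, delayMs))
                    .collect(Collectors.toList());
            // Wait until every simulated response has arrived.
            CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            timer.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("100 simulated requests took about " + runAll(100, 200) + " ms");
    }
}
```

With 100 requests of 200 ms each, the total should stay close to one delay rather than the sum of all delays, which is the same reason a plain Flux with flatMap is usually enough for HTTP calls.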
Here is what Simon Baslé, one of the Reactor developers, has to say about running I/O work in Reactor, parallel vs async.
Also worth mentioning: the boundedElastic scheduler is tuned for blocking work, as a fallback to regular servlet behaviour in a pure WebFlux application.
You are running WebFlux in a servlet application, so the benefits you get may not be as full as in a pure WebFlux application.

I'm not 100% sure if this is the issue here, but I noticed when working with WebClient and ParallelFlux, that the WebClient is only returning the Publisher for the response (bodyToMono / bodyToFlux), not for the actual request.
Consider wrapping the remote call with Flux.defer / Mono.defer so that the request itself is already part of the Publisher, e.g. something like:
.flatMap(url -> Flux.defer(() -> performGetRequest(url)))

Related

Does the use of Spring Webflux's WebClient in a blocking application design cause a larger use of resources than RestTemplate

I am working on several spring-boot applications which have the traditional pattern of thread-per-request. We are using Spring-boot-webflux to acquire WebClient to perform our RESTful integration between the applications. Hence our application design requires that we block the publisher right after receiving a response.
Recently, we've been discussing whether we are unnecessarily spending resources using a reactive module in our otherwise blocking application design. As I've understood it, WebClient makes use of the event loop by assigning a worker thread to perform the reactive actions in the event loop. So using webclient with .block() would sleep the original thread while assigning another thread to perform the http-request. Compared to the alternative RestTemplate, it seems like WebClient would spend additional resources by using the event loop.
Is it correct that partially introducing spring-webflux in this way leads to additional spent resources while not yielding any positive contribution to performance, neither single-threaded nor concurrent? We are not expecting to ever upgrade our current stack to be fully reactive, so the argument of gradually upgrading does not apply.

In this presentation Rossen Stoyanchev from the Spring team explains some of these points.
WebClient will use a limited number of threads - 2 per core for a total of 12 threads on my local machine - to handle all requests and their responses in the application. So if your application receives 100 requests and makes one request to an external server for each, WebClient will handle all of those using those threads in a non-blocking / asynchronous manner.
Of course, as you mention, once you call block your original thread will block, so it would be 100 threads + 12 threads for a total of 112 threads to handle those requests. But keep in mind that these 12 threads do not grow in size as you make more requests, and that they don't do I/O heavy lifting, so it's not like WebClient is spawning threads to actually perform the requests or keeping them busy on a thread-per-request fashion.
I'm not sure whether a thread waiting in block() behaves the same as one making a blocking call through RestTemplate. It seems to me that in the former case the thread should be inactive, waiting for the NIO call to complete, while in the latter the thread should be handling I/O work, so maybe there's a difference there.
It gets interesting if you begin using the reactor goodies, for example handling requests that depend on one another, or many requests in parallel. Then WebClient definitely gets an edge as it'll perform all concurrent actions using the same 12 threads, instead of using a thread per request.
As an example, consider this application:
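import java.util.Collection;
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

@SpringBootApplication
public class SO72300024 {

    private static final Logger logger = LoggerFactory.getLogger(SO72300024.class);

    public static void main(String[] args) {
        SpringApplication.run(SO72300024.class, args);
    }

    // Blocking endpoints that each take one second to answer
    @RestController
    @RequestMapping("/blocking")
    static class BlockingController {

        @GetMapping("/{id}")
        String blockingEndpoint(@PathVariable String id) throws Exception {
            logger.info("Got request for {}", id);
            Thread.sleep(1000);
            return "This is the response for " + id;
        }

        @GetMapping("/{id}/nested")
        String nestedBlockingEndpoint(@PathVariable String id) throws Exception {
            logger.info("Got nested request for {}", id);
            Thread.sleep(1000);
            return "This is the nested response for " + id;
        }
    }

    @Bean
    ApplicationRunner run() {
        return args -> {
            // Three batches of ten calls each, all subscribed concurrently
            Flux.just(callApi(), callApi(), callApi())
                    .flatMap(responseMono -> responseMono)
                    .collectList()
                    .block()
                    .stream()
                    .flatMap(Collection::stream)
                    .forEach(logger::info);
            logger.info("Finished");
        };
    }

    private Mono<List<String>> callApi() {
        WebClient webClient = WebClient.create("http://localhost:8080");
        logger.info("Starting");
        return Flux.range(1, 10).flatMap(i ->
                        webClient
                                .get().uri("/blocking/{id}", i)
                                .retrieve()
                                .bodyToMono(String.class)
                                .doOnNext(resp -> logger.info("Received response {} - {}", i, resp))
                                // The nested call depends on the first response having arrived
                                .flatMap(resp -> webClient.get().uri("/blocking/{id}/nested", i)
                                        .retrieve()
                                        .bodyToMono(String.class)
                                        .doOnNext(nestedResp -> logger.info("Received nested response {} - {}", i, nestedResp))))
                .collectList();
    }
}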
If you run this app, you can see that all 30 requests are handled immediately in parallel by the same 12 (on my machine) threads. Neat! If you think you can benefit from this kind of parallelism in your logic, it's probably worth giving WebClient a shot.
If not, while I wouldn't actually worry about the "extra resource spending" given the reasons above, I don't think it would be worth it adding the whole reactor/webflux dependency for this - besides the extra baggage, in day to day operations it should be a lot simpler to reason about and debug RestTemplate and the thread-per-request model.
Of course, as others have mentioned, you ought to run load tests to have proper metrics.

According to the official Spring documentation, RestTemplate is in maintenance mode and will probably not be supported in future versions.
As of 5.0 this class is in maintenance mode, with only minor requests
for changes and bugs to be accepted going forward. Please, consider
using the org.springframework.web.reactive.client.WebClient which has
a more modern API and supports sync, async, and streaming scenarios
As for system resources, it really depends on your use case and I would recommend running some performance tests, but it seems that for low workloads a blocking client could perform better, owing to its dedicated thread per connection. As load increases, the NIO clients tend to perform better.
Update - Reactive API vs Http Client
It's important to understand the difference between the Reactive API (Project Reactor) and the HTTP client. Although WebClient uses the Reactive API, it doesn't add any additional concurrency until we explicitly use operators like flatMap or delay that could schedule execution on different thread pools. If we just use
webClient
    .get()
    .uri("<endpoint>")
    .retrieve()
    .bodyToMono(String.class)
    .block()
the code will be executed on the caller thread that is the same as for blocking client.
If we enable debug logging for this code, we will see that WebClient code is executed on the caller thread but for network operations execution will be switched to reactor-http-nio-... thread.
The main difference is that internally WebClient uses asynchronous client based on non-blocking IO (NIO). These clients use Reactor pattern (event loop) to maintain a separate thread pool(s) which allow you to handle a large number of concurrent connections.
The purpose of I/O reactors is to react to I/O events and to dispatch
event notifications to individual I/O sessions. The main idea of I/O
reactor pattern is to break away from the one thread per connection
model imposed by the classic blocking I/O model.
By default, Reactor Netty is used, but you could consider the Jetty Reactive HttpClient, Apache HttpComponents (async) or even the AWS Common Runtime (CRT) HTTP client if you create the required adapter (not sure it already exists).
In general, you can see a trend across the industry towards async I/O (NIO) clients, because they are more resource-efficient for applications under high load.
In addition, to handle resources efficiently the whole flow must be async. By using block() we are implicitly reintroducing the thread-per-connection approach, which eliminates most of the benefits of NIO. At the same time, using WebClient with block() could be considered a first step in migrating to a fully reactive application.

Great question.
Last week we considered migrating from RestTemplate to WebClient.
This week I started testing the performance of the blocking WebClient against RestTemplate. To my surprise, RestTemplate performed better in scenarios where the response payloads were large. The difference was considerable, with RestTemplate taking less than half the time to respond and using fewer resources.
I'm still carrying out the performance tests; I have now started running them with a wider range of users.
The application is MVC and is using Spring 5.3.19 and Spring Boot 2.6.7.
For performance testing I'm using JMeter, and for health checks VisualVM/JConsole.

Parallel Flux vs Flux in project Reactor

What I have understood from the docs is that a ParallelFlux essentially divides the flux elements into separate rails (something like grouping). As far as threads are concerned, that is the job of schedulers. Assume everything runs on the same scheduler instance, provided via the runOn() method.
Let's consider a situation like the one below:
Mono<Response> response = webClientCallApi(..); // function returning a Mono from a WebClient call
Now let's say we make around 100 calls
Flux.range(0, 100).subscribeOn(Schedulers.boundedElastic()).flatMap(i -> webClientCallApi(i)).collectList() // or subscribe somehow
and if we use ParallelFlux:
Flux.range(0, 100).parallel().runOn(Schedulers.boundedElastic()).flatMap(i -> webClientCallApi(i)).sequential().collectList();
So if my understanding is correct, the two seem pretty much similar. What are the advantages of ParallelFlux over Flux, and when should you use ParallelFlux over Flux?

In practice, you'll likely very rarely need to use a parallel flux, including in this example.
In your example, you're firing off 100 web service calls. Bear in mind the actual work needed to do this is very low: you generate and fire off an asynchronous request, and then some time later you receive a response back. In between that request and response you're not doing any work at all; it simply takes a tiny amount of CPU resources when each request is sent, and another tiny amount when each response is received. (This is one of the core advantages of using an asynchronous framework to make your web requests: you're not tying up any threads while the request is in flight.)
If you split this flux and run it in parallel, you're saying that you want these tiny amounts of CPU resources to be split so they can run simultaneously, on different CPU cores. This makes absolutely no sense - the overhead of splitting the flux, running it in parallel and then combining it later is going to be much, much greater than just leaving it to execute on a normal, sequential scheduler.
On the other hand, let's say I had a Flux<Integer> and I wanted to check if each of those integers was a prime for example - or perhaps a Flux<String> of passwords that I wanted to check against a BCrypt hash. Those sorts of operations are genuinely CPU intensive, so in that case a parallel flux, used to split execution across cores, could make a lot of sense. In reality though, those situations occur quite rarely in the normal reactor use cases.
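As a rough illustration of the CPU-bound case, here is a JDK-only sketch that uses parallel streams as a stand-in for ParallelFlux (a deliberately swapped-in technique, not Reactor code). The class and method names are made up for this example. With Reactor, the comparable shape would be a Flux followed by parallel(), runOn(Schedulers.parallel()), the CPU-heavy operator, and sequential().

```java
import java.util.stream.IntStream;

class CpuBoundSketch {

    // Trial division: intentionally naive so that each element costs real CPU time.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // All checks run one after another on the calling thread.
    static long countPrimesSequential(int upTo) {
        return IntStream.rangeClosed(1, upTo)
                .filter(CpuBoundSketch::isPrime)
                .count();
    }

    // The range is split into chunks that are checked on several cores at once.
    static long countPrimesParallel(int upTo) {
        return IntStream.rangeClosed(1, upTo)
                .parallel()
                .filter(CpuBoundSketch::isPrime)
                .count();
    }

    public static void main(String[] args) {
        int upTo = 5_000_000;
        long start = System.nanoTime();
        long sequential = countPrimesSequential(upTo);
        long middle = System.nanoTime();
        long parallel = countPrimesParallel(upTo);
        long end = System.nanoTime();
        System.out.println("sequential: " + sequential + " primes in " + (middle - start) / 1_000_000 + " ms");
        System.out.println("parallel:   " + parallel + " primes in " + (end - middle) / 1_000_000 + " ms");
    }
}
```

Both methods return the same count; any speed-up from the parallel version depends on the number of cores and on each element being expensive enough to outweigh the cost of splitting and merging.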
(Also, just as a closing note, you almost always want to use Schedulers.parallel() with a parallel flux, not Schedulers.boundedElastic().)

How to improve the performance of a REST call which internally makes other REST calls

I am creating an endpoint that retrieves some data. Internally it makes 3 different REST calls, and this hurts the performance of my application.
My endpoint logic:
1. REST call to get the userApps()
2. iterate over userApps
   2.1 make a REST call to get the appDetails
   2.2 use the above response to make the 3rd REST call, which returns a list
   2.3 iterate over that list, filter out the required fields and put them in the main response object
3. return response
So this much complexity hampers the performance.
I have tried adding multithreading, but the time taken by the normal code and the multithreaded code is almost the same.
The constraints are: we cannot modify the 3 external REST calls to support pagination.
We cannot add pagination ourselves because we don't have a database.
Is there any solution?

You shouldn't add threading, you should remove threads altogether. I.e. you should make all your code non-blocking. This just means that all the work will basically be done in the http-client's threadpool, and all the waiting can be done in the operating system's selector (which we want).
Here is a sketch of how this core logic would work, assuming your HTTP calls return CompletableFuture.
public CompletableFuture<List<Something>> retrieveSomethings() {
    return retrieveUserApps().thenCompose(this::retrieveAllAppsSomethings);
}

public CompletableFuture<List<Something>> retrieveAllAppsSomethings(List<UserApp> apps) {
    // start one non-blocking retrieval per app
    List<CompletableFuture<List<Something>>> futures = apps.stream()
            .map(this::retrieveAppSomethings)
            .collect(Collectors.toList());
    // allOf() only signals completion (CompletableFuture<Void>), so read the results from the individual futures
    return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
            .thenApply(done -> futures.stream()
                    .map(CompletableFuture::join)
                    .flatMap(List::stream)
                    .collect(Collectors.toList()))
            .thenApply(this::filterSomethings);
}

public CompletableFuture<List<Something>> retrieveAppSomethings(UserApp app) {
    return retrieveAppDetails(app).thenCompose(this::retrieveAppDetailSomethings);
}
All this does, is to make everything non-blocking, so everything that can be run in parallel will run in parallel. There is no need to limit anything, since everything will be run in the http-client's threadpool, which is most likely limited. It doesn't matter anyway, because waiting will not take up a thread.
All you have to do for the above code is to implement retrieveUserApps(), retrieveAppDetails(app) and retrieveAppDetailSomethings(appDetail). All of these should return a CompletableFuture<> and be implemented with the async-enabled version of your HTTP client.
This will make retrieving data for 1 app or 100 apps the same, since all of those will run in parallel (assuming they all take the same time and the downstream systems can handle this many parallel requests).
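For a version that can be run as-is, here is a minimal sketch of the same fan-out, assuming plain strings instead of the UserApp/Something types and simulated calls instead of a real HTTP client. The three retrieve* methods are hypothetical stand-ins that each complete after 200 ms via CompletableFuture.delayedExecutor (Java 9+).

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

class FanOutSketch {

    // Hypothetical stand-in for an async HTTP client: tasks start 200 ms after submission,
    // and the caller is never blocked while waiting.
    static final Executor AFTER_200_MS = CompletableFuture.delayedExecutor(200, TimeUnit.MILLISECONDS);

    static CompletableFuture<List<String>> retrieveUserApps() {
        return CompletableFuture.supplyAsync(() -> List.of("app1", "app2", "app3"), AFTER_200_MS);
    }

    static CompletableFuture<String> retrieveAppDetails(String app) {
        return CompletableFuture.supplyAsync(() -> app + "-details", AFTER_200_MS);
    }

    static CompletableFuture<List<String>> retrieveAppDetailSomethings(String details) {
        return CompletableFuture.supplyAsync(() -> List.of(details + "-a", details + "-b"), AFTER_200_MS);
    }

    static CompletableFuture<List<String>> retrieveSomethings() {
        return retrieveUserApps().thenCompose(apps -> {
            // One chain per app; all chains are in flight at the same time.
            List<CompletableFuture<List<String>>> perApp = apps.stream()
                    .map(app -> retrieveAppDetails(app).thenCompose(FanOutSketch::retrieveAppDetailSomethings))
                    .collect(Collectors.toList());
            // When every chain is done, flatten the per-app lists in the original app order.
            return CompletableFuture.allOf(perApp.toArray(new CompletableFuture[0]))
                    .thenApply(done -> perApp.stream()
                            .map(CompletableFuture::join)
                            .flatMap(List::stream)
                            .collect(Collectors.toList()));
        });
    }

    public static void main(String[] args) {
        // join() is only used here to print the result; a real endpoint would return the future.
        System.out.println(retrieveSomethings().join());
    }
}
```

Because the per-app chains overlap, the total time in this sketch is driven by the three sequential hops (apps, details, somethings) rather than by the number of apps.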

Run Async task, before return flux db entities

I have Flux<URL>. How can I make multiple concurrent void requests for each URL (for example myWebClient.refresh(URL)), then (after all requests done) read data from the database and return Flux<MyAnyEntity> (for example repo.findAll())?

You can achieve that using Flux/Mono operators:
// get the URIs from somewhere
Flux<URI> uris = //...
Flux<MyAnyEntity> entities = uris
    // map each URI to an HTTP client call and do nothing with the response
    .flatMap(uri -> webClient.get().uri(uri).exchange().then())
    // chain that call with a call on your repository
    .thenMany(repo.findAll());
Update:
This code is naturally asynchronous, non-blocking so all operations in the flatMap operator will be executed concurrently, according to the demand communicated by the consumer (this is the backpressure we're talking about).
If the Reactive Streams Subscriber requests N elements via request(N), then N requests might be executed concurrently. I don't think this is something you want to deal with directly, although you can influence things using windowing operators for micro-batching operations.
Using .subscribeOn(Schedulers.parallel()) will not improve concurrency in this case - as stated in the reference documentation you should only use that for CPU-bound work.

Async API design client

Let's say I create an async REST API in Spring MVC with Java 8's CompletableFuture.
How is this called from the client? If it's non-blocking, does the endpoint return something before processing? I.e.
@RequestMapping("/") // GET method
public CompletableFuture<String> meth() throws InterruptedException {
    Thread.sleep(10000);
    String result = "lol";
    return CompletableFuture.completedFuture(result);
}
How does this exactly work? (This code above is just a randomly made code I just thought of).
When I send a GET request from, say, Google Chrome to localhost:3000/, what happens? I'm a newbie to async APIs and would like some help.

No, the client doesn't know it's asynchronous. It'll have to wait for the result normally. It's just the server side that benefits from freeing up a worker thread to handle other requests.

In this version it's pointless, because CompletableFuture.completedFuture() creates a completed Future immediately.
However in a more complex piece of code, you might return a Future that is not yet complete. Spring will not send the response body until some other thread calls complete() on this Future.
Why not just use a new thread? Well, you could - but in some situations it might be more efficient not to. For example you might put a task into an Executor to be handled by a small pool of threads.
Or you might fire off a JMS message asking for the request to be handled by a completely separate machine. A different part of your program will respond to incoming JMS messages, find the corresponding Future and complete it. There is no need for a thread dedicated to this HTTP request to be alive while the work is being done on another system.
Very simple example:
@RequestMapping("/employeenames/{id}")
public CompletableFuture<String> getName(@PathVariable String id) {
    CompletableFuture<String> future = new CompletableFuture<>();
    database.asyncSelect(
        name -> future.complete(name),
        "select name from employees where id = ?",
        id
    );
    return future;
}
I've invented a plausible-ish API for an asynchronous database client here: asyncSelect(Consumer<String> callback, String preparedstatement, String... parameters). The point is that it fires off the query, then does not block the thread waiting for the DB to respond. Instead it leaves a callback (name -> future.complete(name)) for the DB client to invoke when it can.
This is not about improving API response times -- we do not send an HTTP response until we have a payload to provide. This is about using the resources on the server more efficiently, so that while we're waiting for the database to respond it can do other things.
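The mechanics can be tried outside Spring with a small JDK-only sketch. It assumes a made-up, simplified asyncSelect(callback, id) backed by a single "fake-db" timer thread that answers after 100 ms; none of this is a real database API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class CallbackSketch {

    // Hypothetical async "database": invokes the callback from its own thread after 100 ms.
    static final ScheduledExecutorService DB_THREAD = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "fake-db");
        t.setDaemon(true); // do not keep the JVM alive for this sketch
        return t;
    });

    static void asyncSelect(Consumer<String> callback, String id) {
        DB_THREAD.schedule(() -> callback.accept("employee-" + id), 100, TimeUnit.MILLISECONDS);
    }

    // Mirrors the controller method: returns immediately with a future that is not yet complete.
    static CompletableFuture<String> getName(String id) {
        CompletableFuture<String> future = new CompletableFuture<>();
        asyncSelect(future::complete, id);
        return future;
    }

    public static void main(String[] args) {
        CompletableFuture<String> future = getName("42");
        // The calling thread is free at this point; the result arrives later via the callback.
        System.out.println("done right after the call? " + future.isDone());
        System.out.println("result: " + future.join());
    }
}
```

In a Spring MVC controller the framework plays the role of main here: it receives the incomplete future, releases the request thread, and writes the response once the callback completes the future.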
There is a related but different concept, async REST, in which the server responds with 202 Accepted and a header like Location: /queue/12345, allowing the client to poll for the result. But this isn't what the code you asked about does.

CompletableFuture was introduced in Java 8 to make complex asynchronous programming easier to handle. It lets the programmer combine and cascade async calls, and offers the static utility methods runAsync and supplyAsync to abstract away the manual creation of threads.
These methods dispatch tasks to Java's common thread pool by default, or to a custom thread pool if one is provided as an optional argument.
If a CompletableFuture is returned by an endpoint method and complete() is never called, the request will hang until it times out.
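Both points can be seen in a short JDK-only sketch. The pool name and method names are invented for the example, and orTimeout (Java 9+) is used here only as a stand-in for the request timeout that the web framework would apply to a future nobody completes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class SupplyAsyncSketch {

    // Passes a custom pool as the optional second argument and reports which thread ran the task.
    static String runOnCustomPool() {
        ExecutorService pool = Executors.newFixedThreadPool(1, r -> new Thread(r, "custom-pool-worker"));
        try {
            return CompletableFuture
                    .supplyAsync(() -> Thread.currentThread().getName(), pool)
                    .join();
        } finally {
            pool.shutdown();
        }
    }

    // A future that nobody completes: without a timeout, join() would wait forever.
    static boolean timesOutWhenNeverCompleted(long timeoutMs) {
        CompletableFuture<String> neverCompleted = new CompletableFuture<>();
        try {
            neverCompleted.orTimeout(timeoutMs, TimeUnit.MILLISECONDS).join();
            return false;
        } catch (CompletionException e) {
            // orTimeout completes the future exceptionally with a TimeoutException.
            return e.getCause() instanceof TimeoutException;
        }
    }

    public static void main(String[] args) {
        System.out.println("ran on: " + runOnCustomPool());
        System.out.println("timed out: " + timesOutWhenNeverCompleted(100));
    }
}
```

Without the second argument, supplyAsync would use the JDK's default executor instead of the named worker thread.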
