Multiple asynchronous HTTP requests using RestTemplate - java

I have a service which uses Spring's RestTemplate to call out to multiple URLs.
To improve performance I'd like to perform these requests in parallel. The two options available to me are:
Java 8 parallel streams leveraging the fork-join common pool
CompletableFuture using an isolated thread pool
Just wondering whether it is best practice to use parallel streams with blocking I/O calls?

A ForkJoinPool isn't ideal for IO work, since you don't gain any of the benefits of its work-stealing properties. Also, if you used the commonPool and other parts of your app did as well, you might interfere with them. A dedicated thread pool, i.e. an ExecutorService, is probably the better of those two options.
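To illustrate, here is a minimal sketch of that dedicated-pool approach (the class name, pool size, and use of getForObject are illustrative assumptions, not from the question):
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import org.springframework.web.client.RestTemplate;

public class ParallelCalls {

    private final RestTemplate restTemplate = new RestTemplate();
    // Sized for blocking IO; more threads than cores is fine here
    private final ExecutorService ioPool = Executors.newFixedThreadPool(16);

    public List<String> fetchAll(List<String> urls) {
        List<CompletableFuture<String>> futures = urls.stream()
                .map(url -> CompletableFuture.supplyAsync(
                        () -> restTemplate.getForObject(url, String.class), ioPool))
                .collect(Collectors.toList());
        // Wait for every request to complete, then collect the bodies
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}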
I'd like to suggest something even better. Instead of writing all the async wrapping code yourself, consider using Spring's AsyncRestTemplate. It's included in the Spring Web library, and its API is almost identical to RestTemplate's.
Spring's central class for asynchronous client-side HTTP access.
Exposes similar methods as RestTemplate, but returns ListenableFuture
wrappers as opposed to concrete results.
[...]
Note: by default AsyncRestTemplate relies on standard JDK facilities
to establish HTTP connections. You can switch to use a different HTTP
library such as Apache HttpComponents, Netty, and OkHttp by using a
constructor accepting an AsyncClientHttpRequestFactory.
ListenableFuture instances can easily be converted to CompletableFuture instances through ListenableFuture.completable().
As noted in the Javadoc, you can control which async mechanism you want to use by specifying an AsyncClientHttpRequestFactory. There are built-in implementations for each of the libraries listed. Internally, some of these libraries might do what you suggested and run blocking IO on dedicated thread pools. Others, like Netty (if memory serves), use non-blocking IO to run the connections. You might gain some benefit from that.
Then it's up to you how you reduce the results. With CompletableFuture, you have access to the anyOf and allOf helpers and any of the combination instance methods.
For example,
URI exampleURI = URI.create("https://www.stackoverflow.com");
AsyncRestTemplate template = new AsyncRestTemplate(/* optionally a specific request factory */);

var future1 = template.exchange(exampleURI, HttpMethod.GET, null, String.class).completable();
var future2 = template.exchange(exampleURI, HttpMethod.GET, null, String.class).completable();
var future3 = template.exchange(exampleURI, HttpMethod.GET, null, String.class).completable();

CompletableFuture.allOf(future1, future2, future3).thenRun(() -> {
    // all three responses have arrived
});
AsyncRestTemplate has since been deprecated in favor of Spring WebFlux's WebClient. That API is considerably different, so I won't go into it (except to say that it also lets you get back a CompletableFuture).
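For completeness, a rough sketch of the WebClient equivalent (the URL is a placeholder); Mono's toFuture() does the bridging back to CompletableFuture:
import java.net.URI;
import java.util.concurrent.CompletableFuture;
import org.springframework.web.reactive.function.client.WebClient;

public class WebClientVariant {
    public static void main(String[] args) {
        WebClient client = WebClient.create();
        URI exampleURI = URI.create("https://www.stackoverflow.com");

        // Mono#toFuture converts each reactive result into a CompletableFuture
        CompletableFuture<String> future1 = client.get().uri(exampleURI)
                .retrieve().bodyToMono(String.class).toFuture();
        CompletableFuture<String> future2 = client.get().uri(exampleURI)
                .retrieve().bodyToMono(String.class).toFuture();

        CompletableFuture.allOf(future1, future2).join(); // block only at the very edge
    }
}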

CompletableFuture would be a better way to do this, as it is semantically more related to the task, and you can keep the main flow going while the task proceeds.
If you use streams, besides the awkwardness of lambdas with exception handling inside, and the fact that a stream pipeline is not semantically related to the task, you will have to wait for all of the calls to finish, even if they are occurring in parallel. To avoid that you would need futures, but then you would be back to the first solution.
You might consider a mix, using streams to create the futures. But given that this is a set of blocking IO requests, you will probably not have enough requests or time to take advantage of parallel streams; the library will probably not split the tasks in parallel for you, and you would be better off with a loop.

Related

Does the use of Spring WebFlux's WebClient in a blocking application design cause a larger use of resources than RestTemplate

I am working on several Spring Boot applications that follow the traditional thread-per-request pattern. We are using spring-boot-webflux to obtain WebClient to perform our RESTful integration between the applications. Hence our application design requires that we block the publisher right after receiving a response.
Recently, we've been discussing whether we are unnecessarily spending resources by using a reactive module in our otherwise blocking application design. As I understand it, WebClient makes use of the event loop by assigning a worker thread to perform the reactive actions in the event loop. So using WebClient with .block() would park the original thread while assigning another thread to perform the HTTP request. Compared to the alternative, RestTemplate, it seems like WebClient would spend additional resources by using the event loop.
Is it correct that partially introducing spring-webflux in this way leads to additional resource consumption while yielding no positive contribution to performance, whether single-threaded or concurrent? We are not expecting to ever upgrade our current stack to be fully reactive, so the argument of gradually upgrading does not apply.
In this presentation Rossen Stoyanchev from the Spring team explains some of these points.
WebClient will use a limited number of threads - 2 per core for a total of 12 threads on my local machine - to handle all requests and their responses in the application. So if your application receives 100 requests and makes one request to an external server for each, WebClient will handle all of those using those threads in a non-blocking / asynchronous manner.
Of course, as you mention, once you call block your original thread will block, so it would be 100 threads + 12 threads, for a total of 112 threads, to handle those requests. But keep in mind that those 12 threads do not grow in number as you make more requests, and that they don't do the I/O heavy lifting, so it's not like WebClient is spawning threads to actually perform the requests or keeping them busy in a thread-per-request fashion.
I'm not sure whether a thread sitting under block() behaves the same as one making a blocking call through RestTemplate; it seems to me that in the former case the thread should be idle, waiting for the NIO call to complete, while in the latter the thread has to handle the I/O work itself, so maybe there's a difference there.
It gets interesting if you begin using the reactor goodies, for example handling requests that depend on one another, or many requests in parallel. Then WebClient definitely gets an edge as it'll perform all concurrent actions using the same 12 threads, instead of using a thread per request.
As an example, consider this application:
import java.util.Collection;
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.reactive.function.client.WebClient;

import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

@SpringBootApplication
public class SO72300024 {

    private static final Logger logger = LoggerFactory.getLogger(SO72300024.class);

    public static void main(String[] args) {
        SpringApplication.run(SO72300024.class, args);
    }

    @RestController
    @RequestMapping("/blocking")
    static class BlockingController {

        @GetMapping("/{id}")
        String blockingEndpoint(@PathVariable String id) throws Exception {
            logger.info("Got request for {}", id);
            Thread.sleep(1000);
            return "This is the response for " + id;
        }

        @GetMapping("/{id}/nested")
        String nestedBlockingEndpoint(@PathVariable String id) throws Exception {
            logger.info("Got nested request for {}", id);
            Thread.sleep(1000);
            return "This is the nested response for " + id;
        }
    }

    @Bean
    ApplicationRunner run() {
        return args -> {
            Flux.just(callApi(), callApi(), callApi())
                    .flatMap(responseMono -> responseMono)
                    .collectList()
                    .block()
                    .stream()
                    .flatMap(Collection::stream)
                    .forEach(logger::info);
            logger.info("Finished");
        };
    }

    private Mono<List<String>> callApi() {
        WebClient webClient = WebClient.create("http://localhost:8080");
        logger.info("Starting");
        return Flux.range(1, 10).flatMap(i ->
                webClient
                        .get().uri("/blocking/{id}", i)
                        .retrieve()
                        .bodyToMono(String.class)
                        .doOnNext(resp -> logger.info("Received response {} - {}", i, resp))
                        .flatMap(resp -> webClient.get().uri("/blocking/{id}/nested", i)
                                .retrieve()
                                .bodyToMono(String.class)
                                .doOnNext(nestedResp -> logger.info("Received nested response {} - {}", i, nestedResp))))
                .collectList();
    }
}
If you run this app, you can see that all 30 requests are handled immediately, in parallel, by the same 12 threads (on my computer). Neat! If you think you can benefit from that kind of parallelism in your logic, it's probably worth giving WebClient a shot.
If not, while I wouldn't actually worry about the "extra resource spending" for the reasons above, I don't think it would be worth adding the whole reactor/webflux dependency for this: besides the extra baggage, in day-to-day operations it is a lot simpler to reason about and debug RestTemplate and the thread-per-request model.
Of course, as others have mentioned, you ought to run load tests to get proper metrics.
According to the official Spring documentation, RestTemplate is in maintenance mode and will probably not be supported in future versions.
As of 5.0 this class is in maintenance mode, with only minor requests
for changes and bugs to be accepted going forward. Please, consider
using the org.springframework.web.reactive.client.WebClient which has
a more modern API and supports sync, async, and streaming scenarios.
As for system resources, it really depends on your use case, and I would recommend running some performance tests, but it seems that for low workloads a blocking client can perform better, owing to its dedicated thread per connection. As load increases, NIO clients tend to perform better.
Update - Reactive API vs HTTP Client
It's important to understand the difference between the reactive API (Project Reactor) and the HTTP client. Although WebClient uses the reactive API, it doesn't add any additional concurrency until we explicitly use operators like flatMap or delay that can schedule execution on different thread pools. If we just use
webClient
.get()
.uri("<endpoint>")
.retrieve()
.bodyToMono(String.class)
.block()
the code will be executed on the caller thread, the same as with a blocking client.
If we enable debug logging for this code, we will see that the WebClient code is executed on the caller thread, but for network operations execution is switched to a reactor-http-nio-... thread.
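You can observe this yourself with a small sketch (the target URL is a placeholder) that prints the current thread name around the call:
import org.springframework.web.reactive.function.client.WebClient;

public class ThreadDemo {
    public static void main(String[] args) {
        WebClient webClient = WebClient.create("https://example.org");

        System.out.println("Before block: " + Thread.currentThread().getName()); // main
        String body = webClient.get().uri("/")
                .retrieve()
                .bodyToMono(String.class)
                // onNext is signalled from the IO thread, e.g. reactor-http-nio-2
                .doOnNext(b -> System.out.println("Signal on: " + Thread.currentThread().getName()))
                .block();
        System.out.println("After block: " + Thread.currentThread().getName()); // main again
    }
}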
The main difference is that internally WebClient uses an asynchronous client based on non-blocking IO (NIO). Such clients use the reactor pattern (an event loop) to maintain separate thread pool(s), which allows them to handle a large number of concurrent connections.
The purpose of I/O reactors is to react to I/O events and to dispatch
event notifications to individual I/O sessions. The main idea of I/O
reactor pattern is to break away from the one thread per connection
model imposed by the classic blocking I/O model.
By default, Reactor Netty is used, but you could consider the Jetty Reactive HttpClient, Apache HttpComponents (async), or even the AWS Common Runtime (CRT) HTTP client if you create the required adapter (I'm not sure one already exists).
In general, you can see the trend across the industry toward async I/O (NIO), because it is more resource-efficient for applications under high load.
In addition, to use resources efficiently the whole flow must be async. By using block() we implicitly reintroduce the thread-per-connection approach, which eliminates most of the benefits of NIO. At the same time, using WebClient with block() can be considered a first step in a migration to a fully reactive application.
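To make the contrast concrete, here is a minimal sketch (the controller, path, and downstream host are hypothetical) of keeping the flow async end-to-end, assuming a WebFlux or otherwise async-capable setup: the Mono is returned to the framework instead of being blocked on:
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

@RestController
class AsyncPassthroughController {

    private final WebClient webClient = WebClient.create("http://downstream:8080");

    @GetMapping("/proxy")
    Mono<String> proxy() {
        // No block(): the framework subscribes to the returned Mono,
        // so no caller thread is parked for the duration of the call
        return webClient.get().uri("/data")
                .retrieve()
                .bodyToMono(String.class);
    }
}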
Great question.
Last week we considered migrating from RestTemplate to WebClient.
This week I started testing the performance of the blocking WebClient against RestTemplate and, to my surprise, RestTemplate performed better in scenarios where the response payloads were large. The difference was considerable, with RestTemplate taking less than half the time to respond and using fewer resources.
I'm still carrying out the performance tests; I have now started testing with a wider range of concurrent users.
The application is MVC and uses Spring 5.3.19 and Spring Boot 2.6.7.
For performance testing I'm using JMeter, and for health checks VisualVM/JConsole.

How should one do sync http requests in modern Spring?

For a long time, Spring has been recommending RestTemplate for sync http requests. However, nowadays the documentation says:
NOTE: As of 5.0 this class is in maintenance mode, with only minor requests for changes and bugs to be accepted going forward. Please, consider using the org.springframework.web.reactive.client.WebClient which has a more modern API and supports sync, async, and streaming scenarios.
But I haven't been able to see how one is recommended to use WebClient for sync scenarios. There is this in the documentation:
WebClient can be used in synchronous style by blocking at the end for the result
and I've seen some codebases using .block() all over the place. However, my problem with this is that, with some experience in reactive frameworks, I've grown to understand that blocking a reactive call is a code smell and should really be used in testing only. For example, this page says
Sometimes you can only migrate part of your code to be reactive, and you need to reuse reactive sequences in more imperative code.
Thus if you need to block until the value from a Mono is available, use Mono#block() method. It will throw an Exception if the onError event is triggered.
Note that you should avoid this by favoring having reactive code end-to-end, as much as possible. You MUST avoid this at all cost in the middle of other reactive code, as this has the potential to lock your whole reactive pipeline.
So is there something I've missed that avoids block()s but allows you to make sync calls, or is using block() everywhere really the way?
Or is the intent of the WebClient API to imply that one just shouldn't block anywhere in the codebase anymore? As WebClient seems to be the only alternative offered by Spring for future HTTP calls, is the only viable choice going forward to use non-blocking calls throughout your codebase, and to change the rest of the codebase to accommodate that?
There's a related question here but it focuses on the occurring exception only, whereas I would be interested to hear what should be the approach in general.
First, according to the WebClient Javadocs:
public interface WebClient
Non-blocking, reactive client to perform HTTP requests, exposing a fluent, reactive API over underlying HTTP client libraries such as
Reactor Netty. Use static factory methods create() or create(String),
or builder() to prepare an instance.
So WebClient is not designed to be blocking.
However, the responses WebClient returns are of type reactor.core.publisher.Flux<T> or reactor.core.publisher.Mono<T>, and it is Flux and Mono from the Reactor project that have the blocking methods (you obtain them from WebClient's ResponseSpec).
WebClient was designed to be a reactive client.
As you might have seen in reactive libraries for other languages, for example RxJS for JavaScript, reactive programming is usually based on functional programming.
What Flux and Mono from the Reactor project do is let you call block() to get synchronous execution without the need for functional programming.
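For instance, a minimal synchronous call looks like this (the URL is a placeholder):
import org.springframework.web.reactive.function.client.WebClient;

public class SyncCall {
    public static void main(String[] args) {
        WebClient client = WebClient.create("https://example.org");
        // block() waits for the Mono to complete and returns the value directly
        String body = client.get().uri("/")
                .retrieve()
                .bodyToMono(String.class)
                .block();
        System.out.println(body);
    }
}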
Here is a part of an article that I find very interesting:
Extractors: The Subscribers from the Dark Side
There is another way to
subscribe to a sequence, which is to call Mono.block() or
Mono.toFuture() or Flux.toStream() (these are the "extractor"
methods — they get you out of the Reactive types into a less flexible,
blocking abstraction). Flux also has converters collectList() and
collectMap() that convert from Flux to Mono. They don’t actually
subscribe to the sequence, but they do throw away any control you
might have had over the subscription at the level of the individual
items.
Warning A good rule of thumb is "never call an extractor". There are
some exceptions (otherwise the methods would not exist). One notable
exception is in tests because it’s useful to be able to block to allow
results to accumulate. These methods are there as an escape hatch to
bridge from Reactive to blocking; if you need to adapt to a legacy
API, for instance Spring MVC. When you call Mono.block() you throw
away all the benefits of the Reactive Streams
So can you do synchronous programming without using the block() operations? Yes, you can, but then you have to think in terms of functional programming for your application.
Example
public void doSomething1() {
    webClient.get().uri("/call1").retrieve().bodyToMono(String.class)
            .subscribe(response1 -> {
                // ...do something with response1...
                webClient.get().uri("/call2").retrieve().bodyToMono(String.class)
                        .subscribe(response2 -> {
                            // ...do something more, with response1 and response2 both available here...
                        });
            });
}
This is called subscribe callback hell. You can avoid it by using the block() methods but, as the quoted article mentions, those throw away the reactive nature of the library.
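The usual reactive alternative to both nesting and block() is composition: chain the calls with flatMap and subscribe once (or return the Mono to the caller). A rough sketch, with placeholder endpoints:
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

public class ComposedCalls {

    private final WebClient webClient = WebClient.create("http://localhost:8080");

    public Mono<String> doSomething() {
        return webClient.get().uri("/call1")
                .retrieve().bodyToMono(String.class)
                // flatMap sequences the second call after the first;
                // zipWith could run independent calls concurrently instead
                .flatMap(response1 -> webClient.get().uri("/call2/{id}", response1)
                        .retrieve().bodyToMono(String.class)
                        .map(response2 -> response1 + " + " + response2));
    }
}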

quarkus reactive mutiny thread pool management

Background: I'm just getting started with Quarkus this week, though I've worked with some streaming platforms before (especially http4s/fs2 in Scala).
Working with Quarkus reactive (with Mutiny) and any reactive database client (Mutiny reactive Postgres, reactive Elasticsearch, etc.), I'm a little confused about how to correctly manage blocking calls and thread pools.
The Quarkus documentation suggests that imperative or CPU-intensive code be annotated with @Blocking to ensure it is shifted to a worker pool so it does not block the IO pool. This makes sense.
Consider the following:
public class Stuff {
    // imperative, CPU-intensive
    public static int cpuIntensive(String arg) { /* ... */ }

    // blocking IO
    public static List<Integer> fetchFromDb() { /* ... */ }

    // reactive IO
    public static Multi<String> fetchReactive() { /* ... */ }

    // reactive IO followed by CPU-intensive processing
    public static Multi<String> fetchReactiveCpuIntensive(String arg) {
        return fetchReactive() // reactive fetch
                .map(fetched -> Integer.toString(cpuIntensive(arg + fetched))); // CPU-intensive work
    }
}
It's not clear to me what happens in each of the above cases, and where they get executed, if they were called from a RESTEasy Reactive REST endpoint without the @Blocking annotation.
Presumably it's safe to use any reactive client in a reactive REST endpoint without @Blocking. But does wrapping a blocking call in a Uni accomplish as much for 'unsafe' code? That is, will anything returning a Multi/Uni effectively run on the worker pool?
(I'll open follow-up posts about finer control of thread pools as I don't see any way to 'shift' reactive IO calls to a separate pool than cpu-intensive work, which would be optimal.)
Edit
This question might suggest I'm asking about return types (Uni/Multi vs direct objects), but it's really about the ability to select the thread pool in use at any given time. This Mutiny page on imperative-to-reactive actually somewhat answers my question, along with the Mutiny infrastructure docs, which state that "the default executor is already configured to use the Quarkus worker thread pool."; the Mutiny thread-control docs handle the rest, I think.
So my understanding is this:
If I have an endpoint that can conditionally return something non-blocking (e.g. a local non-blocking cache hit), then I can effectively return any way I want on the IO thread. But if said cache misses, I can either call a reactive client directly or use Mutiny to run a blocking action on the Quarkus worker pool. Similarly, Mutiny provides control to execute any given stream on a specific thread pool (executor).
And reactive clients (or anything effectively running on the non-IO pool) are safe to call, because the IO loop is just subscribing to data emitted by the worker pool.
Lastly, it seems I could configure a CPU-bound pool separately from an IO-bound worker pool and explicitly provide them as executors to whichever emitters I need, as the sketch below illustrates. So ... I think I'm all set now.
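As an illustration of that last point (the pool names and sizes are assumptions, not Quarkus defaults), Mutiny's runSubscriptionOn and emitOn let you choose the executor per stage:
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import io.smallrye.mutiny.Uni;

public class PoolSelection {

    // Hypothetical dedicated pools; the sizes are illustrative only
    private final Executor ioPool = Executors.newFixedThreadPool(32);
    private final Executor cpuPool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    public Uni<Integer> fetchAndCrunch() {
        return Uni.createFrom().item(this::blockingDbCall) // evaluated lazily, at subscription time
                .runSubscriptionOn(ioPool)                 // the blocking fetch runs on the IO pool
                .emitOn(cpuPool)                           // downstream stages run on the CPU pool
                .map(this::cpuIntensive);
    }

    private String blockingDbCall() { return "row"; }

    private int cpuIntensive(String s) { return s.length(); }
}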
This is a very good question!
The return type of a RESTEasy Reactive endpoint does not have any effect on which thread the endpoint will be served on.
The only thing that determines the thread is the presence of @Blocking / @NonBlocking.
The reason for this is simple: from the return type alone, it is not possible to know whether the operation actually takes a long time to finish (and thus blocks the thread).
A non-reactive return type, for example, does not imply that the operation is CPU-intensive (you could simply be returning some canned JSON response).
A reactive type, on the other hand, provides no guarantee that the operation is non-blocking, because, as you mention, a user could simply wrap a blocking operation in a reactive return type.
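To make that last point concrete, here is a sketch (the method name is hypothetical) of why wrapping alone is not enough: the blocking work still runs on whichever thread subscribes, which on an event-loop thread would stall the loop:
import io.smallrye.mutiny.Uni;

public class WrappedBlocking {

    // Looks reactive, but the sleep still executes on the subscribing thread;
    // without runSubscriptionOn(...) or @Blocking it can block the IO thread
    public Uni<String> misleadinglyReactive() {
        return Uni.createFrom().item(() -> {
            try {
                Thread.sleep(1000); // blocking work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "done";
        });
    }
}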

Run Async task, before return flux db entities

I have a Flux<URL>. How can I make multiple concurrent void requests for each URL (for example myWebClient.refresh(url)), and then (after all requests are done) read data from the database and return a Flux<MyAnyEntity> (for example repo.findAll())?
You can achieve that using Flux/Mono operators:
// get the URIs from somewhere
Flux<URI> uris = //...

Flux<MyAnyEntity> entities = uris
        // map each URI to an HTTP client call and do nothing with the response
        .flatMap(uri -> webClient.get().uri(uri).exchange().then())
        // chain that call with a call on your repository
        .thenMany(repo.findAll());
Update:
This code is naturally asynchronous and non-blocking, so all operations in the flatMap operator will be executed concurrently, according to the demand communicated by the consumer (this is the backpressure we keep talking about).
If the Reactive Streams Subscriber requests N elements, then up to N requests might be executed concurrently. I don't think this is something you want to deal with directly, although you can influence things by using windowing operators for micro-batching operations, or flatMap's concurrency argument, as sketched below.
Using .subscribeOn(Schedulers.parallel()) will not improve concurrency in this case; as stated in the reference documentation, you should only use that for CPU-bound work.
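If you do want an explicit bound, flatMap's concurrency argument caps how many inner calls are in flight at once. A rough sketch (the bound of 8 is arbitrary, and retrieve() is used in place of the deprecated exchange()):
import java.net.URI;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;

public class BoundedRefresh {

    private final WebClient webClient = WebClient.create();

    Flux<Void> refreshAll(Flux<URI> uris) {
        // at most 8 refresh calls run concurrently, regardless of demand
        return uris.flatMap(
                uri -> webClient.get().uri(uri).retrieve().bodyToMono(Void.class),
                8);
    }
}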

Best practices with Akka in Scala and third-party Java libraries

I need to use the memcached Java API in my Scala/Akka code. This API gives you both synchronous and asynchronous methods. The asynchronous ones return java.util.concurrent.Future. There was already a question about dealing with Java Futures in Scala: How do I wrap a java.util.concurrent.Future in an Akka Future?. However, in my case I have two options:
Using synchronous API and wrapping blocking code in future and mark blocking:
Future {
  blocking {
    cache.get(key) // synchronous blocking call
  }
}
Using the asynchronous Java API and polling every n ms on the Java Future to check whether it has completed (as described in one of the answers to the linked question above).
Which one is better? I am leaning towards the first option because polling can dramatically impact response times. Shouldn't the blocking { } block prevent the whole pool from being blocked?
I always go with the first option, but I am doing it in a slightly different way. I don't use the blocking feature (actually I have not thought about it yet). Instead, I provide a custom execution context to the Future that wraps the synchronous blocking call. It looks basically like this:
// I create a separate EC for each blocking client/resource/API I use
val ecForBlockingMemcachedStuff =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(100)) // whatever number you think is appropriate

Future {
  cache.get(key) // synchronous blocking call
}(ecForBlockingMemcachedStuff) // or mark the execution context implicit; I like to mention it explicitly
So all the blocking calls use a dedicated execution context (= thread pool), separated from your main execution context, which is responsible for the non-blocking stuff.
This approach is also explained in an online training video for Play/Akka provided by Typesafe. There is a video in lesson 4 about how to handle blocking calls. It is explained by Nilanjan Raychaudhuri (I hope I spelled it correctly), who is a well-known author of Scala books.
Update: I had a discussion with Nilanjan on Twitter. He explained the difference between the blocking approach and a custom ExecutionContext. The blocking feature just creates a special ExecutionContext that provides a naive answer to the question of how many threads you will need: it spawns a new thread whenever all the other threads in the pool are busy. So it is actually an uncontrolled ExecutionContext; it could create lots of threads and lead to problems like out-of-memory errors. The solution with the custom execution context is actually better, because it makes this problem obvious. Nilanjan also added that you need to consider circuit breaking for the case where this pool gets overloaded with requests.
TLDR: Yeah, blocking calls suck. Use a custom/dedicated ExecutionContext for blocking calls. Also consider circuit breaking.
The Akka documentation provides a few suggestions on how to deal with blocking calls:
In some cases it is unavoidable to do blocking operations, i.e. to put a thread to sleep for an indeterminate time, waiting for an external event to occur. Examples are legacy RDBMS drivers or messaging APIs, and the underlying reason is typically that (network) I/O occurs under the covers. When facing this, you may be tempted to just wrap the blocking call inside a Future and work with that instead, but this strategy is too simple: you are quite likely to find bottlenecks or run out of memory or threads when the application runs under increased load.

The non-exhaustive list of adequate solutions to the "blocking problem" includes the following suggestions:

Do the blocking call within an actor (or a set of actors managed by a router), making sure to configure a thread pool which is either dedicated for this purpose or sufficiently sized.

Do the blocking call within a Future, ensuring an upper bound on the number of such calls at any point in time (submitting an unbounded number of tasks of this nature will exhaust your memory or thread limits).

Do the blocking call within a Future, providing a thread pool with an upper limit on the number of threads which is appropriate for the hardware on which the application runs.

Dedicate a single thread to manage a set of blocking resources (e.g. a NIO selector driving multiple channels) and dispatch events as they occur as actor messages.

The first possibility is especially well-suited for resources which are single-threaded in nature, like database handles which traditionally can only execute one outstanding query at a time and use internal synchronization to ensure this. A common pattern is to create a router for N actors, each of which wraps a single DB connection and handles queries as sent to the router. The number N must then be tuned for maximum throughput, which will vary depending on which DBMS is deployed on what hardware.
