TL;DR: I have background processing running in RxJava Observables. In my integration tests, I would like to be able to wait independently for that processing to finish, so that background processing started by one test does not interfere with another test.
Simplified, I have a @RequestMapping method that does the following:
inserts data into the database
launches asynchronous processing of that data (HTTP calls via Feign, DB updates)
returns nothing (HttpStatus.NO_CONTENT)
This asynchronous processing was previously done with a ThreadPoolTaskExecutor. We're going to transition to RxJava and would like to remove this ThreadPoolTaskExecutor and do the background processing with RxJava.
So, rather naively for now, I tried this instead:
Observable
        // longBlockingMethod() is a placeholder for the call to the long blocking method
        .defer(() -> Observable.just(longBlockingMethod()))
        .subscribeOn(Schedulers.io())
        .subscribe();
The end goal is, of course, to go down into "call to long blocking method" one step at a time and use Observables all the way.
Before that, though, I would like to get my integration tests working. I test this with a RestTemplate call to the mapping. As most of the work is asynchronous, my call returns really fast. I would like to find a way to wait for the asynchronous processing to finish (to make sure it does not conflict with another test).
Before RxJava, I would just count the tasks in the ThreadPoolTaskExecutor and wait until the count reached 0.
How can I do that with RxJava?
What I tried:
I tried to make all my Schedulers immediate with an RxJavaSchedulersHook: this causes some sort of blocking somewhere, and code execution stops just before my Feign calls (Feign uses RxJava under the hood).
I tried to count the tasks with an RxJavaObservableExecutionHook: I tried retaining the subscriptions and removing them once they were unsubscribed, but this didn't work at all (lots of subscribers, and the count never goes down).
I tried to put an observeOn(immediate()) in the real production code. This seems to work, and I could inject the right scheduler for the runtime and test phases, but I am not really keen on putting code in my production code just for testing purposes.
I'm probably terribly wrong, or overcomplicating things, so don't hesitate to correct my reasoning!
How do you return HttpStatus.NO_CONTENT?
@RequestMapping(value = "/")
public HttpStatus home() {
    Observable.defer(() -> Observable.just(longMethod()))
            .subscribeOn(Schedulers.io())
            .subscribe();
    return HttpStatus.NO_CONTENT;
}
In this form, you can't know when the longMethod is finished.
If you want to know when all async jobs have completed, you can either return HttpStatus.NO_CONTENT only once they are done, using Spring's DeferredResult, or use a TestSubscriber.
PS: if you want, you can use Observable.fromCallable(() -> longMethod()) instead of Observable.defer(() -> Observable.just(longMethod())).
Using DeferredResult
@RequestMapping(value = "/")
public DeferredResult<HttpStatus> index() {
    DeferredResult<HttpStatus> deferredResult = new DeferredResult<>();
    Observable.fromCallable(() -> longMethod())
            .subscribeOn(Schedulers.io())
            .subscribe(
                    value -> {},                                             // the value itself is ignored
                    e -> deferredResult.setErrorResult(e.getMessage()),      // on error
                    () -> deferredResult.setResult(HttpStatus.NO_CONTENT));  // on completion
    return deferredResult;
}
This way, when you call the endpoint, you only get the response once your Observable completes (that is, once longMethod has finished).
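For comparison, Spring MVC also accepts a CompletableFuture as a controller return value and, like DeferredResult, only writes the response once the future completes. The sketch below leaves Spring and RxJava out so that it runs on its own: `AsyncEndpointSketch`, `longMethod()` and the `IO` pool are hypothetical stand-ins, and the integer 204 stands in for HttpStatus.NO_CONTENT.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncEndpointSketch {
    // Plays the role of Schedulers.io(): a pool for blocking work.
    // Daemon threads, so the pool does not keep the JVM alive.
    static final ExecutorService IO = Executors.newCachedThreadPool(runnable -> {
        Thread thread = new Thread(runnable, "io");
        thread.setDaemon(true);
        return thread;
    });

    // Hypothetical stand-in for the real work (HTTP calls, DB updates, ...).
    static String longMethod() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "done";
    }

    // Counterpart of the DeferredResult endpoint: the future completes with
    // 204 (NO_CONTENT) only once longMethod() has finished, and completes
    // exceptionally if longMethod() throws.
    static CompletableFuture<Integer> home() {
        return CompletableFuture.supplyAsync(AsyncEndpointSketch::longMethod, IO)
                .thenApply(ignored -> 204);
    }
}
```

The call to `home()` itself returns immediately, but a caller (or an integration test) that does `home().join()` only gets 204 after longMethod() has finished.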
Using TestSubscriber
You'll have to inject a TestSubscriber into the controller and then ask it to wait for, and check, the completion of your Observable:
@RequestMapping(value = "/")
public HttpStatus home() {
    Observable.defer(() -> Observable.just(longMethod()))
            .subscribeOn(Schedulers.io())
            .subscribe(subscriber); // you'll have to inject this subscriber in your test
    return HttpStatus.NO_CONTENT;
}
and in your test :
TestSubscriber<Object> subscriber = new TestSubscriber<>(); // you'll have to inject it into your controller
// ....
controller.home();
subscriber.awaitTerminalEvent();
subscriber.assertCompleted(); // check that no error occurred
You could use an ExecutorServiceAdapter to bridge from the Spring ThreadPoolTaskExecutor to an ExecutorService that RxJava can use, and then do the same trick as before.
A few months later: my advice is simply "don't do that". RxJava is not really suited to this kind of job. Without going into too much detail, having lots of "loose" Observables running in the background is not appropriate: depending on the volume of your requests, you can easily run into queue and memory issues and, more importantly, what happens to all the scheduled and running tasks if the web server crashes? How do you restart them?
Spring offers other, better alternatives IMHO: Spring Batch, Spring Cloud Task, or messaging with Spring Cloud Stream. So don't do as I did; use the right tool for the job.
Now, if you really want to go down the bad route, you can either:
return an SseEmitter, consume only the first event from the SSE in the consumer service, and consume all events in your tests, or
create an RxJava lift operator that wraps (in the call method) the Subscriber in a parent Subscriber that has a waitForCompletion method. How you do the waiting is up to you (with a CountDownLatch, for example). That Subscriber would be added to a synchronized list (and removed from it once completed), and in your tests you could just iterate over the list and call waitForCompletion on each item. It's not that complicated and I got it to work, but please, don't do that!
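The CountDownLatch plus synchronized-list idea can be sketched in plain java.util.concurrent terms. This is not the author's lift operator: the hypothetical `BackgroundWorkTracker` below wraps an Executor instead of a Subscriber, which is closer to the "count the tasks in the ThreadPoolTaskExecutor" approach from the question. In RxJava, such a wrapped executor could presumably be turned into a Scheduler with `Schedulers.from(executor)` in the test configuration; that wiring is an assumption and is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

class BackgroundWorkTracker {
    // One latch per in-flight task, kept in a synchronized list.
    private final List<CountDownLatch> inFlight =
            Collections.synchronizedList(new ArrayList<>());

    // Returns an Executor that registers every task before handing it to the
    // delegate and deregisters it when the task ends, even if the task throws.
    Executor wrap(Executor delegate) {
        return command -> {
            CountDownLatch latch = new CountDownLatch(1);
            inFlight.add(latch);
            try {
                delegate.execute(() -> {
                    try {
                        command.run();
                    } finally {
                        release(latch);
                    }
                });
            } catch (RuntimeException rejected) {
                release(latch); // the task never ran, so nobody should wait for it
                throw rejected;
            }
        };
    }

    private void release(CountDownLatch latch) {
        inFlight.remove(latch); // remove first, so a released latch is never seen as pending
        latch.countDown();
    }

    // Test-side helper: waits until no tracked task is left, looping so that
    // tasks submitted by other tracked tasks while we wait are also covered.
    boolean awaitIdle(long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        try {
            while (true) {
                List<CountDownLatch> snapshot;
                synchronized (inFlight) { // copy under the list's own lock
                    snapshot = new ArrayList<>(inFlight);
                }
                if (snapshot.isEmpty()) {
                    return true;
                }
                for (CountDownLatch latch : snapshot) {
                    long remaining = deadline - System.nanoTime();
                    if (!latch.await(remaining, TimeUnit.NANOSECONDS)) {
                        return false; // timed out
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int pending() {
        return inFlight.size();
    }
}
```

In a test, calling `tracker.awaitIdle(10, TimeUnit.SECONDS)` after the RestTemplate call would block until every task that went through the wrapped executor has finished. Work that leaves that executor (for example, a Feign call running on its own threads) is not tracked.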
I have the following controller and service class. Why am I getting this warning in IDEA: "Possibly blocking call in non-blocking context could lead to thread starvation"?
@PostMapping(value = {"/create"})
public Mono<ResponseEntity<ResponseDto>> create(
        @RequestBody RequestDto request) {
    ResponseDto result = service.create(request);
    return Mono.just(ResponseEntity.ok(result));
}
@Transactional
public ResponseDto create(RequestDto request) {
    taskRepository.save(request);
    return new ResponseDto("Ок");
}
This is apparently caused by the @Transactional annotation: when I remove it, the warning disappears. What is this problem, and how can it be fixed?
P.S. This example is schematic; the real code is bigger.
Reactive processing works differently from the usual model: you cannot use blocking calls here! Tomcat creates a separate thread for each request, so that thread can be blocked. Reactor Netty does NOT create a new thread per request; it just uses a fixed pool.
In the non-blocking approach, you can think of it this way: when a task is waiting for a response, it gives its thread to another task. If you block the thread, it cannot do that. That is why even a single-threaded Netty can serve multiple parallel requests.
For the same reason, thread-bound data storage (such as the ThreadLocal state that imperative @Transactional relies on) does not work properly either, because another request can interfere with it or modify it. Reactor's Context is available instead.
There is an article on reactive transactions; I don't know whether it will be a solution for you:
https://itnext.io/integrating-hibernate-reactive-with-spring-5427440607fe
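The thread-starvation effect described in this answer can be reproduced without Netty or Spring. In the sketch below, which is only an analogy, a single-thread executor stands in for the event loop, `Thread.sleep` stands in for a blocking JPA call, and the second pool plays roughly the role of a dedicated scheduler such as Reactor's `Schedulers.boundedElastic()`; all class and method names are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class EventLoopStarvation {
    static ExecutorService daemonPool(int threads) {
        return Executors.newFixedThreadPool(threads, runnable -> {
            Thread thread = new Thread(runnable);
            thread.setDaemon(true);
            return thread;
        });
    }

    // Hypothetical stand-in for a blocking JDBC/JPA call.
    static void blockingCall() {
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns how long (in ms) a trivial second request waits for the single
    // "event loop" thread while a first request performs a blocking call,
    // either inline or offloaded to a separate pool.
    static long secondRequestLatencyMs(boolean offloadBlockingCall) {
        ExecutorService eventLoop = daemonPool(1);
        ExecutorService blockingPool = daemonPool(4);
        try {
            if (offloadBlockingCall) {
                // The event loop only hands the work over and is free again at once.
                eventLoop.execute(() -> blockingPool.execute(EventLoopStarvation::blockingCall));
            } else {
                // The event loop thread itself sleeps: everything queued behind it waits.
                eventLoop.execute(EventLoopStarvation::blockingCall);
            }
            long start = System.nanoTime();
            Future<?> secondRequest = eventLoop.submit(() -> { });
            secondRequest.get();
            return (System.nanoTime() - start) / 1_000_000;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            eventLoop.shutdownNow();
            blockingPool.shutdownNow();
        }
    }
}
```

With the blocking call inline, the second request waits roughly as long as the blocking call (about 500 ms here); with the call offloaded, it runs within a few milliseconds.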
Here I have three subflows, one of which is an outbound HTTP call. I want that HTTP call to try to get a response only until a given time. If it times out, the main flow should break and show an error message in JSON format as output.
Below is the code:
@Bean
public IntegrationFlow flow() {
return flow ->
flow.handle(validatorService, "validateRequest")
.split()
.channel(c -> c.executor(Executors.newCachedThreadPool()))
.scatterGather(
scatterer ->
scatterer
.applySequence(true)
.recipientFlow(flow1())
.recipientFlow(
f ->
f.gateway(
flow2(), gateway -> gateway.replyTimeout(3000L).errorChannel("errorChannel")))
.recipientFlow(flow3()),
gatherer ->
gatherer
.releaseLockBeforeSend(true)
.releaseStrategy(group -> group.size() == 2))
.aggregate(someMethod1())
.to(someMethod2());
}
private IntegrationFlow returnError() {
return IntegrationFlows.from("errorChannel").handle(System.out::println).get();
}
I've added the errorChannel, but how do I send a customized message to the user?
See the documentation on error handling in the messaging gateway: https://docs.spring.io/spring-integration/docs/current/reference/html/messaging-endpoints.html#gateway-no-response.
Consider adding an errorChannel() alongside that replyTimeout() on the gateway definition to build the error reply you'd like. However, you may also consider adding something like a request timeout to the RestTemplate you use for that HTTP call, to prevent a long wait for the HTTP response.
UPDATE
First of all, you need to understand that request-reply is a kind of block-and-then-wait approach. We send a request, and if the process consuming that message is blocking (performed immediately in the thread producing the message), we don't reach the "wait" part until that process lets go. In most cases (by default) a DirectChannel is used, so the send blocks, because the consumer runs in the calling thread. That process might be your HTTP call, which is also a request-response pattern. So we don't reach the "wait" part until the HTTP call returns, times out, or fails with an error. Only after that does replyTimeout take effect, to wait for the reply from the underlying process. This can be changed if the input channel of that process is not a direct one: see ExecutorChannel or QueueChannel. That way, the sending part exits immediately, because nothing blocks it, and goes on to the "wait" part to observe a CountDownLatch.
So you need to reconsider whether that replyTimeout() option is appropriate for you. Perhaps the mentioned request timeout for the RestTemplate is a better option than reworking your flow into an async solution to leverage the replyTimeout() feature. Again, see the documentation I've mentioned about the replyTimeout feature.
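The difference between a direct and an executor-backed input channel can be illustrated with plain JDK futures. This is only an analogy, not Spring Integration code: `slowHttpCall()` is a hypothetical stand-in for the outbound HTTP call, `FutureTask.get(timeout)` plays the role of the gateway's reply wait, and the JSON string built by the hypothetical `errorJson()` plays the role of a customized error reply.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ReplyTimeoutSketch {
    static final ExecutorService POOL = Executors.newCachedThreadPool(runnable -> {
        Thread thread = new Thread(runnable);
        thread.setDaemon(true);
        return thread;
    });

    // Hypothetical stand-in for the slow outbound HTTP call.
    static String slowHttpCall() {
        try {
            Thread.sleep(600);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "{\"status\":\"ok\"}";
    }

    // DirectChannel-like: the "send" runs the call in the calling thread, so the
    // "wait" part (and its timeout) is only reached after the call has returned.
    static String directCall(long timeoutMs) {
        FutureTask<String> reply = new FutureTask<>(ReplyTimeoutSketch::slowHttpCall);
        reply.run();
        return awaitReply(reply, timeoutMs);
    }

    // ExecutorChannel-like: the "send" returns immediately, so the timeout
    // really bounds how long the caller waits.
    static String executorCall(long timeoutMs) {
        FutureTask<String> reply = new FutureTask<>(ReplyTimeoutSketch::slowHttpCall);
        POOL.execute(reply);
        return awaitReply(reply, timeoutMs);
    }

    // The "wait" part: returns the reply, or a JSON error reply on timeout or failure.
    static String awaitReply(FutureTask<String> reply, long timeoutMs) {
        try {
            return reply.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            reply.cancel(true); // interrupt the call that is still running
            return errorJson("HTTP call timed out after " + timeoutMs + " ms");
        } catch (ExecutionException e) {
            return errorJson("HTTP call failed: " + e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return errorJson("interrupted while waiting for the reply");
        }
    }

    // Naive JSON building for the sketch; real code would use a JSON library.
    static String errorJson(String message) {
        return "{\"error\":\"" + message.replace("\"", "\\\"") + "\"}";
    }
}
```

`directCall(100)` still takes about 600 ms and returns the normal reply, because the timeout is only checked once the call is over, while `executorCall(100)` returns `{"error":"HTTP call timed out after 100 ms"}` after about 100 ms.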
The error handling is described here: https://docs.spring.io/spring-integration/docs/current/reference/html/error-handling.html#error-handling.
It is really not recommended to rely on the global errorChannel bean. It is the one used everywhere in async processes that have no explicit error channel configured.
You said in your question "send a customized message to the user", but your error handling flow is one-way (System.out::println). If you want to return anything from the error handling flow, the endpoint must be a replying one, e.g.:
.handle((p, h) -> "The error during HTTP call: " + p)
Also check whether you declare returnError() correctly. It really cannot be just a plain private method: the IntegrationFlow must be declared as a bean, one way or another, to initiate the wiring of its endpoints and channels. Right now it is just a plain, unused private method, and the framework does not see it at all. See the basics of the Java DSL in the docs: https://docs.spring.io/spring-integration/docs/current/reference/html/dsl.html#java-dsl
I have a Spring Rest Controller endpoint:
@PostMapping("/someheavyjob")
public Response indexAllReports() {
someService.startLongRunningOperation();
return Response.ok().build();
}
and then in the service class I have a method that performs many computations (database reads, other API calls, etc.):
public void startLongRunningOperation(){
List<String> involvedIds = otherApi.getAllActiveMembersIds(..);
involvedIds.forEach(id -> {
anotherComputatioMethod();
});
}
I know this approach blocks the user's request until the job completes. I can solve that; it's left in that state just to keep the example clear.
Question: what I am concerned about is this:
There should be only one instance of this heavy method running at a time (method: startLongRunningOperation).
Right now, as this method can be invoked from the REST controller, each API call starts this heavy method in a new thread. There are mechanisms to rate-limit user requests in Java (e.g. bucket4j), but I wanted to ask: what is the best way to handle this case? Just one instance of a long-running task, fired from a REST API calling a Spring service (which is, and should be, stateless).
Edit 1:
To clarify what I mean by blocking: the user has to wait for the response until the whole task is finished, which can take minutes.
What I want is not to make this service method synchronized, but to reject a second request that arrives while the long-running task is still running.
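One common way to get "at most one run at a time, reject the rest" within a single JVM is an AtomicBoolean flag that is set with compareAndSet before the job starts and cleared when it ends. The hypothetical `SingleJobLauncher` below is a sketch of that idea, not a Spring feature; it also runs the job on an executor, so the HTTP request does not have to wait for it.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

class SingleJobLauncher {
    private final AtomicBoolean running = new AtomicBoolean(false);
    private final Executor executor;

    SingleJobLauncher(Executor executor) {
        this.executor = executor;
    }

    // Starts the job in the background and returns true, or returns false
    // immediately if a previous run has not finished yet.
    boolean tryStart(Runnable job) {
        if (!running.compareAndSet(false, true)) {
            return false; // another instance is in progress: reject
        }
        try {
            executor.execute(() -> {
                try {
                    job.run();
                } finally {
                    running.set(false); // allow the next run, even after a failure
                }
            });
        } catch (RuntimeException rejected) {
            running.set(false); // the job never started
            throw rejected;
        }
        return true;
    }

    boolean isRunning() {
        return running.get();
    }
}
```

The controller could map `false` to, say, 409 Conflict and `true` to 202 Accepted; that mapping is a suggestion, not a requirement. The flag only protects a single application instance; with several instances you would need a shared lock, for example in the database.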
I am using Spring WebFlux with Spring Data JPA, with PostgreSQL as the backend database.
I don't want to block the main thread while making DB calls like find and save.
To achieve this, I have a main scheduler in the controller class and a jdbcScheduler in the service classes.
The way I have defined them is:
@Configuration
@EnableJpaAuditing
public class CommonConfig {

    @Value("${spring.datasource.hikari.maximum-pool-size}")
    int connectionPoolSize;

    @Bean
    public Scheduler scheduler() {
        return Schedulers.parallel();
    }

    @Bean
    public Scheduler jdbcScheduler() {
        return Schedulers.fromExecutor(Executors.newFixedThreadPool(connectionPoolSize));
    }

    @Bean
    public TransactionTemplate transactionTemplate(PlatformTransactionManager transactionManager) {
        return new TransactionTemplate(transactionManager);
    }
}
Now, while doing a get/save call in my service layer I do:
@Override
public Mono<Config> getConfigByKey(String key) {
    return Mono.defer(
                    () -> Mono.justOrEmpty(configRepository.findByKey(key)))
            .subscribeOn(jdbcScheduler)
            .publishOn(scheduler);
}

@Override
public Flux<Config> getAllConfigsAfterAppVersion(int appVersion) {
    return Flux
            .fromIterable(configRepository.findAllByMinAppVersionIsGreaterThanEqual(appVersion))
            .subscribeOn(jdbcScheduler)
            .publishOn(scheduler);
}

@Override
public Flux<Config> addConfigs(List<Config> configList) {
    return Flux.fromIterable(configRepository.saveAll(configList))
            .subscribeOn(jdbcScheduler)
            .publishOn(scheduler);
}
And in controller, I do:
@PostMapping
@ResponseStatus(HttpStatus.CREATED)
Mono<ResponseDto<List<Config>>> addConfigs(@Valid @RequestBody List<Config> configs) {
    return configService.addConfigs(configs).collectList()
            .map(configList -> new ResponseDto<>(HttpStatus.CREATED.value(), configList, null))
            .subscribeOn(scheduler);
}
Is this correct, and/or is there a better way to do it?
What I understand by:
.subscribeOn(jdbcScheduler)
.publishOn(scheduler);
is that the task will run on jdbcScheduler threads and the result will later be published on my main parallel scheduler. Is this understanding correct?
Your understanding is correct with regard to publishOn and subscribeOn (see the reference documentation of the Reactor project about those operators).
If you call blocking libraries without scheduling that work on a specific scheduler, those calls will block one of the few threads available (by default, the Netty event loop) and your application will only be able to serve a few requests concurrently.
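The subscribeOn/publishOn split amounts to a thread hand-off between two pools. As a plain-JDK analogy (not Reactor code), `CompletableFuture.supplyAsync(..., jdbcPool)` plays the role of `subscribeOn(jdbcScheduler)` and `thenApplyAsync(..., mainPool)` the role of `publishOn(scheduler)`; `TwoPoolHandoff` and the pool names are hypothetical. The sketch also shows why the query has to be passed lazily: in the question's getAllConfigsAfterAppVersion and addConfigs, `Flux.fromIterable(configRepository...)` calls the repository while the pipeline is being assembled, on the caller's thread, before subscribeOn can take effect. Wrapping it in `Flux.defer(...)`, as getConfigByKey does with `Mono.defer(...)`, avoids that.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

class TwoPoolHandoff {
    // Daemon pool whose thread names make the hand-off visible.
    static ExecutorService namedPool(String name, int threads) {
        AtomicInteger counter = new AtomicInteger();
        return Executors.newFixedThreadPool(threads, runnable -> {
            Thread thread = new Thread(runnable, name + "-" + counter.incrementAndGet());
            thread.setDaemon(true);
            return thread;
        });
    }

    // blockingQuery runs on jdbcPool (the subscribeOn role); downstream runs on
    // mainPool (the publishOn role). The query is a Supplier, so nothing runs on
    // the caller's thread while the pipeline is being built.
    static <T, R> CompletableFuture<R> handOff(Supplier<T> blockingQuery,
                                               Function<T, R> downstream,
                                               Executor jdbcPool,
                                               Executor mainPool) {
        return CompletableFuture.supplyAsync(blockingQuery, jdbcPool)
                .thenApplyAsync(downstream, mainPool);
    }
}
```

Recording `Thread.currentThread().getName()` in both steps shows the query running on a thread whose name starts with `jdbc-` and the downstream step on one whose name starts with `main-`.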
Now I'm not sure what you're trying to achieve by doing that.
First, the parallel scheduler is designed for CPU-bound tasks, meaning it has few threads: as many as (or a bit more than) the number of CPU cores. In this case, it's like setting your thread pool size to the number of cores on a regular Servlet container: your app won't be able to process a large number of concurrent requests.
Even if you choose a better alternative (like the elastic Scheduler), it will still not be as good as the Netty event loop, which is where request processing is natively scheduled in Spring WebFlux.
If your ultimate goal is performance and scalability, wrapping blocking calls in a reactive app is likely to perform worse than your regular Servlet container.
You could instead use Spring MVC and:
use usual blocking return types when you're dealing with a blocking library, like JPA
use Mono and Flux return types when you're not tied to such libraries
This won't be non-blocking, but this will be asynchronous still and you'll be able to do more work in parallel without dealing with the complexity.
IMHO, there is a way to execute this operation that makes better use of the machine's resources. Following the documentation, you can wrap the call in another thread and continue your execution.
I have a use case where I need to send an email to users.
First, I create the email body.
Mono<String> emailBody = ...cache();
And then I select users and send the email to them:
Flux.fromIterable(userRepository.findAllByRole(Role.USER))
.map(User::getEmail)
.doOnNext(email -> sendEmail(email, emailBody.block(), massSendingSubject))
.subscribe();
What I don't like:
Without the cache() method, the emailBody Mono is recalculated at each iteration step.
To get the emailBody value I use emailBody.block(), but maybe there's a reactive way that avoids calling block() inside the Flux flow?
There are several issues in this code sample.
I'll assume that this is a reactive web application.
First, it's not clear how you are creating the email body: are you fetching things from a database or a remote service? If it is mostly CPU-bound (and not I/O), then you don't need to wrap it in a reactive type. If it should be wrapped in a Publisher and the email content is the same for all users, using the cache operator is not a bad choice.
Also, Flux.fromIterable(userRepository.findAllByRole(Role.USER)) suggests that you're calling a blocking repository from a reactive context.
You should never do heavy I/O operations in a doOn*** operator. Those are meant for logging or light side-effect operations. The fact that you need to .block() inside it is another clue that you'll block your whole reactive pipeline.
Last one: you should not call subscribe anywhere in a web application. If this is bound to an HTTP request, you're basically triggering the reactive pipeline with no guarantee about resources or completion. Calling subscribe triggers the pipeline but does not wait until it's complete (this method returns a Disposable).
A more "typical" sample of that would look like:
Flux<User> users = userRepository.findAllByRole(Role.USER);
String emailBody = emailContentGenerator.createEmail();
// sendEmail() should return Mono<Void> to signal when the send operation is done
Mono<Void> sendEmailsOperation = users
.flatMap(user -> sendEmail(user.getEmail(), emailBody, subject))
.then();
// something else should subscribe to that reactive type,
// you could plug that as a return value of a Controller for example
If you're somehow stuck with blocking components (the sendEmail one, for example), you should schedule those blocking operations on a specific scheduler to avoid blocking your whole reactive pipeline. For that, look at the Schedulers section on the reactor reference documentation.
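The same shape (compute the body once, send to every user concurrently, expose a single completion signal instead of calling subscribe()) can be sketched with CompletableFuture when the mail client is blocking. This is a JDK analogy of the Reactor snippet above, not a replacement for it: `MassMailer`, `sendToAll` and the `blockingSend` callback are hypothetical, and `mailPool` plays the role of a dedicated scheduler for blocking work.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.BiConsumer;

class MassMailer {
    // Sends the same, already computed body to every address on mailPool and
    // returns one future that completes when all sends have finished
    // (the role played by .then() in the Reactor version).
    static CompletableFuture<Void> sendToAll(List<String> emails,
                                             String body,
                                             BiConsumer<String, String> blockingSend,
                                             Executor mailPool) {
        CompletableFuture<?>[] sends = emails.stream()
                .map(email -> CompletableFuture.runAsync(() -> blockingSend.accept(email, body), mailPool))
                .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(sends);
    }
}
```

The returned future completes only when every send has finished, and completes exceptionally if any send failed, so a caller can wait on it (or return it from a Spring MVC controller) instead of firing and forgetting.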