Wrapping blocking I/O in project reactor

Wrapping blocking I/O in project reactor - java

I have a spring-webflux API which, at a service layer, needs to read from an existing repository which uses JDBC.
Having done some reading on the subject, I would like to keep the execution of the blocking database call separate from the rest of my non-blocking async code.
I have defined a dedicated jdbcScheduler:
#Bean
public Scheduler jdbcScheduler() {
return Schedulers.fromExecutor(Executors.newFixedThreadPool(maxPoolSize));
}
And an AsyncWrapper utility to use it:
#Component
public class AsyncJdbcWrapper {
private final Scheduler jdbcScheduler;
#Autowired
public AsyncJdbcWrapper(Scheduler jdbcScheduler) {
this.jdbcScheduler = jdbcScheduler;
}
public <T> Mono<T> async(Callable<T> callable) {
return Mono.fromCallable(callable)
.subscribeOn(jdbcScheduler)
.publishOn(Schedulers.parallel());
}
}
Which is then used to wrap jdbc calls like so:
Mono<Integer> userIdMono = asyncWrapper.async(() -> userDao.getUserByUUID(request.getUserId()))
.map(userOption -> userOption.map(u -> u.getId())
.orElseThrow(() -> new IllegalArgumentException("Unable to find user with ID " + request.getUserId())));
I've got two questions:
1) Am I correctly pushing the execution of blocking calls to another set of threads? Being fairly new to this stuff I'm struggling with the intricacies of subscribeOn()/publishOn().
2) Say I want to make use of the resulting mono, e.g call an API with the result of the userIdMono, on which scheduler will that be executed? The one specifically created for the jdbc calls, or the main(?) thread that reactor usually operates within? e.g.
userIdMono.map(id -> someApiClient.call(id));

1) Use of subscribeOn is correctly putting the JDBC work on the jdbcScheduler
2) Neither, the results of the Callable - while computed on the jdbcScheduler, are publishOn the parallel Scheduler, so your map will be executed on a thread from the Schedulers.parallel() pool (rather than hogging the jdbcScheduler).

Related

Thousands of rest calls with spring boot

Let's say that we have the following entities: Project and Release, which is a one to many relationship.
Upon an event consumption from an SQS queue where a release id is sent as part of the event, there might be scenarios where we might have to create thousands of releases in our DB, where for each release we have to make a rest call to a 3rd party service in order to get some information for each release.
That means that we might have to make thousands of calls, in some cases more than 20k calls just to retrieve the information for the different releases and store it in the DB.
Obviously this is not scalable, so I'm not really sure what's the way to go in this scenario.
I know I might use a CompletableFuture, but I'm not sure how to use that with spring.
The http client that I am using is WebClient.
Any ideas?

You can make the save queries in a method transactional by adding the annotation #Transactional above the method signature. The method should also be public, or else this annotation is ignored.
As for using CompletableFuture in spring; You could make a http call method asynchronous by adding the #Async annotation above its signature and by letting it return a CompletableFuture as a return type. You should return a completed future holding the response value from the http call. You can easily make a completed future with the method CompletableFuture.completedFuture(yourValue). Spring will only return the completed future once the asynchronous method is done executing everything int its code block. For #Async to work you must also add the #EnableAsync annotation to one of your configuration classes. On top of that the #Async annotated method must be public and cannot be called by a method from within the same class. If the method is private or is called from within the same class then the #Async annotation will be ignored and instead the method will be executed in the same thread as the calling method is executed.
Next to an #Async annotated method you could also use a parallelStream to execute all 20K http calls in parallel. For example:
List<Long> releaseIds = new ArrayList<>();
Map<Long,ReleaseInfo> releaseInfo = releaseIds.parallelStream().map(releaseId -> new AbstractMap.SimpleEntry<>(releaseId, webClient.getReleaseInfo(releaseId)).collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
Lastly you could also use a ThreadPoolExecutor to execute the http calls in parallel. An example:
List<Long> releaseIds = new ArrayList<>();
ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors()); //I've made the amount of threads in the pool equal to the amount of available CPU processors on the machine.
//Submit tasks to the executor
List<Future<ReleaseInfo>> releaseInfoFutures = releaseIds.stream().map(releaseId -> executor.submit(() -> webClient.getReleaseInfo(releaseId)).collect(Collectors.toList());
//Wait for all futures to complete and map all non-null values to ReleaseInfo list.
List<ReleaseInfo> releaseInfo = releaseInfoFutures.stream().map(this::getValueAfterFutureCompletion).filter(releaseInfo -> releaseInfo != null).collect(Collectors.toList());
private ReleaseInfo getValueAfterFutureCompletion(Future<ReleaseInfo> future){
ReleaseInfo releaseInfo = null;
try {
releaseInfo = future.get();
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ExecutionException e) {
e.printStackTrace();
} finally {
return releaseInfo;
}
}
Make sure to call shutdownNow() on ThreadPoolExecutor after you're done with it to avoid memory leaks.

ThreadLocals on GraphQL-Java

I'm exposing a legacy web app on GraphQL, but this web app uses Threadlocals (amongst other Apache-Shiro).
Since GraphQL-java seems to be using the fork-join pool for concurrency I worry about how far I need to go to ensure that my ThreadLocals still work and work safely.
Reading the documentation and the source it seems a large part of the concurrency is achieved by DataFetchers that return CompletableFuture's I can't tell for sure if that's the only source of concurrency (i think not) and whether the DataFetchers themselves are invoked from the fork-join pool
So would it be Safe to wrap my DataFetcher's in a delegate that set and clears the ThreadLocals? or does that still have the risk of being preempted and continued on another thread in the fork-join pool something like:
static class WrappedDataFetcher implements DataFetcher<Object> {
private DataFetcher<?> realDataFetcher;
WrappedDataFetcher(DataFetcher<?> realDataFetcher) {
this.realDataFetcher = realDataFetcher;
}
#Override
public Object get(DataFetchingEnvironment dataFetchingEnvironment) throws Exception {
try {
setThreadLocalsFromRequestOrContext(dataFetchingEnvironment);
return realDataFetcher.get(dataFetchingEnvironment);
} finally {
clearTreadLocals();
}
}
}
Or would I need to explicitly run my DataFetchers in a Threadpool like:
static class WrappedDataFetcherThreadPool implements DataFetcher<Object> {
private DataFetcher<?> wrappedDataFetcher;
private ThreadPoolExecutor executor;
WrappedDataFetcherThreadPool(DataFetcher<?> realDataFetcher, ThreadPoolExecutor executor) {
// Wrap in Wrapper from previous example to ensure threadlocals in the executor
this.wrappedDataFetcher = new WrappedDataFetcher(realDataFetcher);
this.executor = executor;
}
#Override
public Object get(DataFetchingEnvironment dataFetchingEnvironment) throws Exception {
Future<?> future = executor.submit(() -> wrappedDataFetcher.get(dataFetchingEnvironment));
return future.get(); //for simplicity / clarity of the question
}
}
I think the second one solves my problem but it feels like overkill and I worry about performance. But I think the first risks preemption.
If there is a better way to handle this I would love to hear it as well.
Note: this is not about the async nature of GraphQL (I hope to leverage that as well) but about the possible side effect of running multiple requests WITH treadLocals that might get mixed up between requests due to the fork-join pool

As far as I know graphql-java does not use its own thread pool and relies on the application for it. The way it achieves it using future callbacks. Say this is the current state of the application.
Thread T_1 with thread local storage TLS_1 executing data fetcher DF_1.
Graphql-java engine attaches a synchronous callback to the future returned by DF_1. If a future is not returned it wraps the result in a completed future and then attaches the synchronous callback. Since the callback is synchronous the thread that completes the future runs the callback. If any other thread apart from T_1 completes the future, TLS_1 is lost(unless it's copied over to the executing thread). One example of this is a non blocking HTTP I/O library which uses an I/O thread to complete the response future.
Here is a link where the authors have commented more on the thread behavior in graphql-java library
https://spectrum.chat/graphql-java/general/how-to-supply-custom-executor-service-for-data-fetchers-to-run-on~29caa730-9114-4883-ab4a-e9700f225f93

How can I run a concurrent queue of tasks using Rx?

I've found a lot of examples about it and doesn't know what's the 'right' implementation right there.
Basically I've got a object (let's call it NBAManager) and there's a method public Completable generateGame() for this object. The idea is that generateGame method gets called a lot of times and I want to generate games in a sequential way: I was thinking about concurrent queue. I came up with the following design: I'd create a singleton instance of NBAService: service for NBAManager and the body of generateGame() will look like this:
public Completable generateGame(RequestInfo info)
return service.generateGame(info);
So basically I'll pass up that Completable result. And inside of that NBAService object I'll have a queue (a concurrent one, because I want to have an opportunity to poll() and add(request) if there's a call of generateGame() while NBAManager was processing one of the earlier requests) of requests. I got stuck with this:
What's the right way to write such a job queue in Rx way? There're so many examples of it. Could you send me a link of a good implementation?
How do I handle the logic of queue execution? I believe we've to execute if there's one job only and if there're many then we just have to add it and that's it. How can I control it without runnable? I was thinking about using subjects.
Thanks!

There are multiple ways to implement this, you can choose how much RxJava should be invoked. The least involvement can use a single threaded ExecutorService as the "queue" and CompletableSubject for the delayed completion:
class NBAService {
static ExecutorService exec = Executors.newSingleThreadedExecutor();
public static Completable generateGame(RequestInfo info) {
CompletableSubject result = CompletableSubject.create();
exec.submit(() -> {
// do something with the RequestInfo instance
f(info).subscribe(result);
});
return result;
}
}
A more involved solution would be if you wanted to trigger the execution when the Completable is subscribed to. In this case, you can go with create() and subscribeOn():
class NBAService {
public static Completable generateGame(RequestInfo info) {
return Completable.create(emitter -> {
// do something with the RequestInfo instance
emitter.setDisposable(
f(info).subscribe(emitter::onComplete, emitter::onError)
);
})
.subscribeOn(Schedulers.single());
}
}

RxJava Multithreading with Realm - Realm access from incorrect thread

Background
I am using Realm within my app. When data is loaded it then undergoes intense processing therefore the processing occurs on a background thread.
The coding pattern in use is the Unit of Work pattern and Realm only exists within a repository under a DataManager. The idea here is that each repository can have a different database/file storage solution.
What I have tried
Below is an example of some similar code to what I have in my FooRespository class.
The idea here is that an instance of Realm is obtained, used to query the realm for objects of interest, return them and close the realm instance. Note that this is synchronous and at the end copies the objects from Realm to an unmanaged state.
public Observable<List<Foo>> getFoosById(List<String> fooIds) {
Realm realm = Realm.getInstance(fooRealmConfiguration);
RealmQuery<Foo> findFoosByIdQuery = realm.where(Foo.class);
for(String id : fooIds) {
findFoosByIdQuery.equalTo(Foo.FOO_ID_FIELD_NAME, id);
findFoosByIdQuery.or();
}
return findFoosByIdQuery
.findAll()
.asObservable()
.doOnUnsubscribe(realm::close)
.filter(RealmResults::isLoaded)
.flatMap(foos -> Observable.just(new ArrayList<>(realm.copyFromRealm(foos))));
}
This code is later used in conjunction with the heavy processing code via RxJava:
dataManager.getFoosById(foo)
.flatMap(this::processtheFoosInALongRunningProcess)
.subscribeOn(Schedulers.io()) //could be Schedulers.computation() etc
.subscribe(tileChannelSubscriber);
After reading the docs, my belief is that the above should work, as it is NOT asynchronous and therefore does not need a looper thread. I obtain the instance of realm within the same thread therefore it is not being passed between threads and neither are the objects.
The problem
When the above is executed I get
Realm access from incorrect thread. Realm objects can only be accessed
on the thread they were created.
This doesn't seem right. The only thing I can think of is that the pool of Realm instances is getting me an existing instance created from another process using the main thread.

Kay so
return findFoosByIdQuery
.findAll()
.asObservable()
This happens on UI thread, because that's where you're calling it from initially
.subscribeOn(Schedulers.io())
Aaaaand then you're tinkering with them on Schedulers.io().
Nope, that's not the same thread!
As much as I dislike the approach of copying from a zero-copy database, your current approach is riddled with issues due to misuse of realmResults.asObservable(), so here's a spoiler for what your code should be:
public Observable<List<Foo>> getFoosById(List<String> fooIds) {
return Observable.defer(() -> {
try(Realm realm = Realm.getInstance(fooRealmConfiguration)) { //try-finally also works
RealmQuery<Foo> findFoosByIdQuery = realm.where(Foo.class);
for(String id : fooIds) {
findFoosByIdQuery.equalTo(FooFields.ID, id);
findFoosByIdQuery.or(); // please guarantee this works?
}
RealmResults<Foo> results = findFoosByIdQuery.findAll();
return Observable.just(realm.copyFromRealm(results));
}
}).subscribeOn(Schedulers.io());
}

Note that you are creating the instance outside of all your RxJava processing pipeline. Thus on the main thread (or whichever thread you are on, when calling getFoosById().
Just because the method returns an Observable doesn't mean that it runs on another thread. Only the processing pipeline of the Observable created by the last statement of your getFoosById() method runs on the correct thread (the filter(), the flatMap() and all the processing done by the caller).
You thus have to ensure that the call of getFoosById()is already done on the thread used by Schedulers.io().
One way to achieve this is by using Observable.defer():
Observable.defer(() -> dataManager.getFoosById(foo))
.flatMap(this::processtheFoosInALongRunningProcess)
.subscribeOn(Schedulers.io()) //could be Schedulers.computation() etc
.subscribe(tileChannelSubscriber);

Concept of promises in Java

Is there a concept of using promises in java (just like ut is used in JavaScript) instead of using nested callbacks ?
If so, is there an example of how the callback is implemented in java and handlers are chained ?

Yep! Java 8 calls it CompletableFuture. It lets you implement stuff like this.
class MyCompletableFuture<T> extends CompletableFuture<T> {
static final Executor myExecutor = ...;
public MyCompletableFuture() { }
public <U> CompletableFuture<U> newIncompleteFuture() {
return new MyCompletableFuture<U>();
}
public Executor defaultExecutor() {
return myExecutor;
}
public void obtrudeValue(T value) {
throw new UnsupportedOperationException();
}
public void obtrudeException(Throwable ex) {
throw new UnsupportedOperationException();
}
}
The basic design is a semi-fluent API in which you can arrange:
(sequential or async)
(functions or actions)
triggered on completion of
i) ("then") ,or ii) ("andThen" and "orThen")
others. As in:
MyCompletableFuture<String> f = ...; g = ...
f.then((s -> aStringFunction(s)).thenAsync(s -> ...);
or
f.andThen(g, (s, t) -> combineStrings).or(CompletableFuture.async(()->...)....

UPDATE 7/20/17
I wanted to edit that there is also a Library called "ReactFX" which is supposed to be JavaFX as a reactive framework. There are many Reactive Java libraries from what I've seen, and since Play is based on the Reactive principal, I would assume that these Reactive libraries follow that same principal of non-blocking i/o, async calls from server to client and back while having communication be send by either end.
These libraries seem to be made for the client side of things, but there might be a Server reactive library as well, but I would assume that it would be wiser to use Play! with one of these client side reactive libraries.
You can take a look at https://www.playframework.com/
which implements this functionality here
https://www.playframework.com/documentation/2.2.0/api/java/play/libs/F.Promise.html
Additonal reading https://www.playframework.com/documentation/2.5.x/JavaAsync
Creating non-blocking actions
Because of the way Play works, action code must be as fast as possible, i.e., non-blocking. So what should we return from our action if we are not yet able to compute the result? We should return the promise of a result!
Java 8 provides a generic promise API called CompletionStage. A CompletionStage<Result> will eventually be redeemed with a value of type Result. By using a CompletionStage<Result> instead of a normal Result, we are able to return from our action quickly without blocking anything. Play will then serve the result as soon as the promise is redeemed.
The web client will be blocked while waiting for the response, but nothing will be blocked on the server, and server resources can be used to serve other clients.
How to create a CompletionStage
To create a CompletionStage<Result> we need another promise first: the promise that will give us the actual value we need to compute the result:
CompletionStage<Double> promiseOfPIValue = computePIAsynchronously();
CompletionStage<Result> promiseOfResult = promiseOfPIValue.thenApply(pi ->
ok("PI value computed: " + pi)
);
Play asynchronous API methods give you a CompletionStage. This is the case when you are calling an external web service using the play.libs.WS API, or if you are using Akka to schedule asynchronous tasks or to communicate with Actors using play.libs.Akka.
A simple way to execute a block of code asynchronously and to get a CompletionStage is to use the CompletableFuture.supplyAsync() helper:
CompletionStage<Integer> promiseOfInt = CompletableFuture.supplyAsync(() -> intensiveComputation());
Note: It’s important to understand which thread code runs on which promises. Here, the intensive computation will just be run on another thread.
You can’t magically turn synchronous IO into asynchronous by wrapping it in a CompletionStage. If you can’t change the application’s architecture to avoid blocking operations, at some point that operation will have to be executed, and that thread is going to block. So in addition to enclosing the operation in a CompletionStage, it’s necessary to configure it to run in a separate execution context that has been configured with enough threads to deal with the expected concurrency. See Understanding Play thread pools for more information.
It can also be helpful to use Actors for blocking operations. Actors provide a clean model for handling timeouts and failures, setting up blocking execution contexts, and managing any state that may be associated with the service. Also Actors provide patterns like ScatterGatherFirstCompletedRouter to address simultaneous cache and database requests and allow remote execution on a cluster of backend servers. But an Actor may be overkill depending on what you need.
Async results
We have been returning Result up until now. To send an asynchronous result our action needs to return a CompletionStage<Result>:
public CompletionStage<Result> index() {
return CompletableFuture.supplyAsync(() -> intensiveComputation())
.thenApply(i -> ok("Got result: " + i));
}
Actions are asynchronous by default
Play actions are asynchronous by default. For instance, in the controller code below, the returned Result is internally enclosed in a promise:
public Result index() {
return ok("Got request " + request() + "!");
}
Note: Whether the action code returns a Result or a CompletionStage<Result>, both kinds of returned object are handled internally in the same way. There is a single kind of Action, which is asynchronous, and not two kinds (a synchronous one and an asynchronous one). Returning a CompletionStage is a technique for writing non-blocking code.
Some info on CompletionStage
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletionStage.html
which is a subclass of the class mentioned in #Debosmit Ray's answer called CompletableFuture
This Youtube Video by LinkedIn dev Mr. Brikman explains a bit about Promises in
https://youtu.be/8z3h4Uv9YbE?t=15m46s
and
https://www.youtube.com/watch?v=4b1XLka0UIw
I believe the first video gives an example of a promise, the second video might also give some good info, I don't really recall which video had what content.
Either way the information here is very good, and worth looking into.
I personally do not use Play, but I have been looking at it for a long, long time, as it does a lot of really good stuff.

If you want to do Promise even before Java7, "java-promise" may be useful. (Of course it works with Java8)
You can easily control asynchronous operations like JavaScript's Promise.
https://github.com/riversun/java-promise
example
import org.riversun.promise.Promise;
public class Example {
public static void main(String[] args) {
Promise.resolve("foo")
.then(new Promise((action, data) -> {
new Thread(() -> {
String newData = data + "bar";
action.resolve(newData);
}).start();
}))
.then(new Promise((action, data) -> {
System.out.println(data);
action.resolve();
}))
.start();
System.out.println("Promise in Java");
}
}
result:
Promise in Java
foobar

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Wrapping blocking I/O in project reactor - java

Related

Thousands of rest calls with spring boot

ThreadLocals on GraphQL-Java

How can I run a concurrent queue of tasks using Rx?

RxJava Multithreading with Realm - Realm access from incorrect thread

Concept of promises in Java

Categories

Resources