RxJava Multithreading with Realm - Realm access from incorrect thread

RxJava Multithreading with Realm - Realm access from incorrect thread - java

Background
I am using Realm within my app. When data is loaded it then undergoes intense processing therefore the processing occurs on a background thread.
The coding pattern in use is the Unit of Work pattern and Realm only exists within a repository under a DataManager. The idea here is that each repository can have a different database/file storage solution.
What I have tried
Below is an example of some similar code to what I have in my FooRespository class.
The idea here is that an instance of Realm is obtained, used to query the realm for objects of interest, return them and close the realm instance. Note that this is synchronous and at the end copies the objects from Realm to an unmanaged state.
public Observable<List<Foo>> getFoosById(List<String> fooIds) {
Realm realm = Realm.getInstance(fooRealmConfiguration);
RealmQuery<Foo> findFoosByIdQuery = realm.where(Foo.class);
for(String id : fooIds) {
findFoosByIdQuery.equalTo(Foo.FOO_ID_FIELD_NAME, id);
findFoosByIdQuery.or();
}
return findFoosByIdQuery
.findAll()
.asObservable()
.doOnUnsubscribe(realm::close)
.filter(RealmResults::isLoaded)
.flatMap(foos -> Observable.just(new ArrayList<>(realm.copyFromRealm(foos))));
}
This code is later used in conjunction with the heavy processing code via RxJava:
dataManager.getFoosById(foo)
.flatMap(this::processtheFoosInALongRunningProcess)
.subscribeOn(Schedulers.io()) //could be Schedulers.computation() etc
.subscribe(tileChannelSubscriber);
After reading the docs, my belief is that the above should work, as it is NOT asynchronous and therefore does not need a looper thread. I obtain the instance of realm within the same thread therefore it is not being passed between threads and neither are the objects.
The problem
When the above is executed I get
Realm access from incorrect thread. Realm objects can only be accessed
on the thread they were created.
This doesn't seem right. The only thing I can think of is that the pool of Realm instances is getting me an existing instance created from another process using the main thread.

Kay so
return findFoosByIdQuery
.findAll()
.asObservable()
This happens on UI thread, because that's where you're calling it from initially
.subscribeOn(Schedulers.io())
Aaaaand then you're tinkering with them on Schedulers.io().
Nope, that's not the same thread!
As much as I dislike the approach of copying from a zero-copy database, your current approach is riddled with issues due to misuse of realmResults.asObservable(), so here's a spoiler for what your code should be:
public Observable<List<Foo>> getFoosById(List<String> fooIds) {
return Observable.defer(() -> {
try(Realm realm = Realm.getInstance(fooRealmConfiguration)) { //try-finally also works
RealmQuery<Foo> findFoosByIdQuery = realm.where(Foo.class);
for(String id : fooIds) {
findFoosByIdQuery.equalTo(FooFields.ID, id);
findFoosByIdQuery.or(); // please guarantee this works?
}
RealmResults<Foo> results = findFoosByIdQuery.findAll();
return Observable.just(realm.copyFromRealm(results));
}
}).subscribeOn(Schedulers.io());
}

Note that you are creating the instance outside of all your RxJava processing pipeline. Thus on the main thread (or whichever thread you are on, when calling getFoosById().
Just because the method returns an Observable doesn't mean that it runs on another thread. Only the processing pipeline of the Observable created by the last statement of your getFoosById() method runs on the correct thread (the filter(), the flatMap() and all the processing done by the caller).
You thus have to ensure that the call of getFoosById()is already done on the thread used by Schedulers.io().
One way to achieve this is by using Observable.defer():
Observable.defer(() -> dataManager.getFoosById(foo))
.flatMap(this::processtheFoosInALongRunningProcess)
.subscribeOn(Schedulers.io()) //could be Schedulers.computation() etc
.subscribe(tileChannelSubscriber);

Related

ThreadLocals on GraphQL-Java

I'm exposing a legacy web app on GraphQL, but this web app uses Threadlocals (amongst other Apache-Shiro).
Since GraphQL-java seems to be using the fork-join pool for concurrency I worry about how far I need to go to ensure that my ThreadLocals still work and work safely.
Reading the documentation and the source it seems a large part of the concurrency is achieved by DataFetchers that return CompletableFuture's I can't tell for sure if that's the only source of concurrency (i think not) and whether the DataFetchers themselves are invoked from the fork-join pool
So would it be Safe to wrap my DataFetcher's in a delegate that set and clears the ThreadLocals? or does that still have the risk of being preempted and continued on another thread in the fork-join pool something like:
static class WrappedDataFetcher implements DataFetcher<Object> {
private DataFetcher<?> realDataFetcher;
WrappedDataFetcher(DataFetcher<?> realDataFetcher) {
this.realDataFetcher = realDataFetcher;
}
#Override
public Object get(DataFetchingEnvironment dataFetchingEnvironment) throws Exception {
try {
setThreadLocalsFromRequestOrContext(dataFetchingEnvironment);
return realDataFetcher.get(dataFetchingEnvironment);
} finally {
clearTreadLocals();
}
}
}
Or would I need to explicitly run my DataFetchers in a Threadpool like:
static class WrappedDataFetcherThreadPool implements DataFetcher<Object> {
private DataFetcher<?> wrappedDataFetcher;
private ThreadPoolExecutor executor;
WrappedDataFetcherThreadPool(DataFetcher<?> realDataFetcher, ThreadPoolExecutor executor) {
// Wrap in Wrapper from previous example to ensure threadlocals in the executor
this.wrappedDataFetcher = new WrappedDataFetcher(realDataFetcher);
this.executor = executor;
}
#Override
public Object get(DataFetchingEnvironment dataFetchingEnvironment) throws Exception {
Future<?> future = executor.submit(() -> wrappedDataFetcher.get(dataFetchingEnvironment));
return future.get(); //for simplicity / clarity of the question
}
}
I think the second one solves my problem but it feels like overkill and I worry about performance. But I think the first risks preemption.
If there is a better way to handle this I would love to hear it as well.
Note: this is not about the async nature of GraphQL (I hope to leverage that as well) but about the possible side effect of running multiple requests WITH treadLocals that might get mixed up between requests due to the fork-join pool

As far as I know graphql-java does not use its own thread pool and relies on the application for it. The way it achieves it using future callbacks. Say this is the current state of the application.
Thread T_1 with thread local storage TLS_1 executing data fetcher DF_1.
Graphql-java engine attaches a synchronous callback to the future returned by DF_1. If a future is not returned it wraps the result in a completed future and then attaches the synchronous callback. Since the callback is synchronous the thread that completes the future runs the callback. If any other thread apart from T_1 completes the future, TLS_1 is lost(unless it's copied over to the executing thread). One example of this is a non blocking HTTP I/O library which uses an I/O thread to complete the response future.
Here is a link where the authors have commented more on the thread behavior in graphql-java library
https://spectrum.chat/graphql-java/general/how-to-supply-custom-executor-service-for-data-fetchers-to-run-on~29caa730-9114-4883-ab4a-e9700f225f93

How can I run a concurrent queue of tasks using Rx?

I've found a lot of examples about it and doesn't know what's the 'right' implementation right there.
Basically I've got a object (let's call it NBAManager) and there's a method public Completable generateGame() for this object. The idea is that generateGame method gets called a lot of times and I want to generate games in a sequential way: I was thinking about concurrent queue. I came up with the following design: I'd create a singleton instance of NBAService: service for NBAManager and the body of generateGame() will look like this:
public Completable generateGame(RequestInfo info)
return service.generateGame(info);
So basically I'll pass up that Completable result. And inside of that NBAService object I'll have a queue (a concurrent one, because I want to have an opportunity to poll() and add(request) if there's a call of generateGame() while NBAManager was processing one of the earlier requests) of requests. I got stuck with this:
What's the right way to write such a job queue in Rx way? There're so many examples of it. Could you send me a link of a good implementation?
How do I handle the logic of queue execution? I believe we've to execute if there's one job only and if there're many then we just have to add it and that's it. How can I control it without runnable? I was thinking about using subjects.
Thanks!

There are multiple ways to implement this, you can choose how much RxJava should be invoked. The least involvement can use a single threaded ExecutorService as the "queue" and CompletableSubject for the delayed completion:
class NBAService {
static ExecutorService exec = Executors.newSingleThreadedExecutor();
public static Completable generateGame(RequestInfo info) {
CompletableSubject result = CompletableSubject.create();
exec.submit(() -> {
// do something with the RequestInfo instance
f(info).subscribe(result);
});
return result;
}
}
A more involved solution would be if you wanted to trigger the execution when the Completable is subscribed to. In this case, you can go with create() and subscribeOn():
class NBAService {
public static Completable generateGame(RequestInfo info) {
return Completable.create(emitter -> {
// do something with the RequestInfo instance
emitter.setDisposable(
f(info).subscribe(emitter::onComplete, emitter::onError)
);
})
.subscribeOn(Schedulers.single());
}
}

Wrapping blocking I/O in project reactor

I have a spring-webflux API which, at a service layer, needs to read from an existing repository which uses JDBC.
Having done some reading on the subject, I would like to keep the execution of the blocking database call separate from the rest of my non-blocking async code.
I have defined a dedicated jdbcScheduler:
#Bean
public Scheduler jdbcScheduler() {
return Schedulers.fromExecutor(Executors.newFixedThreadPool(maxPoolSize));
}
And an AsyncWrapper utility to use it:
#Component
public class AsyncJdbcWrapper {
private final Scheduler jdbcScheduler;
#Autowired
public AsyncJdbcWrapper(Scheduler jdbcScheduler) {
this.jdbcScheduler = jdbcScheduler;
}
public <T> Mono<T> async(Callable<T> callable) {
return Mono.fromCallable(callable)
.subscribeOn(jdbcScheduler)
.publishOn(Schedulers.parallel());
}
}
Which is then used to wrap jdbc calls like so:
Mono<Integer> userIdMono = asyncWrapper.async(() -> userDao.getUserByUUID(request.getUserId()))
.map(userOption -> userOption.map(u -> u.getId())
.orElseThrow(() -> new IllegalArgumentException("Unable to find user with ID " + request.getUserId())));
I've got two questions:
1) Am I correctly pushing the execution of blocking calls to another set of threads? Being fairly new to this stuff I'm struggling with the intricacies of subscribeOn()/publishOn().
2) Say I want to make use of the resulting mono, e.g call an API with the result of the userIdMono, on which scheduler will that be executed? The one specifically created for the jdbc calls, or the main(?) thread that reactor usually operates within? e.g.
userIdMono.map(id -> someApiClient.call(id));

1) Use of subscribeOn is correctly putting the JDBC work on the jdbcScheduler
2) Neither, the results of the Callable - while computed on the jdbcScheduler, are publishOn the parallel Scheduler, so your map will be executed on a thread from the Schedulers.parallel() pool (rather than hogging the jdbcScheduler).

Deeper understanding about how Realm works?

Q1 : Please let me know what is different between two ways of implementation in below (about get realm instance). I want to know which one is faster, lighter in memory and what is recommended ?
1. Set and Get Realm as Default (with specific config)
private void setupCustomRealm() {
if (!Utils.isStringHasText(databaseName)) {
databaseName = DbManager.getInstance().getCurrentDb();
}
// get config
RealmConfiguration config = getRealmConfigByDBName(databaseName);
Realm.setDefaultConfiguration(config);
Realm.compactRealm(config);
}
public Realm getCustomRealm() {
if (firstTime) {
setupCustomRealm();
}
return Realm.getDefaultInstance();
}
2 .Get Realm from config directly
public Realm getCustomRealm(Context context) {
if (!Utils.isStringHasText(databaseName)) {
databaseName = DbManager.getInstance().getCurrentDb();
}
// get config
RealmConfiguration config = getRealmConfigByDBName(context, databaseName);
Realm.compactRealm(config);
return Realm.getInstance(config);
}
Q2 : In my application, now we are consider between 2 ways of implementation.
1: We create a new Realm instance every time when we need to do something with Database (in both of worker thread and UI thread) and close it when task get done.
2: We create only one instance of Realm and let it live along with application, when quit application we close instance above.
Please explain me the advantage and disadvantage of each one and which ways is recommended (My application using Service to handle database and network connection)
If I have 2 heavy tasks (take a long time to complete it's transaction), what is difference between execute 2 task by one Realm instance and execute 2 task on 2 Realm instances in 2 separate thread (I mean one thread have one Realm instances and instance will executes one in 2 tasks above), and which one if safer and faster.
what will happen if there is a problem while executing a transaction (example not responding or throws some exception)

Note: I am not an official Realm person, but I've been using Realm for a while now.
Here's a TL;DR version
1.) It's worth noting a few things:
A given Realm file should be accessed only with the same RealmConfiguration throughout the application, so the first solution here is preferable (don't create a new config for each Realm).
Realm.compactRealm(RealmConfig) works only when there are no open Realm instances on any threads. So either at application start, or at application finish (personally I found that it makes start-up slower, so I call compactRealm() when my activity count reaches 0, which I manage with a retained fragment bound to the activity - but that's just me).
2.) It's worth noting that Realm.getInstance() on first call creates a thread-local cache (the cache is shared among Realm instances that belong to the same thread, and increments a counter to indicate how many Realm instances are open on that given thread. When that counter reaches 0 as a result of calling realm.close() on all instances, the cache is cleared.
It's also worth noting that the Realm instance is thread-confined, so you will need to open a new Realm on any thread where you use it. This means that if you're using it in an IntentService, you'll need to open a new Realm (because it's in a background thread).
It is extremely important to call realm.close() on Realm instances that are opened on background threads.
Realm realm = null;
try {
realm = Realm.getDefaultInstance();
//do database operations
} finally {
if(realm != null) {
realm.close();
}
}
Or API 19+:
try(Realm realm = Realm.getDefaultInstance()) {
//do database operations
}
When you call realm.close() on a particular Realm instance, it invalidates the results and objects that belong to it. So it makes sense both to open/close Realms in Activity onCreate() and onDestroy(), or open it within the Application and share the same UI thread Realm instance for queries on the UI thread.
(It's not as important to close the Realm instance on the UI thread unless you intend to compact it after all of them are closed, but you have to close Realm instances on background threads.)
Note: calling RealmConfiguration realmConfig = new RealmConfiguration.Builder(appContext).build() can fail on some devices if you call it in Application.onCreate(), because getFilesDir() can return null, so it's better to initialize your RealmConfiguration only after the first activity has started.
With all that in mind, the answer to 2) is:
While I personally create a single instance of Realm for the UI thread, you'll still need to open (and close!) a new Realm instance for any background threads.
I use a single instance of Realm for the UI thread because it's easier to inject that way, and also because executeTransactionAsync()'s RealmAsyncTask gets cancelled if the underlying Realm instance is closed while it's still executing, so I didn't really want that to happen. :)
Don't forget, that you need a Realm instance on the UI thread to show RealmResults<T> from realm queries (unless you intend to use copyFromRealm() which makes everything use more memory and is generally slower)
IntentService works like a normal background thread, so you should also close the realm instance there as well.
Both heavy tasks work whether it's the same Realm instance or the other (just make sure you have a Realm instance on that given thread), but I'd recommend executing these tasks serially, one after the other.
If there's an exception during the transaction, you should call realm.cancelTransaction() (the docs say begin/commit, but it always forgets about cancel).
If you don't want to manually manage begin/commit/cancel, you should use realm.executeTransaction(new Realm.Transaction() { ... });, because it automatically calls begin/commit/cancel for you. Personally I use executeTransaction() everywhere because it's convenient.

Concept of promises in Java

Is there a concept of using promises in java (just like ut is used in JavaScript) instead of using nested callbacks ?
If so, is there an example of how the callback is implemented in java and handlers are chained ?

Yep! Java 8 calls it CompletableFuture. It lets you implement stuff like this.
class MyCompletableFuture<T> extends CompletableFuture<T> {
static final Executor myExecutor = ...;
public MyCompletableFuture() { }
public <U> CompletableFuture<U> newIncompleteFuture() {
return new MyCompletableFuture<U>();
}
public Executor defaultExecutor() {
return myExecutor;
}
public void obtrudeValue(T value) {
throw new UnsupportedOperationException();
}
public void obtrudeException(Throwable ex) {
throw new UnsupportedOperationException();
}
}
The basic design is a semi-fluent API in which you can arrange:
(sequential or async)
(functions or actions)
triggered on completion of
i) ("then") ,or ii) ("andThen" and "orThen")
others. As in:
MyCompletableFuture<String> f = ...; g = ...
f.then((s -> aStringFunction(s)).thenAsync(s -> ...);
or
f.andThen(g, (s, t) -> combineStrings).or(CompletableFuture.async(()->...)....

UPDATE 7/20/17
I wanted to edit that there is also a Library called "ReactFX" which is supposed to be JavaFX as a reactive framework. There are many Reactive Java libraries from what I've seen, and since Play is based on the Reactive principal, I would assume that these Reactive libraries follow that same principal of non-blocking i/o, async calls from server to client and back while having communication be send by either end.
These libraries seem to be made for the client side of things, but there might be a Server reactive library as well, but I would assume that it would be wiser to use Play! with one of these client side reactive libraries.
You can take a look at https://www.playframework.com/
which implements this functionality here
https://www.playframework.com/documentation/2.2.0/api/java/play/libs/F.Promise.html
Additonal reading https://www.playframework.com/documentation/2.5.x/JavaAsync
Creating non-blocking actions
Because of the way Play works, action code must be as fast as possible, i.e., non-blocking. So what should we return from our action if we are not yet able to compute the result? We should return the promise of a result!
Java 8 provides a generic promise API called CompletionStage. A CompletionStage<Result> will eventually be redeemed with a value of type Result. By using a CompletionStage<Result> instead of a normal Result, we are able to return from our action quickly without blocking anything. Play will then serve the result as soon as the promise is redeemed.
The web client will be blocked while waiting for the response, but nothing will be blocked on the server, and server resources can be used to serve other clients.
How to create a CompletionStage
To create a CompletionStage<Result> we need another promise first: the promise that will give us the actual value we need to compute the result:
CompletionStage<Double> promiseOfPIValue = computePIAsynchronously();
CompletionStage<Result> promiseOfResult = promiseOfPIValue.thenApply(pi ->
ok("PI value computed: " + pi)
);
Play asynchronous API methods give you a CompletionStage. This is the case when you are calling an external web service using the play.libs.WS API, or if you are using Akka to schedule asynchronous tasks or to communicate with Actors using play.libs.Akka.
A simple way to execute a block of code asynchronously and to get a CompletionStage is to use the CompletableFuture.supplyAsync() helper:
CompletionStage<Integer> promiseOfInt = CompletableFuture.supplyAsync(() -> intensiveComputation());
Note: It’s important to understand which thread code runs on which promises. Here, the intensive computation will just be run on another thread.
You can’t magically turn synchronous IO into asynchronous by wrapping it in a CompletionStage. If you can’t change the application’s architecture to avoid blocking operations, at some point that operation will have to be executed, and that thread is going to block. So in addition to enclosing the operation in a CompletionStage, it’s necessary to configure it to run in a separate execution context that has been configured with enough threads to deal with the expected concurrency. See Understanding Play thread pools for more information.
It can also be helpful to use Actors for blocking operations. Actors provide a clean model for handling timeouts and failures, setting up blocking execution contexts, and managing any state that may be associated with the service. Also Actors provide patterns like ScatterGatherFirstCompletedRouter to address simultaneous cache and database requests and allow remote execution on a cluster of backend servers. But an Actor may be overkill depending on what you need.
Async results
We have been returning Result up until now. To send an asynchronous result our action needs to return a CompletionStage<Result>:
public CompletionStage<Result> index() {
return CompletableFuture.supplyAsync(() -> intensiveComputation())
.thenApply(i -> ok("Got result: " + i));
}
Actions are asynchronous by default
Play actions are asynchronous by default. For instance, in the controller code below, the returned Result is internally enclosed in a promise:
public Result index() {
return ok("Got request " + request() + "!");
}
Note: Whether the action code returns a Result or a CompletionStage<Result>, both kinds of returned object are handled internally in the same way. There is a single kind of Action, which is asynchronous, and not two kinds (a synchronous one and an asynchronous one). Returning a CompletionStage is a technique for writing non-blocking code.
Some info on CompletionStage
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletionStage.html
which is a subclass of the class mentioned in #Debosmit Ray's answer called CompletableFuture
This Youtube Video by LinkedIn dev Mr. Brikman explains a bit about Promises in
https://youtu.be/8z3h4Uv9YbE?t=15m46s
and
https://www.youtube.com/watch?v=4b1XLka0UIw
I believe the first video gives an example of a promise, the second video might also give some good info, I don't really recall which video had what content.
Either way the information here is very good, and worth looking into.
I personally do not use Play, but I have been looking at it for a long, long time, as it does a lot of really good stuff.

If you want to do Promise even before Java7, "java-promise" may be useful. (Of course it works with Java8)
You can easily control asynchronous operations like JavaScript's Promise.
https://github.com/riversun/java-promise
example
import org.riversun.promise.Promise;
public class Example {
public static void main(String[] args) {
Promise.resolve("foo")
.then(new Promise((action, data) -> {
new Thread(() -> {
String newData = data + "bar";
action.resolve(newData);
}).start();
}))
.then(new Promise((action, data) -> {
System.out.println(data);
action.resolve();
}))
.start();
System.out.println("Promise in Java");
}
}
result:
Promise in Java
foobar

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

RxJava Multithreading with Realm - Realm access from incorrect thread - java

Related

ThreadLocals on GraphQL-Java

How can I run a concurrent queue of tasks using Rx?

Wrapping blocking I/O in project reactor

Deeper understanding about how Realm works?

Concept of promises in Java

Categories

Resources