I'm exposing a legacy web app on GraphQL, but this web app uses Threadlocals (amongst other Apache-Shiro).
Since GraphQL-java seems to be using the fork-join pool for concurrency I worry about how far I need to go to ensure that my ThreadLocals still work and work safely.
Reading the documentation and the source it seems a large part of the concurrency is achieved by DataFetchers that return CompletableFuture's I can't tell for sure if that's the only source of concurrency (i think not) and whether the DataFetchers themselves are invoked from the fork-join pool
So would it be Safe to wrap my DataFetcher's in a delegate that set and clears the ThreadLocals? or does that still have the risk of being preempted and continued on another thread in the fork-join pool something like:
static class WrappedDataFetcher implements DataFetcher<Object> {
private DataFetcher<?> realDataFetcher;
WrappedDataFetcher(DataFetcher<?> realDataFetcher) {
this.realDataFetcher = realDataFetcher;
}
#Override
public Object get(DataFetchingEnvironment dataFetchingEnvironment) throws Exception {
try {
setThreadLocalsFromRequestOrContext(dataFetchingEnvironment);
return realDataFetcher.get(dataFetchingEnvironment);
} finally {
clearTreadLocals();
}
}
}
Or would I need to explicitly run my DataFetchers in a Threadpool like:
static class WrappedDataFetcherThreadPool implements DataFetcher<Object> {
private DataFetcher<?> wrappedDataFetcher;
private ThreadPoolExecutor executor;
WrappedDataFetcherThreadPool(DataFetcher<?> realDataFetcher, ThreadPoolExecutor executor) {
// Wrap in Wrapper from previous example to ensure threadlocals in the executor
this.wrappedDataFetcher = new WrappedDataFetcher(realDataFetcher);
this.executor = executor;
}
#Override
public Object get(DataFetchingEnvironment dataFetchingEnvironment) throws Exception {
Future<?> future = executor.submit(() -> wrappedDataFetcher.get(dataFetchingEnvironment));
return future.get(); //for simplicity / clarity of the question
}
}
I think the second one solves my problem but it feels like overkill and I worry about performance. But I think the first risks preemption.
If there is a better way to handle this I would love to hear it as well.
Note: this is not about the async nature of GraphQL (I hope to leverage that as well) but about the possible side effect of running multiple requests WITH treadLocals that might get mixed up between requests due to the fork-join pool
As far as I know graphql-java does not use its own thread pool and relies on the application for it. The way it achieves it using future callbacks. Say this is the current state of the application.
Thread T_1 with thread local storage TLS_1 executing data fetcher DF_1.
Graphql-java engine attaches a synchronous callback to the future returned by DF_1. If a future is not returned it wraps the result in a completed future and then attaches the synchronous callback. Since the callback is synchronous the thread that completes the future runs the callback. If any other thread apart from T_1 completes the future, TLS_1 is lost(unless it's copied over to the executing thread). One example of this is a non blocking HTTP I/O library which uses an I/O thread to complete the response future.
Here is a link where the authors have commented more on the thread behavior in graphql-java library
https://spectrum.chat/graphql-java/general/how-to-supply-custom-executor-service-for-data-fetchers-to-run-on~29caa730-9114-4883-ab4a-e9700f225f93
Related
I'm looking at the Simple RPC example from grpc.io's basic tutorial:
#Override
public void getFeature(Point request, StreamObserver<Feature> responseObserver) {
responseObserver.onNext(checkFeature(request));
responseObserver.onCompleted();
}
...
private Feature checkFeature(Point location) {
for (Feature feature : features) {
if (feature.getLocation().getLatitude() == location.getLatitude()
&& feature.getLocation().getLongitude() == location.getLongitude()) {
return feature;
}
}
// No feature was found, return an unnamed feature.
return Feature.newBuilder().setName("").setLocation(location).build();
}
Are there any caveats to interacting with the StreamObserver from other threads? For example, say checkFeature() asynchronously hits another service, returning a CompletableFuture:
#Override
public void getFeature(Point request, StreamObserver<Feature> responseObserver) {
checkFeature(request).
thenAccept(feature -> responseObserver.onNext(feature));
responseObserver.onCompleted();
}
Of course the above wouldn't work because the first thread would execute onCompleted() before the feature is returned. So let's fix that:
#Override
public void getFeature(Point request, StreamObserver<Feature> responseObserver) {
checkFeature(request).
thenAccept(feature -> {
responseObserver.onNext(feature);
responseObserver.onCompleted();
});
}
I think this should work, but I'm new to Java so I wonder what ramifications there are. For example,
Will Context.current() be consistent?
Will anything cause the StreamObserver to destruct or close prematurely besides onNext() for a unary calls and onError()?
Is there a better practice?
It would be great if someone could also step me through how they reasoned. I tried looking up actual implementations of StreamObserver but I wasn't sure what to look for.
Using thenAccept() to call onNext() and onCompleted() is fine, because the observer is not called concurrently from multiple threads.
The "broken" example that called onCompleted() separately was broken also because it could have called the observer from multiple threads without any form of synchronization. StreamObservers may not be called from multiple threads simultaneously.
Using thenAccept() isn't quite right though, as it doesn't handle the case where the future fails. So you need to receive the Throwable as well, which can be done with whenComplete():
#Override
public void getFeature(Point request, StreamObserver<Feature> responseObserver) {
checkFeature(request).
whenComplete((feature, t) -> {
if (t != null) {
responseObserver.onError(t);
} else {
responseObserver.onNext(feature);
responseObserver.onCompleted();
}
});
}
The Context could easily be "wrong" when processing that lambda. Typically we'd look for "architectural" solutions to make sure the context is propagated, like wrapping all application thread pools in Context.currentContextExecutor() when creating them, so individual call sites don't need to be concerned with propagation. I'm not familiar enough with CompletableFuture to provide strategy for it.
Will Context.current() be consistent?
Context.current() is using ThreadLocal. if you are accessing it on a different thread, it won't be consistent. You can propagate context between threads. You may find this post useful.
Will anything cause the StreamObserver to destruct or close prematurely besides onNext() for a unary calls and onError()?
Yes, Normal flow of StreamObserver ends with onError or onCompleted.
As StreamObserver javadoc states, "Since individual StreamObservers are not thread-safe, if multiple threads will be writing to a StreamObserver concurrently, the application must synchronize calls". If you are calling StreamObserver concurrently, you need to synchronize the calls. In other words, if you know for sure it won't be called concurrently even if you are using multiple threads, it should be fine.
If accessing the same StreamObserver on multiple threads, I would try to synchronize it unless the performance is critical since it is error prone. At least, it deserves a nice comment.
I have a spring-webflux API which, at a service layer, needs to read from an existing repository which uses JDBC.
Having done some reading on the subject, I would like to keep the execution of the blocking database call separate from the rest of my non-blocking async code.
I have defined a dedicated jdbcScheduler:
#Bean
public Scheduler jdbcScheduler() {
return Schedulers.fromExecutor(Executors.newFixedThreadPool(maxPoolSize));
}
And an AsyncWrapper utility to use it:
#Component
public class AsyncJdbcWrapper {
private final Scheduler jdbcScheduler;
#Autowired
public AsyncJdbcWrapper(Scheduler jdbcScheduler) {
this.jdbcScheduler = jdbcScheduler;
}
public <T> Mono<T> async(Callable<T> callable) {
return Mono.fromCallable(callable)
.subscribeOn(jdbcScheduler)
.publishOn(Schedulers.parallel());
}
}
Which is then used to wrap jdbc calls like so:
Mono<Integer> userIdMono = asyncWrapper.async(() -> userDao.getUserByUUID(request.getUserId()))
.map(userOption -> userOption.map(u -> u.getId())
.orElseThrow(() -> new IllegalArgumentException("Unable to find user with ID " + request.getUserId())));
I've got two questions:
1) Am I correctly pushing the execution of blocking calls to another set of threads? Being fairly new to this stuff I'm struggling with the intricacies of subscribeOn()/publishOn().
2) Say I want to make use of the resulting mono, e.g call an API with the result of the userIdMono, on which scheduler will that be executed? The one specifically created for the jdbc calls, or the main(?) thread that reactor usually operates within? e.g.
userIdMono.map(id -> someApiClient.call(id));
1) Use of subscribeOn is correctly putting the JDBC work on the jdbcScheduler
2) Neither, the results of the Callable - while computed on the jdbcScheduler, are publishOn the parallel Scheduler, so your map will be executed on a thread from the Schedulers.parallel() pool (rather than hogging the jdbcScheduler).
Is there a concept of using promises in java (just like ut is used in JavaScript) instead of using nested callbacks ?
If so, is there an example of how the callback is implemented in java and handlers are chained ?
Yep! Java 8 calls it CompletableFuture. It lets you implement stuff like this.
class MyCompletableFuture<T> extends CompletableFuture<T> {
static final Executor myExecutor = ...;
public MyCompletableFuture() { }
public <U> CompletableFuture<U> newIncompleteFuture() {
return new MyCompletableFuture<U>();
}
public Executor defaultExecutor() {
return myExecutor;
}
public void obtrudeValue(T value) {
throw new UnsupportedOperationException();
}
public void obtrudeException(Throwable ex) {
throw new UnsupportedOperationException();
}
}
The basic design is a semi-fluent API in which you can arrange:
(sequential or async)
(functions or actions)
triggered on completion of
i) ("then") ,or ii) ("andThen" and "orThen")
others. As in:
MyCompletableFuture<String> f = ...; g = ...
f.then((s -> aStringFunction(s)).thenAsync(s -> ...);
or
f.andThen(g, (s, t) -> combineStrings).or(CompletableFuture.async(()->...)....
UPDATE 7/20/17
I wanted to edit that there is also a Library called "ReactFX" which is supposed to be JavaFX as a reactive framework. There are many Reactive Java libraries from what I've seen, and since Play is based on the Reactive principal, I would assume that these Reactive libraries follow that same principal of non-blocking i/o, async calls from server to client and back while having communication be send by either end.
These libraries seem to be made for the client side of things, but there might be a Server reactive library as well, but I would assume that it would be wiser to use Play! with one of these client side reactive libraries.
You can take a look at https://www.playframework.com/
which implements this functionality here
https://www.playframework.com/documentation/2.2.0/api/java/play/libs/F.Promise.html
Additonal reading https://www.playframework.com/documentation/2.5.x/JavaAsync
Creating non-blocking actions
Because of the way Play works, action code must be as fast as possible, i.e., non-blocking. So what should we return from our action if we are not yet able to compute the result? We should return the promise of a result!
Java 8 provides a generic promise API called CompletionStage. A CompletionStage<Result> will eventually be redeemed with a value of type Result. By using a CompletionStage<Result> instead of a normal Result, we are able to return from our action quickly without blocking anything. Play will then serve the result as soon as the promise is redeemed.
The web client will be blocked while waiting for the response, but nothing will be blocked on the server, and server resources can be used to serve other clients.
How to create a CompletionStage
To create a CompletionStage<Result> we need another promise first: the promise that will give us the actual value we need to compute the result:
CompletionStage<Double> promiseOfPIValue = computePIAsynchronously();
CompletionStage<Result> promiseOfResult = promiseOfPIValue.thenApply(pi ->
ok("PI value computed: " + pi)
);
Play asynchronous API methods give you a CompletionStage. This is the case when you are calling an external web service using the play.libs.WS API, or if you are using Akka to schedule asynchronous tasks or to communicate with Actors using play.libs.Akka.
A simple way to execute a block of code asynchronously and to get a CompletionStage is to use the CompletableFuture.supplyAsync() helper:
CompletionStage<Integer> promiseOfInt = CompletableFuture.supplyAsync(() -> intensiveComputation());
Note: It’s important to understand which thread code runs on which promises. Here, the intensive computation will just be run on another thread.
You can’t magically turn synchronous IO into asynchronous by wrapping it in a CompletionStage. If you can’t change the application’s architecture to avoid blocking operations, at some point that operation will have to be executed, and that thread is going to block. So in addition to enclosing the operation in a CompletionStage, it’s necessary to configure it to run in a separate execution context that has been configured with enough threads to deal with the expected concurrency. See Understanding Play thread pools for more information.
It can also be helpful to use Actors for blocking operations. Actors provide a clean model for handling timeouts and failures, setting up blocking execution contexts, and managing any state that may be associated with the service. Also Actors provide patterns like ScatterGatherFirstCompletedRouter to address simultaneous cache and database requests and allow remote execution on a cluster of backend servers. But an Actor may be overkill depending on what you need.
Async results
We have been returning Result up until now. To send an asynchronous result our action needs to return a CompletionStage<Result>:
public CompletionStage<Result> index() {
return CompletableFuture.supplyAsync(() -> intensiveComputation())
.thenApply(i -> ok("Got result: " + i));
}
Actions are asynchronous by default
Play actions are asynchronous by default. For instance, in the controller code below, the returned Result is internally enclosed in a promise:
public Result index() {
return ok("Got request " + request() + "!");
}
Note: Whether the action code returns a Result or a CompletionStage<Result>, both kinds of returned object are handled internally in the same way. There is a single kind of Action, which is asynchronous, and not two kinds (a synchronous one and an asynchronous one). Returning a CompletionStage is a technique for writing non-blocking code.
Some info on CompletionStage
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletionStage.html
which is a subclass of the class mentioned in #Debosmit Ray's answer called CompletableFuture
This Youtube Video by LinkedIn dev Mr. Brikman explains a bit about Promises in
https://youtu.be/8z3h4Uv9YbE?t=15m46s
and
https://www.youtube.com/watch?v=4b1XLka0UIw
I believe the first video gives an example of a promise, the second video might also give some good info, I don't really recall which video had what content.
Either way the information here is very good, and worth looking into.
I personally do not use Play, but I have been looking at it for a long, long time, as it does a lot of really good stuff.
If you want to do Promise even before Java7, "java-promise" may be useful. (Of course it works with Java8)
You can easily control asynchronous operations like JavaScript's Promise.
https://github.com/riversun/java-promise
example
import org.riversun.promise.Promise;
public class Example {
public static void main(String[] args) {
Promise.resolve("foo")
.then(new Promise((action, data) -> {
new Thread(() -> {
String newData = data + "bar";
action.resolve(newData);
}).start();
}))
.then(new Promise((action, data) -> {
System.out.println(data);
action.resolve();
}))
.start();
System.out.println("Promise in Java");
}
}
result:
Promise in Java
foobar
I am using Apache TomEE 1.5.2 JAX-RS, pretty much out of the box, with the predefined HSQLDB.
The following is simplified code. I have a REST-style interface for receiving signals:
#Stateless
#Path("signal")
public class SignalEndpoint {
#Inject
private SignalStore store;
#POST
public void post() {
store.createSignal();
}
}
Receiving a signal triggers a lot of stuff. The store will create an entity, then fire an asynchronous event.
public class SignalStore {
#PersistenceContext
private EntityManager em;
#EJB
private EventDispatcher dispatcher;
#Inject
private Event<SignalEntity> created;
public void createSignal() {
SignalEntity entity = new SignalEntity();
em.persist(entity);
dispatcher.fire(created, entity);
}
}
The dispatcher is very simple, and merely exists to make the event handling asynchronous.
#Stateless
public class EventDispatcher {
#Asynchronous
public <T> void fire(Event<T> event, T parameter) {
event.fire(parameter);
}
}
Receiving the event is something else, which derives data from the signal, stores it, and fires another asynchronous event:
#Stateless
public class DerivedDataCreator {
#PersistenceContext
private EntityManager em;
#EJB
private EventDispatcher dispatcher;
#Inject
private Event<DerivedDataEntity> created;
#Asynchronous
public void onSignalEntityCreated(#Observes SignalEntity signalEntity) {
DerivedDataEntity entity = new DerivedDataEntity(signalEntity);
em.persist(entity);
dispatcher.fire(created, entity);
}
}
Reacting to that is even a third layer of entity creation.
To summarize, I have a REST call, which synchronously creates a SignalEntity, which asynchronously triggers the creation of a DerivedDataEntity, which asynchronously triggers the creation of a third type of entity. It all works perfectly, and the storage processes are beautifully decoupled.
Except for when I programmatically trigger a lot (f.e. 1000) of signals in a for-loop. Depending on my AsynchronousPool size, after processing signals (quite fast) in the amount of about half of that size, the application completely freezes for up to some minutes. Then it resumes, to process about the same amount of signals, quite fast, before freezing again.
I have been playing around with AsynchronousPool settings for the last half hour. Setting it to 2000, for instance, will easily make all my signals be processed at once, without any freezes. But the system isn't sane either, after that. Triggering another 1000 signals, resulted in them being created allright, but the entire creation of derived data never happened.
Now I am completely at a loss as to what to do. I can of course get rid of all those asynchronous events and implement some sort of queue myself, but I always thought the point of an EE container was to relieve me of such tedium. Asynchronous EJB events should already bring their own queue mechanism. One which should not freeze as soon as the queue is too full.
Any ideas?
UPDATE:
I have now tried it with 1.6.0-SNAPSHOT. It behaves a little bit differently: It still doesn't work, but I do get an exception:
Aug 01, 2013 3:12:31 PM org.apache.openejb.core.transaction.EjbTransactionUtil handleSystemException
SEVERE: EjbTransactionUtil.handleSystemException: fail to allocate internal resource to execute the target task
javax.ejb.EJBException: fail to allocate internal resource to execute the target task
at org.apache.openejb.async.AsynchronousPool.invoke(AsynchronousPool.java:81)
at org.apache.openejb.core.ivm.EjbObjectProxyHandler.businessMethod(EjbObjectProxyHandler.java:240)
at org.apache.openejb.core.ivm.EjbObjectProxyHandler._invoke(EjbObjectProxyHandler.java:86)
at org.apache.openejb.core.ivm.BaseEjbProxyHandler.invoke(BaseEjbProxyHandler.java:303)
at <<... my code ...>>
...
Caused by: java.util.concurrent.RejectedExecutionException: Timeout waiting for executor slot: waited 30 seconds
at org.apache.openejb.util.executor.OfferRejectedExecutionHandler.rejectedExecution(OfferRejectedExecutionHandler.java:55)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:132)
at org.apache.openejb.async.AsynchronousPool.invoke(AsynchronousPool.java:75)
... 38 more
It is as though TomEE would not do ANY queueing of operations. If no thread is free to process in the moment of the call, tough luck. Surely, this cannot be intended..?
UPDATE 2:
Okay, I seem to have stumbled upon a semi-solution: Setting the AsynchronousPool.QueueSize property to maxint solves the freeze. But questions remain: Why is the QueueSize so limited in the first place, and, more worryingly: Why would this block the entire application? If the queue is full, it blocks, but as soon as a task is taken from it, another should pop in, right? The queue appears to be blocked until it is completely empty again.
UPDATE 3:
For anyone who wants to have a go: http://github.com/JanDoerrenhaus/tomeefreezetestcase
UPDATE 4:
As it turns out, increasing the queue size does NOT solve the problem, it merely delays it. The problem remains the same: Too many asynchronous operations at once, and TomEE chockes so bad, that it cannot even undeploy the application on termination anymore.
So far, my diagnosis is that the task cleanup does not work properly. My tasks are all very small and fast (see the test case on github). I was already afraid that it would be OpenJPA or HSQLDB slowing down on too many concurrent calls, but I commented out all em.persist calls, and the problem remained the same. So if my tasks are quite small and fast, but still manage to block out TomEE so bad that it could not get any further task in after 30 seconds (javax.ejb.EJBException: fail to allocate internal resource to execute the target task), I would imagine that completed tasks linger, clogging up the pipe, so to speak.
How could I resolve this issue?
Basically BlockingQueues use locks to ensure the consistency of data and avoid data loss, so in too highly concurrent environment it will reject a lot of tasks (your case).
You can play on trunk with the RejectedExecutionHandler implementation to retry to offer the task. One implementation can be:
new RejectedExecutionHandler() {
#Override
public void rejectedExecution(final Runnable r, final ThreadPoolExecutor executor) {
for (int i = 0; i < 10; i++) {
if (executor.getQueue().offer(r)) {
return;
}
try {
Thread.sleep(50);
} catch (final InterruptedException e) {
// no-op
}
}
throw new RejectedExecutionException();
}
}
It even works better with random sleep (between min and max).
The idea is basically: if the queue is full, wait some short time to reduce the concurrency.
configurable through WEB-INF/application.properties https://issues.apache.org/jira/browse/TOMEE-1012
For some of HTTP requests from clients, there're very complex business logic in server side.
Some of these business logics doesn't require to response to the client immediately, like sending a email to somebody. Can I put those tasks in an asynchronous method,so I only need to ensure that they had been executed,I don't need to wait all tasks complete to respond to the user.
Updated: Some people asked about the framework I am using. I am using Struts2 + Spring.
You can use the following 'fire and forget' pattern:
new Thread(new Runnable(){
public void run(){
System.out.println("I Am Sending Email");
sendEmailFunction();
}
}).start();
But too many such threads will lead to trouble. If you are going to do this, then you should use a ThreadPoolExecutor to ensure that you have some control over thread production. At the very least, place a maximum on the number of threads.
I don't know what framework you're using, but, in basic Java, you can just create a new Thread:
interface MyTaskCallback {
void myTaskCallback();
}
class MyTask implements Runnable {
MyTaskCallback callback;
Thread me;
public MyTask(MyTaskCallback callback) {
this.callback = callback;
this.me = new Thread();
}
public void start() {
this.me = new Thread(this);
this.me.start();
}
public void stop() {
try {
this.me.join(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
public void run() {
// Calls here will not block the other threads
sendEmailRequest();
callback.myTaskCallback();
}
}
class Main implements MyTaskCallback {
public void foo() {
MyTask m = new MyTask(this);
m.start();
}
public void myTaskCallback() {
// called when MyTask completes
}
}
Yes. Read about concurrency.
You can probably set up an asynchronous producer/consumer queue, for example.
there is no "asynchroneous method" in java, but you will either use Threads (possibly through a framework like Quartz: http://www.quartz-scheduler.org/ ) or a message queue like JMS http://java.sun.com/products/jms/
You want to look at the java.util.concurrent.Executors. One way to solve your problem is to have a ScheduledExecutorService which keeps a Queue, and runs every so often. There are many different ways to offload work available in the concurrent utilities however, it depends on your requirements, how expensive the tasks are, how fast they need to be done, etc.
You should respond to all HTTP requests immediately, otherwise the client may think the server is not responding or timeout. However, you could start up other threads or processes to complete tasks in the background before you respond to the client.
You could also continue to send 100 responses until the task was complete.
Yes you can Servlet 3.0 has great asynchronous support.
Watch this its a really great resource, you can watch the entire cast if you are unfamiliar with Servlet 3.0.
A good run down of it here.
The api docs.
Spring has good support for Quartz scheduling as well as Java Threading. This link will give you better idea about it.
Can I put those tasks in an asynchronous method,so I don't need to wait all tasks complete to respond to the user ?
YES