Let's say that we have the following entities: Project and Release, in a one-to-many relationship.
When we consume an event from an SQS queue (the event carries a release id), there are scenarios where we have to create thousands of releases in our DB, and for each release we have to make a REST call to a third-party service to fetch its details.
That means we might have to make thousands of calls, in some cases more than 20k, just to retrieve the information for the different releases and store it in the DB.
Obviously this doesn't scale, so I'm not sure what the way to go is in this scenario.
I know I might use a CompletableFuture, but I'm not sure how to use it with Spring.
The HTTP client I am using is WebClient.
Any ideas?
You can make the save queries in a method transactional by adding the @Transactional annotation above the method signature. The method must also be public, or else this annotation is ignored.
As for using CompletableFuture in Spring: you can make an HTTP-call method asynchronous by adding the @Async annotation above its signature and letting it return a CompletableFuture. You should return a completed future holding the response value from the HTTP call; you can easily create one with CompletableFuture.completedFuture(yourValue). Spring only returns the completed future once the asynchronous method has finished executing its code block. For @Async to work you must also add the @EnableAsync annotation to one of your configuration classes. On top of that, the @Async-annotated method must be public and cannot be called by a method from within the same class. If the method is private or is called from within the same class, the @Async annotation is ignored and the method executes on the same thread as its caller.
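A minimal sketch of that approach, assuming a ReleaseInfo type and a configured WebClient as in the question (the URI path is made up):
@Service
public class ReleaseClient {

    private final WebClient webClient;

    public ReleaseClient(WebClient webClient) {
        this.webClient = webClient;
    }

    @Async // requires @EnableAsync on a configuration class
    public CompletableFuture<ReleaseInfo> getReleaseInfoAsync(Long releaseId) {
        // Blocking here only mirrors the completed-future advice above; the call
        // runs on one of the async executor's threads, not the caller's thread.
        ReleaseInfo info = webClient.get()
                .uri("/releases/{id}", releaseId)
                .retrieve()
                .bodyToMono(ReleaseInfo.class)
                .block();
        return CompletableFuture.completedFuture(info);
    }
}

@Configuration
@EnableAsync
class AsyncConfig { }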
Next to an @Async-annotated method, you could also use a parallelStream to execute all 20k HTTP calls in parallel. For example:
List<Long> releaseIds = new ArrayList<>();
Map<Long, ReleaseInfo> releaseInfo = releaseIds.parallelStream()
        .map(releaseId -> new AbstractMap.SimpleEntry<>(releaseId, webClient.getReleaseInfo(releaseId)))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
Lastly, you could also use a ThreadPoolExecutor to execute the HTTP calls in parallel. An example:
List<Long> releaseIds = new ArrayList<>();
// Pool size equal to the number of available CPU processors on the machine.
ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
// Submit tasks to the executor.
List<Future<ReleaseInfo>> releaseInfoFutures = releaseIds.stream()
        .map(releaseId -> executor.submit(() -> webClient.getReleaseInfo(releaseId)))
        .collect(Collectors.toList());
// Wait for all futures to complete and collect all non-null values into a ReleaseInfo list.
List<ReleaseInfo> releaseInfo = releaseInfoFutures.stream()
        .map(this::getValueAfterFutureCompletion)
        .filter(Objects::nonNull)
        .collect(Collectors.toList());
private ReleaseInfo getValueAfterFutureCompletion(Future<ReleaseInfo> future) {
    try {
        return future.get();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt flag
        return null;
    } catch (ExecutionException e) {
        e.printStackTrace();
        return null;
    }
}
Make sure to shut the executor down once you're done with it, so its threads don't linger; prefer shutdown() (which lets already-submitted tasks finish) over shutdownNow() (which interrupts running tasks).
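A sketch of that shutdown pattern, so the pool is released even if one of the calls above throws:
ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
try {
    // submit the tasks and collect the results as shown above
} finally {
    executor.shutdown(); // stop accepting new tasks; already-submitted tasks still finish
}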
I'm exposing a legacy web app on GraphQL, but this web app uses ThreadLocals (amongst others, Apache Shiro).
Since graphql-java seems to be using the fork-join pool for concurrency, I worry about how far I need to go to ensure that my ThreadLocals still work, and work safely.
Reading the documentation and the source, it seems a large part of the concurrency is achieved by DataFetchers that return CompletableFutures. I can't tell for sure whether that's the only source of concurrency (I think not), nor whether the DataFetchers themselves are invoked from the fork-join pool.
So would it be safe to wrap my DataFetchers in a delegate that sets and clears the ThreadLocals? Or does that still risk being preempted and continued on another thread in the fork-join pool? Something like:
static class WrappedDataFetcher implements DataFetcher<Object> {

    private DataFetcher<?> realDataFetcher;

    WrappedDataFetcher(DataFetcher<?> realDataFetcher) {
        this.realDataFetcher = realDataFetcher;
    }

    @Override
    public Object get(DataFetchingEnvironment dataFetchingEnvironment) throws Exception {
        try {
            setThreadLocalsFromRequestOrContext(dataFetchingEnvironment);
            return realDataFetcher.get(dataFetchingEnvironment);
        } finally {
            clearThreadLocals();
        }
    }
}
Or would I need to explicitly run my DataFetchers in a thread pool, like:
static class WrappedDataFetcherThreadPool implements DataFetcher<Object> {

    private DataFetcher<?> wrappedDataFetcher;
    private ThreadPoolExecutor executor;

    WrappedDataFetcherThreadPool(DataFetcher<?> realDataFetcher, ThreadPoolExecutor executor) {
        // Wrap in the wrapper from the previous example to ensure the ThreadLocals are set on the executor thread.
        this.wrappedDataFetcher = new WrappedDataFetcher(realDataFetcher);
        this.executor = executor;
    }

    @Override
    public Object get(DataFetchingEnvironment dataFetchingEnvironment) throws Exception {
        Future<?> future = executor.submit(() -> wrappedDataFetcher.get(dataFetchingEnvironment));
        return future.get(); // for simplicity / clarity of the question
    }
}
I think the second one solves my problem, but it feels like overkill and I worry about performance. The first, I think, risks preemption.
If there is a better way to handle this, I would love to hear it as well.
Note: this is not about the async nature of GraphQL (I hope to leverage that as well), but about the possible side effect of running multiple requests WITH ThreadLocals that might get mixed up between requests due to the fork-join pool.
As far as I know, graphql-java does not use its own thread pool and relies on the application for that. It achieves this using future callbacks. Say this is the current state of the application:
Thread T_1 with thread-local storage TLS_1 is executing data fetcher DF_1.
The graphql-java engine attaches a synchronous callback to the future returned by DF_1. If a future is not returned, it wraps the result in a completed future and then attaches the synchronous callback. Since the callback is synchronous, whichever thread completes the future runs the callback. If any thread other than T_1 completes the future, TLS_1 is lost (unless it is copied over to the executing thread). One example of this is a non-blocking HTTP I/O library that uses an I/O thread to complete the response future.
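The usual workaround is to capture the thread-local value on T_1 and restore it inside the callback, on whichever thread ends up running it. A hypothetical sketch (USER and process(..) are made-up names, not graphql-java API):
static final ThreadLocal<String> USER = new ThreadLocal<>();

static CompletableFuture<String> fetchWithContext(Supplier<CompletableFuture<String>> call) {
    String captured = USER.get(); // read on T_1, before handing off
    return call.get().thenApply(result -> {
        USER.set(captured); // restore on whichever thread runs the callback
        try {
            return process(result);
        } finally {
            USER.remove(); // don't leak the value into a pooled thread
        }
    });
}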
Here is a link where the authors have commented more on the thread behavior in the graphql-java library:
https://spectrum.chat/graphql-java/general/how-to-supply-custom-executor-service-for-data-fetchers-to-run-on~29caa730-9114-4883-ab4a-e9700f225f93
Idea
I have a processing method which takes in a list of items and processes them asynchronously using an external web service. The processing steps also persist data. At the end of the whole process, I want to persist the whole process along with each processed result.
Problem
I convert each item in the list into a CompletableFuture, run a processing task on each, and collect them back into an array of futures. Then, using CompletableFuture.allOf (in the sequence method below), I get another CompletableFuture that completes when all the submitted tasks have completed and holds the combined result.
When I want to get that result, I call .whenComplete(..) and want to set the returned result on my entity as data and then persist it to the database. However, the repository save call just does nothing, the threads keep running, and execution never gets past the save call.
@Transactional
public void process(List<Item> items) {
    List<Item> savedItems = itemRepository.save(items);
    final Process process = createNewProcess();
    final List<CompletableFuture<ProcessData>> futures = savedItems.stream()
            .map(item -> CompletableFuture.supplyAsync(() -> doProcess(item, process), executor))
            .collect(Collectors.toList());
    sequence(futures).whenComplete((data, throwable) -> {
        process.setData(data);
        processRepository.save(process); // <-- transaction lost?
        log.debug("Process DONE"); // <-- never reached
    });
}
Sequence method
private static <T> CompletableFuture<List<T>> sequence(List<CompletableFuture<T>> futures) {
    CompletableFuture<Void> allDoneFuture =
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[futures.size()]));
    return allDoneFuture.thenApply(v ->
            futures.stream().map(CompletableFuture::join).collect(Collectors.toList())
    );
}
What is happening? Why does the persist call not go through? Is the thread that started the transaction unable to commit it, or where does it get lost? All the processed data comes back fine. I've tried different transaction strategies, but how can I control which thread finishes the transaction, if that's the issue?
Any advice?
The reason for your problem is, as said above, that the transaction ends when the return of method process(..) is reached.
What you can do is create the transaction manually; that gives you full control over when it starts and ends:
Remove @Transactional.
Autowire the PlatformTransactionManager, then in process(..):
TransactionDefinition txDef = new DefaultTransactionDefinition();
TransactionStatus txStatus = transactionManager.getTransaction(txDef);
try {
    // do your stuff here, committing only once the async work completes, e.g.:
    doWhateverAsync().then(() -> transactionManager.commit(txStatus));
} catch (Exception e) {
    transactionManager.rollback(txStatus);
    throw e;
}
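If you'd rather not manage the status object yourself, Spring's TransactionTemplate wraps the same bookkeeping; a sketch, assuming the same autowired transaction manager:
TransactionTemplate txTemplate = new TransactionTemplate(transactionManager);
txTemplate.execute(status -> {
    // do your stuff here; commits on normal return, rolls back on a thrown exception
    return null;
});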
In the case of a Spring Boot application, you need the following configuration:
The main application class should be annotated with @EnableAsync.
The @Async annotation should go on top of the method carrying the @Transactional annotation; this indicates that processing will take place in a child thread (see the sketch below).
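A sketch of that setup (class and method names are illustrative):
@SpringBootApplication
@EnableAsync
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}

@Service
class ProcessingService {

    @Async
    @Transactional
    public void processInChildThread(List<Item> items) {
        // runs on a task-executor thread, inside its own transaction
    }
}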
I've found a lot of examples of this and don't know what the 'right' implementation is here.
Basically I've got an object (let's call it NBAManager) with a method public Completable generateGame(). The idea is that generateGame() gets called a lot of times and I want to generate games sequentially, so I was thinking about a concurrent queue. I came up with the following design: I'd create a singleton instance of NBAService, a service for NBAManager, and the body of generateGame() would look like this:
public Completable generateGame(RequestInfo info) {
    return service.generateGame(info);
}
So basically I'll pass that Completable result up. Inside the NBAService object I'll have a queue of requests (a concurrent one, because I want to be able to poll() and add(request) when generateGame() is called while NBAManager is still processing an earlier request). I got stuck on this:
What's the right way to write such a job queue in an Rx way? There are so many examples of it. Could you send me a link to a good implementation?
How do I handle the queue-execution logic? I believe we should execute immediately if there's only one job, and if there are many we just add them to the queue. How can I control that without a Runnable? I was thinking about using subjects.
Thanks!
There are multiple ways to implement this, depending on how involved RxJava should be. The least involved uses a single-threaded ExecutorService as the "queue" and a CompletableSubject for the deferred completion:
class NBAService {

    static ExecutorService exec = Executors.newSingleThreadExecutor();

    public static Completable generateGame(RequestInfo info) {
        CompletableSubject result = CompletableSubject.create();
        exec.submit(() -> {
            // do something with the RequestInfo instance
            f(info).subscribe(result);
        });
        return result;
    }
}
A more involved solution triggers the execution only when the Completable is subscribed to. In this case, you can go with create() and subscribeOn():
class NBAService {

    public static Completable generateGame(RequestInfo info) {
        return Completable.create(emitter -> {
            // do something with the RequestInfo instance
            emitter.setDisposable(
                    f(info).subscribe(emitter::onComplete, emitter::onError)
            );
        })
        .subscribeOn(Schedulers.single());
    }
}
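Hypothetical usage of either variant (the request objects are made-up names): because all work funnels through the single scheduler, or the single-threaded executor in the first variant, the second game is only generated after the first completes.
NBAService.generateGame(firstRequest)
        .subscribe(() -> System.out.println("first game done"));
NBAService.generateGame(secondRequest)
        .subscribe(() -> System.out.println("second game done"));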
I have a spring-webflux API which, at the service layer, needs to read from an existing repository that uses JDBC.
Having done some reading on the subject, I would like to keep the execution of the blocking database call separate from the rest of my non-blocking async code.
I have defined a dedicated jdbcScheduler:
@Bean
public Scheduler jdbcScheduler() {
    return Schedulers.fromExecutor(Executors.newFixedThreadPool(maxPoolSize));
}
And an AsyncWrapper utility to use it:
@Component
public class AsyncJdbcWrapper {

    private final Scheduler jdbcScheduler;

    @Autowired
    public AsyncJdbcWrapper(Scheduler jdbcScheduler) {
        this.jdbcScheduler = jdbcScheduler;
    }

    public <T> Mono<T> async(Callable<T> callable) {
        return Mono.fromCallable(callable)
                .subscribeOn(jdbcScheduler)
                .publishOn(Schedulers.parallel());
    }
}
Which is then used to wrap jdbc calls like so:
Mono<Integer> userIdMono = asyncWrapper.async(() -> userDao.getUserByUUID(request.getUserId()))
        .map(userOption -> userOption.map(u -> u.getId())
                .orElseThrow(() -> new IllegalArgumentException("Unable to find user with ID " + request.getUserId())));
I've got two questions:
1) Am I correctly pushing the execution of blocking calls onto another set of threads? Being fairly new to this stuff, I'm struggling with the intricacies of subscribeOn()/publishOn().
2) Say I want to make use of the resulting Mono, e.g. call an API with the result of the userIdMono; on which scheduler will that be executed? The one specifically created for the JDBC calls, or the main(?) thread that Reactor usually operates within? e.g.
userIdMono.map(id -> someApiClient.call(id));
1) Your use of subscribeOn correctly puts the JDBC work on the jdbcScheduler.
2) Neither: the result of the Callable, while computed on the jdbcScheduler, is published on the parallel Scheduler, so your map will be executed on a thread from the Schedulers.parallel() pool (rather than hogging the jdbcScheduler).
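A quick way to convince yourself is to log the thread names; a sketch using the wrapper and DAO from the question:
asyncWrapper.async(() -> {
            System.out.println("JDBC call on " + Thread.currentThread().getName()); // a jdbcScheduler thread
            return userDao.getUserByUUID(request.getUserId());
        })
        .map(user -> {
            System.out.println("map on " + Thread.currentThread().getName()); // a parallel-scheduler thread
            return user;
        })
        .subscribe();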
I'm using DeferredResult in my Spring MVC application to handle some server-side processing of a potentially long-running action. It might be very fast, or it could take a second or two.
But in either case, the incoming HTTP request causes an action to be pushed to a queue, which a separate thread (via an ExecutorService) is responsible for consuming. A callback is then called, notifying the pusher that the operation has completed.
I refactored some of this behavior into a utility method:
public static DeferredResult<String> toResponse(GameManager gameManager, final Player player, Action action) {
    DeferredResult<String> deferredResult = new DeferredResult<>();
    gameManager.execute(action, new Handler<Result>() {
        @Override
        public void handle(Result result) {
            JSONObject obj;
            try {
                obj = gameManager.getGameJSON(player);
                obj.put("success", result.getResult());
                obj.put("message", result.getMessage());
                deferredResult.setResult(obj.toString()); // POINT B
            } catch (JSONException e) {
                deferredResult.setErrorResult(e);
            }
        }
    });
    return deferredResult; // POINT A
}
But I'm wondering what happens if the execution of the action happens so quickly that the setResult() method is called (POINT B) on the DeferredResult before it has been returned (POINT A) to the calling method.
Will Spring see the returned DeferredResult already has a value and handle it, or does it only begin "watching" for the setter to be called after the instance has been provided?
I've not used Spring, but I would say that DeferredResult<> would be a pretty poor implementation of a deferred if settlement timing made any difference to the downstream behaviour.
It seems safe to assume the behaviour is identical regardless of the asynchronous process's timing (milliseconds, seconds, or whatever), with the only proviso that a timeout hasn't occurred, in which case the onTimeout handler would run (if set). Even if the DeferredResult is settled synchronously, in the same code block that created it, the caller should act on the outcome as expected.
If this assumption is not valid, then DeferredResult<> is not fit for purpose and shouldn't be used.
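One way to test the assumption yourself (a sketch; the mapping and payload are made up): set the result before returning and check that the request still completes normally.
@GetMapping("/instant")
public DeferredResult<String> instant() {
    DeferredResult<String> result = new DeferredResult<>();
    result.setResult("done"); // POINT B happens before POINT A here
    return result; // POINT A
}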