Using Spring @Async and ThreadPoolTaskScheduler with pool-size=1 - java

We have a service implementation in our Spring-based web application that increments some statistics counters in the db. Since we don't want to mess up response time for the user, we defined them as asynchronous using Spring's @Async:
public interface ReportingService {

    @Async
    Future<Void> incrementLoginCounter(Long userid);

    @Async
    Future<Void> incrementReadCounter(Long userid, Long productId);
}
And the spring task configuration like this:
<task:annotation-driven executor="taskExecutor" />
<task:executor id="taskExecutor" pool-size="10" />
Now, with pool-size="10", we have concurrency issues when two threads try to create the same initial record that will contain the counter.
Is it a good idea here to set pool-size="1" to avoid those conflicts? Does this have any side effects? We have quite a few places that fire async operations to update statistics.

The side effects would depend on the speed at which tasks are added to the executor compared to how quickly a single thread can process them. If the number of tasks added per second is greater than the number a single thread can process in a second, you run the risk of the queue growing over time until you finally get an out-of-memory error.
Check out the executor section on the Task Execution page of the Spring reference. It states that having an unbounded queue is not a good idea.
If you know that you can process tasks faster than they will be added, then you are probably safe. If not, you should add a queue capacity and handle the input thread blocking if the queue reaches this size.
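For example, the task namespace lets you bound the queue and choose a rejection policy; a minimal sketch building on the configuration above (the capacity of 100 is an assumption you would size for your own load; CALLER_RUNS makes the submitting thread execute the task itself when the queue is full, which effectively throttles producers):
<task:executor id="taskExecutor" pool-size="10"
    queue-capacity="100" rejection-policy="CALLER_RUNS" />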

Looking at the two examples you posted, instead of a constant stream of @Async calls, consider updating a JVM-local variable upon client requests, and then having a background thread write it to the database every now and then. Along the lines of (mind the semi-pseudo-code):
class DefaultReportingService implements ReportingService {

    private final ConcurrentMap<Long, AtomicLong> numLogins = new ConcurrentHashMap<>();

    public void incrementLoginCounterForUser(Long userId) {
        // computeIfAbsent avoids the race when two threads see the counter missing
        numLogins.computeIfAbsent(userId, id -> new AtomicLong()).incrementAndGet();
    }

    @Scheduled(..)
    void saveLoginCountersToDb() {
        for (Map.Entry<Long, AtomicLong> entry : numLogins.entrySet()) {
            AtomicLong counter = entry.getValue();
            Long toBeSummedWithTheValueInDb = counter.getAndSet(0L);
            // ... add it to the row for entry.getKey()
        }
    }
}

Related

spring batch failure in the end of the day

Is there a solution that allows me to check the JobRepository for the presence of a completed execution of a given job (JobInstance) during the current day? If there is no COMPLETED status in the batch_job_execution table for the current day, I must send a notification or an exit code saying we got nothing today.
I plan to implement the solution in a class that extends JobExecutionListenerSupport, like this:
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {

    private Logger logger = LoggerFactory.getLogger(JobCompletionNotificationListener.class);
    private JobRegistry jobRegistry;
    private JobRepository jobRepository;

    public JobCompletionNotificationListener(JobRegistry jobRegistry, JobRepository jobRepository) {
        this.jobRegistry = jobRegistry;
        this.jobRepository = jobRepository;
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        System.out.println("finishhhhh");
        // the logic if no job completed today
        if (noJobCompletedToday) {
            notifyFailure();
        }
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            logger.info("!!! JOB FINISHED! -> example action execute after Job");
        }
    }
}
You can use JobExplorer#getLastJobExecution to get the last execution for your job instance and check if it's completed during the current day.
Depending on when you are going to do that check, you might also make sure there are no currently running jobs (JobExplorer#findRunningJobExecutions can help).
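A minimal sketch of that check, assuming Spring Batch 4.2+ (where JobExplorer exposes getLastJobInstance/getLastJobExecution and getEndTime() returns a java.util.Date); the job name "dailyJob" and the injected jobExplorer are illustrative assumptions:
JobInstance lastInstance = jobExplorer.getLastJobInstance("dailyJob");
JobExecution lastExecution = lastInstance == null
        ? null : jobExplorer.getLastJobExecution(lastInstance);
boolean completedToday = lastExecution != null
        && lastExecution.getStatus() == BatchStatus.COMPLETED
        && lastExecution.getEndTime() != null
        && lastExecution.getEndTime().toInstant()
                .atZone(ZoneId.systemDefault()).toLocalDate()
                .isEqual(LocalDate.now());
if (!completedToday && jobExplorer.findRunningJobExecutions("dailyJob").isEmpty()) {
    // nothing completed today and nothing still running: send the notification
}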
You can implement monitoring in multiple ways. Since version 4.2, Spring Batch provides support for metrics and monitoring based on Micrometer. There is a Spring Batch sample with Prometheus and Grafana that you can use as a starting point to build a custom dashboard or fire alerts from those tools.
If you have several batch processes this may be the best option; in addition, these tools will help you monitor services, applications, etc.
Built-in metrics:
Duration of job execution
Currently active jobs
Duration of step execution
Duration of item reading
Duration of item processing
Duration of chunk writing
You can create your own custom metrics (e.g. execution failures).
Otherwise, you can implement the monitoring through another independent batch process that executes and sends a notification/mail etc., collecting for example the state of the process from the database, the application, or a filesystem shared between both processes.
You can also implement the check the way you describe it; there is an interesting thread describing how to throw an exception in one step and process it in a following step that sends an alert or not, as appropriate.

Execute blocking JDBC call in Spring Webflux

I am using Spring WebFlux with Spring Data JPA, with PostgreSQL as the backing database.
I don't want to block the main thread while making db calls like find and save.
To achieve this, I have a main scheduler in the Controller class and a jdbcScheduler in the service classes.
The way I have defined them is:
@Configuration
@EnableJpaAuditing
public class CommonConfig {

    @Value("${spring.datasource.hikari.maximum-pool-size}")
    int connectionPoolSize;

    @Bean
    public Scheduler scheduler() {
        return Schedulers.parallel();
    }

    @Bean
    public Scheduler jdbcScheduler() {
        return Schedulers.fromExecutor(Executors.newFixedThreadPool(connectionPoolSize));
    }

    @Bean
    public TransactionTemplate transactionTemplate(PlatformTransactionManager transactionManager) {
        return new TransactionTemplate(transactionManager);
    }
}
Now, while doing a get/save call in my service layer I do:
@Override
public Mono<Config> getConfigByKey(String key) {
    return Mono.defer(
            () -> Mono.justOrEmpty(configRepository.findByKey(key)))
        .subscribeOn(jdbcScheduler)
        .publishOn(scheduler);
}

@Override
public Flux<Config> getAllConfigsAfterAppVersion(int appVersion) {
    return Flux
        .fromIterable(configRepository.findAllByMinAppVersionIsGreaterThanEqual(appVersion))
        .subscribeOn(jdbcScheduler)
        .publishOn(scheduler);
}

@Override
public Flux<Config> addConfigs(List<Config> configList) {
    return Flux.fromIterable(configRepository.saveAll(configList))
        .subscribeOn(jdbcScheduler)
        .publishOn(scheduler);
}
And in controller, I do:
@PostMapping
@ResponseStatus(HttpStatus.CREATED)
Mono<ResponseDto<List<Config>>> addConfigs(@Valid @RequestBody List<Config> configs) {
    return configService.addConfigs(configs).collectList()
        .map(configList -> new ResponseDto<>(HttpStatus.CREATED.value(), configList, null))
        .subscribeOn(scheduler);
}
Is this correct? and/or there is a way better way to do it?
What I understand by:
.subscribeOn(jdbcScheduler)
.publishOn(scheduler);
is that the task will run on a jdbcScheduler thread, and the result will later be published on my main parallel scheduler. Is this understanding correct?
Your understanding is correct with regard to publishOn and subscribeOn (see the reference documentation in the Reactor project about those operators).
If you call blocking libraries without scheduling that work on a specific scheduler, those calls will block one of the few threads available (by default, the Netty event loop) and your application will only be able to serve a few requests concurrently.
Now I'm not sure what you're trying to achieve by doing that.
First, the parallel scheduler is designed for CPU-bound tasks, meaning you'll have only a few of them, as many as (or a bit more than) CPU cores. In this case, it's like setting your thread pool size to the number of cores on a regular Servlet container. Your app won't be able to process a large number of concurrent requests.
Even if you choose a better alternative (like the elastic Scheduler), it will still not be as good as the Netty event loop, which is where request processing is scheduled natively in Spring WebFlux.
If your ultimate goal is performance and scalability, wrapping blocking calls in a reactive app is likely to perform worse than your regular Servlet container.
You could instead use Spring MVC and:
use usual blocking return types when you're dealing with a blocking library, like JPA
use Mono and Flux return types when you're not tied to such libraries
This won't be non-blocking, but it will still be asynchronous, and you'll be able to do more work in parallel without dealing with the complexity.
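A hedged sketch of that mix (the controller and repository names are illustrative; Spring MVC has supported reactive return types since Spring Framework 5.0):
@RestController
class ConfigController {

    private final ConfigRepository configRepository;

    ConfigController(ConfigRepository configRepository) {
        this.configRepository = configRepository;
    }

    // blocking JPA call: plain return type, handled on the Servlet thread pool
    @GetMapping("/configs/{key}")
    Config byKey(@PathVariable String key) {
        return configRepository.findByKey(key);
    }

    // no blocking library involved: a reactive return type works in Spring MVC too
    @GetMapping("/ping")
    Mono<String> ping() {
        return Mono.just("pong");
    }
}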
IMHO, there is a way to execute this operation that makes better use of the machine's resources. Following the documentation, you can wrap the blocking call in another thread, and with this you can continue your execution.
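A minimal sketch of that wrapping, following the "How do I wrap a synchronous, blocking call?" recipe from the Reactor reference documentation (configRepository is taken from the question; the elastic scheduler is the one mentioned in the answer above):
Mono<Config> blockingWrapper =
        Mono.fromCallable(() -> configRepository.findByKey(key)) // evaluated lazily, per subscriber
            .subscribeOn(Schedulers.elastic()); // off the event loop, on a scheduler meant for blocking work
A nice side effect compared to Flux.fromIterable in the question: fromCallable defers the repository call until subscription, whereas fromIterable's argument is evaluated eagerly on the calling thread.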

Where to create ExecutorServices and when to close them

I'm creating a REST service using Spring with Jersey and I have a use case where for every request I get, I need to make several calls (N) to an upstream API.
I get one request; it has n items. For each item, I create a thread to call my dependency (REST) and process the response. At the end I collect all the responses together, maintaining the order, and return them as a single response to the client.
I am using Java 8's CompletableFuture and wanted to know if I was using the ExecutorService framework correctly.
@Component // automatic singleton by Spring
class A {

    private ExecutorService executorService = Executors.newCachedThreadPool();

    private RawResponse getRawResponse(Item item) {
        // make REST call
    }

    private Response processResponse(RawResponse rawResponse) {
        // process response
    }

    public List<Response> handleRequest(Request request) {
        List<CompletableFuture<Response>> futureResponses = new ArrayList<>();
        for (Item item : request.getItems()) {
            CompletableFuture<Response> futureResponse =
                CompletableFuture.supplyAsync(() -> getRawResponse(item), executorService)
                    .thenApply(rawResponse -> processResponse(rawResponse))
                    .handle((response, throwable) -> {
                        if (throwable != null) {
                            // log and return default response
                            return null;
                        }
                        return response;
                    });
            futureResponses.add(futureResponse);
        }
        List<Response> result = new ArrayList<>();
        for (CompletableFuture<Response> futureResponse : futureResponses) {
            try {
                result.add(futureResponse.get());
            } catch (Exception e) {
                // log error
            }
        }
        return result;
    }
}
The question I have now is, should I move the creation of the executorService right above:
List<CompletableFuture<Response>> futureResponses = new ArrayList<>();
and call shutdown on it right above:
return result;
because at this time I am not really calling shutdown anywhere, since the app will always run in its Docker container.
Is it costly to keep creating and discarding the pool, or is the current way going to leak memory? And I think declaring the pool as a static private field is redundant since the class is a Spring bean anyway (singleton).
Any advice will be appreciated. Also, should I be using a cachedThreadPool? I am not sure how to approximate the number of threads I need.
should I move the creation of the executorService right above?
No, you don't; your ExecutorService is in the right place in your example code. Think of it as a thread pool: you do not want to init a new thread pool and close it for each call of handleRequest. Of course an ExecutorService does more than a plain thread pool: it manages a thread pool underneath and provides life-cycle management for async tasks.
I am not really calling shutdown anywhere since the app will always run in it's docker container.
In most cases you init your ExecutorService when the application starts and shut it down when the application shuts down. So you can just leave it there, because it will be closed when the application exits, or you can add some kind of shutdown hook if you need to do a graceful shutdown.
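A minimal sketch of such a hook (the 30-second grace period is an arbitrary assumption):
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    executorService.shutdown(); // stop accepting new tasks
    try {
        if (!executorService.awaitTermination(30, TimeUnit.SECONDS)) {
            executorService.shutdownNow(); // interrupt whatever is still running
        }
    } catch (InterruptedException e) {
        executorService.shutdownNow();
    }
}));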
Is it costly to keep creating and discarding the pool.
Kind of. We don't want to create and discard threads very often; that is why we have a thread pool in the first place. If you create and discard the pool for each method call, what's the point of having a thread pool?
or is the current way going to leak memory?
No, as long as the task you submitted does not leak memory. The implementation of ExecutorService itself is good to use.
And I think calling the pool static as a private field var is redundant since the class is a spring bean anyways (singleton)
Yes, you're correct. You can also define the ExecutorService as a Spring bean and inject it into the service bean, if you want to do some customized init process.
should I be using a cachedThreadPool, I am not sure how to approximate the number of threads I need.
That's hard to say; you need to test to find the right number of threads for your application. But many NIO or event-driven frameworks default to twice the number of available cores as the number of threads.
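Putting the last two points together, a hedged sketch of declaring the pool as a Spring bean so the container shuts it down with the context (the bean name is illustrative, and the 2 x cores sizing is just the heuristic mentioned above, a starting point for your own tests):
@Configuration
class ExecutorConfig {

    @Bean(destroyMethod = "shutdown")
    ExecutorService upstreamCallExecutor() {
        return Executors.newFixedThreadPool(2 * Runtime.getRuntime().availableProcessors());
    }
}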
As you are using Spring, you might want to let it handle the asynchronous execution instead.
Just put @EnableAsync in one of your @Configuration classes to enable the @Async annotation on methods.
You would then change your getRawResponse() to
@Async
public CompletableFuture<RawResponse> getRawResponse(Item item) {
    // make REST call (the method must be public for the @Async proxy to apply)
    return CompletableFuture.completedFuture(rawResponse);
}
(you might need to put this method in a separate service to allow proper proxying, depending on how AOP is configured in your project)
and change your loop to simply
for (Item item : request.getItems()) {
    CompletableFuture<Response> futureResponse = getRawResponse(item)
        .thenApply(rawResponse -> processResponse(rawResponse))
        .handle((response, throwable) -> {
            if (throwable != null) {
                // log and return default response
                return null;
            }
            return response;
        });
    futureResponses.add(futureResponse);
}
As you can see, you do not need to care about the executor in your service anymore.
You can also customize your executor by declaring its Spring bean, for example:
@SpringBootApplication
@EnableAsync
public class Application extends AsyncConfigurerSupport {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

    @Override
    public Executor getAsyncExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(2);
        executor.setMaxPoolSize(2);
        executor.setQueueCapacity(500);
        executor.setThreadNamePrefix("GithubLookup-");
        executor.initialize();
        return executor;
    }
}
You can even configure several executors and select one by providing its name as parameter to the @Async annotation.
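A hedged sketch of that selection (the bean and method names are illustrative):
@Bean("lookupExecutor")
public Executor lookupExecutor() {
    return Executors.newFixedThreadPool(4);
}

@Async("lookupExecutor") // runs on the executor bean named above
public CompletableFuture<RawResponse> getRawResponse(Item item) {
    // ...
}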
See also Getting Started: Creating Async Methods and The #Async annotation.

stop Spring Scheduled execution if it hangs after some fixed time

I have used Spring Framework's @Scheduled to schedule my job to run every 5 minutes using cron. But sometimes my job waits infinitely for an external resource and I can't put a timeout there. I can't use fixedDelay, as the previous process sometimes goes into an infinite wait and I have to refresh the data every 5 minutes.
So I was looking for an option in Spring's Scheduled to stop that process/thread after a fixed time, whether it ran successfully or not.
I found the setting below, which initializes a ThreadPoolExecutor with 120 seconds for keepAliveTime, and put it in a @Configuration class. Can anybody tell me whether this will work as I expect?
@Bean(destroyMethod="shutdown")
public Executor taskExecutor() {
int coreThreads = 8;
int maxThreads = 20;
final ThreadPoolExecutor threadPoolExecutor = new ThreadPoolExecutor(
coreThreads, maxThreads, 120L,
TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>()
);
threadPoolExecutor.allowCoreThreadTimeOut(true);
return threadPoolExecutor;
}
I'm not sure this will work as expected. The keepAlive is for idle threads, and I don't know whether your thread waiting for resources counts as idle. Furthermore, it only applies when the number of threads is greater than the core size, so you can't really know when it happens unless you monitor the thread pool.
keepAliveTime - when the number of threads is greater than the core, this is the maximum time that excess idle threads will wait for new tasks before terminating.
What you can do is the following:
public class MyTask {

    private final long timeout;

    public MyTask(long timeout) {
        this.timeout = timeout;
    }

    @Scheduled(cron = "")
    public void cronTask() throws Exception {
        Future<Object> result = doSomething();
        result.get(timeout, TimeUnit.MILLISECONDS); // throws TimeoutException if the task overruns
    }

    @Async
    public Future<Object> doSomething() {
        // what I should do
        // get resources etc...
        // note: for @Async to apply, this method must be invoked through the Spring
        // proxy (e.g. by moving it to a separate bean), and it must be public
    }
}
Don't forget to add @EnableAsync
It's also possible to do the same without @Async by implementing a Callable.
Edit: keep in mind that it will wait until the timeout, but the thread running the task won't be interrupted. You will need to call Future.cancel when the TimeoutException occurs, and in the task check isInterrupted() to stop processing. If you are calling an API, make sure isInterrupted() is checked.
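A minimal sketch of that cancellation, reusing the names from the example above:
try {
    result.get(timeout, TimeUnit.MILLISECONDS);
} catch (TimeoutException e) {
    // true = mayInterruptIfRunning: sets the interrupt flag on the worker thread,
    // which the task must check via isInterrupted() to actually stop
    result.cancel(true);
}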
allowCoreThreadTimeOut and the keep-alive setting don't help, because they just allow worker threads to be terminated after some time without work (see the Javadocs).
You say your job waits infinitely for an external resource. I am fairly sure it's because you (or some third-party library you are using) use sockets with timeouts that are infinite by default.
Also keep in mind that the JVM ignores Thread.interrupt() when a thread is blocked on socket connect/read.
So find out which socket library is used in your task (and how exactly it is used) and change its default timeout settings.
As an example: there is RestTemplate, widely used inside Spring (in the REST client, in Spring Social, in Spring Security OAuth and so on), and there is the ClientHttpRequestFactory abstraction used to create RestTemplate instances. By default, Spring uses SimpleClientHttpRequestFactory, which uses JDK sockets, and by default all of its timeouts are infinite.
So find out where exactly you freeze, read the docs, and configure it properly.
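As a hedged example for the RestTemplate case, the factory exposes setters for both timeouts (the values are placeholders, not recommendations):
SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory();
factory.setConnectTimeout(5000); // milliseconds
factory.setReadTimeout(10000);   // milliseconds
RestTemplate restTemplate = new RestTemplate(factory);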
P.S. If you don't have enough time and are "feeling lucky", try running your app with the JVM properties sun.net.client.defaultConnectTimeout and sun.net.client.defaultReadTimeout set to some reasonable values (see the docs for more details).
The keepAliveTime is just for cleaning out worker threads that haven't been needed for a while; it doesn't have any impact on the execution time of the tasks submitted to the executor.
If whatever is taking time respects interrupts, you can start a new thread and join it with a timeout, interrupting it if it doesn't complete in time.
public class SomeService {

    @Scheduled(fixedRate = 5 * 60 * 1000)
    public void doSomething() throws InterruptedException {
        TaskThread taskThread = new TaskThread();
        taskThread.start();
        taskThread.join(120 * 1000); // join timeout in ms ("120 * 000" would be 0, i.e. wait forever)
        if (taskThread.isAlive()) {
            // We timed out
            taskThread.interrupt();
        }
    }

    private static class TaskThread extends Thread {
        @Override
        public void run() {
            // Do the actual work here
        }
    }
}

Waiting for all threads to finish in Spring Integration

I have a self-executable jar program that relies heavily on Spring Integration. The problem I am having is that the program terminates before the other Spring beans have completely finished their work.
Below is a cut-down version of the code I'm using, I can supply more code/configuration if needed. The entry point is a main() method, which bootstraps Spring and starts the import process:
public static void main(String[] args) {
    ConfigurableApplicationContext ctx = new ClassPathXmlApplicationContext("flow.xml");
    DataImporter importer = (DataImporter) ctx.getBean("MyImporterBean");
    try {
        importer.startImport();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        ctx.close();
    }
}
The DataImporter contains a simple loop that fires messages to a Spring Integration gateway. This delivers an active "push" approach to the flow, rather than the common approach of polling for data. This is where my problem comes in:
public void startImport() throws Exception {
    for (Item item : items) {
        gatewayBean.publish(item);
        Thread.sleep(200); // Yield period
    }
}
For completeness, the flow XML looks something like this:
<gateway default-request-channel="inChannel" service-interface="GatewayBean" />
<splitter input-channel="inChannel" output-channel="splitChannel" />
<payload-type-router input-channel="splitChannel">
    <mapping type="Item" channel="itemChannel" />
    <mapping type="SomeOtherItem" channel="anotherChannel" />
</payload-type-router>
<outbound-channel-adapter channel="itemChannel" ref="DAOBean" method="persist" />
The flow starts and processes items effectively, but once the startImport() loop finishes, the main thread terminates and tears down all the Spring Integration threads immediately. This results in a race condition: the last (n) items are not completely processed when the program terminates.
I have an idea of maintaining a reference count of the items I am processing, but this is proving to be quite complicated, since the flow often splits/routes the messages to multiple service activators, meaning it is difficult to determine whether each item has "finished".
What I think I need is some way to either check that no Spring beans are still executing, or to flag that all items sent to the gateway have been completely processed before terminating.
My question is, how might I go about doing either of these, or is there a better approach to my problem I haven't thought of?
You're not using a request-response pattern here.
outbound-channel-adapter is a fire-and-forget action. If you want to wait for the response, you should use an outbound gateway that will wait for the response, and connect the reply back to the original gateway; then in Java call sendAndReceive instead of just publish.
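A minimal sketch of the gateway side of that change, reusing the GatewayBean from the question (a non-void return type makes Spring Integration use send-and-receive semantics, so each call blocks until a reply message arrives):
public interface GatewayBean {
    // was: void publish(Item item) -- fire and forget
    Item publish(Item item);
}
The terminal endpoint must then produce a reply, for example a service-activator with an output channel instead of the outbound-channel-adapter.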
If you can get an Item to determine whether it is still needed or not (a processingFinished() callback or something similar executed in the back-end stages), you can register all Items at a central authority, which keeps track of the number of unfinished Items and effectively determines a termination condition.
If this approach is feasible, you could even think of packaging the items into FutureTask objects or making use of similar concepts from java.util.concurrent, as in the sketch below.
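A hedged sketch of that central authority using a Phaser from java.util.concurrent (the wiring of arriveAndDeregister() into the terminal stage of the flow is an assumption):
Phaser inFlight = new Phaser(1); // one party for the main thread itself
for (Item item : items) {
    inFlight.register();          // one party per in-flight item
    gatewayBean.publish(item);
}
// each terminal handler calls inFlight.arriveAndDeregister() when its item is done
inFlight.arriveAndAwaitAdvance(); // blocks until every registered item has arrived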
Edit: second idea:
Have you thought about making the channels more intelligent? A sender closes the channel once it does not send any more data. In this scenario, the worker beans do not have to be daemon threads but can determine their termination criterion based on a closed and empty input channel.
