We run several spring batch jobs within tomcat in the same web application that serves up our UI. Lately we have been adding many more jobs and we are noticing that when we patch our app, several jobs may get stuck in a STARTING or STARTED status. Many of those jobs ensure that another job is not running before they start up, so this means after we patch the server, some of our jobs are broken until we manually run SQL to update the statuses of the jobs to ABANDONED or STOPPED.
I have read here that JobScope and StepScope jobs don't play nicely with shutting down.
That article suggests not using JobScope or StepScope, but I can't help thinking this is a solved problem and that people must be doing something when the application exits to prevent it.
Are there some best practices for handling this scenario? What are you doing in your applications?
We are using spring-batch version 3.0.3.RELEASE
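For reference, the manual SQL fix has a programmatic equivalent in the Spring Batch API. Below is a rough sketch of a one-off startup cleanup, assuming JobExplorer and JobRepository beans are available; the wiring and the choice of ABANDONED/FAILED are assumptions, not part of our actual setup:

import java.util.Date;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.repository.JobRepository;

public class StuckExecutionCleaner {

    private final JobExplorer jobExplorer;
    private final JobRepository jobRepository;

    public StuckExecutionCleaner(JobExplorer jobExplorer, JobRepository jobRepository) {
        this.jobExplorer = jobExplorer;
        this.jobRepository = jobRepository;
    }

    // Call this once on startup, before any "is another job running?" checks.
    public void abandonStuckExecutions() {
        for (String jobName : jobExplorer.getJobNames()) {
            for (JobExecution execution : jobExplorer.findRunningJobExecutions(jobName)) {
                // Executions left in STARTING/STARTED by the previous shutdown
                execution.setStatus(BatchStatus.ABANDONED);
                execution.setExitStatus(ExitStatus.FAILED);
                execution.setEndTime(new Date());
                jobRepository.update(execution);
            }
        }
    }
}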
I will give you an idea of how to solve this scenario; it is not necessarily a spring-batch-specific solution.
Every time I need to add jobs to an application, I do the following:
Create a table to control the jobs (queue, priority, status, etc.)
Create a JobController class to manage all jobs
Each job is tracked with a status: R (running), F (finished), Q (queued); you can add more as needed, such as aborted or cancelled (the jobs themselves update these statuses)
The JobController must be loaded only once; you can define it as a Spring bean for this
Add a boolean attribute to the JobController that records whether the jobs have already been checked when it is instantiated; set it to false
Check whether there are jobs with the R status, which means they were running when the server last stopped. Update every job with the R status to Q and increase its priority so it gets executed first after the server restarts. This check sits inside an if on that boolean attribute; after the check, set the attribute to true
That way, the first time the JobController is called after a server crash, any unfinished jobs are reset to a status in which they can be executed again, and the check happens only once since it is guarded by that boolean attribute.
One thing you should be careful about is job priority: if you manage it badly, you may run into a starvation problem.
You can easily adapt this solution to spring-batch.
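A minimal sketch of that startup recovery check, assuming a hypothetical JOB_QUEUE table with STATUS and PRIORITY columns and a plain JdbcTemplate (the table and column names are illustrative):

import org.springframework.jdbc.core.JdbcTemplate;

public class JobController {

    private final JdbcTemplate jdbcTemplate;
    private boolean recoveryChecked = false; // "already checked" flag described above

    public JobController(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Runs only once per application start, guarded by the boolean attribute.
    public synchronized void recoverInterruptedJobs() {
        if (recoveryChecked) {
            return;
        }
        // Jobs still marked R were running when the server stopped:
        // requeue them and bump their priority so they run first.
        jdbcTemplate.update(
                "UPDATE JOB_QUEUE SET STATUS = 'Q', PRIORITY = PRIORITY + 1 WHERE STATUS = 'R'");
        recoveryChecked = true;
    }
}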
Hope it helps.
Context
We have a Spring Boot application (an API used by an Angular frontend).
It is running on a docker container.
It is using a single instance of a PostgreSQL database.
Our application had some load problems, so we were asked to scale it.
We were told to run our API on several Docker containers for that.
We have several questions / problems dealing with code synchronization across the multiple Docker instances executing our code.
Problem 1
We have some @Scheduled jobs integrated and deployed with our API code.
We don't want these scheduled jobs to be executed by all container instances, but only one.
I think we can simply handle this by disabling the jobs on the other containers through environment variables, using the "-" value to disable the Spring scheduled cron.
https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/scheduling/annotation/Scheduled.html#CRON_DISABLED
Does that sound right?
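For illustration, a minimal sketch of that idea, assuming a hypothetical jobs.cleanup.cron property that each container overrides (for example via an environment variable), with "-" as the disabled default:

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class CleanupJob {

    // "-" (Scheduled.CRON_DISABLED) turns the trigger off; containers that should
    // run the job set the jobs.cleanup.cron property to a real cron expression instead.
    @Scheduled(cron = "${jobs.cleanup.cron:-}")
    public void run() {
        // job body
    }
}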
Problem 2
The other problem is that we use Spring's @Lock annotation on some repository methods.
public interface IncrementRepository extends JpaRepository<IncrementEntity, UUID> {
@Lock(LockModeType.PESSIMISTIC_FORCE_INCREMENT)
Optional<IncrementEntity> findByAnnee(String pAnneeAA);
@Lock(LockModeType.PESSIMISTIC_WRITE)
IncrementEntity save(IncrementEntity pIncrementEntity);
}
It is critical for us to have a lock here, as we get / compute an increment that acts as a unique identifier for some of our data.
If I understood this locking mechanism correctly:
if a process executes this code, the Spring JPA @Transactional will acquire a lock on the IncrementEntity (locking the database table).
when another process tries to do the same thing before the first lock has been released by the first transaction, it should get a PessimisticLockException and the second transaction will roll back
this is managed by Spring at the application level, NOT directly at the database level (right??)
So what will happen if we're running our code on several containers?
the app running in container 1 sets a lock
the app running in container 2 executes the same code and tries to take the same lock while the first one has not been released yet
each Spring application running in a different container will probably acquire the lock without problems, as they don't share the same information?
Please tell me if I correctly understood how it works, and if we will actually have a problem running such code on several Docker containers.
I guess the solution would be to set a lock directly on the database table, as we have only one instance of it?
Is there a way to easily set / release the lock at the database level using Spring JPA code?
Or perhaps I misunderstood, and setting a lock using Spring's @Lock annotation sets a real database lock?
In that case, perhaps we don't have any problem at all, as the lock is correctly set on the database itself, shared by all container instances??
Problem 3
To avoid too many exceptions and rejecting requests that try to acquire the lock at the same time, we also added a synchronized block around the code above.
String numIncrement;
synchronized (this.mutex) {
try {
numIncrement = this.incrementService.getIncrement(var);
} catch (Exception e) {
// rethrow custom technical exception
}
}
This way concurrent requests should be delayed and queued, which is better for our users' experience.
I guess we will also have problems here, as the Docker instances don't share the same JVM, so synchronization only works within the scope of the container itself... right?
Conclusion
For all these problems, please tell me if you have some solutions to work around / adapt our code so it is compatible with scaling the app.
Following a set of tests, I can confirm these points about my original question:
Problem 1
We can disable a Spring cron with the "-" value:
@Scheduled(cron = "-")
Problem 2
Spring's JPA @Lock annotation sets a lock on the database itself; it is not managed by the Spring application.
So when duplicating containers, if the Spring app in the first container sets a lock, the database is locked, and when the second app in another container tries to get the data it gets the PessimisticLockException.
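A minimal sketch of how such a repository call is typically wrapped, assuming a hypothetical IncrementService and an illustrative numeric value field on IncrementEntity; the point is that the row lock is taken by the database and held for the duration of the transaction, so it is shared by all containers:

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class IncrementService {

    private final IncrementRepository incrementRepository;

    public IncrementService(IncrementRepository incrementRepository) {
        this.incrementRepository = incrementRepository;
    }

    // The pessimistic lock taken by findByAnnee() lives until this transaction
    // commits or rolls back; a second container hitting the same row waits on
    // the database lock or fails with a PessimisticLockException.
    @Transactional
    public String getIncrement(String annee) {
        IncrementEntity entity = incrementRepository.findByAnnee(annee)
                .orElseThrow(() -> new IllegalStateException("No increment row for " + annee));
        entity.setValue(entity.getValue() + 1); // illustrative field
        incrementRepository.save(entity);
        return annee + "-" + entity.getValue();
    }
}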
Problem 3
Code synchronized with the Java synchronized keyword is obviously managed by the JVM, so there is no mutual exclusion between containers.
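Since the synchronized block cannot protect anything across containers, one possible replacement (an assumption, not something from the original code) is to drop the JVM mutex and retry the transactional call when the database rejects the lock:

import javax.persistence.PessimisticLockException;

public class IncrementClient {

    private final IncrementService incrementService;

    public IncrementClient(IncrementService incrementService) {
        this.incrementService = incrementService;
    }

    // Cross-container safe: contention is resolved by the database lock, and this
    // loop retries a few times instead of queueing on a JVM-local mutex.
    public String getIncrementWithRetry(String annee) {
        for (int attempt = 1; attempt <= 3; attempt++) {
            try {
                return incrementService.getIncrement(annee);
            } catch (PessimisticLockException e) {
                // depending on exception translation this may instead be
                // Spring's PessimisticLockingFailureException
                if (attempt == 3) {
                    throw e;
                }
                try {
                    Thread.sleep(100L * attempt); // simple backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(ie);
                }
            }
        }
        throw new IllegalStateException("unreachable");
    }
}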
I have a method with the @PostConstruct annotation that needs to be executed after application startup. My application is hosted in OS with multiple pods. Currently the method gets executed every time a pod starts, but I want it to run only once, irrespective of the number of instances.
One way you could do this would be for each instance to look up a record in a shared database and for the pod to lock this record while it executes the method.
Once the executing method has completed, it sets the record with a flag to indicate that the init sequence is complete.
This flag can then be used by the instances that did not execute the init method to skip execution.
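A minimal sketch of that record-lock approach, assuming a hypothetical APP_INIT table with a single row (ID, INITIALIZED), a JdbcTemplate and a TransactionTemplate; SELECT ... FOR UPDATE makes the other pods wait while one of them runs the init:

import javax.annotation.PostConstruct;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;
import org.springframework.transaction.support.TransactionTemplate;

@Component
public class OneTimeInitializer {

    private final JdbcTemplate jdbcTemplate;
    private final TransactionTemplate transactionTemplate;

    public OneTimeInitializer(JdbcTemplate jdbcTemplate, TransactionTemplate transactionTemplate) {
        this.jdbcTemplate = jdbcTemplate;
        this.transactionTemplate = transactionTemplate;
    }

    @PostConstruct
    public void init() {
        // TransactionTemplate rather than @Transactional, because self-invocation
        // from @PostConstruct would bypass the Spring proxy.
        transactionTemplate.execute(status -> {
            // Lock the shared row; other pods block here until this transaction ends
            // (FOR UPDATE syntax may need adjusting for your database).
            Boolean initialized = jdbcTemplate.queryForObject(
                    "SELECT INITIALIZED FROM APP_INIT WHERE ID = 1 FOR UPDATE", Boolean.class);
            if (!Boolean.TRUE.equals(initialized)) {
                runInitLogic(); // the actual one-time work (placeholder)
                jdbcTemplate.update("UPDATE APP_INIT SET INITIALIZED = TRUE WHERE ID = 1");
            }
            return null;
        });
    }

    private void runInitLogic() {
        // ...
    }
}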
Alternatively, if you can restructure your code, you could use other techniques; e.g. this link might be useful:
Kubernetes: Tasks that need to be done once per cluster or per statefulset or replicaset
In my application I have one cron job which connects to an FTP server and transfers files. It is very simple functionality, configured using Spring's @Scheduled annotation with a cron expression as a parameter.
It was running fine for a few months and then it suddenly stopped; I got a ConnectException.
Maybe the FTP server was down or something happened which caused the cron thread to stop.
I looked (googled) for the reasons but didn't find any (nothing much in the logs either, just the exception name). It may be a one-time thing :)
My question is: can I put some check or watcher on the @Scheduled cron job to know whether it is running or not?
Sorry for my bad explanation/English.
Thanks
My question is: can I put some check or watcher on the @Scheduled cron job to know whether it is running or not?
Basically, you can't.
When you use @Scheduled, Spring uses a ScheduledAnnotationBeanPostProcessor to register the tasks you specify (annotated methods). It registers them with a ScheduledTaskRegistrar. The ScheduledAnnotationBeanPostProcessor is an ApplicationListener<ContextRefreshedEvent>. When it receives the ContextRefreshedEvent from the ApplicationContext, it schedules the tasks registered in the ScheduledTaskRegistrar.
During this step, these tasks are scheduled with a TaskScheduler which typically wraps a ScheduledExecutorService. If an exception is uncaught in a submitted task, then the task is removed from the ScheduledExecutorService queue.
The TaskScheduler class does not provide a public API to retrieve the scheduled tasks, i.e. the ScheduledFuture objects. So you can't use it to find out whether your tasks are running or not.
And you probably shouldn't. Develop your tasks, your @Scheduled methods, to be able to withstand an exception being thrown. Sometimes, obviously, that's not possible; with a network error, for example, you would probably have to restart your application. Without knowing anything else about your application, I would say more logging is your best bet.
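A minimal sketch of that defensive style; the FTP call and the schedule are placeholders, the point is simply that no exception escapes the scheduled method and every failure is logged:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class FtpTransferJob {

    private static final Logger log = LoggerFactory.getLogger(FtpTransferJob.class);

    @Scheduled(cron = "0 0 * * * *") // illustrative schedule
    public void transferFiles() {
        try {
            doTransfer(); // the actual FTP work (placeholder)
            log.info("FTP transfer completed");
        } catch (Exception e) {
            // Swallow and log so the exception never reaches the scheduler
            // and future executions keep firing.
            log.error("FTP transfer failed, will retry on next schedule", e);
        }
    }

    private void doTransfer() throws Exception {
        // ...
    }
}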
So I started to tinker around with JDBCJobStore in Quartz. Firstly, I could not find a single good resource on how to configure it from scratch. After looking for it for a while and singling out a good resource for beginners, I downloaded the sample application at Job scheduling with Quartz. I have a few doubts regarding it.
How does JDBCJobStore capture jobs? I mean, in order for the job to get stored in the database, does the job have to run manually once? Or will JDBCJobStore automatically detect the jobs and their details?
How does JDBCJobStore schedule the jobs? Does it hit the database at a fixed interval, like a heartbeat, to check if there are any scheduled jobs? Or does it keep the triggers in memory while the application is running?
In order to run the jobs, will I have to manually specify the details of the job, like name and group, and fetch the trigger accordingly? Is there any alternative to this?
On each application restart, how can I tell the scheduler to start automatically? Can it be specified somehow?
If you are using a servlet/app server you can start it during startup:
http://quartz-scheduler.org/documentation/quartz-2.2.x/cookbook/ServletInitScheduler
If you are running standalone you have to initialize it manually, I think.
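For the standalone case, a minimal sketch of bootstrapping a scheduler backed by a JDBCJobStore; the property values (data source name, PostgreSQL driver, credentials) are illustrative, and the QRTZ_ tables must already exist (created with the SQL scripts shipped with Quartz):

import java.util.Properties;

import org.quartz.Scheduler;
import org.quartz.impl.StdSchedulerFactory;

public class QuartzBootstrap {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("org.quartz.scheduler.instanceName", "MyScheduler");
        props.put("org.quartz.scheduler.instanceId", "AUTO");
        props.put("org.quartz.threadPool.class", "org.quartz.simpl.SimpleThreadPool");
        props.put("org.quartz.threadPool.threadCount", "5");
        props.put("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreTX");
        props.put("org.quartz.jobStore.driverDelegateClass", "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
        props.put("org.quartz.jobStore.tablePrefix", "QRTZ_");
        props.put("org.quartz.jobStore.dataSource", "myDS");
        props.put("org.quartz.dataSource.myDS.driver", "org.postgresql.Driver");
        props.put("org.quartz.dataSource.myDS.URL", "jdbc:postgresql://localhost:5432/quartz");
        props.put("org.quartz.dataSource.myDS.user", "quartz");
        props.put("org.quartz.dataSource.myDS.password", "secret");

        Scheduler scheduler = new StdSchedulerFactory(props).getScheduler();
        // Jobs and triggers already stored in the database are picked up automatically.
        scheduler.start();
    }
}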
You can read more about JobStores here:
http://quartz-scheduler.org/documentation/quartz-2.2.x/tutorials/tutorial-lesson-09
And about jobs and triggers:
http://quartz-scheduler.org/documentation/quartz-2.2.x/tutorials/tutorial-lesson-02
http://quartz-scheduler.org/documentation/quartz-2.2.x/tutorials/tutorial-lesson-03
http://quartz-scheduler.org/documentation/quartz-2.2.x/tutorials/tutorial-lesson-04
I guess that Quartz checks jobs at a time interval so it can work properly in clusters and distributed systems.
I have a Quartz setup with multiple instances and I want to interrupt a job wherever it is executed. As stated in the documentation, the Scheduler.interrupt() method is not cluster-aware, so I'm looking for some common practice to overcome this limitation.
Well, here are some basics you should use to achieve that.
When running in cluster mode, the information about the currently running jobs is available in the Quartz tables. For instance, the q_fired_triggers table contains the jobs being executed.
The first column of this table is the name of the scheduler in charge of the job, so it is pretty easy to know who is doing what.
Then, if you enable the JMX export of your Quartz instances (org.quartz.scheduler.jmx.export), the exposed MBeans give you a new entry point to remotely manage each scheduler individually. The MBean provides a method boolean interruptJob("JobName", "JobGroup").
Then you "just" need to call this method on the appropriate scheduler instance to effectively interrupt the job.
I tried the whole process manually and it works fine; it just needs to be automated :)
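A minimal sketch of automating that JMX call, assuming remote JMX is enabled on each node; the service URL and the name/instance values in the ObjectName are illustrative, and the interruptJob operation is the one described above:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RemoteJobInterrupter {

    public static void main(String[] args) throws Exception {
        // Connect to the JVM of the node that fired the trigger
        // (found in the fired-triggers table as described above).
        JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://node1.example.com:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();

            // Name pattern used by Quartz for its exported MBean
            // (scheduler name and instance id are illustrative).
            ObjectName schedulerBean = new ObjectName(
                    "quartz:type=QuartzScheduler,name=MyClusteredScheduler,instance=NODE1");

            Object interrupted = connection.invoke(
                    schedulerBean,
                    "interruptJob",
                    new Object[] { "JobName", "JobGroup" },
                    new String[] { String.class.getName(), String.class.getName() });

            System.out.println("interruptJob returned: " + interrupted);
        }
    }
}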
HIH
You are right: Scheduler.interrupt() does not work in cluster mode. Let's say a job trigger is fired by the scheduler on one node but this API is called on another node.
To overcome this, you might use a message broker (e.g. JMS, RabbitMQ, etc.) with a publish/subscribe model. Instead of calling Scheduler.interrupt() directly, the client publishes an interruption message to the broker; the payload consists of the identity of the job detail, i.e. the JobKey, and the name of the scheduler (if there are multiple schedulers used in a node). The message is consumed by all nodes on which a Quartz instance is running; each node finds its local scheduler by name and then calls Scheduler.interrupt() on it with the job identity taken from the message payload.
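A minimal sketch of the consuming side of that approach, assuming a Spring JMS topic listener and a simple 'jobName|jobGroup|schedulerName' payload (the destination name, payload format, and listener container factory are illustrative):

import org.quartz.JobKey;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.impl.SchedulerRepository;
import org.springframework.jms.annotation.JmsListener;
import org.springframework.stereotype.Component;

@Component
public class InterruptJobListener {

    // Every node subscribes to the same topic; only the node that actually
    // runs the job will effectively interrupt it.
    @JmsListener(destination = "quartz.interrupt", containerFactory = "topicListenerFactory")
    public void onInterruptRequest(String payload) throws SchedulerException {
        String[] parts = payload.split("\\|");
        JobKey jobKey = new JobKey(parts[0], parts[1]);
        String schedulerName = parts[2];

        // Look up the local scheduler instance by name; returns null if this
        // node does not host a scheduler with that name.
        Scheduler scheduler = SchedulerRepository.getInstance().lookup(schedulerName);
        if (scheduler != null) {
            // Returns true only on the node where the job is currently running.
            scheduler.interrupt(jobKey);
        }
    }
}

Note that Scheduler.interrupt() only has an effect if the job class implements InterruptableJob.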