Interrupting a job in quartz cluster

I have a Quartz setup with multiple instances and I want to interrupt a job wherever it is executing. According to the documentation, the Scheduler.interrupt() method is not cluster aware, so I'm looking for a common practice to work around this limitation.

Well, here are some basics you can use to achieve that.
When running in cluster mode, information about the currently running jobs is available in the Quartz tables. For instance, the fired-triggers table (q_fired_triggers here, QRTZ_FIRED_TRIGGERS with the default table prefix) lists the jobs being executed.
This table also records which scheduler is in charge of each fired trigger (in Quartz 2.x, SCHED_NAME is the first column and INSTANCE_NAME identifies the node), so it is pretty easy to know who is doing what.
Then, if you enable the JMX export of your Quartz instances (org.quartz.scheduler.jmx.export), each instance exposes an MBean that gives you an entry point to manage that scheduler remotely. The MBean provides a method boolean interruptJob("JobName", "JobGroup").
Then you "just" need to call this method on the appropriate scheduler instance to interrupt the job.
I tried the whole process manually and it works fine; it just needs to be automated :)
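As a rough illustration of automating the last step, here is a minimal sketch that invokes the interruptJob operation over JMX using only the JDK's javax.management API. The class and method names (RemoteJobInterrupter, interrupt, interruptRemote) are made up for this example, and the JMX service URL and the MBean's ObjectName are assumptions you would need to look up for your own setup (for instance in JConsole); only the interruptJob(jobName, jobGroup) operation comes from the answer above.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical helper: invokes interruptJob(jobName, jobGroup) on a scheduler MBean. */
final class RemoteJobInterrupter {

    /** Calls the MBean operation on an already-open connection. */
    static boolean interrupt(MBeanServerConnection conn, ObjectName schedulerMBean,
                             String jobName, String jobGroup) throws Exception {
        Object result = conn.invoke(
                schedulerMBean,
                "interruptJob",
                new Object[] {jobName, jobGroup},
                // JMX needs the parameter types as class names to select the operation
                new String[] {String.class.getName(), String.class.getName()});
        return Boolean.TRUE.equals(result);
    }

    /**
     * Opens a remote JMX connection to one node, interrupts the job, then closes
     * the connection. Both jmxUrl and objectName are deployment-specific
     * (assumptions here, not defaults).
     */
    static boolean interruptRemote(String jmxUrl, String objectName,
                                   String jobName, String jobGroup) throws Exception {
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(jmxUrl))) {
            return interrupt(connector.getMBeanServerConnection(),
                    new ObjectName(objectName), jobName, jobGroup);
        }
    }
}
```

The node to call would be the one found in the fired-triggers table for the job in question.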
HIH

You are right. Scheduler.interrupt() does not work in cluster mode: a job trigger may be fired by the scheduler on one node while this API is called on another node.
To overcome this, you might use a message broker (e.g. JMS, RabbitMQ, etc.) with a publish/subscribe model. Instead of calling Scheduler.interrupt(), the client sends an interruption message to the broker. The payload of the message consists of the identity of the job detail, i.e. the JobKey, and the name of the scheduler (if multiple schedulers are used in a node). The message is then consumed by all nodes on which a Quartz instance is running; each node finds the Quartz scheduler by name and executes Scheduler.interrupt() on it with the job identity taken from the message payload.
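The flow could be sketched roughly as follows. This is an in-memory simulation, not real broker or Quartz code: InterruptTopic stands in for a JMS/RabbitMQ topic, LocalScheduler stands in for the part of a Quartz Scheduler that would call Scheduler.interrupt(JobKey), and all class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Message payload: which scheduler, and which job (name + group) to interrupt. */
final class InterruptRequest {
    final String schedulerName;
    final String jobName;
    final String jobGroup;

    InterruptRequest(String schedulerName, String jobName, String jobGroup) {
        this.schedulerName = schedulerName;
        this.jobName = jobName;
        this.jobGroup = jobGroup;
    }
}

/** Hypothetical stand-in for a Quartz Scheduler; a real one would call interrupt(JobKey). */
interface LocalScheduler {
    boolean interrupt(String jobName, String jobGroup);
}

/** Hypothetical in-memory topic: every subscriber receives every published message. */
final class InterruptTopic {
    private final List<Consumer<InterruptRequest>> subscribers = new ArrayList<>();

    void subscribe(Consumer<InterruptRequest> subscriber) {
        subscribers.add(subscriber);
    }

    void publish(InterruptRequest request) {
        for (Consumer<InterruptRequest> subscriber : subscribers) {
            subscriber.accept(request);
        }
    }
}

/** One cluster node: finds the named scheduler and interrupts the job locally. */
final class ClusterNode {
    private final Map<String, LocalScheduler> schedulersByName;

    ClusterNode(Map<String, LocalScheduler> schedulersByName) {
        this.schedulersByName = schedulersByName;
    }

    void onInterruptRequest(InterruptRequest request) {
        LocalScheduler scheduler = schedulersByName.get(request.schedulerName);
        if (scheduler != null) {
            // Has an effect only on the node that is actually executing the job
            scheduler.interrupt(request.jobName, request.jobGroup);
        }
    }
}
```

Each node would subscribe once at startup, e.g. topic.subscribe(node::onInterruptRequest), and the client would publish a single InterruptRequest instead of calling Scheduler.interrupt() itself.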

Flink Job Cluster vs Session Cluster - deploying and configuration

I'm researching docker/k8s deployment possibilities for Flink 1.9.1.
I have read/watched [1][2][3][4].
Currently we think that we will try to go with the Job Cluster approach,
although we would like to know what the community trend is here. We would
rather not deploy more than one job per Flink cluster.
Anyway, I was wondering about a few things:
1. How can I change the number of task slots per task manager for Job and
Session Cluster? In my case I'm running docker on VirtualBox where I have 4
CPUs assigned to this machine. However, each task manager is spawned with
only one task slot for Job Cluster. With Session Cluster, however, on the
same machine, each task manager is spawned with 4 task slots.
In both cases Flink's UI shows that each task manager has 4 CPUs.
2. How can I resubmit a job if I'm using a Job Cluster? I'm referring to this
use case [5]. You may say that I have to start the job again but with
different arguments. What is the procedure for this? I'm using checkpoints, btw.
Should I kill all task manager containers and rerun them with different
parameters?
3. How can I resubmit a job using a Session Cluster?
4. How can I provide a log config for a Job/Session cluster?
I have a case where I changed the log level and log format in log4j.properties
and this works fine in the local (IDE) environment. However, when I build
the fat jar and run a Job Cluster based on this jar, it seems that my log4j
properties are not passed to the cluster. I see the original format and
the original (INFO) level.
Thanks,
[1] https://youtu.be/w721NI-mtAA
[2] https://youtu.be/WeHuTRwicSw
[3] https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/docker.html
[4] https://github.com/apache/flink/blob/release-1.9/flink-container/docker/README.md
[5] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Job-claster-scalability-td32027.html

> Currently we do think that we will try go with Job Cluster approach although we would like to know what is the community trend with this? We would rather not deploy more than one job per Flink cluster.
This question is probably better suited to the user mailing list.
> How can I change the number of task slots per task manager for Job and Session Cluster?
You can control this via the config option taskmanager.numberOfTaskSlots in flink-conf.yaml.
> How I can resubmit job using Session Cluster?
This is described here. The bottom line is that you create a savepoint and resume your job from it. It is also possible to resume a job from retained checkpoints.
> How can I resubmit job if I'm using a Job Cluster.
Conceptually, this is no different from resuming a job from a savepoint in a session cluster. You can specify the path to the savepoint as a command line argument to the cluster entrypoint. The details are described here.
> How I can provide log config for Job/Session cluster?
If you are using the scripts in the bin/ directory of the Flink binary distribution to start your cluster (e.g., bin/start-cluster.sh, bin/jobmanager.sh, bin/taskmanager.sh, etc.), you can change the log4j configuration by adapting conf/log4j.properties. The logging configuration is passed to the JobManager and TaskManager JVMs as a system property (see bin/flink-daemon.sh). See also the chapter "How to use logging" in the Flink documentation.

Do we need to test whether a cron job is running? Is a health check of the cron job needed?

We are using a Spring Boot cron job which runs every hour. Do we need to test its health to check whether it is running or not?

If you are asking whether you need to test if the cron job runs, that is for you to decide:
Does it matter if it doesn't run?
Does it matter if it runs correctly?
Do you want to be notified?
(At 4am on your rostered day off?)
Assuming that you have decided that you do need to know, there are many ways to detect that the cron job has run or not. For example:
Have the job append messages to a log file
Have the job write a report file
Have the job write a record to a database
Have the job send a message to syslog (or equivalent).
Next you need some kind of monitoring that checks that the log messages (or report files, database records, and so on) have been written, and that notifies someone in an appropriate fashion when they have not. There are lots of existing systems for doing this, e.g. Nagios, Checkmk, Remedy, etc. Some are free.
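To make the idea concrete, here is a minimal sketch of the "job leaves a trace, monitor checks the trace" pattern. The JobHeartbeat class and its method names are hypothetical, and the timestamp is kept in memory only to keep the example short; in practice it would live wherever your monitoring can read it (log file, database row, syslog).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical heartbeat: the job records each successful run, a monitor checks its age. */
final class JobHeartbeat {
    private final AtomicReference<Instant> lastSuccess = new AtomicReference<>();

    /** Called by the cron job at the end of every successful run. */
    void recordSuccess(Instant when) {
        lastSuccess.set(when);
    }

    /**
     * Called by the monitoring side. Returns true if the job has never reported,
     * or if its last report is older than maxSilence.
     */
    boolean isOverdue(Instant now, Duration maxSilence) {
        Instant last = lastSuccess.get();
        return last == null || Duration.between(last, now).compareTo(maxSilence) > 0;
    }
}
```

For an hourly job, a maxSilence somewhat above one hour (say 90 minutes, an arbitrary choice) leaves room for normal variation in run time before anyone gets notified.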

Shutting down spring batch jobs gracefully in tomcat

We run several spring batch jobs within tomcat in the same web application that serves up our UI. Lately we have been adding many more jobs and we are noticing that when we patch our app, several jobs may get stuck in a STARTING or STARTED status. Many of those jobs ensure that another job is not running before they start up, so this means after we patch the server, some of our jobs are broken until we manually run SQL to update the statuses of the jobs to ABANDONED or STOPPED.
I have read here that JobScope and StepScope jobs don't play nicely with shutting down.
That article suggests not using JobScope or StepScope but I can't help but think that this is a solved problem where people must be doing something when the application exits to prevent this problem.
Are there some best practices for handling this scenario? What are you doing in your applications?
We are using spring-batch version 3.0.3.RELEASE

I will give you an idea of how to solve this scenario. It is not necessarily a spring-batch solution.
Every time I need to add jobs to an application I do the following:
Create a table to control the jobs (queue, priority, status, etc.)
Create a JobController class to manage all jobs
Every job has a status: R (running), F (finished) or Q (queued); you can add more as needed, such as aborted or cancelled. The jobs themselves update these statuses
The JobController must be loaded only once; you can define it as a Spring bean for this
Add a boolean attribute to JobController that records whether you have already checked the jobs since it was instantiated. Set it to false initially
Check whether there are jobs with the R status, which means they were running when the server last stopped. Update every such job from R to Q and increase its priority so that it is executed first after a restart of the server. This check sits inside the if on that boolean attribute; after the check, set the attribute to true.
That way, the first time you call the JobController after a server crash, any unfinished jobs are reset to a status from which they can be executed again. The check happens only once because it is guarded by the boolean attribute.
One thing to be careful about is job priority: if you manage it badly you may run into a starvation problem.
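The recovery check described above might look roughly like this. It is only a sketch: the in-memory list stands in for the control table, JobRecord and recoverInterruptedJobs are names invented for the example, and "higher priority number runs first" is an assumption.

```java
import java.util.List;

/** Hypothetical row of the job control table. */
final class JobRecord {
    final String name;
    char status;   // 'R' running, 'F' finished, 'Q' queued
    int priority;  // assumption: a higher number is picked up first

    JobRecord(String name, char status, int priority) {
        this.name = name;
        this.status = status;
        this.priority = priority;
    }
}

/** Hypothetical controller; intended to exist once per application (e.g. a singleton bean). */
final class JobController {
    private final List<JobRecord> jobs;      // stand-in for the database table
    private boolean recoveryChecked = false; // the boolean attribute from the steps above

    JobController(List<JobRecord> jobs) {
        this.jobs = jobs;
    }

    /**
     * On the first call only: re-queues jobs left in 'R' by the previous shutdown
     * and bumps their priority. Returns how many jobs were re-queued.
     */
    synchronized int recoverInterruptedJobs() {
        if (recoveryChecked) {
            return 0;
        }
        int requeued = 0;
        for (JobRecord job : jobs) {
            if (job.status == 'R') {
                job.status = 'Q';
                job.priority++;
                requeued++;
            }
        }
        recoveryChecked = true;
        return requeued;
    }
}
```

Because the flag is set after the first pass, jobs that legitimately enter the R status later are left alone.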
You can easily adapt this solution to spring-batch.
Hope it helps.

How to check whether a @Scheduled annotation (Spring 3.1) cron job is running

In my application I have one cron job which connects to an FTP server and transfers files. It is a very simple piece of functionality, configured using the Spring @Scheduled annotation with a cron expression as a parameter.
It ran fine for a few months and then suddenly stopped with a ConnectException.
Maybe the FTP server was down, or something happened that caused the cron thread to stop.
I looked (Google) for the reasons but didn't find any (nothing much in the logs either, just the exception name). It may be a one-time thing :)
My question is: can I put some check or watcher on the @Scheduled cron job to know whether it is running or not?
Sorry for my bad explanation/English.
Thanks

> my question is that can I put some check or watcher on the @Scheduled
> cron job to know whether it is running or not ?
Basically, you can't.
When you use @Scheduled, Spring uses a ScheduledAnnotationBeanPostProcessor to register the tasks you specify (the annotated methods). It registers them with a ScheduledTaskRegistrar. The ScheduledAnnotationBeanPostProcessor is an ApplicationListener<ContextRefreshedEvent>. When it receives the ContextRefreshedEvent from the ApplicationContext, it schedules the tasks registered in the ScheduledTaskRegistrar.
During this step, these tasks are scheduled with a TaskScheduler, which typically wraps a ScheduledExecutorService. If an exception is uncaught in a submitted task, then the task is removed from the ScheduledExecutorService queue.
The TaskScheduler interface does not provide a public API to retrieve the scheduled tasks, i.e. the ScheduledFuture objects. So you can't use it to find out if your tasks are running or not.
And you probably shouldn't. Develop your tasks, your @Scheduled methods, so that they can withstand an exception being thrown. Sometimes, obviously, that's not possible. With a network error, for example, you would probably have to restart your application. Without knowing anything else about your application, I would say more logging is your best bet.
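One way to make a periodic task withstand exceptions is to catch and report them inside the task, so that nothing escapes to the executor. With a plain ScheduledExecutorService, an exception thrown from one execution suppresses all subsequent executions, so a guard like the one below keeps the task scheduled. This is a minimal sketch; SafeTasks and guarded are hypothetical names, not Spring API.

```java
import java.util.function.Consumer;

/** Hypothetical helper that keeps a failing periodic task from killing its own schedule. */
final class SafeTasks {

    /**
     * Wraps a task so that runtime exceptions are handed to onError (e.g. a logger)
     * instead of propagating to the scheduler. Errors such as OutOfMemoryError
     * are deliberately not caught.
     */
    static Runnable guarded(Runnable task, Consumer<Throwable> onError) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                onError.accept(e);
            }
        };
    }
}
```

Inside a @Scheduled method the same effect comes from a try/catch around the method body that logs the exception (including its stack trace, not just its name), which also addresses the "nothing much in the logs" problem from the question.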

How does the JDBCJobStore work?

So I started to tinker around with JDBCJobStore in Quartz. Firstly, I could not find a single good resource on how to configure it from scratch. After looking for a while and singling out a good resource for beginners, I downloaded the sample application at Job scheduling with Quartz. I have a few questions about it.
How does JDBCJobStore capture jobs? I mean, in order for a job to get stored in the database, does the job have to be run manually once? Or will JDBCJobStore automatically detect the jobs and their details?
How does JDBCJobStore schedule the jobs? Does it hit the database at a fixed interval, like a heartbeat, to check if there are any scheduled jobs? Or does it keep the triggers in memory while the application is running?
In order to run the jobs, will I have to manually specify the details of the job, like name and group, and fetch the trigger accordingly? Is there any alternative to this?
On each application restart, how can I tell the scheduler to start automatically? Can it be specified somehow?

If you are using a servlet container or app server, you can start the scheduler during startup:
http://quartz-scheduler.org/documentation/quartz-2.2.x/cookbook/ServletInitScheduler
If you are running standalone, you have to initialize it manually, I think.
You can read more about JobStores here:
http://quartz-scheduler.org/documentation/quartz-2.2.x/tutorials/tutorial-lesson-09
And about jobs and triggers:
http://quartz-scheduler.org/documentation/quartz-2.2.x/tutorials/tutorial-lesson-02
http://quartz-scheduler.org/documentation/quartz-2.2.x/tutorials/tutorial-lesson-03
http://quartz-scheduler.org/documentation/quartz-2.2.x/tutorials/tutorial-lesson-04
I guess that Quartz checks for jobs at a fixed time interval so that it works properly in clusters and distributed systems.
