My application may fire thousands of triggers or more every day. Each trigger falls into one of 4 categories:
morning(9am)
afternoon(1pm)
evening(6pm)
night(10pm)
At each of these times, hundreds of triggers will be fired. Below is the code:
SchedulerFactory schdFact = new StdSchedulerFactory();
Scheduler schd;
for (each morningSchedulers) {
    // some logic for instantiating trigger
    AbstractTrigger trigger = (AbstractTrigger) newTrigger().withSchedule(cronSchedule("0 0"+mAlert+"0,0,0 * * ?")).build();
    trigger.setStartTime(strtDat);
    trigger.setEndTime(endDat);
    final JobDetail job = newJob(AlertJob.class).build();
    schd.scheduleJob(job, trigger);
}
I have 2 questions here
Should I instantiate the scheduler inside the for loop, or outside it and schedule many triggers on the same scheduler? That is, where should I write the line schd = schdFact.getScheduler(); (inside or outside the for loop)?
I have to reschedule some of these triggers, i.e. stop them on some condition and start them again.
How many triggers can a single instance of SchedulerFactory have?
There is a similar question in the official FAQ:
How many jobs is Quartz capable of running?
This is a tough question to answer... the answer is basically "it depends". [...] So, the limiting factor of the number of Triggers and Jobs Quartz can "store" and monitor is really the amount of storage space available to the JobStore (either the amount of RAM or the amount of disk space).
Also remember about this:
[...] The actual number of jobs that can be running at any moment in time is limited by the size of the thread pool. If there are five threads in the pool, no more than five jobs can run at a time.
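To make the thread pool limit concrete, here is a minimal sketch that uses a plain java.util.concurrent fixed pool as a stand-in for Quartz's own thread pool (the class and method names are hypothetical, and Quartz's pool is configured differently). It submits many short jobs and records the highest number that were running at the same moment, which can never exceed the pool size:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; a plain fixed pool stands in for the Quartz thread pool.
final class PoolLimitDemo {

    /** Runs jobCount short jobs on poolSize threads and returns the peak number running at once. */
    static int peakConcurrency(int poolSize, int jobCount) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(jobCount);
        for (int i = 0; i < jobCount; i++) {
            pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // remember the highest value seen
                try {
                    Thread.sleep(20); // simulate the job's work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    done.countDown();
                }
            });
        }
        try {
            done.await(); // wait until every job has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return peak.get();
    }

    public static void main(String[] args) {
        // With 5 threads, 100 jobs "firing at once" still run at most 5 at a time.
        System.out.println("peak concurrent jobs = " + peakConcurrency(5, 100));
    }
}
```

The remaining jobs simply wait in the queue, which is why hundreds of triggers firing at 9am with a small pool means delayed executions rather than parallel ones.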
If you have thousands of triggers (are you sure you aren't confusing triggers with trigger executions?), consider JDBC storage. But if you have only a few triggers that each fire several times a day, the RAM store is enough.
Should I instantiate the scheduler inside the for loop or outside, and schedule many triggers on the same scheduler?
Definitely have only a single scheduler for the whole application. It is rare to have more than one scheduler in an application, see: Utilizing Multiple (Non-Clustered) Scheduler Instances.
Create the scheduler once and treat it as a singleton.
I have to reschedule some of these triggers. i.e. stop on some condition and start again.
This is, again, explained in the documentation: Updating an existing Trigger. Basically you need to know the trigger key:
// retrieve the trigger
Trigger oldTrigger = sched.getTrigger(triggerKey("oldTrigger", "group1"));
// obtain a builder that would produce the trigger
TriggerBuilder tb = oldTrigger.getTriggerBuilder();
// update the schedule associated with the builder, and build the new trigger
// (other builder methods could be called, to change the trigger in any desired way)
Trigger newTrigger = tb.withSchedule(simpleSchedule()
    .withIntervalInSeconds(10)
    .withRepeatCount(10))
    .build();
// replace the old trigger with the new one, keeping the same job
sched.rescheduleJob(oldTrigger.getKey(), newTrigger);
BTW, if you just want to run a job at a given hour every day, there is an easier API. Instead of:
newTrigger().
withSchedule(
cronSchedule("0 0"+mAlert+"0,0,0 * * ?")
).
build()
You can simply say:
newTrigger().
withSchedule(
dailyAtHourAndMinute(mAlert, 42)
).
build();
Have a look at Lesson 6: CronTrigger of the official tutorial.
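If you do prefer to keep a cron string, note that the concatenation in the question does not produce a valid expression: with mAlert = 9 it yields "0 090,0,0 * * ?", which has five fields instead of the six or seven Quartz expects. A minimal sketch of building the expression explicitly, assuming the hour and minute are plain ints (DailyCron and dailyAt are hypothetical names, not part of Quartz):

```java
// Hypothetical helper; it only builds the string and does not depend on Quartz.
final class DailyCron {

    /**
     * Builds a Quartz-style cron expression for "every day at hour:minute".
     * Field order: seconds, minutes, hours, day-of-month, month, day-of-week.
     */
    static String dailyAt(int hour, int minute) {
        if (hour < 0 || hour > 23 || minute < 0 || minute > 59) {
            throw new IllegalArgumentException("hour must be 0-23 and minute 0-59");
        }
        return String.format("0 %d %d * * ?", minute, hour);
    }

    public static void main(String[] args) {
        System.out.println(dailyAt(9, 0));  // morning alert
        System.out.println(dailyAt(22, 0)); // night alert
    }
}
```

The resulting string could then be passed to cronSchedule(...) in place of the concatenated one.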
Related
I'm building a system where users can set a future date (down to hours and minutes) in a calendar. At that date, a trigger calls a certain task, unique for every user.
Every user can set a different date. The system will have 10k+ users from the start, and a user can create more than one trigger.
So, assuming I have 10k users and each user creates 3 triggers on average, that gives 30k triggers with 30k different dates.
All dates are saved in a database.
I'm new to Quartz. Can this be done in a more optimized way?
I was thinking about running a task every minute that fetches the tasks that are supposed to run in the next hour and removes them from the database.
Do you have any better ideas? Has anyone used Quartz for a large number of triggers?
You have the schedule stored in the database. If I understand the idea correctly, you want Quartz to load all the upcoming tasks and execute them in the future.
This is a problematic approach:
Synchronization issues: I assume that users can edit, remove and add tasks in the database. You would have to query the database periodically to refresh the state of the Quartz jobs: remove some jobs, edit others, etc. This may not be trivial. The state of the program would be a long-lived cache that needs to be synchronised often.
Performance and scalability issues: Even if the proposed solution is OK for 30k tasks, it may not be OK for 70k or 700k tasks. Your approach is not easy to scale: adding a new machine would require an additional layer of synchronisation to decide which machine should actually execute which job (as all of them have all the tasks).
What I would propose:
Add a "stage" to the Tasks table (new, queued, running, finished, failed).
Divide your solution into several components. (Initially they can run on a single machine, but it will be easy to scale.)
Components:
Task Finder: Executed periodically (once every few seconds). Scans the database for tasks that are "new", and due soon. Sends the tasks found to Message Queue and marks the task as "queued" in the db. Marking as "queued" has to be done carefully as there can be multiple "task finders". (As an addition it may find the tasks that have been marked as "queued" or "running" more than N minutes ago and are not "finished" nor "canceled" - probably need to re-run these)
Message Queue: Connector between Task Finder and Task Executor.
Task Executor: Listens to the Message Queue and processes the tasks that it receives. Marks the tasks as "running" initially and "finished" or "failed" later on.
With this approach you can have:
multiple Task Executors on multiple machines
multiple Task Finders on multiple machines
no Single Point of Failure: even if one of the Task Finders or Executors fails, some of the tasks will be delayed, but they will be picked up and run afterwards.
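A minimal single-process sketch of the Task Finder / Message Queue / Task Executor split, under loud assumptions: a ConcurrentHashMap stands in for the Tasks table, a LinkedBlockingQueue stands in for the message queue, and all class and method names are hypothetical. The point it illustrates is the careful "new" to "queued" transition: a compare-and-set lets only one finder claim a given task, which in a real database would typically be a conditional UPDATE.

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical in-memory model of the proposed components.
final class TaskPipeline {
    enum Stage { NEW, QUEUED, RUNNING, FINISHED, FAILED }

    static final class Task {
        final String id;
        final Instant dueAt;
        final AtomicReference<Stage> stage = new AtomicReference<>(Stage.NEW);

        Task(String id, Instant dueAt) {
            this.id = id;
            this.dueAt = dueAt;
        }
    }

    final Map<String, Task> table = new ConcurrentHashMap<>();     // stand-in for the Tasks table
    final BlockingQueue<Task> queue = new LinkedBlockingQueue<>(); // stand-in for the message queue

    /** Task Finder pass: claims every NEW task due at or before the cutoff and returns how many it queued. */
    int findDue(Instant cutoff) {
        int queued = 0;
        for (Task t : table.values()) {
            // compareAndSet succeeds for exactly one finder, so a task is never queued twice
            if (!t.dueAt.isAfter(cutoff) && t.stage.compareAndSet(Stage.NEW, Stage.QUEUED)) {
                queue.add(t);
                queued++;
            }
        }
        return queued;
    }

    /** Task Executor pass: drains the queue, runs each task and records the outcome. */
    int executeQueued(Consumer<Task> work) {
        int processed = 0;
        Task t;
        while ((t = queue.poll()) != null) {
            t.stage.set(Stage.RUNNING);
            try {
                work.accept(t);
                t.stage.set(Stage.FINISHED);
            } catch (RuntimeException e) {
                t.stage.set(Stage.FAILED);
            }
            processed++;
        }
        return processed;
    }
}
```

A finder would call findDue(Instant.now().plusSeconds(n)) every few seconds, and each executor would call executeQueued with the real task logic; the recovery of tasks stuck in "queued" or "running" is left out of this sketch.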
This may not address all the scenarios but would be a good starting point.
I don't see why you need quartz here at all. As far as I remember, quartz is best used to schedule backend internal processes, not user-defined tasks obtained from db.
Just process the trigger as it is created: save a row to your tasks table with start_date based on the trigger, and every second select all incomplete tasks with start_date < sysdate. If the job is repeating, calculate the next execution time and insert a new task row or update the previous one accordingly.
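One polling pass of that idea might look like the sketch below. It is only an illustration: a List of TaskRow objects stands in for the tasks table, all names are hypothetical, and it assumes that a repeating task which missed several intervals should run once and then skip ahead rather than replay every missed run.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory version of "select incomplete tasks with start_date < sysdate".
final class DueTaskPoller {

    static final class TaskRow {
        final String name;
        Instant startDate;
        final Duration repeatEvery; // null for a one-shot task
        boolean complete;

        TaskRow(String name, Instant startDate, Duration repeatEvery) {
            if (repeatEvery != null && (repeatEvery.isZero() || repeatEvery.isNegative())) {
                throw new IllegalArgumentException("repeatEvery must be positive");
            }
            this.name = name;
            this.startDate = startDate;
            this.repeatEvery = repeatEvery;
        }
    }

    /** Runs every incomplete row whose startDate is before now and returns the names it ran. */
    static List<String> poll(List<TaskRow> rows, Instant now) {
        List<String> executed = new ArrayList<>();
        for (TaskRow row : rows) {
            if (row.complete || !row.startDate.isBefore(now)) {
                continue; // finished already, or not due yet
            }
            executed.add(row.name); // the real task logic would run here
            if (row.repeatEvery == null) {
                row.complete = true;
            } else {
                // assumption: move to the first execution time that is not in the past
                do {
                    row.startDate = row.startDate.plus(row.repeatEvery);
                } while (row.startDate.isBefore(now));
            }
        }
        return executed;
    }
}
```

Calling poll(rows, Instant.now()) once per second from a single scheduled thread gives the behaviour described above without any per-user Quartz trigger.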
As Sam pointed out there are some nice topics addressing the same problem:
Quartz Performance
Quartz FAQ
In a system like the one described, handling this number of triggers should mostly not be a problem. But in my experience it is better to create something like a "JobChecker". If you let your users create their own triggers, it could really break Quartz in some cases. For example, if 5000 users create an event for the exact same time, Quartz will have a hard time handling them correctly. (This situation is not likely to occur often, but it is possible, as your specification does not exclude it.) Quartz has difficulties only when a lot of triggers have to fire at the same time.
My recommendation is to create one job that runs every hour/minute etc. and handles all user-set events. This is similar to a cron job. With this kind of processing your system will be pretty stable even if the number of "triggers" increases dramatically. Basically, your line of thought is correct if you strive for scalability.
Related: Quartz Clustering - triggers duplicated when the server starts
I'm using Quartz Scheduler to manage scheduled jobs in a java-based clustered environment. There are a handful of nodes in the cluster at any given time, and they all run Quartz, backed by a data store in a postgresql database that all nodes connect to.
When an instance is initialized, it tries to create or update the jobs and triggers in the Quartz data store by executing this code:
private void createOrUpdateJob(JobKey jobKey, Class<? extends org.quartz.Job> clazz, Trigger trigger) throws SchedulerException {
    JobBuilder jobBuilder = JobBuilder.newJob(clazz).withIdentity(jobKey);
    if (!scheduler.checkExists(jobKey)) {
        // if the job doesn't already exist, we can create it, along with its trigger. this prevents us
        // from creating multiple instances of the same job when running in a clustered environment
        scheduler.scheduleJob(jobBuilder.build(), trigger);
        log.error("SCHEDULED JOB WITH KEY " + jobKey.toString());
    } else {
        // if the job has exactly one trigger, we can just reschedule it, which allows us to update the schedule for
        // that trigger.
        List<? extends Trigger> triggers = scheduler.getTriggersOfJob(jobKey);
        if (triggers.size() == 1) {
            scheduler.rescheduleJob(triggers.get(0).getKey(), trigger);
            return;
        }
        // if for some reason the job has multiple triggers, it's easiest to just delete and re-create the job,
        // since we want to enforce a one-to-one relationship between jobs and triggers
        scheduler.deleteJob(jobKey);
        scheduler.scheduleJob(jobBuilder.build(), trigger);
    }
}
This approach solves a number of problems:
If the environment is not properly configured (i.e. jobs/triggers don't exist), then they will be created by the first instance that launches
If the job already exists, but I want to modify its schedule (change a job that used to run every 7 minutes to now run every 5 minutes), I can define a new trigger for it, and a redeploy will reschedule the triggers in the database
Exactly one instance of a job will be created, because we always refer to jobs by the specified JobKey, which is defined by the job itself. This means that jobs (and their associated triggers) are created exactly once, regardless of how many nodes are in the cluster, or how many times we deploy.
This is all well and good, but I'm concerned about a potential race condition when two instances are started at exactly the same time. Because there's no global lock around this code that all nodes in the cluster will respect, if two instances come online at the same time, I could end up with duplicate jobs or triggers, which kind of defeats the point of this code.
Is there a best practice for automatically defining Quartz jobs and triggers in a clustered environment? Or do I need to resort to setting my own lock?
I am not sure if there is a better way to do this in Quartz. But in case you are already using Redis or Memcache, I would recommend letting all instances perform an atomic increment against a well known key. If the code you pasted is supposed to run only one job per cluster per hour, you could do the following:
// number of whole hours since the epoch: the same value on every node within a given hour
long timestamp = System.currentTimeMillis() / 1000 / 60 / 60;
String key = String.format("%s_%d", jobId, timestamp);
// this will only be true for one instance in the cluster per (job, timestamp) tuple
boolean shouldExecute = redis.incr(key) == 1;
if (shouldExecute) {
    // run the mutually exclusive code
}
The timestamp gives you a moving one-hour window within which the instances compete to execute the job.
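To see the window behaviour without a Redis server, here is a minimal sketch in which a ConcurrentHashMap of counters stands in for Redis INCR (HourlyClaim and its methods are hypothetical names; a real deployment would call the Redis client instead and would probably also want the keys to expire):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in: an in-memory counter map plays the role of Redis INCR.
final class HourlyClaim {
    private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    /** Atomically increments the counter for key and returns the new value, like Redis INCR. */
    long incr(String key) {
        return counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }

    /** Builds the per-hour key: every call within the same hour produces the same key. */
    static String keyFor(String jobId, long epochMillis) {
        long hourWindow = epochMillis / 1000 / 60 / 60;
        return String.format("%s_%d", jobId, hourWindow);
    }

    /** True only for the first caller per (job, hour) tuple. */
    boolean shouldExecute(String jobId, long epochMillis) {
        return incr(keyFor(jobId, epochMillis)) == 1;
    }

    public static void main(String[] args) {
        HourlyClaim claim = new HourlyClaim();
        long now = System.currentTimeMillis();
        System.out.println(claim.shouldExecute("alerts", now)); // first caller wins
        System.out.println(claim.shouldExecute("alerts", now)); // everyone else in this hour loses
    }
}
```

Because the key changes when the hour rolls over, the first caller in the next window wins again, which is what makes the scheme suitable for "once per cluster per hour" work.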
I had (almost) the same problem: how to create triggers and jobs exactly once per software version in a clustered environment. I solved the problem by assigning one of the cluster nodes to be the lead node during start-up and letting it re-create the Quartz jobs. The lead node is the one that first successfully inserts the git revision number of the running software into the database. Other nodes use the Quartz configuration created by the lead node. Here's the complete solution: https://github.com/perttuta/quartz
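The "first successful insert wins" idea can be sketched as follows. This is not taken from the linked repository: it assumes the revision is stored under a unique key so that a second insert of the same revision fails, and it models that table with a ConcurrentHashMap (LeadElection and its methods are hypothetical names).

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model: the map plays the role of a table with a unique key on the revision.
final class LeadElection {
    private final ConcurrentHashMap<String, String> revisions = new ConcurrentHashMap<>();

    /** Returns true only for the node whose insert of this revision succeeds first. */
    boolean tryBecomeLead(String gitRevision, String nodeId) {
        // putIfAbsent is atomic, like an INSERT that fails on a duplicate key
        return revisions.putIfAbsent(gitRevision, nodeId) == null;
    }

    /** Returns the node that won the election for this revision, or null if none has yet. */
    String leadFor(String gitRevision) {
        return revisions.get(gitRevision);
    }
}
```

Only the node for which tryBecomeLead returns true would go on to re-create the jobs and triggers; deploying a new revision starts a new election.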
I'm looking for a way to simulate or force a trigger misfire programmatically. Here's the scenario:
I have a job set to trigger but the job requires some underlying resource that may be unavailable at times. If the resource is unavailable, I would like Quartz to re-fire the trigger later based on the misfire policy.
I've explored two options that are similar but not quite what I'm looking for:
Throwing a JobExecutionException with refireImmediately set to true: Works, but doesn't delay execution based on misfire policy; this would hammer the resource availability check.
Scheduling a second trigger at some fixed interval of time in the future: Also works, but doesn't take into account misfire policy. This means a job could wind up with a bunch of retries queued up stemming from different failed runs.
Any ideas or anything I'm missing? Thanks!
If I got it right, you don't need to force or simulate a misfire because the resource availability is "something your Job can handle".
Misfires exist and are managed by Quartz for situations like server shutdown or other "unexpected problems" that prevent Job execution.
You have two options for implementing a simple fault-tolerant retry logic.
Your Job can execute its logic only when the underlying resource is available, so you can:
Wait for the resource to become available:
in this case your job waits and repeats the check for resource availability; eventually, after a short timeout, the job can give up and end.
Just do nothing if the resource is not available:
in this case the job ends without doing anything, and it will fire normally and retry according to the Trigger definition.
In both cases, if the resource is available, the Job can execute its inner logic and use the underlying resource. (No misfires, because the Job was actually executed.)
This can be done using a Trigger, setting the misfire policy you need in case the Job cannot be executed at all. See this great and detailed article on Quartz misfires.
In your situation, if you want to execute a job once every day:
Define a Cron trigger that fires multiple times a day within a given span of time, for example, every 15 minutes from 8:00 AM to 12:45 PM:
0 0/15 8-12 * * ?
Build a Job that uses one of the two approaches described above.
The first time your Job's inner logic executes (that is, the resource is available), your job saves a "job executed" flag with the day of execution somewhere in the DB.
On the following trigger executions, the Job checks the flag and does not execute its inner logic again.
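The steps above can be sketched without Quartz as a small guard object. This is only an illustration: the Set of dates stands in for the "job executed" flag in the DB, the resource check and the inner logic are passed in, and all names are hypothetical. In a real Quartz Job, onFire would be called from execute(...).

```java
import java.time.LocalDate;
import java.util.HashSet;
import java.util.Set;
import java.util.function.BooleanSupplier;

// Hypothetical guard implementing "run the inner logic at most once per day".
final class OncePerDayJob {
    private final Set<LocalDate> executedDays = new HashSet<>(); // stand-in for the flag stored in the DB
    private final BooleanSupplier resourceAvailable;
    private final Runnable innerLogic;

    OncePerDayJob(BooleanSupplier resourceAvailable, Runnable innerLogic) {
        this.resourceAvailable = resourceAvailable;
        this.innerLogic = innerLogic;
    }

    /** Called on every trigger firing; returns true if the inner logic ran this time. */
    synchronized boolean onFire(LocalDate today) {
        if (executedDays.contains(today)) {
            return false; // already done today, later firings are no-ops
        }
        if (!resourceAvailable.getAsBoolean()) {
            return false; // do nothing; the next firing (15 minutes later) retries
        }
        innerLogic.run();
        executedDays.add(today); // set the flag only after the logic ran without throwing
        return true;
    }
}
```

With the cron trigger firing every 15 minutes, the retry delay comes from the trigger itself, so no misfire has to be forced.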
Also, if your job takes a long time to finish, you may want to prevent concurrent execution of the same job using the following annotation on the job implementation:
@DisallowConcurrentExecution
See the Quartz tutorials for more information on job execution.
I have 5 different Quartz schedulers which run 5 different jobs. When I stop one scheduler, the remaining schedulers stop too. Why?
I'm pretty sure you're actually creating references to the same Scheduler; you need to give each scheduler a different "SchedulerName". At the moment it looks like each time you create a new scheduler, it's defaulting the SchedulerName.
The "job executor" is actually not the SchedulerFactoryBean. It is the Scheduler bean (to be precise, calling its start method invokes the aggregated QuartzScheduler.start method, which fires the Triggers) provided by the SchedulerFactoryBean. As a matter of fact, this Scheduler is stored (and looked up) under the schedulerName (which, if not explicitly set, has the same default value for every configured SchedulerFactoryBean) in the SchedulerRepository singleton (SchedulerRepository.getInstance()).
That is why, unless you set a different schedulerName for each of your SchedulerFactoryBeans, you will always get the same scheduler from every SchedulerFactoryBean.
http://forum.springsource.org/showthread.php?40945-Multiple-Quartz-SchedulerFactoryBean-instances
I know this refers to Spring beans, but I still think the same applies here.
I am working on an application where we have hundreds of jobs that need to be scheduled for execution.
Here is my sample quartz.properties file:
org.quartz.scheduler.instanceName=QuartzScheduler
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.threadPool.threadCount=7
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.MSSQLDelegate
org.quartz.jobStore.tablePrefix = QRTZ_
org.quartz.jobStore.dataSource = myDS
org.quartz.dataSource.myDS.driver=com.mysql.jdbc.Driver
org.quartz.dataSource.myDS.URL=jdbc:mysql://localhost:3306/quartz
org.quartz.dataSource.myDS.user=root
org.quartz.dataSource.myDS.password=root
org.quartz.dataSource.myDS.maxConnections=5
Although this is working fine, we are planning to separate the jobs into different groups so that they are easier to maintain.
Groups will be unique, and we want that when a user (Admin) creates a new group, a new scheduler instance is created and all jobs within that group are handled by that scheduler instance from then on.
This means that if the Admin creates a new group, say NewProductNotification, we should be able to create a scheduler instance with the same name, NewProductNotification, and all jobs that are part of the NewProductNotification group should be handled by the NewProductNotification scheduler instance.
How is this possible, and how can we store this information in the database so that the next time the server comes up Quartz knows about all the scheduler instances? Or do we need to add the information about each new instance to the properties file?
As the properties file above shows, we are using jdbcjobstore to handle everything using the database.
I don't think dynamically creating schedulers is a good design approach in Quartz. You can share the same database tables between multiple schedulers (job details and triggers have the scheduler name as part of their primary key), but a Scheduler is kind of heavyweight.
Can you explain why you really need separate schedulers? Maybe you can simply use job groups and trigger groups (you are in fact using the term group) to distinguish jobs/groups? You can also use different priorities for each trigger.
As a side note:
I'm using JobStoreCMT and I'm seeing deadlocks, what can I do?
Make sure you have at least number-of-threads-in-thread-pool + 2 connections in your datasources.
And in your configuration (reverse the values and it will be fine):
org.quartz.threadPool.threadCount=7
org.quartz.dataSource.myDS.maxConnections=5
From: I'm using JobStoreCMT and I'm seeing deadlocks, what can I do?
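A quick way to check a configuration against that rule of thumb is sketched below. It only reads the two property keys shown in the question; QuartzPoolCheck and its methods are hypothetical names, and applying the FAQ's JobStoreCMT rule to a JobStoreTX setup is this answer's suggestion rather than a documented requirement.

```java
import java.util.Properties;

// Hypothetical helper that applies the "thread count + 2 connections" rule of thumb.
final class QuartzPoolCheck {

    /** Reads a required integer property, failing loudly when it is missing. */
    private static int intProperty(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return Integer.parseInt(value.trim());
    }

    /** True when maxConnections of the named data source is at least threadCount + 2. */
    static boolean hasEnoughConnections(Properties props, String dataSourceName) {
        int threads = intProperty(props, "org.quartz.threadPool.threadCount");
        int connections = intProperty(props, "org.quartz.dataSource." + dataSourceName + ".maxConnections");
        return connections >= threads + 2;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("org.quartz.threadPool.threadCount", "7");
        props.setProperty("org.quartz.dataSource.myDS.maxConnections", "5");
        // the configuration from the question does not satisfy the rule
        System.out.println(hasEnoughConnections(props, "myDS"));
    }
}
```

With the two values swapped (5 threads, 7 connections) the check passes, since 7 is exactly 5 + 2.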
Dynamically creating schedules is very much possible. You would need to create JobDetail and Trigger objects and pass them to the SchedulerFactoryBean object. It will take care of the rest.