I'm looking for a way to simulate or force a trigger misfire programatically. Here's the scenario:
I have a job set to trigger but the job requires some underlying resource that may be unavailable at times. If the resource is unavailable, I would like Quartz to re-fire the trigger later based on the misfire policy.
I've explored two options that are similar but not quite what I'm looking for:
Throwing a JobExecutionException with refireImmediately set to true:
Works, but doesn't delay execution based on misfire policy; this
would hammer the resource availability check.
Scheduling a second trigger at some fixed interval of time in the
future: Also works, but doesn't take into account misfire policy.
This means a job could wind up with a bunch of retries queued up
stemming from different failed runs.
Any ideas or anything I'm missing? Thanks!
If I got it right, you don't need to force or simulate a misfire because the resource availability is "something your Job can handle".
Misfires exists and are managed by Quartz for situations like server shutdown or other "unexpected problems" that prevent Job execution.
You have two options to follow to implement a simple fault-tolerance retry logic;
your Job can execute it's logic only when the underlying resource is available, so you can:
Wait for the resource to become availabile:
in this case your job waits and repeat the check for the resource availability, eventually after a short timeout, the job can give up and end.
Just do nothing if the resource is not available:
in this case the job ends without doing anything and the it will fire normally and retry according the Trigger definition.
In both cases, if the resource is avaliable, the Job can execute it's inner logic and use the underlying resource. (No misfires, because the Job was actually executed)
This can be done using a Trigger, setting the misfire policy you need in case the Job cannot be executed at all. See This great and detailed article on Quartz misfires.
In your situation, If you want to execute a job one time every day:
Define a Cron trigger that fires multiple times in a day in a given span of time, for exemple, every 15 minutes from 8:00 AM to 12:00 AM:
0 0/15 8-12 * * ?
Buld a Job that use one of the two approaches described before
The first time your Job inner logic executes (that is, the resource is available) your job will save a "job executed flag" with the day of execution somewhere on DB.
On the following trigger executions, the Job will check the flag and will not execute it's inner logic again.
Also, if your job will take long time to finish, you may want to prevent concurrent execution of the same job using the following annotation on the job implementation:
#DisableConcurrentExecution
See Quartz tutorials for more informations on job execution.
Related
I am working on a project in which I can hit maximum 15k hit a day to Google API. So I want to stop the job after 15k and resume it next day. Please let me know how can I do the same.
Please let me know how can I achieve the same. Right now I am thinking of using quartz scheduler to schedule the job every day.
If anyone needs full explanation, I can explain it more.
Thanks in advance.
You can stop a step execution (and its surrounding job) using StepExecution#setTerminateOnly. So in your case, you can use for example a ItemReadListener#afterRead or ItemWriteListener#afterWrite that has access to the step execution and set the terminateOnly flag after processing 15k items. When you stop the job gracefully like this, its status will be STOPPED and you will be able to restart it again the next day as you mentioned.
You can find an example in the Stopping a Job Manually for Business Reasons section of the reference documentation.
Hope this helps.
I had something similar where I needed to stop a 24/7 job 5 minutes before server maintenance was scheduled to start.
The easiest I found was to use the Reader and return null to indicate the job should stop. In your case, return null when 15k API requests were processed.
This will likely mean you'll need a bean (could be just an AtomicInteger) available to the Reader and updated by the Processor. But also a Job Listener (sorry, I don't have the code) which also knows about the bean. If the maximum is reached the Listener sets up a custom job exit value to be returned to the scheduler when the job stops. The scheduler has to be configurable enough to know the particular exit value means to start the job again the next day. (Any other non-zero value was treated as an error.)
This means there is a small possibility the job hits 15k but also that it is the last item, so the job is scheduled again for the next day even though there is nothing more to be processed. It shouldn't matter though - the job will start the next day and stop immediately with a normal complete status so the scheduler will not schedule again.
I have a non-concurrent quartz job running on 6 application server instances. A high level responsibility of the job is to walk through a DB table and process and update which ever row is expired. Now I see a behavior of the job which is not understandable.
I have a configuration by which the job should be triggered after 15 minutes, but as the span of a single run can be multiple days, each of this trigger after 15 minutes should be suppressed by a lock already acquired by running job instance.
So, the ideal behavior is, job starts running on one of the 6 server instances, it completes a single DB table iteration in let us say 3 days. Meanwhile, quartz is trying to push in another job every minutes, but as lock is already acquired, it should not. After 3 days when the first job run finishes quartz scheduler should succeed in starting another job, within <= 15 minutes of the first run endtime.
But, in reality I see a behavior, where the the job has run for some days and has not run for some of the days. some time this gap is as long as 8-10 days. I am unable to explain this scenario.
The closest theory I can think of, is that it might be the case that during a particular job run, the server instance got killed(due to deployment/redeployment), because of which the quartz did not get a chance to remove shared lock. So, all the attempts of acquiring a lock for next job run keep on failing till the orphan lock is not expired by an expiry date. The moment it got expired, a new job kicks in.
My question here is, what could be the possible explanations to this, and more importantly, how to debug it? Any leads to Quartz Lock management documentation for non-concurrent jobs can helpful.
I use DisallowConcurrentExecution annotation for non-concurrency.
With all due respect quartz is not very clear about repeatInterval in this question:
what happen if the method takes longer than the repeatInterval, is it going to fire the trigger even if the current method didn't finish? And is it going to cause connection pooling issue if the method create data source object?
say if the method normally takes 5 seconds to finish but may surge to 10 seconds, and repeatInterval is set to 8000 (8 seconds)
what will happen to the next occurrence of the trigger? I did some sample test and looks like it will occur in the 16th second because the first attempt at 8000 ms failed
is that the way it works? Is there any performance impact on the server?
If the method execution takes longer than specified interval, the second instance of the job will be created and run concurrently.
You can annotate your Job instance with #DisallowConcurrentExecution annotation which prevents multiple instances of the job to run concurrently.
#DisallowConcurrentExecution
public class TestJob implements Job {}
Having this, the further attempted job instances will be queued up and you have to watch out, as without any control the infinite number of job instances may queue up which may lead to issues (such as performance, race-conditions, etc).
Is there any performance impact on the server?
As 2 jobs are performing the same operation this will have performance impact, depending on the operations you are trying to perform of course.
You can find alternatives of scheduling, also different approaches of preventing concurrent execution here.
I'm building a system where users can set a future date(down to hours and minutes) in calendar. At that date a trigger is calling a certain task, unique for every user.
Every user can set a different date. The system will have 10k+ from the start and a user can create more than one trigger.
So assuming I have 10k users each user create on average 3 triggers => 30k triggers with 30k different dates.
All dates are saved in a database.
I'm new to quartz, can this be done in a more optimized way?
I was thinking about making a task run every minute that will get the tasks that will suppose to run in the next hour and remove them from database.
Do you have any better ideas? Did someone used quartz for a large number of triggers.
You have the schedule backed in the database. If I understand the idea - you want the quartz to load all the upcoming tasks to execute them in the future.
This is problematic approach:
Synchronization Issues: I assume that users can edit, remove and add new tasks to the database. You would have to periodically ask the database to refresh the state of the quartz jobs, remove some jobs, edit other jobs etc. This may not be trivial. The state of the program would be a long living cache which needs to be synchronised often.
Performance and scalability issues: Even if proposed solution may be ok for 30K tasks it may not be ok for 70k or 700k tasks. In your approach it's not easy to scale - adding new machine would require additional layer of synchronisation - which machine should actually execute which job (as all of them have all the tasks).
What I would propose:
Add the "stage" to the Tasks table (new, queued, running, finished, failed)
divide your solution into several components. (Initially they can run on a single machine but it will be easy to scale)
Components:
Task Finder: Executed periodically (once every few seconds). Scans the database for tasks that are "new", and due soon. Sends the tasks found to Message Queue and marks the task as "queued" in the db. Marking as "queued" has to be done carefully as there can be multiple "task finders". (As an addition it may find the tasks that have been marked as "queued" or "running" more than N minutes ago and are not "finished" nor "canceled" - probably need to re-run these)
Message Queue: Connector between Taks Finder and Task Executor.
Task Executor: Listens to the Message Queue and process the tasks that it received. Marks the tasks as "running" initially and "finished" or "failed" later on.
With this approach you can have:
multiple Task Executors on multiple machines
multiple Task Schedulers on multiple machines
even if one of the Task Schedulers or Executors will fail it will not be Single Point of Failure. Some of the tasks will be delayed but it will be picked up and run afterwards.
This may not address all the scenarios but would be a good starting point.
I don't see why you need quartz here at all. As far as I remember, quartz is best used to schedule backend internal processes, not user-defined tasks obtained from db.
Just process the trigger as it is created, save a row to your tasks table with start_date based on the trigger and every second select all incomplete tasks with start_date< sysdate. If the job is repeating, calculate next execution time and insert new task row / update previous accordingly.
As Sam pointed out there are some nice topics addressing the same problem:
Quartz Performance
Quartz FAQ
In a system like the mentioned it should not a problem mostly to handle this amount of triggers. But according to my experiance it is a better way to create something like a "JobChecker". If you enable your users to create own triggers it could really break Quartz in some cases. For example if 5000 user creates an event to the exact same time, Quartz will have a hard time to handle them correctly. (It is not likely a situation that will occur often, but it is possible as your specification does not excludes it.) Quartz has difficulties only when a lot of triggers should be fired at the same time.
My recommendation to this problem is to create one job that is running in every hour/minute etc and that should handle every user set events. This way is simmilar to a cron job in bash. With this kind of processing your system will be pretty stable even if the number of "triggers" increases dramatically. Basically your line of thought is correct if you thrive for scalability.
According to this Google article,
"You can also use the Task Queue to do the write at a later time, which has the added benefit that the Task Queue automatically retries failures."
Suppose I'm trying to keep my daily spend on Google App Engine under a certain budget. Let's say I start to detect I'm getting low on quota for the day so I want to reschedule the work for tomorrow. It would be great to use Task Queues for this instead of Cron jobs because the initiation of the work and the rescheduling of the work can be handled pretty similarly.
How do I put a task on the Task Queue and specify that it should not begin until a particular time? I can see how I might use RetryOptions to get part of what I want, namely to delay the work. But RetryOptions doesn't seem to provide a way to specify not to retry until 24 hours have passed since "now" or don't retry until midnight.
Thanks for your help.
Looks like I can use TaskOptions.countdownMillis(long) to specify how long to wait before executing the task.
The documentation says "later time", in the sense that your application doesn't stop to wait for your write to go through, so you work in parallel.
If you want to control WHEN to start a cleanup or something similar, look into CRON jobs