I have configured Quartz JDBCJobStore to run some schedulers on a clustered app. While testing what happens if one of the servers goes down in the middle of processing a job, I found that Quartz does not have a mechanism for the other instances of the app to figure out that a job could not be finished by one of the servers and should be reassigned to another one. The problem is that when a server fails it does not get the opportunity to release the lock it holds on the job in the database, so the job basically stays locked until someone removes the lock by hand.
Is there a built-in feature in Quartz that handles this type of situation (detecting that a job should be released from its lock)?
Related
I am moving from a single pod (Docker image) to multiple pods for my Spring application on Kubernetes to handle load. But I am facing an issue because I have a cron scheduler method in my application which runs daily at a particular time. If I deploy multiple pods, they all run it simultaneously and as a result multiple entries get saved into my DB, but I want only a single pod to execute that function.
I have thought of generating a Java UUID and saving it in the DB as the function starts execution on each pod. Then, in the same function, each pod would sleep for, let's say, 5 seconds and compare its UUID with the one in the DB. The pod which updated the value in the database last will match that value and will go on to execute the method.
Is this a good approach? If not, please give suggestions that can be done to solve this issue.
You can use the Quartz scheduler with a JDBC job store. It takes care of this requirement automatically.
In short: "Load-balancing occurs automatically, with each node of the cluster firing jobs as quickly as it can. When a trigger’s firing time occurs, the first node to acquire it (by placing a lock on it) is the node that will fire it."
Please refer to the official documentation for details: http://www.quartz-scheduler.org/documentation/quartz-2.3.0/configuration/ConfigJDBCJobStoreClustering.html
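As a rough illustration of the clustered setup described above, the sketch below builds the clustering-related Quartz properties in plain Java. The property keys are the ones documented for the JDBC job store; the scheduler name `DailyJobs`, the data source name `appDS`, and the 20-second check-in interval are placeholder assumptions, and the `org.quartz.dataSource.appDS.*` connection settings are left out.

```java
import java.util.Properties;

final class ClusteredQuartzConfigSketch {

    /** Builds the clustering-related Quartz properties for one node (names are placeholders). */
    static Properties clusteredProperties(String schedulerName, String dataSourceName) {
        Properties props = new Properties();
        // Every node in the cluster uses the same scheduler name...
        props.setProperty("org.quartz.scheduler.instanceName", schedulerName);
        // ...but needs a unique instance id; AUTO asks Quartz to generate one.
        props.setProperty("org.quartz.scheduler.instanceId", "AUTO");
        // JDBC job store that manages its own transactions.
        props.setProperty("org.quartz.jobStore.class",
                "org.quartz.impl.jdbcjobstore.JobStoreTX");
        props.setProperty("org.quartz.jobStore.driverDelegateClass",
                "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
        // Name of a data source defined elsewhere (connection settings omitted here).
        props.setProperty("org.quartz.jobStore.dataSource", dataSourceName);
        // Turns on clustering, so only the node that acquires a trigger fires it.
        props.setProperty("org.quartz.jobStore.isClustered", "true");
        // How often (ms) this node checks in with the rest of the cluster; value is an assumption.
        props.setProperty("org.quartz.jobStore.clusterCheckinInterval", "20000");
        return props;
    }

    public static void main(String[] args) {
        clusteredProperties("DailyJobs", "appDS")
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The same keys can go into a `quartz.properties` file on every pod; the resulting `Properties` object could also be handed to `new StdSchedulerFactory(props)` if Quartz is on the classpath.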
Also, you can move the workload to an isolated process, which is cleaner. A Kubernetes CronJob will give you an idea: https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/
A better approach is to configure your application with multiple Spring profiles and enable cron on only one profile. Follow this tutorial: https://faun.pub/spring-scheduler-in-multi-node-environment-49814e031e7c
I am working on an application which is deployed on WebSphere Application Server 8.0. This application inserts records into one table and obtains its data source through a JNDI lookup.
I need to create a batch job which will read data from the above table and insert it into another table continuously, at a fixed interval. It will be deployed on the same WAS server and use the same JNDI lookup for the data source.
I read on the internet that WebSphere Application Server scheduling is an option and is done using EJBs and session beans.
I also read about the JDK's ScheduledThreadPoolExecutor. I could create a WAR containing a ScheduledThreadPoolExecutor implementation and deploy it on WAS for this.
I tried to find the differences between these two in terms of usage, complexity, performance and maintainability, but could not.
Please help me decide which approach is better for creating the scheduler for the insert batch jobs, and why. If the WAS scheduler is better, please point me to a link on how to create and deploy it.
Thanks!
Some major differences between the WAS Scheduler and the Java SE ScheduledThreadPoolExecutor are that the WAS Scheduler is transactional (task execution can roll back or commit), persistent (tasks are stored in a database), and can coordinate across members of a cluster (such that tasks can be scheduled from any member but only run on one member). ScheduledThreadPoolExecutor is a much lighter-weight approach because it doesn't have any of these capabilities and does all of its scheduling within a single JVM. Task executions neither roll back nor retry, and they are not kept externally in a database in case the server goes down. It should be noted that WebSphere Application Server also has the CommonJ TimerManager (and AlarmManager via WorkManager), which are more similar to what you get with ScheduledThreadPoolExecutor, if that is what you want. In that case, the application server still manages the threads and ensures that the context of the scheduling thread is available on the thread of execution. Hope this helps with your decision.
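To make the lightweight option concrete, here is a minimal sketch of a fixed-interval task on a plain ScheduledThreadPoolExecutor. The class name, the `runFixedDelay` helper and the run limit are made up for illustration, and the batch step is a stand-in for the real read-and-insert logic. Nothing here is persisted or transactional, and the thread is not managed by the application server.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class FixedIntervalBatchSketch {

    /**
     * Runs batchStep repeatedly with a fixed delay between runs, stops after
     * `runs` executions (or a 5 second timeout), and returns how many ran.
     */
    static int runFixedDelay(Runnable batchStep, int runs, long delayMillis)
            throws InterruptedException {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(runs);
        executor.scheduleWithFixedDelay(() -> {
            if (completed.get() >= runs) {
                return; // limit reached; the executor is about to be shut down
            }
            try {
                batchStep.run();
            } catch (RuntimeException e) {
                // An exception escaping the task would cancel all later runs,
                // and there is no rollback or retry: we can only log it.
                System.err.println("batch step failed: " + e);
            } finally {
                completed.incrementAndGet();
                done.countDown();
            }
        }, 0, delayMillis, TimeUnit.MILLISECONDS);
        try {
            done.await(5, TimeUnit.SECONDS);
        } finally {
            executor.shutdownNow(); // scheduled state lives only in this JVM
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int ran = runFixedDelay(() -> System.out.println("copy rows"), 3, 100);
        System.out.println("executions: " + ran);
    }
}
```

If the JVM stops, the schedule is simply gone, which is the trade-off the answer describes against the persistent, cluster-aware WAS Scheduler.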
I have Quartz schedulers load balanced across a cluster of four instances. Could you please let me know whether there is some simple operation that will block the Quartz scheduler from running on one instance?
All my jobs use the JDBC job store. My requirement is to prevent jobs from kicking off on one particular instance of the cluster.
Any Suggestions?
I want to use Quartz Scheduler framework in my application. I came across two types of JobStores:
1) RAM Job Store
2) JDBC Job store.
I am wondering in which cases I should use which job store, and what the pros and cons of each are.
Any thoughts on this would be really helpful, and I appreciate it.
JDBC job store saves information about fired triggers and jobs in the database, thus:
it won't lose firings if the application was down when a trigger was supposed to fire (this depends on the chosen misfire instruction)
you can cluster your scheduler, where each node uses the same database
JDBC job store is considerably slower
RAM job store is applicable only in a non-clustered application where losing a firing is not a big deal. It's also much faster. If you want to use Quartz with RAM job store, most likely you don't need Quartz at all. Both Spring and EJB provide mechanisms to run periodic jobs, both time and CRON based.
The RAM Job Store is very fast, but very volatile - jobs won't survive a server restart.
The JDBC Job Store is a little slower, but since the jobs are in a persistent store (the database), they will survive a restart.
So, if you only have short-lived job schedules, and it's ok to lose them when the server restarts or the application is redeployed, then you can use the RAM Job Store.
If you need the assurance that your jobs will survive a shutdown / restart, then you should use the JDBC job store.
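In configuration terms, the choice comes down to the `org.quartz.jobStore.class` setting. The sketch below is one hedged way to express the rule of thumb from this answer; the `jobStoreProperties` helper and the data source name are hypothetical, and the JDBC branch still needs the `org.quartz.dataSource.*` connection settings, which are omitted.

```java
import java.util.Properties;

final class JobStoreChoiceSketch {

    /** Picks job store settings depending on whether schedules must survive a restart. */
    static Properties jobStoreProperties(boolean mustSurviveRestart, String dataSourceName) {
        Properties props = new Properties();
        if (!mustSurviveRestart) {
            // In-memory store: fast, but everything is lost on restart or redeploy.
            props.setProperty("org.quartz.jobStore.class", "org.quartz.simpl.RAMJobStore");
            return props;
        }
        // Database-backed store: slower, but jobs and triggers are persisted.
        props.setProperty("org.quartz.jobStore.class",
                "org.quartz.impl.jdbcjobstore.JobStoreTX");
        props.setProperty("org.quartz.jobStore.driverDelegateClass",
                "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
        // Default prefix of the Quartz tables created by the shipped SQL scripts.
        props.setProperty("org.quartz.jobStore.tablePrefix", "QRTZ_");
        // Placeholder name; the data source itself is configured separately.
        props.setProperty("org.quartz.jobStore.dataSource", dataSourceName);
        return props;
    }

    public static void main(String[] args) {
        System.out.println("volatile:   " + jobStoreProperties(false, "appDS"));
        System.out.println("persistent: " + jobStoreProperties(true, "appDS"));
    }
}
```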
I have a webapp which will run on 2 different machines. From the webapp it is possible to "order" jobs to be executed at specific times by quartz. quartz is running inside the webapp. Thus quartz is running on each of the two machines.
I am using JDBC datastore to persist the jobs, triggers etc.
However, the idea is that only one of the machines will run jobs and the other will only use quartz to schedule jobs. Thus, the scheduler will only be started (scheduler.start()) on one of the machines.
In the documentation it says
Never run clustering on separate machines, unless their clocks are synchronized using some form of time-sync service (daemon) that runs very regularly (the clocks must be within a second of each other). See http://www.boulder.nist.gov/timefreq/service/its.htm if you are unfamiliar with how to do this.
Never start (scheduler.start()) a non-clustered instance against the same set of database tables that any other instance is running (start()ed) against. You may get serious data corruption, and will definitely experience erratic behavior.
And I'm not sure that the two machines on which my webapp is running have their clocks synchronized.
My question is this: should I still run Quartz in clustering mode for this setup, when only one of the Quartz instances will be started and run jobs while the other instance will only be used for scheduling jobs to be executed by the first instance?
What about simply starting the scheduler on one node only and accessing it remotely on another machine? You can schedule jobs using RMI/JMX. Or you can use a RemoteScheduler adapter.
Basically, instead of having two clustered schedulers where one is working and the other is only accessing the shared database, you have only a single scheduler (the server) and you access it from the other machine, scheduling and monitoring jobs via the API.
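A hedged sketch of what the RMI variant could look like in configuration: the node that runs jobs exports its scheduler, and the other node creates a proxy to it. The `org.quartz.scheduler.rmi.*` keys are Quartz's documented RMI settings; the scheduler name, host name and port below are placeholders, and the helper methods are hypothetical.

```java
import java.util.Properties;

final class RemoteSchedulerConfigSketch {

    /** Settings for the node that actually starts the scheduler and runs jobs. */
    static Properties serverProperties(String schedulerName, String host, int port) {
        Properties props = common(schedulerName, host, port);
        // Export this scheduler over RMI so other machines can call it.
        props.setProperty("org.quartz.scheduler.rmi.export", "true");
        // Create an RMI registry if none is running yet.
        props.setProperty("org.quartz.scheduler.rmi.createRegistry", "as_needed");
        return props;
    }

    /** Settings for the node that only schedules jobs through the remote scheduler. */
    static Properties clientProperties(String schedulerName, String host, int port) {
        Properties props = common(schedulerName, host, port);
        // The factory returns a proxy to the exported scheduler instead of a local one.
        props.setProperty("org.quartz.scheduler.rmi.proxy", "true");
        return props;
    }

    private static Properties common(String schedulerName, String host, int port) {
        Properties props = new Properties();
        // Both sides must use the same scheduler name to find each other.
        props.setProperty("org.quartz.scheduler.instanceName", schedulerName);
        props.setProperty("org.quartz.scheduler.rmi.registryHost", host);
        props.setProperty("org.quartz.scheduler.rmi.registryPort", Integer.toString(port));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(serverProperties("OrderJobs", "jobs-host", 1099));
        System.out.println(clientProperties("OrderJobs", "jobs-host", 1099));
    }
}
```

A single instance should set either `export` or `proxy`, not both. With this layout only one scheduler touches the job store, so clustering and clock synchronization between the two machines stop being a concern.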
If you will never call the start() method on the second node, then you shouldn't need to worry about clock synchronization.
However, you will want to set the isClustered config property (org.quartz.jobStore.isClustered) to true to make sure that table-based locking is used when data is inserted/updated/deleted by both nodes.