I'm looking to use the Quartz scheduler in my application because I have a clustered environment and want to guarantee that only one instance of my job runs each hour. My question is: do I have to use a JDBC job store, or some other sort of "outside" storage of job data, to guarantee that only one instance in my cluster runs the job at any given hour, or is there more magic to Quartz than I am aware of?
Yes, you need to use the JDBC-JobStore, or else the TerracottaJobStore to enable a mechanism for the nodes to communicate with each other (in the one case they communicate in the db tables, in the other via the Terracotta networking features).
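As a rough illustration of the JDBC option, a clustered setup is usually expressed in quartz.properties along the lines below. This is a sketch, not a tested configuration: the scheduler name, the data source name myDS, the PostgreSQL driver and delegate, the URL, and the credentials are all placeholders assumed for the example, and the Quartz tables are assumed to already exist in that database.

```properties
# Every node uses the same instanceName; AUTO gives each node its own instanceId.
org.quartz.scheduler.instanceName = MyClusteredScheduler
org.quartz.scheduler.instanceId = AUTO

# JDBC job store shared by all nodes, with clustering switched on.
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
org.quartz.jobStore.dataSource = myDS
org.quartz.jobStore.isClustered = true
org.quartz.jobStore.clusterCheckinInterval = 20000

# Placeholder connection details; every node must point at the same database.
org.quartz.dataSource.myDS.driver = org.postgresql.Driver
org.quartz.dataSource.myDS.URL = jdbc:postgresql://dbhost:5432/quartz
org.quartz.dataSource.myDS.user = quartz
org.quartz.dataSource.myDS.password = changeme
org.quartz.dataSource.myDS.maxConnections = 5
```

With every node started against the same tables, each trigger firing is claimed by one node only, which is the guarantee the question asks about.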
Related
I have a question on Tomcat clustering. I have a Java application in which we have implemented in-memory caching. When Tomcat starts, it loads a few objects from the database. These objects are held in Tomcat's memory as static objects, so whenever we update something from the application, it writes to the database and also updates the object in memory.
My question is: if we implement clustering in Tomcat with 2 or more nodes, will those cached objects also be shared? Is that possible? I don't think it is. HttpSession objects can be shared using the session replication provided by Tomcat's DeltaManager or BackupManager. But can the in-memory objects also be shared?
Additionally, what happens to batch jobs that are running? Will they also run multiple times, since there will be multiple Tomcat instances in the cluster and each would trigger the job? That would be a failure as well.
Any thoughts or ideas?
If you save something in memory, it will not be replicated unless you implement something specifically to send it to the other machines. Each JVM keeps its memory independent of the others.
In general, if you want to have replicated caching, a good solution is to use ehcache (http://www.ehcache.org/).
With regard to batch jobs, it depends on the library you use, but generally, if you use an established library (like http://www.quartz-scheduler.org/), it should be capable of making sure that only one instance runs the job, though you may need to configure it to do so.
The important thing is to test to make sure that any solution you put in place actually does what you expect it to do.
Good luck!
Whenever moving to a cluster or a cluster-like topology you need to revise your application solution design/architecture to make sure it will support multiple instance execution.
Data cached in memory by a given Tomcat instance WILL NOT be shared across instances in the cluster. You will need to move such data out of the Tomcat instance into a shared cache. Redis seems to be a popular option these days.
Job execution probably needs to be revised so that it is driven by configuration. Create a Boolean flag that your app reads and that decides whether it kicks off batch processing. Select the node in the cluster that should run the job and set the flag to true there. Set it to false on all other nodes. Note that Quartz with its default in-memory job store WILL NOT coordinate job execution across the nodes of a cluster; for that it has to be run in clustered mode against a shared job store.
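The flag approach can be sketched in a few lines of plain Java. Everything here is hypothetical: the property name batch.enabled, the class name, and the idea of loading the flag from a per-node Properties object are assumptions made for the example.

```java
import java.util.Properties;

// Hypothetical helper: decides per node whether the batch job may run.
class BatchNodeFlag {

    // Assumed property name; set it to true on exactly one node.
    static final String FLAG = "batch.enabled";

    // A missing flag is treated as false, so a node never runs the job by accident.
    static boolean isBatchNode(Properties config) {
        return Boolean.parseBoolean(config.getProperty(FLAG, "false"));
    }

    // Runs the job only on the flagged node and reports whether it ran.
    static boolean runIfBatchNode(Properties config, Runnable job) {
        if (!isBatchNode(config)) {
            return false;
        }
        job.run();
        return true;
    }
}
```

The trade-off is that the flagged node becomes a single point of failure: if it is down, no other node picks up the job until someone changes the configuration.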
I am working on an application which is deployed on WebSphere Application Server 8.0. This application inserts records into one table and obtains its data source via JNDI lookup.
I need to create a batch job which will read data from the above table and insert it into another table, repeatedly at a fixed interval. It will be deployed on the same WAS server and use the same JNDI lookup for the data source.
I read on the internet that WebSphere Application Server scheduling is an option and is done using EJB and session beans.
I also read about the JDK's ScheduledThreadPoolExecutor. I could create a WAR containing a ScheduledThreadPoolExecutor implementation and deploy it on WAS for this.
I tried to find the difference between these two in terms of usage, complexity, performance and maintainability, but could not.
Please help me decide which approach is better for scheduling the insert batch jobs, and why. If the WAS scheduler is better, please also point me to a link on how to create and deploy it.
Thanks!
The major differences between WAS Scheduler and Java SE ScheduledThreadPoolExecutor are that WAS Scheduler is transactional (task execution can roll back or commit), persistent (tasks are stored in a database), and can coordinate across members of a cluster (such that tasks can be scheduled from any member but only run on one member). ScheduledThreadPoolExecutor is a much lighter-weight approach because it doesn't have any of these capabilities and does all of its scheduling within a single JVM. Task executions neither roll back nor retry, and are not kept externally in a database in case the server goes down. It should be noted that WebSphere Application Server also has CommonJ TimerManager (and AlarmManager via WorkManager), which are more similar to what you get with ScheduledThreadPoolExecutor, if that is what you want. In that case, the application server still manages the threads and ensures that the context of the scheduling thread is available on the thread of execution. Hope this helps with your decision.
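For comparison, the lightweight option looks roughly like the sketch below, using only the JDK. The class and method names are made up for the example, the task stands in for the real read-and-insert logic, and the latch and timeout exist only so the sketch terminates; a real deployment would keep the executor running until the application shuts down.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical names; the task is a placeholder for the table-to-table copy.
class FixedIntervalBatch {

    // Runs the task every periodMillis until it has run `runs` times or the timeout expires.
    // Returns true if all runs happened in time.
    static boolean runRepeatedly(Runnable task, long periodMillis, int runs, long timeoutMillis) {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        CountDownLatch remaining = new CountDownLatch(runs);
        executor.scheduleAtFixedRate(() -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // If an exception escaped, the executor would cancel all later runs.
                System.err.println("batch run failed: " + e);
            } finally {
                remaining.countDown();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            return remaining.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Nothing is persisted: pending runs are simply dropped here.
            executor.shutdownNow();
        }
    }
}
```

Note what is absent compared with the description above: there is no transaction around each run, no record of the schedule outside the JVM, and no coordination with other cluster members, so every server that deploys this code runs its own copy of the job.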
I want to use Quartz Scheduler framework in my application. I came across two types of JobStores:
1) RAM Job Store
2) JDBC Job store.
I am wondering which job store I should use in which case, and what the pros and cons of each are.
Any thoughts on this would be really helpful, and I would appreciate them.
JDBC job store saves information about fired triggers and jobs in the database, thus:
it won't lose firings if the application was down when a trigger was supposed to fire (this depends on the chosen misfire instruction)
you can cluster your scheduler, where each node uses the same database
JDBC job store is considerably slower
RAM job store is applicable only in a non-clustered application where losing a firing is not a big deal. It's also much faster. If you want to use Quartz with RAM job store, most likely you don't need Quartz at all. Both Spring and EJB provide mechanisms to run periodic jobs, both time and CRON based.
The RAM Job Store is very fast, but very volatile - jobs won't survive a server restart.
The JDBC Job Store is a little slower, but since the jobs are in a persistent store (the database), they will survive a restart.
So, if you only have short-lived job schedules, and it's ok to lose them when the server restarts or the application is redeployed, then you can use the RAM Job Store.
If you need the assurance that your jobs will survive a shutdown / restart, then you should use the JDBC job store.
I have a webapp which will run on 2 different machines. From the webapp it is possible to "order" jobs to be executed at specific times by Quartz. Quartz runs inside the webapp, and therefore on each of the two machines.
I am using JDBC datastore to persist the jobs, triggers etc.
However, the idea is that only one of the machines will run jobs and the other will only use quartz to schedule jobs. Thus, the scheduler will only be started (scheduler.start()) on one of the machines.
In the documentation it says
Never run clustering on separate machines, unless their clocks are synchronized using some form of time-sync service (daemon) that runs very regularly (the clocks must be within a second of each other). See http://www.boulder.nist.gov/timefreq/service/its.htm if you are unfamiliar with how to do this.
Never start (scheduler.start()) a non-clustered instance against the same set of database tables that any other instance is running (start()ed) against. You may get serious data corruption, and will definitely experience erratic behavior.
And I'm not sure that the two machines on which my webapp is running have their clocks synchronized.
My question is this: should I still run Quartz in clustering mode for this setup, when only one of the Quartz instances will be started and run jobs while the other instance will only be used for scheduling jobs to be executed by the first instance?
What about simply starting the scheduler on one node only and accessing it remotely on another machine? You can schedule jobs using RMI/JMX. Or you can use a RemoteScheduler adapter.
Basically instead of having two clustered schedulers where one is working and another one is only accessing the shared database you have only a single scheduler (server) and you access it from another machine, scheduling and monitoring jobs via API.
If you will never call the start() method on the second node, then you shouldn't need to worry about clock synchronization.
However, you will want to set the org.quartz.jobStore.isClustered config property to true to make sure that table-based locking is used when data is inserted/updated/deleted by both nodes.
I need to be able to run some scheduled tasks (reports) for an EJB application running on JBoss 4.2.
In my initial implementation I am using a servlet in an associated WAR to read some configuration from a properties file and then reset the scheduled tasks using the Timer Service API. This works but it seems a bit awkward to have the initialization off in a web project. Also I'm not sure if this will work as expected when the app is deployed in a clustered environment.
What is the best practice for accomplishing this type of task? Should I be using something other than Timer Service, and is there a better way to initialize the timers when the server starts?
Maybe have a look at Quartz Scheduler. Quoting its website:
Quartz is a full-featured, open source job scheduling system that can be integrated with, or used along side virtually any J2EE or J2SE application - from the smallest stand-alone application to the largest e-commerce system. Quartz can be used to create simple or complex schedules for executing tens, hundreds, or even tens-of-thousands of jobs; jobs whose tasks are defined as standard Java components or EJBs. The Quartz Scheduler includes many enterprise-class features, such as JTA transactions and clustering.
I've used it in the past to trigger EJB jobs and the whole solution worked very well, with very good scalability. To use it with EJB, you'll need to use the JobStoreCMT to store scheduling information (jobs, triggers and calendars). To tune resources for job execution, have a look at the Configure ThreadPool Settings doc. Then, just let the EJB client do its job of load balancing requests over the different instances if the EJBs are deployed on a cluster.
Quartz itself can also be clustered to get both high availability and scalability through fail-over and load balancing if required.
Regarding the properties file you mentioned, I'm not sure of what kind of data you need to read exactly but, without a servlet, if you need to read something, you'll have to read it from the database.