Following is my requirement:
On a periodical basis, a "request" needs to be enqueued into a queue
The different parameters of the "request", the periodicity of execution, start date of execution, and the last run date are stored in a database.
There can be few thousands of such requests in the database.
Now, I want to write a scheduler which polls the database regularly, and when the current date is equal to the "last run date" + "periodicity", the request should be enqueued.
Please suggest the alternatives availabe for schedulers
This scheduler should be capable of running on multiple hosts
Thanks,
Hima
Have you looked at Quartz Scheduler? It may well do all you need very simply. I haven't used it myself, but I've heard good things about it.
Related
I have a requirement to send SOAP message to lot of devices everyday at a certain time. I will get the time from a tomcat parameter in web.xml. Something like;
<context-param>
<param-name>DailyTime</param-name>
<param-value>04:00</param-value>
</context-param>
I must create a separate thread that sends the messages. Time will be in 24-hrs format.
The problem is, as a starter i have no idea where to start or how to do it. Can you guys please point me in the right direction or give me some tips, which will help me greatly.
Thank You Everyone :)
You have several options. The two I've used most in the past are:
1) Schedule a cron job to run at the time(s) you want, and have it call an executable java class / jar file.
2) Use a scheduler library like Quartz
Regarding #1 - this assumes you're using a *nix system. If you're using Windows, you can schedule tasks through the Task Scheduler.
Regarding #2 - this gives you more flexibility on the conditions of running a task/job. For example, you could schedule a job to run every 1 minute, but not to start a new job until any existing job is complete.
Anecdotal remark from a version of Quartz circa 2006 - on WebSphere, it seems that my quartz jobs were getting executed by some background thread that made jobs take hours which should have only taken a few seconds. But that was almost a decade ago, and certainly quartz (and hopefully websphere) have vastly improved.
I have a piece of the same software, running on 2 servers. Each of this piece of software runs Quartz that lunch at a specific time a job to be executed.
Presuming that both servers have the clocks synchronized, these 2 jobs will start at the same time doing the same thing...
How can I do that only one job to run and the other one don't ?
I have a database also, and my first thought was to make a table where to insert a line when the job starts and also to verify if there is a record for current day (if so then skip job execution...)
But again, if the clocks on servers are perfectly synchronized then both apps will write and check at the same time, making useless this mechanism.
What other solution can I implement ?
Cluster 'em!
(source: quartz-scheduler.org)
Basically both of your Quartz schedulers use the same database to synchronize and will only run job on one (idle) machine.
But again, if the clocks on servers are perfectly synchronized then both apps will write and check at the same time, making useless this mechanism.
This exactly what Quartz does! However it uses some database locking/transaction mechanisms so that when one instance fetches new jobs, the second one must wait.
Presuming that both servers have the clocks synchronized
They must have synchronized clocks to run in a cluster:
Never run clustering on separate machines, unless their clocks are synchronized using some form of time-sync service
We're developing a web app and are coming to the end of development, and the client we're working with has suddenly sprung the fact on us that we will need to be able to handle load balancing.
We have batch jobs running which would need to run on both servers, but we don't want them to overlap. They are selecting rows from the database, processing the objects, and merging them back into the database. One of these jobs MUST run at the same time each day, though the others run every n minutes. We have about a week at most to get something working, and it'll become technical debt for us.
My question is: what quick and dirty hacks exist to get this working properly? We have a SQLServer 2008 instance and are running Java EE 6 on JBoss 5, which will be load balanced between two servers. We're using Spring 3 and JPA2 backed by Hibernate, and using the stock spring scheduler to schedule and run the jobs. Help me Obi Wan Kenobi; you're my only hope!
on jboss5 u need to use Scheduler API as the simplest solution - the implmentation is built on top of quartz and generally you would user clustered configuration like described here
http://quartz-scheduler.org/documentation/quartz-2.x/configuration/ConfigJDBCJobStoreClustering
Almost 10 years after this question was asked, I had the same need and the "quickest and dirty-ess" solution for me was a load balancer using shared file system without any master.
Each worker locks->picks the jobs from the shared file system, independent of other workers. To balance load, each worker sleeps X seconds between each job polling iteration, where X is proportional to load on the worker (in my case load is count of processes started by worker in the background). Thus high load sleeper gives higher probability to other workers to pick up the next job. Worker loops are running under supervisor (linux).
My use case was execution of sparklyr client-mode jobs on Spark/Hadoop cluster without overloading the edge nodes. It was implemented as a bash script within few hours and then scaled to 3 hosts, and has been stable for some months now - till there is time to invest in a better solution.
I am looking for ideas on how to deal with a search related task which takes more than usual time (in human terms more than 3 seconds)
I have to query multiple sources, sift through information for the first time and then cache it in the DB for later quick return.
The context of the project is J2EE, Spring and Hibernate (on top of SpringROO)
The possible solutions I could think of
-On the webpage let the user know that task is running in background, if possible give them a queue number or waiting time. Refresh the page via a controller which basically checks if the task is done, then when its done (ie the search result is prepared and stored in DB) then just forward to a new controller and fetch the result from the DB
-The background tasks could be done with Spring Task executor. I am not sure if it is easy to give a measure of how long it would take. It would probably be a bad idea to let all the search terms run concurrently, so some sort of pooling will be a good idea.
-Another option to use background tasks is to use JMS. This is perhaps a solution with more control (retries etc)
-Spring batch also comes to mind
Please suggest how you would do it. I would greatly appreciate a semi-detailed+ description. The sources of info can be man and can be sequential in nature so it can take upto 4-5 minutes for the results to form. It is also possible that such tasks run automatically in the background without user intervention (ie to update from the sources)
From a user perspective, I use AJAX. The default web page contains some kind of "Busy" indicator. When the AJAX request completes, the busy indicator is replaced with the result.
In the background, request handlers are already multi-threaded. So you can simply format the default result, close&flush the output, and do the processing in the current thread. You should put something in the session or DB to make sure that no one can start the same heavy process a second time.
Running task pools in a web container is possible but there are some caveats, especially how to synchronize startup/shutdown: Do you want your web server to "hang" during shutdown while some thread is busy collecting your results? Also the additional load should be considered. It might be better to use JMS and offload the strain to a second server dedicated to build the search results.
Such a system will scale much better if your searches start to become a burden. It also makes it trivial to automate the process by writing a small program which posts searches in the JMS queue.
I've solved this problem in the past doing something like this:
When the user initiates a long running task, I open a popup window that displays the task status. The task status includes a name and estimated time to complete
This task is also stored in my "app" (this can be stored in the DB, session, or application context), so the user can continue doing other things on my web app while having an easy way to navigate back to the running task.
I stored my tasks in a DB, so I could manage what happens on startup and shutdown of the web app. This requires storing the progress of the task in the DB.
The tricky part is display results to the user. If you use the method I've described, you'll need to store results in either the DB, session, or application contexts.
This system I've described is pretty heavyweight, and may be overkill for your application.
In response to the comment
so what do you use to do the
background computing. I have asked
this before
I use java.util.concurrent. A lot of this depends on the nature of your application. Is the task (or steps in the task) idempotent? How critical is it that it run to completion? If you have a non-idempotent task that must run to completion, I would say you generally must record every piece of work you do, and you must do that piece of work within a transaction. For example, if one of your tasks is to email a list of people (this is definitely not idempotent) you would do the emailing in a "transaction" (I'm using the term lightly here) and store your progress after each transaction is complete.
I'm trying to find the best solution for periodic task running in parallel. Requirements:
Java (Spring w/o Hibernate).
Tasks are being managed by front-end application and stored in MySQL DB (fields: id, frequency (in seconds), <other attributes/settings about task scenario>). -- Something like crontab, only with frequency (seconds) field, instead of minutes/hours/days/months/days of weeks.
I'm thinking about:
TaskImporter thread polling Tasks from DB (via TasksDAO.findToProcess()) and submitting them to queue.
java.util.concurrent.ThreadPoolExecutor running tasks (from queue) in parallel.
The most tricky part of this architecture is TasksDAO.findToProcess():
How do I know which tasks is time to run right now?
I'm thinking about next_run Task field, which would be populated (UPDATE tasks SET next_run = TIMESTAMPADD(SECOND, NOW(), frequency) WHERE id = ? straight after selection (SELECT * FROM tasks WHERE next_run IS NULL OR next_run <= NOW() FOR UPDATE). The problem: Have to run lots of UPDATES for lots of SELECT'ed tasks (UPDATE for each Task or bulk UPDATE) + concurrency problems (see below).
Ability to run several concurrent processing applications (cloud), using/polling same DB.
All of the concurring processing applications must run concrete task only once. Must lock all SELECT's from all other apps, until app A finishes updating (next_run) of all selected tasks. The problem: locking production table (front-end app) would slow things down. Table mirror?
I love simple and clean solutions and believe there's a better way to implement this processing application. Do you see any? :)
Thanks in advance.
EDIT: Using Quartz as a scheduler/executor is not an option because of syncing latency. Front-end app is not in Java and so is not able to interact with Quartz, except Webservice-oriented solution, which is not an option too, because front-end app has more data associated with previously mentioned Tasks and needs direct access to all data in DB (read+write).
I would suggest using Scheduling API like Quartz rather than using Home grown implementation.
It provides lot of API for implementation of logic and convenience. You will also have better control over jobs.
http://www.quartz-scheduler.org/
http://www.quartz-scheduler.org/docs/tutorial/index.html