Running a Scheduled Job Only Once Across Multiple Instances - java

I have a scheduled job that runs at the end of every month. After running, it saves some data to the database.
When I scale the app (for example, to 2 instances), both instances run the scheduled job and both save the data, so my database ends up with duplicate data.
I want the scheduled job to run only once, regardless of how many instances are running in the cloud.

In my project, I have maintained a database table to hold a lock for each job which needs to be executed only once in the cluster.
When a job is triggered, it first tries to acquire the lock from the database, and it executes only if it gets that lock. If it fails to acquire the lock, the job does not run.
You can also look at the clustering feature of Quartz job.
http://www.quartz-scheduler.org/documentation/2.4.0-SNAPSHOT/introduction.html
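The lock-table idea in this answer could be sketched with plain JDBC roughly as follows. This is a minimal sketch under assumptions, not the poster's actual code: the table job_lock(name, locked_by, locked_until), holding one pre-inserted row per job, and the JobLock class with its method are hypothetical names. An instance owns the lock only if its conditional UPDATE changes exactly one row, so the database decides the winner even when two instances fire at the same instant.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper. Assumes a table job_lock(name PRIMARY KEY, locked_by, locked_until)
// that already contains one row per job, with locked_until initially in the past.
final class JobLock {
    static final String ACQUIRE_SQL =
        "UPDATE job_lock SET locked_by = ?, locked_until = ? "
      + "WHERE name = ? AND locked_until < ?";

    // Returns true only for the instance whose UPDATE matched the row.
    // The lease expires by itself, so a crashed instance cannot block the job forever.
    static boolean tryAcquire(Connection conn, String jobName, String owner,
                              Instant now, Duration lease) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(ACQUIRE_SQL)) {
            ps.setString(1, owner);
            ps.setTimestamp(2, Timestamp.from(now.plus(lease)));
            ps.setString(3, jobName);
            ps.setTimestamp(4, Timestamp.from(now));
            return ps.executeUpdate() == 1;
        }
    }
}
```

In the scheduled method, each instance would call JobLock.tryAcquire(...) first and return immediately when it gets false. The lease should probably be longer than the clock difference between instances, otherwise a slightly late instance could acquire the expired lock and run the job again.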

I agree with the comments. If you can utilize a scheduler that's going to be your best, most flexible option. In addition, a scheduler should be executing your job as a "task" on Cloud Foundry. The task will only run on one instance, so you won't need to worry about how many instances your application is using (the two are separate in that regard).
If you're using Pivotal Cloud Foundry/Tanzu Cloud Foundry there is a scheduler you can ask your operations team to install. I don't know about other variants of CF, but I assume there are other schedulers.
https://network.pivotal.io/products/p-scheduler/
If using a scheduler is not an option then this is a concern you'll need to handle in your application. The solution of using a shared lock is a good one, but there is also a little trick you can do on Cloud Foundry that I feel is a little simpler.
When your application runs, certain environment variables are set by the platform. There is one called INSTANCE_INDEX which has a number indicating the instance on which the app is running. It's zero-based, so your first app instance will be running on instance zero, the second instance one, etc.
In your code, simply look at the instance index and see if it's zero. If the index is non-zero, have your task end without doing anything. If it's zero, then let the task proceed and do its work. The task will execute on every application instance, but it will only do work on the first instance. It's an easy way to guarantee something like a database migration or background job only runs once.
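The instance-index check described above might look like the sketch below. It is an illustration under assumptions: InstanceGuard and isFirstInstance are hypothetical names, and the code looks up CF_INSTANCE_INDEX (the name the Cloud Foundry documentation uses) before falling back to INSTANCE_INDEX as named above. Treating a missing variable as "first instance" is a choice made here so the job still runs on a developer machine.

```java
import java.util.Map;

// Hypothetical guard; class and method names are illustrative.
final class InstanceGuard {
    // Returns true when this container reports instance index 0,
    // or when no index is set at all (assumed to be a single local instance).
    static boolean isFirstInstance(Map<String, String> env) {
        String raw = env.get("CF_INSTANCE_INDEX");
        if (raw == null) {
            raw = env.get("INSTANCE_INDEX");
        }
        if (raw == null || raw.trim().isEmpty()) {
            return true; // assumption: outside the platform there is only one instance
        }
        try {
            return Integer.parseInt(raw.trim()) == 0;
        } catch (NumberFormatException e) {
            return false; // unreadable index: skip the job rather than risk a duplicate run
        }
    }
}
```

The scheduled method would start with something like: if (!InstanceGuard.isFirstInstance(System.getenv())) return;. One thing to keep in mind is that the job is skipped entirely if instance zero happens to be down or restarting when the schedule fires.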
One final option would be to use multiple processes. This is a feature of Cloud Foundry that enables you to have different processes running, like your web process and a background worker process.
https://docs.cloudfoundry.org/devguide/multiple-processes.html
The interesting thing about this feature is that you can scale the different processes independently of each other. Thus you could have as many web processes running as you like, but only one background worker, which would guarantee that your background process only runs once.
That said, the downside of this approach is that you end up with separate containers for each process and the background process would need to continue running. The foundation expects it to be a long-running process, not a finite duration batch job. You could get around this by wrapping your periodic task in a loop or something similar that keeps the process running forever.
I wouldn't really recommend this option but I wanted to throw it out there just in case.

You can use the @SnapLock annotation on your method, which guarantees that the task only runs once. See the documentation in this repo: https://github.com/luismpcosta/snap-scheduler
Example:
Import maven dependency
<dependency>
<groupId>io.opensw.scheduler</groupId>
<artifactId>snap-scheduler-core</artifactId>
<version>0.3.0</version>
</dependency>
After importing the maven dependency, you'll need to create the required tables.
Finally, see below how to annotate methods with @SnapLock to guarantee that they only run once:
import io.opensw.scheduler.core.annotations.SnapLock;
...
@SnapLock(key = "UNIQUE_TASK_KEY", time = 60)
@Scheduled(fixedRate = 30000)
public void reportCurrentTime() {
...
}
With this approach you also get an audit of the task executions.

Related

Execute sequential from two instance of the same Java application

I have a Java application named 'X'. In Windows environment, at a given point of time there might be more than one instance of the application.
I want a common piece of code to be executed sequentially in the Application 'X' no matter how many instances of the application are running. Is that something possible and can be achieved ? Any suggestions will help.
Example: I have a class named Executor where a method execute() will be invoked. Assuming there might be two or more instances of the application at any given point of time, how can I have the method execute() run sequentially across the different instances?
Is there something like a lock which can be accessed from two instances and see if the lock is currently active or not ? Any help ?
I think what you are looking for is a distributed lock (i.e. a lock which is visible and controllable from many processes). There are quite a few 3rd party libraries that have been developed with this in mind and some of them are discussed on this page.
Distributed Lock Service
There are also some other suggestions in this post which use a file on the underlying system as a synchronization mechanism.
Cross process synchronization in Java
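For instances running on the same machine, the file-based synchronization mentioned above can be sketched with java.nio file locks. This is only a sketch under assumptions: CrossProcessLock and runExclusively are hypothetical names, and every instance is assumed to agree on the same lock file path.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; all application instances must use the same lock file.
final class CrossProcessLock {
    // Runs the action while holding an exclusive lock on lockFile.
    // Another process calling this with the same file blocks in channel.lock()
    // until the current holder leaves the try block.
    static void runExclusively(Path lockFile, Runnable action) throws IOException {
        try (FileChannel channel = FileChannel.open(lockFile,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {
            action.run();
        }
    }
}
```

Each instance would wrap its call as CrossProcessLock.runExclusively(path, () -> executor.execute()). File locks are held on behalf of the whole JVM, so this serializes separate processes but is not a substitute for synchronized between threads of one process.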
To my knowledge, you cannot do this that easily. You could implement TCP calls between processes... but I wouldn't advise it.
It would be better to create an external process in charge of executing the task, and have every instance request the tasks to execute by sending a message to a JMS queue that your executor process would consume.
...Or maybe you don't really need several processes running at the same time; what you might require is just one application with several threads doing things at the same time and one thread dedicated to the Executor. That way, synchronizing the execute() method (or the whole Executor) would be enough and would spare you some time.
You cannot achieve this with Executors or anything like that because Java virtual machines will be separate.
If you really need to synchronize between multiple independent instances, one approach is to dedicate an internal port and implement a simple internal server within the application. Look into ServerSocket, or RMI, which is a full-blown solution if you need extensive communication. The first instance binds to the dedicated application port and becomes the master node. All later instances find the application port taken, but can then use it to make an HTTP (or just TCP/IP) call to the master node to report the activities they need done.
As you only need to execute some action sequentially, any slave node may ask master to do this rather than executing itself.
A potential problem with this approach is that if the user shuts down the master node, it may be complex to implement a way for another running node to take its place. If only one node is active at any time (receiving input from the user), it may take over the role of the master node after discovering that the master is not responding and that the port is no longer occupied.
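The port-binding step of this answer can be sketched as follows. It only covers deciding who the master is, not the messaging between nodes, and MasterElection and tryBecomeMaster are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.Optional;

// Hypothetical helper; the port number is whatever the application dedicates to this purpose.
final class MasterElection {
    // Tries to claim the application port on the loopback interface.
    // A present result means this instance is the master and must keep the socket open;
    // an empty result means another instance already holds the port.
    static Optional<ServerSocket> tryBecomeMaster(int port) throws IOException {
        try {
            return Optional.of(new ServerSocket(port, 50, InetAddress.getLoopbackAddress()));
        } catch (BindException alreadyTaken) {
            return Optional.empty();
        }
    }
}
```

An instance that gets an empty result would connect to the same port as a client and ask the master to run execute() on its behalf. A node that later finds the master unresponsive could call tryBecomeMaster again to take over.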
A distributed queue could be used for this type of load-balancing. You put one or more 'request messages' into a queue, and the next available consumer application picks one up and processes it. Each such request message could describe a task to process.
This type of queue could be implemented as JMS queue (e.g. using ActiveMQ http://activemq.apache.org/), or on Windows there is also MSMQ: https://msdn.microsoft.com/en-us/library/ms711472(v=vs.85).aspx.
If performance is an issue and you have C/C++ developers available, the 'shared memory queue' could also be interesting: shmemq API

create multithreaded java ee

Is it possible to create a multithreaded Java EE Glassfish container?
My intention is to create an application where users can capture data from a social network; each user would launch a new thread with the parameters for the information he wants to retrieve from the social network.
All these threads would be limited in number to avoid exhausting the server's memory.
How can I create multiple threads in Java EE so that, once the user exits the application, they remain running in the background until the user closes them?
Could a GlassFish job be one solution?
Your question is pretty broad, but in general I understand you need to execute a thread for each user, which runs in background even when user stops using the application (logs out), does some repetitive task, and is terminated by user when required.
First, I would point out that this can be accomplished in a cleaner way using the timer service: you can schedule a periodic background job, which will do everything you need. It can read the list of users and their tasks and perform them at a given interval. Then a user may request to cancel their tasks, which removes them from the list.
In this way, the number of users having the background task running would not be limited. They also can run sequentially in a single thread, but you may tweak that, see the rest of my answer.
More info on scheduling a timer is in the Java EE tutorial: https://docs.oracle.com/javaee/7/tutorial/ejb-basicexamples004.htm.
In case you really need a separate thread per user, there are several ways to execute a thread separately from the request-handling thread. You might use asynchronous EJB method invocation with the @Asynchronous annotation. You may also inject a ManagedExecutorService and use its submit method to execute a Runnable asynchronously. Either way, you would not lose context, and dependency injection will continue to work.
See more details about asynchronous execution in the Java EE tutorial on Concurrency Utilities.
You may also execute runnables asynchronously from a timer, but you may not need that, if you execute only a single task from within a timer handler, as timer handler will be executed when timer triggers in a new thread, if the previous handler did not complete yet.
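To make the idea of bounded, user-cancellable background tasks concrete, here is a sketch that uses plain java.util.concurrent as a stand-in. It is not the EJB timer service or ManagedExecutorService API; inside a Java EE container a container-managed executor would normally be injected instead of creating threads directly. UserTaskRegistry and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical registry: one repeating task per user, on a pool with a fixed thread count.
final class UserTaskRegistry {
    private final ScheduledExecutorService pool;
    private final Map<String, ScheduledFuture<?>> running = new ConcurrentHashMap<>();

    UserTaskRegistry(int maxThreads) {
        this.pool = Executors.newScheduledThreadPool(maxThreads);
    }

    // Starts a repeating task for the user; returns false if one is already registered.
    boolean start(String userId, Runnable task, long periodMillis) {
        boolean[] started = {false};
        running.computeIfAbsent(userId, id -> {
            started[0] = true;
            return pool.scheduleAtFixedRate(task, 0, periodMillis, TimeUnit.MILLISECONDS);
        });
        return started[0];
    }

    // Stops the user's task; it keeps running after logout until this is called.
    boolean cancel(String userId) {
        ScheduledFuture<?> future = running.remove(userId);
        return future != null && future.cancel(false);
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

Because the tasks live in the registry rather than in the HTTP session, they survive the user logging out, and the fixed pool size caps how many run at the same moment regardless of how many users register one.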

How to avoid such scenario?

I have the same piece of software running on 2 servers. Each copy runs Quartz, which launches a job at a specific time.
Presuming that both servers have the clocks synchronized, these 2 jobs will start at the same time doing the same thing...
How can I make only one job run and the other one not?
I have a database also, and my first thought was to make a table where to insert a line when the job starts and also to verify if there is a record for current day (if so then skip job execution...)
But again, if the clocks on servers are perfectly synchronized then both apps will write and check at the same time, making useless this mechanism.
What other solution can I implement ?
Cluster 'em!
(source: quartz-scheduler.org)
Basically both of your Quartz schedulers use the same database to synchronize and will only run job on one (idle) machine.
"But again, if the clocks on servers are perfectly synchronized then both apps will write and check at the same time, making useless this mechanism."
This is exactly what Quartz does! However, it uses database locking/transaction mechanisms so that when one instance fetches new jobs, the second one must wait.
"Presuming that both servers have the clocks synchronized"
They must have synchronized clocks to run in a cluster:
"Never run clustering on separate machines, unless their clocks are synchronized using some form of time-sync service"
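As a rough illustration of what clustering involves on the configuration side, the sketch below builds the Quartz settings that both servers would share. It is an incomplete fragment under assumptions: the class name, the scheduler name, the thread count, and the check-in interval are illustrative values, and the org.quartz.dataSource.* connection settings for the named data source still have to be supplied separately.

```java
import java.util.Properties;

// Hypothetical helper; the returned properties would be handed to Quartz's
// StdSchedulerFactory(Properties) constructor, which is not imported in this sketch.
final class QuartzClusterConfig {
    static Properties clusteredProperties(String dataSourceName) {
        Properties p = new Properties();
        // Same name on every node so that they form one logical scheduler.
        p.setProperty("org.quartz.scheduler.instanceName", "ClusteredScheduler");
        // AUTO lets each node generate its own unique id.
        p.setProperty("org.quartz.scheduler.instanceId", "AUTO");
        p.setProperty("org.quartz.threadPool.threadCount", "5");
        // A JDBC job store is what lets the nodes coordinate through the shared database.
        p.setProperty("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreTX");
        p.setProperty("org.quartz.jobStore.driverDelegateClass",
                      "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
        p.setProperty("org.quartz.jobStore.dataSource", dataSourceName);
        p.setProperty("org.quartz.jobStore.isClustered", "true");
        p.setProperty("org.quartz.jobStore.clusterCheckinInterval", "20000");
        return p;
    }
}
```

Both servers would start their scheduler from the same settings and point at the same database containing the Quartz tables; each trigger firing is then picked up by only one node.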

How to deal with a search task which takes more time than usual in Spring 3.0

I am looking for ideas on how to deal with a search related task which takes more than usual time (in human terms more than 3 seconds)
I have to query multiple sources, sift through information for the first time and then cache it in the DB for later quick return.
The context of the project is J2EE, Spring and Hibernate (on top of SpringROO)
The possible solutions I could think of
-On the webpage let the user know that the task is running in the background, and if possible give them a queue number or waiting time. Refresh the page via a controller which checks whether the task is done; when it's done (i.e. the search result is prepared and stored in the DB), forward to a new controller and fetch the result from the DB
-The background tasks could be done with Spring Task executor. I am not sure if it is easy to give a measure of how long it would take. It would probably be a bad idea to let all the search terms run concurrently, so some sort of pooling will be a good idea.
-Another option to use background tasks is to use JMS. This is perhaps a solution with more control (retries etc)
-Spring batch also comes to mind
Please suggest how you would do it. I would greatly appreciate a semi-detailed+ description. The sources of info can be many and can be sequential in nature, so it can take up to 4-5 minutes for the results to form. It is also possible that such tasks run automatically in the background without user intervention (i.e. to update from the sources)
From a user perspective, I use AJAX. The default web page contains some kind of "Busy" indicator. When the AJAX request completes, the busy indicator is replaced with the result.
In the background, request handlers are already multi-threaded. So you can simply format the default result, close&flush the output, and do the processing in the current thread. You should put something in the session or DB to make sure that no one can start the same heavy process a second time.
Running task pools in a web container is possible but there are some caveats, especially how to synchronize startup/shutdown: Do you want your web server to "hang" during shutdown while some thread is busy collecting your results? Also the additional load should be considered. It might be better to use JMS and offload the strain to a second server dedicated to build the search results.
Such a system will scale much better if your searches start to become a burden. It also makes it trivial to automate the process by writing a small program which posts searches in the JMS queue.
I've solved this problem in the past doing something like this:
When the user initiates a long running task, I open a popup window that displays the task status. The task status includes a name and estimated time to complete
This task is also stored in my "app" (this can be stored in the DB, session, or application context), so the user can continue doing other things on my web app while having an easy way to navigate back to the running task.
I stored my tasks in a DB, so I could manage what happens on startup and shutdown of the web app. This requires storing the progress of the task in the DB.
The tricky part is displaying results to the user. If you use the method I've described, you'll need to store results in either the DB, session, or application context.
This system I've described is pretty heavyweight, and may be overkill for your application.
In response to the comment "so what do you use to do the background computing. I have asked this before":
I use java.util.concurrent. A lot of this depends on the nature of your application. Is the task (or steps in the task) idempotent? How critical is it that it run to completion? If you have a non-idempotent task that must run to completion, I would say you generally must record every piece of work you do, and you must do that piece of work within a transaction. For example, if one of your tasks is to email a list of people (this is definitely not idempotent) you would do the emailing in a "transaction" (I'm using the term lightly here) and store your progress after each transaction is complete.
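A minimal java.util.concurrent sketch of the pattern discussed in these answers (a bounded pool, a status a polling controller can check, and a guard against starting the same heavy search twice) might look like this. SearchJobs and its methods are hypothetical names, results are kept in memory here rather than in the DB, and the guard only works within one JVM.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical in-memory tracker for long-running searches.
final class SearchJobs {
    private final ExecutorService pool;
    private final Map<String, CompletableFuture<String>> jobs = new ConcurrentHashMap<>();
    private final Function<String, String> slowSearch;

    SearchJobs(int maxConcurrentSearches, Function<String, String> slowSearch) {
        this.pool = Executors.newFixedThreadPool(maxConcurrentSearches);
        this.slowSearch = slowSearch;
    }

    // Starts the search for a term once; later calls for the same term share that run.
    CompletableFuture<String> submit(String term) {
        return jobs.computeIfAbsent(term,
            t -> CompletableFuture.supplyAsync(() -> slowSearch.apply(t), pool));
    }

    // What a polling controller could call to choose between the "busy" page and the result.
    boolean isDone(String term) {
        CompletableFuture<String> job = jobs.get(term);
        return job != null && job.isDone();
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The fixed pool size is what keeps all search terms from running concurrently. For work that must survive a restart, the progress and results would still need to be written to the DB as described above.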

Concurrent periodic task running

I'm trying to find the best solution for periodic task running in parallel. Requirements:
Java (Spring w/o Hibernate).
Tasks are being managed by front-end application and stored in MySQL DB (fields: id, frequency (in seconds), <other attributes/settings about task scenario>). -- Something like crontab, only with frequency (seconds) field, instead of minutes/hours/days/months/days of weeks.
I'm thinking about:
TaskImporter thread polling Tasks from DB (via TasksDAO.findToProcess()) and submitting them to queue.
java.util.concurrent.ThreadPoolExecutor running tasks (from queue) in parallel.
The most tricky part of this architecture is TasksDAO.findToProcess():
How do I know which tasks are due to run right now?
I'm thinking about a next_run Task field, which would be populated (UPDATE tasks SET next_run = TIMESTAMPADD(SECOND, frequency, NOW()) WHERE id = ?) straight after selection (SELECT * FROM tasks WHERE next_run IS NULL OR next_run <= NOW() FOR UPDATE). The problem: I have to run lots of UPDATEs for lots of SELECTed tasks (one UPDATE per Task or a bulk UPDATE), plus there are concurrency problems (see below).
Ability to run several concurrent processing applications (cloud), using/polling same DB.
Across all of the concurrent processing applications, a given task must run only once. All SELECTs from all other apps must be locked out until app A finishes updating next_run for all selected tasks. The problem: locking the production table (used by the front-end app) would slow things down. Table mirror?
I love simple and clean solutions and believe there's a better way to implement this processing application. Do you see any? :)
Thanks in advance.
EDIT: Using Quartz as a scheduler/executor is not an option because of syncing latency. Front-end app is not in Java and so is not able to interact with Quartz, except Webservice-oriented solution, which is not an option too, because front-end app has more data associated with previously mentioned Tasks and needs direct access to all data in DB (read+write).
I would suggest using a scheduling API like Quartz rather than a home-grown implementation.
It provides a rich API for implementing the logic, along with conveniences. You will also have better control over jobs.
http://www.quartz-scheduler.org/
http://www.quartz-scheduler.org/docs/tutorial/index.html
