We have an application deployed in a clustered environment. Every 5 minutes the application sends a ping to all other applications connected to it. We use a non-persistent Quartz scheduler for this.
The problem is that in the clustered environment only one node performs the ping operation. Are there any references or sample code for this? (This is a plain servlet application.)
Since all nodes are working in a cluster, every job runs on just a single machine (the most idle one). That is the reason you use clustering. But you want every machine to run the given job independently, without being aware of the other cluster nodes. Basically, you don't need Quartz clustering at all!
It is enough to use ScheduledExecutorService.scheduleAtFixedRate():
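import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
final Runnable pinger = new Runnable() {
    public void run() {
        // send PING
    }
};
// first run after 5 minutes, then every 5 minutes
scheduler.scheduleAtFixedRate(pinger, 5, 5, TimeUnit.MINUTES);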
Just run this code on every machine and use Quartz where you need it.
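One detail worth guarding against: if the task passed to scheduleAtFixedRate() throws, all subsequent executions are suppressed. Below is a minimal sketch of a wrapper that keeps the schedule alive after a failed ping. The class name PingScheduler and its methods are hypothetical, and the ping itself is left as a placeholder Runnable.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of any library.
final class PingScheduler {
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
    private final AtomicInteger runs = new AtomicInteger();
    private final AtomicInteger failures = new AtomicInteger();

    /** Starts the periodic task; 'ping' is a placeholder for the real ping call. */
    void start(Runnable ping, long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            runs.incrementAndGet();
            try {
                ping.run();
            } catch (RuntimeException e) {
                // An exception escaping the task would cancel every later run,
                // so it is caught here; a real application would log it.
                failures.incrementAndGet();
            }
        }, period, period, unit);
    }

    /** Stops the scheduler and interrupts a ping that is still running. */
    void stop() {
        scheduler.shutdownNow();
    }

    int runCount() {
        return runs.get();
    }

    int failureCount() {
        return failures.get();
    }
}
```

In a plain servlet application, a natural place to call start() and stop() would be a ServletContextListener (contextInitialized() and contextDestroyed()), so that each node starts its own scheduler on deployment.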
Some background
We are running a fairly simple application that handles subscriptions, and we are running into the limits of the external service. Our solution is to introduce a queue and throttle the consumers of this queue to optimize throughput.
For this we are using Quarkus (2.7.5.Final) with the quarkus-smallrye-reactive-messaging-rabbitmq connector provided by quarkus.io.
Simplified implementation
rabbitmq-host=localhost
rabbitmq-port=5672
rabbitmq-username=guest
rabbitmq-password=guest
mp.messaging.incoming.subscriptions-in.connector=smallrye-rabbitmq
mp.messaging.incoming.subscriptions-in.queue.name=subscriptions
@Incoming("subscriptions-in")
public CompletionStage<Void> consume(Message<JsonObject> message) {
    try {
        Thread.sleep(1000);
        return message.ack();
    } catch (Exception e) {
        return message.nack(e);
    }
}
The problem
This only uses one worker thread, so the jobs are handled one by one. Ideally this application would pick up as many jobs in parallel as there are worker threads available. How can I make this work?
I tried
@Incoming("subscriptions-in")
@Blocking
Didn't change anything.
@Incoming("subscriptions-in")
@NonBlocking
Didn't change anything.
@Incoming("subscriptions-in")
@Blocking(ordered = false)
This made it split off into different worker threads, but "detached" the job from the queue, so none of the messages got ack'd or nack'd.
@Incoming("subscriptions-in-1")
..
@Incoming("subscriptions-in-2")
..
@Incoming("subscriptions-in-3")
These different channels all seem to work on the same worker thread (which is picked on startup).
The only way I currently see is to slim down the application and run one consumer thread each and just run 50 in parallel in kubernetes. This feels wrong and I can't believe there is no way to multithread at least some of the consuming.
Question
I am hopeful that I am missing a simple solution or am missing the concept of this RabbitMQ connector.
Is there any way to get the @Incoming consumption to run in parallel?
Or is there a way in this Java implementation to increase the prefetch count? If so I can multithread them myself
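For the "multithread them myself" idea, here is a minimal sketch in plain Java, without any Quarkus or SmallRye types. It returns the CompletionStage immediately and completes it on a worker pool instead of sleeping on the calling thread. The class ParallelConsumer, the pool size of 8, and the work/ack/nack parameters are all hypothetical stand-ins (for the slow call, message.ack() and message.nack(e)). Whether the connector hands over further messages while earlier stages are still pending depends on its prefetch settings, which this sketch does not cover.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical helper, not part of the RabbitMQ connector.
final class ParallelConsumer {
    // Pool size is an arbitrary example value.
    private final ExecutorService workers = Executors.newFixedThreadPool(8);

    /**
     * Returns at once; the returned stage completes after the work has run
     * on a pool thread and the message has been acked or nacked.
     */
    CompletionStage<Void> consume(Runnable work, Runnable ack, Consumer<Throwable> nack) {
        return CompletableFuture.runAsync(work, workers)
                .<Void>handle((ignored, error) -> {
                    if (error == null) {
                        ack.run();
                    } else {
                        // The error may arrive wrapped in a CompletionException.
                        nack.accept(error);
                    }
                    return null;
                });
    }

    void shutdown() {
        workers.shutdown();
    }
}
```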
Suppose you have Spark with the Standalone cluster manager. You have opened a Spark session with some configs and want to launch SomeSparkJob 40 times in parallel with different arguments.
Questions
How do I set the number of retries on job failure?
How do I restart jobs programmatically on failure? This could be useful if jobs fail due to a lack of resources. Then I can launch, one by one, all jobs that require extra resources.
How do I restart the Spark application on job failure? This could be useful if a job lacks resources even when it's launched simultaneously. Then, to change the cores, CPU, etc. configs, I need to relaunch the application in the Standalone cluster manager.
My workarounds
1) I'm pretty sure the first point is possible, since it's possible in Spark local mode. I just don't know how to do it in standalone mode.
2-3) It's possible to hang a listener on the Spark context, like spark.sparkContext().addSparkListener(new SparkListener() {. But it seems SparkListener lacks failure callbacks.
There is also a bunch of methods with very poor documentation. I've never used them, but perhaps they could help solve my problem.
spark.sparkContext().dagScheduler().runJob();
spark.sparkContext().runJob()
spark.sparkContext().submitJob()
spark.sparkContext().taskScheduler().submitTasks();
spark.sparkContext().dagScheduler().handleJobCancellation();
spark.sparkContext().statusTracker()
You can use SparkLauncher and control the flow.
import org.apache.spark.launcher.SparkLauncher;

public class MyLauncher {
    public static void main(String[] args) throws Exception {
        Process spark = new SparkLauncher()
            .setAppResource("/my/app.jar")
            .setMainClass("my.spark.app.Main")
            .setMaster("local")
            .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
            .launch();
        spark.waitFor();
    }
}
See API for more details.
Since launch() creates a Process, you can check the process status and retry, e.g. with:
public boolean isAlive()
If the process is no longer alive, start it again; see the API for more details.
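The check-and-restart idea can be sketched as a small retry loop. This is only an illustration: the class RetryingLauncher and the method runWithRetries are hypothetical, the launch is passed in as a Callable that returns the process exit code, and treating exit code 0 as success is an assumption.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Spark.
final class RetryingLauncher {
    /**
     * Calls 'launch' until it returns exit code 0 or 'maxAttempts' is reached.
     * Returns the number of attempts used, or -1 if no attempt succeeded.
     */
    static int runWithRetries(Callable<Integer> launch, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (launch.call() == 0) {
                    return attempt;
                }
            } catch (InterruptedException e) {
                // Stop retrying when the waiting thread is interrupted.
                Thread.currentThread().interrupt();
                return -1;
            } catch (Exception e) {
                // Treat a launch error like a non-zero exit code and retry.
            }
        }
        return -1;
    }
}
```

With the launcher above, the callable would be something like () -> launcher.launch().waitFor(), where launcher is the configured SparkLauncher, and resource settings could be changed between attempts.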
Hoping this gives high level idea of how we can achieve what you mentioned in your question. There could be more ways to do same thing but thought to share this approach.
Cheers !
Check your spark.sql.broadcastTimeout and spark.broadcast.blockSize properties and try increasing them.
I have a Quartz Job like this
@PersistJobDataAfterExecution
@DisallowConcurrentExecution
public class MyJob implements Job {
    public void execute(JobExecutionContext jec) throws JobExecutionException {
        // connect to an FTP server, monitor a directory for new files and download them
        // using FTPClient from commons-net-3.5.jar
    }
}
The job is triggered with
JobDetail jobDetail = newJob(MyJob.class)
    .withIdentity(jobName, DEFAULT_GROUP)
    .usingJobData(new JobDataMap(jobProperties))
    .build();
// trigger every minute
Trigger trigger = newTrigger()
    .withIdentity(jobName, DEFAULT_GROUP)
    .startNow()
    .withSchedule(cronSchedule(cronExpression))
    .build();
scheduler.scheduleJob(jobDetail, trigger);
The job is triggered every minute. It works well for about 1 week (10,000 executions) and then inexplicably stops being relaunched. There are no errors in the log, and the log shows that the previous execution completed. The other processes keep firing correctly.
After upgrading the libraries to quartz-2.2.3 and commons-net-3.5 (looking for a possible bug in the FTP library), it lasted 3 weeks.
I have a job that monitors the Scheduler, and it says the trigger state is BLOCKED. The thread of the blocked process is not reused by the application server.
TriggerState triggerState = scheduler.getTriggerState(triggerKey);
I have not found documentation on this type of problem with Quartz, so my suspicion is a bug in the FTP library that interferes with the thread started by Quartz, for example through the use of @PersistJobDataAfterExecution.
I wonder whether this is a known issue or a bug, so that I could apply a solution or a workaround (such as killing the Quartz job; see "how to stop/interrupt quartz scheduler job manually").
After months of occasional service drops, and suspecting that FTP connectivity errors were blocking the service, we have finally implemented a measure that seems to solve the problem.
Each execution of the process now does:
FTPClient ftp = new FTPClient();
// Added: default socket timeout, which has to be set before connect()
ftp.setDefaultTimeout(getTimeoutInMilliseconds());
ftp.connect(host, port);
// Added more timeouts to see if the thread locks disappear...
ftp.setBufferSize(1024 * 1024);
ftp.setSoTimeout(getTimeoutInMilliseconds());
The weird thing is that the process was not previously blocked in connect(): it continued and finished, it just never started again. But since we set the timeouts, the problem has not happened again.
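To illustrate what a socket timeout changes, here is a small stand-alone sketch using plain java.net sockets rather than FTPClient. It assumes the hang was a read blocked on a connection that had gone silent, which is a suspicion and not something confirmed above. The class ReadTimeoutDemo and the local "silent server" are hypothetical test scaffolding.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; it does not use commons-net.
final class ReadTimeoutDemo {
    /** Returns true if the blocked read was aborted by the timeout. */
    static boolean readTimesOut(int timeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The server socket accepts the TCP connection but never sends a byte,
        // standing in for a remote server that has gone silent.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentServer.getLocalPort())) {
            client.setSoTimeout(timeoutMillis); // 0 would mean "block forever"
            try {
                client.getInputStream().read();
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without the setSoTimeout() call the read would never return, and the thread running it would stay occupied indefinitely. With @DisallowConcurrentExecution on the job, a stuck execution like that would also keep the trigger in the BLOCKED state.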
I have two quartz apps that must run in cluster mode so I have two jars. When I run those two jars (java -jar) only one process seems to be working, the other seems to be in standby and does nothing and only begins to work when I kill the other process. I need the two processes to run in cluster mode.
This is my config:
private Properties getProperties() {
    final Properties quartzProperties = new Properties();
    quartzProperties.put("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreTX");
    quartzProperties.put("org.quartz.jobStore.isClustered", "true");
    quartzProperties.put("org.quartz.jobStore.tablePrefix", "QRTZ_");
    quartzProperties.put("org.quartz.jobStore.driverDelegateClass", "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
    quartzProperties.put("org.quartz.threadPool.class", "org.quartz.simpl.SimpleThreadPool");
    quartzProperties.put("org.quartz.threadPool.threadCount", "25");
    quartzProperties.put("org.quartz.scheduler.instanceId", "AUTO");
    quartzProperties.put("org.quartz.scheduler.instanceName", "qrtz");
    quartzProperties.put("org.quartz.threadPool.threadPriority", "5");
    quartzProperties.put("org.quartz.jobStore.clusterCheckinInterval", "10000");
    quartzProperties.put("org.quartz.jobStore.useProperties", "false");
    quartzProperties.put("org.quartz.jobStore.dataSource", "quartzDS");
    quartzProperties.put("org.quartz.dataSource.quartzDS.URL", environment.getRequiredProperty("org.quartz.dataSource.quartzDS.URL"));
    quartzProperties.put("org.quartz.dataSource.quartzDS.user", environment.getRequiredProperty("org.quartz.dataSource.quartzDS.user"));
    quartzProperties.put("org.quartz.dataSource.quartzDS.password", environment.getRequiredProperty("org.quartz.dataSource.quartzDS.password"));
    quartzProperties.put("org.quartz.dataSource.quartzDS.maxConnections", "5");
    quartzProperties.put("org.quartz.dataSource.quartzDS.validationQuery", "select 0 from dual");
    quartzProperties.put("org.quartz.dataSource.quartzDS.driver", environment.getRequiredProperty("org.quartz.dataSource.quartzDS.driver"));
    return quartzProperties;
}
TL;DR: Your problem comes from the Quartz Scheduler itself, and there is no way to change its behaviour.
To help you understand why, I have to explain how Quartz cluster mode behaves. We will take your case as an example.
You start your two apps, each of which runs a Quartz instance, and the instances synchronize through a database. Each job you schedule is stored in the database with processing data such as "last time the job ran", "last instance that ran the job", etc. Each Quartz instance regularly scans the database for jobs to fire and fires as many jobs as it can.
The thing is, if you don't have enough load, one of your nodes will always scan the database before the other one and take all the load.
To see your other instance working, you have to shut down the first one or put it on standby, or increase the load on the cluster.
The only thing you can configure here is the size of the thread pool of each node. See http://www.quartz-scheduler.org/documentation/quartz-2.x/configuration/ConfigJDBCJobStoreClustering.html
I'm not quite sure whether this is more of an Openbravo issue or more of a Quartz issue, but we have some manual processes that run on schedules via Openbravo ProcessRequest objects (OB v2.50MP24), but it seems that the processes are running twice, at the exact same time. Openbravo extends the Quartz platform for their scheduling. I've tried to resolve this issue on my own by ensuring that my process classes extend this class:
import java.util.List;
import org.openbravo.dal.service.OBDal;
import org.openbravo.model.ad.ui.ProcessRequest;
import org.openbravo.scheduling.ProcessBundle;
import org.openbravo.service.db.DalBaseProcess;
public abstract class RBDDalProcess extends DalBaseProcess {
    @Override
    protected void doExecute(ProcessBundle bundle) throws Exception {
        org.quartz.Scheduler sched = org.openbravo.scheduling.OBScheduler
                .getInstance().getScheduler();
        int runCount = 0;
        synchronized (sched) {
            List<org.quartz.JobExecutionContext> currentlyExecutingJobs = (List<org.quartz.JobExecutionContext>) sched
                    .getCurrentlyExecutingJobs();
            for (org.quartz.JobExecutionContext jec : currentlyExecutingJobs) {
                ProcessRequest processRequest = OBDal.getInstance().get(
                        ProcessRequest.class, jec.getJobDetail().getName());
                if (processRequest == null)
                    continue;
                String processClass = processRequest.getProcess()
                        .getJavaClassName();
                if (bundle.getProcessClass().getCanonicalName()
                        .equals(processClass)) {
                    runCount++;
                }
            }
        }
        if (runCount > 1) {
            System.out.println("Process "
                    + bundle.getProcessClass().getSimpleName()
                    + " is already running. Cancelling.");
            return;
        }
        doRun(bundle);
    }
    protected abstract void doRun(ProcessBundle bundle);
}
This worked fine when I tested by requesting the process to run immediately twice at the same time: one of them cancelled. However, it's not working for the scheduled processes. I have System.out.println statements set up to log when the processes start, and the logs show each line of the output twice, one right after the other.
I have a sneaking suspicion that the processes are running in two completely different threads that don't know about each other's processes. However, I'm not sure how to verify my suspicion or, if I am correct, what to do about it. I've already verified that there is only one instance of each of the ProcessRequest objects stored in the database.
Has anyone else experienced this, know why they might be running twice, or know what I can do to prevent them from simultaneously running?
The most common reasons for a double Job execution are the following:
EDITED:
Your application is deployed in a clustered environment and you have not configured Quartz to run in a cluster environment.
Your application is deployed more than once. There are many cases where an application is deployed twice, especially on a Tomcat server. As a consequence, the QuartzInitializerListener is invoked twice and the jobs are executed twice. If you use Tomcat and define contexts explicitly in server.xml, you should turn off automatic application deployment or specify deployIgnore. Having both autoDeploy set to true and a context element in server.xml causes the application to be deployed twice. Set autoDeploy to false or remove the context element from server.xml.
Your application has been redeployed without unscheduling the current processes.
I hope this helps you.
Quartz uses a thread pool for job execution. So, as you suspect, RBDDalProcess will probably have separate instances, each in a separate thread, and the counter check will fail.
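If the duplicate runs do happen inside the same web application, a guard shared by all threads is simpler than counting the currently executing jobs. Here is a minimal sketch: the class SingleRunGuard and its method are hypothetical and not part of Openbravo or Quartz. Note that a static guard like this is per classloader, so it would not help if the application itself is deployed twice.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Openbravo or Quartz.
final class SingleRunGuard {
    // Names of the processes currently running in this classloader.
    private static final Set<String> RUNNING = ConcurrentHashMap.newKeySet();

    /** Runs 'body' unless the same name is already running; returns whether it ran. */
    static boolean runExclusively(String processName, Runnable body) {
        if (!RUNNING.add(processName)) {
            return false; // another execution currently holds this name
        }
        try {
            body.run();
            return true;
        } finally {
            RUNNING.remove(processName);
        }
    }
}
```

In doExecute() the name could be the process class name from the bundle, with the call to doRun(bundle) wrapped in the body.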
One thing you can do is list the jobs registered in the Scheduler (you can get the Scheduler through the OB API, as in your code: OBScheduler.getInstance().getScheduler()):
// enumerate each job group
for (String group : sched.getJobGroupNames()) {
    // enumerate each job in group
    for (JobKey jobKey : sched.getJobKeys(groupEquals(group))) {
        System.out.println("Found job identified by: " + jobKey);
    }
}
If you see the same job added twice, check out org.quartz.spi.JobFactory and the org.quartz.Scheduler.setJobFactory method for controlling jobs instantiations.
Also make sure you have only one entry for this process in the 'Report and Process' table in Openbravo.
I have used DalBaseProcess in Openbravo 3.0 and I cannot confirm the behavior you're describing. With this in mind, it would probably be a good idea to check the reported bugs for Openbravo v2.50MP24 and Quartz, or to post a thread about your problem in the Openbravo Forge forums.