We have 4 Quartz cron jobs, triggering at:
0 */5 * ? * * - 5 mins
0 0 */1 ? * * - 1 hr
0 */1 * ? * * - 1 min
0 */3 * ? * * - 3 mins
All the jobs work fine locally and in a single cluster environment,
but when we switch to more than one pod, the job with the 3-minute trigger starts to misbehave and fires every 2 hours or at random intervals.
We are using Spring Boot with the Quartz scheduler and a standard Kubernetes Helm chart deployment in production.
The cron triggers are defined in application.properties. Any reasons or pointers to make it consistent would be great.
I have enabled checkpointing in Flink 1.12.1 programmatically as below:
int duration = 10;
if (!environment.getCheckpointConfig().isCheckpointingEnabled()) {
    // checkpoint every 60 s, with at least 30 s between checkpoints
    environment.enableCheckpointing(duration * 6 * 1000, CheckpointingMode.EXACTLY_ONCE);
    environment.getCheckpointConfig().setMinPauseBetweenCheckpoints(duration * 3 * 1000);
}
Flink Version: 1.12.1
configuration:
state.backend: rocksdb
state.checkpoints.dir: file:///flink/
blob.server.port: 6124
jobmanager.rpc.port: 6123
parallelism.default: 2
queryable-state.proxy.ports: 6125
taskmanager.numberOfTaskSlots: 2
taskmanager.rpc.port: 6122
jobmanager.memory.process.size: 1600m
taskmanager.memory.process.size: 1728m
jobmanager.web.address: 0.0.0.0
rest.address: 0.0.0.0
rest.bind-address: 0.0.0.0
rest.port: 8081
taskmanager.data.port: 6121
classloader.resolve-order: parent-first
execution.checkpointing.unaligned: false
execution.checkpointing.max-concurrent-checkpoints: 2
execution.checkpointing.interval: 60000
But it fails with the following error:
Caused by: org.apache.flink.runtime.resourcemanager.exceptions.UnfulfillableSlotRequestException: Could not fulfill slot request 44ec308e34aa86629d2034a017b8ef91. Requested resource profile (ResourceProfile{UNKNOWN}) is unfulfillable.
If I remove/disable checkpointing, everything works normally. I need checkpointing because, if my pod restarts, the data handled by the earlier run is reset.
Can somebody point out how this can be addressed?
I am working with Spring Boot and Oracle SQL Developer.
I want to implement something like a 'user-administered scheduler'.
Users can add or remove schedulers on a web page.
If a user adds a 'scheduler running every 3 minutes', the DB table may be...
s_id | s_cron | s_detail
sid000001 | 0 0/3 * * * ? | do job 1
and the Spring scheduler must do 'job 1' every 3 minutes.
And if another user also adds a 'scheduler running every 1 minute',
s_id | s_cron | s_detail
sid000001 | 0 0/3 * * * ? | do job 1
sid000002 | 0 0/1 * * * ? | do job 2
and the Spring scheduler must do 'job 1' every 3 minutes and 'job 2' every 1 minute, simultaneously.
The problem is: how can I do this in Spring Boot?
The Spring service must add/remove schedulers dynamically (or automatically) as DB rows are added/removed while the server is running.
Please give me a hand.
If you want to schedule tasks dynamically, you can do it without Spring by using an ExecutorService, in particular ScheduledThreadPoolExecutor:
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

Runnable task = () -> doSomething();
ScheduledExecutorService executor =
    Executors.newScheduledThreadPool(Runtime.getRuntime().availableProcessors());
// Schedule a task that will be executed once, in 120 sec
executor.schedule(task, 120, TimeUnit.SECONDS);
// Schedule a task that will first run in 120 sec and then every 120 sec
// If an exception occurs, its subsequent executions are canceled.
executor.scheduleAtFixedRate(task, 120, 120, TimeUnit.SECONDS);
// Schedule a task that will first run in 120 sec and then 120 sec after the end of the last execution
// If an exception occurs, its subsequent executions are canceled.
executor.scheduleWithFixedDelay(task, 120, 120, TimeUnit.SECONDS);
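To add and remove tasks while the server is running, you would also need to keep a handle on each scheduled task. Here is a minimal sketch of that idea, assuming one task per s_id. The class name DynamicScheduler and its methods are made up for illustration, and it takes a fixed period rather than parsing the s_cron expression. In a Spring application the same idea should carry over to ThreadPoolTaskScheduler.schedule(Runnable, Trigger) with a CronTrigger built from the s_cron column, though that variant is not shown here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: keeps one ScheduledFuture per task id so that tasks
// can be added and removed while the application is running.
class DynamicScheduler {
    private final ScheduledExecutorService executor =
            Executors.newScheduledThreadPool(Runtime.getRuntime().availableProcessors());
    private final Map<String, ScheduledFuture<?>> tasks = new ConcurrentHashMap<>();

    // Registers the task under the given id, replacing any task already stored there.
    void add(String id, Runnable task, long period, TimeUnit unit) {
        ScheduledFuture<?> future = executor.scheduleAtFixedRate(task, period, period, unit);
        ScheduledFuture<?> previous = tasks.put(id, future);
        if (previous != null) {
            previous.cancel(false); // let a run in progress finish
        }
    }

    // Cancels the task stored under the id; returns false if the id is unknown.
    boolean remove(String id) {
        ScheduledFuture<?> future = tasks.remove(id);
        if (future == null) {
            return false;
        }
        future.cancel(false);
        return true;
    }

    // True while a task is registered under the id and has not been cancelled.
    boolean isScheduled(String id) {
        ScheduledFuture<?> future = tasks.get(id);
        return future != null && !future.isCancelled();
    }

    // Stops the underlying thread pool.
    void shutdown() {
        executor.shutdownNow();
    }
}
```

When a row such as sid000001 is inserted you would call add("sid000001", job1, 3, TimeUnit.MINUTES), and call remove("sid000001") when the row is deleted. How the application notices the DB change (polling, or the web controller calling the scheduler directly) is left open here.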
I have Quartz running in a cluster and my jobs do run periodically. A job is started on one machine and the others hold until the next execution time.
What I want now is to delay the job invocation if the previous invocation isn't finished yet. For instance:
10:00 - instance invocation#1
10:06 - invocation#1 finished
10:10 - instance invocation#2
10:13 - invocation#2 finished
10:20 - instance invocation#3
10:31 - invocation#3 finished // took longer than expected
10:31 - instance invocation#4 // start delayed
10:35 - invocation#4 finished
Even this would be acceptable:
10:00 - instance invocation#1
10:06 - invocation#1 finished
10:10 - instance invocation#2
10:13 - invocation#2 finished
10:20 - instance invocation#3
10:31 - invocation#3 finished // took longer than expected
10:40 - instance invocation#4 // waits for next timed invocation
10:44 - invocation#4 finished
I'm using cron-expression triggers, and the job is triggered once every 10 minutes (0 0/10 * * *).
Annotating your job with @DisallowConcurrentExecution should do the trick.
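To make the "no overlap" behaviour concrete, here is a plain-Java sketch of the second timeline from the question, where a firing that arrives while the previous run is still active is simply skipped. This is not Quartz code: NonOverlappingJob is a hypothetical wrapper, and it only guards a single JVM, whereas a Quartz cluster coordinates through its job store.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical wrapper, not part of Quartz: skips a firing while the
// previous run of the same job is still active in this JVM.
class NonOverlappingJob implements Runnable {
    private final Runnable delegate;
    private final AtomicBoolean running = new AtomicBoolean(false);

    NonOverlappingJob(Runnable delegate) {
        this.delegate = delegate;
    }

    @Override
    public void run() {
        tryRun();
    }

    // Returns true if the delegate ran, false if this firing was skipped.
    boolean tryRun() {
        if (!running.compareAndSet(false, true)) {
            return false; // previous invocation has not finished yet
        }
        try {
            delegate.run();
            return true;
        } finally {
            running.set(false);
        }
    }
}
```

Passing such a wrapper to a scheduler that fires every 10 minutes would give 10:00, 10:10, 10:20, 10:40 in the example above, because the 10:30 firing arrives while invocation #3 is still running.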
I am running a Spark job on a Hadoop cluster, and the job occasionally fails with this exception:
exception : Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.JavaMain], main() threw exception, begin > end in range (begin, end): (1494159709088, 1494159706071)
The job ran successfully on rerun.
After searching on Google, it looks like this might be clock skew between the Oozie server host and the launcher host.
Is there a way I can check whether there is clock skew? Or how can I check whether the time on all the nodes is in sync?
Thanks
ntptime command output :
ntp_gettime() returns code 0 (OK)
time dcb9b19b.a2328f64 Sun, May 7 2017 14:45:47.633, (.633584090),
maximum error 434990 us, estimated error 815 us, TAI offset 0
ntp_adjtime() returns code 0 (OK)
modes 0x0 (),
offset 176.871 us, frequency -25.666 ppm, interval 1 s,
maximum error 434990 us, estimated error 815 us,
status 0x2001 (PLL,NANO),
time constant 10, precision 0.001 us, tolerance 500 ppm,
ntpstat command output :
synchronised to NTP server (174.68.168.57) at stratum 3
time correct to within 77 ms
polling server every 1024 s
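For what it's worth, the two numbers in the exception look like epoch milliseconds, so the size of the gap can be read straight from the message. A small sketch of that arithmetic (the class name RangeCheck is made up, and treating the values as epoch milliseconds is an assumption):

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: decodes the (begin, end) pair from the exception
// message, assuming both values are epoch milliseconds.
class RangeCheck {
    static final long BEGIN_MS = 1494159709088L;
    static final long END_MS = 1494159706071L;

    // How far "begin" lies after "end".
    static Duration overshoot() {
        return Duration.between(Instant.ofEpochMilli(END_MS), Instant.ofEpochMilli(BEGIN_MS));
    }

    public static void main(String[] args) {
        System.out.println("begin = " + Instant.ofEpochMilli(BEGIN_MS)); // 2017-05-07T12:21:49.088Z
        System.out.println("end   = " + Instant.ofEpochMilli(END_MS));   // 2017-05-07T12:21:46.071Z
        System.out.println("begin is ahead of end by " + overshoot().toMillis() + " ms"); // 3017 ms
    }
}
```

So begin is about 3 seconds after end. The ntpstat output above reports "time correct to within 77 ms", but that only describes the host it was run on, so the same check would have to be repeated on the other nodes to compare.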
I configured a job to execute every 3 hours during the daytime; below is the cron config:
@On("0 0 10-20/3 * * ?")
But it didn't work.
This is my play status output:
Requests execution pool:
~~~~~~~~~~~~~~~~~~~~~~~~
Pool size: 20
Active count: 0
Scheduled task count: 876
Queue size: 0
I think I got the answer:
@On("0 0 10-20/3 * * ?")
doesn't mean the job will run 4 times (10, 13, 16, 19); Play waits until the first job ends, and then waits 3 hours before running the next job.
So, if this job takes 10 hours, it will only execute once per day.
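For reference, the four hours mentioned above (10, 13, 16, 19) are just the hour field 10-20/3 expanded. A tiny sketch of that expansion, with a made-up class name HourField; it only handles the from-to/step form and is not how Play parses cron expressions:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: expands a "from-to/step" cron field into the values it names.
class HourField {
    static List<Integer> expand(String field) {
        String[] rangeAndStep = field.split("/");
        String[] bounds = rangeAndStep[0].split("-");
        int from = Integer.parseInt(bounds[0]);
        int to = Integer.parseInt(bounds[1]);
        int step = Integer.parseInt(rangeAndStep[1]);

        List<Integer> values = new ArrayList<>();
        for (int v = from; v <= to; v += step) {
            values.add(v);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(expand("10-20/3")); // prints [10, 13, 16, 19]
    }
}
```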