Has "spring batch" ability to limit run jobs without manual check job status? Job can be different or instances of one job. Need something like configurable property.
No. The only way is to manually check via the JobExplorer interface or by directly querying the job metadata tables.
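For reference, here is a minimal sketch of such a manual check through JobExplorer; the limit and class name are illustrative:

```java
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.explore.JobExplorer;

import java.util.Set;

public class JobConcurrencyGuard {

    private static final int MAX_RUNNING_JOBS = 3; // hypothetical limit

    private final JobExplorer jobExplorer;

    public JobConcurrencyGuard(JobExplorer jobExplorer) {
        this.jobExplorer = jobExplorer;
    }

    // Returns true if another execution of the given job may be started.
    public boolean canStart(String jobName) {
        Set<JobExecution> running = jobExplorer.findRunningJobExecutions(jobName);
        return running.size() < MAX_RUNNING_JOBS;
    }
}
```

Call canStart(...) before launching and skip the launch if it returns false; there is no built-in property that does this for you.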
I have to run a scheduler (scheduled jobs) where I fetch 300,000-400,000 records on average, twice daily, from a database and apply business logic to them one by one, where each record requires a request to a third party that takes 3-4 seconds to respond.
What are the alternatives to Spring Batch for processing such a large amount of data efficiently?
Note: the fetched data is not static; it may vary every day.
Maybe the @Scheduled annotation can help you out.
You can use Spring's scheduler. It executes Spring Batch jobs periodically on a fixed schedule, using a cron expression passed to the Spring TaskScheduler.
Batch job scheduling is configured in two steps:
Enable scheduling with the @EnableScheduling annotation.
Create a method annotated with @Scheduled and provide the recurrence details using a cron expression. Add the job execution logic inside this method.
Here is a link to an example: https://howtodoinjava.com/spring-batch/job-scheduler-example/
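Here is a minimal sketch of both steps together, assuming an injected JobLauncher and a Job bean; the bean names and cron expression are illustrative:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;

@Configuration
@EnableScheduling // step 1: enable Spring's scheduling support
public class BatchScheduler {

    private final JobLauncher jobLauncher;
    private final Job importJob; // illustrative job bean

    public BatchScheduler(JobLauncher jobLauncher, Job importJob) {
        this.jobLauncher = jobLauncher;
        this.importJob = importJob;
    }

    // step 2: run at 06:00 and 18:00 daily (fields: sec min hour day month weekday)
    @Scheduled(cron = "0 0 6,18 * * *")
    public void runJob() throws Exception {
        // a unique parameter so each run creates a new job instance
        jobLauncher.run(importJob, new JobParametersBuilder()
                .addLong("runAt", System.currentTimeMillis())
                .toJobParameters());
    }
}
```

The twice-daily cron matches the use case above; the unique "runAt" parameter matters because Spring Batch refuses to relaunch a completed job instance with identical parameters.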
From this article we can learn that Spring Batch holds the job's status in a SQL repository.
And from this article we can learn that the location of the JobRepository can be configured: it can be in-memory or a remote DB.
So if we need to scale a batch job, should we run several different Spring Batch JARs, all configured to use the same shared DB in order to keep them synchronized?
Is this the right pattern / architecture?
Yes, this is the way to go. The problem that might happen when you launch the same job from different physical nodes is that you can create the same job instance twice. In this case, Spring Batch will not know which instance to pick up when restarting a failed execution. A shared job repository acts as a safeguard to prevent this kind of concurrency issue.
The job repository achieves this synchronization thanks to the transactional capabilities of the underlying database. The IsolationLevelForCreate can be set to an aggressive value (SERIALIZABLE is the default) in order to avoid the aforementioned issue.
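Here is a minimal sketch of configuring that shared repository explicitly via JobRepositoryFactoryBean; the bean wiring is illustrative, and SERIALIZABLE is set only to make the default safeguard explicit:

```java
import javax.sql.DataSource;

import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.repository.support.JobRepositoryFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class SharedRepositoryConfig {

    @Bean
    public JobRepository jobRepository(DataSource dataSource,
                                       PlatformTransactionManager transactionManager) throws Exception {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        factory.setDataSource(dataSource); // the shared DB all nodes point to
        factory.setTransactionManager(transactionManager);
        // SERIALIZABLE is already the default for the create* methods
        factory.setIsolationLevelForCreate("ISOLATION_SERIALIZABLE");
        factory.afterPropertiesSet();
        return factory.getObject();
    }
}
```

Every node in the cluster should use this same DataSource so the repository can arbitrate concurrent job creation.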
I have a Quartz setup with multiple instances and I want to interrupt a job wherever it is executed. As it was said in documentation, Scheduler.interrupt() method is not cluster aware so I'm looking for some common practice to overcome such limitation.
Well, here are some basics you should use to achieve that.
When running in cluster mode, the information about the currently running jobs is available in the Quartz tables. For instance, the QRTZ_FIRED_TRIGGERS table (with the default table prefix) contains the jobs being executed.
The first column of this table is the name of the scheduler in charge of it, so it is pretty easy to know who is doing what.
Then, if you enable the JMX export of your Quartz instances (org.quartz.scheduler.jmx.export), the exposed MBeans give you a new entry point to remotely manage each scheduler individually. The MBean provides a method boolean interruptJob("JobName", "JobGroup").
Then you "just" need to call this method on the appropriate scheduler instance to effectively interrupt it.
I tried the whole process manually and it works fine; it just needs to be automated :)
HTH
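Here is a minimal sketch of that remote JMX call, assuming the target node exposes its MBean server over RMI on port 9999 and that you have looked up the scheduler's ObjectName in its MBean tree (host, port, and the ObjectName shown are illustrative):

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RemoteQuartzInterrupter {

    // Interrupts a job on a specific cluster node via its Quartz scheduler MBean.
    public static void interrupt(String host, int port,
                                 String schedulerObjectName, // e.g. "quartz:type=QuartzScheduler,name=...,instance=..."
                                 String jobName, String jobGroup) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            connection.invoke(new ObjectName(schedulerObjectName),
                    "interruptJob",
                    new Object[] { jobName, jobGroup },
                    new String[] { String.class.getName(), String.class.getName() });
        }
    }
}
```

Pair this with a query against QRTZ_FIRED_TRIGGERS to decide which node's MBean server to connect to.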
You are right. Scheduler.interrupt() does not work in cluster mode; a job trigger may be fired by a scheduler on one node while this API is called on another node.
To overcome this, you can use a message broker (e.g. JMS, RabbitMQ, etc.) with a publish/subscribe model. Instead of calling Scheduler.interrupt() directly, the client publishes an interruption message to the broker; the payload consists of the identity of the job detail (the JobKey) and the name of the scheduler (if there are multiple schedulers in a node). The message is then consumed by every node on which a Quartz instance is running; each node finds its Quartz scheduler by name and calls Scheduler.interrupt() on it with the JobKey taken from the message payload.
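Here is a minimal sketch of the consumer side of that approach using Spring JMS, assuming a destination named quartz.interrupt and a payload of the form schedulerName:jobGroup:jobName (the destination name and payload format are illustrative; a JMS connection factory and @EnableJms are assumed to be configured elsewhere):

```java
import org.quartz.JobKey;
import org.quartz.Scheduler;
import org.quartz.impl.SchedulerRepository;
import org.springframework.jms.annotation.JmsListener;
import org.springframework.stereotype.Component;

@Component
public class InterruptionListener {

    // Every node subscribes to the topic; only the node actually running
    // the job will have an execution to interrupt.
    @JmsListener(destination = "quartz.interrupt")
    public void onInterrupt(String payload) throws Exception {
        String[] parts = payload.split(":");
        String schedulerName = parts[0];
        JobKey jobKey = new JobKey(parts[2], parts[1]);

        // Look up the local scheduler instance by name, as described above.
        Scheduler scheduler = SchedulerRepository.getInstance().lookup(schedulerName);
        if (scheduler != null) {
            scheduler.interrupt(jobKey); // returns false on nodes not running the job
        }
    }
}
```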
I am looking at Spring Batch 2.0 to implement a pipeline process. The process listens for an event and needs to perform a set of transformation steps based on the event type and its content.
Spring Batch seems to be a great fit. However, going through the documentation, every example has the job and its steps configured in XML. Does the framework support creating jobs at run time and configuring the steps dynamically?
The job configuration itself is set before the job runs, but it is possible to create a flexible job configuration with conditional flows.
You can't change the job configuration while the job runs, but between jobs it is easy to replace the configuration.
Addendum to Michael's answer:
Do you want to create a flow from beginning to end completely dynamically, or do you want some dynamic behavior at certain points?
As Spring Batch instantiates jobs (with all their internals) from XML configuration, all the necessary beans have setters/getters, so you can create the Job from scratch in code. This is a long and bug-prone way (you need to create a FlowJob as JobParserJobFactoryBean does, then a SimpleFlow, then a StepState, then a TaskletStep as SimpleStepFactoryBean does, and bind them all together).
I think the alternative to XML flows could be your own coded logic. To Spring Batch it will look like one step, but with a custom implementation and subflow. See the <tasklet ref="myCleverTasklet" /> example in Example Tasklet Implementation.
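Here is a minimal sketch of such a coded-logic tasklet, choosing a transformation path from a job parameter; the event types and transform methods are illustrative:

```java
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class MyCleverTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        // Read the event type passed in as a job parameter.
        String eventType = (String) chunkContext.getStepContext()
                .getJobParameters().get("eventType");

        // Choose the transformation pipeline dynamically, in plain code.
        if ("ORDER_CREATED".equals(eventType)) {
            transformOrder();   // hypothetical transformation step
        } else {
            transformDefault(); // hypothetical fallback step
        }
        return RepeatStatus.FINISHED;
    }

    private void transformOrder() { /* ... */ }

    private void transformDefault() { /* ... */ }
}
```

The job stays statically configured as a single tasklet step in XML, while the branching lives in code you can change freely.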
I am using Spring Batch. I have an ETL process that writes records to a DB and, after the ETL process completes, also writes a FLAG to a PROCESS_COMPLETE table.
Now, I'd like my Spring job to trigger once, when both of the conditions below are true:
It is past 5 PM and
The FLAG has been written in PROCESS_COMPLETE table
I would appreciate it if someone could suggest how to achieve the above using Spring Batch.
I'd recommend using Quartz for this. Actually triggering the start of a job is not Spring Batch's responsibility. Using Quartz you can create a custom trigger that will fire when both the time and database conditions are met.
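As a simpler variant of the same idea, here is a minimal sketch that polls with Spring scheduling instead of a full custom Quartz Trigger: it checks the flag every 10 minutes after 5 PM and launches the job at most once per day. The table, column, and bean names are illustrative assumptions:

```java
import java.time.LocalDate;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class FlagAwareLauncher {

    private final JdbcTemplate jdbcTemplate;
    private final JobLauncher jobLauncher;
    private final Job downstreamJob; // illustrative job bean

    public FlagAwareLauncher(JdbcTemplate jdbcTemplate, JobLauncher jobLauncher, Job downstreamJob) {
        this.jdbcTemplate = jdbcTemplate;
        this.jobLauncher = jobLauncher;
        this.downstreamJob = downstreamJob;
    }

    // Time condition: poll every 10 minutes between 17:00 and 23:00.
    @Scheduled(cron = "0 0/10 17-23 * * *")
    public void launchWhenReady() throws Exception {
        // Flag condition: has today's FLAG been written? (illustrative schema)
        Integer flags = jdbcTemplate.queryForObject(
                "SELECT COUNT(*) FROM PROCESS_COMPLETE WHERE FLAG = 'Y' AND PROCESS_DATE = ?",
                Integer.class, java.sql.Date.valueOf(LocalDate.now()));
        if (flags != null && flags > 0) {
            try {
                // Date-based parameter: the same job instance cannot start twice in one day.
                jobLauncher.run(downstreamJob, new JobParametersBuilder()
                        .addString("runDate", LocalDate.now().toString())
                        .toJobParameters());
            } catch (JobInstanceAlreadyCompleteException e) {
                // Already ran today; ignore subsequent polls.
            }
        }
    }
}
```

The date-based job parameter is what enforces "trigger once": Spring Batch refuses to recreate a completed instance with the same parameters, so later polls the same evening are no-ops.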