Spring Batch: different job launcher for different jobs - java

I have 2 different jobs (actually more, but for simplicity assume 2). Each job can run in parallel with the other job, but instances of the same job should run sequentially (otherwise the instances will cannibalize each other's resources).
Basically, I want each of these jobs to have its own queue of job instances. I figured I could do this using two different thread-pooled job launchers (each with 1 thread) and associating a job launcher with each job.
Is there a way to do this that will be respected when launching jobs from the Spring Batch Admin web UI?

There is a way to specify a specific job launcher for a specific job, but the only way I have found to do it is through the use of a JobStep.
If you have a job called "specificJob", this will create another job, "queueSpecificJob", so that when you launch it, whether through Quartz or the Spring Batch Admin web UI, it queues up a "specificJob" execution.
<bean id="specificJobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
<property name="jobRepository" ref="jobRepository"/>
<property name="taskExecutor">
<task:executor id="singleThreadPoolExecutor" pool-size="1"/>
</property>
</bean>
<job id="queueSpecificJob">
<step id="specificJobStep">
<job ref="specificJob" job-launcher="specificJobLauncher" job-parameters-extractor="parametersExtractor" />
</step>
</job>
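For reference, a roughly equivalent Java configuration for the single-threaded launcher could look like this (a sketch; the configuration class and bean wiring are illustrative):

import org.springframework.batch.core.launch.support.SimpleJobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class LauncherConfig {

    @Bean
    public SimpleJobLauncher specificJobLauncher(JobRepository jobRepository) {
        // One thread means one running instance at a time; further launches queue up
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(1);
        executor.setMaxPoolSize(1);
        executor.initialize();

        SimpleJobLauncher launcher = new SimpleJobLauncher();
        launcher.setJobRepository(jobRepository);
        launcher.setTaskExecutor(executor);
        return launcher;
    }
}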

# ahbutfore
How are the jobs triggered? Do you use a Quartz trigger by any chance?
If yes, would implementing/extending the org.quartz.StatefulJob interface in all your jobs do the work for you?
See the Spring beans configuration here: https://github.com/regunathb/Trooper/blob/master/examples/example-batch/src/main/resources/external/shellTaskletsJob/spring-batch-config.xml. Check the source code of org.trpr.platform.batch.impl.spring.job.BatchJob.
You can do more complex serialization (including across Spring Batch nodes) using a suitable "Leader Election" implementation. I have used Netflix Curator (an Apache ZooKeeper recipe) in my project. Some pointers here: https://github.com/regunathb/Trooper/wiki/Useful-Batch-Libraries
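For illustration, a leader-election gate with Curator might look roughly like this (a sketch; the ZooKeeper path and the job invocation are hypothetical):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.leader.LeaderSelector;
import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;

public class SerializedJobGate extends LeaderSelectorListenerAdapter {

    private final LeaderSelector selector;

    public SerializedJobGate(CuratorFramework client) {
        // All nodes contend for the same path; only one holds leadership at a time
        this.selector = new LeaderSelector(client, "/batch/specificJob/leader", this); // hypothetical path
        this.selector.autoRequeue(); // re-enter the election after each leadership term
    }

    public void start() {
        selector.start();
    }

    @Override
    public void takeLeadership(CuratorFramework client) throws Exception {
        // Only the elected node reaches this point; returning from this
        // method relinquishes leadership to the next contender.
        runJob(); // hypothetical hook that launches the Spring Batch job
    }

    private void runJob() {
        // launch the job via a JobLauncher here
    }
}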

Using a shell script, you can launch different jobs in parallel.
Add an '&' to the end of each command line; the shell will run them in parallel alongside its own execution.

Related

Synchronize batch applications deployed on different servers to process file by one server only

I am writing a Spring Batch job to read a file from a shared drive and load the data into a shared DB. This batch will be deployed/executed from 2 nodes (servers). I want to make sure the file is read and the data loaded by only one server.
I am not finding anything concrete on the internet. I have a couple of ideas to handle this, as mentioned below.
1. Use FileChannel.tryLock to get a lock on the file and move the file after reading it (see the sketch after this question).
2. Maintain a table in the shared DB with a record, say "fileReadJobExecution", whose status is initially NULL. When the batch application runs, it looks up the record with a NULL status and tries to update the status to IN_PROGRESS. Whichever node (server) gets updateCount > 0 is allowed to read the file from the shared location, and after success that batch updates the status back to NULL.
I am looking for something already available in either Spring Batch or Java to handle multi-node synchronization against a shared server.
Please help with suggestions.
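As a rough illustration of idea 1, a claim step based on FileChannel.tryLock might look like this (a minimal sketch; paths and error handling are illustrative, and note that file locks on network shares are advisory and platform-dependent):

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SharedFileClaimer {

    // Returns true only on the node that wins the lock; the winner should
    // process the file and then move it out of the pickup directory.
    public static boolean tryClaim(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false; // another process already holds the lock
            }
            try {
                // read the file and load the data here
                return true;
            } finally {
                lock.release();
            }
        } catch (IOException e) {
            return false;
        }
    }
}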
It sounds like you could use either remote chunking or partitioning to achieve your objective. From what you've described, I think partitioning would work best.
You could create a master Step to pull in your list of files and then delegate the processing of those files to slave Step objects, either remotely or locally on different threads, passing the file name via the ExecutionContext.
The Spring Batch Samples GitHub project has some great examples, and I think you may find the partitionFileJob.xml particularly helpful.
In particular, review the following Bean definitions from the sample project:
<job id="partitionJob" xmlns="http://www.springframework.org/schema/batch">
<step id="step">
<partition step="step1" partitioner="partitioner">
<handler grid-size="2" task-executor="taskExecutor" />
</partition>
</step>
</job>
<bean id="partitioner" class="org.springframework.batch.core.partition.support.MultiResourcePartitioner">
<property name="resources" value="classpath:data/iosample/input/delimited*.csv" />
</bean>
<bean id="itemReader" scope="step" autowire-candidate="false" parent="itemReaderParent">
<property name="resource" value="#{stepExecutionContext[fileName]}" />
</bean>

Clustered Quartz trigger is paused by Camel on shutdown

I have the following configuration in quartz.properties:
org.quartz.scheduler.instanceId=AUTO
org.quartz.scheduler.instanceName=JobCluster
org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.oracle.OracleDelegate
org.quartz.jobStore.dataSource=myDataSource
org.quartz.dataSource.myDataSource.jndiURL=jdbc/myDataSource
org.quartz.jobStore.isClustered=true
org.quartz.threadPool.threadCount=5
The Spring configuration looks like this:
<bean id="quartz2" class="org.apache.camel.component.quartz2.QuartzComponent">
    <property name="propertiesFile" value="quartz.properties"/>
</bean>

<route>
    <from uri="quartz2://myTrigger?job.name=myJob&amp;job.durability=true&amp;stateful=true&amp;trigger.repeatInterval=60000&amp;trigger.repeatCount=-1"/>
    <to uri="bean:myBean?method=retrieve"/>
    ....
On application shutdown the Quartz trigger state changed to PAUSED, and after the next start it never changed back to WAITING, so it never fired again.
Is it possible to configure Quartz/Camel to resume the trigger after an application restart?
Camel version is 2.12.0.
Spring version is 3.2.4.RELEASE.
Such behavior actually contradicts their statement in the documentation:
If you use Quartz in clustered mode, e.g. the JobStore is clustered. Then the Quartz2 component will not pause/remove triggers when a node is being stopped/shutdown. This allows the trigger to keep running on the other nodes in the cluster.
If you want to dynamically suspend/resume routes as the
org.apache.camel.impl.ThrottlingRoutePolicy
does, then it's advised to use org.apache.camel.SuspendableService, as it allows for fine-grained suspend and resume operations. And use org.apache.camel.util.ServiceHelper to aid when invoking these operations, as it supports fallback for regular org.apache.camel.Service instances.
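As a concrete starting point, CamelContext itself exposes suspendRoute and resumeRoute, which you could call from your own lifecycle code (a minimal sketch; the route id is hypothetical):

import org.apache.camel.CamelContext;

public class RouteControl {

    private final CamelContext camelContext;

    public RouteControl(CamelContext camelContext) {
        this.camelContext = camelContext;
    }

    // Suspend the route's consumer gracefully; with a clustered JobStore
    // the trigger can keep firing on the other nodes.
    public void pause() throws Exception {
        camelContext.suspendRoute("myQuartzRoute"); // hypothetical route id
    }

    public void resume() throws Exception {
        camelContext.resumeRoute("myQuartzRoute");
    }
}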
For more details, please refer to RoutePolicy and the Quartz2 Component documentation.
Hope this might help.

how to remove unused triggers from quartz tables

I'm using Spring with Quartz and everything is working fine, but some previously configured triggers also get executed because they are still stored in the Quartz tables.
We can manually delete all the unconfigured triggers and run the application, but that is not good practice.
I want to remove all such triggers through a Spring + Quartz property or some other solution.
I have configured 3 triggers in the Spring configuration file like this:
<property name="triggers">
<list>
<ref bean="FirstTrigger" />
<ref bean="secondTrigger" />
<ref bean="ThirdTrigger"/>
</list>
</property>
When the server starts, all the triggers are stored in the Quartz tables with their corresponding cron triggers and job details.
If I remove any of the triggers from my configuration (in the example above, say secondTrigger), it is not removed from the Quartz tables.
That removed trigger (now present only in the DB) is then still executed.
In the Spring + Quartz integration, is there any property to handle this problem, or do we need to do something else?
Thanks in advance.
If you store triggers in the DB (assuming your triggers are cron-based), you can simply delete the records like this:
DELETE FROM QRTZ_CRON_TRIGGERS WHERE SCHED_NAME='scheduler' and TRIGGER_NAME='myTrigger' and TRIGGER_GROUP='DEFAULT';
DELETE FROM QRTZ_TRIGGERS WHERE SCHED_NAME='scheduler' and TRIGGER_NAME='myTrigger' and TRIGGER_GROUP='DEFAULT';
You may also want to look around the other Quartz DB tables to find leftovers related to your job.
You can access the Quartz Scheduler, Jobs, Triggers, etc. using the Quartz API.
Have a look at this Quartz CookBook; it shows how to list all the defined triggers, etc. You could remove the unnecessary triggers using this API.
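For example, a cleanup pass over the scheduler might look roughly like this (a sketch assuming the Quartz 2.x API; the "configured" set of trigger keys is something you would build from your Spring configuration):

import java.util.Set;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.TriggerKey;
import org.quartz.impl.matchers.GroupMatcher;

public class TriggerCleaner {

    // Unschedules every trigger that is not in the configured set,
    // which also deletes its rows from the QRTZ_* tables.
    public static void removeUnconfigured(Scheduler scheduler, Set<TriggerKey> configured)
            throws SchedulerException {
        for (String group : scheduler.getTriggerGroupNames()) {
            for (TriggerKey key : scheduler.getTriggerKeys(GroupMatcher.triggerGroupEquals(group))) {
                if (!configured.contains(key)) {
                    scheduler.unscheduleJob(key);
                }
            }
        }
    }
}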

Several task executors in spring integration

I am currently working on a project involving lots of asynchronous tasks running independently. I have one Spring configuration file:
<task:executor id="taskScheduler" pool-size="5-20">
<task:executor id="specificTaskScheduler" pool-size="5-50" queue-capacity="100">
<!-- integration beans and
several object pools, with a total number of 100 beans created
using CommonsPoolTargetSource -->
I specifically created two executors: one to be used for Spring Integration needs, and a custom one intended to run only my own tasks, which I feed to the integration beans via an explicit reference. After that I submitted a long-running task for processing. My EAR runs on WebLogic, and when I dumped a stack trace of the running threads I was very disappointed to find that most of the fifty threads in my custom executor were waiting in the executor's queue for an object to become available from the pool. I did not want CommonsPoolTargetSource to use my executor as a platform for managing its resources. What can I do here? Maybe creating a separate Spring file with the CommonsPoolTargetSource beans will solve it? Thank you for any ideas.
Thanks, guys. It turned out that the pool was not the problem; I just had to add more instances to it and slightly increase the executor's pool size, with the queue capacity set to zero and the rejection policy set to execute the task on the caller's thread. I have yet to test it under heavy load, though.
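For reference, the executor settings described above might look like this in Java (a sketch; the pool sizes mirror the XML above, the rest is illustrative):

import java.util.concurrent.ThreadPoolExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

public class ExecutorConfig {

    public static ThreadPoolTaskExecutor specificTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(5);
        executor.setMaxPoolSize(50);
        executor.setQueueCapacity(0); // no internal queue: a task must find a free thread
        // if all threads are busy, run the task synchronously on the caller's thread
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        executor.initialize();
        return executor;
    }
}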

run one scheduler at a time

I have two beans that run schedulers:
<bean id="eventService" class="xxx.xxxx.xxxxx.EventSchedulerImpl">
</bean>
<bean id="UpdateService" class="xxx.xxxx.xxxxx.UpdateSchedulerImpl">
</bean>
I want to make sure only one scheduler is running at a time: when EventSchedulerImpl is running, UpdateSchedulerImpl should not run. I have also implemented "StatefulJob" on both schedulers.
Will that work? Do I need to do more?
Appreciate your ideas, guys.
One way would be to configure a special task executor so that it contains only one thread in its thread pool, and configure its queue capacity so that jobs can be kept on hold. With this task executor only one task can run at a time; the other task gets queued.
But I don't like this approach: having a single-threaded task executor seems like a recipe for problems down the road.
What I would do is simply write a wrapper service that calls your target services in the order you need, then schedule the wrapper service instead.
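A minimal sketch of that wrapper approach, assuming each scheduler exposes some run method (the Runnable hand-off here is a hypothetical stand-in for whatever EventSchedulerImpl and UpdateSchedulerImpl actually expose):

public class SequentialJobRunner {

    private final Runnable eventJob;
    private final Runnable updateJob;

    public SequentialJobRunner(Runnable eventJob, Runnable updateJob) {
        this.eventJob = eventJob;
        this.updateJob = updateJob;
    }

    // Schedule this single method instead of the two jobs individually;
    // the second job never starts until the first one has finished.
    public void runSequentially() {
        eventJob.run();
        updateJob.run();
    }
}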
