ExecutionContext and thread safety in Spring Batch

I have read a few articles about this, but I am not sure about my case.
I am using the ExecutionContext in order to pass params from a Tasklet to a Step.
I want to make sure that if I execute the same job in parallel (separate job instances) using the same steps, I won't have concurrency side effects.
This is my job:
@Bean
public Job processFileJob() throws Exception {
    return this.jobs.get("processFileJob")
            .start(downloadFileStep())
            .next(processSnidFileStep())
            .build();
}

@Bean
public Step downloadFileStep() {
    return this.steps.get("downloadFileTaskletStep")
            .tasklet(downloadFileTasklet())
            .listener(executionContextPromotionListener())
            .build();
}
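For completeness, the promotion listener wired in above probably looks something like this (a minimal sketch; the "totalRecords" key matches the tasklet shown below, everything else is an assumption):

@Bean
public ExecutionContextPromotionListener executionContextPromotionListener() {
    ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
    // promote this key from the step's ExecutionContext to the job's ExecutionContext
    listener.setKeys(new String[] { "totalRecords" });
    return listener;
}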
I am passing the param from downloadFileStep to processSnidFileStep this way:
public class DownloadFileTasklet implements Tasklet, StepExecutionListener {
    ..
    private void downloadFileFromExtractTool(ChunkContext chunkContext,
    ..
        stepContext.put("totalRecords", totalRecords);
    ..
}
and retrieve the param in my step this way:
@Bean
@Scope(value = "step", proxyMode = ScopedProxyMode.INTERFACES)
public ItemProcessor<MyDTO, MyDTO> processor(@Value("#{jobExecutionContext[totalRecords]}") int totalRecords) {
    return new PushItemProcessor(totalRecords);
}
Now I can see the param successfully inside the processor step.
But what if I execute the whole job in parallel, with different values for different job executions? Are there any concurrency side effects?
Thank you.
ray.

From the Spring Batch reference we can see that one of the intended usage scenarios is "Concurrent batch processing: parallel processing of a job".
As already answered in the comments, each job execution has its own execution context, so the Spring Batch design assumes thread safety for the execution context, provided you follow the design, of course.
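To make that concrete, here is a minimal sketch (the jobLauncher variable and the "inputFile" parameter are assumptions): launching the same Job with different JobParameters yields separate job instances, each with its own execution context, so values don't leak between parallel executions.

JobParameters params1 = new JobParametersBuilder()
        .addString("inputFile", "file1.csv")
        .toJobParameters();
JobParameters params2 = new JobParametersBuilder()
        .addString("inputFile", "file2.csv")
        .toJobParameters();
// two distinct job instances, each with an independent ExecutionContext
jobLauncher.run(processFileJob, params1);
jobLauncher.run(processFileJob, params2);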
Here is a link to the Spring Batch reference guide; it's not that big, actually:
https://docs.spring.io/spring-batch/trunk/reference/html/spring-batch-intro.html#springBatchUsageScenarios


spring batch : how to dynamically create steps and tasks

I am a Spring newbie and a Spring Batch newbie, so please bear with me.
I understand that Spring Batch is the framework that will help run steps and tasks.
I tried using Spring Batch by creating steps and tasks, but these steps and tasks are hardcoded at program build/compile time. I could not figure out how to create tasks and steps dynamically.
What I want to do is have a user create a script describing how tasks are assembled from a list of steps. Each step will invoke a remote call to an existing REST endpoint. A task will have multiple such steps, and the user will create multiple such tasks.
Is it possible to dynamically create such a task with such steps? If yes, could you point me to some sample code for how to do this with the required API?
UPDATE :
I understand that a HelloWorld job which calls a REST application via callRestApplication() looks like this:
@SpringBootApplication
@EnableBatchProcessing
public class HelloApplication {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Bean
    public Step step() {
        return this.stepBuilderFactory.get("step1")
                .tasklet((stepContribution, chunkContext) -> {
                    callRestApplication();
                    return RepeatStatus.FINISHED;
                }).build();
    }

    @Bean
    public Job job() {
        return this.jobBuilderFactory.get("HelloWorldJob")
                .start(step())
                .build();
    }

    public static void main(String[] args) {
        SpringApplication.run(HelloApplication.class, args);
    }
}
However, this is static. I was looking for a way to do something like this:
public static void main(String[] args) {
    SpringApplication.run(HelloagainApplication.class, args);
    List<JobDefinition> jobDefinitions = parseScript();
    for (JobDefinition jobDefinition : jobDefinitions) {
        Job job = new Job();
        job.setName(jobDefinition.getName());
        for (StepDefinition stepDefinition : jobDefinition.getStepDefinitions()) {
            Step step = new Step();
            step.setEndPoint(stepDefinition.getEndPoint());
            step.setName(stepDefinition.getName());
            step.setProperty1(stepDefinition.getProp1());
            job.addStep(step);
        }
        job.startJob();
    }
}
If the script has 10 job definitions, then 10 jobs will be started, each with x number of steps.
In Spring Batch, how can I achieve the equivalent of the following?
job.addStep(step);
job.startJob();
Thanks
There is no such feature in Spring Batch. You need to write custom code that parses your user's script and creates job/step definitions accordingly.
That said, Spring Cloud Data Flow might help you. You can pre-define your tasks and let the user compose them using the GUI or the text-based scriptable DSL. Another feature that could be interesting to you is the single-step job starter from Spring Cloud Task, which allows you to dynamically create a single-step job by providing some properties (no coding needed).
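For the custom-code route, a rough sketch could look like the following, reusing the JobDefinition/StepDefinition types from the question (the callEndpoint() helper is assumed; this is plain builder usage, not a built-in feature):

import java.util.List;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.job.builder.SimpleJobBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class ScriptedJobRunner {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private JobLauncher jobLauncher;

    public void runScriptedJobs(List<JobDefinition> jobDefinitions) throws Exception {
        for (JobDefinition jobDefinition : jobDefinitions) {
            SimpleJobBuilder jobBuilder = null;
            for (StepDefinition stepDefinition : jobDefinition.getStepDefinitions()) {
                // each step wraps a single REST call in a tasklet
                Step step = stepBuilderFactory.get(stepDefinition.getName())
                        .tasklet((contribution, chunkContext) -> {
                            callEndpoint(stepDefinition.getEndPoint()); // assumed helper
                            return RepeatStatus.FINISHED;
                        })
                        .build();
                jobBuilder = (jobBuilder == null)
                        ? jobBuilderFactory.get(jobDefinition.getName()).start(step)
                        : jobBuilder.next(step);
            }
            if (jobBuilder != null) {
                jobLauncher.run(jobBuilder.build(), new JobParameters());
            }
        }
    }
}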

How can I define multiple jobs dynamically in Spring Batch?

I have an application that uses Spring Batch to define a preset number of jobs, which are currently all defined in XML.
We add more jobs over time, which requires updating the XML; however, these jobs are always based on the same parent and can easily be predetermined using a simple SQL query.
So I've been trying to switch to some combination of XML configuration and Java-based configuration, but am quickly getting confused.
Even though we have many jobs, each job definition falls into essentially one of two categories. All of the jobs inherit from one or the other parent job and are effectively identical, apart from having different names. The job name is used in the process to select different data from the database.
I've come up with some code much like the following but have run into problems getting it to work.
Full disclaimer that I'm also not entirely sure I'm going about this in the right way. More on that in a second; first, the code:
@Configuration
@EnableBatchProcessing
public class DynamicJobConfigurer extends DefaultBatchConfigurer implements InitializingBean {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private JobRegistry jobRegistry;

    @Autowired
    private DataSource dataSource;

    @Autowired
    private CustomJobDefinitionService customJobDefinitionService;

    private Flow injectedFlow1;
    private Flow injectedFlow2;

    public void setupJobs() throws DuplicateJobException {
        List<JobDefinition> jobDefinitions = customJobDefinitionService.getAllJobDefinitions();
        for (JobDefinition jobDefinition : jobDefinitions) {
            Job job = null;
            if (jobDefinition.getType() == 1) {
                job = jobBuilderFactory.get(jobDefinition.getName())
                        .start(injectedFlow1).build()
                        .build();
            } else if (jobDefinition.getType() == 2) {
                job = jobBuilderFactory.get(jobDefinition.getName())
                        .start(injectedFlow2).build()
                        .build();
            }
            if (job != null) {
                jobRegistry.register(new ReferenceJobFactory(job));
            }
        }
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        setupJobs();
    }

    public void setInjectedFlow1(Flow injectedFlow1) {
        this.injectedFlow1 = injectedFlow1;
    }

    public void setInjectedFlow2(Flow injectedFlow2) {
        this.injectedFlow2 = injectedFlow2;
    }
}
I have the flows that get injected defined in the XML, much like this:
<batch:flow id="injectedFlow1">
    <batch:step id="InjectedFlow1.Step1" next="InjectedFlow1.Step2">
        <batch:flow parent="InjectedFlow.Step1" />
    </batch:step>
    <batch:step id="InjectedFlow1.Step2">
        <batch:flow parent="InjectedFlow.Step2" />
    </batch:step>
</batch:flow>
So as you can see, I'm effectively kicking off the setupJobs() method (which is intended to dynamically create these job definitions) from the afterPropertiesSet() method of InitializingBean. I'm not sure that's right. It is running, but I'm not sure if there's a different entry point better intended for this purpose. Also, I'm not sure what the point of the @Configuration annotation is, to be honest.
The problem I'm currently running into is as soon as I call register() from JobRegistry, it throws the following IllegalStateException:
To use the default BatchConfigurer the context must contain no more than one DataSource, found 2.
Note: my project actually has two data sources defined. The first is the default dataSource bean which connects to the database that Spring Batch uses. The second data source is an external database, and this second one contains all the information I need to define my list of jobs. But the main one does use the default name "dataSource" so I'm not quite sure how else I can tell it to use that one.
First of all, I don't recommend using a combination of XML and Java configuration. Use only one, preferably the Java one, as it's not much effort to convert XML config to Java config (unless you have some very good reasons to do otherwise that you haven't explained).
I haven't used Spring Batch alone, as I have always used it with Spring Boot, and I have a project where I have defined multiple jobs; it has always worked well for code similar to what you have shown.
For your issue, there are some answers on SO like this one OR this one, which basically say that you need to write your own BatchConfigurer and not rely on the default one.
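As a rough sketch of that idea, a custom configurer can pin Spring Batch to one specific DataSource (the "dataSource" qualifier here is an assumption based on your bean names):

@Component
public class CustomBatchConfigurer extends DefaultBatchConfigurer {

    @Autowired
    public CustomBatchConfigurer(@Qualifier("dataSource") DataSource batchDataSource) {
        // Batch's metadata tables live in this DataSource only,
        // so the second DataSource no longer trips up the configurer
        super(batchDataSource);
    }
}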
Now coming to a solution using Spring Boot:
With Spring Boot, you should try to segregate job definitions from job executions.
First, just define the jobs and initialize the Spring context without enabling jobs on startup (spring.batch.job.enabled=false).
In your Spring Boot main method, when you start the app with something like SpringApplication.run(Application.class, args); you will get an ApplicationContext ctx.
Now you can get your relevant beans from this context and launch specific jobs by getting names from a property, the command line, etc., and using the JobLauncher.run(...) method.
You can refer to my other answer if you're willing to order job executions. You can also write job schedulers in Java.
The point being, you separate your job building / bean configuration from your job execution concerns.
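A minimal sketch of that flow (taking the job bean name from args[0] and adding a timestamp parameter are assumptions):

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.ApplicationContext;

@SpringBootApplication
@EnableBatchProcessing
public class Application {

    public static void main(String[] args) throws Exception {
        // spring.batch.job.enabled=false keeps Boot from running every Job bean on startup
        ApplicationContext ctx = SpringApplication.run(Application.class, args);
        JobLauncher jobLauncher = ctx.getBean(JobLauncher.class);
        Job job = ctx.getBean(args[0], Job.class); // pick the job by bean name
        jobLauncher.run(job, new JobParametersBuilder()
                .addLong("run.ts", System.currentTimeMillis()) // unique parameters => new job instance
                .toJobParameters());
    }
}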
Challenge
Keeping multiple jobs in a single project can be challenging when you try to have different settings for each job, since the application.properties file is environment specific and not job specific, i.e. Spring Boot properties will apply to all jobs.
In my particular case, the solution was to actually eliminate the @Configuration and @EnableBatchProcessing annotations from my class above. Something about these caused it to try to use the DefaultBatchConfigurer, which fails when you have more than one data source defined (even if you've identified them clearly, with "dataSource" as the primary and some other name for the secondary).
The @Configuration annotation in particular wasn't necessary, because all it really does is let your class get auto-instantiated without having to define it as a bean in the app context. But since I was doing that anyway, this one was superfluous.
One of the downsides of removing @EnableBatchProcessing was that I could no longer auto-wire the JobBuilderFactory bean. So instead I had to create it myself:
// build the JobRepository by hand against the primary data source
JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
factory.setDataSource(dataSource);
factory.setTransactionManager(transactionManager);
factory.afterPropertiesSet();
jobRepository = factory.getObject();
// then create the factory @EnableBatchProcessing would normally provide
jobBuilderFactory = new JobBuilderFactory(jobRepository);
Then it seems I was already on the right track by using jobRegistry.register(...) to define my jobs. So essentially, once I removed those annotations, everything started working. I'm going to mark Sabir's answer as the correct one, however, because it helped me out.

Spring - Dynamically add and remove scheduled task

I am working on a Spring-based server application. Basically, it will poll scores of various sporting events at very short intervals and save them to the DB. For polling, there will be many (around 100) concurrent calls to different APIs at regular intervals; for example, some API calls will have a 3-second interval, some 5 seconds, etc. The server will keep polling for the latest data at frequent intervals.
These calls will be added and removed dynamically. I have little experience with Spring. I think I have to use some scheduler. Can anyone point me in the right direction as to which approach or scheduler is best in this scenario?
In essence, you want to inject an instance of a scheduling task executor:
@Configuration
public class MyApplicationConfiguration {

    @Bean
    public ThreadPoolTaskScheduler threadPoolTaskScheduler() {
        ThreadPoolTaskScheduler tpts = new ThreadPoolTaskScheduler();
        // maybe configure it a little?
        return tpts;
    }

    @Bean
    public MyService myService() {
        return new MyService();
    }
}

class MyService {

    @Autowired
    private ThreadPoolTaskScheduler tpts;

    public void doSomething() {
        Runnable task = ...
        tpts.scheduleWithFixedDelay(task, 1000);
    }
}
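Since you also need to remove calls dynamically, note that scheduleWithFixedDelay(...) returns a ScheduledFuture you can keep and cancel later. A sketch of what MyService could track (the map and method names are assumptions):

private final Map<String, ScheduledFuture<?>> tasks = new ConcurrentHashMap<>();

public void addPoller(String id, Runnable task, long intervalMillis) {
    tasks.put(id, tpts.scheduleWithFixedDelay(task, intervalMillis));
}

public void removePoller(String id) {
    ScheduledFuture<?> future = tasks.remove(id);
    if (future != null) {
        future.cancel(false); // false: let an in-flight poll finish
    }
}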
You can see a reasonable guide here, or the SchedulingTaskExecutor Javadoc and the Spring Task Execution and Scheduling Reference
You can use the @Scheduled Spring annotation for this. Refer to this link for examples.
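A minimal sketch of that approach (the class and method names are assumptions; it requires @EnableScheduling on a configuration class):

@Component
public class ScorePoller {

    @Scheduled(fixedDelay = 3000)
    public void pollScores() {
        // runs 3 seconds after the previous invocation completes
    }
}

Keep in mind that @Scheduled intervals are fixed at startup, so for truly dynamic add/remove, the ThreadPoolTaskScheduler approach above is the better fit.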

java scheduler spring vs quartz

Currently I am building a standalone Spring program in order to learn new methods and architectures.
The last few days I have tried to learn schedulers. I had never used them before, so I read some articles covering the different possible methods. Two of them are especially interesting: the native Spring @Scheduled annotation and Quartz.
From what I read, Spring's scheduler is a little smaller than Quartz and much more basic, and Quartz is not easy to use with Spring (because of the autowiring and components).
My problem now is that there is one thing I do not understand:
From my understanding, both methods create parallel threads in order to run the jobs asynchronously. But what if I have a Spring @Service in my main application that holds a HashMap with some information? The data is updated and changed through user interaction. The schedulers run in parallel, and a scheduled job now wants to use this HashMap from the main application as well. Is this even possible?
Or do I understand something wrong? Because there is also the @Async annotation, and I did not understand the difference, since a scheduler itself already runs parallel to the main application, doesn't it?
Summing up, two questions:
Can a job that is executed every five seconds, implemented with a scheduler, use a HashMap from a service inside the main program? (with the Spring @Scheduled and/or Quartz?)
Why is there an @Async annotation? Isn't a scheduler already parallel to the main process?
I have to make a few assumptions about which version of Spring you're using, but as you're in the process of learning, I would assume you're using Spring Boot or a fairly new version, so please excuse me if the annotations don't match your version of Spring. That said, to answer your two questions the best I can:
Can a job that is executed every five seconds, implemented with a scheduler, use a HashMap from a service inside the main program? (with the Spring @Scheduled and/or Quartz?)
Yes, absolutely! The easiest way is to make sure that the HashMap in question is declared as static. To access the HashMap from the scheduled job, simply either autowire your service class or create a static getter for the HashMap.
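A tiny sketch of the autowiring approach (all names are assumptions; a ConcurrentHashMap is used so the scheduler thread and user-facing threads can touch the map safely at the same time):

@Service
class ScoreService {
    // shared singleton state, safe for concurrent access
    private final Map<String, Integer> scores = new ConcurrentHashMap<>();

    public Map<String, Integer> getScores() {
        return scores;
    }
}

@Component
class ScoreJob {

    @Autowired
    private ScoreService scoreService;

    @Scheduled(fixedDelay = 5000)
    public void poll() {
        // same ScoreService instance the rest of the application uses
        scoreService.getScores().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}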
Here is an example from a recent Vaadin project where I needed a scheduled message sent to a set of subscribers.
SchedulerConfig.class
@Configuration
@EnableAsync
@EnableScheduling
public class SchedulerConfig {

    @Scheduled(fixedDelay = 5000)
    public void refreshVaadinUIs() {
        Broadcaster.broadcast(
            new BroadcastMessage(
                BroadcastMessageType.AUTO_REFRESH_LIST
            )
        );
    }
}
Broadcaster.class
public class Broadcaster implements Serializable {

    private static final long serialVersionUID = 3540459607283346649L;

    private static ExecutorService executorService = Executors.newSingleThreadExecutor();
    private static LinkedList<BroadcastListener> listeners = new LinkedList<BroadcastListener>();

    public interface BroadcastListener {
        void receiveBroadcast(BroadcastMessage message);
    }

    public static synchronized void register(BroadcastListener listener) {
        listeners.add(listener);
    }

    public static synchronized void unregister(BroadcastListener listener) {
        listeners.remove(listener);
    }

    public static synchronized void broadcast(final BroadcastMessage message) {
        for (final BroadcastListener listener : listeners)
            executorService.execute(new Runnable() {
                @Override
                public void run() {
                    listener.receiveBroadcast(message);
                }
            });
    }
}
Why is there an @Async annotation? Isn't a scheduler already parallel to the main process?
Yes, the scheduler is running in its own thread, but what happens to the scheduler on long-running tasks (e.g. a SOAP call to a remote server that takes a very long time to complete)?
The @Async annotation isn't required for scheduling, but if you have a long-running function being invoked by the scheduler, it becomes quite important.
This annotation is used to take a specific task and ask Spring's TaskExecutor to execute it on its own thread instead of the current thread. The @Async annotation causes the function to return immediately, while the execution is performed later by the TaskExecutor.
That said, without the @EnableAsync and @Async annotations, the functions you call will hold up the TaskScheduler, as they will be executed on the same thread. On a long-running operation, this would cause the scheduler to be held up and unable to execute any other scheduled functions until it returns.
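For illustration, a minimal sketch of such a long-running call (the class and method names are assumptions; @EnableAsync is already present on the SchedulerConfig above):

@Service
public class SoapClient {

    @Async
    public void callRemoteServer() {
        // the long-running SOAP call executes on the TaskExecutor's thread,
        // so the scheduler thread that invoked it returns immediately
    }
}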
I would suggest a read of Spring's documentation about Task Execution and Scheduling. It provides a great explanation of the TaskScheduler and TaskExecutor in Spring.

Running a Job only once Using Quartz

Is there a way I could run a job only once using Quartz in Java? I understand it does not make sense to use Quartz in this case. But the thing is, I have multiple jobs and they are run multiple times, so I am using Quartz.
Is this even possible?
You should use a SimpleTrigger that fires at a specific time and without repeating. TriggerUtils has many handy methods for creating these kinds of things.
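For example, with the Quartz 2.x builder API, a run-once trigger could look like this (a sketch; the identity and the 10-second start offset are arbitrary):

Trigger runOnce = TriggerBuilder.newTrigger()
        .withIdentity("runOnceTrigger")
        .startAt(DateBuilder.futureDate(10, DateBuilder.IntervalUnit.SECOND))
        .withSchedule(SimpleScheduleBuilder.simpleSchedule().withRepeatCount(0))
        .build();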
Yes, it's possible!
JobKey jobKey = new JobKey("testJob");
JobDetail job = newJob(TestJob.class)
        .withIdentity(jobKey)
        .storeDurably()
        .build();
scheduler.addJob(job, true);
scheduler.triggerJob(jobKey); // trigger the job immediately
In Quartz >= 2.0, you can get the scheduler to unschedule any job after the work is done:

@Override
protected void execute(JobExecutionContext context)
        throws JobExecutionException {
    ...
    // process execution
    ...
    // triggerKey can be obtained via context.getTrigger().getKey()
    context.getScheduler().unscheduleJob(triggerKey);
    ...
}

where triggerKey is the ID of the trigger for the job to run only once. After this, the job won't be called anymore.
Here is an example of how to run a TestJob class immediately with Quartz 2.x:
public JobKey runJob(String jobName) {
    // if you don't call startAt(), the current time (immediately) is assumed
    Trigger runOnceTrigger = TriggerBuilder.newTrigger().build();
    JobKey jobKey = new JobKey(jobName);
    JobDetail job = JobBuilder.newJob(TestJob.class).withIdentity(jobKey).build();
    scheduler.scheduleJob(job, runOnceTrigger);
    return jobKey;
}
see also Quartz Enterprise Job Scheduler Tutorials → SimpleTriggers
I'm not sure how similar Quartz for Mono is to the Java version, but this seems to work in .NET:

TriggerBuilder.Create()
    .StartNow()
    .Build();
I had to ask myself whether it made sense to configure a job and add checks for whether it had already been run, as suggested in Marko Lahma's answer (since scheduling a job to run once results in it being run once every time we start the app). I found examples of CommandLineRunner apps which didn't quite work for me, mostly because we already had an ApplicationRunner used for other jobs that use Quartz scheduling / cron. I wasn't happy with having Quartz initialize this job using a SimpleTrigger, so I had to find something else.
Using some ideas from the following articles:
Multiple Spring boot CommandLineRunner based on command line argument
Run Spring Batch Job programmatically?
Firing Quartz jobs manually
Is there any way to get job keys in Quartz by job name
How to list all Jobs in the Quartz Scheduler
Spring Boot CommandLineRunner and ApplicationRunner
I was able to piece together a working implementation which allows me to do the following:
run existing jobs via Quartz, on a timer
run a new job one time, programmatically (a single-use Quartz job using a SimpleTrigger didn't satisfy my requirements, since it would run once on every application load)
I came up with the following CommandLineRunner class:
public class BatchCommandLineRunner implements CommandLineRunner {

    private static final Logger LOGGER = LoggerFactory.getLogger(BatchCommandLineRunner.class);

    @Autowired
    private Scheduler scheduler;

    public void run(final String... args) throws SchedulerException {
        LOGGER.info("BatchCommandLineRunner: running with args -> " + Arrays.toString(args));
        for (final String jobName : args) {
            final JobKey jobKey = findJobKey(jobName);
            if (jobKey != null) {
                LOGGER.info("Triggering job for: " + jobName);
                scheduler.triggerJob(jobKey);
            } else {
                LOGGER.info("No job found for jobName: " + jobName);
            }
        }
    }

    private JobKey findJobKey(final String jobNameToFind) throws SchedulerException {
        for (final JobKey jobKey : scheduler.getJobKeys(GroupMatcher.jobGroupEquals("DEFAULT"))) {
            final String jobName = jobKey.getName();
            if (jobName.equals(jobNameToFind)) {
                return jobKey;
            }
        }
        return null;
    }
}
In one of my configuration classes I added a CommandLineRunner bean which calls the custom CommandLineRunner I created:
@Configuration
public class BatchConfiguration {

    private static final Logger LOGGER = LoggerFactory.getLogger(BatchConfiguration.class);

    @Bean
    public BatchCommandLineRunner batchCommandLineRunner() {
        return new BatchCommandLineRunner();
    }

    @Bean
    public CommandLineRunner runCommandLineArgs(final ApplicationArguments applicationArguments) throws Exception {
        final List<String> jobNames = applicationArguments.getOptionValues("jobName");
        LOGGER.info("runCommandLineArgs: running the following jobs -> " + ArrayUtils.toString(jobNames));
        batchCommandLineRunner().run(jobNames.toArray(ArrayUtils.EMPTY_STRING_ARRAY));
        return null;
    }
}
Later, I am able to initiate these jobs via the CLI without affecting my current Quartz-scheduled jobs, and as long as no one runs the command via the CLI multiple times, they will never be run again. I have to do some juggling of types since I accept ApplicationArguments and then convert them into a String[].
Finally, I am able to call it like this:
java -jar <your_application>.jar --jobName=<QuartzRegisteredJobDetailFactoryBean>
The result is that the job is initialized only when I call it, and it is excluded from the CronTriggerFactoryBean triggers which I use for my other jobs.
There are several assumptions being made here, so I'll try to summarize:
the job must be registered as a JobDetailFactoryBean (e.g. via scheduler.setJobDetails(...))
everything is essentially the same as a job with a CronTriggerFactoryBean, except for the lack of a scheduler.setTriggers(...) call
Spring knows to execute the CommandLineRunner classes after the application has booted
I hardcoded the parameter passed into the application to "jobName"
I assumed a group name of "DEFAULT" for all jobs; if you want to use differing groups, this would need to be adjusted when fetching the JobKey, which is used to actually run the job
there is nothing that prevents this job from being run multiple times via the CLI, but with the SimpleTrigger approach it was triggered on every application load, so this is better for me; if this is not acceptable, perhaps using a StepListener and ExitStatus, etc. can prevent it from being executed twice
Another solution: there is a method .withRepeatCount(0) in SimpleScheduleBuilder:

final int TEN_SECONDS = 10;

Trigger trigger = newTrigger()
        .withIdentity("myJob", "myJobGroup")
        .startAt(new Date(System.currentTimeMillis() + TEN_SECONDS * 1000))
        .withSchedule(SimpleScheduleBuilder.simpleSchedule()
                .withRepeatCount(0)
                .withIntervalInMinutes(1))
        .build();
