Spring Batch: how to dynamically create steps and tasks - java

I am a Spring newbie and a Spring Batch newbie, so please bear with me.
I understand that Spring Batch is a framework that helps run steps and tasks.
I tried Spring Batch by creating steps and tasks, but these are hardcoded at build/compile time. I could not figure out how to create tasks and steps dynamically.
What I want is to let a user write a script describing how tasks are assembled from a list of steps. Each step will invoke a remote call to an existing REST endpoint. A task will have multiple such steps, and the user will create multiple such tasks.
Is it possible to dynamically create such a task with such steps? If yes, could you point me to some sample code showing how to do this with the required API?
UPDATE :
I understand that a HelloWorld job which calls a REST application using callRestApplication() looks like this:
@SpringBootApplication
@EnableBatchProcessing
public class HelloApplication {
    @Autowired
    private JobBuilderFactory jobBuilderFactory;
    @Autowired
    private StepBuilderFactory stepBuilderFactory;
    @Bean
    public Step step() {
        return this.stepBuilderFactory.get("step1")
            .tasklet((stepContribution, chunkContext) -> {
                callRestApplication();
                return RepeatStatus.FINISHED;
            }).build();
    }
    @Bean
    public Job job() {
        return this.jobBuilderFactory.get("HelloWorldJob")
            .start(step())
            .build();
    }
    public static void main(String[] args) {
        SpringApplication.run(HelloApplication.class, args);
    }
}
However, this is static. I was looking for a way to do something like this:
public static void main(String[] args) {
    SpringApplication.run(HelloagainApplication.class, args);
    List<JobDefinition> jobDefinitions = parseScript();
    for (JobDefinition jobDefinition : jobDefinitions) {
        Job job = new Job();
        job.setName(jobDefinition.getName());
        for (StepDefinition stepDefinition : jobDefinition.getStepDefinitions()) {
            Step step = new Step();
            step.setEndPoint(stepDefinition.getEndPoint());
            step.setName(stepDefinition.getName());
            step.setProperty1(stepDefinition.getProp1());
            job.addStep(step);
        }
        job.startJob();
    }
}
If the script has 10 job definitions, then 10 jobs will be started, each with x number of steps.
In Spring Batch, how can I do the following?
job.addStep(step);
job.startJob();
Thanks

There is no such feature in Spring Batch. You need to write custom code that parses your user's script and creates job/step definitions accordingly.
That said, Spring Cloud Data Flow might help you. You can pre-define your tasks and let the user compose them using the GUI or the text-based scriptable DSL. Another feature that could be interesting to you is the single-step job starter from Spring Cloud Task, which lets you dynamically create a single-step job by providing some properties (no coding needed).
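To make the "custom code that parses your user's script" part more concrete, here is a minimal, framework-free sketch. The script format (`job <name>` lines followed by `step <name> <endpoint>` lines) and the classes `JobDefinition`, `StepDefinition` and `ScriptParser` are assumptions made up for illustration; they are not part of Spring Batch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical model: one REST-calling step inside a user-defined job. */
final class StepDefinition {
    final String name;
    final String endPoint;

    StepDefinition(String name, String endPoint) {
        this.name = name;
        this.endPoint = endPoint;
    }
}

/** Hypothetical model: a named job made of ordered steps. */
final class JobDefinition {
    final String name;
    final List<StepDefinition> steps = new ArrayList<>();

    JobDefinition(String name) {
        this.name = name;
    }
}

final class ScriptParser {
    private ScriptParser() {
    }

    /**
     * Parses the assumed line format:
     *   job <name>
     *   step <name> <endpoint>
     * Blank lines and lines starting with '#' are ignored.
     */
    static List<JobDefinition> parse(List<String> lines) {
        List<JobDefinition> jobs = new ArrayList<>();
        JobDefinition current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("\\s+");
            if (parts[0].equals("job") && parts.length == 2) {
                current = new JobDefinition(parts[1]);
                jobs.add(current);
            } else if (parts[0].equals("step") && parts.length == 3 && current != null) {
                // A step always belongs to the most recently declared job.
                current.steps.add(new StepDefinition(parts[1], parts[2]));
            } else {
                throw new IllegalArgumentException("Cannot parse line: " + line);
            }
        }
        return Collections.unmodifiableList(jobs);
    }
}
```

Each parsed definition could then be mapped onto the builders from the question: one `stepBuilderFactory.get(name).tasklet(...).build()` per step, chained with `start(...)` and `next(...)` on `jobBuilderFactory.get(name)`, and launched through a `JobLauncher`. That wiring is left out here so the sketch stays self-contained.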

Related

How to always start a new job instance for each run when using spring batch with spring boot?

When using Spring Batch with Spring Boot, we get a fat jar that contains all of my Spring Batch jobs. I want to be able to trigger a specific job from the command line by specifying the job name. The problem is that Spring Batch detects that the job has already completed, so it only runs the job once. With Spring Boot, we can specify the name using --spring.batch.job.names=jobToRun. How can I make it always start a new instance and still use this mechanism to pass the name of the job to run?
I didn't configure a JobLauncher, so I guess it's using the default JobLauncherCommandLineRunner. All I have configured so far is:
@SpringBootApplication
@EnableBatchProcessing
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}
With this configuration, I can run the job from the command line:
java -jar batch.jar --spring.batch.job.names=job1ToRun
How can I start a new instance for every run with a similar mechanism? I need to specify a job name from the command line to choose which job to run.
From the spring-batch documentation
The CommandLineJobRunner
Because the script launching the job must kick off a Java Virtual Machine, there needs to be a class with a main method to act as the primary entry point. Spring Batch provides an implementation that serves just this purpose: CommandLineJobRunner. It's important to note that this is just one way to bootstrap your application, but there are many ways to launch a Java process, and this class should in no way be viewed as definitive. The CommandLineJobRunner performs four tasks:
Load the appropriate ApplicationContext
Parse command line arguments into JobParameters
Locate the appropriate job based on arguments
Use the JobLauncher provided in the application context to launch the job.
A very basic example of how you could do this:
public static void main(String[] args) throws Exception {
    ConfigurableApplicationContext ctx = SpringApplication.run(Application.class, args);
    JobLauncher jobLauncher = ctx.getBean(JobLauncher.class);
    String jobName = args[0]; // assuming args[0] is the job name
    JobParameters jobParameters = createJobParameters(args);
    Job job = ctx.getBean(jobName, Job.class); // look up the Job bean by its name
    jobLauncher.run(job, jobParameters);
}
private static JobParameters createJobParameters(String[] args) {
    // TODO: Create & return the needed parameters.
    // One option (an assumption, with a made-up key name): a unique identifying
    // parameter such as a timestamp makes every launch a new job instance.
    return new JobParametersBuilder()
        .addLong("run.timestamp", System.currentTimeMillis())
        .toJobParameters();
}

Testing spring batch job stepScope

I'm trying to test a Spring Batch job that performs a read (get data from another application), a process (simple calculation) and a write (into MongoDB).
The reader is @StepScope.
Here is the @PostConstruct method of the read task:
@PostConstruct
public void init() {
    employees.addAll(getListOfEmployeesBy(affectationMotifService.findAllRegistrationNumbers()));
}
public List<EmployeeForSalaryDTO> getListOfEmployeesBy(List<String> registrationNumbers) {
    LOG.debug("request to get all the employees by registration numbers {}", registrationNumbers);
    return coreResourceFeign.getAllEmployeesForSalaryByRegistrationNumbers(registrationNumbers).getBody();
}
When I launch the job test, or any other test in the application, Spring always runs the init() of the read task, which fails the test because I need to mock coreResourceFeign.getAllEmployeesForSalaryByRegistrationNumbers(registrationNumbers).
I can't mock the method because it runs before the test begins.
Here is the test:
@RunWith(SpringRunner.class)
@SpringBootTest(classes = {SalaryApp.class, SecurityBeanOverrideConfiguration.class})
public class SalaryJobServiceTest {
    @Autowired
    @InjectMocks
    private SalaryJobService salaryJobService;
    @Test
    public void startJob() throws Exception {
        SalaryJobDTO salaryJobDTO = salaryJobService.start(Collections.emptyList());
        Assert.assertNotNull(salaryJobDTO.getId());
    }
}
I have no idea how to deal with Spring Batch tests. Any recommendation or help is welcome.
@PostConstruct makes sure your method is called immediately after the object is created. A Spring application creates all the beans defined in its configuration when the application starts, so this is expected behavior. If you do not want your method to be called during application startup, remove @PostConstruct; you can then run your test with the dependent objects mocked.
Instead, you should load the data in the reader's read method.
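A minimal sketch of that idea, assuming the data can be fetched through a `Supplier`: the expensive call is deferred until the first `read()`, so a test can swap in a mock or stub before anything is loaded. `LazyLoadingReader` is a hypothetical class that only mirrors the `ItemReader` contract (`read()` returns null when the input is exhausted); it does not implement the Spring Batch interface, so the example stays self-contained.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/**
 * Framework-free sketch of a reader that loads its data lazily.
 * Hypothetical class: it follows the "null means no more items" convention
 * but is not a Spring Batch ItemReader.
 */
final class LazyLoadingReader<T> {
    private final Supplier<List<T>> loader; // e.g. wraps the remote Feign call
    private Iterator<T> items;              // stays null until the first read()

    LazyLoadingReader(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    /** Returns the next item, or null once all items have been consumed. */
    T read() {
        if (items == null) {
            // The expensive call happens here, not when the object is created.
            items = loader.get().iterator();
        }
        return items.hasNext() ? items.next() : null;
    }
}
```

In a test, the supplier can simply return a fixed list, so no remote call is made and nothing runs before the test has set up its stubs.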

SpringBoot CommandLineRunner

In what circumstances is CommandLineRunner preferred over writing additional code in the main method of a Spring Boot application?
I understand that CommandLineRunner gets executed before main completes.
In simple cases, there is no difference.
But if the code needs to access features provided by Spring, such as IoC or interface-only repositories/services, you need to wait for the complete application startup. The overridden run method is guaranteed to be called after that startup has completed.
Besides, CommandLineRunner has other advantages:
Can be implemented multiple times
The capability to start any scheduler or log any message before application starts to run
I have used it to decouple code. Instead of placing a bunch of code into main method, the CommandLineRunner lets you distribute it more evenly around the codebase. It really depends on what kind of flags you are passing in and why you need to pass them in. Spring offers a lot of flexibility for you to get the job done in the easiest way.
For a full on command line tool, you can decouple initialization and config a little bit by dividing your code between init and core behavior.
A spring boot server can overwrite configuration based on args passed in from the command line.
I would suggest using it all the time. It adds a lot of flexibility to your "bootstrapping code".
1) For example, command line runners are Spring @Beans, so you can activate/deactivate them at run-time.
2) You can use them in an ordered fashion by adding the @Order annotation
3) You can unit test them just like regular classes
4) You can inject dependencies into them. Each runner can then have its own dependencies.
All of the above are more difficult, if not impossible to achieve if you add all your bootstrapping logic in the main() method of the Spring Application class.
Hope my answer helps,
Cheers
I haven't found any good reason for using it over just writing code after starting the application.
The only thing I can think of is that the command line runners are called before any SpringApplicationRunListeners have #finished called.
I have seen people use them to perform main application logic, but I think this is an antipattern.
One of the annoying things about doing so is that the application start timer is still running and when the task completes you will see a log entry like Started DemoApplication in 5.626 seconds (JVM running for 0.968).
It's confusing to see a message about your application starting when, in reality, it has just finished.
I encountered a scenario where I had to load certain data from the DB into the cache before the method was first hit from the controller endpoint. In this scenario it was desirable to call the cache-populating method from the run method of a class implementing CommandLineRunner, so that the data is already available in the cache before the application finishes starting up.
I use this to populate my default data. I usually create an ApplicationInitializer class that implements CommandLineRunner.
I have methods like createDefaultUser(), createDefaultSytemData() etc.
This way I do not rely on sql files to populate database for me. :)
ApplicationRunner and CommandLineRunner:
Both of them can execute custom code before your application finishes starting up.
CommandLineRunner:
@Component
public class CommandLineAppStartupRunner implements CommandLineRunner {
    private static final Logger logger = LoggerFactory.getLogger(CommandLineAppStartupRunner.class);
    @Override
    public void run(String... args) throws Exception {
        logger.info(Arrays.toString(args));
    }
}
You get the raw args directly.
ApplicationRunner:
@Component
public class AppStartupRunner implements ApplicationRunner {
    private static final Logger logger = LoggerFactory.getLogger(AppStartupRunner.class);
    @Override
    public void run(ApplicationArguments args) throws Exception {
        // getOptionNames() returns a Set<String>, so log it through a placeholder
        logger.info("{}", args.getOptionNames());
    }
}
ApplicationRunner offers several methods for reading the parsed parameters.
CommandLineRunner receives the parameters directly.
If you define two runner classes, you can use the @Order annotation to specify their order of execution.
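To illustrate the difference between the two argument styles: CommandLineRunner receives the raw `String...` array, while ApplicationRunner receives an `ApplicationArguments` object that exposes `getOptionNames()`, `getOptionValues(name)` and `getNonOptionArgs()`. The sketch below is a simplified, hypothetical approximation of that split (`--name=value` options versus plain arguments); `ParsedArgs` is not a Spring class and does not claim to reproduce Spring's parser exactly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical stand-in for the parsed view of the arguments. */
final class ParsedArgs {
    final Map<String, List<String>> options = new LinkedHashMap<>();
    final List<String> nonOptionArgs = new ArrayList<>();

    private ParsedArgs() {
    }

    /** Splits "--name=value" and "--flag" options from plain arguments. */
    static ParsedArgs parse(String... args) {
        ParsedArgs parsed = new ParsedArgs();
        for (String arg : args) {
            if (!arg.startsWith("--")) {
                parsed.nonOptionArgs.add(arg);
                continue;
            }
            String body = arg.substring(2);
            int eq = body.indexOf('=');
            String name = eq < 0 ? body : body.substring(0, eq);
            // Repeated options accumulate their values under the same name.
            List<String> values = parsed.options.computeIfAbsent(name, k -> new ArrayList<>());
            if (eq >= 0) {
                values.add(body.substring(eq + 1));
            }
        }
        return parsed;
    }
}
```

With raw args, a CommandLineRunner would have to do this kind of splitting itself; an ApplicationRunner gets the already parsed view.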
public class Phone {
    @Autowired
    BeanExample beanExample;
    public void print() {
        beanExample.fn();
    }
}
public class BeansCreatorClass {
    @Bean
    public BeanExample getBeanExample() {
        return new BeanExample();
    }
    @Bean
    public Phone getPhone() {
        return new Phone();
    }
}
@SpringBootApplication
public class SpringBootRunnerConfigurationPropertiesApplication implements CommandLineRunner, ApplicationRunner {
    public static void main(String[] args) {
        SpringApplication.run(SpringBootRunnerConfigurationPropertiesApplication.class, args);
        System.out.println("==== spring boot commandLine is running === ");
        // BeansCreatorClass is the class that contains all the beans needed
        ApplicationContext applicationContext = new AnnotationConfigApplicationContext(BeansCreatorClass.class);
        Phone phone = applicationContext.getBean(Phone.class);
        phone.print();
    }
    // CommandLineRunner
    @Override
    public void run(String... args) throws Exception {
        System.out.println("=== commandLine Runner is here ==== ");
    }
    // ApplicationRunner
    @Override
    public void run(ApplicationArguments args) throws Exception {
        System.out.println("=== application runner is here ====");
    }
}
I mostly use CommandLineRunner to:
Apply initial migrations
Run code that is independent of REST/SOAP calls.

ExecutionContext and Thread safety on Spring Batch

I read a few articles about this, but I am not sure about my case.
I am using the ExecutionContext to pass parameters from a Tasklet to a Step.
I want to make sure that if I execute instances of the same job in parallel, using the same steps, I won't have concurrency side effects.
This is my job:
@Bean
public Job processFileJob() throws Exception {
    return this.jobs.get("processFileJob").start(downloadFileStep()).next(processSnidFileStep()).build();
}
public Step downloadFileStep() {
    return this.steps.get("downloadFileTaskletStep").tasklet(downloadFileTasklet()).listener(executionContextPromotionListener()).build();
}
I am passing the parameter from downloadFileStep to processSnidFileStep this way:
public class DownloadFileTasklet implements Tasklet, StepExecutionListener {
..
private void downloadFileFromExtractTool(ChunkContext chunkContext,
..
stepContext.put("totalRecords", totalRecords);
..
}
and retrieve the parameter in my step this way:
@Bean
@Scope(value = "step", proxyMode = ScopedProxyMode.INTERFACES)
public ItemProcessor<MyDTO, MyDTO> processor(@Value("#{jobExecutionContext[totalRecords]}") int totalRecords
) {
    return new PushItemProcessor(totalRecords);
}
Now I can see that parameter successfully inside the processor step.
But what if I execute the whole job in parallel, with different values for different jobs? Are there any concurrent side effects?
Thank you.
ray.
From the Spring Batch reference we can see that one of the usage scenarios is "Concurrent batch processing: parallel processing of a job".
As already answered in the comments, each job instance has its own job execution, which has its own context, so by design the execution context can be treated as thread-safe across parallel job executions, as long as you follow that design.
Here is a link to the Spring Batch reference guide; it's not that big, actually:
https://docs.spring.io/spring-batch/trunk/reference/html/spring-batch-intro.html#springBatchUsageScenarios
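The reasoning above can be illustrated with a small framework-free simulation. It is only a sketch under stated assumptions: `SimulatedJobExecution` and `ContextIsolation` are hypothetical classes, not Spring Batch types. Each simulated execution owns its own context map, so two runs in parallel threads never see each other's `totalRecords`.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for one job execution that owns its own context. */
final class SimulatedJobExecution implements Callable<Integer> {
    // One map per execution: nothing here is static or shared between runs.
    private final Map<String, Object> context = new ConcurrentHashMap<>();
    private final int totalRecords;

    SimulatedJobExecution(int totalRecords) {
        this.totalRecords = totalRecords;
    }

    @Override
    public Integer call() {
        // "Step 1" stores a value, as the tasklet does in the question.
        context.put("totalRecords", totalRecords);
        // "Step 2" reads it back from the same execution's context.
        return (Integer) context.get("totalRecords");
    }
}

final class ContextIsolation {
    private ContextIsolation() {
    }

    /** Runs two simulated executions in parallel and returns what each one read. */
    static int[] runTwoInParallel(int first, int second) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Integer> a = pool.submit(new SimulatedJobExecution(first));
            Future<Integer> b = pool.submit(new SimulatedJobExecution(second));
            return new int[] {a.get(), b.get()};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Simulated execution failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The flip side is that anything shared between executions, such as a mutable field on a singleton bean, does not get this isolation, which is one reason the processor in the question is step-scoped.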

Running a Job only once Using Quartz

Is there a way I could run a job only once using Quartz in Java? I understand it does not make sense to use Quartz in this case. But, the thing is, I have multiple jobs and they are run multiple times. So, I am using Quartz.
Is this even possible?
You should use a SimpleTrigger that fires at a specific time and does not repeat. TriggerUtils has many handy methods for creating these kinds of things.
Yes, it's possible!
JobKey jobKey = new JobKey("testJob");
JobDetail job = newJob(TestJob.class)
    .withIdentity(jobKey)
    .storeDurably()
    .build();
scheduler.addJob(job, true);
scheduler.triggerJob(jobKey); // trigger a job immediately
In Quartz > 2.0, you can get the scheduler to unschedule any job after the work is done:
@Override
protected void execute(JobExecutionContext context)
        throws JobExecutionException {
    ...
    // process execution
    ...
    context.getScheduler().unscheduleJob(triggerKey);
    ...
}
where triggerKey is the key of the trigger for the job that should run only once. After this, the job won't be called anymore.
Here is an example of how to run a TestJob class immediately with Quartz 2.x:
public JobKey runJob(String jobName)
{
    // if you don't call startAt() then the current time (immediately) is assumed.
    Trigger runOnceTrigger = TriggerBuilder.newTrigger().build();
    JobKey jobKey = new JobKey(jobName);
    JobDetail job = JobBuilder.newJob(TestJob.class).withIdentity(jobKey).build();
    scheduler.scheduleJob(job, runOnceTrigger);
    return jobKey;
}
see also Quartz Enterprise Job Scheduler Tutorials → SimpleTriggers
I'm not sure how similar Quartz is in Mono and Java, but this seems to work in .NET:
TriggerBuilder.Create ()
.StartNow ()
.Build ();
I had to ask myself if it made sense to try to configure a job and add checks if it had been run already as suggested in Marko Lahma's answer (since scheduling a job to run once results in it being run once, every time we start the app). I found examples of CommandLineRunner apps which didn't quite work for me, mostly because we already had an ApplicationRunner which was used for other jobs which use Quartz scheduling / cron. I wasn't happy with having Quartz initialize this job using a SimpleTrigger, so I had to find something else.
Using some ideas from the following articles:
Multiple Spring boot CommandLineRunner based on command line argument
Run Spring Batch Job programmatically?
Firing Quartz jobs manually
Is there any way to get job keys in Quartz by job name
How to list all Jobs in the Quartz Scheduler
Spring Boot CommandLineRunner and ApplicationRunner
I was able to piece together a working implementation which allows me to do the following:
run existing jobs via Quartz, on a timer
run new job, one time programmatically (single use Quartz job using the SimpleTrigger didn't satisfy my requirements, since it would be run once on every application load)
I came up with the following CommandLineRunner class:
public class BatchCommandLineRunner implements CommandLineRunner {
    @Autowired
    private Scheduler scheduler;
    private static final Logger LOGGER = LoggerFactory.getLogger(BatchCommandLineRunner.class);
    public void run(final String... args) throws SchedulerException {
        LOGGER.info("BatchCommandLineRunner: running with args -> " + Arrays.toString(args));
        for (final String jobName : args) {
            final JobKey jobKey = findJobKey(jobName);
            if (jobKey != null) {
                LOGGER.info("Triggering job for: " + jobName);
                scheduler.triggerJob(jobKey);
            } else {
                LOGGER.info("No job found for jobName: " + jobName);
            }
        }
    }
    private JobKey findJobKey(final String jobNameToFind) throws SchedulerException {
        for (final JobKey jobKey : scheduler.getJobKeys(GroupMatcher.jobGroupEquals("DEFAULT"))) {
            final String jobName = jobKey.getName();
            if (jobName.equals(jobNameToFind)) {
                return jobKey;
            }
        }
        return null;
    }
}
In one of my configuration classes I added a CommandLineRunner bean which calls the custom CommandLineRunner I created:
@Configuration
public class BatchConfiguration {
    private static final Logger LOGGER = LoggerFactory.getLogger(BatchConfiguration.class);
    @Bean
    public BatchCommandLineRunner batchCommandLineRunner() {
        return new BatchCommandLineRunner();
    }
    @Bean
    public CommandLineRunner runCommandLineArgs(final ApplicationArguments applicationArguments) {
        // Return a runner (rather than null) so the jobs are triggered after startup.
        return args -> {
            final List<String> jobNames = applicationArguments.getOptionValues("jobName");
            if (jobNames == null) {
                return; // no --jobName option was passed
            }
            LOGGER.info("runCommandLineArgs: running the following jobs -> " + ArrayUtils.toString(jobNames));
            batchCommandLineRunner().run(jobNames.toArray(ArrayUtils.EMPTY_STRING_ARRAY));
        };
    }
}
Later, I am able to initiate these jobs via the CLI without affecting my current Quartz scheduled jobs, and as long as no one runs the command via CLI multiple times, it will never be run again. I have to do some juggling of types since I accept ApplicationArguments, and then convert them into String[].
Finally, I am able to call it like this:
java -jar <your_application>.jar --jobName=<QuartzRegisteredJobDetailFactoryBean>
The result is that the job is initialized only when I call it, and it is excluded from my CronTriggerFactoryBean triggers which I used for my other jobs.
There are several assumptions being made here, so I'll try to summarize:
the job must be registered as a JobDetailFactoryBean (e.g.: scheduler.setJobDetails(...))
everything is essentially the same as a job with CronTriggerFactoryBean, excepting the lacking scheduler.setTriggers(...) call
Spring knows to execute the CommandLineRunner classes after the application has booted
I hardcoded the parameter being passed into the application to "jobName"
I assumed a group name of "DEFAULT" for all jobs; if you want to use differing groups this would need to be adjusted when fetching JobKey, which is used to actually run the job
there is nothing which prevents this job from being run multiple times via CLI, but it was triggered on every application load using SimpleTrigger approach, so this is better for me; if this is not acceptable, perhaps using StepListener and ExitStatus, etc. can prevent it from being executed twice
Another solution: there is a method .withRepeatCount(0) in SimpleScheduleBuilder:
public final int TEN_SECONDS = 10;
Trigger trigger = newTrigger()
    .withIdentity("myJob", "myJobGroup")
    .startAt(new Date(System.currentTimeMillis() + TEN_SECONDS * 1000))
    .withSchedule(SimpleScheduleBuilder.simpleSchedule()
        .withRepeatCount(0)
        .withIntervalInMinutes(1))
    .build();
