I have a Spring Batch project with several jobs. I modularized them into a single job that executes them sequentially, like this:
@Autowired
@Qualifier("job1")
private Job job1;

@Autowired
@Qualifier("job2")
private Job job2;

@Autowired
@Qualifier("job3")
private Job job3;

@Bean(name = "step1")
public Step step1() {
    return stepBuilderFactory.get("step1").job(job1).build();
}

@Bean(name = "step2")
public Step step2() {
    return stepBuilderFactory.get("step2").job(job2).build();
}

@Bean(name = "step3")
public Step step3() {
    return stepBuilderFactory.get("step3").job(job3).build();
}

@Bean(name = "parentJob")
public Job parentJob(@Qualifier("step1") Step step1, @Qualifier("step2") Step step2, @Qualifier("step3") Step step3) {
    return jobBuilderFactory.get("parentJob")
            .incrementer(new RunIdIncrementer())
            .repository(jobRepository)
            .start(step1)
            .next(step2)
            .next(step3)
            .build();
}
However, if one of the JobSteps fails, the parent job fails as well. Is there any mechanism to make all the JobSteps fault tolerant, so that if one of them fails the parent job still continues with the remaining ones?
You are basically looking for a conditional flow, as described here, i.e. you want to move on to the next job even if the previous job has failed, and that is not the default flow. You need to use the wildcard pattern * on the transitions to achieve that.
In Java config, it would look like .on("*").to(...) and so on.
The pattern parameter passed to on(..) can be any ExitStatus value or the wildcard *, provided you are not using any other custom statuses.
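For illustration, here is a minimal sketch of how the parent job from the question could be wired with those transitions (same bean names as above; ending the parent job as COMPLETED even when the last step fails is an assumption on my part, adjust to taste):

@Bean(name = "parentJob")
public Job parentJob(@Qualifier("step1") Step step1,
                     @Qualifier("step2") Step step2,
                     @Qualifier("step3") Step step3) {
    return jobBuilderFactory.get("parentJob")
            .incrementer(new RunIdIncrementer())
            // "*" matches any exit status, so the flow moves on
            // whether the previous JobStep completed or failed
            .start(step1).on("*").to(step2)
            .from(step2).on("*").to(step3)
            // without this last transition the parent job would be
            // marked FAILED when step3 fails; remove it if that is
            // the behaviour you want
            .from(step3).on("*").end()
            .end()
            .build();
}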
Hope it helps !!
I am trying to improve the performance of the job listed below. As is, without threading, it runs successfully, but it runs very slowly. I would like to thread Step 2, where 95% of the work happens in reading, filtering and transforming the input data read from very large heterogeneous files. The job:
• Step1 gets some job parameters that are passed into Step2.
• Step2 will read in X number of files. Each file is heterogeneous, i.e., it contains several different record formats. The records are filtered, transformed and sent to a single output file.
Does Spring Batch have a built-in way to thread Step2 in this scenario? For example, can I add some type of executor to step2? I’ve tried SimpleAsyncTaskExecutor and ThreadPoolTaskExecutor. Neither works. Adding SimpleAsyncTaskExecutor throws an exception. (See can we process the multiple files sequentially using spring Batch while multiple threads used to process individual files data..?)
Here is the batch configuration:
@Bean
public Job job() {
    return jobBuilderFactory.get("MyJob")
            .start(step1())
            .next(step2())
            .build();
}

@Bean
public Step step1() {
    return stepBuilderFactory.get("Step1GetJobParams")
            .tasklet(myParamsTasklet)
            .build();
}

@Bean
public Step step2() {
    return stepBuilderFactory.get("Step2")
            .<InputDO, OutputDO>chunk(1000)
            .reader(myMultiResourceReader())
            .processor(myStep2ItemProcessor)
            .writer(myStep2FileWriter())
            .taskExecutor(???)
            .build();
}
@Bean
public MultiResourceItemReader<InputDO> myMultiResourceReader() {
    MultiResourceItemReader<InputDO> multiResourceItemReader = new MultiResourceItemReader<InputDO>();
    multiResourceItemReader.setResources(resourceManager.getResources());
    multiResourceItemReader.setDelegate(myStep2FileReader());
    multiResourceItemReader.setSaveState(false);
    return multiResourceItemReader;
}

@Bean
public FlatFileItemReader<InputDO> myStep2FileReader() {
    return new FlatFileItemReaderBuilder<InputDO>()
            .name("MyStep2FileReader")
            .lineMapper(myCompositeLineMapper())
            .build();
}

@Bean
public PatternMatchingCompositeLineMapper<InputDO> myCompositeLineMapper() {
    PatternMatchingCompositeLineMapper<InputDO> lineMapper = new PatternMatchingCompositeLineMapper<InputDO>();

    Map<String, LineTokenizer> tokenizers = new HashMap<String, LineTokenizer>();
    tokenizers.put("A", InputDOTokenizer.getInputDOTokenizer());
    tokenizers.put("*", InputDOFillerTokenizer.getInputDOFillerTokenizer());
    lineMapper.setTokenizers(tokenizers);

    Map<String, FieldSetMapper<InputDO>> mappers = new HashMap<String, FieldSetMapper<InputDO>>();
    mappers.put("A", new InputDOFieldSetMapper());
    mappers.put("*", new InputDOFillerFieldSetMapper());
    lineMapper.setFieldSetMappers(mappers);

    return lineMapper;
}

@Bean
public FlatFileItemWriter<OutputDO> myOutputDOFileWriter() {
    return new FlatFileItemWriterBuilder<OutputDO>()
            .name("MyOutputDOFileWriter")
            .resource(resourceManager.getFileSystemResource("myOutputDOFileName"))
            .lineAggregator(new DelimitedLineAggregator<OutputDO>() {
                {
                    setDelimiter("");
                    setFieldExtractor(outputDOFieldExtractor.getOutputDOFieldExtractor());
                }
            })
            .lineSeparator("\r\n")
            .build();
}
Any/all guidance is much appreciated!
I guess you want to use a multi-threaded step to fix the slow reading. More details are available in the official Spring Batch documentation under Multi-threaded Step.
Hope this helps.
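For reference, a minimal sketch of what a multi-threaded step2 could look like, built from the configuration in the question. The pool size and throttle limit are arbitrary values of mine, and wrapping the reader in a SynchronizedItemStreamReader is my assumption (MultiResourceItemReader and FlatFileItemReader are not thread-safe); the FlatFileItemWriter needs the same kind of care.

@Bean
public TaskExecutor step2TaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(4);            // arbitrary, tune for your workload
    executor.setMaxPoolSize(4);
    executor.setThreadNamePrefix("step2-");
    return executor;
}

@Bean
public SynchronizedItemStreamReader<InputDO> synchronizedStep2Reader() {
    // serializes read() calls so the non-thread-safe delegate can be shared by the worker threads
    SynchronizedItemStreamReader<InputDO> reader = new SynchronizedItemStreamReader<>();
    reader.setDelegate(myMultiResourceReader());
    return reader;
}

@Bean
public Step step2() {
    return stepBuilderFactory.get("Step2")
            .<InputDO, OutputDO>chunk(1000)
            .reader(synchronizedStep2Reader())
            .processor(myStep2ItemProcessor)
            .writer(myStep2FileWriter())
            .taskExecutor(step2TaskExecutor())
            .throttleLimit(4)               // maximum number of concurrent chunks
            .build();
}

Note that a multi-threaded step is not restartable in the usual sense because item state is not tracked per thread; the question already sets saveState(false), which is consistent with that.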
Issue
I am a bit confused: when I start the execution of a Spring Batch job by HTTP request and, while that job is executing, receive another HTTP request to start the same job with different parameters, the job that is being executed stops unfinished and processing of the new job starts.
Context
I've developed an API REST to load and process the content of Excel files. The web service exposes two endpoints, one to load, validate and store the content of Excel files in the database and the other to start the processing of the records stored in the database.
How it works
POST /api/excel/upload
This endpoint receives the Excel files. When a request is received, each file is assigned a unique identifier and its content is validated. If the content is valid, it is inserted into a temporary table, waiting to be processed.
GET /api/Excel/process?id=x
This endpoint receives the identifiers of the files to be processed. When a request is received, a Spring Batch job is started to process the records in the temporary table.
Some code
Controller
@PostMapping(produces = {APPLICATION_JSON_VALUE})
public ResponseEntity<Page<ExcelLoad>> post(@RequestParam("file") MultipartFile multipartFile)
{
    return super.getResponse().returnPage(service.upload(multipartFile));
}

@GetMapping(value = "/process", produces = APPLICATION_JSON_VALUE)
public DeferredResult<ResponseEntity<Void>> get(@RequestParam("id") Integer idCarga)
{
    DeferredResult<ResponseEntity<Void>> response = new DeferredResult<>(1000L);
    response.onTimeout(() -> response.setResult(super.getResponse().returnVoid()));

    ForkJoinPool.commonPool().submit(() -> service.startJob(idCarga));

    return response;
}
I use DeferredResult to send a response to the client after receiving the request without waiting for the job to finish
Service
public void startJob(int idCarga)
{
    JobParameters params = new JobParametersBuilder()
            .addString("mainJob", String.valueOf(System.currentTimeMillis()))
            .addString("idCarga", String.valueOf(idCarga))
            .toJobParameters();
    try
    {
        jobLauncher.run(job, params);
    }
    catch (JobExecutionException e)
    {
        log.error("---ERROR: {}", e.getMessage());
    }
}
Batch
@Bean
public Step mainStep(ReaderImpl reader, ProcessorImpl processor, WriterImpl writer)
{
    return stepBuilderFactory.get("step")
            .<List<ExcelLoad>, Invoice>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .faultTolerant().skipPolicy(new ExceptionSkipPolicy())
            .listener(stepSkipListener)
            .build();
}

@Bean
public Job mainJob(Step mainStep)
{
    return jobBuilderFactory.get("mainJob")
            .listener(mainJobExecutionListener)
            .incrementer(new RunIdIncrementer())
            .start(mainStep)
            .build();
}
Performing some tests I have observed the following behavior:
If I make a request to the endpoint /process to process each file at different times: in this case, all the records stored in the temporary table are processed:
Records processed file1: 3606 (expected 3606).
Records processed file2: 1776 (expected 1776).
If I make a request to the endpoint /process to first process file1, and before it finishes I make another request to process file2: in this case, not all the records stored in the temporary table are processed:
Records processed file1: 1080 (expected 3606)
Records processed file2: 1774 (expected 1776)
The JobLauncher does not stop job executions, it only launches them. The default job launcher provided by Spring Batch is the SimpleJobLauncher, which delegates job launching to a TaskExecutor. Now, depending on the task executor implementation you use and how it is configured to launch concurrent tasks, you can see different behaviours. For example, when you launch a new job execution and a new task is submitted to the task executor, the task executor can decide to reject this submission if all workers are busy, or put it in a waiting queue, or stop another task and submit the new one. Those strategies depend on several parameters (the TaskExecutor implementation, the type of queue used behind the scenes, the RejectedExecutionHandler implementation, etc.).
In your case, you seem to be using the following:
ForkJoinPool.commonPool().submit(() -> service.startJob(idCarga));
So you need to check the behaviour of this pool with regard to how it handles new task submissions (I guess this is what is stopping your jobs, but you need to confirm that). That said, I don't see why you need this. If your requirement is the following:
I use DeferredResult to send a response to the client after receiving the request without waiting for the job to finish
Then you can use an asynchronous task executor implementation (like the ThreadPoolTaskExecutor) in your job launcher, see Running Jobs from within a Web Container.
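To make the point about submission strategies concrete, here is a hedged sketch (bean name and values are mine, not from the answer) of how a ThreadPoolTaskExecutor's pool size, queue capacity and rejection policy decide whether a newly launched job runs immediately, waits, or is rejected:

@Bean
public TaskExecutor jobLauncherTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(2);        // up to two jobs run concurrently
    executor.setMaxPoolSize(2);
    executor.setQueueCapacity(10);      // further launches wait here instead of interfering with running jobs
    // if the queue is also full, fail fast rather than silently dropping the launch
    executor.setRejectedExecutionHandler(new java.util.concurrent.ThreadPoolExecutor.AbortPolicy());
    executor.setThreadNamePrefix("job-launcher-");
    return executor;
}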
Thanks to the help from @Mahmoud Ben Hassine's answer, I was able to resolve the issue. To help with the implementation, in case someone comes across this question, I share the code that, in my case, solved the problem:
Controller
@Autowired
private JobLauncher jobLauncher;

@Autowired
private Job job;

@GetMapping(value = "/process", produces = APPLICATION_JSON_VALUE)
public void get(@RequestParam("id") Integer idCarga) throws JobExecutionException
{
    JobParameters params = new JobParametersBuilder()
            .addString("mainJob", String.valueOf(System.currentTimeMillis()))
            .addString("idCarga", String.valueOf(idCarga))
            .toJobParameters();

    jobLauncher.run(job, params);
}
Batch config, job and steps
@Configuration
@EnableBatchProcessing
public class BatchConfig extends DefaultBatchConfigurer
{
    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private StepSkipListener stepSkipListener;

    @Autowired
    private MainJobExecutionListener mainJobExecutionListener;

    @Bean
    public TaskExecutor taskExecutor()
    {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        taskExecutor.setMaxPoolSize(10);
        taskExecutor.setThreadNamePrefix("batch-thread-");
        return taskExecutor;
    }

    @Bean
    public JobLauncher jobLauncher() throws Exception
    {
        SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
        jobLauncher.setJobRepository(getJobRepository());
        jobLauncher.setTaskExecutor(taskExecutor());
        jobLauncher.afterPropertiesSet();
        return jobLauncher;
    }

    @Bean
    public Step mainStep(ReaderImpl reader, ProcessorImpl processor, WriterImpl writer)
    {
        return stepBuilderFactory.get("step")
                .<List<ExcelLoad>, Invoice>chunk(10)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .faultTolerant().skipPolicy(new ExceptionSkipPolicy())
                .listener(stepSkipListener)
                .build();
    }

    @Bean
    public Job mainJob(Step mainStep)
    {
        return jobBuilderFactory.get("mainJob")
                .listener(mainJobExecutionListener)
                .incrementer(new RunIdIncrementer())
                .start(mainStep)
                .build();
    }
}
If, after applying this code, you also run into problems inserting the records in the database (as happened to me), you can go through this question, where I also posted the code that works for me.
I have a spring batch application with a series of jobs. I want to send an email after ALL the jobs have completed, but I'm not sure the best way to do this. The options I am considering are:
Run the jobs in a certain order, and amend the JobListener of the last job, so that it will send the email. Downside with this is it won't work if a further job is added to the end of the batch.
Add a new job which will send the email and order the jobs, making sure this additional job is run last.
Are there any built in spring-batch constructs that will be triggered on completion of the entire batch?
The final option would be my preferred solution, so my question is, are there any spring-batch classes that listen for batch completion (similar to JobExecutionListenerSupport or a Step Listener)?
No, I am not aware of any batch listener that listens for the whole batch's completion.
I have two alternatives for you. Both allow you to stick to Spring.
(1) If your application is designed to be perpetual (i.e. like a web server), you can inject a custom JobLauncher, grab the TaskExecutor and wait for its completion (either through a simple counter that counts callbacks from afterJob methods, or by waiting a fixed amount of time that is long enough for all jobs to be submitted -- not necessarily started).
Add a configuration class like this:
@Configuration
class JobConfiguration implements InitializingBean {

    // a ThreadPoolTaskExecutor is used here because shutdown()/awaitTermination()
    // are needed below; change the pool settings to your liking
    ThreadPoolTaskExecutor taskExecutor;

    @Bean
    TaskExecutor taskExecutor() {
        taskExecutor = new ThreadPoolTaskExecutor();
        return taskExecutor;
    }

    @Bean
    public JobLauncher jobLauncher(@Autowired JobRepository jobRepository,
                                   @Autowired TaskExecutor taskExecutor) {
        SimpleJobLauncher launcher = new SimpleJobLauncher();
        launcher.setJobRepository(jobRepository);
        launcher.setTaskExecutor(taskExecutor);
        return launcher;
    }

    List<Job> jobs;

    // I don't use this in this example,
    // however, jobs.size() will help you with the countdown latch
    @Autowired
    public void setJobs(List<Job> jobs) {
        this.jobs = jobs;
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        // either count down until all jobs are submitted
        // or sleep a finite amount of time;
        // in this example, I'll be lazy and just sleep
        Thread.sleep(1 * 3600 * 1000L); // wait 1 hour
        taskExecutor.shutdown();
        try {
            taskExecutor.getThreadPoolExecutor().awaitTermination(1, TimeUnit.HOURS);
        } catch (Exception e) {
            // ignore, we only want to be sure the workers are done
        } finally {
            // send your e-mail here.
        }
    }
}
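As a concrete variant of the counter idea from option (1), here is a hedged sketch (class and method names are mine, purely illustrative) of a JobExecutionListener that counts afterJob callbacks with a CountDownLatch sized by jobs.size(); once every job has reported back, the e-mail is sent:

public class AllJobsDoneListener implements JobExecutionListener {

    private final CountDownLatch latch;

    public AllJobsDoneListener(int numberOfJobs) {
        this.latch = new CountDownLatch(numberOfJobs); // e.g. jobs.size()
    }

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // nothing to do before a job starts
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        latch.countDown(); // one more job finished (successfully or not)
    }

    public void awaitAllJobsAndNotify() throws InterruptedException {
        latch.await(); // blocks until every registered job has called afterJob
        // send your e-mail here
    }
}

The listener would have to be registered on each job (for example via .listener(...) in every job definition), and awaitAllJobsAndNotify() called from wherever you currently wait.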
(2) If your application stops when all jobs are done, you can simply follow this to send out an e-mail.
I repeat a few lines of code for completeness:
public class TerminateBean {

    @PreDestroy
    public void onDestroy() throws Exception {
        // send out e-mail here
    }
}

We also have to add a bean of this type:

@Configuration
public class ShutdownConfig {

    @Bean
    public TerminateBean getTerminateBean() {
        return new TerminateBean();
    }
}
If I understand correctly, you have a job of jobs. In this case, you can define an enclosing job with a series of steps of type JobStep (each of which delegates to a sub-job). Then, you can register a JobExecutionListener on the enclosing job. This listener will be called once all steps (i.e. all sub-jobs) have completed.
More details about the JobStep here: https://docs.spring.io/spring-batch/4.0.x/api/org/springframework/batch/core/step/job/JobStep.html
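A hedged sketch of that approach (bean names, the number of sub-jobs and the inline listener are mine, purely illustrative):

@Bean
public Step job1Step(@Qualifier("job1") Job job1) {
    // a JobStep that delegates to a sub-job; repeat for job2, job3, ...
    return stepBuilderFactory.get("job1Step").job(job1).build();
}

@Bean
public Job enclosingJob(Step job1Step, Step job2Step) {
    return jobBuilderFactory.get("enclosingJob")
            .listener(new JobExecutionListenerSupport() {
                @Override
                public void afterJob(JobExecution jobExecution) {
                    // invoked once all JobSteps (i.e. all sub-jobs) have finished
                    // send your e-mail here
                }
            })
            .start(job1Step)
            .next(job2Step)
            .build();
}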
I have a Job that looks into a configuration table and I would like it to trigger a Job per configuration entry using the values from the table.
The first Job consists of a single Tasklet that does the config table lookup and then triggers a subsequent Job per entry.
Mother Job code:
@Configuration
public class RunProcessesJobConfig {

    ... // steps, jobs factory init

    @Autowired
    private Tasklet runProcessTask;

    @Bean
    public Step runProcessStep() {
        return steps.get("runProcessStep")
                .tasklet(runProcessTask)
                .build();
    }

    @Bean
    public Job runProcessJob() {
        return jobs.get("runProcessJob")
                .start(runProcessStep())
                .build();
    }
}
The way I'm currently trying to implement it is by Autowiring a JobLauncher and the Job I need into the Tasklet and running the Job from there.
RunProcessTask (this gets autowired into the job config above):
@Autowired
Job myJob;

@Override
public RepeatStatus execute(StepContribution sc, ChunkContext cc) throws Exception {
    List<DeployCfg> deployCfgs = da.getJobCfgs();
    for (DeployCfg deployCfg : deployCfgs) {
        String cfg1 = deployCfg.getCfg1();
        String cfg2 = deployCfg.getCfg2();
        String cfg3 = deployCfg.getCfg3();

        // trigger job per config
        JobParameters params = new JobParametersBuilder()
                .addString("cfg1", cfg1)
                .addString("cfg2", cfg2)
                .addString("cfg3", cfg3)
                .toJobParameters();
        final JobExecution jobExec = jobLauncher.run(myJob, params);
    }
    return RepeatStatus.FINISHED;
}
When I try executing the Mother Job, I get a TransactionSuspensionNotSupportedException: Transaction manager [org.springframework.batch.support.transaction.ResourcelessTransactionManager] does not support transaction suspension error on the jobLauncher.run(...) line.
I'm thinking that running a Job within another Job messes with Spring Batch's transaction manager. Any ideas on how to do this?
Additional version info:
spring-boot-starter-parent version 1.5.9.RELEASE
spring-boot-starter-batch
Using Spring Batch 3.0.4.RELEASE.
I configure a job to use a partition step. The slave step uses chunk size 1. There are six threads in the task executor. I run this test with various grid sizes from six to hundreds. My grid size is the number of slave StepExecutions I expect == the number of ExecutionContexts created by my partitioner.
The result is always this:
The six threads pick up six different step executions and execute them successfully. Then the same six step executions run again and again in the same thread!
I notice that there is a loop in RepeatTemplate.executeInternal(...) that never ends. It keeps executing the same StepExecution just incrementing the version.
Here's the Java configuration code:
@Bean
@StepScope
public RapRequestItemReader rapReader(
        @Value("#{stepExecutionContext['" + RapJobConfig.LIST_OF_IDS_STEP_EXECUTION_CONTEXT_VAR + "']}") String listOfIds,
        final @Value("#{stepExecutionContext['" + RapJobConfig.TIME_STEP_EXECUTION_CONTEXT_VAR + "']}") String timeString) {
    final List<Asset> farms = Arrays.asList(listOfIds.split(",")).stream().map(intString -> assetDao.getById(Integer.valueOf(intString)))
            .collect(Collectors.toList());
    return new RapRequestItemReader(timeString, farms);
}

@Bean
public ItemProcessor<RapRequest, PullSuccess> rapProcessor() {
    return rapRequest -> {
        return rapPull.pull(rapRequest.timestamp, rapRequest.farms);
    };
}

@Bean
public TaskletStep rapStep1(StepBuilderFactory stepBuilderFactory, RapRequestItemReader rapReader) {
    return stepBuilderFactory.get(RAP_STEP_NAME)
            .<RapRequest, PullSuccess>chunk(RAP_STEP_CHUNK_SIZE)
            .reader(rapReader)
            .processor(rapProcessor())
            .writer(updateCoverageWriter)
            .build();
}

private RapFilePartitioner createRapFilePartitioner(RapParameter rapParameter) {
    RapFilePartitioner partitioner = new RapFilePartitioner(rapParameter, rapPull.getIncrementHours());
    return partitioner;
}

@Bean
public ThreadPoolTaskExecutor pullExecutor() {
    ThreadPoolTaskExecutor pullExecutor = new ThreadPoolTaskExecutor();
    pullExecutor.setCorePoolSize(weatherConfig.getNumberOfThreadsPerModelType());
    pullExecutor.setMaxPoolSize(weatherConfig.getNumberOfThreadsPerModelType());
    pullExecutor.setAllowCoreThreadTimeOut(true);
    return pullExecutor;
}

@Bean
@JobScope
public Step rapPartitionByTimestampStep(StepBuilderFactory stepBuilderFactory, @Value("#{jobParameters['config']}") String config,
        TaskletStep rapStep1) {
    RapParameter rapParameter = GsonHelper.fromJson(config, RapParameter.class);
    int gridSize = calculateGridSize(rapParameter);
    return stepBuilderFactory.get("rapPartitionByTimestampStep")
            .partitioner(rapStep1)
            .partitioner(RAP_STEP_NAME, createRapFilePartitioner(rapParameter))
            .taskExecutor(pullExecutor())
            .gridSize(gridSize)
            .build();
}

@Bean
public Job rapJob(JobBuilderFactory jobBuilderFactory, Step rapPartitionByTimestampStep) {
    return jobBuilderFactory.get(JOB_NAME)
            .start(rapPartitionByTimestampStep)
            .build();
}
Though it's hard to tell this from the question, the problem was in the reader. The ItemReader was never returning null.
In the design, a StepExecution was supposed to process only one item. However, after processing that item, the ItemReader was returning that same item again instead of returning null.
I fixed it by having the ItemReader return null the second time read is called.
A better design might be to use a TaskletStep instead of a ChunkStep.
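For anyone hitting the same symptom, a minimal sketch of the fix described above. The reader class name and constructor arguments come from the question, but the field names and the assumed RapRequest constructor are my own illustration:

public class RapRequestItemReader implements ItemReader<RapRequest> {

    private final RapRequest singleItem;
    private boolean read = false;

    public RapRequestItemReader(String timeString, List<Asset> farms) {
        this.singleItem = new RapRequest(timeString, farms); // assumed constructor
    }

    @Override
    public RapRequest read() {
        if (read) {
            return null; // signals "no more input"; without this the chunk loop never ends
        }
        read = true;
        return singleItem;
    }
}

Because the reader is step-scoped, each partition gets its own instance, so a plain boolean flag is enough here.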