Running job stops when a new run of the same job is started - Spring Batch - Java

Issue
I am a bit confused: when I start the execution of a Spring Batch job via an HTTP request and then receive another HTTP request to start the same job with different parameters while the first is still executing, the job that is already running stops unfinished and processing of the new job starts.
Context
I've developed a REST API to load and process the content of Excel files. The web service exposes two endpoints: one to load, validate and store the content of Excel files in the database, and another to start processing the records stored in the database.
How it works
POST /api/excel/upload
This endpoint receives the Excel files. When a request is received, each file is assigned a unique identifier and its content is validated. If the content is correct, it is inserted into a temporary table where it waits to be processed.
GET /api/excel/process?id=x
This endpoint receives the identifiers of the files to be processed. When a request is received, a Spring Batch job is started to process the records in the temporary table.
Some code
Controller
@PostMapping(produces = {APPLICATION_JSON_VALUE})
public ResponseEntity<Page<ExcelLoad>> post(@RequestParam("file") MultipartFile multipartFile)
{
    return super.getResponse().returnPage(service.upload(multipartFile));
}

@GetMapping(value = "/process", produces = APPLICATION_JSON_VALUE)
public DeferredResult<ResponseEntity<Void>> get(@RequestParam("id") Integer idCarga)
{
    DeferredResult<ResponseEntity<Void>> response = new DeferredResult<>(1000L);
    response.onTimeout(() -> response.setResult(super.getResponse().returnVoid()));

    ForkJoinPool.commonPool().submit(() -> service.startJob(idCarga));

    return response;
}
I use DeferredResult to send a response to the client after receiving the request without waiting for the job to finish
Service
public void startJob(int idCarga)
{
    JobParameters params = new JobParametersBuilder()
            .addString("mainJob", String.valueOf(System.currentTimeMillis()))
            .addString("idCarga", String.valueOf(idCarga))
            .toJobParameters();
    try
    {
        jobLauncher.run(job, params);
    }
    catch (JobExecutionException e)
    {
        log.error("---ERROR: {}", e.getMessage());
    }
}
Batch
@Bean
public Step mainStep(ReaderImpl reader, ProcessorImpl processor, WriterImpl writer)
{
    return stepBuilderFactory.get("step")
            .<List<ExcelLoad>, Invoice>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .faultTolerant().skipPolicy(new ExceptionSkipPolicy())
            .listener(stepSkipListener)
            .build();
}

@Bean
public Job mainJob(Step mainStep)
{
    return jobBuilderFactory.get("mainJob")
            .listener(mainJobExecutionListener)
            .incrementer(new RunIdIncrementer())
            .start(mainStep)
            .build();
}
Performing some tests, I have observed the following behavior:
If I make a request to the /process endpoint for each file at different times, all the records stored in the temporary table are processed:
Records processed for file1: 3606 (expected 3606).
Records processed for file2: 1776 (expected 1776).
If I make a request to the /process endpoint to process file1 first, and before it finishes I make another request to process file2, not all the records stored in the temporary table are processed:
Records processed for file1: 1080 (expected 3606).
Records processed for file2: 1774 (expected 1776).

The JobLauncher does not stop job executions, it only launches them. The default job launcher provided by Spring Batch is the SimpleJobLauncher, which delegates job launching to a TaskExecutor. Now, depending on the task executor implementation you use and how it is configured to launch concurrent tasks, you can see different behaviours. For example, when you launch a new job execution and a new task is submitted to the task executor, the task executor can decide to reject this submission if all workers are busy, or put it in a waiting queue, or stop another task and submit the new one. Those strategies depend on several parameters (the TaskExecutor implementation, the type of the queue used behind the scenes, the RejectedExecutionHandler implementation, etc.).
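For illustration, here is a minimal sketch (a hypothetical configuration, not the poster's code) of a ThreadPoolTaskExecutor where the pool size, queue capacity and rejection policy together determine what happens to a job launch submitted while all workers are busy:

import java.util.concurrent.ThreadPoolExecutor;
import org.springframework.core.task.TaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

public class TaskExecutorSketch {

    public TaskExecutor boundedTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(2);    // workers always available for job executions
        executor.setMaxPoolSize(4);     // upper bound on concurrent job executions
        executor.setQueueCapacity(10);  // launches beyond that wait in this queue
        // What happens when both the pool and the queue are full:
        // AbortPolicy rejects the submission, CallerRunsPolicy runs it in the caller's thread.
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.AbortPolicy());
        executor.initialize();
        return executor;
    }
}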
In your case, you seem to be using the following:
ForkJoinPool.commonPool().submit(() -> service.startJob(idCarga));
So you need to check the behaviour of this pool with regard to how it handles new task submissions (I guess this is what is stopping your jobs, but you need to confirm that). That said, I don't see why you need this. If your requirement is the following:
I use DeferredResult to send a response to the client after receiving the request without waiting for the job to finish
Then you can use an asynchronous task executor implementation (like the ThreadPoolTaskExecutor) in your job launcher, see Running Jobs from within a Web Container.

Thanks to the help from the answer by @Mahmoud Ben Hassine, I was able to resolve the issue. To help with the implementation, in case someone comes across this question, I share the code that, in my case, solved the problem:
Controller
@Autowired
private JobLauncher jobLauncher;

@Autowired
private Job job;

@GetMapping(value = "/process", produces = APPLICATION_JSON_VALUE)
public void get(@RequestParam("id") Integer idCarga) throws JobExecutionException
{
    JobParameters params = new JobParametersBuilder()
            .addString("mainJob", String.valueOf(System.currentTimeMillis()))
            .addString("idCarga", String.valueOf(idCarga))
            .toJobParameters();
    jobLauncher.run(job, params);
}
Batch config, job and steps
@Configuration
@EnableBatchProcessing
public class BatchConfig extends DefaultBatchConfigurer
{
    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private StepSkipListener stepSkipListener;

    @Autowired
    private MainJobExecutionListener mainJobExecutionListener;

    @Bean
    public TaskExecutor taskExecutor()
    {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        taskExecutor.setMaxPoolSize(10);
        taskExecutor.setThreadNamePrefix("batch-thread-");
        return taskExecutor;
    }

    @Bean
    public JobLauncher jobLauncher() throws Exception
    {
        SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
        jobLauncher.setJobRepository(getJobRepository());
        jobLauncher.setTaskExecutor(taskExecutor());
        jobLauncher.afterPropertiesSet();
        return jobLauncher;
    }

    @Bean
    public Step mainStep(ReaderImpl reader, ProcessorImpl processor, WriterImpl writer)
    {
        return stepBuilderFactory.get("step")
                .<List<ExcelLoad>, Invoice>chunk(10)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .faultTolerant().skipPolicy(new ExceptionSkipPolicy())
                .listener(stepSkipListener)
                .build();
    }

    @Bean
    public Job mainJob(Step mainStep)
    {
        return jobBuilderFactory.get("mainJob")
                .listener(mainJobExecutionListener)
                .incrementer(new RunIdIncrementer())
                .start(mainStep)
                .build();
    }
}
If, after applying this code, you also have problems inserting the records in the database, as happened to me, you can go through this question where I also put the code that works for me.

Related

Threading a variable number of heterogeneous input files, process the input and output to a single file

I am trying to improve the performance of the job listed below. As is, without threading, it runs successfully, but very slowly. I would like to thread step 2, where 95% of the work happens in reading, filtering and transforming the input data read from very large heterogeneous files. The job:
• Step1 gets some job parameters that are passed into Step2.
• Step2 will read in X number of files. Each file is heterogeneous, i.e., it contains several different record formats. The records are filtered, transformed and sent to a single output file.
Does Spring Batch have a built-in way to thread Step2 in this scenario? For example, can I add some type of executor to step2? I've tried SimpleAsyncTaskExecutor and ThreadPoolTaskExecutor. Neither works. Adding SimpleAsyncTaskExecutor throws an exception. (See can we process the multiple files sequentially using spring Batch while multiple threads used to process individual files data..?)
Here is the batch configuration:
@Bean
public Job job() {
    return jobBuilderFactory.get("MyJob")
            .start(step1())
            .next(step2())
            .build();
}

@Bean
public Step step1() {
    return stepBuilderFactory.get("Step1GetJobParams")
            .tasklet(myParamsTasklet)
            .build();
}

@Bean
public Step step2() {
    return stepBuilderFactory.get("Step2")
            .<InputDO, OutputDO>chunk(1000)
            .reader(myMultiResourceReader())
            .processor(myStep2ItemProcessor)
            .writer(myStep2FileWriter())
            .taskExecutor(???)
            .build();
}
@Bean
public MultiResourceItemReader<InputDO> myMultiResourceReader() {
    MultiResourceItemReader<InputDO> multiResourceItemReader = new MultiResourceItemReader<InputDO>();
    multiResourceItemReader.setResources(resourceManager.getResources());
    multiResourceItemReader.setDelegate(myStep2FileReader());
    multiResourceItemReader.setSaveState(false);
    return multiResourceItemReader;
}

@Bean
public FlatFileItemReader<InputDO> myStep2FileReader() {
    return new FlatFileItemReaderBuilder<InputDO>()
            .name("MyStep2FileReader")
            .lineMapper(myCompositeLineMapper())
            .build();
}

@Bean
public PatternMatchingCompositeLineMapper<InputDO> myCompositeLineMapper() {
    PatternMatchingCompositeLineMapper<InputDO> lineMapper = new PatternMatchingCompositeLineMapper<InputDO>();

    Map<String, LineTokenizer> tokenizers = new HashMap<String, LineTokenizer>();
    tokenizers.put("A", InputDOTokenizer.getInputDOTokenizer());
    tokenizers.put("*", InputDOFillerTokenizer.getInputDOFillerTokenizer());
    lineMapper.setTokenizers(tokenizers);

    Map<String, FieldSetMapper<InputDO>> mappers = new HashMap<String, FieldSetMapper<InputDO>>();
    mappers.put("A", new InputDOFieldSetMapper());
    mappers.put("*", new InputDOFillerFieldSetMapper());
    lineMapper.setFieldSetMappers(mappers);

    return lineMapper;
}

@Bean
public FlatFileItemWriter<OutputDO> myOutputDOFileWriter() {
    return new FlatFileItemWriterBuilder<OutputDO>()
            .name("MyOutputDOFileWriter")
            .resource(resourceManager.getFileSystemResource("myOutputDOFileName"))
            .lineAggregator(new DelimitedLineAggregator<OutputDO>() {
                {
                    setDelimiter("");
                    setFieldExtractor(outputDOFieldExtractor.getOutputDOFieldExtractor());
                }
            })
            .lineSeparator("\r\n")
            .build();
}
Any/all guidance is much appreciated!
I guess you want to use the multi-threaded step mode to solve the slow reading problem. More details are available in the Spring Batch documentation on Multi-threaded Steps.
Hope this helps.
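As a minimal sketch, reusing the bean names from the question (and assuming the reader stack can tolerate concurrent use, which is why saveState(false) is set on the MultiResourceItemReader above), a task executor can be plugged into the chunk-oriented step like this:

@Bean
public TaskExecutor step2TaskExecutor() {
    // hypothetical executor bean, sized to taste
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(4);
    executor.setMaxPoolSize(4);
    executor.setThreadNamePrefix("step2-");
    return executor;
}

@Bean
public Step step2() {
    return stepBuilderFactory.get("Step2")
            .<InputDO, OutputDO>chunk(1000)
            .reader(myMultiResourceReader())
            .processor(myStep2ItemProcessor)
            .writer(myStep2FileWriter())
            .taskExecutor(step2TaskExecutor()) // chunks are now processed concurrently
            .throttleLimit(4)                  // cap on concurrent chunk executions
            .build();
}

Keep in mind this is only a sketch: FlatFileItemReader and FlatFileItemWriter are not thread-safe by default, which is the issue discussed in the question linked above.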

Strategies to implement callback mechanism / notify, when all the asynchrous spring integration flows/threads execution is completed

I have a Spring Integration flow that gets triggered once a day. It pulls all parties from the database and sends each party to an executorChannel.
The next flow pulls data for each party and then processes them in parallel by sending them to a different executor channel.
The challenge I'm facing is how to know when this entire process ends. Any ideas on how to achieve this?
Here's my pseudo code of executor channels and integration flows.
@Bean
public IntegrationFlow fileListener() {
    return IntegrationFlows.from(Files.inboundAdapter(new File("pathtofile")))
            .channel("mychannel")
            .get();
}

@Bean
public IntegrationFlow flowOne() throws ParserConfigurationException {
    return IntegrationFlows.from("mychannel")
            .handle("serviceHandlerOne", "handle")
            .nullChannel();
}

@Bean
public IntegrationFlow parallelFlowOne() throws ParserConfigurationException {
    return IntegrationFlows.from("executorChannelOne")
            .handle("parallelServiceHandlerOne", "handle")
            .nullChannel();
}

@Bean
public IntegrationFlow parallelFlowTwo() throws ParserConfigurationException {
    return IntegrationFlows.from("executorChannelTwo")
            .handle("parallelServiceHandlerTwo", "handle")
            .nullChannel();
}

@Bean
public MessageChannel executorChannelOne() {
    return new ExecutorChannel(Executors.newFixedThreadPool(10));
}

@Bean
public MessageChannel executorChannelTwo() {
    return new ExecutorChannel(Executors.newFixedThreadPool(10));
}

@Component
@Scope("prototype")
public class ServiceHandlerOne {

    @Autowired
    MessageChannel executorChannelOne;

    @ServiceActivator
    public Message<?> handle(Message<?> message) {
        List<?> rowDatas = repository.findAll("parties");
        rowDatas.stream().forEach(data -> {
            Message<?> partyMessage = MessageBuilder.withPayload(data).build();
            executorChannelOne.send(partyMessage);
        });
        return message;
    }
}

@Component
@Scope("prototype")
public class ParallelServiceHandlerOne {

    @Autowired
    MessageChannel executorChannelTwo;

    @ServiceActivator
    public Message<?> handle(Message<?> message) {
        List<?> rowDatas = repository.findAll("party");
        rowDatas.stream().forEach(data -> {
            Message<?> partyMessage = MessageBuilder.withPayload(data).build();
            executorChannelTwo.send(partyMessage);
        });
        return message;
    }
}
First of all, there is no reason to make your services @Scope("prototype"): I don't see any state held in your services, so they are stateless and can simply be singletons. Second: since your flows end with nullChannel(), there is no point in returning anything from your service methods. Just return void, and the flow will end there naturally.
Another observation: you use executorChannelOne.send(message) directly in the code of your service method. The same would be achieved if you just returned that new message from your service method and had executorChannelOne as the next .channel() in your flow definition after handle("parallelServiceHandlerOne", "handle").
Since it looks like you do that in a loop, you might consider adding a .split() in between: the handler returns your List<?> rowDatas, and the splitter takes care of iterating over that data, replying with each item to that executorChannelOne.
Now about your original question.
There is really no easy way to say that your executors are not busy any more. They might appear idle at the moment of the request simply because the message for a task has not reached an executor channel yet.
Typically we recommend using some async synchronizer for your data. The aggregator is a good way to correlate several in-flight messages. The aggregator collects a group and does not emit a reply until that group is complete.
The splitter I mentioned above adds sequence detail headers by default, so a subsequent aggregator can track a message group easily.
Since you have layers in your flow, it looks like you would need several aggregators: two for your executor channels after splitting, and one top-level one for the file. The first two would reply to the top-level one for the final, per-file grouping.
You may also think about making those parties and party calls in parallel using a PublishSubscribeChannel, which can also be configured with applySequence=true. That info will then be used by the top-level aggregator for the per-file grouping.
See more in docs:
https://docs.spring.io/spring-integration/docs/current/reference/html/core.html#channel-implementations-publishsubscribechannel
https://docs.spring.io/spring-integration/docs/current/reference/html/message-routing.html#splitter
https://docs.spring.io/spring-integration/docs/current/reference/html/message-routing.html#aggregator
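To make the splitter-plus-aggregator idea concrete, here is a minimal sketch of one layer (the channel and service names are hypothetical, not the poster's): the list payload is split, each item is processed on an executor channel, and the aggregator emits a reply only once every item of the original list has come back:

@Bean
public MessageChannel partyExecutorChannel() {
    // hypothetical executor-backed channel for parallel per-party processing
    return new ExecutorChannel(Executors.newFixedThreadPool(10));
}

@Bean
public IntegrationFlow partiesFlow() {
    return IntegrationFlows.from("partiesChannel")
            // the payload is expected to be a List<?>; the splitter adds sequence headers
            .split()
            .channel(partyExecutorChannel())
            .handle("partyService", "processParty")
            // default correlation uses the splitter's sequence headers,
            // so the group completes only when all parties have been processed
            .aggregate()
            .handle("notificationService", "allPartiesDone")
            .get();
}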

Spring batch trigger process after all jobs complete

I have a Spring Batch application with a series of jobs. I want to send an email after ALL the jobs have completed, but I'm not sure of the best way to do this. The options I am considering are:
Run the jobs in a certain order and amend the JobListener of the last job so that it sends the email. The downside is that this won't work if a further job is added to the end of the batch.
Add a new job which sends the email, and order the jobs so that this additional job runs last.
Are there any built-in Spring Batch constructs that will be triggered on completion of the entire batch?
The final option would be my preferred solution, so my question is: are there any Spring Batch classes that listen for batch completion (similar to JobExecutionListenerSupport or a StepListener)?
No, I am not aware of any batch listener that listens for the whole batch's completion.
I have two alternatives for you. Both allow you to stick to Spring.
(1) If your application is designed to be perpetual (i.e. like a web server), you can inject a custom jobLauncher, grab the TaskExecutor and wait for its completion (either through simple counters that count callbacks from afterJob methods, or by waiting a certain amount of time that allows all jobs to be submitted -- not necessarily started).
Add a configuration class like this:
@Configuration
class JobConfiguration implements InitializingBean {

    ThreadPoolTaskExecutor taskExecutor;

    @Bean
    public TaskExecutor taskExecutor() {
        // here, change to your liking; this example uses a ThreadPoolTaskExecutor
        // so that it can be shut down and awaited below
        taskExecutor = new ThreadPoolTaskExecutor();
        taskExecutor.initialize();
        return taskExecutor;
    }

    @Bean
    public JobLauncher jobLauncher(@Autowired JobRepository jobRepository,
                                   @Autowired TaskExecutor taskExecutor) {
        SimpleJobLauncher launcher = new SimpleJobLauncher();
        launcher.setJobRepository(jobRepository);
        launcher.setTaskExecutor(taskExecutor);
        return launcher;
    }

    List<Job> jobs;

    // I don't use this in this example,
    // however, jobs.size() will help you with the countdown latch
    @Autowired
    public void setJobs(List<Job> jobs) {
        this.jobs = jobs;
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        // either count down until all jobs are submitted
        // or sleep a finite amount of time
        // in this example, I'll be lazy and just sleep
        Thread.sleep(1 * 3600 * 1000L); // wait 1 hour
        taskExecutor.shutdown();
        try {
            taskExecutor.getThreadPoolExecutor().awaitTermination(1, TimeUnit.HOURS);
        } catch (Exception e) {
        } finally {
            // send your e-mail here
        }
    }
}
(2) If your application stops when all jobs are done, you can simply follow this to send out an e-mail.
I repeat a few lines of code for completeness:
public class TerminateBean {

    @PreDestroy
    public void onDestroy() throws Exception {
        // send out e-mail here
    }
}
We also have to add a bean of this type:
@Configuration
public class ShutdownConfig {

    @Bean
    public TerminateBean getTerminateBean() {
        return new TerminateBean();
    }
}
If I understand correctly, you have a job of jobs. In this case, you can define an enclosing job with a series of steps of type JobStep (each of which delegates to a job). Then you can register a JobExecutionListener on the enclosing job. This listener will be called once all steps (i.e. sub-jobs) have completed.
More details about the JobStep here: https://docs.spring.io/spring-batch/4.0.x/api/org/springframework/batch/core/step/job/JobStep.html
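A minimal sketch of that idea, assuming the usual stepBuilderFactory/jobBuilderFactory and existing job1/job2 beans (the listener body is only an illustration):

@Bean
public Step job1Step(Job job1, JobLauncher jobLauncher) {
    // JobStep that delegates to a sub-job
    return stepBuilderFactory.get("job1Step")
            .job(job1)
            .launcher(jobLauncher)
            .build();
}

@Bean
public Step job2Step(Job job2, JobLauncher jobLauncher) {
    return stepBuilderFactory.get("job2Step")
            .job(job2)
            .launcher(jobLauncher)
            .build();
}

@Bean
public Job enclosingJob(Step job1Step, Step job2Step) {
    return jobBuilderFactory.get("enclosingJob")
            .start(job1Step)
            .next(job2Step)
            .listener(new JobExecutionListenerSupport() {
                @Override
                public void afterJob(JobExecution jobExecution) {
                    // called once all JobSteps (i.e. sub-jobs) have completed;
                    // send the notification e-mail here
                }
            })
            .build();
}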

How to trigger multiple "child" Jobs from a "mother" Job using Spring Batch?

I have a Job that looks into a configuration table, and I would like it to trigger a Job per configuration entry using the values from the table.
The first Job consists of a single Tasklet that does the config table lookup and then triggers a subsequent Job per entry.
Mother Job code:
@Configuration
public class RunProcessesJobConfig {

    ... // steps, jobs factory init

    @Autowired
    private Tasklet runProcessTask;

    @Bean
    public Step runProcessStep() {
        return steps.get("runProcessStep")
                .tasklet(runProcessTask)
                .build();
    }

    @Bean
    public Job runProcessJob() {
        return jobs.get("runProcessJob")
                .start(runProcessStep())
                .build();
    }
}
The way I'm currently trying to implement it is by autowiring a JobLauncher and the Job I need into the Tasklet and running the Job from there.
RunProcessTask - gets autowired into the job config above:
@Autowired
Job myJob;

@Override
public RepeatStatus execute(StepContribution sc, ChunkContext cc) throws Exception {
    List<DeployCfg> deployCfgs = da.getJobCfgs();
    for (DeployCfg deployCfg : deployCfgs) {
        String cfg1 = deployCfg.getCfg1();
        String cfg2 = deployCfg.getCfg2();
        String cfg3 = deployCfg.getCfg3();
        // trigger job per config
        JobParameters params = new JobParametersBuilder()
                .addString("cfg1", cfg1)
                .addString("cfg2", cfg2)
                .addString("cfg3", cfg3)
                .toJobParameters();
        final JobExecution jobExec = jobLauncher.run(myJob, params);
    }
    return RepeatStatus.FINISHED;
}
When I try executing the mother Job, I get a TransactionSuspensionNotSupportedException: Transaction manager [org.springframework.batch.support.transaction.ResourcelessTransactionManager] does not support transaction suspension error on the jobLauncher.run(...) line.
I'm thinking that running a Job within another Job messes with Spring Batch's transaction manager. Any ideas on how to do this?
Additional version info:
spring-boot-starter-parent version 1.5.9.RELEASE
spring-boot-starter-batch

How to create a fault tolerant JobStep

I have a Spring Batch project where I have several jobs. I modularized them into a single job to execute them sequentially, like this:
@Autowired
@Qualifier("job1")
private Job job1;

@Autowired
@Qualifier("job2")
private Job job2;

@Autowired
@Qualifier("job3")
private Job job3;

@Bean(name = "step1")
public Step step1() {
    return stepBuilderFactory.get("step1").job(job1).build();
}

@Bean(name = "step2")
public Step step2() {
    return stepBuilderFactory.get("step2").job(job2).build();
}

@Bean(name = "step3")
public Step step3() {
    return stepBuilderFactory.get("step3").job(job3).build();
}

@Bean(name = "parentJob")
public Job parentJob(@Qualifier("step1") Step step1, @Qualifier("step2") Step step2, @Qualifier("step3") Step step3) {
    return jobBuilderFactory.get("parentJob")
            .incrementer(new RunIdIncrementer())
            .repository(jobRepository)
            .start(step1)
            .next(step2)
            .next(step3)
            .build();
}
However, if one of the JobSteps fails, the parent job fails as well. Is there any mechanism to make all the JobSteps fault tolerant, so that if one of them fails the parent job continues?
You are basically looking for a conditional flow as described here, i.e. you want to move on to the next job even if the previous job has failed, and that is not the default flow. You need to use the pattern * to achieve that.
In Java config, it would look like .on("*").to(...) and so on.
Your pattern parameter to on(..) can be anything from ExitStatus, plus *, provided you are not using any other custom statuses.
Hope it helps!
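Applied to the parentJob above, a minimal sketch of such a conditional flow (assuming the same step beans; the "*" pattern moves the flow to the next step whatever the previous step's exit status was) might look like:

@Bean(name = "parentJob")
public Job parentJob(@Qualifier("step1") Step step1,
                     @Qualifier("step2") Step step2,
                     @Qualifier("step3") Step step3) {
    return jobBuilderFactory.get("parentJob")
            .incrementer(new RunIdIncrementer())
            .repository(jobRepository)
            .start(step1)
            .on("*").to(step2)             // continue to step2 whether step1 failed or not
            .from(step2).on("*").to(step3) // likewise for step3
            .end()                         // close the flow definition
            .build();
}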
