Spring Batch: how to set a file path in the ExecutionContext for the next step - java

I have a Spring Batch workflow where I read from a flat CSV file and write to another CSV file.
This is what my ItemWriter looks like:
@Configuration
public class MyCSVFileWriter implements ItemWriter<RequestModel> {

    private final FlatFileItemWriter<RequestModel> writer;
    private ExecutionContext jobContext;

    public MyCSVFileWriter() throws Exception {
        this.writer = new FlatFileItemWriter<>();
        DelimitedLineAggregator<RequestModel> lineAggregator = new DelimitedLineAggregator<>();
        BeanWrapperFieldExtractor<RequestModel> extractor = new BeanWrapperFieldExtractor<>();
        extractor.setNames(new String[]{"id", "source", "date"});
        lineAggregator.setFieldExtractor(extractor);
        this.writer.setLineAggregator(lineAggregator);
        this.writer.setShouldDeleteIfExists(true);
        this.writer.afterPropertiesSet();
    }

    @Override
    public void write(List<? extends RequestModel> items) throws Exception {
        this.writer.open(jobContext);
        this.writer.write(items);
    }

    @BeforeStep
    public void beforeStepHandler(StepExecution stepExecution) {
        JobExecution jobExecution = stepExecution.getJobExecution();
        jobContext = jobExecution.getExecutionContext();
        this.writer.setResource(new FileSystemResource(getRequestOutputPathResource(jobContext)));
    }

    private String getRequestOutputPathResource(ExecutionContext jobContext) {
        //***
        return resourcePath;
    }
}
I use the ExecutionContext to extract some data used to calculate the resourcePath for my writer.
The next step after writing uploads the file written in the previous step to a remote server. For this I need to store the calculated file path in the ExecutionContext and make it available in the next step.
What is the best way to do this? Should I be doing this in an @AfterStep handler?

Should I be doing this in an @AfterStep handler?
Yes, that is a good option. You can store the path of the file that was written in the job execution context and read it from there in the next step that uploads the file.
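For illustration, here is a minimal sketch of that idea; the key name outputFilePath and the UploadTasklet class are assumptions for this example, not part of the original question. The writer promotes the computed path to the job ExecutionContext in an @AfterStep callback, and the upload step reads it back through step-scope late binding.
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.AfterStep;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

// In MyCSVFileWriter: store the computed path so the next step can find it.
@AfterStep
public ExitStatus afterStepHandler(StepExecution stepExecution) {
    stepExecution.getJobExecution().getExecutionContext()
            .putString("outputFilePath", getRequestOutputPathResource(jobContext));
    return stepExecution.getExitStatus();
}

// In the upload step: a step-scoped tasklet can late-bind the value from the job ExecutionContext.
@Component
@StepScope
public class UploadTasklet implements Tasklet {

    @Value("#{jobExecutionContext['outputFilePath']}")
    private String outputFilePath;

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        // upload the file at outputFilePath to the remote server here
        return RepeatStatus.FINISHED;
    }
}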

Related

Spring batch reading files with MultiResourceItemReader and using ItemReadListener

Here's the scenario: I have a Spring Batch that reads multiple input files, processes them, and finally generates more output files.
Using FlatFileItemReader and restarting the entire batch with a cron, I can process the files one by one; however, it is not feasible to restart the batch every X seconds just to process the files individually.
PS: I use an ItemReadListener to add some properties of the object being read into the jobExecutionContext, which are used later to validate (and generate, or not, the output file).
However, if I use MultiResourceItemReader to read all the input files without completely restarting the whole context (and the resources), the ItemReadListener overwrites the properties of each object (input file) in the jobExecutionContext, so that only data from the last object in the array of input files remains.
Is there any way to use the ItemReadListener for each Resource read inside a MultiResourceItemReader?
Example Reader:
@Bean
public MultiResourceItemReader<CustomObject> multiResourceItemReader() {
    MultiResourceItemReader<CustomObject> resourceItemReader = new MultiResourceItemReader<CustomObject>();
    resourceItemReader.setResources(resources);
    resourceItemReader.setDelegate(reader());
    return resourceItemReader;
}

@Bean
public FlatFileItemReader<CustomObject> reader() {
    FlatFileItemReader<CustomObject> reader = new FlatFileItemReader<CustomObject>();
    reader.setLineMapper(customObjectLineMapper());
    return reader;
}
Example Step:
@Bean
public Step loadInputFiles() {
    return stepBuilderFactory.get("loadInputFiles").<CustomObject, CustomObject>chunk(10)
            .reader(multiResourceItemReader())
            .writer(new NoOpItemWriter())
            .listener(customObjectListener())
            .build();
}
Example Listener:
public class CustomObjectListener implements ItemReadListener<CustomObject> {

    @Value("#{jobExecution.executionContext}")
    private ExecutionContext executionContext;

    @Override
    public void beforeRead() {
    }

    @Override
    public void afterRead(CustomObject item) {
        executionContext.put("customProperty", item.getCustomProperty());
    }

    @Override
    public void onReadError(Exception ex) {
    }
}
Scheduler:
public class Scheduler {

    @Autowired
    JobLauncher jobLauncher;

    @Autowired
    Job job;

    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

    @Scheduled(fixedDelay = 5000, initialDelay = 5000)
    public void scheduleByFixedRate() throws Exception {
        JobParameters params = new JobParametersBuilder()
                .addString("time", format.format(Calendar.getInstance().getTime()))
                .toJobParameters();
        jobLauncher.run(job, params);
    }
}
Using FlatFileItemReader and restarting the entire batch with a cron, I can process the files one by one; however, it is not feasible to restart the batch every X seconds just to process the files individually.
That is the very reason I always recommend the job-per-file approach over the single-job-for-all-files-with-MultiResourceItemReader approach, like here or here.
Is there any way to use the ItemReadListener for each Resource read inside a MultiResourceItemReader?
No, because the listener is not aware of the resource the item was read from. This is a limitation of the approach itself, not of Spring Batch. What you can do, though, is make your items aware of the resource they were read from by implementing ResourceAware.
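For illustration, a rough sketch of the ResourceAware approach (the resource getter and the way the execution context key is built are assumptions, not from the original post): MultiResourceItemReader sets the current resource on every item that implements ResourceAware, so the listener can key its entries by file name instead of overwriting a single key.
import org.springframework.batch.item.ResourceAware;
import org.springframework.core.io.Resource;

public class CustomObject implements ResourceAware {

    private String customProperty;
    private Resource resource;

    @Override
    public void setResource(Resource resource) {
        // called by MultiResourceItemReader for each item it reads
        this.resource = resource;
    }

    public Resource getResource() {
        return resource;
    }

    public String getCustomProperty() {
        return customProperty;
    }
}

// In CustomObjectListener: one entry per input file instead of a single overwritten key.
@Override
public void afterRead(CustomObject item) {
    executionContext.put(item.getResource().getFilename() + ".customProperty", item.getCustomProperty());
}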

spring batch using spring boot: Read arguments from config or command line and use them in job

I am pretty new to Spring. I am trying to build an ETL-like app using Spring Batch with Spring Boot.
I am able to run the basic job (read -> process -> write). Now I want to read the arguments (like date, file name, type, etc.) from a config file (later) or the command line (which works for now) and use them in my job.
Entry point:
// Imports
@SpringBootApplication
@EnableBatchProcessing
public class EtlSpringBatchApplication {
    public static void main(String[] args) {
        SpringApplication.run(EtlSpringBatchApplication.class, args);
    }
}
My batch configuration
// BatchConfig.java
// Imports
@Configuration
public class BatchConfig {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    public MyDao myDao;

    @Bean
    public Job job() {
        return jobBuilderFactory
                .get("job")
                .incrementer(new RunIdIncrementer())
                .listener(new Listener(myDao))
                .flow(step1())
                .end()
                .build();
    }

    @Bean
    public Step step1() {
        return stepBuilderFactory.get("step1").<MyModel, MyModel>chunk(1000)
                .reader(Reader.reader("my_file_20200520.txt"))
                .processor(new Processor())
                .writer(new Writer(myDao))
                .build();
    }
}
I have the basic steps.
Reader.java has method to read flat file.
public static FlatFileItemReader<MyModel> reader(String path) {......}
Processor.java has the process method defined. I added a @BeforeStep to fetch some details from the DB required for processing.
public class Processor implements ItemProcessor<MyModel, MyModel> {

    private static final Logger log = LoggerFactory.getLogger(Processor.class);

    private Long id = null;

    @BeforeStep
    public void getId(StepExecution stepExecution) {
        this.id = stepExecution.getJobExecution().getExecutionContext().getLong("Id");
    }

    @Override
    public MyModel process(MyModel myModel) throws Exception {
        // processing logic omitted in the question
        return myModel;
    }
}
Writer.java implements ItemWriter and the write code.
Listener.java extends JobExecutionListenerSupport and overrides afterJob and beforeJob.
Basically, I tried to use the ExecutionContext here in beforeJob:
@Override
public void beforeJob(JobExecution jobExecution) {
    log.info("Getting the id..");
    this.id = myDao.getLatestId();
    log.info("id retrieved is: " + this.id);
    jobExecution.getExecutionContext().putLong("Id", this.id);
}
Now, what I am looking for is:
The reader should get the file name from the job arguments, i.e. when I run the job, I should be able to pass some arguments, one of them being the file path.
Later, some methods (like getId, etc.) require a few more variables which can be passed as job arguments, i.e. run_date, type, etc.
In short, I am looking for a way to:
Pass job arguments to my app (run_date, type, file path, etc.)
Use them in the reader and other places (Listener, Writer)
Can someone tell me what additions I should make in my BatchConfig.java and other places to read the job parameters (from the command line or a config file, whichever is easier)?
Both Spring Batch and Spring Boot reference documentation show how to pass parameters to a job:
Running Jobs from the Command Line
Running Spring Batch jobs from the Command Line
Moreover, the Spring Batch docs explain in detail, with code examples, how to use those parameters in batch components (like the reader, writer, etc.):
Late Binding of Job and Step Attributes
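For instance, a minimal sketch of late binding applied to the reader above (assuming the file path is passed as a job parameter named filePath; the null argument is the usual placeholder, since the real value is resolved by the step-scoped proxy at runtime):
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;

@Bean
@StepScope
public FlatFileItemReader<MyModel> reader(@Value("#{jobParameters['filePath']}") String filePath) {
    // Reader.reader(...) is the question's existing factory method; the path is now late-bound
    return Reader.reader(filePath);
}

@Bean
public Step step1() {
    return stepBuilderFactory.get("step1").<MyModel, MyModel>chunk(1000)
            .reader(reader(null)) // null is replaced at runtime by the step-scoped proxy
            .processor(new Processor())
            .writer(new Writer(myDao))
            .build();
}
With Spring Boot, a command-line argument such as filePath=/some/input.txt is typically converted into a job parameter of the same name, so no extra plumbing is needed for the command-line case.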
You can read the value of the job parameters set from the config file inside the reader or other classes within the Spring Batch execution context. Below is a snippet for reference.
The application.yml file can have the below config:
batch.configs.filePath: c:\test
You can add the filePath read from the config to your job parameters when you start the job. A snippet of the class:
// Job and JobLauncher related autowires..
@Value("${batch.configs.filePath}")
private String filePath;

// inside a method block,
JobParameters jobParameters = new JobParametersBuilder()
        .addLong("JobID", System.currentTimeMillis())
        .addString("filePath", filePath)
        .toJobParameters();
try {
    jobLauncher.run(batchJob, jobParameters);
} catch (Exception e) {
    logger.error("Exception while running a batch job {}", e.getMessage());
}
One of the ways to access the job parameters is to implement StepExecutionListener in your reader class and make use of its overridden methods beforeStep and afterStep. Similar implementations can be applied to other classes as well:
public class Reader implements ItemReader<String>, StepExecutionListener {

    private String filePath;

    @Override
    public void beforeStep(StepExecution stepExecution) {
        try {
            // filePath was added as a job parameter above, so read it from the JobParameters
            filePath = stepExecution.getJobParameters().getString("filePath");
        } catch (Exception e) {
            logger.error("Exception while performing read {}", e);
        }
    }

    @Override
    public String read() throws Exception {
        // filePath value read from the job execution can be used inside the read implementation
        return null; // actual read logic omitted in the original answer
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        return ExitStatus.COMPLETED;
    }
}

Spring Batch - How to generate parallel steps based on params created in a previous step

Introduction
I am trying to use job parameters created in a tasklet to create steps that follow the execution of the tasklet.
A tasklet tries to find some files (findFiles()), and if it finds some, it saves the filenames to a list of strings.
In the tasklet I pass the data as follows:
chunkContext.getStepContext().getStepExecution().getExecutionContext().put("files", fileNames);
The next step is a parallel flow where for each file a simple reader-processor-writer step will be executed (if you are interested in how I got there please see my previous question: Spring Batch - Looping a reader/processor/writer step)
Upon building the job readFilesJob(), a flow is initially created using a "fake" list of files, because the real list of files is only known after the tasklet has been executed.
Question
How do I configure my job so the tasklet gets executed first and then the parallel flow gets executed using the list of files generated from the tasklet?
I think it comes down to getting the list of filenames loaded with the correct data at the correct moment during runtime... but how?
Reproduce
Here is my simplified configuration:
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    private static final String FLOW_NAME = "flow1";
    private static final String PLACE_HOLDER = "empty";

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    public List<String> files = Arrays.asList(PLACE_HOLDER);

    @Bean
    public Job readFilesJob() throws Exception {
        List<Step> steps = files.stream().map(file -> createStep(file)).collect(Collectors.toList());
        FlowBuilder<Flow> flowBuilder = new FlowBuilder<>(FLOW_NAME);
        Flow flow = flowBuilder
                .start(findFiles())
                .next(createParallelFlow(steps))
                .build();
        return jobBuilderFactory.get("readFilesJob")
                .start(flow)
                .end()
                .build();
    }

    private static Flow createParallelFlow(List<Step> steps) {
        SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
        taskExecutor.setConcurrencyLimit(steps.size());
        List<Flow> flows = steps.stream()
                .map(step ->
                        new FlowBuilder<Flow>("flow_" + step.getName())
                                .start(step)
                                .build())
                .collect(Collectors.toList());
        return new FlowBuilder<SimpleFlow>("parallelStepsFlow").split(taskExecutor)
                .add(flows.toArray(new Flow[flows.size()]))
                .build();
    }

    private Step createStep(String fileName) {
        return stepBuilderFactory.get("readFile" + fileName)
                .chunk(100)
                .reader(reader(fileName))
                .writer(writer(fileName))
                .build();
    }

    private FileFinder findFiles() {
        return new FileFinder();
    }
}
Research
The question and answer from How to safely pass params from Tasklet to step when running parallel jobs suggest the use of a construct like this in the reader/writer:
@Value("#{jobExecutionContext[filePath]}") String filePath
However, I really hope it is possible to pass the fileName as a plain string to the reader/writer, given the way the steps are created in the createParallelFlow() method. Therefore, even though the answer to that question might be a solution to my problem here, it is not the desired solution. But please do not refrain from correcting me if I am wrong.
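For context, passing the name directly is possible in principle, because createStep(fileName) builds its own reader; a minimal sketch of such a reader method (PassThroughLineMapper and FileSystemResource are illustrative choices, not from the original post):
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.core.io.FileSystemResource;

private FlatFileItemReader<String> reader(String fileName) {
    // the file name arrives as a plain String because createStep(fileName) wires the reader itself
    FlatFileItemReader<String> reader = new FlatFileItemReader<>();
    reader.setResource(new FileSystemResource(fileName));
    reader.setLineMapper(new PassThroughLineMapper());
    return reader;
}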
Closing
I am using the file names example to clarify the problem better. My problem is not actually the reading of multiple files from a directory. My question really boils down to the idea of generating data during runtime and passing it to the next dynamically generated step(s).
EDIT:
Added a simplified version of the FileFinder tasklet.
@Component
public class FileFinder implements Tasklet, InitializingBean {

    List<String> fileNames = new ArrayList<>();

    public List<String> getFileNames() {
        return fileNames;
    }

    @PostConstruct
    public void afterPropertiesSet() {
        // read the filenames and store them in the list
        fileNames.add("sample-data1.csv");
        fileNames.add("sample-data2.csv");
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        // Execution of methods that will find the file names and put them in the list...
        chunkContext.getStepContext().getStepExecution().getExecutionContext().put("files", fileNames);
        return RepeatStatus.FINISHED;
    }
}
I'm not sure if I understood your problem correctly, but as far as I can see, you need to have the list of filenames before you build your job dynamically.
You could do it like this:
@Component
public class MyJobSetup {

    List<String> fileNames;

    public List<String> getFileNames() {
        return fileNames;
    }

    @PostConstruct
    public void afterPropertiesSet() {
        // read the filenames and store them in the list
        fileNames = ....;
    }
}
After that, you can inject this bean into your JobConfiguration bean:
@Configuration
@EnableBatchProcessing
@Import(MyJobSetup.class)
public class BatchConfiguration {

    private static final String FLOW_NAME = "flow1";
    private static final String PLACE_HOLDER = "empty";

    @Autowired
    private MyJobSetup jobSetup; // <--- Inject
    // PostConstruct of MyJobSetup was executed when it is injected

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    public List<String> files = Arrays.asList(PLACE_HOLDER);

    @Bean
    public Job readFilesJob() throws Exception {
        List<Step> steps = jobSetup.getFileNames() // get the list of files
                .stream()                          // as a stream
                .map(file -> createStep(file))     // map...
                .collect(Collectors.toList());     // and create the list of steps
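(The snippet above appears to be cut off; presumably the method then builds the flow exactly as in the question's readFilesJob(), roughly like this:)
        FlowBuilder<Flow> flowBuilder = new FlowBuilder<>(FLOW_NAME);
        Flow flow = flowBuilder
                .start(findFiles())
                .next(createParallelFlow(steps))
                .build();
        return jobBuilderFactory.get("readFilesJob")
                .start(flow)
                .end()
                .build();
    }
}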

How to get job parameters into an item processor using Spring Batch annotations

I am using Spring MVC. From my controller I am calling the jobLauncher, passing job parameters as shown below, and I'm using annotations to enable configuration:
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {
    // read, write, process and invoke job
}

JobParameters jobParameters = new JobParametersBuilder().addString("fileName", "xxxx.txt").toJobParameters();
startjob = jobLauncher.run(job, jobParameters);
and here is my ItemProcessor:
public class DataItemProcessor implements ItemProcessor<InputData, OutPutData> {

    public OutPutData process(final InputData inputData) throws Exception {
        // i want to get job Parameters here ????
    }
}
1) Put a scope annotation on your data processor, i.e.
@Scope(value = "step")
2) Declare an instance field in your data processor and inject the job parameter value into it using the @Value annotation:
@Value("#{jobParameters['fileName']}")
private String fileName;
Your final data processor class will look like:
@Scope(value = "step")
public class DataItemProcessor implements ItemProcessor<InputData, OutPutData> {

    @Value("#{jobParameters['fileName']}")
    private String fileName;

    public OutPutData process(final InputData inputData) throws Exception {
        // the job parameter is now available here
        System.out.println("Job parameter: " + fileName);
        // mapping of inputData to OutPutData omitted in the original answer
        return null;
    }

    public void setFileName(String fileName) {
        this.fileName = fileName;
    }
}
In case your data processor is not initialized as a bean, put a @Component annotation on it:
@Component("dataItemProcessor")
@Scope(value = "step")
public class DataItemProcessor implements ItemProcessor<InputData, OutPutData> {
A better solution (in my opinion) that avoids using Spring's hacky expression language (SpEL) is to autowire the StepExecution context into your processor using @BeforeStep.
In your processor, add something like:
@BeforeStep
public void beforeStep(final StepExecution stepExecution) {
    JobParameters jobParameters = stepExecution.getJobParameters();
    // Do stuff with job parameters, e.g. set class-scoped variables, etc.
}
The @BeforeStep annotation
Marks a method to be called before a Step is executed, which comes
after a StepExecution is created and persisted, but before the first
item is read.
I have written the processor inline using a lambda expression, rather than creating a separate file.
@Bean
@StepScope
public ItemProcessor<SampleTable, SampleTable> processor(@Value("#{jobParameters['eventName']}") String eventName) {
    //return new RandomNumberProcessor();
    return item -> {
        SampleTable dataSample = new SampleTable();
        // 'data' is a collection defined elsewhere in the original answer's class
        if (data.contains(item)) {
            return null;
        } else {
            dataSample.setMobileNo(item.getMobileNo());
            dataSample.setEventId(eventName);
            return dataSample;
        }
    };
}
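For completeness, such a step-scoped processor bean is typically wired into a step roughly like this (the reader, writer and stepBuilderFactory are assumed to exist elsewhere; the null argument is the usual placeholder because the real eventName is resolved at step execution time):
@Bean
public Step sampleStep(ItemReader<SampleTable> reader, ItemWriter<SampleTable> writer) {
    return stepBuilderFactory.get("sampleStep")
            .<SampleTable, SampleTable>chunk(100)
            .reader(reader)
            .processor(processor(null)) // replaced at runtime by the @StepScope proxy
            .writer(writer)
            .build();
}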

Implementation of a step-scoped in-memory destination in spring batch

I have a requirement, when using Spring Batch, for the creation of a large report, in binary format, which is held in a database. None of the working data can be written directly to files, or to working tables outside the JobExecutionContext.
I'm aware that normally you would just write to the job execution context, but I'm a little confused as to how I would go about it with such a large report (potentially several hundred megabytes.)
At the moment, my Writer implementation has a dependency on an aggregator class, which is injected as a bean; then there's a Tasklet that has the aggregator injected into it, which writes the finished report to the database.
The problem is that I cannot scope my aggregator to the step context and as such if two jobs are running at the same time they will be writing to the same aggregator.
Here is my current implementation
Domain class
public class DataChunk {
    private int pageNumber;
    private byte[] data;
}
Writer
public class FooWriter implements ItemWriter<DataChunk> {

    private FooAggregator fooAggregator;

    public void write(List<? extends DataChunk> dataChunks) throws Exception {
        dataChunks.stream().forEach(chunk -> fooAggregator.addChunk(chunk.getPageNumber(), chunk.getData()));
    }
}
Aggregator
public class FooAggregator {

    private Map<Integer, byte[]> pagedData = new TreeMap<>(); // key-sorted Map implementation

    public void addChunk(int pageNumber, byte[] data) {
        pagedData.put(pageNumber, data);
    }

    public byte[] aggregate() {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        pagedData.values().stream().forEach(data -> baos.write(data, 0, data.length));
        return baos.toByteArray();
    }
}
Report Writing Tasklet
public class ReportWritingTasklet implements Tasklet {

    private ReportRepository reportRepository;
    private FooAggregator fooAggregator;

    public RepeatStatus execute(StepContribution contribution, ChunkContext context) {
        byte[] data = fooAggregator.aggregate();
        reportRepository.getOne(reportId).setDataBytes(data);
        return RepeatStatus.FINISHED;
    }
}
Context
<?xml version="1.0" encoding="UTF-8"?>
<beans>
    <bean id="fooWriter" class="FooWriter" scope="step"
          p:fooAggregator-ref="fooAggregator"/>
    <bean id="fooAggregator" class="FooAggregator"/>
    <bean id="reportWritingTasklet" class="ReportWritingTasklet" scope="step"
          p:fooAggregator-ref="fooAggregator"/>
    <batch:job id="fooJob">
        <batch:step id="generateReport" next="assembleReport">
            <batch:tasklet>
                <batch:chunk reader="fooReader" processor="fooProcessor" writer="fooWriter"/>
            </batch:tasklet>
        </batch:step>
        <batch:step id="assembleReport">
            <batch:tasklet ref="reportWritingTasklet"/>
        </batch:step>
    </batch:job>
</beans>
If I attempt to make the FooAggregator step-scoped I get the following exception as a root-cause
Caused by: java.lang.IllegalStateException: Cannot convert value of type [com.sun.proxy.$Proxy98 implementing org.springframework.aop.scope.ScopedObject,java.io.Serializable,org.springframework.aop.framework.AopInfrastructureBean,org.springframework.aop.SpringProxy,org.springframework.aop.framework.Advised] to required type [FooAggregator] for property 'fooAggregator': no matching editors or conversion strategy found
This is because you're only meant to be able to scope certain things to the step.
How can I use the Execution context as a sink for my data chunks, bearing in mind there will be a lot of them and they will be very large?
I've managed to solve this. It's not very Spring batch-y, but it meets my requirements.
Essentially, there is too much data to push into and out of the context. The solution has been to maintain the state in the writer itself, and to make it a StepExecutionListener that will save at the end of the Step, via a TransactionCallback.
The Updated Writer
public class FooWriter extends StepExecutionListenerSupport implements ItemWriter<DataChunk> {

    private String reportId;
    private Map<Integer, byte[]> byteArrayMap = new ConcurrentSkipListMap<>();
    private TransactionTemplate transactionTemplate;
    private ReportRepository reportRepository;

    @Override
    public synchronized void write(List<? extends DataChunk> dataChunks) throws Exception {
        dataChunks.stream().forEach(chunk -> {
            byteArrayMap.put(chunk.getPageNumber(), chunk.getData());
        });
    }

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // No-op
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        StringBuilder sb = new StringBuilder();
        for (byte[] byteArray : byteArrayMap.values()) {
            sb.append(new String(Base64.decode(byteArray)));
        }
        String encodedReportData = new String(Base64.encode(sb.toString().getBytes()));
        TransactionCallback<Report> transactionCallback = transactionStatus -> {
            Report report = reportRepository.getOne(this.reportId);
            report.setReportData(encodedReportData);
            reportRepository.save(report);
            return report;
        };
        // TransactionTemplate throws its own declared TransactionException, rethrows encountered RuntimeExceptions
        // and also Errors. Any problem writing the data kills the job, so it's OK to catch Throwable here instead
        // of trying to handle each exception type separately.
        try {
            transactionTemplate.execute(transactionCallback);
        } catch (Throwable t) {
            LOGGER.error("Error saving report data ID:[{}]", reportId);
            return ExitStatus.FAILED.addExitDescription(t);
        }
        return ExitStatus.COMPLETED;
    }
}
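One detail the snippet leaves open is how reportId gets populated; since fooWriter is declared with scope="step" in the XML above, a plausible option (an assumption on my part, not something shown in the original answer) is to late-bind it from a job parameter:
// Assumption: the report id is supplied as a job parameter and late-bound into the step-scoped writer.
@Value("#{jobParameters['reportId']}")
private String reportId;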
