I am trying to configure OpenCSV in the reader() step of my Spring Batch job to directly convert a record read from a CSV file into a Java POJO, but I am running into the issue of how to correctly set the lineMapper with OpenCSV.
As suggested in the post linked here, How to replace flatFileItemReader with openCSV in spring batch, I am trying the following:
public FlatFileItemReader<Event> reader() throws IOException {
FlatFileItemReader<Event> itemReader = new FlatFileItemReader<Event>();
itemReader.setLineMapper(lineMapper());
itemReader.setLinesToSkip(1);
itemReader.setResource(new FileSystemResource(inputFilePath));
return itemReader;
}
But I am not able to figure out how to configure the lineMapper:
public LineMapper<Event> lineMapper() throws IOException {
DefaultLineMapper<Event> lineMapper = new DefaultLineMapper<Event>();
DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer("\t");
BeanWrapperFieldSetMapper<Event> fieldSetMapper = new BeanWrapperFieldSetMapper<Event>();
fieldSetMapper.setTargetType(Event.class);
lineMapper.setLineTokenizer(???);
lineMapper.setFieldSetMapper(???);
I have the code to read the file and convert it to the desired POJO, but I do not know where to put it:
try (
Reader reader = Files.newBufferedReader(Paths.get(inputFilePath));
) {
CsvToBean<Event> csvToBean = new CsvToBeanBuilder<Event>(reader)
.withSkipLines(1)
.withType(Event.class)
.withIgnoreLeadingWhiteSpace(true)
.build();
return csvToBean.iterator().next();
}
Any help to point me in the right direction is highly appreciated.
You are using the DefaultLineMapper and trying to set a LineTokenizer and FieldSetMapper in it, but this is not what is mentioned in the link you shared.
You need a custom implementation of the LineMapper interface that is based on OpenCSV:
public class OpenCSVLineMapper<T> implements LineMapper<T> {
@Override
public T mapLine(String line, int lineNumber) throws Exception {
// TODO use OpenCSV to map a line to a POJO of type T
return null;
}
}
OpenCSV provides APIs to both read the file and map data to objects. You don't need the reading part as this will be done by the FlatFileItemReader from Spring Batch, you only need to use OpenCSV for the mapping part.
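For illustration, here is a minimal sketch of such an implementation (my own sketch, not the original answer's code). It assumes the target type is mapped by column position, e.g. with @CsvBindByPosition annotations, since a single line carries no header, and it takes the target class as a constructor argument:
import java.io.StringReader;
import java.util.Iterator;
import com.opencsv.bean.ColumnPositionMappingStrategy;
import com.opencsv.bean.CsvToBean;
import com.opencsv.bean.CsvToBeanBuilder;
import org.springframework.batch.item.file.LineMapper;

public class OpenCSVLineMapper<T> implements LineMapper<T> {

    private final Class<T> targetType;

    public OpenCSVLineMapper(Class<T> targetType) {
        this.targetType = targetType;
    }

    @Override
    public T mapLine(String line, int lineNumber) throws Exception {
        // parse the single line with OpenCSV and map it to the target type;
        // position-based mapping is assumed (@CsvBindByPosition on the POJO)
        ColumnPositionMappingStrategy<T> strategy = new ColumnPositionMappingStrategy<>();
        strategy.setType(targetType);
        CsvToBean<T> csvToBean = new CsvToBeanBuilder<T>(new StringReader(line))
                .withMappingStrategy(strategy)
                .withSeparator('\t')
                .withIgnoreLeadingWhiteSpace(true)
                .build();
        Iterator<T> iterator = csvToBean.iterator();
        return iterator.hasNext() ? iterator.next() : null;
    }
}
With this variant, the reader shown below would pass the target type, e.g. itemReader.setLineMapper(new OpenCSVLineMapper<>(Event.class)).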
Once this is in place, you can set your OpenCSV-based line mapper implementation on the FlatFileItemReader:
public FlatFileItemReader<Event> reader() throws IOException {
FlatFileItemReader<Event> itemReader = new FlatFileItemReader<Event>();
itemReader.setResource(new FileSystemResource(inputFilePath));
itemReader.setLinesToSkip(1);
itemReader.setLineMapper(new OpenCSVLineMapper<>());
return itemReader;
}
Related
Here's the scenario: I have a Spring Batch job that reads multiple input files, processes them, and finally generates more output files.
Using FlatFileItemReader and restarting the entire batch with a cron, I can process the files one by one, but it is not feasible to restart the batch every X seconds just to process the files individually.
PS: I use an ItemReadListener to add some properties of the object being read to the jobExecutionContext, which are used later for validation (and to decide whether or not to generate the output file).
However, if I use MultiResourceItemReader to read all the input files without completely restarting the whole context (and the resources), the ItemReadListener overwrites the properties of each object (input file) in the jobExecutionContext, so that we only end up with data from the last object in the array of input files.
Is there any way to use the ItemReadListener for each Resource read inside a MultiResourceItemReader?
Example Reader:
@Bean
public MultiResourceItemReader<CustomObject> multiResourceItemReader() {
MultiResourceItemReader<CustomObject> resourceItemReader = new MultiResourceItemReader<CustomObject>();
resourceItemReader.setResources(resources);
resourceItemReader.setDelegate(reader());
return resourceItemReader;
}
@Bean
public FlatFileItemReader<CustomObject> reader() {
FlatFileItemReader<CustomObject> reader = new FlatFileItemReader<CustomObject>();
reader.setLineMapper(customObjectLineMapper());
return reader;
}
Example Step:
@Bean
public Step loadInputFiles() {
return stepBuilderFactory.get("loadInputFiles").<CustomObject, CustomObject>chunk(10)
.reader(multiResourceItemReader())
.writer(new NoOpItemWriter())
.listener(customObjectListener())
.build();
}
Example Listener:
public class CustomObjectListener implements ItemReadListener<CustomObject> {
@Value("#{jobExecution.executionContext}")
private ExecutionContext executionContext;
@Override
public void beforeRead() {
}
@Override
public void afterRead(CustomObject item) {
executionContext.put("customProperty", item.getCustomProperty());
}
@Override
public void onReadError(Exception ex) {
}
}
Scheduler:
public class Scheduler {
@Autowired
JobLauncher jobLauncher;
@Autowired
Job job;
SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
@Scheduled(fixedDelay = 5000, initialDelay = 5000)
public void scheduleByFixedRate() throws Exception {
JobParameters params = new JobParametersBuilder().addString("time", format.format(Calendar.getInstance().getTime()))
.toJobParameters();
jobLauncher.run(job, params);
}
}
Using FlatFileItemReader and restarting the entire batch with a cron, I can process the files one by one, but it is not feasible to restart the batch every X seconds just to process the files individually.
That is the very reason I always recommend the job-per-file approach over the single-job-for-all-files-with-MultiResourceItemReader approach, like here or here.
Is there any way to use the ItemReadListener for each Resource read inside a MultiResourceItemReader?
No, because the listener is not aware of the resource the item was read from. This is a limitation of the approach itself, not of Spring Batch. What you can do, though, is make your items aware of the resource they were read from by implementing ResourceAware.
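For illustration, a sketch of what that could look like (the fields of CustomObject are assumed from the question):
import org.springframework.batch.item.ResourceAware;
import org.springframework.core.io.Resource;

public class CustomObject implements ResourceAware {

    private String customProperty;
    // the MultiResourceItemReader injects the current resource into each item
    private Resource resource;

    @Override
    public void setResource(Resource resource) {
        this.resource = resource;
    }

    public Resource getResource() {
        return resource;
    }

    public String getCustomProperty() {
        return customProperty;
    }

    public void setCustomProperty(String customProperty) {
        this.customProperty = customProperty;
    }
}
The listener could then key its entries by the originating file, for example executionContext.put(item.getResource().getFilename() + ".customProperty", item.getCustomProperty()), so the properties of one input file no longer overwrite those of another.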
I have a Spring Batch workflow where I read from a flat CSV file and write to another CSV file.
This is what my ItemWriter looks like:
@Configuration
public class MyCSVFileWriter implements ItemWriter<RequestModel> {
private final FlatFileItemWriter<RequestModel> writer;
private ExecutionContext jobContext;
public MyCSVFileWriter() throws Exception {
this.writer = new FlatFileItemWriter<>();
DelimitedLineAggregator<RequestModel> lineAggregator = new DelimitedLineAggregator<>();
BeanWrapperFieldExtractor<RequestModel> extractor = new BeanWrapperFieldExtractor<>();
extractor.setNames(new String[]{"id", "source", "date"});
lineAggregator.setFieldExtractor(extractor);
this.writer.setLineAggregator(lineAggregator);
this.writer.setShouldDeleteIfExists(true);
this.writer.afterPropertiesSet();
}
@Override
public void write(List<? extends RequestModel> items) throws Exception {
this.writer.open(jobContext);
this.writer.write(items);
}
@BeforeStep
public void beforeStepHandler(StepExecution stepExecution) {
JobExecution jobExecution = stepExecution.getJobExecution();
jobContext = jobExecution.getExecutionContext();
this.writer.setResource(new FileSystemResource(getRequestOutputPathResource(jobContext)));
}
private String getRequestOutputPathResource(ExecutionContext jobContext) {
//***
return resourcePath;
}
}
I use the executionContext to extract some data used to calculate my resourcePath for my writer.
My next step after writing is uploading the file written in the previous step to a remote server. For this I need to store the file path that was calculated, put it in the ExecutionContext, and make it available in the next step.
What is the best way to do this? Should I be doing this in an @AfterStep handler?
Should I be doing this in an @AfterStep handler?
Yes, this is a good option. You can store the path of the file that has been written in the job execution context and read it from there in the next step that uploads the file.
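A minimal sketch of that in the writer above (the key name "outputFilePath" is illustrative):
@AfterStep
public ExitStatus afterStepHandler(StepExecution stepExecution) {
    // store the computed path so the next step (the upload) can read it
    // from the job execution context
    jobContext.putString("outputFilePath", getRequestOutputPathResource(jobContext));
    return stepExecution.getExitStatus();
}
The uploading step can then read the same key, for example through a step-scoped @Value("#{jobExecutionContext['outputFilePath']}") injection.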
Can I use FlatFileItemReader with a TaskExecutor in Spring Batch?
I have implemented FlatFileItemReader with a ThreadPoolTaskExecutor. When I print the records in the ItemProcessor, I do not get consistent results, i.e. not all the records are printed and sometimes one of the records is printed more than once. This leads me to conclude that FlatFileItemReader is not thread-safe, which the Spring docs confirm as well, but I see some blogs saying it is possible to use FlatFileItemReader with a TaskExecutor.
So my question is: is it possible to use FlatFileItemReader with a TaskExecutor in any way?
@Bean
@StepScope
public FlatFileItemReader<DataLifeCycleEvent> csvFileReader(
@Value("#{stepExecution}") StepExecution stepExecution) {
Resource inputResource;
FlatFileItemReader<DataLifeCycleEvent> itemReader = new FlatFileItemReader<>();
itemReader.setLineMapper(new OnboardingLineMapper(stepExecution));
itemReader.setLinesToSkip(1);
itemReader.setSaveState(false);
itemReader.setSkippedLinesCallback(new OnboardingHeaderMapper(stepExecution));
String inputResourceString = stepExecution.getJobParameters().getString("inputResource");
inputResource = new FileSystemResource(inputFileLocation + ApplicationConstant.SLASH + inputResourceString);
itemReader.setResource(inputResource);
stepExecution.getJobExecution().getExecutionContext().putInt(ApplicationConstant.ERROR_COUNT, 0);
return itemReader;
}
FlatFileItemReader extends AbstractItemCountingItemStreamItemReader which is NOT thread-safe. So if you use it in a multi-threaded step, you need to synchronize it.
You can wrap it in a SynchronizedItemStreamReader. Here is a quick example:
@Bean
public SynchronizedItemStreamReader<DataLifeCycleEvent> itemReader() {
FlatFileItemReader<DataLifeCycleEvent> itemReader = ... // your item reader
SynchronizedItemStreamReader<DataLifeCycleEvent> synchronizedItemStreamReader = new SynchronizedItemStreamReader<>();
synchronizedItemStreamReader.setDelegate(itemReader);
return synchronizedItemStreamReader;
}
This method is giving this exception: java.lang.ClassCastException: com.sun.proxy.$Proxy344 cannot be cast to org.springframework.batch.item.support.SynchronizedItemStreamReader
Using Spring Batch, I am trying to get every line of an input file as a String and pass it to the ItemProcessor, without any "CSV parsing" in the ItemReader.
I came up with a Java configuration class (using @Configuration and @EnableBatchProcessing) containing the following reader() method, which however makes the subsequent ItemProcessor throw a ClassCastException.
This ItemReader should read an input file and pass every line of the input file to the ItemProcessor as a String.
@Bean
public ItemReader<String> reader() {
FlatFileItemReader<String> reader = new FlatFileItemReader<>();
reader.setResource(new ClassPathResource("data-to-process.txt"));
reader.setLineMapper(new DefaultLineMapper() {{
setLineTokenizer(new DelimitedLineTokenizer());
setFieldSetMapper(new PassThroughFieldSetMapper());
}});
return reader;
}
When running the previous code, I get an exception in the ItemProcessor, which expects a String from the reader():
java.lang.ClassCastException: org.springframework.batch.item.file.transform.DefaultFieldSet cannot be cast to java.lang.String
The custom ItemProcessor I wrote is defined as:
public class MyOwnCustomItemProcessor implements ItemProcessor<String, MyOwnCustomBusinessBean> {
I believe I should use this PassThroughFieldSetMapper in the ItemReader, and I would not like to use any kind of tokenizer. According to the documentation I think I must use one and cannot avoid it, but I keep getting exceptions.
How can I "transfer" every input line directly as a String to the ItemProcessor?
Use PassThroughLineMapper if it is available in your version; otherwise, implement it yourself:
public class PassThroughLineMapper implements LineMapper<String> {
@Override
public String mapLine(String line, int lineNumber) throws Exception {
return line;
}
}
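For reference, a sketch of the reader wired with it, replacing the tokenizer/field-set mapper setup from the question:
@Bean
public FlatFileItemReader<String> reader() {
    FlatFileItemReader<String> reader = new FlatFileItemReader<>();
    reader.setResource(new ClassPathResource("data-to-process.txt"));
    // every raw line is handed to the ItemProcessor as a String
    reader.setLineMapper(new PassThroughLineMapper());
    return reader;
}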
I am also adding a workaround I found in the meantime (using FieldSet as the input parameter type of the ItemReader and the ItemProcessor), even though @bellabax provided a better solution.
Note how the objects of type FieldSet are used.
The ItemReader:
@Bean
public ItemReader<FieldSet> reader() {
FlatFileItemReader<FieldSet> reader = new FlatFileItemReader<>();
reader.setResource(new ClassPathResource("data-to-process.txt"));
reader.setLineMapper(new DefaultLineMapper<FieldSet>() {{
setLineTokenizer(new DelimitedLineTokenizer());
setFieldSetMapper(new PassThroughFieldSetMapper());
}});
return reader;
}
The ItemProcessor in the Spring Batch Configuration class:
@Bean
public ItemProcessor<FieldSet, MyOwnCustomBusinessBean> processor() {
return new MyOwnCustomItemProcessor();
}
The ItemProcessor:
public class MyOwnCustomItemProcessor implements ItemProcessor<FieldSet, MyOwnCustomBusinessBean> {
@Override
public MyOwnCustomBusinessBean process(FieldSet originalInputLineFromInputFile) throws Exception {
String originalInputLine = originalInputLineFromInputFile.getValues()[0];
[...]
I am currently writing a Spring Batch job where I read a chunk of data, process it, and then wish to pass this data to two writers. One writer would simply update the database, whereas the second writer will write to a CSV file.
I am planning to write my own custom writer and inject the two itemWriters in the customItemWriter and call the write methods of both the item writers in the write method of customItemWriter. Is this approach correct? Are there any ItemWriter implementations available which meet my requirements?
Thanks in advance
You can use Spring's CompositeItemWriter and delegate to it all your writers.
Here is a configuration example.
You don't necessarily have to use XML like the example. If the rest of your code uses annotations, you could simply do the following.
public ItemWriter<T> writerOne() {
// ItemWriter is an interface; return your concrete writer implementation (or a lambda)
return items -> {
//your logic here
};
}
public ItemWriter<T> writerTwo() {
// ItemWriter is an interface; return your concrete writer implementation (or a lambda)
return items -> {
//your logic here
};
}
public CompositeItemWriter<T> compositeItemWriter() {
CompositeItemWriter<T> writer = new CompositeItemWriter<>();
writer.setDelegates(Arrays.asList(writerOne(), writerTwo()));
return writer;
}
You were right. Spring Batch is heavily based on delegation, so using a CompositeItemWriter is the right choice for your needs.
Java config way (Spring Batch 4):
@Bean
public Step step1() {
return this.stepBuilderFactory.get("step1")
.<String, String>chunk(2)
.reader(itemReader())
.writer(compositeItemWriter())
.stream(fileItemWriter1())
.stream(fileItemWriter2())
.build();
}
/**
* In Spring Batch 4, the CompositeItemWriter implements ItemStream so this isn't
* necessary, but used for an example.
*/
@Bean
public CompositeItemWriter compositeItemWriter() {
List<ItemWriter> writers = new ArrayList<>(2);
writers.add(fileItemWriter1());
writers.add(fileItemWriter2());
CompositeItemWriter itemWriter = new CompositeItemWriter();
itemWriter.setDelegates(writers);
return itemWriter;
}
Depending on your needs, another option is to extend the writer class and add functionality there. For example, I have a project where I am extending HibernateItemWriter and then overriding write(List items). I then send the objects I am writing along with my sessionFactory to the doWrite method of the writer: doWrite(sessionFactory, filteredRecords).
So in the example above, I could write to the csv file in my extended class and then the HibernateItemWriter would write to the database. Obviously this might not be ideal for this example, but for certain scenarios it is a nice option.
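A rough sketch of that idea (the class name and the CSV part are illustrative, not the author's actual code):
public class CsvAndHibernateItemWriter<T> extends HibernateItemWriter<T> {

    @Override
    public void write(List<? extends T> items) {
        // extra behaviour first, e.g. write the items to a CSV file
        writeToCsv(items);
        // then let HibernateItemWriter persist the items to the database
        super.write(items);
    }

    private void writeToCsv(List<? extends T> items) {
        // CSV writing logic would go here
    }
}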
Here's a possible solution. Two writers inside a Composite Writer.
@Bean
public JdbcBatchItemWriter<XPTO> writer(DataSource dataSource) {
return new JdbcBatchItemWriterBuilder<XPTO>()
.itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
.sql("UPDATE xxxx")
.dataSource(dataSource)
.build();
}
@Bean
public JdbcBatchItemWriter<XPTO> writer2(DataSource dataSource) {
return new JdbcBatchItemWriterBuilder<XPTO>()
.itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
.sql("UPDATE yyyyy")
.dataSource(dataSource)
.build();
}
@Bean
public CompositeItemWriter<XPTO> compositeItemWriter(DataSource dataSource) {
CompositeItemWriter<XPTO> compositeItemWriter = new CompositeItemWriter<>();
compositeItemWriter.setDelegates(Arrays.asList( writer(dataSource), writer2(dataSource)));
return compositeItemWriter;
}
@Bean
protected Step step1(DataSource datasource) {
return this.stepBuilderFactory.get("step1").
<XPTO, XPTO>chunk(1).
reader(reader()).
processor(processor()).
writer(compositeItemWriter(datasource)).
build();
}