I am currently writing a Spring Batch job where I read a chunk of data, process it, and then wish to pass this data to two writers. One writer would simply update the database, whereas the second writer will write to a CSV file.
I am planning to write my own custom writer, inject the two ItemWriters into the custom writer, and call the write methods of both item writers in the write method of the custom writer. Is this approach correct? Are there any ItemWriter implementations available that meet my requirements?
Thanks in advance
You can use Spring's CompositeItemWriter and delegate all your writers to it.
Here is a configuration example.
You don't necessarily have to use XML like the example. If the rest of your code uses annotations, you could simply do the following.
@Bean
public ItemWriter<T> writerOne() {
    // T is a placeholder for your item type; this is your first delegate,
    // e.g. a writer that updates the database
    return items -> {
        // your logic here
    };
}

@Bean
public ItemWriter<T> writerTwo() {
    // your second delegate, e.g. a writer that appends to the CSV file
    return items -> {
        // your logic here
    };
}

@Bean
public CompositeItemWriter<T> compositeItemWriter() {
    CompositeItemWriter<T> writer = new CompositeItemWriter<>();
    writer.setDelegates(Arrays.asList(writerOne(), writerTwo()));
    return writer;
}
You were right. Spring Batch is heavily based on delegation, so using a CompositeItemWriter is the right choice for your needs.
Java config way (Spring Batch 4):
@Bean
public Step step1() {
    return this.stepBuilderFactory.get("step1")
            .<String, String>chunk(2)
            .reader(itemReader())
            .writer(compositeItemWriter())
            .stream(fileItemWriter1())
            .stream(fileItemWriter2())
            .build();
}
/**
* In Spring Batch 4, the CompositeItemWriter implements ItemStream so this isn't
* necessary, but used for an example.
*/
@Bean
public CompositeItemWriter compositeItemWriter() {
    List<ItemWriter> writers = new ArrayList<>(2);
    writers.add(fileItemWriter1());
    writers.add(fileItemWriter2());

    CompositeItemWriter itemWriter = new CompositeItemWriter();
    itemWriter.setDelegates(writers);

    return itemWriter;
}
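The fileItemWriter1() and fileItemWriter2() beans are not shown in the answer; as a rough sketch only (names, paths and the String item type are illustrative assumptions, matching the <String, String> chunk above), one of them could look like this:
@Bean
public FlatFileItemWriter<String> fileItemWriter1() {
    // Writes each item as a line to a file; replace the resource and line
    // aggregator with whatever your real output needs.
    return new FlatFileItemWriterBuilder<String>()
            .name("fileItemWriter1")
            .resource(new FileSystemResource("target/output-1.csv"))
            .lineAggregator(new PassThroughLineAggregator<>())
            .build();
}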
Depending on your needs, another option is to extend the writer class and add functionality there. For example, I have a project where I am extending HibernateItemWriter and then overriding write(List items). I then send the objects I am writing along with my sessionFactory to the doWrite method of the writer: doWrite(sessionFactory, filteredRecords).
So in the example above, I could write to the csv file in my extended class and then the HibernateItemWriter would write to the database. Obviously this might not be ideal for this example, but for certain scenarios it is a nice option.
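A rough sketch of that idea (the class name, the MyRecord item type and the CSV-writing step are illustrative assumptions, not the answerer's actual code):
import java.util.List;

import org.hibernate.SessionFactory;
import org.springframework.batch.item.database.HibernateItemWriter;

public class CsvWritingHibernateItemWriter extends HibernateItemWriter<MyRecord> {

    private final SessionFactory sessionFactory;

    public CsvWritingHibernateItemWriter(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
        setSessionFactory(sessionFactory); // also configure the parent Hibernate writer
    }

    @Override
    public void write(List<? extends MyRecord> items) {
        // 1. write the items to the CSV file here (omitted in this sketch)
        // 2. then let the Hibernate writer persist them to the database
        doWrite(sessionFactory, items);
    }
}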
Here's a possible solution: two writers inside a CompositeItemWriter.
@Bean
public JdbcBatchItemWriter<XPTO> writer(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<XPTO>()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql("UPDATE xxxx")
            .dataSource(dataSource)
            .build();
}

@Bean
public JdbcBatchItemWriter<XPTO> writer2(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<XPTO>()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql("UPDATE yyyyy")
            .dataSource(dataSource)
            .build();
}

@Bean
public CompositeItemWriter<XPTO> compositeItemWriter(DataSource dataSource) {
    CompositeItemWriter<XPTO> compositeItemWriter = new CompositeItemWriter<>();
    compositeItemWriter.setDelegates(Arrays.asList(writer(dataSource), writer2(dataSource)));
    return compositeItemWriter;
}

@Bean
protected Step step1(DataSource dataSource) {
    return this.stepBuilderFactory.get("step1")
            .<XPTO, XPTO>chunk(1)
            .reader(reader())
            .processor(processor())
            .writer(compositeItemWriter(dataSource))
            .build();
}
I am trying to improve the performance of the job listed below. As is, without threading, it runs successfully, but it runs very slowly. I would like to thread Step2, where 95% of the work happens in reading, filtering and transforming the input data read from very large heterogeneous files. The job:
• Step1 gets some job parameters that are passed into Step2.
• Step2 will read in X number of files. Each file is heterogeneous, i.e., contains several different record formats. The records are filtered, transformed and sent to a single output file.
Does Spring Batch have a built-in way to thread Step2 in this scenario? For example, can I add some type of executor to Step2? I've tried SimpleAsyncTaskExecutor and ThreadPoolTaskExecutor. Neither works. Adding SimpleAsyncTaskExecutor throws an exception. (See can we process the multiple files sequentially using spring Batch while multiple threads used to process individual files data..?)
Here is the batch configuration:
@Bean
public Job job() {
    return jobBuilderFactory.get("MyJob")
            .start(step1())
            .next(step2())
            .build();
}

@Bean
public Step step1() {
    return stepBuilderFactory.get("Step1GetJobParams")
            .tasklet(myParamsTasklet)
            .build();
}

@Bean
public Step step2() {
    return stepBuilderFactory.get("Step2")
            .<InputDO, OutputDO>chunk(1000)
            .reader(myMultiResourceReader())
            .processor(myStep2ItemProcessor)
            .writer(myStep2FileWriter())
            .taskExecutor(???) // line 23
            .build();
}
@Bean
public MultiResourceItemReader<InputDO> myMultiResourceReader() {
    MultiResourceItemReader<InputDO> multiResourceItemReader = new MultiResourceItemReader<InputDO>();
    multiResourceItemReader.setResources(resourceManager.getResources());
    multiResourceItemReader.setDelegate(myStep2FileReader());
    multiResourceItemReader.setSaveState(false);
    return multiResourceItemReader;
}
@Bean
public FlatFileItemReader<InputDO> myStep2FileReader() {
    return new FlatFileItemReaderBuilder<InputDO>()
            .name("MyStep2FileReader")
            .lineMapper(myCompositeLineMapper())
            .build();
}
@Bean
public PatternMatchingCompositeLineMapper<InputDO> myCompositeLineMapper() {
    PatternMatchingCompositeLineMapper<InputDO> lineMapper = new PatternMatchingCompositeLineMapper<InputDO>();

    Map<String, LineTokenizer> tokenizers = new HashMap<String, LineTokenizer>();
    tokenizers.put("A", InputDOTokenizer.getInputDOTokenizer());
    tokenizers.put("*", InputDOFillerTokenizer.getInputDOFillerTokenizer());
    lineMapper.setTokenizers(tokenizers);

    Map<String, FieldSetMapper<InputDO>> mappers = new HashMap<String, FieldSetMapper<InputDO>>();
    mappers.put("A", new InputDOFieldSetMapper());
    mappers.put("*", new InputDOFillerFieldSetMapper());
    lineMapper.setFieldSetMappers(mappers);

    return lineMapper;
}
@Bean
public FlatFileItemWriter<OutputDO> myOutputDOFileWriter() {
    return new FlatFileItemWriterBuilder<OutputDO>()
            .name("MyOutputDOFileWriter")
            .resource(resourceManager.getFileSystemResource("myOutputDOFileName"))
            .lineAggregator(new DelimitedLineAggregator<OutputDO>() {
                {
                    setDelimiter("");
                    setFieldExtractor(outputDOFieldExtractor.getOutputDOFieldExtractor());
                }
            })
            .lineSeparator("\r\n")
            .build();
}
Any/all guidance is much appreciated!
I guess you want to use a multi-threaded step to solve the slow reading problem. More details are available in the official Spring Batch documentation on multi-threaded steps.
Hope this helps.
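For illustration only (this is not from the original answer), a multi-threaded version of Step2 could look roughly like the sketch below, reusing the bean names from the question. Keep in mind that MultiResourceItemReader and FlatFileItemWriter are not thread-safe, so with multiple threads you would need to synchronize access to them (for example by wrapping the reader in a SynchronizedItemStreamReader) and accept that the order of items in the output file is no longer guaranteed:
// Rough sketch under the assumptions above; pool sizes are arbitrary.
@Bean
public TaskExecutor step2TaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(4);
    executor.setMaxPoolSize(4);
    executor.setThreadNamePrefix("step2-");
    executor.afterPropertiesSet();
    return executor;
}

@Bean
public Step step2() {
    return stepBuilderFactory.get("Step2")
            .<InputDO, OutputDO>chunk(1000)
            .reader(myMultiResourceReader())
            .processor(myStep2ItemProcessor)
            .writer(myStep2FileWriter())
            .taskExecutor(step2TaskExecutor())
            .throttleLimit(4) // cap the number of chunks processed concurrently
            .build();
}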
I am trying to configure OpenCSV in the reader() step of my Spring Batch job to directly convert a record read from a CSV file into a Java POJO, but I am running into the issue of how to correctly set the lineMapper with OpenCSV.
As suggested in the post linked here, How to replace flatFileItemReader with openCSV in spring batch, I am trying the following:
public FlatFileItemReader<Event> reader() throws IOException {
    FlatFileItemReader<Event> itemReader = new FlatFileItemReader<Event>();
    itemReader.setLineMapper(lineMapper());
    itemReader.setLinesToSkip(1);
    itemReader.setResource(new FileSystemResource(inputFilePath));
    return itemReader;
}
But I am not able to figure out how to configure the lineMapper:
public LineMapper<Event> lineMapper() throws IOException {
    DefaultLineMapper<Event> lineMapper = new DefaultLineMapper<Event>();
    DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer("\t");
    BeanWrapperFieldSetMapper<Event> fieldSetMapper = new BeanWrapperFieldSetMapper<Event>();
    fieldSetMapper.setTargetType(Event.class);
    lineMapper.setLineTokenizer(???);
    lineMapper.setFieldSetMapper(???);
I have the code to read the file and convert it to the desired POJO, but I don't know where to put it:
try (
    Reader reader = Files.newBufferedReader(Paths.get(inputFilePath));
) {
    CsvToBean<Event> csvToBean = new CsvToBeanBuilder<Event>(reader)
            .withSkipLines(1)
            .withType(Event.class)
            .withIgnoreLeadingWhiteSpace(true)
            .build();
    return csvToBean.iterator().next();
}
Any help to point me in the right direction is highly appreciated.
You are using the DefaultLineMapper and trying to set a LineTokenizer and FieldSetMapper in it, but this is not what is mentioned in the link you shared.
You need a custom implementation of the LineMapper interface that is based on OpenCSV:
public class OpenCSVLineMapper<T> implements LineMapper<T> {

    @Override
    public T mapLine(String line, int lineNumber) throws Exception {
        // TODO use OpenCSV to map a line to a POJO of type T
        return null;
    }
}
OpenCSV provides APIs to both read the file and map data to objects. You don't need the reading part as this will be done by the FlatFileItemReader from Spring Batch, you only need to use OpenCSV for the mapping part.
Once this is in place, you can set your OpenCSV-based line mapper implementation on the FlatFileItemReader:
public FlatFileItemReader<Event> reader() throws IOException {
    FlatFileItemReader<Event> itemReader = new FlatFileItemReader<Event>();
    itemReader.setResource(new FileSystemResource(inputFilePath));
    itemReader.setLinesToSkip(1);
    itemReader.setLineMapper(new OpenCSVLineMapper<>());
    return itemReader;
}
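For illustration only (not part of the original answer), one way to flesh out the mapLine method with OpenCSV could be the following. This variant takes the target class as a constructor argument, so the reader above would create it as new OpenCSVLineMapper<>(Event.class) instead, and it assumes the POJO is annotated with OpenCSV's @CsvBindByPosition, since an individual line carries no header:
import java.io.StringReader;
import java.util.List;

import com.opencsv.bean.CsvToBeanBuilder;
import org.springframework.batch.item.file.LineMapper;

public class OpenCSVLineMapper<T> implements LineMapper<T> {

    private final Class<T> targetType;

    public OpenCSVLineMapper(Class<T> targetType) {
        this.targetType = targetType;
    }

    @Override
    public T mapLine(String line, int lineNumber) throws Exception {
        // Let OpenCSV parse the single line and map it to the target type.
        List<T> beans = new CsvToBeanBuilder<T>(new StringReader(line))
                .withSeparator('\t') // the question's file appears to be tab-delimited
                .withType(targetType)
                .build()
                .parse();
        return beans.isEmpty() ? null : beans.get(0);
    }
}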
Can I use FlatFileItemReader with a TaskExecutor in Spring Batch?
I have implemented FlatFileItemReader with a ThreadPoolTaskExecutor. When I print the records in the ItemProcessor, I do not get consistent results: not all the records are printed, and sometimes one of the records is printed more than once. This leads me to believe that FlatFileItemReader is not thread-safe, and the Spring docs say the same, but I see some blogs claiming it is possible to use FlatFileItemReader with a task executor.
So my question is: is it possible to use FlatFileItemReader with a task executor in any way?
@Bean
@StepScope
public FlatFileItemReader<DataLifeCycleEvent> csvFileReader(
        @Value("#{stepExecution}") StepExecution stepExecution) {
    Resource inputResource;
    FlatFileItemReader<DataLifeCycleEvent> itemReader = new FlatFileItemReader<>();
    itemReader.setLineMapper(new OnboardingLineMapper(stepExecution));
    itemReader.setLinesToSkip(1);
    itemReader.setSaveState(false);
    itemReader.setSkippedLinesCallback(new OnboardingHeaderMapper(stepExecution));
    String inputResourceString = stepExecution.getJobParameters().getString("inputResource");
    inputResource = new FileSystemResource(inputFileLocation + ApplicationConstant.SLASH + inputResourceString);
    itemReader.setResource(inputResource);
    stepExecution.getJobExecution().getExecutionContext().putInt(ApplicationConstant.ERROR_COUNT, 0);
    return itemReader;
}
FlatFileItemReader extends AbstractItemCountingItemStreamItemReader which is NOT thread-safe. So if you use it in a multi-threaded step, you need to synchronize it.
You can wrap it in a SynchronizedItemStreamReader. Here is a quick example:
@Bean
public SynchronizedItemStreamReader<DataLifeCycleEvent> itemReader() {
    FlatFileItemReader<DataLifeCycleEvent> itemReader = ... // your item reader
    SynchronizedItemStreamReader<DataLifeCycleEvent> synchronizedItemStreamReader = new SynchronizedItemStreamReader<>();
    synchronizedItemStreamReader.setDelegate(itemReader);
    return synchronizedItemStreamReader;
}
This method is giving this exception: java.lang.ClassCastException: com.sun.proxy.$Proxy344 cannot be cast to org.springframework.batch.item.support.SynchronizedItemStreamReader
I had a problem with a Spring Batch job for reading a large CSV file (a few million records) and saving the records from it to a database. The job uses FlatFileItemReader for reading the CSV and JpaItemWriter for writing read and processed records to the database. The problem is that JpaItemWriter doesn't clear the persistence context after flushing another chunk of items to the database and the job ends up with OutOfMemoryError.
I have solved the problem by extending JpaItemWriter and overriding the write method so that it calls EntityManager.clear() after writing a chunk, but I was wondering whether Spring Batch already addresses this issue and the root of the problem is in my job config. What is the right way to address this issue?
My solution:
class ClearingJpaItemWriter<T> extends JpaItemWriter<T> {

    private EntityManagerFactory entityManagerFactory;

    @Override
    public void write(List<? extends T> items) {
        super.write(items);
        EntityManager entityManager = EntityManagerFactoryUtils.getTransactionalEntityManager(entityManagerFactory);
        if (entityManager == null) {
            throw new DataAccessResourceFailureException("Unable to obtain a transactional EntityManager");
        }
        entityManager.clear();
    }

    @Override
    public void setEntityManagerFactory(EntityManagerFactory entityManagerFactory) {
        super.setEntityManagerFactory(entityManagerFactory);
        this.entityManagerFactory = entityManagerFactory;
    }
}
You can see the added entityManager.clear(); in the write method.
Job config:
@Bean
public JpaItemWriter postgresWriter() {
    JpaItemWriter writer = new ClearingJpaItemWriter();
    writer.setEntityManagerFactory(pgEntityManagerFactory);
    return writer;
}

@Bean
public Step appointmentInitStep(JpaItemWriter<Appointment> writer, FlatFileItemReader<Appointment> reader) {
    return stepBuilderFactory.get("initEclinicAppointments")
            .transactionManager(platformTransactionManager)
            .<Appointment, Appointment>chunk(5000)
            .reader(reader)
            .writer(writer)
            .faultTolerant()
            .skipLimit(1000)
            .skip(FlatFileParseException.class)
            .build();
}

@Bean
public Job appointmentInitJob(@Qualifier("initEclinicAppointments") Step step) {
    return jobBuilderFactory.get(JOB_NAME)
            .incrementer(new RunIdIncrementer())
            .preventRestart()
            .start(step)
            .build();
}
That's a valid point. The JpaItemWriter (and HibernateItemWriter) used to clear the persistence context, but this was removed in BATCH-1635 (here is the commit that removed it). However, it was later re-added and made configurable in the HibernateItemWriter in BATCH-1759 through the clearSession parameter (see this commit), but not in the JpaItemWriter.
So I suggest opening an issue against Spring Batch to add the same option to the JpaItemWriter as well, in order to clear the persistence context after writing items (this would be consistent with the HibernateItemWriter).
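For reference, a minimal sketch of how that flag is used on the Hibernate side (assuming a SessionFactory is available in your context; this only illustrates the clearSession option mentioned above and is not a drop-in replacement for your JPA writer):
@Bean
public HibernateItemWriter<Appointment> hibernateWriter(SessionFactory sessionFactory) {
    HibernateItemWriter<Appointment> writer = new HibernateItemWriter<>();
    writer.setSessionFactory(sessionFactory);
    writer.setClearSession(true); // clear the session after each chunk is written
    return writer;
}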
That said, to answer your question: yes, you can indeed use a custom writer to clear the persistence context as you did.
Hope this helps.
I have a Spring Batch process that reads Report objects from a CSV and correctly inserts Analytic objects into a MySQL DB, but the logic has changed: now more than one Analytic must be inserted for each Report read.
I'm new to Spring Batch, the existing process was quite difficult for me to build, and I don't know how to make this change.
I have no XML configuration; everything is done with annotations. The Report and Analytic classes have getters and setters for two fields, adId and value. The new logic produces seven values for an adId, and I need to insert seven rows into the table.
I have hidden, deleted or suppressed some code that does not contribute to the question.
Here is my BatchConfiguration.java:
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    private transient JobBuilderFactory jobBuilderFactory;

    @Autowired
    private transient StepBuilderFactory stepBuilderFactory;

    @Autowired
    private transient DataSource dataSource;

    public FlatFileItemReader<Report> reader() {
        // The reader from the CSV works fine.
    }

    @Bean
    public JdbcBatchItemWriter<Analytic> writer() {
        final JdbcBatchItemWriter<Analytic> writer = new JdbcBatchItemWriter<Analytic>();
        writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<Analytic>());
        writer.setSql("INSERT INTO TABLE (ad_id, value) VALUES (:adId, :value)");
        writer.setDataSource(dataSource);
        return writer;
    }

    @Bean
    public AnalyticItemProcessor processor() {
        return new AnalyticItemProcessor();
    }

    @Bean
    public Step step() {
        return stepBuilderFactory.get("step1")
                .<Report, Analytic>chunk(10000)
                .reader(reader())
                .processor(processor())
                .writer(writer())
                .build();
    }

    @Bean
    public Job process() {
        final JobBuilder jobBuilder = jobBuilderFactory.get("process");
        return jobBuilder.start(step()).build();
    }
}
Then the AnalyticItemProcessor.java
public class AnalyticItemProcessor implements ItemProcessor<Report, Analytic> {

    @Override
    public Analytic process(final Report report) {
        // Creates a new Analytic, calls BeanUtils.copyProperties(report, analytic) and returns analytic.
    }
}
And the Process:
@SpringBootApplication
public class Process {

    public static void main(String[] args) throws Exception {
        SpringApplication.run(Process.class, args);
    }
}
How can I do this change? Maybe with ItemPreparedStatementSetter or ItemSqlParameterSourceProvider? Thanks.
If I'm understanding your question correctly, you can use the CompositeItemWriter to wrap multiple JdbcBatchItemWriter instances (one per insert you need to accomplish). That would allow you to insert multiple rows per item. Otherwise, you'd need to write your own ItemWriter implementation.
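If you go the custom-writer route, a rough sketch could look like the following. The class name and the per-item expansion are illustrative assumptions, and the table/column names follow the question; the loop body would need to be adapted to your seven-values-per-adId rule:
import java.util.List;

import javax.sql.DataSource;

import org.springframework.batch.item.ItemWriter;
import org.springframework.jdbc.core.JdbcTemplate;

public class MultiRowAnalyticWriter implements ItemWriter<Analytic> {

    private final JdbcTemplate jdbcTemplate;

    public MultiRowAnalyticWriter(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    @Override
    public void write(List<? extends Analytic> items) {
        for (Analytic analytic : items) {
            // Insert as many rows per item as the new logic requires
            // (e.g. one row per computed value for the adId).
            jdbcTemplate.update("INSERT INTO TABLE (ad_id, value) VALUES (?, ?)",
                    analytic.getAdId(), analytic.getValue());
        }
    }
}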