Spring Batch multithreading problem with IteratorItemReader - java

I'm new to Spring Batch and still learning. I have a batch configuration with an IteratorItemReader, a custom processor, and a custom writer, as below:
@Autowired
JobBuilderFactory jobBuilderFactory;

@Autowired
StepBuilderFactory stepBuilderFactory;

@Value("${inputFile.location}")
private String inputFile;

@Bean
public Job testJob() throws IOException {
    return jobBuilderFactory.get("testJob")
            .incrementer(new RunIdIncrementer())
            .start(testStep())
            .listener(new JobListener())
            .build();
}

@Bean
public Step testStep() throws IOException {
    return stepBuilderFactory.get("testStep")
            .<File, File>chunk(1)
            .reader(testReader())
            .processor(testProcessor())
            .writer(testWriter())
            .taskExecutor(threadPoolTaskExecutor())
            .build();
}

@Bean
public ItemReader<File> testReader() throws IOException {
    List<File> files = Files.walk(Paths.get(inputFile), 1)
            .filter(Files::isRegularFile)
            .map(Path::toFile)
            .collect(Collectors.toList());
    return new IteratorItemReader<>(files);
}

@Bean
public CustomProcessor testProcessor() {
    return new CustomProcessor();
}

@Bean
public CustomWriter testWriter() {
    return new CustomWriter();
}

@Bean
public ThreadPoolTaskExecutor threadPoolTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(4);
    executor.setMaxPoolSize(6);
    executor.setQueueCapacity(4);
    executor.initialize();
    return executor;
}
Here testReader() walks the given input path, collects all regular files into a List, and returns an IteratorItemReader over that list; the business logic happens in the processor.
With multithreading, if there are multiple files (more than one) in the input location everything works fine and I get no errors, but:
Problem statement: Let's say there's only one file in the input location (e.g. C:/User/documents/abc.txt). One thread processes the file completely and everything is OK, but at the end I get the exception below:
ERROR - Encountered an error executing step testStep in job testJob
java.util.NoSuchElementException: null
    at java.util.ArrayList$Itr.next(ArrayList.java:864)
    at org.springframework.batch.item.support.IteratorItemReader.read(IteratorItemReader.java:70)
    at org.springframework.batch.core.step.item.SimpleChunkProvider.doRead(SimpleChunkProvider.java:99)
    at org.springframework.batch.core.step.item.SimpleChunkProvider.read(SimpleChunkProvider.java:180)
    at org.springframework.batch.core.step.item.SimpleChunkProvider$1.doInIteration(SimpleChunkProvider.java:126)
    at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:375)
    at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215)
    at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:145)
    at org.springframework.batch.core.step.item.SimpleChunkProvider.provide(SimpleChunkProvider.java:118)
    at org.springframework.batch.core.step.item.ChunkOrientedTasklet.execute(ChunkOrientedTasklet.java:71)
    at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:407)
    at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:331)
    at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140)
    at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:273)
    at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:82)
    at org.springframework.batch.repeat.support.TaskExecutorRepeatTemplate$ExecutingRunnable.run(TaskExecutorRepeatTemplate.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
This exception happens only because of multithreading. When I looked into the IteratorItemReader class at line 70, I found the code below:
if (iterator.hasNext())
return iterator.next();
else
return null; // end of data
What would be the best solution to overcome this issue? Please share your inputs.
Thanks in advance; any suggestions would be helpful.
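Since two threads can interleave between hasNext() and next(), one thread can observe hasNext() == true while another consumes the last element, and the subsequent next() then throws NoSuchElementException. A common remedy, sketched below under the assumption that a simple wrapper is acceptable (the class name is illustrative, and recent Spring Batch versions ship similar decorators such as SynchronizedItemStreamReader), is to serialize access to read():

import org.springframework.batch.item.ItemReader;

// Serializes calls to read() so the hasNext()/next() pair inside
// IteratorItemReader executes atomically across the step's threads.
public class SynchronizedReader<T> implements ItemReader<T> {

    private final ItemReader<T> delegate;

    public SynchronizedReader(ItemReader<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized T read() throws Exception {
        return delegate.read();
    }
}

The step would then use .reader(new SynchronizedReader<>(testReader())) while the processor and writer keep running on multiple threads.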

Related

How To Stop Polling InboundChannelAdapter

I'm polling files from 2 different directories on 1 server using RotatingServerAdvice and that's working fine. The problem is that I can't stop polling once I start it with inboundtest.start(). The main idea is to retrieve all the files in those directories and then send inboundtest.stop(). This is the code:
@Bean
public SessionFactory<LsEntry> sftpSessionFactory() {
    DefaultSftpSessionFactory factory = new DefaultSftpSessionFactory(false);
    factory.setHost(host);
    factory.setPort(port);
    factory.setUser(user);
    factory.setPassword(password);
    factory.setAllowUnknownKeys(true);
    //factory.setTestSession(true);
    return factory;
}

@Bean
public SftpInboundFileSynchronizer sftpInboundFileSynchronizer() {
    SftpInboundFileSynchronizer fileSynchronizer = new SftpInboundFileSynchronizer(sftpSessionFactory());
    fileSynchronizer.setDeleteRemoteFiles(true);
    fileSynchronizer.setRemoteDirectory(sftpRemoteDirectory);
    fileSynchronizer.setFilter(new SftpRegexPatternFileListFilter(".*?\\.(txt|TXT?)"));
    return fileSynchronizer;
}

@Bean(name = "sftpMessageSource")
@EndpointId("inboundtest")
@InboundChannelAdapter(channel = "sftpChannel", poller = @Poller("fileReadingMessageSourcePollerMetadata"), autoStartup = "false")
public MessageSource<File> sftpMessageSource() {
    SftpInboundFileSynchronizingMessageSource source =
            new SftpInboundFileSynchronizingMessageSource(sftpInboundFileSynchronizer());
    source.setLocalDirectory(new File(sftpLocalDirectoryDownloadUpload));
    source.setAutoCreateLocalDirectory(true);
    source.setLocalFilter(new AcceptOnceFileListFilter<File>());
    return source;
}

@Bean
public DelegatingSessionFactory<LsEntry> sessionFactory() {
    Map<Object, SessionFactory<LsEntry>> factories = new LinkedHashMap<>();
    factories.put("one", sftpSessionFactory());
    // use the first SF as the default
    return new DelegatingSessionFactory<LsEntry>(factories, factories.values().iterator().next());
}

@Bean
public RotatingServerAdvice advice() {
    List<RotationPolicy.KeyDirectory> keyDirectories = new ArrayList<>();
    keyDirectories.add(new RotationPolicy.KeyDirectory("one", sftpRemoteDirectory));
    keyDirectories.add(new RotationPolicy.KeyDirectory("one", sftpRemoteDirectoryNonUpload));
    return new RotatingServerAdvice(sessionFactory(), keyDirectories, false);
}

@Bean
MessageChannel controlChannel() {
    return new DirectChannel();
}

@Bean
@ServiceActivator(inputChannel = "controlChannel")
ExpressionControlBusFactoryBean controlBus() {
    return new ExpressionControlBusFactoryBean();
}

@Bean
public PollerMetadata fileReadingMessageSourcePollerMetadata() {
    PollerMetadata meta = new PollerMetadata();
    meta.setTrigger(new PeriodicTrigger(1000));
    meta.setAdviceChain(List.of(advice()));
    meta.setMaxMessagesPerPoll(1);
    meta.setErrorHandler(throwable -> new IOException());
    return meta;
}
It is always waiting for a new file in one of the 2 directories, but that's not the idea; the idea is to stop polling once all the files have been retrieved.
From another class I call inboundtest.start() through the control channel. Here is the code:
@Autowired
private MessageChannel controlChannel;

public void startProcessingFiles() throws InterruptedException {
    controlChannel.send(new GenericMessage<>("@inboundtest.start()"));
}
I was trying to stop it with this class, but it doesn't work:
@Component
public class StopPollingAdvice implements ReceiveMessageAdvice {

    @Autowired
    private MessageChannel controlChannel;

    @Override
    public Message<?> afterReceive(Message<?> message, Object source) {
        if (message == null) {
            System.out.println("There are no more files, stopping connection");
            Message<String> operation = MessageBuilder.withPayload("@inboundtest.stop()").build();
            controlChannel.send(operation);
        }
        return message;
    }
}
OK. Now I see your point. The RotatingServerAdvice moves to the other server only when the first is exhausted (by default; see that fair option). So, when you stop it in the advice, it cannot go to the other dir for fetching any more. You need to think about some other stopping solution, something that is not tied to the advice and this afterReceive(), somewhere downstream in your flow...
Or you can provide a custom RotationPolicy (an extension of StandardRotationPolicy) and in its overridden afterReceive() check whether all the dirs have been processed and then send the stop command.
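A rough sketch of that second idea, assuming Spring Integration 5.2+ where StandardRotationPolicy is a public class (the class name, the empty-poll counting, and the threshold are illustrative and untested):

import java.util.List;
import org.springframework.integration.core.MessageSource;
import org.springframework.integration.file.remote.aop.RotationPolicy;
import org.springframework.integration.file.remote.aop.StandardRotationPolicy;
import org.springframework.integration.file.remote.session.DelegatingSessionFactory;
import org.springframework.messaging.MessageChannel;
import org.springframework.messaging.support.GenericMessage;

public class StopWhenExhaustedRotationPolicy extends StandardRotationPolicy {

    private final MessageChannel controlChannel;
    private final int directoryCount;
    private int consecutiveEmptyPolls;

    public StopWhenExhaustedRotationPolicy(DelegatingSessionFactory<?> factory,
            List<RotationPolicy.KeyDirectory> keyDirectories, MessageChannel controlChannel) {
        super(factory, keyDirectories, false);
        this.controlChannel = controlChannel;
        this.directoryCount = keyDirectories.size();
    }

    @Override
    public void afterReceive(boolean messageReceived, MessageSource<?> source) {
        // reset on any received file; count polls that returned nothing
        consecutiveEmptyPolls = messageReceived ? 0 : consecutiveEmptyPolls + 1;
        if (consecutiveEmptyPolls >= directoryCount) {
            // every directory came back empty once: stop the adapter downstream
            controlChannel.send(new GenericMessage<>("@inboundtest.stop()"));
        }
        super.afterReceive(messageReceived, source);
    }
}

The advice() bean would then build the advice from the policy, e.g. return new RotatingServerAdvice(new StopWhenExhaustedRotationPolicy(sessionFactory(), keyDirectories, controlChannel())); RotatingServerAdvice has a constructor that accepts a RotationPolicy directly.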

Spring Batch JdbcPagingItemReader cannot read the data again and finishes immediately with read count 0

I have a JdbcPagingItemReader which goes to an Oracle DB, pulls records, and writes them to Mongo. The step itself is partitioned and I have the saveState flag set to false; we don't really need to restart the job. The reader is @StepScope and so is the query provider. However, when the job is run twice, the second time around it finishes immediately with read counts of 0. I cannot see anything obviously wrong and there are no errors. I tried to look into batch_step_execution_context to see if it's somehow reusing the previously run ItemReader which ran to completion, but I couldn't see anything in that table related to the ItemReader per se. Any ideas on how to go about debugging this?
@Bean
public Step load_A_Step_Partitioned(
        Step load_A_Step,
        OracleAnIdPartitioner oracleAnIdPartitioner,
        TaskExecutor taskExecutor) {
    return stepBuilderFactory
            .get("load_A_Step_Partitioned")
            .partitioner("load_A_Step_Partitioned", oracleAnIdPartitioner)
            .step(load_A_Step)
            .gridSize(appConfig.getGridSize())
            .taskExecutor(taskExecutor)
            .build();
}

@Bean
public Step load_A_Step(
        JdbcPagingItemReader<SomeDTO> A_Reader,
        MongoItemWriter<A> writer,
        A_Processor A_processor) {
    return stepBuilderFactory
            .get("load_A")
            .<SomeDTO, A>chunk(jobConfigCommon.getChunkSize())
            .reader(A_Reader)
            .processor(A_processor)
            .writer(writer)
            .build();
}

@Bean
@StepScope
public JdbcPagingItemReader<SomeDTO> A_Reader(
        PagingQueryProvider A_QueryProvider,
        @Qualifier("secondaryDatasource") DataSource dataSource) {
    return new JdbcPagingItemReaderBuilder<SomeDTO>()
            .name("A_Reader")
            .dataSource(dataSource)
            .queryProvider(A_QueryProvider)
            .rowMapper(new A_RowMapper())
            .pageSize(jobConfigCommon.getChunkSize())
            .saveState(false)
            .build();
}

@Bean
@StepScope
public PagingQueryProvider A_QueryProvider(
        @Value("#{stepExecutionContext['ANID']}") String anId,
        @Qualifier("secondaryDatasource") DataSource dataSource) {
    SqlPagingQueryProviderFactoryBean providerFactory = new SqlPagingQueryProviderFactoryBean();
    providerFactory.setDataSource(dataSource);
    providerFactory.setSelectClause("SOME QUERY");
    providerFactory.setWhereClause(" anId = '" + anId + "'");
    providerFactory.setFromClause(" A TABLE ");
    providerFactory.setSortKey("COLUMN_TO_SORT");
    try {
        return providerFactory.getObject();
    } catch (Exception e) {
        throw new IllegalStateException("Failed to create A_QueryProvider", e);
    }
}
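One common cause worth ruling out (this is an assumption; the job definition isn't shown above): launching the job twice with identical JobParameters reuses the completed JobInstance, and Spring Batch then skips the already-completed steps, which looks exactly like an immediate finish with read count 0. Forcing fresh parameters per launch rules that out; a sketch with a hypothetical job bean name:

@Bean
public Job load_A_Job(Step load_A_Step_Partitioned) {
    return jobBuilderFactory
            .get("load_A_Job") // hypothetical name, not from the question
            .incrementer(new RunIdIncrementer()) // adds a run.id parameter => new JobInstance each launch
            .start(load_A_Step_Partitioned)
            .build();
}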

How to skip empty rows using Spring Batch

I'm reading a fixed-length flat file with Spring Batch and I would like to skip empty rows and incorrect rows during my batch processing. In the example below I also want to skip rows that start with the characters "------".
Could you please help by giving an example using a SkipPolicy or another approach?
My file:
---------------------------A---------------------------
AARON THIAGO LOPES 3099234 100-11
AARON PAPA DA SILVA 8610822 160-26
ABNER MENEZEZ SOUZA 1494778 500-35
EDSON EDUARD MOZART 1286664 500-34
//Method that reads the file.
@Configuration
@EnableBatchProcessing
public class SpringBatchConfig {

    @Bean
    public Job job(JobBuilderFactory jobBuilderFactory,
            StepBuilderFactory stepBuilderFactory,
            ItemReader<Aluno> itemReader,
            ItemWriter<Aluno> itemWriter) {
        Step step = stepBuilderFactory.get("ETL-file-load")
                .<Aluno, Aluno>chunk(100)
                .reader(itemReader)
                .writer(itemWriter)
                .build();
        return jobBuilderFactory.get("ETL-Load")
                .incrementer(new RunIdIncrementer())
                .start(step)
                .build();
    }

    @Bean
    public FlatFileItemReader<Aluno> itemReader(@Value("${input}") Resource resource) {
        FlatFileItemReader<Aluno> flatFileItemReader = new FlatFileItemReader<>();
        flatFileItemReader.setResource(resource);
        flatFileItemReader.setName("CSV-Reader");
        flatFileItemReader.setLinesToSkip(2);
        flatFileItemReader.setLineMapper(lineMapper());
        return flatFileItemReader;
    }
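One way to get there (a sketch, not tested against this exact file): with a fixed-length tokenizer, blank rows and the "------" separator rows fail tokenization and surface as FlatFileParseException, so the step can be made fault tolerant and told to skip exactly those failures:

Step step = stepBuilderFactory.get("ETL-file-load")
        .<Aluno, Aluno>chunk(100)
        .reader(itemReader)
        .writer(itemWriter)
        .faultTolerant()
        .skip(FlatFileParseException.class) // lines the tokenizer cannot parse are skipped
        .skipLimit(Integer.MAX_VALUE)       // effectively unlimited skips
        .build();

An alternative is a custom LineMapper that detects blank or separator lines itself, but letting the tokenizer fail and skipping the failures keeps the mapping code unchanged.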

How to return indefinitely to a previous Step when a condition of a following Step is reached in Spring Batch

Hello Spring Batch community! I have an input flat file with a header and a body. The header is 1 line (naturally...) with 5 parameters. The body can reach up to 1 million records with 12 parameters each.
Input File:
01.01.2017|SUBDCOBR|12:21:23|01/12/2016|31/12/2016
01.01.2017|12345678231234|0002342434|BORGIA RUBEN|27-32548987-9|FA|A|2062-00010443/444/445|142,12|30/08/2017|142,01
01.01.2017|12345673201234|2342434|ALVAREZ ESTHER|27-32533987-9|FA|A|2062-00010443/444/445|142,12|30/08/2017|142,02
01.01.2017|12345673201234|0002342434|LOPEZ LUCRECIA|27-32553387-9|FA|A|2062-00010443/444/445|142,12|30/08/2017|142,12
01.01.2017|12345672301234|0002342434|SILVA JESUS|27-32558657-9|NC|A|2062-00010443|142,12|30/08/2017|142,12
.
.
.
I need to write this into a .txt file with a certain format, in this specific structure:
HEADER (8 custom lines, using data from the input HEADER)
TITLE OF COLUMNS (1 line)
DETAILS (17 records from the body)
line break
SAME HEADER
SAME TITLE OF COLUMNS
DETAILS (next 17 records from the body)
line break
...
REPEAT until end of file
What I did was create a stepHeader and a stepBody, each with its own reader, processor (business formatter) and writer.
The job has only these 2 simple steps.
@Bean
public Job job() throws Exception {
    return jobBuilderFactory.get("job")
            .incrementer(new RunIdIncrementer())
            .listener(new JobListener())
            .start(stepHeader())
            .next(stepBody())
            .on("BACK TO STEPHEADER").to(stepHeader())
            .on("END").end().build()
            .build();
}
The header reader is configured with maxItemCount = 1 and maps to CabeceraFacturacion:
@Bean
public FlatFileItemReader<CabeceraFacturacion> readerCabecera() throws Exception {
    FlatFileItemReader<CabeceraFacturacion> reader = new FlatFileItemReader<>();
    reader.setLinesToSkip(0);
    reader.setMaxItemCount(1);
    reader.setResource(new ClassPathResource("/inputFiles/input.txt"));
    DefaultLineMapper<CabeceraFacturacion> cabeceraLineMapper = new DefaultLineMapper<>();
    DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer("|"); // by default, the comma is the separator
    tokenizer.setNames(new String[] {"printDate", "reportIdentifier", "tituloReporte", "fechaDesde", "fechaHasta"});
    cabeceraLineMapper.setLineTokenizer(tokenizer);
    cabeceraLineMapper.setFieldSetMapper(new CabeceraFieldSetMapper());
    cabeceraLineMapper.afterPropertiesSet();
    reader.setLineMapper(cabeceraLineMapper);
    return reader;
}
The body is read this way, skipping the first line, and mapped to DetalleFacturacion:
@Bean
public FlatFileItemReader<DetalleFacturacion> readerDetalleFacturacion() {
    FlatFileItemReader<DetalleFacturacion> reader = new FlatFileItemReader<>();
    reader.setLinesToSkip(1);
    //reader.setMaxItemCount(17);
    reader.setResource(new ClassPathResource("/inputFiles/input.txt"));
    DefaultLineMapper<DetalleFacturacion> detalleLineMapper = new DefaultLineMapper<>();
    DelimitedLineTokenizer tokenizerDet = new DelimitedLineTokenizer("|"); // by default, the comma is the separator
    tokenizerDet.setNames(new String[] {"fechaEmision", "tipoDocumento", "letra", "nroComprobante",
            "nroCliente", "razonSocial", "cuit", "montoNetoGP", "montoNetoG3",
            "montoExento", "impuestos", "montoTotal"});
    detalleLineMapper.setLineTokenizer(tokenizerDet);
    detalleLineMapper.setFieldSetMapper(new DetalleFieldSetMapper());
    detalleLineMapper.afterPropertiesSet();
    reader.setLineMapper(detalleLineMapper);
    return reader;
}
My Steps:
@Bean
public Step stepHeader() throws Exception {
    return stepBuilderFactory.get("stepHeader")
            .<CabeceraFacturacion, CabeceraFacturacion>chunk(17)
            .faultTolerant()
            .listener(new ChunkListener())
            .reader(readerCabecera())
            .writer(writerCabeceraFact())
            .allowStartIfComplete(true)
            .build();
}

@Bean
public Step stepBody() {
    return stepBuilderFactory.get("stepBody")
            .<DetalleFacturacion, DetalleFacturacion>chunk(17)
            .faultTolerant()
            .listener(new ChunkListener())
            .reader(readerDetalleFacturacion())
            .writer(writerDetalleFact())
            .listener(new StepExecutionListener() {
                @Override
                public ExitStatus afterStep(StepExecution stepExecution) {
                    if (stepExecution.getWriteCount() == 17) {
                        return new ExitStatus("BACK TO STEPHEADER");
                    }
                    // if (stepExecution.getReadCount() < 17) {
                    //     return new ExitStatus("END");
                    // }
                    return null;
                }

                @Override
                public void beforeStep(StepExecution stepExecution) {
                }
            })
            .allowStartIfComplete(true)
            .build();
}
1) I don't know how to make it go back to stepHeader indefinitely until the file ends. I tried using stepExecution.getWriteCount() == 17, but I'm not sure this is the way.
2) I don't know how to read 17 different records on each loop. I managed to make it loop, but it would write the same first 17 records over and over until I manually stopped the job. I now know that loops are not recommended in Spring Batch processes.
3) If anyone has any idea of another way to achieve my goal, it will be most welcome.
4) Is there a way to make a decider that is "listening" all the time, and sends the order to print header or body when a certain condition is satisfied?
Up until now, the most I achieved is to read & write the header only once, and in the next step read & write 17 lines of the body.
Thank you everyone!
Cheers!!
Not sure if I understood your question correctly, but this is what you want to achieve:
Step 1: Read the header from the file.
Step 2: Read the file, process data and write to some file until some condition A.
Step 3: On condition A, go to Step 1.
There can be multiple options to configure this; the one I can think of is adding an additional step for the flow decision. Below is a sample configuration.
Note: I have not tested this, you might have to do some modifications.
@Bean
public Job conditionalJob(JobBuilderFactory jobs, Step flowDeciderStep, Step headerStep, Step bodyStep) throws Exception {
    return jobs.get("conditionalJob")
            .incrementer(new RunIdIncrementer())
            .start(flowDeciderStep)
            .on("HEADER").to(headerStep)
            .from(headerStep).on("*").to(flowDeciderStep)
            .from(flowDeciderStep).on("BODY").to(bodyStep)
            .from(bodyStep).on("*").to(flowDeciderStep)
            .from(flowDeciderStep).on("*").stop()
            .end()
            .build();
}

public class FlowDeciderTasklet implements Tasklet {

    private final Logger logger = LoggerFactory.getLogger(getClass());

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        logger.info("flowDecider");
        // put your flow logic here; you can use the step execution
        // context to pass information from one step to another
        if (condition1) {                       // placeholder: header still to be written
            contribution.setExitStatus(new ExitStatus("HEADER"));
        } else if (condition2) {                // placeholder: body records remain
            contribution.setExitStatus(new ExitStatus("BODY"));
        } else {                                // everything processed
            contribution.setExitStatus(ExitStatus.COMPLETED);
        }
        return RepeatStatus.FINISHED;
    }
}
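For completeness, the decider tasklet above would be registered as its own step, roughly like this (bean name illustrative); allowStartIfComplete is needed so the step can execute again on every pass through the loop:

@Bean
public Step flowDeciderStep() {
    return stepBuilderFactory.get("flowDeciderStep")
            .tasklet(new FlowDeciderTasklet())
            .allowStartIfComplete(true) // let the step run again on each loop iteration
            .build();
}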

Unfound resource for FlatFileItemReader after moving the file

I am using Spring Batch to read from a CSV file and write the lines to the screen.
My job is composed of 3 parts:
Part 1: Verify that the CSV file exists in some INPUT directory on my disk; if it does, the file is moved to another directory called PROD.
Part 2: Extract data from the CSV file using FlatFileItemReader.
Part 3: Write all the items to the screen.
The problem is that the FlatFileItemReader throws org.springframework.batch.item.ItemStreamException: Failed to initialize the reader, caused by java.lang.IllegalArgumentException: Input resource must be set.
Here is my code:
@Bean
public FlatFileItemReader<UniversInvestissement> reader() {
    FlatFileItemReader<UniversInvestissement> reader = new FlatFileItemReader<>();
    File csvFile = new File("C://INPUT/data.csv");
    Resource resource = resourceLoader.getResource("file:" + csvFile.getAbsolutePath());
    reader.setLinesToSkip(1);
    reader.setResource(resource);
    DefaultLineMapper lineMapper = new DefaultLineMapper();
    DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
    tokenizer.setNames(new String[]{"COL1", "COL2", "COL3", "COL4"});
    tokenizer.setDelimiter(";");
    FieldSetMapper fieldSetMapper = new UniversInvestissementFieldSetMapper();
    lineMapper.setLineTokenizer(tokenizer);
    lineMapper.setFieldSetMapper(fieldSetMapper);
    reader.setLineMapper(lineMapper);
    reader.setEncoding("Cp1252");
    return reader;
}

@Bean
public UniversInvestissementWriter writer() {
    return new UniversInvestissementWriter();
}

@Bean
public UniversInvestissementProcessor processor() {
    return new UniversInvestissementProcessor();
}

@Bean
public Step extractData() {
    return steps.get("extractData")
            .<UniversInvestissement, UniversInvestissementProcessorResult>chunk(1)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}
Actually the problem is that when the FlatFileItemReader is initialized it can't find the CSV file as a resource.
Is there a way to postpone the resource assignment and avoid this exception?
You can use reader.setStrict(false); if you set strict mode to false, the reader will not throw an exception when the resource doesn't exist yet. You might also have to use @StepScope to make the reader lazy. I am using the same setup and it's working fine for me. Hope this helps.
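A minimal sketch combining both suggestions, assuming the file ends up under a PROD directory (the path is illustrative; the line-mapper setup is the one from the question):

@Bean
@StepScope // bean is created only when the step runs, i.e. after the move happened
public FlatFileItemReader<UniversInvestissement> reader() {
    FlatFileItemReader<UniversInvestissement> reader = new FlatFileItemReader<>();
    reader.setResource(new FileSystemResource("C://PROD/data.csv")); // location after the move
    reader.setStrict(false); // don't fail open() if the file isn't there
    // ... same tokenizer / FieldSetMapper setup as in the question ...
    return reader;
}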
"Verify if the CSV file exists in some INPUT directory on my disk, if it returns TRUE the file will be moved to another directory called PROD"
That part can easily be solved using a JobExecutionDecider:
class Checker implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        if (<file not found in INPUT/ dir>) {
            return FlowExecutionStatus.STOPPED;
        }
        if (!<copy file from INPUT/ to PROD/ works>) {
            return FlowExecutionStatus.FAILED;
        }
        return FlowExecutionStatus.COMPLETED;
    }
}
Of course, extractData() must be changed to use a programmatic flow decision (check here for a simple example).
I think the problem is in your resourceLoader, because such an exception is thrown by a non-null assertion on the resource instance; your resourceLoader is returning a null value.
Try to use FileSystemResource and no resource loader at all. For example:
reader.setResource(new FileSystemResource(csvFile));
