I'm reading a fixed-length flat file with Spring Batch and I would like to skip empty rows and incorrect rows during batch processing. In the example below I also want to skip rows that start with the characters "------".
Could you please help me with an example using a SkipPolicy or some other approach?
My file:
---------------------------A---------------------------
AARON THIAGO LOPES 3099234 100-11
AARON PAPA DA SILVA 8610822 160-26
ABNER MENEZEZ SOUZA 1494778 500-35
EDSON EDUARD MOZART 1286664 500-34
//Method that reads the file.
@Configuration
@EnableBatchProcessing
public class SpringBatchConfig {
@Bean
public Job job(JobBuilderFactory jobBuilderFactory,
StepBuilderFactory stepBuilderFactory,
ItemReader<Aluno> itemReader,
ItemWriter<Aluno> itemWriter){
Step step = stepBuilderFactory.get("ETL-file-load")
.<Aluno, Aluno>chunk(100)
.reader(itemReader)
.writer(itemWriter)
.build();
return jobBuilderFactory.get("ETL-Load")
.incrementer(new RunIdIncrementer())
.start(step)
.build();
}
@Bean
public FlatFileItemReader<Aluno> itemReader(@Value("${input}") Resource resource) {
FlatFileItemReader<Aluno> flatFileItemReader = new FlatFileItemReader<>();
flatFileItemReader.setResource(resource);
flatFileItemReader.setName("CSV-Reader");
flatFileItemReader.setLinesToSkip(2);
flatFileItemReader.setLineMapper(lineMapper());
return flatFileItemReader;
}
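One way to do this (since the question asks about a SkipPolicy) is to let the line mapper fail on the unwanted lines and have the step skip those failures. Assuming lineMapper() uses a FixedLengthTokenizer for the fixed-length layout, blank lines, the "------" separator lines and lines of the wrong length typically end up wrapped in a FlatFileParseException, so a SkipPolicy that skips that exception type covers all three cases. This is only a sketch, not part of the original post, and it assumes Spring Batch 4.x (the version implied by JobBuilderFactory/StepBuilderFactory); the class name is my own:

import org.springframework.batch.core.step.skip.SkipPolicy;
import org.springframework.batch.item.file.FlatFileParseException;

public class BadLineSkipPolicy implements SkipPolicy {

    @Override
    public boolean shouldSkip(Throwable t, int skipCount) {
        // skip every line the tokenizer could not parse:
        // blank lines, separator lines, lines with an unexpected length
        return t instanceof FlatFileParseException;
    }
}

The policy is then plugged into a fault-tolerant step, mirroring the step definition from the question:

Step step = stepBuilderFactory.get("ETL-file-load")
        .<Aluno, Aluno>chunk(100)
        .reader(itemReader)
        .writer(itemWriter)
        .faultTolerant()
        .skipPolicy(new BadLineSkipPolicy())
        .build();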
I'm new to Spring Batch and still learning. I have a batch configuration with an IteratorItemReader, a custom processor and a custom writer, as below:
@Autowired
JobBuilderFactory jobBuilderFactory;
@Autowired
StepBuilderFactory stepBuilderFactory;
@Value("${inputFile.location}")
private String inputFile;
@Bean
public Job testJob() throws IOException {
return jobBuilderFactory.get("testJob")
.incrementer(new RunIdIncrementer())
.start(testStep())
.listener(new JobListener())
.build();
}
@Bean
public Step testStep() throws IOException {
return stepBuilderFactory.get("testStep")
.<File, File>chunk(1)
.reader(testReader())
.processor(testProcessor())
.writer(testWriter())
.taskExecutor(threadPoolTaskExecutor())
.build();
}
@Bean
public ItemReader<File> testReader() throws IOException {
List<File> files = Files.walk(Paths.get(inputFile), 1)
.filter(Files::isRegularFile)
.map(Path::toFile)
.collect(Collectors.toList());
return new IteratorItemReader<>(files);
}
@Bean
public CustomProcessor testProcessor() {
return new CustomProcessor();
}
@Bean
public CustomWriter testWriter() {
return new CustomWriter();
}
@Bean
public ThreadPoolTaskExecutor threadPoolTaskExecutor() {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(4);
executor.setMaxPoolSize(6);
executor.setQueueCapacity(4);
executor.initialize();
return executor;
}
Here testReader() walks the given input path, collects all regular files into a List and returns an IteratorItemReader over them; the business logic then happens in the processor.
With multithreading, if there are multiple files (more than one) in the input location everything works fine and I don't get any error, but:
Problem statement: let's say there is only one file in the input location (e.g. C:/User/documents/abc.txt). One thread processes the file completely and everything is OK, but at the end I get the exception below:
ERROR - Encountered an error executing step testStep in job testJob
java.util.NoSuchElementException: null
at java.util.ArrayList$Itr.next(ArrayList.java:864)
at org.springframework.batch.item.support.IteratorItemReader.read(IteratorItemReader.java:70)
at org.springframework.batch.core.step.item.SimpleChunkProvider.doRead(SimpleChunkProvider.java:99)
at org.springframework.batch.core.step.item.SimpleChunkProvider.read(SimpleChunkProvider.java:180)
at org.springframework.batch.core.step.item.SimpleChunkProvider$1.doInIteration(SimpleChunkProvider.java:126)
at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:375)
at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215)
at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:145)
at org.springframework.batch.core.step.item.SimpleChunkProvider.provide(SimpleChunkProvider.java:118)
at org.springframework.batch.core.step.item.ChunkOrientedTasklet.execute(ChunkOrientedTasklet.java:71)
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:407)
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:331)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140)
at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:273)
at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:82)
at org.springframework.batch.repeat.support.TaskExecutorRepeatTemplate$ExecutingRunnable.run(TaskExecutorRepeatTemplate.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
This exception only happens because of the multithreading. When I looked at line 70 of the IteratorItemReader class I found the code below:
if (iterator.hasNext())
return iterator.next();
else
return null; // end of data
What would be the best way to overcome this issue? Any suggestions would be helpful.
Thanks in advance.
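The NoSuchElementException comes from the gap between hasNext() and next(): one thread sees hasNext() == true, another thread consumes the last element, and the first thread's next() then fails. Not part of the original post, but one simple way around it is to make the whole read atomic by wrapping IteratorItemReader in a synchronized delegate. A minimal sketch (the class name and the use with File are assumptions to match the configuration above):

import java.util.List;

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.support.IteratorItemReader;

public class SynchronizedIteratorItemReader<T> implements ItemReader<T> {

    private final IteratorItemReader<T> delegate;

    public SynchronizedIteratorItemReader(List<T> items) {
        this.delegate = new IteratorItemReader<>(items);
    }

    @Override
    public synchronized T read() throws Exception {
        // hasNext()/next() run inside the delegate's read(); synchronizing here
        // prevents two threads from interleaving those two calls
        return delegate.read();
    }
}

In testReader() you would then return new SynchronizedIteratorItemReader<>(files) instead of new IteratorItemReader<>(files).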
I have a JdbcPagingItemReader which goes to an Oracle DB, pulls records and writes them to Mongo. The step itself is partitioned and I have the saveState flag set to false; we don't really need to restart the job. The reader is @StepScope and so is the query provider. However, when the job is run twice, the second time around it finishes immediately with read counts of 0. I cannot see anything obviously wrong and there are no errors. I looked into batch_step_execution_context to see if it is somehow reusing the previously run ItemReader that ran to completion, but I couldn't see anything in that table related to the ItemReader per se. Any ideas on how to go about debugging this?
@Bean
public Step load_A_Step_Partitioned(
Step load_A_Step,
OracleAnIdPartitioner oracleAnIdPartitioner,
TaskExecutor taskExecutor) {
return stepBuilderFactory
.get("load_A_Step_Partitioned")
.partitioner("load_A_Step_Partitioned", oracleAnIdPartitioner)
.step(load_A_Step)
.gridSize(appConfig.getGridSize())
.taskExecutor(taskExecutor)
.build();
}
@Bean
public Step load_A_Step(
JdbcPagingItemReader<SomeDTO> A_Reader,
MongoItemWriter<A> writer,
A_Processor A_processor) {
return stepBuilderFactory
.get("load_A")
.<SomeDTO, A>chunk(jobConfigCommon.getChunkSize())
.reader(A_Reader)
.processor(A_processor)
.writer(writer)
.build();
}
@Bean
@StepScope
public JdbcPagingItemReader<SomeDTO> A_Reader(
PagingQueryProvider A_QueryProvider,
#Qualifier("secondaryDatasource") DataSource dataSource) {
return new JdbcPagingItemReaderBuilder<SomeDTO>()
.name("A_Reader")
.dataSource(dataSource)
.queryProvider(A_QueryProvider)
.rowMapper(new A_RowMapper())
.pageSize(jobConfigCommon.getChunkSize())
.saveState(false)
.build();
}
@Bean
@StepScope
public PagingQueryProvider A_QueryProvider(
@Value("#{stepExecutionContext['ANID']}") String anId,
#Qualifier("secondaryDatasource") DataSource dataSource) {
SqlPagingQueryProviderFactoryBean providerFactory = new SqlPagingQueryProviderFactoryBean();
providerFactory.setDataSource(dataSource);
providerFactory.setSelectClause(
"SOME QUERY");
providerFactory.setWhereClause(" anId = '" + anId + "'");
providerFactory.setFromClause(" A TABLE ");
providerFactory.setSortKey("COLUMN_TO_SORT");
try {
return providerFactory.getObject();
} catch (Exception e) {
throw new IllegalStateException("Failed to create A_QueryProvider", e);
}
}
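Not part of the original post, but one low-effort way to see what the second run actually does is to log the job parameters and per-step read/write counts after each execution; if the second execution reuses the same JobInstance (for example because the job parameters are identical and no JobParametersIncrementer is applied), that shows up immediately in the log. A minimal sketch, assuming the listener is registered on the job via .listener(...):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.batch.core.StepExecution;

public class ExecutionLoggingListener implements JobExecutionListener {

    private static final Logger log = LoggerFactory.getLogger(ExecutionLoggingListener.class);

    @Override
    public void beforeJob(JobExecution jobExecution) {
        log.info("Starting job instance {} with parameters {}",
                jobExecution.getJobInstance().getInstanceId(),
                jobExecution.getJobParameters());
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        for (StepExecution stepExecution : jobExecution.getStepExecutions()) {
            log.info("Step {}: status={}, exitStatus={}, readCount={}, writeCount={}",
                    stepExecution.getStepName(),
                    stepExecution.getStatus(),
                    stepExecution.getExitStatus().getExitCode(),
                    stepExecution.getReadCount(),
                    stepExecution.getWriteCount());
        }
    }
}

Comparing the job instance ids and step statuses of the two runs in this output should tell you whether the second run starts fresh partitions at all or is short-circuited because the previous instance already completed.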
Hello Spring Batch community! I have an input flat file with a header and a body. The header is 1 line (naturally...) with 5 fields. The body can reach up to 1 million records with 12 fields each.
Input File:
01.01.2017|SUBDCOBR|12:21:23|01/12/2016|31/12/2016
01.01.2017|12345678231234|0002342434|BORGIA RUBEN|27-32548987-9|FA|A|2062-00010443/444/445|142,12|30/08/2017|142,01
01.01.2017|12345673201234|2342434|ALVAREZ ESTHER|27-32533987-9|FA|A|2062-00010443/444/445|142,12|30/08/2017|142,02
01.01.2017|12345673201234|0002342434|LOPEZ LUCRECIA|27-32553387-9|FA|A|2062-00010443/444/445|142,12|30/08/2017|142,12
01.01.2017|12345672301234|0002342434|SILVA JESUS|27-32558657-9|NC|A|2062-00010443|142,12|30/08/2017|142,12
.
.
.
I need to write this into a .txt file with a certain format, in this specific structure:
HEADER (8 custom lines, using data from the input HEADER)
TITLE OF COLUMNS (1 line)
DETAILS (17 records from the body)
line break
SAME HEADER
SAME TITLE OF COLUMNS
DETAILS (next 17 records from the body)
line break
...
...
...
REPEAT until end of file
What I did was... create a stepHeader and a stepBody. Each of them has its own reader, processor (business formatter) and writer.
The job has only these 2 simple steps.
@Bean
public Job job() throws Exception {
return jobBuilderFactory.get("job")
.incrementer(new RunIdIncrementer())
.listener(new JobListener())
.start(stepHeader())
.next(stepBody())
.on("BACK TO STEPHEADER").to(stepHeader())
.on("END").end().build()
.build();
}
The header reader is configured with maxItemCount=1 and maps the line to CabeceraFacturacion:
@Bean
public FlatFileItemReader<CabeceraFacturacion> readerCabecera() throws Exception{
FlatFileItemReader<CabeceraFacturacion> reader = new FlatFileItemReader<>();
reader.setLinesToSkip(0);
reader.setMaxItemCount(1);
reader.setResource(new ClassPathResource("/inputFiles/input.txt"));
DefaultLineMapper<CabeceraFacturacion> cabeceraLineMapper = new DefaultLineMapper<>();
DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer("|"); // by default the delimiter is a comma
tokenizer.setNames(new String[] {"printDate", "reportIdentifier", "tituloReporte", "fechaDesde", "fechaHasta"});
cabeceraLineMapper.setLineTokenizer(tokenizer);
cabeceraLineMapper.setFieldSetMapper(new CabeceraFieldSetMapper());
cabeceraLineMapper.afterPropertiesSet();
reader.setLineMapper(cabeceraLineMapper);
return reader;
}
The body is read this way, skipping the first line, and mapped to DetalleFacturacion:
@Bean
public FlatFileItemReader<DetalleFacturacion> readerDetalleFacturacion(){
FlatFileItemReader<DetalleFacturacion> reader = new FlatFileItemReader<>();
reader.setLinesToSkip(1);
//reader.setMaxItemCount(17);
reader.setResource(new ClassPathResource("/inputFiles/input.txt"));
DefaultLineMapper<DetalleFacturacion> detalleLineMapper = new DefaultLineMapper<>();
DelimitedLineTokenizer tokenizerDet = new DelimitedLineTokenizer("|"); // by default the delimiter is a comma
tokenizerDet.setNames(new String[] {"fechaEmision", "tipoDocumento", "letra", "nroComprobante",
"nroCliente", "razonSocial", "cuit", "montoNetoGP", "montoNetoG3",
"montoExento", "impuestos", "montoTotal"});
detalleLineMapper.setLineTokenizer(tokenizerDet);
detalleLineMapper.setFieldSetMapper(new DetalleFieldSetMapper());
detalleLineMapper.afterPropertiesSet();
reader.setLineMapper(detalleLineMapper);
return reader;
}
My Steps:
@Bean
public Step stepHeader() throws Exception {
return stepBuilderFactory.get("stepHeader")
.<CabeceraFacturacion, CabeceraFacturacion> chunk(17)
.faultTolerant()
.listener(new ChunkListener())
.reader(readerCabecera())
.writer(writerCabeceraFact())
.allowStartIfComplete(true)
.build();
}
@Bean
public Step stepBody() {
return stepBuilderFactory.get("stepBody")
.<DetalleFacturacion, DetalleFacturacion> chunk(17)
.faultTolerant()
.listener(new ChunkListener())
.reader(readerDetalleFacturacion())
.writer(writerDetalleFact())
.listener(new StepExecutionListener() {
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
if(stepExecution.getWriteCount()==17) {
return new ExitStatus("BACK TO STEPHEADER");
};
// if(stepExecution.getReadCount()<17) {
// return new ExitStatus("END");
// }
return null;
}
@Override
public void beforeStep(StepExecution stepExecution) {
}
})
.allowStartIfComplete(true)
.build();
}
1) I don't know how to achieve going back to the stepHeader indefinitely until the file ends. I tried using stepExecution.getWriteCount(), but I'm not sure this is the way.
2) I don't know how to read 17 different records on each loop (I managed to make it loop, but it would write the same first 17 records over and over again until I manually stopped the job). I now know that loops are not recommended in Spring Batch processes.
3) If anyone has an idea for another way to achieve my goal, it will be most welcome.
4) Is there a way to make a decider that is "listening" all the time and sends the order to print the header or the body when a certain condition is satisfied?
Up until now, the most I have achieved is to read & write the header only once, and in the next step read & write 17 lines of the body.
Thank you everyone!
Cheers!!
Not sure if I understood your question correctly, but this is what you want to achieve:
Step 1: Read the header from the file
Step 2: Read the file, process the data and write to some output file, until some condition A
Step 3: On condition A, go back to Step 1
There can be multiple options to configure this. The one I can think of is adding an additional step for the flow decision. Below is a sample configuration.
Note: I have not tested this, you might have to do some modifications.
@Bean
public Job conditionalJob(JobBuilderFactory jobs, Step headerStep, Step bodyStep, Step flowDeciderStep) throws Exception {
return jobs.get("conditionalJob")
.incrementer(new RunIdIncrementer())
.flow(flowDeciderStep).on("HEADER").to(headerStep).next(flowDeciderStep)
.from(flowDeciderStep).on("BODY").to(bodyStep).next(flowDeciderStep)
.from(flowDeciderStep).on("*").stop()
.end()
.build();
}
public class FlowDeciderTasklet implements Tasklet {
private final Logger logger = LoggerFactory.getLogger(getClass());
@Override
public RepeatStatus execute(StepContribution contribution,
ChunkContext chunkContext) throws Exception {
logger.info("flowDeciderTasklet");
// put your flow logic here; you can use the step execution context
// to pass information from one step to another
StepExecution stepExecution = chunkContext.getStepContext().getStepExecution();
if (condition1) {
stepExecution.setExitStatus(new ExitStatus("HEADER"));
} else if (condition2) {
stepExecution.setExitStatus(new ExitStatus("BODY"));
} else {
stepExecution.setExitStatus(ExitStatus.COMPLETED);
}
return RepeatStatus.FINISHED;
}
}
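Regarding question 4, a JobExecutionDecider is usually the more idiomatic hook for this kind of routing than a tasklet that sets its own exit status. The sketch below is only an illustration, not from the original answer; the "records.remaining" execution-context key and the status names are assumptions you would adapt to your own bookkeeping:

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.job.flow.FlowExecutionStatus;
import org.springframework.batch.core.job.flow.JobExecutionDecider;

public class HeaderBodyDecider implements JobExecutionDecider {

    @Override
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        // stepExecution is the execution of the step that ran just before this decider;
        // here we assume that step stored how many records are still unwritten
        long remaining = stepExecution == null
                ? 0L
                : stepExecution.getExecutionContext().getLong("records.remaining", 0L);
        return remaining > 0
                ? new FlowExecutionStatus("HEADER")
                : FlowExecutionStatus.COMPLETED;
    }
}

In the job definition the decider is wired with the same kind of transitions as above, e.g. .next(headerBodyDecider).on("HEADER").to(stepHeader()) and .on("COMPLETED").end().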
I am using Spring Batch to read from a CSV file and write the lines on the screen.
My job is composed of 3 parts:
Part 1 : Verify if the CSV file exists in some INPUT directory on my disk, if it returns TRUE the file will be moved to another directory called PROD.
Part 2 : Extract data from the CSV file using FlatFileItemReader.
Part 3 : Write all the items to the screen.
The problem is the FlatFileItemReader throws org.springframework.batch.item.ItemStreamException: Failed to initialize the reader caused by java.lang.IllegalArgumentException: Input resource must be set
Here is my code:
@Bean
public FlatFileItemReader<UniversInvestissement> reader() {
FlatFileItemReader<UniversInvestissement> reader = new FlatFileItemReader<>();
File csvFile = new File("C://INPUT/data.csv");
Resource resource = resourceLoader.getResource("file:" + csvFile.getAbsolutePath());
reader.setLinesToSkip(1);
reader.setResource(resource);
DefaultLineMapper lineMapper = new DefaultLineMapper();
DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
tokenizer.setNames(new String[]{"COL1", "COL2", "COL3", "COL4"});
tokenizer.setDelimiter(";");
FieldSetMapper fieldSetMapper = new UniversInvestissementFieldSetMapper();
lineMapper.setLineTokenizer(tokenizer);
lineMapper.setFieldSetMapper(fieldSetMapper);
reader.setLineMapper(lineMapper);
reader.setEncoding("Cp1252");
return reader;
}
@Bean
public UniversInvestissementWriter writer() {
return new UniversInvestissementWriter();
}
@Bean
public UniversInvestissementProcessor processor() {
return new UniversInvestissementProcessor();
}
@Bean
public Step extractData() {
return steps.get("extractData")
.<UniversInvestissement, UniversInvestissementProcessorResult>chunk(1)
.reader(reader())
.processor(processor())
.writer(writer())
.build();
}
Actually the problem is that when the FlatFileItemReader is initialized it can't find the CSV file as a resource!
Is there a way to postpone the resource assignment and avoid this exception ?
You can use reader.setStrict(false); if you set strict mode to false, the reader will not throw an exception when the input resource does not exist, it only logs a warning. You might also have to use @StepScope to make the reader lazy. I am using the same setup and it works fine for me, hope this helps you.
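A minimal sketch of what that could look like, reworking the reader bean from the question (the @StepScope annotation, the FileSystemResource and the strict flag are the changes; the domain classes are the ones from the original post):

@Bean
@StepScope
public FlatFileItemReader<UniversInvestissement> reader() {
    FlatFileItemReader<UniversInvestissement> reader = new FlatFileItemReader<>();
    // FileSystemResource avoids going through a ResourceLoader
    reader.setResource(new FileSystemResource("C:/INPUT/data.csv"));
    // strict=false: open() logs a warning instead of throwing
    // when the file does not exist at open time
    reader.setStrict(false);
    reader.setLinesToSkip(1);
    reader.setEncoding("Cp1252");

    DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
    tokenizer.setDelimiter(";");
    tokenizer.setNames(new String[]{"COL1", "COL2", "COL3", "COL4"});

    DefaultLineMapper<UniversInvestissement> lineMapper = new DefaultLineMapper<>();
    lineMapper.setLineTokenizer(tokenizer);
    lineMapper.setFieldSetMapper(new UniversInvestissementFieldSetMapper());
    reader.setLineMapper(lineMapper);
    return reader;
}

Because the reader is step-scoped it is only created when its step actually runs, rather than at application startup.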
Verify if the CSV file exists in some INPUT directory on my disk, if it returns TRUE the file will be moved to another directory called PROD
This problem can easily be solved using a JobExecutionDecider:
class Checker implements JobExecutionDecider {
public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
if(<file not found in INPUT/ dir>) {
return FlowExecutionStatus.STOPPED;
}
if(!<copy file from INPUT/ to PROD/ works>) {
return FlowExecutionStatus.FAILED;
}
return FlowExecutionStatus.COMPLETED;
}
}
Of course, extractData() must be changed to use a programmatic flow decision (check here for a simple example).
I think the problem is in your resourceLoader, because such an exception is thrown by the non-null assertion on the resource instance. So your resourceLoader returns a null value.
Try to use a FileSystemResource without any resource loader. For example:
reader.setResource(new FileSystemResource(csvFile));
I have 1500 records that I'm breaking up into smaller groups (~250) for asynchronous processing with JMS.
1500 is not a fixed value though; for each client it can be more or less. In some cases there can be 8000 products or more. I will have N clients doing this operation one, two, three, or four times per day.
I have been breaking the records into smaller groups to avoid having a single transaction with 1500 records.
I need to start some task only when all parts have been processed (all 1500).
How can I do this? I'm using Spring 4, JMS 2, HornetQ, and for now configuration by annotations.
Maybe I'm not doing the right thing using JMS for this problem; I need help with that too. I have an XML file (from a web service) with 1500 products (code, price, stock, stock_local, title) and I have to persist all of them.
After, and only after, all of them are processed, I need to start the task that updates the Stock and Price values of each product (in a remote system), based on the newly stored values (along with some other conditions).
The code:
// in some RestController I have
Lists.partition(newProducts, 250).forEach(listPart->
myQueue.add(createMessage(Lists.newArrayList(listPart))));
// called several times; each message contains a list of 250 products to persist
public void add(ProductsMessage message) {
this.jmsTemplate.send(QUEUE_NAME, session -> session.createObjectMessage(message));
}
@JmsListener(destination = QUEUE_NAME)
public void importProducts(ProductsMessage message) {
....
// in this method I get message.getList() and persist all 250 products
}
Current JMS config:
@Configuration
@EnableJms
public class JmsConfig {
public static final int DELIVERY_DELAY = 1000;
public static final int SESSION_CACHE_SIZE = 10;
@Bean
@Autowired
public DefaultJmsListenerContainerFactory jmsListenerContainerFactory(PlatformTransactionManager transactionManager) {
DefaultJmsListenerContainerFactory factory =
new DefaultJmsListenerContainerFactory();
factory.setConnectionFactory(connectionFactory());
factory.setDestinationResolver(destinationResolver());
factory.setConcurrency("1-2");
factory.setTransactionManager(transactionManager);
return factory;
}
@Bean
public DestinationResolver destinationResolver() {
return new DynamicDestinationResolver();
}
@Bean
public ConnectionFactory connectionFactory() {
TransportConfiguration transport = new TransportConfiguration(InVMConnectorFactory.class.getName());
ConnectionFactory originalConnectionFactory = HornetQJMSClient.createConnectionFactoryWithoutHA(JMSFactoryType.CF, transport);
CachingConnectionFactory connectionFactory = new CachingConnectionFactory();
connectionFactory.setTargetConnectionFactory(originalConnectionFactory);
connectionFactory.setSessionCacheSize(SESSION_CACHE_SIZE);
return connectionFactory;
}
@Bean
public JmsTemplate template(ConnectionFactory connectionFactory) {
JmsTemplate template = new JmsTemplate();
template.setConnectionFactory(connectionFactory);
template.setDeliveryDelay(DELIVERY_DELAY);
template.setSessionTransacted(true);
return template;
}
/**
* Initializes an embedded JMS broker
*/
@Bean(initMethod = "start", destroyMethod = "stop")
public EmbeddedJMS startJmsBroker() {
return new EmbeddedJMS();
}
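The core question (how to start the stock/price update only after every part has been persisted) is not answered in this thread, but a common pattern is to tag each message with a batch id plus the total number of parts, count completed parts in shared state, and fire the follow-up task when the counter reaches the total. The sketch below only illustrates that idea; getBatchId(), getTotalParts(), the destination name and StockPriceUpdater are hypothetical, and in a clustered setup the counter would need to live in a database (or a dedicated "completion" queue) rather than in memory:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

import org.springframework.jms.annotation.JmsListener;
import org.springframework.stereotype.Component;

@Component
public class ProductsImportListener {

    // parts already processed per batch id (in-memory; not suitable for several nodes)
    private final Map<String, AtomicInteger> processedParts = new ConcurrentHashMap<>();

    private final StockPriceUpdater stockPriceUpdater; // hypothetical follow-up task

    public ProductsImportListener(StockPriceUpdater stockPriceUpdater) {
        this.stockPriceUpdater = stockPriceUpdater;
    }

    @JmsListener(destination = "products.import")
    public void importProducts(ProductsMessage message) {
        // persist the ~250 products contained in this message
        // ...

        // the message is assumed to carry the batch id and the total number of parts
        int done = processedParts
                .computeIfAbsent(message.getBatchId(), id -> new AtomicInteger())
                .incrementAndGet();
        if (done == message.getTotalParts()) {
            processedParts.remove(message.getBatchId());
            stockPriceUpdater.updateStockAndPrices(message.getBatchId());
        }
    }
}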
}