Spring Batch JdbcPagingItemReader seems not to be reading all the items - java

I'm working on an app that extracts records from an Oracle database and exports them as a single tab-delimited file.
However, when I attempt to read from the DB using JdbcPagingItemReader and write to a file, I only get the number of records specified in pageSize. So if the pageSize is 10, I get a file with 10 lines and the rest of the records seem to be ignored. So far, I haven't been able to find out what is really going on, and any help would be most welcome.
Here is the JdbcPagingItemReader config:
<bean id="databaseItemReader"
class="org.springframework.batch.item.database.JdbcPagingItemReader" >
<property name="dataSource" ref="dataSourceTest" />
<property name="queryProvider">
<bean
class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
<property name="dataSource" ref="dataSourceTest" />
<property name="selectClause" value="SELECT *" />
<property name="fromClause" value="FROM *****" />
<property name="whereClause" value="where snapshot_id=:name" />
<property name="sortKey" value="snapshot_id" />
</bean>
</property>
<property name="parameterValues">
<map>
<entry key="name" value="18596" />
</map>
</property>
<property name="pageSize" value="100000" />
<property name="rowMapper">
<bean class="com.mkyong.ViewRowMapper" />
</property>
</bean>
<bean id="itemWriter" class="org.springframework.batch.item.file.FlatFileItemWriter">
<!-- write to this csv file -->
<property name="resource" value="file:cvs/report.csv" />
<property name="shouldDeleteIfExists" value="true" />
<property name="lineAggregator">
<bean
class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
<property name="delimiter" value=";" />
<property name="fieldExtractor">
<bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor">
<property name="names" value="ID" />
</bean>
</property>
</bean>
</property>
</bean>
<job id="testJob" xmlns="http://www.springframework.org/schema/batch">
<step id="step1">
<tasklet>
<chunk reader="databaseItemReader" writer="itemWriter" commit-interval="1" />
</tasklet>
</step>
</job>
thanks

It was the scope="step" attribute that was missing; it should be:
<bean id="databaseItemReader"
class="org.springframework.batch.item.database.JdbcPagingItemReader" scope="step">

Your settings look incorrect: the whereClause column and the sortKey cannot be the same, because the page size works hand in hand with your sorting column.
Check what your data in the corresponding table looks like.
With your configuration, Spring Batch will build and execute queries roughly like the following.
The first query, executed with pageSize = 10, looks like:
SELECT TOP 10 * FROM tableName WHERE snapshot_id = 18596 ORDER BY snapshot_id ASC
The second and remaining queries depend on your sort key:
SELECT TOP 10 * FROM tableName WHERE snapshot_id = 18596 AND snapshot_id > 10 ORDER BY snapshot_id ASC
SELECT TOP 10 * FROM tableName WHERE snapshot_id = 18596 AND snapshot_id > 20 ORDER BY snapshot_id ASC
and so on. Try running these queries in the database; don't they look weird? :-)
If you don't need the where clause, remove it.
And if possible, keep the page size and commit-interval the same, because that's how you decide to process and persist. But of course that depends on your design, so you decide.
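For illustration, here is a sketch of a query provider where the filter column and the sort key differ; the table name and the id primary-key column are assumptions:
SqlPagingQueryProviderFactoryBean provider = new SqlPagingQueryProviderFactoryBean();
provider.setDataSource(dataSource);
provider.setSelectClause("SELECT *");
provider.setFromClause("FROM snapshot_table");        // hypothetical table name
provider.setWhereClause("WHERE snapshot_id = :name"); // filter column
provider.setSortKey("id");                            // a distinct, unique sort column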

Adding @StepScope made my item reader take off with paging capability.
@Bean
@StepScope
ItemReader<Account> itemReader(@Value("#{jobParameters[id]}") String id) {
    JdbcPagingItemReader<Account> databaseReader = new JdbcPagingItemReader<>();
    databaseReader.setDataSource(dataSource);
    databaseReader.setPageSize(100);
    databaseReader.setFetchSize(100);
    PagingQueryProvider queryProvider = createQueryProvider(id);
    databaseReader.setQueryProvider(queryProvider);
    databaseReader.setRowMapper(new BeanPropertyRowMapper<>(Account.class));
    return databaseReader;
}
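The createQueryProvider(id) helper isn't shown above; here is a minimal sketch, assuming a hypothetical ACCOUNT table with a SNAPSHOT_ID filter column and an ID primary key:
private PagingQueryProvider createQueryProvider(String id) {
    SqlPagingQueryProviderFactoryBean factory = new SqlPagingQueryProviderFactoryBean();
    factory.setDataSource(dataSource);
    factory.setSelectClause("SELECT *");
    factory.setFromClause("FROM ACCOUNT");             // hypothetical table
    factory.setWhereClause("WHERE SNAPSHOT_ID = :id"); // bound at runtime
    factory.setSortKey("ID");                          // unique key, distinct from the filter column
    try {
        return factory.getObject();                    // FactoryBean -> PagingQueryProvider
    } catch (Exception e) {
        throw new IllegalStateException("Could not build paging query provider", e);
    }
}
If you use the :id placeholder, remember to also call databaseReader.setParameterValues(Collections.singletonMap("id", id)) in the reader bean above.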

Related

Spring batch commit interval rollback issue for duplicate record in database

I am trying to read rows from CSV files and persist them into a database. I am using a MultiResourceItemReader delegating to a FlatFileItemReader, and a JdbcBatchItemWriter, for the read/persist operations. I configured a commit-interval of 50 (for example) and a skip policy.
I am using spring-batch-3.0.8 and an Oracle database.
To keep it simple to understand: in the CSV file I have 2 rows, and the commit-interval is 2.
Here, ROLLNO 201 is the record already present in the DB.
Observation:
1. If the 1st row is a duplicate of the record present in the DB and the 2nd row is a new record, the new record is inserted into the DB and the 1st row is skipped as a duplicate. [Working fine as expected.]
ROLLNO NAME CLASS CITY
201 JOHN 4 MADISON
202 STEPHEN 5 MADISON
2. If the 1st row is a new record and the 2nd row is a duplicate of the record present in the DB, the new record is not inserted into the DB. [Issue.]
ROLLNO NAME CLASS CITY
202 STEPHEN 5 MADISON
201 JOHN 4 MADISON
I see the transaction rollback covers the whole commit-interval's worth of records, not just the failing record. If the commit-interval is 10, then the 10th record must not be a duplicate for the transaction to commit.
Can anyone help me here, as I'm clueless?
<bean id="jobRepository" class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean">
<property name="transactionManager" ref="transactionManager" />
<property name="dataSource" ref="cisDataSource" />
<property name="lobHandler" ref="lobHandler" />
<property name="isolationLevelForCreate" value="ISOLATION_READ_COMMITTED" />
</bean>
<bean id="transactionManager"
class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
<property name="dataSource" ref="cisDataSource" />
</bean>
<bean id="cisDataSource" class="org.apache.commons.dbcp.BasicDataSource">
<property name="initialSize" value="1" />
<property name="maxActive" value="${db.connection.pool.size}" />
<property name="driverClassName" value="oracle.jdbc.driver.OracleDriver" />
<property name="url" value="${cisdb.connection.string}" />
<property name="username" value="${cisdb.username}" />
<property name="password" value="${cisdb.password}" />
</bean>
<batch:job id="mdtJob1">
<batch:step id="mdtJob1Step1">
<batch:tasklet ref="fileTransferToProcessingFolderTasklet" />
<batch:next on="COMPLETED" to="mdtJob1Step2" />
</batch:step>
<batch:step id="mdtJob1Step2">
<batch:tasklet>
<batch:chunk reader="multiResourceReader" writer="naxAddressSqlItemWriter"
commit-interval="5">
<batch:skip-policy>
<bean class="org.springframework.batch.core.step.skip.AlwaysSkipItemSkipPolicy" scope="step"/>
</batch:skip-policy>
<batch:retry-policy>
<bean class="org.springframework.retry.policy.NeverRetryPolicy" scope="step"/>
</batch:retry-policy>
</batch:chunk>
<batch:no-rollback-exception-classes>
<batch:include class="java.sql.SQLException"/>
<batch:include class="org.springframework.dao.DuplicateKeyException"/>
<batch:include class="java.sql.SQLIntegrityConstraintViolationException"/>
</batch:no-rollback-exception-classes>
</batch:tasklet>
<batch:next on="COMPLETED" to="mdtJob1Step3" />
</batch:step>
<batch:step id="mdtJob1Step3">
<batch:tasklet ref="fileTransferToArchiveFolderTasklet" />
</batch:step>
</batch:job>
<bean id="multiResourceReader"
class="org.springframework.batch.item.file.MultiResourceItemReader"
scope="step">
<property name="delegate" ref="flatFileItemReader" />
<property name="resources" value="${batch.processing.files}" />
</bean>
<bean id="flatFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer">
<bean
class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="names"
value="${csv.fields.in.order}" />
</bean>
</property>
<property name="fieldSetMapper">
<bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
<property name="prototypeBeanName" value="addressDto" />
</bean>
</property>
</bean>
</property>
</bean>
<bean id="naxAddressSqlItemWriter"
class="org.springframework.batch.item.database.JdbcBatchItemWriter">
<property name="dataSource" ref="cdmDataSource" />
<property name="sql" value="${nax.address.insertion.query}" />
<property name="itemSqlParameterSourceProvider">
<bean class="org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider" />
</property>
</bean>
I hope you have got the solution by now; just in case you haven't, this solution worked for me.
If you set commit-interval = 1, only one row is inserted per transaction, so a duplicate row never rolls back a new entry.
For example:
chunk(1)
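In Java config the same idea looks roughly like the sketch below, following the XML above; I'm assuming the addressDto prototype bean maps to an AddressDto class:
@Bean
public Step importStep(StepBuilderFactory steps,
                       ItemReader<AddressDto> reader,
                       ItemWriter<AddressDto> writer) {
    return steps.get("mdtJob1Step2")
            .<AddressDto, AddressDto>chunk(1)           // commit each row individually
            .reader(reader)
            .writer(writer)
            .faultTolerant()
            .skipPolicy(new AlwaysSkipItemSkipPolicy()) // skip duplicates instead of rolling back siblings
            .build();
}
Row-by-row commits are slower than larger chunks, but a duplicate then only ever rolls back itself.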

Spring batch to write multiple XML files using MultiResourceItemWriter

I am new to Spring Batch and am trying to write a separate XML file for each record that I read from a database table. Suppose I read 10 records; then I need to create 10 XML files, one for each record.
The name of each XML file should be unique. For that, I am planning to use the "column_name1" value, but I am not sure how to achieve that. If anyone can help me with this, that would be a great help.
Updated:
Added #{formsPMVRowMapper.id} to the resource property, pointing at DefaultOutboundIFMRowMapper (a custom implementation of RowMapper) where I created a class-level variable to hold the row id. It still isn't working, because the getter for the ID is called even before mapRow runs, which I think is correct behaviour, but I am not sure how to get hold of that ID so I can use it as the resource name for my file in multiXmlFileItemWriter.
Could someone please let me know the correct way to do this?
Below is my Spring batch configuration file.
<util:properties id="batchProperties">
<prop key="batch.output.file">${outbound.pmv.filename}</prop>
</util:properties>
<bean id="itemReader" parent="pagingItemReader">
<property name="queryProvider" ref="outboundQueryProvider" />
<property name="rowMapper" ref="pmvRowMapper" />
</bean>
<bean id="pmvRowMapper"
class="tx.oag.cs.txcses.arch.batch.readers.DefaultOutboundIFMRowMapper">
<property name="idName" value="outbound_locate_record_staging_id" />
</bean>
<bean id="outboundQueryProvider" class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
<property name="dataSource" ref="dataSource" />
<property name="selectClause"
value="select column_name1" />
<property name="fromClause" value="from table_name" />
<property name="whereClause"
value="where column_name1='AAA' and column_name1='bbbb'" />
<property name="sortKey" value="column_name1" />
</bean>
<bean id="batchProcessor" parent="outboundStagingBatchProcessor">
<property name="entityClass"
value="Class_Name" />
</bean>
<bean id="itemWriter" parent="multiXmlFileItemWriter"/>
<bean id="multiXmlFileItemWriter"
class="org.springframework.batch.item.file.MultiResourceItemWriter">
<property name="resource" value="${outbound.ifm.outbound}/#{pmvRowMapper.id}">
</property>
<property name="delegate">
<bean class="org.springframework.batch.item.xml.StaxEventItemWriter">
<property name="marshaller">
<bean class="tx.oag.cs.txcses.arch.batch.utils.XMLStringMarshaller" />
</property>
</bean>
</property>
<property name="itemCountLimitPerResource" value="1" />
</bean>
<bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
<property name="location"
value="classpath:config/env/#{#env}/batch-outbound.properties" />
<property name="properties" ref="batchProperties" />
<property name="localOverride" value="true" />
</bean>
I understand that the "resource" property above can only write files with the same name, but I am not sure how to use the "resource" property in coordination with the "resourceSuffixCreator" property.
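For what it's worth, the resourceSuffixCreator property takes an implementation of org.springframework.batch.item.file.ResourceSuffixCreator, and its only callback receives the index of the resource being opened, not any item data, so a column value cannot be injected there directly. A minimal sketch that at least makes each file name unique:
import org.springframework.batch.item.file.ResourceSuffixCreator;

public class XmlSuffixCreator implements ResourceSuffixCreator {
    @Override
    public String getSuffix(int index) {
        // index counts the output resources opened so far
        return index + ".xml";
    }
}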

Spring Batch using >= and < in Where Clause

Trying my hand at Spring Batch to read data that was created only yesterday. Below is the bean I am trying to use, with JdbcPagingItemReader and SqlPagingQueryProviderFactoryBean. However, the query isn't getting executed.
Appreciate your help!
<bean id="customersPagingItemReader"
class="org.springframework.batch.item.database.JdbcPagingItemReader"
scope="step">
<property name="dataSource" ref="dataSource" />
<property name="queryProvider">
<bean
class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
<property name="dataSource" ref="dataSource" />
<property name="selectClause" value="SELECT CUST_ID, CREATED " />
<property name="fromClause" value=" from CUSTOMERS" />
<property name="whereClause" value=" where CREATED >= trunc(SYSDATE-1) and CREATED < trunc(SYSDATE)" />
</bean>
</property>
<property name="pageSize" value="5" />
<property name="fetchSize" value="5" />
<property name="rowMapper">
<bean class="com.yahoo.affiliationapi.api.CustomerRowMapper" />
</property>
</bean>
I was able to figure this out. When I looked at the job step's exit message, it said 'sortKey must be specified'.
I just added the property below to the above code and it started working fine.
<property name="sortKey" value="CUST_ID" />
You have to set a sort key on the queryProvider.
You can refer to this example:
@Bean
public SqlPagingQueryProviderFactoryBean queryProvider(DataSource dataSource) {
    SqlPagingQueryProviderFactoryBean provider = new SqlPagingQueryProviderFactoryBean();
    provider.setDataSource(dataSource); // required so the factory can detect the paging dialect
    provider.setSelectClause("select id, name, credit");
    provider.setFromClause("from customer");
    provider.setWhereClause("where status=:status");
    provider.setSortKey("id");
    return provider;
}
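A sketch of wiring that provider into a reader; the Customer item type and the parameter value are illustrative:
@Bean
@StepScope
public JdbcPagingItemReader<Customer> customersReader(DataSource dataSource) throws Exception {
    JdbcPagingItemReader<Customer> reader = new JdbcPagingItemReader<>();
    reader.setDataSource(dataSource);
    reader.setQueryProvider(queryProvider(dataSource).getObject()); // FactoryBean -> PagingQueryProvider
    reader.setParameterValues(Collections.singletonMap("status", "NEW")); // binds :status
    reader.setPageSize(5);
    reader.setFetchSize(5);
    reader.setRowMapper(new BeanPropertyRowMapper<>(Customer.class));
    return reader;
}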

Spring batch ItemReader executing multiple times for same record

I am trying to implement a Spring Batch job for database cleanup. It just deletes entries from a table on a schedule.
First we fetch 10 rows from the table (ItemReader).
Then we remove those 10 entries from the table (ItemWriter).
I have scheduled the batch at a 15-minute interval.
When we launch the batch, surprisingly, 10 threads try to read the data from the table.
Below is the configuration.
<!-- spring batch context -->
<bean id="jobRepository"
class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
<property name="transactionManager" ref="batchTransactionManager" />
</bean>
<bean id="batchTransactionManager"
class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" />
<bean id="jobLauncher"
class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
<property name="jobRepository" ref="jobRepository" />
<!--<property name="taskExecutor">
<bean class="org.springframework.core.task.SimpleAsyncTaskExecutor" />
</property>-->
</bean>
<bean
class="org.springframework.batch.core.configuration.support.JobRegistryBeanPostProcessor">
<property name="jobRegistry" ref="jobRegistry" />
</bean>
<bean id="jobRegistry"
class="org.springframework.batch.core.configuration.support.MapJobRegistry" />
<!-- spring batch context -->
<!--<bean id="completionPolicy" class="org.springframework.batch.repeat.policy.DefaultResultCompletionPolicy"/>-->
<batch:job id="csrfTokenCleanUpBatchJob">
<batch:step id="step">
<tasklet>
<chunk reader="csrfTokenReader" writer="csrfTokenWriter" commit-interval="10"></chunk>
</tasklet>
</batch:step>
</batch:job>
<!-- run every 15 minutes -->
<bean class="org.springframework.scheduling.quartz.SchedulerFactoryBean">
<property name="triggers">
<bean id="cronTrigger" class="org.springframework.scheduling.quartz.CronTriggerBean">
<property name="jobDetail" ref="jobDetail" />
<property name="cronExpression" value="* 0/15 * * * ?" />
</bean>
</property>
</bean>
<bean id="jobDetail" class="org.springframework.scheduling.quartz.JobDetailBean">
<property name="jobClass" value="com.test.oauth.batch.job.CSRFTokenJobLauncher" />
<property name="group" value="quartz-batch" />
<property name="jobDataAsMap">
<map>
<entry key="jobName" value="csrfTokenCleanUpBatchJob" />
<entry key="jobLocator" value-ref="jobRegistry" />
<entry key="jobLauncher" value-ref="jobLauncher" />
</map>
</property>
</bean>
</beans>
It is all by design: you want to process each record. The ItemWriter receives as many records as you want, but is bound by the commit-interval; with a commit-interval of 1, each record is committed individually, so I suggest you set it to something like 50. The processor handles each record by itself until the commit interval is reached, and then the writer is called.
Also, make the read method of your ItemReader synchronized.
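If your Spring Batch version has it, SynchronizedItemStreamReader wraps a delegate and serializes read() for you; a sketch, with the CsrfToken item type assumed:
@Bean
public SynchronizedItemStreamReader<CsrfToken> synchronizedCsrfTokenReader(
        ItemStreamReader<CsrfToken> csrfTokenReader) {
    SynchronizedItemStreamReader<CsrfToken> reader = new SynchronizedItemStreamReader<>();
    reader.setDelegate(csrfTokenReader); // serializes concurrent calls to read()
    return reader;
}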

Configure spring batch partition to for processor and writer whilst excluding the reader from the partitioning

I am using Spring Batch partitioning to run multiple threads of a job.
The job is supposed to read from the database, process the data, and write the results to either a file or a database.
Below is my current configuration for the job.
<step id="masterStep">
<partition step="slave" partitioner="rangePartitioner">
<handler grid-size="10" task-executor="taskExecutor" />
</partition>
</step>
</job>
<!-- Jobs to run -->
<step id="slave" xmlns="http://www.springframework.org/schema/batch">
<tasklet>
<chunk reader="pagingItemReader" writer="flatFileItemWriter"
processor="itemProcessor" commit-interval="1" />
</tasklet>
</step>
With this configuration, when the job is run, 10 threads of the step are started, which also means 10 readers are used; each record will therefore be processed 10 times, rendering the partitioning useless.
Can you please assist with a solution that partitions only the processor and the writer, so that we have multiple threads for the processor and the writer but use just one instance of the reader?
Without the configuration for your reader, it's tough to give you an exact fix, but my bet is that you are not injecting the range values into your query. Below is an example of using the JdbcPagingItemReader in a remote partitioning job. You'll notice that the range of items being read are provided by the stepExecutionContext. Those are the values provided by the partitioner. Each ItemReader will get its own values.
<bean id="targetItemReader" class="org.springframework.batch.item.database.JdbcPagingItemReader" scope="step">
<property name="dataSource" ref="dataSource" />
<property name="queryProvider">
<bean
class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
<property name="dataSource" ref="dataSource" />
<property name="selectClause" value="ID, IP, PORT, CONNECTED, BANNER" />
<property name="fromClause" value="FROM TARGET" />
<property name="whereClause" value="ID >= :minId AND ID <= :maxId AND CONNECTED IS NULL"/>
<property name="sortKey" value="ID" />
</bean>
</property>
<property name="pageSize" value="10" />
<property name="parameterValues">
<map>
<entry key="minId" value="#{stepExecutionContext[minValue]}"/>
<entry key="maxId" value="#{stepExecutionContext[maxValue]}"/>
</map>
</property>
<property name="rowMapper">
<bean class="com.michaelminella.springbatch.domain.TargetRowMapper"/>
</property>
</bean>
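The minValue/maxValue entries above are supplied by the partitioner; a minimal sketch of a column-range partitioner over the same TARGET table:
import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.jdbc.core.JdbcOperations;
import org.springframework.jdbc.core.JdbcTemplate;

public class ColumnRangePartitioner implements Partitioner {

    private final JdbcOperations jdbcTemplate;

    public ColumnRangePartitioner(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        int min = jdbcTemplate.queryForObject("SELECT MIN(ID) FROM TARGET", Integer.class);
        int max = jdbcTemplate.queryForObject("SELECT MAX(ID) FROM TARGET", Integer.class);
        int targetSize = (max - min) / gridSize + 1;

        // One ExecutionContext per partition; each slave reader binds its own range.
        Map<String, ExecutionContext> partitions = new HashMap<>();
        int start = min;
        for (int i = 0; start <= max; i++) {
            ExecutionContext context = new ExecutionContext();
            context.putInt("minValue", start);                                  // #{stepExecutionContext[minValue]}
            context.putInt("maxValue", Math.min(start + targetSize - 1, max));  // #{stepExecutionContext[maxValue]}
            partitions.put("partition" + i, context);
            start += targetSize;
        }
        return partitions;
    }
}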
You can hear more about remote partitioning in my talk about it on YouTube here: https://www.youtube.com/watch?v=CYTj5YT7CZU
The code for that talk is here: https://github.com/mminella/Spring-Batch-Talk-2.0
