spring batch and Multi-threaded step

I am currently working on a batch job that consumes data from a large SQL database with millions of rows.
In the processor it groups rows that the reader retrieves through a large SQL query with joins.
The writer then writes the result to another table.
The problem is that this batch job performs poorly, because the SQL select queries take a long time and the steps are not executed with multithreading.
So I'd like to run them multithreaded, but the steps group the rows, for example by calculating a total amount over all rows of the same type.
If I make the step multithreaded, each partition will be processed in a different thread. How can I do the grouping then, knowing that there are millions of rows that I can't store in the context to retrieve after the step?
I can't save them in the database either, since there are millions of rows.
Do you have any idea how I can do this?
I hope I was able to explain my problem well.
Thanks in advance for your help.

I had a task similar to yours, although we were using Java 1.7 and Spring 3.x. I can provide an XML configuration; you may be able to translate it to annotation-based configuration, which I have not tried.
<batch:job id="dualAgeRestrictionJob">
<!-- use a listener if you need one -->
<batch:listeners>
<batch:listener ref="dualAgeRestrictionJobListener" />
</batch:listeners>
<!-- master step, 10 threads (grid-size) -->
<batch:step id="dualMasterStep">
<partition step="dualSlaveStep"
partitioner="arInputRangePartitioner">
<handler grid-size="${AR_GRID_SIZE}" task-executor="taskExecutor" />
</partition>
</batch:step>
</batch:job>
<!-- here you define your reader, processor and writer, and the commit interval -->
<batch:step id="dualSlaveStep">
<batch:tasklet transaction-manager="transactionManager">
<batch:chunk reader="arInputPagingItemReader"
writer="arOutputWriter" processor="arInputItemProcessor"
commit-interval="${AR_COMMIT_INTERVAL}" />
</batch:tasklet>
</batch:step>
<!-- The partitioner -->
<bean id="arInputRangePartitioner" class="com.example.ArInputRangePartitioner">
<property name="arInputDao" ref="arInputJDBCTemplate" />
<property name="statsForMail" ref="statsForMail" />
</bean>
<bean id="taskExecutor"
class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
<property name="corePoolSize" value="${AR_CORE_POOL_SIZE}" />
<property name="maxPoolSize" value="${AR_MAX_POOL_SIZE}" />
<property name="allowCoreThreadTimeOut" value="${AR_ALLOW_CORE_THREAD_TIME_OUT}" />
</bean>
<bean id="transactionManager"
class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
<property name="dataSource" ref="kvrDatasource" />
</bean>
The partitioner runs a query to count the rows and builds an ID range for each thread:
public class ArInputRangePartitioner implements Partitioner {
private static final Logger logger = LoggerFactory.getLogger(ArInputRangePartitioner.class);
private ArInputDao arInputDao;
private StatsForMail statsForMail;
@Override
public Map<String, ExecutionContext> partition(int gridSize) {
Map<String, ExecutionContext> result = new HashMap<String, ExecutionContext>();
// You can run a query and then divide the from/to range for each thread
Map<Integer,Integer> idMap = arInputDao.getOrderIdList();
Integer countRow = idMap.size();
statsForMail.setNumberOfRecords( countRow );
Integer range = countRow / gridSize;
Integer remains = countRow % gridSize;
int fromId = 1;
int toId = range;
for (int i = 1; i <= gridSize; i++) {
ExecutionContext value = new ExecutionContext();
if(i == gridSize) {
toId += remains;
}
logger.info("\nStarting : Thread {}", i);
logger.info("fromId : {}", idMap.get(fromId) );
logger.info("toId : {}", idMap.get(toId) );
value.putInt("fromId", idMap.get(fromId) );
value.putInt("toId", idMap.get(toId) );
value.putString("name", "Thread" + i);
result.put("partition" + i, value);
fromId = toId + 1;
toId += range;
}
return result;
}
public ArInputDao getArInputDao() {
return arInputDao;
}
public void setArInputDao(ArInputDao arInputDao) {
this.arInputDao = arInputDao;
}
public StatsForMail getStatsForMail() {
return statsForMail;
}
public void setStatsForMail(StatsForMail statsForMail) {
this.statsForMail = statsForMail;
}
}
This is the configuration for the reader and writer:
<bean id="arInputPagingItemReader" class="org.springframework.batch.item.database.JdbcPagingItemReader" scope="step" >
<property name="dataSource" ref="kvrDatasource" />
<property name="queryProvider">
<bean class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean" >
<property name="dataSource" ref="kvrDatasource" />
<property name="selectClause" value="${AR_INPUT_PAGING_ITEM_READER_SELECT}" />
<property name="fromClause" value="${AR_INPUT_PAGING_ITEM_READER_FROM}" />
<property name="whereClause" value="${AR_INPUT_PAGING_ITEM_READER_WHERE}" />
<property name="sortKey" value="${AR_INPUT_PAGING_ITEM_READER_SORT}" />
</bean>
</property>
<!-- Inject via the ExecutionContext in rangePartitioner -->
<property name="parameterValues">
<map>
<entry key="fromId" value="#{stepExecutionContext[fromId]}" />
<entry key="toId" value="#{stepExecutionContext[toId]}" />
</map>
</property>
<property name="pageSize" value="${AR_PAGE_SIZE}" />
<property name="rowMapper" ref="arOutInRowMapper" />
</bean>
<bean id="arOutputWriter"
class="org.springframework.batch.item.database.JdbcBatchItemWriter"
scope="step">
<property name="dataSource" ref="kvrDatasource" />
<property name="sql" value="${SQL_AR_OUTPUT_INSERT}"/>
<property name="itemSqlParameterSourceProvider">
<bean class="org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider" />
</property>
</bean>
Maybe someone knows how to convert this to modern Spring Batch/Spring Boot.
PS: Don't use too many threads, otherwise Spring Batch will spend a lot of time filling its own tables. You have to run some benchmarks to find the right configuration.
I also suggest not using JPA/Hibernate with millions of rows; in my case I used JdbcTemplate.
EDIT: for annotation configuration see this question
Here is an example of a configuration with a partitioner:
@Configuration
@RequiredArgsConstructor
public class JobConfig {
private static final Logger log = LoggerFactory.getLogger(JobConfig.class);
private final JobBuilderFactory jobBuilderFactory;
private final StepBuilderFactory stepBuilderFactory;
@Value(value = "classpath:employees.csv")
private Resource resource;
@Bean("MyJob1")
public Job createJob(@Qualifier("MyStep1") Step stepMaster) {
return jobBuilderFactory.get("MyJob1")
.incrementer(new RunIdIncrementer())
.start(stepMaster)
.build();
}
@Bean("MyStep1")
public Step step(PartitionHandler partitionHandler, Partitioner partitioner) {
return stepBuilderFactory.get("MyStep1")
.partitioner("slaveStep", partitioner)
.partitionHandler(partitionHandler)
.build();
}
@Bean("slaveStep")
public Step slaveStep(FlatFileItemReader<Employee> reader) {
return stepBuilderFactory.get("slaveStep")
.<Employee, Employee>chunk(1)
.reader(reader)
.processor((ItemProcessor<Employee, Employee>) employee -> {
System.out.printf("Processed item %s%n", employee.getId());
return employee;
})
.writer(list -> {
for (Employee item : list) {
System.out.println(item);
}
})
.build();
}
@Bean
public Partitioner partitioner() {
return gridSize -> {
Map<String, ExecutionContext> result = new HashMap<>();
int lines = 0;
try(BufferedReader reader = new BufferedReader(new InputStreamReader(resource.getInputStream()))) {
while (reader.readLine() != null) lines++;
} catch (IOException e) {
throw new RuntimeException(e);
}
int range = lines / gridSize;
int remains = lines % gridSize;
int fromLine = 0;
int toLine = range;
for (int i = 1; i <= gridSize; i++) {
if(i == gridSize) {
toLine += remains;
}
ExecutionContext value = new ExecutionContext();
value.putInt("fromLine", fromLine);
value.putInt("toLine", toLine);
fromLine = toLine;
toLine += range;
result.put("partition" + i, value);
}
return result;
};
}
@StepScope
@Bean
public FlatFileItemReader<Employee> flatFileItemReader(@Value("#{stepExecutionContext['fromLine']}") int startLine, @Value("#{stepExecutionContext['toLine']}") int lastLine) {
FlatFileItemReader<Employee> reader = new FlatFileItemReader<>();
reader.setResource(resource);
DefaultLineMapper<Employee> lineMapper = new DefaultLineMapper<>();
lineMapper.setFieldSetMapper(fieldSet -> {
String[] values = fieldSet.getValues();
return Employee.builder()
.id(Integer.parseInt(values[0]))
.firstName(values[1])
.build();
});
lineMapper.setLineTokenizer(new DelimitedLineTokenizer(";"));
reader.setLineMapper(lineMapper);
reader.setCurrentItemCount(startLine);
reader.setMaxItemCount(lastLine);
return reader;
}
@Bean
public PartitionHandler partitionHandler(@Qualifier("slaveStep") Step step, TaskExecutor taskExecutor) {
TaskExecutorPartitionHandler taskExecutorPartitionHandler = new TaskExecutorPartitionHandler();
taskExecutorPartitionHandler.setTaskExecutor(taskExecutor);
taskExecutorPartitionHandler.setStep(step);
taskExecutorPartitionHandler.setGridSize(5);
return taskExecutorPartitionHandler;
}
@Bean
public TaskExecutor taskExecutor() {
ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
taskExecutor.setMaxPoolSize(5);
taskExecutor.setCorePoolSize(5);
taskExecutor.setQueueCapacity(5);
taskExecutor.afterPropertiesSet();
return taskExecutor;
}
}

We had a similar use case where I had to start by reading millions of records, selected by criteria passed in through a REST endpoint, and process them in parallel using 20-30 threads to meet tight deadlines. The challenge was that the same complex queries were made to the database repeatedly and the results then partitioned to be shared across the spawned threads.
Better solution:
We solved it by reading the data once, partitioning it internally, and passing each partition to a spawned thread.
A typical batch process has this objective: read the data, make some HTTP calls or manipulate the data, and write the result to a response log table.
Spring Batch keeps track of the records processed, so that a restart can pick up the remaining records. An alternative is a flag in your master table that marks a record as processed, so it is not picked up again during a restart.
The challenges we faced were:
support of joins in the query reader
partitioning of data
the same record being processed again
Coming to multiprocessing:
Let's say you have 10000 records and you need to process them in 5 parallel threads.
Several creative solutions can be implemented, but the two most often used, which fit all use cases, are:
partitioning data on the number of records
partitioning data on the modulo of a numeric index value
A suitable number of threads can be chosen based on the memory the machine can provide, e.g. 5: 10000/5 means each thread processes 2000 records.
Partitioning splits the data into ranges and lets each step execution pick up one range and process it in its own thread. For the step above, we need to compute those ranges and pass them to the query, so that it fetches only the records in that range and processing continues in a separate thread.
Thread 0 : 1–2000
Thread 1 : 2001–4000
Thread 2 : 4001–6000
Thread 3 : 6001–8000
Thread 4 : 8001–10000
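The range table above can be computed mechanically. The following is a minimal plain-Java sketch, not Spring Batch code: the class name RangePartitions and the method split are hypothetical, and spreading the remainder over the first ranges is an assumption of this sketch (the partitioner in the first answer gives the whole remainder to the last range instead).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits record positions 1..totalRecords into contiguous ranges.
class RangePartitions {

    /** Returns at most `threads` inclusive {from, to} ranges covering 1..totalRecords. */
    static List<long[]> split(long totalRecords, int threads) {
        if (totalRecords < 0 || threads <= 0) {
            throw new IllegalArgumentException("totalRecords must be >= 0 and threads must be > 0");
        }
        List<long[]> ranges = new ArrayList<>();
        long base = totalRecords / threads;
        long remainder = totalRecords % threads;
        long from = 1;
        for (int i = 0; i < threads; i++) {
            // Assumption of this sketch: the first `remainder` ranges take one extra record.
            long size = base + (i < remainder ? 1 : 0);
            if (size == 0) {
                break; // more threads than records: stop instead of emitting empty ranges
            }
            long to = from + size - 1;
            ranges.add(new long[] {from, to});
            from = to + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<long[]> ranges = split(10_000, 5);
        for (int i = 0; i < ranges.size(); i++) {
            System.out.println("Thread " + i + " : " + ranges.get(i)[0] + "-" + ranges.get(i)[1]);
        }
    }
}
```

In a real Partitioner each pair would be stored in an ExecutionContext. The positions are row positions, so they still have to be mapped to actual key values if the IDs have gaps.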
Another partitioning scheme assigns the threads the numbers 0 to 4 and queries on the modulo of a numeric key. One drawback is that one particular partition could receive more load than the others, whereas the previous approach ensures that every thread gets a fair share.
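To make the skew concern concrete, here is a small plain-Java sketch that assigns keys to modulo buckets and counts the load per bucket. The class ModuloPartitions and both key sets are made up for illustration; real key distributions will differ.

```java
import java.util.Arrays;

// Hypothetical helper: shows how evenly (or unevenly) keys spread over modulo buckets.
class ModuloPartitions {

    /** Bucket index in 0..partitions-1 for a numeric key. */
    static int bucketOf(long key, int partitions) {
        return (int) Math.floorMod(key, (long) partitions);
    }

    /** Number of keys that land in each bucket. */
    static int[] countPerBucket(long[] keys, int partitions) {
        int[] counts = new int[partitions];
        for (long key : keys) {
            counts[bucketOf(key, partitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] sequentialKeys = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        long[] multiplesOfFive = {5, 10, 15, 20, 25};
        // Sequential keys spread evenly over 5 buckets.
        System.out.println(Arrays.toString(countPerBucket(sequentialKeys, 5)));
        // Keys that share a factor with the divisor all land in bucket 0.
        System.out.println(Arrays.toString(countPerBucket(multiplesOfFive, 5)));
    }
}
```

With sequential keys the buckets come out balanced, while the second key set puts everything into one bucket, which is the uneven load described above.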
The split data is passed to a separate thread, which processes it and writes data at the commit interval (chunk size) configured in the step.
Code :
READER
@Bean
@StepScope
public JdbcPagingItemReader<YourDataType> dataReaders(
@Value("#{jobParameters[param1]}") final String param1,
@Value("#{stepExecutionContext['modulo']}") Long modulo) throws Exception {
logger.info("Thread started reading for modulo index : " + modulo);
JdbcPagingItemReader<YourDataType> reader = new JdbcPagingItemReader<>();
reader.setDataSource(getDataSource());
reader.setRowMapper(new YourDataTypeRowMapper());
reader.setQueryProvider(queryProvider(param1, modulo));
return reader;
}
public OraclePagingQueryProvider queryProvider(String param1, Long modulo) throws Exception {
OraclePagingQueryProvider provider = new OraclePagingQueryProvider();
provider.setSelectClause("your elements to query");
provider.setFromClause("your tables/ joined tables");
// MOD needs a divisor: parallelThreads is assumed to be the same grid size that step1 uses
provider.setWhereClause("where clauses AND MOD(TO_NUMBER(yourkey), " + parallelThreads + ") = " + modulo);
Map<String, Order> sortKeys = new HashMap<>();
sortKeys.put("yoursortkey", Order.ASCENDING);
provider.setSortKeys(sortKeys);
return provider;
}
Sample data reader: param1 is any parameter the user wants to pass in. modulo is a step execution context value passed from the Partitioner object.
For modulo 5, the Partitioner object would produce the modulo values 0|1|2|3|4, which spawns 5 threads that each use the reader to fetch their share of the data.
WRITER
@Bean
public JdbcBatchItemWriter<RespData> dataWriter() throws Exception {
logger.info("Initializing data writer");
// the writer's item type must match the processor's output type (RespData)
JdbcBatchItemWriter<RespData> databaseItemWriter = new JdbcBatchItemWriter<>();
databaseItemWriter.setDataSource(injectyourdatasourcehere);
databaseItemWriter.setSql(INSERT_QUERY_HERE);
ItemPreparedStatementSetter<RespData> ps = new YourResponsePreparedStatement();
databaseItemWriter.setItemPreparedStatementSetter(ps);
return databaseItemWriter;
}
public class YourResponsePreparedStatement implements ItemPreparedStatementSetter<RespData> {
@Override
public void setValues(RespData respData, PreparedStatement preparedStatement) throws SQLException {
preparedStatement.setString(1, respData.getYourData());
}
}
The response writer logs each response to a table, to keep tabs on the processed data for analytics or business reporting.
PROCESSOR
@Bean
public ItemProcessor<YourDataType,RespData> processor() {
return new YOURProcessor();
}
The processor holds the core data manipulation logic. It returns the type expected by the data writer.
If you wish to skip the automatic creation of the Spring Batch tables, overriding the batch configuration solves the issue.
@Configuration
@EnableAutoConfiguration
@EnableBatchProcessing
public class BatchConfiguration extends DefaultBatchConfigurer {
@Override
public void setDataSource(DataSource dataSource) {}
}
Otherwise an exception such as the following could be encountered:
at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: org.springframework.dao.CannotSerializeTransactionException:
PreparedStatementCallback; SQL [INSERT into
BATCH_JOB_INSTANCE(JOB_INSTANCE_ID, JOB_NAME, JOB_KEY, VERSION) values
(?, ?, ?, ?)]; ORA-08177: can't serialize access for this transaction
; nested exception is java.sql.SQLException: ORA-08177: can't
serialize access for this transaction
A column range partitioner can be created as:
@Component
public class ColumnRangePartitioner implements Partitioner {
@Override
public Map<String, ExecutionContext> partition(int gridSize) {
Map<String, ExecutionContext> result = new HashMap<>();
// one partition per modulo value: 0 .. gridSize - 1
int start = 0;
while (start < gridSize) {
ExecutionContext value = new ExecutionContext();
result.put("partition : " + start, value);
value.putInt("modulo", start);
start += 1;
}
return result;
}
}
Setting up the job and step
Our job focuses on executing step1, which spawns threads based on the partitioner provided (here the column range partitioner) to process the step.
Grid size is the number of parallel threads (and the divisor used to compute the modulo).
Every processStep execution reads the data for the modulo assigned to its thread, processes it and then writes it.
@Bean
public ColumnRangePartitioner getParitioner() throws Exception {
ColumnRangePartitioner columnRangePartitioner = new ColumnRangePartitioner();
return columnRangePartitioner;
}
@Bean
public Step step1(@Qualifier("processStep") Step processStep,
StepBuilderFactory stepBuilderFactory) throws Exception {
return stepBuilderFactory.get("step1")
.listener(jobCompletionNotifier)
.partitioner(processStep.getName(), getParitioner())
.step(processStep)
.gridSize(parallelThreads)
.taskExecutor(taskExecutor())
.build();
}
@Bean
public Step processStep(
@Qualifier("DataReader") ItemReader<ReadType> reader,
@Qualifier("LogWRITE") ItemWriter<WriterType> writer,
StepBuilderFactory stepBuilderFactory) throws Exception {
return stepBuilderFactory.get("processStep")
.<ReadType, WriterType>chunk(1)
.reader(reader)
.processor(processor())
.writer(writer)
.faultTolerant()
.skip(Exception.class)
.skipLimit(exceptionLimit)
.build();
}
@Bean
public SimpleAsyncTaskExecutor taskExecutor() {
SimpleAsyncTaskExecutor asyncTaskExecutor = new SimpleAsyncTaskExecutor();
return asyncTaskExecutor;
}
@Bean
public Job ourJob(@Qualifier("step1") Step step1, JobBuilderFactory jobBuilderFactory) throws Exception {
return jobBuilderFactory.get("ourjob")
.start(step1)
.incrementer(new RunIdIncrementer())
.preventRestart()
.build();
}
This might be a usual Spring Batch solution, but it would be applicable to every migration requirement involving commonly used SQL databases and Java-based solutions.
We also added some customizations to the application:
Avoid executing the join query again and then filtering. Complex joins can impact database performance, so a better solution is to fetch the data once and split it internally. The application will use a lot of memory, since the HashMap is populated with all the data your query fetches, but Java is capable of handling that. The fetched data can be passed to a ListItemReader so that each thread processes its own list of data in parallel.
For processing parallel requests (not threads, but parallel API calls to this application), a given query can be executed only once by guarding it with a semaphore so that other threads wait on it. Once the lock is released, the waiting threads find the data already present and the database is not queried again.
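The semaphore idea can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the authors' implementation: the class QueryOnceCache, its loader function (standing in for the expensive join query) and the single shared permit are all hypothetical choices.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache: runs the loader at most once per query key, even with concurrent callers.
class QueryOnceCache<K, V> {
    private final Map<K, List<V>> cache = new ConcurrentHashMap<>();
    private final Semaphore loadPermit = new Semaphore(1);
    private final Function<K, List<V>> loader; // stands in for the expensive database query

    QueryOnceCache(Function<K, List<V>> loader) {
        this.loader = loader;
    }

    /** Returns cached rows, loading them first if no caller has done so yet. */
    List<V> get(K queryKey) {
        List<V> rows = cache.get(queryKey);
        if (rows != null) {
            return rows; // fast path: data already fetched
        }
        loadPermit.acquireUninterruptibly(); // other callers wait here while one loads
        try {
            rows = cache.get(queryKey); // re-check: a previous permit holder may have loaded it
            if (rows == null) {
                rows = loader.apply(queryKey); // must not return null (ConcurrentHashMap rejects it)
                cache.put(queryKey, rows);
            }
            return rows;
        } finally {
            loadPermit.release();
        }
    }

    public static void main(String[] args) {
        AtomicInteger queries = new AtomicInteger();
        QueryOnceCache<String, Integer> cache = new QueryOnceCache<>(key -> {
            queries.incrementAndGet();
            return Arrays.asList(1, 2, 3);
        });
        cache.get("report");
        cache.get("report");
        System.out.println("loader calls: " + queries.get());
    }
}
```

A single permit serializes loads for all keys, which keeps the sketch short; a per-key lock would allow unrelated queries to load concurrently.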
The code for the above implementation would be too complex for the scope of this blog. Feel free to ask if your application needs any particular use case.
We would love to help with any related issues. Feel free to reach out to me (Akshay) at akshay.patell1702@gmail.com or my colleague (Sagar) at sagarnagdev61@gmail.com

Related

Spring batch JdbcPagingItemReaders cannot read the data again and finish immediately with read counts 0

I have a JdbcPagingItemReader which goes to an Oracle DB, pulls records and writes them to Mongo. The step itself is partitioned and I have the saveState flag set to false. We don't really need to restart the job. The reader is @StepScope, and so is the query provider. However, when the job is run twice, the second run finishes immediately with read counts set to 0. I cannot see anything obviously wrong and there are no errors. I tried to look into batch_step_execution_context to see if it's somehow reusing the previous ItemReader, which ran to completion, but I couldn't see anything in that table related to the ItemReader per se. Any ideas on how to go about debugging this?
@Bean
public Step load_A_Step_Partitioned(
Step load_A_Step,
OracleAnIdPartitioner oracleAnIdPartitioner,
TaskExecutor taskExecutor) {
return stepBuilderFactory
.get("load_A_Step_Partitioned")
.partitioner("load_A_Step_Partitioned", oracleAnIdPartitioner)
.step(load_A_Step)
.gridSize(appConfig.getGridSize())
.taskExecutor(taskExecutor)
.build();
}
@Bean
public Step load_A_Step(
JdbcPagingItemReader<SomeDTO> A_Reader,
MongoItemWriter<A> writer,
A_Processor A_processor) {
return stepBuilderFactory
.get("load_A")
.<SomeDTO, A>chunk(jobConfigCommon.getChunkSize())
.reader(A_Reader)
.processor(A_processor)
.writer(writer)
.build();
}
@Bean
@StepScope
public JdbcPagingItemReader<SomeDTO> A_Reader(
PagingQueryProvider A_QueryProvider,
@Qualifier("secondaryDatasource") DataSource dataSource) {
return new JdbcPagingItemReaderBuilder<SomeDTO>()
.name("A_Reader")
.dataSource(dataSource)
.queryProvider(A_QueryProvider)
.rowMapper(new A_RowMapper())
.pageSize(jobConfigCommon.getChunkSize())
.saveState(false)
.build();
}
@Bean
@StepScope
public PagingQueryProvider A_QueryProvider(
@Value("#{stepExecutionContext['ANID']}") String anId,
@Qualifier("secondaryDatasource") DataSource dataSource) {
SqlPagingQueryProviderFactoryBean providerFactory = new SqlPagingQueryProviderFactoryBean();
providerFactory.setDataSource(dataSource);
providerFactory.setSelectClause(
"SOME QUERY");
providerFactory.setWhereClause(" anId = '" + anId + "'");
providerFactory.setFromClause(" A TABLE ");
providerFactory.setSortKey("COLUMN_TO_SORT");
try {
return providerFactory.getObject();
} catch (Exception e) {
throw new IllegalStateException("Failed to create A_QueryProvider", e);
}
}

Spring batch using AbstractPaginatedDataItemReader for paginated API call

I am trying to call a paginated API (e.g. a search API) from an AbstractPaginatedDataItemReader. I want to keep calling this API until it has no more data for a page. I am trying to continue the chunk after every page, but the batch doesn't seem to get past page 1. Here is the code and configuration I am using.
The launch context is as follows:
<batch:job id="fileupload">
<batch:step id="readApi">
<batch:tasklet>
<batch:chunk reader="readPaginatedApi" processor="processApiResults"
writer="emailItemWriter" commit-interval="10"/>
</batch:tasklet>
<batch:next on="NEXT_PAGE" to="readPaginatedApi"/>
<batch:end on="END" />
</batch:step>
</batch:job>
And here is the reader snippet
@Component("readPaginatedApi")
@Scope("step")
public class ReadPaginatedApi extends AbstractPaginatedDataItemReader<SearchResponse> {
@BeforeStep
public void beforeStep(StepExecution stepExecution) {
this.setName("READER");
this.setExecutionContextName("READER");
String pageSizeString = stepExecution.getJobParameters().getString("page_size");
if (StringUtils.isNotBlank(pageSizeString) && NumberUtils.isParsable(pageSizeString)) {
try {
pageSize = Integer.parseInt(pageSizeString);
} catch (Exception e) {
e.printStackTrace();
}
}
String pageString = stepExecution.getJobParameters().getString("page");
if (StringUtils.isNotBlank(pageString) && NumberUtils.isParsable(pageString)) {
try {
page = Integer.parseInt(pageString);
} catch (Exception e) {
e.printStackTrace();
}
}
}
@Override
protected Iterator<Payee> doPageRead() {
//Call API
//Return iterator of results or empty iterator
}
@AfterStep
public ExitStatus afterStep(StepExecution stepExecution) {
AtomicInteger pageAtomicInteger = new AtomicInteger(page);
SearchResponse searchResponse = //call service, get response
if (searchResponse != null && CollectionUtils.isNotEmpty(searchResponse.getItems())) {
pageAtomicInteger.set(page + 1);
return new ExitStatus("NEXT_PAGE", String.format("page %d", page));
}
return new ExitStatus("END", String.format("page %d", page));
}
}
What am I missing here? How can I make this work? Is this the right approach for this case? I'd appreciate any help on this.

batch:next, batch:end, etc are used to define the execution flow of the steps of your job. Those are not intended to iterate over all pages of a paging item reader, they are used at a higher level.
What you need to do is extend AbstractPaginatedDataItemReader and implement doPageRead. Your implementation should maintain the state of which page is currently being read, the list of items, etc.
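As a rough illustration of the state such a reader has to keep, here is a plain-Java sketch. It is not the Spring Batch class: PagedReader and its fetchPage function (standing in for the API call made in doPageRead) are hypothetical names. It only shows the loop of fetching the next page when the current one is used up and signalling the end with null, the same convention an ItemReader uses.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical reader: tracks the current page and the items left in it.
class PagedReader<T> {
    private final IntFunction<List<T>> fetchPage; // stands in for the paginated API call
    private int page = 0;
    private Iterator<T> current = Collections.emptyIterator();
    private boolean exhausted = false;

    PagedReader(IntFunction<List<T>> fetchPage) {
        this.fetchPage = fetchPage;
    }

    /** Returns the next item, or null once a page comes back empty. */
    T read() {
        if (exhausted) {
            return null;
        }
        if (!current.hasNext()) {
            List<T> items = fetchPage.apply(page);
            page++; // advance so the next fetch asks for the following page
            if (items == null || items.isEmpty()) {
                exhausted = true; // an empty page means there is no more data
                return null;
            }
            current = items.iterator();
        }
        return current.next();
    }

    public static void main(String[] args) {
        // Made-up data: two pages with content, everything after that is empty.
        List<List<String>> pages = Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"));
        PagedReader<String> reader = new PagedReader<>(
                p -> p < pages.size() ? pages.get(p) : Collections.<String>emptyList());
        String item;
        while ((item = reader.read()) != null) {
            System.out.println(item);
        }
    }
}
```

With this shape the step needs no NEXT_PAGE transition: the chunk loop keeps calling read until it returns null.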

Looking at the equivalent Java config and the signatures of the on and to methods, to accepts a Flow, Step or JobExecutionDecider. So I think you need to replace
<batch:next on="NEXT_PAGE" to="readPaginatedApi"/>
with
<batch:next on="NEXT_PAGE" to="readApi"/>

Application context not loading after Spring 4 upgrade

I'm in the process of upgrading the Spring Framework version used in our webapp from 3.1.4 to 4.1.8. With the new Spring version, a few of our unit tests are failing because @Autowired is no longer working. This is one of the failing tests:
@ContextConfiguration(locations={"/math-application-context.xml"})
public class MathematicaMathServiceTest extends JavaMathServiceTest{
@Autowired
private KernelLinkPool mathematicalKernelPool;
protected static String originalServiceType = System.getProperty("calculation.math.service.type");
@AfterClass
public static void unsetMathServiceType(){
System.clearProperty("calculation.math.service.type");
}
@BeforeClass
public static void setMathServiceType(){
System.setProperty("calculation.math.service.type","Mathematica");
}
@Test
public void testMathematicaService() throws Exception{
try {
acquireKernelAndExecute(0);
Assert.assertEquals(0, mathematicalKernelPool.getBorrowingThreadsCount());
} catch(UnsatisfiedLinkError e) {
System.out.println("Mathematica not installed. Skipping test");
}catch(Exception ex){
if (!ExceptionFormatter.hasCause(ex, MathServiceNotConfiguredException.class)){throw ex;}
if (System.getProperty(MathService.SERVICE_CONFIGURED_SYSTEM_VARIABLE) != null){
throw ex;
}
logger.error("Cannot execute test. Math service is not configured");
}
}
}
This is the KernelLinkPool class:
public class KernelLinkPool extends GenericObjectPool implements InitializingBean{
private static final int RETRY_TIMEOUT_MS = 5000;
private static final long STARTUP_WAIT_TIME_MS = 10000;
private boolean mathematicaConfigured = true;
private PoolableObjectFactory factory;
// ensures that multiple requests from the same thread will be given the same KernelLink object
private static ThreadLocal<KernelLink> threadBoundKernel = new ThreadLocal<KernelLink>();
// holds the number of requests issued on each thread
private static ThreadLocal<Integer> callDepth = new ThreadLocal<Integer>();
private long maxBorrowWait;
private Integer maxKernels;
private boolean releaseLicenseOnReturn;
private Logger logger = LoggerFactory.getLogger(this.getClass());
// (used only for unit testing at this point)
private Map<String,Integer> borrowingThreads = new ConcurrentHashMap<String,Integer>();
public KernelLinkPool(PoolableObjectFactory factory) {
super(factory);
this.factory = factory;
this.setMaxWait(maxBorrowWait);
}
@Override
public Object borrowObject() throws Exception{
return borrowObject(this.maxBorrowWait);
}
public Object borrowObject(long waitTime) throws Exception {
long starttime = System.currentTimeMillis();
if (!mathematicaConfigured){
throw new MathServiceNotConfiguredException();
}
try{
if (callDepth.get() == null){
callDepth.set(1);
}else{
callDepth.set(callDepth.get()+1);
}
KernelLink link = null;
if (threadBoundKernel.get() != null){
link = threadBoundKernel.get();
}else{
//obtain kernelLink from object pool
//retry when borrowObject fail until
//maxBorrowWait is reached
while(true){
try{
logger.debug("Borrowing MathKernel from object pool");
link = (KernelLink) super.borrowObject();
break;
}catch(KernelLinkCreationException ex){
long timeElapsed = System.currentTimeMillis() - starttime;
logger.info("Failed to borrow MathKernel. Time elapsed [" + timeElapsed + "] ms", ex);
if(timeElapsed >= waitTime){
logger.info("Retry timeout reached");
throw ex;
}
Thread.sleep(RETRY_TIMEOUT_MS);
}
}
logger.debug("borrowed [" + link + "]");
threadBoundKernel.set(link);
}
borrowingThreads.put(Thread.currentThread().getName(),callDepth.get());
return link;
}catch(Exception ex){
logger.error("Failed to acquire Mathematica kernel. Borrowing threads [" + borrowingThreads + "]");
throw ex;
}
}
public void returnObject(Object obj) throws Exception {
callDepth.set(callDepth.get()-1);
if (callDepth.get() <= 0){
threadBoundKernel.set(null);
borrowingThreads.remove(Thread.currentThread().getName());
if (releaseLicenseOnReturn){
// will destroy obj
super.invalidateObject(obj);
}
else{
// will park obj in the pool of idle objects
super.returnObject(obj);
}
}else{
borrowingThreads.put(Thread.currentThread().getName(),callDepth.get());
}
}
@Override
public void afterPropertiesSet() throws Exception {
try{
if (maxKernels == 0){
List<KernelLink> links = new ArrayList<KernelLink>();
while (true){
try{
links.add((KernelLink)factory.makeObject());
}catch(KernelLinkCreationException ex){
break;
}
}
if(links.isEmpty()){
logger.warn("No available Mathematica license!");
mathematicaConfigured = false;
return;
}
for (KernelLink link : links){
factory.destroyObject(link);
}
logger.info("Detected number of available Mathematica license = [" + links.size() + "]");
setMaxActive(links.size());
setMaxIdle(links.size());
}else{
if(maxKernels < 0){
logger.info("Set number of Mathematica license to no limit");
}else{
logger.info("Set number of Mathematica license to [" + maxKernels + "]");
}
setMaxActive(maxKernels);
setMaxIdle(maxKernels);
}
Object ob = borrowObject(STARTUP_WAIT_TIME_MS);
returnObject(ob);
mathematicaConfigured = true;
}catch(Throwable ex){
logger.warn("Mathematica kernel pool could not be configured: {}", ex.getMessage());
mathematicaConfigured = false;
}
}
public int getBorrowingThreadsCount() {
return borrowingThreads.size();
}
public Integer getMaxKernels() {
return maxKernels;
}
public void setMaxKernels(Integer maxKernels) {
this.maxKernels = maxKernels;
}
public boolean isMathematicaConfigured(){
return mathematicaConfigured;
}
public boolean isReleaseLicenseOnReturn() {
return releaseLicenseOnReturn;
}
public void setReleaseLicenseOnReturn(boolean releaseLicenseOnReturn) {
this.releaseLicenseOnReturn = releaseLicenseOnReturn;
}
public long getMaxBorrowWait() {
return maxBorrowWait;
}
public void setMaxBorrowWait(long maxBorrowWait) {
this.maxBorrowWait = maxBorrowWait;
}
}
The tests are failing with this exception:
org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type [com.etse.math.wolfram.KernelLinkPool] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}
This is the math-application-context file:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.1.xsd">
<beans profile="unitTest,integratedTest,activeServer">
<bean class="org.springframework.jmx.export.MBeanExporter"
lazy-init="false">
<property name="registrationBehaviorName" value="REGISTRATION_IGNORE_EXISTING" />
<property name="beans">
<map>
<entry key="etse.math:name=MathematicalKernelFactory"
value-ref="mathematicalKernelFactory" />
<entry key="etse.math:name=MathematicalKernelPool" value-ref="mathematicalKernelPool" />
</map>
</property>
</bean>
<bean id="mathService" class="com.etse.math.MathServiceFactoryBean">
<property name="mathServiceType" value="${calculation.math.service.type}"/>
<property name="mathematicaService" ref="mathematicaService"/>
</bean>
<bean id="mathematicaService" class="com.etse.math.wolfram.MathematicaService">
<property name="kernelPool" ref="mathematicalKernelPool" />
<property name="minParallelizationSize" value="${calculation.mathematica.kernel.parallel.batch.size}" />
</bean>
<bean id="mathematicalKernelPool" class="com.etse.math.wolfram.KernelLinkPool"
destroy-method="close">
<constructor-arg ref="mathematicalKernelFactory" />
<property name="maxKernels" value="${calculation.mathematica.max.kernels}" />
<property name="maxBorrowWait"
value="${calculation.mathematica.kernel.borrow.max.wait}" />
<property name="releaseLicenseOnReturn"
value="${calculation.mathematica.kernel.release.license.on.return}" />
</bean>
<bean id="mathematicalKernelFactory" class="com.etse.math.wolfram.KernelLinkFactory">
<property name="debugPackets" value="false" />
<property name="linkMode" value="launch" />
<property name="mathematicaKernelLocation" value="${calculation.mathematica.kernel.location}" />
<property name="mathematicaLibraryLocation" value="${calculation.mathematica.library.location}" />
<property name="mathematicaAddOnsDirectory" value="${calculation.mathematica.addons.directory}" />
<property name="linkProtocol" value="sharedMemory" />
</bean>
</beans>
<beans profile="passiveServer,thickClient,tools">
<bean id="mathService" class="com.etse.math.DummyMathService"/>
</beans>
</beans>
I also tried using the application context to load the bean, but that failed with the following exception:
org.springframework.beans.factory.NoSuchBeanDefinitionException: No bean named 'mathematicalKernelPool' is defined
If I remove the autowired field, the test fails with a NoSuchBeanDefinitionException for another bean (mathService) that is loaded via the application context in a super class. So it appears that the application context from math-application-context is not loaded for some reason. Any idea of what could be happening here? Thank you.
UPDATE:
I took a look at the beans defined in the application context and confirmed that none of the beans defined in math-application-context are present. The application context contains only beans defined in another context file loaded by the super class. Why would it fail to load math-application-context?

At this point I would honestly drop the XML config and go fully annotation/code based. Create a configuration class and have it define every bean that needs to be autowired.

It was a profile issue. The superclass of the test was using:
@ProfileValueSourceConfiguration(TestProfileValueSource.class)
to set the profile, but it was not working. After removing that annotation I added:
@ActiveProfiles(resolver=TestProfileValueSource.class)
and now it's working again.

Spring Batch Item Reader is executing only once

I am trying to implement Spring Batch but am facing a strange problem: our ItemReader class executes only once.
Here are the details.
Suppose we have 1000 rows in the DB.
Our ItemReader fetches the 1000 rows from the DB and passes the list to the ItemWriter.
The ItemWriter successfully deletes all the items.
The ItemReader then tries to fetch data from the DB again, finds nothing and returns null, so execution stops.
But we have configured the batch to be run by a Quartz scheduler every minute.
Now if we insert, say, 1000 rows into the DB via a dump import, the batch job should pick up this data on its next execution, but the reader is not even executed, although the JobLauncher is executing.
Configuration:
1. We have an ItemReader and an ItemWriter with a commit interval of 1.
<batch:job id="csrfTokenBatchJob">
<batch:step id="step1">
<tasklet>
<chunk reader="csrfTokenReader" writer="csrfTokenWriter" commit-interval="1"></chunk>
</tasklet>
</batch:step>
</batch:job>
2. The job is scheduled to be triggered every minute.
<bean class="org.springframework.scheduling.quartz.SchedulerFactoryBean">
<property name="triggers">
<bean id="cronTrigger" class="org.springframework.scheduling.quartz.CronTriggerBean">
<property name="jobDetail" ref="jobDetail" />
<property name="cronExpression" value="0 0/1 * * * ?" />
</bean>
</property>
</bean>
3. Job configuration
<bean id="jobDetail" class="org.springframework.scheduling.quartz.JobDetailBean">
<property name="jobClass" value="com.tavant.oauth.batch.job.CSRFTokenJobLauncher" />
<property name="jobDataAsMap">
<map>
<entry key="jobName" value="csrfTokenCleanUpBatchJob" />
<entry key="jobLocator" value-ref="jobRegistry" />
<entry key="jobLauncher" value-ref="jobLauncher" />
</map>
</property>
</bean>
The first time it executes successfully, but afterwards the reader does not execute again, although I can see in the logs that the JobLauncher is executing.
@Component("csrfTokenReader")
@Scope(value="step")
public class CSRFTokenReader implements ItemReader<List<CSRFToken>> {
    private static final Logger logger = LoggerFactory.getLogger(CSRFTokenReader.class);
    @Autowired
    private CleanService cleanService;
    @Override
    public List<CSRFToken> read() {
        List<CSRFToken> csrfTokenList = null;
        try {
            int keepUpto = Integer.valueOf(PropertiesContext.getInstance().getProperties().getProperty("token.keep", "1"));
            Calendar calTime = Calendar.getInstance();
            calTime.add(Calendar.HOUR, -keepUpto);
            Date toKeep = calTime.getTime();
            csrfTokenList = cleanService.getCSRFTokenByTime(toKeep);
        }
        catch (Throwable th) {
            logger.error("Exception in running job At " + new Date() + th);
        }
        if (CollectionUtils.isEmpty(csrfTokenList)) {
            return null;
        }
        return csrfTokenList;
    }
}
EDIT:
public class CSRFTokenJobLauncher extends QuartzJobBean {
    static final String JOB_NAME = "jobName";
    private JobLocator jobLocator;
    private JobLauncher jobLauncher;
    public void setJobLocator(JobLocator jobLocator) {
        this.jobLocator = jobLocator;
    }
    public void setJobLauncher(JobLauncher jobLauncher) {
        this.jobLauncher = jobLauncher;
    }
    @Override
    protected void executeInternal(JobExecutionContext context) {
        Map<String, Object> jobDataMap = context.getMergedJobDataMap();
        String jobName = (String) jobDataMap.get(JOB_NAME);
        log.info("Quartz trigger firing with Spring Batch jobName=" + jobName);
        JobParameters jobParameters = getJobParametersFromJobMap(jobDataMap);
        try {
            jobLauncher.run(jobLocator.getJob(jobName), jobParameters);
        }
        catch (JobExecutionException e) {
            log.error("Could not execute job.", e);
        }
    }
    private JobParameters getJobParametersFromJobMap(Map<String, Object> jobDataMap) {
        JobParametersBuilder builder = new JobParametersBuilder();
        for (Entry<String, Object> entry : jobDataMap.entrySet()) {
            String key = entry.getKey();
            Object value = entry.getValue();
            if (value instanceof String && !key.equals(JOB_NAME)) {
                builder.addString(key, (String) value);
            }
            else if (value instanceof Float || value instanceof Double) {
                builder.addDouble(key, ((Number) value).doubleValue());
            }
            else if (value instanceof Integer || value instanceof Long) {
                builder.addLong(key, ((Number) value).longValue());
            }
            else if (value instanceof Date) {
                builder.addDate(key, (Date) value);
            }
        }
        return builder.toJobParameters();
    }
}

After hours of wasted time, the problem seems to be solved: I have configured allow-start-if-complete="true" on the tasklet. Now the batch ItemReader executes as scheduled.
<batch:job id="csrfTokenBatchJob">
<batch:step id="step1">
<batch:tasklet allow-start-if-complete="true">
<batch:chunk reader="csrfTokenReader" writer="csrfTokenWriter" commit-interval="1"></batch:chunk>
</batch:tasklet>
</batch:step>
</batch:job>

Spring Batch records every job execution in its database, so it needs a way to tell job runs apart. It checks whether a job instance with the same job name and job parameters has already completed, and it will not start it again unless a job parameter differs from the previous run or the allow-start-if-complete setting is enabled.
OPTION 1: As mentioned in the answer above, use allow-start-if-complete="true".
OPTION 2: Always pass a job parameter holding the current date and time, so that the job parameter value is always unique.
JobExecution jobExecution = jobLauncher.run(reportJob, new JobParametersBuilder()
        .addDate("now", new Date()).toJobParameters());
OPTION 3: Use an incrementer, for example RunIdIncrementer, so that you do not have to pass a unique job parameter yourself every time.
@Bean
public Job job1(JobBuilderFactory jobs, Step s1) {
    return jobs.get("job1")
            .incrementer(new RunIdIncrementer())
            .flow(s1)
            .end()
            .build();
}
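To make the identity rule behind these options a bit more concrete, here is a rough plain-Java sketch. It is not Spring Batch's API: JobInstanceModel and its launch method are hypothetical names, and the model deliberately ignores details such as step-level restart and allow-start-if-complete. It only mimics the idea that a run is identified by the job name plus its parameters.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Simplified model only: this is NOT Spring Batch's API. It mimics the rule that a
// job instance is identified by the job name plus its parameters.
class JobInstanceModel {

    // Keys of the job instances that have already completed.
    private final Set<String> completed = new HashSet<>();

    // Builds a stable key from the job name and the parameters sorted by name.
    private static String key(String jobName, Map<String, String> params) {
        return jobName + new TreeMap<>(params);
    }

    // Returns true if the job "ran", false if an identical instance already completed.
    boolean launch(String jobName, Map<String, String> params) {
        return completed.add(key(jobName, params));
    }
}
```

In this model, launching the same job twice with the same (for example empty) parameters returns false the second time, while adding a changing parameter such as a timestamp turns every launch into a new instance, which is what OPTION 2 and OPTION 3 rely on.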

Hibernate 3 : optimistic locking unit test

I'm trying to test the optimistic locking implementation of my application. However, the result is not what I expect. The steps I take to test it are the following:
1. load an entity from the database
2. set the version attribute to one less than the value present in the database
3. change another attribute, which is just a string, to something else
4. save the entity
I expected a StaleObjectStateException at this point, but the entity just gets saved and the version increases to the next in line.
Here is a small extract of my save code:
public <T extends DomainObject> T save(T objectToSave) {
    Session currentSession = null;
    try {
        currentSession = sessionFactory.getCurrentSession();
        currentSession.save(objectToSave);
        return objectToSave;
    } catch (Exception ex) {
        logger.error("save error", ex);
    }
    return null;
}
Throughout my application I load objects by id with named queries, using the following code:
@SuppressWarnings("unchecked")
public <T extends Object> List<T> query(Class<T> returnClass, String query, List<String> namedParams, List<? extends Object> params, Integer limit) {
    Session currentSession = null;
    try {
        currentSession = sessionFactory.getCurrentSession();
        Query namedQuery = currentSession.getNamedQuery(query);
        if (limit != null) {
            namedQuery.setMaxResults(limit);
        }
        namedQuery.setCacheable(true);
        if (namedParams != null && namedParams.size() > 0) {
            addParams(namedQuery, namedParams, (List<Object>) params);
        }
        return namedQuery.list();
    } catch (Exception ex) {
        logger.error("query error", ex);
    }
    return null;
}
And this is the configuration of my sessionfactory
<bean id="sessionFactory"
class="org.springframework.orm.hibernate3.annotation.AnnotationSessionFactoryBean">
<property name="entityInterceptor" ref="auditInterceptor" />
<property name="dataSource" ref="dataSource" />
<property name="hibernateProperties">
<value>
hibernate.dialect=${hibernate.dialect}
hibernate.show_sql=${hibernate.show_sql}
hibernate.cache.provider_class=org.hibernate.cache.EhCacheProvider
hibernate.cache.region.factory_class=net.sf.ehcache.hibernate.EhCacheRegionFactory
hibernate.cache.use_query_cache=true
hibernate.cache.use_second_level_cache=true
hibernate.cache.provider_configuration_file_resource_path=ehcache.xml
hibernate.generate_statistics=true
hbm2ddl.auto=${hbm2ddl.auto}
hibernate.c3p0.min_size=${hibernate.c3p0.min_size}
hibernate.c3p0.max_size=${hibernate.c3p0.max_size}
hibernate.c3p0.timeout=${hibernate.c3p0.timeout}
hibernate.c3p0.max_statements=${hibernate.c3p0.max_statements}
hibernate.c3p0.idle_test_period=${hibernate.c3p0.idle_test_period}
</value>
</property>
<property name="schemaUpdate">
<value>true</value>
</property>
<property name="annotatedClasses">
<list>
<value>com.mbalogos.mba.domain.site.Site</value>
</list>
</property>
</bean>
Is there anything I missed in the configuration of the sessionFactory, or am I testing it completely wrong and should I test it in a different way?
Thanks in advance

Regressing the version field is dangerous because it is controlled by Hibernate (if you're using the JPA versioning functionality). Instead, load an entity, detach it (for example with session.evict(obj)) and alter an attribute. Then load the same entity again, keep it attached, alter it and save it. Finally, reattach the first entity and attempt to save it (I think merge will do this). You should then see a StaleObjectStateException.
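To illustrate why the second save in that sequence is expected to fail, here is a rough in-memory sketch of a version check in plain Java. It is only a model of the idea, not Hibernate code: VersionedStore, Row and StaleVersionException are hypothetical names, with StaleVersionException standing in for Hibernate's StaleObjectStateException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hibernate's StaleObjectStateException.
class StaleVersionException extends RuntimeException {
    StaleVersionException(String message) {
        super(message);
    }
}

// A detached copy of a row: id, payload and the version it was loaded with.
class Row {
    final long id;
    String name;
    int version;

    Row(long id, String name, int version) {
        this.id = id;
        this.name = name;
        this.version = version;
    }
}

// Simplified in-memory model of version-based optimistic locking.
class VersionedStore {
    private final Map<Long, Row> table = new HashMap<>();

    void insert(long id, String name) {
        table.put(id, new Row(id, name, 0));
    }

    // Returns an independent copy, similar to loading the entity and detaching it.
    Row load(long id) {
        Row stored = table.get(id);
        return new Row(stored.id, stored.name, stored.version);
    }

    // Mimics "UPDATE ... SET name = ?, version = version + 1 WHERE id = ? AND version = ?".
    void update(Row detached) {
        Row stored = table.get(detached.id);
        if (stored.version != detached.version) {
            throw new StaleVersionException("row " + detached.id + " was modified by someone else");
        }
        stored.name = detached.name;
        stored.version++;
        detached.version = stored.version;
    }
}
```

If two copies of the same row are loaded and both are changed, the first update succeeds and bumps the stored version, and the second update fails because its copy still carries the old version. That is the situation the detach/reattach test above tries to reproduce with real sessions.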
