Set chunk size dynamically after fetching it from the DB - Java

I need to set the chunk size of a Spring Batch job's step dynamically: the chunk size is stored in a database table, so it needs to be fetched from the database and set on the step bean.
My Query is something like:
select CHUNK_SIZE from SOME_TABLE_NAME where ID='some_id_param_value'
Here the value for ID comes from the job parameters, which are set via a request param passed to the REST controller (while triggering the batch job).
I want to fetch this CHUNK_SIZE from the database and set it dynamically into the job's step.
Our requirement is that the chunk size for the step varies based on the ID value; the mapping is stored in a DB table. For example:
ID    CHUNK_SIZE
01    1000
02    2500
I know that the beans in a job are set at configuration time, and the job parameters are passed at runtime while triggering the job.
EDIT:
The example provided by MahmoudBenHassine uses @JobScope and accesses the jobParameters in the step bean using @Value("#{jobParameters['id']}"). I tried implementing a similar approach using the jobExecutionContext as follows:
1. Fetched the chunkSize from the DB table in the StepExecutionListener's beforeStep method and set it in the ExecutionContext.
2. Annotated the step bean with @JobScope and used @Value("#{jobExecutionContext['chunk']}") to access it in the step bean.
But I face the following error:
Error creating bean with name 'scopedTarget.step' defined in class path resource [com/sample/config/SampleBatchConfig.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.batch.core.Step]: Factory method 'step' threw exception; nested exception is java.lang.NullPointerException
It is not able to access the 'chunk' key-value from the jobExecutionContext, thus throwing the NullPointerException.
Does it need to be promoted somehow so that it can be accessed in the step bean? If yes, a quick sample or a direction would be really appreciated.
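(By "promoted" I mean something along the lines of Spring Batch's ExecutionContextPromotionListener, sketched below. As far as I understand, though, it only copies keys from a step's execution context into the job execution context after that step completes, so it would make the value visible to later steps, not to the step that sets it.)
// Minimal sketch: promotes the "chunk" key from the step execution context
// to the job execution context once the step has finished.
@Bean
public ExecutionContextPromotionListener promotionListener() {
    ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
    listener.setKeys(new String[] {"chunk"});
    return listener;
}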
My Controller class:
@RestController
public class SampleController {

    @Autowired
    JobLauncher sampleJobLauncher;

    @Autowired
    Job sampleJob;

    @GetMapping("/launch")
    public BatchStatus launch(@RequestParam(name = "id", required = true) String id) throws Exception {
        Map<String, JobParameter> map = new HashMap<>();
        map.put("id", new JobParameter(id));
        map.put("timestamp", new JobParameter(System.currentTimeMillis()));
        JobParameters params = new JobParameters(map);
        JobExecution execution = sampleJobLauncher.run(sampleJob, params);
        return execution.getStatus();
    }
}
My batch config class (containing the job and step beans):
@Configuration
public class SampleBatchConfig {

    @Autowired
    private JobBuilderFactory myJobBuilderFactory;

    @Autowired
    private StepBuilderFactory myStepBuilderFactory;

    @Autowired
    private MyRepoClass myRepo; // this class contains the jdbc method to fetch the chunk size from the db table

    @Autowired
    MyReader myReader;

    @Autowired
    MyWriter myWriter;

    @Bean
    @JobScope
    public Step sampleStep(@Value("#{jobExecutionContext['chunk']}") Integer chunkSize) {
        return myStepBuilderFactory.get("sampleStep")
                .<MyClass, MyClass>chunk(chunkSize) // TODO ~ instead of hardcoding the chunkSize or getting it from the properties file using @Value, the requirement is to fetch it from the db table using the above mentioned query with the id job parameter and set it here
                .reader(myReader.sampleReader())
                .writer(myWriter.sampleWriter())
                .listener(new StepExecutionListener() {
                    @Override
                    public void beforeStep(StepExecution stepExecution) {
                        int chunk = myRepo.findChunkSize(stepExecution.getJobExecution().getExecutionContext().get("id")); // this method call fetches the chunk size from the db table using the id job parameter
                        stepExecution.getJobExecution().getExecutionContext().put("chunk", chunk);
                    }

                    @Override
                    public ExitStatus afterStep(StepExecution stepExecution) {
                        return null;
                    }
                })
                .build();
    }

    @Bean
    public Job job() {
        return myJobBuilderFactory.get("sampleJob")
                .incrementer(new RunIdIncrementer())
                .start(sampleStep(null))
                .build();
    }
}
NOTE:
The job may have multiple steps with different chunk sizes; in that case the chunk size has to be fetched separately for each step.
EDIT 2:
Changing my step definition as follows works, but there is a problem.
Here the reader reads a list of 17 items, with a chunk size of 4.
@Bean
@JobScope
public Step sampleStep(@Value("#{jobParameters['id']}") Integer id) {
    int chunkSize = myRepo.findChunkSize(id); // this method call fetches the chunk size from the db table using the id job parameter
    return myStepBuilderFactory.get("sampleStep")
            .<MyClass, MyClass>chunk(chunkSize)
            .reader(myReader.sampleReader())
            .writer(myWriter.sampleWriter())
            .listener(new ChunkListenerSupport() {
                @Override
                public void afterChunk(ChunkContext context) {
                    System.out.println("MyJob.afterChunk");
                }

                @Override
                public void beforeChunk(ChunkContext context) {
                    System.out.println("MyJob.beforeChunk");
                }
            })
            .build();
}
The first time I trigger the job from the URL, it works fine and prints the following (the chunk size is set to 4 in the DB table):
2021-05-03 15:06:44.859 INFO 11924 --- [nio-8081-exec-1] o.s.batch.core.job.SimpleStepHandler : Executing step: [sampleStep]
MyJob.beforeChunk
item = 1
item = 2
item = 3
item = 4
MyJob.afterChunk
MyJob.beforeChunk
item = 5
item = 6
item = 7
item = 8
MyJob.afterChunk
MyJob.beforeChunk
item = 9
item = 10
item = 11
item = 12
MyJob.afterChunk
MyJob.beforeChunk
item = 13
item = 14
item = 15
item = 16
MyJob.afterChunk
MyJob.beforeChunk
item = 17
MyJob.afterChunk
But if I trigger the job again, without restarting the server/spring container, the following is printed:
2021-05-03 15:11:02.427 INFO 11924 --- [nio-8081-exec-4] o.s.batch.core.job.SimpleStepHandler : Executing step: [sampleStep]
MyJob.beforeChunk
MyJob.afterChunk
In short, it works exactly once after the server is restarted, but it doesn't work for subsequent job executions unless the server is restarted again.

Since you pass the ID as a job parameter and you want to get the chunk size dynamically from the database based on that ID while configuring the step, you can use a job-scoped step as follows:
@Bean
@JobScope
public Step sampleStep(@Value("#{jobParameters['id']}") Integer id) {
    int chunkSize = myRepo.findChunkSize(id); // fetches the chunk size from the db table using the id job parameter
    return myStepBuilderFactory.get("sampleStep")
            .<MyClass, MyClass>chunk(chunkSize)
            .reader(myReader.sampleReader())
            .writer(myWriter.sampleWriter())
            .build();
}

Related

How to pass arguments from slave steps to reader in Spring Batch?

I have a Spring Batch process that reads data from a database. Basically, I have a SQL query that needs to fetch data by a column (type) value. That column has 50 different values, so there are 50 queries, each executed in a separate slave step. But the query is built inside the Reader, so I need to pass each type to the Reader to build the query and read the data. I am using a Partitioner to split the query with offset and limit.
Here is the code I have:
private Flow flow(List<Step> steps) {
    SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
    taskExecutor.setConcurrencyLimit(1);
    return new FlowBuilder<SimpleFlow>("flow")
            .split(taskExecutor)
            .add(steps.stream()
                    .map(step -> new FlowBuilder<Flow>("flow_" + step.getName()).start(step).build())
                    .toArray(Flow[]::new))
            .build();
}

@Bean
public Job job() {
    List<Step> masterSteps = TYPES.stream().map(this::masterStep).collect(Collectors.toList());
    return jobBuilderFactory.get("job")
            .incrementer(new RunIdIncrementer())
            .start(flow(masterSteps))
            .end()
            .build();
}

@Bean
@SneakyThrows
public Step slaveStep(String type) {
    return stepBuilderFactory.get("slaveStep")
            .<User, User>chunk(100)
            .reader(reader(type, 0, 0))
            .writer(writer())
            .build();
}

@Bean
@SneakyThrows
public Step masterStep(String type) {
    return stepBuilderFactory.get("masterStep")
            .partitioner(slaveStep(type).getName(), partitioner(0))
            .step(slaveStep(type))
            .gridSize(5)
            .taskExecutor(executor)
            .build();
}

@Bean
@StepScope
@SneakyThrows
public JdbcCursorItemReader<User> reader(String type,
        @Value("#{stepExecutionContext['offset']}") Integer offset,
        @Value("#{stepExecutionContext['limit']}") Integer limit) {
    String query = MessageFormat.format(SELECT_QUERY, type, offset, limit); // Ex: SELECT * FROM users WHERE type = 'type' OFFSET 500 LIMIT 1000;
    JdbcCursorItemReader<User> itemReader = new JdbcCursorItemReader<>();
    itemReader.setSql(query);
    itemReader.setDataSource(dataSource);
    itemReader.setRowMapper(new UserMapper());
    itemReader.afterPropertiesSet();
    return itemReader;
}

@Bean
@StepScope
public ItemWriter<User> writer() {
    return new Writer();
}

@Bean
@StepScope
public Partitioner partitioner(@Value("#{jobParameters['limit']}") int limit) {
    return new Partitioner(limit);
}
The issue I am having is that the type value is not being passed to the reader() method. And when I add the @Bean annotation, it says Could not autowire. No beans of 'String' type found. If I don't put @Bean, offset and limit are always 0 because @Value is not populated. Right now, when I execute the batch, nothing happens inside the reader because type is null. When I hardcode the value, it works. So how can I fix this? Thanks in advance.
If you are iterating over every TYPE and executing masterStep for each one, why not remove the TYPE logic and instead run SELECT * FROM table OFFSET ? LIMIT ?, handling offset and limit inside the Partitioner? Your 5 threads will then handle this. If your final goal is to process every record in that table, you can simply use this approach without worrying about TYPE or executing each type in a separate Step.
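A minimal sketch of that kind of partitioner (the class name and the way the total row count is obtained are assumptions); each partition carries its own offset and limit in its execution context, which the step-scoped reader then picks up via #{stepExecutionContext['offset']} and #{stepExecutionContext['limit']}:
// Sketch only: splits a known total row count into offset/limit ranges,
// one per partition, so gridSize threads each read their own slice.
public class OffsetLimitPartitioner implements Partitioner {

    private final int totalRows; // e.g. the result of SELECT COUNT(*) FROM users

    public OffsetLimitPartitioner(int totalRows) {
        this.totalRows = totalRows;
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        int rangeSize = (totalRows + gridSize - 1) / gridSize; // ceiling division
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            context.putInt("offset", i * rangeSize);
            context.putInt("limit", rangeSize);
            partitions.put("partition" + i, context);
        }
        return partitions;
    }
}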

How to skip any error in chunk and to continue with next chunk items?

I created Reader, Processor and Writer.
I defined the chunk size as 5.
I have one operation in the Processor for each item.
I have two transactions in the Writer: update the DB for all 5 items and confirm the transaction for all 5 items in another place.
My items don't depend on each other, so if one of them fails, the others are not affected and should still be processed.
Use Case 1:
If it fails in the Processor with any kind of exception (REST exception, any Java exception, DB exception, runtime exception), let's say on the 2nd item, I want to continue with the 3rd, 4th and 5th items.
If it fails on the 4th item, I want to continue with the 5th item.
So, with skip, as I understand it, when the chunk fails because items failed in the Processor, the chunk is retried without the 2nd and 4th items (the ones that failed), right?
And if the Writer goes well, both transactions are committed after the chunk and the job starts the next chunk with the next 5 items, right?
Use Case 2:
Whether the chunk is new or a repeat of Use Case 1 (without those 2 items), if the second transaction fails in the Writer, I want the first transaction to be rolled back without manually doing the rollback and commit.
So, if the Writer throws an exception, the first transaction is automatically rolled back, and that is good.
But what I want is that even when there was an exception and the transaction was rolled back (for that chunk), the job continues with the next chunk in the same way, with the same behaviour, and so on up to the last chunk.
To achieve Use Case 1, I guess I have to configure the step as:
@Configuration
@EnableBatchProcessing
@EnableScheduling
@Slf4j
public class BatchConfiguration {

    private final JobBuilderFactory jobBuilderFactory;
    private final StepBuilderFactory stepBuilderFactory;
    private final MyItemReader myItemReader;
    private final MyItemProcessor myItemProcessor;
    private final MyItemWriter myItemWriter;
    private final SimpleJobExecutionListener simpleJobExecutionListener;
    private final MyChunkListener myChunkListener;
    private final ApplicationContext applicationContext;
    private final DataSource dataSource;

    public BatchConfiguration(
            JobBuilderFactory jobBuilderFactory,
            StepBuilderFactory stepBuilderFactory,
            MyItemReader myItemReader,
            MyItemProcessor myItemProcessor,
            MyItemWriter myItemWriter,
            SimpleJobExecutionListener simpleJobExecutionListener,
            MyChunkListener myChunkListener,
            DataSource dataSource,
            ApplicationContext applicationContext) {
        this.jobBuilderFactory = jobBuilderFactory;
        this.stepBuilderFactory = stepBuilderFactory;
        this.myItemReader = myItemReader;
        this.myItemProcessor = myItemProcessor;
        this.myItemWriter = myItemWriter;
        this.simpleJobExecutionListener = simpleJobExecutionListener;
        this.myChunkListener = myChunkListener;
        this.dataSource = dataSource;
        this.applicationContext = applicationContext;
    }

    @Bean
    public Job registrationChunkJob() {
        return jobBuilderFactory.get("MyJob")
                .incrementer(new RunIdIncrementer())
                .listener(simpleJobExecutionListener)
                .flow(step()).end().build();
    }

    @Bean
    TaskExecutor taskExecutorStepPush() {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        taskExecutor.setCorePoolSize(2);
        taskExecutor.setMaxPoolSize(20);
        taskExecutor.setQueueCapacity(4);
        taskExecutor.setAllowCoreThreadTimeOut(true);
        taskExecutor.setWaitForTasksToCompleteOnShutdown(true);
        taskExecutor.setThreadNamePrefix(LoggingUtil.getWeblogicName() + "-");
        return taskExecutor;
    }

    @Bean
    public Step step() {
        DefaultTransactionAttribute attribute = new DefaultTransactionAttribute();
        attribute.setPropagationBehavior(Propagation.REQUIRED.value());
        attribute.setIsolationLevel(Isolation.READ_COMMITTED.value());
        return stepBuilderFactory.get("myStep").<MyObject, MyObject>chunk(5)
                .reader(myItemReader)
                .processor(myItemProcessor)
                .faultTolerant()
                .writer(myItemWriter)
                .listener(myChunkListener)
                .taskExecutor(taskExecutorStepPush())
                .throttleLimit(5)
                .transactionAttribute(attribute)
                .build();
    }
}
My job is not scheduled. I start the next job manually when the current job is finished, whether successful or not.
As I said, I don't change the flag in the DB from the Writer, so if it fails and some items are skipped and not updated in the DB (Writer), then when the job is finished, a new job will start after 1 hour and try with the same (and maybe new) items from the DB (the Reader will select them because the flag will not have been updated as processed).
But somehow this doesn't work, and it's late and I can't see why.
It takes 5 items in a chunk; it didn't fail in the Processor, but it failed in the Writer while trying to commit the 2 transactions (the second one failed). It then repeated the chunk with only one item (the first item), tried it 2 times, and then marked the job as Failed and stopped, which I don't want. There are many more items to be selected from the DB which could be good ones.
I don't want to repeat the same chunk if it failed in the Writer. I want to repeat the chunk only if it failed in the Processor (to keep only the good items).
Also, if a chunk fails, I don't want the job to stop; I want the job to continue with the next chunk, and so on.
How to achieve this?
How to skip any error in chunk and to continue with next items?
To do that, you need to configure which exceptions should cause the item to be skipped, as explained in the Configuring Skip Logic section.
According to your configuration, you did not specify any skippable exception. Your step definition should be something like:
@Bean
public Step step() {
    DefaultTransactionAttribute attribute = new DefaultTransactionAttribute();
    attribute.setPropagationBehavior(Propagation.REQUIRED.value());
    attribute.setIsolationLevel(Isolation.READ_COMMITTED.value());
    return stepBuilderFactory.get("myStep").<MyObject, MyObject>chunk(5)
            .reader(myItemReader)
            .processor(myItemProcessor)
            .faultTolerant()
            // add skip configuration
            .skipLimit(10)
            .skip(MySkippableException.class)
            .writer(myItemWriter)
            .listener(myChunkListener)
            .taskExecutor(taskExecutorStepPush())
            .throttleLimit(5)
            .transactionAttribute(attribute)
            .build();
}
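If you also want visibility into which items were skipped, a SkipListener can be registered on the same fault-tolerant step. A minimal sketch (logging only; the class name is an assumption), registered with .listener(new LoggingSkipListener()):
// Sketch: logs items skipped during read, processing or writing.
public class LoggingSkipListener implements SkipListener<MyObject, MyObject> {

    @Override
    public void onSkipInRead(Throwable t) {
        System.out.println("Skipped during read: " + t.getMessage());
    }

    @Override
    public void onSkipInProcess(MyObject item, Throwable t) {
        System.out.println("Skipped in processor: " + item + " - " + t.getMessage());
    }

    @Override
    public void onSkipInWrite(MyObject item, Throwable t) {
        System.out.println("Skipped in writer: " + item + " - " + t.getMessage());
    }
}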

Axon MongoDB - message='E11000 duplicate key error collection uniqueAggregateIndex dup key: { : "101", : 0 }

In my application, we are using Axon 3.3.3 and MongoDB as the event store.
We have a requirement to save all events whenever a user updates his profile information.
Below is the use case:
1. The user has created his profile (aggregate id: 101).
2. In MongoDB, a CreateEvent has been saved with aggregate id 101.
3. The user has updated his profile info, so we would like to store an UpdateEvent in MongoDB (the event store).
But we get the below exception:
13:52:49.643 [http-nio-7030-exec-3] ERROR o.a.c.c.C.[.[.[.[dispatcherServlet] - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.axonframework.commandhandling.model.ConcurrencyException: An event for aggregate [101] at sequence [0] was already inserted] with root cause
com.mongodb.MongoBulkWriteException: Bulk write operation error on server 127.0.0.1:27017. Write errors: [BulkWriteError{index=0, code=11000, message='E11000 duplicate key error collection: mytest.domainevents index: uniqueAggregateIndex dup key: { : "101", : 0 }', details={ }}].
at com.mongodb.connection.BulkWriteBatchCombiner.getError(BulkWriteBatchCombiner.java:176)
at com.mongodb.connection.BulkWriteBatchCombiner.throwOnError(BulkWriteBatchCombiner.java:205)
at com.mongodb.connection.BulkWriteBatchCombiner.getResult(BulkWriteBatchCombiner.java:146)
at com.mongodb.operation.BulkWriteBatch.getResult(BulkWriteBatch.java:227)
at com.mongodb.operation.MixedBulkWriteOperation.executeBulkWriteBatch(MixedBulkWriteOperation.java:276)
So how can I save the update event?
Below is the uniqueAggregateIndex in MongoDB:
{
"aggregateIdentifier" : 1,
"sequenceNumber" : 1
}
#Value("${mongo.host:127.0.0.1}")
private String mongoHost;
#Value("${mongo.port:27017}")
private int mongoPort;
#Value("${mongo.db:mytest}")
private String mongoDB;
#Bean
public MongoSagaStore sagaStore() {
return new MongoSagaStore(axonMongoTemplate());
}
#Bean
public TokenStore tokenStore(Serializer serializer) {
return new MongoTokenStore(axonMongoTemplate(), serializer);
}
#Bean
public EventStorageEngine eventStorageEngine(Serializer serializer) {
return new MongoEventStorageEngine(serializer, null, axonMongoTemplate(), new DocumentPerEventStorageStrategy());
}
#Bean
public MongoTemplate axonMongoTemplate() {
return new DefaultMongoTemplate(mongo(), mongoDB);
}
#Bean
public MongoClient mongo() {
MongoFactory mongoFactory = new MongoFactory();
mongoFactory.setMongoAddresses(Collections.singletonList(new ServerAddress(mongoHost, mongoPort)));
return mongoFactory.createMongo();
}
Axon uses that index to guarantee that there are no concurrent actions on Aggregates. After all, an Aggregate is a consistency boundary, and all state changes on it should be atomic and highly consistent.
Changing the index to be non-unique is a bad idea. It would only allow for insertion of events that conflict with other events already in the events store.
Given that the problem seems to be at sequence #0, it may be that you accidentally modelled your @CommandHandler method as a constructor. Axon treats these command handlers specially, by creating a new instance rather than attempting to load an existing one. In your case, an aggregate already exists, as some events were already stored.
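In other words, only the creation command should be handled in the aggregate's constructor; update commands should be handled on instance methods so that Axon loads the existing aggregate and appends the new event at the next sequence number. A rough sketch of that shape (command and event class names are assumptions, not your actual model):
@Aggregate
public class Profile {

    @AggregateIdentifier
    private String profileId;

    protected Profile() {
        // required by Axon
    }

    @CommandHandler
    public Profile(CreateProfileCommand command) {
        // constructor handler: creates the aggregate, event stored at sequence 0
        AggregateLifecycle.apply(new ProfileCreatedEvent(command.getProfileId()));
    }

    @CommandHandler
    public void handle(UpdateProfileCommand command) {
        // instance handler: the aggregate is loaded first, so the event
        // gets the next sequence number instead of clashing at 0
        AggregateLifecycle.apply(new ProfileUpdatedEvent(command.getProfileId()));
    }

    @EventSourcingHandler
    public void on(ProfileCreatedEvent event) {
        this.profileId = event.getProfileId();
    }
}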

Unable to get all Records based on different pageSize and ChunkSize while using JpaPagingItemReader

I need to scope 14 records.
The chunk size is 10 and the page size is 2, but it is scoping only 10 records.
I checked it in different ways: with chunkSize = 5 and pageSize = 10, it still scoped only 10 records, not all 14.
It works fine only if chunkSize = 11 and pageSize = 10, or chunkSize = 10 and pageSize = 20.
build.gradle
partition:
  defaultPartitionSize: 5
  partitionScopeChunkSize: 10
jobs:
  jpaPagingSize: 2
===================ReaderClass============================
public class PagingItemReader extends JpaPagingItemReader<ScopeParams> {

    public PagingItemReader(
            EntityManager entityManager,
            EntityManagerFactory entityManagerFactory,
            @Value("${spring.jobs.jpaPagingSize}") int jpaPagingSize) {
        Map<String, Object> parameterValues = new HashMap<>();
        this.setQueryProvider(
                ScopeParamsQueryProvider.buildForContinuousMatchScoping(
                        entityManager,
                        IndustryCodes.valueFromCode(industryCd)));
        this.setEntityManagerFactory(entityManagerFactory);
        this.setPageSize(jpaPagingSize);
        this.setSaveState(true);
        this.setParameterValues(parameterValues);
    }
}
==============WriterClass==========
// Note: presumably meant to extend Spring Batch's org.springframework.batch.item.database.JpaItemWriter
public class JpaItemWriter<T> extends JpaItemWriter<T> {

    private JpaRepository<T, ? extends Serializable> repository;

    public JpaItemWriter(JpaRepository<T, ?> repository) {
        this.repository = repository;
    }

    @Override
    @Transactional
    public void write(List<? extends T> items) {
        persistEntities(items);
    }

    private void persistEntities(List<? extends T> list) {
        list.stream()
                .peek(item -> log.info("Writing={}", item))
                .forEach(repository::save);
    }
}
===================Step Configuration========
public Step WorkStep(StepBuilderFactory stepBuilderFactory,
                     PagingItemReader ItemReader,
                     ItemProcessor ItemProcessor,
                     JpaItemWriter<Scope> itemWriter) {
    return stepBuilderFactory.get(WORK_MATCH)
            .<Scope, ExecutionScope>chunk(10)
            .reader(ItemReader)
            .processor(ItemProcessor)
            .writer(itemWriter)
            .build();
}
Processor code:
public class MatchItemProcessor implements ItemProcessor<Scope, ExecutionScope> {

    public ExecutionScope process(Scope financialTransaction) throws Exception {
        return prepareData(financialTransaction);
    }

    private ExecutionScope prepareData(Scope financialTransaction) {
        ExecutionScope executionScope = new ExecutionScope();
        executionScope.setIndustryTypeCode(financialTransaction.getIndustryTypeCode());
        return executionScope;
    }
}
In the processor I am updating another object that has the same fields the reading happens on. So I am reading the "Scope" entity in the reader class; in the processor class I create an ExecutionScope object, update its values based on the Scope, and persist the ExecutionScope in the DB.
Both entities point to different tables: ScopeParam hits the fin_t table and ExecutionScope hits the exec_scope table.
Please provide me a suggestion.
The issue has been resolved.
I got help from this link:
Spring batch jpaPagingItemReader why some rows are not read?
Actual Issue
JpaPagingItemReader uses offsets and limits, and if your scoping query's result set gets modified as part of your writer/chunking, then the next page already sees a modified data set and the offset keeps skipping unprocessed data.
Since our scoping query ignores transactions already scoped as part of any active batch, as soon as the first page gets chunked those rows fall out of the result set.
Solution
Modified my scoping query to ignore the current running job.
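For example (purely illustrative; the entity and column names below are assumptions), the scoping condition can be relaxed so that rows claimed by the current job execution stay in the result set and the page offsets remain stable during the run:
// Illustrative JPQL only: rows already claimed by *this* execution are kept,
// so the paged result set does not shrink between pages.
public static final String SCOPING_QUERY =
        "SELECT s FROM ScopeParams s "
      + "WHERE s.scopingJobExecutionId IS NULL "
      + "   OR s.scopingJobExecutionId = :currentJobExecutionId";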

How to trigger multiple "child" Jobs from a "mother" Job using Spring Batch?

I have a Job that looks into a configuration table and I would like it to trigger a Job per configuration entry using the values from the table.
The first Job comprises a single Tasklet that does the config table lookup and then triggers a subsequent Job per entry.
Mother Job code:
@Configuration
public class RunProcessesJobConfig {

    ... // steps, jobs factory init

    @Autowired
    private Tasklet runProcessTask;

    @Bean
    public Step runProcessStep() {
        return steps.get("runProcessStep")
                .tasklet(runProcessTask)
                .build();
    }

    @Bean
    public Job runProcessJob() {
        return jobs.get("runProcessJob")
                .start(runProcessStep())
                .build();
    }
}
The way I'm currently trying to implement it is by Autowiring a JobLauncher and the Job I need into the Tasklet and running the Job from there.
'RunProcessTask' - gets autowired into job config above:
@Autowired
JobLauncher jobLauncher;

@Autowired
Job myJob;

@Override
public RepeatStatus execute(StepContribution sc, ChunkContext cc) throws Exception {
    List<DeployCfg> deployCfgs = da.getJobCfgs();
    for (DeployCfg deployCfg : deployCfgs) {
        String cfg1 = deployCfg.getCfg1();
        String cfg2 = deployCfg.getCfg2();
        String cfg3 = deployCfg.getCfg3();
        // trigger job per config
        JobParameters jobParameters = new JobParametersBuilder()
                .addString("cfg1", cfg1)
                .addString("cfg2", cfg2)
                .addString("cfg3", cfg3)
                .toJobParameters();
        final JobExecution jobExec = jobLauncher.run(myJob, jobParameters);
    }
    return RepeatStatus.FINISHED;
}
When I try executing the mother Job I get a TransactionSuspensionNotSupportedException: Transaction manager [org.springframework.batch.support.transaction.ResourcelessTransactionManager] does not support transaction suspension error on the jobLauncher.run(...) line.
I'm thinking that running a Job within another Job messes with Spring Batch's transaction manager. Any ideas on how to do this?
Additional version info:
spring-boot-starter-parent version 1.5.9.RELEASE
spring-boot-starter-batch
