I have a Spring Batch job which contains a reader -> processor -> writer chain.
The data passed between them is of type Emp:
class Emp {
int id;
String name;
EmpTypeEnum empType; // HR, Dev, Tester, etc.
// getters and setters
}
As a simple batch, data is read from a CSV file in the Reader, some processing happens inside the Processor, and an output CSV file is written by the Writer.
But apart from this output CSV file, I want to generate a secondary output file which only contains the count of each EmpType, i.e. the total number of HR, Dev & Tester employees.
I was thinking of performing the counting within the processor only, like:
public class EmpItemProcessor implements ItemProcessor<Emp, Emp> {

    int countHr;
    int countDev;
    int countTester;

    @Override
    public Emp process(final Emp emp) throws Exception {
        if (emp.getEmpType().equals(EmpTypeEnum.HR)) {
            countHr++;
        } else if // .....
        // other processing on emp
        return emp;
    }
}
But as you can see, I can only return Emp from the Processor, so how can I pass countHr, countDev, etc. out of the processor and use them to create the secondary file?
Please suggest. If you think another approach would be better, please suggest that too.
Thanks
You could use an ItemWriteListener and a JobExecutionListenerSupport for this.
Define an ItemWriteListener, which will be called every time after your writer runs.
In this listener, update a counter in the execution context on each call.
Write a JobExecutionListener which will be called after the whole job is completed, where you can read the value from the execution context and do further processing.
@Component
@JobScope
public class EmployeeWriteListener implements ItemWriteListener<Emp> {

    @Value("#{jobExecution.executionContext}")
    private ExecutionContext executionContext;

    @Override
    public void beforeWrite(final List<? extends Emp> paramList) {
        // no-op
    }

    @Override
    public void afterWrite(final List<? extends Emp> paramList) {
        final int counter =
                this.executionContext.getInt("TOTAL_EXPORTED_ITEMS", 0);
        // afterWrite is called once per written chunk, so add the chunk size to count items
        this.executionContext.putInt("TOTAL_EXPORTED_ITEMS", counter + paramList.size());
    }

    @Override
    public void onWriteError(final Exception exception, final List<? extends Emp> paramList) {
        // no-op
    }
}
@Component
@JobScope
public class EmployeeNotificationListener extends JobExecutionListenerSupport {

    @Override
    public void afterJob(final JobExecution jobExecution) {
        final int count = jobExecution.getExecutionContext()
                .getInt("TOTAL_EXPORTED_ITEMS");
        // ... write the secondary statistics file using the count ...
    }
}
You should register these listeners when you declare your step and job.
this.jobBuilders.get("someJob").incrementer(new RunIdIncrementer())
        .listener(this.employeeNotificationListener) // autowire the listener instead of new(..)
        .flow(this.getSomeStep()).end().build();
public Step getSomeStep() {
    return stepBuilders.get("someStep").<Emp, Emp>chunk(10)
            .reader(this.yourReader).processor(this.yourProcessor)
            .writer(this.yourWriter).listener(this.employeeWriteListener)
            .build();
}
Basically you need multiple ItemWriters to handle the two different writing tasks. You can easily use CompositeItemWriter, which holds a list of different ItemWriters and, for each item, calls all of them.
In your case,
Make two FlatFileItemWriters - one for your normal CSV output and the other for your statistics.
Then create a CompositeItemWriter<Emp> object and add both FlatFileItemWriter<Emp> instances to it using its method public void setDelegates(List<ItemWriter<Emp>> delegates).
Use this CompositeItemWriter as your ItemWriter in the step.
So, when your CompositeItemWriter is called, it will delegate to both ItemWriters in the order you added them to the list, as in the sketch below.
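For illustration, here is a minimal sketch of that wiring (bean names are placeholders; the two delegates' resources and line aggregators are assumed to be configured for your two CSV layouts):

import java.util.Arrays;

import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.support.CompositeItemWriter;
import org.springframework.context.annotation.Bean;

@Bean
public CompositeItemWriter<Emp> compositeEmpWriter(FlatFileItemWriter<Emp> normalCsvWriter,
        FlatFileItemWriter<Emp> statisticsWriter) {
    CompositeItemWriter<Emp> writer = new CompositeItemWriter<>();
    // every chunk is passed to both delegates, in this order
    writer.setDelegates(Arrays.asList(normalCsvWriter, statisticsWriter));
    return writer;
}

Note that the statistics writer still has to aggregate the per-EmpType counts itself (for example in a FlatFileFooterCallback that writes the totals at the end), since CompositeItemWriter only fans each chunk out to its delegates.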
Job done :)
I'm writing a Spring Batch job, and in one of my steps I have the following code for the processor:
@Component
public class SubscriberProcessor implements ItemProcessor<NewsletterSubscriber, Account>, InitializingBean {

    @Autowired
    private AccountService service;

    @Override
    public Account process(NewsletterSubscriber item) throws Exception {
        if (!Strings.isNullOrEmpty(item.getId())) {
            return service.getAccount(item.getId());
        }
        // search with email address
        List<Account> accounts = service.findByEmail(item.getEmail());
        checkState(accounts.size() <= 1, "Found more than one account with email %s", item.getEmail());
        return accounts.isEmpty() ? null : accounts.get(0);
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        Assert.notNull(service, "account service must be set");
    }
}
The above code works, but I've found out that there are some edge cases where having more than one Account per NewsletterSubscriber is allowed. So I need to remove the state check and pass more than one Account to the item writer.
One solution I found is to change both the ItemProcessor and the ItemWriter to deal with the List<Account> type instead of Account, but this has two drawbacks:
Code and tests are uglier and harder to write and maintain because of the nested lists in the writer
Most importantly, more than one Account object may be written in the same transaction, because a list given to the writer may contain multiple accounts, and I'd like to avoid this.
Is there any way, maybe using a listener, or replacing some internal component used by Spring Batch, to avoid lists in the processor?
Update
I've opened an issue on the Spring Jira for this problem.
I'm looking into the isComplete and getAdjustedOutputs methods in FaultTolerantChunkProcessor, which are marked as extension points in SimpleChunkProcessor, to see if I can use them in some way to achieve my goal.
Any hint is welcome.
The ItemProcessor takes one thing in and returns a list:
public class MyItemProcessor implements ItemProcessor<SingleThing, List<ExtractedThingFromSingleThing>> {

    @Override
    public List<ExtractedThingFromSingleThing> process(SingleThing thing) {
        // parse and convert to list
    }
}
Wrap the downstream writer to iron things out. This way stuff downstream from this writer doesn't have to work with lists.
@StepScope
public class ItemListWriter<T> implements ItemWriter<List<T>> {

    private ItemWriter<T> wrapped;

    public ItemListWriter(ItemWriter<T> wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public void write(List<? extends List<T>> items) throws Exception {
        for (List<T> subList : items) {
            wrapped.write(subList);
        }
    }
}
There isn't a way to return more than one item per call to an ItemProcessor in Spring Batch without getting pretty far into the weeds. If you really want to know where the relationship between an ItemProcessor and an ItemWriter exists (not recommended), take a look at the implementations of the ChunkProcessor interface. While the simple case (SimpleChunkProcessor) isn't that bad, if you use any of the fault-tolerant logic (skip/retry via FaultTolerantChunkProcessor), it gets unwieldy very quickly.
A much simpler option would be to move this logic to an ItemReader that does this enrichment before returning the item. Wrap whatever ItemReader you're using in a custom ItemReader implementation that does the service lookup before returning the item. In this case, instead of returning a NewsletterSubscriber from the reader, you'd be returning an Account based on the previous information.
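For illustration, a rough sketch of such a wrapping reader, assuming the AccountService and types from the question. The one-to-many lookup is buffered so each read() still returns a single Account; note this simple version is not restart-safe, since it doesn't implement ItemStream:

import java.util.ArrayDeque;
import java.util.Deque;

import org.springframework.batch.item.ItemReader;

public class AccountEnrichingItemReader implements ItemReader<Account> {

    private final ItemReader<NewsletterSubscriber> delegate;
    private final AccountService service;
    private final Deque<Account> buffer = new ArrayDeque<>();

    public AccountEnrichingItemReader(ItemReader<NewsletterSubscriber> delegate,
            AccountService service) {
        this.delegate = delegate;
        this.service = service;
    }

    @Override
    public Account read() throws Exception {
        // refill the buffer from the delegate until we have accounts or run out of input
        while (buffer.isEmpty()) {
            NewsletterSubscriber subscriber = delegate.read();
            if (subscriber == null) {
                return null; // end of input
            }
            buffer.addAll(service.findByEmail(subscriber.getEmail()));
        }
        return buffer.poll();
    }
}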
Instead of returning an Account, you could return an AccountWrapper or a Collection. The Writer obviously must take this into account :)
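For example, a hypothetical wrapper type; the writer then unpacks it before writing:

import java.util.List;

public class AccountWrapper {

    private final List<Account> accounts;

    public AccountWrapper(List<Account> accounts) {
        this.accounts = accounts;
    }

    public List<Account> getAccounts() {
        return accounts;
    }
}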
You can make a transformer to turn your POJO (the object read from the file) into your entity,
by writing the following code:
public class Intializer {
public static LGInfo initializeEntity() throws Exception {
Constructor<LGInfo> constr1 = LGInfo.class.getConstructor();
LGInfo info = constr1.newInstance();
return info;
}
}
And in your ItemProcessor:
public class LgItemProcessor implements ItemProcessor<LgBulkLine, LGInfo> {

    private static final Log log = LogFactory.getLog(LgItemProcessor.class);

    @Override
    public LGInfo process(LgBulkLine item) throws Exception {
        log.info(item);
        return Intializer.initializeEntity();
    }
}
I hope you're doing great.
I'm facing a design problem in Spring Batch.
Let me explain:
I have a modular Spring Batch job architecture;
each job has its own config file and context.
I am designing a master job to launch the subjobs (50+ types of subjobs).
Each X obj has, among other fields, a name, a state, and a blob which contains the CSV file attached to it.
Each X obj will be updated after being processed.
I first followed the approach of fetching all X objs and then looping (in a Java stream) to call the appropriate job.
But this approach has a lot of limitations.
So I designed a masterJob with a reader, processor, and writer.
The masterJob should read X objs, call the appropriate subJob for each, and then update the state of the X obj.
The masterJobReader calls a custom service to get a list of, let's say, X objs.
I started by trying to launch the subJob from within the masterJob processor, but it did not work.
I did some research and found that JobStep could be more adequate for this scenario.
But I'm stuck on how to pass the item read by the masterJobReader to the JobStep as a parameter.
I did see DefaultJobParametersExtractor and I tried to set the item read into the stepExecutionContext, but it's not working.
My question is: how do I pass a parameter from the MasterJob to the SubJob using the JobStep approach?
If there is better way to deal with this then I'm all yours!
I'm using Java Config and spring batch 4.3.
Edit to provide sample code:
@Configuration
public class MasterJob {
@Value("${defaultCompletionPolicy}")
private Integer defaultCompletionPolicy;
@Autowired
protected StepBuilderFactory masterStepBuilderFactory;
private Logger logger = LoggerFactory.getLogger(MasterJob.class);
@Autowired
protected JobRepository jobRepo;
@Autowired
protected PlatformTransactionManager transactionManager;
@Autowired
@Qualifier("JOB_NAME1")
private Job JOB_NAME1; // this should change to be dynamic as there are around 50 types of job
#Bean(name = "masterJob")
protected Job masterBatchJob() throws ApiException {
return new JobBuilderFactory(jobRepo).get("masterJob")
.incrementer(new RunIdIncrementer())
.start(masterJobStep(masterJobReader(), masterJobWriter()))
.next(jobStepJobStep1(null))
.next(masterUpdateStep()) // update the state of objX
.build();
}
#Bean(name = "masterJobStep")
protected Step masterJobStep(#Qualifier("masterJobReader") MasterJobReader masterReader,
#Qualifier("masterJobWriter") MasterJobWriter masterWriter) throws ApiException {
logger.debug("inside masterJobStep");
return this.masterStepBuilderFactory.get("masterJobStep")
.<Customer, Customer>chunk(defaultCompletionPolicy)
.reader(masterJobReader())
.processor(masterJobProcessor())
.writer(masterJobWriter())
.transactionManager(transactionManager)
.listener(new MasterJobWriter()) // I set the parameter inside this.
.listener(masterPromotionListener())
.build();
}
#Bean(name = "masterJobWriter", destroyMethod = "")
#StepScope
protected MasterJobWriter masterJobWriter() {
return new MasterJobWriter();
}
#Bean(name = "masterJobReader", destroyMethod = "")
#StepScope
protected MasterJobReader masterJobReader() throws ApiException {
return new MasterJobReader();
}
protected FieldSetMapper<Customer> mapper() {
return new CustomerMapper();
}
#Bean(name="masterPromotionListener")
public ExecutionContextPromotionListener masterPromotionListener() {
ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
listener.setKeys(
new String[]
{
"inputFile",
"outputFile",
"customerId",
"comments",
"customer"
});
//listener.setStrict(true);
return listener;
}
#Bean(name = "masterUpdateStep")
public Step masterUpdateStep() {
return this.masterStepBuilderFactory.get("masterCleanStep").tasklet(new MasterUpdateTasklet()).build();
}
#Bean(name = "masterJobProcessor", destroyMethod = "")
#StepScope
protected MasterJobProcessor masterJobProcessor() {
return new MasterJobProcessor();
}
@Bean
public Step jobStepJobStep1(JobLauncher jobLauncher) {
return this.masterStepBuilderFactory.get("jobStepJobStep1")
.job(JOB_NAME1)
.launcher(jobLauncher)
.parametersExtractor(jobParametersExtractor())
.build();
}
@Bean
public DefaultJobParametersExtractor jobParametersExtractor() {
DefaultJobParametersExtractor extractor = new DefaultJobParametersExtractor();
extractor.setKeys(
new String[] { "inputFile", "outputFile", , "customerId", "comments", "customer" });
return extractor;
}
}
This is how I set the parameters from within the MasterJobWriter:
String inputFile = fetchInputFile(customer);
String outputFile = buildOutputFileName(customer);
Comments comments = ...; // from business logic
ExecutionContext stepContext = this.stepExecution.getExecutionContext();
stepContext.put("inputFile", inputFile);
stepContext.put("outputFile", outputFile);
stepContext.put("customerId", customer.getCustomerId());
stepContext.put("comments", new CustomJobParameter<Comments>(comments));
stepContext.put("customer", new CustomJobParameter<Customer>(customer));
I followed this section of the Spring Batch documentation.
My question how to pass parameter from MasterJob to SubJob using JobStep approach?
The JobParametersExtractor is what you are looking for. It allows you to extract parameters from the main job and pass them to the subjob. You can find an example here.
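For illustration, a rough sketch of a custom extractor, assuming the keys were promoted to the job execution context by your promotion listener and that the promoted values are strings:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.step.job.JobParametersExtractor;
import org.springframework.batch.item.ExecutionContext;

public class CustomerJobParametersExtractor implements JobParametersExtractor {

    @Override
    public JobParameters getJobParameters(Job job, StepExecution stepExecution) {
        // read the values the master step promoted to the job execution context
        ExecutionContext jobContext = stepExecution.getJobExecution().getExecutionContext();
        return new JobParametersBuilder()
                .addString("inputFile", jobContext.getString("inputFile"))
                .addString("outputFile", jobContext.getString("outputFile"))
                .addString("customerId", jobContext.getString("customerId"))
                .toJobParameters();
    }
}

You would then plug this in via .parametersExtractor(new CustomerJobParametersExtractor()) in your jobStepJobStep1 bean, in place of the DefaultJobParametersExtractor.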
EDIT: Adding suggestions based on comments
I have a list of X obj in the DB. X obj has among other fields, id, type(of work), name, state and blob which contains the csv file attached to it. The blob field containing the csv file depends on the type field so it's not one pattern csv file. I need to process each X obj and save the content of the csv file in the DB and generate a csv result file containing the original data plus a comment field in the result csv file and update X obj state with the result csv field attached to X obj and other fields.
As you can see, the process is already complex for a single X object, so trying to process all X objects in the same job of jobs is too complex, IMHO. So much complexity in software comes from trying to make one thing do two things.
If there is better way to deal with this then I'm all yours!
Since you are open for suggestions, I will recommend two options:
Option 1:
If it were up to me, I would create a job instance per X obj. This way, I can 1) parallelize things and 2) in case of failure, restart only the failed job. These two characteristics (scalability and restartability) are almost impossible with the job-of-jobs approach. Even if you have a lot of X objects, this is not a problem: you can use one of the scaling techniques provided by Spring Batch to process things in parallel.
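A rough sketch of that launch loop, with a hypothetical XObj type; the identifying xObjId parameter is what gives you one restartable job instance per object:

import java.util.List;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public void launchPerObject(List<XObj> objects, JobLauncher jobLauncher, Job processingJob)
        throws Exception {
    for (XObj x : objects) {
        // identifying parameter -> one job instance per X obj,
        // so a failed object can be restarted on its own
        JobParameters params = new JobParametersBuilder()
                .addString("xObjId", String.valueOf(x.getId()))
                .toJobParameters();
        jobLauncher.run(processingJob, params);
    }
}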
Option 2:
If you really can't or don't want to use different job instances, you can use a single job with a chunk-oriented step that iterates over X objects list. The processing logic seems independent from one record to another, so this step should be easily scalable with multiple threads.
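For illustration, a minimal sketch of such a multi-threaded chunk-oriented step (types and bean names are placeholders, and the reader must be thread-safe for this to be correct):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Bean
public Step xObjStep(StepBuilderFactory steps,
        ItemReader<XObj> reader,
        ItemProcessor<XObj, XObj> processor,
        ItemWriter<XObj> writer) {
    return steps.get("xObjStep")
            .<XObj, XObj>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            // chunks are processed concurrently on separate threads
            .taskExecutor(new SimpleAsyncTaskExecutor("xobj-"))
            .throttleLimit(4) // cap the number of concurrent chunk workers
            .build();
}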
I am trying to add data using the CQRS framework Axon. But when hitting the API (used to add an order), I am getting the error below:
Command 'com.cqrs.order.commands.CreateOrderCommand' resulted in org.axonframework.modelling.command.AggregateNotFoundException(The aggregate was not found in the event store)
But I already have an Aggregate in my code (OrderAggregate.java).
The Full code can be found at - https://github.com/iftekharkhan09/OrderManagementSystem
API to add Order - http://localhost:8080/confirmOrder
Request Body:-
{
"studentName":"Sunny Khan"
}
Can anyone please tell me where I am going wrong?
Any help is appreciated!
For other readers, let me share the Aggregate you've created in your repository:
@Aggregate
public class OrderAggregate {
public OrderAggregate(OrderRepositoryData orderRepositoryData) {
this.orderRepositoryData = orderRepositoryData;
}
@AggregateIdentifier
private Integer orderId;
private OrderRepositoryData orderRepositoryData;
@CommandHandler
public void handle(CreateOrderCommand command) {
apply(new OrderCreatedEvent(command.getOrderId()));
}
@EventSourcingHandler
public void on(OrderCreatedEvent event) {
this.orderId=event.getOrderId();
Order order=new Order("Order New");
orderRepositoryData.save(order);
}
protected OrderAggregate() {
// Required by Axon to build a default Aggregate prior to Event Sourcing
}
}
There are several things you can remove entirely from this Aggregate, which are:
The OrderRepositoryData
The OrderAggregate constructor which sets the OrderRepositoryData
The manual saving of an Order in the @EventSourcingHandler annotated function
What you're doing here is mixing the Command Model's concern of making decisions with creating a queryable Order for the Query Model. It would be better to remove this logic entirely from an Aggregate (the Command Model in your example) and move this to an Event Handling Component.
This is however not the culprit for the AggregateNotFoundException you're receiving.
What you've missed is making the CreateOrderCommand command handler a constructor.
The CreateOrderCommand will create an Order, as its name already suggests.
Hence, it should be handled by a constructor rather than a regular method.
So, instead of this:
@CommandHandler
public void handle(CreateOrderCommand command) {
apply(new OrderCreatedEvent(command.getOrderId()));
}
You should be doing this:
@CommandHandler
public OrderAggregate(CreateOrderCommand command) {
apply(new OrderCreatedEvent(command.getOrderId()));
}
Hope this helps you out, @Sunny!
aggregate not found in the event store
The main reason for this exception is that the aggregate must be created first before Axon can load and save it, so handle the creation command in a constructor:
@CommandHandler
public OrderAggregate(CreateOrderCommand command) {
apply(new OrderCreatedEvent(command.getOrderId()));
}
Also, this way your
private OrderRepositoryData orderRepositoryData;
won't be initialized, so autowire the orderRepositoryData as well:
@Autowired
private OrderRepositoryData orderRepositoryData;
For subsequent events you should use the same orderId, otherwise it will also throw:
handleThrowable(java.lang.Throwable,org.springframework.web.context.request.WebRequest)
org.axonframework.modelling.command.AggregateNotFoundException: The aggregate was not found in the event store
at org.axonframework.eventsourcing.EventSourcingRepository.doLoadWithLock(EventSourcingRepository.java:122)
I am a newbie in Spring Batch. The tasks I need to achieve in Spring Batch are as follows:
Read some metadata from the database.
Based on this metadata, read some files.
After some processing, write those values from the files to the database.
My queries are the following:
a. For the 1st requirement, I needed to map the whole result set to a single object, where Person-related data is in one table and Pet-related data is in another table, joined by person id.
public class PersonPetDetails {
    private String personName;
    private String personAddr;
    private int personAge;
    private List<Pet> pets;
    // getters and setters
}
For this I have written a custom item reader which extends JdbcCursorItemReader:
public class CustomJDBCCusrorItemReader<T> extends JdbcCursorItemReader<T> {
private ResultSetExtractor<T> resultSetExtractor;
public void setResultSetExtractor(ResultSetExtractor<T> resultSetExtractor) {
this.resultSetExtractor = resultSetExtractor;
}
@Override
public void afterPropertiesSet() throws Exception {
setVerifyCursorPosition(false);
Assert.notNull(getDataSource(), "DataSource must be provided");
Assert.notNull(getSql(), "The SQL query must be provided");
Assert.notNull(resultSetExtractor, "ResultSetExtractor must be provided");
}
@Override
protected T readCursor(ResultSet rs, int currentRow) throws SQLException {
return resultSetExtractor.extractData(rs);
}
}
Is this the correct way to achieve my requirement? Or is there a much better way?
b. AFAIK, in Spring Batch there cannot be a step with just a reader and no writer. Hence, I cannot call another set of readers in a different step of the job. So how can I call multiple readers in a single step?
c. Also, based on some condition I may need to call a third set of readers. How can I conditionally call a reader in a step?
Thanks for going through my post. I know it is long. Any help is much appreciated. Also, I guess an example code snippet would help me better understand the point. :)
I would recommend the design below.
High Level Design:
Partitioner
It will deal with the list of persons. Note: no Pet data is pulled at this point.
Reader
It will get the list of Pets which belong to a Person. Note: the Reader will return a list of Pets specific to one Person only.
Processor
Based on a Pet-Person pair, you will process according to your requirements.
Writer
Based on your requirements, write to the DB.
Low Level Code snippet:
Partitioner
public class PetPersonPartitioner implements Partitioner {
@Autowired
private PersonDAO personDAO;
@Override
public Map<String, ExecutionContext> partition(int gridSize) {
Map<String, ExecutionContext> queue = new HashMap<String, ExecutionContext>();
List<Person> personList = this.personDAO.getAllPersons();
for (Person person : personList) {
ExecutionContext ec = new ExecutionContext();
ec.put("person", person);
ec.put("personId", person.getId());
queue.put(person.getId(), ec);
}
return queue;
}
}
Reader
<bean id="petByPersonIdRowMapper" class="yourpackage.PetByPersonIdRowMapper" />
<bean id="petByPesonIdStatementSetter" scope="step"
class="org.springframework.batch.core.resource.ListPreparedStatementSetter">
<property name="parameters">
<list>
<value>#{stepExecutionContext['personId']}</value>
</list>
</property>
</bean>
public class PetByPersonIdRowMapper implements RowMapper<PersonPetDetails> {

    @Override
    public PersonPetDetails mapRow(ResultSet rs, int rowNum) throws SQLException {
        PersonPetDetails record = new PersonPetDetails();
        record.setPersonId(rs.getLong("personId"));
        record.setPetId(rs.getLong("petid"));
        ...
        ...
        return record;
    }
}
Processor
You can continue working on each PersonPetDetails object.
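To tie the pieces together, a rough sketch of the partition step wiring in Java config (bean names, chunk size, and grid size are placeholders; the worker reader stands for the step-scoped JDBC reader configured above):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Bean
public Step masterStep(StepBuilderFactory steps,
        PetPersonPartitioner partitioner,
        Step workerStep) {
    return steps.get("masterStep")
            .partitioner("workerStep", partitioner) // one partition per person
            .step(workerStep)
            .gridSize(4)
            .taskExecutor(new SimpleAsyncTaskExecutor())
            .build();
}

@Bean
public Step workerStep(StepBuilderFactory steps,
        ItemReader<PersonPetDetails> petByPersonIdReader,
        ItemProcessor<PersonPetDetails, PersonPetDetails> processor,
        ItemWriter<PersonPetDetails> writer) {
    return steps.get("workerStep")
            .<PersonPetDetails, PersonPetDetails>chunk(100)
            .reader(petByPersonIdReader)
            .processor(processor)
            .writer(writer)
            .build();
}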
I am trying to implement something along the lines of the template method pattern within some Java EE beans that do some processing work on a dataset.
Each processing bean takes the Job object, does some work, then returns the updated Job for the next bean (about 10 in total); each bean has a single method with the same name (doProcessing) and a single argument (the Job).
I would like to perform some logging at the start and end of each bean's doProcessing method, so that at the end of processing the Job contains logging info from each bean (stored in a hashmap or suchlike).
My current implementation looks something like this...
@Stateless
public class ProcessingTaskOne {

    public void doProcessing(Job job) {
        // always called at beginning of method
        String beanReport = "Info from Task 1: ";
        for (int i = 0; i < job.getDataArray().size(); i++) {
            beanReport += "\n some more info";
            // Do some processing work here
        }
        // always called at end of method
        job.addNewReportSection(beanReport);
    }
}
But I know that I can do better than this. Using inheritance, I should be able to create a superclass along the lines of...
public abstract class Reportable {

    protected String sectionReport;

    public void preProcessing(Job job) {
        // Setup bean report, use reflection to get subclass name
    }

    public void postProcessing(Job job) {
        // Finish bean report and append to job
        job.addNewReportSection(sectionReport);
    }

    // not sure how this should work
    public abstract void doProcessing(Job job);
}
And any class that extends the superclass will automatically perform the pre/postprocessing actions...
@Stateless
public class ProcessingTaskOne extends Reportable {

    @Override
    public void doProcessing(Job job) {
        for (int i = 0; i < job.getDataArray().size(); i++) {
            super.sectionReport += "log some info";
            // Do some processing work here
        }
    }
}
But I have been unable to work out how to implement this exactly, since all the examples refer to POJOs, and as my beans are @Stateless there is no constructor.
Can someone provide some guidance on this? Am I barking up the wrong tree?
If I understood your question correctly, you can try the following:
public abstract class Reporting {
public void setUp(Job job) {
// set things up
}
public void tearDown(Job job) {
// post-processing stuff
}
public void process(Job job) {
setUp(job);
doProcessing(job);
tearDown(job);
}
public abstract void doProcessing(Job job);
}
public class Processor1 extends Reporting {
@Override
public void doProcessing(Job job) {
// business logic
}
}
and later, somewhere in your code, you should call not the doProcessing(), but rather the process() method of your base class.
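For example, a hypothetical caller:

// process() runs setUp, the subclass's doProcessing, and tearDown in order
Reporting task = new Processor1();
task.process(job);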
Also, in case my interpretation was correct, you might be interested in using an aspect-oriented programming framework like AspectJ or Spring AOP.