Spring Batch: Multiple Item Readers in a single step

I am a newbie in Spring Batch. The task I need to achieve in Spring Batch is as follows:
Need to read some metadata from database.
Based on this metadata, I need to read some files.
After some processing, need to write those values from file to database.
My queries are the following:
a. For the 1st requirement, I need to map the whole result set to a single object, where Person-related data is in one table and Pet-related data is in another table, joined by person id.
public class PersonPetDetails {
private String personName;
private String personAddr;
private int personAge;
private List<Pet> pets;
}
For this I have written a custom Item reader which extends JdbcCursorItemReader.
public class CustomJDBCCusrorItemReader<T> extends JdbcCursorItemReader<T> {
private ResultSetExtractor<T> resultSetExtractor;
public void setResultSetExtractor(ResultSetExtractor<T> resultSetExtractor) {
this.resultSetExtractor = resultSetExtractor;
}
@Override
public void afterPropertiesSet() throws Exception {
setVerifyCursorPosition(false);
Assert.notNull(getDataSource(), "DataSource must be provided");
Assert.notNull(getSql(), "The SQL query must be provided");
Assert.notNull(resultSetExtractor, "ResultSetExtractor must be provided");
}
@Override
protected T readCursor(ResultSet rs, int currentRow) throws SQLException {
return resultSetExtractor.extractData(rs);
}
}
Is this the correct way to achieve my requirement? Or is there a much better way?
b. AFAIK, in Spring Batch there cannot be a step with just a reader and no writer. Hence, I cannot put another reader in a different step of the Job. So how can I call multiple readers in a single step?
c. Also, based on some condition I may need to call a third reader. How can I conditionally call a reader in a step?
Thanks for going through my post. I know it is long. Any help is much appreciated. Also, I guess an example code snippet would help me better understand the point. :)

I would recommend the approach below.
High Level Design:
Partitioner
It will deal with the list of persons. Note: no Pet data is pulled at this point.
Reader
It will get the list of Pets belonging to a Person. Note: the reader will return Pets specific to a single Person only.
Processor
Based on a Pet-Person pair, you will process according to your requirement.
Writer
Writes to the DB based on your requirement.
Low Level Code snippet:
Partitioner
public class PetPersonPartitioner implements Partitioner {
@Autowired
private PersonDAO personDAO;
@Override
public Map<String, ExecutionContext> partition(int gridSize) {
Map<String, ExecutionContext> queue = new HashMap<String, ExecutionContext>();
List<Person> personList = this.personDAO.getAllPersons();
for (Person person : personList) {
ExecutionContext ec = new ExecutionContext();
ec.put("person", person);
ec.put("personId", person.getId());
queue.put(person.getId(), ec);
}
return queue;
}
}
Reader
<bean id="petByPersonIdRowMapper" class="yourpackage.PetByPersonIdRowMapper" />
<bean id="petByPesonIdStatementSetter" scope="step"
class="org.springframework.batch.core.resource.ListPreparedStatementSetter">
<property name="parameters">
<list>
<value>#{stepExecutionContext['personId']}</value>
</list>
</property>
</bean>
public class PetByPersonIdRowMapper implements RowMapper<PersonPetDetails> {
@Override
public PersonPetDetails mapRow(ResultSet rs, int rowNum) throws SQLException {
PersonPetDetails record = new PersonPetDetails();
record.setPersonId(rs.getLong("personId"));
record.setPetId(rs.getLong("petid"));
...
return record;
}
}
Processor
You can continue working on each PersonPetDetails object.
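To tie these pieces together, here is a minimal sketch (assuming Java config) of how the partitioned step could be wired. The step and bean names (masterStep, slaveStep, petReader, petProcessor, petWriter) and the chunk size are illustrative assumptions, not part of the answer above.

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
public class PetPersonStepConfig {

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    // Master step: PetPersonPartitioner creates one ExecutionContext per person.
    @Bean
    public Step masterStep(PetPersonPartitioner partitioner, Step slaveStep) {
        return stepBuilderFactory.get("masterStep")
                .partitioner("slaveStep", partitioner)
                .step(slaveStep)
                .gridSize(4) // how many partitions run concurrently
                .taskExecutor(new SimpleAsyncTaskExecutor())
                .build();
    }

    // Slave step: the step-scoped reader resolves #{stepExecutionContext['personId']} per partition.
    @Bean
    public Step slaveStep(ItemReader<PersonPetDetails> petReader,
                          ItemProcessor<PersonPetDetails, PersonPetDetails> petProcessor,
                          ItemWriter<PersonPetDetails> petWriter) {
        return stepBuilderFactory.get("slaveStep")
                .<PersonPetDetails, PersonPetDetails>chunk(10)
                .reader(petReader)
                .processor(petProcessor)
                .writer(petWriter)
                .build();
    }
}

This is also one way to think about questions b and c from the post: instead of multiple readers in one step, the partitioner decides what each slave step execution reads, so conditional reading becomes a matter of what the partitioner puts into each ExecutionContext.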

Related

Spring batch: reader gave one item, processor have to extract many from it [duplicate]

I'm writing a spring batch job and in one of my step I have the following code for the processor:
@Component
public class SubscriberProcessor implements ItemProcessor<NewsletterSubscriber, Account>, InitializingBean {
@Autowired
private AccountService service;
@Override public Account process(NewsletterSubscriber item) throws Exception {
if (!Strings.isNullOrEmpty(item.getId())) {
return service.getAccount(item.getId());
}
// search with email address
List<Account> accounts = service.findByEmail(item.getEmail());
checkState(accounts.size() <= 1, "Found more than one account with email %s", item.getEmail());
return accounts.isEmpty() ? null : accounts.get(0);
}
@Override public void afterPropertiesSet() throws Exception {
Assert.notNull(service, "account service must be set");
}
}
The above code works but I've found out that there are some edge cases where having more than one Account per NewsletterSubscriber is allowed. So I need to remove the state check and to pass more than one Account to the item writer.
One solution I found is to change both ItemProcessor and ItemWriter to deal with the List<Account> type instead of Account, but this has two drawbacks:
Code and tests are uglier and harder to write and maintain because of nested lists in writer
Most importantly, more than one Account object may be written in the same transaction because a list given to the writer may contain multiple accounts, and I'd like to avoid this.
Is there any way, maybe using a listener, or replacing some internal component used by spring batch to avoid lists in processor?
Update
I've opened an issue on spring Jira for this problem.
I'm looking into isComplete and getAdjustedOutputs methods in FaultTolerantChunkProcessor which are marked as extension points in SimpleChunkProcessor to see if I can use them in some way to achieve my goal.
Any hint is welcome.
The ItemProcessor takes one thing in and returns a list:
public class MyItemProcessor implements ItemProcessor<SingleThing, List<ExtractedThingFromSingleThing>> {
public List<ExtractedThingFromSingleThing> process(SingleThing thing) {
//parse and convert to list
}
}
Wrap the downstream writer to iron things out. This way stuff downstream from this writer doesn't have to work with lists.
@StepScope
public class ItemListWriter<T> implements ItemWriter<List<T>> {
private ItemWriter<T> wrapped;
public ItemListWriter(ItemWriter<T> wrapped) {
this.wrapped = wrapped;
}
@Override
public void write(List<? extends List<T>> items) throws Exception {
for (List<T> subList : items) {
wrapped.write(subList);
}
}
}
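For completeness, a rough sketch of how the list-producing processor and the wrapping writer might be wired into a step. This is a fragment assuming Java config; the bean names and chunk size are made up for illustration.

@Bean
public Step accountStep(StepBuilderFactory stepBuilderFactory,
                        ItemReader<NewsletterSubscriber> reader,
                        ItemProcessor<NewsletterSubscriber, List<Account>> processor,
                        ItemWriter<Account> accountWriter) {
    return stepBuilderFactory.get("accountStep")
            // each item reaching the writer is the List<Account> produced for one subscriber
            .<NewsletterSubscriber, List<Account>>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(new ItemListWriter<>(accountWriter)) // unwraps each list for the real writer
            .build();
}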
There isn't a way to return more than one item per call to an ItemProcessor in Spring Batch without getting pretty far into the weeds. If you really want to know where the relationship between an ItemProcessor and ItemWriter exists (not recommended), take a look at the implementations of the ChunkProcessor interface. While the simple case (SimpleChunkProcessor) isn't that bad, if you use any of the fault tolerant logic (skip/retry via FaultTolerantChunkProcessor), it gets very unwieldy very quickly.
A much simpler option would be to move this logic to an ItemReader that does this enrichment before returning the item. Wrap whatever ItemReader you're using in a custom ItemReader implementation that does the service lookup before returning the item. In this case, instead of returning a NewsletterSubscriber from the reader, you'd be returning an Account based on the previous information.
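As a rough sketch of that idea (the class name, the buffering, and the email-based lookup shown here are illustrative additions, not something prescribed by the answer), a wrapping reader could look like this:

import java.util.ArrayDeque;
import java.util.Deque;
import org.springframework.batch.item.ItemReader;

public class EnrichingAccountReader implements ItemReader<Account> {

    private final ItemReader<NewsletterSubscriber> delegate;
    private final AccountService service;
    private final Deque<Account> buffer = new ArrayDeque<>();

    public EnrichingAccountReader(ItemReader<NewsletterSubscriber> delegate, AccountService service) {
        this.delegate = delegate;
        this.service = service;
    }

    @Override
    public Account read() throws Exception {
        // refill the buffer from the delegate until there is something to emit
        while (buffer.isEmpty()) {
            NewsletterSubscriber item = delegate.read();
            if (item == null) {
                return null; // delegate exhausted, end of step input
            }
            // the same id-vs-email lookup as the original processor could go here
            buffer.addAll(service.findByEmail(item.getEmail()));
        }
        // one Account at a time, so downstream components never see lists
        return buffer.poll();
    }
}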
Instead of returning an Account, you could return an AccountWrapper or a Collection. The writer obviously must take this into account :)
You can make a transformer to transform your POJO (the POJO object from the file) into your entity by writing the following code:
public class Intializer {
public static LGInfo initializeEntity() throws Exception {
Constructor<LGInfo> constr1 = LGInfo.class.getConstructor();
LGInfo info = constr1.newInstance();
return info;
}
}
And in your item Processor
public class LgItemProcessor<LgBulkLine, LGInfo> implements ItemProcessor<LgBulkLine, LGInfo> {
private static final Log log = LogFactory.getLog(LgItemProcessor.class);
@SuppressWarnings("unchecked")
@Override
public LGInfo process(LgBulkLine item) throws Exception {
log.info(item);
return (LGInfo) Intializer.initializeEntity();
}
}

How to use non-keyed state with Kafka Consumer in Flink?

I'm trying to implement (I'm just starting to work with Java and Flink) non-keyed state in a KafkaConsumer object, since at this stage no keyBy() is called. This object is the front end and the first module to handle messages from Kafka.
SourceOutput is a proto file representing the message.
I have the KafkaConsumer object :
public class KafkaSourceFunction extends ProcessFunction<byte[], SourceOutput> implements Serializable
{
@Override
public void processElement(byte[] bytes, ProcessFunction<byte[], SourceOutput>.Context
context, Collector<SourceOutput> collector) throws Exception
{
// Here, I want to call to sorting method
collector.collect(output);
}
}
I have an object (KafkaSourceSort) that does all the sorting; it should keep the unordered messages in a PriorityQueue in the state and is also responsible for delivering a message through the collector if it arrives in the right order.
class SessionInfo
{
public PriorityQueue<SourceOutput> orderedMessages = null;
public void putMessage(SourceOutput Msg)
{
if(orderedMessages == null)
orderedMessages = new PriorityQueue<SourceOutput>(new SequenceComparator());
orderedMessages.add(Msg);
}
}
public class KafkaSourceState implements Serializable
{
public TreeMap<String, SessionInfo> Sessions = new TreeMap<>();
}
I read that I need to use non-keyed state (ListState), which should contain a map of sessions, where each session contains a PriorityQueue holding all messages related to that session.
I found an example, so I implemented this:
public class KafkaSourceSort implements SinkFunction<KafkaSourceSort>,
CheckpointedFunction
{
private transient ListState<KafkaSourceState> checkpointedState;
private KafkaSourceState state;
@Override
public void snapshotState(FunctionSnapshotContext functionSnapshotContext) throws Exception
{
checkpointedState.clear();
checkpointedState.add(state);
}
@Override
public void initializeState(FunctionInitializationContext context) throws Exception
{
ListStateDescriptor<KafkaSourceState> descriptor =
new ListStateDescriptor<KafkaSourceState>(
"KafkaSourceState",
TypeInformation.of(new TypeHint<KafkaSourceState>() {}));
checkpointedState = context.getOperatorStateStore().getListState(descriptor);
if (context.isRestored())
{
state = (KafkaSourceState) checkpointedState.get();
}
}
@Override
public void invoke(KafkaSourceState value, SinkFunction.Context contex) throws Exception
{
state = value;
// ...
}
}
I see that I need to implement an invoke method, which will probably be called from processElement(), but the signature of invoke() doesn't contain the collector, and I don't understand how to do this, or even whether what I have done so far is OK.
Any help will be appreciated.
Thanks.
A SinkFunction is a terminal node in the DAG that is your job graph. It doesn't have a Collector in its interface because it cannot emit anything downstream. It is expected to connect to an external service or data store and send data there.
If you share more about what you are trying to accomplish perhaps we can offer more assistance. There may be an easier way to go about this.
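To make that concrete, here is one possible direction as a sketch only (it is not taken from the answer): keep the buffering inside the ProcessFunction itself and let it implement CheckpointedFunction for the non-keyed state, so the same operator holds the out-of-order messages and emits them through its Collector. The sessionKey and isNextInSequence helpers and the SourceOutput.parseFrom call are placeholders/assumptions.

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class SortingKafkaSourceFunction extends ProcessFunction<byte[], SourceOutput>
        implements CheckpointedFunction {

    private transient ListState<KafkaSourceState> checkpointedState;
    private KafkaSourceState state = new KafkaSourceState();

    @Override
    public void processElement(byte[] bytes, Context context, Collector<SourceOutput> collector) throws Exception {
        SourceOutput msg = SourceOutput.parseFrom(bytes); // assumes a protobuf-style parser
        SessionInfo session = state.Sessions.computeIfAbsent(sessionKey(msg), k -> new SessionInfo());
        session.putMessage(msg);
        // emit everything that is now in order for this session
        while (isNextInSequence(session)) {
            collector.collect(session.orderedMessages.poll());
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        checkpointedState.clear();
        checkpointedState.add(state);
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        checkpointedState = context.getOperatorStateStore().getListState(
                new ListStateDescriptor<>("KafkaSourceState",
                        TypeInformation.of(new TypeHint<KafkaSourceState>() {})));
        if (context.isRestored()) {
            for (KafkaSourceState restored : checkpointedState.get()) {
                state = restored; // this sketch keeps a single state object in the list state
            }
        }
    }

    // Placeholders: deriving the session id and detecting the next expected message
    // depend on the SourceOutput proto and are not shown here.
    private String sessionKey(SourceOutput msg) {
        throw new UnsupportedOperationException("derive the session id from the message");
    }

    private boolean isNextInSequence(SessionInfo session) {
        throw new UnsupportedOperationException("compare the queue head with the expected sequence number");
    }
}

If the messages can be keyed by session id, keyBy() plus keyed state would usually be the more idiomatic Flink approach, but this sketch stays with the non-keyed operator state the question asks about.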

Design approach when a class has many dependencies but requires to use only some of them based on some condition

I have multiple classes which implement an interface and return an Object.
public interface DataFetcher {
Data getData(Info info);
}
public class Data {
private String name;
private String value;
}
@Component
public class DataPointA implements DataFetcher {
@Override
public Data getData(Info info) {
//..Do some processing
return new Data("SomeName", valueComputed);
}
}
Now I have about 20 data points which implement the DataFetcher interface and return the Data object.
I autowire all the data points to a class and based on certain conditions I use certain data points.
@Component
public class DataComputer {
@Autowired
private DataPointA dataPointA;
@Autowired
private DataPointB dataPointB;
.
.
.
public void computeData(String inputType, Info info) {
List<DataFetcher> dataFetchers;
switch(inputType) {
case "typeA" : dataFetchers = ImmutableList.of(dataPointA, dataPointB);
break;
.
.
.
case "typeD" : dataFecthers = ImmutableList.of(dataPointE, dataPointF, dataPointG);
break;
}
dataFetchers.forEach(dataPoint -> {
//Do some processing with dataPoint.getData(info)
});
}
}
As can be seen, the DataComputer class will have a whole list of dependencies, which can become unmanageable. Also, the data points to be used for each inputType are known beforehand, so this can be extracted out. This was my attempt at doing it:
@Component
public class DataComputationPointDecider {
@Autowired
private DataPointA dataPointA;
@Autowired
private DataPointB dataPointB;
.
.
.
@Bean
public Map<String, List<DataFetcher>> getDataComputationPoints() {
return new ImmutableMap.Builder<String, List<DataFetcher>>()
.put("typeA", ImmutableList.of(dataPointA, dataPointB))
.put("typeD", ImmutableList.of(dataPointE, dataPointF, dataPointG))
.build();
}
}
And then my DataComputer dependencies reduces:
@Component
public class DataComputer {
@Autowired
private Map<String, List<DataFetcher>> dataComputationPoints;
public void computeData(String inputType, Info info) {
List<DataFetcher> dataFetchers = dataComputationPoints.get(inputType);
dataFetchers.forEach(dataPoint -> {
//Do some processing with dataPoint.getData(info)
});
}
}
Is there a better way to design this?
I don't see anything majorly wrong in your approach. But I'm suggesting one more option.
Instead of maintaining a map that maps an inputType to a list of DataFetchers, you can make each DataFetcher decide or say what input type(s) it can handle.
But this requires changing the DataFetcher interface as follows:
public interface DataFetcher {
boolean canHandle(String inputType);
Data getData(Info info);
}
The implementations would look like
@Component
public class DataPointA implements DataFetcher {
@Override
public boolean canHandle(String inputType) {
return "typeA".equals(inputType);
}
@Override
public Data getData(Info info) {
//..Do some processing
return new Data("SomeName", valueComputed);
}
}
Then you can just inject all DataFetchers as one single list (and need not add one @Autowired field for each one) and process it as
@Autowired
List<DataFetcher> dataFetchers;
...
dataFetchers.stream()
.filter(dataFetcher -> dataFetcher.canHandle(inputType))
.forEach(dataFetcher -> dataFetcher.getData(info));
Advantages:
In your current approach, if you add a new DataFetcher implementation, you need to add an @Autowired field/member and modify the getDataComputationPoints map. With this approach, the inputTypes a DataFetcher can handle are specified within the class itself, so you just need to add new classes for new input types.
Reference
Autowire reference beans into list by type
UPDATE:
Disadvantages
Because the input types are specified inside each class, you cannot easily find the list of DataFetchers (data points) for a given input type.
If you need to remove support for an inputType, then again you need to visit each implementation (to remove that inputType from canHandle). In your approach, it is simply a matter of removing one map entry.
Have you considered using the Factory pattern? This allows you to submit a request for an object instance based on certain criteria.
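For illustration, a minimal sketch of a factory-style resolver built on the question's own types; the class name DataFetcherFactory and the registered mappings are assumptions.

import java.util.List;
import java.util.Map;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import org.springframework.stereotype.Component;

@Component
public class DataFetcherFactory {

    private final Map<String, List<DataFetcher>> fetchersByType;

    // constructor injection instead of one @Autowired field per data point;
    // every inputType-to-data-point mapping lives in this one place
    public DataFetcherFactory(DataPointA dataPointA, DataPointB dataPointB) {
        this.fetchersByType = ImmutableMap.of(
                "typeA", ImmutableList.of(dataPointA, dataPointB));
    }

    public List<DataFetcher> fetchersFor(String inputType) {
        return fetchersByType.getOrDefault(inputType, ImmutableList.of());
    }
}

DataComputer would then depend only on this factory and call fetchersFor(inputType), which keeps the mapping centralized without an @Autowired field per data point.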

Spring Batch - Pass data between Processor & Writer

I have a Spring Batch job which contains reader -> processor -> writer.
The data passed between them is of type Emp:
class Emp {
int id;
String name;
EmpTypeEnum empType; // HR, Dev, Tester, etc.
// getters and setters
}
As a simple batch, data is read from a CSV file in the reader, some processing happens inside the processor, and an output CSV file is written by the writer.
But apart from this output CSV file, I want to generate a secondary output file which only contains the count of each EmpType, i.e. the total number of HR, Dev & Tester.
I was thinking of performing the counting within the processor only, like:
public class EmpItemProcessor implements ItemProcessor<Emp, Emp> {
int countHr;
int countDev;
int countTester;
@Override
public Emp process(final Emp emp) throws Exception {
if (emp.getEmpType().equals(EmpTypeEnum.HR)) {
countHr++;
} else if // .....
// other processor on emp
return emp;
}
}
But as you can see, I can only return Emp from the processor, so how can I pass countHr, countDev, etc. from the processor and use them to create the secondary file?
Please suggest. If you think any other approach will be better, please suggest.
Thanks
You could use ItemWriteListener and JobExecutionListenerSupport for this.
Define an ItemWriteListener, which will be called every time your writer is called.
In this listener, update a counter in the execution context each time.
Write a JobExecutionListener which will be called after the whole job has completed, where you can read the value from the execution context and do further processing.
@Component
@JobScope
public class EmployeeWriteListener implements ItemWriteListener<Emp> {
@Value("#{jobExecution.executionContext}")
private ExecutionContext executionContext;
@Override
public void beforeWrite(final List<? extends Emp> items) {
}
@Override
public void onWriteError(final Exception exception, final List<? extends Emp> items) {
}
@Override
public void afterWrite(final List<? extends Emp> items) {
// afterWrite is called once per chunk, so add the chunk size
final int counter =
this.executionContext.getInt("TOTAL_EXPORTED_ITEMS", 0);
this.executionContext.putInt("TOTAL_EXPORTED_ITEMS", counter + items.size());
}
}
@Component
@JobScope
public class EmployeeNotificationListener extends JobExecutionListenerSupport {
@Override
public void afterJob(final JobExecution jobExecution) {
jobExecution.getExecutionContext()
.getInt("TOTAL_EXPORTED_ITEMS")
...................
}
}
You should register these listeners when you declare your step and job.
this.jobBuilders.get("someJob").incrementer(new RunIdIncrementer()).listener(new EmployeeNotificationListener())
.flow(this.getSomeStep()).end().build();
//instead of new(..) you should Autowire listener
public Step getSomeStep() {
return stepBuilders.get("someStep").<X, Y>chunk(10)
.reader(this.yourReader).processor(this.yourProcessor)
.writer(this.yourWriter).listener(this.employeeWriteListener)
.build();
}
Basically you need multiple ItemWriters to handle the two different writing tasks. You can easily use CompositeItemWriter, which can hold a list of different ItemWriters within it, and for each chunk it will call all of its delegate ItemWriters.
In your case,
Make two FlatFileItemWriters - one for your normal CSV output and the other for your statistics.
Then create a CompositeItemWriter<Emp> object and add both of these FlatFileItemWriter<Emp> instances to it using its method public void setDelegates(List<ItemWriter<Emp>> delegates).
Use this CompositeItemWriter as your ItemWriter in the step.
So, when your CompositeItemWriter is called, it will delegate to both ItemWriters in the order you added them to the list.
Job done :)
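For reference, a rough sketch of that wiring, assuming Java config; the bean names and chunk size are illustrative:

@Bean
public CompositeItemWriter<Emp> compositeWriter(FlatFileItemWriter<Emp> empCsvWriter,
                                                FlatFileItemWriter<Emp> empStatsWriter) {
    CompositeItemWriter<Emp> writer = new CompositeItemWriter<>();
    // both delegates receive every chunk, in this order
    writer.setDelegates(Arrays.asList(empCsvWriter, empStatsWriter));
    return writer;
}

@Bean
public Step empStep(StepBuilderFactory stepBuilders,
                    ItemReader<Emp> reader,
                    ItemProcessor<Emp, Emp> processor,
                    CompositeItemWriter<Emp> compositeWriter,
                    FlatFileItemWriter<Emp> empCsvWriter,
                    FlatFileItemWriter<Emp> empStatsWriter) {
    return stepBuilders.get("empStep")
            .<Emp, Emp>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(compositeWriter)
            // FlatFileItemWriters are ItemStreams; register them so they are opened and closed
            .stream(empCsvWriter)
            .stream(empStatsWriter)
            .build();
}

In a real configuration the two FlatFileItemWriter beans would also need qualifiers to disambiguate them.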

How to use Spring jdbc templates (jdbcTemplate or namedParameterJDBCTem) to retrieve values from database

A few days into Spring now. I'm integrating Spring JDBC into my web application. I was successfully able to perform CRUD operations on my DB, and I'm impressed with the boiler-plate code reduction. But I am failing to use the query*() methods provided by NamedParameterJdbcTemplate. Most of the examples on the internet show the usage of either RowMapper or ResultSetExtractor. Though both uses are fine, they force me to create classes which have to implement these interfaces. I have to create a bean for every type of data I am loading from the DB (or maybe I am mistaken).
The problem arises in code sections where I have used something like this:
String query="select username, password from usertable where username=?"
ps=conn.prepareStatement(query);
ps.setString(username);
rs=ps.executeQuery();
if(rs.next()){
String username=rs.getString("username");
String password=rs.getString("password")
//Performs operation on them
}
As these values are not stored in any bean and are used directly, I am not able to integrate jdbcTemplate in these kinds of situations.
Another situation arises when I am extracting only some of the properties present in a bean from my database.
Example:
public class MangaBean{
private String author;
private String title;
private String isbn;
private String releaseDate;
private String rating;
//getters and setters
}
Mapper:
public class MangaBeanMapper implements RowMapper<MangaBean>{
@Override
public MangaBean mapRow(ResultSet rs, int arg1) throws SQLException {
MangaBean mb=new MangaBean();
mb.setAuthor(rs.getString("author"));
mb.setTitle(rs.getString("title"));
mb.setIsbn(rs.getString("isbn"));
mb.setReleaseDate(rs.getString("releaseDate"));
mb.setRating(rs.getString("rating"));
return mb;
}
}
The above arrangement runs fine like this:
String query="select * from manga_data where isbn=:isbn"
Map<String, String> paramMap=new HashMap<String, String>();
paramMap.put("isbn", someBean.getIsbn());
return template.query(query, paramMap, new MangaBeanMapper());
However, if I only want to retrieve two or three values from my DB, I cannot use the above pattern, as it generates a BadSqlGrammarException: releaseDate does not exist in ResultSet. Example:
String query="select title, author from manga_data where isbn=:isbn";
Map<String, String> paramMap=new HashMap<String, String>();
paramMap.put("isbn", someBean.getIsbn());
return template.query(query, paramMap, new MangaBeanMapper());
template is an instance of NamedParameterJdbcTemplate. Please advise me on solutions for these situations.
The other answers are sensible: you should create a DTO bean, or use the BeanPropertyRowMapper.
But if you want to have more control than the BeanPropertyRowMapper gives you (or reflection makes it too slow), you can use the queryForList method, which will return you a list of Maps (one per row) with the returned columns as keys. Because you can call get(/* key that is not there */) on a Map without throwing an exception (it will just return null), you can use the same code to populate your object irrespective of which columns you selected.
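For example, a sketch of that approach with the manga query from the question (the mapping code below is illustrative):

String query = "select title, author from manga_data where isbn=:isbn";
Map<String, String> paramMap = new HashMap<String, String>();
paramMap.put("isbn", someBean.getIsbn());

List<Map<String, Object>> rows = template.queryForList(query, paramMap);
List<MangaBean> result = new ArrayList<MangaBean>();
for (Map<String, Object> row : rows) {
    MangaBean mb = new MangaBean();
    mb.setTitle((String) row.get("title"));
    mb.setAuthor((String) row.get("author"));
    mb.setReleaseDate((String) row.get("releaseDate")); // not selected, so this is simply null
    result.add(mb);
}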
You don't even need to write your own RowMapper, just use the BeanPropertyRowMapper that Spring provides. The way it works is that it matches the returned column names to the properties of your bean. Your query has columns that match your bean exactly; if it didn't, you would use an alias (AS) in your select as follows...
-- This query matches a property named matchingName in the bean
select my_column_that_doesnt_match as matching_name from mytable;
The BeanPropertyRowMapper should work with both queries you listed.
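For instance, a minimal sketch using the question's own query and bean, with no custom mapper class:

String query = "select title, author from manga_data where isbn=:isbn";
Map<String, String> paramMap = new HashMap<String, String>();
paramMap.put("isbn", someBean.getIsbn());

// BeanPropertyRowMapper matches returned columns to MangaBean properties by name;
// properties without a matching column (isbn, releaseDate, rating here) are left null
List<MangaBean> beans = template.query(query, paramMap, new BeanPropertyRowMapper<MangaBean>(MangaBean.class));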
Typically, yes: for most queries you would create a bean or object to transform the result into. I would suggest that for most cases, that's what you want to do.
However, you can create a RowMapper that maps a result set to a map, instead of a bean, like this. The downside would be losing the type management of beans, and you'd be relying on your JDBC driver to return the correct type for each column.
As @NimChimpskey has just posted, it's best to create a tiny bean object; but if you really don't want to do that, this is another option.
class SimpleRowMapper implements RowMapper<Map<String, Object>> {
String[] columns;
SimpleRowMapper(String[] columns) {
this.columns = columns;
}
@Override
public Map<String, Object> mapRow(ResultSet resultSet, int i) throws SQLException {
Map<String, Object> rowAsMap = new HashMap<String, Object>();
for (String column : columns) {
rowAsMap.put(column, resultSet.getObject(column));
}
return rowAsMap;
}
}
In your first example I would just create a DTO bean/value object to store them. There is a reason it's a commonly implemented pattern: it takes minutes to code and provides many long-term benefits.
In your second example, create a second implementation of RowMapper where you don't set the fields, or supply a null/substitute value to MangaBean where necessary:
@Override
public MangaBean mapRow(ResultSet rs, int arg1) throws SQLException {
MangaBean mb=new MangaBean();
mb.setAuthor(rs.getString("author"));
mb.setTitle(rs.getString("title"));
/* mb.setIsbn("unknown");*/
mb.setReleaseDate("unknown");
mb.setRating(null);
return mb;
}
