I am learning Spring Batch and have a basic idea of the reader, mapper and writer. My current requirement is to read each line from a file, compress the data and write it into a GigaSpaces-based in-memory grid.
I understand that a line mapper is a mandatory attribute of the item reader. I have no use for a mapper here; all I need to do is read the line and send it to the writer to store in the grid. So how can I skip the line mapper, or how can I read the plain line? Currently I have done something like this, which does not seem to be an ideal solution.
public class ShrFileMapper implements FieldSetMapper<SpaceDocument> {

    @Override
    public SpaceDocument mapFieldSet(FieldSet fieldSet) throws BindException {
        String positionId = fieldSet.readString(0);
        StringBuffer line = new StringBuffer();
        for (String fieldValue : fieldSet.getValues()) {
            line.append("\t").append(fieldValue);
        }
        // logic for compression, producing compressedString from line
        String compressedString = null; // compression omitted in the question
        SpaceDocument spaceDocument = new SpaceDocument("shr.doc");
        spaceDocument.setProperty("id", positionId);
        spaceDocument.setProperty("payload", compressedString);
        return spaceDocument;
    }
}
Assuming you are using a FlatFileItemReader, you need to provide a resource and a LineMapper. As you do not want to turn the line of input into anything else, you do not need a LineTokenizer; you just want to pass through the raw input. For more information you can check out the official documentation:
http://docs.spring.io/spring-batch/reference/html/readersAndWriters.html#flatFileItemReader
Spring already provides this functionality.
Please check out the PassThroughLineMapper: https://github.com/spring-projects/spring-batch/blob/master/spring-batch-infrastructure/src/main/java/org/springframework/batch/item/file/mapping/PassThroughLineMapper.java
public class PassThroughLineMapper implements LineMapper<String> {

    @Override
    public String mapLine(String line, int lineNumber) throws Exception {
        return line;
    }
}
This class does exactly what you need!
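For context, here is a minimal sketch of how this could be wired to the use case in the question, assuming Spring Batch 4.x (List-based ItemWriter) and the OpenSpaces GigaSpace API. The file path, the GZIP/Base64 compress() helper and the tab-separated id column are illustrative assumptions, not part of the original question or answer.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

import com.gigaspaces.document.SpaceDocument;
import org.openspaces.core.GigaSpace;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
public class PlainLineJobConfig {

    @Bean
    public FlatFileItemReader<String> shrFileReader() {
        FlatFileItemReader<String> reader = new FlatFileItemReader<>();
        reader.setResource(new FileSystemResource("data/shr-input.txt")); // hypothetical path
        reader.setLineMapper(new PassThroughLineMapper()); // hand the raw line straight to the writer
        return reader;
    }

    @Bean
    public ItemWriter<String> spaceDocumentWriter(GigaSpace gigaSpace) {
        return items -> {
            for (String line : items) {
                SpaceDocument document = new SpaceDocument("shr.doc");
                // assumes a tab-separated file whose first column is the position id, as in the question's mapper
                document.setProperty("id", line.split("\t")[0]);
                document.setProperty("payload", compress(line));
                gigaSpace.write(document);
            }
        };
    }

    // One possible compression approach (GZIP + Base64); the question leaves the actual scheme open.
    private static String compress(String input) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(input.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }
}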
Related
I'm writing a Spring Batch job, and in one of my steps I have the following code for the processor:
@Component
public class SubscriberProcessor implements ItemProcessor<NewsletterSubscriber, Account>, InitializingBean {

    @Autowired
    private AccountService service;

    @Override
    public Account process(NewsletterSubscriber item) throws Exception {
        if (!Strings.isNullOrEmpty(item.getId())) {
            return service.getAccount(item.getId());
        }
        // search with email address
        List<Account> accounts = service.findByEmail(item.getEmail());
        checkState(accounts.size() <= 1, "Found more than one account with email %s", item.getEmail());
        return accounts.isEmpty() ? null : accounts.get(0);
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        Assert.notNull(service, "account service must be set");
    }
}
The above code works, but I've found out that there are some edge cases where having more than one Account per NewsletterSubscriber is allowed. So I need to remove the state check and pass more than one Account to the item writer.
One solution I found is to change both the ItemProcessor and the ItemWriter to deal with the List<Account> type instead of Account, but this has two drawbacks:
Code and tests are uglier and harder to write and maintain because of the nested lists in the writer
Most importantly, more than one Account object may be written in the same transaction, because a list given to the writer may contain multiple accounts, and I'd like to avoid this.
Is there any way, maybe using a listener or replacing some internal component used by Spring Batch, to avoid lists in the processor?
Update
I've opened an issue on the Spring Jira for this problem.
I'm looking into isComplete and getAdjustedOutputs methods in FaultTolerantChunkProcessor which are marked as extension points in SimpleChunkProcessor to see if I can use them in some way to achieve my goal.
Any hint is welcome.
The ItemProcessor takes one thing in and returns a list:

public class MyItemProcessor implements ItemProcessor<SingleThing, List<ExtractedThingFromSingleThing>> {

    public List<ExtractedThingFromSingleThing> process(SingleThing thing) {
        // parse and convert to list
    }
}
Wrap the downstream writer to iron things out. This way stuff downstream from this writer doesn't have to work with lists.
@StepScope
public class ItemListWriter<T> implements ItemWriter<List<T>> {

    private ItemWriter<T> wrapped;

    public ItemListWriter(ItemWriter<T> wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public void write(List<? extends List<T>> items) throws Exception {
        for (List<T> subList : items) {
            wrapped.write(subList);
        }
    }
}
There isn't a way to return more than one item per call to an ItemProcessor in Spring Batch without getting pretty far into the weeds. If you really want to know where the relationship between an ItemProcessor and an ItemWriter exists (not recommended), take a look at the implementations of the ChunkProcessor interface. While the simple case (SimpleChunkProcessor) isn't that bad, if you use any of the fault tolerant logic (skip/retry via FaultTolerantChunkProcessor), it gets very unwieldy very quickly.
A much simpler option would be to move this logic to an ItemReader that does this enrichment before returning the item. Wrap whatever ItemReader you're using in a custom ItemReader implementation that does the service lookup before returning the item. In this case, instead of returning a NewsletterSubscriber from the reader, you'd be returning an Account based on the previous information.
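A minimal sketch of such a wrapping reader, reusing the NewsletterSubscriber, Account, AccountService and Guava Strings types from the question; the class name, the buffering queue and the handling of multiple accounts per subscriber are illustrative assumptions (imports omitted for brevity).

// A sketch of an enriching reader that wraps the original NewsletterSubscriber reader
// and performs the account lookup before returning items; names are illustrative.
public class AccountEnrichingItemReader implements ItemReader<Account> {

    private final ItemReader<NewsletterSubscriber> delegate;
    private final AccountService service;
    private final Queue<Account> buffered = new LinkedList<>();

    public AccountEnrichingItemReader(ItemReader<NewsletterSubscriber> delegate, AccountService service) {
        this.delegate = delegate;
        this.service = service;
    }

    @Override
    public Account read() throws Exception {
        // drain any accounts buffered from the previous subscriber first
        if (!buffered.isEmpty()) {
            return buffered.poll();
        }
        NewsletterSubscriber subscriber = delegate.read();
        if (subscriber == null) {
            return null; // end of input
        }
        if (!Strings.isNullOrEmpty(subscriber.getId())) {
            return service.getAccount(subscriber.getId());
        }
        // a subscriber may map to several accounts; buffer them and hand them out one by one
        buffered.addAll(service.findByEmail(subscriber.getEmail()));
        return read(); // picks up the first buffered account, or the next subscriber if none were found
    }
}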
Instead of returning an Account, you return an AccountWrapper or a Collection. The writer obviously must take this into account :)
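A minimal sketch of what such a wrapper and a matching writer could look like; the class names are assumptions, and the Account and ItemWriter types are as in the question (imports omitted).

// A simple wrapper around the accounts found for one subscriber; names are illustrative.
public class AccountWrapper {

    private final List<Account> accounts;

    public AccountWrapper(List<Account> accounts) {
        this.accounts = accounts;
    }

    public List<Account> getAccounts() {
        return accounts;
    }
}

// The writer unwraps each item and delegates the individual accounts to an existing writer.
public class AccountWrapperWriter implements ItemWriter<AccountWrapper> {

    private final ItemWriter<Account> delegate;

    public AccountWrapperWriter(ItemWriter<Account> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(List<? extends AccountWrapper> items) throws Exception {
        for (AccountWrapper wrapper : items) {
            delegate.write(wrapper.getAccounts());
        }
    }
}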
You can create a transformer to transform your POJO (the POJO object from the file) into your entity by writing the following code:
public class Intializer {

    public static LGInfo initializeEntity() throws Exception {
        Constructor<LGInfo> constr1 = LGInfo.class.getConstructor();
        LGInfo info = constr1.newInstance();
        return info;
    }
}
And in your item processor:
public class LgItemProcessor implements ItemProcessor<LgBulkLine, LGInfo> {

    private static final Log log = LogFactory.getLog(LgItemProcessor.class);

    @Override
    public LGInfo process(LgBulkLine item) throws Exception {
        log.info(item);
        return Intializer.initializeEntity();
    }
}
I have created a generic Spring Batch job for processing data and storing it into a CSV. I need some data from the reader passed into the writer, which I am trying to do using the JobExecution. However, surprisingly, the code seems to call the getWriter() function before getReader().
My config is given below. Could someone explain why this is happening, and whether there is an alternative way to pass data from the reader to the writer?
@Bean
@StepScope
public ItemReader<Map<String, Object>> getDataReader() throws Exception {
    return springBatchReader.getReader();
}

@Bean
@StepScope
public FlatFileItemWriter<Map<String, Object>> getDataWriter() throws Exception {
    return (FlatFileItemWriter<Map<String, Object>>) springBatchWriter.getWriter();
}

@Bean
public Job SpringBatchJob(Step generateReport) throws Exception {
    return jobBuilderFactory.get("SpringBatchJob" + System.currentTimeMillis())
            .preventRestart()
            .incrementer(new RunIdIncrementer())
            .flow(generateReport)
            .end()
            .build();
}

@Bean
public Step generateReport() throws Exception {
    return stepBuilderFactory.get("generateReport").<Map<String, Object>, Map<String, Object>>chunk(batchSize)
            .reader(getDataReader()).writer(getDataWriter()).build();
}
The data I want to pass from the reader to the writer is the column names for the CSV. My reader runs variable SQL queries (the SQL query to be run is passed as a command-line argument), so the result set/columns are not static and vary based on the given query. Providing the writer with the column names to be written for that particular execution in the setHeaderCallback was the rationale behind sending data from the reader to the writer.
The reader simply runs the given query and puts the data into a Map<String, Object> rather than any POJO, due to the variable nature of the data. Here the key of the Map represents the column name, while the corresponding object holds the value for that column. So essentially I want the writer's setHeaderCallback to be able to access the keys of the passed Map, or to pass the keys from the reader to the writer somehow.
The Writer Code is as follows:
public FlatFileItemWriter<Map<String, Object>> getWriter() throws Exception {
    String reportName = getReportName();
    saveToContext(reportName, reportPath);
    FileSystemResource resource = new FileSystemResource(String.join(File.separator, reportPath, getReportName()));
    FlatFileItemWriter<Map<String, Object>> flatFileItemWriter = new FlatFileItemWriter<>();
    flatFileItemWriter.setResource(resource);
    //NEED HELP HERE..HOW TO SET THE HEADER TO BE THE KEYS OF THE MAP
    //flatFileItemWriter.setHeaderCallback();
    flatFileItemWriter.setLineAggregator(new DelimitedLineAggregator<Map<String, Object>>() {
        {
            setDelimiter(delimiter);
            setFieldExtractor(
                new PassThroughFieldExtractor<>()
            );
        }
    });
    flatFileItemWriter.afterPropertiesSet();
    return flatFileItemWriter;
}
The execution order of those methods does not matter. You should not be looking for a way to pass data from the reader to the writer using the execution context; the chunk-oriented Tasklet implementation provided by Spring Batch will do that for you.
The execution context could be used to pass data from one step to another, but not from the reader to the writer within the same step.
EDIT: updated answer based on the comments:
Your issue is that you are calling saveToContext(reportName, reportPath); in the getWriter method. This method is called at configuration time and not at runtime.
What you really need is to provide the column names either via job parameters or by putting them in the execution context in a previous step, then use a step-scoped header callback that is configured with those headers.
You can find an example here: https://stackoverflow.com/a/56719077/5019386. This example is for the lineMapper, but you can do the same for the headerCallback. If you don't want to use the job parameters approach, you can create a tasklet step that determines the column names and puts them in the execution context, then configure the step-scoped header callback with those names from the execution context, something like:
@Bean
@StepScope
public FlatFileHeaderCallback headerCallback(@Value("#{jobExecutionContext['columnNames']}") String columnNames) {
    return new FlatFileHeaderCallback() {
        @Override
        public void writeHeader(Writer writer) throws IOException {
            // use columnNames here
        }
    };
}
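For illustration, here is a minimal sketch of the tasklet step mentioned above, assuming the column names can be determined up front (for example from the metadata of the configured query). The step name, the resolveColumnNames() helper and the joining delimiter are assumptions, not part of the original answer.

@Bean
public Step determineColumnNamesStep() {
    return stepBuilderFactory.get("determineColumnNames")
            .tasklet((contribution, chunkContext) -> {
                // resolveColumnNames() is a hypothetical helper, e.g. reading the ResultSetMetaData of the query
                List<String> columnNames = resolveColumnNames();
                // store the joined names in the job execution context so the step-scoped
                // headerCallback above can read them via #{jobExecutionContext['columnNames']}
                chunkContext.getStepContext().getStepExecution()
                        .getJobExecution().getExecutionContext()
                        .put("columnNames", String.join(",", columnNames));
                return RepeatStatus.FINISHED;
            })
            .build();
}

This step would run before generateReport in the job flow, and the saveToContext(reportName, reportPath) call would similarly need to move out of getWriter(), since that method runs at configuration time.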
I was recently asked in a coding interview to write a simple Java console app that does some file I/O and displays the data. I was going to go to town with a DAO, but since I never manipulate the data past a read, the entire idea of a DAO seems like overkill.
Does anyone know a clean way to ensure separation of concerns without the weight of full CRUD when you don't need it?
This looks like the standard MVC pattern. Your console is the view, the code that reads the file is the controller, and the code that captures a file line or the whole file content is your model.
You can further simplify it to just a View and a Model, where the model encapsulates both reading the file and wrapping its content in a Java class.
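A rough sketch of that simplified Model/View split; the class names are illustrative and standard java.nio.file and java.util imports are omitted.

// Model: encapsulates reading the file and exposing its content to the rest of the app.
public class FileContentModel {

    private final Path file;

    public FileContentModel(Path file) {
        this.file = file;
    }

    public List<String> getLines() throws IOException {
        return Files.readAllLines(file);
    }
}

// View: console output only; it knows nothing about where the data came from.
public class ConsoleView {

    public void display(List<String> lines) {
        lines.forEach(System.out::println);
    }
}

// Entry point wiring the two together; a separate controller could be introduced if the app grows.
public class App {

    public static void main(String[] args) throws IOException {
        new ConsoleView().display(new FileContentModel(Paths.get(args[0])).getLines());
    }
}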
How about Martin Fowler's Table Gateway pattern, explained here. Just include the find (read) methods and omit create, insert, and update.
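As a rough sketch (the gateway name and finder methods are made up, and standard imports are omitted), the gateway would expose only read operations over the file:

// A read-only gateway over the file, in the spirit of the Table Data Gateway pattern;
// only finder methods are provided, and create/insert/update are simply left out.
public class RecordFileGateway {

    private final List<String> rows;

    public RecordFileGateway(Path file) throws IOException {
        this.rows = Files.readAllLines(file);
    }

    public List<String> findAll() {
        return Collections.unmodifiableList(rows);
    }

    public List<String> findContaining(String term) {
        return rows.stream()
                .filter(row -> row.contains(term))
                .collect(Collectors.toList());
    }
}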
You can simply refer to the Command/Query pattern, where commands are the operations that perform create, update and delete separately, and queries are introduced for read-only purposes.
Hence you implement what you need and leave out the others.
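As a small illustration (the interface names are made up), the separation could look like this, with only the query side actually implemented for this task:

// Queries: the read-only side, which is all the interview task needs.
public interface FileQueries {
    List<String> readAllLines() throws IOException;
}

// Commands: the mutating side; declared for completeness, but it can stay unimplemented or be dropped.
public interface FileCommands {
    void create(String line);
    void update(int lineNumber, String line);
    void delete(int lineNumber);
}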
This question was asked in an interview, so there was not much time for a detailed design. As a minimal answer to the above concerns, the following structure will provide flexibility; the details can be filled in as per the requirements.
public interface IODevice {
    String read();

    void write(String data);
}

class FileIO implements IODevice {
    @Override
    public String read() {
        return null;
    }

    @Override
    public void write(String data) {
        //...
    }
}

class ConsoleIO implements IODevice {
    @Override
    public String read() {
        return null;
    }

    @Override
    public void write(String data) {
        //...
    }
}

public class DataConverter {
    public static void main(String[] args) {
        FileIO fData1 = null;    // ... appropriately obtained instance
        FileIO fData2 = null;    // ... appropriately obtained instance
        ConsoleIO cData = null;  // ... appropriately obtained instance
        cData.write(fData2.read());
        fData1.write(cData.read());
    }
}
The client class uses only the APIs of the devices. This keeps the option open of extending the interface to implement new device wrappers (e.g. XML, stream, etc.).
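For example (an illustrative addition, not part of the original answer; java.io imports omitted), a stream-backed device can be added without touching DataConverter:

// An extra IODevice backed by arbitrary input/output streams, e.g. for sockets or tests.
class StreamIO implements IODevice {

    private final BufferedReader in;
    private final PrintWriter out;

    StreamIO(InputStream in, OutputStream out) {
        this.in = new BufferedReader(new InputStreamReader(in));
        this.out = new PrintWriter(out, true);
    }

    @Override
    public String read() {
        try {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void write(String data) {
        out.println(data);
    }
}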
I have a custom class that extends FlatFileItemWriter, which I am using to write a CSV file. Based on some condition, I want to finish with my first file and then write to a new file. Is there a way to do this with FlatFileItemWriter?
Here is an example of my code:
@Override
public void write(List<? extends MyObject> items) throws Exception {
    ExecutionContext stepContext = this.stepExecution.getExecutionContext();
    int currentFileIncrement = (int) stepContext.get("currentFileIncrement");
    if (currentFileIncrement == fileIncrement) {
        super.write(items);
    } else {
        super.close();
        fileIncrement = currentFileIncrement;
        super.setHeaderCallback(determineHeaderCallback());
        super.setResource(new FileSystemResource("src/main/resources/" + fileIncrement + ".csv"));
        super.setShouldDeleteIfExists(true);

        DelimitedLineAggregator<MyObject> delLineAgg = new DelimitedLineAggregator<>();
        delLineAgg.setDelimiter(",");

        BeanWrapperFieldExtractor<MyObject> fieldExtractor = new BeanWrapperFieldExtractor<>();
        fieldExtractor.setNames(new String[] {"id", "amount"});
        delLineAgg.setFieldExtractor(fieldExtractor);
        super.setLineAggregator(delLineAgg);
        super.write(items);
    }
}
I don't understand whether you are using a custom writer or the Spring one. If you are using a custom one (maybe extending the Spring writer), you could use whatever you want by passing parameters through the processor, reader or mapper. If you want to use the Spring writer, you should create isolated steps.
Please give more details.
I'm using Spring-Batch to read csv files sequentially with MultiResourceItemReader.
I want to create a reader that
reads the chunksize from file 1
reads the chunksize from file 2
compare both what has been read and create some kind of "patch" object
write the patch object to database
Now the problem with MultiResourceItemReader is that it will first read the full file1 in chunks, and when the file is finished, it will continue with file2.
How can I create batch steps that will switch between the files based on the chunksize?
You're going to need to create a custom reader to address what you're attempting. You can use the FlatFileItemReader under the hood for the actual file reading, but the logic of reading from two files at once you'll have to orchestrate yourself. Just coding off the top of my head, I'd expect something like this:
public class MultiFileReader implements ItemReader<SomeObject> {

    private List<ItemStreamReader> readers;

    public SomeObject read() throws Exception {
        SomeObject domainObject = new SomeObject();
        // each delegate contributes its next piece of data to the combined "patch" object
        for (ItemStreamReader curReader : readers) {
            domainObject.add(curReader.read());
        }
        // note: a real implementation must return null once all delegates are exhausted
        return domainObject;
    }
}
You could use something like this:
@Bean
public MultiResourceItemReader<Company> readerCompany() throws IOException {
    DelimitedLineTokenizer dlt = new DelimitedLineTokenizer();
    dlt.setDelimiter("^");
    dlt.setNames("name", "cui", "code", "euid", "companyState", "address");
    dlt.setStrict(false);
    return new MultiResourceItemReaderBuilder<Company>()
            .name("readerCompany")
            .resources(inputCompanyResources)
            .delegate(new FlatFileItemReaderBuilder<Company>()
                    .name("getCompanyStatusReader")
                    .fieldSetMapper(new FieldSetMapper<Company>() {
                        @Override
                        public Company mapFieldSet(FieldSet fieldSet) throws BindException {
                            return Company.builder()
                                    .name(fieldSet.readString("name"))
                                    .localId(fieldSet.readString("cui"))
                                    .code(fieldSet.readString("code"))
                                    .companyStatus(readCompanyStatuses(fieldSet.readString("companyState")))
                                    .address(fieldSet.readString("address"))
                                    .internationalId(fieldSet.readString("euid"))
                                    .build();
                        }
                    })
                    .linesToSkip(1)
                    .lineTokenizer(dlt)
                    .build())
            .build();
}