I want to validate that the number of records in the file equals the count given in the footer row.
Is there any built-in functionality within Spring Batch to read the footer row of a fixed-width file?
I tried using a custom FieldSetMapper to validate that the number of records matches the count in the footer row.
new FlatFileItemReaderBuilder<FieldSet>()
.name("personItemReader")
.resource(fileName)
.fixedLength()
.columns(getRange())
.names(getNames())
.fieldSetMapper(new PassThroughFieldSetMapper())
.linesToSkip(1)
.build();
public class PersonFileReader implements ItemStreamReader<Person> {
private final ItemStreamReader<FieldSet> itemStreamReader;
private int recordCount = 0;
private int expectedRecordCount = 0;
private StepExecution stepExecution;
private final Logger logger = LoggerFactory.getLogger(PersonFileReader.class);
public PersonFileReader(@Qualifier("flatFileReader") ItemStreamReader<FieldSet> itemStreamReader) {
this.itemStreamReader = itemStreamReader;
}
@Override
public Person read() throws Exception {
return process(itemStreamReader.read());
}
//code removed
@AfterStep
public ExitStatus afterStep(StepExecution execution) {
if (recordCount == expectedRecordCount) {
return execution.getExitStatus();
} else {
return ExitStatus.STOPPED;
}
}
private Person process(FieldSet fieldSet) {
Person result = null;
if (fieldSet != null) {
if (fieldSet.getFieldCount() > 1) {
logger.info("fieldSet", fieldSet.getFieldCount());
result = Person.builder()
.name(fieldSet.readString(0))
.lastName(fieldSet.readString(1))
.age(fieldSet.readInt(2))
.salary(fieldSet.readDouble(3))
.houseNumber(fieldSet.readInt(4))
.line1(fieldSet.readString(5))
.line2(fieldSet.readString(6)).build();
recordCount++;
} else {
expectedRecordCount = fieldSet.readInt(0);
if (expectedRecordCount != this.recordCount) {
this.stepExecution.setTerminateOnly();
}
}
}
return result;
}
}
But this fails to read the footer with an exception:
org.springframework.batch.item.file.FlatFileParseException: Parsing error at line: 6 in resource=[file [D:\Data\Study\spring-batch\batch-process\src\main\resources\static\file1.txt]], input=[4]
Caused by: org.springframework.batch.item.file.transform.IncorrectLineLengthException: Line is shorter than max range 66
Sample file:
Name lastName age salary HouseNo AddLine1 AddLine2
Andrew Green 25 26000 10 Avenue Glasgow
Ben Brown 32 28500 11 Avenue Glasgow
Celia Black 29 28000 12 Avenue Glasgow
Steve Pink 41 52000 13 Avenue Glasgow
4
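There is no built-in footer reader as such, but the immediate IncorrectLineLengthException can be avoided: the footer row (4) is shorter than the declared column ranges, and the fixed-length tokenizer is strict by default. A minimal sketch, assuming Spring Batch 4.x's FixedLengthBuilder#strict, so the short footer line reaches the custom reader logic above instead of throwing:

// strict(false) lets the tokenizer accept lines shorter than the declared
// ranges, so the one-column footer row is tokenized instead of rejected
new FlatFileItemReaderBuilder<FieldSet>()
        .name("personItemReader")
        .resource(fileName)
        .linesToSkip(1)
        .fieldSetMapper(new PassThroughFieldSetMapper())
        .fixedLength()
        .columns(getRange())
        .names(getNames())
        .strict(false)
        .build();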
Related
package Testing;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
public class testing {
// map to store the number of errors per user
private static Map<String, Integer> errorsPerUser = new HashMap<>();
// variable to store the number of jobs started
private static int jobsStarted = 0;
// variable to store the number of jobs completed
private static int jobsCompleted = 0;
public static void main(String[] args) {
// specify the path to the log file
String filePath = "C:/Users/Wafiq/Documents/WIX1002/GroupAssignment/extracted_log.txt";
try (Scanner scanner = new Scanner(new File(filePath))) {
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
int timestampEndIndex = line.indexOf("]");
String lineWithoutTimestamp = line.substring(timestampEndIndex+2);
// check if line contains error message
if (lineWithoutTimestamp.contains("error: This association")) {
// extract the user from the line
String user = extractUser(lineWithoutTimestamp);
// increment the error count for the user
incrementErrorCount(user);
}
// check if line indicates job start
if (lineWithoutTimestamp.contains("sched: Allocate")) {
jobsStarted++;
}
// check if line indicates job completion
if (lineWithoutTimestamp.contains("_job_complete: JobId")) {
jobsCompleted++;
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
}
// print the results
System.out.println("Number of jobs started: " + jobsStarted);
System.out.println("Number of jobs completed: " + jobsCompleted);
System.out.println("Number of errors per user:");
for (Map.Entry<String, Integer> entry : errorsPerUser.entrySet()) {
System.out.println(": " + entry.getValue());
}
}
// method to extract the user from the line
private static String extractUser(String line) {
// assuming the user is the string before "error" in the line
return line.substring(0, line.indexOf("error")).trim();
}
// method to increment the error count for the user
private static void incrementErrorCount(String user) {
if (errorsPerUser.containsKey(user)) {
errorsPerUser.put(user, errorsPerUser.get(user) + 1);
} else {
errorsPerUser.put(user, 1);
}
}
}
I'm trying to extract the number of jobs causing errors and the corresponding users. I have counted the jobs causing errors, but I don't know how to extract the error count per user.
(P/S: Please don't slander me, I'm a first-year Computer Science student. I have tried my best.)
The user is not at the same index in each line, so I don't know how to extract it from the line.
While the user is not at the same index across lines, it always comes after user=' and ends at the next '. Search for these substrings in your line and you are done.
int startIndex = line.indexOf("user='");
if (startIndex >= 0) {
    startIndex += "user='".length();              // skip past the marker itself
    int endIndex = line.indexOf("'", startIndex); // the closing quote
    String user = line.substring(startIndex, endIndex);
    System.out.println("user=" + user);
} else {
    System.out.println("no user in line");
}
Edit: I saw there is another pattern in use as well. I think you can adapt the above algorithm to allow for the second one, too.
I am learning how to work with files in Java. I have a sample file which contains key-value pairs. I am trying to find a key and, if it matches, write both the key and its value to the output file. I am able to get the keys into the output file but unable to get the values as well. StringBuilder may work here to append strings, but I don't know how.
Below are my input and output files.
Input File:
born time 9 AM London -- kingNumber 1234567890 -- address: abc/cd/ef -- birthmonth: unknown
born time 9 AM Europe -- kingNumber 1234567890 -- address: abc/cd/ef -- birthmonth: december
Expected Output File:
kingNumber 1234567890 birthmonth unknown
kingNumber 1234567890 birthmonth december
Current Output File:
kingNumber birthmonth
kingNumber birthmonth
I am able to write the keys ("kingNumber" and "birthmonth" in this case) to the output file, but I am not sure how to get their values too.
String kn = "kingNumber:";
String bd = "birthmonth:";
try {
File f = new File("sample.txt");
Scanner sc = new Scanner(f);
FileWriter fw = new FileWriter("output.txt");
while(sc.hasNextLine()) {
String lineContains = sc.nextLine(); // read the whole line; next() only returns one whitespace-delimited token
if(lineContains.contains(kn)) {
fw.write(kn + "\n");
// This is where I am stuck. What
// can I do to get its value (the number in this case)?
}
else if(lineContains.contains(bd)) {
fw.write(bd);
// This is where I am stuck. What
// can I do to get its value (the birthday in this case)?
}
}
fw.close(); // without closing, the buffered output may never reach output.txt
} catch (IOException e) {
e.printStackTrace();
}
You could use java.util.regex.Pattern and java.util.regex.Matcher with a pattern like:
^born\stime\s([a-zA-Z0-9\s]*)\s--\skingNumber\s(\d+)\s--\saddress:\s([a-zA-Z0-9\s/]*)\s--\sbirthmonth:\s([a-zA-Z0-9\s]*)$
write less, do more.
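For instance, a minimal sketch of how that pattern could be applied inside the question's read loop (line and fw as in the question; the group numbers follow the pattern above):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// compile the pattern once, then pull out the capture groups per line
Pattern p = Pattern.compile(
    "^born\\stime\\s([a-zA-Z0-9\\s]*)\\s--\\skingNumber\\s(\\d+)"
    + "\\s--\\saddress:\\s([a-zA-Z0-9\\s/]*)\\s--\\sbirthmonth:\\s([a-zA-Z0-9\\s]*)$");
Matcher m = p.matcher(line);
if (m.matches()) {
    // group 2 is the kingNumber, group 4 is the birthmonth
    fw.write("kingNumber " + m.group(2) + " birthmonth " + m.group(4) + "\n");
}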
I have written a simple parser that follows the data format from your example.
You will need to call it like this:
PairParser parser = new PairParser(lineContains);
Then you can get values from the parser by key.
How to get value:
parser.getValue("kingNumber")
Note that keys do not include the trailing colon character.
The parser code is here:
package com.grenader.example;
import java.util.HashMap;
import java.util.Map;
public class PairParser {
private Map<String, String> data = new HashMap<>();
/**
* Constructor, prepare the data
* @param dataString line from the given data file
*/
public PairParser(String dataString) {
if (dataString == null || dataString.isEmpty())
throw new IllegalArgumentException("Data line cannot be empty");
// Split the input line into an array of string blocks, using '--' as the separator
String[] blocks = dataString.split("--");
for (String block : blocks)
{
if (block.startsWith("born time")) // skip this one because it doesn't look like a key/value pair
continue;
String[] strings = block.split("\\s");
if (strings.length != 3) // does not have exactly 3 items (the first item is empty), skipping this one as well
continue;
String key = strings[1];
String value = strings[2];
if (key.endsWith(":"))
key = key.substring(0, key.length()-1).trim();
data.put(key.trim(), value.trim());
}
}
/**
* Return value based on key
* @param key
* @return
*/
public String getValue(String key)
{
return data.get(key);
}
/**
* Return number of key/value pairs
* @return
*/
public int size()
{
return data.size();
}
}
And here is the Unit Test to make sure that the code works
package com.grenader.example;
import com.grenader.example.PairParser;
import org.junit.Test;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.fail;
public class PairParserTest {
@Test
public void getValue_Ok() {
PairParser parser = new PairParser("born time 9 AM London -- kingNumber 1234567890 -- address: abc/cd/ef -- birthmonth: unknown");
assertEquals("1234567890", parser.getValue("kingNumber"));
assertEquals("unknown", parser.getValue("birthmonth"));
}
@Test(expected = IllegalArgumentException.class)
public void getValue_Null() {
new PairParser(null);
fail("This test should fail with Exception");
}
@Test(expected = IllegalArgumentException.class)
public void getValue_EmptyLine() {
new PairParser("");
fail("This test should fail with Exception");
}
@Test
public void getValue_BadData() {
PairParser parser = new PairParser("bad data bad data");
assertEquals(0, parser.size());
}
}
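To produce the expected output file, you could then drive the parser from the question's read loop; a sketch, assuming the same sample.txt and output.txt names (the enclosing method must catch or declare IOException):

// one PairParser per input line, writing the two wanted pairs
try (Scanner sc = new Scanner(new File("sample.txt"));
     FileWriter fw = new FileWriter("output.txt")) {
    while (sc.hasNextLine()) {
        PairParser parser = new PairParser(sc.nextLine());
        fw.write("kingNumber " + parser.getValue("kingNumber")
                + " birthmonth " + parser.getValue("birthmonth") + "\n");
    }
}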
I have 107 documents in my index, and I created a method to return all of them with pagination. Each page contains 20 documents, so I should logically get 6 pages: the first 5 pages hold 20 documents each and the 6th holds only 7. The problem is that the method always returns 1 page, not 6.
@Override
@Transactional(readOnly = true)
public Page<Convention> findAll(Pageable pageable) throws UnknownHostException {
String[] parts = pageable.getSort().toString().split(":");
SortOrder sortOrder;
if ("DESC".equalsIgnoreCase(parts[1].trim())) {
sortOrder = SortOrder.DESC;
} else {
sortOrder = SortOrder.ASC;
}
SearchResponse searchResponse = elasticsearchConfiguration.getTransportClient()
.prepareSearch("convention")
.setTypes("convention")
.setQuery(QueryBuilders.matchAllQuery())
.addSort(SortBuilders.fieldSort(parts[0])
.order(sortOrder))
.setSize(pageable.getPageSize())
.setFrom(pageable.getPageNumber() * pageable.getPageSize())
.setSearchType(SearchType.QUERY_THEN_FETCH)
.get();
return searchResults(searchResponse);
}
private Page<Convention> searchResults(SearchResponse searchResponse) {
List<Convention> conventions = new ArrayList<>();
for (SearchHit hit : searchResponse.getHits()) {
String sourceAsString = hit.getSourceAsString();
if (sourceAsString != null) {
ObjectMapper mapper = new ObjectMapper();
Convention convention = null;
try {
convention = mapper.readValue(sourceAsString, Convention.class);
} catch (IOException e) {
LOGGER.error("Error", e);
}
conventions.add(convention);
}
}
return new PageImpl<>(conventions);
}
http://localhost:8081/api/conventions?page=0&size=20&sort=shortname,DESC
When I execute this API, I get TotalElements=20, Number=0, TotalPages=1, and Size=0:
@GetMapping("/conventions")
public ResponseEntity<List<Convention>> getAllConventions(final Pageable pageable) throws UnknownHostException {
final Page<Convention> page = conventionService.findAll(pageable);
System.out.println("-------------- 1:" + page.getTotalElements()); // 20
System.out.println("-------------- 2:" + page.getNumber()); // 0
System.out.println("-------------- 3:" + page.getTotalPages()); // 1
System.out.println("-------------- 4:" + page.getSize()); // 0
HttpHeaders headers = new HttpHeaders();
headers.add("X-Total-Count", Long.toString(page.getTotalElements()));
return new ResponseEntity<>(page.getContent(), headers, HttpStatus.OK);
}
This issue is addressed and fixed in the current stable version of spring-data-elasticsearch, 3.0.7.
See https://jira.spring.io/browse/DATAES-402
I think it comes from this line: return new PageImpl<>(conventions);
Maybe you should pass on the total size of the response hits, because you override the query and the single-argument PageImpl constructor treats the given list as the entire result.
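A sketch of that change, assuming the three-argument PageImpl constructor and the transport client's SearchHits#getTotalHits():

// hand PageImpl the pageable and the total hit count so Spring Data can
// derive getTotalElements() and getTotalPages() correctly
private Page<Convention> searchResults(SearchResponse searchResponse, Pageable pageable) {
    List<Convention> conventions = new ArrayList<>();
    // ... map the hits to Convention exactly as before ...
    return new PageImpl<>(conventions, pageable, searchResponse.getHits().getTotalHits());
}

The caller in findAll then becomes return searchResults(searchResponse, pageable);.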
My use case is reading objects from a CSV, performing some modifications on the objects, then writing them to another CSV. I've attempted to modify the supercsv examples as below, but am getting a SuperCsvConstraintViolationException that I'm not sure how to address.
public class ReadingWriting {
private static final String CSV_READ_FILENAME = "src/test/resources/customers.csv";
public static final String CSV_WRITE_FILENAME = "target/writeWithCsvBeanWriter.csv";
public static void main(String[] args) throws Exception {
readWithCsvBeanReader();
}
/**
* Sets up the processors used for the examples. There are 10 CSV columns, so 10 processors are defined. Empty
* columns are read as null (hence the NotNull() for mandatory columns).
*
* @return the cell processors
*/
private static CellProcessor[] getProcessors() {
final String emailRegex = "[a-z0-9\\._]+@[a-z0-9\\.]+"; // just an example, not very robust!
StrRegEx.registerMessage(emailRegex, "must be a valid email address");
final CellProcessor[] processors = new CellProcessor[]{
new UniqueHashCode(), // customerNo (must be unique)
new NotNull(), // firstName
new NotNull(), // lastName
new ParseDate("dd/MM/yyyy"), // birthDate
new NotNull(), // mailingAddress
new Optional(new ParseBool()), // married
new Optional(new ParseInt()), // numberOfKids
new NotNull(), // favouriteQuote
new StrRegEx(emailRegex), // email
new LMinMax(0L, LMinMax.MAX_LONG) // loyaltyPoints
};
return processors;
}
/**
* An example of reading using CsvBeanReader.
*/
private static void readWithCsvBeanReader() throws Exception {
ICsvBeanReader beanReader = null;
ICsvBeanWriter beanWriter = null;
try {
beanReader = new CsvBeanReader(new FileReader(CSV_READ_FILENAME), CsvPreference.STANDARD_PREFERENCE);
beanWriter = new CsvBeanWriter(new FileWriter(CSV_WRITE_FILENAME), CsvPreference.STANDARD_PREFERENCE);
final CellProcessor[] processors = getProcessors();
// the header elements are used to map the values to the bean (names must match)
final String[] header = beanReader.getHeader(true);
beanWriter.writeHeader(header);
CustomerBean customer;
while ((customer = beanReader.read(CustomerBean.class, header, processors)) != null) {
System.out.println(String.format("lineNo=%s, rowNo=%s, customer=%s", beanReader.getLineNumber(),
beanReader.getRowNumber(), customer));
beanWriter.write(customer, header, processors);//this line causes the below output
/*
lineNo=4, rowNo=2, customer=CustomerBean(customerNo=1, loyaltyPoints=0, mailingAddress=1600 Amphitheatre Parkway
Mountain View, CA 94043
United States)
Exception in thread "main" Disconnected from the target VM, address: '127.0.0.1:60782', transport: 'socket'
org.supercsv.exception.SuperCsvConstraintViolationException: duplicate value '1' encountered with hashcode 49
processor=org.supercsv.cellprocessor.constraint.UniqueHashCode
context={lineNo=2, rowNo=2, columnNo=1, rowSource=[1, John, Dunbar, Wed Jun 13 00:00:00 AEST 1945, 1600 Amphitheatre Parkway
Mountain View, CA 94043
United States, null, null, "May the Force be with you." - Star Wars, jdunbar@gmail.com, 0]}
at org.supercsv.cellprocessor.constraint.UniqueHashCode.execute(UniqueHashCode.java:78)
at org.supercsv.util.Util.executeCellProcessors(Util.java:93)
at org.supercsv.io.CsvBeanWriter.write(CsvBeanWriter.java:136)
at Reading.readWithCsvBeanReader(Reading.java:78)
at Reading.main(Reading.java:25)
Process finished with exit code 1
*/
}
} finally {
if (beanReader != null) {
beanReader.close();
}
if( beanWriter != null ) {
beanWriter.close();
}
}
}
}
In your cell processors you specify that the customer number must have a unique hash code, so a duplicate customer number in the file would cause this. In this trace, though, the failure comes from CsvBeanWriter.write: UniqueHashCode is stateful, and the same processor instances are used for both reading and writing, so every customer number is seen twice (once on read, once on write). Give the reader and the writer separate processor arrays.
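A minimal sketch of that separation, keeping everything else from the question unchanged:

// UniqueHashCode remembers every value it has seen, so the reader and the
// writer each get their own processor instances
final CellProcessor[] readProcessors = getProcessors();
final CellProcessor[] writeProcessors = getProcessors();

CustomerBean customer;
while ((customer = beanReader.read(CustomerBean.class, header, readProcessors)) != null) {
    beanWriter.write(customer, header, writeProcessors);
}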
When I run the LensKit demo program I get this error:
[main] ERROR org.grouplens.lenskit.data.dao.DelimitedTextRatingCursor - C:\Users\sean\Desktop\ml-100k\u - Copy.data:4: invalid input, skipping line
I reworked the ML-100k data set so that it only holds these lines, although I don't see how this would affect it:
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244
Here is the code I am using too:
public class HelloLenskit implements Runnable {
public static void main(String[] args) {
HelloLenskit hello = new HelloLenskit(args);
try {
hello.run();
} catch (RuntimeException e) {
System.err.println(e.getMessage());
System.exit(1);
}
}
private String delimiter = "\t";
private File inputFile = new File("C:\\Users\\sean\\Desktop\\ml-100k\\u - Copy.data");
private List<Long> users;
public HelloLenskit(String[] args) {
int nextArg = 0;
boolean done = false;
while (!done && nextArg < args.length) {
String arg = args[nextArg];
if (arg.equals("-e")) {
delimiter = args[nextArg + 1];
nextArg += 2;
} else if (arg.startsWith("-")) {
throw new RuntimeException("unknown option: " + arg);
} else {
inputFile = new File(arg);
nextArg += 1;
done = true;
}
}
users = new ArrayList<Long>(args.length - nextArg);
for (; nextArg < args.length; nextArg++) {
users.add(Long.parseLong(args[nextArg]));
}
}
public void run() {
// We first need to configure the data access.
// We will use a simple delimited file; you can use something else like
// a database (see JDBCRatingDAO).
EventDAO base = new SimpleFileRatingDAO(inputFile, "\t");
// Reading directly from CSV files is slow, so we'll cache it in memory.
// You can use SoftFactory here to allow ratings to be expunged and re-read
// as memory limits demand. If you're using a database, just use it directly.
EventDAO dao = new EventCollectionDAO(Cursors.makeList(base.streamEvents()));
// Second step is to create the LensKit configuration...
LenskitConfiguration config = new LenskitConfiguration();
// ... configure the data source
config.bind(EventDAO.class).to(dao);
// ... and configure the item scorer. The bind and set methods
// are what you use to do that. Here, we want an item-item scorer.
config.bind(ItemScorer.class)
.to(ItemItemScorer.class);
// let's use personalized mean rating as the baseline/fallback predictor.
// 2-step process:
// First, use the user mean rating as the baseline scorer
config.bind(BaselineScorer.class, ItemScorer.class)
.to(UserMeanItemScorer.class);
// Second, use the item mean rating as the base for user means
config.bind(UserMeanBaseline.class, ItemScorer.class)
.to(ItemMeanRatingItemScorer.class);
// and normalize ratings by baseline prior to computing similarities
config.bind(UserVectorNormalizer.class)
.to(BaselineSubtractingUserVectorNormalizer.class);
// There are more parameters, roles, and components that can be set. See the
// JavaDoc for each recommender algorithm for more information.
// Now that we have a factory, build a recommender from the configuration
// and data source. This will compute the similarity matrix and return a recommender
// that uses it.
Recommender rec = null;
try {
rec = LenskitRecommender.build(config);
} catch (RecommenderBuildException e) {
throw new RuntimeException("recommender build failed", e);
}
// we want to recommend items
ItemRecommender irec = rec.getItemRecommender();
assert irec != null; // not null because we configured one
// for users
for (long user: users) {
// get 10 recommendation for the user
List<ScoredId> recs = irec.recommend(user, 10);
System.out.format("Recommendations for %d:\n", user);
for (ScoredId item: recs) {
System.out.format("\t%d\n", item.getId());
}
}
}
}
I am really lost on this one and would appreciate any help. Thanks for your time.
The last line of your input file only contains one field. Each input file line needs to contain 3 or 4 fields.
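Each of the first three lines in your sample has the expected shape (user, item, rating, and optionally a timestamp, tab-separated), e.g. 196 242 3 881250949; the trailing 244 line carries only a user ID, so either remove it or complete its remaining fields.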