Apache Commons CSV doesn't ignore missing column - java

I am using Apache Commons CSV for parsing, but it doesn't ignore a missing column and throws an exception instead.
With this sample data:
name age
Ali 35
John 25
Vahid 75
With the code below, record.get(DataColumns.surname) throws java.lang.IllegalArgumentException: Mapping for surname not found, expected one of [name, surname, age]. I need it to return null, an Optional, or a default value instead. Is there an option for that? I know it is possible with record.toMap().get(DataColumns.surname.name()), but its performance will not be good:
...
enum DataColumns { name, surname, age }
...
Reader in = new BufferedReader(new FileReader(fileName));
try (CSVParser records = CSVFormat.TDF
.withDelimiter(' ')
.withIgnoreSurroundingSpaces()
.withAllowDuplicateHeaderNames(false)
.withIgnoreHeaderCase()
.withTrim()
.withHeader(DataColumns.class)
.withFirstRecordAsHeader()
.withSkipHeaderRecord()
.withAllowMissingColumnNames(false)
.withIgnoreEmptyLines()
.parse(in)) {
for (CSVRecord record : records) {
String name = record.get(DataColumns.name);
String surname = record.get(DataColumns.surname);
Short age = Short.valueOf(record.get(DataColumns.age));
}
}
...

You might try using record.isMapped(columnName) to check if the column exists, storing the result in a variable so you don't have to check again on every line.
Another option would be to call records.getHeaderNames() once, before the loop, and store the result in a variable, maybe even in a Set<String> for faster existence checks: Set<String> headerNames = new HashSet<>(records.getHeaderNames()).
Then, you can use the resulting variable inside the loop by calling headerNames.contains(columnName) to check whether the column exists or not.
Please see: https://javadoc.io/doc/org.apache.commons/commons-csv/latest/org/apache/commons/csv/CSVRecord.html
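The "check the header set once, then look up safely" idea can be sketched without tying it to a particular parser. This is only an illustration: the class OptionalColumns and its getOrDefault method are hypothetical names, and the lookup function stands in for whatever reads a value from the current record.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical helper: remembers which columns exist and falls back otherwise. */
final class OptionalColumns {
    private final Set<String> headerNames;

    OptionalColumns(List<String> headerNames) {
        // Copy into a HashSet once so each contains() check is cheap
        this.headerNames = new HashSet<>(headerNames);
    }

    /** Returns lookup(column) if the column is present, else the fallback. */
    String getOrDefault(Function<String, String> lookup, String column, String fallback) {
        return headerNames.contains(column) ? lookup.apply(column) : fallback;
    }
}
```

With Commons CSV, the helper would presumably be built once from records.getHeaderNames() before the loop, and each call inside the loop would pass record::get as the lookup, for example cols.getOrDefault(record::get, "surname", null).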

There is a method record.get(String), but you passed an enum instead.
Try record.get(DataColumns.name.name())

Related

Read a csv file using FlatFileItemReader, throwing an exception when encountering an empty column

When using FlatFileItemReader to read a CSV file, one column is mapped to an int, but that column is empty in the CSV file (e.g. 6321517,Jack,1,, where the last two columns are empty).
A java.lang.NumberFormatException: Unparseable number is thrown when parsing the file.
csv
CUSTR_NBR,SUR_NAME,CHECK_FLAG,RESN_CODE
6321517,Jack,1,,
The header line (CUSTR_NBR, SUR_NAME, CHECK_FLAG, RESN_CODE) is parsed first, so I call .setLinesToSkip(1). However, "CHECK_FLAG" and "RESN_CODE" cannot be mapped normally when the value is empty. I believe there should be a configuration option for this, but I looked through the Spring Batch documentation and couldn't find one.
csvItemReader
@Bean
@StepScope
public FlatFileItemReader<InfoDTO> csvItemReader() {
FlatFileItemReader<InfoDTO> csvItemReader = new FlatFileItemReader<>();
csvItemReader.setResource(new ClassPathResource("data/charge-off.csv"));
csvItemReader.setLinesToSkip(1);
DelimitedLineTokenizer tokenizer=new DelimitedLineTokenizer();
String[] tokens = new String[]{"CUSTR_NBR","SUR_NAME","CHECK_FLAG","RESN_CODE","EMPNO"};
tokenizer.setNames(tokens);
DefaultLineMapper<InfoDTO> lineMapper=new DefaultLineMapper<InfoDTO>();
lineMapper.setLineTokenizer(tokenizer);
lineMapper.setFieldSetMapper(new InfoFileMapper());
lineMapper.afterPropertiesSet();
csvItemReader.setLineMapper(lineMapper);
return csvItemReader;
}
mapper
public class InfoFileMapper implements FieldSetMapper<ChargeOffBatchDTO> {
@Override
public InfoDTO mapFieldSet(FieldSet fieldSet) throws BindException {
if(fieldSet == null){
return null;
}
return new InfoDTO(
fieldSet.readString("CUSTR_NBR"),
fieldSet.readString("SUR_NAME"),
fieldSet.readString("CHECK_FLAG"),
fieldSet.readInt("RESN_CODE"),
fieldSet.readInt("EMPNO")
);
}
}
I need to map the null column to a value of 0. How to configure?
I've not worked with Spring Batch, but looking at the FieldSet interface specification, there seem to be a few alternative ways to achieve what you want.
The Spring Batch reference does, however, mention some fault tolerance, specifically around exceptions thrown when a value does not exist. To disable this, set the tokenizer's strict property to false:
tokenizer.setStrict(false);
Otherwise you could simply try an old-fashioned alternative: instead of reading the value directly into an int, read it into a String and validate that String before converting it to an int:
String empNo = fieldSet.readString("EMPNO");
if ((empNo == null) || (empNo.equals(""))) {
empNo = "0";
}
int i = Integer.parseInt(empNo);
You may still get a java.lang.NumberFormatException if the field is not empty but does not contain a valid number, so personally I would just solve the problem by handling the exception:
int myEmp = 0;
try {
myEmp = fieldSet.readInt("EMPNO");
} catch (NumberFormatException nfe) {
myEmp = 0;
}
It's maybe not that elegant, but it'll work and serve the purpose.
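A middle ground between the two snippets above is to default only when the field is empty, and still fail loudly on garbage such as "12x". The sketch below uses only the JDK; LenientNumbers and parseIntOrDefault are hypothetical names, not part of Spring Batch.

```java
/** Hypothetical helper: treats null or blank text as "use the default". */
final class LenientNumbers {
    private LenientNumbers() {}

    static int parseIntOrDefault(String raw, int fallback) {
        if (raw == null || raw.trim().isEmpty()) {
            return fallback;
        }
        // Non-blank input is parsed strictly, so bad data still raises NumberFormatException
        return Integer.parseInt(raw.trim());
    }
}
```

Inside the mapper this would presumably be called as LenientNumbers.parseIntOrDefault(fieldSet.readString("RESN_CODE"), 0). The FieldSet interface also appears to offer readInt overloads that take a default value, which may make the helper unnecessary; that is worth checking against the Spring Batch version in use.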
Alternatively, you can create your own line mapper implementation that checks each line for empty fields, replaces them with zero, and passes the line on.

Processing large number of records from a file in Java

I have a million records in a CSV file with 3 columns: id, firstName, lastName. I have to process this file in Java and validate that id is unique and firstName is not null. If id is not unique and/or firstName is null, I have to write those records to an output file with a fourth column giving the reason ("id not unique"/"firstName is NULL"). Performance should be good. Please suggest the most effective way.
You can use a collection (ArrayList) to store all the IDs as you loop over the records, and check whether each ID already exists in it. If it does, write the record to a file.
The code should be like this:
if(!idList.contains(id)){
idList.add(id);
}else{
writer.write(id);
}
The above code should work in a loop for all the records being read from the CSV file.
You can use OpenCsv jar for the purpose you have specified. It's under Apache 2.0 licence.
You can download the jar from
http://www.java2s.com/Code/Jar/o/Downloadopencsv22jar.htm
below is the code for the same
Reader reader = Files.newBufferedReader(Paths.get(INPUT_SAMPLE_CSV_FILE_PATH));
CSVReader csvReader = new CSVReader(reader);
Writer writer = Files.newBufferedWriter(Paths.get(OUTPUT_SAMPLE_CSV_FILE_PATH));
CSVWriter csvWriter = new CSVWriter(writer);
List<String[]> list = csvReader.readAll();
for (String[] row : list) {
//assuming First column to be Id
String id = row[0];
//assuming name to be second column
String name = row[1];
//assuming lastName to be third column
String lastName = row[2];
//Put your pattern here
if(id==null || !id.matches("pattern") || name==null || !name.matches("pattern")){
String[] outPutData = new String[]{id, name , lastName, "Invalid Entry"};
csvWriter.writeNext(outPutData);
}
}
let me know if this works or you need further help or clarifications.
If you want an algorithm that performs well, you should not use ArrayList.contains(element), which has O(n) complexity. Instead I suggest a HashSet, as HashSet.contains(element) has O(1) complexity. In short, with an ArrayList you would perform on the order of 1,000,000^2 operations, while with a HashSet you would perform about 1,000,000.
In pseudo-code (to not give away the full answer and make you find the answer on your own) I would do this:
File outputFile
String[] columns
HashSet<String> ids
for(line in file):
    columns = line.split(',')
    if(ids.contains(columns.id)):
        outputFile.append(columns.id + " is not unique")
        continue
    if(columns.name == null):
        outputFile.append("first name is null!")
        continue
    ids.add(columns.id)
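One possible Java rendering of the pseudo-code above is sketched here. It is a simplification under stated assumptions: the header row has already been skipped, fields contain no quoted commas (so a plain split is enough), and an empty firstName column counts as null. RecordValidator and findRejects are hypothetical names. Unlike the pseudo-code, it remembers every id it sees and reports both reasons when both apply.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RecordValidator {
    private RecordValidator() {}

    /** Returns the rejected lines, each with the reason appended as a fourth column. */
    static List<String> findRejects(Iterable<String> lines) {
        Set<String> seenIds = new HashSet<>();
        List<String> rejects = new ArrayList<>();
        for (String line : lines) {
            // A limit of -1 keeps trailing empty columns such as "3,,"
            String[] cols = line.split(",", -1);
            String id = cols[0];
            String firstName = cols.length > 1 ? cols[1].trim() : "";
            List<String> reasons = new ArrayList<>();
            // Set.add returns false when the id was already present
            if (!seenIds.add(id)) {
                reasons.add("id not unique");
            }
            if (firstName.isEmpty()) {
                reasons.add("firstName is NULL");
            }
            if (!reasons.isEmpty()) {
                rejects.add(line + "," + String.join("/", reasons));
            }
        }
        return rejects;
    }
}
```

For a file with a million rows it would probably be better to read line by line from a BufferedReader and write each reject straight to a BufferedWriter instead of collecting them in a list; only the set of ids needs to stay in memory.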

How to read data from CSV if it contains more separators than expected?

I use CsvJdbc to read data from a CSV. I get the CSV from a web service request, so it is not loaded from a file. I set these properties:
Properties props = new java.util.Properties();
props.put("separator", ";"); // separator is a semicolon
props.put("fileExtension", ".txt"); // file extension is .txt
props.put("charset", "UTF-8"); // UTF-8
My sample1.txt contains this data:
code;description
c01;d01
c02;d02
My sample2.txt contains this data:
code;description
c01;d01
c02;d0;;;;;2
I can remove the headers from the CSV if needed, but I cannot change the semicolon separator.
EDIT: My query for resultSet: SELECT * FROM myCSV
I want to read code column in sample1.txt and sample2.txt with:
resultSet.getString(1)
and read the full description column, including its semicolons (d0;;;;;2). Is this possible with the CsvJdbc driver, or do I need to change the driver?
Thanks for any advice!
This is a problem that occurs when you have messy, invalid input, which you need to try to interpret, that's being read by a too-high-level package that only handles clean input. A similar example is trying to read arbitrary HTML with an XML parser - close, but no cigar.
You can guess where I'm going: you need to pre-process your input.
The preprocessing may be very easy if you can make some assumptions about the data - for example, if there are guaranteed to be no quoted semi-colons in the first column.
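Under exactly that assumption (the first column never contains a semicolon), the pre-processing can be as small as splitting each line at the first separator only. The sketch below relies on the limit argument of String.split; CodeDescriptionLine is a hypothetical name for illustration.

```java
/** Hypothetical holder for one parsed line: a code and the untouched remainder. */
final class CodeDescriptionLine {
    final String code;
    final String description;

    private CodeDescriptionLine(String code, String description) {
        this.code = code;
        this.description = description;
    }

    static CodeDescriptionLine parse(String line) {
        // Limit 2: split at the first ';' only, so later semicolons stay in the description
        String[] parts = line.split(";", 2);
        String description = parts.length > 1 ? parts[1] : "";
        return new CodeDescriptionLine(parts[0], description);
    }
}
```

For the sample row c02;d0;;;;;2 this keeps c02 as the code and d0;;;;;2 as the description. A header line, if present, would have to be skipped by the caller.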
You could try Super CSV; we have implemented such a solution in our project. More on this can be found at http://supercsv.sourceforge.net/ and in the question "Using CsvBeanReader to read a CSV file with a variable number of columns".
In the end I solved this problem without the CsvJdbc or Super CSV driver. These drivers work fine: they let you query data from a CSV file and offer many features. In my case I don't need to query data from the CSV. Unfortunately, the description column sometimes contains one or more semicolons, which is also my separator.
First I took the code from @Maher Abuthraa's answer and modified it to:
private String createDescriptionFromResult(ResultSet resultSet, int columnCount) throws SQLException {
if (columnCount > 2) {
StringBuilder data_list = new StringBuilder();
for (int ii = 2; ii <= columnCount; ii++) {
data_list.append(resultSet.getString(ii));
if (ii != columnCount)
data_list.append(";");
}
// data_list has all data from all index you are looking for ..
return data_list.toString();
} else {
// use standard way
return resultSet.getString(2);
}
}
The loop starts from 2 because column 1 is the code, and only the description column contains multiple semicolons. The CsvJdbc driver splits columns on the ; separator, so these semicolons disappear from the column data. I therefore add the semicolons back to the description explicitly, except after the last column, because that is not relevant in my case.
This code works fine, but it did not solve my whole problem. When the CSV header defines two columns, I get an error on any row that contains more than two semicolons. So I tried either ignoring the header or adding more column names (or simply more ;) to the header. In Super CSV the option to ignore the header works fine.
My colleague's opinion was: you don't need a CSV driver, because what you are loading is not really CSV if the separator is sometimes part of the data.
I think my colleague is right, so I loaded the CSV data with the following code:
InputStream in = null;
try {
in = new ByteArrayInputStream(csvData);
List lines = IOUtils.readLines(in, "UTF-8");
Iterator it = lines.iterator();
String line = "";
while (it.hasNext()) {
line = (String) it.next();
String description = null;
String code = null;
String[] columns = line.split(";");
if (columns.length >= 2) {
code = columns[0];
String[] dest = new String[columns.length - 1];
System.arraycopy(columns, 1, dest, 0, columns.length - 1);
description = org.apache.commons.lang.StringUtils.join(dest, ";");
(...)
OK, my solution is to read all the fields if there are more than 2 columns, like this:
int ccc = meta.getColumnCount();
if (ccc > 2) {
ArrayList<String> data_list = new ArrayList<String>();
for (int ii = 1; ii <= ccc; ii++) {
data_list.add(resultSet.getString(ii));
}
//data_list has all data from all index you are looking for ..
} else {
//use standard way
resultSet.getString(1);
}
If the table is defined to have as many columns as there could be semi-colons in the source, ignoring the initial column definitions, then the excess semi-colons would be consumed by the database driver automatically.
The most likely reason for them to appear in the final column is because the parser returns the balance of the row to the terminator in the field.
Simply increasing the number of columns in the table to match the maximum possible in the input will avoid the need for custom parsing in the program. Try:
code;description;dummy1;dummy2;dummy3;dummy4;dummy5
c01;d01
c02;d0;;;;;2
Then, the additional ';' delimiters will be consumed by the parser correctly.

error during grouping files based on the date field

I have a large file with 10,000 rows, and each row has a date appended at the end. All the fields in a row are tab separated. There are 10 distinct dates, randomly assigned across the 10,000 rows. I am now writing Java code to write all the rows with the same date into a separate file, so that each file holds the rows for one date.
I am trying to do it using string manipulation, but when I try to select the rows by date, I get an error at the date literal saying the literal is out of range. Here is the code that I used. Please have a look and let me know if this is the right approach; if not, kindly suggest a better one. I tried changing the datatype to Long, but I still get the same error. A row in the file looks something like this:
Each field is tab separated and the fields are:
business id, category, city, biz.name, longitude, state, latitude, type, date
qarobAbxGSHI7ygf1f7a_Q ["Sandwiches","Restaurants"] Gilbert Jersey Mike's Subs -111.8120071 AZ 3.5 33.3788385 business 06012010
The code is:
File f=new File(fn);
if(f.exists() && f.length()>0)
{
BufferedReader br=new BufferedReader(new FileReader(fn));
BufferedWriter bw = new BufferedWriter(new FileWriter("FilteredDate.txt"));
String s=null;
while((s=br.readLine())!=null){
String[] st=s.split("\t");
if(Integer.parseInt(st[13])==06012010){
Thanks a lot for your time..
Try this,
List<String> sampleList = new ArrayList<String>();
sampleList.add("06012012");
sampleList.add("06012013");
sampleList.add("06012014");
sampleList.add("06012015");
//
//
String[] sampleArray = s.split(" ");
if (sampleArray != null)
{
String sample = sampleArray[sampleArray.length - 1];
if (sampleList.contains(sample))
{
stringBuilder.append(sample + "\n");
}
}
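I suggest not using split, but rather:
String str = s.substring(s.lastIndexOf('\t') + 1);
In any case, you read st[13], but I see only 9 columns, so you probably just need st[8].
One last thing: 06012010 is not the decimal number it looks like. In Java, an integer literal with a leading 0 is octal, so 06012010 equals 1577992, and a literal containing the digit 8 or 9 (such as 06082010) does not compile at all. Compare the date as a String instead.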
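Putting these suggestions together, the grouping can be sketched by treating the date as plain text, taken from whatever follows the last tab in each row. This is a minimal illustration rather than a full solution; RowsByDate and group are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowsByDate {
    private RowsByDate() {}

    /** Groups rows by the text after their last tab, keeping first-seen order of dates. */
    static Map<String, List<String>> group(Iterable<String> rows) {
        Map<String, List<String>> byDate = new LinkedHashMap<>();
        for (String row : rows) {
            // If a row has no tab, lastIndexOf returns -1 and the whole row becomes the key
            String date = row.substring(row.lastIndexOf('\t') + 1).trim();
            byDate.computeIfAbsent(date, key -> new ArrayList<>()).add(row);
        }
        return byDate;
    }
}
```

Each entry of the returned map could then be written to its own file, for example one named after the date key. Because the date stays a String, the leading zero is preserved and no numeric literal is involved.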

Reading a property file and saving to an object

I have property file called person.properties. I need to add several person entries in.
A person entry will have a Name, Age, Telephone. There will be many Person entries in this Property file.
ID : 1
Name: joe
Age: 30
Telephone: 444444
ID : 2
Name: Anne
Age: 20
Telephone: 575757
ID : 3
Name: Matt
Age: 17
Telephone : 7878787
ID : 4
Name: Chris
Age: 21
Telephone : 6767676
I need to read the property file and save each record in an Person object.
Person p = new Person();
p.setId(ADD THE FIRST VALUE OF ID FROM THE PROPERTY FILE);
p.setName(ADD THE FIRST VALUE OF NAME FROM THE PROPERTY FILE);
and likewise for the other fields, then save each object in an array.
I think I will not be able to read the person.properties file above and save it to Person objects as required, because the same keys are repeated in the property file. How can I achieve this?
You don't have to use the Properties class for this; you can simply read the file as a text file and parse it manually:
Scanner s = new Scanner(new File("propertyfile.properties"));
while (s.hasNextLine()) {
String id = s.nextLine().split(":")[1].trim();
String name = s.nextLine().split(":")[1].trim();
String age = s.nextLine().split(":")[1].trim();
String phone = s.nextLine().split(":")[1].trim();
}
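To go from those four strings to actual objects, the same line-by-line idea can be sketched as below. It assumes every record spans exactly four consecutive lines in the order ID, Name, Age, Telephone, with no blank lines in between. The Person class here is a hypothetical stand-in, since the original Person type is not shown.

```java
import java.util.ArrayList;
import java.util.List;

final class PersonFileParser {
    private PersonFileParser() {}

    /** Hypothetical value class standing in for the asker's Person. */
    static final class Person {
        final int id;
        final String name;
        final int age;
        final String telephone;

        Person(int id, String name, int age, String telephone) {
            this.id = id;
            this.name = name;
            this.age = age;
            this.telephone = telephone;
        }
    }

    static List<Person> parse(List<String> lines) {
        List<Person> people = new ArrayList<>();
        // Step through the file four lines at a time; an incomplete trailing group is ignored
        for (int i = 0; i + 3 < lines.size(); i += 4) {
            people.add(new Person(
                    Integer.parseInt(value(lines.get(i))),
                    value(lines.get(i + 1)),
                    Integer.parseInt(value(lines.get(i + 2))),
                    value(lines.get(i + 3))));
        }
        return people;
    }

    private static String value(String line) {
        // Split at the first ':' only, which tolerates both "ID : 1" and "Name: joe"
        return line.split(":", 2)[1].trim();
    }
}
```

The lines could come from Files.readAllLines(Paths.get("person.properties")), and the resulting list can be turned into an array with toArray if an array is really required.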
The file format you describe is not really a properties file. Just read it yourself, using something like
public File openFile(String URI); // write this yourself
public void readFile(File name) {
BufferedReader br = new BufferedReader(new FileReader(name));
while(br.ready()) {
String next = br.readLine();
String[] split = next.split(" : ");
// handle each case, etc.
Modification of file
If you want to modify the key and write it back to the same position, you should use a database. Here are two free ones: MySQL and SQLite. It's possible to edit the file in that way, but it's much easier to just do it with a database, that's what it's designed for.
What you do is actually not the purpose of property files in java, I think. Nevertheless, here is how to handle property files:
Properties prop = new Properties();
try {
//load a properties file
prop.load(new FileInputStream("file.properties"));
//get the property value and print it out
System.out.println(prop.getProperty("name"));
System.out.println(prop.getProperty("age"));
System.out.println(prop.getProperty("telephone"));
} catch (IOException ex) {
ex.printStackTrace();
}
Does this help, or is it not what you actually want to do?
I think for your approach a database style thingy would be better.
