How to read from particular header in opencsv? - java

I have a CSV file and I want to extract a particular column from it. For example,
say I have this CSV:
id1,caste1,salary,name1
63,Graham,101153.06,Abraham
103,Joseph,122451.02,Charlie
63,Webster,127965.91,Violet
76,Smith,156150.62,Eric
97,Moreno,55867.74,Mia
65,Reynolds,106918.14,Richard
How can I use opencsv to read only the data under the header caste1?

Magnilex and Sparky are right that CSVReader does not support reading values by column name. That being said, there are two ways you can do this.
Given that you know the column names and the first line CSVReader reads is the header, you can search the header for the column's position and then use that position from there on:
private int getHeaderLocation(String[] headers, String columnName) {
return Arrays.asList(headers).indexOf(columnName);
}
So your method would look like this (leaving out a lot of error checks you will need to put in):
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String[] nextLine;
int columnPosition;
nextLine = reader.readNext(); // the first line holds the headers
columnPosition = getHeaderLocation(nextLine, "caste1"); // -1 if there is no such header
while ((nextLine = reader.readNext()) != null && columnPosition > -1) {
    // nextLine[] is an array of values from the line
    System.out.println(nextLine[columnPosition]);
}
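For comparison, the same header-lookup idea can be sketched without opencsv at all. This is a minimal sketch with a hypothetical helper (ColumnByHeader.readColumn is not an opencsv API), and it assumes a simple CSV with no quoted fields containing commas; for real-world files the CSVReader version above is the safer choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnByHeader {
    // Returns every value under the header named columnName.
    // Assumes a simple CSV: comma-separated, no quoted fields containing commas.
    static List<String> readColumn(Reader in, String columnName) throws IOException {
        List<String> values = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String headerLine = br.readLine();
            if (headerLine == null) {
                return values; // empty input: no header, no values
            }
            int index = Arrays.asList(headerLine.split(",", -1)).indexOf(columnName);
            if (index < 0) {
                throw new IllegalArgumentException("No such column: " + columnName);
            }
            String line;
            while ((line = br.readLine()) != null) {
                String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
                values.add(index < fields.length ? fields[index] : "");
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        String csv = "id1,caste1,salary,name1\n63,Graham,101153.06,Abraham\n103,Joseph,122451.02,Charlie\n";
        System.out.println(readColumn(new StringReader(csv), "caste1")); // [Graham, Joseph]
    }
}
```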
I would only do the above if you were pressed for time and only cared about one column. That is because opencsv can convert each line directly into an object whose fields are named after the header columns, using the CsvToBean class and the HeaderColumnNameMappingStrategy.
So first you would define a class that has the fields (really, you only need to include the fields you want: extra columns are ignored, and missing ones are left null or at their default values).
public class CastleDTO {
    private int id1;
    private String caste1; // the field name must match the "caste1" header
    private double salary;
    private String name1;
    // have all the getters and setters here....
}
Then your code would look like this:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
HeaderColumnNameMappingStrategy<CastleDTO> castleStrategy = new HeaderColumnNameMappingStrategy<CastleDTO>();
castleStrategy.setType(CastleDTO.class); // tell the strategy which bean class to create
CsvToBean<CastleDTO> csvToBean = new CsvToBean<CastleDTO>();
List<CastleDTO> castleList = csvToBean.parse(castleStrategy, reader);
for (CastleDTO dto : castleList) {
    System.out.println(dto.getCaste1());
}
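To show roughly what a header-name mapping does, here is a toy sketch using plain Java reflection. ToyHeaderMapper and its nested CasteRow are hypothetical names; unlike opencsv's HeaderColumnNameMappingStrategy, this handles only String fields and performs no type conversion. It also makes the reason for naming the field caste1 visible: a header with no identically named field is simply skipped.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class ToyHeaderMapper {
    // Hypothetical bean: only String fields, named after the CSV headers.
    static class CasteRow {
        String caste1;
        String name1;
    }

    // Maps each already-split data row onto a new instance of type,
    // matching header names to field names.
    // Headers with no matching field are ignored; fields with no matching header stay null.
    // Assumes: String fields only and a no-arg constructor.
    static <T> List<T> map(String[] header, List<String[]> rows, Class<T> type)
            throws ReflectiveOperationException {
        List<T> result = new ArrayList<>();
        for (String[] row : rows) {
            T bean = type.getDeclaredConstructor().newInstance();
            for (int i = 0; i < header.length && i < row.length; i++) {
                try {
                    Field f = type.getDeclaredField(header[i]);
                    f.setAccessible(true);
                    f.set(bean, row[i]);
                } catch (NoSuchFieldException ignored) {
                    // extra column with no matching field: skip it
                }
            }
            result.add(bean);
        }
        return result;
    }
}
```

With the header id1,caste1,salary,name1, only caste1 and name1 are filled in; id1 and salary are dropped because CasteRow has no such fields.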

There is no built-in functionality in opencsv for reading a column by name.
The official FAQ has the following example of how to read from a file:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String [] nextLine;
while ((nextLine = reader.readNext()) != null) {
// nextLine[] is an array of values from the line
System.out.println(nextLine[0] + nextLine[1] + "etc...");
}
You simply fetch the value in the second column of each row with nextLine[1] (remember, array indices are zero-based).
So, in your case you could simply read the second column:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String[] nextLine;
reader.readNext(); // skip the header line
while ((nextLine = reader.readNext()) != null) {
    System.out.println(nextLine[1]);
}
For a more sophisticated way of determining the column index from its header, refer to the answer from Scott Conway.

From the opencsv docs:
Starting with version 4.2, there’s another handy way of reading CSV files that doesn’t even require creating special classes. If your CSV file has headers, you can just initialize a CSVReaderHeaderAware and start reading the values out as a map:
CSVReaderHeaderAware reader = new CSVReaderHeaderAware(new FileReader("yourfile.csv"));
Map<String, String> record = reader.readMap();
.readMap() returns a single record. You need to call .readMap() repeatedly to get all the records, until it returns null at the end of the file (or at the first empty line), e.g.:
Map<String, String> values;
while ((values = reader.readMap()) != null) {
// consume the values here
}
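If you want the same map-per-record shape without a library, a rough sketch might look like this. HeaderAwareMaps.readMaps is a hypothetical helper, not part of opencsv; it assumes a simple CSV with no quoted commas, stops at the first empty line, and uses a LinkedHashMap so that iterating over a record visits the columns in file order.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HeaderAwareMaps {
    // Reads a simple CSV (no quoted commas) into one map per data row,
    // keyed by header name. LinkedHashMap keeps the columns in file order.
    static List<Map<String, String>> readMaps(Reader in) throws IOException {
        List<Map<String, String>> rows = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String headerLine = br.readLine();
            if (headerLine == null) {
                return rows; // no header, no records
            }
            String[] header = headerLine.split(",", -1);
            String line;
            // stop at end of input or at the first empty line
            while ((line = br.readLine()) != null && !line.isEmpty()) {
                String[] fields = line.split(",", -1);
                Map<String, String> row = new LinkedHashMap<>();
                for (int i = 0; i < header.length; i++) {
                    row.put(header[i], i < fields.length ? fields[i] : null);
                }
                rows.add(row);
            }
        }
        return rows;
    }
}
```

Because everything is loaded into a list, this version also gives you rows.size() as a record count, at the cost of holding the whole file in memory.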
The class also has another constructor which allows more customization, e.g.:
CSVReaderHeaderAware reader = new CSVReaderHeaderAware(
new InputStreamReader(inputStream),
0, // skipLines
parser, // custom parser
false, // keep end of lines
true, // verify reader
0, // multiline limit
null // null for default locale
);
One downside I have found is that, since the reader is lazy, it does not offer a record count. Therefore, if you need to know the total number (for example, to display correct progress information), you'll need to use another reader just for counting lines.
There is also a CSVReaderHeaderAwareBuilder available.

I had a task to remove several columns from an existing CSV, for example:
FirstName,LastName,City,County,Zip
Steve,Hopkins,London,Greater London,15554
James,Bond,Vilnius,Vilniaus,03250
I needed only the FirstName and LastName columns with their values, and it was very important that the column order stay the same. The map returned by rd.readMap() does not preserve the column order, so here is the code for this task:
String[] COLUMN_NAMES_TO_REMOVE = new String[]{"", "City", "County", "Zip"};
CSVReaderHeaderAware rd = new CSVReaderHeaderAware(new FileReader("old.csv"));
CSVWriter writer = new CSVWriter(new FileWriter("new.csv"),
        CSVWriter.DEFAULT_SEPARATOR, CSVWriter.NO_QUOTE_CHARACTER, CSVWriter.NO_ESCAPE_CHARACTER, CSVWriter.DEFAULT_LINE_END);
// let's get the private headerIndex field (this relies on an opencsv implementation detail)
Field privateField = CSVReaderHeaderAware.class.getDeclaredField("headerIndex");
privateField.setAccessible(true);
@SuppressWarnings("unchecked")
Map<String, Integer> headerIndex = (Map<String, Integer>) privateField.get(rd);
// sort by column index in natural order - 0, 1, 2 ... n
Map<String, Integer> sortedInNaturalOrder = headerIndex.entrySet().stream()
        .sorted(Map.Entry.comparingByValue(Comparator.naturalOrder()))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                (oldValue, newValue) -> oldValue, LinkedHashMap::new));
// let's get the headers in natural order
List<String> headers = new ArrayList<>(sortedInNaturalOrder.keySet());
// let's remove the unwanted headers
headers.removeAll(Arrays.asList(COLUMN_NAMES_TO_REMOVE));
// save the column names
writer.writeNext(headers.toArray(new String[0]));
// save the values of the remaining columns, in header order
Map<String, String> values;
while ((values = rd.readMap()) != null) {
    String[] itemsArray = new String[headers.size()];
    for (int i = 0; i < headers.size(); i++) {
        itemsArray[i] = values.get(headers.get(i));
    }
    writer.writeNext(itemsArray);
}
rd.close();
writer.close(); // close() also flushes
Output:
FirstName,LastName
Steve,Hopkins
James,Bond
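Reading the private headerIndex field ties the code to opencsv internals and may break on an upgrade. A reflection-free alternative, sketched under the assumption that all rows fit in memory as a List<String[]> whose first element is the header row (for example, rows returned by CSVReader.readAll()), is to compute the indices of the columns to keep from the header once and copy those positions from every row. DropColumns.drop is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DropColumns {
    // rows.get(0) is the header row. Returns new rows without the named columns,
    // keeping the remaining columns in their original order.
    static List<String[]> drop(List<String[]> rows, String... namesToRemove) {
        Set<String> remove = new HashSet<>(Arrays.asList(namesToRemove));
        String[] header = rows.get(0);
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < header.length; i++) {
            if (!remove.contains(header[i])) {
                keep.add(i); // indices are collected in header order
            }
        }
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            String[] kept = new String[keep.size()];
            for (int j = 0; j < kept.length; j++) {
                int src = keep.get(j);
                kept[j] = src < row.length ? row[src] : ""; // short rows get ""
            }
            out.add(kept);
        }
        return out;
    }
}
```

Each resulting String[] can then be passed to writer.writeNext(...) exactly as in the code above.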

Looking at the javadoc:
if you create a CSVReader object, you can use its readAll() method to pull in the entire file. It returns a List of String[], with each String[] holding the already-split fields of one line of the file. On each line you only want the second element, i.e. index 1. Splitting a single line by hand shows the idea:
public static void main(String[] args){
String data = "63,Graham,101153.06,Abraham";
String result[] = data.split(",");
System.out.print(result[1]);
}

Related

BufferedReader - Output columns in a different order JAVA

I have 2 CSV files with columns 'car', 'bike', 'tractor', etc.
The code below prints out data from the CSV, which works fine; however, csv 1 prints its columns in a different order from csv 2, so I want to arrange the columns in a different order.
From this code, how can I organise the data so that it prints out in the column order I choose (first, second, etc.)?
BufferedReader r = new BufferedReader(new InputStreamReader(str));
Stream lines = r.lines().skip(1);
lines.forEachOrdered(
        line -> {
            line = ((String) line).replace("\"", "");
            ret.add((String) line);
        });
The columns print out like this:
csv 1
Car, Bike, Tractor, Plane, Train
csv 2
Bike, Plane, Tractor, Train, Car,
but I want to manipulate the code so the two csv files print out in the same order like;
Bike, Plane, Tractor, Train, Car
I can't use the likes of col[1], col[3], as the two files are in a different order, so I would need to refer to them by column name in the CSV file, e.g. col["Truck"].
Or is there another way, like creating a new list from the csv 1 output and rearranging it?
I haven't used BufferedReader much, so I'm not sure if this is a silly question with a simple solution.
A BufferedReader reads lines, and does not care for the content of those lines. So this code will simply save lines into ret as it is reading them:
List<String> ret = new ArrayList<>();
try (BufferedReader r = new BufferedReader(new InputStreamReader(str))) {
    r.lines().skip(1).forEachOrdered(l -> ret.add(l.replace("\"", "")));
}
// now ret contains one string per CSV line, excluding the 1st
(This is somewhat better than your code in that it is guaranteed to close the reader correctly, and does not require any casts to String.)
If your CSV lines do not contain any , characters that are not separators, you can modify the above code to split lines into columns; which you can then reorder:
List<String[]> ret = new ArrayList<>(); // list of string arrays
try (BufferedReader r = new BufferedReader(new InputStreamReader(str))) {
r.lines().skip(1).forEachOrdered(l ->
ret.add(l.replace("\"", "").split(","))); // splits by ','
}
// now ret contains a String[] per CSV line, skipping the 1st;
// with ret.get(0)[1] being the 2nd column of the 1st non-skipped line
// this will output all lines, reversing the order of columns 1 and 2:
for (String[] line : ret) {
System.out.print(line[1] + ", " + line[0]);
for (int i=2; i<line.length; i++) System.out.print(", " + line[i]);
System.out.println();
}
If your CSV lines can contain commas that are not delimiters (for example inside quoted fields), you will need to learn how to correctly parse (=read) CSVs, and that requires significantly more than a BufferedReader. I would recommend using an external library to handle this correctly, since there are many kinds of CSV in the wild. In particular, with Apache Commons CSV, things are relatively straightforward:
try (Reader in = new FileReader("path/to/file.csv")) {
Iterable<CSVRecord> records = CSVFormat.RFC4180.parse(in);
for (CSVRecord record : records) {
String columnOne = record.get(0);
String columnTwo = record.get(1);
}
}
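Building on the split-based version, the "col by name" idea from the question can be sketched by looking each column up in the header row, so both files come out in the same order regardless of their layout. ReorderColumns.reorder is a hypothetical helper; it assumes the lines were split without the skip(1), so the header row is the first element, and that no field contains a comma. It trims the stray spaces seen in headers like "Car, Bike" and throws if a requested column is missing.

```java
import java.util.ArrayList;
import java.util.List;

class ReorderColumns {
    // rows.get(0) is the header row. Returns all rows with their columns rearranged
    // to follow targetOrder, looked up by header name (so file order doesn't matter).
    static List<String[]> reorder(List<String[]> rows, List<String> targetOrder) {
        List<String> header = new ArrayList<>();
        for (String h : rows.get(0)) {
            header.add(h.trim()); // tolerate "Car, Bike" style spacing
        }
        int[] src = new int[targetOrder.size()];
        for (int j = 0; j < src.length; j++) {
            src[j] = header.indexOf(targetOrder.get(j));
            if (src[j] < 0) {
                throw new IllegalArgumentException("Missing column: " + targetOrder.get(j));
            }
        }
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            String[] reordered = new String[src.length];
            for (int j = 0; j < src.length; j++) {
                reordered[j] = row[src[j]].trim(); // values are trimmed too
            }
            out.add(reordered);
        }
        return out;
    }
}
```

Calling reorder on both files with the same target list, e.g. Bike, Plane, Tractor, Train, Car, gives matching column orders.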

Best way to populate a user defined object using the values of string array

I am reading two different CSV files and populating the data into two different objects. I split each line of the CSV file with a regex (the regex is different for the two files) and populate the object from the elements of the resulting array, as shown below:
public static <T> List<T> readCsv(String filePath, String type) {
List<T> list = new ArrayList<T>();
try {
File file = new File(filePath);
FileInputStream fileInputStream = new FileInputStream(file);
InputStreamReader inputStreamReader = new InputStreamReader(fileInputStream);
BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
list = bufferedReader.lines().skip(1).map(line -> {
T obj = null;
String[] data = null;
if (type.equalsIgnoreCase("Student")) {
data = line.split(",");
ABC abc = new ABC();
abc.setName(data[0]);
abc.setRollNo(data[1]);
abc.setMobileNo(data[2]);
obj = (T)abc;
} else if (type.equalsIgnoreCase("Employee")) {
data = line.split("\\|");
XYZ xyz = new XYZ();
xyz.setName(Integer.parseInt(data[0]));
xyz.setCity(data[1]);
xyz.setEmployer(data[2]);
xyz.setDesignation(data[3]);
obj = (T)xyz;
}
return obj;
}).collect(Collectors.toList());
} catch (Exception e) {
}
return list;
}
The CSV files are as follows:
i. csv file to populate ABC object:
Name,rollNo,mobileNo
Test1,1000,8888888888
Test2,1001,9999999990
ii. csv file to populate XYZ object
Name|City|Employer|Designation
Test1|City1|Emp1|SSE
Test2|City2|Emp2|
The issue is that data can be missing for any of the above columns, as shown in the second CSV file. In that case, I get an ArrayIndexOutOfBoundsException.
Can anyone let me know what is the best way to populate the object using the data of the string array?
Thanks in advance.
In addition to the other mistakes you made, which were pointed out to you in the comments, your actual problem is caused by line.split("\\|") calling line.split("\\|", 0), which discards trailing empty Strings. You need to call it as line.split("\\|", -1) instead, and it will work.
The problem appears to be that one or more of the last values on any given CSV line may be empty. In that case, you run into the fact that String.split(String) suppresses trailing empty strings.
Supposing that you can rely on all the fields in fact being present, even if empty, you can simply use the two-arg form of split():
data = line.split(",", -1);
You can find details in that method's API docs.
If you cannot be confident that the fields will be present at all, then you can force them to be by adding delimiters to the end of the input string:
data = (line + ",,").split(",", -1);
Since you only use the first few values, any extra trailing values introduced by the extra delimiters would be ignored.
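The difference between the two split forms, and an alternative to appending extra delimiters, can be sketched as follows. splitPadded is a hypothetical helper: it keeps trailing empty strings with limit -1 and then pads short rows with empty strings up to an expected field count, so indexing up to that count cannot throw.

```java
import java.util.Arrays;

class SplitTrailingEmpty {
    // Splits a pipe-delimited line, keeping trailing empty fields, and pads
    // the result with "" so it has at least `expected` elements.
    static String[] splitPadded(String line, int expected) {
        String[] parts = line.split("\\|", -1); // -1: keep trailing empty strings
        if (parts.length >= expected) {
            return parts;
        }
        String[] padded = Arrays.copyOf(parts, expected); // new slots start as null
        Arrays.fill(padded, parts.length, expected, "");  // replace them with ""
        return padded;
    }

    public static void main(String[] args) {
        System.out.println("Test2|City2|Emp2|".split("\\|").length);     // 3
        System.out.println("Test2|City2|Emp2|".split("\\|", -1).length); // 4
    }
}
```

For the Employee file you would call splitPadded(line, 4) and then read data[0] to data[3] safely.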

How to read CSV headers and get them in to a list in java

I have a CSV with headers, for example:
Title,Project ID,summary,priority
1,1,Test summary,High
Now I want to get the list of headers in the CSV file.
NOTE: The headers passed will be different every time.
Thanks in advance
You can use CSVReader:
String fileName = "data.csv";
CSVReader reader = new CSVReader(new FileReader(fileName ));
// if the first line is the header
String[] header = reader.readNext();
You can read the CSV file line by line.
Split each line at the commas.
The split method returns an array.
Each array element contains one value from the line that was read.
Suppose the Title and Project ID fields are of integer type: then, of the elements that are integers, treat the first as Title and the second as Project ID.
The string elements can then be treated as Summary and Priority.
You could use the org.apache.commons.csv.CSVParser of apache commons. It has methods to get the headers and the content.
Try the following to read only the header from a CSV file:
BufferedReader br = new BufferedReader(new FileReader("myfile.csv"));
String header = br.readLine();
if (header != null) {
String[] columns = header.split(",");
}
Apache Commons CSV: to get only the header content from a CSV file and store it in a list, use the code below:
List<Map<String, Integer>> list = new ArrayList<>();
try (Reader reader = Files.newBufferedReader(Paths.get(CSV_FILE_PATH))) {
CSVParser csvParser = new CSVParser(reader, CSVFormat.DEFAULT.withHeader());
Map<String, Integer> header = csvParser.getHeaderMap();
list.add(header);
list.forEach(System.out::println);
}catch (IOException e){
e.printStackTrace();
}

Only partially parse a CSV file with OpenCSV

I have a CSV file which I want to parse in Java with OpenCSV's CSVReader.
To do so, I have created a bean object to which the information is mapped. Mine is a bit long, so here's an example I got from a tutorial:
package net.viralpatel.java;
public class Country {
private String countryName;
private String capital;
public String getCountryName() {
return countryName;
}
public void setCountryName(String countryName) {
this.countryName = countryName;
}
public String getCapital() {
return capital;
}
public void setCapital(String capital) {
this.capital = capital;
}
}
The code I used to parse my CSV file and map the information to the bean resembles this one:
ColumnPositionMappingStrategy strat = new ColumnPositionMappingStrategy();
strat.setType(Country.class);
String[] columns = new String[] {"countryName", "capital"};
strat.setColumnMapping(columns);
CsvToBean csv = new CsvToBean();
String csvFilename = "C:\\sample.csv";
CSVReader csvReader = new CSVReader(new FileReader(csvFilename));
List list = csv.parse(strat, csvReader);
The problem is that my CSV contains not only raw data but also column titles and other data. For the column titles, I solved the issue by only reading my file from a certain line with :
CSVReader csvReader = new CSVReader(new FileReader(csvFilename), ';', '\'', 1);
(1 being the line from which the reading starts)
The other data is mostly strings in (for example) integer columns at the end of the file.
For example, I have a "Max Speed" column with integer information, right next to a "Distance" column, also with integer information. But at the end of the "Distance" column there is the total distance, so the String "total:" is in the "Max Speed" column right next to it.
What can I do to ensure that the reader ignores these last lines and only reads the raw data above them?
PS: the CSV files I read have different lengths, so saying "stop reading after line X" won't do the trick. On the other hand, the "appendix" lines are always the same, so saying "stop reading two lines before the end of the file" should work.
Thank you very much for your help.
You can always drop down to a lower level and check the raw string array before mapping it into a bean, like this:
ColumnPositionMappingStrategy<Country> strat = new ColumnPositionMappingStrategy<Country>();
strat.setType(Country.class);
String[] columns = new String[] {"countryName", "capital"};
strat.setColumnMapping(columns);
PublicProcessLineCsvToBean<Country> csv = new PublicProcessLineCsvToBean<Country>();
String csvFilename = "C:\\sample.csv";
CSVReader csvReader = new CSVReader(new FileReader(csvFilename));
List<Country> list = new ArrayList<Country>();
String [] row = csvReader.readNext(); //skip header
if(row == null) throw new RuntimeException("File is empty");
row = csvReader.readNext();
String [] nextRow = csvReader.readNext();
while(row != null) {
if(nextRow == null) break; // 'row' is the last line, so skip it
if("Total:".equalsIgnoreCase(row[1])) break; //check column for special strings
list.add(csv.processLine(strat, row));
row = nextRow;
nextRow = csvReader.readNext();
}
and to make processLine public:
public static class PublicProcessLineCsvToBean<T> extends CsvToBean<T> {
@Override
public T processLine(MappingStrategy<T> mapper, String[] line) throws IllegalAccessException, InvocationTargetException, InstantiationException, IntrospectionException {
return super.processLine(mapper, line);
}
}
If you are using a newer version of opencsv, inject a CsvToBeanFilter into your CsvToBean. The opencsv javadoc gives an excellent example of how to create a filter. For your example, you would just create a filter whose allowLine method returns false if the Max Speed value is null, empty, or "total:".
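For the "stop two lines before the end" idea from the PS, a generic lookahead buffer is another option. This is a minimal sketch with a hypothetical DropTrailing.allButLast helper: it reads the source once and holds back the most recent n elements, so the last n are never returned. It assumes the rows are available as an Iterator, for instance from csvReader.iterator() if your opencsv version's CSVReader is Iterable<String[]>.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class DropTrailing {
    // Returns every element except the last n, reading the source only once.
    // A queue of up to n pending elements acts as the lookahead: an element is
    // released only after n newer elements have been seen behind it.
    // Note: ArrayDeque does not accept null elements.
    static <T> List<T> allButLast(Iterator<T> source, int n) {
        Deque<T> pending = new ArrayDeque<>();
        List<T> kept = new ArrayList<>();
        while (source.hasNext()) {
            pending.addLast(source.next());
            if (pending.size() > n) {
                kept.add(pending.removeFirst());
            }
        }
        return kept; // the last n elements are left in `pending` and discarded
    }
}
```

With n = 2 and the header already skipped, each returned row could then be handed to processLine as in the answer above.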

Transposing arrays

I am using the following code to read in a CSV file:
String next[] = {};
List<String[]> dataArray = new ArrayList<String[]>();
try {
CSVReader reader = new CSVReader(new InputStreamReader(getAssets().open("inputFile.csv")));
for(;;) {
next = reader.readNext();
if(next != null) {
dataArray.add(next);
} else {
break;
}
}
} catch (IOException e) {
e.printStackTrace();
}
This turns a CSV file into the list 'dataArray'. My application is a dictionary-type app: the first column of the input data is a list of words, and the second column holds the definitions of those words. Here is an example of the data loaded in:
Term 1, Definition 1
Term 2, Definition 2
Term 3, Definition 3
In order to access one of the strings in the array, I use the following code:
dataArray.get(rowNumber)[columnNumber]
However, I need to be able to generate a list of all the terms so that they can be displayed in the dictionary application. As I understand it, accessing a column by itself is a much more lengthy process than accessing a row (I come from a MATLAB background, where this would be simple).
It seems that in order to have ready access to any column of my input data, I would be better off transposing the data and reading it in that way, i.e.:
Term 1, Term 2, Term 3
Definition 1, Definition 2, Definition 3
Of course, I could just provide a CSV file that is transposed in the first place, but Excel and OO Calc don't allow more than 256 columns, and my dictionary contains around 2000 terms.
Any of the following solutions would be welcomed:
A way to transpose an array once it has been read in
An alteration to the code posted above, such that it reads in data in the 'transposed' way
A simple way to read an entire column of an array as a whole
You would probably be better served by using a Map data structure (e.g. HashMap):
String next[] = {};
HashMap<String, String> dataMap = new HashMap<String, String>();
try {
CSVReader reader = new CSVReader(new InputStreamReader(getAssets().open("inputFile.csv")));
for(;;) {
next = reader.readNext();
if(next != null) {
dataMap.put(next[0], next[1]);
} else {
break;
}
}
} catch (IOException e) {
e.printStackTrace();
}
Then you can access the first column by
dataMap.keySet();
and the second column by
dataMap.values();
Note one assumption here: that the first column of your input data contains only unique values (that is, there are no repeated values in the "Term" column).
To be able to access the keys (terms) as an array, you can simply do as follows:
String[] terms = new String[dataMap.keySet().size()];
terms = dataMap.keySet().toArray(terms);
If each row has two values, where the first one is the term and the second one is the definition, you could build a Map from it like this (by the way, this while loop does exactly the same thing as your for loop):
String next[] = {};
Map<String, String> dataMap = new HashMap<String, String>();
try {
CSVReader reader = new CSVReader(new InputStreamReader(getAssets().open("inputFile.csv")));
while((next = reader.readNext()) != null) {
dataMap.put(next[0], next[1]);
}
} catch (IOException e) {
e.printStackTrace();
}
Then you can get the definition from a term via:
String definition = dataMap.get(term);
or all definitions like this:
for (String term: dataMap.keySet()) {
String definition = dataMap.get(term);
}
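For the first option in the question (transposing once the data has been read in), a minimal sketch in plain Java follows. Transpose.transpose is a hypothetical helper; it assumes the List<String[]> produced by the original reading loop and pads short rows with null so every column has one entry per row.

```java
import java.util.ArrayList;
import java.util.List;

class Transpose {
    // Turns rows into columns: result.get(c).get(r) equals rows.get(r)[c].
    // Short rows are padded with null so every column has rows.size() entries.
    static List<List<String>> transpose(List<String[]> rows) {
        int width = 0;
        for (String[] row : rows) {
            width = Math.max(width, row.length);
        }
        List<List<String>> columns = new ArrayList<>();
        for (int c = 0; c < width; c++) {
            List<String> column = new ArrayList<>(rows.size());
            for (String[] row : rows) {
                column.add(c < row.length ? row[c] : null);
            }
            columns.add(column);
        }
        return columns;
    }
}
```

Transpose.transpose(dataArray).get(0) would then be the list of terms in file order. If you keep the Map approach instead, note that a HashMap does not preserve the file order, while a LinkedHashMap preserves insertion order.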
