Only partialially parse a CSV file with OpenCSV - java

I have a CSV file which I want to parse in Java with OpenCSV's csvreader.
To do so I have created a bean object to which the information is mapped. Mine is a bit long so here's an example I got from a tutorial :
package net.viralpatel.java;
public class Country {
private String countryName;
private String capital;
public String getCountryName() {
return countryName;
}
public void setCountryName(String countryName) {
this.countryName = countryName;
}
public String getCapital() {
return capital;
}
public void setCapital(String capital) {
this.capital = capital;
}
}
The code I used to parse my CSV file and map the information to the bean resembles this one :
ColumnPositionMappingStrategy strat = new ColumnPositionMappingStrategy();
strat.setType(Country.class);
String[] columns = new String[] {"countryName", "capital"};
strat.setColumnMapping(columns);
CsvToBean csv = new CsvToBean();
String csvFilename = "C:\\sample.csv";
CSVReader csvReader = new CSVReader(new FileReader(csvFilename));
List list = csv.parse(strat, csvReader);
The problem is that my CSV contains not only raw data but also column titles and other data. For the column titles, I solved the issue by only reading my file from a certain line with :
CSVReader csvReader = new CSVReader(new FileReader(csvFilename), ';', '\'', 1);
(1 being the line from which the reading starts)
The other data is mostly strings in (for example) integer columns at the end of the file.
For example i have a "Max Speed" column with integer information, just next to a "Distance" column with integer information too. But at the end of the "Distance" column there is the total distance, so the String "total:" is in the "Max Speed" column right next to it.
What can I do to ensure that the reader ignores this last lines and only reads the raw information above?
PS : the CSV files I read have different lengths. So saying "stop reading after line X" won't do the trick. On the other hand the "appendix" lines are always the same. So saying "Stop reading two lines before the end of the file" should work.
Thank you very much for your help.

You can always fall to lower level and check raw string array before map it into bean like this:
ColumnPositionMappingStrategy<Country> strat = new ColumnPositionMappingStrategy<Country>();
strat.setType(Country.class);
String[] columns = new String[] {"countryName", "capital"};
strat.setColumnMapping(columns);
PublicProcessLineCsvToBean<Country> csv = new PublicProcessLineCsvToBean<Country>();
String csvFilename = "C:\\sample.csv";
CSVReader csvReader = new CSVReader(new FileReader(csvFilename));
List<Country> list = new ArrayList<Country>();
String [] row = csvReader.readNext(); //skip header
if(row == null) throw new RuntimeException("File is empty");
row = csvReader.readNext();
String [] nextRow = csvReader.readNext();
while(row != null) {
if(nextRow == null) break; //check what 'row' is last
if("Total:".equalsIgnoreCase(row[1])) break; //check column for special strings
list.add(csv.processLine(strat, row));
row = nextRow;
nextRow = csvReader.readNext();
}
and to make processLine public:
public static class PublicProcessLineCsvToBean<T> extends CsvToBean<T> {
#Override
public T processLine(MappingStrategy<T> mapper, String[] line) throws IllegalAccessException, InvocationTargetException, InstantiationException, IntrospectionException {
return super.processLine(mapper, line);
}
}

If you are using the newer versions of opencsv then inject a CsvToBeanFilter into you CSVtoBean class. The opencsv javadoc gives an excellent example of how to create a filter. For your example you would just create a filter whose allowLine method would return false if the Max Speed is null, empty or "total:"

Related

How to update a record in csv file using Apache commons csv api in java?

I have a csv file written using Apache commons API and I also can read the file, however I'm unable to know how to edit a record value in the csv file using Apache commons API, need help on this.
I tried the below code and it worked exactly the way I expected.
public static void updateCsvFile(File f) throws Exception {
CSVParser parser = new CSVParser(new FileReader(f), CSVFormat.DEFAULT);
List<CSVRecord> list = parser.getRecords();
String edited = f.getAbsolutePath();
f.delete();
CSVPrinter printer = new CSVPrinter(new FileWriter(edited), CSVFormat.DEFAULT.withRecordSeparator(NEW_LINE_SEPARATOR));
for (CSVRecord record : list) {
String[] s = toArray(record);
if(s[0].equalsIgnoreCase("Actual Text")){
s[0] = "Replacement Text";
}
print(printer, s);
}
parser.close();
printer.close();
System.out.println("CSV file was updated successfully !!!");
}
public static String[] toArray(CSVRecord rec) {
String[] arr = new String[rec.size()];
int i = 0;
for (String str : rec) {
arr[i++] = str;
}
return arr;
}
public static void print(CSVPrinter printer, String[] s) throws Exception {
for (String val : s) {
printer.print(val != null ? String.valueOf(val) : "");
}
printer.println();
}
The Apache CSV interface only support reading and writing exclusively, you cannot update records with the provided API.
So your best option is probably to read the file into memory, do the changes and write it out again.
If the size of the file is bigger than available memory you might need some streaming approach which reads records and writes them out before reading the next one. You need to write to a separate file in this case naturally.

How to read from particular header in opencsv?

I have a csv file. I want to extract particular column from it.For example:
Say, I have csv:
id1,caste1,salary,name1
63,Graham,101153.06,Abraham
103,Joseph,122451.02,Charlie
63,Webster,127965.91,Violet
76,Smith,156150.62,Eric
97,Moreno,55867.74,Mia
65,Reynolds,106918.14,Richard
How can i use opencsv to read only data from header caste1?
Magnilex and Sparky are right in that CSVReader does not support reading values by column name. But that being said there are two ways you can do this.
Given that you have the column names and the default CSVReader reads the header you can search the first the header for the position then use that from there on out;
private int getHeaderLocation(String[] headers, String columnName) {
return Arrays.asList(headers).indexOf(columnName);
}
so your method would look like (leaving out a lot of error checks you will need to put in)
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String [] nextLine;
int columnPosition;
nextLine = reader.readNext();
columnPosition = getHeaderLocation(nextLine, "castle1");
while ((nextLine = reader.readNext()) != null && columnPosition > -1) {
// nextLine[] is an array of values from the line
System.out.println(nextLine[columnPosition]);
}
I would only do the above if you were pressed for time and it was only one column you cared about. That is because openCSV can convert directly to an object that has the variables the same as the header column names using the CsvToBean class and the HeaderColumnNameMappingStrategy.
So first you would define a class that has the fields (and really you only need to put in the fields you want - extras are ignored and missing ones are null or default values).
public class CastleDTO {
private int id1;
private String castle1;
private double salary;
private String name1;
// have all the getters and setters here....
}
Then your code would look like
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
HeaderColumnNameMappingStrategy<CastleDTO> castleStrategy = new HeaderColumnNameMappingStrategy<CastleDTO>();
CsvToBean<CastleDTO> csvToBean = new CsvToBean<CastleDTO>();
List<CastleDTO> castleList = csvToBean.parse(castleStrategy, reader);
for (CastleDTO dto : castleList) {
System.out.println(dto.getCastle1());
}
There is no built in functionality in opencsv for reading from a column by name.
The official FAQ example has the following example on how to read from a file:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String [] nextLine;
while ((nextLine = reader.readNext()) != null) {
// nextLine[] is an array of values from the line
System.out.println(nextLine[0] + nextLine[1] + "etc...");
}
You simply fetch the value in second column for each row by accesing the row with nextLine[1] (remember, arrays indices are zero based).
So, in your case you could simply read from the second line:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String [] nextLine;
while ((nextLine = reader.readNext()) != null) {
System.out.println(nextLine[1]);
}
For a more sophisticated way of determining the column index from its header, refer to the answer from Scott Conway.
From the opencsv docs:
Starting with version 4.2, there’s another handy way of reading CSV files that doesn’t even require creating special classes. If your CSV file has headers, you can just initialize a CSVReaderHeaderAware and start reading the values out as a map:
reader = new CSVReaderHeaderAware(new FileReader("yourfile.csv"));
record = reader.readMap();
.readMap() will return a single record. You need to call .readMap() repeatedly to get all the records until you get null when it runs to the end (or to the first empty line), e.g.:
Map<String, String> values;
while ((values = reader.readMap()) != null) {
// consume the values here
}
The class also has another constructor which allows more customization, e.g.:
CSVReaderHeaderAware reader = new CSVReaderHeaderAware(
new InputStreamReader(inputStream),
0, // skipLines
parser, // custom parser
false, // keep end of lines
true, // verify reader
0, // multiline limit
null // null for default locale
);
One downside which I have found is that since the reader is lazy it does not offer a record count, therefore, if you need to know the total number (for example to display correct progress information), then you'll need to use another reader just for counting lines.
You also have available the CSVReaderHeaderAwareBuilder
I had a task to remove several columns from existing csv, example of csv:
FirstName, LastName, City, County, Zip
Steve,Hopkins,London,Greater London,15554
James,Bond,Vilnius,Vilniaus,03250
I needed only FirstName and LastName columns with values and it is very important that order should be the same - default rd.readMap() does not preserve the order, code for this task:
String[] COLUMN_NAMES_TO_REMOVE = new String[]{"", "City", "County", "Zip"};
CSVReaderHeaderAware rd = new CSVReaderHeaderAware(new StringReader(old.csv));
CSVWriter writer = new CSVWriter((new FileWriter(new.csv)),
CSVWriter.DEFAULT_SEPARATOR, CSVWriter.NO_QUOTE_CHARACTER, CSVWriter.NO_ESCAPE_CHARACTER, CSVWriter.DEFAULT_LINE_END);
// let's get private field
Field privateField = CSVReaderHeaderAware.class.getDeclaredField("headerIndex");
privateField.setAccessible(true);
Map<String, Integer> headerIndex = (Map<String, Integer>) privateField.get(rd);
// do ordering in natural order - 0, 1, 2 ... n
Map<String, Integer> sortedInNaturalOrder = headerIndex.entrySet().stream()
.sorted(Map.Entry.comparingByValue(Comparator.naturalOrder()))
.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
(oldValue, newValue) -> oldValue, LinkedHashMap::new));
// let's get headers in natural order
List<String> headers = sortedInNaturalOrder.keySet().stream().distinct().collect(Collectors.toList());
// let's remove headers
List<String> removedColumns = new ArrayList<String>(Arrays.asList(COLUMN_NAMES_TO_REMOVE));
headers.removeAll(removedColumns);
// save column names
writer.writeNext(headers.toArray(new String[headers.size()]));
List<String> keys = new ArrayList<>();
Map<String, String> values;
while ((values = rd.readMap()) != null) {
for (String key : headers) {
keys.add(values.get(key));
if (keys.size() == headers.size()) {
String[] itemsArray = new String[headers.size()];
itemsArray = keys.toArray(itemsArray);
// save values
writer.writeNext(itemsArray);
keys.clear();
}
}
}
writer.flush();
Output:
FirstName, LastName
Steve,Hopkins
James,Bond
Looking at the javadoc
if you create a CSVReader object, then you can use the method .readAll to pull the entire file. It returns a List of String[], with each String[] representing a line of the file. So now you have the tokens of each line, and you only want the second element of that, so split them up as they have been nicely given to you with delimiters. And on each line you only want the second element, so:
public static void main(String[] args){
String data = "63,Graham,101153.06,Abraham";
String result[] = data.split(",");
System.out.print(result[1]);
}

SuperCSV reading from multiple files and parsing into one bean object

I am currently trying to read in multiple CSV files using beanReader before taking a few columns from each and parsing them into one bean.
So far I cannot seem to parse columns from different files into one bean object. Is this even possible with ICsvBeanReader?
Yes, it's possible :) As of Super CSV 2.2.0 you can read into an existing bean (see javadoc).
The following example uses 3 readers simultaneously (operating on 3 different files) - the first reader is used to create the bean, the other 2 just update the existing bean. This approach assumes that each file has the same number of rows (and that each row number represents the same person). If they don't, but they share some unique identifier, you'll have to read all the records from the first file into memory first, then update from the second/third matching on the identifier.
I've tried to make it a little bit smart, so you don't have to hard-code the name mapping - it just nulls out the headers it doesn't know about (so that Super CSV doesn't attempt to map fields that don't exist in your bean - see the partial reading examples on the website). Of course this will only work if your file has headers - otherwise you'll just have to hard code the mapping arrays with nulls in the appropriate places.
Person bean
public class Person {
private String firstName;
private String sex;
private String country;
// getters/setters
}
Example code
public class Example {
private static final String FILE1 = "firstName,lastName\nJohn,Smith\nSally,Jones";
private static final String FILE2 = "age,sex\n21,male\n24,female";
private static final String FILE3 = "city,country\nBrisbane,Australia\nBerlin,Germany";
private static final List<String> DESIRED_HEADERS = Arrays.asList("firstName", "sex", "country");
#Test
public void testMultipleFiles() throws Exception {
try (
ICsvBeanReader reader1 = new CsvBeanReader(new StringReader(FILE1), CsvPreference.STANDARD_PREFERENCE);
ICsvBeanReader reader2 = new CsvBeanReader(new StringReader(FILE2), CsvPreference.STANDARD_PREFERENCE);
ICsvBeanReader reader3 = new CsvBeanReader(new StringReader(FILE3), CsvPreference.STANDARD_PREFERENCE);){
String[] mapping1 = getNameMappingFromHeader(reader1);
String[] mapping2 = getNameMappingFromHeader(reader2);
String[] mapping3 = getNameMappingFromHeader(reader3);
Person person;
while((person = reader1.read(Person.class, mapping1)) != null){
reader2.read(person, mapping2);
reader3.read(person, mapping3);
System.out.println(person);
}
}
}
private String[] getNameMappingFromHeader(ICsvBeanReader reader) throws IOException{
String[] header = reader.getHeader(true);
// only read in the desired fields (set unknown headers to null to ignore)
for (int i = 0; i < header.length; i++){
if (!DESIRED_HEADERS.contains(header[i])){
header[i] = null;
}
}
return header;
}
}
Output
Person [firstName=John, sex=male, country=Australia]
Person [firstName=Sally, sex=female, country=Germany]

updating specific cell csv file using java

Hi i have a small problem and think i'm just not getting the correct syntax on one line of code. basically, i can write into my csv file and find a specific record using string tokenizer but it is not updating/editing the specified cells of that record. the record remains the same. please help....
I have used http://opencsv.sourceforge.net in java
Hi,
This is the code to update CSV by specifying row and column
/**
* Update CSV by row and column
*
* #param fileToUpdate CSV file path to update e.g. D:\\chetan\\test.csv
* #param replace Replacement for your cell value
* #param row Row for which need to update
* #param col Column for which you need to update
* #throws IOException
*/
public static void updateCSV(String fileToUpdate, String replace,
int row, int col) throws IOException {
File inputFile = new File(fileToUpdate);
// Read existing file
CSVReader reader = new CSVReader(new FileReader(inputFile), ',');
List<String[]> csvBody = reader.readAll();
// get CSV row column and replace with by using row and column
csvBody.get(row)[col] = replace;
reader.close();
// Write to CSV file which is open
CSVWriter writer = new CSVWriter(new FileWriter(inputFile), ',');
writer.writeAll(csvBody);
writer.flush();
writer.close();
}
This solution worked for me,
Cheers!
I used the below code where I will replace a string with another and it worked exactly the way I needed:
public static void updateCSV(String fileToUpdate) throws IOException {
File inputFile = new File(fileToUpdate);
// Read existing file
CSVReader reader = new CSVReader(new FileReader(inputFile), ',');
List<String[]> csvBody = reader.readAll();
// get CSV row column and replace with by using row and column
for(int i=0; i<csvBody.size(); i++){
String[] strArray = csvBody.get(i);
for(int j=0; j<strArray.length; j++){
if(strArray[j].equalsIgnoreCase("Update_date")){ //String to be replaced
csvBody.get(i)[j] = "Updated_date"; //Target replacement
}
}
}
reader.close();
// Write to CSV file which is open
CSVWriter writer = new CSVWriter(new FileWriter(inputFile), ',');
writer.writeAll(csvBody);
writer.flush();
writer.close();
}
You're doing something like this:
String line = readLineFromFile();
line.replace(...);
This is not editing the file, it's creating a new string from a line in the file.
String instances are immutable, so the replace call you're making returns a new string it does not modify the original string.
Either use a file stream that allows you to both read and write to the file - i.e. RandomAccessFile or (more simply) write to a new file then replace the old file with the new one
In psuedo code:
for (String line : inputFile) {
String [] processedLine = processLine(line);
outputFile.writeLine(join(processedLine, ","));
}
private String[] processLine(String line) {
String [] cells = line.split(","); // note this is not sufficient for correct csv parsing.
for (int i = 0; i < cells.length; i++) {
if (wantToEditCell(cells[i])) {
cells[i] = "new cell value";
}
}
return cells;
}
Also, please take a look at this question. There are libraries to help you deal with csv.
CSV file is just a file. It is not being changed if you are reading it.
So, write your changes!
You have 3 ways.
1
read line by line finding the cell you want to change.
change the cell if needed and composite new version of current line.
write the line into second file.
when you finished you have the source file and the result file. Now if you want you can remove the source file and rename the result file to source.
2
Use RandomAccess file to write into specific place of the file.
3
Use one of available implementations of CSV parser (e.g. http://commons.apache.org/sandbox/csv/)
It already supports what you need and exposes high level API.

Java: CSV file read & write

I'm reading 2 csv files: store_inventory & new_acquisitions.
I want to be able to compare the store_inventory csv file with new_acquisitions.
1) If the item names match just update the quantity in store_inventory.
2) If new_acquisitions has a new item that does not exist in store_inventory, then add it to the store_inventory.
Here is what i have done so far but its not very good. I added comments where i need to add taks 1 & 2.
Any advice or code to do the above tasks would be great! thanks.
File new_acq = new File("/src/test/new_acquisitions.csv");
Scanner acq_scan = null;
try {
acq_scan = new Scanner(new_acq);
} catch (FileNotFoundException ex) {
Logger.getLogger(mainpage.class.getName()).log(Level.SEVERE, null, ex);
}
String itemName;
int quantity;
Double cost;
Double price;
File store_inv = new File("/src/test/store_inventory.csv");
Scanner invscan = null;
try {
invscan = new Scanner(store_inv);
} catch (FileNotFoundException ex) {
Logger.getLogger(mainpage.class.getName()).log(Level.SEVERE, null, ex);
}
String itemNameInv;
int quantityInv;
Double costInv;
Double priceInv;
while (acq_scan.hasNext()) {
String line = acq_scan.nextLine();
if (line.charAt(0) == '#') {
continue;
}
String[] split = line.split(",");
itemName = split[0];
quantity = Integer.parseInt(split[1]);
cost = Double.parseDouble(split[2]);
price = Double.parseDouble(split[3]);
while(invscan.hasNext()) {
String line2 = invscan.nextLine();
if (line2.charAt(0) == '#') {
continue;
}
String[] split2 = line2.split(",");
itemNameInv = split2[0];
quantityInv = Integer.parseInt(split2[1]);
costInv = Double.parseDouble(split2[2]);
priceInv = Double.parseDouble(split2[3]);
if(itemName == itemNameInv) {
//update quantity
}
}
//add new entry into csv file
}
Thanks again for any help. =]
Suggest you use one of the existing CSV parser such as Commons CSV or Super CSV instead of reinventing the wheel. Should make your life a lot easier.
Your implementation makes the common mistake of breaking the line on commas by using line.split(","). This does not work because the values themselves might have commas in them. If that happens, the value must be quoted, and you need to ignore commas within the quotes. The split method can not do this -- I see this mistake a lot.
Here is the source of an implementation that does it correctly:
http://agiletribe.purplehillsbooks.com/2012/11/23/the-only-class-you-need-for-csv-files/
With help of the open source library uniVocity-parsers, you could develop with pretty clean code as following:
private void processInventory() throws IOException {
/**
* ---------------------------------------------
* Read CSV rows into list of beans you defined
* ---------------------------------------------
*/
// 1st, config the CSV reader with row processor attaching the bean definition
CsvParserSettings settings = new CsvParserSettings();
settings.getFormat().setLineSeparator("\n");
BeanListProcessor<Inventory> rowProcessor = new BeanListProcessor<Inventory>(Inventory.class);
settings.setRowProcessor(rowProcessor);
settings.setHeaderExtractionEnabled(true);
// 2nd, parse all rows from the CSV file into the list of beans you defined
CsvParser parser = new CsvParser(settings);
parser.parse(new FileReader("/src/test/store_inventory.csv"));
List<Inventory> storeInvList = rowProcessor.getBeans();
Iterator<Inventory> storeInvIterator = storeInvList.iterator();
parser.parse(new FileReader("/src/test/new_acquisitions.csv"));
List<Inventory> newAcqList = rowProcessor.getBeans();
Iterator<Inventory> newAcqIterator = newAcqList.iterator();
// 3rd, process the beans with business logic
while (newAcqIterator.hasNext()) {
Inventory newAcq = newAcqIterator.next();
boolean isItemIncluded = false;
while (storeInvIterator.hasNext()) {
Inventory storeInv = storeInvIterator.next();
// 1) If the item names match just update the quantity in store_inventory
if (storeInv.getItemName().equalsIgnoreCase(newAcq.getItemName())) {
storeInv.setQuantity(newAcq.getQuantity());
isItemIncluded = true;
}
}
// 2) If new_acquisitions has a new item that does not exist in store_inventory,
// then add it to the store_inventory.
if (!isItemIncluded) {
storeInvList.add(newAcq);
}
}
}
Just follow this code sample I worked out according to your requirements. Note that the library provided simplified API and significent performance for parsing CSV files.
The operation you are performing will require that for each item in your new acquisitions, you will need to search each item in inventory for a match. This is not only not efficient, but the scanner that you have set up for your inventory file would need to be reset after each item.
I would suggest that you add your new acquisitions and your inventory to collections and then iterate over your new acquisitions and look up the new item in your inventory collection. If the item exists, update the item. If it doesnt, add it to the inventory collection. For this activity, it might be good to write a simple class to contain an inventory item. It could be used for both the new acquisitions and for the inventory. For a fast lookup, I would suggest that you use HashSet or HashMap for your inventory collection.
At the end of the process, dont forget to persist the changes to your inventory file.
As Java doesn’t support parsing of CSV files natively, we have to rely on third party library. Opencsv is one of the best library available for this purpose. It’s open source and is shipped with Apache 2.0 licence which makes it possible for commercial use.
Here, this link should help you and others in the situations!
For writing to CSV
public void writeCSV() {
// Delimiter used in CSV file
private static final String NEW_LINE_SEPARATOR = "\n";
// CSV file header
private static final Object[] FILE_HEADER = { "Empoyee Name","Empoyee Code", "In Time", "Out Time", "Duration", "Is Working Day" };
String fileName = "fileName.csv");
List<Objects> objects = new ArrayList<Objects>();
FileWriter fileWriter = null;
CSVPrinter csvFilePrinter = null;
// Create the CSVFormat object with "\n" as a record delimiter
CSVFormat csvFileFormat = CSVFormat.DEFAULT.withRecordSeparator(NEW_LINE_SEPARATOR);
try {
fileWriter = new FileWriter(fileName);
csvFilePrinter = new CSVPrinter(fileWriter, csvFileFormat);
csvFilePrinter.printRecord(FILE_HEADER);
// Write a new student object list to the CSV file
for (Object object : objects) {
List<String> record = new ArrayList<String>();
record.add(object.getValue1().toString());
record.add(object.getValue2().toString());
record.add(object.getValue3().toString());
csvFilePrinter.printRecord(record);
}
} catch (Exception e) {
e.printStackTrace();
} finally {
try {
fileWriter.flush();
fileWriter.close();
csvFilePrinter.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
You can use Apache Commons CSV api.
FYI this anwser : https://stackoverflow.com/a/42198895/6549532
Read / Write Example

Categories