Read CSV file column by column - java

I want to read specific columns from a multi-column CSV file and write those columns to another CSV file using Java. Any help, please? Below is my code, which prints each token line by line, but I am looking to print only a few columns out of the multi-column CSV.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

public class ParseCSV {
    public static void main(String[] args) {
        try {
            // csv file containing data
            String strFile = "C:\\Users\\rsaluja\\CMS_Evaluation\\Drupal_12_08_27.csv";
            // create BufferedReader to read csv file
            BufferedReader br = new BufferedReader(new FileReader(strFile));
            String strLine;
            StringTokenizer st;
            int lineNumber = 0, tokenNumber = 0;
            // read comma separated file line by line
            while ((strLine = br.readLine()) != null) {
                lineNumber++;
                // break comma separated line using ","
                st = new StringTokenizer(strLine, ",");
                while (st.hasMoreTokens()) {
                    // display csv values
                    tokenNumber++;
                    System.out.println("Line # " + lineNumber +
                            ", Token # " + tokenNumber +
                            ", Token : " + st.nextToken());
                }
            }
            br.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

You should use the excellent OpenCSV for reading and writing CSV files. To adapt your example to use the library it would look like this:
import java.io.FileReader;

import com.opencsv.CSVReader;

public class ParseCSV {
    public static void main(String[] args) {
        try {
            // csv file containing data
            String strFile = "C:/Users/rsaluja/CMS_Evaluation/Drupal_12_08_27.csv";
            CSVReader reader = new CSVReader(new FileReader(strFile));
            String[] nextLine;
            int lineNumber = 0;
            while ((nextLine = reader.readNext()) != null) {
                lineNumber++;
                System.out.println("Line # " + lineNumber);
                // nextLine[] is an array of values from the line
                System.out.println(nextLine[4] + " etc...");
            }
            reader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
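Since the goal in the question is to write the selected columns to another CSV, the reader can be paired with OpenCSV's CSVWriter. The following is only a minimal sketch: the file names and the column indexes 1 and 4 are placeholders, and it assumes every row has at least five columns.
import java.io.FileReader;
import java.io.FileWriter;

import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;

public class ExtractColumns {
    public static void main(String[] args) throws Exception {
        // input/output paths are placeholders -- adjust to your environment
        CSVReader reader = new CSVReader(new FileReader("input.csv"));
        CSVWriter writer = new CSVWriter(new FileWriter("selected-columns.csv"));
        String[] row;
        while ((row = reader.readNext()) != null) {
            // keep only the columns you care about (0-based indexes 1 and 4 here)
            writer.writeNext(new String[] { row[1], row[4] });
        }
        writer.close();
        reader.close();
    }
}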

Reading a CSV file is very simple and common in Java. You don't actually need to load any extra third-party library to do this for you. A CSV (comma-separated values) file is just a normal plain-text file that stores data column by column, split by a separator (e.g. a comma ",").
There are several ways to read specific columns from a CSV file. The simplest of all is shown below.
Code to read a CSV without any 3rd-party library:
String cvsSplitBy = ",";
String line;
BufferedReader br = new BufferedReader(new FileReader(csvFile));
while ((line = br.readLine()) != null) {
    // use comma as separator
    String[] cols = line.split(cvsSplitBy);
    System.out.println("Column 4= " + cols[4] + " , Column 5=" + cols[5]);
}
br.close();
If you notice, nothing special is performed here. It is just reading a text file and splitting it by a separator – ",".
Consider an extract from the legacy country CSV data in the GeoLite Free Downloadable Databases:
"1.0.0.0","1.0.0.255","16777216","16777471","AU","Australia"
"1.0.1.0","1.0.3.255","16777472","16778239","CN","China"
"1.0.4.0","1.0.7.255","16778240","16779263","AU","Australia"
"1.0.8.0","1.0.15.255","16779264","16781311","CN","China"
"1.0.16.0","1.0.31.255","16781312","16785407","JP","Japan"
"1.0.32.0","1.0.63.255","16785408","16793599","CN","China"
"1.0.64.0","1.0.127.255","16793600","16809983","JP","Japan"
"1.0.128.0","1.0.255.255","16809984","16842751","TH","Thailand"
The above code will output the following:
Column 4= "AU" , Column 5="Australia"
Column 4= "CN" , Column 5="China"
Column 4= "AU" , Column 5="Australia"
Column 4= "CN" , Column 5="China"
Column 4= "JP" , Column 5="Japan"
Column 4= "CN" , Column 5="China"
Column 4= "JP" , Column 5="Japan"
Column 4= "TH" , Column 5="Thailand"
You can, in fact, put the columns in a Map and then get the values simply by using the key.
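For example, here is a minimal sketch of that Map idea, assuming the CSV has a header row; the file name and the header name "countryCode" are placeholders, not taken from the data above.
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;

public class ReadByHeader {
    public static void main(String[] args) throws Exception {
        // "data.csv" and "countryCode" are placeholders for your own file and header name
        BufferedReader br = new BufferedReader(new FileReader("data.csv"));
        Map<String, Integer> header = new HashMap<>();
        String[] headerCols = br.readLine().split(",");
        for (int i = 0; i < headerCols.length; i++) {
            header.put(headerCols[i].trim(), i);
        }
        String line;
        while ((line = br.readLine()) != null) {
            String[] cols = line.split(",");
            // look the value up by header name instead of by hard-coded index
            System.out.println(cols[header.get("countryCode")]);
        }
        br.close();
    }
}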
Shishir

I am sorry, but none of these answers provides an optimal solution. If you use a library such as OpenCSV, you will have to write a lot of code to handle special cases when extracting information from specific columns.
For example, if you have rows with fewer columns than the ones you're after, you'll have to write a lot of code to handle that. Using the OpenCSV example:
CSVReader reader = new CSVReader(new FileReader(strFile));
String[] nextLine;
while ((nextLine = reader.readNext()) != null) {
    // let's say you are interested in getting columns 20, 30, and 40
    String[] outputRow = new String[3];
    if (nextLine.length <= 40) {
        outputRow[2] = null;
    } else {
        outputRow[2] = nextLine[40];
    }
    if (nextLine.length <= 30) {
        outputRow[1] = null;
    } else {
        outputRow[1] = nextLine[30];
    }
    if (nextLine.length <= 20) {
        outputRow[0] = null;
    } else {
        outputRow[0] = nextLine[20];
    }
}
This is a lot of code for a simple requirement. It gets worse if you are trying to get values of columns by name. You should use a more modern parser such as the one provided by uniVocity-parsers.
To reliably and easily get the columns you want, simply write:
CsvParserSettings settings = new CsvParserSettings();
settings.selectIndexes(20, 30, 40);
CsvParser parser = new CsvParser(settings);
List<String[]> allRows = parser.parseAll(new FileReader(yourFile));
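The same parser can also select columns by header name instead of by index, which holds up better when the column order changes. A hedged sketch (the header names "title" and "author" are made up; check the uniVocity-parsers documentation for your version):
import java.io.FileReader;
import java.util.List;

import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

public class SelectByName {
    public static void main(String[] args) throws Exception {
        CsvParserSettings settings = new CsvParserSettings();
        settings.setHeaderExtractionEnabled(true); // read column names from the first row
        settings.selectFields("title", "author");  // header names are placeholders
        CsvParser parser = new CsvParser(settings);
        // "yourFile.csv" is a placeholder path
        List<String[]> rows = parser.parseAll(new FileReader("yourFile.csv"));
        for (String[] row : rows) {
            // each row contains only the selected columns
            System.out.println(row[0] + " | " + row[1]);
        }
    }
}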
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

To read specific columns, I did something like this:
dpkcs.csv content:
FN,LN,EMAIL,CC
Name1,Lname1,email1@gmail.com,CC1
Name2,Lname2,email2r@gmail.com,CC2
The function to read it:
private void getEMailRecepientList() {
    List<EmailRecepientData> emailList = null; // blank list of POJO class
    Scanner scanner = null;
    BufferedReader reader = null;
    try {
        reader = new BufferedReader(new FileReader("dpkcs.csv"));
        Map<String, Integer> mailHeader = new HashMap<String, Integer>();
        // read file line by line
        String line = null;
        int index = 0;
        line = reader.readLine();
        // get the header from the 1st row of the csv
        if (line != null) {
            StringTokenizer str = new StringTokenizer(line, ",");
            int headerCount = str.countTokens();
            for (int i = 0; i < headerCount; i++) {
                String headerKey = str.nextToken();
                mailHeader.put(headerKey.toUpperCase(), Integer.valueOf(i));
            }
        }
        emailList = new ArrayList<EmailRecepientData>();
        while ((line = reader.readLine()) != null) {
            // POJO class with getters and setters
            EmailRecepientData email = new EmailRecepientData();
            scanner = new Scanner(line);
            scanner.useDelimiter(",");
            // use the specific key to get the value you want
            while (scanner.hasNext()) {
                String data = scanner.next();
                if (index == mailHeader.get("EMAIL"))
                    email.setEmailId(data);
                else if (index == mailHeader.get("FN"))
                    email.setFirstName(data);
                else if (index == mailHeader.get("LN"))
                    email.setLastName(data);
                else if (index == mailHeader.get("CC"))
                    email.setCouponCode(data);
                index++;
            }
            index = 0;
            emailList.add(email);
        }
        reader.close();
    } catch (Exception e) {
        StringWriter stack = new StringWriter();
        e.printStackTrace(new PrintWriter(stack));
        System.err.println(stack); // surface the captured stack trace
    } finally {
        if (scanner != null) {
            scanner.close();
        }
    }
    System.out.println("list--" + emailList);
}
The POJO Class:
public class EmailRecepientData {
private String emailId;
private String firstName;
private String lastName;
private String couponCode;
public String getEmailId() {
return emailId;
}
public void setEmailId(String emailId) {
this.emailId = emailId;
}
public String getFirstName() {
return firstName;
}
public void setFirstName(String firstName) {
this.firstName = firstName;
}
public String getLastName() {
return lastName;
}
public void setLastName(String lastName) {
this.lastName = lastName;
}
public String getCouponCode() {
return couponCode;
}
public void setCouponCode(String couponCode) {
this.couponCode = couponCode;
}
@Override
public String toString() {
return "Email Id=" + emailId + ", First Name=" + firstName + " ,"
+ " Last Name=" + lastName + ", Coupon Code=" + couponCode + "";
}
}

I suggest using Apache Commons CSV: https://commons.apache.org/proper/commons-csv/
Here is one example:
Path currentRelativePath = Paths.get("");
String currentPath = currentRelativePath.toAbsolutePath().toString();
String csvFile = currentPath + "/pathInYourProject/test.csv";

Reader in;
Iterable<CSVRecord> records = null;
try {
    in = new FileReader(csvFile);
    records = CSVFormat.EXCEL.withHeader().parse(in); // header will be skipped
} catch (IOException e) {
    e.printStackTrace();
}

for (CSVRecord record : records) {
    String line = "";
    for (int i = 0; i < record.size(); i++) {
        if (line.isEmpty())
            line = line.concat(record.get(i));
        else
            line = line.concat("," + record.get(i));
    }
    System.out.println("read line: " + line);
}
It automatically recognizes , and " but not ; (though the delimiter can be configured, as sketched below).
My example file is:
col1,col2,col3
val1,"val2",val3
"val4",val5
val6;val7;"val8"
And the output is:
read line: val1,val2,val3
read line: val4,val5
read line: val6;val7;"val8"
The last line is treated as a single value.
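If you do need a semicolon-separated file, the delimiter can be set on the format. A small sketch, assuming a Commons CSV version where withDelimiter is available (newer releases also offer a Builder API; the file name is a placeholder):
import java.io.FileReader;
import java.io.Reader;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;

public class SemicolonCsv {
    public static void main(String[] args) throws Exception {
        // "test-semicolon.csv" is a placeholder path
        Reader in = new FileReader("test-semicolon.csv");
        Iterable<CSVRecord> records = CSVFormat.EXCEL.withDelimiter(';').withHeader().parse(in);
        for (CSVRecord record : records) {
            // CSVRecord is Iterable<String>, so the values can be re-joined for printing
            System.out.println("read line: " + String.join(",", record));
        }
    }
}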

We can use core Java alone to read the CSV file column by column. Here is sample code I wrote for my requirement. I believe it will help someone.
BufferedReader br = new BufferedReader(new FileReader(csvFile));
String line = EMPTY;
int lineNumber = 0;
int productURIIndex = -1;
int marketURIIndex = -1;
int ingredientURIIndex = -1;
int companyURIIndex = -1;
// read comma separated file line by line
while ((line = br.readLine()) != null) {
lineNumber++;
// use comma as line separator
String[] splitStr = line.split(COMMA);
int splittedStringLen = splitStr.length;
// get the product title and uri column index by reading csv header
// line
if (lineNumber == 1) {
for (int i = 0; i < splittedStringLen; i++) {
if (splitStr[i].equals(PRODUCTURI_TITLE)) {
productURIIndex = i;
System.out.println("product_uri index:" + productURIIndex);
}
if (splitStr[i].equals(MARKETURI_TITLE)) {
marketURIIndex = i;
System.out.println("marketURIIndex:" + marketURIIndex);
}
if (splitStr[i].equals(COMPANYURI_TITLE)) {
companyURIIndex = i;
System.out.println("companyURIIndex:" + companyURIIndex);
}
if (splitStr[i].equals(INGREDIENTURI_TITLE)) {
ingredientURIIndex = i;
System.out.println("ingredientURIIndex:" + ingredientURIIndex);
}
}
} else {
if (splitStr != null) {
String conditionString = EMPTY;
// avoiding ArrayIndexOutOfBoundsException when the line
// contains only ,,,,,,,,,,,,,
for (String s : splitStr) {
conditionString = s;
}
if (!conditionString.equals(EMPTY)) {
if (productURIIndex != -1) {
productCVSUriList.add(splitStr[productURIIndex]);
}
if (companyURIIndex != -1) {
companyCVSUriList.add(splitStr[companyURIIndex]);
}
if (marketURIIndex != -1) {
marketCVSUriList.add(splitStr[marketURIIndex]);
}
if (ingredientURIIndex != -1) {
ingredientCVSUriList.add(splitStr[ingredientURIIndex]);
}
}
}
}
}
br.close();

This finds all files in a folder and appends their data to the ArrayList row.
Initialize
ArrayList<ArrayList<String>> row=new ArrayList<ArrayList<String>>();
BufferedReader br=null;
For accessing a row:
for (ArrayList<String> data : row) {
    data.get(colNo); // colNo is the column index you want
}
or row.get(0).get(0) // getting first row, first column
Function that reads all files from the folder and concatenates their rows:
static void readData() {
    String path = "C:\\Users\\Galaxy Computers\\Desktop\\Java project\\Nasdaq\\";
    File files = new File(path);
    String[] list = files.list();
    try {
        String sCurrentLine;
        for (String filename : list) {
            br = new BufferedReader(new FileReader(path + filename));
            br.readLine(); // skip the first line if the file contains an unnecessary header
            while ((sCurrentLine = br.readLine()) != null) {
                row.add(splitLine(sCurrentLine));
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    try {
        if (br != null) br.close();
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}
static ArrayList<String> splitLine(String line){
String[] ar=line.split(",");
ArrayList<String> d=new ArrayList<String>();
for(String data:ar){
d.add(data);
}
return d;
}

Well, how about this!
This code calculates both the row and column count of a CSV file. Try it out!
static int[] getRowsColsNo() {
Scanner scanIn = null;
int rows = 0;
int cols = 0;
String InputLine = "";
try {
scanIn = new Scanner(new BufferedReader(
new FileReader("filename.csv")));
scanIn.useDelimiter(",");
while (scanIn.hasNextLine()) {
InputLine = scanIn.nextLine();
String[] InArray = InputLine.split(",");
rows++;
cols = InArray.length;
}
} catch (Exception e) {
System.out.println(e);
}
return new int[] { rows, cols };
}

Related

Import Data as Array from TXT File in Java

I want to import data like this from a .txt file using Java.
I want to import this as an Array. The class of data is like this:
class SolidStateDrive {
String brand;
String model;
int capacityInGB;
}
So if the array from the txt file were hardcoded, it would look like this:
ssd[0].brand = "Samsung";
ssd[1].brand = "Adata";
ssd[0].model = "Evo970";
ssd[1].model = "SU650";
ssd[0].capacityInGB = 512;
ssd[1].capacityInGB = 240;
The problem is, when I try to read the .txt file, it can only store the data from the first line. If there is more than one line, it throws an ArrayIndexOutOfBoundsException.
I'm using a while loop so that as long as the next line is not null, it keeps looping. This is my code:
int n = 0;
SolidStateDrive[] ssd = new SolidStateDrive[n+1];
try {
BufferedReader br = new BufferedReader(new FileReader("SSD.txt"));
String line = null;
while((line = br.readLine()) != null) {
String tmp[] = line.split("\t");
ssd[n] = new SolidStateDrive();
ssd[n].brand = tmp[0];
ssd[n].model = tmp[1];
ssd[n].capacityInGB = Integer.parseInt(tmp[2]);
n++;
}
br.close();
} catch(IOException ex) {
ex.printStackTrace();
}
Update: I already tried this, but it doesn't work either:
SolidStateDrive[] ssd = new SolidStateDrive[2];
For complete code in that file : pastebin
The problem is in the line splitting: String tmp[] = line.split("\t");.
Take line one from the txt, ssd[0].brand = Samsung; the output of split is the same as the input, ssd[0].brand = Samsung;.
while ((line = br.readLine()) != null) {
String tmp[] = line.split("\t");
ssd[n] = new SolidStateDrive();
ssd[n].brand = tmp[0];
ssd[n].model = tmp[1];
ssd[n].capacityInGB = Integer.parseInt(tmp[2]);
n++;
}
So the tmp[] will contain only tmp[0] = ssd[0].brand = Samsung;.
When you try to access tmp[1] you will get
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1.
The solution to your problem:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
public class ReadData {
public static void main(String[] str) {
readData();
}
public static void readData() {
List<SolidStateDrive> ssd = new ArrayList<>();
try (BufferedReader br = new BufferedReader(new FileReader("SSD.txt"))) {
ssd = br.lines().map(s-> {
String[] tmp = s.split("\\s+");
return new SolidStateDrive(tmp[0], tmp[1], Integer.parseInt(tmp[2]));
}).collect(Collectors.toList());
} catch (IOException ex) {
ex.printStackTrace();
}
ssd.stream().forEach(System.out::println);
}
}
class SolidStateDrive {
String brand;
String model;
int capacityInGB;
public SolidStateDrive(String brand, String model, int capacityInGB) {
super();
this.brand = brand;
this.model = model;
this.capacityInGB = capacityInGB;
}
@Override
public String toString() {
return "SolidStateDrive [brand=" + brand + ", model=" + model + ", capacityInGB=" + capacityInGB + "]";
}
}
The ArrayIndexOutOfBoundsException occurs in the code you posted on pastebin; you have a for loop that runs n+1 times, which will always be one more than the number of SSDs in the array. This is the offending for loop:
for (int j=0; j<=n; j++) {
System.out.println(j+ "\t" +ssd[j].brand+ "\t" +ssd[j].model+ "\t" +ssd[j].capacityInGB);
}
To fix it, just change the <= to <, so the loop goes up to, but not including, n, since you started it at 0, not 1. It should look like this:
for (int j = 0; j < n; j++) {
System.out.println(j + "\t" + ssd[j].brand + "\t" + ssd[j].model + "\t" + ssd[j].capacityInGB);
}

display array issues possibly due to whitespace characters

I'm trying to import a txt file with car info, separate the strings into arrays, and then display them. The number of doors is combined with the next number plate. I have tried a few ways to get rid of the whitespace characters, which I think are causing the issue, but have had no luck.
whitespace chars
My code displays this result:
Number Plate : AG53DBO
Car Type : Mercedes
Engine Size : 1000
Colour : (255:0:0)
No. of Doors : 4
MD17WBW
Number Plate : 4
MD17WBW
Car Type : Volkswagen
Engine Size : 2300
Colour : (0:0:255)
No. of Doors : 5
ED03HSH
Code:
public class Application {
public static void main(String[] args) throws IOException {
///// ---- Import File ---- /////
String fileName =
"C:\\Users\\beng\\eclipse-workspace\\Assignment Trailblazer\\Car Data";
BufferedReader reader = new BufferedReader(new FileReader(fileName));
StringBuilder stringBuilder = new StringBuilder();
String line = null;
String ls = System.getProperty("line.separator");
while ((line = reader.readLine()) != null) {
stringBuilder.append(line);
stringBuilder.append(ls);
}
reader.close();
String content = stringBuilder.toString();
///// ---- Split file into array ---- /////
String[] dataList = content.split(",");
// Display array
for (String temp : dataList) {
// System.out.println(temp);
}
ArrayList<Car> carArray = new ArrayList();
// Loop variables
int listLength = 1;
int arrayPosition = 0;
// (dataList.length/5)
while (listLength < 5) {
Car y = new Car(dataList, arrayPosition);
carArray.add(y);
listLength++;
arrayPosition += 4;
}
for (Car temp : carArray) {
System.out.println(temp.displayCar());
}
}
}
And
public class Car {
String[] data;
private String modelUnpro;
private String engineSizeUnpro;
private String registrationUnpro;
private String colourUnpro;
private String doorNoUnpro;
// Constructor
public Car(String[] data, int arrayPosition) {
registrationUnpro = data[arrayPosition];
modelUnpro = data[arrayPosition + 1];
engineSizeUnpro = data[arrayPosition + 2];
colourUnpro = data[arrayPosition + 3];
doorNoUnpro = data[arrayPosition + 4];
}
// Getters
private String getModelUnpro() {
return modelUnpro;
}
private String getEngineSizeUnpro() {
return engineSizeUnpro;
}
private String getRegistrationUnpro() {
return registrationUnpro;
}
private String getColourUnpro() {
return colourUnpro;
}
private String getDoorNoUnpro() {
return doorNoUnpro;
}
public String displayCar() {
return "Number Plate : " + getRegistrationUnpro() + "\n Car Type : " + getModelUnpro() + "\n Engine Size : "
+ getEngineSizeUnpro() + "\n Colour : " + getColourUnpro() + "\n No. of Doors : " + getDoorNoUnpro() + "\n";
}
}
Text file:
AG53DBO,Mercedes,1000,(255:0:0),4
MD17WBW,Volkswagen,2300,(0:0:255),5
ED03HSH,Toyota,2000,(0:0:255),4
OH01AYO,Honda,1300,(0:255:0),3
WE07CND,Nissan,2000,(0:255:0),3
NF02FMC,Mercedes,1200,(0:0:255),5
PM16DNO,Volkswagen,1300,(255:0:0),5
MA53OKB,Honda,1400,(0:0:0),4
VV64BHH,Honda,1600,(0:0:255),5
ER53EVW,Ford,2000,(0:0:255),3
Remove the line separator from the while loop:
String fileName = "D:\\Files\\a.txt";
BufferedReader reader = new BufferedReader(new FileReader(fileName));
StringBuilder stringBuilder = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
stringBuilder.append(line.trim());
}
reader.close();
String content = stringBuilder.toString();
String[] dataList = content.split(",");
ArrayList<Car> carArray = new ArrayList();
int listLength = 1;
int arrayPosition = 0;
// (dataList.length/5)
while (listLength < 3) {
Car y = new Car(dataList, arrayPosition);
carArray.add(y);
listLength++;
arrayPosition += 4;
}
for (Car temp : carArray) {
System.out.println(temp.displayCar());
}
In StringBuilder you collect all lines:
AG53DBO,Mercedes,1000,(255:0:0),4\r\nMD17WBW,Volkswagen,2300,(0:0:255),5\r\n...
This string should first be split on ls, and only then will you have lines with fields separated by commas.
Splitting by comma alone will instead produce a merged array element, 4\r\nMD17WBW.
Something like:
String fileName =
"C:\\Users\\beng\\eclipse-workspace\\Assignment Trailblazer\\Car Data";
Path path = Paths.get(fileName);
List<String> lines = Files.readAllLines(path); // Without line ending.
List<Car> cars = new ArrayList<>();
for (String line : lines) {
String[] data = line.split(",");
Car car = new Car(data);
cars.add(car);
}
Path, Paths, and especially Files are very handy classes. With Java streams one can also abbreviate things like:
String fileName =
"C:\\Users\\beng\\eclipse-workspace\\Assignment Trailblazer\\Car Data";
Path path = Paths.get(fileName);
List<Car> cars = Files.lines(path) // Stream<String>
.map(line -> line.split(",")) // Stream<String[]>
.map(Car::new) // Stream<Car>
.collect(Collectors.toList()); // List<Car>
Here Files.lines returns a Stream<String> (a walking cursor) of the lines in the file, without line separators.
Then .map(l -> l.split(",")) splits every line.
Then the Car(String[]) constructor is called on the string array.
Then the result is collected in a List.

readNext() function of CSVReader not looping through all rows of csv [EDIT: How to handle erroneous CSV (remove unescaped quotes)]

FileReader fr = new FileReader(inp);
CSVReader reader = new CSVReader(fr, ',', '"');
// writer
File writtenFromWhile = new File(dliRootPath + writtenFromWhilePath);
writtenFromWhile.createNewFile();
CSVWriter writeFromWhile = new CSVWriter(new FileWriter(writtenFromWhile), ',', '"');
int insideWhile = 0;
String[] currRow = null;
while ((currRow = reader.readNext()) != null) {
insideWhile++;
writeFromWhile.writeNext(currRow);
}
System.out.println("inside While: " + insideWhile);
System.out.println("lines read (acc.to CSV reader): " + reader.getLinesRead());
The output is:
inside While: 162199
lines read (acc.to CSV reader): 256865
Even though all lines are written to the output CSV (when viewed in a text editor; Excel shows a much smaller number of rows), the while loop does not iterate the same number of times as there are rows in the input CSV. My main objective is to implement some other logic inside the while loop for each line.
I have been trying to debug for two whole days (it is part of a bigger program) without any results.
Please explain how I can loop through the while 256865 times.
Reference data, complete picture:
Here is the CSV I am reading in the above snippet.
My complete program tries to separate out those records from this CSV which are not present in this other CSV, based on the fields title and author (i.e. if author and title are the same in two records, they are counted as duplicates even if the other fields differ, and should not be written to the output file). Here is my complete code (the difference should be around 300000, but I get only ~210000 in the output file with my code):
//TODO ask id
/*
 * id is also there in the fields getting matched (thisRow[0] is id)
 * you can replace it by thisRow[fieldAndColumn.get(0)] to eliminate id
*/
package mainOne;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;
public class Diff_V3 {
static String dliRootPath = "/home/gurnoor/Incoming/Untitled Folder 2/";
static String dli = "new-dli-IITG.csv";
static String oldDli = "dli-iisc.csv";
static String newFile = "newSampleFile.csv";// not used
static String unqFile = "UniqueFileFinal.csv";
static String log = "Diff_V3_log.txt";
static String splittedNewDliDir = "/home/gurnoor/Incoming/Untitled Folder 2/splitted new file";
static String splittedOldDliDir = "/home/gurnoor/Incoming/Untitled Folder 2/splitted old file";
// debug
static String testFilePath = "testFile.csv";
static int insidepopulateMapFromSplittedCSV = 0;
public static void main(String[] args) throws IOException, CustomException {
// _readSample(dliRootPath+dli, dliRootPath+newFile);
// System.out.println(areIDsunique(dliRootPath + dli, 550841) );// open
// in geany to get total no
// of lines
// TODO implement sparate function to check equals
// File filteredFile = new File(dliRootPath + "filteredFile.csv");
// filteredFile.createNewFile();
File logFile = new File(dliRootPath + log);
logFile.createNewFile();
new File(dliRootPath + testFilePath).createNewFile();
List<String> fieldsToBeMatched = new ArrayList<>();
fieldsToBeMatched.add("dc.contributor.author[]");
fieldsToBeMatched.add("dc.title[]");
filterUniqueFileds(new File(splittedNewDliDir), new File(splittedOldDliDir), fieldsToBeMatched);
}
/**
* NOTE: might remove the row where fieldToBeMatched is null
*
* @param inpfile
* @param file
* @param filteredFile
* @param fieldsToBeMatched
* @throws IOException
* @throws CustomException
*/
private static void filterUniqueFileds(File newDir, File oldDir, List<String> fieldsToBeMatched)
throws IOException, CustomException {
CSVReader reader = new CSVReader(new FileReader(new File(dliRootPath + dli)), '|');
// writer
File unqFileOp = new File(dliRootPath + unqFile);
unqFileOp.createNewFile();
CSVWriter writer = new CSVWriter(new FileWriter(unqFileOp), '|');
// logWriter
BufferedWriter logWriter = new BufferedWriter(new FileWriter(new File(dliRootPath + log)));
String[] headingRow = // allRows.get(0);
reader.readNext();
writer.writeNext(headingRow);
int headingLen = headingRow.length;
// old List
System.out.println("[INFO] reading old list...");
// CSVReader oldReader = new CSVReader(new FileReader(new
// File(dliRootPath + oldDli)));
Map<String, List<String>> oldMap = new HashMap<>();
oldMap = populateMapFromSplittedCSV(oldMap, oldDir);// populateMapFromCSV(oldMap,
// oldReader);
// oldReader.close();
System.out.println("[INFO] Read old List. Size = " + oldMap.size());
printMapToCSV(oldMap, dliRootPath + testFilePath);
// map of fieldName, ColumnNo
Map<String, Integer> fieldAndColumnNoInNew = new HashMap<>(getColumnNo(fieldsToBeMatched, headingRow));
Map<String, Integer> fieldAndColumnNoInOld = new HashMap<>(
getColumnNo(fieldsToBeMatched, (String[]) oldMap.get("id").toArray()));
// error check: did columnNo get populated?
if (fieldAndColumnNoInNew.isEmpty()) {
reader.close();
writer.close();
throw new CustomException("field to be matched not present in input CSV");
}
// TODO implement own array compare using areEqual()
// error check
// if( !Arrays.equals(headingRow, (String[]) oldMap.get("id").toArray())
// ){
// System.out.println("heading in new file, old file: \n"+
// Arrays.toString(headingRow));
// System.out.println(Arrays.toString((String[])
// oldMap.get("id").toArray()));
// reader.close();
// writer.close();
// oldReader.close();
// throw new CustomException("Heading rows are not same in old and new
// file");
// }
int noOfRecordsInOldList = 0, noOfRecordsWritten = 0, checkManually = 0;
String[] thisRow;
while ((thisRow = reader.readNext()) != null) {
// for(int l=allRows.size()-1; l>=0; l--){
// thisRow=allRows.get(l);
// error check
if (thisRow.length != headingLen) {
String error = "Line no: " + reader.getLinesRead() + " in file: " + dliRootPath + dli
+ " not read. Check manually";
System.err.println(error);
logWriter.append(error + "\n");
logWriter.flush();
checkManually++;
continue;
}
// write if not present in oldMap
if (!oldMap.containsKey(thisRow[0])) {
writer.writeNext(thisRow);
writer.flush();
noOfRecordsWritten++;
} else {
// check if all reqd fields match
List<String> twinRow = oldMap.get(thisRow[0]);
boolean writtenToOp = false;
// for (int k = 0; k < fieldsToBeMatched.size(); k++) {
List<String> newFields = new ArrayList<>(fieldAndColumnNoInNew.keySet());
List<String> oldFields = new ArrayList<>(fieldAndColumnNoInOld.keySet());
// faaltu error check
if (newFields.size() != oldFields.size()) {
reader.close();
writer.close();
CustomException up = new CustomException("something is really wrong");
throw up;
}
// for(String fieldName : fieldAndColumnNoInNew.keySet()){
for (int m = 0; m < newFields.size(); m++) {
int columnInNew = fieldAndColumnNoInNew.get(newFields.get(m)).intValue();
int columnInOld = fieldAndColumnNoInOld.get(oldFields.get(m)).intValue();
String currFieldTwin = twinRow.get(columnInOld);
String currField = thisRow[columnInNew];
if (!areEqual(currField, currFieldTwin)) {
writer.writeNext(thisRow);
writer.flush();
writtenToOp = true;
noOfRecordsWritten++;
System.out.println(noOfRecordsWritten);
break;
}
}
if (!writtenToOp) {
noOfRecordsInOldList++;
// System.out.println("[INFO] present in old List: \n" +
// Arrays.toString(thisRow) + " AND\n"
// + twinRow.toString());
}
}
}
System.out.println("--------------------------------------------------------\nDebug info");
System.out.println("old File: " + oldMap.size());
System.out.println("new File:" + reader.getLinesRead());
System.out.println("no of records in old list (present in both old and new) = " + noOfRecordsInOldList);
System.out.println("checkManually: " + checkManually);
System.out.println("noOfRecordsInOldList+checkManually = " + (noOfRecordsInOldList + checkManually));
System.out.println("no of records written = " + noOfRecordsWritten);
System.out.println();
System.out.println("inside populateMapFromSplittedCSV() " + insidepopulateMapFromSplittedCSV + "times");
logWriter.close();
reader.close();
writer.close();
}
private static void printMapToCSV(Map<String, List<String>> oldMap, String testFilePath2) throws IOException {
// writer
int i = 0;
CSVWriter writer = new CSVWriter(new FileWriter(new File(testFilePath2)), '|');
for (String key : oldMap.keySet()) {
List<String> row = oldMap.get(key);
String[] tempRow = new String[row.size()];
tempRow = row.toArray(tempRow);
writer.writeNext(tempRow);
writer.flush();
i++;
}
writer.close();
System.out.println("[hello from line 210 ( inside printMapToCSV() ) of ur code] wrote " + i + " lines");
}
private static Map<String, List<String>> populateMapFromSplittedCSV(Map<String, List<String>> oldMap, File oldDir)
throws IOException {
File defective = new File(dliRootPath + "defectiveOldFiles.csv");
defective.createNewFile();
CSVWriter defectWriter = new CSVWriter(new FileWriter(defective));
CSVReader reader = null;
for (File oldFile : oldDir.listFiles()) {
insidepopulateMapFromSplittedCSV++;
reader = new CSVReader(new FileReader(oldFile), ',', '"');
oldMap = populateMapFromCSV(oldMap, reader, defectWriter);
// printMapToCSV(oldMap, dliRootPath+testFilePath);
System.out.println(oldMap.size());
reader.close();
}
defectWriter.close();
System.out.println("inside populateMapFromSplittedCSV() " + insidepopulateMapFromSplittedCSV + "times");
return new HashMap<String, List<String>>(oldMap);
}
private static Map<String, Integer> getColumnNo(List<String> fieldsToBeMatched, String[] headingRow) {
Map<String, Integer> fieldAndColumnNo = new HashMap<>();
for (String field : fieldsToBeMatched) {
for (int i = 0; i < headingRow.length; i++) {
String heading = headingRow[i];
if (areEqual(field, heading)) {
fieldAndColumnNo.put(field, Integer.valueOf(i));
break;
}
}
}
return fieldAndColumnNo;
}
private static Map<String, List<String>> populateMapFromCSV(Map<String, List<String>> oldMap, CSVReader oldReader,
CSVWriter defectWriter) throws IOException {
int headingLen = 0;
List<String> headingRow = null;
if (oldReader.getLinesRead() > 1) {
headingRow = oldMap.get("id");
headingLen = headingRow.size();
}
String[] thisRow;
int insideWhile = 0, addedInMap = 0, doesNotContainKey = 0, containsKey = 0;
while ((thisRow = oldReader.readNext()) != null) {
// error check
// if (oldReader.getLinesRead() > 1) {
// if (thisRow.length != headingLen) {
// System.err.println("Line no: " + oldReader.getLinesRead() + " in
// file: " + dliRootPath + oldDli
// + " not read. Check manually");
// defectWriter.writeNext(thisRow);
// defectWriter.flush();
// continue;
// }
// }
insideWhile++;
if (!oldMap.containsKey(thisRow[0])) {
doesNotContainKey++;
List<String> fullRow = Arrays.asList(thisRow);
fullRow = oldMap.put(thisRow[0], fullRow);
if (fullRow == null) {
addedInMap++;
}
} else {
List<String> twinRow = oldMap.get(thisRow[0]);
boolean writtenToOp = false;
// for(String fieldName : fieldAndColumnNoInNew.keySet()){
for (int m = 0; m < headingRow.size(); m++) {
String currFieldTwin = twinRow.get(m);
String currField = thisRow[m];
if (!areEqual(currField, currFieldTwin)) {
System.err.println("do something!!!!!! DUPLICATE ID in old file");
containsKey++;
FileWriter logWriter = new FileWriter(new File((dliRootPath + log)));
System.err.println("[Skipped record] in old file. Row no: " + oldReader.getLinesRead()
+ "\nRecord: " + Arrays.toString(thisRow));
logWriter.append("[Skipped record] in old file. Row no: " + oldReader.getLinesRead()
+ "\nRecord: " + Arrays.toString(thisRow));
logWriter.close();
break;
}
}
}
}
System.out.println("inside while: " + insideWhile);
System.out.println("oldMap size = " + oldMap.size());
System.out.println("addedInMap: " + addedInMap);
System.out.println("doesNotContainKey: " + doesNotContainKey);
System.out.println("containsKey: " + containsKey);
return new HashMap<String, List<String>>(oldMap);
}
private static boolean areEqual(String field, String heading) {
// TODO implement, askSubhayan
return field.trim().equals(heading.trim());
}
/**
* Returns the first duplicate ID OR the string "unique" OR (rarely)
* totalLinesInCSV != totaluniqueIDs
*
* @param inpCSV
* @param totalLinesInCSV
* @return
* @throws IOException
*/
private static String areIDsunique(String inpCSV, int totalLinesInCSV) throws IOException {
CSVReader reader = new CSVReader(new FileReader(new File(dliRootPath + dli)), '|');
List<String[]> allRows = new ArrayList<>(reader.readAll());
reader.close();
Set<String> id = new HashSet<>();
for (String[] thisRow : allRows) {
if (thisRow[0] != null || !thisRow[0].isEmpty() || id.add(thisRow[0])) {
return thisRow[0];
}
}
if (id.size() == totalLinesInCSV) {
return "unique";
} else {
return "totalLinesInCSV != totaluniqueIDs";
}
}
/**
* writes 20 rows of the input csv into the output file
*
* @param input
* @param output
* @throws IOException
*/
public static void _readSample(String input, String output) throws IOException {
File opFile = new File(dliRootPath + newFile);
opFile.createNewFile();
CSVWriter writer = new CSVWriter(new FileWriter(opFile));
CSVReader reader = new CSVReader(new FileReader(new File(dliRootPath + dli)), '|');
for (int i = 0; i < 20; i++) {
// String[] op;
// for(String temp: reader.readNext()){
writer.writeNext(reader.readNext());
// }
// System.out.println();
}
reader.close();
writer.flush();
writer.close();
}
}
RC's comment nailed it!
If you check the Javadocs you will see that there are two methods in CSVReader: getLinesRead and getRecordsRead. They both do exactly what they say: getLinesRead returns the number of lines read by the underlying FileReader, while getRecordsRead returns the number of records the CSVReader has read. Keep in mind that if your records contain embedded newlines, it takes multiple line reads to get one record. So it is entirely possible to have a CSV file with 100 records that takes 200 line reads to read them all.
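A minimal sketch of that distinction, counting loop iterations against getLinesRead() (the file name is a placeholder; records with embedded newlines are what make the two numbers diverge):
import java.io.FileReader;

import com.opencsv.CSVReader;

public class LinesVsRecords {
    public static void main(String[] args) throws Exception {
        // "input.csv" is a placeholder path
        CSVReader reader = new CSVReader(new FileReader("input.csv"));
        int records = 0;
        while (reader.readNext() != null) {
            records++; // one increment per CSV record, regardless of embedded newlines
        }
        System.out.println("records iterated: " + records);
        System.out.println("lines read:       " + reader.getLinesRead());
        reader.close();
    }
}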
Unescaped quotes inside a CSV cell can mess up your whole data. This might happen in a CSV if the data you are working with has been created manually. Below is a function I wrote a while back for this situation. Let me know if this is not the right place to share it.
/**
* removes quotes inside a cell/column and puts the curated data in
* "../CuratedFiles"
*
* @param curateDir
* @param del Csv column delimiter
* @throws IOException
*/
public static void curateCsvRowQuotes(File curateDir, String del) throws IOException {
File parent = curateDir.getParentFile();
File curatedDir = new File(parent.getAbsolutePath() + "/CuratedFiles");
curatedDir.mkdir();
for (File file : curateDir.listFiles()) {
BufferedReader bufRead = new BufferedReader(new FileReader(file));
// output
File fOp = new File(curatedDir.getAbsolutePath() + "/" + file.getName());
fOp.createNewFile();
BufferedWriter bufW = new BufferedWriter(new FileWriter(fOp));
bufW.append(bufRead.readLine() + "\n");// heading
// logs
File logFile = new File(curatedDir.getAbsolutePath() + "/CurationLogs.txt");
logFile.createNewFile();
BufferedWriter logWriter = new BufferedWriter(new FileWriter(logFile));
String thisLine = null;
int lineCount = 0;
while ((thisLine = bufRead.readLine()) != null) {
String opLine = "";
int endIndex = thisLine.indexOf("\"" + del);
String str = thisLine.substring(0, endIndex);
opLine += str + "\"" + del;
while (endIndex != (-1)) {
// leave out first " in a cell
int tempIndex = thisLine.indexOf("\"" + del, endIndex + 2);
if (tempIndex == (-1)) {
break;
}
str = thisLine.substring(endIndex + 2, tempIndex);
int indexOfQuote = str.indexOf("\"");
opLine += str.substring(0, indexOfQuote + 1);
// remove all "
str = str.substring(indexOfQuote + 1);
str = str.replace("\"", "");
opLine += str + "\"" + del;
endIndex = thisLine.indexOf("\"" + del, endIndex + 2);
}
str = thisLine.substring(thisLine.lastIndexOf("\"" + del) + 2);
if ((str != null) && str.matches("[" + del + "]+")) {
opLine += str;
}
System.out.println(opLine);
bufW.append(opLine + "\n");
bufW.flush();
lineCount++;
}
System.out.println(lineCount + " no of lines in " + file.getName());
bufRead.close();
bufW.close();
}
}
In my case, I had called csvReader.readAll() before readNext().
Like
List<String[]> myData =csvReader.readAll();
while ((nextRecord = csvReader.readNext()) != null) {
}
So my csvReader.readNext() always returned null, because all the values had already been consumed by readAll() into myData.
Please be cautious when combining the readNext() and readAll() functions.
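In other words, pick one of the two approaches: either iterate the list returned by readAll(), or loop with readNext(), but do not mix both on the same reader. A minimal fragment of the readAll() variant (csvReader is the reader from the snippet above):
List<String[]> myData = csvReader.readAll();
for (String[] nextRecord : myData) {
    // process nextRecord here; no readNext() loop is needed afterwards
}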

Handle Empty lines in Java

I am facing a problem in the following code. When I run the program, it terminates when it hits an empty value in my input. How else should I approach this?
try {
BufferedReader sc = new BufferedReader(new FileReader("text.txt"));
ArrayList<String> name = new ArrayList<>();
ArrayList<String> id = new ArrayList<>();
ArrayList<String> place = new ArrayList<>();
ArrayList<String> details = new ArrayList<>();
String line = null;
while ((line = sc.readLine()) !=null) {
if (!line.trim().equals("")) {
System.out.println(line);
if (line.toLowerCase().contains("name")) {
name.add(line.split("=")[1].trim());
}
if (line.toLowerCase().contains("id")) {
id.add(line.split("=")[1].trim());
}
if (line.toLowerCase().contains("location")) {
place.add(line.split("=")[1].trim());
}
if (line.toLowerCase().contains("details")) {
details.add(line.split("=")[1].trim());
}
}
}
PrintWriter pr = new PrintWriter(new File("text.csv"));
pr.println("Name;Id;;Location;Details");
for (int i = 0; i < name.size(); i++) {
pr.println(name.get(i) + ";" + id.get(i) + ";" + place.get(i) + ";" + details.get(i));
}
pr.close();
sc.close();
} catch (Exception e) {
e.printStackTrace();
} }
My input looks like:
name = abc
id = 123
place = xyz
details = hsdyhuslkjaldhaadj
name = ert
id = 7872
place =
details = shahkjdhksdhsala
name = sfd
id = 4343
place = ksjks
Details = kljhaljs
When I try to execute the above text, my program terminates at the place = entry because there is no value there. I need the output to keep an empty space where the place value is missing and to print the rest into a .csv file as before.
If you process the location, line.split("=")[1] could result in an ArrayIndexOutOfBoundsException, and line.split("=")[1].trim() could result in a NullPointerException.
You can avoid this by testing your parsed result.
Instead of place.add(line.split("=")[1].trim());, do place.add(parseContentDefaultEmpty(line));, with:
private String parseContentDefaultEmpty(final String line) {
final String[] result = line.split("=");
if(result.length <= 1) {
return "";
}
final String content = line.split("=")[1];
return content != null ? content.trim() : "";
}
First, there is an issue: your input file contains the key "place", but you are checking for the word "location".
if (line.toLowerCase().contains("location")) { //this must be changed to place
place.add(line.split("=")[1].trim());
}
The modified code snippet is below; check it:
while ((line = sc.readLine()) != null) {
if (!line.trim().equals("")) {
System.out.println(line);
if (line.toLowerCase().contains("name")) {
name.add(line.split("=")[1].trim());
}
if (line.toLowerCase().contains("id")) {
id.add(line.split("=")[1].trim());
}
if (line.toLowerCase().contains("place")) {
// change done here to add space if no value
place.add(line.split("=").length > 1 ? line.split("=")[1]
.trim() : " ");
}
if (line.toLowerCase().contains("details")) {
details.add(line.split("=")[1].trim());
}
}
}
Setting question to line doesn't appear to change what line is read later (if you're wanting the line to advance before it hits the while loop).

Output is Displayed in console and not in the File

try {
BufferedReader sc = new BufferedReader(new FileReader("/home/aravind/Desktop/India.txt"));
ArrayList<String> name = new ArrayList<>();
ArrayList<String> Location = new ArrayList<>();
ArrayList<String> Id = new ArrayList<>();
ArrayList<String> Details = new ArrayList<>();
String line = " ";
while ((line = sc.readLine()) != null) {
if (!line.trim().equals("")) {
System.out.println(line);
if (line.toLowerCase().contains("name")) {
name.add(line.split(":")[1].trim());
}
if (line.toLowerCase().contains("Location")) {
Location.add(line.split(":")[1].trim());
}
if (line.toLowerCase().contains("Id")) {
Id.add(line.split(":")[1].trim());
}
if (line.toLowerCase().contains("Details")) {
Details.add(line.split(":")[1].trim());
}
}
}
for (int i = 0; i < name.size(); i++) {
PrintWriter out = new PrintWriter(new FileWriter("output.csv"));
out.println("name;Location;Id;Details;");
out.println(name.get(i) + ";"
+ Location.get(i) + ";"
+ Id.get(i) + ";"
+ Details.get(i) + ";");
out.close();
}
sc.close();
} catch (Exception e) {
}
And my input file looks like:
name = abc
id = 123
Place = xyz
Details = some texts with two line
name = aaa
id = 54657
Place = dfd
Details = some texts with some lines
What could be the problem? Why is the output printed to the console instead of the csv file? Kindly help me.
In your file, the title and value are always separated by "=", whereas at runtime you split strings on ":". You should replace ":" with "=", so that your split result will not be empty at index 1:
name.add(line.split("=")[1].trim());
