Reading CSV line by line using OpenCSV and FifoBuffer - java

I am reading a CSV file with OpenCSV and using CircularFifoBuffer to split the data into columns and assign the value from each column to a string. This works fine for reading a specific row in the CSV file; however, I wish to read the file line by line, starting at the first row and working down to the last one.
Each time a row is read, its string values will be compared, and provided a given condition is satisfied, the next row will be read.
I can handle all of the above except processing the CSV data line by line.
Any pointers would be greatly appreciated.

Directly from the OpenCSV FAQ:
If you want to use an Iterator style pattern, you might do something like this:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String[] nextLine;
while ((nextLine = reader.readNext()) != null) {
    // nextLine[] is an array of values from the line
    System.out.println(nextLine[0] + nextLine[1] + "etc...");
}
Or, if you just want to slurp the whole lot into a List, call readAll():
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
List<String[]> myEntries = reader.readAll();
which will give you a List of String[] that you can iterate over. If all else fails, check out the Javadoc.
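Applied to the question's goal (walk the rows from top to bottom and only move on while a condition holds), the iterator-style loop can be wrapped in a small helper. This is a minimal sketch: `RowScanner` and `readWhile` are hypothetical names, and the condition is assumed to be expressible as a `Predicate<String[]>` over the current row. It accepts any `Iterable<String[]>`, so the list returned by `readAll()` works; in recent OpenCSV versions `CSVReader` itself implements `Iterable<String[]>` as well (worth checking against the version you use).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class RowScanner {

    private RowScanner() {}

    /**
     * Walks the rows in order, from the first to the last, and collects each
     * row for which the condition holds. Stops at the first row that fails
     * the condition, so the rows after it are never read.
     */
    static List<String[]> readWhile(Iterable<String[]> rows, Predicate<String[]> condition) {
        List<String[]> accepted = new ArrayList<>();
        for (String[] row : rows) {
            if (!condition.test(row)) {
                break; // condition not satisfied: do not go on to the next row
            }
            accepted.add(row);
        }
        return accepted;
    }
}
```

For example, `RowScanner.readWhile(myEntries, row -> row[0].equals(row[1]))` would return the leading rows whose first two values are equal and stop at the first row where they differ.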

Related

How to replace a CSV file after removing a column in Java

I am new to Java. I was able to read my CSV file from its local location and to identify which column needs to be deleted. However, I have not been able to delete that column and write the result back to my local folder. Is there a way to do this? I have used the following code:
CSVReader reader = new CSVReader(new FileReader(fileName));
String[] nextLine;
while ((nextLine = reader.readNext()) != null) {
    System.out.println(nextLine[15]);
}
All I would like to do is to remove the column having index 15 and write the file as a CSV file in my local folder.
I'm assuming you're using the OpenCSV library.
In order to make your code work, you have to fix 2 issues:
You need a writer to write your modified CSV to. OpenCSV provides a CSVWriter class for this purpose.
You need to convert your line (which is currently a String array) into a list to be able to remove an element, then convert it back into an array to match what the CSVWriter.writeNext method expects.
Here's some code that does this:
CSVReader reader = new CSVReader(new FileReader(fileName));
CSVWriter writer = new CSVWriter(new FileWriter(outFileName));
String[] origLine;
while ((origLine = reader.readNext()) != null) {
    // copy into a mutable list so that an element can be removed
    List<String> lineList = new ArrayList<>(Arrays.asList(origLine));
    lineList.remove(15);
    // convert back to the String[] that writeNext expects
    String[] newLine = lineList.toArray(new String[lineList.size()]);
    writer.writeNext(newLine, true);
}
writer.close();
reader.close();
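Some additional remarks:
The code probably needs more error handling (for example, closing the reader and writer in a try-with-resources block) if it's to be used in a production capacity.
List indices in Java start at 0, so remove(15) actually removes the 16th column from the file. It also throws an IndexOutOfBoundsException on any line that has fewer than 16 columns.
The code writes its output to a separate file. Using the same file name for input and output will not work, because opening a FileWriter on a file truncates it before the reader has read it.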
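As an alternative to the List round trip, the column can be dropped with `System.arraycopy`. This is a minimal sketch; `ColumnRemover.removeColumn` is a hypothetical helper, not part of OpenCSV. It leaves rows that are too short for the given index unchanged instead of throwing, and its result can be passed to `writer.writeNext(...)` in the same read/write loop.

```java
final class ColumnRemover {

    private ColumnRemover() {}

    /**
     * Returns a copy of row without the column at the given 0-based index.
     * Rows that have no such column are returned as an unchanged copy.
     */
    static String[] removeColumn(String[] row, int index) {
        if (index < 0 || index >= row.length) {
            return row.clone();
        }
        String[] result = new String[row.length - 1];
        // copy the cells before the removed column
        System.arraycopy(row, 0, result, 0, index);
        // copy the cells after it, shifted one position to the left
        System.arraycopy(row, index + 1, result, index, row.length - index - 1);
        return result;
    }
}
```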

BufferedReader - Output columns in different order JAVA

I have two CSV files with the columns 'car', 'bike', 'tractor', etc.
The code below prints out the data from the CSV, which works fine; however, CSV 1 prints its columns in a different order from CSV 2, so I want to rearrange them.
With this code, how can I print the data in a column order I choose (which column first, which second, and so on)?
BufferedReader r = new BufferedReader(new InputStreamReader(str));
Stream lines = r.lines().skip(1);
lines.forEachOrdered(
    line -> {
        line = ((String) line).replace("\"", "");
        ret.add((String) line);
    });
The columns print out like this:
CSV 1
Car, Bike, Tractor, Plane, Train
CSV 2
Bike, Plane, Tractor, Train, Car
but I want to change the code so that both CSV files print out in the same order, like:
Bike, Plane, Tractor, Train, Car
I can't use something like col[1], col[3], because the two files have their columns in a different order, so I would need to refer to them by column name, e.g. col["Truck"].
Or is there another way, like creating a new list from the CSV 1 output and rearranging it?
I haven't used BufferedReader much, so I'm not sure whether this is a silly question with a simple solution.
A BufferedReader reads lines, and does not care for the content of those lines. So this code will simply save lines into ret as it is reading them:
List<String> ret = new ArrayList<>();
try (BufferedReader r = new BufferedReader(new InputStreamReader(str))) {
    r.lines().skip(1).forEachOrdered(l -> ret.add(l.replace("\"", "")));
}
// now ret contains one string per CSV line, excluding the 1st
(This is somewhat better than your code, in that it is guaranteed to close the reader correctly and does not require any casts to String.)
If your CSV lines do not contain any commas that are not separators, you can modify the above code to split lines into columns, which you can then reorder:
List<String[]> ret = new ArrayList<>(); // list of string arrays
try (BufferedReader r = new BufferedReader(new InputStreamReader(str))) {
    r.lines().skip(1).forEachOrdered(l ->
        ret.add(l.replace("\"", "").split(","))); // splits by ','
}
// now ret contains a String[] per CSV line, skipping the 1st;
// with ret.get(0)[1] being the 2nd column of the 1st non-skipped line
// this will output all lines, reversing the order of columns 1 and 2:
for (String[] line : ret) {
    System.out.print(line[1] + ", " + line[0]);
    for (int i = 2; i < line.length; i++) System.out.print(", " + line[i]);
    System.out.println();
}
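Since the goal is to address columns by name (col["Truck"]) rather than by position, one option is to read the header line instead of skipping it and build a name-to-index map from it. This is a minimal sketch under the same assumption as above (no commas inside values); `ColumnReorder.reorder` is a hypothetical helper. `header` would be the first line split on ",", and `rows` the split data lines collected as in the code above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ColumnReorder {

    private ColumnReorder() {}

    /**
     * Reorders the cells of every data row so that they follow wantedOrder.
     * Columns are looked up by their name in this file's header, so files
     * with differently arranged columns all come out in the same order.
     */
    static List<String[]> reorder(String[] header, List<String[]> rows, String[] wantedOrder) {
        // map each column name to its position in this particular file
        Map<String, Integer> position = new HashMap<>();
        for (int i = 0; i < header.length; i++) {
            position.put(header[i].trim(), i);
        }
        List<String[]> result = new ArrayList<>();
        for (String[] row : rows) {
            String[] reordered = new String[wantedOrder.length];
            for (int i = 0; i < wantedOrder.length; i++) {
                Integer src = position.get(wantedOrder[i]);
                if (src == null) {
                    throw new IllegalArgumentException("No column named " + wantedOrder[i]);
                }
                reordered[i] = row[src].trim();
            }
            result.add(reordered);
        }
        return result;
    }
}
```

Calling it for both files with the same `wantedOrder`, for example `{"Bike", "Plane", "Tractor", "Train", "Car"}`, yields rows in the same column order regardless of how each file is laid out.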
If your CSV lines can contain commas that are not delimiters, you will need to parse CSV properly, and that requires significantly more than a BufferedReader. I would recommend using an external library to handle this correctly, since there are many variants of CSV in the wild. With Apache Commons CSV, for example, things are relatively straightforward:
try (Reader in = new FileReader("path/to/file.csv")) {
    Iterable<CSVRecord> records = CSVFormat.RFC4180.parse(in);
    for (CSVRecord record : records) {
        String columnOne = record.get(0);
        String columnTwo = record.get(1);
    }
}

Univocity CSV parser glues the whole line if it begins with quote "

I'm using univocity 2.7.5 to parse a CSV file. Until now it worked fine, parsing each row of the file as a String array with n elements, where n is the number of columns in the row. But now I have a file where the rows start with a quote (") and the parser cannot handle it: it returns each row as a String array with only one element, which contains the whole row. I tried removing that quote from the CSV file and it worked, but there are about 500,000 rows. What should I do to make it work?
Here is the sample line from my file (it has quotes in source file too):
"100926653937,Kasym Amina,620414400630,Marzhan Erbolova,""Kazakhstan, Almaty, 66, 3"",87029845662"
And here's my code:
CsvParserSettings settings = new CsvParserSettings();
settings.setDelimiterDetectionEnabled(true);
CsvParser parser = new CsvParser(settings);
List<String[]> rows = parser.parseAll(csvFile);
Author of the library here. The input you have there is a well-formed CSV, with a single value consisting of:
100926653937,Kasym Amina,620414400630,Marzhan Erbolova,"Kazakhstan, Almaty, 66, 3",87029845662
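To see why this is a single value: inside a quoted field, each doubled quote ("") stands for one literal quote. The minimal sketch below uses plain Java rather than univocity; it strips the outer quotes and un-doubles the inner ones, which turns the sample line into exactly the value shown above, itself an ordinary CSV row that can be parsed again. `QuotedLine.unwrapQuotedLine` is a hypothetical helper; lines that are not one single quoted field are returned unchanged.

```java
final class QuotedLine {

    private QuotedLine() {}

    /**
     * If the whole line is one quoted CSV field, returns its content with the
     * outer quotes removed and every doubled quote ("") turned back into a
     * single quote. Otherwise returns the line unchanged.
     */
    static String unwrapQuotedLine(String line) {
        if (line.length() < 2 || line.charAt(0) != '"' || line.charAt(line.length() - 1) != '"') {
            return line; // not wrapped in quotes at all
        }
        int end = line.length() - 1; // index of the closing quote
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < end; i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (i + 1 < end && line.charAt(i + 1) == '"') {
                    out.append('"'); // "" inside a quoted field is one literal quote
                    i++;             // skip the second quote of the pair
                } else {
                    return line;     // lone quote: the line holds more than one field
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```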
If that row appeared in the middle of your input, I suspect your input has unescaped quotes somewhere before that line. Try experimenting with the unescaped quote handling setting.
For example, this might work:
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);
If nothing else works, and all your lines look like the one you posted, you can parse the input twice (which is ugly and slow, but will work):
CsvParser parser = new CsvParser(settings);
parser.beginParsing(csvFile);
List<String[]> out = new ArrayList<>();
String[] row;
while ((row = parser.parseNext()) != null) {
    // got a row with unexpected length?
    if (row.length == 1) {
        // break it down again
        row = parser.parseLine(row[0]);
    }
    out.add(row);
}
Hope this helps.

Java: How to read huge records (>10k)

I'm looking for the best practice for reading a file with more than 10k records line by line and storing them in an ArrayList.
My program only manages to read about 3.5k records and ignores the rest.
URL cityurl = ClassLoader.getSystemResource(citypath);
citybr = new BufferedReader(new FileReader(cityurl.getFile()));
for (String city = citybr.readLine(); city != null; city = citybr.readLine()) {
    citycountryairport.add(citybr.readLine());
}
Thanks in advance!!
BufferedReader is a good choice for reading large files because it reads the file in buffered chunks rather than loading it into memory all at once (see the BufferedReader documentation).
Each call to
readLine();
reads the next line of the file. Your loop calls it twice per iteration: once in the loop header, which stores the line in city, and once in the body. In your code, change:
citycountryairport.add(citybr.readLine());
to:
citycountryairport.add(city);
Otherwise the lines read by
city = citybr.readLine()
are never added to your list, because you never add the String city to it, so only every other line of the file ends up in the list.
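Putting the fix together, here is a minimal sketch of the corrected loop, written against a generic `Reader` so it can be tried without a file; `LineLoader.readAllLines` is a hypothetical name. In the question's code the argument would be `new FileReader(cityurl.getFile())`. For a plain file path, `java.nio.file.Files.readAllLines(Path)` does the same in one call.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

final class LineLoader {

    private LineLoader() {}

    /**
     * Reads every line of the input into a list. readLine() is called exactly
     * once per iteration (in the loop condition), so no line is skipped.
     */
    static List<String> readAllLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```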

Trying to convert CSV to XLSX, but columns are being split up using the wrong commas

I am using the accepted answer from here. Basically, I am converting a CSV file to .xlsx, and the solution reads the file line by line with a BufferedReader and then uses:
String str[] = currentLine.split(",");
This splits each line into a separate array element for each column. My problem is that some of my data contains commas, so the algorithm gets confused and creates more columns than needed, splitting sentences across different columns, which doesn't work for me. Is there another way to split the lines? I'd happily split the string on a different unique character (maybe |), but I don't know how to replace the commas that come from the BufferedReader. Any help would be great. The code I am using is below for reference:
public static void csvToXLSX() {
    try {
        String csvFileAddress = "test.csv"; // csv file address
        String xlsxFileAddress = "test.xlsx"; // xlsx file address
        XSSFWorkbook workBook = new XSSFWorkbook();
        XSSFSheet sheet = workBook.createSheet("sheet1");
        String currentLine = null;
        int RowNum = 0;
        BufferedReader br = new BufferedReader(new FileReader(csvFileAddress));
        while ((currentLine = br.readLine()) != null) {
            String str[] = currentLine.split(",");
            RowNum++;
            XSSFRow currentRow = sheet.createRow(RowNum);
            for (int i = 0; i < str.length; i++) {
                currentRow.createCell(i).setCellValue(str[i]);
            }
        }
        FileOutputStream fileOutputStream = new FileOutputStream(xlsxFileAddress);
        workBook.write(fileOutputStream);
        fileOutputStream.close();
        System.out.println("Done");
    } catch (Exception ex) {
        System.out.println(ex.getMessage() + "Exception in try");
    }
}
Well, CSV is more than just a text file with lines of comma-separated values.
For example, fields in CSV can be quoted; this is how a comma is escaped within a field.
Quote characters inside a quoted field are escaped as well, by doubling them ("").
A field can also contain newlines, so one CSV record can span several physical lines; such fields must be quoted too.
So, to sum up, the CSV lines
1,"2,3","4
5",6,7,""""
should be parsed into the array "1", "2,3", "4\n5", "6", "7", "\"" (and that is a single row of the CSV table).
As you can see, you can't just blindly split every line by comma. I suggest using a library instead of doing this yourself; http://www.liquibase.org/javadoc/liquibase/util/csv/opencsv/CSVReader.html (a copy of OpenCSV's CSVReader) will work just fine.
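To illustrate the quoting rules above, and why a library is the easier route, here is a minimal hand-written parser. It is a sketch, not a complete RFC 4180 implementation: `MiniCsv.parse` is a hypothetical name, it holds the whole text in memory, and it reports no errors for malformed input. It works on the whole text rather than on `readLine()` output, because a quoted field may span several lines.

```java
import java.util.ArrayList;
import java.util.List;

final class MiniCsv {

    private MiniCsv() {}

    /** Parses CSV text into records (one list of field values per record). */
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean recordStarted = false; // does the current record have any content yet?

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    field.append('"'); // "" inside quotes is one literal quote
                    i++;               // skip the second quote of the pair
                } else if (c == '"') {
                    inQuotes = false;  // closing quote
                } else {
                    field.append(c);   // commas and newlines here are data
                }
            } else if (c == '"') {
                inQuotes = true;       // opening quote
                recordStarted = true;
            } else if (c == ',') {
                record.add(field.toString()); // separator: end of field
                field.setLength(0);
                recordStarted = true;
            } else if (c == '\n') {
                record.add(field.toString()); // end of field and of record
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
                recordStarted = false;
            } else if (c != '\r') {    // ignore the CR of CRLF line endings
                field.append(c);
                recordStarted = true;
            }
        }
        // flush the last record when the text does not end with a newline
        if (recordStarted) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }
}
```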
