Univocity CSV parser glues the whole line if it begins with quote " - java

I'm using univocity 2.7.5 to parse a CSV file. Until now it worked fine and parsed each row as a String array with n elements, where n is the number of columns in the row. But now I have a file where every row starts with a quote ("), and the parser cannot handle it: it returns each row as a String array with a single element containing the whole row. Removing that quote from the CSV file makes it work, but there are about 500,000 rows. What should I do to make it work?
Here is the sample line from my file (it has quotes in source file too):
"100926653937,Kasym Amina,620414400630,Marzhan Erbolova,""Kazakhstan, Almaty, 66, 3"",87029845662"
And here's my code:
CsvParserSettings settings = new CsvParserSettings();
settings.setDelimiterDetectionEnabled(true);
CsvParser parser = new CsvParser(settings);
List<String[]> rows = parser.parseAll(csvFile);

Author of the library here. The input you have there is a well-formed CSV, with a single value consisting of:
100926653937,Kasym Amina,620414400630,Marzhan Erbolova,"Kazakhstan, Almaty, 66, 3",87029845662
If that row appeared in the middle of your input, I suppose your input has unescaped quotes (somewhere before you got to that line). Try playing with the unescaped quote handling setting:
For example, this might work:
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);
If nothing works, and all your lines look like the one you posted, then you can parse the input twice (which is shitty and slow but will work):
CsvParser parser = new CsvParser(settings);
parser.beginParsing(csvFile);
List<String[]> out = new ArrayList<>();
String[] row;
while ((row = parser.parseNext()) != null) {
//got a row with unexpected length?
if(row.length == 1){
//break it down again.
row = parser.parseLine(row[0]);
}
out.add(row);
}
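To make concrete what the first pass returns for a line that is wrapped in quotes from start to end, here is a minimal plain-Java sketch. It is not univocity code: `OuterQuoteDemo` and `unwrap` are hypothetical names, and the helper assumes every line is one fully quoted value like the sample above (a line such as `"a","b"` would be mangled by it).

```java
// Plain-Java illustration only; this does not use or mimic the univocity API.
class OuterQuoteDemo {

    // Hypothetical helper: removes one layer of CSV quoting from a line that is
    // wrapped in double quotes as a whole, and collapses each "" into ".
    // Assumption: the entire line is a single quoted value.
    static String unwrap(String line) {
        if (line.length() >= 2 && line.startsWith("\"") && line.endsWith("\"")) {
            return line.substring(1, line.length() - 1).replace("\"\"", "\"");
        }
        return line; // not fully quoted: leave untouched
    }

    public static void main(String[] args) {
        String raw = "\"100926653937,Kasym Amina,620414400630,Marzhan Erbolova,"
                + "\"\"Kazakhstan, Almaty, 66, 3\"\",87029845662\"";
        // Prints the inner CSV row, which can then be parsed a second time.
        System.out.println(unwrap(raw));
    }
}
```

The unwrapped text is itself an ordinary CSV row with six values, which is why feeding `row[0]` back into the parser works.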
Hope this helps.

Related

Importing two CSV files into Java and then parsing them. The first one works the second doesnt

I'm working on code that imports two CSV files and then parses them:
//Importing CSV File for betreuen
String filename = "betreuen_4.csv";
File file = new File(filename);
//Importing CSV File for lieferant
String filename1 = "lieferant.csv";
File file1 = new File(filename1);
I then proceed to parse them. For the first csv file everything works fine. The code is
try {
Scanner inputStream = new Scanner(file);
while(inputStream.hasNext()) {
String data = inputStream.next();
String[] values = data.split(",");
int PInummer = Integer.parseInt(values[1]);
String MNummer = values[0];
String KundenID = values[2];
//System.out.println(MNummer);
//create the caring object with the required paramaters
//Caring caring = new Caring(MNummer,PInummer,KundenID);
//betreuen.add(caring);
}
inputStream.close();
}catch(FileNotFoundException d) {
d.printStackTrace();
}
I then proceed to parse the other csv file the code is
// parsing csv file lieferant
try {
Scanner inputStream1 = new Scanner(file1);
while(inputStream1.hasNext()) {
String data1 = inputStream1.next();
String[] values1 = data1.split(",");
int LIDnummer = Integer.parseInt(values1[0]);
String citynames = values1[1];
System.out.println(LIDnummer);
String firmanames = values1[2];
//create the suppliers object with the required paramaters
//Suppliers suppliers = new
//Suppliers(LIDnummer,citynames,firmanames);
//lieferant.add(suppliers);
}
inputStream1.close();
}catch(FileNotFoundException d) {
d.printStackTrace();
}
the first error I get is
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
at Verbindung.main(Verbindung.java:61)
So I looked at the array access for firmanames at line 61 and thought it was impossible for it to be out of range, since my CSV file has three columns and index 2 (which I know is the third column) holds my list of company names. I know the array is not empty, because when I wrote
`System.out.println(firmanames)`
it printed the first three company names. To see whether something else was causing the problem, I commented out line 61 and ran the code again. I got the following error:
Exception in thread "main" java.lang.NumberFormatException: For input string: "Ridge"
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at Verbindung.main(Verbindung.java:58)
I googled these errors, and they say I'm trying to parse something into an Integer that is not an integer, but the only thing I am parsing into an Integer is this line:
int LIDnummer = Integer.parseInt(values1[0]);
Which indeed is a column containing only Integers.
My second column is just a column of city names in the USA. The only notable thing about it is that some town names contain spaces, like Middle brook, but I don't think that would cause problems for a String. My company column also has names like AT&T, but I would think the & symbol would not cause problems for a String either. I don't know where I am going wrong here.
I can't include the CSV file, but here is a pic of part of it. Each column has 1000 entries.
A pic of the csv file
Scanner by default splits its input by whitespace (docs). Whitespace means spaces, tabs and newlines.
So your code will, I think, split the whole input file at every space and every newline, which is not what you want.
So, the first three elements your code will read are
5416499,Prairie
Ridge,NIKE
1765368,Edison,Cartier
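The tokenization described above can be reproduced with a short, self-contained sketch. The two input lines are reconstructed from the tokens listed above (the real file was not posted), and `ScannerTokenDemo` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerTokenDemo {

    // Collects every token that Scanner.next() returns with the default
    // (whitespace) delimiter.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample data: the city "Prairie Ridge" contains a space.
        String csv = "5416499,Prairie Ridge,NIKE\n1765368,Edison,Cartier\n";
        for (String token : tokens(csv)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

The first line is cut at the space, so the second token starts with "Ridge". That token is what `Integer.parseInt` receives, and the first token has only two comma-separated parts, which explains the `ArrayIndexOutOfBoundsException` at index 2.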
I suggest using method readLine of BufferedReader then calling split on that.
The alternative is to explicitly tell Scanner how you want it to split the input
Scanner inputStream1 = new Scanner(file1).useDelimiter("\n");
but I think this is not the best use of Scanner when a simpler class (BufferedReader) will do.
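A minimal sketch of the `BufferedReader` approach, assuming three columns per line and no quoted fields or commas inside values. `LineSplitDemo` and `readRows` are hypothetical names; in real code you would pass a `FileReader` for `lieferant.csv` instead of the `StringReader` used here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineSplitDemo {

    // Reads the source one line at a time and splits each line on commas.
    // Fails with the line number if a line does not have exactly 3 fields.
    static List<String[]> readRows(Reader source) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            int lineNo = 0;
            while ((line = br.readLine()) != null) {
                lineNo++;
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] values = line.split(",");
                if (values.length != 3) {
                    throw new IllegalStateException(
                            "Line " + lineNo + " has " + values.length + " fields: " + line);
                }
                rows.add(values);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Assumed sample data; spaces inside a value no longer matter.
        Reader in = new StringReader("5416499,Prairie Ridge,NIKE\n1765368,Edison,Cartier\n");
        for (String[] row : readRows(in)) {
            int lidNummer = Integer.parseInt(row[0]);
            System.out.println(lidNummer + " | " + row[1] + " | " + row[2]);
        }
    }
}
```

Because `readLine` only breaks on line terminators, "Prairie Ridge" stays in one field, and the line number in the exception message points at any malformed row.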
First of all, I would highly suggest you try and use an existing CSV parser, for example this one.
But if you really want to use your own, you are going to need to do some simple debugging. I don't know how large your file is, but the symptoms you are describing lead me to believe that somewhere in the csv there may be a missing comma or an accidental escape character. You need to find out what line it is. So run this code and check its output before it crashes:
int line = 1;
try {
Scanner inputStream1 = new Scanner(file1);
while(inputStream1.hasNext()) {
String data1 = inputStream1.next();
String[] values1 = data1.split(",");
int LIDnummer = Integer.parseInt(values1[0]);
String citynames = values1[1];
System.out.println(LIDnummer);
String firmanames = values1[2];
line++;
}
} catch (ArrayIndexOutOfBoundsException e){
System.err.println("The issue in the csv is at line:" + line);
}
Once you find what line it is, the answer should be obvious. If not, post a picture of that line and we'll see...

Trying to convert CSV to XLSX, but columns are being split up using the wrong commas

I am using the accepted answer from here. Basically, I am converting a csv to .xlsx, and it looks like the solution pulls everything in individual cells into 1 line using the buffered reader, and then using:
String str[] = currentLine.split(",");
.. the string is split up into separate parts of the array for each column. My problem is that in some of my data, there are commas, so the algorithm gets confused and makes more columns than needed, splitting sentences into different columns which doesn't really work for me. Is there another way I can split the sentences up perhaps? I'd happily split the string up using a different unique character (maybe |?), but I don't know how to replace the comma provided by the bufferedreader. Any help would be great. Code I am using below for reference:
public static void csvToXLSX() {
try {
String csvFileAddress = "test.csv"; //csv file address
String xlsxFileAddress = "test.xlsx"; //xlsx file address
XSSFWorkbook workBook = new XSSFWorkbook();
XSSFSheet sheet = workBook.createSheet("sheet1");
String currentLine=null;
int RowNum=0;
BufferedReader br = new BufferedReader(new FileReader(csvFileAddress));
while ((currentLine = br.readLine()) != null) {
String str[] = currentLine.split(",");
RowNum++;
XSSFRow currentRow=sheet.createRow(RowNum);
for(int i=0;i<str.length;i++){
currentRow.createCell(i).setCellValue(str[i]);
}
}
FileOutputStream fileOutputStream = new FileOutputStream(xlsxFileAddress);
workBook.write(fileOutputStream);
fileOutputStream.close();
System.out.println("Done");
} catch (Exception ex) {
System.out.println(ex.getMessage()+"Exception in try");
}
}
Well, CSV is more than just a text file with values separated by commas.
For example, a field in CSV can be quoted; this is how a comma is escaped within a field.
Quotes inside a quoted field are escaped as well, by doubling them.
A field can also contain newlines, and such a field must be quoted too.
So, to sum up, the CSV lines
1,"2,3","4
5",6,7,""""
should be parsed to array of "1", "2,3", "4\n5", "6", "7","\"" (and that is a single row of a CSV table).
As you can see, you can't just mindlessly split every line by comma. I suggest using a library instead of doing this yourself. http://www.liquibase.org/javadoc/liquibase/util/csv/opencsv/CSVReader.html will work just fine.
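To show what such a library has to do internally, here is a minimal quote-aware parser in plain Java. It is a sketch, not a replacement for a tested library: `MiniCsv` is a hypothetical name, and it is lenient about malformed input (for example, stray text after a closing quote is simply appended to the field).

```java
import java.util.ArrayList;
import java.util.List;

class MiniCsv {

    // Parses CSV text into records. Handles quoted fields, doubled quotes
    // inside quoted fields, and line breaks inside quoted fields.
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean pending = false; // true once the current record has any content

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // "" inside quotes is a literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // commas and newlines are kept as data
                }
            } else if (c == '"') {
                inQuotes = true;
                pending = true;
            } else if (c == ',') {
                record.add(field.toString());
                field.setLength(0);
                pending = true;
            } else if (c == '\n' || c == '\r') {
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++; // treat \r\n as one line break
                }
                if (pending) { // blank lines produce no record
                    record.add(field.toString());
                    field.setLength(0);
                    records.add(record);
                    record = new ArrayList<>();
                    pending = false;
                }
            } else {
                field.append(c);
                pending = true;
            }
        }
        if (pending) { // last record without a trailing line break
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "1,\"2,3\",\"4\n5\",6,7,\"\"\"\"";
        System.out.println(parse(csv).get(0).size() + " fields in 1 record");
    }
}
```

Applied to the two-line example above, the parser returns a single record with six fields, whereas `split(",")` on each physical line would return pieces of two broken rows.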

Reading CSV line by line using OpenCSV and FifoBuffer

I am reading a CSV file and using OpenCSV to read it and CircularFifoBuffer to split the data into columns and assign the value from each column to a string. This works fine for reading a specific row in the csv file, however I wish to read the csv file line by line starting at the beginning and working downwards to the final row.
Then each time a row is read the string values will be compared and provided a given condition is satisfied the next row will be read.
I can handle all of the above bar processing the CSV data line by line.
Any pointers would be greatly appreciated.
Directly from the FAQ:
If you want to use an Iterator style pattern, you might do something like this:
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String [] nextLine;
while ((nextLine = reader.readNext()) != null) {
// nextLine[] is an array of values from the line
System.out.println(nextLine[0] + nextLine[1] + "etc...");
}
Or, if you might just want to slurp the whole lot into a List, just call readAll()...
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
List<String[]> myEntries = reader.readAll();
which will give you a List of String[] that you can iterate over. If all else fails, check out the Javadoc.

How to get proper string array when parsing CSV?

Using jcsv I'm trying to parse a CSV to a specified type. When I parse it, it says length of the data param is 1. This is incorrect. I tried removing line breaks, but it still says 1. Am I just missing something in plain sight?
This is my input string csvString variable
"Symbol","Last","Chg(%)","Vol",
INTC,23.90,1.06,28419200,
GE,26.83,0.19,22707700,
PFE,31.88,-0.03,17036200,
MRK,49.83,0.50,11565500,
T,35.41,0.37,11471300,
This is the Parser
public class BuySignalParser implements CSVEntryParser<BuySignal> {
@Override
public BuySignal parseEntry(String... data) {
// console says "Length 1"
System.out.println("Length " + data.length);
if (data.length != 4) {
throw new IllegalArgumentException("data is not a valid BuySignal record");
}
String symbol = data[0];
double last = Double.parseDouble(data[1]);
double change = Double.parseDouble(data[2]);
double volume = Double.parseDouble(data[3]);
return new BuySignal(symbol, last, change, volume);
}
}
And this is where I use the parser (right from the example)
CSVReader<BuySignal> cReader = new CSVReaderBuilder<BuySignal>(new StringReader( csvString)).entryParser(new BuySignalParser()).build();
List<BuySignal> signals = cReader.readAll();
jcsv allows different delimiter characters. The default is a semicolon. Use CSVStrategy.UK_DEFAULT to use commas.
Also, you have four commas, and that usually indicates five values. You might want to remove the delimiters off the end.
I don't know how to make jcsv ignore the first line
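The point about the trailing comma can be checked with plain `String.split`. This sketch does not use jcsv, and how jcsv itself treats a trailing delimiter is not verified here; `TrailingCommaDemo` and its methods are hypothetical names.

```java
class TrailingCommaDemo {

    // Counts fields the way a strict CSV reader would: a limit of -1 makes
    // split() keep the empty string that follows a trailing comma.
    static int fieldCount(String line) {
        return line.split(",", -1).length;
    }

    // Removes a single trailing comma, if present.
    static String stripTrailingComma(String line) {
        return line.endsWith(",") ? line.substring(0, line.length() - 1) : line;
    }

    public static void main(String[] args) {
        String line = "INTC,23.90,1.06,28419200,";
        System.out.println(fieldCount(line));                     // with trailing comma
        System.out.println(fieldCount(stripTrailingComma(line))); // after cleanup
    }
}
```

With the trailing comma the line counts as five fields (the last one empty); after stripping it, the count is the four fields that `parseEntry` expects.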
I typically use CSVHelper to parse CSV files, and while jcsv seems pretty good, here is how you would do it with CSVHelper:
Reader reader = new InputStreamReader(new FileInputStream("persons.csv"), "UTF-8");
//bring in the first line with the headers if you want them
List<String> firstRow = CSVHelper.parseLine(reader);
List<String> dataRow = CSVHelper.parseLine(reader);
while (dataRow!=null) {
// ...put your code here to construct your objects from the strings
dataRow = CSVHelper.parseLine(reader);
}
You shouldn't have commas at the end of lines. Generally there are cell delimiters (commas) and line delimiters (newlines). By placing commas at the end of the line it looks like the entire file is one long line.

Exporting CSV really strange formatted

What am I doing? I am exporting my SQLite database into a CSV file, or at least trying to.
I've done this both manually and with "OpenCSV".
With both methods I get very strange results. The output just doesn't seem well formatted. Neither the columns (which are usually separated by ','?) nor the special characters (which OpenCSV is said to handle) look like they should. Code:
CSVWriter writer = new CSVWriter(new FileWriter(file),'\n',',');
String[] items = new String[11];
c.moveToFirst();
while(!c.isAfterLast()){
items[0] = c.getString(c.getColumnIndex(BaseColumns._ID));
items[1] = c.getString(c.getColumnIndex(DepotTableMetaData.ITEM_QRCODE));
items[2] = c.getString(c.getColumnIndex(DepotTableMetaData.ITEM_NAME));
items[3] = c.getString(c.getColumnIndex(DepotTableMetaData.ITEM_AMOUNT));
items[4] = c.getString(c.getColumnIndex(DepotTableMetaData.ITEM_UNIT));
items[5] = c.getString(c.getColumnIndex(DepotTableMetaData.ITEM_PPU));
items[6] = c.getString(c.getColumnIndex(DepotTableMetaData.ITEM_TOTAL));
items[7] = c.getString(c.getColumnIndex(DepotTableMetaData.ITEM_COMMENT));
items[8] = c.getString(c.getColumnIndex(DepotTableMetaData.ITEM_SHOPPING));
items[9] = c.getString(c.getColumnIndex(DepotTableMetaData.CREATED_DATE));
items[10] = c.getString(c.getColumnIndex(DepotTableMetaData.MODIFIED_DATE));
c.moveToNext();
writer.writeNext(items);
}
writer.close();
and it all gives this as a result:
I've also done it through FileWriter and StringBuffer but it seems to give exactly the same results...I'd love if you could help me ;)
I have looked through stackoverflow but couldn't find any matching question ;/
edit: yes i know that I use the "old, deprecated" cursor, but that's not the question here. Thanks.
edit2: SOLVED !
you have to assign some common encoding !
CSVWriter writer = new CSVWriter(new OutputStreamWriter(new FileOutputStream(destination+"/output.csv"),"UTF-8"));
did the job perfectly!
You use an OpenCSV writer, which takes a row of the CSV file as an array of Strings and generates the separators between columns and rows automatically. But instead of letting OpenCSV do that for you, you do it explicitly by appending all the values of a row to a single String. So OpenCSV takes your single value and treats it as one column, in which commas and newlines must be encoded.
You should call writer.writeNext() with an array of Strings, each String in the array being a single cell from the table. The writer will generate the commas and the newlines for you.
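As a rough illustration of what a CSV writer does with such an array, here is a plain-Java sketch that joins one row and quotes cells only where needed. It is not the OpenCSV implementation (OpenCSV's defaults may differ, for instance in whether every cell gets quoted); `CsvRowWriterDemo`, `escape` and `toLine` are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class CsvRowWriterDemo {

    // Quotes a cell if it contains a comma, quote, or line break, and doubles
    // any quotes inside it. Null cells are written as empty.
    static String escape(String cell) {
        if (cell == null) {
            return "";
        }
        boolean needsQuotes = cell.contains(",") || cell.contains("\"")
                || cell.contains("\n") || cell.contains("\r");
        return needsQuotes ? "\"" + cell.replace("\"", "\"\"") + "\"" : cell;
    }

    // Builds one CSV line from one array of cells: one array element per column.
    static String toLine(String[] cells) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cells.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(cells[i]));
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) throws IOException {
        // An explicit encoding, as in the asker's own fix, avoids depending on
        // the platform default charset.
        Writer out = new OutputStreamWriter(System.out, StandardCharsets.UTF_8);
        out.write(toLine(new String[] {"1", "a,b", "say \"hi\""}));
        out.flush();
    }
}
```

Passing the whole row as a single pre-joined String would make the entire row one cell, so its commas would be treated as data and the cell quoted, which matches the strange output described in the question.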
