Invalid char between encapsulated token and delimiter in Apache Commons CSV library - java

I am getting the following error while parsing the CSV file using the Apache Commons CSV library.
Exception in thread "main" java.io.IOException: (line 2) invalid char between encapsulated token and delimiter
at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:275)
at org.apache.commons.csv.Lexer.nextToken(Lexer.java:152)
at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:450)
at org.apache.commons.csv.CSVParser.getRecords(CSVParser.java:327)
at parse.csv.file.CSVFileParser.main(CSVFileParser.java:29)
What does this error mean?

We ran into this issue when we had an embedded quote in our data:
0,"020"1,"BS:5252525 ORDER:99999"4
The solution we applied was CSVFormat csvFileFormat = CSVFormat.DEFAULT.withQuote(null);
@Cuga's tip helped us resolve this. Thanks, @Cuga.
The full code is:
public static void main(String[] args) throws IOException {
    // Disable quote handling so embedded quotes are treated as ordinary characters.
    CSVFormat csvFileFormat = CSVFormat.DEFAULT.withQuote(null);
    String fileName = "test.csv";
    // try-with-resources closes the reader and the parser even if parsing fails.
    try (FileReader fileReader = new FileReader(fileName);
         CSVParser csvFileParser = new CSVParser(fileReader, csvFileFormat)) {
        List<CSVRecord> csvRecords = csvFileParser.getRecords();
        for (CSVRecord csvRecord : csvRecords) {
            System.out.println(csvRecord);
        }
    }
}
Result is
CSVRecord [comment=null, mapping=null, recordNumber=1, values=[0, "020"1, "BS:5252525 ORDER:99999"4]]

That line in the CSV file contains an invalid character between one of your cells and the end of the line, the end of the file, or the next cell. A very common cause is failing to escape your encapsulating character (the character used to "wrap" each cell, so the parser knows where a cell (token) starts and ends).

I found the solution to the problem.
One of my CSV files has an attribute as follows:
"attribute with nested "quote" "
The parser fails because of the nested quote in the attribute.
To avoid the problem, escape each nested quote by doubling it:
"attribute with nested ""quote"" "
This is one way to solve the problem.
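If you generate the file yourself, the doubling can be done when each field is written. Here is a minimal sketch using only the standard library; CsvFieldQuoter and quote are hypothetical names of mine, not part of Apache Commons CSV, and it assumes the quote character is the double quote.

```java
// Hypothetical helper, not part of Apache Commons CSV.
final class CsvFieldQuoter {

    // Wraps a raw value in double quotes and doubles every embedded double quote.
    static String quote(String raw) {
        return "\"" + raw.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // Prints: "attribute with nested ""quote"" "
        System.out.println(quote("attribute with nested \"quote\" "));
    }
}
```

A library writer such as CSVPrinter should normally do this for you; the sketch only shows what the escaped field looks like.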

We ran into this same error with data containing quotes in otherwise unquoted input, e.g.:
some cell|this "cell" caused issues|other data
It was hard to find, but in Apache's docs, they mention the withQuote() method which can take null as a value.
We were getting the exact same error message and this (thankfully) ended up fixing the issue for us.

I ran into this issue when I forgot to call .withNullString("") on my CSVFormat. In my experience, this exception typically occurs when one of the following is true:
your quote symbol is wrong
your null string representation is wrong
your column separator char is wrong
Make sure you know the details of your format. Also, some programs use leading byte-order-marks (for example, Excel uses \uFEFF) to denote the encoding of the file. This can also trip up your parser.
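For the byte order mark case, one option is to drop a leading \uFEFF before the reader is handed to the parser. This is only a sketch with the standard library; BomSkipper, skipBom and readAll are hypothetical names, and it assumes the stream has already been decoded with the right charset (for example UTF-8).

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Apache Commons CSV.
final class BomSkipper {

    // Returns a reader positioned after a leading U+FEFF, if one is present.
    static Reader skipBom(Reader source) {
        try {
            PushbackReader reader = new PushbackReader(source, 1);
            int first = reader.read();
            if (first != -1 && first != '\uFEFF') {
                reader.unread(first); // not a BOM, so put the character back
            }
            return reader;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Small demo helper that drains a reader into a String.
    static String readAll(Reader reader) {
        StringBuilder text = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(readAll(skipBom(new StringReader("\uFEFFid,name"))));
    }
}
```

The returned reader can then be passed to the CSV parser in place of the original one.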

Related

Using Same Escape and Quote Character Breaks CSV

I have a simple CSV file like this:
SellerProductID;ProductTextLong
1000;"a ""good"" Product"
This is my attempt to read it with Apache Commons CSV:
try (Reader reader = new StringReader(content)) {
    CSVFormat format = CSVFormat.DEFAULT.withDelimiter(';').withHeader().withEscape('"').withQuote('"');
    CSVParser records = format.parse(reader);
    System.out.println(records.iterator().next());
}
That doesn't work because of:
Exception in thread "main" java.lang.IllegalStateException: IOException reading next record: java.io.IOException: (startline 2) EOF reached before encapsulated token finished
at org.apache.commons.csv.CSVParser$CSVRecordIterator.getNextRecord(CSVParser.java:145)
at org.apache.commons.csv.CSVParser$CSVRecordIterator.next(CSVParser.java:171)
at org.apache.commons.csv.CSVParser$CSVRecordIterator.next(CSVParser.java:137)
Caused by: java.io.IOException: (startline 2) EOF reached before encapsulated token finished
at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:288)
at org.apache.commons.csv.Lexer.nextToken(Lexer.java:158)
at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:674)
at org.apache.commons.csv.CSVParser$CSVRecordIterator.getNextRecord(CSVParser.java:142)
... 3 more
Other CSV tools (e.g. Google Sheets) can load the CSV just fine.
It works if I use another quote or escape character, but sadly the customer's CSV is set.
How do I configure Apache CSV to allow the same escape and quote character? Or is there any way to modify a stream to replace the quote characters on the fly (the files are gigantic)?
The entire problem is that " is not the "escape character".
From Wikipedia:
Embedded double quote characters may then be represented by a pair of consecutive double quotes, or by prefixing a double quote with an escape character such as a backslash.
So in this case, "" is just two quote characters next to each other, while the escape character is a different character used to escape quotes, line breaks, or separators.
This fixes it (note that withEscape() is called differently, but the example data doesn't show what the escape character actually is):
try (Reader reader = new StringReader(content)) {
    CSVFormat format = CSVFormat.DEFAULT.withDelimiter(';').withHeader().withEscape('/').withQuote('"');
    CSVParser records = format.parse(reader);
    System.out.println(records.iterator().next());
}
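To illustrate the point that "" inside a quoted field is just a doubled quote character and not an escape sequence, here is a minimal standard-library sketch of how such a field decodes. QuotedFieldDecoder and decode are hypothetical names; this is not how Commons CSV is implemented, only the rule it follows for the sample row.

```java
// Hypothetical helper, not part of Apache Commons CSV.
final class QuotedFieldDecoder {

    // Strips the surrounding quotes and collapses each doubled quote to one.
    static String decode(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            String inner = field.substring(1, field.length() - 1);
            return inner.replace("\"\"", "\"");
        }
        return field; // unquoted fields are returned unchanged
    }

    public static void main(String[] args) {
        // Prints: a "good" Product
        System.out.println(decode("\"a \"\"good\"\" Product\""));
    }
}
```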
I have looked over your issue, and this article and this post might help you. Also try using .withNullString("").

Java: Using InputStream and Apache Commons CSV without line numbers

This is probably very simple, but I have not been able to find an option to do this. I'm trying to use Apache Commons CSV to read a file for later validations. The CSV in question is submitted as an InputStream, which seems to add an additional column containing the line numbers when the file is read. I would like to be able to ignore it, if possible, as the header row does not contain a number, which causes an error. Is there an option already in InputStream to do this, or will I have to set up some kind of post-processing?
The code I'm using is as follows:
public String validateFile(InputStream filePath) throws Exception {
    System.out.println("Sending file to reader");
    System.out.println(filePath);
    InputStreamReader in = new InputStreamReader(filePath);
    // CSVFormat parse needs a reader object
    System.out.println("sending reader to CSV parse");
    for (CSVRecord record : CSVFormat.DEFAULT.withHeader().parse(in)) {
        for (String field : record) {
            System.out.print("\"" + field + "\", ");
        }
        System.out.println();
    }
    return null;
}
When using withHeader(), I end up with the following error:
java.lang.IllegalArgumentException: A header name is missing in [, Employee_ID, Department, Email]
and I can't simply skip it, as I will need to do some validations on the header row.
Also, here is an example CSV file:
"Employee_ID", "Department", "Email"
"0123456","Department of Hello World","John.Doe@gmail.com"
EDIT: Also, The end goal is to validate the following:
That there are columns called "Employee_ID", "Department", and "Email". For this, I think I'll need to remove .withHeader().
Each line is comma delimited.
There are no empty cell values.
Newer versions of Commons-CSV have trouble with empty headers.
Maybe that's the case here as well?
You just mentioned "no empty cell values" not sure if this included headers as well...
Also see: https://issues.apache.org/jira/browse/CSV-257
Setting .setAllowMissingColumnNames(true) did the trick for me.
final CSVFormat csvFormat = CSVFormat.Builder.create()
        .setHeader(HEADERS)
        .setAllowMissingColumnNames(true)
        .build();
final Iterable<CSVRecord> records = csvFormat.parse(reader);
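Once the header row and the data rows are available as lists of strings, the checks listed in the question (required column names, no empty cells) can be done by hand. This is a rough sketch with the standard library only; HeaderValidator and its methods are hypothetical names, and it assumes the rows were already split into cells by the parser.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for the validations described in the question.
final class HeaderValidator {

    // Column names taken from the question.
    static final List<String> REQUIRED = Arrays.asList("Employee_ID", "Department", "Email");

    // True when every required column name appears in the header row.
    static boolean hasRequiredColumns(List<String> header) {
        return header.containsAll(REQUIRED);
    }

    // True when no cell in the row is null or blank.
    static boolean hasNoEmptyCells(List<String> row) {
        for (String cell : row) {
            if (cell == null || cell.trim().isEmpty()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Header as reported in the error message, with an empty first name.
        List<String> header = Arrays.asList("", "Employee_ID", "Department", "Email");
        System.out.println(hasRequiredColumns(header));
        System.out.println(hasNoEmptyCells(header));
    }
}
```

With the header from the error message, the first check passes and the second fails, which points at the unnamed leading column.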

Java: OpenCSV escape character in fields

I have my input file with quoted fields. Below is how I am initializing the CSV reader
CSVParser parser = new CSVParserBuilder().withSeparator(CSVParser.DEFAULT_SEPARATOR).build();
CSVReader reader = new CSVReaderBuilder(new FileReader("abc.txt")).withCSVParser(parser).build();
With the following input, it reads properly.
"1","abc","this works properly with ""quotes"" as well"
With the following input, it fails
"1","abc","this fails with \""backslash\"" and ""quotes"". "
I know that in Java the backslash is an escape character. Is there a workaround to read the above line properly? Unfortunately, I can't change the input format, as it's generated by our client's legacy system.

Java: Trying to ignore (or skip) a Carriage-Return(\n) that isn't an End-of-Line

I successfully read in data from csv files with this code
Scanner scanIt = new Scanner(new BufferedReader(new FileReader(filef)));
while (scanIt.hasNextLine())
{
    String inputLine = scanIt.nextLine();
    System.out.println(inputLine);
}
scanIt.close();
until I encountered this line (in the file) which seems to have a "carriage return" buried within the read line, located between- ,"TBD and TBD",
Incoming PR# & Doc#: ,"TBDTBD",,Funds Held By Sponsor/Unallocated
Funds,,,, $- ,,,NS02 , $- , $- ,
I am trying to solve this problem by telling the code to treat a line break "\n" preceded by a comma "," as the true end of a line:
while (scanIt.hasNext(","+"\n"))
but that did not solve the problem.
What are ideas to resolve this problem?
Thank you for taking the time to do this.
You need a CSV parser to read the file properly. The underlying rule is that a \n or \r between double quotes is part of the field, not the end of a line.
Please use a proper CSV parser, because this is not the only problem you will encounter. Data containing a comma is another such problem. A CSV parser handles all of these.
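To make the rule concrete, here is a minimal sketch that splits text into records while treating line breaks inside double quotes as data. QuoteAwareRecordSplitter and splitRecords are hypothetical names, it only uses the standard library, and it is meant to illustrate the rule rather than replace a real CSV parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of quote-aware record splitting.
final class QuoteAwareRecordSplitter {

    static List<String> splitRecords(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                // A doubled quote toggles twice, so the state is unchanged.
                inQuotes = !inQuotes;
                current.append(c);
            } else if ((c == '\n' || c == '\r') && !inQuotes) {
                // Treat \r\n as a single record terminator.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c); // includes line breaks inside quotes
            }
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> records = splitRecords("a,\"TBD\nTBD\",b\nc,d\n");
        System.out.println(records.size());
    }
}
```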

JSON to CSV with Java using CDL: is it possible to replace comma-separated with semicolon-separated values?

Everything is in the title :)
I'm using org.json.CDL to convert a JSONArray into CSV data, but it renders a string with ',' as the separator.
I'd like to know if it's possible to replace it with ';'.
Here is a simple example of what I'm doing:
public String exportAsCsv() throws Exception {
    return CDL.toString(
        new JSONArray(
            mapper.writeValueAsString(extractAccounts()))
    );
}
Thanks in advance for any advice on that question.
Edit: I'm not looking for a string-replacement solution, as that could be costly for large data; ideally the library would let me specify the field separator.
Edit 2: In the end, extracting the data as a JSONArray (and a String) was not a good approach, especially for large data files.
So I made the following changes:
use a Java CSV library (for example: http://www.csvreader.com/java_csv_samples.php)
refactor the code to stream data from the JSON input source to the CSV output
This works better for large data. If you have comments, do not hesitate to share them.
String output = "Hello,This,is,separated,by,a,comma";
// Simply call the replace method.
output = output.replace(',',';');
I found this in the String documentation.
Example
String value = "Hello,tthis,is,a,string";
value = value.replace(',', ';');
System.out.println(value);
// Outputs: Hello;tthis;is;a;string
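One caveat with a plain replace is that it also changes commas inside quoted values. A possible refinement, sketched here with the standard library only, swaps the delimiter only outside double quotes. DelimiterSwapper and swapDelimiter are hypothetical names, and the sketch assumes the input uses double quotes for quoting and that no unquoted value already contains the new delimiter.

```java
// Hypothetical helper that swaps a delimiter only outside quoted values.
final class DelimiterSwapper {

    static String swapDelimiter(String csvLine, char from, char to) {
        StringBuilder out = new StringBuilder(csvLine.length());
        boolean inQuotes = false;
        for (int i = 0; i < csvLine.length(); i++) {
            char c = csvLine.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes; // track whether we are inside a quoted value
            }
            out.append(c == from && !inQuotes ? to : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: 1;"Doe, John";x
        System.out.println(swapDelimiter("1,\"Doe, John\",x", ',', ';'));
    }
}
```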
