Bindy CRLF for UNMARSHAL - java

I want to unmarshal a CSV file to a bean.
The issue is that the record separator (the newline) will be a semicolon ";".
The CSV annotation has a crlf separator for marshalling to a CSV file. Is there a workaround for the reverse scenario? For now I am replacing the semicolon with a NEWLINE character.
But I might have a requirement where the NEWLINE could be either the conventional "\r\n" or ";".
Any suggestions would be of great help.

You can set a custom newline character with
@CsvRecord(separator = ",", crlf=";")
public class Order {
...
}

Related

CSV parsing with univocity-parsers and backslash-escaped quotes

I'm having some trouble parsing CSV with backslash-escaped quotes (\"). Most lines in the source CSV don't include escaped quotes, but for the ones that do I can't seem to find appropriate settings for correct parsing.
CSV example (each line with 4 columns):
1,,No quote escape,test
2,,"One quote escape\"",test
3,,"Two \"quote escapes\"",test
4,,"Two \"quote escapes\" 2",test
CSV parser settings:
CsvFormat:
Comment character=#
Field delimiter=,
Line separator (normalized)=\n
Line separator sequence=\r\n
Quote character="
Quote escape character=\
Quote escape escape character=null
Code snippet:
CsvParserSettings settings = new CsvParserSettings();
settings.setDelimiterDetectionEnabled(true);
settings.setLineSeparatorDetectionEnabled(true);
settings.getFormat().setQuote('"');
settings.getFormat().setQuoteEscape('\\');
CsvParser parser = new CsvParser(settings);
parser.beginParsing(file, StandardCharsets.UTF_8);
...
Lines are parsed correctly until two escaped quotes are present in one line. Expected parsed lines are:
- 1,null,No quote escape,test
- 2,null,One quote escape",test
- 3,null,Two "quote escapes",test
- 4,null,Two "quote escapes" 2,test
Upon further inspection I found an existing issue for v2.9.1.

Java write escaped characters into file

I'm writing to a file and I need to escape some characters like a quotation mark.
File fout = new File("output.txt");
try (FileOutputStream fos = new FileOutputStream(fout); BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(fos));) {
String insert = "quote's";
s += "'"+insert.replaceAll("'", "\\\'")+"'";
bw.write(s.replaceAll("\r\n", "\\\r\\\n"));
bw.newLine();
}
I'm trying to write 'quote\'s' to the file, but it keeps removing the backslash and producing 'quote's'.
I also want to write newlines into the file as escaped characters, i.e. instead of inserting a newline in the file I want to write \r\n.
Is this possible? I feel like I'm missing or forgetting something.
replaceAll() works with regex and accepts a special replacement syntax:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string
You're not using regex, so you can use the plaintext replace() instead. And you only need 2 backslashes at a time:
s += "'"+insert.replace("'", "\\'")+"'";
bw.write(s.replace("\r\n", "\\r\\n"));
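To make the difference concrete, here is a minimal sketch of both approaches. The class and method names (EscapeDemo, escape, escapeWithRegex) are made up for illustration; Matcher.quoteReplacement is the standard way to keep replaceAll from interpreting backslashes in the replacement string.

```java
import java.util.regex.Matcher;

class EscapeDemo {
    // Plain-text replacement: no regex and no replacement syntax involved.
    static String escape(String raw) {
        return raw.replace("'", "\\'")          // ' becomes \'
                  .replace("\r\n", "\\r\\n");   // CR LF becomes the four characters \r\n
    }

    // Same result with replaceAll, quoting the replacement so its
    // backslashes are taken literally instead of as escape markers.
    static String escapeWithRegex(String raw) {
        return raw.replaceAll("'", Matcher.quoteReplacement("\\'"))
                  .replaceAll("\r\n", Matcher.quoteReplacement("\\r\\n"));
    }

    public static void main(String[] args) {
        System.out.println("'" + escape("quote's") + "'");
        System.out.println(escape("line1\r\nline2"));
    }
}
```

If the regex form is ever needed (for example to match a pattern rather than a fixed string), quoting the replacement avoids counting backslashes by hand.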

Using Same Escape and Quote Character Breaks CSV

I have a simple CSV file like this:
SellerProductID;ProductTextLong
1000;"a ""good"" Product"
And this is my attempt to read it with Apache Commons CSV:
try (Reader reader = new StringReader(content)) {
CSVFormat format = CSVFormat.DEFAULT.withDelimiter(';').withHeader().withEscape('"').withQuote('"');
CSVParser records = format.parse(reader);
System.out.println(records.iterator().next());
}
That doesn't work because of:
Exception in thread "main" java.lang.IllegalStateException: IOException reading next record: java.io.IOException: (startline 2) EOF reached before encapsulated token finished
at org.apache.commons.csv.CSVParser$CSVRecordIterator.getNextRecord(CSVParser.java:145)
at org.apache.commons.csv.CSVParser$CSVRecordIterator.next(CSVParser.java:171)
at org.apache.commons.csv.CSVParser$CSVRecordIterator.next(CSVParser.java:137)
Caused by: java.io.IOException: (startline 2) EOF reached before encapsulated token finished
at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:288)
at org.apache.commons.csv.Lexer.nextToken(Lexer.java:158)
at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:674)
at org.apache.commons.csv.CSVParser$CSVRecordIterator.getNextRecord(CSVParser.java:142)
... 3 more
Other CSV tools (e.g. Google Sheets) can load the CSV just fine.
It works if I use another quote or escape character, but sadly the customer's CSV is set.
How do I configure Apache CSV to allow the same escape and quote character? Or is there any way to modify a stream to replace the quote characters on the fly (the files are gigantic)?
The entire problem is that " is not the "escape character".
From Wikipedia:
Embedded double quote characters may then be represented by a pair of consecutive double quotes, or by prefixing a double quote with an escape character such as a backslash.
So in this case, "" is just two quote characters next to each other, while the escape character is a different character used to escape quotes, line breaks or separators.
This fixes it (note that withEscape() is called differently, but the example data doesn't show what the escape character actually is):
try (Reader reader = new StringReader(content)) {
CSVFormat format = CSVFormat.DEFAULT.withDelimiter(';').withHeader().withEscape('/').withQuote('"');
CSVParser records = format.parse(reader);
System.out.println(records.iterator().next());
}
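To illustrate the doubled-quote convention independently of any library, here is a small hand-rolled sketch. It is not how Apache Commons CSV is implemented; the class and method names are hypothetical, and it only handles a single record without embedded line breaks.

```java
import java.util.ArrayList;
import java.util.List;

class DoubledQuoteDemo {
    // Splits one record on the delimiter; inside a quoted field,
    // two consecutive quotes stand for one literal quote character.
    static List<String> parseRecord(String record, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote: keep one, skip the other
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == delimiter) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());      // last field has no trailing delimiter
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseRecord("1000;\"a \"\"good\"\" Product\"", ';'));
    }
}
```

No separate escape character appears anywhere in this logic, which is the point the answer makes: the quote character alone is enough to represent an embedded quote.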
I have looked over your issue, and this article and this post might help you. Also try adding .withNullString("") to the format.

How to skip invalid double quote character line in csv file using java?

I have a CSV file containing 78400 lines (25 MB).
When I read the CSV file line by line, one column in the 2nd line has an error.
It contains a backslash character.
When I read this column, all the remaining columns in the CSV file are read as a single column.
"CDE","456","6346","testdata2","MyData2","ClassB"
"ABC","123","4567\","testdata","MyData","ClassA"
"CDE","456","6346","testdata2","MyData2","ClassB"
How can I skip that line, using the line separator, in Java?
You can write a method that splits the line into fields and checks each one for the \ character:
String line = br.readLine();
String[] words = line.split(",");
boolean escape = false;
for (String word : words) {
if (word.indexOf('\\') >= 0) escape = true; // this field contains a backslash
}
Once you have identified the escape character, you can handle that line specially.
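A stdlib-only sketch of skipping such lines while reading might look like the following. The class and method names are hypothetical, and it assumes that any backslash anywhere in a line marks the whole record as invalid, which may be too strict for other data.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

class SkipBackslashLines {
    // Keeps only the lines that contain no backslash.
    // Assumption: a backslash never occurs in a valid record.
    static List<String> keepValidLines(BufferedReader reader) {
        return reader.lines()
                     .filter(line -> line.indexOf('\\') < 0)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String csv = "\"CDE\",\"456\",\"6346\",\"testdata2\",\"MyData2\",\"ClassB\"\n"
                   + "\"ABC\",\"123\",\"4567\\\",\"testdata\",\"MyData\",\"ClassA\"\n"
                   + "\"CDE\",\"456\",\"6346\",\"testdata2\",\"MyData2\",\"ClassB\"\n";
        // The second line contains 4567\ and is dropped; the other two are printed.
        keepValidLines(new BufferedReader(new StringReader(csv)))
                .forEach(System.out::println);
    }
}
```

The surviving lines can then be passed to whatever CSV parser is in use, so the stray backslash never reaches it.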
If you are using openCSV then just define your parser with an escape character other than backslash. If you don't want an escape character you can use the ICSVParser.NULL_CHARACTER or if you are using the 3.9 version of openCSV you can use the RFC4180Parser.
RFC4180ParserBuilder rfc4180ParserBuilder = new RFC4180ParserBuilder();
ICSVParser rfc4180Parser = rfc4180ParserBuilder.build();
CSVReaderBuilder builder = new CSVReaderBuilder(sr);
CSVReader reader = builder.withCSVParser(rfc4180Parser).build();

Reading a text file with a Scanner in Java - Token's return character

I'm trying to read the text file below with a java.util.Scanner in a simple Java program.
0001;GUAJARA-MIRIM;RO
0002;ALTO ALEGRE DOS PARECIS;RO
0003;PORTO VELHO;RO
I read the text file using the code below:
scanner = new Scanner(filerader).useDelimiter("\\;|\\n");
while (scanner.hasNext()) {
int id= scanner.nextInt();
String name = scanner.next();
String code = scanner.next();
System.out.printf(".%s.%s.%d.\n", name, code, id);
}
The results are:
.GUAJARA-MIRIM.RO.1
.
.ALTO ALEGRE DOS PARECIS.RO.2
.
.PORTO VELHO.RO.3
.
But the third token of each line has an inconvenient '\r' character at the end (ASCII code 13). I have no idea why (I used the '.' character in the format string to make it clear where the '\r' is).
So:
Why is there a '\r' at the end of the third token?
How can I get rid of it?
It is very simple to use a workaround like code.substring(0, 2), but I want to understand why there's a '\r' character there.
On some systems (especially Windows), \r\n is used as the line terminator. You are using only \n as a delimiter, so the \r is left in the token. Add \r to your delimiters as well.
To make your code a little more robust, use System.lineSeparator() to get the newline characters and build the delimiters accordingly.
You are using a Windows file, which uses \r\n as the line delimiter (Carriage Return, Line Feed). Unix uses only \n (Line Feed).
To fix this, add \r to your scanner delimiter.
The reason why it happens has already been given. Another way to avoid it is to use scanner.nextLine() and then split the line on ";".
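As a minimal sketch of the delimiter fix, the pattern below accepts both \r\n and \n as line endings. The class and method names are hypothetical, and the input is passed as a String instead of a file to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerDelimiterDemo {
    // Reads id;name;code records; "\\r?\\n" matches Windows and Unix line endings.
    static List<String> readRecords(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(text).useDelimiter(";|\\r?\\n")) {
            while (scanner.hasNext()) {
                int id = scanner.nextInt();
                String name = scanner.next();
                String code = scanner.next();
                out.add(String.format(".%s.%s.%d.", name, code, id));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "0001;GUAJARA-MIRIM;RO\r\n"
                    + "0002;ALTO ALEGRE DOS PARECIS;RO\r\n"
                    + "0003;PORTO VELHO;RO\r\n";
        readRecords(data).forEach(System.out::println);
    }
}
```

With the optional \r in the delimiter, the third token no longer carries a trailing carriage return, so each record prints on a single line.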
