Regarding Java Split Command CSV File Parsing

I have a CSV file in the format below. I run into a problem when the program reads either of the following lines:
"D",abc"def,"","0429"292"0","11","IJ80","Feb10_1.txt-2","FILE RECORD","05/02/2010","04/03/2010","","1","-91","",""
"D","abc"def","","04292920","11","IJ80","Feb10_1.txt-2","FILE RECORD","05/02/2010","04/03/2010","","1","-91","",""
The split command below is meant to ignore commas inside double quotes. I got it from an earlier post; the URL I took it from is pasted below.
String items[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", 15);
System.out.println("items.length"+items.length);
(Source post: Regarding Java Split Command Parsing Csv File)
items.length is printed as 14 instead of 15. abc"def is not recognized as an individual field; it is incorrectly stored as
"D",abc"def in items[0]. I want it to be stored as follows:
items[0] should be "D" and items[1] should be abc"def
The same issue happens when the value is "abc"def". I want it to be stored as
items[0] should be "D" and items[1] should be "abc"def"
The split command does work correctly when a double quote inside a quoted field is doubled (for example, D,"abc""def",1).
How can I resolve this issue?

I think you would be much better off writing a parser for the CSV files rather than trying to use a regular expression. Once you start dealing with CSV files that have carriage returns within the lines, the regex will probably fall apart. It wouldn't take much code to write a simple while loop that goes through all the characters and splits up the data. It would be a lot easier to deal with "non-standard"* CSV files such as yours with a parser than with a regex.
*I say non-standard because there isn't really an official standard for CSV, and when you're dealing with CSV files from many different systems, you see lots of weird things, like the abc"def field as shown above.
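As a rough sketch of the character loop suggested above, something like the following might do. The class name LenientCsvLine and its rules are my own assumptions, not a standard: a quote opens a quoted field only at the start of a field, a quote inside a quoted field closes it only when a comma or the end of the line follows, a doubled quote is one literal quote, and any other quote is kept as data. Unlike what the question asks for, the enclosing quotes are stripped from the returned values.

```java
import java.util.ArrayList;
import java.util.List;

public class LenientCsvLine {

    /** Splits one line into fields, tolerating stray quotes (hypothetical rules, see above). */
    public static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;     // currently inside a quoted field
        boolean atFieldStart = true;  // no character of the current field seen yet
        int n = line.length();

        for (int i = 0; i < n; i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    field.append(c);
                } else if (i + 1 == n || line.charAt(i + 1) == ',') {
                    inQuotes = false;          // closing quote
                } else if (line.charAt(i + 1) == '"') {
                    field.append('"');         // doubled quote -> one literal quote
                    i++;
                } else {
                    field.append('"');         // stray quote, kept as data
                }
            } else if (c == ',') {
                fields.add(field.toString());  // end of field
                field.setLength(0);
                atFieldStart = true;
            } else if (c == '"' && atFieldStart) {
                inQuotes = true;               // opening quote
                atFieldStart = false;
            } else {
                field.append(c);               // includes stray quotes in unquoted fields
                atFieldStart = false;
            }
        }
        fields.add(field.toString());          // last field
        return fields;
    }
}
```

With these rules both sample lines from the question split into 15 fields, and the second field comes out as abc"def in either case. Data where a stray quote is directly followed by a comma is still ambiguous and would need a different rule.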

opencsv is a great, simple, lightweight CSV parser for Java. It will easily handle your data.

If possible, changing your CSV format would make the solution very simple.
See the following for an overview of Delimiter Separated Values, a common format on Unix-based systems:
http://www.faqs.org/docs/artu/ch05s02.html#id2901882

Opencsv is a very simple API and one of the best for CSV parsing. The file can also be cleaned up with Linux sed commands before it is processed in Java. If the file is not in a proper format, convert the column delimiter (",") into a pipe or another unique delimiter, so that Opencsv can easily tell field values and column delimiters apart. Use the power of Linux with your Java code.

Related

Read CSV with semicolons into a String on Java

My main problem is that I'm trying to read a CSV delimited by ; in Java and the problem comes when I try to read a field of the CSV that contains a ;. For example:
"I want you to do that;"
In this case the field is recognized like
"I want you to do that"
And it creates another field that is just an empty string.
I use a BufferedReader to read the CSV and the split method to separate it with the ;. I'm not allowed to use libraries like OpenCSV so I want to find a solution with the method I'm using.
Parse according to the quotation marks
If the data incidentally containing the delimiter is wrapped in double quotes (QUOTATION MARK), then you should have no problem with parsing. Your parsing should look first for pairs of double quote characters. After that, look for delimiters outside of those pairs.
Rather than writing the parsing code yourself, I highly recommend using a CSV library. In the Java ecosystem, you have a wealth of good products to choose from. For example, I have made successful use of Apache Commons CSV.
See also the specification for CSV: RFC 4180.
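If a library really is off the table, the "pairs of quotes first, delimiters outside them second" idea can be sketched with a single flag that flips on every double quote. This is only a minimal sketch: the class and method names are made up for illustration, it assumes quotes are balanced within a line, and it leaves the enclosing quotes in the returned fields.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplit {

    /** Splits on the delimiter, but only where it falls outside a pair of double quotes. */
    public static List<String> split(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                insideQuotes = !insideQuotes;   // entering or leaving a quoted section
                current.append(c);              // quotes are kept in the field
            } else if (c == delimiter && !insideQuotes) {
                fields.add(current.toString()); // delimiter outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(c);              // ordinary character, or delimiter inside quotes
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

Each line returned by the BufferedReader would then go through QuoteAwareSplit.split(line, ';') instead of line.split(";").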

Splitting a csv file which has commas and special characters in its data in Java

I want to split a CSV file whose data contains commas and other special characters, using Java. I tried regex-based splitting such as line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1); and similar variants, but the split is wrong for some rows.
The CSV has around 3000 rows. Some of them are not split properly.
Please suggest a standard way to split the data in a CSV file.
If you have a standard desktop or web application, Apache Commons CSV or OpenCSV might help you. If you are dealing with some kind of "Big Data" technology, have a look at Spark.
Remove all special characters, replace each run of whitespace with +, and then split on +:
String result = str.replaceAll("[^\\dA-Za-z ]", "").replaceAll("\\s+", "+");
Instead of separating values using a comma, you can use tab(\t).
File can be saved with .csv extension. It has worked for me.

Comma separated escaping for a value while writing in csv

I am writing some values to a CSV file, but a value containing commas gets split into more than one cell,
e.g. a,b,c is one value and should appear in 1 cell, but it appears in 3 cells.
writer.append(node.getLongName());
This is how I am writing data into CSV files using FileWriter. If node.getLongName() returns a value containing commas, the value is split at each internal comma.
Can anyone please tell me how to make this work and avoid splitting the value?
You are writing to a CSV file, but do you know which fields from your source should not be split? If you do, you can change the separator inside that field from a comma to some other separator such as '+' and then append it with the other elements of the CSV. As an example:
10/09/2016, cycling club,(sam+1000+oklahoma),(henry+ 1001+california),( bill+1002+NY)
Here the parentheses hold the details of students. They were comma separated before, but I changed the separator to a plus sign.
Although it can be manipulated by hand for trivial tasks, the CSV format gets tricky as soon as you need to handle delimiter or newline escaping.
Unless you want to do the heavy testing yourself for all corner cases, your best bet is to rely on a well-known CSV library like the one from Apache.
Here it is still simple enough (assuming you only need to escape commas), and the common usage is to quote fields containing blanks or delimiters. That means writing "a,b,c" rather than a,b,c:
writer.append("\"" + node.getLongName()+ "\"");
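Plain wrapping in quotes breaks as soon as the value itself contains a double quote. A slightly more defensive sketch doubles any embedded quotes before wrapping, which is the usual convention. The helper name CsvFieldWriter.quote is made up for illustration, and it assumes the value is never null.

```java
public class CsvFieldWriter {

    /**
     * Wraps a value in double quotes for CSV output and doubles any
     * double quotes already inside it. Assumes value is not null.
     */
    public static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }
}
```

The write from the question would then become writer.append(CsvFieldWriter.quote(node.getLongName()));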

Mongoexport - Issue with "\n"

I was trying to export data from Mongo to Oracle, using the approach below.
Step 1: Export the data to a CSV file using the mongoexport command.
Step 2: Read the data with Java code and do the necessary data transformation.
Step 3: Insert the data into Oracle.
The issue is that when a comment field contains a newline character ('\n'), the data continues on the next line and the Java reader fails to process the document.
There is an open bug with 10gen for this in JIRA. Has anyone faced this issue? Is there a workaround?
As with many formatting nuances in CSV, there is no agreed "standard" for how to handle embedded newline characters in a CSV field.
A common implementation is RFC-4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files, which suggests:
6) Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes.
For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
This is the format that mongoexport is currently using. If you use a CSV parser compliant with RFC-4180 (e.g. SuperCSV as suggested by @evanchooly) it should handle the quoted newlines as expected.
If you need an alternative to the format used by mongoexport or need more flexibility in your output, you can always write your own export script.
Are you trying to parse the csv manually? If so, take a look at http://opencsv.sourceforge.net/ or http://supercsv.sourceforge.net/ and see if they help.
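If you do end up reading the file by hand, one workaround for quoted line breaks is to keep appending physical lines to the current record while it contains an odd number of double quotes. The sketch below assumes the export doubles embedded quotes as RFC-4180 describes; the class name CsvRecordReader is invented for illustration, and because readLine() drops the line terminator, the line break is restored as '\n' rather than the original CRLF.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class CsvRecordReader {

    /** Groups physical lines into logical CSV records that may contain quoted line breaks. */
    public static List<String> readRecords(BufferedReader in) {
        List<String> records = new ArrayList<>();
        StringBuilder record = new StringBuilder();
        boolean inQuotes = false;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                // Every double quote flips the state; a doubled quote flips it twice.
                for (int i = 0; i < line.length(); i++) {
                    if (line.charAt(i) == '"') {
                        inQuotes = !inQuotes;
                    }
                }
                record.append(line);
                if (inQuotes) {
                    record.append('\n');              // line break belongs to the field
                } else {
                    records.add(record.toString());   // record is complete
                    record.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (record.length() > 0) {
            records.add(record.toString());           // unterminated quote at end of input
        }
        return records;
    }
}
```

Each returned record can then be split into fields with whatever quote-aware splitting you already use.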

If data contains commas, how to store it in csv?

If the task is to create a csv file out of some data where commas may be present, is there a way to do it without later confusing which comma is a delimiter and which comma is part of a value?
Obviously, we can use a different delimiter, replace all occurrences, or replace the original comma with something else, but for the purpose of this question let's say that modifying the original data is not an option and a comma is the only delimiter allowed.
How would you approach something like this? Would it be easier to create the xls instead? Can you recommend any java libraries that handle this well?
A true CSV reader should be able to handle this; the values should be in quotes, e.g.:
one,two,"a, b, c",four
...per item #6 in Section 2 of the RFC.
While there's no single CSV standard, the usual convention is to surround entries containing commas in double quotes (i.e. ").
Preempting the next question: What to do if your data contains a double quote? In this case each one is usually replaced with a pair of double quotes.
While I hate to cite wikipedia as a source, they do have a pretty good roundup of basic rules and examples for CSV formatting.
I would either use a different delimiter or use a library like Apache POI.
I think the best way is to use Apache POI: http://poi.apache.org/
You can easily create XLS documents without much hassle.
However, if you really need CSV and not XLS, you can surround the value with quotes. This should also solve the problem.
Usually, you work with , as separator and ' as quote. So your values would look like:
foo, 'bar, baz', iik, aje
the task is to create a csv file
Actually an impossible task, since there is no such thing as "a CSV" file. Different Microsoft products have used different (subtly different, I grant) formats and named them all "CSV". As most spreadsheets can read delimiter separated value (DSV) files, you might be better off writing one of those.
