I am writing some values to a CSV file, but any value containing commas gets split into more than one cell.
e.g. a,b,c is one value and should appear in 1 cell but it's appearing in 3 cells.
writer.append(node.getLongName());
This is how I am writing data into CSV files using FileWriter. If node.getLongName() returns a value containing commas, the value is split at each internal comma.
Can anyone please tell me how to make this work and avoid splitting the value?
You are writing to a CSV file, but do you know which fields of your source data should not be split? If you do, you can change the separator inside those fields from a comma to some other separator like '+' and then append them with the other elements of the CSV. As an example:
10/09/2016, cycling club,(sam+1000+oklahoma),(henry+ 1001+california),( bill+1002+NY)
Here the parentheses hold the details of students. They were comma-separated before, but I changed the separator to a plus sign.
Although it can be manipulated by hand for trivial tasks, the CSV format gets tricky as soon as you need to escape delimiters or newlines.
Unless you want to do the heavy testing yourself for all corner cases, your best bet is to rely on a well-known CSV library like the one from Apache.
Here it is still simple enough (assuming you only need to escape commas), and the common usage is to quote fields containing blanks or delimiters. That means to not write a,b,c but "a,b,c":
writer.append("\"" + node.getLongName()+ "\"");
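A minimal sketch of a slightly more defensive variant of the line above, assuming the common convention of quoting fields that contain commas, quotes, or line breaks and doubling embedded quotes. The class and method names (CsvEscape, escape, joinRow) are made up for illustration and are not part of any library.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not from the original post or any library.
final class CsvEscape {

    // Quote a field only when it contains a comma, a double quote, or a line break.
    // Embedded double quotes are doubled, following the common CSV convention.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join separate values into one CSV line, with no spaces around the commas.
    static String joinRow(List<String> fields) {
        return fields.stream().map(CsvEscape::escape).collect(Collectors.joining(","));
    }
}
```

With this in place, the write could become writer.append(CsvEscape.escape(node.getLongName())); so that a value like a,b,c lands in a single cell.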
I've got a two-column CSV with a name and a number. Some people's names contain commas, for example Joe Blow, CFA. This comma breaks the CSV format, since it's interpreted as a new column.
I've read up and the most common prescription seems to be replacing that character, or replacing the delimiter, with a new value (e.g. this|that|the, other).
I'd really like to keep the comma separator (I know Excel supports other delimiters, but other interpreters may not). I'd also like to keep the comma in the name, as Joe Blow| CFA looks pretty silly.
Is there a way to include commas in CSV columns without breaking the formatting, for example by escaping them?
To encode a field containing comma (,) or double-quote (") characters, enclose the field in double-quotes:
field1,"field, 2",field3, ...
Literal double-quote characters are typically represented by a pair of double-quotes (""). For example, a field exclusively containing one double-quote character is encoded as """".
For example:
Sheet: |Hello, World!|You "matter" to us.|
CSV: "Hello, World!","You ""matter"" to us."
More examples (sheet → csv):
regular_value → regular_value
Fresh, brown "eggs" → "Fresh, brown ""eggs"""
" → """"
"," → ""","""
,,," → ",,,"""
,"", → ","""","
""" → """"""""
See wikipedia.
I found that some applications, like Numbers on the Mac, ignore the double quote if there is a space before it.
a, "b,c" doesn't work while a,"b,c" works.
The problem with the CSV format is that there is not one spec; there are several accepted methods, with no way of distinguishing which should be used (for generating or interpreting). I discussed all the methods to escape characters (newlines in that case, but the same basic premise) in another post. Basically it comes down to using a CSV generation/escaping process for the intended users, and hoping the rest don't mind.
Reference spec document.
If you want to do what you described, you can use quotes. Something like this:
$name = "Joe Blow, CFA.";
$arr[] = "\"".$name."\"";
Now you can use a comma in your name variable.
You need to quote those values.
Here is a more detailed spec.
In addition to the points in other answers: one thing to note if you are using quotes in Excel is the placement of your spaces. If you have a line of code like this:
print '%s, "%s", "%s", "%s"' % (value_1, value_2, value_3, value_4)
Excel will treat the initial quote as a literal quote instead of using it to escape commas. Your code will need to change to
print '%s,"%s","%s","%s"' % (value_1, value_2, value_3, value_4)
It was this subtlety that brought me here.
You can use Template literals (Template strings)
e.g -
`"${item}"`
CSV files can actually be formatted using different delimiters, comma is just the default.
You can use the sep flag to specify the delimiter you want for your CSV file.
Just add the line sep=; as the very first line of your CSV file if you want the delimiter to be a semicolon; you can change it to any other character. Note that this line is an Excel convention, so other CSV readers may treat it as ordinary data.
This isn't a perfect solution, but you can replace every comma with ‚ (a low quotation mark). It looks very similar to a comma and visually serves the same purpose. No quotes are required.
In JS this would be:
stringVal.replaceAll(',', '‚')
You will need to be very careful in cases where you directly compare that data, though, because the replaced character is not equal to a real comma.
Depending on your language, there may be a to_json method available. That will escape many things that break CSVs.
I faced the same problem and quoting the , did not help. Eventually, I replaced the , with +, finished the processing, saved the output into an outfile and replaced the + with ,. This may seem ugly but it worked for me.
May not be what is needed here but it's a very old question and the answer may help others. A tip I find useful with importing into Excel with a different separator is to open the file in a text editor and add a first line like:
sep=|
where | is the separator you wish Excel to use.
Alternatively you can change the default separator in Windows but a bit long-winded:
Control Panel>Clock & region>Region>Formats>Additional>Numbers>List separator [change from comma to your preferred alternative]. That means Excel will also default to exporting CSVs using the chosen separator.
You could encode your values, for example in PHP base64_encode($str) / base64_decode($str)
IMO this is simpler than doubling up quotes, etc.
https://www.php.net/manual/en/function.base64-encode.php
The encoded values will never contain a comma so every comma in your CSV will be a separator.
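The same idea can be sketched in Java with java.util.Base64 from the standard library. The class name Base64Field and its two methods are hypothetical, and UTF-8 is an assumed encoding for the text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: encodes a value so it cannot contain a comma or quote.
final class Base64Field {

    // The standard Base64 alphabet is A-Z, a-z, 0-9, '+', '/' and '=' padding,
    // so the result never contains a comma, a quote, or a line break.
    static String encode(String value) {
        return Base64.getEncoder().encodeToString(value.getBytes(StandardCharsets.UTF_8));
    }

    // Reverses encode(); throws IllegalArgumentException on invalid Base64 input.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }
}
```

The trade-off is that the cells are no longer human-readable when the file is opened in a spreadsheet, so this only fits files that are written and read by your own code.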
You can set the Text Qualifier field in your Flat File Connection Manager to ". This should wrap your data in quotes and only separate on commas that are outside the quotes.
First, if the item value contains a double quote character ("), replace it with two double quote characters (""):
item = item.ToString().Replace("""", """""")
Finally, wrap item value:
ON LEFT: With double quote character (")
ON RIGHT: With double quote character (") and comma character (,)
csv += """" & item.ToString() & ""","
Plain double quotes did not work for me; escaping them as \" did. For example, to place a literal double quote inside a field you can write \"\".
You can build formulas, as example:
fprintf(strout, "\"=if(C3=1,\"\"\"\",B3)\"\n");
will write a quoted field to the CSV that Excel reads as the formula:
=IF(C3=1,"",B3)
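A C# method for escaping delimiter characters and quotes in column text. It should cover the common cases that would otherwise mangle your CSV.
private string EscapeDelimiter(string field)
{
    // Quote the field if it contains the delimiter, a double quote, or a line break.
    // yourEscapeCharacter is a placeholder for your delimiter, e.g. ",".
    if (field.Contains(yourEscapeCharacter) || field.Contains("\"")
        || field.Contains("\n") || field.Contains("\r"))
    {
        field = field.Replace("\"", "\"\""); // double any embedded quotes
        field = $"\"{field}\"";              // wrap the whole field in quotes
    }
    return field;
}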
If the task is to create a csv file out of some data where commas may be present, is there a way to do it without later confusing which comma is a delimiter and which comma is part of a value?
Obviously, we can use a different delimiter, replace all occurrences, or replace the original comma with something else, but for the purpose of this question let's say that modifying the original data is not an option and a comma is the only delimiter allowed.
How would you approach something like this? Would it be easier to create an XLS file instead? Can you recommend any Java libraries that handle this well?
A true CSV reader should be able to handle this; the values should be in quotes, e.g.:
one,two,"a, b, c",four
...per item #6 in Section 2 of the RFC.
While there's no single CSV standard, the usual convention is to surround entries containing commas in double quotes (i.e. ").
Preempting the next question: what do you do if your data contains a double quote? In that case each double quote is usually replaced by a pair of double quotes.
While I hate to cite Wikipedia as a source, they do have a pretty good roundup of basic rules and examples for CSV formatting.
I would either use a different delimiter or use a library like Apache POI.
I think the best way is to use Apache POI: http://poi.apache.org/
You can easily create XLS documents without much hassle.
However, if you really need CSV and not XLS, you can surround the value with quotes. This should also solve the problem.
Usually, you work with , as separator and ' as quote. So your values would look like:
foo, 'bar, baz', iik, aje
the task is to create a csv file
Actually an impossible task, since there is no such thing as "a CSV" file. Different Microsoft products have used different (subtly different, I grant) formats and named them all "CSV". As most spreadsheets can read delimiter-separated value (DSV) files, you might be better off writing one of those.
I am using opencsv to read a CSV file. Fields are separated by commas. But if a field is enclosed in quotes, a comma inside the quotes is not a delimiter. For example, "Hello, World".
The current opencsv cannot deal with that. How can I address this problem?
update
I found that the comma is not the problem (so far). The problem row is: ...,"a children""s heart\",.... It seems to remove the quote, so the field is read as a children"s heart",......, where ...... represents all the following data.
It seems to be a problem not with opencsv but with messy input data.
You can write custom code that searches through your CSV file and replaces every comma inside quotes with a placeholder or special character that you can identify later and turn back into a comma.
According to the documentation, you can supply custom separator and quote characters in the constructor, which should deal with it:
CSVReader(Reader reader, char separator, char quotechar)
Construct your reader with , as separator and " as quotechar.
What is the most efficient way to split a file in Java?
Like to get it grid ready...
(Edit)
Modifying the question.
Basically after scouring the net I understand that there are generally two methods followed for file splitting....
Just split them by the number of bytes
I guess the advantage of this method is that it is fast, but say I have all the data in a line and suppose the file split puts half the data in one split and the other half the data in another split, then what do I do ??
Read them line by line
This will keep my data intact, fine, but I suppose this ain't as fast as the above method
Well, just read the file line by line and start saving it to a new file. Then when you decide it's time to split, start saving the lines to a new place.
Don't worry about efficiency too much unless it's a real problem later.
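The line-by-line approach described above can be sketched as follows. This is only an illustration: the class name LineSplitter, the part-N.txt file naming, and the fixed lines-per-part rule for deciding when to split are all assumptions, and UTF-8 is assumed as the file encoding.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a text file into parts without cutting a line in half.
final class LineSplitter {

    // Copies `source` into numbered part files of at most `linesPerPart` lines each
    // and returns the paths of the parts in order.
    static List<Path> split(Path source, Path targetDir, int linesPerPart) throws IOException {
        if (linesPerPart <= 0) {
            throw new IllegalArgumentException("linesPerPart must be positive");
        }
        List<Path> parts = new ArrayList<>();
        BufferedWriter out = null;
        int linesInPart = 0;
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Start a new part for the first line and whenever the current part is full.
                if (out == null || linesInPart == linesPerPart) {
                    if (out != null) {
                        out.close();
                    }
                    Path part = targetDir.resolve("part-" + parts.size() + ".txt");
                    parts.add(part);
                    out = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                    linesInPart = 0;
                }
                out.write(line);
                out.newLine();
                linesInPart++;
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return parts;
    }
}
```

Because both streams are buffered, the per-line overhead is usually small compared with the disk I/O itself, which fits the advice not to worry about efficiency until it is a measured problem.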
My first impression is that you have something like a comma separated value (csv) file. The usual way to read / parse those files is to
read them line by line
skip headers and empty lines
use String#split(String reg) to split a line into values (reg is chosen to match the delimiter)
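The steps above can be sketched as follows, assuming the first non-empty line is the header and that no field contains the delimiter or a quote. The class name NaiveCsv and its parse method are hypothetical. This approach breaks as soon as a field is quoted, so treat it as a starting point only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for simple, unquoted delimiter-separated lines.
final class NaiveCsv {

    // Skips empty lines and the first non-empty line (assumed to be the header),
    // then splits every remaining line on the given delimiter regex.
    static List<String[]> parse(List<String> lines, String delimiterRegex) {
        List<String[]> rows = new ArrayList<>();
        boolean headerSkipped = false;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // ignore empty lines
            }
            if (!headerSkipped) {
                headerSkipped = true; // drop the header row
                continue;
            }
            // A limit of -1 keeps trailing empty fields, e.g. "ann," yields two values.
            rows.add(line.split(delimiterRegex, -1));
        }
        return rows;
    }
}
```

Without the -1 limit, String#split silently drops trailing empty strings, which makes rows with empty last columns shorter than the others.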
I have a CSV file in the format below. I get an issue if either of the following lines is read by the program:
"D",abc"def,"","0429"292"0","11","IJ80","Feb10_1.txt-2","FILE RECORD","05/02/2010","04/03/2010","","1","-91","",""
"D","abc"def","","04292920","11","IJ80","Feb10_1.txt-2","FILE RECORD","05/02/2010","04/03/2010","","1","-91","",""
The split command below is used to ignore commas inside double quotes. I got it from an earlier post, linked below.
String items[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)",15);
System.out.println("items.length"+items.length);
Regarding Java Split Command Parsing Csv File
The items.length is printed as 14 instead of 15. The value abc"def is not recognized as an individual field and is incorrectly stored as
"D",abc"def in items[0]. I want it to be stored as follows:
items[0] should be "D" and items[1] should be abc"def
The same issue happens when the value is "abc"def". I want it to be stored as
items[0] should be "D" and items[1] should be "abc"def"
The split command works perfectly if the double quotes inside a quoted field are doubled (for example, the line D,"abc""def",1).
How can I resolve this issue?
I think you would be much better off writing a parser for the CSV files rather than trying to use a regular expression. Once you start dealing with CSV files with carriage returns within fields, the regex will probably fall apart. It wouldn't take much code to write a simple while loop that goes through all the characters and splits up the data. It would be a lot easier to deal with "Non-Standard"* CSV files such as yours with a parser than with a regex.
*I say non-standard because there isn't really an official standard for CSV, and when you're dealing with CSV files from many different systems, you see lots of weird things, like the abc"def field as shown above.
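A minimal sketch of such a character-by-character loop, under stated assumptions: a double quote opens a quoted field only at the very start of a field, a doubled quote inside a quoted field stands for one literal quote, and a stray quote elsewhere (as in abc"def) is kept as ordinary text. The class name SimpleCsvLineParser is hypothetical. It works on one line at a time, so line breaks inside quoted fields are not handled, and a malformed field such as "abc"def" remains ambiguous.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-line CSV splitter; a sketch, not a full CSV implementation.
final class SimpleCsvLineParser {

    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean atFieldStart = true;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c); // commas inside quotes are plain text
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote means one literal quote
                    i++;
                } else {
                    inQuotes = false; // closing quote
                }
            } else if (c == ',') {
                fields.add(current.toString()); // delimiter ends the field
                current.setLength(0);
                atFieldStart = true;
                continue;
            } else if (c == '"' && atFieldStart) {
                inQuotes = true; // opening quote
            } else {
                current.append(c); // ordinary character, including a stray quote
            }
            atFieldStart = false;
        }
        fields.add(current.toString()); // last field has no trailing comma
        return fields;
    }
}
```

For the line "D",abc"def,"" this yields three fields: D, abc"def, and an empty string. The surrounding quotes are stripped from the parsed values, so re-escape them if you write the data back out.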
opencsv is a great simple and light weight CSV parser for Java. It will easily handle your data.
If possible, changing your CSV format would make the solution very simple.
See the following for an overview of Delimiter Separated Values, a common format on Unix-based systems:
http://www.faqs.org/docs/artu/ch05s02.html#id2901882
opencsv is a simple and capable API for CSV parsing. The cleanup can also be done with Linux sed commands before processing the file in Java. If the file is not in a proper format, convert the column delimiter (" , ") into a pipe or another unique delimiter, so that commas inside field values and the column delimiter can be told apart easily by opencsv. Use the power of Linux with your Java code.