I'm trying to read a CSV delimited by ; in Java, and the problem comes when a field of the CSV itself contains a ;. For example:
"I want you to do that;"
In this case the field is read as
"I want you to do that"
And it creates another field that is just an empty string.
I use a BufferedReader to read the CSV and the split method to separate it with the ;. I'm not allowed to use libraries like OpenCSV so I want to find a solution with the method I'm using.
Parse according to the quotation marks
If the data incidentally containing the delimiter is wrapped in double quotes (QUOTATION MARK), then you should have no problem with parsing. Your parsing should look first for pairs of double quote characters. After that, look for delimiters outside of those pairs.
Rather than writing the parsing code yourself, I highly recommend using a CSV library. In the Java ecosystem, you have a wealth of good products to choose from. For example, I have made successful use of Apache Commons CSV.
See also the specification for CSV: RFC 4180.
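The quote-aware approach described above can be sketched as a small state machine. This is a minimal illustration in Python, not a full RFC 4180 parser; the function name split_quoted and its parameters are made up for this example. The same loop translates directly to Java with a StringBuilder and a boolean flag, so it fits the "BufferedReader plus my own splitting" constraint.

```python
def split_quoted(line, delimiter=";", quote='"'):
    """Split one line on `delimiter`, ignoring delimiters inside quotes."""
    fields = []
    current = []          # characters of the field being built
    in_quotes = False     # True while we are between an opening and closing quote
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == quote:
                # A doubled quote inside a quoted field is a literal quote.
                if i + 1 < len(line) and line[i + 1] == quote:
                    current.append(quote)
                    i += 1
                else:
                    in_quotes = False
            else:
                current.append(ch)
        elif ch == quote:
            in_quotes = True
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


print(split_quoted('a;"I want you to do that;";c'))
```

The sketch assumes one record per line, so it does not handle quoted fields that contain line breaks.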
I've got a two column CSV with a name and a number. Some people's name use commas, for example Joe Blow, CFA. This comma breaks the CSV format, since it's interpreted as a new column.
I've read up and the most common prescription seems to be replacing that character, or replacing the delimiter, with a new value (e.g. this|that|the, other).
I'd really like to keep the comma separator (I know Excel supports other delimiters, but other interpreters may not). I'd also like to keep the comma in the name, as Joe Blow| CFA looks pretty silly.
Is there a way to include commas in CSV columns without breaking the formatting, for example by escaping them?
To encode a field containing comma (,) or double-quote (") characters, enclose the field in double-quotes:
field1,"field, 2",field3, ...
Literal double-quote characters are typically represented by a pair of double-quotes (""). For example, a field exclusively containing one double-quote character is encoded as """".
For example:
Sheet: |Hello, World!|You "matter" to us.|
CSV: "Hello, World!","You ""matter"" to us."
More examples (sheet → csv):
regular_value → regular_value
Fresh, brown "eggs" → "Fresh, brown ""eggs"""
" → """"
"," → ""","""
,,," → ",,,"""
,"", → ","""","
""" → """"""""
See wikipedia.
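The encoding rule above fits in a few lines. This is a minimal sketch in Python, assuming a comma delimiter; the name encode_field is invented for the example, and quoting fields that contain line breaks is an addition taken from RFC 4180 rather than from the answer text.

```python
def encode_field(value):
    """Quote a field only when it contains a comma, a quote, or a line break."""
    if any(ch in value for ch in ',"\r\n'):
        # Double every embedded quote, then wrap the whole field in quotes.
        return '"' + value.replace('"', '""') + '"'
    return value


def encode_row(values):
    """Join already-encoded fields with commas."""
    return ",".join(encode_field(v) for v in values)


print(encode_row(["Hello, World!", 'You "matter" to us.']))
```

Running the examples from the table through encode_field is a quick way to check an implementation in any language.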
I found that some applications like Numbers in Mac ignore the double quote if there is space before it.
a, "b,c" doesn't work while a,"b,c" works.
The problem with the CSV format is that there is no single spec. There are several accepted methods, with no way of distinguishing which one should be used when generating or interpreting a file. I discussed all the methods to escape characters (newlines in that case, but the same basic premise) in another post. Basically it comes down to using a CSV generation/escaping process for the intended users, and hoping the rest don't mind.
Reference spec document.
If you want to do what you described, you can use quotes. Something like this:
$name = "Joe Blow, CFA.";
$arr[] = "\"".$name."\"";
Now the name can contain a comma. In general, you need to quote any value that contains the delimiter.
Here is a more detailed spec.
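If a CSV writer is available in your language, it normally applies this quoting for you. As a hedged illustration, here is a round trip with Python's standard csv module (the sample rows are invented for the example): the name with a comma is quoted on the way out and comes back as one field.

```python
import csv
import io

rows = [["name", "number"], ["Joe Blow, CFA", "42"]]

# Write the rows; the writer quotes only the fields that need it.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# Read the text back and confirm the comma stayed inside the name.
parsed = list(csv.reader(text.splitlines()))
print(parsed)
```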
In addition to the points in other answers: one thing to note if you are using quotes in Excel is the placement of your spaces. If you have a line of code like this:
print('%s, "%s", "%s", "%s"' % (value_1, value_2, value_3, value_4))
Excel will treat the initial quote as a literal quote instead of using it to escape commas. Your code will need to change to
print('%s,"%s","%s","%s"' % (value_1, value_2, value_3, value_4))
It was this subtlety that brought me here.
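The same sensitivity to a space before the opening quote can be reproduced with Python's csv module, which by default also treats a quote that does not start the field as ordinary text. The sketch below only shows how one parser behaves; it makes no claim about how Excel or Numbers are implemented. The skipinitialspace option is a real csv dialect parameter that tells the reader to ignore whitespace right after a delimiter.

```python
import csv

line = 'a, "b,c"'

# Default: the space makes the quote part of the data, so the comma splits the field.
default_row = next(csv.reader([line]))

# With skipinitialspace=True the space is dropped and the quotes work as intended.
tolerant_row = next(csv.reader([line], skipinitialspace=True))

print(default_row)
print(tolerant_row)
```

When you control the writer, the safer fix is still the one above: emit no space between the delimiter and the opening quote.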
In JavaScript, you can use template literals (template strings) to wrap each item in double quotes, e.g.:
`"${item}"`
CSV files can be written with different delimiters; the comma is just the default.
Excel recognizes a sep= hint that specifies the delimiter for a CSV file. This is an Excel convention rather than part of the CSV format, so other parsers may treat that line as data.
Just add the line sep=; as the very first line of your CSV file if you want your delimiter to be a semicolon. You can change it to any other character.
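Parsers that do not know about the sep= line will read it as a data row, so a reader may need to handle it explicitly. A minimal sketch in Python, assuming the hint is exactly "sep=" followed by one character; the function name read_with_sep_hint is made up for this example.

```python
import csv


def read_with_sep_hint(text, default=","):
    """Parse CSV text, honoring an optional leading 'sep=X' line."""
    lines = text.splitlines()
    delimiter = default
    # Treat the first line as a hint only if it is exactly 'sep=' plus one character.
    if lines and len(lines[0]) == 5 and lines[0].lower().startswith("sep="):
        delimiter = lines[0][4]
        lines = lines[1:]
    return list(csv.reader(lines, delimiter=delimiter))


print(read_with_sep_hint("sep=;\nJoe Blow, CFA;42\n"))
```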
This isn't a perfect solution, but you can replace every comma with ‚ (a low quotation mark). It looks very similar to a comma and visually serves the same purpose. No quotes are required.
In JS this would be:
stringVal.replaceAll(',', '‚')
You will need to be very careful in cases where you compare that data directly, though, because the replaced text no longer equals the original.
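The comparison pitfall is easy to demonstrate. This is a small sketch in Python, assuming the lookalike is U+201A (SINGLE LOW-9 QUOTATION MARK); the helper names are invented for the example. It also assumes the original data never contains that character, otherwise restoring would corrupt it.

```python
LOW_QUOTE = "\u201a"  # looks like a comma, but is a different code point


def hide_commas(value):
    """Swap real commas for the lookalike before writing the CSV."""
    return value.replace(",", LOW_QUOTE)


def restore_commas(value):
    """Swap the lookalike back after reading the CSV."""
    return value.replace(LOW_QUOTE, ",")


original = "Joe Blow, CFA"
stored = hide_commas(original)

print(stored == original)                  # the stored text is not equal to the original
print(restore_commas(stored) == original)  # restoring makes the comparison work again
```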
Depending on your language, there may be a to_json method available. That will escape many things that break CSVs.
I faced the same problem and quoting the , did not help. Eventually, I replaced the , with +, finished the processing, saved the output into an outfile and replaced the + with ,. This may seem ugly but it worked for me.
May not be what is needed here but it's a very old question and the answer may help others. A tip I find useful with importing into Excel with a different separator is to open the file in a text editor and add a first line like:
sep=|
where | is the separator you wish Excel to use.
Alternatively you can change the default separator in Windows but a bit long-winded:
Control Panel>Clock & region>Region>Formats>Additional>Numbers>List separator [change from comma to your preferred alternative]. That means Excel will also default to exporting CSVs using the chosen separator.
You could encode your values, for example in PHP base64_encode($str) / base64_decode($str)
IMO this is simpler than doubling up quotes, etc.
https://www.php.net/manual/en/function.base64-encode.php
The encoded values will never contain a comma so every comma in your CSV will be a separator.
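For comparison, the same idea in Python with the standard base64 module; the helper names are made up for this sketch. The standard Base64 alphabet has no comma or double quote, which is why the encoded values are safe to join with commas. The trade-off is that the file is no longer human-readable and every consumer has to decode it.

```python
import base64


def to_b64(value):
    """Encode a text field as Base64 so it cannot contain a comma."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def from_b64(token):
    """Decode a Base64 field back to the original text."""
    return base64.b64decode(token).decode("utf-8")


fields = ["Joe Blow, CFA", 'He said "hi"']
line = ",".join(to_b64(f) for f in fields)

# Every comma in the line is now a separator, so a plain split is safe.
decoded = [from_b64(token) for token in line.split(",")]
print(decoded)
```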
You can set the Text_Qualifier field in your Flat File connection manager to ". This should wrap your data in quotes and split only on commas that are outside the quotes.
First, if the item value contains a double quote character ("), replace it with two double quote characters (""):
item = item.ToString().Replace("""", """""")
Finally, wrap item value:
ON LEFT: With double quote character (")
ON RIGHT: With double quote character (") and comma character (,)
csv += """" & item.ToString() & ""","
Plain double quotes did not work for me; what worked was \". If you want to write a literal double quote, you can use \"\".
You can build formulas this way, for example:
fprintf(strout, "\"=if(C3=1,\"\"\"\",B3)\"\n");
writes the quoted field "=if(C3=1,"""",B3)" to the CSV, which Excel reads as the formula:
=IF(C3=1,"",B3)
A C# method for escaping delimiter characters and quotes in column text. It should be all you need to ensure your csv is not mangled.
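private string EscapeDelimiter(string field)
{
    // Quote the field if it contains the delimiter, a double quote, or a line break.
    if (field.Contains(yourEscapeCharacter) || field.Contains("\"")
        || field.Contains("\n") || field.Contains("\r"))
    {
        // Double any embedded quotes, then wrap the whole field in quotes.
        field = field.Replace("\"", "\"\"");
        field = $"\"{field}\"";
    }
    return field;
}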
My problem is:
I'm using a CSV that was produced by some software that does not handle CSV well. Some strings in the file contain quotes, and the strings are also wrapped in quotes, so I'm having issues parsing it.
This is a normal CSV:
"one","two","three"
and here is my case:
"one","tw"o","three"
So I'm having issues parsing strings like "tw"o". This is basically a problem with the software that is outputting the file, and I can't edit that software.
So I thought I could create a regex that removes the unnecessary quotes or commas and makes sure that each string is wrapped in quotes and delimited by commas. Does anyone know how I can achieve that?
I'm using the tototoshi library for Scala.
I tried Python csv module, and it was able to do that (sounds like a hack but the input file is wrong after all, and using regex would be a hack too):
import csv
z = '''"one","tw"o","three"'''
cr = csv.reader([z])
print(next(cr))
result:
['one', 'two"', 'three']
For some reason, the quote has been moved to the end of the string (a valid way to put a double quote in a field would be to double it).
To remove it you can do
print([x.replace('"', "") for x in next(csv.reader([z]))])  # fresh reader: cr is already exhausted
to get
['one', 'two', 'three']
Note that csv will produce 4 fields for "one","tw",o","three", so if the stray quote is followed by a comma, nothing works and only human verification can fix it.
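To decide which rows need that human verification, one option is to parse strictly and check the field count. This is a minimal sketch, assuming you know how many columns each row should have; the function name parse_or_flag is made up for the example. It uses the csv module's strict option, which raises csv.Error when a closing quote is followed by something other than a delimiter. It will not catch every malformed row, only these two symptoms.

```python
import csv


def parse_or_flag(line, expected_fields):
    """Return the parsed row, or None if the line looks malformed."""
    try:
        row = next(csv.reader([line], strict=True))
    except csv.Error:
        # e.g. a stray quote in the middle of a quoted field
        return None
    if len(row) != expected_fields:
        # e.g. a stray quote followed by a comma, which splits the field in two
        return None
    return row


for sample in ['"one","two","three"', '"one","tw"o","three"', '"one","tw",o","three"']:
    print(sample, "->", parse_or_flag(sample, 3))
```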
One pretty simple regex solution that may work for you is this:
regex: (?<=\w)"(?=\w) //global flag
replace: '' //blank string
As long as we can treat "bad" double quotes as those that have a word character on both sides, this will work. It is just a lookbehind for a word character, a double quote, and a lookahead for a word character (note that \w matches letters, digits, and the underscore). It would not match a double quote escaped with a backslash or with another double quote, so "" or \" would be okay.
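Applied with Python's re module, the pattern looks like this. It is only a sketch of the regex above, with the same limitation: it removes a quote only when a word character sits on both sides. The function name strip_inner_quotes is invented for the example.

```python
import re

# A double quote with a word character immediately before and after it.
STRAY_QUOTE = re.compile(r'(?<=\w)"(?=\w)')


def strip_inner_quotes(line):
    """Remove quotes that sit between two word characters."""
    return STRAY_QUOTE.sub("", line)


print(strip_inner_quotes('"one","tw"o","three"'))
```

A line such as "one","tw",o","three" is left unchanged, because there the stray quote is followed by a comma rather than a word character.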
Looks like you can't predict what kind of values with unescaped quotes you might get. There's no way to clean this up reliably with regex.
Maybe try univocity-parsers as it has a CSV parser that can handle this sort of input properly. Example:
//first configure the parser
CsvParserSettings settings = new CsvParserSettings();
//override the default unescape quote handling. This seems more appropriate for your case.
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);
//then create a parser and parse your input line:
CsvParser parser = new CsvParser(settings);
List<String[]> results = parser.parseAll(<your input here>);
Hope it helps.
Disclaimer: I'm the author of this library. It's open-source and free (Apache v2.0 license)
I'm using the CSVPrinter class from the Apache Commons in order to output a CSV file. What I would like to have happen is that if a given field contains any spaces in it, that it gets encapsulated in quotes. But if it's just a long string of numbers, or a date string, for example, then those do not need to be quoted.
Unfortunately the QuoteMode enum seems pretty limited; it offers the four following choices:
ALL Quotes all fields.
MINIMAL Quotes fields which contain special characters such as a delimiter, quote character or any of the characters in line separator.
NON_NUMERIC Quotes all non-numeric fields.
NONE Never quotes fields.
The MINIMAL option seems to be the closest to what I want to do here, but since the space character is not part of a line separator, that doesn't work. Is there any way to configure a CSVPrinter object to quote fields that have spaces in them?
CSVPrinter is not that flexible. It is also final, so you can't override the printing implementation.
I suggest you find a different csv library or look at the source code of CSVPrinter and implement your own version with your own requirements. It is definitely not a requirement of the CSV format to quote strings with whitespace though. Any implementation that complies with the format should be able to read strings with whitespace in them (and not quoted).
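If you do write your own formatter, the rule from the question is small. The sketch below is in Python and only illustrates the quoting rule; it is not CSVPrinter and does not use the Commons CSV API. The function names and the comma delimiter are assumptions made for the example.

```python
def quote_if_needed(field, delimiter=","):
    """Quote a field if it contains whitespace, the delimiter, or a quote."""
    needs_quotes = any(ch.isspace() or ch in (delimiter, '"') for ch in field)
    if needs_quotes:
        # Double embedded quotes so the result stays valid CSV.
        return '"' + field.replace('"', '""') + '"'
    return field


def format_row(fields, delimiter=","):
    """Format one record using the whitespace-aware quoting rule."""
    return delimiter.join(quote_if_needed(f, delimiter) for f in fields)


print(format_row(["hello world", "12345", "2020-01-01"]))
```

Because str.isspace() is also true for line breaks, fields with embedded newlines get quoted as well.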
I am using opencsv to read a CSV file. Fields are separated by commas. But if a field is wrapped in quotes, a comma inside the quotes is not a delimiter. For example, "Hello, World".
The current opencsv cannot deal with that. How can I address this problem?
update
I found that the comma is not the problem (so far). The problem row is: ...,"a children""s heart\",.... The parser seems to remove the quote, so the field that is read becomes a children"s heart",......, where ...... represents all the following data.
So it seems to be a problem with messy input data rather than with opencsv.
You can write custom code that searches through your CSV file and replaces every comma inside quotes with a special character that you can identify later and turn back into a comma.
According to the documentation, you can supply custom separator and quote characters in the constructor, which should deal with it:
CSVReader(Reader reader, char separator, char quotechar)
Construct your reader with , as separator and " as quotechar.