I would like to store some Strings in a file and then read them back again. The problem is that the Strings could be anything; for instance, a single field could even be something like "Entry1","Entry2". So if I simply look for commas and split the Strings accordingly, it will definitely fail.
Is there any built-in Java class that handles situations like that? If not, how can I make a simple CSV parser in Java?
You might want to have a look at this specification for CSV. Bear in mind that there is no officially recognized specification. You could also try this parser.
There is also the Apache Commons CSV library, which can help.
If you do not know the delimiter, it will not be possible to do this, so you have to find it out somehow. If the delimiter can vary, your only hope is to deduce it from the formatting of the known data. When Excel imports CSV files, it lets the user choose the delimiter, and this is a solution you could use as well.
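As a rough illustration of deducing the delimiter from known data, here is a minimal heuristic sketch. It assumes the delimiter is one of a small set of candidates and that a sample line is available; the class name, method name, and candidate list are all hypothetical choices, not part of any library.

```java
class DelimiterGuesser {
    // Candidate delimiters are an assumption; adjust to the data you expect.
    private static final char[] CANDIDATES = {',', ';', '\t', '|'};

    /** Returns the candidate that occurs most often outside double quotes in the sample line. */
    static char guess(String sampleLine) {
        char best = CANDIDATES[0];
        int bestCount = -1;
        for (char candidate : CANDIDATES) {
            int count = 0;
            boolean inQuotes = false;
            for (int i = 0; i < sampleLine.length(); i++) {
                char c = sampleLine.charAt(i);
                if (c == '"') {
                    inQuotes = !inQuotes; // skip anything between double quotes
                } else if (c == candidate && !inQuotes) {
                    count++;
                }
            }
            // Strict comparison: on a tie the earlier candidate wins
            if (count > bestCount) {
                best = candidate;
                bestCount = count;
            }
        }
        return best;
    }
}
```

A guess like this can be wrong on short or unusual samples, so letting the user confirm or override it (as Excel does) is still the safer design.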
I would recommend openCSV: http://opencsv.sourceforge.net/
I have used it for numerous Java projects requiring CSV support, both reading and writing. A simple example of how it writes a CSV from the docs:
CSVWriter writer = new CSVWriter(new FileWriter("yourfile.csv"), ',');
// feed in your array (or convert your data to an array)
String[] entries = "first,second,third".split(",");
writer.writeNext(entries);
writer.close();
Assuming you can make a String[] out of your data it's that simple.
To deal with commas in your entries, you'd need to quote the entire entry:
"Make,Take,Break","Top,Right,Left,Bottom"
With OpenCSV you can provide a quote character; you just pass it in the constructor:
CSVWriter writer = new CSVWriter(new FileWriter("yourfile.csv"), ',', '"');
That should take care of the needs you listed.
My main problem is that I'm trying to read a CSV delimited by ; in Java and the problem comes when I try to read a field of the CSV that contains a ;. For example:
"I want you to do that;"
In this case the field is recognized like
"I want you to do that"
And it creates another field that is just an empty string.
I use a BufferedReader to read the CSV and the split method to separate it with the ;. I'm not allowed to use libraries like OpenCSV so I want to find a solution with the method I'm using.
Parse according to the quotation marks
If the data incidentally containing the delimiter is wrapped in double quotes (QUOTATION MARK), then you should have no problem with parsing. Your parsing should look first for pairs of double quote characters. After that, look for delimiters outside of those pairs.
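For the case where a library is not allowed, the idea above can be sketched as a small hand-written splitter that replaces String.split for one line. This is only a sketch under assumptions: fields containing the delimiter are wrapped in double quotes, a doubled quote inside a quoted field stands for a literal quote, and no field spans several lines. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedLineSplitter {
    /** Splits one line on the delimiter, ignoring delimiters inside double quotes. */
    static List<String> split(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote, not part of the value
                }
            } else if (c == delimiter && !inQuotes) {
                fields.add(current.toString()); // delimiter outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field has no trailing delimiter
        return fields;
    }
}
```

Calling QuotedLineSplitter.split(line, ';') on each line returned by BufferedReader.readLine() would keep "I want you to do that;" as a single field.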
Rather than writing the parsing code yourself, I highly recommend using a CSV library. In the Java ecosystem, you have a wealth of good products to choose. For example, I have made successful use of Apache Commons CSV.
See also the specification for CSV: RFC 4180.
If the task is to create a csv file out of some data where commas may be present, is there a way to do it without later confusing which comma is a delimiter and which comma is part of a value?
Obviously, we can use a different delimiter, replace all occurrences, or replace the original comma with something else, but for the purpose of this question let's say that modifying the original data is not an option and a comma is the only delimiter allowed.
How would you approach something like this? Would it be easier to create the xls instead? Can you recommend any java libraries that handle this well?
A true CSV reader should be able to handle this; the values should be in quotes, e.g.:
one,two,"a, b, c",four
...per item #6 in Section 2 of the RFC.
While there's no single CSV standard, the usual convention is to surround entries containing commas in double quotes (i.e. ").
Preempting the next question: what do you do if your data contains a double quote? In that case, each double quote is usually replaced by a pair of double quotes.
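To make the two conventions concrete, here is a minimal sketch of the writing side. It assumes a comma delimiter and quotes a field only when it contains a comma, a double quote, or a line break; the class and method names are made up for illustration.

```java
class CsvEscaper {
    /** Quotes a field if it contains a comma, double quote, or line break; doubles embedded quotes. */
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins already separate field values into one CSV line. */
    static String toLine(String... fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(escape(fields[i]));
        }
        return line.toString();
    }
}
```

With this, CsvEscaper.toLine("one", "two", "a, b, c", "four") produces one,two,"a, b, c",four, so the original data never has to be modified.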
While I hate to cite Wikipedia as a source, it does have a pretty good roundup of basic rules and examples for CSV formatting.
I would either use a different delimiter or use a library like Apache POI.
I think the best way is to use Apache POI: http://poi.apache.org/
You can easily create XLS documents without much hassle.
However, if you really need CSV and not XLS, you can surround the value with quotes. This should also solve the problem.
Usually, you work with , as the separator and " as the quote character. So your values would look like:
foo,"bar, baz",iik,aje
the task is to create a csv file
Actually an impossible task, since there is no such thing as "a CSV" file. Different Microsoft products have used different (subtly different, I grant) formats and named them all "CSV". As most spreadsheets can read delimiter-separated values (DSV) files, you might be better off writing one of those.
I am trying to read a simple text file into a String. Of course, there is the usual way of getting the input stream, iterating with readLine(), and reading the contents into a String.
Having done this hundreds of times in the past, I just wondered how I can do this in the minimum number of lines of code. Isn't there something in Java like String fileContents = XXX.readFile(myFile/*File*/), or rather anything that looks as simple as this?
I know there are libraries like Apache Commons IO which provide such simplifications, and I could even write a simple Util class to do this. But what I wonder is: this is such a frequent operation that everyone needs, so why doesn't Java provide such a simple function? Isn't there really a single method somewhere to read a file into a String with some default or specified encoding?
Yes, you can do this in one line (though for robust IOException handling you wouldn't want to).
String content = new Scanner(new File("filename")).useDelimiter("\\Z").next();
System.out.println(content);
This uses a java.util.Scanner, telling it to delimit the input with \Z, which is the end of the string anchor. This ultimately makes the input have one actual token, which is the entire file, so it can be read with one call to next().
There is a constructor that takes a File and a String charSetName (among many other overloads). These two constructors may throw FileNotFoundException, but like all Scanner methods, no IOException can be thrown beyond these constructors.
You can query the Scanner itself through the ioException() method if an IOException occurred or not. You may also want to explicitly close() the Scanner after you read the content, so perhaps storing the Scanner reference in a local variable is best.
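Putting those points together, a slightly more careful version of the one-liner might look like the sketch below. The class and method names are hypothetical, and the empty-file handling is my own addition: with no token to return, next() would otherwise throw NoSuchElementException.

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

class ScannerFileReader {
    /** Reads the whole file as one token; charsetName is passed to the Scanner constructor. */
    static String read(File file, String charsetName) throws IOException {
        // try-with-resources closes the Scanner (and the underlying file) for us
        try (Scanner scanner = new Scanner(file, charsetName)) {
            scanner.useDelimiter("\\Z");
            // An empty file has no token, so fall back to the empty string
            String content = scanner.hasNext() ? scanner.next() : "";
            // Scanner swallows IOExceptions while reading; surface the last one, if any
            IOException failure = scanner.ioException();
            if (failure != null) {
                throw failure;
            }
            return content;
        }
    }
}
```

Note that \Z matches before a final line terminator, so a trailing newline at the end of the file is not part of the returned String.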
See also
Java Tutorials - I/O Essentials - Scanning and formatting
Related questions
Validating input using java.util.Scanner - has many examples of more typical usage
Third-party library options
For completeness, these are some really good options if you have these very reputable and highly useful third party libraries:
Guava
com.google.common.io.Files contains many useful methods. The pertinent ones here are:
String toString(File, Charset)
Using the given character set, reads all characters from a file into a String
List<String> readLines(File, Charset)
... reads all of the lines from a file into a List<String>, one entry per line
Apache Commons/IO
org.apache.commons.io.IOUtils also offers similar functionality:
String toString(InputStream, String encoding)
Using the specified character encoding, gets the contents of an InputStream as a String
List readLines(InputStream, String encoding)
... as a (raw) List of String, one entry per line
Related questions
Most useful free third party Java libraries (deleted)?
From Java 7 (API Description) onwards you can do:
new String(Files.readAllBytes(Paths.get(filePath)), StandardCharsets.UTF_8);
Where filePath is a String representing the file you want to load.
You can use Apache Commons IO:
// try-with-resources closes the stream once the content has been read
try (FileInputStream fisTargetFile = new FileInputStream(new File("test.txt"))) {
    String targetFileStr = IOUtils.toString(fisTargetFile, "UTF-8");
}
This should work for you:
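import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// The class name is arbitrary; any class with a main method will do
public class ReadFile {
    public static void main(String[] args) throws IOException {
        // new String(byte[]) decodes with the platform default charset
        String content = new String(Files.readAllBytes(Paths.get("abc.java")));
    }
}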
Using Apache Commons IO.
import org.apache.commons.io.FileUtils;
//...
String contents = FileUtils.readFileToString(new File("/path/to/the/file"), "UTF-8");
You can see the javadoc for the method for details.
Don't write your own util class to do this - I would recommend using Guava, which is full of all kinds of goodness. In this case you'd want either the Files class (if you're really just reading a file) or CharStreams for more general purpose reading. It has methods to read the data into a list of strings (readLines) or totally (toString).
It has similar useful methods for binary data too. And then there's the rest of the library...
I agree it's annoying that there's nothing similar in the standard libraries. Heck, just being able to supply a Charset to a FileReader would make life a little simpler...
Another alternative approach is:
How do I create a Java string from the contents of a file?
Another option is to use utilities provided by open source libraries:
http://commons.apache.org/io/api-1.4/index.html?org/apache/commons/io/IOUtils.html
Why doesn't Java provide such a common utility API?
a) to keep the APIs generic so that encoding, buffering etc is handled by the programmer.
b) make programmers do some work and write/share opensource util libraries :D ;-)
Sadly, no.
I agree that such a frequent operation should have an easier implementation than copying the input line by line in a loop, but you'll have to either write a helper method or use an external library.
I discovered that the accepted answer actually doesn't always work, because \\Z may occur in the file. Another problem is that if you don't have the correct charset a whole bunch of unexpected things may happen which may cause the scanner to read only a part of the file.
The solution is to use a delimiter which you are certain will never occur in the file. However, this is theoretically impossible. What we CAN do, is use a delimiter that has such a small chance to occur in the file that it is negligible: such a delimiter is a UUID, which is natively supported in Java.
String content = new Scanner(file, "UTF-8")
.useDelimiter(UUID.randomUUID().toString()).next();
I'm trying to parse a csv file into a 2d array, where each row is a data entry and each column is a field in that entry.
Doing this all at once simplifies and separates my processing code from my parsing code.
I tried to write a simple parser that used String.split to separate the file by commas. This is a horrible approach, as I have discovered. It completely fails to parse any special cases like double quotes, line feeds, and other special chars.
What is the proper way to parse a CSV file into a 2d array as I have described?
Code samples in Java would be appreciated.
The array can be a dynamic list object or vector or something like that, it just has to be indexable with two indexers.
Have a look at Commons CSV?
CSVParser parser = new CSVParser(new FileReader(file));
String[] line;
while ((line = parser.getLine()) != null) {
// process
}
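If you would rather see how the special cases are handled, or cannot add a dependency, the following is a minimal hand-rolled sketch that parses the whole text into a List of rows, indexable with two indexes. It assumes a comma delimiter, double quotes around fields that contain commas or line feeds, and doubled quotes for literal quotes. The class name is hypothetical and this is not how Commons CSV is implemented.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleCsvTable {
    /** Parses CSV text into rows of fields; rows.get(r).get(c) indexes like a 2D array. */
    static List<List<String>> parse(String text) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    field.append('"'); // doubled quote is a literal quote
                    i++;
                } else if (c == '"') {
                    inQuotes = false; // closing quote
                } else {
                    field.append(c); // commas and line feeds are kept inside quotes
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                row.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {
                row.add(field.toString());
                field.setLength(0);
                rows.add(row);
                row = new ArrayList<>();
            } else if (c != '\r') {
                field.append(c); // carriage returns outside quotes are dropped
            }
        }
        // Flush the last row when the text does not end with a line break
        if (field.length() > 0 || !row.isEmpty()) {
            row.add(field.toString());
            rows.add(row);
        }
        return rows;
    }
}
```

The file contents can be read into a String first and passed to parse; a value is then available as rows.get(rowIndex).get(columnIndex).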
If your file has fields with double-quoted entries that contain separators and fields with line feeds, then I doubt that it is a real csv file... a proper csv file is something like this
1;John;Doe;engineer,manager
2;Bart;Foo;engineer,dilbert
while this is "something else":
1;John;Doe;"engineer;manager"
2;Bart;Foo;
"engineer,dilbert"
And the first example is parseable with String.split on each line.
I decided to create a currency converter in Java, and have it so that it would pull the conversion values out of a text file (to allow for easy editability since these values are constantly changing). I did manage to do it by using the Scanner class and putting all the values into an ArrayList.
Now I'm wondering if there is a way to add comments to the text file for the user to read, which Scanner will ignore. "//" doesn't seem to work.
Thanks
The best way would be to read the file line by line using java.io.BufferedReader and check every line for comments using String#startsWith(), where you search for "//".
But have you considered using a properties file and manage it using the java.util.Properties API? This way you can benefit from a ready-made specification and API's and you can use # as start of comment line. Also see the tutorial at sun.com.
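A minimal sketch of the properties approach could look like this. The currency codes and rates are hypothetical values, and the text is loaded from a String here only to keep the example self-contained; in practice you would pass a FileReader for your rates file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class RatesFromProperties {
    /** Loads key=value pairs; lines starting with # are treated as comments by Properties. */
    static Properties load(String text) throws IOException {
        Properties rates = new Properties();
        rates.load(new StringReader(text));
        return rates;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file contents with a comment line for the user
        String text = "# Rates against USD (hypothetical values)\n"
                + "EUR=0.5\n"
                + "GBP=0.25\n";
        Properties rates = load(text);
        double eur = Double.parseDouble(rates.getProperty("EUR"));
        System.out.println(eur);
    }
}
```

Values come back as Strings, so they still need to be converted with Double.parseDouble or similar.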
Scanner won't ignore anything; you will have to remove the comments from your data after you have read it in.
Yea, while ((currentLine = bufferedReader.readLine()) != null) is possibly the easiest, then perform your necessary tests. currentLine.split(regex) is also very handy for converting a line into an array of values using a delimiter.
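Spelled out, that loop might look like the sketch below. It assumes comment lines start with "//" and that the values on a line are separated by whitespace; both the class name and the file layout are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class CommentSkippingReader {
    /** Returns one String[] per data line, skipping blank lines and lines starting with "//". */
    static List<String[]> readValues(Reader source) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader bufferedReader = new BufferedReader(source)) {
            String currentLine;
            while ((currentLine = bufferedReader.readLine()) != null) {
                String trimmed = currentLine.trim();
                if (trimmed.isEmpty() || trimmed.startsWith("//")) {
                    continue; // comment or blank line: nothing to parse
                }
                rows.add(trimmed.split("\\s+")); // split on runs of whitespace
            }
        }
        return rows;
    }
}
```

Passing new FileReader("data.txt") as the source would read from a file; any other Reader works the same way.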
With Java NIO, you could do something like this, assuming you want to ignore lines that start with "//" and end up with an ArrayList.
List<String> dataList;
Path path = FileSystems.getDefault().getPath(".", "data.txt");
dataList = Files.lines(path)
.filter(line -> !(line.startsWith("//")))
.collect(Collectors.toCollection(ArrayList::new));