What is the simplest way to read a file into a String? [duplicate] - java

I am trying to read a simple text file into a String. Of course, there is the usual way: get the input stream, iterate with readLine(), and append the contents to a String.
Having done this hundreds of times in the past, I wondered how I can do it in the fewest lines of code. Isn't there something in Java like String fileContents = XXX.readFile(myFile/*File*/), or anything that looks as simple as this?
I know there are libraries like Apache Commons IO that provide such simplifications, and I could even write a simple util class to do this. But what I wonder is: this is such a frequent operation that everyone needs it, so why doesn't Java provide such a simple function? Isn't there really a single method somewhere to read a file into a string with a default or specified encoding?

Yes, you can do this in one line (though for robust IOException handling you wouldn't want to).
String content = new Scanner(new File("filename")).useDelimiter("\\Z").next();
System.out.println(content);
This uses a java.util.Scanner, telling it to delimit the input with \Z, which is the end of the string anchor. This ultimately makes the input have one actual token, which is the entire file, so it can be read with one call to next().
There is a constructor that takes a File and a String charsetName (among many other overloads). These two constructors may throw FileNotFoundException, but like all Scanner methods, no IOException can be thrown beyond these constructors.
You can query the Scanner itself through the ioException() method if an IOException occurred or not. You may also want to explicitly close() the Scanner after you read the content, so perhaps storing the Scanner reference in a local variable is best.
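The points above (explicit charset, closing the Scanner, checking ioException()) can be combined in a small helper. This is only a sketch: the class name ScannerReadSketch and the method readAll are made up for illustration, and returning an empty string for an empty file is an assumption about what the caller wants.

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

// Hypothetical helper, not part of any library.
public final class ScannerReadSketch {

    // Reads the whole file as a single token by delimiting on \Z (end of input).
    static String readAll(File file, String charsetName) throws IOException {
        // try-with-resources closes the Scanner and the file it opened.
        try (Scanner scanner = new Scanner(file, charsetName)) {
            scanner.useDelimiter("\\Z");
            // next() would throw NoSuchElementException on an empty file.
            String content = scanner.hasNext() ? scanner.next() : "";
            // Scanner swallows read errors; rethrow the last one, if any.
            IOException swallowed = scanner.ioException();
            if (swallowed != null) {
                throw swallowed;
            }
            return content;
        }
    }
}
```

Note that \Z matches before a final line terminator, so a trailing newline at the end of the file is not part of the returned string.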
See also
Java Tutorials - I/O Essentials - Scanning and formatting
Related questions
Validating input using java.util.Scanner - has many examples of more typical usage
Third-party library options
For completeness, these are some really good options if you have these very reputable and highly useful third party libraries:
Guava
com.google.common.io.Files contains many useful methods. The pertinent ones here are:
String toString(File, Charset)
Using the given character set, reads all characters from a file into a String
List<String> readLines(File, Charset)
... reads all of the lines from a file into a List<String>, one entry per line
Apache Commons/IO
org.apache.commons.io.IOUtils also offers similar functionality:
String toString(InputStream, String encoding)
Using the specified character encoding, gets the contents of an InputStream as a String
List readLines(InputStream, String encoding)
... as a (raw) List of String, one entry per line
Related questions
Most useful free third party Java libraries (deleted)?

From Java 7 (API Description) onwards you can do:
new String(Files.readAllBytes(Paths.get(filePath)), StandardCharsets.UTF_8);
Where filePath is a String representing the file you want to load.
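As a self-contained sketch of the same call, wrapped in a method so the charset is a parameter (the class name ReadAllBytesSketch and the method readFile are illustrative, not from any API). On Java 11 and later, Files.readString(Path) should cover the same case in a single call.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical wrapper around the Java 7 one-liner.
public final class ReadAllBytesSketch {

    // Reads every byte of the file, then decodes with the given charset.
    static String readFile(String filePath, Charset charset) throws IOException {
        Path path = Paths.get(filePath);
        return new String(Files.readAllBytes(path), charset);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ReadAllBytesSketch <file>
        System.out.println(readFile(args[0], StandardCharsets.UTF_8));
    }
}
```

Because the whole file is held in memory as a byte array and then again as a String, this is best suited to files that are known to be small.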

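You can use Apache Commons IO:
// Requires org.apache.commons.io.IOUtils on the classpath; try-with-resources closes the stream.
try (FileInputStream fisTargetFile = new FileInputStream(new File("test.txt"))) {
    String targetFileStr = IOUtils.toString(fisTargetFile, "UTF-8");
}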

This should work for you:
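import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadFile {
    public static void main(String[] args) throws IOException {
        // Decode explicitly as UTF-8 instead of relying on the platform default charset.
        String content = new String(Files.readAllBytes(Paths.get("abc.java")), StandardCharsets.UTF_8);
    }
}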

Using Apache Commons IO.
import java.io.File;
import org.apache.commons.io.FileUtils;
//...
String contents = FileUtils.readFileToString(new File("/path/to/the/file"), "UTF-8");
See the Javadoc for the method for details.

Don't write your own util class to do this - I would recommend using Guava, which is full of all kinds of goodness. In this case you'd want either the Files class (if you're really just reading a file) or CharStreams for more general purpose reading. It has methods to read the data into a list of strings (readLines) or into a single string (toString).
It has similar useful methods for binary data too. And then there's the rest of the library...
I agree it's annoying that there's nothing similar in the standard libraries. Heck, just being able to supply a Charset to a FileReader would make life a little simpler...

Another approach is described in:
How do I create a Java string from the contents of a file?
Another option is to use the utilities provided by open source libraries:
http://commons.apache.org/io/api-1.4/index.html?org/apache/commons/io/IOUtils.html
Why doesn't Java provide such a common utility API?
a) To keep the APIs generic, so that encoding, buffering, etc. are handled by the programmer.
b) To make programmers do some work and write/share open source utility libraries :D ;-)

Sadly, no.
I agree that such a frequent operation should have an easier implementation than copying the input line by line in a loop, but you'll have to either write a helper method or use an external library.
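A helper method of the kind mentioned above might look like the following minimal sketch. The names FileTextUtil and readFile are invented for illustration, and normalizing every line ending to '\n' is an assumption that may not suit every caller.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical utility class for the "write a helper method" route.
public final class FileTextUtil {

    // Copies the file line by line into a StringBuilder.
    static String readFile(File file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() strips the terminator, so every line gets '\n' back,
                // including a last line that had no terminator in the file.
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```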

I discovered that the accepted answer actually doesn't always work, because \\Z may occur in the file. Another problem is that if you don't have the correct charset a whole bunch of unexpected things may happen which may cause the scanner to read only a part of the file.
The solution is to use a delimiter which you are certain will never occur in the file. However, this is theoretically impossible. What we can do is use a delimiter that is so unlikely to occur in the file that the risk is negligible: such a delimiter is a UUID, which is natively supported in Java.
String content = new Scanner(file, "UTF-8")
.useDelimiter(UUID.randomUUID().toString()).next();

Related

URI.resolve() does not support the full spectrum of allowed file name characters

I used to use a URI element for representing the base folder and use URI.resolve(filename) to get the URI of the actual file I would like to write to disk.
Now I have found that, for apparent reasons, the resolve method does not support many characters that the OS allows in file names, and those have to be percent-encoded (%HEX).
I was not aware of that limitation, or of how far the encoding really goes. Encoding is often used in parameter values, but I can barely think of a situation where I have seen it in the path.
So is it safe to assume that using URI.resolve(URLEncoder.encode(filename)) does the trick? Are there any better alternatives or edge cases I should know about?
It's actually URI.create(en) which fails; for example, using "!##$%^&()" (which is a valid, if very strange, filename) throws IllegalArgumentException: Malformed escape pair at index 4
As for URLEncoder.encode(filename) - It is deprecated and URLEncoder.encode(filename, encoding) should be used instead.
From my experience, filename URI resolution is best handled by new File(f).toURI() as for a given abstract pathname f, it is guaranteed that:
new File(f.toURI()).equals( f.getAbsoluteFile())
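As a rough illustration of that guarantee, the sketch below resolves a child name against a base folder through File instead of URI.resolve. The class name FileUriSketch, the method resolveChild and the folder name "base" are placeholders; the odd filename is the one from the question.

```java
import java.io.File;
import java.net.URI;

// Hypothetical example; nothing here touches the file system.
public final class FileUriSketch {

    // Lets File do the escaping instead of encoding the name by hand.
    static URI resolveChild(File baseDir, String filename) {
        return new File(baseDir, filename).toURI();
    }

    public static void main(String[] args) {
        File base = new File("base");
        String oddName = "!##$%^&()";
        URI uri = resolveChild(base, oddName);
        // Illegal URI characters such as '#' and '%' are percent-encoded here.
        System.out.println(uri);
        // Converting back yields the absolute form of the original file.
        File roundTrip = new File(uri);
        System.out.println(roundTrip.equals(new File(base, oddName).getAbsoluteFile()));
    }
}
```

On Java 7 and later, Paths.get(...).toUri() is a comparable route when the code already works with java.nio.file.Path.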

Java Buffered Reader detecting patterns in phrases

I want my program to be able to read in a file of java code and be able to identify the different methods. Is this possible to do with a buffered reader or should I be doing something different? Since methods can return any type (String/void/int/etc) and can be of many different types of modifier (private/public etc) I don't see how I can identify them easily.
public returnType methodName(String s){
How can I get my program to read that in and automatically detect that it is of the same format as:
private Set<String> nextstates(int newInt)
You can use regular expressions to search the file for method definitions. You would just read in the file line by line using a BufferedReader for example and search in every line for matches with the regex. One possible regex is the one suggested in the following post by Georgios Gousios
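The line-by-line approach described above could be sketched as follows. The regex here is a deliberately naive one written for this example (it is not the one from the referenced post), and the class and method names are made up. It assumes a whole signature sits on one line and ignores annotations, comments and string literals, so a real parser would be more reliable.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the pattern is an illustrative assumption.
public final class MethodFinderSketch {

    private static final Pattern METHOD_SIGNATURE = Pattern.compile(
            "^\\s*"
            // Optional modifiers; possessive so a modifier is never reused as a type.
            + "(?:(?:public|protected|private|static|final|abstract|synchronized)\\s+)*+"
            // Group 1: return type, with optional generics and array brackets.
            + "(?!(?:new|return|else|throw)\\b)"
            + "([\\w.]+(?:<[^()]*>)?(?:\\[\\])*)"
            + "\\s+"
            // Group 2: method name, excluding control-flow keywords.
            + "(?!(?:if|for|while|switch|catch)\\b)(\\w+)"
            // Parameter list, optional throws clause, optional opening brace.
            + "\\s*\\([^()]*\\)"
            + "\\s*(?:throws\\s+[\\w.,\\s]+?)?"
            + "\\s*\\{?\\s*$");

    // Returns "returnType methodName" for every line that looks like a method.
    static List<String> findMethods(BufferedReader reader) throws IOException {
        List<String> found = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            Matcher m = METHOD_SIGNATURE.matcher(line);
            if (m.find()) {
                found.add(m.group(1) + " " + m.group(2));
            }
        }
        return found;
    }
}
```

Wrapping a FileReader in a BufferedReader and passing it to findMethods is enough to scan a source file; constructors are skipped because they have no return type in front of the name.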

Use File or FileReader with Scanner?

Disclaimer: I've looked through all the questions I can find and none of them answers this exact question. If you find one please point me to it and be polite.
So, the Oracle I/O tutorial opens a text file with Scanner as follows:
new Scanner(new BufferedReader(new FileReader("xanadu.txt")));
But the Javadoc opens a text file with Scanner like this:
new Scanner(new File("myNumbers"));
It would be nice to use the simpler method, especially when I have a small file and can live with the smaller buffer, but I've also seen people say that when you open a File directly you can't close it. If that's the case, why is that idiom used in the official documentation?
Edit: I've also seen new Scanner(new FileReader("blah.txt")); but this seems like the worst of both worlds.
Edit: I'm not trying to start a debate about whether to use Scanner or not. I have a question about how to use Scanner. Thank you.
You could look at the implementation of Scanner (the JDK ships with its source code). There is a close() method in the Scanner class as well. Essentially, both approaches you listed are identical for your use case of reading a small file - just don't forget to call close() at the end.
The File class has no close() method because it only abstracts a disk file. It is not an input stream to the file, so there is nothing to close.
Yes you can do that.
Basically you do:
Scanner file = new Scanner(new FileReader("file.txt"));
To read a String:
String s = file.next();
When you are done with the file, do
file.close();
Horses for courses. From the Scanner javadocs, a Scanner is "A simple text scanner which can parse primitive types and strings using regular expressions." So, my take on your question is: it does not matter which approach you use, the simpler option with File is just as good as the one found in Oracle tutorials. Scanner is for convenient tokenizing of text files, and if your file is small, as you said, then it's a perfect fit.
Because a Scanner uses regular expressions, you can't really expect huge performance with it, whether you create a buffered file reader for the scanner or not. The underlying Readable will be close()d (if it's a Closeable, which it will be, if you use the Scanner(File) constructor), and so you don't have to worry as long as you close() your Scanner object (or use try-with-resources).
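To make the comparison concrete, here is a small sketch that opens the same file both ways and closes each Scanner with try-with-resources. The class and method names are invented for the example.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical comparison of the two idioms from the question.
public final class ScannerOpenSketch {

    // Javadoc style: Scanner opens the file itself and closes it on close().
    static List<String> tokensFromFile(File file) throws IOException {
        try (Scanner s = new Scanner(file)) {
            return drain(s);
        }
    }

    // Tutorial style: closing the Scanner also closes the wrapped readers.
    static List<String> tokensFromReader(File file) throws IOException {
        try (Scanner s = new Scanner(new BufferedReader(new FileReader(file)))) {
            return drain(s);
        }
    }

    // Collects whitespace-separated tokens until the input is exhausted.
    private static List<String> drain(Scanner s) {
        List<String> tokens = new ArrayList<>();
        while (s.hasNext()) {
            tokens.add(s.next());
        }
        return tokens;
    }
}
```

Both methods should return the same tokens for the same file; the difference is only in who creates the underlying stream.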
There are multiple ways to construct a Scanner object.
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Scanner.html
I personally wouldn't even use Scanner for file reading though. Look at BufferedReader tutorials. It's not too hard to figure out.

Is there a tool to take a block of text and turn it into Java StringBuffer code?

I have several blocks of text that I need to be able to paste inline in my code for some unit tests. It would make the code difficult to read if they were externalized, so is there some web tool where I can paste in my text and it will generate the code for a StringBuffer that preserves its formatting? Or even a String, I'm not that picky at this point.
It seems like a code generator like this must exist somewhere on the web. I tried to Google one, but I have yet to come up with a set of search terms that don't fill my results with Java examples and documentation.
I suppose I could write one myself, but I'm in a bit of a time crunch and would rather not duplicate effort.
If I understood it correctly, any text editor which supports regexps should make it an easy task. For instance Notepad++ - just replace ^(.+)$ with "\1"+, then copy the result to the code, remove the last + and add String s = to the beginning :)
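If a text editor is not at hand, the same transformation is short enough to sketch in Java. The class LiteralGeneratorSketch, the method toJavaLiteral and the variable name s in the generated code are all assumptions made for this example; unlike the plain regex replace, it also escapes backslashes and double quotes.

```java
// Hypothetical generator: turns a block of text into Java source for a String.
public final class LiteralGeneratorSketch {

    static String toJavaLiteral(String text) {
        // Normalize Windows line endings, then keep empty lines too (limit -1).
        String[] lines = text.replace("\r\n", "\n").split("\n", -1);
        StringBuilder out = new StringBuilder("String s =\n");
        for (int i = 0; i < lines.length; i++) {
            // Escape backslashes first, then double quotes.
            String escaped = lines[i].replace("\\", "\\\\").replace("\"", "\\\"");
            boolean last = (i == lines.length - 1);
            // Every line but the last ends with an escaped newline and a '+'.
            out.append("    \"").append(escaped)
               .append(last ? "\";" : "\\n\" +")
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(toJavaLiteral("SELECT *\nFROM \"users\""));
    }
}
```

On Java 15 and later, text blocks (triple-quoted string literals) let multi-line text be pasted into source directly, which may remove the need for a generator.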
If you want to externalize the text, use a properties file or something similar to read it.
If you are looking for a simple tool to break your text into concatenated string literals, most modern IDEs will do it for you automatically. Here's how:
Copy the block of text into the IDE.
Surround it in double quotes and assign it to a String variable. (This step may not be required.)
Press Enter wherever you want to wrap the text onto the next line, and the IDE will automatically split the literal, close and reopen the double quotes, and join the pieces with +.
The compiler folds a concatenation of string literals such as "addas" + "addasfdas" into a single String constant at compile time, so no StringBuffer is involved at run time.
The SQuirreL SQL client has a function called convert to string buffer; it works nicely.

Unicode aware CSV parser in Java

I'm looking for Java implementation of CSV (comma separated values) parser with proper handling of Unicode data, e.g. UTF-8 CSV files with Chinese text. I suppose such a parser should internally use code point related methods while iterating, comparing etc.
Apache 2 license or similar would work the best.
I don't believe in reinventing the wheel. So I do not want to write my own parser and go through the same headaches someone else did.
I personally like the CSV Parser from Ostermiller. They also have a Maven Repository if interested.
You can also check out OpenCSV. There is a Stack Overflow question already about parsing unicode.
Have you tried Commons CSV?
It's pretty easy to write yourself. Open the file with a FileInputStream and an InputStreamReader that uses UTF-8. Wrap it in a BufferedReader so that you can iterate through it using readLine(), getting each line as a String. Use regular expressions to split it into fields.
The only tricky part is constructing the regexes so they don't treat commas that are enclosed within quotes as field delimiters.
The approach above is a bit inefficient, but fast enough for most apps. If you have real performance requirements then you'll need something that iterates through characters. I wrote one a few years ago that uses a state machine that worked ok.
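A minimal sketch of the regex-based approach described above follows. The class name SimpleCsvSketch and its methods are illustrative, and the sketch assumes that quoted fields never contain line breaks and that a doubled quote inside a quoted field stands for one literal quote.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of a line-oriented CSV reader.
public final class SimpleCsvSketch {

    // A comma is a separator only if an even number of quotes follows it,
    // which means the comma itself is not inside a quoted field.
    private static final Pattern FIELD_SEPARATOR =
            Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        // Limit -1 keeps trailing empty fields.
        for (String field : FIELD_SEPARATOR.split(line, -1)) {
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                // Strip the surrounding quotes and unescape doubled quotes.
                field = field.substring(1, field.length() - 1).replace("\"\"", "\"");
            }
            fields.add(field);
        }
        return fields;
    }

    // Reads a UTF-8 file one line at a time, as outlined in the answer.
    static List<List<String>> readCsv(String path) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(parseLine(line));
            }
        }
        return rows;
    }
}
```

Splitting on an ASCII comma is safe for text such as Chinese because, once the bytes are decoded into a Java String, no UTF-16 code unit of a non-ASCII character can equal ','; code point methods are not needed for this step.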
