Disclaimer: I've looked through all the questions I can find and none of them answers this exact question. If you find one please point me to it and be polite.
So, the Oracle I/O tutorial opens a text file with Scanner as follows:
new Scanner(new BufferedReader(new FileReader("xanadu.txt")));
But the Javadoc opens a text file with Scanner like this:
new Scanner(new File("myNumbers"));
It would be nice to use the simpler method, especially when I have a small file and can live with the smaller buffer, but I've also seen people say that when you open a File directly you can't close it. If that's the case, why is that idiom used in the official documentation?
Edit: I've also seen new Scanner(new FileReader("blah.txt")); but this seems like the worst of both worlds.
Edit: I'm not trying to start a debate about whether to use Scanner or not. I have a question about how to use Scanner. Thank you.
You could look at the implementation of Scanner (the JDK ships with its source code). The Scanner class has a close() method as well. For your use case of reading a small file, the two approaches you listed are essentially identical; just don't forget to call close() at the end.
The File class has no close() method because it only abstracts a disk file. It is not an input stream to the file, so there is nothing to close.
Yes you can do that.
Basically you do:
Scanner file = new Scanner(new FileReader("file.txt"));
To read a String:
String s = file.next();
When you are done with the file, do
file.close();
Horses for courses. From the Scanner javadocs, a Scanner is "A simple text scanner which can parse primitive types and strings using regular expressions." So, my take on your question is: it does not matter which approach you use; the simpler option with File is just as good as the one found in the Oracle tutorials. Scanner is for convenient tokenizing of text files, and if your file is small, as you said, then it's a perfect fit.
Because a Scanner uses regular expressions, you can't really expect huge performance with it, whether you create a buffered file reader for the scanner or not. The underlying Readable will be close()d (if it's a Closeable, which it will be, if you use the Scanner(File) constructor), and so you don't have to worry as long as you close() your Scanner object (or use try-with-resources).
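To make the point about close() concrete, here is a minimal sketch. It hands the Scanner a Reader that records whether it was closed, then lets try-with-resources close the Scanner. The names ScannerCloseDemo, TrackingReader and closesUnderlyingReader are made up for this illustration; only Scanner and StringReader come from the JDK.

```java
import java.io.StringReader;
import java.util.Scanner;

public class ScannerCloseDemo {

    /** A StringReader that remembers whether close() was called (hypothetical helper). */
    static class TrackingReader extends StringReader {
        boolean closed = false;

        TrackingReader(String text) {
            super(text);
        }

        @Override
        public void close() {
            closed = true;
            super.close();
        }
    }

    /** Reads every token, then reports whether closing the Scanner closed the reader too. */
    static boolean closesUnderlyingReader(String text) {
        TrackingReader reader = new TrackingReader(text);
        // try-with-resources calls scanner.close() when the block exits
        try (Scanner scanner = new Scanner(reader)) {
            while (scanner.hasNext()) {
                scanner.next();
            }
        }
        return reader.closed;
    }

    public static void main(String[] args) {
        System.out.println(closesUnderlyingReader("In Xanadu did Kubla Khan"));
    }
}
```

The same pattern applies to new Scanner(new File(...)): the Scanner opens the stream itself, and closing the Scanner releases it.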
There are multiple ways to construct a Scanner object.
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Scanner.html
I personally wouldn't even use Scanner for file reading though. Look at BufferedReader tutorials. It's not too hard to figure out.
In Java 11 BufferedReader documentation I have found following sentence:
Programs that use DataInputStreams for textual input can be localized by replacing each DataInputStream with an appropriate BufferedReader.
I cannot find any explanation how it can be done - what would be an appropriate buffered reader in this context?
Yes, it's a really bizarre (I'd say broken) use of the term 'localization' here - it's making an obscure (in that it doesn't link to it) reference to DataInputStream's known-broken readLine method, especially considering that this method's javadoc explicitly refers to BufferedReader. I assume that line was added at the same time as the 'can be used to localize' line was added to BR's javadoc.
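For what it's worth, the replacement that the DataInputStream.readLine javadoc points at is a BufferedReader wrapped around an InputStreamReader. Below is a small sketch of that swap, assuming you know the charset of the input; the class and method names (ReadLineMigration, firstLine) are invented for the example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadLineMigration {

    // Before (deprecated, does not convert bytes to characters properly):
    //     String line = new DataInputStream(in).readLine();
    // After: decode the bytes with an explicit charset, then read lines.
    static String firstLine(InputStream in, Charset charset) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "caf\u00e9" contains a non-ASCII character, encoded here as UTF-8
        byte[] bytes = "caf\u00e9\nsecond line\n".getBytes(StandardCharsets.UTF_8);
        String line = firstLine(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.println(line.length());
    }
}
```

So "appropriate" most plausibly just means "constructed with the charset your text is actually encoded in".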
I know that:
Parsing is the process of turning some kind of data into another kind
of data.
But then I also came across this difference between Scanner and BufferedReader:
BufferedReader is faster than Scanner because BufferedReader does not
need to parse the data.
So my question is how is using Scanner slower than using BufferedReader if I am reading just text file (plain characters) and I am not doing any parsing? Is there any parsing I am not aware of?
Or from following code perspective, how is here Scanner slower because of parsing than using BufferedReader?
//1
BufferedReader bufferedReader = new BufferedReader(new FileReader("xanadu.txt"));
System.out.println(bufferedReader.readLine());
//2
Scanner scanner = new Scanner(new FileReader("xanadu.txt"));
scanner.useDelimiter("\n");
System.out.println(scanner.next());
I don't quite understand how Scanner is slower because of parsing, when I am technically not parsing any data.
Dividing an input stream into lines is a (very limited) form of parsing, but as you say BufferedReader can also do that. The difference, if there is one, will be that BufferedReader can use a highly-optimised procedure to implement a single use case (divide a stream into lines) while Scanner needs to be able to be considerably more flexible (divide a stream into tokens delimited by an arbitrary string or regular expression). Flexibility almost always comes at a price, although you won't know what that cost is without doing some benchmarking. (And it may be very small, since it is conceivable that Scanner has optimised algorithms for particular special cases which it can recognise.)
In short, "because parsing" is not a very good explanation for why one interface is slower than another one. But the more flexibly and precisely you parse an input, the more time it is expected to take.
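Here is a small sketch of that flexibility difference. It makes no claim about timings. BufferedReader only knows how to split on line terminators, while the Scanner version accepts any delimiter regex. The class name LineSplitComparison and both method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LineSplitComparison {

    /** One fixed job: split the text on line terminators. */
    static List<String> linesWithBufferedReader(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** General job: split the text on whatever regular expression the caller supplies. */
    static List<String> tokensWithScanner(String text, String delimiterRegex) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(new StringReader(text))) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "alpha\nbeta\ngamma\n";
        System.out.println(linesWithBufferedReader(text));
        System.out.println(tokensWithScanner(text, "\n"));
        // Same Scanner code, different grammar: comma-separated values
        System.out.println(tokensWithScanner("1, 2,3", "\\s*,\\s*"));
    }
}
```

Every Scanner.next() call goes through the regex matcher, even when the delimiter is a plain "\n". That is the kind of overhead the quoted claim is hinting at.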
I was wondering if I can print out a string with Japanese characters. I stopped a mini-project that was, at first, out of my league. But as my skills and curiosity of high-level languages improved, I stumbled across my old project. But even with breaks from coding, I still wondered if it was possible. This isn't my project by any stretch (in fact, if the example given is non-applicable to programming, I'll feel stupid for the mere attempt.)
public static void main(String[] args) {
// TODO code application logic here
//Example:
System.out.println("Input English String Here... ");
Scanner english = new Scanner(System.in);
String English = english.next();
System.out.println("今、漢字に入ります。 ");
Scanner japanese = new Scanner(System.in);
String Japanese = japanese.next();
System.out.println("Did it work...? ");
System.out.println(English);
System.out.println(Japanese);
}
run:
Input English String Here...
Good
今、漢字に入ります。
いい
Did it work...?
Good
??
I expect to see いい on the last line of output.
The most likely explanation for getting ?? instead of いい is that there is a mismatch between the character encoding that is being delivered by your computer's input system, and the default Java character encoding determined by the JVM.
Assuming that the input is UTF-8 encoded, then a more reliable way to configure the scanner is new Scanner(System.in, "UTF-8").
Also note that it is not necessary to create multiple scanner objects. You can ... and should ... create one and reuse it. It probably will not matter if the input is genuinely interactive, but if there is any possibility that input could be piped to the program, you could find that the first Scanner gobbles up input that should go to the second Scanner.
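A minimal sketch of both suggestions together: one Scanner, created once with an explicit UTF-8 charset, reading both tokens. To keep it self-contained the input is simulated with a ByteArrayInputStream instead of System.in. The names SingleUtf8Scanner and readTokens are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class SingleUtf8Scanner {

    /** Reads up to count whitespace-separated tokens, decoding the bytes as UTF-8. */
    static List<String> readTokens(InputStream in, int count) {
        List<String> tokens = new ArrayList<>();
        // One Scanner for the whole stream. It is deliberately not closed here,
        // because closing it would also close the caller's stream (e.g. System.in).
        Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name());
        for (int i = 0; i < count && scanner.hasNext(); i++) {
            tokens.add(scanner.next());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Simulated piped input: an English word, then two hiragana characters (U+3044 twice)
        byte[] piped = "Good\n\u3044\u3044\n".getBytes(StandardCharsets.UTF_8);
        List<String> tokens = readTokens(new ByteArrayInputStream(piped), 2);
        System.out.println(tokens.size());
    }
}
```

With real console input you would pass System.in instead. Whether the characters then display correctly also depends on the encoding System.out and your terminal use, which this sketch does not address.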
If you are using Eclipse you can change the default character encoding under Run -> Run Configurations -> Common.
Also, it would be better to use Scanner(System.in, StandardCharsets.UTF_8.name()) instead of hard-coding a string value.
Here is a link to another topic about changing the default encoding for NetBeans:
How to change file encoding in NetBeans?
Support for Japanese in fonts is spotty, and different between AWT and Swing components.
Those funny blobs probably mean you are using a font/component combination that doesn't
have Japanese glyphs.
Another possibility is if you've been manipulating the characters of the string,
by passing them through byte arrays or integers, it's easy to accidentally lose
high order bits. There are several deprecated APIs because of this hazard.
I am trying to read a simple text file into a String. Of course there is the usual way of getting the input stream and iterating with readLine() and reading contents into String.
Having done this hundreds of times in past, I just wondered how can I do this in minimum lines of code? Isn't there something in java like String fileContents = XXX.readFile(myFile/*File*/) .. rather anything that looks as simple as this?
I know there are libraries like Apache Commons IO which provide such simplifications, and I could even write a simple util class to do this. But what I wonder is: this is such a frequent operation that everyone needs, so why doesn't Java provide such a simple function? Isn't there really a single method somewhere to read a file into a string with some default or specified encoding?
Yes, you can do this in one line (though for robust IOException handling you wouldn't want to).
String content = new Scanner(new File("filename")).useDelimiter("\\Z").next();
System.out.println(content);
This uses a java.util.Scanner, telling it to delimit the input with \Z, which is the end of the string anchor. This ultimately makes the input have one actual token, which is the entire file, so it can be read with one call to next().
There is a constructor that takes a File and a String charsetName (among many other overloads). These two constructors may throw FileNotFoundException, but like all Scanner methods, no IOException can be thrown beyond these constructors.
You can query the Scanner itself through the ioException() method if an IOException occurred or not. You may also want to explicitly close() the Scanner after you read the content, so perhaps storing the Scanner reference in a local variable is best.
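Putting those last two paragraphs together, a more careful version might look like the sketch below. It keeps the Scanner in a variable, closes it with try-with-resources, and checks ioException() before returning. Note two assumptions: it uses \\A (start of input) as the delimiter rather than \\Z, a common variation of the same trick that keeps a trailing newline, and the names WholeInputScanner, readAll and drain are invented for the example.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Scanner;

public class WholeInputScanner {

    /** Reads everything the scanner can see as one token, then closes it. */
    private static String drain(Scanner scanner) {
        try (Scanner s = scanner) {
            // \A only matches at the very start, so the rest of the input is a single token
            s.useDelimiter("\\A");
            String content = s.hasNext() ? s.next() : "";
            // Scanner swallows IOExceptions; ask whether one happened while reading
            IOException failure = s.ioException();
            if (failure != null) {
                throw new UncheckedIOException(failure);
            }
            return content;
        }
    }

    static String readAll(File file, String charsetName) throws FileNotFoundException {
        return drain(new Scanner(file, charsetName));
    }

    static String readAll(Readable source) {
        return drain(new Scanner(source));
    }

    public static void main(String[] args) {
        String content = readAll(new StringReader("hello\nworld\n"));
        System.out.println(content.length());
    }
}
```

The hasNext() guard matters for empty input, where a bare next() would throw NoSuchElementException.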
See also
Java Tutorials - I/O Essentials - Scanning and formatting
Related questions
Validating input using java.util.Scanner - has many examples of more typical usage
Third-party library options
For completeness, these are some really good options if you have these very reputable and highly useful third party libraries:
Guava
com.google.common.io.Files contains many useful methods. The pertinent ones here are:
String toString(File, Charset)
Using the given character set, reads all characters from a file into a String
List<String> readLines(File, Charset)
... reads all of the lines from a file into a List<String>, one entry per line
Apache Commons/IO
org.apache.commons.io.IOUtils also offers similar functionality:
String toString(InputStream, String encoding)
Using the specified character encoding, gets the contents of an InputStream as a String
List readLines(InputStream, String encoding)
... as a (raw) List of String, one entry per line
Related questions
Most useful free third party Java libraries (deleted)?
From Java 7 (API Description) onwards you can do:
new String(Files.readAllBytes(Paths.get(filePath)), StandardCharsets.UTF_8);
Where filePath is a String representing the file you want to load.
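Wrapped in a helper, that one-liner might look like the sketch below. The temp-file round trip is only there to make the example runnable on its own, and the names ReadFileToString, readFile and roundTrip are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadFileToString {

    /** Reads the whole file and decodes it with the given charset. */
    static String readFile(Path path, Charset charset) {
        try {
            return new String(Files.readAllBytes(path), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes text to a temporary file as UTF-8 and reads it back (demo only). */
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("read-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readFile(tmp, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "line one\nline two\n";
        System.out.println(text.equals(roundTrip(text)));
    }
}
```

If you can target Java 11 or later, Files.readString(path) does the same job in one call and defaults to UTF-8.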
You can use apache commons IO..
FileInputStream fisTargetFile = new FileInputStream(new File("test.txt"));
String targetFileStr = IOUtils.toString(fisTargetFile, "UTF-8");
This should work for you:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
public static void main(String[] args) throws IOException {
String content = new String(Files.readAllBytes(Paths.get("abc.java")));
}
Using Apache Commons IO.
import org.apache.commons.io.FileUtils;
//...
String contents = FileUtils.readFileToString(new File("/path/to/the/file"), "UTF-8");
You can see the javadoc for the method for details.
Don't write your own util class to do this - I would recommend using Guava, which is full of all kinds of goodness. In this case you'd want either the Files class (if you're really just reading a file) or CharStreams for more general purpose reading. It has methods to read the data into a list of strings (readLines) or totally (toString).
It has similar useful methods for binary data too. And then there's the rest of the library...
I agree it's annoying that there's nothing similar in the standard libraries. Heck, just being able to supply a Charset to a FileReader would make life a little simpler...
Another alternative approach is:
How do I create a Java string from the contents of a file?
Another option is to use the utilities provided by open source libraries:
http://commons.apache.org/io/api-1.4/index.html?org/apache/commons/io/IOUtils.html
Why doesn't Java provide such a common util API?
a) to keep the APIs generic so that encoding, buffering etc is handled by the programmer.
b) make programmers do some work and write/share opensource util libraries :D ;-)
Sadly, no.
I agree that such a frequent operation should have an easier implementation than copying the input line by line in a loop, but you'll have to either write a helper method or use an external library.
I discovered that the accepted answer actually doesn't always work, because \\Z may occur in the file. Another problem is that if you don't have the correct charset a whole bunch of unexpected things may happen which may cause the scanner to read only a part of the file.
The solution is to use a delimiter which you are certain will never occur in the file. However, this is theoretically impossible. What we CAN do, is use a delimiter that has such a small chance to occur in the file that it is negligible: such a delimiter is a UUID, which is natively supported in Java.
String content = new Scanner(file, "UTF-8")
.useDelimiter(UUID.randomUUID().toString()).next();
I decided to create a currency converter in Java, and have it so that it would pull the conversion values out of a text file (to allow for easy editability since these values are constantly changing). I did manage to do it by using the Scanner class and putting all the values into an ArrayList.
Now I'm wondering if there is a way to add comments to the text file for the user to read, which Scanner will ignore. "//" doesn't seem to work.
Thanks
The best way would be to read the file line by line using java.io.BufferedReader and check every line for comments using String#startsWith(), where you search for "//".
But have you considered using a properties file and manage it using the java.util.Properties API? This way you can benefit from a ready-made specification and API's and you can use # as start of comment line. Also see the tutorial at sun.com.
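A minimal sketch of the Properties route, assuming a file of currency=rate pairs. The rates, the class name RatesFromProperties and the rate method are all made up for illustration, and the text is loaded from a String here so the example runs without a file on disk.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class RatesFromProperties {

    /** Parses properties-format text and returns the rate stored under the given key. */
    static double rate(String propertiesText, String currency) {
        Properties props = new Properties();
        try {
            // Lines starting with # (or !) are treated as comments and skipped
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(currency);
        if (value == null) {
            throw new IllegalArgumentException("No rate for " + currency);
        }
        return Double.parseDouble(value);
    }

    public static void main(String[] args) {
        String text = "# made-up rates relative to USD\n"
                + "EUR=0.5\n"
                + "JPY=150.0\n";
        System.out.println(rate(text, "EUR"));
    }
}
```

For a real file you would pass a FileReader (or an InputStream) to load() instead of the StringReader.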
Scanner won't ignore anything; you will have to remove the comments from your data after you have read it in.
Yea, while ((currentLine = bufferedReader.readLine()) != null) is possibly the easiest, then perform your necessary tests. currentLine.split(regex) is also very handy for converting a line into an array of values using a delimiter.
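Spelled out, that loop might look something like the following sketch. It assumes comment lines start with "//" and that values on a line are separated by whitespace; the class name CommentSkippingReader and the parse method are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class CommentSkippingReader {

    /** Returns the whitespace-separated fields of every line that is not blank or a comment. */
    static List<String[]> parse(Reader source) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader bufferedReader = new BufferedReader(source)) {
            String currentLine;
            while ((currentLine = bufferedReader.readLine()) != null) {
                String trimmed = currentLine.trim();
                if (trimmed.isEmpty() || trimmed.startsWith("//")) {
                    continue; // skip blank lines and comment lines
                }
                rows.add(trimmed.split("\\s+"));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String data = "// made-up rates\nUSD 1.0\n\nEUR 0.5\n";
        for (String[] row : parse(new StringReader(data))) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

To read from disk, pass new FileReader("data.txt") (or a reader with an explicit charset) instead of the StringReader.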
With Java nio, you could do something like this. Assuming you want to ignore lines that start with "//" and end up with an ArrayList.
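// imports: java.nio.file.*, java.util.*, java.util.stream.*
List<String> dataList;
Path path = FileSystems.getDefault().getPath(".", "data.txt");
// Files.lines keeps the file open until the stream is closed, hence try-with-resources
try (Stream<String> lines = Files.lines(path)) {
    dataList = lines
        .filter(line -> !line.startsWith("//"))
        .collect(Collectors.toCollection(ArrayList::new));
}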