How to localize with BufferedReader? - java

In the Java 11 BufferedReader documentation I found the following sentence:
Programs that use DataInputStreams for textual input can be localized by replacing each DataInputStream with an appropriate BufferedReader.
I cannot find any explanation of how this can be done - what would be an appropriate BufferedReader in this context?

Yes, it's a really bizarre (I'd say broken) use of the term 'localization' here - it's making an obscure reference (it doesn't even link to it) to DataInputStream's known-broken readLine method, especially considering that that method's javadoc explicitly refers to BufferedReader. I assume that line was added at the same time as the 'can be localized' wording was added to BufferedReader's javadoc.
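For reference, the deprecation note on DataInputStream.readLine() spells out the intended replacement: wrap the stream in an InputStreamReader (which applies a real character encoding) and a BufferedReader (which supplies readLine()). A minimal sketch, where in is whatever InputStream you already have and the charset is only a placeholder:

// needs java.io.* and java.nio.charset.StandardCharsets
// before: DataInputStream dis = new DataInputStream(in); String line = dis.readLine();
// after: pick the encoding your data actually uses (ISO-8859-1 here is just an example)
BufferedReader reader = new BufferedReader(
        new InputStreamReader(in, StandardCharsets.ISO_8859_1));
String line = reader.readLine();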

Related

How does a parser's buffer work? Matching a regex

One of my students has a task to do, part of which is to check whether a string matching a regex occurs inside a file.
The trick is that his teacher has forbidden reading the whole file at once and then parsing it. Instead, he said the student is supposed to use a buffer. The problem is that you never know how much input you are supposed to read from the file: there might be a matching sequence if you read just one more character from the file.
So the teacher wrote (translated):
Use technique known from parsers:
rewrite second half of the buffer to the first part of buffer
read next part of file to the second half
check if whole buffer contains the matching sequence
So how is it supposed to be done (the idea)? In my opinion it does not solve the problem stated above and it is pretty stupid and wasteful.
A Matcher does use an internal buffer of some kind, certainly. But if you look at the prototype to build a Matcher, you see that the only thing it takes as an argument is a simple CharSequence, which has only three operations:
knowing its length,
getting one character at a given offset,
getting a subsequence (another CharSequence).
When reading from a file, one possibility is to map the whole file using FileChannel.map(), then use an appropriate CharsetDecoder to read into a CharBuffer (which implements CharSequence). Or do that in chunks...
... Or use yours truly's crazy idea: this! I have tested it on 800+ MiB files and it works...
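For the curious, a rough sketch of that map-then-decode approach (file name, charset and pattern are placeholders; a production version would decode in chunks for files larger than memory comfortably allows):

import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

try (FileChannel channel = FileChannel.open(Paths.get("input.txt"), StandardOpenOption.READ)) {
    // map the file instead of reading it into a byte[]
    MappedByteBuffer bytes = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    // decode into a CharBuffer, which implements CharSequence and so can feed a Matcher
    CharBuffer chars = StandardCharsets.ISO_8859_1.decode(bytes);
    Matcher m = Pattern.compile("foo\\d+").matcher(chars);
    if (m.find()) {
        System.out.println("first match at offset " + m.start());
    }
}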
What your teacher is saying:
The regex will never need to match anything longer than half the length of the buffer.
The match could lie on a buffer boundary, hence you need to shift.
That seems realistic.
A BufferedReader reading line by line does not seem entirely fitting here. You might consider a byte array with a BufferedInputStream instead.
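To make the idea concrete, here is a minimal sketch of that shifting-buffer technique. I am using a plain Reader and a char[] here (since Matcher wants characters); the buffer size, the pattern and the readFully helper name are my own, and the buffer must be at least twice the longest match you expect:

// needs java.io.Reader, java.io.IOException, java.nio.CharBuffer, java.util.regex.Pattern
static boolean containsMatch(Reader in, Pattern p) throws IOException {
    char[] buf = new char[8192];
    int half = buf.length / 2;
    int valid = readFully(in, buf, 0, buf.length);       // initial fill of the whole buffer
    while (valid > 0) {
        if (p.matcher(CharBuffer.wrap(buf, 0, valid)).find()) {
            return true;                                 // match somewhere in the current window
        }
        if (valid < buf.length) {
            return false;                                // buffer was not full, so we hit EOF
        }
        System.arraycopy(buf, half, buf, 0, half);       // second half becomes the first half
        valid = half + readFully(in, buf, half, half);   // refill the second half
    }
    return false;
}

// Reader.read() may return fewer chars than asked for, so loop until full or EOF.
static int readFully(Reader in, char[] buf, int off, int len) throws IOException {
    int total = 0;
    while (total < len) {
        int n = in.read(buf, off + total, len - total);
        if (n == -1) break;
        total += n;
    }
    return total;
}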

how to peek at a single char in a non-buffered reader in java

I'm writing a program that takes a Reader and parses input from that reader. I cannot use a BufferedReader, but I would like to implement a peek method that looks at the current char in the reader, without actually calling read() on that character. Is there an easier/better way to do this than converting it to a character array or a string and looking at the character? I would love to use the mark() method but unfortunately that only works on a buffered reader.
The natural solution is to use PushbackReader, which provides unread methods that can be used to implement peek. (You can work out the details!)
But it is not entirely clear if this is allowed. It depends on whether you are forbidden from using BufferedReader or "a buffered Reader". In the latter case, PushbackReader is arguably a buffered Reader.
If you have to implement this without using an existing buffered Reader class, then how you should approach this depends on how much lookahead is needed:
If you need just one character lookahead (e.g. just peek) then you can implement this using an int to represent the lookahead and a boolean to say if it is currently valid.
If you need multiple characters lookahead, you need an array (or list), and so on.
For the record, the mark() and reset() methods are actually in the Reader API. The problem is that not all Reader classes are able to implement these methods ... due to limitations of the underlying streams ... at the operating system level.
You can write your own class with a peek method based on java.io.PushbackReader.
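For illustration, a minimal wrapper along those lines (the class name is made up); PushbackReader.unread() pushes the character back so the next read() returns it again:

import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

class PeekingReader {
    private final PushbackReader in;

    PeekingReader(Reader reader) {
        this.in = new PushbackReader(reader);   // default pushback capacity is one character
    }

    // Look at the next character without consuming it; returns -1 at end of stream.
    int peek() throws IOException {
        int c = in.read();
        if (c != -1) {
            in.unread(c);
        }
        return c;
    }

    int read() throws IOException {
        return in.read();
    }
}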

Use File or FileReader with Scanner?

Disclaimer: I've looked through all the questions I can find and none of them answers this exact question. If you find one please point me to it and be polite.
So, the Oracle I/O tutorial opens a text file with Scanner as follows:
new Scanner(new BufferedReader(new FileReader("xanadu.txt")));
But the Javadoc opens a text file with Scanner like this:
new Scanner(new File("myNumbers"));
It would be nice to use the simpler method, especially when I have a small file and can live with the smaller buffer, but I've also seen people say that when you open a File directly you can't close it. If that's the case, why is that idiom used in the official documentation?
Edit: I've also seen new Scanner(new FileReader("blah.txt")); but this seems like the worst of both worlds.
Edit: I'm not trying to start a debate about whether to use Scanner or not. I have a question about how to use Scanner. Thank you.
You could look at the implementation of Scanner (the JDK ships with source code). There is a close() method in the Scanner class as well. Essentially both approaches you listed are identical for your use case of reading a small file - just don't forget to call close() at the end.
The File class has no close() method because it only abstracts a disk file. It is not an input stream to the file, so there is nothing to close.
Yes you can do that.
Basically you do:
Scanner file = new Scanner(new FileReader("file.txt"));
To read a String:
String s = file.next();
When you are done with the file, do
file.close();
Horses for courses. From the Scanner javadocs, a Scanner is "A simple text scanner which can parse primitive types and strings using regular expressions." So, my take on your question is: it does not matter which approach you use, the simpler option with File is just as good as the one found in the Oracle tutorials. Scanner is for convenient tokenizing of text files, and if your file is small, as you said, then it's a perfect fit.
Because a Scanner uses regular expressions, you can't really expect huge performance with it, whether you create a buffered file reader for the scanner or not. The underlying Readable will be close()d (if it's a Closeable, which it will be, if you use the Scanner(File) constructor), and so you don't have to worry as long as you close() your Scanner object (or use try-with-resources).
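For instance, a small sketch of the try-with-resources form (the file name is a placeholder, and Scanner(File) throws FileNotFoundException, so the enclosing method has to deal with that):

try (Scanner in = new Scanner(new File("xanadu.txt"))) {
    while (in.hasNextLine()) {
        System.out.println(in.nextLine());
    }
}   // Scanner.close() also closes the underlying file source here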
There are multiple ways to construct a Scanner object.
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Scanner.html
I personally wouldn't even use Scanner for file reading though. Look at BufferedReader tutorials. It's not too hard to figure out.

ISO 8859-1 Encoding of files printed in Java program

I am writing a program that implements a file structure; the program prints out a product file based on the structure. Product names include the letters Æ, Ø and Å. These letters are not displayed correctly in the output file. I use
PrintWriter printer = new PrintWriter(new FileOutputStream(new File("products.txt")));
ISO 8859-1 or Windows ANSI (CP 1252) are the character sets that the implementation requires.
There are two possibilities:
Java is using the wrong encoding when outputting the file.
The file is actually correct, and whatever you are using to display the file is using the wrong encoding.
Assuming that the problem is the first one, the root cause is that Java has figured out that the default encoding for the platform is something other than the one you want / expect. There are three ways to solve this:
Figure out why Java has got the default locale and encoding "wrong" and remedy that. It will be something to do with your operating system's locale settings ...
Read this FAQ for details on how you can override the default locale settings at the command line.
Use a PrintWriter constructor that specifies the encoding explicitly so that your application doesn't rely on the default encoding. For example:
PrintWriter pw = new PrintWriter("filename", "ISO-8859-1");
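Applied to the code in the question, that might look like either of these (assuming ISO-8859-1 really is the encoding you need; both constructors throw checked exceptions you will have to handle):

// wrap the original FileOutputStream in a writer with an explicit charset ...
PrintWriter printer = new PrintWriter(
        new OutputStreamWriter(new FileOutputStream("products.txt"), "ISO-8859-1"));
// ... or use the convenience constructor shown above
PrintWriter printer2 = new PrintWriter("products.txt", "ISO-8859-1");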
In response to this comment:
Don’t PrintWriters all have the bug that you can’t know you had an error with them?
It is not a bug, it is a design feature.
You can find out if there was an error. You just can't find out what it was.
If you don't like it, you can use Writer instead.
They won’t raise an exception or even return failure if you try to shove a codepoint at them that can’t fit in the designated encoding.
Neither will a regular Writer I believe ... unless you specifically construct it to do this. The normal behaviour is to replace any unmappable codepoint with a specific character, though this is not specified in the javadocs (IIRC).
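For what it's worth, one way to "specifically construct it to do this" is to hand OutputStreamWriter an explicitly configured CharsetEncoder. This is just a sketch, not the only way:

import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
        .onUnmappableCharacter(CodingErrorAction.REPORT)   // throw rather than substitute '?'
        .onMalformedInput(CodingErrorAction.REPORT);
Writer out = new OutputStreamWriter(new FileOutputStream("products.txt"), encoder);
// writing a character outside ISO-8859-1 now fails with an IOException;
// note that wrapping this in a PrintWriter would swallow that exception again,
// putting you back to polling checkError()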
Do they even tell if you the filesystem fills up; I seem to recall that they don’t.
That is true. However:
For the kind of file you typically write using a PrintWriter this is not a critical issue.
If it is a critical issue AND you still want to use PrintWriter, you can always call checkError() (IIRC) to find out if there was an error.
I always end up writing out my OutputStreamWriter constructor with the explicit Charset.forName("UTF-8").newEncoder() second argument. It's kind of tedious, so perhaps there's a better way.
I dunno.

What is simplest way to read a file into String? [duplicate]

This question already has answers here:
How do I create a Java string from the contents of a file?
(35 answers)
Closed 7 years ago.
I am trying to read a simple text file into a String. Of course there is the usual way of getting the input stream and iterating with readLine() and reading contents into String.
Having done this hundreds of times in the past, I just wondered: how can I do this in the minimum number of lines of code? Isn't there something in Java like String fileContents = XXX.readFile(myFile /*File*/) - or anything that looks as simple as this?
I know there are libraries like Apache Commons IO which provide such simplifications, or I could even write a simple Util class to do this. But all I wonder is: this is such a frequent operation that everyone needs, so why doesn't Java provide such a simple function? Isn't there really a single method somewhere to read a file into a string with some default or specified encoding?
Yes, you can do this in one line (though for robust IOException handling you wouldn't want to).
String content = new Scanner(new File("filename")).useDelimiter("\\Z").next();
System.out.println(content);
This uses a java.util.Scanner, telling it to delimit the input with \Z, which is the end of the string anchor. This ultimately makes the input have one actual token, which is the entire file, so it can be read with one call to next().
There is a constructor that takes a File and a String charsetName (among many other overloads). These two constructors may throw FileNotFoundException, but like all Scanner methods, no IOException can be thrown beyond these constructors.
You can query the Scanner itself through the ioException() method if an IOException occurred or not. You may also want to explicitly close() the Scanner after you read the content, so perhaps storing the Scanner reference in a local variable is best.
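Put together, a slightly more careful version of the one-liner might look like this (still a sketch; same placeholder filename, and the enclosing method needs to handle FileNotFoundException and declare throws IOException):

Scanner scanner = new Scanner(new File("filename")).useDelimiter("\\Z");
String content = scanner.hasNext() ? scanner.next() : "";   // empty file -> empty string
if (scanner.ioException() != null) {
    throw scanner.ioException();            // surface any IOException the Scanner swallowed
}
scanner.close();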
See also
Java Tutorials - I/O Essentials - Scanning and formatting
Related questions
Validating input using java.util.Scanner - has many examples of more typical usage
Third-party library options
For completeness, these are some really good options if you have these very reputable and highly useful third party libraries:
Guava
com.google.common.io.Files contains many useful methods. The pertinent ones here are:
String toString(File, Charset)
Using the given character set, reads all characters from a file into a String
List<String> readLines(File, Charset)
... reads all of the lines from a file into a List<String>, one entry per line
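For example (assuming Guava is on the classpath; the file name is a placeholder):

import com.google.common.base.Charsets;
import com.google.common.io.Files;
// ...
String content = Files.toString(new File("test.txt"), Charsets.UTF_8);
List<String> lines = Files.readLines(new File("test.txt"), Charsets.UTF_8);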
Apache Commons/IO
org.apache.commons.io.IOUtils also offer similar functionality:
String toString(InputStream, String encoding)
Using the specified character encoding, gets the contents of an InputStream as a String
List readLines(InputStream, String encoding)
... as a (raw) List of String, one entry per line
Related questions
Most useful free third party Java libraries (deleted)?
From Java 7 (API Description) onwards you can do:
new String(Files.readAllBytes(Paths.get(filePath)), StandardCharsets.UTF_8);
Where filePath is a String representing the file you want to load.
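For what it's worth, if you are on Java 11 or later there is also a single-call method for this; it reads UTF-8 unless you pass a charset explicitly:

String content = Files.readString(Paths.get(filePath));                           // UTF-8 by default
String latin1  = Files.readString(Paths.get(filePath), StandardCharsets.ISO_8859_1);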
You can use Apache Commons IO:
FileInputStream fisTargetFile = new FileInputStream(new File("test.txt"));
String targetFileStr = IOUtils.toString(fisTargetFile, "UTF-8");
This should work for you:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
public class ReadFileToString {   // class wrapper added so the snippet compiles
    public static void main(String[] args) throws IOException {
        String content = new String(Files.readAllBytes(Paths.get("abc.java")));
        System.out.println(content);
    }
}
Using Apache Commons IO.
import org.apache.commons.io.FileUtils;
//...
String contents = FileUtils.readFileToString(new File("/path/to/the/file"), "UTF-8");
You can see the javadoc for the method for details.
Don't write your own util class to do this - I would recommend using Guava, which is full of all kinds of goodness. In this case you'd want either the Files class (if you're really just reading a file) or CharStreams for more general purpose reading. It has methods to read the data into a list of strings (readLines) or totally (toString).
It has similar useful methods for binary data too. And then there's the rest of the library...
I agree it's annoying that there's nothing similar in the standard libraries. Heck, just being able to supply a Charset to a FileReader would make life a little simpler...
Another alternative approach is:
How do I create a Java string from the contents of a file?
Another option is to use utilities provided by open source libraries:
http://commons.apache.org/io/api-1.4/index.html?org/apache/commons/io/IOUtils.html
Why doesn't Java provide such a common util API?
a) to keep the APIs generic, so that encoding, buffering etc. are handled by the programmer.
b) make programmers do some work and write/share opensource util libraries :D ;-)
Sadly, no.
I agree that such a frequent operation should have an easier implementation than copying the input line by line in a loop, but you'll have to either write a helper method or use an external library.
I discovered that the accepted answer actually doesn't always work, because \\Z may occur in the file. Another problem is that if you don't have the correct charset a whole bunch of unexpected things may happen which may cause the scanner to read only a part of the file.
The solution is to use a delimiter which you are certain will never occur in the file. However, this is theoretically impossible. What we CAN do, is use a delimiter that has such a small chance to occur in the file that it is negligible: such a delimiter is a UUID, which is natively supported in Java.
String content = new Scanner(file, "UTF-8")
.useDelimiter(UUID.randomUUID().toString()).next();
