It seems that there are many, many ways to read text files in Java (BufferedReader, DataInputStream, etc.). My personal favorite is Scanner with a File in the constructor (it's just simpler, works better with mathy data processing, and has familiar syntax).
Boris the Spider also mentioned Channel and RandomAccessFile.
Can someone explain the pros and cons of each of these methods? To be specific, when would I want to use each?
(edit) I think I should be specific and add that I have a strong preference for the Scanner method. So the real question is, when wouldn't I want to use it?
Let's start at the beginning. The question is: what do you want to do?
It's important to understand what a file actually is. A file is a collection of bytes on a disc; these bytes are your data. Java provides various levels of abstraction above that:
File(Input|Output)Stream - read these bytes as a stream of byte.
File(Reader|Writer) - read from a stream of bytes as a stream of char.
Scanner - read from a stream of char and tokenise it.
RandomAccessFile - read these bytes as a seekable byte[].
FileChannel - read these bytes in a safe multithreaded way.
On top of each of those there are the Decorators. For example, you can add buffering with BufferedXXX. You could add linebreak awareness to a FileWriter with PrintWriter. You could turn an InputStream into a Reader with an InputStreamReader (currently the only way to specify character encoding for a Reader).
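As a minimal sketch of how these layers stack, assuming a UTF-8 text file: a FileInputStream supplies the bytes, an InputStreamReader decodes them with an explicit charset, and a BufferedReader adds buffering and readLine(). The class name LayeredRead and the method readLines are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class LayeredRead {
    // Hypothetical helper: reads every line of a UTF-8 text file into a list.
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        // bytes -> chars (explicit charset) -> buffered chars with readLine()
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Closing the outermost decorator closes the whole chain, which is why only the BufferedReader appears in the try-with-resources header.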
So - when wouldn't I want to use it [a Scanner]?
You would not use a Scanner if you wanted to do any of the following (these are just some examples):
Read in data as bytes
Read in a serialized Java object
Copy bytes from one file to another, maybe with some filtering.
It is also worth noting that the Scanner(File file) constructor takes the File and opens a FileInputStream with the platform default encoding - this is almost always a bad idea. It is generally recognised that you should specify the encoding explicitly to avoid nasty encoding-based bugs. Furthermore, the stream isn't buffered.
So you may be better off with
try (final Scanner scanner = new Scanner(new BufferedInputStream(new FileInputStream(file)), "UTF-8")) {
//do stuff
}
Ugly, I know.
It's worth noting that Java 7 provides a further layer of abstraction to remove the need to loop over files - these are in the Files class:
byte[] Files.readAllBytes(Path path)
List<String> Files.readAllLines(Path path, Charset cs)
Both these methods read the entire file into memory, which might not be appropriate. In Java 8 this is further improved by adding support for the new Stream API:
Stream<String> Files.lines(Path path, Charset cs)
Stream<Path> Files.list(Path dir)
For example to get a Stream of words from a Path you can do:
final Stream<String> words = Files.lines(Paths.get("myFile.txt")).
flatMap((in) -> Arrays.stream(in.split("\\b")));
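A hedged, self-contained variant of the same idea: the stream returned by Files.lines holds the file open, so this sketch wraps it in try-with-resources and collects the words before returning. It splits on whitespace ("\\s+") rather than on word boundaries ("\\b"), so that separators do not come back as tokens; that swap is my own choice, as are the names WordStream and words.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class WordStream {
    // Hypothetical helper: returns the whitespace-separated words of a UTF-8 file.
    static List<String> words(Path path) throws IOException {
        // try-with-resources closes the underlying file once the stream is consumed
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines
                    .flatMap(line -> Arrays.stream(line.split("\\s+")))
                    .filter(word -> !word.isEmpty()) // blank lines yield one empty token
                    .collect(Collectors.toList());
        }
    }
}
```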
SCANNER:
Can parse primitive types and strings using regular expressions.
A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types. More can be read at http://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html
DATA INPUT STREAM:
Lets an application read primitive Java data types from an underlying input stream in a machine-independent way. An application uses a data output stream to write data that can later be read by a data input stream. DataInputStream is not necessarily safe for multithreaded access. Thread safety is optional and is the responsibility of users of methods in this class. More can be read at http://docs.oracle.com/javase/7/docs/api/java/io/DataInputStream.html
BufferedReader:
Reads text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines. The buffer size may be specified, or the default size may be used. The default is large enough for most purposes. In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders. For example,
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
will buffer the input from the specified file. Without buffering, each invocation of read() or readLine() could cause bytes to be read from the file, converted into characters, and then returned, which can be very inefficient. Programs that use DataInputStreams for textual input can be localized by replacing each DataInputStream with an appropriate BufferedReader. More details are at http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
NOTE: This approach is outdated, as Boris points out in his comment. I will leave it here for history, but you should use the methods available in the JDK.
It depends on what kind of operation you are doing and the size of the file you are reading.
In most cases, I recommend using commons-io for small files.
byte[] data = FileUtils.readFileToByteArray(new File("myfile"));
You can read it as a string or character array...
Now, if you are handling big files, or changing parts of a file directly on the filesystem, then the best option is to use a RandomAccessFile and potentially even a FileChannel to do it "nio" style.
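To illustrate why RandomAccessFile suits big files, here is a small sketch that jumps straight to the end of a file and reads only its last few bytes, without streaming through everything before them. The class TailBytes and the method lastBytes are hypothetical names used only for this example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class TailBytes {
    // Hypothetical helper: returns up to the last n bytes of the file.
    static byte[] lastBytes(String path, int n) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long length = file.length();
            long start = Math.max(0, length - n);
            file.seek(start); // move the file pointer without reading anything
            byte[] tail = new byte[(int) (length - start)];
            file.readFully(tail); // fills the whole array or throws EOFException
            return tail;
        }
    }
}
```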
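Using BufferedReader
char[] buffer = new char[10];
BufferedReader reader = new BufferedReader(new FileReader("FILE_PATH"));
// or
// BufferedReader reader = Files.newBufferedReader(Paths.get("FILE_PATH"));
int count;
// read() returns the number of chars read, or -1 at end of stream
while ((count = reader.read(buffer)) != -1) {
    // print only the chars that were actually read
    System.out.print(new String(buffer, 0, count));
}
// or, line by line: readLine() returns null at end of stream
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
reader.close();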
Using FileInputStream - Read Binary Files to Bytes
byte[] buffer = new byte[10];
InputStream fis = new FileInputStream("FILE_PATH");
// or
// InputStream fis = Files.newInputStream(Paths.get("FILE_PATH"));
int count;
while ((count = fis.read(buffer)) != -1) {
    // only the first count bytes of the buffer are valid
    System.out.print(new String(buffer, 0, count));
}
fis.close();
Using Files– Read Small File to List of Strings
List<String> allLines = Files.readAllLines(Paths.get("FILE_PATH"));
for (String line : allLines) {
System.out.println(line);
}
Using Scanner – Read Text File as Iterator
Scanner scanner = new Scanner(new File("FILE_PATH"));
while (scanner.hasNextLine()) {
System.out.println(scanner.nextLine());
}
scanner.close();
Using RandomAccessFile-Reading Files in Read-Only Mode
RandomAccessFile file = new RandomAccessFile("FILE_PATH", "r");
String str;
while ((str = file.readLine()) != null) {
System.out.println(str);
}
file.close();
Using Files.lines - Reading lines as a stream
Files.lines(Paths.get("FILE_PATH")).forEach(s -> System.out.println(s));
Using FileChannel - for better performance by using off-heap memory (and, further, MappedByteBuffer)
FileInputStream i = new FileInputStream("FILE_PATH");
ReadableByteChannel r = i.getChannel();
ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024);
while (r.read(buffer) != -1) {
buffer.flip();
while (buffer.hasRemaining()) {
System.out.print((char) buffer.get());
}
buffer.clear();
}
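The heading mentions MappedByteBuffer, but the snippet above only uses a direct buffer. A minimal sketch of the memory-mapped variant, assuming the file contains ASCII text and is small enough to map in one piece (a single mapping is limited to Integer.MAX_VALUE bytes), might look like this. MappedRead and readAscii are illustrative names.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedRead {
    // Hypothetical helper: maps the whole file read-only and decodes it as ASCII.
    static String readAscii(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // The mapping lives outside the Java heap and is paged in on demand.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return StandardCharsets.US_ASCII.decode(buffer).toString();
        }
    }
}
```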
I have written a little program that just reads a file's contents and writes them to a new copy. This works perfectly with text files, but with PNGs and video files, it fails to correctly create the file (the image is all black or the video will not play). I know there are APIs that can copy files with one line, but I'd love to know why this isn't working. Here is the code:
import java.io.*;
public class CopyFile
{
public static void main(String[] args) throws Exception
{
File file = new File("test.mp4");
File copy = new File("copy.mp4");
InputStreamReader input = new InputStreamReader(new FileInputStream(file));
OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(copy));
System.out.println(input.ready());
while(input.ready())
{
int i = input.read();
//System.out.print( (char) ( (byte) i));
out.write(i);
}
input.close();
out.flush();
out.close();
}
}
Don't use Reader and Writer unless you know the input is text. Use InputStream and OutputStream.
Don't use ready(), or, for Sotirios' benefit, available() either. Neither of them is a valid test for end of stream. They both concern whether the input can be read without blocking, which isn't the same thing at all. See the Javadoc.
You're not detecting end of stream correctly. If read() returns -1 you're still copying that to the output.
Copying a single character or single byte at a time is extremely slow.
The canonical way to copy streams in Java is as follows:
while ((count = in.read(buffer)) > 0)
{
out.write(buffer, 0, count);
}
where count is an int, and buffer is a byte[] of any size greater than zero, typically 8192.
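For completeness, here is the same loop as a self-contained sketch with the declarations filled in. The class StreamCopy and the method copy are names made up for the example; the 8192-byte buffer follows the typical size mentioned above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopy {
    // Hypothetical helper: copies all bytes from in to out, returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int count;
        // read() returns -1 at end of stream, which ends the loop
        while ((count = in.read(buffer)) > 0) {
            // write only the bytes that were actually read this time
            out.write(buffer, 0, count);
            total += count;
        }
        return total;
    }
}
```

The caller stays responsible for opening and closing both streams, for example with try-with-resources around a FileInputStream and a FileOutputStream.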
Readers and Writers are for reading character streams (i.e., text). Pictures and videos are binary data, not text, and will probably be corrupted if you pass them through character streams. This is because, depending on the character set, there is not necessarily a reversible mapping between bytes and characters. Some byte sequences are gibberish if interpreted as characters, then gibberish gets written back to the file.
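The non-reversible mapping can be seen without touching a file at all. This sketch, with illustrative names, decodes a byte array as UTF-8 and encodes it again; plain ASCII survives, but a byte such as 0x89 (the first byte of every PNG) is not valid UTF-8 on its own, so the decoder substitutes a replacement character and the original value is lost.

```java
import java.nio.charset.StandardCharsets;

final class LossyRoundTrip {
    // Hypothetical helper: bytes -> String -> bytes, using UTF-8 both ways.
    static byte[] throughString(byte[] data) {
        // Malformed input is silently replaced during decoding.
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }
}
```

Comparing the result with the input using Arrays.equals shows whether a given byte sequence survived the trip.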
Use the InputStream and OutputStream that you open directly, instead of wrapping them up as a Reader and Writer, and it will work correctly. These are byte streams and can handle any type of data.
E.g.,
InputStream input = new FileInputStream(file);
OutputStream out = new FileOutputStream(copy);
P.S. This will still be quite slow. You can wrap the streams in a BufferedInputStream and BufferedOutputStream for a simple way to improve performance, although the one-line copy APIs will probably still be faster.
My code reads through an XML file encoded with UTF-8 until a specified string has been found. It finds the specified string fine, but I wish to write at this point in the file.
I would much prefer to do this through a stream as only small tasks need to be done.
I cannot find a way to do this. Any alternative methods are welcome.
Code so far:
final String RESOURCE = "/path/to/file.xml";
BufferedReader in = new BufferedReader(new InputStreamReader(ClassLoader.class.getResourceAsStream(RESOURCE), "UTF-8"));
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(ClassLoader.class.getResource(RESOURCE).getPath()),"UTF-8"));
String fileLine = in.readLine();
while (!fileLine.contains("some string")) {
fileLine = in.readLine();
}
// File writing code here
You can't really write into the middle of the file, except for overwriting existing bytes (using something like RandomAccessFile). That would only work, however, if what you needed to write was exactly the same byte length as what you were replacing, which I highly doubt.
Instead, you need to re-write the file to a new file, copying the input to the output and replacing the parts you need to replace in the process. There are a variety of ways you could do this. I would recommend using a StAX event reader and writer, as the StAX API is fairly user friendly (compared to SAX) as well as fast and memory efficient.
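The copy-and-replace approach can be sketched with plain line-based I/O. Note that this is a simplification, not the StAX approach recommended above: it treats the XML as ordinary text lines, which is only reasonable if the marker is known to sit on a line of its own. InsertLine, insertAfter and the parameter names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class InsertLine {
    // Hypothetical helper: writes extraLine after every line containing marker.
    static void insertAfter(Path file, String marker, String extraLine) throws IOException {
        // Write to a temp file in the same directory, then swap it into place.
        Path temp = Files.createTempFile(file.toAbsolutePath().getParent(), "rewrite", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
                if (line.contains(marker)) {
                    out.write(extraLine);
                    out.newLine();
                }
            }
        }
        Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Both streams are closed before the move, so the original file is never open for reading and writing at the same time, which is the problem with the code in the question.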
I have a program in which I have to load a PNG as a String and then save it again, but after I save it, it becomes unreadable. If I open both the loaded PNG and the saved String in the editor, I can see that Java created line breaks all over the file. If this is the problem, how can I avoid this?
public static void main(String[] args)
{
try
{
File file1 = new File("C://andim//testFile.png");
StringBuffer content = new StringBuffer();
BufferedReader reader = null;
reader = new BufferedReader(new FileReader(file1));
String s = null;
while ((s = reader.readLine()) != null)
{
content.append(s).append(System.getProperty("line.separator"));
}
reader.close();
String loaded=content.toString();
File file2=new File("C://andim//testString.png");
FileWriter filewriter = new FileWriter(file2);
filewriter.write(loaded);
filewriter.flush();
filewriter.close();
}
catch(Exception exception)
{
exception.printStackTrace();
}
}
I have program in which I have to load a PNG as a String and then save it again, but after I save it it becomes unreadable.
Yes, I'm not surprised. You're treating arbitrary binary data as if it's text data (in whatever your platform default encoding is, to boot). It's not. Don't do that. It's possible that in some encodings you'll get away with it - until you start trying to pass the string elsewhere in a way that strips unprintable characters etc.
If you must convert arbitrary binary data to text, use base64 or hex. If possible, avoid the conversion to text in the first place though. If you just want to copy a file, use InputStream and OutputStream - not Reader and Writer.
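If a String really is required, a base64 round trip with java.util.Base64 (available since Java 8) keeps every byte intact. This is a minimal sketch; the class and method names are made up for the example.

```java
import java.util.Base64;

final class BinaryAsText {
    // Encodes arbitrary bytes as ASCII-only base64 text.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Reverses encode(); throws IllegalArgumentException on invalid base64 input.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```

The bytes would come from something like Files.readAllBytes and go back out through Files.write, so no Reader or Writer ever touches the binary data.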
This is a big general point: keep data in its "native" representation as long as you possibly can. Only convert data to a different representation when you absolutely have to, and be very careful about it.
Don't use text-based APIs to read binary files. In this case, you don't want a BufferedReader, and you certainly don't want readLine, which may well treat more than just one thing as a line separator. Use an InputStream (for instance, FileInputStream) and an OutputStream (for instance, FileOutputStream), not readers and writers.
Don't do that
PNGs are not textual data.
If you try to read arbitrary bytes into a string, Java will mangle the bytes into actual text, corrupting the data you read.
You need to use byte[]s, not strings.
I have to write code in Java with the following structure:
Read String From File
// Perform some string processing
Write output string in file
Now, for reading/writing strings to/from a file, I am using:
BufferedReader br = new BufferedReader(new FileReader("Text.txt"), 32768);
BufferedWriter out = new BufferedWriter(new FileWriter("AnotherText.txt"), 32768);
String line;
while ((line = br.readLine()) != null) {
    // perform some string processing
    String output = line; // placeholder for the processed string
    out.write(output);
    out.newLine();
}
However, reading and writing seem quite slow. Is there a faster way to read/write strings to/from a file in Java?
Additional Info:
1) Read File is 144 MB.
2) I can allocate large memory (50 MB) for reading or writing.
3)I have to write it as a string, not as Byte.
It sounds slower than it should be.
You can try increasing the buffer size.
Maybe also try FileOutputStream instead of FileWriter.
You mentioned 50MB. Are you modifying the memory parameters of the program at all when you run it using a -X switch?
Ignoring the fact that you have not posted what your performance requirements are:
Try reading/writing the file as bytes and internally convert the byte to characters/string.
This question might be helpful: Number of lines in a file in Java
I was handed some data in a file with an .dat extension. I need to read this data in a java program and build the data into some objects we defined. I tried the following, but it did not work
FileInputStream fstream = new FileInputStream("news.dat");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
Could someone tell me how to do this in java?
What kind of file is it? Is it a binary file which contains serialized Java objects? If so, then you rather need ObjectInputStream instead of DataInputStream to read it.
FileInputStream fis = new FileInputStream("news.dat");
ObjectInputStream ois = new ObjectInputStream(fis);
Object object = ois.readObject();
// ...
(don't forget to properly handle resources using close() in finally, but that's beyond the scope of this question)
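A sketch of the full round trip, with the resource handling included via try-with-resources. It assumes the file was produced by Java serialization in the first place; ObjectFile, write and read are illustrative names.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

final class ObjectFile {
    // Hypothetical helper: serializes one object to the given file.
    static void write(Path path, Serializable value) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(path))) {
            out.writeObject(value);
        }
    }

    // Hypothetical helper: reads back the first object stored in the file.
    static Object read(Path path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(path))) {
            return in.readObject();
        }
    }
}
```

If the file was not written by an ObjectOutputStream, the ObjectInputStream constructor fails with a StreamCorruptedException, which is a quick way to find out that the .dat file uses some other format.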
See also:
Basic serialization tutorial
A .dat file is usually a binary file, without any specific associated format. You can read the raw bytes of the file in a manner similar to what you posted - but you will need to interpret these bytes according to the underlying format. In particular, when you say "open" the file, what exactly do you want to happen in Java? What kind of objects do you want to be created? How should the stream of bytes map to these objects?
Once you know this, you can either write this layer yourself or use an existing API (assuming it's a standard format).
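To make "write this layer yourself" concrete, here is a sketch for an entirely made-up format: a 4-byte big-endian record count followed by that many strings in DataOutputStream.writeUTF encoding. The real news.dat layout is unknown, so the format, the class NewsDat and the method readTitles are all assumptions for illustration only.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

final class NewsDat {
    // Assumed layout (hypothetical): int count, then count writeUTF-style strings.
    static List<String> readTitles(InputStream raw) throws IOException {
        List<String> titles = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(raw))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                titles.add(in.readUTF());
            }
        }
        return titles;
    }
}
```

For a real file the read calls would have to follow whatever specification the data's producer provides; DataInputStream only helps when the fields are big-endian primitives or writeUTF-style strings.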
For reference, your example doesn't work because it assumes that the binary format is a character representation in the platform's default charset (as per the InputStreamReader constructor). And as you say it's binary, this will fail to convert the binary to a stream of characters (since, after all, it's not).
// BufferedInputStream not strictly needed, but much more efficient than reading
// one byte at a time
BufferedInputStream in = new BufferedInputStream (new FileInputStream("news.dat"));
This will give you a buffered stream which will return the raw bytes of the file; you can now either read and process them yourself, or pass this input stream to some library API that will create appropriate objects for you (if such a library exists).
That entirely depends on what sort of file the .dat is. Unfortunately, .dat is often used as a generic extension for a data file. It could be binary, in which case you could use FileInputStream fstream = new FileInputStream(new File("news.dat")); and call read() to get bytes from the file, or text, in which case you could use BufferedReader buff = new BufferedReader(new InputStreamReader(new FileInputStream(new File("news.dat")))); and call readLine() to get each line of text. [edit]Or it could be Java objects, in which case what BalusC said.[/edit]
In both cases, you'd then need to know what format the file was in to divide things up and get meaning from it, although this would be much easier if it was text as it could be done by inspection.
Please try the below code:
FileReader file = new FileReader(new File("File.dat"));
BufferedReader br = new BufferedReader(file);
String temp = br.readLine();
while (temp != null) {
    // print the current line before reading the next one
    System.out.println(temp);
    temp = br.readLine();
}
br.close();
A better way would be to use try-with-resources so that you do not have to worry about closing the resources.
Here is the code.
try (ObjectInputStream objectStream = new ObjectInputStream(new FileInputStream("news.dat"))) {
    Object object = objectStream.readObject();
    // ...
} catch (IOException | ClassNotFoundException e) {
    // handle or report the failure
}