I am using BufferedWriter to write text with a specific encoding to a file, and I want to count the file size in bytes before I close the file.
bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file),encoding),bsize);
bw.write(string);
My plan was to use string.getBytes(), but this method doesn't let me provide a specific encoding (and I can't override the default encoding property).
Use String#getBytes(encoding) instead
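A minimal sketch of that, assuming the encoding is available as a java.nio.charset.Charset (String.getBytes(String charsetName) works the same way if you only have the name). The class and method names are made up for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Counts how many bytes a string occupies once encoded with a given charset,
// i.e. the same charset you pass to the OutputStreamWriter.
class EncodedLength {

    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo"; // "héllo"
        System.out.println(byteLength(s, StandardCharsets.UTF_8));      // 6: 'é' needs two bytes
        System.out.println(byteLength(s, StandardCharsets.ISO_8859_1)); // 5: one byte per char
        System.out.println(byteLength(s, StandardCharsets.UTF_16BE));   // 10: two bytes per char, no BOM
    }
}
```

One caveat: summing getBytes() over several strings overcounts for charsets that emit a byte-order mark, such as "UTF-16", because getBytes() adds the BOM on every call while the writer emits it only once at the start of the stream.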
If you're looking for a method that works regardless of whether you're appending or not, and does not duplicate data just for counting, store the reference to the FileOutputStream in a variable and access its FileChannel through getChannel(). You can call position() before and after the data has been written (flushing the writer first, so the buffered characters actually reach the stream), and the difference between the two values gives you the number of bytes that have been written.
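A sketch of the FileChannel approach, assuming UTF-8 and a temporary file; writeAndMeasure is a hypothetical helper name. Note the flush() before the second position() call: until the BufferedWriter and the OutputStreamWriter's encoder pass their bytes on to the FileOutputStream, the channel position does not move.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BytesWrittenViaChannel {

    // Writes 'text' (UTF-8) and returns how many bytes reached the file,
    // measured as the change in the FileChannel's position.
    static long writeAndMeasure(Path file, String text, boolean append) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file.toFile(), append);
             BufferedWriter bw = new BufferedWriter(
                     new OutputStreamWriter(fos, StandardCharsets.UTF_8))) {
            FileChannel channel = fos.getChannel();
            long before = channel.position(); // equals the file size in append mode
            bw.write(text);
            bw.flush(); // push buffered chars through the encoder into fos
            return channel.position() - before;
        } // bw.close() also closes fos (and its channel)
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("count", ".txt");
        System.out.println(writeAndMeasure(tmp, "h\u00e9llo", false)); // 6
        System.out.println(writeAndMeasure(tmp, "ab", true));          // 2
        Files.delete(tmp);
    }
}
```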
Related
I would like to know the specific difference between BufferedReader and FileReader.
I do know that BufferedReader is much more efficient than FileReader, but can someone please explain why (specifically and in detail)? Thanks.
First, you should understand "streaming" in Java, because all "Readers" in Java are built upon this concept.
File Streaming
File streaming is carried out by the FileInputStream object in Java.
// it reads a byte at a time and stores into the 'byt' variable
int byt;
while((byt = fileInputStream.read()) != -1) {
fileOutputStream.write(byt);
}
This object reads a byte (8 bits) at a time and writes it to the given file.
A practical application of it is working with raw binary data files, such as images or audio files (use AudioInputStream instead of FileInputStream for audio files).
On the other hand, it is very inconvenient and slow for text files, because looping through one byte at a time, doing some processing and storing the processed byte back is tedious and time-consuming.
You also need to provide the character set of the text file, i.e. whether the characters are Latin, Chinese, etc. Otherwise, the program would decode and encode 8 bits at a time, and you'd see weird characters printed on the screen or written to the output file whenever a character is more than 1 byte long (i.e. non-ASCII characters).
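To see the charset problem concretely, here is a sketch that runs the byte-at-a-time loop over UTF-8 text and naively turns each byte into a char. A ByteArrayInputStream stands in for the FileInputStream (an assumption, so no file is needed); both are InputStreams and read() behaves the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ByteAtATime {

    // Mimics the byte-at-a-time loop, but turns every byte straight into a char,
    // i.e. it ignores the charset entirely.
    static String naiveDecode(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int byt;
        while ((byt = in.read()) != -1) {
            sb.append((char) byt); // wrong for any character longer than one byte
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8); // 6 bytes
        String naive = naiveDecode(new ByteArrayInputStream(utf8));
        String proper = new String(utf8, StandardCharsets.UTF_8);   // decoded with the right charset
        System.out.println(naive.length() + " vs " + proper.length()); // 6 vs 5
        System.out.println(naive.equals(proper));                      // false
    }
}
```

The naive result has six "characters" for a five-letter word, because 'é' is two bytes in UTF-8 and each byte became its own (wrong) char.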
File Reading
This is just a fancy way of saying "file streaming" with built-in charset support (i.e. no need to define the charset as before; by default FileReader uses the platform's default charset).
The FileReader class is specifically designed to deal with the text files.
As you've seen earlier, file streaming is best suited to raw binary data; for text, it is not so efficient.
So the Java-dudes added the FileReader class to deal specifically with text files. Each call to read() returns one character, decoded from one or more bytes depending on the file's charset. A remarkable improvement over the preceding FileInputStream!
So the streaming operation looks like this:
int c;
while ((c = fileReader.read()) != -1) {
    // some logic
}
Please note that both classes use an int variable to store the value returned by read(), so that -1 can signal the end of the stream (every char is converted into an int while fetching and back into a char while storing).
The only advantage here is that this class deals only with text files, so you don't have to specify the charset and a few other properties. It provides an out-of-the-box solution for most text-file processing cases. It also supports internationalization and localization.
But again, it's still very slow (imagine reading one character at a time and looping through it!).
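A sketch of the character-at-a-time loop against a real file. It assumes Java 11+, where FileReader has a constructor taking a Charset, so the encoding can be stated explicitly instead of relying on the platform default; readAll is a hypothetical helper name.

```java
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class CharAtATime {

    // Same loop shape as above: one read() call per character.
    // FileReader(File, Charset) requires Java 11+.
    static String readAll(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (FileReader fileReader = new FileReader(file.toFile(), StandardCharsets.UTF_8)) {
            int c; // int, so that -1 can mean "end of stream"
            while ((c = fileReader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chars", ".txt");
        Files.write(tmp, "h\u00e9llo".getBytes(StandardCharsets.UTF_8));
        String text = readAll(tmp);
        System.out.println(Files.size(tmp) + " bytes, " + text.length() + " chars"); // 6 bytes, 5 chars
        Files.delete(tmp);
    }
}
```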
Buffering streams
To tackle the problem of continuously looping over one byte or character at a time, the Java-dudes added another spectacular feature: "create a buffer of data before processing."
The concept is pretty similar to what happens when a user streams a video on YouTube. A video is buffered before playing, to provide a flawless viewing experience. (Though the browser keeps buffering until the whole video is buffered ahead of time.) The same technique is used by the BufferedReader class.
A BufferedReader object takes a FileReader object as input, which contains all the necessary information about the text file that needs to be read (such as the file path and charset).
BufferedReader br = new BufferedReader( new FileReader("example.txt") );
When a "read" instruction is given to the BufferedReader object, it uses the FileReader object to read the data from the file. Rather than asking for one character at a time, it asks the FileReader for a whole block of characters at once (8192 by default) and stores them in a special memory area (in RAM) called the "buffer".
Subsequent reads are served from this buffer. readLine(), for example, scans the buffer until it hits '\n', '\r' or "\r\n" (the end-of-line symbols) and returns that line; the FileReader is only asked for more data when the buffer runs out.
// this variable points to the buffered line
String line;
// Keep reading buffered lines and print them.
while ((line = br.readLine()) != null) {
printWriter.println(line);
}
Now here, instead of going back to the file for every character, a large block is fetched and kept in RAM, and readLine() hands you a whole line from it; when you are done processing the data, you can write the whole line back to the hard disk. This makes the process run much faster than fetching one character at a time.
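Here is the same readLine() loop as a self-contained sketch, assuming Java 11+ (for the FileReader(File, Charset) constructor) and a UTF-8 input file. copyLines is a hypothetical helper, and a StringWriter-backed PrintWriter stands in for the printWriter above. try-with-resources closes the BufferedReader, which also closes the FileReader it wraps.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class LineCopy {

    // Copies 'source' line by line into 'target' and returns the line count.
    static int copyLines(Path source, Writer target) throws IOException {
        int count = 0;
        try (BufferedReader br = new BufferedReader(
                new FileReader(source.toFile(), StandardCharsets.UTF_8))) { // Java 11+
            PrintWriter printWriter = new PrintWriter(target); // not closed: caller owns 'target'
            String line;
            // readLine() is served from br's in-memory buffer; the FileReader is
            // only asked for another block when that buffer runs out.
            while ((line = br.readLine()) != null) {
                printWriter.println(line);
                count++;
            }
            printWriter.flush();
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".txt");
        Files.write(tmp, "first\nsecond\r\nthird".getBytes(StandardCharsets.UTF_8));
        StringWriter sink = new StringWriter();
        System.out.println(copyLines(tmp, sink) + " lines"); // 3 lines
        System.out.print(sink);
        Files.delete(tmp);
    }
}
```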
But again, why do we need to pass a FileReader object to the BufferedReader? Can't we just say "buffer this file" and let the BufferedReader take care of the rest? Wouldn't that be sweet?
Well, the BufferedReader class is designed so that it only knows how to create a buffer and store incoming data. It doesn't care where the data is coming from, so the same class can be used for many other input sources than just text files.
That being said, when you provide a FileReader object as input, it buffers the file; in the same way, if you provide an InputStreamReader, it buffers the terminal/console input data until it hits a newline symbol. For example:
// Object that reads console inputs
InputStreamReader console = new InputStreamReader(System.in);
BufferedReader br = new BufferedReader(console);
System.out.println(br.readLine());
This way, you can read (or buffer) many kinds of sources with the same BufferedReader class, such as text files, the console, network data, etc., and all you have to remember is
bufferedReader.readLine();
to get the next line of whatever you've buffered.
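A sketch of that idea: one helper that accepts any Reader and wraps it in a BufferedReader. A StringReader and a byte-array-backed InputStreamReader are used here as stand-ins for a file or the console (assumptions, so the example runs without real I/O); readLines is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class AnySource {

    // BufferedReader only needs *some* Reader; it doesn't care where the chars come from.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Characters already in memory...
        System.out.println(readLines(new StringReader("one\ntwo"))); // [one, two]
        // ...or bytes decoded by an InputStreamReader; System.in or a socket's
        // input stream would plug in the same way.
        byte[] bytes = "three\nfour".getBytes(StandardCharsets.UTF_8);
        Reader decoded = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.println(readLines(decoded)); // [three, four]
    }
}
```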
In simple terms:
A FileReader class is a general tool to read in characters from a File. The BufferedReader class can wrap around Readers, like FileReader, to buffer the input and improve efficiency. So you wouldn't use one over the other, but both at the same time by passing the FileReader object to the BufferedReader constructor.
In more detail:
FileReader is used for input of character data from a disk file. The input file can be an ordinary ASCII, one byte per character text file. A Reader stream automatically translates the characters from the disk file format into the internal char format. The characters in the input file might be from other alphabets supported by the UTF format, in which case there will be up to three bytes per character. In this case, also, characters from the file are translated into char format.
As with output, it is good practice to use a buffer to improve efficiency. Use BufferedReader for this. This is the same class we've been using for keyboard input. These lines should look familiar:
BufferedReader stdin =
new BufferedReader(new InputStreamReader( System.in ));
These lines create a BufferedReader, but connect it to an input stream from the keyboard, not to a file.
Source: http://www.oopweb.com/Java/Documents/JavaNotes/Volume/chap84/ch84_3.html
BufferedReader requires a Reader, of which FileReader is one - it descends from InputStreamReader, which descends from Reader.
FileReader - read character files
BufferedReader - "Read text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines."
http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
http://docs.oracle.com/javase/7/docs/api/java/io/FileReader.html
Actually BufferedReader makes use of Readers like FileReader.
The FileReader class helps in reading a file, but its efficiency is low since it has to retrieve one character at a time from the file. BufferedReader, on the other hand, takes chunks of data and stores them in a buffer, so instead of retrieving one character at a time from the file, retrieval happens from the buffer and becomes easy.
BufferedReader - a class that you can actually use as a substitute for Scanner; it can read from a file or from other input.
FileReader - as the name suggests.
OK, so I am learning about I/O, and I found the following code in one of the slides. Can someone please explain why there is a need to have a FileWriter, BufferedWriter and PrintWriter? I know BufferedWriter is there to buffer the output and put it all at once, but why would they use FileWriter and PrintWriter? Don't they pretty much do the same thing, with a bit of difference in error handling etc.?
And also why do they pass bw to PrintWriter?
FileWriter fw = new FileWriter (file);
BufferedWriter bw = new BufferedWriter (fw);
PrintWriter outFile = new PrintWriter (bw);
Presumably they're using a FileWriter because they want to write to a file. Both BufferedWriter and PrintWriter have to be given another writer to write to - you need some eventual destination.
(Personally I don't like FileWriter as it doesn't let you specify the encoding. I prefer to use FileOutputStream wrapped in an OutputStreamWriter, but that's a different matter.)
BufferedWriter is used for buffering, as you say - although it doesn't buffer all the output, just a fixed amount of it (the size of the buffer). It creates "chunkier" writes to the underlying writer.
As for the use of PrintWriter - well, that exposes some extra methods such as println. Personally I dislike it as it swallows exceptions (you have to check explicitly with checkError, which still doesn't give the details and which I don't think I've ever seen used), but again it depends on what you're doing. The PrintWriter is passed the BufferedWriter as its destination.
So the code below the section you've shown will presumably write to the PrintWriter, which will write to the BufferedWriter, which will (when its buffer is full, or it's flushed or closed) write to the FileWriter, which will in turn convert the character data into bytes on disk.
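A sketch of the same chain with the FileWriter swapped for the FileOutputStream + OutputStreamWriter combination mentioned above, so the encoding (UTF-8 here, an assumption) is explicit. It also shows the checkError() call, since PrintWriter won't throw on a failed write; writeReport is a hypothetical name.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class WriterChain {

    // PrintWriter (println) -> BufferedWriter (chunky writes)
    // -> OutputStreamWriter (chars to UTF-8 bytes) -> FileOutputStream (the file).
    static void writeReport(Path file, String... lines) throws IOException {
        try (PrintWriter out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)))) {
            for (String line : lines) {
                out.println(line);
            }
            // PrintWriter never throws IOException; checkError() flushes and
            // reports whether any write failed, but not why.
            if (out.checkError()) {
                throw new IOException("writing " + file + " failed");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("report", ".txt");
        writeReport(tmp, "first line", "second line");
        System.out.println(Files.readAllLines(tmp, StandardCharsets.UTF_8)); // [first line, second line]
        Files.delete(tmp);
    }
}
```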
From the Docs:
In general, a Writer sends its output immediately to the underlying character or byte stream. Unless prompt output is required, it is advisable to wrap a BufferedWriter around any Writer whose write() operations may be costly, such as FileWriters and OutputStreamWriters. For example,
PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("foo.out")));
will buffer the PrintWriter's output to the file. Without buffering, each invocation of a print() method would cause characters to be converted into bytes that would then be written immediately to the file, which can be very inefficient.
You can understand from this that a BufferedWriter is an efficient way to write stuff.
Writes text to a character-output stream, buffering characters so as to provide for the efficient writing of single characters, arrays, and strings.
A FileWriter object is passed to the BufferedWriter as the intent here is to write to some output file using a BufferedWriter.
And finally, a PrintWriter is used for print* methods like println().
PrintWriter from here
Prints formatted representations of objects to a text-output stream. This class implements all of the print methods found in PrintStream. It does not contain methods for writing raw bytes, for which a program should use unencoded byte streams.
From the above statement, it seems the main reason to use PrintWriter is to get access to all the print methods of PrintStream, like println(), println(char[] x) etc.
BufferedWriter: you are right, it's one of the best ways to write to a file, because it buffers the characters in memory before writing them to the file, and it comes with a newLine() method.
FileWriter from here
FileWriter is meant for writing streams of characters. For writing streams of raw bytes, consider using a FileOutputStream.
FileWriter is simply for writing plain text (without formatting); it doesn't provide a buffering mechanism of its own, and whatever comes its way it just writes.
BufferedWriter is a wrapper for Writer classes that lets them use buffering (to optimize I/O).
PrintWriter prints formatted text: you can provide a format string along with the data to be printed. Although it can work directly with any Writer/OutputStream, the Writer/OutputStream is first passed to a BufferedWriter to provide buffering, and that is then passed to the PrintWriter to get formatted text.
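For the format-string side, a small sketch of PrintWriter.printf(). A StringWriter is used as the destination instead of a file (an assumption, to keep it self-contained), and Locale.ROOT pins the decimal separator so the output doesn't vary by machine; formatRow is a hypothetical name.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Locale;

class FormattedOutput {

    // Formats one row with a format string instead of concatenating pieces by hand.
    static String formatRow(String name, double price, int quantity) {
        StringWriter sink = new StringWriter();
        try (PrintWriter out = new PrintWriter(sink)) {
            // %-6s: left-aligned in 6 columns, %8.2f: 8 wide with 2 decimals
            out.printf(Locale.ROOT, "%-6s %8.2f x%d", name, price, quantity);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.println(formatRow("apple", 1.5, 3));
    }
}
```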
Usually, this kind of Writer chaining is about abstraction. PrintWriter has some useful print and println methods that can be convenient if you want to print Strings and lines to a file. Working directly with FileWriter, you would have to use a more "low-level" API. And, as you say, BufferedWriter is about buffering. So it's basically a matter of what you want to output to the file, and what level of abstraction you prefer.
The objects are wrapped in this order because you want to use the outermost PrintWriter for its more sophisticated formatting. BufferedWriter must be wrapped on something. So FileWriter, as a result, is what BufferedWriter wraps and is the innermost object.
I'm using the FindBugs program from the University of Maryland and it gives me this error.
I've tested my code on numerous platforms and it works, so why is this code bad-practice, and what can I do to improve it?
It's telling you the encoding (how the string is turned into bytes) isn't specified.
If you write a text file in Turkey, and load it up in Uzbekistan then you might get different results. Instead (for example) you could specify the encoding directly by converting the string to bytes yourself using a specified encoding (see String.getBytes for an example).
You need to specify the charset.
You can use an OutputStreamWriter:
fileWriter = new OutputStreamWriter(new FileOutputStream(file),charset);
See the FileWriter documentation: "The constructors of this class assume that the default character encoding and the default byte-buffer size are acceptable. To specify these values yourself, construct an OutputStreamWriter on a FileOutputStream."
It can be considered bad practice to depend on default character encoding.
Use FileOutputStream instead of FileWriter; it can be wrapped in an OutputStreamWriter, which lets you pass an encoding to the constructor.
Otherwise, as Jeff said, the data won't load correctly.
Example
OutputStream fout = new FileOutputStream("test.txt");
OutputStream bout = new BufferedOutputStream(fout);
OutputStreamWriter out = new OutputStreamWriter(bout, "UTF-8");
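A round-trip sketch of that idea: write with an explicit UTF-8 OutputStreamWriter (as in the example above) and read back with an InputStreamReader using the same charset, so the bytes on disk don't depend on the platform default. The class and method names are hypothetical, and UTF-8 is just the charset picked for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharset {

    // Same wrapping as the example above, with the charset given as a constant.
    static void write(Path file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new BufferedOutputStream(new FileOutputStream(file.toFile())), StandardCharsets.UTF_8)) {
            out.write(text);
        } // closing the writer flushes and closes the whole chain
    }

    // Reads the file back with the same charset, independent of the platform default.
    static String read(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(
                new BufferedInputStream(new FileInputStream(file.toFile())), StandardCharsets.UTF_8)) {
            char[] chunk = new char[1024];
            int n;
            while ((n = in.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("charset", ".txt");
        String text = "Merhaba d\u00fcnya";
        write(tmp, text);
        System.out.println(read(tmp).equals(text)); // true
        Files.delete(tmp);
    }
}
```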
I was wondering what the exact difference is between Android's FileOutputStream and FileWriter class. When would it be most appropriate to use each one?
If I remember correctly, FileOutputStream is more general purpose - it can be used for binary data or text data. FileWriter is used for text only.
http://docs.oracle.com/javase/1.4.2/docs/api/java/io/FileWriter.html
FileWriter is meant for writing streams of characters. For writing streams of raw bytes, consider using a FileOutputStream.
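A small sketch of the difference, assuming a Java 11+ JVM (FileWriter only gained its Charset constructor in Java 11; on older platforms, possibly including older Android versions, wrap a FileOutputStream in an OutputStreamWriter instead). The helper names are hypothetical.

```java
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BytesVsChars {

    // Raw bytes: written to the file exactly as given, no charset involved.
    static long writeBytes(Path file, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file.toFile())) {
            out.write(data);
        }
        return Files.size(file);
    }

    // Characters: encoded to bytes first (FileWriter(File, Charset) needs Java 11+).
    static long writeText(Path file, String text) throws IOException {
        try (Writer out = new FileWriter(file.toFile(), StandardCharsets.UTF_8)) {
            out.write(text);
        }
        return Files.size(file);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("out", ".bin");
        byte[] header = {(byte) 0x89, 'P', 'N', 'G'}; // e.g. the start of a PNG file
        System.out.println(writeBytes(tmp, header));      // 4
        System.out.println(writeText(tmp, "h\u00e9llo")); // 6: 'é' becomes two bytes
        Files.delete(tmp);
    }
}
```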
I'm looking at the following example, which uses the following code:
try {
BufferedWriter out = new BufferedWriter(new FileWriter("outfilename"));
out.write("aString");
out.close();
}
catch (IOException e) {}
What's the advantage over doing
FileWriter fw = new FileWriter("outfilename");
I have tried both and they seem comparable in speed when it comes to the task of appending to a file one line at a time
The Javadoc provides a reasonable discussion on this subject:
In general, a Writer sends its output immediately to the underlying character or byte stream. Unless prompt output is required, it is advisable to wrap a BufferedWriter around any Writer whose write() operations may be costly, such as FileWriters and OutputStreamWriters. For example,
PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("foo.out")));
will buffer the PrintWriter's output to the file. Without buffering, each invocation of a print() method would cause characters to be converted into bytes that would then be written immediately to the file, which can be very inefficient.
If you're writing large blocks of text at once (like entire lines) then you probably won't notice a difference. If you have a lot of code that appends a single character at a time, however, a BufferedWriter will be much more efficient.
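One way to see what the buffer buys you without timing anything: count how many times the underlying Writer is actually asked to write. CountingWriter is a hypothetical stand-in for a FileWriter, and the explicit 8192 buffer size just mirrors BufferedWriter's usual default.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;

class WriteCallCounter {

    // A do-nothing Writer that only counts how often it is asked to write.
    static final class CountingWriter extends Writer {
        int calls;
        @Override public void write(char[] cbuf, int off, int len) { calls++; }
        @Override public void flush() {}
        @Override public void close() {}
    }

    // Writes 'n' single characters to 'target', then flushes.
    static void writeChars(Writer target, int n) throws IOException {
        for (int i = 0; i < n; i++) {
            target.write('x');
        }
        target.flush();
    }

    public static void main(String[] args) throws IOException {
        CountingWriter direct = new CountingWriter();
        writeChars(direct, 1000);

        CountingWriter behindBuffer = new CountingWriter();
        writeChars(new BufferedWriter(behindBuffer, 8192), 1000);

        System.out.println("unbuffered: " + direct.calls);      // 1000
        System.out.println("buffered:   " + behindBuffer.calls); // 1
    }
}
```

With the buffer, the 1000 single-character writes reach the underlying writer as a single write(char[], int, int) call when flush() empties the buffer.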
Edit
As per Andrew's comment below, the FileWriter actually uses its own fixed-size 1024-byte buffer. This was confirmed by looking at the source code. The BufferedWriter sources, on the other hand, show that it uses an 8192-character buffer by default, which can be configured by the user to any other desired size. So it seems like the benefits of BufferedWriter vs. FileWriter are limited to:
Larger default buffer size.
Ability to override/customize the buffer size.
And to further muddy the waters, the Java 6 implementation of OutputStreamWriter actually delegates to a StreamEncoder, which uses its own buffer with a default size of 8192 bytes. And the StreamEncoder buffer is user-configurable, although there is no way to access it directly through the enclosing OutputStreamWriter.
This is explained in the Javadoc for OutputStreamWriter. A FileWriter does have a buffer (in the underlying OutputStreamWriter), but the character encoding converter is invoked on each call to write(). Using an outer buffer avoids calling the converter so often.
http://download.oracle.com/javase/1.4.2/docs/api/java/io/OutputStreamWriter.html
The effect of a buffer is easier to see when the load is high. Loop the out.write() call a couple of thousand times and you should see a difference.
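A rough sketch of that experiment, assuming a writable temp directory and the platform default charset for FileWriter. It is a single, un-warmed measurement, so the numbers are only indicative (a harness such as JMH would be needed for a proper benchmark), and timeSingleCharWrites is a hypothetical helper.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferTiming {

    // Writes 'n' single characters, closes the writer, and returns elapsed nanoseconds.
    static long timeSingleCharWrites(Writer out, int n) throws IOException {
        long start = System.nanoTime();
        try (Writer w = out) {
            for (int i = 0; i < n; i++) {
                w.write('x');
            }
        } // close() flushes anything still buffered
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        int n = 200_000;
        Path plain = Files.createTempFile("plain", ".txt");
        Path buffered = Files.createTempFile("buffered", ".txt");

        long tPlain = timeSingleCharWrites(new FileWriter(plain.toFile()), n);
        long tBuffered = timeSingleCharWrites(new BufferedWriter(new FileWriter(buffered.toFile())), n);

        System.out.printf("FileWriter:                 %d ms%n", tPlain / 1_000_000);
        System.out.printf("BufferedWriter(FileWriter): %d ms%n", tBuffered / 1_000_000);
        System.out.println(Files.size(plain) == Files.size(buffered)); // same content either way

        Files.delete(plain);
        Files.delete(buffered);
    }
}
```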
For a few bytes passed in a single call, the BufferedWriter is probably even slightly worse, because it still has to call down to the FileOutputStream afterwards anyway.