I know that if I do something like
copyFromInToOut(new FileInputStream(f1), new FileOutputStream(f2));
System.gc();
It will run the GC on those FileInputStreams, closing them. But if I do
copyFromInToOut(new BufferedInputStream(new FileInputStream(f1)), new BufferedOutputStream(new FileOutputStream(f2)));
System.gc();
Is there any danger that the FileOutputStream will be GCed before the BufferedOutputStream, not causing the buffer to flush?
I would rather not call flush() or close(), because that takes more steps than this: I would first have to declare the buffered stream in a variable, pass it in, and then call close() on it. Or am I safe to do this?
Don't call System.gc() explicitly. Don't rely on finalizers to do anything. Especially if you don't understand how garbage collection works. Explicit garbage collection requests can be ignored, and finalizers might never run.
A well-written copyFromInToOut method for streams is likely to use its own buffer internally, so wrapping the output should be unnecessary.
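As a rough illustration of what "uses its own buffer internally" could look like, here is a minimal sketch. The class name StreamCopy, the 8192-byte buffer size, and the long return value are assumptions made for this example; the actual copyFromInToOut from the question is not shown anywhere in the thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopy {
    // Hypothetical stand-in for the copyFromInToOut method from the question.
    static long copyFromInToOut(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // internal buffer, so callers need no Buffered* wrappers
        long total = 0;
        int n;
        // read() returns -1 at end of stream; otherwise the number of bytes placed in buffer
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total; // number of bytes copied
    }
}
```

Because each read and write already moves a whole block, wrapping the arguments in BufferedInputStream or BufferedOutputStream would mostly add an extra copy.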
Declare variables for the FileInputStream and FileOutputStream, and invoke close() on each in a finally block:
FileInputStream is = new FileInputStream(f1);
try {
    FileOutputStream os = new FileOutputStream(f2);
    try {
        copyFromInToOut(is, os);
        os.flush();
    } finally {
        os.close();
    }
} finally {
    is.close();
}
No InputStream implementation that I am aware of will close() for you when it is garbage collected. You MUST close() the InputStream manually.
EDIT: Apparently FileInputStream does close() for you in a finalize method which I wasn't aware of, but see other answers for the reason why you shouldn't rely on this.
In both your examples above you must close both the Input and Output streams. For the wrapped buffer case, and for any wrapped case, you should need only call close() on the outer-most InputStream, in this case BufferedInputStream
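A small sketch of the "close only the outermost stream" rule for the output side, assuming a hypothetical helper class CloseOutermost that writes to a java.nio.file.Path. It relies on the documented behavior that closing a BufferedOutputStream flushes it and then closes the stream it wraps.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class CloseOutermost {
    // Writes data through a buffered wrapper and closes only the outermost stream.
    static long writeAndClose(Path target, byte[] data) throws IOException {
        OutputStream out = new BufferedOutputStream(new FileOutputStream(target.toFile()));
        try {
            out.write(data); // small writes stay in the buffer for now
        } finally {
            out.close(); // flushes the buffer, then closes the wrapped FileOutputStream
        }
        return Files.size(target); // bytes that actually reached the file
    }
}
```

The returned size equals data.length, which shows that the buffered bytes were written even though flush() was never called directly.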
Well, it's interesting to analyze what really happens when you do that, but just don't do it.
Always close your streams explicitly.
(To answer your question: yes, bytes still held in buffers may not have been flushed to the file streams when GC occurs, and they are lost.)
Streams should be closed explicitly using the pattern shown in @erickson's answer. Relying on finalization to close streams for you is a really bad idea:
Calling System.gc() is expensive, especially since (if it does anything) it is likely to trigger a full garbage collection. That will cause every reference in every reachable object in your heap to be traced.
If you read the javadocs for System.gc() you will see that it is only a "hint" to the JVM to run the GC. A JVM is free to ignore the hint ... which takes us to the next problem.
If you don't run the GC explicitly, it might be a long time until the GC runs. And even then, there's no guarantee that finalizers are run immediately.
In the mean time:
all open files stay open, possibly preventing other applications using them
any unwritten data in the streams remains unwritten
your Java application might even run into problems opening other streams due to running out of file descriptor slots.
And there's one last problem with relying on finalization to deal with output streams. If there is unflushed data in the stream when it is finalized, an output stream class will attempt to flush it. But this is all happening on a JVM internal thread, not one of your application's threads. So if the flush fails (e.g. because the file system is full), your application won't be able to catch the resulting exception, and therefore won't be able to report it ... or do anything to recover.
EDIT
Returning to the original question, it turns out that the BufferedOutputStream class does not override the default Object.finalize() method. So that means that a BufferedOutputStream is not flushed at all when it is garbage collected. Any unwritten data in the buffer will be lost.
That's yet another reason for closing your streams explicitly. Indeed, in this particular case, calling System.gc() is not just bad practice; it is also likely to lead to loss of data.
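The effect can be observed without involving the garbage collector at all. The following sketch uses a ByteArrayOutputStream as a stand-in for the file (an assumption made so the result is easy to inspect) and a hypothetical class name UnflushedBufferDemo. It shows that bytes written to a BufferedOutputStream do not reach the underlying stream until something flushes them.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

final class UnflushedBufferDemo {
    // Returns how many bytes reached the sink before and after an explicit flush.
    static int[] bytesInSinkBeforeAndAfterFlush() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the file
        BufferedOutputStream buffered = new BufferedOutputStream(sink);
        buffered.write(new byte[100]); // far smaller than the buffer, so it stays in memory
        int before = sink.size();      // nothing has reached the sink yet
        buffered.flush();              // pushes the buffered bytes to the sink
        int after = sink.size();
        return new int[] {before, after};
    }
}
```

If the wrapper were simply dropped after the write, with no flush() or close(), those 100 bytes would never reach the sink.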
Closing and flushing IO resources is very important and seldom done correctly (at least by me). The reason for this is that most of the time, it still works without doing it correctly. Files are closed by the garbage collector, which happens from time to time in most applications. Flushing is done automatically when a stream is closed (possibly also by the garbage collector) or when a lot of data is written.
Java 1.7's try-with-resources makes it much easier to close IO resources if their lifetime coincides with the lifetime of a local variable. Not so much if they should e.g. live as long as some other object, but that is another story.
Since I started writing programs complex enough to need resources that wrap other resources, I find that it's much harder to decide what to close and/or flush than when to do it. Examples of wrapping a resource in another resource are:
Creating an InputStreamReader from an InputStream.
Creating an InputStream from a ReadableByteChannel.
Creating a DataOutputStream from an OutputStream.
Creating a PrintStream or OutputStreamWriter from an OutputStream.
This may also happen multiple layers deep, like wrapping a ReadableByteChannel in an InputStream in a GZIPInputStream in an InputStreamReader in a BufferedReader (never had to do that but seems plausible). Almost always the wrapping and the wrapped resources should have the same lifetime and it is most convenient if flushing can be done on the outermost resource, where writing is also done, so that only one object needs to be passed around.
In all this time I've never seen a satisfactory explanation of how closing and flushing interact with resources wrapped in other resources. My assumptions are the following:
Flushing a resource (i.e. calling flush() on it) also flushes wrapped resources recursively until data is pushed onto e.g. the disk or the network.
Closing a resource (i.e. calling close() on it) also closes wrapped resources recursively until some operating system resource is freed.
Now to my question; are these assumptions correct when using JDK implementations of IO resources, specifically of the interfaces InputStream, OutputStream, ReadableByteChannel, WritableByteChannel, Reader and Writer?
If one or both assumptions are not correct at all, what assumptions would be better?
If those assumptions are not always correct, where does the behavior of an implementation differ and what are the reasons?
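One way to probe these assumptions for a specific wrapper chain is to put a recording stream at the bottom and watch what reaches it. The sketch below does this for OutputStreamWriter over BufferedOutputStream only; RecordingSink and WrapperPropagationDemo are hypothetical names invented for the example, and the result says nothing about channels or third-party wrappers.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class WrapperPropagationDemo {
    // Hypothetical sink that records whether flush() and close() reach it.
    static final class RecordingSink extends ByteArrayOutputStream {
        boolean flushed;
        boolean closed;

        @Override
        public void flush() throws IOException {
            flushed = true;
            super.flush();
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Summarizes what the innermost stream saw at each step.
    static String observe() throws IOException {
        RecordingSink sink = new RecordingSink();
        Writer writer = new OutputStreamWriter(new BufferedOutputStream(sink), StandardCharsets.UTF_8);
        writer.write("hello");
        int before = sink.size();            // data is still held in the wrappers
        writer.flush();                      // flush on the outermost resource
        int afterFlush = sink.size();
        boolean flushed = sink.flushed;
        boolean closedBeforeClose = sink.closed;
        writer.close();                      // close on the outermost resource
        return "before=" + before + " afterFlush=" + afterFlush + " flushed=" + flushed
                + " closedBeforeClose=" + closedBeforeClose + " closedAfterClose=" + sink.closed;
    }
}
```

For this particular chain both flush() and close() travel all the way down to the innermost stream, which matches the two assumptions.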
The question says it all.
What are the consequences of not closing the various byte streams?
It is very much emphasized to always do so, but there is no mention of how it causes problems.
Can someone please explain what actually happens?
This is not only byte streams. This concerns anything implementing Closeable.
As the documentation states:
The close method is invoked to release resources that the object is holding (such as open files).
Whether a Closeable holds system resources or not, the rule of thumb is: do not take the chance. Call .close() on it correctly, and you can be sure that such system resources (if any) are freed.
Typical idiom (note that InputStream implements Closeable):
final InputStream in = whateverIsNeeded;
try {
    workWith(in);
} finally {
    in.close();
}
With Java 7 you also have AutoCloseable (which Closeable implements) and the try-with-resources statement, so do:
try (
    final InputStream in = whateverIsNeeded;
) {
    workWith(in);
}
This will handle closing in for you.
Again: don't take the chance. And if you don't use JDK 7 but can afford Guava, use Closer.
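When a try-with-resources statement declares several resources, they are closed in the reverse order of their declaration once the body finishes. A minimal sketch that makes the order visible, using lambdas as stand-in AutoCloseable resources (CloseOrderDemo and the log messages are made up for the example):

```java
import java.util.ArrayList;
import java.util.List;

final class CloseOrderDemo {
    // Records the order in which the body and the close() calls run.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (AutoCloseable first = () -> log.add("close first");
             AutoCloseable second = () -> log.add("close second")) {
            log.add("body");
        } catch (Exception e) {
            // AutoCloseable.close() is declared to throw Exception
            log.add("failed: " + e.getMessage());
        }
        return log; // [body, close second, close first]
    }
}
```

Declaring an underlying stream first and its wrapper second therefore closes the wrapper before the stream it depends on.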
Not closing limited resources such as database connections will dramatically slow down execution, and likely result in errors as those connections run out, with old ones sitting there unused.
Not closing file-streams could result in multiple threads writing to the same file, or files not being terminated properly, or files being locked when another thread attempts to write or read it.
This is a major topic relating to all Closeables, as stated by @fge. There are numerous libraries supplying things such as connection pools and caches for handling problems such as this.
More information:
https://www.google.com/search?q=consequentes+of+not+closing+resources+java
It will hang around until collected by the GC, thus holding on to unmanaged resources (files, sockets, etc.).
There are several streams including:
ByteArray
File
Filter
Object
Piped
Corba version of the outputStream
Depending on what kind of resource is behind the stream, the result can differ. Take ByteArrayInputStream and ByteArrayOutputStream, where the documentation says:
Closing a ByteArrayInputStream has no effect. The methods in this
class can be called after the stream has been closed without
generating an IOException.
But in FileInputStream there is an open file. If you keep it open, memory stays reserved and anybody who tries to edit the file may find it locked. In case of doubt, always call the close() method.
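The documented ByteArrayInputStream behavior quoted above is easy to check. This is a minimal sketch with a hypothetical class name; it only demonstrates the in-memory case and says nothing about file-backed streams.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;

final class ByteArrayCloseDemo {
    // Reads from a ByteArrayInputStream after it has been closed.
    static int readAfterClose() throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(new byte[] {42});
        in.close();       // documented to have no effect
        return in.read(); // still returns the first byte, no IOException
    }
}
```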
I don't understand. For example, say we have a variable of type OutputStream in our code. We should call close() on it when we stop using it. Why wasn't it implemented so that the GC calls close() itself when it cleans up this variable?
Update:
OK, here is what I've concluded so far: releasing unmanaged resources is not only about releasing memory. More importantly, we don't know the internal behavior behind them; there could be a limit on the amount of the resource (number of connections or handles), which is why we need to free them as soon as possible. Am I right? Because if it were all about memory, I don't see why the GC couldn't do the job just the way it does with managed resources.
close will in most cases eventually be called automatically by the GC through the finalize method (any class can have such a finalize method, which is called by the GC when destroying the object; for Closeable types that hold resources it will generally be implemented to call close). The issue of course is that you just don't have control over when this happens (if ever). It could be 10 sec or 10 min from when you stopped needing the object, and it is still using up whatever resource was allocated. So it's good style to clean up your resource handles once you don't need them anymore.
Also, since Java 7 you can actually do this:
try (BufferedReader br =
         new BufferedReader(new FileReader(path))) {
    return br.readLine();
}
And close will be called automatically at the end of the try block.
See the documentation for more details.
And finally, as to why there is Closeable and things that need explicit closing in the first place: this is for resources that are not directly managed by the JVM and hence cannot be reclaimed automatically by the GC. Such resources are, for instance, files, sockets or audio output.
Assume that I have the following code fragment:
operation1();
bw.close();
operation2();
When I call BufferedReader.close() from my code, I am assuming my JVM makes a system call that ensures that the buffer has been flushed and written to disk. I want to know if close() waits for the system call to complete its operation or does it proceed to operation2() without waiting for close() to finish.
To rephrase my question, when I do operation2(), can I assume that bw.close() has completed successfully?
when I do operation2(), can I assume that bw.close() has completed successfully?
Yes
Close the stream, flushing it first. Once a stream has been closed, further write() or flush() invocations will cause an IOException to be thrown. Closing a previously-closed stream, however, has no effect.
Though the documentation does not say anything specifically, I would assume this call does block until finished. In fact, I'm pretty sure nothing in the java.io package is non-blocking.
The JavaDoc for java.io.BufferedReader.close() is taken exactly from the contract it fulfills with java.io.Reader.
The Doc says:
Closes the stream and releases any system resources associated with it. Once the stream has been closed, further read(), ready(), mark(), reset(), or skip() invocations will throw an IOException. Closing a previously closed stream has no effect.
While this makes no explicit claim of blocking until the file system operation is complete, with this same instance of BufferedReader all other operations will throw an exception once close() returns. Although the JavaDoc could be seen as ambiguous about when the operation completes, if the file system flush and close were not complete when this method returned it would violate the spirit of the contract and be a bug in Java (implementation or documentation).
NO! You cannot be sure for the following reason:
A BufferedWriter is a Wrapper for another Writer. A close() to the BufferedWriter just propagates to the underlying Writer.
IF this underlying Writer is an OutputStreamWriter, and IF the OutputStream is a FileOutputStream, THEN the close will issue a system call to close the file handle.
You are completely free to even have a Writer where close() is a noop, or where the close is implemented non-blocking, but when using only classes from java.io, this is never the case.
A Writer (or BufferedWriter) is a black box that writes a stream of characters somewhere, not necessarily to the disk. A call to close() must (by method contract) flush its buffered content before closing, and should (normally) block until all its "essential" work is done. But this depends on the implementation and the environment (you cannot know about caches that are below the Java layer, for example). As far as the work done by the Java writer itself is concerned (e.g. making the system call to write to disk, in the case of a FileWriter or similar, and closing the file handle), yes, you can assume that when close() returns it has already done all its work.
In general with any i/o operation you can make no assumptions about what has happened after the write() operation completes, even after you close. The idea of delivery is a subjective concept relative to the medium.
For instance, what if the writer represents a TCP connection, and then the data is lost in between client and server? Or what if the kernel writes data to a disk, but the drive physically fails to write it? Or if the writer represents a carrier pigeon that gets shot en route?
Furthermore, imagine the case when the write has no way of confirming that the endpoint has received the data (read: udp/datagrams). What should the blocking policy be in that situation?
The buffer will have been flushed to the operating system and the file handle closed, so the Java operations required will have been completed.
BUT the operating system will have cached or queued the write to the actual disk, pipe, network, whatever - there is no guarantee that the physical write has completed. FileChannel.force() provides a way to do that for files on local disks: see the Javadoc.
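A minimal sketch of using FileChannel.force() before closing, assuming the data goes to a regular file on a local disk. The class and method names (ForceToDisk, writeDurably) are invented for the example, and what "durable" means in the end still depends on the operating system and the storage device.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ForceToDisk {
    // Writes text to target and asks the OS to push it to the storage device before closing.
    static void writeDurably(Path target, String text) throws IOException {
        try (FileChannel channel = FileChannel.open(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                channel.write(buf); // a single write() may not consume the whole buffer
            }
            channel.force(true); // true: also force file metadata, not only content
        }
    }
}
```

For code that already holds a FileOutputStream, getChannel().force(true) on that stream serves the same purpose.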
Yes, IF you reach operation2();, the stream would've had to have been completely closed. However, close() throws IOException, so you may not even get to operation2();. This may or may not be the behavior that you expect.
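To make the exception case concrete, here is a sketch in which the underlying Writer fails on close(). The failing Writer and its error message are simulated for the example; a real failure would come from the file system or the network.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

final class CloseFailureDemo {
    // Logs which steps run when close() on the wrapped Writer throws.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        // Simulated underlying Writer whose close() always fails.
        Writer failing = new Writer() {
            @Override
            public void write(char[] cbuf, int off, int len) { }

            @Override
            public void flush() { }

            @Override
            public void close() throws IOException {
                throw new IOException("simulated close failure");
            }
        };
        Writer bw = new BufferedWriter(failing);
        try {
            log.add("operation1");
            bw.close();            // propagates the IOException from the wrapped Writer
            log.add("operation2"); // skipped when close() throws
        } catch (IOException e) {
            log.add("close failed: " + e.getMessage());
        }
        return log;
    }
}
```

The log ends up containing operation1 and the failure message, but never operation2.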
I have a question: when writing to a file, should we call flush() before closing? If so, what exactly will it do? Don't streams flush automatically?
EDIT:
So what does flush actually do?
Writers and streams usually buffer some of your output data in memory and try to write it in bigger blocks at a time. Flushing causes an immediate write from the buffer to disk, so if the program crashes that data won't be lost. Of course there's no guarantee, as the disk may not physically write the data immediately, so it could still be lost. But then it wouldn't be the Java program's fault :)
A PrintWriter flushes automatically on println, printf, or format, but only if it was constructed with autoFlush set to true; auto-flush is off by default. And of course streams and buffers flush when you close them. Other than that, flushing happens only when the buffer is full.
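In the JDK, PrintWriter auto-flush is opt-in through a constructor flag and applies to println, printf, and format. The sketch below compares both settings, using a ByteArrayOutputStream as a stand-in sink so the effect can be measured (AutoFlushDemo is a hypothetical name).

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

final class AutoFlushDemo {
    // Returns how many bytes are visible in the sink right after println, with no explicit flush().
    static int bytesAfterPrintln(boolean autoFlush) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter pw = new PrintWriter(sink, autoFlush);
        pw.println("hello");
        return sink.size();
    }
}
```

With autoFlush set to false the method returns 0, because the line is still sitting in the writer's buffer; with true the line has already reached the sink.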
I would highly recommend calling flush before close. Basically, it writes any remaining buffered data to the file.
If you call flush explicitly you may be sure that any IOException coming out of close is really catastrophic and related to releasing system resources.
When you flush yourself, you can handle its IOException in the same way as you handle your data write exceptions.
You don't need to do a flush because close() will do it for you.
From the javadoc:
"Close the stream, flushing it first. Once a stream has been closed, further write() or flush() invocations will cause an IOException to be thrown. Closing a previously-closed stream, however, has no effect."
To answer your question as to what flush actually does, it makes sure that anything you have written to the stream - a file in your case - does actually get written to the file there and then.
Java can perform buffering, which means that it will hold onto written data in memory until it has a certain amount, and then write it all to the file in one go, which is more efficient. The downside of this is that the file is not necessarily up-to-date at any given time. Flush is a way of saying "make the file up-to-date."
Close calls flush first to ensure that after closing the file has what you would expect to see in it, hence as others have pointed out, no need to flush before closing.
Close automatically flushes. You don't need to call it.
There's no point in calling flush() just before a close(), as others have said. The time to use flush() is if you are keeping the file open but want to ensure that previous writes have been fully completed.
As said, you don't usually need to flush.
It only makes sense if, for some reason, you want another process to see the complete contents of a file you're working with, without closing it. For example, it could be used for a file that is concurrently modified by multiple processes, although with a LOT of care :-)
FileWriter is an evil class as it picks up whatever character set happens to be there, rather than taking an explicit charset. Even if you do want the default, be explicit about it.
The usual solution is OutputStreamWriter and FileOutputStream. It is possible for the decorator to throw an exception. Therefore you need to be able to close the stream even if the writer was never constructed. If you are going to do that, you only need to flush the writer (in the happy case) and always close the stream. (Just to be confusing, some decorators, for instance for handling zips, have resources that do require closing.)
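A sketch of that pattern, assuming UTF-8 as the explicit charset and a hypothetical helper class ExplicitCharsetWrite: the writer is flushed on the happy path, and the stream is closed in a finally block so the file handle is released even if the decorator fails.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

final class ExplicitCharsetWrite {
    // Writes text to target using an explicit charset instead of the platform default.
    static void write(Path target, String text) throws IOException {
        OutputStream out = new FileOutputStream(target.toFile());
        try {
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            writer.write(text);
            writer.flush(); // happy path: push the encoded characters into the stream
        } finally {
            out.close();    // always release the file handle, even if the writer failed
        }
    }
}
```

The writer itself is never closed here; it holds no resources of its own beyond the stream, which the finally block takes care of.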
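Another use case for flushing in a program is writing the progress of a long-running job to a file, so that the job can be stopped and restarted later. You want to be sure that the data is safe on the drive.
// finished() is a hypothetical exit check; the original looped with while (true),
// which makes the close() after the loop unreachable.
while (!finished()) {
    computeStuff();
    progress += 1;
    out.write(String.format("%d", progress));
    out.flush(); // push the progress value out of the Java buffer right away
}
out.close();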