The question says it all.
What are the consequences of not closing the various byte streams?
It is very much emphasized to always do so, but there is no mention of how it causes problems.
Can someone please explain what actually happens?
This is not only byte streams. This concerns anything implementing Closeable.
As the documentation states:
The close method is invoked to release resources that the object is holding (such as open files).
Whether or not a Closeable holds system resources, the rule of thumb is: do not take the chance. Call .close() on it correctly, and you can be sure that any such system resources are freed.
Typical idiom (note that InputStream implements Closeable):
final InputStream in = whateverIsNeeded;
try {
workWith(in);
} finally {
in.close();
}
With Java 7 you also have AutoCloseable (which Closeable implements) and the try-with-resources statement, so do:
try (
final InputStream in = whateverIsNeeded;
) {
workWith(in);
}
This will handle closing in for you.
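To make the "handles closing for you" part concrete, here is a minimal sketch. TrackedResource is a hypothetical class invented for this illustration (it is not a JDK type); it only records the order of events so you can see that close() runs whether or not the body throws.

```java
import java.util.ArrayList;
import java.util.List;

class TryWithResourcesDemo {

    // Hypothetical resource that records when it is used and closed.
    static final class TrackedResource implements AutoCloseable {
        private final List<String> log;

        TrackedResource(List<String> log) {
            this.log = log;
        }

        void use(boolean fail) {
            log.add("use");
            if (fail) {
                throw new IllegalStateException("work failed");
            }
        }

        @Override
        public void close() {
            log.add("close");
        }
    }

    // Runs the resource inside try-with-resources and returns the event log.
    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try (TrackedResource r = new TrackedResource(log)) {
            r.use(fail);
        } catch (IllegalStateException e) {
            // The resource has already been closed when this block runs.
            log.add("caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [use, close]
        System.out.println(run(true));  // [use, close, caught]
    }
}
```

Note that in the failing case the resource is closed before the catch block executes, which is the same guarantee the try/finally idiom gives you by hand.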
Again: don't take the chance. And if you don't use JDK 7 but can afford Guava, use Closer.
Not closing limited resources such as database connections can dramatically slow down execution, and will likely result in errors as those connections run out while old ones sit there unused.
Not closing file streams could result in multiple threads writing to the same file, files not being terminated properly, or files being locked when another thread attempts to read or write them.
This is a major topic relating to all Closeables, as stated by @fge. There are numerous libraries supplying things such as connection pools and caches for handling problems such as this.
More information:
https://www.google.com/search?q=consequentes+of+not+closing+resources+java
It will hang around until collected by the GC, thus holding on to unmanaged resources (files, sockets, etc.).
There are several streams including:
ByteArray
File
Filter
Object
Piped
Corba version of the outputStream
Depending on what kind of resource is behind the stream, the result can differ. Take ByteArrayInputStream and ByteArrayOutputStream, where the documentation says:
Closing a ByteArrayInputStream has no effect. The methods in this
class can be called after the stream has been closed without
generating an IOException.
But a FileInputStream has an open file behind it. If you keep it open, memory stays reserved and anybody who tries to edit the file may find it locked. When in doubt, always call the close() method.
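The difference between the two kinds of stream can be observed directly. The following is a small sketch, not a benchmark: the class and method names are made up for the example, and it uses a temporary file as the file-backed case.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class CloseEffectDemo {

    // Reads one byte from an in-memory stream after it has been closed.
    static int readClosedByteArrayStream() throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(new byte[] {42});
        in.close();       // documented to have no effect
        return in.read(); // still returns the first byte
    }

    // Returns true if reading from a closed FileInputStream throws IOException.
    static boolean readClosedFileStreamFails() throws IOException {
        Path tmp = Files.createTempFile("close-demo", ".bin");
        try {
            Files.write(tmp, new byte[] {42});
            FileInputStream in = new FileInputStream(tmp.toFile());
            in.close();   // releases the underlying file descriptor
            try {
                in.read();
                return false;
            } catch (IOException expected) {
                return true;
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readClosedByteArrayStream());
        System.out.println(readClosedFileStreamFails());
    }
}
```

Since you usually cannot tell from an InputStream reference which kind you were handed, closing unconditionally is the safe default.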
In org.apache.hadoop.io.compress.GzipCodec, the GzipOutputStream is not closed, which causes a memory leak.
How do I close the GzipOutputStream? Should other streams also be closed? Is there a good alternative?
The Spark version is 2.1.0 and the Hadoop version is 2.8.4.
sparkPairRdd.saveAsHadoopFile(outputPath, String.class, String.class, MultipleTextOutputFormat.class, GzipCodec.class);
If I am understanding the GzipCodec class correctly, its purpose is to create various compressor and decompressor streams and return them to the caller. It is not responsible for closing those streams. That is the responsibility of the caller.
How to close a GzipOutputStream?
You simply call close() on the object. If saveAsHadoopFile is using GzipCodec to create a GzipOutputStream, then that method is responsible for closing it.
Or other stream should also be closed?
The same as for a GzipOutputStream. Call close() on it.
Is there a good alternative?
To calling close explicitly?
As an alternative, you could manage a stream created by GzipCodec using try-with-resources.
But if you are asking if there is a way to avoid managing the streams properly, then the answer is No.
If you are actually encountering a storage leak that is (you think) due to saveAsHadoopFile not closing the streams that it opens, please provide a minimal reproducible example that we can look at. It could be a bug in Hadoop ... or you could be using it incorrectly.
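As a rough illustration of the try-with-resources approach, here is a sketch that uses the JDK's own java.util.zip.GZIPOutputStream rather than Hadoop's codec classes. That substitution is an assumption made to keep the example self-contained; the class and method names are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipCloseDemo {

    // Compresses text; the stream is closed when the try block exits.
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } // close() finishes compression and writes the GZIP trailer
        return sink.toByteArray();
    }

    // Decompresses the bytes produced by compress().
    static String decompress(byte[] data) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decompress(compress("hello gzip")));
    }
}
```

With compressing streams, closing matters for correctness as well as for resources: the output is only complete once the stream has been finished or closed.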
Here is a line reading a file into a List:
List<String> lines =
new BufferedReader(
new InputStreamReader(classLoader.getResourceAsStream(fileName)))
.lines()
.collect(Collectors.toList());
Is this correct or should I assign the BufferedReader to a variable to be able to close it later?
You should always close your resources. Closing may not be a big problem for small programs which only use a couple of files quickly, since most mature OSes will close the files for you when the process completes. However, there are usually limits on how many files you can have open at one time. It is good to be tidy, so that you don't hit those limits when you start writing bigger programs. There are also other types of resources, like network and serial ports, which you may want to let others use once your program is done with them, even if it is still running.
An alternative to closing the file manually is using try-with-resources syntax, which ensures that the file will be closed properly even in case of an error:
List<String> lines;
try(BufferedReader reader = new BufferedReader(
new InputStreamReader(classLoader.getResourceAsStream(fileName)))) {
lines = reader.lines().collect(Collectors.toList());
}
Well, in your concrete example, the stream opened by
classLoader.getResourceAsStream(fileName)
is never closed. This stream must be closed - it is most likely a file handle in the local system. You can close it by closing the BufferedReader, which closes the wrapped InputStreamReader, which closes the underlying InputStream. You could instead also store a reference to the original InputStream and only close this.
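The claim that closing the BufferedReader also closes the underlying InputStream can be checked with a small sketch. TrackingInputStream is a hypothetical helper written for this example, and an in-memory stream stands in for the classpath resource.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

class ReaderChainDemo {

    // Hypothetical wrapper that remembers whether close() was called.
    static final class TrackingInputStream extends FilterInputStream {
        boolean closed = false;

        TrackingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Stand-in for classLoader.getResourceAsStream(fileName).
    static TrackingInputStream newSource() {
        byte[] bytes = "first\nsecond\n".getBytes(StandardCharsets.UTF_8);
        return new TrackingInputStream(new ByteArrayInputStream(bytes));
    }

    // Closing the outermost reader closes the whole chain.
    static List<String> readAll(InputStream raw) throws IOException {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(raw, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingInputStream source = newSource();
        System.out.println(readAll(source));
        System.out.println("underlying stream closed: " + source.closed);
    }
}
```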
Please also have a look into try-with-resources, this could potentially make things easier for you here.
I stand corrected
From documentation:
Streams have a close() method and implement AutoCloseable interface, but nearly all stream instances do not actually need to be closed after use.
Generally, only streams whose source is an IO channel, for example a BufferedReader.lines will require closing.
Most streams are backed by collections, arrays, or generating functions, which require no special resource management. If a stream does require closing, it can be declared as a resource in a try-with-resources statement.
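For the IO-backed case, a minimal sketch might look like the following. It uses Files.lines as the IO-backed stream source; the AtomicBoolean and the onClose handler exist only to make the close visible, and the class name is made up for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamCloseDemo {

    // Reads all lines; the file stays open until the stream is closed.
    static List<String> readLines(Path file, AtomicBoolean closedFlag) throws IOException {
        try (Stream<String> lines = Files.lines(file).onClose(() -> closedFlag.set(true))) {
            return lines.collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("stream-close", ".txt");
        try {
            Files.write(tmp, Arrays.asList("alpha", "beta"));
            AtomicBoolean closed = new AtomicBoolean(false);
            System.out.println(readLines(tmp, closed));
            System.out.println("stream closed: " + closed.get());
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

A terminal operation such as collect() does not close the stream by itself, which is why the try-with-resources block is still needed here.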
When my program starts, it opens a file and writes to it periodically. (It's not a log file; it's one of the outputs of the program.) I need to have the file available for the length of the program, but I don't need to do anything in particular to end the file; just close it.
I gather that for file I/O in Java I'm supposed to implement AutoCloseable and wrap it in a try-with-resources block. However, because this file is long-lived, and it's one of a few outputs of the program, I'm finding it hard to organize things such that all the files I open are wrapped in try-with-resources blocks. Furthermore, the top-level classes (where my main() function lies) don't know about this file.
Here's my code; note the lack of writer.close():
public class WorkRecorder {
public WorkRecorder(String recorderFile) throws FileNotFoundException {
writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(recorderFile)));
}
private Writer writer;
public void record(Data data) throws Exception {
// format Data object to match expected file format
// ...
writer.write(event.toString());
writer.write(System.lineSeparator());
writer.flush();
}
}
tl;dr do I need to implement AutoCloseable and call writer.close() if the resource is an opened output file, and I never need to close it until the program is done? Can I assume the JVM and the OS (Linux) will clean things up for me automatically?
Bonus (?): I struggled with this in C#'s IDisposable too. The using block, like Java's try-with-resources construct, is a nice feature when I have something that I'm going to open, do something with quickly, and close right away. But often that's not the case, particularly with files, when the access to that resource hangs around for a while, or when needing to manage multiple such resources. If the answer to my question is "always use try-with-resources blocks" I'm stuck again.
I have similar code that doesn't lend itself to being wrapped in a try-with-resources statement. I think that is fine, as long as you close it when the program is done.
Just make sure you account for any Exceptions that may happen. For example, in my program, there is a cleanup() method that gets called when the program is shut down. This calls writer.close(). This is also called if there is any abnormal behavior that would cause the program to shut down.
If this is just a simple program, and you're expecting the Writer to be open for its duration, I don't think it's really a big deal for it to not be closed when the program terminates...but it is good practice to make sure your resources are closed, so I would go ahead and add that to wherever your program may shut down.
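One way to arrange this for a long-lived writer is sketched below. This is an assumption-laden illustration, not the asker's actual class: ClosableWorkRecorder and closeOnShutdown are hypothetical names, and record() takes a plain String instead of the Data object from the question.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ClosableWorkRecorder implements AutoCloseable {

    private final Writer writer;

    ClosableWorkRecorder(Path recorderFile) throws IOException {
        writer = Files.newBufferedWriter(recorderFile, StandardCharsets.UTF_8);
    }

    // Writes one line and flushes so the file is current after every record.
    void record(String line) throws IOException {
        writer.write(line);
        writer.write(System.lineSeparator());
        writer.flush();
    }

    // Safe to call more than once; closing an already closed writer has no effect.
    @Override
    public void close() throws IOException {
        writer.close();
    }

    // Registers a JVM shutdown hook that closes the recorder when the JVM exits.
    static Thread closeOnShutdown(ClosableWorkRecorder recorder) {
        Thread hook = new Thread(() -> {
            try {
                recorder.close();
            } catch (IOException e) {
                // Nothing useful can be done this late in shutdown.
            }
        });
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

A caller that owns the recorder for the whole run can either wrap it in try-with-resources near the top of the program or call closeOnShutdown(recorder) once after constructing it. Shutdown hooks run on normal exit and on most signal-driven terminations, but not if the JVM is killed abruptly.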
You should always close resources. Merely setting a reference to null only makes the object eligible for garbage collection; it does not release the underlying resource promptly. Using try-with-resources blocks is a great way to have Java automatically close resources when you're done with them. Even if you use a resource for the duration of the program, it is good programming practice to close it at the end. Some might say you don't need to; I personally would say just go ahead and do it, and here's why:
"When a stream is no longer needed, always close it using the close() method or automatically close it using a try-with-resource statement. Not closing streams may cause data corruption in the output file, or other programming errors."
-Introduction to Java Programming 10th Edition, Y. Daniel Liang
If possible, just run the .close() method on the resource at the very end of the program.
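The "data corruption in the output file" mentioned in the quote is easiest to see with a buffered writer. The following sketch (class and method names invented for the example) measures a temporary file's size while a short write is still sitting in the buffer, and again after the writer has been closed.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class UnflushedDataDemo {

    // Returns the file size {before close, after close} for a 5-byte write.
    static long[] sizesBeforeAndAfterClose() throws IOException {
        Path tmp = Files.createTempFile("unflushed", ".txt");
        try {
            long before;
            try (BufferedWriter writer = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                writer.write("hello");
                before = Files.size(tmp); // the text is still in the writer's buffer
            }
            long after = Files.size(tmp); // close() flushed the buffer to the file
            return new long[] {before, after};
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        long[] sizes = sizesBeforeAndAfterClose();
        System.out.println("before close: " + sizes[0] + ", after close: " + sizes[1]);
    }
}
```

If the program exited without closing or flushing the writer, whatever was still in the buffer would never reach the file.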
I (now) think a better answer is "It depends" :-). A detailed treatment is provided by Lukas Eder here. Also check out the Lambda EG group post.
But in general, it's a good idea to return the resource back to the operating system when you are done with it and use try-with-resources all the time (except when you know what you are doing).
Closing and flushing IO resources is very important and seldom done correctly (at least by me). The reason for this is that most of the time, it still works without doing it correctly. Files are closed by the garbage collector, which happens from time to time in most applications. Flushing is done automatically when a stream is closed (possibly also by the garbage collector) or when a lot of data is written.
Java 1.7's try-with-resources makes it much easier to close IO resources if their lifetime coincides with the lifetime of a local variable. Not so much if they should e.g. live as long as some other object, but that is another story.
Since I started writing programs complex enough to need resources that wrap other resources, I find that deciding what to close and/or flush is much harder than deciding when to do it. Examples of wrapping a resource in another resource are:
Creating an InputStreamReader from an InputStream.
Creating an InputStream from a ReadableByteChannel.
Creating a DataOutputStream from an OutputStream.
Creating a PrintStream or OutputStreamWriter from an OutputStream.
This may also happen multiple layers deep, like wrapping a ReadableByteChannel in an InputStream in a GZIPInputStream in an InputStreamReader in a BufferedReader (never had to do that but seems plausible). Almost always the wrapping and the wrapped resources should have the same lifetime and it is most convenient if flushing can be done on the outermost resource, where writing is also done, so that only one object needs to be passed around.
In all this time I've never seen a satisfactory explanation of how closing and flushing interact with resources wrapped in other resources. My assumptions are the following:
Flushing a resource (i.e. calling flush() on it) also flushes wrapped resources recursively until data is pushed onto e.g. the disk or the network.
Closing a resource (i.e. calling close() on it) also closes wrapped resources recursively until some operating system resource is freed.
Now to my question: are these assumptions correct when using JDK implementations of IO resources, specifically of the interfaces InputStream, OutputStream, ReadableByteChannel, WritableByteChannel, Reader and Writer?
If one or both assumptions are not correct at all, what assumptions would be better?
If those assumptions are not always correct, where does the behavior of an implementation differ and what are the reasons?
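Both assumptions can at least be tested empirically for a given chain. The sketch below checks one specific combination only (BufferedWriter over OutputStreamWriter over an OutputStream) and says nothing about other implementations; TrackingOutputStream is a hypothetical helper written for the experiment.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class PropagationDemo {

    // Hypothetical sink that counts flush() calls and remembers close().
    static final class TrackingOutputStream extends ByteArrayOutputStream {
        int flushes = 0;
        boolean closed = false;

        @Override
        public void flush() throws IOException {
            flushes++;
            super.flush();
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    static Writer wrap(OutputStream sink) {
        return new BufferedWriter(new OutputStreamWriter(sink, StandardCharsets.UTF_8));
    }

    // Returns "sizeBeforeFlush,sizeAfterFlush,flushReachedSink,closeReachedSink".
    static String observe() throws IOException {
        TrackingOutputStream sink = new TrackingOutputStream();
        Writer writer = wrap(sink);
        writer.write("abc");
        int beforeFlush = sink.size(); // still buffered in the writers
        writer.flush();                // flush only the outermost writer
        int afterFlush = sink.size();
        boolean flushReachedSink = sink.flushes > 0;
        writer.close();                // close only the outermost writer
        return beforeFlush + "," + afterFlush + "," + flushReachedSink + "," + sink.closed;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(observe());
    }
}
```

For this particular chain, flushing and closing the outermost writer both reach the innermost stream. The same kind of tracking wrapper can be placed under any other combination you are unsure about.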
Is there any reason for calling close methods on the StreamWriter class? Why do I have to do it? If I don't close the StreamWriter will I get some kind of undefined behavior?
Assuming you're talking about java.io.OutputStreamWriter, yes, you should close it, in a finally block, when you don't want to write anything more. This allows the underlying OutputStream to be closed. If the underlying OutputStream is a FileOutputStream, closing releases the file descriptor (which is a limited OS resource) and allows other apps to read the file. If it's a SocketOutputStream, closing signals to the other side that it shouldn't expect anything more from the socket input stream.
In general, streams and readers/writers must always be closed properly. If using Java 7, use the new try-with-resources construct to make sure it's done automatically for you.
The operating system manages files, and if a file is not closed in Java, system-wide resources are lost.
In Java 7, however, you can use
try (OutputStreamWriter outWriter = new OutputStreamWriter(outStream, "UTF-8")) {
...
}
without calling close() explicitly. (Output streams and writers implement Closeable.)
BTW, @PriestVallon was just trying to get you to formulate your question in a way that is better and more attractive to answer. A "light" response to that can be misunderstood, as you've seen.
Writing to and reading from streams quite often involves OS resources such as sockets and file handles. If you're writing to a stream, you should also close it in order to release any resources you may have obtained (this depends on the actual resources used beneath the stream). Sometimes closing a stream writer releases an exclusive allocation of a resource, or flushes temporary data to the stream.
Sometimes close() has no effect; it depends on the kind of stream you have. But the interface must cover all the cases where a stream has to be closed.