Is there any reason to call the close method on the StreamWriter class? Why do I have to do it? If I don't close the StreamWriter, will I get some kind of undefined behavior?
Assuming you're talking about java.io.OutputStreamWriter, yes, you should close it, in a finally block, once you don't want to write anything more. This allows the underlying OutputStream to be closed. If the underlying OutputStream is a FileOutputStream, closing it releases the file descriptor (which is a limited OS resource) and allows other apps to read the file. If it's a SocketOutputStream, closing it signals to the other side that it shouldn't expect anything more from the socket input stream.
In general, streams and readers/writers must always be closed properly. If you are using Java 7 or later, use the try-with-resources construct to make sure it's done automatically for you.
The operating system manages files, and if a file is not closed in Java, the system-wide resources it holds are leaked.
In Java 7 and later, however, you can use
try (OutputStreamWriter outWriter = new OutputStreamWriter(outStream, "UTF-8")) {
    // ... write to outWriter ...
}
without calling close() explicitly. (Output streams and writers implement Closeable.)
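To see what skipping close() actually does, here is a minimal sketch. It assumes an in-memory ByteArrayOutputStream as a stand-in for a file or socket, and the class and method names are only illustrative. OutputStreamWriter's documentation says the encoded bytes are accumulated in a buffer before being written to the underlying stream, so until the writer is flushed or closed, the target can still be empty.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class WriterCloseDemo {

    /** Returns how many bytes reached the target before and after close(). */
    static int[] bytesBeforeAndAfterClose(String text) throws IOException {
        ByteArrayOutputStream target = new ByteArrayOutputStream(); // stand-in for a file or socket
        Writer writer = new OutputStreamWriter(target, StandardCharsets.UTF_8);
        writer.write(text);
        int before = target.size(); // encoded bytes are still in the writer's buffer
        writer.close();             // flushes the buffer, then closes the target
        int after = target.size();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = bytesBeforeAndAfterClose("h\u00e9llo");
        System.out.println("before close: " + sizes[0] + " bytes, after close: " + sizes[1] + " bytes");
    }
}
```

So the behavior is not undefined: the data simply sits in the writer's buffer, and if the writer is never flushed or closed, it never reaches the target.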
BTW, @PriestVallon was just trying to get you to phrase your question better, so that it would be more attractive to answer. A "light" response to that can be misunderstood, as you've seen.
Writing and reading streams quite often involves the use of OS resources, such as sockets, file handles and so on. If you're writing to a stream, you should also close it in order to release any resources you may have obtained (this depends on the actual resources you are using beneath the stream). Sometimes closing a stream writer involves releasing an exclusive allocation of a resource, or flushing temporary data to the stream.
Sometimes close() has no effect; it depends on the kind of stream you have. But the interface must take care of all the cases where a stream has to be closed.
In org.apache.hadoop.io.compress.GzipCodec, the GzipOutputStream is not closed, so there is a memory leak.
How do I close the GzipOutputStream? Should other streams also be closed? Is there a good alternative?
The Spark version is 2.1.0 and the Hadoop version is 2.8.4.
sparkPairRdd.saveAsHadoopFile(outputPath, String.class, String.class, MultipleTextOutputFormat.class, GzipCodec.class);
If I am understanding the GzipCodec class correctly, its purpose is to create various compressor and decompressor streams and return them to the caller. It is not responsible for closing those streams. That is the responsibility of the caller.
How do I close the GzipOutputStream?
You simply call close() on the object. If saveAsHadoopFile is using GzipCodec to create a GzipOutputStream, then that method is responsible for closing it.
Should other streams also be closed?
The same as for a GzipOutputStream. Call close() on it.
Is there a good alternative?
To calling close explicitly?
As an alternative, you could manage a stream created by GzipCodec using try-with-resources.
But if you are asking if there is a way to avoid managing the streams properly, then the answer is No.
If you are actually encountering a storage leak that is (you think) due to saveAsHadoopFile not closing the streams that it opens, please provide a minimal reproducible example that we can look at. It could be a bug in Hadoop ... or you could be using it incorrectly.
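Hadoop is not needed to see why closing a gzip stream matters. The sketch below swaps in the JDK's own java.util.zip.GZIPOutputStream (not Hadoop's GzipCodec) and manages it with try-with-resources, as suggested above. For GZIPOutputStream, close() writes the remaining compressed data and the gzip trailer, so the output is only complete after it runs. The class and method names are illustrative, and readAllBytes requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCloseDemo {

    /** Compresses text; try-with-resources closes, and therefore finishes, the gzip stream. */
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (OutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } // close() writes the remaining compressed data and the gzip trailer
        return buffer.toByteArray();
    }

    /** Same as compress(), but the gzip stream is deliberately never closed. */
    static byte[] compressWithoutClose(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        OutputStream gzip = new GZIPOutputStream(buffer);
        gzip.write(text.getBytes(StandardCharsets.UTF_8));
        return buffer.toByteArray(); // trailer (and possibly compressed data) still missing
    }

    /** Decompresses the bytes again, closing the input stream when done. */
    static String decompress(byte[] compressed) throws IOException {
        try (InputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decompress(compress("hello, gzip")));
        try {
            decompress(compressWithoutClose("hello, gzip"));
        } catch (IOException e) {
            System.out.println("unclosed stream produced incomplete data: " + e);
        }
    }
}
```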
Here is a line reading a file into a List:
List<String> lines =
new BufferedReader(
new InputStreamReader(classLoader.getResourceAsStream(fileName)))
.lines()
.collect(Collectors.toList());
Is this correct or should I assign the BufferedReader to a variable to be able to close it later?
You should always close your resources. Closing may not be a big problem for small programs which only use a couple of files quickly, since most mature OSes will close the files for you when the process completes. However, there are usually limits on how many files you can have open at one time. It is good to be tidy, so that you don't hit those limits when you start writing bigger programs. There are also other types of resources, like network and serial ports, which you may want to let others use once your program is done with them, even if it is still running.
An alternative to closing the file manually is using try-with-resources syntax, which ensures that the file will be closed properly even in case of an error:
List<String> lines;
try(BufferedReader reader = new BufferedReader(
new InputStreamReader(classLoader.getResourceAsStream(fileName)))) {
lines = reader.lines().collect(Collectors.toList());
}
Well, in your concrete example, the stream opened by
classLoader.getResourceAsStream(fileName)
is never closed. This stream must be closed: it is most likely a file handle on the local system. You can close it by closing the BufferedReader, which closes the wrapped InputStreamReader, which closes the underlying InputStream. You could also store a reference to the original InputStream and close only that.
Please also have a look at try-with-resources; it could make things easier for you here.
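A quick way to check the close chain described above is to wrap a stream that records whether close() was called. In this sketch, TrackingStream is a hypothetical helper standing in for the stream returned by getResourceAsStream, and the charset is made explicit (an assumption; the original code relies on the platform default).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class ReaderChainDemo {

    /** Hypothetical stand-in for a resource stream that remembers being closed. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    static List<String> readLines(TrackingStream source) throws IOException {
        // Closing the BufferedReader closes the InputStreamReader, which closes the source
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(source, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingStream source = new TrackingStream("a\nb\nc".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(source) + ", source closed: " + source.closed);
    }
}
```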
I stand corrected.
From the documentation:
Streams have a close() method and implement AutoCloseable, but nearly all stream instances do not actually need to be closed after use.
Generally, only streams whose source is an IO channel (such as those returned by Files.lines(Path, Charset)) will require closing.
Most streams are backed by collections, arrays, or generating functions, which require no special resource management. If a stream does require closing, it can be declared as a resource in a try-with-resources statement.
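For a stream that does wrap an IO channel, such as one returned by Files.lines, the documentation's advice looks like this in practice. This is a sketch: the temporary file and the countNonBlank helper are illustrative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

public class FilesLinesDemo {

    /** Counts non-blank lines; the Stream keeps the file open until it is closed. */
    static long countNonBlank(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.trim().isEmpty()).count();
        } // closing the Stream closes the underlying file
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines-demo", ".txt");
        try {
            Files.write(tmp, Arrays.asList("first", "", "second"), StandardCharsets.UTF_8);
            System.out.println(countNonBlank(tmp));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

A collection-backed stream such as Arrays.asList(...).stream() needs no such handling.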
The question says it all.
What are the consequences of not closing the various byte streams?
It is strongly emphasized that you should always do so, but nobody mentions how not doing it causes problems.
Can someone please explain what actually happens?
This is not limited to byte streams; it concerns anything implementing Closeable.
As the documentation states:
The close method is invoked to release resources that the object is holding (such as open files).
Whether a Closeable holds system resources or not, the rule of thumb is: do not take the chance. Call .close() on it correctly, and you can be sure that such system resources (if any) are freed.
Typical idiom (note that InputStream implements Closeable):
final InputStream in = whateverIsNeeded;
try {
workWith(in);
} finally {
in.close();
}
With Java 7 you also have AutoCloseable (which Closeable extends) and the try-with-resources statement, so you can do:
try (
final InputStream in = whateverIsNeeded;
) {
workWith(in);
}
This will close the in stream for you.
Again: don't take the chance. And if you don't use JDK 7 but can afford Guava, use Closer.
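To make "handle closing for you" concrete, this sketch uses a hypothetical Noisy resource that logs when it is closed. It shows that try-with-resources closes resources in reverse order of declaration, and that it still closes them when the body throws (before the catch block runs).

```java
import java.util.ArrayList;
import java.util.List;

public class TryWithResourcesDemo {

    /** Hypothetical resource that records its close() call in a shared log. */
    static class Noisy implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Noisy(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("closed " + name);
        }
    }

    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try (Noisy first = new Noisy("first", log);
             Noisy second = new Noisy("second", log)) {
            log.add("working");
            if (fail) {
                throw new IllegalStateException("boom");
            }
        } catch (IllegalStateException e) {
            // Runs only after both resources have been closed
            log.add("caught " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```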
Not closing limited resources such as database connections will dramatically slow down execution, and likely result in errors as those connections run out, with old ones sitting there unused.
Not closing file-streams could result in multiple threads writing to the same file, or files not being terminated properly, or files being locked when another thread attempts to write or read it.
This is a major topic relating to all Closeables, as stated by @fge. There are numerous libraries supplying things such as connection pools and caches to handle problems like this.
More information:
https://www.google.com/search?q=consequentes+of+not+closing+resources+java
It will hang around until it is collected by the GC, holding on to unmanaged resources (files, sockets, etc.) in the meantime.
There are several kinds of streams, including:
ByteArray
File
Filter
Object
Piped
The CORBA version of OutputStream
The result can differ depending on the kind of resource behind the stream. Take ByteArrayInputStream and ByteArrayOutputStream, where the documentation says:
Closing a ByteArrayInputStream has no effect. The methods in this
class can be called after the stream has been closed without
generating an IOException.
But in FileInputStream there is an open file. If you keep it open, an OS file handle stays allocated and, on some platforms, anybody who tries to edit the file will find it locked. When in doubt, always call the close() method.
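The difference is easy to observe directly. In this sketch (class and method names are illustrative, and a temporary file stands in for a real one), reading a ByteArrayInputStream after close() still works, while the same experiment with a FileInputStream fails because its file handle has been released.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ByteArrayCloseDemo {

    /** Closes a ByteArrayInputStream, then reads from it anyway. */
    static int readAfterClose(byte[] data) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        in.close();       // documented to have no effect for this class
        return in.read(); // still returns the first byte instead of throwing
    }

    /** Same experiment with a FileInputStream; returns true if read() threw. */
    static boolean fileReadAfterCloseFails(Path file) throws IOException {
        FileInputStream in = new FileInputStream(file.toFile());
        in.close();       // releases the OS file handle
        try {
            in.read();
            return false;
        } catch (IOException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterClose(new byte[] {42}));
        Path tmp = Files.createTempFile("closed-read", ".bin");
        try {
            Files.write(tmp, new byte[] {1, 2, 3});
            System.out.println(fileReadAfterCloseFails(tmp));
        } finally {
            Files.delete(tmp);
        }
    }
}
```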
What are the bad things that can happen when I don't close the stream?
Does the close operation automatically flush?
Are all the streams closed after the program exits?
Thanks in advance.
Bad things that can happen when you don't close your streams:
you can run out of file handles
data that you think is written to disk may still be in the buffer (only)
files might still be locked for other processes (depends on the platform)
...
Yes, for buffered streams and writers, the close operation flushes the stream before closing it.
All file handles that the OS is aware of are closed. This means effectively that FileOutputStream, FileInputStream and the input/output of a Socket will be closed. But if you wrap a FileOutputStream in a BufferedOutputStream then that BufferedOutputStream will not be known to the OS and won't be closed/flushed on shutdown. So data written to the BufferedOutputStream but not yet flushed to the FileOutputStream can be lost.
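The BufferedOutputStream point can be shown without a real file. In this sketch a ByteArrayOutputStream stands in for the FileOutputStream (an assumption made to keep it self-contained). Until flush() or close() is called, small writes sit only in the BufferedOutputStream's internal buffer, which is exactly the data that would be lost if the process exited at that point.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class BufferDemo {

    /** Returns the size of the underlying stream before and after flush(). */
    static int[] sizesAroundFlush(byte[] data) throws IOException {
        ByteArrayOutputStream underlying = new ByteArrayOutputStream(); // stand-in for a FileOutputStream
        BufferedOutputStream buffered = new BufferedOutputStream(underlying);
        buffered.write(data);
        int beforeFlush = underlying.size(); // small writes stay in the internal buffer
        buffered.flush();                    // now the bytes reach the underlying stream
        int afterFlush = underlying.size();
        buffered.close();
        return new int[] {beforeFlush, afterFlush};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesAroundFlush("hello".getBytes(StandardCharsets.US_ASCII));
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```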
1) You tie up system resources unnecessarily (e.g. file descriptors). Possibly to the extent of running out of them.
2) Yes (although you should check the documentation of the particular stream you're interested in to be sure).
3) Yes
To qualify, when you perform a close(), it flushes the data and closes the file handle. If you exit, the file handle is closed but un-flushed data is lost.
I have a question: when writing to a file, should we call flush() before closing? If so, what exactly does it do? Don't streams flush automatically?
EDIT:
So what does flush actually do?
Writers and streams usually buffer some of your output data in memory and try to write it in bigger blocks at a time. Flushing causes an immediate write from the buffer to disk, so if the program crashes, that data won't be lost. Of course there's no guarantee, as the disk may not physically write the data immediately, so it could still be lost. But then it wouldn't be the Java program's fault :)
A PrintWriter created with autoFlush enabled (it is off by default) flushes whenever you call println, printf or format, and of course streams and buffers flush when you close them. Other than that, flushing happens only when the buffer is full.
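A small sketch of that behavior, using a BufferedWriter over a StringWriter so the buffering is observable (both are illustrative stand-ins for a file writer). With autoFlush enabled, print() leaves text in the buffer but println() flushes it; with autoFlush disabled, neither does.

```java
import java.io.BufferedWriter;
import java.io.PrintWriter;
import java.io.StringWriter;

public class AutoFlushDemo {

    /** Returns what reached the StringWriter after print() and after println(). */
    static String[] visibleOutput(boolean autoFlush) {
        StringWriter target = new StringWriter();
        PrintWriter out = new PrintWriter(new BufferedWriter(target), autoFlush);
        out.print("partial ");
        String afterPrint = target.toString();   // print() never triggers auto-flush
        out.println("line");
        String afterPrintln = target.toString(); // flushed only when autoFlush is true
        out.close();
        return new String[] {afterPrint, afterPrintln};
    }

    public static void main(String[] args) {
        for (boolean autoFlush : new boolean[] {true, false}) {
            String[] seen = visibleOutput(autoFlush);
            System.out.printf("autoFlush=%b: after print=[%s], after println=[%s]%n",
                    autoFlush, seen[0], seen[1].trim());
        }
    }
}
```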
I would highly recommend calling flush before close. Basically, it writes the remaining buffered data into the file.
If you call flush explicitly you may be sure that any IOException coming out of close is really catastrophic and related to releasing system resources.
When you flush yourself, you can handle its IOException in the same way as you handle your data write exceptions.
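One way to follow that advice is sketched below: write and flush inside the body, so failures in pushing out buffered data are reported alongside write failures, and let try-with-resources do the close. The save helper is hypothetical, and any Writer works in place of the BufferedWriter used here.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class FlushThenCloseDemo {

    /** Hypothetical helper: writes data, flushes explicitly, then closes. */
    static void save(Writer writer, String data) throws IOException {
        try (Writer out = writer) {
            out.write(data);
            // An IOException from pushing buffered data out surfaces here,
            // next to write errors, rather than only from close()
            out.flush();
        } // close() is then left with releasing the underlying resource
    }

    public static void main(String[] args) throws IOException {
        StringWriter target = new StringWriter();
        save(new BufferedWriter(target), "saved text");
        System.out.println(target);
    }
}
```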
You don't need to do a flush because close() will do it for you.
From the javadoc:
"Close the stream, flushing it first. Once a stream has been closed, further write() or flush() invocations will cause an IOException to be thrown. Closing a previously-closed stream, however, has no effect."
To answer your question as to what flush actually does, it makes sure that anything you have written to the stream - a file in your case - does actually get written to the file there and then.
Java can perform buffering, which means that it holds on to written data in memory until it has a certain amount, and then writes it all to the file in one go, which is more efficient. The downside of this is that the file is not necessarily up to date at any given time. Flush is a way of saying "make the file up to date".
Close calls flush first to ensure that, after closing, the file has what you would expect to see in it; hence, as others have pointed out, there is no need to flush before closing.
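To check that close() alone is enough, this sketch writes through a BufferedWriter without ever calling flush(), closes it, and inspects the target. It also exercises the quoted contract: a second close() is harmless, while writing after close throws an IOException. BufferedWriter and StringWriter are illustrative choices; other Writer implementations may differ in detail.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class ClosedWriterDemo {

    /** Writes through a buffer and closes without an explicit flush(). */
    static String closeWithoutFlush(String text) throws IOException {
        StringWriter target = new StringWriter();
        Writer writer = new BufferedWriter(target);
        writer.write(text); // sits in the BufferedWriter's buffer for now
        writer.close();     // close() flushes first, then closes the target
        return target.toString();
    }

    /** Closes twice, then tries to write; returns true if the write was rejected. */
    static boolean writeAfterCloseFails() throws IOException {
        Writer writer = new BufferedWriter(new StringWriter());
        writer.close();
        writer.close(); // closing an already-closed stream has no effect
        try {
            writer.write("too late");
            return false;
        } catch (IOException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(closeWithoutFlush("no flush needed"));
        System.out.println(writeAfterCloseFails());
    }
}
```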
Close automatically flushes. You don't need to call it.
There's no point in calling flush() just before a close(), as others have said. The time to use flush() is if you are keeping the file open but want to ensure that previous writes have been fully completed.
As said, you don't usually need to flush.
It only makes sense if, for some reason, you want another process to see the complete contents of a file you're working with, without closing it. For example, it could be used for a file that is concurrently modified by multiple processes, although with a LOT of care :-)
FileWriter is an evil class, as it picks up whatever the default character set happens to be rather than taking an explicit charset (constructors that accept a Charset were only added in Java 11). Even if you do want the default, be explicit about it.
The usual solution is OutputStreamWriter and FileOutputStream. It is possible for the decorator to throw an exception. Therefore you need to be able to close the stream even if the writer was never constructed. If you are going to do that, you only need to flush the writer (in the happy case) and always close the stream. (Just to be confusing, some decorators, for instance for handling zips, have resources that do require closing.)
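A sketch of that pattern, assuming a temporary file and UTF-8 as the explicit charset. The FileOutputStream is the resource that is always closed, while the OutputStreamWriter decorator is only flushed on the happy path and deliberately not closed itself. The writeText helper is illustrative.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetDemo {

    /** Writes text with an explicit charset; the underlying stream is always closed. */
    static void writeText(Path file, String text) throws IOException {
        try (OutputStream stream = new FileOutputStream(file.toFile())) {
            // If constructing the decorator threw, the stream would still be closed
            Writer writer = new OutputStreamWriter(stream, StandardCharsets.UTF_8);
            writer.write(text);
            writer.flush(); // happy path: push the encoder's buffered bytes into the stream
        } // only the stream is closed here; that releases the file
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("charset-demo", ".txt");
        try {
            writeText(tmp, "gr\u00fc\u00dfe");
            System.out.println(new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8));
        } finally {
            Files.delete(tmp);
        }
    }
}
```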
Another use case for flushing in a program is writing the progress of a long-running job into a file (so it can be stopped and restarted later). You want to be sure that the data is safe on the drive.
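while (!finished()) {           // finished() is a hypothetical stop condition
    computeStuff();
    progress += 1;
    out.write(String.format("%d%n", progress));
    out.flush();                // push this progress value out of the buffer right away
}
out.close();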