Is opening a Java FileOutputStream efficient?

I'm writing a singleton logger for my program right now, and I was wondering whether it would be better to open and close the stream every time I log something, or to open it when the singleton is created and close it when the program terminates? And if I were to do that, how would I close it at termination?

The main advantage of opening the file once is performance. You save yourself the penalty of calling open each time and seeking to the end of the file for appending; this gets worse if the file is big (and some logs tend to be).
The cons are:
You might not be able to read the last log line immediately if the writer buffers output (delayed writes). However, this can be fixed by flushing after each write (you might lose some performance, but that is not usually relevant).
You cannot simultaneously write to the same log from different processes. But you probably don't need this - and if you need it, the open-and-close solution still needs to deal with concurrency.
Some external log processing (typically, log rotation with renaming) becomes problematic. To allow for this, you might need to implement some signalling that closes and reopens the file.
Typically, the advantages outweigh the cons, so the general rule is to keep the log file open. But that depends on the scenario.
(As other answers point out, normally you'd prefer to use some standard logging library instead of implementing this on your own. But it's instructive to give it a try, or at least to think of all the issues involved).

Do not close it, just flush; this is what Log4j's FileAppender does by default.

You should open it once (and close it once). If you do nothing, Java will close it for you when the program exits. You may prefer to explicitly override Object.finalize().
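To make the open-once approach concrete, here is a minimal sketch of a singleton logger that opens the stream once in append mode and closes it from a JVM shutdown hook when the program terminates normally. The FileLogger class and the app.log file name are made up for illustration; Runtime.addShutdownHook and FileOutputStream's append flag are standard APIs.

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;

public final class FileLogger {
    private static final FileLogger INSTANCE = new FileLogger("app.log");  // hypothetical file name

    private final PrintWriter out;

    private FileLogger(String path) {
        try {
            // true = append: seek to the end of the file only once, at startup
            out = new PrintWriter(new OutputStreamWriter(new FileOutputStream(path, true)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Close the stream when the JVM shuts down normally
        Runtime.getRuntime().addShutdownHook(new Thread(out::close));
    }

    public static FileLogger getInstance() {
        return INSTANCE;
    }

    public synchronized void log(String message) {
        out.println(message);
        out.flush();  // trade a little performance for seeing the last line immediately
    }
}

Something like FileLogger.getInstance().log("started") then works from anywhere in the program, and the file is opened exactly once per run.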


What is the proper way to close streams on Java's exec()?

After
Runtime.getRuntime().exec(command);
I see syscalls happening that show 2-3 file descriptors (FIFO pipes) being created. What is the proper way to close them with the try-with-resources pattern?
Most of the historical tribal knowledge found on Java forums suggests:
// out of date!
... } finally {
    IOUtils.closeQuietly(p.getOutputStream());
    IOUtils.closeQuietly(p.getInputStream());
    IOUtils.closeQuietly(p.getErrorStream());
}
but that doesn't sound right because 1) the closeQuietly method is deprecated and most libraries suggest using try-with-resources, and 2) it is inelegant, as I might not necessarily have all the streams.
And simply moving the exec() call into the try feels wrong, as it is not the resource I will call close() on.
Closing them isn't necessary; they close by themselves when the process dies. If the process never dies, closing them is also not necessary: either you spawn a new never-dying process every so often, in which case your system is going to run out of resources and crash whether you close these or not, or you spawn it only once, in which case these resources aren't going to count for much. For what it's worth, these are quite lightweight resources, and often they cannot really be 'closed' in the sense that the resources can be 'freed': closing them either keeps them open but denies further chat (and sends EOFs where needed), or reroutes them to /dev/null. Generally a process just has these three pipes on it and will keep them until it dies.
Yes, closeQuietly is a silly idea for virtually all purposes, and it is here too. If closing these streams somehow fails, you probably don't want to silently ignore that.
If you must close them, the individual streams from these three are closeable. However, note that you're reading rules of thumb and attempting to apply them as if they were gospel truth. try-with-resources is not always the right answer, and it is not a 100% replacement for close, let alone closeQuietly.
For example, try-with-resources is specifically designed around a period of usage. You declare the span of statements within which the resource should be available (the braces that go with the try block), and the construct then ensures that the resource is closed once code flow leaves that span, no matter how it exits. That probably makes it irrelevant here, too! Consider the two scenarios:
You are starting a long-lived process and don't care about the in/out. You just want the process to run and to keep running. This means there is no span at all, and you should just call close() on these if somehow you feel it is important to try to save the resources even though most likely this accomplishes nothing at all. No span-of-statements means try-with-resources isn't right.
You are starting a short-lived process that you interact with. The right thing to 'close' is the process itself, except you can't use try-with-resources for that: it can only be used on resources whose class implements AutoCloseable. Most do, some don't; Lock is a famous one, and Process is another. To 'close' a Process, you invoke destroy() or even destroyForcibly(), and you cannot use try-with-resources for this (not without ugly hacks that defeat the purpose). Once you close/destroy the process, the streams that went along with it are dead too. A sketch of this scenario follows below.
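A minimal, hedged sketch of that second scenario (the ls command and the class name are placeholders for illustration): try-with-resources scopes the stream that is actually read, while the process itself is cleaned up explicitly.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class RunAndRead {
    public static void main(String[] args) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("ls").start();  // placeholder command
        try {
            // The stream has a clear span of usage, so try-with-resources fits it...
            try (BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
                out.lines().forEach(System.out::println);
            }
            p.waitFor();
        } finally {
            // ...but Process is not AutoCloseable: "closing" it means destroying it.
            p.destroy();
        }
    }
}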
More generally the principle is: If you create it, you close it. If you never call getOutputStream() you never created them. On some OSes, fetching these streams and then closing them wastes more resources than not doing this. Thus, if the argument is based on some sort of purity model, then you shouldn't close them either. If it's based on pragmatics, you'd have to test how heavy these resources really are (most likely, extremely light), whether closing them actually saves you some pipes (most likely, it will not), and whether close()-ing the result of invoking getOutputStream() on the process even helps if the answers to the above questions make that relevant (it probably will, but the spec does not guarantee this).
They are very lightweight resources that in almost every case don't require closing...

Is there any harm in failing to close a file when a Java program terminates?

When my program starts, it opens a file and writes to it periodically. (It's not a log file; it's one of the outputs of the program.) I need to have the file available for the length of the program, but I don't need to do anything in particular to end the file; just close it.
I gather that for file I/O in Java I'm supposed to implement AutoCloseable and wrap it in a try-with-resources block. However, because this file is long-lived, and it's one of a few outputs of the program, I'm finding it hard to organize things such that all the files I open are wrapped in try-with-resources blocks. Furthermore, the top-level classes (where my main() function lies) don't know about this file.
Here's my code; note the lack of writer.close():
import java.io.BufferedWriter;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class WorkRecorder {

    private final Writer writer;

    public WorkRecorder(String recorderFile) throws FileNotFoundException {
        writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(recorderFile)));
    }

    public void record(Data data) throws Exception {
        // format the Data object to match the expected file format
        // ...
        writer.write(data.toString());
        writer.write(System.lineSeparator());
        writer.flush();
    }
}
tl;dr do I need to implement AutoCloseable and call writer.close() if the resource is an opened output file, and I never need to close it until the program is done? Can I assume the JVM and the OS (Linux) will clean things up for me automatically?
Bonus (?): I struggled with this in C#'s IDisposable too. The using block, like Java's try-with-resources construct, is a nice feature when I have something that I'm going to open, do something with quickly, and close right away. But often that's not the case, particularly with files, when access to the resource hangs around for a while, or when I need to manage multiple such resources. If the answer to my question is "always use try-with-resources blocks", I'm stuck again.
I have similar code that doesn't lend itself to being wrapped in a try-with-resources statement. I think that is fine, as long as you close it when the program is done.
Just make sure you account for any Exceptions that may happen. For example, in my program, there is a cleanup() method that gets called when the program is shut down. This calls writer.close(). This is also called if there is any abnormal behavior that would cause the program to shut down.
If this is just a simple program and you're expecting the Writer to be open for its duration, I don't think it's really a big deal for it not to be closed when the program terminates... but it is good practice to make sure your resources are closed, so I would go ahead and add that to wherever your program may shut down (a sketch follows below).
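A minimal sketch of that pattern, assuming the asker's WorkRecorder gains a close() method that simply delegates to writer.close(); the App class, runProgram() and cleanup() names are hypothetical stand-ins for your program's existing structure.

import java.io.IOException;

public final class App {
    private static WorkRecorder recorder;  // assumes WorkRecorder now also exposes close()

    public static void main(String[] args) throws Exception {
        recorder = new WorkRecorder("output.dat");  // hypothetical file name
        try {
            runProgram();   // whatever the program actually does
        } finally {
            cleanup();      // runs on normal exit and when an exception unwinds main
        }
    }

    private static void cleanup() {
        try {
            recorder.close();   // flushes anything buffered and releases the file handle
        } catch (IOException e) {
            System.err.println("Could not close the recorder: " + e);
        }
    }

    private static void runProgram() {
        // program logic elided
    }
}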
You should always close resources, or set them to null so they can be picked up by the garbage collector in Java. Using try-with-resources blocks is a great way to have Java automatically close resources when you're done with them. Even if you use a resource for the duration of the program, it is good programming practice to close it at the end. Some might say you don't need to; I personally would say just go ahead and do it, and here's why:
"When a stream is no longer needed, always close it using the close() method or automatically close it using a try-with-resource statement. Not closing streams may cause data corruption in the output file, or other programming errors."
-Introduction to Java Programming 10th Edition, Y. Daniel Liang
If possible, just run the .close() method on the resource at the very end of the program.
I (now) think a better answer is "It depends" :-). A detailed treatment is provided by Lukas Eder here. Also check out the Lambda EG group post.
But in general, it's a good idea to return the resource back to the operating system when you are done with it and use try-with-resources all the time (except when you know what you are doing).

Is FileOutputStream.write(byte[]) always blocking?

I wondered whether FileOutputStream.write(byte[]) always blocks the current thread, leading to a thread context switch, or whether this operation can avoid blocking when the OS buffers are large enough to handle the bytes.
The reason for these thoughts is that I wondered whether the logging I do with log4j in my application is a real performance hit, and whether it would be faster to use a queue of logging messages that is read by a separate thread and written to the log files (I know the disadvantage of swallowed logging statements if the app quits while statements in the queue have not yet been flushed to disk).
No, I didn't profile it yet, these are rather conceptual thoughts.
Need not be.
FileOutputStream.write(byte[]) is a native method. Common sense would suggest that write() may just write to the internal buffers, and a later call to flush() would actually commit it.
You can use the log4j org.apache.log4j.AsyncAppender and logging calls will not block. The actual logging is done in another thread so you won't need to worry about calls to log4j not returning in a timely manner.
By default immediateFlush is enabled which means that logging is slower but ensures that each append request is actually written out. You can set this to false if you don't care whether or not the last lines are written out if your application crashes.
log4j.appender.R.ImmediateFlush=false
Also, take a look at this post on Log4j: Performance Tips, in which the author has some test stats on using immediateFlush, bufferedIO and AsyncAppender. He concludes that for local logging you should "set immediateFlush=false, and leave bufferedIO at the default of don't buffer", and that "asyncAppender actually takes longer than normal non-async".
It's likely going to depend on the OS, drivers and underlying file system. If write caching is enabled for example it'll probably return right away. I've seen gigabytes/day of logs written synchronously without affecting performance too much, as long as IO isn't bottlenecked. It's still probably worth writing them asynchronously if you're concerned about response times. And it eliminates potential future issues, e.g. if you changed to writing to network drive and the network has issues.
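For what it's worth, the queue idea from the question can be sketched with a BlockingQueue and a single daemon writer thread. The AsyncLog class below is made up for illustration and is not how log4j's AsyncAppender works internally; it just shows the shape of the trade-off.

import java.io.Writer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncLog {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    public AsyncLog(Writer out) {
        Thread writerThread = new Thread(() -> {
            try {
                while (true) {
                    out.write(queue.take());            // blocks this thread, not the caller
                    out.write(System.lineSeparator());
                    out.flush();
                }
            } catch (Exception e) {
                // messages still in the queue are lost here -- the trade-off mentioned in the question
            }
        });
        writerThread.setDaemon(true);   // pending messages also die with the JVM
        writerThread.start();
    }

    public void log(String message) {
        queue.offer(message);   // with an unbounded queue this never blocks the caller
    }
}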

flush in java.io.FileWriter

I have a question: while writing to a file, should we call flush() before closing? If so, what does it do exactly? Don't streams auto-flush?
EDIT:
So what does flush actually do?
Writers and streams usually buffer some of your output data in memory and try to write it in bigger blocks at a time. Flushing causes an immediate write to disk from the buffer, so if the program crashes that data won't be lost. Of course there's no guarantee, as the disk may not physically write the data immediately, so it could still be lost. But then it wouldn't be the Java program's fault :)
A PrintWriter constructed with autoFlush enabled (it is not enabled by default) flushes whenever you call println, printf or format, and of course streams and writers flush when you close them. Other than that, there's flushing only when the buffer is full.
I would highly recommend calling flush before close. Basically it writes the remaining buffered data into the file.
If you call flush explicitly you may be sure that any IOException coming out of close is really catastrophic and related to releasing system resources.
When you flush yourself, you can handle its IOException in the same way as you handle your data write exceptions.
You don't need to do a flush because close() will do it for you.
From the javadoc:
"Close the stream, flushing it first. Once a stream has been closed, further write() or flush() invocations will cause an IOException to be thrown. Closing a previously-closed stream, however, has no effect."
To answer your question as to what flush actually does, it makes sure that anything you have written to the stream - a file in your case - does actually get written to the file there and then.
Java can perform buffering, which means that it will hold onto data written to it in memory until it has a certain amount, and then write it all to the file in one go, which is more efficient. The downside of this is that the file is not necessarily up-to-date at any given time. Flush is a way of saying "make the file up-to-date".
Close calls flush first to ensure that after closing the file has what you would expect to see in it, hence as others have pointed out, no need to flush before closing.
Close automatically flushes. You don't need to call it.
There's no point in calling flush() just before a close(), as others have said. The time to use flush() is if you are keeping the file open but want to ensure that previous writes have been fully completed.
As said, you don't usually need to flush.
It only makes sense if, for some reason, you want another process to see the complete contents of a file you're working with, without closing it. For example, it could be used for a file that is concurrently modified by multiple processes, although with a LOT of care :-)
FileWriter is an evil class, as it picks up whatever default character set happens to be in effect rather than taking an explicit charset. Even if you do want the default, be explicit about it.
The usual solution is OutputStreamWriter and FileOutputStream. It is possible for the decorator to throw an exception, so you need to be able to close the stream even if the writer was never constructed. If you are going to do that, you only need to flush the writer (in the happy case) and always close the stream; see the sketch below. (Just to be confusing, some decorators, for instance those for handling zips, have resources that do require closing.)
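A minimal sketch of that pattern with the charset spelled out (UTF-8 and the file name are just example choices). Declaring the stream as its own resource means it is closed even if constructing the writer fails, which is the point made above.

import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetWrite {
    public static void main(String[] args) throws IOException {
        try (FileOutputStream stream = new FileOutputStream("out.txt");
             Writer writer = new BufferedWriter(
                     new OutputStreamWriter(stream, StandardCharsets.UTF_8))) {
            writer.write("no surprises from the platform default charset");
            writer.flush();   // flush the writer in the happy case; both resources are always closed
        }
    }
}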
Another use case for flushing in a program is writing the progress of a long-running job into a file (so the job can be stopped and restarted later). You want to be sure that the data is safe on the drive:
while (progress < totalSteps) {   // totalSteps: however you define "done"
    computeStuff();
    progress += 1;
    out.write(String.format("%d", progress));
    out.flush();   // make sure the progress value has actually reached the file
}
out.close();

Java: What's the reason behind System.out.println() being that slow?

For small, simple programs that can be written in a text editor, I use the classic System.out.println() for tracing.
I guess you all know how frustrating it is to use that inside a loop with a high number of iterations. Why is it so slow? What's the reason behind it?
This has nothing whatsoever to do with the JVM. Printing text to screen simply involves a lot of work for the OS in drawing the letters and especially scrolling. If you redirect System.out to a file, it will be much faster.
This is very OS-dependent. For example, in Windows, writing to the console is a blocking operation, and it's also slow, and so writing lots of data to the console slows down (or blocks) your application. In unix-type OSes, writing to the console is buffered, so your app can continue unblocked, and the console will catch up as it can.
Ya, there is a huge amount of overhead in writing to the console, far greater than that required to write to a file or a socket. Also, if there are a large number of threads, they are all contending for the same lock. I would recommend using something other than System.out.println to trace.
This has nothing to do with Java and JVM but with the console terminal. In most OSes I know writing in the console output is slow.
Buffering can help enormously. Try this:
System.setOut(new PrintStream(new BufferedOutputStream(System.out)));
But beware: you won't see the output appear gradually, but all in a flash. Which is great, but if you're using it for debugging, and the program crashes before it terminates, in some circumstances it's possible you won't see the text printed just before the crash.
This is because the buffer wasn't flushed before the crash. It was printed, but it's still in the buffer, and didn't make it out to the console where you can see it. I remember this happening to me, in a puzzling debug session. It's best to occasionally flush explicitly, to make sure you see it:
System.out.flush();
Some terminals just are faster than others. This may vary even within one operating system.
This might look as if it doesn't directly answer your question, but my advice is to never use System.out for tracing (if by that you mean a kind of debugging, just to watch the progress of your app).
The problems with System.out for debugging are several:
once the app ends and you close the console, you'll lose the log
you'll have to remove those statements once your app is working properly (or comment them out). Later, if you want to reactivate them, you'll have to uncomment/comment again... tedious
I recommend instead using log4j and "watching" the log file, either with a tail command (there's also a Tail for Windows) or with an Eclipse plugin like LogWatcher.
One interesting thing I've noticed about writing to the terminal (at least on Windows): it actually runs much, much faster if the window is minimized. This is definitely closely tied to Michael Borgwardt's answer about drawing and scrolling. Really, if you're logging enough to notice the slowdown, you're probably better off writing to a file.
The slowness is due to the large number of Java-to-native transitions, which happen on every line break or flush. If the loop has too many iterations, System.out.println() isn't much help. If the individual steps are not that important by themselves, you can call System.out.println() only every 10 or 100 steps. You can also wrap System.out in a BufferedOutputStream. And of course there is always the option of making the printing asynchronous via an ExecutorService, as in the sketch below.
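A minimal sketch of the ExecutorService option (the class name is made up): a single-thread executor serializes the prints on one background thread, so the hot loop does not wait on the console. Whether this actually helps depends on how fast messages are produced versus drained.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncTrace {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService printer = Executors.newSingleThreadExecutor();
        for (int i = 0; i < 100_000; i++) {
            final int step = i;
            printer.submit(() -> System.out.println("step " + step));  // queued, not printed inline
        }
        printer.shutdown();                            // stop accepting new tasks
        printer.awaitTermination(1, TimeUnit.MINUTES); // let the queued prints drain
    }
}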
