What is the proper way to close streams on Java's exec()?

After calling
Runtime.getRuntime().exec(command);
I see syscalls showing that 2-3 file descriptors (FIFO pipes) are opened. What is the proper way to close them with the try-with-resources pattern?
Most of the historical tribal knowledge found on Java forums suggests:
// out of date!
... } finally {
    IOUtils.closeQuietly(p.getOutputStream());
    IOUtils.closeQuietly(p.getInputStream());
    IOUtils.closeQuietly(p.getErrorStream());
}
But that doesn't sound right, because 1) closeQuietly is deprecated and most libraries suggest using try-with-resources instead, and 2) it is inelegant, as I might not necessarily have all the streams.
And simply moving the exec() call into the try feels wrong, as it is not the resource I will call close() on.

Closing them isn't necessary; they close by themselves when the process dies. If the process never dies, closing them is also not necessary: either you start a new never-dying process every so often, in which case your system is going to run out of resources and crash whether you close these or not, or you start it only once, in which case these resources aren't going to count for much. For what it is worth, these are quite lightweight resources, and often they simply cannot be 'closed' in the sense that the resources can be 'freed': closing them either keeps them open but denies further communication (sending EOF where needed), or reroutes them to /dev/null. Generally a process simply has its 3 pipes and will continue to have them until it dies.
Yes, closeQuietly is a silly idea for virtually all purposes, and so it is here. If closing these streams somehow fails, you probably don't want to silently ignore that.
If you must close them, each of the 3 streams is individually closeable. However, note that you're reading rules of thumb and attempting to apply them as if they were gospel truth. try-with-resources is not always the right answer, and it is not a 100% replacement for close(), let alone for closeQuietly.
For example, try-with-resources is specifically designed around a period of usage. You declare the span of statements within which the resource should be available (the braces that go with the try block), and the construct then ensures that the resource is closed as soon as code flow leaves that span, and not before, no matter how it leaves. That probably makes it irrelevant here, too!
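To make the "closed however the block exits" guarantee concrete, here is a small sketch using a hypothetical Tracked resource that only records when it is opened and closed (Tracked and TryWithResourcesDemo are illustrative names, not library classes). Resources are closed in reverse order of declaration, and before any catch clause of the same try statement runs.

```java
import java.util.ArrayList;
import java.util.List;

final class TryWithResourcesDemo {
    /** Hypothetical resource that only records its own open/close events. */
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        @Override
        public void close() { // declares no checked exception
            log.add("close " + name);
        }
    }

    /** Runs one try-with-resources statement, optionally failing inside the body. */
    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try (Tracked a = new Tracked("a", log);
             Tracked b = new Tracked("b", log)) {
            log.add("body");
            if (fail) {
                throw new IllegalStateException("boom");
            }
        } catch (IllegalStateException e) {
            // By the time we get here, both resources have already been closed.
            log.add("caught " + e.getMessage());
        }
        return log;
    }
}
```

Both the normal and the exceptional path end with "close b" followed by "close a"; only the failing run adds the "caught boom" entry afterwards.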
You are starting a long-lived process and don't care about the in/out. You just want the process to run and to keep running. This means there is no span at all, and you should just call close() on these if you somehow feel it is important to try to save the resources, even though this most likely accomplishes nothing at all. No span of statements means try-with-resources isn't the right tool.
You are starting a short-lived process that you interact with. The right thing to 'close' is the process itself, except you can't use try-with-resources for that. It can only be used on auto-closeables (resources whose representing class implements AutoCloseable; most do, some don't). Lock is a famous exception. Process is another: to 'close' it, you invoke destroy() or even destroyForcibly(). You cannot use try-with-resources to do this (not without ugly hacks that defeat the purpose)! Once you close/destroy the process, the streams that went along with it are dead too.
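Because Process is not AutoCloseable, the shape this paragraph describes is a plain try/finally around destroy(). A minimal sketch, assuming Java 9+ (for ProcessBuilder.Redirect.DISCARD) and a caller that only needs the exit code; ShortLivedProcess, runWithTimeout and the -1 "timed out" convention are hypothetical.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

final class ShortLivedProcess {
    /** Returns the exit code, or -1 if the process did not finish within the timeout. */
    static int runWithTimeout(long timeoutSeconds, String... command)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // no stdout pipe to drain
                .redirectError(ProcessBuilder.Redirect.DISCARD)  // no stderr pipe to drain
                .start();
        try {
            if (p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                return p.exitValue();
            }
            return -1; // hypothetical convention for "timed out"
        } finally {
            // Process is not AutoCloseable, so "closing" it means destroying it here.
            // Calling destroy() on a process that has already exited should be harmless.
            p.destroy();
        }
    }
}
```

Discarding the output at the ProcessBuilder level is one way to follow "if you never fetch a stream, you never have to close it"; only the child's stdin remains a pipe.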
More generally, the principle is: if you create it, you close it. If you never call getOutputStream(), you never created it. On some OSes, fetching these streams and then closing them wastes more resources than not doing so. Thus, if the argument is based on some sort of purity model, you shouldn't close them either. If it's based on pragmatics, you'd have to test how heavy these resources really are (most likely extremely light), whether closing them actually saves you any pipes (most likely it will not), and, if the answers to those questions make it relevant, whether calling close() on the result of getOutputStream() even helps (it probably will, but the spec does not guarantee this).
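If you do decide to close the streams yourself, "if you create it, you close it" applies to whatever you fetch. A minimal sketch, assuming Java 9+ (for InputStream.readAllBytes()) and that you want the child's combined stdout/stderr as a string; ProcessOutput and runAndCapture are hypothetical names. stderr is merged into stdout with redirectErrorStream(true), because draining stdout and then stderr from a single thread can deadlock if the child fills the other pipe first.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.List;

final class ProcessOutput {
    /** Runs the command, returns its combined stdout/stderr, and closes the pipes it fetched. */
    static String runAndCapture(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // one pipe for stdout+stderr, nothing to drain separately

        Process p = pb.start();
        // Nothing to send, so close the child's stdin right away; the child sees EOF.
        p.getOutputStream().close();

        String output;
        // Here the stream's useful life is exactly this block, so try-with-resources fits.
        try (InputStream stdout = p.getInputStream()) {
            output = new String(stdout.readAllBytes(), Charset.defaultCharset());
        }
        p.waitFor(); // wait for the child to exit before returning
        return output;
    }
}
```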

They are very light; in almost every case they don't require closing...

Related

Is there any harm in failing to close a file when a Java program terminates?

When my program starts, it opens a file and writes to it periodically. (It's not a log file; it's one of the outputs of the program.) I need to have the file available for the length of the program, but I don't need to do anything in particular to end the file; just close it.
I gather that for file I/O in Java I'm supposed to implement AutoCloseable and wrap it in a try-with-resources block. However, because this file is long-lived, and it's one of a few outputs of the program, I'm finding it hard to organize things such that all the files I open are wrapped in try-with-resources blocks. Furthermore, the top-level classes (where my main() function lies) don't know about this file.
Here's my code; note the lack of writer.close():
import java.io.*;

public class WorkRecorder {
    private Writer writer;

    public WorkRecorder(String recorderFile) throws FileNotFoundException {
        writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(recorderFile)));
    }

    public void record(Data data) throws Exception {
        // format Data object to match expected file format
        // ...
        writer.write(event.toString());
        writer.write(System.lineSeparator());
        writer.flush();
    }
}
tl;dr do I need to implement AutoCloseable and call writer.close() if the resource is an opened output file, and I never need to close it until the program is done? Can I assume the JVM and the OS (Linux) will clean things up for me automatically?
Bonus (?): I struggled with this in C#'s IDisposable too. The using block, like Java's try-with-resources construct, is a nice feature when I have something that I'm going to open, do something with quickly, and close right away. But often that's not the case, particularly with files, when access to the resource hangs around for a while, or when I need to manage multiple such resources. If the answer to my question is "always use try-with-resources blocks", I'm stuck again.
I have similar code that doesn't lend itself to being wrapped in a try-with-resources statement. I think that is fine, as long as you close it when the program is done.
Just make sure you account for any Exceptions that may happen. For example, in my program, there is a cleanup() method that gets called when the program is shut down. This calls writer.close(). This is also called if there is any abnormal behavior that would cause the program to shut down.
If this is just a simple program, and you're expecting the Writer to be open for its duration, I don't think it's really a big deal for it to not be closed when the program terminates...but it is good practice to make sure your resources are closed, so I would go ahead and add that to wherever your program may shut down.
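One way to get the cleanup() behaviour described in this answer without restructuring main() is to make the recorder Closeable and register a JVM shutdown hook that closes it. This is a sketch under stated assumptions: CloseableWorkRecorder, closeOnShutdown and the String-based record method are hypothetical stand-ins for the question's WorkRecorder and Data types. Shutdown hooks run on normal exit and on signals such as SIGINT/SIGTERM, but not when the JVM is killed with SIGKILL or stopped with Runtime.halt(), so flushing after each record (as the original code already does) is still worthwhile.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical variant of WorkRecorder that can be closed explicitly or at JVM shutdown. */
final class CloseableWorkRecorder implements Closeable {
    private final Writer writer;

    CloseableWorkRecorder(Path file) throws IOException {
        // With no options, newBufferedWriter creates or truncates the file.
        writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    }

    /** Registers a shutdown hook that plays the role of a cleanup() method. */
    CloseableWorkRecorder closeOnShutdown() {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                close();
            } catch (IOException e) {
                // Don't swallow the failure silently; at least report it.
                e.printStackTrace();
            }
        }, "work-recorder-cleanup"));
        return this;
    }

    synchronized void record(String line) throws IOException {
        writer.write(line);
        writer.write(System.lineSeparator());
        writer.flush();
    }

    @Override
    public synchronized void close() throws IOException {
        writer.close(); // closing an already-closed BufferedWriter has no effect
    }
}
```

Because a second close() is a no-op, the same object can be closed early by a try-with-resources block and still have the hook registered as a safety net.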
You should always close resources, or set them to null so they can be picked up by the garbage collector in Java. Using try-with-resources blocks is a great way to have Java automatically close resources when you're done with them. Even if you use a resource for the duration of the program, it is good programming practice to close it at the end. Some might say you don't need to; I personally would say just go ahead and do it, and here's why:
"When a stream is no longer needed, always close it using the close() method or automatically close it using a try-with-resource statement. Not closing streams may cause data corruption in the output file, or other programming errors."
-Introduction to Java Programming 10th Edition, Y. Daniel Liang
If possible, just run the .close() method on the resource at the very end of the program.
I (now) think a better answer is "It depends" :-). A detailed treatment is provided by Lukas Eder here. Also check out the Lambda EG group post.
But in general, it's a good idea to return the resource back to the operating system when you are done with it and use try-with-resources all the time (except when you know what you are doing).

Is Opening Java FileOutputStream efficient?

I'm writing a singleton logger for my program right now, and I was wondering whether it would be better to open and close the file every time I log something, or to open the stream when the singleton is created and close it when the program terminates. And if I were to do the latter, how would I close it at termination?
The main advantage of opening the file once is performance. You save yourself the penalty of calling open each time and seeking to the end of the file for appending; this gets worse if the file is big (and some logs tend to be).
The cons are:
You might not be able to read the last log line immediately if there is some buffering in the writer (delayed writes). However, this can be fixed by flushing after each write (you might lose some performance, but this is not usually relevant).
You cannot simultaneously write to the same log from different processes. But you probably don't need this - and if you need it, the open-and-close solution still needs to deal with concurrency.
Some external log processing (typically, log rotation with renaming) becomes problematic. To allow for this, you might need to implement some signalling that closes and reopens the file.
Typically, the advantage outweighs the cons, so the general rule is to keep the log file open. But that depends on the scenario.
(As other answers point out, normally you'd prefer to use some standard logging library instead of implementing this on your own. But it's instructive to give it a try, or at least to think of all the issues involved).
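A minimal open-once logger along the lines of this answer might look like the following sketch; SimpleFileLogger is a hypothetical name and the timestamp format (Instant.toString()) is an arbitrary choice. It opens the file in append mode, flushes after every line (the fix for the first con), and synchronizes log() so threads in the same JVM don't interleave lines. It does nothing for the second con, writing from several processes.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;

/** Hypothetical minimal logger that opens its file once and appends to it. */
final class SimpleFileLogger implements Closeable {
    private final BufferedWriter out;

    SimpleFileLogger(Path file) throws IOException {
        // Opened once in append mode: existing content is kept, no per-call open is paid.
        out = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Synchronized so lines from different threads in this JVM do not interleave. */
    synchronized void log(String message) throws IOException {
        out.write(Instant.now() + " " + message);
        out.newLine();
        out.flush(); // hand the line to the OS so other readers of the file see it right away
    }

    @Override
    public synchronized void close() throws IOException {
        out.close();
    }
}
```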
Do not close it; just flush. This is what Log4j's FileAppender does by default.
You should open once (and close once). If you do nothing, Java will close it for you. You may prefer to explicitly override Object.finalize().

Closing class IO resources in overridden finalize() method

If I have a class that uses an IO resource, such as a flat file on disk, a DB, or some other form of external resource, what are the pros and cons of closing those streams/connections in an overridden finalize() method to be run by the GC? I thought this could leverage the existing JVM GC and reduce the risk of relying on the client to invoke a class method called something like closeResources(), as well as save me from writing spaghetti-like try-catches (nested try-catches and ifs being my least favorite programming constructs).
As a concrete example, I have a simple file-reading wrapper. The class is constructed with a String filePath and reads the file into a List<String[]>. I don't want to have to close the BufferedReader in multiple places, e.g. closing it if there is a problem opening the file (catch clause) but also closing it if the file reads fine, etc. I want to put it in one place and make sure it is ALWAYS closed, no matter what, when the object gets GC'd.
Is this approach a good practice or am I trying to afford myself too high level a convenience within the scope of Java?
This is not a great idea as the finalize() method is not guaranteed to be called.
It's easier and better to just close the resources when your code is done with them.
If you hate writing the nested try-finally blocks to close the resources correctly, use something like commons-io's IOUtils to silently close the resources (or write your own simple util method to silently close them):
InputStream stream = ...;
try {
    ...
} finally {
    IOUtils.closeQuietly(stream);
}
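Since Java 7, try-with-resources gives the "close it in exactly one place, always" guarantee the question asks for, without finalize() and without closeQuietly. A minimal sketch of the asker's wrapper, assuming each line is split on a regular-expression delimiter such as ","; DelimitedFile and its constructor parameters are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical version of the asker's file-reading wrapper. */
final class DelimitedFile {
    private final List<String[]> rows;

    DelimitedFile(String filePath, String delimiterRegex) throws IOException {
        List<String[]> result = new ArrayList<>();
        // The reader is closed exactly once, whether reading succeeds or throws:
        // no finalize(), no nested try/catch, no closeQuietly.
        try (BufferedReader reader =
                     Files.newBufferedReader(Paths.get(filePath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line.split(delimiterRegex, -1)); // -1 keeps trailing empty fields
            }
        }
        this.rows = result;
    }

    List<String[]> rows() {
        return rows;
    }
}
```

The file is fully read and closed inside the constructor, so the object holds no open resource afterwards and there is nothing left for the GC (or the caller) to close.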
When the IO resource is an instance variable, then you should close it in the finalize() method.
Why?
Because, being an instance variable, it needs to stay open, since some method will be using it repeatedly.
If you close it in a method other than finalize(), you are creating temporal coupling, meaning the class user needs to know that they have to call certain methods in a certain temporal order, i.e. A before B, etc.
EDIT:
The Java documentation states that the garbage collector is not guaranteed to run at any specific time, and will not run finalize() as long as there are any references to the object. If references linger, that is a memory leak, a programming error. finalize() is the best option when the resource is not local to a method. If the resource is local to a method, close it in the finally block of a try/catch.
Yes, a finally block is always the best approach to release resources such as connections, I/O streams, etc.

Is FileOutputStream.write(byte[]) always blocking?

I wondered whether FileOutputStream.write(byte[]) always blocks the current thread, leading to a thread context switch, or whether the operation can avoid blocking if the OS buffers are large enough to handle the bytes.
The reason for these thoughts is that I wondered whether the logging I do with log4j in my application is a real performance hit, and whether it would be faster to use a queue of logging messages that is read by a separate thread and written to the log files (I know the disadvantage of swallowed logging statements if the app quits and the statements in the queue are not flushed to disk).
No, I didn't profile it yet, these are rather conceptual thoughts.
Need not be.
FileOutputStream.write(byte[]) hands the bytes to a native method. FileOutputStream itself keeps no buffer (its flush() does nothing), but the OS will typically copy the bytes into its own cache and return without waiting for them to reach the disk.
You can use the log4j org.apache.log4j.AsyncAppender and logging calls will not block. The actual logging is done in another thread so you won't need to worry about calls to log4j not returning in a timely manner.
By default immediateFlush is enabled which means that logging is slower but ensures that each append request is actually written out. You can set this to false if you don't care whether or not the last lines are written out if your application crashes.
log4j.appender.R.ImmediateFlush=false
Also, take a look at this post on Log4j: Performance Tips, in which the author has some test stats on using immediateFlush, bufferedIO and AsyncAppender. He concludes that for local logging one should "set immediateFlush=false, and leave bufferedIO at the default of don't buffer" and that "asycAppender actually takes longer than normal non-asyc".
It's likely going to depend on the OS, drivers and underlying file system. If write caching is enabled for example it'll probably return right away. I've seen gigabytes/day of logs written synchronously without affecting performance too much, as long as IO isn't bottlenecked. It's still probably worth writing them asynchronously if you're concerned about response times. And it eliminates potential future issues, e.g. if you changed to writing to network drive and the network has issues.
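The hand-rolled alternative the question describes (a queue drained by a separate thread) could be sketched as follows; AsyncLineWriter is a hypothetical name, not a log4j class, and the unbounded queue is an assumption for simplicity. The worker is a daemon thread, so lines still queued when the JVM exits without close() are lost, which is exactly the trade-off the question already mentions.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Writer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical queue-backed writer: callers enqueue, one background thread does the I/O. */
final class AsyncLineWriter implements Closeable {
    // Unique sentinel, compared by identity (==), that tells the worker to stop.
    private static final String STOP = new String("STOP");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Writer out;
    private final Thread worker;

    AsyncLineWriter(Writer out) {
        this.out = out;
        this.worker = new Thread(this::drain, "async-line-writer");
        worker.setDaemon(true); // don't keep the JVM alive just for logging
        worker.start();
    }

    /** Never blocks on I/O; with an unbounded queue, memory is the only limit. */
    void log(String line) {
        queue.add(line);
    }

    private void drain() {
        try {
            while (true) {
                String line = queue.take();
                if (line == STOP) {
                    break;
                }
                out.write(line);
                out.write(System.lineSeparator());
                if (queue.isEmpty()) {
                    out.flush(); // flush once per burst instead of once per line
                }
            }
            out.flush();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            e.printStackTrace(); // a real logger would need a better error policy
        }
    }

    /** Writes everything logged so far, then closes the underlying writer. */
    @Override
    public void close() throws IOException {
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        out.close();
    }
}
```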

Separate threads for socket input and output

I got assigned to work on some performance and random-crash issues in a multi-threaded Java server. Even though threads and thread safety are not really new topics for me, I found that designing a new multi-threaded application is probably half as difficult as trying to tweak some legacy code. I skimmed through some well-known books in search of answers, but the weird thing is, as long as I read about it and analyze the examples provided, everything seems clear. However, the second I look at the code I'm supposed to work on, I'm no longer sure about anything! Must be too much theoretical knowledge and too little real-world experience or something.
Anyway, getting back on topic, as I was doing some on-line research, I came across this piece of code. The question which keeps bothering me is: Is it really safe to invoke getInputStream() and getOutputStream() on the socket from two separate threads without synchronization? Or am I now getting a bit too paranoid about the whole thread-safety issue? Guess that's what happens when like the 5th book in a row tells you how many things can possibly go wrong with concurrency.
PS. Sorry if the question is a bit lengthy or maybe too 'noobie', please be easy on me - that's my first post here.
Edit: Just to be clear, I know sockets work in full-duplex mode and it's safe to concurrently use their input and output streams. Seems fine to me when you acquire those references in the main thread and then initialize thread objects with those, but is it also safe to get those streams in two different threads?
@rsp:
So I've checked Sun's code, and PlainSocketImpl does synchronize those two methods, just as you said. Socket, however, doesn't. getInputStream() and getOutputStream() are pretty much just wrappers around SocketImpl, so concurrency issues probably wouldn't cause the whole server to explode. Still, with a bit of unlucky timing, it seems things could go wrong (e.g. when some other thread closes the socket after the method has already checked for error conditions).
As you pointed out, from a code structure standpoint it would be a good idea to supply each thread with a stream reference instead of the whole socket. I would probably have already restructured the code I'm working on if not for the fact that each thread also uses the socket's close() method (e.g. when the socket receives a "shutdown" command). As far as I can tell, the main purpose of those threads is to queue messages for sending or for processing, so maybe it's a Single Responsibility Principle violation and those threads shouldn't be able to close the socket (compare with Separated Modem Interface)? But if I keep analysing the code for too long, it appears the design is generally flawed and the whole thing requires rewriting. Even if management were willing to pay the price, seriously refactoring legacy code with no unit tests whatsoever, while dealing with hard-to-debug concurrency issues, would probably do more harm than good. Wouldn't it?
The input stream and output stream of the socket represent two separate data streams, or channels. It is perfectly safe to use both streams from threads that are not synchronised with each other. The socket streams themselves will block when reading from an empty buffer or writing to a full one.
Edit: the socket implementation classes from Sun do synchronize the getInputStream() and getOutputStream() methods, so calling them from different threads should be OK. I agree with you, however, that passing the streams to the threads that use them might make more sense from a code structure standpoint (dependency injection helps testing, for instance).
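Passing each thread only the stream it needs, as both the question and this answer suggest, could look like this sketch. SplitStreams and exchange are hypothetical names, and the line-based protocol is an assumption for illustration. The streams are fetched once in the calling thread, and only that thread touches the Socket itself (shutdownOutput()), which keeps socket ownership in one place.

```java
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

/** Hypothetical helper: one thread writes, another reads, each holding only its own stream. */
final class SplitStreams {
    static List<String> exchange(Socket socket, List<String> messages)
            throws IOException, InterruptedException {
        // Fetch both streams once, up front, in the calling thread.
        InputStream in = socket.getInputStream();
        OutputStream out = socket.getOutputStream();
        List<String> replies = Collections.synchronizedList(new ArrayList<>());

        // Writer thread: sees only the OutputStream.
        Thread writer = new Thread(() -> {
            // Not in try-with-resources: closing a socket's stream closes the whole socket.
            PrintWriter w = new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8), true);
            for (String m : messages) {
                w.println(m); // autoflush is on, so each println is flushed
            }
        }, "socket-writer");

        // Reader thread: sees only the InputStream and reads until the peer closes.
        Thread reader = new Thread(() -> {
            BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            try {
                String line;
                while ((line = r.readLine()) != null) {
                    replies.add(line);
                }
            } catch (IOException e) {
                // Socket closed or reset; stop reading.
            }
        }, "socket-reader");

        writer.start();
        reader.start();
        writer.join();
        // The owner of the socket (this thread) decides when output is finished.
        socket.shutdownOutput();
        reader.join();
        return new ArrayList<>(replies);
    }
}
```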
