I wondered whether FileOutputStream.write(byte[]) always blocks the current thread, leading to a thread context switch, or whether this operation can avoid blocking if the OS buffers are large enough to handle the bytes.
The reason for these thoughts is that I wondered whether the logging I do with log4j in my application is a real performance hit, and whether it would be faster to use a queue of logging messages that is read by a separate thread and written to the log files (I know the disadvantage of swallowed logging statements if the app quits and the statements in the queue are not flushed to disk).
No, I didn't profile it yet, these are rather conceptual thoughts.
Need not be.
FileOutputStream.write(byte[]) is a native method. Common sense would suggest that write() may just write to the internal buffers, and a later call to flush() would actually commit it.
You can use the log4j org.apache.log4j.AsyncAppender and logging calls will not block. The actual logging is done in another thread so you won't need to worry about calls to log4j not returning in a timely manner.
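In log4j 1.x the AsyncAppender has to be wired up in XML configuration (the properties format cannot attach appenders to it); a minimal sketch, assuming you already have a file appender named FILE:

<!-- Wrap an existing appender named "FILE" (the name is illustrative) -->
<appender name="ASYNC" class="org.apache.log4j.AsyncAppender">
  <param name="BufferSize" value="500"/>
  <appender-ref ref="FILE"/>
</appender>
<root>
  <priority value="info"/>
  <appender-ref ref="ASYNC"/>
</root>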
By default immediateFlush is enabled which means that logging is slower but ensures that each append request is actually written out. You can set this to false if you don't care whether or not the last lines are written out if your application crashes.
log4j.appender.R.ImmediateFlush=false
Also, take a look at this post on Log4j: Performance Tips, in which the author has test stats on using immediateFlush, bufferedIO and asyncAppender. He concludes that for local logging you should "set immediateFlush=false, and leave bufferedIO at the default of don't buffer" and that "asyncAppender actually takes longer than normal non-async".
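In log4j 1.x properties, that advice might translate to something like this (the appender name R and the file path are illustrative):

log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=app.log
# don't flush on every append (faster, but the last lines may be lost on a crash)
log4j.appender.R.ImmediateFlush=false
# per the post's advice, leave bufferedIO at its default of false
log4j.appender.R.BufferedIO=false
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%d %p %c - %m%n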
It's likely going to depend on the OS, drivers and underlying file system. If write caching is enabled, for example, it'll probably return right away. I've seen gigabytes per day of logs written synchronously without affecting performance too much, as long as I/O isn't bottlenecked. It's still probably worth writing them asynchronously if you're concerned about response times, and it eliminates potential future issues, e.g. if you switch to writing to a network drive and the network has issues.
After
Runtime.getRuntime().exec(command);
I see syscalls happening that show 2-3 file descriptors (FIFO pipes). What is the proper way to close them with the try-with-resources pattern?
Most historical tribal knowledge found on Java forums suggests:
// out of date!
... } finally {
    IOUtils.closeQuietly(p.getOutputStream());
    IOUtils.closeQuietly(p.getInputStream());
    IOUtils.closeQuietly(p.getErrorStream());
}
but that doesn't sound right because 1) the closeQuietly method is deprecated and most libraries suggest using try-with-resources, and 2) it is inelegant, as I might not necessarily have all the streams.
And simply moving the exec() call into the try feels wrong, as it is not the resource I will call close() on.
Closing them isn't necessary; they close by themselves when the process dies. If the process never dies, it is also not necessary: either you spawn a new never-dying process every so often, in which case your system is going to crash and run out of resources whether you close these or not, or you spawn it only once, in which case these resources aren't going to count for much. For what it is worth, these are quite lightweight resources, and often they simply cannot be 'closed' in the sense that the resources can be 'freed': closing them either keeps them open but denies further communication (sending EOFs where needed), or reroutes them to /dev/null. Generally a process just has these 3 pipes and will continue to have them until it dies.
Yes, closeQuietly is a silly idea for virtually all purposes, and so it is here. If closing these streams somehow fails, you probably don't want to silently ignore that.
If you must close them, the individual streams from these 3 are closeable. However, note that you're reading rules of thumb and attempting to apply them as if they were gospel truth. try-with-resources is not always the right answer, and it is not a 100% replacement for close, let alone closeQuietly.
For example, try-with-resources is specifically designed around a period of usage. You declare the span of statements within which the resource should be available (the braces that go with the try block), and the construct will then ensure that the resource is closed once code flow transitions out of that span of statements, no matter how it exits. That probably makes it irrelevant here, too!
You are starting a long-lived process and don't care about the in/out; you just want the process to run and keep running. This means there is no span at all, and you should just call close() on these streams if you somehow feel it is important to try to save the resources, even though most likely this accomplishes nothing at all. No span of statements means try-with-resources isn't right.
You are starting a short-lived process that you interact with. The right thing to 'close' is the process itself, except you can't use try-with-resources for that: it can only be used on resources whose class implements AutoCloseable. Most do, some don't; Lock is a famous one, and Process is another. To 'close' a Process, you invoke destroy() or even destroyForcibly(), and you cannot use try-with-resources to do this (not without ugly hacks that defeat the purpose). Once you close/destroy the process, the streams that went along with it are dead too.
More generally, the principle is: if you create it, you close it. If you never call getOutputStream(), you never created them. On some OSes, fetching these streams and then closing them wastes more resources than not doing so. Thus, if the argument is based on some sort of purity model, you shouldn't close them either. If it's based on pragmatics, you'd have to test how heavy these resources really are (most likely, extremely light), whether closing them actually saves you some pipes (most likely, it will not), and whether close()-ing the result of invoking getOutputStream() on the process even helps if the answers to the above questions make that relevant (it probably will, but the spec does not guarantee this).
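If you do interact with a short-lived process, a minimal sketch of the destroy-based cleanup described above might look like this (the command and class name are purely illustrative):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ShortLivedProcess {
    public static void main(String[] args) throws IOException, InterruptedException {
        Process p = Runtime.getRuntime().exec(new String[] {"ls"});
        // The stream we actually fetched is the one we manage; try-with-resources
        // fits here because there is a clear span of usage.
        try (BufferedReader out = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = out.readLine()) != null) {
                System.out.println(line);
            }
            p.waitFor();
        } finally {
            // Process is not AutoCloseable; destroying it tears down its pipes.
            p.destroy();
        }
    }
}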
They are very lightweight resources that in almost every case don't require closing...
This question has been bugging me for a while: how do popular logging frameworks like Log4j, which allow concurrent, async logging, guarantee log order without performance bottlenecks? That is, if log statement L1 was invoked before log statement L2, L1 is guaranteed to be in the log file before L2.
I know Log4j2 uses a ring buffer and sequence numbers, but it still isn't intuitive how this solves the problem.
Could anyone give an intuitive explanation or point me to a resource doing the same?
This all depends on what you mean by "logging order". When talking about a single thread the logging order is preserved because each logging call results in a write.
When logging asynchronously each log event is added to a queue in the order it was received and is processed in First-in/First-out order, regardless of how it got there. This isn't really very challenging because the writer is single-threaded.
However, if you are talking about logging order across threads, that is never guaranteed - even when logging synchronously - because it can't be. Thread 1 could start to log before Thread 2 but thread 2 could get to the synchronization point in the write ahead of thread 1. Likewise, the same could occur when adding events to the queue. Locking the logging call in the logging method would preserve order, but for little to no benefit and with disastrous performance consequences.
In a multi-threaded environment it is entirely possible that you might see logging events where the timestamp is out of order, because Thread 1 resolved the timestamp and was then interrupted by Thread 2, which resolved its own timestamp and logged its event first. However, if you write your logs to something like ElasticSearch you would never notice, since it orders them by timestamp.
I want to write some logs at the debug level which will not be available in the production logs, which are at the info level. So how will these extra debug logs affect performance? I mean, if we set the log level to INFO, the logger has to check what the log level is and find that the log.debug needs to be ignored.
So does this extra log level checking affect performance?
Is there any automagical way of removing the log.debug() statements at deployment time? I mean, during development the log.debug calls will be there and we can debug, but at production deployment time an automagical mechanism would remove all log.debug() messages. I am not sure whether this is possible.
So how will these extra debug logs affect the performance?
It affects the performance of the application, as logging involves disk I/O calls (assuming you are writing to the file system), and the DEBUG log level is strictly NOT recommended for production environments.
Is there any automagical way of removing the log.debug() statements at deployment time?
No, there is no magical way of removing the log.debug() statements, BUT when you set the logging level to INFO, then as long as you are NOT doing heavy computations while passing the parameters to the debug() method, it should be fine. For example, if you have the logger level set to INFO and you have the below two loggers in your code:
logger.debug(" Entry:: "); //this logger is fine, no calculations
//Below logger, you are doing computations to print i.e., calling to String methods
logger.debug(" Entry : product:"+product+" dept:"+dept);//overhead toString() calls
I recommend using slf4j so that you can avoid the second logger's computation overhead by using {} placeholders (which are replaced with the actual values by its MessageFormatter), as shown below:
// Below, product and dept toString() are NOT invoked unless DEBUG is enabled
logger.debug(" Entry : product:{} dept:{}", product, dept);
One more important point is that slf4j is just an abstraction, so you can switch between logging frameworks; see the text below, taken from here.
The Simple Logging Facade for Java (SLF4J) serves as a simple facade or abstraction for various logging frameworks (e.g. java.util.logging, logback, log4j) allowing the end user to plug in the desired logging framework at deployment time.
You can wrap your "debug" statements in a call to isDebugEnabled()
if (log.isDebugEnabled()) {
    log.debug("my debug statement");
}
Likewise, wrap your "info" statements in a call to isInfoEnabled() etc.
The idea behind doing this is that checking whether a logging level is enabled is an inexpensive (fixed cost) operation. The cost to generate the statement that is being logged will vary depending on what you are doing.
You can minimize this by how you write your logging statements. If you write
Object a = ....
log.debug("I have an a: " + a);
then regardless of the logging framework you're using, the argument has to be evaluated before the debug method runs. That means that even if you're at INFO level, you're paying the performance cost of calling toString on a and building the argument string. If you instead write, e.g. (depending on what formatting your logging framework uses; this works in log4j 2 and slf4j)
log.debug("I have an a: {}", a);
you don't pay this cost but only the cost of the logger checking whether or not you're in DEBUG mode - unless you need it you don't pay for the argument evaluation.
The other thing to check is that you're buffering output (again, in slf4j, there are buffering appenders) which will minimize the writes.
Another technique that I'd like to point out, often used in Android development, is that you can post-process your jar to remove calls such as debug. The tool usually used is ProGuard. If you declare the call as side-effect free, it can be removed by the optimizer, ensuring pretty much zero performance penalty; it should even be smart enough to optimize away any string construction you were doing for the log message.
https://www.guardsquare.com/en/proguard/manual/usage#assumenosideeffects
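A rule along these lines declares the logging calls side-effect free so the optimizer may strip them (android.util.Log is used purely as an illustration; substitute your own logger's methods):

-assumenosideeffects class android.util.Log {
    public static int d(...);
    public static int v(...);
}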
The overhead of checking the logging level is very small, almost negligible. You will see a significant impact on performance when you enable debug logs, though. The impact depends on how much data you write to the logs, on your storage (if your storage is an SSD, the performance hit is smaller than with a normal disk), and on how many threads write to the log (since only one thread can write to a file at once, all the other threads have to wait; it is a sequential process). I have mentioned three, but there are more factors that decide how much impact logging will have on application performance.
To answer your second question there is no automatic way to remove debug statements from your code.
I'm writing a singleton logger for my program right now, and I was wondering whether it would be better to open and close the stream every time I log something, or to open the stream at creation of the singleton and close it at the termination of the program? And if I were to do the latter, how would I close it at termination?
The main advantage of opening the file once is performance. You save yourself the penalty of calling open each time and seeking to the end of the file for appending; this gets worse if the file is big (and some logs tend to be).
The cons are:
You might not be able to read the last log line immediately, if there is some buffering in the writer (delayed writes). However, this can be fixed by flushing after each write (you might lose some performance, but this is not usually relevant).
You cannot simultaneously write to the same log from different processes. But you probably don't need this - and if you need it, the open-and-close solution still needs to deal with concurrency.
Some external log processing (typically, log rotation with renaming) becomes problematic. To allow for this, you might need to implement some signalling that closes and reopens the file.
Typically, the advantages outweigh the cons, so the general rule is to keep the log file open. But that depends on the scenario.
(As other answers point out, normally you'd prefer to use some standard logging library instead of implementing this on your own. But it's instructive to give it a try, or at least to think of all the issues involved).
Do not close it, just flush, this is what Log4j FileAppender does by default.
You should open once (and close once). If you do nothing, Java will close it for you when the process exits. You may prefer to explicitly override Object.finalize().
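As a sketch of open-once/close-once (the class and file name are illustrative; it uses a shutdown hook rather than overriding finalize(), which is deprecated in newer JDKs):

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public final class SingletonLogger {
    private static final SingletonLogger INSTANCE = new SingletonLogger();
    private final PrintWriter out;

    private SingletonLogger() {
        try {
            // open once, in append mode
            out = new PrintWriter(new FileWriter("app.log", true));
        } catch (IOException e) {
            throw new ExceptionInInitializerError(e);
        }
        // close once, when the JVM terminates
        Runtime.getRuntime().addShutdownHook(new Thread(out::close));
    }

    public static SingletonLogger getInstance() {
        return INSTANCE;
    }

    public synchronized void log(String message) {
        out.println(message);
        out.flush(); // so the last line is readable immediately
    }
}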
I would like opinions on this to settle a small dispute. Any help would be greatly appreciated.
I have written my own file handler that is attached to the logger. Since it is a file handler accessed by multiple threads, I am using synchronization to ensure that there are no collisions during the writing process. Additionally, it is a rolling log, so I also close and open files, and I do not want any problems there either.
His response to it was (as pasted from the email):
I strongly believe that Synchronization is very bad in the Handler. It
is too complex for such easy task. So, I would say why do not use one
instance per Thread?
What would you say is better from a performance and memory-management perspective?
Thank you very much for any response. Whenever writing and reading are involved in multithreaded applications, I have used synchronization in Java applications all my life, and I have not heard of any severe performance issues.
So please I would like to know if there are any issues and I really should switch to one instance per thread.
And in general, what would be the downside of using synchronization?
EDIT: the reason why I wrote a custom file handler (yes I do love slf4j), is because my custom handler is dealing with two files at once, and additionally I have few other functions I perform on top of writing to files.
Another solution would be to use a separate thread to do the (costly on its own) writing, and concurrent queues to pass the log messages from the domain threads.
The key part here is that pushing to a queue is much less costly than writing to a file, which means there is less interference from concurrent log calls.
The call to log would then look like:
private static final BlockingQueue<String> logQueue = new LinkedBlockingQueue<>(); // e.g.; any BlockingQueue works

public static void log(String message) {
    // construct & filter message
    logQueue.add(message);
}
then in the logger thread it will look like
while (true) {
    // take() blocks until a message is available (poll() would return null
    // immediately and spin); declare or handle InterruptedException
    String message = logQueue.take();
    logFile.println(message); // or whatever you are doing
}
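Put together, a self-contained sketch might look like this (the class name, file name and daemon-thread wiring are illustrative choices, not the only way to do it):

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public final class AsyncLog {
    private static final BlockingQueue<String> QUEUE = new LinkedBlockingQueue<>();

    static {
        Thread writer = new Thread(AsyncLog::drain, "log-writer");
        writer.setDaemon(true); // don't keep the JVM alive just for logging
        writer.start();
    }

    public static void log(String message) {
        QUEUE.add(message); // cheap: no file I/O on the caller's thread
    }

    private static void drain() {
        try (PrintWriter logFile = new PrintWriter(new FileWriter("app.log", true))) {
            while (true) {
                logFile.println(QUEUE.take()); // blocks until a message arrives
                logFile.flush();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // exit quietly on interrupt
        } catch (IOException e) {
            e.printStackTrace(); // can't log to the log file here
        }
    }
}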
As with all I/O, you have little choice but mutual exclusion. You could theoretically build a complex scheme with a lock-free queue which accumulates logging entries, but its utility, and especially its reliability, would be very questionable: without careful design you could get a logging-caused OutOfMemoryError, have the application hang on exit due to threads which you didn't clean up, etc.
Keep in mind that, assuming you are using buffered I/O, you already have an equivalent of a queue, minimizing the time spent occupying the lock.
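For instance, wrapping the writer in a BufferedWriter means most log calls just copy the message into an in-memory buffer while holding the lock; the actual disk write happens only when the buffer fills or is flushed (the file name here is illustrative):

// BufferedWriter's default 8 KB buffer plays the role of the queue;
// the lock is held only for the in-memory copy on most calls
PrintWriter log = new PrintWriter(
        new BufferedWriter(new FileWriter("app.log", true)));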
The downside to synchronisation is the fact that only one thread can access that part of the code at any one time, meaning your code will see little benefit from multithreading there, i.e. the synchronised part of your application will only be as fast as a single thread. (There is a small overhead for handling the synchronised status too, so it may be a little slower.)
However, in subjects where you don't want the threads to interfere with one another, such as writing to files, the security gained from the synchronisation is paramount, and the performance loss should just be accepted.