Optimizing disk writes for Java Logger - java

I am using java.util.logging.Logger to log various events in my project, with a FileHandler to create the log file. I see that events are written to the log file on disk at almost the pace at which they happen. This seems good and bad at the same time: good because event updates are written quickly, but I am concerned about the I/O time. Sometimes there is a lot of data that needs to be written to the logs, and in those cases my program runs slower because of the logging, which is not desirable.
It would be of great help if somebody could suggest what I should do in this case. I do not care about the rate at which events are logged; they just need to be in the log file at the end of execution.
Thanks.

A performance loss of 5-10% is expected when running full debug logging. This seems to be acceptable for our customers.
If the code that generates some of the content to be logged is expensive, consider using a simple test like this to avoid executing that code when debug logging is turned off:
if (log.isLoggable(Level.FINEST)) {
    // code to generate the log entry
}
You can also create a java.util.logging.MemoryHandler, which buffers records in memory, and push them out to a FileHandler at a regular interval (or automatically when a record at a configured trigger level arrives).
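A minimal sketch of that setup (the file name, buffer size, and push level are illustrative):

import java.util.logging.*;

public class BufferedLoggingDemo {
    public static void main(String[] args) throws Exception {
        Logger log = Logger.getLogger("app");
        // Buffer up to 1000 records in memory; push them to the file
        // automatically when a SEVERE record arrives, or on an explicit push().
        FileHandler file = new FileHandler("app.log");
        MemoryHandler memory = new MemoryHandler(file, 1000, Level.SEVERE);
        log.addHandler(memory);
        log.setUseParentHandlers(false);

        log.info("buffered in memory, not necessarily on disk yet");
        memory.push(); // flush the buffered records to the FileHandler
    }
}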

Jochen Bedersdorfer's answer is a good one, and just4log is a system that will do it automatically for you - via post-processing - so you won't have to ugly up your code with if statements around the log statements.

Pexus has recently released an open source performance logging package, PerfLog, which also includes an application logger based on the java.util.logging.* API. It includes an option for asynchronous logging using the CommonJ Work Manager that is available in all J2EE 1.4+ containers.
For more information see: http://www.pexus.com/perflog

Use a more modern logging library such as log4j or slf4j (with logback), which support asynchronous/buffered appenders.
In log4j, you can use AsyncAppender (which provides the buffering facility) and wire up a FileAppender to it:
The AsyncAppender will collect the events sent to it and then dispatch them to all the appenders that are attached to it. You can attach multiple appenders to an AsyncAppender.
The AsyncAppender uses a separate thread to serve the events in its buffer.
This way the events are written to the disk in a controlled manner, and your threads doing actual work are not tied up with disk I/O.
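For instance, here is a programmatic sketch of that wiring (the file name, pattern, and buffer size are illustrative):

import org.apache.log4j.AsyncAppender;
import org.apache.log4j.FileAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

public class AsyncSetup {
    public static void main(String[] args) throws Exception {
        FileAppender file = new FileAppender(
                new PatternLayout("%d %p %c - %m%n"), "app.log");
        AsyncAppender async = new AsyncAppender();
        async.addAppender(file);  // events are dispatched to the file from a background thread
        async.setBufferSize(512); // events held in memory before callers start blocking
        Logger.getRootLogger().addAppender(async);

        Logger.getLogger(AsyncSetup.class).info("logged via the async buffer");
    }
}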
Or, as a simpler option, consider whether you really need the full output of the logs when running this program. It's often overkill to run an application in production with logging at the DEBUG level.

I would suggest you try out another logging solution, like log4j, which is widely used (often in combination with commons-logging). It offers a performant approach to logging.
If, however, you desire even more control, you can implement your own appender. Assuming you want a file appender, you can override the append routine of FileAppender.
E.g.,
import java.util.LinkedList;
import java.util.List;

import org.apache.log4j.FileAppender;
import org.apache.log4j.spi.LoggingEvent;

public class BatchingFileAppender extends FileAppender {

    public static final int BATCH_SIZE = 10;

    private final List<LoggingEvent> batch = new LinkedList<LoggingEvent>();

    @Override
    protected void append(LoggingEvent event) {
        batch.add(event);
        // only push every BATCH_SIZE'th message out to the file
        if (batch.size() >= BATCH_SIZE) {
            appendBatch();
        }
    }

    @Override
    protected void reset() {
        appendBatch(); // flush buffered events before the writer is reset
        super.reset();
    }

    @Override
    protected void closeWriter() {
        appendBatch(); // flush buffered events before the writer is closed
        super.closeWriter();
    }

    private void appendBatch() {
        for (LoggingEvent event : batch) {
            super.append(event);
        }
        batch.clear();
    }
}
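To use it, wire it in like any other appender; a hypothetical log4j.properties snippet (the package name and file path are placeholders):

# Hypothetical wiring for the batching appender above
log4j.rootLogger=DEBUG, BATCH
log4j.appender.BATCH=com.example.logging.BatchingFileAppender
log4j.appender.BATCH.File=app.log
log4j.appender.BATCH.layout=org.apache.log4j.PatternLayout
log4j.appender.BATCH.layout.ConversionPattern=%d %p %c - %m%n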

You should check out Logback. Same author as log4j, if I'm not mistaken.
Based on our previous work on log4j, logback internals have been
re-written to perform about ten times faster on certain critical
execution paths. Not only are logback components faster, they have a
smaller memory footprint as well.

Related

Does log.debug decrease performance

I want to write some logs at the DEBUG level that will not be present in the production logs, which use the INFO log level. So how will these extra debug logs affect performance? I mean, if we set the log level to INFO, the logger has to check what the log level is and find that the log.debug calls need to be ignored.
So does this extra log level checking affect performance?
Is there any automagical way of removing the log.debug() statements at deployment time? I mean, during development the log.debug statements will be there and we can debug, but at production deployment time an automagical mechanism would remove all log.debug() messages. I am not sure whether these things are possible.
So how will this extra debug logs affect the performance?
It affects the performance of the application, as logging involves disk I/O calls (assuming you are writing to the file system), and the DEBUG log level is strictly NOT recommended for production environments.
Is there any automagical way of removing the log.debug() statements
while deployment?
No, there is no magical way of removing the log.debug() statements, BUT when you set the logging level to INFO, then as long as you are NOT doing heavy computations while passing the parameters to the debug() method, it should be fine. For example, if you have the logger level set to INFO and the two logging calls below in your code:
logger.debug(" Entry:: "); //this logger is fine, no calculations
//Below logger, you are doing computations to print i.e., calling to String methods
logger.debug(" Entry : product:"+product+" dept:"+dept);//overhead toString() calls
I recommend using slf4j so that you can avoid the second call's computation overhead by using {} placeholders (which are replaced with the actual values by its MessageFormatter), as shown below:
// In the call below, toString() is NOT invoked on product and dept
// unless DEBUG is actually enabled
logger.debug(" Entry : product:{} dept:{}", product, dept);
One more important point: slf4j is just an abstraction, so you can switch between logging frameworks underneath it. See the text below, taken from here.
The Simple Logging Facade for Java (SLF4J) serves as a simple facade
or abstraction for various logging frameworks (e.g. java.util.logging,
logback, log4j) allowing the end user to plug in the desired logging
framework at deployment time.
You can wrap your "debug" statements in a call to isDebugEnabled():
if (log.isDebugEnabled()) {
    log.debug("my debug statement");
}
Likewise, wrap your "info" statements in a call to isInfoEnabled() etc.
The idea behind doing this is that checking whether a logging level is enabled is an inexpensive (fixed cost) operation. The cost to generate the statement that is being logged will vary depending on what you are doing.
You can minimize this by how you write your logging statements. If you write
Object a = ....
log.debug("I have an a: " + a);
then regardless of the logging framework you're using, the argument has to be evaluated before the debug method runs. That means that even if you're at INFO level, you're paying the performance cost of calling toString on a and building the argument string. If you instead write, e.g. (depending on what formatting your logging framework uses; this works in slf4j and log4j 2):
log.debug("I have an a: {}", a);
you don't pay this cost, only the cost of the logger checking whether or not you're in DEBUG mode; unless you need it, you don't pay for the argument evaluation.
The other thing to check is that you're buffering output (in logback, slf4j's native implementation, there are buffering and asynchronous appenders), which will minimize the writes.
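For example, a minimal logback.xml sketch that routes a file appender through logback's AsyncAppender (the file name and pattern are illustrative):

<configuration>
  <appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <file>app.log</file>
    <encoder>
      <pattern>%d %p %c - %m%n</pattern>
    </encoder>
  </appender>
  <appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
    <appender-ref ref="FILE"/>
  </appender>
  <root level="INFO">
    <appender-ref ref="ASYNC"/>
  </root>
</configuration>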
Another technique I'd like to point out, often used in Android development, is that you can post-process your jar to remove calls such as debug. The tool usually used is ProGuard. If you declare the call to be side-effect free, it can be removed by the optimizer, ensuring pretty much zero performance penalty. It should even be smart enough to optimize away any string construction you were doing for the log message.
https://www.guardsquare.com/en/proguard/manual/usage#assumenosideeffects
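A sketch of such a rule (assuming slf4j; the exact rules depend on your logger, and ProGuard must run with optimization enabled):

# Declare debug/trace as side-effect free so the optimizer may remove
# the calls, along with the string building that feeds them.
-assumenosideeffects class org.slf4j.Logger {
    public void debug(...);
    public void trace(...);
}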
The overhead of checking the logging level is very small, almost negligible. You will, however, see a significant impact on performance when you enable debug logs. The impact depends on how much data you write to the logs, on your storage (if your storage is an SSD, the performance hit is smaller than with a spinning disk), and on how many threads write to the log (since only one thread can write to a file at a time, all the other threads have to wait, making it a sequential process). I have mentioned three factors, but there are more that decide how much impact logging will have on application performance.
To answer your second question: there is no automatic way to remove debug statements from your code.

Log4J - One log file per thread in an environment with dynamic thread creation

Let me begin with a brief explanation of the system I am using:
We have a system that runs as a Daemon, running work-units as they come in. This daemon dynamically creates a new Thread when it is given a new work-unit. For each of these units of work, I need Log4J to create a new log file to append to - the filename will be provided at runtime, when the new log file must be created. This daemon must be able to remain alive indefinitely, which I believe causes some memory concerns, as I will explain.
My first thought was to create a new Logger for each work-unit, naming it after the thread, of course. The work-unit's thread retains a reference to that Logger. When the unit is finished, it will be garbage-collected, but the problem is that Log4J itself retains a reference to the Logger, which will never be used again. It seems likely that all these Loggers will cause the VM to run out of memory.
Another solution: subclass Filter, to filter Appenders by the thread names, and place them on the same Logger. Then, remove the Appenders as the work-units complete. Of course, this necessitates adding the code to remove the appenders. That will be a lot of code changes.
I've looked into NDC and MDC, which appear to be intended for managing interleaved output to the same file. I've considered proposing this as a solution, but I don't think it will be accepted.
I want to say that Log4J appears not to be intended to work this way, that is, dynamically creating new log files at runtime as they are required (or desired). So I am not sure which direction to look in next: is log4j not the solution here, or did I completely miss something? Have I not looked closely enough at NDC? Or is my concern about Log4J holding onto Loggers a nonissue for reasons I don't see?
You could just create a new log method that wraps the normal log method and appends the thread id.
Something like the following (oversimplified, but you get the idea). Log4j is already thread safe, I believe, so as long as you aren't logging a ton, you should be fine. Then you can easily grep on the thread id.
public void log(long id, String message) {
    logger.info("ThreadId: " + id + " message: " + message);
}
ThreadLocal appears to be the solution for you.
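Alternatively, here is a sketch of the filter-per-appender idea from the question, assuming Log4j 1.x: each work unit attaches its own FileAppender, guarded by a Filter that accepts only events from that unit's thread, and detaches it when the unit completes, so nothing accumulates (names and layout are illustrative):

import org.apache.log4j.FileAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;
import org.apache.log4j.spi.Filter;
import org.apache.log4j.spi.LoggingEvent;

public class PerThreadFileLogging {

    private static final Logger LOG = Logger.getLogger("work-units");

    public static void runUnit(String unitName, Runnable work) throws Exception {
        final String threadName = Thread.currentThread().getName();
        FileAppender appender = new FileAppender(
                new PatternLayout("%d %p - %m%n"), unitName + ".log");
        // Accept only events logged from this work unit's thread.
        appender.addFilter(new Filter() {
            @Override
            public int decide(LoggingEvent event) {
                return threadName.equals(event.getThreadName()) ? ACCEPT : DENY;
            }
        });
        LOG.addAppender(appender);
        try {
            work.run();
        } finally {
            LOG.removeAppender(appender); // nothing is retained after the unit ends
            appender.close();
        }
    }
}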

Log4j: is it synchronized for multithreaded calls?

We are running into an interesting issue that we noticed while doing stress testing of our system. We are using log4j (in JBoss) very heavily for our logging. Here is a naive example of some logging we have:
void someFunction()
{
    Log.info("entered some function");
    ...
    Log.info("existed some function");
}
Now, the interesting thing we noticed is that if we launch 100 threads against this function, the Log.info() calls block per thread, meaning thread 2 waits for thread 1 to finish its Log.info() call. In the case of thread 100, it ends up waiting quite a long time. We are using a native file logger.
Is this a known issue?
Log4J has to be synchronized; otherwise you would see interleaved and garbled log messages in your file. But at least in Logback, only appenders are synchronized, not the whole logging path (so computing the effective log level, the log message, etc. is multi-threaded).
However, even if the synchronization were removed, I/O would be the bottleneck, since it is inherently single-threaded. Thus, consider reducing the amount of logging, since it is the file access that is slow, not Log4J.
You may also be interested in AsyncAppender to queue logging messages in a single, different thread.
Yes, log4j uses multithreaded synchronization. And not perfectly, sometimes.
We have experienced performance degradation caused by contention for log4j locks, and even deadlocks, when complex toString() methods were used.
See https://issues.apache.org/bugzilla/show_bug.cgi?id=24159 and https://issues.apache.org/bugzilla/show_bug.cgi?id=41214#c38, for example.
More details in another answer of mine:
Production settings file for log4j?
I guess this is one of the reasons logback exists, and for the switch to a custom log manager since JBoss AS 6.
What you might want is asynchronous logging, see this article on how to achieve that:
Asynchronous logging with log4j
Also, consider using the right log levels. The entered... and exi(s)ted... statements should typically be logged at the TRACE level, which might be handy when debugging (then configure log4j to log at the TRACE level as well). In a production setting you might want to tell log4j to log only from the INFO or DEBUG level, thus avoiding unnecessary log actions.
See also this question on the performance of log4j:
log4j performance
Others have already suggested alternatives; I've been digging through the source code, and indeed there is a synchronized section:
public void info(Object message) {
    if (repository.isDisabled(Level.INFO_INT))
        return;
    if (Level.INFO.isGreaterOrEqual(this.getEffectiveLevel()))
        forcedLog(FQCN, Level.INFO, message, null);
}
...
protected void forcedLog(String fqcn, Priority level, Object message, Throwable t) {
    callAppenders(new LoggingEvent(fqcn, this, level, message, t));
}
...
public void callAppenders(LoggingEvent event) {
    int writes = 0;
    for (Category c = this; c != null; c = c.parent) {
        // Protected against simultaneous call to addAppender, removeAppender,...
        synchronized (c) {
            if (c.aai != null) {
                writes += c.aai.appendLoopOnAppenders(event);
            }
            if (!c.additive) {
                break;
            }
        }
    }
    if (writes == 0) {
        repository.emitNoAppenderWarning(this);
    }
}
Agree with the previous answers. One of the first performance improvement steps in any application is to reduce the log level so that less is written to the logs. Application developers should be diligent in using the right logging levels. Logging has a huge impact on performance because of I/O as well as synchronization, especially when logger objects are static and shared among various threads.

Is FileOutputStream.write(byte[]) always blocking?

I wondered whether FileOutputStream.write(byte[]) always blocks the current thread, leading to a thread context switch, or whether this operation can avoid blocking when the OS buffers are large enough to handle the bytes.
The reason for these thoughts: I wondered whether the logging I do with log4j in my application is a real performance hit, and whether it would be faster to use a queue of logging messages that is read by a separate thread and written to the log files (I know the disadvantage of swallowed logging statements if the app quits while statements in the queue have not yet been flushed to disk).
No, I didn't profile it yet, these are rather conceptual thoughts.
Need not be.
FileOutputStream.write(byte[]) is a native method. Common sense would suggest that write() may just write to the internal buffers, and a later call to flush() would actually commit it.
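To make the distinction concrete, a small sketch (the file name is illustrative): write() hands the bytes to the OS, which may buffer them; FileDescriptor.sync() is the call that actually forces them to the device.

import java.io.FileOutputStream;
import java.io.IOException;

public class WriteDemo {
    public static void main(String[] args) throws IOException {
        FileOutputStream out = new FileOutputStream("demo.log");
        try {
            // Returns once the bytes are handed to the OS; they may still
            // sit in OS-level caches rather than on the physical disk.
            out.write("hello\n".getBytes("UTF-8"));
            // Ask the OS to commit its buffered data to the device.
            out.getFD().sync();
        } finally {
            out.close();
        }
    }
}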
You can use the log4j org.apache.log4j.AsyncAppender and logging calls will not block. The actual logging is done in another thread so you won't need to worry about calls to log4j not returning in a timely manner.
By default immediateFlush is enabled which means that logging is slower but ensures that each append request is actually written out. You can set this to false if you don't care whether or not the last lines are written out if your application crashes.
log4j.appender.R.ImmediateFlush=false
Also, take a look at this post on Log4j: Performance Tips, in which the author presents some test stats on using immediateFlush, bufferedIO and asyncAppender. He concludes that for local logging you should "set immediateFlush=false, and leave bufferedIO at the default of don't buffer", and that "asycAppender actually takes longer than normal non-asyc".
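Put together, a sketch of the relevant appender settings (the appender name R follows the snippet above; the file name and layout are illustrative):

log4j.rootLogger=INFO, R
log4j.appender.R=org.apache.log4j.FileAppender
log4j.appender.R.File=app.log
# Let the OS absorb writes instead of flushing on every event
log4j.appender.R.ImmediateFlush=false
# Per the article's conclusion, leave bufferedIO at its default (off)
log4j.appender.R.BufferedIO=false
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%d %p %c - %m%n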
It's likely going to depend on the OS, drivers, and underlying file system. If write caching is enabled, for example, it'll probably return right away. I've seen gigabytes per day of logs written synchronously without affecting performance too much, as long as I/O isn't bottlenecked. It's still probably worth writing them asynchronously if you're concerned about response times, and it eliminates potential future issues, e.g. if you changed to writing to a network drive and the network has issues.

Log4J rerouting of Log Events

I would like to build an Appender (or something similar) that inspects Events and, on certain conditions, creates and logs new Events.
An example would be an Escalating Appender that checks whether a certain number of identical Events get logged and, if so, logs the Event at a higher log level. So you could define something like: if you get more than 10 identical Warnings on this logger, make it an Error.
So my questions are:
Does something like this already exist?
Is an Appender the right class to implement this behavior?
Are there any traps you could think of I should look out for?
Clarification:
I am fine with the algorithm of gathering and analysing the events; I'll do that with a collection inside the appender. Persistence is not necessary for my purpose. My question #2 is: is an appender the right place to do this? After all, it is not normal behaviour for an appender to create logging entries.
You can create your own appender by implementing the Appender interface provided by log4j.
http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Appender.html
That would be one approach. Another would be to use an existing appender and then write some code that monitors the log. For example, you could log to the database and then write a process that monitors the log entries in the database and creates meta-events based on what it sees.
It depends most on what you're comfortable with. One question you'll have to deal with is how to look back in the log to create your meta-events. Either you'll have to accumulate events in your appender or persist them somewhere that you can query to construct your meta-events. The problem with accumulating them is that if you stop and start your process, you'll either have to dump them somewhere so they get picked back up or start over whenever the process restarts.
For example, let's say that I want to create a log entry every 10th time a NullPointerException is thrown. If I have the log entries in a database of some kind, every time an NPE is thrown I can run a query to see how many NPEs have been thrown since the last time I created a log entry for them. But if I just count them in memory and the application restarts after 5 are thrown, then without persisting that number I'll lose count.
Logback (log4j's successor) will allow you to enable logging for any event via TurboFilters. For example, assuming the same event occurs N or more times in a given timeframe, you could force the event to be accepted (regardless of its level). See also DuplicateMessageFilter which does the inverse (denying re-occurring events).
However, even logback will not allow the level of a logging event to be incremented, and neither will log4j. Neither framework is designed for this, and I would discourage you from attempting to increment the level on the fly within the same thread. Incrementing the level during post-processing is a different matter altogether, and having your turbo-filter signal another thread to generate a new logging event with a higher level is an additional possibility.
It was not clear from your question why you wished the level to be incremented. Was incrementing the level a goal in itself, or was it a means to an end, namely having the event logged regardless of its level? If the latter, then logback's TurboFilters are the way to go.
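As a sketch of such a TurboFilter (the threshold and the keying on the raw message format are illustrative choices; the decide() signature is logback's):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

import org.slf4j.Marker;

import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import ch.qos.logback.classic.turbo.TurboFilter;
import ch.qos.logback.core.spi.FilterReply;

// Force-accepts a message once it has been seen THRESHOLD times,
// regardless of the logger's effective level.
public class EscalatingFilter extends TurboFilter {

    private static final int THRESHOLD = 10;

    private final Map<String, AtomicInteger> counts =
            new ConcurrentHashMap<String, AtomicInteger>();

    @Override
    public FilterReply decide(Marker marker, Logger logger, Level level,
                              String format, Object[] params, Throwable t) {
        if (format == null) {
            return FilterReply.NEUTRAL;
        }
        AtomicInteger count = counts.get(format);
        if (count == null) {
            AtomicInteger fresh = new AtomicInteger();
            AtomicInteger prev = counts.putIfAbsent(format, fresh);
            count = (prev != null) ? prev : fresh;
        }
        // Let the event through once it has repeated often enough.
        return count.incrementAndGet() >= THRESHOLD
                ? FilterReply.ACCEPT : FilterReply.NEUTRAL;
    }
}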
HTH,
As Rafe already pointed out, the greatest challenge would be persisting the actual events in the Appender, so that you'll know the time has come to trigger your event (e.g. escalate log level).
Therefore, I propose the following strategy:
1. Use a custom JDBCAppender. Unlike the one bundled with Log4j, this one can log exceptions.
2. Set up an embedded database, like HSQLDB, with one table for event logging. This solves the persistence problem, as you can use SQL to find the types of events that occurred.
3. Run a separate thread that monitors the database and detects the desired event patterns.
4. Use a LogManager to access the desired Loggers and set their levels manually.
