log4j performance on huge treatment batch (10-12h) - java

I would like to know whether anyone has observed that a large number of log writes to a logfile (300k lines over 4 hours of processing) can penalize batch performance.
P1: The batch writes a lot of info to the logfile, and I wonder: if we deleted or commented out all these log writes in the source code, could batch performance increase and gain us 15 minutes or more of execution time?
We could have a million or more lines in a full batch execution (8-12 hours).
P2: Or could the database checks and log writes be done in parallel? I don't think our source code does that.

Well, yes. Too much logging does affect performance. But the only way to know how much it affects performance would be to measure it.
P1: The batch writes a lot of info to the logfile, and I wonder: if we deleted or commented out all these log writes in the source code, could batch performance increase and gain us 15 minutes or more of execution time?
Nobody can tell you how much time you would gain. (I'd be surprised if you gained as much as that, but I could be wrong. Measure it!!)
P2: Or could the database checks and log writes be done in parallel? I don't think our source code does that.
It is probably a bad idea to explicitly code parallel logging into your application, since it will make your code a lot more complicated. And there is a better way to get some parallelism: try using an asynchronous appender.
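For log4j 2.x, a minimal asynchronous-appender configuration might look like the sketch below; the file name, pattern, and level are placeholders, not taken from the question:

```xml
<Configuration>
  <Appenders>
    <!-- Ordinary file appender doing the actual writes -->
    <File name="file" fileName="batch.log">
      <PatternLayout pattern="%d %p %c - %m%n"/>
    </File>
    <!-- Async wrapper: the application thread only enqueues the event -->
    <Async name="async">
      <AppenderRef ref="file"/>
    </Async>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="async"/>
    </Root>
  </Loggers>
</Configuration>
```

With this setup, the batch thread hands the log event to a background thread and continues, so file I/O overlaps with the batch work instead of blocking it.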
There are a number of things that you can do to tune logging performance without going to the extreme of ripping it all out. These include:
Switch to a different logging library. For example, log4j 2.x should be more efficient than log4j 1.2.
Don't log too much.
Log at an appropriate level, and adjust the log level depending on the circumstances.
Make sure that you are creating the log messages efficiently. For example, avoid generating complicated message strings that won't be logged due to the logging level. (In log4j 2.x, use the Logger methods that take format strings.)
Avoid expensive features in your log format / formatter. For instance logging the class / method is relatively expensive.
Try using an asynchronous log appender.
For some background on logging performance, take a look at the log4j2 Performance page.
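The point about creating log messages efficiently can be illustrated with a self-contained sketch. It uses java.util.logging's Supplier overloads purely as a stdlib stand-in for log4j 2.x format strings; the idea is the same: a disabled level should skip message construction entirely.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLogDemo {
    static final AtomicInteger calls = new AtomicInteger();

    // Stands in for an expensive-to-build debug message.
    static String expensiveDump() {
        calls.incrementAndGet();               // count how often the message is built
        return "state=" + System.nanoTime();
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("demo");
        log.setLevel(Level.INFO);              // FINE is disabled

        // Eager: the string is built even though FINE is off.
        log.fine("dump: " + expensiveDump());

        // Lazy: the supplier is only invoked if FINE is enabled, so not here.
        log.fine(() -> "dump: " + expensiveDump());

        System.out.println("messages built: " + calls.get()); // prints 1
    }
}
```

Only the eager call paid the construction cost; the lazy one was skipped by the level check.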

Related

Does disabled logger.debug affect performance?

I use log4j in my application. In development I use tons of logger.debug calls to display information for debugging. I know I can make this verbose output go away by changing the logging level in the configuration file when deployed; my question is, will this affect performance? Even though the debug level is disabled, is the logging work still there, doing something silently? Is it better to remove all the logger.debug calls in the final deployed version if performance is important?
Modern loggers return very quickly from an inactive logging statement for this very reason.
You need to be aware of the cost of constructing the string to be logged. If you use slf4j as the front end, use {} placeholders to delay this until after the level check.
Any I/O operation will affect performance. Even if you change the logging level, each time you call log.debug the logger still has to decide whether to print the message. However, making that decision is much faster than actually writing to a file, console, or anything else.

Production logging for Java app

There is an inherent tension between logging verbosity and the performance of a Java app in production. If we log very selectively, we might miss the evidence needed to debug production issues. If we add too much logging in production, it can impact performance.
I was thinking along the lines of a couple of options:
Log selectively, only the important things
Have SSDs instead of hard disks in prod
Have a logging utility that can "batch" logging statements and flush them periodically
Have some utility that will hold logs in memory and then flush them eventually.
What are the best approaches other than those outlined above? Are there any existing logging tools that can be used for this purpose?
Try Apache log4j with slf4j (you can swap out log4j without many changes to your code). Use the XML configuration to specify what to log and which files to log to.
Also use rolling file appenders and buffered appenders to handle flushing and batching of logs.
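Assuming log4j 2.x, a rolling, buffered file appender could be configured roughly as below; file names, the rollover size, and the layout pattern are placeholders:

```xml
<Configuration>
  <Appenders>
    <!-- bufferedIO + immediateFlush="false" batch writes in memory;
         rollover keeps any single file from growing without bound -->
    <RollingFile name="rolling" fileName="app.log"
                 filePattern="app-%d{yyyy-MM-dd}-%i.log.gz"
                 bufferedIO="true" immediateFlush="false">
      <PatternLayout pattern="%d %p %c - %m%n"/>
      <Policies>
        <TimeBasedTriggeringPolicy/>
        <SizeBasedTriggeringPolicy size="50 MB"/>
      </Policies>
    </RollingFile>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="rolling"/>
    </Root>
  </Loggers>
</Configuration>
```

Note the trade-off: with buffered, non-immediate flushing, the last few buffered log lines can be lost if the JVM dies abruptly.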

log4j over System.out.println

What is the performance-wise advantage of using log4j over System.out.println?
FYI: I know log4j has multiple appenders, debug logging, and other features that System.out.println doesn't have, is applicable at class level, and is used in larger applications.
But for a small application, say one logging to a single file, will log4j provide better performance than System.out.println? How does log4j work internally?
Log4j isn't guaranteed to be more performant. It was created to give you more control over log volume and log output. Imagine a Tomcat server logging large amounts of Hibernate database accesses. With the log level, for example, you can stop straining the server with these info logs. But this is not a "native" performance advantage, since you can simulate it with a flag check before each sysout.
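The "flag check before sysouts" idea can be sketched like this; the DEBUG flag and the debug helper are hypothetical, not part of any library:

```java
public class SysoutFlagDemo {
    // Hypothetical application-level flag standing in for a log level check.
    static final boolean DEBUG = false;

    static int logged = 0;

    static void debug(String msg) {
        if (DEBUG) {                 // skipped entirely when DEBUG is false
            System.out.println(msg);
            logged++;
        }
    }

    public static void main(String[] args) {
        debug("hibernate access #1");
        System.out.println("debug lines written: " + logged); // prints 0
    }
}
```

This reproduces the level-check behavior by hand, which is exactly why it is not a performance advantage unique to log4j.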
The only case I can think of in which you could perceive a performance advantage of log4j over System.out.println would be, for example, doing this:
logger.debug("Logging something");
Instead of doing this:
System.out.println("Logging something");
And you then configure the logging level in such a way that debug messages are not logged.
In that scenario, the first line will not actually log anything, but the second one will still write to System.out.
In all other cases (I mean, in all cases where logging is actually done) I don't think log4j will be faster performance-wise.
Usually it's used for all the reasons you state, and because the performance is not that different (framework overhead is minimal compared to the time it actually takes to write to files).

Transaction log library

I am in need of a transaction log library with the following features:
Maximum performance. No forced flushes; let the OS write out buffers at its own discretion. The file size increases in big chunks, to minimize metadata modifications. I don't care if some of the last records are lost.
Reading records in backward order (most recent first).
The problem is how to find the last valid record when reading the log file. What techniques can be used, or is there a ready open-source library?
Did you check whether HOWL - High-speed ObjectWeb Logger matches your requirements? It is rather out of date and does not seem to allow random access or reading backward. However, it supports setting a mark and replaying events from a mark onward. Because it is open source, it might be adaptable to your needs.
You might also investigate whether the logging part of JBoss Transactions is suitable.
Please specify what you mean by "reading backwards" through a transaction log. A transaction log may contain logs from multiple transactions, each consisting of a sequence of events.
More information on transaction logging can be found here (and on the web of course):
Java Transaction Processing: Design and Implementation (ISBN 978-0130352903)
Fundamentals of Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery (ISBN 978-1558605084)
Principles of Transaction Processing (ISBN 978-1558606234)
and in various books about database system concepts
Hope this helps a bit in getting closer to your goal.
Michael
Most of the well-known logging systems (like log4j and other Apache logging projects) support different kinds of logging mechanisms; you just have to configure them correctly. But reading a log backwards is really resource-consuming, because file streams are sequential, so each new record would effectively have to be pushed in front of all the others. You would probably also have to write most of the logging code yourself.
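One common technique for the "find the last valid record" problem (a general sketch, not taken from any specific library) is to frame each record as [length][payload][checksum], then on recovery scan forward and stop at the first record that fails validation; everything before that point is the valid log.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class LogScan {
    // Append one record: [int length][payload][int crc32-of-payload].
    static void append(ByteBuffer buf, byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        buf.putInt(payload.length).put(payload).putInt((int) crc.getValue());
    }

    // Scan forward and return the offset just past the last valid record.
    static int lastValidEnd(ByteBuffer buf, int limit) {
        int pos = 0;
        while (pos + 8 <= limit) {
            int len = buf.getInt(pos);
            if (len < 0 || pos + 4 + len + 4 > limit) break; // truncated tail
            byte[] payload = new byte[len];
            buf.position(pos + 4);
            buf.get(payload);
            CRC32 crc = new CRC32();
            crc.update(payload);
            if ((int) crc.getValue() != buf.getInt(pos + 4 + len)) break; // torn write
            pos += 4 + len + 4;
        }
        return pos;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        append(buf, "first".getBytes());
        append(buf, "second".getBytes());
        append(buf, "torn".getBytes());
        // Simulate a torn write: corrupt the last record's checksum.
        buf.putInt(buf.position() - 4, 0xDEADBEEF);
        System.out.println(lastValidEnd(buf, buf.position())); // prints 27
    }
}
```

Storing the length again after the checksum (a footer) would additionally allow stepping through records backward, most recent first, which is the other requirement in the question.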

Using java.util.logging, is it possible to restart logs after a certain period of time?

I have some Java code that will be running as an importer of data for a much larger project. The initial logging code was done with the java.util.logging classes, so I'd like to keep it if possible, but it seems a little inadequate now given the amount of data passing through the importer.
Often the importer will get data that the main system has no information for, or that doesn't match the system's data, so it is ignored, but a message is written to the log about what information was dropped and why it wasn't imported. The problem is that this log tends to grow in size very quickly, so we'd like to be able to start a fresh log daily or weekly.
Does anybody know if this can be done with the logging classes, or would I have to switch to log4j or a custom solution?
Thanks for any help!
I think you have to roll your own StreamHandler (at least as of Java 1.5, no such implementation was included). Here is someone who did it.
You can use log4j with the DatedFileAppender (distributed separately). This creates a single file per date. I like it very much and use it everywhere I implement log4j (even my Tomcat server logs through it!).
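If you do stay with java.util.logging, a date-based log can be sketched by wrapping FileHandler and reopening it when the day changes. DailyFileHandler is a hypothetical name, not a JDK class; this is a minimal sketch, not production code:

```java
import java.io.IOException;
import java.time.LocalDate;
import java.util.logging.FileHandler;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;

// Hypothetical handler: delegates to a FileHandler named after the current
// date, and reopens the delegate when the date rolls over.
public class DailyFileHandler extends Handler {
    private final String prefix;
    private FileHandler delegate;
    private LocalDate openedOn;

    public DailyFileHandler(String prefix) throws IOException {
        this.prefix = prefix;
        roll();
    }

    private void roll() throws IOException {
        if (delegate != null) delegate.close();
        openedOn = LocalDate.now();
        delegate = new FileHandler(prefix + "-" + openedOn + ".log", true);
        delegate.setFormatter(new SimpleFormatter());
    }

    @Override public synchronized void publish(LogRecord record) {
        try {
            if (!LocalDate.now().equals(openedOn)) roll(); // new day, new file
        } catch (IOException e) {
            reportError("rollover failed", e, 0);
        }
        delegate.publish(record);
    }

    @Override public void flush() { delegate.flush(); }
    @Override public void close() { delegate.close(); }
}
```

Attach it with `Logger.getLogger("importer").addHandler(new DailyFileHandler("importer"))`; each calendar day's messages then land in their own file, which matches the "fresh log daily" requirement in the question.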
