How to know when there's too much logging messages? - java

I came across one very good library for parsing CUE files. But when I started to read its source code, I realized that it is almost unreadable:
public void setParent(final CueSheet parent) {
FileData.logger.entering(FileData.class.getCanonicalName(), "setParent(CueSheet)", parent);
this.parent = parent;
FileData.logger.exiting(FileData.class.getCanonicalName(), "setParent(CueSheet)");
}
every method has logger.entering() and logger.exiting() messages. Isn't that too much?
There's another java library for parsing audio tags. It also had like 15 log messages for each file it read. It was annoying so I commented out every call to logger. And the library became twice as fast, because they used a lot of string concatenation for log messages.
So the question is: should I really log everything, even if it is not large enterprise application? Because these libraries obviously don't need any logging, except for error messages. And my experience shows that loggers are terrible tool for debugging. Why should I use it?

How to know when is too much logging? When you know that the logged information isn't important in the long term, such as for straightforward debug actions or bug correction, or for when the application doesn't deal with too much important information.
Sometimes you need to log almost everything. Is performance or full possibility of analysis the most important part of an application? It really depends.
I've worked in the past with some integration with a lot of different webservices, like 10 in a same app. We logged all xml requests and responses. Is this an overhead? In the long term, I don't think so because we worked with a lot of credit card operations and should have every process made with the server logged. How to know what happened when there was a bug?
You wouldn't believe what I've seen in some of the xml responses. I've even received a xml without closing tags, from a BIG airplane company. Were the "excessive logs" a bad practice? Say that to your clients when you have to prove that the error came from the other vendor.

Ideally, you use a logger that allows logging levels; log4j has fatal/error/warn/debug/info, for example. That way, if you set the level to "only show errors", you don't lose speed to the software building log messages you didn't need.
That said, it's only too much logging until you wind up needing something that would have been logged. It sounds like most of the logging that's slowing you down should be "trace" level, though; it's showing you what a profiler would have.

Most logging libraries incorporate a means to confirm that logging is enabled before processing an instruction:
For example:
public void foo(ComplicatedObject bar) {
Logger.getInstance(Foo.class).trace("Entering foo(" + bar + ")");
}
Could be quite costly depending on the efficiency of the bar.toString() method. However, if you instead wrap that in a check for the logging level before doing the string concatenation:
static {
Logger log = Logger.getInstance(Foo.class);
public void foo(ComplicatedObject bar) {
if (log.isTraceEnabled()) {
log.trace("Entering foo(" + bar + ")");
}
}
Then the string concatenation only occurs if at least one appender for the class is set to Trace. Any complicated log message should do this to avoid unnecessary String creation.

This level of logging is canonically bad - in fact, I saw code exactly like this in the Daily WTF a few days ago.
But logging is in general a Very Good Thing.

It depends, it this code for an application, or a library? For an application, logger are useful once the code is in production. It should not be used to debug, but to help you replicate a bug. When a user tells you that your application crashed, you always want the maximum logging information.
I agree that it makes the code less readable. It even make the application slower!
It's a total different game for a library. You should have consistent logging with adequate level. The library SHOULD inform the development team when an error occurs.

Logging should provide you with information that a stack trace can't in order to track down a problem. This usually means that the info is some kind of historical trace of what the program did, as opposed to what state it's in at the time of failure.
Too much historical data will be ignored. If you can safely deduce that a method was called without having to actually log its entry and exit, then it's safe to remove those logging entries.
Another bad sign is if your logging files start to use up a huge amounts of disk space. You're not only sacrificing space, you're probably slowing down the app too much.

To answer the question, why should I use loggers?
Have you ever encountered a piece of software where the only error indicated presented to the end user is Fatal error occured. Would it not be nice to find out what have caused it?
Logging is a tool that can really help you narrow these kind of problems in the field.
Remember, end-user systems don't have nice IDE's to debug and the end-users usually are not knowledgeable enough to run these tools. However end-users, in most cases, are capable of copying log configuration files ( written by us, clever programmers ) into predefined location and fetch log files and email them back to us ( poor soles for having to parse megabytes of log output ) when they encounter problems.
Having said this, logging should be highly configurable and under normal conditions produce minimal output. Also, guards should protect finer level logging from consuming too many resources.
I think in the example that you have provided all logging should have been done on a TRACE level. Also, because nothing bad can really happen between function entry point and exit, it probably make sense to have only one log statement there.

Over the years I've swayed backwards and forwards between promoting logging everything at the appropriate levels (trace, info, etc...) and thinking that any is a complete waste of time. In reality it depends on what is going to be useful to track down or required (logging can be a cheap way of maintaining an audit trail).
Personally, I tend to log entry/exit at a component or service level and then log significant points in the processing such as a business logic decision or a call on another service/component. Of course errors are always logged, but once only and at the place they were handled (the stack trace and exception message should have sufficient info to diagnose the problem) and any service/component interface should always handle an errors (even if it is just converting it into another more appropriate to the caller).
The problem with logging stuff on the off chance something goes wrong is that you end up with too much information that it is impossible to identify the issue, especially if it is running under a server as you end up with loads of intertwined log entries. Obviously you can get around that by incorporating a request id in the entry and using some software to filter on that. Of course you also have the case where your application is distributed and/or cluster and you have multiple logs.
Nowadays I would never actually write trace entering/exiting entries code, the code just gets in the way and it is so much easier to use something like aspectj if it is really needed. Using aspectj also would guarantee to be consistent (you can change the log format in one place rather than having to change every operation) and accurate (in case some refactoring adds a new paramater and teh developer forgets to add it to the logging).
One thing I have thought about doing or looking to see if someone already has is a logger that will hold the entries in memory, then if an error is encountered they are written, if the operation succeeds the entries are just discarded. If anyone knows of one (ideally for log4j) please let me know, alternatively I have a few ideas on how to implement this if anyone is interested in doing one.

This is where log levels are helpful. In general, log levels in the order of verbosity and priority are TRACE, DEBUG, INFO, WARN, ERROR, FATAL.
The developer has to take a conscious call to use the correct log level while logging in the code.
While creating an instance of Logger we have to pass the correct log level by choosing it from a config (always prefer config). This decides which levels to be logged. For example, while creating the logger, if the config for log level is set to "INFO", anything below "INFO" (TRACE, DEBUG) won't be logged.
For instance, in the example you mentioned above, a TRACE OR DEBUG level would make more sense.
In runtime in production, the config for log level should always be set to INFO.
When an issue occurs in production and if the developer wants to find out the root cause, they can request for changing the log level to TRACE or DEBUG (mostly inside a QA environment where they can replicate the scenario), to see what exactly is happening (The app sometimes has to be restarted to have the log level changed, but it is helpful).
Log levels is a great practice, as most of the times, we won't be able to launch a debugger in the landscapes. As we are skipping the unnecessary file writes by choosing a higher log level, the performance won't take a hit

Related

What are ways to change log level at runtime depending on occurrence rate

I would like to know whether there are conventional ways (probably some third-party libraries) to help me with the following problem. I would like to have an ability to change log level depending on occurrence rate. For example, if I catch some exception and it happens say once an hour I would like to log it with the WARN level. But if the rate becomes higher then I would like to switch to the ERROR level. The purpose of this behavior is to not get distracted by rare sporadic exceptions that are inevitable in general but shouldn't be brought to developers' attention unless there is a sufficient amount of them. When I open Sentry summary page I would like to see only what's really important and requires attention, not some rare request failures that will inevitably happen from time to time. But at the same time I can't just use WARN level because if there are many such failures then there's clearly something bad going on and it requires attention.
To better illustrate what I mean here's how in my opinion that could be done with an imaginary third-party library that would imitate (or be a wrapper over) slf4j/commons-logging:
LOG.warn("Failed to send request to ...", e).onRate(Rates.hourlyMoreThan(5)).switchTo(LogLevel.ERROR);
An alternative would be an option in Sentry not to show events (matching some criteria like a class of the exception) with a low occurrence rate but as far as I know that's not possible.
Assuming you want to do this with Log4j (other logging frameworks should have similar structures), this can probably be done with a custom appender. If that appender sees one message too many times, it could try to change the category's (or the logger's) level.
Here is how you could create your own Appender:
https://logging.apache.org/log4j/2.x/manual/extending.html
Here is what the appender would have to do to reconfigure the logging:
https://logging.apache.org/log4j/2.x/manual/customconfig.html
Not sure if it is possible to filter in Sentry UI, but you an definitely create a custom Sentry appender based on SentryAppender which will measure the occurrence and emit event to Sentry only when it reaches certain threshold.

log4j/logback pass logger level as a parameter

I want to do something which seems really straightforward: just pass a lot of logging commands (maybe all, but particularly WARN and ERROR levels) through a method in a simple utility class. I want to do this in particular so that during testing I can suppress the actual output to logging by mocking the method which does this call.
But I can't find out how, with ch.qos.logback.classic.Logger, to call a single method with the Level as a parameter ... obviously I could use a switch command based on this value, but in some logging frameworks there's a method or two which lets you pass the logging Level as a parameter. Seems a bit "primitive" not to provide this.
The method might look a bit like this:
Logger.log( Level level, String msg )
Later
Having now looked up the "XY problem" I understand the scepticism about this question. Dynamic logging is considered bad, at least in Java (possibly less so in Python)... now I know and understand that the preferred route is to configure the logging configuration appropriately for testing.
One minor point, though, if I may: although I haven't implemented this yet with this particular project, I generally find "just" tracing the stacktrace back to the beginning of the particular Thread insufficient, and this is what logback does (with Exceptions passed at WARN or ERROR levels). I plan to implement a system for recording "snapshots" of Threads when they run new Threads... which can then be listed (right back to the start of the app's first Thread) if an error occurs. This is, if you like, another justification for using something to "handle" outgoing log calls. I suppose that if I want to implement something like this I will instead have to try to extend some elements of logback in some way.

What level to use for exception stack trace logging in Java?

I'm looking for best practices document (or your opinions) on how to effectively log exceptions and their stack traces. Of course, assuming one of popular logging frameworkks such as Log4J, SLF4J, java.util.logging, etc.
I'm particularly interested in your opinion about on what level stack traces should be logged.
I heard few contradicting each other opinions such as:
stack traces should be logged only on DEBUG level while ERROR level should contain only "human readable" error message
stack traces should be logged on ERROR level in order to give to the operator maximum amount of information required to find root cause of an exception
I have found couple of interesting articles however none of them touches this particular subject:
http://today.java.net/pub/a/today/2006/04/06/exception-handling-antipatterns.html
http://today.java.net/pub/a/today/2003/12/04/exceptions.html
which probably means that authors of these articles had same concerns as I do :-)
I'd be really interested in your view on this subject.
Stack trace is the most valuable piece of information you get when troubleshooting. I would never risk logging it on DEBUG level since it might be disabled. And I almost never want to suppress stack traces.
Also note that:
log.error("Houston, we have a problem", ex);
will print the human readable message in line marked as ERROR, while the stack trace is following that line. If you want your errors to be only human readable, just do grep ERROR.
I'm not sure about best-practice advice for this, but in the end, for me it boils down to this:
Exceptions should only be visible in exceptional circumstances. The concept of exception was invented to give developers a chance to handle errors internally.
In reality, most code I see doesn't even try to handle them, instead dumping them to the log, sysout, (or worst case of all) into dialog boxes. I know, that for a developer it is important in some cases to get the full stacetrace. But not nearly in all of them. Creating your own exception framework (which is definitely a best practice) might already be enough to figure out the context of an exception simply by classname.
So I would advise to do the following:
Create your own exception framework
Include specific error codes in the message, for your reference
Log the exception message on ERROR
Log the stacktrace on DEBUG
NEVER EVER display the user either of these. Instead show a useful message. Maybe include a way to report the error (with stacktrace) with minimal fuzz.
Note: If you are writing an internal "enterprise" software, forget everything I wrote. :-)
I think the stack trace should be logged at the appropriate place based on priority best practices. Depending on the nature of the exception and its place within your application, this may be one of many levels. Please see this related question:
Commons Logging priority best practices

Do you provide DEBUG logging level for private methods?

I add logging for Java classes.In particularly, logger.debug level (for development mode).
As far private method called within public ones, is there sense to provide logging within private methods?Is it efficiently?
Thank you.
It's not necessarily a matter of efficiency or not. You have to ask yourself: "Does that private method do something worthy of logging?"
In order to avoid performance issues, you can always check the logger whether it will log debug messages or not, before actually logging those messages. In log4j, you'd write
if (logger.isDebugEnabled()) {
logger.debug("complicated" + "innefficient" + "string concatenation");
}
I imagine, other loggers provide similar concepts
yeah, if you want to "debug" your private methods in maintance mode or development, why not?
in production you'll set your logging level either warn or error! that's upto you...
Of course. Private methods usually hide implementation detail, and that's where most bugs lurk. Put your logging statements where you feel the need, and disable them in production.
Logging might seem wasteful when you don't need it, but it is indispensable when you do. Always provide meaningful log messages in your classes and functions. The log levels are there for you to be able to filter out the noise, and increase verbosity when you need it. Good loggers provide capabilities to change log levels at runtime.
Still, if you are worried about performance for whatever reason, you can use some simple guards to reduce the overhead of logging, especially debug logging, which is rarely on. With log4j/slf4j for example, you can wrap log debug statements in:
if(logger.isDebugEnabled()) {
logger.debug("something");
}
slf4j additionally has a printf-like syntax which only does string formatting if the log level is correct.
logger.debug("Object {} is not valid!", obj);
Like darioo commented, this second form removed the need to check the log level before the log statement.

How to properly handle error logs?

I tried to do several searches before posting this question. If this is a duplicate, please let me know and I will delete it.
My question revolves around the proper way to handle errors produced through our web application. We currently log everything through log4j. If an error happens, it just says "An error has occurred. The IT Department has been notified and will work to correct this as soon as possible" right on the screen. This tells the user nothing... but it also does not tell the developer anything either when we try to reproduce the error. We have to go to the error log folder and try finding this error. Let me also mention that the folder is full of logs from the past week. Each time there is an error, one log file is created for that user and email is sent to the IT staff assigned to work on errors. This email does not mention the log file name but it is a copy of the same error text written that is in the log file.
So if Alicia has a problem at 7:15 with something, but there are 10 other errors that happen that same minute, I have to go through each log file trying to find hers.
What I was proposing to my fellow co-workers is adding an Error Log table into the database. This would write a record to the table for each error, record who it is for, the error, what page it happened on, etc. The bonus of this would be that we can return the primary key value from the table (error_log_id) and show that on the page with a message like "Error Reference Id (1337) has been logged and the proper IT staff has been notified. Please keep this reference id handy for future use". When we get the email, it would tell us the id of the error for quick reference. Or if the user is persistent, they can contact us with the id and we can find the error rather quickly.
How do you setup your error logging? By the way, our system uses Java Servlets that connect to a SQL Server database.
I answered a similar question here, but I will adapt that answer to your question.
We use requestID for this purpose - assign a request ID to each incoming (HTTP) request, at the very beginning of processing (in filter) and then log that on every log line, so you can easily grep those logs later by that ID and find all relevant lines.
If you think it is very tedious to add that ID to every log statement, then you are not alone - java logging frameworks have made it transparent with the use of Mapped Diagnostic Context (MDC) (at least log4j and logback have this).
RequestID can also work as a handy reference number, to spit out, in case of errors (as you already suggested). However, as others have commented, it is not wise to load those details to database - better use file-system. Or, the simplest approach is to just use the requestID - then you do not need to do anything special at the moment error occurs. It just helps you to locate the correct logfile and search inside that file.
How would one requestID look like?
We use the following pattern:
<instanceName>:<currentTimeInMillis>.<counter>
In consists of the following variables:
instanceName uniquely identifies particular JVM in particular deployment environment / .
currentTimeInMillis is quite self-explanatory. We chose to represent it in human-readable format "yyyyMMddHHmmssSSS", so it is easy to read request start time from it (beware: SimpleDateFormat is not thread-safe, so you need to either synchronize it or create a new one on each request).
counter is request counter in that particular millisecond - in the rare case you might need to generate more than one request ID in one millisecond
As you can see, the ID format has been set up in such a way that currentTimeInMillis.counter combination is guaranteed to be unique in particular JVM and the whole ID is guaranteed to be globally unique (well, not in the true sense of "global", but it is global enough for our purposes), without the need to involve database or some other central node. Also, the use of instanceName variable gives you the possibility to limit the number of log files you later need to look in to find that request.
Then, the final question: "that is fine and dandy in single-JVM solution, but how do you scale that to several JVMs, communicating over some network protocol?"
As we use Spring Remoting for our remoting purposes, we have implemented custom RemoteInvocationFactory (that takes request ID from context and saves it to RemoteInvocation attributes) and RemoteInvocationExecutor (that takes request ID from attributes and adds it to diagnostic context in the other JVM).
Not sure how you would implement it with plain-RMI or other remoting methods.
If multiple servers are running and each server leaves log messages on itself, it is really difficult to trace them. So,somebody or a tool should gather and sort them in time order.
It is a good way to have a central point where all messages are sent.
A possible solution, have your error page include a 'send email to whatever' link. When the user clicks this email the body of the e-mail might start with a few blank lines followed by something like:
----Please do not modify the information below this line.---
Error details
Any users complaining via this link will automatically send you the info you need and if you are reproducing the error you have quick access to the error message. You might even have a form for sending the e-mail so that the user never sees this (which may be important to some) but then you are relying on your system being at least able to send an e-mail.
Actually I find it useful to print the error details in an HTML comment on error pages like this so that I can always get at them myself.
I do agree with david above that I do not like storing this kind of information in a DB.
For the strategies of logging you can see the discussion Logging best practices.
I have used an approach like the one you're suggesting ( log to a db ) in the past and it has been very helpful.
Not only you cat get the error via SQL but you can also generate reports of what's the most recurring errors and attend them first.
On the design we did, equals stacktraces belong to the same records ( since they were originated exactly in the same place )
We had an small app that pooled that db and we knew then a new exception was generated instead of getting an e-mail that summed with the rest of the previous weeks were ignored altogether.
Of course this database design was very specific for the application we had and additional identifications were possible, we had software version, build, some times input parameters , etc. etc.
With time, the system administrators get to know what to do with each kind of exception and proceed accordingly.
But! Your application may not be that big anyway. Probably you can have the same just parsing the log files.
I'd oppose the idea of storing error logs in a database. The logging system should be as simple as possible and not involve components that are not 100% necesary to write a log record.
Things can get pretty complex when logging into a DB - e.g. you can having troubles logging any database-related errors (how to log errors that occured because DB not responding, e.g. because of a heavy load or a infrastructure error); other issue I'd see is a potential need to have separate transactions for logging, etc.
On the other hand, having a reference ID for an error is not a bad idea, but again, this it also means to increase complexity of logging system ( e.g. how would you propagate the ref. ID through all layers of your application when a error occurs? )
In projects I'm involved to, the general guideline is to log errors as verbosely as possible, and to include as much context information as possible (to write the logs, we use a 'conventional' approach usually - log4j or simillar). Usually, this works well even for heavy loaded systems.

Categories