How to reduce logs size [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
In every project I've been working on there's always been the issue of log files becoming too large.
A quick off the shelf solution was to use the Log4j RollingFileAppender and set the maximum size allowed.
However there are situations when the same exception happens repeatedly reaching the maximum size very quickly, before somebody manually intervenes. In that scenario because of the rolling policy you end up losing information of important events that happened just before the exception.
Can anybody suggest a fix for this issue?
P.S. Something I can think of is to hold a cache of the exceptions that have happened so far, so that when the same exception recurs I don't log tons of stack trace lines. Still, I think this must be a well-known issue and I don't want to reinvent the wheel.
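A minimal sketch of the cache idea from the P.S., assuming java.util.logging rather than Log4j so that it runs without extra libraries. The class DedupExceptionLogger and its signature() key (exception class plus top stack frame) are hypothetical names invented for illustration, not part of any logging framework.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper: logs a full stack trace only the first time a given failure is seen. */
class DedupExceptionLogger {
    private final Logger logger;
    // Key: exception class plus top stack frame; value: number of occurrences so far.
    private final Map<String, AtomicLong> seen = new ConcurrentHashMap<>();

    DedupExceptionLogger(Logger logger) {
        this.logger = logger;
    }

    /** Logs the failure and returns how many times it has been seen, including this call. */
    long log(String message, Throwable t) {
        long count = seen.computeIfAbsent(signature(t), k -> new AtomicLong()).incrementAndGet();
        if (count == 1) {
            // First occurrence: keep the full stack trace.
            logger.log(Level.SEVERE, message, t);
        } else {
            // Repeats: one line only, no stack trace.
            logger.severe(message + " (repeat #" + count + " of " + t + ", stack trace suppressed)");
        }
        return count;
    }

    /** Two throwables count as "the same" if their class and top frame match. */
    static String signature(Throwable t) {
        StackTraceElement[] frames = t.getStackTrace();
        String top = frames.length > 0 ? frames[0].toString() : "<no frames>";
        return t.getClass().getName() + "@" + top;
    }
}
```

The map grows with every distinct failure, so a long-running application would probably want to cap its size or clear it periodically.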

There are two directions to approach this from: The System side and the development side. There are several answers already around dealing with this from the system side (i.e. after the application is deployed and running). However, I'd like to address the development side.
A very common pattern I see is to log exceptions at every level. I see UI components, EJB's, connectors, threads, helper classes, pojos, etc, etc, logging any and all exceptions that occur. In many cases, without bothering to check for the log level. This has the exact result you are encountering as well as making debugging and troubleshooting take more time than necessary as one has to sift through all of the duplication of errors.
My advice is to do the following in the code:
THINK. Not every exception is fatal, and in many cases actually irrelevant (e.g. IOException from a close() operation on a stream.) I don't want to say, "Don't log an exception," because you certainly don't want to miss any issues, so at worst, put the log statement within a conditional check for the debug level
if(logger.isDebugEnabled()){
    // log exception
}
Log only at the top level. I'm sure this will meet with some negativity, but my feeling is that unless the class is a top-level interface into an application or component, or the exception ceases to be passed up, then the exception should not be logged. Said another way, if an exception is rethrown, wrapped and thrown or declared to be thrown from the method, do not log it at that level.
For example, the first case contributes to the problem of too many log statements, because it's likely that the caller and whatever was called will also log the exception or some statement about the error.
public void something() throws IllegalStateException{
    try{
        // stuff that throws some exception
    }catch(SomeException e){
        logger.error(e); // <- NO because we're throwing one
        throw new IllegalStateException("Can't do stuff.",e);
    }
}
Since we are throwing it, don't log it.
public void something() throws IllegalStateException{
    try{
        // stuff that throws some exception
    }catch(SomeException e){
        // Whoever called something() should make the decision to log
        throw new IllegalStateException("Can't do stuff.",e);
    }
}
However, if something halts the propagation of the exception, it should log it.
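public void something(){
    try{
        // stuff that throws some exception
    }catch(SomeException e){
        // Nothing above us will ever see this exception, so log it here.
        // Passing e as the second argument keeps the stack trace.
        logger.error("Can't do stuff.", e); // DEFINITELY LOG!
    }
}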

Use the Log4j feature that zips the log file once a specified size is reached, via the "Rolling File Appender". Zips are around 85KB for a 1MB file.
For this, specify a triggering policy based on size and specify the zip file name in the rolling policy.
Let me know if you need more info.

In my experience, logging is used as a substitute for proper testing and debugging of code. Programmers say to themselves, "I can't be sure this code works, so I'll sprinkle logging messages in it, so when it fails I can use the log messages to figure out what went wrong."
Instead of just sprinkling logging messages around without thought, consider each log message as part of the user interface of your software. The user interface for the DBA, webmaster or system administrator, but part of the user interface nonetheless. Every message should do something useful. The message should be a spur for action, or provide information that they can use. If a message could not be useful, do not log it.
Give an appropriate logging level for each message. If the message is not describing an actual problem, and is not providing status information that is often useful, the message is probably useful only for debugging, so mark it as being a DEBUG or TRACING message. Your usual Log4J configuration should not write those messages at all. Change the configuration to write them only when you are debugging a problem.
You mention that the messages are due to an exception that happens often. Not all exceptions indicate a bug in the program, or even a problem in the operation of the program. You should log all exceptions that indicate a bug in your program, and log the stack trace for them. In many cases that is almost all you need to work out the cause of the bug. If the exception you are worried about is due to a bug, you are focusing on the wrong problem: you should fix the bug. If an exception does not indicate a bug in your program, you should not log a stacktrace for it. A stacktrace is useful only to programmers trying to debug a problem. If the exception does not indicate a problem at all, you need not log it at all.

Buy bigger hard drives and set up a batch process to automatically zip up older logs on a regular basis.
(Zip will detect the repeated exception pattern and compress it very effectively).
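To get a feel for how well a repeated stack trace compresses, here is a small sketch using the JDK's GZIPOutputStream. The log content is made up (com.example.Worker is not a real class), so the exact ratio will differ for real logs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class LogCompressionDemo {
    /** Gzips the given text in memory and returns the compressed bytes. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Builds a fake log in which the same invented stack trace repeats many times. */
    static String repeatedTrace(int times) {
        StringBuilder log = new StringBuilder();
        for (int i = 0; i < times; i++) {
            log.append("ERROR Can't do stuff.\n")
               .append("java.lang.IllegalStateException: Can't do stuff.\n")
               .append("\tat com.example.Worker.something(Worker.java:42)\n")
               .append("\tat com.example.Worker.run(Worker.java:17)\n");
        }
        return log.toString();
    }

    public static void main(String[] args) {
        String log = repeatedTrace(1000);
        byte[] compressed = gzip(log);
        System.out.println(log.length() + " bytes -> " + compressed.length + " bytes");
    }
}
```

Real logs contain timestamps and varying messages between the repeats, so they compress less dramatically than this artificial case.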

Use a rolling strategy: when the log reaches its maximum size, start appending to a new log file, and run a scheduled job (every day, for example) to wipe the old log files.
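A rough sketch of the cleanup half of that strategy, using only the JDK. The class name OldLogCleaner, the "*.log*" file pattern and the daily interval are assumptions made for the example; adjust them to however your appender names its rolled files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class OldLogCleaner {
    /** Deletes files matching *.log* in dir that were last modified before (now - maxAge). */
    static int deleteOlderThan(Path dir, Duration maxAge) {
        Instant cutoff = Instant.now().minus(maxAge);
        int deleted = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.log*")) {
            for (Path file : files) {
                if (Files.getLastModifiedTime(file).toInstant().isBefore(cutoff)) {
                    Files.delete(file);
                    deleted++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return deleted;
    }

    /** Runs the cleanup now and then once a day; the caller should shut the scheduler down on exit. */
    static ScheduledExecutorService scheduleDaily(Path dir, Duration maxAge) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                deleteOlderThan(dir, maxAge);
            } catch (RuntimeException e) {
                // Swallow so that one failed run does not cancel the later runs.
                System.err.println("Log cleanup failed: " + e);
            }
        }, 0, 1, TimeUnit.DAYS);
        return scheduler;
    }
}
```

The try/catch inside the scheduled task matters: scheduleAtFixedRate stops rescheduling a task once it throws.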

Related

The different Exception reporting

I am just thinking about java Exceptions. There are many different types and they all work for their own part. What I am curious about is the handling of them. For example
try
{
    //Protected code
}catch(ExceptionName e1)
{
    //Catch block
}
In the catch block there are multiple ways to report the Exception.
I have found several, but I assume there are more around:
System.err.println(e1); for debugging
system.println.out(e1); to just view the error for local validation
e1.printStackTrace(); to just view the error
Logger.getLogger(classname.class.getName()).log(Level.SEVERE, null, e1); where the level can vary in debug, info and error if I am correct.
Why would you choose one over the other? All I can think about is the info it reports? So for short errors you would just print the Exception and whilst looking for actual problems you would use something bigger. And if you know there is going to be an exception, but don't think it's important, can just throw it?
And is Exception handling a good tool for testing code? Could it be a replacement for Black-Box-testing?

Your question is mainly about logging, and there are several ways to do it based on your requirements and complexity of your application. There are obviously differences between them for example:
System.out.println() uses the System class's static PrintStream object out to print the passed argument to the console. println() is a method of the PrintStream class. Definitely not a suitable logging solution.
System.println.out(): I do not think such a method exists in the System class; see the documentation.
System.err.println() does exist; err is again a static PrintStream object of the System class. This is the standard error output stream; it is already open and waiting to receive data that should be brought to the user's attention.
If you are using a console, you will usually not be able to see the difference between err.println() and out.println(), because both end up in the same place. You can redirect them separately, for instance so that err.println() writes all errors to a file.
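As a sketch of that last point, System.setErr() can point the error stream at a file from inside the program. The helper name withStderrTo and the file name errors.log are made up for the example.

```java
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.io.UncheckedIOException;

class RedirectStderrDemo {
    /** Runs the action with System.err appending to the given file, then restores the old stream. */
    static void withStderrTo(String path, Runnable action) {
        PrintStream original = System.err;
        try (PrintStream fileErr = new PrintStream(new FileOutputStream(path, true), true)) {
            System.setErr(fileErr);
            action.run();
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException(e);
        } finally {
            System.setErr(original);
        }
    }

    public static void main(String[] args) {
        withStderrTo("errors.log", () -> System.err.println("Can't do stuff."));
        System.err.println("back on the console");
    }
}
```

System.out is untouched here, which is the practical difference between the two streams. On the command line the same split is usually done with shell redirection instead.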
Java's Exception class extends Throwable, which implements the Serializable interface. Exception inherits all the following methods from the Throwable class:
getCause() - returns the Throwable that caused this one, or null if the cause is nonexistent or unknown
getMessage() - returns String message of details of this throwable
getStackTrace() - returns StackTraceElement[] of the throwable
printStackTrace() - has two variations described below
getStackTrace() gives you programmatic access to the stack trace.
Returns an array of stack trace elements, each representing one stack
frame. The zeroth element of the array (assuming the array's length is
non-zero) represents the top of the stack, which is the last method
invocation in the sequence. Typically, this is the point at which this
throwable was created and thrown. The last element of the array
(assuming the array's length is non-zero) represents the bottom of the
stack, which is the first method invocation in the sequence.
printStackTrace() or printStackTrace(PrintStream s): the first one, without the PrintStream argument, prints the stack trace to the standard error output stream (correct guess! that is System.err). If we wish to print the stack trace to a file, we pass printStackTrace() a PrintStream pointing to a file or another destination.
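A short sketch of both access paths described above: getStackTrace() for programmatic access, and the PrintWriter overload of printStackTrace() to capture the same text that would otherwise go to System.err. The class and method names are invented for the example.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class StackTraceDemo {
    /** Captures what printStackTrace() would print, as a String. */
    static String stackTraceToString(Throwable t) {
        StringWriter buffer = new StringWriter();
        t.printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }

    /** Name of the method at the top of the stack (element zero), or null if there are no frames. */
    static String topFrameMethod(Throwable t) {
        StackTraceElement[] frames = t.getStackTrace();
        return frames.length > 0 ? frames[0].getMethodName() : null;
    }

    static void fail() {
        throw new IllegalStateException("Can't do stuff.");
    }

    public static void main(String[] args) {
        try {
            fail();
        } catch (IllegalStateException e) {
            System.out.println("thrown in: " + topFrameMethod(e));
            System.out.print(stackTraceToString(e));
        }
    }
}
```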
Alright, now back to logging. There are several logging frameworks that allow you to log data in different levels of severity. For example, you have an enterprise application and you would like to log data based on
SEVERE (Highest)
WARNING
INFO
OTHER levels
The logging framework can be used to do a lot, a few are listed below:
Logging simple text messages
Log levels to filter different log messages
Log categories
Log file rotations
Configuration files, with the ability for the configuration to be loaded at runtime
The huge list goes on
There are a bunch of logging frameworks that you can use based on the requirement of application you are developing:
Log4j
Java Logging API
Apache Commons Logging
See more in here and here
There are benchmark results for some of these logging framework, for example see here for comparison of Log4j, Logback and Java Logging API.
You have a lot of options to choose from depending on the need of your project, its complexity and level of logging you wish to achieve.
Exception handling good for testing? No.
Is logging good for testing? No.
Exception handling is for dealing with unexpected situations. For example, you expect integer input and get a string instead. Execution breaks if you don't handle such a scenario, so you write try and catch blocks to catch the exception and then warn the user that they should enter an integer only. There are many such exceptions, and an unhandled exception halts the execution of the code. If a user is able to bring your code to a halt, it is not a good program, so you need exception handling to cope with any kind of user, input data, etc.
You cannot use exception handling for testing, but it does aid you. How? Exception handling can be used with testing frameworks: you deliberately throw different types of exceptions and check that your exception-handling code deals with them.
Logging cannot be used to test either, but it can be used alongside testing. You can use a logging framework with a testing framework such as JUnit in order to run the tests as well as log all events that happen during execution of the tests. You can configure your logging framework to create a special set of log files each time the tests are executed.
If you wish to do logging and wish to be a programmer in the future (you might already be), you definitely need to use Testing frameworks for testing, logging frameworks for logging and exception handling to handle exceptions.

Capture all thrown exceptions in java?

I doubt such a thing is possible, but without attaching a debugger to a java application, is it possible to have some collection populated with information about every exception that is generated in a java application, regardless of if it is caught or not? I know that in .NET, messages get generated by the application about exceptions which at that point are called "First Chance Exceptions", which may or may not subsequently be handled by the application. I'm wondering if there might be a similar mechanism in java I can exploit to view information about all the exceptions generated at runtime.
Just to clarify. This has nothing to do with the context in which an exception occurs. This question is not about what I do in a catch block, or unhandled exceptions. Its about knowing if the JVM provides a mechanism to see every exception generated at runtime, regardless of what generated it, or the context.

Why not, it's of course possible! But first: logging all exceptions encountered by the JVM is a waste of life. It's meaningless in every sense; several exceptions could be thrown without any significance.
But if you indeed have no choice, you could tweak your Java to do that.
Breaking every rule of good programming, what we live for, I give you this idea:
Copy the source code of java.lang.Exception from JDK sources to your project.
Create a method in your Exception.java like below:
private void logException() {
    // Your logging routine here.
}
Edit java.lang.Exception to call this method logException() at the end of every constructor.
Add the new java.lang.Exception to bootstrap classpath.
Set your logging levels etc and run.
Keep your head up, present this to your weird client, use your diplomatic skills and scare them in a few words: 'we can do it, but it's at your own risk'. Likely you will convince them not to use this.

How to combine logging with an exception handling chain?

Suppose I have the following code:
void foo() {
    /* ... */
    try {
        bar(param1);
    } catch (MyException e) {
        /* ??? */
    }
}
void bar(Object param1) throws MyException {
    /* ... */
    try {
        baz(param2);
    } catch (MyException e) {
        /* ??? */
    }
}
void baz(Object param2) throws MyException {
    /* ... */
    if (itsAllATerribleMistakeOhNo) {
        /* ??? */
        throw new MyException("oops, error.");
    }
}
I'm wondering where and how I should be logging the error.
Where the error occurs, below, in baz(), I know exactly what operation went awry and can log that fact.
At the top I have the most general context (e.g. what's the IP of the connection during whose handling we encountered the error.)
Along the way I might have some context which isn't known either at the top or at the bottom.
Another complication is that the error at the bottom might not really be considered an error when you look at it from the top (e.g. looking up something in a database fails; maybe you weren't sure ) - so I might choose to logger.WARN() instead of logger.ERROR().
So, above I described 3 locations (bottom, top, and along the way) - but it's not just a question of where to log, but also what to throw up. At every level in the middle, you have 2x2 options:
Log/Don't log a message
Throw the original exception / wrap the exception in a new exception with the added message.
What are the best practices, or some common wisdom, regarding these complex choices?
Note: I'm not asking about error handling/exception use in general, just about the dilemmae described above.

When it comes to logging, I prefer to keep all my logging at the top at the application boundary. Usually I use an interceptor or filter to handle all logging in a general way. By this concept, I can guarantee that everything is logged once and only once.
In this case, you would log inside your foo() method or whatever the entry point to your application is (you mentioned the IP address, I suppose we are talking about a servlet container or application server).
Then, catch your own exception in the filter/interceptor and log it depending on your needs. Add a catch for Throwable to catch all other exceptions that you did not handle in your code and log them as an error, since obviously you missed something further down the call stack.
This concept requires some planning ahead. You will probably use your own ApplicationException that stores the Error Message (String) along with some severity level (probably in an Enum). You need this to choose the correct log level when you do the actual logging.
This works well for all cases and has the advantage that all logging happens exactly once. However, there is one case where you still need logging in your code: if you can fully deal with an error somewhere in your code (that is, an exception happens and you can do something that allows you to continue working without (re)throwing an exception). Since you are not throwing an exception, nothing would be logged otherwise.
To sum it up:
Log at the topmost position in a general way, preferably using an interceptor or filter.
Wrap exceptions inside your own ApplicationExceptions and add severity plus other things of interest for logging in your application.
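A minimal sketch of that summary, assuming java.util.logging and a plain Runnable in place of a real servlet filter or interceptor. The Severity enum and the RequestBoundary class are hypothetical, and ApplicationException is made unchecked here only so that it fits into a Runnable.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Application-specific exception carrying the severity to log it with (hypothetical design). */
class ApplicationException extends RuntimeException {
    enum Severity { WARNING, ERROR }

    private final Severity severity;

    ApplicationException(String message, Severity severity, Throwable cause) {
        super(message, cause);
        this.severity = severity;
    }

    Severity getSeverity() {
        return severity;
    }
}

/** Stand-in for a filter/interceptor: the single place where exceptions get logged. */
class RequestBoundary {
    private static final Logger LOG = Logger.getLogger(RequestBoundary.class.getName());

    /** Runs the handler and logs any failure exactly once; returns the level used, or null on success. */
    static Level handle(Runnable handler) {
        try {
            handler.run();
            return null;
        } catch (ApplicationException e) {
            // Our own exception: it tells us how serious it is.
            Level level = e.getSeverity() == ApplicationException.Severity.WARNING
                    ? Level.WARNING : Level.SEVERE;
            LOG.log(level, e.getMessage(), e);
            return level;
        } catch (Throwable t) {
            // Anything else was missed further down, so treat it as an error.
            LOG.log(Level.SEVERE, "Unhandled exception", t);
            return Level.SEVERE;
        }
    }
}
```

Code below the boundary then only throws or wraps, for example new ApplicationException("lookup failed", ApplicationException.Severity.WARNING, cause), and never logs on its own unless it fully handles the error.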

Some suggestions that I tend to follow:
Link for some best practices
1) Trace the exception where it occurs. If, at the point where the exception occurs, the class or API knows the context in which it occurs, then tracing it there and providing a proper log entry is better. But if the API cannot handle or comment on the exact context, then the API should not log the event and should leave it to the caller.
2) Wrapping the exceptions: When there are a lot of exceptions that can be thrown and they all form a similar group, wrap them in a single exception (as SQLException does) that lets you extract further information if needed. Otherwise there would be an explosion of exceptions that the caller needs to handle.
3) Re-throwing the exceptions: If the API logs the exception and the user can take some action on it, then the exception MUST be rethrown to tell the user that some error condition occurred.
4) Proper cause of exception: The exception message should not be too techy for the caller to understand; the message itself should guide the user to the underlying reason for the exception.
UPDATE:
Exception Management in Java

When I throw Exceptions in my code, I do not usually log anything. The exception is information enough.
The only exception to this is, when I am at the border of my system, that is, when the exception will leave the boundary of my system, then I log as I am not sure what the other system will do with the error.
When I handle exceptions, I log them when I actively handle them, that is, when I am in a catch clause which does something more than just rethrowing the exception.
Usually this is rather at the top, but this depends on the situation.

When throwing an exception at the testing stage, you should remember:
Keep the exception message as clear as possible. Stack traces can be confusing at the best of times so ensure that what you are reading, at least, makes sense to you.
Ensure that the exception is relevant to the event. If the user types in the wrong value and you throw a NullPointerException, your code is illogical and loses its value.
Ensure that it has as much information ABOUT THE EVENT as possible. That is, keep the message relevant. If a database call has gone wrong, print the connection string to the database, and the SQL query attempted. The state of every variable currently being used isn't necessary.
Don't waffle. It's tempting to type in technical jargon to make it look like you're hacking into the Matrix. It doesn't help you in a stressful situation, and it certainly doesn't help anyone else using your code. Simple English words are always preferable.
Finally, NEVER IGNORE AN EXCEPTION. Always ensure you handle the exception, and you're outputting details in some way, following the rules I've stated above.

What level to use for exception stack trace logging in Java?

I'm looking for a best practices document (or your opinions) on how to effectively log exceptions and their stack traces. Of course, assuming one of the popular logging frameworks such as Log4J, SLF4J, java.util.logging, etc.
I'm particularly interested in your opinion on what level stack traces should be logged at.
I have heard a few contradictory opinions, such as:
stack traces should be logged only on DEBUG level while ERROR level should contain only "human readable" error message
stack traces should be logged on ERROR level in order to give to the operator maximum amount of information required to find root cause of an exception
I have found a couple of interesting articles, however none of them touches on this particular subject:
http://today.java.net/pub/a/today/2006/04/06/exception-handling-antipatterns.html
http://today.java.net/pub/a/today/2003/12/04/exceptions.html
which probably means that authors of these articles had same concerns as I do :-)
I'd be really interested in your view on this subject.

Stack trace is the most valuable piece of information you get when troubleshooting. I would never risk logging it on DEBUG level since it might be disabled. And I almost never want to suppress stack traces.
Also note that:
log.error("Houston, we have a problem", ex);
will print the human readable message in line marked as ERROR, while the stack trace is following that line. If you want your errors to be only human readable, just do grep ERROR.

I'm not sure about best-practice advice for this, but in the end, for me it boils down to this:
Exceptions should only be visible in exceptional circumstances. The concept of exception was invented to give developers a chance to handle errors internally.
In reality, most code I see doesn't even try to handle them, instead dumping them to the log, sysout, or (worst case of all) into dialog boxes. I know that for a developer it is important in some cases to get the full stack trace, but not nearly in all of them. Creating your own exception framework (which is definitely a best practice) might already be enough to figure out the context of an exception simply by class name.
So I would advise to do the following:
Create your own exception framework
Include specific error codes in the message, for your reference
Log the exception message on ERROR
Log the stacktrace on DEBUG
NEVER EVER display either of these to the user. Instead show a useful message. Maybe include a way to report the error (with stack trace) with minimal fuss.
Note: If you are writing an internal "enterprise" software, forget everything I wrote. :-)
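The "message on ERROR, stack trace on DEBUG" split from the list above could look roughly like this with java.util.logging, where SEVERE and FINE play the roles of ERROR and DEBUG. The helper name report and the error code APP-1001 are invented for the example.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class SplitLevelLogging {
    /** Logs a short, human-readable line at SEVERE and the full stack trace at FINE. */
    static void report(Logger logger, String errorCode, Throwable t) {
        logger.severe(errorCode + ": " + t.getMessage());
        // Only emitted when the logger is configured for FINE or lower.
        logger.log(Level.FINE, "Stack trace for " + errorCode, t);
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("app");
        report(logger, "APP-1001", new IllegalStateException("Can't do stuff."));
    }
}
```

The trade-off raised in the other answer still applies: if FINE is switched off when the error happens, the stack trace is gone for good.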

I think the stack trace should be logged at the appropriate place based on priority best practices. Depending on the nature of the exception and its place within your application, this may be one of many levels. Please see this related question:
Commons Logging priority best practices

How to know when there's too much logging messages?

I came across one very good library for parsing CUE files. But when I started to read its source code, I realized that it is almost unreadable:
public void setParent(final CueSheet parent) {
    FileData.logger.entering(FileData.class.getCanonicalName(), "setParent(CueSheet)", parent);
    this.parent = parent;
    FileData.logger.exiting(FileData.class.getCanonicalName(), "setParent(CueSheet)");
}
every method has logger.entering() and logger.exiting() messages. Isn't that too much?
There's another java library for parsing audio tags. It also had like 15 log messages for each file it read. It was annoying so I commented out every call to logger. And the library became twice as fast, because they used a lot of string concatenation for log messages.
So the question is: should I really log everything, even if it is not large enterprise application? Because these libraries obviously don't need any logging, except for error messages. And my experience shows that loggers are terrible tool for debugging. Why should I use it?

How to know when is too much logging? When you know that the logged information isn't important in the long term, such as for straightforward debug actions or bug correction, or for when the application doesn't deal with too much important information.
Sometimes you need to log almost everything. Is performance or full possibility of analysis the most important part of an application? It really depends.
I've worked in the past with some integration with a lot of different webservices, like 10 in a same app. We logged all xml requests and responses. Is this an overhead? In the long term, I don't think so because we worked with a lot of credit card operations and should have every process made with the server logged. How to know what happened when there was a bug?
You wouldn't believe what I've seen in some of the XML responses. I've even received XML without closing tags, from a BIG airline company. Were the "excessive logs" a bad practice? Say that to your clients when you have to prove that the error came from the other vendor.

Ideally, you use a logger that allows logging levels; log4j has fatal/error/warn/info/debug/trace, for example. That way, if you set the level to "only show errors", you don't lose speed to the software building log messages you didn't need.
That said, it's only too much logging until you wind up needing something that would have been logged. It sounds like most of the logging that's slowing you down should be "trace" level, though; it's showing you what a profiler would have.

Most logging libraries incorporate a means to confirm that logging is enabled before processing an instruction:
For example:
public void foo(ComplicatedObject bar) {
    Logger.getLogger(Foo.class).trace("Entering foo(" + bar + ")");
}
Could be quite costly depending on the efficiency of the bar.toString() method. However, if you instead wrap that in a check for the logging level before doing the string concatenation:
private static final Logger log = Logger.getLogger(Foo.class);

public void foo(ComplicatedObject bar) {
    // Skip building the message entirely unless TRACE is enabled.
    if (log.isTraceEnabled()) {
        log.trace("Entering foo(" + bar + ")");
    }
}
Then the string concatenation only occurs if TRACE is enabled for that logger. Any complicated log message should do this to avoid unnecessary String creation.
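The same effect can be had without the explicit if. As a sketch, java.util.logging (used here instead of Log4j so the example runs on a bare JDK) accepts a Supplier<String> that is only evaluated when the level is enabled. The toStringCalls counter exists purely to make the laziness visible.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LazyLoggingDemo {
    static final Logger LOG = Logger.getLogger(LazyLoggingDemo.class.getName());
    static int toStringCalls = 0;

    static class ComplicatedObject {
        @Override
        public String toString() {
            toStringCalls++;              // count how often the expensive call really happens
            return "ComplicatedObject{...}";
        }
    }

    static void foo(ComplicatedObject bar) {
        // The lambda, and therefore bar.toString(), runs only if FINEST is enabled.
        LOG.finest(() -> "Entering foo(" + bar + ")");
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO);
        foo(new ComplicatedObject());
        System.out.println("toString() calls with level INFO: " + toStringCalls);
    }
}
```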

This level of logging is canonically bad - in fact, I saw code exactly like this in the Daily WTF a few days ago.
But logging is in general a Very Good Thing.

It depends: is this code for an application, or a library? For an application, loggers are useful once the code is in production. Logging should not be used to debug, but to help you replicate a bug. When a user tells you that your application crashed, you always want the maximum logging information.
I agree that it makes the code less readable. It even makes the application slower!
It's a totally different game for a library. You should have consistent logging at an adequate level. The library SHOULD inform the development team when an error occurs.

Logging should provide you with information that a stack trace can't in order to track down a problem. This usually means that the info is some kind of historical trace of what the program did, as opposed to what state it's in at the time of failure.
Too much historical data will be ignored. If you can safely deduce that a method was called without having to actually log its entry and exit, then it's safe to remove those logging entries.
Another bad sign is if your log files start to use up huge amounts of disk space. You're not only sacrificing space, you're probably slowing down the app too much.

To answer the question, why should I use loggers?
Have you ever encountered a piece of software where the only error indication presented to the end user is "Fatal error occurred"? Would it not be nice to find out what caused it?
Logging is a tool that can really help you narrow down these kinds of problems in the field.
Remember, end-user systems don't have nice IDEs to debug with, and end users are usually not knowledgeable enough to run these tools. However, end users are in most cases capable of copying log configuration files (written by us, clever programmers) into a predefined location, fetching the log files and emailing them back to us (poor souls, for having to parse megabytes of log output) when they encounter problems.
Having said this, logging should be highly configurable and under normal conditions produce minimal output. Also, guards should protect finer level logging from consuming too many resources.
I think in the example that you have provided all logging should have been done at TRACE level. Also, because nothing bad can really happen between the function's entry point and its exit, it probably makes sense to have only one log statement there.

Over the years I've swayed backwards and forwards between promoting logging everything at the appropriate levels (trace, info, etc...) and thinking that any is a complete waste of time. In reality it depends on what is going to be useful to track down or required (logging can be a cheap way of maintaining an audit trail).
Personally, I tend to log entry/exit at a component or service level and then log significant points in the processing, such as a business logic decision or a call on another service/component. Of course errors are always logged, but once only and at the place they were handled (the stack trace and exception message should have sufficient info to diagnose the problem), and any service/component interface should always handle any errors (even if it is just converting them into another one more appropriate to the caller).
The problem with logging stuff on the off chance something goes wrong is that you end up with too much information that it is impossible to identify the issue, especially if it is running under a server as you end up with loads of intertwined log entries. Obviously you can get around that by incorporating a request id in the entry and using some software to filter on that. Of course you also have the case where your application is distributed and/or cluster and you have multiple logs.
Nowadays I would never actually write trace entering/exiting code by hand; the code just gets in the way, and it is so much easier to use something like AspectJ if it is really needed. Using AspectJ would also guarantee that the logging is consistent (you can change the log format in one place rather than having to change every operation) and accurate (in case some refactoring adds a new parameter and the developer forgets to add it to the logging).
One thing I have thought about doing or looking to see if someone already has is a logger that will hold the entries in memory, then if an error is encountered they are written, if the operation succeeds the entries are just discarded. If anyone knows of one (ideally for log4j) please let me know, alternatively I have a few ideas on how to implement this if anyone is interested in doing one.
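For what it's worth, java.util.logging (not log4j) ships something close to this: MemoryHandler keeps the most recent records in a ring buffer and pushes them to a target handler only when a record at or above a chosen level arrives. A minimal sketch follows; CollectingHandler is a made-up stand-in for a real file or console handler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.MemoryHandler;

class BufferedUntilErrorDemo {
    /** Stand-in for a real file or console handler: just remembers what it was given. */
    static class CollectingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            messages.add(record.getLevel() + " " + record.getMessage());
        }

        @Override public void flush() { }
        @Override public void close() { }
    }

    /** Logger whose records stay in memory until something at SEVERE or above is logged. */
    static Logger bufferedLogger(String name, Handler target, int bufferSize) {
        Logger logger = Logger.getLogger(name);
        logger.setUseParentHandlers(false);
        logger.setLevel(Level.ALL);
        logger.addHandler(new MemoryHandler(target, bufferSize, Level.SEVERE));
        return logger;
    }

    public static void main(String[] args) {
        CollectingHandler target = new CollectingHandler();
        Logger logger = bufferedLogger("demo.buffered", target, 100);
        logger.fine("step 1");
        logger.fine("step 2");
        System.out.println(target.messages);   // nothing has been pushed yet
        logger.severe("operation failed");
        System.out.println(target.messages);   // both steps plus the error
    }
}
```

Entries for successful operations are not discarded explicitly; they simply fall out of the ring buffer as newer records arrive, so the buffer size decides how much history an error brings with it.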

This is where log levels are helpful. In general, log levels in the order of verbosity and priority are TRACE, DEBUG, INFO, WARN, ERROR, FATAL.
The developer has to take a conscious call to use the correct log level while logging in the code.
While creating an instance of Logger we have to pass the correct log level by choosing it from a config (always prefer config). This decides which levels to be logged. For example, while creating the logger, if the config for log level is set to "INFO", anything below "INFO" (TRACE, DEBUG) won't be logged.
For instance, in the example you mentioned above, a TRACE OR DEBUG level would make more sense.
In runtime in production, the config for log level should always be set to INFO.
When an issue occurs in production and if the developer wants to find out the root cause, they can request for changing the log level to TRACE or DEBUG (mostly inside a QA environment where they can replicate the scenario), to see what exactly is happening (The app sometimes has to be restarted to have the log level changed, but it is helpful).
Using log levels is a great practice, as most of the time we won't be able to launch a debugger in those environments. And since we skip the unnecessary file writes by choosing a higher log level, performance won't take a hit.
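A small sketch of picking the level from configuration, assuming java.util.logging and a level name that has already been read from wherever your configuration lives. The helper name configure is made up for the example.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LevelFromConfigDemo {
    /** Applies a level name such as "INFO" or "FINE" (taken from config) to the named logger. */
    static Logger configure(String loggerName, String configuredLevel) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setLevel(Level.parse(configuredLevel));
        return logger;
    }

    public static void main(String[] args) {
        Logger logger = configure("app", "INFO");
        logger.fine("not written: FINE is below INFO");
        logger.info("written");
        // Lowering the level later, for troubleshooting, does not need a restart here.
        configure("app", "FINE");
        System.out.println("FINE enabled now: " + logger.isLoggable(Level.FINE));
    }
}
```

In java.util.logging the closest equivalents of DEBUG and TRACE are FINE and FINER/FINEST.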
