Java logging export using fluentd

We are streaming Jetty server logs to Stackdriver using google-fluentd. The issue I'm having is that fluentd treats each line in the log as a separate log entry, which is problematic for log analysis later.
I've tried a few multiline format patterns, but they're not very reliable; there are quite a few edge cases to handle (exception stack traces, etc.). I think it would be best to just replace all newlines with "\n" within the same log entry, which would solve the issue. I can always substitute "\n" back later to make it more readable.
I couldn't find a log4j property that does this... does anyone know which setting I need to tweak?
Thanks.

It's not log4j, and it likely never will be.
It's configured with java.util.logging on gcloud, but as a system logger (not application-controlled), with limited ability to configure it (only via system properties).
There's a pull request with Google that fixes some of the issues surrounding it, but generally speaking it's not meant to be configured by the application.
Note: in the future the connection between the application and fluentd will be a formal non-logging API.
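For what it's worth, in environments where the application does own its logging handlers (not the managed gcloud setup described above), the newline-escaping idea from the question can be sketched as a custom java.util.logging Formatter. This is only a minimal illustration, and the class name is invented:

import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.logging.Formatter;
import java.util.logging.LogRecord;

// Emits each record on a single line by escaping embedded newlines,
// so a line-oriented collector treats the whole record as one entry.
public class SingleLineFormatter extends Formatter {
    @Override
    public String format(LogRecord record) {
        StringBuilder sb = new StringBuilder();
        sb.append(record.getLevel()).append(' ').append(formatMessage(record));
        if (record.getThrown() != null) {
            StringWriter sw = new StringWriter();
            record.getThrown().printStackTrace(new PrintWriter(sw, true));
            sb.append(' ').append(sw);
        }
        // Replace real line breaks with the literal two characters "\n"
        String oneLine = sb.toString().replace("\r\n", "\\n").replace("\n", "\\n");
        return oneLine + System.lineSeparator();
    }
}

The formatter would then have to be attached to the relevant Handler, which is exactly the part that is not under application control in the setup described above.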

Related

Three questions about logging in Spring

I have 3 questions about logging inside Spring.
First:
The Spring documentation says:
By default, if you use the ‘Starter POMs’, Logback will be used for
logging. Appropriate Logback routing is also included to ensure that
dependent libraries that use Java Util Logging, Commons Logging, Log4J
or SLF4J will all work correctly.
What I don't understand is: if a third-party library uses a different logger, what problem does that create in the program? If that library uses another logger, the logger comes along as a dependency in the library's jar file, so when the library is added the logger is added too and there should be no problem.
second:
I saw in a tutorial that trace and debug are disabled by default in Spring because they cause performance problems. I understand why trace is a problem, because it has to report everything that happens in the program. But why does debug cause performance problems? When I set debug=true, it didn't slow things down noticeably. So what's the problem?
Third:
This tutorial also says that Logback does not have a FATAL level. Why not? Is the idea that a Spring Boot program may be missing some required settings but can still start, so FATAL is never needed?
Different logging implementations require different configurations. Log4j uses XML and java.util.logging (JUL) uses properties files, and the XML semantics differ between frameworks as well.
So you don't want to configure every logging implementation individually; you want one logging configuration to rule them all, a single source of truth for logging config. This has nothing to do with the main intent of the software you are running. Newer logging frameworks generalize older logging frameworks, so you need the latest logging framework to rule them all.
Let me rephrase the question first: why do we distinguish between debug and trace? Debug (or de-bug) is a special condition that lets you inspect a bug for debugging purposes. Debug output may show a client's real-world first name and family name; you only need to output that information under debugging circumstances. Logging it may even cause legal problems, because you would be processing/storing personal information in log files without permission. To de-bug a piece of software you need the debug log in 90% of all cases; only in rare cases do you need the trace log. That is why the two levels differ.
That's a good one. Fatal, to me, means the server has hardware problems (a burning hard drive, loss of power supply), and that is indicated by errors. Seriously? I have no idea. I would argue that everything that is fatal-worthy should simply be an error.
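To make the debug/trace distinction a bit more concrete, here is a rough sketch of how the two levels are conventionally used, written against the SLF4J API that Spring Boot applications typically log through (the class, method and field names are invented for illustration):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PaymentService {
    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

    void charge(String customerId, long amountCents) {
        // TRACE: very fine-grained flow information, normally switched off everywhere
        log.trace("entering charge() for customer {}", customerId);
        // DEBUG: diagnostic detail you only want while investigating a problem
        log.debug("charging {} cents for customer {}", amountCents, customerId);
        // INFO: the normal business event, usually kept on in production
        log.info("payment accepted for customer {}", customerId);
    }
}

Because the arguments are passed via {} placeholders, a statement at a disabled level costs little more than a level check, which is also part of the answer to the performance question: debug only becomes expensive when it is switched on and the volume of output (and any work done to build the arguments) starts to matter.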

Logging from Java app to ELK without need for parsing logs

I want to send logs from a Java app to Elasticsearch, and the conventional approach seems to be to set up Logstash on the server running the app, and have Logstash parse the log files (with regex...!) and load them into Elasticsearch.
Is there a reason it's done this way, rather than just setting up Log4j (or Logback) to log things in the desired format directly into a log collector, from which they can then be shipped to Elasticsearch asynchronously? It seems crazy to me to have to fiddle with grok filters to deal with multiline stack traces (and burn CPU cycles on log parsing) when the app itself could just log in the desired format in the first place.
On a tangentially related note, for apps running in a Docker container, is it best practice to log directly to Elasticsearch, given the need to run only one process?
If you really want to go down that path, the idea would be to use something like an Elasticsearch appender (or this one or this other one) which would ship your logs directly to your ES cluster.
However, I'd advise against it for the same reasons mentioned by @Vineeth Mohan. You'd also need to ask yourself a couple of questions, but mainly: what would happen if your ES cluster goes down for any reason (OOM, network down, ES upgrade, etc.)?
There are many reasons why asynchronicity exists, one of which is the robustness of your architecture, and most of the time that's much more important than burning a few more CPU cycles on log parsing.
Also note that there is an ongoing discussion about this very subject in the official ES discussion forum.
I think it's usually ill-advised to log directly to Elasticsearch from a Log4j/Logback/whatever appender, but I agree that writing Logstash filters to parse a "normal" human-readable Java log is a bad idea too. I use https://github.com/logstash/log4j-jsonevent-layout everywhere I can to have Log4j's regular file appenders produce JSON logs that don't require any further parsing by Logstash.
There is also https://github.com/elastic/java-ecs-logging which provides a layout for log4j, log4j2 and Logback. It's quite efficient and the Filebeat configuration is very minimal.
Disclaimer: I'm the author of this library.
If you need a quick solution, I've written this appender here Log4J2 Elastic REST Appender if you want to use it. It has the ability to buffer log events based on time and/or number of events before sending them to Elastic (using the _bulk API so that it sends them all in one go). It has been published to Maven Central, so it's pretty straightforward.
As the other folks have already mentioned, the best way to do it would be to save the logs to a file and then ship them to ES separately. However, I think there is value in this if you need to get something running quickly until you have the time/resources to implement the optimal way.

Avoid printStackTrace(); use a logger call instead

In my application, I am running my code through PMD. It shows me this message:
Avoid printStackTrace(); use a logger call instead.
What does that mean?
It means you should use a logging framework like Logback or Log4j, and instead of printing exceptions directly:
e.printStackTrace();
you should log them using the framework's API:
log.error("Ops!", e);
Logging frameworks give you a lot of flexibility, e.g. you can choose whether you want to log to the console or to a file, or maybe skip some messages if you find them no longer relevant in some environment.
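As a self-contained sketch of what that looks like in a catch block (the class and method names are invented for illustration; the API is SLF4J):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FileLoader {
    private static final Logger log = LoggerFactory.getLogger(FileLoader.class);

    void load(String path) {
        try {
            // ... read and process the file ...
        } catch (RuntimeException e) {
            // Instead of e.printStackTrace(): the stack trace still ends up in the log,
            // but where it goes (console, file, both, ...) is decided by configuration.
            log.error("Could not load {}", path, e);
        }
    }
}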
If you call printStackTrace() on an exception, the trace is written to System.err and it's hard to route it elsewhere (or filter it). Instead of doing this you are advised to use a logging framework (or a wrapper around multiple logging frameworks, like Apache Commons Logging) and log the exception using that framework (e.g. logger.error("some exception message", e)).
Doing that allows you to:
write the log statement to different locations at once, e.g. the console and a file
filter the log statements by severity (error, warning, info, debug etc.) and origin (normally package or class based)
have some influence on the log format without having to change the code
etc.
A production quality program should use one of the many logging alternatives (e.g. log4j, logback, java.util.logging) to report errors and other diagnostics. This has a number of advantages:
Log messages go to a configurable location.
The end user doesn't see the messages unless you configure the logging so that he/she does.
You can use different loggers and logging levels, etc. to control how little or how much logging is recorded.
You can use different appender formats to control what the logging looks like.
You can easily plug the logging output into a larger monitoring / logging framework.
All of the above can be done without changing your code; i.e. by editing the deployed application's logging config file.
By contrast, if you just use printStackTrace, the deployer / end user has little if any control, and logging messages are liable to either be lost or shown to the end user in inappropriate circumstances. (And nothing terrifies a timid user more than a random stack trace.)
Simply put, e.printStackTrace() is not good practice, because it just prints the stack trace to standard error; because of this you can't really control where the output goes.
Almost every logging framework provides a method to which we can pass the throwable object along with a message, like:
public void trace(Marker marker, String msg, Throwable t);
These methods print the stack trace of the throwable object.
Let's look at it from a company's point of view. Logging gives you flexible levels (see Difference between logger.info and logger.debug). Different people want to see different levels: QAs, developers, business people. But e.printStackTrace() will print out everything. Also, if the method is called through a REST endpoint, the same error may be printed several times, and the DevOps or Tech-Ops people in your company may go crazy because they will keep receiving the same error reminders.
I think a better replacement could be log.error("errors happened in XXX", e)
This also prints out the whole stack trace, which is easier to read than e.printStackTrace()
On Android, the main reason is that ProGuard can remove Log calls from production builds. Anything you log or print via a stack trace can be seen on the phone, for example with a Logcat reader application, so leaving it in is bad practice for security. We also don't need that output in production, so it is better to have it removed. Since ProGuard removes Log calls but not stack-trace printing, it is better to use Log in catch blocks and let ProGuard strip those calls from production.

Advantage of log4j

What's the advantage of log4j over setting System.out and System.err to output to a log file?
At a high level, the win from Log4j over manual logging is that you can decouple your logging code from what you actually want to log and where and how you want to log it. Details about logging verbosity/filtering, formatting, log location, and even log type (files, network, etc.) are handled declaratively using configuration and extensibly via custom appenders, rather than you having to code that flexibility yourself.
This is critically important because it's often hard for developers to predict how logging needs will change once their software is in production. Operations teams managing that software may need less verbose logs, may need multiple logs, may need to ship those logs to multiple servers, may need to sometimes get really verbose data for troubleshooting, etc. And it's usually impossible for operations teams, if they need to change how logging works, to convince the developer to make big code changes. This often leads to production downtime, friction between operations and development, and wasted time all around.
From the developer's point of view, Log4j insulates you from having to make code changes to support logging, and insulates you from being pestered by people who want logging changes. It enables people managing your code to scratch their own itch rather than bugging you!
Also, since Log4j is the de-facto standard for Java logging, there are lots of tools available which can do cool things with Log4j, further saving you and your operations teams from re-inventing the wheel.
My favorite feature is the ability to easily write appenders that send data to non-file destinations, like syslog, Splunk, etc., which makes it easy to plug your app's custom logging into the operations management tools your IT department is already using.
Actually, you should look into the slf4j facade these days, as it allows you to use {}-placeholders for the most concise statements. You can then use the appropriate logging framework behind slf4j to handle the actual treatment of your log statements. This could be log4j or the slf4j-simple which just prints out all of INFO, WARN and ERROR, and discards the rest.
The crucial observation you need to make is that the WRITING of log statements is done when the code is written, and the DECISION of what is needed is done when the code is deployed, which may be years after the code was written and tested. System.out.println requires you to physically change your code to get rid of them, which is unacceptable in a rigid write-test-deploy cycle. If the code changes, it must be retested. With slf4j you just enable those you want to see.
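As an illustration of the {}-placeholder style (the class and names are invented; any slf4j-backed implementation would work behind it):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OrderService {
    private static final Logger log = LoggerFactory.getLogger(OrderService.class);

    void process(String orderId, int itemCount) {
        // The message is only formatted if DEBUG is enabled for this logger,
        // so the statement can stay in the code forever and be switched
        // on or off by configuration at deploy time.
        log.debug("Processing order {} with {} items", orderId, itemCount);
        log.info("Order {} accepted", orderId);
    }
}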
We have full logging in the test phase, and rather verbose logging in the initial period of a production deployment, after which we go down to information only. This gives us full information in a scenario where debugging a case is very rarely possible.
You might find this article I wrote interesting. The target audience is beginning Java programmers, with my intention of giving them good habits from the start. http://runjva.appspot.com/logging101/index.html
My favorites (not all of them):
Ability to set logging parameters in configuration, without recompiling
Ability to choose how the log is written (from a text file to an SMTP sender)
Ability to filter by severity
Levels, formatting, logging to multiple files... A logging framework (even if it's java.util.logging) is really beneficial if there's a chance anything may go wrong while your code is running.
log4j allows you to log to various destinations, e.g. the event log, email, the file system etc., while allowing your application to remain decoupled from all of these resources. Furthermore, you get to use a common interface to log to all of the various resources without having to learn or integrate their corresponding APIs.
Log4j offers the ability to rotate your log files based on size and delete them based on quantity (logrotate), so your servers don't fill up their disks. Personally I think that is one of the more valuable features in Log4j.
Also Log4j is popular and understood by many developers. The last three companies I've worked at have all used Log4j in most projects.
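As an illustration of the rotation feature, here is a minimal programmatic sketch against the old log4j 1.x API (the same thing is more commonly done in log4j.properties; the file name and limits are just examples):

import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;
import org.apache.log4j.RollingFileAppender;

public class RotationExample {
    public static void main(String[] args) throws Exception {
        // Size-based rotation: keep at most 5 backups of 10 MB each,
        // so the log directory cannot grow without bound.
        RollingFileAppender appender = new RollingFileAppender(
                new PatternLayout("%d{ISO8601} [%t] %-5p %c - %m%n"), "app.log");
        appender.setMaxFileSize("10MB");
        appender.setMaxBackupIndex(5);
        Logger.getRootLogger().addAppender(appender);

        Logger.getLogger(RotationExample.class).info("rotation configured");
    }
}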
Take a look at this and you will understand the power of log4j.
A log4j.properties file I used once for a project:
# ALL < DEBUG < INFO < WARN < ERROR < FATAL < OFF
# No appenders for rootLogger
log4j.rootLogger=OFF
folder=..
prefix=
fileExtension=.log
htmlExtension=${fileExtension}.html
datestamp=yyyy-MM-dd/HH:mm:ss.SSS/zzz
layout=%d{${datestamp}} ms=%-4r [%t] %-5p %l %n%m %n%n
# myLogger logger
log4j.logger.myLogger=ALL, stdout, infoFile, infoHtml, errorFile
# stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=${layout}
# infoFile
log4j.appender.infoFile=org.apache.log4j.FileAppender
log4j.appender.infoFile.File=${folder}/${prefix}_info${fileExtension}
log4j.appender.infoFile.layout=org.apache.log4j.PatternLayout
log4j.appender.infoFile.layout.ConversionPattern=${layout}
# infoHtml
log4j.appender.infoHtml=org.apache.log4j.FileAppender
log4j.appender.infoHtml.File=${folder}/${prefix}_info${htmlExtension}
log4j.appender.infoHtml.layout=org.apache.log4j.HTMLLayout
log4j.appender.infoHtml.layout.Title=Logs
log4j.appender.infoHtml.layout.LocationInfo=true
# errorFile
log4j.appender.errorFile=org.apache.log4j.FileAppender
log4j.appender.errorFile.File=${folder}/${prefix}_error${fileExtension}
log4j.appender.errorFile.layout=org.apache.log4j.PatternLayout
log4j.appender.errorFile.layout.ConversionPattern=${layout}
# APPENDERS SETTINGS
log4j.appender.stdout.Threshold = ALL
log4j.appender.infoFile.Threshold = INFO
log4j.appender.infoHtml.Threshold = INFO
log4j.appender.errorFile.Threshold = WARN
To change the variables in your Java code you can do:
Loading Configuration
Log4j will automatically load the configuration if it is stored in a
file called "log4j.properties" and is present on the classpath root
(e.g. WEB-INF/classes/log4j.properties).
I don't like that approach and prefer to load the configuration
explicitly by calling:
PropertyConfigurator.configure( Config.ETC + "/log4j.properties" );
This way I can reload the configuration at any time as long as my
application is still running. I like to add a button to an
administrative jsp, "Reload Log4J".
Dynamic Log File Location
Many people complain that Log4j forces you to hard-code the location
where your logs will be kept. Actually, it is possible to dynamically
choose the log-file location, especially if you use the ${log.dir}
property substitution technique above. Here's how:
String dynamicLog = // log directory somehow chosen...
Properties p = new Properties();
p.load( new FileInputStream( Config.ETC + "/log4j.properties" ) );
p.put( "log.dir", dynamicLog ); // overwrite "log.dir"
PropertyConfigurator.configure( p );
Logging (document historical business events that occur; you can check old logs)
Tracking the application (project flow)
Debugging the application (detailed information about what occurs inside a method at a granular level: data, values, and everything inside methods)
Error handling (information about the specific errors that occur)

What's wrong with using System.err in Java?

I'm using the Enerjy (http://www.enerjy.com/) static code analyzer tool on my Java code. It tells me that the following line:
System.err.println("Ignored that database");
is bad because it uses System.err. The exact error is: "JAVA0267 Use of System.err"
What is wrong with using System.err?
Short answer: It is considered a bad practice to use it for logging purposes.
Back in the old days, when there were no widely available/accepted logging frameworks, everyone used System.err to print error messages and stack traces to the console. This approach might be appropriate during the development and local testing phase, but it is not appropriate for a production environment, because you might lose important error messages.
Logging frameworks, in turn, provide a structured and logical way to log your events and error messages, as they can store the messages in various persistent locations (log file, log DB, etc.).
The most obvious quick fix (free of external dependencies) is to use the built-in Java logging framework through the java.util.logging.Logger class, as it forwards the logging events to the console by default. For example:
final Logger log = Logger.getLogger(getClass().getName());
...
log.log(Level.SEVERE, "Something went wrong", theException);
(or you could just turn off that analysis option)
The description of your error is:
The use of System.err may indicate residual debug or boilerplate code. Consider using a
full-featured logging package such as Apache Commons to handle error logging.
It seems that you are using System.err for logging purposes; that is suboptimal for several reasons (a short sketch of the alternative follows this list):
it is impossible to enable or disable that logging at runtime without modifying the application binary
logging behavior cannot be controlled by editing a configuration file
probably many others
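For completeness, here is a minimal sketch of routing the same output through java.util.logging instead, so that the destination and verbosity are decided by configuration rather than by the call site (the file name and level are just examples):

import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class LoggingSetup {
    public static void main(String[] args) throws Exception {
        Logger root = Logger.getLogger("");                   // the root logger
        FileHandler file = new FileHandler("app.log", true);  // append to app.log
        file.setFormatter(new SimpleFormatter());
        root.addHandler(file);                                // messages now also go to a file
        root.setLevel(Level.WARNING);                         // verbosity chosen here, not at the call sites

        Logger.getLogger(LoggingSetup.class.getName())
              .warning("Ignored that database");              // replaces the System.err.println call
    }
}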
Whilst I agree with the points above about using a logging framework, I still tend to use System.err output in one place: within shutdown hooks. This is because I discovered that, when using the java.util.logging framework, log statements are not always displayed if they occur in shutdown hooks. The logging library presumably contains its own shutdown hook to clean up log files and other resources, and since you can't rely on the order in which shutdown hooks run, you cannot rely on java.util.logging statements working as expected.
Check out this link (the "Comments" section) for more information on this.
http://weblogs.java.net/blog/dwalend/archive/2004/05/shutdown_hooks_2.html
(Obviously the other alternative is to use a different logging framework.)
System.err is really more for debugging purposes than anything else. Proper exception handling and dealing with errors in a manner that is more user-friendly is preferred. If the user is meant to see the error, use a System.out.println instead.
If you want to keep track of those errors from a developer's standpoint, you should use a logger.
Things written to System.err are typically lost at runtime, so it is considered a better practice to use a logging framework that is more flexible about where to output the message, so it can be stored as a file and analyzed.
System.err and System.out output from non-console applications is only ever seen by the developer running the code in his or her IDE, and useful information may get lost if the problem is triggered in production.
System.err.println and System.out.println should not be used as a logging interface. Standard output and standard error (which is what System.out and System.err write to) are meant for messages from command-line tools.
System.err prints to the console. This may be suitable for a student testing their homework, but it is unsuitable for an application where these messages won't be seen (the console only stores so many lines).
A better approach would be to throw an exception holding the message that would normally be sent to the console. An alternative would be to use third-party logging software, which would store these messages in a file that can be kept forever.
