I want to send logs from a Java app to Elasticsearch, and the conventional approach seems to be to set up Logstash on the server running the app and have Logstash parse the log files (with regex...!) and load them into Elasticsearch.
Is there a reason it's done this way, rather than just setting up Log4j (or Logback) to log in the desired format directly to a log collector that can then ship the logs to Elasticsearch asynchronously? It seems crazy to me to have to fiddle with grok filters to deal with multiline stack traces (and burn CPU cycles on log parsing) when the app itself could just log in the desired format in the first place.
On a tangentially related note, for apps running in a Docker container, is it best practice to log directly to Elasticsearch, given the need to run only one process?
If you really want to go down that path, the idea would be to use something like an Elasticsearch appender (several such appenders exist) which would ship your logs directly to your ES cluster.
However, I'd advise against it for the same reasons mentioned by @Vineeth Mohan. You'd also need to ask yourself a couple of questions, mainly: what would happen if your ES cluster goes down for any reason (OOM, network down, ES upgrade, etc.)?
There are many reasons why asynchronicity exists, one of which is the robustness of your architecture, and most of the time that's much more important than burning a few more CPU cycles on log parsing.
Also note that there is an ongoing discussion about this very subject in the official ES discussion forum.
I think it's usually ill-advised to log directly to Elasticsearch from a Log4j/Logback/whatever appender, but I agree that writing Logstash filters to parse a "normal" human-readable Java log is a bad idea too. I use https://github.com/logstash/log4j-jsonevent-layout everywhere I can to have Log4j's regular file appenders produce JSON logs that don't require any further parsing by Logstash.
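For illustration, this is roughly what that looks like if you wire it up in code rather than in log4j.properties - a minimal sketch assuming log4j 1.x, the layout's no-argument constructor, and a made-up log path:

    import java.io.IOException;

    import net.logstash.log4j.JSONEventLayoutV1;
    import org.apache.log4j.FileAppender;
    import org.apache.log4j.Logger;

    public class JsonLoggingSetup {
        public static void main(String[] args) throws IOException {
            // A plain file appender whose layout writes one JSON object per event,
            // so Filebeat/Logstash can ship the file without any grok parsing.
            FileAppender jsonFile = new FileAppender(
                    new JSONEventLayoutV1(),        // layout from log4j-jsonevent-layout
                    "/var/log/myapp/app.json.log",  // example path
                    true);                          // append
            Logger.getRootLogger().addAppender(jsonFile);

            Logger.getLogger(JsonLoggingSetup.class).info("structured hello");
        }
    }

In practice you would normally configure the same layout in log4j.properties; the point is simply that the file on disk is already machine-readable.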
There is also https://github.com/elastic/java-ecs-logging, which provides a layout for Log4j, Log4j2, and Logback. It's quite efficient, and the Filebeat configuration is very minimal.
Disclaimer: I'm the author of this library.
If you need a quick solution, I've written this appender, the Log4J2 Elastic REST Appender, which you can use if you want. It can buffer log events based on time and/or number of events before sending them to Elasticsearch (using the _bulk API so that they are all sent in one go). It has been published to Maven Central, so it's pretty straightforward to pull in.
As the other folks have already mentioned, the best way to do it is to save the logs to a file and then ship them to ES separately. However, I think there is value in this if you need to get something running quickly until you have the time/resources to implement the optimal way.
We are streaming Jetty server logs to Stackdriver using google-fluentd. The issue I'm having is that fluentd treats each line in the log as a separate log entry, which is problematic for log analysis later.
I've tried a few format multiline patterns, but they're not very reliable; there are quite a few edge cases to handle (exception stack traces, etc.). I think it would be best to just replace all newlines with "\n" within the same log entry, which would solve the issue. I can always convert "\n" back later to make the entry more readable.
I couldn't find a log4j property that does this... does anyone know which setting I need to tweak?
Thanks.
It's not log4j, and it likely never will be.
It's configured with java.util.logging on gcloud, but as a system logger (not application-controlled), with limited ability to configure it (only via system properties).
There's a pull request with Google that fixes some of the issues surrounding it, but generally speaking it's not meant to be configured by the application.
Note: in the future, the connection between the application and fluentd will be a formal non-logging API.
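For reference, when the handler is under your own control (which, per the answer above, it is not on gcloud), the newline escaping the question asks about can be done in a plain java.util.logging Formatter rather than in log4j. A minimal, hypothetical sketch:

    import java.io.PrintWriter;
    import java.io.StringWriter;
    import java.util.logging.Formatter;
    import java.util.logging.LogRecord;

    // Hypothetical formatter: collapses a multi-line entry (e.g. a stack trace)
    // into a single physical line by escaping the newlines as "\n".
    public class SingleLineFormatter extends Formatter {
        @Override
        public String format(LogRecord record) {
            String message = formatMessage(record);
            if (record.getThrown() != null) {
                StringWriter sw = new StringWriter();
                record.getThrown().printStackTrace(new PrintWriter(sw, true));
                message = message + " " + sw;
            }
            String oneLine = message.replace("\r", "").replace("\n", "\\n");
            return record.getLevel() + " " + oneLine + System.lineSeparator();
        }
    }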
I need to log the activity of the users connected to my server. Should I use log4j? Or is log4j beneficial only during the development phase?
Logging frameworks are actually not particularly useful during development; System.out.println is pretty good for most dev debug logging. But once you deploy, the following abilities become really useful:
Roll log files so they don't get too big, allowing for continuous maintenance-free operation
Add times/dates so you can look at the logs for a certain time period
Change verbosity on the fly (you don't always want trace or debug info, but being able to flip it on when the system isn't running well can be a lifesaver; see the sketch below)
Re-route log files to a more accessible place... Log4j can send your logs to various databases or other locations for when you can't actually reach your server directly.
Some of our code has trace statements on every significant line. If we run into problems while developing, we leave the debugging/trace statements in and are able to turn them on when we need to in production--almost equivalent to single-stepping through the deployed code. In addition, most methods have trace or debug statements at the top showing the parameters being passed in and the program flow--again, only really useful for a deployed system where a debugger is unavailable.
So in short, yes it's useful after development.
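As a concrete illustration of the "change verbosity on the fly" point, this is roughly how it looks with log4j 1.x; the file name, reload interval, and category are examples:

    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PropertyConfigurator;

    public class RuntimeVerbosity {
        public static void main(String[] args) {
            // Option 1: re-read log4j.properties every 30 seconds, so editing the file
            // on the server changes levels/appenders without a restart or rebuild.
            PropertyConfigurator.configureAndWatch("log4j.properties", 30000);

            // Option 2: flip a single category programmatically, e.g. from an admin
            // endpoint, when the system isn't running well.
            Logger.getLogger("com.example.datasource").setLevel(Level.TRACE);
        }
    }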
Edit (in response to a comment question):
Just as an example: the app I'm working on now has 20-ish logs. One is "Performance"; it logs incoming data, including timings--sometimes more than one line a second. This log file rolls at 10 MB (about hourly), and we use it to find lags in our data delivery. We even use other software to analyze this log sometimes to look for patterns in data timing.
We have a separate "Error" log that logs all error-level activity. This log doesn't roll so fast that we lose data when we are getting a bunch of other log information.
There is another log for problems related to Hibernate/SQL, one for problems related to our message queue, and one for our inter-app cache....
These are all also combined into a "main" log through the log4j config file. We can reconfigure any one log to a different level (for instance, we were having authentication problems with a data source, so we turned up the debugging level on that source to find out what had changed in our server's environment to cause it).
Even though some of the logs scroll through 10 MB in an hour (our max file size), Log4j will roll them into .1 and .2 files, so we can keep 10-50 of them depending on need.
All of this is done through config files and can be changed after deployment without rebuilding the system.
Edit 2 - another thought:
Another useful point about using Log4j and the Java logging interfaces is that libraries like Hibernate that use them can be configured through XML files without rebuilding.
Without Log4j/Java's logging APIs you would either A) have a custom API to control the logs, B) only have default logging, or C) have no logging from that subsystem. Since Hibernate uses Java's APIs, however, you can set the log level for "Hibernate" log info in a standard, documented XML config file and even re-route its logs to a database or other logging device.
Logging is especially useful for locating errors that occur in production code. During development you can use more powerful tools like debuggers.
Of course, you have to be aware that logging potentially affects performance and can create huge files. That's why tools like log4j can be configured to turn logging on and off or to control its verbosity.
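On the performance point, the usual pattern is to guard expensive log statements so that a disabled level costs almost nothing; a small log4j sketch (class and message are made up):

    import org.apache.log4j.Logger;

    public class GuardedLogging {
        private static final Logger log = Logger.getLogger(GuardedLogging.class);

        // Stand-in for a costly dump you only want at DEBUG level.
        static String expensiveDump() {
            return String.join(",", java.util.Collections.nCopies(1000, "x"));
        }

        public static void main(String[] args) {
            // The guard means the expensive string is never built when DEBUG is off,
            // so verbose logging can stay in the code without hurting production.
            if (log.isDebugEnabled()) {
                log.debug("state dump: " + expensiveDump());
            }
        }
    }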
It's OK to use log4j because it is the most common logging library for Java. However, I personally find its configuration a bit unintuitive.
Another reason for using loggers such as Log4j is to allow for different logging levels for different components. Logs can get pretty large and messy if you turn DEBUG on for everything. If you know what area of code you want to "magnify" in your logs, you can ratchet up the logging for that area alone.
Yes (or any other logging framework / SLF4J).
No, it is not only beneficial during development.
I have been working on a single-box application which uses Codahale Metrics heavily for instrumentation. Right now we are moving to the cloud, and I have the questions below on how I can monitor metrics when the application is distributed.
1. Is there a metrics reporter that can write metrics data to Cassandra?
2. When and how does the aggregation happen if there are records per server in the database?
3. Can I define the time interval at which the metrics data gets saved into the database?
4. Are there any inbuilt frameworks that are available to achieve this?
Thanks a bunch; I appreciate all your help.
I am answering your questions first, but I think you are misunderstanding how to use Metrics.
1. You can google this fairly easily. I don't know of any (I also don't understand what you'd do with it in Cassandra). You would normally use something like Graphite for that. In any case, a reporter implementation is very straightforward and easy (see the sketch after these answers).
2. That question does not make much sense. Why would you aggregate over two different servers? They are independent; each of your monitored instances should be standalone. Aggregation happens on the receiving side (e.g. Graphite).
3. You can - see 1. Write a reporter and configure it accordingly.
4. Not that I know of.
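To give an idea of what "write a reporter" means, here is a minimal sketch of a custom ScheduledReporter for the Dropwizard/Codahale Metrics library; the class name is made up and the body just prints, so the write-to-your-store part (Cassandra, REST, whatever) is up to you:

    import java.util.SortedMap;
    import java.util.concurrent.TimeUnit;

    import com.codahale.metrics.Counter;
    import com.codahale.metrics.Gauge;
    import com.codahale.metrics.Histogram;
    import com.codahale.metrics.Meter;
    import com.codahale.metrics.MetricFilter;
    import com.codahale.metrics.MetricRegistry;
    import com.codahale.metrics.ScheduledReporter;
    import com.codahale.metrics.Timer;

    public class MyStoreReporter extends ScheduledReporter {

        public MyStoreReporter(MetricRegistry registry) {
            super(registry, "my-store-reporter", MetricFilter.ALL,
                  TimeUnit.SECONDS, TimeUnit.MILLISECONDS);
        }

        @Override
        public void report(SortedMap<String, Gauge> gauges,
                           SortedMap<String, Counter> counters,
                           SortedMap<String, Histogram> histograms,
                           SortedMap<String, Meter> meters,
                           SortedMap<String, Timer> timers) {
            // Replace this with writes to the store of your choice.
            counters.forEach((name, counter) ->
                    System.out.println(name + " = " + counter.getCount()));
        }
    }

Starting it with new MyStoreReporter(registry).start(30, TimeUnit.SECONDS) also covers question 3: the interval at which data is pushed is whatever you pass to start().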
Now to metrics in general:
I think you have the wrong idea. You can monitor X servers, that is not a problem at all, but you should not aggregate on the client side (or database side). How would that even work? Restarts zero the clients, which essentially means you would need to track the state of each of your servers for the aggregation to work. And how do you manage outages?
The way you should monitor your servers with metrics:
Create a namespace per host, e.g.:
io.my.server.{hostname}.my.metric
Now you have X different namespaces, but they all share a common prefix. That means you have grouped them.
Send them to your preferred monitoring solution.
There are heaps out there. I do not understand why you want this to be Cassandra - what kind of advantage do you gain from that? http://graphite.wikidot.com/ for example is a graphing solution. Your applications can automatically submit data there (Graphite comes with a reporter in Java that you can use; a sketch follows below). See http://graphite.wikidot.com/screen-shots for what it looks like.
The main point is that Graphite (and all or most providers) knows how to handle your namespaces. E.g. also look at Zabbix, which can do the same thing.
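A minimal sketch of that per-host namespacing with the stock GraphiteReporter from the metrics-graphite module (host, port, and prefix are placeholders):

    import java.net.InetAddress;
    import java.net.InetSocketAddress;
    import java.util.concurrent.TimeUnit;

    import com.codahale.metrics.MetricFilter;
    import com.codahale.metrics.MetricRegistry;
    import com.codahale.metrics.graphite.Graphite;
    import com.codahale.metrics.graphite.GraphiteReporter;

    public class GraphiteSetup {
        public static void main(String[] args) throws Exception {
            MetricRegistry registry = new MetricRegistry();
            registry.counter("my.metric.request").inc();

            // Every metric is prefixed per host, e.g. io.my.server.host-a.my.metric.request,
            // so the receiving side can group and aggregate across hosts.
            String hostname = InetAddress.getLocalHost().getHostName();

            Graphite graphite = new Graphite(new InetSocketAddress("graphite.example.com", 2003));
            GraphiteReporter reporter = GraphiteReporter.forRegistry(registry)
                    .prefixedWith("io.my.server." + hostname)
                    .convertRatesTo(TimeUnit.SECONDS)
                    .convertDurationsTo(TimeUnit.MILLISECONDS)
                    .filter(MetricFilter.ALL)
                    .build(graphite);
            reporter.start(1, TimeUnit.MINUTES);
        }
    }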
Aggregations
Now the aggregation happens on the receiving side. Your provider knows how to do that, and you can define rules.
For example, you could wildcard alerts like:
io.my.server.{hostname}.my.metric.count > X
Graphite (I believe) even supports operations, e.g.:
sum(io.my.server.{hostname}.my.metric.request) - which would sum up ALL your hosts' requests
That is where the aggregation happens. At that point, your servers are again standalone (as they should be) and have no dependency on each other or on any monitoring database. They simply report on their own metrics (which is what they should do), and you - as the consumer of those metrics - are responsible for defining the right alerts/aggregations/formulas on the receiving end.
Aggregating this on the server side would involve:
Discovering all other servers
Monitoring their state
Receiving/sending metrics back and forth
Synchronising what they report, etc.
That just sounds like a nightmare to maintain :) I hope that gives you some insight/ideas.
(Disclaimer: I am neither a Metrics dev nor a Graphite dev - this is just how I did it in the past, and the approach I still use.)
Edit:
With your comment in mind, here are my two favourite solutions for what you want to achieve:
DB
You can use the DB and store dates, e.g. for the start message and the end message.
This is not really a metrics thing, so it may not be preferred. As per your question you could write your own reporter for that, but it would get complicated with regard to upserts/updates etc. I think option 2 is easier and has more potential.
Logs
This is, I think, what you need. Your servers independently log Start/Stop/Pause events etc. - whatever it is you want to report on. You then set up Logstash and collect those logs.
Logstash allows you to track these events over time and create metrics on it, see:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-metrics.html
Or:
https://github.com/logstash-plugins/logstash-filter-elapsed
The first one uses actual metrics. The second one is a different plugin that just measures times between start/stop events.
This is the option with the most potential because it does not rely on any particular format, data store, or anything else. You even get Kibana for plotting out of the box if you use the entire ELK stack.
Say you wanted to measure your messages. You can just look at the logs; there are no application changes involved. The solution does not even touch your application (storing your reporting data manually takes up threads and processing in your application, so if you need to stay real-time this will drag your overall performance down); it is a completely standalone solution. Later on, when you want to measure other things, you can easily add to your Logstash configuration and start collecting other metrics.
I hope this helps
The prudent mode in logback serializes IO operations between all JVMs writing to the same file, potentially running on different hosts. In other logging frameworks, logging to a central TCP (or JMS) appender seems to be the only solution if output from many loggers should go to the same file.
As I am using a Delphi library which is based on log4j and also cannot log to the same file from different instances of the same application (on a terminal server), it would be interesting to know how this feature is implemented. - P.S. I'll check the logback source code and come back to answer my own question if nobody is faster :)
It's implemented with a simple FileLock. You can check the source of FileAppender.
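The pattern itself is plain java.nio - roughly the following per logging event (a rough sketch of the idea, not logback's actual code):

    import java.io.FileOutputStream;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileLock;
    import java.nio.charset.StandardCharsets;

    public class SharedFileWriter {
        // Open in append mode, take an exclusive OS-level lock, move to the end
        // (another process may have appended in the meantime), write, release.
        static void writeLine(String file, String line) throws Exception {
            try (FileOutputStream out = new FileOutputStream(file, true);
                 FileChannel channel = out.getChannel();
                 FileLock lock = channel.lock()) {
                channel.position(channel.size());
                out.write((line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8));
                out.flush();
            }
        }

        public static void main(String[] args) throws Exception {
            writeLine("shared.log", "hello from instance A");
        }
    }

The lock is what makes concurrent writers from different JVMs safe, and it is also why prudent mode is slower than a normal FileAppender.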
I have some Java code that will be running as an importer for data for a much larger project. The initial logging code was done with the java.util.logging classes, so I'd like to keep it if possible, but it seems to be a little inadequate now given the amount of data passing through the importer.
Oftentimes the importer will get data that the main system doesn't have information for, or that doesn't match the system's data, so it is ignored, but a message is written to the log about what information was dropped and why it wasn't imported. The problem is that the log tends to grow very quickly, so we'd like to be able to start a fresh log daily or weekly.
Does anybody know whether this can be done with the java.util.logging classes, or would I have to switch to log4j or something custom?
Thanks for any help!
I think you have to roll your own StreamHandler (at least as of Java 1.5, the JDK didn't come with a date-rolling implementation). Here is someone who did it.
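If you do want to stay on java.util.logging, a hand-rolled date-based handler is not much code. A minimal, hypothetical sketch (the file name pattern is an example):

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.time.LocalDate;
    import java.util.logging.ErrorManager;
    import java.util.logging.LogRecord;
    import java.util.logging.SimpleFormatter;
    import java.util.logging.StreamHandler;

    // Hypothetical handler: switches to a new file whenever the calendar day changes.
    public class DailyFileHandler extends StreamHandler {
        private LocalDate currentDay;

        public DailyFileHandler() throws IOException {
            setFormatter(new SimpleFormatter());
            roll();
        }

        // setOutputStream() flushes and closes the previous day's stream for us.
        private void roll() throws IOException {
            currentDay = LocalDate.now();
            setOutputStream(new FileOutputStream("importer-" + currentDay + ".log", true));
        }

        @Override
        public synchronized void publish(LogRecord record) {
            if (!LocalDate.now().equals(currentDay)) {
                try {
                    roll();
                } catch (IOException e) {
                    reportError("could not roll log file", e, ErrorManager.OPEN_FAILURE);
                    return;
                }
            }
            super.publish(record);
            flush();
        }
    }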
You can use log4j with the DatedFileAppender (distributed separately). This creates a single file per date. I like it very much and use it everywhere I implement log4j (even my Tomcat server logs through it!).
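If you do switch to log4j, its built-in DailyRollingFileAppender also gives you one file per day without the separate DatedFileAppender jar; a minimal programmatic sketch (file name and pattern are examples):

    import java.io.IOException;

    import org.apache.log4j.DailyRollingFileAppender;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PatternLayout;

    public class DatedLogSetup {
        public static void main(String[] args) throws IOException {
            // Rolls importer.log over to importer.log.2024-05-01 etc. at midnight.
            DailyRollingFileAppender daily = new DailyRollingFileAppender(
                    new PatternLayout("%d %-5p [%c] %m%n"),
                    "importer.log",
                    "'.'yyyy-MM-dd");
            Logger.getRootLogger().addAppender(daily);
            Logger.getLogger(DatedLogSetup.class).info("dated logging in place");
        }
    }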