log4j2 multiple log files performance vs one big log file - java

Working on a huge Java/Spring monolith app. We use log4j2 for logging. We have very few log files because our chief architect refuses to create more log files for separate concerns: a) he "knows" that every new file will hurt performance (writing to many files in a monolith will supposedly cost much more than writing to a single one, or the few we have); b) we use a log aggregator service in production and CI that can filter and sort all the logs, and he considers that the way to go at all times.
The problem, though, is that having so few log files makes a mess of the actual logs. Without any tools to sort and filter them, developers usually look into those files only for errors, not to understand the flow or anything else, so more time is spent actually debugging the app than adding logs or reading them. We obviously do not have the log aggregator on development machines, since it is a paid service for production/CI only.
Personally I do not like the mess we have in the log files and consider it a mistake, as it is hard to understand anything in there if you only open the logs with, say, Notepad++.
I have found the answer logging big file small files, which answers part of my question from one point of view, but still: does anyone know whether there is any real issue with multiple files from a performance perspective, or should I spend time actually trying to prove one idea or the other?
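For reference, splitting logs by subsystem in log4j2 is purely a configuration change; a minimal sketch of what the "many files" setup would look like (package names, paths and sizes here are invented for illustration, not our real config):

    <?xml version="1.0" encoding="UTF-8"?>
    <Configuration status="warn">
      <Appenders>
        <!-- One rolling file per concern; names and paths are illustrative -->
        <RollingFile name="AppFile" fileName="logs/app.log"
                     filePattern="logs/app-%d{yyyy-MM-dd}-%i.log.gz">
          <PatternLayout pattern="%d %-5level [%t] %logger{36} - %msg%n"/>
          <Policies>
            <SizeBasedTriggeringPolicy size="10 MB"/>
          </Policies>
        </RollingFile>
        <RollingFile name="BillingFile" fileName="logs/billing.log"
                     filePattern="logs/billing-%d{yyyy-MM-dd}-%i.log.gz">
          <PatternLayout pattern="%d %-5level [%t] %logger{36} - %msg%n"/>
          <Policies>
            <SizeBasedTriggeringPolicy size="10 MB"/>
          </Policies>
        </RollingFile>
      </Appenders>
      <Loggers>
        <!-- Route one subsystem to its own file; additivity="false" keeps it out of the main log -->
        <Logger name="com.example.billing" level="info" additivity="false">
          <AppenderRef ref="BillingFile"/>
        </Logger>
        <Root level="info">
          <AppenderRef ref="AppFile"/>
        </Root>
      </Loggers>
    </Configuration>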

Related

Are logging tools like log4j useful only during development phase? [closed]

I need to log the activity of the users connected to my server. Should I use log4j, or is log4j beneficial only during the development phase?
They are actually not particularly useful during development; System.out.println is pretty good for most dev debug logging. But once you deploy, the following abilities become really useful:
Roll logfiles so they don't get too big, allowing continuous, maintenance-free operation.
Add times/dates so you can look at the logs for a certain time period.
Change verbosity on the fly (you don't always want trace or debug info, but being able to flip it on when the system isn't running well can be a lifesaver).
Re-route logfiles to a more accessible place... Log4j can send your logs to various databases or other locations for when you can't actually reach your server directly.
Some of our code has trace statements on every significant line. If we run into problems while developing, we leave the debugging/trace statements in and are able to turn them on when we need to in production--almost equivalent to single-stepping through your deployed code. In addition, most methods have trace or debug statements at the top showing the parameters being passed in and the program flow--again only really useful for a deployed system where a debugger is unavailable.
So in short, yes it's useful after development.
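To make those points concrete: all of this is configuration, not code. A minimal log4j 1.x properties sketch (file, path and package names here are made up for illustration) might look like:

    # Roll to a new, date-stamped file each day so the logs never grow without bound
    log4j.rootLogger=INFO, FILE
    log4j.appender.FILE=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.FILE.File=logs/server.log
    log4j.appender.FILE.DatePattern='.'yyyy-MM-dd
    log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
    # Timestamp every line so you can look at the logs for a certain time period
    log4j.appender.FILE.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c - %m%n
    # Flip a single subsystem to DEBUG (and back) without touching the code
    log4j.logger.com.example.datasource=DEBUG

Changing that last line (or the root level) and reloading the configuration is what "change verbosity on the fly" amounts to in practice.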
edit (in response to comment question)--
Just as an example. The app I'm working on now has 20ish logs. One is "Performance"; it logs incoming data, including timings--sometimes more than one line a second. This logfile "rolls" at 10mb (about hourly), but we use it to find lags in our data delivery. We even use other software to analyze this log sometimes to look for patterns in data timing.
We have a separate "Error" log that logs all error-level activity. This log doesn't roll so fast that we lose data when we are getting a bunch of other log information.
There is another log for problems related to Hibernate/SQL, one for problems related to our message queue, and one for our inter-app cache....
These are all also combined into a "main" log through the log4j config file. We can reconfigure any one log to a different level (for instance, we were having authentication problems with a data source, so we turned up the debug level on that source to find out what had changed in our server's environment to cause it).
Even though some of the logs scroll through 10mb in an hour (our max file size), Log4j will roll them into .1 and .2 files, so we can keep 10-50 of them depending on need.
All of this is done through config files and can be changed after deployment without rebuilding the system.
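A rough log4j 1.x properties sketch of that kind of layout (appender, file and package names are illustrative, not the actual config): the named "Performance" logger writes to its own file and, because additivity is on by default, still flows up into the combined main log, while a Threshold means only error-level entries land in the error file, so it doesn't roll away as fast.

    log4j.rootLogger=INFO, MAIN, ERRORS

    # The combined "main" log, size-rolled
    log4j.appender.MAIN=org.apache.log4j.RollingFileAppender
    log4j.appender.MAIN.File=logs/main.log
    log4j.appender.MAIN.MaxFileSize=10MB
    log4j.appender.MAIN.MaxBackupIndex=50
    log4j.appender.MAIN.layout=org.apache.log4j.PatternLayout
    log4j.appender.MAIN.layout.ConversionPattern=%d %-5p [%t] %c - %m%n

    # Error-level entries also get their own file so they don't disappear with the noise
    log4j.appender.ERRORS=org.apache.log4j.RollingFileAppender
    log4j.appender.ERRORS.Threshold=ERROR
    log4j.appender.ERRORS.File=logs/errors.log
    log4j.appender.ERRORS.MaxFileSize=10MB
    log4j.appender.ERRORS.MaxBackupIndex=50
    log4j.appender.ERRORS.layout=org.apache.log4j.PatternLayout
    log4j.appender.ERRORS.layout.ConversionPattern=%d %-5p [%t] %c - %m%n

    # The "Performance" log: its own file, plus everything still reaches MAIN via additivity
    log4j.logger.com.example.performance=DEBUG, PERF
    log4j.appender.PERF=org.apache.log4j.RollingFileAppender
    log4j.appender.PERF.File=logs/performance.log
    log4j.appender.PERF.MaxFileSize=10MB
    log4j.appender.PERF.MaxBackupIndex=10
    log4j.appender.PERF.layout=org.apache.log4j.PatternLayout
    log4j.appender.PERF.layout.ConversionPattern=%d %-5p %m%n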
Edit 2--another thought
Another useful point about using log4j and the Java logging interface is that libraries such as Hibernate that use them can have their logging configured through XML files without rebuilding.
Without Log4j/Java's logging APIs you would either A) have a custom API to control the logs, B) only have default logging, or C) have no logging from that subsystem. Since Hibernate uses Java's APIs, however, you can set the log level for Hibernate log info in a standard, documented XML config file and even re-route its logs to a database or other logging device.
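For example, silencing or amplifying Hibernate is then just a couple of lines in the logging config (org.hibernate and org.hibernate.SQL are Hibernate's standard logger categories, shown here in log4j 1.x properties form):

    # Keep Hibernate quiet by default, but show the SQL it generates
    log4j.logger.org.hibernate=WARN
    log4j.logger.org.hibernate.SQL=DEBUG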
Logging is especially useful for locating errors that occur in production code. During development you can use more powerful tools like debuggers.
Of course you have to be aware that logging potentially affects performance and can create huge files. That's why tools like log4j can be configured to turn on and off logging or to control its verbosity.
It's OK to use log4j because it is the most common logging library for Java. However, I personally find its configuration a bit unintuitive.
Another reason for using loggers such as Log4j is to allow for different logging levels for different components. Logs can get pretty large and messy if you turn DEBUG on for everything. If you know what area of code you want to "magnify" in your logs you can ratchet up the logging for that area alone.
yes (or any other logging framework / SLF4J)
no

Logging from Java app to ELK without need for parsing logs

I want to send logs from a Java app to ElasticSearch, and the conventional approach seems to be to set up Logstash on the server running the app, and have logstash parse the log files (with regex...!) and load them into ElasticSearch.
Is there a reason it's done this way, rather than just setting up log4j (or logback) to log things in the desired format directly into a log collector, from which they can then be shipped to ElasticSearch asynchronously? It seems crazy to me to have to fiddle with grok filters to deal with multiline stack traces (and burn CPU cycles on log parsing) when the app itself could just log in the desired format in the first place.
On a tangentially related note, for apps running in a Docker container, is it best practice to log directly to ElasticSearch, given the need to run only one process?
If you really want to go down that path, the idea would be to use something like an Elasticsearch appender (or this one or this other one) which would ship your logs directly to your ES cluster.
However, I'd advise against it for the same reasons mentioned by @Vineeth Mohan. You'd also need to ask yourself a couple of questions, mainly: what would happen if your ES cluster goes down for any reason (OOM, network down, ES upgrade, etc.)?
There are many reasons why asynchronicity exists, one of which is the robustness of your architecture, and most of the time that's much more important than burning a few more CPU cycles on log parsing.
Also note that there is an ongoing discussion about this very subject in the official ES discussion forum.
I think it's usually ill-advised to log directly to Elasticsearch from a Log4j/Logback/whatever appender, but I agree that writing Logstash filters to parse a "normal" human-readable Java log is a bad idea too. I use https://github.com/logstash/log4j-jsonevent-layout everywhere I can to have Log4j's regular file appenders produce JSON logs that don't require any further parsing by Logstash.
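As a rough illustration of that approach, the layout simply replaces the PatternLayout on an ordinary file appender (the layout class comes from the log4j-jsonevent-layout project linked above; verify the class name against the version you pull in, and the file names here are placeholders):

    # File appender that writes one JSON event per line, ready to be shipped without grok parsing
    log4j.rootLogger=INFO, JSONFILE
    log4j.appender.JSONFILE=org.apache.log4j.RollingFileAppender
    log4j.appender.JSONFILE.File=logs/app.json
    log4j.appender.JSONFILE.MaxFileSize=50MB
    log4j.appender.JSONFILE.MaxBackupIndex=5
    log4j.appender.JSONFILE.layout=net.logstash.log4j.JSONEventLayoutV1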
There is also https://github.com/elastic/java-ecs-logging which provides a layout for log4j, log4j2 and Logback. It's quite efficient and the Filebeat configuration is very minimal.
Disclaimer: I'm the author of this library.
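For instance, with log4j2 the ECS layout slots into a normal rolling file appender roughly like this (element names per the library's documentation; the service name and paths are placeholders):

    <!-- log4j2 appender writing ECS-formatted JSON that Filebeat can ship without extra parsing -->
    <RollingFile name="EcsJsonFile" fileName="logs/app.json"
                 filePattern="logs/app-%d{yyyy-MM-dd}.json">
      <EcsLayout serviceName="my-app"/>
      <Policies>
        <TimeBasedTriggeringPolicy/>
      </Policies>
    </RollingFile>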
If you need a quick solution, I've written this appender here Log4J2 Elastic REST Appender if you want to use it. It has the ability to buffer log events based on time and/or number of events before sending them to Elastic (using the _bulk API so that it sends them all in one go). It has been published to Maven Central, so it's pretty straightforward.
As the other folks have already mentioned, the best way to do it would be to save the logs to file and then ship them to ES separately. However, I think there is value in this if you need to get something running quickly until you have the time/resources to implement the optimal way.

Is a single logback.xml file for multiple applications a good practice?

There are multiple applications deployed on my Tomcat server.
At first every application had its own logback.xml file packaged with it in WEB-INF/classes.
Then I put another directory, outside Tomcat's deploy directory, on the common classpath, put a single logback.xml there, and excluded the per-application ones. The reason for that was that I wanted logging to be conveniently configurable in one place.
Unfortunately, there is now the requirement to log every application to its own file.
Since I think that this is not so easy to achieve with this setup, I'm wondering whether this setup is that good at all. What do you think?
Unfortunately, there is now the requirement to log every application to its own file.
I think that this is the only correct way to do it. It is OK to have several log files for a single application, but having many applications write to the same log is bad practice.
What you want, in order to keep a single configuration file, is to use a SiftingAppender.
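A rough sketch of a single shared logback.xml built around a SiftingAppender (class and variable names as I remember them from the logback docs, so double-check them against your version; each webapp also still needs its own distinct logger context name, e.g. via the JNDI context selector, for the discriminator to tell the applications apart):

    <configuration>
      <!-- One appender definition, but one log file per logger context (i.e. per application) -->
      <appender name="SIFT" class="ch.qos.logback.classic.sift.SiftingAppender">
        <discriminator class="ch.qos.logback.classic.sift.ContextBasedDiscriminator"/>
        <sift>
          <appender name="FILE-${contextName}" class="ch.qos.logback.core.FileAppender">
            <file>logs/${contextName}.log</file>
            <encoder>
              <pattern>%d %-5level [%thread] %logger{36} - %msg%n</pattern>
            </encoder>
          </appender>
        </sift>
      </appender>
      <root level="INFO">
        <appender-ref ref="SIFT"/>
      </root>
    </configuration>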
Logs need to be easy to read and easy to parse by any user. If you have a single log file with multiple applications writing to it, you jumble up the various log entries. Since you are the developer with knowledge of all 7 applications, you might be able to follow it, but a new developer will have a difficult time understanding the logs. Logs should be concise and easy to decipher, so that support issues can be analysed just by reading the log entries.
I would suggest you follow these tips

Using java.util.logging, is it possible to restart logs after a certain period of time?

I have some Java code that will be running as an importer for data for a much larger project. The initial logging code was done with the java.util.logging classes, so I'd like to keep it if possible, but it seems to be a little inadequate now given the amount of data passing through the importer.
Oftentimes the importer will get data that the main system doesn't have information for or that doesn't match the system's data, so it is ignored, but a message is written to the log about what information was dropped and why it wasn't imported. The problem is that this tends to grow in size very quickly, so we'd like to be able to start a fresh log daily or weekly.
Does anybody know if this can be done with the java.util.logging classes, or would I have to switch to log4j or something custom?
Thanks for any help!
I think you have to roll your own StreamHandler (at least as of Java 1.5 it didn't come with an implementation). Here is someone who did it.
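For reference, a bare-bones version of such a handler (a hypothetical sketch, not production code) just wraps a FileHandler and swaps it out when the calendar date changes:

    import java.io.IOException;
    import java.time.LocalDate;
    import java.util.logging.FileHandler;
    import java.util.logging.Handler;
    import java.util.logging.LogRecord;
    import java.util.logging.SimpleFormatter;

    // Hypothetical sketch: java.util.logging's FileHandler only rotates by size,
    // so this handler starts a fresh file whenever the date changes.
    public class DailyFileHandler extends Handler {
        private FileHandler delegate;
        private LocalDate currentDay;

        public DailyFileHandler() throws IOException {
            roll();
        }

        private synchronized void roll() throws IOException {
            if (delegate != null) {
                delegate.close();
            }
            currentDay = LocalDate.now();
            // One file per day, e.g. importer-2019-05-01.log (the path is illustrative)
            delegate = new FileHandler("importer-" + currentDay + ".log", true);
            delegate.setFormatter(new SimpleFormatter());
        }

        @Override
        public synchronized void publish(LogRecord record) {
            if (!LocalDate.now().equals(currentDay)) {
                try {
                    roll();
                } catch (IOException e) {
                    reportError("Could not roll log file", e, 0);
                }
            }
            delegate.publish(record);
        }

        @Override
        public void flush() {
            delegate.flush();
        }

        @Override
        public void close() {
            delegate.close();
        }
    }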
You can use log4j with the DatedFileAppender (distributed separately). This creates a single file per date. I like it very much and use it everywhere I use log4j (even my Tomcat server logs through it!).

Incremental deployment of java web applications

We have the following problem. Developers frequently need to make small changes to our web applications. When I say small, I mean things like correcting the spelling on a web page or similar. Generating and redeploying war archives can be slow and costly in such scenarios.
How could we automate and install changes incrementally? For example, generate a new exploded war, compare its files with the exploded war in production, and then replace in production only the files affected by the change: .jsp, .html, .class, etc.
This need not be hot deployment; it's OK to restart the server. What I wish to avoid is having to copy and deploy wars that can be 80Mb in size. Sometimes connections are slow, and making a minuscule change to a web application, as simple as a spelling correction, can take hours.
We use Maven to automate our build process. The key issue is to automate the whole process, so that I can be sure that app v2.2.3 in my Subversion is exactly what I have in production after incremental deployment.
We used to do this sort of thing all the time. We worked in a bank, and there were sometimes legal phrases or terms and conditions that needed to be changed today (or, more usually, yesterday).
We did two things to help us deploy quickly. The first was a good change control and build process: we could change and deploy any version we liked, and we had a good test suite with which we could test changes easily.
The second was more controversial. All of our HTML was deployed as separate files on the server. There was no WAR. Therefore, when circumstances came up where we needed to change something textual quickly, we could do it. If Java code needed changing, we always did a FULL build and deploy.
This is not something I'd recommend, but it was good for our situation.
The point of a WAR is so that everything gets deployed at the same time. If you're using a WAR, that means you want it to be deployed all at once.
One suggestion is not to do such corrections so often (once a week?). Then you don't have so much pain.
Hard to say. You can of course replace single class files in an exploded webapp, but this is generally a bad idea and you don't see many people doing it.
The reason is that when you make small changes it becomes harder and harder to detect differences between production and development. The chances of you sending a wrong class file and breaking the production server increase over time.
When you say text changes, isn't it an idea to keep the text resources separate from the war file? That way, not only developers but maybe even the customer can easily add/change translations.
To the customer it's important, but technically it's silly to do an 80MB deploy over a slow line to fix a small typo.
You can also try to look at your build/delivery cycle and increase testing efforts to prevent these small changes.
Hope this helps.
You can have the master war deployed somewhere the running servers can access it, and instead of deploying war files to the individual servers you can use rsync and perl to determine if there are changes to any files in the master war, distribute them to the servers and execute restarts.
diff and patch:
http://stephenjungels.com/jungels.net/articles/diff-patch-ten-minutes.html
At the moment I have SVN installed on the remote server, so in the case of a simple update you can just update a single file. Transferring the big WAR file would be quite impractical.
You can automate this to a single-click deployment using putty / plink [if you are using Windows] by creating a simple script on the local machine and another one on the remote machine.
At the moment I have a DEVELOPMENT SVN and a LIVE SVN. The ANT build merges DEV into LIVE and commits the result back to the LIVE repository. At that stage the remote server can do an SVN UP and automatically get the requested files.
You can further improve the update script to restart the server when some classes have changed and to skip the restart when only scripts/JSPs are updated.
In this way you will also have the option to roll back to a previous version, to be sure that you have a working web app at all times.
To improve the process of merging SVN, this tool is quite useful: http://www.orcaware.com/svn/wiki/Svnmerge.py
The usual answer is to use a Continuous Integration system which watches your Subversion repository, builds the artifacts and deploys them - you just want your web application to be able to work even after being redeployed. The question is whether that is fast enough for you.
I don't think there's a straightforward answer to this one.
The key here is modularisation - a problem which I don't think is solved very well for Java applications at present. You may want to look at OSGi or dynamic modules, although I'm not sure how effective they are in terms of this problem.
I've seen solutions where people drop classes into the application server/servlet container. I don't agree with it, but it does appear to work... I'm sure there are horror stories, though!
Maven certainly makes things easier by splitting applications into modules, but if you do this and deploy modules independently you need to make sure that the various versions play nice together in a test environment to begin with...
An alternative is to partition your application in terms of functionality and host separate functions on various servers, e.g:
Customer Accounts - Server A
Search - Server B
Online Booking - Server C
Payment Services - Server D
The partitioning makes it easier to deploy applications, but again you have to make sure that your modules play nicely together first. Hope that helps.
I have had a similar situation before. It really is a separation-of-concerns issue, and it's not too straightforward. What you need to do is separate the text from the template/HTML page.
We solved this by placing our text in a database table and using the table as a message resource - the same way people use myMessages.properties for internationalization (i18n). This gives you two advantages: you can internationalize the text, and you can make changes in prod instantly and easily without a code deployment. We also cached the table to ensure performance didn't suffer much at all.
Not a solution for all, but it did work really well for us.
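For what it's worth, a stripped-down sketch of that idea (table, column and class names are invented here; the real thing would also handle locales) looks something like this:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import javax.sql.DataSource;

    // Hypothetical database-backed message source with a simple in-memory cache,
    // so copy changes take effect without a redeploy.
    public class DatabaseMessageSource {
        private final DataSource dataSource;
        private final Map<String, String> cache = new ConcurrentHashMap<>();

        public DatabaseMessageSource(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        public String getMessage(String key) {
            return cache.computeIfAbsent(key, this::loadFromDatabase);
        }

        // Call this (e.g. from an admin page) after editing the table so new text shows up.
        public void clearCache() {
            cache.clear();
        }

        private String loadFromDatabase(String key) {
            String sql = "SELECT text FROM app_messages WHERE message_key = ?";
            try (Connection con = dataSource.getConnection();
                 PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setString(1, key);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString("text") : "??" + key + "??";
                }
            } catch (SQLException e) {
                throw new IllegalStateException("Could not load message " + key, e);
            }
        }
    }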
