Efficient high-volume logger? - java

I have a server which generates about 5 MB/s of logs, and much more during peaks. My initial test setup was to just use systemd to forward the output to rsyslog and then send the data to a file. As I expected, this is very expensive: systemd-journal eats up a large chunk of my CPU resources and prevents the server from being able to handle traffic peaks.
I'm looking for a logging setup that is both high-performance and supports log rotation. I ran into the following article
https://medium.com/javarevisited/high-performance-logging-java-59ba374b2166
which mentions the memory-mapped file appender for log4j. Log4j also has a rotating file appender, but these two do not compose, so I would need something external to rotate the logs. However, logrotate on the machine normally sends a HUP signal to tell syslog that a file has been rotated, so that it can close its file handles and reopen the new file. That won't really work for a Java server using log4j.
Any recommendations for what to use? My code currently uses slf4j to actually emit log messages, so it's a big plus if the solution can interoperate with it.
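In case it helps frame answers: the direction I've been looking at is Log4j 2's RollingRandomAccessFileAppender (buffered, and unlike the memory-mapped appender it supports rotation) together with async loggers, which should interoperate with slf4j through the log4j-slf4j-impl binding. A rough, untested sketch of the log4j2.xml I have in mind (paths and sizes are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="warn">
  <Appenders>
    <!-- buffered random-access file appender that also supports rotation -->
    <RollingRandomAccessFile name="Rolling"
                             fileName="logs/server.log"
                             filePattern="logs/server-%i.log.gz">
      <PatternLayout pattern="%d{ISO8601} %p %c - %m%n"/>
      <Policies>
        <SizeBasedTriggeringPolicy size="250 MB"/>
      </Policies>
      <DefaultRolloverStrategy max="20"/>
    </RollingRandomAccessFile>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="Rolling"/>
    </Root>
  </Loggers>
</Configuration>

If I read the docs correctly, adding the LMAX Disruptor to the classpath and starting the JVM with -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector makes all loggers asynchronous on top of this.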

Related

How logback determines the number of threads, and a feasible way to do fast logging

I am using logback with slf4j for logging within my application. When profiling the application, I can see around 8 logback threads running and not doing anything -- I have removed all logging to meet the application's low-latency requirements.
How does logback determine the number of threads, and how can we control it?
To meet the latency and minimal-object-allocation (GC avoidance) requirements, the application stores all log messages as binary in a ring buffer, which is read by a separate thread for logging.
I could use log4j, which allocates less than logback, but it still allocates.
I could use a StringBuilder of fixed length, but it is converted with toString() when the message is written, which still creates objects.
I could use an empty memory-mapped file and dump encoded messages into it, but then another utility is needed for reading the logs, which is not user friendly.
Is there any other solution by which I can dump String messages to a pre-mapped file?
Thanks for any inputs.
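For reference, the memory-mapped option I mentioned looks roughly like this (a rough sketch only; the class name, file size and single-writer assumption are mine). Note that encoding the String still allocates; a truly allocation-free path would encode into a reusable buffer instead.

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

// Appends UTF-8 encoded messages into a pre-sized, memory-mapped file.
// Assumes a single writer thread and that the file is rotated externally.
public class MappedLogWriter implements AutoCloseable {
    private final FileChannel channel;
    private final MappedByteBuffer buffer;

    public MappedLogWriter(String path, int capacityBytes) throws Exception {
        RandomAccessFile file = new RandomAccessFile(path, "rw");
        file.setLength(capacityBytes);          // pre-size the file so the mapping is valid
        this.channel = file.getChannel();
        this.buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, capacityBytes);
    }

    public void log(String message) {
        byte[] bytes = (message + "\n").getBytes(StandardCharsets.UTF_8);
        if (buffer.remaining() < bytes.length) {
            throw new IllegalStateException("mapped log file is full; rotate it externally");
        }
        buffer.put(bytes);                      // the OS flushes dirty pages asynchronously
    }

    @Override
    public void close() throws Exception {
        buffer.force();                         // flush remaining dirty pages
        channel.close();
    }
}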

Different logging configurations per environment, good or bad?

I'm currently using a very basic logging configuration and I'm using the same configuration in all environments. For development it is beneficial to see the output in the console, so I've configured log4j with the following root categories:
log4j.rootCategory=INFO, console, file
When I deploy, I am only interested in the output that is directed to the file, and I have configured each file to have a maximum size.
Is there any performance hit from logging to the console in production, where I have no use for it? Also, where does this output go on a Linux vs. a Windows machine when no console is available? What, if anything, do I gain by having separate configurations?
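For reference, the kind of split I'm considering (file names and paths here are placeholders): one properties file per environment, selected at startup via log4j's log4j.configuration system property.

# log4j-dev.properties
log4j.rootCategory=INFO, console, file

# log4j-prod.properties
log4j.rootCategory=INFO, file

# production startup
java -Dlog4j.configuration=file:/etc/myapp/log4j-prod.properties -jar myapp.jar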

Multiple appenders log4j performance

I am using slf4j with a log4j implementation for logging in my Java project. Currently I have 2 appenders, FILE and CONSOLE.
I want to know the following 2 things:
Does using multiple appenders (in this case CONSOLE and FILE) cause a performance issue in logging?
When would somebody want to use both CONSOLE and FILE appenders?
When writing to CONSOLE and FILE, you are writing to 2 different streams. In a multithreaded system, the performance hit will not be much, but with big volumes it is still apparent.
From the log4J manual
The typical cost of actually logging is about 100 to 300 microseconds.
This includes building the statement and writing it, but the time taken for writing will still be apparent if you are logging heavily.
But you need to ask a more basic question - Why are you logging?
to keep track of what is going on
to find out errors
The CONSOLE is not useful for the first purpose, as the logs are not saved anywhere. If logging is heavy and all the logs are sent to the CONSOLE, the volume will make the console output unreadable, so the second purpose is defeated as well.
IMO it makes much more sense to read logs from a file using something like less. As a general practice, log to a file and, if you must, log only ERROR messages to the console: a few ERROR messages are an indicator of something going wrong, whereas hundreds of log lines on the console are just junk, because you cannot make any sense of them when the console is refreshing so rapidly.
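For example, that split would look roughly like this in log4j 1.x properties (appender names and paths are just placeholders):

log4j.rootCategory=INFO, console, file

# console only sees ERROR and above
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.Threshold=ERROR
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d %p %c - %m%n

# everything from INFO up goes to a rolling file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=logs/app.log
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d %p %c - %m%n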
TL;DR
The cost might not be much, but why incur an added cost when you are getting no added advantage?
Read these links on log4j performance.
log4j-performance
log4j-decreased application performance
log4j appenders
I challenge you to notice any performance change.
For instance, you might want a daemon application to log both to the console and to a file. That does not seem like such uncommon behavior.

Fastest way to store data from sensors in java

I am currently writing a Java application that receives data from various sensors. How often this happens varies, but I believe that my application will receive signals about 100k times per day. I would like to log the data received from a sensor every time the application receives a signal. Because the application does much more than just log sensor data, performance is an issue. I am looking for the best and fastest way to log the data. Thus, I might not use a database, but rather write to a file and keep 1 file per day.
So what is faster: using a database or logging to files? No doubt there are also a lot of options for which logging software to use. Which is best for my purpose if logging to a file is the best option?
The data stored might be used later for analytical purposes, so please keep this in mind as well.
First of all, I would recommend using log4j (or any other logging framework).
You can use a JDBC appender that writes to the database, or any kind of file appender that writes to a file. The point is that your code will be generic enough to be changed later if you like...
In general, files are much faster than database access, but there is room for optimization here.
If performance is critical, you can use batching/asynchronous calls to the logging infrastructure.
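A rough sketch of the batching/asynchronous idea, independent of any particular framework (the class and all names below are made up for illustration): producers enqueue lines, and a single background thread drains the queue and writes them in batches, so callers never block on disk I/O.

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class AsyncBatchLogger implements AutoCloseable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(100_000);
    private final Thread writerThread;
    private volatile boolean running = true;

    public AsyncBatchLogger(String path) throws Exception {
        BufferedWriter out = new BufferedWriter(new FileWriter(path, true));
        writerThread = new Thread(() -> {
            List<String> batch = new ArrayList<>(1_000);
            try {
                while (running || !queue.isEmpty()) {
                    String first = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (first == null) continue;
                    batch.add(first);
                    queue.drainTo(batch, 999);   // grab whatever else is already waiting
                    for (String line : batch) {
                        out.write(line);
                        out.newLine();
                    }
                    out.flush();                 // one flush per batch, not per record
                    batch.clear();
                }
                out.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, "log-writer");
        writerThread.start();
    }

    public void log(String line) {
        queue.offer(line);                       // non-blocking; drops the record if the queue is full
    }

    @Override
    public void close() throws Exception {
        running = false;
        writerThread.join();
    }
}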
A free database on a cheap PC should be able to record 10 records per second easily.
A tuned database on a good system or a logger on a cheap PC should be able to write 100 records/lines per second easily.
A tuned logger should be able to write 1000 lines per second easily.
A fast binary logger can perform 1 million records per second easily (depending on the size of the record)
Your requirement of about 100k records per day works out to roughly 1.2 records per second (100,000 / 86,400 seconds), which you should be able to achieve any way you like. I assume you want to be able to query your data, so you will want it in a database eventually; I would put it there.
Ah, the world of embedded systems. I had a similar problem when working with a hovercraft. I solved it with a separate computer (you can do this with a separate program) on the local area network that would just SIT and LISTEN as a server for the logs I sent to it. The file-writer program was written in C++. This solves two of your problems. First is the obvious performance gain while writing the logs. Second, the Java program is FREED from writing any logs at all (it only acts as a proxy) and can concentrate on performance-critical tasks. Using a DB for this would be overkill, except if you're using SQLite.
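A rough sketch of that proxy idea on the Java side, using a plain TCP socket (host, port and the line protocol are whatever you choose; log4j's SocketAppender does essentially the same thing if you prefer to stay inside the framework):

import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sends one log record per line to a separate listener process over TCP,
// so the main application never touches the disk for logging.
public class RemoteLogClient implements AutoCloseable {
    private final Socket socket;
    private final PrintWriter out;

    public RemoteLogClient(String host, int port) throws Exception {
        socket = new Socket(host, port);
        out = new PrintWriter(
                new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8),
                false);                          // no auto-flush; flush explicitly when needed
    }

    public void log(String line) {
        out.println(line);                       // buffered, so this stays cheap for the caller
    }

    public void flush() {
        out.flush();
    }

    @Override
    public void close() throws Exception {
        out.flush();
        socket.close();
    }
}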
Good luck!

Configuring Hadoop logging to avoid too many log files

I'm having a problem with Hadoop producing too many log files in $HADOOP_LOG_DIR/userlogs (the Ext3 filesystem allows only 32000 subdirectories), which looks like the same problem as in this question: Error in Hadoop MapReduce
My question is: does anyone know how to configure Hadoop to roll the log dir or otherwise prevent this? I'm trying to avoid just setting the "mapred.userlog.retain.hours" and/or "mapred.userlog.limit.kb" properties because I want to actually keep the log files.
I was also hoping to configure this in log4j.properties, but looking at the Hadoop 0.20.2 source, it writes directly to logfiles instead of actually using log4j. Perhaps I don't understand how it's using log4j fully.
Any suggestions or clarifications would be greatly appreciated.
Unfortunately, there isn't a configurable way to prevent that. Every task for a job gets one directory in history/userlogs, which holds the stdout, stderr, and syslog task log output files. The retain-hours setting will help keep too many of those from accumulating, but you'd have to write a good log rotation tool to auto-tar them.
We had this problem too when we were writing to an NFS mount, because all nodes would share the same history/userlogs directory. This means one job with 30,000 tasks would be enough to break the FS. Logging locally is really the way to go when your cluster actually starts processing a lot of data.
If you are already logging locally and still manage to process 30,000+ tasks on one machine in less than a week, then you are probably creating too many small files, causing too many mappers to spawn for each job.
I had this same problem. Set the environment variable "HADOOP_ROOT_LOGGER=WARN,console" before starting Hadoop.
export HADOOP_ROOT_LOGGER="WARN,console"
hadoop jar start.jar
Configuring Hadoop to use log4j and setting
log4j.appender.FILE_AP1.MaxFileSize=100MB
log4j.appender.FILE_AP1.MaxBackupIndex=10
as described on this wiki page doesn't work?
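In full, the relevant part of log4j.properties would look something like this (the FILE_AP1 name comes from the wiki snippet above; the path and pattern are guesses on my part):

log4j.rootLogger=INFO, FILE_AP1
log4j.appender.FILE_AP1=org.apache.log4j.RollingFileAppender
log4j.appender.FILE_AP1.File=${hadoop.log.dir}/hadoop.log
log4j.appender.FILE_AP1.MaxFileSize=100MB
log4j.appender.FILE_AP1.MaxBackupIndex=10
log4j.appender.FILE_AP1.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE_AP1.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n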
Looking at the LogLevel source code, seems like hadoop uses commons logging, and it'll try to use log4j by default, or jdk logger if log4j is not on the classpath.
Btw, it's possible to change log levels at runtime, take a look at the commands manual.
According to the documentation, Hadoop uses log4j for logging. Maybe you are looking in the wrong place ...
I also ran into the same problem... Hive produces a lot of logs, and when the node's disk is full, no more containers can be launched. In YARN, there is currently no option to disable logging. One particularly huge file is the syslog file, which generated GBs of logs in a few minutes in our case.
Configuring the property yarn.nodemanager.log.retain-seconds in "yarn-site.xml" to a small value does not help. Setting "yarn.nodemanager.log-dirs" to "file:///dev/null" is not possible because a directory is needed. Removing the write right (chmod -r /logs) did not work either.
One solution could be to use a "null blackhole" directory. Check here:
https://unix.stackexchange.com/questions/9332/how-can-i-create-a-dev-null-like-blackhole-directory
Another solution that worked for us is to disable logging before running the jobs. For instance, in Hive, starting the script with the following lines works:
set yarn.app.mapreduce.am.log.level=OFF;
set mapreduce.map.log.level=OFF;
set mapreduce.reduce.log.level=OFF;
