log4j Rolling file appender - multi-threading issues?

Are there any known bugs with the Log4j rolling file appender? I have been using Log4j happily for a number of years but was not aware of this. A colleague of mine suggests there are known issues (and I found a Bugzilla entry on this) where, under heavy load, the rolling file appender (we use the time-based one) might not perform correctly when the rollover occurs at midnight.
Bugzilla entry - https://issues.apache.org/bugzilla/show_bug.cgi?id=44932
Appreciate any inputs and pointers on how others have overcome this.
Thanks,
Manglu

I have not encountered this issue myself, and from the bug report I would suspect that it is very uncommon. The Log4j RollingFileAppender has always worked in a predictable and reliable fashion for the apps I have developed and maintained.
This particular bug, if I understand correctly, would only happen if there are multiple instances of Log4j - for example, multiple instances of the same app running simultaneously - writing to the same log file. Then, when it is rollover time, one instance cannot get a lock on the file in order to delete it and archive its contents, resulting in the loss of the data that was to be archived.
I cannot speak to any of the other known bugs your colleague mentioned unless you would like to cite them specifically. In general, I believe Log4j is reliable for production apps.

#kg, this happens to me too - this exact situation, 2 instances of the same program.
I updated it to the newer rolling.RollingFileAppender instead of using DailyRollingFileAppender (or whatever it was called).
I run two instances simultaneously via crontab. The instances output as many messages as they can until 5 seconds have elapsed. They measure elapsed time with System.currentTimeMillis and keep a counter to budget the 5-second loop, so there is minimal overhead in the test. Each log message contains an incrementing sequence number, plus an identifier set from the command line so the two instances can be told apart.
From piecing the log output back together, one of the processes succeeds in writing its sequence from start to end, while the other one loses the first entries of its output (from 0 onward).
This really ought to be fixed...
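For anyone who wants to reproduce it, here is roughly the shape of the harness described above - a minimal sketch, where the class name, shared file name and per-minute rollover pattern are my own placeholders rather than the original test code:

    import org.apache.log4j.DailyRollingFileAppender;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PatternLayout;

    public class RolloverRaceTest {
        public static void main(String[] args) throws Exception {
            // Identifier passed on the command line so the two instances can be told apart.
            String id = args.length > 0 ? args[0] : "A";

            Logger log = Logger.getRootLogger();
            // Both instances deliberately point at the same file; a per-minute date pattern
            // makes the rollover race easy to hit without waiting for midnight.
            log.addAppender(new DailyRollingFileAppender(
                    new PatternLayout("%d %m%n"), "shared.log", "'.'yyyy-MM-dd-HH-mm"));

            long start = System.currentTimeMillis();
            long seq = 0;
            // Write as fast as possible for roughly five seconds.
            while (System.currentTimeMillis() - start < 5000) {
                log.info("instance=" + id + " seq=" + seq++);
            }
        }
    }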

Related

Logging from two simultaneous processes

I have written an application in Java and deployed it on a Unix server.
I have implemented logging in my app and the logs are generated in a file, say X.log.txt.
If I run multiple instances of my jar as different users, or as a single user in different sessions: is there a chance that my logs in X.log.txt get mixed up?
Or will they be written in a FCFS (first-come, first-served) manner?
Example: let P1 and P2 be two processes that are calling the java app and are generating logs.
P1 and P2 ARE writing their individual logs at the same time to X.log.txt. Is this statement true? Or is it entirely based on the CPU scheduling algorithm (FCFS, SJF, etc.)?
Even if I don't use timestamping, it is working fine for me.
When I execute them, the logs are generated one after the other, meaning all the logs for a particular instance are written into the file and then those for the next instance. My question is still open: is it all based on the way the processor handles jobs, or is it something else?
If two processes are writing to the same log file, data will get randomly corrupted. You will get lines cut in the middle, finishing with data from the other log. You can even end up with sizeable chunks of binary zeroes in various places in the file, depending on the OS (and on some OSes it will simply fail to write to the same file from two places at the same time).
Write to separate files and then join/browse them using some 3rd party tools to get timestamp-ordered view.
If both your processes are writing to the same directory and file path you will get some odd behaviour. Depending on your implementation, both applications will write to the file at the same time, or one application will block the other from writing at all.
My suggestion would be to generate the log file's name at runtime and append something unique like a timestamp or a pid (process id) so there's no more conflict:
X.log.[PID].txt or X.log.[TIMESTAMP].txt
NOTE: You'll need a fine enough timestamp resolution (milliseconds or nanoseconds) to avoid a name collision.
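A hypothetical way to build such a name at startup and attach a per-process FileAppender - note that the PID trick relies on the common "pid@host" format returned by RuntimeMXBean.getName(), which is not guaranteed by the spec, and all file names here are placeholders:

    import java.lang.management.ManagementFactory;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import org.apache.log4j.FileAppender;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PatternLayout;

    public class PerProcessLog {
        public static void main(String[] args) throws Exception {
            // "12345@hostname" -> "12345"; works on common JVMs, but is not guaranteed by the spec.
            String pid = ManagementFactory.getRuntimeMXBean().getName().split("@")[0];
            String stamp = new SimpleDateFormat("yyyyMMdd-HHmmss-SSS").format(new Date());

            // Something like X.log.12345.20240101-120000-123.txt - unique per process.
            String fileName = "X.log." + pid + "." + stamp + ".txt";
            Logger.getRootLogger().addAppender(
                    new FileAppender(new PatternLayout("%d %-5p %c - %m%n"), fileName));

            Logger.getLogger(PerProcessLog.class).info("logging to " + fileName);
        }
    }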

Multiple appenders log4j performance

I am using slf4j with log4j as the implementation for logging in my Java project. Currently I have 2 appenders, FILE and CONSOLE.
I want to know the following 2 things:
Does using multiple appenders (in this case CONSOLE and FILE) cause performance issues in logging?
When would somebody want to use both the CONSOLE and FILE appenders?
When writing to CONSOLE and FILE, you are writing to 2 different streams. In a multithreaded system, the performance hit will not be much, but with big volumes it is still apparent.
From the log4j manual:
The typical cost of actually logging is about 100 to 300 microseconds.
This includes building the statement and writing it, but the time taken for writing will still be apparent if you are logging heavily.
But you need to ask a more basic question - Why are you logging?
to keep track of what is going on
to find out errors
The CONSOLE is not useful for the first purpose, as the logs are not saved anywhere. And if the logging is heavy and all the logs are sent to the CONSOLE, the volume will make the console output unreadable, so purpose 2 is defeated as well.
IMO it makes much more sense to read logs from a file using something like less. As a general practice, log to a file and, if you must, log only the ERROR messages to the console: a few ERROR messages are an indicator of something going wrong, whereas hundreds of log lines on the console are just junk, as you cannot make any sense of them when the console is refreshing so rapidly.
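A minimal programmatic sketch of that split - the file name and layout patterns are placeholders, and the same thing can be expressed in log4j.properties by setting a Threshold on the console appender:

    import org.apache.log4j.ConsoleAppender;
    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PatternLayout;
    import org.apache.log4j.RollingFileAppender;

    public class LogSetup {
        public static void configure() throws Exception {
            Logger root = Logger.getRootLogger();
            root.setLevel(Level.INFO);

            // Everything from INFO up goes to the file.
            RollingFileAppender file =
                    new RollingFileAppender(new PatternLayout("%d %-5p %c - %m%n"), "app.log");
            root.addAppender(file);

            // The console only ever sees ERROR and above.
            ConsoleAppender console = new ConsoleAppender(new PatternLayout("%d %-5p - %m%n"));
            console.setThreshold(Level.ERROR);
            root.addAppender(console);
        }
    }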
TL;DR
The cost might not be much, but why incur an added cost when you are getting no added advantage?
Read these links on log4j performance.
log4j-performance
log4j-decreased application performance
log4j appenders
I challenge you to notice any performance change.
For instance, you might want a daemon application to log both to the console and to a file. That does not seem to be such uncommon behavior.

Java multiple processes running in parallel causing open file pointer exceptions

I have two Java programs that are scheduled to run one after the other. When they run in sequence without an overlap, there are no issues. But sometimes one of them is prolonged a little longer because of the volume it is processing, and the second one starts before the first one ends. When this happens, at a point when the second one is halfway through, it crashes with a "max number of open files" exception; the first one finishes successfully, though. When run separately with the same volume there are no issues with either of them. Both processes are completely independent of each other - no common resources, invoked from different scripts, and ultimately 2 different processes running on the same OS in 2 different JVMs. I use an HP-UX system and have tried to trace the open handles using the TUSC utility, but there aren't any that could cause such a problem. Xmx is 2 GB for both and I doubt that would be reached - but is there an explanation for this that I am not seeing? Can the parallel runs be an issue, or is it just a coincidence?
The solution to your problem could be either to up your file descriptor limit (see link below) or to make sure that your code is properly closing resources (file input streams, sockets) so you are not leaking open file descriptors. I'll leave it up to you which approach you take.
http://h20000.www2.hp.com/bc/docs/support/SupportManual/c02261152/c02261152.pdf
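If the cause turns out to be leaked descriptors rather than genuine volume, the fix is making sure every stream is closed on every code path. A trivial sketch (path and class name are made up), using try/finally so it also works on pre-Java-7 JVMs:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class SafeRead {
        // Each unclosed FileReader holds a file descriptor until the GC happens to finalize it,
        // which under load is exactly when "too many open files" shows up.
        static String firstLine(String path) throws IOException {
            BufferedReader in = new BufferedReader(new FileReader(path));
            try {
                return in.readLine();
            } finally {
                in.close();   // always release the descriptor, even on exceptions
            }
        }
    }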
Edit
If your program really is generating that many files at a time, I would certainly look into upping the open file descriptor limit. Then you might also consider putting a throttle/regulator into the code: create X files, back off for a few seconds to let the system reap the resources back, and then continue on again. This might be helpful.
Also, this link sounds very similar to your issue:
Apache FOP 1.0 Multithreading - Too many open files err24
In a similar scenario, and with resource constraints, we have used the architecture below.
A monitor thread is started first and maintains a count variable. There is a configurable limit up to which new processes will be created. For every new process the count is incremented; once a process completes, the count is decremented. New processes are created only if the count is less than the configured limit.
This approach helped us gain better control and scale up where possible.
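A minimal sketch of that monitor/count idea, using a java.util.concurrent.Semaphore in place of a hand-rolled counter; the limit value is whatever your resource budget allows:

    import java.util.concurrent.Semaphore;

    public class BoundedLauncher {
        // Configurable limit on how many pieces of work may run at once.
        private final Semaphore slots;

        public BoundedLauncher(int limit) {
            this.slots = new Semaphore(limit);
        }

        public void launch(final Runnable work) throws InterruptedException {
            slots.acquire();                    // blocks while the limit is reached
            Thread t = new Thread(new Runnable() {
                public void run() {
                    try {
                        work.run();
                    } finally {
                        slots.release();        // free a slot when the work is done
                    }
                }
            });
            t.start();
        }
    }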

Reading a file that is being written to - Locking it?

There is a file - stored on an external server - which is updated very frequently by a vendor. My application polls this file every minute to get the values out. All I am doing is reading the file.
I am worried that by doing this I could inadvertently lock the file so it can't be written to by the vendor. Is this a possibility?
Kind regards
Further to Eric's answer - you could check the Last Modified property of the temp file and only merge it with your 'working' file when it changes. That should protect you from read/write conflicts and means you only merge files just after the vendor has written to the temp file. Though this is messy, and mrab's comment is valid: a better solution should be found.
I have faced this problem several times and, as Peter Lawrey says, there isn't any portable way to do this. If your environment is Unix this should not be an issue at all, as these concurrent access conditions are properly managed by the operating system. However, Windows does not handle this at all (yes, that's the consequence of using an amateur OS for production work, lol).
Now that's said, there is a way to solve this if your vendor is flexible enough. They could write to a temp file and when finished move the temp file to the final destination. By doing this you avoid any concurrent access to the file between you and the vendor.
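If the vendor's writer happens to be in Java, that write-then-move step could look something like the sketch below; the file names are placeholders, and the atomicity of the move depends on the filesystem:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    public class PublishFile {

        // Vendor side: write everything to a temp file in the same directory, then move it
        // into place in one step, so a reader never sees a half-written file.
        public static void publish(Path temp, Path destination) throws Exception {
            // On typical Unix filesystems this is a rename() and replaces the target atomically;
            // on Windows the behaviour when the target already exists varies.
            Files.move(temp, destination, StandardCopyOption.ATOMIC_MOVE);
        }

        public static void main(String[] args) throws Exception {
            Path temp = Paths.get("data.tmp");
            Files.write(temp, "new contents\n".getBytes("UTF-8"));
            publish(temp, Paths.get("data.txt"));
        }
    }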
Another way is to know precisely (difficult?) the timing of your vendor's updates and avoid reading the file during certain time frames. For instance, if your vendor updates the file every hour, avoid reading from five-to-the-hour to five-past-the-hour.
Hope it helps.
There is the Windows Shadow Copy service for volumes. This would allow you to read the backup copy.
If the third-party software is in Java too and uses a Logger, that should be tweakable: every minute it could write to the next of 10 or so files, rotating through them.
I would try to relentlessly read the file (when modified since the last read) until something goes wrong. Maybe you can do a test run with hundreds of reads over the weekend or at midnight, when no harm is done.
My answer:
Maybe you need a local watch program - a watch service for a directory - that waits till the file is modified and then makes a fast copy; after that, the copy can be transmitted.
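A bare-bones sketch of such a watcher using java.nio.file.WatchService - the directory, watched file name and copy target are all placeholders:

    import java.nio.file.*;

    public class FileWatch {
        public static void main(String[] args) throws Exception {
            Path dir = Paths.get("/data/incoming");          // directory holding the vendor file
            WatchService watcher = FileSystems.getDefault().newWatchService();
            dir.register(watcher, StandardWatchEventKinds.ENTRY_MODIFY);

            while (true) {
                WatchKey key = watcher.take();               // blocks until something in dir changes
                for (WatchEvent<?> event : key.pollEvents()) {
                    Path changed = (Path) event.context();
                    if (changed.toString().equals("X.log.txt")) {
                        // Take a quick private copy and work on that instead of the live file.
                        Files.copy(dir.resolve(changed), Paths.get("X.copy.txt"),
                                StandardCopyOption.REPLACE_EXISTING);
                    }
                }
                key.reset();
            }
        }
    }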

What is the best size for a log file in LOG4J?

I have an application that runs slowly. This is because of a huge amount of logging at the DEBUG and INFO levels inside the code. I have made some modifications to the code and changed the log level to WARN, and it works well now.
But there is only one log file (currently at 1.6 GB). I want to use a RollingFileAppender to have more, smaller files. What is the best (maximum) size that I should use for the appender’s MaxFileSize property so that performance won’t degrade?
That really depends on many factors, so to answer the question you'd have to run a profiler with various file sizes. But since log4j only appends to the log file, you can simply create files of various sizes on your system and time how long appending to them takes.
To be able to find errors in the file, I suggest using a DailyRollingFileAppender instead, though. That makes it much simpler to look for something from "yesterday" or "two weeks ago".
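For reference, both kinds of appender can be set up along these lines; the 50 MB size, backup count and file names are example values rather than recommendations, and in practice you would pick one of the two:

    import org.apache.log4j.DailyRollingFileAppender;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PatternLayout;
    import org.apache.log4j.RollingFileAppender;

    public class RollingSetup {
        public static void configure() throws Exception {
            PatternLayout layout = new PatternLayout("%d %-5p %c - %m%n");
            Logger root = Logger.getRootLogger();

            // Size-based rolling: roll at 50 MB, keep the last 10 backups (app-size.log.1 ... .10).
            RollingFileAppender sized = new RollingFileAppender(layout, "app-size.log");
            sized.setMaxFileSize("50MB");
            sized.setMaxBackupIndex(10);
            root.addAppender(sized);

            // Time-based rolling: one file per day, which makes "yesterday" easy to find.
            root.addAppender(new DailyRollingFileAppender(layout, "app-daily.log", "'.'yyyy-MM-dd"));
        }
    }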
Having smaller chunks will make your log files more manageable, but it will not improve performance.
It seems that your limitation is the hard disk. One solution to the performance issue would be to log the WARN level to one file and DEBUG and INFO to another. Ideally, you would keep the larger file on a dedicated fast disk.
Another solution to the performance issue would be to tune logging per package. You rarely need INFO on all packages, because parsing 2 GB of data is hard to do, especially in real time.
As for the actual size question:
It should be as large as your tools can handle without trouble. Say you use a log viewer to watch the log file: some log viewers will perform badly on files bigger than, let's say, 10 MB. But, again, with 1 GB of log data generated in ... 1 hour maybe, you won't be able to watch it in real time anyway.
