I have written an application in Java and deployed it on a Unix server.
I have implemented logging in my app, and the logs are written to a file, say X.log.txt.
If I run multiple instances of my jar as different users, or as a single user in different sessions: is there a chance that my logs in X.log.txt get mixed up?
Or will they be written in an FCFS manner?
Example: let P1 and P2 be two processes that are calling the Java app and generating logs.
P1 and P2 are writing their individual logs at the same time to X.log.txt. Is this statement true? Or is it entirely based on the CPU scheduling algorithm (FCFS, SJF, etc.)?
Even if I don't use timestamping, it works fine for me.
When I execute them, the logs are written one after the other: all the logs for one instance are written to the file, and then those for the next instance. My question is still open: is this all down to the way the processor is set up to handle jobs, or is it something else?
If two processes write to the same log file, the data will get randomly corrupted. You will get lines cut in the middle and finished with data from the other log. You can even end up with sizeable chunks of binary zeroes in various places in the file, depending on the OS (and on some OSes, it will simply fail when two places try to write to the same file at the same time).
Write to separate files and then join/browse them with some 3rd-party tool to get a timestamp-ordered view.
If both your processes are writing to the same directory and file path, you will get some odd behaviour. Depending on your implementation, both applications will write to the file at the same time, or one application will block the other from writing at all.
My suggestion would be to generate the log file's name at runtime and append something unique like a timestamp or a PID (process id) so there is no more conflict:
X.log.[PID].txt or X.log.[TIMESTAMP].txt
NOTE: You'll have to use a fine enough timestamp resolution (milliseconds or nanoseconds rather than seconds) to avoid a name collision.
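For illustration, a minimal sketch of generating such a name (assumes Java 9+ for ProcessHandle; the class name and file-name pattern are just placeholders):

    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;

    public class LogFileName {
        public static void main(String[] args) {
            // PID of the current JVM (ProcessHandle requires Java 9+)
            long pid = ProcessHandle.current().pid();
            // Millisecond-precision timestamp to reduce the chance of a collision
            String stamp = LocalDateTime.now()
                    .format(DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss-SSS"));
            System.out.println("X.log." + pid + "." + stamp + ".txt");
        }
    }

Each instance then writes to its own file, so there is no interleaving to worry about.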
Related
I am writing a program that loads a library of data from the disk. It scans recursively through each folder specified by the user, reads the necessary metadata from each file, and then saves that in the program's library into a data structure that is suitable for display and manipulation by the user.
For a reasonably sized data set, this process takes 5-10 minutes. On the high end I could imagine it taking half an hour.
It also sets up a watcher for each directory in the tree, so if anything is changed after the initial scan while the program is running, that changed file or folder can be re-scanned and the library updated with the new data.
When the program terminates, the library data structure is serialized to disk, and then loaded back in at the beginning of the next session.
This leaves one gap that needed to be addressed -- if files are changed between sessions, there is no way to know about those changes.
The solution currently implemented is to rescan the entire file structure when the program is launched and the persisted data is loaded, compare the scanned information to the loaded data, and replace anything that differs.
Given that the rescan reads the metadata of each file and reloads everything, just to discard it after confirming nothing has changed, this seems like a very inefficient method to me.
Here is my question: I'd like to find some way to shortcut this re-scan process so I don't have to read all of the metadata back in and do a full rescan. Instead, it would be nice if there were a way to ask a folder "have your contents changed at all since the last time I saw you? If so, let me rescan you, otherwise, I won't bother rescanning."
One idea that occurs to me is to take a checksum of the folder's contents and store that in the database, and then compare the hashes during the re-scan.
Before I implement this solution, does anyone have a recommendation on how to accomplish this in a better way (or any advice for how to efficiently take the hash of a directory with java)?
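For what it's worth, here is a rough sketch of that checksum idea: it hashes directory metadata (paths, sizes, modification times) rather than file contents, so the startup check stays cheap. The class and method names are made up:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class DirectoryFingerprint {

        // Hashes file paths, sizes and modification times -- not file contents --
        // which is usually enough to decide whether a directory needs a full rescan.
        static String fingerprint(Path root) throws IOException, NoSuchAlgorithmException {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            try (Stream<Path> walk = Files.walk(root)) {
                // Sort so the hash does not depend on directory listing order
                List<Path> files = walk.filter(Files::isRegularFile)
                                       .sorted()
                                       .collect(Collectors.toList());
                for (Path file : files) {
                    String entry = file + "|" + Files.size(file) + "|"
                            + Files.getLastModifiedTime(file).toMillis();
                    md.update(entry.getBytes(StandardCharsets.UTF_8));
                }
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }

        public static void main(String[] args) throws Exception {
            System.out.println(fingerprint(Paths.get(args[0])));
        }
    }

Comparing the stored fingerprint against a freshly computed one tells you whether a full rescan of that subtree is needed at all.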
Store timestamp on shutdown, then just do find -mnewer?
The most practical way is to traverse the file tree checking for files with a timestamp newer than when your application stopped. For example, touch a marker file on shutdown and then run
find root-dir -mnewer marker-file
though if you did it that way you may run into race conditions. (It would be better to do it in Java ... as you reinstantiate the watchers.)
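A sketch of the Java equivalent, assuming you persist the shutdown time and pass it in as an ISO-8601 instant (the class name and argument handling here are invented for illustration):

    import java.io.IOException;
    import java.nio.file.FileVisitResult;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.SimpleFileVisitor;
    import java.nio.file.attribute.BasicFileAttributes;
    import java.time.Instant;

    public class ChangedSince {

        // Prints every regular file under root whose last-modified time is newer
        // than the persisted shutdown timestamp -- the Java version of the
        // find-based approach above.
        public static void main(String[] args) throws IOException {
            Path root = Paths.get(args[0]);
            Instant lastShutdown = Instant.parse(args[1]); // e.g. 2024-01-01T12:00:00Z

            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    if (attrs.lastModifiedTime().toInstant().isAfter(lastShutdown)) {
                        System.out.println("changed: " + file);
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        }
    }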
There are a couple of caveats:
Scanning a file tree takes time. The larger the tree, the longer it takes. If you are talking millions of files it could take hours, just to look at the timestamps.
Timestamps are not bombproof:
there can be issues if there are "discontinuities" in the system clock, or
there can be issues if some person or program with admin privilege tweaks file timestamps.
One idea that occurs to me is to take a checksum of the folder's contents and store that in the database, and then compare the hashes during the re-scan.
It would take much longer to compute checksums or hashes of files. The only way that would be feasible is if the operating system itself were to automatically compute and record a file checksum or hash each time a file is updated. (That would be a significant performance hit on all file / directory write operations ...)
The story:
A few days ago I was thinking about inter-process communication based on file exchange. Say process A creates several files during its work and process B reads these files afterwards. To ensure that all files were correctly written, it would be convenient to create a special file whose existence signals that all operations were done.
Simple workflow:
process A creates file "file1.txt"
process A creates file "file2.txt"
process A creates file "processA.ready"
Process B is waiting until file "processA.ready" appears and then reads file1 and file2.
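A minimal sketch of the writer side of that workflow (the directory name is made up; the marker is published via an atomic rename so B can never observe a half-written marker):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    public class ProcessA {
        public static void main(String[] args) throws IOException {
            Path dir = Paths.get("exchange"); // hypothetical shared directory
            Files.createDirectories(dir);

            // Write the data files first ...
            Files.write(dir.resolve("file1.txt"), "payload 1".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("file2.txt"), "payload 2".getBytes(StandardCharsets.UTF_8));

            // ... and only then publish the marker. Writing it elsewhere and renaming
            // keeps the pattern safe even if the marker later carries content.
            Path tmp = Files.createTempFile(dir, "processA", ".tmp");
            Files.move(tmp, dir.resolve("processA.ready"), StandardCopyOption.ATOMIC_MOVE);
        }
    }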
Doubts:
File operations are performed by the operating system, specifically by the file subsystem. Since implementations can differ across Unix, Windows and macOS, I'm uncertain about the reliability of file-exchange inter-process communication. Even if the OS guarantees this consistency, there are things like the JIT compiler in Java, which can reorder program instructions.
Questions:
1. Are there any real specifications on file operations in operating systems?
2. Is JIT really allowed to reorder file operation program instructions for a single program thread?
3. Is file exchange still a relevant option for inter-process communication nowadays, or is it unconditionally better to choose TCP/HTTP/etc.?
You don't need to know the OS details in this case. The Java IO API is documented well enough to tell you whether a file was saved or not.
The JVM can't reorder native calls. It is not written in the JMM explicitly, but it is implied that it can't do it: the JVM can't know what the impact of a native call is, so reordering those calls could be quite dangerous.
There are some disadvantages to using files as a means of communication:
It uses IO, which is slow
It is difficult to split the processes across different machines should you need to (there are ways, using Samba for example, but they are quite platform-dependent)
You could use a file watcher (WatchService) in Java to receive a signal when your .ready file appears.
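For example, a rough sketch of the waiting side (same hypothetical directory as in the question; Files.readString needs Java 11+):

    import java.io.IOException;
    import java.nio.file.FileSystems;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardWatchEventKinds;
    import java.nio.file.WatchKey;
    import java.nio.file.WatchService;

    public class ProcessB {
        public static void main(String[] args) throws IOException, InterruptedException {
            Path dir = Paths.get("exchange"); // same hypothetical directory as above

            try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
                dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

                // The marker may already exist before we started watching, so check first
                while (!Files.exists(dir.resolve("processA.ready"))) {
                    WatchKey key = watcher.take(); // blocks until an event arrives
                    key.pollEvents();              // drain the events
                    key.reset();
                }
            }

            // Safe to read file1.txt and file2.txt here
            System.out.println(Files.readString(dir.resolve("file1.txt")));
        }
    }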
Reordering could apply, but it shouldn't hurt your application logic in this case - see the following link:
https://assylias.wordpress.com/2013/02/01/java-memory-model-and-reordering/
I don't know the size of your data, but I feel it would still be better to use a message queue (MQ) solution in this case. File IO is relatively slow and could slow down the system.
I used a file-exchange-based approach on one of my projects. It's based on renaming file extensions when a process is done, so the next process can pick files up by matching the file-name pattern.
The FTP process downloads a file and gives it a '.downloaded' extension.
The main task processor searches the directory for '*.downloaded' files.
Before starting, the job renames the file to '.processing'.
When finished, it renames it to '.done'.
In case of error, it creates a supplementary file with an '.error' extension and writes the last processed line and the exception trace there. On retries, if this file exists, it is read and processing resumes from the correct position.
A locator process searches for '.done' files and, according to its configuration, either moves them to a backup folder or deletes them.
This approach works fine under heavy load in a mobile operator's network.
One consideration: using unique file names is important, because the behaviour of moving a file differs between operating systems.
For example, Windows gives an error when a file with the same name already exists at the destination, whereas Unix overwrites it.
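A rough sketch of that extension-renaming state machine, not the poster's actual code (the directory name and extensions are illustrative, and handling for lost rename races is omitted):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class ExtensionWorkflow {

        public static void main(String[] args) throws IOException {
            Path inbox = Paths.get("inbox"); // hypothetical working directory

            List<Path> downloaded;
            try (Stream<Path> files = Files.list(inbox)) {
                downloaded = files.filter(p -> p.toString().endsWith(".downloaded"))
                                  .collect(Collectors.toList());
            }

            for (Path file : downloaded) {
                // The rename acts as a cheap claim: if two workers race, only one wins the move
                Path processing = rename(file, ".processing");
                try {
                    // ... real processing of the file goes here ...
                    rename(processing, ".done");
                } catch (Exception e) {
                    // Record the failure next to the file so a retry can resume later
                    Files.write(sibling(processing, ".error"), e.toString().getBytes());
                }
            }
        }

        // Replaces the file's extension and moves it atomically
        static Path rename(Path p, String newExt) throws IOException {
            return Files.move(p, sibling(p, newExt), StandardCopyOption.ATOMIC_MOVE);
        }

        static Path sibling(Path p, String newExt) {
            String name = p.getFileName().toString();
            return p.resolveSibling(name.substring(0, name.lastIndexOf('.')) + newExt);
        }
    }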
I am working on an application that polls a directory for new input files at a defined interval. The general process is:
Input files are FTP'd to a landing-strip directory by another app
Our app wakes up
List files in the input directory
Atomic-move the files to a separate staging directory
Kick off worker threads (via a work-distributing queue) to consume the files from the staging directory
Go back to sleep
I've uncovered a problem where the app will pick up an input file while it is incomplete and still in the middle of being transferred, resulting in a worker-thread error that requires manual intervention. This is a scenario we need to avoid.
I should note that the file transfer will complete successfully and the server will get a complete copy, but this happens after the app has already given up due to an error.
I'd like to solve this in a clean way, and while I have some ideas for solutions, they all have problems I don't like.
Here's what I've considered:
Force the other apps (some of which are external to our company) to initially transfer the input files to a holding directory, then atomic-move them into the input directory once they're transferred. This is the most robust idea I've had, but I don't like this because I don't trust that it will always be implemented correctly.
Retry a finite number of times on error. I don't like this because it's a partial solution: it makes assumptions about transfer time and file size that could be violated, and it would blur the line between a genuinely bad file and one that simply hasn't finished transferring.
Watch the file sizes and only pick up the file if its size hasn't changed for a defined period of time. I don't like this because it's too complex in our environment: the poller is a non-concurrent clustered Quartz job, so I can't just persist this info in memory because the job can bounce between servers. I could store it in the jobdetail, but this solution just feels too complicated.
I can't be the first to have encountered this problem, so I'm sure I'll get better ideas here.
I had that situation once; we got the other guys to upload the files with a different extension, e.g. *.tmp, and then, after the file copy completed, rename the file to the extension my code was polling for. Not sure if that is as easily done when the files are coming in by FTP, though.
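A sketch of the receiving side under that convention: because senders upload as *.tmp and rename only when the copy is complete, the poller matches only the final extension (directory names and the final extension here are invented; the atomic move assumes staging is on the same filesystem):

    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    public class LandingStripPoller {
        public static void main(String[] args) throws IOException {
            Path landing = Paths.get("landing"); // hypothetical landing-strip directory
            Path staging = Paths.get("staging"); // hypothetical staging directory

            // Only files already renamed away from *.tmp match the final extension
            try (DirectoryStream<Path> ready = Files.newDirectoryStream(landing, "*.dat")) {
                for (Path file : ready) {
                    Files.move(file, staging.resolve(file.getFileName()),
                            StandardCopyOption.ATOMIC_MOVE);
                }
            }
        }
    }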
I have two Java programs that are scheduled to run one after the other. When they run in sequence without an overlap, there are no issues. But sometimes one of them runs a little longer due to the volume it is processing, and the second one starts before the first one ends. When this happens, at some point about halfway through, the second one crashes with a "max number of open files" exception; the first one finishes successfully. When run separately, with the same volume, there are no issues with either of them. Both processes are completely independent of each other: no common resources, invoked from different scripts, and ultimately two different processes running in two different JVMs on the same OS. I use an HP-UX system and have tried to trace the open handles using the TUSC utility, but there aren't any that could cause such a problem. Xmx is 2 GB for both and I doubt that would be reached, but is there an explanation for this that I am not seeing? Can the parallel runs be an issue, or is it just a coincidence?
The solution to your problem could be either to up your file descriptor limit (see link below) or to make sure that your code is properly closing resources (file input streams, sockets) so you are not leaking open file descriptors. I'll leave it up to you which approach you take.
http://h20000.www2.hp.com/bc/docs/support/SupportManual/c02261152/c02261152.pdf
Edit
If your program really is generating that many files at a time, I would certainly look into upping the open file descriptor limit. Then you might also consider putting a throttle/regulator into the code: create X number of files, then back off for a few seconds to let the system reap the resources, and then continue. This might be helpful.
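A minimal sketch of that throttle idea (the batch size and pause are arbitrary; the key points are closing each file promptly and pausing between batches):

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class ThrottledWriter {
        private static final int BATCH_SIZE = 100;   // hypothetical batch size
        private static final long PAUSE_MS = 2000;   // back-off between batches

        public static void main(String[] args) throws IOException, InterruptedException {
            Path outDir = Files.createDirectories(Paths.get("out"));
            for (int i = 0; i < 1000; i++) {
                Path file = outDir.resolve("part-" + i + ".txt");
                // try-with-resources releases the file descriptor promptly
                try (BufferedWriter out = Files.newBufferedWriter(file)) {
                    out.write("data " + i);
                }
                if ((i + 1) % BATCH_SIZE == 0) {
                    Thread.sleep(PAUSE_MS); // let the system reap resources before continuing
                }
            }
        }
    }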
Also, this link sounds very similar to your issue:
Apache FOP 1.0 Multithreading - Too many open files err24
In a similar scenario, and with resource constraints, we used the architecture below.
There is a monitor thread, started first, which maintains a count variable. There is a configurable limit up to which new processes will be created. For every new process the count is incremented, and once a process completes the count is decremented. New processes are created only if the count is below the configured limit.
This approach gave us better control and let us scale up where possible.
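A sketch of that counting scheme, using a Semaphore in place of a hand-rolled monitor thread (the limit and job body are placeholders):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Semaphore;

    public class BoundedLauncher {
        private static final int LIMIT = 4;                  // the configurable limit
        private static final Semaphore slots = new Semaphore(LIMIT);

        public static void main(String[] args) throws InterruptedException {
            ExecutorService pool = Executors.newCachedThreadPool();
            for (int i = 0; i < 20; i++) {
                final int job = i;
                slots.acquire();                             // "increment the count"; blocks at the limit
                pool.submit(() -> {
                    try {
                        System.out.println("running job " + job); // launch the real work here
                    } finally {
                        slots.release();                     // "decrement the count" when done
                    }
                });
            }
            pool.shutdown();
        }
    }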
Are there any known bugs with the Log4j rolling file appender? I have been using Log4j happily for a number of years but was not aware of this. A colleague of mine suggests there are known issues (and I found a Bugzilla entry on this) where, under heavy load, the rolling file appender (we use the time-based one) might not perform correctly when the rollover occurs at midnight.
Bugzilla entry - https://issues.apache.org/bugzilla/show_bug.cgi?id=44932
I'd appreciate input and pointers on how others have overcome this.
Thanks,
Manglu
I have not encountered this issue myself, and from the bug report, I would suspect that it is very uncommon. The Log4j RollingFileAppender has always worked in a predictable and reliable fashion for the apps I have developed and maintained.
This particular bug, if I understand correctly, would only happen if there are multiple instances of Log4j, e.g. if you had multiple instances of the same app running simultaneously and writing to the same log file. Then, when it is rollover time, one instance cannot get a lock on the file in order to delete it and archive its contents, resulting in the loss of the data that was to be archived.
I cannot speak to any of the other known bugs your colleague mentioned unless you would like to cite them specifically. In general, I believe Log4j is reliable for production apps.
#kg, this happens to me too, in this exact situation: two instances of the same program.
I updated it to the newer rolling.RollingFileAppender instead of using DailyFileRoller (whatever it was called).
I run two instances simultaneously via crontab. The instances output as many messages as they can until 5 seconds have elapsed. They measure time using System.currentTimeMillis and accumulate it in a counter to estimate the 5-second period for the loop, so there is minimal overhead in the test. Each log message contains an incrementing number, plus identifiers set from the command line so the two instances can be told apart.
When the log messages are put back in order, one of the processes succeeds in writing its sequence from start to end, while the other loses the first entries of its output (from 0 onward).
This really ought to be fixed...