I'm working on a small Java application (Java 1.6, Solaris) that will use multiple background threads to monitor a series of text files for output lines that match a particular regex pattern and then make use of those lines. I have one thread per file; they write the lines of interest into a queue and another background thread simply monitors the queue to collect all the lines of interest across the whole collection of files being monitored.
One problem I have is when one of the files I'm monitoring is reopened. Many of the applications that create the files I'm monitoring will simply restart their logfile when they are restarted; they don't append to what's already there.
I need my Java application to detect that the file has been reopened and restart following the file.
How can I best do this?
Could you keep a record of each of the length of each file? When the current length subsequently goes back to zero or is smaller than the last time you recorded the length, you know the file has been restarted by the app?
using a lockfile is a solution as Jurassic mentioned.
Another way is to try and open the file while you're reading the pattern and find if the file has a new size and create time. If the create time is NOT same as when you found it, then you can be sure that it has been recreated.
You could indicate somewhere on the filesystem that indicates you are reading a given file. Suppose next to the file being read (a.txt), you create a file next to it (a.txt.lock) that indicates a.txt is being read. When your process is done with it, a.txt.lock is deleted. Every time a process goes to open a file to read it, it will check for the lock file beforehand. If there is no lockfile, its not being used. I hope that makes sense and answers your question. cheers!
Related
I want to move two files to a different directory in same filesystem.
Concrete example, I want to move /var/bigFile to /var/path/bigFile, and /var/smallFile to /var/path/smallFile.
Currently I use Files.move(source, target), without any options, moving the small file first and big file second. I need this order since there is another process waiting for this files to arrive, and the order is important.
Problem is that, sometimes I see the creation date for small file being greater than the creation date for the big file, like the moving order is not followed.
Initially I thought I have to do a sync, but it does not make sense.
Given the fact that the move will actually be a simple rename, there is no system buffers included, to force them to be flushed to disk.
Timestamp for the files was checked using ls -alrt command.
Does anyone have any idea what could be wrong?
i am writing a program that parses xml files that hold tourist attractions for cities. each city has it's own xml and the nodes have info like cost, address etc... i want to have a thread on a timer to check for new xml files or more recent versions of existing ones in a specific directory. creating the thread is not the problem. i just have no idea what the best way to check for these new files or changed files is. does anyone have any suggestions as to an easy way to make do that. i was thinking of crating a csv file with names and date altered info for each file processed and then checking against this csv file when i go to check for new or altered xml, but that seems overly complicated and i would like a better solution. i have no code to offer at this point for this mechanism i am just looking for a direction to go in.
the idea is as i get xml's for different cities fitting the schema that it will update my db automatically next time the program runs or periodically if already running.
To avoid polling you should watch the directory containing the xml file. Oracle has an extensive documentation about the topic at Watching a Directory for Changes
What you are describing looks like asynchronous feeding of new info. One common pitfall on such problem is race condition : what happens if you are trying to read a file while it's being modified or if something else tries to write a file while you are reading it ? What happens if your app (or the app that edit your xml files) breaks in the middle of processing ? To avoid such problems you should move files (change name or directory) to follow their status because moves are atomical operation on normal file systems. If you want a bullet proof solution, you should have :
files being edited or transfered by an external part
files being fully edited or transfered and ready to be read by you app
files being processed
files completely processed
files containing errors (tried to process them but could not complete processing)
The 2 first are under external responsability (you just define an interface contract), the 2 latter are under yours. The cost if 4 or 5 directories (if you choose that solution), the gains are :
if there is any problem while editing-tranfering a xml file, the external app just have to restart its operation
if a file can't be processed (syntax error, oversized, ...) it is put apart for further analysis but does not prevent processing of other files
you only have to watch almost empty directories
if your app breaks in the middle of processing a file, at next start it can restart its processing.
I have a file scanner application in Java, that keeps scanning a directory on a server using FTP. gets list of files of the directory and downloads them one by one. on the other side, on the server, there's a process that writes these files. if I'm lucky I wouldn't try to download an incomplete file but how can I make sure if the write process on the server is complete and the file handle is closed, and file is ready to be downloaded?
I have no control on the write process which is on the server. moreover, I don't have write permission on the directory to try to get a write-handle in order to check if there's already a write handle open, so this option is off the table.
Is there an FTP function addressing this problem?
This is a very old and well-known problem.
There is no way to be absolutely certain a file being written by the FTP daemon is complete. It's even possible that the file transfer failed and then gets restarted and completed. You must poll the file's size and set a time limit, say 5 minutes. If the size does not change during that time you assume the file is complete.
If possible, the program that processes the file should be able to deal with partial files.
A much better alternative is rsync, which is much more robust and deterministic. It can even be configured (via command-line option) to write the data initially to a temporary location and move it to its final destination path upon successful completion. If the file exists where you expect it, then it is by definition complete.
A possible solution would be first uploading the file with a different filename (e.g. adding ".partial") and then renaming it to its final name.
If the server finds the final name then the upload has been completed.
If you cannot control the upload process then what you are asking is impossible by definition: the file upload could stop because of a network problem or because the sending process is stopped for whatever reason.
What the receiving end will observe is just a closing of the incoming stream; there is no way to guarantee that the data will not be a partial transfer.
Other workarounds could be checking for an end-of-data marker or using a request to the sending server to check if (in their view) the transfer has been completed.
This is more fundamental than FTP: you'd have a similar problem reading those files even if they were being created on the local machine.
If you can't modify the writing process, you'll need to jump through some hoops. None are great, but some are safer than others.
Keep reading until nothing changes for some window (maybe a minute, like David Schwartz suggests). You could optimize this a bit by watching the file size.
Figure out if the files are written serially in a reliable order. When you see file N appear, you know that file N-1 is ready. (Assumes that the directory is empty before the files are written, though you could also look at timestamps.) The downside is that your logic will break if the writer ever changes order or starts writing in parallel.
The reliable, safe solutions require improving the writer process.
Writer can write the files to hidden or temporary locations and only make them visible once the entire file (or directory) is ready, using symlinks or file-moving or chmod.
Writer creates a special file (e.g., "./DONE") only after all other files have been written, and reader doesn't read any files until that file is present.
Depending on the file type, the writer could add some kind of end-of-file record/line at the end of the file, and the reader could ensure that it's present.
You can use Ftp library from Apache common API
get more information
boolean flag = retrieveFile(String remote, OutputStream local);
This flag check output stream is available of the current file.
my java application is supposed to read logging data of a Snort application on a Debian server.
The Snort application runs independent from my evaluation app and writes his logs into a file.
My evaulation app is supposed to check just the new content every 5 minutes. That's why I will move the logfile, so that the Snort application has to create a new file while my app can check the already written data from the old one.
Now the question: How can I ensure that I don't destroy the file in the case, that I move it in the moment the Snort application is writing on it? Has Java a functionality to check the current actions for the file so that no data can get lost? Does the OS lock the file while writing?
Thanks for your help, Kn0rK3
Not exactly what you are looking for, but I would do this in a very different way. Either by recording the line number / timestamp of the last entry read from the log file or the position in a RandomAccessFile (the second option is more efficient for obvious reasons), and, the next time you read the file, only do it from the recorded position to the EOF (at which you can record the last read position again).
Also, you can replace the "pool every 5 minutes" to a "pool every time I get a update notification" for this file strategy.
Since I assume that you don't have control of the code of the "Snort" application, I don't think that NIO FileLocks will help you.
It should not be an issue. Typically a logging application has some sort of file-descriptor or stream open to a file. If the file gets renamed, that doesn't affect the writing application in any way -- the name is independent to the contents of the file or its location on disk. Snort should continue to write to the new file-name until it notices that the file has been renamed at which point it re-opens a new log file to the old-name and switches to writing to that one.
That's the whole reason why it reopens in the first place. To support this sort of mechanism.
Now the question: How can I ensure that I don't destroy the file in the case...
The only thing you have to worry about is that you are renaming the file to a file-name that does not already exist. I would recommend moving it to a .YYYYMMDD.HHMMSS extension or something.
NOTE: In threaded logging operations, even if the new file has been opened, you may have to wait a bit for all of the threads to switch to the new logging stream. I'm not sure how Snort works but I have seen the log.YYYYMMDD file growing even after the log file was re-opened. I just wait a minute before I consume the renamed logfile. FYI.
I have this recurrent Java JAR program tasks that tries to modify a file every 60seconds.
Problem is that if user is viewing the file than Java program will not be able to modify the file. I get the typical IOException.
Anyone knows if there is a way in Java to modify a file currently in use? Or anyone knows what would be the best way to solve this problem?
I was thinking of using the File canRead(), canWrite() methods to check if file is in use. If file is in use then I'm thinking of making a backup copy of data that could not be written. Then after 60 seconds add some logic to check if backup file is empty or not. If backup file is not empty then add its contents to main file. If empty then just add new data to main file. Of course, the first thing I will always do is check if file is in use.
Thanks for all your ideas.
I was thinking of using the File
canRead(), canWrite() methods to check
if file is in use.
Not a good idea - you'll run into race conditions e.g. when your code has used those check methods, received true return values, but then the file is locked by a different application (possibly the user) just before you open it for writing.
Instead, try to get a FileLock on the file and use the "backup file" when that fails.
You can hold a lock on the file. This should guarantee you are able to write on the file.
See here on how to use the FileLock class.
If the user is viewing the file you should still be able to read it. In this case, make an exact copy of the file, and make changes to the new file.
Then after the next 60 seconds you can either:
1) Check if the file is being viewed and if not, delete it and replace it with the earlier file, then directly update this file.
2) If it is being viewed, continue making changes to the copy of the file.
EDIT: As Michael mentioned, when working with the main file, get a lock on it first.