When does JNotify notify creation of a file - java

Suppose JNotify is watching a folder named A, and I copy a file f into A from a folder B that is not a subdirectory of A. At exactly what point will JNotify send the notification?
1) Is it when writing into the new file starts, i.e. when open() is called on the file to write into it?
OR
2) Is it after the new file has been written completely and closed, i.e. when close() is called on the file after writing is complete?
I am not sure whether copying a file involves writing into the file, but I guess it should.
I would like to know how this behaves on Ubuntu (Linux). Any reference is highly appreciated.

JNotify does not make any guarantees about the exact timing of notification delivery.
Generally it depends on the operating system API, and those APIs can do their own buffering as well (Windows will behave differently than Linux or Mac).
For most non-trivial file creations and writes you will get a series of events, not just one.
You may want a timer on your side that detects when the events for the file stop, to decide when it is safe to operate on the file.
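The timer idea could be sketched roughly as follows. This is only an illustration: QuietPeriodTracker and its methods are hypothetical names, not part of JNotify, and the quiet period is a heuristic (a copy that stalls for longer than the period would also look finished).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of JNotify): remembers when each path last
// produced an event and reports the path as settled once it has been quiet
// for the configured period.
class QuietPeriodTracker {
    private final long quietMillis;
    private final Map<String, Long> lastEventAt = new HashMap<>();

    QuietPeriodTracker(long quietMillis) {
        this.quietMillis = quietMillis;
    }

    // Call this from every created/modified callback for the path.
    synchronized void recordEvent(String path, long nowMillis) {
        lastEventAt.put(path, nowMillis);
    }

    // True once at least one event was seen and none has arrived for quietMillis.
    synchronized boolean isSettled(String path, long nowMillis) {
        Long last = lastEventAt.get(path);
        return last != null && nowMillis - last >= quietMillis;
    }
}
```

You would call recordEvent from your event callbacks (passing System.currentTimeMillis()) and check isSettled periodically, for example from a scheduled task.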

Related

How to open a file in Java that does not prevent external "Safe Save"?

We want to open a file in Java and read its contents.
This file may be updated by an external application using Safe Save. That means the file will be externally read and its updated contents will be stored to a new file. Eventually the original file is deleted and the new file is renamed to match the original file's name.
Unfortunately the external process fails during rename (last part of the Safe Save) when our Java Application is reading the original file at the same time.
We played with different kinds of open modes but could not find a solution that does not make the external process fail.
Is there some way to open a file that does not interfere with external processes accessing the same file? Ideally, whenever an external process moves or deletes the file, we would like to get an exception in our Java application, and only then.
Do you have any ideas on how to achieve that?
EDIT:
Just some clarification regarding the use case:
This is an indexer-like scenario. We want to index the contents of a potentially very large filesystem that independent third-party processes can concurrently read from or write to as well. We have no control over the third-party processes.
Copying the original file seems like a big overhead and we are not sure if that helps with the original problem as it will probably fail the external reader on a Safe Save as well.
Last but not least: this should work on Windows and Linux, but we are experiencing these problems on Windows.
On Windows, whether a file can be renamed or deleted while it's open is controlled by the FILE_SHARE_DELETE sharing mode flag. This flag should be passed in when the file is opened with the low level CreateFile function.
Unfortunately, the Java API does not give you control over low-level Windows-specific flags. There is an open bug report asking for FILE_SHARE_DELETE to be added by default, but it is unlikely to be done because of backwards compatibility (some applications may depend on the current behavior). A comment in the report suggests a workaround: instead of new FileInputStream(file), use the java.nio API.
InputStream in = Files.newInputStream(file.toPath());
I don't have access to Windows right now to verify that this workaround uses the right sharing mode.
Make a copy of the original file and use the copy within your Java program, while keeping track of the original file.
Here, this might help you out:
The java.nio.file package provides a file change notification API, called the Watch Service API. This API enables you to register a directory (or directories) with the watch service. When registering, you tell the service which types of events you are interested in: file creation, file deletion, or file modification. When the service detects an event of interest, it is forwarded to the registered process. The registered process has a thread (or a pool of threads) dedicated to watching for any events it has registered for. When an event comes in, it is handled as needed. Official docs
You cannot achieve this with files alone, at least not without making additional assumptions. If the processes are not synchronized you will get either (a) errors, (b) corrupted data, or (c) both. Furthermore, such a system will be unstable, prone to race conditions, and dependent on implementation-specific details. This means that even if it looks like it is working, it will not work correctly in every case.
Depending on your circumstances you might try a combination of scheduling (e.g. process A runs every even minute, process B every odd minute), exclusive/shared open flags, range locks, copying files, file change notifiers, retrying on failure, etc. If you can somehow ensure that your assumptions are never broken, you might end up with something that is "good enough". But all in all, this is bad engineering practice and should be avoided.
For a proper solution, you need to make both processes aware that they are talking to each other. What you have is really a textbook use case for a database. Besides using a database there are plenty of other ways to synchronize access to data - messaging, streams, locks, shared memory etc. Each way has its own benefits and downsides and without knowing more about your specific situation it is impossible to say which would be better.

Detect file deletion while using a FileOutputStream

I have created a Java process which writes to a plain text file and another Java process which consumes this text file. The 'consumer' reads then deletes the text file. For the sake of simplicity, I do not use file locks (I know it may lead to concurrency problems).
The 'consumer' process runs every 30 minutes from crontab. The 'producer' process currently just redirects whatever it receives from the standard input to the text file. This is just for testing - in the future, the 'producer' process will write the text file by itself.
The 'producer' process opens a FileOutputStream once and keeps writing to the text file using this output stream. The problem arises when the 'consumer' deletes the file. Since I'm in a UNIX environment, this situation is handled 'gracefully': the 'producer' keeps working as if nothing happened, since the inode of the file is still valid, but the file can no longer be found in the file system. This thread provides a way to handle this situation using C. Since I'm using Java, which is portable and therefore hides all platform-specific features, I'm not able to use the solution presented there.
Is there a portable way in Java to detect when the file was deleted while the FileOutputStream was still open?
This isn't a robust way for your processes to communicate, and the best I can advise is to stop doing that.
As far as I know there isn't a reliable way for a C program to detect when a file being written is unlinked, let alone a Java program. (The accepted answer you've linked to can only poll the directory entry to see if it's still there; I don't consider this sufficiently robust).
As you've noticed, UNIX doesn't consider it abnormal for an open file to be unlinked (indeed, it's an established practice to create a named tempfile, grab a filehandle, then delete it from the directory so that other processes can't get at it, before reading and writing).
If you must use files, consider having your consumer poll a directory. Have a .../pending/ directory for files in the process of being written and .../inbox/ for files that are ready for processing.
Producer creates a new unique filename (e.g. a UUID) and writes a new file to pending/.
After closing the file, Producer moves the file to inbox/ -- as long as both dirs are on the same filesystem, this will just be a relink, so the file will never be incomplete in inbox/.
Consumer looks for files in inbox/, reads them and deletes when done.
You can enhance this with more directories if there are eventually multiple consumers, but there's no immediate need.
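The producer side of this pending/inbox handoff might look like the sketch below. InboxHandoff and publish are hypothetical names, and the code assumes both directories already exist on the same filesystem, so that the move is a plain rename.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

// Hypothetical producer helper for the pending/inbox scheme.
class InboxHandoff {
    // Writes the data under pending/, then publishes it to inbox/ with a rename.
    // Assumes pending and inbox exist and live on the same filesystem.
    static Path publish(Path pending, Path inbox, String data) throws IOException {
        String name = UUID.randomUUID().toString();
        Path tmp = pending.resolve(name);
        // Files.write opens, writes and closes the file before returning.
        Files.write(tmp, data.getBytes(StandardCharsets.UTF_8));
        // The consumer only ever sees the file once it is complete.
        return Files.move(tmp, inbox.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The consumer then only needs to list inbox/, and it can safely read and delete whatever it finds there.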
But polling files/directories is always a bit fragile. Consider a database or a message queue.
You can check the filename itself for existence:
if (!Files.exists(Paths.get("/path/to/file"))) {
// The consumer has deleted the file.
}
but in any case, shouldn't the consumer be waiting for the producer to finish writing the file before it reads & deletes it? If it did, you wouldn't have this problem.
To solve this the way you intend, you might have to look at JNI, which lets you call C/C++ functions from within Java, but this might also require you to write a wrapper library for stat/fstat first (in C/C++).
However, that will cause you a major headache.
The following might be a workaround that doesn't require much change to your code right now (I assume).
You can let the producer write to a new file each time it produces new data. Depending on the amount, you might want to group the data so that the directory won't be flooded with files. For example, one file per minute that contains all the data produced during that minute.
It might also be a good idea to write the files to another directory first and then move them to your consumer's input directory. I'm a bit paranoid here, because there could be race conditions causing data loss; moving the files only after everything has been written makes sure no data gets lost.
Hope this helps. Good luck :)

Lock future file

So I have a Samba file server on which my Java app needs to write some files. The thing is that there is also another PHP application (if a PHP script is even considered an application) that is aggressively polling the same directory for new files.
Sometimes, the PHP script picks up the file before my Java app is done writing it completely to disk. Here is a little bit of ASCII art to help visualize what I currently have (but doesn't work):
Samba share
/foo (my java app drops file here)
/bar (the directory that the PHP script is polling)
What I'm currently doing is that when the file meets some criteria, it is moved to /bar and then picked up by the PHP script for more processing. I've tried different things, such as setting the file non-writable and non-readable before calling renameTo.
I've looked a little bit at FileLocks, but they don't seem to be able to lock future files. So I am wondering what kind of possibilities I have here. What could I use to keep the file from being picked up before it's fully written, without touching the PHP (because, well, it's PHP and I don't really have the right to modify it right now)?
Thanks
Edit 1
I've got some insight on what the php script is really doing if it can help in any way.
It's reading the directory file in loop (using readdir without sleeping).
As soon as it finds a filename other than "." and "..", it calls file_get_contents, and that's where it fails, because the file is not completely written to disk (or not even there, since the Java code might not even have had time to write it between the readdir and the file_get_contents).
Edit 2
This Java application is replacing an old PHP script. When they implemented it, they had the same problem I'm having right now. They solved it by writing the new file in /bar/tmp (with file_put_contents) and then using rename to move it to /bar (it looks like rename is supposed to be atomic). And it's been working fine so far. I can't and won't believe that Java can't do something better than what PHP does...
I think this is due to the fact that read locks are shared (multiple processes can apply read locks to the same file and read it together).
One approach is to create a separate temporary lock file (e.g. /bar/file1.lock) while /bar/file1 hasn't finished copying. Delete the lock file as soon as the copy is finished.
Then alter the PHP code to check that the file isn't locked before it reads it.
You mentioned that you tried FileLock, but keep in mind the disclaimer in the javadoc for that method:
Whether or not a lock actually prevents another program from accessing
the content of the locked region is system-dependent and therefore
unspecified. The native file-locking facilities of some systems are
merely advisory, meaning that programs must cooperatively observe a
known locking protocol in order to guarantee data integrity.
You also mentioned you are using File.renameTo, which also has some caveats (mentioned in the javadoc):
Many aspects of the behavior of this method are inherently
platform-dependent: The rename operation might not be able to move a
file from one filesystem to another, it might not be atomic, and it
might not succeed
Instead of File.renameTo, try Files.move with the ATOMIC_MOVE option. You'll have to catch AtomicMoveNotSupportedException and possibly fall back to some alternative workaround in case an atomic move is not possible.
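A minimal sketch of that call is shown here. AtomicPublish and tryAtomicMove are hypothetical names, and what to do when the method returns false is left to the caller, since the right fallback depends on your setup.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical wrapper around Files.move with ATOMIC_MOVE.
class AtomicPublish {
    // Returns true if the file was moved atomically. Returns false if the
    // filesystem cannot do an atomic move; the source is then left in place
    // so the caller can choose a fallback.
    static boolean tryAtomicMove(Path source, Path target) throws IOException {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (AtomicMoveNotSupportedException e) {
            return false;
        }
    }
}
```

Other failures (missing source, permissions) still surface as IOException, so they are not silently treated as "atomic move unsupported".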
You could create a hard link with Files.createLink(Paths.get("/bar/myFile"), Paths.get("/foo/myFile")) and then delete the original directory entry (in this example, /foo/myFile). Note that the first argument is the link to create and the second is the existing file.
Failing that, a simple workaround that doesn't require modification to the PHP is to use a shell command or system call to move the file from /foo to /bar. You could, for example, use ProcessBuilder to call mv, or perhaps call ln to create a symlink or hardlink in /bar. You might still have the same problem with mv if /foo and /bar are on different filesystems.
If you have root privileges on the server, you could also try implementing mandatory file locking. I found an example in C, but you could call the C program from Java or adapt the example to Java using JNA (or JNI if you want to punish yourself).

JNotify doesn't recognize files changed by the Linux system

I am using JNotify in one of my projects on a Linux system (arm7).
And it works great. If I change, rename, delete or create a file, it throws an interrupt.
But I would like to use JNotify to be informed when the Linux system changes a file by itself.
I am using a BeagleBone (embedded Linux system). There is a file called value which contains the status of an input pin (high, low). But if this file is changed by the system, JNotify doesn't work... If I change the file myself, everything is OK...
Does anyone know why the change isn't recognized in the first case?
Linux seems to use a special way to write the file... but I don't know how yet...
But I need to interrupt my main loop if this file changes.
Or is there another solution?
Thanks
JNotify relies on events from the file system. On Linux it uses the inotify system call (which is actually what inspired its name).
inotify only works for real files. The file you describe is a virtual file: it does not exist on disk and is not a way to store information, but rather an easy way to access system information (and sometimes change it).
An alternative solution would be to create a sampling thread that checks the file, sleeps, and checks the file again.
Since you only care about a specific file, this is pretty easy.
While it may feel too expensive, polling is actually very common when dealing directly with hardware.
Since that file is not really a file, reading it should actually be faster than reading a regular file.
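One way such a sampling check might look is sketched below. ValuePoller and pollOnce are hypothetical names, and the path to the value file and the sleep interval are up to you.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical poller: re-reads a small file and reports when its content changes.
class ValuePoller {
    private final Path file;
    private String last; // content seen on the previous poll, null before the first

    ValuePoller(Path file) {
        this.file = file;
    }

    // Reads the file once. Returns the trimmed content if it differs from the
    // previous poll (the first poll always counts as a change), otherwise null.
    String pollOnce() throws IOException {
        String current = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        if (current.equals(last)) {
            return null;
        }
        last = current;
        return current;
    }
}
```

A sampling thread would call pollOnce in a loop with a short Thread.sleep between calls and notify the main loop whenever the result is not null.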

Java WatchService on Windows informing of folder creation before contents have been copied

I'm trying to use Java 7 and WatchService to monitor when folders are added to a folder (by being copied from a different location), and then I want to act on the files within the newly created folder.
On OS X it works as I expect: I don't receive the notification of the new folder until the folder and its contents have been copied over. But on Windows I receive the key event for the folder creation before the contents of the folder have been copied, so when I try to process the files within the folder they are not there; usually just the first file is there.
My current workaround is to sleep for 10 seconds after receiving the folder notification, to wait for the files within to be copied over. This is not very satisfactory, because the size of the folders can vary considerably, so I'm going to be sleeping either too briefly or too long most of the time.
Why the difference between OS X and Windows, and how can I solve my problem on Windows?
WatchService is intended to be somewhat platform-dependent. From the Java 7 API documentation:
The implementation that observes events from the file system is
intended to map directly on to the native file event notification
facility where available, or to use a primitive mechanism, such as
polling, when a native facility is not available. Consequently, many
of the details on how events are detected, their timeliness, and
whether their ordering is preserved are highly implementation specific.
Consider the following two cases.
1) A single copy operation that takes longer than the sleep.
2) Multiple copy operations into the same folder.
If you respond to the creation of the folder contents rather than the folder itself, you cover both these cases. You can also eliminate the race condition inherent in the sleep.
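A rough sketch of reacting to the folder contents: when ENTRY_CREATE reports a new directory, register that directory with the same WatchService and then list what is already inside it. NewFolderWatcher and watchAndListExisting are hypothetical names. Registering before listing means a file may show up both in the listing and as a later event, so the caller should tolerate duplicates.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for handling a newly created directory.
class NewFolderWatcher {
    // Starts watching newDir for files that are still to arrive, then returns
    // the entries that were already copied in before the registration.
    static List<Path> watchAndListExisting(WatchService watcher, Path newDir) throws IOException {
        newDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
        List<Path> existing = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(newDir)) {
            for (Path entry : entries) {
                existing.add(entry);
            }
        }
        return existing;
    }
}
```

Files that arrive later are then delivered as ENTRY_CREATE events on the key for newDir. A create event still does not guarantee that the file itself has been fully written.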
