Lock future file - java

So I have a Samba file server on which my Java app needs to write some files. The thing is that there is also another PHP application (if a PHP script even counts as an application) that is aggressively polling the same directory for new files.
Sometimes, the PHP script picks up the file before my Java app has finished writing it completely to disk. Here is a little ASCII art to help visualize what I currently have (but doesn't work):
Samba share
/foo (my java app drops file here)
/bar (the directory that the PHP script is polling)
What I'm currently doing is that when the file meets some criteria, it is moved to /bar and then picked up by the PHP script for more processing. I've tried different things, such as setting the file non-writable and non-readable before calling renameTo.
I've looked a little at FileLock, but it doesn't seem to be able to lock files that don't exist yet. So I am wondering what possibilities I have here. What could I use to keep the file from being picked up before it's fully written, without touching the PHP (because, well, it's PHP and I don't really have the right to modify it right now)?
Thanks
Edit 1
I've got some insight on what the php script is really doing if it can help in any way.
It's reading the directory in a loop (using readdir without sleeping).
As soon as it finds a filename other than "." and "..", it calls file_get_contents, and that's where it fails, because the file is not completely written to disk (or not even there yet, since the Java code might not even have had time to write it between the readdir and the file_get_contents).
Edit 2
This Java application is replacing an old PHP script. When they implemented it, they had the same problem I'm having right now. They solved it by writing the new file to /bar/tmp (with file_put_contents) and then using rename to move it to /bar (rename is supposed to be atomic). And it's been working fine so far. I can't and won't believe that Java can't do something better than what PHP does...

I think this is due to the fact that read locks are shared (multiple processes can apply read locks to the same file and read it together).
One approach you can take is to create a separate temporary lock file (e.g. /bar/file1.lock) while /bar/file1 hasn't finished being written, and delete the lock file as soon as the copying is finished.
Then alter the PHP code to ensure the file isn't locked before it reads.
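For illustration, a minimal sketch of the lock-file protocol on the Java side (the paths and payload are made up for the example):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LockFileProducer {
    public static void main(String[] args) throws IOException {
        Path target = Paths.get("/bar/file1");       // the real file
        Path lock = Paths.get("/bar/file1.lock");    // its lock marker

        Files.createFile(lock);                      // "file1 is being written"
        try {
            Files.write(target, "payload".getBytes());
        } finally {
            Files.delete(lock);                      // "file1 is ready"
        }
    }
}

The PHP side would then skip any filename that still has a matching .lock entry.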

You mentioned that you tried FileLock, but keep in mind the disclaimer in the javadoc for that method:
Whether or not a lock actually prevents another program from accessing
the content of the locked region is system-dependent and therefore
unspecified. The native file-locking facilities of some systems are
merely advisory, meaning that programs must cooperatively observe a
known locking protocol in order to guarantee data integrity.
You also mentioned you are using File.renameTo, which also has some caveats (mentioned in the javadoc):
Many aspects of the behavior of this method are inherently
platform-dependent: The rename operation might not be able to move a
file from one filesystem to another, it might not be atomic, and it
might not succeed
Instead of File.renameTo, try Files.move with the ATOMIC_MOVE option. You'll have to catch AtomicMoveNotSupportedException and possibly fall back to some alternative workaround in case an atomic move is not possible.
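A minimal sketch of that, reusing the /foo and /bar paths from the question (the fallback strategy shown is just one possibility):

import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class AtomicPublish {
    static void publish(Path src, Path dst) throws IOException {
        try {
            // Atomic: the poller sees either no file or the complete file.
            Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback: plain move; the poller may again observe a partial file.
            Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        publish(Paths.get("/foo/myFile"), Paths.get("/bar/myFile"));
    }
}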
Alternatively, you could create a hard link with Files.createLink(Paths.get("/bar/myFile"), Paths.get("/foo/myFile")) (the first argument is the new link, the second the existing file) and then delete the original directory entry (in this example, /foo/myFile).
Failing that, a simple workaround that doesn't require modifying the PHP is to use a shell command or system call to move the file from /foo to /bar. You could, for example, use ProcessBuilder to call mv, or perhaps call ln to create a symlink or hard link in /bar. Note that you might still have the same problem with mv if /foo and /bar are on different filesystems.
If you have root privileges on the server, you could also try implementing mandatory file locking. I found an example in C, but you could call the C program from Java or adapt the example to Java using JNA (or JNI if you want to punish yourself).
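On the Java side, a mandatory lock looks like a plain FileLock; what makes it mandatory is OS configuration. Here is a sketch, under the assumption that the filesystem is mounted with -o mand and the file has chmod g+s,g-x applied (note that recent Linux kernels have removed mandatory locking entirely):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MandatoryLockSketch {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get("/bar/myFile"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             // fcntl-style lock; only mandatory with the mount/chmod setup above
             FileLock lock = ch.lock()) {
            ch.write(ByteBuffer.wrap("payload".getBytes()));
        }   // lock released, channel closed
    }
}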

Related

How to open a file in Java that does not prevent external "Safe Save"?

We want to open a file in Java and read its contents.
This file may be updated by an external application using Safe Save. That means the file will be externally read and its updated contents will be stored to a new file. Eventually the original file is deleted and the new file is renamed to match the original file's name.
Unfortunately the external process fails during the rename (the last part of the Safe Save) when our Java application is reading the original file at the same time.
We played with different kinds of open modes but could not find a solution that does not fail the external reader.
Is there some way to open a file that does not interfere with external processes accessing the same file? Ideally, whenever an external process moves or deletes the file we would like to get an exception in our Java application. And only there.
Do you have any ideas on how to achieve that?
EDIT:
Just some clarification regarding the use case:
This is an indexer-like scenario. We want to index the contents of a potentially very large filesystem that independent 3rd-party processes can concurrently read from or write to. We have no control over the 3rd-party processes.
Copying the original file seems like a big overhead and we are not sure if that helps with the original problem as it will probably fail the external reader on a Safe Save as well.
Last but not least: this should work on Windows and Linux, but we are experiencing these problems on Windows.
On Windows, whether a file can be renamed or deleted while it's open is controlled by the FILE_SHARE_DELETE sharing mode flag. This flag should be passed in when the file is opened with the low level CreateFile function.
Unfortunately, the Java API does not give you control over low-level Windows-specific flags. There is an open bug report to have FILE_SHARE_DELETE added by default, but it's unlikely to be done because of backwards compatibility (some applications may depend on the current behavior). A comment in the report suggests a workaround: instead of new FileInputStream(file), use the java.nio API.
InputStream in = Files.newInputStream(file.toPath());
I don't have access to Windows right now to verify that this workaround uses the right sharing mode.
Make a copy of the original file and use that within your Java program, while keeping track of the original file.
Here, this might help you out:
The java.nio.file package provides a file change notification API, called the Watch Service API. This API enables you to register a directory (or directories) with the watch service. When registering, you tell the service which types of events you are interested in: file creation, file deletion, or file modification. When the service detects an event of interest, it is forwarded to the registered process. The registered process has a thread (or a pool of threads) dedicated to watching for any events it has registered for. When an event comes in, it is handled as needed. Official docs
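A minimal registration loop looks like this (the watched path is illustrative):

import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class DirWatcher {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Paths.get("/some/watched/dir");
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_DELETE,
                StandardWatchEventKinds.ENTRY_MODIFY);

        while (true) {
            WatchKey key = watcher.take();   // blocks until events are available
            for (WatchEvent<?> event : key.pollEvents()) {
                System.out.println(event.kind() + ": " + event.context());
            }
            if (!key.reset()) {
                break;   // directory is no longer accessible
            }
        }
    }
}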
You cannot achieve this only with files, at least not without making additional assumptions. If the processes are not synchronized you will get either (a) errors (b) corrupted data or (c) both. Furthermore, such system will be unstable, prone to race conditions and implementation-specific details. This means that even if it looks like it's working it will not work correctly always and in each case.
Depending on your circumstances you might try a combination of scheduling (e.g. process A runs every even minute, process B every odd minute), exclusive/shared open flags, range locks, copying files, file change notifiers, retrying on failure, etc. If you can somehow ensure that your assumptions are never broken, you might end up with something that is "good enough". But all in all, this is bad engineering practice and should be avoided.
For a proper solution, you need to make both processes aware that they are talking to each other. What you have is really a textbook use case for a database. Besides using a database there are plenty of other ways to synchronize access to data - messaging, streams, locks, shared memory etc. Each way has its own benefits and downsides and without knowing more about your specific situation it is impossible to say which would be better.

Which is the correct way to delete a file so that it is not recoverable?

Currently I am using file.delete(), but it is being flagged as a security risk because files deleted this way can be recovered by various means. So please show me the correct way to delete a file. The security risk reported here comes from a testing tool called Quixxi, which checks the app for vulnerabilities.
The reason a "deleted" file is recoverable is because a delete operation simply unlinks the file in the filesystem, so the directory no longer considers that file part of it. The contents on disk (or whatever storage) still exist on that device.
If you want to guarantee the contents can never be recovered, you have to overwrite the contents first. There are no built-in functions to do this - you'd have to find a library or write the code yourself. Typically you'd overwrite the file with all 0s (making sure to flush to the media), then all 1s, then a repeating 01 pattern, then a repeating 10 pattern, and so on. After you've written garbage patterns to the media (with a flush) a few times, then you issue the delete.
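As a rough illustration only - and with the caveat raised in the next answer that the OS and filesystem may not physically write to the original blocks - a single zero-fill pass might look like this:

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class BestEffortShred {
    static void shred(File file) throws IOException {
        // "rws" requests that content and metadata be flushed on every write.
        try (RandomAccessFile raf = new RandomAccessFile(file, "rws")) {
            long remaining = raf.length();
            byte[] zeros = new byte[8192];   // pass 1: all 0s; repeat with 1s, 0101..., etc.
            while (remaining > 0) {
                int chunk = (int) Math.min(zeros.length, remaining);
                raf.write(zeros, 0, chunk);
                remaining -= chunk;
            }
            raf.getChannel().force(true);    // push the writes to the storage device
        }
        if (!file.delete()) {
            throw new IOException("could not delete " + file);
        }
    }
}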
Not possible in JRE, unfortunately. The JVM is not designed for that, and you need OS-dependent utilities.
The answer by user1676075 contains a mistake. Let's go by steps.
As pointed out already, Java's File.delete method only unlinks the file leaving its contents on disk. It actually invokes the underlying OS APIs to perform this unlink operation.
The problem occurs when you want to overwrite contents in Java.
Java can open a file for overwriting, but it leverages OS utilities to do so, and the OS will likely:
Unlink the allocated space on disk
Link the file to a new free area of disk
The result is that you are now writing tons of zeroes... somewhere else!!!
And even if you managed to write zeroes to the very sectors used by the original file, the Gutmann method exists for a reason. Gutmann utilities require root/Administrator (super-user) permissions and direct DMA access to control precisely where the writes occur.
And with SSDs, things change. Actually, it might even get easier! At this point I should provide a source for SSDs having a CLEAR instruction that replaces a sector with zeroes, and for privacy-savvy disk controllers actually doing that. But maybe pretend you have read nothing.
This is a sufficient answer for now, because we have demonstrated that there is no out-of-the-box, straightforward way to securely clear a file in Java.
What Java does allow, via the Java Native Interface (please also see Java Native Access), is calling native code from Java. So, have you got your Gutmann tool in C++ ready? Are you running as root? Then you can write code to invoke Gutmann-ish erasure from Java, but that's a whole other point.
I've never tried it, but it's surely feasible.

Detect file deletion while using a FileOutputStream

I have created a Java process which writes to a plain text file and another Java process which consumes this text file. The 'consumer' reads then deletes the text file. For the sake of simplicity, I do not use file locks (I know it may lead to concurrency problems).
The 'consumer' process runs every 30 minutes from crontab. The 'producer' process currently just redirects whatever it receives from the standard input to the text file. This is just for testing - in the future, the 'producer' process will write the text file by itself.
The 'producer' process opens a FileOutputStream once and keeps writing to the text file using this output stream. The problem is when the 'consumer' deletes the file. Since I'm in a UNIX environment, this situation is handled 'gracefully': the 'producer' keeps working as if nothing happened, since the inode of the file is still valid, but the file can no longer be found in the file system. This thread provides a way to handle this situation using C. Since I'm using Java, which is portable and therefore hides all platform-specific features, I'm not able to use the solution presented there.
Is there a portable way in Java to detect when the file was deleted while the FileOutputStream was still open?
This isn't a robust way for your processes to communicate, and the best I can advise is to stop doing that.
As far as I know there isn't a reliable way for even a C program to detect when a file being written is unlinked, let alone a Java program. (The accepted answer you've linked to can only poll the directory entry to see if it's still there; I don't consider that sufficiently robust.)
As you've noticed, UNIX doesn't consider it abnormal for an open file to be unlinked (indeed, it's an established practice to create a named tempfile, grab a filehandle, then delete it from the directory so that other processes can't get at it, before reading and writing).
If you must use files, consider having your consumer poll a directory. Have a .../pending/ directory for files in the process of being written and .../inbox/ for files that are ready for processing.
The producer creates a new unique filename (e.g. a UUID) and writes a new file to pending/.
After closing the file, the producer moves it to inbox/ -- as long as both directories are on the same filesystem, this is just a relink, so the file will never appear incomplete in inbox/.
The consumer looks for files in inbox/, reads them, and deletes them when done.
You can enhance this with more directories if there are eventually multiple consumers, but there's no immediate need.
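The producer side of this pattern could look roughly like the following, assuming pending/ and inbox/ already exist on the same filesystem:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

public class PendingInboxProducer {
    public static void main(String[] args) throws IOException {
        Path pending = Paths.get("pending");
        Path inbox = Paths.get("inbox");

        String name = UUID.randomUUID().toString();  // unique, so producers never collide
        Path draft = pending.resolve(name);
        Files.write(draft, "some data".getBytes());  // file is complete after this line

        // Same filesystem, so this is a relink: the consumer never sees a partial file.
        Files.move(draft, inbox.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }
}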
But polling files/directories is always a bit fragile. Consider a database or a message queue.
You can check the filename itself for existence:
if (!Files.exists(Paths.get("/path/to/file"))) {
    // The consumer has deleted the file.
}
but in any case, shouldn't the consumer wait for the producer to finish writing the file before it reads and deletes it? If it did, you wouldn't have this problem.
To solve this the way you're intending, you might have to look at JNI, which lets you call C/C++ functions from within Java, but this might also require you to write a wrapper library around stat/fstat first (in C/C++).
However, that will cause you major headaches.
Here is a workaround which doesn't require much change to your code right now (I assume).
You can let the producer write to a new file each time it produces new data. Depending on the amount, you might want to group the data so that the directory won't be flooded with files - for example, one file per minute containing all data produced so far.
It might also be a good idea to write the files to another directory first and then move them to your consumer's input directory - I'm a bit paranoid here, because there could be race conditions causing data loss - moving the files only after everything has been written makes sure no data gets lost.
Hope this helps. Good luck :)

How to watch a complete file system for changes in Java?

Problem description
I would like to watch a complete file system for changes. I'm talking about watching changes in a directory recursively. So, when watching a directory (or a whole file system) all changes in sub-directories need to be captured too. The application needs to be able to track all changes by getting notified.
Java's WatchService isn't suitable
Java already has a WatchService feature, which allows you to monitor a directory for changes. The problem, however, is that this isn't a recursive process as far as I know, so you can't use it to monitor all changes from the root directory of a file system.
Watching all sub-directories explicitly
A solution I've thought of would be to explicitly register each directory inside the specified root directory. The problem with this, however, is that walking through and registering these directories is very resource-expensive on a system with more than a million sub-directories, because the system would need to traverse the whole file system recursively just to register all the directories in the first place. The performance impact of this feature would be too big, if it's even possible without crashing the application.
Logical solution
I would assume an operating system fires some sort of event when anything changes on the file system, which an application could listen to. However, I have not found anything like this yet. It would allow the application to listen to all changes without registering every sub-directory explicitly, so the performance impact of such a method would be minimal.
Question
Is watching a whole file system, or watching a directory recursively possible in Java, and how would this be achieved?
The question should be split into several:
How to track file events across the disk on certain OS
How to use this mechanism in Java
The answer to the first question is that the approaches differ. On Windows there are Windows API functions that let you do this (the famous FileSystemWatcher class in the .NET Framework is a kind of wrapper around this API function set). A more robust method on Windows is to create, or use a pre-created, file system filter driver. On Linux there is inotify. On macOS there are several approaches (there was a question on this topic somewhere around), none of them universal or always available.
Also, all approaches except a filesystem filter driver are only good for being notified after the event happens; they don't let you intercept and deny the request (AFAIK - I could be mistaken here).
As for the second question, there seems to be no universal solution that would cover all or most variants that I mentioned above. You would need to first choose the mechanism for each OS, then find some wrappers for Java to use those mechanisms.
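For the explicit-registration approach the question mentions, the usual sketch walks the tree and registers every directory with a single WatchService (newly created sub-directories must be registered again as their ENTRY_CREATE events arrive):

import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchService;
import java.nio.file.attribute.BasicFileAttributes;

public class RecursiveRegistration {
    // Expensive on huge trees - exactly the limitation described above.
    static void registerAll(Path root, final WatchService watcher) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                dir.register(watcher,
                        StandardWatchEventKinds.ENTRY_CREATE,
                        StandardWatchEventKinds.ENTRY_DELETE,
                        StandardWatchEventKinds.ENTRY_MODIFY);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}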
Here is an example of how to watch a directory (or tree) for changes to files:
https://github.com/syncany/syncany/blob/59cf87c72de4322c737f0073ce8a7ddd992fd898/syncany-lib/src/main/java/org/syncany/operations/watch/RecursiveWatcher.java
You can even filter out directories that you don't want to watch.

What Tool/Utility can I use to list deleted files on windows?

I am making a Java desktop application that is going to "shred" or "wipe" or "more permanently delete files". I can do the wiping, but first I have to find and access the deleted files.
Is there some tool or utility that I can use to access deleted files? I could restore them to a temporary location and then shred them. Or is there a way I can do this with Java or the command line?
How do I list files marked as deleted by the Windows delete process using Java or the Command Line?
The short answer is that you can't do it in Java.
The longer answer is that the only way you could do this in Java would be to write a lot of native code to:
access the disk at the disk-block level,
decode the file system data structures to locate deleted files and orphaned blocks that were once part of deleted files, and
zero the relevant blocks.
... while ...
making sure that the blocks haven't been reallocated to another (non-deleted) file, and
taking account of other running processes that may be creating, modifying and deleting files.
Doing all of this is really hard if you are implementing everything in C / C++, and even harder if you are doing it from Java. And if you screw it up, you could trash the PC's file system.
A better idea would be to find some existing tool / utility that does the job and use Runtime.exec(...) or equivalent to run it as a separate process.
(I'll leave it to someone else to suggest possible tools / utilities. The sysinternals sdelete tool doesn't appear to deal with files that have already been deleted.)
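As one hedged example of the run-an-external-tool route: Windows ships with cipher /w, which overwrites the free space of a volume (where the contents of already-deleted files live). Invoking it from Java could look like this - treat the tool choice and the path as assumptions to verify for your environment:

import java.io.IOException;

public class WipeFreeSpace {
    public static void main(String[] args) throws IOException, InterruptedException {
        // cipher /w:<dir> overwrites deallocated space on the volume containing <dir>.
        ProcessBuilder pb = new ProcessBuilder("cipher", "/w:C:\\temp");
        pb.inheritIO();   // forward the tool's progress output to our console
        int exit = pb.start().waitFor();
        if (exit != 0) {
            throw new IOException("cipher exited with code " + exit);
        }
    }
}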
