Detect file deletion while using a FileOutputStream - java

I have created a Java process which writes to a plain text file and another Java process which consumes this text file. The 'consumer' reads then deletes the text file. For the sake of simplicity, I do not use file locks (I know it may lead to concurrency problems).
The 'consumer' process runs every 30 minutes from crontab. The 'producer' process currently just redirects whatever it receives from the standard input to the text file. This is just for testing - in the future, the 'producer' process will write the text file by itself.
The 'producer' process opens a FileOutputStream once and keeps writing to the text file using this output stream. The problem arises when the 'consumer' deletes the file. Since I'm in a UNIX environment, this situation is handled 'gracefully': the 'producer' keeps working as if nothing happened, since the inode of the file is still valid, but the file can no longer be found in the file system. This thread provides a way to handle this situation using C. Since I'm using Java, which is portable and therefore hides all platform-specific features, I'm not able to use the solution presented there.
Is there a portable way in Java to detect when the file was deleted while the FileOutputStream was still open?

This isn't a robust way for your processes to communicate, and the best I can advise is to stop doing that.
As far as I know, there isn't a reliable way for a C program to detect when a file being written is unlinked, let alone a Java program. (The accepted answer you've linked to can only poll the directory entry to see if it's still there; I don't consider this sufficiently robust.)
As you've noticed, UNIX doesn't consider it abnormal for an open file to be unlinked (indeed, it's an established practice to create a named tempfile, grab a filehandle, then delete it from the directory so that other processes can't get at it, before reading and writing).
If you must use files, consider having your consumer poll a directory. Have a .../pending/ directory for files in the process of being written and .../inbox/ for files that are ready for processing.
Producer creates a new unique filename (e.g. a UUID) and writes a new file to pending/.
After closing the file, Producer moves the file to inbox/ -- as long as both dirs are on the same filesystem, this will just be a relink, so the file will never be incomplete in inbox/.
Consumer looks for files in inbox/, reads them and deletes when done.
You can enhance this with more directories if there are eventually multiple consumers, but there's no immediate need.
But polling files/directories is always a bit fragile. Consider a database or a message queue.
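The pending/inbox hand-off described above could be sketched roughly as follows. This is only an illustration: the class name SpoolDirectories and its methods are hypothetical, and it assumes both directories already exist on the same filesystem so that the rename can be atomic.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical helper; class and method names are illustrative only.
final class SpoolDirectories {

    // Producer side: write the complete file under pending/, then rename it into inbox/.
    static Path publish(Path pending, Path inbox, String content) throws IOException {
        String name = UUID.randomUUID() + ".txt";
        Path tmp = pending.resolve(name);
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        // ATOMIC_MOVE throws AtomicMoveNotSupportedException when the rename cannot
        // be atomic, e.g. when the directories are on different filesystems.
        return Files.move(tmp, inbox.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    // Consumer side: read every file currently in inbox/ and delete each one when done.
    static List<String> drain(Path inbox) throws IOException {
        List<String> contents = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inbox)) {
            for (Path file : files) {
                contents.add(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
                Files.delete(file);
            }
        }
        return contents;
    }
}
```

The producer never keeps a stream open on a file the consumer can see, so the consumer deleting a file can no longer pull it out from under the writer.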

You can check the filename itself for existence:
if (!Files.exists(Paths.get("/path/to/file"))) {
    // The consumer has deleted the file.
}
In any case, shouldn't the consumer wait for the producer to finish writing the file before it reads and deletes it? If it did, you wouldn't have this problem.
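One way the existence check might be wired into the producer is sketched below. ReopeningWriter is a hypothetical name, not an existing API. The check is inherently racy (the file can vanish between the check and the write), and it would not notice a file that was deleted and then recreated under the same name by someone else.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical writer; names are illustrative only.
final class ReopeningWriter implements AutoCloseable {
    private final Path path;
    private OutputStream out;

    ReopeningWriter(Path path) throws IOException {
        this.path = path;
        this.out = open();
    }

    private OutputStream open() throws IOException {
        return Files.newOutputStream(path, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Writes one line; returns true if the file had disappeared and was recreated first.
    boolean writeLine(String line) throws IOException {
        boolean reopened = false;
        if (!Files.exists(path)) {
            out.close();   // the old stream still points at the unlinked file
            out = open();  // creates a fresh file under the same name
            reopened = true;
        }
        out.write((line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8));
        out.flush();
        return reopened;
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

Anything written between the deletion and the next check still goes to the unlinked file and is lost, which is why this only narrows the window rather than closing it.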

To solve this the way you intend, you might have to look at JNI, which lets you call C/C++ functions from within Java, but this might also require you to write a wrapper library for stat/fstat first (in C/C++).
However, that will cause you a major headache.
Here is a workaround that shouldn't require much change to your code right now (I assume).
You can let the producer write to a new file each time it produces new data. Depending on the amount, you might want to group the data so that the directory won't be flooded with files. For example, one file per minute that contains all the data produced during that minute.
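A per-minute grouping like this could look roughly like the sketch below. The class name MinuteFiles, the "data-" prefix and the UTC timestamp pattern are all assumptions chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; the naming scheme is an assumption.
final class MinuteFiles {
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmm").withZone(ZoneOffset.UTC);

    // All data produced within the same minute maps to the same file name.
    static Path fileFor(Path dir, Instant when) {
        return dir.resolve("data-" + STAMP.format(when) + ".txt");
    }

    // Opens, appends and closes on every call, so no stream stays open across minutes.
    static Path append(Path dir, Instant when, String line) throws IOException {
        Path target = fileFor(dir, when);
        Files.write(target,
                (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return target;
    }
}
```

A caller would pass Instant.now() for each record; the consumer can then treat any file whose minute has passed as finished.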
It might also be a good idea to write the files to another directory first and then move them to your consumer's input directory. I'm a bit paranoid here, because race conditions could cause some data loss. Moving the files only after everything has been written makes sure no data gets lost.
Hope this helps. Good luck :)

Related

How to open a file in Java that does not prevent external "Safe Save"?

We want to open a file in Java and read its contents.
This file may be updated by an external application using Safe Save. That means the file will be externally read and its updated contents will be stored to a new file. Eventually the original file is deleted and the new file is renamed to match the original file's name.
Unfortunately the external process fails during rename (last part of the Safe Save) when our Java Application is reading the original file at the same time.
We played with different kinds of open modes but could not find a solution that does not fail the external reader.
Is there some way to open a file that does not interfere with external processes accessing the same file? Ideally, whenever an external process moves or deletes the file we would like to get an exception in our Java application. And only there.
Do you have any ideas on how to achieve that?
EDIT:
Just some clarification regarding the use case:
This is an indexer-like scenario. We want to index the contents of a potentially very large filesystem that 3rd-party independent processes can concurrently read from or write to as well. We have no control over the 3rd-party processes.
Copying the original file seems like a big overhead, and we are not sure it helps with the original problem, as it will probably fail the external reader on a Safe Save as well.
Last but not least: this should work on Windows and Linux, but we are experiencing these problems on Windows.
On Windows, whether a file can be renamed or deleted while it's open is controlled by the FILE_SHARE_DELETE sharing mode flag. This flag should be passed in when the file is opened with the low level CreateFile function.
Unfortunately, the Java API does not give you control over low-level Windows-specific flags. There is an open bug report to have FILE_SHARE_DELETE added by default, but it's unlikely to be done because of backwards compatibility (some applications may depend on the current behavior). A comment in the report suggests a workaround: instead of new FileInputStream(file), use the java.nio API.
InputStream in = Files.newInputStream(file.toPath());
I don't have access to Windows right now to verify that this workaround uses the right sharing mode.
Make a copy of the original file and use this within your Java program, and at the same time keep track of the original file.
Here, this might help you out:
The java.nio.file package provides a file change notification API, called the Watch Service API. This API enables you to register a directory (or directories) with the watch service. When registering, you tell the service which types of events you are interested in: file creation, file deletion, or file modification. When the service detects an event of interest, it is forwarded to the registered process. The registered process has a thread (or a pool of threads) dedicated to watching for any events it has registered for. When an event comes in, it is handled as needed. Official docs
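As a rough sketch of how the Watch Service API might be used to notice a deletion, the helper below registers a directory for ENTRY_DELETE and waits for a given file name. DeletionWatcher and its method names are hypothetical. Delivery latency depends on the platform (some implementations fall back to polling), and a Safe Save may surface as a delete followed by a create rather than a single event.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; names are illustrative only.
final class DeletionWatcher {

    // Registers the directory for deletion events; the caller closes the returned service.
    static WatchService watch(Path dir) throws IOException {
        WatchService watcher = dir.getFileSystem().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_DELETE);
        return watcher;
    }

    // Waits until a deletion of fileName is reported or the timeout elapses.
    static boolean awaitDeletion(WatchService watcher, String fileName, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            WatchKey key = watcher.poll(remaining, TimeUnit.NANOSECONDS);
            if (key == null) {
                return false; // timed out without any event
            }
            for (WatchEvent<?> event : key.pollEvents()) {
                if (event.kind() == StandardWatchEventKinds.ENTRY_DELETE
                        && fileName.equals(event.context().toString())) {
                    return true;
                }
            }
            if (!key.reset()) {
                return false; // the directory is no longer accessible
            }
        }
    }
}
```

In an indexer, awaitDeletion would typically run on its own thread so that the reading thread can be told to abandon the file.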
You cannot achieve this only with files, at least not without making additional assumptions. If the processes are not synchronized you will get either (a) errors, (b) corrupted data or (c) both. Furthermore, such a system will be unstable and prone to race conditions and implementation-specific details. This means that even if it looks like it's working, it will not work correctly always and in every case.
Depending on your circumstances you might try a combination of scheduling (e.g. process A runs every even minute, process B every odd minute), exclusive/shared open flags, range locks, copying files, file change notifiers, retrying on failure etc. If you can somehow ensure that your assumptions are never broken, you might end up with something which is "good enough". But all in all, this is bad engineering practice and should be avoided.
For a proper solution, you need to make both processes aware that they are talking to each other. What you have is really a textbook use case for a database. Besides using a database there are plenty of other ways to synchronize access to data - messaging, streams, locks, shared memory etc. Each way has its own benefits and downsides and without knowing more about your specific situation it is impossible to say which would be better.

Which is the correct way to delete a file so that it is not recoverable?

Currently I am using file.delete(), but it is flagged as a security risk because files deleted this way can be recovered by various means. Please suggest a correct way to delete a file. The security risk was reported by a testing tool called Quixxi, which checks an app for vulnerabilities.
The reason a "deleted" file is recoverable is because a delete operation simply unlinks the file in the filesystem, so the directory no longer considers that file part of it. The contents on disk (or whatever storage) still exist on that device.
If you want to guarantee the contents can never be recovered, you have to overwrite the contents first. There are no built-in functions to do this - you'd have to find a library or write the code yourself. Typically you'd write something like all 0s over the file (make sure to flush to media), write all 1s, write a pattern of 01 repeating, 10 repeating, something like that. After you've written with garbage patterns to media (flush) a few times, then you issue the delete.
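The overwrite-then-delete sequence described here might be sketched as follows. OverwriteThenDelete is a hypothetical name. The sketch assumes the filesystem and device actually write in place, which journaling or copy-on-write filesystems and SSD wear levelling may not do, so it should not be read as a guarantee of unrecoverability.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper; names are illustrative only.
final class OverwriteThenDelete {

    // Overwrites every byte of the file with `fill` and asks the OS to flush to the device.
    static void overwrite(Path file, byte fill) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            long remaining = raf.length();
            byte[] buffer = new byte[8192];
            Arrays.fill(buffer, fill);
            raf.seek(0);
            while (remaining > 0) {
                int chunk = (int) Math.min(buffer.length, remaining);
                raf.write(buffer, 0, chunk);
                remaining -= chunk;
            }
            raf.getFD().sync(); // flush file contents to the storage device
        }
    }

    // Several passes with different patterns, then the normal delete.
    static void wipe(Path file) throws IOException {
        overwrite(file, (byte) 0x00); // all 0s
        overwrite(file, (byte) 0xFF); // all 1s
        overwrite(file, (byte) 0x55); // 01 repeating
        overwrite(file, (byte) 0xAA); // 10 repeating
        Files.delete(file);
    }
}
```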
Not possible in JRE, unfortunately. The JVM is not designed for that, and you need OS-dependent utilities.
The answer by user1676075 contains a mistake. Let's go by steps.
As pointed out already, Java's File.delete method only unlinks the file leaving its contents on disk. It actually invokes the underlying OS APIs to perform this unlink operation.
The problem occurs when you want to overwrite contents in Java.
Java can open a file for overwrite, but will leverage OS utils to do so. And the OS will likely:
Unlink the allocated space on disk
Link the file to a new free area of disk
The result is that you are now writing tons of zeroes... somewhere else!!!
And even if you managed to write zeroes on the same sectors used by the original file, the Gutmann method exists for a reason. Gutmann utilities require root/Administrator (superuser) permissions and direct access to the device to precisely control where the writes occur.
And with SSDs, things change. Actually, it might get easier! At this point, I should provide a source for SSDs having a CLEAR instruction that replaces a sector with zeroes, and for privacy-savvy disk controllers doing that. But maybe pretend you have read nothing.
This will be a sufficient answer for now, because we have demonstrated that there is no out-of-the-box, straightforward way to securely clear a file in Java.
What Java does allow, through the Java Native Interface (please also see Java Native Access), is calling native code from Java. So, you've got your Gutmann tool in C++ ready? Are you running as root? You can write code to invoke Gutmann-ish erasure from Java, but that's a whole other topic.
I have never tried it, but it is surely feasible.

When does JNotify notify creation of a file

Suppose JNotify is listening to a folder named A and I copy a file f to A from folder B, which is not a subdirectory of A. At exactly what point in time will JNotify send its notification?
1) Is it at the point when writing into the new file starts, i.e. when open() is called on the file to write into it?
OR
2) Is it at the point after the new file is written completely and closed, i.e. when close() is called on the file after completion of writing into it?
I am not sure whether copying a file involves writing into the file, but I guess it should.
I would like to know the behavior on Ubuntu (Linux). Any reference is highly appreciated.
JNotify does not make any guarantees as to the exact timing of notification delivery.
Generally it depends on the operating system API, and those APIs can do their own buffering as well (Windows will behave differently than Linux or Mac).
In most non-trivial file creations and writes you will get a series of events, not just one.
You may want to have a timer on your side to find the point where the events for the file stop, to determine that it's safe to operate on.
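The timer idea could be sketched as a small quiet-period tracker like the one below. It does not touch the JNotify API at all; QuietPeriodTracker is a hypothetical class that a listener callback would feed, and the length of the quiet period is an assumption the caller has to pick.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of JNotify.
final class QuietPeriodTracker {
    private final long quietMillis;
    private final Map<String, Long> lastEvent = new HashMap<>();

    QuietPeriodTracker(long quietMillis) {
        this.quietMillis = quietMillis;
    }

    // Call from every notification callback (created, modified, ...) for the path.
    synchronized void recordEvent(String path, long nowMillis) {
        lastEvent.put(path, nowMillis);
    }

    // True once at least one event was seen and none arrived during the last quietMillis.
    synchronized boolean isSettled(String path, long nowMillis) {
        Long last = lastEvent.get(path);
        return last != null && nowMillis - last >= quietMillis;
    }
}
```

A listener would call recordEvent(path, System.currentTimeMillis()) on each event, and a periodic task would process the file once isSettled returns true. A slow writer that pauses for longer than the quiet period would still be picked up too early.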

Java: Updating a .txt file as the program runs and being able to see the change

I have to run a Java program that needs to keep track of transactions the user makes. I need to log these transactions in the .txt file.
Everything is working well with my code, except that I cannot see the .txt file (it is not created) until the program closes.
The goal for our project is to be able to see this file get updated live as the program is running. The user completes Order #1 and the transactions of that order get logged into the .txt file, and one can see the changes right away, while the program is still running. The user completes Order #2 and the transactions of that order are appended to the .txt file, again while the program is running.
I am using:
PrintWriter out;
out = (new PrintWriter(new FileWriter("log.txt", true)));
(writes lines to file)
out.flush();
out.close();
This code is within a method that gets called every time the user finishes his or her order. As soon as the order is finished, the log.txt file should reflect the changes right away without the program quitting. I have spent hours searching for how to do this, but I have not succeeded. I am also relatively new to Java and programming; therefore, any guidance is greatly appreciated.
Thank you.
Have you looked at the standard logging framework for Java (slf4j)? It's an API that is pretty much ubiquitous, and there are many very good implementations, such as logback or log4j. Let those worry about writing to files. Program to an interface (namely the slf4j interface) and, if you don't want to do anything fancy, copy-paste some XML configuration for the logger implementation from the internet.
You would not have to open files, or flush and close them. Your code would be:
log.info("something happened");
Read up on this topic, as there are practically no serious Java projects without a logging element. Invest some time in learning this framework once, as you can use it forever.
Probably your buffer is only written out to the file when you invoke the flush method. The buffer waits to accumulate some data before the write operation.
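For comparison, a minimal sketch of appending one order at a time with plain java.io is shown below. TransactionLog and appendOrder are hypothetical names. Closing the writer flushes its buffer, so the lines should be visible to other programs as soon as the method returns, assuming the method is actually reached and the relative file name resolves to the directory you are looking in.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

// Hypothetical helper; names are illustrative only.
final class TransactionLog {

    // Appends the lines of one order to the log file and closes it again.
    static void appendOrder(String fileName, List<String> lines) throws IOException {
        // The second FileWriter argument (true) means append instead of overwrite.
        // try-with-resources closes, and therefore flushes, the writer on exit.
        try (PrintWriter out = new PrintWriter(new FileWriter(fileName, true))) {
            for (String line : lines) {
                out.println(line);
            }
        }
    }
}
```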

Lock future file

So I have a Samba file server on which my Java app needs to write some files. The thing is that there is also another php application (if a php script is even considered an application) that is aggressively polling the same directory for new files.
Sometimes, the php script picks up the file before my Java app is done writing it completely to the disk. Here is a little bit of ascii art to help visualize what I currently have (but doesn't work):
Samba share
  /foo (my java app drops file here)
  /bar (the directory that the php is polling)
What I'm currently doing is that when the file meets some criteria, it's moved to /bar and then picked up by the php for more processing. I've tried different things, such as setting the file non-writable and non-readable before calling renameTo.
I've looked a little bit at FileLocks, but they don't seem to be able to lock future files. So I am wondering what kind of possibilities I have here. What could I use to keep the file from being picked up before it's fully written, without touching the php (because, well, it's php and I don't really have the right to modify it right now)?
Thanks
Edit 1
I've got some insight on what the php script is really doing if it can help in any way.
It's reading the directory in a loop (using readdir without sleeping).
As soon as it finds a filename other than "." and "..", it calls file_get_contents, and that's where it fails because the file is not completely written to disk (or not even there, since the Java code might not even have had time to write it between the readdir and file_get_contents).
Edit 2
This Java application is replacing an old php script. When they implemented it, they had the same problem I'm having right now. They solved it by writing the new file in /bar/tmp (with file_put_contents) and then using rename to move it to /bar (it looks like rename is supposed to be atomic). And it's been working fine so far. I can't and won't believe that Java can't do something better than what php does...
I think this is due to the fact that read locks are shared (multiple processes can apply read locks to the same file and read it together).
One approach you can do is to create a separate temporary lock file (eg: /bar/file1.lock) while /bar/file1 hasn't finished copying. Delete the lock file as soon as the file copying is finished.
Then alter the php code to ensure the file isn't being locked before it reads.
You mentioned that you tried FileLock, but keep in mind the disclaimer in the javadoc for that method:
Whether or not a lock actually prevents another program from accessing
the content of the locked region is system-dependent and therefore
unspecified. The native file-locking facilities of some systems are
merely advisory, meaning that programs must cooperatively observe a
known locking protocol in order to guarantee data integrity.
You also mentioned you are using File.renameTo, which also has some caveats (mentioned in the javadoc):
Many aspects of the behavior of this method are inherently
platform-dependent: The rename operation might not be able to move a
file from one filesystem to another, it might not be atomic, and it
might not succeed
Instead of File.renameTo, try Files.move with the ATOMIC_MOVE option. You'll have to catch AtomicMoveNotSupportedException and possibly fall back to some alternative workaround in case an atomic move is not possible.
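A sketch of that suggestion, with a deliberately simple fallback, might look like this. AtomicPublish and moveInto are hypothetical names, and the fallback shown (a plain move) is only one possible choice; it is not atomic, so the PHP reader could still see a partial file on that path.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; names are illustrative only.
final class AtomicPublish {

    // Returns true if the move was atomic, false if the non-atomic fallback was used.
    static boolean moveInto(Path source, Path target) throws IOException {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback, not atomic: may copy and delete across filesystems.
            Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
            return false;
        }
    }
}
```

The boolean result lets the caller log or alert when the atomic path was not available, instead of silently degrading.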
You could create a hard link with Files.createLink(Paths.get("/bar/myFile"), Paths.get("/foo/myFile")) (the first argument is the new link, the second is the existing file), then delete the original directory entry (in this example, /foo/myFile).
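The hard-link variant could be sketched as below. HardLinkPublish is a hypothetical name. It assumes both paths are on the same filesystem and that the filesystem supports hard links; whether a Samba share does depends on how the share is mounted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; names are illustrative only.
final class HardLinkPublish {

    // Makes the fully written file appear under `visible`, then removes the old name.
    static Path publish(Path written, Path visible) throws IOException {
        // createLink(link, existing): the first argument is the new directory entry.
        Files.createLink(visible, written);
        // Both names now refer to the same data; dropping the old one keeps the content.
        Files.delete(written);
        return visible;
    }
}
```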
Failing that, a simple workaround that doesn't require modification to the PHP is to use a shell command or system call to move the file from /foo to /bar. You could, for example, use ProcessBuilder to call mv, or perhaps call ln to create a symlink or hardlink in /bar. You might still have the same problem with mv if /foo and /bar are on different filesystems.
If you have root privileges on the server, you could also try implementing mandatory file locking. I found an example in C, but you could call the C program from Java or adapt the example to Java using JNA (or JNI if you want to punish yourself).
