Java Files.copy with REPLACE_EXISTING deletes file entirely

I have some code that is designed to open a local master file, make additions, and save the file both by overwriting the local master file and by overwriting a write-protected copy on an accessible network location. This is done by saving the modified file to a temp file and then copying it over the other two files.
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

String tempFileName = "File.tmp";
String fileName = "File.xlsm";
String serverPath = "\\\\network path\\";
File serverFile = new File(serverPath + fileName);

// Overwrite the local master file with the new version
Files.copy(Paths.get(tempFileName), Paths.get(fileName),
        StandardCopyOption.COPY_ATTRIBUTES, StandardCopyOption.REPLACE_EXISTING);

// Clear the read-only flag on the server copy, overwrite it, then re-protect it
if (serverFile.exists()) { serverFile.setWritable(true, false); }
Files.copy(Paths.get(tempFileName), Paths.get(serverPath + fileName),
        StandardCopyOption.COPY_ATTRIBUTES, StandardCopyOption.REPLACE_EXISTING);
serverFile.setWritable(false, false);

Files.delete(Paths.get(tempFileName));
This code works well most of the time; however, some of the time the code completes successfully without exception but with the network file deleted. The local master file is saved and updated correctly, but the file that should exist on the network is simply gone.
What makes this more difficult is that I have been unable to reproduce this problem under any controlled circumstances. So I ask for any guidance on how this could occur from a file copy/overwrite operation.
Thank you
UPDATE:
I had a hunch and checked the network access logs for the server file path. The deletion of the file occurs only when the file is being accessed by a user other than the creator, and even then not every time. The file is accessed read-only, so a user having it open should not affect overwriting a new version, and most of the time it does not. Digging deeper, it seems that occasionally, when the file is open by another user while Java is trying to overwrite it, an AccessDeniedException is thrown and the file is deleted.
I believe this must be a bug in setWritable() or Files.copy() (or the combination), as the file should not be deleted in any case, and isWritable() returns true every time. I have tried other methods of setting/unsetting the read-only permission and have come up empty. The current workaround I have in place simply catches the exception and loops until the old file is deleted and a fresh copy is in place. This works, but it is really a hack, so if anyone has any better solutions/suggestions I welcome them.

See How does FileLock work? You could do something like:
Wait for file to become available
Lock file
Overwrite/delete/other
Unlock (if applicable)
This should prevent access by other users during the process of modifying the file.
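For illustration, a minimal sketch of that sequence using java.nio's FileLock; the paths mirror the question, while the retry interval and the copy-via-truncate approach are my own assumptions. Keep in mind that FileLock coordinates only with other processes that also take locks (it is advisory on many platforms), so it may not keep out a user who simply has the workbook open in Excel.

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

static void overwriteUnderLock(Path temp, Path target)
        throws IOException, InterruptedException {
    try (FileChannel out = FileChannel.open(target,
            StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
        FileLock lock = out.tryLock();
        while (lock == null) {              // wait for the file to become available
            Thread.sleep(500);              // retry interval (assumption)
            lock = out.tryLock();
        }
        try (FileChannel in = FileChannel.open(temp, StandardOpenOption.READ)) {
            out.truncate(0);                // overwrite in place, under the lock
            in.transferTo(0, in.size(), out);
        } finally {
            lock.release();                 // unlock; closing the channel also releases it
        }
    }
}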

Related

How to know if file is complete on the server using FTP?

I have a file scanner application in Java that keeps scanning a directory on a server using FTP. It gets the list of files in the directory and downloads them one by one. On the other side, on the server, there's a process that writes these files. If I'm lucky I won't try to download an incomplete file, but how can I make sure that the write process on the server is complete, the file handle is closed, and the file is ready to be downloaded?
I have no control over the write process, which is on the server. Moreover, I don't have write permission on the directory, so trying to get a write handle in order to check whether there's already a write handle open is off the table.
Is there an FTP function addressing this problem?
This is a very old and well-known problem.
There is no way to be absolutely certain that a file being written by the FTP daemon is complete. It's even possible that the file transfer fails and is then restarted and completed. You must poll the file's size and set a time limit, say 5 minutes. If the size does not change during that time, you assume the file is complete.
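For illustration, a rough sketch of that size-polling heuristic using Apache Commons Net's FTPClient (an assumption; the answer doesn't name a library), with a placeholder poll interval:

import java.io.IOException;
import org.apache.commons.net.ftp.FTPClient;
import org.apache.commons.net.ftp.FTPFile;

static void waitUntilStable(FTPClient ftp, String remotePath)
        throws IOException, InterruptedException {
    final long windowMillis = 5 * 60 * 1000;    // the "5 minutes with no change" rule
    long lastSize = -1;
    long stableSince = System.currentTimeMillis();
    while (System.currentTimeMillis() - stableSince < windowMillis) {
        FTPFile[] files = ftp.listFiles(remotePath);
        long size = (files.length == 1) ? files[0].getSize() : -1;
        if (size != lastSize) {                 // still growing (or missing): reset the clock
            lastSize = size;
            stableSince = System.currentTimeMillis();
        }
        Thread.sleep(10_000);                   // poll interval (assumption)
    }
    // Size unchanged for the whole window: assume the file is complete.
}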
If possible, the program that processes the file should be able to deal with partial files.
A much better alternative is rsync, which is much more robust and deterministic. It can even be configured (via command-line option) to write the data initially to a temporary location and move it to its final destination path upon successful completion. If the file exists where you expect it, then it is by definition complete.
A possible solution would be first uploading the file with a different filename (e.g. adding ".partial") and then renaming it to its final name.
If the server finds the final name then the upload has been completed.
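If you do control the uploader, a sketch of that idea with Apache Commons Net might look like this (the ".partial" suffix and file names are placeholders):

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.commons.net.ftp.FTPClient;

static void uploadThenRename(FTPClient ftp, Path local, String remoteName)
        throws IOException {
    try (InputStream in = Files.newInputStream(local)) {
        if (!ftp.storeFile(remoteName + ".partial", in)) {   // readers ignore *.partial
            throw new IOException("upload failed: " + ftp.getReplyString());
        }
    }
    if (!ftp.rename(remoteName + ".partial", remoteName)) {  // final name => complete
        throw new IOException("rename failed: " + ftp.getReplyString());
    }
}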
If you cannot control the upload process then what you are asking is impossible by definition: the file upload could stop because of a network problem or because the sending process is stopped for whatever reason.
What the receiving end will observe is just a closing of the incoming stream; there is no way to guarantee that the data will not be a partial transfer.
Other workarounds could be checking for an end-of-data marker or using a request to the sending server to check if (in their view) the transfer has been completed.
This is more fundamental than FTP: you'd have a similar problem reading those files even if they were being created on the local machine.
If you can't modify the writing process, you'll need to jump through some hoops. None are great, but some are safer than others.
Keep reading until nothing changes for some window (maybe a minute, like David Schwartz suggests). You could optimize this a bit by watching the file size.
Figure out if the files are written serially in a reliable order. When you see file N appear, you know that file N-1 is ready. (Assumes that the directory is empty before the files are written, though you could also look at timestamps.) The downside is that your logic will break if the writer ever changes order or starts writing in parallel.
The reliable, safe solutions require improving the writer process.
Writer can write the files to hidden or temporary locations and only make them visible once the entire file (or directory) is ready, using symlinks or file-moving or chmod.
Writer creates a special marker file (e.g., "./DONE") only after all other files have been written, and the reader doesn't read any files until that marker is present (see the sketch after this list).
Depending on the file type, the writer could add some kind of end-of-file record/line at the end of the file, and the reader could ensure that it's present.
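A minimal reader-side sketch of the marker-file idea from the list above (the directory, marker name, and poll interval are placeholders):

import java.nio.file.Files;
import java.nio.file.Path;

static void waitForBatch(Path dir) throws InterruptedException {
    while (!Files.exists(dir.resolve("DONE"))) {   // writer drops this last
        Thread.sleep(5_000);                       // poll interval (assumption)
    }
    // Every data file in dir is now complete and safe to read.
}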
You can use the FTPClient from the Apache Commons Net API; see its documentation for more information.
boolean flag = retrieveFile(String remote, OutputStream local);
The returned flag tells you whether the transfer of the current file completed successfully.

Why can I successfully move a file in Linux while it is being written to?

This question I think is technical enough for Stack Overflow, and probably too programming-oriented for Android. I am intrigued as to how files are handled in Android (or Java or Linux, as appropriate), since I did something with my new smartphone and I'm curious to know how it happened.
I was transferring a file from my laptop to my Android phone, via Bluetooth. I saw the new file in the file explorer, assumed it was fully transferred, and so moved it from /sdcard/bluetooth to /sdcard/torrents. After I had done so, I noticed it was in fact still being transferred. To my surprise, it completed successfully, confirmed with a notification icon on the phone, and by a manual MD5 check on both sides. In most systems, the file move would have caused a crash.
What is the reason for this successful transfer? I'm aware that in general, the file path is separate from the file location on the file system (in this case, an SD card). I imagine that the Bluetooth app has opened a handle to the file, and when I did the file move, a table of 'open files' was updated with the new path. Is this behaviour generally true of any Linux system? Could I do a mv on a file being written and expect the copy, in its new location, to be correct?
When you move a file inside the same filesystem, the file itself (the inode) isn't moved at all. The only thing that changes are the directory entries in that filesystem. (The system call invoked by mv in this case is rename(2) - check that page for additional information and restrictions.)
When a process opens a file, the filename is passed to the OS to indicate which file is meant, but the file descriptor you get back isn't linked to that name at all (you can't get back a filename from it) – it is linked to the inode.
Since the inode remains unchanged when you rename a file (inside the same filesystem), processes that have it open can happily keep reading from and writing to it – nothing changed for them, their file descriptor is still valid and pointing to the right data.
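A small Java demonstration of this behavior (it assumes a POSIX system and that both paths are on the same filesystem; the paths themselves are placeholders): the writer's stream stays valid across the rename, and both writes end up in the moved file.

import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

Path src = Paths.get("/tmp/demo.log");
Path dst = Paths.get("/tmp/demo-moved.log");
try (OutputStream out = Files.newOutputStream(src)) {
    out.write("before move\n".getBytes(StandardCharsets.UTF_8));
    Files.move(src, dst);                          // same filesystem: rename(2) underneath
    out.write("after move\n".getBytes(StandardCharsets.UTF_8)); // still succeeds
}
System.out.println(Files.readAllLines(dst));       // prints both lines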
Same thing if you delete a file. Processes can keep reading from and writing to it even if the file is no longer reachable through any directory entry. (This can lead to confusing situations where df reports that your disk is full, but du says you're using much less space than df reports. The blocks assigned to deleted files that are still open won't be released until those processes close their file descriptors.)
If the mv moves the file across filesystems, then the behavior is different since inodes are specific to each filesystem. In that case, mv will actually copy the data over, creating a new inode (and directory entry) on the destination filesystem. When the copy is over, the old file is unlinked, and removed if there are no open filehandles on it, as above.
In your case, if you had crossed a filesystem boundary, you'd have had a partial file in the destination, and your upload process would have been happily writing to a deleted file you couldn't easily access, possibly filling up that filesystem, until the upload finished, at which point the inode would get dropped.
Some posts on Unix & Linux that you could find interesting:
How are directories implemented in Unix filesystems?
What is a Superblock, Inode, Dentry and a File?

Java moving file while writing consistent

My Java application is supposed to read the logging data of a Snort application on a Debian server.
The Snort application runs independently of my evaluation app and writes its logs into a file.
My evaluation app is supposed to check just the new content every 5 minutes. That's why I will move the logfile, so that the Snort application has to create a new file while my app can check the already-written data from the old one.
Now the question: How can I ensure that I don't destroy the file in the case that I move it at the moment the Snort application is writing to it? Does Java have a way to check the current actions on the file so that no data can get lost? Does the OS lock the file while writing?
Thanks for your help, Kn0rK3
Not exactly what you are looking for, but I would do this in a very different way: either record the line number / timestamp of the last entry read from the log file, or record the position in a RandomAccessFile (the second option is more efficient for obvious reasons). The next time you read the file, only read from the recorded position to the EOF, at which point you record the last read position again.
Also, you can replace the "poll every 5 minutes" strategy with a "poll every time I get an update notification for this file" strategy.
Since I assume that you don't have control of the code of the "Snort" application, I don't think that NIO FileLocks will help you.
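A minimal sketch of the record-the-position idea above; the log path is a placeholder, process() is a hypothetical handler, and persisting lastPosition across restarts is left out.

import java.io.IOException;
import java.io.RandomAccessFile;

static long readNewEntries(String logPath, long lastPosition) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(logPath, "r")) {
        if (raf.length() < lastPosition) {
            lastPosition = 0;            // file was truncated or rotated: start over
        }
        raf.seek(lastPosition);
        // NB: the last line may still be partially written; real code should cope with that.
        String line;
        while ((line = raf.readLine()) != null) {
            process(line);               // hypothetical handler for one log entry
        }
        return raf.getFilePointer();     // record this for the next poll
    }
}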
It should not be an issue. Typically a logging application has some sort of file descriptor or stream open to a file. If the file gets renamed, that doesn't affect the writing application in any way; the name is independent of the contents of the file or its location on disk. Snort should continue writing to the renamed file until it notices that the file has been renamed, at which point it reopens a new log file under the old name and switches to writing to that one.
That's the whole reason why it reopens in the first place. To support this sort of mechanism.
Now the question: How can I ensure that I don't destroy the file in the case...
The only thing you have to worry about is that you are renaming the file to a file-name that does not already exist. I would recommend moving it to a .YYYYMMDD.HHMMSS extension or something.
NOTE: In threaded logging operations, even if the new file has been opened, you may have to wait a bit for all of the threads to switch to the new logging stream. I'm not sure how Snort works but I have seen the log.YYYYMMDD file growing even after the log file was re-opened. I just wait a minute before I consume the renamed logfile. FYI.
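For what it's worth, the timestamped rename suggested above might look like this (the log path is a placeholder):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

Path log = Paths.get("/var/log/snort/alert");
String stamp = LocalDateTime.now()
        .format(DateTimeFormatter.ofPattern("yyyyMMdd.HHmmss"));
Files.move(log, log.resolveSibling("alert." + stamp));  // unique name, so no clash
// Per the note above, wait a bit before consuming the renamed file so any
// logging threads finish switching to the reopened log.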

How to be sure a file has been successfully written?

I'm adding autosave functionality to a graphics application in Java. The application periodically autosaves the current document and also autosaves on exit. When the user starts the application, the autosave file is reloaded.
If the autosave file is corrupted in any way (I assume a power cut when the file is in the middle of being saved would do this?), the user will lose their work. How can I prevent such situations and do all I can to guarantee that the autosave document is in a consistent state?
To further complicate matters, to autosave the document I need to save one .xml file and several .png files. Also, the .png saving occurs in C code over JNI.
My current strategy is to write each .png with the extension .png.tmp, write the .xml file with the extension .xml.tmp, and then rename each file to remove the .tmp part leaving the .xml until last. On startup, I only load the autosave document if I can find a .xml file and ignore .xml.tmp files. I also don't delete the previous autosave document until the .xml.tmp file for the new document is renamed.
I guess my knowledge of what happens when you write to disk is poor. I know you can have software read/write buffers when using files, as well as OS and hardware buffers, and that all of these need to be flushed. I'm confused about how I can know for sure when something really has been written to disk and what I can do to protect myself. Does the renaming operation do anything to make sure buffers are flushed?
If the autosave file is corrupted in any way (I assume a power cut when the file is in the middle of being saved would do this?), the user will lose their work. How can I prevent such situations and do all I can to guarantee that the autosave document is in a consistent state?
To prevent loss of data due to partially written autosave file, don't overwrite the autosave file. Instead, write to a new file each time, and then rename it once the file has been safely written.
To guard against not noticing that an autosave file has not been correctly written:
Pay attention to the exceptions thrown as the autosave file is written and closed in case a disc error, file system full, etc.
Keep a running checksum of the file as it is written and write it at the end of the file. Then when you load the autosave file, check that the checksum is there and is correct (a sketch of this follows the list).
If the checkpointed state involves multiple files, make sure that you write the files in a well known order (without overwriting!), and write the checksum on the autosave file after all of the other files have been safely closed. You might want to create a directory for each checkpoint.
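Here is a minimal sketch of the checksum idea, using a CRC32 trailer; the file names are placeholders and the document is assumed to be already serialized to a byte array. On load, recompute the CRC over everything except the last 8 bytes and compare it with the trailer.

import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.CRC32;
import java.util.zip.CheckedOutputStream;

static void saveWithChecksum(byte[] documentBytes, Path target) throws IOException {
    Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
    CRC32 crc = new CRC32();
    try (DataOutputStream out = new DataOutputStream(
            new CheckedOutputStream(Files.newOutputStream(tmp), crc))) {
        out.write(documentBytes);        // the serialized document (assumption)
        out.flush();
        out.writeLong(crc.getValue());   // 8-byte trailer, computed before it is written
    }
    // Rename only after the temp file is complete; with ATOMIC_MOVE, whether an
    // existing target is replaced is platform-specific (it is on POSIX systems).
    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
}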
FOLLOW UP
No. I'm not saying that rename always succeeds. However, it is atomic - it either succeeds (and completes) or the file system is not changed. So, if you do this:
write "file.new" and close,
delete "file",
rename "file.new" to "file"
then provided the first step succeeds you are guaranteed to have the latest "file" safely on disc. And it is simple to add a couple of steps so that you have a backup of "file" at all times. (If the 3rd step fails, you are left with "file.new" and no "file". This can be recovered manually, or automatically by the application next time you run it.)
Also, I'm not saying that writes always succeed, or that applications don't crash, or that the power never goes off. And the point of the checksum is to allow you to detect the cases where these things have happened and the autosave file is incomplete.
Finally, it is a good idea to have two autosaves in case your application gets itself into a state where its data structures are messed up and the last autosave is nonsensical as a result. (The checksum won't protect against this.) Be cautious about autosaving when the application crashes for the same reason.
As an aside, since you have several different files as part of this one document, consider using either a project directory to hold them all together, or using some encapsulation format (like .zip) to put them all inside one file.
What you want to do is atomically replace the old backup files with new ones. Unfortunately, I don't believe that Java gives you enough control to do this directly. You also need to reason about what operations are atomic in the underlying operating system. I know Linux file systems, so my answer will be biased towards a Java program running on that system. I would be shocked if Windows didn't do the same thing, but I can't say for certain.
Most Linux file systems (e.g. the meta-data journaled ones) let you rename files atomically. If the system crashes half-way through a rename, when you restart, it will be as if you never renamed a file in the first place. For this reason, a common way to atomically update an existing file F is to write your new data to a temporary file T and then rename T to F. Any system or application crash up to that rename will not affect F, so it will always be consistent.
Of course, before you rename, you need to make sure that your temporary file is consistent. Make sure that all stream buffers for the file are flushed to the OS (OutputStream.flush()) and that the OS buffers are flushed to the disk (FileChannel.force() or FileOutputStream.getFD().sync()). Of course, unless your OS disables the write cache on the hard disk itself (it probably hasn't), there's still a chance that your data can be corrupted. Add a checksum to the XML if you really want to be sure. If you're truly paranoid, you should flush the OS and hard-disk buffer caches and re-read the file to verify that it is consistent. This is beyond any reasonable expectation for normal consumer applications.
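A sketch of that flush-then-rename sequence; the file names are placeholders, and note that the atomic rename is exposed in newer java.nio versions as Files.move with ATOMIC_MOVE:

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

static void atomicReplace(byte[] content, Path target) throws IOException {
    Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
    try (FileOutputStream fos = new FileOutputStream(tmp.toFile())) {
        fos.write(content);              // T gets the new data
        fos.flush();                     // stream buffers -> OS
        fos.getFD().sync();              // OS buffers -> disk
    }
    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);  // rename T to F
}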
But that's just to atomically write a single file. Your problem is more complex: you have many files to update atomically. For example, I'll say that you have two files, img.png and main.xml. I'd do one of these:
The easy solution is to make a per-savefile directory. You wouldn't need to worry about renaming each individual file, and you could still swap the new backup dir for the old one you're replacing (note that rename won't clobber a non-empty directory, so move the old dir aside first). That is, if your old backup is bak/img.png and bak/main.xml, write bak.tmp/img.png and bak.tmp/main.xml, move bak to bak.old, and rename bak.tmp to bak.
Name the new auxiliary files something else and let them coexist with the old ones for a little while. That is, write img.2.png and main.xml.tmp (which should refer to img.2.png, not img.png) and only rename main.xml.tmp to main.xml. Then delete img.png.
addition: If you don't have atomic renames, the next best thing extends on #2. Whenever you save the project, give it a new name (e.g. ver342.xml). When you load, just find the most recent XML that is consistent (i.e. its checksum verifies). Keep around 2 or 3 to be safe. Only delete an auto-save if you have successfully restored from a more-recent copy.

Modifying File while in use using Java

I have a recurring Java task (a JAR program) that tries to modify a file every 60 seconds.
The problem is that if a user is viewing the file, the Java program will not be able to modify it. I get the typical IOException.
Does anyone know if there is a way in Java to modify a file currently in use? Or does anyone know what would be the best way to solve this problem?
I was thinking of using the File canRead() and canWrite() methods to check whether the file is in use. If the file is in use, I'm thinking of making a backup copy of the data that could not be written. Then, after 60 seconds, add some logic to check whether the backup file is empty or not. If the backup file is not empty, add its contents to the main file; if it is empty, just add the new data to the main file. Of course, the first thing I will always do is check whether the file is in use.
Thanks for all your ideas.
I was thinking of using the File canRead(), canWrite() methods to check if file is in use.
Not a good idea - you'll run into race conditions, e.g. when your code has used those check methods and received true return values, but the file is then locked by a different application (possibly the user) just before you open it for writing.
Instead, try to get a FileLock on the file and use the "backup file" when that fails.
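A rough sketch of that try-lock-or-fall-back idea, assuming the new data is appended as raw bytes; mainFile, backupFile, and newData are placeholder names. (On Windows, opening the channel itself can fail if the viewer holds the file exclusively, so real code may want to catch that and fall back too.)

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

static void writeOrBackup(Path mainFile, Path backupFile, byte[] newData)
        throws IOException {
    try (FileChannel ch = FileChannel.open(mainFile,
            StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
        FileLock lock = ch.tryLock();        // null if another process holds a lock
        if (lock != null) {
            try {
                ch.write(ByteBuffer.wrap(newData));
            } finally {
                lock.release();
            }
            return;
        }
    }
    // Could not lock the main file: park the data in the backup file for later.
    Files.write(backupFile, newData,
            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
}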
You can hold a lock on the file. This should guarantee you are able to write to the file.
See here on how to use the FileLock class.
If the user is viewing the file you should still be able to read it. In this case, make an exact copy of the file, and make changes to the new file.
Then after the next 60 seconds you can either:
1) Check if the file is being viewed and if not, delete it and replace it with the earlier file, then directly update this file.
2) If it is being viewed, continue making changes to the copy of the file.
EDIT: As Michael mentioned, when working with the main file, get a lock on it first.
