How to be sure a file has been successfully written? - java

I'm adding autosave functionality to a graphics application in Java. The application periodically autosaves the current document and also autosaves on exit. When the user starts the application, the autosave file is reloaded.
If the autosave file is corrupted in any way (I assume a power cut when the file is in the middle of being saved would do this?), the user will lose their work. How can I prevent such situations and do all I can to guarantee that the autosave document is in a consistent state?
To further complicate matters, to autosave the document I need to save one .xml file and several .png files. Also, the .png saving occurs in C code over JNI.
My current strategy is to write each .png with the extension .png.tmp, write the .xml file with the extension .xml.tmp, and then rename each file to remove the .tmp part leaving the .xml until last. On startup, I only load the autosave document if I can find a .xml file and ignore .xml.tmp files. I also don't delete the previous autosave document until the .xml.tmp file for the new document is renamed.
I guess my knowledge of what happens when you write to disk is poor. I know you can have software read/write buffers when using files, as well as OS and hardware buffers and that all of these need to be flushed. I'm confused how I can know for sure when something really has been written to disk and what I can do to protect myself. Does the renaming operation do anything to make sure buffers are flushed?

If the autosave file is corrupted in any way (I assume a power cut when the file is in the middle of being saved would do this?), the user will lose their work. How can I prevent such situations and do all I can to guarantee that the autosave document is in a consistent state?
To prevent loss of data due to partially written autosave file, don't overwrite the autosave file. Instead, write to a new file each time, and then rename it once the file has been safely written.
To guard against not noticing that an autosave file has not been correctly written:
Pay attention to the exceptions thrown as the autosave file is written and closed, in case of a disc error, a full file system, etc.
Keep a running checksum of the file as it is written and write it at the end of the file. Then when you load the autosave file, check that the checksum is there and is correct.
If the checkpointed state involves multiple files, make sure that you write the files in a well known order (without overwriting!), and write the checksum on the autosave file after all of the other files have been safely closed. You might want to create a directory for each checkpoint.
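The checksum idea above can be sketched roughly as follows. This is a minimal illustration, not the answerer's code: the class name `ChecksummedFile`, the choice of CRC32, and the 8-byte trailer layout are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.CheckedOutputStream;

// Hypothetical helper: names and trailer layout are illustrative only.
final class ChecksummedFile {

    /** Writes the payload followed by an 8-byte CRC32 trailer. */
    static void write(Path file, byte[] payload) throws IOException {
        CRC32 crc = new CRC32();
        try (OutputStream raw = Files.newOutputStream(file);
             CheckedOutputStream out = new CheckedOutputStream(raw, crc)) {
            out.write(payload);                 // the CRC is updated as the bytes pass through
            byte[] trailer = ByteBuffer.allocate(Long.BYTES).putLong(crc.getValue()).array();
            raw.write(trailer);                 // trailer bypasses the checked stream on purpose
        }
    }

    /** Returns the payload if the trailer matches, or null if the file is truncated or corrupt. */
    static byte[] readVerified(Path file) throws IOException {
        byte[] all = Files.readAllBytes(file);
        if (all.length < Long.BYTES) {
            return null;                        // too short to even hold the trailer
        }
        int payloadLength = all.length - Long.BYTES;
        long stored = ByteBuffer.wrap(all, payloadLength, Long.BYTES).getLong();
        CRC32 crc = new CRC32();
        crc.update(all, 0, payloadLength);
        return crc.getValue() == stored ? Arrays.copyOf(all, payloadLength) : null;
    }
}
```

A loader would call `readVerified` on the autosave file and fall back to an older autosave when it returns null. For an XML document the checksum would more likely live in a sidecar file or a wrapper format, since trailing bytes would make the XML invalid.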
FOLLOW UP
No. I'm not saying that rename always succeeds. However, it is atomic - it either succeeds (and completes) or the file system is not changed. So, if you do this:
write "file.new" and close,
delete "file",
rename "file.new" to "file"
then provided the first step succeeds you are guaranteed to have the latest "file" safely on disc. And it is simple to add a couple of steps so that you have a backup of "file" at all times. (If the 3rd step fails, you are left with "file.new" and no "file". This can be recovered manually, or automatically by the application next time you run it.)
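The automatic recovery mentioned above might look something like this. It is only a sketch: the class and method names are hypothetical, and it assumes "file.new" is written, flushed and closed before `promote` is called.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; "file" and "file.new" follow the naming used in the steps above.
final class ReplaceWithRecovery {

    /** Steps 2 and 3: call only after "file.new" has been fully written and closed. */
    static void promote(Path file, Path fileNew) throws IOException {
        Files.deleteIfExists(file);
        Files.move(fileNew, file);
    }

    /** Run at startup: finishes a promote that was interrupted between delete and rename. */
    static void recover(Path file, Path fileNew) throws IOException {
        if (Files.notExists(file) && Files.exists(fileNew)) {
            Files.move(fileNew, file);
        }
        // If both exist, "file.new" may be a partial write from step 1, so it is left alone.
    }
}
```

Calling `recover` before loading the autosave covers the case where the third step never ran.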
Also, I'm not saying that writes always succeed, or that applications don't crash, or that the power never goes off. And the point of the checksum is to allow you to detect the cases where these things have happened and the autosave file is incomplete.
Finally, it is a good idea to have two autosaves in case your application gets itself into a state where its data structures are messed up and the last autosave is nonsensical as a result. (The checksum won't protect against this.) Be cautious about autosaving when the application crashes for the same reason.

As an aside, since you have several different files as part of this one document, consider using either a project directory to hold them all together, or using some encapsulation format (like .zip) to put them all inside one file.
What you want to do is atomically replace the old backup files with new ones. Unfortunately, I don't believe that Java gives you enough control to do this directly. You also need to reason about what operations are atomic in the underlying operating system. I know Linux file systems, so my answer will be biased towards a Java program running on that system. I would be shocked if Windows didn't do the same thing, but I can't say for certain.
Most Linux file systems (e.g. the meta-data journaled ones) let you rename files atomically. If the system crashes half-way through a rename, when you restart, it will be as if you never renamed a file in the first place. For this reason, a common way to atomically update an existing file F is to write your new data to a temporary file T and then rename T to F. Any system or application crash up to that rename will not affect F, so it will always be consistent.
Of course, before you rename, you need to make sure that your temporary file is consistent. Make sure that all stream buffers for the file are flushed to the OS (OutputStream.flush()) and that the OS buffers are flushed to the disk (FileChannel.force() or FileOutputStream.getFD().sync()). Unless your OS disables the write cache on the hard disk itself (it probably hasn't), there's still a chance that your data can be corrupted. Add a checksum to the XML if you want to be really sure. If you're truly paranoid, you should flush the OS and hard disk buffer caches and re-read the file to verify that it is consistent. This is beyond any reasonable expectation for normal consumer applications.
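As a rough sketch of that write, flush, sync, rename sequence (the class name and the ".tmp" suffix are assumptions for illustration):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of any library.
final class DurableWrite {

    static void writeThenRename(Path target, byte[] data) throws IOException {
        Path temp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileOutputStream out = new FileOutputStream(temp.toFile())) {
            out.write(data);
            out.flush();         // a no-op for a bare FileOutputStream, needed once a buffered wrapper is added
            out.getFD().sync();  // ask the OS to push its buffers to the storage device
        }
        // Same-directory rename. Whether ATOMIC_MOVE replaces an existing target
        // is implementation specific, so treat that as an assumption to verify.
        Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

If the file system cannot do the move atomically, `Files.move` throws `AtomicMoveNotSupportedException` rather than silently falling back to a copy, which is usually what you want for an autosave.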
But that's just to atomically write a single file. Your problem is more complex: you have many files to update atomically. For example, I'll say that you have two files, img.png and main.xml. I'd do one of these:
1. The easy solution is to make a per-savefile directory. You wouldn't need to worry about renaming each individual file, and you could still atomically rename the new backup dir over the old backup dir you're replacing. That is, if your old backup is bak/img.png and bak/main.xml, write bak.tmp/img.png and bak.tmp/main.xml and rename bak.tmp to bak.
2. Name the new auxiliary files something else and let them coexist with the old ones for a little while. That is, write img.2.png and main.xml.tmp (which should refer to img.2.png, not img.png) and only rename main.xml.tmp to main.xml. Then delete img.png.
Addition: If you don't have atomic renames, the next best thing extends the second option. Whenever you save the project, give it a new name (e.g. ver342.xml). When you load, just find the most recent XML that is consistent (i.e. its checksum verifies). Keep around 2 or 3 to be safe. Only delete an auto-save if you have successfully restored from a more recent copy.
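Picking the most recent consistent save could be sketched like this. The `ver<number>.xml` pattern follows the example name above; the class name and the `isConsistent` callback (which would do the checksum verification) are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper for versioned autosaves named ver<number>.xml.
final class VersionedSaves {
    private static final Pattern NAME = Pattern.compile("ver(\\d+)\\.xml");

    /** Returns the version number encoded in the file name, or -1 if the name does not match. */
    static int versionOf(Path file) {
        Matcher m = NAME.matcher(file.getFileName().toString());
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Returns the highest-numbered save that passes the caller's consistency check. */
    static Optional<Path> newestConsistent(Path dir, Predicate<Path> isConsistent) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                    .filter(p -> versionOf(p) >= 0)
                    .sorted((a, b) -> Integer.compare(versionOf(b), versionOf(a)))  // newest first
                    .filter(isConsistent)
                    .findFirst();
        }
    }
}
```

Because the stream is sorted newest first, the consistency check only runs on older saves when the newer ones fail it.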

Related

Do I need to call sync on file descriptor after using Files move operation?

I want to move two files to a different directory in same filesystem.
Concrete example, I want to move /var/bigFile to /var/path/bigFile, and /var/smallFile to /var/path/smallFile.
Currently I use Files.move(source, target) without any options, moving the small file first and the big file second. I need this order because another process is waiting for these files to arrive, and the order is important.
The problem is that I sometimes see a creation date for the small file that is later than the creation date for the big file, as if the move order were not followed.
Initially I thought I had to do a sync, but that does not make sense.
Since the move is actually a simple rename, there are no system buffers involved that would need to be flushed to disk.
Timestamps for the files were checked using the ls -alrt command.
Does anyone have any idea what could be wrong?

Java Files.copy replace existing deletes file entirely

I have some code that is designed to open a local master file, make additions, and save the file both by overwriting the master file and overwriting a write protected copy on an accessible network location. This is done by saving the modified file to a temp file and then copying over the other two files.
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

String tempFileName = "File.tmp";
String fileName = "File.xlsm";
String serverPath = "\\\\network path\\";
File serverFile = new File(serverPath + fileName);
// Overwrite the local master file with the temp file.
Files.copy(Paths.get(tempFileName), Paths.get(fileName),
        StandardCopyOption.COPY_ATTRIBUTES, StandardCopyOption.REPLACE_EXISTING);
// Make the network copy writable, overwrite it, then write-protect it again.
if (serverFile.exists()) { serverFile.setWritable(true, false); }
Files.copy(Paths.get(tempFileName), Paths.get(serverPath + fileName),
        StandardCopyOption.COPY_ATTRIBUTES, StandardCopyOption.REPLACE_EXISTING);
serverFile.setWritable(false, false);
Files.delete(Paths.get(tempFileName));
This code works well most of the time; however, some of the time the code completes successfully without exception but with the network location file deleted. The local master file is saved and updated correctly, but the file that should exist on the network is simply gone.
What makes this more difficult is that I have been unable to reproduce this problem under any controlled circumstances. So I ask you for any guidance on how this could occur from a file copy/overwrite operation.
Thank you
UPDATE:
I had a hunch and checked network access logs to the server file path. The deletion of the file occurs if and only if the file is being accessed by a user other than the creator, but not all of the time. Again though, the file is accessed as read only, so a user having the file open should not affect overwriting it with a new version, and most of the time it does not. Digging deeper, it seems that occasionally, if and only if the file is opened by another user while Java is trying to overwrite it, an AccessDeniedException is thrown and the file is deleted.
I believe this must be a bug in setWritable() or Files.copy (or a combination), as the file should not be deleted in any case and isWritable() returns true every time. I have tried other methods for setting and unsetting read-only permissions and have come up empty. The current workaround that I have in place simply catches the exception and loops until the file is deleted and a fresh copy is in place. This works but is really a hack, so if anyone has any better solutions or suggestions I welcome them.
See "How does FileLock work?". You could do something like:
Wait for file to become available
Lock file
Overwrite/delete/other
Unlock (if applicable)
This should prevent access by other users during the process of modifying the file.
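A minimal sketch of the lock, overwrite, unlock sequence using java.nio follows. The class name is hypothetical, and it rests on an important assumption: FileLock is only advisory on some platforms, so it protects against other cooperating processes and may behave differently on network shares.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper illustrating lock -> overwrite -> unlock.
final class LockedOverwrite {

    static void overwrite(Path file, byte[] newContent) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {   // blocks until the exclusive lock is granted
            channel.truncate(0);                 // drop the old content
            ByteBuffer buffer = ByteBuffer.wrap(newContent);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            channel.force(true);                 // flush content and metadata to the device
        }                                        // the lock is released first, then the channel is closed
    }
}
```

`channel.tryLock()` is the non-blocking variant if waiting indefinitely for the file to become available is not acceptable.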

Java moving file while writing consistent

My Java application is supposed to read the logging data of a Snort application on a Debian server.
The Snort application runs independently of my evaluation app and writes its logs into a file.
My evaluation app is supposed to check just the new content every 5 minutes. That's why I will move the logfile, so that the Snort application has to create a new file while my app can check the already written data from the old one.
Now the question: How can I ensure that I don't destroy the file if I move it at the moment the Snort application is writing to it? Does Java have functionality to check the current actions on the file so that no data can get lost? Does the OS lock the file while writing?
Thanks for your help, Kn0rK3
Not exactly what you are looking for, but I would do this in a very different way: record either the line number / timestamp of the last entry read from the log file, or the position in a RandomAccessFile (the second option is more efficient, since you can seek straight to it). The next time you read the file, read only from the recorded position to the EOF, and then record the new last-read position.
Also, you can replace the "poll every 5 minutes" strategy with a "poll every time I get an update notification for this file" strategy.
Since I assume that you don't have control of the code of the "Snort" application, I don't think that NIO FileLocks will help you.
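The recorded-position approach could be sketched like this. The class name is hypothetical, and the sketch assumes the new data fits in memory and that the log is UTF-8.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper that remembers how far into the log it has read.
final class LogTail {
    private long position = 0;   // offset of the first byte not yet consumed

    /** Returns everything appended to the log since the previous call. */
    String readNew(Path log) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(log.toFile(), "r")) {
            long length = raf.length();
            if (length < position) {
                position = 0;    // file was truncated or replaced, so start over
            }
            byte[] chunk = new byte[(int) (length - position)];  // assumes less than 2 GB of new data
            raf.seek(position);
            raf.readFully(chunk);
            position = length;
            return new String(chunk, StandardCharsets.UTF_8);
        }
    }
}
```

If the writer is in the middle of a line when this runs, the returned text can end with a partial line, so a real reader would probably only consume up to the last newline and keep the rest for the next call.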
It should not be an issue. Typically a logging application has some sort of file descriptor or stream open to a file. If the file gets renamed, that doesn't affect the writing application in any way -- the name is independent of the contents of the file and its location on disk. Snort should continue to write to the file under its new name until it notices that the file has been renamed, at which point it opens a new log file under the old name and switches to writing to that one.
That's the whole reason why it reopens in the first place. To support this sort of mechanism.
Now the question: How can I ensure that I don't destroy the file in the case...
The only thing you have to worry about is that you are renaming the file to a file-name that does not already exist. I would recommend moving it to a .YYYYMMDD.HHMMSS extension or something.
NOTE: In threaded logging operations, even if the new file has been opened, you may have to wait a bit for all of the threads to switch to the new logging stream. I'm not sure how Snort works but I have seen the log.YYYYMMDD file growing even after the log file was re-opened. I just wait a minute before I consume the renamed logfile. FYI.
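Renaming to a timestamped name that must not already exist could look roughly like this (the class name is hypothetical, and the timestamp is passed in so the caller decides which clock to use):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: renames <name> to <name>.YYYYMMDD.HHMMSS in the same directory.
final class LogRotation {
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd.HHmmss");

    static Path rotate(Path log, LocalDateTime now) throws IOException {
        Path rotated = log.resolveSibling(log.getFileName() + "." + STAMP.format(now));
        // No REPLACE_EXISTING option: the move fails with FileAlreadyExistsException
        // instead of clobbering an earlier rotated file.
        return Files.move(log, rotated);
    }
}
```

After `rotate` returns, the advice above about waiting a little before consuming the renamed file still applies.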

Safe file-based data persistence in Java EE

Is it possible to have a Java EE application (based on Spring Framework, running in Tomcat container) persisting its data in a file on the server?
The scenario is as follows: I have a class with an int field (read from ?? during startup). I want to save it to a file in a safe manner (as safe as possible, meaning surviving a server crash would be appreciated). Is it possible (besides naive file reading/writing)?
Kind regards,
q
Really the only "safe" way to do it is to rely on the underlying file system.
Simply:
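import java.io.*;

public void saveThing(Serializable thing, String fileName) throws Exception {
    String tempFileName = fileName + "_tmp";
    File tempFile = new File(tempFileName);
    FileOutputStream fos = new FileOutputStream(tempFile);
    FileDescriptor fd = fos.getFD();
    ObjectOutputStream oos = new ObjectOutputStream(fos);
    oos.writeObject(thing);
    oos.flush();  // push Java-side buffers to the OS
    fd.sync();    // ask the OS to push its buffers to the disk
    oos.close();
    // renameTo takes a File and reports failure through its return value
    if (!tempFile.renameTo(new File(fileName))) {
        throw new IOException("Could not rename " + tempFileName + " to " + fileName);
    }
}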
What's happening here is first we're writing the file to a temporary file. This ensures that the entire file write succeeds without damaging the original file (for example, if you run out of disk space, the original will be retained as this routine will not finish). However if this routine fails, the lingering temp file will remain, and will need to be cleaned up later.
Once we've written the file, we force the OS to flush any pending writes to the actual disk. Many systems buffer file system writes to ram, and "eventually" write them out to disk. This is for obvious performance reasons. However, should the system crash or lose power between when you closed the file, and the OS decides to flush the writes, you can potentially lose data. This sync is an EXPENSIVE operation.
Finally, once we are sure that we have written the file, and that it is committed to disk (as sure as we can be anyway), we then RENAME the temp file to the actual file name.
Renaming a file on the file system is an atomic operation. It can't partially fail. It either works, or it doesn't. If the two files are on the same file system, the rename is near instantaneous since it simply updates some file system information. If the two are on separate file systems, then the new file must be copied first to the new file system, and then renamed. I ASSUME this is how it is done; I never tested this. I tend to stick to the same file system and avoid the question completely.
This process ensures that the file will be updated, under the correct name, completely, "all at once". The file (under its correct name) never only "partially exists", which is what would happen if you were to simply overwrite the existing file.
Finally, on Windows you may have a problem if there is contention for the original file, since Windows will not delete a file that is opened by something else. Unix has no problem doing this, but Windows does. So you need to ensure through some external means that you have sole access to the file before doing this rename procedure.
The short answer is yes. I actually had to do just that for a project that I did with a university a while back. I posted the code for it on my GitHub: Speak To Me project. In that web app, I persisted user data to file in plain text so it was both human readable and easy for objects to reinitialize themselves from.
So readers of this question might be wondering why I didn't use a database for these purposes. Well, the university that I was working with didn't want to support one. As well, this app had really low traffic; it is a research prototype for testing search interfaces, so it was only used for user studies. Finally, because of the nature of the application, persisting to file kept things really simple. In fact, the data files were later used for post-study analyses. Plus it kept the option open for students who were not great coders to get their feet wet (that... never happened).
Anyhow, my recommendation is that if you are just persisting simple values, then plain text will be fine. If your data has any amount of complexity, then use JSON. XML is a bit heavyweight and really should only be used if your application is large but in that scenario, you shouldn't be persisting to file.
It may be overkill for your situation, but you could use HSQLDB. You can configure it to persist in a file.
For a simpler solution, you can always write to and read from a file. Some issues worth considering:
Use JNDI or a system variable to store the name and path of the file.
Make sure that the user that runs the server has read/write access to the file.
Other than that you can use standard Java File operations
You can use the serializable interface in Java to create persistent objects that you can save and reload from disk.

Check if the folder and all data already exist

My Android application downloads data only on the first launch. The data is ~50 MB with ~2500 files.
1. Is it a good idea to store whether the files got downloaded in SharedSettings? The problem is that if a user clears the application data (maybe by mistake), he has to redownload everything. I manually copy a prepacked database to /data/data/../databases/. Is it a good idea to check if the db exists, and if not, download everything?
if (new File("/data/data/../databases/myDB.db").exists()) { /* don't download */ }
2. Is getting the folder size and checking if it's the same a good way to see if the folder and data are good? Or is there a better way to check if 2 folders are the same?
Thanks.
No, do not put 50MB of data into SharedSettings. That will fall over and die. A set of SharedSettings is stored in XML on disk and entirely loaded into RAM when opened. This also won't keep the user from clearing this data.
For determining whether the data has been downloaded, I would suggest just having a file you make once the download is complete indicating it is done. The user can't selectively remove files. They can clear your data, but that will also clear the sentinel file and you will know you need to re-download. (Also keep in mind you will need to deal with restarting the download if it gets interrupted in the middle.)
Also be sure you correctly handle filesystem operations as described here: http://android-developers.blogspot.com/2010/12/saving-data-safely.html
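The sentinel-file idea might be sketched like this, using plain java.io.File so it also fits Android. The class name and the marker file name are made up for the example.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper: an empty marker file records that the download finished.
final class DownloadMarker {
    private static final String MARKER_NAME = ".download_complete";  // assumed name

    /** True only if a previous run got as far as markComplete. */
    static boolean isComplete(File dataDir) {
        return new File(dataDir, MARKER_NAME).isFile();
    }

    /** Call only after every data file has been downloaded and closed. */
    static void markComplete(File dataDir) throws IOException {
        File marker = new File(dataDir, MARKER_NAME);
        // createNewFile returns false if the marker already exists, which is fine here.
        if (!marker.createNewFile() && !marker.isFile()) {
            throw new IOException("Could not create " + marker);
        }
    }
}
```

On startup the app would check `isComplete` and start or resume the download when it returns false. Because the marker is written last, an interrupted download never looks complete.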
An alternate idea if you're worried about missing data files... If at any point your app looks for a file and it doesn't exist, throw an exception, pass it to a handler that shows a dialog and 'verifies' your data. You can keep a list of all needed data files, and then only download ones that don't exist. Something like a system check, if you will.
That way they don't end up downloading 50MB if they were only missing a couple files they accidentally deleted in root explorer ;-)
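The "system check" could be as simple as comparing a list of required file names against what is on disk. This is a sketch; the class name and the idea of shipping the list of names with the app are assumptions.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which required data files are absent.
final class DataCheck {

    /** Returns the names from requiredNames that do not exist as regular files in dataDir. */
    static List<String> missing(File dataDir, List<String> requiredNames) {
        List<String> result = new ArrayList<>();
        for (String name : requiredNames) {
            if (!new File(dataDir, name).isFile()) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Only the names returned by `missing` would then need to be downloaded again. Note that this checks existence only, so a partially written file would still pass unless sizes or checksums are compared too.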
