Detect a file that will be modified - java

I have a large directory containing files that are modified by a separate system at varying intervals. I am running a watcher on this directory to detect which files are modified.
I'm wondering if there is some sort of trigger that occurs when a file is accessed by the system for modification. If so, the following would apply:
Using Java, is it possible to detect which files are about to be modified and make a temporary backup before that happens?
Alternatively, is it possible to compare the newly modified file against its previous version?
In this scenario, it is impossible to make a backup of every file, as the files are large and there are many of them.
Example:
I have four files:
a.xml
b.xml
c.xml
d.log
b.xml has a new section added.
Is it possible to copy the newly created section into d.log?

I can think of one possible solution to your problem.
Maintain a log file that tracks the lastModified date of each file; you can then check which files have been modified by comparing the current dates against your log file.
--
Jitendra
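A minimal sketch of that bookkeeping, assuming a flat directory and an in-memory map instead of an actual log file. The class and method names (ModificationLog, scan, changedSince) are made up for illustration:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: remembers last-modified times between scans.
final class ModificationLog {

    // Records the last-modified time (epoch millis) of every regular file in dir.
    static Map<String, Long> scan(Path dir) throws IOException {
        Map<String, Long> snapshot = new HashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    snapshot.put(file.getFileName().toString(),
                                 Files.getLastModifiedTime(file).toMillis());
                }
            }
        }
        return snapshot;
    }

    // Names that are new, or whose timestamp differs from the previous scan.
    static Set<String> changedSince(Map<String, Long> previous, Map<String, Long> current) {
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, Long> entry : current.entrySet()) {
            Long before = previous.get(entry.getKey());
            if (before == null || !before.equals(entry.getValue())) {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }
}
```

Note that this only tells you which files changed, not what changed inside them.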

No, you cannot detect that a file is about to be modified, not until someone comes up with a highly accurate future-predicting AI system.
Your best approach would be to maintain a versioned backup of the files. I would start by looking into the design considerations of source code management systems.

How would you know if the files are about to be modified? The system handles all of the file IO. The only way you could do that is to have the program doing the modification trigger the backup, and then make the modifications. For comparison, it depends on what you want. If you want a line-by-line comparison, that should be fairly simple to do using Java's file IO classes. If you just want to check if they are the same or not, you can use a checksum on both files.
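For the "same or not" check, here is a rough sketch using a SHA-256 digest from the standard java.security.MessageDigest API. The class name FileChecksums and its method names are illustrative only, and SHA-256 is just one reasonable choice of checksum:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper for comparing file contents by digest.
final class FileChecksums {

    // Streams the file through SHA-256 so large files are never fully in memory.
    static byte[] sha256(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return digest.digest();
    }

    // True when both files hash to the same digest.
    static boolean sameContent(Path first, Path second) throws IOException {
        return Arrays.equals(sha256(first), sha256(second));
    }
}
```

Since the files here are too large to keep copies of, storing only the 32-byte digest of each file from the previous scan would still let you tell whether the content changed, though not what changed.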

Related

Do I need to call sync on file descriptor after using Files move operation?

I want to move two files to a different directory in the same filesystem.
Concrete example, I want to move /var/bigFile to /var/path/bigFile, and /var/smallFile to /var/path/smallFile.
Currently I use Files.move(source, target), without any options, moving the small file first and the big file second. I need this order because another process is waiting for these files to arrive, and the order is important.
The problem is that I sometimes see a creation date for the small file that is later than the creation date for the big file, as if the move order were not followed.
Initially I thought I had to do a sync, but that does not make sense.
Given that the move is actually a simple rename, there are no system buffers involved that would need to be flushed to disk.
Timestamp for the files was checked using ls -alrt command.
Does anyone have any idea what could be wrong?

Check a file if it's completely copied

I'm using Java to parse XML files that arrive over FTP. The problem is that a file I pick up may still be being copied or modified by the FTP transfer, so I need a method that can check whether the file has been completely written.
I've tried using the File::canWrite method (which didn't work at all) and looking for the closing tag of the XML file, but neither works correctly in every case. File::renameTo is pretty slow and doesn't look decent, although it works (though not in every case either). Is there a good, fast way to check whether a file has been completely copied?
Thanks a lot!
Short answer: no. The best practice is to write to a file with a temporary name, for example somefile.part, and rename it when done. The writing program needs to do that. The workaround when you don't control the writing application is to check the modification time and ensure that some reasonable time has passed since the most recent change, perhaps a minute. Then you assume that the file is complete.
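The modification-time workaround could look roughly like this. QuietPeriodCheck and looksComplete are hypothetical names, and the current time is passed in as a parameter only to make the check easy to test:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper implementing the "wait until it stops changing" heuristic.
final class QuietPeriodCheck {

    // True when the file has not been modified for at least quietPeriod before now.
    static boolean looksComplete(Path file, Duration quietPeriod, Instant now) throws IOException {
        Instant lastModified = Files.getLastModifiedTime(file).toInstant();
        return !lastModified.plus(quietPeriod).isAfter(now);
    }
}
```

A call such as looksComplete(file, Duration.ofMinutes(1), Instant.now()) matches the one-minute suggestion. This remains a heuristic: a stalled transfer that resumes after the quiet period would still be treated as complete.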

Mule: How to track non-deletable & non-movable files

I have a directory with files that cannot be removed because they are used by other applications or have read only properties. This means that I can't move or delete the files like Mule does as a natural file tracking system. In order to process these files through Mule once they arrive or when they get updated without deleting/moving them from the original directory I need some sort of custom tracking. To do this I think I need to add some rules and be able to track files that are:
New files
Processed files
Updated files
For this, I thought of having a log file in the same directory that would track each file by name and date modified, but I'm not sure if this is the correct way of doing this. I would need to be able to write and read this log file and compare its content with current files in the directory in order to determine which files are new or updated. This seems to be a bit too complicated and requires me to add quite a bit of programming (maybe as groovy scripts or overriding some methods).
Is there any other simpler way to do this on Mule? If not, how should I start tackling this problem? I'm guessing I can write some java to talk to File EndPoint.
As Victor Romero pointed out, Idempotent Filter does the trick. I tried two types of Idempotent Filter to see which one works best: Idempotent Message Filter and Idempotent Secure Hash Message Filter. Both of them did the job, however I ended up using Idempotent Message Filter (no Hash) to log timestamp and filename in the simple-text-file-store.
Just after the File inbound-endpoint:
<idempotent-message-filter idExpression="#[message.inboundProperties.originalFilename+'-'+message.inboundProperties.timestamp]" storePrefix="prefix" doc:name="Idempotent Message">
<simple-text-file-store name="uniqueProcessedMessages" directory="C:\yourDirectory"/>
</idempotent-message-filter>
Only new or modified files for the purposes of my process would pass through. However Idempotent Secure Hash Message Filter should do a better job at identifying different files.

how to write into a text file in Java

I am doing a project in Java in which I need to add to and modify my text file at runtime. The file is bundled in the jar.
Using class.getResourceAsStream(filename), we can read that file from the class path.
I want to write into the same text file.
What is a possible solution for this?
If I can't update the text file in the jar, what other solution is there?
Appreciate any help.
The easiest solution here is to not put the file in the jar. It sounds like you are putting files in your jar so that your user only needs to worry about one file that contains everything related to that program. This is an artificial constraint and just adds headaches.
There is a simple solution that still allows you to distribute just the jar file. At start up, attempt to read the file from the file system. If you don't find it, use default values that are encoded in your program. Then when changes are made, you can write it to the file system.
In general, you can't update a file that you located using getResourceAsStream. It might be a file in a JAR/ZIP file ... and writing it would entail rewriting the entire JAR file. It might be a remote file served up by a Url classloader.
For your sanity (and good practice), you should not attempt to update files that you access via the classpath. If you need to, read the file out of the JAR file (or whatever), copy it into the regular file system, and then update the copy.
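The copy-then-update approach might be sketched like this. EditableCopy and ensureEditableCopy are made-up names, and the target location is whatever writable path suits your application:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: extracts a classpath resource to the file system once.
final class EditableCopy {

    // Copies the resource to target on first use; later calls keep the existing copy.
    static Path ensureEditableCopy(Class<?> owner, String resourceName, Path target)
            throws IOException {
        if (Files.notExists(target)) {
            try (InputStream in = owner.getResourceAsStream(resourceName)) {
                if (in == null) {
                    throw new IOException("Resource not found on classpath: " + resourceName);
                }
                if (target.getParent() != null) {
                    Files.createDirectories(target.getParent());
                }
                Files.copy(in, target);
            }
        }
        return target;
    }
}
```

All later reads and writes then go to the returned path, and the copy inside the JAR serves only as the initial default.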
I'm not saying that it is impossible to do this in all cases. Indeed, in most normal cases you can do it with some effort. However, this is not supported, and there are no standard APIs for doing this.
Furthermore, attempts to update resources are liable to cause anomalies in the classloader. For example, I'd expect resources in JAR files to not update (from the perspective of the application) until the application restarted. But resources in exploded JAR files probably would update ... though new resources might not show up.
Finally, there are cases where updating a resource is impossible:
When the user doesn't have write access to the application's installation directory. This is typical for a properly administered UNIX / Linux machine.
When the JAR file is fetched from a remote server, you are likely not to be able to write the updates back.
When you are using an arbitrary custom classloader, you've got no way of knowing where the actual bytes of an updated resource should be stored, and no way of storing them.
All JAR rewriting techniques in Java look similar. Open the Jar file, read all of its contents, and write a new Jar file containing the unmodified contents (and the modifications you wished to make). Such techniques are not advisable for a Jar file on the class path, much less a Jar file you're running from.
If you decide you must do it this way, Java World has a few articles:
Modifying Archives, Part 1
Modifying Archives, Part 2
A good solution that avoids the need to put your items into a Jar file is to read (if present) a properties file out of a hidden subdirectory in the user's home directory. The logic looks a bit like this:
if (the hidden directory named after my application doesn't exist) {
makeTheHiddenDirectory();
writeTheDefaultPropertiesFile();
}
Properties appProps = new Properties();
appProps.load(new FileInputStream(fileInHiddenDir));
...
... After the appProps have changed ...
...
appProps.store(new FileOutputStream(fileInHiddenDir), "Do not modify this file");
Look to java.util.Properties, and keep in mind that they have two different load and store formats (key = value based and XML based). Pick the one that suits you best.
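A runnable variant of the logic above, using the key = value format and try-with-resources so the streams get closed. AppSettings, loadOrCreate and save are illustrative names, not an established API:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper around java.util.Properties.
final class AppSettings {

    // Loads settings from file, creating it from defaults on the first run.
    static Properties loadOrCreate(Path file, Properties defaults) throws IOException {
        if (Files.notExists(file)) {
            if (file.getParent() != null) {
                Files.createDirectories(file.getParent());
            }
            save(file, defaults);
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        return props;
    }

    // Writes the settings in the key = value format, replacing the old file.
    static void save(Path file, Properties props) throws IOException {
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "Application settings");
        }
    }
}
```

The file could live at something like Paths.get(System.getProperty("user.home"), ".myapp", "app.properties"), where ".myapp" and "app.properties" are placeholders for your own names.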
If I can't update the text file in the jar, what other solution is there?
Store the information in any of:
Cookies
The server
Deploy the applet using 1.6.0_10+, launch it using JWS and use the PersistenceService to store the information. Here is my demo. of the PersistenceService.
Also, if your users will agree to a trusted applet (which seems overkill for this), you might write the information to a sub-directory of user.home.

How to be sure a file has been successfully written?

I'm adding autosave functionality to a graphics application in Java. The application periodically autosaves the current document and also autosaves on exit. When the user starts the application, the autosave file is reloaded.
If the autosave file is corrupted in any way (I assume a power cut when the file is in the middle of being saved would do this?), the user will lose their work. How can I prevent such situations and do all I can to guarantee that the autosave document is in a consistent state?
To further complicate matters, to autosave the document I need to save one .xml file and several .png files. Also, the .png saving occurs in C code over JNI.
My current strategy is to write each .png with the extension .png.tmp, write the .xml file with the extension .xml.tmp, and then rename each file to remove the .tmp part leaving the .xml until last. On startup, I only load the autosave document if I can find a .xml file and ignore .xml.tmp files. I also don't delete the previous autosave document until the .xml.tmp file for the new document is renamed.
I guess my knowledge of what happens when you write to disk is poor. I know you can have software read/write buffers when using files, as well as OS and hardware buffers and that all of these need to be flushed. I'm confused how I can know for sure when something really has been written to disk and what I can do to protect myself. Does the renaming operation do anything to make sure buffers are flushed?
If the autosave file is corrupted in any way (I assume a power cut when the file is in the middle of being saved would do this?), the user will lose their work. How can I prevent such situations and do all I can to guarantee that the autosave document is in a consistent state?
To prevent loss of data due to partially written autosave file, don't overwrite the autosave file. Instead, write to a new file each time, and then rename it once the file has been safely written.
To guard against not noticing that an autosave file has not been correctly written:
Pay attention to the exceptions thrown as the autosave file is written and closed, in case of a disc error, a full file system, etc.
Keep a running checksum of the file as it is written and write it at the end of the file. Then when you load the autosave file, check that the checksum is there and is correct.
If the checkpointed state involves multiple files, make sure that you write the files in a well known order (without overwriting!), and write the checksum on the autosave file after all of the other files have been safely closed. You might want to create a directory for each checkpoint.
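The checksum idea can be sketched as follows. This simplified version assumes the payload fits in memory and uses CRC32, which is an arbitrary choice here; for large files, java.util.zip.CheckedOutputStream can keep a running checksum while you stream. ChecksummedFile and its methods are hypothetical names:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical helper: stores a CRC32 trailer after the payload.
final class ChecksummedFile {

    // Writes payload followed by an 8-byte trailer holding its CRC32 value.
    static void write(Path file, byte[] payload) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer out = ByteBuffer.allocate(payload.length + Long.BYTES);
        out.put(payload).putLong(crc.getValue());
        Files.write(file, out.array());
    }

    // Returns the payload, or throws if the trailer is missing or does not match.
    static byte[] readVerified(Path file) throws IOException {
        byte[] all = Files.readAllBytes(file);
        if (all.length < Long.BYTES) {
            throw new IOException("Missing checksum: " + file);
        }
        int payloadLength = all.length - Long.BYTES;
        CRC32 crc = new CRC32();
        crc.update(all, 0, payloadLength);
        long stored = ByteBuffer.wrap(all, payloadLength, Long.BYTES).getLong();
        if (stored != crc.getValue()) {
            throw new IOException("Checksum mismatch: " + file);
        }
        return Arrays.copyOf(all, payloadLength);
    }
}
```

A file that was cut off mid-write fails the check on load, so the application can fall back to an older autosave instead of loading garbage.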
FOLLOW UP
No. I'm not saying that rename always succeeds. However, it is atomic - it either succeeds (and completes) or the file system is not changed. So, if you do this:
write "file.new" and close,
delete "file",
rename "file.new" to "file"
then provided the first step succeeds you are guaranteed to have the latest "file" safely on disc. And it is simple to add a couple of steps so that you have a backup of "file" at all times. (If the 3rd step fails, you are left with "file.new" and no "file". This can be recovered manually, or automatically by the application next time you run it.)
Also, I'm not saying that writes always succeed, or that applications don't crash, or that the power never goes off. And the point of the checksum is to allow you to detect the cases where these things have happened and the autosave file is incomplete.
Finally, it is a good idea to have two autosaves in case your application gets itself into a state where its data structures are messed up and the last autosave is nonsensical as a result. (The checksum won't protect against this.) Be cautious about autosaving when the application crashes for the same reason.
As an aside, since you have several different files as part of this one document, consider using either a project directory to hold them all together, or using some encapsulation format (like .zip) to put them all inside one file.
What you want to do is atomically replace the old backup files with new ones. Unfortunately, I don't believe that Java gives you enough control to do this directly. You also need to reason about what operations are atomic in the underlying operating system. I know Linux file systems, so my answer will be biased towards a Java program running on that system. I would be shocked if Windows didn't do the same thing, but I can't say for certain.
Most Linux file systems (e.g. the meta-data journaled ones) let you rename files atomically. If the system crashes half-way through a rename, when you restart, it will be as if you never renamed a file in the first place. For this reason, a common way to atomically update an existing file F is to write your new data to a temporary file T and then rename T to F. Any system or application crash up to that rename will not affect F, so it will always be consistent.
Of course, before you rename, you need to make sure that your temporary file is consistent. Make sure that all streaming buffers for the file are flushed to the OS (OutputStream.flush()) and that the OS buffers are flushed to the disk (FileChannel.force() or FileOutputStream.getFD().sync()). Of course, unless your OS disables the write cache on the hard disk itself (it probably hasn't), there's still a chance that your data can be corrupted. Add a checksum to the XML if you want to be really sure. If you're truly paranoid, you should flush the OS and hard disk buffer caches and re-read the file to verify that it is consistent. This is beyond any reasonable expectation for normal consumer applications.
But that's just to atomically write a single file. Your problem is more complex: you have many files to update atomically. For example, I'll say that you have two files, img.png and main.xml. I'd do one of these:
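As a sketch of the single-file case: since Java 7, java.nio.file offers Files.move with StandardCopyOption.ATOMIC_MOVE, which can be combined with FileChannel.force. AtomicFileWriter and the ".tmp" suffix are illustrative choices. Whether an existing target is replaced under ATOMIC_MOVE is implementation specific according to the Files.move documentation, so treat the replace behaviour as an assumption that holds on typical Linux file systems:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: write to a temp file, force it to disk, then rename.
final class AtomicFileWriter {

    // Writes data to a sibling temp file and renames it over target when done.
    static void write(Path target, byte[] data) throws IOException {
        Path temp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileChannel channel = FileChannel.open(temp,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // Ask the OS to push file content and metadata to the storage device.
            channel.force(true);
        }
        // Throws AtomicMoveNotSupportedException if the file system cannot do it.
        Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

If the process dies before the move, the target still holds the previous version and only the .tmp file is left behind.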
The easy solution is to make a per-savefile directory. You wouldn't need to worry about renaming each individual file, and you could still atomically rename the new backup dir over the old backup dir you're replacing. That is, if your old backup is bak/img.png and bak/main.xml, write bak.tmp/img.png and bak.tmp/main.xml and rename bak.tmp to bak.
Name the new auxiliary files something else and let them coexist with the old ones for a little while. That is, write img.2.png and main.xml.tmp (which should refer to img.2.png, not img.png) and only rename main.xml.tmp to main.xml. Then delete img.png.
Addition: If you don't have atomic renames, the next best thing extends the second option above. Whenever you save the project, give it a new name (e.g. ver342.xml). When you load, just find the most recent XML that is consistent (i.e. its checksum verifies). Keep around 2 or 3 to be safe. Only delete an auto-save if you have successfully restored from a more recent copy.
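Picking the most recent consistent autosave could be sketched like this. The verN.xml naming follows the example above; VersionedAutosaves and newestValid are hypothetical names, and the validity check is passed in so that any checksum scheme can be plugged in:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the newest autosave that passes a validity check.
final class VersionedAutosaves {

    private static final Pattern NAME = Pattern.compile("ver(\\d+)\\.xml");

    // Returns the highest-numbered verN.xml in dir for which isValid returns true.
    static Optional<Path> newestValid(Path dir, Predicate<Path> isValid) throws IOException {
        Path best = null;
        long bestVersion = -1;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "ver*.xml")) {
            for (Path file : files) {
                Matcher m = NAME.matcher(file.getFileName().toString());
                if (!m.matches()) {
                    continue; // e.g. "version.xml" matches the glob but not the pattern
                }
                long version = Long.parseLong(m.group(1));
                if (version > bestVersion && isValid.test(file)) {
                    best = file;
                    bestVersion = version;
                }
            }
        }
        return Optional.ofNullable(best);
    }
}
```

An empty result means no autosave verified, in which case the application would start with a blank document.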
