java - File lastModified vs reading the file

I am using a file and need to update a value in Java when the file is modified.
So I am thinking of checking the modified time using lastModified of the File class and, if it has changed, reading the file and updating a single property from it.
My doubt is whether lastModified is as heavy as reading a single property from the file or reading the whole file, because my tests show almost the same results for both.
So is it better to read the file and update the property every time, or is checking lastModified the better option in the long run?
Note: This operation is performed every minute.
Or is there any better option than polling lastModified to check whether the file has changed? I am using Java 6.

Because you are using Java 6, checking the modified date or file contents is your only option (there's another answer that discusses using the newer java.nio.file functionality, and if you have the option of moving to Java 7, you should really, really consider that).
To answer your original question:
You didn't specify the location of the file (i.e. is it on a local disk or a server somewhere else) - I'll respond assuming local disk, but if the file is on a different machine, network latencies and netbios/dfs/whatever-network-file-system-you-use will exacerbate the differences.
Checking the modified date on a file involves reading the metadata from disk. Checking the contents of the file requires reading the file contents from disk (if the file is small, this will be one read operation; if the file is larger, it could be multiple read operations).
Reading the content of the file will probably involve read/write lock checking. Generally speaking, checking the modified date on the file will not require read/write lock checking (depending on the file system, there may still be consistency locks occurring on the metadata disk page, but those are generally lighter weight than file locks).
If the file changes frequently (i.e. you actually expect it to change every minute), then checking the modified date is just overhead - you are going to read the file contents in most cases anyway. If the file doesn't change frequently, then there would definitely be an advantage to modified date checking if the file was large (and you had to read the entire file to get at your information).
If the file is small and doesn't change frequently, then it's pretty much a wash. In most cases, the file contents and the file metadata are already going to be paged into RAM, so both operations are a relatively efficient check of contents in RAM.
I personally would do the modified date check just because it logically makes sense (and it protects you from the performance hit if the file size ever grows above one disk page), but if the file changes frequently, then I'd just read the file contents. But really, either way is fine.
And that brings us to the unsolicited advice: my guess is that the performance of this operation isn't a big deal in the greater scheme of things. Even if it took 1000X longer than it does now, it probably still wouldn't impact your application's primary purpose or performance. So my real advice here is to just write the code and move on. Don't worry about its performance unless this becomes a bottleneck for your application.
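As a rough illustration of the modified-date check described above, here is a minimal sketch that stays within Java 6 APIs. The class name `CachedProperty` and the choice of `java.util.Properties` as the file format are assumptions made for the example; the question does not say how the property is stored.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

/** Hypothetical helper: re-reads a properties file only when its timestamp changes. */
class CachedProperty {
    private final File file;
    private final String key;
    private long lastSeen = -1L; // timestamp of the version loaded last
    private String value;        // cached property value

    CachedProperty(File file, String key) {
        this.file = file;
        this.key = key;
    }

    /** Intended to be called from the once-a-minute job. */
    synchronized String get() throws IOException {
        long modified = file.lastModified(); // 0L if the file does not exist
        if (modified != lastSeen) {
            Properties props = new Properties();
            InputStream in = new FileInputStream(file);
            try { // Java 6 style: no try-with-resources
                props.load(in);
            } finally {
                in.close();
            }
            value = props.getProperty(key);
            lastSeen = modified;
        }
        return value;
    }
}
```

One caveat of this approach: if the file is rewritten without its timestamp changing (for example, two writes within the file system's timestamp granularity), the cached value stays stale until the next change.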

Quoting from The Java Tutorials:
To implement this functionality, called file change notification, a program must be able to detect what is happening to the relevant directory on the file system. One way to do so is to poll the file system looking for changes, but this approach is inefficient. It does not scale to applications that have hundreds of open files or directories to monitor.
The java.nio.file package provides a file change notification API, called the Watch Service API. This API enables you to register a directory (or directories) with the watch service. When registering, you tell the service which types of events you are interested in: file creation, file deletion, or file modification. When the service detects an event of interest, it is forwarded to the registered process. The registered process has a thread (or a pool of threads) dedicated to watching for any events it has registered for. When an event comes in, it is handled as needed.
Edit: Thanks to Kevin Day for pointing out in the comments that, since you are using Java 6, this might not work for you. There is an alternative available in Apache Commons IO, but I have not worked with it, so you will have to check it yourself :)
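For readers who can use Java 7 or later, the Watch Service API quoted above can be sketched roughly as follows. The class and method names (`FileChangeWaiter`, `awaitChange`) are made up for this example, and how quickly events arrive depends on the platform's watch implementation.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper built on java.nio.file.WatchService (Java 7+). */
class FileChangeWaiter {
    /**
     * Blocks until fileName inside dir is created or modified, or the timeout
     * elapses. Returns true if a matching event was seen.
     */
    static boolean awaitChange(Path dir, String fileName, long timeoutMillis)
            throws IOException, InterruptedException {
        try (WatchService watcher = dir.getFileSystem().newWatchService()) {
            // A watch service watches directories, not individual files
            dir.register(watcher,
                    StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_MODIFY);
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (true) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                WatchKey key = watcher.poll(remaining, TimeUnit.NANOSECONDS);
                if (key == null) {
                    return false; // timed out
                }
                for (WatchEvent<?> event : key.pollEvents()) {
                    // context() is the path relative to dir (null for OVERFLOW)
                    Object context = event.context();
                    if (context instanceof Path && context.toString().equals(fileName)) {
                        return true;
                    }
                }
                if (!key.reset()) {
                    return false; // directory is no longer accessible
                }
            }
        }
    }
}
```

A caller would typically loop on `awaitChange` and re-read the file whenever it returns true, instead of polling `lastModified` on a timer.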

Related

Inter-process file exchange: efficiency and race conditions

The story:
A few days ago I was thinking about inter-process communication based on file exchange. Say process A creates several files during its work and process B reads these files afterwards. To ensure that all files were correctly written, it would be convenient to create a special file whose existence signals that all operations are done.
Simple workflow:
process A creates file "file1.txt"
process A creates file "file2.txt"
process A creates file "processA.ready"
Process B is waiting until file "processA.ready" appears and then reads file1 and file2.
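The workflow above could be sketched as follows (Java 7+). The file names come from the workflow itself; the class and method names, the file contents, and the 50 ms polling interval are assumptions chosen for the example. Note that this sketch does not force data to disk, so it says nothing about what survives a crash.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of the "ready marker" workflow. */
class ReadyMarkerExchange {
    static final String MARKER = "processA.ready";

    /** Process A: write the data files first and create the marker last. */
    static void produce(Path dir) throws IOException {
        Files.write(dir.resolve("file1.txt"), "first".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("file2.txt"), "second".getBytes(StandardCharsets.UTF_8));
        // Throws FileAlreadyExistsException if a marker is already present
        Files.createFile(dir.resolve(MARKER));
    }

    /** Process B: poll for the marker, then read both files. Returns null on timeout. */
    static String consume(Path dir, long timeoutMillis)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!Files.exists(dir.resolve(MARKER))) {
            if (System.currentTimeMillis() >= deadline) {
                return null;
            }
            Thread.sleep(50);
        }
        String a = new String(Files.readAllBytes(dir.resolve("file1.txt")), StandardCharsets.UTF_8);
        String b = new String(Files.readAllBytes(dir.resolve("file2.txt")), StandardCharsets.UTF_8);
        return a + "," + b;
    }
}
```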
Doubts:
File operations are performed by the operating system, specifically by the file subsystem. Since implementations can differ between Unix, Windows and MacOS, I'm uncertain about the reliability of inter-process communication based on file exchange. Even if the OS guarantees this consistency, there are things like the JIT compiler in Java, which can reorder program instructions.
Questions:
1. Are there any real specifications on file operations in operating systems?
2. Is JIT really allowed to reorder file operation program instructions for a single program thread?
3. Is file exchange still a relevant option for inter-process communication nowadays, or is it unconditionally better to choose TCP/HTTP/etc?
You don't need to know OS details in this case. The Java IO API is documented well enough to tell whether a file was saved or not.
The JVM can't reorder native calls. This is not written explicitly in the JMM, but it is implied. The JVM can't guess the impact of a native call, and reordering those calls could be quite dangerous.
There are some disadvantages of using files as a way of communication:
It uses IO which is slow
It is difficult to split the processes across different machines should you ever need to (there are ways, using Samba for example, but they are quite platform-dependent)
You could use File watcher (WatchService) in Java to receive a signal when your .ready file appears.
Reordering could apply, but it shouldn't hurt your application logic in this case. Refer to the following link:
https://assylias.wordpress.com/2013/02/01/java-memory-model-and-reordering/
I don't know the size of your data, but I feel it would still be better to use a Message Queue (MQ) solution in this case. File IO is a relatively slow operation, which could slow down the system.
I used a file-exchange-based approach on one of my projects. It is based on renaming file extensions when a process is done, so that the next process can pick files up by matching on the file name.
The FTP process downloads a file and gives it the extension '.downloaded'.
The main task processor searches the directory for '*.downloaded' files.
Before starting, the job renames the file to '.processing'.
When finished, it renames the file to '.done'.
In case of error, it creates a new supplementary file with the '.error' extension and puts the last processed line and the exception trace there. On retries, if this file exists, the job reads it and resumes from the correct position.
The locator process searches for '.done' files and, according to its config, moves them to a backup folder or deletes them.
This approach is working fine under a huge load in a mobile operator network.
One point to consider: using unique file names is important, because the behaviour of moving a file differs between operating systems.
E.g. Windows gives an error when a file with the same name exists at the destination, whereas Unix overwrites it.
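A single state transition in such a scheme might look like the sketch below (Java 7+). The class and method names are hypothetical, not taken from the project described above. It relies on `Files.move` without `REPLACE_EXISTING`, which is documented to fail if the target already exists, so the outcome does not depend on the operating system. The sketch does not claim that the existence check and the rename are atomic, so two competing processors would still need unique file names or some other coordination.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for the extension-renaming workflow. */
class ExtensionStateMachine {
    /** Renames e.g. job1.downloaded to job1.processing and returns the new path. */
    static Path advance(Path file, String fromExt, String toExt) throws IOException {
        String name = file.getFileName().toString();
        if (!name.endsWith(fromExt)) {
            throw new IllegalArgumentException(name + " is not in state " + fromExt);
        }
        String base = name.substring(0, name.length() - fromExt.length());
        Path target = file.resolveSibling(base + toExt);
        // No REPLACE_EXISTING: throws FileAlreadyExistsException if target exists
        return Files.move(file, target);
    }
}
```

For example, `advance(path, ".downloaded", ".processing")` would be called before a job starts, and `advance(path, ".processing", ".done")` when it finishes.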

Lock on existence of file in Java

Short version: Why should File.createNewFile() not be used for file locking? Or more specifically: Are there issues if it is used to lock an application's data directory?
Details:
I would like to protect my application's data directory using a lock file: If the lock file exists, the directory is locked and the application exits with an error message. If it does not exist, it will be created and the application continues. On exit the file will be deleted.
The lock will not be created that often (i.e. performance is not an issue) and I have no problem with manually deleting the lock file in case of some error (i.e. failing to delete the file is not an issue).
The code looks something like this:
File lockFile = new File("lock");
boolean lockCreated = lockFile.createNewFile();
if (lockCreated)
{
    // do stuff
    lockFile.delete();
}
else
{
    System.err.println("Lockfile exists => please retry later");
    // alternative: Wait and retry e.g. 5 times
}
Now I'm a bit confused about the Javadoc of createNewFile():
Atomically creates a new, empty file named by this abstract pathname if and only if a file with this name does not yet exist. The check for the existence of the file and the creation of the file if it does not exist are a single operation that is atomic with respect to all other filesystem activities that might affect the file.
Note: this method should not be used for file-locking, as the resulting protocol cannot be made to work reliably. The FileLock facility should be used instead.
What are the potential problems mentioned in the note, considering the existence check and file creation are atomic?
This forum post from December 2007 indicates there are "significant platform differences" according to the Javadoc of File.delete() (although I cannot find such a statement since at least Java SE 1.4.2). But even if there were such differences: Could they really cause the locking to fail (i.e. two processes think the data directory is usable at the same time)?
Note: I do not want any of the following:
Lock a file so that no other process can access and/or modify it (most information I found seems to discuss this issue).
Make sure no other process can remove the lock.
Synchronize multiple threads of the same JVM (although I think my solution should be able to handle that too).
The Javadoc of Files.createFile(…), part of java.nio.file available since Java 7, repeats the promise of atomicity but does not mention anything about file based locking.
My reasoning:
Either the newer method (from java.nio.file.Files) is affected by the same (or similar) problems as the older one (from java.io.File) and the Javadoc is simply missing this information…
… or the newer method actually behaves more predictably and correctly.
Given that the error handling and specification in java.nio.file have generally been improved compared to the File class (existing ever since JDK 1.0, with createNewFile() added in 1.2), I assume the second alternative is correct.
My conclusion: Using Files.createFile(…) is fine for this use case.
The short answer: reliable file based locking in Java is not practical.
The long answer: The issue with file based locking, in any OS, always comes down to what kind of storage system the file comes from. Almost all network accessed file systems (NFS, SAMBA, etc) have very unreliable (or at least unpredictable) synchronizations on file creates or deletes that make a general Java-ish approach inadvisable. In certain OSes, using local file systems, you can sometimes get what you desire. But you need to understand the underlying file system and its characteristics and proceed with care.

How to consistently access a file?

I'm looking for a way to select a file and a time frame (or wait until a certain action is performed), and "use" or "read" the file for that amount of time, just as a way to keep other programs from accessing it. The quality assurance department at my company needs an application like this, and I believe it's possible to make, but I'm not sure how to approach it. Possibly "read" the file over and over until the time is reached or an action is performed?
Any ideas?
For Java, the answer would be to use a FileLock, which maps to the native mechanism of the operating system.
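A minimal sketch of holding a FileLock for as long as needed could look like this (Java 7+ for `FileChannel.open`). The class name `ExclusiveFileHolder` is invented for the example. Keep in mind that, per the FileLock documentation, whether a lock actually prevents other programs from accessing the file is system-dependent, so this may only keep out programs that also ask for the lock.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical wrapper that holds an exclusive FileLock until close() is called. */
class ExclusiveFileHolder implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private ExclusiveFileHolder(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    /** Returns a holder if the lock was acquired, or null if it is already held. */
    static ExclusiveFileHolder tryAcquire(Path file) throws IOException {
        FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = null;
        try {
            lock = ch.tryLock(); // null if another program holds an overlapping lock
        } catch (OverlappingFileLockException e) {
            // this JVM already holds a lock on the file; treat as "not acquired"
        } finally {
            if (lock == null) {
                ch.close();
            }
        }
        return lock == null ? null : new ExclusiveFileHolder(ch, lock);
    }

    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }
}
```

The holder can be kept open for a fixed time frame or until the triggering action happens, and then closed to release the file.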
On Linux you can block your file access using system calls like flock.
A rather low-tech alternative can be:
Read the file
Keep it in memory.
Delete the file.
Work with your memory copy.
Dump your memory copy file into a new file with the same name when you have finished.
This second method is limited by the file size and your system's memory; besides, you can lose your file if the system stops working before reaching step 5. It is just a silly alternative to system calls. I would prefer to use system API services such as flock.

Reading a file that is being written to - Locking it?

There is a file - stored on an external server which is updated very frequently by a vendor. My application polls this file every minute getting the values out. All I am doing is reading the file.
I am worried that by doing this I could inadvertently lock the file so it can't be written to by the vendor. Is this a possibility?
Kind regards
Further to Eric's answer - you could check the Last Modified Property of the temp file and only merge it with your 'working' file when it changes - that should protect you from read/write conflicts and only merge files just after the vendor has written to the temp. Though this is messy and mrab's comment is valid, a better solution should be found.
I have faced this problem several times, and as Peter Lawrey says, there isn't any portable way to do this. If your environment is Unix, this should not be an issue at all, as these concurrent access conditions are properly managed by the operating system. However, Windows does not handle this at all (yes, that's the consequence of using an amateur OS for production work, lol).
Now that's said, there is a way to solve this if your vendor is flexible enough. They could write to a temp file and when finished move the temp file to the final destination. By doing this you avoid any concurrent access to the file between you and the vendor.
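If the vendor's writer happened to be written in Java 7 or later, the write-to-temp-then-move idea could be sketched as below. The class and method names are hypothetical. `ATOMIC_MOVE` throws `AtomicMoveNotSupportedException` where the file system cannot do an atomic move, and whether an existing target is replaced is implementation-specific according to the `Files.move` documentation, so this should be tested on the actual platform.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical writer-side helper: readers never see a half-written file. */
class AtomicPublisher {
    static void publish(Path target, byte[] content) throws IOException {
        // Temp file in the same directory, so the move stays on one file system
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            Files.write(tmp, content);
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }
}
```

A reader that opens the target file then sees either the old version or the new one. One side effect to be aware of: the published file keeps whatever permissions the temporary file was created with.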
Another way is to know precisely (difficult?) the timing of your vendor's updates and avoid reading the file during certain time frames. For instance, if your vendor updates the file every hour, avoid reading from five to the hour until five past the hour.
Hope it helps.
There is the Windows Shadow Copy service for volumes. This would allow you to read the backup copy.
If the third-party software is in Java too and uses a Logger, that should be tweakable: every minute it could write to the next of 10 files or so.
I would try to relentlessly read the file (when modified since last read), till something goes wrong. Maybe you can make a test run with hundreds of reads in the weekend or at midnight, when no harm is done.
My answer:
Maybe you need a local watch program, a watch service for a directory, that waits until the file is modified and then makes a fast copy, after which the copy can be transmitted.

Technology to transfer data with external system

We have an interface with an external system in which we get flat files from them and process those files. At present we run a job a few times a day that checks if the file is at the ftp location and then processes if it exists.
I recently read that it is a bad idea to use file systems as a message broker, which is why I am asking this question. Can someone clarify whether a situation like this one is a good fit for some other tool and, if so, which one?
Ours is a java based application.
The first question you should ask is "is it working?".
If the answer to that is yes, then you should be circumspect about change just because you read it was a bad idea. I've read that chocolate may be bad for you but I'm not giving it up :-)
There are potential problems that you can run into, such as files being deleted without your knowledge, or trying to process files that are only half-transferred (though there are ways to mitigate both of those, such as permissions in the former case, or the use of sentinel files or content checking in the latter case).
Myself, I would prefer a message queueing system such as IBM's MQ or JMS (since that's what they're built for, and they do make life a little easier) but, as per the second paragraph above, only if either:
problems appear or become evident with the current solution; or
you have some spare time and money lying around for unnecessary rework.
The last bullet needs expansion. While the work may be unnecessary (in terms of fixing a non-existent problem), that doesn't necessarily make it useless, especially if it can improve performance or security, or reduce the maintenance effort.
I would use a database to synchronize your files. Have a database that points to the file locations. Put an entry into the database only when the files have been fully transferred. This ensures that you are picking up completed files. You can poll the database to check whether new entries are present instead of polling the file system. This gives a very simple setup for a polling mechanism. If you would like to be told when a new file appears in the folder, then you would need to go for a message queue.
