Reading current and new files from a directory using Java

I have written a program to process files in a directory. At startup it reads the current files in the directory, and then it uses a monitor to discover new files. Once it has processed a file, the program deletes it. The problem is that there is a time gap, however slight, between reading the files in the directory at startup and starting the listener. A file created in that gap would be missed. One possible solution would be to repeatedly read the files in the directory (newDirectoryStream), but that doesn't seem as elegant, or likely as efficient, as using a monitor. The code uses the Apache Commons monitor and looks something like:
// Read current files
try (DirectoryStream<Path> stream = Files.newDirectoryStream(listenDir)) {
    for (Path file : stream) {
        processFile(file);
    }
}

// Process new files
FileAlterationObserver observer =
        new FileAlterationObserver(listenDir.toAbsolutePath().toString(), filter);
FileAlterationMonitor monitor = new FileAlterationMonitor(POLL_INTERVAL);
FileAlterationListener listener = new FileAlterationListenerAdaptor() {
    @Override
    public void onFileCreate(File file) {
        processFile(file.toPath());
    }
};
observer.addListener(listener);
monitor.addObserver(observer);
monitor.start();

Simply flip it: first set up the listener, then obtain the directory stream. Route both paths through a concurrent set that gives you "process only once" semantics: the caller whose add() returns true (meaning it actually added the file name rather than finding it already present) processes the file; the other one skips it. That way, if a file arrives right in the "sweet spot", both the directory stream and the observer may see it, but only one of them will process it. A sketch of this is below.
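A minimal sketch of that ordering, reusing the names from the question (listenDir, filter, POLL_INTERVAL, processFile) and a ConcurrentHashMap-backed set for the "only once" check; exception handling is elided as in the original snippet:

// Start the monitor first, then sweep the directory; the concurrent set makes
// sure a file seen by both paths is still processed exactly once.
private final Set<String> seen = ConcurrentHashMap.newKeySet();

void startProcessing() throws Exception {
    FileAlterationObserver observer =
            new FileAlterationObserver(listenDir.toAbsolutePath().toString(), filter);
    FileAlterationMonitor monitor = new FileAlterationMonitor(POLL_INTERVAL);
    observer.addListener(new FileAlterationListenerAdaptor() {
        @Override
        public void onFileCreate(File file) {
            if (seen.add(file.getName())) {   // add() returns true only for the first caller
                processFile(file.toPath());
            }
        }
    });
    monitor.addObserver(observer);
    monitor.start();

    // Only now sweep the files that already existed; the set filters out duplicates.
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(listenDir)) {
        for (Path path : stream) {
            if (seen.add(path.getFileName().toString())) {
                processFile(path);
            }
        }
    }
}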

Related

WatchService issues - logfile doesn't refresh on modification

I use a (third-party) Windows 10 application which generates .txt log files. In my own application I wrote a class which uses WatchService and observes the folder for these .txt changes. I know my class works correctly, because I tested it with other files/JUnit. During testing, changes are picked up right away, all correct.
In the case of the .txt logs, nothing gets picked up.
I played around and noticed that my application picks up the updates only if I go to the Explorer window and hit F5 (refresh). The size of the file is refreshed and my WatchService also fires an update.
Any idea why this weird behaviour happens? It is probably at the level of the logging application or Windows itself, but I wonder if anyone can come up with a Java solution to this?
Because there were no suggestions, I will share what I did. It's a solution I don't particularly like, but perhaps it will be helpful to someone.
In order to refresh the folder I simply created a loop, which checks for file sizes. The below code checks sizes of all files, but if you know the exact name of the file, you could also implement this for that particular file.
File folder = new File("C:\\path\\to\\logs\\");
while (true) {
    Thread.sleep(5000);
    File[] listOfFiles = folder.listFiles();
    for (File f : listOfFiles) {
        System.out.println("\tFile size: " + f.length());
    }
}
You can also change the sleep value from 5000 (5 seconds) if you need it checked more or less often.
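If you only care about one known file, a variant of the same workaround (the file and folder names here are placeholders) could look like this:

File logFile = new File("C:\\path\\to\\logs\\engine.txt"); // placeholder name
long lastSize = -1;
while (true) {
    Thread.sleep(5000);              // adjust the interval as needed
    long size = logFile.length();    // reading the size is what nudges the metadata refresh
    if (size != lastSize) {
        lastSize = size;
        System.out.println("Log file changed, new size: " + size);
    }
}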

Java copy-overwrite file, gets old file when reading

In a unit test I am overwriting a config file to test handling bad property values.
I am using Apache Commons IO:
org.apache.commons.io.FileUtils.copyFile(new File(configDir, "xyz.properties.badValue"), new File(configDir, "xyz.properties"), false)
When investigating the file system I can see that xyz.properties is in fact overwritten - size is updated and the content is the same as that of xyz.properties.badValue.
When I complete the test case which goes through code that reads the file into a Properties object (using a FileReader object) I get the properties of the original xyz.properties file, not the newly copied version.
Through debugging where I single step and investigate the file I can rule out it being a timing issue of writing to the file system.
Does the copy step somehow hold a file handle? If so how would I release it again?
If not, does anybody have any idea why this happens and how to resolve it?
Thanks.
If you initialized the FileReader object before the copy, it will still be reading the old version of the file.
You'll need to recreate it:
FileReader f = new FileReader("the.file");
// Copy and overwrite "the.file"
f = new FileReader("the.file");
In the Unix filesystem model, the inode containing the file's contents will persist as long as someone has an open file handle to the file or there is a directory entry pointing to it.
Replacing the file's name in the directory does not remove the inode (the contents of the file), so your already-open file handle can continue to be used.
This is actually exploitable to create temporary files that never need to be cleaned up: create the file, then unlink it immediately while keeping it open. When you close the file handle, the inode is reaped.
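For illustration, a minimal sketch of that pattern in Java, assuming Unix delete semantics (on Windows the delete would typically fail while the stream is open):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class UnlinkedTempFile {
    public static void main(String[] args) throws Exception {
        Path temp = Files.createTempFile("scratch", ".tmp");
        Files.write(temp, "scratch data".getBytes());
        try (InputStream in = Files.newInputStream(temp)) {
            Files.delete(temp);                                 // unlink the name immediately
            System.out.println(new String(in.readAllBytes()));  // the open handle still reads the inode
        }
        // closing the stream released the last reference, so the kernel reclaims the inode
    }
}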
I realize that this doesn't answer your question directly, but I think that it would be better to maintain two separate files, and arrange for your code to have the name of the configuration file configurable / injected at runtime. That way, your tests can specify which config file to use, rather than overwriting a single file.

Java: Efficient way to scan a folder for a particular file

I am contacting an external service from my Java app.
The flow is as follows: I generate an XML file and put it in a folder; the service then processes the file and returns another file with the same name and an .out extension.
Right now, after I put the file in the folder, I loop until I get that file back so I can read the result.
Here is the code:
fileName += ".out";
File f = new File(fileName);
do {
    f = new File(fileName);
} while (!f.exists());
response = readResponse(fileName); // got the response, now read it
My question is: am I doing this the right way, or is there a better/more efficient way to wait for the file?
Some info: I run my app on Windows XP; it usually takes the external service less than a second to respond with a file, and I send around 200 requests per day to this service. The path to the folder with the result file is always the same.
All suggestions are welcome.
Thank you for your time.
There's no reason to recreate the File object. It just represents the file location, whether the file exists or not. Also you probably don't want a loop without at least a short delay, otherwise it'll just max out a processor until the file exists. You probably want something like this instead:
File file = new File(filename);
while (!file.exists()) {
    Thread.sleep(100);
}
Edit: Ingo makes a great point in the comments. The file might not be completely there just because it exists. One way to guarantee that it's ready is to have the first process create a second file after the first is completely written. Then have the Java program detect that second file, delete it, and then safely read the first one.
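A rough sketch of that handshake, assuming the external service (or a wrapper around it) writes an empty marker file such as xyz.done after xyz.out is complete; fileName and readResponse are reused from the question, the marker name is illustrative:

File outFile = new File(fileName);                           // fileName already ends in ".out"
File marker  = new File(fileName.replace(".out", ".done"));  // hypothetical marker written last
while (!marker.exists()) {
    Thread.sleep(100);                                       // avoid a busy loop
}
marker.delete();                                             // consume the marker
response = readResponse(fileName);                           // the .out file is now safe to read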

Renaming a Log4J log file during the program run

We recently switched over from JUL (java.util.logging) to Log4J because I wanted to add additional log files for different logging levels.
We have an option in the program to append a value and a date/time stamp to the log file name at what is, for all intents and purposes, the end of the program's execution.
Because JUL opened and closed the file as needed when writing, the file wasn't locked and we could simply use .renameTo() to change the filename.
Now, using Log4J, that file is left open and locked, which prevents us from renaming it.
I can't decide the name of the file before I configure the logging, because the property file containing the renaming options is read some time after logging is first needed (this is why we renamed the file at the end of the program).
Do you have any suggestions as to how this can be achieved?
Would Logback and/or SLF4J help or hinder this?
I have sort of worked around the issue by using a system parameter in the log4j properties file, setting the property, and then reloading the property file.
This allows me to change the name of the log file to something else at the end of the run, and then rename the old files.
It's inelegant and very much a kludge, so I would like to avoid it, as it also leaves these temporary files around after the run.
One surefire approach would be to implement your own log4j Appender, perhaps based on FileAppender ( http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/FileAppender.html ). Add your own specialized API to request that the file be renamed.
I haven't tried this yet, but the tack I would take would be to use the underlying API setFile(...): http://www.jdocs.com/log4j/1.2.13/org/apache/log4j/FileAppender.html#M-setFile%28String,boolean,boolean,int%29
For example:
public class RenamingFileAppender extends FileAppender {
    ...

    /** fix concurrency issue in stock implementation */
    public synchronized void setFile(String file) {
        super.setFile(file);
    }

    public synchronized void renameFile(String newName) throws IOException {
        // the whole method is synchronized to avoid losing log messages;
        // the implementation could be smarter and keep a short-term queue
        // for any messages that arrive while the file is being renamed
        File currentFile = new File(this.fileName);
        File newFile = new File(newName);
        // do checks to ensure the current file exists, can be renamed, etc.
        ...

        // create a temp file to use while the current log gets renamed
        File tempFile = File.createTempFile("renaming-appender", ".log");
        tempFile.deleteOnExit();

        // tell the underlying impl to use the temporary file, so the current file is flushed and closed
        super.setFile(tempFile.getAbsolutePath(), false, this.bufferedIO, this.bufferSize);

        // rename the recently closed file
        currentFile.renameTo(newFile);

        // now go back to the original log contents under the new name; note append=true
        super.setFile(newFile.getAbsolutePath(), true, this.bufferedIO, this.bufferSize);
    }
}
Consider using a shutdown hook and renaming the file there...
http://onjava.com/pub/a/onjava/2003/03/26/shutdownhook.html
http://www.developerfeed.com/threads/tutorial/understanding-java-shutdown-hook
http://download.oracle.com/javase/1.4.2/docs/guide/lang/hook-design.html
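A minimal sketch of the shutdown-hook idea with log4j 1.x; the final file name shown is a placeholder, and LogManager.shutdown() is used to close the appenders so the lock on the log file is released before the rename:

Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    org.apache.log4j.LogManager.shutdown();            // flush and close all appenders, releasing the file lock
    File current = new File("app.log");                // placeholder for the configured log file
    File renamed = new File("app-" + System.currentTimeMillis() + ".log");
    if (!current.renameTo(renamed)) {
        System.err.println("Could not rename log file " + current);
    }
}));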

How to handle incomplete files? Getting exception

I need to create a Java program that starts a thread to watch for files in a particular folder (the source folder) and, as soon as a file appears there, picks it up for processing (converting it into CSV format). The problem I am facing is that the files arriving in the source folder are large (an FTP tool is used to copy them from a server to the source folder), and the thread picks a file up before it has been fully copied, which throws an exception. How do I make the thread wait until the file has been copied into the source folder completely? It should only pick a file up for processing after the copy is complete.
The safest way is to download the file to a different location and then move it to the target folder.
Another variation mentioned by Bombe is to change the file name to some other extension after downloading and look only for files with that extension.
I only read files that are not in write mode. This is safest, because it means no other process is writing to the file. You can check whether a file is in write mode by using the canWrite method of the File class.
This solution works fine for me, as I have the exact same scenario you are facing.
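A sketch of the check this answer describes, assuming (as the answer does) that canWrite() only returns true once the uploader has finished with the file; how reliable that is depends on the OS and the FTP server, so treat it as an illustration rather than a guarantee. sourceFolder and convertToCsv are placeholder names:

File[] candidates = sourceFolder.listFiles();
if (candidates != null) {
    for (File f : candidates) {
        if (f.canWrite()) {          // assumed to become true only once the copy is done
            convertToCsv(f);         // placeholder for the processing step
        }
    }
}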
You could try different things:
Repeatedly check the last modification date and the size of the file until they stop changing for a given amount of time, then process it; a sketch of this check appears after these options. (As pointed out by qbeuek, this is neither safe nor deterministic.)
Only process files with names that match certain criteria (e.g. *.dat). Change the FTP upload/download process to upload/download files with a different name (e.g. *.dat.temp) and rename the files once they are complete.
Download the files to a different location and move them to your processing directory once they’re complete.
As Vinegar said, if it doesn’t work the first time, try again later. :)
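Here is the promised sketch of the first option, waiting until the size and timestamp stop changing; the 2-second quiet period is an arbitrary choice, and process() is a placeholder:

// Returns true if size and last-modified time did not change during a short quiet period.
private static boolean isStable(File file) throws InterruptedException {
    long size = file.length();
    long modified = file.lastModified();
    Thread.sleep(2000);                                // quiet period
    return size == file.length() && modified == file.lastModified();
}

// usage: poll until the file looks stable, then hand it off for processing
while (!isStable(incoming)) {
    // keep waiting; the file is presumably still being copied
}
process(incoming);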
If you have some control over the process that does the FTP transfer, you could have it create a "flag file" in the source directory immediately AFTER the FTP transfer of the big file has finished.
Then your Java thread checks for the presence of this flag file; if it's present, there is a file ready to be processed in the source directory. Before processing the big file, the thread should remove the flag file.
Flag file can be anything (even an empty file).
Assuming you have no control over the FTP process...
Let it be like this: when you get the exception, try to process the file again next time, and repeat until it gets processed. It's good to keep a few attributes in case of an exception so you can check them later, such as the name, last-modified time, and size.
Check the exact exception before deciding to process the file later; the exception might occur for some other reason.
If your OS is Linux, and your kernel > 2.6.13, you could use the filesystem event notification API named inotify.
There's a Java implementation here : https://bitbucket.org/nbargnesi/inotify-java.
Here's some sample code (heavily inspired by the website).
try {
    Inotify i = new Inotify();
    InotifyEventListener e = new InotifyEventListener() {
        @Override
        public void filesystemEventOccurred(InotifyEvent e) {
            System.out.println("inotify event occurred!");
        }

        @Override
        public void queueFull(EventQueueFull e) {
            System.out.println("inotify event queue: " + e.getSource() + " is full!");
        }
    };
    i.addInotifyEventListener(e);
    i.addWatch(System.getProperty("user.home"), Constants.IN_CLOSE_WRITE);
} catch (UnsatisfiedLinkError e) {
    System.err.println("unsatisfied link error");
} catch (UserLimitException e) {
    System.err.println("user limit exception");
} catch (SystemLimitException e) {
    System.err.println("system limit exception");
} catch (InsufficientKernelMemoryException e) {
    System.err.println("insufficient kernel memory exception");
}
This is in Grails, and I am using the FileUtils library from Apache Commons. The sizeOf function returns the size in bytes.
def fileModified = sourceFile.lastModified()
def fileSize = FileUtils.sizeOf(sourceFile)

Thread.sleep(3000) // sleep so we can compare sizes in case the file is currently being copied
if ((fileSize != FileUtils.sizeOf(sourceFile)) && (fileModified != sourceFile.lastModified())) {
    // the file is still being copied, so return
    if (log.infoEnabled)
        log.info("File is getting copied!")
    return
}

Thread.sleep(1000) // breather before picking up the file just copied
Please note that this also depends on what utility or OS you are using to transfer the files.
The safest bet is to copy the file that is being (or has been) copied to a different file or directory. The copy process is robust and assures you that the file is present after the copying process. The one I am using is from the Apache Commons IO API:
FileUtils.copyFileToDirectory(File srcFile, File destDir)
If you are copying a huge file that is still in the process of being copied, beware that this will take time, and you might want to run it in a parallel thread, or better, have a separate application dedicated to the transfer process.
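As a rough sketch of that idea (the directory names and convertToCsv are placeholders), the processing thread copies the incoming file into its own working directory and operates only on the copy:

File source = new File("/data/incoming/report.xml");    // file dropped by the FTP process
File workDir = new File("/data/work");                   // private working directory
FileUtils.copyFileToDirectory(source, workDir);          // returns only once the copy is complete
File workingCopy = new File(workDir, source.getName());
convertToCsv(workingCopy);                               // placeholder for the actual processing step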
