How to test if a file is "complete" (completely written) with Java

Let's say you had an external process writing files to some directory, and you had a separate process periodically trying to read files from this directory. The problem to avoid is reading a file that the other process is currently in the middle of writing out, so it would be incomplete. Currently, the process that reads uses a minimum file age timer check, so it ignores all files unless their last modified date is more than XX seconds old.
I'm wondering if there is a cleaner way to solve this problem. If the filetype is unknown (could be a number of different formats) is there some reliable way to check the file header for the number of bytes that should be in the file, vs the number of bytes currently in the file to confirm they match?
Thanks for any thoughts or ideas!

The way I've done this in the past is that the process writing the file writes to a "temp" file, and then moves the file to the read location when it has finished writing the file.
So the writing process would write to info.txt.tmp. When it's finished, it renames the file to info.txt. The reading process then just has to check for the existence of info.txt - and it knows that if it exists, the file has been written completely.
Alternatively, if you don't like using odd file extensions, you could have the writing process write info.txt to a different directory and then move it into the read directory.
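For reference, here is a minimal sketch of the writer side of that pattern using java.nio; the directory and file names are just placeholders for whatever your process actually uses:
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class TempThenRenameWriter {
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get("/data/incoming");   // hypothetical read directory
        Path tmp = dir.resolve("info.txt.tmp");   // write here first
        Path done = dir.resolve("info.txt");      // readers only look for this name

        // Write the full content under the temp name.
        Files.write(tmp, "some payload".getBytes(StandardCharsets.UTF_8));

        // Publish it with a rename; ATOMIC_MOVE fails rather than silently
        // falling back to a copy if the filesystem cannot do it atomically.
        Files.move(tmp, done, StandardCopyOption.ATOMIC_MOVE);
    }
}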

You could use an external marker file. The writing process could create a file XYZ.lock before it starts creating file XYZ, and delete XYZ.lock after XYZ is completed. The reader would then easily know that it can consider a file complete only if the corresponding .lock file is not present.
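As an illustration, a small sketch of both sides of that protocol (file names and locations are assumptions, not part of any particular API):
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class LockMarkerExample {
    // Writer side: create XYZ.lock, write XYZ, then delete the lock.
    static void writeWithLock(File data, byte[] content) throws IOException {
        File lock = new File(data.getParentFile(), data.getName() + ".lock");
        if (!lock.createNewFile()) {
            throw new IOException("lock already present: " + lock);
        }
        try {
            Files.write(data.toPath(), content);
        } finally {
            lock.delete();
        }
    }

    // Reader side: only treat the file as complete if no .lock file exists.
    static boolean isComplete(File data) {
        File lock = new File(data.getParentFile(), data.getName() + ".lock");
        return data.exists() && !lock.exists();
    }
}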

I had no option of using temp files or markers, as the files are uploaded by clients over key-pair SFTP and can be very large in size.
It's quite hacky, but I compare the file size before and after sleeping for a few seconds.
It's obviously not ideal to block the thread, but in our case this merely runs as a background system process, so it seems to work fine:
private boolean isCompletelyWritten(File file) throws InterruptedException {
    long fileSizeBefore = file.length();
    Thread.sleep(3000);
    long fileSizeAfter = file.length();
    System.out.println("comparing file size " + fileSizeBefore + " with " + fileSizeAfter);
    return fileSizeBefore == fileSizeAfter;
}
Note: as mentioned below, this might not work on Windows. It was used in a Linux environment.

One simple solution I've used in the past for this scenario with Windows is to use boolean File.renameTo(File) and attempt to move the original file to a separate staging folder:
boolean success = potentiallyIncompleteFile.renameTo(stagingAreaFile);
If success is false, then the potentiallyIncompleteFile is still being written to.
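A small usage sketch of that idea (the staging folder is an assumption; adjust it to your layout): the reader polls the drop directory and only processes files it manages to move.
import java.io.File;

public class StagingMover {
    // Try to claim a file by moving it into a staging folder; a false return
    // from renameTo() is taken to mean the writer still has it open (Windows).
    static File tryClaim(File candidate, File stagingDir) {
        File staged = new File(stagingDir, candidate.getName());
        return candidate.renameTo(staged) ? staged : null;
    }
}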

This is possible to do using the FileUtils.copyFile() method from the Apache Commons IO library. If you try to copy the file and get an IOException, it means the file has not been completely saved yet.
Example:
public static void copyAndDeleteFile(File file, String destinationFile, long delayThreadPeriod)
        throws InterruptedException {
    try {
        FileUtils.copyFile(file, new File(destinationFile));
    } catch (IOException e) {
        e.printStackTrace();
        Thread.sleep(delayThreadPeriod); // wait a bit before retrying
        copyAndDeleteFile(file, destinationFile, delayThreadPeriod);
    }
}
Or periodically check, with some delay, the size of the folder that contains this file:
FileUtils.sizeOfDirectory(folder);

Even if the number of bytes is equal, the content of the file may be different.
So I think you have to compare the old and the new file byte by byte.

Two options that seem to solve this issue:
The best option: the writer process notifies the reading process somehow that the writing has finished.
Write the file to {id}.tmp, then when finished rename it to {id}.java, and have the reading process run only on *.java files. Renaming takes much less time, so the chance that the two processes touch the file at the same time decreases.

First, there's Why doesn't OS X lock files like windows does when copying to a Samba share? but that's a variation of what you're already doing.
As far as reading arbitrary files and looking for sizes, some files have that information and some do not, but even those that do have no common way of representing it. You would need specific information about each format, and manage each one independently.
If you absolutely must act on the file the "instant" it's done, then your writing process would need to send some kind of notification. Otherwise, you're pretty much stuck polling the files, and reading the directory is quite cheap in terms of I/O compared to reading random blocks from random files.
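If the reader can at least be told when new files show up, java.nio.file.WatchService (JDK 7+) avoids re-scanning the directory yourself. A minimal sketch, watching a hypothetical drop directory; note that the create event fires when the file appears, not when the writer is done, so you still need one of the completeness checks above:
import java.nio.file.*;

public class DirectoryWatcher {
    public static void main(String[] args) throws Exception {
        Path dir = Paths.get("/data/incoming");  // hypothetical drop directory
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = watcher.take();   // blocks until something happens
                for (WatchEvent<?> event : key.pollEvents()) {
                    // event.context() is the relative name of the new entry
                    System.out.println("new entry: " + event.context());
                }
                if (!key.reset()) {
                    break;   // directory no longer accessible
                }
            }
        }
    }
}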

One more method to test that a file is completely written:
private void waitUntilIsReadable(File file) throws InterruptedException {
boolean isReadable = false;
int loopsNumber = 1;
while (!isReadable && loopsNumber <= MAX_NUM_OF_WAITING_60) {
try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
log.trace("InputStream readable. Available: {}. File: '{}'",
in.available(), file.getAbsolutePath());
isReadable = true;
} catch (Exception e) {
log.trace("InputStream is not readable yet. File: '{}'", file.getAbsolutePath());
loopsNumber++;
TimeUnit.MILLISECONDS.sleep(1000);
}
}
}

Use this on Unix if you are transferring files using FTP or WinSCP:
public static void isFileReady(File entry) throws Exception {
    long realFileSize = entry.length();
    long currentFileSize = 0;
    do {
        try (FileInputStream fis = new FileInputStream(entry)) {
            currentFileSize = 0;
            byte[] b = new byte[1024];
            int nResult;
            while ((nResult = fis.read(b)) != -1) {
                currentFileSize += nResult;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("currentFileSize=" + currentFileSize + ", realFileSize=" + realFileSize);
    } while (currentFileSize != realFileSize);
}

Related

How to check that file is opened by another process in Java? [duplicate]

I need to write a custom batch File renamer. I've got the bulk of it done except I can't figure out how to check if a file is already open. I'm just using the java.io.File package and there is a canWrite() method but that doesn't seem to test if the file is in use by another program. Any ideas on how I can make this work?
Using the Apache Commons IO library...
boolean isFileUnlocked = false;
try {
org.apache.commons.io.FileUtils.touch(yourFile);
isFileUnlocked = true;
} catch (IOException e) {
isFileUnlocked = false;
}
if(isFileUnlocked){
// Do stuff you need to do with a file that is NOT locked.
} else {
// Do stuff you need to do with a file that IS locked
}
(The Q&A is about how to deal with Windows "open file" locks ... not how to implement this kind of locking portably.)
This whole issue is fraught with portability issues and race conditions:
You could try to use FileLock, but it is not necessarily supported for your OS and/or filesystem.
It appears that on Windows you may be unable to use FileLock if another application has opened the file in a particular way.
Even if you did manage to use FileLock or something else, you've still got the problem that something may come in and open the file between you testing the file and doing the rename.
A simpler though non-portable solution is to just try the rename (or whatever it is you are trying to do) and diagnose the return value and / or any Java exceptions that arise due to opened files.
Notes:
If you use the Files API instead of the File API you will get more information in the event of a failure.
On systems (e.g. Linux) where you are allowed to rename a locked or open file, you won't get any failure result or exceptions. The operation will just succeed. However, on such systems you generally don't need to worry if a file is already open, since the OS doesn't lock files on open.
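To illustrate the "just try it and diagnose the failure" approach with the Files API, here is a hedged sketch; the specific exceptions you actually see will depend on the OS and filesystem:
import java.io.IOException;
import java.nio.file.*;

public class RenameDiagnosis {
    // Attempt the rename and report why it failed, instead of just getting
    // a boolean false back from File.renameTo(). Paths are placeholders.
    static boolean tryRename(Path source, Path target) {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (FileAlreadyExistsException e) {
            System.err.println("target already exists: " + e.getFile());
        } catch (AtomicMoveNotSupportedException e) {
            System.err.println("atomic move not supported here: " + e.getMessage());
        } catch (IOException e) {
            // On Windows, a sharing violation on an open file typically lands here.
            System.err.println("rename failed: " + e);
        }
        return false;
    }
}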
// To check whether a file is opened by another process
// or not (not for .txt files)
// the file we want to check
String fileName = "C:\\Text.xlsx";
File file = new File(fileName);
// try to rename the file to the same name
File sameFileName = new File(fileName);
if (file.renameTo(sameFileName)) {
    // the file was renamed
    System.out.println("file is closed");
} else {
    // the file didn't accept the renaming operation
    System.out.println("file is opened");
}
On Windows I found the answer https://stackoverflow.com/a/13706972/3014879 using
fileIsLocked = !file.renameTo(file)
most useful, as it avoids false positives when processing write protected (or readonly) files.
org.apache.commons.io.FileUtils.touch(yourFile) doesn't check if your file is open or not. Instead, it changes the timestamp of the file to the current time.
I used IOException and it works just fine:
try
{
    String filePath = "C:\\sheet.xlsx";
    FileWriter fw = new FileWriter(filePath);
    fw.close();
}
catch (IOException e)
{
    System.out.println("File is open");
}
I don't think you'll ever get a definitive solution for this, the operating system isn't necessarily going to tell you if the file is open or not.
You might get some mileage out of java.nio.channels.FileLock, although the javadoc is loaded with caveats.
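For what it's worth, a minimal tryLock sketch; as the javadoc caveats say, whether the lock actually excludes other processes depends on the OS and filesystem, so treat this as advisory:
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class TryLockExample {
    static boolean canLock(String path) {
        try (FileChannel channel = FileChannel.open(Paths.get(path), StandardOpenOption.WRITE);
             FileLock lock = channel.tryLock()) {
            return lock != null;   // null means another program holds the lock
        } catch (IOException | OverlappingFileLockException e) {
            return false;
        }
    }
}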
Hi, I really hope this helps.
I tried all the options before and none really worked on Windows. The only thing that helped me accomplish this was trying to move the file, even to the same place, with an ATOMIC_MOVE. If the file is being written by another program or Java thread, this will definitely produce an exception.
try{
Files.move(Paths.get(currentFile.getPath()),
Paths.get(currentFile.getPath()), StandardCopyOption.ATOMIC_MOVE);
// DO YOUR STUFF HERE SINCE IT IS NOT BEING WRITTEN BY ANOTHER PROGRAM
} catch (Exception e){
// DO NOT WRITE THEN SINCE THE FILE IS BEING WRITTEN BY ANOTHER PROGRAM
}
If the file is in use, FileOutputStream fileOutputStream = new FileOutputStream(file); throws a java.io.FileNotFoundException with 'The process cannot access the file because it is being used by another process' in the exception message.
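Wrapped up as a small helper, that check might look like the following sketch (a Windows-oriented heuristic; the append flag matters so the probe does not truncate the file):
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

public class WriteProbe {
    // If we cannot open the file for appending, assume another process
    // still holds it open (Windows sharing semantics).
    static boolean isInUse(File file) {
        try (FileOutputStream out = new FileOutputStream(file, true)) {
            return false;
        } catch (FileNotFoundException e) {
            return true;   // "being used by another process" lands here on Windows
        } catch (IOException e) {
            return true;
        }
    }
}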

new FileOutputStream slow, is there a better way?

I'm writing a bunch of relatively small files (about 50k or so each).
The total processing time for writing all of these files is about 400 seconds.
I put in some checks to see what's taking the most time and of that 400 total seconds, 12 seconds is spent writing the data to the files and 380 seconds are spent just doing this code:
fos = new FileOutputStream(fileObj);
I would expect the writing and closing of the file to take most of the time but it looks like just creating the FileOutputStream is taking the most amount of time by far.
Is there a better way to create my files or is the file creation just generally a slow operation? This is the total time for thousands of files by the way, not just the time for a single file.
What you are seeing is pretty much normal behavior; it's not Java-specific.
When a file is created, the file system needs to add a file entry to its structures, and in the process modify an existing structure (e.g. the directory the file is contained in) to take note of the new entry.
On a typical hard disk this requires some head movements, and a single seek takes time on the order of milliseconds. On the other hand, once you start writing to the file, the file system will assign new blocks to the file in a linear fashion (as long as possible), so you can write sequential data at roughly the maximum speed the drive can handle.
The only way to make major improvements in speed is to use a faster device (e.g. an SSD drive).
You can pretty much observe this effect everywhere, Windows explorer and similar tools all show the same behavior: large files are copied with speeds close to the devices limits, while tons of small files go painfully slow.
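If you want to confirm where the time goes in your own setup, a rough timing sketch like this (paths and counts are just examples) separates the creation cost from the write cost:
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class CreateVsWriteTiming {
    public static void main(String[] args) throws IOException {
        byte[] data = new byte[50 * 1024];    // ~50k per file, as in the question
        File dir = new File("out");           // hypothetical output directory
        dir.mkdirs();

        long createNanos = 0, writeNanos = 0;
        for (int i = 0; i < 1000; i++) {
            File f = new File(dir, "file" + i + ".bin");
            long t0 = System.nanoTime();
            FileOutputStream fos = new FileOutputStream(f);   // file creation
            long t1 = System.nanoTime();
            fos.write(data);                                  // actual write
            fos.close();
            long t2 = System.nanoTime();
            createNanos += t1 - t0;
            writeNanos += t2 - t1;
        }
        System.out.printf("create: %d ms, write+close: %d ms%n",
                createNanos / 1_000_000, writeNanos / 1_000_000);
    }
}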
One way to avoid that problem and spend about the same time on every file: when you are given the destination path, strip the extension, and when you have finished copying the file, rename it back with the extension you removed. Here is an example:
public static void copiarArchivo(String pathOrigen, String pathDestino)
{
    InputStream in = null;
    OutputStream out = null;
    // ultPunto holds the index of the last dot in the destination path.
    // Before the last dot is the file name; after it is the extension.
    int ultPunto = pathDestino.lastIndexOf(".");
    // take the extension of the file
    String extension = pathDestino.substring(ultPunto, pathDestino.length());
    // take the file name without the extension
    String pathSinExtension = pathDestino.substring(0, ultPunto);
    try
    {
        in = new FileInputStream(pathOrigen);
        // create the new file without the extension, as explained above
        out = new FileOutputStream(pathSinExtension);
        // buffer size; the original snippet referenced an undefined 'buffer' constant
        byte[] buf = new byte[4096];
        int len;
        // binary copy of the file content
        while ((len = in.read(buf)) > 0)
        {
            out.write(buf, 0, len);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    // whether the copy finished or an exception occurred, the streams must
    // be closed to free resources
    finally
    {
        try
        {
            if (in != null)
                in.close();
            if (out != null)
                out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    // the file was copied without its extension, so the extension must now be
    // added back to the file name
    new File(pathSinExtension).renameTo(new File(pathSinExtension + extension));
}
Where pathOrigen is the path of the file you want to copy and pathDestino is the path where it is going to be copied.

reduce number of opened files in java code

Hi, I have some code that uses this block:
RandomAccessFile file = new RandomAccessFile("some file", "rw");
FileChannel channel = file.getChannel();
// some code
String line = "some data";
ByteBuffer buf = ByteBuffer.wrap(line.getBytes());
channel.write(buf);
channel.close();
file.close();
but the specific aspect of the application is that I have to generate a large number of temporary files, more than 4000 on average (used for Hive inserts into a partitioned table).
The problem is that sometimes I catch exception
Failed with exception Too many open files
during the app running.
I wonder if there is any way to tell the OS that the file is already closed and not used anymore, and why calling
channel.close();
file.close();
does not reduce the number of open files. Is there any way to do this in Java code?
I have already increased the maximum number of open files in
#/etc/sysctl.conf:
kern.maxfiles=204800
kern.maxfilesperproc=200000
kern.ipc.somaxconn=8096
Update:
I tried to isolate the problem, so I split the code to investigate each part of it (create files, upload to Hive, delete files).
Using the 'File' or 'RandomAccessFile' class fails with the exception "Too many open files".
Finally I used the code:
FileOutputStream s = null;
FileChannel c = null;
try {
    s = new FileOutputStream(filePath);
    c = s.getChannel();
    // do writes
    c.write(ByteBuffer.wrap("some data".getBytes()));
    c.force(true);
    s.getFD().sync();
} catch (IOException e) {
    // handle exception
} finally {
    if (c != null)
        c.close();
    if (s != null)
        s.close();
}
And this works with large numbers of files (tested on 20K files of 5KB each). The code itself does not throw the exception that the previous two classes did.
But the production code (with Hive) still had the exception. And it appears that the Hive connection through JDBC is the reason for it.
I will investigate further.
The number of open file handles that can be used by the OS is not the same thing as the number of file handles that can be opened by a process. Most Unix systems restrict the number of file handles per process. Most likely it's something like 1024 file handles for your JVM.
a) You need to set the ulimit in the shell that launches the JVM to some higher value. (Something like 'ulimit -n 4000')
b) You should verify that you don't have any resource leaks that are preventing your files from being 'finalized'.
Make sure to use a finally{} block. If there is an exception for some reason the close will never happen in the code as written.
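If you are on Java 7 or later, try-with-resources gives the same guarantee with less code; a sketch of the write block from the question written that way (the path and payload are placeholders):
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class SafeTempFileWrite {
    static void writeLine(String path, String line) throws IOException {
        // Both resources are closed automatically, in reverse order, even if
        // write() throws, so no descriptors leak across thousands of files.
        try (RandomAccessFile file = new RandomAccessFile(path, "rw");
             FileChannel channel = file.getChannel()) {
            channel.write(ByteBuffer.wrap(line.getBytes()));
        }
    }
}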
Is this the exact code? Because I can think of one scenario where you might be opening all the files in a loop and closing all of them only at the end, which would cause this problem. Please post the full code.

Reliable File.renameTo() alternative on Windows?

Java's File.renameTo() is problematic, especially on Windows, it seems.
As the API documentation says,
Many aspects of the behavior of this method are inherently platform-dependent: The rename operation might not be able to move a file from one filesystem to another, it might not be atomic, and it might not succeed if a file with the destination abstract pathname already exists. The return value should always be checked to make sure that the rename operation was successful.
In my case, as part of an upgrade procedure, I need to move (rename) a directory that may contain gigabytes of data (lots of subdirectories and files of varying sizes). The move is always done within the same partition/drive, so there's no real need to physically move all the files on disk.
There shouldn't be any file locks to the contents of the dir to be moved, but still, quite often, renameTo() fails to do its job and returns false. (I'm just guessing that perhaps some file locks expire somewhat arbitrarily on Windows.)
Currently I have a fallback method that uses copying & deleting, but this sucks because it may take a lot of time, depending on the size of the folder. I'm also considering simply documenting the fact that the user can move the folder manually to avoid waiting for hours, potentially. But the Right Way would obviously be something automatic and quick.
So my question is, do you know an alternative, reliable approach to do a quick move/rename with Java on Windows, either with plain JDK or some external library. Or if you know an easy way to detect and release any file locks for a given folder and all of its contents (possibly thousands of individual files), that would be fine too.
Edit: In this particular case, it seems we got away using just renameTo() by taking a few more things into account; see this answer.
See also the Files.move() method in JDK 7.
An example:
String fileName = "MyFile.txt";
String newName = "MyFileRenamed.txt"; // destination name, just for the example
try {
    Files.move(new File(fileName).toPath(), new File(newName).toPath(),
            java.nio.file.StandardCopyOption.REPLACE_EXISTING);
} catch (IOException ex) {
    Logger.getLogger(SomeClass.class.getName()).log(Level.SEVERE, null, ex);
}
For what it's worth, some further notions:
On Windows, renameTo() seems to fail if the target directory exists, even if it's empty. This surprised me, as I had tried on Linux, where renameTo() succeeded if target existed, as long as it was empty.
(Obviously I shouldn't have assumed this kind of thing works the same across platforms; this is exactly what the Javadoc warns about.)
If you suspect there may be some lingering file locks, waiting a little before the move/rename might help. (At one point in our installer/upgrader we added a "sleep" action and an indeterminate progress bar for some 10 seconds, because there might be a service hanging on to some files.) Perhaps even add a simple retry mechanism that tries renameTo() and then waits for a period (which maybe increases gradually) until the operation succeeds or some timeout is reached; a sketch of this follows below.
In my case, most problems seem to have been solved by taking both of the above into account, so we won't need to do a native kernel call, or some such thing, after all.
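For illustration, a minimal version of such a retry loop; the attempt count and delays are arbitrary examples, not recommendations:
import java.io.File;

public class RenameWithRetry {
    // Retry renameTo() with a gradually increasing pause, giving lingering
    // Windows file locks a chance to be released.
    static boolean renameWithRetry(File source, File target) throws InterruptedException {
        long waitMillis = 250;
        for (int attempt = 0; attempt < 8; attempt++) {
            if (source.renameTo(target)) {
                return true;
            }
            Thread.sleep(waitMillis);
            waitMillis = Math.min(waitMillis * 2, 10_000);
        }
        return false;
    }
}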
The original post requested "an alternative, reliable approach to do a quick move/rename with Java on Windows, either with plain JDK or some external library."
Another option not mentioned yet here is v1.3.2 or later of the apache.commons.io library, which includes FileUtils.moveFile().
It throws an IOException instead of returning boolean false upon error.
See also big lep's response in this other thread.
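A minimal usage sketch of that call, assuming commons-io is on the classpath:
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;

public class CommonsMoveExample {
    static void move(File source, File target) {
        try {
            FileUtils.moveFile(source, target);   // throws instead of returning false
        } catch (IOException e) {
            // The exception message usually tells you why the move failed.
            e.printStackTrace();
        }
    }
}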
On Windows I use Runtime.getRuntime().exec("cmd /c ...") and then use the command-line rename function to actually rename files. It is much more flexible; e.g. if you want to rename the extension of all txt files in a dir to bak, just write this to the output stream:
rename *.txt *.bak
I know it is not a good solution, but apparently it has always worked for me, much better than Java's built-in support.
In my case it seemed to be a dead object within my own application, which kept a handle to that file. So that solution worked for me:
for (int i = 0; i < 20; i++) {
if (sourceFile.renameTo(backupFile))
break;
System.gc();
Thread.yield();
}
Advantage: it is pretty quick, as there is no Thread.sleep() with a specific hardcoded time.
Disadvantage: that limit of 20 is some hardcoded number. In all my tests, i=1 is enough. But to be sure I left it at 20.
I know this seems a little hacky, but for what I've been needing it for, it seems buffered readers and writers have no issue making the files.
void renameFiles(String oldName, String newName)
{
String sCurrentLine = "";
try
{
BufferedReader br = new BufferedReader(new FileReader(oldName));
BufferedWriter bw = new BufferedWriter(new FileWriter(newName));
while ((sCurrentLine = br.readLine()) != null)
{
bw.write(sCurrentLine);
bw.newLine();
}
br.close();
bw.close();
File org = new File(oldName);
org.delete();
}
catch (FileNotFoundException e)
{
e.printStackTrace();
}
catch (IOException e)
{
e.printStackTrace();
}
}
Works well for small text files as part of a parser, just make sure oldName and newName are full paths to the file locations.
Cheers
Kactus
The following piece of code is NOT an 'alternative' but has reliably worked for me on both Windows and Linux environments:
public static void renameFile(String oldName, String newName) throws IOException {
File srcFile = new File(oldName);
boolean bSucceeded = false;
try {
File destFile = new File(newName);
if (destFile.exists()) {
if (!destFile.delete()) {
throw new IOException(oldName + " was not successfully renamed to " + newName);
}
}
if (!srcFile.renameTo(destFile)) {
throw new IOException(oldName + " was not successfully renamed to " + newName);
} else {
bSucceeded = true;
}
} finally {
if (bSucceeded) {
srcFile.delete();
}
}
}
Why not....
import com.sun.jna.Native;
import com.sun.jna.Library;
public class RenamerByJna {
/* Requires jna.jar to be in your path */
public interface Kernel32 extends Library {
public boolean MoveFileA(String existingFileName, String newFileName);
}
public static void main(String[] args) {
String path = "C:/yourchosenpath/";
String existingFileName = path + "test.txt";
String newFileName = path + "renamed.txt";
Kernel32 kernel32 = (Kernel32) Native.loadLibrary("kernel32", Kernel32.class);
kernel32.MoveFileA(existingFileName, newFileName);
}
}
This works on Windows 7. It does nothing if existingFileName does not exist, but obviously it could be better instrumented to handle that.
I had a similar issue. The file was copied rather than moved on Windows, but it worked well on Linux. I fixed the issue by closing the open FileInputStream before calling renameTo(). Tested on Windows XP.
fis = new FileInputStream(originalFile);
..
..
..
fis.close();// <<<---- Fixed by adding this
originalFile.renameTo(newDesitnationForOriginalFile);
In my case, the error was in the path of the parent directory. Maybe it's a bug, but I had to use substring to get a correct path.
try {
    String n = f.getAbsolutePath();
    n = n.substring(0, n.lastIndexOf("\\"));
    File dest = new File(n, newName);
    f.renameTo(dest);
} catch (Exception ex) {
    ...
Well, I have found a pretty straightforward solution to this problem:
boolean retVal = targetFile.renameTo(new File("abcd.xyz"));
while(!retVal) {
retVal= targetFile.renameTo(new File("abcd.xyz"));
}
As suggested by Argeman, you can place a counter and limit the number of times the while loop will run, so that it doesn't get into an infinite loop in case the file is being used by another Windows process:
int counter = 0;
boolean retVal = targetFile.renameTo(new File("abcd.xyz"));
while(!retVal && counter <= 10) {
retVal = targetFile.renameTo(new File("abcd.xyz"));
counter = counter + 1;
}
I know it sucks, but an alternative is to create a bat script which outputs something simple like "SUCCESS" or "ERROR", invoke it, wait for it to be executed and then check its results.
Runtime.getRuntime().exec("cmd /c start test.bat");
This thread may be interesting. Check also the Process class on how to read the console output of a different process.
You may try robocopy. This is not exactly "renaming", but it's very reliable.
Robocopy is designed for reliable mirroring of directories or directory trees. It has features to ensure all NTFS attributes and properties are copied, and includes additional restart code for network connections subject to disruption.
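If you go that route from Java, here is a hedged sketch of invoking it with ProcessBuilder; the /E and /MOVE flags are robocopy's documented options for copying subdirectories and deleting the source afterwards, and exit codes below 8 are conventionally treated as success:
import java.io.IOException;

public class RobocopyMove {
    static boolean robocopyMove(String sourceDir, String targetDir)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder("robocopy", sourceDir, targetDir, "/E", "/MOVE")
                .inheritIO()   // show robocopy's own progress output
                .start();
        return p.waitFor() < 8;
    }
}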
To move/rename a file you can use this function:
BOOL WINAPI MoveFile(
__in LPCTSTR lpExistingFileName,
__in LPCTSTR lpNewFileName
);
It is defined in kernel32.dll.
File srcFile = new File(origFilename);
File destFile = new File(newFilename);
srcFile.renameTo(destFile);
The above is the simple code. I have tested it on Windows 7 and it works perfectly fine.

How to handle incomplete files? Getting exception

I need to create a Java program which will create a thread to search for a file in a particular folder (source folder) and pick the file up immediately for processing (converting it into CSV format) once it finds it in the source folder. The problem I am facing now is that the file which comes to the source folder is big (an FTP tool is used to copy the file from the server to the source folder), and the thread picks the file up before it is fully copied to the source folder, throwing an exception. How do I make the thread wait until the file is completely copied into the source folder? It has to pick the file up for processing only after the file is copied completely into the source folder.
The safest way is to download the file to a different location and then move it to the target folder.
Another variation mentioned by Bombe is to change the file name to some other extension after downloading and look only for files with that extension.
I only read files that are not in write mode. This is safest, as it means no other process is writing to the file. You can check whether a file is in write mode by using the canWrite method of the File class.
This solution works fine for me, as I have exactly the same scenario you are facing.
You could try different things:
Repeatedly check the last modification date and the size of the file until it doesn’t change anymore for a given amount of time, then process it. (As pointed out by qbeuek this is neither safe nor deterministic.)
Only process files with names that match certain criteria (e.g. *.dat). Change the FTP upload/download process to upload/download files with a different name (e.g. *.dat.temp) and rename the files once they are complete.
Download the files to a different location and move them to your processing directory once they’re complete.
As Vinegar said, if it doesn’t work the first time, try again later. :)
If you have some control over the process that does the FTP, you could potentially have it create a "flag file" in the source directory immediately AFTER the FTP of the big file has finished.
Then your Java thread has to check for the presence of this flag file; if it's present, there is a file ready to be processed in the source directory. Before processing the big file, the thread should remove the flag file.
Flag file can be anything (even an empty file).
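A small sketch of the reader side of that convention; the ".done" suffix is just an example of a flag-file naming scheme:
import java.io.File;

public class FlagFileCheck {
    // The FTP side creates bigfile.dat.done after bigfile.dat is fully
    // uploaded; the processing thread removes the flag before starting.
    static File pollForReadyFile(File sourceDir, String dataName) {
        File data = new File(sourceDir, dataName);
        File flag = new File(sourceDir, dataName + ".done");
        if (data.exists() && flag.exists()) {
            flag.delete();   // claim the file so it is not processed twice
            return data;
        }
        return null;         // not ready yet, poll again later
    }
}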
Assuming you have no control over FTP process...
Let it be like this: when you get the exception, try to process the file again the next time. Repeat until the file gets processed. It's good to keep a few attributes in case of an exception, to check later: name, last-modified, size.
Check the exact exception before deciding to process it later; the exception might occur for some other reason.
If your OS is Linux, and your kernel > 2.6.13, you could use the filesystem event notification API named inotify.
There's a Java implementation here : https://bitbucket.org/nbargnesi/inotify-java.
Here's some sample code (heavily inspired by the website).
try {
    Inotify i = new Inotify();
    InotifyEventListener e = new InotifyEventListener() {
        @Override
        public void filesystemEventOccurred(InotifyEvent e) {
            System.out.println("inotify event occurred!");
        }
        @Override
        public void queueFull(EventQueueFull e) {
            System.out.println("inotify event queue: " + e.getSource() +
                    " is full!");
        }
    };
    i.addInotifyEventListener(e);
    i.addWatch(System.getProperty("user.home"), Constants.IN_CLOSE_WRITE);
} catch (UnsatisfiedLinkError e) {
    System.err.println("unsatisfied link error");
} catch (UserLimitException e) {
    System.err.println("user limit exception");
} catch (SystemLimitException e) {
    System.err.println("system limit exception");
} catch (InsufficientKernelMemoryException e) {
    System.err.println("insufficient kernel memory exception");
}
This is in Grails, and I am using the FileUtils library from Apache Commons. The sizeOf function returns the size in bytes.
def fileModified = sourceFile.lastModified()
def fileSize = FileUtils.sizeOf(sourceFile)
Thread.sleep(3000) //sleep to calculate size difference if the file is currently getting copied
if((fileSize != FileUtils.sizeOf(sourceFile)) && (fileModified != sourceFile.lastModified())) // the file is still getting copied, so return
{
if(log.infoEnabled)
log.info("File is getting copied!")
return
}
Thread.sleep(1000) //breather for picking up file just copied.
Please note that this also depends on what utility or OS you are using to transfer the files.
The safest bet is to copy the file that is being copied, or has been copied, to a different file or directory. The copy process is robust, and it assures you that the file is present after the copying process. The one I am using is from the Commons API:
FileUtils.copyFileToDirectory(File srcFile, File destDir)
If you are copying a huge file which is still in the process of being copied, beware that this will take time, and you might like to start this in a parallel thread or, even better, have a separate application dedicated to the transfer process.
