Java I/O over an NFS mount

I have a bit of Java code that outputs an XML file to an NFS-mounted filesystem. On another server that has the filesystem mounted as a Samba share, a process polls for new XML files every 30 seconds. If a new file is found, it is processed and then renamed as a backup file. 99% of the time, the files are written without an issue. However, every now and then the backup file contains a partially written file.
After some discussion, we guessed that the process running on the external server was reading the file while the Java output stream was still writing it. The suggested fix was a common industry practice: first write the file with a .temp extension, then rename it to .xml once the write is complete. After the change, the rename fails every time.
Some research turned up that Java file I/O is buggy when working with NFS-mounted filesystems.
Help me Java gurus! How do I solve this problem?
Here is some relevant information:
My process is Java 1.6.0_16 running on Solaris 10
Mounted filesystem is a NAS
Server with polling process is Windows Server 2003 R2 Standard, Service Pack 2
Here is a sample of my code:
// Write the file
XMLOutputter serializer = new XMLOutputter(Format.getPrettyFormat());
FileOutputStream os = new FileOutputStream(outputDirectory + fileName + ".temp");
serializer.output(doc, os); // doc is a constructed XML document using JDOM
os.flush();
os.close();

// Rename the file
File oldFile = new File(outputDirectory + fileName + ".temp");
File newFile = new File(fileName + ".xml");
boolean success = oldFile.renameTo(newFile);
if (!success) {
    // File was not successfully renamed.
    throw new IOException("The file " + fileName + ".temp could not be renamed.");
}

You probably have to specify the complete path in the new file name:
File newFile = new File(outputDirectory + fileName + ".xml");

This looks like a bug to me:
File oldFile = new File(outputDirectory + fileName + ".temp");
File newFile = new File(fileName + ".xml");
I would have expected this:
File oldFile = new File(outputDirectory + fileName + ".temp");
File newFile = new File(outputDirectory + fileName + ".xml");
In general, it sounds like there is a race condition between the writing of the XML file and the read/process/rename task. Can you have the read/process/rename task only operate on files > 1 minute old or something similar?
Or, have the Java program write out an additional, empty file once it has completed writing out the XML file that signals that the writing to the XML file has completed. Only read/process/rename the XML file when the signal file is present. Then delete the signal file.
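For the signal-file approach, here is a minimal sketch of the writer side, reusing the question's serializer, doc, outputDirectory and fileName; the .done extension is just an illustrative convention:
// Write the XML payload first, exactly as in the question.
FileOutputStream os = new FileOutputStream(outputDirectory + fileName + ".xml");
serializer.output(doc, os);
os.flush();
os.close();

// Only after the XML is fully written and closed, create the empty signal file.
// The poller should process an .xml file only once its .done file exists,
// and delete the .done file afterwards.
File signal = new File(outputDirectory + fileName + ".done");
if (!signal.createNewFile()) {
    throw new IOException("Signal file already exists: " + signal.getName());
}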

The original bug definitely sounds like an issue with concurrent access to the file -- your solution should have worked, but there are alternative solutions too.
For example, put a timer on your auto-read process so that when a new file is detected it records the file size, sleeps X seconds, and then restarts the timer if the sizes don't match. That should avoid problems with partial file transfers.
EDIT: or check the timestamps as per the above, but make sure the file is old enough that any imprecision in the timestamp doesn't matter (say, 10 seconds to 1 minute since last modified).
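A sketch of that size-stability check; the helper name and wait interval are illustrative assumptions:
// Heuristic: treat a file as complete once it is non-empty and its size
// has stopped changing between two samples.
static boolean looksComplete(File f, long waitMillis) throws InterruptedException {
    long sizeBefore = f.length();
    Thread.sleep(waitMillis);
    return sizeBefore > 0 && sizeBefore == f.length();
}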
Alternately, try this:
File f = new File("foo.xml");
FileOutputStream fos = new FileOutputStream(f);
FileChannel fc = fos.getChannel();
FileLock lock = fc.lock();
// (do file write)
fos.flush();
lock.release();
fos.close();
This SHOULD use native OS file locking to prevent concurrent access by other programs (such as your XML reader daemon).
As far as NFS glitches go: there is a documented "feature" (bug) where files can't be moved between filesystems via "rename" in Java. Could there be confusion, since it is on an NFS filesystem?

Some information on NFS in general. Depending on your NFS settings, locks might not work at all, and many big NFS installations are tuned for read performance, so new data might turn up later than expected due to caching effects.
I have seen cases where a file was created and data was added (as observed from another machine), but all data after that appeared with a 30-second delay.
The best solution, by the way, is a rotating file scheme: the latest file is assumed to still be in progress, while the one before it was safely written and can be read. I would not work on a single file and use it as a "pipe".
Alternatively, you can use an empty file that is written after the large file has been written and closed properly. If the small file is there, the big file is definitely done and can be read.

Possibly due to "The rename operation might not be able to move a file from one filesystem to another" from http://java.sun.com/j2se/1.5.0/docs/api/java/io/File.html#renameTo%28java.io.File%29
Try using Apache Commons IO FileUtils.copyFileToDirectory http://commons.apache.org/io/api-release/org/apache/commons/io/FileUtils.html#copyFileToDirectory(java.io.File,%20java.io.File) instead
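A sketch of that suggestion; the source and destination paths are hypothetical, and the delete after the copy is what completes the "move":
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;

File source = new File("/mnt/nfs/out/report.temp"); // hypothetical source
File destDir = new File("/mnt/nfs/processed");      // hypothetical destination
// The copy keeps the file's name and, unlike renameTo(), is not affected
// by filesystem boundaries.
FileUtils.copyFileToDirectory(source, destDir);
if (!source.delete()) {
    throw new IOException("Could not delete " + source + " after copying.");
}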

Related

handling about 450,000 files in a zip

My question is simple: would Java handle a .zip file with about 450,000 files in it? The code I wrote would not load all of the files; just one specific file would be looked up in the zip and read line by line. Each file is about 500 KB.
Would this work, or will I get an OutOfMemoryError?
Sorry, uncompressed each file is about 0.5 MB; zipped, the whole archive is about 250 MB.
OK, the file names in the zip are ID + date (unique). If I have to check a log, I call Java with the ID + date and Java reads just that one file, never more.
Edit: It works, it works very well. About 400,000 files in a zip: if you have the memory to zip the files, it works without any problem.
Edit 2: It works on Linux filesystems without a problem; on NTFS it sometimes crashed. NTFS has a problem with that many files in one zip.
Using the zip filesystem in Java 7, you can actually access one individual file pretty easily and open a BufferedReader on it.
First you have to create the FileSystem:
public static FileSystem getZipFileSystem(final String zipPath)
    throws IOException
{
    final Path path = Paths.get(zipPath).toAbsolutePath();
    final Map<String, Object> env = new HashMap<>();
    final URI uri = URI.create("jar:file:" + path.toString());
    return FileSystems.newFileSystem(uri, env, null);
}
Once you have done that, you can create a BufferedReader from an entry in the zip itself:
try (
    final FileSystem fs = getZipFileSystem("/path/to/the.zip");
    final BufferedReader reader = Files.newBufferedReader(fs.getPath("path/to/entry"),
        StandardCharsets.UTF_8);
) {
    // operate on the reader
}
You could also read all lines in the entry at once using Files.readAllLines().
If you wish to copy a zip entry to a file on the filesystem, you can also do that:
Files.copy(zipfs.getPath("path/to/entry"), Paths.get("file/on/local/fs"));
Or you can directly copy the result to an OutputStream, or directly create an entry from an OutputStream...
Or even walk the entire zip using Files.walkFileTree().
Or get all the entries in a "directory" in a zip using Files.newDirectoryStream() (see the sketch below). Note that, as its name says, this is a stream; unlike File.listFiles() (which only works on files on disk anyway), it returns an iterator over the entries.
Or... Or... Or...
Note that a FileSystem needs to be .close()d.
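A minimal sketch of that directory listing, assuming a hypothetical some/dir entry inside the zip and reusing the getZipFileSystem() helper above:
try (
    final FileSystem fs = getZipFileSystem("/path/to/the.zip");
    // DirectoryStream is lazy: entries are produced as you iterate.
    final DirectoryStream<Path> dir = Files.newDirectoryStream(fs.getPath("some/dir"));
) {
    for (final Path entry : dir) {
        System.out.println(entry);
    }
}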
I'm not sure that I understand what you're trying to do.
If it's 0.5 MB/file and 450,000 files, you'll need 225 GB. You won't have enough memory to handle all of this in a single in-memory zip, even if you get 90% compression.
I'd recommend breaking it into manageable chunks. You'll be able to parallelize that way too, so it's not a bad idea.

Do I need to delete tmp files created by my java application?

I output several temporary files from my application to tmp directories, but was wondering whether it is best practice to delete them on close, or should I expect the host OS to handle this for me?
I am pretty new to Java. I can handle the delete, but I want to keep the application as multi-OS and Linux-friendly as possible, so I have tried to minimise file deletion unless I need to do it.
This is the method I am using to output the tmp file:
try {
    java.io.InputStream iss = getClass().getResourceAsStream("/nullpdf.pdf");
    // IOUtils.toByteArray() already reads the whole stream into the array.
    byte[] data = IOUtils.toByteArray(iss);
    iss.close();

    String tempFile = "file";
    File temp = File.createTempFile(tempFile, ".pdf");
    FileOutputStream fos = new FileOutputStream(temp);
    fos.write(data);
    fos.flush();
    fos.close();

    nopathbrain = temp.getAbsolutePath();
    System.out.println(tempFile);
    System.out.println(nopathbrain);
} catch (IOException ex) {
    ex.printStackTrace();
    System.out.println("TEMP FILE NOT CREATED - ERROR ");
}
createTempFile() only creates a new file with a unique name, but does not mark it for deletion. Use deleteOnExit() on the created file to achieve that. Then, if the JVM shuts down properly, the temporary files should be deleted.
edit:
Sample for creating a 'true' temporary file in java:
File temp = File.createTempFile("temporary-", ".pdf");
temp.deleteOnExit();
This will create a file in the default temporary folder with a unique random name (temporary-{randomness}.pdf) and delete it when the JVM exits.
This should be sufficient for programs with a short to medium run time (e.g. scripts, simple GUI applications) that do something and then exit. If the program runs longer or indefinitely (a server application, a monitoring client, ...) and the JVM won't exit, this method may clog the temporary folder with files. In such a situation the temporary files should be deleted by the application as soon as they are not needed anymore (see delete() or the Files helper class in JDK 7).
As Java already abstracts away OS-specific file system details, both approaches are as portable as Java. To ensure interoperability, have a look at the new Path abstraction for file names in Java 7.
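A minimal sketch of that explicit cleanup for a long-running application, using the JDK 7 Files helper mentioned above; the try/finally placement is the point:
File temp = File.createTempFile("temporary-", ".pdf");
try {
    // ... use the temp file ...
} finally {
    // Delete as soon as the file is no longer needed instead of waiting
    // for JVM exit; deleteIfExists() does not fail if it is already gone.
    Files.deleteIfExists(temp.toPath());
}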

java renameTo method not working

I know this has probably been answered a million times on here, but everything I have looked at has not helped me. Here is my code:
for (File g : f.listFiles()) {
    for (File h : g.listFiles()) {
        try {
            Scanner s = new Scanner(h);
            String timestamp = s.next().split("[?]")[4];
            File z = new File(h.getAbsolutePath().split("[.]")[0] + timestamp
                + h.getAbsolutePath().split("[.]")[1]);
            boolean q = h.renameTo(z);
        } catch (Exception e) {
        }
    }
}
I have checked whether File z exists, and it doesn't. I have checked whether File h exists, and it does. I have double-checked that h is an absolute path. If I print the absolute path of z, I get the correct path. None of the directories in f or files in g are open. The files denoted by h are not open. Could there be some flag set on the file such that Windows is not allowing my program to rename it?
My guess is that you are having a similar problem to one I had here: File deletion/moving failing
Try using a FileInputStream for the Scanner, and close the stream before renaming:
FileInputStream fin = new FileInputStream(h);
Scanner s = new Scanner(fin);
// do work
fin.close();
The behavior of renameTo varies from platform to platform. Operations that succeed on one platform may fail on another. For example, on my local development workstation (OS X), everything worked as expected. On a production system (Solaris), renameTo failed consistently. I finally determined that it failed when the files were located on different partitions. Obviously that is not the case here, but it illustrates that the method can behave in unexpected ways.
To get consistent behavior, copy the data to a new file, then delete the original.
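A sketch of that fallback on Java 7+, swapping in Files.move(): it performs a copy followed by a delete when the rename cannot be done within one filesystem, and throws a descriptive IOException instead of just returning false (h and z are the question's files):
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Works across filesystem boundaries, unlike File.renameTo().
Files.move(h.toPath(), z.toPath(), StandardCopyOption.REPLACE_EXISTING);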
I had an almost identical issue. Some rename cases succeeded and some failed, and for the failed cases I found that the source path and destination path were not on the same file system (in my case, NTFS had another file system mounted where the destination file would be moved to). The rename function's original purpose is simply to change a name, not to move the file's data, so if the source and destination are on different file systems, some JVM versions will fail on certain platforms. It is actually a bug in java.io, and Solaris has fixed it in newer versions.
Good luck!

Avoiding fragmentation when saving files to BlackBerry filesystem. Best practice?

In my application I need to save a file (a PDF) to the filesystem. My current method involves creating a directory for storing the files:
FileConnection fc = (FileConnection) Connector.open("file:///SDCard/BlackBerry/pdfs/");
if (!fc.exists())
    fc.mkdir();
fc.close();
I then write to the directory with my file:
fc = (FileConnection) Connector.open("file:///SDCard/BlackBerry/pdfs/" + filename, Connector.READ_WRITE);
if (!fc.exists())
    fc.create();
OutputStream outStream = fc.openOutputStream();
outStream.write(pdf);
outStream.close();
fc.close();
This all works fine, and my PDF arrives in my created directory. My question is: will I run into trouble because I have hard-coded a file path as my save destination? With the BlackBerry API, is it possible to retrieve a writable folder which exists on all models/configurations?
You can query the system for the available roots using FileSystemRegistry.listRoots(). Note that it is not guaranteed that there will be an sdcard, or that it will be visible even if there is one (when in mass storage mode, for instance). I think that the only root guaranteed to be on all devices is internal storage ("file:///Store").
There's (a little) more information here.
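A minimal sketch of that enumeration via the JSR-75 FileSystemRegistry; the printed URL prefix is just for illustration:
import java.util.Enumeration;
import javax.microedition.io.file.FileSystemRegistry;

// List every mounted root; internal storage ("store/") should always be
// present, while "SDCard/" depends on the device and its current mode.
Enumeration roots = FileSystemRegistry.listRoots();
while (roots.hasMoreElements()) {
    String root = (String) roots.nextElement();
    System.out.println("file:///" + root);
}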

Java: Efficient way to scan a folder for a particular file

I am contacting an external service with my Java app.
The flow is as follows: I generate an XML file and put it in a folder; the service then processes the file and returns another file with the same name, with the extension .out.
Right now, after I put the file in the folder, I start a loop until I get that file back, so I can read the result.
Here is the code:
fileName += ".out";
File f = new File(fileName);
do {
    f = new File(fileName);
} while (!f.exists());
response = readResponse(fileName); // got the response, now read it
My question comes here: am I doing it the right way, or is there a better/more efficient way to wait for the file?
Some info: I run my app on WinXP; it usually takes the external service less than a second to respond with a file, and I send around 200 requests per day to this service. The path to the folder with the result file is always the same.
All suggestions are welcome.
Thank you for your time.
There's no reason to recreate the File object. It just represents the file location, whether or not the file exists. Also, you probably don't want a loop without at least a short delay, otherwise it'll just max out a processor until the file exists. You probably want something like this instead:
File file = new File(filename);
while (!file.exists()) {
    Thread.sleep(100);
}
Edit: Ingo makes a great point in the comments. The file might not be completely there just because it exists. One way to guarantee that it's ready is have the first process create a second file after the first is completely written. Then have the Java program detect that second file, delete it and then safely read the first one.
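A sketch of that reader side, assuming a hypothetical convention where the writer creates fileName + ".done" only after the .out file is complete:
File marker = new File(fileName + ".done"); // fileName already ends in ".out"
while (!marker.exists()) {
    Thread.sleep(100);
}
marker.delete(); // consume the signal
response = readResponse(fileName); // now safe to read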
