File access synchronized on Java object - java

I have an object responsible for persisting JTable state to disk. It saves/loads visible columns, their size, position etc. A few interesting bits from its class definition are below.
class TableSaver {
    Timer timer = new Timer(true);

    TableSaver() {
        timer.schedule(new TableSaverTimerTask(), 15000, SAVE_STATE_PERIOD);
    }

    synchronized TableColumns load(PersistentTable table) {
        String xml = loadFile(table.getTableKey());
        // parse XML, return
    }

    synchronized void save(String key, TableColumns value) {
        try {
            // Some preparations
            writeFile(app.getTableConfigFileName(key), xml);
        } catch (Exception e) {
            // ... handle
        }
    }

    private class TableSaverTimerTask extends TimerTask {
        @Override
        public void run() {
            synchronized (TableSaver.this) {
                Iterator<PersistentTable> iterator = queue.iterator();
                while (iterator.hasNext()) {
                    PersistentTable table = iterator.next();
                    if (table.getTableKey() != null) {
                        save(table.getTableKey(), dumpState(table));
                    }
                    iterator.remove();
                }
            }
        }
    }
}
There only exists one instance of TableSaver, ever.
load() can be called from many threads. Timer clearly is another thread.
loadFile() and writeFile() do not leave open file streams - they use a robust, well tested and broadly used library which always closes the streams with try ... finally.
Sometimes this fails with an exception like:
java.lang.RuntimeException: java.io.FileNotFoundException: C:\path\to\table-MyTable.xml (The requested operation cannot be performed on a file with a user-mapped section open)
at package.FileUtil.writeFile(FileUtil.java:33)
at package.TableSaver.save(TableSaver.java:175)
at package.TableSaver.access$600(TableSaver.java:34)
at package.TableSaver$TableSaverTimerTask.run(TableSaver.java:246)
at java.util.TimerThread.mainLoop(Unknown Source)
at java.util.TimerThread.run(Unknown Source)
Caused by: java.io.FileNotFoundException: C:\path\to\table-MyTable.xml (The requested operation cannot be performed on a file with a user-mapped section open)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(Unknown Source)
at java.io.FileOutputStream.<init>(Unknown Source)
at package.FileUtilWorker.writeFile(FileUtilWorker.java:57)
... 6 more
So I have two questions:
How can this kind of synchronization fail? Note that I am sure there only is one instance of TableSaver.
What is this thing in the stacktrace: package.TableSaver.access$600(TableSaver.java:34)? Line 34 is the line with class TableSaver {. Can this be the reason why the synchronization is not working?

Googling tells me that this seems to be Windows specific. Here's an extract from Bug 6354433:
This is Windows platform issue with memory-mapped file, i.e. MappedByteBuffer. The Java 5.0 doc for FileChannel state that "the buffer and the mapping that it represents will remain valid until the buffer itself is garbage-collected". The error occurs when we tried to re-open the filestore and the mapped byte buffer has not been GC. Since there is no unmap() method for mapped byte buffer (see bug 4724038), we're at the mercy of the underlying operating system on when the buffer get free up. Calling System.gc() might free up the buffer but it is not guarantee. The problem doesn't occurs on Solaris; may be due to the way shared memory is implemented on Solaris. So the work-around for Windows is not to use memory-mapped file for the transaction information tables.
What Java/Windows version are you using? Does it have the latest updates?
Here are two other related bugs with some useful insights:
Bug 4715154 - Memory mapped file cannot be deleted.
Bug 4469299 - Memory mapped files are not GC'ed.
As to your second question, that's just the autogenerated classname of an inner or anonymous class.
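To make that concrete, here is a minimal sketch (the Outer/Inner/secret names are purely illustrative, not your classes): when an inner class touches a private member of its enclosing class, older compilers (before Java 11's nest-based access control) emit a synthetic package-private helper on the outer class, and that helper is what shows up in stack traces as access$NNN, attributed to the class declaration line.

public class Outer {
    private int secret = 42;                 // private member of the enclosing class

    class Inner {
        int peek() {
            // The compiler rewrites this access into a call to a synthetic helper
            // such as Outer.access$000(Outer), which is what appears in stack traces
            // as package.Outer.access$000(Outer.java:1).
            return secret;
        }
    }
}

It is purely a compiler artifact and has no effect on synchronization.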

Assuming there are no issues with the code, I have seen this occur when a virus scanner is running in the background, cheerfully opening files to scan them behind the scenes. If you have a memory-resident virus scanner that checks files in the background, try disabling it, or at least disabling it for the directory you are reading from/writing to.

Your code looks fine. Are you sure it's not related to file permissions? Does the application have write privileges for this folder? For this file?
[EDIT] This seems to be Windows related, not Java: The requested operation cannot be performed on a file with a user-mapped section open.

I had this issue with some tightly threaded Java code. I took a look at the referenced .NET conversation and the penny dropped. It is simply that I have contention for the same file among different threads. Looking more closely, the contention is (also) for some internals too. So my best course is to synchronize around the shared object when the threads update it.
This works and the error dissolves in the mist.
private static ShortLog tasksLog = new ShortLog( "filename" );
private static final Object tasksLogLock = new Object(); // a dedicated final lock object, rather than a Boolean (see the link below)
...
synchronized( tasksLogLock ){
    tasksLog.saveLastDatum( this.toString() );
}
see also:
Synchronization of non-final field

Your synchronization only protects against access from your own process. If you want to protect against accesses from any process, you have to use file locking:
http://download.oracle.com/javase/1.4.2/docs/api/java/nio/channels/FileLock.html
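As a minimal sketch of what that could look like (the file name is a placeholder, and note that FileLock is advisory on many platforms, so it only helps against processes that also acquire the lock):

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class LockedWrite {
    static void writeWithLock(String fileName, byte[] data) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "rw");
             FileChannel channel = raf.getChannel();
             FileLock lock = channel.lock()) {     // blocks until an exclusive lock on the whole file is granted
            channel.truncate(0);                   // replace the previous contents
            channel.write(ByteBuffer.wrap(data));
        }                                          // lock, channel and file are all released here
    }
}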

Related

Why does using java.nio.file.Files::list cause this breadth-first file traversal program to crash with a "Too many open files" error?

Assumption:
Streams are lazy, hence the following statement does not load all the children of the directory referenced by path into memory; instead it loads them one by one, and after each invocation of forEach, the directory referenced by p is eligible for garbage collection, so its file descriptor should also become closed:
Files.list(path).forEach(p ->
    absoluteFileNameQueue.add(
        p.toAbsolutePath().toString()
    )
);
Based on this assumption, I have implemented a breadth-first file traversal tool:
import static java.lang.Math.max;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.Queue;

public class FileSystemTraverser {

    public void traverse(String path) throws IOException {
        traverse(Paths.get(path));
    }

    public void traverse(Path root) throws IOException {
        final Queue<String> absoluteFileNameQueue = new ArrayDeque<>();
        absoluteFileNameQueue.add(root.toAbsolutePath().toString());

        int maxSize = 0;
        int count = 0;

        while (!absoluteFileNameQueue.isEmpty()) {
            maxSize = max(maxSize, absoluteFileNameQueue.size());
            count += 1;

            Path path = Paths.get(absoluteFileNameQueue.poll());

            if (Files.isDirectory(path)) {
                Files.list(path).forEach(p ->
                    absoluteFileNameQueue.add(
                        p.toAbsolutePath().toString()
                    )
                );
            }

            if (count % 10_000 == 0) {
                System.out.println("maxSize = " + maxSize);
                System.out.println("count = " + count);
            }
        }

        System.out.println("maxSize = " + maxSize);
        System.out.println("count = " + count);
    }
}
And I use it in a fairly straightforward way:
public class App {
    public static void main(String[] args) throws IOException {
        FileSystemTraverser traverser = new FileSystemTraverser();
        traverser.traverse("/media/Backup");
    }
}
The disk mounted in /media/Backup has about 3 million files.
For some reason, around the 140,000 mark, the program crashes with this stack trace:
Exception in thread "main" java.nio.file.FileSystemException: /media/Backup/Disk Images/Library/Containers/com.apple.photos.VideoConversionService/Data/Documents: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
at java.nio.file.Files.newDirectoryStream(Files.java:457)
at java.nio.file.Files.list(Files.java:3451)
It seems to me that, for some reason, the file descriptors are not getting closed or the Path objects are not garbage collected, which eventually causes the app to crash.
System Details
OS: Ubuntu 15.0.4
Kernel: 4.4.0-28-generic
ulimit: unlimited
File System: btrfs
Java runtime: tested with both OpenJDK 1.8.0_91 and Oracle JDK 1.8.0_91
Any ideas what I am missing here and how I can fix this problem (without resorting to java.io.File::list, i.e. by staying within the realm of NIO2 and Paths)?
Update 1:
I doubt that the JVM is keeping the file descriptors open. I took a heap dump around the 120,000 file mark.
Update 2:
I installed a file descriptor probing plugin in VisualVM and indeed it revealed that the FDs are not getting disposed of (as correctly pointed out by cerebrotecnologico and k5).
It seems like the Stream returned from Files.list(Path) is not closed correctly. In addition, you should not use forEach on a stream unless you are certain it is not parallel (hence the .sequential()).
try (Stream<Path> stream = Files.list(path)) {
    stream.map(p -> p.toAbsolutePath().toString()).sequential().forEach(absoluteFileNameQueue::add);
}
From the Java documentation:
"The returned stream encapsulates a DirectoryStream. If timely disposal of file system resources is required, the try-with-resources construct should be used to ensure that the stream's close method is invoked after the stream operations are completed"
The other answers give you the solution. I just want to correct this misapprehension in your question, which is the root cause of your problem:
... the directory referenced by p is eligible for garbage collection, so its file descriptor should also become closed.
This assumption is incorrect.
Yes, the directory (actually DirectoryStream) will be eligible for garbage collection. However, that does not mean that it will be garbage collected. The GC runs when the Java runtime system determines that it would be a good time to run it. Generally speaking, it takes no account of the number of open file descriptors that your application has created.
In other words, you should NOT rely on garbage collection and finalization to close resources. If you need a resource to be closed in a timely fashion, then your application should take care of this for itself. The "try-with-resources" construct is the recommended way to do it.
You commented:
I actually thought that because nothing references the Path objects and that their FDs are also closed, then the GC will remove them from the heap.
A Path object doesn't have a file descriptor. And if you look at the API, there isn't a Path.close() operation either.
The file descriptors that are being leaked in your example are actually associated with the DirectoryStream objects that are created by list(path). These objects will become eligible when the Stream.forEach() call completes.
My misunderstanding was that the FD of the Path objects are closed after each forEach invocation.
Well, that doesn't make sense; see above.
But even if it did make sense (i.e. if Path objects did have file descriptors), there is no mechanism for the GC to know that it needs to do something with the Path objects at that point.
Otherwise I know that the GC does not immediately remove eligible objects from the memory (hence the term "eligible").
That really >>is<< the root of the problem ... because the eligible file descriptor objects will >>only<< be finalized when the GC runs.

Resource leak in Files.list(Path dir) when stream is not explicitly closed?

I recently wrote a small app that periodically checked the content of a directory. After a while, the app crashed because of too many open file handles. After some debugging, I found the error in the following line:
Files.list(Paths.get(destination)).forEach(path -> {
    // Do stuff
});
I then checked the javadoc (I probably should have done that earlier) for Files.list and found:
* <p> The returned stream encapsulates a {@link DirectoryStream}.
* If timely disposal of file system resources is required, the
* {@code try}-with-resources construct should be used to ensure that the
* stream's {@link Stream#close close} method is invoked after the stream
* operations are completed
To me, "timely disposal" still sounds like the resources are going to be released eventually, before the app quits. I looked through the JDK (1.8.60) code but I wasn't able to find any hint about the file handles opened by Files.list being released again.
I then created a small app that explicitly calls the garbage collector after using Files.list like this:
while (true) {
    Files.list(Paths.get("/")).forEach(path -> {
        System.out.println(path);
    });
    Thread.sleep(5000);

    System.gc();
    System.runFinalization();
}
When I checked the open file handles with lsof -p <pid> I could still see the list of open file handles for "/" getting longer and longer.
My question now is: Is there any hidden mechanism that should eventually close no longer used open file handles in this scenario? Or are these resources in fact never disposed and the javadoc is a bit euphemistic when talking about "timely disposal of file system resources"?
If you close the Stream, Files.list() does close the underlying DirectoryStream it uses to stream the files, so there should be no resource leak as long as you close the Stream.
You can see where the DirectoryStream is closed in the source code for Files.list() here:
return StreamSupport.stream(Spliterators.spliteratorUnknownSize(it, Spliterator.DISTINCT), false)
                    .onClose(asUncheckedRunnable(ds));
The key thing to understand is that a Runnable is registered with the Stream using Stream::onClose; it is called when the stream itself is closed. That Runnable is created by a factory method, asUncheckedRunnable, which creates a Runnable that closes the resource passed into it, translating any IOException thrown during the close() into an UncheckedIOException.
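To see that hook in isolation, here is a toy sketch (not JDK code): the onClose action runs only when close() is invoked on the stream, which is exactly what try-with-resources guarantees.

import java.util.stream.Stream;

public class OnCloseDemo {
    public static void main(String[] args) {
        try (Stream<String> s = Stream.of("a", "b")
                                      .onClose(() -> System.out.println("resource released"))) {
            s.forEach(System.out::println);
        } // close() is called here, so "resource released" is printed
        // Without the try-with-resources block, forEach alone would never trigger the onClose hook.
    }
}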
You can safely assure that the DirectoryStream is closed by ensuring the Stream is closed like this:
try (Stream<Path> files = Files.list(Paths.get(destination))) {
    files.forEach(path -> {
        // Do stuff
    });
}
Regarding the IDE part: Eclipse performs resource leak analysis based on local variables (and explicit resource allocation expressions), so you only have to extract the stream to a local variable:
Stream<Path> files = Files.list(Paths.get(destination));
files.forEach(path -> {
    // Do stuff
});
Then Eclipse will tell you
Resource leak: 'files' is never closed
Behind the scenes the analysis works with a cascade of exceptions:
All Closeables need closing
java.util.stream.Stream (which is Closeable) does not need closing
All streams produced by methods in java.nio.file.Files do need closing
This strategy was developed in coordination with the library team when they discussed whether or not Stream should be AutoCloseable.
List<String> fileList = null;
try (Stream<Path> list = Files.list(Paths.get(path.toString()))) {
    fileList = list.filter(Files::isRegularFile)
                   .map(Path::toFile)
                   .map(File::getAbsolutePath)
                   .collect(Collectors.toList());
} catch (IOException e) {
    logger.error("Error occurred while reading email files: ", e);
}

Major Java libraries don't anticipate the case of a "cyclic copy" of a file onto a differently mapped destination

In my experience, after repeated tests and a lot of web research, I've found that the major Java libraries (Apache Commons, Google Commons, JCIFS) do not anticipate the case of a "cyclic copy" of a file onto a destination that is mapped differently (i.e. denoted with a different root path, in terms of the newer java.nio Path class) but that, at the end of the mapping chain, resolves to the origin file itself.
That is a data-loss situation, because neither the OutputStream approach nor NIO's getChannel() approach guards against this case: the origin file and the destination file are in reality "the same file", and the result of these methods is that the file is lost, or better said, its size becomes 0.
How can one avoid this without dropping down to a lower filesystem level, or even surrendering to a supposedly safer Runtime.exec that delegates the work to the underlying OS?
Should I lock the destination file (the methods above do not allow this), perhaps with the aid of the old RandomAccessFile class?
You can test this with those cited major libraries, using a common CopyFile(File origin, File dest) method, after having done one of the following:
1) mapped the origin folder of the file c:\tmp\test.txt to the virtual drive x: via cmd's [SUBST x: c:\tmp], then tried to copy onto x:\test.txt
2) a similar case where the local folder c:\tmp has been shared via the Windows share mechanism and the destination is given as a UNC path ending with the same file name
3) other similar network situations ...
I think there must be a better solution, but my Java experience is fairly limited, so I'm asking all of you. Thanks in advance if you're interested in this "real world" discussion.
Your question is interesting; I'd never thought about that. Look at this question: Determine Symbolic Links. You should detect the cycle before copying.
Perhaps you can try to approach this problem slightly differently and detect that the source and destination files are the same by comparing the files' metadata (name, size, date, etc.), and perhaps even by calculating a hash of the files' content as well (see the sketch below). This would of course slow processing down.
If you have enough permissions you could also write a 'marker' file with a random name in the destination and try to read it at the source to detect that they're pointing to the same place. Or try to check whether the file already exists at the destination before copying.
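A rough sketch of the metadata comparison mentioned above (a heuristic only: equal name, size and timestamp suggest, but do not prove, that the two paths resolve to the same file):

import java.io.File;

public class SameFileHeuristic {
    // Heuristic check: identical name, length and modification time suggest (but do not prove)
    // that source and destination resolve to the same underlying file.
    static boolean probablySameFile(File source, File destination) {
        return destination.exists()
                && source.getName().equals(destination.getName())
                && source.length() == destination.length()
                && source.lastModified() == destination.lastModified();
    }
}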
I agree that these are unusual situations, but you will agree that files are a critical foundation of every IT system. I disagree that manipulating files in Java is unusual: in my case I have to attach product image files through a FileChooser and copy them in an ordered way to a repository ... but real-world users (call them the customers who buy your product) may fall into such situations, and if that happens one cannot blame bad luck when your product does something "less" than expected.
It is good practice to learn from experience and to try to avoid what one of Murphy's Laws says, more or less: "if something CAN go wrong, it WILL go wrong sooner or later."
Perhaps this is also one of the reasons the Java team at Sun and Oracle enhanced the old java.io package into the newer java.nio. I'm analyzing the new java.nio.file.Files class, which had escaped my attention, and I believe I've found the solution I wanted and expected. See you later.
Thanks for the pointers from other experienced members of the community, and thanks also to a young member of my team, Tindaro, who helped me in the research: I've found the real solution in JDK 1.7. It is reliable, fast and simple, and will almost certainly cast a veil of pity over the older java.io solutions. Although the web is still full of examples of copying files in Java using in/out streams, I warmly suggest everyone use the simple method java.nio.file.Files.copy(Path origin, Path destination), with optional parameters for replacing the destination, migrating file metadata attributes, and even trying a transactional move of files (if permitted by the underlying OS).
That's a really good job, one we've waited on for so long!
You can easily convert code from copy(File file1, File file2) by appending ".toPath()" to the File instances (e.g. file1.toPath(), file2.toPath()).
Note also that the boolean method Files.isSameFile(file1.toPath(), file2.toPath()) is already used inside the above copy method, but it is easily usable on its own in any case you want.
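For example, a short sketch using the JDK 7 API (the method and parameter names are placeholders) that guards the copy with Files.isSameFile before calling Files.copy:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeCopy {
    // Guard the copy: skip it when origin and destination resolve to the same underlying file.
    public static void copyUnlessSame(Path origin, Path destination) throws IOException {
        if (Files.exists(destination) && Files.isSameFile(origin, destination)) {
            return; // copying here would only truncate the file to 0 bytes
        }
        Files.copy(origin, destination, StandardCopyOption.REPLACE_EXISTING,
                   StandardCopyOption.COPY_ATTRIBUTES);
    }
}

On a SUBST-mapped drive or a UNC share pointing back to the same folder, isSameFile should resolve both paths to the same underlying file, so the destructive copy is skipped.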
For cases where you can't upgrade to 1.7, using the community libraries from Apache or Google is still suggested, but for reliability, permit me to suggest the temporary workaround I found earlier:
public static boolean isTheSameFile(File f1, File f2) {
    // minimum prerequisites!
    if (f1.length() != f2.length()) return false;
    if (!f1.exists() || !f2.exists()) return false;
    if (f1.isDirectory() || f2.isDirectory()) return false;
    // if (f1.getCanonicalFile().equals(f2.getCanonicalFile())); // don't rely on this! it can still fail
    // new FileInputStream(f2).getChannel().lock(); // throws an exception: you can lock only on an OutputStream channel
    RandomAccessFile rf1 = null, rf2 = null; // the only practicable solution I found ... better than parsing entire files
    try {
        rf1 = new RandomAccessFile(f1, "r");
        rf2 = new RandomAccessFile(f2, "rw");
    } catch (FileNotFoundException e) {
        e.printStackTrace();
        return false;
    }
    try {
        rf2.getChannel().lock();
    } catch (IOException e) {
        return false;
    }
    try {
        rf1.getChannel().read(ByteBuffer.allocate(1)); // reads only 1 byte
    } catch (IOException e) {
        // if and only if they are the same file, the O.S. throws an IOException ("file already in use")
        try { rf1.close(); } catch (IOException e1) {}
        try { rf2.close(); } catch (IOException e1) {}
        return true;
    }
    // close the still-open resources ...
    if (rf1.getChannel().isOpen())
        try { rf1.getChannel().close(); } catch (IOException e) {}
    try {
        rf2.close();
    } catch (IOException e) {
        return false;
    }
    // done, the files differ
    return false;
}

Any sure fire way to check file existence on Linux NFS? [duplicate]

This question already has answers here:
Alternative to File.exists() in Java
(6 answers)
Closed 2 years ago.
I am working on a Java program that requires to check the existence of files.
Well, simple enough: the code makes calls to File.exists() to check for file existence. The problem I have is that it reports false positives. That means the file does not actually exist but the exists() method returns true. No exception was captured (at least no exception like "Stale NFS handle"). The program even managed to read the file through an InputStream, getting 0 bytes as expected, and yet no exception. The target directory is a Linux NFS mount. And I am 100% sure that the file being looked for never exists.
I know there are known bugs (more of an API limitation) for java.io.File.exists(). So I then added another workaround: checking file existence with the Linux command ls. Instead of calling File.exists(), the Java code now runs a Linux command to ls the target file. If the exit code is 0, the file exists. Otherwise, the file does not exist.
The number of times the issue is hit seems to have been reduced with the introduction of this trick, but it still pops up. Again, no error was captured anywhere (stdout this time). That means the problem is so serious that even a native Linux command won't fix it 100% of the time.
So there are a couple of questions around this:
I believe Java's well-known issue with File.exists() is about reporting false negatives, where a file was reported to not exist but in fact does exist. As the API does not throw IOException for File.exists(), it chooses to swallow the exception in case calls to the OS's underlying native functions fail, e.g. on an NFS timeout. But then this does not explain the false positive case I am having, given that the file never exists. Any thoughts on this one?
My understanding of the Linux ls exit code is that 0 means okay, equivalent to "file exists". Is this understanding wrong? The man page of ls is not so clear on the meaning of the exit code: Exit status is 0 if OK, 1 if minor problems, 2 if serious trouble.
All right, back to the subject. Any surefire way to check file existence with Java on Linux? Before we see JDK7 with NIO2 officially released.
Here is a JUnit test that shows the problem and some Java code that actually tries to read the file.
The problem happens e.g. when using Samba on OS X Mavericks. A possible reason is explained by the statement in:
http://appleinsider.com/articles/13/06/11/apple-shifts-from-afp-file-sharing-to-smb2-in-os-x-109-mavericks
It aggressively caches file and folder properties and uses opportunistic locking to enable better caching of data.
Please find below a checkExists method that will actually attempt to read a few bytes, forcing a true file access to avoid the caching misbehaviour ...
JUnit test:
/**
 * test the file-exists function on a network drive; replace the test file name and ssh computer
 * with your actual environment
 * @throws Exception
 */
@Test
public void testFileExistsOnNetworkDrive() throws Exception {
    String testFileName = "/Volumes/bitplan/tmp/testFileExists.txt";
    File testFile = new File(testFileName);
    testFile.delete();
    for (int i = 0; i < 10; i++) {
        Thread.sleep(50);
        System.out.println("" + i + ":" + OCRJob.checkExists(testFile));
        switch (i) {
            case 3:
                // FileUtils.writeStringToFile(testFile, "here we go");
                Runtime.getRuntime().exec("/usr/bin/ssh phobos /usr/bin/touch " + testFileName);
                break;
        }
    }
}
checkExists source code:
/**
 * check if the given file exists
 * @param f
 * @return true if file exists
 */
public static boolean checkExists(File f) {
    try {
        byte[] buffer = new byte[4];
        InputStream is = new FileInputStream(f);
        if (is.read(buffer) != buffer.length) {
            // do something
        }
        is.close();
        return true;
    } catch (java.io.IOException fnfe) {
    }
    return false;
}
JDK7 was released a few months ago. There are exists and notExists methods in the Files class but they return a boolean rather than throwing an exception. If you really want an exception then use FileSystems.getDefault().provider().checkAccess(path) and it will throw an exception if the file does not exist.
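A minimal sketch of that NIO2 call (the path is a placeholder):

import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CheckAccessDemo {
    public static void main(String[] args) {
        Path path = Paths.get("/mnt/nfs/some/file.txt"); // placeholder path
        try {
            // Throws NoSuchFileException (or another IOException) instead of returning a boolean,
            // so an NFS error is at least visible rather than silently swallowed.
            FileSystems.getDefault().provider().checkAccess(path);
            System.out.println("file exists");
        } catch (NoSuchFileException e) {
            System.out.println("file does not exist");
        } catch (IOException e) {
            System.out.println("could not determine existence: " + e);
        }
    }
}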
If you need to be robust, try to read the file - and fail gracefully if the file is not there (or there is a permission or other problem). This applies to any other language than Java as well.
The only safe way to tell whether the file exists and you can read from it is to actually read data from the file, regardless of the file system - local or remote. The reason is a race condition which can occur right after you get success from checkAccess(path): check, then open the file, and you find it suddenly does not exist. Some other thread (or another remote client) may have removed it, or may have acquired an exclusive lock. So don't bother checking access; rather, try to read the file. Spending time running ls just makes the race-condition window easier to hit.
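A sketch of that "just try to read it" approach (the path is a placeholder):

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadOrFail {
    public static void main(String[] args) {
        try (InputStream in = Files.newInputStream(Paths.get("/mnt/nfs/some/file.txt"))) { // placeholder path
            byte[] buffer = new byte[4096];
            int n = in.read(buffer);
            System.out.println("read " + n + " bytes"); // the file exists and is readable
        } catch (IOException e) {
            // missing file, stale NFS handle, permission problem ... handle it here
            System.out.println("could not read file: " + e);
        }
    }
}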

Locking file across services

What is the best way to share a file between two "writer" services in the same application?
Edit:
Sorry I should have given more details I guess.
I have a Service that saves entries into a buffer. When the buffer gets full it writes all the entries to the file (and so on). Another Service will come along at some point, read the file (essentially copy/compress it), and then empty it.
Here is a general idea of what you can do:
import java.io.FileWriter;
import java.io.IOException;

public class FileManager
{
    private final FileWriter writer;
    private final Object sync = new Object();

    public FileManager() throws IOException
    {
        writer = new FileWriter("SomeFile.txt");
    }

    public void writeBuffer(String buffer) throws IOException
    {
        synchronized(sync)
        {
            writer.write(buffer);
        }
    }

    public void copyAndCompress()
    {
        synchronized(sync)
        {
            // copy and/or compress
        }
    }
}
You will have to do some extra work to get it all to work safe, but this is just a basic example to give you an idea of how it looks.
A common method for locking is to create a second file in the same location as the main file. The second file may contain locking data or be blank. The benefit to having locking data (such as a process ID) is that you can easily detect a stale lockfile, which is an inevitability you must plan for. Although PID might not be the best locking data in your case.
example:
Service1:
creates myfile.lock
creates/opens myfile
Service2:
Notices that myfile.lock is present and pauses/blocks/waits
When myfile.lock goes away, it creates it and then opens myfile.
It would also be advantageous for you to double-check that the file contains your locking information (identification specific to your service) right after creating it - just in case two or more services are waiting and create a lock at the exact same time. The last one succeeds and so all other services should notice that their locking data is no longer in the file. Also - pause a few milliseconds before checking its contents.
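A hedged sketch of that lock-file protocol (the file name and service id are placeholders; stale-lock detection is only hinted at):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class LockFileDemo {
    private static final Path LOCK = Paths.get("myfile.lock");   // placeholder lock-file name
    private static final String SERVICE_ID = "service-1";        // identification specific to this service

    static boolean tryAcquire() {
        try {
            // CREATE_NEW makes the create atomic: it fails if another service already holds the lock.
            Files.write(LOCK, SERVICE_ID.getBytes(), StandardOpenOption.CREATE_NEW);
            // Pause a few milliseconds, then re-read to make sure our id is really the one in the file.
            Thread.sleep(10);
            return SERVICE_ID.equals(new String(Files.readAllBytes(LOCK)));
        } catch (IOException | InterruptedException e) {
            return false;   // the lock is held by someone else (or we were interrupted)
        }
    }

    static void release() throws IOException {
        Files.deleteIfExists(LOCK);
    }
}

CREATE_NEW makes the creation atomic on most file systems, so in practice the re-read is a belt-and-braces check for the simultaneous-creation case described above.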
