I have two Java processes running in parallel on the same machine, and I want them to append debugging data to the same file. The order in which they append is crucial and must be preserved. Because the two processes share nothing but the OS itself, I think I need OS-level I/O synchronization.
So how is it done in Java?
Fortunately, Java provides FileLock in java.nio.channels (available since Java 1.4; AutoCloseable since Java 7). See the javadoc: http://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileLock.html
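A minimal sketch of what this could look like (the class and file names are just for illustration): each process takes an exclusive OS-level lock around its append, so writes from the two JVMs cannot interleave.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SharedLog {
    // Appends one line to the shared file while holding an exclusive
    // file lock, so appends from separate JVMs cannot interleave.
    public static void append(Path file, String line) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
             FileLock lock = ch.lock()) { // blocks until any other process releases
            ch.write(ByteBuffer.wrap(
                    (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8)));
        } // lock and channel released here
    }

    public static void main(String[] args) throws IOException {
        append(Path.of("debug.log"),
                "hello from process " + ProcessHandle.current().pid());
    }
}
```

Note that the lock is advisory: it only keeps out other processes that also ask for the lock before writing.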
Related
I am using Java's file-locking API on a Linux server and am trying to lock a file on a remote Linux NFS system. Some issues have popped up, and the logs show that two different cluster nodes running the same Java webserver app are both able to acquire a lock on the same NFS file.
Reading online about how Java handles locks and Linux file locking (we usually deploy on Windows servers so there is very little Linux experience), what we do should work. We are essentially issuing advisory locks but since both cluster nodes run the same code they are cooperating processes and they check for lock acquisition before starting to do any read-write ops. However this does not seem to be the case and both systems are able to successfully acquire a lock on the file, concurrently.
Are these known issues? Many comments and articles online declare NFS file locking unstable and say it should be avoided. Why is that? How would network connectivity issues (e.g. sudden communication drops) influence this behavior? The Linux kernel version should be quite recent.
@StephenC so after some more testing: calling RandomAccessFile.getChannel().tryLock() from a Java main method works fine over NFSv4, but when the same code runs inside Tomcat (8.5.68), multiple locks are acquired.
OK. So I think I understand the root of your problem now. From what you have said, it sounds to me like you are trying to use FileLock to stop one thread of your Tomcat JVM from locking a section of a file while another Tomcat thread has it locked.
That's not going to work.
The lock that you are using is a FileLock. A key paragraph of the javadoc states this:
File locks are held on behalf of the entire Java virtual machine. They are not suitable for controlling access to a file by multiple threads within the same virtual machine.
In this case, "not suitable" means "doesn't work".
If you drill down to the Linux manual page for flock (2) (which is used by Java to implement these locks), you will see that the semantics are defined in terms of multiple processes, not multiple threads. For example:
LOCK_EX Place an exclusive lock. Only one process may hold an exclusive lock for a given file at a given time.
and
A call to flock() may block if an incompatible lock is held by another process.
So, in summary, it is still not Java's fault. You are trying to use FileLock in a way that Java doesn't support ... and could not support, given how Linux (and indeed POSIX) flock is specified.
(IMO, all of the stuff about NFS is a red herring. The above problem is not caused by NFS. The reason that it shows up on an NFS file system, is that NFS operations take longer and therefore the time window for overlapping operations on the same file is much larger. And if your customer's use-case is hammering their NFS ...)
(But if I am wrong and NFS is implicated, then your "main vs Tomcat" observation is inexplicable. The JVM will not be doing file locking differently in those two cases: it will be using the same OpenJDK code in both cases. Furthermore, the JVM won't even be aware that it is talking to an NFS file system. You can take a look at the OpenJDK codebase if you don't believe me. It's not that hard ...)
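Within a single JVM this is easy to demonstrate: while one channel holds the lock, an overlapping tryLock from a second channel throws OverlappingFileLockException rather than keeping the other thread out. A small sketch (the class name is illustrative):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SameJvmLockDemo {
    // Returns true if a second channel in the same JVM is refused a lock
    // while the first channel holds one -- i.e. FileLock is held per JVM,
    // not per thread.
    public static boolean secondLockRejected(Path file) throws IOException {
        try (FileChannel a = FileChannel.open(file, StandardOpenOption.WRITE);
             FileChannel b = FileChannel.open(file, StandardOpenOption.WRITE);
             FileLock held = a.lock()) {
            try {
                FileLock second = b.tryLock();
                if (second != null) second.release();
                return false;   // two overlapping locks in one JVM
            } catch (OverlappingFileLockException expected) {
                return true;    // the usual outcome: locks are JVM-wide
            }
        }
    }
}
```

So a second thread cannot rely on FileLock to wait for the first; in-JVM coordination needs java.util.concurrent locks instead.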
See also:
How to lock file for another threads
Is FileLock in java safe across multiple threads within the same process or between different processes or both?
and so on.
I found the root cause of this issue. It seems that when two different threads of the same JVM create a RandomAccessFile object on the same file, calling RandomAccessFile.close from one thread releases the lock the other thread holds.
The documentation of RandomAccessFile.close says
Closes this random access file stream and releases any system resources associated with the stream.
I'm not sure whether this means that the system resources are released at the JVM level. I would expect each RandomAccessFile object to get its own stream and release only that one, but that appears not to be the case (or at least the lock on that file gets released). This matches POSIX record-locking semantics, where closing any file descriptor for a file releases all the locks the process holds on that file. Note that this behavior has not been observed on Windows systems.
The story:
A few days ago I was thinking about inter-process communication based on file exchange. Say process A creates several files during its work and process B reads these files afterwards. To ensure that all files were correctly written, it would be convenient to create a special file whose existence signals that all operations were done.
Simple workflow:
process A creates file "file1.txt"
process A creates file "file2.txt"
process A creates file "processA.ready"
Process B is waiting until file "processA.ready" appears and then reads file1 and file2.
Doubts:
File operations are performed by the operating system, specifically by the file subsystem. Since implementations differ between Unix, Windows and macOS, I'm uncertain about the reliability of file-based inter-process communication. Even if the OS guarantees this consistency, there are things like the JIT compiler in Java, which can reorder program instructions.
Questions:
1. Are there any real specifications on file operations in operating systems?
2. Is JIT really allowed to reorder file operation program instructions for a single program thread?
3. Is file exchange still a relevant option for inter-process communication nowadays, or is it unconditionally better to choose TCP/HTTP/etc.?
You don't need to know OS details in this case. The Java I/O API is specified well enough to tell you whether a file was written successfully or not.
The JVM can't reorder native calls. This is not stated explicitly in the JMM, but it is implied: the JVM can't know what impact a native call has, so reordering such calls could be quite dangerous.
There are some disadvantages of using files as a way of communication:
It uses IO which is slow
It is difficult to split the processes across different machines if you ever need to (there are ways, using Samba for example, but they are quite platform-dependent)
You could use File watcher (WatchService) in Java to receive a signal when your .ready file appears.
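A sketch of what that might look like for process B (the up-front existence check covers a file created before the watch was registered, a classic WatchService race):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class ReadyFileWatcher {
    // Blocks until a file with the given name appears in dir.
    public static void awaitFile(Path dir, String name)
            throws IOException, InterruptedException {
        try (WatchService ws = dir.getFileSystem().newWatchService()) {
            dir.register(ws, StandardWatchEventKinds.ENTRY_CREATE);
            // Check after registering: the file may already be there.
            if (Files.exists(dir.resolve(name))) return;
            while (true) {
                WatchKey key = ws.take(); // blocks until an event arrives
                for (WatchEvent<?> ev : key.pollEvents()) {
                    if (ev.context() != null && name.equals(ev.context().toString())) {
                        return;
                    }
                }
                key.reset();
            }
        }
    }
}
```

Process B would call `awaitFile(dir, "processA.ready")` and only then read file1 and file2.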
Reordering could apply but it shouldn't hurt your application logic in this case - refer the following link:
https://assylias.wordpress.com/2013/02/01/java-memory-model-and-reordering/
I don't know the size of your data, but I feel it would still be better to use a message-queue (MQ) solution in this case. File I/O is a relatively slow operation which could slow down the system.
I used a file-exchange-based approach on one of my projects. It's based on renaming file extensions when a process is done, so another process can pick files up by checking for a file-name pattern.
An FTP process downloads a file and renames it to '*.downloaded'.
The main task processor searches the directory for '*.downloaded' files.
Before starting, the job renames the file to '*.processing'.
When finished, it renames the file to '*.done'.
In case of error, it creates a supplementary file with an '.error' extension and puts the last processed line and the exception trace there. On retries, if this file exists, it is read and processing resumes from the correct position.
A locator process searches for '*.done' files and, according to its config, moves them to a backup folder or deletes them.
This approach is working fine with a huge load in a mobile operator network.
One consideration: using unique file names is important, because file-move behaviour differs between operating systems; e.g. Windows gives an error when the same file exists at the destination, whereas Unix overwrites it.
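The claim step in a workflow like this can be sketched with an atomic rename: ATOMIC_MOVE makes the rename succeed for exactly one worker (where the filesystem supports it; otherwise it throws AtomicMoveNotSupportedException). The class and extension names are illustrative:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StateRename {
    // Moves file '*.downloaded' -> '*.processing' atomically, so only one
    // worker can claim it. Returns the new path, or null if another
    // worker already renamed (claimed) the file.
    public static Path claim(Path downloaded) throws IOException {
        Path processing = Path.of(downloaded.toString()
                .replaceAll("\\.downloaded$", ".processing"));
        try {
            return Files.move(downloaded, processing,
                    StandardCopyOption.ATOMIC_MOVE);
        } catch (NoSuchFileException raced) {
            return null; // source vanished: another process claimed it first
        }
    }
}
```

The same pattern covers the '.processing' to '.done' transition.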
I was wondering if anybody knew with any certainty whether ProcessBuilder/Runtime.exec() executes inside the space of the JVM's memory or whether it uses completely separate system memory and somehow sends the output to Java. I could not find any documentation on the subject.
I assume it is the former due to security issues and being able to read output, but I would like to make absolutely sure.
The new process runs outside the Java process that started it. Allocation of memory to the new process is managed by the operating system, as part of process management.
The Java class ProcessBuilder, which provides an interface for starting and communicating with the new process, runs inside the Java process.
It seems pretty clear that exec launches a new process (a separate program, for those not versed in operating-system terminology). That's why it has input/output facilities, the ability to set the environment, and the ability to wait for the external program to return.
The first line of the javadoc says it all.
Executes the specified string command in a separate process.
The command argument is parsed into tokens and then executed as a command in a separate process. The token parsing is done by a StringTokenizer created by the call new StringTokenizer(command) with no further modifications of the character categories. This method has exactly the same effect as exec(command, null).
From the concurrency reference of Java SE, it is said that:
A process has a self-contained execution environment. A process
generally has a complete, private set of basic run-time resources; in
particular, each process has its own memory space.
If you are interested in the internals, check the UNIXProcess class from the openJDK.
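A quick way to see that the child is a separate OS process: it has its own pid, and the parent interacts with it only through its streams and exit code. This sketch assumes a `java` binary is on the PATH:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ExternalProcessDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Start a child process; its memory is managed by the OS,
        // entirely outside this JVM's heap.
        Process p = new ProcessBuilder("java", "-version")
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
        String output = new String(p.getInputStream().readAllBytes(),
                StandardCharsets.UTF_8);
        int exit = p.waitFor();
        // The child's pid differs from this JVM's pid.
        System.out.println("parent pid=" + ProcessHandle.current().pid()
                + " child pid=" + p.pid() + " exit=" + exit);
        System.out.println(output);
    }
}
```

If the child were running inside the parent JVM's memory, it could not have its own pid or be killed independently by the OS.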
Is there any way in java to read a file's content, which is being updated by another handler before closing it?
That depends on the operating systems.
Traditionally, POSIX-y operating systems (Linux, Solaris, ...) have absolutely no problem with having a file open for both reading and writing, even by separate processes (they even support deleting a file while it's being read from and/or written to).
On Windows, the more common approach is to open files exclusively (contrary to common belief, Windows does support non-exclusive file access; it's just rarely used by applications).
Java has no way* of specifying what way you want to access a file, so the platform default is used (shared access on Linux/Solaris, exclusive access on Windows).
* This might be wrong for NIO and new NIO in Java 7, but I'm not a big NIO expert.
In theory it's quite easy to do; however, files are not designed for exchanging data this way, and depending on your requirements it can be quite tricky to get right. This is why there is no general solution for it.
E.g. if you want to read a file while another process writes to it, the reading thread will see an EOF even though the writer hasn't finished. You have to re-open the file, skip to where you last read, and continue. The writing process might also roll the file it is writing, which the reader has to detect and handle.
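A sketch of that re-open-and-skip idea: remember the last offset and, on each poll, read only the bytes appended since then (detection of truncated or rolled files is deliberately left out; the class name is illustrative):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public class FileTailer {
    private final Path file;
    private long offset = 0; // how far we have read so far

    public FileTailer(Path file) { this.file = file; }

    // Returns any bytes appended since the last call, or "" if none.
    public String readNew() throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long len = raf.length();
            if (len <= offset) return ""; // no new data (or file was rolled)
            raf.seek(offset);
            byte[] buf = new byte[(int) (len - offset)];
            raf.readFully(buf);
            offset = len;
            return new String(buf, StandardCharsets.UTF_8);
        }
    }
}
```

A real tailer would also notice the length shrinking (the writer rolled the file) and reset its offset.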
What specifically do you want to do?
What happens when you concurrently open two (or more) FileOutputStreams on the same file?
The Java API says this:
Some platforms, in particular, allow a file to be opened for writing by only one FileOutputStream (or other file-writing object) at a time.
I'm guessing Windows isn't such a platform, because I have two threads that each read a (different) big file and write it to the same output file. No exception is thrown, the file is created, and it seems to contain chunks from both input files.
Side questions:
Is this true for Unix, too?
And since I want the behaviour to be the same (actually I want one thread to write correctly and the other to be warned of the conflict), how can I determine that the file is already opened for writing?
There's no reliable, cross-platform way to be passively notified when a file has another writer (i.e., to have an exception raised if a file is already open for writing). There are a couple of techniques that help you actively check for this, however.
If multiple processes (which can be a mix of Java and non-Java) might be using the file, use a FileLock. A key to using file locks successfully is to remember that they are only "advisory". The lock is guaranteed to be visible if you check for it, but it won't stop you from doing things to the file if you forget. All processes that access the file should be designed to use the locking protocol.
If a single Java process is working with the file, you can use the concurrency tools built into Java to do it safely. You need a map visible to all threads that associates each file name with its corresponding lock instance. The answers to a related question can be adapted easily to do this with File objects or canonical paths to files. The lock object could be a FileOutputStream, some wrapper around the stream, or a ReentrantReadWriteLock.
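The per-file lock map might look like this sketch, keyed on normalized absolute paths so that different spellings of the same path share one lock (the class name is illustrative):

```java
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FileLocks {
    // One lock object per normalized path, shared by all threads in this JVM.
    private static final ConcurrentMap<Path, ReadWriteLock> LOCKS =
            new ConcurrentHashMap<>();

    // computeIfAbsent guarantees all threads see the same lock instance
    // for the same file, even when they race to create it.
    public static ReadWriteLock forFile(Path file) {
        return LOCKS.computeIfAbsent(file.toAbsolutePath().normalize(),
                p -> new ReentrantReadWriteLock());
    }
}
```

A writer thread would then wrap its I/O in `forFile(path).writeLock().lock()` / `unlock()`, while readers can share the read lock.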
I would be wary of letting the OS determine file status for you (since this is OS-dependent). If you have a shared resource, I would restrict access to it using a ReentrantLock.
Using this lock means one thread can get the resource (file) and write to it. The next thread can check for this lock being held by another thread, and/or block indefinitely until the first thread releases it.
Windows (I think) would restrict two processes writing to the same file. I don't believe Unix would do the same.
If the 2 threads you are talking about are in the same JVM, then you could have a boolean variable somewhere that is accessed by both threads.
Unix allows concurrent writers to the same file.
You shouldn't be attempting to write to the same file more than once. If you are you have a design flaw.