Publisher-Subscriber setup with WatchService - java

I am trying to set up a two-way publisher-subscriber using the WatchService in NIO.
I'm not terribly experienced with threads, so if I'm not making any sense feel free to call me out on it!
This is only a sample to figure out how the library works, but the production code is going to listen for a change in an input file, and when the file changes it will do some calculations and then write to an output file. That output file will be read by another program, which will run some calculations on it and then write back to the input file, and the cycle continues.
For this test, I am making 2 threads with watchers: the first thread listens on first.txt and writes to second.txt, and the second thread waits on second.txt and writes to first.txt. All each thread does is increment a count variable and write it to that thread's output file. Both threads have blocking calls and filters on which files they actually care about, so I figured the behavior would look like this:
Both threads are waiting on take() call.
Change first.txt to start the process
This triggers the first thread to change second.txt
Which then triggers the second thread to change first.txt
and so on.
Or so I hoped. The end result is that the threads get way out of sync, and when I run this up to a count of 1000, one thread usually ends up behind by more than 50.
Here is the code for the watcher:
class Watcher {
    private final WatchService watcher;
    private final Path input;
    private final Path output;
    private final Path dir;
    private int count = 0;

    Watcher(Path input, Path output) throws IOException {
        this.watcher = FileSystems.getDefault().newWatchService();
        this.input = input;
        this.output = output;
        dir = input.getParent();
        dir.register(watcher, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY);
    }

    void watchAndRespond() throws IOException, InterruptedException {
        while (count < 1000) {
            WatchKey key = watcher.take(); // blocks until an event arrives
            for (WatchEvent<?> event : key.pollEvents()) {
                // ignore events for files other than the one we watch
                if (!event.context().equals(input.getFileName())) {
                    continue;
                }
                WatchEvent.Kind kind = event.kind();
                if (kind == OVERFLOW) {
                    continue;
                }
                count++;
                try (BufferedWriter out = new BufferedWriter(new FileWriter(output.toFile()))) {
                    out.write(count + "");
                }
            }
            key.reset();
        }
    }
}
I don't want to have to read the file to decide whether or not the file has changed, because these files in production could potentially be large.
I feel like maybe this is too complicated and I'm treating a scraped knee with amputation. Am I using this library incorrectly? Am I using the wrong tool for the job, and if so, are there any other file-listening libraries I can use so I don't have to poll the last-edited time?
EDIT:
Oops, here is the test I wrote that sets up the two threads:
@Test
public void when_two_watchers_run_together_they_end_up_with_same_number_of_evaluation() throws InterruptedException, IOException {
    //setup
    Path input = environment.loadResourceAt("input.txt").asPath();
    Path output = environment.loadResourceAt("output.txt").asPath();
    if (Files.exists(input)) {
        Files.delete(input);
    }
    if (Files.exists(output)) {
        Files.delete(output);
    }
    Thread thread1 = makeThread(input, output, "watching input");
    Thread thread2 = makeThread(output, input, "watching output");
    //act
    thread1.start();
    thread2.start();
    Thread.sleep(50);
    BufferedWriter out = new BufferedWriter(new FileWriter(input.toFile()));
    out.write(0 + "");
    out.close();
    thread1.join();
    thread2.join();
    int inputResult = Integer.parseInt(Files.readAllLines(input).get(0));
    int outputResult = Integer.parseInt(Files.readAllLines(output).get(0));
    //assert
    assertThat(inputResult).describedAs("Expected is output file, Actual is input file").isEqualTo(outputResult);
}

public Thread makeThread(Path input, Path output, String threadName) {
    return new Thread(() -> {
        try {
            new Watcher(input, output).watchAndRespond();
        }
        catch (IOException | InterruptedException e) {
            fail();
        }
    }, threadName);
}
I think the problem is that some of the modifications are putting multiple events into the queue, and at this point I have no way to discern whether two events were created by one save or by two separate saves.
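(A minimal sketch of one way to collapse such duplicates, assuming the duplicate events for a single save arrive in the same WatchKey batch, which is not guaranteed: treat everything key.pollEvents() returns for the watched file as one logical change.)
WatchKey key = watcher.take();
boolean changed = false;
for (WatchEvent<?> event : key.pollEvents()) {
    // collapse all non-overflow events for the input file in this batch into one change
    if (event.kind() != OVERFLOW && event.context().equals(input.getFileName())) {
        changed = true;
    }
}
if (changed) {
    count++;
    try (BufferedWriter out = new BufferedWriter(new FileWriter(output.toFile()))) {
        out.write(count + "");
    }
}
key.reset();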

The tool seems quite right, but your code will have to flow in sequence, or else everything will get out of sync, as you have noticed.
Look at it as a transaction which has to be completed before another transaction starts.
In this case the transaction boils down to:
1.) Detect File1 change
2.) Modify File2
3.) Detect File2 change
4.) Modify File1
If another cycle starts before this cycle ends completely, there will be trouble. When you are using threads, the scheduling and execution are not entirely predictable, so you cannot know whether your 2 threads are doing things sequentially as your requirements demand.
So you would have to share your thread code for anyone to give a specific solution.
Another point: could you keep a small change file which contains only the change, and use that instead of the bigger production file? That way you reduce the focus to a smaller object.

Here is something more specific I noticed after running your code. It's quite fine; some points though: there is no need for
thread1.join();
thread2.join();
Both threads are required to run concurrently, so join is not needed.
As for your main question: the threads are out of sync because each is connected to its own watcher object, so the count value is different in each thread.
Depending on how the scheduler runs the threads, one of them will get more mileage and reach count 1000 first while the other is still lagging behind.
I am editing in response to your comment: take() is a blocking call and it is working perfectly. In my case the only event trapped is ENTRY_MODIFY, so there is no multiple-event issue.
One tip: you can use dir.register(watcher, ENTRY_MODIFY); to check only for modify events. Please see my code below; my printlns may also help you follow the code flow.
public class WatcherTest {
    public static void main(String args[]) throws InterruptedException, IOException {
        Path input = FileSystems.getDefault().getPath("txt", "input.txt");
        Path output = FileSystems.getDefault().getPath("txt", "output.txt");
        if (Files.exists(input)) {
            Files.delete(input);
        }
        if (Files.exists(output)) {
            Files.delete(output);
        }
        // FileWatcherService (below) implements Runnable, so it can be handed to a Thread
        Thread thread1 = new Thread(new FileWatcherService(input, output), "ThreadToOpt");
        Thread thread2 = new Thread(new FileWatcherService(output, input), "ThreadToInpt");
        thread1.start();
        thread2.start();
        Thread.sleep(100);
        BufferedWriter out = new BufferedWriter(new FileWriter(input.toFile()));
        out.write(0 + "");
        out.close();
        //thread1.join();
        //thread2.join();
        //int inputResult = Integer.parseInt(Files.readAllLines(input, Charset.defaultCharset()).get(0));
        //int outputResult = Integer.parseInt(Files.readAllLines(output, Charset.defaultCharset()).get(0));
    }
}
class FileWatcherService implements Runnable {
    private WatchService watcher;
    private Path input;
    private Path output;
    private Path dir;
    private int count = 0;

    FileWatcherService(Path input, Path output) throws IOException {
        this.watcher = FileSystems.getDefault().newWatchService();
        this.input = input;
        this.output = output;
        dir = input.getParent(); // assign the field, do not shadow it with a local
        dir.register(watcher, ENTRY_MODIFY);
    }

    @Override
    public void run() {
        try {
            watchAndRespond();
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        }
    }

    void watchAndRespond() throws IOException, InterruptedException {
        while (count < 1000) {
            System.out.println("\n COUNT IS " + count + " in Thread " + Thread.currentThread().getName());
            System.out.println("\n Blocking on Take in Thread " + Thread.currentThread().getName());
            WatchKey key = watcher.take();
            System.out.println("\n Out of Blocking State " + Thread.currentThread().getName());
            int eventsPassed = 0;
            for (WatchEvent<?> event : key.pollEvents()) {
                if (!event.context().equals(input.getFileName())) {
                    continue;
                }
                System.out.println("\n File Context : " + event.context() + " Event Kind " + event.kind() + " in Thread " + Thread.currentThread().getName());
                WatchEvent.Kind kind = event.kind();
                if (kind == OVERFLOW) {
                    continue;
                }
                eventsPassed++;
                count++;
                //synchronized(output){
                try (BufferedWriter out = new BufferedWriter(new FileWriter(output.toFile()))) {
                    out.write(count + "");
                    System.out.println("\n Wrote count : " + count + " to File " + output.getFileName() + " in Thread " + Thread.currentThread().getName());
                }
                //}
            }
            System.out.println("\n The eventsPassed counter is " + eventsPassed + " \n for thread " + Thread.currentThread().getName());
            key.reset();
        }
    }
}

Related

process.waitFor(timeout, timeUnit) does not quit the process after specified time

I'm trying to execute a Visual Basic script in my Java application using ProcessBuilder. As the script provided by the user might not finish its execution in time, I want to provide a means to limit this execution time. In the following code you can see my logic, but it doesn't really do what it is supposed to do. How can I make this waitFor work in order to limit the execution time?
private void run(String scriptFilePath) throws ScriptPluginException {
    BufferedReader input = null;
    BufferedReader error = null;
    try {
        ProcessBuilder p = new ProcessBuilder("cscript.exe", "//U", "\"" + scriptFilePath + "\"");
        String path = "";
        if (scriptFilePath.indexOf("/") != -1) {
            path = scriptFilePath.substring(0, scriptFilePath.lastIndexOf("/"));
        }
        path += "/" + "tempvbsoutput.txt";
        p.redirectOutput(new File(path));
        Process pp = p.start();
        try {
            pp.waitFor(executionTimeout, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            SystemLog.writeError(jobId, ScriptConsts.COMPONENT_ID, "VBScriptExecutor", "run", 80401104,
                    "VB Script executes fail.");
        }
        if (!pp.isAlive()) {
            pp.getOutputStream().close();
        }
        // rest of the code flow
}
Process.waitFor(long, TimeUnit) waits until the process has terminated or the specified time elapsed (Javadoc). The return value indicates whether the process exited or not.
if (process.waitFor(1, TimeUnit.MINUTES)) {
    System.out.println("process exited");
} else {
    System.out.println("process is still running");
}
waitFor() does not kill the process after the time elapsed.
If you want to kill the subprocess, use either destroy() or destroyForcibly().
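A short sketch putting those together, reusing the names from the question's snippet (pp, executionTimeout):
Process pp = p.start();
// waitFor returns false if the timeout elapsed before the process exited
if (!pp.waitFor(executionTimeout, TimeUnit.MINUTES)) {
    // still running after the timeout, so kill the subprocess ourselves
    pp.destroyForcibly();
}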

Process large text file concurrently

So I have a large text file, in this case it's roughly 4.5 GB, and I need to process the entire file as fast as is possible. Right now I have multi-threaded this using 3 threads (not including the main thread). An input thread for reading the input file, a processing thread to process the data, and an output thread to output the processed data to a file.
Currently, the bottleneck is the processing section. Therefore, I'd like to add more processing threads into the mix. However, this creates a situation where I've got multiple threads accessing the same BlockingQueue, and their results are therefore not maintaining the order of the input file.
An example of the functionality I'm looking for would be something like this:
Input file: 1, 2, 3, 4, 5
Output file: ^ the same. Not 2, 1, 4, 3, 5 or any other combination.
I've written a dummy program that is identical in functionality to the actual program minus the processing part (I can't give you the actual program because the processing class contains confidential information). I should also mention that all of the classes (Input, Processing, and Output) are inner classes contained within a Main class, which holds the initialise() method and the class-level variables mentioned in the main thread code listed below.
Main thread:
static volatile boolean readerFinished = false; // class level variables
static volatile boolean writerFinished = false;

private void initialise() throws IOException {
    BlockingQueue<String> inputQueue = new LinkedBlockingQueue<>(1_000_000);
    BlockingQueue<String> outputQueue = new LinkedBlockingQueue<>(1_000_000); // capacity 1 million.
    String inputFileName = "test.txt";
    String outputFileName = "outputTest.txt";
    BufferedReader reader = new BufferedReader(new FileReader(inputFileName));
    BufferedWriter writer = new BufferedWriter(new FileWriter(outputFileName));
    Thread T1 = new Thread(new Input(reader, inputQueue));
    Thread T2 = new Thread(new Processing(inputQueue, outputQueue));
    Thread T3 = new Thread(new Output(writer, outputQueue));
    T1.start();
    T2.start();
    T3.start();
    while (!writerFinished) {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
    reader.close();
    writer.close();
    System.out.println("Exited.");
}
Input thread: (Please forgive the commented debug code; I was using it to ensure the reader thread was actually executing properly.)
class Input implements Runnable {
    BufferedReader reader;
    BlockingQueue<String> inputQueue;

    Input(BufferedReader reader, BlockingQueue<String> inputQueue) {
        this.reader = reader;
        this.inputQueue = inputQueue;
    }

    @Override
    public void run() {
        String poisonPill = "ChH92PU2KYkZUBR";
        String line;
        //int linesRead = 0;
        try {
            while ((line = reader.readLine()) != null) {
                inputQueue.put(line);
                //linesRead++;
                /*
                if (linesRead == 500_000) {
                    //batchesRead += 1;
                    //System.out.println("Batch read");
                    linesRead = 0;
                }
                */
            }
            inputQueue.put(poisonPill);
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        }
        readerFinished = true;
    }
}
Processing thread: (Normally this would actually be doing something to the line, but for purposes of the mockup I've just made it push straight to the output queue.) If necessary, we can simulate it doing some work by making the thread sleep for a small amount of time for each line.
class Processing implements Runnable {
    BlockingQueue<String> inputQueue;
    BlockingQueue<String> outputQueue;

    Processing(BlockingQueue<String> inputQueue, BlockingQueue<String> outputQueue) {
        this.inputQueue = inputQueue;
        this.outputQueue = outputQueue;
    }

    @Override
    public void run() {
        while (true) {
            try {
                if (inputQueue.isEmpty() && readerFinished) {
                    break;
                }
                String line = inputQueue.take();
                outputQueue.put(line);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}
Output thread:
class Output implements Runnable {
    BufferedWriter writer;
    BlockingQueue<String> outputQueue;

    Output(BufferedWriter writer, BlockingQueue<String> outputQueue) {
        this.writer = writer;
        this.outputQueue = outputQueue;
    }

    @Override
    public void run() {
        String line;
        ArrayList<String> outputList = new ArrayList<>();
        while (true) {
            try {
                line = outputQueue.take();
                if (line.equals("ChH92PU2KYkZUBR")) {
                    for (String outputLine : outputList) {
                        writer.write(outputLine);
                    }
                    System.out.println("Writer finished - executing termination");
                    writerFinished = true;
                    break;
                }
                line += "\n";
                outputList.add(line);
                if (outputList.size() == 500_000) {
                    for (String outputLine : outputList) {
                        writer.write(outputLine);
                    }
                    System.out.println("Writer wrote batch");
                    outputList = new ArrayList<>();
                }
            } catch (IOException | InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}
So right now the general data flow is very linear, looking something like this:
Input > Processing > Output
But what I'd like is the input fanned out across several processing threads and joined back together at the output:
Input > Processing (xN) > Output
But the catch is, when the data gets to output, it either needs to be sorted into the correct order, or it needs to already be in the correct order.
Recommendations or examples on how to go about this would be greatly appreciated.
In the past I have used the Future and Callable interfaces to solve a task involving parallel data flows like this, but unfortunately that code was not reading from a single queue, and so is of minimal help here.
I should also add, for those of you who will notice this: batchSize and poisonPill are normally defined in the main thread and then passed around via variables; they are not usually hard-coded as they are in the Input thread code and the writer thread's output checks. I was just a wee bit lazy when writing the mockup for experimentation at ~1am.
Edit: I should also mention that this must run on Java 8 at most. Java 9 features and above cannot be used because those versions are not installed in the environments in which this program will be run.
What you could do:
- Take X threads for processing, where X is the number of cores available for processing.
- Give each thread its own input queue.
- The reader thread gives records to each thread's input queue round-robin in a predictable fashion (a sketch follows below).
- Since the output files are too big for memory, you write X output files, one for each thread, and each file name has the index of the thread in it, so that you can reconstitute the original order from the file names.
- After the process is complete, you merge the X output files: one line from the file for thread 1, one from the file for thread 2, etc., in a round-robin fashion again. This reconstitutes the original order.
As an added bonus, since you have an input queue per thread, you don't have lock contention on the queue between readers (only between the reader and the writer). You could even optimize this by putting things in the input queues in batches larger than 1.
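A minimal sketch of that round-robin distribution, reusing the reader and queue capacity from the question's code (the method name and the idea of returning the queues are illustrative; in real use this would run on the reader thread while consumers drain the queues, otherwise the bounded put() blocks once a queue fills):
// deal lines to per-thread queues: queue i receives lines i, i + x, i + 2x, ...
static List<BlockingQueue<String>> distributeRoundRobin(BufferedReader reader, int x)
        throws IOException, InterruptedException {
    List<BlockingQueue<String>> queues = new ArrayList<>();
    for (int i = 0; i < x; i++) {
        queues.add(new LinkedBlockingQueue<>(1_000_000));
    }
    String line;
    long n = 0;
    while ((line = reader.readLine()) != null) {
        queues.get((int) (n++ % x)).put(line);
    }
    return queues;
}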
As was also proposed by Alexei, you can create OrderedTask:
class OrderedTask implements Comparable<OrderedTask> {
    private final Integer index;
    private final String line;

    public OrderedTask(Integer index, String line) {
        this.index = index;
        this.line = line;
    }

    @Override
    public int compareTo(OrderedTask o) {
        // compare the int values; == on boxed Integers is unreliable outside the cache range
        return Integer.compare(index, o.getIndex());
    }

    public Integer getIndex() {
        return index;
    }

    public String getLine() {
        return line;
    }
}
As an output queue you can use your own, backed by a priority queue:
class OrderedTaskQueue {
    private final ReentrantLock lock;
    private final Condition waitForOrderedItem;
    private final int maxQueuesize;
    private final PriorityQueue<OrderedTask> backedQueue;
    private int expectedIndex;

    public OrderedTaskQueue(int maxQueueSize, int startIndex) {
        this.maxQueuesize = maxQueueSize;
        this.expectedIndex = startIndex;
        this.backedQueue = new PriorityQueue<>(2 * this.maxQueuesize);
        this.lock = new ReentrantLock();
        this.waitForOrderedItem = this.lock.newCondition();
    }

    public boolean put(OrderedTask item) {
        ReentrantLock lock = this.lock;
        lock.lock();
        try {
            // block while the queue is full, unless this is the very task the consumer is waiting for
            while (this.backedQueue.size() >= maxQueuesize && item.getIndex() != expectedIndex) {
                this.waitForOrderedItem.await();
            }
            boolean result = this.backedQueue.add(item);
            this.waitForOrderedItem.signalAll();
            return result;
        } catch (InterruptedException e) {
            throw new RuntimeException();
        } finally {
            lock.unlock();
        }
    }

    public OrderedTask take() {
        ReentrantLock lock = this.lock;
        lock.lock();
        try {
            // block until the head of the priority queue is exactly the next expected index
            while (this.backedQueue.peek() == null || this.backedQueue.peek().getIndex() != expectedIndex) {
                this.waitForOrderedItem.await();
            }
            OrderedTask result = this.backedQueue.poll();
            expectedIndex++;
            this.waitForOrderedItem.signalAll();
            return result;
        } catch (InterruptedException e) {
            throw new RuntimeException();
        } finally {
            lock.unlock();
        }
    }
}
startIndex is the index of the first ordered task, and maxQueueSize is used to stop processing of other tasks (so as not to fill the memory) while we wait for some earlier task to finish. It should be double or triple the number of processing threads, so that processing is not stopped immediately and the solution can scale.
Then you should create your tasks:
int indexOrder = 0;
while ((line = reader.readLine()) != null) {
    inputQueue.put(new OrderedTask(indexOrder++, line));
}
Line-by-line is only used because of your example; you should change OrderedTask to support a batch of lines.
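For illustration, a consumer-side sketch (the totalLines bound and the wiring are hypothetical): because take() only releases the task whose index equals expectedIndex, lines come out in the original input order no matter which processing thread finished first.
OrderedTaskQueue outputQueue = new OrderedTaskQueue(64, 0);
for (int i = 0; i < totalLines; i++) {
    OrderedTask task = outputQueue.take(); // blocks until index i arrives
    writer.write(task.getLine());
    writer.newLine();
}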
Why not reverse the flow?
The output side asks for X batches;
it generates X promises/tasks (promise pattern), each of which calls one of the processing cores (keeping a batch number to pass through to the input core), and gathers the call handlers into an ordered list;
each processing core in turn asks for a batch from the input core.
Enjoy?

Traversing and enumerating a directory with multi-threads

I am running a thread to traverse my local directory (no subdirectories), and as soon as I find a text file, I start a new thread which searches for a word in that file.
What is wrong in the code below?
Searching and traversing work fine separately. But when I put them together, something goes wrong and some files are skipped (not exactly skipped: due to multithreading, object synchronization is not happening properly).
Please help me out.
Traverse.java
public void executeTraversing() {
    Path dir = null;
    if (dirPath.startsWith("file://")) {
        dir = Paths.get(URI.create(dirPath));
    } else {
        dir = Paths.get(dirPath);
    }
    listFiles(dir);
}

private synchronized void listFiles(Path dir) {
    ExecutorService executor = Executors.newFixedThreadPool(1);
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
        for (Path file : stream) {
            if (Files.isDirectory(file)) {
                listFiles(file);
            } else {
                search.setFileNameToSearch(file);
                executor.submit(search);
            }
        }
    } catch (IOException | DirectoryIteratorException x) {
        // IOException can never be thrown by the iteration.
        // In this snippet, it can only be thrown by
        // newDirectoryStream.
        System.err.println(x);
    }
}
Search.java
/**
 * @param wordToSearch
 */
public Search(String wordToSearch) {
    super();
    this.wordToSearch = wordToSearch;
}

public void run() {
    this.search();
}

private synchronized void search() {
    counter = 0;
    Charset charset = Charset.defaultCharset();
    try (BufferedReader reader = Files.newBufferedReader(fileNameToSearch.toAbsolutePath(), charset)) {
        // do you have permission to read this directory?
        if (Files.isReadable(fileNameToSearch)) {
            String line = null;
            while ((line = reader.readLine()) != null) {
                counter++;
                //System.out.println(wordToSearch + " " + fileNameToSearch);
                if (line.contains(wordToSearch)) {
                    System.out.println("Word '" + wordToSearch
                            + "' found at "
                            + counter
                            + " in "
                            + fileNameToSearch);
                }
            }
        } else {
            System.out.println(fileNameToSearch + " is not readable.");
        }
    } catch (IOException x) {
        System.err.format("IOException: %s%n", x);
    }
}
This Search instance that you keep reusing here:
search.setFileNameToSearch(file);
executor.submit(search);
is the problem. While its actual search() method is synchronized, by the time a submitted task actually gets to searching, setFileNameToSearch() may have been called several more times, which would explain the skipping.
Create a new instance of Search each time; then you wouldn't need to synchronize the actual search() function.
You are creating the ExecutorService inside your listFiles method. This is probably not a good idea: because of that, you're creating a new executor (and new threads) on every call.
On top of that, you're not monitoring the state of all these ExecutorServices; some of their tasks might not have run when your application stops.
Instead you should create the ExecutorService only once, before starting the recursion. When the recursion is over, call shutdown() on your ExecutorService to wait for all tasks' completion.
Furthermore, you are reusing a Search object and passing it to multiple tasks while modifying it; you should create a Search for each file you're processing.
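A sketch combining both suggestions, reusing the constructor and setter from the question's code (the pool size is an arbitrary choice):
private final ExecutorService executor = Executors.newFixedThreadPool(4); // created once, not per call

private void listFiles(Path dir) {
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
        for (Path file : stream) {
            if (Files.isDirectory(file)) {
                listFiles(file);
            } else {
                Search task = new Search(wordToSearch); // fresh instance per file
                task.setFileNameToSearch(file);
                executor.submit(task);
            }
        }
    } catch (IOException | DirectoryIteratorException x) {
        System.err.println(x);
    }
}

// after the top-level call to listFiles() returns:
// executor.shutdown();
// executor.awaitTermination(1, TimeUnit.HOURS);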

Thread interrupt not ending blocking call on input stream read

I'm using RXTX to read data from a serial port. The reading is done within a thread spawned in the following manner:
CommPortIdentifier portIdentifier = CommPortIdentifier.getPortIdentifier(port);
CommPort comm = portIdentifier.open("Whatever", 2000);
SerialPort serial = (SerialPort)comm;
...settings
Thread t = new Thread(new SerialReader(serial.getInputStream()));
t.start();
The SerialReader class implements Runnable and just loops indefinitely, reading from the port and constructing the data into useful packages before sending it off to other applications. However, I've reduced it down to the following simplicity:
public void run() {
    ReadableByteChannel byteChan = Channels.newChannel(in); // in = InputStream passed to SerialReader
    ByteBuffer buffer = ByteBuffer.allocate(100);
    while (true) {
        try {
            byteChan.read(buffer);
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}
When a user clicks a stop button, the following code fires; in theory it should close the input stream and break out of the blocking byteChan.read(buffer) call:
public void stop() {
    t.interrupt();
    serial.close();
}
However, when I run this code, I never get a ClosedByInterruptException, which SHOULD fire once the input stream closes. Furthermore, the execution blocks on the call to serial.close() -- because the underlying input stream is still blocking on the read call. I've tried replacing the interrupt call with byteChan.close(), which should then cause an AsynchronousCloseException, however, I'm getting the same results.
Any help on what I'm missing would be greatly appreciated.
You can't make a stream that doesn't support interruptible I/O into an InterruptibleChannel simply by wrapping it (and, anyway, ReadableByteChannel doesn't extend InterruptibleChannel).
You have to look at the contract of the underlying InputStream. What does SerialPort.getInputStream() say about the interruptibility of its result? If it doesn't say anything, you should assume that it ignores interrupts.
For any I/O that doesn't explicitly support interruptibility, the only option is generally closing the stream from another thread. This may immediately raise an IOException (though it might not be an AsynchronousCloseException) in the thread blocked on a call to the stream.
However, even this is extremely dependent on the implementation of the InputStream—and the underlying OS can be a factor too.
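As a sketch of that close-from-another-thread approach (whether the blocked read() actually unblocks, and with which exception, depends on the stream implementation and the OS, per the caveats above; `in` is assumed to be the stream the reader thread is blocked on):
// watchdog thread: closing the stream may force the blocked read() to throw IOException
Thread watchdog = new Thread(() -> {
    try {
        in.close();
    } catch (IOException ignored) {
        // nothing useful to do here
    }
});
watchdog.start();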
Note the source code comment on the ReadableByteChannelImpl class returned by newChannel():
private static class ReadableByteChannelImpl
extends AbstractInterruptibleChannel // Not really interruptible
implements ReadableByteChannel
{
InputStream in;
⋮
The RXTX SerialInputStream (what is returned by the serial.getInputStream() call) supports a timeout scheme that ended up solving all my problems. Adding the following before creating the new SerialReader object causes the reads to no longer block indefinitely:
serial.enableReceiveTimeout(1000);
Within the SerialReader object, I had to change a few things around to read directly from the InputStream instead of creating the ReadableByteChannel, but now, I can stop and restart the reader without issue.
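For illustration, the reader loop might then look something like this (reading the InputStream directly, as described; exactly what read() returns on a timeout is worth verifying against your RXTX version, and handlePacket is a hypothetical hand-off to the packaging logic):
serial.enableReceiveTimeout(1000); // reads now return after at most ~1 second
try (InputStream in = serial.getInputStream()) {
    byte[] buf = new byte[1024];
    while (!Thread.currentThread().isInterrupted()) {
        int n = in.read(buf); // no longer blocks indefinitely when no data arrives
        if (n > 0) {
            handlePacket(buf, n); // hypothetical handler for the received bytes
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}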
I am using the code below to shut down RXTX. I run tests that start it up and shut it down, and it seems to work OK. My reader looks like:
private void addPartsToQueue(final InputStream inputStream) {
    byte[] buffer = new byte[1024];
    int len = -1;
    boolean first = true;
    // the read can throw
    try {
        while ((len = inputStream.read(buffer)) > -1) {
            if (len > 0) {
                if (first) {
                    first = false;
                    t0 = System.currentTimeMillis();
                } else
                    t1 = System.currentTimeMillis();
                final String part = new String(buffer, 0, len);
                queue.add(part);
                //System.out.println(part + " " + (t1 - t0));
            }
            try {
                Thread.sleep(sleep);
            } catch (InterruptedException e) {
                //System.out.println(Thread.currentThread().getName() + " interrupted " + e);
                break;
            }
        }
    } catch (IOException e) {
        System.err.println(Thread.currentThread().getName() + " " + e);
        e.printStackTrace();
    }
    //System.out.println(Thread.currentThread().getName() + " is ending.");
}
thanks
public void shutdown(final Device device) {
    shutdown(serialReaderThread);
    shutdown(messageAssemblerThread);
    serialPort.close();
    if (device != null)
        device.setSerialPort(null);
}

public static void shutdown(final Thread thread) {
    if (thread != null) {
        //System.out.println("before interrupt() on thread " + thread.getName() + ", its state is " + thread.getState());
        thread.interrupt();
        //System.out.println("after interrupt() on thread " + thread.getName() + ", its state is " + thread.getState());
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            System.out.println(Thread.currentThread().getName() + " was interrupted trying to sleep after interrupting " + thread.getName() + " " + e);
        }
        //System.out.println("before join() on thread " + thread.getName() + ", its state is " + thread.getState());
        try {
            thread.join();
        } catch (InterruptedException e) {
            System.out.println(Thread.currentThread().getName() + " join interrupted");
        }
        //System.out.println(Thread.currentThread().getName() + " after join() on thread " + thread.getName() + ", its state is " + thread.getState());
    }
}

In Java, what is the best/safest pattern for monitoring a file being appended to?

Someone else's process is creating a CSV file by appending a line at a time to it, as events occur. I have no control over the file format or the other process, but I know it will only append.
In a Java program, I would like to monitor this file, and when a line is appended read the new line and react according to the contents. Ignore the CSV parsing issue for now. What is the best way to monitor the file for changes and read a line at a time?
Ideally this will use the standard library classes. The file may well be on a network drive, so I'd like something robust to failure. I'd rather not use polling if possible - I'd prefer some sort of blocking solution instead.
Edit -- given that a blocking solution is not possible with standard classes (thanks for that answer), what is the most robust polling solution? I'd rather not re-read the whole file each time as it could grow quite large.
Since Java 7 there has been the newWatchService() method on the FileSystem class.
However, there are some caveats:
It is only Java 7
It is an optional method
It only watches directories, so you have to do the file handling yourself and worry about the file moving, etc.
Before Java 7 it is not possible with standard APIs.
I tried the following (polling on a 1-second interval), and it works (the processing step just prints the line):
private static void monitorFile(File file) throws IOException {
    final int POLL_INTERVAL = 1000;
    FileReader reader = new FileReader(file);
    BufferedReader buffered = new BufferedReader(reader);
    try {
        while (true) {
            String line = buffered.readLine();
            if (line == null) {
                // end of file, start polling
                Thread.sleep(POLL_INTERVAL);
            } else {
                System.out.println(line);
            }
        }
    } catch (InterruptedException ex) {
        ex.printStackTrace();
    }
}
As no-one else has suggested a solution which uses a current production Java, I thought I'd add it. If there are flaws, please add them in comments.
You can register to be notified by the file system when any change happens to the file, using the WatchService class. This requires Java 7; here is the link to the documentation: http://docs.oracle.com/javase/tutorial/essential/io/notification.html
Here is the snippet code to do that:
public FileWatcher(Path dir) throws IOException {
    this.dir = dir;
    this.watcher = FileSystems.getDefault().newWatchService();
    WatchKey key = dir.register(watcher, ENTRY_MODIFY);
}

void processEvents() {
    for (;;) {
        // wait for key to be signalled
        WatchKey key;
        try {
            key = watcher.take();
        } catch (InterruptedException x) {
            return;
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            WatchEvent.Kind<?> kind = event.kind();
            if (kind == OVERFLOW) {
                continue;
            }
            // Context for directory entry event is the file name of entry
            WatchEvent<Path> ev = cast(event); // cast() is the unchecked-cast helper from the Oracle tutorial
            Path name = ev.context();
            Path child = dir.resolve(name);
            // print out event
            System.out.format("%s: %s file \n", event.kind().name(), child);
        }
        // reset key and remove from set if directory no longer accessible
        boolean valid = key.reset();
    }
}
This is not possible with standard library classes. See this question for details.
For efficient polling it is better to use random access (e.g. RandomAccessFile). It helps if you remember the position of the last end of file and start reading from there.
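A minimal sketch of that idea (the process() handler is hypothetical; note that RandomAccessFile.readLine() does byte-oriented decoding, which is fine for ASCII-ish CSV):
// tail the file: remember where it ended last time and only read the appended bytes
static void tail(File file) throws IOException, InterruptedException {
    try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
        long lastPosition = raf.length(); // or 0 to process existing content first
        while (true) {
            if (raf.length() > lastPosition) {
                raf.seek(lastPosition);
                String line;
                while ((line = raf.readLine()) != null) {
                    process(line); // hypothetical per-line handler
                }
                lastPosition = raf.getFilePointer();
            }
            Thread.sleep(1000); // poll interval
        }
    }
}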
Use Java 7's WatchService, part of NIO.2
The WatchService API is designed for applications that need to be notified about file change events.
Just to expand on Nick Fortescue's last entry, below are two classes that you can run concurrently (e.g. in two different shell windows) which shows that a given File can simultaneously be written to by one process and read by another.
Here, the two processes will be executing these Java classes, but I presume that the writing process could be from any other application. (Assuming that it does not hold an exclusive lock on the file; are there such file system locks on certain operating systems?)
I have successfully tested these two classes on both Windoze and Linux. I would very much like to know if there is some condition (e.g. operating system) on which they fail.
Class #1:
import java.io.File;
import java.io.FileWriter;
import java.io.PrintWriter;

public class FileAppender {
    public static void main(String[] args) throws Exception {
        if ((args != null) && (args.length != 0)) throw
            new IllegalArgumentException("args is not null and is not empty");
        File file = new File("./file.txt");
        int numLines = 1000;
        writeLines(file, numLines);
    }

    private static void writeLines(File file, int numLines) throws Exception {
        PrintWriter pw = null;
        try {
            pw = new PrintWriter(new FileWriter(file), true);
            for (int i = 0; i < numLines; i++) {
                System.out.println("writing line number " + i);
                pw.println("line number " + i);
                Thread.sleep(100);
            }
        }
        finally {
            if (pw != null) pw.close();
        }
    }
}
Class #2:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

public class FileMonitor {
    public static void main(String[] args) throws Exception {
        if ((args != null) && (args.length != 0)) throw
            new IllegalArgumentException("args is not null and is not empty");
        File file = new File("./file.txt");
        readLines(file);
    }

    private static void readLines(File file) throws Exception {
        BufferedReader br = null;
        try {
            br = new BufferedReader(new FileReader(file));
            while (true) {
                String line = br.readLine();
                if (line == null) { // end of file, start polling
                    System.out.println("no file data available; sleeping..");
                    Thread.sleep(2 * 1000);
                }
                else {
                    System.out.println(line);
                }
            }
        }
        finally {
            if (br != null) br.close();
        }
    }
}
Unfortunately, TailInputStream class, which can be used to monitor the end of a file, is not one of standard Java platform classes, but there are few implementations on the web. You can find an implementation of TailInputStream class together with a usage example on http://www.greentelligent.com/java/tailinputstream.
Poll, either on a consistent cycle or on a random cycle; 200-2000 ms should be a good random poll-interval span.
Check two things...
If you have to watch for file growth, check the EOF / byte count, and compare it, together with the file-access or file-write times, against the last poll. If they are greater, the file has been written to.
Then combine that with checking for an exclusive lock / read access: if the file can be read-locked and it has grown, then whatever was writing to it has finished.
Checking either property alone won't necessarily get you a guaranteed state of "written and actually done and available for use".
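A rough sketch of that combined check using NIO file locks (the method name is illustrative, and such locks are advisory on many operating systems, so this is a heuristic, not a guarantee):
// the file is "probably done" if it has grown since the last poll AND can be read-locked
static boolean grownAndUnlocked(Path path, long lastSize) throws IOException {
    long size = Files.size(path);
    if (size <= lastSize) {
        return false; // no growth since the last poll
    }
    try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ);
         FileLock lock = ch.tryLock(0, Long.MAX_VALUE, true)) { // shared (read) lock
        return lock != null; // null means a writer still holds an exclusive lock
    }
}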
