Allow only one Thread to access a particular file in Java

I have a folder with a large number of files in it. A cron job takes 10 file names at a time, e.g. file1, file2, ..., file10, and creates 10 new Threads to read those files. The content of each file is extracted and dumped into another process (irrelevant here), and the file is then deleted.
Now the problem is that if one of the Threads takes more than a minute to process its file, the cron job triggers again, picks up the same file (as it has not been deleted yet) and processes its content a second time.
Is there a way to prevent a Thread from reading a file, or from creating a File object for it, while another Thread is already reading content from it?
I could have a synchronized hash map store the details of the 10 files the 10 Threads are currently processing, and check the map before I assign a file to a Thread, but I find it hard to believe that there is no better way to do this in Java.

Obviously you need a "sync point" between your different threads; for that, there are plenty of options.
When all your threads are running in the same JVM, you could use a class like this:

    // uses java.util.Set and java.util.HashSet
    class CurrentlyProcessedFilesTracker {
        private final Set<File> inProgress = new HashSet<>();
        synchronized void markFileForProcessing(File f) {
            if (!inProgress.add(f)) throw new IllegalStateException("already processing: " + f);
        }
        synchronized void markFileAsDone(File f) { inProgress.remove(f); }
    }

or something similar (where the first method throws an exception in case the provided file is already known).
Then you just have to make sure that all your threads have access to the same instance of that class, and that they use it to "lock" and "unlock" each file while they are busy working on it.
If there is more than one JVM, things become more complicated. Then you need either some form of inter-process communication, or you go with the idea from Scary Wombat and have the "scheduler part" rename files (depending on your context that might be a good idea, or not; it depends what other "responsibilities" that scheduler part has, and you shouldn't put too many things into that one component).
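A minimal sketch of the rename variant, assuming every worker (even in another JVM) sees the same filesystem and that it supports an atomic move; the ".processing" suffix is just an illustrative convention:

    import java.io.IOException;
    import java.nio.file.*;

    // Try to "claim" a file by renaming it before reading it. If another worker
    // already claimed it, the move fails and we simply skip the file.
    static Path tryClaim(Path file) {
        Path claimed = file.resolveSibling(file.getFileName() + ".processing");
        try {
            Files.move(file, claimed, StandardCopyOption.ATOMIC_MOVE);
            return claimed;   // we own it now: read it, then Files.delete(claimed)
        } catch (IOException alreadyTakenOrGone) {
            return null;      // someone else got there first (or the file is gone)
        }
    }

Since the rename either fully succeeds or fully fails, exactly one of the competing workers wins the claim.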

Related

Let two long running processes "talk" to each other by text files

I have two (Java) processes on different JVMs running repeatedly. The first one regularly finds some "information" and needs to store it somewhere. The second process regularly reads this information to handle it. The intervals are more or less random, so process 1 may find three pieces of information before process 2 reads them, or vice versa.
My approach is to write this information to text files. But I am afraid that appending and reading the text files could accidentally happen at the same time, so that I run into locking problems. But writing a new text file for each piece of information seems like overkill.
What would be a better solution?
EDIT: I am sorry, I did not make clear: The java processes run in different JVMs. They cannot see each other directly.
You can get this to work, provided you are careful with file handling and you don't have a high update rate, e.g. 10 updates per second.
Note: you could do it with file renaming instead of locks.
What would be a better solution?
Just about anything. SO is not for recommending things, but in this case I could recommend just about anything without more specific requirements. I could, for example, recommend my library Chronicle Queue, because I wrote it and I am sure it could do what you want; however, there are many possible alternatives.
I am sending about one line of text every minute.
So you can write a temporary file for each message, rename it when finished. The consumer can have a directory watcher so it knows as soon as you have done this. The consumer could delete the file when done. This has an overhead but it would be less than 10 ms.
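A rough sketch of that temp-file-then-rename handoff, assuming both processes agree on a directory and the consumer ignores *.tmp names (all paths and names here are illustrative):

    import java.nio.charset.StandardCharsets;
    import java.nio.file.*;

    // Producer: write the whole message to a temp file, then rename it into place.
    // The rename is the "commit"; the consumer never sees a half-written file.
    Path dir = Paths.get("exchange");
    Path tmp = Files.createTempFile(dir, "msg", ".tmp");
    Files.write(tmp, "one line of text".getBytes(StandardCharsets.UTF_8));
    Files.move(tmp, dir.resolve("msg-" + System.nanoTime() + ".txt"),
               StandardCopyOption.ATOMIC_MOVE);

    // Consumer: a directory watcher wakes up as soon as the rename lands.
    WatchService watcher = FileSystems.getDefault().newWatchService();
    dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
    WatchKey key = watcher.take();   // blocks; then read, delete, and key.reset()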
If you want to keep a record of all messages, the producer can also write to a log file.

Message Queue slow performance

I am writing a message queue, but it is slow: the processFile method takes too much time, and files get stuck in the queue for a long time. How can I avoid this?
System.out.println("Message Reader Started....");
do
{
String directoryPath = "C:\\Queue";
int fileCount = new File(directoryPath).list().length;
if (fileCount < 1) {
System.out.println("Files Not Present");
}
else
{
File[] file = new File(directoryPath).listFiles();
String firstFile = file[0].getAbsolutePath();
processFile(firstFile);
}
} while (true);
Have you tried using concurrency for this? It is an apt problem for concurrent processing. Assuming that file processing is a mutually exclusive action:
the do-while loop in the main Thread finds the file to read;
processing the file is delegated to an executor thread;
and after processing (I am assuming reading the file), the processing of the contents can again be done in parallel, e.g. read the first 1000 lines and delegate them to a thread to process.
You need to design it in a better way to run fast; a single-threaded read-and-process over a list of files is bound to run slow. A sketch of the delegation idea follows below.
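A minimal sketch of that delegation, reusing the processFile method from the question (the pool size of 4 is an arbitrary starting point to tune):

    import java.io.File;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    ExecutorService pool = Executors.newFixedThreadPool(4);  // tune to your machine
    File[] files = new File("C:\\Queue").listFiles();
    if (files != null) {
        for (File f : files) {
            // each file runs on a pool thread; the main loop keeps scanning
            pool.execute(() -> processFile(f.getAbsolutePath()));
        }
    }

Note that with this alone a file could be picked up twice across scans, so track submitted names (or move files out of the folder on pickup) before scanning again.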
Your main issue is possibly the CPU usage spent scanning the folder.
You should add Thread.sleep(100); at the end of the loop to give the system some time to breathe.
The issue you want to have resolved is obviously the processFile() method. You should do as @Nazgul commented and implement it in its own class with a Runnable interface.
To limit the number of threads running, place the filenames in a List or Queue and implement worker Threads operating on it. You can add as many worker threads as your system can handle. The queue should be synchronized, so that you can safely remove items from multiple threads at the same time.
You wrote an endless loop, so why do you worry how long a single iteration takes?
You are needlessly reading the directory twice per iteration. Assuming your processFile removes the file it processed (and possibly another thread or process adds files, but deletes none), you don't need to read the directory in each iteration.
Read it once and process all files found. If there are none, re-read the directory. If there are still none, you may terminate or sleep for a while (or consider watching the directory, but this is a bit more complicated and probably not necessary).
I'd strongly suggest improving your loop before you start playing with threads (then use an ExecutorService as suggested).
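A sketch of that improved loop, keeping the question's processFile and assuming it deletes each file when it is done:

    void pollQueue(String directoryPath) throws InterruptedException {
        while (true) {
            File[] files = new File(directoryPath).listFiles();  // one directory read per pass
            if (files == null || files.length == 0) {
                Thread.sleep(100);   // nothing to do; give the system a breather
                continue;
            }
            for (File f : files) {
                processFile(f.getAbsolutePath());  // or hand off to the executor shown above
            }
        }
    }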

How can I implement multithreading in Java to process 2 million text files?

I have to process around 2 million text files and generate their triples.
Suppose I have a txt file xyz.txt (one of the 2 million input files); it is processed as below:
start(xyz.txt) ----> module1(xyz.tpd) ------> module2(xyz.adv) --------> module3(xyz.tpl)
Suggest me a logic or concept so that I can process the files faster and in an optimized way on x64 4GB Windows systems.
module1 (working): it parses the txt file using a .bat file in which the parser is invoked; it is a separate system thread, and after 15 seconds it starts parsing another txt file, and so on...
module2 (working): it accepts a .tpd file as input and generates an .adv file.
module3 (working): it accepts an .adv file as input and generates a .tpl file (the triples).
Should I start threads from the txt files, or at some other point?
I am afraid that the CPU may get stuck in context switching.
Does anyone have a better logic that I can try?
Use a ThreadPoolExecutor. Tune its parameters, like the number of active threads and others, to suit your environment and system.
Most importantly, you have to write the program, profile it, and see where the bottleneck is. It is more than probable that the disk I/O operations will be the bottleneck and no amount of multithreading will solve your problems.
In that case using two (three? four?) separate hard drives may yield more speed gain than the best multithreaded solution.
Furthermore, the general rule is that you should optimize your application only when you have working code and you really know what to optimize. Profile, profile, profile.
Taking the future multithreaded optimizations into account when writing is OK; the architecture should be flexible enough to allow for future optimizations.
Not much is said here about your hardware environment, but the basic solution would be to use a fixed-size ExecutorService, where the size would, at first, be the number of your execution units:
private static final int NR_CPUS = Runtime.getRuntime().availableProcessors();
// Then:
final ExecutorService executor = Executors.newFixedThreadPool(NR_CPUS);
Then, for each file, you can create a Runnable to process it, and submit it to the thread pool using its .execute() method.
Note that .execute() is asynchronous; if the submitted runnable cannot be run right now, it will be queued.
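Continuing that snippet, submitting one Runnable per file could look like this (files and process stand in for your file collection and per-file work):

    // files: whatever collection of File objects you iterate over
    for (final File f : files) {
        executor.execute(() -> process(f));  // queued if all NR_CPUS threads are busy
    }
    executor.shutdown();  // stop accepting work; queued tasks still run to completion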
This sounds like a typical batch application needed for data integration. I do not intend to throw hyperlinks at you without completely understanding your needs, but you probably need a solution which works in a single VM and which, over time, you would like to extend to multiple VMs/machines; and we are probably not dealing with PBs of data to start with. Try Spring Batch: not only will it solve the problem in the given context, you will also learn to structure your thoughts (think vocabulary!) to solve similar problems.
As a starting point, I would create one IO thread and a pool of CPU threads. The IO thread reads in text files and offers them to a BlockingQueue, while the CPU threads take the files from the BlockingQueue and process them.
Then profile the application to see how many CPU threads you should use to keep pace with the IO thread. You can also determine this dynamically, e.g. start with one CPU thread and start another when the size of the BlockingQueue exceeds a threshold, probably something along the lines of 20 files.
It's possible that you'll find that you only need one CPU thread to keep pace with the IO thread, in which case your program is IO bound and you'll need to e.g. place the text files next to each other on disk (so that you can use sequential reads on all but the first file) or put them on separate disks in order to speed up the application. One idea is to zip the files together and read them in with a ZipInputStream; this will reduce the number of disk seeks when reading the files and will also reduce the amount of data you need to read.
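A compact sketch of that producer/consumer layout, with the queue bounded at the 20-file threshold mentioned above (inputFiles and processPipeline are stand-ins for your file list and the module1 -> module2 -> module3 chain):

    import java.io.File;
    import java.util.concurrent.*;

    BlockingQueue<File> queue = new LinkedBlockingQueue<>(20);  // capacity ~20 files

    // IO thread: reads the input files sequentially and hands them over
    new Thread(() -> {
        for (File f : inputFiles) {
            try { queue.put(f); }             // blocks when the CPU side lags behind
            catch (InterruptedException e) { Thread.currentThread().interrupt(); return; }
        }
    }).start();

    // CPU worker: run one or more of these, depending on what profiling shows
    Runnable cpuWorker = () -> {
        try {
            while (true) processPipeline(queue.take());
        } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    };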

Java - Multithreading and Files Question

I have one text file that needs to be read by two threads, but I need to make the reading sequentially. Example: Thread 1 gets the lock and read first line, lock is free. Thread 2 gets the lock and read line 2, and so goes on.
I was thinking of sharing the same BufferedReader or something like that, but I'm not so sure about it.
Thanks in advance!
EDITED
There will be 2 classes, each one with a thread. Those 2 classes will read the same file.
You can lock the BufferedReader as you say.
I would warn you that the performance is likely to be worse than using just one thread. However, you can do it as an exercise.
It would probably be more performant to read the file line by line in one thread and pass the resulting input lines to a thread pool via a queue such as ConcurrentLinkedQueue, if you want to guarantee at least the order in which processing of the file's lines starts. Much simpler to implement, and no contention on whatever class you use to read the file.
Unless there's some cast-iron reason why you need the reading to happen local to each thread, I'd avoid sharing the file like this.
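A sketch of the single-reader variant, assuming the lines only need to be handed out in file order (a LinkedBlockingQueue is used here instead of ConcurrentLinkedQueue so the workers can block instead of spin; the file name is illustrative):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    BlockingQueue<String> lines = new LinkedBlockingQueue<>();

    // Single reader thread: the only place that touches the file
    new Thread(() -> {
        try (BufferedReader in = new BufferedReader(new FileReader("input.txt"))) {
            String line;
            while ((line = in.readLine()) != null) lines.put(line);
        } catch (IOException | InterruptedException e) { Thread.currentThread().interrupt(); }
    }).start();

    // Each worker class just takes lines; hand-out order follows file order
    String next = lines.take();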

Deleting a file in Java while uploading it in another thread

I'm trying to build a semi file-sharing program, where each computer acts both as a server and as a client.
I give multiple threads the option to download the file from my system.
I also have a user interface that can receive a delete message.
My problem is that the minute a delete message is received, I want to wait for all the threads that are downloading the file to finish downloading, and only then execute file.delete().
What is the best way to do it?
I thought about some database that holds the downloading threads, iterating over it and checking whether each thread is active, but it seems clumsy. Is there a better way?
Thanks
I think you can do this more simply than by using a database. I would put a thin wrapper class around File: a TrackedFile. It has the file inside, and a count of how many people are reading it. When you want to delete, just stop allowing new people to grab the file, and wait for the count to get to 0.
Since you are dealing with many threads accessing shared state, make sure you properly use java.util.concurrent.
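A possible shape for that wrapper, using the object's own monitor (the names are illustrative, not from any library):

    import java.io.File;

    class TrackedFile {
        private final File file;
        private int readers = 0;
        private boolean deletePending = false;

        TrackedFile(File file) { this.file = file; }

        synchronized boolean tryStartRead() {      // call before serving a download
            if (deletePending) return false;       // stop letting new people grab it
            readers++;
            return true;
        }

        synchronized void finishRead() {           // call when a download completes
            if (--readers == 0) notifyAll();
        }

        synchronized void deleteWhenFree() throws InterruptedException {
            deletePending = true;
            while (readers > 0) wait();            // wait for the count to reach 0
            file.delete();
        }
    }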
I am not sure this addresses all your problems, but this is what I have in mind:
Assuming that all read/write/delete operations occur only from within the same application, a thread synchronization mechanism using locks can be useful.
For every new file that arrives, a new read/write lock can be created (See Java's ReentrantReadWriteLock). The read lock should be acquired for all read operations, while the write lock should be acquired for write/delete operations. Of course, when the lock is acquired you should check whether the operation is still meaningful (i.e. whether the file still exists).
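Sketched with one lock per file name (the map and the path handling are assumptions about how you key your files; IOException handling is omitted):

    import java.nio.file.*;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    ConcurrentHashMap<String, ReadWriteLock> locks = new ConcurrentHashMap<>();
    ReadWriteLock lock = locks.computeIfAbsent("shared.bin", k -> new ReentrantReadWriteLock());

    // Download (many at once): read lock
    lock.readLock().lock();
    try {
        // check the file still exists, then stream it to the client
    } finally {
        lock.readLock().unlock();
    }

    // Delete (exclusive): the write lock waits for all readers to drain
    lock.writeLock().lock();
    try {
        Files.deleteIfExists(Paths.get("shared.bin"));
    } finally {
        lock.writeLock().unlock();
    }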
Your delete event handling thread (probably your UI) will become unresponsive if you have to wait for all readers to finish. Instead, queue the delete and periodically poll for deletions which can be processed. You can use:
private class DeleteRunnable implements Runnable {
    public void run() {
        while (!done) {
            ArrayList<DeletedObject> tmpList = null;
            synchronized (masterList) {
                // copy, so we don't hold the lock while waiting on readers
                tmpList = new ArrayList<DeletedObject>(masterList);
            }
            for (DeletedObject o : tmpList) {
                if (o.waitForReaders(500, TimeUnit.MILLISECONDS)) {
                    synchronized (masterList) {
                        masterList.remove(o);
                    }
                }
            }
        }
    }
}
If you restructure your design slightly so that loading the file from disk and uploading it to the client are not done in the same thread, you can wait for the file to stop being accessed by locking new threads out of reading it, then iterating over all of the threads reading from that file and calling join() on each one, one at a time. As long as the file-reading threads terminate directly after loading the file, the iteration finishes the moment the last thread is no longer reading the file, and you are good to go.
The following paragraph is based on the assumption that you keep re-reading the file data multiple times, even if the reading threads are all reading during the same general time frame, since that's what it sounds like you're doing.
Separating file-reading into its own threads would also allow you to have a single thread loading a specific file, and to have multiple client uploads getting the data from that single reading pass over the file. There are several optimizations you could implement with this, depending on what type of project this is. If you do, make sure you don't keep too much file data in memory, or the obvious will happen. But if the nature of your project guarantees few and/or small files that will not take up too much memory, this is a great side effect of separating file-loading into a separate thread.
If you go the join() route, you can use the join(milliseconds) variant if you want the deletion thread to be able to wait a certain period and then demand that the other threads stop (for huge files, or for times when many files are being accessed and the HD is slow), if they haven't already. Take a timestamp of (now + theDurationYouWantToWait), call join(impatientTimestamp - currentTimestamp) on each thread, and send an interrupt to all file-loading threads once currentTimestamp >= impatientTimestamp. Have the file-loading threads check for the interrupt in the loop where they read file data, then join() again on the thread that the timed join aborted on and continue the joining iteration.
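A sketch of that impatient wait, with readerThreads and waitMillis standing in for however you track the reading threads and the patience period (InterruptedException handling is omitted for brevity):

    long deadline = System.currentTimeMillis() + waitMillis;   // now + theDurationYouWantToWait
    for (Thread reader : readerThreads) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining > 0) reader.join(remaining);   // wait, but only up to the deadline
        if (reader.isAlive()) reader.interrupt();    // past the deadline: demand it stops
    }
    for (Thread reader : readerThreads) reader.join();  // every reader is now finishing
    // safe to delete the file here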
