General purpose of program
The program reads a bash-style pattern and a location from the command line, then finds all files matching that pattern under that location. The catch is that I have to make the program multi-threaded.
General structure of the program
Driver/Main Class which parses arguments and initiates other classes.
ProcessDirectories Class which adds all directory addresses found from the specified root directory to a string array for processing later
DirectoryData Class which holds the addresses found in the above class
ProcessMatches Class which examines each directory found, and adds any files inside that match the pattern to a string array for printing results later
Main/Driver once again takes over and prints the results :)
The Problem
I need to be processing matches even while the ProcessDirectories class is still working (for efficiency, so I don't unnecessarily wait for the list to populate before doing work). To do this I try to: a) make ProcessMatches threads wait() if DirectoryData is empty, and b) make ProcessDirectories notifyAll() after adding a new entry.
The Question :)
Every tutorial I look at is focused on the producer and consumer being in the same object, or dealing with just one data structure. How can I do this when I am using more than one data structure and more than one class for producing and consuming?
How about something like:
class Driver
{
    public static void main(String[] args) throws InterruptedException
    {
        final ProcessDirectories pd = ...
        final BlockingQueue<DirectoryData> dirQueue = new LinkedBlockingQueue<DirectoryData>();
        new Thread(new Runnable() { public void run() { pd.addDirs(dirQueue); } }).start();

        final ProcessMatches pm = ...
        final BlockingQueue<File> fileQueue = new LinkedBlockingQueue<File>();
        new Thread(new Runnable()
        {
            public void run()
            {
                // note: take() throws InterruptedException; handle it in real code
                for (DirectoryData dir = dirQueue.take(); dir != DIR_POISON; dir = dirQueue.take())
                {
                    for (File file : dir.getFiles())
                    {
                        if (pm.matches(file))
                            fileQueue.add(file);
                    }
                }
                fileQueue.add(FILE_POISON);
            }
        }).start();

        for (File file = fileQueue.take(); file != FILE_POISON; file = fileQueue.take())
        {
            output(file);
        }
    }
}
This is just a rough idea of course. ProcessDirectories.addDirs() would just add DirectoryData objects to the queue. In production you'd want to name the threads. Perhaps use an executor to manage the threads. Perhaps use some mechanism other than a poison message to indicate the end of processing. Also, you might want to reduce the limit on the queue size.
Have one data structure through which the two threads communicate. This can be a queue with a "get data from the queue, waiting if empty" function and a "put data on the queue, waiting if full" function. Those functions should internally call wait and notify on the queue itself, and they should be synchronized on that queue.
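For illustration, here is a minimal hand-rolled sketch of such a queue (class and method names are invented for the example; in practice you would just use java.util.concurrent.BlockingQueue):

```java
import java.util.LinkedList;
import java.util.Queue;

// Minimal sketch of the shared queue described above: both threads hold a
// reference to this one object, and wait/notify are hidden inside it.
public class SharedQueue<T> {
    private final Queue<T> items = new LinkedList<T>();
    private final int capacity;

    public SharedQueue(int capacity) {
        this.capacity = capacity;
    }

    // "put data on queue, waiting if full"
    public synchronized void put(T item) throws InterruptedException {
        while (items.size() == capacity) {
            wait();                // full: wait until a consumer takes something
        }
        items.add(item);
        notifyAll();               // wake consumers waiting on an empty queue
    }

    // "get data from queue, waiting if empty"
    public synchronized T take() throws InterruptedException {
        while (items.isEmpty()) {
            wait();                // empty: wait until a producer puts something
        }
        T item = items.remove();
        notifyAll();               // wake producers waiting on a full queue
        return item;
    }

    public static void main(String[] args) throws InterruptedException {
        final SharedQueue<String> queue = new SharedQueue<String>(2);
        // producer thread: stands in for ProcessDirectories
        new Thread(new Runnable() {
            public void run() {
                try {
                    queue.put("dir1");
                    queue.put("dir2");
                    queue.put("done");   // sentinel marking the end of input
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }).start();
        // consumer loop: stands in for ProcessMatches
        for (String s = queue.take(); !s.equals("done"); s = queue.take()) {
            System.out.println(s);
        }
    }
}
```

The while loops (rather than ifs) around wait() matter: a woken thread must re-check the condition, since another thread may have changed the queue first.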
Related
In my Java micro-service I am overriding the onFileCreate() method. This method comes from the Apache Commons IO library (package org.apache.commons.io.monitor, class FileAlterationListenerAdaptor, method void onFileCreate(final File file)).
I noticed that even if multiple files are created, there is only a single thread listening for file creations. That means it processes files one by one (synchronously) instead of several at the same time. How can I achieve multi-threaded behavior here?
I don't know if it is relevant, but I noticed that some of the methods defined in this library are synchronized. I am talking about class FileAlterationMonitor, methods setThreadFactory(), start(), and stop(). Is that the reason? If yes, do I need to override all three of these methods, or only some of them?
setThreadFactory will not help you; it is just an alternative way to create the single thread which monitors the file system.
What you need to do is:
Create a thread pool which will do the work for new files. This way you can control how many new files you process in parallel (trust me, you do not want unlimited concurrency).
Make sure your FileAlterationListenerAdaptor.onFileCreate does not process the file itself. Instead, it should submit a task to the thread pool.
Roughly, the code should be something like this:
int numberOfThreads = ...;
ExecutorService pool = java.util.concurrent.Executors.newFixedThreadPool(numberOfThreads);
FileAlterationListenerAdaptor adaptor = new FileAlterationListenerAdaptor() {
    @Override
    public void onFileCreate(final File file) {
        pool.submit(new Runnable() {
            public void run() {
                // here you do the file processing
                doSomethingWithFile(file);
            }
        });
    }
};
....
FileAlterationObserver observer = new FileAlterationObserver(directory);
observer.addListener(adaptor);
...
FileAlterationMonitor monitor = new FileAlterationMonitor(interval);
monitor.addObserver(observer);
monitor.start();
I have a situation where I have a large number of classes that need to do (read-only) file access. This is part of a web app running on top of OSGI, so there will be a lot of concurrent access.
So I'm building an OSGI service to access the file system for all the other pieces that will need it, providing centralized access, as this also simplifies configuration of file locations, etc.
It occurs to me that a multi-threaded approach makes the most sense along with a thread pool.
So the question is this:
If I do this and I have a service with an interface like:
FileService.getFileAsClass(class);
and the method getFileAsClass(Class) looks kinda like this (this is a sketch; it may not be perfect Java code):
public <T> T getFileAsClass(Class<T> clazz) {
    Future<InputStream> classFuture = threadpool.submit(new Callable<InputStream>() {
        /* initialization block */
        {
            // any setup from configs
        }
        /* implement Callable */
        public InputStream call() {
            InputStream stream = null; // new InputStream from file location
            boolean giveUp = false;
            while (null == stream && !giveUp) {
                // Code that tries to read in the file 4
                // times with a Thread.sleep(), then gives up.
                // This is here to make sure we aren't busy updating the file.
            }
            return stream;
        }
    });
    // once we have the file, convert it and return it
    return InputStreamToClassConverter.<T>convert(classFuture.get());
}
Will that correctly wait until the relevant operation is done before calling InputStreamToClassConverter.convert?
This is my first time writing multithreaded java code so I'm not sure what I can expect for some of the behavior. I don't care about order of which threads complete, only that the file handling is handled async and once that file pull is done, then and only then is the Converter used.
I need to serialize all absolute file paths from a location into an ArrayList. I want to do that with a FixedThreadPool from ExecutorService.
Example: location c:/folder1; folder1 has more folders inside, all with files. Every time I find a folder, I want to search its files and add them to the ArrayList.
public class FilePoolThreads extends Thread {

    File fich;
    private ArrayList al1;

    public FilePoolThreads(File fi, ArrayList<String> al) {
        this.fich = fi;
        this.al1 = al;
    }

    public void run() {
        FileColector fc = new FileColector();
        File[] listaFicheiros = fich.listFiles();
        for (int i = 0; i < listaFicheiros.length; i++) {
            if (listaFicheiros[i].isFile()) {
                al1.add(listaFicheiros[i].getAbsolutePath());
            }
        }
    }
}
The class which I begin the collection of files:
public class FileColector {

    private ArrayList<String> list1 = new ArrayList<>();

    public static ArrayList<String> search(File fich, ArrayList<String> list1) {
        int n1 = 1;
        ExecutorService executor = Executors.newFixedThreadPool(n1);
        do {
            // FilePoolThreads[] threads = new FilePoolThreads[10];
            FilePoolThreads mt = new FilePoolThreads(fich, list1);
            executor.execute(mt);
        } while (fich.isDirectory());
        executor.shutdown();
        return list1;
    }
}
My code is not working well; I think there are some logic errors, and I need help fixing them. Also, how can I return the ArrayList? Do I have to use getInputStream first and then getOutputStream?
Since this is apparently an academic exercise, I'll give an overview of how I would approach this problem given your requirement that you use an executor thread pool.
First, you need to analyze the problem and break it into repeatable units of work that can be done independently of each other. In this case, the basic unit of work is processing a single filesystem directory. Each time you process a directory, you will:
Examine each directory entry.
If the directory entry is a regular file, add it to your list.
If the directory entry is a sub-directory, submit it to be processed.
Next, you need to create an implementation of Runnable to encapsulate the processing of this basic unit of work. Each instance of the class that you create will need at least the following information:
The File representing the directory it is to process.
A list, shared between all workers, to add files to (and, as others have pointed out, ArrayList is not a suitable data structure for this).
A reference to the executor service, for submitting tasks for the sub-directories.
Finally, you would need to create a worker for the top-level directory to process; submit it to the executor service; and then wait until all workers have finished processing. This last part might be the trickiest - you might need to keep a running count, using an AtomicInteger that you pass to each worker, to keep track of how many workers are currently processing.
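A sketch of how those pieces could fit together (all names here are invented for the example, and the shared list must be a synchronized one):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical worker: processes one directory, submits sub-directories back
// to the executor, and uses a running count to detect overall completion.
public class DirWorker implements Runnable {
    private final File dir;
    private final List<String> files;      // shared between workers; must be thread-safe
    private final ExecutorService executor;
    private final AtomicInteger pending;   // workers submitted but not yet finished
    private final CountDownLatch done;

    public DirWorker(File dir, List<String> files, ExecutorService executor,
                     AtomicInteger pending, CountDownLatch done) {
        this.dir = dir;
        this.files = files;
        this.executor = executor;
        this.pending = pending;
        this.done = done;
    }

    public void run() {
        try {
            File[] entries = dir.listFiles();
            if (entries != null) {
                for (File entry : entries) {
                    if (entry.isFile()) {
                        files.add(entry.getAbsolutePath());
                    } else if (entry.isDirectory()) {
                        pending.incrementAndGet();      // count before submitting
                        executor.execute(new DirWorker(entry, files, executor, pending, done));
                    }
                }
            }
        } finally {
            if (pending.decrementAndGet() == 0) {
                done.countDown();                       // last worker: release the waiter
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // build a tiny directory tree to scan (demo fixture)
        File root = new File(System.getProperty("java.io.tmpdir"), "dirworker-demo");
        new File(root, "sub").mkdirs();
        new File(root, "a.txt").createNewFile();
        new File(root, "sub" + File.separator + "b.txt").createNewFile();

        List<String> files = Collections.synchronizedList(new ArrayList<String>());
        ExecutorService executor = Executors.newFixedThreadPool(4);
        AtomicInteger pending = new AtomicInteger(1);   // counts the root worker
        CountDownLatch done = new CountDownLatch(1);

        executor.execute(new DirWorker(root, files, executor, pending, done));
        done.await();              // wait until every worker has finished
        executor.shutdown();
        System.out.println(files.size());
    }
}
```

The key design point is incrementing the counter before submitting each sub-directory worker, so the count can never drop to zero while work is still queued.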
Don't extend Thread to pass your task to an executor. Implement Runnable instead!
Or, implement Callable which can return a result when it's finished executing.
Then you can pass your tasks to ExecutorService.submit() and get back a Future to get() the result of each task's computation when it is done.
Note that you will probably want to recursively visit sub-directories, so that you need to find both files and directories before adding the files to your output and creating new tasks for the directories.
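For example, a Callable that lists the files of a single directory might look like this (the class name and the directory used in main are just for illustration):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// A Callable returns a result, unlike Runnable; here, the file paths it found.
public class ListFilesTask implements Callable<List<String>> {
    private final File dir;

    public ListFilesTask(File dir) {
        this.dir = dir;
    }

    public List<String> call() {
        List<String> paths = new ArrayList<String>();
        File[] entries = dir.listFiles();
        if (entries != null) {
            for (File entry : entries) {
                if (entry.isFile()) {
                    paths.add(entry.getAbsolutePath());
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<List<String>> future = pool.submit(new ListFilesTask(new File(".")));
        List<String> paths = future.get();   // blocks until call() has returned
        System.out.println(paths.size() >= 0);
        pool.shutdown();
    }
}
```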
There's no need for threading here, and you have several errors related to trying to use threads. My advice is to forget threading and just solve your real problem, which can be done very simply with something like commons-io FileUtils:
Iterator<File> files = FileUtils.iterateFiles(directoryToScan, FileFileFilter.FILE, TrueFileFilter.INSTANCE);
List<String> paths = new ArrayList<String>();
while (files.hasNext()) {
    paths.add(files.next().getAbsolutePath());
}
That's all.
I have a simple Java program that creates a series of temporary files stored in a local tmp directory. I have added a simple shutdown hook that walks through all the files and deletes them, then deletes the tmp directory, before exiting the program. Here is the code:
Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
    @Override
    public void run() {
        File tmpDir = new File("tmp/");
        for (File f : tmpDir.listFiles()) {
            f.delete();
        }
        tmpDir.delete();
    }
}));
My problem is that the thread that creates these files may not have terminated when the shutdown hook launches, so a file may be created after listFiles() is called. This causes the tmp dir not to get deleted. I have come up with 2 hacks around this:
Hack # 1:
Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
    @Override
    public void run() {
        File tmpDir = new File("tmp/");
        while (!tmpDir.delete()) {
            for (File f : tmpDir.listFiles()) {
                f.delete();
            }
        }
    }
}));
Hack # 2:
Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
    @Override
    public void run() {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        File tmpDir = new File("tmp/");
        for (File f : tmpDir.listFiles()) {
            f.delete();
        }
        tmpDir.delete();
    }
}));
Neither is a particularly good solution. What would be ideal is to have the shutdown hook wait until all threads have terminated before continuing. Does anyone know if this can be done?
Just keep track of all your running threads and then .join() them before shutting down the program.
This is an answer to the question title, as the ewok has said he can't use .deleteOnExit().
What Tyler said, but with a little more detail:
Keep references to the threads where the shutdown hook can access them.
Have the shutdown hook call interrupt on the threads.
Review the code of the threads to make sure they actually respond to interruption (instead of eating the InterruptedException and blundering on, which is typical of a lot of code). An interrupt should prompt the thread to stop looping or blocking, wrap up unfinished business, and terminate.
For each thread where you don't want to proceed until it finishes, check whether the thread is alive and if so call join on it, setting a timeout in case it doesn't finish in a reasonable time, in which case you can decide whether to delete the file or not.
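A rough sketch of the interrupt-then-join-with-timeout steps above, using a stand-in worker thread instead of a real shutdown hook:

```java
public class ShutdownDemo {
    public static void main(String[] args) throws Exception {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(10000);               // simulates long-running work
                } catch (InterruptedException e) {
                    // a well-behaved worker wraps up and terminates on interrupt
                    System.out.println("worker wrapping up");
                }
            }
        });
        worker.start();

        // in a real program, the following lines would live in the shutdown hook
        worker.interrupt();                            // ask the worker to stop
        if (worker.isAlive()) {
            worker.join(5000);                         // wait, but with a timeout
        }
        System.out.println("worker terminated: " + !worker.isAlive());
    }
}
```

If join() times out, the worker is still alive, and that is the point where you decide whether to delete the file anyway.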
UPDATE: Tyler Heiks accurately pointed out that deleteOnExit() isn't a valid solution since the OP tried it and it did not work. I am providing an alternate solution. It is again indirect, but mainly because the original design using threads and a ShutdownHook is fatally flawed.
Use finally blocks to delete the temp files.
Relying on ShutdownHooks for resource management is a very bad idea and makes the code very difficult to compose or reuse in a larger system. It's an even worse idea to hand resources from thread to thread. Resources like files and streams are among the most dangerous things to share between threads. There is likely very little to gain from this and it would make far more sense for each thread to independently obtain temp files using the library's createTempFile methods and manage their use and deletion using try/finally.
The convention for dealing with temporary files on the system is to treat them as black boxes where:
the location on disk is opaque (irrelevant to and not used directly by the program)
the filename is irrelevant
the filename is guaranteed to be unique (mutually exclusive)
The third above is very difficult to achieve if you hand-roll code to create and name temp files yourself. It is likely to be brittle and fail at the worst times (3AM pager anyone?).
The algorithm you present could delete files created by other processes that coincidentally share the same parent directory. That is unlikely to be a good thing for the stability of those other programs.
Here's the high-level process:
Get a Path with Files.createTempFile() (or, in legacy pre-Java 7 code, a File with File.createTempFile())
Use temp file however desired
Delete file
This is similar to InputStream or other resources that need to be manually managed.
That general pattern for explicit resource management (when AutoCloseable and try-with-resources aren't available) is as follows.
Resource r = allocateResource();
try {
    useResource(r);
} finally {
    releaseResource(r);
}
In the case of Path it looks like this:
Path tempDir = Paths.get("tmp/");
try {
    Path p = Files.createTempFile(tempDir, "example", ".tmp");
    try {
        useTempFile(p);
    } finally {
        Files.delete(p);
    }
} finally {
    Files.delete(tempDir);
}
On pre-Java 7 legacy, the use with File looks like this:
File tempDir = new File("tmp/");
try {
    // note: File.createTempFile takes (prefix, suffix, directory)
    File f = File.createTempFile("example", ".tmp", tempDir);
    try {
        useTempFile(f);
    } finally {
        if (!f.delete()) {
            handleFailureToDeleteTempFile(f);
        }
    }
} finally {
    if (!tempDir.delete()) {
        handleFailureToDeleteTempDir(tempDir);
    }
}
What is the best way to share a file between two "writer" services in the same application?
Edit:
Sorry I should have given more details I guess.
I have a Service that saves entries into a buffer. When the buffer gets full it writes all the entries to the file (and so on). Another Service running will come at some point and read the file (essentially copy/compress it) and then empty it.
Here is a general idea of what you can do:
public class FileManager
{
    private final FileWriter writer = new FileWriter("SomeFile.txt");
    private final Object sync = new Object();

    public FileManager() throws IOException {}

    public void writeBuffer(String buffer) throws IOException
    {
        synchronized (sync)
        {
            writer.write(buffer);
        }
    }

    public void copyAndCompress()
    {
        synchronized (sync)
        {
            // copy and/or compress
        }
    }
}
You will have to do some extra work to get it all working safely, but this is just a basic example to give you an idea of how it looks.
A common method for locking is to create a second file in the same location as the main file. The second file may contain locking data or be blank. The benefit to having locking data (such as a process ID) is that you can easily detect a stale lockfile, which is an inevitability you must plan for. Although PID might not be the best locking data in your case.
example:
Service1:
creates myfile.lock
creates/opens myfile
Service2:
Notices that myfile.lock is present and pauses/blocks/waits
When myfile.lock goes away, it creates it and then opens myfile.
It would also be advantageous for you to double-check that the file contains your locking information (identification specific to your service) right after creating it - just in case two or more services are waiting and create a lock at the exact same time. The last one succeeds and so all other services should notice that their locking data is no longer in the file. Also - pause a few milliseconds before checking its contents.
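As a sketch of the lock-file idea, File.createNewFile() is handy here because it atomically creates the file only if it does not already exist, so only one service can win the lock (names below are made up for the example; a real service would also write its identifying data into the lock file, as described above):

```java
import java.io.File;
import java.io.IOException;

public class LockFileDemo {
    // Try to take the lock; createNewFile() atomically creates the file
    // only if it does not already exist, so exactly one caller succeeds.
    static boolean tryLock(File lockFile) throws IOException {
        return lockFile.createNewFile();
    }

    public static void main(String[] args) throws Exception {
        File lock = new File(System.getProperty("java.io.tmpdir"), "myfile.lock");
        lock.delete();                       // start clean for the demo
        System.out.println(tryLock(lock));   // first service wins the lock
        System.out.println(tryLock(lock));   // second service must wait
        lock.delete();                       // releasing the lock = deleting the file
    }
}
```

The second service would poll (or watch) for the lock file to disappear before retrying, and should treat a lock older than some threshold as stale.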