A question on using threads in Java (disclaimer: I am not very experienced with threads, so please allow some leeway).
Overview:
I was wondering whether there is a way for multiple threads to add actions to a queue which another thread would take care of. The order does not really matter; what matters is that the actions in the queue are handled one at a time.
Explanation:
I plan to host a small server (using servlets). I want each connection to a client to be handled by a separate thread (so far ok). However, each of these threads/clients will be making changes to a single XML file, and those changes cannot be made at the same time.
Question:
Could I have each thread submit the changes to be made to a queue which another thread will continuously manage? As I said, the order of the changes does not matter, just that they do not happen at the same time.
Also, please advise if this is not the best way to do this.
Thank you very much.
This is a reasonable approach. Use an unbounded BlockingQueue (e.g. a LinkedBlockingQueue): the thread performing IO on the XML file calls take() on the queue to remove the next message (blocking if the queue is empty) and then processes the message to modify the XML file, while the threads submitting changes to the XML file call offer() on the queue to add their messages to it. The BlockingQueue is thread-safe, so there's no need for your threads to perform synchronization on it.
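A minimal sketch of that arrangement; the XmlChange message type and the applyChange method are hypothetical placeholders for whatever representation of a change you choose:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class XmlUpdateQueue {

    // Hypothetical message type describing one change to the XML file.
    public static class XmlChange {
        final String xpath;
        final String newValue;
        XmlChange(String xpath, String newValue) {
            this.xpath = xpath;
            this.newValue = newValue;
        }
    }

    private final BlockingQueue<XmlChange> queue = new LinkedBlockingQueue<>();

    // Called by the request-handling threads; offer() never blocks on an unbounded queue.
    public void submit(XmlChange change) {
        queue.offer(change);
    }

    // Run this on the single writer thread, e.g. new Thread(updater::processLoop).start()
    public void processLoop() {
        try {
            while (true) {
                XmlChange change = queue.take(); // blocks while the queue is empty
                applyChange(change);             // modify the XML file, one change at a time
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // lets the writer thread shut down cleanly
        }
    }

    private void applyChange(XmlChange change) {
        // ... parse, modify and rewrite the XML file here ...
    }
}

Once a servlet thread has called submit(), it can return immediately; the single writer thread applies the queued changes one at a time in the background.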
You could have the threads submit tasks to an ExecutorService that has only one thread. Or you could have a lock that allows only one thread to alter the file at once. The latter seems more natural, as the file is a shared resource; the queue is then the implied queue of threads awaiting the lock.
The Executor interface provides the abstraction you need:
An object that executes submitted Runnable tasks. This interface provides a way of decoupling task submission from the mechanics of how each task will be run, including details of thread use, scheduling, etc. An Executor is normally used instead of explicitly creating threads.
A single-threaded executor service seems like exactly the right tool for the job. See Executors.newSingleThreadExecutor(), whose javadoc says:
Creates an Executor that uses a single worker thread operating off an unbounded queue. (Note however that if this single thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.) Tasks are guaranteed to execute sequentially, and no more than one task will be active at any given time. Unlike the otherwise equivalent newFixedThreadPool(1) the returned executor is guaranteed not to be reconfigurable to use additional threads.
Note that in a JavaEE context, you need to take into consideration how to terminate the worker thread when your webapp is unloaded. There are other questions here on SO that deal with this.
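For comparison with the queue-based answer above, a minimal sketch of the single-threaded-executor approach; writeToXml is a placeholder for your actual file update, and stop() is where you would hook the webapp-unload cleanup mentioned above:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class XmlWriteService {

    private final ExecutorService xmlWriter = Executors.newSingleThreadExecutor();

    // Request-handling threads call this; tasks queue up and run strictly one at a time.
    void submitChange(String element, String value) {
        xmlWriter.submit(() -> writeToXml(element, value));
    }

    // Call this when the webapp is unloaded (e.g. from a ServletContextListener)
    // so the worker thread does not outlive the application.
    void stop() {
        xmlWriter.shutdown();
    }

    private void writeToXml(String element, String value) {
        // ... the actual XML file update goes here ...
    }
}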
Related
In my Java application I have a Runnable such as:
this.runner = new Runnable() {
    @Override
    public void run() {
        // do something that takes roughly 5 seconds.
    }
};
I need to run this roughly every 30 seconds (although this can vary) in a separate thread. The nature of the code is such that I can run it and forget about it (whether it succeeds or fails). I do this as follows as a single line of code in my application:
(new Thread(this.runner)).start();
Now, this works fine. However, I'm wondering if there is any sort of cleanup I should be doing on each of the thread instances after they finish running? I am doing CPU profiling of this application in VisualVM and I can see that, over the course of 1 hour runtime, a lot of threads are being created. Is this concern valid or is everything OK?
N.B. The reason I start a new Thread instead of simply defining this.runner as a Thread, is that I sometimes need to run this.runner twice simultaneously (before the first run call has finished), and I can't do that if I defined this.runner as a Thread since a single Thread object can only be run again once the initial execution has finished.
Java objects that need to be "cleaned up" or "closed" after use conventionally implement the AutoCloseable interface. This makes it easy to do the clean up using try-with-resources. The Thread class does not implement AutoCloseable, and has no "close" or "dispose" method. So, you do not need to do any explicit clean up.
However
(new Thread(this.runner)).start()
is not guaranteed to immediately start computation of the Runnable. You might not care whether it succeeds or fails, but I guess you do care whether it runs at all. And you might want to limit the number of these tasks running concurrently; you might want only one to run at once, for example. So you might want to join() the thread (or, perhaps, join with a timeout). Joining the thread ensures that the thread completes its computation. Joining with a timeout increases the chance that the thread starts its computation (because the current thread will be suspended, freeing a CPU that might run the other thread).
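A rough sketch of the join-with-timeout variant (the 100 ms value is arbitrary):

Thread worker = new Thread(this.runner);
worker.start();
try {
    // Waits at most 100 ms for the worker to finish; either way, suspending the
    // current thread gives the new thread a better chance of being scheduled.
    worker.join(100);
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}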
However, creating multiple threads to perform regular or frequent tasks is not recommended. You should instead submit tasks to a thread pool. That will enable you to control the maximum amount of concurrency, and can provide you with other benefits (such as prioritising different tasks), and amortises the expense of creating threads.
You can configure a thread pool to use a fixed-length (bounded) task queue and to cause submitting threads to execute submitted tasks themselves when the queue is full. By doing that you can guarantee that tasks submitted to the thread pool are (eventually) executed. The documentation of ThreadPoolExecutor.execute(Runnable) says it
Executes the given task sometime in the future
which suggests that the implementation guarantees that it will eventually run all submitted tasks even if you do not take those specific steps to ensure submitted tasks are executed.
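A minimal sketch of such a pool; the pool size of 4 and the queue bound of 100 are arbitrary values chosen for illustration:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// 4 worker threads and a bounded queue of 100 tasks; when the queue is full,
// CallerRunsPolicy makes the submitting thread run the task itself, which
// throttles submission instead of rejecting or silently dropping work.
ThreadPoolExecutor pool = new ThreadPoolExecutor(
        4, 4,                                   // core and maximum pool size
        0L, TimeUnit.MILLISECONDS,              // keep-alive for idle threads
        new ArrayBlockingQueue<>(100),          // bounded task queue
        new ThreadPoolExecutor.CallerRunsPolicy());

pool.execute(() -> {
    // ... the roughly-5-second task from the question ...
});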
I recommend looking at the Concurrency API; it provides numerous ready-made building blocks for general use. With an ExecutorService you can call shutdown() after submitting your tasks: the executor then stops accepting new tasks, lets previously submitted tasks run to completion, and terminates once they have finished.
For a short introduction:
https://www.baeldung.com/java-executor-service-tutorial
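A minimal sketch of that submit-then-shutdown flow, reusing the runner from your question; the pool size and timeout are arbitrary:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

ExecutorService executor = Executors.newFixedThreadPool(2);

executor.submit(this.runner);   // submit as many tasks as you need
executor.submit(this.runner);

executor.shutdown();            // stop accepting new tasks; queued tasks still run
try {
    // Block until everything submitted so far has finished, or the timeout expires.
    if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
        executor.shutdownNow(); // give up and interrupt any still-running tasks
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}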
Is there a better way to make writing to files thread safe (for cases where the file may not be all the same in every thread) than synchronizing the method or the file writer? I read a few threads similar to this topic, but they seem to focus on one file as opposed to multiple files.
Ex. There are 20 threads that write to files (each uses a method that creates a file writer for the file and then writes to it inside a try-catch, etc.); 10 of the threads write to fileA, 5 threads write to fileB, 4 threads write to fileC, and 1 thread writes to fileD.
Synchronizing the method would not be efficient, since threads that want to write to different files would have to wait for the previous thread to finish before they can proceed. I think synchronizing the file writer does pretty much the same thing, or am I wrong?
If I were to have separate threads (from the main application) that write to a file, would they execute (run) in the order they were submitted to the ExecutorService with 1 thread?
In the main application, I would submit new threads to the ExecutorService (which uses 1 thread). The threads would write to a file (using a write method that has the FileWriter synchronized, from a Logger class). The threads would write to the file one by one because the FileWriter is synchronized and there is only 1 thread for the ExecutorService, which will prevent multiple writes to the same file at once. The question is: will the threads write to the file in the order they were submitted to the ExecutorService? I know they start in the order they were submitted, but I am not too sure about the execution order.
You are mixing some things up which creates the confusion: First, ExecutorService is an interface that does not mandate a particular way how the submitted tasks (not threads) are executed. So it doesn’t make sense to ask how an ExecutorService will do a particular thing as it is not specified. It might even drop all tasks without executing anything.
Second, as already mentioned above, you are submitting tasks, not threads, to an ExecutorService; the tasks may implement Runnable or Callable.
Unfortunately there’s a design flaw in Java that Thread implements Runnable so you actually can pass a Thread instance to submit() which you should never do as it creates a lot of confusion for no benefit. When you do so, the common ExecutorService implementations will treat it as an ordinary Runnable invoking its run() method ignoring the fact completely that it is a Thread instance. The thread resource associated with that Thread instance will have no relationship with the thread actually executing the run() method (if the implementation ever calls run()).
So if you submit tasks implemented as Runnable or Callable to an ExecutorService you have to study the documentation of the particular implementation to learn about how they will be executed.
E.g. if you use Executors.newSingleThreadExecutor() to get an implementation, its documentation says:
Creates an Executor that uses a single worker thread operating off an unbounded queue. (Note however that if this single thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.) Tasks are guaranteed to execute sequentially, and no more than one task will be active at any given time. Unlike the otherwise equivalent newFixedThreadPool(1) the returned executor is guaranteed not to be reconfigurable to use additional threads.
(emphasis by me)
So that would answer your question completely. Note that in this case you don’t even need synchronized within your task’s implementation, as this ExecutorService already provides the mutual exclusion guarantee required for your tasks.
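A minimal sketch, where logger.write stands in for the synchronized write method on the Logger class from your question:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService logExecutor = Executors.newSingleThreadExecutor();

// Each submit() enqueues a task on the executor's FIFO queue; the single worker
// thread runs them in exactly the order they were submitted, so the lines end
// up in the file in submission order.
logExecutor.submit(() -> logger.write("first line"));
logExecutor.submit(() -> logger.write("second line"));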
Consider the alternative of having a specialized file writer thread that is the only thread to write to the files. The other threads can safely add messages to a java.util.concurrent.BlockingQueue. As soon as a thread has placed a message on the queue, it can get back to work.
Running concurrent tasks via ThreadPoolExecutors. Since I have 2-3 sets of tasks to do, for now have a map of ThreadPoolExecutors and can send a set of tasks to one of them.
Now I want to know when a pool has completed all tasks assigned to it. The way it's organized is that I know the list of tasks beforehand, so I send them to a newly constructed pool and then plan to start polling/tracking to know when all are done.
One way would be to have another pool with 1-2 threads that polls the other pools to see when their queues are empty. If a few scans show them as empty (with a one-second sleep between polls), assume they are done.
Another way would be to subclass ThreadPoolExecutor, keep track via the queue, and override afterExecute(Runnable r, Throwable t) so I know exactly when each task is done; that is good for showing status and knowing when all are complete if everything is moving smoothly.
Is there an implementation of the second approach somewhere? It would be good to have an interface that listeners can implement and then add themselves to the subclassed executor.
I am also looking for an implementation that can:
Ask a pool to shut down within a timeout,
If the shutdown is not complete after the timeout, call shutdownNow(),
And if this fails, get the thread factory and stop all threads in its group (this assumes that we set the factory and that it uses a group or some other way to get a reference to all its threads).
Basically, as sure a way as we can find to clean up a pool, so that we can have this running in an app container. Some of the tasks call Selenium etc., so there can be hung threads.
The last resort would be to restart the container (Tomcat/JBoss), but we want that to remain the last resort.
The question is: do you know of an open-source implementation of this, or any code to start off with?
For your first question, you can use an ExecutorCompletionService. It adds all completed tasks to a queue, so with a blocking queue you can wait until all tasks have arrived at the queue.
Or create a subclass of FutureTask and override its done method to define the “after execute” action. Then submit instances of this class wrapping your jobs to the executor.
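For the ExecutorCompletionService option, a minimal sketch, assuming you know how many tasks you submitted:

import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;

void runAllAndWait(ExecutorService pool, List<Callable<Void>> tasks) throws Exception {
    CompletionService<Void> completion = new ExecutorCompletionService<>(pool);
    for (Callable<Void> task : tasks) {
        completion.submit(task);
    }
    // take() blocks until the next task finishes; after tasks.size() takes,
    // every submitted task has completed.
    for (int i = 0; i < tasks.size(); i++) {
        completion.take().get();  // get() also rethrows any exception the task threw
    }
}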
The second question has a straightforward solution. “shut down within a time out, and if after a time out the shut down is not complete then call shutdownNow()”:
executor.shutdown();
if (!executor.awaitTermination(timeout, timeUnit))
    executor.shutdownNow();
Stopping threads is something you shouldn’t do (Thread.stop is deprecated for a good reason). But you may invoke cancel(true) on your jobs. That could accelerate the termination if your tasks support interruption.
By the way, it looks very unnatural to me to have multiple ThreadPoolExecutors and play around with shutting them down, instead of simply having one ThreadPoolExecutor for all jobs and letting that ThreadPoolExecutor manage the life cycle of all threads. That’s what the ThreadPoolExecutor is made for.
When we talk about the processing of asynchronous events using an ExecutorService, why does creating a new fixed thread pool involve the use of a LinkedBlockingQueue? The events which are arriving are not dependent on each other at all, so why use a queue, given that the consumer thread would still involve contention for the take lock? Why doesn't the Executors class have some hybrid data structure (such as a concurrent Map implementation) where there is no need for a take lock in most cases?
There is a very good reason why the thread pool executor works with a BlockingQueue (by the way, you are not obliged to use the LinkedBlockingQueue implementation; you can use different implementations of BlockingQueue). The queue should be blocking in order to suspend worker threads when there are no tasks to execute. This blocking is done by waiting on condition variables, so waiting worker threads do not consume any CPU resources when the queue is empty.
If you used a non-blocking queue in the thread pool, how would worker threads wait for tasks to execute? They would have to implement some kind of polling, which is an unnecessary waste of CPU resources ("busy waiting").
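To illustrate why, here is roughly what a worker thread's loop looks like when built on a blocking queue (a simplified sketch, not the actual ThreadPoolExecutor code): take() parks the thread until a task arrives, so no CPU is burned while idle.

import java.util.concurrent.BlockingQueue;

void workerLoop(BlockingQueue<Runnable> workQueue) {
    try {
        while (true) {
            Runnable task = workQueue.take(); // parks the thread, using no CPU, until a task arrives
            task.run();
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();   // typical shutdown path
    }
}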
UPDATE:
OK, now I fully understand the use case. You still need a blocking collection anyway. The reason is basically the same: since you are implementing producer-consumer, you need a means for worker threads to wait for messages to arrive, and this you simply can't do without a mutex plus condition variable (or simply a BlockingQueue).
Regarding the map: yes, I understand how you want to use it, but unfortunately there is no such implementation provided. Recently I solved a similar problem: I needed to group incoming tasks by some criterion and execute the tasks from each group serially. As a result I implemented my own GroupThreadPoolExecutor that does this grouping. The idea is simple: group incoming tasks into a map and add each task to the executor queue when the previous task from its group completes.
There is a big discussion here; I think it's relevant to your question.
I'm working on a project where execution time is critical. In one of the algorithms I have, I need to save some data into a database.
What I did is call a method that does that; it fires a new thread every time it's called. I ran into an out-of-memory problem since more than 20,000 threads were loaded ...
My question now is: I want to start only one thread; when the method is called, it adds the job to a queue and notifies the thread, and the thread sleeps when no jobs are available, and so on. Are there any design patterns available, or examples available online?
Run, do not walk, to your friendly Javadocs and look up ExecutorService, especially Executors.newSingleThreadExecutor().
ExecutorService myXS = Executors.newSingleThreadExecutor();
// then, as needed...
myXS.submit(myRunnable);
And it will handle the rest.
Yes, you want a worker thread or thread pool pattern.
http://en.wikipedia.org/wiki/Thread_pool_pattern
See http://www.ibm.com/developerworks/library/j-jtp0730/index.html for Java examples
I believe the pattern you're looking for is called producer-consumer. In Java, you can use the blocking methods on a BlockingQueue to pass tasks from the producers (that create the jobs) to the consumer (the single worker thread). This will make the worker thread automatically sleep when no jobs are available in the queue, and wake up when one is added. The concurrent collections should also handle using multiple worker threads.
Are you looking for java.util.concurrent.Executor?
That said, if you have 20,000 concurrent inserts into the database, using a thread pool will probably not save you: if the database can't keep up, the queue will get longer and longer until you run out of memory again. Also, note that an executor's queue is held only in memory, i.e. if the server crashes, the data in it will be gone.