Per the Javadoc comment of the method public static ExecutorService newCachedThreadPool() in the Executors class:
Threads that have not been used for sixty seconds are terminated and
removed from the **cache**.
I was wondering where this cache is and how it functions, as I didn't see any likely static Collection field in ThreadPoolExecutor or its superclass.
Technically, a Worker is a Runnable that holds a reference to a Thread; it is not a Thread itself.
Let us dig deeper into the mechanics of this class.
Executors.newCachedThreadPool uses this ThreadPoolExecutor constructor:
new ThreadPoolExecutor(0, Integer.MAX_VALUE,
                       60L, TimeUnit.SECONDS,
                       new SynchronousQueue<Runnable>());
where 60 seconds is the keepAliveTime.
Worker Addition / Task submission
A RunnableFuture is created out of the submitted Callable or Runnable.
This is passed down to the execute() method.
The execute method first tries to offer the task onto the workQueue, which in our case is the SynchronousQueue. A SynchronousQueue has no capacity and only hands a task directly to a thread that is already waiting on it, so the offer fails and returns false when no idle worker is polling the queue.
(Just hold on to this thought; we will revisit it in the caching aspect, and a short sketch follows below.)
The call then goes on to the addIfUnderMaximumPoolSize method within execute, which creates a java.util.concurrent.ThreadPoolExecutor.Worker runnable, wraps it in a new Thread, and adds the created Worker to the workers HashSet (the one others have mentioned in the answers).
It then calls thread.start().
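As a hypothetical illustration (not part of the original walk-through), here is a minimal sketch of the SynchronousQueue hand-off behaviour described above: offer() returns false when no consumer is waiting, and true once one is.

import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

public class SynchronousQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        SynchronousQueue<Runnable> queue = new SynchronousQueue<>();

        // No thread is waiting on the queue, so the hand-off fails immediately.
        System.out.println("offer with no waiting consumer: " + queue.offer(() -> {}));

        // Start a consumer that waits up to five seconds for a hand-off.
        Thread consumer = new Thread(() -> {
            try {
                Runnable task = queue.poll(5, TimeUnit.SECONDS);
                System.out.println("consumer received a task: " + (task != null));
            } catch (InterruptedException ignored) { }
        });
        consumer.start();
        Thread.sleep(500); // give the consumer time to start polling

        // Now a consumer is waiting, so the same offer succeeds.
        System.out.println("offer with waiting consumer: " + queue.offer(() -> {}));
        consumer.join();
    }
}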
The run method of Worker is very important and should be noted.
public void run() {
    try {
        Runnable task = firstTask;
        firstTask = null;
        while (task != null || (task = getTask()) != null) {
            runTask(task);
            task = null;
        }
    } finally {
        workerDone(this);
    }
}
At this point you have submitted a task, and a thread has been created and is running it.
Worker Removal
In the run method, if you have noticed, there is a while loop.
It is an incredibly interesting piece of code.
If the task is not null, the condition short-circuits and the second condition is not evaluated.
Once the task has been run via runTask and the task reference is set to null, the second condition is evaluated, which takes us into the getTask method.
Here is the part which decides whether a worker should be purged or not.
workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS);
The workQueue is polled for 60 seconds in this case, waiting for any new task to arrive on the queue.
If none arrives, poll returns null and the code checks whether the worker can exit.
Returning null means we will break out of the while and come to the finally block.
Here the worker is removed from the HashSet and the referenced Thread is also gone.
Caching aspect
Coming back to the SynchronousQueue we discussed in Task submission.
If I submit a task such that workQueue.offer and workQueue.poll meet in tandem, i.e. a new task arrives within those 60 seconds, the existing thread can be re-used.
This can be seen in action if I put a sleep of 59s vs 61s between each task submission:
with 59s I can see the thread being re-used; with 61s I can see a new thread being created in the pool.
N.B. The actual timings could vary from machine to machine and my run() is just printing out Thread.currentThread().getName()
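A minimal sketch of that experiment (the pause length, the loop count and the printed output are placeholders you would adapt):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CachedPoolReuseDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool();
        long pauseSeconds = 59; // try 59 vs 61 to see the keep-alive in action

        for (int i = 0; i < 3; i++) {
            // Below 60s the idle worker is still polling the SynchronousQueue,
            // so the same thread name is printed; above 60s the worker has
            // timed out and a new thread is created for the next task.
            pool.execute(() -> System.out.println(Thread.currentThread().getName()));
            TimeUnit.SECONDS.sleep(pauseSeconds);
        }
        pool.shutdown();
    }
}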
Please let me know in comments if I have missed something or misinterpreted the code.
The word cache is only an abstraction. Internally a HashSet holds the Worker objects (and, through them, the threads). As per the code:
/**
* Set containing all worker threads in pool. Accessed only when
* holding mainLock.
*/
private final HashSet<Worker> workers = new HashSet<Worker>();
And in case you are interested in the runnables you submit or execute:
newCachedThreadPool uses SynchronousQueue<Runnable> to handle them.
If you go through the code of ThreadPoolExecutor, you will see this:
/**
* Set containing all worker threads in pool. Accessed only when
* holding mainLock.
*/
private final HashSet<Worker> workers = new HashSet<Worker>();
and this:
/**
* The queue used for holding tasks and handing off to worker
* threads. We do not require that workQueue.poll() returning
* null necessarily means that workQueue.isEmpty(), so rely
* solely on isEmpty to see if the queue is empty (which we must
* do for example when deciding whether to transition from
* SHUTDOWN to TIDYING). This accommodates special-purpose
* queues such as DelayQueues for which poll() is allowed to
* return null even if it may later return non-null when delays
* expire.
*/
private final BlockingQueue<Runnable> workQueue;
And this:
try {
    Runnable r = timed ?
        // here keepAliveTime is passed as sixty seconds from
        // Executors#newCachedThreadPool()
        workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) :
        workQueue.take();
    if (r != null)
        return r;
    timedOut = true;
} catch (InterruptedException retry) {
    timedOut = false;
}
A sincere walk-through of the actual implementation code, keeping these pointers in mind, will help you understand it more clearly.
Related
I have a utility method (used for unit testing, it so happens) that executes a Runnable in another thread. It starts the thread running, but does not wait for the Thread to finish, instead relying on a Future. A caller of the method is expected to get() that Future. But is that enough to ensure safe publication of the computation done by the Runnable?
Here is the method:
private static Future<Void> runInOtherThread(final CountDownLatch ready, final Runnable operation) {
    final CompletableFuture<Void> future = new CompletableFuture<Void>();
    final Thread thread = new Thread(() -> {
        try {
            ready.await();
            operation.run();
        } catch (Throwable e) {
            future.completeExceptionally(e);
            return;
        }
        future.complete(null);
    });
    thread.start();
    return future;
}
After calling Future.get() on the returned Future, can the caller of the method safely assume that the Runnable has finished execution, and its results have been safely published?
No, you don't need to join(). Calling get() on the future is sufficient.
CompletableFuture implements the Future interface, and the Javadoc for Future states this:
Memory consistency effects: Actions taken by the asynchronous computation happen-before actions following the corresponding Future.get() in another thread.
That happen-before relationship is sufficient to ensure safe publication of the value returned by get().
Furthermore, the get() call will not complete until the CompletableFuture has been completed, exceptionally-completed or cancelled.
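For example, here is a hypothetical caller (not from the original question) that relies only on get() for visibility of the Runnable's side effects. It reuses the runInOtherThread method from the question and omits imports in the same style; the int array and the value 42 are placeholders.

// Hypothetical caller: the write made inside the Runnable is visible to this
// thread once get() has returned, because of the happens-before edge that
// Future.get() documents.
static void caller() throws InterruptedException, ExecutionException {
    CountDownLatch ready = new CountDownLatch(1);
    int[] result = new int[1];                                   // plain, non-volatile state
    Future<Void> done = runInOtherThread(ready, () -> result[0] = 42);
    ready.countDown();                                           // let the worker proceed
    done.get();                                                  // blocks until complete
    System.out.println(result[0]);                               // guaranteed to print 42
}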
If we look at Safe Publication by Shipilev, one of the trivial ways to get safe publication is:
Exchange the reference via a volatile field (JLS 17.4.5), or as the consequence of this rule, via the AtomicX classes
Since CompletableFuture uses a volatile field to write and read the result, no additional memory barriers are necessary for safe publication. This is explained in the CompletableFuture class overview comment:
* A CompletableFuture may have dependent completion actions,
* collected in a linked stack. It atomically completes by CASing
* a result field, and then pops off and runs those actions. This
* applies across normal vs exceptional outcomes, sync vs async
* actions, binary triggers, and various forms of completions.
*
* Non-nullness of volatile field "result" indicates done. It may
* be set directly if known to be thread-confined, else via CAS.
* An AltResult is used to box null as a result, as well as to
* hold exceptions.
It also handles the safe initialization of the published objects, as per the same overview comment later:
* Completion fields need not be declared as final or volatile
* because they are only visible to other threads upon safe
* publication.
I have a list of objects, from which, depending on user interaction, some objects need to do work asynchronously. Something like this:
for (TheObject o : this.listOfObjects) {
    o.doWork();
}
The class TheObject implements an ExecutorService (single-threaded!), which is used to do the work. Every object of type TheObject instantiates an ExecutorService. I don't want to make lasagna code. I don't have enough objects at the same time to make an extra abstraction layer with thread pooling necessary.
I want to cite the Java Documentation about CachedThreadPools:
Threads that have not been used for sixty seconds are terminated and
removed from the cache. Thus, a pool that remains idle for long enough
will not consume any resources.
First question: Is this also true for a SingleThreadExecutor? Does the thread get terminated? JavaDoc doesn't say anything about SingleThreadExecutor. It wouldn't even matter in this application, as I have an amount of objects I can count on one hand. Just curiosity.
Furthermore, the doWork() method of TheObject needs to call the ExecutorService#submit() method to do the work asynchronously. Is it possible (I bet it is) to call the doWork() method implicitly? Is this a viable way of designing an async method?
void doWork() {
    if (!isRunningAsync) {
        // re-submit this same method so that it runs on the executor's thread
        myExecutor.submit(() -> doWork());
    } else {
        // Do Work...
    }
}
First question: Is this also true for a SingleThreadExecutor? Does the thread get terminated?
Take a look at the source code of Executors, comparing the implementations of newCachedThreadPool and newSingleThreadExecutor:
public static ExecutorService newCachedThreadPool() {
    return new ThreadPoolExecutor(0, Integer.MAX_VALUE,
                                  60L, TimeUnit.SECONDS,
                                  new SynchronousQueue<Runnable>());
}

public static ExecutorService newSingleThreadExecutor() {
    return new FinalizableDelegatedExecutorService
        (new ThreadPoolExecutor(1, 1,
                                0L, TimeUnit.MILLISECONDS,
                                new LinkedBlockingQueue<Runnable>()));
}
The primary difference (of interest here) is the 60L, TimeUnit.SECONDS and 0L, TimeUnit.MILLISECONDS.
Effectively (but not actually), these parameters are passed to ThreadPoolExecutor.setKeepAliveTime. Looking at the Javadoc of that method:
A time value of zero will cause excess threads to terminate immediately after executing tasks.
where "excess threads" actually refers to "threads in excess of the core pool size".
The cached thread pool is created with zero core threads, and an (effectively) unlimited number of non-core threads; as such, any of its threads can be terminated after the keep alive time.
The single thread executor is created with 1 core thread and zero non-core threads; as such, there are no threads which can be terminated after the keep alive time: its one core thread remains active until you shut down the entire ThreadPoolExecutor.
(Thanks to @GPI for pointing out that I was wrong in my interpretation before.)
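If you do want a single-threaded executor whose one thread can also time out, one option (a sketch, not something this answer proposes, with a made-up method name and the usual java.util.concurrent imports assumed) is to construct the ThreadPoolExecutor yourself and enable core-thread timeout:

// Hypothetical factory: behaves like newSingleThreadExecutor, except that the
// single core thread is also allowed to terminate after 60 seconds of idleness.
public static ExecutorService newSingleThreadExecutorWithTimeout() {
    ThreadPoolExecutor executor = new ThreadPoolExecutor(
            1, 1,                                  // core and maximum pool size
            60L, TimeUnit.SECONDS,                 // keep-alive for idle threads
            new LinkedBlockingQueue<Runnable>());
    executor.allowCoreThreadTimeOut(true);         // apply the keep-alive to the core thread too
    return executor;
}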
First question:
Threads that have not been used for sixty seconds are terminated and removed from the cache. Thus, a pool that remains idle for long enough will not consume any resources.
Is this also true for a SingleThreadExecutor?
A SingleThreadExecutor works differently. It has no time-out behaviour, because of the values configured during its creation.
Termination of the single thread is possible, but the executor guarantees that one thread always exists to handle tasks from the task queue.
From newSingleThreadExecutor documentation:
public static ExecutorService newSingleThreadExecutor()
Creates an Executor that uses a single worker thread operating off an unbounded queue. (Note however that if this single thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.)
Tasks are guaranteed to execute sequentially, and no more than one task will be active at any given time. Unlike the otherwise equivalent newFixedThreadPool(1) the returned executor is guaranteed not to be reconfigurable to use additional threads.
Second question:
Furthermore the doWork() method of TheObject needs to call the ExecutorService#.submit() method to do the work async
for (TheObject o : this.listOfObjects) {
    o.doWork();
}
can be changed to
ExecutorService executorService = Executors.newSingleThreadExecutor();

executorService.execute(new Runnable() {
    public void run() {
        System.out.println("Asynchronous task");
    }
});

executorService.shutdown();
Implement the Runnable or Callable interface and put your doWork() code in the run() or call() method; the task will then be executed asynchronously.
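As a hypothetical sketch (the internal method names are assumptions, not from the question), here is how TheObject could expose an asynchronous doWork() without the recursive submit trick:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: each TheObject owns a single-threaded executor and
// submits its (synchronous) work to it, so callers never block.
class TheObject {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    void doWork() {
        executor.submit(this::doWorkInternal);   // returns immediately
    }

    private void doWorkInternal() {
        System.out.println("Working on " + Thread.currentThread().getName());
    }

    void shutdown() {
        executor.shutdown();                     // shut the executor down when the object is done
    }
}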
I'm loosely following a tutorial on Java NIO to create my first multi-threading, networking Java application. The tutorial is basically about creating an echo-server and a client, but at the moment I'm just trying to get as far as a server receiving messages from the clients and logging them to the console. By searching the tutorial page for "EchoServer" you can see the class that I base most of the relevant code on.
My problem is (at least I think it is) that I can't find a way to initialize the queue of messages to be processed so that it can be used as I want to.
The application is running on two threads: a server thread, which listens for connections and socket data, and a worker thread which processes data received by the server thread. When the server thread has received a message, it calls processData(byte[] data) on the worker, where the data is added to a queue:
1. public void processData(byte[] data) {
2.     synchronized(queue) {
3.         queue.add(new String(data));
4.         queue.notify();
5.     }
6. }
In the worker thread's run() method, I have the following code:
7.  while (true) {
8.      String msg;
9.
10.     synchronized (queue) {
11.         while (queue.isEmpty()) {
12.             try {
13.                 queue.wait();
14.             } catch (InterruptedException e) { }
15.         }
16.         msg = queue.poll();
17.     }
18.
19.     System.out.println("Processed message: " + msg);
20. }
I have verified in the debugger that the worker thread gets to line 13, but doesn't proceed to line 16, when the server starts. I take that as a sign of a successful wait. I have also verified that the server thread gets to line 4 and calls notify() on the queue. However, the worker thread doesn't seem to wake up.
In the javadoc for wait(), it is stated that
The current thread must own this object's monitor.
Given my inexperience with threads I am not exactly certain what that means, but I have tried instantiating the queue from the worker thread with no success.
Why does my thread not wake up? How do I wake it up correctly?
Update:
As @Fly suggested, I added some log calls to print out System.identityHashCode(queue) and sure enough the queues were different instances.
This is the entire Worker class:
public class Worker implements Runnable {

    Queue<String> queue = new LinkedList<String>();

    public void processData(byte[] data) { ... }

    @Override
    public void run() { ... }
}
The worker is instantiated in the main method and passed to the server as follows:
public static void main(String[] args)
{
    Worker w = new Worker();
    // Give names to threads for debugging purposes
    new Thread(w, "WorkerThread").start();
    new Thread(new Server(w), "ServerThread").start();
}
The server saves the Worker instance to a private field and calls processData() on that field. Why do I not get the same queue?
Update 2:
The entire code for the server and worker threads is now available here.
I've placed the code from both files in the same paste, so if you want to compile and run the code yourself, you'll have to split them up again. Also, there's a bunch of calls to Log.d(), Log.i(), Log.w() and Log.e() - those are just simple logging routines that construct a log message with some extra information (timestamp and such) and output to System.out and System.err.
I'm going to guess that you are getting two different queue objects because you are creating a whole new Worker instance. You didn't post the code that starts the Worker, but assuming that it also instantiates and starts the Server, the problem is on the line where you assign this.worker = new Worker(); instead of assigning the Worker parameter.
public Server(Worker worker) {
    this.clients = new ArrayList<ClientHandle>();
    this.worker = new Worker(); // <------ THIS SHOULD BE this.worker = worker;
    try {
        this.start();
    } catch (IOException e) {
        Log.e("An error occurred when trying to start the server.", e,
                this.getClass());
    }
}
The thread for the Worker is probably using the worker instance passed to the Server constructor, so the Server needs to assign its own worker reference to that same Worker object.
You might want to use LinkedBlockingQueue instead, it internally handles the multithreading part, and you can focus more on logic. For example :
// a shared instance somewhere in your code
LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<String>();
In one of your threads:
public void processData(byte[] data) {
    queue.offer(new String(data));
}
and in your other thread
while (running) { // private class member, set to false to exit loop
    String msg = queue.poll(500, TimeUnit.MILLISECONDS);
    if (msg == null) {
        // queue was empty
        Thread.yield();
    } else {
        System.out.println("Processed message: " + msg);
    }
}
Note: for the sake of completeness, the poll method throws an InterruptedException that you may handle as you see fit. In that case, the while loop could be surrounded by the try...catch so as to exit if the thread is interrupted, as in the sketch below.
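A minimal sketch of that, reusing the queue and running names from the snippets above (and assuming the relevant java.util.concurrent imports):

try {
    while (running) {
        String msg = queue.poll(500, TimeUnit.MILLISECONDS);
        if (msg != null) {
            System.out.println("Processed message: " + msg);
        }
    }
} catch (InterruptedException e) {
    // Restore the interrupt status and fall out of the loop.
    Thread.currentThread().interrupt();
}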
I'm assuming that queue is an instance of some class that implements the Queue interface, and that (therefore) the poll() method doesn't block.
In this case, you simply need to instantiate a single queue object that can be shared by the two threads. The following will do the trick:
Queue<String> queue = new LinkedList<String>();
The LinkedList class is not thread-safe, but provided that you always access and update the queue instance in a synchronized(queue) block, this will take care of thread-safety.
I think that the rest of the code is correct. You appear to be doing the wait / notify correctly. The worker thread should get and print the message.
If this isn't working, then the first thing to check is whether the two threads are using the same queue object. The second thing to check is whether processData is actually being called. A third possibility is that some other code is adding or removing queue entries, and doing it the wrong way.
notify() calls are lost if no thread is waiting when notify() is called. So if one thread calls notify() first and another thread only calls wait() afterwards, the waiting thread will block forever (effectively a deadlock).
You want to use a semaphore instead. Unlike condition variables, release()/increment() calls are not lost on semaphores.
Start the semaphore's count at zero. When you add to the queue increase it. When you take from the queue decrease it. You will not get lost wake-up calls this way.
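A minimal sketch of that approach (the class and method names are placeholders): a Semaphore starting at zero is released by the producer and acquired by the consumer, so no wake-up is ever lost.

import java.util.LinkedList;
import java.util.Queue;
import java.util.concurrent.Semaphore;

class MessageBuffer {
    private final Queue<String> queue = new LinkedList<String>();
    private final Semaphore available = new Semaphore(0);   // count starts at zero

    // Producer side: add to the queue, then increment the semaphore.
    public void processData(byte[] data) {
        synchronized (queue) {
            queue.add(new String(data));
        }
        available.release();                                 // a release() is never lost
    }

    // Consumer side: decrement the semaphore, then take from the queue.
    public String takeMessage() throws InterruptedException {
        available.acquire();                                 // blocks until something was released
        synchronized (queue) {
            return queue.poll();
        }
    }
}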
Update
To clear up some confusion regarding condition variables and semaphores.
There are two differences between condition variables and semaphores.
Condition variables, unlike semaphores, are associated with a lock. You must acquire the lock before you call wait() and notify(). Semaphores do not have this restriction. Also, a wait() call releases the lock.
notify() calls are lost on condition variables, meaning, if you call notify() and no thread is sleeping with a call to wait(), then the notify() is lost. This is not the case with semaphores. The ordering of acquire() and release() calls on semaphores does not matter because the semaphore maintains a count. This is why they are sometimes called counting semaphores.
In the javadoc for wait(), it is stated that
The current thread must own this object's monitor.
Given my inexperience with threads I am not exactly certain what that
means, but I have tried instantiating the queue from the worker thread
with no success.
They use really bizarre and confusing terminology. As a general rule of thumb, "object's monitor" in Java speak means "object's lock". Every object in Java has, inside it, a lock and one condition variable (wait()/notify()). So what that line means is: before you call wait() or notify() on an object (in your case the queue object), you must first acquire its lock with synchronized(object){}. Being "inside" the monitor in Java speak means holding the lock via synchronized. The terminology has been adopted from research papers and applied to Java concepts, so it is a bit confusing, since these words mean something slightly different from what they originally meant.
The code seems to be correct.
Do both threads use the same queue object? You can check this by object id in a debugger.
Does changing notify() to notifyAll() help? There could be another thread that invoked wait() on the queue.
OK, after some more hours of pointlessly looking around the net I decided to just screw around with the code for a while and see what I could get to. This worked:
private static BlockingQueue<String> queue;

private BlockingQueue<String> getQueue() {
    if (queue == null) {
        queue = new LinkedBlockingQueue<String>();
    }
    return queue;
}
As Yanick Rochon pointed out the code could be simplified slightly by using a BlockingQueue instead of an ordinary Queue, but the change that made the difference was that I implemented the Singleton pattern.
As this solves my immediate problem to get the app working, I'll call this the answer. Large amounts of kudos should go to @Fly and others for pointing out that the Queue instances might not be the same - without that I would never have figured this out. However, I'm still very curious on why I have to do it this way, so I will ask a new question about that in a moment.
I am cycling millions of Strings through a LinkedBlockingQueue.
The reading thread should end its execution when there are no more items in the source.
I thought about putting a dummy value like "SHUTDOWN" in LinkedBlockingQueue.
The reader does this:
while ((data = (String) MyLinkedBlockingQueue.take()).equals("SHUTDOWN") == false) {
    //read and live
}
Is it efficient to execute equals on every string? If not what can I use instead?
You are on the right track. This is the standard idiom for finishing processing of a BlockingQueue; it's called the "poison pill". I usually implement it using a special private static final instance, so you can compare by reference equality and don't risk overlapping with a real value. E.g.:
private static final String SHUTDOWN = new String("SHUTDOWN"); // use new String() so you don't get an interned value

public void readQueue() {
    while ((data = (String) MyLinkedBlockingQueue.take()) != SHUTDOWN) {
        //read and live
    }
}

public void shutdownQueue() {
    MyLinkedBlockingQueue.put(SHUTDOWN);
}
You can also think of using poll() and ending the loop when it returns null.
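A sketch of that alternative (the method name and the printed message are placeholders), assuming the producer has already finished putting items by the time the queue stays empty for a full second:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical reader: stops once poll() has waited a full second without
// receiving anything, i.e. the queue has drained and no producer refills it.
static void readUntilDrained(BlockingQueue<String> queue) throws InterruptedException {
    String data;
    while ((data = queue.poll(1, TimeUnit.SECONDS)) != null) {
        System.out.println("read and live: " + data); // placeholder for the real work
    }
}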
This could be implemented so that you don't have to check for the "poison pill" every time. Consider making use of a ThreadPoolExecutor that works on your LinkedBlockingQueue. When you want to shut down processing, call the shutdown() method on the executor object. From the documentation of that method:
Initiates an orderly shutdown in which previously submitted tasks are
executed, but no new tasks will be accepted. Invocation has no
additional effect if already shut down.
See this post if you're interested in shutting down processing immediately while tasks are still pending in the queue: With a Java ExecutorService, how do I complete actively executing tasks but halt the processing of waiting tasks?
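A hedged sketch of that executor-based alternative (the method name, the messages list and the one-minute timeout are assumptions): a single worker thread drains the executor's internal LinkedBlockingQueue, and shutdown() plays the role of the poison pill once the producer is done.

import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class DrainWithExecutor {
    static void processAll(List<String> messages) throws InterruptedException {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());

        for (String s : messages) {
            executor.execute(() -> System.out.println("processing " + s)); // placeholder work
        }

        executor.shutdown();                              // previously submitted tasks still run
        executor.awaitTermination(1, TimeUnit.MINUTES);   // wait for the queue to drain
    }
}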
I have a ThreadPoolExecutor that seems to be lying to me when I call getActiveCount(). I haven't done a lot of multithreaded programming however, so perhaps I'm doing something incorrectly.
Here's my TPE
@Override
public void afterPropertiesSet() throws Exception {
    BlockingQueue<Runnable> workQueue;
    int maxQueueLength = threadPoolConfiguration.getMaximumQueueLength();
    if (maxQueueLength == 0) {
        workQueue = new LinkedBlockingQueue<Runnable>();
    } else {
        workQueue = new LinkedBlockingQueue<Runnable>(maxQueueLength);
    }

    pool = new ThreadPoolExecutor(
            threadPoolConfiguration.getCorePoolSize(),
            threadPoolConfiguration.getMaximumPoolSize(),
            threadPoolConfiguration.getKeepAliveTime(),
            TimeUnit.valueOf(threadPoolConfiguration.getTimeUnit()),
            workQueue,
            // Default thread factory creates normal-priority,
            // non-daemon threads.
            Executors.defaultThreadFactory(),
            // Run any rejected task directly in the calling thread.
            // In this way no records will be lost due to rejection
            // however, no records will be added to the workQueue
            // while the calling thread is processing a Task, so set
            // your queue-size appropriately.
            //
            // This also means MaxThreadCount+1 tasks may run
            // concurrently. If you REALLY want a max of MaxThreadCount
            // threads don't use this.
            new ThreadPoolExecutor.CallerRunsPolicy());
}
In this class I also have a DAO that I pass into my Runnable (FooWorker), like so:
@Override
public void addTask(FooRecord record) {
    if (pool == null) {
        throw new FooException(ERROR_THREAD_POOL_CONFIGURATION_NOT_SET);
    }
    pool.execute(new FooWorker(context, calculator, dao, record));
}
FooWorker runs record (the only non-singleton) through a state machine via calculator then sends the transitions to the database via dao, like so:
public void run() {
    calculator.calculate(record);
    dao.save(record);
}
Once my main thread is done creating new tasks I try and wait to make sure all threads finished successfully:
while (pool.getActiveCount() > 0) {
    recordHandler.awaitTermination(terminationTimeout,
            terminationTimeoutUnit);
}
What I'm seeing from output logs (which are presumably unreliable due to the threading) is that getActiveCount() is returning zero too early, and the while() loop is exiting while my last threads are still printing output from calculator.
Note I've also tried calling pool.shutdown() then using awaitTermination but then the next time my job runs the pool is still shut down.
My only guess is that inside a thread, when I send data into the dao (since it's a singleton created by Spring in the main thread...), java is considering the thread inactive since (I assume) it's processing in/waiting on the main thread.
Intuitively, based only on what I'm seeing, that's my guess. But... is that really what's happening? Is there a way to "do it right" without putting a manually incremented variable at the top of run() and a decrement at the end to track the number of running tasks?
If the answer is "don't pass in the dao", then wouldn't I have to "new" a DAO for every thread? My process is already a (beautiful, efficient) beast, but that would really suck.
As the JavaDoc of getActiveCount states, it's an approximate value: you should not base any major business logic decisions on this.
If you want to wait for all scheduled tasks to complete, then you should simply use
pool.shutdown();
pool.awaitTermination(terminationTimeout, terminationTimeoutUnit);
If you need to wait for a specific task to finish, you should use submit() instead of execute() and then check the Future object for completion (either using isDone() if you want to do it non-blocking or by simply calling get() which blocks until the task is done).
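For example, a hypothetical variant of the question's addTask code that collects the futures and waits on them (it reuses the pool, FooWorker, context, calculator and dao names from the question and omits imports, in the same style as the question's snippets):

void addTasksAndWait(List<FooRecord> records)
        throws InterruptedException, ExecutionException {
    List<Future<?>> futures = new ArrayList<>();
    for (FooRecord record : records) {
        // submit() returns a Future, unlike execute()
        futures.add(pool.submit(new FooWorker(context, calculator, dao, record)));
    }
    for (Future<?> future : futures) {
        future.get(); // blocks until that task is done; rethrows any worker exception
    }
}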
The documentation suggests that the method getActiveCount() on ThreadPoolExecutor is not an exact number:
getActiveCount
public int getActiveCount()
Returns the approximate number of threads that are actively executing tasks.
Returns: the number of threads
Personally, when I am doing multithreaded work such as this, I use a variable that I increment as I add tasks, and decrement as I grab their output.
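A minimal sketch of that manual bookkeeping, assuming an AtomicInteger and reusing the names from the question (pool, FooWorker, context, calculator, dao); a CountDownLatch or Phaser would avoid the polling loop:

private final AtomicInteger pending = new AtomicInteger();

public void addTask(FooRecord record) {
    pending.incrementAndGet();                      // count the task before submitting it
    pool.execute(() -> {
        try {
            new FooWorker(context, calculator, dao, record).run();
        } finally {
            pending.decrementAndGet();              // always decrement, even on failure
        }
    });
}

public void waitForCompletion() throws InterruptedException {
    while (pending.get() > 0) {
        Thread.sleep(100);                          // crude polling; good enough for a sketch
    }
}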