I am cycling millions of Strings through a LinkedBlockingQueue.
The reading thread should end its execution when there are no more items in the source.
I thought about putting a dummy value like "SHUTDOWN" in LinkedBlockingQueue.
The reader does this:
while ((data = (String) MyLinkedBlockingQueue.take()).equals("SHUTDOWN") == false) {
    // read and live
}
Is it efficient to execute equals on every string? If not, what can I use instead?
You are on the right track. This is the standard idiom for finishing processing of a BlockingQueue; it's called the "poison pill". I usually implement it with a special private static final instance, so you can compare by object identity and don't risk colliding with a real value, e.g.:
private static final String SHUTDOWN = new String("SHUTDOWN"); // use new String() so you don't get an interned value

public void readQueue() throws InterruptedException {
    String data;
    while ((data = (String) MyLinkedBlockingQueue.take()) != SHUTDOWN) {
        // read and live
    }
}

public void shutdownQueue() throws InterruptedException {
    MyLinkedBlockingQueue.put(SHUTDOWN);
}
You can also consider using poll(), possibly with a timeout, and ending the loop when it returns null.
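If the poll() route fits your case better, here is a minimal sketch of that loop, assuming a timed poll and a placeholder process() method (the field and method names are illustrative): a null return means nothing arrived for the whole timeout, so the loop ends without a sentinel value.

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PollingReader implements Runnable {
    // assumed to be the same queue your producer fills (MyLinkedBlockingQueue in your code)
    private final LinkedBlockingQueue<String> queue;

    PollingReader(LinkedBlockingQueue<String> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        try {
            String data;
            // wait up to one second for the next item; null here means the queue
            // stayed empty for the whole timeout, so the reader stops
            while ((data = queue.poll(1, TimeUnit.SECONDS)) != null) {
                process(data);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void process(String data) {
        // read and live
    }
}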
This could be implemented so that you don't have to check for the "poison pill" every time. Consider making use of a ThreadPoolExecutor that works on your LinkedBlockingQueue. When you want to shut down processing, call the shutdown() method on the executor object. From the documentation of that method:
Initiates an orderly shutdown in which previously submitted tasks are
executed, but no new tasks will be accepted. Invocation has no
additional effect if already shut down.
See this post if you're interested in shutting down processing immediately while tasks are still pending in the queue: With a Java ExecutorService, how do I complete actively executing tasks but halt the processing of waiting tasks?
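For illustration, here is a minimal sketch of that approach, assuming each String becomes its own small task (processMessage() is a placeholder name; the single-threaded executor used here is internally backed by a LinkedBlockingQueue):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class QueueProcessor {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // instead of putting Strings on a queue yourself, submit each one as a task;
    // the executor's internal work queue does the buffering
    public void submit(String message) {
        executor.execute(() -> processMessage(message));
    }

    // no poison pill needed: shutdown() lets already-submitted tasks finish,
    // then the worker thread exits on its own
    public void finish() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
    }

    private void processMessage(String message) {
        // read and live
    }
}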
Related
All the threads in an ExecutorService are busy with tasks that wait for tasks that are stuck in the queue of the executor service.
Example code:
ExecutorService es = Executors.newFixedThreadPool(8);
Set<Future<Set<String>>> outerSet = new HashSet<>();
for (int i = 0; i < 8; i++) {
    outerSet.add(es.submit(new Callable<Set<String>>() {
        @Override
        public Set<String> call() throws Exception {
            Thread.sleep(10000); // to simulate work
            Set<Future<String>> innerSet = new HashSet<>();
            for (int j = 0; j < 8; j++) {
                int k = j;
                innerSet.add(es.submit(new Callable<String>() {
                    @Override
                    public String call() throws Exception {
                        return "number " + k + " in inner loop";
                    }
                }));
            }
            Set<String> out = new HashSet<>();
            while (!innerSet.isEmpty()) {           // we are stuck in this loop because all the
                for (Future<String> f : innerSet) { // Callables in innerSet are stuck in the queue
                    if (f.isDone()) {               // of es and can't start, since all the threads
                        out.add(f.get());           // in es are busy waiting for them to finish
                    }
                }
            }
            return out;
        }
    }));
}
Is there any way to avoid this, other than making a separate thread pool for each layer or using a thread pool that is not fixed in size?
A practical example would be if some callables are submitted to ForkJoinPool.commonPool() and then these tasks use objects that also submit to the commonPool in one of their methods.
You should use a ForkJoinPool. It was made for this situation.
Whereas your solution blocks a thread permanently while it's waiting for its subtasks to finish, the work-stealing ForkJoinPool can perform other work while in join(). This makes it efficient for these kinds of situations, where you may have a variable number of small (and often recursive) tasks being run. With a regular thread pool you would need to oversize it to make sure that you don't run out of threads.
With CompletableFuture you need to handle a lot more of the actual planning/scheduling yourself, and it will be more complex to tune if you decide to change things. With FJP the only thing you need to tune is the number of threads in the pool; with CF you need to think about then vs. thenAsync as well.
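As a rough sketch of what the question's outer/inner structure could look like as ForkJoin tasks (the class names, pool size, and task bodies are made up for the example):

import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinExample {

    static class InnerTask extends RecursiveTask<String> {
        private final int k;
        InnerTask(int k) { this.k = k; }

        @Override
        protected String compute() {
            return "number " + k + " in inner loop";
        }
    }

    static class OuterTask extends RecursiveTask<Set<String>> {
        @Override
        protected Set<String> compute() {
            Set<InnerTask> inner = new HashSet<>();
            for (int j = 0; j < 8; j++) {
                InnerTask t = new InnerTask(j);
                t.fork();          // schedule on the same pool
                inner.add(t);
            }
            Set<String> out = new HashSet<>();
            for (InnerTask t : inner) {
                out.add(t.join()); // join() lets this worker run other queued tasks meanwhile
            }
            return out;
        }
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(8);
        Set<String> result = pool.invoke(new OuterTask());
        System.out.println(result.size() + " inner results");
    }
}

Because join() can execute other queued ForkJoinTasks while waiting, the outer and inner tasks all complete on a small fixed pool without deadlocking.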
I would recommend trying to decompose the work into completion stages via CompletableFuture:
CompletableFuture.supplyAsync(outerTask)
    .thenCompose(outerResult -> CompletableFuture.allOf(innerTasks));
That way your outer task doesn’t hog the execution thread while processing inner tasks, but you still get a Future that resolves when the entire job is done. It can be hard to split those stages up if they’re too tightly coupled though.
The approach you are suggesting, which is based on the hypothesis that the problem would be resolved if the number of threads were greater than the number of tasks, will not work here if you are allocating a single thread pool. You may try it and see: it's a simple case of deadlock, as you have stated in the comments of your code.
In such a case, use two separate thread pools, one for the outer tasks and another for the inner ones. When a task from the inner pool completes, simply return its value to the outer task.
Or you can simply create a thread on the fly, do the work in it, get the result, and return it to the outer task.
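A minimal sketch of the two-pool variant, with pool sizes and task bodies taken loosely from the question's example:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TwoPoolExample {
    public static void main(String[] args) throws Exception {
        ExecutorService outerPool = Executors.newFixedThreadPool(8);
        ExecutorService innerPool = Executors.newFixedThreadPool(8);

        List<Future<List<String>>> outer = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            outer.add(outerPool.submit(() -> {
                List<Future<String>> inner = new ArrayList<>();
                for (int j = 0; j < 8; j++) {
                    int k = j;
                    inner.add(innerPool.submit(() -> "number " + k + " in inner loop"));
                }
                List<String> out = new ArrayList<>();
                for (Future<String> f : inner) {
                    out.add(f.get()); // safe: f runs on innerPool, not on this task's pool
                }
                return out;
            }));
        }
        for (Future<List<String>> f : outer) {
            System.out.println(f.get().size() + " inner results");
        }
        outerPool.shutdown();
        innerPool.shutdown();
    }
}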
Per the Javadoc comment of the method public static ExecutorService newCachedThreadPool() in the Executors class:
Threads that have not been used for sixty seconds are terminated and
removed from the **cache**.
I was wondering where this cache is and how it functions, as I didn't see any static Collection variable in ThreadPoolExecutor or its superclass.
Technically Worker is a Runnable containing a reference to a Thread and not a Thread by itself.
Let us dig deeper into the mechanics of this class.
Executors.newCachedThreadPool() uses this constructor of ThreadPoolExecutor:
new ThreadPoolExecutor(0, Integer.MAX_VALUE,
60L, TimeUnit.SECONDS,
new SynchronousQueue<Runnable>());
where 60 seconds corresponds to the keepAliveTime.
Worker Addition / Task submission
A RunnableFuture is created out of the submitted Callable or Runnable.
This is passed down to the execute() method.
The execute method tries to insert the task on to the workQueue which in our case is the SynchronousQueue. This will fail and return false due to the semantics of SynchronousQueue.
(Just hold on to this thought, we will revisit this when we talk about caching aspect)
The call then goes on to the addIfUnderMaximumPoolSize method within execute, which creates a java.util.concurrent.ThreadPoolExecutor.Worker runnable, creates a Thread for it, and adds the Worker to the workers HashSet (the one others have mentioned in the answers).
It then calls thread.start().
The run method of Worker is very important and should be noted.
public void run() {
    try {
        Runnable task = firstTask;
        firstTask = null;
        while (task != null || (task = getTask()) != null) {
            runTask(task);
            task = null;
        }
    } finally {
        workerDone(this);
    }
}
At this point you have submitted a task, and a thread has been created and is running it.
Worker Removal
If you noticed, there is a while loop in the run method.
It is an incredibly interesting piece of code.
If the task is not null it will short circuit and not check for the second condition.
Once the task has run via runTask and the task reference is set to null, the second condition is evaluated, which takes us into the getTask method.
Here is the part which decides whether a worker should be purged or not:
workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS);
The workQueue is polled for up to a minute in this case, waiting for any new task to arrive on the queue.
If none arrives, poll returns null and the executor checks whether the worker can exit.
Returning null means we will break out of the while and come to the finally block.
Here the worker is removed from the HashSet and the referenced Thread is also gone.
Caching aspect
Coming back to the SynchronousQueue we discussed in Task submission.
If I submit a task such that workQueue.offer and workQueue.poll can work in tandem, i.e. there is a task to process within those 60s, I can reuse the thread.
This can be seen in action if I put a sleep of 59s vs 61s between each task submission.
For 59s I can see the thread being reused; for 61s I can see a new thread being created in the pool.
N.B. The actual timings can vary from machine to machine, and my run() just prints out Thread.currentThread().getName().
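If you want to try the same experiment without waiting a full minute, here is a sketch that builds the same configuration by hand but with a 2-second keep-alive; the sleeps are illustrative and the exact thread names depend on the default thread factory:

import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class KeepAliveDemo {
    public static void main(String[] args) throws InterruptedException {
        // same shape as newCachedThreadPool(), but with a 2s keep-alive
        // so the reuse/purge behaviour is visible quickly
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, Integer.MAX_VALUE,
                2L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());

        Runnable task = () -> System.out.println(Thread.currentThread().getName());

        pool.execute(task);   // e.g. pool-1-thread-1
        Thread.sleep(1_000);  // shorter than the keep-alive: the idle worker is reused
        pool.execute(task);   // e.g. pool-1-thread-1 again
        Thread.sleep(3_000);  // longer than the keep-alive: the idle worker was purged
        pool.execute(task);   // e.g. a new pool-1-thread-2
        pool.shutdown();
    }
}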
Please let me know in comments if I have missed something or misinterpreted the code.
The word "cache" is only an abstraction. Internally it uses a HashSet to hold the worker threads. As per the code:
/**
* Set containing all worker threads in pool. Accessed only when
* holding mainLock.
*/
private final HashSet<Worker> workers = new HashSet<Worker>();
And if you are at all interested in the Runnables you submit or execute: newCachedThreadPool uses a SynchronousQueue<Runnable> to hand them off.
If you go through the code of ThreadPoolExecutor, you will see this:
/**
* Set containing all worker threads in pool. Accessed only when
* holding mainLock.
*/
private final HashSet<Worker> workers = new HashSet<Worker>();
and this:
/**
* The queue used for holding tasks and handing off to worker
* threads. We do not require that workQueue.poll() returning
* null necessarily means that workQueue.isEmpty(), so rely
* solely on isEmpty to see if the queue is empty (which we must
* do for example when deciding whether to transition from
* SHUTDOWN to TIDYING). This accommodates special-purpose
* queues such as DelayQueues for which poll() is allowed to
* return null even if it may later return non-null when delays
* expire.
*/
private final BlockingQueue<Runnable> workQueue;
And this:
try {
    Runnable r = timed ?
        // here keepAliveTime is passed as sixty seconds from
        // Executors#newCachedThreadPool()
        workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) :
        workQueue.take();
    if (r != null)
        return r;
    timedOut = true;
} catch (InterruptedException retry) {
    timedOut = false;
}
I sincerely suggest walking through the actual implementation code; keeping these pointers in mind will help you understand it more clearly.
I'm loosely following a tutorial on Java NIO to create my first multithreaded, networked Java application. The tutorial is basically about creating an echo server and a client, but at the moment I'm just trying to get as far as a server receiving messages from the clients and logging them to the console. By searching the tutorial page for "EchoServer" you can see the class that I base most of the relevant code on.
My problem is (at least I think it is) that I can't find a way to initialize the queue of messages to be processed so that it can be used as I want to.
The application is running on two threads: a server thread, which listens for connections and socket data, and a worker thread which processes data received by the server thread. When the server thread has received a message, it calls processData(byte[] data) on the worker, where the data is added to a queue:
1. public void processData(byte[] data) {
2.     synchronized (queue) {
3.         queue.add(new String(data));
4.         queue.notify();
5.     }
6. }
In the worker thread's run() method, I have the following code:
 7. while (true) {
 8.     String msg;
 9.
10.     synchronized (queue) {
11.         while (queue.isEmpty()) {
12.             try {
13.                 queue.wait();
14.             } catch (InterruptedException e) { }
15.         }
16.         msg = queue.poll();
17.     }
18.
19.     System.out.println("Processed message: " + msg);
20. }
I have verified in the debugger that the worker thread gets to line 13, but doesn't proceed to line 16 when the server starts. I take that as a sign of a successful wait. I have also verified that the server thread gets to line 4 and calls notify() on the queue. However, the worker thread doesn't seem to wake up.
In the javadoc for wait(), it is stated that
The current thread must own this object's monitor.
Given my inexperience with threads I am not exactly certain what that means, but I have tried instantiating the queue from the worker thread with no success.
Why does my thread not wake up? How do I wake it up correctly?
Update:
As @Fly suggested, I added some log calls to print out System.identityHashCode(queue) and, sure enough, the queues were different instances.
This is the entire Worker class:
public class Worker implements Runnable {
    Queue<String> queue = new LinkedList<String>();

    public void processData(byte[] data) { ... }

    @Override
    public void run() { ... }
}
The worker is instantiated in the main method and passed to the server as follows:
public static void main(String[] args)
{
    Worker w = new Worker();
    // Give names to threads for debugging purposes
    new Thread(w, "WorkerThread").start();
    new Thread(new Server(w), "ServerThread").start();
}
The server saves the Worker instance to a private field and calls processData() on that field. Why do I not get the same queue?
Update 2:
The entire code for the server and worker threads is now available here.
I've placed the code from both files in the same paste, so if you want to compile and run the code yourself, you'll have to split them up again. Also, there's a bunch of calls to Log.d(), Log.i(), Log.w() and Log.e() - those are just simple logging routines that construct a log message with some extra information (timestamp and such) and output to System.out and System.err.
I'm going to guess that you are getting two different queue objects because you are creating a whole new Worker instance. You didn't post the code that starts the Worker, but assuming that it also instantiates and starts the Server, then the problem is on the line where you assign this.worker = new Worker(); instead of assigning the Worker parameter.
public Server(Worker worker) {
    this.clients = new ArrayList<ClientHandle>();
    this.worker = new Worker(); // <------ THIS SHOULD BE this.worker = worker;
    try {
        this.start();
    } catch (IOException e) {
        Log.e("An error occurred when trying to start the server.", e,
                this.getClass());
    }
}
The thread for the Worker is probably using the worker instance passed to the Server constructor, so the Server needs to assign its own worker reference to that same Worker object.
You might want to use a LinkedBlockingQueue instead; it handles the multithreading part internally, and you can focus more on your logic. For example:
// a shared instance somewhere in your code
LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<String>();
in one of your threads
public void processData(byte[] data) {
    queue.offer(new String(data));
}
and in your other thread
while (running) { // private class member, set to false to exit the loop
    String msg = queue.poll(500, TimeUnit.MILLISECONDS);
    if (msg == null) {
        // queue was empty
        Thread.yield();
    } else {
        System.out.println("Processed message: " + msg);
    }
}
Note: for the sake of completeness, the poll method throws an InterruptedException that you may handle as you see fit. In this case, the while loop could be surrounded by the try...catch so as to exit if the thread is interrupted.
I'm assuming that queue is an instance of some class that implements the Queue interface, and that (therefore) the poll() method doesn't block.
In this case, you simply need to instantiate a single queue object that can be shared by the two threads. The following will do the trick:
Queue<String> queue = new LinkedList<String>();
The LinkedList class is not thread-safe, but provided that you always access and update the queue instance in a synchronized(queue) block, this will take care of thread-safety.
I think that the rest of the code is correct. You appear to be doing the wait / notify correctly. The worker thread should get and print the message.
If this isn't working, then the first thing to check is whether the two threads are using the same queue object. The second thing to check is whether processData is actually being called. A third possibility is that some other code is adding or removing queue entries, and doing it the wrong way.
notify() calls are lost if there is no thread waiting when notify() is called. So if you call notify() and another thread only calls wait() afterwards, you will deadlock.
You want to use a semaphore instead. Unlike condition variables, release()/increment() calls are not lost on semaphores.
Start the semaphore's count at zero. When you add to the queue increase it. When you take from the queue decrease it. You will not get lost wake-up calls this way.
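A minimal sketch of that idea applied to the question's queue (the class and method names are made up for the example):

import java.util.LinkedList;
import java.util.Queue;
import java.util.concurrent.Semaphore;

public class SemaphoreQueue {
    private final Queue<String> queue = new LinkedList<String>();
    private final Semaphore available = new Semaphore(0); // one permit per queued item

    // producer side (processData in the question)
    public void put(String msg) {
        synchronized (queue) {
            queue.add(msg);
        }
        available.release(); // never lost, even if no consumer is waiting yet
    }

    // consumer side (the worker's run() loop)
    public String take() throws InterruptedException {
        available.acquire(); // blocks until at least one item has been released
        synchronized (queue) {
            return queue.poll();
        }
    }
}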
Update
To clear up some confusion regarding condition variables and semaphores.
There are two differences between condition variables and semaphores.
Condition variables, unlike semaphores, are associated with a lock. You must acquire the lock before you call wait() and notify(). Semaphores do not have this restriction. Also, wait() calls release the lock.
notify() calls are lost on condition variables, meaning, if you call notify() and no thread is sleeping with a call to wait(), then the notify() is lost. This is not the case with semaphores. The ordering of acquire() and release() calls on semaphores does not matter because the semaphore maintains a count. This is why they are sometimes called counting semaphores.
In the javadoc for wait(), it is stated that
The current thread must own this object's monitor.
Given my inexperience with threads I am not exactly certain what that
means, but I have tried instantiating the queue from the worker thread
with no success.
They use really bizarre and confusing terminology. As a general rule of thumb, "object's monitor" in Java speak means "object's lock". Every object in Java has, inside it, a lock and one condition variable (wait()/notify()). So what that line means is: before you call wait() or notify() on an object (in your case the queue object) you must acquire the lock with synchronized (object) {} first. Being "inside" the monitor in Java speak means possessing the lock via synchronized. The terminology has been adopted from research papers and applied to Java concepts, so it is a bit confusing, since these words mean something slightly different from what they originally meant.
The code seems to be correct.
Do both threads use the same queue object? You can check this by object id in a debugger.
Does changing notify() to notifyAll() help? There could be another thread that invoked wait() on the queue.
OK, after some more hours of pointlessly looking around the net I decided to just screw around with the code for a while and see what I could get to. This worked:
private static BlockingQueue<String> queue;

private BlockingQueue<String> getQueue() {
    if (queue == null) {
        queue = new LinkedBlockingQueue<String>();
    }
    return queue;
}
As Yanick Rochon pointed out the code could be simplified slightly by using a BlockingQueue instead of an ordinary Queue, but the change that made the difference was that I implemented the Singleton pattern.
As this solves my immediate problem of getting the app working, I'll call this the answer. Large amounts of kudos should go to @Fly and others for pointing out that the Queue instances might not be the same - without that I would never have figured this out. However, I'm still very curious as to why I have to do it this way, so I will ask a new question about that in a moment.
I have a ThreadPoolExecutor that seems to be lying to me when I call getActiveCount(). I haven't done a lot of multithreaded programming however, so perhaps I'm doing something incorrectly.
Here's my TPE
@Override
public void afterPropertiesSet() throws Exception {
    BlockingQueue<Runnable> workQueue;
    int maxQueueLength = threadPoolConfiguration.getMaximumQueueLength();
    if (maxQueueLength == 0) {
        workQueue = new LinkedBlockingQueue<Runnable>();
    } else {
        workQueue = new LinkedBlockingQueue<Runnable>(maxQueueLength);
    }
    pool = new ThreadPoolExecutor(
            threadPoolConfiguration.getCorePoolSize(),
            threadPoolConfiguration.getMaximumPoolSize(),
            threadPoolConfiguration.getKeepAliveTime(),
            TimeUnit.valueOf(threadPoolConfiguration.getTimeUnit()),
            workQueue,
            // Default thread factory creates normal-priority,
            // non-daemon threads.
            Executors.defaultThreadFactory(),
            // Run any rejected task directly in the calling thread.
            // In this way no records will be lost due to rejection;
            // however, no records will be added to the workQueue
            // while the calling thread is processing a Task, so set
            // your queue size appropriately.
            //
            // This also means MaxThreadCount+1 tasks may run
            // concurrently. If you REALLY want a max of MaxThreadCount
            // threads, don't use this.
            new ThreadPoolExecutor.CallerRunsPolicy());
}
In this class I also have a DAO that I pass into my Runnable (FooWorker), like so:
@Override
public void addTask(FooRecord record) {
    if (pool == null) {
        throw new FooException(ERROR_THREAD_POOL_CONFIGURATION_NOT_SET);
    }
    pool.execute(new FooWorker(context, calculator, dao, record));
}
FooWorker runs record (the only non-singleton) through a state machine via calculator then sends the transitions to the database via dao, like so:
public void run() {
    calculator.calculate(record);
    dao.save(record);
}
Once my main thread is done creating new tasks I try and wait to make sure all threads finished successfully:
while (pool.getActiveCount() > 0) {
    recordHandler.awaitTermination(terminationTimeout,
            terminationTimeoutUnit);
}
What I'm seeing from output logs (which are presumably unreliable due to the threading) is that getActiveCount() is returning zero too early, and the while() loop is exiting while my last threads are still printing output from calculator.
Note I've also tried calling pool.shutdown() then using awaitTermination but then the next time my job runs the pool is still shut down.
My only guess is that inside a thread, when I send data into the dao (since it's a singleton created by Spring in the main thread...), Java considers the thread inactive since (I assume) it's processing in/waiting on the main thread.
Intuitively, based only on what I'm seeing, that's my guess. But... Is that really what's happening? Is there a way to "do it right" without putting a manual incremented variable at the top of run() and a decremented at the end to track the number of threads?
If the answer is "don't pass in the dao", then wouldn't I have to "new" a DAO for every thread? My process is already a (beautiful, efficient) beast, but that would really suck.
As the JavaDoc of getActiveCount states, it's an approximate value: you should not base any major business logic decisions on this.
If you want to wait for all scheduled tasks to complete, then you should simply use
pool.shutdown();
pool.awaitTermination(terminationTimeout, terminationTimeoutUnit);
If you need to wait for a specific task to finish, you should use submit() instead of execute() and then check the Future object for completion (either using isDone() if you want to do it non-blocking or by simply calling get() which blocks until the task is done).
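For example, a hypothetical variant of the question's code that keeps each task's Future (the runAll() method and records list are invented for illustration; pool, context, calculator and dao are the fields from the question):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;

public void runAll(List<FooRecord> records) throws Exception {
    List<Future<?>> futures = new ArrayList<>();
    for (FooRecord record : records) {
        futures.add(pool.submit(new FooWorker(context, calculator, dao, record)));
    }
    for (Future<?> f : futures) {
        f.get(); // blocks until that task has finished and rethrows any exception it threw
    }
}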
The documentation suggests that the method getActiveCount() on ThreadPoolExecutor is not an exact number:
getActiveCount
public int getActiveCount()
Returns the approximate number of threads that are actively executing tasks.
Returns: the number of threads
Personally, when I am doing multithreaded work such as this, I use a variable that I increment as I add tasks, and decrement as I grab their output.
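One possible shape of that bookkeeping, as a sketch (the class and method names are illustrative, and it assumes all tasks are submitted before you start waiting):

import java.util.concurrent.atomic.AtomicInteger;

public class TaskTracker {
    private final AtomicInteger pending = new AtomicInteger();

    // call when a task is handed to the executor
    public void taskSubmitted() {
        pending.incrementAndGet();
    }

    // call at the very end of each task's run()
    public void taskFinished() {
        synchronized (this) {
            if (pending.decrementAndGet() == 0) {
                notifyAll();
            }
        }
    }

    // call from the main thread once everything has been submitted
    public synchronized void awaitAll() throws InterruptedException {
        while (pending.get() > 0) {
            wait();
        }
    }
}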
I frequently need to have a thread wait for the result of another thread. Seems like there should be some support for this in java.util.concurrent, but I can't find it.
Exchanger is very close to what I'm talking about, but it's bi-directional. I only want Thread A to wait on Thread B, not have both wait on each other.
Yes, I know I can use a CountDownLatch or a Semaphore or Thread.wait() and then manage the result of the computation myself, but it seems like I must be missing a convenience class somewhere.
What am I missing?
UPDATE
// An example which works using Exchanger,
// but you would think there would be a uni-directional solution
protected Exchanger<Integer> exchanger = new Exchanger<Integer>();

public void threadA() {
    // perform some computations
    int result = ...;
    exchanger.exchange(result);
}

public void threadB() {
    // retrieve the result of threadA
    int resultOfA = exchanger.exchange(null);
}
Are you looking for Future<T>? That's the normal representation of a task which has (usually) been submitted to a work queue, but may not have completed yet. You can find out its completion status, block until it's finished, etc.
Look at ExecutorService for the normal way of obtaining futures. Note that this is focused on getting the result of an individual task rather than waiting for a thread to finish. A single thread may complete many tasks in its lifetime, of course - that's the whole point of a thread pool.
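A minimal sketch of that approach, with the executor and the computation invented for the example:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureExample {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // "thread B": perform the computation on a pool thread
        Future<Integer> future = executor.submit(() -> {
            return 6 * 7; // the expensive computation
        });

        // "thread A": block here until the result is available
        int resultOfB = future.get();
        System.out.println(resultOfB);

        executor.shutdown();
    }
}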
So far, a BlockingQueue seems to be the best solution I've found.
eg.
BlockingQueue<Integer> queue = new ArrayBlockingQueue<Integer>(1);
The waiting thread will call queue.take() to wait for the result, and the producing thread will call queue.add() to submit the result.
The JDK doesn't provide a convenience class that provides the exact functionality you're looking for. However, it is actually fairly easy to write a small utility class to do just that.
You mentioned the CountDownLatch and your preference regarding it, but I would still suggest looking at it. You can build a small utility class (a "value synchronizer" if you will) pretty easily:
public class OneShotValueSynchronizer<T> {
    private volatile T value;
    private final CountDownLatch set = new CountDownLatch(1);

    public T get() throws InterruptedException {
        set.await();
        return value;
    }

    public synchronized void set(T value) {
        if (set.getCount() > 0) {
            this.value = value;
            set.countDown();
        }
    }

    // more methods if needed
}
Since Java 8 you can use CompletableFuture<T>. Thread A can wait for a result using the blocking get() method, while Thread B can pass the result of computation using complete().
If Thread B encounters an exception while calculating the result, it can communicate this to Thread A by calling completeExceptionally().
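A minimal sketch of that hand-off, with the thread name and the computation invented for the example:

import java.util.concurrent.CompletableFuture;

public class CompletableFutureExample {
    public static void main(String[] args) throws Exception {
        CompletableFuture<Integer> result = new CompletableFuture<>();

        // thread B computes the value and publishes it
        new Thread(() -> {
            try {
                int value = 6 * 7; // the computation
                result.complete(value);
            } catch (RuntimeException e) {
                result.completeExceptionally(e); // propagate failures to the waiter
            }
        }, "ThreadB").start();

        // thread A blocks until complete() or completeExceptionally() is called
        System.out.println(result.get());
    }
}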
What's inconvenient in using Thread.join()?
I recently had the same problem, tried using a Future and then a CountDownLatch, but settled on an Exchanger. They are supposed to allow two threads to swap data, but there's no reason why one of those threads can't just pass a null.
In the end I think it was the cleanest solution, but it may depend on what exactly you are trying to achieve.
You might use java.util.concurrent.CountDownLatch for this.
http://download.oracle.com/javase/6/docs/api/java/util/concurrent/CountDownLatch.html
Example:
CountDownLatch latch = new CountDownLatch(1);
// thread one
// do some work
latch.countDown();
// thread two
latch.await();