How to find the number of runnables waiting to be executed - java

Is there any way to know at a given point in time how many runnables are waiting to be executed by the ExecutorService. For example, assuming that the loop is invoking the execute method faster than the runnable can complete and a surplus accumulates, is there anyway to get a running count of the accumulated runnables?
ExecutorService es = Executors.newFixedThreadPool(50);
while (test) {
es.execute(new MyRunnable());
}

Is there any way to know at a given point in time how many runnables are waiting to be executed by the ExecutorService.
Yes. Instead of using the Executors... calls, you should instantiate your own ThreadPoolExecutor. Below is what the Executors.newFixedThreadPool(50) is returning:
ThreadPoolExecutor threadPool = new ThreadPoolExecutor(50, 50,
0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
Once you have the ThreadPoolExecutor, it has a number of getter methods that give you data on the pool. The number of outstanding jobs should be:
threadPool.getQueue().getSize();
Also available from ThreadPoolExecutor are:
getActiveCount()
getCompletedTaskCount()
getCorePoolSize()
getLargestPoolSize()
getMaximumPoolSize()
getPoolSize()
getTaskCount()
If you want to throttle the number of jobs in the queue so you don't get too far ahead, you should use a bounded BlockingQueue and the RejectedExecutionHandler. See my answer here: Process Large File for HTTP Calls in Java

You can use ThreadPoolExecutor implementation and call toString() method.
Returns a string identifying this pool, as well as its state, including indications of run state and estimated worker and task counts.
Read more here
There are more methods in this implementation to get you different type of counts.
You can use following two methods:
public long getTaskCount()
Returns the approximate total number of tasks that have ever been scheduled for execution. Because the states of tasks and threads may change dynamically during computation, the returned value is only an approximation.
public long getCompletedTaskCount()
Returns the approximate total number of tasks that have completed execution. Because the states of tasks and threads may change dynamically during computation, the returned value is only an approximation, but one that does not ever decrease across successive calls.
Cheers !!

Related

Difference between a ExecutorCompletionService and a FixedThreadPool executor

As far as I know, executor completion service provides the output from the future object regardless of the order in which the task was requested in the inbound queue, i.e. whichever task is completed first the Result is put into the Outbound Queue. On the other hand, FixedThreadPool also executes the tasks parallelly, then what is the difference between the two? ( Not sure whether the FixedThreadPool gives the output sequentially in the order the tasks were fed to the inbound queue )
Thanks.
FixedThreadPool is one of the variations of Executor. It uses class ThreadPoolExecutor with same values for corePoolSize and maximumPoolSize. It means, if you create FixedThreadPool with 10 threads, it will always keep exact 10 threads. If any of these threads are terminated by running task - thread pool will create new ones to keep required amount.
CompletionService arranges that submitted tasks are, upon completion, placed on a queue.
It means, that all results of submitted tasks will be in a queue and you can process them later.
When you submit a task to a CompletionService, it creates a wrapper, so the result of async task is saved to queue. It doesn't create parallelism itself, instead CompletionService uses inside Executor for making parallel threads. You can pass FixedThreadPool inside, for example.
All tasks, submitted to FixedThreadPool and CompletionService will be done in parallel, without keeping the order.
CompletionService can be used, when you need to know when all of your tasks are finished. Example:
//Task extends Callable<Result>
List<Task> tasks = new ArrayList<Task>();
CompletionService<Result> cs = new ExecutorCompletionService<Result>(Executors.newFixedThreadPool(10));
tasks.forEach(task -> cs.submit(task));
for (int i = 0; i < tasks.size(); i++) { // you should know exact amount of submitted tasks
Result r = cs.take().get();
//process r
}
FixedThreadPool can be used in any other case, when you want to parallel threads without waiting for the results.
Also, note the difference between FixedThreadPool and CachedThreadPool. The first one is usually used when you need to keep threads alive and limit their amount. The seconds one is limited by system, it will process as many threads in parallel as possible. If a thread is in idle state in CachedThreadPool it will be automatically deleted after timeout (default is 60 seconds).

Java ThreadPool concepts, and issues with controlling the number of actual threads

I am a newbie to Java concurrency and am a bit confused by several concepts and implementation issues here. Hope you guys can help.
Say, I have a list of tasks stored in a thread-safe list wrapper:
ListWrapper jobs = ....
'ListWrapper' has synchronized fetch/push/append functions, and this 'jobs' object will be shared by multiple worker threads.
And I have a worker 'Runnable' to execute the tasks:
public class Worker implements Runnable{
private ListWrapper jobs;
public Worker(ListWrapper l){
this.jobs=l;
}
public void run(){
while(! jobs.isEmpty()){
//fetch an item from jobs and do sth...
}
}
}
Now in the main function I execute the tasks:
int NTHREADS =10;
ExecutorService service= Executors.newFixedThreadPool(NTHREADS);
//run threads..
int x=3;
for(int i=0; i<x; i++){
service.execute(new Worker(jobs) );
}
I tested this code with 'x=3', and I found that only 3 threads are running at the same time; but as I set 'x=20', I found that only 10 (=NTHREADS) are running at the same time. Seems to me the # of actual threads is the min of the two values.
Now my questions are:
1) Which value ('x' or 'NTHREADS') should I set to control the number of concurrent threads? Or it doesn't matter in either I choose?
2) How is this approach different from simply using the Producer-Consumer pattern --creating a fixed number of 'stud' threads to execute the tasks(shown in the code below)?
Thread t1= new Worker(jobs);
Thread t2= new Worker(jobs);
...
t1.join();
t2.join();
...
Thank you very much!!
[[ There are some good answers here but I thought I'd add some more detail. ]]
I tested this code with 'x=3', and I found that only 3 threads are running at the same time; but as I set 'x=20', I found that only 10 (=NTHREADS) are running at the same time. Seems to me the # of actual threads is the min of the two values.
No, not really. I suspect that the reason you weren't seeing 20 threads is that threads had already finished or had yet to be started. If you call new Thread(...).start() 20 times then you will get 20 threads started. However, if you check immediately none of them may have actually begun to run or if you check later they may have finished.
1) Which value ('x' or 'NTHREADS') should I set to control the number of concurrent threads? Or it doesn't matter in either I choose?
Quoting the Javadocs of Executors.newFixedThreadPool(...):
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue. At any point, at most nThreads threads will be active processing tasks.
So changing the NTHREADS constant changes the number of threads running in the pool. Changing x changes the number of jobs that are executed by those threads. You could have 2 threads in the pool and submit 1000 jobs or you could have 1000 threads and only submit 1 job for them to work on.
Btw, after you have submitted all of your jobs, you should then shutdown the pool which stops all of the threads if all of the jobs have been run.
service.shutdown();
2) How is this approach different from simply using the Producer-Consumer pattern --creating a fixed number of 'stud' threads to execute the tasks(shown in the code below)?
It differs in that it does all of the heavy work for you.
You don't have to create a ListWrapper of the jobs since you get one inside of the ExecutorService. You just submit the jobs to the ExecutorService and it keeps track of them until the threads are available to run them.
You don't have to create any threads or worry about them throwing exceptions and dying because the ExecutorService starts/restarts the threads for you.
If you want your tasks to return information you can make use of the submit(Callable) method and use the Future to get the results of the jobs. Etc, etc..
Doing this code yourself is going to be harder to get right, more code to maintain, and most likely will not perform as well as the code in the JDK that is battle tested and optimized.
You shouldn't create threads by yourself when using a threadpool. Instead of WorkerThread class you should use a class that implements Runnable but is not a thread. Passing a Thread object to the threadpool won't make the thread run actually. The object will be passed to a different internal thread, which will simply execute the run method of your WorkerThread class.
The ExecutorService is simply incompatible with the way you want to write your program.
In the code you have right now, these WorkerThreads will stop to work when your ListWrapper is empty. If you then add something to the list, nothing will happen. This is definitely not what you wanted.
You should get rid of ListWrapper and simply put your tasks directly into the threadpool. The threadpool already incorporates an internal list of jobs shared between the threads. You should just submit your jobs to the threadpool and it will handle them accordingly.
To answer your questions:
1) Which value ('x' or 'NTHREADS') should I set to control the number of concurrent threads? Or it doesn't matter in either I choose?
NTHREADS, the threadpool will create the necessary number of threads.
2) How is this approach different from simply using the Producer-Consumer pattern --creating a fixed number of 'stud' threads to execute the tasks(shown in the code below)?
It's just that ExecutorService automates a lot of things for you. You can choose from a lot of different implementations of threadpools and you can substitute them easily. You can use for instance a scheduled executor. You get extra functionality. Why reinvent the wheel?
For 1) NTHREADS is the maximum threads that the pool will ever run concurrently, but that doesn't mean there will always be that many running. It will only use as many as is needed up to that max value... which in your case is 3.
As the docs say:
At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool-int-
As for 2) using Java's concurrent executors framework is preferred with new code. You get a lot of stuff for free and removes the need for having to handle all of the fiddly thread work yourself.
The number of threads passed into newFixedThreadPool is at most how many threads could be running executing your tasks. If you only have three tasks ever submitted I'd expect the ExecutorService to only create three threads.
To answer your questions:
You should use the number you pass into the constructor to control how many threads are going to be used to execute your tasks.
This differs because of the extra functionality the ExecutorService gives you, as well as the flexibility it gives you such as in the case you need to change your ExecutorService type or number of tasks you'll run (less lines of code to change).
All that is happening is the executor service is only creating as many threads as it needs. NTHREADS is effectively the maximum number of threads it'll create.
There is no point creating ten threads up front if it only has 3 tasks to complete, the other 7 will just be hanging around consuming resources.
If you submit more than NTHREADS number of tasks then it will process that number concurrently and the rest will wait on a queue until a thread becomes free.
This isn't any different from creating a fixed set of your own threads, except the thread management and scheduling is handled for you. The executor service also restarts threads if they are killed by rogue exceptions in your task which you'd otherwise have to code for.
See: The Javadoc on Executorservice.newFixedThreadPool

Multithreading Architecture for N Repeating Tasks

I have N tasks, each to be repeated after its own specific delay Interval(N) using a fixed thread pool size that will usually be smaller than N.
Since there will usually be a shortage of threads, preference should be given to executing a different task rather than repeating a recently completed task.
I was thinking of using an outer ThreadPoolExecutor with N nested ScheduledThreadPoolExecutors. I'm not really sure how to go about this in the most optimal way since each of those classes maintains its own internal thread pool.
Besides the use of a PriorityQueue as answered by assylias, you can also solve this architecturally, by having a simple executing ThreadPoolExecutor, and another ScheduledExecutorService, which will insert the tasks after a given delay.
So every task has the executive Runnable, and an insertion Runnable, and will, after successful execution, tell the ScheduledExecutorService to run the insertion Runnable after a given delay, which will then put the task back into the ThreadPoolExecutor.
As code:
// myExecutionTask
void run() {
doSomeWork();
scheduledExecutor.schedule(myInsertionRunnable, 1000, TimeUnit.MILLISECONDS);
}
and
// myInsertionRunnable
void run () {
threadPoolExecutor.execute(myExecutionTask);
}
Effectively this will automatically cycle the tasks in the ThreadPoolExecutor, as those tasks that have already been finished, will be at the end of the queue.
Edit: As discussed in comments, when using the scheduler's fixedRate or fixedDelay functionality on a very busy system, tasks added later might be executed less often than task that have been added earlier, as the system seems to prefer tasks that are already executing when deciding for the next one to run.
In contrast my solution above cycles these tasks properly, although there can be no guarantee on a busy system, that the requested delay is exact. So they might be executed later, but at least always in FIFO order.
You could use a PriorityBlockingQueue and use timestamps to define priorities - something like:
class Task {
AtomicLong lastRun;
Runnable r;
void run() {
r.run();
lastRun.set(System.currentMillis);
}
}
Your ScheduledExecutorService (one thread) can then add the task N to a PriorityQueue every Interval(N).
And you can have a separate consumer running in your FixedThreadPool that takes from the Queue (using a reverse comparator so that the tasks run more recently will have a lower priority).
That is a little sketchy but it should work.

ThreadPool does not run tasks in sequence

I am using the Executor framework specifically Executors.newCachedThreadPool();
I have a list of Runnables e.g. 100.
The first 50, each create a value (stored in a list) to be used by the last 50.
I thought that if I pass the Runnables in the executor.execute() in the order they are in the list, they would be
also executed in the same order.
But this is not happening.
The tasks seem to be executed in random order and they are interleaved, not executed in sequence.
Is this how it is suppose to work? Any way to work around this problem?
Thanks
You need to submit the jobs in two batches, or otherwise create an explicit "happens-before" relationship. Suggest building two batches of jobs and using invokeAll(batch1); invokeAll(batch2); The invokeAll() method will execute all of the tasks and block until they complete. You may need to wrap your Runnables as Callables, which you can do with Executors.callable(Runnable r). (#Cameron Skinner beat me to getting some code example, see that answer for more...)
The whole point of executors is to abstract away the specifics of execution, so ordering is not guaranteed unless explicitly stated. If you want strictly sequential execution, do it in the thread you're running in (simplest), do it in a single-threaded executor, ala Executors.newSingleThreadExecutor(), or explicitly synchronize the tasks. If you want to do the latter, you could use a barrier or latch and have the dependent tasks block on the barrier/latch. You could also have the first block of tasks implement Callable, return Future, and have the dependent tasks call myFuture.get() which would cause them to block until the results are returned.
If you say more about your specific application, we might be able to help more specifically.
That is the correct behaviour. You have no guarantees about which order the Runnables are executed.
Executors run in parallel, whereas it seems that you want the tasks run in serial. You can either submit the first 50 jobs, wait for them to finish, then submit the second 50 jobs, or (if the order of execution is important) just run them all in a single thread.
For example,
for (Future<Whatever> f: service.invokeAll(first50tasks)) {
addResultToList(f.get());
}
Future<Whatever> output = service.invokeAll(second50tasks);
Perhaps you could execute the first 50, shutdown and awaitTermination, and only then execute the other 50? See this answer for some sample code.

Executor in java

I was trying to run ExecutorService object with FixedThreadPool and I ran into problems.
I expected the program to run in nanoseconds but it was hung. I found that I need to use Semaphore along with it so that the items in the queue do not get added up.
Is there any way I can come to know that all the threads of the pool are used.
Basic code ...
static ExecutorService pool = Executors.newFixedThreadPool(4);
static Semaphore permits = new Semaphore(4);
try {
permits.acquire();
pool.execute(p); // Assuming p is runnable on large number of objects
permits.release();
} catch ( InterruptedException ex ) {
}
This code gets hanged and I really don't know why. How to know if pool is currently waiting for all the threads to finish?
By default, if you submit more than 4 tasks to your pool then the extra tasks will be queued until a thread becomes available.
The blog you referenced in your comment uses the semaphore to limit the amount of work that can be queued at once, which won't be a problem for you until you have many thousands of tasks queued up and they start eating into the available memory. There's an easier way to do this, anyway - construct a ThreadPoolExecutor with a bounded queue.* But this isn't your problem.
If you want to know when a task completes, notice that ExecutorService.submit() returns a Future object which can be used to wait for the task's completion:
Future<?> f = pool.execute(p);
f.get();
System.out.println("task complete");
If you have several tasks and want to wait for all of them to complete, either store each Future in a list and then call get() on each in turn, or investigate ExecutorService.invokeAll() (which essentially does the same but in a single method call).
You can also tell whether a task has completed or not:
Future<?> f = pool.execute(p);
while(!f.isDone()) {
// do something else, task not complete
}
f.get();
Finally, note that even if your tasks are complete, your program may not exit (and thus appears to "hang") if you haven't called shutdown() on the thread pool; the reason is that the threads are still running, waiting to be given more work to do.
*Edit: sorry, I just re-read my answer and realised this part is incorrect - ThreadPoolExecutor offers tasks to the queue and rejects them if they aren't accepted, so a bounded queue has different semantics to the semaphore approach.
You do not need the Semaphore.
If you are hanging it is probably because the threads are locking themselves elsewhere.
Run the code in a Debuger and when it hangs pause it and see what the threads are doing.
You could change to using a ThreadPoolExecutor. It contains a getActiveCount() method which returns an approximate count of the active threads. Why it is approximate I'm not sure.

Categories