I have N tasks, each to be repeated after its own specific delay Interval(N) using a fixed thread pool size that will usually be smaller than N.
Since there will usually be a shortage of threads, preference should be given to executing a different task rather than repeating a recently completed task.
I was thinking of using an outer ThreadPoolExecutor with N nested ScheduledThreadPoolExecutors. I'm not really sure how to go about this in an optimal way, since each of those classes maintains its own internal thread pool.
Besides the use of a PriorityQueue as answered by assylias, you can also solve this architecturally: have a plain ThreadPoolExecutor that executes the tasks, and a separate ScheduledExecutorService that re-inserts each task after its delay.
Every task then has an execution Runnable and an insertion Runnable. After a successful execution, the task tells the ScheduledExecutorService to run the insertion Runnable after the given delay, which in turn puts the task back into the ThreadPoolExecutor.
As code:
// myExecutionTask
public void run() {
    doSomeWork();
    // when the work is done, ask the scheduler to re-insert this task later
    scheduledExecutor.schedule(myInsertionRunnable, 1000, TimeUnit.MILLISECONDS);
}
and
// myInsertionRunnable
public void run() {
    // put the task back at the end of the ThreadPoolExecutor's queue
    threadPoolExecutor.execute(myExecutionTask);
}
Effectively this automatically cycles the tasks through the ThreadPoolExecutor, because tasks that have just finished end up at the back of the queue.
Edit: As discussed in the comments, when using the scheduler's fixedRate or fixedDelay functionality on a very busy system, tasks added later might be executed less often than tasks added earlier, as the system seems to prefer tasks that are already executing when deciding which one to run next.
In contrast, my solution above cycles the tasks properly, although on a busy system there can be no guarantee that the requested delay is exact. The tasks might run later than requested, but at least always in FIFO order.
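For completeness, here is a self-contained sketch of that wiring; the class name, field names and the one-second delay are illustrative, not part of the original snippets:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CyclingTask implements Runnable {
    private final ExecutorService workers;            // fixed-size pool doing the real work
    private final ScheduledExecutorService scheduler; // single thread that re-inserts tasks
    private final long delayMillis;

    public CyclingTask(ExecutorService workers, ScheduledExecutorService scheduler, long delayMillis) {
        this.workers = workers;
        this.scheduler = scheduler;
        this.delayMillis = delayMillis;
    }

    @Override
    public void run() {
        doSomeWork();
        // after the work is done, ask the scheduler to re-queue this task later
        scheduler.schedule(() -> workers.execute(this), delayMillis, TimeUnit.MILLISECONDS);
    }

    private void doSomeWork() {
        // the actual task body goes here
    }
}

Each of the N tasks is submitted once with workers.execute(task); from then on it keeps re-inserting itself, so a recently finished task always joins the back of the queue behind the others.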
You could use a PriorityBlockingQueue and use timestamps to define priorities - something like:
class Task {
    AtomicLong lastRun = new AtomicLong();
    Runnable r;

    void run() {
        r.run();
        lastRun.set(System.currentTimeMillis());
    }
}
Your ScheduledExecutorService (one thread) can then add task N to a PriorityQueue every Interval(N).
And you can have a separate consumer running in your fixed thread pool that takes from the queue (using a comparator so that tasks run more recently have a lower priority).
That is a little sketchy but it should work.
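A slightly fuller sketch of that idea; the class names, pool size and consumer loop are illustrative, and the comparator simply orders tasks so that the one that ran longest ago comes out first:

import java.util.Comparator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class TimestampScheduler {

    static class Task implements Runnable {
        final AtomicLong lastRun = new AtomicLong(0);
        final Runnable r;

        Task(Runnable r) { this.r = r; }

        @Override
        public void run() {
            r.run();
            lastRun.set(System.currentTimeMillis());
        }
    }

    // the task that ran longest ago comes out of the queue first
    private final PriorityBlockingQueue<Task> queue = new PriorityBlockingQueue<>(
            11, Comparator.comparingLong((Task t) -> t.lastRun.get()));

    // called by the single-threaded ScheduledExecutorService every Interval(N)
    public void reSubmit(Task t) {
        queue.offer(t);
    }

    // one consumer loop per thread of the fixed pool
    public void startConsumers(int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        for (int i = 0; i < nThreads; i++) {
            pool.execute(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        queue.take().run();   // blocks until a task is available
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
    }
}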
Related
In my Java application I have a Runnable such as:
this.runner = new Runnable() {
    @Override
    public void run() {
        // do something that takes roughly 5 seconds.
    }
};
I need to run this roughly every 30 seconds (although this can vary) in a separate thread. The nature of the code is such that I can run it and forget about it (whether it succeeds or fails). I do this with a single line of code in my application:
(new Thread(this.runner)).start();
Now, this works fine. However, I'm wondering if there is any sort of cleanup I should be doing on each of the thread instances after they finish running? I am doing CPU profiling of this application in VisualVM and I can see that, over the course of 1 hour runtime, a lot of threads are being created. Is this concern valid or is everything OK?
N.B. The reason I start a new Thread instead of simply defining this.runner as a Thread, is that I sometimes need to run this.runner twice simultaneously (before the first run call has finished), and I can't do that if I defined this.runner as a Thread since a single Thread object can only be run again once the initial execution has finished.
Java objects that need to be "cleaned up" or "closed" after use conventionally implement the AutoCloseable interface. This makes it easy to do the clean up using try-with-resources. The Thread class does not implement AutoCloseable, and has no "close" or "dispose" method. So, you do not need to do any explicit clean up.
However
(new Thread(this.runner)).start();
is not guaranteed to immediately start computation of the Runnable. You might not care whether it succeeds or fails, but I guess you do care whether it runs at all. And you might want to limit the number of these tasks running concurrently; you might want only one to run at once, for example. So you might want to join() the thread (or, perhaps, join with a timeout). Joining the thread ensures that the thread completes its computation. Joining with a timeout increases the chance that the thread at least starts its computation (because the current thread is suspended, freeing a CPU that might run the other thread).
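A small sketch of the join-with-timeout idea (the 5-second timeout is arbitrary):

Thread worker = new Thread(this.runner);
worker.start();
try {
    // wait up to 5 seconds; this also suspends the current thread,
    // making it more likely that the worker thread actually gets scheduled
    worker.join(5000);
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}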
However, creating multiple threads to perform regular or frequent tasks is not recommended. You should instead submit tasks to a thread pool. That will enable you to control the maximum amount of concurrency, and can provide you with other benefits (such as prioritising different tasks), and amortises the expense of creating threads.
You can configure a thread pool to use a fixed-length (bounded) task queue and to make submitting threads execute submitted tasks themselves when the queue is full. By doing that you can guarantee that tasks submitted to the thread pool are (eventually) executed. The documentation of ThreadPoolExecutor.execute(Runnable) says it
Executes the given task sometime in the future
which suggests that the implementation will eventually run all submitted tasks, even if you do not take those specific measures to ensure they are executed.
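A minimal sketch of such a configuration (the pool size, queue capacity and keep-alive values here are only illustrative):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// 2 threads and a queue of at most 10 waiting tasks; when the queue is full,
// the submitting thread runs the task itself instead of having it rejected
ThreadPoolExecutor pool = new ThreadPoolExecutor(
        2, 2, 0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(10),
        new ThreadPoolExecutor.CallerRunsPolicy());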
I recommend looking at the Concurrency API; it provides many ready-made building blocks for general use. Using an ExecutorService, you can call its shutdown method after submitting your tasks: the executor then stops accepting new tasks, executes the previously submitted ones, and finally terminates.
For a short introduction:
https://www.baeldung.com/java-executor-service-tutorial
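For the "roughly every 30 seconds" case, a ScheduledExecutorService is a natural fit; a minimal sketch (the pool size is an assumption):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
// re-runs this.runner every 30 seconds without creating a new Thread each time
scheduler.scheduleWithFixedDelay(this.runner, 0, 30, TimeUnit.SECONDS);
// ...
scheduler.shutdown();  // call when the application no longer needs the task

Note that scheduleWithFixedDelay never overlaps runs of the same task; if you occasionally need two runs at once, have the scheduled task submit the real work to a second executor instead.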
I am a newbie to Java concurrency and am a bit confused by several concepts and implementation issues here. Hope you guys can help.
Say, I have a list of tasks stored in a thread-safe list wrapper:
ListWrapper jobs = ....
'ListWrapper' has synchronized fetch/push/append functions, and this 'jobs' object will be shared by multiple worker threads.
And I have a worker 'Runnable' to execute the tasks:
public class Worker implements Runnable {
    private ListWrapper jobs;

    public Worker(ListWrapper l) {
        this.jobs = l;
    }

    public void run() {
        while (!jobs.isEmpty()) {
            // fetch an item from jobs and do sth...
        }
    }
}
Now in the main function I execute the tasks:
int NTHREADS = 10;
ExecutorService service = Executors.newFixedThreadPool(NTHREADS);
// run threads..
int x = 3;
for (int i = 0; i < x; i++) {
    service.execute(new Worker(jobs));
}
I tested this code with 'x=3', and I found that only 3 threads are running at the same time; but as I set 'x=20', I found that only 10 (=NTHREADS) are running at the same time. Seems to me the # of actual threads is the min of the two values.
Now my questions are:
1) Which value ('x' or 'NTHREADS') should I set to control the number of concurrent threads? Or does it not matter which one I choose?
2) How is this approach different from simply using the Producer-Consumer pattern -- creating a fixed number of 'stud' threads to execute the tasks (shown in the code below)?
Thread t1 = new Thread(new Worker(jobs));
Thread t2 = new Thread(new Worker(jobs));
...
t1.start();
t2.start();
...
t1.join();
t2.join();
...
Thank you very much!!
[[ There are some good answers here but I thought I'd add some more detail. ]]
I tested this code with 'x=3', and I found that only 3 threads are running at the same time; but as I set 'x=20', I found that only 10 (=NTHREADS) are running at the same time. Seems to me the # of actual threads is the min of the two values.
No, not really. I suspect that the reason you weren't seeing 20 threads is that threads had already finished or had yet to be started. If you call new Thread(...).start() 20 times then you will get 20 threads started. However, if you check immediately none of them may have actually begun to run or if you check later they may have finished.
1) Which value ('x' or 'NTHREADS') should I set to control the number of concurrent threads? Or does it not matter which one I choose?
Quoting the Javadocs of Executors.newFixedThreadPool(...):
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue. At any point, at most nThreads threads will be active processing tasks.
So changing the NTHREADS constant changes the number of threads running in the pool. Changing x changes the number of jobs that are executed by those threads. You could have 2 threads in the pool and submit 1000 jobs or you could have 1000 threads and only submit 1 job for them to work on.
Btw, after you have submitted all of your jobs, you should then shut down the pool, which stops all of the threads once all of the jobs have run.
service.shutdown();
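If you also need to block until those jobs have finished, shutdown() can be combined with awaitTermination(); a sketch (the one-minute timeout is arbitrary):

service.shutdown();                          // stop accepting new jobs
try {
    if (!service.awaitTermination(1, TimeUnit.MINUTES)) {
        service.shutdownNow();               // give up and interrupt running jobs
    }
} catch (InterruptedException e) {
    service.shutdownNow();
    Thread.currentThread().interrupt();
}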
2) How is this approach different from simply using the Producer-Consumer pattern -- creating a fixed number of 'stud' threads to execute the tasks (shown in the code below)?
It differs in that it does all of the heavy work for you.
You don't have to create a ListWrapper of the jobs since you get one inside of the ExecutorService. You just submit the jobs to the ExecutorService and it keeps track of them until the threads are available to run them.
You don't have to create any threads or worry about them throwing exceptions and dying because the ExecutorService starts/restarts the threads for you.
If you want your tasks to return information you can make use of the submit(Callable) method and use the Future to get the results of the jobs. Etc, etc..
Doing this code yourself is going to be harder to get right, more code to maintain, and most likely will not perform as well as the code in the JDK that is battle tested and optimized.
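As a small, hedged sketch of the submit(Callable)/Future idea (the pool size and the returned value are placeholders):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableExample {
    public static void main(String[] args) throws Exception {
        ExecutorService service = Executors.newFixedThreadPool(2);
        Future<Integer> result = service.submit(() -> {
            // do the work and return a value
            return 42;
        });
        System.out.println(result.get());   // blocks until the job has finished
        service.shutdown();
    }
}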
You shouldn't create threads by yourself when using a thread pool. Instead of a class that extends Thread, you should use a class that implements Runnable but is not a thread. Passing a Thread object to the thread pool won't actually start that thread: the object is handed to one of the pool's internal threads, which simply executes the run method of your class.
The ExecutorService is simply incompatible with the way you want to write your program.
In the code you have right now, the Worker tasks will stop working as soon as your ListWrapper is empty. If you then add something to the list, nothing will happen. This is definitely not what you wanted.
You should get rid of ListWrapper and simply put your tasks directly into the threadpool. The threadpool already incorporates an internal list of jobs shared between the threads. You should just submit your jobs to the threadpool and it will handle them accordingly.
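A minimal sketch of submitting jobs directly (allJobs here is a hypothetical collection of your Runnable jobs):

ExecutorService pool = Executors.newFixedThreadPool(NTHREADS);
for (Runnable job : allJobs) {     // no ListWrapper needed
    pool.execute(job);             // the pool's own queue holds waiting jobs
}
pool.shutdown();                   // optional: signals that no more jobs will be submitted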
To answer your questions:
1) Which value ('x' or 'NTHREADS') should I set to control the number of concurrent threads? Or does it not matter which one I choose?
NTHREADS, the threadpool will create the necessary number of threads.
2) How is this approach different from simply using the Producer-Consumer pattern -- creating a fixed number of 'stud' threads to execute the tasks (shown in the code below)?
It's just that ExecutorService automates a lot of things for you. You can choose from a lot of different implementations of threadpools and you can substitute them easily. You can use for instance a scheduled executor. You get extra functionality. Why reinvent the wheel?
For 1) NTHREADS is the maximum threads that the pool will ever run concurrently, but that doesn't mean there will always be that many running. It will only use as many as is needed up to that max value... which in your case is 3.
As the docs say:
At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool-int-
As for 2), using Java's concurrent executors framework is preferred for new code. You get a lot of stuff for free, and it removes the need to handle all of the fiddly thread work yourself.
The number of threads passed into newFixedThreadPool is at most how many threads could be running executing your tasks. If you only have three tasks ever submitted I'd expect the ExecutorService to only create three threads.
To answer your questions:
You should use the number you pass into the constructor to control how many threads are going to be used to execute your tasks.
This differs because of the extra functionality the ExecutorService gives you, as well as the flexibility, such as when you need to change your ExecutorService type or the number of tasks you'll run (fewer lines of code to change).
All that is happening is the executor service is only creating as many threads as it needs. NTHREADS is effectively the maximum number of threads it'll create.
There is no point creating ten threads up front if it only has 3 tasks to complete, the other 7 will just be hanging around consuming resources.
If you submit more than NTHREADS tasks then it will process NTHREADS of them concurrently and the rest will wait in a queue until a thread becomes free.
This isn't any different from creating a fixed set of your own threads, except the thread management and scheduling is handled for you. The executor service also restarts threads if they are killed by rogue exceptions in your task which you'd otherwise have to code for.
See: the Javadoc on Executors.newFixedThreadPool
I am looking for a load balanced thread pool with no success so far. (Not sure whether load balancing is the correct wording).
Let me explain what I try to achieve.
Part 1:
I have jobs with 8 to 10 individual tasks each. On a 6-core CPU I let 8 threads work on these tasks in parallel, which seems to deliver the best performance. When one task is finished, another one can start. Once all tasks are finished, the complete job is done. Usually a job is done in 30 to 60 seconds.
Part 2:
Sometimes, unfortunately, the job takes more than two hours. This is expected, given the amount of data that has to be calculated.
The bad thing is that no other job can start while job 1 is running (assuming that all threads have the same duration), because it is using all the threads.
My first idea:
Have 12 threads and allow up to three jobs in parallel.
BUT: that means the CPU is not fully utilized when there is only one job.
I am looking for a solution that gives job 1 the full CPU power when there is no other job. But when another job needs to start while one is already running, I want the CPU power shared between both jobs. And when a third or fourth job shows up, I want the CPU power allocated fairly to all of them.
I appreciate your answers...
thanks in advance
One possibility might be to use a standard ThreadPoolExecutor with a different kind of task queue:
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class TaskRunner {

    private static class PriorityRunnable implements Runnable,
            Comparable<PriorityRunnable> {
        private Runnable theRunnable;
        private int priority = 0;

        public PriorityRunnable(Runnable r, int priority) {
            this.theRunnable = r;
            this.priority = priority;
        }

        public int getPriority() {
            return priority;
        }

        public void run() {
            theRunnable.run();
        }

        public int compareTo(PriorityRunnable that) {
            return Integer.compare(this.priority, that.priority);
        }
    }

    private BlockingQueue<Runnable> taskQueue = new PriorityBlockingQueue<Runnable>();
    private ThreadPoolExecutor exec = new ThreadPoolExecutor(8, 8, 0L,
            TimeUnit.MILLISECONDS, taskQueue);

    public void runTasks(Runnable... tasks) {
        // start the new job just behind the head of the queue so that its tasks
        // interleave with whatever is already waiting
        int priority = 0;
        Runnable nextTask = taskQueue.peek();
        if (nextTask instanceof PriorityRunnable) {
            priority = ((PriorityRunnable) nextTask).getPriority() + 1;
        }
        for (Runnable t : tasks) {
            exec.execute(new PriorityRunnable(t, priority));
            priority += 100;
        }
    }
}
The idea here is that when you have a new job you call
taskRunner.runTasks(jobTask1, jobTask2, jobTask3);
and it will queue up the tasks in such a way that they interleave nicely with any existing tasks in the queue (if any). Suppose you have one job queued, whose tasks have priority numbers j1t1=3, j1t2=103, and j1t3=203. In the absence of other jobs, these tasks will execute one after the other as quickly as possible. But if you submit another job with three tasks of its own, these will be assigned priority numbers j2t1=4, j2t2=104 and j2t3=204, meaning the queue now looks like
j1t1, j2t1, j1t2, j2t2, etc.
This is not perfect however, because if all threads are currently working (on tasks from job 1) then the first task of job 2 can't start until one of the job 1 tasks is complete (unless there's some external way for you to detect this and interrupt and re-queue some of job 1's tasks). The easiest way to make things more fair would be to break down the longer-running tasks into smaller segments and queue those as separate tasks - you need to get to a point where each individual job involves more tasks than there are threads in the pool, so that some of the tasks will always start off in the queue rather than being assigned directly to threads (if there are idle threads then exec.execute() passes the task straight to a thread without going through the queue at all).
The easiest thing to do is to oversubscribe your CPU, as Kanaga suggests, but start 8 threads per job. There may be some overhead from the competition, but if you get to a single-job situation, it will fully utilize the CPU. The OS will handle giving time to each thread.
Your "first idea" would also work. The idle threads wouldn't take resources from 8 working threads if they aren't actually executing a task. This wouldn't distribute the cpu resources as evenly when there are multiple jobs running, though.
Do you have a setup where you can test these different pipelines to see how they're performing for you?
Since your machine has a 6-core CPU, it is better to have 6 worker threads for each job thread, so that whenever a thread gets a new job it starts up to six parallel workers on that single job. This will ensure the full CPU power is used when there is only one job at a time.
Also, please have a look at the Fork/Join framework introduced in Java 7.
Also learn about newCachedThreadPool():
Java newCachedThreadPool() versus newFixedThreadPool
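As a rough illustration of the Fork/Join idea (the threshold and the summing work are placeholders, not part of the original answer):

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// splits a range of work in half until it is small enough, then sums it directly
class SumTask extends RecursiveTask<Long> {
    private final int from, to;

    SumTask(int from, int to) { this.from = from; this.to = to; }

    @Override
    protected Long compute() {
        if (to - from <= 1000) {                    // small enough: do it directly
            long sum = 0;
            for (int i = from; i < to; i++) sum += i;
            return sum;
        }
        int mid = (from + to) / 2;
        SumTask left = new SumTask(from, mid);
        SumTask right = new SumTask(mid, to);
        left.fork();                                // run the left half asynchronously
        return right.compute() + left.join();       // do the right half here, then combine
    }
}

// usage: new ForkJoinPool().invoke(new SumTask(0, 1_000_000));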
I was trying to run ExecutorService object with FixedThreadPool and I ran into problems.
I expected the program to finish very quickly, but it hung. I found that I need to use a Semaphore along with it so that items in the queue do not pile up.
Is there any way to know that all the threads of the pool are in use?
Basic code ...
static ExecutorService pool = Executors.newFixedThreadPool(4);
static Semaphore permits = new Semaphore(4);
try {
    permits.acquire();
    pool.execute(p); // Assuming p is runnable on a large number of objects
    permits.release();
} catch (InterruptedException ex) {
}
This code hangs and I really don't know why. How can I tell whether the pool is currently waiting for all the threads to finish?
By default, if you submit more than 4 tasks to your pool then the extra tasks will be queued until a thread becomes available.
The blog you referenced in your comment uses the semaphore to limit the amount of work that can be queued at once, which won't be a problem for you until you have many thousands of tasks queued up and they start eating into the available memory. There's an easier way to do this, anyway - construct a ThreadPoolExecutor with a bounded queue.* But this isn't your problem.
If you want to know when a task completes, notice that ExecutorService.submit() returns a Future object which can be used to wait for the task's completion:
Future<?> f = pool.submit(p);
f.get();
System.out.println("task complete");
If you have several tasks and want to wait for all of them to complete, either store each Future in a list and then call get() on each in turn, or investigate ExecutorService.invokeAll() (which essentially does the same but in a single method call).
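A sketch of the invokeAll() variant, where runnables stands for a hypothetical collection of your tasks and pool is the executor from the question:

List<Callable<Object>> tasks = new ArrayList<>();
for (Runnable r : runnables) {
    tasks.add(Executors.callable(r));       // adapt each Runnable to a Callable
}
try {
    // blocks until every task has completed (or failed)
    List<Future<Object>> results = pool.invokeAll(tasks);
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}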
You can also tell whether a task has completed or not:
Future<?> f = pool.submit(p);
while(!f.isDone()) {
// do something else, task not complete
}
f.get();
Finally, note that even if your tasks are complete, your program may not exit (and thus appears to "hang") if you haven't called shutdown() on the thread pool; the reason is that the threads are still running, waiting to be given more work to do.
*Edit: sorry, I just re-read my answer and realised this part is incorrect - ThreadPoolExecutor offers tasks to the queue and rejects them if they aren't accepted, so a bounded queue has different semantics to the semaphore approach.
You do not need the Semaphore.
If it is hanging, it is probably because the threads are blocking each other somewhere else.
Run the code in a debugger, pause it when it hangs, and see what the threads are doing.
You could change to using a ThreadPoolExecutor. It contains a getActiveCount() method which returns an approximate count of the active threads. Why it is approximate I'm not sure.
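As a sketch of that suggestion, the pool can be built directly as a ThreadPoolExecutor (this uses the same configuration newFixedThreadPool(4) would, and p stands for the Runnable from the question):

// same configuration as Executors.newFixedThreadPool(4), but typed as ThreadPoolExecutor
ThreadPoolExecutor pool = new ThreadPoolExecutor(
        4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());

pool.execute(p);
System.out.println("active threads: " + pool.getActiveCount());
System.out.println("queued tasks:   " + pool.getQueue().size());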