The Java ExecutorService framework allows you to delegate a number of tasks to be performed using a managed thread pool so that N tasks can be performed X tasks at a time until complete.
My question is: what if N is either infinite or so large that it is impractical to allocate/assign/define all the tasks up front?
How can you leverage thread pooling in Java (ExecutorService) to handle more tasks than you could reasonably submit without exhausting resources?
For the purposes of this question, assume each task is self-contained, does not depend on any other task, and that tasks can be completed in arbitrary order.
My initial attempt at attacking this problem involved feeding the ExecutorService Y tasks at a time, but I quickly realized that there's no apparent way to tell when a particular task is complete so that a new task can be submitted in its place.
I know I could write my own "ExecutorService" but I am trying to leverage the bounty of what the Java framework already provides. I'm generally in the "don't re-invent the wheel" category because greater minds than mine have already made investments for me.
Thanks in advance to anybody that can provide any insight in how to attack this type of problem.
You could use a CompletionService to do it. You can have one thread that seeds the service with a bunch of tasks, then as tasks complete you can add new ones.
A simple example:
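final CompletionService<Something> service =
        new ExecutorCompletionService<Something>(Executors.newFixedThreadPool(5));
Runnable taskGenerator = new Runnable() {
    public void run() {
        try {
            // Seed the service
            for (int i = 0; i < 100; ++i) {
                service.submit(createNewTask());
            }
            // As tasks complete create new ones
            while (true) {
                Future<Something> result = service.take(); // blocks until a task finishes
                processResult(result.get());
                service.submit(createNewTask());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop generating when interrupted
        } catch (ExecutionException e) {
            // a task threw; e.getCause() holds the original failure
            throw new RuntimeException(e.getCause());
        }
    }
};
new Thread(taskGenerator).start();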
This uses a ThreadPoolExecutor with 5 threads to process tasks and a hand-rolled producer/consumer thread for generating tasks and processing results.
Obviously you'll want something a little smarter than while (true), you'll need to have sensible implementations of processResult and createNewTask, and this assumes that task execution is much slower than generating them or processing the results.
Hopefully this will get you on the right track.
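One way to replace the while (true) loop, sketched under the assumption that the tasks can be produced lazily from an Iterator: keep a counter of tasks in flight, top it up whenever a result is taken, and stop once the iterator is exhausted and the counter reaches zero. The names BoundedSubmitter, runAll and maxInFlight are made up for this illustration, and summing the results simply stands in for processResult.

```java
import java.util.Iterator;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

class BoundedSubmitter {

    /**
     * Runs every task produced by the iterator while keeping at most
     * maxInFlight tasks submitted but not yet collected.
     * Returns the sum of the results (a stand-in for real result handling).
     */
    static long runAll(Iterator<Callable<Integer>> tasks, int threads, int maxInFlight)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<Integer> service = new ExecutorCompletionService<>(pool);
        long sum = 0;
        int inFlight = 0;
        try {
            // Seed the service with the first batch.
            while (inFlight < maxInFlight && tasks.hasNext()) {
                service.submit(tasks.next());
                inFlight++;
            }
            // Each collected result makes room for one new task.
            while (inFlight > 0) {
                sum += service.take().get();
                inFlight--;
                if (tasks.hasNext()) {
                    service.submit(tasks.next());
                    inFlight++;
                }
            }
        } finally {
            pool.shutdown();
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        // Tasks are created one at a time, only when the iterator is advanced.
        Iterator<Callable<Integer>> tasks = IntStream.rangeClosed(1, 1000)
                .mapToObj(i -> (Callable<Integer>) () -> i)
                .iterator();
        System.out.println(runAll(tasks, 5, 100)); // prints 500500
    }
}
```

Because the iterator is only advanced when a slot frees up, memory use is bounded by maxInFlight rather than by the total number of tasks.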
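Use java.util.concurrent.ThreadPoolExecutor with a bounded java.util.concurrent.ArrayBlockingQueue as its workQueue. Note that execute() does not block when the queue is full: with the default handler it throws RejectedExecutionException. Passing ThreadPoolExecutor.CallerRunsPolicy as the rejection handler makes the submitting thread run the rejected task itself, which slows the producer down until the pool catches up.
BlockingQueue<Runnable> workQueue = new ArrayBlockingQueue<Runnable>(100);
ThreadPoolExecutor tpe = new ThreadPoolExecutor(5, 10, 60, TimeUnit.SECONDS, workQueue,
        new ThreadPoolExecutor.CallerRunsPolicy());
while (true) {
    tpe.execute(createNewTask());
}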
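If the goal is for submission itself to wait once a fixed number of tasks are pending, one option is to guard the executor with a Semaphore. This is a minimal sketch, not part of the answer above; ThrottledSubmitter, maxPending and shutdownAndWait are hypothetical names chosen for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ThrottledSubmitter {
    private final ExecutorService pool;
    private final Semaphore permits;

    ThrottledSubmitter(int threads, int maxPending) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.permits = new Semaphore(maxPending);
    }

    /** Blocks the caller while maxPending tasks are already queued or running. */
    void submit(Runnable task) throws InterruptedException {
        permits.acquire();
        try {
            pool.execute(() -> {
                try {
                    task.run();
                } finally {
                    permits.release(); // free the slot even if the task throws
                }
            });
        } catch (RuntimeException e) {
            permits.release(); // the task was never accepted
            throw e;
        }
    }

    /** Stops accepting tasks and waits for the accepted ones to finish. */
    boolean shutdownAndWait(long timeout, TimeUnit unit) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(timeout, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        ThrottledSubmitter submitter = new ThrottledSubmitter(5, 100);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < 10_000; i++) {
            submitter.submit(done::incrementAndGet); // waits whenever 100 tasks are pending
        }
        submitter.shutdownAndWait(1, TimeUnit.MINUTES);
        System.out.println(done.get());
    }
}
```

The producer loop never holds more than maxPending task objects at once, regardless of how many tasks it will eventually submit.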
So, I'm new to Java.
I wrote a relatively simple program that does something with a lot of files.
It was slow, so I wanted to run more than one thread. With a little help from the StackOverflow community I came up with something like this:
public class FileProcessor {
    public static void main(String[] args)
    {
        // run 5 threads
        ExecutorService executor = Executors.newFixedThreadPool(5);
        int i;
        // get first and last file ID to process
        int start = Integer.parseInt(args[0]);
        int end = Integer.parseInt(args[1]);
        for (i = start; i < end; i++)
        {
            final int finalId = i; // final necessary in anonymous class
            executor.submit(new Runnable()
            {
                public void run()
                {
                    processFile(finalId);
                }
            });
        }
    }
    public static void processFile(int id)
    {
        //doing work here
    }
}
This is a really simple multithreading solution and it does what I want. Now I want to fix/improve it because I guess I'm doing it wrong (the program never ends, uses more memory than it should, etc.).
Should I reduce the number of Runnable objects existing in memory at the same time? If so, how can I do it?
How can I detect that all the work is done and exit the program (and the threads)?
When you say the program never ends and uses more memory than it should, that could be due to several reasons, such as:
1) processFile() might be doing heavy I/O, or it is blocked waiting for I/O data.
2) There could be a deadlock if the tasks share a common data object.
Your threading logic itself is pretty straightforward with ThreadPoolExecutor, and I believe the problem is with the code in processFile().
Since you initialized the pool with 5 threads, the ThreadPoolExecutor makes sure that at most 5 threads are doing the work, irrespective of how many tasks you submit.
So in this case, I would focus more on optimizing the application logic than on thread management.
If you are really concerned about how many Runnable objects you create, that is a trade-off between your application's requirements and the available resources.
If each task is independent and there is an execution time limit for all of them, then create more threads in the pool and add more resources.
When you define a pool executor with 5 threads and submit 10,000 tasks, then obviously they have to wait in memory as FutureTask objects until a thread is available.
If you want to reduce the number of threads running, just reduce the size you pass to the fixed thread pool constructor. But that will only reduce the number of active threads, not the number of Runnables you're creating in your loop. As for termination, call shutdown() and then awaitTermination() on the executor service.
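A minimal sketch of the shutdown()/awaitTermination() sequence, assuming the file IDs form a simple integer range as in the question. ShutdownExample and processRange are hypothetical names, and processFile here is only a stand-in that counts calls; the one-hour timeout is an arbitrary choice.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ShutdownExample {

    // Stand-in for the real per-file work; it only records that it ran.
    static void processFile(int id, AtomicInteger processed) {
        processed.incrementAndGet();
    }

    /** Submits one task per id in [start, end) and waits for all of them. */
    static int processRange(int start, int end) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(5);
        AtomicInteger processed = new AtomicInteger();
        for (int i = start; i < end; i++) {
            final int id = i;
            executor.execute(() -> {
                processFile(id, processed);
            });
        }
        executor.shutdown(); // accept no new tasks; queued tasks still run
        if (!executor.awaitTermination(1, TimeUnit.HOURS)) {
            executor.shutdownNow(); // give up and interrupt whatever is still running
        }
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(processRange(0, 1000)); // prints 1000
    }
}
```

Once the pool has terminated, its worker threads are gone and the JVM can exit normally when main returns.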
I have a requirement in a multi-threaded environment in Java. The problem is as follows:
Suppose I have 10 different tasks, and I want to assign these 10 tasks to 10 different threads. The finish times for these tasks could differ. There is also a finishing or cleanup task which should be performed when all 10 threads are finished. In other words, I need to wait until all threads are finished, and only then can I go ahead with further code execution.
Please let me know if any more details are required.
Thanks,
Ashish
Sounds like an ideal job for CountDownLatch.
Initialize it with 10 counts and when each thread finishes its job, it counts down one.
When all 10 threads have finished, the CountDownLatch will let the original thread run, and it can perform the cleanup.
And fire up an ExecutorService with 10 fixed threads to run the tasks.
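The latch approach could look roughly like this. It is a sketch with hypothetical names (LatchExample, runAndCleanUp), and the counter increment stands in for each thread's real job.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class LatchExample {

    /** Runs taskCount tasks on as many threads and returns once all have finished. */
    static int runAndCleanUp(int taskCount) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(taskCount);
        CountDownLatch latch = new CountDownLatch(taskCount);
        AtomicInteger finished = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                try {
                    finished.incrementAndGet(); // stand-in for the real work
                } finally {
                    latch.countDown(); // always count down, even if the work fails
                }
            });
        }
        latch.await(); // blocks until the count reaches zero
        pool.shutdown();
        // The cleanup task would run here, after every worker has counted down.
        return finished.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAndCleanUp(10)); // prints 10
    }
}
```

Putting countDown() in a finally block matters: if a task threw before counting down, await() would never return.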
CyclicBarrier (in java.util.concurrent) of size 10 is a perfect solution for you. With a CyclicBarrier you can wait for 10 threads. Once all threads have reached the barrier, they can go further.
Edit: CyclicBarrier is similar to CountDownLatch, but the barrier can be reused: it resets automatically once all parties have arrived, and it can also be reset explicitly by invoking the reset() method.
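A small sketch of the barrier variant, using the optional barrier action to run the cleanup once all parties have arrived. BarrierExample and runWithBarrier are hypothetical names, and setting a flag stands in for the real cleanup.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicBoolean;

class BarrierExample {

    /** Starts the given number of threads and reports whether the barrier action ran. */
    static boolean runWithBarrier(int parties) throws InterruptedException {
        AtomicBoolean cleanedUp = new AtomicBoolean(false);
        // The barrier action runs once, in the last thread to arrive,
        // before any of the waiting threads is released.
        CyclicBarrier barrier = new CyclicBarrier(parties, () -> cleanedUp.set(true));
        Thread[] threads = new Thread[parties];
        for (int i = 0; i < parties; i++) {
            threads[i] = new Thread(() -> {
                try {
                    // the real work would go here
                    barrier.await(); // wait for the other threads
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return cleanedUp.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWithBarrier(10)); // prints true
    }
}
```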
Whilst CountDownLatch and CyclicBarrier do the job of synchronizing multiple threads and performing one action when all threads reach the required point, they require all tasks to actively use this feature. If you are only interested in the completion of the entire set of tasks, the solution can be much simpler: add all tasks to a Collection and use the invokeAll method of an ExecutorService, which returns when all tasks have been completed. A simple example:
Callable<Void> simpleTask = new Callable<Void>() {
    public Void call() {
        System.out.println("Performing one job");
        return null;
    }
};
List<Callable<Void>> list = Collections.nCopies(10, simpleTask);
ExecutorService es = Executors.newFixedThreadPool(10);
es.invokeAll(list); // blocks until every task has completed
System.out.println("All completed");
es.shutdown(); // release the pool threads so the JVM can exit
If each thread terminates after it is finished, you could just use the join() statement. A simple example can be found in the Essential Java Tutorials.
ArrayList<Thread> myThreads = new ArrayList<Thread>();
for (int i = 0; i < 10; i++) {
    //MyTaskRunnable is a Runnable with your logic
    Thread t = new Thread(new MyTaskRunnable());
    myThreads.add(t);
}
for (Thread t : myThreads) {
    t.start();
}
//here all threads are running
for (Thread t : myThreads) {
    t.join();
}
//here all threads have terminated
Edit:
The other answers all have their merits and are very useful in practice; join() is, however, the most basic of the constructs. The CyclicBarrier and CountDownLatch versions allow your threads to continue running after reaching the synchronization point, which can be necessary in some cases. The ExecutorService is better suited to many tasks that need to be executed on a fixed number of threads (aka a thread pool); creating an ExecutorService for just 10 tasks is a bit drastic.
Finally, if you are new to Java or are taking a course on concurrency, you should try out all the variants and see what they do. join() is the most basic of these constructs and will help you understand what is going on. It is also the basic model supported by most other languages.
I am looking for a load-balanced thread pool, with no success so far. (I'm not sure whether load balancing is the correct wording.)
Let me explain what I'm trying to achieve.
Part 1:
I have jobs, each with 8 to 10 single tasks. On a 6-core CPU I let 8 threads work on these tasks in parallel, which seems to deliver the best performance. When one task is finished, another one can start. Once all ten tasks are finished, the complete job is done. Usually a job is done in 30 to 60 seconds.
Part 2:
Sometimes, unfortunately, a job takes more than two hours. This is correct, given the amount of data that has to be calculated.
The bad thing is that no other job can start while job 1 is running (assuming that all threads have the same duration), because it is using all the threads.
My first idea:
Have 12 threads and allow up to three jobs in parallel.
BUT: that means the CPU is not fully utilized when there is only one job.
I am looking for a solution that gives full CPU power to job 1 when there is no other job. But when another job needs to be started while one is already running, I want the CPU power allocated to both jobs. And when a third or fourth job shows up, I want the CPU power allocated fairly to all four jobs.
I appreciate your answers.
Thanks in advance.
One possibility might be to use a standard ThreadPoolExecutor with a different kind of task queue:
public class TaskRunner {
    private static class PriorityRunnable implements Runnable,
            Comparable<PriorityRunnable> {
        private Runnable theRunnable;
        private int priority = 0;
        public PriorityRunnable(Runnable r, int priority) {
            this.theRunnable = r;
            this.priority = priority;
        }
        public int getPriority() {
            return priority;
        }
        public void run() {
            theRunnable.run();
        }
        public int compareTo(PriorityRunnable that) {
            // Integer.compare avoids the overflow that subtraction can cause
            return Integer.compare(this.priority, that.priority);
        }
    }
    private BlockingQueue<Runnable> taskQueue = new PriorityBlockingQueue<Runnable>();
    private ThreadPoolExecutor exec = new ThreadPoolExecutor(8, 8, 0L,
            TimeUnit.MILLISECONDS, taskQueue);
    public void runTasks(Runnable... tasks) {
        int priority = 0;
        // Start just behind whatever is currently at the head of the queue
        Runnable nextTask = taskQueue.peek();
        if (nextTask instanceof PriorityRunnable) {
            priority = ((PriorityRunnable) nextTask).getPriority() + 1;
        }
        for (Runnable t : tasks) {
            exec.execute(new PriorityRunnable(t, priority));
            priority += 100;
        }
    }
}
The idea here is that when you have a new job you call
taskRunner.runTasks(jobTask1, jobTask2, jobTask3);
and it will queue up the tasks in such a way that they interleave nicely with any existing tasks in the queue (if any). Suppose you have one job queued, whose tasks have priority numbers j1t1=3, j1t2=103, and j1t3=203. In the absence of other jobs, these tasks will execute one after the other as quickly as possible. But if you submit another job with three tasks of its own, these will be assigned priority numbers j2t1=4, j2t2=104 and j2t3=204, meaning the queue now looks like
j1t1, j2t1, j1t2, j2t2, etc.
This is not perfect however, because if all threads are currently working (on tasks from job 1) then the first task of job 2 can't start until one of the job 1 tasks is complete (unless there's some external way for you to detect this and interrupt and re-queue some of job 1's tasks). The easiest way to make things more fair would be to break down the longer-running tasks into smaller segments and queue those as separate tasks - you need to get to a point where each individual job involves more tasks than there are threads in the pool, so that some of the tasks will always start off in the queue rather than being assigned directly to threads (if there are idle threads then exec.execute() passes the task straight to a thread without going through the queue at all).
The easiest thing to do is to oversubscribe your CPU, as Kanaga suggests, but start 8 threads each. There may be some overhead from the competition, but if you get to a single job situation, it will fully utilize the CPU. The OS will handle giving time to each thread.
Your "first idea" would also work. The idle threads wouldn't take resources from 8 working threads if they aren't actually executing a task. This wouldn't distribute the cpu resources as evenly when there are multiple jobs running, though.
Do you have a setup where you can test these different pipelines to see how they're performing for you?
I think that since your machine has a 6-core CPU, it is better to have 6 worker threads for each job thread. That way, whenever a job thread gets a new job, it starts up to six parallel workers to work on that single job. This ensures the full CPU power is used when there is only one job at a time.
Also please have a look at the Fork/Join framework in Java 7.
Also learn about newCachedThreadPool().
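For the Fork/Join suggestion, here is a minimal sketch of how one large computation can be split into many small subtasks that idle workers steal from each other. Summing an array is only a placeholder for the real calculation; SumTask and the THRESHOLD value are assumptions made for the example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // below this size, compute directly
    private final long[] data;
    private final int from;
    private final int to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                          // schedule the left half asynchronously
        return right.compute() + left.join(); // do the right half in this thread
    }

    /** Sums the array using the common fork/join pool. */
    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data)); // prints 500000500000
    }
}
```

Small subtasks are what make the sharing fair: a long-running job broken up this way leaves frequent points at which another job's tasks can be picked up.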
Java newCachedThreadPool() versus newFixedThreadPool
I am using the Executors framework in Java to create thread pools for a multi-threaded application, and I have a question related to performance.
I have an application which can work in realtime or non-realtime mode. In case it's realtime, I'm simply using the following:
THREAD_POOL = Executors.newCachedThreadPool();
But in case it's not realtime, I want the ability to control the size of my thread pool.
To do this, I'm thinking about 2 options, but I don't really understand the difference, and which one would perform better.
Option 1 is to use the simple way:
THREAD_POOL = Executors.newFixedThreadPool(threadPoolSize);
Option 2 is to create my own ThreadPoolExecutor like this:
RejectedExecutionHandler rejectHandler = new RejectedExecutionHandler() {
    @Override
    public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
        try {
            executor.getQueue().put(r);
        } catch (Exception e) {}
    }
};
THREAD_POOL = new ThreadPoolExecutor(threadPoolSize, threadPoolSize, 0, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(10000), rejectHandler);
I would like to understand what is the advantage of using the more complex option 2, and also if I should use another data structure than LinkedBlockingQueue? Any help would be appreciated.
Looking at the source code you'll realize that:
Executors.newFixedThreadPool(threadPoolSize);
is equivalent to:
return new ThreadPoolExecutor(threadPoolSize, threadPoolSize, 0L, MILLISECONDS,
new LinkedBlockingQueue<Runnable>());
Since it doesn't provide an explicit RejectedExecutionHandler, the default AbortPolicy is used. It basically throws RejectedExecutionException once the queue is full. But the queue is unbounded, so it will never be full. Thus this executor accepts an infinite¹ number of tasks.
Your declaration is much more complex and quite different:
new LinkedBlockingQueue<Runnable>(10000) will cause the thread pool to reject tasks if more than 10000 are waiting.
I don't understand what your RejectedExecutionHandler is doing. If the pool discovers it cannot put any more runnables into the queue, it calls your handler. In this handler you... try to put that Runnable into the queue again (which will block until space becomes available). Finally you swallow the exception. Seems like ThreadPoolExecutor.DiscardPolicy is what you are after.
Looking at your comments below seems like you are trying to block or somehow throttle clients if tasks queue is too large. I don't think blocking inside RejectedExecutionHandler is a good idea. Instead consider CallerRunsPolicy rejection policy. Not entirely the same, but close enough.
To wrap up: if you want to limit the number of pending tasks, your approach is almost good. If you want to limit the number of concurrent threads, the first one-liner is enough.
1 - assuming 2^31 is infinity
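To illustrate the CallerRunsPolicy suggestion, here is a hedged sketch: a fixed-size pool with a bounded queue where, once the queue is full, the submitting thread runs the task itself and is thereby slowed down. CallerRunsExample, newThrottledPool and runMany are hypothetical names, and the pool size of 4 and queue capacity of 10 are arbitrary.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CallerRunsExample {

    /** Fixed-size pool with a bounded queue; overflow tasks run in the caller's thread. */
    static ThreadPoolExecutor newThrottledPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Submits taskCount trivial tasks and returns how many of them ran. */
    static int runMany(int taskCount) throws InterruptedException {
        ThreadPoolExecutor pool = newThrottledPool(4, 10);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            // Never throws RejectedExecutionException while the pool is running:
            // a rejected task is executed right here by the submitting thread.
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runMany(100_000)); // prints 100000
    }
}
```

No task is dropped, and at most queueCapacity tasks wait in memory at any time, so this limits pending tasks without blocking inside the handler.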
Right now I have this Groovy code to run a series of tasks:
CountDownLatch latch = new CountDownLatch(tasks.size);
for( task in tasks ) {
    Thread.start worker.curry(task, latch)
}
latch.await(300L, TimeUnit.SECONDS);
I'd like to limit the number of simultaneous threads to a certain number t. The way it's written now, for n tasks, n threads get created "at once". I thought about using multiple latches or some sort of callback, but couldn't come up with a good solution.
The solution should start new task threads as soon as running threads drops below t, until number running reaches t or there are no un-run tasks.
You should check out GPars and use one of the abstractions listed. You can then specify the number to create in withPool(). I like fork/join:
withPool(4) { pool ->
    runForkJoin(rootTask) { task ->
        task.eachTask { forkChild(task) }
    }
}
You can use the Executor framework.
Executors.newFixedThreadPool(t);
This will create at most t threads, started as tasks are submitted. You can then submit tasks to the executor to use these threads.
Edit: Thanks for the comment Josh, I'll post your solution
ExecutorService pool = Executors.newFixedThreadPool(6);
for( task in tasks ) {
    pool.execute Worker.curry(task)
}
pool.shutdown();