Is there any way to find the number of tasks completed after a call to invokeAll()? It seems that it returns the list of Futures only once all of the tasks have completed.
I have a pool of 1000 tasks and want to take a look at them in intervals of 100 without having to divide them into 100-task batches.
For compatibility reasons I also have to work with Java 6, so newer methods won't help.
Also, as a side question: does invokeAll() process the tasks in a FIFO manner? That is, are the tasks started in the order in which they were added to the task list?
Thanks
I have a pool of 1000 tasks and want to take a look at them in intervals of 100 without having to divide them into 100-task batches.
You should consider using an ExecutorCompletionService which allows you to get notified once a single job has finished instead of having to wait for all jobs to complete using invokeAll(). Then you can put each of the finished jobs into a collection and then act on them when you get 100.
Maybe something like:
CompletionService<Result> ecs = new ExecutorCompletionService<Result>(executor);
for (Callable<Result> s : solvers)
    ecs.submit(s);
int n = solvers.size();
List<Result> batch = new ArrayList<Result>();
for (int i = 0; i < n; ++i) {
    Result r = ecs.take().get();
    batch.add(r);
    if (batch.size() >= 100) {
        process(batch);
        batch.clear();
    }
}
if (!batch.isEmpty()) {
    process(batch);
}
does invokeAll() process the tasks in a FIFO manner? That is, are the tasks started in the order in which they were added to the task list?
The tasks are submitted to the thread pool in FIFO order and are dequeued by the threads in FIFO order as well. However, once each thread has a job, there are race conditions which may cause some re-ordering of the actual task "start" and certainly of the finish.
Tasks will be added in the order you specify, and approximately started in that order if you have multiple threads. The order they are completed will be roughly in that order if they take the same amount of time.
I would build a List<Future> which you can poll periodically to see how many are done.
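A minimal sketch of that polling idea, written without Java 7+ features. The class name FuturePollingSketch, the helper names, the 1 ms of simulated work and the 10 ms poll interval are all assumptions made for illustration, not part of the original answer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FuturePollingSketch {

    // Counts the futures that have finished (normally, with an exception, or by cancellation).
    static int countDone(List<? extends Future<?>> futures) {
        int done = 0;
        for (Future<?> f : futures) {
            if (f.isDone()) {
                done++;
            }
        }
        return done;
    }

    // Submits taskCount trivial tasks, polls until all are done, and returns the final count.
    static int runAndPoll(int taskCount, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
        try {
            for (int i = 0; i < taskCount; i++) {
                final int id = i;
                futures.add(executor.submit(new Callable<Integer>() {
                    public Integer call() throws Exception {
                        Thread.sleep(1); // stand-in for real work
                        return id;
                    }
                }));
            }
            int done = countDone(futures);
            while (done < taskCount) {
                Thread.sleep(10); // hypothetical poll interval
                done = countDone(futures);
            }
            return done;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while polling", e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("done: " + runAndPoll(1000, 8));
    }
}
```

Note that isDone() also returns true for failed or cancelled tasks, so the count means "finished", not "succeeded".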
As far as I know, ExecutorCompletionService provides the output from the Future objects regardless of the order in which the tasks were submitted to the inbound queue, i.e. whichever task completes first has its result put into the outbound queue. On the other hand, a FixedThreadPool also executes the tasks in parallel, so what is the difference between the two? (I am not sure whether the FixedThreadPool gives the output sequentially, in the order the tasks were fed to the inbound queue.)
Thanks.
FixedThreadPool is one of the variations of ExecutorService. It uses the class ThreadPoolExecutor with the same value for corePoolSize and maximumPoolSize. This means that if you create a FixedThreadPool with 10 threads, it will create up to 10 threads as tasks arrive and then keep them alive. If any of these threads terminates because of a failure in a running task, the thread pool will create a new one to keep the required number.
CompletionService arranges that submitted tasks are, upon completion, placed on a queue.
It means, that all results of submitted tasks will be in a queue and you can process them later.
When you submit a task to a CompletionService, it creates a wrapper so that the result of the asynchronous task is saved to the queue. It doesn't create parallelism itself; instead, CompletionService uses an underlying Executor to run the tasks in parallel. You can pass a FixedThreadPool to it, for example.
All tasks submitted to a FixedThreadPool or a CompletionService will be executed in parallel, with no guarantee about completion order.
CompletionService can be used when you want to handle each result as soon as its task finishes, and to know when all of your tasks are done. Example:
//Task extends Callable<Result>
List<Task> tasks = new ArrayList<Task>();
CompletionService<Result> cs = new ExecutorCompletionService<Result>(Executors.newFixedThreadPool(10));
tasks.forEach(task -> cs.submit(task));
for (int i = 0; i < tasks.size(); i++) { // you should know exact amount of submitted tasks
    Result r = cs.take().get();
    //process r
}
A FixedThreadPool on its own can be used in any other case, when you want to run tasks in parallel without waiting for the results.
Also, note the difference between FixedThreadPool and CachedThreadPool. The first one is usually used when you need to keep threads alive and limit their number. The second one is limited only by system resources: it will create as many threads as needed to run the submitted tasks in parallel. If a thread in a CachedThreadPool is idle, it is removed automatically after a timeout (60 seconds by default).
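To make the ordering difference concrete, here is a small sketch under assumed conditions: three tasks whose sleep times (200 ms, 100 ms, 0 ms) are chosen so that they finish in reverse submission order. The class and method names are hypothetical. invokeAll() hands back the Futures in submission order, while the CompletionService hands them back as they complete:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CompletionOrderSketch {

    // A task that sleeps for the given time and then returns its id.
    static Callable<Integer> sleepy(final int id, final long millis) {
        return new Callable<Integer>() {
            public Integer call() throws Exception {
                Thread.sleep(millis);
                return id;
            }
        };
    }

    // Task 0 is the slowest and task 2 the fastest.
    static List<Callable<Integer>> tasks() {
        List<Callable<Integer>> tasks = new ArrayList<Callable<Integer>>();
        tasks.add(sleepy(0, 200));
        tasks.add(sleepy(1, 100));
        tasks.add(sleepy(2, 0));
        return tasks;
    }

    // Results in the order invokeAll() returns them (the order of the input list).
    static List<Integer> invokeAllOrder() {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Integer> out = new ArrayList<Integer>();
            for (Future<Integer> f : pool.invokeAll(tasks())) {
                out.add(f.get());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Results in the order the CompletionService yields them (completion order).
    static List<Integer> completionOrder() {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            CompletionService<Integer> cs = new ExecutorCompletionService<Integer>(pool);
            List<Callable<Integer>> tasks = tasks();
            for (Callable<Integer> t : tasks) {
                cs.submit(t);
            }
            List<Integer> out = new ArrayList<Integer>();
            for (int i = 0; i < tasks.size(); i++) {
                out.add(cs.take().get());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("invokeAll order:  " + invokeAllOrder());
        System.out.println("completion order: " + completionOrder());
    }
}
```

Both variants run on the same kind of fixed pool; only the way results are collected differs.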
Given the below code:
ScheduledExecutorService es = new ScheduledThreadPoolExecutor(100);
es.scheduleAtFixedRate(() -> {
    System.out.println("Do work with a fixed rate! ");
}, 0, 1000, TimeUnit.MILLISECONDS);
int i = 0;
while ( i < 100 ) {
    es.scheduleAtFixedRate(() -> {
        System.out.println("Do more work with a fixed rate! Doesn't really work! We will end up with 100 'workers', each running with a fixed rate! ");
    }, 0, 1000, TimeUnit.MILLISECONDS);
    i++;
}
which creates a ScheduledThreadPoolExecutor.
In the while loop, we are simulating someone else wanting to add more work to the queue, but this obviously won't work.
I am guessing, that one needs to implement some sort of ThreadPoolExecutor which uses a Queue of some sort, possibly a delayed queue.
The idea is that the executor is created and then has a fixed rate at which it can execute tasks. If a task finishes too quickly, the threads that have finished need to wait before pulling off more work.
If one finishes too slowly, then the global timer should allow other threads in the thread pool to pull off more work.
http://docs.oracle.com/javase/7/docs/api/java/util/AbstractQueue.html
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/DelayQueue.html
But I was hoping this was already done, as it should be pretty common problem.
Does anyone have a good solution to this?
It is not entirely clear what you want to do, but my guess is that you want a kind of Pacer or Throttler that ensures that tasks are executed with a certain rate (think of the revolving doors found at entrances to office buildings and others, the speed of the door determines the number of persons that can enter (or exit) the building per time unit and the time difference between each entrance (or exit)).
ScheduledExecutorService is not a solution for that problem. Instead, start by studying the Leaky Bucket Algorithm.
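As a rough illustration of the pacing idea (not a full leaky bucket implementation), the sketch below queues incoming tasks and lets a single timer release at most one of them per period to a worker pool. Everything here, including the class name PacerSketch, the worker count and the period, is an assumption for the example. It uses a scheduled executor only as the timer that drives the release rate, not to schedule the tasks themselves:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class PacerSketch {
    private final BlockingQueue<Runnable> pending = new LinkedBlockingQueue<Runnable>();
    private final ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor();
    private final ExecutorService workers;

    PacerSketch(int workerThreads) {
        this.workers = Executors.newFixedThreadPool(workerThreads);
    }

    // Starts the timer: at most one pending task is released per period.
    void start(long periodMillis) {
        ticker.scheduleAtFixedRate(new Runnable() {
            public void run() {
                Runnable next = pending.poll(); // null when nothing is waiting
                if (next != null) {
                    workers.execute(next);
                }
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Anyone may add work at any time; it waits in the queue for its turn.
    void submit(Runnable task) {
        pending.add(task);
    }

    void shutdown() {
        ticker.shutdownNow();
        workers.shutdown();
    }

    // Demo helper: milliseconds needed to release and run count trivial tasks.
    static long timeToRun(int count, long periodMillis) {
        long start = System.nanoTime();
        PacerSketch pacer = new PacerSketch(4);
        final CountDownLatch done = new CountDownLatch(count);
        pacer.start(periodMillis);
        for (int i = 0; i < count; i++) {
            pacer.submit(new Runnable() {
                public void run() {
                    done.countDown();
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } finally {
            pacer.shutdown();
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        System.out.println("5 tasks at one per 100 ms took " + timeToRun(5, 100) + " ms");
    }
}
```

Unlike a true leaky bucket, this queue is unbounded, so a burst of submissions is delayed rather than dropped.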
A ScheduledThreadPoolExecutor maintains a queue of tasks that is ordered by the tasks' next scheduled execution. These tasks (the Runnable instances you provide) are completely independent of the threads that will execute them. In other words, the threads don't just acquire a task, execute it, then go to sleep waiting for that task's next execution.
Instead, the threads poll the queue, acquire a task, execute it, schedule the next execution of their task by reinserting it into the queue, then poll the queue again. If the queue has a task, great. If not, they wait until the next task is ready (whether it's currently in the queue or added at a later time). They then restart this whole process.
All this to say, that a ScheduledThreadPoolExecutor with 100 threads can easily process more than 100 tasks at any type of rate.
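A small sketch that checks this claim on a reduced scale: 100 periodic tasks share a pool of only 2 threads, and each task still gets to run repeatedly. The class name, the 20 ms period and the 5 second timeout are assumptions for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SharedSchedulerSketch {

    // Schedules taskCount periodic tasks on a pool of the given size and reports
    // whether every task ran at least minRuns times before the timeout.
    static boolean allTasksRepeat(int threads, int taskCount, int minRuns) {
        ScheduledExecutorService es = Executors.newScheduledThreadPool(threads);
        List<CountDownLatch> latches = new ArrayList<CountDownLatch>();
        try {
            for (int i = 0; i < taskCount; i++) {
                final CountDownLatch latch = new CountDownLatch(minRuns);
                latches.add(latch);
                es.scheduleAtFixedRate(new Runnable() {
                    public void run() {
                        latch.countDown(); // one more execution of this task
                    }
                }, 0, 20, TimeUnit.MILLISECONDS);
            }
            for (CountDownLatch latch : latches) {
                if (!latch.await(5, TimeUnit.SECONDS)) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } finally {
            es.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("100 tasks on 2 threads all repeat: " + allTasksRepeat(2, 100, 3));
    }
}
```

This only works while the tasks are short compared to their period; long-running tasks would occupy the threads and delay the others.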
I am looking for a load balanced thread pool with no success so far. (Not sure whether load balancing is the correct wording).
Let me explain what I try to achieve.
Part 1:
I have jobs, each with 8 to 10 single tasks. On a 6-core CPU I let 8 threads work on these tasks in parallel, which seems to deliver the best performance. When one task is finished, another one can start. Once all ten tasks are finished, the complete job is done. Usually a job is done in 30 to 60 seconds.
Part two:
Sometimes, unfortunately, a job takes more than two hours. This is correct, due to the amount of data that has to be calculated.
The bad thing is that no other job can start while job1 is running (assuming that all tasks have the same duration), because it is using all threads.
My First idea:
Have 12 threads, allow up to three jobs in parallel.
BUT: that means the CPU is not fully utilized when there is only one job.
I am looking for a solution that gives full CPU power to job one when there is no other job. But when another job needs to be started while one is already running, I want the CPU power allocated to both jobs. And when a third or fourth job shows up, I want the CPU power allocated fairly to all four jobs.
I appreciate your answers...
thanks in advance
One possibility might be to use a standard ThreadPoolExecutor with a different kind of task queue:
public class TaskRunner {
    private static class PriorityRunnable implements Runnable,
            Comparable<PriorityRunnable> {
        private Runnable theRunnable;
        private int priority = 0;
        public PriorityRunnable(Runnable r, int priority) {
            this.theRunnable = r;
            this.priority = priority;
        }
        public int getPriority() {
            return priority;
        }
        public void run() {
            theRunnable.run();
        }
        public int compareTo(PriorityRunnable that) {
            return this.priority - that.priority;
        }
    }
    private BlockingQueue<Runnable> taskQueue = new PriorityBlockingQueue<Runnable>();
    private ThreadPoolExecutor exec = new ThreadPoolExecutor(8, 8, 0L,
            TimeUnit.MILLISECONDS, taskQueue);
    public void runTasks(Runnable... tasks) {
        int priority = 0;
        Runnable nextTask = taskQueue.peek();
        if(nextTask instanceof PriorityRunnable) {
            priority = ((PriorityRunnable)nextTask).getPriority() + 1;
        }
        for(Runnable t : tasks) {
            exec.execute(new PriorityRunnable(t, priority));
            priority += 100;
        }
    }
}
The idea here is that when you have a new job you call
taskRunner.runTasks(jobTask1, jobTask2, jobTask3);
and it will queue up the tasks in such a way that they interleave nicely with any existing tasks in the queue (if any). Suppose you have one job queued, whose tasks have priority numbers j1t1=3, j1t2=103, and j1t3=203. In the absence of other jobs, these tasks will execute one after the other as quickly as possible. But if you submit another job with three tasks of its own, these will be assigned priority numbers j2t1=4, j2t2=104 and j2t3=204, meaning the queue now looks like
j1t1, j2t1, j1t2, j2t2, etc.
This is not perfect however, because if all threads are currently working (on tasks from job 1) then the first task of job 2 can't start until one of the job 1 tasks is complete (unless there's some external way for you to detect this and interrupt and re-queue some of job 1's tasks). The easiest way to make things more fair would be to break down the longer-running tasks into smaller segments and queue those as separate tasks - you need to get to a point where each individual job involves more tasks than there are threads in the pool, so that some of the tasks will always start off in the queue rather than being assigned directly to threads (if there are idle threads then exec.execute() passes the task straight to a thread without going through the queue at all).
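The interleaving described above can be observed with a small, simplified sketch. To keep the order deterministic it uses a single worker thread that is held busy while both jobs are queued. The class InterleaveSketch, the fixed priority numbers (0, 100, 200 for job 1 and 1, 101, 201 for job 2) and the "gate" task are assumptions for the demo, not part of the TaskRunner above:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class InterleaveSketch {

    // A task that logs its name when run; lower priority numbers run first.
    static final class Prioritized implements Runnable, Comparable<Prioritized> {
        final int priority;
        final String name;
        final List<String> log;

        Prioritized(int priority, String name, List<String> log) {
            this.priority = priority;
            this.name = name;
            this.log = log;
        }

        public void run() {
            log.add(name);
        }

        public int compareTo(Prioritized that) {
            return priority < that.priority ? -1 : (priority == that.priority ? 0 : 1);
        }
    }

    static List<String> runOrder() {
        final List<String> log = Collections.synchronizedList(new ArrayList<String>());
        ThreadPoolExecutor exec = new ThreadPoolExecutor(1, 1, 0L,
                TimeUnit.MILLISECONDS, new PriorityBlockingQueue<Runnable>());
        final CountDownLatch gate = new CountDownLatch(1);
        // Occupy the only worker. The first task is handed straight to the new
        // thread and never enters the queue, so it need not be Comparable.
        exec.execute(new Runnable() {
            public void run() {
                try {
                    gate.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        // Queue job 1 completely, then job 2. Use execute(), not submit():
        // submit() would wrap each task in a FutureTask, which is not Comparable.
        for (int job = 0; job < 2; job++) {
            for (int task = 0; task < 3; task++) {
                String name = "j" + (job + 1) + "t" + (task + 1);
                exec.execute(new Prioritized(task * 100 + job, name, log));
            }
        }
        gate.countDown(); // release the worker; the queue now drains by priority
        exec.shutdown();
        try {
            exec.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<String>(log);
    }

    public static void main(String[] args) {
        System.out.println(runOrder()); // [j1t1, j2t1, j1t2, j2t2, j1t3, j2t3]
    }
}
```

With more than one worker thread the start order stays the same, but the finish order depends on how long each task runs.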
The easiest thing to do is to oversubscribe your CPU, as Kanaga suggests, but start 8 threads each. There may be some overhead from the competition, but if you get to a single job situation, it will fully utilize the CPU. The OS will handle giving time to each thread.
Your "first idea" would also work. The idle threads wouldn't take resources from 8 working threads if they aren't actually executing a task. This wouldn't distribute the cpu resources as evenly when there are multiple jobs running, though.
Do you have a setup where you can test these different pipelines to see how they're performing for you?
Since your machine has a 6-core CPU, I think it is better to have 6 worker threads for each job thread, so that whenever a thread gets a new job, it starts up to six parallel workers for that single job. This ensures that the full CPU power is used when there is only one job at a time.
Also please have a look at the Fork/Join framework in Java 7.
Also learn about newCachedThreadPool():
Java newCachedThreadPool() versus newFixedThreadPool
The Java ExecutorService framework allows you to delegate a number of tasks to be performed using a managed thread pool so that N tasks can be performed X tasks at a time until complete.
My question is ... what if N is a number that is either infinite or so large as to be impractical to allocate/assign/define initially.
How can you leverage the concept of thread pooling in Java (ExecutorService) to handle more tasks than you could reasonably submit without exhausting resources.
For purposes of this answer, assume each task is self-contained and does not depend on any other task and that tasks can be completed in arbitrary order.
My initial attempt at attacking this problem involved feeding the ExecutorService Y threads at a time but I quickly realized that there's no apparent way to tell when a particular task is complete and therefore submit a new task to be executed.
I know I could write my own "ExecutorService" but I am trying to leverage the bounty of what the Java framework already provides. I'm generally in the "don't re-invent the wheel" category because greater minds than mine have already made investments for me.
Thanks in advance to anybody that can provide any insight in how to attack this type of problem.
You could use a CompletionService to do it. You can have one thread that seeds the service with a bunch of tasks, then as tasks complete you can add new ones.
A simple example:
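final CompletionService<Something> service =
        new ExecutorCompletionService<Something>(Executors.newFixedThreadPool(5));
Runnable taskGenerator = new Runnable() {
    public void run() {
        try {
            // Seed the service
            for (int i = 0; i < 100; ++i) {
                service.submit(createNewTask());
            }
            // As tasks complete create new ones
            while (true) {
                Future<Something> result = service.take();
                processResult(result.get());
                service.submit(createNewTask());
            }
        } catch (InterruptedException e) {
            // Stop generating tasks when interrupted
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            // A task failed; this sketch simply stops
            throw new RuntimeException(e);
        }
    }
};
new Thread(taskGenerator).start();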
This uses a ThreadPoolExecutor with 5 threads to process tasks and a hand-rolled producer/consumer thread for generating tasks and processing results.
Obviously you'll want something a little smarter than while (true), you'll need to have sensible implementations of processResult and createNewTask, and this assumes that task execution is much slower than generating them or processing the results.
Hopefully this will get you on the right track.
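Use java.util.concurrent.ThreadPoolExecutor with a bounded java.util.concurrent.ArrayBlockingQueue as its workQueue. Note that execute() itself does not block when the queue is full: with the default handler it throws RejectedExecutionException once the queue is full and all 10 threads are busy. To throttle the producer instead, pass a handler such as ThreadPoolExecutor.CallerRunsPolicy, which makes the submitting thread run the task itself and so slows down submission.
BlockingQueue<Runnable> workQueue = new ArrayBlockingQueue<Runnable>(100);
ThreadPoolExecutor tpe = new ThreadPoolExecutor(5, 10, 60, TimeUnit.SECONDS, workQueue,
        new ThreadPoolExecutor.CallerRunsPolicy());
while (true) {
    tpe.execute(createNewTask());
}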
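If the producer should really wait while too many tasks are in flight, one option is to guard submission with a Semaphore. The following is only a sketch under assumptions: the class BoundedSubmitSketch and its demo method are made up for illustration, a single producer thread is assumed, and the inFlight and maxSeen fields exist only so the demo can report what happened:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedSubmitSketch {
    private final ExecutorService pool;
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger(); // demo instrumentation
    private int maxSeen = 0; // written by the single producer thread only

    BoundedSubmitSketch(int threads, int maxInFlight) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks the caller until fewer than maxInFlight tasks are queued or running.
    void submit(final Runnable task) throws InterruptedException {
        permits.acquire();
        int now = inFlight.incrementAndGet();
        if (now > maxSeen) {
            maxSeen = now;
        }
        try {
            pool.execute(new Runnable() {
                public void run() {
                    try {
                        task.run();
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release(); // frees a slot for the producer
                    }
                }
            });
        } catch (RejectedExecutionException e) {
            inFlight.decrementAndGet();
            permits.release();
            throw e;
        }
    }

    // Demo helper: returns { completed task count, highest in-flight count observed }.
    static int[] demo(int threads, int maxInFlight, int taskCount) {
        BoundedSubmitSketch bounded = new BoundedSubmitSketch(threads, maxInFlight);
        final AtomicInteger completed = new AtomicInteger();
        try {
            for (int i = 0; i < taskCount; i++) {
                bounded.submit(new Runnable() {
                    public void run() {
                        completed.incrementAndGet();
                    }
                });
            }
            bounded.pool.shutdown();
            bounded.pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return new int[] { completed.get(), bounded.maxSeen };
    }

    public static void main(String[] args) {
        int[] result = demo(5, 100, 10000);
        System.out.println("completed=" + result[0] + ", max in flight=" + result[1]);
    }
}
```

Because the pool's own queue can never hold more than maxInFlight tasks at once, an effectively infinite stream of tasks can be fed through it without exhausting memory.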
Right now I have this Groovy code to run a series of tasks:
CountDownLatch latch = new CountDownLatch(tasks.size);
for( task in tasks ) {
    Thread.start worker.curry(task, latch)
}
latch.await(300L, TimeUnit.SECONDS);
I'd like to limit the number of simultaneous threads to a certain number t. The way it's written now, for n tasks, n threads get created "at once". I thought about using multiple latches or some sort of callback, but couldn't come up with a good solution.
The solution should start new task threads as soon as running threads drops below t, until number running reaches t or there are no un-run tasks.
You should check out GPars and use one of the abstractions listed. You can then specify the number to create in withPool(). I like fork/join:
withPool(4) { pool ->
    runForkJoin(rootTask) { task ->
        task.eachTask { forkChild(task) }
    }
}
You can use the Executor framework.
Executors.newFixedThreadPool(t);
This will create at most t threads, which are reused for all submitted tasks. You can then submit your tasks to the executor to make use of these threads.
Edit: Thanks for the comment Josh, I'll post your solution
ExecutorService pool = Executors.newFixedThreadPool(6);
for( task in tasks ) {
    pool.execute Worker.curry(task)
}
pool.shutdown();
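For reference, here is a plain Java sketch of the same approach that also measures how many tasks actually ran at the same time. The class name, the 5 ms of simulated work and the peak counter are assumptions added for the demonstration. The 300 second wait mirrors the latch timeout from the question, with awaitTermination() taking the place of the CountDownLatch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class LimitedConcurrencySketch {

    // Runs taskCount tasks on a pool of t threads and returns the highest
    // number of tasks that were observed running simultaneously.
    static int maxConcurrent(int t, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(t);
        final AtomicInteger running = new AtomicInteger();
        final AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(new Runnable() {
                public void run() {
                    int now = running.incrementAndGet();
                    // Raise the recorded peak if this is a new maximum.
                    int prev;
                    do {
                        prev = peak.get();
                    } while (now > prev && !peak.compareAndSet(prev, now));
                    try {
                        Thread.sleep(5); // stand-in for real work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown(); // no new tasks; queued tasks still run
        try {
            if (!pool.awaitTermination(300L, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + maxConcurrent(6, 60));
    }
}
```

As soon as one task finishes, its thread picks up the next queued task, which is the "start a new task when running drops below t" behaviour asked for.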