Kind of load balanced thread pool in java


I am looking for a load balanced thread pool with no success so far. (Not sure whether load balancing is the correct wording).
Let me explain what I try to achieve.
Part 1:
I have jobs, each consisting of 8 to 10 single tasks. On a 6-core CPU I let 8 threads work on these tasks in parallel, which seems to deliver the best performance. When one task is finished, another one can start. Once all ten tasks are finished, the complete job is done. Usually a job is done in 30 to 60 seconds.
Part 2:
Sometimes, unfortunately, a job takes more than two hours. This is correct, due to the amount of data that has to be calculated.
The bad thing is that no other job can start while job 1 is running (assuming that all threads run for the same duration), because it is using all threads.
My first idea:
Have 12 threads, allow up to three jobs in parallel.
BUT: that means the CPU is not fully utilized when there is only one job.
I am looking for a solution that gives full CPU power to job 1 when there is no other job. But when another job needs to be started while one is already running, I want the CPU power allocated to both jobs. And when a third or fourth job shows up, I want the CPU power allocated fairly to all four jobs.
I appreciate your answers...
thanks in advance

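One possibility might be to use a standard ThreadPoolExecutor with a different kind of task queue:
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class TaskRunner {
    // Wraps a task with a priority number; lower numbers leave the queue first.
    private static class PriorityRunnable implements Runnable,
            Comparable<PriorityRunnable> {
        private final Runnable theRunnable;
        private final int priority;

        public PriorityRunnable(Runnable r, int priority) {
            this.theRunnable = r;
            this.priority = priority;
        }

        public int getPriority() {
            return priority;
        }

        @Override
        public void run() {
            theRunnable.run();
        }

        @Override
        public int compareTo(PriorityRunnable that) {
            // Integer.compare avoids the overflow that plain subtraction can cause
            return Integer.compare(this.priority, that.priority);
        }
    }

    private final BlockingQueue<Runnable> taskQueue = new PriorityBlockingQueue<Runnable>();
    private final ThreadPoolExecutor exec = new ThreadPoolExecutor(8, 8, 0L,
            TimeUnit.MILLISECONDS, taskQueue);

    public void runTasks(Runnable... tasks) {
        // Start just behind the task currently at the head of the queue, if any
        int priority = 0;
        Runnable nextTask = taskQueue.peek();
        if (nextTask instanceof PriorityRunnable) {
            priority = ((PriorityRunnable) nextTask).getPriority() + 1;
        }
        // Space this job's tasks 100 apart so that other jobs can interleave
        for (Runnable t : tasks) {
            exec.execute(new PriorityRunnable(t, priority));
            priority += 100;
        }
    }
}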
The idea here is that when you have a new job you call
taskRunner.runTasks(jobTask1, jobTask2, jobTask3);
and it will queue up the tasks in such a way that they interleave nicely with any existing tasks in the queue (if any). Suppose you have one job queued, whose tasks have priority numbers j1t1=3, j1t2=103, and j1t3=203. In the absence of other jobs, these tasks will execute one after the other as quickly as possible. But if you submit another job with three tasks of its own, these will be assigned priority numbers j2t1=4, j2t2=104 and j2t3=204, meaning the queue now looks like
j1t1, j2t1, j1t2, j2t2, etc.
This is not perfect however, because if all threads are currently working (on tasks from job 1) then the first task of job 2 can't start until one of the job 1 tasks is complete (unless there's some external way for you to detect this and interrupt and re-queue some of job 1's tasks). The easiest way to make things more fair would be to break down the longer-running tasks into smaller segments and queue those as separate tasks - you need to get to a point where each individual job involves more tasks than there are threads in the pool, so that some of the tasks will always start off in the queue rather than being assigned directly to threads (if there are idle threads then exec.execute() passes the task straight to a thread without going through the queue at all).
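To make the interleaving concrete, here is a minimal sketch that only reproduces the queue ordering from the example above; it does not run any tasks. The class and method names (InterleaveDemo, drainOrder) are made up for illustration, and the priority numbers are the ones quoted in the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical demo class; it only shows the order in which tasks leave the queue.
class InterleaveDemo {
    // A queued task, reduced to a label and its priority number.
    static final class Entry implements Comparable<Entry> {
        final String name;
        final int priority;

        Entry(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }

        @Override
        public int compareTo(Entry that) {
            return Integer.compare(this.priority, that.priority);
        }
    }

    // Returns the task labels in the order the priority queue hands them out.
    static List<String> drainOrder() {
        PriorityBlockingQueue<Entry> queue = new PriorityBlockingQueue<>();
        // Job 1 is already queued with priorities 3, 103 and 203.
        queue.add(new Entry("j1t1", 3));
        queue.add(new Entry("j1t2", 103));
        queue.add(new Entry("j1t3", 203));
        // Job 2 arrives later: head priority + 1, then steps of 100.
        queue.add(new Entry("j2t1", 4));
        queue.add(new Entry("j2t2", 104));
        queue.add(new Entry("j2t3", 204));

        List<String> order = new ArrayList<>();
        Entry next;
        while ((next = queue.poll()) != null) {
            order.add(next.name);
        }
        return order;
    }

    public static void main(String[] args) {
        // prints [j1t1, j2t1, j1t2, j2t2, j1t3, j2t3]
        System.out.println(drainOrder());
    }
}
```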

The easiest thing to do is to oversubscribe your CPU, as Kanaga suggests, but start 8 threads each. There may be some overhead from the competition, but if you get to a single job situation, it will fully utilize the CPU. The OS will handle giving time to each thread.
Your "first idea" would also work. The idle threads wouldn't take resources from 8 working threads if they aren't actually executing a task. This wouldn't distribute the cpu resources as evenly when there are multiple jobs running, though.
Do you have a setup where you can test these different pipelines to see how they're performing for you?
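The oversubscription approach could look roughly like the sketch below: every job gets its own pool of 8 threads, and the OS scheduler shares the cores between jobs that run at the same time. PerJobPool and runJob are hypothetical names, and the fixed count of 8 is simply the figure from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: one private pool per job, so jobs never wait for each other's threads.
class PerJobPool {
    static final int THREADS_PER_JOB = 8;

    // Runs all tasks of one job on a pool owned by that job and returns the results.
    static <T> List<T> runJob(List<Callable<T>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS_PER_JOB);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task of this job has completed
            for (Future<T> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // the job's threads go away with the job
        }
    }
}
```

Calling runJob from several threads at once gives each job 8 threads; with a single job running, those 8 threads have the whole CPU to themselves.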

Since your machine has a 6-core CPU, it is probably better to have 6 worker threads for each job thread, so that whenever a thread gets a new job, it starts up to six parallel workers to work on that single job. This ensures the full CPU power is used when there is only one job at a time.
Also have a look at the Fork/Join framework in Java 7.
Also learn about Executors.newCachedThreadPool():
Java newCachedThreadPool() versus newFixedThreadPool

Related

Can I run background tasks in a ThreadPool?

I have an ExecutorService to execute my tasks concurrently. Most of these tasks are simple actions that require ~300ms to complete each. But a few of these tasks are background processing queues that take in new sub-tasks all the time and execute them in order. These background tasks will remain active as long as there are normal tasks running.
The ThreadPool is generated through one of the Executors methods (I don't know which yet) with a user-specified thread count. My fear is that the following situation might happen: there are fewer threads than there are background queues. At a given moment, all background queues are working, blocking all the threads of the ExecutorService. No normal tasks will thus be started and the program will hang forever.
Is there a possibility this might happen and how can I avoid it? I'm thinking of a possibility to interrupt the background tasks to leave the place to the normal ones.
The goal is to limit the number of threads in my application because Google said having a lot of threads is bad and having them idle for most of the time is bad too.
About 10000 tasks are going to be submitted in a very short amount of time at the beginning of the program execution. About 50 background task queues are needed, and most of their time will be spent waiting for a background job to do.

Don't mix long-running tasks with short-running tasks in the same ExecutorService.
Use two different ExecutorService instances, each with the right pool size. Even if you set the size to 50 for the background threads with long-running tasks, the pool will not perform optimally, because the number of available cores (2, 4, 8, etc.) is far smaller than that.
I would create two separate ExecutorService instances, each initialized with Runtime.getRuntime().availableProcessors()/2 threads.
Have a look at the posts below for more details on using the available cores effectively:
How to implement simple threading with a fixed number of worker threads
Dynamic Thread Pool
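As a rough sketch of the two-pool suggestion above: the class name SplitPools and its field names are invented for illustration, and the Math.max guard is an added assumption so that a single-core machine still gets one thread per pool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical holder for two independent pools, each sized to half the cores.
class SplitPools {
    // At least one thread per pool, even when only one core is reported.
    static final int HALF = Math.max(1, Runtime.getRuntime().availableProcessors() / 2);

    final ExecutorService shortTasks = Executors.newFixedThreadPool(HALF);
    final ExecutorService longTasks = Executors.newFixedThreadPool(HALF);

    // Stops accepting work and waits (up to a minute per pool) for queued work to finish.
    void shutdownAndWait() {
        shortTasks.shutdown();
        longTasks.shutdown();
        try {
            shortTasks.awaitTermination(1, TimeUnit.MINUTES);
            longTasks.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because the background queues can only ever occupy threads of longTasks, the short tasks submitted to shortTasks keep making progress even when every long-running thread is busy.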

You can have an unlimited number of threads; check out the cached thread pool (Executors.newCachedThreadPool()):
Creates a thread pool that creates new threads as needed, but will
reuse previously constructed threads when they are available. These
pools will typically improve the performance of programs that execute
many short-lived asynchronous tasks. Calls to execute will reuse
previously constructed threads if available. If no existing thread is
available, a new thread will be created and added to the pool. Threads
that have not been used for sixty seconds are terminated and removed
from the cache. Thus, a pool that remains idle for long enough will
not consume any resources. Note that pools with similar properties but
different details (for example, timeout parameters) may be created
using ThreadPoolExecutor constructors.
Another option is to create two different pools and reserve one for priority tasks.
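The last sentence of the quoted Javadoc mentions building a similar pool through the ThreadPoolExecutor constructor. A minimal sketch of that, with a configurable idle timeout, might look like this; CachedPools and cachedPool are hypothetical names.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory for a cached-style pool with a custom idle timeout.
class CachedPools {
    // Behaves like a cached thread pool, but idle threads are retired after
    // keepAliveSeconds instead of the default sixty seconds.
    static ThreadPoolExecutor cachedPool(long keepAliveSeconds) {
        return new ThreadPoolExecutor(
                0,                                 // no core threads: an idle pool shrinks to zero
                Integer.MAX_VALUE,                 // effectively unbounded number of threads
                keepAliveSeconds, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>()); // hand each task straight to a thread
    }
}
```

Note that an unbounded pool does not limit the thread count, which was the stated goal of the question, so this mainly helps when most threads are idle only briefly.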

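The solution is for the background tasks to stop instead of sitting idle when there is no work, and to be restarted when there are tasks again.
import java.util.Objects;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicBoolean;

public class BackgroundQueue implements Runnable {
    private final ExecutorService service;
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean running = new AtomicBoolean(false);
    // volatile because awaitQueueTermination() reads it without holding the lock
    private volatile Future<?> future;

    public BackgroundQueue(ExecutorService service) {
        this.service = Objects.requireNonNull(service);
        // Create a Future that immediately returns null
        FutureTask<Void> f = new FutureTask<>(() -> null);
        f.run();
        future = f;
    }

    public void awaitQueueTermination() throws InterruptedException, ExecutionException {
        do {
            future.get();
        } while (!tasks.isEmpty() || running.get());
    }

    public synchronized void submit(Runnable task) {
        tasks.add(task);
        // Only occupy a pool thread if this queue is not already being drained
        if (running.compareAndSet(false, true))
            future = service.submit(this);
    }

    @Override
    public void run() {
        // Exits (and clears the running flag) only when the queue is empty
        while (!running.compareAndSet(tasks.isEmpty(), false)) {
            tasks.remove().run();
        }
    }
}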

Java ThreadPool concepts, and issues with controlling the number of actual threads

I am a newbie to Java concurrency and am a bit confused by several concepts and implementation issues here. Hope you guys can help.
Say, I have a list of tasks stored in a thread-safe list wrapper:
ListWrapper jobs = ....
'ListWrapper' has synchronized fetch/push/append functions, and this 'jobs' object will be shared by multiple worker threads.
And I have a worker 'Runnable' to execute the tasks:
public class Worker implements Runnable{
private ListWrapper jobs;
public Worker(ListWrapper l){
this.jobs=l;
}
public void run(){
while(! jobs.isEmpty()){
//fetch an item from jobs and do sth...
}
}
}
Now in the main function I execute the tasks:
int NTHREADS =10;
ExecutorService service= Executors.newFixedThreadPool(NTHREADS);
//run threads..
int x=3;
for(int i=0; i<x; i++){
service.execute(new Worker(jobs) );
}
I tested this code with 'x=3', and I found that only 3 threads are running at the same time; but as I set 'x=20', I found that only 10 (=NTHREADS) are running at the same time. Seems to me the # of actual threads is the min of the two values.
Now my questions are:
1) Which value ('x' or 'NTHREADS') should I set to control the number of concurrent threads? Or it doesn't matter in either I choose?
2) How is this approach different from simply using the Producer-Consumer pattern --creating a fixed number of 'stud' threads to execute the tasks(shown in the code below)?
Thread t1= new Worker(jobs);
Thread t2= new Worker(jobs);
...
t1.join();
t2.join();
...
Thank you very much!!

[[ There are some good answers here but I thought I'd add some more detail. ]]
I tested this code with 'x=3', and I found that only 3 threads are running at the same time; but as I set 'x=20', I found that only 10 (=NTHREADS) are running at the same time. Seems to me the # of actual threads is the min of the two values.
No, not really. I suspect that the reason you weren't seeing 20 threads is that threads had already finished or had yet to be started. If you call new Thread(...).start() 20 times then you will get 20 threads started. However, if you check immediately none of them may have actually begun to run or if you check later they may have finished.
1) Which value ('x' or 'NTHREADS') should I set to control the number of concurrent threads? Or it doesn't matter in either I choose?
Quoting the Javadocs of Executors.newFixedThreadPool(...):
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue. At any point, at most nThreads threads will be active processing tasks.
So changing the NTHREADS constant changes the number of threads running in the pool. Changing x changes the number of jobs that are executed by those threads. You could have 2 threads in the pool and submit 1000 jobs or you could have 1000 threads and only submit 1 job for them to work on.
Btw, after you have submitted all of your jobs, you should then shut down the pool. It stops accepting new jobs, and its threads exit once all of the submitted jobs have run.
service.shutdown();
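A small sketch of the "few threads, many jobs" case together with the shutdown step, assuming trivial jobs that only bump a counter. ShutdownDemo and runJobs are illustrative names; awaitTermination is added here so the caller can tell when the queued jobs are done.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: many jobs, few threads, orderly shutdown.
class ShutdownDemo {
    // Runs `jobs` trivial jobs on `threads` pool threads and returns how many completed.
    static int runJobs(int threads, int jobs) {
        ExecutorService service = Executors.newFixedThreadPool(threads);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < jobs; i++) {
            service.execute(completed::incrementAndGet);
        }
        service.shutdown(); // no new jobs are accepted; queued jobs still run
        try {
            if (!service.awaitTermination(1, TimeUnit.MINUTES)) {
                service.shutdownNow(); // give up and interrupt whatever is left
            }
        } catch (InterruptedException e) {
            service.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runJobs(2, 1000)); // prints 1000
    }
}
```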
2) How is this approach different from simply using the Producer-Consumer pattern --creating a fixed number of 'stud' threads to execute the tasks(shown in the code below)?
It differs in that it does all of the heavy work for you.
You don't have to create a ListWrapper of the jobs since you get one inside of the ExecutorService. You just submit the jobs to the ExecutorService and it keeps track of them until the threads are available to run them.
You don't have to create any threads or worry about them throwing exceptions and dying because the ExecutorService starts/restarts the threads for you.
If you want your tasks to return information you can make use of the submit(Callable) method and use the Future to get the results of the jobs. Etc, etc..
Doing this code yourself is going to be harder to get right, more code to maintain, and most likely will not perform as well as the code in the JDK that is battle tested and optimized.
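For the submit(Callable) point, a minimal sketch might look like the following. FutureDemo and squares are hypothetical names, and squaring a number stands in for real work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo of getting results back from pool threads.
class FutureDemo {
    // Squares each input on a pool thread and collects the results in input order.
    static List<Integer> squares(List<Integer> inputs, int threads) {
        ExecutorService service = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (Integer n : inputs) {
                // submit(Callable) returns a Future that will hold the result
                futures.add(service.submit(() -> n * n));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // blocks until that job has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause()); // the job itself threw
        } finally {
            service.shutdown();
        }
    }
}
```

No shared ListWrapper is needed: the executor's internal queue holds the pending jobs, and each Future carries one job's result back to the caller.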

You shouldn't create threads by yourself when using a threadpool. Instead of WorkerThread class you should use a class that implements Runnable but is not a thread. Passing a Thread object to the threadpool won't make the thread run actually. The object will be passed to a different internal thread, which will simply execute the run method of your WorkerThread class.
The ExecutorService is simply incompatible with the way you want to write your program.
In the code you have right now, these WorkerThreads will stop to work when your ListWrapper is empty. If you then add something to the list, nothing will happen. This is definitely not what you wanted.
You should get rid of ListWrapper and simply put your tasks directly into the threadpool. The threadpool already incorporates an internal list of jobs shared between the threads. You should just submit your jobs to the threadpool and it will handle them accordingly.
To answer your questions:
1) Which value ('x' or 'NTHREADS') should I set to control the number of concurrent threads? Or it doesn't matter in either I choose?
NTHREADS, the threadpool will create the necessary number of threads.
2) How is this approach different from simply using the Producer-Consumer pattern --creating a fixed number of 'stud' threads to execute the tasks(shown in the code below)?
It's just that ExecutorService automates a lot of things for you. You can choose from a lot of different implementations of threadpools and you can substitute them easily. You can use for instance a scheduled executor. You get extra functionality. Why reinvent the wheel?

For 1) NTHREADS is the maximum threads that the pool will ever run concurrently, but that doesn't mean there will always be that many running. It will only use as many as is needed up to that max value... which in your case is 3.
As the docs say:
At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool-int-
As for 2) using Java's concurrent executors framework is preferred for new code. You get a lot of stuff for free, and it removes the need to handle all of the fiddly thread work yourself.

The number of threads passed into newFixedThreadPool is at most how many threads could be running executing your tasks. If you only have three tasks ever submitted I'd expect the ExecutorService to only create three threads.
To answer your questions:
You should use the number you pass into the constructor to control how many threads are going to be used to execute your tasks.
This differs because of the extra functionality the ExecutorService gives you, as well as the flexibility, for example when you need to change your ExecutorService type or the number of tasks you'll run (fewer lines of code to change).

All that is happening is the executor service is only creating as many threads as it needs. NTHREADS is effectively the maximum number of threads it'll create.
There is no point creating ten threads up front if it only has 3 tasks to complete, the other 7 will just be hanging around consuming resources.
If you submit more than NTHREADS number of tasks then it will process that number concurrently and the rest will wait on a queue until a thread becomes free.
This isn't any different from creating a fixed set of your own threads, except the thread management and scheduling is handled for you. The executor service also restarts threads if they are killed by rogue exceptions in your task which you'd otherwise have to code for.
See: The Javadoc on Executors.newFixedThreadPool
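The "min of the two values" observation from the question can be reproduced with a short sketch. It assumes that Executors.newFixedThreadPool returns a ThreadPoolExecutor, which is true of current JDKs but is not promised by the declared return type; PoolSizeDemo and threadsCreated are made-up names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

// Hypothetical demo: how many threads does a fixed pool actually create?
class PoolSizeDemo {
    // Submits `tasks` blocking tasks to a fixed pool of `nThreads` threads and
    // reports how many threads the pool has created at that point.
    static int threadsCreated(int nThreads, int tasks) {
        // Assumption: the returned ExecutorService is a ThreadPoolExecutor.
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(nThreads);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // keep the thread busy until released
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int created = pool.getPoolSize();
        release.countDown();
        pool.shutdown();
        return created;
    }

    public static void main(String[] args) {
        System.out.println(threadsCreated(10, 3));  // prints 3
        System.out.println(threadsCreated(10, 20)); // prints 10
    }
}
```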

Multithreading Architecture for N Repeating Tasks

I have N tasks, each to be repeated after its own specific delay Interval(N) using a fixed thread pool size that will usually be smaller than N.
Since there will usually be a shortage of threads, preference should be given to executing a different task rather than repeating a recently completed task.
I was thinking of using an outer ThreadPoolExecutor with N nested ScheduledThreadPoolExecutors. I'm not really sure how to go about this in the most optimal way since each of those classes maintains its own internal thread pool.

Besides the use of a PriorityQueue as answered by assylias, you can also solve this architecturally, by having a simple executing ThreadPoolExecutor, and another ScheduledExecutorService, which will insert the tasks after a given delay.
So every task has the executive Runnable, and an insertion Runnable, and will, after successful execution, tell the ScheduledExecutorService to run the insertion Runnable after a given delay, which will then put the task back into the ThreadPoolExecutor.
As code:
// myExecutionTask
void run() {
    doSomeWork();
    scheduledExecutor.schedule(myInsertionRunnable, 1000, TimeUnit.MILLISECONDS);
}
and
// myInsertionRunnable
void run() {
    threadPoolExecutor.execute(myExecutionTask);
}
Effectively this will automatically cycle the tasks in the ThreadPoolExecutor, as those tasks that have already been finished, will be at the end of the queue.
Edit: As discussed in comments, when using the scheduler's fixedRate or fixedDelay functionality on a very busy system, tasks added later might be executed less often than tasks that were added earlier, as the system seems to prefer tasks that are already executing when deciding which one to run next.
In contrast my solution above cycles these tasks properly, although there can be no guarantee on a busy system, that the requested delay is exact. So they might be executed later, but at least always in FIFO order.
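Put together as something runnable, the two-executor cycle could be sketched as below. CyclingTasks, repeat and stop are hypothetical names, and the RejectedExecutionException handling is an addition so that the cycle ends quietly once the executors are shut down.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a worker pool runs the tasks, a scheduler re-inserts them.
class CyclingTasks {
    private final ExecutorService workers;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    CyclingTasks(int threads) {
        workers = Executors.newFixedThreadPool(threads);
    }

    // Queues `work` now and re-queues it `delayMillis` after each completed run.
    void repeat(Runnable work, long delayMillis) {
        try {
            workers.execute(() -> {
                work.run();
                try {
                    // The scheduler thread only re-inserts; the work runs on the worker pool.
                    scheduler.schedule(() -> repeat(work, delayMillis),
                            delayMillis, TimeUnit.MILLISECONDS);
                } catch (RejectedExecutionException stopped) {
                    // scheduler was shut down: stop cycling this task
                }
            });
        } catch (RejectedExecutionException stopped) {
            // worker pool was shut down: stop cycling this task
        }
    }

    void stop() {
        scheduler.shutdownNow();
        workers.shutdownNow();
    }
}
```

Each re-inserted task lands at the back of the worker pool's FIFO queue, which is what gives tasks that have not run recently their turn first.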

You could use a PriorityBlockingQueue and use timestamps to define priorities - something like:
class Task {
    final AtomicLong lastRun = new AtomicLong();
    Runnable r;

    void run() {
        r.run();
        lastRun.set(System.currentTimeMillis());
    }
}
Your ScheduledExecutorService (one thread) can then add the task N to a PriorityQueue every Interval(N).
And you can have a separate consumer running in your FixedThreadPool that takes from the Queue (using a reverse comparator so that the tasks run more recently will have a lower priority).
That is a little sketchy but it should work.
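One way to read the timestamp idea is a queue whose head is always the task that ran longest ago. The sketch below shows only that ordering; LeastRecentFirst, its nested Task class and newQueue are illustrative names, and the timestamps are assumed to be updated only while a task is outside the queue.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: order waiting tasks by when they last ran.
class LeastRecentFirst {
    static final class Task {
        final String name;
        final AtomicLong lastRun = new AtomicLong(); // 0 means "never run"

        Task(String name) {
            this.name = name;
        }
    }

    // Queue whose head is the task with the oldest lastRun timestamp.
    // lastRun must not change while a task sits in the queue, because the
    // queue only compares elements when they are inserted or removed.
    static PriorityBlockingQueue<Task> newQueue() {
        return new PriorityBlockingQueue<>(11,
                Comparator.comparingLong((Task t) -> t.lastRun.get()));
    }
}
```

A consumer in the fixed pool would take() from this queue, run the task, set lastRun, and leave it to the scheduler to add the task again after its interval.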

invokeAll how it exactly work? (ForkJoin)

I have written the following snippet:
static private int counter;

public void compute()
{
    if (array.length <= 500)
    {
        for (int i = 0; i < array.length; i++) {
            counter++;
            System.out.println("Ciao this is a recursive action number" + counter + Thread.currentThread().getName());
        }
    }
    else {
        int split = array.length / 2;
        RecursiveActionTry right = new RecursiveActionTry(split);
        RecursiveActionTry left = new RecursiveActionTry(split);
        invokeAll(right, left);
    }
}
I see that invokeAll() automatically forks one of the two RecursiveActionTry objects I pass to it. My laptop has only 2 cores. What if I had 4 cores and launched 4 tasks with invokeAll(right, left, backward, forward); would I use all 4 cores? I cannot tell, as I have only 2 cores.
I would also like to know whether invokeAll(right, left) behind the scenes calls compute() for the first argument (right) and fork + join for the second argument (left), as one is supposed to do in a RecursiveTask extension. Otherwise it would not use parallelism, would it?
And by the way, if there are more than 2 arguments.. does it call compute() on the first and fork on all the others?
Thanks in advance.

invokeAll() calls a number of tasks which execute independently on different threads. This does not necessitate the use of a different core for each thread, but it can allow the use of a different core for each thread if they are available. The details are handled by the underlying machine, but essentially (simplistically) if fewer cores are available than threads it time slices the threads so as to allow one to execute on one core for a certain amount of time, then the other, then another (in a loop.)
And by the way, if there are more than 2 arguments.. does it call compute() on the first and fork on all the others?
It would compute() all the arguments; it is then the responsibility of the compute() method to delegate and fork if the worker threshold is not met, and to join the computations when complete. (Splitting more than two ways is unusual, though: fork/join usually works by each recursion splitting the workload in two if necessary.)

The tasks and the worker threads are different things:
WorkerThreads are managed by the ForkJoinPool, and if you use the default constructor it starts WorkerThreads according to Runtime.getRuntime().availableProcessors().
Tasks, however, are created and managed by you. To get multiple cores busy you must start several tasks. You may split each into two parts or into N parts. While one part is executed directly, the other one(s) are put into a waiting queue. If any other WorkerThreads from the pool are idle and have no work to do, they are supposed to "steal" your forked pending task(s) from the queue and execute them in parallel.
To get 8 cores / WorkerThreads busy it is not necessary to invoke 8 tasks at once. It is sufficient to fork into at least two tasks which fork again (recursively) until all WorkerThreads are saturated (provided your overall problem splits into that many subtasks).
So there is no need to adapt your code if you have more or fewer cores, and your tasks should not worry about WorkerThread management at all.
Finally, invokeAll() or join() returns after all tasks have been run.
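As an illustration of recursive splitting with invokeAll(), here is a small sketch that sums an array. SumTask and its index-range fields are hypothetical (the question's RecursiveActionTry is not reproduced), and the threshold of 500 is borrowed from the question.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical task: sums data[from..to) by splitting the range in two.
class SumTask extends RecursiveTask<Long> {
    static final int THRESHOLD = 500; // same cut-off as in the question

    private final int[] data;
    private final int from;
    private final int to;

    SumTask(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        invokeAll(left, right);            // returns once both halves are complete
        return left.join() + right.join(); // both are done, so join() just reads the results
    }

    // Convenience entry point using the JVM-wide common pool.
    static long sum(int[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }
}
```

The same code runs unchanged on 2, 4 or 8 cores; only the number of worker threads available to steal the forked halves differs.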

Balancing multiple queues

I suspect this is really easy but I’m unsure if there’s a naïve way of doing it in Java. Here’s my problem, I have two scripts for processing data and both have the same inputs/outputs except one is written for the single CPU and the other is for GPUs. The work comes from a queue server and I’m trying to write a program that sends the data to either the CPU or GPU script depending on which one is free.
I do not understand how to do this.
I know that with an ExecutorService I can specify how many threads I want to keep running, but I'm not sure how to balance between two different ones. I have 2 GPUs and 8 CPU cores on the system and thought I could have the ExecutorService keep 2 GPU and 8 CPU processes running, but I'm unsure how to balance between them, since the GPU tasks will finish a lot quicker than the CPU tasks.
Any suggestions on how to approach this? Should I create two queues and keep polling them to see which one is less busy? Or is there a way to just put all the work units (all the same) into one queue and have the GPU or CPU processes take from the same queue as they become free?
UPDATE: Just to clarify, the CPU/GPU programs are outside the scope of the program I'm making; they are simply scripts that I call via two different methods. I guess the simplified version of what I'm asking is whether two methods can take work from the same queue.

Can two methods take work from the same queue?
Yes, but you should use a BlockingQueue to save yourself some synchronization heartache.
Basically, one option would be to have a producer which places tasks into the queue via BlockingQueue.offer. Then design your CPU/GPU threads to call BlockingQueue.take and perform work on whatever they receive.
For example:
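public static void main(String[] args) {
    BlockingQueue<Task> queue = new LinkedBlockingQueue<>();
    for (int i = 0; i < CPUs; i++) {
        new CPUThread(queue).start();
    }
    for (int i = 0; i < GPUs; i++) {
        new GPUThread(queue).start();
    }
    for (/*all data*/) {
        queue.offer(task);
    }
}

class CPUThread extends Thread {
    private final BlockingQueue<Task> queue;

    CPUThread(BlockingQueue<Task> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        try {
            while (/*some condition*/) {
                Task task = queue.take(); // blocks until work is available
                //do task work
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
//etc...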
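A complete version of the same idea could look like the sketch below. It is only an illustration: SharedQueueDemo and process are invented names, the work units are plain integers, a counter stands in for calling the CPU or GPU script, and a poison value per worker (an added assumption) tells the threads when to stop.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: CPU and GPU workers pull from one shared queue.
class SharedQueueDemo {
    static final int POISON = -1; // end-of-work marker, one per worker

    // Starts cpus + gpus workers on one queue; returns {handledByCpu, handledByGpu}.
    static int[] process(int workUnits, int cpus, int gpus) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        AtomicInteger cpuCount = new AtomicInteger();
        AtomicInteger gpuCount = new AtomicInteger();
        Thread[] workers = new Thread[cpus + gpus];
        for (int i = 0; i < workers.length; i++) {
            AtomicInteger counter = i < cpus ? cpuCount : gpuCount;
            workers[i] = new Thread(() -> {
                try {
                    // Whichever worker is free takes the next unit
                    while (queue.take() != POISON) {
                        counter.incrementAndGet(); // stand-in for running the script
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (int unit = 0; unit < workUnits; unit++) {
            queue.add(unit);
        }
        for (int i = 0; i < workers.length; i++) {
            queue.add(POISON);
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new int[] { cpuCount.get(), gpuCount.get() };
    }
}
```

Faster workers simply come back to take() more often, so the split between CPU and GPU balances itself without any polling.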

Obviously there is more than one way to do it, and usually the simplest is the best. I would suggest two thread pools: one with 8 threads for CPU tasks and a second with 2 threads for GPU tasks (matching the 8 CPU cores and 2 GPUs in the question). Your work unit manager can submit work to whichever pool has idle threads at the moment (I would recommend synchronizing that block of code). The standard Java ThreadPoolExecutor has a getActiveCount() method you can use for this; see
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html#getActiveCount()

Use Runnables like this:
class CPUGPURunnable implements Runnable {
    @Override
    public void run() {
        if (Thread.currentThread() instanceof CPUGPUThread) {
            CPUGPUThread t = (CPUGPUThread) Thread.currentThread();
            if (t.isGPU())
                runGPU();
            else
                runCPU();
        }
    }
}
CPUGPUThread is a Thread subclass that knows whether it runs in CPU or GPU mode, using a flag. Have a ThreadFactory for the ThreadPoolExecutor that creates either a CPU or a GPU thread. Set up a ThreadPoolExecutor with two workers. Make sure the ThreadFactory creates a CPU and then a GPU thread instance.

I suppose you have two objects that represent the two GPUs, with methods like boolean isFree() and void execute(Runnable). Then you should start 8 threads, each of which loops taking the next job from the queue and hands it to a free GPU if there is one; otherwise the thread executes the job itself.
