How to distribute tasks between threads in ThreadPoolExecutor

How to distribute tasks between threads in ThreadPoolExecutor - java

I have following problem,
I have a queue of tasks and there are a lot of types of tasks like:
A, B, C, D, ...
I execute these tasks in thread pool.
But I have to restrict same type task execution at same time, hence, this is bad:
Thread-1: [A, D, C, B, ...]
Thread-2: [A, C, D, B, ...]
Tasks of type A and B could be executed at same time.
But this is good:
Thread-1: [A,B,A,B,...]
Thread-2: [C,D,D,C,...]
Hence tasks of same type are always executed sequentially.
What is the easiest way to implement this functionality?

This problem easily can be solved with an actor framework like Akka.
For each type of tasks. create an actor.
For each separate task, create a message and send it to the actor of corresponding type. Messages can be of type Runnable, as they probably are now, and the actor's reaction method can be
#Override
public void onReceive(Object msg) {
((Runnable)msg).run();
}
This way your program will run correctly for any number of threads.

I think you can implement your own DistributedThreadPool to control the thread. It's like some kind of topic subscriber/publisher structure.
I did a example as following:
class DistributeThreadPool {
Map<String, TypeThread> TypeCenter = new HashMap<String, TypeThread>();
public void execute(Worker command) {
TypeCenter.get(command.type).accept(command);
}
class TypeThread implements Runnable{
Thread t = null;
LinkedBlockingDeque<Runnable> lbq = null;
public TypeThread() {
lbq = new LinkedBlockingDeque<Runnable>();
}
public void accept(Runnable inRun) {
lbq.add(inRun);
}
public void start() {
t = new Thread(this);
t.start();
}
#Override
public void run() {
while (!Thread.interrupted()) {
try {
lbq.take().run();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
public DistributeThreadPool(String[] Types) {
for (String t : Types) {
TypeThread thread = new TypeThread();
TypeCenter.put(t, thread);
thread.start();
}
}
public static void main(String [] args) {
DistributeThreadPool dtp = new DistributeThreadPool(new String[] {"AB","CD"});
Worker w1 = new Worker("AB",()->System.out.println(Thread.currentThread().getName() +"AB"));
Worker w2 = new Worker("AB",()->System.out.println(Thread.currentThread().getName() +"AB"));
Worker w3 = new Worker("CD",()->System.out.println(Thread.currentThread().getName() +"CD"));
Worker w4 = new Worker("CD",()->System.out.println(Thread.currentThread().getName() +"CD"));
Worker w5 = new Worker("CD",()->System.out.println(Thread.currentThread().getName() +"CD"));
List<Worker> workers = new ArrayList<Worker>();
workers.add(w1);
workers.add(w2);
workers.add(w3);
workers.add(w4);
workers.add(w5);
workers.forEach(e->dtp.execute(e));
}
}

CompletableFuture.supplyAsync(this::doTaskA)
.thenAccept(this::useResultFromTaskAinTaskB);
What's happening above is that Task A and the related Task B are actually run in the same thread (one after the other, no need to "get" a new thread to start running Task B).
Or you can use runAsync for Task A if you don't need any information from it, but do need to wait for it to complete before running Task B.
By default, CompletableFuture's will use the common thread pool, but if you want more control over which ThreadPool gets used, you can pass a 2nd argument to the async methods with your own Executor that uses your own ThreadPool.

Interesting problem. Two questions come to mind:
How many different types of tasks are there?
If there are relatively few, the simplest way may be to create one thread for each type and assign each incoming task to its kind of thread. As long as tasks are balanced between types (and that's a big assumption) utilization will be good enough.
What's the expected timeliness/latency for task completion?
If your problem is flexible on the timeliness, you could batch incoming tasks of each kind by count or time interval, submit each batch you retire to the pool, then await completion of batch to submit another of the same kind.
You can adapt the second alternative to batch sizes as small as one, in which case the mechanics of awaiting completion become important for efficiency. CompletableFuture would fit the bill here; you could chain the "poll next task of type A and submit to pool" action to the task with thenRunAsync, and fire and forget the task.
You would have to maintain one external task queue per task type; the work queues of the FJ pool would be for in-progress tasks only. Still, this design has a good chance of dealing reasonably with imbalance in task count and workload per type.
Hope this helps.

Implement key ordered executor. Each task should have key. Tasks with same keys will be queued and will be executed successively, tasks with different keys will be executed in parallel.
Implementation in netty
You can try to make it yourself, but it is tricky and error prone. I can see few bugs in answer suggested there.

Related

is it safe to store threads in a ConcurrentMap?

I am building a backend service whereby a REST call to my service creates a new thread. The thread waits for another REST call if it does not receive anything by say 5 minutes the thread will die.
To keep track of all the threads I have a collection that keeps track of all the currently running threads so that when the REST call finally comes in such as a user accepting or declining an action, I can then identify that thread using the userID. If its declined we will just remove that thread from the collection if its accepted the thread can carry on doing the next action. i have implemented this using a ConcurrentMap to avoid concurrency issues.
Since this is my first time working with threads I want to make sure that I am not overlooking any issues that may arise. Please have a look at my code and tell me if I could do it better or if there's any flaws.
public class UserAction extends Thread {
int userID;
boolean isAccepted = false;
boolean isDeclined = false;
long timeNow = System.currentTimeMillis();
long timeElapsed = timeNow + 50000;
public UserAction(int userID) {
this.userID = userID;
}
public void declineJob() {
this.isDeclined = true;
}
public void acceptJob() {
this.isAccepted = true;
}
public boolean waitForApproval(){
while (System.currentTimeMillis() < timeElapsed){
System.out.println("waiting for approval");
if (isAccepted) {
return true;
} else if (declined) {
return false;
}
}
return isAccepted;
}
#Override
public void run() {
if (!waitForApproval) {
// mustve timed out or user declined so remove from list and return thread immediately
tCollection.remove(userID);
// end the thread here
return;
}
// mustve been accepted so continue working
}
}
public class Controller {
public static ConcurrentHashMap<Integer, Thread> tCollection = new ConcurrentHashMap<>();
public static void main(String[] args) {
int barberID1 = 1;
int barberID2 = 2;
tCollection.put(barberID1, new UserAction(barberID1));
tCollection.put(barberID2, new UserAction(barberID2));
tCollection.get(barberID1).start();
tCollection.get(barberID2).start();
Thread.sleep(1000);
// simulate REST call accepting/declining job after 1 second. Usually this would be in a spring mvc RESTcontroller in a different class.
tCollection.get(barberID1).acceptJob();
tCollection.get(barberID2).declineJob();
}
}

You don't need (explicit) threads for this. Just a shared pool of task objects that are created on the first rest call.
When the second rest call comes, you already have a thread to use (the one that's handling the rest call). You just need to retrieve the task object according to the user id. You also need to get rid of expired tasks, which can be done with for example a DelayQueue.
Pseudocode:
public void rest1(User u) {
UserTask ut = new UserTask(u);
pool.put(u.getId(), ut);
delayPool.put(ut); // Assuming UserTask implements Delayed with a 5 minute delay
}
public void rest2(User u, Action a) {
UserTask ut = pool.get(u.getId());
if(!a.isAccepted() || ut == null)
pool.remove(u.getId());
else
process(ut);
// Clean up the pool from any expired tasks, can also be done in the beginning
// of the method, if you want to make sure that expired actions aren't performed
while((UserTask u = delayPool.poll()) != null)
pool.remove(u.getId());
}

There's a synchronization issue that you should make your flags isAccepted and isDeclined of class AtomicBoolean.
A critical concept is that you need to take steps to make sure changes to memory in one thread are communicated to other threads that need that data. They're called memory fences and they often occur implicitly between synchronization calls.
The idea of a (simple) Von Neumann architecture with a 'central memory' is false for most modern machines and you need to know data is being shared between caches/threads correctly.
Also as others suggest, creating a thread for each task is a poor model. It scales badly and leaves your application vulnerable to keeling over if too many tasks are submitted. There is some limit to memory so you can only have so many pending tasks at a time but the ceiling for threads will be much lower.
That will be made all the worse because you're spin waiting. Spin waiting puts a thread into a loop waiting for a condition. A better model would wait on a ConditionVariable so threads not doing anything (other than waiting) could be suspended by the operating system until notified that the thing they're waiting for is (or may be) ready.
There are often significant overheads in time and resources to creating and destroying threads. Given that most platforms can be simultaneously only executing a relatively small number of threads creating lots of 'expensive' threads to have them spend most of their time swapped out (suspended) doing nothing is very inefficient.
The right model launches a pool of a fixed number of threads (or relatively fixed number) and places tasks in a shared queue that the threads 'take' work from and process.
That model is known generically as a "Thread Pool".
The entry level implementation you should look at is ThreadPoolExecutor:
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html

Executing two tasks consecutively

There's a thread pool with a single thread that is used to perform tasks submitted by multiple threads. The task is actually comprised of two parts - perform with meaningful result and cleanup that takes quite some time but returns no meaningful result. At the moment (obviously incorrect) implementation looks something like this. Is there an elegant way to ensure that another perform task will be executed only after previous cleanup task?
public class Main {
private static class Worker {
int perform() {
return 1;
}
void cleanup() {
}
}
private static void perform() throws InterruptedException, ExecutionException {
ExecutorService pool = Executors.newFixedThreadPool(1);
Worker w = new Worker();
Future f = pool.submit(() -> w.perform());
pool.submit(w::cleanup);
int x = (int) f.get();
System.out.println(x);
}
}

Is there an elegant way to ensure that another perform task will be executed only after previous cleanup task?
The most obvious thing to do is to call cleanup() from perform() but I assume there is a reason why you aren't doing that.
You say that your solution is currently "obviously incorrect". Why? Because of race conditions? Then you could add a synchronized block:
synchronized (pool) {
Future f = pool.submit(() -> w.perform());
pool.submit(w::cleanup);
}
That would ensure that the cleanup() would come immediately after a perform(). If you are worried about the performance hit with the synchronized, don't be.
Another solution might be to use the ExecutorCompletionService class although I'm not sure how that would help with one thread. I've used it before when I had cleanup tasks running in another thread pool.

If you are using java8, you can do this with CompletableFuture
CompletableFuture.supplyAsync(() -> w.perform(), pool)
.thenApplyAsync(() -> w.cleanup(), pool)
.join();

How to check if the threads have completed its task or not?

OK, I created couples of threads to do some complex task. Now How may I check each threads whether it has completed successfully or not??
class BrokenTasks extends Thread {
public BrokenTasks(){
super();
}
public void run(){
//Some complex tasks related to Networking..
//Example would be fetching some data from the internet and it is not known when can it be finished
}
}
//In another class
BrokenTasks task1 = new BrokenTasks();
BrokenTasks task2 = new BrokenTasks();
BrokenTasks task3 = new BrokenTasks();
BrokenTasks task4 = new BrokenTasks();
task1.start();
.....
task4.start();
So how can I check if these all tasks completed successfully from
i) Main Program (Main Thread)
ii)From each consecutive threads.For example: checking if task1 had ended or not from within task2..

A good way to use threads is not to use them, directly. Instead make a thread pool. Then in your POJO task encapsulation have a field that is only set at the end of computation.
There might be 3-4 milliseconds delay when another thread can see the status - but finally the JVM makes it so. As long as other threads do not over write it. That you can protect by making sure each task has a unique instance of work to do and status, and other threads only poll that every 1-5 seconds or have a listener that the worker calls after completion.
A library I have used is my own
https://github.com/tgkprog/ddt/tree/master/DdtUtils/src/main/java/org/s2n/ddt/util/threads
To use : in server start or static block :
package org.s2n.ddt.util;
import org.apache.log4j.Logger;
import org.junit.Test;
import org.s2n.ddt.util.threads.PoolOptions;
import org.s2n.ddt.util.threads.DdtPools;
public class PoolTest {
private static final Logger logger = Logger.getLogger(PoolTest.class);
#Test
public void test() {
PoolOptions options = new PoolOptions();
options.setCoreThreads(2);
options.setMaxThreads(33);
DdtPools.initPool("a", options);
Do1 p = null;
for (int i = 0; i < 10; i++) {
p = new Do1();
DdtPools.offer("a", p);
}
LangUtils.sleep(3 + (int) (Math.random() * 3));
org.junit.Assert.assertNotNull(p);
org.junit.Assert.assertEquals(Do1.getLs(), 10);
}
}
class Do1 implements Runnable {
volatile static long l = 0;
public Do1() {
l++;
}
public void run() {
// LangUtils.sleep(1 + (int) (Math.random() * 3));
System.out.println("hi " + l);
}
public static long getLs() {
return l;
}
}
Things you should not do:
* Don't do things every 10-15 milliseconds
* Unless academic do not make your own thread
* don't make it more complex then it needs for 97% of cases

You can use Callable and ForkJoinPool for this task.
class BrokenTasks implements Callable {
public BrokenTasks(){
super();
}
public Object call() thrown Exception {
//Some complex tasks related to Networking..
//Example would be fetching some data from the internet and it is not known when can it be finished
}
}
//In another class
BrokenTasks task1 = new BrokenTasks();
BrokenTasks task2 = new BrokenTasks();
BrokenTasks task3 = new BrokenTasks();
BrokenTasks task4 = new BrokenTasks();
ForkJoinPool pool = new ForkJoinPool(4);
Future result1 = pool.submit(task1);
Future result2 = pool.submit(task2);
Future result3 = pool.submit(task3);
Future result4 = pool.submit(task4);
value4 = result4.get();//blocking call
value3 = result3.get();//blocking call
value2 = result2.get();//blocking call
value1 = result1.get();//blocking call
And don't forget to shutdown pool after that.

Classically you simply join on the threads you want to finish. Your thread does not proceed until join completes. For example:
// await all threads
task1.join();
task2.join();
task3.join();
task4.join();
// continue with main thread logic
(I probably would have put the tasks in a list for cleaner handling)

If a thread has not been completed its task then it is still alive. So for testing whether the thread has completed its task you can use isAlive() method.

There are two different questions here
One is if the thread still working.
The other one is if the task still not finished.
Thread is a very expensive method to solve problem, when we start a thread in java, the VM has to store context informations and solve synchronize problems(such as lock). So we usually use thread pool instead of directly thread. The benefit of thread pool is that we can use few thread to handle many different tasks. That means few threads keeps alive, while many tasks are finished.
Don’t find task status from a thread.
Thread is a worker, and tasks are jobs.
A thread may work on many different jobs one by one.
I don’t think we should ask a worker if he has finished a job. I’d rather ask the job if it is finished.
When I want to check if a job is finished, I use signals.
Use signals (synchronization aid)
There are many synchronization aid tools since JDK 1.5 works like a signal.
CountDownLatch
This object provides a counter(can be set only once and count down many times). This counter allows one or more threads to wait until a set of operations being performed in other threads completes.
CyclicBarrier
This is another useful signal that allows a set of threads to all wait for each other to reach a common barrier point.
more tools
More tools could be found in JDK java.util.concurrent package.

You can use Thread.isAlive method, see API: "A thread is alive if it has been started and has not yet died". That is in task2 run() you test task1.isAlive()
To see task1 from task2 you need to pass it as an argument to task2's construtor, or make tasks fields instead of local vars

You can use the following..
task1.join();
task2.join();
task3.join();
task4.join();
// and then check every thread by using isAlive() method
e.g : task1.isAlive();
if it return false means that thread had completed it's task
otherwise it will true

I'm not sure of your exact needs, but some Java application frameworks have handy abstractions for dealing with individual units of work or "jobs". The Eclipse Rich Client Platform comes to mind with its Jobs API. Although it may be overkill.
For plain old Java, look at Future, Callable and Executor.

Java threadpools and runnables creating runnables

Bear with me as I'm not terribly savvy in multithreaded programming...
I'm currently building out a system that uses a ThreadPool ExecutorService for various runnables. That much is straightforward. However, I'm looking at the possibility of having the runnables themselves spawn an additional runnable based on what happens in the original runnable (ie, if success, do this, if fail, do this, etc as some tasks must be complete before others execute). It should be noted that the main thread does not need to be notified of the results of these tasks, although it might be handy for handling exceptions, ie, if an external service cannot be contacted and all threads are throwing exceptions as a result, then stop submitting tasks and periodically check on the external service until it comes back up. This isn't completely necessary, but it would be nice.
Ie, submit Task A. Task A does some things. If everything goes well, Task A will execute Task B. If something doesn't work out properly or an exception is thrown, execute Task C. Each child task may also have additional tasks, but only a few levels deep. I'd much rather do something like this than large, snarled conditionals in a single task as this approach allows for much greater flexibility.
However, I'm not certain how this would affect the thread pool. I would assume that any additional thread(s) created from within a thread in the pool would exist outside of the pool as they themselves were not submitted directly to the pool. Is this a correct assumption? If so, it's likely a bad idea (well, if not, it may not be a very good idea anyway) as it could result in a lot more threads as the original thread completes and a new task is submitted while the thread spawned from the earlier task is still going (and may last considerably longer than others).
I've also considered implementing these as Callables instead and placing a response object in the Future that is returned, then add the appropriate Callable to the thread pool based on the response. However, this would tie all actions back to the main thread, which seems an unnecessary bottleneck. I suppose I could place a Runnable into the pool that itself handles the execution of the Callable and subsequent actions, but then I get twice as many threads.
Am I on the right track here or am I completely off the rails?

I have never used this, but it can be useful for you: http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html

There are many ways to do what you want. You need to be careful you don't end up creating too many threads.
The following is an example, you could make this more efficient with an ExecutorCompletionService and alternatively you could use Runnable's.
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class ThreadsMakeThreads {
public static void main(String[] args) {
new ThreadsMakeThreads().start();
}
public void start() {
//Create resources
ExecutorService threadPool = Executors.newCachedThreadPool();
Random random = new Random(System.currentTimeMillis());
int numberOfThreads = 5;
//Prepare threads
ArrayList<Leader> leaders = new ArrayList<Leader>();
for(int i=0; i < numberOfThreads; i++) {
leaders.add(new Leader(threadPool, random));
}
//Get the results
try {
List<Future<Integer>> results = threadPool.invokeAll(leaders);
for(Future<Integer> result : results) {
System.out.println("Result is " + result.get());
}
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ExecutionException e) {
e.printStackTrace();
}
threadPool.shutdown();
}
class Leader implements Callable<Integer> {
private ExecutorService threadPool;
private Random random;
public Leader(ExecutorService threadPool, Random random) {
this.threadPool = threadPool;
this.random = random;
}
#Override
public Integer call() throws Exception {
int numberOfWorkers = random.nextInt(10);
ArrayList<Worker> workers = new ArrayList<Worker>();
for(int i=0; i < numberOfWorkers; i++) {
workers.add(new Worker(random));
}
List<Future<Integer>> tasks = threadPool.invokeAll(workers);
int result = 0;
for(Future<Integer> task : tasks) {
result += task.get();
}
return result;
}
}
class Worker implements Callable<Integer> {
private Random random;
public Worker(Random random) {
this.random = random;
}
#Override
public Integer call() throws Exception {
return random.nextInt(100);
}
}
}

Submitting tasks to the thread pool from other tasks is quite meaningful idea. But I am afraid you think of running new tasks on separate threads, that really can eat all the memory. Just set a limit to the number of threads when the pool is created, and submit new tasks to that thread pool.
This approach can be further elaborated in different directions. First, treat tasks as ordinary objects, with interface methods, and let that methods decide if they want to submit this object to the thread pool. This requires that each task knows its thread pool - pass it as a parameter at the time of creation. Even more convenient, keep reference to the thread pool as a thread local variable.
You can easily emulate functional programming: an object represents a function call, and for each parameter it has corresponding set method. When all parameters are set, the object is submitted to the thread pool.
Another direction is actor programming: task class has single set method, but it can be called multiple times, and if previous argument is not yet processed, the set method does not submit the task to the thread pool, but simply stores its argument in a queue. The run() method processes all available arguments from the queue and then returns.
All this features are implemented in the dataflow library https://github.com/rfqu/df4j. I wrote it intentionally to support task-based parallelism.

Java Concurrency in Practice: race condition in BoundedExecutor?

There's something odd about the implementation of the BoundedExecutor in the book Java Concurrency in Practice.
It's supposed to throttle task submission to the Executor by blocking the submitting thread when there are enough threads either queued or running in the Executor.
This is the implementation (after adding the missing rethrow in the catch clause):
public class BoundedExecutor {
private final Executor exec;
private final Semaphore semaphore;
public BoundedExecutor(Executor exec, int bound) {
this.exec = exec;
this.semaphore = new Semaphore(bound);
}
public void submitTask(final Runnable command) throws InterruptedException, RejectedExecutionException {
semaphore.acquire();
try {
exec.execute(new Runnable() {
#Override public void run() {
try {
command.run();
} finally {
semaphore.release();
}
}
});
} catch (RejectedExecutionException e) {
semaphore.release();
throw e;
}
}
When I instantiate the BoundedExecutor with an Executors.newCachedThreadPool() and a bound of 4, I would expect the number of threads instantiated by the cached thread pool to never exceed 4. In practice, however, it does. I've gotten this little test program to create as much as 11 threads:
public static void main(String[] args) throws Exception {
class CountingThreadFactory implements ThreadFactory {
int count;
#Override public Thread newThread(Runnable r) {
++count;
return new Thread(r);
}
}
List<Integer> counts = new ArrayList<Integer>();
for (int n = 0; n < 100; ++n) {
CountingThreadFactory countingThreadFactory = new CountingThreadFactory();
ExecutorService exec = Executors.newCachedThreadPool(countingThreadFactory);
try {
BoundedExecutor be = new BoundedExecutor(exec, 4);
for (int i = 0; i < 20000; ++i) {
be.submitTask(new Runnable() {
#Override public void run() {}
});
}
} finally {
exec.shutdown();
}
counts.add(countingThreadFactory.count);
}
System.out.println(Collections.max(counts));
}
I think there's a tiny little time frame between the release of the semaphore and the task ending, where another thread can aquire a permit and submit a task while the releasing thread hasn't finished yet. In other words, it has a race condition.
Can someone confirm this?

BoundedExecutor was indeed intended as an illustration of how to throttle task submission, not as a way to place a bound on thread pool size. There are more direct ways to achieve the latter, as at least one comment pointed out.
But the other answers don't mention the text in the book that says to use an unbounded queue and to
set the bound on the semaphore to be equal to the pool size plus the
number of queued tasks you want to allow, since the semaphore is
bounding the number of tasks both currently executing and awaiting
execution. [JCiP, end of section 8.3.3]
By mentioning unbounded queues and pool size, we were implying (apparently not very clearly) the use of a thread pool of bounded size.
What has always bothered me about BoundedExecutor, however, is that it doesn't implement the ExecutorService interface. A modern way to achieve similar functionality and still implement the standard interfaces would be to use Guava's listeningDecorator method and ForwardingListeningExecutorService class.

You are correct in your analysis of the race condition. There is no synchronization guarantees between the ExecutorService & the Semaphore.
However, I do not know if throttling the number of threads is what the BoundedExecutor is used for. I think it is more for throttling the number of tasks submitted to the service. Imagine if you have 5 million tasks that need to submit, and if you submit more then 10,000 of them you run out of memory.
Well you only will ever have 4 threads running at any given time, why would you want to try and queue up all 5 millions tasks? You can use a construct similar to this to throttle the number of tasks queued up at any given time. What you should get out of this is that at any given time there are only 4 tasks running.
Obviously the resolution to this is to use a Executors.newFixedThreadPool(4).

I see as much as 9 threads created at once. I suspect there is a race condition which causes there to be more thread than required.
This could be because there is before and after running the task work to be done. This means that even though there is only 4 thread inside your block of code, there is a number of thread stopping a previous task or getting ready to start a new task.
i.e. the thread does a release() while it is still running. Even though its the last thing you do its not the last thing it does before acquiring a new task.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.