I have the following function, in pseudo-code:
Result calc(Data data) {
if (data.isFinal()) {
return new Result(data); // This is the actual lengthy calculation
} else {
List<Result> results = new ArrayList<Result>();
for (int i=0; i<data.numOfSubTasks(); ++i) {
results.add(calc(data.subTask(i)));
}
return new Result(results); // merge all results into a single result
}
}
I want to parallelize it, using a fixed number of threads.
My first attempt was:
ExecutorService executorService = Executors.newFixedThreadPool(numOfThreads);
Result calc(Data data) {
if (data.isFinal()) {
return new Result(data); // This is the actual lengthy calculation
} else {
List<Result> results = new ArrayList<Result>();
List<Callable<Void>> callables = new ArrayList<Callable<Void>>();
for (int i=0; i<data.numOfSubTasks(); ++i) {
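final int index = i; // anonymous classes can only capture (effectively) final locals
callables.add(new Callable<Void>() {
public Void call() {
results.add(calc(data.subTask(index)));
return null;
}
});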
}
executorService.invokeAll(callables); // wait for all sub-tasks to complete
return new Result(results); // merge all results into a single result
}
}
However, this quickly got stuck in a deadlock, because, while the top recursion level waits for all threads to finish, the inner levels also wait for threads to become available...
How can I efficiently parallelize my program without deadlocks?
Your problem is a general design problem when using ThreadPoolExecutor for tasks with dependencies.
I see two options:
1) Make sure to submit tasks in a bottom-up order, so that you never have a running task that depends on a task which has not started yet.
2) Use the "direct handoff" strategy (See ThreadPoolExecutor documentation):
ThreadPoolExecutor executor = new ThreadPoolExecutor(poolSize, poolSize, 0, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
executor.setRejectedExecutionHandler(new CallerRunsPolicy());
The idea is using a synchronous queue so that tasks never wait in a real queue. The rejection handler takes care of tasks which don't have an available thread to run on. With this particular handler, the submitter thread runs the rejected tasks.
This executor configuration guarantees that tasks are never rejected, and that you never have deadlocks due to inter-task dependencies.
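Here is a minimal sketch of option 2, assuming Java 8 or later. The class and method names are made up for illustration, and summing a range of integers stands in for the real calc; the point is only that the recursive invokeAll no longer deadlocks, because a sub-task that finds no free thread is run by the thread that submitted it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: summing [from, to) stands in for the real calculation.
class DirectHandoffDemo {

    static long calc(ThreadPoolExecutor executor, int from, int to) throws Exception {
        if (to - from <= 1_000) {
            // "Final" data: do the actual work.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += i;
            }
            return sum;
        }
        // Not final: split into two sub-tasks and hand them to the pool.
        int mid = from + (to - from) / 2;
        List<Callable<Long>> subTasks = new ArrayList<>();
        subTasks.add(() -> calc(executor, from, mid));
        subTasks.add(() -> calc(executor, mid, to));
        long total = 0;
        // invokeAll blocks, but each sub-task is either running on a pool thread
        // or has been run by this thread through CallerRunsPolicy.
        for (Future<Long> result : executor.invokeAll(subTasks)) {
            total += result.get();
        }
        return total;
    }

    static long sumInParallel(int from, int to, int poolSize) throws Exception {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                poolSize, poolSize, 0, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(),
                new ThreadPoolExecutor.CallerRunsPolicy());
        try {
            return calc(executor, from, to);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumInParallel(0, 1_000_000, 4));
    }
}
```

Note that the submitting thread does real work whenever the pool is saturated, so the effective parallelism can be poolSize + 1.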
You should split your approach into two phases:
create the whole tree, down to the nodes where data.isFinal() == true
recursively collect the results (only possible if the merging does not produce other operations/calls)
To do that, you can use Futures to make the results asynchronous. This means that every result of calc will be of type Future<Result>.
Returning a Future immediately frees the current thread and makes room for other tasks to be processed. When collecting the results (new Result(results)), you should wait for all results to be ready (scatter-gather pattern; you can use a semaphore to wait for all results). The collection itself walks the tree, and the checking (or waiting for results to arrive) happens in a single thread.
Overall you build a tree of Futures, that is used to collect the results and perform only the "expensive" operations in the threadpool.
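A rough sketch of such a tree of futures follows. It swaps in CompletableFuture (Java 8+) for the plain Future plus semaphore described above, so the merge step is attached as a callback instead of being waited for in a collector thread. The class name, the leaf size and the integer-sum workload are assumptions for illustration only.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class: summing [from, to) stands in for the real calculation.
class FutureTreeDemo {

    static long leafSum(int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += i;
        }
        return sum;
    }

    // Builds the tree without blocking: only leaves submit work to the pool,
    // inner nodes merely describe how their two children are merged.
    static CompletableFuture<Long> calc(Executor pool, int from, int to) {
        if (to - from <= 1_000) {
            return CompletableFuture.supplyAsync(() -> leafSum(from, to), pool);
        }
        int mid = from + (to - from) / 2;
        CompletableFuture<Long> left = calc(pool, from, mid);
        CompletableFuture<Long> right = calc(pool, mid, to);
        return left.thenCombine(right, Long::sum);
    }

    static long sumInParallel(int from, int to, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // The only blocking call: wait for the root of the tree.
            return calc(pool, from, to).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumInParallel(0, 1_000_000, 4));
    }
}
```

Because no pool thread ever waits for another task, a fixed-size pool cannot deadlock here, whatever the depth of the tree.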
Related
I'm wondering whether there is any advantage to keeping the same threads over the course of the execution of an object, rather than creating new Thread objects on every call. I have an object for which a single (frequently used) method is parallelized using local Thread variables, such that every time the method is called, new Threads (and Runnables) are instantiated. Because the method is called so frequently, a single execution may instantiate upwards of a hundred thousand Thread objects, even though there are never more than a few (~4-6) active at any given time.
Following is a cut down example of how this method is currently implemented, to give a sense of what I mean. For reference, n is of course the pre-determined number of threads to use, whereas this.dataStructure is a (thread-safe) Map which serves as the input to the computation, as well as being modified by the computation. There are other inputs involved, but as they are not relevant to this question, I've omitted their usage. I've also omitted exception handling for the same reason.
Runnable[] tasks = new Runnable[n];
Thread[] threads = new Thread[n];
ArrayBlockingQueue<MyObject> inputs = new ArrayBlockingQueue<>(this.dataStructure.size());
inputs.addAll(this.dataStructure.values());
for (int i = 0; i < n; i++) {
tasks[i] = () -> {
while (true) {
MyObject input = inputs.poll(1L, TimeUnit.MICROSECONDS);
if (input == null) return;
// run computations over this.dataStructure
}
};
threads[i] = new Thread(tasks[i]);
threads[i].start();
}
for (int i = 0; i < n; i++)
threads[i].join();
Because these Threads (and their runnables) always execute the same way using a single ArrayBlockingQueue as input, an alternative to this would be to just "refill the queue" every time the method is called and just re-start the same Threads. This is easily implemented, but I'm unsure as to whether it would make any difference one way or the other. I'm not too familiar with concurrency, so any help is appreciated.
PS.: If there is a more elegant way to handle the polling, that would also be helpful.
It is not possible to start a Thread more than once, but conceptually, the answer to your question is yes.
This is normally accomplished with a thread pool. A thread pool is a set of Threads which rarely actually terminate. Instead, an application passes its task to the thread pool, which picks a Thread in which to run it. The thread pool then decides whether the Thread should be terminated or reused after the task completes.
Java has some classes which make use of thread pools quite easy: ExecutorService and CompletableFuture.
ExecutorService usage typically looks like this:
ExecutorService executor = Executors.newCachedThreadPool();
for (int i = 0; i < n; i++) {
tasks[i] = () -> {
while (true) {
MyObject input = inputs.poll(1L, TimeUnit.MICROSECONDS);
if (input == null) return;
// run computations over this.dataStructure
}
};
executor.submit(tasks[i]);
}
// Doesn't interrupt or halt any tasks. Will wait for them all to finish
// before terminating its threads.
executor.shutdown();
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
Executors has other methods which can create thread pools, like newFixedThreadPool() and newWorkStealingPool(). You can decide for yourself which one best suits your needs.
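To reuse the same threads across many calls of the method, the pool has to outlive a single call. The following is only a sketch under assumptions: the class name, the Integer inputs and the counter are placeholders for the real object, its Map and its computation. It also uses a ConcurrentLinkedQueue, whose poll() returns null immediately when the queue is empty, so no timeout is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the object whose method is called very often.
class ReusablePoolDemo {
    private final int workerCount;
    private final ExecutorService pool; // created once, reused by every call

    ReusablePoolDemo(int workerCount) {
        this.workerCount = workerCount;
        this.pool = Executors.newFixedThreadPool(workerCount);
    }

    // Stand-in for the frequently called method; returns how many inputs were handled.
    int processAll(int itemCount) throws InterruptedException {
        Queue<Integer> inputs = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < itemCount; i++) {
            inputs.add(i);
        }
        AtomicInteger handled = new AtomicInteger();
        List<Callable<Void>> workers = new ArrayList<>();
        for (int w = 0; w < workerCount; w++) {
            workers.add(() -> {
                // Each worker drains the shared queue until it is empty.
                while (inputs.poll() != null) {
                    handled.incrementAndGet(); // the real computation would go here
                }
                return null;
            });
        }
        pool.invokeAll(workers); // blocks until every worker has finished
        return handled.get();
    }

    void close() {
        pool.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        ReusablePoolDemo demo = new ReusablePoolDemo(4);
        System.out.println(demo.processAll(1_000));
        System.out.println(demo.processAll(500)); // same threads, new work
        demo.close();
    }
}
```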
CompletableFuture use might look like this:
Runnable[] tasks = new Runnable[n];
CompletableFuture<?>[] futures = new CompletableFuture<?>[n];
for (int i = 0; i < n; i++) {
tasks[i] = () -> {
while (true) {
MyObject input = inputs.poll(1L, TimeUnit.MICROSECONDS);
if (input == null) return;
// run computations over this.dataStructure
}
};
futures[i] = CompletableFuture.runAsync(tasks[i]);
}
CompletableFuture.allOf(futures).get();
The disadvantage of CompletableFuture is that the tasks cannot be canceled or interrupted. (Calling cancel will mark the task as completing with an exception instead of completing successfully, but the task will not be interrupted.)
By definition, you cannot restart a thread. According to the documentation:
It is never legal to start a thread more than once. In particular, a thread may not be restarted once it has completed execution.
Nevertheless a thread is a valuable resource, and there are implementations to reuse threads. Have a look at the Java Tutorial about Executors.
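A small check of that rule, as a sketch (the class and method names are made up): calling start() a second time on a finished Thread throws IllegalThreadStateException.

```java
// Hypothetical demo class showing that a Thread cannot be started twice.
class RestartDemo {

    static boolean secondStartFails() throws InterruptedException {
        Thread thread = new Thread(() -> { /* nothing to do */ });
        thread.start();
        thread.join(); // the thread has now completed execution
        try {
            thread.start(); // second start of the same Thread object
            return false;
        } catch (IllegalThreadStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(secondStartFails());
    }
}
```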
I have the following problem.
I have a queue of tasks, and there are many types of tasks, such as:
A, B, C, D, ...
I execute these tasks in a thread pool.
But I have to prevent tasks of the same type from executing at the same time. Hence, this is bad:
Thread-1: [A, D, C, B, ...]
Thread-2: [A, C, D, B, ...]
Here, two tasks of type A (or two of type B) could be executed at the same time.
But this is good:
Thread-1: [A,B,A,B,...]
Thread-2: [C,D,D,C,...]
Hence, tasks of the same type are always executed sequentially.
What is the easiest way to implement this functionality?
This problem can easily be solved with an actor framework like Akka.
For each type of task, create an actor.
For each separate task, create a message and send it to the actor of the corresponding type. Messages can be of type Runnable, as they probably are now, and the actor's reaction method can be
@Override
public void onReceive(Object msg) {
((Runnable)msg).run();
}
This way your program will run correctly for any number of threads.
I think you can implement your own DistributedThreadPool to control the threads. It is a kind of topic subscriber/publisher structure.
I wrote an example as follows:
class DistributeThreadPool {
Map<String, TypeThread> TypeCenter = new HashMap<String, TypeThread>();
public void execute(Worker command) {
TypeCenter.get(command.type).accept(command);
}
class TypeThread implements Runnable{
Thread t = null;
LinkedBlockingDeque<Runnable> lbq = null;
public TypeThread() {
lbq = new LinkedBlockingDeque<Runnable>();
}
public void accept(Runnable inRun) {
lbq.add(inRun);
}
public void start() {
t = new Thread(this);
t.start();
}
@Override
public void run() {
while (!Thread.currentThread().isInterrupted()) {
try {
lbq.take().run();
} catch (InterruptedException e) {
Thread.currentThread().interrupt(); // restore the flag so the loop exits
}
}
}
}
public DistributeThreadPool(String[] Types) {
for (String t : Types) {
TypeThread thread = new TypeThread();
TypeCenter.put(t, thread);
thread.start();
}
}
public static void main(String [] args) {
DistributeThreadPool dtp = new DistributeThreadPool(new String[] {"AB","CD"});
Worker w1 = new Worker("AB",()->System.out.println(Thread.currentThread().getName() +"AB"));
Worker w2 = new Worker("AB",()->System.out.println(Thread.currentThread().getName() +"AB"));
Worker w3 = new Worker("CD",()->System.out.println(Thread.currentThread().getName() +"CD"));
Worker w4 = new Worker("CD",()->System.out.println(Thread.currentThread().getName() +"CD"));
Worker w5 = new Worker("CD",()->System.out.println(Thread.currentThread().getName() +"CD"));
List<Worker> workers = new ArrayList<Worker>();
workers.add(w1);
workers.add(w2);
workers.add(w3);
workers.add(w4);
workers.add(w5);
workers.forEach(e->dtp.execute(e));
}
}
CompletableFuture.supplyAsync(this::doTaskA)
.thenAccept(this::useResultFromTaskAinTaskB);
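What's happening above is that Task A and the related Task B typically run in the same thread (one after the other, with no need to "get" a new thread to start running Task B). The exception is when Task A has already finished by the time thenAccept is called; Task B then runs in the calling thread.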
Or you can use runAsync for Task A if you don't need any information from it, but do need to wait for it to complete before running Task B.
By default, CompletableFutures use the common thread pool (ForkJoinPool.commonPool()), but if you want more control over which thread pool is used, you can pass your own Executor as a second argument to the async methods.
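For instance, a sketch of the chain running on your own pool could look like this. The class name and the trivial string "tasks" are placeholders, not part of the original code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class: Task A produces a value, Task B consumes it.
class ChainDemo {

    static String run() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            StringBuilder log = new StringBuilder();
            CompletableFuture
                    .supplyAsync(() -> "A", pool)                   // Task A runs on our own pool
                    .thenAccept(a -> log.append(a).append("->B"))   // Task B starts only after A is done
                    .join();                                        // wait for the whole chain
            return log.toString();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```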
Interesting problem. Two questions come to mind:
How many different types of tasks are there?
If there are relatively few, the simplest way may be to create one thread for each type and assign each incoming task to its kind of thread. As long as tasks are balanced between types (and that's a big assumption) utilization will be good enough.
What's the expected timeliness/latency for task completion?
If your problem is flexible on the timeliness, you could batch incoming tasks of each kind by count or time interval, submit each batch you retire to the pool, then await completion of batch to submit another of the same kind.
You can adapt the second alternative to batch sizes as small as one, in which case the mechanics of awaiting completion become important for efficiency. CompletableFuture would fit the bill here; you could chain the "poll next task of type A and submit to pool" action to the task with thenRunAsync, and fire and forget the task.
You would have to maintain one external task queue per task type; the work queues of the FJ pool would be for in-progress tasks only. Still, this design has a good chance of dealing reasonably with imbalance in task count and workload per type.
Hope this helps.
Implement a key-ordered executor. Each task should have a key. Tasks with the same key will be queued and executed one after another; tasks with different keys will be executed in parallel.
Implementation in netty
You can try to write it yourself, but it is tricky and error-prone. I can see a few bugs in the answer suggested there.
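To give an idea of what a key-ordered executor does, here is a deliberately small sketch, not the Netty implementation. It chains each new task onto the previous task of the same key with CompletableFuture.handleAsync. All names are made up, and the map of "tails" is never cleaned up, so treat it as an illustration rather than production code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: same key => strictly sequential, different keys => parallel.
class KeyedSerialExecutor {
    private final Executor pool;
    // Last submitted task per key; a new task is chained behind it.
    private final ConcurrentHashMap<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

    KeyedSerialExecutor(Executor pool) {
        this.pool = pool;
    }

    CompletableFuture<Void> submit(String key, Runnable task) {
        return tails.compute(key, (k, tail) -> {
            CompletableFuture<Void> previous = tail;
            if (previous == null) {
                previous = CompletableFuture.completedFuture(null);
            }
            // handleAsync runs the task even if the previous one failed.
            return previous.<Void>handleAsync((result, error) -> {
                task.run();
                return null;
            }, pool);
        });
    }

    // Demo: tasks of key "A" append to a non-thread-safe list; order must be preserved.
    static boolean ranInOrder(int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            KeyedSerialExecutor executor = new KeyedSerialExecutor(pool);
            List<Integer> seenA = new ArrayList<>();
            CompletableFuture<Void> lastA = CompletableFuture.completedFuture(null);
            CompletableFuture<Void> lastB = CompletableFuture.completedFuture(null);
            for (int i = 0; i < taskCount; i++) {
                final int n = i;
                lastA = executor.submit("A", () -> seenA.add(n));
                lastB = executor.submit("B", () -> { /* unrelated work, runs in parallel */ });
            }
            CompletableFuture.allOf(lastA, lastB).join();
            if (seenA.size() != taskCount) {
                return false;
            }
            for (int i = 0; i < taskCount; i++) {
                if (seenA.get(i) != i) {
                    return false;
                }
            }
            return true;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(ranInOrder(200));
    }
}
```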
I have a List of 100,000 objects and want to read the List as fast as possible.
I split it into multiple smaller Lists of 500 objects each:
List<List<String>> smallerLists = Lists.partition(bigList, 500);
ExecutorService executor = Executors.newFixedThreadPool(smallerLists.size());
for(int i = 0; i < smallerLists.size();i++) {
MyXMLConverter xmlList = new MyXMLConverter(smallerLists.get(i));
executor.execute(xmlList);
}
executor.shutdown();
while (!executor.isTerminated()) {}
MyXMLConverter.java
Here I again use an executor, with 50 threads, to process each List of 500 objects.
public MyXMLConverter(List<String> data){
this.data = data;
}
@Override
public void run() {
try {
convertLine();
} catch (Exception ex) {}
}
public void convertLine(){
ExecutorService executor = Executors.newFixedThreadPool(50);
for(int i = 0; i < data.size();i++) {
MyConverter worker = new MyConverter(data.get(i));
executor.execute(worker);
}
executor.shutdown();
while (!executor.isTerminated()) {}
}
It takes a lot of time to fetch the objects from the List. Is there a better way to do this? Please suggest.
Since the processing time of each item may vary, it would be better to have each worker thread pull the next item to process directly from the main list, in order to keep all threads busy until the end.
Multi-threaded pulling from a shared list is best done using one of the concurrent collections. In your case, ConcurrentLinkedQueue would be a prime candidate.
So, copy your list into a ConcurrentLinkedQueue (or build the "list" directly as a queue), and let your threads call poll() until it returns null.
If building the list of 100000 elements takes time too, you can even kickstart the process by allowing worker threads to begin their job while the queue is still being built. For this, you'd use a LinkedBlockingQueue, and the workers would call take().
You'd then add a special element to the queue to mark the end, and when a worker gets the end marker, it puts it back in the queue for the next worker, then exits.
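The end-marker variant can be sketched as follows. The class name, the string items and the counter that stands in for the real conversion are assumptions for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: workers start before the queue is fully built.
class PoisonPillDemo {
    // Unique end marker, recognised by identity (==), never by content.
    private static final String END = new String("END");

    static int convertAll(int itemCount, int workerCount) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger converted = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workerCount);
        for (int w = 0; w < workerCount; w++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        String item = queue.take(); // blocks while the queue is empty
                        if (item == END) {
                            queue.put(END);         // pass the marker on to the next worker
                            return;
                        }
                        converted.incrementAndGet(); // the real conversion would go here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // The producer fills the queue while the workers are already consuming.
        for (int i = 0; i < itemCount; i++) {
            queue.put("line " + i);
        }
        queue.put(END);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return converted.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(convertAll(100_000, 4));
    }
}
```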
There are two main problems:
Your code creates 200 * 50 + 200 threads (200 outer pool threads, each starting its own pool of 50).
Many of them do nothing useful and just spin in a busy loop: while (!executor.isTerminated()) {}
I suggest using something like this:
ExecutorService executor = Executors.newFixedThreadPool(COUNT_OF_YOUR_PROCESSOR_CORES * 2);
List<Future<?>> futureList = new ArrayList<Future<?>>();
for(String currentString : bigList) {
MyConverter worker = new MyConverter(currentString);
Future<?> future = executor.submit(worker);
futureList.add(future);
}
Collections.reverse(futureList);
for (Future<?> future : futureList){
future.get();
}
executor.shutdown(); //No worries. All task already executed here
Or, if you are a Java 8 addict:
bigList.parallelStream().forEach(s -> new MyConverter(s).run());
I am executing millions of iterations and I want to parallelize this. Hence I decided to add each iteration as a task to a thread pool.
Now, if I add all the iterations to the thread pool at once, it might throw an OutOfMemoryError. I want to handle that gracefully, so is there any way to find out whether a worker thread in the pool is available?
Once one is available, I would hand the next Runnable to it.
for(long i=0; i<10000000000L; i++) { // the bound does not fit in an int, so use long
executor.submit(new Task(i));
}
Each of those tasks takes only about 1 second to complete.
Why not limit how many tasks can be in flight at once? For example:
Set<Future<?>> futures = new HashSet<>();
int maxConcurrentTasks = 1000;
int totalTasks = 100000000;
int submitted = 0;
while (submitted < totalTasks || !futures.isEmpty()) {
// Top up the pool until the limit of in-flight tasks is reached
while (futures.size() < maxConcurrentTasks && submitted < totalTasks) {
futures.add(executor.submit(new Task(submitted++)));
}
// Forget tasks that have finished, which frees slots for new ones
futures.removeIf(Future::isDone);
}
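The loop above polls the futures in a busy loop. A different technique that avoids the polling, shown here only as a sketch, is a Semaphore: the submitter blocks once the chosen number of tasks is in flight, and every finished task hands its permit back. The class name, the pool size and the counter that stands in for the real Task are assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: at most maxInFlight tasks are queued or running.
class ThrottleDemo {

    static int runAll(int taskCount, int maxInFlight) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            permits.acquire(); // blocks the submitter while maxInFlight tasks are pending
            pool.execute(() -> {
                try {
                    completed.incrementAndGet(); // the real task body would go here
                } finally {
                    permits.release(); // free the slot even if the task throws
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAll(100_000, 1_000));
    }
}
```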
You'll want to use something like this:
ArrayBlockingQueue<Runnable> queue = new ArrayBlockingQueue<Runnable>(MAX_PENDING_TASKS);
ExecutorService executor = new ThreadPoolExecutor(MIN_THREADS, MAX_THREADS, IDLE_TIMEOUT, TimeUnit.SECONDS, queue, new ThreadPoolExecutor.CallerRunsPolicy()); // submit() is declared on ExecutorService, not Executor
for(long i=0; i<10000000000L; i++) {
executor.submit(new Task(i));
}
Basically you create a thread pool with min/max threads and an array backed queue. When you hit the limit of pending tasks, the "caller runs policy" kicks in and your main thread ends up running the next task (giving time for your other tasks to complete and open slots in the queue).
Since you've stated that your tasks are short lived, this seems like an optimal strategy.
The values for MAX_PENDING_TASKS and MIN_THREADS are something you can fiddle with to figure out what the optimal values are for your workload, but MAX_PENDING_TASKS should be at least twice MIN_THREADS and probably more like 10 to 100 times.
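To see the back-pressure in action, here is a self-contained sketch with made-up numbers (2 to 4 threads, 16 pending tasks) and a counter in place of the real Task. It also counts how many tasks ended up running on the submitting thread; that number varies from run to run.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class for a bounded queue combined with CallerRunsPolicy.
class BoundedPoolDemo {

    static int runAll(int taskCount) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(16),
                new ThreadPoolExecutor.CallerRunsPolicy());
        Thread submitter = Thread.currentThread();
        AtomicInteger completed = new AtomicInteger();
        AtomicInteger ranOnSubmitter = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                if (Thread.currentThread() == submitter) {
                    ranOnSubmitter.incrementAndGet(); // queue was full: the caller ran it
                }
                completed.incrementAndGet(); // the real task body would go here
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("tasks run by the submitting thread: " + ranOnSubmitter.get());
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAll(100_000));
    }
}
```

Memory use stays bounded because at most 16 tasks are ever waiting in the queue, however many are submitted in total.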
You should use java.lang.Runtime
The biggest memory issue is probably going to be your Object creation, not in adding them to your Executor, so that's where you should be calling Runtime.getRuntime().freeMemory().
I think I'm doing it wrong. I am creating threads that are supposed to crunch some data from a shared queue. My problem is that the program is slow and a memory hog, and I suspect that the queue may not be as shared as I hoped. I suspect this because I added a line that displays the size of the queue: if I launch 2 threads, I get two outputs with completely different numbers that seem to increment independently. (I thought it could be the same number jumping from 100 to 2 and so on, but after watching, it shows 105 and 5, and they grow at different rates. If I have 4 threads, I see 4 different numbers.)
Here's a snippet of the relevant parts. I create a static class with the data I want in the queue at the top of the program:
static class queue_class {
int number;
int[] data;
Context(int number, int[] data) {
this.number = number;
this.data = data;
}
}
Then I create the queue after sending some jobs to the callable..
static class process_threaded implements Callable<Void> {
// queue with contexts to process
private Queue<queue_class> queue;
process_threaded(queue_class request) {
queue = new ArrayDeque<queue_class>();
queue.add(request);
}
public Void call() {
while(!queue.isEmpty()) {
System.out.println("in contexts queue with a size of " + queue.size());
Context current = contexts.poll();
//get work and process it, if it work great then the solution goes elsewhere
//otherwise, depending on the data, its either discarded or parts of it is added back to queue
queue.add(new queue_class(k, data_list));
As you can see, there are three options for the data: it is sent off if it is good, discarded if it is totally horrible, or sent back to the queue. I think the queues keep growing when data is sent back, and I suspect that is because each thread is working on its own queue rather than a shared one.
Is this guess correct and am I doing this wrong?
You are correct in your assessment that each thread is (probably) working with its own queue, since you are creating a queue in the constructor of your Callable. (It's actually very weird to have a Callable<Void> -- isn't that just a Runnable?)
There are other problems there, for example, the fact that you're working with a queue that isn't thread-safe, or the fact that your code won't compile as it is written.
The important question, though, is do you really need to explicitly create a queue in the first place? Why not have an ExecutorService to which you submit your Callables (or Runnables if you decide to make that switch): Pass a reference to the executor into your Callables, and they can add new Callables to the executor's queue of tasks to run. No need to reinvent the wheel.
For example:
static class process_threaded implements Runnable {
// Reference to an executor
private final ExecutorService exec;
// Reference to the job counter
private final AtomicInteger jobCounter;
// Request to process
private queue_class request;
process_threaded( ExecutorService exec, AtomicInteger counter, queue_class request) {
this.exec = exec;
this.jobCounter = counter;
this.jobCounter.incrementAndGet(); // Assuming that you will always
// submit the process_threaded to
// the executor if you create it.
this.request = request;
}
public void run() {
// Get work and process **request**. If it works, the solution goes elsewhere;
// otherwise, depending on the data, it is either discarded or parts of it are added back to the executor:
exec.submit( new process_threaded( exec, jobCounter, new queue_class(k, data_list) ) );
// Can do some more work
// Always run before returning: counter update and notify the launcher
synchronized(jobCounter){
jobCounter.decrementAndGet();
jobCounter.notifyAll();
}
}
}
Edit:
To solve your problem of when to shut down the executor, I think the simplest solution is to have a job counter, and shutdown when it reaches 0. For thread-safety an AtomicInteger is probably the best choice. I added some code above to incorporate the change. Then your launching code would look something like this:
void theLauncher() throws InterruptedException {
AtomicInteger jobCounter = new AtomicInteger( 0 );
ExecutorService exec = Executors.newFixedThreadPool( Runtime.getRuntime().availableProcessors() );
exec.submit( new process_threaded( exec, jobCounter, someProcessRequest ) );
// Can submit some other things here of course...
// Wait for jobs to complete:
synchronized( jobCounter ){ // wait() must be called while holding the object's monitor
while( jobCounter.get() > 0 )
jobCounter.wait();
}
// Now you can shutdown:
exec.shutdown();
}
Don't reinvent the wheel! How about using ConcurrentLinkedQueue? From the javadocs:
An unbounded thread-safe queue based on linked nodes. This queue orders elements FIFO (first-in-first-out). The head of the queue is that element that has been on the queue the longest time. The tail of the queue is that element that has been on the queue the shortest time. New elements are inserted at the tail of the queue, and the queue retrieval operations obtain elements at the head of the queue. A ConcurrentLinkedQueue is an appropriate choice when many threads will share access to a common collection.
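A sketch of what a genuinely shared queue could look like: one ConcurrentLinkedQueue is created once and handed to every worker, and a counter of pending items tells the workers when everything is done (an empty queue alone is not enough, because another worker may still add items). The splitting of integers is an invented stand-in for "parts of the data go back to the queue", and the busy-waiting with Thread.yield() keeps the example short rather than efficient.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: all workers share ONE queue created outside of them.
class SharedQueueDemo {

    static int process(int initial, int workerCount) throws InterruptedException {
        Queue<Integer> queue = new ConcurrentLinkedQueue<>();
        AtomicInteger pending = new AtomicInteger(); // items queued or being processed
        AtomicInteger handled = new AtomicInteger();
        pending.incrementAndGet();
        queue.add(initial);
        ExecutorService pool = Executors.newFixedThreadPool(workerCount);
        for (int w = 0; w < workerCount; w++) {
            pool.execute(() -> {
                while (pending.get() > 0) {
                    Integer n = queue.poll();
                    if (n == null) {
                        Thread.yield(); // another worker may still add items
                        continue;
                    }
                    handled.incrementAndGet();
                    if (n > 1) {
                        // "Send back to the queue": split the item into two parts.
                        pending.addAndGet(2); // count the new items before queueing them
                        queue.add(n / 2);
                        queue.add(n - n / 2);
                    }
                    pending.decrementAndGet(); // this item is finished
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return handled.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Splitting n down to single units touches 2n - 1 items in total.
        System.out.println(process(1_000, 4));
    }
}
```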