How exactly does invokeAll work? (ForkJoin)


I have written the following snippet:
static private int counter;
public void compute()
{
    if (array.length <= 500)
    {
        for (int i = 0; i < array.length; i++) {
            counter++;
            System.out.println("Ciao this is a recursive action number" + counter + Thread.currentThread().getName());
        }
    }
    else {
        int split = array.length / 2;
        RecursiveActionTry right = new RecursiveActionTry(split);
        RecursiveActionTry left = new RecursiveActionTry(split);
        invokeAll(right, left);
    }
}
I see that invokeAll() automatically forks one of the two RecursiveActionTry objects I pass to it. My laptop has only 2 cores. What if I had 4 cores and launched 4 tasks with invokeAll(right, left, backward, forward); would I use all 4 cores? I cannot tell, as I have only 2 cores.
I would also like to know whether invokeAll(right, left) behind the scenes calls compute() for the first argument (right) and fork + join for the second argument (left), as an extension of RecursiveTask is supposed to do. Otherwise it would not use parallelism, would it?
And by the way, if there are more than 2 arguments.. does it call compute() on the first and fork on all the others?
Thanks in advance.

invokeAll() runs a number of tasks that can execute independently on different threads. This does not require a different core for each thread, but it allows one to be used when cores are available. The details are handled by the underlying machine, but essentially (simplistically), if fewer cores are available than threads, the threads are time-sliced: one executes on a core for a certain amount of time, then another, then another, in a loop.
And by the way, if there are more than 2 arguments.. does it call compute() on the first and fork on all the others?
All of the arguments end up being computed; it is then the responsibility of each compute() method to delegate and fork if the work threshold is not met, and to join the computations when complete. (Splitting more than two ways is unusual, though: fork/join usually works by each recursion splitting the workload in two if necessary.)

The tasks and the worker threads are different things:
WorkerThreads are managed by the ForkJoinPool, and if you use the default constructor it starts WorkerThreads according to Runtime.getRuntime().availableProcessors().
Tasks, however, are created and managed by you. To keep multiple cores busy you must start several tasks. You may split each into two parts or into N parts. While one part is executed directly, the other one(s) are put into a waiting queue. If any other WorkerThreads from the pool are idle and have no work to do, they are supposed to "steal" your forked pending task(s) from the queue and execute them in parallel.
To keep 8 cores / WorkerThreads busy it is not necessary to invoke 8 tasks at once. It is sufficient to fork into at least two tasks which also fork again (recursively), until all WorkerThreads are saturated (assuming your overall problem splits into that many subtasks).
So there is no need to adapt your code if you have more or fewer cores, and your task should not worry about WorkerThread management at all.
Finally, invokeAll() or join() returns after all tasks have been run.
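The recursive splitting described above can be sketched roughly as follows. This is only an illustration, not the asker's original class: CountingAction, countWithPool and the range-based constructor are made-up names, and an AtomicInteger stands in for the static int counter, because counter++ from several worker threads is not thread-safe.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

public class InvokeAllSketch {
    // Hypothetical task: "processes" the half-open range [from, to) by counting its elements.
    static class CountingAction extends RecursiveAction {
        private static final int THRESHOLD = 500;
        private final int from;
        private final int to;
        private final AtomicInteger counter; // thread-safe replacement for a static int

        CountingAction(int from, int to, AtomicInteger counter) {
            this.from = from;
            this.to = to;
            this.counter = counter;
        }

        @Override
        protected void compute() {
            if (to - from <= THRESHOLD) {
                // Small enough: do the work directly in the current worker thread.
                counter.addAndGet(to - from);
            } else {
                // Too big: split in two and let invokeAll run both halves to completion.
                int mid = (from + to) >>> 1;
                invokeAll(new CountingAction(from, mid, counter),
                          new CountingAction(mid, to, counter));
            }
        }
    }

    // Runs the whole computation on a pool with the given parallelism and returns the count.
    static int countWithPool(int size, int parallelism) {
        AtomicInteger counter = new AtomicInteger();
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            pool.invoke(new CountingAction(0, size, counter));
        } finally {
            pool.shutdown();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(countWithPool(10_000, cores));
    }
}
```

The same code runs unchanged with a parallelism of 2 or 8; only the number of worker threads available to steal the forked halves differs.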

Related

Executor pool limit number of threads at a time

I have a situation in which I have to run some 10,000 threads. Obviously, one machine cannot run this many threads in parallel. Is there any way to ask a thread pool to run a specific number of threads at the beginning and, as soon as one thread finishes, let the remaining ones start their processing?

Executors.newFixedThreadPool(nThreads) is most likely what you are looking for. There will only be as many threads running at one time as the number of threads specified. And yes, one machine cannot run 10,000 threads at once in parallel, but it will be able to run them concurrently. Depending on how resource-intensive each thread is, it may be more efficient in your case to use
Executors.newCachedThreadPool(), wherein as many threads are created as needed, and threads that have finished are reused.
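A minimal sketch of the fixed-pool approach, assuming each "thread" in the question is really a short task. FixedPoolSketch and runTasks are made-up names, and the task body just bumps a counter in place of real work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedPoolSketch {
    // Runs taskCount trivial tasks on at most nThreads threads; returns how many completed.
    static int runTasks(int taskCount, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            // Tasks beyond nThreads wait in the pool's internal queue until a thread is free.
            pool.execute(() -> completed.incrementAndGet());
        }
        pool.shutdown(); // no new tasks accepted; queued tasks still run
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(10_000, 8));
    }
}
```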

Using Executors.newFixedThreadPool(10000) with invokeAll can throw an OutOfMemoryError with that many threads. You could still use it by submitting tasks to it one by one instead of invoking all tasks at the same time, which I would say is safer than invokeAll.

For this use case you can use a ThreadPoolExecutor with a BlockingQueue. This tutorial explains that very well: http://howtodoinjava.com/core-java/multi-threading/how-to-use-blockingqueue-and-threadpoolexecutor-in-java/
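As a rough illustration of that idea (not taken from the linked tutorial), a ThreadPoolExecutor can be given a bounded ArrayBlockingQueue. The CallerRunsPolicy used here is my own assumption about how to handle a full queue: the submitting thread runs the task itself, which naturally slows down submission. BoundedQueueSketch and runTasks are made-up names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedQueueSketch {
    // Runs taskCount trivial tasks with nThreads workers and a queue of limited capacity.
    static int runTasks(int taskCount, int nThreads, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                nThreads, nThreads,              // fixed number of worker threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy()); // full queue: caller runs the task
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> completed.incrementAndGet());
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(10_000, 4, 100));
    }
}
```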

It sounds like you want to run 10,000 tasks on a group of threads. A relatively simple approach is to create a List and then add all the tasks to the list, wrapping them in Runnable. Then, create a class that takes the list in the constructor, pops a Runnable off the list, and runs it. This activity must be synchronized in some manner. The class exits when the list is empty. Start some number of threads using this class. They'll burn down the list and then stop. Your main thread can monitor the length of the list.
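A sketch of that hand-rolled approach might look like this. Instead of a List with explicit synchronization it uses a ConcurrentLinkedQueue, which is my substitution to keep the example short; DrainListSketch and drain are made-up names.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class DrainListSketch {
    // Fills a shared queue with taskCount tasks and lets nThreads workers burn it down.
    static int drain(int taskCount, int nThreads) {
        Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            tasks.add(() -> done.incrementAndGet());
        }

        // Each worker keeps polling until the queue is empty, then exits.
        Runnable worker = () -> {
            Runnable next;
            while ((next = tasks.poll()) != null) {
                next.run();
            }
        };

        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(worker);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(drain(10_000, 8));
    }
}
```

This only works because all tasks are queued before the workers start; a worker that finds the queue empty stops for good.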

How to compute approximated values concurrently in Java?

This is probably a question with many possible answers, but I'm asking for the best design, rather than "how can this be done at all".
Let's assume we are implementing a program with a UI that computes Pi. I can hit a "Start" button to start the computation and a "Stop" button to abort the computation, giving me a message box with the highest precision value of Pi computed so far.
I guess the straight forward approach would be starting a Runnable in a new Thread. The runnable computes Pi, and stores the current value in a shared variable, both threads have access to. "Stop" would abort the Thread, and display the shared variable.
I have a feeling this could be implemented more elegantly, though, but I'm not sure how. Maybe using a CompletableFuture?
I'd rather solve this without adding any new libraries to my project, but if you know a library that supports this particularly well, please leave it in the comments.
Obviously, computing Pi will never finish. It would be great though, if the solution also supports e.g. computing the best move in a game of chess. Which will finish, given enough time, but usually has to be aborted, returning the best move so far.

Referring to your examples of approximately computing Pi or the best move in chess, your approximation algorithm has to be iterative in nature, like random sampling for Pi and MCMC for chess. This makes me think of two approaches.
1. Using a threadsafe flag
You can use AtomicBoolean, which is a thread-safe boolean variable. You need to pass it to your Runnable and make it check its state while computing the approximation. At the same time, your button listener which stops the computation is able to set the variable.
2. Computing small chunks
The iterative nature of the algorithm makes it possible to split the computation and later aggregate it again. E.g., if you compute 1000 iterations, you can split them into chunks of 200 iterations, compute these 5 chunks, and aggregate the results.
I would now suggest using an ExecutorCompletionService and a TimerTask. The idea is to compute a small number of iterations, which take only a short amount of time, and repeatedly "refill" the Executor with new Runnables using the TimerTask. Let's say computing 5 Runnables takes 1 second; your TimerTask would then put 5 Runnables into the Executor every second. When you hit the stop button you would stop spawning, wait for the pending tasks to finish, collect their results, and have a result.
Of course you also need a variable which tells the TimerTask to stop after calling the shutdown method of the executor behind the completion service, but this one does not have to be thread-safe. The additional benefit of this approach is that your computation is concurrent and that you can fully utilize any CPU simply by spawning more Runnables. Doing this concurrently allows you to compute more in less time and obtain better approximations.
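The flag approach from option 1 could look roughly like the sketch below. It is only an illustration: StoppablePiSketch and its methods are made-up names, and it uses the Leibniz series rather than random sampling because that keeps the example deterministic. The latest value is published through a volatile field so the stopping thread can read it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class StoppablePiSketch {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private volatile double best; // latest approximation, written only by the worker
    private Thread worker;

    // "Start" button: begin refining the approximation in a background thread.
    void start() {
        worker = new Thread(() -> {
            double sum = 0.0;
            long k = 0;
            do {
                // Leibniz series: pi = 4 * (1 - 1/3 + 1/5 - 1/7 + ...)
                sum += ((k % 2 == 0) ? 4.0 : -4.0) / (2 * k + 1);
                best = sum;
                k++;
            } while (!stopRequested.get()); // always computes at least one term
        });
        worker.start();
    }

    // "Stop" button: raise the flag, wait for the worker, return the best value so far.
    double stop() {
        stopRequested.set(true);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return best;
    }

    // Convenience helper: let the computation run for roughly the given time.
    static double approximateFor(long millis) {
        StoppablePiSketch sketch = new StoppablePiSketch();
        sketch.start();
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sketch.stop();
    }

    public static void main(String[] args) {
        System.out.println(approximateFor(200));
    }
}
```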

Your problem is how to implement a stoppable task that still delivers a result. Approximating values is a good example but can be ignored for the solution.
A FutureTask for example wouldn't work because the contract of those is that they decide themselves when they are done and they can only either have a result or be cancelled.
A shared (e.g. volatile) variable sounds reasonable but has its drawbacks. When it is updated regularly in a tight loop, you might observe worse performance than with a local variable, and reading the state of a shared object is only safe when the object is, e.g., immutable or one can otherwise guarantee that reading and writing happen in the correct order.
You can also build something with a result-delivery BlockingQueue where the computing thread puts the current result (or even regular updates to the result) once interruption is requested.
But the best solution is probably a (shared) CompletableFuture. Sort of a single result-item queue but it has nicer semantics for reporting exceptions.
Example:
CompletableFuture<Integer> sharedFuture = new CompletableFuture<>();
Thread computing = new Thread(() -> {
    int value = 1;
    try {
        while (!Thread.currentThread().isInterrupted() &&
                !sharedFuture.isDone()) { // check could be omitted
            value = value * 32 + 7;
        }
        sharedFuture.complete(value);
    } catch (Throwable t) {
        sharedFuture.completeExceptionally(t);
    }
});
computing.start();
try {
    Thread.sleep((long) (5000 * Math.random()));
} catch (InterruptedException ignored) {
}
computing.interrupt();
System.out.println(sharedFuture.get());
http://ideone.com/8bpEGV
It's not really important how you execute that task. Instead of the Thread above you can also use an ExecutorService and then cancel the Future instead of interrupting the thread.

Java ThreadPool concepts, and issues with controlling the number of actual threads

I am a newbie to Java concurrency and am a bit confused by several concepts and implementation issues here. Hope you guys can help.
Say, I have a list of tasks stored in a thread-safe list wrapper:
ListWrapper jobs = ....
'ListWrapper' has synchronized fetch/push/append functions, and this 'jobs' object will be shared by multiple worker threads.
And I have a worker 'Runnable' to execute the tasks:
public class Worker implements Runnable {
    private ListWrapper jobs;
    public Worker(ListWrapper l) {
        this.jobs = l;
    }
    public void run() {
        while (!jobs.isEmpty()) {
            //fetch an item from jobs and do sth...
        }
    }
}
Now in the main function I execute the tasks:
int NTHREADS = 10;
ExecutorService service = Executors.newFixedThreadPool(NTHREADS);
//run threads..
int x = 3;
for (int i = 0; i < x; i++) {
    service.execute(new Worker(jobs));
}
I tested this code with 'x=3', and I found that only 3 threads are running at the same time; but as I set 'x=20', I found that only 10 (=NTHREADS) are running at the same time. Seems to me the # of actual threads is the min of the two values.
Now my questions are:
1) Which value ('x' or 'NTHREADS') should I set to control the number of concurrent threads? Or it doesn't matter in either I choose?
2) How is this approach different from simply using the Producer-Consumer pattern --creating a fixed number of 'stud' threads to execute the tasks(shown in the code below)?
Thread t1= new Worker(jobs);
Thread t2= new Worker(jobs);
...
t1.join();
t2.join();
...
Thank you very much!!

[[ There are some good answers here but I thought I'd add some more detail. ]]
I tested this code with 'x=3', and I found that only 3 threads are running at the same time; but as I set 'x=20', I found that only 10 (=NTHREADS) are running at the same time. Seems to me the # of actual threads is the min of the two values.
No, not really. I suspect that the reason you weren't seeing 20 threads is that threads had already finished or had yet to be started. If you call new Thread(...).start() 20 times then you will get 20 threads started. However, if you check immediately none of them may have actually begun to run or if you check later they may have finished.
1) Which value ('x' or 'NTHREADS') should I set to control the number of concurrent threads? Or it doesn't matter in either I choose?
Quoting the Javadocs of Executors.newFixedThreadPool(...):
Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue. At any point, at most nThreads threads will be active processing tasks.
So changing the NTHREADS constant changes the number of threads running in the pool. Changing x changes the number of jobs that are executed by those threads. You could have 2 threads in the pool and submit 1000 jobs or you could have 1000 threads and only submit 1 job for them to work on.
Btw, after you have submitted all of your jobs, you should then shut down the pool, which stops all of the threads once all of the jobs have been run.
service.shutdown();
2) How is this approach different from simply using the Producer-Consumer pattern --creating a fixed number of 'stud' threads to execute the tasks(shown in the code below)?
It differs in that it does all of the heavy work for you.
You don't have to create a ListWrapper of the jobs since you get one inside of the ExecutorService. You just submit the jobs to the ExecutorService and it keeps track of them until the threads are available to run them.
You don't have to create any threads or worry about them throwing exceptions and dying because the ExecutorService starts/restarts the threads for you.
If you want your tasks to return information you can make use of the submit(Callable) method and use the Future to get the results of the jobs. Etc, etc..
Doing this code yourself is going to be harder to get right, more code to maintain, and most likely will not perform as well as the code in the JDK that is battle tested and optimized.
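To make the difference between the pool size and the number of submitted jobs concrete, here is a small sketch that submits Callables and records how many of them were running at the same moment. PoolSizeSketch and run are made-up names, and the 20 ms sleep just simulates work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolSizeSketch {
    // Returns {jobs completed, peak number of jobs running at the same time}.
    static int[] run(int nThreads, int jobs) {
        ExecutorService service = Executors.newFixedThreadPool(nThreads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < jobs; i++) {
            final int id = i;
            Callable<Integer> job = () -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // remember the highest value seen
                Thread.sleep(20);                      // simulated work
                running.decrementAndGet();
                return id;
            };
            futures.add(service.submit(job));
        }
        service.shutdown(); // already submitted jobs still run to completion

        int completed = 0;
        for (Future<Integer> future : futures) {
            try {
                future.get(); // blocks until this job has finished
                completed++;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (ExecutionException e) {
                // a failed job is simply not counted in this sketch
            }
        }
        return new int[] {completed, peak.get()};
    }

    public static void main(String[] args) {
        int[] result = run(10, 20);
        System.out.println("completed=" + result[0] + ", peak=" + result[1]);
    }
}
```

With 10 threads and 20 jobs the peak can never exceed 10; with 10 threads and 3 jobs it can never exceed 3.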

You shouldn't create threads by yourself when using a threadpool. Instead of WorkerThread class you should use a class that implements Runnable but is not a thread. Passing a Thread object to the threadpool won't make the thread run actually. The object will be passed to a different internal thread, which will simply execute the run method of your WorkerThread class.
The ExecutorService is simply incompatible with the way you want to write your program.
In the code you have right now, these WorkerThreads will stop to work when your ListWrapper is empty. If you then add something to the list, nothing will happen. This is definitely not what you wanted.
You should get rid of ListWrapper and simply put your tasks directly into the threadpool. The threadpool already incorporates an internal list of jobs shared between the threads. You should just submit your jobs to the threadpool and it will handle them accordingly.
To answer your questions:
1) Which value ('x' or 'NTHREADS') should I set to control the number of concurrent threads? Or it doesn't matter in either I choose?
NTHREADS, the threadpool will create the necessary number of threads.
2) How is this approach different from simply using the Producer-Consumer pattern --creating a fixed number of 'stud' threads to execute the tasks(shown in the code below)?
It's just that ExecutorService automates a lot of things for you. You can choose from a lot of different implementations of threadpools and you can substitute them easily. You can use for instance a scheduled executor. You get extra functionality. Why reinvent the wheel?

For 1) NTHREADS is the maximum threads that the pool will ever run concurrently, but that doesn't mean there will always be that many running. It will only use as many as is needed up to that max value... which in your case is 3.
As the docs say:
At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool-int-
As for 2), using Java's concurrent executors framework is preferred for new code. You get a lot of stuff for free, and it removes the need to handle all of the fiddly thread work yourself.

The number of threads passed into newFixedThreadPool is at most how many threads could be running executing your tasks. If you only have three tasks ever submitted I'd expect the ExecutorService to only create three threads.
To answer your questions:
You should use the number you pass into the constructor to control how many threads are going to be used to execute your tasks.
This differs because of the extra functionality the ExecutorService gives you, as well as the flexibility it gives you, such as when you need to change your ExecutorService type or the number of tasks you'll run (fewer lines of code to change).

All that is happening is the executor service is only creating as many threads as it needs. NTHREADS is effectively the maximum number of threads it'll create.
There is no point creating ten threads up front if it only has 3 tasks to complete, the other 7 will just be hanging around consuming resources.
If you submit more than NTHREADS number of tasks then it will process that number concurrently and the rest will wait on a queue until a thread becomes free.
This isn't any different from creating a fixed set of your own threads, except the thread management and scheduling is handled for you. The executor service also restarts threads if they are killed by rogue exceptions in your task which you'd otherwise have to code for.
See: The Javadoc on Executors.newFixedThreadPool

Balancing multiple queues

I suspect this is really easy but I’m unsure if there’s a naïve way of doing it in Java. Here’s my problem, I have two scripts for processing data and both have the same inputs/outputs except one is written for the single CPU and the other is for GPUs. The work comes from a queue server and I’m trying to write a program that sends the data to either the CPU or GPU script depending on which one is free.
I do not understand how to do this.
I know with executorservice I can specify how many threads I want to keep running but not sure how to balance between two different ones. I have 2 GPU’s and 8 CPU cores on the system and thought I could have threadexecutorservice keep 2 GPU and 8 CPU processes running but unsure how to balance between them since the GPU will be done a lot quicker than the CPU tasks.
Any suggestions on how to approach this? Should I create two queues and keep polling them to see which one is less busy? Or is there a way to just put all the work units (all the same) into one queue and have the GPU or CPU process take from the same queue as they become free?
UPDATE: just to clarify, the CPU/GPU programs are outside the scope of the program I'm making; they are simply scripts that I call via two different methods. I guess the simplified version of what I'm asking is whether two methods can take work from the same queue.

Can two methods take work from the same queue?
Yes, but you should use a BlockingQueue to save yourself some synchronization heartache.
Basically, one option would be to have a producer which places tasks into the queue via BlockingQueue.offer. Then design your CPU/GPU threads to call BlockingQueue.take and perform work on whatever they receive.
For example:
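main (...) {
    BlockingQueue<Task> queue = new LinkedBlockingQueue<>();
    for (int i = 0; i < CPUs; i++) {
        new CPUThread(queue).start();
    }
    for (int i = 0; i < GPUs; i++) {
        new GPUThread(queue).start();
    }
    for (/*all data*/) {
        queue.offer(task);
    }
}
class CPUThread extends Thread {
    private final BlockingQueue<Task> queue;
    CPUThread(BlockingQueue<Task> queue) {
        this.queue = queue;
    }
    public void run() {
        try {
            while (/*some condition*/) {
                Task task = queue.take(); // blocks until a task is available
                //do task work
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop when interrupted
        }
    }
}
//etc...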
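A runnable version of the outline above might look like the following sketch. The names (SharedQueueSketch, process, consume) are made up, work units are plain integers, and the calls to the CPU/GPU scripts are replaced by counters. It also assumes a "poison pill" value to tell the workers that no more work is coming, which is one common way to fill in the /*some condition*/ part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedQueueSketch {
    private static final int POISON = Integer.MIN_VALUE; // hypothetical end-of-work marker

    // Returns {units handled by CPU workers, units handled by GPU workers}.
    static int[] process(int workUnits, int cpuWorkers, int gpuWorkers) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        AtomicInteger cpuDone = new AtomicInteger();
        AtomicInteger gpuDone = new AtomicInteger();

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < cpuWorkers; i++) {
            workers.add(new Thread(() -> consume(queue, cpuDone), "cpu-" + i));
        }
        for (int i = 0; i < gpuWorkers; i++) {
            workers.add(new Thread(() -> consume(queue, gpuDone), "gpu-" + i));
        }
        for (Thread t : workers) {
            t.start();
        }

        // Producer: every work unit goes into the one shared queue.
        for (int unit = 0; unit < workUnits; unit++) {
            queue.add(unit);
        }
        // One poison pill per worker so that each of them terminates.
        for (int i = 0; i < workers.size(); i++) {
            queue.add(POISON);
        }

        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new int[] {cpuDone.get(), gpuDone.get()};
    }

    // Whichever worker is free takes the next unit; faster workers simply take more.
    private static void consume(BlockingQueue<Integer> queue, AtomicInteger done) {
        try {
            while (true) {
                int unit = queue.take();
                if (unit == POISON) {
                    return;
                }
                // the CPU or GPU script would be called here
                done.incrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] result = process(100, 8, 2);
        System.out.println("cpu=" + result[0] + ", gpu=" + result[1]);
    }
}
```

No explicit balancing is needed: a worker only asks for more work once it has finished its current unit.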

Obviously there is more than one way to do it; usually the simplest is the best. I would suggest thread pools: one with 8 threads for CPU tasks and a second with 2 threads for GPU tasks. Your work unit manager can submit work to the pool that has idle threads at the moment (I would recommend synchronizing that block of code). The standard Java ThreadPoolExecutor has a getActiveCount() method you can use for this, see
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html#getActiveCount().

Use Runnables like this:
class CPUGPURunnable implements Runnable {
    public void run() {
        if (Thread.currentThread() instanceof CPUGPUThread) {
            CPUGPUThread t = (CPUGPUThread) Thread.currentThread();
            if (t.isGPU())
                runGPU();
            else
                runCPU();
        }
    }
}
CPUGPUThread is a Thread subclass that knows whether it runs in CPU or GPU mode, using a flag. Have a ThreadFactory for the ThreadPoolExecutor that creates either a CPU or a GPU thread. Set up a ThreadPoolExecutor with two workers. Make sure the ThreadFactory creates a CPU and then a GPU thread instance.

I suppose you have two objects that represent the two GPUs, with methods like boolean isFree() and void execute(Runnable). Then you should start 8 threads which, in a loop, take the next job from the queue and put it on a free GPU if there is one; otherwise the thread executes the job itself.

Kind of load balanced thread pool in java

I am looking for a load balanced thread pool with no success so far. (Not sure whether load balancing is the correct wording).
Let me explain what I try to achieve.
Part 1:
I have jobs, each with 8 to 10 single tasks. On a 6-core CPU I let 8 threads work on these tasks in parallel, which seems to deliver the best performance. When one task is finished, another one can start. Once all ten tasks are finished, the complete job is done. Usually a job is done in 30 to 60 seconds.
Part two:
Sometimes, unfortunately, a job takes more than two hours. This is correct, due to the amount of data that has to be calculated.
The bad thing is that no other job can start while job1 is running (assuming that all threads have the same duration), because it is using all threads.
My first idea:
Have 12 threads, allow up to three jobs in parallel.
BUT: that means the CPU is not fully utilized when there is only 1 job.
I am looking for a solution that gives full CPU power to job one when there is no other job. But when another job needs to be started while one is already running, I want the CPU power allocated to both jobs. And when a third or fourth job shows up, I want the CPU power allocated fairly to all four jobs.
I appreciate your answers...
thanks in advance

One possibility might be to use a standard ThreadPoolExecutor with a different kind of task queue:
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
public class TaskRunner {
    // Wraps a task together with the priority that decides its position in the queue.
    private static class PriorityRunnable implements Runnable,
            Comparable<PriorityRunnable> {
        private Runnable theRunnable;
        private int priority = 0;
        public PriorityRunnable(Runnable r, int priority) {
            this.theRunnable = r;
            this.priority = priority;
        }
        public int getPriority() {
            return priority;
        }
        public void run() {
            theRunnable.run();
        }
        public int compareTo(PriorityRunnable that) {
            // lower number = runs earlier
            return Integer.compare(this.priority, that.priority);
        }
    }
    private BlockingQueue<Runnable> taskQueue = new PriorityBlockingQueue<Runnable>();
    private ThreadPoolExecutor exec = new ThreadPoolExecutor(8, 8, 0L,
            TimeUnit.MILLISECONDS, taskQueue);
    public void runTasks(Runnable... tasks) {
        int priority = 0;
        // Start just behind the task currently at the head of the queue, if any.
        Runnable nextTask = taskQueue.peek();
        if (nextTask instanceof PriorityRunnable) {
            priority = ((PriorityRunnable) nextTask).getPriority() + 1;
        }
        for (Runnable t : tasks) {
            exec.execute(new PriorityRunnable(t, priority));
            priority += 100;
        }
    }
}
The idea here is that when you have a new job you call
taskRunner.runTasks(jobTask1, jobTask2, jobTask3);
and it will queue up the tasks in such a way that they interleave nicely with any existing tasks in the queue (if any). Suppose you have one job queued, whose tasks have priority numbers j1t1=3, j1t2=103, and j1t3=203. In the absence of other jobs, these tasks will execute one after the other as quickly as possible. But if you submit another job with three tasks of its own, these will be assigned priority numbers j2t1=4, j2t2=104 and j2t3=204, meaning the queue now looks like
j1t1, j2t1, j1t2, j2t2, etc.
This is not perfect, however, because if all threads are currently working (on tasks from job 1) then the first task of job 2 can't start until one of the job 1 tasks is complete (unless there's some external way for you to detect this and interrupt and re-queue some of job 1's tasks). The easiest way to make things more fair would be to break down the longer-running tasks into smaller segments and queue those as separate tasks. You need to get to a point where each individual job involves more tasks than there are threads in the pool, so that some of the tasks will always start off in the queue rather than being assigned directly to threads (while the pool is still below its core size, exec.execute() starts a new thread for the task without going through the queue at all).
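The interleaving can be checked in isolation, without any threads, by applying the same numbering scheme to a bare PriorityBlockingQueue. This is only a sketch of the ordering logic; InterleaveSketch, Entry and enqueueJob are made-up names, and the entries carry a label instead of a Runnable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;

public class InterleaveSketch {
    // A queued task, reduced to its label and priority (lower number = taken first).
    static final class Entry implements Comparable<Entry> {
        final String name;
        final int priority;

        Entry(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }

        @Override
        public int compareTo(Entry other) {
            return Integer.compare(priority, other.priority);
        }
    }

    // Same numbering scheme as runTasks: start one past the current head, then step by 100.
    static void enqueueJob(PriorityBlockingQueue<Entry> queue, String job, int taskCount) {
        int priority = 0;
        Entry head = queue.peek();
        if (head != null) {
            priority = head.priority + 1;
        }
        for (int t = 1; t <= taskCount; t++) {
            queue.add(new Entry(job + "t" + t, priority));
            priority += 100;
        }
    }

    // Queues two jobs of three tasks each and returns the order in which they would be taken.
    static List<String> drainOrder() {
        PriorityBlockingQueue<Entry> queue = new PriorityBlockingQueue<>();
        enqueueJob(queue, "j1", 3); // priorities 0, 100, 200
        enqueueJob(queue, "j2", 3); // priorities 1, 101, 201
        List<String> order = new ArrayList<>();
        Entry next;
        while ((next = queue.poll()) != null) {
            order.add(next.name);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(drainOrder()); // [j1t1, j2t1, j1t2, j2t2, j1t3, j2t3]
    }
}
```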

The easiest thing to do is to oversubscribe your CPU, as Kanaga suggests, but start 8 threads each. There may be some overhead from the competition, but if you get to a single job situation, it will fully utilize the CPU. The OS will handle giving time to each thread.
Your "first idea" would also work. The idle threads wouldn't take resources from 8 working threads if they aren't actually executing a task. This wouldn't distribute the cpu resources as evenly when there are multiple jobs running, though.
Do you have a setup where you can test these different pipelines to see how they're performing for you?

Since your machine has a 6-core CPU, I think it is better to have 6 worker threads for each job thread, so that whenever one thread gets a new job, it starts up to six parallel workers to work on that single job. This ensures the full CPU power is used when there is only one job at a time.
Also, please have a look at the fork/join framework in Java 7.
Also learn about newCachedThreadPool():
Java newCachedThreadPool() versus newFixedThreadPool
