So in Java concurrency, there is the concept of a task which is really any implementing Runnable or Callable (and, more specifically, the overridden run() or call() method of that interface).
I'm having a tough time understanding the relationship between:
A task (Runnable/Callable); and
An ExecutorService the task is submitted to; and
An underlying, concurrent work queue or list structure used by the ExecutorService
I believe the relationship is something of the following:
You, the developer, must select which ExecutorService and work structure best suits the task at hand
You initialize the ExecutorService (say, as a ScheduledThreadPool) with the underlying structure to use (say, an ArrayBlockingQueue) (if so, how?!?!)
You submit your task to the ExecutorService which then uses its threading/pooling strategy to populate the given structure (ABQ or otherwise) with copies of the task
Each spawned/pooled thread now pulls copies of the task off of the work structure and executes it
First off, please correct/clarify any of the above assumptions if I am off-base on any of them!
Second, if the task is simply copied/replicated over and over again inside the underlying work structure (e.g., identical copies in each index of a list), then how do you ever decompose a big problem down into smaller (concurrent) ones? In other words, if the task simply does steps A - Z, and you have an ABQ with 1,000 of those tasks, then won't each thread just do A - Z as well? How do you say "some threads should work on A - G, while other threads should work on H, and yet other threads should work on I - Z", etc.?
For this second one I might need a code example to visualize how it all comes together. Thanks in advance.
Your last assumption is not quite right. The ExecutorService does not pull copies of the task. The program must supply all tasks individually to be performed by the ExecutorService. When a task has finished, the next task in the queue is executed.
An ExecutorService is an interface for working with a thread pool. You generally have multiple tasks to be executed on the pool, and each operates on a different part of the problem. As the developer, you must specify which parts of the problem each task should work on when creating it, before sending it to the ExecutorService. The results of each task (assuming they are working on a common problem) should be added to a BlockingQueue or other concurrent collection, where another thread may use the results or wait for all tasks to finish.
Here is an article you may want to read about how to use an ExecutorService: http://www.vogella.com/articles/JavaConcurrency/article.html#threadpools
Update: A common use of the ExecutorService is to implement the producer/consumer pattern. Here is an example I quickly threw together to get you started--it is intended for demonstration purposes only, as some details and concerns have been omitted for simplicity. The thread pool contains multiple producer threads and one consumer thread. The job being performed is to sum the numbers from 0...N. Each producer thread sums a smaller interval of numbers, and publishes the result to the BlockingQueue. The consumer thread processes each result added to the BlockingQueue.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class NumberCounter {
private final ExecutorService pool = Executors.newFixedThreadPool(2);
private final BlockingQueue<Integer> queue = new ArrayBlockingQueue(100);
public void startCounter(int max, int workers) {
// Create multiple tasks to add numbers. Each task submits the result
// to the queue.
int increment = max / workers;
for (int worker = 0; worker < workers; worker++) {
Runnable task = createProducer(worker * increment, (worker + 1) * increment);
pool.execute(task);
}
// Create one more task that will consume the numbers, adding them up
// and printing the results.
pool.execute(new Runnable() {
#Override
public void run() {
int sum = 0;
while (true) {
try {
Integer result = queue.take();
sum += result;
System.out.println("New sum is " + sum);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
});
}
private Runnable createProducer(final int start, final int stop) {
return new Runnable() {
#Override
public void run() {
System.out.println("Worker started counting from " + start + " to " + stop);
int count = 0;
for (int i = start; i < stop; i++) {
count += i;
}
queue.add(count);
}
};
}
public static void main(String[] args) throws InterruptedException {
NumberCounter counter = new NumberCounter();
counter.startCounter(10000, 5);
}
}
Related
package com.playground.concurrency;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
public class MyRunnable implements Runnable {
private String taskName;
public String getTaskName() {
return taskName;
}
public void setTaskName(String taskName) {
this.taskName = taskName;
}
private int processed = 0;
public MyRunnable(String name) {
this.taskName = name;
}
private boolean keepRunning = true;
public boolean isKeepRunning() {
return keepRunning;
}
public void setKeepRunning(boolean keepRunning) {
this.keepRunning = keepRunning;
}
private BlockingQueue<Integer> elements = new LinkedBlockingQueue<Integer>(10);
public BlockingQueue<Integer> getElements() {
return elements;
}
public void setElements(BlockingQueue<Integer> elements) {
this.elements = elements;
}
#Override
public void run() {
while (keepRunning || !elements.isEmpty()) {
try {
Integer element = elements.take();
Thread.sleep(10);
System.out.println(taskName +" :: "+elements.size());
System.out.println("Got :: " + element);
processed++;
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
System.out.println("Exiting thread");
}
public int getProcessed() {
return processed;
}
public void setProcessed(int processed) {
this.processed = processed;
}
}
package com.playground.concurrency.service;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import com.playground.concurrency.MyRunnable;
public class TestService {
public static void main(String[] args) throws InterruptedException {
int roundRobinIndex = 0;
int noOfProcess = 10;
List<MyRunnable> processes = new ArrayList<MyRunnable>();
for (int i = 0; i < noOfProcess; i++) {
processes.add(new MyRunnable("Task : " + i));
}
ExecutorService threadPoolExecutor = Executors.newFixedThreadPool(5);
for (MyRunnable process : processes) {
threadPoolExecutor.execute(process);
}
int totalMessages = 1000;
long start = System.currentTimeMillis();
for (int i = 1; i <= totalMessages; i++) {
processes.get(roundRobinIndex++).getElements().put(i);
if (roundRobinIndex == noOfProcess) {
roundRobinIndex = 0;
}
}
System.out.println("Done putting all the elements");
for (MyRunnable process : processes) {
process.setKeepRunning(false);
}
threadPoolExecutor.shutdown();
try {
threadPoolExecutor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
} catch (InterruptedException e) {
e.printStackTrace();
}
long totalProcessed = 0;
for (MyRunnable process : processes) {
System.out.println("task " + process.getTaskName() + " processd " + process.getProcessed());
totalProcessed += process.getProcessed();
}
long end = System.currentTimeMillis();
System.out.println("total time" + (end - start));
}
}
I have a simple task that reads elements from a LinkedBlockingQueue. I create multiple instances of these tasks and execute by ExecutorService . This programs works as expected when the noOfProcess and thread pool size is same.(For ex: noOfProcess=10 and thread pool size=10).
However , if noOfProcess=10 and thread pool size =5 then the main thread keeps waiting at the below line after processing a few items.
processes.get(roundRobinIndex++).getElements().put(i);
What am i doing wrong here ?
Ah yes. The good old deadlock.
What happens is: You submit 10 Tasks to the ExecutorService, and then send jobs via .put(i). This blocks for Task 5 as expected when its queue is full. Now Task 5 is not currently being executed, and as a matter of fact will never be, since Task 0 to 4 are still clogging up your FixedThreadPool, blocking at .take() in the run() Method waiting for new Jobs from .put(i), which they will never get.
This error is a fundamental design flaw within your code and there are myriads of ways to fix it, one of which being the increased Thread Pool Size.
My suggestion is that you go back to the drawing board and rethink the structure in the main Method.
And since you posted your code, have some tips:
1.:
Posting your entire code can be interpreted as a call to 'pls fix my code', and you are encouraged to omit all uneccessary details (like all those getters and setters). Maybe check https://stackoverflow.com/help/minimal-reproducible-example
2.:
Posting two classes in the same body made things kinda complicated. Split it next time.
3.: (nitpick)
processes.get(roundRobinIndex++).getElements().put(i);
Combining two operations like you did here is bad style since it makes your code less readable for others. You could just have written:
processes.get(i % noOfProcesses).getElements().put(i);
To fix the behavior, you need to do one of the following:
have enough Runnables, each with enough queue capacity to take all 1,000 messages (for example: 100 Runnables with capacity 10 or more; or 10 Runnables with capacity 100 or more), or
have a thread pool that is large enough to accomodate all of your Runnables so that each of them can start running.
Without one of those happening, the ExecutorService will not start the extra Runnables. The main worker thread will continue adding items to each queue, including those of non-running Runnables, until it encounters a queue that is full, at which point it blocks. With 10 Runnables and thread pool size 5, the first queue to fill up will the be the 6th Runnable. This is the same if you had just 6 Runnables. The significant point is that you have at least one more Runnable than you have room in your thread pool.
From newFixedThreadPool() Javadoc:
If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available.
Consider a simpler example of 2 processes and thread pool size of 1. You'll be allowed to create the first process and submit it to the ExecutorService (so the ExecutorService will start and run it). The second process however, will not be allowed to run by the ExecutorService. Your main thread does not pay attention to this, however, and it will continue putting elements into the queue for the second process even though nothing is consuming it.
Your code is ok with noOfProcess=10 and thread pool size=5 – if you also change your queue size to 100, like this: new LinkedBlockingQueue<>(100).
You can observe this behavior – where the queue of a non-running Runnable fills up – if you change this line:
processes.get(roundRobinIndex++).getElements().put(i);
to this (which is the same logical code, but has object references saved for use inside the println() output):
MyRunnable runnable = processes.get(roundRobinIndex++);
BlockingQueue<Integer> elements = runnable.getElements();
System.out.println("attempt to put() for " + runnable.getTaskName() + " with " + elements.size() + " elements");
elements.put(i);
OK, I created couples of threads to do some complex task. Now How may I check each threads whether it has completed successfully or not??
class BrokenTasks extends Thread {
public BrokenTasks(){
super();
}
public void run(){
//Some complex tasks related to Networking..
//Example would be fetching some data from the internet and it is not known when can it be finished
}
}
//In another class
BrokenTasks task1 = new BrokenTasks();
BrokenTasks task2 = new BrokenTasks();
BrokenTasks task3 = new BrokenTasks();
BrokenTasks task4 = new BrokenTasks();
task1.start();
.....
task4.start();
So how can I check if these all tasks completed successfully from
i) Main Program (Main Thread)
ii)From each consecutive threads.For example: checking if task1 had ended or not from within task2..
A good way to use threads is not to use them, directly. Instead make a thread pool. Then in your POJO task encapsulation have a field that is only set at the end of computation.
There might be 3-4 milliseconds delay when another thread can see the status - but finally the JVM makes it so. As long as other threads do not over write it. That you can protect by making sure each task has a unique instance of work to do and status, and other threads only poll that every 1-5 seconds or have a listener that the worker calls after completion.
A library I have used is my own
https://github.com/tgkprog/ddt/tree/master/DdtUtils/src/main/java/org/s2n/ddt/util/threads
To use : in server start or static block :
package org.s2n.ddt.util;
import org.apache.log4j.Logger;
import org.junit.Test;
import org.s2n.ddt.util.threads.PoolOptions;
import org.s2n.ddt.util.threads.DdtPools;
public class PoolTest {
private static final Logger logger = Logger.getLogger(PoolTest.class);
#Test
public void test() {
PoolOptions options = new PoolOptions();
options.setCoreThreads(2);
options.setMaxThreads(33);
DdtPools.initPool("a", options);
Do1 p = null;
for (int i = 0; i < 10; i++) {
p = new Do1();
DdtPools.offer("a", p);
}
LangUtils.sleep(3 + (int) (Math.random() * 3));
org.junit.Assert.assertNotNull(p);
org.junit.Assert.assertEquals(Do1.getLs(), 10);
}
}
class Do1 implements Runnable {
volatile static long l = 0;
public Do1() {
l++;
}
public void run() {
// LangUtils.sleep(1 + (int) (Math.random() * 3));
System.out.println("hi " + l);
}
public static long getLs() {
return l;
}
}
Things you should not do:
* Don't do things every 10-15 milliseconds
* Unless academic do not make your own thread
* don't make it more complex then it needs for 97% of cases
You can use Callable and ForkJoinPool for this task.
class BrokenTasks implements Callable {
public BrokenTasks(){
super();
}
public Object call() thrown Exception {
//Some complex tasks related to Networking..
//Example would be fetching some data from the internet and it is not known when can it be finished
}
}
//In another class
BrokenTasks task1 = new BrokenTasks();
BrokenTasks task2 = new BrokenTasks();
BrokenTasks task3 = new BrokenTasks();
BrokenTasks task4 = new BrokenTasks();
ForkJoinPool pool = new ForkJoinPool(4);
Future result1 = pool.submit(task1);
Future result2 = pool.submit(task2);
Future result3 = pool.submit(task3);
Future result4 = pool.submit(task4);
value4 = result4.get();//blocking call
value3 = result3.get();//blocking call
value2 = result2.get();//blocking call
value1 = result1.get();//blocking call
And don't forget to shutdown pool after that.
Classically you simply join on the threads you want to finish. Your thread does not proceed until join completes. For example:
// await all threads
task1.join();
task2.join();
task3.join();
task4.join();
// continue with main thread logic
(I probably would have put the tasks in a list for cleaner handling)
If a thread has not been completed its task then it is still alive. So for testing whether the thread has completed its task you can use isAlive() method.
There are two different questions here
One is if the thread still working.
The other one is if the task still not finished.
Thread is a very expensive method to solve problem, when we start a thread in java, the VM has to store context informations and solve synchronize problems(such as lock). So we usually use thread pool instead of directly thread. The benefit of thread pool is that we can use few thread to handle many different tasks. That means few threads keeps alive, while many tasks are finished.
Don’t find task status from a thread.
Thread is a worker, and tasks are jobs.
A thread may work on many different jobs one by one.
I don’t think we should ask a worker if he has finished a job. I’d rather ask the job if it is finished.
When I want to check if a job is finished, I use signals.
Use signals (synchronization aid)
There are many synchronization aid tools since JDK 1.5 works like a signal.
CountDownLatch
This object provides a counter(can be set only once and count down many times). This counter allows one or more threads to wait until a set of operations being performed in other threads completes.
CyclicBarrier
This is another useful signal that allows a set of threads to all wait for each other to reach a common barrier point.
more tools
More tools could be found in JDK java.util.concurrent package.
You can use Thread.isAlive method, see API: "A thread is alive if it has been started and has not yet died". That is in task2 run() you test task1.isAlive()
To see task1 from task2 you need to pass it as an argument to task2's construtor, or make tasks fields instead of local vars
You can use the following..
task1.join();
task2.join();
task3.join();
task4.join();
// and then check every thread by using isAlive() method
e.g : task1.isAlive();
if it return false means that thread had completed it's task
otherwise it will true
I'm not sure of your exact needs, but some Java application frameworks have handy abstractions for dealing with individual units of work or "jobs". The Eclipse Rich Client Platform comes to mind with its Jobs API. Although it may be overkill.
For plain old Java, look at Future, Callable and Executor.
Bear with me as I'm not terribly savvy in multithreaded programming...
I'm currently building out a system that uses a ThreadPool ExecutorService for various runnables. That much is straightforward. However, I'm looking at the possibility of having the runnables themselves spawn an additional runnable based on what happens in the original runnable (ie, if success, do this, if fail, do this, etc as some tasks must be complete before others execute). It should be noted that the main thread does not need to be notified of the results of these tasks, although it might be handy for handling exceptions, ie, if an external service cannot be contacted and all threads are throwing exceptions as a result, then stop submitting tasks and periodically check on the external service until it comes back up. This isn't completely necessary, but it would be nice.
Ie, submit Task A. Task A does some things. If everything goes well, Task A will execute Task B. If something doesn't work out properly or an exception is thrown, execute Task C. Each child task may also have additional tasks, but only a few levels deep. I'd much rather do something like this than large, snarled conditionals in a single task as this approach allows for much greater flexibility.
However, I'm not certain how this would affect the thread pool. I would assume that any additional thread(s) created from within a thread in the pool would exist outside of the pool as they themselves were not submitted directly to the pool. Is this a correct assumption? If so, it's likely a bad idea (well, if not, it may not be a very good idea anyway) as it could result in a lot more threads as the original thread completes and a new task is submitted while the thread spawned from the earlier task is still going (and may last considerably longer than others).
I've also considered implementing these as Callables instead and placing a response object in the Future that is returned, then add the appropriate Callable to the thread pool based on the response. However, this would tie all actions back to the main thread, which seems an unnecessary bottleneck. I suppose I could place a Runnable into the pool that itself handles the execution of the Callable and subsequent actions, but then I get twice as many threads.
Am I on the right track here or am I completely off the rails?
I have never used this, but it can be useful for you: http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html
There are many ways to do what you want. You need to be careful you don't end up creating too many threads.
The following is an example, you could make this more efficient with an ExecutorCompletionService and alternatively you could use Runnable's.
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class ThreadsMakeThreads {
public static void main(String[] args) {
new ThreadsMakeThreads().start();
}
public void start() {
//Create resources
ExecutorService threadPool = Executors.newCachedThreadPool();
Random random = new Random(System.currentTimeMillis());
int numberOfThreads = 5;
//Prepare threads
ArrayList<Leader> leaders = new ArrayList<Leader>();
for(int i=0; i < numberOfThreads; i++) {
leaders.add(new Leader(threadPool, random));
}
//Get the results
try {
List<Future<Integer>> results = threadPool.invokeAll(leaders);
for(Future<Integer> result : results) {
System.out.println("Result is " + result.get());
}
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ExecutionException e) {
e.printStackTrace();
}
threadPool.shutdown();
}
class Leader implements Callable<Integer> {
private ExecutorService threadPool;
private Random random;
public Leader(ExecutorService threadPool, Random random) {
this.threadPool = threadPool;
this.random = random;
}
#Override
public Integer call() throws Exception {
int numberOfWorkers = random.nextInt(10);
ArrayList<Worker> workers = new ArrayList<Worker>();
for(int i=0; i < numberOfWorkers; i++) {
workers.add(new Worker(random));
}
List<Future<Integer>> tasks = threadPool.invokeAll(workers);
int result = 0;
for(Future<Integer> task : tasks) {
result += task.get();
}
return result;
}
}
class Worker implements Callable<Integer> {
private Random random;
public Worker(Random random) {
this.random = random;
}
#Override
public Integer call() throws Exception {
return random.nextInt(100);
}
}
}
Submitting tasks to the thread pool from other tasks is quite meaningful idea. But I am afraid you think of running new tasks on separate threads, that really can eat all the memory. Just set a limit to the number of threads when the pool is created, and submit new tasks to that thread pool.
This approach can be further elaborated in different directions. First, treat tasks as ordinary objects, with interface methods, and let that methods decide if they want to submit this object to the thread pool. This requires that each task knows its thread pool - pass it as a parameter at the time of creation. Even more convenient, keep reference to the thread pool as a thread local variable.
You can easily emulate functional programming: an object represents a function call, and for each parameter it has corresponding set method. When all parameters are set, the object is submitted to the thread pool.
Another direction is actor programming: task class has single set method, but it can be called multiple times, and if previous argument is not yet processed, the set method does not submit the task to the thread pool, but simply stores its argument in a queue. The run() method processes all available arguments from the queue and then returns.
All this features are implemented in the dataflow library https://github.com/rfqu/df4j. I wrote it intentionally to support task-based parallelism.
There's something odd about the implementation of the BoundedExecutor in the book Java Concurrency in Practice.
It's supposed to throttle task submission to the Executor by blocking the submitting thread when there are enough threads either queued or running in the Executor.
This is the implementation (after adding the missing rethrow in the catch clause):
public class BoundedExecutor {
private final Executor exec;
private final Semaphore semaphore;
public BoundedExecutor(Executor exec, int bound) {
this.exec = exec;
this.semaphore = new Semaphore(bound);
}
public void submitTask(final Runnable command) throws InterruptedException, RejectedExecutionException {
semaphore.acquire();
try {
exec.execute(new Runnable() {
#Override public void run() {
try {
command.run();
} finally {
semaphore.release();
}
}
});
} catch (RejectedExecutionException e) {
semaphore.release();
throw e;
}
}
When I instantiate the BoundedExecutor with an Executors.newCachedThreadPool() and a bound of 4, I would expect the number of threads instantiated by the cached thread pool to never exceed 4. In practice, however, it does. I've gotten this little test program to create as much as 11 threads:
public static void main(String[] args) throws Exception {
class CountingThreadFactory implements ThreadFactory {
int count;
#Override public Thread newThread(Runnable r) {
++count;
return new Thread(r);
}
}
List<Integer> counts = new ArrayList<Integer>();
for (int n = 0; n < 100; ++n) {
CountingThreadFactory countingThreadFactory = new CountingThreadFactory();
ExecutorService exec = Executors.newCachedThreadPool(countingThreadFactory);
try {
BoundedExecutor be = new BoundedExecutor(exec, 4);
for (int i = 0; i < 20000; ++i) {
be.submitTask(new Runnable() {
#Override public void run() {}
});
}
} finally {
exec.shutdown();
}
counts.add(countingThreadFactory.count);
}
System.out.println(Collections.max(counts));
}
I think there's a tiny little time frame between the release of the semaphore and the task ending, where another thread can aquire a permit and submit a task while the releasing thread hasn't finished yet. In other words, it has a race condition.
Can someone confirm this?
BoundedExecutor was indeed intended as an illustration of how to throttle task submission, not as a way to place a bound on thread pool size. There are more direct ways to achieve the latter, as at least one comment pointed out.
But the other answers don't mention the text in the book that says to use an unbounded queue and to
set the bound on the semaphore to be equal to the pool size plus the
number of queued tasks you want to allow, since the semaphore is
bounding the number of tasks both currently executing and awaiting
execution. [JCiP, end of section 8.3.3]
By mentioning unbounded queues and pool size, we were implying (apparently not very clearly) the use of a thread pool of bounded size.
What has always bothered me about BoundedExecutor, however, is that it doesn't implement the ExecutorService interface. A modern way to achieve similar functionality and still implement the standard interfaces would be to use Guava's listeningDecorator method and ForwardingListeningExecutorService class.
You are correct in your analysis of the race condition. There is no synchronization guarantees between the ExecutorService & the Semaphore.
However, I do not know if throttling the number of threads is what the BoundedExecutor is used for. I think it is more for throttling the number of tasks submitted to the service. Imagine if you have 5 million tasks that need to submit, and if you submit more then 10,000 of them you run out of memory.
Well you only will ever have 4 threads running at any given time, why would you want to try and queue up all 5 millions tasks? You can use a construct similar to this to throttle the number of tasks queued up at any given time. What you should get out of this is that at any given time there are only 4 tasks running.
Obviously the resolution to this is to use a Executors.newFixedThreadPool(4).
I see as much as 9 threads created at once. I suspect there is a race condition which causes there to be more thread than required.
This could be because there is before and after running the task work to be done. This means that even though there is only 4 thread inside your block of code, there is a number of thread stopping a previous task or getting ready to start a new task.
i.e. the thread does a release() while it is still running. Even though its the last thing you do its not the last thing it does before acquiring a new task.
I'm trying to figure out how to correctly use Java's Executors. I realize submitting tasks to an ExecutorService has its own overhead. However, I'm surprised to see it is as high as it is.
My program needs to process huge amount of data (stock market data) with as low latency as possible. Most of the calculations are fairly simple arithmetic operations.
I tried to test something very simple: "Math.random() * Math.random()"
The simplest test runs this computation in a simple loop. The second test does the same computation inside a anonymous Runnable (this is supposed to measure the cost of creating new objects). The third test passes the Runnable to an ExecutorService (this measures the cost of introducing executors).
I ran the tests on my dinky laptop (2 cpus, 1.5 gig ram):
(in milliseconds)
simpleCompuation:47
computationWithObjCreation:62
computationWithObjCreationAndExecutors:422
(about once out of four runs, the first two numbers end up being equal)
Notice that executors take far, far more time than executing on a single thread. The numbers were about the same for thread pool sizes between 1 and 8.
Question: Am I missing something obvious or are these results expected? These results tell me that any task I pass in to an executor must do some non-trivial computation. If I am processing millions of messages, and I need to perform very simple (and cheap) transformations on each message, I still may not be able to use executors...trying to spread computations across multiple CPUs might end up being costlier than just doing them in a single thread. The design decision becomes much more complex than I had originally thought. Any thoughts?
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class ExecServicePerformance {
private static int count = 100000;
public static void main(String[] args) throws InterruptedException {
//warmup
simpleCompuation();
computationWithObjCreation();
computationWithObjCreationAndExecutors();
long start = System.currentTimeMillis();
simpleCompuation();
long stop = System.currentTimeMillis();
System.out.println("simpleCompuation:"+(stop-start));
start = System.currentTimeMillis();
computationWithObjCreation();
stop = System.currentTimeMillis();
System.out.println("computationWithObjCreation:"+(stop-start));
start = System.currentTimeMillis();
computationWithObjCreationAndExecutors();
stop = System.currentTimeMillis();
System.out.println("computationWithObjCreationAndExecutors:"+(stop-start));
}
private static void computationWithObjCreation() {
for(int i=0;i<count;i++){
new Runnable(){
#Override
public void run() {
double x = Math.random()*Math.random();
}
}.run();
}
}
private static void simpleCompuation() {
for(int i=0;i<count;i++){
double x = Math.random()*Math.random();
}
}
private static void computationWithObjCreationAndExecutors()
throws InterruptedException {
ExecutorService es = Executors.newFixedThreadPool(1);
for(int i=0;i<count;i++){
es.submit(new Runnable() {
#Override
public void run() {
double x = Math.random()*Math.random();
}
});
}
es.shutdown();
es.awaitTermination(10, TimeUnit.SECONDS);
}
}
Using executors is about utilizing CPUs and / or CPU cores, so if you create a thread pool that utilizes the amount of CPUs at best, you have to have as many threads as CPUs / cores.
You are right, creating new objects costs too much. So one way to reduce the expenses is to use batches. If you know the kind and amount of computations to do, you create batches. So think about thousand(s) computations done in one executed task. You create batches for each thread. As soon as the computation is done (java.util.concurrent.Future), you create the next batch. Even the creation of new batches can be done in parralel (4 CPUs -> 3 threads for computation, 1 thread for batch provisioning). In the end, you may end up with more throughput, but with higher memory demands (batches, provisioning).
Edit: I changed your example and I let it run on my little dual-core x200 laptop.
provisioned 2 batches to be executed
simpleCompuation:14
computationWithObjCreation:17
computationWithObjCreationAndExecutors:9
As you see in the source code, I took the batch provisioning and executor lifecycle out of the measurement, too. That's more fair compared to the other two methods.
See the results by yourself...
import java.util.List;
import java.util.Vector;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class ExecServicePerformance {
private static int count = 100000;
public static void main( String[] args ) throws InterruptedException {
final int cpus = Runtime.getRuntime().availableProcessors();
final ExecutorService es = Executors.newFixedThreadPool( cpus );
final Vector< Batch > batches = new Vector< Batch >( cpus );
final int batchComputations = count / cpus;
for ( int i = 0; i < cpus; i++ ) {
batches.add( new Batch( batchComputations ) );
}
System.out.println( "provisioned " + cpus + " batches to be executed" );
// warmup
simpleCompuation();
computationWithObjCreation();
computationWithObjCreationAndExecutors( es, batches );
long start = System.currentTimeMillis();
simpleCompuation();
long stop = System.currentTimeMillis();
System.out.println( "simpleCompuation:" + ( stop - start ) );
start = System.currentTimeMillis();
computationWithObjCreation();
stop = System.currentTimeMillis();
System.out.println( "computationWithObjCreation:" + ( stop - start ) );
// Executor
start = System.currentTimeMillis();
computationWithObjCreationAndExecutors( es, batches );
es.shutdown();
es.awaitTermination( 10, TimeUnit.SECONDS );
// Note: Executor#shutdown() and Executor#awaitTermination() requires
// some extra time. But the result should still be clear.
stop = System.currentTimeMillis();
System.out.println( "computationWithObjCreationAndExecutors:"
+ ( stop - start ) );
}
private static void computationWithObjCreation() {
for ( int i = 0; i < count; i++ ) {
new Runnable() {
#Override
public void run() {
double x = Math.random() * Math.random();
}
}.run();
}
}
private static void simpleCompuation() {
for ( int i = 0; i < count; i++ ) {
double x = Math.random() * Math.random();
}
}
private static void computationWithObjCreationAndExecutors(
ExecutorService es, List< Batch > batches )
throws InterruptedException {
for ( Batch batch : batches ) {
es.submit( batch );
}
}
private static class Batch implements Runnable {
private final int computations;
public Batch( final int computations ) {
this.computations = computations;
}
#Override
public void run() {
int countdown = computations;
while ( countdown-- > -1 ) {
double x = Math.random() * Math.random();
}
}
}
}
This is not a fair test for the thread pool for following reasons,
You are not taking advantage of the pooling at all because you only have 1 thread.
The job is too simple that the pooling overhead can't be justified. A multiplication on a CPU with FPP only takes a few cycles.
Considering following extra steps the thread pool has to do besides object creation and the running the job,
Put the job in the queue
Remove the job from queue
Get the thread from the pool and execute the job
Return the thread to the pool
When you have a real job and multiple threads, the benefit of the thread pool will be apparent.
The 'overhead' you mention is nothing to do with ExecutorService, it is caused by multiple threads synchronizing on Math.random, creating lock contention.
So yes, you are missing something (and the 'correct' answer below is not actually correct).
Here is some Java 8 code to demonstrate 8 threads running a simple function in which there is no lock contention:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleFunction;
import com.google.common.base.Stopwatch;
public class ExecServicePerformance {
private static final int repetitions = 120;
private static int totalOperations = 250000;
private static final int cpus = 8;
private static final List<Batch> batches = batches(cpus);
private static DoubleFunction<Double> performanceFunc = (double i) -> {return Math.sin(i * 100000 / Math.PI); };
public static void main( String[] args ) throws InterruptedException {
printExecutionTime("Synchronous", ExecServicePerformance::synchronous);
printExecutionTime("Synchronous batches", ExecServicePerformance::synchronousBatches);
printExecutionTime("Thread per batch", ExecServicePerformance::asynchronousBatches);
printExecutionTime("Executor pool", ExecServicePerformance::executorPool);
}
private static void printExecutionTime(String msg, Runnable f) throws InterruptedException {
long time = 0;
for (int i = 0; i < repetitions; i++) {
Stopwatch stopwatch = Stopwatch.createStarted();
f.run(); //remember, this is a single-threaded synchronous execution since there is no explicit new thread
time += stopwatch.elapsed(TimeUnit.MILLISECONDS);
}
System.out.println(msg + " exec time: " + time);
}
private static void synchronous() {
for ( int i = 0; i < totalOperations; i++ ) {
performanceFunc.apply(i);
}
}
private static void synchronousBatches() {
for ( Batch batch : batches) {
batch.synchronously();
}
}
private static void asynchronousBatches() {
CountDownLatch cb = new CountDownLatch(cpus);
for ( Batch batch : batches) {
Runnable r = () -> { batch.synchronously(); cb.countDown(); };
Thread t = new Thread(r);
t.start();
}
try {
cb.await();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
private static void executorPool() {
final ExecutorService es = Executors.newFixedThreadPool(cpus);
for ( Batch batch : batches ) {
Runnable r = () -> { batch.synchronously(); };
es.submit(r);
}
es.shutdown();
try {
es.awaitTermination( 10, TimeUnit.SECONDS );
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
private static List<Batch> batches(final int cpus) {
List<Batch> list = new ArrayList<Batch>();
for ( int i = 0; i < cpus; i++ ) {
list.add( new Batch( totalOperations / cpus ) );
}
System.out.println("Batches: " + list.size());
return list;
}
private static class Batch {
private final int operationsInBatch;
public Batch( final int ops ) {
this.operationsInBatch = ops;
}
public void synchronously() {
for ( int i = 0; i < operationsInBatch; i++ ) {
performanceFunc.apply(i);
}
}
}
}
Result timings for 120 tests of 25k operations (ms):
Synchronous exec time: 9956
Synchronous batches exec time: 9900
Thread per batch exec time: 2176
Executor pool exec time: 1922
Winner: Executor Service.
I don't think this is at all realistic since you're creating a new executor service every time you make the method call. Unless you have very strange requirements that seems unrealistic - typically you'd create the service when your app starts up, and then submit jobs to it.
If you try the benchmarking again but initialise the service as a field, once, outside the timing loop; then you'll see the actual overhead of submitting Runnables to the service vs. running them yourself.
But I don't think you've grasped the point fully - Executors aren't meant to be there for efficiency, they're there to make co-ordinating and handing off work to a thread pool simpler. They will always be less efficient than just invoking Runnable.run() yourself (since at the end of the day the executor service still needs to do this, after doing some extra housekeeping beforehand). It's when you are using them from multiple threads needing asynchronous processing, that they really shine.
Also consider that you're looking at the relative time difference of a basically fixed cost (Executor overhead is the same whether your tasks take 1ms or 1hr to run) compared to a very small variable amount (your trivial runnable). If the executor service takes 5ms extra to run a 1ms task, that's not a very favourable figure. If it takes 5ms extra to run a 5 second task (e.g. a non-trivial SQL query), that's completely negligible and entirely worth it.
So to some extent it depends on your situation - if you have an extremely time-critical section, running lots of small tasks, that don't need to be executed in parallel or asynchronously then you'll get nothing from an Executor. If you're processing heavier tasks in parallel and want to respond asynchronously (e.g. a webapp) then Executors are great.
Whether they are the best choice for you depends on your situation, but really you need to try the tests with realistic representative data. I don't think it would be appropriate to draw any conclusions from the tests you've done unless your tasks really are that trivial (and you don't want to reuse the executor instance...).
Math.random() actually synchronizes on a single Random number generator. Calling Math.random() results in significant contention for the number generator. In fact the more threads you have, the slower it's going to be.
From the Math.random() javadoc:
This method is properly synchronized to allow correct use by more than
one thread. However, if many threads need to generate pseudorandom
numbers at a great rate, it may reduce contention for each thread to
have its own pseudorandom-number generator.
Firstly there's a few issues with the microbenchmark. You do a warm up, which is good. However, it is better to run the test multiple times, which should give a feel as to whether it has really warmed up and the variance of the results. It also tends to be better to do the test of each algorithm in separate runs, otherwise you might cause deoptimisation when an algorithm changes.
The task is very small, although I'm not entirely sure how small. So number of times faster is pretty meaningless. In multithreaded situations, it will touch the same volatile locations so threads could cause really bad performance (use a Random instance per thread). Also a run of 47 milliseconds is a bit short.
Certainly going to another thread for a tiny operation is not going to be fast. Split tasks up into bigger sizes if possible. JDK7 looks as if it will have a fork-join framework, which attempts to support fine tasks from divide and conquer algorithms by preferring to execute tasks on the same thread in order, with larger tasks pulled out by idle threads.
Here are results on my machine (OpenJDK 8 on 64-bit Ubuntu 14.0, Thinkpad W530)
simpleCompuation:6
computationWithObjCreation:5
computationWithObjCreationAndExecutors:33
There's certainly overhead. But remember what these numbers are: milliseconds for 100k iterations. In your case, the overhead was about 4 microseconds per iteration. For me, the overhead was about a quarter of a microsecond.
The overhead is synchronization, internal data structures, and possibly a lack of JIT optimization due to complex code paths (certainly more complex than your for loop).
The tasks that you'd actually want to parallelize would be worth it, despite the quarter microsecond overhead.
FYI, this would be a very bad computation to parallelize. I upped the thread to 8 (the number of cores):
simpleCompuation:5
computationWithObjCreation:6
computationWithObjCreationAndExecutors:38
It didn't make it any faster. This is because Math.random() is synchronized.
The Fixed ThreadPool's ultimate porpose is to reuse already created threads. So the performance gains are seen in the lack of the need to recreate a new thread every time a task is submitted. Hence the stop time must be taken inside the submitted task. Just with in the last statement of the run method.
You need to somehow group execution, in order to submit larger portions of computation to each thread (e.g. build groups based on stock symbol).
I got best results in similar scenarios by using the Disruptor. It has a very low per-job overhead. Still its important to group jobs, naive round robin usually creates many cache misses.
see http://java-is-the-new-c.blogspot.de/2014/01/comparision-of-different-concurrency.html
In case it is useful to others, here are test results with a realistic scenario - use ExecutorService repeatedly until the end of all tasks - on a Samsung Android device.
Simple computation (MS): 102
Use threads (MS): 31049
Use ExecutorService (MS): 257
Code:
ExecutorService executorService = Executors.newFixedThreadPool(1);
int count = 100000;
//Simple computation
Instant instant = Instant.now();
for (int i = 0; i < count; i++) {
double x = Math.random() * Math.random();
}
Duration duration = Duration.between(instant, Instant.now());
Log.d("ExecutorPerformanceTest", "Simple computation (MS): " + duration.toMillis());
//Use threads
instant = Instant.now();
for (int i = 0; i < count; i++) {
new Thread(() -> {
double x = Math.random() * Math.random();
}
).start();
}
duration = Duration.between(instant, Instant.now());
Log.d("ExecutorPerformanceTest", "Use threads (MS): " + duration.toMillis());
//Use ExecutorService
instant = Instant.now();
for (int i = 0; i < count; i++) {
executorService.execute(() -> {
double x = Math.random() * Math.random();
}
);
}
duration = Duration.between(instant, Instant.now());
Log.d("ExecutorPerformanceTest", "Use ExecutorService (MS): " + duration.toMillis());
I've faced a similar problem, but Math.random() was not the issue.
The problem is having many small tasks that take just a few milliseconds to complete. It is not much but a lot of small tasks in series ends up being a lot of time and I needed to parallelize.
So, the solution I found, and it might work for those of you facing this same problem: do not use any of the executor services. Instead create your own long living Threads and feed them tasks.
Here is an example, just as an idea don't try to copy paste it cause it probably won't work as I am using Kotlin and translating to Java in my head. The concept is what's important:
First, the Thread, a Thread that can execute a task and then continue there waiting for the next one:
public class Worker extends Thread {
private Callable task;
private Semaphore semaphore;
private CountDownLatch latch;
public Worker(Semaphore semaphore) {
this.semaphore = semaphore;
}
public void run() {
while (true) {
semaphore.acquire(); // this will block, the while(true) won't go crazy
if (task == null) continue;
task.run();
if (latch != null) latch.countDown();
task = null;
}
}
public void setTask(Callable task) {
this.task = task;
}
public void setCountDownLatch(CountDownLatch latch) {
this.latch = latch;
}
}
There is two things here that need explanation:
the Semaphore: gives you control over how many tasks and when they are executed by this thread
the CountDownLatch: is the way to notify someone else that a task was completed
So this is how you would use this Worker, first just a simple example:
Semaphore semaphore = new Semaphore(0); // initially the semaphore is closed
Worker worker = new Worker(semaphore);
worker.start();
worker.setTask( .. your callable task .. );
semaphore.release(); // this will allow one task to be processed by the worker
Now a more complicated example, with two Threads and waiting for both to complete using the CountDownLatch:
Semaphore semaphore1 = new Semaphore(0);
Worker worker1 = new Worker(semaphore1);
worker1.start();
Semaphore semaphore2 = new Semaphore(0);
Worker worker2 = new Worker(semaphore2);
worker2.start();
// same countdown latch for both workers, with a counter of 2
CountDownLatch countDownLatch = new CountDownLatch(2);
worker1.setCountDownLatch(countDownLatch);
worker2.setCountDownLatch(countDownLatch);
worker1.setTask( .. your callable task .. );
worker2.setTask( .. your callable task .. );
semaphore1.release();
semaphore2.release();
countDownLatch.await(); // this will block until 2 tasks have been completed
And after that code runs you could just add more tasks to the same threads and reuse them. That's the whole point of this, reusing the threads instead of creating new ones.
It is unpolished as f*** but hopefully this gives you an idea. For me this was an improvement compared to no multi threading. And it was much much better than any executor service with any number of threads in the pool by far.