My system is a dual-core i5 with hyper-threading, so Windows shows me 4 processors. When I run a single optimized CPU-bound task on one thread at a time, its service time is always around 35 ms. But when I hand 2 tasks to 2 threads simultaneously, their service times are around 70 ms. Since my system has 4 processors, why are the service times around 70 ms when 2 threads run their tasks? The 2 threads should run on 2 processors without any scheduling overhead. The code is as follows.
The CPU-bound task is as follows.
import java.math.BigInteger;
public class CpuBoundJob implements Runnable {
public void run() {
BigInteger factValue = BigInteger.ONE;
long t1=System.nanoTime();
for ( int i = 2; i <= 2000; i++){
factValue = factValue.multiply(BigInteger.valueOf(i));
}
long t2=System.nanoTime();
System.out.println("Service Time(ms)="+((double)(t2-t1)/1000000));
}
}
Thread that runs a task is as follows.
public class TaskRunner extends Thread {
CpuBoundJob job=new CpuBoundJob();
public void run(){
job.run();
}
}
And Finally, main class is as follows.
public class Test2 {
int numberOfThreads=100;//warmup code for JIT
public Test2(){
for(int i=1;i<=numberOfThreads;i++){//warmup code for JIT
TaskRunner t=new TaskRunner();
t.start();
}
try{
Thread.sleep(5000);// wait a little bit
}catch(Exception e){}
System.out.println("Warmed up completed! now start benchmarking");
System.out.println("First run single thread at a time");
try{//wait for the thread to complete
Thread.sleep(5000);
}catch(Exception e){}
//run only one thread at a time
TaskRunner t1=new TaskRunner();
t1.start();
try{//wait for the thread to complete
Thread.sleep(5000);
}catch(Exception e){}
//Now run 2 threads simultaneously at a time
System.out.println("Now run 3 thread at a time");
for(int i=1;i<=3;i++){//run 2 thread at a time
TaskRunner t2=new TaskRunner();
t2.start();
}
}
public static void main(String[] args) {
new Test2();
}
}
Final output:
Warmed up completed! now start benchmarking
First run single thread at a time
Service Time(ms)=5.829112
Now run 2 thread at a time
Service Time(ms)=6.518721
Service Time(ms)=10.364269
Service Time(ms)=10.272689
I timed this in a variety of scenarios, and with a slightly modified task, got times of ~45 ms with one thread and ~60 ms for two threads. So, even in this example, in one second, one thread can complete about 22 tasks, but two threads can complete 33 tasks.
However, if you run a task that doesn't tax the garbage collector so grievously, you should see the performance increase you expect: two threads complete twice as many tasks. Here is my version of your test program.
Note that I made one significant change to your task (DirtyTask): n was always 0, because you cast the result of Math.random() to an int (which is zero), and then multiplied by 13.
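To check the cast precedence mentioned above in isolation, here is a minimal sketch. The class and method names are made up for illustration; the two expressions are the only part taken from the discussion.

```java
class CastPrecedence {
    // The cast binds tighter than '*', so r is truncated to an int first.
    // For any r in [0.0, 1.0) that gives 0, and 0 * 13 is still 0.
    static int castThenMultiply(double r) {
        return (int) r * 13;
    }

    // Parentheses force the multiplication to happen before the truncation.
    static int multiplyThenCast(double r) {
        return (int) (r * 13);
    }

    public static void main(String[] args) {
        double r = Math.random(); // always in [0.0, 1.0)
        System.out.println("cast first:     " + castThenMultiply(r));
        System.out.println("multiply first: " + multiplyThenCast(r));
    }
}
```

With Math.random() as input, the first method can only ever return 0, while the second returns a value from 0 to 12.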
Then I added a CleanTask that doesn't generate any new objects for the garbage collector to handle. Please test and report the results on your machine. On mine, I got this:
Testing "clean" task.
Average task time: one thread = 46 ms; two threads = 45 ms
Testing "dirty" task.
Average task time: one thread = 41 ms; two threads = 62 ms
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
final class Parallels
{
private static final int RUNS = 10;
public static void main(String... argv)
throws Exception
{
System.out.println("Testing \"clean\" task.");
flavor(CleanTask::new);
System.out.println("Testing \"dirty\" task.");
flavor(DirtyTask::new);
}
private static void flavor(Supplier<Callable<Long>> tasks)
throws InterruptedException, ExecutionException
{
ExecutorService warmup = Executors.newFixedThreadPool(100);
for (int i = 0; i < 100; ++i)
warmup.submit(tasks.get());
warmup.shutdown();
warmup.awaitTermination(1, TimeUnit.DAYS);
ExecutorService workers = Executors.newFixedThreadPool(2);
long t1 = test(1, tasks, workers);
long t2 = test(2, tasks, workers);
System.out.printf("Average task time: one thread = %d ms; two threads = %d ms%n", t1 / (1 * RUNS), t2 / (2 * RUNS));
workers.shutdown();
}
private static long test(int n, Supplier<Callable<Long>> tasks, ExecutorService workers)
throws InterruptedException, ExecutionException
{
long sum = 0;
for (int i = 0; i < RUNS; ++i) {
List<Callable<Long>> batch = new ArrayList<>(n);
for (int t = 0; t < n; ++t)
batch.add(tasks.get());
List<Future<Long>> times = workers.invokeAll(batch);
for (Future<Long> f : times)
sum += f.get();
}
return TimeUnit.NANOSECONDS.toMillis(sum);
}
/**
* Do something on the CPU without creating any garbage, and return the
* elapsed time.
*/
private static class CleanTask
implements Callable<Long>
{
@Override
public Long call()
{
long time = System.nanoTime();
long x = 0;
for (int i = 0; i < 15_000_000; i++)
x ^= ThreadLocalRandom.current().nextLong();
if (x == 0)
throw new IllegalStateException();
return System.nanoTime() - time;
}
}
/**
* Do something on the CPU that creates a lot of garbage, and return the
* elapsed time.
*/
private static class DirtyTask
implements Callable<Long>
{
@Override
public Long call()
{
long time = System.nanoTime();
String s = "";
for (int i = 0; i < 10_000; i++)
s += (int) (ThreadLocalRandom.current().nextDouble() * 13);
if (s.length() == 10_000)
throw new IllegalStateException();
return System.nanoTime() - time;
}
}
}
for(int i=0;i<10000;i++)
{
int n=(int)Math.random()*13;
s+=name.valueOf(n);
//s+="*";
}
This code is a tight spin around a resource that can only be accessed by one thread at a time. So each thread just has to wait for the other to release the random number generator so that it can access it.
As the docs for Math.random say:
When this method is first called, it creates a single new pseudorandom-number generator, exactly as if by the expression
new java.util.Random()
This new pseudorandom-number generator is used thereafter for all calls to this method and is used nowhere else.
This method is properly synchronized to allow correct use by more than one thread. However, if many threads need to generate pseudorandom numbers at a great rate, it may reduce contention for each thread to have its own pseudorandom-number generator.
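As a rough sketch of what "each thread has its own pseudorandom-number generator" can look like, the code below uses ThreadLocalRandom instead of Math.random(). The class name, method name and thread counts are invented for the example, and it is not a benchmark.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

class PerThreadRandom {
    /**
     * Starts the given number of threads; each draws perThread values in
     * [0, 13) from its own generator. Returns how many draws were in range.
     */
    static long draw(int threads, int perThread) {
        AtomicLong inRange = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                long ok = 0;
                for (int i = 0; i < perThread; i++) {
                    // current() returns the generator that belongs to the calling
                    // thread, so the threads do not share generator state.
                    int n = ThreadLocalRandom.current().nextInt(13);
                    if (n >= 0 && n < 13) ok++;
                }
                inRange.addAndGet(ok); // one shared update per thread, not per draw
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return inRange.get();
    }

    public static void main(String[] args) {
        System.out.println("draws in range: " + draw(2, 1_000_000));
    }
}
```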
package com.playground.concurrency;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
public class MyRunnable implements Runnable {
private String taskName;
public String getTaskName() {
return taskName;
}
public void setTaskName(String taskName) {
this.taskName = taskName;
}
private int processed = 0;
public MyRunnable(String name) {
this.taskName = name;
}
private boolean keepRunning = true;
public boolean isKeepRunning() {
return keepRunning;
}
public void setKeepRunning(boolean keepRunning) {
this.keepRunning = keepRunning;
}
private BlockingQueue<Integer> elements = new LinkedBlockingQueue<Integer>(10);
public BlockingQueue<Integer> getElements() {
return elements;
}
public void setElements(BlockingQueue<Integer> elements) {
this.elements = elements;
}
@Override
public void run() {
while (keepRunning || !elements.isEmpty()) {
try {
Integer element = elements.take();
Thread.sleep(10);
System.out.println(taskName +" :: "+elements.size());
System.out.println("Got :: " + element);
processed++;
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
System.out.println("Exiting thread");
}
public int getProcessed() {
return processed;
}
public void setProcessed(int processed) {
this.processed = processed;
}
}
package com.playground.concurrency.service;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import com.playground.concurrency.MyRunnable;
public class TestService {
public static void main(String[] args) throws InterruptedException {
int roundRobinIndex = 0;
int noOfProcess = 10;
List<MyRunnable> processes = new ArrayList<MyRunnable>();
for (int i = 0; i < noOfProcess; i++) {
processes.add(new MyRunnable("Task : " + i));
}
ExecutorService threadPoolExecutor = Executors.newFixedThreadPool(5);
for (MyRunnable process : processes) {
threadPoolExecutor.execute(process);
}
int totalMessages = 1000;
long start = System.currentTimeMillis();
for (int i = 1; i <= totalMessages; i++) {
processes.get(roundRobinIndex++).getElements().put(i);
if (roundRobinIndex == noOfProcess) {
roundRobinIndex = 0;
}
}
System.out.println("Done putting all the elements");
for (MyRunnable process : processes) {
process.setKeepRunning(false);
}
threadPoolExecutor.shutdown();
try {
threadPoolExecutor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
} catch (InterruptedException e) {
e.printStackTrace();
}
long totalProcessed = 0;
for (MyRunnable process : processes) {
System.out.println("task " + process.getTaskName() + " processd " + process.getProcessed());
totalProcessed += process.getProcessed();
}
long end = System.currentTimeMillis();
System.out.println("total time" + (end - start));
}
}
I have a simple task that reads elements from a LinkedBlockingQueue. I create multiple instances of this task and execute them with an ExecutorService. The program works as expected when noOfProcess and the thread pool size are the same (for example, noOfProcess=10 and thread pool size=10).
However, if noOfProcess=10 and the thread pool size is 5, the main thread keeps waiting at the line below after processing a few items.
processes.get(roundRobinIndex++).getElements().put(i);
What am I doing wrong here?
Ah yes. The good old deadlock.
What happens is: You submit 10 Tasks to the ExecutorService, and then send jobs via .put(i). This blocks for Task 5 as expected when its queue is full. Now Task 5 is not currently being executed, and as a matter of fact will never be, since Task 0 to 4 are still clogging up your FixedThreadPool, blocking at .take() in the run() Method waiting for new Jobs from .put(i), which they will never get.
This error is a fundamental design flaw within your code and there are myriads of ways to fix it, one of which being the increased Thread Pool Size.
My suggestion is that you go back to the drawing board and rethink the structure in the main Method.
And since you posted your code, have some tips:
1.:
Posting your entire code can be interpreted as a call to 'pls fix my code', and you are encouraged to omit all unnecessary details (like all those getters and setters). Maybe check https://stackoverflow.com/help/minimal-reproducible-example
2.:
Posting two classes in the same body made things kinda complicated. Split it next time.
3.: (nitpick)
processes.get(roundRobinIndex++).getElements().put(i);
Combining two operations like you did here is bad style since it makes your code less readable for others. You could just have written:
processes.get(i % noOfProcess).getElements().put(i);
To fix the behavior, you need to do one of the following:
have enough Runnables, each with enough queue capacity to take all 1,000 messages (for example: 100 Runnables with capacity 10 or more; or 10 Runnables with capacity 100 or more), or
have a thread pool that is large enough to accommodate all of your Runnables so that each of them can start running.
Without one of those happening, the ExecutorService will not start the extra Runnables. The main worker thread will continue adding items to each queue, including those of non-running Runnables, until it encounters a queue that is full, at which point it blocks. With 10 Runnables and thread pool size 5, the first queue to fill up will be that of the 6th Runnable. The same would happen if you had just 6 Runnables. The significant point is that you have at least one more Runnable than you have room in your thread pool.
From newFixedThreadPool() Javadoc:
If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available.
Consider a simpler example of 2 processes and thread pool size of 1. You'll be allowed to create the first process and submit it to the ExecutorService (so the ExecutorService will start and run it). The second process however, will not be allowed to run by the ExecutorService. Your main thread does not pay attention to this, however, and it will continue putting elements into the queue for the second process even though nothing is consuming it.
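The "2 processes, pool size 1" case can be observed directly with a small sketch like the one below. The class name and the latches are invented for the demonstration. The pool is built with the ThreadPoolExecutor constructor (one thread, unbounded LinkedBlockingQueue), which is the configuration the newFixedThreadPool(1) Javadoc describes, so that the waiting queue can be inspected.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class FixedPoolQueueing {
    /** Submits two tasks to a one-thread pool and reports what happens to the second. */
    static String observe() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch firstStarted = new CountDownLatch(1);
        CountDownLatch releaseFirst = new CountDownLatch(1);
        AtomicBoolean secondRan = new AtomicBoolean();

        // First task: occupies the only thread, like a Runnable blocked in take().
        pool.execute(() -> {
            firstStarted.countDown();
            try {
                releaseFirst.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Second task: accepted by the pool, but there is no free thread for it.
        pool.execute(() -> secondRan.set(true));

        try {
            firstStarted.await();
            int queued = pool.getQueue().size();  // the second task is waiting here
            boolean ranEarly = secondRan.get();   // it has not run yet
            releaseFirst.countDown();             // let the first task finish
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
            return "queued=" + queued + " ranEarly=" + ranEarly
                    + " ranLater=" + secondRan.get();
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during demo", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(observe());
    }
}
```

The second task only runs once the first one returns. In the question's code the first five Runnables never return while they are blocked in take(), so the remaining ones never get a thread.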
Your code is ok with noOfProcess=10 and thread pool size=5 – if you also change your queue size to 100, like this: new LinkedBlockingQueue<>(100).
You can observe this behavior – where the queue of a non-running Runnable fills up – if you change this line:
processes.get(roundRobinIndex++).getElements().put(i);
to this (which is the same logical code, but has object references saved for use inside the println() output):
MyRunnable runnable = processes.get(roundRobinIndex++);
BlockingQueue<Integer> elements = runnable.getElements();
System.out.println("attempt to put() for " + runnable.getTaskName() + " with " + elements.size() + " elements");
elements.put(i);
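Another way out, different from the two fixes listed above, is to restructure so that the number of consumers always equals the pool size and all of them share one bounded queue. The sketch below is only an illustration of that idea: the class name, the STOP marker and the shutdown scheme are assumptions of the example, not part of the original code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SharedQueueWorkers {
    // Hypothetical end-of-input marker; real messages here are 1..totalMessages.
    private static final int STOP = Integer.MIN_VALUE;

    /** Feeds totalMessages integers through one bounded queue; returns how many were processed. */
    static int process(int totalMessages, int workers) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(10);
        AtomicInteger processed = new AtomicInteger();
        // One consumer per pool thread, so every consumer is guaranteed to be running.
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        int element = queue.take();
                        if (element == STOP) return; // this consumer is done
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            for (int i = 1; i <= totalMessages; i++) {
                queue.put(i); // blocks only until some running consumer takes an item
            }
            for (int w = 0; w < workers; w++) {
                queue.put(STOP); // one marker per consumer
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println("processed " + process(1000, 5));
    }
}
```

Because every consumer is running, put() can only block temporarily. The per-task keepRunning flag is replaced by the markers, which also avoids the case where a consumer sits in take() on an empty queue after the flag was cleared.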
I'm trying to learn multithreaded programming and I have some questions about the approach to take.
So, in my specific case I want to build a program that renames 1000 files and I was thinking to create a worker class:
public class Worker implements Runnable {
private List<File> files ;
public Worker(List<File> f){
files = f;
}
public void run(){
// read all files from list and rename them
}
}
and then in main class to do something like:
Worker w1 = new Worker(..list of 500 files...) ;
Worker w2 = new Worker(..list of the other 500 files...) ;
Thread t1 = new Thread(w1,"thread1");
Thread t2 = new Thread(w2,"thread2");
t1.start();
t2.start();
Running this brings me no concurrency issues, so I do not need synchronized code, but I'm not sure if this is the correct approach.
Or should I create only one instance of Worker, pass it the entire list of 1000 files, and then take care that, no matter how many threads access the object, they won't get the same File from the list?
i.e :
Worker w1 = new Worker(..list of 1000 files...) ;
Thread t1 = new Thread(w1,"thread1");
Thread t2 = new Thread(w1,"thread2");
t1.start();
t2.start();
How should I proceed here ?
The first approach you described is the correct one. You need to create two Workers, as each worker will work on a different list of files.
Worker w1 = new Worker(..list of 500 files...) ; // First List
Worker w2 = new Worker(..list of the other 500 files...) ; // Second List
Thread t1 = new Thread(w1,"thread1");
Thread t2 = new Thread(w2,"thread2");
t1.start();
t2.start();
It's simple: two different threads, each with a load of 500 files, will execute concurrently.
A more typical and scalable approach is one of the following:
create a collection (likely an array or list) of N threads to perform the work
use a thread pool, e.g. from Executors.newFixedThreadPool(N)
You may also wish to use a Producer Consumer pattern in which the threads pull from a common task pool. This allows natural balancing of the work - instead of essentially hard-coding one thread handles 500 tasks and the other the same number.
Consider after all what would happen if all of your larger files end up in the bucket handled by the Thread2? The first thread is done/idle and the second thread has to do all of the heavy lifting.
The producer/consumer pooling approach would be to dump all of the work (generated by the Producer's) into a task pool and then the Consumers (your worker threads) bite off small pieces (e.g. one file) at a time. This approach leads to keeping both threads occupied for a similar duration.
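A minimal sketch of that pooling idea, assuming a fixed thread pool whose internal queue acts as the task pool and one small task per file. The class name, the ".renamed" suffix and the temp-directory demo are invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

class RenamePool {
    /** Renames every file by appending ".renamed"; returns how many renames completed. */
    static int renameAll(List<Path> files, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Path file : files) {
                // One task per file: whichever thread is free picks up the next one.
                pending.add(pool.submit(() -> {
                    Path target = file.resolveSibling(file.getFileName() + ".renamed");
                    return Files.move(file, target);
                }));
            }
            int renamed = 0;
            for (Future<Path> f : pending) {
                f.get(); // rethrows a failed rename as ExecutionException
                renamed++;
            }
            return renamed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while renaming", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a rename failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Creates fileCount empty files in a temp directory, renames them, counts the results on disk. */
    static int demo(int fileCount, int threads) {
        try {
            Path dir = Files.createTempDirectory("rename-demo");
            List<Path> files = new ArrayList<>();
            for (int i = 0; i < fileCount; i++) {
                files.add(Files.createFile(dir.resolve("file" + i + ".txt")));
            }
            renameAll(files, threads);
            try (Stream<Path> entries = Files.list(dir)) {
                return (int) entries.filter(p -> p.toString().endsWith(".renamed")).count();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("renamed on disk: " + demo(1000, 2));
    }
}
```

No file is handled twice because each file belongs to exactly one task, so no extra synchronization is needed in the rename code itself.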
In learning multi-threaded programming one of the important insights is that a thread is not a task. By giving a thread a part of the list of items to process you are halfway there but the next step will take you further: constructing the task in such a way that any number of threads can execute it. To do this, you will have to get familiar with the java.util.concurrent classes. These are useful tools to help constructing the tasks.
The example below separates tasks from threads. It uses AtomicInteger to ensure each thread picks a unique task and it uses CountDownLatch to know when all work is done. The example also shows balancing: threads that execute tasks that complete faster, execute more tasks.
The example is by no means the only solution - there are other ways of doing this that could be faster, easier, better to maintain, etc..
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
public class MultiRename implements Runnable {
public static void main(String[] args) {
final int numberOfFnames = 50;
MultiRenameParams params = new MultiRenameParams();
params.fnameList = new ArrayList<String>();
for (int i = 0; i < numberOfFnames; i++) {
params.fnameList.add("fname " + i);
}
params.fnameListIndex = new AtomicInteger();
final int numberOfThreads = 3;
params.allDone = new CountDownLatch(numberOfThreads);
ExecutorService tp = Executors.newCachedThreadPool();
System.out.println("Starting");
for (int i = 0; i < numberOfThreads; i++) {
tp.execute(new MultiRename(params, i));
}
try { params.allDone.await(); } catch (Exception e) {
e.printStackTrace();
}
tp.shutdownNow();
System.out.println("Finished");
}
private final MultiRenameParams params;
private final Random random = new Random();
// Just to show there are fast and slow tasks.
// Thread with lowest delay should get most tasks done.
private final int delay;
public MultiRename(MultiRenameParams params, int delay) {
this.params = params;
this.delay = delay;
}
@Override
public void run() {
final int maxIndex = params.fnameList.size();
int i = 0;
int count = 0;
while ((i = params.fnameListIndex.getAndIncrement()) < maxIndex) {
String fname = params.fnameList.get(i);
long sleepTimeMs = random.nextInt(10) + delay;
System.out.println(Thread.currentThread().getName() + " renaming " + fname + " for " + sleepTimeMs + " ms.");
try { Thread.sleep(sleepTimeMs); } catch (Exception e) {
e.printStackTrace();
break;
}
count++;
}
System.out.println(Thread.currentThread().getName() + " done, renamed " + count + " files.");
params.allDone.countDown();
}
static class MultiRenameParams {
List<String> fnameList;
AtomicInteger fnameListIndex;
CountDownLatch allDone;
}
}
The problem statement is:
Each thread uses a unique ID between 1 and 1000, and the program has to run for 60 minutes or more. In those 60 minutes it is possible that all the IDs will be used up, so I need to reuse them.
I know several ways to do it. One is the code below, which I wrote with help from Stack Overflow. But when I ran it, I found that after a few minutes the program gets very slow and takes a long time to print the ID on the console. I also get an OutOfMemoryError sometimes. Is there a better way to solve this kind of problem?
class IdPool {
private final LinkedList<Integer> availableExistingIds = new LinkedList<Integer>();
public IdPool() {
for (int i = 1; i <= 1000; i++) {
availableExistingIds.add(i);
}
}
public synchronized Integer getExistingId() {
return availableExistingIds.removeFirst();
}
public synchronized void releaseExistingId(Integer id) {
availableExistingIds.add(id);
}
}
class ThreadNewTask implements Runnable {
private IdPool idPool;
public ThreadNewTask(IdPool idPool) {
this.idPool = idPool;
}
public void run() {
Integer id = idPool.getExistingId();
someMethod(id);
idPool.releaseExistingId(id);
}
private void someMethod(Integer id) {
System.out.println("Task: " +id);
}
}
public class TestingPool {
public static void main(String[] args) throws InterruptedException {
int size = 10;
int durationOfRun = 60;
IdPool idPool = new IdPool();
// create thread pool with given size
ExecutorService service = new ThreadPoolExecutor(size, size, 500L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<Runnable>(10), new ThreadPoolExecutor.CallerRunsPolicy());
// queue some tasks
long startTime = System.currentTimeMillis();
long endTime = startTime + (durationOfRun * 60 * 1000L);
// Running it for 60 minutes
while(System.currentTimeMillis() <= endTime) {
service.submit(new ThreadNewTask(idPool));
}
// wait for termination
service.shutdown();
service.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
}
}
I already explained in your previous question that your code submits millions and millions of tasks to the executor, since it submits tasks in a loop for 60 minutes without waiting.
It's very unclear what your end goal is, but as is, you're filling a queue of tasks until you don't have any memory available anymore. Since you don't explain the goal of your program, it's hard to give you any solution.
But the first thing you could do is to limit the size of the task queue of your executor. This would force the main thread to block each time the queue is full.
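One way to get that blocking behaviour, sketched under the assumption that the submitting thread may check out the ID itself before handing the task over: keep the free IDs in a BlockingQueue and let take() stall the submitter while every ID is in use. This is a different technique from bounding the executor's own queue. The class name and the bookkeeping used to detect double use are made up for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

class ThrottledIdPool {
    /**
     * Runs `tasks` tasks that share the IDs 1..idCount.
     * Returns {completed tasks, times an ID was seen in use by two tasks at once}.
     */
    static int[] run(int tasks, int idCount, int threads) {
        BlockingQueue<Integer> freeIds = new ArrayBlockingQueue<>(idCount);
        for (int id = 1; id <= idCount; id++) {
            freeIds.add(id);
        }
        AtomicIntegerArray inUse = new AtomicIntegerArray(idCount + 1);
        AtomicInteger completed = new AtomicInteger();
        AtomicInteger conflicts = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int i = 0; i < tasks; i++) {
                // Blocks while all IDs are checked out, so at most idCount
                // tasks are ever queued or running at the same time.
                final int id = freeIds.take();
                pool.execute(() -> {
                    try {
                        if (inUse.incrementAndGet(id) != 1) conflicts.incrementAndGet();
                        completed.incrementAndGet(); // stand-in for the real work
                        inUse.decrementAndGet(id);
                    } finally {
                        freeIds.add(id); // hand the ID back for reuse
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return new int[] { completed.get(), conflicts.get() };
    }

    public static void main(String[] args) {
        int[] result = run(100_000, 1000, 10);
        System.out.println("completed=" + result[0] + " conflicts=" + result[1]);
    }
}
```

Memory use stays flat because the number of outstanding tasks can never exceed the number of IDs, however long the submitting loop runs.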
Looking at the javadocs for CyclicBarrier, I found the following statement in the class documentation that I don't completely understand. From the javadoc:
If the barrier action does not rely on the parties being suspended when it is executed, then any of the threads in the party could execute that action when it is released. To facilitate this, each invocation of await() returns the arrival index of that thread at the barrier. You can then choose which thread should execute the barrier action, for example:
if (barrier.await() == 0) {
// log the completion of this iteration
}
Can someone explain how to designate a specific thread for execution of the barrier action once all the parties have called .await() and perhaps provide an example?
OK, pretend RuPaul wanted some worker threads, but only the 3rd one that finished is supposed to do the barrier task (Say "Sashay, Chante").
import java.util.Random;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
public class Main
{
private static class Worker implements Runnable {
private CyclicBarrier barrier;
public Worker(CyclicBarrier b) {
barrier = b;
}
public void run() {
final String threadName = Thread.currentThread().getName();
System.out.printf("%s: You better work!%n", threadName);
// simulate the workin' it part
Random rnd = new Random();
int secondsToWorkIt = rnd.nextInt(10) + 1;
try {
TimeUnit.SECONDS.sleep(secondsToWorkIt);
} catch (InterruptedException ex) { /* ...*/ }
System.out.printf("%s worked it, girl!%n", threadName);
try {
int n = barrier.await();
final int myOrder = barrier.getParties() - n;
System.out.printf("Turn number: %s was %s%n", myOrder, threadName);
// MAGIC CODE HERE!!!
if (myOrder == 3) { // the third one that finished
System.out.printf("%s: Sashay Chante!%n", myOrder);
}
// END MAGIC CODE
}
catch (BrokenBarrierException ex) { /* ... */ }
catch (InterruptedException ex) { /* ... */ }
}
}
private final int numThreads = 5;
public void work() {
/*
* I want the 3rd thread that finished to say "Sashay Chante!"
* when everyone has called await.
* So I'm not going to put my "barrier action" in the CyclicBarrier constructor,
* where only the last thread will run it! I'm going to put it in the Runnable
* that calls await.
*/
CyclicBarrier b = new CyclicBarrier(numThreads);
for (int i= 0; i < numThreads; i++) {
Worker task = new Worker(b);
Thread thread = new Thread(task);
thread.start();
}
}
public static void main(String[] args)
{
Main main = new Main();
main.work();
}
}
Here is an example of the output:
Thread-0: You better work!
Thread-4: You better work!
Thread-2: You better work!
Thread-1: You better work!
Thread-3: You better work!
Thread-1 worked it, girl!
Thread-4 worked it, girl!
Thread-0 worked it, girl!
Thread-3 worked it, girl!
Thread-2 worked it, girl!
Turn number: 5 was Thread-2
Turn number: 3 was Thread-0
3: Sashay Chante!
Turn number: 1 was Thread-1
Turn number: 4 was Thread-3
Turn number: 2 was Thread-4
As you can see, the thread that finished 3rd was Thread-0, so Thread-0 was the one that did the "barrier action".
Say you are able to name your threads:
thread.setName("My Thread " + i);
Then you can perform the action on the thread of that name...I don't know how feasible that is for you.
I think that section of the documentation is about an alternative to the barrier action Runnable, not a particular way of using it. Note how it says (emphasis mine):
If the barrier action does not rely on the parties being suspended when it is executed
If you specify a barrier action as a runnable, then it ...
is run once per barrier point, after the last thread in the party arrives, but before any threads are released
So the action runs while the threads are suspended (strictly, since it is run by the last thread to arrive, that one isn't suspended, but at least its normal flow of execution is on hold until the barrier action finishes).
The business about using the return value of await() is something you can do if you don't need your action to run while the threads are suspended.
The documentation's examples are indicative. The example using a Runnable barrier action is coordinating the work of some other threads - merging the rows and checking if the job is done. The other threads need to wait for it to know if they have more work to do. So, it has to run while they're suspended. The example using the return value from await() is some logging. The other threads don't depend on the logging having been done. So, it can happen while the other threads have started doing more work.
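To see the two styles side by side, here is a small sketch. The class name and the event strings are invented; the barrier action is passed to the constructor, and the await() return value is used for the "does not need the others suspended" case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

class BarrierActionDemo {
    /** Runs `parties` threads through one barrier and returns the recorded events in order. */
    static List<String> run(int parties) {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        // Style 1: a barrier action. It runs once, in the last thread to arrive,
        // before any of the waiting threads is released.
        CyclicBarrier barrier = new CyclicBarrier(parties, () -> events.add("action"));
        Runnable worker = () -> {
            try {
                int index = barrier.await();
                events.add("released");
                // Style 2: use the arrival index. Index 0 is the last thread to
                // arrive; this runs after release, alongside the other threads.
                if (index == 0) events.add("index-0");
            } catch (InterruptedException | BrokenBarrierException e) {
                throw new IllegalStateException(e);
            }
        };
        Thread[] threads = new Thread[parties];
        for (int i = 0; i < parties; i++) {
            threads[i] = new Thread(worker);
            threads[i].start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(5));
    }
}
```

The "action" entry always comes first in the list, because no thread gets past await() until the barrier action has finished. The position of "index-0" relative to the "released" entries can vary from run to run.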
CyclicBarrier enables designating a Thread by ORDER :
Designating a thread that returns at a SPECIFIC order is possible if, as you say, you enclose the barrier completion logic in a conditional which is specific to a thread index. Thus, your implementation above will work according to the documentation you cited.
However, the point of confusion here - is that the documentation is talking about thread identity in terms of order of returning to the barrier, rather than thread object identity. Thus, thread 0 refers to the 0th thread to complete.
Alternative : Designating a Thread using other mechanisms.
If you wanted to have a specific thread carry on a specific action after other works completed, you might use a different mechanism - like a semaphore , for example. If you desired this behavior, you may not really need the cyclic barrier.
To inspect what is meant by the documentation, run the class below (modified from http://programmingexamples.wikidot.com/cyclicbarrier), where I've incorporated your snippet.
Example of what is meant by the docs for the CyclicBarrier
package thread;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
public class CyclicBarrierExample
{
private static int matrix[][] =
{
{ 1 },
{ 2, 2 },
{ 3, 3, 3 },
{ 4, 4, 4, 4 },
{ 5, 5, 5, 5, 5 } };
static final int rows = matrix.length;
private static int results[]=new int[rows];
static int threadId=0;
private static class Summer extends Thread
{
int row;
CyclicBarrier barrier;
Summer(CyclicBarrier barrier, int row)
{
this.barrier = barrier;
this.row = row;
}
public void run()
{
int columns = matrix[row].length;
int sum = 0;
for (int i = 0; i < columns; i++)
{
sum += matrix[row][i];
}
results[row] = sum;
System.out.println("Results for row " + row + " are : " + sum);
// wait for the others
// Try commenting the below block, and watch what happens.
try
{
int w = barrier.await();
if(w==0)
{
System.out.println("merging now !");
int fullSum = 0;
for (int i = 0; i < rows; i++)
{
fullSum += results[i];
}
System.out.println("Results are: " + fullSum);
}
}
catch(Exception e)
{
e.printStackTrace();
}
}
}
public static void main(String args[])
{
/*
* public CyclicBarrier(int parties,Runnable barrierAction)
* Creates a new CyclicBarrier that will trip when the given number
* of parties (threads) are waiting upon it, and which will execute
* the merger task when the barrier is tripped, performed
* by the last thread entering the barrier.
*/
CyclicBarrier barrier = new CyclicBarrier(rows );
for (int i = 0; i < rows; i++)
{
System.out.println("Creating summer " + i);
new Summer(barrier, i).start();
}
System.out.println("Waiting...");
}
}
I'm trying to figure out how to correctly use Java's Executors. I realize submitting tasks to an ExecutorService has its own overhead. However, I'm surprised to see it is as high as it is.
My program needs to process huge amount of data (stock market data) with as low latency as possible. Most of the calculations are fairly simple arithmetic operations.
I tried to test something very simple: "Math.random() * Math.random()"
The simplest test runs this computation in a simple loop. The second test does the same computation inside an anonymous Runnable (this is supposed to measure the cost of creating new objects). The third test passes the Runnable to an ExecutorService (this measures the cost of introducing executors).
I ran the tests on my dinky laptop (2 cpus, 1.5 gig ram):
(in milliseconds)
simpleCompuation:47
computationWithObjCreation:62
computationWithObjCreationAndExecutors:422
(about once out of four runs, the first two numbers end up being equal)
Notice that executors take far, far more time than executing on a single thread. The numbers were about the same for thread pool sizes between 1 and 8.
Question: Am I missing something obvious or are these results expected? These results tell me that any task I pass in to an executor must do some non-trivial computation. If I am processing millions of messages, and I need to perform very simple (and cheap) transformations on each message, I still may not be able to use executors...trying to spread computations across multiple CPUs might end up being costlier than just doing them in a single thread. The design decision becomes much more complex than I had originally thought. Any thoughts?
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class ExecServicePerformance {
private static int count = 100000;
public static void main(String[] args) throws InterruptedException {
//warmup
simpleCompuation();
computationWithObjCreation();
computationWithObjCreationAndExecutors();
long start = System.currentTimeMillis();
simpleCompuation();
long stop = System.currentTimeMillis();
System.out.println("simpleCompuation:"+(stop-start));
start = System.currentTimeMillis();
computationWithObjCreation();
stop = System.currentTimeMillis();
System.out.println("computationWithObjCreation:"+(stop-start));
start = System.currentTimeMillis();
computationWithObjCreationAndExecutors();
stop = System.currentTimeMillis();
System.out.println("computationWithObjCreationAndExecutors:"+(stop-start));
}
private static void computationWithObjCreation() {
for(int i=0;i<count;i++){
new Runnable(){
@Override
public void run() {
double x = Math.random()*Math.random();
}
}.run();
}
}
private static void simpleCompuation() {
for(int i=0;i<count;i++){
double x = Math.random()*Math.random();
}
}
private static void computationWithObjCreationAndExecutors()
throws InterruptedException {
ExecutorService es = Executors.newFixedThreadPool(1);
for(int i=0;i<count;i++){
es.submit(new Runnable() {
@Override
public void run() {
double x = Math.random()*Math.random();
}
});
}
es.shutdown();
es.awaitTermination(10, TimeUnit.SECONDS);
}
}
Using executors is about utilizing CPUs and/or CPU cores, so to make the best use of them a thread pool should have as many threads as there are CPUs or cores.
You are right, creating new objects costs too much. One way to reduce the expense is to use batches. If you know the kind and amount of computations to do, you create batches, so think of a thousand or more computations done in one executed task. You create batches for each thread. As soon as a computation is done (java.util.concurrent.Future), you create the next batch. Even the creation of new batches can be done in parallel (4 CPUs -> 3 threads for computation, 1 thread for batch provisioning). In the end, you may end up with more throughput, but with higher memory demands (batches, provisioning).
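As a rough illustration of the batching idea, the sketch below splits one range of cheap computations into a handful of Callables and hands them to the executor in one go with invokeAll. The class name and the summing workload are placeholders chosen so that the result is easy to verify; they are not taken from the benchmark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BatchedWork {
    /** Sums the integers 0..count-1, split into `batches` tasks instead of `count` tasks. */
    static long sum(int count, int batches) {
        ExecutorService pool = Executors.newFixedThreadPool(batches);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            int size = (count + batches - 1) / batches; // batch size, rounded up
            for (int b = 0; b < batches; b++) {
                final int from = Math.min(count, b * size);
                final int to = Math.min(count, from + size);
                // One task object per batch: the submission overhead is paid
                // `batches` times rather than `count` times.
                tasks.add(() -> {
                    long partial = 0;
                    for (int i = from; i < to; i++) partial += i;
                    return partial;
                });
            }
            long total = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for batches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("sum = " + sum(100_000, cpus));
    }
}
```

Whether this beats a plain loop depends on how much work each batch contains; for very cheap per-item work the batches need to be large before the executor overhead is amortized.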
Edit: I changed your example and I let it run on my little dual-core x200 laptop.
provisioned 2 batches to be executed
simpleCompuation:14
computationWithObjCreation:17
computationWithObjCreationAndExecutors:9
As you can see in the source code, I took the batch provisioning and executor lifecycle out of the measurement, too. That's fairer when comparing against the other two methods.
See the results for yourself...
import java.util.List;
import java.util.Vector;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class ExecServicePerformance {
private static int count = 100000;
public static void main( String[] args ) throws InterruptedException {
final int cpus = Runtime.getRuntime().availableProcessors();
final ExecutorService es = Executors.newFixedThreadPool( cpus );
final Vector< Batch > batches = new Vector< Batch >( cpus );
final int batchComputations = count / cpus;
for ( int i = 0; i < cpus; i++ ) {
batches.add( new Batch( batchComputations ) );
}
System.out.println( "provisioned " + cpus + " batches to be executed" );
// warmup
simpleCompuation();
computationWithObjCreation();
computationWithObjCreationAndExecutors( es, batches );
long start = System.currentTimeMillis();
simpleCompuation();
long stop = System.currentTimeMillis();
System.out.println( "simpleCompuation:" + ( stop - start ) );
start = System.currentTimeMillis();
computationWithObjCreation();
stop = System.currentTimeMillis();
System.out.println( "computationWithObjCreation:" + ( stop - start ) );
// Executor
start = System.currentTimeMillis();
computationWithObjCreationAndExecutors( es, batches );
es.shutdown();
es.awaitTermination( 10, TimeUnit.SECONDS );
// Note: ExecutorService#shutdown() and ExecutorService#awaitTermination() require
// some extra time. But the result should still be clear.
stop = System.currentTimeMillis();
System.out.println( "computationWithObjCreationAndExecutors:"
+ ( stop - start ) );
}
private static void computationWithObjCreation() {
for ( int i = 0; i < count; i++ ) {
new Runnable() {
@Override
public void run() {
double x = Math.random() * Math.random();
}
}.run();
}
}
private static void simpleCompuation() {
for ( int i = 0; i < count; i++ ) {
double x = Math.random() * Math.random();
}
}
private static void computationWithObjCreationAndExecutors(
ExecutorService es, List< Batch > batches )
throws InterruptedException {
for ( Batch batch : batches ) {
es.submit( batch );
}
}
private static class Batch implements Runnable {
private final int computations;
public Batch( final int computations ) {
this.computations = computations;
}
@Override
public void run() {
int countdown = computations;
while ( countdown-- > 0 ) { // run exactly `computations` iterations
double x = Math.random() * Math.random();
}
}
}
}
This is not a fair test for the thread pool, for the following reasons:
You are not taking advantage of the pooling at all because you only have 1 thread.
The job is so simple that the pooling overhead can't be justified. A multiplication on a CPU with an FPU only takes a few cycles.
Consider the extra steps the thread pool has to perform besides creating the object and running the job:
Put the job in the queue
Remove the job from queue
Get the thread from the pool and execute the job
Return the thread to the pool
When you have a real job and multiple threads, the benefit of the thread pool will be apparent.
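The steps listed above can be sketched with a toy single-thread pool. This is only an illustration built on a `BlockingQueue`; the class name `ToyPool` is hypothetical and this is not how `ThreadPoolExecutor` is actually written. It shows that every job pays for one enqueue, one dequeue and a hand-off between threads, which is a lot compared to one floating-point multiplication.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Toy single-thread pool for illustration only; not how ThreadPoolExecutor is written.
class ToyPool {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    ToyPool() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Runnable job = queue.take(); // remove the job from the queue (blocks while empty)
                    job.run();                   // execute it on the pooled thread
                }                                // looping back "returns" the thread to the pool
            } catch (InterruptedException e) {
                // interrupted while waiting: let the worker exit
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive for this sketch
        worker.start();
    }

    void submit(Runnable job) {
        queue.add(job); // put the job in the queue; the caller continues immediately
    }

    public static void main(String[] args) throws InterruptedException {
        ToyPool pool = new ToyPool();
        CountDownLatch done = new CountDownLatch(1);
        pool.submit(() -> {
            double x = Math.random() * Math.random();
            done.countDown();
        });
        done.await(); // wait for the worker to finish the job
        System.out.println("job finished");
    }
}
```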
The 'overhead' you mention has nothing to do with ExecutorService; it is caused by multiple threads synchronizing on Math.random, creating lock contention.
So yes, you are missing something (and the 'correct' answer below is not actually correct).
Here is some Java 8 code to demonstrate 8 threads running a simple function in which there is no lock contention:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleFunction;
import com.google.common.base.Stopwatch;
public class ExecServicePerformance {
private static final int repetitions = 120;
private static int totalOperations = 250000;
private static final int cpus = 8;
private static final List<Batch> batches = batches(cpus);
private static DoubleFunction<Double> performanceFunc = (double i) -> {return Math.sin(i * 100000 / Math.PI); };
public static void main( String[] args ) throws InterruptedException {
printExecutionTime("Synchronous", ExecServicePerformance::synchronous);
printExecutionTime("Synchronous batches", ExecServicePerformance::synchronousBatches);
printExecutionTime("Thread per batch", ExecServicePerformance::asynchronousBatches);
printExecutionTime("Executor pool", ExecServicePerformance::executorPool);
}
private static void printExecutionTime(String msg, Runnable f) throws InterruptedException {
long time = 0;
for (int i = 0; i < repetitions; i++) {
Stopwatch stopwatch = Stopwatch.createStarted();
f.run(); //remember, this is a single-threaded synchronous execution since there is no explicit new thread
time += stopwatch.elapsed(TimeUnit.MILLISECONDS);
}
System.out.println(msg + " exec time: " + time);
}
private static void synchronous() {
for ( int i = 0; i < totalOperations; i++ ) {
performanceFunc.apply(i);
}
}
private static void synchronousBatches() {
for ( Batch batch : batches) {
batch.synchronously();
}
}
private static void asynchronousBatches() {
CountDownLatch cb = new CountDownLatch(cpus);
for ( Batch batch : batches) {
Runnable r = () -> { batch.synchronously(); cb.countDown(); };
Thread t = new Thread(r);
t.start();
}
try {
cb.await();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
private static void executorPool() {
final ExecutorService es = Executors.newFixedThreadPool(cpus);
for ( Batch batch : batches ) {
Runnable r = () -> { batch.synchronously(); };
es.submit(r);
}
es.shutdown();
try {
es.awaitTermination( 10, TimeUnit.SECONDS );
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
private static List<Batch> batches(final int cpus) {
List<Batch> list = new ArrayList<Batch>();
for ( int i = 0; i < cpus; i++ ) {
list.add( new Batch( totalOperations / cpus ) );
}
System.out.println("Batches: " + list.size());
return list;
}
private static class Batch {
private final int operationsInBatch;
public Batch( final int ops ) {
this.operationsInBatch = ops;
}
public void synchronously() {
for ( int i = 0; i < operationsInBatch; i++ ) {
performanceFunc.apply(i);
}
}
}
}
Result timings for 120 tests of 25k operations (ms):
Synchronous exec time: 9956
Synchronous batches exec time: 9900
Thread per batch exec time: 2176
Executor pool exec time: 1922
Winner: Executor Service.
I don't think this is at all realistic since you're creating a new executor service every time you make the method call. Unless you have very strange requirements that seems unrealistic - typically you'd create the service when your app starts up, and then submit jobs to it.
If you try the benchmarking again but initialise the service as a field, once, outside the timing loop; then you'll see the actual overhead of submitting Runnables to the service vs. running them yourself.
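A minimal sketch of that setup, assuming a hypothetical class name `ReusedExecutorBenchmark` and a pool of one thread: the service is created once as a field, and the timed region covers only submitting the tasks and waiting for them to finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ReusedExecutorBenchmark {
    // Created once, outside any timed region (hypothetical setup for illustration).
    static final ExecutorService ES = Executors.newFixedThreadPool(1);

    // Submits `count` trivial tasks, waits for all of them, returns elapsed nanoseconds.
    static long timeSubmissions(int count) {
        List<Future<?>> futures = new ArrayList<>(count);
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            futures.add(ES.submit(() -> {
                double x = Math.random() * Math.random();
            }));
        }
        for (Future<?> f : futures) {
            try {
                f.get(); // block until this task has finished
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        timeSubmissions(100_000); // warm-up run, result ignored
        long nanos = timeSubmissions(100_000);
        System.out.println("executor (reused): " + nanos / 1_000_000 + " ms");
        ES.shutdown(); // shut down once, when the application is done with the pool
    }
}
```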
But I don't think you've grasped the point fully - Executors aren't meant to be there for efficiency, they're there to make co-ordinating and handing off work to a thread pool simpler. They will always be less efficient than just invoking Runnable.run() yourself (since at the end of the day the executor service still needs to do this, after doing some extra housekeeping beforehand). It's when you are using them from multiple threads needing asynchronous processing, that they really shine.
Also consider that you're looking at the relative time difference of a basically fixed cost (Executor overhead is the same whether your tasks take 1ms or 1hr to run) compared to a very small variable amount (your trivial runnable). If the executor service takes 5ms extra to run a 1ms task, that's not a very favourable figure. If it takes 5ms extra to run a 5 second task (e.g. a non-trivial SQL query), that's completely negligible and entirely worth it.
So to some extent it depends on your situation - if you have an extremely time-critical section, running lots of small tasks, that don't need to be executed in parallel or asynchronously then you'll get nothing from an Executor. If you're processing heavier tasks in parallel and want to respond asynchronously (e.g. a webapp) then Executors are great.
Whether they are the best choice for you depends on your situation, but really you need to try the tests with realistic representative data. I don't think it would be appropriate to draw any conclusions from the tests you've done unless your tasks really are that trivial (and you don't want to reuse the executor instance...).
Math.random() actually synchronizes on a single Random number generator. Calling Math.random() results in significant contention for the number generator. In fact the more threads you have, the slower it's going to be.
From the Math.random() javadoc:
This method is properly synchronized to allow correct use by more than
one thread. However, if many threads need to generate pseudorandom
numbers at a great rate, it may reduce contention for each thread to
have its own pseudorandom-number generator.
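One way to follow the javadoc's advice is `java.util.concurrent.ThreadLocalRandom` (available since Java 7), which gives each thread its own generator. The sketch below uses a hypothetical class name `PerThreadRandom` and keeps the same "multiply two random numbers" task as the benchmark; it is an illustration, not a tuned benchmark.

```java
import java.util.concurrent.ThreadLocalRandom;

class PerThreadRandom {
    // Same shape as the benchmark's task, but each thread draws from its own generator.
    static double product() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current(); // no shared seed to contend on
        return rnd.nextDouble() * rnd.nextDouble();
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            double sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += product();
            }
            System.out.println(Thread.currentThread().getName() + " sum=" + sum);
        };
        Thread a = new Thread(task, "worker-a");
        Thread b = new Thread(task, "worker-b");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```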
Firstly, there are a few issues with the microbenchmark. You do a warm-up, which is good. However, it is better to run the test multiple times, which should give a feel for whether it has really warmed up and for the variance of the results. It also tends to be better to test each algorithm in a separate run, otherwise you might cause deoptimisation when the algorithm changes.
The task is very small, although I'm not entirely sure how small. So a "number of times faster" figure is pretty meaningless. In multithreaded situations, the threads will touch the same volatile locations, which could cause really bad performance (use a Random instance per thread). Also, a run of 47 milliseconds is a bit short.
Certainly going to another thread for a tiny operation is not going to be fast. Split tasks up into bigger sizes if possible. JDK7 looks as if it will have a fork-join framework, which attempts to support fine tasks from divide and conquer algorithms by preferring to execute tasks on the same thread in order, with larger tasks pulled out by idle threads.
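The fork-join framework did ship in JDK 7 as `java.util.concurrent.ForkJoinPool`. A minimal sketch of the divide-and-conquer style it supports follows, assuming Java 8 or later for `ForkJoinPool.commonPool()`; the class name `SumTask` and the threshold value are hypothetical and would need tuning by measurement.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // hypothetical cut-off; tune by measurement
    private final long[] data;
    private final int from;
    private final int to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: do the work directly instead of paying for another hand-off.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                     // make the left half available to idle workers
        long rightSum = right.compute(); // keep working on the right half in this thread
        return left.join() + rightSum;
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(sum(data)); // prints 499999500000
    }
}
```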
Here are results on my machine (OpenJDK 8 on 64-bit Ubuntu 14.0, Thinkpad W530)
simpleCompuation:6
computationWithObjCreation:5
computationWithObjCreationAndExecutors:33
There's certainly overhead. But remember what these numbers are: milliseconds for 100k iterations. In your case, the overhead was about 4 microseconds per iteration. For me, the overhead was about a quarter of a microsecond.
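The "quarter of a microsecond" figure comes from the numbers above: 33 ms with the executor versus 6 ms for the plain loop, spread over 100k iterations. A small sketch of that arithmetic, with a hypothetical helper name `overheadMicros`:

```java
class OverheadPerTask {
    // Extra cost per iteration in microseconds, given two total timings in milliseconds.
    static double overheadMicros(long executorMillis, long baselineMillis, int iterations) {
        return (executorMillis - baselineMillis) * 1000.0 / iterations;
    }

    public static void main(String[] args) {
        // (33 ms - 6 ms) * 1000 / 100000 iterations
        System.out.println(overheadMicros(33, 6, 100_000) + " microseconds per task"); // 0.27
    }
}
```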
The overhead is synchronization, internal data structures, and possibly a lack of JIT optimization due to complex code paths (certainly more complex than your for loop).
The tasks that you'd actually want to parallelize would be worth it, despite the quarter microsecond overhead.
FYI, this would be a very bad computation to parallelize. I upped the thread count to 8 (the number of cores):
simpleCompuation:5
computationWithObjCreation:6
computationWithObjCreationAndExecutors:38
It didn't make it any faster. This is because Math.random() is synchronized.
The fixed thread pool's ultimate purpose is to reuse already created threads. So the performance gain comes from not having to create a new thread every time a task is submitted. Hence the stop time must be taken inside the submitted task, as the last statement of the run method.
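A minimal sketch of taking the stop time inside the task, assuming a hypothetical class `StopInsideTask` and an `AtomicLong` to carry the timestamp back to the caller. The `get()` call is there only so that the timestamp has been written before it is read.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

class StopInsideTask {
    // Returns nanoseconds from just before submit() to the task's last statement.
    static long timeOneTask(ExecutorService es) {
        AtomicLong stop = new AtomicLong();
        long start = System.nanoTime();
        Future<?> f = es.submit(() -> {
            double x = Math.random() * Math.random();
            stop.set(System.nanoTime()); // last statement of run(): take the stop time here
        });
        try {
            f.get(); // only to make sure `stop` has been written before reading it
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return stop.get() - start;
    }

    public static void main(String[] args) {
        ExecutorService es = Executors.newFixedThreadPool(1);
        timeOneTask(es); // first call also pays for creating the pool's thread
        System.out.println("reused thread: " + timeOneTask(es) + " ns");
        es.shutdown();
    }
}
```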
You need to somehow group execution, in order to submit larger portions of computation to each thread (e.g. build groups based on stock symbol).
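As a rough sketch of such grouping, assuming a hypothetical `Tick` type (a stock symbol plus a price) and a per-symbol sum as the computation: the ticks are grouped first, and one task is submitted per symbol instead of one per tick.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class GroupedSubmission {
    // Hypothetical event type: one price update for one stock symbol.
    static final class Tick {
        final String symbol;
        final double price;
        Tick(String symbol, double price) { this.symbol = symbol; this.price = price; }
    }

    // Sums prices per symbol, submitting one task per symbol rather than one per tick.
    static Map<String, Double> totalsBySymbol(List<Tick> ticks, ExecutorService es) {
        Map<String, List<Tick>> groups = new HashMap<>();
        for (Tick t : ticks) {
            groups.computeIfAbsent(t.symbol, k -> new ArrayList<>()).add(t);
        }
        Map<String, Future<Double>> pending = new HashMap<>();
        for (Map.Entry<String, List<Tick>> e : groups.entrySet()) {
            List<Tick> group = e.getValue();
            Callable<Double> job = () -> {
                double sum = 0;
                for (Tick t : group) {
                    sum += t.price;
                }
                return sum;
            };
            pending.put(e.getKey(), es.submit(job));
        }
        Map<String, Double> totals = new HashMap<>();
        for (Map.Entry<String, Future<Double>> e : pending.entrySet()) {
            try {
                totals.put(e.getKey(), e.getValue().get()); // wait for this group's result
            } catch (InterruptedException | ExecutionException ex) {
                throw new IllegalStateException(ex);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        ExecutorService es = Executors.newFixedThreadPool(2);
        List<Tick> ticks = new ArrayList<>();
        ticks.add(new Tick("AAA", 1.0));
        ticks.add(new Tick("BBB", 5.0));
        ticks.add(new Tick("AAA", 2.0));
        System.out.println(totalsBySymbol(ticks, es));
        es.shutdown();
    }
}
```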
I got the best results in similar scenarios by using the Disruptor. It has a very low per-job overhead. Still, it's important to group jobs; naive round robin usually creates many cache misses.
See http://java-is-the-new-c.blogspot.de/2014/01/comparision-of-different-concurrency.html
In case it is useful to others, here are test results with a realistic scenario - use ExecutorService repeatedly until the end of all tasks - on a Samsung Android device.
Simple computation (MS): 102
Use threads (MS): 31049
Use ExecutorService (MS): 257
Code:
ExecutorService executorService = Executors.newFixedThreadPool(1);
int count = 100000;
//Simple computation
Instant instant = Instant.now();
for (int i = 0; i < count; i++) {
double x = Math.random() * Math.random();
}
Duration duration = Duration.between(instant, Instant.now());
Log.d("ExecutorPerformanceTest", "Simple computation (MS): " + duration.toMillis());
//Use threads
instant = Instant.now();
for (int i = 0; i < count; i++) {
new Thread(() -> {
double x = Math.random() * Math.random();
}
).start();
}
duration = Duration.between(instant, Instant.now());
Log.d("ExecutorPerformanceTest", "Use threads (MS): " + duration.toMillis());
//Use ExecutorService
instant = Instant.now();
for (int i = 0; i < count; i++) {
executorService.execute(() -> {
double x = Math.random() * Math.random();
}
);
}
duration = Duration.between(instant, Instant.now());
Log.d("ExecutorPerformanceTest", "Use ExecutorService (MS): " + duration.toMillis());
I've faced a similar problem, but Math.random() was not the issue.
The problem is having many small tasks that each take just a few milliseconds to complete. That is not much, but a lot of small tasks in series adds up to a lot of time, so I needed to parallelize.
So, the solution I found, which might work for those of you facing this same problem: do not use any of the executor services. Instead, create your own long-lived Threads and feed them tasks.
Here is an example, just as an idea; don't try to copy-paste it, because it probably won't work as-is: I am using Kotlin and translating to Java in my head. The concept is what's important.
First, the Thread: a Thread that can execute a task and then stay there waiting for the next one:
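import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
public class Worker extends Thread {
    private Callable<?> task;
    private final Semaphore semaphore;
    private CountDownLatch latch;
    public Worker(Semaphore semaphore) {
        this.semaphore = semaphore;
    }
    @Override
    public void run() {
        while (true) {
            try {
                semaphore.acquire(); // this will block, the while(true) won't go crazy
            } catch (InterruptedException e) {
                return; // interrupted while waiting: stop the worker
            }
            Callable<?> current = task;
            if (current == null) continue;
            task = null; // clear before signalling, so a task set right after await() is not lost
            try {
                current.call(); // Callable has call(), not run()
            } catch (Exception e) {
                e.printStackTrace(); // a failing task must not kill the worker
            }
            if (latch != null) latch.countDown();
        }
    }
    public void setTask(Callable<?> task) {
        this.task = task;
    }
    public void setCountDownLatch(CountDownLatch latch) {
        this.latch = latch;
    }
}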
There are two things here that need explanation:
the Semaphore: gives you control over how many tasks and when they are executed by this thread
the CountDownLatch: is the way to notify someone else that a task was completed
So this is how you would use this Worker, first just a simple example:
Semaphore semaphore = new Semaphore(0); // initially the semaphore is closed
Worker worker = new Worker(semaphore);
worker.start();
worker.setTask( .. your callable task .. );
semaphore.release(); // this will allow one task to be processed by the worker
Now a more complicated example, with two Threads and waiting for both to complete using the CountDownLatch:
Semaphore semaphore1 = new Semaphore(0);
Worker worker1 = new Worker(semaphore1);
worker1.start();
Semaphore semaphore2 = new Semaphore(0);
Worker worker2 = new Worker(semaphore2);
worker2.start();
// same countdown latch for both workers, with a counter of 2
CountDownLatch countDownLatch = new CountDownLatch(2);
worker1.setCountDownLatch(countDownLatch);
worker2.setCountDownLatch(countDownLatch);
worker1.setTask( .. your callable task .. );
worker2.setTask( .. your callable task .. );
semaphore1.release();
semaphore2.release();
countDownLatch.await(); // this will block until 2 tasks have been completed
And after that code runs you could just add more tasks to the same threads and reuse them. That's the whole point of this, reusing the threads instead of creating new ones.
It is unpolished as f*** but hopefully this gives you an idea. For me this was an improvement compared to no multi threading. And it was much much better than any executor service with any number of threads in the pool by far.