I want to understand performance in multithreaded environments. For that I have written a small test that I ran on my machine (quad-core Intel, Windows XP, Sun JDK 1.6.0_20), with surprising results.
The test is basically a thread-safe counter that is synchronized using either the synchronized keyword or an explicit lock. Here is the code:
import java.util.concurrent.locks.ReentrantLock;
public class SynchronizedPerformance {
static class Counter {
private static final int MAX = 1 << 24;
int count;
long lastLog = 0;
private final ReentrantLock lock = new ReentrantLock();
private int incrementAndGet() {
count++;
if (count == MAX) {
long now = System.nanoTime();
if (lastLog != 0) {
long elapsedTime = now - lastLog;
System.out.printf("counting took %.2f ns\n", Double.valueOf((double)elapsedTime / MAX));
}
lastLog = now;
count = 0;
}
return count;
}
synchronized int synchronizedIncrementAndGet() {
return incrementAndGet();
}
int lockedIncrementAndGet() {
lock.lock();
try {
return incrementAndGet();
} finally {
lock.unlock();
}
}
}
static class SynchronizedCounterAccessor implements Runnable {
private final Counter counter;
public SynchronizedCounterAccessor(Counter counter) {
this.counter = counter;
}
@Override
public void run() {
while (true)
counter.synchronizedIncrementAndGet();
}
}
static class LockedCounterAccessor implements Runnable {
private final Counter counter;
public LockedCounterAccessor(Counter counter) {
this.counter = counter;
}
@Override
public void run() {
while (true)
counter.lockedIncrementAndGet();
}
}
public static void main(String[] args) {
Counter counter = new Counter();
final int n = Integer.parseInt(args[0]);
final String mode = args[1];
if (mode.equals("locked")) {
for (int i = 0; i < n; i++)
new Thread(new LockedCounterAccessor(counter), "ca" + i).start();
} else if (mode.equals("synchronized")) {
for (int i = 0; i < n; i++)
new Thread(new SynchronizedCounterAccessor(counter), "ca" + i).start();
} else {
throw new IllegalArgumentException("locked|synchronized");
}
}
}
I made the following observations:
java SynchronizedPerformance 1 synchronized works pretty well, and takes about 15 ns per step.
java SynchronizedPerformance 2 synchronized interferes a lot and takes about 150 ns per step.
When I start two independent processes of java SynchronizedPerformance 2 synchronized each of them takes about 100 ns per step. That is, starting the process a second time makes the first one (and the second) faster.
I don't understand the third observation. What plausible explanations exist for this phenomenon?
You are running into a situation where performance is entirely dependent on how the scheduler operates. In your third observation, when any other process in the system wants some time (even a little bit), the scheduler will suspend one of your 4 threads. If that thread happens not to hold the lock when it is suspended, its "pair" can now run uncontested and will make lots of progress (it runs at 20x speed compared to the contested situation).
Of course, if it is swapped out when it does hold the lock, its "pair" will make no progress. So you have two competing factors, and the overall runtime depends on the fraction of time the lock is held by a thread and the penalty/bonus you get for each situation. Your bonus is substantial so I would expect some overall speedup like you saw.
The most likely explanation is that there are certain fixed overheads that exist regardless of how many threads are running, for example garbage collection or other resource management.
Related
I'm playing around with building an arraylist class that is made thread-safe in a very clumsy way, by just slapping the synchronized keyword on all methods:
import java.util.stream.*;
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class LongArrayListUnsafe {
public static void main(String[] args) {
LongArrayList dal1 = LongArrayList.withElements();
ExecutorService executorService = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
for (int i=0; i<1000; i++) {
executorService.execute(new Runnable() {
public void run() {
for (int i=0; i<10; i++)
dal1.add(i);
}
});}
System.out.println("Using toString(): " + dal1);
for (int i=0; i<dal1.size(); i++)
System.out.println(dal1.get(i));
System.out.println(dal1.size());} }
class LongArrayList {
private long[] items;
private int size;
public LongArrayList() {
reset();
}
synchronized public static LongArrayList withElements(long... initialValues){
LongArrayList list = new LongArrayList();
for (long l : initialValues) list.add( l );
return list;
}
// reset me to initial
synchronized public void reset(){
items = new long[2];
size = 0;
}
// Number of items in the double list
synchronized public int size() {
return size;
}
// Return item number i
synchronized public long get(int i) {
if (0 <= i && i < size)
return items[i];
else
throw new IndexOutOfBoundsException(String.valueOf(i));
}
// Replace item number i, if any, with x
synchronized public long set(int i, long x) {
if (0 <= i && i < size) {
long old = items[i];
items[i] = x;
return old;
} else
throw new IndexOutOfBoundsException(String.valueOf(i));}
// Add item x to end of list
synchronized public LongArrayList add(long x) {
if (size == items.length) {
long[] newItems = new long[items.length * 2];
for (int i=0; i<items.length; i++)
newItems[i] = items[i];
items = newItems;
}
items[size] = x;
size++;
return this;
}
synchronized public String toString() {
return Arrays.stream(items, 0,size)
.mapToObj( Long::toString )
.collect(Collectors.joining(", ", "[", "]"));
}
}
The relevant thing I'm doing is adding a bunch of elements to a list from a number of tasks. The issue is that when I increase the number of threads that I pass to the fixed thread pool, my code runs in the same time as when I pass only one thread, maybe even slower.
I have three theories on why this is:
This is because of thread overhead: the tasks I am creating are simply too small, and I need to make them bigger before it pays off to use more threads.
It has to do with lock contention: because my class is so clumsily thread-safe, the threads are competing for the locks and somehow slowing everything down.
I'm making a completely obvious mistake in using the thread pool executor.
It is not only that your task is too simple. The key issue is that you marked the add function as synchronized, which means that only a single thread is allowed to enter it at a time. No matter how many executors you use, at any single point in time there is only one thread executing this function, while the others have to wait. Even if you make the task more complex, that won't change. You need both a more complex task and more fine-grained synchronization of it.
As for lock contention: yes, see above, and of course acquiring and releasing locks costs time.
To answer the question in the comments:
synchronized synchronizes on the object on which you invoke the method (i.e., dal1, which is shared by all your threads).
Yes, the contention is fairly obvious. You said yourself "just slapping". Nevertheless, for the code you have I would call it adequate. The operation that takes the longest is resizing and copying the array, and during that time you certainly do not want any other thread to modify it.
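To make the contention concrete, here is a rough, hypothetical sketch (not from the original post) comparing the shared synchronized list with per-thread accumulation that is merged once at the end. It assumes the question's LongArrayList class is on the classpath in the same package; the thread and element counts are arbitrary choices.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class ContentionSketch {
    public static void main(String[] args) throws InterruptedException {
        final int threads = Runtime.getRuntime().availableProcessors();
        final int elementsPerThread = 200_000;
        // Variant A: every add() competes for the single lock inside LongArrayList.
        final LongArrayList shared = LongArrayList.withElements();
        ExecutorService poolA = Executors.newFixedThreadPool(threads);
        long t0 = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            poolA.execute(new Runnable() {
                public void run() {
                    for (int i = 0; i < elementsPerThread; i++) shared.add(i);
                }
            });
        }
        poolA.shutdown();
        poolA.awaitTermination(1, TimeUnit.MINUTES);
        System.out.printf("shared synchronized list: %.1f ms%n", (System.nanoTime() - t0) / 1e6);
        // Variant B: each thread fills its own plain array; the main thread merges the
        // results afterwards, so the lock in LongArrayList is never contended.
        final long[][] locals = new long[threads][elementsPerThread];
        ExecutorService poolB = Executors.newFixedThreadPool(threads);
        long t1 = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int slot = t;
            poolB.execute(new Runnable() {
                public void run() {
                    for (int i = 0; i < elementsPerThread; i++) locals[slot][i] = i;
                }
            });
        }
        poolB.shutdown();
        poolB.awaitTermination(1, TimeUnit.MINUTES);
        LongArrayList merged = LongArrayList.withElements();
        for (long[] chunk : locals)
            for (long v : chunk) merged.add(v);
        System.out.printf("per-thread arrays + merge: %.1f ms%n", (System.nanoTime() - t1) / 1e6);
    }
}
On a multi-core machine the second variant typically finishes faster for this kind of trivially small work, which is the point above: the synchronized add serializes the threads.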
I do not understand why each time I run this code I get a different answer.
The correct answer should be "one 98098 two 98099". Anyone have any ideas why it is not working that way? For example, one time the answer comes back "one 49047 two 49047", then another time it comes back "one 40072 two 40072". I am so confused at this point and lack any reasonable explanation.
public class TestThreads {
public static void main(String[] args){
ThreadOne t1 = new ThreadOne();
ThreadTwo t2 = new ThreadTwo();
Thread one = new Thread(t1);
Thread two = new Thread(t2);
one.start();
two.start();
}
}
class ThreadOne implements Runnable {
Accum a = Accum.getAccum();
public void run(){
for(int x = 0; x < 98; x++){
a.updateCounter(1000);
try{
Thread.sleep(50);
}catch(InterruptedException ex){
}
}
System.out.println("one " + a.getCount());
}
}
class ThreadTwo implements Runnable {
Accum a = Accum.getAccum();
public void run(){
for(int x = 0; x < 99; x++){
a.updateCounter(1);
try{
Thread.sleep(50);
}catch(InterruptedException ex){
}
}
System.out.println("two " + a.getCount());
}
}
class Accum {
private static Accum a = new Accum();
public static Accum getAccum(){
return a;
}
private int counter = 0;
public int getCount(){
return counter;
}
public void updateCounter(int add){
counter += add;
}
private Accum(){ }
}
As you have two threads updating the same data without thread safety, one thread easily overwrites the value set by the other one.
Each thread works on its own thread-cached value, e.g.:
Thread 1 adds 1 one hundred times. It has the value 100
Thread 2 adds 1000 one hundred times. It has the value 100000
At this point, one value is chosen; say it's thread 1's value.
Thread 1 adds 1 one hundred times. It has the value 200
Thread 2 adds 1000 one hundred times. It has the value 100100
This time, thread 2's value is chosen.
In the end only half the updates on average are retained as the value chosen is somewhat random.
You can get to 98099 by declaring the methods in Accum as synchronized.
This will ensure that only one of the threads can access its information at a time.
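An alternative sketch, not from this answer: the same singleton shape built around an AtomicInteger, which makes each update an atomic read-modify-write without any synchronized methods (the class name AtomicAccum is hypothetical).
import java.util.concurrent.atomic.AtomicInteger;
class AtomicAccum {
    private static final AtomicAccum a = new AtomicAccum();
    public static AtomicAccum getAccum() {
        return a;
    }
    private final AtomicInteger counter = new AtomicInteger();
    public int getCount() {
        return counter.get();
    }
    public void updateCounter(int add) {
        counter.addAndGet(add); // atomic read-modify-write, so no updates are lost
    }
    private AtomicAccum() { }
}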
As the other answers have pointed out, you are getting unexpected results because there is nothing to stop each thread overwriting what the other had done.
Try this:
class Accum {
private static Accum a = new Accum();
public static synchronized Accum getAccum(){
return a;
}
private int counter = 0;
public synchronized int getCount(){
return counter;
}
public synchronized void updateCounter(int add){
counter += add;
}
private Accum(){ }
}
Your problem is this:
private static Accum a = new Accum();
public static Accum getAccum(){
return a;
}
Since it is static, there is only one instance shared by all threads, so when you set it in one thread, all threads see the same new value. If you remove the static modifier and instantiate a new object of class Accum for each thread, it should work.
I am having trouble figuring out what my code is doing, as this is my first time coding using multiple threads. To start off, in an attempt to learn this type of programming I decided to write a miniature program that uses 8 threads to sum a number. However, no matter what I do, it seems as if my program never stops when count = 10; it continues onward. I am using 8 threads because I planned on expanding my program to do large calculations. However, these threads are not coordinating at all; they are going way past 10. I have used a synchronized method. I have tried a lock. I have tried implementing both at the same time. No matter what, it appears as if the threads still calculate past 10. See below for my current code.
public class calculator implements Runnable {
static int counter = 0;
static int sum = 0;
private synchronized static int getAndIncrement()
{
// System.out.println("counter is : " + counter);
int temp = counter;
counter = counter + 1;
System.out.println("counter is now : " + counter);
return temp;
}
private synchronized void addToSum(int value)
{
// System.out.println("sum : " + sum + " value: " + value);
sum += value;
}
@Override
public void run()
{
// TODO Auto-generated method stub
while(counter < 10)
{
int tempVal = getAndIncrement();
System.out.println("temp val : " + tempVal);
addToSum(tempVal);
// System.out.println("sum is now : " + sum);
}
}
}
This is my main method:
public static void main(String[] args)
{
calculator[] calc = new calculator[8];
Thread[] thread = new Thread[8];
final long startTime = System.currentTimeMillis();
for(int i = 0; i < 8; i++)
{
calc[i] = new calculator();
thread[i] = new Thread(calc[i]);
thread[i].start();
}
while(thread[0].isAlive() ||thread[1].isAlive() || thread[2].isAlive() || thread[3].isAlive() || thread[4].isAlive() || thread[5].isAlive() || thread[6].isAlive() || thread[7].isAlive())
{}
final long endTime = System.currentTimeMillis();
System.out.println(calculator.sum);
System.out.println("Execution time : " + (startTime - endTime));
}
I appreciate the help!
The synchronized keyword takes the object's lock. This means that two synchronized methods cannot execute concurrently on the same object; they will, however, execute concurrently when invoked on two different objects.
In your example, your code has 8 calculator objects. The synchronized methods do not help you: each thread uses its own separate object. You can completely remove the synchronized keyword and your code will be semantically equivalent.
To avoid this, use the atomic version of the counter (AtomicInteger), or lock on the shared state itself: synchronized (counter) { ... }; for this to work you will have to change the type of counter to Integer.
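As an illustration, here is a minimal sketch of the shared-lock idea (the class is renamed Calculator for the sketch). It uses a dedicated static lock object of my own choosing rather than synchronizing on the counter value itself, because a boxed Integer would be replaced by a different object every time the value changes.
public class Calculator implements Runnable {
    private static final Object LOCK = new Object(); // one lock shared by every instance
    static int counter = 0;
    static int sum = 0;
    @Override
    public void run() {
        while (true) {
            synchronized (LOCK) {
                if (counter >= 10) return; // re-check the condition while holding the lock
                sum += counter;
                counter++;
            }
        }
    }
}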
I've just tested your sample and found that the addToSum method doesn't work as expected here under heavy multi-threading, even though the synchronized keyword is present.
Since the sum variable is static, the method can be made static too.
After adding the static keyword, the behavior is as expected:
private static synchronized void addToSum(int value)
{
sum += value;
}
Here is a simple test (addToSum replaced by incSum for simplicity):
class IncrementorThread implements Runnable {
static int sum = 0; // package-private so the test method below can read it
private static synchronized void incSum()
{
sum ++;
}
public void run() {
incSum();
Thread.yield();
}
}
void testIncrementorThread1() throws InterruptedException {
ExecutorService executorService = Executors.newCachedThreadPool();
//ExecutorService executorService = Executors.newSingleThreadExecutor() // result always ok without needing concurrency precaution
for(int i = 0; i < 5000; i++)
executorService.execute(new IncrementorThread());
executorService.shutdown();
executorService.awaitTermination(4000, TimeUnit.MILLISECONDS);
System.out.println("res = "+IncrementorThread.sum); // must be 5000
}
Result must be 5000, which is not the case if we remove the static keyword from the method incSum()
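To spell out why the static keyword matters (my own illustration, not from the original answer): with an instance method, the lock is this, and every submitted Runnable is a different object, so their locks never exclude each other.
class IncrementorThreadBroken implements Runnable {
    private static int sum = 0;            // one field shared by all instances
    private synchronized void incSum() {   // lock is `this`, a different lock per instance
        sum++;
    }
    public void run() {
        incSum();
        Thread.yield();
    }
}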
Is there an ExecutorService that is suitable for a huge amount of very short-lived tasks? I envision something that internally tries busy waiting before switching over to synchronized waiting. Keeping the order of the tasks is not important, but it should be possible to enforce memory consistency (all tasks happen-before the main thread regains control).
The test posted below consists of 100'000 tasks that each generate 100 doubles in a row. It accepts the size of the thread pool as a command-line parameter and always tests the serial version vs. the parallel one. (If no command-line arg is given, only the serial version is tested.) The parallel version uses a thread pool of fixed size, and allocation of the tasks is not even part of the time measurement. Still, the parallel version is never faster than the serial version; I've tried up to 80 threads (on a machine with 40 hyperthreaded cores). Why?
import java.util.ArrayList;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class ExecutorPerfTest {
public static final int TASKS = 100000;
public static final int SUBTASKS = 100;
static final ThreadLocal<Random> R = new ThreadLocal<Random>() {
@Override
protected synchronized Random initialValue() {
return new Random();
}
};
public class SeqTest implements Runnable {
@Override
public void run() {
Random r = R.get();
for (int i = 0; i < TASKS; i++)
for (int j = 0; j < SUBTASKS; j++)
r.nextDouble();
}
}
public class ExecutorTest implements Runnable {
private final class RandomGenerating implements Callable<Double> {
@Override
public Double call() {
double d = 0;
Random r = R.get();
for (int j = 0; j < SUBTASKS; j++)
d = r.nextDouble();
return d;
}
}
private final ExecutorService threadPool;
private ArrayList<Callable<Double>> tasks = new ArrayList<Callable<Double>>(TASKS);
public ExecutorTest(int nThreads) {
threadPool = Executors.newFixedThreadPool(nThreads);
for (int i = 0; i < TASKS; i++)
tasks.add(new RandomGenerating());
}
public void run() {
try {
threadPool.invokeAll(tasks);
} catch (InterruptedException e) {
e.printStackTrace();
} finally {
threadPool.shutdown();
}
}
}
public static void main(String[] args) {
ExecutorPerfTest executorPerfTest = new ExecutorPerfTest();
if (args.length > 0)
executorPerfTest.start(new String[]{});
executorPerfTest.start(args);
}
private void start(String[] args) {
final Runnable r;
if (args.length == 0) {
r = new SeqTest();
}
else {
final int nThreads = Integer.parseInt(args[0]);
r = new ExecutorTest(nThreads);
}
System.out.printf("Starting\n");
long t = System.nanoTime();
r.run();
long dt = System.nanoTime() - t;
System.out.printf("Time: %.6fms\n", 1e-6 * dt);
}
}
The call to Executors.newFixedThreadPool(nThreads) will create a ThreadPoolExecutor that reads tasks from a LinkedBlockingQueue, i.e. all threads in the executor will lock on the same queue to retrieve the next task.
Given the very small size of each task and the relatively large number of threads/cpus that you are quoting, it's most likely that your program is running slowly because of the high degree of lock contention and context switching that will be occurring.
Note that the implementation of the ReentrantLock used by LinkedBlockingQueue already spins for short periods (up to approximately 1us) while trying to acquire the lock before the thread gives up and blocks.
If your use case permits then you might want to try using the Disruptor pattern instead, see http://lmax-exchange.github.com/disruptor/
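Not part of the original answer, but one way to test that hypothesis is to hand the pool fewer, larger tasks so each queue hand-off is amortized over many subtasks. A rough sketch under that assumption (the CHUNK size and class name are arbitrary choices):
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class BatchedExecutorSketch {
    static final int TASKS = 100000;
    static final int SUBTASKS = 100;
    static final int CHUNK = 1000; // how many of the original tasks each Callable runs
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        List<Callable<Double>> batches = new ArrayList<Callable<Double>>();
        for (int start = 0; start < TASKS; start += CHUNK) {
            batches.add(new Callable<Double>() {
                public Double call() {
                    Random r = new Random();
                    double d = 0;
                    for (int i = 0; i < CHUNK; i++)         // one queue hand-off covers CHUNK tasks
                        for (int j = 0; j < SUBTASKS; j++)
                            d = r.nextDouble();
                    return d;
                }
            });
        }
        long t = System.nanoTime();
        pool.invokeAll(batches); // blocks until all batches have completed
        System.out.printf("batched time: %.6fms%n", 1e-6 * (System.nanoTime() - t));
        pool.shutdown();
    }
}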
Consider the following code picked from Joshua Bloch - Effective Java, page 263
// Broken - requires synchronization!
private static volatile int nextSerialNumber = 0;
public static int generateSerialNumber() {
return nextSerialNumber++;
}
One way to fix the generateSerialNumber method is to add the synchronized modifier to its declaration. This ensures that multiple invocations won't be interleaved, and that each invocation will see the effects of all previous invocations. Once you've done that, you can and should remove the volatile modifier from nextSerialNumber. To bulletproof the method, use long instead of int, or throw an exception if nextSerialNumber is about to wrap.
I understand that we can remove volatile after we make generateSerialNumber synchronized, as it is redundant. But does it do any harm? Is there any performance penalty if I have both synchronized and volatile, like this:
private static volatile int nextSerialNumber = 0;
public static synchronized int generateSerialNumber() {
return nextSerialNumber++;
}
What does "use long instead of int" mean? I do not understand how this bulletproofs the method.
It simply means that long will hold many more numbers than int.
or throw an exception if nextSerialNumber is about to wrap
implies that the concern here is that you run out of numbers and end up with an overflow. You want to ensure that does not happen. The thing is, if you are at the maximum possible integer and you increment, the program does not fail. It happily goes on incrementing, but the result is no longer correct.
Using long will postpone this possibility. Throwing the exception will indicate that it has happened.
What does, use long instead of int means?
It ensures that serial numbers don't roll over for a long, long time to come. Using an int you might use up all the available values (thus nextSerialNumber will have the maximum possible int value), then at the next increment the value is silently rolled over to the smallest (negative) int value, which is almost certainly not what you would expect from serial numbers :-)
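A tiny, self-contained illustration of that silent wrap-around (the class name is just for this sketch):
public class WrapDemo {
    public static void main(String[] args) {
        int n = Integer.MAX_VALUE; // 2147483647
        n++;                       // no exception is thrown
        System.out.println(n);     // prints -2147483648, i.e. Integer.MIN_VALUE
    }
}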
IMHO volatile/AtomicInteger is faster than synchronized in a multi-threaded context. In a single-threaded micro-benchmark they are much the same. Part of the reason for this is that a contended synchronized block can involve an OS call, whereas volatile stays entirely in user space.
I get this output from the following program on Java 6 update 23.
Average time to synchronized++ 10000000 times. was 110368 us
Average time to synchronized on the class ++ 10000000 times. was 37140 us
Average time to volatile++ 10000000 times. was 19660 us
I cannot explain why synchronizing on the class is faster than a plain object.
Code:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
public class Main {
static final Object o = new Object();
static int num = 0;
static final AtomicInteger num2 = new AtomicInteger();
public static void main(String... args) throws InterruptedException {
final int runs = 10 * 1000 * 1000;
perfTest(new Runnable() {
public void run() {
for (int i = 0; i < runs; i++)
synchronized (o) {
num++;
}
}
public String toString() {
return "synchronized++ " + runs + " times.";
}
}, 4);
perfTest(new Runnable() {
public void run() {
for (int i = 0; i < runs; i++)
synchronized (Main.class) {
num++;
}
}
public String toString() {
return "synchronized on the class ++ " + runs + " times.";
}
}, 4);
perfTest(new Runnable() {
public void run() {
for (int i = 0; i < runs; i++)
num2.incrementAndGet();
}
public String toString() {
return "volatile++ " + runs + " times.";
}
}, 4);
}
public static void perfTest(Runnable r, int times) throws InterruptedException {
ExecutorService es = Executors.newFixedThreadPool(times);
long start = System.nanoTime();
for (int i = 0; i < times; i++)
es.submit(r);
es.shutdown();
es.awaitTermination(1, TimeUnit.MINUTES);
long time = System.nanoTime() - start;
System.out.println("Average time to " + r + " was " + time / times / 10000 + " us");
}
}
One way of bulletproofing would be (in addition to the above)
if (nextSerialNumber >= Integer.MAX_VALUE)
// throw an Exception;
or print out something, or catch that exception in the calling routine.
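Putting the pieces together, a sketch of the bulletproofed method (the exception type and message are my own choices, not from the book):
public class SerialNumberGenerator {
    private static int nextSerialNumber = 0;
    public static synchronized int generateSerialNumber() {
        if (nextSerialNumber == Integer.MAX_VALUE)
            throw new IllegalStateException("serial numbers exhausted");
        return nextSerialNumber++;
    }
}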