Fastest Way for Java to write mutexes?


Mutexes are pretty common in many programming languages, like e.g. C/C++. I miss them in Java. However, there are multiple ways I could write my own class Mutex:
Using a simple synchronized keyword on Mutex.
Using a binary semaphore.
Using atomic variables, as discussed here.
...?
What is the fastest (best runtime) way? I think synchronized is most common, but what about performance?

Mutexes are pretty common in many programming languages, like e.g. C/C++. I miss them in Java.
Not sure I follow you (especially because you give the answer in your question).
public class SomeClass {
    private final Object mutex = new Object();
    public void someMethodThatNeedsAMutex() {
        synchronized (mutex) {
            // here you hold the mutex
        }
    }
}
Alternatively, you can simply make the whole method synchronized, which is equivalent to using this as the mutex object:
public class SomeClass {
    public synchronized void someMethodThatNeedsAMutex() {
        // here you hold the mutex
    }
}
What is the fastest (best runtime) way?
Acquiring / releasing a monitor is not going to be a significant performance issue per se (you can read this blog post to see an analysis of the impact). But if you have many threads fighting for the lock, it will create contention and degrade performance.
In that case, the best strategy is to avoid mutexes altogether, either by using "lock-free" algorithms if you are mostly reading data (as pointed out by Marko in the comments, lock-free algorithms use CAS operations, which may involve retrying writes many times if you have lots of writing threads, eventually leading to worse performance) or, even better, by sharing less state across threads.

The opposite is the case: Java designers solved it so well that you don't even recognize it: you don't need a first-class Mutex object, just the synchronized modifier.
If you have a special case where you want to juggle your mutexes in a non-nesting fashion, there's always the ReentrantLock and java.util.concurrent offers a cornucopia of synchronization tools that go way beyond the crude mutex.
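A minimal sketch of the ReentrantLock alternative mentioned above. The class name LockedCounter and the demo helper are hypothetical, not from the answer. Unlike synchronized, the lock is not released automatically, so unlock() goes in a finally block.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example class: a counter guarded by an explicit lock.
class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private long value;

    void increment() {
        lock.lock();
        try {
            value++;
        } finally {
            lock.unlock(); // always release, even if the body throws
        }
    }

    long get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }

    // Starts `threads` threads that each increment `perThread` times,
    // waits for all of them, and returns the final count.
    static long demo(int threads, int perThread) {
        LockedCounter counter = new LockedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

Because lock() and unlock() are ordinary method calls, they do not have to be nested the way synchronized blocks are, which is what makes the "non-nesting" use cases possible.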

In Java, every object can be used as a mutex.
These objects are typically named "lock" or "mutex".
You can create such an object yourself, which is the preferred variant because it prevents external code from accessing the lock:
// usually a field in the class
private final Object mutex = new Object();
// later in methods
synchronized (mutex) {
    // mutually exclusive section for all code that synchronizes
    // on this mutex object
}
Avoiding the mutex altogether is faster, but you have to think about what happens if another thread reads a stale value. In some situations this produces wrong calculation results; in others it only causes a minimal delay (which is still faster than synchronizing).
A detailed explanation can be found in the book Java Concurrency in Practice.

What is the fastest (best runtime) way?
That depends on many things. For example, ReentrantLock used to perform better under contention than using synchronized, but that changed when a new HotSpot version, optimizing synchronized locking, was released. So there's nothing inherent in any way of locking that favors one flavor of mutexes over the other (from a performance point of view) - in fact, the "best" solution can change with the data you're processing and the machine you're running on.
Also, why did the inventors of Java not solve this question for me?
They did - in several ways: synchronized, Locks, atomic variables, and a whole slew of other utilities in java.util.concurrent.

You can run micro benchmarks of each variant, like atomic, synchronized, locked. As others have pointed out, it depends a lot on the machine and number of threads in use. In my own experiments incrementing long integers, I found that with only one thread on a Xeon W3520, synchronized wins over atomic: Atomic/Sync/Lock: 8.4/6.2/21.8, in nanos per increment operation.
This is of course a border case since there is never any contention. In that case we can also look at an unsynchronized, single-threaded long increment, which comes out six times faster than atomic.
With 4 threads I get 21.8/40.2/57.3. Note that these are all increments across all threads, so we actually see a slowdown. It gets a bit better for locks with 64 threads: 22.2/45.1/45.9.
Another test on a 4-way/64T machine using Xeon E7-4820 yields for 1 thread: 9.1/7.8/29.1, 4 threads: 18.2/29.1/55.2 and 64 Threads: 53.7/402/420.
One more data point, this time a dual Xeon X5560, 1T: 6.6/5.8/17.8, 4T: 29.7/81.5/121, 64T: 31.2/73.4/71.6.
So, on a multi-socket machine, there is a heavy cache coherency tax.
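The answer does not show its benchmark code, so the following is only a rough sketch of how the three variants could be compared. All names (IncrementBench, nanosPerOp) are hypothetical, and the naive System.nanoTime() timing includes thread start-up and ignores JIT warm-up; a harness such as JMH is the better tool for real numbers.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Naive timing sketch with hypothetical names; not the original author's benchmark.
class IncrementBench {
    final AtomicLong atomic = new AtomicLong();
    long synced;
    long locked;
    private final Object monitor = new Object();
    private final ReentrantLock lock = new ReentrantLock();

    void incAtomic() { atomic.incrementAndGet(); }

    void incSynced() {
        synchronized (monitor) { synced++; }
    }

    void incLocked() {
        lock.lock();
        try { locked++; } finally { lock.unlock(); }
    }

    // Runs `op` `perThread` times on each of `threads` threads and returns
    // the average wall-clock nanoseconds per increment across all threads.
    static double nanosPerOp(Runnable op, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) op.run();
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return (System.nanoTime() - start) / (double) ((long) threads * perThread);
    }

    public static void main(String[] args) {
        IncrementBench b = new IncrementBench();
        int threads = 4, perThread = 1_000_000;
        System.out.printf("atomic: %.1f ns/op%n", nanosPerOp(b::incAtomic, threads, perThread));
        System.out.printf("sync:   %.1f ns/op%n", nanosPerOp(b::incSynced, threads, perThread));
        System.out.printf("lock:   %.1f ns/op%n", nanosPerOp(b::incLocked, threads, perThread));
    }
}
```

The absolute numbers will differ from the ones quoted above; only the relative ordering on a given machine and thread count is of interest.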

You can use java.util.concurrent.locks.Lock or java.util.concurrent.Semaphore in the same way as a mutex. But using the synchronized keyword is a better way :-)
Regards
Andrej

Related

Thread safety static variables

I read
"Thread safety for static variables", and I understand it and agree with it. But
can someone explain the following passage from the book for the Java SE 7 Programmer exam (804)?
public void run() {
    synchronized (SharedCounter.class) {
        SharedCounter.count++;
    }
}
However, this code is inefficient since it acquires and releases the
lock every time just to increment the value of count.
Can someone explain the above quote to me?

The code is not particularly inefficient. It could be slightly more efficient. The main problem is that it is fragile: if any developer forgets to synchronize access to the global SharedCounter.count variable, you have a thread-safety issue. Indeed, since count++ is not an atomic operation, and since changing the value of a variable without synchronization doesn't make the variable's new value visible to other threads, every access to count must be synchronized.
The synchronization is thus not correctly encapsulated in a single class. Generally, accessing global public fields is bad design. It's even worse in a multi-threaded environment.
Using an AtomicInteger solves the encapsulation problem, and makes it slightly more efficient at the same time.

Synchronizing can be expensive, so it shouldn't be used carelessly. There are better ways such as using AtomicInteger.incrementAndGet(); which uses different mechanisms to handle the synchronization.

It's inefficient compared to using intrinsic CPU instructions which can do atomic increments without using a lock. See http://en.wikipedia.org/wiki/Fetch-and-add and http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/AtomicInteger.html
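The AtomicInteger suggestion made in the answers above could look roughly like this. It is a sketch of how the book's SharedCounter might be rewritten, not the book's own code; the method names increment and current are assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical rewrite of SharedCounter: the count is private and only reachable
// through atomic operations, so no caller can forget to synchronize.
class SharedCounter {
    private static final AtomicInteger count = new AtomicInteger();

    // Atomically adds one and returns the new value; no lock is acquired.
    static int increment() {
        return count.incrementAndGet();
    }

    // Reads the current value with volatile-read semantics.
    static int current() {
        return count.get();
    }
}
```

A worker's run() method would then simply call SharedCounter.increment() instead of wrapping count++ in a synchronized block.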

Difference between intrinsic locking, client-side locking & extrinsic locking?

What is the difference between intrinsic locking, client-side locking & extrinsic locking?
What is the best way to create a thread-safe class?
Which kind of locking is preferred & why?

I would highly recommend you to read "Java Concurrency In Practice" by Brian Goetz. It is an excellent book that will help you to understand all the concepts about concurrency!
About your questions: I am not sure I can answer them all, but I can give it a try. Most of the time, when the question is "what is the best way to lock", the answer is that it depends on the problem you are trying to solve.
Question 1:
The things you are trying to compare are not exactly comparable.
Java provides a built-in mechanism for locking, the synchronized block. Every object can implicitly act as a lock for purposes of synchronization; these built-in locks are called intrinsic locks.
What is interesting about intrinsic locks is that ownership of a lock is per thread and not per method invocation. Only one thread can hold the lock at a given time. Related to this is reentrancy, which allows the same thread to acquire a lock it already holds. Intrinsic locks are reentrant.
Client-side locking, if I understand what you mean, is something different. When a class is not thread-safe, its clients need to take care of this themselves. They need to hold locks to make sure that there are no race conditions.
Extrinsic locking means using explicit locks instead of the built-in synchronized mechanism, which gives you implicit locks. It is a more sophisticated way of locking, with several advantages (for example, a lock attempt can time out or be interrupted, and a ReentrantLock can optionally be fair). A good starting point is the Java documentation about locks.
Question 2:
It depends :) The easiest for me is to try to keep everything immutable. When something is immutable, I don't need to care about thread safety anymore.
Question 3:
I kind of answered it in my answer to your first question.
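To illustrate what explicit (extrinsic) locks add over synchronized, here is a small sketch using ReentrantLock.tryLock with a timeout. The class TimedLockExample and its methods are made up for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example of an explicit lock with a bounded wait.
class TimedLockExample {
    // `true` requests a fair lock (longest-waiting thread first); this is optional.
    private final ReentrantLock lock = new ReentrantLock(true);
    private int balance;

    // Tries to deposit; gives up if the lock cannot be acquired within the timeout.
    boolean tryDeposit(int amount, long timeoutMillis) {
        try {
            if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // lock is busy: the caller can retry or do something else
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
        try {
            balance += amount;
            return true;
        } finally {
            lock.unlock();
        }
    }

    int balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }
}
```

A synchronized block offers neither the timeout nor the reaction to interruption while waiting; a thread blocked on an intrinsic lock simply waits until the lock becomes free.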

Explicit - locking using the concurrent lock utilities such as the Lock interface, e.g. ConcurrentHashMap.
Intrinsic - locking using synchronized.
Client-side locking - Classes like ConcurrentHashMap don't support client-side locking, because the get method does not use any kind of lock. So even if you lock on the map itself, as in synchronized (theConcurrentHashMap), another thread can still access the map.
Classes whose methods all synchronize on the object itself support client-side locking, because client code can acquire that same lock. Below is an example using Vector:
public static Object getLast(Vector list) {
    synchronized (list) {
        int lastIndex = list.size() - 1;
        return list.get(lastIndex);
    }
}
public static void deleteLast(Vector list) {
    synchronized (list) {
        int lastIndex = list.size() - 1;
        list.remove(lastIndex);
    }
}

Here are some links that discuss the different locking schemes:
Explicit versus Intrinsic
Client side locking and when to avoid it
I don't know that there is a "best" way to create a thread safe class, it depends on what you are trying to achieve exactly. Usually you don't have to make the whole class thread safe, only guard the resources that different threads all have access to, such as common lists etc.

Is there a viable use case where Java synchronized keyword is better than Atomics?

I've not much experience with threading, but I wrote a nifty non-blocking sequential Id generator with Atomics.... It got me a very significant performance boost in testing. Now I wonder why anyone would use synchronized since it is so much slower... is there a reason on modern 64 bit multi-core hardware? Others are asking me about Atomics now. I would love to be able to tell them to never use that keyword unless they are deploying to ancient hardware.

Because you can't do multiple actions exclusively using only atomics (well, technically you can, because you can implement a "lock" using atomics, but I think that's beside the point). You also can't do a blocking wait using atomics (you can do a busy wait, but that's almost always a bad idea).
Here's an exercise for the OP: write a program which writes timestamped log messages using multiple threads to the same file, where the messages must show up in the file in timestamp order. Implement this using only atomics, but without re-inventing ReentrantLock/synchronized.

Now I wonder why anyone would use synchronized since it is so much slower...
Maybe because speed isn't everything.
In fact, if you looked objectively at the overall performance benefit of using your "nifty" generator on a real application, I suspect you will find that it is too small to matter. Profiling the application will tell you that.
Then there is the issue of whether your benchmarking is actually valid; i.e. whether you are doing the things that are needed to avoid misleading effects like JVM warmup anomalies, optimization anomalies, and so on. And whether you are (actually) measuring the contended and uncontended cases.
Is there a viable use case where Java synchronized keyword is better than Atomics?
That's easy.
Any situation where you require exclusive access to one or more data structures to perform a sequence of operations, or an operation that is not intrinsically thread-safe. The AtomicXxx types don't support this kind of thing.
I would love to be able to tell them to never use that keyword unless they are deploying to ancient hardware.
Don't tell them that. It is incorrect. In fact, if you are new to Java threads, I recommend that you read "Java Concurrency in Practice" by Goetz et al before you start advising people.

Depends on what you are doing - if you only need the functionality that an atomic provides, then yes, there would be no need to do the same work yourself (using the synchronized keyword). However, many multithreaded applications do things a lot more complicated than just incrementing a number atomically.
For example, you might need a unit of work to be done where you modify several data structures in memory and all of that has to happen without interference - you could use a synchronized function or block for that.
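As a sketch of such a unit of work, the hypothetical TaskBoard class below moves an item between two collections under one lock. Two separate atomic variables could not make the "remove from one, add to the other" step indivisible.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical example: a task is always in exactly one of the two queues.
class TaskBoard {
    private final Object lock = new Object();
    private final Deque<String> pending = new ArrayDeque<>();
    private final Deque<String> done = new ArrayDeque<>();

    void submit(String task) {
        synchronized (lock) {
            pending.addLast(task);
        }
    }

    // Moves one task from pending to done as a single indivisible step.
    boolean completeNext() {
        synchronized (lock) {
            String task = pending.pollFirst();
            if (task == null) {
                return false; // nothing to complete
            }
            done.addLast(task);
            return true;
        }
    }

    // Reads both sizes under the same lock, so a task in transit is never missed or counted twice.
    int total() {
        synchronized (lock) {
            return pending.size() + done.size();
        }
    }

    int doneCount() {
        synchronized (lock) {
            return done.size();
        }
    }
}
```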

From what I understand, the synchronized keyword is actually a moderately heavyweight recursive (re-entrant) lock.
For instance, the following (horrible) code would not deadlock:
public static Object lock = new Object();
int recurCount = 0;
public int fLocktorial(int n) {
    synchronized (lock) {
        recurCount++;
        if (n <= 0)
            return 1;
        return n * fLocktorial(n - 1);
    }
}
Implementing this requires the maintenance of additional state and logic within the lock, which may contribute to its lower performance over atomics and other primitives. However, it does allow you to grab locks arbitrarily inside functions, without worrying if a caller has already obtained the lock. Locks implemented naively using Atomics would deadlock in this case.
Additionally, synchronized may yield performance benefits if large amounts of processing are done within the lock. Getting a lock only has a performance hit once, while atomics force a core synchronization per operation. This flushes the processor pipeline, impacting performance.

Conceptually, a critical section protected by a lock transforms state from one valid state to another.
int x, y; // invariant: x == y
void inc() {
    synchronized (lock) {
        x++;
        y++;
    }
}
void dec() {
    ...
}
We could encapsulate the state in an immutable object, and atomically replace the object.
class State {
    final int x, y;
    State(int x, int y) { this.x = x; this.y = y; }
}
// AtomicReference (java.util.concurrent.atomic) is one way to get compareAndSet without Unsafe
final AtomicReference<State> state = new AtomicReference<>(new State(0, 0));
void inc() {
    State s, s2;
    do {
        s = state.get();
        s2 = new State(s.x + 1, s.y + 1);
    } while (!state.compareAndSet(s, s2)); // retry if another thread changed state in between
}
Is that better? Not necessarily. It is appealing, and it's simpler when states get more complicated; but it's probably slower in most cases.

Is unsynchronized read of integer threadsafe in java?

I see this code quite frequently in some OSS unit tests, but is it thread safe ? Is the while loop guaranteed to see the correct value of invoc ?
If no; nerd points to whoever also knows which CPU architecture this may fail on.
private int invoc = 0;
private synchronized void increment() {
    invoc++;
}
public void isItThreadSafe() throws InterruptedException {
    for (int i = 0; i < TOTAL_THREADS; i++) {
        new Thread(new Runnable() {
            public void run() {
                // do some stuff
                increment();
            }
        }).start();
    }
    while (invoc != TOTAL_THREADS) {
        Thread.sleep(250);
    }
}

No, it's not threadsafe. invoc needs to be declared volatile, or accessed while synchronizing on the same lock, or changed to use AtomicInteger. Just using the synchronized method to increment invoc, but not synchronizing to read it, isn't good enough.
The JVM does a lot of optimizations, including CPU-specific caching and instruction reordering. It uses the volatile keyword and locking to decide when it can optimize freely and when it has to have an up-to-date value available for other threads to read. So when the reader doesn't use the lock the JVM can't know not to give it a stale value.
This quote from Java Concurrency in Practice (section 3.1.3) discusses how both writes and reads need to be synchronized:
Intrinsic locking can be used to guarantee that one thread sees the effects of another in a predictable manner, as illustrated by Figure 3.1. When thread A executes a synchronized block, and subsequently thread B enters a synchronized block guarded by the same lock, the values of variables that were visible to A prior to releasing the lock are guaranteed to be visible to B upon acquiring the lock. In other words, everything A did in or prior to a synchronized block is visible to B when it executes a synchronized block guarded by the same lock. Without synchronization, there is no such guarantee.
The next section (3.1.4) covers using volatile:
The Java language also provides an alternative, weaker form of synchronization, volatile variables, to ensure that updates to a variable are propagated predictably to other threads. When a field is declared volatile, the compiler and runtime are put on notice that this variable is shared and that operations on it should not be reordered with other memory operations. Volatile variables are not cached in registers or in caches where they are hidden from other processors, so a read of a volatile variable always returns the most recent write by any thread.
Back when we all had single-CPU machines on our desktops we'd write code and never have a problem until it ran on a multiprocessor box, usually in production. Some of the factors that give rise to the visibility problems, things like CPU-local caches and instruction reordering, are things you would expect from any multiprocessor machine. Elimination of apparently unneeded instructions could happen on any machine, though. There's nothing forcing the JVM to ever make the reader see the up-to-date value of the variable; you're at the mercy of the JVM implementors. So it seems to me this code would not be a good bet for any CPU architecture.
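One of the fixes named at the top of this answer, switching to AtomicInteger, could be sketched as follows. The class name, the value of TOTAL_THREADS, and the shortened sleep interval are assumptions; the question does not give them.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical rewrite of the test from the question using AtomicInteger.
class InvocationCounterTest {
    static final int TOTAL_THREADS = 8; // assumed value, not given in the question
    private final AtomicInteger invoc = new AtomicInteger();

    // Starts the workers, polls until all have reported, and returns the observed count.
    int runAndCount() {
        for (int i = 0; i < TOTAL_THREADS; i++) {
            new Thread(() -> {
                // do some stuff
                invoc.incrementAndGet(); // atomic read-modify-write, visible to other threads
            }).start();
        }
        // get() has the memory effects of a volatile read, so the loop cannot
        // keep seeing a stale value indefinitely.
        while (invoc.get() != TOTAL_THREADS) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return invoc.get();
    }
}
```

Both the write side (incrementAndGet) and the read side (get) now go through the same atomic variable, which is the point the answer makes about needing to synchronize reads as well as writes.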

Well!
private volatile int invoc = 0;
Will do the trick.
And see "Are java primitive ints atomic by design or by accident?", which cites some of the relevant Java definitions. Apparently int is fine, but double & long might not be.
edit, add-on. The question asks, "see the correct value of invoc ?". What is "the correct value"? As in the timespace continuum, simultaneity doesn't really exist between threads. One of the above posts notes that the value will eventually get flushed, and the other thread will get it. Is the code "thread safe"? I would say "yes", because it won't "misbehave" based on the vagaries of sequencing, in this case.

Theoretically, it is possible that the read is cached. Nothing in Java memory model prevents that.
Practically, that is extremely unlikely to happen (in your particular example). The question is, whether JVM can optimize across a method call.
read #1
method();
read #2
For the JVM to reason that read#2 can reuse the result of read#1 (which can be stored in a CPU register), it must know for sure that method() contains no synchronization actions. This is generally impossible, unless method() is inlined and the JVM can see from the flattened code that there are no sync/volatile or other synchronization actions between read#1 and read#2; then it can safely eliminate read#2.
Now in your example, the method is Thread.sleep(). One way to implement it is to busy loop for certain times, depending on CPU frequency. Then JVM may inline it, and then eliminate read#2.
But of course such implementation of sleep() is unrealistic. It is usually implemented as a native method that calls OS kernel. The question is, can JVM optimize across such a native method.
Even if JVM has knowledge of internal workings of some native methods, therefore can optimize across them, it's improbable that sleep() is treated that way. sleep(1ms) takes millions of CPU cycles to return, there is really no point optimizing around it to save a few reads.
--
This discussion reveals the biggest problem of data races - it takes too much effort to reason about it. A program is not necessarily wrong, if it is not "correctly synchronized", however to prove it's not wrong is not an easy task. Life is much simpler, if a program is correctly synchronized and contains no data race.

As far as I understand the code it should be safe. The bytecode can be reordered, yes. But eventually invoc should be in sync with the main thread again. Synchronize guarantees that invoc is incremented correctly so there is a consistent representation of invoc in some register. At some time this value will be flushed and the little test succeeds.
It is certainly not nice and I would go with the answer I voted for and would fix code like this because it smells. But thinking about it I would consider it safe.

If you're not required to use "int", I would suggest AtomicInteger as a thread-safe alternative.

When do I need to use AtomicBoolean in Java?

How I can use AtomicBoolean and what is that class for?

When multiple threads need to check and change the boolean. For example:
if (!initialized) {
    initialize();
    initialized = true;
}
This is not thread-safe. You can fix it by using AtomicBoolean:
if (atomicInitialized.compareAndSet(false, true)) {
    initialize();
}
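A self-contained sketch of the same idea, with a hypothetical OnceInitializer class and a call counter added only so the effect can be observed:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example: initialize() runs at most once, however many threads race.
class OnceInitializer {
    private final AtomicBoolean initialized = new AtomicBoolean(false);
    private final AtomicInteger initCalls = new AtomicInteger(); // only here to count calls

    void ensureInitialized() {
        // Exactly one caller flips false -> true and therefore runs initialize().
        if (initialized.compareAndSet(false, true)) {
            initialize();
        }
    }

    private void initialize() {
        initCalls.incrementAndGet();
    }

    int initCount() {
        return initCalls.get();
    }

    // Races `threads` threads through ensureInitialized() and returns how often initialize() ran.
    static int demo(int threads) {
        OnceInitializer once = new OnceInitializer();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(once::ensureInitialized);
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return once.initCount();
    }
}
```

Note one limitation of this pattern: a thread that loses the race returns immediately, possibly before the winning thread has finished initialize(). If callers must wait for initialization to complete, a lock or a similar blocking mechanism is still needed.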

Here are the notes I made (from Brian Goetz's book), which might be of help to you:
AtomicXXX classes
They provide a non-blocking compare-and-swap implementation.
They take advantage of the support provided by hardware (the CMPXCHG instruction on Intel).
When lots of threads are running through code that uses these atomic concurrency APIs, they will scale much better than code that uses object-level monitors/synchronization. Since Java's synchronization mechanisms make code wait, when there are lots of threads running through your critical sections, a substantial amount of CPU time is spent managing the synchronization mechanism itself (waiting, notifying, etc.). Since the new API uses hardware-level constructs (atomic variables) and wait-free and lock-free algorithms to implement thread safety, a lot more CPU time is spent "doing stuff" rather than managing synchronization.
They not only offer better throughput, but also provide greater resistance to liveness problems such as deadlock and priority inversion.

There are two main reasons why you can use an atomic boolean. First it's mutable, you can pass it in as a reference and change the value that is associated to the boolean itself, for example.
public final class MyThreadSafeClass {
    private AtomicBoolean myBoolean = new AtomicBoolean(false);
    private SomeThreadSafeObject someObject = new SomeThreadSafeObject();
    public boolean doSomething() {
        someObject.doSomeWork(myBoolean);
        return myBoolean.get(); // will return true
    }
}
and in the someObject class
public final class SomeThreadSafeObject {
    public void doSomeWork(AtomicBoolean b) {
        b.set(true);
    }
}
More importantly though, it's thread safe and can indicate to developers maintaining the class, that this variable is expected to be modified and read from multiple threads. If you do not use an AtomicBoolean, you must synchronize the boolean variable you are using by declaring it volatile or synchronizing around the read and write of the field.

The AtomicBoolean class gives you a boolean value that you can update atomically. Use it when you have multiple threads accessing a boolean variable.
The java.util.concurrent.atomic package overview gives you a good high-level description of what the classes in this package do and when to use them. I'd also recommend the book Java Concurrency in Practice by Brian Goetz.

Excerpt from the package description
Package java.util.concurrent.atomic description: A small toolkit of classes that support lock-free thread-safe programming on single variables.[...]
The specifications of these methods enable implementations to employ efficient machine-level atomic instructions that are available on contemporary processors.[...]
Instances of classes AtomicBoolean, AtomicInteger, AtomicLong, and AtomicReference each provide access and updates to a single variable of the corresponding type.[...]
The memory effects for accesses and updates of atomics generally follow the rules for volatiles:
get has the memory effects of reading a volatile variable.
set has the memory effects of writing (assigning) a volatile variable.
weakCompareAndSet atomically reads and conditionally writes a variable, is ordered with respect to other memory operations on that variable, but otherwise acts as an ordinary non-volatile memory operation.
compareAndSet and all other read-and-update operations such as getAndIncrement have the memory effects of both reading and writing volatile variables.
