Does 'volatile' guarantee that any thread reads the most recently written value? - java

From the book Effective Java:
While the volatile modifier performs no mutual exclusion, it guarantees that any thread that reads the field will see the most recently written value
SO and many other sources claim similar things.
Is this true?
I mean really true, not a close-enough model, or true only on x86, or only in Oracle JVMs, or some definition of "most recently written" that's not the standard English interpretation...
Other sources (SO example) have said that volatile in Java is like acquire/release semantics in C++. Which I think do not offer the guarantee from the quote.
I found that in the JLS 17.4.4 it says "A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order)." But I don't quite understand.
There are quite some sources for and against this, so I'm hoping the answer is able to convince that many of those (on either side) are indeed wrong - for example reference or spec, or counter-example code.

Is this true?
I mean really true, not a close-enough model, or true only on x86, or only in Oracle JVMs, or some definition of "most recently written" that's not the standard English interpretation...
Yes, at least in the sense that a correct implementation of Java gives you this guarantee.
Unless you are using some exotic, experimental Java compiler/JVM (*), you can essentially take this as true.
From JLS 17.4.5:
A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
(*) As Stephen C points out, such an exotic implementation that doesn't implement the memory model semantics described in the language spec can't usefully (or even legally) be described as "Java".
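As a rough illustration of the visibility guarantee quoted above, here is a minimal sketch of a stop flag. The class and method names are made up for this example. Note that the JLS places no time bound on when the write becomes visible; the sketch only relies on the worker eventually observing it, which is what happens in practice.

```java
class StopFlagDemo {
    // volatile: a read that observes the write below is ordered after it (happens-before)
    private static volatile boolean running;

    // Starts a spinning worker, asks it to stop, and returns how often it looped.
    static long spinUntilStopped() {
        running = true;
        final long[] iterations = new long[1];
        Thread worker = new Thread(() -> {
            while (running) {        // volatile read on every iteration
                iterations[0]++;
            }
        });
        worker.start();
        try {
            Thread.sleep(50);
            running = false;         // volatile write
            worker.join();           // join() makes iterations[0] safely readable here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return iterations[0];
    }

    public static void main(String[] args) {
        System.out.println("worker stopped after " + spinUntilStopped() + " iterations");
    }
}
```

Without volatile on running, the JIT compiler would be allowed to hoist the read out of the loop, and the worker might never terminate.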

The quote per se is correct in terms of what it tries to prove, but it is incorrect in a broader view.
It tries to make a distinction of sequential consistency and release/acquire semantics, at least in my understanding. The difference is rather "thin" between these two terms, but very important. I have tried to simplify the difference at the beginning of this answer or here.
The author is trying to say that volatile offers that sequential consistency, as implied by that:
"... it guarantees that any thread.."
If you look at the JLS, it has this sentence:
A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
The tricky part there is the word subsequent and its meaning, and it has been discussed here. What it really means is "subsequent that observes that write". So happens-before is guaranteed when the reader observes the value that the writer has written.
This already implies that a write is not necessarily seen by the next read, and this can be the case where speculative execution is allowed. So in this regard, the quote is misleading.
The quote that you found:
A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order)
is complicated to understand without a much broader context. In simple words, it establishes a synchronizes-with order (and implicitly happens-before) between two threads, where the volatile variable v is shared between them. Here is an answer where this has a broader explanation and thus should make more sense.

It is not true. The JMM is based on sequential consistency, and sequential consistency does not guarantee real time ordering; for that you need linearizability. In other words, reads and writes can be skewed as long as the program order isn't violated (or as long as it can't be proven that program order (po) was violated).
A read of a volatile variable a needs to see the most recently written value before it in the memory order. But that doesn't imply real time ordering.
Good read about the topic:
https://concurrency-interest.altair.cs.oswego.narkive.com/G8KjyUtg/relativity-of-guarantees-provided-by-volatile.
I'll make it concrete:
Imagine there are 2 CPU's and (volatile) variable A with initial value 0. CPU1 does a store A=1 and CPU2 does a load of A. And both CPUs have the cacheline containing A in SHARED state.
The store is first speculatively executed and written to the store buffer; eventually the store commits and retires, but since the stored value is still in the store buffer, it isn't visible yet to CPU2. So far it wasn't required for the cacheline to be in an EXCLUSIVE/MODIFIED state, so the cacheline on CPU2 still contains the old value and hence CPU2 can still read the old value.
So in the real time order, the write of A=1 is ordered before the read of A=0, but in the synchronization order, the write of A=1 is ordered after the read of A=0.
Only when the store leaves the store buffer and wants to enter the L1 cache is the request for ownership (RFO) sent to all other CPUs, which sets the cacheline containing A to INVALID on CPU2 (RFO prefetching I'll leave out of the discussion). If CPU2 would now read A, it is guaranteed to see A=1 (the request will block till CPU1 has completed the store to the L1 cache).
On acknowledgement of the RFO the cacheline is set to MODIFIED on CPU1 and the store is written to the L1 cache.
So there is a period of time between when the store is executed/retired and when it is visible to another CPU. But the only way to determine this is if you would add special measuring equipment to the CPUs.
I believe a similar delaying effect can happen on the reading side with invalidation queues.
In practice this will not be an issue because store buffers have a limited capacity and need to be drained eventually (so a write can't be invisible indefinitely). So in day to day usage you could say that a volatile read, reads the most recent write.
A Java volatile write/read provides release/acquire semantics, but keep in mind that the volatile write/read is stronger than release/acquire semantics. A volatile write/read is sequentially consistent and release/acquire semantics isn't.
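To make the difference between sequential consistency and release/acquire concrete, here is a sketch of the classic "store buffering" litmus test. The class and method names are invented for this example. With volatile fields, the outcome r1 == 0 && r2 == 0 is forbidden by the JMM, because all volatile actions fall into one total synchronization order that respects program order. With plain fields, or with pure release/acquire accesses, that outcome would be allowed.

```java
class StoreBufferingLitmus {
    static volatile int x, y;
    static int r1, r2;   // plain fields; only read by the caller after join()

    // Runs the litmus test `runs` times and counts executions where both reads saw 0.
    static int countBothZero(int runs) {
        int bothZero = 0;
        for (int i = 0; i < runs; i++) {
            x = 0;
            y = 0;
            Thread t1 = new Thread(() -> { x = 1; r1 = y; });
            Thread t2 = new Thread(() -> { y = 1; r2 = x; });
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (r1 == 0 && r2 == 0) {
                bothZero++;
            }
        }
        return bothZero;
    }

    public static void main(String[] args) {
        System.out.println("both-zero outcomes: " + countBothZero(1000));
    }
}
```

On x86 this is the case where the JIT has to emit a StoreLoad barrier (or a locked instruction) after the volatile store; removing volatile from x and y makes the both-zero outcome legal, although it may still be rare with freshly started threads.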

Related

Memory Barrier Vs CAS

I find that CAS will flush all CPU write cache to main memory. Is this similar to a memory barrier?
If this is true, does this mean CAS can make java Happens-Before work?
For answer:
The CAS is CPU instruction.
The barrier is a StoreLoad barrier, because what I care about is whether data written before the CAS can be read after the CAS.
More Detail:
I have this question because I am writing a fork-join built in Java. The implementation is like this
{
    //initialize result container (n = number of workers; the size is not given in the original)
    Object[] result = new Object[n];
    //worker finish state count
    AtomicInteger state = new AtomicInteger(result.length);
}
//worker thread i
{
    result[i] = new Object();
    //this is a CAS operation
    state.getAndDecrement();
    if(state.get() == 0){
        //do something using result array
    }
}
I want to know whether the (do something using result array) part can see all result elements written by the other worker threads.
I find that CAS will flush all CPU write cache to main memory. Is this similar to a memory barrier?
It depends on what you mean by CAS. (A specific hardware instruction? An implementation strategy used in the implementation of some Java class?)
It depends on what kind of memory barrier you are talking about. There are a number of different kinds ...
It is not necessarily true that a CAS instruction flushes all dirty cache lines. It depends on how a particular instruction set / hardware implements the CAS instruction.
It is unclear what you mean by "make happens-before work". Certainly, under some circumstance a CAS instruction would provide the necessary memory coherency properties for a specific happens-before relationship. But not necessarily all relationships. It would depend on how the CAS instruction is implemented by the hardware.
To be honest, unless you are actually writing a Java compiler, you would do better not to try to understand the intricacies of what a JIT compiler needs to do to implement the Java Memory Model. Just apply the happens-before rules.
UPDATE
It turns out from your recent updates and comments that your actual question is about the behavior of AtomicInteger operations.
The memory semantics of the atomic types are specified in the package javadoc for java.util.concurrent.atomic as follows:
The memory effects for accesses and updates of atomics generally follow the rules for volatiles, as stated in The Java Language Specification (17.4 Memory Model):
get has the memory effects of reading a volatile variable.
set has the memory effects of writing (assigning) a volatile variable.
lazySet has the memory effects of writing (assigning) a volatile variable except that it permits reorderings with subsequent (but not previous) memory actions that do not themselves impose reordering constraints with ordinary non-volatile writes. Among other usage contexts, lazySet may apply when nulling out, for the sake of garbage collection, a reference that is never accessed again.
weakCompareAndSet atomically reads and conditionally writes a variable but does not create any happens-before orderings, so provides no guarantees with respect to previous or subsequent reads and writes of any variables other than the target of the weakCompareAndSet.
compareAndSet and all other read-and-update operations such as getAndIncrement have the memory effects of both reading and writing volatile variables.
As you can see, operations on Atomic types are specified to have memory semantics that are equivalent to those of volatile variables. This should be sufficient to reason about your use of Java atomic types ... without resorting to dubious analogies with CAS instructions and memory barriers.
Your example is incomplete and it is difficult to understand what it is trying to do. Therefore, I can't comment on its correctness. However, you should be able to analyze it yourself using happens-before logic, etc.
I find that CAS will flush all CPU write cache to main memory.
Is this similar to memory barrier?
A CAS in Java on the X86 is implemented using a lock prefix and then it depends on the type of CAS what kind of instruction is actually being used; but that isn't that relevant for this discussion. A locked instruction effectively is a full barrier; so it includes all 4 fences: LoadLoad/LoadStore/StoreLoad/StoreStore. Since the X86 provides all but StoreLoad due to TSO, only the StoreLoad needs to be added; just as with a volatile write.
A StoreLoad doesn't force changes to be written to main memory; it only forces the CPU to wait with executing loads until the store buffer has been drained to the L1d. However, with MESI (Intel) based cache coherence protocols, it can happen that a cache-line that is in MODIFIED state on a different CPU needs to be flushed to main memory before it can be returned as EXCLUSIVE. With MOESI (AMD) based cache coherence protocols, this is not an issue. If the cache-line is already in MODIFIED or EXCLUSIVE state on the core doing the StoreLoad, the StoreLoad doesn't cause the cache line to be flushed to main memory. The cache is the source of truth.
If this is true, does this mean CAS can make java Happens-Before work?
From a memory model perspective, a successful CAS in Java is nothing else than a volatile read followed by a volatile write. So there is a happens-before relation between a volatile write of some field on some object instance and a subsequent volatile read of the same field on the same object instance.
Since you are working with Java, I would focus on the Java Memory Model and not too much on how it is implemented in the hardware. The JMM allows for executions that can't be explained purely by thinking in fences.
Regarding your example:
result[i] = new Object();
//this is a CAS operation
state.getAndDecrement();
if(state.get() == 0){
//do something using result array
}
I'm not sure what the intended logic is. In your example, multiple threads could see at the same time that the state is 0, so all of them could start to do something with the array. If this behavior is undesirable, then it is caused by a race condition. I would use something like this:
result[i] = new Object();
//this is a CAS operation; decrementAndGet returns the updated value
//(getAndDecrement would return the previous value, so the last worker would see 1, not 0)
int s = state.decrementAndGet();
if(s == 0){
    //do something using result array
}
Now the other question is if there is a data race on the array content. There is a happens-before edge between the write to the array content and the write to 'state' (program order rule). There is a happens before edge between the write of the state and the read (volatile variable rule) and there is a happens before relation between the read of the state and the read of the array content (program order rule). So there is a happens before edge between writing to the array and reading its content in this particular example due to the transitive nature of the happens-before relation.
Personally I would not try to be too smart and would use something less error-prone, like an AtomicReferenceArray; then at least you don't need to worry about a missing happens-before edge between the write of the array and the read.
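The happens-before chain described above can be checked with a small sketch. The class name, the run method and the extra counters are assumptions added for this illustration; only the result array and the atomic countdown come from the question. It uses decrementAndGet so that exactly one worker observes zero.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ForkJoinCountdown {
    // Runs `workers` threads. Returns how many of them executed the final step,
    // or -1 if the final step saw an unfilled slot in the result array.
    static int run(int workers) {
        Object[] result = new Object[workers];
        AtomicInteger remaining = new AtomicInteger(workers);
        AtomicInteger finalStepRuns = new AtomicInteger();
        AtomicInteger nullsSeen = new AtomicInteger();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                result[id] = new Object();               // plain write
                if (remaining.decrementAndGet() == 0) {  // volatile read + write
                    finalStepRuns.incrementAndGet();
                    for (Object o : result) {            // plain reads, ordered after all writes
                        if (o == null) nullsSeen.incrementAndGet();
                    }
                }
            });
        }
        for (Thread t : threads) t.start();
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return nullsSeen.get() == 0 ? finalStepRuns.get() : -1;
    }

    public static void main(String[] args) {
        System.out.println("final step executed " + run(8) + " time(s)");
    }
}
```

Each worker's array write is ordered before its own decrement by program order, and every earlier decrement synchronizes-with the last one, so the worker that reaches zero sees all slots filled.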

Java volatile variables affecting memory consistency of other non-volatile variables

Scenario A
A1. Write to a volatile variable
A2. Flush all local non-volatile variable writes to main memory
Scenario B
B1. Read from a volatile variable
B2. Reload all non-volatile variables from main memory to local memory
Are scenarios A and B the correct behavior involved with volatile
variables? Or does Scenario A also include B2, or does Scenario B
also include A2?
Are these scenarios atomic? Can anything else happen
in between A1 and A2? Or B1 and B2?
(using Java 1.8 / 1.5+)
Writing to a volatile variable does not guarantee to flush non-volatile variables [1]. However, it will introduce a "happens before" relation between the write to the volatile and any subsequent read of the volatile (assuming no intervening writes to it). You can exploit this as follows:
1. Thread A : write NV
2. Thread A : write V
3. Thread B : read V
4. Thread B : read NV
If the actions happen in that order, then Thread B will see the updated value of NV in step 4. However, if something (including A) writes to NV after step 2, it is unspecified what Thread B will see at step 4.
In general, using volatiles in this way requires deep and careful reasoning. It is easier and more robust to use synchronized.
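A minimal sketch of the four steps above, with made-up names (nv, v, exchange). Thread B spins until it actually observes A's write to v, which is what makes "subsequent" hold; only then is the read of nv guaranteed to return the value A wrote.

```java
class MessagePassing {
    static int nv;               // non-volatile payload
    static volatile boolean v;   // volatile guard

    // Returns the value of nv that thread B read after observing v == true.
    static int exchange() {
        nv = 0;
        v = false;
        final int[] seen = new int[1];
        Thread a = new Thread(() -> {
            nv = 42;             // step 1: write NV
            v = true;            // step 2: write V
        });
        Thread b = new Thread(() -> {
            while (!v) {         // step 3: read V until A's write is observed
                // busy-wait
            }
            seen[0] = nv;        // step 4: read NV
        });
        b.start();
        a.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("B read nv = " + exchange());
    }
}
```

If B read nv without first seeing v == true, it could legitimately observe either 0 or 42.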
Your example is unclear:
If it is intended to be a description of what the Java programmer must do, it is wrong / nonsensical. Java code cannot flush variables.
If it is intended to be a specification of what must happen at the implementation level (e.g. in the JIT compiled code), it is also wrong.
If it is intended to be a description of what could happen at the implementation level (e.g. in the JIT compiled code), it is correct.
I'm not just being pedantic here. The compiler may decide that it doesn't need to flush all local non-volatiles in Thread A, and it will most likely only reload the ones that it needs in Thread B. How it decides? That's the compiler writers' business!
[1] The JLS does not require hardware specific operations such as flushes. Instead, it requires the compiled code to meet some specific guarantees of memory visibility, and leaves the implementation to the compiler writer.
The actual rule is "A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order)." http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.4
In other words, writes from one thread up until a write to v are all visible to reads from another thread once it has read v subsequently to that write.
I'm not certain that "flushing to main" is a necessary way to understand that. The Java memory model is documented in terms of happens-before and synchronizes-with. I recommend thinking of it in those terms. Conceptually a JVM could omit certain "flushes" if they aren't necessary to the promise.

Java volatile effect on other variables [duplicate]

So I am reading this book titled Java Concurrency in Practice and I am stuck on this one explanation which I cannot seem to comprehend without an example. This is the quote:
When thread A writes to a volatile
variable and subsequently thread B
reads that same variable, the values
of all variables that were visible to
A prior to writing to the volatile
variable become visible to B after
reading the volatile variable.
Can someone give me a counterexample of why "the values of ALL variables that were visible to A prior to writing to the volatile variable become visible to B AFTER reading the volatile variable"?
I am confused why all other non-volatile variables do not become visible to B before reading the volatile variable?
Declaring a volatile Java variable means:
The value of this variable will never be cached thread-locally: all reads and writes will go straight to "main memory".
Access to the variable acts as though it is enclosed in a synchronized block, synchronized on itself.
Just for your reference, When is volatile needed ?
When multiple threads using the same
variable, each thread will have its
own copy of the local cache for that
variable. So, when it's updating the
value, it is actually updated in the
local cache not in the main variable
memory. The other thread which is
using the same variable doesn't know
anything about the values changed by
the another thread. To avoid this
problem, if you declare a variable as
volatile, then it will not be stored
in the local cache. Whenever thread
are updating the values, it is updated
to the main memory. So, other threads
can access the updated value.
From JLS §17.4.7 Well-Formed Executions
We only consider well-formed
executions. An execution E = < P, A,
po, so, W, V, sw, hb > is well formed
if the following conditions are true:
Each read sees a write to the same
variable in the execution. All reads
and writes of volatile variables are
volatile actions. For all reads r in
A, we have W(r) in A and W(r).v = r.v.
The variable r.v is volatile if and
only if r is a volatile read, and the
variable w.v is volatile if and only
if w is a volatile write.
Happens-before order is a partial
order. Happens-before order is given
by the transitive closure of
synchronizes-with edges and program
order. It must be a valid partial
order: reflexive, transitive and
antisymmetric.
The execution obeys
intra-thread consistency. For each
thread t, the actions performed by t
in A are the same as would be
generated by that thread in
program-order in isolation, with each
write w writing the value V(w), given
that each read r sees the value
V(W(r)). Values seen by each read are
determined by the memory model. The
program order given must reflect the
program order in which the actions
would be performed according to the
intra-thread semantics of P.
The execution is happens-before consistent
(§17.4.6).
The execution obeys
synchronization-order consistency. For
all volatile reads r in A, it is not
the case that either so(r, W(r)) or
that there exists a write win A such
that w.v = r.v and so(W(r), w) and
so(w, r).
Useful Link : What do we really know about non-blocking concurrency in Java?
Thread B may have a CPU-local cache of those variables. A read of a volatile variable ensures that any intermediate cache flush from a previous write to the volatile is observed.
For an example, read the following link, which concludes with "Fixing Double-Checked Locking using Volatile":
http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
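For reference, the volatile-based fix that the linked article ends with looks roughly like the sketch below. The class name, the payload field and the construction counter are assumptions added here so the behaviour can be observed; they are not taken from the article.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LazyHolder {
    // Counts constructor calls, only so the example can be checked.
    static final AtomicInteger constructions = new AtomicInteger();

    // volatile: the write of the reference happens-before any read that sees it,
    // so the fully initialized object (including payload) is visible to readers.
    private static volatile LazyHolder instance;

    int payload;   // deliberately non-final: relies on the volatile publication

    private LazyHolder() {
        constructions.incrementAndGet();
        payload = 42;
    }

    static LazyHolder getInstance() {
        LazyHolder local = instance;          // first check, no lock
        if (local == null) {
            synchronized (LazyHolder.class) {
                local = instance;             // second check, under the lock
                if (local == null) {
                    local = new LazyHolder();
                    instance = local;         // volatile write publishes the object
                }
            }
        }
        return local;
    }
}
```

Without volatile on instance, another thread could see a non-null reference to an object whose payload write is not yet visible.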
If a variable is non-volatile, then the compiler and the CPU, may re-order instructions freely as they see fit, in order to optimize for performance.
If the variable is now declared volatile, then the compiler no longer attempts to optimize accesses (reads and writes) to that variable. It may however continue to optimize access for other variables.
At runtime, when a volatile variable is accessed, the JVM generates appropriate memory barrier instructions to the CPU. The memory barrier serves the same purpose - the CPU is also prevented from re-ordering instructions.
When a volatile variable is written to (by thread A), all writes to any other variable are completed (or will at least appear to be) and made visible to A before the write to the volatile variable; this is often due to a memory-write barrier instruction. Likewise, any reads on other variables will be completed (or will appear to be) before the read (by thread B); this is often due to a memory-read barrier instruction. This ordering of instructions that is enforced by the barrier(s) will mean that all writes visible to A will be visible to B. This however, does not mean that any re-ordering of instructions has not happened (the compiler may have performed re-ordering for other instructions); it simply means that if any writes visible to A have occurred, they would be visible to B. In simpler terms, it means that strict-program order is not maintained.
I will point to this writeup on Memory Barriers and JVM Concurrency, if you want to understand how the JVM issues memory barrier instructions, in finer detail.
Related questions
What is a memory fence?
What are some tricks that a processor does to optimize code?
Threads are allowed to cache variable values that other threads may have updated since they were read. The volatile keyword forces all threads to not cache values.
This is simply an additional bonus the memory model gives you, if you work with volatile variables.
Normally (i.e. in the absence of volatile variables and synchronization), the VM can make variables from one thread visible to other threads in any order it wants, or not at all. E.g. the reading thread could read some mixture of earlier versions of another thread's variable assignments. This is caused by the threads possibly running on different CPUs with their own caches, which are only sometimes copied to the "main memory", and additionally by code reordering for optimization purposes.
If you used a volatile variable, as soon as thread B read some value X from it, the VM makes sure that anything which thread A has written before it wrote X is also visible to B. (And also everything which A got guaranteed as visible, transitively).
Similar guarantees are given for synchronized blocks and other types of locks.

Does volatile influence non-volatile variables?

Okay, suppose I have a bunch of variables, one of them declared volatile:
int a;
int b;
int c;
volatile int v;
If one thread writes to all four variables (writing to v last), and another thread reads from all four variables (reading from v first), does that second thread see the values written to a, b and c by the first thread, even though they are not themselves declared volatile? Or can it possibly see stale values?
Since there seems to be some confusion: I'm not deliberately trying to do something unsafe. I just want to understand the Java memory model and the semantics of the volatile keyword. Pure curiosity.
I'm going to speak to what I think you may really be probing about—piggybacking synchronization.
The technique that it looks like you're trying to use involves using one volatile variable as a synchronization guard in concert with one or more other non-volatile variables. This technique is applicable when the following conditions hold true:
Only one thread will write to the set of values meant to be guarded.
The threads reading the set of values will read them only if the volatile guard value meets some criteria.
You don't mention the second condition holding true for your example, but we can examine it anyway. The model for the writer is as follows:
Write to all the non-volatile variables, assuming that no other thread will try to read them.
Once complete, write a value to the volatile guard variable that indicates that the readers' criteria is met.
The readers operate as follows:
Read the volatile guard variable at any time, and if its value meets the criteria, then
Read the other non-volatile variables.
The readers must not read the other non-volatile variables if the volatile guard variable does not yet indicate a proper value.
The guard variable is acting as a gate. It's closed until the writer sets it to a particular value, or set of values that all meet the criteria of indicating that the gate is now open. The non-volatile variables are guarded behind the gate. The reader is not permitted to read them until the gate opens. Once the gate is open, the reader will see a consistent view of the set of non-volatile variables.
Note that it is not safe to run this protocol repeatedly. The writer can't keep changing the non-volatile variables once it's opened the gate. At that point, multiple reader threads may be reading those other variables, and they can—though are not guaranteed—see updates to those variables. Seeing some but not all of those updates would yield inconsistent views of the set.
Backing up, the trick here is to control access to a set of variables without either
creating a structure to hold them all, to which an atomic reference could be swapped, um, atomically, or
using a lock to make writing to and reading from the entire set of variables mutually exclusive activities.
Piggybacking on top of the volatile guard variable is a clever stunt—not one to be done casually. Subsequent updates to the program can break the aforementioned fragile conditions, removing the consistency guarantees afforded by the Java memory model. Should you choose to use this technique, document its invariants and requirements in the code clearly.
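For comparison, the first alternative mentioned above (a structure holding all values, swapped through an atomic reference) might look like the following sketch. All names are hypothetical. Unlike the gate protocol, the writer may publish repeatedly, because each snapshot is immutable and readers only ever see one whole snapshot.

```java
import java.util.concurrent.atomic.AtomicReference;

class SnapshotHolder {
    // Immutable value object holding the variables that must be read together.
    static final class Snapshot {
        final int a, b, c;
        Snapshot(int a, int b, int c) { this.a = a; this.b = b; this.c = c; }
    }

    private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot(0, 0, 0));

    void publish(int a, int b, int c) { current.set(new Snapshot(a, b, c)); }

    Snapshot read() { return current.get(); }

    // Writer publishes (i, i, i) for i = 1..updates while a reader polls.
    // Returns true if the reader never saw a mixed snapshot.
    static boolean readersAlwaysConsistent(int updates) {
        SnapshotHolder holder = new SnapshotHolder();
        final boolean[] consistent = {true};
        Thread reader = new Thread(() -> {
            Snapshot s;
            do {
                s = holder.read();
                if (s.a != s.b || s.b != s.c) consistent[0] = false;
            } while (s.a < updates);   // stop once the last snapshot is seen
        });
        Thread writer = new Thread(() -> {
            for (int i = 1; i <= updates; i++) holder.publish(i, i, i);
        });
        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return consistent[0];
    }
}
```

The cost is one allocation per update, which is usually a fair trade for not having to document and defend the fragile gate invariants.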
Yes. volatile, locks, etc., set up the happens-before relationship, but it affects all variables (in the new Java Memory Model (JMM) from Java SE 5/JDK 1.5). Kind of makes it useful for non-primitive volatiles...
does that second thread see the values written to a, b and c by the first thread, even though they are not themselves declared volatile? Or can it possibly see stale values?
You will get stale reads, because you can't ensure that the values of a, b, c are the ones set after the reading of v. Using a state machine (but you need CAS to change the state) is a way to tackle similar issues, but it's beyond the scope of the discussion.
Perhaps this part is unclear: after writing to v and reading first from v, you'd get the right results (non-stale reads). The main issue is that if you do
if (v==STATE1){...proceed...}, there is no guarantee some other thread would not be modifying the state of a/b/c. In that case, there will be stale reads.
If you modify a/b/c and v once only, you'd get the correct result.
Mastering concurrency and lock-free structures is really hard. Doug Lea has a good book on it, and most talks/articles of Dr. Cliff Click are a wonderful wealth, if you need something to start digging in.
Yes, a volatile write "happens-before" the next volatile read of the same variable.
While @seh is right about consistency problems with multiple variables, there are use cases in which less consistency is required.
For example, a writer thread updates some state variables; a reader thread displays them promptly. There's not much relation among the variables, we only care about reading the new values promptly. We could make every state variable volatile. Or we could use only one volatile variable as visibility guard.
However, the saving is only on the paper, performance wise there's hardly any difference. In either version, every state variable must be "flushed" by the writer and "loaded" by the reader. No free lunch.

A quote from "Java Threads" book about volatile keyword

I was just wondering if someone could explain the meaning of this:
Operations like increment and
decrement (e.g. ++ and --) can't be
used on a volatile variable because
these operations are syntactic sugar
for a load, change and a store.
I think increment and decrement should just work fine for a volatile variable, the only difference would be every time you read or write you would be accessing from/writing to main memory rather than from cache.
A volatile variable only ensures visibility. It does not ensure atomicity. I guess that is how the statement should be interpreted.
I think you're taking the quote out of context.
Of course ++ and -- can be applied to volatile variables. They just won't be atomic.
And since volatile often implies that they must be handled in an atomic manner, this is counter to the goal.
The problem with ++ and -- is that they might feel like they are atomic, when indeed they are not.
Doing a = a + 1 makes it (somewhat) explicit that it is not an atomic operation, but one might (wrongly) think that a++ is atomic.
The Java Language Specification does not have atomic operations for the ++ and -- operators. In other words, when you write code in the following manner:
a++;
the Java compiler actually emits code that is similar to the set of steps below (the actual instructions will vary depending on the nature of the variable):
1. Load the operand onto the stack using one of the operations for loading data.
2. Duplicate the value of the operand on the stack (for the purpose of returning later). This is usually accomplished using a dup operation.
3. Increment the value on the stack (usually accomplished using the iadd operation in the VM) and store the result back into the variable.
4. Return the value (obtained in step 2).
As you can observe, there are multiple operations in the VM for what is commonly thought to be an atomic operation. The VM can ensure atomicity only up to the level of an individual operation. Any further requirement can be achieved only via synchronization or other techniques.
Using the volatile keyword, allows other threads to obtain the most recent value of a variable; all read operations on a variable will return the recently updated value on a per-instruction basis. For example, if the variable a were to be volatile in the previous example, then a thread reading the value of a would see different values if it were to read a after instruction 2 and after instruction 3. Use of volatile does not protect against this scenario. It protects against the scenario where multiple threads see multiple values for a after instruction 2 (for instance).
Volatile does not guarantee atomicity in an operation that involves multiple steps.
Look at it this way: if I am reading a value and that is all I am doing, the read operation is an atomic operation. It is a single step and hence the use of volatile here will be fine. If however I am reading that value and changing that value before writing it back, that is a multistep operation, and for this volatile does not manage the atomicity.
The increment and decrement operations are multi-stepped and hence the use of the volatile modifier is not sufficient.
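The difference can be observed with a short sketch that increments a volatile int and an AtomicInteger side by side. Class, field and method names are invented for this example. The atomic counter always reaches the expected total; the volatile counter can lose updates, so it ends up at or below that total.

```java
import java.util.concurrent.atomic.AtomicInteger;

class IncrementDemo {
    static volatile int volatileCounter;
    static final AtomicInteger atomicCounter = new AtomicInteger();

    // Returns {final volatile counter, final atomic counter}.
    static int[] run(int threads, int perThread) {
        volatileCounter = 0;
        atomicCounter.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    volatileCounter++;               // load, add, store: not atomic
                    atomicCounter.incrementAndGet(); // single atomic read-modify-write
                }
            });
        }
        for (Thread t : workers) t.start();
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {volatileCounter, atomicCounter.get()};
    }

    public static void main(String[] args) {
        int[] r = run(4, 100_000);
        System.out.println("volatile++ : " + r[0]);
        System.out.println("atomic     : " + r[1]);
    }
}
```

A synchronized block around the increment would work as well; AtomicInteger is simply the lighter-weight choice for a single counter.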
Nope -- you use "volatile" to indicate that the variable can be changed by an external entity.
This would typically be some JNI C code, or a special register linked to some hardware such as a thermometer. Java cannot guarantee that all JVMs on all architectures will be capable of incrementing these values in a single machine cycle. So it doesn't let you do it anywhere.
