Could you explain in simple words what "the program satisfies intra-thread semantic" means? Is it possible to provide simple examples of programs which satisfy and which don't satisfy such semantics?
The notion of intra-thread semantics is discussed in the JLS section 17.4, which covers the Java Memory Model. The JMM is a set of requirements and constraints on the execution of Java programs by JVMs. Here's the relevant section of text from 17.4:
The memory model determines what values can be read at every point in the program. The actions of each thread in isolation must behave as governed by the semantics of that thread, with the exception that the values seen by each read are determined by the memory model. When we refer to this, we say that the program obeys intra-thread semantics. Intra-thread semantics are the semantics for single-threaded programs, and allow the complete prediction of the behavior of a thread based on the values seen by read actions within the thread. To determine if the actions of thread t in an execution are legal, we simply evaluate the implementation of thread t as it would be performed in a single-threaded context, as defined in the rest of this specification.
This means that, as far as a single thread is concerned, the values visible in objects' fields are either the fields' initial values (zero, false, or null) or are values that this thread has previously written.
This is so obvious as to be elementary; why bother stating it?
Consider a single-threaded Java program with a few int fields:
field1 = 1; // 1
field2 = 2; // 2
field3 = field1 + field2; // 3
then clearly the value of field3 must be 3. This is because the values visible in field1 and field2 at line 3 must reflect the earlier values written at lines 1 and 2. It would be incorrect if the initial value of zero for field1 or field2 were used in the computation at line 3, since the assignment of those fields occurs earlier in the program than the computation.
What is less obvious are the constraints that are not present. For example, there is no constraint here over the ordering of the writes of field1 and field2. The JVM could execute line 2 before line 1 and the result of the program would be the same. Or, it could delay the writes to field1 and field2 and keep these values in registers, and do register-based addition at line 3. The actual writes of all the fields could be delayed until much later. Or they could even be omitted entirely if the values are subsequently overwritten by this thread. Again, the outcome of the program would be the same.
And that's the point: JVMs are free to rearrange the execution of a program (mainly so that it can run faster), but only as long as it doesn't change the results of that program if it were run single-threaded. These constraints are referred to as intra-thread semantics. Any rearrangements that don't violate intra-thread semantics are permitted.
(Note that the paragraph quoted above talks about a "program that obeys intra-thread semantics" but what it really means is the execution of the program obeys intra-thread semantics. Text in later sections, such as 17.4.7, is more precise, referring to whether an execution of a program obeys intra-thread consistency or whether a set of actions performed is in accord with intra-thread semantics.)
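To make this concrete, here is a hedged sketch (reusing the field names from the example above; the class and method names are mine) of how a reordering that is invisible to the executing thread can still be visible to another thread:
class Reordering {
    int field1, field2, field3;

    void writer() {
        field1 = 1;                 // the JVM may perform these two independent
        field2 = 2;                 // writes in either order, or delay them,
        field3 = field1 + field2;   // but field3 must still end up as 3
    }

    void observer() {
        // Intra-thread semantics constrains only the writer's own view.
        // Another thread may legally observe field2 == 2 while field1 is
        // still 0, since nothing synchronizes these plain writes.
        if (field2 == 2 && field1 == 0) {
            System.out.println("writes observed out of program order");
        }
    }
}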
Let's break this down: https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.4
1 The memory model determines what values can be read at every point in the program. 2 The actions of each thread in isolation must behave as governed by the semantics of that thread, 3 with the exception that the values seen by each read are determined by the memory model. 4 When we refer to this, we say that the program obeys intra-thread semantics. Intra-thread semantics are the semantics for single-threaded programs, and allow the complete prediction of the behavior of a thread based on the values seen by read actions within the thread
1.
The memory model determines what values can be read at every point in the program
It means the JMM has rules for visibility and ordering that dictate the values reads can return - rules that haven't been defined yet at this point in the spec, but the sentence makes the reader aware that they exist (points 3 and 4 explain how they relate to intra-thread semantics).
2.
The actions of each thread in isolation must behave as governed by the semantics of that thread
It does not define what "isolation" means in this context, nor what the "semantics of that thread" are, but they probably meant actions that do not fall under inter-thread actions - that is, all actions done by a thread that don't influence other threads are evaluated line by line (the semantics of the thread). Their "definition" of intra-thread actions (sort of - they only give an example) is:
"This specification is only concerned with inter-thread actions. We do not need to concern ourselves with intra-thread actions (e.g., adding two local variables and storing the result in a third local variable)" - i.e., any action that won't affect any shared memory.
3.
with the exception that the values seen by each read are determined by the memory model.
It means that the JMM applies its rules even when a thread executes code that has no influence on any other thread (point 2) - it dictates the values of those reads.
4.
When we refer to this, we say that the program obeys intra-thread semantics. Intra-thread semantics are the semantics for single-threaded programs, and allow the complete prediction of the behavior of a thread based on the values seen by read actions within the thread
To sum it up, they are saying: if you have a thread that performs an action in isolation, meaning it manipulates (reads or writes) data that is not in any shared memory, the values of the reads in that code are still *decided by the JMM, and all of them (hence "complete") are known to that same thread. This is what it means to obey intra-thread semantics.
They almost implicitly describe Sequential Consistency (SC) here without actually saying it; the only missing part is inter-thread actions. This is picked up later in Program Order (the *decided by the JMM part), which needed all the definitions here to define when SC applies.
Program order is the rule they kept mentioning without naming before. It says that if you take all (the total order of) the actions (intra and inter) in a thread and consider them under intra-thread semantics, then they are sequentially consistent.
You can understand what SC is by reading Stuart Marks' answer.
As a side note I just have to say the phrasing in JLS for JMM is horrendous, they could not have made it more confusing if they tried to.
From the book Effective Java:
While the volatile modifier performs no mutual exclusion, it guarantees that any thread that reads the field will see the most recently written value
SO and many other sources claim similar things.
Is this true?
I mean really true, not a close-enough model, or true only on x86, or only in Oracle JVMs, or some definition of "most recently written" that's not the standard English interpretation...
Other sources (SO example) have said that volatile in Java is like acquire/release semantics in C++, which I think does not offer the guarantee from the quote.
I found that in the JLS 17.4.4 it says "A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order)." But I don't quite understand.
There are quite a few sources for and against this, so I'm hoping the answer is able to convince me that many of those (on either side) are indeed wrong - for example with a reference to the spec, or counter-example code.
Is this true?
I mean really true, not a close-enough model, or true only on x86, or only in Oracle JVMs, or some definition of "most recently written" that's not the standard English interpretation...
Yes, at least in the sense that a correct implementation of Java gives you this guarantee.
Unless you are using some exotic, experimental Java compiler/JVM (*), you can essentially take this as true.
From JLS 17.4.5:
A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
(*) As Stephen C points out, such an exotic implementation that doesn't implement the memory model semantics described in the language spec can't usefully (or even legally) be described as "Java".
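For example, a minimal sketch of the guarantee being discussed (the class and field names are mine):
class StopFlag {
    volatile boolean done;   // without volatile, the read below could be
                             // hoisted out of the loop and never see the write

    void writer() {
        done = true;         // volatile write
    }

    void reader() {
        while (!done) {
            // busy-wait: each iteration performs a fresh volatile read, so a
            // conforming JVM cannot cache the value indefinitely and the
            // write, once made, will be seen
        }
    }
}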
The quote per se is correct in terms of what it tries to convey, but it is incorrect on a broader view.
It tries to make a distinction between sequential consistency and release/acquire semantics, at least in my understanding. The difference between these two terms is rather "thin", but very important. I have tried to simplify the difference at the beginning of this answer or here.
The author is trying to say that volatile offers that sequential consistency, as implied by:
"... it guarantees that any thread.."
If you look at the JLS, it has this sentence:
A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
The tricky part there is the word subsequent and its meaning, which has been discussed here. What it really means is "subsequent that observes that write". So happens-before is guaranteed only when the reader observes the value that the writer has written.
This already implies that a write is not necessarily seen on the next read, which can be the case where speculative execution is allowed. So in this regard, the quote is misleading.
The quote that you found:
A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order)
is complicated to understand without a much broader context. In simple words, it establishes a synchronizes-with order (and implicitly happens-before) between two threads, where the volatile variable v is shared between them. Here is an answer with a broader explanation, which should make more sense.
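A minimal sketch of that reading of "subsequent" (the names are mine): a racy read that runs concurrently with the volatile write may legally return the old value; the happens-before edge exists only for reads that actually observe the write:
class Subsequent {
    int data;            // plain field
    volatile int v;      // volatile guard, initially 0

    void writer() {
        data = 42;
        v = 1;           // volatile write
    }

    void reader() {
        if (v == 1) {
            // this read was "subsequent" to the write in the synchronization
            // order, so data = 42 happens-before this point: data is 42 here
            System.out.println(data);
        }
        // if the read returned 0, it simply wasn't ordered after the write,
        // and no visibility guarantee applies - even if, in wall-clock time,
        // the write instruction had already executed
    }
}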
It is not true. The JMM is based on sequential consistency, and for sequential consistency real-time ordering isn't guaranteed; for that you need linearizability. In other words, reads and writes can be skewed as long as the program order isn't violated (or as long as it can't be proven that program order was violated).
A read of a volatile variable a needs to see the most recently written value before it in the memory order. But that doesn't imply real-time ordering.
Good read about the topic:
https://concurrency-interest.altair.cs.oswego.narkive.com/G8KjyUtg/relativity-of-guarantees-provided-by-volatile.
I'll make it concrete:
Imagine there are 2 CPUs and a (volatile) variable A with initial value 0. CPU1 does a store A=1 and CPU2 does a load of A, and both CPUs have the cacheline containing A in the SHARED state.
The store is first speculatively executed and written to the store buffer; eventually the store commits and retires, but since the stored value is still in the store buffer, it isn't visible yet to CPU2. Up to this point it wasn't required for the cacheline to be in an EXCLUSIVE/MODIFIED state, so the cacheline on CPU2 still contains the old value and hence CPU2 can still read the old value.
So in the real time order, the write of A is ordered before the read of A=0, but in the synchronization order, the write of A=1 is ordered after the read of A=0.
Only when the store leaves the store buffer and wants to enter the L1 cache is a request for ownership (RFO) sent to all other CPUs, which sets the cacheline containing A to INVALID on CPU2 (RFO prefetching I'll leave out of the discussion). If CPU2 were now to read A, it would be guaranteed to see A=1 (the request will block till CPU1 has completed the store to the L1 cache).
On acknowledgement of the RFO the cacheline is set to MODIFIED on CPU1 and the store is written to the L1 cache.
So there is a period of time between when the store is executed/retired and when it is visible to another CPU. But the only way to determine this is if you would add special measuring equipment to the CPUs.
I believe a similar delaying effect can happen on the reading side with invalidation queues.
In practice this will not be an issue because store buffers have a limited capacity and need to be drained eventually (so a write can't be invisible indefinitely). So in day to day usage you could say that a volatile read, reads the most recent write.
A Java volatile write/read provides release/acquire semantics, but keep in mind that it is stronger than that: a volatile write/read is sequentially consistent, and release/acquire semantics isn't.
I keep fighting to understand what VarHandle::setOpaque and VarHandle::getOpaque are really doing. It has not been easy so far - there are some things I think I get (but will not present them in the question itself, so as not to muddy the waters), but overall this is misleading at best for me.
The documentation:
Returns the value of a variable, accessed in program order...
Well in my understanding if I have:
int xx = x; // read x
int yy = y; // read y
These reads can be re-ordered. On the other hand, if I have:
// simplified code, does not compile, but reads happen on the same "this" for example
int xx = VarHandle_X.getOpaque(x);
int yy = VarHandle_Y.getOpaque(y);
This time re-orderings are not possible? And this is what "program order" means? Are we talking about insertion of barriers here for this re-ordering to be prohibited? If so, since these are two loads, would the same be achieved via:
int xx = x;
VarHandle.loadLoadFence();
int yy = y;
But it gets a lot trickier:
... but with no assurance of memory ordering effects with respect to other threads.
I could not come up with an example to even pretend I understand this part.
It seems to me that this documentation is targeted at people who know exactly what they are doing (and I am definitely not one)... So can someone shed some light here?
Well in my understanding if I have:
int xx = x; // read x
int yy = y; // read y
These reads can be re-ordered.
These reads may not only happen to be reordered, they may not happen at all. The thread may use an old, previously read value for x and/or y, or values it previously wrote to these variables when, in fact, the writes may not have been performed yet - so the "reading thread" may use values that no other thread knows of and that are not in heap memory at that time (and probably never will be).
On the other hand, if I have:
// simplified code, does not compile, but reads happen on the same "this" for example
int xx = VarHandle_X.getOpaque(x);
int yy = VarHandle_Y.getOpaque(y);
This time re-orderings are not possible? And this is what "program order" means?
Simply said, the main feature of opaque reads and writes is that they will actually happen. This implies that they cannot be reordered with respect to other memory accesses of at least the same strength, but that has no impact on ordinary reads and writes.
The term program order is defined by the JLS:
… the program order of t is a total order that reflects the order in which these actions would be performed according to the intra-thread semantics of t.
That’s the evaluation order specified for expressions and statements. The order in which we perceive the effects, as long as only a single thread is involved.
Are we talking about insertions of barriers here for this re-ordering to be prohibited?
No, there is no barrier involved, which might be the intention behind the phrase “…but with no assurance of memory ordering effects with respect to other threads”.
Perhaps, we could say that opaque access works a bit like volatile was before Java 5, enforcing read access to see the most recent heap memory value (which makes only sense if the writing end also uses opaque or an even stronger mode), but with no effect on other reads or writes.
So what can you do with it?
A typical use case would be a cancellation or interruption flag that is not supposed to establish a happens-before relationship (see the sketch after these examples). Often, the stopped background task has no interest in perceiving actions made by the stopping task prior to signalling, but will just end its own activity. So writing and reading the flag with opaque mode would be sufficient to ensure that the signal is eventually noticed (unlike the normal access mode), but without any additional negative impact on performance.
Likewise, a background task could write progress updates, like a percentage number, which the reporting (UI) thread is supposed to notice timely, while no happens-before relationship is required before the publication of the final result.
It’s also useful if you just want atomic access for long and double, without any other impact.
Since truly immutable objects using final fields are immune to data races, you can use opaque modes for timely publishing immutable objects, without the broader effect of release/acquire mode publishing.
A special case would be periodically checking a status for an expected value update and, once available, querying the value with a stronger mode (or executing the matching fence instruction explicitly). In principle, a happens-before relationship can only be established between the write and its subsequent read anyway, but since optimizers usually don't have the horizon to identify such an inter-thread use case, performance critical code can use opaque access to optimize such a scenario.
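For illustration, here is a minimal sketch of the cancellation-flag use case from above (assuming Java 9+; the class and field names are mine):
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class CancellableTask implements Runnable {
    private static final VarHandle CANCELLED;
    static {
        try {
            CANCELLED = MethodHandles.lookup()
                    .findVarHandle(CancellableTask.class, "cancelled", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
    private boolean cancelled;   // accessed only via the VarHandle

    @Override
    public void run() {
        // the opaque read cannot be folded into a single read before the
        // loop, so a cancellation is eventually noticed - but it creates
        // no happens-before relationship with the cancelling thread
        while (!(boolean) CANCELLED.getOpaque(this)) {
            // perform one unit of work
        }
    }

    void cancel() {
        CANCELLED.setOpaque(this, true);   // opaque write is sufficient here
    }
}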
Opaque means that the thread executing an opaque operation is guaranteed to observe its own actions in program order, but that's it.
Other threads are free to observe the thread's actions in any order. This is a common case on x86, since it has a "write ordered with store-buffer forwarding" memory model: even if the thread does a store before a load, the store can be cached in the store buffer, and some thread executing on another core may observe the thread's actions in the reverse order, load-store instead of store-load. So opaque operations come for free on x86 (on x86 we actually also have acquire for free; see this extremely exhaustive answer for details on some other architectures and their memory models: https://stackoverflow.com/a/55741922/8990329)
Why is it useful? Well, I could speculate that if some thread observed a value stored with opaque memory semantics, then a subsequent read will observe "at least this or a later" value (plain memory access does not provide such guarantees, does it?).
Also, since Java 9 VarHandles are somewhat related to acquire/release/consume semantics in C/C++, I think it is worth noting that opaque access is similar to memory_order_relaxed, which is defined in the Standard as follows:
For memory_order_relaxed, no operation orders memory.
with some examples provided.
I have been struggling with opaque myself and the documentation is certainly not easy to understand.
From the above link:
Opaque operations are bitwise atomic and coherently ordered.
The bitwise atomic part is obvious. Coherently ordered means that loads/stores to a single address have some total order, each read sees the most recent write to that address before it, and the order is consistent with the program order. For some coherence examples, see the following JCStress test.
Coherence doesn't provide any ordering guarantees between loads/stores to different addresses, so it doesn't require fences that would order loads/stores to different addresses.
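As an illustration of coherence, a hedged sketch (the setup is mine; x only ever transitions from 0 to 1):
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class Coherence {
    private static final VarHandle X;
    static {
        try {
            X = MethodHandles.lookup().findVarHandle(Coherence.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
    private int x;   // only ever written 0 -> 1

    void writer() {
        X.setOpaque(this, 1);
    }

    void reader() {
        int r1 = (int) X.getOpaque(this);
        int r2 = (int) X.getOpaque(this);
        // coherence forbids r1 == 1 && r2 == 0: reads of a single address may
        // not observe its values in an order that contradicts the total order
        // of writes to that address
        assert !(r1 == 1 && r2 == 0);
    }
}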
With opaque, the compiler will emit the loads/stores as it sees them. But the underlying hardware is still allowed to reorder load/stores to different addresses.
I upgraded your example to the message-passing litmus test:
thread1:
X.setOpaque(1);
Y.setOpaque(1);
thread2:
ry = Y.getOpaque();
rx = X.getOpaque();
if (ry == 1 && rx == 0) System.out.println("Oh shit");
The above could fail on a platform that allows the two stores to be reordered, or the two loads (e.g., ARM or PowerPC). Opaque is not required to provide causality. JCStress has a good example for that as well.
Also, the following IRIW example can fail:
thread1:
X.setOpaque(1);
thread2:
Y.setOpaque(1);
thread3:
rx_thread3 = X.getOpaque();
[LoadLoad]
ry_thread3 = Y.getOpaque();
thread4:
ry_thread4 = Y.getOpaque();
[LoadLoad]
rx_thread4 = X.getOpaque();
Can it be that we end up with rx_thread3=1, ry_thread3=0, ry_thread4=1 and rx_thread4=0?
With opaque this can happen. Even though the loads are prevented from being reordered, opaque accesses do not require multi-copy-atomicity (stores to different addresses issued by different CPUs can be seen in different orders).
Release/acquire is stronger than opaque: since this outcome is allowed even with release/acquire, it is allowed with opaque as well. So opaque is not required to provide consensus.
In the book Java Concurrency In Practice, we are told several times that the instructions of our program can be reordered, either by the compiler, by the JVM at runtime, or even by the processor. So we should assume that the executed program will not have its instructions executed in exactly the same order as specified in the source code.
However, the last chapter discussing Java Memory Model provides a listing of happens-before rules indicating which instruction ordering are preserved by the JVM. The first of these rules is:
"Program order rule. Each action in a thread happens before every action in that thread that comes later in the program order."
I believe "program order" refers to the source code.
My question: assuming this rule, I wonder what instruction may be actually reordered.
"Action" is defined as follow:
The Java Memory Model is specified in terms of actions, which include reads and writes to variables, locks and unlocks of monitors, and starting and joining with threads. The JMM defines a partial ordering called happens before on all actions within the program. To guarantee that the thread executing action B can see the results of action A (whether or not A and B occur in different threads), there must be a happens before relationship between A and B. In the absence of a happens before ordering between two operations, the JVM is free to reorder them as it pleases.
Other ordering rules mentioned are:
Monitor lock rule. An unlock on a monitor lock happens before every subsequent lock on that same monitor lock.
Volatile variable rule. A write to a volatile field happens before every subsequent read of that same field.
Thread start rule. A call to Thread.start on a thread happens before every action in the started thread.
Thread termination rule. Any action in a thread happens before any other thread detects that thread has terminated, either by successfully returning from Thread.join or by Thread.isAlive returning false.
Interruption rule. A thread calling interrupt on another thread happens before the interrupted thread detects the interrupt (either by having InterruptedException thrown, or invoking isInterrupted or interrupted).
Finalizer rule. The end of a constructor for an object happens before the start of the finalizer for that object.
Transitivity. If A happens before B, and B happens before C, then A happens before C.
The key point of the program order rule is: in a thread.
Imagine this simple program (all variables initially 0):
T1:
x = 5;
y = 6;
T2:
if (y == 6) System.out.println(x);
From T1's perspective, an execution must be consistent with y being assigned after x (program order). However from T2's perspective this does not have to be the case and T2 might print 0.
T1 is actually allowed to assign y first, as the two assignments are independent and swapping them does not affect T1's execution.
With proper synchronization, T2 will always print 5 or nothing.
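For instance, a minimal sketch where the "proper synchronization" is provided by making y volatile (names are mine):
class MessagePassing {
    int x;             // plain field
    volatile int y;    // volatile field providing the happens-before edge

    void t1() {
        x = 5;         // ordinary write
        y = 6;         // volatile write "publishes" the write to x
    }

    void t2() {
        if (y == 6) {               // volatile read: if it observes 6, then
            System.out.println(x);  // x = 5 happens-before it, so this prints 5
        }
    }
}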
EDIT
You seem to be misinterpreting the meaning of program order. The program order rule boils down to:
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y) (i.e. x happens-before y).
happens-before has a very specific meaning in the JMM. In particular, it does not mean that y=6 must be subsequent to x=5 in T1 from a wall clock perspective. It only means that the sequence of actions executed by T1 must be consistent with that order. You can also refer to JLS 17.4.5:
It should be noted that the presence of a happens-before relationship between two actions does not necessarily imply that they have to take place in that order in an implementation. If the reordering produces results consistent with a legal execution, it is not illegal.
In the example I gave above, you will agree that from T1's perspective (i.e. in a single threaded program), x=5;y=6; is consistent with y=6;x=5; since you don't read the values. A statement on the next line is guaranteed, in T1, to see those 2 actions, regardless of the order in which they were performed.
In JLS, §17.4.5. Happens-before Order, it says that
A program is correctly synchronized if and only if all sequentially consistent executions are free of data races.
According to the discussion in Does a correctly synchronized program still allow a data race? (Part I), we get the following conclusion:
A program can be correctly synchronized and have data races.
The combination of the two conclusions means that there must exist an example like this:
All sequentially consistent executions of a program are data-race free, but the normal executions (executions other than the sequentially consistent ones) of the program contain a data race.
After heavy consideration, I still can not find such a code sample. So how about you?
It is not true that "A program can be correctly synchronized and have data races." The example by assylias in that discussion is not correctly synchronized. It is correct from the higher-level, functional standpoint—the data race it contains does not manifest itself as a bug. It is a so-called "benign" data race, but that is irrelevant when discussing the JLS definitions.
A program whose sequentially consistent executions don't contain data races is guaranteed not to contain data races in any execution, sequentially consistent or not. As the JLS says,
This is an extremely strong guarantee for programmers. Programmers do not need to reason about reorderings to determine that their code contains data races. Therefore they do not need to reason about reorderings when determining whether their code is correctly synchronized. Once the determination that the code is correctly synchronized is made, the programmer does not need to worry that reorderings will affect his or her code.
So please note that the definition of a correctly synchronized program is narrowed down to only sequentially consistent executions as a courtesy to the programmer, giving him the strong guarantee that sequentially consistent executions are the only ones he or she needs to reason about and all other executions will automatically have the same guarantee.
UPDATE
It is easy to get lost in the terminology used by the JMM and subtle misinterpretations result in deep misunderstandings later on. Therefore take these to heart:
an execution is simply a set of inter-thread actions. There is no a priori order to it. In particular, there is no time order to it.
This is a counterintuitive definition, so we must be careful about it: each time we say execution, we must be sure to imagine a bag of actions, never a string of them. Whenever we define a partial order, we should imagine several bags lined up.
a program contains instructions to perform actions. Each such instruction can be executed zero or more times, contributing zero or more distinct actions to the particular execution;
an execution may or may not have an execution order, which is a total order over all actions;
a sequentially consistent execution is what you would get if all your shared variables were volatile. This kind of execution always has a definite execution order;
a sequentially inconsistent execution is your real-world execution of a program: non-volatile variables are involved and the compiler reorders reads and writes, there are caches, thread-local stores, etc.
the synchronization order is the total order over all synchronization actions done by an execution. With respect to the execution itself it is still a partial order because not all actions are synchronization actions; most notably, reads and writes of non-volatile variables. Every execution, be it sequentially consistent or not, has a definite synchronization order;
likewise, the happens-before order is defined for a specific execution of a program and is derived as a transitive closure of the synchronization order with the program order.
It is interesting to note that if all your shared vars were volatile, then the synchronization order would become a total order and as such would meet the definition of the execution order. This way we come from a different angle to the conclusion that all executions of such a program would be sequentially consistent.
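A hedged illustration of that conclusion, using the classic store-buffering (Dekker) test (names are mine): with both shared variables volatile, every execution is sequentially consistent, so the outcome r1 == 0 && r2 == 0 is impossible; with plain fields it is allowed.
class Dekker {
    volatile int x, y;   // both shared variables are volatile
    int r1, r2;          // per-thread results, shown as fields for brevity

    void t1() { x = 1; r1 = y; }
    void t2() { y = 1; r2 = x; }

    // Suppose r1 == 0 && r2 == 0. Then the read of y precedes the write
    // y = 1, and the read of x precedes x = 1. Combined with program order
    // this yields the cycle x=1 < r1=y < y=1 < r2=x < x=1, which no total
    // execution order allows. Drop volatile and the outcome becomes legal.
}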
I have dug deep to get to the bottom of the JLS bug in the definition of a data race:
"When a program contains two conflicting accesses (§17.4.1) that are not ordered by a happens-before relationship, it is said to contain a data race."
First of all, it is not the program that contains data races, but a program execution. If we refer back to the original paper defining the Java Memory Model, we'll see this corrected:
"Two accesses x and y form a data race in an execution of a program if they are from different threads, they conflict, and they are not ordered by happens-before."
However, this still leaves us with actions on volatile vars being defined as data races. Consider the following happens-before graph:
Thread W    w1 ----> w2
              \
               \
Thread R    r0 ----> r1
r1 observed the write w1. It was preceded by another read, r0, and the write was followed by another one, w2. Now notice there is no happens-before path between r0 and either w1 or w2; likewise between r1 and w2. All of these are examples of a data race by that definition.
Digging even deeper, however, I have found this post on the memoryModel mailing list. It says "a data race should be defined as conflicting actions on non-volatile variables that are not ordered by happens-before". Only with that addition would the loophole be closed, but this has still not entered the official JLS release.
Okay, suppose I have a bunch of variables, one of them declared volatile:
int a;
int b;
int c;
volatile int v;
If one thread writes to all four variables (writing to v last), and another thread reads from all four variables (reading from v first), does that second thread see the values written to a, b and c by the first thread, even though they are not themselves declared volatile? Or can it possibly see stale values?
Since there seems to be some confusion: I'm not deliberately trying to do something unsafe. I just want to understand the Java memory model and the semantics of the volatile keyword. Pure curiosity.
I'm going to speak to what I think you may really be probing about—piggybacking synchronization.
The technique that it looks like you're trying to use involves using one volatile variable as a synchronization guard in concert with one or more other non-volatile variables. This technique is applicable when the following conditions hold true:
Only one thread will write to the set of values meant to be guarded.
The threads reading the set of values will read them only if the volatile guard value meets some criteria.
You don't mention the second condition holding true for your example, but we can examine it anyway. The model for the writer is as follows:
Write to all the non-volatile variables, assuming that no other thread will try to read them.
Once complete, write a value to the volatile guard variable that indicates that the readers' criteria are met.
The readers operate as follows:
Read the volatile guard variable at any time, and if its value meets the criteria, then
Read the other non-volatile variables.
The readers must not read the other non-volatile variables if the volatile guard variable does not yet indicate a proper value.
The guard variable is acting as a gate. It's closed until the writer sets it to a particular value, or set of values that all meet the criteria of indicating that the gate is now open. The non-volatile variables are guarded behind the gate. The reader is not permitted to read them until the gate opens. Once the gate is open, the reader will see a consistent view of the set of non-volatile variables.
Note that it is not safe to run this protocol repeatedly. The writer can't keep changing the non-volatile variables once it's opened the gate. At that point, multiple reader threads may be reading those other variables, and they can—though are not guaranteed—see updates to those variables. Seeing some but not all of those updates would yield inconsistent views of the set.
Backing up, the trick here is to control access to a set of variables without either
creating a structure to hold them all, to which an atomic reference could be swapped, um, atomically, or
using a lock to make writing to and reading from the entire set of variables mutually exclusive activities.
Piggybacking on top of the volatile guard variable is a clever stunt—not one to be done casually. Subsequent updates to the program can break the aforementioned fragile conditions, removing the consistency guarantees afforded by the Java memory model. Should you choose to use this technique, document its invariants and requirements in the code clearly.
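Here is a minimal sketch of the guarded-gate protocol described above (the class and field names are mine):
class Piggyback {
    private int a, b, c;              // guarded, non-volatile values
    private volatile boolean ready;   // the volatile gate, initially closed

    void writer() {                   // the single writer thread
        a = 1;                        // plain writes, made before the gate opens
        b = 2;
        c = 3;
        ready = true;                 // volatile write: opens the gate
    }

    void reader() {                   // any reader thread
        if (ready) {
            // the gate was observed open, so the plain writes above
            // happen-before this point and a consistent view (1, 2, 3)
            // is guaranteed
            System.out.println(a + b + c);
        }
        // if the gate is closed, the reader must not touch a, b, c
    }
}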
Yes. volatile, locks, etc. set up the happens-before relationship, and it affects all variables (in the new Java Memory Model (JMM) from Java SE 5 / JSR-133). Kind of makes it useful for non-primitive volatiles...
does that second thread see the values written to a, b and c by the first thread, even though they are not themselves declared volatile? Or can it possibly see stale values?
You will get stale reads, because you can't ensure that the values of a, b, c are the ones set after the reading of v. Using a state machine (but you need CAS to change the state) is a way to tackle similar issues, but it's beyond the scope of the discussion.
Perhaps this part is unclear: after writing to v and then reading first from v, you'd get the right results (non-stale reads). The main issue is that if you do if (v==STATE1){...proceed...}, there is no guarantee that some other thread isn't modifying the state of a/b/c in the meantime; in that case, there will be stale reads. If you modify a/b/c and v only once, you'd get the correct result.
Mastering concurrency and lock-free structures is really hard. Doug Lea has a good book on the subject, and most talks/articles of Dr. Cliff Click are a wonderful wealth of information, if you need something to start digging into.
Yes, a volatile write "happens-before" the next volatile read of the same variable.
While #seh is right about the consistency problems with multiple variables, there are use cases where less consistency is required.
For example, a writer thread updates some state variables, and a reader thread displays them promptly. There's not much relation among the variables; we only care about reading the new values promptly. We could make every state variable volatile, or we could use only one volatile variable as a visibility guard.
However, the saving is only on paper; performance-wise there's hardly any difference. In either version, every state variable must be "flushed" by the writer and "loaded" by the reader. No free lunch.