Happens-before and program order in the Java Memory Model

I have some questions regarding program order and how it affects reorderings in the JMM.
In the Java Memory Model, program order (po) is defined as the total order of actions in each thread in a program. According to the JLS, this induces happens-before (hb) edges:
If x and y are actions of the same thread and x comes before y in
program order, then hb(x, y) (i.e. x happens-before y).
So for a simple program P:
initially, x = y = 0
T1 | T2
-----------|-----------
1. r1 = x | 3. r2 = y
2. y = 1 | 4. x = r2
I think po(1, 2) and po(3, 4). Thus, hb(1, 2) and hb(3, 4).
Now suppose I wanted to reorder some of these statements, giving me P':
initially, x = y = 0
T1 | T2
-----------|-----------
2. y = 1 | 3. r2 = y
1. r1 = x | 4. x = r2
According to this paper, we can reorder any two adjacent statements (e.g. 1 and 2), provided that the reordering doesn't eliminate any transitive happens-before edges in any valid execution. However, since hb is defined (partially) by po, and po is a total order over a thread's actions, it seems to me that it would be impossible to reorder any two statements without violating hb, thus P' is not a legal transformation.
My questions are:
Is my understanding of po and hb correct, and have I correctly defined po and hb with respect to the above program P?
Where is my understanding about reordering with regards to hb failing?

You're missing this part of the JLS:
It should be noted that the presence of a happens-before relationship between two actions does not necessarily imply that they have to take place in that order in an implementation. If the reordering produces results consistent with a legal execution, it is not illegal.
In your case, since 1 and 2 are unrelated, they can be flipped. Had 2 been y = r1 instead, then 1 would have to happen before 2 to produce the right result.
The real problem occurs with multi-processor execution. Without any happens-before boundaries, T2 may observe 2 happening before 1, regardless of the actual execution order.
This is because of CPU caching. Let's say T1 executed 1 and 2, in either order. Since no happens-before boundary exists, these actions may still be in the CPU cache, and depending on other needs, the part of the cache containing the result of 2 may be flushed before the part that contains the result of 1.
If T2 executes between those two cache flushes, it will observe that 2 has happened and 1 hasn't, i.e. that 2 happened before 1, as far as T2 can tell.
If this is not allowed, then T1 must establish a happens-before boundary between 1 and 2.
In Java there are various ways of doing that. The old style would be to put 1 and 2 into separate synchronized blocks, because the start and end of a synchronized block is a happens-before boundary, i.e. any action before the block happens before actions inside the block, and any action inside the block happens before actions coming after the block.
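A minimal sketch of that old-style approach (class and field names are my own): the release of the monitor in T1 happens-before the subsequent acquire in T2, so once T2 observes the second write, the first one is guaranteed to be visible as well.

```java
// Sketch: the unlock in t1 happens-before the subsequent lock in t2, so once
// t2 observes ready == true it must also observe data == 42.
public class SyncOrdering {
    static final Object lock = new Object();
    static int data = 0;          // plays the role of action 1
    static boolean ready = false; // plays the role of action 2

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            synchronized (lock) {
                data = 42;
                ready = true;
            }
        });
        Thread t2 = new Thread(() -> {
            boolean seen = false;
            while (!seen) {
                synchronized (lock) {
                    if (ready) {
                        // guaranteed by happens-before: t1's unlock precedes this lock
                        if (data != 42) throw new AssertionError("saw ready without data");
                        seen = true;
                    }
                }
            }
        });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("data=" + data + " ready=" + ready);
    }
}
```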

What you have described as P' is in fact not a different program, but an execution trace of the same program P. It could be a different program, but then it would have a different po, and therefore a different hb.
The happens-before relation restricts statement reordering with regard to observable effects, not execution order. Action 1 happens-before 2, but they don't observe each other's results, so they are allowed to be reordered.
hb guarantees that you will observe that two actions were executed in order, but only from a synchronized context (i.e. from other actions forming hb edges with 1 and 2). You may think of 1 and 2 saying: "Let's swap. No one's watching!"
Here is a good example from JLS that reflects happens-before idea quite well:
For example, the write of a default value to every field of an object constructed by a thread need not happen before the beginning of that thread, as long as no read ever observes that fact
In practice, it is rarely possible to order the default-value writes of all objects constructed by a thread before that thread starts, even though those writes form a synchronizes-with edge with every action in the thread: a starting thread may not know what objects, or how many, it will construct at run time. But once you have a reference to an object, you will observe that its default-value writes have already happened. Ordering the default writes of an object not yet constructed (or not yet known to be constructed) often cannot be reflected in the execution, but it still does not violate the happens-before relation, because the relation is all about observable effects.

I think a key issue is with your construction P'. It implies that re-ordering is global: the entire program is re-ordered in a single way (on each execution) that obeys the memory model. Then you reason about this P' and find that no interesting re-orderings are possible!
What actually occurs is that there is no particular global order for statements not related by an hb relationship, so different threads can see different apparent orders on the same execution. In your example, there are no edges between {1,2} and {3,4}, so statements in one set can see those in the other set in any order. For example, it is possible that T2 observes 2 before 1, but that a T3 identical to T2 (with its own private variables) observes the opposite! So there is no single reordering P': each thread may observe its own reordering, as long as it is consistent with the JMM rules.

I have some questions regarding program order and how it affects reorderings in the JMM.
Strictly speaking about program order: it simply can't affect anything, at least not in a perceivable way. Program order is not something that can be "broken"; it exists only so that a general model about a program starts to take shape.
In other words, program order is only needed so that we know what the original source code looked like. It is also important to note this statement:
Among all the inter-thread actions performed by each thread t, the program order of t is a total order that reflects the order in which these actions would be performed
Not will be performed, but would. So, po does not say the order in which actions will happen, it only says the order in the original source code.
Yes, po will also bring hb, but A happens-before B does not mean A actually happening before B. A great article here about this, and the most important part here:
(2) still behaves the same as it would have even if the effects of (1) had been visible, which is effectively the same as (1)’s effects being visible.
Since your variables x and y are plain variables and there is no dependency between (1) and (2), that reordering is legal. The perceivable outcome of (1) and (2) for T1 is the same no matter the order in which they get executed; and because x and y are plain variables, those actions are allowed to be reordered.

Related

Why is synchronization order defined as total order over all of the synchronization actions?

While studying Java Memory Model, I was confused by the definition of synchronization order (SO). It is said that SO is a total order over all of the synchronization actions (SA) of an execution. But what is the point of talking about all SA of an execution? I can't figure out how this might be useful to me. How can I think about all SA? I understand the meaning of the following statement:
For each thread t, the synchronization order of the synchronization
actions in t is consistent with the program order of t.
The usefulness of that statement is obvious, and I can easily use it. But the definition of SO isn't clear to me, so I cannot figure out how to use it when it comes to SO over several threads. That worries me. How do you understand SO, and how do you use it when writing programs?
SO is not something you deal with on a day to day basis. It is just a basis to construct the happens-before relation.
There are a few orders involved in the happens-before relation.
There is the program order (PO): the order in which loads/stores are executed by a single thread as specified by the Java source code, before any optimizations have been applied. The program order is a total order over all loads/stores of a single thread, but over the whole program it is only a partial order, because it doesn't order loads/stores of different threads.
Then there is the synchronization order (SO): which is a total order over all synchronization actions (lock acquire/release, volatile load/store etc). Because each synchronization action is atomic, they automatically form a total order. So it will even order a release of a lock A and an acquire of a lock B. The SO is consistent with the PO.
Then we have the synchronizes-with (SW) order: the SW is a sub-order of SO that only captures the synchronizes-with relation. E.g. if a volatile write of X is before a volatile read of X in the SO, then the synchronizes-with relation will order these 2. It will actually order the write of X with all subsequent reads of X. But unlike SO, it will not order e.g. a volatile write of X and a volatile read of Y. SW is also consistent with PO.
The happens-before order is defined as the transitive closure of the union of the PO and SW.
Note 1:
The reason that SW and SO need to be consistent with PO is that we do not want SW/SO to say a->b while PO says b->a. That would create a cycle in the happens-before relation and render it invalid because of a causal loop.
Note 2:
Why do synchronization instructions create a total order? Well, they don't. But we can pretend a total order exists. If CPU1 is executing volatile store A and CPU2 is executing volatile store B (different variables), then it is impossible for A and B to be causally related, therefore A-/->B and B-/->A (neither happens-before the other). So we now have 2 actions which are not ordered with respect to each other. The cool thing is that we can just pick any order, because nobody can prove otherwise. A more technical argument is that the SO is a DAG, and every DAG has at least 1 topological sort, which is a total order.
To add to the answer above: total order of synchronization actions is also required for SC-DRF.
Sequential consistency for data-race-free programs (SC-DRF) is one of the key features of Java Memory Model.
From JLS 17.4.5:
If a program is correctly synchronized, then all executions of the program will appear to be sequentially consistent (§17.4.3).
This is an extremely strong guarantee for programmers. Programmers do not need to reason about reorderings ...
Sequential consistency (see JLS 17.4.3) means that there is a single total order over all program actions, the same for every thread.
But if synchronization actions weren't totally ordered, then SC-DRF would be violated in some cases.
One of these cases would be famous IRIW:
volatile int x, y; // initially zero
Thread 1 | Thread 2 | Thread 3 | Thread 4
x = 1; | y = 1; | while(x != 1); | while(y != 1);
| | int r1 = y; | int r2 = x;
When x and y are volatile, the result (r1=0, r2=0) is impossible.
Let's replace volatile with setRelease+getAcquire — we only lose total order over synchronization actions.
From VarHandle:
In addition to obeying Acquire and Release properties, all Volatile operations are totally ordered with respect to each other.
int x, y; // initially zero, with VarHandles X and Y
Thread 1 | Thread 2 | Thread 3 | Thread 4
X.setRelease(this, 1); | Y.setRelease(this, 1); | while(X.getAcquire(this) != 1); | while(Y.getAcquire(this) != 1);
| | int r1 = Y.getAcquire(this); | int r2 = X.getAcquire(this);
Now the result (r1=0, r2=0) is possible
=> Threads 3 and 4 see the writes to x and y in different orders
=> violation of SC, which requires a single total order over all program actions, the same for every thread
So, SC-DRF requires total order of synchronization actions.
There is a similar example in C++ docs for memory_order.
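For completeness, here is a compilable sketch of the acquire/release variant above (class and field names are my own). A single run will almost never exhibit the non-SC outcome, so this only demonstrates the VarHandle API shape, not the reordering itself:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// IRIW shape with setRelease/getAcquire: the JMM no longer forbids threads
// 3 and 4 disagreeing on the order of the two writes (r1 == 0 && r2 == 0).
public class Iriw {
    int x, y; // initially zero
    static int r1, r2;
    static final VarHandle X, Y;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            X = l.findVarHandle(Iriw.class, "x", int.class);
            Y = l.findVarHandle(Iriw.class, "y", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Iriw w = new Iriw();
        Thread t1 = new Thread(() -> X.setRelease(w, 1));
        Thread t2 = new Thread(() -> Y.setRelease(w, 1));
        Thread t3 = new Thread(() -> {
            while ((int) X.getAcquire(w) != 1) Thread.onSpinWait();
            r1 = (int) Y.getAcquire(w);
        });
        Thread t4 = new Thread(() -> {
            while ((int) Y.getAcquire(w) != 1) Thread.onSpinWait();
            r2 = (int) X.getAcquire(w);
        });
        Thread[] all = {t3, t4, t1, t2};
        for (Thread t : all) t.start();
        for (Thread t : all) t.join();
        System.out.println("r1=" + r1 + " r2=" + r2);
    }
}
```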

What does "synchronization actions are totally ordered" mean?

I am reading Java Concurrency in Practice, in "16.1.3 The Java Memory Model in 500 words or less", it says:
The Java Memory Model is specified in terms of actions, which include reads and writes to variables, locks and unlocks of monitors, and starting and joining with threads. The JMM defines a partial ordering called happens-before on all actions within the program. To guarantee that the thread executing action B can see the results of action A (whether or not A and B occur in different threads), there must be a happens-before relationship between A and B. In the absence of a happens-before ordering between two operations, the JVM is free to reorder them as it pleases.
Even though actions are only partially ordered, synchronization actions—lock acquisition and release, and reads and writes of volatile variables—are totally ordered. This makes it sensible to describe happens-before in terms of “subsequent” lock acquisitions and reads of volatile variables.
About "partial ordering", I have found this and this, but I don't quite understand "Even though actions are only partially ordered, synchronization actions—lock acquisition and release, and reads and writes of volatile variables—are totally ordered.". What does "synchronization actions are totally ordered" mean?
Analyzing the statement "synchronization actions are totally ordered":
"synchronization actions" is a set S of program operations (actions)
we have a relation R over set S : it is the happens-before relation. That is, given program statements a and b, aRb if and only if a happens-before b.
Then what the statement says, is "relation R is total over S".
"relation R is total over S", means that for every two operations a,b from set S (with a!=b), either aRb, or bRa. That is, either a happens-before b, or b happens-before a.
If we define the set S as the set of all lock acquisitions and releases performed on the same lock object X, then S is totally ordered by the happens-before relation: let a be the acquisition of lock X performed by thread T1, and b the acquisition performed by thread T2. Then either a happens-before b (in case T1 acquires the lock first: T1 will need to release the lock before T2 can acquire it), or b happens-before a (in case T2 acquires the lock first).
Note: not all relations are total.
For example, the relation <= is total over the real numbers. That is, for every pair a,b of real numbers, it is true that either a<=b or b<=a. The total order here means that given any two items, we can always decide which comes first wrt. the given relation.
But the relation P: "is an ancestor of", is not a total relation over the set of all humans. Of course, for some pairs of humans a,b it is true that either aPb (a is an ancestor of b), or bPa (b is an ancestor of a). But for most of them, neither aPb nor bPa is true; that is, we can't use the relation to decide which item comes "first" (in genealogical terms).
Back to program statements, the happens-before relation R is obviously partial, over the set of all program statements (like in the "ancestor-of" example): given un-synchronized actions a,b (any operations performed by different threads, in absence of proper synchronization), neither aRb nor bRa holds.

Why are two writes to the same variable conflicting in the Java memory model?

The absence of data races is the prerequisite for sequential consistency. Data races are caused by conflicting accesses. And two accesses to the same variable are conflicting if at least one of the accesses is a write.
See below quotes from the JLS7 for reference.
I understand this definition for the case, where one operation is a read access and the other operation is a write access. However, I do not understand why it is required (for the memory model), if there are two write operations to the same variable.
Question: What's the rationale for not guaranteeing sequential consistency in case of two write operations to the same variable, that are not ordered by a happens-before relationship?
§17.4.1: [..] Two accesses to (reads of or writes to) the same variable are said to be conflicting if at least one of the accesses is a write.
§17.4.5: [..] When a program contains two conflicting accesses that are not ordered by a happens-before relationship, it is said to contain a data race. [..] A program is correctly synchronized if and only if all sequentially consistent executions are free of data races. If a program is correctly synchronized, then all executions of the program will appear to be sequentially consistent.
If two write accesses are not in a happens-before relationship, it is unspecified which will happen last, i.e. which assignment wins. For instance, the program
static int x;
static int y;

public static void main(String[] args) throws InterruptedException {
    Thread t1 = new Thread() {
        @Override public void run() {
            x = 1;
            y = 1;
        }
    };
    Thread t2 = new Thread() {
        @Override public void run() {
            y = 2;
            x = 2;
        }
    };
    t1.start();
    t2.start();
    t1.join();
    t2.join();
    System.out.println(x + "" + y);
}
may print 11, 12, 21, or 22, even though the only data races are between writes, and 12 can not be obtained by a sequentially consistent execution.
Consider, for example, a long variable on a 32-bit architecture. It takes two writes to store a long. If two threads attempt that concurrently, there are interleavings that leave the variable in an inconsistent state:
thread1: write high bits, write low bits
thread2: write high bits, write low bits
An unlucky interleaving can result in the high bits from thread2 and the low bits from thread1.
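A hedged sketch of the fix (class and field names are my own): JLS §17.7 only exempts non-volatile long and double fields from write atomicity, so declaring the field volatile makes each write atomic, and a racing reader can only observe values that some thread actually wrote:

```java
// Sketch: with volatile, a racing reader of a long can never observe a torn
// value; it must see either 0L (all zero bits) or -1L (all one bits).
public class LongTearing {
    static volatile long value = 0L; // drop volatile and tearing becomes legal

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                value = (i % 2 == 0) ? 0L : -1L; // alternate all-zeros / all-ones
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                long v = value;
                if (v != 0L && v != -1L) {
                    throw new AssertionError("torn read: " + Long.toHexString(v));
                }
            }
        });
        writer.start(); reader.start();
        writer.join(); reader.join();
        System.out.println("no torn reads observed");
    }
}
```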
Intuitively, sequential consistency means that the execution of the multithreaded program appears as if the program was executed one statement at a time, following the original program order, i.e. the actual order of statements in the code that developers see. Sequential consistency is how people intuitively reason about concurrency.
The main point here is the verb appear. Indeed, the compiler and the VM have the liberty to perform many optimizations behind the hood, as long as they don't break sequential consistency.
According to the memory model, a program will appear sequentially consistent only if it is correctly synchronized. In other words: if a program is not correctly synchronized, its execution at run time might correspond to an execution you cannot reach by executing one statement at a time in the original program order.
Let's consider the original code
T1 T2
a = 3 b = 5
b = 4 a = 6
Sequentially consistent executions can be a=3,b=4,b=5,a=6, or a=3,b=5,a=6,b=4, or b=5,a=6,a=3,b=4, or a=3,b=5,b=4,a=6, or b=5,a=3,b=4,a=6, or b=5,a=3,a=6,b=4 (all the possible interleavings).
To guarantee sequential executions in the JVM, you should wrap each of the four assignments within a synchronized block. Otherwise, the compiler and VM are authorized to do optimizations that could break the intuitive sequential consistency. For instance, they could decide to reorder the statements of T1 to be b=4,a=3. The code will not run in the original program order. With this reordering, the following execution could happen: b=4,b=5,a=6,a=3, resulting in b=5,a=3. This state couldn't be reached with sequential consistency.
The reason why sequential consistency can be broken if the program is not correctly synchronized is that optimizations consider the consistency of individual threads, not the whole program. Here, swapping the assignments in T1 does not compromise the logic of T1 taken in isolation. However, it compromises the logic of the interleaving of threads T1 and T2, since they mutate the same variables, i.e. they have a data race. If the assignments were wrapped into synchronized blocks, the reordering wouldn't be legal.
There's something true in your observation that if you don't read the heap, you won't actually notice the race that occurred. However, it is safe to assume that any variable written to is also read at some point, otherwise it has no purpose. As this small example should have exemplified, the writes should not race or they could corrupt the heap, which can have unintended consequences later on.
Now, to make the situation worse, reads and writes aren't always atomic on the JVM (reading and writing doubles requires two memory accesses). So if they race, they can corrupt the heap not only in the sense that it's not consistent, but also in the sense that it contains values that never really existed.
Finally, the result of an assignment expression is the value of the variable after the assignment has occurred, so x = ( y = z ) is valid. This assumes that the write did not race with a concurrent write and returns the value that was written.
To make the story short: if reads and writes are not properly synchronized, it becomes very very hard to guarantee anything about their effect.
see http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.5-500
We want two writes to have an happens-before relationship so that the later one can shadow the earlier one. Consider this example
hb(w1, r1), hb(w2, r1), hb(r1, r2), but not hb(w1, w2) or hb(w2, w1)
w1 w2
\ /
\ /
|
r1 // see w1 or w2
|
r2 // see w1 or w2
in a sequentially consistent execution, r2 and r1 must see the same value. However JMM is weakened to not guarantee that. Therefore this program is not "correctly synchronized."
If hb(w1, w2) or hb(w2, w1) JMM does guarantee that r2 and r1 see the same value.
w1
|
w2
|
r1 // see w2
|
r2 // see w2
The basic idea is to link all writes and reads on one chain, so that each read is deterministic.
P.S. The definition of data race is buggy; two volatile actions should never be considered a data race, see Is volatile read happens-before volatile write?

Instruction reordering & happens-before relationship [duplicate]

This question already has answers here:
How to understand happens-before consistent
(5 answers)
Closed 4 years ago.
In the book Java Concurrency In Practice, we are told several times that the instructions of our program can be reordered, either by the compiler, by the JVM at runtime, or even by the processor. So we should assume that the executed program will not have its instructions executed in exactly the same order as specified in the source code.
However, the last chapter discussing Java Memory Model provides a listing of happens-before rules indicating which instruction ordering are preserved by the JVM. The first of these rules is:
"Program order rule. Each action in a thread happens before every action in that thread that comes later in the program order."
I believe "program order" refers to the source code.
My question: assuming this rule, I wonder what instruction may be actually reordered.
"Action" is defined as follow:
The Java Memory Model is specified in terms of actions, which include reads and writes to variables, locks and unlocks of monitors, and starting and joining with threads. The JMM defines a partial ordering called happens before on all actions within the program. To guarantee that the thread executing action B can see the results of action A (whether or not A and B occur in different threads), there must be a happens before relationship between A and B. In the absence of a happens before ordering between two operations, the JVM is free to reorder them as it pleases.
Other ordering rules mentioned are:
Monitor lock rule. An unlock on a monitor lock happens before every subsequent lock on that same monitor lock.
Volatile variable rule. A write to a volatile field happens before every subsequent read of that same field.
Thread start rule. A call to Thread.start on a thread happens before every action in the started thread.
Thread termination rule. Any action in a thread happens before any other thread detects that thread has terminated, either by successfully return from Thread.join or by Thread.isAlive returning false.
Interruption rule. A thread calling interrupt on another thread happens before the interrupted thread detects the interrupt (either by having InterruptedException thrown, or invoking isInterrupted or interrupted).
Finalizer rule. The end of a constructor for an object happens before the start of the finalizer for that object.
Transitivity. If A happens before B, and B happens before C, then A happens before C.
The key point of the program order rule is: in a thread.
Imagine this simple program (all variables initially 0):
T1:
x = 5;
y = 6;
T2:
if (y == 6) System.out.println(x);
From T1's perspective, an execution must be consistent with y being assigned after x (program order). However from T2's perspective this does not have to be the case and T2 might print 0.
T1 is actually allowed to assign y first, as the 2 assignments are independent and swapping them does not affect T1's execution.
With proper synchronization, T2 will always print 5 or nothing.
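One way to sketch that "proper synchronization" (class and field names are my own): declaring y volatile chains hb(x = 5, y = 6) with hb(write y, read y), so a T2 that reads y == 6 is guaranteed to read x == 5:

```java
// Sketch: y is a volatile flag, so observing y == 6 guarantees seeing x == 5.
public class VolatileFlag {
    static int x = 0;
    static volatile int y = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { x = 5; y = 6; });
        Thread t2 = new Thread(() -> {
            while (y != 6) Thread.onSpinWait(); // spin until the flag is visible
            if (x != 5) throw new AssertionError("read y == 6 but x == " + x);
            System.out.println(x); // always prints 5
        });
        t2.start(); t1.start();
        t1.join(); t2.join();
    }
}
```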
EDIT
You seem to be misinterpreting the meaning of program order. The program order rule boils down to:
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y) (i.e. x happens-before y).
happens-before has a very specific meaning in the JMM. In particular, it does not mean that y=6 must be subsequent to x=5 in T1 from a wall clock perspective. It only means that the sequence of actions executed by T1 must be consistent with that order. You can also refer to JLS 17.4.5:
It should be noted that the presence of a happens-before relationship between two actions does not necessarily imply that they have to take place in that order in an implementation. If the reordering produces results consistent with a legal execution, it is not illegal.
In the example I gave above, you will agree that from T1's perspective (i.e. in a single threaded program), x=5;y=6; is consistent with y=6;x=5; since you don't read the values. A statement on the next line is guaranteed, in T1, to see those 2 actions, regardless of the order in which they were performed.

How to understand happens-before consistent

In chapter 17 of JLS, it introduce a concept: happens-before consistent.
A set of actions A is happens-before consistent if for all reads r in A, where W(r) is the write action seen by r, it is not the case that either hb(r, W(r)) or that there exists a write w in A such that w.v = r.v and hb(W(r), w) and hb(w, r)"
In my understanding, it equals to following words:
..., it is the case that neither ... nor ...
So my first two questions are:
is my understanding right?
what does "w.v = r.v" mean?
It also gives an Example: 17.4.5-1
Thread 1 Thread 2
B = 1; A = 2;
r2 = A; r1 = B;
In first execution order:
1: B = 1;
3: A = 2;
2: r2 = A; // sees initial write of 0
4: r1 = B; // sees initial write of 0
The order itself has already told us that two threads are executed alternately, so my third question is: what does left number mean?
In my understanding, the reason both r2 and r1 can see the initial write of 0 is that both A and B are not volatile fields. So my fourth question is: is my understanding right?
In second execution order:
1: r2 = A; // sees write of A = 2
3: r1 = B; // sees write of B = 1
2: B = 1;
4: A = 2;
According to the definition of happens-before consistency, it is not difficult to understand that this execution order is happens-before consistent (if my first understanding is correct).
So my fifth and sixth questions are: does it exist this situation (reads see writes that occur later) in real world? If it does, could you give me a real example?
Each thread can be on a different core with its own private registers, which Java can use to hold values of variables, unless you force access to coherent shared memory. This means that one thread can write a value into a register, and this value is not visible to another thread for some time, like the duration of a loop or a whole function. (Milliseconds is not uncommon.)
A more extreme example is that the reading thread's code is optimised with the assumption that since it never changes the value, it doesn't need to read it from memory. In this case the optimised code never sees the change performed by another thread.
In both cases, the use of volatile ensures that reads and write occur in a consistent order and both threads see the same value. This is sometimes described as always reading from main memory, though it doesn't have to be the case because the caches can talk to each other directly. (So the performance hit is much smaller than you might expect).
On normal CPUs, caches are "coherent" (can't hold stale / conflicting values) and transparent, not managed manually. Making data visible between threads just means doing an actual load or store instruction in asm to access memory (through the data caches), and optionally waiting for the store buffer to drain to give ordering wrt. other later operations.
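The "optimised read" hazard described above is easy to sketch with a stop flag (class and field names are my own): without volatile, the JIT could legally hoist the read of the flag out of the loop and the worker could spin forever; with volatile, every iteration observes the latest write and the loop terminates:

```java
// Sketch: a volatile stop flag guarantees the worker's loop sees the write.
public class StopFlag {
    static volatile boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            long spins = 0;
            while (!stop) spins++; // terminates only if the write becomes visible
            System.out.println("stopped after " + spins + " spins");
        });
        worker.start();
        Thread.sleep(100); // let the loop run for a while first
        stop = true;
        worker.join();     // with volatile, this returns promptly
    }
}
```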
happens-before
Let's take a look at definitions in concurrency theory:
Atomicity - a property of an operation: it can be executed completely, as a single transaction, and cannot be executed partially. For example, atomic operations [Example].
Visibility - if one thread makes changes, they are visible to other threads (volatile provided this before Java 5; since Java 5 it also establishes happens-before).
Ordering - the compiler is able to change the ordering of operations/instructions in the source code to make some optimisations.
For example, happens-before is a kind of memory barrier which helps to solve the visibility and ordering issues. Good examples of happens-before are volatile [About] and synchronized monitors [About].
A good example of atomicity is the compare-and-swap (CAS) realization of the check-then-act (CTA) pattern, which should be atomic and allows changing a variable in a multithreaded environment. You can write your own implementation of CTA using:
volatile + synchronized
java.util.concurrent.atomic with sun.misc.Unsafe (memory allocation, instantiating without constructor call...) from Java 5, which uses JNI and CPU advantages.
The CAS algorithm has three parameters (A (address), O (old value), N (new value)):
If the value at A (address) == O (old value), then put N (new value) into A (address);
else set O (old value) = value from A (address) and repeat these actions again.
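A sketch of that retry loop built on AtomicInteger.compareAndSet (class and method names are my own): read the old value, compute the new one, and retry if another thread won the race in between:

```java
import java.util.concurrent.atomic.AtomicInteger;

// CAS retry loop: lost races are detected by compareAndSet returning false,
// so no increment is ever silently dropped.
public class CasCounter {
    static final AtomicInteger counter = new AtomicInteger(0);

    static void increment() {
        for (;;) {
            int old = counter.get();            // O (old value)
            int next = old + 1;                 // N (new value)
            if (counter.compareAndSet(old, next)) return; // A holds O? install N
            // else another thread raced us: reread and retry
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> { for (int j = 0; j < 10_000; j++) increment(); });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(counter.get()); // always 40000
    }
}
```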
Happens-before
Official doc
Two actions can be ordered by a happens-before relationship. If one action happens-before another, then the first is visible to and ordered before the second.
volatile[About] as an example
A write to a volatile field happens-before every subsequent read of that field.
Let's take a look at the example:
// Definitions
int a = 1;
int b = 2;
volatile boolean myVolatile = false;
// Thread A. Program order
{
a = 5;
b = 6;
myVolatile = true; // <-- write
}
//Thread B. Program order
{
//Thread.sleep(1000); //just to show that writing into `myVolatile`(Thread A) was executed before
System.out.println(myVolatile); // <-- read
System.out.println(a); //prints 5, not 1
System.out.println(b); //prints 6, not 2
}
Visibility - when Thread A changes/writes a volatile variable, it also pushes all previous changes into RAM (main memory). As a result, all non-volatile variables will be up to date and visible to other threads.
Ordering:
All operations before the write to the volatile variable in Thread A will be called before it. The JVM is able to reorder them, but guarantees that no operation before the write to the volatile variable in Thread A will be called after it.
All operations after the read of the volatile variable in Thread B will be called after it. The JVM is able to reorder them, but guarantees that no operation after the read of the volatile variable in Thread B will be called before it.
[Concurrency vs Parallelism]
The Java Memory Model defines a partial ordering over all the actions of your program, which is called happens-before.
To guarantee that a thread Y is able to see the side-effects of action X (regardless of whether X occurred in a different thread), a happens-before relationship must be defined between X and Y.
If such a relationship is not present the JVM may re-order the operations of the program.
Now, if a variable is shared and accessed by many threads, and written by (at least) one thread, and the reads and writes are not ordered by the happens-before relationship, then you have a data race.
In a correct program there are no data races.
Example is 2 threads A and B synchronized on lock X.
Thread A acquires lock (now Thread B is blocked) and does the write operations and then releases lock X. Now Thread B acquires lock X and since all the actions of Thread A were done before releasing the lock X, they are ordered before the actions of Thread B which acquired the lock X after thread A (and also visible to Thread B).
Note that this occurs for actions synchronized on the same lock. There is no happens-before relationship among threads synchronized on different locks.
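A minimal sketch of the lock X scenario (class and field names are my own): each release of the monitor happens-before the next acquire, so both mutual exclusion and the visibility of the other thread's writes are guaranteed, and no increments are lost:

```java
// Sketch: two threads increment a shared counter under the same lock X.
// Every unlock happens-before the next lock, so each thread always sees
// the other's latest write and the final count is exact.
public class LockedCounter {
    static final Object lockX = new Object();
    static int count = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                synchronized (lockX) { count++; } // acquire X, write, release X
            }
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(count); // always 200000
    }
}
```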
In substance that is correct. The main thing to take away from this is: unless you use some form of synchronization, there is no guarantee that a read that comes after a write in your program order sees the effect of that write, as the statements might have been reordered.
does it exist this situation (reads see writes that occur later) in real world? If it does, could you give me a real example?
From a wall clock's perspective, obviously, a read can't see the effect of a write that has not happened yet.
From a program order's perspective, because statements can be reordered if there isn't a proper synchronization (happens before relationship), a read that comes before a write in your program, could see the effect of that write during execution because it has been executed after the write by the JVM.
Q1: is my understanding right?
A: Yes
Q2: what does "w.v = r.v" mean?
A: w.v denotes the variable written by w, so "w.v = r.v" means that the write w and the read r act on the same variable v.
Q3: What does left number mean?
A: I think it is a statement ID, as in "Table 17.4-A. Surprising results caused by statement reordering - original code". But you can ignore it, because it does not apply to the content of "Another execution order that is happens-before consistent is:". The left number carries no meaning there, so do not stick to it.
Q4: In my understanding, the reason of both r2 and r1 can see initial write of 0 is both A and B are not volatile field. So my fourth quesiton is: whether my understanding is right?
A: That is one reason; reordering can also cause it. "A program must be correctly synchronized to avoid the kinds of counterintuitive behaviors that can be observed when code is reordered."
Q5&6: In second execution order ... So my fifth and sixth questions are: does it exist this situation (reads see writes that occur later) in real world? If it does, could you give me a real example?
A: Yes. no synchronization in code, each thread read can see either the write of the initial value or the write by the other thread.
time 1: Thread 2: A=2
time 2: Thread 1: B=1 // Without synchronization, B=1 of Thread 1 can be interleaved here
time 3: Thread 2: r1=B // r1 value is 1
time 4: Thread 1: r2=A // r2 value is 2
Note "An execution is happens-before consistent if its set of actions is happens-before consistent"
