Chapter 17 of the JLS introduces a concept: happens-before consistency.
"A set of actions A is happens-before consistent if for all reads r in A, where W(r) is the write action seen by r, it is not the case that either hb(r, W(r)) or that there exists a write w in A such that w.v = r.v and hb(W(r), w) and hb(w, r)."
As I understand it, this is equivalent to the following wording:
..., it is the case that neither ... nor ...
So my first two questions are:
Is my understanding right?
What does "w.v = r.v" mean?
It also gives an example, Example 17.4.5-1:
Thread 1     Thread 2
B = 1;       A = 2;
r2 = A;      r1 = B;
In the first execution order:
1: B = 1;
3: A = 2;
2: r2 = A; // sees initial write of 0
4: r1 = B; // sees initial write of 0
The order itself already tells us that the two threads execute alternately, so my third question is: what do the numbers on the left mean?
As I understand it, both r2 and r1 can see the initial write of 0 because neither A nor B is a volatile field. So my fourth question is: is that understanding right?
In the second execution order:
1: r2 = A; // sees write of A = 2
3: r1 = B; // sees write of B = 1
2: B = 1;
4: A = 2;
Given the definition of happens-before consistency, it is not difficult to see that this execution order is happens-before consistent (if my first understanding is correct).
So my fifth and sixth questions are: does this situation (reads seeing writes that occur later) exist in the real world? If it does, could you give me a real example?
Each thread can be on a different core with its own private registers, which Java can use to hold the values of variables unless you force access to coherent shared memory. This means that one thread can write a value that is held in a register, and this value is not visible to another thread for some time, such as the duration of a loop or a whole function (milliseconds are not uncommon).
A more extreme example is that the reading thread's code is optimised with the assumption that, since it never changes the value, it doesn't need to read it from memory. In this case the optimised code never sees the change performed by another thread.
In both cases, the use of volatile ensures that reads and writes occur in a consistent order and both threads see the same value. This is sometimes described as always reading from main memory, though it doesn't have to be the case because the caches can talk to each other directly (so the performance hit is much smaller than you might expect).
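As a rough illustration of the loop case described above, here is a minimal sketch in which a worker spins on a flag until the main thread sets it. The class and method names (StopFlagDemo, runAndStop) are made up for this example. The sketch only shows the volatile version, because the non-volatile version may or may not hang depending on the JVM and JIT.

```java
// Hypothetical demo class; all names here are illustrative.
class StopFlagDemo {
    // volatile forces every loop iteration to re-read the field, so the
    // write made by the main thread cannot stay hidden from the worker.
    private static volatile boolean stop;

    /** Starts a spinning worker, requests a stop, and reports whether it exited. */
    static boolean runAndStop() {
        stop = false;
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy-wait; without volatile the JIT may hoist this read
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive if the loop never ends
        worker.start();
        try {
            Thread.sleep(50);    // let the worker spin for a moment
            stop = true;         // volatile write
            worker.join(5_000);  // wait at most five seconds for the loop to exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker terminated: " + runAndStop());
    }
}
```

If you remove the volatile keyword, the program is still legal Java, but nothing obliges the worker to ever notice the change.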
On normal CPUs, caches are "coherent" (can't hold stale / conflicting values) and transparent, not managed manually. Making data visible between threads just means doing an actual load or store instruction in asm to access memory (through the data caches), and optionally waiting for the store buffer to drain to give ordering wrt. other later operations.
happens-before
Let's take a look at definitions in concurrency theory:
Atomicity - a property of an operation that executes completely as a single transaction and cannot be executed partially. For example, atomic operations.
Visibility - if one thread makes changes, they are visible to other threads. For example, volatile, which has had happens-before semantics since Java 5.
Ordering - the compiler is able to change the order of the operations/instructions in the source code to make some optimisations.
Happens-before is a guarantee, in practice implemented with memory barriers, that helps to solve the visibility and ordering issues. Good examples of happens-before are volatile and the synchronized monitor.
A good example of atomicity is the compare-and-swap (CAS) realization of the check-then-act (CTA) pattern, which should be atomic and allows a variable to be changed in a multithreaded environment. You can write your own implementation of CTA with:
volatile + synchronized
java.util.concurrent.atomic (since Java 5), which is built on sun.misc.Unsafe (memory allocation, instantiating without a constructor call...) and uses JNI and CPU-level support.
The CAS algorithm has three parameters: A (address), O (old value), N (new value).
If the value at A (address) == O (old value), then put N (new value) into A (address);
otherwise set O (old value) to the value at A (address) and repeat these steps.
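The CAS steps above can be sketched with java.util.concurrent.atomic.AtomicInteger, whose compareAndSet(expected, newValue) plays the role of the CAS primitive. The class CasLoopDemo and the helper casAdd are hypothetical names for this illustration. AtomicInteger already offers addAndGet, so the explicit loop is only there to show the retry pattern.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; names are illustrative.
class CasLoopDemo {
    /** Adds delta to target with an explicit CAS retry loop and returns the new value. */
    static int casAdd(AtomicInteger target, int delta) {
        while (true) {
            int old = target.get();                 // O: value we expect at the address
            int next = old + delta;                 // N: value we want to store
            if (target.compareAndSet(old, next)) {  // succeeds only if the value is still O
                return next;
            }
            // Another thread changed the value first; re-read and try again.
        }
    }

    /** Runs several threads that each call casAdd(counter, 1) repeatedly. */
    static int runThreads(int threads, int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    casAdd(counter, 1);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runThreads(4, 10_000)); // prints 40000
    }
}
```

No increment is lost, because a failed compareAndSet changes nothing and is simply retried.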
Happens-before
Official doc
Two actions can be ordered by a happens-before relationship. If one action happens-before another, then the first is visible to and ordered before the second.
volatile as an example
A write to a volatile field happens-before every subsequent read of that field.
Let's take a look at the example:
// Definitions
int a = 1;
int b = 2;
volatile boolean myVolatile = false;
// Thread A. Program order
{
a = 5;
b = 6;
myVolatile = true; // <-- write
}
//Thread B. Program order
{
//Thread.sleep(1000); //just to show that writing into `myVolatile`(Thread A) was executed before
System.out.println(myVolatile); // <-- read
System.out.println(a); //prints 5, not 1, provided the read above saw true
System.out.println(b); //prints 6, not 2, provided the read above saw true
}
Visibility - When Thread A writes a volatile variable, all of its previous changes also become visible (conceptually, they are pushed to RAM, i.e. main memory). As a result, the non-volatile variables are up to date and visible to other threads that subsequently read that volatile variable.
Ordering:
All operations before the write to the volatile variable in Thread A will be executed before it. The JVM is able to reorder them among themselves, but it guarantees that no operation that precedes the volatile write in Thread A will be executed after it.
All operations after the read of the volatile variable in Thread B will be executed after it. The JVM is able to reorder them among themselves, but it guarantees that no operation that follows the volatile read in Thread B will be executed before it.
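A runnable version of the example above might look like the following sketch. Instead of sleeping, Thread B spins until it actually reads true from myVolatile, which is what establishes the happens-before edge. The wrapper class VolatilePublishDemo and the method run are assumptions added to make the snippet self-contained.

```java
// Hypothetical wrapper class around the a / b / myVolatile example.
class VolatilePublishDemo {
    static int a = 1;
    static int b = 2;
    static volatile boolean myVolatile = false;

    /** Returns the values of a and b as seen by the reader thread. */
    static int[] run() {
        a = 1;
        b = 2;
        myVolatile = false;
        int[] seen = new int[2];

        Thread threadA = new Thread(() -> {
            a = 5;
            b = 6;
            myVolatile = true;        // volatile write publishes a and b
        });
        Thread threadB = new Thread(() -> {
            while (!myVolatile) {
                // spin until the volatile read observes the write
            }
            seen[0] = a;              // must be 5
            seen[1] = b;              // must be 6
        });

        threadB.start();
        threadA.start();
        try {
            threadA.join();
            threadB.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        int[] seen = run();
        System.out.println(seen[0] + " " + seen[1]); // prints 5 6
    }
}
```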
The Java Memory Model defines a partial ordering on the actions of your program, which is called happens-before.
To guarantee that the thread executing action Y is able to see the side effects of action X (regardless of whether X occurred in a different thread or not), a happens-before relationship is defined between X and Y.
If such a relationship is not present, the JVM may reorder the operations of the program.
Now, if a variable is shared and accessed by many threads, and written by (at least) one thread, and the reads and writes are not ordered by the happens-before relationship, then you have a data race.
In a correct program there are no data races.
An example is two threads, A and B, synchronized on lock X.
Thread A acquires the lock (now Thread B is blocked), does its write operations and then releases lock X. Now Thread B acquires lock X, and since all the actions of Thread A were done before releasing lock X, they are ordered before the actions of Thread B, which acquired lock X after Thread A (and they are also visible to Thread B).
Note that this applies to actions synchronized on the same lock. There is no happens-before relationship among threads synchronized on different locks.
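The lock hand-off described above can be sketched as follows, with a plain int guarded by a single lock object. The class SameLockDemo and the field names are hypothetical. The point is that every access goes through the same monitor lockX, so each release happens-before the next acquire.

```java
// Hypothetical demo class; names are illustrative.
class SameLockDemo {
    private static final Object lockX = new Object();
    private static int shared;   // plain field, only touched while holding lockX

    /** Increments the shared counter from several threads under the same lock. */
    static int run(int threads, int incrementsPerThread) {
        synchronized (lockX) {
            shared = 0;
        }
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    synchronized (lockX) {   // acquire: sees all writes made before the last release
                        shared++;
                    }                        // release: publishes this write to the next acquirer
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (lockX) {
            return shared;
        }
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000)); // prints 40000
    }
}
```

If each thread synchronized on its own private lock object instead, the increments would race and the total would no longer be guaranteed.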
In substance that is correct. The main thing to take out of this is: unless you use some form of synchronization, there is no guarantee that a read that comes after a write in your program order sees the effect of that write, as the statements might have been reordered.
does this situation (reads seeing writes that occur later) exist in the real world? If it does, could you give me a real example?
From a wall clock's perspective, obviously, a read can't see the effect of a write that has not happened yet.
From a program order's perspective, because statements can be reordered if there isn't a proper synchronization (happens before relationship), a read that comes before a write in your program, could see the effect of that write during execution because it has been executed after the write by the JVM.
Q1: is my understanding right?
A: Yes
Q2: what does "w.v = r.v" mean?
A: The value of w.v is same as that of r.v
Q3: What do the numbers on the left mean?
A: I think it is a statement ID, like the ones shown in "Table 17.4-A. Surprising results caused by statement reordering - original code". But you can ignore it, because it does not apply to the content of "Another execution order that is happens-before consistent is: ". So the numbers on the left are meaningless here. Do not get hung up on them.
Q4: As I understand it, both r2 and r1 can see the initial write of 0 because neither A nor B is a volatile field. So my fourth question is: is that understanding right?
A: That is one reason. Reordering can also cause it. "A program must be correctly synchronized to avoid the kinds of counterintuitive behaviors that can be observed when code is reordered."
Q5&6: In the second execution order ... So my fifth and sixth questions are: does this situation (reads seeing writes that occur later) exist in the real world? If it does, could you give me a real example?
A: Yes. With no synchronization in the code, each read can see either the write of the initial value or the write by the other thread.
time 1: Thread 2: A=2
time 2: Thread 1: B=1 // Without synchronization, B=1 of Thread 1 can be interleaved here
time 3: Thread 2: r1=B // r1 value is 1
time 4: Thread 1: r2=A // r2 value is 2
Note "An execution is happens-before consistent if its set of actions is happens-before consistent"
The tutorial http://tutorials.jenkov.com/java-concurrency/volatile.html says
Reads from and writes to other variables cannot be reordered to occur
after a write to a volatile variable, if the reads / writes originally
occurred before the write to the volatile variable. The reads / writes
before a write to a volatile variable are guaranteed to "happen
before" the write to the volatile variable.
What is meant by "before the write to the volatile variable"? Does it mean previous read/writes in the same method where we are writing to the volatile variable? Or is it a larger scope (also in methods higher up the call stack)?
The JVM can reorder operations. For example, if we have variables i and j and the code
i = 1;
j = 2;
the JVM can run this in a reordered manner:
j = 2;
i = 1;
But if the variable j is marked volatile, then the JVM runs the operations only as
i = 1;
j = 2;
The write to i "happens before the write to the volatile variable" j.
The JVM ensures that a write to a volatile variable happens-before any subsequent read of it. Take two threads. It's guaranteed that for a single thread, the execution follows as-if-serial semantics. Basically you can assume that there is an implicit happens-before relationship between two actions in the same thread (the compiler is still free to reorder instructions). So a single thread trivially has a total order between its instructions, governed by the happens-before relationship.
A multi-threaded program has many such orders (every thread has a total order over its local instructions, but there is no order globally across threads), but not a total order over the global instruction set. Synchronisation is all about giving your program as much total order as possible.
Coming back to volatile variables: when a thread reads from one, the JVM ensures that all earlier writes to it happen-before the read. Because of this order, everything the writing thread did before it wrote to the variable becomes visible to the thread reading from it. So yes, to answer your question, even variables written further up the call stack should be visible to the reading thread.
I'll try to draw a visual picture. The two threads can be imagined as two parallel rails, and the write to a volatile variable can be one of the sleepers between them. You basically get a
A -----
       |
       |
        ------- B
shaped total order between the two threads of execution. Everything in A before the sleeper should be visible to B after the sleeper because of this total order.
The JMM is defined in terms of the happens-before relation, which we'll call ->. If a->b, then b should see everything a did. This means that there are constraints on reordering loads/stores.
If a is a volatile write and b is a subsequent volatile read of the same variable, then a->b. This is called the volatile variable rule.
If a occurs before b in the code, then a->b. This is called the program order rule.
If a->b and b->c, then a->c. This is called the transitivity rule.
So let's apply this to a simple example:
int a;
volatile int b;
void thread1() {
    a = 1;
    b = 1;
}
void thread2() {
    int rb = b;
    int ra = a;
    if (rb == 1 && ra == 0) System.out.println("violation");
}
So the question is: if thread2 sees rb=1, will it see ra=1?
a=1->b=1 due to the program order rule.
b=1->rb=b (since we see the value 1) due to the volatile variable rule.
rb=b->ra=a due to the program order rule.
Now we can apply the transitivity rule twice and conclude that a=1->ra=a. Therefore ra needs to be 1.
This means that:
a=1 and b=1 can't be reordered.
rb=b and ra=a can't be reordered
otherwise we could end up with an rb=1 and ra=0.
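The argument above can be turned into a small stress test. The sketch below runs the two threads many times on fresh state and counts how often rb == 1 and ra == 0 are observed together. By the reasoning above the count must be zero. The class VolatileRuleDemo, the nested Shared holder and the method countViolations are names invented for this illustration.

```java
// Hypothetical stress test for the volatile variable rule.
class VolatileRuleDemo {
    static final class Shared {
        int a;            // plain field
        volatile int b;   // volatile field
    }

    /** Runs thread1/thread2 repeatedly and counts rb == 1 && ra == 0 outcomes. */
    static int countViolations(int trials) {
        int violations = 0;
        for (int t = 0; t < trials; t++) {
            Shared s = new Shared();
            int[] r = new int[2];     // r[0] = rb, r[1] = ra
            Thread thread1 = new Thread(() -> {
                s.a = 1;
                s.b = 1;              // volatile write
            });
            Thread thread2 = new Thread(() -> {
                r[0] = s.b;           // volatile read
                r[1] = s.a;           // plain read, ordered after the volatile read
            });
            thread1.start();
            thread2.start();
            try {
                thread1.join();
                thread2.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (r[0] == 1 && r[1] == 0) {
                violations++;
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        System.out.println("violations: " + countViolations(1_000)); // prints violations: 0
    }
}
```

A test like this can only fail to find violations; it is the transitivity argument, not the test, that proves they cannot happen.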
This is a follow-up to another question of mine.
@templatetypedef answered the question (appreciated), and in his answer he wrote:
As a note - atomicity does not mean "all other threads will be blocked
until the value is ready." It means all other threads will either see
the state purely before the operation is done or purely after the
operation is done, but nothing else.
I have a confusion regarding this, and here’s why:
It says here:
Atomic actions cannot be interleaved, so they can be used without fear
of thread interference.
What I infer from this, is that it contradicts what he wrote.
If we have 2 int variables i1 and i2, and we do the atomic operation i1=i2; and this operation is executed by threadX.
Then if atomic actions cannot be interleaved as indicated above, it means that during this atomic operation (executed by threadX), no other threadY is allowed to access (for either read or write) that same variable i2, hence, no other threadY, is allowed to access that same variable during the atomic operation, so some form of blocking does exist.
Did I get this right?
Thanks...
To the best of my knowledge there is no atomic i1 = i2 operation. You can atomically read an int and you can atomically write to one, but you can't do both in the same operation without synchronization. So i1 = i2 is two different atomic operations, a read followed by a write. You are guaranteed that nothing will interleave the read operation, so you won't see partial updates to i2 when you read it, and you are guaranteed that nothing will interleave the write to i1, but there's no guarantee that nothing will happen in between those two atomic operations.
Let's say thread t1 is going to do:
i2 = 10
i1 = i2
And thread t2 is going to do:
i1 = 7
i2 = 18
System.out.println(i1)
What you are guaranteed is that t1 will end up assigning either 10 or 18 to i1 but you can't know which. However, you are guaranteed it can't be any other value because the read of i2 and the write to i1 are atomic so you can't end up seeing some of the bits of i2 while it's being modified. Similarly, t2 is guaranteed to print either 10, 18, or 7 and it can't print anything else. However, without synchronization there's no way to know which of those 3 values it will end up printing.
... it means that during this atomic operation (executed by threadX), no other threadY is allowed to access (for either read or write) that same variable i2, hence, no other threadY, is allowed to access that same variable during the atomic operation, so some form of blocking does exist.
No, you didn't get it right.
Atomic operations mean that threads cannot see values in a partial state. Whether an assignment is atomic depends on the data size of i1 and i2. The JLS (§17.7) guarantees that reads and writes of int fields are atomic, but non-volatile long and double fields may not be, because the JVM is allowed to treat each of them as two separate 32-bit operations.
Atomic actions cannot be interleaved, so they can be used without fear of thread interference.
This is right. If i1 is 1 and i2 is 2 and threadX executes the assignment, then any other thread will see the value of i1 as either 1 (the old value) or 2 (the new value). ThreadY won't see it as some sort of half-way state between 1 and 2, because that assignment is atomic even if multiple threads are updating the value of i1.
But what is really confusing the matter is that there are two concepts going on here: atomicity and memory synchronization. With threads, each CPU has its own memory cache so that memory operations are first made to the local memory and then these changes are written to main memory. A thread might see an old copy of i1 in its local cached memory even though another thread has updated main memory already. Even worse is when two threads have updated the value of i1 in their local memory and depending on their order of operations (which is highly random) one thread's value will overwrite the other thread's write to main memory. It's extremely hard to know which one will win the race condition.
As a note - atomicity does not mean "all other threads will be blocked until the value is ready."
Right. This is trying to let you know that there is no locking involved at all here. There are no guarantees as to the value that ThreadY will see. ThreadY could also be updating i1 at the same exact time to the value 3 and then other threads could see it as 1, 2, or 3 depending on the order of operations and whether or not those threads cross memory barriers when the cache flushing and updating is enforced.
The way we control fields and objects that are shared between threads is with the synchronized keyword, which gives a thread unique access to a resource. There are also Locks and other mechanisms to provide mutex. We also can force memory barriers by adding a volatile keyword to a field which means that any read or write to the field will be made to main memory. Both synchronized and volatile ensure proper publishing of data and ordering of operations.
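To make the compound action from the question (read i2, then write i1) indivisible with respect to other threads, both the writer and any reader can go through the same lock. The sketch below is one way to do that. The class CompoundCopyDemo and the method countMismatches are hypothetical names. A checker thread verifies that, while holding the lock, it never sees i1 and i2 disagree.

```java
// Hypothetical demo class; names are illustrative.
class CompoundCopyDemo {
    private static final Object lock = new Object();
    private static int i1;
    private static int i2;

    /** Counts how often a reader holding the lock sees i1 != i2. */
    static int countMismatches(int iterations) {
        synchronized (lock) {
            i1 = 0;
            i2 = 0;
        }
        int[] mismatches = new int[1];
        Thread writer = new Thread(() -> {
            for (int k = 1; k <= iterations; k++) {
                synchronized (lock) {
                    i2 = k;
                    i1 = i2;   // read of i2 and write of i1 inside one critical section
                }
            }
        });
        Thread checker = new Thread(() -> {
            for (int k = 0; k < iterations; k++) {
                synchronized (lock) {
                    if (i1 != i2) {
                        mismatches[0]++;   // would mean we saw the state "in between"
                    }
                }
            }
        });
        writer.start();
        checker.start();
        try {
            writer.join();
            checker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return mismatches[0];
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(100_000)); // prints mismatches: 0
    }
}
```

Note that this is where blocking does appear: it comes from the lock, not from the atomicity of the individual int reads and writes.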
I have some questions regarding program order and how it affects reorderings in the JMM.
In the Java Memory Model, program order (po) is defined as the total order of actions in each thread in a program. According to the JLS, this induces happens-before (hb) edges:
If x and y are actions of the same thread and x comes before y in
program order, then hb(x, y) (i.e. x happens-before y).
So for a simple program P:
initially, x = y = 0
T1 | T2
-----------|-----------
1. r1 = x | 3. r2 = y
2. y = 1 | 4. x = r2
I think po(1, 2) and po(3, 4). Thus, hb(1, 2) and hb(3, 4).
Now suppose I wanted to reorder some of these statements, giving me P':
initially, x = y = 0
T1 | T2
-----------|-----------
2. y = 1 | 3. r2 = y
1. r1 = x | 4. x = r2
According to this paper, we can reorder any two adjacent statements (e.g. 1 and 2), provided that the reordering doesn't eliminate any transitive happens-before edges in any valid execution. However, since hb is defined (partially) by po, and po is a total order over a thread's actions, it seems to me that it would be impossible to reorder any two statements without violating hb, thus P' is not a legal transformation.
My questions are:
Is my understanding of po and hb correct, and have I correctly defined po and hb with respect to the above program P?
Where is my understanding about reordering with regards to hb failing?
You're missing this part of the JLS:
It should be noted that the presence of a happens-before relationship between two actions does not necessarily imply that they have to take place in that order in an implementation. If the reordering produces results consistent with a legal execution, it is not illegal.
In your case, since 1 and 2 are unrelated, they can be flipped. Now if 2 had been y = r1, then 1 must happen before 2 for the right result.
The real problem occurs with multi-processor execution. Without any happen-before boundaries, T2 may observe 2 happening before 1, regardless of execution order.
This is because of CPU caching. Let's say T1 executed 1 and 2, in any order. Since no happen-before boundary exist, these actions are still in CPU cache, and depending on other needs, the part of the cache containing the result of 2 may be flushed before the part of the cache that contains the result of 1.
If T2 executes between those two cache flush events, it'll observe 2 has happened and 1 hasn't, i.e. 2 happened before 1, as far as T2 knows.
If this is not allowed, then T1 must establish a happens-before boundary between 1 and 2.
In Java there are various ways of doing that. The old style would be to put 1 and 2 into separate synchronized blocks, because the start and end of a synchronized block is a happens-before boundary, i.e. any action before the block happens before actions inside the block, and any action inside the block happens before actions coming after the block.
What you have described as P', is in fact not a different program, but an execution trace of the same program P. It could be a different program, but then it would have different po, and therefore different hb.
Happens-before relation restricts statement reordering with regards to their observable effect, not their execution order. Action 1 happens-before 2, but they don't observe each other's result, so they are allowed to be reordered.
hb guarantees that you will observe that two actions were executed in order, but only from a synchronized context (i.e. from other actions forming hb with 1 and 2). You may think of 1 and 2 as saying: "Let's swap. No one's watching!"
Here is a good example from JLS that reflects happens-before idea quite well:
For example, the write of a default value to every field of an object constructed by a thread need not happen before the beginning of that thread, as long as no read ever observes that fact
In practice, it is rarely possible to order default-value writes of all objects constructed by a thread before it starts, even though they form synchronized-with edge with every action in that thread. A starting thread may not know what, and how many objects it will construct in run time. But once you have a reference to an object, you will observe that default value writes have already happened. Ordering default writes of an object not yet constructed (or known to be constructed) often cannot be reflected in execution, but it still does not violate happens-before relation, because it is all about observable effect.
I think a key issue is with your construction P'. It implies that the way re-ordering works is that re-ordering is global - the entire program is re-ordered in single way (on each execution) which obeys the memory model. Then you are trying to reason about this P' and find out that no interesting re-orderings are possible!
What actually occurs is that there is no particular global order for statements not related by a hb relationship, so different threads can see different apparent orders on the same execution. In your example, there are no edges between {1,2} and {3,4}, so statements in one set can see those in the other set in any order. For example, it is possible that T2 observes 2 before 1, but that T3, which is identical to T2 (with its own private variables), observes the opposite! So there is no single reordering P': each thread may observe its own reorderings, as long as they are consistent with the JMM rules.
I have some questions regarding program order and how it affects reorderings in the JMM.
Strictly speaking about program order: it simply can't affect anything, at least not in a perceivable way. Program order is not something that can be "broken"; it exists only so that a general model about a program starts to take shape.
In other words, program order is only needed so that we know what the original source code looked like. It is also important to note this statement:
Among all the inter-thread actions performed by each thread t, the program order of t is a total order that reflects the order in which these actions would be performed
Not will be performed, but would. So, po does not say the order in which actions will happen, it only says the order in the original source code.
Yes, po will also bring hb, but A happens-before B does not mean A actually happening before B. A great article here about this, and the most important part here:
(2) still behaves the same as it would have even if the effects of (1) had been visible, which is effectively the same as (1)’s effects being visible.
Since your variables x and y are plain variables and there is no dependency between (1) and (2), that reordering is legal. The perceivable outcome of (1) and (2) for T1 is the same, no matter the order in which (1) and (2) get executed; and because x and y are plain variables, those actions are allowed to be reordered.
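For readers who want to experiment, program P from the question can be written as the following sketch. The class ProgramPDemo and its helpers are hypothetical names. Since x is only ever assigned the value read into r2, the outcome r1 == 1 with r2 == 0 cannot occur, while r1 == r2 == 1 is permitted by the reordering discussed above (though a given machine may never show it).

```java
// Hypothetical harness for program P; names are illustrative.
class ProgramPDemo {
    static final class Vars {
        int x;   // plain (non-volatile) fields, initially 0
        int y;
    }

    /** Runs P once and returns {r1, r2}. */
    static int[] runOnce() {
        Vars v = new Vars();
        int[] out = new int[2];
        Thread t1 = new Thread(() -> {
            int r1 = v.x;   // statement 1
            v.y = 1;        // statement 2
            out[0] = r1;
        });
        Thread t2 = new Thread(() -> {
            int r2 = v.y;   // statement 3
            v.x = r2;       // statement 4
            out[1] = r2;
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out;
    }

    /** True for the outcomes the reasoning above allows: (0,0), (0,1) and (1,1). */
    static boolean isAllowed(int r1, int r2) {
        boolean inRange = (r1 == 0 || r1 == 1) && (r2 == 0 || r2 == 1);
        return inRange && !(r1 == 1 && r2 == 0);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            int[] r = runOnce();
            if (!isAllowed(r[0], r[1])) {
                System.out.println("unexpected outcome: r1=" + r[0] + " r2=" + r[1]);
            }
        }
        System.out.println("done");
    }
}
```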
The absence of data races is the prerequisite for sequential consistency. Data races are caused by conflicting accesses. And two accesses to the same variable are conflicting if at least one of the accesses is a write.
See below quotes from the JLS7 for reference.
I understand this definition for the case, where one operation is a read access and the other operation is a write access. However, I do not understand why it is required (for the memory model), if there are two write operations to the same variable.
Question: What's the rationale for not guaranteeing sequential consistency in case of two write operations to the same variable, that are not ordered by a happens-before relationship?
§17.4.1: [..] Two accesses to (reads of or writes to) the same variable are said to be conflicting if at least one of the accesses is a write.
§17.4.5: [..] When a program contains two conflicting accesses that are not ordered by a happens-before relationship, it is said to contain a data race. [..] A program is correctly synchronized if and only if all sequentially consistent executions are free of data races. If a program is correctly synchronized, then all executions of the program will appear to be sequentially consistent.
If two write accesses are not in a happens-before relationship, it is unspecified which will happen last, i.e. which assignment wins. For instance, the program
static int x;
static int y;
public static void main(String[] args) throws InterruptedException {
    Thread t1 = new Thread() {
        @Override public void run() {
            x = 1;
            y = 1;
        }
    };
    Thread t2 = new Thread() {
        @Override public void run() {
            y = 2;
            x = 2;
        }
    };
    t1.start();
    t2.start();
    t1.join();
    t2.join();
    System.out.println(x + "" + y);
}
may print 11, 12, 21, or 22, even though the only data races are between writes, and 12 can not be obtained by a sequentially consistent execution.
Consider for example a long variable on a 32 bit architecture. It will take two writes for a long. If two threads try that concurrently there are sequences leaving the variable in an inconsistent state.
thread1: write high bits write low bits
thread2: write high bits, write low bits
This will result in the high bits from thread2 and the low bits from thread1.
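One way to rule out this kind of tearing is to declare the field volatile, since JLS §17.7 states that reads and writes of volatile long and double values are always atomic. The sketch below, with the hypothetical class VolatileLongDemo, has a writer alternate between all bits clear (0L) and all bits set (-1L) while a reader checks that it never sees a mixture of the two halves.

```java
// Hypothetical demo class; names are illustrative.
class VolatileLongDemo {
    // JLS 17.7: reads and writes of volatile long values are always atomic.
    private static volatile long value;

    /** Counts reads that observed something other than 0L or -1L. */
    static int countTornReads(int iterations) {
        value = 0L;
        int[] torn = new int[1];
        Thread writer = new Thread(() -> {
            for (int k = 0; k < iterations; k++) {
                value = (k % 2 == 0) ? -1L : 0L;   // all bits set, then all bits clear
            }
        });
        Thread reader = new Thread(() -> {
            for (int k = 0; k < iterations; k++) {
                long v = value;
                if (v != 0L && v != -1L) {
                    torn[0]++;   // high and low halves came from different writes
                }
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return torn[0];
    }

    public static void main(String[] args) {
        System.out.println("torn reads: " + countTornReads(1_000_000)); // prints torn reads: 0
    }
}
```

Without volatile, a torn read is permitted by the specification, although many 64-bit JVMs will not produce one in practice.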
Intuitively, sequential consistency means that the execution of the multithreaded program appears as if the program was executed one statement at a time following the original program order, i.e. the actual order of statements in the code that developers see. Sequential consistency is how people intuitively reason about concurrency.
The main point here is the verb appear. Indeed, the compiler and the VM have the liberty to perform many optimizations under the hood, as long as they don't break sequential consistency.
According to the memory model, a program will appear sequentially consistent only if it is correctly synchronized. In other words: if a program is not correctly synchronized, its execution at run time might correspond to an execution you cannot reach by executing one statement at a time in the original program order.
Let's consider the original code
T1 T2
a = 3 b = 5
b = 4 a = 6
Sequentially consistent executions can be a=3,b=4,b=5,a=6, or a=3,b=5,a=6,b=4, or b=5,a=6,a=3,b=4, or a=3,b=5,b=4,a=6, or b=5,a=3,b=4,a=6, or b=5,a=3,a=6,b=4 (all the possible interleavings).
To guarantee sequentially consistent executions in the JVM, you should wrap each of the four assignments within a synchronized block. Otherwise, the compiler and VM are authorized to do optimizations that could break the intuitive sequential consistency. For instance, they could decide to reorder the statements of T1 to be b=4,a=3. The code will not run in the original program order. With this reordering, the following execution could happen: b=4,b=5,a=6,a=3, resulting in b=5,a=3. This state couldn't be reached with sequential consistency.
The reason why sequential consistency can be broken if the program is not correctly synchronized is that optimizations consider the consistency of individual threads, not the whole program. Here, swapping the assignments in T1 does not compromise the logic of T1 if taken in isolation. However, it compromises the logic of the interleaving of threads T1 and T2, since they mutate the same variables, i.e. they have a data race. If the assignments were wrapped into synchronized blocks, the reordering wouldn't be legal.
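The claim that b=5,a=3 is unreachable under sequential consistency can be checked mechanically by enumerating every interleaving that keeps each thread's program order. The sketch below does that single-threaded. The class InterleavingDemo and its encoding of statements as {variable index, value} pairs are assumptions made for this illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical enumerator of sequentially consistent executions.
class InterleavingDemo {
    // Each statement is {variable index, value}; index 0 is a, index 1 is b.
    static final int[][] T1 = { {0, 3}, {1, 4} };   // a = 3; b = 4
    static final int[][] T2 = { {1, 5}, {0, 6} };   // b = 5; a = 6

    /** Returns the final states reachable by interleaving T1 and T2 in program order. */
    static Set<String> reachableFinalStates() {
        Set<String> results = new TreeSet<>();
        explore(0, 0, new int[2], results);
        return results;
    }

    // i and j are the numbers of statements already executed by T1 and T2.
    private static void explore(int i, int j, int[] vars, Set<String> results) {
        if (i == T1.length && j == T2.length) {
            results.add("a=" + vars[0] + ",b=" + vars[1]);
            return;
        }
        if (i < T1.length) {                 // let T1 run its next statement
            int[] next = vars.clone();
            next[T1[i][0]] = T1[i][1];
            explore(i + 1, j, next, results);
        }
        if (j < T2.length) {                 // let T2 run its next statement
            int[] next = vars.clone();
            next[T2[j][0]] = T2[j][1];
            explore(i, j + 1, next, results);
        }
    }

    public static void main(String[] args) {
        System.out.println(reachableFinalStates()); // prints [a=3,b=4, a=6,b=4, a=6,b=5]
    }
}
```

The six interleavings collapse to three final states, and a=3,b=5 is not among them.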
There's something true in your observation that if you don't read the heap, you won't actually notice the race that occurred. However, it is safe to assume that any variable written to is also read at some point, otherwise it has no purpose. As this small example should have shown, the writes should not race or they could corrupt the heap, which can have unintended consequences later on.
Now, to make the situation worse, not all reads and writes are atomic on the JVM (reading and writing non-volatile longs and doubles may need two memory accesses). So if they race, they can corrupt the heap not only in the sense that it's not consistent, but also in the sense that it contains a value that never really existed.
Finally, the result of the assignment expression is the value of the variable after the assignment has occurred. So x = ( y = z ) is valid. This assumes that the write did not race with a concurrent write and returns the value that was written.
To make a long story short: if reads and writes are not properly synchronized, it becomes very, very hard to guarantee anything about their effect.
see http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.5-500
We want two writes to have a happens-before relationship so that the later one can shadow the earlier one. Consider this example
hb(w1, r1), hb(w2, r1), hb(r1, r2), but not hb(w1, w2) or hb(w2, w1)
w1     w2
  \   /
   \ /
    |
    r1 // see w1 or w2
    |
    r2 // see w1 or w2
In a sequentially consistent execution, r2 and r1 must see the same value. However, the JMM is weakened to not guarantee that. Therefore this program is not "correctly synchronized."
If hb(w1, w2) or hb(w2, w1), the JMM does guarantee that r2 and r1 see the same value.
w1
|
w2
|
r1 // see w2
|
r2 // see w2
The basic idea is to link all writes and reads on one chain, so that each read is deterministic.
P.S. The definition of data race is buggy; two volatile actions should never be considered a data race, see Is volatile read happens-before volatile write?
I am trying to understand why this example is a correctly synchronized program:
a - volatile
Thread1:
x=a
Thread2:
a=5
Because there are conflicting accesses (a write to and a read of a), in every sequentially consistent execution there must be a happens-before relation between those accesses.
Suppose one of the sequentially consistent executions is:
1. x=a
2. a=5
Does 1 happen-before 2? Why?
Does 1 happen-before 2? Why?
I'm not 100% sure I understand your question.
If you have a volatile variable a and one thread is reading from it and another is writing to it, the order of those accesses can be in either order. It is a race condition. What is guaranteed by the JVM and the Java Memory Model (JMM) depends on which operation happens first.
The write could have just happened and the read sees the updated value. Or the write could happen after the read. So x could be either 5 or the previous value of a.
in every sequentially consistent execution there must be a happens-before relation between those accesses
I'm not sure what this means so I'll try to be specific. The "happens before relation" with volatile means that all previous memory writes to a volatile variable prior to a read of the same variable are guaranteed to have finished. But this guarantee in no way explains the timing between the two volatile operations which is subject to the race condition. The reader is guaranteed to have seen the write, but only if the write happened before the read.
You might think this is a pretty weak guarantee, but in threads, whose performance is dramatically improved by using local CPU cache, reading the value of a field might come from a cached memory segment instead of central memory. The guarantee is critical to ensure that the local thread memory is invalidated and updated when a volatile read occurs so that threads can share data appropriately.
Again, the JVM and the JMM guarantee that if you are reading from a volatile field a, then any writes to the same field that have happened before the read, will be seen by it -- the value written will be properly published and visible to the reading thread. However, this guarantee in no way determines the ordering. It doesn't say that the write has to happen before the read.
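The race described above can be sketched as follows. The class VolatileRaceDemo and the method readRacingWithWrite are hypothetical names. The only thing the program can rely on is that x ends up as either the previous value of a (0 here) or 5, never anything else.

```java
// Hypothetical demo class; names are illustrative.
class VolatileRaceDemo {
    private static volatile int a;

    /** Runs x = a in one thread and a = 5 in another, and returns x. */
    static int readRacingWithWrite() {
        a = 0;
        int[] x = new int[1];
        Thread thread1 = new Thread(() -> x[0] = a);   // x = a
        Thread thread2 = new Thread(() -> a = 5);      // a = 5
        thread1.start();
        thread2.start();
        try {
            thread1.join();
            thread2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return x[0];   // 0 if the read came first, 5 if the write came first
    }

    public static void main(String[] args) {
        System.out.println("x = " + readRacingWithWrite());
    }
}
```

If the program needs the read to see 5, it has to add its own ordering, for example by joining the writer before starting the reader.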
No, a volatile read that comes before (in synchronization order) a volatile write of the same variable does not necessarily happen-before the volatile write.
This means they can be in a "data race", because they are "conflicting accesses not ordered by a happens-before relationship". If that's true pretty much all programs contain data races:) But it's probably a spec bug. A volatile read and write should never be considered a data race. If all variables in a program are volatile, all executions are trivially sequentially consistent. see http://cs.oswego.edu/pipermail/concurrency-interest/2012-January/008927.html
Sorry, but you cannot predict exactly how the JVM will optimize the code based on the JVM's 'memory model'. You have to use the high-level tools of Java to define what you want.
So volatile means only that there will be no "inter-thread cache" used for the variables.
If you want a stricter order, you have to use synchronized blocks.
http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html
Volatile and happens-before are only useful when the read of the field drives some condition. For example:
volatile int a;
int b = 0;
Thread-1:
b = 5;
a = 10;
Thread-2:
c = b + a;
In this case there is no happens-before; a can be either 10 or 0 and b can be either 5 or 0, so c could be 0, 5, 10 or 15. If the read of a drives some other condition, then happens-before is established, for instance:
int b = 0;
volatile int a = 0;
Thread-1:
b = 5;
a = 10;
Thread-2:
if (a == 10) {
c = b + a;
}
In this case c is guaranteed to be 15: the read that sees a == 10 is ordered after the write a = 10, so the earlier write b = 5 happens-before the subsequent read of b.
Edit: Updated the addition order to fix the inconsistency noted by Gray.