Understanding happens-before and synchronization [duplicate] - java

This question already has answers here:
How to understand happens-before consistent
(5 answers)
Closed 4 years ago.
I'm trying to understand the Java happens-before order concept, and there are a few things that seem very confusing. As far as I can tell, happens-before is just an order on the set of actions and does not provide any guarantees about real-time execution order. Actually (emphasis mine):
It should be noted that the presence of a happens-before relationship
between two actions does not necessarily imply that they have to take
place in that order in an implementation. If the reordering produces
results consistent with a legal execution, it is not illegal.
So, all it says is that if there are two actions w (write) and r (read) such that hb(w, r), then r might actually happen before w in an execution; there's no guarantee that it won't, as long as the write w is still observed by the read r.
How can I determine that two actions are performed subsequently at run time? For instance:
public volatile int v;
public int c;
Actions:
Thread A
v = 3; //w
Thread B
c = v; //r
Here we have hb(w, r) but that doesn't mean that c will contain value 3 after assignment. How do I enforce that c is assigned with 3? Does synchronization order provide such guarantees?

When the JLS says that some event X in thread A establishes a happens before relationship with event Y in thread B, it does not mean that X will happen before Y.
It means that IF X happens before Y, then both threads will agree that X happened before Y. That is to say, both threads will see the program's memory in a state that is consistent with X happening before Y.
It's all about memory. Threads communicate through shared memory, but when there are multiple CPUs in a system, all trying to access the same memory system, then the memory system becomes a bottleneck. Therefore, the CPUs in a typical multi-CPU computer are allowed to delay, re-order, and cache memory operations in order to speed things up.
That works great when threads are not interacting with one another, but it causes problems when they actually do want to interact: If thread A stores a value into an ordinary variable, Java makes no guarantee about when (or even if) thread B will see the value change.
In order to overcome that problem when it's important, Java gives you certain means of synchronizing threads. That is, getting the threads to agree on the state of the program's memory. The volatile keyword and the synchronized keyword are two means of establishing synchronization between threads.
I think the reason they called it "happens before" is to emphasize the transitive nature of the relationship: If you can prove that A happens before B, and you can prove that B happens before C, then according to the rules specified in the JLS, you have proved that A happens before C.
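As a small sketch of how this agreement plays out with synchronized (the class and names here are mine, not from the question):

```java
public class Publisher {
    private int payload;    // ordinary field
    private boolean ready;  // both fields guarded by the object's lock

    public synchronized void publish(int value) {
        payload = value;    // this write happens-before the unlock...
        ready = true;
    }

    public synchronized Integer tryRead() {
        // ...and the unlock happens-before this lock acquisition, so
        // if ready is true here, payload is guaranteed to be visible.
        return ready ? payload : null;
    }

    public static void main(String[] args) throws InterruptedException {
        Publisher p = new Publisher();
        Thread writer = new Thread(() -> p.publish(42));
        writer.start();
        writer.join();                    // join() also establishes happens-before
        System.out.println(p.tryRead());  // prints 42
    }
}
```

Because both threads synchronize on the same object, the reader that observes ready == true is guaranteed to also observe payload == 42; this is exactly the transitive chain described above.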

I would like to associate the above statement with some sample code flow.
To understand this, let us take the below class that has two fields, counter and isActive.
class StateHolder {
    private int counter = 100;
    private boolean isActive = false;

    public synchronized void resetCounter() {
        counter = 0;
        isActive = true;
    }

    public synchronized void printStateWithLock() {
        System.out.println("Counter : " + counter);
        System.out.println("IsActive : " + isActive);
    }

    public void printStateWithNoLock() {
        System.out.println("Counter : " + counter);
        System.out.println("IsActive : " + isActive);
    }
}
And assume that there are three threads T1, T2, T3 calling the following methods on the same object of StateHolder:
T1 calls resetCounter() and T2 calls printStateWithLock() at the same time, and T1 gets the lock.
T3 calls printStateWithNoLock() after T1 has completed its execution.
It should be noted that the presence of a happens-before relationship between two actions does not necessarily imply that they have to take place in that order in an implementation. If the reordering produces results consistent with a legal execution, it is not illegal.
As per the above statement, the JVM, the OS, or the underlying hardware has the flexibility to reorder the statements within the resetCounter() method, and as T1 gets executed it could execute the statements in the below order.
public synchronized void resetCounter() {
    isActive = true;
    counter = 0;
}
This is in line with the statement not necessarily imply that they have to take place in that order in an implementation.
Now looking at it from T2's perspective, this reordering doesn't have any negative impact, because both T1 and T2 are synchronizing on the same object and T2 is guaranteed to see the changes to both of the fields, irrespective of whether the reordering has happened or not, as there is a happens-before relationship. So the output will always be:
Counter : 0
IsActive : true
This is as per the statement If the reordering produces results consistent with a legal execution, it is not illegal.
But looking at it from T3's perspective, with this reordering it is possible that T3 will see the updated value of isActive as true but still see the counter value as 100, although T1 has completed its execution.
Counter : 100
IsActive : true
The next point in the above link further clarifies the statement and says that:
More specifically, if two actions share a happens-before relationship, they do not necessarily have to appear to have happened in that order to any code with which they do not share a happens-before relationship. Writes in one thread that are in a data race with reads in another thread may, for example, appear to occur out of order to those reads.
In this example T3 has encountered this problem, as it doesn't have any happens-before relationship with T1 or T2. This is in line with do not necessarily have to appear to have happened in that order to any code with which they do not share a happens-before relationship.
NOTE: To simplify the case, we have a single thread T1 modifying the state and T2 and T3 reading it. It is possible to have:
T1 updates counter to 0; later
T2 modifies isActive to true and sees counter as 0; after some time
T3, which prints the state, could still see only isActive as true but counter as 100, although both T1 and T2 have completed their execution.
As to the last question:
we have hb(w, r) but that doesn't mean that c will contain value 3 after assignment. How do I enforce that c is assigned with 3?
public volatile int v;
public int c;
Thread A
v = 3; //w
Thread B
c = v; //r
Since v is volatile, as per Happens-before Order:
A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
So, provided the read in Thread B actually occurs after the write in the synchronization order, Thread B will read the updated value and c will be assigned 3 in the above code.

Interpreting @James' answer to my liking:
// Definition: Some variables
private int first = 1;
private int second = 2;
private int third = 3;
private volatile boolean hasValue = false;
// Thread A
first = 5;
second = 6;
third = 7;
hasValue = true;
// Thread B
System.out.println("Flag is set to : " + hasValue);
System.out.println("First: " + first); // will print 5, if the read above saw hasValue == true
System.out.println("Second: " + second); // likewise 6
System.out.println("Third: " + third); // likewise 7
If you want the state of memory (main memory and CPU caches) seen at the time of a write to a variable by one thread —
the state of memory seen at hasValue = true (the write statement) in Thread A:
first having value 5, second having value 6, third having value 7
— to be seen from every subsequent read of that same variable by another thread (why "subsequent", when there is only one read in Thread B in this example? We may have a Thread C doing exactly the same as Thread B), then mark that variable volatile.
If X (hasValue=true) in Thread A happens before Y (sysout(hasValue)) in Thread B, the behaviour should be as if X happened before Y in the same thread (memory values seen at X should be same starting from Y)

Here we have hb(w, r) but that doesn't mean that c will contain value 3 after assignment. How do I enforce that c is assigned with 3? Does synchronization order provide such guarantees?
And your example
public volatile int v;
public int c;
Actions:
Thread A
v = 3; //w
Thread B
c = v; //r
You don't need volatile for v in your example. Let's take a look at a similar example
int v = 0;
int c = 0;
volatile boolean assigned = false;
Actions:
Thread A
v = 3;
assigned = true;
Thread B
while(!assigned);
c = v;
The assigned field is volatile.
Thread B will execute the c = v statement only after assigned becomes true (the while(!assigned) loop is responsible for that).
If we have volatile — we have happens-before.
Happens-before means that if we see assigned == true, we will see everything that happened before the statement assigned = true: we will see v = 3.
So when we have assigned == true -> we have v = 3.
We have c = 3 as a result.
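A runnable version of this sketch (the class name is mine):

```java
public class FlagExample {
    static int v = 0;
    static int c = 0;
    static volatile boolean assigned = false;

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> {
            v = 3;            // ordinary write
            assigned = true;  // volatile write: publishes v = 3
        });
        Thread b = new Thread(() -> {
            while (!assigned) { } // spin until the volatile read sees true
            c = v;                // guaranteed to read 3
        });
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(c); // always prints 3
    }
}
```

The volatile read of assigned in Thread B synchronizes with the volatile write in Thread A, so by transitivity the earlier write v = 3 is guaranteed to be visible.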
What will happen without volatile
int v = 0;
int c = 0;
boolean assigned = false;
Actions:
Thread A
v = 3;
assigned = true;
Thread B
while(!assigned);
c = v;
We have assigned without volatile now.
The value of c in Thread B can be 0 or 3 in this situation. So there is no guarantee
that c == 3.

Related

Should a variable be volatile between 2 running threads?

Should int a in this case be volatile to guarantee visibility between threads?
private volatile static int a = 0;

public static void main(String[] args) {
    Thread t1 = new Thread(new Runnable() {
        @Override
        public void run() {
            a = 10;
        }
    });
    Thread t2 = new Thread(new Runnable() {
        @Override
        public void run() {
            System.out.println(a);
        }
    });
    t1.start();
    t2.start();
}
Output
10
happens-before is clearly defined in the language specification, so start by reading that.
Then to fully understand what is going on you need to know what Program order is, as well as synchronization order.
To put it very simplified, look at the below:
private volatile static int a = 0;
private static int b = 0;

public static void main(String[] args) {
    Thread t1 = new Thread(new Runnable() {
        @Override
        public void run() {
            b = 100;
            a = 10;
        }
    });
    Thread t2 = new Thread(new Runnable() {
        @Override
        public void run() {
            if (a == 10) {
                System.out.println(b);
            }
        }
    });
    t1.start();
    t2.start();
}
The only guarantee you have is that if, and only if, t2 prints something, it will always be 100. This is because t2 has seen a volatile write to a. That happens because a "happens-before" has been established, from the writing thread to the reading one, and every action done before a = 10 is guaranteed to be visible to the thread that has seen that a being 10.
Could you explain yourself a bit further on "happens-before"?
The most important thing to remember about "happens before" is that it's a transitive relation. That means, if the Java Language Spec (JLS) promises that A "happens before" B, and it promises that B "happens before" C, then you can infer a promise that A "happens before" C.
The JLS says that a write to some volatile variable "happens before" a subsequent read of the same variable.
Well Duh! Sounds obvious, doesn't it?
But it's not obvious because the JLS does not give the same guarantee for a non-volatile variable. If processor A writes the value 7 to a non-volatile int, and then some time later processor B writes 5, the JLS does not guarantee that some long time later, the final value of the variable will be 5. Processor B will see 5 (that's a different "happens before" promise, see below). Processor A could see 5 or 7, and any other processor could see 5 or 7 or whatever value the variable had initially (e.g., 0).
How the volatile promise helps
Suppose we have
volatile boolean flag = false;
/*non-volatile*/ int i = 0;
Suppose thread A does this:
i = 7;
flag = true;
And suppose thread B does this:
if (flag) {
    System.out.println(i);
}
else {
    System.out.println("Bleah!");
}
Thread B could print "7", or it could print "Bleah!" but because of the "happens before" guarantee, we absolutely know that thread B will never print "0". Why not?
Thread A set i = 7 before it set flag = true. The JLS guarantees that if a single thread executes one statement before it executes a second statement, then the first statement "happens before" the second statement. (That sounds stupendously obvious, but again, it shouldn't. A lot of things having to do with threads are not obvious.)
Thread B tests flag before it prints i. So *IF* thread A previously set flag=true then we know that i must equal 7: Transitivity: i=7 "happens before" flag=true, and the write to volatile flag, IF IT HAPPENED AT ALL, "happens before" a read of the same flag.
IF IT HAPPENED AT ALL
Data Races and Race Conditions
The biggest thing to remember is that when the JLS promises that A "happens before" B, they are not saying that A always actually does happen before B: They are saying that you can depend on that transitive relationship. They are saying that if A actually did happen before B, then all of the things that "happens before" A must also have actually happened before B.
The program can print "Bleah!" because nothing prevents thread B from testing the flag before thread A sets it. Some people call that a "data race." The two threads are "racing" to see which one gets to the flag first, and the outcome of the program depends on which thread wins that race.
When the correctness of a program depends on the outcome of a data race, some of us call that a "race condition," and that's a real headache. There's no guarantee that a program with a race condition won't do the right thing a thousand times during your testing, and then do the wrong thing when it matters most for your customer.

Volatile Java reordering

Firstly let me say that I am aware of this being a fairly common topic here but searching for it I couldn't quite find another question that clarifies the following situation. I am very sorry if this is a possible duplicate but here you go:
I am new to concurrency and have been given the following code in order to answer questions:
a) Why any other output aside from "00" would be possible?
b) How to amend the code so that "00" will ALWAYS print.
boolean flag = false;

void changeVal(int val) {
    if (this.flag) {
        return;
    }
    this.initialInt = val;
    this.flag = true;
}

int initialInt = 1;

class MyThread extends Thread {
    public void run() {
        changeVal(0);
        System.out.print(initialInt);
    }
}

void execute() throws Exception {
    MyThread t1 = new MyThread();
    MyThread t2 = new MyThread();
    t1.start(); t2.start(); t1.join(); t2.join();
    System.out.println();
}
For a) my answer would be the following: in the absence of any volatile / synchronization construct, the compiler could reorder some of the instructions. In particular, this.initialInt = val; and this.flag = true; could be switched, so that this situation could occur: both threads are started and t1 charges ahead. Given the reordered instructions, it first sets flag = true. Now, before it reaches the now-last statement this.initialInt = val;, the other thread jumps in, checks the if-condition and immediately returns, thus printing the unchanged initialInt value of 1. Besides this, I believe that without any volatile / synchronization it is not certain whether t2 will see the assignment performed to initialInt in t1, so it may also print 1 as the default value.
For b) I think that flag could be made volatile. I have learned that when t1 writes to a volatile variable setting flag = true then t2, upon reading out this volatile variable in the if-statement will see any write operations performed before the volatile write, hence initialInt = val, too. Therefore, t2 will already have seen its initialInt value changed to 0 and must always print 0.
This will only work, however, if the use of volatile successfully prevents any reordering as I described in a). I have read about volatile accomplishing such things, but I am not sure whether this always works here in the absence of any further synchronized blocks or locks. From this answer I have gathered that nothing happening before a volatile store (so this.flag = true) can be reordered to appear after it. In that case initialInt = val could not be moved down, and I should be correct, right? Or not? :)
Thank you so much for your help. I am looking forward to your replies.
This example will always print 00, because each thread does changeVal(0) before printing.
To mimic the case where 00 might not be printed, you need to move initialInt = 1; into the context of a thread, like so:
class MyThread extends Thread {
    public void run() {
        initialInt = 1;
        changeVal(0);
        System.out.print(initialInt);
    }
}
Now you might have a race condition that sets initialInt back to 1 in thread1 before it is printed in thread2.
Another alternative that might result in a race condition, but is harder to understand, is switching the order of setting the flag and setting the value:
void changeVal(int val) {
    if (this.flag) {
        return;
    }
    this.flag = true;
    this.initialInt = val;
}
There are no explicit synchronizations, so all kinds of interleavings are possible, and changes made by one thread are not necessarily visible to the other. So it is possible that the changes to flag become visible before the changes to initialInt, causing 10 or 01 output, as well as 00. 11 is not possible, because operations performed on variables are visible to the thread performing them, and the effects of changeVal(0) will always be visible to at least one of the threads.
Making changeVal synchronized, or making flag volatile, would fix the issue. flag is the last variable changed in the critical section, so declaring it volatile creates a happens-before relationship, making the changes to initialInt visible.
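A sketch of the volatile fix described in that last paragraph, adapted from the question's code (the class name Holder is mine):

```java
class Holder {
    int initialInt = 1;
    volatile boolean flag = false;

    void changeVal(int val) {
        if (this.flag) {
            return;
        }
        this.initialInt = val; // cannot be reordered past the volatile write below
        this.flag = true;      // volatile write: publishes initialInt to other threads
    }
}
```

A thread that reads flag as true is now guaranteed to also see initialInt == 0 (either it wrote 0 itself or the happens-before edge through flag makes the other thread's write visible), so 00 becomes the only possible output.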

If two unsynchronized threads increment a counter X times, can the total result be less than X?

I have two unsynchronized threads in a tight loop, each incrementing a global variable X times (X = 100000).
The correct final value of the global should be 2*X, but since they are unsynchronized it will be less; empirically it is typically just a bit over X.
However, in all the test runs the value of global was never under X.
Is it possible for the final result to be less than X (less than 100000)?
public class TestClass {
    static int global;

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread( () -> { for (int i = 0; i < 100000; ++i) { TestClass.global++; } });
        Thread t2 = new Thread( () -> { for (int i = 0; i < 100000; ++i) { TestClass.global++; } });
        t.start(); t2.start();
        t.join(); t2.join();
        System.out.println("global = " + global);
    }
}
Just imagine the following scenario:
Thread A reads the initial value 0 from global
Thread B performs 99999 updates on global
Thread A writes 1 to global
Thread B reads 1 from global
Thread A perform its remaining 99999 updates on global
Thread B writes 2 to global
Then, both threads completed but the resulting value is 2, not 2 * 100000, nor 100000.
Note that the example above just uses a bad timing without letting any thread perceive reads or writes of the other thread out-of-order (which would be permitted in the absence of synchronization) nor missing updates (which also would be permitted here).
In other words, the scenario shown above would even be possible when the global variable was declared volatile.
It is a common mistake to reason about reads and writes and their visibility while implicitly assuming a particular timing for the execution of the threads' code. But it is not guaranteed that these threads run side by side with a similar instruction timing.
That may still be what happens in your test scenarios, which is why they don't reveal the other possible behavior. Also, some legal behavior may never occur on a particular hardware or JVM implementation, while the developer still has to account for it. It may very well be that the optimizer replaces the incrementing loop with the equivalent of global += 100000, which rarely exhibits in-between values in such a test, but the insertion of some other nontrivial operation into the loop body could change the behavior entirely.
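For completeness (this goes beyond what was asked), the standard fix is to make the increment itself atomic, e.g. with java.util.concurrent.atomic.AtomicInteger:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CounterFix {
    static final AtomicInteger global = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        // Each incrementAndGet() is an atomic read-modify-write,
        // so no update can be lost between the two threads.
        Runnable inc = () -> { for (int i = 0; i < 100_000; i++) global.incrementAndGet(); };
        Thread t = new Thread(inc), t2 = new Thread(inc);
        t.start(); t2.start();
        t.join(); t2.join();
        System.out.println("global = " + global.get()); // always 200000
    }
}
```

Note that marking the plain int volatile would not help here: volatile makes individual reads and writes visible, but global++ is still a non-atomic read-modify-write.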

How to Avoid Data Races - Two Examples

I was told that the following code example has a data race condition (assuming multiple threads, of course):
class C {
    private int x = 0;
    private int y = 0;

    void f() {
        x = 1;
        y = 1;
    }

    void g() {
        int a = y;
        int b = x;
        assert(b >= a);
    }
}
Yet, I am told that the following "fix" does not have data races:
class C {
    private int x = 0;
    private int y = 0;

    void f() {
        synchronized(this) { x = 1; }
        synchronized(this) { y = 1; }
    }

    void g() {
        int a, b;
        synchronized(this) { a = y; }
        synchronized(this) { b = x; }
        assert(b >= a);
    }
}
Understandably, there are other problems with the above examples, but I just want to know why the second code block has no race conditions. How does synchronizing each assignment statement eliminate the data race condition? What is the significance of synchronizing only a single assignment statement at a time?
Just to clarify, data race is defined as such:
Data races: Simultaneous read/write or write/write of the same
memory location
In the first example, the data race condition will be noticed by having the assert fail.
So how is this possible? The value read for y should never exceed the value read for x, as y is written after x and read before x.
Even if you consider all interleaving of
Thread 1        Thread 2
----------------------------------
read y
read x
                write x 1
                write y 1
you should always end up with a <= b, so the assert b >= a holds.
But in an execution with a data race, if a read of v happens concurrently with a write of v, there is no guarantee about the value read.
v is 0
T1 write 1: wwwwwwwww
T2 read : rrrrr
T3 read : rrrrr
In this case there is no guarantee about the value read by T2: for a 64-bit long or double, the JLS even permits word tearing, so the read could yield a value that was never fully written, like 42 (for an int, it will still be one of the written values, here 0 or 1). Meanwhile, the value read by T3 is guaranteed to be 1.
In the first case, a and b can therefore come out in either order, so the assertion may fail.
The "fix" offers the guarantee that the data race (a concurrent read/write) will never occur, and that a and b will always be either 0 or 1.
Whoever told you this was wrong; the race condition (changing x and y before the assert; actually, just assert (x >= y); has the same problem) is still present if you synchronize separately.
A JIT JVM might very well perform lock coarsening and move both pairs of assignments into a single synchronized block, but that's not guaranteed by the language semantics.
The synchronized keyword is all about different threads reading and writing to the same variables, objects and resources. This is not a trivial topic in Java, but here is a quote from Sun:
Synchronized methods enable a simple strategy for preventing thread interference and memory consistency errors: if an object is visible to more than one thread, all reads or writes to that object's variables are done through synchronized methods.
In a very, very small nutshell: When you have two threads that are reading and writing to the same 'resource', say a variable named foo, you need to ensure that these threads access the variable in an atomic way. Without the synchronized keyword, your thread 1 may not see the change thread 2 made to foo, or worse, it may only be half changed. This would not be what you logically expect.
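A minimal sketch of that strategy (the class and names here are illustrative):

```java
class SharedFoo {
    private int foo; // shared resource, only touched while holding the lock

    // Writers and readers synchronize on the same object, so every
    // read sees the latest completed write, and no read can observe
    // a half-finished update.
    synchronized void setFoo(int value) {
        foo = value;
    }

    synchronized int getFoo() {
        return foo;
    }
}
```

Usage: if thread 1 calls setFoo(5) and thread 2 later calls getFoo(), the lock hand-off guarantees thread 2 sees 5, not a stale or partial value.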

Synchronized data read/write to/from main memory

When a synchronized method completes, will it push only the data modified by it to main memory, or all the member variables? Similarly, when a synchronized method executes, will it read only the data it needs from main memory, or will it clear all the member variables from the cache and read their values from main memory? For example:
public class SharedData
{
    int a; int b; int c; int d;

    public SharedData()
    {
        a = b = c = d = 10;
    }

    public synchronized void compute()
    {
        a = b * 20;
        b = a + 10;
    }

    public synchronized int getResult()
    {
        return b*c;
    }
}
In the above code, assume compute is executed by threadA and getResult is executed by threadB. After the execution of compute, will threadA update main memory with a and b, or with a, b, c and d? And before executing getResult, will threadB get only the values of b and c from main memory, or will it clear the cache and fetch values for all member variables a, b, c and d?
synchronized ensures you have a consistent view of the data. This means you will read the latest value and other caches will get the latest value. Caches are smart enough to talk to each other via a special bus (not something required by the JLS, but allowed) This bus means that it doesn't have to touch main memory to get a consistent view.
I think following thread should answer your question.
Memory effects of synchronization in Java
In practice, the whole cache is not flushed.
1. The synchronized keyword on a method or block locks access to the resource it can modify, by allowing only one thread at a time to acquire the lock.
2. Preventing stale cached values is what the volatile keyword is for: using volatile asks the JVM to make every thread that accesses the variable reconcile its copy of the variable with the one in main memory.
3. Moreover, in your example, if threadA executes compute(), then threadB cannot access getResult() simultaneously, as both are synchronized methods and only one thread can hold the lock at a time. It is not the method that is locked but the object: every object has one lock, and a thread that wants to enter any of the object's synchronized blocks must acquire that lock.
4. Every class also has a lock, which is used to protect the static state of the class.
Before answering your question, let's clear up a few terms related to multi-threaded environments.
Race condition: when two or more threads try to perform read or write operations on the same variable at the same time (same variable = data shared between threads). E.g. in your question, Thread-A executes b = a + 10, which is a write to b, and at the same time Thread-B can execute b*c, which is a read of b. So a race condition can happen here.
We can handle a race condition in two ways: with a synchronized method or block, or with the volatile keyword.
Volatile: the volatile keyword in Java guarantees that the value of a volatile variable is always read from main memory and not from a thread's local cache. A normal variable without volatile may be temporarily kept in a local cache for quick access and easy read/write operations. Volatile doesn't block your thread; it just makes sure writes and reads stay in sync. In the context of your example, we could avoid the race condition by making all the variables volatile.
Synchronized: synchronization is achieved by blocking threads. It uses a lock-and-key mechanism so that only one thread can execute the block of code at a time: Thread-B waits at the door of the synchronized block until Thread-A finishes completely and releases the lock. If you put synchronized on a static method, the lock is on the class (.class); if the method is non-static, i.e. an instance method (as in your case), the lock is on the instance of the class, i.e. the current object.
Now, to come to the point, let's modify your example with a few print statements, in Kotlin:
class SharedData {
    var a: Int
    var b: Int
    var c: Int
    var d = 10

    init {
        c = d
        b = c
        a = b
    }

    @Synchronized
    fun compute(): Pair<Int, Int> {
        a = b * 20
        b = a + 10
        return a to b
    }

    @Synchronized
    fun getComputationResult(): Int {
        return b * c
    }
}
@Test
fun testInstanceNotShared() {
    println("Instance Not Shared Example")
    val threadA = Thread {
        val pair = SharedData().compute()
        println("Running inside ${Thread.currentThread().name} compute And get A = ${pair.first}, B = ${pair.second} ")
    }
    threadA.name = "threadA"
    threadA.start()
    val threadB = Thread {
        println("Running inside ${Thread.currentThread().name} getComputationResult = ${SharedData().getComputationResult()}")
    }
    threadB.name = "threadB"
    threadB.start()
    threadA.join()
    threadB.join()
}
// Output
// Instance Not Shared Example
// Running inside threadB getComputationResult = 100
// Running inside threadA compute And get A = 200, B = 210
@Test
fun testInstanceShared() {
    println("Instance Shared Example")
    val sharedInstance = SharedData()
    val threadA = Thread {
        val pair = sharedInstance.compute()
        println("Running inside ${Thread.currentThread().name} compute And get A = ${pair.first}, B = ${pair.second} ")
    }
    threadA.name = "threadA"
    threadA.start()
    val threadB = Thread {
        println("Running inside ${Thread.currentThread().name} getComputationResult = ${sharedInstance.getComputationResult()}")
    }
    threadB.name = "threadB"
    threadB.start()
    threadA.join()
    threadB.join()
}
// Instance Shared Example
// Running inside threadB getComputationResult = 2100
// Running inside threadA compute And get A = 200, B = 210
From the two test cases above, you can see that the answer to your question is actually hidden in the way you call those methods (compute, getComputationResult) in a multi-threaded environment.
After the execution of compute, will threadA update main memory
There is no guarantee that threadA updates the values of the variables a, b, c, d in main memory, but if you mark those variables volatile, then it is guaranteed that the updated state becomes visible to other threads immediately after the modification happens.
before executing getResult will threadB get only the value of b and c from main memory or will it clear the cache and fetch values for all member variables a,b,c and d
No
In addition to this: notice that in the second test case, even when the two threads call the methods at the same time, you get the correct result. Calling compute and getComputationResult at the same time still makes getComputationResult return the value updated by compute, because synchronized and volatile provide the happens-before guarantee, which makes sure every write is visible to subsequent reads.
