What's the difference between getVolatile and getAcquire? - java

What is the difference between getVolatile vs getAcquire when using e.g. AtomicInteger?
PS: those are related to
The source of a synchronizes-with edge is called a release, and the
destination is called an acquire.
from https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.3

It all comes back to how we want the optimisation of our code. Optimisation in terms of reordering code. Compiler might reorder to optimise. getAquire ensures that the instructions following it will not be executed before it. These instructions might be reordered but they will always be executed after the getAquire.
This works in combination with setRelease (for VarHandle) where setRelease ensures that what happens before it is not reordered to happen after it.
Example:
Thread1:
var x = 1;
var y = 2;
var z = 3;
A.setRelease(this, 10)
assignments of x, y and z will happen before A.setRelease but might be reordered themselves.
Thread 2:
if (A.getAquire(this) == 10) {
// we know that x is 1, y is 2 and z = 3
}
This is a nice use case for concurrent program where you don't have to push volatility on everything but just need some instructions to be executed before another.
For getVolatile, the variable is treated just like any volatile variable in Java. No reordering or optimisation is happening.
This video is a nice to see to understand whats called the "memory ordering modes" being plain, opaque, release/acquire and volatile.

One of the key differences between acquire/release and volatile (sequential consistency) can be demonstrated using Dekker's algorithm.
public void lock(int t) {
int other = 1-t;
flag[t]=1
while (flag[other] == 1) {
if (turn == other) {
flag[t]=0;
while (turn == other);
flag[t]=1
}
}
}
public void unlock(int t) {
turn = 1-t;
flag[t]=0
}
So lets assume that the writing of the flag is done using a release store and the loading of the flag is done using an acquire-load, then we'll get the following ordering guarantees:
.. other loads/stores
[StoreStore][LoadStore]
flag[t]=1 // release-store
flag[other] // acquire-load
[LoadLoad][LoadStore]
.. other loads/stores
The problem is that the earlier write to flag[t] can be reordered with the later load of flag[other] and the consequence is that 2 threads could end up in the critical section.
The reason the earlier store and the later load to a different address can be reordered is 2 fold:
the compiler could reorder it.
modern CPU's have store buffers to hide the memory latency. Since stores are going to be made eventually anyway, there is no point on letting the CPU stall on a cache write miss.
To prevent this from happening a stronger memory model is needed. In this case we need sequential consistency since we do not want any reordering to take place. This can be realized by adding a [StoreLoad] between the store and the load.
.. other loads/stores
[StoreStore][LoadStore]
flag[t]=1 // release-store
[StoreLoad]
flag[other] // acquire-load
[LoadLoad][LoadStore]
.. other loads/stores
It depends on the ISA on which side this is done; e.g. on the X86 this is typically done on the writing side. E.g. using an MFENCE (there are other like XCHG that has an implicit lock or using a LOCK ADDL 0 to the stack pointer like the JVM typically does).
On the ARM this is done on the reading side. Instead of using a weaker load like an LDAPR, a LDAR is needed which will lead the LDAR to wait till the STLR's have been drained from the store buffer.
For a good read, check the following link:
https://shipilev.net/blog/2014/on-the-fence-with-dependencies/

Related

When not to use volatile,it still can see the changes which issued by other thread

public class VisibleDemo {
private boolean flag;
public VisibleDemo setFlag(boolean flag) {
this.flag = flag;
return this;
}
public static void main(String[] args) throws InterruptedException {
VisibleDemo t = new VisibleDemo();
new Thread(()->{
long l = System.currentTimeMillis();
while (true) {
if (System.currentTimeMillis() - l > 600) {
break;
}
}
t.setFlag(true);
}).start();
new Thread(()->{
long l = System.currentTimeMillis();
while (true) {
if (System.currentTimeMillis() - l > 500) {
break;
}
}
while (!t.flag) {
// if (System.currentTimeMillis() - l > 598) {
//
// }
}
System.out.println("end");
}).start();
}
}
if it does not have the following codes, it will not show "end".
if (System.currentTimeMillis() - l > 598) {
}
if it has these codes, it will probably show "end". Sometimes it does not show.
when is less than 598 or not have these codes, like use 550, it will not show "end".
when is 598, it will probably show "end"
when is greater than 598, it will show "end" every time
notes:
598 is on my computer, May be your computer is another number.
the flag is not with volatile, why can know the newest value.
First: I want to know Why?
Second: I need help,
I want to know the scenarios: when the worker cache of jvm thread will refresh to/from main memory.
OS: windows 10
java: jdk8u231
Your code is suffering from a data-race and that is why it is behaving unreliably.
The JMM is defined in terms of the happens-before relation. So if you have 2 actions A and B, and A happens-before B, then B should see A and everything before A. It is very important to understand that happens-before doesn't imply happening-before (so ordering based on physical time) and vice versa.
The 'flag' field is accessed concurrently; one thread is reading it while another thread is writing it. In JMM terms this is called conflicting access.
Conflicting accesses are fine as long as it is done using some form of synchronization because the synchronization will induce happens-before edges. But since the 'flag' accesses are plain loads/stores, there is no synchronization, and as a consequence, there will not be a happens-before edge to order the load and the store. A conflicting access, that isn't ordered by a happens-before edge, is called a data-race and that is the problem you are suffering from.
When there is a data-race; funny things can happen but it will not lead to undefined behavior like is possible under C++ (undefined behavior can effectively lead to any possible outcome including crashes and super weird behavior). So load still needs to see a value that is written and can't see a value coming out of thin air.
If we look at your code:
while (!t.flag) {
...
}
Because the flag field isn't updated within the loop and is just a plain load, the compiler is allowed to optimize this code to:
if(!t.flag){
while(true){...}
}
This particular optimization is called loop hoisting (or loop invariant code motion).
So this explains why the loop doesn't need to complete.
Why does it complete when you access the System.currentTimeMillis? Because you got lucky; apparently this prevents the JIT from applying the above optimization. But keep in mind that System.currentTimeMillis doesn't have any formal synchronization semantics and therefore doesn't induce happens-before edges.
How to fix your code?
The simplest way to fix your code would be to make 'flag' volatile or access both the read/write from a synchronized block. If you want to go really hardcore: use VarHandle get/set opaque. Officially it is still a data-race because opaque doesn't indice happens-before edges, but it will prevent the compiler to optimize out the load/store. It is a benign data race. The primary advantage is slightly better performance because it doesn't prevent the reordering of surrounding loads/stores.
I want to know the scenarios: when the worker cache of jvm thread will refresh to/from main memory.
This is a fallacy. Caches on modern CPUs are always coherent; this is taken care of by the cache coherence protocol like MESI. Writing to main memory for every volatile read/write would be extremely slow. For more information see the following excellent post. If you want to know more about cache coherence and memory ordering, please check this excellent book which you can download for free.
I want to know the scenarios: when the worker cache of jvm thread will refresh to/from main memory.
When Taylor Swift is playing on your music player, it'll be 598, unless it's tuesday, then it'll be 599.
No, really. It's that arbitrary. The JVM spec gives the JVM the right to come up with any old number for any reason if your code isn't properly guarded.
The problem is JVM diversity. There is a crazy combinatorial explosion:
There are about 8 OSes give or take.
There are like 20 different 'chip lines', with different pipelining behaviour.
These chips can be in various mitigating modes to mitigate against attacks like Spectre. Let's call it 3.
There are about 8 different major JVM vendors.
These come in ~10 or so different versions (java 8, java 9, java 10, java 11, etc).
That gives us about 384000 different combinations.
The point of the JMM (Java Memory Model) is to remove the handcuffs from a JVM implementation. A JVM implementation is looking for this optimal case:
It wants the freedom to use the various tricks that CPUs use to run code as fast as possible. For example, it wants the freedom to be capable of 're-ordering' (given a(); b(), to run b() first, and a() later. Which is okay, if a and b are utterly independent and are not in any way looking at each others modifications). The reason it wants to do this is because CPUs are pipelines: Even processing a single instruction is in fact a chain of many separate steps, and the 'parse the instruction' step can get cracking on parsing another instruction the very moment it is done, even if that instruction is still being processed by the rest of the pipe. In fact, the CPU could have 4 separate 'instruction parser units' and they can be parsing 4 instructions in parallel. This is NOT the kind of parallelism that multiple cores do: This is a single core that will parse 4 consecutive instructions in parallel because parsing instructions is slightly slower than running them. For example. But that's just intel chips of the Z-whatever line. That's the point. If the memory model of the java specification indicates that a JVM simply can't use this stuff then that would mean JVMs on that particular intel chip run slow as molasses. We don't want that.
Nevertheless, the memory model rules can't be so preferential to giving the JVM the right to re-order and do all sorts of crazy things that it becomes impossible to write reliable code for JVMs. Imagine the java lang spec says that the JVM can re-order any 2 instructions in one method at any time even if these 2 instructions are touching the same field. That'd be great for JVM engineers, they can go nuts with optimizing code on the fly to re-order it optimally. But it would impossible to write java code.
So, a balance has been struck. This balance takes the following form:
The JMM gives you specific rules - these rules take the form of: "If you do X, then the JVM guarantees Y".
But that is all. In particular, there is nothing written about what happens if you do not do X. All you know is, that then Y is not guaranteed. But 'not guaranteed' does not mean: Will definitely NOT happen.
Here is an example:
class Data {
static int a = 0;
static int b = 0;
}
class Thread1 extends Thread {
public void run() {
Data.a = 5;
Data.b = 10;
}
}
class Thread2 extends Thread {
public void run() {
int a = Data.a;
int b = Data.b;
System.out.println(a);
System.out.println(b);
}
}
class Main {
public static void main(String[] args) {
new Thread1().start();
new Thread2().start();
}
}
This code:
Makes 2 fields, which start out at 0 and 0.
Runs one thread that first sets a to 5 and then sets b to 10.
Starts a second thread that reads these 2 fields into local vars and then prints these.
The JVM spec says that it is valid for a JVM to:
Print 0/0
Print 5/0
Print 0/10
Print 5/10
But it would not be legal for a JVM to e.g. print '20/20', or '10/5'.
Let's zoom in on the 0/10 case because that is utterly bizarre - how could a JVM possibly do that? Well, reordering!
WILL a JVM print 0/10? On some combinations of JVM vender and version+Architecture+OS+phase of the moon, YES IT WILL. On most, no it won't. Ever. Still, imagine you wrote this code, you rely on 0/10 NEVER occurring, and you test the heck out of your code, and you verify that indeed, even running the test a million times, it never happens. You ship it to the production server and it runs fine for a week and then just as you are giving the demo to the really important potential customer, all heck breaks loose: Your app is broken, as from time to time the 0/10 case does occur.
You file a bug with your JVM vendor. And they close it as 'intended behaviour - wontfix'. That will really happen, because that really is the intended behaviour. _If you write code that relies on a thing being true that is NOT guaranteed by the JMM, then YOU wrote a bug, even if on your particular hardware on this particular day it is completely impossible for you to make this bug occur right now.
This means one simple and very nasty conclusion is the only correct one: You cannot test this stuff.
So, if you adhere to the rule that if there are no tests then you can't know if you code works, guess what? You cannot ever know if your code is fine. Ever.
That then leads to the conclusion that you don't want to write any such code.
This sounds crazy (how can you simply not ever, ever write anything multicore?) but it's not as nuts as you think. This only comes up if 2 threads are dependent on ordering relative to each other for some in-process action. For example, if two threads are both accessing the same field of the same instance. Simply... don't do that.
It's easier than you think: If all 'communication' between threads goes via the database and you know how to use transactions in databases, voila. Or you use a message bus service like RabbitMQ.
If for some job you really must write multithread code where the threads interact with each other, don't shoot the messenger: It is NOT POSSIBLE to test that you did it right. So write it very carefully.
A second conclusion is that the JMM doesn't explain how things work or what happens. It merely says: IF you follow these rules, I guarantee you that THIS will happen. If you don't follow these rules, anything can happen. A JVM is free to do all sorts of crazy shenanigans, and this documentation nor any other documentation will ever enumerate all the crazy things that could happen. After all, there are at least 38400 different combinations and it's crazy to attempt to document all 38400!
So, what are the core rules?
The core rules are so-called happens-before relationships. The basic rule is simply this:
There are various ways to establish H-B relationships. Such a relationship is always between 2 lines of code. 2 lines of code might be unrelated, H-B wise. Or, the rules state that line A 'happens-before' line B.
If and only if the rules state this, then it will be impossible to observe a state of the universe (the values of all fields of all instances in the entire JVM) at line B as it was before line A ran.
That's it. For example, if line A 'happens before' line B, but line B does not attempt to witness any field change A made, then the JVM is still free to reorder and have B run before A. The point is that this shouldn't matter - you're not observing, so why does it matter?
We can 'fix' our weird 0/0/5/10 issue by setting up H-B: If the 'grab the static field values and save them to local a/b vars' code happens-after thread1's setting of it, then we can be sure that the code will always print 5/10 and the JMM guarantees means a JVM that doesn't print that is broken.
H-B are also transitive (if HB(A, B) is true, and HB(B, C) is true, then HB(A, C) is also true).
How do you set up HB?
If line B would run after line A as per the usual understanding of how things run, and both are being run by the same thread, HB(A, B). This is obvious: If you just write x(); y();, then y cannot observe state as it was before x ran.
HB(thread.start(), X) where X is the very first line in the started thread.
HB(EndS, StartS), where EndS is the exiting of a synchronized block on object ref Z, and StartS is another thread entering a synchronized block (on ref Z as well) later.
HB(V, V) where V is 'accessing volatile variable Z', but it is hard to know which way the HB goes with volatiles.
There are a few more exotic ways. There's also a separate HB relationship for constructors and final variables that they initialize, but generally this one is real easy to understand (once a constructor returns, whatever final fields it initialized are definitely set and cannot be observed to not be set, even if otherwise no actual HB relationship has been established. This applies only to final fields).
This explains why you observe weird values. This also explains why your question of 'I want to know when a JVM thread will refresh to/from main memory' is not answerable: Because the java memory model spec and the java virtual machine spec intentionally and specifically make no promises on how that works. One JVM can work one way, another JVM can do it completely differently.
The reason I started off making a seeming joke about playing Taylor Swift is: A CPU has cores, and the cores are limited. A modern computer, especially a desktop, is doing thousands of things at once, and will therefore be rotating apps through cores all the time. Whether a field update is 'flushed out' to main memory (NOTE: THAT IS DANGEROUS THINKING - THE DOCS DO NOT ACTUALLY ENFORCE THAT JVMS CAN BE UNDERSTOOD IN THOSE TERMS!) might depend on whether it gets rotated out of a core or not. And that in turn might depend on your music player dealing with a particular compressed music file that takes a few more cores to decompress the next block so that it can be queued up in the audio buffer.
Hence, and this is no joke, the song you are playing on your music player can in fact change the number you get. Hence, why you have to give up: You CANNOT enumerate 'if my computer is in this state, then this code will always produce Y number'. There are billions of states you'd have to enumerate. Impossible.

Trying to understand shared variables in java threads

I have the following code :
class thread_creation extends Thread{
int t;
thread_creation(int x){
t=x;
}
public void run() {
increment();
}
public void increment() {
for(int i =0 ; i<10 ; i++) {
t++;
System.out.println(t);
}
}
}
public class test {
public static void main(String[] args) {
int i =0;
thread_creation t1 = new thread_creation(i);
thread_creation t2 = new thread_creation(i);
t1.start();
try {
Thread.sleep(500);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
t2.start();
}
}
When I run it , I get :
1
2
3
4
5
6
7
8
9
10
1
2
3
4
5
6
7
8
9
10
Why I am getting this output ? According to my understanding , the variable i is a shared variable between the two threads created. So according to the code , the first thread will execute and increments i 10 times , and hence , i will be equal to 10 . The second thread will start after the first one because of the sleep statement and since i is shared , then the second thread will start will i=10 and will start incrementing it 10 times to have i = 20 , but this is not the case in the output , so why that ?
You seem to think that int t; in thread_creation is a shared variable. I'm afraid you are mistaken. Each t instance is a different variable. So the two threads are updating distinct counters.
The output you are seeing reflects that.
This is the nub of your question:
How do I pass a shared variable then ?
Actually, you can't1. Strictly a shared variable is actually a variable belonging to a shared object. You cannot pass a variable per se. Java does not allow passing of variables. This is what "Java does not support call-by-reference" really means. You can't pass or return a variable or the address of a variable in any method call. (Or in any other way.)
In Java you pass and return values: either primitives, or references to objects. The values may read from a variable by the call's parameter expression or assigned to a variable after the call's return. But you are not passing the variable. A variable and its value / contents are different things.
So the only way to implement a shared counter is to implement it as a shared counter object.
Note that "variable" and "object" mean different things, both in Java and in other programming languages. You should NOT use the two terms interchangeable. For example, when I declare this in Java:
String s = "Hello";
the s variable is not a String object. It is a variable that contains a reference to the String object. Other variables may contain references to the same String object as well. The distinction is even more stark when the objects are mutable. (String is not mutable ... in Java.)
Here are the two (IMO) best ways to implement a shared counter object.
You could create a custom Java Counter class with a count variable, a get method, and methods for incrementing, decrementing the counter. The class needs to implement various methods as thread-safe and atomic; e.g. by using synchronized methods or blocks2.
You could just use an AtomicInteger instance. That takes care of atomicity and thread-safety ... to the extent that it is possible with this kind of API.
The latter approach is simpler and likely more efficient ... unless you need to do something special each time the counter changes.
(It is conceivable that you could implement a shared counter other ways, but that is too much detail for this answer.)
1 - I realize that I just said the same thing more than 3 times. But as the Bellman says in "The Hunting of the Snark": "What I tell you three times is true."
2 - If the counter is not implemented using synchronized or an equivalent mutual exclusion mechanism with the appropriate happens before semantics, you are liable to see Heisenbugs; e.g. race conditions and memory visibility problems.
Two crucial things you're missing. Both individually explain this behaviour - you can 'fix' either one and you'll still see this, you'd have to fix both to see 1-20:
Java is pass-by-value
When you pass i, you pass a copy of it. In fact, in java, all parameters to methods are always copies. Hence, when the thread does t++, it has absolutely no effect whatsoever on your i. You can trivially test this, and you don't need to mess with threads to see it:
public static void main(String[] args) {
int i = 0;
add5(i);
System.out.println(i); // prints 0!!
}
static void add5(int i) {
i = i + 5;
}
Note that all non-primitives are references. That means: A copy of the reference is passed. It's like passing the address of a house and not the house itself. If I have an address book, and I hand you a scanned copy of a page that contains the address to my summer home, you can still drive over there and toss a brick through the window, and I'll 'see' that when I go follow my copy of the address. So, when you pass e.g. a list and the method you passed the list to runs list.add("foo"), you DO see that. You may think: AHA! That means java does not pass a copy, it passed the real list! Not so. Java passed a copy of a street address (A reference). The method I handed that copy to decided to drive over there and act - that you can see.
In other words, =, ++, that sort of thing? That is done to the copy. . is java for 'drive to the address and enter the house'. Anything you 'do' with . is visible to the caller, = and ++ and such are not.
Fixing the code to avoid the pass-by-value problem
Change your code to:
class thread_creation extends Thread {
static int t; // now its global!
public void run() {
increment();
}
public void increment() {
for(int i =0 ; i<10 ; i++) {
t++;
// System.out.println(t);
}
}
}
public class test {
public static void main(String[] args) throws Exception {
thread_creation t1 = new thread_creation();
thread_creation t2 = new thread_creation();
t1.start();
Thread.sleep(500);
t2.start();
Thread.sleep(500);
System.out.println(thread_creation.t);
}
}
Note that I remarked out the print line. I did that intentionally - see below. If you run the above code, you'd think you see 20, but depending on your hardware, the OS, the song playing on your mp3 playing app, which websites you have open, and the phase of the moon, it may be less than 20. So what's going on there? Enter the...
The evil coin.
The relevant spec here is the JMM (The Java Memory Model). This spec explains precisely what a JVM must do, and therefore, what a JVM is free not to do, especially when it comes to how memory is actually managed.
The crucial aspect is the following:
Any effects (updates to fields, such as that t field) may or may not be observable, JVM's choice. There's no guarantee that anything you do is visible to anything else... unless there exists a Happens-Before/Happens-After relationship: Any 2 statements with such a relationship have the property that the JVM guarantees that you cannot observe the lack of the update done by the HB line from the HA line.
HB/HA can be established in various ways:
The 'natural' way: Anything that is 'before' something else _and runs in the same thread has an HB/HA relationship. In other words, if you do in one thread x++; System.out.println(x); then you can't observe that the x++ hasn't happened yet. It's stated like this so that if you're not observing, you get no guarantees, which gives the JVM the freedom to optimize. For example, Given x++;y++; and that's all you do, the JVM is free to re-order that and increment y before x. Or not. There are no guarantees, a JVM can do whatever it wants.
synchronized. The moment of 'exiting' a synchronized (x) {} block has HB to the HA of another thread 'entering' the top of any synchronized block on the same object, if it enters later.
volatile - but note that with volatile it's basically impossible which one came first. But one of them did, and any interaction with a volatile field is HB relative to another thread accessing the same field later.
thread starting. thread.start() is HB relative to the first line of the run() of that thread.
thread yielding. thread.yield() is HA relative to the last line of the thread.
There are a few more exotic ways to establish HB/HA but that's pretty much it.
Crucially, in your code there is no HB/HA between any of the statements that modify or print t!
In other words, the JVM is free to run it all in such a way that the effects of various t++ statements run by one thread aren't observed by another thread.
What the.. WHY????
Because of efficiency. Your memory banks on your CPU are, relative to how fast CPUs are, oceans away from the CPU core. Fetching or writing to core memory from a CPU takes an incredibly long time - your CPU is twiddling its thumbs for a very long time while it waits for the memory controller to get the job done. It could be running hundreds of instructions in that time.
So, CPU cores do not write to memory AT ALL. Instead they work with caches: They have an on-core cache page, and the only interaction with your main memory banks (which are shared by CPU cores) is 'load in an entire cache page' and 'write an entire cache page'. That cache page is then effectively a 'local copy' that only that core can see and interact with (but can do so very very quickly, as that IS very close to the core, unlike the main memory banks), and then once the algorithm is done it can flush that page back to main memory.
The JVM needs to be free to use this. Had the JVM actually worked like you want (that anything any thread does is instantly observable by all others), then anything that any line does must first wait 500 cycles to load the relevant page, then wait another 500 cycles to write it back. All java apps would literally be 1000x slower than they could be.
This in passing also explains that actual synchronizing is really slow. Nothing java can do about that, it is a fundamental limitation of our modern multi-core CPUs.
So, evil coin?
Note that the JVM does not guarantee that the CPU must neccessarily work with this cache stuff, nor does it make any promises about when cache pages are flushed. It merely limits the guarantees so that JVMs can be efficiently written on CPUs that work like that.
That means that any read or write to any field any java code ever does can best be thought of as follows:
The JVM first flips a coin. On heads, it uses a local cached copy. On tails, it copies over the value from some other thread's cached copy instead.
The coin is evil: It is not reliably a 50/50 arrangement. It is entirely plausible that throughout developing a feature and testing it, the coin lands tails every time it is flipped. It remains flipping tails 100% of the time for the first week that you deployed it. And then just when that big potential customer comes in and you're demoing your app, the coin, being an evil, evil coin, starts flipping heads a few times and breaking your app.
The correct conclusion is that the coin will mess with you and that you cannot unit test against it. The only way to win the game is to ensure that the coin is never flipped.
You do this by never touching a field from multiple threads unless it is constant (final, or simply never changes), or if all access to it (both reads and writes) has clearly established HB/HA between all threads.
This is hard to do. That's why the vast majority of apps don't do it at all. Instead, they:
Talk between threads using a database, which has vastly more advanced synchronization primitives: Transactions.
Talk using a message bus such as RabbitMQ or similar.
Use stuff from the java.util.concurrent package such as a Latch, ForkJoin, ConcurrentMap, or AtomicInteger. These are easier to use (specifically: It is a lot harder to write code for these abstractions that is buggy but where the bug cannot be observed or tested for on the machine of the developer that wrote it, it'll only blow up much later in production. But not impossible, of course).
Let's fix it!
volatile doesn't 'fix' ++. x++; is 'read x, increment by 1, write result to x' and volatile doesn't make that atomic, so we cannot use this. We can either replace t++ with:
synchronized(thread_creation.class) {
t++;
}
Which works fine but is really slow (and you shouldn't lock on publicly visible stuff if you can help it, so make a custom object to lock on, but you get the gist hopefully), or, better, dig into that j.u.c package for something that seems useful. And so there is! AtomicInteger!
class thread_creation extends Thread {
static AtomicInteger t = new AtomicInteger();
public void run() {
increment();
}
public void increment() {
for(int i =0 ; i<10 ; i++) {
t.incrementAndGet();
}
}
}
public class test {
public static void main(String[] args) throws Exception {
thread_creation t1 = new thread_creation();
thread_creation t2 = new thread_creation();
t1.start();
Thread.sleep(500);
t2.start();
Thread.sleep(500);
System.out.println(thread_creation.t.get());
}
}
That code will print 20. Every time (unless those threads take longer than 500msec which technically could be, but is rather unlikely of course).
Why did you remark out the print statement?
That HB/HA stuff can sneak up on you: When you call code you did not write, such as System.out.println, who knows what kind of HB/HA relationships are in that code? Javadoc isn't that kind of specific, they won't tell you. Turns out that on most OSes and JVM implementations, interaction with standard out, such as System.out.println, causes synchronization; either the JVM does it, or the OS does. Thus, introducing print statements 'to test stuff' doesn't work - that makes it impossible to observe the race conditions your code does have. Similarly, involving debuggers is a great way to make that coin really go evil on you and flip juuust so that you can't tell your code is buggy.
That is why I remarked it out, because with it in, I bet on almost all hardware you end up seeing 20 eventhough the JVM doesn't guarantee it and that first version is broken. Even if on your particular machine, on this day, with this phase of the moon, it seems to reliably print 20 every single time you run it.

VarHandle get/setOpaque

I keep fighting to understand what VarHandle::setOpaque and VarHandle::getOpaque are really doing. It has not been easy so far - there are some things I think I get (but will not present them in the question itself, not to muddy the waters), but overall this is miss-leading at best for me.
The documentation:
Returns the value of a variable, accessed in program order...
Well in my understanding if I have:
int xx = x; // read x
int yy = y; // read y
These reads can be re-ordered. On the other had if I have:
// simplified code, does not compile, but reads happen on the same "this" for example
int xx = VarHandle_X.getOpaque(x);
int yy = VarHandle_Y.getOpaque(y);
This time re-orderings are not possible? And this is what it means "program order"? Are we talking about insertions of barriers here for this re-ordering to be prohibited? If so, since these are two loads, would the same be achieved? via:
int xx = x;
VarHandle.loadLoadFence()
int yy = y;
But it gets a lot trickier:
... but with no assurance of memory ordering effects with respect to other threads.
I could not come up with an example to even pretend I understand this part.
It seems to me that this documentation is targeted at people who know exactly what they are doing (and I am definitely not one)... So can someone shed some light here?
Well in my understanding if I have:
int xx = x; // read x
int yy = y; // read y
These reads can be re-ordered.
These reads may not only happen to be reordered, they may not happen at all. The thread may use an old, previously read value for x and/or y or values it did previously write to these variables whereas, in fact, the write may not have been performed yet, so the “reading thread” may use values, no other thread may know of and are not in the heap memory at that time (and probably never will).
On the other had if I have:
// simplified code, does not compile, but reads happen on the same "this" for example
int xx = VarHandle_X.getOpaque(x);
int yy = VarHandle_Y.getOpaque(y);
This time re-orderings are not possible? And this is what it means "program order"?
Simply said, the main feature of opaque reads and writes, is, that they will actually happen. This implies that they can not be reordered in respect to other memory access of at least the same strength, but that has no impact for ordinary reads and writes.
The term program order is defined by the JLS:
… the program order of t is a total order that reflects the order in which these actions would be performed according to the intra-thread semantics of t.
That’s the evaluation order specified for expressions and statements. The order in which we perceive the effects, as long as only a single thread is involved.
Are we talking about insertions of barriers here for this re-ordering to be prohibited?
No, there is no barrier involved, which might be the intention behind the phrase “…but with no assurance of memory ordering effects with respect to other threads”.
Perhaps, we could say that opaque access works a bit like volatile was before Java 5, enforcing read access to see the most recent heap memory value (which makes only sense if the writing end also uses opaque or an even stronger mode), but with no effect on other reads or writes.
So what can you do with it?
A typical use case would be a cancellation or interruption flag that is not supposed to establish a happens-before relationship. Often, the stopped background task has no interest in perceiving actions made by the stopping task prior to signalling, but will just end its own activity. So writing and reading the flag with opaque mode would be sufficient to ensure that the signal is eventually noticed (unlike the normal access mode), but without any additional negative impact on the performance.
Likewise, a background task could write progress updates, like a percentage number, which the reporting (UI) thread is supposed to notice timely, while no happens-before relationship is required before the publication of the final result.
It’s also useful if you just want atomic access for long and double, without any other impact.
Since truly immutable objects using final fields are immune to data races, you can use opaque modes for timely publishing immutable objects, without the broader effect of release/acquire mode publishing.
A special case would be periodically checking a status for an expected value update and once available, querying the value with a stronger mode (or executing the matching fence instruction explicitly). In principle, a happens-before relationship can only be established between the write and its subsequent read anyway, but since optimizers usually don’t have the horizon to identify such a inter-thread use case, performance critical code can use opaque access to optimize such scenario.
The opaque means that the thread executing opaque operation is guaranteed to observe its own actions in program order, but that's it.
Other threads are free to observe the threads actions in any order. On x86 it is a common case since it has
write ordered with store-buffer forwarding
memory model so even if the thread does store before load. The store can be cached in the store buffer and some thread being executed on any other core observes the thread action in reverse order load-store instead of store-load. So opaque operation is done on x86 for free (on x86 we actually also have acquire for free, see this extremely exhaustive answer for details on some other architectures and their memory models: https://stackoverflow.com/a/55741922/8990329)
Why is it useful? Well, I could speculate that if some thread observed a value stored with opaque memory semantic then subsequent read will observe "at least this or later" value (plain memory access does not provide such guarantees, does it?).
Also since Java 9 VarHandles are somewhat related to acquire/release/consume semantic in C I think it is worth noting that opaque access is similar to memory_order_relaxed which is defined in the Standard as follows:
For memory_order_relaxed, no operation orders memory.
with some examples provided.
I have been struggling with opaque myself and the documentation is certainly not easy to understand.
From the above link:
Opaque operations are bitwise atomic and coherently ordered.
The bitwise atomic part is obvious. Coherently ordered means that loads/stores to a single address have some total order, each reach sees the most recent address before it and the order is consistent with the program order. For some coherence examples, see the following JCStress test.
Coherence doesn't provide any ordering guarantees between loads/stores to different addresses so it doesn't need to provide any fences so that loads/stores to different addresses are ordered.
With opaque, the compiler will emit the loads/stores as it sees them. But the underlying hardware is still allowed to reorder load/stores to different addresses.
I upgraded your example to the message-passing litmus test:
thread1:
X.setOpaque(1);
Y.setOpaque(1);
thread2:
ry = Y.getOpaque();
rx = X.getOpaque();
if (ry == 1 && rx == 0) println("Oh shit");
The above could fail on a platform that would allow for the 2 stores to be reordered or the 2 loads (again ARM or PowerPC). Opaque is not required to provide causality. JCStress has a good example for that as well.
Also, the following IRIW example can fail:
thread1:
X.setOpaque(1);
thread2:
Y.setOpaque(1);
thread3:
rx_thread3 = X.getOpaque();
[LoadLoad]
ry_thread3 = Y.getOpaque();
thread4:
ry_thread4 = Y.getOpaque();
[LoadLoad]
rx_thread4 = X.getOpaque();
Can it be that we end up with rx_thread3=1,ry_thread3=0,ry_thread4=1 and rx_thread4 is 0?
With opaque this can happen. Even though the loads are prevented from being reordered, opaque accesses do not require multi-copy-atomicity (stores to different addresses issued by different CPUs can be seen in different orders).
Release/acquire is stronger than opaque, since with release/acquire it is allowed to fail, therefor with opaque, it is allowed to fail. So Opaque is not required to provide consensus.

Is the expression "a==1 ? 1 : 0" with comparison plus ternary operator expression atomic?

Quick question? Is this line atomic in C++ and Java?
class foo {
bool test() {
// Is this line atomic?
return a==1 ? 1 : 0;
}
int a;
}
If there are multiple thread accessing that line, we could end up with doing the check
a==1 first, then a is updated, then return, right?
Added: I didn't complete the class and of course, there are other parts which update a...
No, for both C++ and Java.
In Java, you need to make your method synchronized and protect other uses of a in the same way. Make sure you're synchronizing on the same object in all cases.
In C++, you need to use std::mutex to protect a, probably using std::lock_guard to make sure you properly unlock the mutex at the end of your function.
return a==1 ? 1 : 0;
is a simple way of writing
if(a == 1)
return 1;
else
return 0;
I don't see any code for updating a. But I think you could figure it out.
Regardless of whether there is a write, reading the value of a non-atomic type in C++ is not an atomic operation. If there are no writes then you might not care whether it's atomic; if some other thread might be modifying the value then you certainly do care.
The correct way of putting it is simply: No! (both for Java and C++)
A less correct, but more practical answer is: Technically this is not atomic, but on most mainstream architectures, it is at least for C++.
Nothing is being modified in the code you posted, the variable is only tested. The code will thus usually result in a single TEST (or similar) instruction accessing that memory location, and that is, incidentially, atomic. The instruction will read a cache line, and there will be one well-defined value in the respective loaction, whatever it may be.
However, this is incidential/accidential, not something you can rely on.
It will usually even work -- again, incidentially/accidentially -- when a single other thread writes to the value. For this, the CPU fetches a cache line, overwrites the location for the respective address within the cache line, and writes back the entire cache line to RAM. When you test the variable, you fetch a cache line which contains either the old or the new value (nothing in between). No happens-before guarantees of any kind, but you can still consider this "atomic".
It is much more complicated when several threads modify that variable concurrently (not part of the question). For this to work properly, you need to use something from C++11 <atomic>, or use an atomic intrinsic, or something similar. Otherwise it is very much unclear what happens, and what the result of an operation may be -- one thread might read the value, increment it and write it back, but another one might read the original value before the modified value is written back.
This is more or less guaranteed to end badly, on all current platforms.
No, it is not atomic (in general) although it can be in some architectures (in C++, for example, in intel if the integer is aligned which it will be unless you force it not to be).
Consider these three threads:
// thread one: // thread two: //thread three
while (true) while (true) while (a) ;
a = 0xFFFF0000; a = 0x0000FFFF;
If the write to a is not atomic (for example, intel if a is unaligned, and for the sake of discussion with 16bits in each one of two consecutive cache lines). Now while it seems that the third thread cannot ever come out of the loop (the two possible values of a are both non-zero), the fact is that the assignments are not atomic, thread two could update the higher 16bits to be 0, and thread three could read the lower 16bits to be 0 before thread two gets the time to complete the update, and come out of the loop.
The whole conditional is irrelevant to the question, since the returned value is local to the thread.
No, it still a test followed by a set and then a return.
Yes, multithreadedness will be a problem.
It's just syntactic sugar.
Your question can be rephrased as: is statement:
a == 1
atomic or not? No it is not atomic, you should use std::atomic for a or check that condition under lock of some sort. If whole ternary operator atomic or not does not matter in this context as it does not change anything. If you mean in your question if in this code:
bool flag = somefoo.test();
flag to be consistent to a == 1, it would definitely not, and it irrelevant if whole ternary operator in your question is atomic.
There a lot of good answers here, but none of them mention the need in Java to mark a as volatile.
This is especially important if no other synchronization method is employed, but other threads could updating a. Otherwise, you could be reading an old value of a.
Consider the following code:
bool done = false;
void Thread1() {
while (!done) {
do_something_useful_in_a_loop_1();
}
do_thread1_cleanup();
}
void Thread2() {
do_something_useful_2();
done = true;
do_thread2_cleanup();
}
The synchronization between these two threads is done using a boolean variable done. This is a wrong way to synchronize two threads.
On x86, the biggest issue is the compile-time optimizations.
Part of the code of do_something_useful_2() can be moved below "done = true" by the compiler.
Part of the code of do_thread2_cleanup() can be moved above "done = true" by the compiler.
If do_something_useful_in_a_loop_1() doesn't modify "done", the compiler may re-write Thread1 in the following way:
if (!done) {
while(true) {
do_something_useful_in_a_loop_1();
}
}
do_thread1_cleanup();
so Thread1 will never exit.
On architectures other than x86, the cache effects or out-of-order instruction execution may lead to other subtle problems.
Most race detector will detect such race.
Also, most dynamic race detectors will report data races on the memory accesses that were intended to be synchronized with this bool
(i.e. between do_something_useful_2() and do_thread1_cleanup())
To fix such race you need to use compiler and/or memory barriers (if you are not an expert -- simply use locks).

Writing a thread safe modular counter in Java

Full disclaimer: this is not really a homework, but I tagged it as such because it is mostly a self-learning exercise rather than actually "for work".
Let's say I want to write a simple thread safe modular counter in Java. That is, if the modulo M is 3, then the counter should cycle through 0, 1, 2, 0, 1, 2, … ad infinitum.
Here's one attempt:
import java.util.concurrent.atomic.AtomicInteger;
public class AtomicModularCounter {
private final AtomicInteger tick = new AtomicInteger();
private final int M;
public AtomicModularCounter(int M) {
this.M = M;
}
public int next() {
return modulo(tick.getAndIncrement(), M);
}
private final static int modulo(int v, int M) {
return ((v % M) + M) % M;
}
}
My analysis (which may be faulty) of this code is that since it uses AtomicInteger, it's quite thread safe even without any explicit synchronized method/block.
Unfortunately the "algorithm" itself doesn't quite "work", because when tick wraps around Integer.MAX_VALUE, next() may return the wrong value depending on the modulo M. That is:
System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE); // true
System.out.println(modulo(Integer.MAX_VALUE, 3)); // 1
System.out.println(modulo(Integer.MIN_VALUE, 3)); // 1
That is, two calls to next() will return 1, 1 when the modulo is 3 and tick wraps around.
There may also be an issue with next() getting out-of-order values, e.g.:
Thread1 calls next()
Thread2 calls next()
Thread2 completes tick.getAndIncrement(), returns x
Thread1 completes tick.getAndIncrement(), returns y = x+1 (mod M)
Here, barring the forementioned wrapping problem, x and y are indeed the two correct values to return for these two next() calls, but depending on how the counter behavior is specified, it can be argued that they're out of order. That is, we now have (Thread1, y) and (Thread2, x), but maybe it should really be specified that (Thread1, x) and (Thread2, y) is the "proper" behavior.
So by some definition of the words, AtomicModularCounter is thread-safe, but not actually atomic.
So the questions are:
Is my analysis correct? If not, then please point out any errors.
Is my last statement above using the correct terminology? If not, what is the correct statement?
If the problems mentioned above are real, then how would you fix it?
Can you fix it without using synchronized, by harnessing the atomicity of AtomicInteger?
How would you write it such that tick itself is range-controlled by the modulo and never even gets a chance to wraps over Integer.MAX_VALUE?
We can assume M is at least an order smaller than Integer.MAX_VALUE if necessary
Appendix
Here's a List analogy of the out-of-order "problem".
Thread1 calls add(first)
Thread2 calls add(second)
Now, if we have the list updated succesfully with two elements added, but second comes before first, which is at the end, is that "thread safe"?
If that is "thread safe", then what is it not? That is, if we specify that in the above scenario, first should always come before second, what is that concurrency property called? (I called it "atomicity" but I'm not sure if this is the correct terminology).
For what it's worth, what is the Collections.synchronizedList behavior with regards to this out-of-order aspect?
As far as I can see you just need a variation of the getAndIncrement() method
public final int getAndIncrement(int modulo) {
for (;;) {
int current = atomicInteger.get();
int next = (current + 1) % modulo;
if (atomicInteger.compareAndSet(current, next))
return current;
}
}
I would say that aside from the wrapping, it's fine. When two method calls are effectively simultaneous, you can't guarantee which will happen first.
The code is still atomic, because whichever actually happens first, they can't interfere with each other at all.
Basically if you have code which tries to rely on the order of simultaneous calls, you already have a race condition. Even if in the calling code one thread gets to the start of the next() call before the other, you can imagine it coming to the end of its time-slice before it gets into the next() call - allowing the second thread to get in there.
If the next() call had any other side effect - e.g. it printed out "Starting with thread (thread id)" and then returned the next value, then it wouldn't be atomic; you'd have an observable difference in behaviour. As it is, I think you're fine.
One thing to think about regarding wrapping: you can make the counter last an awful lot longer before wrapping if you use an AtomicLong :)
EDIT: I've just thought of a neat way of avoiding the wrapping problem in all realistic scenarios:
Define some large number M * 100000 (or whatever). This should be chosen to be large enough to not be hit too often (as it will reduce performance) but small enough that you can expect the "fixing" loop below to be effective before too many threads have added to the tick to cause it to wrap.
When you fetch the value with getAndIncrement(), check whether it's greater than this number. If it is, go into a "reduction loop" which would look something like this:
long tmp;
while ((tmp = tick.get()) > SAFETY_VALUE))
{
long newValue = tmp - SAFETY_VALUE;
tick.compareAndSet(tmp, newValue);
}
Basically this says, "We need to get the value back into a safe range, by decrementing some multiple of the modulus" (so that it doesn't change the value mod M). It does this in a tight loop, basically working out what the new value should be, but only making a change if nothing else has changed the value in between.
It could cause a problem in pathological conditions where you had an infinite number of threads trying to increment the value, but I think it would realistically be okay.
Concerning the atomicity problem: I don't believe that it's possible for the Counter itself to provide behaviour to guarantee the semantics you're implying.
I think we have a thread doing some work
A - get some stuff (for example receive a message)
B - prepare to call Counter
C - Enter Counter <=== counter code is now in control
D - Increment
E - return from Counter <==== just about to leave counter's control
F - application continues
The mediation you're looking for concerns the "payload" identity ordering established at A.
For example two threads each read a message - one reads X, one reads Y. You want to ensure that X gets the first counter increment, Y gets the second, even though the two threads are running simultaneously, and may be scheduled arbitarily across 1 or more CPUs.
Hence any ordering must be imposed across all the steps A-F, and enforced by some concurrency countrol outside of the Counter. For example:
pre-A - Get a lock on Counter (or other lock)
A - get some stuff (for example receive a message)
B - prepare to call Counter
C - Enter Counter <=== counter code is now in control
D - Increment
E - return from Counter <==== just about to leave counter's control
F - application continues
post- F - release lock
Now we have a guarantee at the expense of some parallelism; the threads are waiting for each other. When strict ordering is a requirement this does tend to limit concurrency; it's a common problem in messaging systems.
Concerning the List question. Thread-safety should be seen in terms of interface guarantees. There is absolute minimum requriement: the List must be resilient in the face of simultaneous access from several threads. For example, we could imagine an unsafe list that could deadlock or leave the list mis-linked so that any iteration would loop for ever. The next requirement is that we should specify behaviour when two threads access at the same time. There's lots of cases, here's a few
a). Two threads attempt to add
b). One thread adds item with key "X", another attempts to delete the item with key "X"
C). One thread is iterating while a second thread is adding
Providing that the implementation has clearly defined behaviour in each case it's thread-safe. The interesting question is what behaviours are convenient.
We can simply synchronise on the list, and hence easily give well-understood behaviour for a and b. However that comes at a cost in terms of parallelism. And I'm arguing that it had no value to do this, as you still need to synchronise at some higher level to get useful semantics. So I would have an interface spec saying "Adds happen in any order".
As for iteration - that's a hard problem, have a look at what the Java collections promise: not a lot!
This article , which discusses Java collections may be interesting.
Atomic (as I understand) refers to the fact that an intermediate state is not observable from outside. atomicInteger.incrementAndGet() is atomic, while return this.intField++; is not, in the sense that in the former, you can not observe a state in which the integer has been incremented, but has not yet being returned.
As for thread-safety, authors of Java Concurrency in Practice provide one definition in their book:
A class is thread-safe if it behaves
correctly when accessed from multiple
threads, regardless of the scheduling
or interleaving of the execution of
those threads by the runtime
environment, and with no additional
synchronization or other coordination
on the part of the calling code.
(My personal opinion follows)
Now, if we have the list
updated succesfully with two elements
added, but second comes before first,
which is at the end, is that "thread
safe"?
If thread1 entered the entry set of the mutex object (In case of Collections.synchronizedList() the list itself) before thread2, it is guaranteed that first is positioned ahead than second in the list after the update. This is because the synchronized keyword uses fair lock. Whoever sits ahead of the queue gets to do stuff first. Fair locks can be quite expensive and you can also have unfair locks in java (through the use of java.util.concurrent utilities). If you'd do that, then there is no such guarantee.
However, the java platform is not a real time computing platform, so you can't predict how long a piece of code requires to run. Which means, if you want first ahead of second, you need to ensure this explicitly in java. It is impossible to ensure this through "controlling the timing" of the call.
Now, what is thread safe or unsafe here? I think this simply depends on what needs to be done. If you just need to avoid the list being corrupted and it doesn't matter if first is first or second is first in the list, for the application to run correctly, then just avoiding the corruption is enough to establish thread-safety. If it doesn't, it is not.
So, I think thread-safety can not be defined in the absence of the particular functionality we are trying to achieve.
The famous String.hashCode() doesn't use any particular "synchronization mechanism" provided in java, but it is still thread safe because one can safely use it in their own app. without worrying about synchronization etc.
Famous String.hashCode() trick:
int hash = 0;
int hashCode(){
int hash = this.hash;
if(hash==0){
hash = this.hash = calcHash();
}
return hash;
}

Categories