Volatile keyword for thread safety - java

I have the below code:
public class Foo {
private volatile Map<String, String> map;
public Foo() {
refresh();
}
public void refresh() {
map = getData();
}
public boolean isPresent(String id) {
return map.containsKey(id);
}
public String getName(String id) {
return map.get(id);
}
private Map<String, String> getData() {
// logic
}
}
Is the above code thread safe or do I need to add synchronized or mutexes in there? If it's not thread safe, please clarify why.
Also, I've read that one should use AtomicReference instead of this, but in the source of the AtomicReference class, I can see that the field used to hold the value is volatile (along with a few convenience methods).
Is there a specific reason to use AtomicReference instead?
I've read multiple answer related to this but the concept of volatile still confuses me. Thanks in advance.

If you're not modifying the contents of map (except inside of refresh() when creating it), then there are no visibility issues in the code.
It's still possible to do isPresent(), refresh(), getName() (if no outside synchronization is used) and end up with isPresent()==true and getName()==null.

A class is "thread safe" if it does the right thing when it is used by multiple threads at the same time. There is no way to tell whether a class is thread safe unless you can say what "the right thing" means, and especially, what "the right thing when used by multiple threads" means.
What is the right thing if thread A calls foo.isPresent("X") and it returns true, and then thread B calls foo.refresh(), and then thread A calls foo.getName("X")?
If you are going to claim "thread safety", then you must be very explicit about what the caller should expect in cases like that.

Volatile is only useful in this scenario to update the value immediately. It doesn't really make the code by itself thread-safe.
But because you've stated in your comment, you only update the reference and because the reference-switch is atomic, your code will be thread-safe.(from the given code)

If I understood your question correctly and your comments - your class Foo holds a Map in which only the reference should be updated e.g. a whole new Map added instead of mutating it. If this is the premise:
It does not make any difference if you declare it as volatile or not. Every read/write operation in Java is atomic itself. You will never see a half transaction on these operations. See the JLS 17.7
17.7. Non-Atomic Treatment of double and long
For the purposes of the Java programming language memory model, a single write to a non-volatile long or double value is treated as two separate writes: one to each 32-bit half. This can result in a situation where a thread sees the first 32 bits of a 64-bit value from one write, and the second 32 bits from another write.
Writes and reads of volatile long and double values are always atomic.
Writes to and reads of references are always atomic, regardless of whether they are implemented as 32-bit or 64-bit values.
Some implementations may find it convenient to divide a single write action on a 64-bit long or double value into two write actions on adjacent 32-bit values. For efficiency's sake, this behavior is implementation-specific; an implementation of the Java Virtual Machine is free to perform writes to long and double values atomically or in two parts.
Implementations of the Java Virtual Machine are encouraged to avoid splitting 64-bit values where possible. Programmers are encouraged to declare shared 64-bit values as volatile or synchronize their programs correctly to avoid possible complications.
EDIT: Although the top statement still stands as it is - for thread safety it's necessary to add volatile to reflect the immediate update on different Threads to reflect the reference update. The behavior of a Thread is to make local copy of it while with volatile it will do a happens-before relationship in other words the Threads will have the same state of the Map.

Related

Proper use of volatile variables and synchronized blocks

I am trying to wrap my head around thread safety in java (or in general). I have this class (which I hope complies with the definition of a POJO) which also needs to be compatible with JPA providers:
public class SomeClass {
private Object timestampLock = new Object();
// are "volatile"s necessary?
private volatile java.sql.Timestamp timestamp;
private volatile String timestampTimeZoneName;
private volatile BigDecimal someValue;
public ZonedDateTime getTimestamp() {
// is synchronisation necessary here? is this the correct usage?
synchronized (timestampLock) {
return ZonedDateTime.ofInstant(timestamp.toInstant(), ZoneId.of(timestampTimeZoneName));
}
}
public void setTimestamp(ZonedDateTime dateTime) {
// is this the correct usage?
synchronized (timestampLock) {
this.timestamp = java.sql.Timestamp.from(dateTime.toInstant());
this.timestampTimeZoneName = dateTime.getZone().getId();
}
}
// is synchronisation required?
public BigDecimal getSomeValue() {
return someValue;
}
// is synchronisation required?
public void setSomeValue(BigDecimal val) {
someValue = val;
}
}
As stated in the commented rows in the code, is it necessary to define timestamp and timestampTimeZoneName as volatile and are the synchronized blocks used as they should be? Or should I use only the synchronized blocks and not define timestamp and timestampTimeZoneName as volatile? A timestampTimeZoneName of a timestamp should not be erroneously matched with another timestamp's.
This link says
Reads and writes are atomic for all variables declared volatile
(including long and double variables)
Should I understand that accesses to someValue in this code through the setter/getter are thread safe thanks to volatile definitions? If so, is there a better (I do not know what "better" might mean here) way to accomplish this?
To determine if you need synchronized, try to imagine a place where you can have a context switch that would break your code.
In this case, if the context switch happens where I put the comment, then in getTimestamp() you're going to be reading different values from each timestamp type.
Also, although assignments are atomic, this expression java.sql.Timestamp.from(dateTime.toInstant()); certainly isn't, so you can get a context switch inbetween dateTime.toInstant() and the call to from. In short you definitely need the synchronized blocks.
synchronized (timestampLock) {
this.timestamp = java.sql.Timestamp.from(dateTime.toInstant());
//CONTEXT SWITCH HERE
this.timestampTimeZoneName = dateTime.getZone().getId();
}
synchronized (timestampLock) {
return ZonedDateTime.ofInstant(timestamp.toInstant(), ZoneId.of(timestampTimeZoneName));
}
In terms of volatile, I'm pretty sure they're required. You have to guarantee that each thread definitely is getting the most updated version of a variable.
This is the contract of volatile. And although it may be covered by the synchronized block, and volatile not actually necessary here, it's good to write anyway. If the synchronized block does the job of volatile already, the VM won't do the guarantee twice. This means volatile won't cost you any more, and it's a very good flashing light that says to the programmer: "I'M USED IN MULTIPLE THREADS".
For someValue: If there's no synchronized block here, then volatile is definitely necessary. If you call a set in one thread, the other thread has no queue that tells it that may have been updated outside of this thread. So it may use an old and cached value. The JIT can do a lot of funny optimizations if it assumes single thread. Ones that can simply break your program.
Now I'm not entirely certain if synchronized is required here. My guess is no. I would add it anyway to be safe though. Or you can let java worry about the synchronization and use http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/AtomicInteger.html
Nothing new here, this is just a more explicit version of something #Cruncher already said:
You need synchronized whenever it is important for two or more fields in your program to be consistent with one another. Suppose you have two parallel lists, and your code depends on them both being the same length. That's called an invariant as in, the two lists are invariably the same length.
How can you write a method, append(x,y), that adds a new pair of values to the lists without temporarily breaking the invariant? You can't. The method must add one item to the first list, breaking the invariant, and then add the other item to the second list, fixing it again. There's no other way.
In a single-threaded program, that temporary broken state is no problem because no other method can possibly use the lists while append(x,y) is running. That's no longer true in a multithreaded program. In the worst case, append(x,y) could add x to the x list, and then the scheduler could suspend the thread at that exact moment to allow other threads to run. The CPUs could execute millions of instructions before append(x,y) gets to finish the job and make the lists right again. During all of that time, other threads would see the broken invariant, and possibly corrupt your data or crash the program as a result.
The fix is for append(x,y) to be synchronized on some object, and (this is the important part), for every other method that uses the lists to be synchronized on the same object. Since only one thread can be synchronized on a given object at a given time, it will not be possible for any other thread to see the lists in an inconsistent state.
So, if thread A calls append(x,y), and thread B tries to look at the lists "at the same time", will thread B see the what the lists looked like before or after thread A did its work? That's called a data race. And with only the synchronization that I have described so far, there's no way to know which thread will win. All we've done so far is to guarantee one particular invariant.
If it matters which thread wins the race, then that means that there is some higher-level invariant that also needs protection. You will have to add more synchronization to protect that one too. "Thread safety" -- two little words to name a subject that is both broad and deep.
Good Luck, and Have Fun!
// is synchronisation required?
public BigDecimal getSomeValue() {
return someValue;
}
// is synchronisation required?
public void setSomeValue(BigDecimal val) {
someValue = val;
}
I think Yes you are require to put the synchronization block because consider an example in which one thread is setting the value and at the same time other thread is trying to read from getter method, like here in the example you will see the syncronization block.So, if you take your variable inside the method then you must require the synchronization block.

Need of synchronization in getters and setters

Probably a very silly question. Just wanted to confirm my understanding.
class Test
{
private volatile String id;
public void setID(String id)
{
this.id = id;
}
public String getID()
{
return id;
}
}
Lets say that an object of above class can be accessed by multiple threads. My understanding is that in case of simple getter and setters like above (doing simple initialization), I do not need to make these methods synchronized, correct ?
I guess volatile is needed as otherwise value of id can be outdated otherwise in different threads.
So, can anyone see any problem if we do not have these methods as synchronized ?
My understanding is that in case of simple getter and setters like above (doing simple initialization), I do not need to make these methods synchronized, correct ?
Correct, because what they get and set (an object reference) is treated atomically by the JVM.
The answer would be "No, you do need synchronization" if you were using a long or double and you didn't have it marked volatile.
Both aspects of this are covered in the JLS, §17.7:
17.7. Non-atomic Treatment of double and long
For the purposes of the Java programming language memory model, a single write to a non-volatile long or double value is treated as two separate writes: one to each 32-bit half. This can result in a situation where a thread sees the first 32 bits of a 64-bit value from one write, and the second 32 bits from another write.
Writes and reads of volatile long and double values are always atomic.
Writes to and reads of references are always atomic, regardless of whether they are implemented as 32-bit or 64-bit values.
If you are in a multi-threaded environment. several thread can access your data. Reading value (get) is fine.But think about the write(set), then your data will become inconsistence. So you have to Synchronized.
You do not need to make any of those functions synchronized nor use the volatile keyword, setting references is always atomic. There are however other problems that arise from non-synchronization/non-volatiling.
First: what Thread A reads via getID might not be what Thread B wrote via setID, because either Thread A was too early or...
Second: Thread A was on time but because of the lack of volatile, it was reading a cached thread variable instead of the real value.
While the first one can only be solved via external thread synchronization or by the architecture of your code, the second one can cause issues based on the happens-before problematic. Take the following example:
Thread A:
myId.setId(3);
idSet = true;
Thread B:
if (idSet) {
accessData(myId.getId());
}
This looks correct - and it kind of is - but what can happen during the JVM optimization step is that first idSet = true is executed and then myId.setId(3). So in worst case, Thread B succeeds on the if clause, but then reads a wrong value. Marking id as volatile will solve that problem, as it guarantees that whenever id is modified, everything that happened before has actually happened.
Another way to solve that is to use immutable classes, so no setter and id is final and set via constructor.
My understanding is that in case of simple getter and setters like above
(doing simple initialization), I do not need to make these methods
synchronized, correct ?
Making a variable volatile simply ensures there will not be a race condition. All threads will read the same value of variable because the variable being volatile is stored directly in main memory and not in Threads local cache.
However it is possible one thread enters public String getID() function. At this point other thread may very well change the value of the variable by executing public void setID(String id) method. 1st thread will see this modified value.
So do not get confused between using atomic variables and synchronizing functions.

Should getters and setters be synchronized?

private double value;
public synchronized void setValue(double value) {
this.value = value;
}
public double getValue() {
return this.value;
}
In the above example is there any point in making the getter synchronized?
I think its best to cite Java Concurrency in Practice here:
It is a common mistake to assume that synchronization needs to be used only when writing to shared variables; this is simply not true.
For each mutable state variable that may be accessed by more than one
thread, all accesses to that variable must be performed with the same
lock held. In this case, we say that the variable is guarded by that
lock.
In the absence of synchronization, the compiler, processor, and runtime can do some downright weird things to the order in which operations appear to execute. Attempts to reason about the order in which memory actions "must" happen in insufflciently synchronized multithreaded programs will almost certainly be incorrect.
Normally, you don't have to be so careful with primitives, so if this would be an int or a boolean it might be that:
When a thread reads a variable without synchronization, it may see a
stale value, but at least it sees a value that was actually placed
there by some thread rather than some random value.
This, however, is not true for 64-bit operations, for instance on long or double if they are not declared volatile:
The Java Memory Model requires fetch and
store operations to be atomic, but for nonvolatile long and double
variables, the JVM is permitted to treat a 64-bit read or write as two
separate 32-bit operations. If the reads and writes occur in different
threads, it is therefore possible to read a nonvolatile long and get
back the high 32 bits of one value and the low 32 bits of another.
Thus, even if you don't care about stale values, it is not safe to use
shared mutable long and double variables in multithreaded programs
unless they are declared volatile or guarded by a lock.
Let me show you by example what is a legal way for a JIT to compile your code. You write:
while (myBean.getValue() > 1.0) {
// perform some action
Thread.sleep(1);
}
JIT compiles:
if (myBean.getValue() > 1.0)
while (true) {
// perform some action
Thread.sleep(1);
}
In just slightly different scenarios even the Java compiler could prouduce similar bytecode (it would only have to eliminate the possibility of dynamic dispatch to a different getValue). This is a textbook example of hoisting.
Why is this legal? The compiler has the right to assume that the result of myBean.getValue() can never change while executing above code. Without synchronized it is allowed to ignore any actions by other threads.
The reason here is to guard against any other thread updating the value when a thread is reading and thus avoid performing any action on stale value.
Here get method will acquire intrinsic lock on "this" and thus any other thread which might attempt to set/update using setter method will have to wait to acquire lock on "this" to enter the setter method which is already acquired by thread performing get.
This is why its recommended to follow the practice of using same lock when performing any operation on a mutable state.
Making the field volatile will work here as there are no compound statements.
It is important to note that synchronized methods use intrinsic lock which is "this". So get and set both being synchronized means any thread entering the method will have to acquire lock on this.
When performing non atomic 64 bit operations special consideration should be taken. Excerpts from Java Concurrency In Practice could be of help here to understand the situation -
"The Java Memory Model requires fetch and store operations to be atomic, but for non-volatile long and double variables, the JVM is permitted to treat a 64 bit read or write as two separate 32
bit operations. If the reads and writes occur in different threads, it is therefore possible to read a non-volatile long and get back the high 32 bits of one value and the low 32 bits of another. Thus, even if you don't care about stale values, it
is not safe to use shared mutable long and double variables in multi-threaded programs unless they are declared
volatile or guarded by a lock."
Maybe for someone this code looks awful, but it works very well.
private Double value;
public void setValue(Double value){
updateValue(value, true);
}
public Double getValue(){
return updateValue(value, false);
}
private double updateValue(Double value,boolean set){
synchronized(MyClass.class){
if(set)
this.value = value;
return value;
}
}

In Java, is it safe to change a reference to a HashMap read concurrently

I hope this isn't too silly a question...
I have code similar to the following in my project:
public class ConfigStore {
public static class Config {
public final String setting1;
public final String setting2;
public final String setting3;
public Config(String setting1, String setting2, String setting3) {
this.setting1 = setting1;
this.setting2 = setting2;
this.setting3 = setting3;
}
}
private volatile HashMap<String, Config> store = new HashMap<String, Config>();
public void swapConfigs(HashMap<String, Config> newConfigs) {
this.store = newConfigs;
}
public Config getConfig(String name) {
return this.store.get(name);
}
}
As requests are processed, each thread will request a config to use from the store using the getConfig() function. However, periodically (every few days most likely), the configs are updated and swapped out using the swapConfigs() function. The code that calls swapConfigs() does not keep a reference to the Map it passes in as it is simply the result of parsing a configuration file.
In this case, is the volatile keyword still needed on the store instance variable?
Will the volatile keyword introduce any potential performance bottlenecks that I should be aware of or can avoid given that the rate of reads greatly exceeds the rate of writes?
Thanks very much,
Since changing references is an atomic operation, you won't end up with one thread modifying the reference, and the other seeing a garbage reference, even if you drop volatile. However, the new map may not get instantly visible for some threads, which may consequently keep reading configuration from the old map for an indefinite time (or forever). So keep volatile.
Update
As #BeeOnRope pointed out in a comment below, there is an even stronger reason to use volatile:
"non-volatile writes [...] don't establish a happens-before relationship between the write and subsequent reads that see the written value. This means that a thread can see a new map published through the instance variable, but this new map hasn't been fully constructed yet. This is not intuitive, but it's a consequence of the memory model, and it happens in the real word. For an object to be safely published, it must be written to a volatile, or use a handful of other techniques.
Since you change the value very rarely, I don't think volatile would cause any noticeable performance difference. But at any rate, correct behaviour trumps performance.
No, this is not thread safe without volatile, even apart from the issues of seeing stale values. Even though there are no writes to the map itself, and reference assignment is atomic, the new Map<> has not been safely published.
For an object to be safely published, it must be communicated to other threads using some mechanism that either establishes a happens-before relationship between the object construction, the reference publication and the reference read, or it must use a handful of narrower methods which are guaranteed to be safe for publishing:
Initializing an object reference from a static initializer.
Storing a reference to it into a final field.
Neither of those two publication specific ways applies to you, so you'll need volatile to establish happens-before.
Here is a longer version of this reasoning, including links to the JLS and some examples of real-world things that can happen if you don't publish safely.
More details on safe publication can be found in JCIP (highly recommended), or here.
Your code is fine. You need volatile, otherwise your code would be 100% thread-safe (updating a reference is atomic), however the change might not be visible to all the threads. It means some threads will still see the old value of store.
That being said volatile is obligatory in your example. You might consider AtomicReference, but it won't give you anything more in your case.
You cannot trade correctness for performance so your second question is not really valid. It will have some performance impact, but probably only during update, which happens very rarely as you said. Basically JVM will ensure the change is visible to all the threads by "flushing" it, but after that it will be accessible as any other local variable (up until next update).
BTW I like Config class being immutable, please also consider immutable Map implementation just in case.
Would it work for you to use a ConcurrentHashMap and instead of swapping the entire config update the affected values in the hash map?

java threads synchronization

In the class below, is the method getIt() thread safe and why?
public class X {
private long myVar;
public void setIt(long var){
myVar = var;
}
public long getIt() {
return myVar;
}
}
It is not thread-safe. Variables of type long and double in Java are treated as two separate 32-bit variables. One thread could be writing and have written half the value when another thread reads both halves. In this situation, the reader would see a value that was never supposed to exist.
To make this thread-safe you can either declare myVar as volatile (Java 1.5 or later) or make both setIt and getIt synchronized.
Note that even if myVar was a 32-bit int you could still run into threading issues where one thread could be reading an out of date value that another thread has changed. This could occur because the value has been cached by the CPU. To resolve this, you again need to declare myVar as volatile (Java 1.5 or later) or make both setIt and getIt synchronized.
It's also worth noting that if you are using the result of getIt in a subsequent setIt call, e.g. x.setIt(x.getIt() * 2), then you probably want to synchronize across both calls:
synchronized(x)
{
x.setIt(x.getIt() * 2);
}
Without the extra synchronization, another thread could change the value in between the getIt and setIt calls causing the other thread's value to be lost.
This is not thread-safe. Even if your platform guarantees atomic writes of long, the lack of synchronized makes it possible that one thread calls setIt() and even after this call has finished it is possible that another thread can call getIt() and this call could return the old value of myVar.
The synchronized keyword does more than an exclusive access of one thread to a block or a method. It also guarantees that the second thread is informed about a change of a variable.
So you either have to mark both methods as synchronized or mark the member myVar as volatile.
There's a very good explanation about synchronization here:
Atomic actions cannot be interleaved, so they can be used without fear of thread interference. However, this does not eliminate all need to synchronize atomic actions, because memory consistency errors are still possible. Using volatile variables reduces the risk of memory consistency errors, because any write to a volatile variable establishes a happens-before relationship with subsequent reads of that same variable. This means that changes to a volatile variable are always visible to other threads. What's more, it also means that when a thread reads a volatile variable, it sees not just the latest change to the volatile, but also the side effects of the code that led up the change.
No, it's not. At least, not on platforms that lack atomic 64-bit memory accesses.
Suppose that Thread A calls setIt, copies 32 bits into memory where the backing value is, and is then pre-empted before it can copy the other 32 bits.
Then Thread B calls getIt.
No it is not, because longs are not atomic in java, so one thread could have written 32 bits of the long in the setIt method, and then the getIt could read the value, and then setIt could set the other 32 bits.
So the end result is that getIt returns a value that was never valid.
It ought to be, and generally is, but is not guaranteed to be thread safe. There could be issues with different cores having different versions in CPU cache, or the store/retrieve not being atomic for all architectures. Use the AtomicLong class.
The getter is not thread safe because it’s not guarded by any mechanism that guarantees the most up-to-date visibility. Your choices are:
making myVar final (but then you can’t mutate it)
making myVar volatile
use synchronized to accessing myVar
AFAIK, Modern JVMs no longer split long and double operations. I don't know of any reference which states this is still a problem. For example, see AtomicLong which doesn't use synchronization in Sun's JVM.
Assuming you want to be sure it is not a problem then you can use synchronize both get() and set(). However, if you are performing an operation like add, i.e. set(get()+1) then this synchronization doesn't buy you much, you still have to synchronize the object for the whole operation. (A better way around this is to use a single operation for add(n) which is synchronized)
However, a better solution is to use an AtomicLong. This supports atomic operations like get, set and add and DOESN'T use synchronization.
Since it is a read only method. You should synchronize the set method.
EDIT : I see why the get method needs to be synchronized as well. Good job explaining Phil Ross.

Categories