I am reading "Java Concurrency in Practice" by Brian Goetz, and have a question about immutable object publication.
In section 3.5.5 it states:
Immutable objects can be published through any mechanism.
Effectively immutable objects must be safely published;
As an example for my question:
// assume Holder is immutable
public class Test {
public static Holder holder = null;
}
Suppose a thread executes the statement:
Test.holder = new Holder(42);
Does this change (i.e. both the reference and the immutable Holder object together) become visible to other threads?
If I'm understanding the textbook correctly, the semantics would seem to be similar to those of volatile variables, in the sense that the update to the Test.holder member specifically is visible to other threads immediately. Is that right?
The modification made to the reference variable Test.holder is not guaranteed to be seen by other threads immediately. To ensure this, you have to declare it as volatile. Then a write to Test.holder is visible to every thread that subsequently reads it.
What is meant in the text is that if you initialized the Test.holder with a new Holder(42) instead of null and never changed it, then all threads would see that Holder(42) object.
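A minimal sketch of the two cases, in case it helps. The names Published, CONSTANT, latest and PublicationDemo are made up for illustration, and Holder is assumed to have a single final int field; none of this comes from the book.

```java
// Immutable: the only field is final and there are no setters.
final class Holder {
    private final int value;
    Holder(int value) { this.value = value; }
    int value() { return value; }
}

final class Published {
    // Case 1: assigned once in a static initializer and never changed.
    // Class initialization makes this object visible to every thread.
    static final Holder CONSTANT = new Holder(42);

    // Case 2: reassigned later, so volatile is needed for readers
    // to be guaranteed to see the new reference.
    static volatile Holder latest = null;
}

final class PublicationDemo {
    // Starts a writer thread and spins until the new reference is observed.
    static int readPublishedValue() {
        Published.latest = null;
        Thread writer = new Thread(() -> Published.latest = new Holder(42));
        writer.start();
        Holder seen;
        // Because latest is volatile, this loop is guaranteed to terminate.
        while ((seen = Published.latest) == null) {
            Thread.yield();
        }
        return seen.value();
    }
}
```

If latest were a plain field, the reader would still never see a half-built Holder (its field is final), but it would have no guarantee of ever seeing the new reference at all.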
Related
Coming from C/C++, I am a little confused about volatile object behavior in Java.
I understand that volatile in Java has two properties:
Won't bring object into cache, always keep it in main memory.
Guarantee "happen-before"
However I am not sure what happens if I make a new non-volatile reference to object. For example,
class Example {
private volatile Book b = null;
public void init() { b = new Book(...); }
public void use() {
Book local = b;
local.read();
}
}
AFAIK, volatile means the "book object" that b refers to should be in main memory. The compiler probably implements a reference as a pointer internally, so the b pointer probably sits in cache. And to my understanding, volatile is a qualifier for the object, not for the reference/pointer.
The question is: in the use method, the local reference is not volatile. Would this "local" reference bring the underlying Book object from main memory into cache, essentially making the object not "volatile"?
There is no such thing as a “volatile object” nor “always keep it in main memory” guarantees.
All that a volatile variable of a reference type guarantees is that there will be a happens-before relationship between a write to that variable and a subsequent read of the same variable.
Since happens-before relationships are transitive, they work for your example code. For b = new Book(...), all modifications made to the Book instance are committed before the reference is written to b. Hence, for Book local = b; local.read();, the read() is guaranteed to see all the modifications the other thread made before writing the reference.
This does not imply that the Book instance’s memory was special. E.g. modifications made to the instance after the reference has been written to b may or may not be visible to other threads and other threads may perceive only some of them or see them as if being made in a different order.
So it doesn’t matter which way you get the reference to the object, all that matters is whether the changes are made before or after publishing a reference to that object through b. Likewise, it doesn’t matter how you perform the read access to the object, as long as you do it after having acquired the reference by reading b.
With local.read(); you are accessing the object via the local variable local, and within read() the same reference will be accessed through this, but all that matters is that you have acquired the reference by reading b before reading the object’s state.
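Here is a small sketch of that reasoning. The Book class with a plain title field, the BookExample wrapper and the VolatileBookDemo driver are hypothetical stand-ins, since the question does not show Book's contents.

```java
// Deliberately mutable and unsynchronized (a hypothetical Book).
class Book {
    String title;                 // plain, non-volatile field
    String read() { return title; }
}

class BookExample {
    private volatile Book b = null;

    void init() {
        Book fresh = new Book();
        fresh.title = "JCiP";     // ordered before the volatile write below
        b = fresh;                // volatile write publishes the reference
        // A write to fresh.title placed here would not be covered by the guarantee.
    }

    String use() {
        Book local = b;           // volatile read; local is an ordinary variable
        return local == null ? null : local.read();
    }
}

class VolatileBookDemo {
    // Runs init() on another thread and spins until use() sees the book.
    static String run() {
        BookExample example = new BookExample();
        new Thread(example::init).start();
        String title;
        while ((title = example.use()) == null) {
            Thread.yield();
        }
        return title;
    }
}
```

Once use() reads a non-null reference from b, it must also see the title that was set before the reference was written, even though title itself is not volatile.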
volatile is about the reference, not the object.
It guarantees that any thread reading the variable b after another thread has set the value of b will get the value assigned to b by the other thread, and not some cached value.
As I understand it, volatile helps in memory visibility and synchronized helps in achieving execution control. Volatile just guarantees that the value read by the thread would have the latest value written to it.
Consider the following:
public class Singleton{
private static volatile Singleton INSTANCE = null;
private Singleton(){}
public static Singleton getInstance(){
if(INSTANCE==null){
synchronized(Integer.class){
if(INSTANCE==null){
INSTANCE = new Singleton();
}
}
}
return INSTANCE;
}
}
In the above piece of code, we use double-checked locking. This helps us create only one instance of Singleton and this is communicated to the other threads by the creating thread as soon as possible. This is what the keyword volatile does. We need the above synchronized block because the delay in the thread reading the INSTANCE variable as null and initializing the object could cause a race condition.
Now consider the following:
public class Singleton{
private static Singleton INSTANCE = null;
private Singleton(){}
public static synchronized Singleton getInstance(){
if(INSTANCE==null){
INSTANCE = new Singleton();
}
return INSTANCE;
}
}
Say we have 2 threads, t1 and t2, trying to get the Singleton object. Thread t1 enters the getInstance() method first and creates the INSTANCE object. Now this newly created object should be visible to all the other threads. If the INSTANCE variable is not volatile, how do we make sure that the object does not stay in t1's memory and is visible to other threads? How soon is the INSTANCE initialized by t1 visible to other threads?
Does this mean that it is advisable to always make variables volatile with synchronized ?
In what scenarios would we not require the variable to be volatile ?
P.S I have read other questions on StackOverflow but could not find the answer to my question. Please comment before down-voting.
My question arises from the explanation given here
I think what you're missing is this from JLS 17.4.4:
An unlock action on monitor m synchronizes-with all subsequent lock actions on m (where "subsequent" is defined according to the synchronization order).
Which is very similar to the bullet about volatile variables:
A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order).
Then in 17.4.5:
If an action x synchronizes-with a following action y, then we also have hb(x, y).
... where hb is the "happens-before" relation.
Then:
If one action happens-before another, then the first is visible to and ordered before the second.
The memory model is incredibly complicated and I don't claim to be an expert, but my understanding is that the implication of the quoted parts is that the second pattern you've shown is safe without the variable being volatile - and indeed any variable which is only modified and read within synchronization blocks for the same monitor is safe without being volatile. The more interesting aspect (to me) is what happens to the variables within the object that the variable's value refers to. If Singleton isn't immutable, you've still potentially got problems there - but that's one step removed.
To put it more concretely, if two threads call getInstance() when INSTANCE is null, one of those threads will lock the monitor first. The write action of a non-null reference to INSTANCE happens-before the unlock operation, and that unlock operation happens-before the lock operation of the other thread. The lock operation happens-before the read of the INSTANCE variable, therefore the write happens-before the read... at which point, we are guaranteed that the write is visible to the reading thread.
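A rough way to see this in action, assuming a made-up SyncSingleton class and a small driver (SyncSingletonDemo) that are not part of the question. It cannot prove the memory-model argument, but it does exercise the pattern from several threads.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class SyncSingleton {
    private static SyncSingleton instance = null;   // not volatile on purpose
    private SyncSingleton() {}

    // Every read and write of instance happens while holding the
    // SyncSingleton.class monitor, so unlock/lock ordering covers it.
    static synchronized SyncSingleton getInstance() {
        if (instance == null) {
            instance = new SyncSingleton();
        }
        return instance;
    }
}

class SyncSingletonDemo {
    // Calls getInstance() from several threads and counts distinct objects seen.
    static int distinctInstances(int threadCount) {
        Set<SyncSingleton> seen = ConcurrentHashMap.newKeySet();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> seen.add(SyncSingleton.getInstance()));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }
}
```

Calling SyncSingletonDemo.distinctInstances(8) should report a single instance, because no thread ever touches the field outside the monitor.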
This explanation of what is happening here is entirely wrong, as I misunderstood the Java Memory Model. See Jon Skeet's answer.
Safe lazy initialization
The action you are attempting in this case is "lazy-initialization", and that particular pattern is useful for instances, but sub-optimal for static variables. For static variables, the lazy initialization holder class idiom is preferred.
The following quote and code block are copied directly from Item 71 of Effective Java (2nd Edition), by Josh Bloch:
Because there is no locking if the field is already initialized, it
is critical that the field be declared volatile.
// Double-check idiom for lazy initialization of instance fields
private volatile FieldType field;
FieldType getField() {
FieldType result = field;
if (result == null) { // First check (no locking)
synchronized(this) {
result = field;
if (result == null) // Second check (with locking)
field = result = computeFieldValue();
}
}
return result;
}
In one of his talks, he recommended to copy this structure exactly when performing lazy initialization for instance fields, as it is optimal in such situations, and it is very easy to break it by changing it.
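For the static case, the lazy initialization holder class idiom mentioned earlier looks roughly like this. ExpensiveService, Lazy and the constructions counter are illustrative names I made up; the counter exists only so the laziness can be observed.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ExpensiveService {
    // Demo-only counter of constructor calls.
    static final AtomicInteger constructions = new AtomicInteger();

    private ExpensiveService() {
        constructions.incrementAndGet();
    }

    // The nested class is not initialized until getInstance() first touches it.
    // Class initialization is thread-safe, so no explicit locking is needed.
    private static class Lazy {
        static final ExpensiveService INSTANCE = new ExpensiveService();
    }

    static ExpensiveService getInstance() {
        return Lazy.INSTANCE;
    }
}
```

Nothing is constructed until the first call to getInstance(), and every later call returns the same object without taking a lock in user code.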
What is actually happening?
EDIT: This section is incorrect.
The volatile keyword means that all read and write operations for the variable are atomic; that is, they happen as one single step from the perspective of anything else. Additionally, volatile variables are always read from and written to main memory, not processor cache. The combination of these two properties guarantees that, as soon as a volatile variable is modified on one thread, subsequent reads on another thread will read the updated value. This guarantee is not present for non-volatile variables.
The double-check idiom does not guarantee that only one instance is created. Rather, it is so that, once the variable is initialized, future calls to getInstance() do not need to enter a synchronized block, which is expensive.
The guarantee that it is not initialized twice is made by the fact that (a) it is a volatile field, and (b) it is checked (again) inside of the synchronized block. The outer check helps efficiency; the inner check guarantees single initialization.
I highly recommend reading Item 71 of Effective Java (2nd Edition) for a more complete explanation. I also recommend the book in general as being fantastic.
UPDATE:
The local result variable used reduces the number of accesses of the volatile field needed, which improves performance. If the local variable was left out, and all reads and writes directly accessed the volatile field, it should have the same result, but take slightly longer.
Near the bottom of http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html, it says:
Double-Checked Locking Immutable Objects
If Helper is an immutable object, such that all of the fields of Helper are final, then double-checked locking will work without having to use volatile fields. The idea is that a reference to an immutable object (such as a String or an Integer) should behave in much the same way as an int or float; reading and writing references to immutable objects are atomic.
The sample and explanation of mutable one is as follows:
// Broken multithreaded version
// "Double-Checked Locking" idiom
class Foo {
private Helper helper = null;
public Helper getHelper() {
if (helper == null)
synchronized(this) {
if (helper == null)
helper = new Helper();
}
return helper;
}
// other functions and members...
}
The first reason it doesn't work
The most obvious reason it doesn't work is that the writes that initialize the Helper object and the write to the helper field can be done or perceived out of order. Thus, a thread which invokes getHelper() could see a non-null reference to a helper object, but see the default values for fields of the helper object, rather than the values set in the constructor.
If the compiler inlines the call to the constructor, then the writes that initialize the object and the write to the helper field can be freely reordered if the compiler can prove that the constructor cannot throw an exception or perform synchronization.
Even if the compiler does not reorder those writes, on a multiprocessor the processor or the memory system may reorder those writes, as perceived by a thread running on another processor.
My question is: why doesn't an immutable class have this problem? I cannot see how the reordering relates to whether the class is mutable.
Thanks
The reason why the code is "broken" for usual objects is that helper could be non null but point to an object that has not been completely initialised yet as explained in your quote.
However if the Helper class is immutable, meaning that all its fields are final, the Java Memory Model guarantees that they are safely published even if the object is made available through a data race (which is the case in your example):
final fields also allow programmers to implement thread-safe immutable objects without synchronization. A thread-safe immutable object is seen as immutable by all threads, even if a data race is used to pass references to the immutable object between threads. This can provide safety guarantees against misuse of an immutable class by incorrect or malicious code. final fields must be used correctly to provide a guarantee of immutability.
Immutable classes did have the problem. The part that you have quoted is true only since the changes made to the Java Memory Model in JSR-133.
Specifically, the changes that affect immutable objects are related to changes that were made to the semantics of the final keyword. Check out http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#finalRight.
The important part is:
The values for an object's final fields are set in its constructor. Assuming the object is constructed "correctly", once an object is constructed, the values assigned to the final fields in the constructor will be visible to all other threads without synchronization.
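As a sketch of what that guarantee buys you, here is the Foo/Helper example with an immutable Helper. The two fields of Helper and the constructor arguments are invented for illustration, and reading the field once into a local variable is my own precaution rather than something the quoted text requires.

```java
// Immutable: every field is final and this does not escape the constructor.
final class Helper {
    private final int id;
    private final String name;
    Helper(int id, String name) { this.id = id; this.name = name; }
    int id() { return id; }
    String name() { return name; }
}

class Foo {
    private Helper helper = null;   // deliberately not volatile

    Helper getHelper() {
        Helper h = helper;          // single racy read into a local
        if (h == null) {
            synchronized (this) {
                h = helper;         // re-check under the lock
                if (h == null) {
                    helper = h = new Helper(1, "demo");
                }
            }
        }
        return h;                   // return the local, not a second read of the field
    }
}
```

A thread that sees a non-null reference here is guaranteed to see id and name as set in the constructor, because both are final. If either field were non-final, that thread could observe default values, which is exactly the breakage described for the mutable version.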
Consider the following class:
public class MyClass
{
private MyObject obj;
public MyClass()
{
obj = new MyObject();
}
public void methodCalledByOtherThreads()
{
obj.doStuff();
}
}
Since obj was created on one thread and accessed from another, could obj be null when methodCalledByOtherThreads is called? If so, would declaring obj as volatile be the best way to fix this issue? Would declaring obj as final make any difference?
Edit:
For clarity, I think my main question is:
Can other threads see that obj has been initialized by some main thread or could obj be stale (null)?
For the methodCalledByOtherThreads to be called by another thread and cause problems, that thread would have to get a reference to a MyClass object whose obj field is not initialized, ie. where the constructor has not yet returned.
This would be possible if you leaked the this reference from the constructor. For example
public MyClass()
{
SomeClass.leak(this);
obj = new MyObject();
}
If the SomeClass.leak() method starts a separate thread that calls methodCalledByOtherThreads() on the this reference, then you would have problems, but this is true regardless of the volatile.
Since you don't have what I'm describing above, your code is fine.
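To make the leaking case concrete, here is a single-threaded sketch. LeakyClass and the sawNull flag are hypothetical, and leak() inspects the object directly instead of starting a thread so that the effect is deterministic.

```java
class MyObject {
    void doStuff() {}
}

class SomeClass {
    // Records what the leaked reference looked like when it arrived.
    static boolean sawNull = false;

    // Receives `this` while the constructor is still running.
    static void leak(LeakyClass partiallyBuilt) {
        sawNull = (partiallyBuilt.obj == null);
    }
}

class LeakyClass {
    MyObject obj;

    LeakyClass() {
        SomeClass.leak(this);   // `this` escapes before obj is assigned
        obj = new MyObject();
    }
}
```

After new LeakyClass() returns, SomeClass.sawNull is true: whoever received the leaked reference saw obj as null, and no choice of volatile or final on the field changes that.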
It depends on whether the reference is published "unsafely". A reference is "published" by being written to a shared variable; another thread reads the variable to get the reference. If there is no relationship of happens-before(write, read), the publication is called unsafe. An example of unsafe publication is through a non-volatile static field.
@chrylis's interpretation of "unsafe publication" is not accurate. Leaking this before the constructor exits is orthogonal to the concept of unsafe publication.
Through unsafe publication, another thread may observe the object in an uncertain state (hence the name); in your case, field obj may appear to be null to another thread. If obj is final, however, it cannot appear to be null even if the host object is published unsafely.
This is all very technical and requires further reading to understand. The good news is that you don't need to master "unsafe publication", because it is a discouraged practice anyway. The best practice is simply: never publish unsafely; i.e. never create a data race; i.e. always read/write shared data through proper synchronization, by using synchronized, volatile or java.util.concurrent.
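As one example of the java.util.concurrent route, here is a sketch that hands an object to another thread through a BlockingQueue. Payload and HandOffDemo are made-up names, and the field is deliberately neither final nor volatile.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class Payload {
    private int value;                  // neither final nor volatile
    Payload(int value) { this.value = value; }
    int value() { return value; }
}

class HandOffDemo {
    // Producer thread builds the object; the calling thread consumes it.
    static int run() {
        BlockingQueue<Payload> queue = new ArrayBlockingQueue<>(1);
        Thread producer = new Thread(() -> {
            try {
                // Everything done before put() happens-before the matching take().
                queue.put(new Payload(7));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            return queue.take().value();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The queue provides the happens-before edge between the two threads, so the consumer sees the fully constructed Payload without any extra keywords on its field.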
If we always avoid unsafe publication, do we still need final fields? The answer is no. Then why are some objects (e.g. String) designed to be "thread safe immutable" by using final fields? Because it's assumed that they can be used in malicious code that tries to create uncertain state through deliberate unsafe publication. I think this is an overblown concern. It doesn't make much sense in server environments: if an application embeds malicious code, the server is compromised, period. It probably makes a bit of sense in an Applet environment, where the JVM runs untrusted code from unknown sources. Even then, this is an improbable attack vector; there's no precedent for this kind of attack, and there are apparently plenty of other, more easily exploitable security holes.
This code is fine because the reference to the instance of MyClass can't be visible to any other threads before the constructor returns.
Specifically, the happens-before relation requires that the visible effects of actions occur in the same order as they're listed in the program code, so that in the thread where the MyClass is constructed, obj must be definitely assigned before the constructor returns, and the instantiating thread goes directly from the state of not having a reference to the MyClass object to having a reference to a fully-constructed MyClass object.
That thread can then pass a reference to that object to another thread, but all of the construction will have transitively happened-before the second thread can call any methods on it. This might happen through the constructing thread's launching the second thread, a synchronized method, a volatile field, or the other concurrency mechanisms, but all of them will ensure that all of the actions that took place in the instantiating thread are finished before the memory barrier is passed.
Note that if a reference to this gets passed out of the class inside the constructor somewhere, that reference might go floating around and get used before the constructor is finished. That's what's known as unsafe publishing of the object, but code such as yours that doesn't call non-final methods from the constructor (or directly pass out references to this) is fine.
Your other thread could see a null object. A volatile object could possibly help, but an explicit lock mechanism (or a Builder) would likely be a better solution.
Have a look at Java Concurrency in Practice - Sample 14.12
This class (if taken as is) is NOT thread safe. In short: Java allows instructions to be reordered (Instruction reordering & happens-before relationship in java), and when your code instantiates MyClass, under some circumstances you may get the following sequence of steps:
Allocate memory for new instance of MyClass;
Return link to this block of memory;
Link to this not fully initialized MyClass is available for other threads, they can call "methodCalledByOtherThreads()" and get NullPointerException;
Initialize internals of MyClass.
In order to prevent this and make your MyClass really thread safe, you have to add either "final" or "volatile" to the "obj" field. In this case Java's memory model (from Java 5 on) will guarantee that during initialization of MyClass, the reference to the block of memory allocated for it will be returned only when all internals are initialized.
For more details I would strongly recommend reading the excellent book "Java Concurrency in Practice". Exactly your case is described on pages 50-51 (section 3.5.1). I would even say you just can't write correct multithreaded code without reading that book! :)
The originally picked answer by @Sotirios Delimanolis is wrong. @ZhongYu's answer is correct.
The concern here is a visibility issue. If MyClass is published unsafely, anything could happen.
Someone in the comments asked for evidence. One can check Listing 3.15 in the book Java Concurrency in Practice:
public class Holder {
private int n;
// Initialize in thread A
public Holder(int n) { this.n = n; }
// Called in thread B
public void assertSanity() {
if (n != n) throw new AssertionError("This statement is false.");
}
}
Someone came up with an example to verify this piece of code:
coding a proof for potential concurrency issue
As to the specific example of this post:
public class MyClass{
private MyObject obj;
// Initialize in thread A
public MyClass(){
obj = new MyObject();
}
// Called in thread B
public void methodCalledByOtherThreads(){
obj.doStuff();
}
}
If MyClass is initialized in thread A, there is no guarantee that thread B will see this initialization (because the change might still sit in the cache of the CPU that thread A runs on and might not yet have propagated to main memory).
Just as @ZhongYu has pointed out, because the write and the read happen in two independent threads with no synchronization between them, there is no happens-before(write, read) relation.
To fix this, as the original author has mentioned, we can declare private MyObject obj as volatile, which will ensure that the reference itself will be visible to other threads in a timely manner
(https://www.logicbig.com/tutorials/core-java-tutorial/java-multi-threading/volatile-ref-object.html) .
Even after going through this, I am still not clear about how the use of final results in safe publication in the code below. Can someone give an easy-to-understand explanation?
public class SafeListener
{
private final EventListener listener;
private SafeListener()
{
listener = new EventListener()
{ public void onEvent(Event e)
{ doSomething(e); }
};
}
public static SafeListener newInstance(EventSource source)
{
SafeListener safe = new SafeListener();
source.registerListener(safe.listener);
return safe;
}
}
Edited to add: Interesting perspective on the origins of Java and JSR-133's final behavior.
Canonical reference for how final works in the new JMM, for safe publication: http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#finalRight
On simple review, I think your code represents "safe" publication to the EventSource source object, which presumably will be fielding event callbacks to listener in a different thread. You are guaranteed that threads operating on the safe.listener reference passed will see a fully-initialized listener field. This does not make any further guarantees about other synchronization issues associated with calls to onEvent or other interactions with the object's state.
What is guaranteed by your code is that, when SafeListener's constructor returns a reference inside the static method, the listener field will not be seen in an unwritten state (even if there is no explicit synchronization). For example: Suppose a thread A calls newInstance(), resulting in an assignment to the listener field. Suppose that a thread B is able to dereference the listener field. Then, even absent any other synchronization, thread B is guaranteed to see the write listener = new EventListener().... If the field were not final, you would not receive that guarantee. There are several (other) ways of providing the guarantee (explicit synchronization, use of an atomic reference, use of volatile) of varying performance and readability.
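Here is a runnable sketch of the factory pattern from the question. Event, EventListener and EventSource are hypothetical stand-ins for the types the question leaves undefined, and the handled counter is only there so the callback can be observed. The EventSource below is not thread-safe; it is just enough to show the ordering.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the types the question does not show.
class Event {}

interface EventListener {
    void onEvent(Event e);
}

class EventSource {
    private final List<EventListener> listeners = new ArrayList<>();
    void registerListener(EventListener l) { listeners.add(l); }
    void fire(Event e) {
        for (EventListener l : listeners) {
            l.onEvent(e);
        }
    }
}

class SafeListener {
    private final EventListener listener;
    private int handled = 0;   // demo-only counter, single-threaded here

    private SafeListener() {
        // The anonymous class captures `this`, but nothing outside can reach it yet.
        listener = new EventListener() {
            public void onEvent(Event e) { doSomething(e); }
        };
    }

    static SafeListener newInstance(EventSource source) {
        SafeListener safe = new SafeListener();   // constructor has fully returned
        source.registerListener(safe.listener);   // only now is the listener shared
        return safe;
    }

    void doSomething(Event e) { handled++; }
    int handledCount() { return handled; }
}
```

The point of the private constructor plus static factory is that registration happens strictly after construction, so the source never receives a listener whose enclosing object is still being built.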
Not everything that's legal is advisable. Suggest you take a look at JCiP and perhaps this article on safe publication techniques.
A recent, related question is here: "Memory barriers and coding...", "Java multi-threading & Safe Publication".
In a nutshell, the specification for final (see @andersoj's answer) guarantees that when the constructor returns, the final field will have been properly initialized (as visible from all threads).
There is no such guarantee for non-final fields (which means that if another thread gets the freshly constructed object, the field may not have been set yet).
That this works is part of the JVM spec.
How it works would be a JVM implementation detail.
You can refer to the JLS:
Final field or the object reachable through a final reference can't be reordered with the initial load of a reference to that object. It is visible to all other threads after its construction.