'Effective Java' conundrum: Why is volatile required in this concurrent code? [duplicate]


This question already has answers here:
Why is volatile used in double checked locking
(8 answers)
Closed 4 years ago.
I'm working my way through item 71, "Use lazy initialization judiciously", of Effective Java (second edition). It suggests the use of the double-check idiom for lazy initialization of instance fields using this code (pg 283):
private volatile FieldType field;
FieldType getField() {
    FieldType result = field;
    if (result == null) { // First check (no locking)
        synchronized (this) {
            result = field;
            if (result == null) // Second check (with locking)
                field = result = computeFieldValue();
        }
    }
    return result;
}
So, I actually have several questions:
Why is the volatile modifier required on field given that initialization takes place in a synchronized block? The book offers this supporting text: "Because there is no locking if the field is already initialized, it is critical that the field be declared volatile". Therefore, is it the case that once the field is initialized, volatile is the only guarantee of multiple thread consistent views on field given the lack of other synchronization? If so, why not synchronize getField() or is it the case that the above code offers better performance?
The text suggests that the not-required local variable, result, is used to "ensure that field is read only once in the common case where it's already initialized", thereby improving performance. If result was removed, how would field be read multiple times in the common case where it was already initialized?

Why is the volatile modifier required on field given that initialization takes place in a synchronized block?
The volatile is necessary because of the possible reordering of instructions around the construction of objects. The Java memory model allows the just-in-time (JIT) compiler to reorder instructions so that field initialization is effectively moved outside of the object's constructor.
This means that thread-1 can initialize the field inside a synchronized block, yet thread-2 may see the object not fully initialized. Non-final fields do not have to be initialized before the object reference has been assigned to the field. The volatile keyword ensures that the object has been fully initialized before it is accessed through field.
This is an example of the famous "double check locking" bug.
If result was removed, how would field be read multiple times in the common case where it was already initialized?
Anytime you access a volatile field, it causes a memory-barrier to be crossed. This can be expensive compared to accessing a normal field. Copying a volatile field into a local variable is a common pattern if it is to be accessed in any way multiple times in the same method.
See my answer here for more examples of the perils of sharing an object without memory-barriers between threads:
About reference to object before object's constructor is finished
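As a minimal sketch of the idiom discussed above, the following class runs the double-check pattern from several threads and counts how often the initializer executes. The names LazyHolder, computeCalls and runDemo are made up for this illustration and do not come from the book; String simply stands in for FieldType.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; all names here are illustrative assumptions.
class LazyHolder {
    // Counts how often the (supposedly expensive) initializer actually runs.
    static final AtomicInteger computeCalls = new AtomicInteger();

    private volatile String field;

    String getField() {
        String result = field;          // the only volatile read on the fast path
        if (result == null) {           // first check (no locking)
            synchronized (this) {
                result = field;         // re-read under the lock
                if (result == null) {   // second check (with locking)
                    field = result = computeFieldValue();
                }
            }
        }
        return result;
    }

    private String computeFieldValue() {
        computeCalls.incrementAndGet();
        return "expensive value";
    }

    // Calls getField() from several threads and returns how many times
    // the initializer ran.
    static int runDemo(int threadCount) {
        computeCalls.set(0);
        LazyHolder holder = new LazyHolder();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(holder::getField);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return computeCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("initializer ran " + runDemo(8) + " time(s)");
    }
}
```

Without the local variable result, the already-initialized path would read the volatile field twice (once for the null check and once for the return); with it, that path performs a single volatile read.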

This is fairly complicated, but it is related to how the compiler can rearrange things.
Basically the Double Checked Locking pattern does not work in Java unless the variable is volatile.
This is because, in some cases, the compiler can assign the variable to something other than null, then do the initialisation of the variable and reassign it. Another thread would see that the variable is not null and attempt to read it - this can cause all sorts of very special outcomes.
Take a look at this other SO question on the topic.

Good questions.
Why is the volatile modifier required on field given that initialization takes place in a synchronized block?
If you have no synchronization, and you assign to that shared global field there is no promise that all writes that occur on construction of that object will be seen. For instance imagine FieldType looks like.
public class FieldType {
    Object obj = new Object();
    Object obj2 = new Object();
    public Object getObject() { return obj; }
    public Object getObject2() { return obj2; }
}
It is possible that getField() returns a non-null instance but that the instance's getObject() and getObject2() methods return null values. This is because, without synchronization, the writes to those fields can race with the construction of the object.
How is this fixed with volatile? All writes that occur prior to a volatile write are visible after that volatile write occurs.
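A small sketch of that visibility rule, under the assumption of a Java 5 or later memory model: a plain field is written before a volatile flag, and a reader that observes the flag is then guaranteed to see the plain field as well. The class and member names (VolatilePublication, payload, ready, runDemo) are hypothetical.

```java
// Hypothetical demo; the names are illustrative and not from the answer above.
class VolatilePublication {
    private int payload;              // plain, non-volatile field
    private volatile boolean ready;   // volatile flag guarding payload

    void publish(int value) {
        payload = value;   // ordinary write ...
        ready = true;      // ... published by the volatile write that follows it
    }

    int awaitPayload() {
        while (!ready) {   // volatile read; loops until the flag is observed
            Thread.yield();
        }
        return payload;    // sees the value written before ready was set
    }

    // Starts a writer thread and returns what the calling thread reads.
    static int runDemo() {
        VolatilePublication shared = new VolatilePublication();
        Thread writer = new Thread(() -> shared.publish(42));
        writer.start();
        return shared.awaitPayload();
    }

    public static void main(String[] args) {
        System.out.println("reader saw " + runDemo());
    }
}
```

In the double-check idiom the volatile field plays the role of ready, and the fields of the newly constructed object play the role of payload.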
If result was removed, how would field be read multiple times in the common case where it was already initialized?
Storing locally once and reading throughout the method ensures one thread/process local store and all thread local reads. You can argue premature optimization in those regards but I like this style because you won't run yourself into strange reordering problems that can occur if you don't.

Related

Issue with Double Check Locking in Java [duplicate]

This question already has answers here:
Java double checked locking
(11 answers)
Closed 4 years ago.
One of the article mentions an issue with "Double Check Locking". Please see the below example
public class MyBrokenFactory {
    private static MyBrokenFactory instance;
    private int field1, field2 ...
    public static MyBrokenFactory getFactory() {
        // This is incorrect: don't do it!
        if (instance == null) {
            synchronized (MyBrokenFactory.class) {
                if (instance == null)
                    instance = new MyBrokenFactory();
            }
        }
        return instance;
    }
    private MyBrokenFactory() {
        field1 = ...
        field2 = ...
    }
}
Reason:- (Please note the order of execution by the numbering)
Thread 1: 'gets in first' and starts creating instance.
1. Is instance null? Yes.
2. Synchronize on class.
3. Memory is allocated for instance.
4. Pointer to memory saved into instance.
[[Thread 2]]
7. Values for field1 and field2 are written
to memory allocated for object.
.....................
Thread 2: gets in just as Thread 1 has written the object reference
to memory, but before it has written all the fields.
5. Is instance null? No.
6. instance is non-null, but field1 and field2 haven't yet been set!
This thread sees invalid values for field1 and field2!
Question :
As the creation of the new instance(new MyBrokenFactory()) is done from the synchronized block, will the lock be released before the entire initialization is completed (private MyBrokenFactory() is completely executed) ?
Reference - https://www.javamex.com/tutorials/double_checked_locking.shtml
Please explain.

The problem is here:
Thread 2: gets in just as Thread 1 has written the object reference to memory, but before it has written all the fields.
Is instance null? No.
Without synchronization, thread 2 might see instance as null, even though thread 1 has written it. Notice that the first check of instance is outside of the synchronized block:
if (instance == null) {
synchronized (MyBrokenFactory.class) {
Since that first check is done outside of the block there's no guarantee that thread 2 will see the correct value of instance.
I have no idea what you're trying to do with field1 and field2, you never even write them.
Re. Your edit:
As the creation of the new instance(new MyBrokenFactory()) is done from the synchronized block
I think what you're asking is if the two instance fields, field1 and field2 are guaranteed to be visible. The answer is no, and the problem is the same as with instance. Because you don't read instance from within a synchronized block, there's no guarantee that those instance fields will be read correctly. If instance is non-null, you never enter the synchronized block, so no synchronization occurs.
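One way the factory from the question could be repaired, sketched under the assumption of a Java 5 or later JVM, is to declare instance volatile so that the unsynchronized first check also gets a happens-before edge. The class name MyFixedFactory, the field values 1 and 2, and the sum() helper are placeholders invented for this example.

```java
// Hypothetical corrected variant; names and field values are placeholders.
class MyFixedFactory {
    // volatile: a reader that sees a non-null reference also sees the
    // constructor's writes to field1 and field2.
    private static volatile MyFixedFactory instance;
    private int field1, field2;

    static MyFixedFactory getFactory() {
        MyFixedFactory result = instance;       // single volatile read on the fast path
        if (result == null) {
            synchronized (MyFixedFactory.class) {
                result = instance;
                if (result == null) {
                    instance = result = new MyFixedFactory();
                }
            }
        }
        return result;
    }

    private MyFixedFactory() {
        field1 = 1;
        field2 = 2;
    }

    int sum() {
        return field1 + field2;
    }

    public static void main(String[] args) {
        System.out.println(getFactory() == getFactory());
        System.out.println(getFactory().sum());
    }
}
```

The lock still prevents two threads from both constructing an instance; the volatile modifier is what covers readers that never enter the synchronized block.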

Please find an answer to my question. I got the answer by looking into another similar question here.
Synchronization guarantees that only one thread can enter a block of code. But it doesn't guarantee that variable modifications made within the synchronized section will be visible to other threads. Only threads that enter the synchronized block are guaranteed to see the changes. This is the reason why double-checked locking is broken: it is not synchronized on the reader's side. The reading thread may see that the singleton is not null, but the singleton's data may not be fully initialized (visible).
Ordering is provided by volatile. volatile guarantees ordering: for instance, a write to a volatile singleton static field guarantees that writes to the singleton object will be finished before the write to the volatile static field. It doesn't prevent the creation of two singleton objects; that is provided by synchronized.
Static final fields of a class don't need to be volatile. In Java, the JVM takes care of this problem.
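To illustrate the point about static final fields, here is a hedged sketch of two singleton variants that rely on class initialization instead of volatile. The JVM runs static initializers under a per-class initialization lock, so other threads only ever observe the fully initialized value. The class names EagerFactory and LazyFactory, the nested Holder class and the field values are assumptions made for this example.

```java
// Hypothetical singletons; names and field values are placeholders.
class EagerFactory {
    // Assigned during class initialization, which the JVM synchronizes,
    // so the field needs no volatile modifier.
    private static final EagerFactory INSTANCE = new EagerFactory();

    private int field1, field2;

    private EagerFactory() {
        field1 = 1;
        field2 = 2;
    }

    static EagerFactory getFactory() {
        return INSTANCE;
    }

    int sum() {
        return field1 + field2;
    }
}

class LazyFactory {
    private LazyFactory() { }

    // Holder is not initialized until getFactory() first refers to it,
    // which keeps the construction lazy without any explicit locking.
    private static class Holder {
        static final LazyFactory INSTANCE = new LazyFactory();
    }

    static LazyFactory getFactory() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        System.out.println(EagerFactory.getFactory().sum());
        System.out.println(LazyFactory.getFactory() == LazyFactory.getFactory());
    }
}
```

This only applies to static fields; for lazily initialized instance fields the volatile-based double-check idiom is still the relevant technique.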

Cheapest way of establishing happens-before with non-final field

Many questions/answers have indicated that if a class object has a final field and no reference to it is exposed to any other thread during construction, then all threads are guaranteed to see the value written to the field once the constructor completes. They have also indicated that storing into a final field a reference to a mutable object which has never been accessed by outside threads will ensure that all mutations which have been made to the object prior to the store will be visible on all threads which access the object via the field. Unfortunately, neither guarantee applies to writes of non-final fields.
A question I do not see answered, however, is this: If the semantics of a class are such that a field cannot be final, but one wishes to ensure the "publication" of the field and the object identified thereby, what is the most efficient way of doing that? As an example, consider
class ShareableDataHolder<T>
{
    Object data; // Always identifies either a T or a SharedDataHolder<T>
}
private class SharedDataHolder<T> extends ShareableDataHolder<T>
{
    Object data; // Always identifies either a T or a lower-numbered SharedDataHolder<T>
    final long seq; // Immutable; necessarily unique
}
The intention would be that data will initially identify a data object directly, but that it could legitimately at any time be changed to identify a SharedDataHolder<T> which directly or indirectly encapsulates an equivalent data object. Assume all code is written to work correctly (though not necessarily optimally-efficiently) if any read of data may arbitrarily return any value that was ever written to data, but may fail if it reads null.
Declaring volatile Object data would be semantically correct, but would likely impose extra costs on every subsequent access to the field. Entering a dummy lock after initially setting the field would work, but would be needlessly slow. Having a dummy final field, which the object sets to identify itself would seem like it should work; although technically I think it might require that all accesses to the other field be done through the other field, I can't see any realistic scenario where that would matter. In any case, having a dummy field whose purpose is only to provide the appropriate synchronization via its existence would seem wasteful.
Is there any clean way to inform the compiler that a particular write to data within the constructor should have a happens-before relationship with regard to any reads of that field which occur after the constructor returns (as would be the case if the field were final), without having to pay the costs associated with volatile, locks, etc.? Alternatively, if a thread were to read data and find it null, could it somehow repeat the read in such a fashion as to establish a "happens after" with regard to the write of data [recognizing that such a request might be slow, but shouldn't need to happen very often]?
PS--If happens-before relationships are non-transitive, would a proper happens-before relationship exist in the following scenario?
Thread 1 writes to a non-final field dat in some object Fred and stores a reference to it into a final field George.
Thread 2 copies the reference from George into a non-final field Larry.
Thread 3 reads Larry.dat.
From what I can tell, a happens-before relationship exists between the write of Fred's field dat and a read of George. Would a happens-before relationship exist between the write of Fred's dat and a read of Larry that returns a reference to Fred that was copied from a final reference to Fred? If not, is there any "safe" way to copy a reference contained in a final field to a non-final field that would be accessible via other threads?
PPS--If an object and its constituents are never accessed outside their creation thread until the main constructor finishes, and the last step of the main constructor is to store within the main object a final reference to itself, is there any "plausible" implementation/scenario where another thread could see a partially-constructed object, whether or not anything actually uses that final reference?

Short answer
No.
Longer answer
JLS 17.4.5 lists all* of the ways of establishing a happens-before relationship, other than the special case of final field semantics:
1. An unlock on a monitor happens-before every subsequent lock on that monitor.
2. A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
3. A call to start() on a thread happens-before any actions in the started thread.
4. All actions in a thread happen-before any other thread successfully returns from a join() on that thread.
5. The default initialization of any object happens-before any other actions (other than default-writes) of a program.
(The original lists them as bullet points; I'm changing them to numbers for convenience here.)
Now, you've ruled out locks (#1) and volatile fields (#2). Rules #3 and #4 relate to the life-cycle of the thread, which you don't mention in your question, and doesn't sound like it would apply. Rule #5 doesn't give you any non-null values, so it doesn't apply either.
So of the five possible methods for establishing happens-before, other than final field semantics, three don't apply and two you've explicitly ruled out.
* The rules listed in 17.4.5 are actually consequences of the synchronization order rules defined in 17.4.4, but those relate pretty directly to the ones mentioned in 17.4.5. I mention that because 17.4.5's list can be interpreted as being illustrative and thus non-exhaustive, but 17.4.4's list is non-illustrative and exhaustive, and you can make the same analysis from that directly, if you don't want to rely on the intermediate analysis that 17.4.5 provides.
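The question does not involve thread life-cycle, so this does not change the answer, but as an illustration of how the start() and join() rules work on their own, here is a minimal sketch in which plain (non-volatile, non-final) fields are safely handed back and forth between two threads. The names StartJoinVisibility, beforeStart, insideWorker and runDemo are hypothetical.

```java
// Hypothetical demo of the start()/join() happens-before rules.
class StartJoinVisibility {
    private int beforeStart;    // written by the parent before start()
    private int insideWorker;   // written by the worker before it terminates

    // Returns { value the worker saw, value the parent saw after join() }.
    static int[] runDemo() {
        StartJoinVisibility shared = new StartJoinVisibility();
        int[] seenByWorker = new int[1];

        shared.beforeStart = 7;   // ordered before every action of the worker by start()
        Thread worker = new Thread(() -> {
            seenByWorker[0] = shared.beforeStart;
            shared.insideWorker = 11;
        });
        worker.start();
        try {
            worker.join();        // orders all of the worker's writes before the reads below
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { seenByWorker[0], shared.insideWorker };
    }

    public static void main(String[] args) {
        int[] seen = runDemo();
        System.out.println(seen[0] + " " + seen[1]);
    }
}
```

Neither field is volatile or final and no lock is taken; visibility comes entirely from the thread start and join edges.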

You can apply final field semantics without making the fields of your class final but by passing your reference through another final field. For this purpose, you need to define a publisher class:
class Publisher<T> {
    private final T value;
    private Publisher(T value) { this.value = value; }
    public static <S> S publish(S value) { return new Publisher<S>(value).value; }
}
If you are now working with an instance of ShareableDataHolder<T>, you can publish the instance by:
ShareableDataHolder<T> holder = new ShareableDataHolder<T>();
// set field values
holder = Publisher.publish(holder);
// Passing holder to other threads is now safe
This approach is tested and benchmarked and turns out to be the most performant alternative on current VMs. The overhead is minimal as escape analysis typically removes the allocation of the very short-lived Publisher instance.

Local variable used to access volatile instance variable

What is the purpose or value of creating a local reference to a static volatile variable that is already kept as a member field. This code here is from java.util.Scanner JDK 6b14 here.
class Scanner {
    private static volatile Pattern linePattern;
    ...
    private static Pattern linePattern() {
        Pattern lp = linePattern;
        if (lp == null)
            linePattern = lp = Pattern.compile("...");
        return lp;
    }
    ...
}
The Java Tutorials: "Reads and writes are atomic for all variables declared volatile (including long and double variables)... any write to a volatile variable establishes a happens-before relationship with subsequent reads of that same variable. "
This means that reading the reference to the Pattern object won't fail half-way because it has changed. The volatile keyword is supposed to protect exactly these kinds of accesses, so I don't think the duplicating local variable is meant to ensure that a valid value is returned.
Also, the lazy initialization can be done on the member field without needing an intermediary local variable:
if (linePattern == null) linePattern = Pattern.compile("...");
It looks to be a byte-code optimization as seen here and here. Using local variables produces smaller bytecode (fewer instructions) as well as fewer accesses to the actual value (which is an expensive volatile read). However, they have not used the final variable optimization along with it, so I'm skeptical about drawing this conclusion.

Lazy initialization, i.e. delay the work until it's really necessary.

This "speeds up" things. Access to volatile variables is expensive. You can avoid part of this overhead by assigning the value to a stack variable and accessing that instead.

It guarantees that returned value is not NULL - even if the static variable is set to NULL between check and return.
And at the same time it is an unsynchronized lazy init with re-initialization if needed ;).

For volatile fields, the linePattern might change in between different lines. Copying the reference to a local variable makes certain that you can't have an inconsistent state. For example, if you had written
if (linePattern == null)
    linePattern = Pattern.compile("...");
then linePattern might have stopped being null while the Pattern.compile was executing.
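The local-copy pattern from Scanner can be sketched in isolation as follows. This is not the JDK source: the class name LineMatcher, the regular expression and the reset() helper are assumptions added so the example is self-contained.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the Scanner code; not the actual JDK implementation.
class LineMatcher {
    private static volatile Pattern linePattern;

    // Placeholder regex; the real pattern is elided in the question.
    private static final String LINE_REGEX = "\\r?\\n";

    static Pattern linePattern() {
        Pattern lp = linePattern;   // one volatile read into a local
        if (lp == null) {
            // Racy but harmless: two threads may both compile the pattern,
            // and each returns the instance it compiled itself.
            linePattern = lp = Pattern.compile(LINE_REGEX);
        }
        return lp;                  // the local cannot be changed by another thread
    }

    // Hypothetical helper: models another thread clearing the cache.
    static void reset() {
        linePattern = null;
    }

    public static void main(String[] args) {
        System.out.println(linePattern().pattern());
    }
}
```

Because the method returns the local lp and never re-reads the field, a concurrent reset() between the check and the return cannot make it return null.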

Use of Volatile variables for safe publication of Immutable objects

I came across this statement:
In properly constructed objects, all threads will see correct values of final fields, regardless of how the object is published.
Then why is a volatile variable used to safely publish an immutable object?
I'm really confused. Can anybody make it clear with a suitable example?

In this case, the volatility would only ensure visibility of the new object; any other threads that happened to get hold of your object via a non-volatile field would indeed see the correct values of final fields as per JSR-133's initialization safety guarantees.
Still, making the variable volatile doesn't hurt; is correct from a memory management perspective anyway; and would be necessary for non-final fields initialised in a constructor (although there shouldn't be any of these in an immutable object). If you wish to share variables between threads, you'll need to ensure adequate synchronization to give visibility anyway; though in this case you're right, that there's no danger to the atomicity of the constructor.
Thanks to Tom Hawtin for pointing out I'd completely overlooked the JMM guarantees on final fields; previous incorrect answer is given below.
The reason for the volatile variable is that it establishes a happens-before relationship (according to the Java Memory Model) between the construction of the object, and the assignment of the variable. This achieves two things:
Subsequent reads of that variable from different threads are guaranteed to see the new value. Without marking the variable as volatile, these threads could see stale values of the reference.
The happens-before relationship places limits on what reorderings the compiler can do. Without a volatile variable, the assignment to the variable could happen before the object's constructor runs - hence other threads could get a reference to the object before it was fully constructed.
Since one of the fundamental rules of immutable objects is that you don't publish references during the constructor, it's this second point that is likely being referenced here. In a multithreaded environment without proper concurrent handling, it is possible for a reference to the object to be "published" before that object has been constructed. Thus another thread could get that object, see that one of its fields is null, and then later see that this "immutable" object has changed.
Note that you don't have to use volatile fields to achieve this if you have other appropriate synchronization primitives - for example, if the assignment (and all later reads) are done in a synchronized block on a given monitor - but in a "standalone" sense, marking the variable as volatile is the easiest way to tell the JVM "this might be read by multiple threads, please make the assignment safe in that context."

A volatile reference to an immutable object could be useful. This would allow you to swap one object for another to make the new data available to other threads.
I would suggest you look at using AtomicReference first, however.
If you need final volatile fields you have a problem. All fields, including final ones are available to other threads as soon as the constructor returns. So if you pass an object to another thread in the constructor, it is possible for the other thread to see an inconsistent state. IMHO you should consider a different solution so you don't have to do this.
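As a rough sketch of swapping one immutable object for another through an AtomicReference, consider the following. The Settings and SettingsRegistry classes and their values are invented for this example; only AtomicReference and its get, set and compareAndSet methods come from the JDK.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable value class: all fields final, no setters.
final class Settings {
    final String host;
    final int port;

    Settings(String host, int port) {
        this.host = host;
        this.port = port;
    }
}

// Hypothetical holder that publishes the current Settings to all threads.
class SettingsRegistry {
    private final AtomicReference<Settings> current =
            new AtomicReference<>(new Settings("localhost", 8080));

    Settings get() {
        return current.get();    // same visibility semantics as a volatile read
    }

    void replace(Settings next) {
        current.set(next);       // same visibility semantics as a volatile write
    }

    // Swaps only if nobody else replaced the settings in the meantime.
    boolean replaceIfUnchanged(Settings expected, Settings next) {
        return current.compareAndSet(expected, next);
    }

    public static void main(String[] args) {
        SettingsRegistry registry = new SettingsRegistry();
        registry.replace(new Settings("example.org", 9090));
        System.out.println(registry.get().host + ":" + registry.get().port);
    }
}
```

Readers always get a complete Settings object; an update is a single reference swap rather than a series of field mutations.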

You can't really see the difference in an immutable class. See the example below, in Myclass.class:
public static Foo getInstance() {
    if (INSTANCE == null) {
        INSTANCE = new Foo();
    }
    return INSTANCE;
}
In the above code, if Foo is declared final (final Foo INSTANCE;), it guarantees that references won't be published during the constructor call; partial object construction is not possible.
Consider this: if Myclass is immutable, its state is not going to change after object construction, which makes the volatile keyword (volatile final Foo INSTANCE;) redundant. But if the class allows its object state to be changed (not immutable), multiple threads can actually update the object and some updates may not be visible to other threads; hence the volatile keyword ensures safe publication of objects in a non-immutable class.

Java concurrency - why doesn't synchronizing a setter (but not a getter) make a class thread-safe? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Thread safety in Java class
I'm reading Java Concurrency in Practice, and I've come to an example that puzzles me.
The authors state that this class is not threadsafe
public class MutableInteger {
    private int number;
    public int getInt() {
        return number;
    }
    public void setInt(int val) {
        number = val;
    }
}
And they also state that synchronizing only one method (the setter for example) would not do; you have to synchronize both.
My question is: Why? Wouldn't synchronizing the setter just do?

Java has a happens before/happens after memory model. There needs to be some common concurrent construct (e.g. synchronized block/method, lock, volatile, atomic) on both the write path and the read path to trigger this behaviour.
If you synchronize both methods you are creating a lock on the whole object that will be shared by both the read and write threads. The JVM will ensure that any changes that occur on the writing thread that occur before leaving the (synchronized) setInt method will be visible to any reading threads after they enter the (synchronized) getInt method. The JVM will insert the necessary memory barriers to ensure that this will happen.
If only the write method is synchronized then changes to the object may not be visible to any reading thread. This is because there is no point on the read path that the JVM can use to ensure that the reading thread's visible memory (caches etc.) is in line with the writing thread's. Making the getInt method synchronized would provide that.
Note: specifically in this case making the field 'number' volatile would give the correct behaviour as volatile read/write also provides the same memory visibility behaviour in the JVM and the action inside of the setInt method is only an assignment.
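The two fixes described above can be sketched side by side. The class names SynchronizedInteger and VolatileInteger are chosen for this illustration; both keep the getInt/setInt interface of MutableInteger.

```java
// Variant 1: both accessors lock the same monitor (this), so a get that
// follows a set is guaranteed to see the value that was set.
class SynchronizedInteger {
    private int number;

    public synchronized int getInt() {
        return number;
    }

    public synchronized void setInt(int val) {
        number = val;
    }
}

// Variant 2: a volatile field gives the same visibility guarantee here,
// because the setter performs only a single assignment.
class VolatileInteger {
    private volatile int number;

    public int getInt() {
        return number;
    }

    public void setInt(int val) {
        number = val;
    }

    public static void main(String[] args) {
        SynchronizedInteger a = new SynchronizedInteger();
        VolatileInteger b = new VolatileInteger();
        a.setInt(5);
        b.setInt(6);
        System.out.println(a.getInt() + " " + b.getInt());
    }
}
```

The volatile variant would stop being sufficient as soon as an operation has to read and then write the field, such as an increment; that needs the lock or an atomic class.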

It's explained in the book before the sample (page 35):
"Synchronizing only the setter would not be sufficient: threads calling get would still be able to see stale values."
Stale data: When the reader thread examines ready, it may see an out-of-date value. Unless synchronization is used every time a variable is accessed, it is possible to see a stale value for that variable. Worse, staleness is not all-or-nothing: a thread can see an up-to-date value of one variable but a stale value of another variable that was written first.

If you only synchronize the setter method, you can only guarantee that the attribute will not be modified incorrectly; you cannot be sure that the value you read is not stale.

Because number is not volatile, and getInt() is not synchronized, getInt() may return stale values. For more information, read about the Java memory model.
