I have a Java class with a non-final int variable that I explicitly initialized to 0 in the constructor. All other access to the variable is managed by a ReentrantLock. Do I have to worry that threads won't see the initial value of 0 because I didn't use the lock in the constructor?
Yes, you have to worry. To avoid problems in this case you need safe publication of the object reference.
From Java Concurrency in Practice:
To publish an object safely, both the reference to the object and the object's state must be made visible to other threads at the same time. A properly constructed object can be safely published by:
- Initializing an object reference from a static initializer;
- Storing a reference to it into a volatile field or AtomicReference;
- Storing a reference to it into a final field of a properly constructed object; or
- Storing a reference to it into a field that is properly guarded by a lock.
In other cases you can (theoretically) face a situation where the result of new becomes available to other threads before the constructor call has completed (due to possible reordering of operations).
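A minimal sketch of the situation in the question, assuming a hypothetical LockedCounter class whose count field is written in the constructor without the lock and afterwards only accessed under a ReentrantLock. The object is published through a volatile field (the second idiom in the list), so the constructor's write of 0 happens-before every read in a thread that sees the new reference. The class, field and method names are illustrative, not taken from the question.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical class mirroring the question: a non-final int guarded by a lock
class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock(); // final field: covered by final field semantics
    private int count;

    LockedCounter() {
        count = 0; // explicit write in the constructor, outside the lock
    }

    int increment() {
        lock.lock();
        try {
            return ++count;
        } finally {
            lock.unlock();
        }
    }

    int get() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }
}

class SafePublicationDemo {
    // Safe publication idiom: store the reference into a volatile field
    static volatile LockedCounter shared;

    static int run() {
        shared = null; // reset so the reader can only ever see the counter created below
        Thread reader = new Thread(() -> {
            LockedCounter c;
            // Spin until the reference is published; the volatile read that returns it
            // also makes the constructor's write of 0 visible
            while ((c = shared) == null) {
                Thread.yield();
            }
            c.increment();
        });
        reader.start();
        shared = new LockedCounter();
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.get(); // the reader incremented the published 0 exactly once
    }
}
```

Storing the reference into a field guarded by the same lock, or handing it over through a thread-safe collection, would work equally well; volatile is simply the shortest option to show.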
Note, however, that if 0 is a default value rather than the value written in the constructor, it's guaranteed to be visible (JLS §17.4.4):
The write of the default value (zero, false or null) to each variable synchronizes-with the first action in every thread. Although it may seem a little strange to write a default value to a variable before the object containing the variable is allocated, conceptually every object is created at the start of the program with its default initialized values.
From Java Concurrency in Practice:
Objects that are not immutable must be safely published, which usually entails synchronization by both the publishing and the consuming thread.
An object is not safely published just because its reference does not escape from the constructor, i.e. the constructor alone does not enforce the necessary happens-before relationship. So even if you don't publish an object reference within its constructor, you can still have concurrency problems. For details and examples, see the relevant chapter in the book.
In order to do a safe publication, the authors suggest the following ways:
To publish an object safely, both the reference to the object and the object's state must be made visible to other threads at the same time. A properly constructed object can be safely published by:
- Initializing an object reference from a static initializer;
- Storing a reference to it into a volatile field or AtomicReference;
- Storing a reference to it into a final field of a properly constructed object; or
- Storing a reference to it into a field that is properly guarded by a lock.
In essence, a proper "happens-before" relationship must be introduced between construction of the object and accessing of that object by another thread.
As the authors note, objects that are passed through thread-safe collections are also safely published (e.g. an item passed to a worker thread through a LinkedBlockingQueue).
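A minimal sketch of the hand-off case, assuming a hypothetical WorkItem class with plain (non-final, non-volatile) fields. The java.util.concurrent memory consistency rules say that actions in a thread before placing an object into a concurrent collection happen-before actions after that element is removed in another thread, so the worker sees the values written by the constructor.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical mutable work item with plain fields
class WorkItem {
    int a;
    int b;

    WorkItem(int a, int b) {
        this.a = a;
        this.b = b;
    }
}

class QueueHandoffDemo {
    static int run() {
        BlockingQueue<WorkItem> queue = new LinkedBlockingQueue<>();
        int[] result = new int[1];

        Thread worker = new Thread(() -> {
            try {
                // Everything the producer did before add() is visible once take() returns
                WorkItem item = queue.take();
                result[0] = item.a + item.b;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        queue.add(new WorkItem(2, 3)); // unbounded queue: add() does not block
        try {
            worker.join(); // makes result[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }
}
```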
It is true that writes to int fields are atomic (unlike writes to non-volatile 64-bit long and double fields), meaning that you cannot observe a "weird" value even if you access that field from a different thread in a non-thread-safe way. But when an object is not yet properly constructed, other bad things may happen (to be honest, I don't know exactly what could happen, but it's surely not worth trying).
To summarize, you need to publish the object safely anyway, at which point the value is correctly set to 0 and the object is properly constructed.
Related
More specifically, assuming an object is partially initialized and a field x is initialized to null by the object's constructor, is it possible that some other thread reading this partially initialized object can see any other value than null?
If I understand it correctly, there's no guarantee in the Java Memory Model itself that the value will always be null in such a case. The question is: considering CPU caches and the JVM memory architecture, should it be reasonably expected that the value could be non-null?
Yes, this is possible: there is no guarantee that an object which has been initialised in a constructor by one thread will be correctly read by another thread. The Java memory model allows the compiler to reorder statements within the constructor so long as the reordering does not affect the state of the object on completion of the initialisation.
Another thread can obtain a reference to the object after it has been allocated in memory but before the constructor has completed and may read an uninitialised value.
You need to protect access to such variables using the synchronized keyword or by using synchronized collections.
See: Java Memory Model
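A minimal sketch of guarding such a field with synchronized, assuming a hypothetical SyncHolder class. The constructor and all accessors use the same monitor (this), so an unlock in one thread happens-before the next lock of that monitor in another thread.

```java
// Hypothetical holder whose field is only accessed while holding its own monitor
class SyncHolder {
    private Object x;

    SyncHolder() {
        synchronized (this) {
            x = "initial"; // written under the same monitor that readers use
        }
    }

    synchronized Object get() {
        return x;
    }

    synchronized void set(Object value) {
        x = value;
    }
}

class SyncDemo {
    static Object run() {
        SyncHolder holder = new SyncHolder();
        Thread writer = new Thread(() -> holder.set("updated"));
        writer.start();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return holder.get(); // reads under the lock
    }
}
```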
By default, reference member variables are initialized to null. If the constructor had set the field to a non-null value, then it'd have been possible for other threads to see a null or non-null value. But if the constructor also is setting the value as null (which seems redundant in a simple scenario), then it is NOT possible for other threads to see any value other than null (the only value the field ever had is null, so there is no question of seeing any other value)
Many questions/answers have indicated that if a class object has a final field and no reference to it is exposed to any other thread during construction, then all threads are guaranteed to see the value written to the field once the constructor completes. They have also indicated that storing into a final field a reference to a mutable object which has never been accessed by outside threads will ensure that all mutations which have been made to the object prior to the store will be visible on all threads which access the object via the field. Unfortunately, neither guarantee applies to writes of non-final fields.
A question I do not see answered, however, is this: If the semantics of a class are such that a field cannot be final, but one wishes to ensure the "publication" of the field and the object identified thereby, what is the most efficient way of doing that? As an example, consider
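class ShareableDataHolder<T>
{
    Object data; // Always identifies either a T or a SharedDataHolder<T>
}
// Top-level classes cannot be private; nest this class to keep it private
class SharedDataHolder<T> extends ShareableDataHolder<T>
{
    Object data; // Always identifies either a T or a lower-numbered SharedDataHolder<T>
    final long seq; // Immutable; necessarily unique
    SharedDataHolder(Object data, long seq) {
        this.data = data;
        this.seq = seq; // a final field must be assigned in every constructor
    }
}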
The intention would be that data will initially identify a data object directly, but that it could legitimately at any time be changed to identify a SharedDataHolder<T> which directly or indirectly encapsulates an equivalent data object. Assume all code is written to work correctly (though not necessarily optimally-efficiently) if any read of data may arbitrarily return any value that was ever written to data, but may fail if it reads null.
Declaring volatile Object data would be semantically correct, but would likely impose extra costs on every subsequent access to the field. Entering a dummy lock after initially setting the field would work, but would be needlessly slow. Having a dummy final field, which the object sets to identify itself would seem like it should work; although technically I think it might require that all accesses to the other field be done through the other field, I can't see any realistic scenario where that would matter. In any case, having a dummy field whose purpose is only to provide the appropriate synchronization via its existence would seem wasteful.
Is there any clean way to inform the compiler that a particular write to data within the constructor should have a happens-before relationship with regard to any reads of that field which occur after the constructor returns (as would be the case if the field were final), without having to pay the costs associated with volatile, locks, etc.? Alternatively, if a thread were to read data and find it null, could it somehow repeat the read in such a fashion as to establish a "happens after" with regard to the write of data [recognizing that such a request might be slow, but shouldn't need to happen very often]?
PS--If happens-before relationships are non-transitive, would a proper happens-before relationship exist in the following scenario?
Thread 1 writes to a non-final field dat in some object Fred and stores a reference to it into a final field George.
Thread 2 copies the reference from George into a non-final field Larry.
Thread 3 reads Larry.dat.
From what I can tell, a happens-before relationship exists between the write of Fred's field dat and a read of George. Would a happens-before relationship exist between the write of Fred's dat and a read of Larry that returns a reference to Fred that was copied from a final reference to Fred? If not, is there any "safe" way to copy a reference contained in a final field to a non-final field that would be accessible via other threads?
PPS--If an object and its constituents are never accessed outside their creation thread until the main constructor finishes, and the last step of the main constructor is to store within the main object a final reference to itself, is there any "plausible" implementation/scenario where another thread could see a partially-constructed object, whether or not anything actually uses that final reference?
Short answer
No.
Longer answer
JLS 17.4.5 lists all* of the ways of establishing a happens-before relationship, other than the special case of final field semantics:
1. An unlock on a monitor happens-before every subsequent lock on that monitor.
2. A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
3. A call to start() on a thread happens-before any actions in the started thread.
4. All actions in a thread happen-before any other thread successfully returns from a join() on that thread.
5. The default initialization of any object happens-before any other actions (other than default-writes) of a program.
(The original lists them as bullet points; I'm changing them to numbers for convenience here.)
Now, you've ruled out locks (#1) and volatile fields (#2). Rules #3 and #4 relate to the life-cycle of the thread, which you don't mention in your question, and they don't sound like they would apply. Rule #5 doesn't give you any non-null values, so it doesn't apply either.
So of the five possible methods for establishing happens-before, other than final field semantics, three don't apply and two you've explicitly ruled out.
* The rules listed in 17.4.5 are actually consequences of the synchronization order rules defined in 17.4.4, but those relate pretty directly to the ones mentioned in 17.4.5. I mention that because 17.4.5's list can be interpreted as being illustrative and thus non-exhaustive, but 17.4.4's list is non-illustrative and exhaustive, and you can make the same analysis from that directly, if you don't want to rely on the intermediate analysis that 17.4.5 provides.
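The thread life-cycle rules #3 and #4 can be illustrated with a small sketch (hypothetical class and field names): a plain static field, neither volatile nor lock-protected, is still reliably visible across a start() and a join().

```java
// Rules #3 (start) and #4 (join) make a plain field visible without volatile or locks
class StartJoinDemo {
    static int plain; // neither volatile nor lock-protected

    static int[] run() {
        plain = 42; // happens-before every action in the started thread (#3)
        int[] seen = new int[2];
        Thread t = new Thread(() -> {
            seen[0] = plain; // guaranteed to read 42
            plain = 43;      // happens-before join() returning in the other thread (#4)
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        seen[1] = plain; // guaranteed to read 43
        return seen;
    }
}
```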
You can apply final field semantics without making the fields of your class final, by passing your reference through a final field of another object. For this purpose, you need to define a publisher class:
class Publisher<T> {
    private final T value;
    private Publisher(T value) { this.value = value; }
    public static <S> S publish(S value) { return new Publisher<S>(value).value; }
}
If you are now working with an instance of ShareableDataHolder<T>, you can publish the instance by:
ShareableDataHolder<T> holder = new ShareableDataHolder<T>();
// set field values
holder = Publisher.publish(holder);
// Passing holder to other threads is now safe
This approach is tested and benchmarked and turns out to be the most performant alternative on current VMs. The overhead is minimal as escape analysis typically removes the allocation of the very short-lived Publisher instance.
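The two snippets above can be combined into a self-contained program. This is only a sketch of the mechanics: ShareableDataHolder is a hypothetical stand-in for the question's class, and because the demo hands the holder to the reader thread via Thread.start(), which already establishes happens-before on its own, it shows how publish() is called rather than proving the publication guarantee.

```java
// Publisher as defined in the answer, made self-contained
final class Publisher<T> {
    private final T value;

    private Publisher(T value) {
        this.value = value;
    }

    // Round-trips the reference through a final field of a freshly constructed object
    public static <S> S publish(S value) {
        return new Publisher<S>(value).value;
    }
}

// Hypothetical stand-in for the question's ShareableDataHolder<T>
class ShareableDataHolder<T> {
    Object data; // non-final, as in the question
}

class PublisherDemo {
    static Object run() {
        ShareableDataHolder<String> holder = new ShareableDataHolder<>();
        holder.data = "payload"; // set field values
        holder = Publisher.publish(holder);

        ShareableDataHolder<String> published = holder; // effectively final copy for the lambda
        Object[] seen = new Object[1];
        Thread reader = new Thread(() -> seen[0] = published.data);
        reader.start();
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```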
I want to make sure that I correctly understand the 'Effectively Immutable Objects' behavior according to Java Memory Model.
Let's say we have a mutable class which we want to publish as an effectively immutable:
class Outworld {
    // This MAY be accessed by multiple threads
    public static volatile MutableLong published;
}
// This class is mutable
class MutableLong {
    private long value;
    public MutableLong(long value) {
        this.value = value;
    }
    public void increment() {
        value++;
    }
    public long get() {
        return value;
    }
}
We do the following:
// Create a mutable object and modify it
MutableLong val = new MutableLong(1);
val.increment();
val.increment();
// No more modifications
// UPDATED: Let's say for this example we are completely sure
// that no one will ever call increment() from now on
// Publish it safely and consider Effectively Immutable
Outworld.published = val;
The question is:
Does Java Memory Model guarantee that all threads MUST have Outworld.published.get() == 3 ?
According to Java Concurrency In Practice this should be true, but please correct me if I'm wrong.
3.5.3. Safe Publication Idioms
To publish an object safely, both the reference to the object and the
object's state must be made visible to other threads at the same time.
A properly constructed object can be safely published by:
- Initializing an object reference from a static initializer;
- Storing a reference to it into a volatile field or AtomicReference;
- Storing a reference to it into a final field of a properly constructed object; or
- Storing a reference to it into a field that is properly guarded by a lock.
3.5.4. Effectively Immutable Objects
Safely published effectively immutable objects can be used safely by
any thread without additional synchronization.
Yes. The writes to the MutableLong happen-before the write to the volatile field, and that volatile write happens-before any read of the field that sees the new reference.
(It is possible that a thread reads Outworld.published and passes it on to another thread unsafely. In theory, that could see earlier state. In practice, I don't see it happening.)
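A minimal sketch of the guarantee, reusing the question's Outworld and MutableLong classes and adding a hypothetical reader thread: once the reader's volatile read returns the published reference, the two earlier increments are visible as well.

```java
// The question's classes, plus a reader thread (EffectivelyImmutableDemo is hypothetical)
class MutableLong {
    private long value;

    MutableLong(long value) {
        this.value = value;
    }

    void increment() {
        value++;
    }

    long get() {
        return value;
    }
}

class Outworld {
    public static volatile MutableLong published;
}

class EffectivelyImmutableDemo {
    static long run() {
        Outworld.published = null; // reset so the reader waits for the write below
        long[] seen = new long[1];
        Thread reader = new Thread(() -> {
            MutableLong v;
            while ((v = Outworld.published) == null) {
                Thread.yield();
            }
            seen[0] = v.get(); // writes before the volatile store are visible here
        });
        reader.start();

        MutableLong val = new MutableLong(1);
        val.increment();
        val.increment();
        Outworld.published = val; // volatile write publishes the reference and its state

        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```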
There are a couple of conditions which must be met for the Java Memory Model to guarantee that Outworld.published.get() == 3:
the snippet of code you posted which creates and increments the MutableLong, then sets the Outworld.published field, must happen with visibility between the steps. One way to achieve this trivially is to have all that code running in a single thread - guaranteeing "as-if-serial semantics". I assume that's what you intended, but thought it worth pointing out.
reads of Outworld.published must have happens-after semantics with respect to the assignment. An example of this could be having the same thread execute Outworld.published = val; and then launch the other threads which could read the value. This would guarantee "as if serial" semantics, preventing reordering of the reads before the assignment.
If you are able to provide those guarantees, then the JMM will guarantee all threads see Outworld.published.get() == 3.
However, if you're interested in general program design advice in this area, read on.
For the guarantee that no other threads ever see a different value for Outworld.published.get(), you (the developer) have to guarantee that your program does not modify the value in any way, neither by subsequently executing Outworld.published = differentVal; nor by calling Outworld.published.increment();. While that is possible to guarantee, it can be much easier if you design your code to avoid both the mutable object and the use of a static non-final field as a global point of access for multiple threads:
instead of publishing MutableLong, copy the relevant values into a new instance of a different class, whose state cannot be modified. E.g.: introduce the class ImmutableLong, which assigns value to a final field on construction, and doesn't have an increment() method.
instead of multiple threads accessing a static non-final field, pass the object as a parameter to your Callable/Runnable implementations. This will prevent the possibility of one rogue thread from reassigning the value and interfering with the others, and is easier to reason about than static field reassignment. (Admittedly, if you're dealing with legacy code, this is easier said than done).
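Both suggestions can be sketched together. ImmutableLong follows the description above (a final field assigned on construction, no increment()); ReportTask and ImmutableDemo are hypothetical names showing the value being passed to a Callable as a constructor parameter rather than read from a static field.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Immutable replacement for MutableLong: final field, no increment()
final class ImmutableLong {
    private final long value;

    ImmutableLong(long value) {
        this.value = value;
    }

    long get() {
        return value;
    }
}

// The value arrives as a parameter instead of through a static field
class ReportTask implements Callable<Long> {
    private final ImmutableLong input;

    ReportTask(ImmutableLong input) {
        this.input = input;
    }

    @Override
    public Long call() {
        return input.get();
    }
}

class ImmutableDemo {
    static long run() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Long> result = pool.submit(new ReportTask(new ImmutableLong(3)));
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```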
The question is: Does Java Memory Model guarantee that all threads MUST have Outworld.published.get() == 3 ?
The short answer is no, because other threads might read Outworld.published before it has been assigned.
After Outworld.published = val; has been performed, and provided that no further modifications are made to val, yes: it will always be 3.
But if any thread performs val.increment then its value might be different for other threads.
I came across this statement:
In properly constructed objects, all threads will see correct values of final fields, regardless of how the object is published.
Then why is a volatile variable used to safely publish an immutable object?
I'm really confused. Can anybody make it clear with a suitable example?
In this case, the volatility would only ensure visibility of the new object; any other threads that happened to get hold of your object via a non-volatile field would indeed see the correct values of final fields as per JSR-133's initialization safety guarantees.
Still, making the variable volatile doesn't hurt; is correct from a memory management perspective anyway; and would be necessary for non-final fields initialised in a constructor (although there shouldn't be any of these in an immutable object). If you wish to share variables between threads, you'll need to ensure adequate synchronization to give visibility anyway; though in this case you're right, that there's no danger to the atomicity of the constructor.
Thanks to Tom Hawtin for pointing out I'd completely overlooked the JMM guarantees on final fields; previous incorrect answer is given below.
The reason for the volatile variable is that it establishes a happens-before relationship (according to the Java Memory Model) between the construction of the object and the assignment of the variable. This achieves two things:
Subsequent reads of that variable from different threads are guaranteed to see the new value. Without marking the variable as volatile, these threads could see stale values of the reference.
The happens-before relationship places limits on what reorderings the compiler can do. Without a volatile variable, the assignment to the variable could happen before the object's constructor runs - hence other threads could get a reference to the object before it was fully constructed.
Since one of the fundamental rules of immutable objects is that you don't publish references during the constructor, it's this second point that is likely being referenced here. In a multithreaded environment without proper concurrent handling, it is possible for a reference to the object to be "published" before that object has been constructed. Thus another thread could get that object, see that one of its fields is null, and then later see that this "immutable" object has changed.
Note that you don't have to use volatile fields to achieve this if you have other appropriate synchronization primitives - for example, if the assignment (and all later reads) are done in a synchronized block on a given monitor - but in a "standalone" sense, marking the variable as volatile is the easiest way to tell the JVM "this might be read by multiple threads, please make the assignment safe in that context."
A volatile reference to an immutable object could be useful. This would allow you to swap one object for another to make the new data available to other threads.
I would suggest you look at using AtomicReference first, however.
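A minimal sketch of swapping immutable snapshots through an AtomicReference. Config, its fields and ConfigHolder are hypothetical names. AtomicReference.get() and set() have volatile semantics, and updateAndGet() adds an atomic read-modify-write, so readers see either the old snapshot or the new one, never a half-updated object.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable snapshot: all fields final
final class Config {
    final String host;
    final int port;

    Config(String host, int port) {
        this.host = host;
        this.port = port;
    }

    Config withPort(int newPort) {
        return new Config(host, newPort); // a "change" produces a new object
    }
}

class ConfigHolder {
    private final AtomicReference<Config> current =
            new AtomicReference<>(new Config("localhost", 80));

    Config get() {
        return current.get(); // volatile read
    }

    // Atomically replaces the whole snapshot
    Config setPort(int port) {
        return current.updateAndGet(c -> c.withPort(port));
    }
}
```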
If you need final volatile fields, you have a problem. All fields, including final ones, are available to other threads as soon as the constructor returns. So if you pass an object to another thread in the constructor, it is possible for the other thread to see an inconsistent state. IMHO you should consider a different solution so you don't have to do this.
You can't really see the difference in an immutable class. See the example below, in Myclass:
public static Foo getInstance(){
    if(INSTANCE == null){
        INSTANCE = new Foo();
    }
    return INSTANCE;
}
In the above code, if Foo is declared final (final Foo INSTANCE;), it guarantees that it won't publish references during the constructor call. Partial object construction is not possible.
Consider this: if this Myclass is immutable, its state is not going to change after object construction, which makes the volatile keyword (volatile final Foo INSTANCE;) redundant. But if this class allows its object state to be changed (i.e. it is not immutable), multiple threads CAN actually update the object and some updates may not be visible to other threads; hence the volatile keyword ensures safe publication of objects in a non-immutable class.
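For comparison, the unsynchronized getInstance() shown above is not thread-safe as written: two threads can both see null and construct two instances, and a plain field gives no publication guarantee. A commonly used correct variant under the JSR-133 memory model (Java 5 and later) is double-checked locking with a volatile field, sketched below. The names follow the answer, but Foo's body is hypothetical because the answer does not show it.

```java
// Hypothetical Foo; the answer does not show its body
final class Foo {
    private final int value;

    Foo() {
        value = 7;
    }

    int value() {
        return value;
    }
}

class MyClass {
    // volatile is required: the write below must publish a fully constructed Foo
    private static volatile Foo instance;

    static Foo getInstance() {
        Foo local = instance; // single volatile read on the fast path
        if (local == null) {
            synchronized (MyClass.class) {
                local = instance; // re-check while holding the lock
                if (local == null) {
                    local = new Foo();
                    instance = local; // volatile write publishes the object
                }
            }
        }
        return local;
    }
}
```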
The Java language spec defines semantics of final fields in section 17.5:
The usage model for final fields is a simple one. Set the final fields for an object in that object's constructor. Do not write a reference to the object being constructed in a place where another thread can see it before the object's constructor is finished. If this is followed, then when the object is seen by another thread, that thread will always see the correctly constructed version of that object's final fields. It will also see versions of any object or array referenced by those final fields that are at least as up-to-date as the final fields are.
My question is - does the 'up-to-date' guarantee extend to the contents of nested arrays, and nested objects?
In a nutshell: If one thread assigns a mutable object graph to a final field in an object, and the object graph is never updated, can all threads safely read that object graph via the final field?
An example scenario:
Thread A constructs a HashMap of ArrayLists, then assigns the HashMap to final field 'myFinal' in an instance of class 'MyClass'
Thread B sees a (non-synchronized) reference to the MyClass instance and reads 'myFinal', and accesses and reads the contents of one of the ArrayLists
In this scenario, are the members of the ArrayList as seen by Thread B guaranteed to be at least as up to date as they were when MyClass's constructor completed?
I'm looking for clarification of the semantics of the Java Memory Model and language spec, rather than alternative solutions like synchronization. My dream answer would be a yes or no, with a reference to the relevant text.
Updates:
I'm interested in the semantics of Java 1.5 and above, i.e. with the updated Java Memory Model introduced via JSR 133. The 'up-to-date' guarantee on final fields was introduced in this update.
In this scenario, are the members of the ArrayList as seen by Thread B guaranteed to be at least as up to date as they were when MyClass's constructor completed?
Yes, they are.
A thread is required to read memory when it encounters a reference for the first time. Because the hash map is constructed in the constructor, all entries in it are brand new, so the references to objects are as up to date as they were when the constructor finished.
After that initial encounter, the usual visibility rules apply. So when another thread changes a non-final field of an object reachable through the final reference, this thread may not see that change, but it will still see the reference that came out of the constructor.
In reality, it means that if you do not modify final hash-map after the constructor, its contents are constants for all threads.
EDIT
I knew that I've seen this guarantee somewhere before.
Here is a paragraph of interest from this article that describes JSR 133
Initialization safety
The new JMM also seeks to provide a new guarantee of initialization safety -- that as long as an object is properly constructed (meaning that a reference to the object is not published before the constructor has completed), then all threads will see the values for its final fields that were set in its constructor, regardless of whether or not synchronization is used to pass the reference from one thread to another. Further, any variables that can be reached through a final field of a properly constructed object, such as fields of an object referenced by a final field, are also guaranteed to be visible to other threads as well. This means that if a final field contains a reference to, say, a LinkedList, in addition to the correct value of the reference being visible to other threads, also the contents of that LinkedList at construction time would be visible to other threads without synchronization. The result is a significant strengthening of the meaning of final -- that final fields can be safely accessed without synchronization, and that compilers can assume that final fields will not change and can therefore optimize away multiple fetches.
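The answer's condition that the final map is never modified after the constructor can be made enforceable rather than a convention. A minimal sketch, assuming a hypothetical Catalog class: the whole graph is built in local variables, wrapped with Collections.unmodifiableList/unmodifiableMap, and only then assigned to the final field. The wrapping is a defensive addition; the final field guarantee itself does not require it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: builds the graph locally, then assigns it to the final field
final class Catalog {
    private final Map<String, List<String>> myFinal;

    Catalog() {
        Map<String, List<String>> local = new HashMap<>();
        List<String> fruits = new ArrayList<>();
        fruits.add("apple");
        fruits.add("pear");
        local.put("fruits", Collections.unmodifiableList(fruits));
        // Assigned last; the unmodifiable views reject later writes
        this.myFinal = Collections.unmodifiableMap(local);
    }

    List<String> get(String key) {
        return myFinal.get(key);
    }
}
```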
If the constructor is written like this, you should have no issue:
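import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class MyClass {
    public final Map<String, List<String>> myFinal;
    public MyClass() {
        Map<String, List<String>> localMap = new HashMap<>();
        localMap.put("key", new ArrayList<>());
        this.myFinal = localMap; // assign only after the map is fully populated
    }
}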
This is because the map is fully initialized before it's assigned to the public reference. Once the constructor completes, the final Map will be up-to-date.