Is it possible for an object to be visible to other threads during its initialization (that is, while initialization is in progress but not yet complete)? If so, could you give a simple example to back up your answer?
This can happen in a number of ways.
You pass your object to another thread in the constructor, e.g. you start a thread in your constructor.
You pass the object to another thread, but the other thread sees old, uninitialized values because the fields are not final or volatile and are not accessed in a locked or synchronized block. Fields without one of these protections are not guaranteed to be thread safe.
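The first case can be made deterministic for illustration. The sketch below uses a hypothetical class, LeakyCounter, whose constructor starts a thread and waits for it before assigning its own field, so the other thread is certain to look at the object while it is still half built. Real code rarely waits like this; the join is only there to make the outcome repeatable.

```java
// Hypothetical demo class: the constructor leaks `this` to a thread it starts
// before it has finished assigning its own fields.
class LeakyCounter {
    static int observed = -1;   // what the started thread saw; -1 means "nothing yet"
    private int value;          // non-final on purpose

    LeakyCounter() {
        Thread peek = new Thread(() -> observed = this.value); // `this` escapes here
        peek.start();
        try {
            peek.join();        // forces the read to complete before the assignment below
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        this.value = 42;        // too late: the other thread has already looked
    }

    int value() {
        return value;
    }
}

class LeakyCounterDemo {
    public static void main(String[] args) {
        LeakyCounter c = new LeakyCounter();
        System.out.println("seen during construction: " + LeakyCounter.observed); // 0
        System.out.println("seen after construction:  " + c.value());             // 42
    }
}
```

The started thread reads the default value 0 because the constructor had not yet reached the assignment when the reference escaped.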
The best case in point would be the notoriously broken double-checked locking idiom. I'll extract from it only the part relevant for this argument. Take this code:
public class Holder { public static File f; }
Somewhere in Thread A you do Holder.f = new File("path"); and elsewhere in Thread B you do File xxf = Holder.f; and proceed to use it. There is no guarantee that, even if you read the reference to Holder.f, any field of the File instance will be in any defined state. You may read all nulls, (zeros, falses, depending on type), as well as any combination of non-null and null values.
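One way to repair the Holder example, sketched here under the assumption that declaring the field volatile is acceptable, is to let the volatile write act as the publication point. SafeHolder and SafeHolderDemo are illustrative names, not part of the original code.

```java
import java.io.File;

// Same shape as Holder, but the static field is volatile, so the write in
// thread A happens-before any read in thread B that observes the new reference.
class SafeHolder {
    static volatile File f;
}

class SafeHolderDemo {
    // Returns the path thread B observed once it saw a non-null reference.
    static String publishAndRead(String path) {
        SafeHolder.f = null;
        String[] seen = new String[1];
        Thread b = new Thread(() -> {
            File local;
            while ((local = SafeHolder.f) == null) {  // volatile read on every iteration
                Thread.yield();
            }
            seen[0] = local.getPath();                // the File's fields are fully visible here
        });
        Thread a = new Thread(() -> SafeHolder.f = new File(path));
        b.start();
        a.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead("path"));
    }
}
```

With the plain (non-volatile) field from the original Holder, the spin loop in thread B would not even be guaranteed to terminate, let alone see an initialized File.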
The Javadoc in Guava's ImmutableList says that the class has the properties of Guava's ImmutableCollection, one of which is thread safety:
Thread safety. It is safe to access this collection concurrently from multiple threads.
But look at how the ImmutableList is built by its Builder - The Builder keeps all elements in a Object[] (that's okay since no one said that the builder was thread safe) and upon construction passes that array (or possibly a copy) to the constructor of RegularImmutableList:
public abstract class ImmutableList<E> extends ImmutableCollection<E>
implements List<E>, RandomAccess {
...
static <E> ImmutableList<E> asImmutableList(Object[] elements, int length) {
switch (length) {
case 0:
return of();
case 1:
return of((E) elements[0]);
default:
if (length < elements.length) {
elements = Arrays.copyOf(elements, length);
}
return new RegularImmutableList<E>(elements);
}
}
...
public static final class Builder<E> extends ImmutableCollection.Builder<E> {
Object[] contents;
...
public ImmutableList<E> build() { //Builder's build() method
forceCopy = true;
return asImmutableList(contents, size);
}
...
}
}
What does RegularImmutableList do with these elements? What you'd expect: it simply initializes its internal array, which is then used for all read operations:
class RegularImmutableList<E> extends ImmutableList<E> {
final transient Object[] array;
RegularImmutableList(Object[] array) {
this.array = array;
}
...
}
How can this be thread safe? What guarantees the happens-before relationship between the writes performed in the Builder and the reads from RegularImmutableList?
According to the Java memory model there is a happens-before relationship in only five cases (from the Javadoc for java.util.concurrent):
Each action in a thread happens-before every action in that thread that comes later in the program's order.
An unlock (synchronized block or method exit) of a monitor happens-before every subsequent lock (synchronized block or method entry) of that same monitor. And because the happens-before relation is transitive, all actions of a thread prior to unlocking happen-before all actions subsequent to any thread locking that monitor.
A write to a volatile field happens-before every subsequent read of that same field. Writes and reads of volatile fields have similar memory consistency effects as entering and exiting monitors, but do not entail mutual exclusion locking.
A call to start on a thread happens-before any action in the started thread.
All actions in a thread happen-before any other thread successfully returns from a join on that thread.
None of these seem to apply here. If some thread builds the list and passes its reference to some other threads without using locks (for example via a final or volatile field), I don't see what guarantees thread-safety. What am I missing?
Edit:
Yes, the write of the reference to the array is thread-safe on account of it being final. So that's clearly thread safe.
What I was wondering about were the writes of the individual elements. The elements of the array are neither final nor volatile. Yet they seem to be written by one thread and read by another without synchronization.
So the question can be boiled down to "if thread A writes to a final field, does that guarantee that other threads will see not just that write but all of A's previous writes as well?"
The JMM guarantees safe initialization (all values initialized in the constructor will be visible to readers) if all fields in the object are final and this does not leak from the constructor [1]:
class RegularImmutableList<E> extends ImmutableList<E> {
final transient Object[] array;
^
RegularImmutableList(Object[] array) {
this.array = array;
}
}
The final field semantics guarantee that readers will see an up-to-date array:
The effects of all initializations must be committed to memory before
any code after constructor publishes the reference to the newly
constructed object.
Thank you to @JBNizet and to @chrylis for the link to the JLS.
1 - "If this is followed, then when the object is seen by another thread, that thread will always see the correctly constructed version of that object's final fields. It will also see versions of any object or array referenced by those final fields that are at least as up-to-date as the final fields are." - JLS §17.5.
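A minimal sketch of what that guarantee covers, using a hypothetical FrozenList that mirrors the shape of RegularImmutableList. The static field shared is deliberately plain, so the publication is a data race; under JLS §17.5 a reader may still see null, but a reader that does see the list also sees the array elements written before the constructor finished. This assumes nobody writes to the array after construction.

```java
// Hypothetical stand-in for RegularImmutableList.
final class FrozenList {
    private final Object[] array;          // final: covered by the freeze at the end of the constructor

    FrozenList(Object[] array) {
        this.array = array;                // `this` does not escape the constructor
    }

    Object get(int i) {
        return array[i];
    }

    int size() {
        return array.length;
    }
}

class FrozenListDemo {
    static FrozenList shared;              // deliberately neither final nor volatile

    static FrozenList build() {
        Object[] contents = new Object[2];
        contents[0] = "a";                 // element writes precede the constructor call
        contents[1] = "b";
        return new FrozenList(contents);
    }

    // A single racy read: either the list is not visible yet, or it is
    // visible together with the elements reachable through its final field.
    static String readOnce() {
        FrozenList l = shared;
        return l == null ? "not published yet" : l.get(0) + "," + l.get(1);
    }

    public static void main(String[] args) {
        new Thread(() -> shared = build()).start();
        System.out.println(readOnce());
    }
}
```

Which of the two results main prints depends on timing; the point is that "a,null" or "null,null" is not among the allowed outcomes.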
As you stated: "Each action in a thread happens-before every action in that thread that comes later in the program's order."
Obviously, if a thread could somehow access the object before the constructor was even invoked, you would be screwed. So something must prevent the object from being accessed before its constructor returns. But once the constructor returns, anything that lets another thread access the object is safe because it happens after in the constructing thread's program order.
Basic thread safety with any shared object is accomplished by ensuring that whatever allows threads to access the object does not take place until the constructor returns, establishing that anything the constructor might do happens before any other thread might access the object.
The flow is:
1. The object does not exist and cannot be accessed.
2. Some thread calls the object's constructor (or does whatever else is needed to get the object ready to be used).
3. That thread then does something to allow other threads to access the object.
4. Other threads can now access the object.
Program order of the thread invoking the constructor ensures that no part of 4 happens until all of 2 is done.
Note that this applies just the same if things need to be done after the constructor returns; you can just consider them logically part of the construction process. Similarly, parts of the job can be done by other threads, so long as anything that needs to see work done by another thread cannot start until some relationship is established with the work that other thread did.
Does that not 100% answer your question?
To restate:
How can this be thread safe? What guarantees the happens-before relationship between the writes performed in the Builder and the reads from RegularImmutableList?
The answer is whatever prevented the object from being accessed before the constructor was even called (which has to be something, otherwise we'd be completely screwed) continues to prevent the object from being accessed until after the constructor returns. The constructor is effectively an atomic operation because no other thread could possibly attempt to access the object while it's running. Once the constructor returns, whatever the thread that called the constructor does to allow other threads to access the object necessarily takes place after the constructor returns because, "[e]ach action in a thread happens-before every action in that thread that comes later in the program's order."
And, one more time:
If some thread builds the list and passes its reference to some other threads without using locks (for example via a final or volatile field), I don't see what guarantees thread-safety. What am I missing?
The thread first builds the list and then next passes its reference. The building of the list "happens-before every action in that thread that comes later in the program's order" and thus happens-before the passing of the reference. Thus any thread that sees the passing of the reference happens-after the building of the list completed.
Were this not the case, there would be no good way to construct an object in one thread and then give other threads access to it. But this is perfectly safe to do because whatever method you use to hand the object from one thread to another will establish the necessary relationship.
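As an illustration of a hand-off method that establishes the relationship, here is a sketch that passes a freshly built list through a BlockingQueue. The choice of queue is an assumption made for the example; the java.util.concurrent documentation states that actions prior to placing an object into a concurrent collection happen-before actions subsequent to its removal in another thread. HandOffDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class HandOffDemo {
    // Builds a list in one thread and hands it to the caller through a queue.
    static List<String> buildAndHandOff() {
        BlockingQueue<List<String>> channel = new ArrayBlockingQueue<>(1);
        Thread builder = new Thread(() -> {
            List<String> list = new ArrayList<>();   // construct the object
            list.add("x");
            list.add("y");
            channel.add(list);                       // make it reachable by other threads
        });
        builder.start();
        try {
            return channel.take();                   // another thread now accesses it
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildAndHandOff());       // [x, y]
    }
}
```

Program order covers the builder thread's own steps; the edge between the two threads comes from the queue, which is why the ArrayList needs no synchronization of its own here.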
You are talking about two different things in here.
Access to an already built RegularImmutableList and its array is thread safe because there won't be any concurrent writes and reads to that array, only concurrent reads.
The threading issue can arise when you pass it to another thread. That has nothing to do with RegularImmutableList itself, only with how other threads see the reference to it.
Let's say one thread creates a RegularImmutableList and passes its reference to another thread. For the other thread to see that the reference has been updated and now points to the newly created RegularImmutableList, you will need to use either synchronization or volatile.
EDIT:
I think the OP's concern is how the JMM makes sure that whatever the building thread wrote into the array after creating it becomes visible to other threads once the reference is passed to them.
This happens through the use of volatile or synchronization. For example, when the writer thread assigns the RegularImmutableList to a volatile variable, the JMM makes sure that all writes to the array are flushed to main memory, and when another thread reads that variable, the JMM makes sure that it sees all the flushed writes.
public class Test {
    private MyObj myobj = new MyObj(); // it is not volatile
    public class Updater extends Thread {
        @Override
        public void run() {
            myobj = getNewObjFromDb(); // now setting a new object
        }
    }
    public MyObj getData() {
        // getting stale data is fine
        return myobj;
    }
}
Updater regularly updates myobj.
Other classes fetch data using getData.
Is this code thread safe without using the volatile keyword?
I think yes. Can someone confirm?
No, this is not thread safe. (What makes you think it is?)
If you are updating a variable in one thread and reading it from another, you must establish a happens-before relationship between the write and the subsequent read.
In short, this basically means making both the read and write synchronized (on the same monitor), or making the reference volatile.
Without that, there are no guarantees that the reading thread will see the update - and it wouldn't even be as simple as "well, it would either see the old value or the new value". Your reader threads could see some very odd behaviour with the data corruption that would ensue. Look at how lack of synchronization can cause infinite loops, for example (the comments to that article, especially Brian Goetz', are well worth reading):
The moral of the story: whenever mutable data is shared across threads, if you don’t use synchronization properly (which means using a common lock to guard every access to the shared variables, read or write), your program is broken, and broken in ways you probably can’t even enumerate.
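The infinite-loop hazard mentioned above is usually shown with a stop flag. The sketch below uses a hypothetical StoppableWorker and makes the flag volatile, which is one of the two fixes named in this answer; with a plain boolean, the JIT would be allowed to read the flag once and loop forever.

```java
class StoppableWorker implements Runnable {
    private volatile boolean running = true;   // volatile: the loop must re-read it each time
    private long iterations;

    @Override
    public void run() {
        while (running) {
            iterations++;
        }
    }

    void stop() {
        running = false;                       // volatile write, visible to the worker's next read
    }

    // Starts the worker, asks it to stop, and reports whether it terminated in time.
    static boolean startThenStop(long timeoutMillis) {
        StoppableWorker worker = new StoppableWorker();
        Thread t = new Thread(worker);
        t.start();
        worker.stop();
        try {
            t.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(startThenStop(5_000));
    }
}
```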
No, it isn't.
Without volatile, calling getData() from a different thread may return a stale cached value.
volatile forces assignments from one thread to be visible on all other threads immediately.
Note that if the object itself is not immutable, you are likely to have other problems.
You may get a stale reference. You may not get an invalid reference.
The reference you get is the value of the reference to an object that the variable points to or pointed to or will point to.
Note that there are no guarantees about how stale the reference may be, but it's still a reference to some object and that object still exists. In other words, writing a reference is atomic (nothing can happen during the write) but not synchronized (it is subject to instruction reordering, thread-local caching and so on).
If you declare the reference as volatile, you create a synchronization point around the variable. Simply speaking, that means that all cache of the accessing thread is flushed (writes are written and reads are forgotten).
The only types that are not guaranteed atomic reads and writes are non-volatile long and double, because they are wider than 32 bits and a 32-bit machine may write them in two halves.
If MyObj is immutable (all fields are final), you don't need volatile.
The big problem with this sort of code is the lazy initialization. Without volatile or synchronized keywords, you could assign a new value to myobj that had not been fully initialized. The Java memory model allows for part of an object construction to be executed after the object constructor has returned. This re-ordering of memory operations is why the memory-barrier is so critical in multi-threaded situations.
Without a memory-barrier limitation, there is no happens-before guarantee so you do not know if the MyObj has been fully constructed. This means that another thread could be using a partially initialized object with unexpected results.
Here are some more details around constructor synchronization:
Constructor synchronization in Java
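For the lazy-initialization case, a commonly used repair is double-checked locking with a volatile field, which is valid under the Java 5 and later memory model. The class below is a hypothetical sketch (LazyConfig and its "default" value are made up for illustration), not the code from the question.

```java
class LazyConfig {
    private static volatile LazyConfig instance;   // volatile is what makes the idiom sound
    private final String name;

    private LazyConfig(String name) {
        this.name = name;
    }

    static LazyConfig get() {
        LazyConfig local = instance;               // single volatile read on the fast path
        if (local == null) {
            synchronized (LazyConfig.class) {
                local = instance;                  // re-check under the lock
                if (local == null) {
                    local = new LazyConfig("default");
                    instance = local;              // volatile write publishes the fully built object
                }
            }
        }
        return local;
    }

    String name() {
        return name;
    }

    public static void main(String[] args) {
        System.out.println(get() == get());        // true
    }
}
```

Without volatile on instance, a second thread could see a non-null reference to an object whose constructor writes are not yet visible, which is exactly the partial-initialization problem described above.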
volatile makes updates to the reference itself visible, but it adds nothing beyond that. Since myobj behaves like a cached object, an AtomicReference would also work. Because your code fetches the value from the DB, I'll leave that part as is and add the AtomicReference to it.
import java.util.concurrent.atomic.AtomicReference;
public class AtomicReferenceTest {
private AtomicReference<MyObj> myobj = new AtomicReference<MyObj>();
public class Updater extends Thread {
public void run() {
MyObj newMyobj = getNewObjFromDb();
updateMyObj(newMyobj);
}
public void updateMyObj(MyObj newMyobj) {
myobj.set(newMyobj); // unconditional publish; a compareAndSet against myobj.get() could silently drop the update under contention
}
}
public MyObj getData() {
return myobj.get();
}
}
class MyObj {
}
Will the following code cause the same problems as it would if the variable 'commonSet' were a class-level field instead of a local variable of this method? If it were a class-level field, I would have to wrap the add operation in a synchronized block, since HashSet is not thread safe. Should I do the same in the following code, since multiple threads add to the set and even the current thread may go on to mutate it?
public void threadCreatorFunction(final String[] args) {
final Set<String> commonSet = new HashSet<String>();
final Runnable runnable = new Runnable() {
@Override
public void run() {
while (true) {
commonSet.add(newValue());
}
}
};
new Thread(runnable, "T_A").start();
new Thread(runnable, "T_B").start();
}
The reference to 'commonSet' is 'locked' by using final. But multiple threads operating on it can still corrupt the values in the set (it may contain duplicates?). Secondly, my confusion is this: since 'commonSet' is a method-level variable, will the same reference be on the stack memory of the calling method (threadCreatorFunction) and on the stack memory of the run methods? Is this correct?
There are quite a few questions related to this:
Why do variables passed to runnable need to be final?
Why are only final variables accessible in anonymous class?
But I cannot see them addressing the thread-safety aspect of sharing or passing mutable objects this way.
No, this is absolutely not thread-safe. Having it in a final variable means that both threads will see the same reference, which is fine, but it doesn't make the object any more thread-safe.
Either you need to synchronize access, or use ConcurrentSkipListSet.
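Both options can be sketched side by side. The helper below (SharedSetDemo is a hypothetical name) runs the same Runnable on two threads, as in the question, but with a bounded loop so that it terminates, and takes the set implementation as a parameter.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

class SharedSetDemo {
    // Two threads add the same 1,000 values to the given set; returns the final size.
    static int fillFromTwoThreads(final Set<String> commonSet) {
        final Runnable runnable = () -> {
            for (int i = 0; i < 1_000; i++) {
                commonSet.add("v" + i);
            }
        };
        Thread a = new Thread(runnable, "T_A");
        Thread b = new Thread(runnable, "T_B");
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return commonSet.size();
    }

    public static void main(String[] args) {
        // A concurrent collection.
        System.out.println(fillFromTwoThreads(new ConcurrentSkipListSet<String>()));                  // 1000
        // A plain HashSet with every access synchronized by a wrapper.
        System.out.println(fillFromTwoThreads(Collections.synchronizedSet(new HashSet<String>()))); // 1000
    }
}
```

Passing a bare HashSet instead gives no such guarantee: the size may come out wrong, or an add may fail or hang while the table is being resized.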
An interesting example.
The reference commonSet is thread safe and immutable. It is on the stack for the first thread and a field of your anonymous Runnable class as well. (You can see this in a debugger)
The set commonSet refers to is mutable and not thread safe. You need to use synchronized, or a Lock to make it thread safe. (Or use a thread safe collection instead)
I think you're missing a word in your first sentence:
Will the following code cause same problems if variable 'commonSet' of this method was a ??? instead a class level field.
I think you're a little bit confused though. The concurrency issues have nothing to do with whether or not the reference to your mutable data structure is declared final. You need to declare the reference as final because you're closing over it inside the anonymous inner class declaration for your Runnable. If you're actually going to have multiple threads reading/writing the data structure then you need to either use locks (synchronize) or use a concurrent data structure like java.util.concurrent.ConcurrentHashMap.
The commonSet is shared among two Threads. You have declared it as final and thus you made the reference immutable (you can not re-assign it), but the actual data inside the Set is still mutable. Suppose that one Thread puts some data in and some other Thread reads some data out. Whenever the first thread puts data in, you most probably want to lock that Set so that no other Thread could read until that data is written. Does that happen with a HashSet? Not really.
As others have already commented, you are mistaking some concepts, like final and synchronized.
I think that if you explain what you want to accomplish with your code, it would be much easier to help you. I've got the impression that this code snippet is more an example than the actual code.
Some questions: Why is the set defined inside the function? Should it be shared among threads? Something that puzzles me is that you create two threads with the same instance of the runnable:
new Thread(runnable, "T_A").start();
new Thread(runnable, "T_B").start();
Whether commonSet is used by a single thread or by multiple threads, final only makes the reference immutable (once assigned, you cannot assign another object reference to it); you can still modify the contents of the referenced object through that reference.
If it were not final one thread could have initialized it again and changed the reference
commonSet = new HashSet<String>();
commonSet.add(newValue());
in which case these two threads may use two different commonsets which is probably not what you want
I want to make sure that I correctly understand the 'Effectively Immutable Objects' behavior according to Java Memory Model.
Let's say we have a mutable class which we want to publish as an effectively immutable object:
class Outworld {
// This MAY be accessed by multiple threads
public static volatile MutableLong published;
}
// This class is mutable
class MutableLong {
private long value;
public MutableLong(long value) {
this.value = value;
}
public void increment() {
value++;
}
public long get() {
return value;
}
}
We do the following:
// Create a mutable object and modify it
MutableLong val = new MutableLong(1);
val.increment();
val.increment();
// No more modifications
// UPDATED: Let's say for this example we are completely sure
// that no one will ever call increment() from now on
// Publish it safely and consider Effectively Immutable
Outworld.published = val;
The question is:
Does Java Memory Model guarantee that all threads MUST have Outworld.published.get() == 3 ?
According to Java Concurrency In Practice this should be true, but please correct me if I'm wrong.
3.5.3. Safe Publication Idioms
To publish an object safely, both the reference to the object and the
object's state must be made visible to other threads at the same time.
A properly constructed object can be safely published by:
- Initializing an object reference from a static initializer;
- Storing a reference to it into a volatile field or AtomicReference;
- Storing a reference to it into a final field of a properly constructed object; or
- Storing a reference to it into a field that is properly guarded by a lock.
3.5.4. Effectively Immutable Objects
Safely published effectively immutable objects can be used safely by
any thread without additional synchronization.
Yes. The writes to the MutableLong happen-before the volatile write to Outworld.published, which in turn happens-before any subsequent read of that field.
(It is possible that a thread reads Outworld.published and passes it on to another thread unsafely. In theory, that could see earlier state. In practice, I don't see it happening.)
There are a couple of conditions which must be met for the Java Memory Model to guarantee that Outworld.published.get() == 3:
The snippet of code you posted, which creates and increments the MutableLong and then sets the Outworld.published field, must happen with visibility between the steps. One way to achieve this trivially is to have all that code running in a single thread, guaranteeing "as-if-serial semantics". I assume that's what you intended, but thought it worth pointing out.
Reads of Outworld.published must have happens-after semantics from the assignment. An example of this could be having the same thread execute Outworld.published = val; and then launch the other threads which could read the value. This would guarantee "as-if-serial" semantics, preventing re-ordering of the reads before the assignment.
If you are able to provide those guarantees, then the JMM will guarantee all threads see Outworld.published.get() == 3.
However, if you're interested in general program design advice in this area, read on.
For the guarantee that no other threads ever see a different value for Outworld.published.get(), you (the developer) have to guarantee that your program does not modify the value in any way, either by subsequently executing Outworld.published = differentVal; or by calling Outworld.published.increment();. While that is possible to guarantee, it can be so much easier if you design your code to avoid both the mutable object and the use of a static non-final field as a global point of access for multiple threads:
instead of publishing MutableLong, copy the relevant values into a new instance of a different class, whose state cannot be modified. E.g.: introduce the class ImmutableLong, which assigns value to a final field on construction, and doesn't have an increment() method.
instead of multiple threads accessing a static non-final field, pass the object as a parameter to your Callable/Runnable implementations. This will prevent the possibility of one rogue thread from reassigning the value and interfering with the others, and is easier to reason about than static field reassignment. (Admittedly, if you're dealing with legacy code, this is easier said than done).
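Both suggestions can be combined in a few lines. This is a sketch only: ImmutableLong is the class name proposed above, while ImmutableLongDemo and readInAnotherThread are hypothetical names introduced for the example.

```java
// Immutable snapshot: the value is assigned once to a final field and there is no increment().
final class ImmutableLong {
    private final long value;

    ImmutableLong(long value) {
        this.value = value;
    }

    long get() {
        return value;
    }
}

class ImmutableLongDemo {
    // Copies the final count into an immutable snapshot and hands it to the
    // task directly, instead of going through a static non-final field.
    static long readInAnotherThread(long finalCount) {
        final ImmutableLong snapshot = new ImmutableLong(finalCount);
        final long[] seen = new long[1];
        Thread reader = new Thread(() -> seen[0] = snapshot.get());
        reader.start();                    // start() happens-before the read in the task
        try {
            reader.join();                 // join() makes the task's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(readInAnotherThread(3));   // 3
    }
}
```

Because the task holds its own reference to the snapshot, no other thread can swap in a different object behind its back.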
The question is: Does Java Memory Model guarantee that all threads
MUST have Outworld.published.get() == 3 ?
The short answer is no, because other threads might read Outworld.published before it has been assigned.
Once Outworld.published = val; has been performed, and provided that no further modifications are made to val, then yes, it will always be 3.
But if any thread calls val.increment(), its value might be different for other threads.
The situation is the following:
I have an object with lots of setters and getters.
An instance of this object is created in one particular thread, where all values are set. Initially I create an "empty" object using a new statement, and only then do I call some setter methods based on some complicated legacy logic.
Only then does this object become available to all other threads, which use only getters.
The question: Do I have to make all variables of this class volatile or not?
Concerns:
Creation of a new instance of the object and setting all its values are separated in time.
But all other threads have no idea about this new instance until all values are set. So other threads cannot have a cached copy of a not fully initialized object. Is that right?
Note: I am aware of the builder pattern, but I cannot apply it here for several other reasons :(
EDITED:
As I feel the two answers from Mathias and axtavt do not match very well, I would like to add an example:
Let's say we have a foo class:
class Foo {
public int x=0;
}
and two threads are using it as described above:
// Thread 1 init the value:
Foo f = new Foo();
f.x = 5;
values.add(f); // Publication via thread-safe collection like Vector or Collections.synchronizedList(new ArrayList(...)) or ConcurrentHashMap?.
// Thread 2
if (values.size()>0){
System.out.println(values.get(0).x); // always 5 ?
}
As I understood Mathias, it can print out 0 on some JVM according to JLS. As I understood axtavt it will always print 5.
What is your opinion?
--
Regards,
Dmitriy
In this case you need to use safe publication idioms when making your object available to other threads, namely (from Java Concurrency in Practice):
Initializing an object reference from a static initializer;
Storing a reference to it into a volatile field or AtomicReference;
Storing a reference to it into a final field of a properly constructed object; or
Storing a reference to it into a field that is properly guarded by a lock.
If you use safe publication, you don't need to declare fields volatile.
However, if you don't use it, declaring fields volatile (theoretically) won't help, because the memory barriers incurred by volatile are one-sided: a volatile write can be reordered with non-volatile actions after it.
So, volatile ensures correctness in the following case:
class Foo {
public int x;
}
volatile Foo foo;
// Thread 1
Foo f = new Foo();
f.x = 42;
foo = f; // Safe publication via volatile reference
// Thread 2
if (foo != null)
System.out.println(foo.x); // Guaranteed to see 42
but doesn't work in this case:
class Foo {
public volatile int x;
}
Foo foo;
// Thread 1
Foo f = new Foo();
// Volatile doesn't prevent reordering of the following actions!!!
f.x = 42;
foo = f;
// Thread 2
if (foo != null)
System.out.println(foo.x); // NOT guaranteed to see 42,
// since f.x = 42 can happen after foo = f
From the theoretical point of view, in the first sample there is a transitive happens-before relationship
f.x = 42 happens before foo = f happens before read of foo.x
In the second example f.x = 42 and read of foo.x are not linked by happens-before relationship, therefore they can be executed in any order.
You do not need to declare your field volatile if its value is set before the start method is called on the threads that read the field.
The reason is that in that case the setting is in a happens-before relation (as defined in the Java Language Specification) with the read in the other thread.
The relevant rules from the JLS are:
Each action in a thread happens-before every action in that thread that comes later in the program's order
A call to start on a thread happens-before any action in the started thread.
However, if you start the other threads before setting the field, then you must declare the field volatile. The JLS does not allow you to assume that the thread will not cache the value before it reads it for the first time, even if that may be the case on a particular version of the JVM.
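The safe ordering can be sketched as follows. StartOrderingDemo is a hypothetical name, and Foo is redeclared locally with the same plain int field as in the question.

```java
class StartOrderingDemo {
    static class Foo {
        int x;                             // plain, non-volatile field
    }

    // Sets the field first, then starts the reading thread.
    static int readAfterStart() {
        final Foo f = new Foo();
        f.x = 5;                           // program order: this comes before start()
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = f.x);
        reader.start();                    // start() happens-before every action in reader
        try {
            reader.join();                 // join() lets this thread see seen[0]
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(readAfterStart());   // 5
    }
}
```

Moving the assignment f.x = 5 below reader.start() would remove the happens-before edge, and the reader could then legally observe 0.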
In order to fully understand what's going on I have been reading about the Java Memory Model (JMM). A useful introduction to the JMM can be found in Java Concurrency in Practice.
I think the answer to the question is: yes, in the example given making the members of the object volatile is NOT necessary. However, this implementation is rather brittle as this guarantee depends on the exact ORDER in which things are done and on the Thread-Safety of the Container. A builder pattern would be a much better option.
Why is it guaranteed:
The thread 1 does all the assignment BEFORE putting the value into the thread safe container.
The add method of the thread safe container must use some synchronization construct like volatile read / write, lock or synchronized(). This guarantees two things:
Instructions which come before the synchronization in thread 1 will actually be executed before it. That is, the JVM is not allowed to reorder instructions across the synchronization instruction for optimization purposes. This is called a happens-before guarantee.
All writes which happen before the synchronization in thread 1 will afterwards be visible to all other threads.
The objects are NEVER modified after publication.
However, if the container was not thread safe, or the order of things was changed by somebody not aware of the pattern, or the objects are changed accidentally after publication, then there are no guarantees anymore. So following the Builder Pattern, as can be generated by Google AutoValue or FreeBuilder, is much safer.
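For comparison, this is roughly what a hand-written builder looks like. It is a sketch with hypothetical names (ImmutableFoo, label), not the code that AutoValue or FreeBuilder would generate.

```java
final class ImmutableFoo {
    private final int x;
    private final String label;

    private ImmutableFoo(Builder b) {
        this.x = b.x;                      // all state is copied into final fields
        this.label = b.label;
    }

    int x() {
        return x;
    }

    String label() {
        return label;
    }

    static Builder builder() {
        return new Builder();
    }

    // The builder is mutable and is meant to stay confined to the thread that fills it in.
    static final class Builder {
        private int x;
        private String label = "";

        Builder x(int x) {
            this.x = x;
            return this;
        }

        Builder label(String label) {
            this.label = label;
            return this;
        }

        ImmutableFoo build() {
            return new ImmutableFoo(this);
        }
    }

    public static void main(String[] args) {
        ImmutableFoo f = ImmutableFoo.builder().x(5).label("demo").build();
        System.out.println(f.x() + " " + f.label());   // 5 demo
    }
}
```

The complicated setter logic runs against the Builder; the object that other threads see has only final fields and no setters, so there is no ordering for a later maintainer to get wrong.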
This article on the topic is also quite good:
http://tutorials.jenkov.com/java-concurrency/volatile.html