Immutable Objects and Initialization Safety

The below example is from the book "Java Concurrency in Practice" by Brian Goetz, Chapter 3, Section 3.5.1. This is an example of Improper publication of objects:
class SomeClass {
    public Holder holder;

    public void initialize() {
        holder = new Holder(42);
    }
}

public class Holder {
    private int n;

    public Holder(int n) { this.n = n; }

    public void assertSanity() {
        if (n != n)
            throw new AssertionError("This statement is false");
    }
}
It says that the Holder could appear to another thread in an inconsistent state and another thread could observe a partially constructed object. How can this happen? Could you give a scenario using the above example?
Also it goes on to say that there are cases when a thread may see a stale value the first time it reads a field and then a more up to date value the next time, which is why the assertSanity can throw AssertionError. How can the AssertionError be thrown?
From further reading, one way to fix this problem is to make Holder immutable by making the variable n final. For now, let us assume that Holder is not immutable but effectively immutable.
To safely publish this object, do we have to make holder initialization static and declare it as volatile (both static initialization and volatile or just volatile)?
Something like this:
public class SomeClass {
    public static volatile Holder holder = new Holder(42);
}

You can imagine that creating an object involves a number of steps that are not atomic as a whole. You want to initialize the Holder and publish it, but you also need to initialize all of its private member fields.
The JMM has no rule requiring the writes to the holder's member fields to happen-before the write to the holder field performed in initialize(). What that means is that even though holder is not null, it is legal for the member fields to not yet be visible to other threads.
You may end up seeing something like
public class Holder {
    String someString = "foo";
    int someInt = 10;
}
holder may not be null but someString could be null and someInt could be 0.
Under the x86 architecture this is, as far as I know, unlikely to be observed in practice, but that may not be the case on other architectures.
So the next question may be "Why does volatile fix this?" The JMM says that all writes that happen prior to a volatile store are visible to any thread that subsequently reads that volatile field.
So if holder is volatile and you see that holder is not null, then by the volatile rules all of its fields must have been initialized.
To safely publish this object, do we have to make holder
initialization static and declare it as volatile
Yes, because, as I mentioned, if the holder variable is not null then all of the writes that preceded the volatile store will be visible.
How can the AssertionError be thrown?
If a thread sees that holder is not null and calls assertSanity, the first read of n may return 0 (the default value), while the second read may see the value written by the constructing thread. In that case n != n evaluates to true and the AssertionError is thrown.
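To make the race concrete, here is a minimal stress harness (the class name UnsafePublication and the spin loop are my own illustration, not from the book); the JMM permits the assertion to fire, although on many JVMs, especially on x86, you may never actually observe it:
public class UnsafePublication {
    static Holder holder;                        // plain field: unsafe publication

    public static void main(String[] args) {
        Thread reader = new Thread(() -> {
            // Busy-wait until some value of holder becomes visible; without
            // volatile this loop is itself only illustrative and may spin forever.
            while (holder == null) { }
            holder.assertSanity();               // the JMM permits this to throw AssertionError
        });
        reader.start();
        holder = new Holder(42);                 // published with no happens-before edge
    }
}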

public class Holder {
    private int n;

    public Holder(int n) { this.n = n; }

    public void assertSanity() {
        if (n != n)
            throw new AssertionError("This statement is false");
    }
}
Say one thread creates an instance of Holder, and passes the reference to another thread, which calls assertSanity.
The assignment to this.n in the constructor occurs in one thread. And two reads of n occur in another thread. The only happens-before relation here is between the two reads. There is no happens-before relation involving the assignment and any of the reads.
Without any happens-before relations, statements can be reordered in various ways, so from the perspective of the reading thread, the write this.n = n can appear to occur after the constructor has returned.
This means that the assignment can appear to occur in the second thread after the first read and before the second, resulting in inconsistent values. This can be prevented by making n final, which guarantees that the write is visible to any thread that later sees the Holder reference.
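A sketch of that fix, which is essentially the corrected Holder the book describes:
public class Holder {
    // final: the JMM guarantees this write is visible to any thread that obtains
    // the Holder reference, provided 'this' does not escape the constructor
    private final int n;

    public Holder(int n) { this.n = n; }

    public void assertSanity() {
        if (n != n)
            throw new AssertionError("This statement is false");
    }
}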

The problem you ask about is caused by JVM optimizations and by the fact that a simple object creation
MyClass obj = new MyClass();
is not always performed in these steps:
1. Reserve memory for the new instance of MyClass on the heap
2. Execute the constructor to set the values of the internal fields
3. Set the 'obj' reference to the address on the heap
For optimization purposes the JVM is allowed to do it in this order instead:
1. Reserve memory for the new instance of MyClass on the heap
2. Set the 'obj' reference to the address on the heap
3. Execute the constructor to set the values of the internal fields
So imagine two threads want to access the MyClass object. The first one creates it, but due to the JVM it executes the 'optimized' set of steps. If it has executed only steps 1 and 2 (but not yet 3), we can have a serious problem: if the second thread uses the object (it won't be null, because the reference already points to the reserved memory on the heap), its fields will hold incorrect values, which can lead to nasty things.
This reordering cannot happen if the reference is declared volatile.
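A minimal sketch of that fix applied to this answer's example (the Publisher class is just an illustrative name):
class MyClass {
    String someString = "foo";
    int someInt = 10;
}

class Publisher {
    // volatile forbids the "optimized" ordering above: the constructor's writes
    // must become visible before the reference itself does
    static volatile MyClass obj;

    static void publish() {
        obj = new MyClass();
    }
}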

The Holder class is OK, but the class SomeClass can appear in an inconsistent state: between creation and the call to initialize(), the holder instance variable is null.

Related

Thread Safety in Java Using Atomic Variables

I have a Java class, here's its code:
public class MyClass {
    private AtomicInteger currentIndex;
    private List<String> list;

    MyClass(List<String> list) {
        this.list = list; // list is initialized only one time in this constructor and is not modified anywhere in the class
        this.currentIndex = new AtomicInteger(0);
    }

    public String select() {
        return list.get(currentIndex.getAndIncrement() % list.size());
    }
}
Now my question:
Is this class really thread-safe thanks to using an AtomicInteger alone, or must there be an additional thread-safety mechanism (for example locks) to ensure thread safety?
The use of currentIndex.getAndIncrement() is perfectly thread-safe. However, you need a change to your code to make it thread-safe in all circumstances.
The fields currentIndex and list need to be made final to achieve full thread-safety, even on unsafe publication of the reference to your MyClass object.
private final AtomicInteger currentIndex;
private final List<String> list;
In practice, if you always ensure that your MyClass object itself is safely published, for example if you create it on the main thread, before any of the threads that use it are started, then you don't need the fields to be final.
Safe publication means that the reference to the MyClass object itself is done in a way that has a guaranteed multi-threaded ordering in the Java Memory Model.
It could be that:
All threads that use the reference get it from a field that was initialized by the thread that started them, before their thread was started
All threads that use the reference get it from a method that was synchronized on the same object as the code that set the reference (you have a synchronized getter and setter for the field)
You make the field that contains the reference volatile
The reference was stored in a final field, where that final field was initialized as described in section 17.5 of the JLS.
There are a few more cases, but they are not easily used to publish references.
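As an illustration of the first case above (names other than MyClass are made up): create the instance on the starting thread and hand it to the workers before calling Thread.start(), which establishes the required happens-before edge:
import java.util.Arrays;
import java.util.List;

public class StartupPublication {
    public static void main(String[] args) {
        // MyClass is the class from the question; it is fully constructed first
        List<String> names = Arrays.asList("a", "b", "c");
        MyClass selector = new MyClass(names);

        // Thread.start() happens-before everything the started thread does,
        // so both workers see a correctly initialized 'selector'
        Runnable worker = () -> System.out.println(selector.select());
        new Thread(worker).start();
        new Thread(worker).start();
    }
}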
I think your code contains two bugs.
First, normally when you receive an object from some unknown source, as your constructor does, you make a defensive copy to be certain it is not modified outside of the class.
MyClass(List<String> list) {
    this.list = new ArrayList<String>(list); // defensive copy
}
So if you do this, do you now need to mutate that list anywhere inside the class? If so, the method:
public String select() {
    return list.get(currentIndex.getAndIncrement() % list.size());
}
isn't atomic. What could happen here is that a thread calls getAndIncrement() and then performs the modulus (%). If at that point it is swapped out for another thread that removes an item from the list, the old limit of list.size() will no longer be valid.
I think there's nothing for it but to add synchronized to the whole method:
public synchronized String select() {
    return list.get(currentIndex.getAndIncrement() % list.size());
}
And the same with any other mutator.
(final, as the other answer mentions, is still required on the instance fields.)
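Putting both suggestions together (defensive copy plus final fields), one possible version is sketched below; if the copied list is never mutated, select() can stay unsynchronized:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class MyClass {
    private final AtomicInteger currentIndex = new AtomicInteger(0);
    private final List<String> list;

    MyClass(List<String> list) {
        // defensive copy: callers can no longer mutate our copy behind our back
        this.list = Collections.unmodifiableList(new ArrayList<>(list));
    }

    public String select() {
        // safe without synchronized because 'list' never changes size
        return list.get(currentIndex.getAndIncrement() % list.size());
    }
}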

Visibility effects of synchronization in Java

This article says:
In this noncompliant code example, the Helper class is made immutable
by declaring its fields final. The JMM guarantees that immutable
objects are fully constructed before they become visible to any other
thread. The block synchronization in the getHelper() method guarantees
that all threads that can see a non-null value of the helper field
will also see the fully initialized Helper object.
public final class Helper {
    private final int n;

    public Helper(int n) {
        this.n = n;
    }

    // Other fields and methods, all fields are final
}

final class Foo {
    private Helper helper = null;

    public Helper getHelper() {
        if (helper == null) {            // First read of helper
            synchronized (this) {
                if (helper == null) {    // Second read of helper
                    helper = new Helper(42);
                }
            }
        }
        return helper;                   // Third read of helper
    }
}
However, this code is not guaranteed to succeed on all Java Virtual
Machine platforms because there is no happens-before relationship
between the first read and third read of helper. Consequently, it is
possible for the third read of helper to obtain a stale null value
(perhaps because its value was cached or reordered by the compiler),
causing the getHelper() method to return a null pointer.
I don't know what to make of it. I can agree that there is no happens-before relationship between the first and third reads, at least no immediate relationship. But isn't there a transitive happens-before relationship, in the sense that the first read must happen before the second, and the second read has to happen before the third, therefore the first read has to happen before the third?
Could someone elaborate more proficiently?
No, there is no transitive relationship.
The idea behind the JMM is to define rules that the JVM must respect. Provided the JVM follows these rules, it is authorized to reorder and execute code as it wants.
In your example, the first and third reads of helper are plain reads outside any synchronized block, so no memory barrier introduced by synchronized or volatile orders them with respect to the write of helper. Thus, the JVM is allowed to execute the method as follows:
public Helper getHelper() {
    final Helper toReturn = helper;          // "Third" read, may see null
    if (helper == null) {                    // First read of helper
        synchronized (this) {
            if (helper == null) {            // Second read of helper
                helper = new Helper(42);
            }
        }
    }
    return toReturn;                         // Returning null
}
Your call would then return a null value, even though a singleton instance would have been created. Subsequent calls may still return null for the same reason.
As suggested, using volatile would introduce the required memory barrier. Another common solution is to capture the read value in a local variable and return that:
public Helper getHelper() {
    Helper singleton = helper;
    if (singleton == null) {
        synchronized (this) {
            singleton = helper;
            if (singleton == null) {
                singleton = new Helper(42);
                helper = singleton;
            }
        }
    }
    return singleton;
}
As you rely on a local variable, there is nothing to reorder: the value you test is the value you return, and everything happens within the same thread.
No, there is no transitive relationship between those reads. synchronized only guarantees visibility of changes that were made within synchronized blocks on the same lock. In this case not all of the reads use a synchronized block on the same lock, hence the code is flawed and visibility is not guaranteed.
Because there is no locking once the field is initialized, it is critical that the field be declared volatile. This will ensure the visibility.
private volatile Helper helper = null;
It's all explained here: https://shipilev.net/blog/2014/safe-public-construction/#_singletons_and_singleton_factories - the issue is simple.
... Notice that we do several reads of instance in this code, and at
least "read 1" and "read 3" are the reads without any
synchronization ... Specification-wise, as mentioned in happens-before
consistency rules, a read action can observe the unordered write via
the race. This is decided for each read action, regardless what other
actions have already read the same location. In our example, that
means that even though "read 1" could read non-null instance, the code
then moves on to returning it, then it does another racy read, and it
can read a null instance, which would be returned!
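For completeness, combining the two fixes mentioned in the answers above (a volatile field plus the local-variable capture) gives the usual safe form of double-checked locking; a sketch:
final class Foo {
    private volatile Helper helper = null;

    public Helper getHelper() {
        Helper result = helper;              // single racy read, captured locally
        if (result == null) {
            synchronized (this) {
                result = helper;             // re-read under the lock
                if (result == null) {
                    result = new Helper(42);
                    helper = result;         // volatile write publishes safely
                }
            }
        }
        return result;                       // never re-reads the field
    }
}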

How to avoid synchronization on a non-final field?

If we have 2 classes that operate on the same object under different threads and we want to avoid race conditions, we'll have to use synchronized blocks with the same monitor like in the example below:
class A {
    private DataObject mData; // will be used as monitor

    // thread 3
    public void setObject(DataObject object) {
        mData = object;
    }

    // thread 1
    void operateOnData() {
        synchronized (mData) {
            mData.doSomething();
            // ...
            mData.doSomethingElse();
        }
    }
}

class B {
    private DataObject mData; // will be used as monitor

    // thread 3
    public void setObject(DataObject object) {
        mData = object;
    }

    // thread 2
    void processData() {
        synchronized (mData) {
            mData.foo();
            // ...
            mData.bar();
        }
    }
}
The object we'll operate on, will be set by calling setObject() and it will not change afterwards. We'll use the object as a monitor. However, intelliJ will warn about synchronization on a non-final field.
In this particular scenario, is synchronizing on the non-final field an acceptable solution?
Another problem with the above approach is that it is not guaranteed that the monitor (mData) will be observed by thread 1 or thread 2 after it is set by thread 3, because a "happens-before" relationship hasn't been established between setting and reading the monitor. It could be still observed as null by thread 1 for example. Is my speculation correct?
Regarding possible solutions, making the DataObject thread-safe is not an option. Setting the monitor in the constructor of the classes and declaring it final can work.
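A sketch of that last option, a constructor-injected final monitor (which also silences the IntelliJ warning):
class A {
    private final DataObject mData;   // final: safely published, never reassigned

    A(DataObject data) {
        this.mData = data;
    }

    // thread 1
    void operateOnData() {
        synchronized (mData) {
            mData.doSomething();
            mData.doSomethingElse();
        }
    }
}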
EDIT Semantically, the mutual exclusion needed is related to the DataObject. This is the reason that I don't want to have a secondary monitor. One solution would be to add lock() and unlock() methods on DataObject that need to be called before working on it. Internally they would use a Lock Object. So, the operateOnData() method becomes:
void operateOnData() {
    mData.lock();
    mData.doSomething();
    // ...
    mData.doSomethingElse();
    mData.unlock();
}
You may create a wrapper:
class Wrapper {
    private DataObject mData;

    synchronized public void setObject(DataObject mData) {
        if (this.mData != null) throw new IllegalStateException("already set");
        this.mData = mData;
    }

    synchronized public void doSomething() {
        if (mData == null) throw new IllegalStateException("not set");
        mData.doSomething();
    }
}
A wrapper object is created and passed to A and B:
class A {
    private Wrapper wrapper; // set by constructor

    // thread 1
    void operateOnData() {
        wrapper.doSomething();
    }
}
Thread 3 also has a reference to the wrapper; it calls setObject() when it's available.
Some platforms provide explicit memory-barrier primitives which ensure that if one thread writes to a field and then performs a write barrier, any thread which has never examined the object in question is guaranteed to see the effect of that write. Unfortunately, as of the last time I asked such a question (Cheapest way of establishing happens-before with non-final field), the only time Java could offer any guarantee of threading semantics without requiring any special action on behalf of a reading thread was through final fields. Java guarantees that any reference to an object obtained through a final field will see any stores which were performed to final or non-final fields of that object before the reference was stored in the final field, but that relationship is not transitive. Thus, given
class c1 {
    public final c2 f;
    public c1(c2 ff) { f = ff; }
}
class c2 { public int[] arr; }
class c3 { public static c1 r; public static c2 f; }
If the only thing that ever writes to c3 is a thread which performs the code:
c2 cc = new c2();
cc.arr = new int[1];
cc.arr[0] = 1234;
c3.r = new c1(cc);
c3.f = c3.r.f;
a second thread performs:
int i1=-1;
if (c3.r != null) i1=c3.r.f.arr[0];
and a third thread performs:
int i2=-1;
if (c3.f != null) i2=c3.f.arr[0];
The Java standard guarantees that the second thread, if its if condition yields true, will set i1 to 1234. The third thread, however, might possibly see a non-null value for c3.f and yet see a null value for c3.f.arr, or see zero in c3.f.arr[0]. Even though the value stored into c3.f had been read from c3.r.f, and anything that reads the final reference c3.r.f is required to see any changes made to the object it identifies before that reference was written, nothing in the Java Standard forbids the JIT from rearranging the first thread's code as:
c2 cc = new c2();
c3.f = cc;
cc.arr = new int[1];
cc.arr[0] = 1234;
c3.r = new c1(cc);
Such a rewrite wouldn't affect the second thread, but could wreak havoc with the third.
A simple solution is to just define a public static final object to use as the lock. Declare it like this:
/** Used to sync access to the {@link #mData} field. */
public static final Object mDataLock = new Object();
Then in the program synchronize on mDataLock instead of mData.
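For example, the block in operateOnData() would become (sketch):
void operateOnData() {
    synchronized (mDataLock) {   // lock on the static final object, never on mData
        mData.doSomething();
        mData.doSomethingElse();
    }
}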
This is very useful because in the future someone may change mData so that its value does change, and then your code would have a slew of weird threading bugs.
This method of synchronization removes that possibility. It also is really low cost.
Also having the lock be static means that all instances of the class share a single lock. In this case, that seems like what you want.
Note that if you have many instances of these classes, this could become a bottleneck. Since all of the instances are now sharing a lock, only a single instance can change any mData at a single time. All other instances have to wait.
In general, I think something like a wrapper for the data you want to synchronize is a better approach, but I think this will work.
This is especially true if you have multiple concurrent instances of these classes.

Does making all fields of an object final guarantee safe publication?

I was reading about safe publication in "Java Concurrency in Practice" and need help understanding this example. I know it is simple, but it looks like I got too deep into it and got confused.
public class VolatileCachedFactorizer implements Servlet {
    private volatile OneValueCache cache = new OneValueCache(null, null);

    public void service(ServletRequest req, ServletResponse resp) {
        BigInteger i = extractFromRequest(req);
        BigInteger[] factors = cache.getFactors(i);
        if (factors == null) {
            factors = factor(i);
            cache = new OneValueCache(i, factors);
        }
        encodeIntoResponse(resp, factors);
    }
}

class OneValueCache {
    private final BigInteger lastNumber;
    private final BigInteger[] lastFactors;

    public OneValueCache(BigInteger i, BigInteger[] factors) {
        lastNumber = i;
        lastFactors = Arrays.copyOf(factors, factors.length);
    }

    public BigInteger[] getFactors(BigInteger i) {
        if (lastNumber == null || !lastNumber.equals(i))
            return null;
        else
            return Arrays.copyOf(lastFactors, lastFactors.length);
    }
}
Above are the two classes. One is VolatileCachedFactorizer, which is a servlet that will be initialized only once by the container; each request calls the service method to get the factors of the number passed.
Now, OneValueCache is an immutable object which would cache the latest number in the cache along with its factors.
Now, as per the book it is safely published.
My question is that OneValueCache is not declared final in VolatileCachedFactorizer, although all its fields are final. When the constructor of OneValueCache is executed from the service method, isn't the following scenario possible: lastNumber is properly initialized (as it is final) but lastFactors is not, since the two assignments are separate operations. So is there a chance that the object might be seen in an improper state?
If OneValueCache were declared final in VolatileCachedFactorizer, then the JVM would guarantee that it is properly initialized.
Thanks
The short answer to your specific question is no. But note that the OneValueCache instance "cache" is created in the service method and is immediately made visible to all other threads, thanks to the semantics of the volatile keyword. If thread A writes to a volatile variable, then once it has finished, thread B will see all the changes that thread A made when it subsequently reads that volatile variable.
With volatile, the memory operations that precede the volatile write are not reordered past it, and once the object is created it is immediately available to be read, since volatile variables are not cached in a processor-local cache where they would be invisible to threads on other processors.
If your confusion is about the cache object being properly instantiated, then yes: only a properly constructed, immutable cache object will be made visible to other threads.

Volatile keyword: is the variable I am using among two threads synchronized?

I have a code like the one below where an object is shared among two threads (the main thread and the Monitor thread). Do I have to declare MyObject globally and make it volatile to ensure it will be pushed to memory? Otherwise the if statement can print "Not null" if MyObject is only locally accessed by the thread and is not declared volatile, right?
public static void main(String[] args) throws InterruptedException {
    MyObject obj = MyObjectFactory.createObject();
    new Monitor(obj).start();
    Thread.sleep(500);
    if (obj == null)
        System.out.println("Null");
    else
        System.out.println("Not null");
}

public void doSomethingWithObject(MyObject obj) {
    obj = null;
}

private class Monitor extends Thread {
    private MyObject obj;

    public Monitor(MyObject obj) {
        this.obj = obj;
    }

    public void run() {
        doSomethingWithObject(obj);
    }
}
Note: The code example may not compile since I wrote it myself here on Stackoverflow. Consider it as a mix of pseudo code and real code.
The instance is shared but the references to it are not. Example:
String a = "hello";
String b = a;
b = null; // doesn't affect a
a and b are references to the same instance; changing one reference has no effect on the instance or any other references to the same instance.
So if you want to share state between threads, you will have to create a field inside MyObject which has to be volatile:
class MyObject { public volatile int shared; }
public void doSomethingWithObject(MyObject obj) {
obj.shared = 1; // main() can see this
}
Note that volatile just works for some types (references and all primitives except long). Since this is easy to get wrong, you should have a look at types in java.util.concurrent.atomic.
[EDIT] What I said above isn't correct. Instead, using volatile with long works as expected for Java 5 and better. This is the only way to ensure atomic read/writes for this type. See this question for references: Is there any point in using a volatile long?
Kudos go to Affe for pointing that out. Thanks.
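If what you actually need is to share and update the reference itself between threads, one of the java.util.concurrent.atomic types mentioned above is AtomicReference; a minimal sketch (the class name Shared is illustrative):
import java.util.concurrent.atomic.AtomicReference;

class Shared {
    // both threads read and write the holder, not a copied reference
    static final AtomicReference<MyObject> ref = new AtomicReference<>();
}

// thread A:
//     Shared.ref.set(MyObjectFactory.createObject());
// thread B:
//     MyObject current = Shared.ref.get();   // sees A's set() once it has happened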
You would rather have to synchronize on the object to ensure it will be set to null before the if check. Setting it to volatile only means changes will be "seen" immediately by other threads, but it is very likely that the if check will be executed before the doSomethingWithObject call.
If you want your object to go through a read-update-write scheme atomically, volatile won't cut it. You have to use synchronisation.
Volatility will ensure that the variable will not be cached in the current thread but it will not protect the variable from simultaneous updates, with the potential for the variable becoming something unexpected.
IBM's developerWorks has a useful article on the subject.
Your example contains only one additional thread, Monitor, which is created and started in main().
"make it volatile to ensure it will be pushed to memory?" - on the contrary: when you declare a variable as volatile, it ensures that it is NOT cached in thread-local memory, because there might be other threads that change the value of the variable.
In order to make sure you print the correct value of a variable you should synchronize the method doSomethingWithObject (change the signature of the method to):
public synchronized void doSomethingWithObject(MyObject obj)
or create synchronized blocks around:
obj = null;
and
this.obj=obj;
