Safe initialisation of null reference

Safe initialisation of null reference - java

I'm wondering what publication guarantees exist for a non-final field initialised to null, if any.
Consider the following snippet:
public class MyClass {
private final CopyOnWriteArrayList<Inner> list = new CopyOnWriteArrayList<>();
//called by thread 1
public void init() {
// initialise Inner instance
list.add(new Inner());
}
//called by thread 2
public void doSomething() {
for (Inner i : list) {
// access non-final field
Object ref = i.ref;
// do something
// ...
// ...
// potentially set i.ref
}
}
private static class Inner {
// initialised by thread 1
Object ref = null;
}
}
Assuming doSomething() is always called by thread 2, is this safe? What guarantees are made about what thread 2 will see the first time it's accessed? Is there any possibility thread 2 would see something that's non-null?
Where in the JMM are the semantics around this situation described?

JVM will guarantee that you don't see out of thin air values, so anything other than null it not possible, in case that List is not empty (in this example, of course). If there would have been a different thread involved (let's say Thread3) that would alter your list (add elements to it), Thread2 could see those updates. Just note that individual methods of CopyOnWriteArrayList are thread safe; your method doSomething is not.
See the JLS for the specifics or the excellent (and rather complicated, may be just to me) Aleksey article.

Talking about safe publication makes sense only when you are initialising a field with a meaningful object that has the state. Then, improper publication may lead to observing a partially constructed object.
In this case, null doesn't convey any state. It can be considered an immutable object. Immutable objects don't have publication problems.
What guarantees are made about what thread 2 will see the first time it's accessed?
Thread 2 will see null when referring to i.ref.
Note that the list may be empty because Thread 1 may not have added an Inner to it yet.
Is there any possibility thread 2 would see something that's non-null?
No.

Related

Java thread safety of setting a reference to an object

I'm wondering if the following class is thread safe:
class Example {
private Thing thing;
public setThing(Thing thing) {
this.thing = thing;
}
public use() {
thing.function();
}
}
Specifically, what happens if one thread calls setThing while another thread is in Thing::function via Example::use?
For example:
Example example = new Example();
example.setThing(new Thing());
createThread(example); // create first thread
createThread(example); // create second thread
//Thread1
while(1) {
example.use();
}
//Thread2
while(1) {
sleep(3600000); //yes, i know to use a scheduled thread executor
setThing(new Thing());
}
Specifically, I want to know, when setThing is called while use() is executing, will it continue with the old object successfully, or could updating the reference to the object somehow cause a problem.

There are 2 points when reasoning about thread safety of a particulcar class :
Visibility of shared state between threads.
Safety (preserving class invariants) when class object is used by multiple threads through class methods.
Shared state of Example class consists only from one Thing object.
The class isn't thread safe from visibility perspective. Result of setThing by one thread isn't seen by other threads so they can work with stale data. NPE is also acceptable cause initial value of thing during class initialization is null.
It's not possible to say whether it's safe to access Thing class through use method without its source code. However Example invokes use method without any synchronization so it should be, otherwise Example isn't thread safe.
As a result Example isn't thread safe. To fix point 1 you can either add volatile to thing field if you really need setter or mark it as final and initialize in constructor. The easiest way to ensure that 2 is met is to mark use as synchronized. If you mark setThing with synchronized as well you don't need volatile anymore. However there lots of other sophisticated techniques to meet point 2. This great book describes everything written here in more detail.

If the method is sharing resources and the thread is not synchronized, then the they will collide and several scenarios can occur including overwriting data computed by another thread and stored in a shared variable.
If the method has only local variables, then you can use the method by mutliple threads without worring about racing. However, usually non-helper classes manipulate member variables in their methods, therefore it's recommended to make methods synchronized or if you know exactly where the problem might occur, then lock (also called synchronize) a subscope of a method with a final lock/object.

Synchronized Block within Synchronized Method

I'm looking at some code in a third party library that contains a synchronized method, and within this method there is a synchronized block that locks on an instance variable. It's similar to this:
public class Foo {
final Bar bar = new Bar();
public synchronized void doSomething() {
// do something
synchronized(bar) {
// update bar
}
}
...
}
Does this make sense? If so, what benefits are there to having a synchronized statement within a synchronized method?
Given that a synchronized method locks on the entire object, it seems redundant to me. Perhaps this approach makes sense when working with instance variables that are not private?

In your example the method is both locking on the instance of Foo and on the object bar. Other methods may only be locking on the instance of Foo or on the object bar.
So, yes, this makes sense depending on exactly what they are doing. Presumably bar protects some smaller subset of data, and some methods will only need to lock on bar to perform their actions in a thread-safe manner.
What synchronized does
synchronized (on a method, or in a statement) creates a mutual exclusion zone (critical section or, specifically for Java, a reentrant mutex). The "key" for a thread to enter the critical section is the object reference used in the synchronized statement ‡. Only one thread can (recursively) "own" this "key" at one time across all blocks that use the same key; that is, only one thread can enter any block synchronized on a given object reference at one time.
Such a critical section simply prevents the operations (variable read/write) that you do inside the block happening concurrently with any other operations in all other critical sections that lock on the same object reference. (It doesn't automatically protect all variables inside an object).
In Java, such a critical section also creates a happens-before contract.
By example
As a somewhat contrived example †:
public class Foo {
final Bar bar = new Bar();
private int instanceCounter = 0;
private int barCounter = 0;
public synchronized void incrementBarCounterIfAllowed() {
synchronized (bar) {
if (instanceCounter < 10) barCounter++;
}
}
public synchronized void incrementClassCounter() {
instanceCounter++;
}
public void incrementBarCounter() {
synchronized (bar) {
barCounter++;
}
}
}
Whether the instance variables are private or not doesn't really matter to whether this approach is applicable. In a single class you can have multiple lock objects, each of which protect their own set of data.
However the risk of doing this is that you have to be very strict with coding conventions to prevent deadlocks by locking two locks in different orders in different places. For example, with the above code if you then do this from somewhere else in the code:
synchronized(myFoo.bar) {
myFoo.incrementClassCounter();
}
you risk a deadlock with the incrementBarCounterIfAllowed() method
† Note that barCounter could be an instance variable for Bar etc etc - I avoided that for the sake of brevity in the code sample.
‡ In the case of synchronized methods, that reference is the reference to the class instance (or to the Class<?> for the class for static methods).

When you say "synchronized method locks on the entire object", that's not true. Using synchronized only means that threads have to acquire that lock before they can enter the methods or blocks marked as synchronized that use that lock. The default object used as a lock for synchronized on instance methods is this. The default is this.getClass() if you put synchronized on a static method, or you can specify the object to use as a lock. Using synchronized doesn't do anything other than that to make instance fields inaccessible.
You can write a class where some methods or blocks are protected by one lock, some are protected by another lock, and for others you need both locks. Make sure you acquire the locks in the same order or you can cause a deadlock.

Consider this:
public void someMethod() {
synchronized(bar) {
// fully accessible before entering the other synchronized bar block
// but not afterwards
}
}
Get it clearly, synchronizing only blocks if 2 blocks synchronize on the same object.

I will give a real life example to explain what Andy explained through code(to those who are finding it difficult to understand this):
Suppose You have a 1 BHK Flat .
The way to enter in room is through hall (you have to first enter into hall to enter into room)
You want to restrict some one to use you room , but he can use/enter in the hall.
In this case if someone enters into room from back door and applies lock from inside. One can only enter into the hall and cant enter into the room until the person inside the room releases the lock .
Hope this clarifies to the people who are finding it difficult to understand this.

Java Thread Safety of Initialized Objects

Consider the following class:
public class MyClass
{
private MyObject obj;
public MyClass()
{
obj = new MyObject();
}
public void methodCalledByOtherThreads()
{
obj.doStuff();
}
}
Since obj was created on one thread and accessed from another, could obj be null when methodCalledByOtherThread is called? If so, would declaring obj as volatile be the best way to fix this issue? Would declaring obj as final make any difference?
Edit:
For clarity, I think my main question is:
Can other threads see that obj has been initialized by some main thread or could obj be stale (null)?

For the methodCalledByOtherThreads to be called by another thread and cause problems, that thread would have to get a reference to a MyClass object whose obj field is not initialized, ie. where the constructor has not yet returned.
This would be possible if you leaked the this reference from the constructor. For example
public MyClass()
{
SomeClass.leak(this);
obj = new MyObject();
}
If the SomeClass.leak() method starts a separate thread that calls methodCalledByOtherThreads() on the this reference, then you would have problems, but this is true regardless of the volatile.
Since you don't have what I'm describing above, your code is fine.

It depends on whether the reference is published "unsafely". A reference is "published" by being written to a shared variable; another thread reads the variable to get the reference. If there is no relationship of happens-before(write, read), the publication is called unsafe. An example of unsafe publication is through a non-volatile static field.
#chrylis 's interpretation of "unsafe publication" is not accurate. Leaking this before constructor exit is orthogonal to the concept of unsafe publication.
Through unsafe publication, another thread may observe the object in an uncertain state (hence the name); in your case, field obj may appear to be null to another thread. Unless, obj is final, then it cannot appear to be null even if the host object is published unsafely.
This is all too technical and it requires further readings to understand. The good news is, you don't need to master "unsafe publication", because it is a discouraged practice anyway. The best practice is simply: never do unsafe publication; i.e. never do data race; i.e. always read/write shared data through proper synchronization, by using synchronized, volatile or java.util.concurrent.
If we always avoid unsafe publication, do we still need final fields? The answer is no. Then why are some objects (e.g. String) designed to be "thread safe immutable" by using final fields? Because it's assumed that they can be used in malicious code that tries to create uncertain state through deliberate unsafe publication. I think this is an overblown concern. It doesn't make much sense in server environments - if an application embeds malicious code, the server is compromised, period. It probably makes a bit of sense in Applet environment where JVM runs untrusted codes from unknown sources - even then, this is an improbable attack vector; there's no precedence of this kind of attack; there are a lot of other more easily exploitable security holes, apparently.

This code is fine because the reference to the instance of MyClass can't be visible to any other threads before the constructor returns.
Specifically, the happens-before relation requires that the visible effects of actions occur in the same order as they're listed in the program code, so that in the thread where the MyClass is constructed, obj must be definitely assigned before the constructor returns, and the instantiating thread goes directly from the state of not having a reference to the MyClass object to having a reference to a fully-constructed MyClass object.
That thread can then pass a reference to that object to another thread, but all of the construction will have transitively happened-before the second thread can call any methods on it. This might happen through the constructing thread's launching the second thread, a synchronized method, a volatile field, or the other concurrency mechanisms, but all of them will ensure that all of the actions that took place in the instantiating thread are finished before the memory barrier is passed.
Note that if a reference to this gets passed out of the class inside the constructor somewhere, that reference might go floating around and get used before the constructor is finished. That's what's known as unsafe publishing of the object, but code such as yours that doesn't call non-final methods from the constructor (or directly pass out references to this) is fine.

Your other thread could see a null object. A volatile object could possibly help, but an explicit lock mechanism (or a Builder) would likely be a better solution.
Have a look at Java Concurrency in Practice - Sample 14.12

This class (if taken as is) is NOT thread safe. In two words: there is reordering of instructions in java (Instruction reordering & happens-before relationship in java) and when in your code you're instantiating MyClass, under some circumstances you may get following set of instructions:
Allocate memory for new instance of MyClass;
Return link to this block of memory;
Link to this not fully initialized MyClass is available for other threads, they can call "methodCalledByOtherThreads()" and get NullPointerException;
Initialize internals of MyClass.
In order to prevent this and make your MyClass really thread safe - you either have to add "final" or "volatile" to the "obj" field. In this case Java's memory model (starting from Java 5 on) will guarantee that during initialization of MyClass, reference to alocated for it block of memory will be returned only when all internals are initialized.
For more details I would strictly recommend you to read nice book "Java Concurrency in Practice". Exactly your case is described on the pages 50-51 (section 3.5.1). I would even say - you just can write correct multithreaded code without reading that book! :)

The originally picked answer by #Sotirios Delimanolis is wrong. #ZhongYu 's answer is correct.
There is the visibility issue of the concern here. So if MyClass is published unsafely, anything could happen.
Someone in the comment asked for evidence - one can check Listing 3.15 in the book Java Concurrency in Practice:
public class Holder {
private int n;
// Initialize in thread A
public Holder(int n) { this.n = n; }
// Called in thread B
public void assertSanity() {
if (n != n) throw new AssertionError("This statement is false.");
}
}
Someone comes up an example to verify this piece of code:
coding a proof for potential concurrency issue
As to the specific example of this post:
public class MyClass{
private MyObject obj;
// Initialize in thread A
public MyClass(){
obj = new MyObject();
}
// Called in thread B
public void methodCalledByOtherThreads(){
obj.doStuff();
}
}
If MyClass is initialized in Thread A, there is no guarantee that thread B will see this initialization (because the change might stay in the cache of the CPU that Thread A runs on and has not propagated into main memory).
Just as #ZhongYu has pointed out, because the write and read happens at 2 independent threads, so there is no happens-before(write, read) relation.
To fix this, as the original author has mentioned, we can declare private MyObject obj as volatile, which will ensure that the reference itself will be visible to other threads in timely manner
(https://www.logicbig.com/tutorials/core-java-tutorial/java-multi-threading/volatile-ref-object.html) .

Creating Object in a thread safety way

Directly from this web site, I came across the following description about creating object thread safety.
Warning: When constructing an object that will be shared between
threads, be very careful that a reference to the object does not
"leak" prematurely. For example, suppose you want to maintain a List
called instances containing every instance of class. You might be
tempted to add the following line to your constructor:
instances.add(this);
But then other threads can use instances to access the object before
construction of the object is complete.
Is anybody able to express the same concept with other words or another more graspable example?
Thanks in advance.

Let us assume, you have such class:
class Sync {
public Sync(List<Sync> list) {
list.add(this);
// switch
// instance initialization code
}
public void bang() { }
}
and you have two threads (thread #1 and thread #2), both of them have a reference the same List<Sync> list instance.
Now thread #1 creates a new Sync instance and as an argument provides a reference to the list instance:
new Sync(list);
While executing line // switch in the Sync constructor there is a context switch and now thread #2 is working.
Thread #2 executes such code:
for(Sync elem : list)
elem.bang();
Thread #2 calls bang() on the instance created in point 3, but this instance is not ready to be used yet, because the constructor of this instance has not been finished.
Therefore,
you have to be very careful when calling a constructor and passing a reference to the object shared between a few threads
when implementing a constructor you have to keep in mind that the provided instance can be shared between a few threads

Thread A is creating Object A, in the middle of creation object A (in first line of constructor of Object A) there is context switch. Now thread B is working, and thread B can look into object A (he had reference already). However Object A is not yet fully constructed because Thread A don't have time to finish it.

Here is your clear example :
Let's say, there is class named House
class House {
private static List<House> listOfHouse;
private name;
// other properties
public House(){
listOfHouse.add(this);
this.name = "dummy house";
//do other things
}
// other methods
}
And Village:
class Village {
public static void printsHouses(){
for(House house : House.getListOfHouse()){
System.out.println(house.getName());
}
}
}
Now if you are creating a House in a thread, "X". And when the executing thread is just finished the bellow line,
listOfHouse.add(this);
And the context is switched (already the reference of this object is added in the list listOfHouse, while the object creation is not finished yet) to another thread, "Y" running,
printsHouses();
in it! then printHouses() will see an object which is still not fully created and this type of inconsistency is known as Leak.

Lot of good data here but I thought I'd add some more information.
When constructing an object that will be shared between threads, be very careful that a reference to the object does not "leak" prematurely.
While you are constructing the object, you need to make sure that there is no way for other threads to access this object before it can be fulling constructed. This means that in a constructor you should not, for example:
Assign the object to a static field on the class that is accessible by other threads.
Start a thread on the object in the constructor which may start using fields from the object before they are fulling initialized.
Publish the object into a collection or via any other mechanisms that allow other threads to see the object before it can be fulling constructed.
You might be tempted to add the following line to your constructor:
instances.add(this);
So something like the following is improper:
public class Foo {
// multiple threads can use this
public static List<Foo> instances = new ArrayList<Foo>();
public Foo() {
...
// this "leaks" this, publishing it to other threads
instances.add(this);
...
// other initialization stuff
}
...
One addition bit of complexity is that the Java compiler/optimizer has the ability to reorder the instructions inside of the constructor so they happen at a later time. This means that even if you do instances.add(this); as the last line of the constructor, this is not enough to ensure that the constructor really has finished.
If multiple threads are going to be accessing this published object, it must be synchronized. The only fields you don't need to worry about are final fields which are guaranteed to be finished constructing when the constructor finishes. volatile fields are themselves synchronized so you don't have to worry about them.

I think that the following example illustrate what authors wanted to say:
public clsss MyClass {
public MyClass(List<?> list) {
// some stuff
list.add(this); // self registration
// other stuff
}
}
The MyClass registers itself in list that can be used by other thread. But it runs "other stuff" after the registration. This means that if other thread starts using the object before it finished its constructor the object is probably not fully created yet.

Its describing the following situation:
Thread1:
//we add a reference to this thread
object.add(thread1Id,this);
//we start to initialize this thread, but suppose before reaching the next line we switch threads
this.initialize();
Thread2:
//we are able to get th1, but its not initialized properly so its in an invalid state
//and hence th1 is not valid
Object th1 = object.get(thread1Id);

As the thread scheduler can stop execution of a thread at any time (even half-way through a high level instruction like instances.push_back(this)) and switch to executing a different thread, unexpected behaviour can happen if you don't synchronize parallel access to objects.
Look at the code below:
#include <vector>
#include <thread>
#include <memory>
#include <iostream>
struct A {
std::vector<A*> instances;
A() { instances.push_back(this); }
void printSize() { std::cout << instances.size() << std::endl; }
};
int main() {
std::unique_ptr<A> a; // Initialized to nullptr.
std::thread t1([&a] { a.reset(new A()); }); // Construct new A.
std::thread t2([&a] { a->printSize(); }); // Use A. This will fail if t1 don't happen to finish before.
t1.join();
t2.join();
}
As the access to a in main()-function is not synchronized execution will fail every once in a while.
This happens when execution of thread t1 is halted before finishing construction of the object A and thread t2 is executed instead. This results in thread t2 trying to access a unique_ptr<A> containing a nullptr.

You just have to make sure, that even, when one thread hasn't initialized the Object, no Thread will access it (and get a NullpointerException).
In this case, it would happen in the constructor (I suppose), but another thread could access that very object between its add to the list and the end of the constructor.

question about singleton classes and threads

I'm trying to learn about singleton classes and how they can be used in an application to keep it thread safe. Let's suppose you have an singleton class called IndexUpdater whose reference is obtained as follows:
public static synchronized IndexUpdater getIndexUpdater() {
if (ref == null)
// it's ok, we can call this constructor
ref = new IndexUpdater();
return ref;
}
private static IndexUpdater ref;
Let's suppose there are other methods in the class that do the actual work (update indicies, etc.). What I'm trying to understand is how accessing and using the singleton would work with two threads. Let's suppose in time 1, thread 1 gets a reference to the class, through a call like this IndexUpdater iu = IndexUpdater.getIndexUpdater(); Then,
in time 2, using reference iu, a method within the class is called iu.updateIndex by thread 1. What would happen in time 2, a second thread tries to get a reference to the class. Could it do this and also access methods within the singleton or would it be prevented as long as the first thread has an active reference to the class. I'm assuming the latter (or else how would this work?) but I'd like to make sure before I implement.
Thank you,
Elliott

Since getIndexUpdater() is a synchronized method, it only prevents threads from accessing this method (or any method protected by the same synchronizer) simultaneously. So it could be a problem if other threads are accessing the object's methods at the same time. Just keep in mind that if a thread is running a synchronized method, all other threads trying to run any synchronized methods on the same object are blocked.
More info on:
http://download.oracle.com/javase/tutorial/essential/concurrency/syncmeth.html

Your assumption is wrong. Synchronizing getIndexUpdater() only prevents more than one instance being created by different threads calling getIndexUpdater() at (almost) the same time.
Without synchronization the following could happen: Thread one calls getIndexUpdater(). ref is null. Thread 2 calls getIndexUpdater(). ref is still null. Outcome: ref is instantiated twice.

You are conflating the instantiation of a singleton object with its use. Synchronizing the creation of a singleton object does not guarantee that the singleton class itself is thread-safe. Here is a simple example:
public class UnsafeSingleton {
private static UnsafeSingleton singletonRef;
private Queue<Object> objects = new LinkedList<Object>();
public static synchronized UnsafeSingleton getInstance() {
if (singletonRef == null) {
singletonRef = new UnsafeSingleton();
}
return singletonRef;
}
public void put(Object o) {
objects.add(o);
}
public Object get() {
return objects.remove(o);
}
}
Two threads calling getInstance are guaranteed to get the same instance of UnsafeSingleton because synchronizing this method guarantees that singletonRef will only be set once. However, the instance that is returned is not thread safe, because (in this example) LinkedList is not a thread-safe queue. Two threads modifying this queue may result in unexpected behavior. Additional steps have to be taken to ensure that the singleton itself is thread-safe, not just its instantiation. (In this example, the queue implementation could be replaced with a LinkedBlockingQueue, for example, or the get and put methods could be marked synchronized.)

Then, in time 2, using reference iu, a method within the class is called iu.updateIndex by thread 1. What would happen in time 2, a second thread tries to get a reference to the class. Could it do this and also access methods within the singleton ...?
The answer is yes. Your assumption on how references are obtained is wrong. The second thread can obtain a reference to the Singleton. The Singleton pattern is most commonly used as a sort of pseudo-global state. As we all know, global state is generally very difficult to deal with when multiple entities are using it. In order to make your singleton thread safe you will need to use appropriate safety mechanisms such as using atomic wrapper classes like AtomicInteger or AtomicReference (etc...) or using synchronize (or Lock) to protect critical areas of code from being accessed simultaneously.

The safest is to use the enum-singleton.
public enum Singleton {
INSTANCE;
public String method1() {
...
}
public int method2() {
...
}
}
Thread-safe, serializable, lazy-loaded, etc. Only advantages !

When a second thread tries to invoke getIndexUpdater() method, it will try to obtain a so called lock, created for you when you used synchronized keyword. But since some other thread is already inside the method, it obtained the lock earlier and others (like the second thread) must wait for it.
When the first thread will finish its work, it will release the lock and the second thread will immediately take it and enter the method. To sum up, using synchronized always allows only one thread to enter guarded block - very restrictive access.

The static synchronized guarantees that only one thread can be in this method at once and any other thread attempting to access this method (or any other static synchronized method in this class) will have to wait for it to complete.
IMHO the simplest way to implement a singleton is to have a enum with one value
enum Singleton {
INSTANCE
}
This is thread safe and only creates the INSTANCE when the class is accessed.

As soon as your synchronized getter method will return the IndexUpdater instance (whether it was just created or already existed doesn't matter), it is free to be called from another thread. You should make sure your IndexUpdater is thread safe so it can be called from multiple threads at a time, or you should create an instance per thread so they won't be shared.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.