Directly from this web site, I came across the following description about creating object thread safety.
Warning: When constructing an object that will be shared between
threads, be very careful that a reference to the object does not
"leak" prematurely. For example, suppose you want to maintain a List
called instances containing every instance of class. You might be
tempted to add the following line to your constructor:
instances.add(this);
But then other threads can use instances to access the object before
construction of the object is complete.
Is anybody able to express the same concept with other words or another more graspable example?
Thanks in advance.
Let us assume, you have such class:
class Sync {
public Sync(List<Sync> list) {
list.add(this);
// switch
// instance initialization code
}
public void bang() { }
}
and you have two threads (thread #1 and thread #2), both of them have a reference the same List<Sync> list instance.
Now thread #1 creates a new Sync instance and as an argument provides a reference to the list instance:
new Sync(list);
While executing line // switch in the Sync constructor there is a context switch and now thread #2 is working.
Thread #2 executes such code:
for(Sync elem : list)
elem.bang();
Thread #2 calls bang() on the instance created in point 3, but this instance is not ready to be used yet, because the constructor of this instance has not been finished.
Therefore,
you have to be very careful when calling a constructor and passing a reference to the object shared between a few threads
when implementing a constructor you have to keep in mind that the provided instance can be shared between a few threads
Thread A is creating Object A, in the middle of creation object A (in first line of constructor of Object A) there is context switch. Now thread B is working, and thread B can look into object A (he had reference already). However Object A is not yet fully constructed because Thread A don't have time to finish it.
Here is your clear example :
Let's say, there is class named House
class House {
private static List<House> listOfHouse;
private name;
// other properties
public House(){
listOfHouse.add(this);
this.name = "dummy house";
//do other things
}
// other methods
}
And Village:
class Village {
public static void printsHouses(){
for(House house : House.getListOfHouse()){
System.out.println(house.getName());
}
}
}
Now if you are creating a House in a thread, "X". And when the executing thread is just finished the bellow line,
listOfHouse.add(this);
And the context is switched (already the reference of this object is added in the list listOfHouse, while the object creation is not finished yet) to another thread, "Y" running,
printsHouses();
in it! then printHouses() will see an object which is still not fully created and this type of inconsistency is known as Leak.
Lot of good data here but I thought I'd add some more information.
When constructing an object that will be shared between threads, be very careful that a reference to the object does not "leak" prematurely.
While you are constructing the object, you need to make sure that there is no way for other threads to access this object before it can be fulling constructed. This means that in a constructor you should not, for example:
Assign the object to a static field on the class that is accessible by other threads.
Start a thread on the object in the constructor which may start using fields from the object before they are fulling initialized.
Publish the object into a collection or via any other mechanisms that allow other threads to see the object before it can be fulling constructed.
You might be tempted to add the following line to your constructor:
instances.add(this);
So something like the following is improper:
public class Foo {
// multiple threads can use this
public static List<Foo> instances = new ArrayList<Foo>();
public Foo() {
...
// this "leaks" this, publishing it to other threads
instances.add(this);
...
// other initialization stuff
}
...
One addition bit of complexity is that the Java compiler/optimizer has the ability to reorder the instructions inside of the constructor so they happen at a later time. This means that even if you do instances.add(this); as the last line of the constructor, this is not enough to ensure that the constructor really has finished.
If multiple threads are going to be accessing this published object, it must be synchronized. The only fields you don't need to worry about are final fields which are guaranteed to be finished constructing when the constructor finishes. volatile fields are themselves synchronized so you don't have to worry about them.
I think that the following example illustrate what authors wanted to say:
public clsss MyClass {
public MyClass(List<?> list) {
// some stuff
list.add(this); // self registration
// other stuff
}
}
The MyClass registers itself in list that can be used by other thread. But it runs "other stuff" after the registration. This means that if other thread starts using the object before it finished its constructor the object is probably not fully created yet.
Its describing the following situation:
Thread1:
//we add a reference to this thread
object.add(thread1Id,this);
//we start to initialize this thread, but suppose before reaching the next line we switch threads
this.initialize();
Thread2:
//we are able to get th1, but its not initialized properly so its in an invalid state
//and hence th1 is not valid
Object th1 = object.get(thread1Id);
As the thread scheduler can stop execution of a thread at any time (even half-way through a high level instruction like instances.push_back(this)) and switch to executing a different thread, unexpected behaviour can happen if you don't synchronize parallel access to objects.
Look at the code below:
#include <vector>
#include <thread>
#include <memory>
#include <iostream>
struct A {
std::vector<A*> instances;
A() { instances.push_back(this); }
void printSize() { std::cout << instances.size() << std::endl; }
};
int main() {
std::unique_ptr<A> a; // Initialized to nullptr.
std::thread t1([&a] { a.reset(new A()); }); // Construct new A.
std::thread t2([&a] { a->printSize(); }); // Use A. This will fail if t1 don't happen to finish before.
t1.join();
t2.join();
}
As the access to a in main()-function is not synchronized execution will fail every once in a while.
This happens when execution of thread t1 is halted before finishing construction of the object A and thread t2 is executed instead. This results in thread t2 trying to access a unique_ptr<A> containing a nullptr.
You just have to make sure, that even, when one thread hasn't initialized the Object, no Thread will access it (and get a NullpointerException).
In this case, it would happen in the constructor (I suppose), but another thread could access that very object between its add to the list and the end of the constructor.
Related
I'm wondering what publication guarantees exist for a non-final field initialised to null, if any.
Consider the following snippet:
public class MyClass {
private final CopyOnWriteArrayList<Inner> list = new CopyOnWriteArrayList<>();
//called by thread 1
public void init() {
// initialise Inner instance
list.add(new Inner());
}
//called by thread 2
public void doSomething() {
for (Inner i : list) {
// access non-final field
Object ref = i.ref;
// do something
// ...
// ...
// potentially set i.ref
}
}
private static class Inner {
// initialised by thread 1
Object ref = null;
}
}
Assuming doSomething() is always called by thread 2, is this safe? What guarantees are made about what thread 2 will see the first time it's accessed? Is there any possibility thread 2 would see something that's non-null?
Where in the JMM are the semantics around this situation described?
JVM will guarantee that you don't see out of thin air values, so anything other than null it not possible, in case that List is not empty (in this example, of course). If there would have been a different thread involved (let's say Thread3) that would alter your list (add elements to it), Thread2 could see those updates. Just note that individual methods of CopyOnWriteArrayList are thread safe; your method doSomething is not.
See the JLS for the specifics or the excellent (and rather complicated, may be just to me) Aleksey article.
Talking about safe publication makes sense only when you are initialising a field with a meaningful object that has the state. Then, improper publication may lead to observing a partially constructed object.
In this case, null doesn't convey any state. It can be considered an immutable object. Immutable objects don't have publication problems.
What guarantees are made about what thread 2 will see the first time it's accessed?
Thread 2 will see null when referring to i.ref.
Note that the list may be empty because Thread 1 may not have added an Inner to it yet.
Is there any possibility thread 2 would see something that's non-null?
No.
I'm wondering if the following class is thread safe:
class Example {
private Thing thing;
public setThing(Thing thing) {
this.thing = thing;
}
public use() {
thing.function();
}
}
Specifically, what happens if one thread calls setThing while another thread is in Thing::function via Example::use?
For example:
Example example = new Example();
example.setThing(new Thing());
createThread(example); // create first thread
createThread(example); // create second thread
//Thread1
while(1) {
example.use();
}
//Thread2
while(1) {
sleep(3600000); //yes, i know to use a scheduled thread executor
setThing(new Thing());
}
Specifically, I want to know, when setThing is called while use() is executing, will it continue with the old object successfully, or could updating the reference to the object somehow cause a problem.
There are 2 points when reasoning about thread safety of a particulcar class :
Visibility of shared state between threads.
Safety (preserving class invariants) when class object is used by multiple threads through class methods.
Shared state of Example class consists only from one Thing object.
The class isn't thread safe from visibility perspective. Result of setThing by one thread isn't seen by other threads so they can work with stale data. NPE is also acceptable cause initial value of thing during class initialization is null.
It's not possible to say whether it's safe to access Thing class through use method without its source code. However Example invokes use method without any synchronization so it should be, otherwise Example isn't thread safe.
As a result Example isn't thread safe. To fix point 1 you can either add volatile to thing field if you really need setter or mark it as final and initialize in constructor. The easiest way to ensure that 2 is met is to mark use as synchronized. If you mark setThing with synchronized as well you don't need volatile anymore. However there lots of other sophisticated techniques to meet point 2. This great book describes everything written here in more detail.
If the method is sharing resources and the thread is not synchronized, then the they will collide and several scenarios can occur including overwriting data computed by another thread and stored in a shared variable.
If the method has only local variables, then you can use the method by mutliple threads without worring about racing. However, usually non-helper classes manipulate member variables in their methods, therefore it's recommended to make methods synchronized or if you know exactly where the problem might occur, then lock (also called synchronize) a subscope of a method with a final lock/object.
In this example, is it sufficient to declare the parameter obj as final to safely use it in the thread, below?
public void doSomethingAsync (final Object obj)
{
Thread thread = new Thread ()
{
#Override public void run () { ... do something with obj ... }
}
thread.start ();
}
At first glance it may seem fine. A caller invokes doSomethingAsync and obj gets cached until needed in the thread.
But what happens if there are a burst of calls to doSomethingAsync such that they complete before the threads have done anything with obj?
If the Java compiler simply makes obj into a member variable, the last call to doSomethingAsync will overwrite the prior values of obj, making prior invocations of the thread use a wrong value. Or, does the compiler generate a queue or some dimensioned storage for obj so that each thread gets the proper value?
At first glance it may seem fine. A caller invokes doSomethingAsync and obj gets cached until needed in the thread.
The object is not "cached", the variable reference merely cannot be assigned to another object. The final keyword only prevents the variable from being re-assigned, it does not prevent the object that is being referenced from being mutated.
But what happens if there are a burst of calls to doSomethingAsync such that they complete before the threads have done anything with obj?
If the threads modify the referenced object the behavior would be undefined, they would be competing for the object and their reference to the object may have "old" values because the object was not synchronized between the threads. If the object is immutable, it has no state and cannot be changed, then it is inherently thread safe.
If the Java compiler simply makes obj into a method variable, the last call to doSomethingAsync will overwrite the prior values of obj, making prior invocations of the thread use a wrong value. Or, does the compiler generate a queue or some dimensioned storage for obj so that each thread gets the proper value?
The compiler does not guarantee that the threads get executed in order, threads run concurrently. This is why the synchronize keyword exists, so that you can guarantee that when you reference the object you reference the same state of the object that all of the other threads see. Obviously this is at a cost to performance so it is recommended to only pass immutable objects into threads so that you don't have to synchronize the threads every time you do something with the object.
Large edit here, based on a conversation the Original Poster and I had in chat.
It seems Peri's real question was about the way Java stored local variables like "obj" for use by Thread. This is called "captured variables" if you want to google it yourself. There is a nice discussion here.
Basically what happens is that all your local variable, the ones stored on the stack, plus the "this" pointer get copied into your local class (Thread in this case) when the local class is instantiated.
Original answer follows for the sake of the comments. But it is now obsolete.
Each time you call doSomethingAsync you are creating a new thread. If you call doSomethingAsync just once with a particular object, and then you modify that same object in the calling thread, then you have no idea what what the asynchronous thread will do. It might "do something with the object" before you modify it in the calling thread, after you modify in the calling thread or even WHILE you are concurrently modifying it in the calling thread. Unless the Object itself is thread safe this will cause problems.
Similarly, if you call doSomethignAsync twice with the same object, then you have no idea which asynchronous thread will modify the object first, and no guarantee they will not act concurrently on the same object.
Finally, if you call doSomethignAsync twice with 2 different objects then you don't know which asynchronous thread will act on its own object first, but you don't care, because they can't conflict with each other unless the objects have Static mutable variables (class variables) that are being modified).
If you require that one task get completed before another task and in the order submitted, then a single threaded ExecutorService is your answer.
If the Java compiler simply makes obj into a member variable, the last call to doSomethingAsync will overwrite the prior values of obj, making prior invocations of the thread use a wrong value
No, this will not happen. The subsequent call to doSomethingAsync cannot overwrite the obj captured by previous invocations of doSomethingAsync. This stands even if you remove the final keyword (assume java let you do it for just this time).
I think your question ultimately is about how closure works/is implemented in java. However, your code is not demonstrating the complication in the proper way because the code is not even trying to modify the variable obj in the same lexical scope.
In a way Java is not really capturing the variable obj, but its value. You could write the your code in a different way, and the overall effect is the same:
class YourThread extends Thread {
private Object param;
public YourThread (Object obj){
param = obj;
}
#Override
public void run(){
//do something with your param
}
}
and you no longer need the final keyword:
public void doSomethingAsync (Object obj){
Thread t = new YourThread (obj);
t.start();
}
Now, say you have two instances of YourThread created, how could the second instance modify what has been passed as parameter to the first instance?
Closure in Other Languages
In other languages, magical things can indeed happen, but to show it you need to write the code slightly different:
public void doSomethingAsync (Object obj){
//Here let's assume obj is not null
Thread thread = new Thread (){
#Override
public void run () { ... /*do something with obj*/ ... }
}
thread.start ();
obj = null;
}
This is not valid Java code, but in certain languages code like that is allowed. And the thread, when its run method is executed, might see obj as null.
Similarly, in the below code (again, not valid in Java), thread2 could potentially impact thread1 if thread2 executes first and changes obj in its run method:
public void doSomethingAsync (Object obj){
Thread thread1 = new Thread (){
#Override
public void run () { ... /*do something with obj*/ ... }
}
thread1.start ();
Thread thread2 = new Thread (){
#Override
public void run () { ... /*do something with obj*/ ... }
}
thread2.start ();
}
Back to Java
The reason Java forces you to put a final on obj is that although Java's syntax looks extremely similar to the closure syntax used in other languages, it is not doing the same closure semantics. Knowing it is final, Java does not need to create capturing object (thus additional heap allocation), but use something similar to YourThread behind the scene. See this link for more details
If I have one instance of an object A and it has an instance method foo() with only variables created and used in that method is that method thread safe even if the same instance is accessed by many threads?
If yes, does this still apply if instance method bar() on object A creates many threads and calls method foo() in the text described above?
Does it mean that every thread gets a "copy" of the method even if it belongs to the same instance?
I have deliberately not used the synchronized keyword.
Thanks
Yes. All local variables (variables defined within a method) will be on their own Stack frame. So, they will be thread safe, provided a reference is not escaping the scope (method)
Note : If a local reference escapes the method (as an argument to another method) or a method works on some class level or instance level fields, then it is not thread-safe.
Does it mean that every thread gets a "copy" of the method even if it belongs to the same instance
No, there will be only one method. every thread shares the same method. But each Thread will have its own Stack Frame and local variables will be on that thread's Stack frame. Even if you use synchronize on local Objects, Escape Analysis proves that the JVM will optimize your code and remove all kinds of synchronization.
example :
public static void main(String[] args) {
Object lock = new Object();
synchronized (lock) {
System.out.println("hello");
}
}
will be effectively converted to :
public static void main(String[] args) {
Object lock = new Object(); // JVm might as well remove this line as unused Object or instantiate this on the stack
System.out.println("hello");
}
You have to separate the code being run, and the data being worked on.
The method is code, executed by each of the threads. If that code contains a statement such as int i=5 which defines a new variable i, and sets its value to 5, then each thread will create that variable.
The problem with multi-threading is not with common code, but with common data (and other common resources). If the common code accesses some variable j that was created elsewhere, then all threads will access the same variable j, i.e. the same data. If one of these threads modifies the shared data while the others are reading, all kinds of errors might occur.
Now, regarding your question, your code should be thread safe as long as your variables are defined within bar(), and bar() doesn't access some common resource such as a file.
You should post some example code to make sure we understand the use case.
For this example:
public class Test {
private String varA;
public void doSomething() {
String varB;
}
}
If you don't do anything to modify varA in this example and only modify varB, this example is Thread Safe.
If, however, you create or modify varA and depend on it's state, then the method is NOT Thread Safe.
I'm trying to learn about singleton classes and how they can be used in an application to keep it thread safe. Let's suppose you have an singleton class called IndexUpdater whose reference is obtained as follows:
public static synchronized IndexUpdater getIndexUpdater() {
if (ref == null)
// it's ok, we can call this constructor
ref = new IndexUpdater();
return ref;
}
private static IndexUpdater ref;
Let's suppose there are other methods in the class that do the actual work (update indicies, etc.). What I'm trying to understand is how accessing and using the singleton would work with two threads. Let's suppose in time 1, thread 1 gets a reference to the class, through a call like this IndexUpdater iu = IndexUpdater.getIndexUpdater(); Then,
in time 2, using reference iu, a method within the class is called iu.updateIndex by thread 1. What would happen in time 2, a second thread tries to get a reference to the class. Could it do this and also access methods within the singleton or would it be prevented as long as the first thread has an active reference to the class. I'm assuming the latter (or else how would this work?) but I'd like to make sure before I implement.
Thank you,
Elliott
Since getIndexUpdater() is a synchronized method, it only prevents threads from accessing this method (or any method protected by the same synchronizer) simultaneously. So it could be a problem if other threads are accessing the object's methods at the same time. Just keep in mind that if a thread is running a synchronized method, all other threads trying to run any synchronized methods on the same object are blocked.
More info on:
http://download.oracle.com/javase/tutorial/essential/concurrency/syncmeth.html
Your assumption is wrong. Synchronizing getIndexUpdater() only prevents more than one instance being created by different threads calling getIndexUpdater() at (almost) the same time.
Without synchronization the following could happen: Thread one calls getIndexUpdater(). ref is null. Thread 2 calls getIndexUpdater(). ref is still null. Outcome: ref is instantiated twice.
You are conflating the instantiation of a singleton object with its use. Synchronizing the creation of a singleton object does not guarantee that the singleton class itself is thread-safe. Here is a simple example:
public class UnsafeSingleton {
private static UnsafeSingleton singletonRef;
private Queue<Object> objects = new LinkedList<Object>();
public static synchronized UnsafeSingleton getInstance() {
if (singletonRef == null) {
singletonRef = new UnsafeSingleton();
}
return singletonRef;
}
public void put(Object o) {
objects.add(o);
}
public Object get() {
return objects.remove(o);
}
}
Two threads calling getInstance are guaranteed to get the same instance of UnsafeSingleton because synchronizing this method guarantees that singletonRef will only be set once. However, the instance that is returned is not thread safe, because (in this example) LinkedList is not a thread-safe queue. Two threads modifying this queue may result in unexpected behavior. Additional steps have to be taken to ensure that the singleton itself is thread-safe, not just its instantiation. (In this example, the queue implementation could be replaced with a LinkedBlockingQueue, for example, or the get and put methods could be marked synchronized.)
Then, in time 2, using reference iu, a method within the class is called iu.updateIndex by thread 1. What would happen in time 2, a second thread tries to get a reference to the class. Could it do this and also access methods within the singleton ...?
The answer is yes. Your assumption on how references are obtained is wrong. The second thread can obtain a reference to the Singleton. The Singleton pattern is most commonly used as a sort of pseudo-global state. As we all know, global state is generally very difficult to deal with when multiple entities are using it. In order to make your singleton thread safe you will need to use appropriate safety mechanisms such as using atomic wrapper classes like AtomicInteger or AtomicReference (etc...) or using synchronize (or Lock) to protect critical areas of code from being accessed simultaneously.
The safest is to use the enum-singleton.
public enum Singleton {
INSTANCE;
public String method1() {
...
}
public int method2() {
...
}
}
Thread-safe, serializable, lazy-loaded, etc. Only advantages !
When a second thread tries to invoke getIndexUpdater() method, it will try to obtain a so called lock, created for you when you used synchronized keyword. But since some other thread is already inside the method, it obtained the lock earlier and others (like the second thread) must wait for it.
When the first thread will finish its work, it will release the lock and the second thread will immediately take it and enter the method. To sum up, using synchronized always allows only one thread to enter guarded block - very restrictive access.
The static synchronized guarantees that only one thread can be in this method at once and any other thread attempting to access this method (or any other static synchronized method in this class) will have to wait for it to complete.
IMHO the simplest way to implement a singleton is to have a enum with one value
enum Singleton {
INSTANCE
}
This is thread safe and only creates the INSTANCE when the class is accessed.
As soon as your synchronized getter method will return the IndexUpdater instance (whether it was just created or already existed doesn't matter), it is free to be called from another thread. You should make sure your IndexUpdater is thread safe so it can be called from multiple threads at a time, or you should create an instance per thread so they won't be shared.