In this example, is it sufficient to declare the parameter obj as final to safely use it in the thread, below?
public void doSomethingAsync (final Object obj)
{
    Thread thread = new Thread ()
    {
        @Override public void run () { ... do something with obj ... }
    };
    thread.start ();
}
At first glance it may seem fine. A caller invokes doSomethingAsync and obj gets cached until needed in the thread.
But what happens if there is a burst of calls to doSomethingAsync such that they complete before the threads have done anything with obj?
If the Java compiler simply makes obj into a member variable, the last call to doSomethingAsync will overwrite the prior values of obj, making prior invocations of the thread use a wrong value. Or, does the compiler generate a queue or some dimensioned storage for obj so that each thread gets the proper value?
At first glance it may seem fine. A caller invokes doSomethingAsync and obj gets cached until needed in the thread.
The object is not "cached"; the variable simply cannot be reassigned to refer to another object. The final keyword only prevents the variable from being reassigned; it does not prevent the referenced object from being mutated.
But what happens if there is a burst of calls to doSomethingAsync such that they complete before the threads have done anything with obj?
If the threads modify the referenced object, the result is unpredictable: they compete for the object, and without synchronization a thread may see stale values of its state. If the object is immutable, meaning its state cannot change after construction, then it is inherently thread safe.
If the Java compiler simply makes obj into a member variable, the last call to doSomethingAsync will overwrite the prior values of obj, making prior invocations of the thread use a wrong value. Or, does the compiler generate a queue or some dimensioned storage for obj so that each thread gets the proper value?
The compiler does not guarantee that the threads execute in order; threads run concurrently. This is why the synchronized keyword exists: it lets you guarantee that when you access the object you see the same state of the object that all of the other threads see. This comes at a cost to performance, so it is recommended to pass only immutable objects into threads, so that you don't have to synchronize every time you do something with the object.
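As a rough illustration of that recommendation, here is a minimal sketch of an immutable value read by several threads without any locking. The names ImmutablePoint and ImmutableSharingDemo are made up for this example and do not come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical immutable value: all fields are final and there are no setters.
final class ImmutablePoint {
    private final int x;
    private final int y;

    ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int sum() {
        return x + y;
    }

    // "Modification" returns a new object instead of changing this one.
    ImmutablePoint withX(int newX) {
        return new ImmutablePoint(newX, y);
    }
}

class ImmutableSharingDemo {
    // Starts several reader threads on one shared point and
    // returns how many of them saw the expected sum.
    static int readersThatSawExpectedSum(int readers) {
        final ImmutablePoint shared = new ImmutablePoint(1, 2);
        final AtomicInteger correct = new AtomicInteger();
        List<Thread> threads = new ArrayList<Thread>();
        for (int i = 0; i < readers; i++) {
            Thread t = new Thread(() -> {
                shared.withX(100);          // creates a new point; shared is untouched
                if (shared.sum() == 3) {
                    correct.incrementAndGet();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return correct.get();
    }
}
```

Because no thread can change shared, every reader sees the same state and no synchronized block is needed around the reads.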
Large edit here, based on a conversation the Original Poster and I had in chat.
It seems Peri's real question was about the way Java stored local variables like "obj" for use by Thread. This is called "captured variables" if you want to google it yourself. There is a nice discussion here.
Basically, the local variables that the local class uses (the ones stored on the stack), plus the "this" pointer, get copied into your local class (Thread in this case) when the local class is instantiated.
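A small sketch of what that copying means in practice, assuming a burst of calls that all return before any thread touches obj. CaptureDemo, burstDone and distinctValuesSeen are names invented for this illustration; the latch only exists to force the timing the question worries about.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

class CaptureDemo {
    private final Set<Object> seen = ConcurrentHashMap.newKeySet();
    private final CountDownLatch burstDone = new CountDownLatch(1);

    Thread doSomethingAsync(final Object obj) {
        Thread thread = new Thread() {
            @Override
            public void run() {
                try {
                    burstDone.await();   // wait until every call in the burst has returned
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                seen.add(obj);           // uses this thread's own captured copy of obj
            }
        };
        thread.start();
        return thread;
    }

    // Makes `calls` invocations with distinct arguments and returns
    // how many distinct values the threads ended up seeing.
    static int distinctValuesSeen(int calls) {
        CaptureDemo demo = new CaptureDemo();
        Thread[] threads = new Thread[calls];
        for (int i = 0; i < calls; i++) {
            threads[i] = demo.doSomethingAsync("value-" + i);
        }
        demo.burstDone.countDown();      // only now may the threads use obj
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return demo.seen.size();
    }
}
```

If later calls overwrote earlier values of obj, fewer distinct values than calls would be seen; with per-instance copies each thread keeps its own.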
Original answer follows for the sake of the comments. But it is now obsolete.
Each time you call doSomethingAsync you are creating a new thread. If you call doSomethingAsync just once with a particular object, and then you modify that same object in the calling thread, then you have no idea what the asynchronous thread will do. It might "do something with the object" before you modify it in the calling thread, after you modify it in the calling thread, or even WHILE you are concurrently modifying it in the calling thread. Unless the Object itself is thread safe this will cause problems.
Similarly, if you call doSomethingAsync twice with the same object, then you have no idea which asynchronous thread will modify the object first, and no guarantee they will not act concurrently on the same object.
Finally, if you call doSomethingAsync twice with 2 different objects then you don't know which asynchronous thread will act on its own object first, but you don't care, because they can't conflict with each other unless the objects have static mutable variables (class variables) that are being modified.
If you require that one task get completed before another task and in the order submitted, then a single threaded ExecutorService is your answer.
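A minimal sketch of that suggestion, using Executors.newSingleThreadExecutor() from java.util.concurrent. The class OrderedTasks and the method runInOrder are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class OrderedTasks {
    // Submits `count` tasks and returns the order in which they actually ran.
    static List<Integer> runInOrder(int count) {
        final List<Integer> completed =
                Collections.synchronizedList(new ArrayList<Integer>());
        ExecutorService executor = Executors.newSingleThreadExecutor();
        for (int i = 0; i < count; i++) {
            final int id = i;
            executor.execute(() -> completed.add(id));   // one worker thread, FIFO queue
        }
        executor.shutdown();                             // accept no new tasks, finish queued ones
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed;
    }
}
```

With a single worker thread the tasks run one at a time in submission order, at the price of giving up parallelism between them.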
If the Java compiler simply makes obj into a member variable, the last call to doSomethingAsync will overwrite the prior values of obj, making prior invocations of the thread use a wrong value
No, this will not happen. The subsequent call to doSomethingAsync cannot overwrite the obj captured by previous invocations of doSomethingAsync. This stands even if you remove the final keyword (assume java let you do it for just this time).
I think your question ultimately is about how closure works/is implemented in java. However, your code is not demonstrating the complication in the proper way because the code is not even trying to modify the variable obj in the same lexical scope.
In a way Java is not really capturing the variable obj, but its value. You could write your code in a different way, and the overall effect is the same:
class YourThread extends Thread {
    private Object param;
    public YourThread (Object obj){
        param = obj;
    }
    @Override
    public void run(){
        //do something with your param
    }
}
and you no longer need the final keyword:
public void doSomethingAsync (Object obj){
Thread t = new YourThread (obj);
t.start();
}
Now, say you have two instances of YourThread created, how could the second instance modify what has been passed as parameter to the first instance?
Closure in Other Languages
In other languages, magical things can indeed happen, but to show it you need to write the code slightly differently:
public void doSomethingAsync (Object obj){
    //Here let's assume obj is not null
    Thread thread = new Thread (){
        @Override
        public void run () { ... /*do something with obj*/ ... }
    };
    thread.start ();
    obj = null;
}
This is not valid Java code, but in certain languages code like that is allowed. And the thread, when its run method is executed, might see obj as null.
Similarly, in the below code (again, not valid in Java), thread2 could potentially impact thread1 if thread2 executes first and changes obj in its run method:
public void doSomethingAsync (Object obj){
    Thread thread1 = new Thread (){
        @Override
        public void run () { ... /*do something with obj*/ ... }
    };
    thread1.start ();
    Thread thread2 = new Thread (){
        @Override
        public void run () { ... /*do something with obj*/ ... }
    };
    thread2.start ();
}
Back to Java
The reason Java forces you to put final on obj is that, although Java's syntax looks very similar to the closure syntax used in other languages, it does not implement the same closure semantics. Knowing that obj is final, Java does not need to create a capturing object (and thus an additional heap allocation), but can use something similar to YourThread behind the scenes. See this link for more details
Related
I have seen this NullPointerException on synchronized statement.
code:
synchronized(a){
a = new A();
}
So according to the above answer I have understood that it is not possible to use synchronized keyword on null reference.
So I changed my code to this:
synchronized(a = new A()){}
But I am not sure whether this is equivalent to my original code.
update:
what I want to achieve is lock the creation of a ( a = new A() )
synchronized requires an object that provides the locking mechanism. It can be any object (a synchronized instance method, which names no lock object, locks on this), but the Java API also provides classes dedicated to this functionality, for example ReentrantLock.
In the code you provided, every call to the function containing the synchronized block will use a different object for locking, effectively making the synchronization useless.
Edit:
Since you updated your post with what you are actually trying to accomplish I can help you more.
public class Creator {
private A a;
public void createA() {
synchronized(this) {
a = new A();
}
}
}
I don't know if this fits your design, since the code sample you provided is very small, but you should get the idea. Here the instance of the Creator class is used to synchronize the creation of A. If you share it across multiple threads, each one of them calling createA(), you can be sure that one instantiation process will be finished before another one begins.
synchronized(a = new A()){}
This creates a new object of class A and uses it as the lock. Put simply, every thread gets its own new lock that no other thread is using, so every thread can enter the synchronized block at any time, and the outcome is no synchronization at all.
For Example
class TestClass {
SomeClass someVariable;
public void myMethod () {
synchronized (someVariable) {
...
}
}
public void myOtherMethod() {
synchronized (someVariable) {
...
}
}
}
Here the two blocks are protected against simultaneous execution by two different threads, as long as someVariable is not modified. In other words, the two blocks are synchronized on the object referenced by someVariable.
But in your case there will always be a new object, so there will be no synchronization.
These two code snippets are not equivalent!
In the first code snippet you synchronize on some object referenced by a, and afterwards you change the reference which will not change the synchronization object.
In the second snippet you first assign a newly created object to reference a and then synchronize on it. So the synchronization object will be the new one.
Generally, it is a very bad idea to change the reference which is used in the synchronized statement, regardless of whether it is done inside the block (first code) or directly in the synchronized statement (second code). Make it final! Oh, and it mustn't be null, either.
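A minimal sketch of that advice, assuming the goal is to create a at most once. The class A below is a hypothetical stand-in for the question's class (with a construction counter added for the demo), and LazyA, getOrCreate and FinalLockDemo are invented names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the question's class A; counts its constructions.
class A {
    static final AtomicInteger constructed = new AtomicInteger();

    A() {
        constructed.incrementAndGet();
    }
}

class LazyA {
    private final Object lock = new Object(); // final and never null
    private A a;                              // only accessed while holding lock

    A getOrCreate() {
        synchronized (lock) {                 // always the same lock object
            if (a == null) {
                a = new A();
            }
            return a;
        }
    }
}

class FinalLockDemo {
    // Lets several threads race to create the instance and
    // returns how many A objects were constructed.
    static int constructionsWith(int threadCount) {
        int before = A.constructed.get();
        final LazyA holder = new LazyA();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> holder.getOrCreate());
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return A.constructed.get() - before;
    }
}
```

The lock is a separate final field, so reassigning a inside the block never changes which object the threads synchronize on.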
If I have one instance of an object A and it has an instance method foo() with only variables created and used in that method is that method thread safe even if the same instance is accessed by many threads?
If yes, does this still apply if instance method bar() on object A creates many threads and calls method foo() in the text described above?
Does it mean that every thread gets a "copy" of the method even if it belongs to the same instance?
I have deliberately not used the synchronized keyword.
Thanks
Yes. All local variables (variables defined within a method) live in the method's own stack frame, so they are thread safe, provided that no reference escapes the scope of the method.
Note: If a local reference escapes the method (for example as an argument to another method), or the method works on class-level or instance-level fields, then it is not thread safe.
Does it mean that every thread gets a "copy" of the method even if it belongs to the same instance
No, there is only one method, and every thread shares it. But each thread has its own stack frame, and local variables live in that thread's stack frame. Even if you use synchronized on local objects, escape analysis lets the JVM prove that the lock never leaves the thread, so it may optimize the synchronization away.
example :
public static void main(String[] args) {
Object lock = new Object();
synchronized (lock) {
System.out.println("hello");
}
}
may effectively be converted to:
public static void main(String[] args) {
Object lock = new Object(); // JVM might as well remove this line as an unused Object or instantiate it on the stack
System.out.println("hello");
}
You have to separate the code being run, and the data being worked on.
The method is code, executed by each of the threads. If that code contains a statement such as int i=5 which defines a new variable i, and sets its value to 5, then each thread will create that variable.
The problem with multi-threading is not with common code, but with common data (and other common resources). If the common code accesses some variable j that was created elsewhere, then all threads will access the same variable j, i.e. the same data. If one of these threads modifies the shared data while the others are reading, all kinds of errors might occur.
Now, regarding your question, your code should be thread safe as long as your variables are defined within bar(), and bar() doesn't access some common resource such as a file.
You should post some example code to make sure we understand the use case.
For this example:
public class Test {
private String varA;
public void doSomething() {
String varB;
}
}
If you don't do anything to modify varA in this example and only modify varB, this example is Thread Safe.
If, however, you create or modify varA and depend on its state, then the method is NOT Thread Safe.
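To make the local-variable case concrete, here is a small sketch in which many threads call the same method on one shared instance. LocalsOnly, sumTo and LocalsOnlyDemo are hypothetical names chosen for this example.

```java
class LocalsOnly {
    // Touches only its parameter and local variables,
    // so every call works on its own stack frame.
    long sumTo(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }
}

class LocalsOnlyDemo {
    // Calls sumTo concurrently on ONE instance and checks every result.
    static boolean allThreadsGotCorrectSum(int threadCount) {
        final LocalsOnly sharedInstance = new LocalsOnly();
        final boolean[] ok = new boolean[threadCount];   // each thread writes only its own slot
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final int slot = i;
            threads[i] = new Thread(
                    () -> ok[slot] = sharedInstance.sumTo(10_000) == 50_005_000L);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();                                // join makes the slot writes visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        for (boolean b : ok) {
            if (!b) {
                return false;
            }
        }
        return true;
    }
}
```

If total were an instance field instead of a local, the concurrent calls would share it and the results could be wrong.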
Consider the following class:
public class MyClass
{
private MyObject obj;
public MyClass()
{
obj = new MyObject();
}
public void methodCalledByOtherThreads()
{
obj.doStuff();
}
}
Since obj was created on one thread and accessed from another, could obj be null when methodCalledByOtherThreads is called? If so, would declaring obj as volatile be the best way to fix this issue? Would declaring obj as final make any difference?
Edit:
For clarity, I think my main question is:
Can other threads see that obj has been initialized by some main thread or could obj be stale (null)?
For methodCalledByOtherThreads to be called by another thread and cause problems, that thread would have to get a reference to a MyClass object whose obj field is not initialized, i.e. where the constructor has not yet returned.
This would be possible if you leaked the this reference from the constructor. For example
public MyClass()
{
SomeClass.leak(this);
obj = new MyObject();
}
If the SomeClass.leak() method starts a separate thread that calls methodCalledByOtherThreads() on the this reference, then you would have problems, but this is true regardless of the volatile.
Since you don't have what I'm describing above, your code is fine.
It depends on whether the reference is published "unsafely". A reference is "published" by being written to a shared variable; another thread reads the variable to get the reference. If there is no relationship of happens-before(write, read), the publication is called unsafe. An example of unsafe publication is through a non-volatile static field.
@chrylis's interpretation of "unsafe publication" is not accurate. Leaking this before constructor exit is orthogonal to the concept of unsafe publication.
Through unsafe publication, another thread may observe the object in an uncertain state (hence the name); in your case, field obj may appear to be null to another thread. However, if obj is final, it cannot appear to be null even if the host object is published unsafely.
This is all quite technical and requires further reading to understand. The good news is that you don't need to master "unsafe publication", because it is a discouraged practice anyway. The best practice is simply: never publish unsafely; i.e. never allow a data race; i.e. always read and write shared data through proper synchronization, using synchronized, volatile or java.util.concurrent.
If we always avoid unsafe publication, do we still need final fields? The answer is no. Then why are some objects (e.g. String) designed to be "thread safe immutable" by using final fields? Because it's assumed that they can be used in malicious code that tries to create uncertain state through deliberate unsafe publication. I think this is an overblown concern. It doesn't make much sense in server environments: if an application embeds malicious code, the server is compromised, period. It probably makes a bit of sense in an applet environment where the JVM runs untrusted code from unknown sources. Even then, this is an improbable attack vector; there is no precedent for this kind of attack, and there are apparently a lot of other more easily exploitable security holes.
This code is fine because the reference to the instance of MyClass can't be visible to any other threads before the constructor returns.
Specifically, the happens-before relation requires that the visible effects of actions occur in the same order as they're listed in the program code, so that in the thread where the MyClass is constructed, obj must be definitely assigned before the constructor returns, and the instantiating thread goes directly from the state of not having a reference to the MyClass object to having a reference to a fully-constructed MyClass object.
That thread can then pass a reference to that object to another thread, but all of the construction will have transitively happened-before the second thread can call any methods on it. This might happen through the constructing thread's launching the second thread, a synchronized method, a volatile field, or the other concurrency mechanisms, but all of them will ensure that all of the actions that took place in the instantiating thread are finished before the memory barrier is passed.
Note that if a reference to this gets passed out of the class inside the constructor somewhere, that reference might go floating around and get used before the constructor is finished. That's what's known as unsafe publishing of the object, but code such as yours that doesn't call non-final methods from the constructor (or directly pass out references to this) is fine.
Your other thread could see a null object. A volatile object could possibly help, but an explicit lock mechanism (or a Builder) would likely be a better solution.
Have a look at Java Concurrency in Practice - Sample 14.12
This class (if taken as is) is NOT thread safe. In two words: there is reordering of instructions in java (Instruction reordering & happens-before relationship in java) and when in your code you're instantiating MyClass, under some circumstances you may get following set of instructions:
Allocate memory for new instance of MyClass;
Return link to this block of memory;
Link to this not fully initialized MyClass is available for other threads, they can call "methodCalledByOtherThreads()" and get NullPointerException;
Initialize internals of MyClass.
To prevent this and make MyClass really thread safe, you have to add either "final" or "volatile" to the "obj" field. In this case Java's memory model (from Java 5 on) will guarantee that, during initialization of MyClass, the reference to the block of memory allocated for it is returned only when all internals are initialized.
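A sketch of the "final" variant, modelled on the question's MyClass. The int return value of doStuff() and the FinalFieldDemo class are assumptions added so that the example has something to check. The race itself cannot be reproduced deterministically, so the demo only shows the shape of the fix and its ordinary behavior.

```java
class MyObject {
    int doStuff() {
        return 42;   // hypothetical result, only here so the demo can check something
    }
}

class MyClass {
    // final: a thread that obtains a reference to a MyClass after its
    // constructor finished is guaranteed to see obj initialized,
    // as long as `this` did not escape during construction.
    private final MyObject obj;

    MyClass() {
        obj = new MyObject();
    }

    int methodCalledByOtherThreads() {
        return obj.doStuff();
    }
}

class FinalFieldDemo {
    static int callFromOtherThread() {
        final MyClass shared = new MyClass();
        final int[] result = new int[1];
        Thread other = new Thread(
                () -> result[0] = shared.methodCalledByOtherThreads());
        other.start();
        try {
            other.join();   // join makes the write to result[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }
}
```

Note that in this demo Thread.start() already establishes a happens-before edge on its own; the final modifier matters in the cases where the reference reaches the other thread without any such synchronization.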
For more details I strongly recommend the book "Java Concurrency in Practice". Exactly your case is described on pages 50-51 (section 3.5.1). I would even say that you just can't write correct multithreaded code without reading that book! :)
The originally picked answer by @Sotirios Delimanolis is wrong. @ZhongYu's answer is correct.
The concern here is a visibility issue. If MyClass is published unsafely, anything could happen.
Someone in the comments asked for evidence; one can check Listing 3.15 in the book Java Concurrency in Practice:
public class Holder {
private int n;
// Initialize in thread A
public Holder(int n) { this.n = n; }
// Called in thread B
public void assertSanity() {
if (n != n) throw new AssertionError("This statement is false.");
}
}
Someone came up with an example to verify this piece of code:
coding a proof for potential concurrency issue
As to the specific example of this post:
public class MyClass{
private MyObject obj;
// Initialize in thread A
public MyClass(){
obj = new MyObject();
}
// Called in thread B
public void methodCalledByOtherThreads(){
obj.doStuff();
}
}
If MyClass is initialized in thread A, there is no guarantee that thread B will see this initialization (because the change might stay in the cache of the CPU that thread A runs on and not yet have propagated into main memory).
Just as @ZhongYu has pointed out, because the write and the read happen in two independent threads, there is no happens-before(write, read) relation.
To fix this, as the original author has mentioned, we can declare private MyObject obj as volatile, which will ensure that the reference itself is visible to other threads in a timely manner
(https://www.logicbig.com/tutorials/core-java-tutorial/java-multi-threading/volatile-ref-object.html) .
Directly from this web site, I came across the following description about creating object thread safety.
Warning: When constructing an object that will be shared between
threads, be very careful that a reference to the object does not
"leak" prematurely. For example, suppose you want to maintain a List
called instances containing every instance of class. You might be
tempted to add the following line to your constructor:
instances.add(this);
But then other threads can use instances to access the object before
construction of the object is complete.
Is anybody able to express the same concept with other words or another more graspable example?
Thanks in advance.
Let us assume, you have such class:
class Sync {
public Sync(List<Sync> list) {
list.add(this);
// switch
// instance initialization code
}
public void bang() { }
}
and you have two threads (thread #1 and thread #2), both of which have a reference to the same List<Sync> list instance.
Now thread #1 creates a new Sync instance and as an argument provides a reference to the list instance:
new Sync(list);
While executing line // switch in the Sync constructor there is a context switch and now thread #2 is working.
Thread #2 executes such code:
for(Sync elem : list)
elem.bang();
Thread #2 calls bang() on the instance that thread #1 is creating, but this instance is not ready to be used yet, because its constructor has not finished.
Therefore,
you have to be very careful when calling a constructor and passing a reference to the object shared between a few threads
when implementing a constructor you have to keep in mind that the provided instance can be shared between a few threads
Thread A is creating object A, and in the middle of the creation (in the first line of object A's constructor) there is a context switch. Now thread B is working, and thread B can look into object A (it already has a reference). However, object A is not yet fully constructed, because thread A has not had time to finish it.
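The effect can be imitated on a single thread, which makes it deterministic. In this sketch the constructor itself does, right after leaking this, what the second thread would do after the context switch. LeakySync, state and observedDuringConstruction are hypothetical names; the real problem involves two threads and shows up only intermittently.

```java
import java.util.ArrayList;
import java.util.List;

class LeakySync {
    private String state;                 // set by the "instance initialization code"
    String observedDuringConstruction;    // what an observer saw at the leak point

    LeakySync(List<LeakySync> list) {
        list.add(this);                   // `this` escapes here
        // Stand-in for the context switch: do what the other thread
        // would do with the half-built object it found in the list.
        observedDuringConstruction = list.get(list.size() - 1).bang();
        state = "initialized";            // too late for the observer
    }

    String bang() {
        return state;
    }
}
```

Used like this, the observer sees null even though the finished object reports "initialized":

```java
LeakySync s = new LeakySync(new ArrayList<LeakySync>());
System.out.println(s.observedDuringConstruction); // null
System.out.println(s.bang());                     // initialized
```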
Here is your clear example :
Let's say, there is class named House
class House {
    private static List<House> listOfHouse = new ArrayList<House>();
    private String name;
    // other properties
    public House(){
        listOfHouse.add(this);
        this.name = "dummy house";
        //do other things
    }
    // other methods
}
And Village:
class Village {
public static void printsHouses(){
for(House house : House.getListOfHouse()){
System.out.println(house.getName());
}
}
}
Now suppose you are creating a House in a thread "X", and the executing thread has just finished the line below,
listOfHouse.add(this);
and the context is switched (the reference to this object has already been added to the list listOfHouse, while the object creation is not finished yet) to another thread "Y", which runs
printsHouses();
Then printsHouses() will see an object which is still not fully created. This type of inconsistency is known as a leak.
Lots of good data here, but I thought I'd add some more information.
When constructing an object that will be shared between threads, be very careful that a reference to the object does not "leak" prematurely.
While you are constructing the object, you need to make sure that there is no way for other threads to access this object before it is fully constructed. This means that in a constructor you should not, for example:
Assign the object to a static field on the class that is accessible by other threads.
Start a thread on the object in the constructor which may start using fields from the object before they are fully initialized.
Publish the object into a collection or via any other mechanism that allows other threads to see the object before it is fully constructed.
You might be tempted to add the following line to your constructor:
instances.add(this);
So something like the following is improper:
public class Foo {
// multiple threads can use this
public static List<Foo> instances = new ArrayList<Foo>();
public Foo() {
...
// this "leaks" this, publishing it to other threads
instances.add(this);
...
// other initialization stuff
}
...
One additional bit of complexity is that the Java compiler/optimizer is able to reorder the instructions inside the constructor so that they happen at a later time. This means that even if you do instances.add(this); as the last line of the constructor, this is not enough to ensure that the constructor really has finished.
If multiple threads are going to access this published object, access to it must be synchronized. The only fields you don't need to worry about are final fields, which are guaranteed to be fully initialized when the constructor finishes. volatile fields carry their own memory synchronization, so you don't have to worry about them either.
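One common way to avoid the leak is to keep the constructor private and register the object from a static factory method, after construction has completed. This is a sketch of that idea, not the tutorial's own code. RegisteredFoo, create and getName are hypothetical names, and CopyOnWriteArrayList is swapped in for the ArrayList so that the shared list itself is thread safe.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class RegisteredFoo {
    // Thread-safe list that multiple threads can read and write.
    static final List<RegisteredFoo> instances =
            new CopyOnWriteArrayList<RegisteredFoo>();

    private final String name;

    // Private constructor: it never hands `this` to anyone.
    private RegisteredFoo(String name) {
        this.name = name;
    }

    // Factory method: construct first, publish afterwards.
    static RegisteredFoo create(String name) {
        RegisteredFoo foo = new RegisteredFoo(name);
        instances.add(foo);
        return foo;
    }

    String getName() {
        return name;
    }
}
```

Callers write RegisteredFoo.create("x") instead of new RegisteredFoo("x"), so no code path can register an object whose constructor is still running.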
I think that the following example illustrates what the authors wanted to say:
public class MyClass {
    public MyClass(List<Object> list) {
        // some stuff
        list.add(this); // self registration
        // other stuff
    }
}
MyClass registers itself in a list that can be used by another thread. But it runs "other stuff" after the registration. This means that if another thread starts using the object before its constructor has finished, the object is probably not fully created yet.
It's describing the following situation:
Thread1:
//we add a reference to this thread
object.add(thread1Id,this);
//we start to initialize this thread, but suppose before reaching the next line we switch threads
this.initialize();
Thread2:
//we are able to get th1, but its not initialized properly so its in an invalid state
//and hence th1 is not valid
Object th1 = object.get(thread1Id);
As the thread scheduler can stop execution of a thread at any time (even half-way through a high level instruction like instances.push_back(this)) and switch to executing a different thread, unexpected behaviour can happen if you don't synchronize parallel access to objects.
Look at the code below:
#include <vector>
#include <thread>
#include <memory>
#include <iostream>
struct A {
std::vector<A*> instances;
A() { instances.push_back(this); }
void printSize() { std::cout << instances.size() << std::endl; }
};
int main() {
std::unique_ptr<A> a; // Initialized to nullptr.
std::thread t1([&a] { a.reset(new A()); }); // Construct new A.
std::thread t2([&a] { a->printSize(); }); // Use A. This will fail if t1 doesn't happen to finish first.
t1.join();
t2.join();
}
As the access to a in the main() function is not synchronized, execution will fail every once in a while.
This happens when execution of thread t1 is halted before finishing construction of the object A and thread t2 is executed instead. This results in thread t2 trying to access a unique_ptr<A> containing a nullptr.
You just have to make sure that no thread can access the object (and get a NullPointerException) before the constructing thread has finished initializing it.
In this case, initialization happens in the constructor (I suppose), but another thread could access that very object between its addition to the list and the end of the constructor.
Below is the example code
class Abc {
    // param: assumed here to be supplied by the caller
    void method1(final Param param) {
        ExecutorService threadPool = Executors.newFixedThreadPool(10);
        for (int i = 0; i < 100; i++) {
            threadPool.execute(new Runnable() {
                @Override
                public void run() {
                    doSomeThing(param);
                }
            });
        }
        threadPool.shutdown();
    }
    void doSomeThing(Param param) {
        Object ref1, ref2, ref3, ref4;
    }
}
Here we execute the method doSomeThing() from multiple threads, and doSomeThing() has many object references.
My question is: if any thread changes the state of an object reference, will this change be visible to other threads?
If so, what do I need to do so that each thread has its own state? I know we can fix this by creating a new instance of the class when passing it to execute(), but I am trying to fix the problem with this style.
Each call to doSomeThing will get its own set of variables, whether they're in the same thread or not.
The variables will be equal to whatever you set them to in each call.
My question is: if any thread changes the state of an object reference, will this change be visible to other threads?
And the simple answer is yes. However, this is far too simple to be helpful.
What you are asking is fundamental to the multithreading concept. Essentially, if you pass the same object to several threads at once then either the changes each thread makes to the object must be choreographed carefully or you must live with unpredictable results.
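As one possible way to "choreograph" such changes, here is a minimal sketch in which all pool threads update a single shared object through synchronized methods. SharedCounter and SharedStateDemo are hypothetical names; the pool size of 10 mirrors the question's code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SharedCounter {
    private int value;

    // synchronized: one thread at a time, and changes are visible to the next thread.
    synchronized void increment() {
        value++;
    }

    synchronized int get() {
        return value;
    }
}

class SharedStateDemo {
    // Runs `tasks` tasks that each increment the SAME counter 1000 times.
    static int runTasks(int tasks) {
        final SharedCounter shared = new SharedCounter();
        ExecutorService threadPool = Executors.newFixedThreadPool(10);
        for (int i = 0; i < tasks; i++) {
            threadPool.execute(() -> {
                for (int j = 0; j < 1000; j++) {
                    shared.increment();
                }
            });
        }
        threadPool.shutdown();
        try {
            if (!threadPool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.get();
    }
}
```

Without synchronized on increment(), some updates could be lost and the total would be unpredictable. The alternative is to give every task its own object, so that nothing is shared in the first place.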