How to ensure a Java method is thread safe

Is it enough to use only local variables and no instance variables, thus only using memory on the stack (per thread)?
But what happens when you create a new MyObject that is local to the method? Doesn't the new object get created on the heap? Is it thread safe because the reference to it is local (thread safe)?

It is thread safe because if it is only referenced by variables in that particular method (it is, as you said, a local variable), then no other threads can possibly have a reference to the object, and therefore cannot change it.
Imagine you and I are pirates (threads). You go and bury your booty (the object) on an island (the heap), keeping a map to it (the reference). I happen to use the same island for burying my booty, but unless you give me your map, or I go digging all over the island (which isn't allowed on the island of Java), I can't mess with your stash.

Your new MyObject is thread-safe because each call to the method will create its own local instance on the heap. None of the calls refer to a common instance; if there are N calls, that means N instances of MyObject on the heap. When the method exits, each instance is eligible for GC as long as you don't return it to the caller.
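A minimal sketch of this point, assuming a hypothetical Accumulator class standing in for MyObject (the class and method names are made up for illustration). The object is mutable and not thread-safe by itself, but because each call creates its own instance and never hands the reference out, concurrent calls cannot interfere with each other:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for MyObject: mutable and NOT thread-safe by itself.
class Accumulator {
    private int total;
    void add(int value) { total += value; }
    int total() { return total; }
}

class ConfinementDemo {
    // Each call creates its own Accumulator on the heap. The reference never
    // leaves this method, so no other thread can reach the object.
    static int sumUpTo(int n) {
        Accumulator local = new Accumulator();
        for (int i = 1; i <= n; i++) {
            local.add(i);
        }
        return local.total(); // an int copy is returned, not the object
    }

    public static void main(String[] args) throws InterruptedException {
        int[] results = new int[4];
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < results.length; t++) {
            final int slot = t;
            Thread thread = new Thread(() -> results[slot] = sumUpTo(1000));
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            thread.join(); // join makes each thread's write visible here
        }
        for (int result : results) {
            System.out.println(result); // every thread computes 500500
        }
    }
}
```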

Well, let me ask you a question: does limiting your method to local variables mean your method can't share a resource with another thread? If not, then obviously this isn't sufficient for thread safety in general.
If you're worried about whether another thread can modify an object you created in another thread, then the only thing you need to worry about is never leaking a reference to that object out of the thread. If you achieve that, your object will be in the heap, but no other thread will be able to reference it so it doesn't matter.
Edit
Regarding my first statement, here's a method with no instance variables:
public void methodA() {
File f = new File("/tmp/file");
//...
}
This doesn't mean there can't be a shared resource between two threads :-).

There's no way for other threads to access such an object reference. But if that object is not thread-safe, then the overall thread-safety is compromised.
Consider for example that MyObject is a HashMap.
The argument that an object is not thread-safe because it is on the heap is not valid. The heap is not accessible via pointer arithmetic, so it doesn't matter where the object is actually stored (ThreadLocals aside).

Related

Does a class hold references to its field objects?

I am a bit confused about the GC aspect when it comes to instance variables, especially fields.
So, if an object holds references to its field objects, these won't be eligible for garbage collection until the object itself is. Since Threads are GC roots and every object must have been created on some Thread only, thread won't let go of any objects created on it and the entire object hierarchy from a Thread shall remain for a considerable time before getting garbage collected.
On the other hand, if an object lets go of the field objects, calling a getter for these objects will end up in returning null later.
So, what are the facts here?
Clarification for "field objects"(as asked in comments)
By field objects I mean, the field members of an object that are themselves objects
Edit 2: A bit more elaboration
So, you see Threads are execution units having representation in memory through the Thread object instance. Any code execution that is happening anywhere is happening on some Thread.
How would this execution happen?
Well, through the execution of some code in a method. What would that make this object created?
A Local variable
And, that would make it a GC root.
Btw, for a method call, there is a stack for that particular call and this is what I have been referring to here.
It ain't so simple as #louis-wasserman said - "Yes, naturally" or for that matter not that natural..(?)
I investigated some more and found the answer where you would probably expect it: the Java Virtual Machine Specification
2.7. Representation of Objects
The Java Virtual Machine does not mandate any particular internal
structure for objects.
In some of Oracle’s implementations of the Java Virtual Machine, a
reference to a class instance is a pointer to a handle that is itself
a pair of pointers: one to a table containing the methods of the
object and a pointer to the Class object that represents the type of
the object, and the other to the memory allocated from the heap for
the object data.
Yes, that settles it. Even though the JVM specification doesn't mandate the internal structure of a java.lang.Object, it is likely that a structure similar to the one in Oracle's JVM is used.
This has bigger implications than you might think. Imagine a very heavy object holding one very bulky member field object. Hmmm...a Bitmap maybe. A Bitmap of 10MB and the other object simply holds the image's title:
bulky_object = {bitmap, title}
If you create this object as a local variable inside a nested scope of a method (for example's sake), the container object is eligible for garbage collection once the scope ends. But if you decide to hold a reference to the bitmap (the field) object after the scope, the containing object won't have been collected fully:
void someMethod() {
    // Outer block of the method
    // Bitmap and BulkyObject are placeholder types for this example
    Bitmap bitmapRef;
    // Nested block starts
    {
        BulkyObject someObject = new BulkyObject();
        // Hold a ref to the bitmap
        bitmapRef = someObject.bitmap;
    }
    // Nested block has ended. someObject is eligible for GC and is no longer
    // reachable from a GC root
    // bitmapRef remains alive and well here, as we are still holding a ref to it
    // Also, someObject may already have been garbage collected, leaving bitmapRef alive
}
This would seem like an object leak here.

Is it reasonable to synchronize on a local variable?

From the Java memory model, we know that every thread has its own thread stack, and that local variables are placed in each thread's own thread stack.
And that other threads can't access these local variables.
So in which case should we synchronize on local variables?
You are talking about the below case:
public class MyClass {
    public void myMethod() {
        //Assume Customer is a Class
        Customer customer = getMyCustomer();
        synchronized(customer) {
            //only one thread at a time (whichever holds the lock)
            //can access the customer object
        }
    }
}
In the above code, customer is a local reference variable, but you are still using a synchronized block to restrict access to the object customer is pointing to (by a single thread at a time).
In the Java memory model, objects live on the heap (even though the references are local to a thread and live on its stack), and synchronization is all about restricting access to an object on the heap to exactly one thread at a time.
In short, when you say local variable (non-primitive), only reference is local, but not the actual object itself i.e., it is actually referring to an object on the heap which can be accessed by many other threads. Because of this, you need synchronization on the object so that single thread can only access that object at a time.
There are two situations:
1. The local variable is of a primitive type like int or double.
2. The local variable is of a reference type like ArrayList.
In the first situation, you can't synchronize, as you can only synchronize on Objects (which are pointed to by reference-type variables).
In the second situation, it all depends on what the local variable points to. If it points to an object that other threads (can) also point to, then you need to make sure that your code is properly synchronized.
Examples: you assigned the local variable from a static or instance field, or you got the object from a shared collection.
If, however, the object was created in your thread and only assigned to that local variable, and you never give out a reference to it from your thread to another thread, and the object's implementation itself also doesn't give out references, then you don't need to worry about synchronization.
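The two cases for a reference-type local variable can be sketched as follows. This is only an illustration: the class SharedVersusConfined and its method names are invented here. In record, the local variable points at a list that other threads can also reach, so access is synchronized on that object. In sumConfined, the list never leaves the calling thread, so no synchronization is needed:

```java
import java.util.ArrayList;
import java.util.List;

class SharedVersusConfined {
    // Shared state: every thread that calls record() reaches this same list.
    private static final List<Integer> SHARED = new ArrayList<>();

    static void record(int value) {
        // 'target' is a local variable, but it points at the shared list,
        // so access has to be synchronized on that object.
        List<Integer> target = SHARED;
        synchronized (target) {
            target.add(value);
        }
    }

    static int sumConfined(int count) {
        // Created here and never handed to another thread: no lock needed.
        List<Integer> mine = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            mine.add(i);
        }
        int sum = 0;
        for (int value : mine) {
            sum += value;
        }
        return sum;
    }

    static int sharedSize() {
        synchronized (SHARED) {
            return SHARED.size();
        }
    }

    // Starts 'threads' writers and returns how many elements they added.
    static int runWriters(int threads, int perThread) {
        int before = sharedSize();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    record(i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return sharedSize() - before;
    }
}
```

Without the synchronized block in record, concurrent calls to ArrayList.add could lose elements or throw, because ArrayList is not thread-safe.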
The point is: synchronization is done for a purpose. You use it to ensure that exactly one thread can do some special protection-worthy activity at any given time.
Thus: if you need synchronization, it is always about more than one thread. And of course, then you need to lock on something that all those threads have access to.
Or in other words: there is no point in you locking the door in order to prevent yourself from entering the building.
But, as the other answer points out: it actually depends on the definition of "local" variable. Let's say you have:
void foo() {
    final Object lock = new Object();
    Thread a = new Thread(() -> { synchronized (lock) { /* ... */ } });
    Thread b = new Thread(() -> { synchronized (lock) { /* ... */ } });
}
then sure, that "local" variable can be used as lock for those two threads. And beyond that: that example works because synchronization happens on the monitor of a specific object. And objects reside on the heap. All of them.
Yes, it does make sense when the local variable is used to synchronize access to a block of code from threads that are defined and created in the same method as the local variable.
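A runnable sketch of that situation, with invented names (LocalLockDemo, countWithLocalLock). The lock is a local variable, but both threads are created in the same method and capture it, so they contend for the same monitor:

```java
class LocalLockDemo {
    static int countWithLocalLock(int incrementsPerThread) {
        final Object lock = new Object(); // local variable, object on the heap
        final int[] counter = {0};        // also captured by both threads

        Runnable work = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                synchronized (lock) {     // both threads use the same monitor
                    counter[0]++;
                }
            }
        };

        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(countWithLocalLock(100_000)); // prints 200000
    }
}
```

If the synchronized block is removed, the unsynchronized increments can interleave and the result is usually lower than expected.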

What should I use as a lock object of a synchronized statement in Java

Could anyone explain what is the difference between these examples?
Example # 1.
public class Main {
private Object lock = new Object();
private MyClass myClass = new MyClass();
public void testMethod() {
// TODO Auto-generated method stub
synchronized (myClass) {
// TODO: modify myClass variable
}
}
}
Example # 2.
package com.test;
public class Main {
private MyClass myClass = new MyClass();
private Object lock = new Object();
public void testMethod() {
// TODO Auto-generated method stub
synchronized (lock) {
// TODO: modify myClass variable
}
}
}
What should I use as a monitor lock if I need to take care about synchronization when modifying the variable?
Assuming that Main is not intended to be a "leaky abstraction", there is minimal difference between the first and second examples.
It may be better to use an Object rather than some other class because an Object instance has no fields and is therefore smaller. And the Object-as-lock idiom makes it clear that the lock variable is intended to only ever be used as a lock.
Having said that, there is a definite advantage in locking on an object that nothing else will ever see. The problem with a Main method synchronizing on a Main (e.g. this) is that other unrelated code could also be synchronizing on it for an unrelated purpose. By synchronizing on a dedicated (private) lock object you avoid that possibility.
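To illustrate the advantage of a private lock, here is a small sketch with a hypothetical HitCounter class. Unrelated code holds the monitor of the counter instance itself, yet a worker thread can still call hit(), because hit() synchronizes on a lock object that nothing outside the class can see. If hit() were declared synchronized (locking on this), the join below would never return:

```java
class HitCounter {
    // Private lock: no code outside this class can synchronize on it.
    private final Object lock = new Object();
    private long hits;

    void hit() {
        synchronized (lock) {
            hits++;
        }
    }

    long hits() {
        synchronized (lock) {
            return hits;
        }
    }
}

class PrivateLockDemo {
    static long hitWhileInstanceIsLocked() {
        HitCounter counter = new HitCounter();
        Thread worker = new Thread(counter::hit);
        synchronized (counter) { // unrelated code locking the instance itself
            worker.start();
            try {
                worker.join();   // completes: hit() does not need this monitor
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.hits();
    }

    public static void main(String[] args) {
        System.out.println(hitWhileInstanceIsLocked()); // prints 1
    }
}
```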
In response to the comment:
There is a MAJOR difference in the two cases. In the first you're locking the object that you want to manipulate. In the second you're locking some other object that has no obvious relationship to the object being manipulated. And the second case takes more space, since you must allocate the (otherwise unused) Object, rather than using the already-existing instance you're protecting.
I think you are making an INCORRECT assumption - that MyClass is the data structure that needs protecting. In fact, the Question doesn't say that. Indeed the way that the example is written implies that the lock is intended to protect the entire Main class ... not just a part of its state. And in that context, there IS an obvious connection ...
The only case where it would be better to lock the MyClass would be if the Main was a leaky abstraction that allowed other code to get hold of its myClass reference. That would be bad design, especially in a multi-threaded app.
Based on the revision history, I'm pretty sure that is not the OP's intention.
The synchronized statement is useful when changing variables of an object.
You are changing variables of myClass so you want to lock on myClass object. If you were to change something in lock then you want to lock on lock object.
In example #2 you are modifying myClass but locking on lock object which is nonsense.
In the first case you lock on an object that is known only within this method, so it is unlikely that anybody else will use the same object to lock on, so such a lock is almost useless. The second variant makes much more sense to me.
At the same time, the myClass variable is also known only within this method, so it is unlikely that another thread will access it, so the lock is probably not necessary here at all. A more complete example is needed to say more.
In general, you want to lock on the "root" object of the data you're manipulating. If you're, eg, going to subtract a value from a field in object A and add that value to object B, you need to lock some object that is somehow common (at least by convention) between A and B, possibly the "owner" object of the two. This is because you're doing the lock to maintain a "contract" of consistency between separate pieces of data -- the object locked must be common to and conceptually encompassing of the entire set of data that must be kept consistent.
The simple case, of course, is when you're modifying field A and field B in the same object, in which case locking that object is the obvious choice.
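A sketch of the "common owner" rule, using made-up Account and Ledger classes. A transfer touches two accounts, so the lock is taken on the Ledger that owns both of them rather than on either account:

```java
// Hypothetical types for illustration; Ledger plays the role of the "owner".
class Account {
    long balance; // guarded by the monitor of the owning Ledger

    Account(long balance) {
        this.balance = balance;
    }
}

class Ledger {
    final Account a = new Account(100);
    final Account b = new Account(100);

    // Both accounts change under one lock, so no thread can observe money
    // that has left 'from' but has not yet arrived in 'to'.
    synchronized void transfer(Account from, Account to, long amount) {
        from.balance -= amount;
        to.balance += amount;
    }

    synchronized long total() {
        return a.balance + b.balance;
    }
}

class LedgerDemo {
    static long totalAfterConcurrentTransfers(int transfersPerThread) {
        Ledger ledger = new Ledger();
        Thread forward = new Thread(() -> {
            for (int i = 0; i < transfersPerThread; i++) {
                ledger.transfer(ledger.a, ledger.b, 1);
            }
        });
        Thread backward = new Thread(() -> {
            for (int i = 0; i < transfersPerThread; i++) {
                ledger.transfer(ledger.b, ledger.a, 1);
            }
        });
        forward.start();
        backward.start();
        try {
            forward.join();
            backward.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return ledger.total(); // the invariant: always 200
    }
}
```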
A little less obvious is when you're dealing with static data belonging to a single class. In that case you generally want to lock the class.
A separate "monitor" object -- created only to serve as a lockable entity -- is rarely needed in Java, but might apply to, say, elements of two parallel arrays, where you want to maintain consistency between element N of the two arrays. In that case, something like a 3rd array of monitor objects might be appropriate.
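The parallel-array case might look like the sketch below. The class and field names are assumptions for illustration only. A third array holds one monitor object per index, and every access to element N of the two data arrays goes through monitor N:

```java
class ParallelArrays {
    private final int[] quantity;
    private final long[] totalPrice;
    private final Object[] monitors; // monitors[i] guards quantity[i] and totalPrice[i]

    ParallelArrays(int size) {
        quantity = new int[size];
        totalPrice = new long[size];
        monitors = new Object[size];
        for (int i = 0; i < size; i++) {
            monitors[i] = new Object();
        }
    }

    void addSale(int index, int units, long unitPrice) {
        synchronized (monitors[index]) {
            quantity[index] += units;
            totalPrice[index] += units * unitPrice;
        }
    }

    // Returns {quantity, totalPrice} for one index as a consistent pair.
    long[] snapshot(int index) {
        synchronized (monitors[index]) {
            return new long[] { quantity[index], totalPrice[index] };
        }
    }

    // Two threads update the same index; different indexes would not contend.
    static long[] concurrentSales(int salesPerThread) {
        ParallelArrays data = new ParallelArrays(2);
        Runnable seller = () -> {
            for (int i = 0; i < salesPerThread; i++) {
                data.addSale(0, 1, 2L);
            }
        };
        Thread first = new Thread(seller);
        Thread second = new Thread(seller);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return data.snapshot(0);
    }
}
```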
(Note that this is all just a "quick hack" at laying out some rules. There are many subtleties that one can run into, especially when attempting to allow the maximum of concurrent access to heavily-accessed data. But such cases are rare outside of high-performance computing.)
Whatever you choose, it's critical that the choice be consistent across all references to the protected data. You don't want to lock object A in one case and object B in another, when referencing/modifying the same data. (And PLEASE don't fall into the trap of thinking you can lock an arbitrary instance of Class A and that will somehow serve to lock another instance of Class A. That's a classical beginner's mistake.)
In your above example you'd generally want to lock the created object, assuming the consistency you're assuring is all internal to that object. But note that in this particular example, unless the constructor for MyClass somehow lets the object address "escape", there is no need to lock at all, since there is no way that another thread can get the address of the new object.
The differences are the class of the lock and its scope.
- Both topics are pretty much orthogonal to synchronization
- Objects of different classes may have different sizes
- Objects in different scopes may be available in different contexts
Basically both will behave the same in relation to synchronization.
Neither example is good synchronisation practice.
The lock Object should be placed in MyClass as a private field.

Best practice for java object initialization

Are there any practical differences between these approaches? (memory, GC, performance, etc?)
while...{
Object o=new Object();
...
o=new Object();
...
}
and
Object o;
while...{
o=new Object();
...
o=new Object();
...
}
From Effective Java 2nd Edition:
The most powerful technique for minimizing the scope of a local variable
is to declare it where it is first used. If a variable is declared before it is used, it’s
just clutter—one more thing to distract the reader who is trying to figure out what
the program does. By the time the variable is used, the reader might not remember
the variable’s type or initial value.
Declaring a local variable prematurely can cause its scope not only to extend
too early, but also to end too late. The scope of a local variable extends from the
point where it is declared to the end of the enclosing block. If a variable is
declared outside of the block in which it is used, it remains visible after the program
exits that block. If a variable is used accidentally before or after its region of
intended use, the consequences can be disastrous.
In other words, the differences in performance (CPU, memory) are irrelevant in your case. What is far more important is the semantics and correctness of the program, which are better in your first code example.
In your first example, o will go out of scope after your while loop finishes.
Now, if you don't actually use o outside of the while loop (even if you load the object it references into a different structure), this is fine, but you won't be able to access o outside of the loop.
Also, and this is just being nitpicky, but neither of those will compile, because you declare Object o twice.
I think you need to trade off between object reuse and the eligibility for garbage collection + readability.
The minimum scope always increases readability and minimizes error-proneness.
Again, if the creation of some object is too costly (like a Thread or a database connection), reuse should be considered. Such objects are generally not created inside a loop and are cached in a pool.
That's why connection pooling and thread pools are so popular.
In the case of option 1, the Object will be eligible for GC once the while loop finishes, whereas in option 2 the Object will last until the method ends.

Use of Volatile variables for safe publication of Immutable objects

I came across this statement:
In properly constructed objects, all
threads will see correct values of
final fields, regardless of how the
object is published.
Then why is a volatile variable used to safely
publish an immutable object?
I'm really confused. Can anybody make it clear with a suitable example?
In this case, the volatility would only ensure visibility of the new object; any other threads that happened to get hold of your object via a non-volatile field would indeed see the correct values of final fields as per JSR-133's initialization safety guarantees.
Still, making the variable volatile doesn't hurt; is correct from a memory management perspective anyway; and would be necessary for non-final fields initialised in a constructor (although there shouldn't be any of these in an immutable object). If you wish to share variables between threads, you'll need to ensure adequate synchronization to give visibility anyway; though in this case you're right, that there's no danger to the atomicity of the constructor.
Thanks to Tom Hawtin for pointing out I'd completely overlooked the JMM guarantees on final fields; previous incorrect answer is given below.
The reason for the volatile variable is that it establishes a happens-before relationship (according to the Java Memory Model) between the construction of the object and the assignment of the variable. This achieves two things:
Subsequent reads of that variable from different threads are guaranteed to see the new value. Without marking the variable as volatile, these threads could see stale values of the reference.
The happens-before relationship places limits on what reorderings the compiler can do. Without a volatile variable, the assignment to the variable could happen before the object's constructor runs - hence other threads could get a reference to the object before it was fully constructed.
Since one of the fundamental rules of immutable objects is that you don't publish references during the constructor, it's this second point that is likely being referenced here. In a multithreaded environment without proper concurrent handling, it is possible for a reference to the object to be "published" before that object has been constructed. Thus another thread could get that object, see that one of its fields is null, and then later see that this "immutable" object has changed.
Note that you don't have to use volatile fields to achieve this if you have other appropriate synchronization primitives - for example, if the assignment (and all later reads) are done in a synchronized block on a given monitor - but in a "standalone" sense, marking the variable as volatile is the easiest way to tell the JVM "this might be read by multiple threads, please make the assignment safe in that context."
A volatile reference to an immutable object could be useful. This would allow you to swap one object for another to make the new data available to other threads.
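A sketch of that swap, with hypothetical Settings and SettingsHolder classes. Settings is immutable (all fields final), and the holder's volatile field is what makes a replacement visible to other threads. A reader always sees either the old object or the new one, never a mix of the two:

```java
// Immutable value object: final class, final fields, no setters.
final class Settings {
    final String host;
    final int port;

    Settings(String host, int port) {
        this.host = host;
        this.port = port;
    }
}

class SettingsHolder {
    // volatile: a write by one thread is visible to later reads by others.
    private volatile Settings current = new Settings("localhost", 8080);

    Settings get() {
        return current;
    }

    void replace(Settings next) {
        current = next; // swap the whole object, never mutate it in place
    }
}

class SettingsDemo {
    static String seenByReader() {
        SettingsHolder holder = new SettingsHolder();
        Thread writer = new Thread(
                () -> holder.replace(new Settings("example.org", 443)));
        writer.start();

        // Spin until the swap becomes visible through the volatile read.
        Settings seen;
        do {
            seen = holder.get();
        } while (seen.port != 443);

        // host and port come from the same immutable object, so they match.
        return seen.host + ":" + seen.port;
    }
}
```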
I would suggest you look at using AtomicReference first, however.
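For comparison, a minimal AtomicReference sketch (Range and RangeHolder are invented for this example). Besides visibility, AtomicReference offers atomic read-modify-write operations such as updateAndGet, which a plain volatile field does not:

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable value object used as the payload.
final class Range {
    final int low;
    final int high;

    Range(int low, int high) {
        this.low = low;
        this.high = high;
    }
}

class RangeHolder {
    private final AtomicReference<Range> ref =
            new AtomicReference<>(new Range(0, 10));

    Range get() {
        return ref.get();
    }

    // Atomically replaces the Range with a wider one if needed. The function
    // may be retried under contention, so it must be free of side effects.
    void widenTo(int newHigh) {
        ref.updateAndGet(r -> newHigh > r.high ? new Range(r.low, newHigh) : r);
    }
}
```

With a volatile field, the check "is newHigh larger?" and the assignment would be two separate steps, and another thread could slip in between them.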
If you need final volatile fields you have a problem. All fields, including final ones are available to other threads as soon as the constructor returns. So if you pass an object to another thread in the constructor, it is possible for the other thread to see an inconsistent state. IMHO you should consider a different solution so you don't have to do this.
You can't really see the difference in an immutable class. See the example below, in Myclass.class:
public static Foo getInstance(){
if(INSTANCE == null){
INSTANCE = new Foo();
}
return INSTANCE;
}
In the above code, if Foo is declared final (final Foo INSTANCE;), it guarantees that it won't publish references during the constructor call. Partial object construction is not possible.
Consider this: if this Myclass is immutable, its state is not going to change after object construction, making the volatile keyword (volatile final Foo INSTANCE;) redundant. But if this class allows its object state to be changed (not immutable), multiple threads can actually update the object and some updates may not be visible to other threads; hence the volatile keyword ensures safe publication of objects in a non-immutable class.
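For reference, here is one common way to make the lazy getInstance() above safe under concurrent calls: double-checked locking with a volatile field. This is a sketch rather than the answerer's code, and the value field is added only so there is some state to observe. Note that Java does not allow a field to be both final and volatile, so only volatile is used here:

```java
class Foo {
    // volatile is what makes double-checked locking correct: it orders the
    // constructor's writes before the publication of the reference.
    private static volatile Foo instance;

    private final int value; // hypothetical state, for illustration only

    private Foo() {
        value = 42;
    }

    static Foo getInstance() {
        Foo local = instance;         // single volatile read on the fast path
        if (local == null) {
            synchronized (Foo.class) {
                local = instance;     // re-check while holding the lock
                if (local == null) {
                    local = new Foo();
                    instance = local; // publish the fully constructed object
                }
            }
        }
        return local;
    }

    int value() {
        return value;
    }
}
```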
