Object References, Threads and Garbage Collection - Java

One of the few things I know about the garbage collector is that it collects objects that have no references.
Suppose MyClass is a class and I create an object of it with
MyClass object = new MyClass();
I can make it eligible for garbage collection, even while my code is still executing, with
object = null;
But what happens to objects that never have a reference assigned to them, as in the following statement?
new MyClass();
My question is really about threads. I can create and start a thread with the following code.
public static void main(String args[])
{
    new Thread() {
        public void run() {
            try {
                System.out.println("Does it work?");
                Thread.sleep(1000);
                System.out.println("Nope, it doesnt...again.");
            } catch(InterruptedException v) {
                System.out.println(v);
            }
        }
    }.start();
}
But nothing holds a reference to the thread. I know I could keep one in a variable, but maybe I don't want to.
Suppose the thread is running a very long task. How will the garbage collector react to this?
And what happens to the new MyClass(); statement from the GC's perspective?

The GC scans the stacks of all live threads, so although your Thread object is not referenced from the main thread, it is referenced from the thread that is currently executing it (the newly created thread).
The GC looks for objects reachable from so-called GC roots, and every live thread is one of those roots.

...object which have no references...
Let's just be clear about what that means. Discussions about GC often distinguish objects that are traceable from a GC root from objects that are not traceable.
"Traceable" means that you can follow a chain of object references starting from a GC root and eventually reach the object in question. The GC roots are basically all of the static variables in the program, plus all of the parameters and local variables of every active method call in every thread.
"Active method call" means that a call has happened for which the corresponding return has not yet happened.
So, if some active method in some thread has a reference to a Foo instance, and the Foo instance has a reference to a Bar instance, and the Bar instance has a reference to a Baz instance; then the Baz instance is "traceable", and it won't be collected.
In every thread, there are method activations below the run() call on the thread's stack, and at least one of those method activations has a reference to the Thread instance that manages the thread. So a Thread object will never be GCd while the thread that it manages still is running.
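Also, somewhere in the dark heart of the JVM, every loaded Class instance is referenced through its class loader, so a class will not be GCd while its class loader is still reachable (and classes loaded by the bootstrap loader are never unloaded).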

The JVM keeps a reference to every running thread, so a thread that is running will not be garbage collected. Likewise, an object that is referenced by a running thread will not be garbage collected.
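The point made in the answers above can be checked with a small experiment. This is only a sketch: the class name UnreferencedThreadDemo and its helper methods are made up for illustration, and System.gc() is merely a request that the JVM may ignore. The code starts a thread, keeps nothing but a WeakReference to it, and checks whether that weak reference is still populated while the thread is running.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.CountDownLatch;

public class UnreferencedThreadDemo {

    // Starts a worker and returns only a weak reference to it, so this code
    // itself keeps no strong reference to the Thread object.
    private static WeakReference<Thread> startWorker(CountDownLatch started,
                                                     CountDownLatch release) {
        Thread worker = new Thread(() -> {
            started.countDown();
            try {
                release.await();          // stands in for the long-running task
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        return new WeakReference<>(worker);
    }

    // Returns true if the Thread object is still reachable after a GC request
    // that is made while the thread is running.
    public static boolean survivesGcWhileRunning() {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        WeakReference<Thread> weak = startWorker(started, release);
        try {
            started.await();
            System.gc();                  // only a request; the JVM may ignore it
            return weak.get() != null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();          // let the worker finish
        }
    }

    public static void main(String[] args) {
        System.out.println("Still reachable while running: " + survivesGcWhileRunning());
    }
}
```

Because a live thread is a GC root, the weak reference cannot be cleared while the worker is blocked in release.await(), whether or not the GC actually runs.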

Related

Is using "final" parameter thread safe in Java

In this example, is it sufficient to declare the parameter obj as final to safely use it in the thread, below?
public void doSomethingAsync (final Object obj)
{
    Thread thread = new Thread ()
    {
        @Override public void run () { ... do something with obj ... }
    };
    thread.start ();
}
At first glance it may seem fine. A caller invokes doSomethingAsync and obj gets cached until needed in the thread.
But what happens if there is a burst of calls to doSomethingAsync such that they complete before the threads have done anything with obj?
If the Java compiler simply makes obj into a member variable, the last call to doSomethingAsync will overwrite the prior values of obj, making prior invocations of the thread use a wrong value. Or, does the compiler generate a queue or some dimensioned storage for obj so that each thread gets the proper value?
At first glance it may seem fine. A caller invokes doSomethingAsync and obj gets cached until needed in the thread.
The object is not "cached"; the variable simply cannot be reassigned to another object. The final keyword only prevents the variable from being reassigned; it does not prevent the referenced object from being mutated.
But what happens if there is a burst of calls to doSomethingAsync such that they complete before the threads have done anything with obj?
If the threads modify the referenced object, the outcome is unpredictable: they race for the object, and without synchronization a thread may see stale values. If the object is immutable, meaning its state cannot change after construction, then it is inherently thread safe.
If the Java compiler simply makes obj into a member variable, the last call to doSomethingAsync will overwrite the prior values of obj, making prior invocations of the thread use a wrong value. Or, does the compiler generate a queue or some dimensioned storage for obj so that each thread gets the proper value?
The compiler does not guarantee that the threads execute in order; threads run concurrently. This is why the synchronized keyword exists: it lets you guarantee that when you access the object, you see the same state that all of the other threads see. This comes at a cost to performance, so it is recommended to pass only immutable objects into threads, so that you don't have to synchronize every time you do something with the object.
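As a rough illustration of the synchronized keyword mentioned above, here is a minimal sketch. The class SharedCounterDemo and its methods are hypothetical and do not come from the question. Several threads mutate one shared object, and the synchronized methods make each increment atomic and visible to the other threads.

```java
public class SharedCounterDemo {
    private int count = 0;

    // synchronized makes the read-modify-write atomic and visible to other threads
    public synchronized void increment() {
        count++;
    }

    public synchronized int get() {
        return count;
    }

    // Runs the given number of threads, each incrementing the same counter.
    public static int runThreads(int threads, int incrementsPerThread) {
        SharedCounterDemo counter = new SharedCounterDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();            // wait for every worker to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runThreads(4, 10000));
    }
}
```

Without synchronized on increment(), the total could come out lower than threads * incrementsPerThread, because concurrent count++ operations can overwrite each other.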
Large edit here, based on a conversation the original poster and I had in chat.
It seems Peri's real question was about the way Java stores local variables like "obj" for use by the Thread. These are called "captured variables" if you want to google it yourself. There is a nice discussion here.
Basically, the local variables that the local class uses (the ones stored on the stack), plus the "this" pointer, get copied into your local class (Thread in this case) when the local class is instantiated.
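To see that each instance gets its own copy, a burst of calls can be simulated as follows. This is a sketch under assumptions: the class name CaptureDemo, the extra id parameter and the seen map are all invented for the demonstration and are not part of the original doSomethingAsync.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CaptureDemo {
    // Records which value each thread actually saw, keyed by call number.
    static final Map<Integer, String> seen = new ConcurrentHashMap<>();

    static Thread doSomethingAsync(final int id, final String obj) {
        Thread thread = new Thread() {
            @Override
            public void run() {
                seen.put(id, obj);        // uses the copy captured at instantiation
            }
        };
        thread.start();
        return thread;
    }

    // Fires n calls back to back, then waits for all threads to finish.
    public static Map<Integer, String> runBurst(int n) {
        seen.clear();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            threads.add(doSomethingAsync(i, "value-" + i));
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(runBurst(5));
    }
}
```

Every thread reports the value that was passed in its own call, regardless of how many later calls completed before it ran.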
Original answer follows for the sake of the comments. But it is now obsolete.
Each time you call doSomethingAsync you are creating a new thread. If you call doSomethingAsync just once with a particular object, and then you modify that same object in the calling thread, then you have no idea what the asynchronous thread will do. It might "do something with the object" before you modify it in the calling thread, after you modify it in the calling thread, or even WHILE you are concurrently modifying it in the calling thread. Unless the object itself is thread safe, this will cause problems.
Similarly, if you call doSomethingAsync twice with the same object, then you have no idea which asynchronous thread will modify the object first, and no guarantee they will not act concurrently on the same object.
Finally, if you call doSomethingAsync twice with 2 different objects, then you don't know which asynchronous thread will act on its own object first, but you don't care, because they can't conflict with each other (unless the objects have static mutable variables, i.e. class variables, that are being modified).
If you require that one task get completed before another task and in the order submitted, then a single threaded ExecutorService is your answer.
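A minimal sketch of that suggestion, assuming a hypothetical class OrderedTasksDemo where each task just records its own number: a single-threaded executor runs submitted tasks one at a time, in submission order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OrderedTasksDemo {

    // Submits n tasks and returns the order in which they actually ran.
    public static List<Integer> runInOrder(int n) {
        List<Integer> order = Collections.synchronizedList(new ArrayList<>());
        ExecutorService executor = Executors.newSingleThreadExecutor();
        for (int i = 0; i < n; i++) {
            final int id = i;
            executor.execute(() -> order.add(id));   // queued behind earlier tasks
        }
        executor.shutdown();                         // no new tasks; queued ones still run
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(runInOrder(5));   // prints [0, 1, 2, 3, 4]
    }
}
```

The trade-off is that the tasks no longer run in parallel with each other; they only run asynchronously with respect to the caller.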
If the Java compiler simply makes obj into a member variable, the last call to doSomethingAsync will overwrite the prior values of obj, making prior invocations of the thread use a wrong value
No, this will not happen. A subsequent call to doSomethingAsync cannot overwrite the obj captured by previous invocations of doSomethingAsync. This holds even if you remove the final keyword (assume Java let you do it just this once).
I think your question is ultimately about how closures work and are implemented in Java. However, your code does not really demonstrate the complication, because it never tries to modify the variable obj in the same lexical scope.
In a way, Java does not really capture the variable obj, but its value. You could write your code in a different way, and the overall effect would be the same:
class YourThread extends Thread {
    private final Object param;
    public YourThread (Object obj){
        param = obj;
    }
    @Override
    public void run(){
        //do something with your param
    }
}
and you no longer need the final keyword:
public void doSomethingAsync (Object obj){
    Thread t = new YourThread (obj);
    t.start();
}
Now, say you have two instances of YourThread created, how could the second instance modify what has been passed as parameter to the first instance?
Closure in Other Languages
In other languages, magical things can indeed happen, but to show it you need to write the code slightly differently:
public void doSomethingAsync (Object obj){
    //Here let's assume obj is not null
    Thread thread = new Thread (){
        @Override
        public void run () { ... /*do something with obj*/ ... }
    };
    thread.start ();
    obj = null;
}
This is not valid Java code, but in certain languages code like that is allowed. And the thread, when its run method is executed, might see obj as null.
Similarly, in the code below (again, not valid in Java), thread2 could potentially affect thread1 if thread2 executes first and changes obj in its run method:
public void doSomethingAsync (Object obj){
    Thread thread1 = new Thread (){
        @Override
        public void run () { ... /*do something with obj*/ ... }
    };
    thread1.start ();
    Thread thread2 = new Thread (){
        @Override
        public void run () { ... /*do something with obj*/ ... }
    };
    thread2.start ();
}
Back to Java
The reason Java forces you to put final on obj is that although Java's syntax looks extremely similar to the closure syntax used in other languages, it does not implement the same closure semantics. Knowing that obj is final, Java does not need to create a capturing object (and thus an additional heap allocation), but can use something similar to YourThread behind the scenes. See this link for more details.

Java Multithreading : changing variable from multiple threads

Say I have a class
public class OuterClass
{
    public static WorkerClass worker;
}
In thread 1
The following command is executed
OuterClass.worker.doLongRunningOperation();
While doLongRunningOperation is executing, thread 2 runs
OuterClass.worker = new WorkerClass();
What will happen to the doLongRunningOperation in thread 1?
Will the worker object referenced by thread 1 be garbage collected only after doLongRunningOperation is complete, or can the operation be abruptly terminated so that the worker object can be garbage collected?
Edit :
I think it should be GC'ed, since it is no longer referenced. But what will happen to the doLongRunningOperation? Will it be terminated abruptly?
The Java garbage collector (GC) reclaims objects that are not traceable starting from a set of GC roots. That is, if object A is referenced by object B, and object B is referenced by object C, and object C is referenced by a root, then objects A, B, and C are all safe from the garbage collector.
So what are the roots? I don't know the complete answer, but I do know that the root set includes every local variable and parameter in every running thread.
So, if some local variable or argument in thread 1 still has a reference to the original WorkerClass instance, then the instance will continue to live.
The original WorkerClass instance will only be reclaimed when it is not referenced by any local or arg in any thread, or by any field in any traceable object. When that happens, it won't matter to your program any more, because your program will no longer have any means to access the object.
P.S., "arguments and locals" includes hidden variables that are part of the Java implementation, and it includes implicit variables such as the this reference in every instance method. Your original WorkerClass instance cannot be reclaimed as long as any method call on it (e.g., doLongRunningOperation) is still active.
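The scenario from the question can be reproduced with a short sketch. Everything here is illustrative: the class ReassignDemo, the label field and the latches are assumptions added to make the timing deterministic. The static field is reassigned while doLongRunningOperation is still active on the original instance, and the operation finishes normally on that original instance.

```java
import java.util.concurrent.CountDownLatch;

public class ReassignDemo {

    static class WorkerClass {
        private final String label;

        WorkerClass(String label) {
            this.label = label;
        }

        // Blocks until released, then reports which instance did the work.
        String doLongRunningOperation(CountDownLatch started, CountDownLatch release)
                throws InterruptedException {
            started.countDown();
            release.await();
            return label;
        }
    }

    static volatile WorkerClass worker;

    public static String run() {
        worker = new WorkerClass("original");
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        String[] result = new String[1];

        Thread thread1 = new Thread(() -> {
            try {
                result[0] = worker.doLongRunningOperation(started, release);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        thread1.start();
        try {
            started.await();                            // the operation is now in progress
            worker = new WorkerClass("replacement");    // plays the role of "thread 2"
            release.countDown();
            thread1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints original
    }
}
```

The running method keeps the original instance reachable through its this reference, so replacing the field neither interrupts the call nor lets the GC reclaim the object underneath it.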

Will a new thread object that has not been started be garbage collected?

I have the source code of a Java Virtual Machine. This VM only garbage collects threads that fulfill both of these conditions:
Thread is finished (started and finished)
Thread object does not have any reference
I thought it was supposed to garbage collect a thread that was never started and has no reference. But these threads are being held in the VM's memory. Is that correct?
Sample Code:
public class Test implements Runnable{
    private Thread thread;
    public Test() {
        thread = new Thread(this);
    }
    @Override
    public void run() {
        //This thread never runs...
        //My question is about garbage collector in a situation like this...
    }
}
Considering:
The Test object is no longer referenced
The Test object's thread was never started
Will the Test thread be garbage collected?
An instance of a Thread or Runnable class is like an instance of any other class. So yes, it will be GCed once it is no longer reachable.
Note: you will first have to create an instance of Test.
A running thread acts as a root for the GC and will not be GCed. It is starting the thread (by calling start()) that actually creates an executing thread and makes it special.
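The difference between a Thread object and a started thread can be observed through getState() and isAlive(). The following is only a sketch; the class ThreadStateDemo is hypothetical. It does not prove anything about a particular VM's collector, but it shows that before start() the object is in state NEW and not alive, which is the stage at which it behaves like any other object.

```java
import java.util.concurrent.CountDownLatch;

public class ThreadStateDemo {

    // Returns a short trace of the thread's state before, during and after running.
    public static String states() {
        CountDownLatch release = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            try {
                release.await();          // keep the thread alive until released
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        StringBuilder trace = new StringBuilder();
        // Not started yet: just an ordinary object.
        trace.append(t.getState()).append('/').append(t.isAlive());
        t.start();
        // Started and still blocked on the latch: alive, hence a GC root.
        trace.append(' ').append(t.isAlive());
        release.countDown();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // Finished: no longer alive, so ordinary reachability rules apply again.
        trace.append(' ').append(t.getState()).append('/').append(t.isAlive());
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(states());   // prints NEW/false true TERMINATED/false
    }
}
```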

Is it possible to identify whether an object has been collected by the garbage collector in Java?

I have read that an object becomes eligible for garbage collection in the following cases.
All references to that object are explicitly set to null.
The object is created inside a block and the reference goes out of scope once control exits that block.
The parent object is set to null: if an object holds a reference to another object and you set the container object's reference to null, the child or contained object automatically becomes eligible for garbage collection.
But is there any way to tell whether an object that is eligible for garbage collection has actually been collected by the garbage collector?
You can implement the Object#finalize() method
public class Driver {
    public static void main(String[] args) throws Exception {
        garbage();
        System.gc();
        Thread.sleep(1000);
    }
    public static void garbage() {
        Driver collectMe = new Driver();
    }
    @Override
    protected void finalize() {
        System.out.println(Thread.currentThread().getName() + ": See ya, nerds!");
    }
}
which prints
Finalizer: See ya, nerds!
So you can intercept right before collection. The javadoc states
The general contract of finalize is that it is invoked if and when the
Java virtual machine has determined that there is no longer any
means by which this object can be accessed by any thread that has not
yet died, except as a result of an action taken by the finalization
of some other object or class which is ready to be finalized. The
finalize method may take any action, including making this object
available again to other threads;
but also
The finalize method is never invoked more than once by a Java virtual
machine for any given object.
Before an object's memory is reclaimed, the JVM calls its finalize method. The default implementation does nothing; you can override it to, for example, print a farewell message or close an open resource.
Note, however, that there is no guarantee as to how soon it is called after the object becomes unreachable.
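Since finalize() has been deprecated since Java 9, another way to observe collection is a WeakReference registered with a ReferenceQueue. The sketch below is an assumption-laden illustration, not a definitive recipe: the class CollectionProbe and its methods are invented, System.gc() is only a hint, and Reference.reachabilityFence requires Java 9 or later. The reference is enqueued only after the GC has determined that the referent is no longer strongly reachable.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class CollectionProbe {

    // Requests GC a few times and reports whether a reference reached the queue.
    private static boolean enqueued(ReferenceQueue<Object> queue, int attempts) {
        try {
            for (int i = 0; i < attempts; i++) {
                System.gc();                      // a hint, not a command
                if (queue.remove(100) != null) {  // wait up to 100 ms
                    return true;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    // The referent stays strongly reachable, so it must not be reported as collected.
    public static boolean collectedWhileStronglyHeld() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object held = new Object();
        WeakReference<Object> ref = new WeakReference<>(held, queue);
        boolean result = enqueued(queue, 3);
        Reference.reachabilityFence(held);        // keep 'held' alive up to here
        Reference.reachabilityFence(ref);
        return result;
    }

    // The referent is dropped immediately, so it may be reported as collected.
    public static boolean collectedAfterDrop() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        WeakReference<Object> ref = new WeakReference<>(new Object(), queue);
        boolean result = enqueued(queue, 3);
        Reference.reachabilityFence(ref);         // the reference itself must stay reachable
        return result;
    }

    public static void main(String[] args) {
        System.out.println("strongly held: " + collectedWhileStronglyHeld());
        System.out.println("dropped:       " + collectedAfterDrop());
    }
}
```

Only the first result is guaranteed (false); whether the dropped object is reported depends on whether the JVM actually runs a collection in response to System.gc().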

Creating an object in a thread-safe way

On this web site, I came across the following description of how to create an object in a thread-safe way.
Warning: When constructing an object that will be shared between
threads, be very careful that a reference to the object does not
"leak" prematurely. For example, suppose you want to maintain a List
called instances containing every instance of class. You might be
tempted to add the following line to your constructor:
instances.add(this);
But then other threads can use instances to access the object before
construction of the object is complete.
Is anybody able to express the same concept in other words, or with a more graspable example?
Thanks in advance.
Let us assume, you have such class:
class Sync {
    public Sync(List<Sync> list) {
        list.add(this);
        // switch
        // instance initialization code
    }
    public void bang() { }
}
and you have two threads (thread #1 and thread #2), both of which hold a reference to the same List<Sync> list instance.
Now thread #1 creates a new Sync instance and passes a reference to the list instance as the argument:
new Sync(list);
While executing the line // switch in the Sync constructor, a context switch occurs and thread #2 starts running.
Thread #2 executes this code:
for(Sync elem : list)
    elem.bang();
Thread #2 calls bang() on the instance that thread #1 is creating, but this instance is not ready to be used yet, because its constructor has not finished.
Therefore,
you have to be very careful when calling a constructor and passing it a reference to an object shared between several threads
when implementing a constructor, you have to keep in mind that the provided instance can be shared between several threads
Thread A is creating object A, and in the middle of the creation (on the first line of object A's constructor) there is a context switch. Now thread B is running, and thread B can look into object A (it already had a reference). However, object A is not yet fully constructed, because thread A has not had time to finish it.
Here is a clear example:
Let's say there is a class named House
import java.util.ArrayList;
import java.util.List;

class House {
    private static List<House> listOfHouse = new ArrayList<House>();
    private String name;
    // other properties
    public House(){
        listOfHouse.add(this); // "this" escapes before construction is finished
        this.name = "dummy house";
        //do other things
    }
    public static List<House> getListOfHouse() { return listOfHouse; }
    public String getName() { return name; }
    // other methods
}
And Village:
class Village {
    public static void printHouses(){
        for(House house : House.getListOfHouse()){
            System.out.println(house.getName());
        }
    }
}
Now suppose you are creating a House in a thread "X", and the executing thread has just finished the line below:
listOfHouse.add(this);
At this point the reference to the object has already been added to the list listOfHouse, although the object's creation is not finished yet. If the context now switches to another thread "Y" that runs
printHouses();
then printHouses() will see an object that is not yet fully constructed. This kind of premature exposure is known as a leak.
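Because a real context switch at exactly that point is hard to reproduce, the effect can be simulated in a single thread. This sketch is illustrative only: the class LeakDemo and the field seenDuringConstruction are made up, and the read inside the constructor stands in for what thread "Y" would observe if it ran right after the add.

```java
import java.util.ArrayList;
import java.util.List;

public class LeakDemo {
    static final List<LeakDemo> instances = new ArrayList<>();
    static String seenDuringConstruction;

    private final String name;

    public LeakDemo() {
        instances.add(this);   // premature publication of "this"
        // Stand-in for thread "Y": read the object through the shared list
        // at the exact point where a context switch could happen.
        LeakDemo leaked = instances.get(instances.size() - 1);
        seenDuringConstruction = String.valueOf(leaked.name);
        this.name = "dummy house";
    }

    public String getName() {
        return name;
    }

    // Returns what an observer saw mid-construction and after construction.
    public static String observe() {
        LeakDemo house = new LeakDemo();
        return seenDuringConstruction + " -> " + house.getName();
    }

    public static void main(String[] args) {
        System.out.println(observe());   // prints null -> dummy house
    }
}
```

Even a final field shows its default value (null) to an observer that reaches the object before the constructor has assigned it.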
Lots of good information here, but I thought I'd add some more.
When constructing an object that will be shared between threads, be very careful that a reference to the object does not "leak" prematurely.
While you are constructing the object, you need to make sure that there is no way for other threads to access it before it has been fully constructed. This means that in a constructor you should not, for example:
Assign the object to a static field on the class that is accessible by other threads.
Start a thread on the object in the constructor, since the thread may start using fields of the object before they are fully initialized.
Publish the object into a collection, or via any other mechanism that allows other threads to see the object before it has been fully constructed.
You might be tempted to add the following line to your constructor:
instances.add(this);
So something like the following is improper:
public class Foo {
// multiple threads can use this
public static List<Foo> instances = new ArrayList<Foo>();
public Foo() {
...
// this "leaks" this, publishing it to other threads
instances.add(this);
...
// other initialization stuff
}
...
One additional bit of complexity is that the Java compiler/optimizer is allowed to reorder the instructions inside the constructor so that they happen at a later time. This means that even if you do instances.add(this); as the last line of the constructor, this is not enough to ensure that the constructor really has finished from the point of view of another thread.
If multiple threads are going to be accessing this published object, access to it must be synchronized. The only fields you don't need to worry about are final fields, which are guaranteed to be fully initialized when the constructor finishes (provided this does not escape during construction). Reads and writes of volatile fields are always visible across threads, so you don't have to worry about them either.
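One common way to avoid the leak is to keep the constructor private and register the instance from a static factory method, after construction has completed. The following is a sketch under assumptions: the class SafeFoo, its name field and the use of CopyOnWriteArrayList as the thread-safe registry are illustrative choices, not something taken from the quoted text.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public final class SafeFoo {
    // A thread-safe list, so publication through it is properly synchronized.
    private static final List<SafeFoo> INSTANCES = new CopyOnWriteArrayList<>();

    private final String name;

    // The constructor does nothing but initialize; "this" never escapes here.
    private SafeFoo(String name) {
        this.name = name;
    }

    // Construct first, publish afterwards.
    public static SafeFoo create(String name) {
        SafeFoo foo = new SafeFoo(name);
        INSTANCES.add(foo);
        return foo;
    }

    public String getName() {
        return name;
    }

    public static List<SafeFoo> instances() {
        return INSTANCES;
    }

    public static void main(String[] args) {
        SafeFoo foo = SafeFoo.create("first");
        System.out.println(instances().contains(foo) + " " + foo.getName());
    }
}
```

Any thread that finds the object in the list sees it only after the constructor has returned, and the final field plus the concurrent collection take care of visibility.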
I think that the following example illustrates what the authors wanted to say:
public class MyClass {
    public MyClass(List<MyClass> list) {
        // some stuff
        list.add(this); // self registration
        // other stuff
    }
}
MyClass registers itself in a list that can be used by another thread, but it runs "other stuff" after the registration. This means that if another thread starts using the object before the constructor has finished, the object is probably not fully created yet.
It's describing the following situation:
Thread1:
// we add a reference to this thread object
object.add(thread1Id, this);
// we are about to initialize it, but suppose a thread switch happens before the next line runs
this.initialize();
Thread2:
// we are able to get th1, but it has not been initialized yet,
// so it is in an invalid state
Object th1 = object.get(thread1Id);
As the thread scheduler can stop execution of a thread at any time (even half-way through a high level instruction like instances.push_back(this)) and switch to executing a different thread, unexpected behaviour can happen if you don't synchronize parallel access to objects.
Look at the code below:
#include <vector>
#include <thread>
#include <memory>
#include <iostream>
struct A {
    std::vector<A*> instances;
    A() { instances.push_back(this); }
    void printSize() { std::cout << instances.size() << std::endl; }
};
int main() {
    std::unique_ptr<A> a; // Initialized to nullptr.
    std::thread t1([&a] { a.reset(new A()); }); // Construct new A.
    std::thread t2([&a] { a->printSize(); }); // Use A. This fails if t1 doesn't happen to finish first.
    t1.join();
    t2.join();
}
Because access to a in main() is not synchronized, execution will fail every once in a while.
This happens when thread t1 is halted before it finishes constructing the A object and thread t2 runs instead. Thread t2 then tries to access a unique_ptr<A> that still contains nullptr.
You just have to make sure that no thread will access the object (and get a NullPointerException) while another thread has not yet finished initializing it.
In this case the problem would arise in the constructor (I suppose): another thread could access that very object between its addition to the list and the end of the constructor.
