I am confused about how GC works in Java.
Below is the code snippet that confuses me:
    private Data data = new Data();

    void main() {
        for (int i = 0; i < 100; i++) {
            MyThread thread = new MyThread(data);
            thread.start();
        }
        System.gc();
        // Long running process
    }

    class MyThread extends Thread {
        private Data dataReference;

        MyThread(Data data) {
            dataReference = data;
        }
    }
In the above example, if GC is requested before continuing (at // Long running process),
will the threads created in the loop be garbage collected?
Or will the GC treat the MyThread instances as alive because they hold a reference to the shared data object?
The MyThread instances may be garbage collected only after they are done (i.e. their run method is done). After the for loop ends, any instances of MyThread whose run method is done may be garbage collected (since there are no references to them).
The fact that the MyThread instances each hold a reference to a Data instance that doesn't get garbage collected doesn't affect when the MyThread instances become eligible for garbage collection.
Your MyThread instances will not be eligible for garbage collection until they have finished running.
The thread stack and local variables for any live (i.e. started but not terminated) thread are reachable by definition.
A reachable object is any object that can be accessed in any potential continuing computation from any live thread. (JLS 12.6.1)
Furthermore, since a live thread could call Thread.currentThread(), the thread's Thread object must also be reachable as long as the thread is live ... irrespective of any other references to it.
However, if the reference to a Thread object becomes unreachable before the start() method has been called, it will be eligible for garbage collection. If this was not so, creating and not starting a Thread would be a memory leak!
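To illustrate the point about live threads, here is a minimal sketch (the class and method names are made up for this example). It starts a thread, drops the only strong reference held by the caller, requests a GC, and checks through a WeakReference that the Thread object is still there while the thread is running. Nothing is claimed about what happens after the thread terminates, since collection at that point is up to the JVM.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.CountDownLatch;

public class LiveThreadDemo {

    /** Returns true if the Thread object was still reachable after System.gc() while its thread was running. */
    static boolean survivesGcWhileRunning() throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread t = new Thread(() -> {
            started.countDown();
            try {
                release.await(); // keep the thread alive until the check is done
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        });
        WeakReference<Thread> weak = new WeakReference<>(t);
        t.start();
        t = null; // the caller no longer holds a strong reference

        started.await();
        System.gc(); // only a request, but a live thread must stay reachable either way
        boolean stillReachable = weak.get() != null;

        release.countDown(); // let the thread finish
        return stillReachable;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Thread object reachable while running: " + survivesGcWhileRunning());
    }
}
```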
You can always request a garbage collection, but it is not guaranteed to run at that moment (it may or may not, depending on your system), because garbage collection runs on a daemon thread, which is a low-priority thread.
An object becomes eligible for garbage collection (GC) if it is not reachable from any live thread or from any static reference. Put loosely, an object becomes eligible once no reachable reference points to it any more; the variables that held it do not literally have to be set to null. Cyclic references do not keep objects alive: if object A has a reference to object B, object B has a reference to object A, and neither has any other live reference, then both A and B are eligible for garbage collection.
There is no guarantee that a GC will be executed after a System.gc() call. A System.gc() call simply SUGGESTS that the VM do a garbage collection.
Also, a running thread is not a target for GC. A thread won't be cleaned up until it has finished running.
Generally speaking, objects are judged to be alive if they are still referenced by other live objects.
You should never need to call System.gc(). The JVM will run the collector for you when it is low on memory.
In Java, GC is based on an approach called mark and sweep. The algorithm works like this:
Start with a set of root objects (GC roots) and a set of all the objects allocated.
Mark those roots
Mark every object reachable from those roots, by visiting every field of these objects recursively.
When every possible object is marked, walk the list of all objects. If an item is not marked, free it.
(This is a simplification, the modern implementation works sort of like this, but is far more sophisticated).
So what is a GC root? Any object stored in a local variable still in scope, in a static variable, in a JNI reference, and all threads that are currently running.
So no, a thread won't be cleaned up until it has finished running. That's why threads so easily cause memory leaks: as long as they run, any object they hold a reference to cannot be freed, because a GC root (the thread) has a reference to it.
But the relationship always goes down from the root to other objects. If Foo holds a reference to Bar, Foo can be deleted regardless of whether Bar can be. But if Foo can't be deleted, then neither can Bar.
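The mark-and-sweep steps above can be sketched as a toy simulation. This is only a model under stated assumptions: Node, heap and roots are hypothetical stand-ins, not anything the JVM exposes, and real collectors are far more sophisticated. It does show the two points made above: a cycle (A and B) with no path from a root is freed, and Foo is freed even though it points to Bar, which survives because Bar is itself a root.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class MarkSweepSketch {

    /** Hypothetical heap object: a name, outgoing references, and a mark bit. */
    static final class Node {
        final String name;
        final List<Node> fields = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    /** Mark phase: flag everything reachable from the given root. */
    static void mark(Node root) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node current = stack.pop();
            if (current.marked) continue; // already visited, also handles cycles
            current.marked = true;
            for (Node field : current.fields) stack.push(field);
        }
    }

    /** Runs one mark-and-sweep cycle and returns the names of the freed objects. */
    static List<String> collect(List<Node> heap, List<Node> roots) {
        for (Node n : heap) n.marked = false;
        for (Node r : roots) mark(r);
        List<String> freed = new ArrayList<>();
        for (Iterator<Node> it = heap.iterator(); it.hasNext(); ) { // sweep phase
            Node n = it.next();
            if (!n.marked) {
                freed.add(n.name);
                it.remove();
            }
        }
        return freed;
    }

    static List<String> demo() {
        Node foo = new Node("Foo"), bar = new Node("Bar"), a = new Node("A"), b = new Node("B");
        foo.fields.add(bar);  // Foo -> Bar
        a.fields.add(b);      // A -> B
        b.fields.add(a);      // B -> A (cycle, unreachable from any root)
        List<Node> heap = new ArrayList<>(Arrays.asList(foo, bar, a, b));
        return collect(heap, Arrays.asList(bar)); // only Bar is a root
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [Foo, A, B]
    }
}
```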
I decided to continue https://stackoverflow.com/a/41998907/2674303 in a separate topic.
Let's consider the following example:
    import java.lang.ref.PhantomReference;
    import java.lang.ref.Reference;
    import java.lang.ref.ReferenceQueue;
    import java.lang.ref.WeakReference;

    public class SimpleGCExample {
        public static void main(String[] args) throws InterruptedException {
            ReferenceQueue<Object> queue = new ReferenceQueue<>();
            SimpleGCExample e = new SimpleGCExample();
            Reference<Object> pRef = new PhantomReference<>(e, queue),
                              wRef = new WeakReference<>(e, queue);
            e = null; // drop the only strong reference
            for (int count = 0, collected = 0; collected < 2; ) {
                Reference<?> ref = queue.remove(100); // wait up to 100 ms for an enqueued reference
                if (ref == null) {
                    System.gc();
                    count++;
                } else {
                    collected++;
                    System.out.println((ref == wRef ? "weak" : "phantom")
                            + " reference enqueued after " + count + " gc polls");
                }
            }
        }

        @Override
        protected void finalize() throws Throwable {
            System.out.println("finalizing the object in " + Thread.currentThread());
            Thread.sleep(100);
            System.out.println("done finalizing.");
        }
    }
Java 11 prints the following:
finalizing the object in Thread[Finalizer,8,system]
weak reference enqueued after 1 gc polls
done finalizing.
phantom reference enqueued after 3 gc polls
The first two lines can appear in either order; it looks like those two steps run in parallel.
The last line sometimes reports 2 gc polls and sometimes 3.
So I see that enqueuing the PhantomReference takes more GC cycles. How can this be explained? Is it mentioned somewhere in the documentation (I can't find it)?
P.S.
WeakReference java doc:
Suppose that the garbage collector determines at a certain point in
time that an object is weakly reachable. At that time it will
atomically clear all weak references to that object and all weak
references to any other weakly-reachable objects from which that
object is reachable through a chain of strong and soft references. At
the same time it will declare all of the formerly weakly-reachable
objects to be finalizable. At the same time or at some later time it
will enqueue those newly-cleared weak references that are registered
with reference queues
PhantomReference java doc:
Suppose the garbage collector determines at a certain point in time
that an object is phantom reachable. At that time it will atomically
clear all phantom references to that object and all phantom references
to any other phantom-reachable objects from which that object is
reachable. At the same time or at some later time it will enqueue
those newly-cleared phantom references that are registered with
reference queues
The difference is not clear to me.
P.S. (We are talking about an object with a non-trivial finalize method.)
I got an answer to my question from @Holger:
He (I assume) pointed me to the Javadoc and noted that the definition for PhantomReference contains an extra phrase compared with those for soft and weak references:
An object is weakly reachable if it is neither strongly nor softly
reachable but can be reached by traversing a weak reference. When the
weak references to a weakly-reachable object are cleared, the object
becomes eligible for finalization.
An object is phantom reachable if
it is neither strongly, softly, nor weakly reachable, it has been
finalized, and some phantom reference refers to it
My next question was what "it has been finalized" means. I expected it to mean that the finalize method had finished.
To check this, I modified the application like this:
    import java.lang.ref.PhantomReference;
    import java.lang.ref.Reference;
    import java.lang.ref.ReferenceQueue;
    import java.lang.ref.WeakReference;

    public class SimpleGCExample {
        static SimpleGCExample object;

        public static void main(String[] args) throws InterruptedException {
            ReferenceQueue<Object> queue = new ReferenceQueue<>();
            SimpleGCExample e = new SimpleGCExample();
            Reference<Object> pRef = new PhantomReference<>(e, queue),
                              wRef = new WeakReference<>(e, queue);
            e = null; // drop the only strong reference
            for (int count = 0, collected = 0; collected < 2; ) {
                Reference<?> ref = queue.remove(100); // wait up to 100 ms for an enqueued reference
                if (ref == null) {
                    System.gc();
                    count++;
                } else {
                    collected++;
                    System.out.println((ref == wRef ? "weak" : "phantom")
                            + " reference enqueued after " + count + " gc polls");
                }
            }
        }

        @Override
        protected void finalize() throws Throwable {
            System.out.println("finalizing the object in " + Thread.currentThread());
            Thread.sleep(10000);
            System.out.println("done finalizing.");
            object = this; // resurrect the object by storing it in a static field
        }
    }
I see the following output:
weak reference enqueued after 1 gc polls
finalizing the object in Thread[Finalizer,8,system]
done finalizing.
And the application hangs. I think this is because, for weak/soft references, the GC works as follows: as soon as the GC detects that an object is weakly/softly reachable, it does two things in parallel:
enqueues the weak/soft reference into the registered ReferenceQueue instance
runs the finalize method
So for adding to the ReferenceQueue, it doesn't matter whether the object was resurrected or not.
But for PhantomReference the actions are different. As soon as the GC detects that an object is phantom reachable, it does the following sequentially:
runs the finalize method
checks that the object is still only phantom reachable (i.e. that it was not resurrected during the finalize method's execution); only if it is does the GC add the phantom reference to the ReferenceQueue
But @Holger said that "it has been finalized" means that the JVM has initiated the finalize() invocation, and that for adding the PhantomReference to the ReferenceQueue it doesn't matter whether it has finished or not. But my example seems to show that it really does matter.
Frankly speaking, I don't understand the difference, with regard to adding to the ReferenceQueue, for weak and soft references. What was the idea?
The key point is the definition of “phantom reachable” in the package documentation:
An object is phantom reachable if it is neither strongly, softly, nor weakly reachable, it has been finalized, and some phantom reference refers to it.
(emphasis on "it has been finalized" mine)
Note that when we remove the finalize() method, the phantom reference gets collected immediately, together with the weak reference.
This is the consequence of JLS §12.6:
For efficiency, an implementation may keep track of classes that do not override the finalize method of class Object, or override it in a trivial way.
…
We encourage implementations to treat such objects as having a finalizer that is not overridden, and to finalize them more efficiently, as described in §12.6.1.
Unfortunately, §12.6.1 does not go into the consequences of “having a finalizer that is not overridden”, but it’s easy to see, that the implementation just treats those objects like being already finalized, never enqueuing them for finalization and hence, being able to reclaim them immediately, which affects the majority of all objects in typical Java applications.
Another point of view is that the necessary steps for ensuring that the finalize() method will eventually get invoked, i.e. the creation and linking of a Finalizer instance, will be omitted for objects with a trivial finalizer. Also, eliminating the creation of purely local objects after Escape Analysis, only works for those objects.
Since there is no behavioral difference between weak references and phantom references for objects without a finalizer, we can say that the presence of finalization, and its possibility to resurrect objects, is the only reason for the existence of phantom references, to be able to perform an object’s cleanup only when it is safe to assume that it can’t get resurrected anymore¹.
¹ Before Java 9, though, this safety was not bulletproof, as phantom references were not automatically cleared and deep reflection made it possible to subvert the whole concept.
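To try the "no finalize() method" case yourself, here is a minimal sketch based on the question's polling loop (the class name, method name and the maxPolls bound are made up for this example). The referent is a plain Object, so no finalization is involved. How many polls each reference needs is implementation dependent and System.gc() is only a request, so no particular output is promised here; the loop is bounded so it cannot hang.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class NoFinalizerDemo {

    /**
     * Requests GC up to maxPolls times and returns how many of the two
     * references (weak and phantom) were enqueued in that time.
     */
    static int enqueuedCount(int maxPolls) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object(); // no finalize() override
        Reference<Object> weak = new WeakReference<>(referent, queue);
        Reference<Object> phantom = new PhantomReference<>(referent, queue);
        referent = null; // drop the only strong reference

        int enqueued = 0;
        for (int polls = 0; polls < maxPolls && enqueued < 2; ) {
            Reference<?> ref = queue.remove(100); // wait up to 100 ms
            if (ref == null) {
                System.gc();
                polls++;
            } else {
                enqueued++;
                String kind = ref == weak ? "weak" : ref == phantom ? "phantom" : "unknown";
                System.out.println(kind + " reference enqueued after " + polls + " gc polls");
            }
        }
        return enqueued;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("references enqueued: " + enqueuedCount(5));
    }
}
```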
PhantomReferences will only be enqueued after any associated finalizer has finished execution. Note a finalizer can resurrect an object (used to good effect by Princeton's former Secure Internet Project).
Exact behaviour beyond the spec is not specified. Here be implementation dependent stuff.
So what seems to be happening? Once an object is weakly reachable, it is also finalisable. So the WeakReferences can be enqueued and the objects queued for finalisation in the same stop-the-world event. The finalisation thread(s) is (are) running in parallel with your ReferenceQueue thread (main). Hence you may see the first two lines of your output in either order, always (unless wildly delayed) followed by the third.
Only some time after your finalizer has exited is the PhantomReference enqueueable. Hence the gc count is strictly greater. The code looks like a reasonably fair race. Perhaps changing the millisecond timeouts would change things. Most things GC don't have exact guarantees.
As far as I understand, GC starts with some set of initial objects (stack, static objects) and recursively traverses it building a graph of reachable objects. Then it marks the memory taken by these objects as occupied and assumes all the rest of the memory free.
But what if this 'free' memory contains an object with finalize method? GC has to call it, but I don't see how it can even know about objects that aren't reachable anymore.
I suppose GC can keep track of all 'finalizable' objects while they are alive. If so, does having finalizable objects make garbage collecting more expensive even when they are still alive?
Consider the Reference API.
It offers some references with special semantics to the GC, i.e Weak, Soft, and Phantom references. There’s simply another non-public type of special reference, for objects needing finalization.
Now, when the garbage collector traverses the object graph and encounters such a special reference object, it will not mark objects reachable through this reference as strongly reachable, but reachable with the special semantics. So if an object is only finalizer-reachable, the reference will be enqueued, so that one (or one of the) finalizer thread(s) can poll the queue and execute the finalize() method (it’s not the garbage collector itself calling this method).
In other words, the garbage collector never processes entirely unreachable objects here. To apply a special semantic to the reachability, the reference object must be reachable, so the referent can be reached through that reference. In case of finalizer-reachability, Finalizer.register is called when an object is created and it creates an instance of Finalizer in turn, a subclass of FinalReference, and right in its constructor, it calls an add() method which will insert the reference into a global linked list. So all these FinalReference instances are reachable through that list until an actual finalization happens.
Since this FinalReference will be created right on the instantiation of the object, if its class declares a non-trivial finalize() method, there is already some overhead due to having a finalization requirement, even if the object has not been collected yet.
The other issue is that an object processed by a finalizer thread is reachable by that thread and might even escape, depending on what the finalize() method does. But the next time this object becomes unreachable, the special reference object does not exist anymore, so it can be treated like any other unreachable object.
This would only be a performance issue if memory is very low and the next garbage collection had to be performed earlier to eventually reclaim that object. But this doesn’t happen in the reference implementation (aka “HotSpot” or “OpenJDK”). In fact, there could be an OutOfMemoryError while objects are pending in the finalizer queue, whose processing could make more memory reclaimable. There is no guarantee that finalization runs fast enough for your purposes. That’s why you should not rely on it.
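The registration mechanism described in this answer can be modelled with a toy simulation. To be clear, this is not the JDK's code: FinalRef, registered, pending, gcCycle and runFinalizers are hypothetical stand-ins for the internal FinalReference, its global list, the finalizer queue, a GC pass and the finalizer thread. The sketch only shows the bookkeeping: every finalizable object costs an extra reference object from the moment it is created, and the GC hands unreachable ones to a separate consumer instead of freeing them right away.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FinalizerRegistrySketch {

    /** Hypothetical stand-in for the JDK-internal FinalReference. */
    static final class FinalRef {
        final Object referent;
        FinalRef(Object referent) { this.referent = referent; }
    }

    static final Set<FinalRef> registered = new LinkedHashSet<>(); // models the global list
    static final Deque<FinalRef> pending = new ArrayDeque<>();     // models the finalizer queue
    static final List<String> log = new ArrayList<>();

    /** Models registration at creation time: the cost is paid while the object is still alive. */
    static <T> T register(T obj) {
        registered.add(new FinalRef(obj));
        return obj;
    }

    /** Models one GC pass: referents that are not strongly reachable are enqueued, not freed. */
    static void gcCycle(Set<?> stronglyReachable) {
        for (Iterator<FinalRef> it = registered.iterator(); it.hasNext(); ) {
            FinalRef ref = it.next();
            if (!stronglyReachable.contains(ref.referent)) {
                it.remove();      // the special reference stops existing ...
                pending.add(ref); // ... and the object waits for the finalizer thread
            }
        }
    }

    /** Models the finalizer thread draining its queue. */
    static void runFinalizers() {
        FinalRef ref;
        while ((ref = pending.poll()) != null) {
            log.add("finalized " + ref.referent);
        }
    }

    static List<String> demo() {
        registered.clear();
        pending.clear();
        log.clear();
        String a = register("a");
        register("b");
        gcCycle(Collections.singleton(a)); // only "a" is still strongly reachable
        runFinalizers();
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [finalized b]
    }
}
```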
But what if this 'free' memory contains an object with finalize
method? GC has to call it, but I don't see how it can even know about
objects that aren't reachable anymore.
Let's say we use the CMS garbage collector. After it has successfully marked all live objects in a first phase, it then scans memory again and removes all dead objects. The GC thread does not call the finalize method directly for these objects.
During creation, they are wrapped and registered by the JVM (see java.lang.ref.Finalizer.register(Object)). Once such an object is no longer otherwise referenced, it is queued and processed in another thread (java.lang.ref.Finalizer.FinalizerThread), which calls its finalize method. More details are covered in this blog post.
If so, does having finalizable objects make garbage collecting more
expensive even when they are still alive?
As you can now see, most of the time it does not.
The finalize method is called when an object is about to be garbage collected. That is, when the GC determines that the object is no longer referenced, it can call the finalize method on it. It doesn't have to keep track of objects to be finalized.
According to javadoc, finalize
Called by the garbage collector on an object when garbage collection determines that there are no more references to the object.
So the decision is based on whether any references to the object remain (in HotSpot this is determined by reachability tracing rather than by a reference counter).
In fact, this method may never be called at all, so it is not a good idea to use it as a destructor.
As far as I know, an object becomes eligible for garbage collection when null is assigned to the variable that refers to it:
    Object a = new Object();
    a = null; // the object is now eligible for garbage collection
or when the variable goes out of scope because the method has finished executing:
    public void gc() {
        Object a = new Object();
    } // once gc() returns, the object that a referred to is eligible for garbage collection
Given that, isn't going out of scope the same as the application simply ending?
    class Ink {}

    public class Main {
        Ink k = new Ink();

        public void getSomething() {
            // method code here
        }

        public static void main(String[] args) {
            Main n = new Main();
        }
    }
Here I expect two objects (the Ink object and the Main object) to be garbage collected when the application ends.
When the Java application terminates, the JVM typically also terminates in the scope of the OS, so GC at that point is moot. All resources have returned to the OS after as orderly a shutdown of the JVM as the app defined.
You are confusing the event of an object becoming eligible for garbage collection with the actual process of collecting garbage or, more precisely, reclaiming memory.
The garbage collector doesn’t run just because a reference became null or an object went out of scope; that would be a waste of resources. It usually runs either because memory is low or because CPU resources are idle.
Also, the term “garbage collection” is misleading. The actual task for the JVM is to mark all objects being still alive (also known as reachable objects). Everything else is considered reclaimable, aka garbage. Since at the termination of the JVM, the entire memory is reclaimed per se, there is no need to search for reachable references.
That said, it’s helpful to understand that most thinking about memory management is useless. E.g. in your code:
    public void gc() {
        Object a = new Object();
        // even here the object might get garbage collected, as it is unused in subsequent code
    }
the optimizer might remove the entire creation of the object, as it has no observable effect. Then there will be no garbage collection, as the object hasn’t been created in the first place.
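The difference between "eligible" and "actually collected" can be poked at with a small sketch (the class and method names are made up for this example; Reference.reachabilityFence needs Java 9 or later). The first check is reliable: an object that is still strongly reachable cannot be cleared by a GC. The second one in main is deliberately left without an expected result, because System.gc() is only a request and either outcome is legal.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

public class EligibilityDemo {

    /** True if the object is still present after System.gc() while a strong reference is held. */
    static boolean survivesWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc();
        boolean present = weak.get() != null;
        // Without this fence the JIT may treat 'strong' as dead after its last use,
        // which is exactly the "unused in subsequent code" effect described above.
        Reference.reachabilityFence(strong);
        return present;
    }

    public static void main(String[] args) {
        System.out.println("held strongly, present after gc: " + survivesWhileStronglyHeld());

        WeakReference<Object> weak = new WeakReference<>(new Object());
        // The referent is eligible for collection immediately, but whether it has
        // actually been reclaimed after this request is up to the JVM.
        System.gc();
        System.out.println("no strong reference, cleared after gc: " + (weak.get() == null));
    }
}
```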
See also here.
The JVM tracks the GC roots: if an object is not reachable from a GC root, it is a candidate for garbage collection. A GC root can be:
local variables
active Java threads
static variables
JNI references
I have read in countless places that running threads are garbage collection roots (i.e. they reside on the stack, the GC identifies them and traces through them to determine if the objects inside them are still reachable). Furthermore, a GC root will never be garbage collected itself.
My confusion is here: if the objects allocated from within a thread can never be garbage collected until the thread is stopped, how then is anything garbage collected in single-threaded programs where the only thread is the main thread?
Clearly I'm missing something here.
First, a thread (stack) is only a GC root while it is alive. When the thread terminates, it is no longer a GC root.
My confusion is here: if the objects allocated from within a thread can never be garbage collected until the thread is stopped, how then is anything garbage collected in single-threaded programs where the only thread is the main thread?
Your premise is incorrect.
It doesn't matter which thread allocates an object. What matters is whether an object allocated by your thread remains reachable. If it becomes unreachable (for example, if you didn't put the object's reference into a local variable) ... then it can be collected.
Objects are in the heap, regardless of which thread created them.
Objects may be reachable through references. Some of these references can be on the call stack of one or more threads.
An object can be collected when there are no more references to it, regardless of whether its allocating thread is still running or not.
For example, the thread below repeatedly allocates new StringBuilder objects. During a call to foo(), the thread has references on its call stack to a StringBuilder object. When foo() returns, there are no further references to the StringBuilder object. Therefore, that object is no longer reachable, and is eligible for garbage collection.
    Thread thread = new Thread(new Runnable() {
        @Override
        public void run() {
            while (true) {
                foo();
            }
        }

        public void foo() {
            StringBuilder strBuilder = new StringBuilder("This new object is allocated on the heap.");
            System.out.println(strBuilder);
        }
    });
    // start() runs run() on the new thread; calling run() directly would execute it on the current thread
    thread.start();
Within a class that extends Thread, consider the following example:
    public void run() {
        while (workToDo) {
            JSONObject json = new JSONObject(getNextMap());
            publishJSON(json.toString());
            // thread sleep
        }
    }
Is each instance of json still referenced as long as the thread is running, or are they freed each time new is called? Should this be moved to a method, i.e. publishJSON(getJson(getNextMap()))?
For an object to be referenced, the reference must be held either in a local variable that is still in use (while in scope) or in a member variable of some class instance.
I don't see either of the two in your example, since after each iteration of the while loop the local variable can no longer be referenced. So, unless you do something with json that saves a reference to it elsewhere, the instances are eligible for garbage collection.
Note that this doesn't imply that the GC will collect the no-longer-referenced instances after every iteration, since its behavior is not predictable from a developer's point of view. You just know that they will eventually be collected.