Suppose there is an object a of class A, which holds a reference to another object b of class B. And this is the only reference to b. So now, if all references to a is removed, then a is ready to GC. Does this mean that b is also ready to get garbage collected? because, though b has a reference ( inside a ), it is unreachable, because a is unreachable.
So how exactly does this scenario works? I mean the order of garbage collection.
Once an object is unreachable from a root, it will be collected. See this question for an explanation of GC roots.
Entire subgraphs will be collected (as you describe) presuming no node within that subgraph may be reached.
Java (and .NET) use mark and sweep garbage collection which deals with this kind of problem.
Reference count-based systems (such as C++'s std::shared_ptr<T>) may fail in the case of circular dependencies that remain unreachable. This is not a problem for Java/.NET GC.
Java GC is smart enough to collect islands of isolated objects though they maybe pointing to each other. Hence, b also becomes eligible for garbage collection. The point to note here is that although you have a reference to b it's not live in the sense that it cannot be reached from your program's root.
It depends on the GC. A JVM can be told to use different GC's and typically uses 3 GC's as one (eden, copy, markcompact).
In any typical GC and in refcounting the situation you described is handled cleanly, both objs are collected. Think of it in 2 stages: first "a" gets noticed and collected then "b" gets noticed and collected. Again: the specific means of noticing depends on the GC.
Even if the objects reference internally to each other, and they do not have a reachable reference from program, they will be eligible for GC. THere is a good explanation with diagrams here
http://www.thejavageek.com/2013/06/22/how-do-objects-become-eligible-for-garbage-collection/
That is exactly the point of GC. Since b is unreachable from main thread, it will be garbage collected. It is not just the count that matters.
Related
I'm reading this article and I can't really understand how the finalizable objects (objects which override the finalize method) takes at least 2 GC cycles before it can be reclaimed.
It takes at least two garbage collection cycles (in the best case) before a finalizeable object can be reclaimed.
Can someone also explain in detail how is it possible for a finalizable object to take more than one GC cycle for reclamation?
My logical argument is that when we override finalize method, the runtime will have to register this object with the garbage-collector (so that GC can call finalize of this object, which makes me think that GC will have reference to all the finalizable objects). And for this, GC will have to keep a strong reference to the finalizable object. If that is the case then how this object became a candidate for reclamation by GC in the first place? I reach a contradiction by this theory.
PS: I understand that overriding finalize is not the recommended approach and this method is deprecated since Java 9.
You are right in that the garbage collector needs a reference to finalizable objects. Of course, this particular reference must not be considered when deciding whether the object is still reachable before the finalization. This implies special knowledge about the nature of this reference to the garbage collector.
When the garbage collector determines that an object is eligible for finalization, the finalizer will run, which implies that the object becomes strongly reachable again, at least as long as the finalizer is executed. After its finalization, the object must become unreachable again and this must be detected, before the object’s memory can be reclaimed. That’s why it takes at least two garbage collection cycles.
In case of the widely used Hotspot/OpenJDK environment (and likely also in IBM’s JVM), this is implemented by creating an instance of a special, non-public subclass of Reference, a Finalizer, right when an object, whose class has a non-trivial finalize() method, is created. Like with weak & soft references, these references are enqueued by the garbage collector when no strong reference to the referent exist, but they are not cleared, so the finalizer thread can read the object, making it strongly reachable again for the finalization. At this point, the Finalizer is cleared, but also not referenced anymore, so it would get collected like an ordinary object anyway, so by the next time the referent becomes unreachable, no special reference to it exists anymore.
For objects whose class has a “trivial finalizer”, i.e. the finalize() method inherited by java.lang.Object or an empty finalize() method, the JVM will take a short-cut and not create the Finalizer instance in the first place, so you could say, these objects, which make the majority of all objects, behave as if their finalizer did already run, right from the start.
Though you got your answer (which is absolutely correct), I want to add a small-ish addendum here. In general, references are of two types : strong and weak. Weak References are WeakReference/SoftReference/PhantomReference and Finalizer(s).
When a certain GC cycle traverses the heap graph and sees one of these weak references, it treats it in a special way. When it first encounters a dead finalizer reference (let's consider this being the first GC cycle), it has to resurrect the instance. finalize is an instance method, and it needs an actual instance to be invoked. So a GC first saw that this Object is dead, only to revive it moments later, to be able to call finalize on it. Once it calls that method on it, it marks the fact that it has already been called; so when the next cycle happens, it can be actually be GC-ed.
It would be incorrect to call this the second GC.
For example G1GC does partial clean-up of the heap (young and mixed), so it might not even capture this reference in the next cycle. It might not fall under its radar, as simple as that.
Other GCs, like Shenandoah, have flags that control on which iteration to handle these special references (ShenandoahRefProcFrequency, 5 by default).
So indeed there is a need for two cycles, but they do not have to be subsequent.
I have Order_Item class instance, and these are paths to GC Roots (excluding phantom/weak/soft references):
I have few questions:
1) I'm not sure if the Order_Item will be garbage collected.
I tried to run System.gc(), and the object remained in heap.
Is it allowed to be collected, according to provided image?
2) What "Native Stack" mean?
As far as I understood, it's accounted as GC root.
http://help.eclipse.org/mars/index.jsp?topic=%2Forg.eclipse.mat.ui.help%2Fconcepts%2Fgcroots.html
Why some object (i.e. Order 0x782032cf8) is kept in "Native Stack"?
3) If I have reference from GC Root to object A, that object will not be garbage collected? Right?
And if so, my Order_Item object can't be garbage collected?
4) If 3 is right, how may I find what keeps objects 0x7821da5e0 and 0x782032cf8, and how to dereference/remove them?
You cannot really force the garbage collector to delete a given object. You know that the object is kept alive as long as it is reachable by references from the given point in the program. But if the object gets "collectable", it might be collected soon, but if there is no pressure on memory, it may fool around for a long time.
Usually, there is not reason to really delete objects if there is enough memory. The only exception I know are passwords. Here, you use a char array and overwrite it with nonsense once you used it.
For the native stack: Your link indicates that the native stack keeps external resources, e.g. files.
Could anyone please tell me what will be with objects that refer to each other? How does java's GC resolve that issue? Thanks in advance!
If you have object A and B, and if the following conditions hold:
A references to B
B references to A
No other objects reference to any one of them
They are not root objects (e.g. objects in the constants pool etc)
then, these two objects will be garbage collected. This is called "circular reference".
This is because the mark-and-sweep GC will scan and find out all the objects that are reachable from the root objects. If A and B reference each other without any external reference, the mark-and-sweep GC won't be able to mark them as reachable, hence will be selected as candidates for GC.
There are a number of different mark-and-sweep implementations (naive mark-and-sweep, tri-colour etc). But the fundamental idea is the same. If an object cannot be reached from the root by direct/indirect references, it will be garbage collected.
There is a number of GCs. In the Young Generation, there is a copy collector.
What this does is find all the objects which are referenced from "root" objects such a thread stacks. e.g. the eden space is copied to a survivor space, and the survivor spaces are copied to each other. Anything left uncopied is cleared away.
This means if you have a bundle of objects which refer to each other and there is no strong reference to any of them, they will be discarded on the next collection. (The exception being soft references where the GC can choose to keep them or not)
On the slides I am revising from it says the following:
Live objects can be identified either by maintaining a count of the number of references to each object, or by tracing chains of references from the roots.
Reference counting is expensive – it needs action every time a reference changes and it doesn’t spot cyclical structures, but it can reclaim space incrementally.
Tracing involves identifying live objects only when you need to reclaim space – moving the cost from general access to the time at which the GC runs, typically only when you are out of memory.
I understand the principles of why reference counting is expensive but do not understand what
"doesn’t spot cyclical structures, but it can reclaim space incrementally." means. Could anyone help me out a little bit please?
Thanks
Reference counting doesn’t spot cyclical structures...
Let's say you have two objects O1 and O2. They reference each other: O1 -> O2 and O2 -> O1, and no other objects references them. They will both have reference count 1 (one referrer).
If neither O1 or O2 is reachable from a GC root, they can be safely garbage collected. This is not detected by counting references though, since they both have reference count > 0.
0 references is a sufficient but not necessary requirement for an object to be eligible for garbage collection.
...but it can reclaim space incrementally.
The incremental part refers to the fact that you can garbage collect some of the 0-referenced objects quickly, get interrupted and continue at another time without problems.
If a tracing-algorithm gets interrupted it will need to start over from scratch the next time it's scheduled. (The tree of references may have changed since it started!)
Simple reference counting cannot resolve cases, when A refers to B and B refers to A. In this case, both A and B will have reference count 1 and will not be collected even if there are no other references.
Reference counting can reclaim space immediately when some object's reference counter becomes 0. There is no need to wait for GC cycle, scan other objects, etc. So, in a sense, it works incrementally as it reclaims space from objects one by one.
"doesn't spot cyclical structures"
Let's say I've got an object A. A needs another object called B to do its work. But B needs another object called C to do its work. But C needs a pointer to A for some reason or other. So the dependency graph looks like:
A -> B -> C -> A
The reference count for an object should be the number of arrows pointing at it. In this example, every object has a reference count of one.
Let's say our main program created a structure like this during its execution, and the main program had a pointer to A, making A's count equal to two. What happens when this structure falls out of scope? A's reference count is decremented to one.
But notice! Now A, B, and C all have reference counts of one even though they're not reachable from the main program. So this is a memory leak. Wikipedia has details on how to solve this problem.
"it can reclaim space incrementally"
Most garbage collectors have a collection period during which they pause execution and free up objects no longer in use. In a mark-and-sweep system, this is the sweep step. The downside is that during the periods between sweeps, memory keeps growing and growing. An object may stop being used almost immediately after its creation, but it will never be reclaimed until the next sweep.
In a reference-counting system, objects are freed as soon as their reference count hits zero. There is no big pause or any big sweep step, and objects no longer in use do not just sit around waiting for collection, they are freed immediately. Hence the collection is incremental in that it incrementally collects any object no longer in use rather than bulk collecting all the objects that have fallen out of use since the last collection.
Of course, this incrementalism may come with its own pitfalls, namely that it might be less expensive to do a bulk GC rather than a lot of little ones, but that is determined by the exact implementation.
Imagine you have three objects (A, B, C): A has a reference on B, B has a reference on C and C has a reference on A. But no other object has any reference on one of these. It's an independent cyclical structure. Using traditional reference counting would prevent the garbage collector from remove the cycle because every object is still referenced. But as long as no one has a reference on one of the three they could/should be removed. I guess reclaiming space incrementally means the way reference counting works when finding unreferenced instances, no cycles etc.
An object can be released for garbage collecting if the reference count reaches 0.
For circular reference this will never happen as each object in the circle keeps a reference to another so they are all at least 1.
For that some graph theory is needed to detect references which are no longer attached to anything, like little islands in the Heap sea. In order to keep these in memory they must have some 'attachment' to a static variable somewhere.
That is what tracing does. It determines which parts of the heap are islands and can be freed and which are still attached to the mainland i.e. static variables smewhere.
Hopefully a simple question. Take for instance a Circularly-linked list:
class ListContainer
{
private listContainer next;
<..>
public void setNext(listContainer next)
{
this.next = next;
}
}
class List
{
private listContainer entry;
<..>
}
Now since it's a circularly-linked list, when a single elemnt is added, it has a reference to itself in it's next variable. When deleting the only element in the list, entry is set to null. Is there a need to set ListContainer.next to null as well for Garbage Collector to free it's memory or does it handle such self-references automagically?
Garbage collectors which rely solely on reference counting are generally vulnerable to failing to collection self-referential structures such as this. These GCs rely on a count of the number of references to the object in order to calculate whether a given object is reachable.
Non-reference counting approaches apply a more comprehensive reachability test to determine whether an object is eligible to be collected. These systems define an object (or set of objects) which are always assumed to be reachable. Any object for which references are available from this object graph is considered ineligible for collection. Any object not directly accessible from this object is not. Thus, cycles do not end up affecting reachability, and can be collected.
See also, the Wikipedia page on tracing garbage collectors.
Circular references is a (solvable) problem if you rely on counting the references in order to decide whether an object is dead. No java implementation uses reference counting, AFAIK. Newer Sun JREs uses a mix of several types of GC, all mark-and-sweep or copying I think.
You can read more about garbage collection in general at Wikipedia, and some articles about java GC here and here, for example.
The actual answer to this is implementation dependent. The Sun JVM keeps track of some set of root objects (threads and the like), and when it needs to do a garbage collection, traces out which objects are reachable from those and saves them, discarding the rest. It's actually more complicated than that to allow for some optimizations, but that is the basic principle. This version does not care about circular references: as long as no live object holds a reference to a dead one, it can be GCed.
Other JVMs can use a method known as reference counting. When a reference is created to the object, some counter is incremented, and when the reference goes out of scope, the counter is decremented. If the counter reaches zero, the object is finalized and garbage collected. This version, however, does allow for the possibility of circular references that would never be garbage collected. As a safeguard, many such JVMs include a backup method to determine which objects actually are dead which it runs periodically to resolve self-references and defrag the heap.
As a non-answer aside (the existing answers more than suffice), you might want to check out a whitepaper on the JVM garbage collection system if you are at all interested in GC. (Any, just google JVM Garbage Collection)
I was amazed at some of the techniques used, and when reading through some of the concepts like "Eden" I really realized for the first time that Java and the JVM actually could beat C/C++ in speed. (Whenever C/C++ frees an object/block of memory, code is involved... When Java frees an object, it actually doesn't do anything at all; since in good OO code, most objects are created and freed almost immediately, this is amazingly efficient.)
Modern GC's tend to be very efficient, managing older objects much differently than new objects, being able to control GCs to be short and half-assed or long and thorough, and a lot of GC options can be managed by command line switches so it's actually useful to know what all the terms actually refer to.
Note: I just realized this was misleading. C++'s STACK allocation is very fast--my point was about allocating objects that are able to exist after the current routine has finished (which I believe SHOULD be all objects--it's something you shouldn't have to think about if you are going to think in OO, but in C++ speed may make this impractical).
If you are only allocating C++ classes on the stack, it's allocation will be at least as fast as Java's.
Java collects any objects that are not reachable. If nothing else has a reference to the entry, then it will be collected, even though it has a reference to itself.
yes Java Garbage collector handle self-reference!
How?
There are special objects called called garbage-collection roots (GC roots). These are always reachable and so is any object that has them at its own root.
A simple Java application has the following GC roots:
Local variables in the main method
The main thread
Static variables of the main class
To determine which objects are no longer in use, the JVM intermittently runs what is very aptly called a mark-and-sweep algorithm. It works as follows
The algorithm traverses all object references, starting with the GC
roots, and marks every object found as alive.
All of the heap memory that is not occupied by marked objects is
reclaimed. It is simply marked as free, essentially swept free of
unused objects.
So if any object is not reachable from the GC roots(even if it is self-referenced or cyclic-referenced) it will be subjected to garbage collection.
Simply, Yes. :)
Check out http://www.ibm.com/developerworks/java/library/j-jtp10283/
All JDKs (from Sun) have a concept of "reach-ability". If the GC cannot "reach" an object, it goes away.
This isn't any "new" info (your first to respondents are great) but the link is useful, and brevity is something sweet. :)