Garbage Collection in Java

Garbage Collection in Java - java

On the slides I am revising from it says the following:
Live objects can be identified either by maintaining a count of the number of references to each object, or by tracing chains of references from the roots.
Reference counting is expensive – it needs action every time a reference changes and it doesn’t spot cyclical structures, but it can reclaim space incrementally.
Tracing involves identifying live objects only when you need to reclaim space – moving the cost from general access to the time at which the GC runs, typically only when you are out of memory.
I understand the principles of why reference counting is expensive but do not understand what
"doesn’t spot cyclical structures, but it can reclaim space incrementally." means. Could anyone help me out a little bit please?
Thanks

Reference counting doesn’t spot cyclical structures...
Let's say you have two objects O1 and O2. They reference each other: O1 -> O2 and O2 -> O1, and no other objects references them. They will both have reference count 1 (one referrer).
If neither O1 or O2 is reachable from a GC root, they can be safely garbage collected. This is not detected by counting references though, since they both have reference count > 0.
0 references is a sufficient but not necessary requirement for an object to be eligible for garbage collection.
...but it can reclaim space incrementally.
The incremental part refers to the fact that you can garbage collect some of the 0-referenced objects quickly, get interrupted and continue at another time without problems.
If a tracing-algorithm gets interrupted it will need to start over from scratch the next time it's scheduled. (The tree of references may have changed since it started!)

Simple reference counting cannot resolve cases, when A refers to B and B refers to A. In this case, both A and B will have reference count 1 and will not be collected even if there are no other references.
Reference counting can reclaim space immediately when some object's reference counter becomes 0. There is no need to wait for GC cycle, scan other objects, etc. So, in a sense, it works incrementally as it reclaims space from objects one by one.

"doesn't spot cyclical structures"
Let's say I've got an object A. A needs another object called B to do its work. But B needs another object called C to do its work. But C needs a pointer to A for some reason or other. So the dependency graph looks like:
A -> B -> C -> A
The reference count for an object should be the number of arrows pointing at it. In this example, every object has a reference count of one.
Let's say our main program created a structure like this during its execution, and the main program had a pointer to A, making A's count equal to two. What happens when this structure falls out of scope? A's reference count is decremented to one.
But notice! Now A, B, and C all have reference counts of one even though they're not reachable from the main program. So this is a memory leak. Wikipedia has details on how to solve this problem.
"it can reclaim space incrementally"
Most garbage collectors have a collection period during which they pause execution and free up objects no longer in use. In a mark-and-sweep system, this is the sweep step. The downside is that during the periods between sweeps, memory keeps growing and growing. An object may stop being used almost immediately after its creation, but it will never be reclaimed until the next sweep.
In a reference-counting system, objects are freed as soon as their reference count hits zero. There is no big pause or any big sweep step, and objects no longer in use do not just sit around waiting for collection, they are freed immediately. Hence the collection is incremental in that it incrementally collects any object no longer in use rather than bulk collecting all the objects that have fallen out of use since the last collection.
Of course, this incrementalism may come with its own pitfalls, namely that it might be less expensive to do a bulk GC rather than a lot of little ones, but that is determined by the exact implementation.

Imagine you have three objects (A, B, C): A has a reference on B, B has a reference on C and C has a reference on A. But no other object has any reference on one of these. It's an independent cyclical structure. Using traditional reference counting would prevent the garbage collector from remove the cycle because every object is still referenced. But as long as no one has a reference on one of the three they could/should be removed. I guess reclaiming space incrementally means the way reference counting works when finding unreferenced instances, no cycles etc.

An object can be released for garbage collecting if the reference count reaches 0.
For circular reference this will never happen as each object in the circle keeps a reference to another so they are all at least 1.
For that some graph theory is needed to detect references which are no longer attached to anything, like little islands in the Heap sea. In order to keep these in memory they must have some 'attachment' to a static variable somewhere.
That is what tracing does. It determines which parts of the heap are islands and can be freed and which are still attached to the mainland i.e. static variables smewhere.

Related

Collecting old objects from java heap

I have Order_Item class instance, and these are paths to GC Roots (excluding phantom/weak/soft references):
I have few questions:
1) I'm not sure if the Order_Item will be garbage collected.
I tried to run System.gc(), and the object remained in heap.
Is it allowed to be collected, according to provided image?
2) What "Native Stack" mean?
As far as I understood, it's accounted as GC root.
http://help.eclipse.org/mars/index.jsp?topic=%2Forg.eclipse.mat.ui.help%2Fconcepts%2Fgcroots.html
Why some object (i.e. Order 0x782032cf8) is kept in "Native Stack"?
3) If I have reference from GC Root to object A, that object will not be garbage collected? Right?
And if so, my Order_Item object can't be garbage collected?
4) If 3 is right, how may I find what keeps objects 0x7821da5e0 and 0x782032cf8, and how to dereference/remove them?

You cannot really force the garbage collector to delete a given object. You know that the object is kept alive as long as it is reachable by references from the given point in the program. But if the object gets "collectable", it might be collected soon, but if there is no pressure on memory, it may fool around for a long time.
Usually, there is not reason to really delete objects if there is enough memory. The only exception I know are passwords. Here, you use a char array and overwrite it with nonsense once you used it.
For the native stack: Your link indicates that the native stack keeps external resources, e.g. files.

Should variables be declared inside the loop or outside the loop in java [duplicate]

This question already has answers here:
Declaring variables inside or outside of a loop
(20 answers)
Closed 8 years ago.
I know similar question has been asked many times previously but I am still not convinced about when objects become eligible for GC and which approach is more efficient.
Approach one:
for (Item item : items) {
MyObject myObject = new MyObject();
//use myObject.
}
Approach Two:
MyObject myObject = null;
for (Item item : items) {
myObject = new MyObject();
//use myObject.
}
I understand: "By minimizing the scope of local variables, you increase the readability and maintainability of your code and reduce the likelihood of error". (Joshua Bloch).
But How about performance/memory consumption? In Java Objects are Garbage collected when there is no reference left to the object. If there are e.g. 100000 items then 100000 objects will be created. In Approach One each object will have a reference (myObject) to it so they are not eligible for GC?
Where as in Approach Two with every loop iteration you are removing reference from the object created in previous iteration. so surely objects start becoming eligible after the first loop iteration.
Or is it a trade off between performance and code readability & maintainability?
What have I misunderstood?
Note:
Assuming I care about performance and myObject is not needed after the loop.
Thanks In Advance

If there are e.g. 100000 items then 100000 objects will be created in Approach One and each object will have a reference (myObject) to it so they are not eligible for GC?
No, from Garbage Collector's point of view both the approaches work the same i.e. no memory is leaked. With approach two, as soon as the following statement runs
myObject = new MyObject();
the previous MyObject that was being referenced becomes an orphan (unless while using that Object you passed it around, say, to another method where that reference was saved) and is eligible for garbage collection.
The difference is that once the loop runs out you would have the last instance of MyObject still reachable through the myObject reference originally created outside the loop.
Does GC know when references go out of scope during the loop execution or it can only know at the end of method?
First of all there's only one reference, not references. It's the objects that are getting unreferenced in the loop. Secondly, the garbage collection doesn't kick in spontaneously. So forget the loop, it may not even happen when the method exits.
Notice that I said, orphan objects become eligible for gc, not that they get collected immediately. Garbage collection never happens in real time, it happens in phases. In the mark phase, all the objects that are not reachable through a live thread anymore are marked for deletion. Then in the sweep phase, memory is reclaimed and additionally compacted much like defragmenting a hard drive. So, it works more like a batch rather than piecemeal operations.
GC isn't bothered about scopes or methods as such. It only looks for unreferenced objects and it does so when it feels like doing it. You can't force it. The only thing that you can be sure of is that GC would run if the JVM is running out of memory but you can't pin exactly when it would do so.
But, all this does not mean that GC can't kick in while the method executes or even while the loop is running. If you had, say, a Message Processor that processed 10,000 messages every 10 mins or so and then slept in between i.e. the bean waits within the loop, does 10,000 iterations and then waits again; GC would definitely kick into action to reclaim memory even though the method hasn't run to completion yet.

You have misunderstood when objects become eligible for GC - they do this when they are no longer reachable from an active thread. In this context that means:
When the only reference to them goes out of scope (approach 1).
When the only reference to them is assigned another value (approach 2).
So, the instance of MyObject would be eligible for GC at the end of each loop iteration whichever approach was used. The difference (theoretically) between the two approaches is that the JVM would have to allocate memory for a new object reference each iteration in approach 1 but not in approach 2. However, this assumes the Java compiler and/or Just-In-Time compiler is not smart to optimise approach 1 to actually act like approach 2.
In any case, I would go for the more readable and less error prone approach 1 on the grounds that:
The performance overhead for a single object reference allocation is tiny.
It will probably get optimised away anyway.

In both approaches objects will get Garbage collected.
In Approach 1: As and when for loop exits , all the local variable inside for loop get Garbage collected , as the loop ends.
In Approach 2 : As when new new reference is assigned to myObject variable the earlier has no proper reference .So that earlier get garbage collected and so on until loop runs.
So in both approaches there is no performance bottle neck.

I would not expect declaring the variable inside a block to have a detrimental impact on performance.
At least notionally the JVM allocates the stack frame at the start of the method and destroys it at the end. By implication will have the cumulative size to accommodate all the local variables.
See section 2.6 in here:
http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-2.html
That is consistent with other languages such as C where resizing the stack frame as the function/method executes is an overhead with no apparent return.
So wherever you declare it shouldn't make a difference.
Indeed declaring variables in blocks may help the compiler realize that the effective size of the stack frame can be smaller:
void foo() {
int x=6;
int y=7;
int z=8;
//.....
}
Versus
void bar() {
{
int x=6;
//....
}
{
int y=7;
//....
}
{
int z=8;
//....
}
}
Notice that bar() clearly only needs one local variable not 3.
Though making the stack frame smaller is unlikely to have any real influence on performance!
However when a reference goes out of the scope may make the object it references available for garbage collection. You would otherwise need to set references to null which is an untidy and unnecessary bother (and tinsy weenie overhead).
Without question you should declare variables inside a loop if (and only if) you don't need to access them outside the loop.
IMHO blocked statements (like bar has above) are under used.
If a method proceeds in stages you can protect the later stages from variable pollution using blocks.
With suitable (short) comments it can often be more readable (and efficient) way of structuring code than breaking it down it lost of private methods.
I have a chunky algorithm (Hashlife) where making earlier artifacts available for garbage collection during the method can make the difference between getting to the end and getting OutOfMemoryError.

Is an object garbage if it is referenced only from garbage?

Suppose there is an object a of class A, which holds a reference to another object b of class B. And this is the only reference to b. So now, if all references to a is removed, then a is ready to GC. Does this mean that b is also ready to get garbage collected? because, though b has a reference ( inside a ), it is unreachable, because a is unreachable.
So how exactly does this scenario works? I mean the order of garbage collection.

Once an object is unreachable from a root, it will be collected. See this question for an explanation of GC roots.
Entire subgraphs will be collected (as you describe) presuming no node within that subgraph may be reached.
Java (and .NET) use mark and sweep garbage collection which deals with this kind of problem.
Reference count-based systems (such as C++'s std::shared_ptr<T>) may fail in the case of circular dependencies that remain unreachable. This is not a problem for Java/.NET GC.

Java GC is smart enough to collect islands of isolated objects though they maybe pointing to each other. Hence, b also becomes eligible for garbage collection. The point to note here is that although you have a reference to b it's not live in the sense that it cannot be reached from your program's root.

It depends on the GC. A JVM can be told to use different GC's and typically uses 3 GC's as one (eden, copy, markcompact).
In any typical GC and in refcounting the situation you described is handled cleanly, both objs are collected. Think of it in 2 stages: first "a" gets noticed and collected then "b" gets noticed and collected. Again: the specific means of noticing depends on the GC.

Even if the objects reference internally to each other, and they do not have a reachable reference from program, they will be eligible for GC. THere is a good explanation with diagrams here
http://www.thejavageek.com/2013/06/22/how-do-objects-become-eligible-for-garbage-collection/

That is exactly the point of GC. Since b is unreachable from main thread, it will be garbage collected. It is not just the count that matters.

Preventing java garbage collection by making cyclic linked lists? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Garbage Collection in Java and Circular References
Java runs it's garbage collector whenever there is not enough space in the memory . It does so by deleting the unreferenced objects. So what if an object has a pointer that points to itself ,or any cyclic pointing structure that always has a pointer to it ? Will the garbage collector fail or will it recognize any such insidious attempts made by us in order to make it fail ?

Whether the objects will be collected depends on the Garbage collecting algorithm
reference-counting GC, will not collect cyclic references
but, modern GC algorithms usually decide eligibility of an object for collection by traversing the heap starting from the root set.
Garbage collector will collect objects that are not accessible from the root set of references (current local variables, static references, operand stacks of stack frames). This means that even objects that are referenced by some other objects can still be collected.
Therefore, your cyclic-pointer structure will get collected even though each object is referenced iff there is no other reference path going transitively from the root set to the objects forming the cycle.
See also What triggers the Java Garbage Collector.

The Java garbage collector will not reclaim any memory of objects that are pointed to by static or local variables, or by any objects linked (recursively) from those objects. Thus if you have large linked trees of objects, the garbage collector will not reclaim that space if there is a variable pointing to any root of a tree or loop.
The memory for objects linked in a loop will be reclaimed as long as there is no static or local variable pointing to an object in the loop. Thus if you want garbage collection to work efficiently, be sure to null out variables when the objects are no longer needed.
The garbage collector does detect, and does not care, if objects are linked in cyclic ways. The only thing that matters with whether the object can be reached from a variable.

"A references object B and B references A" is called an island of isolation. GC is smart enough to detect such things. If there is no reference to any object in an island the whole island is eligible for garbage collection

How does Java Garbage collector handle self-reference?

Hopefully a simple question. Take for instance a Circularly-linked list:
class ListContainer
{
private listContainer next;
<..>
public void setNext(listContainer next)
{
this.next = next;
}
}
class List
{
private listContainer entry;
<..>
}
Now since it's a circularly-linked list, when a single elemnt is added, it has a reference to itself in it's next variable. When deleting the only element in the list, entry is set to null. Is there a need to set ListContainer.next to null as well for Garbage Collector to free it's memory or does it handle such self-references automagically?

Garbage collectors which rely solely on reference counting are generally vulnerable to failing to collection self-referential structures such as this. These GCs rely on a count of the number of references to the object in order to calculate whether a given object is reachable.
Non-reference counting approaches apply a more comprehensive reachability test to determine whether an object is eligible to be collected. These systems define an object (or set of objects) which are always assumed to be reachable. Any object for which references are available from this object graph is considered ineligible for collection. Any object not directly accessible from this object is not. Thus, cycles do not end up affecting reachability, and can be collected.
See also, the Wikipedia page on tracing garbage collectors.

Circular references is a (solvable) problem if you rely on counting the references in order to decide whether an object is dead. No java implementation uses reference counting, AFAIK. Newer Sun JREs uses a mix of several types of GC, all mark-and-sweep or copying I think.
You can read more about garbage collection in general at Wikipedia, and some articles about java GC here and here, for example.

The actual answer to this is implementation dependent. The Sun JVM keeps track of some set of root objects (threads and the like), and when it needs to do a garbage collection, traces out which objects are reachable from those and saves them, discarding the rest. It's actually more complicated than that to allow for some optimizations, but that is the basic principle. This version does not care about circular references: as long as no live object holds a reference to a dead one, it can be GCed.
Other JVMs can use a method known as reference counting. When a reference is created to the object, some counter is incremented, and when the reference goes out of scope, the counter is decremented. If the counter reaches zero, the object is finalized and garbage collected. This version, however, does allow for the possibility of circular references that would never be garbage collected. As a safeguard, many such JVMs include a backup method to determine which objects actually are dead which it runs periodically to resolve self-references and defrag the heap.

As a non-answer aside (the existing answers more than suffice), you might want to check out a whitepaper on the JVM garbage collection system if you are at all interested in GC. (Any, just google JVM Garbage Collection)
I was amazed at some of the techniques used, and when reading through some of the concepts like "Eden" I really realized for the first time that Java and the JVM actually could beat C/C++ in speed. (Whenever C/C++ frees an object/block of memory, code is involved... When Java frees an object, it actually doesn't do anything at all; since in good OO code, most objects are created and freed almost immediately, this is amazingly efficient.)
Modern GC's tend to be very efficient, managing older objects much differently than new objects, being able to control GCs to be short and half-assed or long and thorough, and a lot of GC options can be managed by command line switches so it's actually useful to know what all the terms actually refer to.
Note: I just realized this was misleading. C++'s STACK allocation is very fast--my point was about allocating objects that are able to exist after the current routine has finished (which I believe SHOULD be all objects--it's something you shouldn't have to think about if you are going to think in OO, but in C++ speed may make this impractical).
If you are only allocating C++ classes on the stack, it's allocation will be at least as fast as Java's.

Java collects any objects that are not reachable. If nothing else has a reference to the entry, then it will be collected, even though it has a reference to itself.

yes Java Garbage collector handle self-reference!
How?
There are special objects called called garbage-collection roots (GC roots). These are always reachable and so is any object that has them at its own root.
A simple Java application has the following GC roots:
Local variables in the main method
The main thread
Static variables of the main class
To determine which objects are no longer in use, the JVM intermittently runs what is very aptly called a mark-and-sweep algorithm. It works as follows
The algorithm traverses all object references, starting with the GC
roots, and marks every object found as alive.
All of the heap memory that is not occupied by marked objects is
reclaimed. It is simply marked as free, essentially swept free of
unused objects.
So if any object is not reachable from the GC roots(even if it is self-referenced or cyclic-referenced) it will be subjected to garbage collection.

Simply, Yes. :)
Check out http://www.ibm.com/developerworks/java/library/j-jtp10283/
All JDKs (from Sun) have a concept of "reach-ability". If the GC cannot "reach" an object, it goes away.
This isn't any "new" info (your first to respondents are great) but the link is useful, and brevity is something sweet. :)

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.