Garbage Collection in C/C++ versus Java - java

There is no Automatic Garbage Collection in C/C++.
Assume that I wrote a simple program in C/C++ and created a single object.
Assume that there are exactly 10 (or some similarly tiny number of) addresses available for allocation.
I have a for loop that runs 100 times, and this single object is created on every iteration.
In Java, because there is automatic garbage collection, the address of the object is removed automatically after each iteration of the loop.
Rough Example:
for(int i = 0; i < 100; i++)
{
Object o = new Object();
}
In C/C++ we have to manually remove the Object address inside the for loop. Do we have to reboot each time as well to properly remove the Object reference in C++?
For those who say that there are no problems deleting an object more than once in C++, from Wikipedia:
When an object is deleted more than once, or when the programmer attempts to release a pointer to an object not allocated from the free store, catastrophic failure of the dynamic memory management system can result. The result of such actions can include heap corruption, premature destruction of a different (and newly created) object which happens to occupy the same location in memory as the multiply deleted object, and other forms of undefined behavior.
The link: Manual Memory Management
So there IS a risk?

C++ doesn't have (built-in) garbage collection, but that doesn't mean you should let your dynamically allocated memory stay allocated forever. You can and should free it. The most basic way to do that would be to free your object (through its pointer, since this is C++) manually when you don't need it anymore:
for(int i = 0; i < 100; i++)
{
// Dynamically allocate an object
Object* o = new Object();
// do something with object
// Release object memory (and call object destructor, if there is one)
delete o;
}
If you allocate your object on the stack, however, it is automatically released when it goes out of scope, and you don't have to wait for garbage collection to happen - it is released immediately:
for(int i = 0; i < 100; i++)
{
// Create a new object on the stack
Object o; // Note: no new keyword, so this object lives on the stack
// Do something with object
// The object's destructor is called at the end of every iteration, and
// its stack memory is simply reused by the next iteration.
}
This behavior of C++ stack values (automatic destruction when they go out of scope), which goes by the awkward name RAII (Resource Acquisition Is Initialization), allows for very nice things that Java just can't do. One of them is smart pointers, which allow dynamically allocated objects to be freed automatically just like their stack counterparts, and you can use sophisticated smart pointers to implement your own version of garbage collection if you want.
Another advantage of RAII is that finally blocks are rarely necessary in C++: local variables that reference a resource that should be freed immediately are usually allocated on the stack, and therefore get released automatically. No need for finally blocks here.
Practically speaking, when programming in C++ you'd usually put local variables on the stack (and get all the advantages of RAII without lifting a finger), unless you need to keep them alive for longer than that (e.g. you need to create an object and store a reference to it that stays alive after you leave the function that created the object). In that case, you can use pointers directly, but if you don't want to deal with manually deleting pointers and all the problems that can lead to, you'd usually use a smart pointer. A smart pointer is an object, allocated on the stack, that wraps a 'dumb' (i.e. regular) pointer and deletes the pointee when its own destructor gets called. More advanced versions of smart pointers can implement reference counting (so the pointed-to object is released only when all smart pointers referencing it have gone out of scope) and even garbage collection. The standard library comes with two smart pointers: auto_ptr and shared_ptr (the latter is reference-counting, but it only arrived with TR1/C++11, so some old compilers may not support it).

Java doesn't automatically remove the object at the end of each loop. Instead, it waits until there is a lot of garbage and then goes through and collects it. Java makes no guarantees about how long it will be before the object is collected.
In C++, once the program has exited, all resources are returned. You don't need to reboot (assuming a reasonable operating system) in order to make sure that resources are returned. Your operating system will make sure that any resources your program didn't release get released when it shuts down.
If you delete the object in C++, it's gone right then; you don't have to do anything else to recover the memory.

C++ certainly does have automatic storage duration; it just has other types of storage duration as well, and the ability to choose what is most appropriate.
In something like your example, you would use automatic storage:
for (int i = 0; i < 100; ++i) {
Object o;
// The object is automatically destroyed after each iteration
}
If the object is required to outlive the loop, perhaps because it is passed to another object to manage, then you would use smart pointers (the closest C++ equivalent to Java's references). Here are examples using auto_ptr and shared_ptr:
// Some functions that take ownership of an object. (Note: 'register' is a
// reserved keyword in C++, so the functions are named differently here.)
void takeOwnership(auto_ptr<Object> o); // by value: ownership transfers at the call
void share(shared_ptr<Object> const & o);
for (int i = 0; i < 100; ++i) {
auto_ptr<Object> o(new Object);
takeOwnership(o); // ownership is transferred; our pointer is now null
// The pointer is automatically destroyed after each iteration,
// deleting the object if it still owns it.
}
for (int i = 0; i < 100; ++i) {
shared_ptr<Object> o(new Object);
share(o); // ownership is shared; our pointer is still valid
// The pointer is automatically destroyed after each iteration,
// deleting the object if there are no other shared pointers to it.
}
In none of these cases do you need to manually delete the object; that is only necessary when dealing with raw object pointers which, in my opinion, should only be done when absolutely necessary.
C++ also has an advantage over Java here: destruction is always deterministic (that is, you know exactly when it happens). In Java, once an object is discarded, you do not know exactly when (or even if) the garbage collector will remove it. This means that, if the object manages a resource (such as a lock, or a database connection) that needs to be released after use, then it is up to the user of the object to manually release it. In C++, this can be done automatically in the destructor, making the class easier and less error-prone to use.
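To make the Java side of that comparison concrete, here is a minimal sketch of how Java code usually gets deterministic release without destructors: try-with-resources (Java 7+) on an AutoCloseable. The Connection class below is a hypothetical stand-in for any resource-holding class (it is not java.sql.Connection).

```java
// Minimal sketch: deterministic cleanup in Java via try-with-resources.
// 'Connection' is a hypothetical stand-in for any resource-holding class.
public class Scopes {
    static final StringBuilder log = new StringBuilder();

    static class Connection implements AutoCloseable {
        Connection() { log.append("open;"); }
        void query() { log.append("query;"); }
        @Override public void close() { log.append("close;"); }
    }

    static String demo() {
        log.setLength(0);
        try (Connection c = new Connection()) {
            c.query();
        } // close() runs here, whether or not an exception was thrown
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // open;query;close;
    }
}
```

The key difference from a C++ destructor remains: the release happens at end of block only because the caller opted in with the try statement, not as an intrinsic property of the type.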

There is no Automatic Garbage Collection in C/C++.
False. C and C++ don't mandate automatic garbage collection, but it's available anyway (e.g., see the Boehm/Demers/Weiser collector).
In C/C++ we have to manually remove the Object address inside the for loop.
Again, false. In C or C++, we'd define the object using the automatic storage class, which would deterministically destroy the object:
for(int i = 0; i < 100; i++)
{
Object o;
}
Just for example, let's do a quick test, by defining Object something like this:
#include <iostream>

struct Object {
Object() { std::cout << "Created an Object\n"; }
~Object() { std::cout << "Destroyed an Object\n"; }
};
For the loop above, this generates:
Created an Object
Destroyed an Object
Created an Object
Destroyed an Object
Created an Object
Destroyed an Object
[97 more repetitions of the same pattern removed]
Do we have to reboot each time as well to properly remove the Object reference in C++?
No, of course not. The fundamental difference between C++ and Java in this respect is that in C++ the object destruction is deterministic, but in Java it's not. Just for example, in C++, the code above must follow exactly the prescribed pattern -- the body of the loop is a block, and the object must be created on entry to the block, and destroyed on exit from the block. In Java, you get no such assurance. It might only destroy the first object after it has allocated 10 or 20. The objects may be destroyed in any order, and there's not really a guarantee that any particular object will be destroyed at all.
That difference isn't always important, but it certainly can be. In C++, it's used to support RAII (aka SBRM -- stack-bound resource management). This is used to assure not only that memory is freed when it's no longer needed (i.e., the same thing Java's automatic garbage collector handles) but also that other resources (anything from files to widgets to network or database connections) are handled automatically as well.

If you have short-lived objects and performance is critical (most of the time it is not), you can create a mutable object which is reused on each iteration. This moves work from per-iteration to per-loop.
List<String> list = new ArrayList<>(); // mutable object, reused across iterations
for(int i = 0; i < 100; i++) {
list.clear();
// do something with the list.
}
// one list is freed later.
You can make the list a member field (meaning the mutable object might never be freed), which is fine, provided your class doesn't need to be thread safe.

Assuming you are on a typical operating system (Windows/Linux): no, you do not need to reboot. The OS will protect you via the process/virtual-memory structure.
Only your process will run out of memory. The OS will clean up after you when your process ends.
On many small embedded systems without an OS: yes, you could crash or lock up the processor and require a reboot.

Adding to the answers already given: you can always install signal handlers, and clean up the memory used by your process when a signal is received.

Normally the OS deallocates your program's memory when you quit it. But some OSes might not (I've seen that on an embedded controller, http://www.beck-ipc.com/en/products/sc1x/sc13.asp : the RTOS didn't deallocate on exit, so I ran out of memory after a couple of launches when I didn't deallocate all my objects, and yes, I had to reboot).
But most OSes will clear/deallocate a program's memory when it exits (Linux and Windows do), so you shouldn't have to worry.

Related

Avoiding objects garbage collection

TLDR: How can I force the JVM not to garbage collect my objects, even if I don't want to use them in any meaningful way?
Longer story:
I have some Items which are loaded from a permanent storage and held as weak references in a cache. The weak reference usage means that, unless someone is actually using a specific Item, there are no strong references to them and unused ones are eventually garbage collected. This is all desired behaviour and works just fine. Additionally, sometimes it is necessary to propagate changes of an Item into the permanent storage. This is done asynchronously in a dedicated writer thread. And here comes the problem, because I obviously cannot allow the Item to be garbage collected before the update is finished. The solution I currently have is to include a strong reference to the Item inside the update object (the Item is never actually used during the update process, just held).
public class Item {
public final String name;
public String value;
}
public class PendingUpdate {
public final Item strongRef; // not actually necessary, just to avoid GC
public final String name;
public final String newValue;
}
But after some thinking and digging I found this paragraph in JavaSE specs (12.6.1):
Optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable. For example, a Java compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner.
Which, if I understand it correctly, means that Java can just decide that the Item is garbage anyway. One solution would be to do some unnecessary operation on the Item, like item.hashCode();, at the end of the storage-update code. But I expect that a JVM might be smart enough to remove such unnecessary code, and I cannot think of any workaround that a sufficiently smart JVM couldn't optimize away, releasing the object sooner than needed.
public void performStorageUpdate(PendingUpdate update) {
final Transaction transaction = this.getDataManager().beginTransaction();
try {
// ... some permanent storage update code
transaction.commit();
} catch (final Throwable t) {
transaction.abort();
}
// The Item should never be garbage collected before this point
update.strongRef.hashCode(); // Trying to avoid GC of the item; probably not enough
}
Has anyone encountered a similar problem with weak references? Are there some language guarantees that I can use to avoid GC for such objects? (Ideally causing as small a performance hit as possible.) Or am I overthinking it, and does the specification paragraph mean something different?
Edit: Why I cannot allow the Item to be garbage collected before the storage update finishes:
Problematic event sequence:
Item is loaded into cache and is used (held as a strong reference)
An update to the item is enqueued
Strong reference to the Item is dropped and there are no other strong references to it (besides the one in the PendingUpdate, but as I explained, I think that one can be optimized away by the JVM).
Item is garbage collected
Item is requested again and is loaded from the permanent storage and a new strong reference to it is created
Update to the storage is performed
Result state: there are inconsistent data in the cache and the permanent storage. Therefore, I need to hold a strong reference to the Item until the storage update finishes; but I just need to hold it, I don't actually need to do anything with it (so the JVM is probably free to think that it is safe to get rid of it).
TL;DR How can I force the JVM not to garbage collect my objects, even if I don't want to use them in any meaningful way?
Make them strongly reachable; e.g. by adding them to a strongly reachable data structure. If objects are strongly reachable then the garbage collector won't break weak references to them.
When you have finished the processing where the objects need to remain in the cache, you can clear the data structure to break the above strong references. The next GC run will then be able to break the weak references.
Which, if I understand it correctly, means that java can just decide that the Item is garbage anyway.
That's not what it means.
What it really means is that the infrastructure may be able to determine that an object is effectively unreachable, even though there is still a reference to it in a variable. For example:
public void example() {
int[] big = new int[1000000];
// long computation that doesn't use 'big'
}
If the compiler / runtime can determine that the object that big refers to cannot be used1 during the long computation, it is permitted to garbage collect it ... during the long computation.
But here's the thing. It can only do this if the object cannot be used. And if it cannot be used, there is no reason not to garbage collect it.
1 - ... without traversing a reference object.
For what it is worth, the definition of strongly reachable isn't just that there is a reference in a local variable. The definition (in the javadocs) is:
"An object is strongly reachable if it can be reached by some thread without traversing any reference objects. A newly-created object is strongly reachable by the thread that created it."
It doesn't specify how the object can be reached by the thread. Or how the runtime could / might deduce that no thread can reach it.
But the implication is clear that if threads can only access the object via a reference object, then it is not strongly reachable.
Ergo ... make the object strongly reachable.
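One concrete way to do that can be sketched as follows. The class and method names (UpdateTracker, updateAndRelease) are illustrative, not from the question: keep each in-flight item in a strongly reachable collection for exactly the duration of the storage update.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: hold a strong reference to each item while its update is in flight,
// so the GC cannot clear weak references to it in the meantime.
public class UpdateTracker {
    // Strongly reachable for as long as an update is running.
    private final Set<Object> inFlight = ConcurrentHashMap.newKeySet();

    public void updateAndRelease(Object item, Runnable storageUpdate) {
        inFlight.add(item);          // item is now strongly reachable
        try {
            storageUpdate.run();     // the actual storage write
        } finally {
            inFlight.remove(item);   // eligible for collection again
        }
    }

    public boolean isHeld(Object item) { return inFlight.contains(item); }
}
```

On Java 9+ there is also java.lang.ref.Reference.reachabilityFence(update.strongRef), which exists precisely for this "keep it alive without otherwise using it" situation and, unlike the hashCode() trick, cannot be optimized away.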

java - Memory management - Destroying a list when we don't need it [duplicate]

Does assigning an unused object reference to null in Java improve the garbage collection process in any measurable way?
My experience with Java (and C#) has taught me that it is often counterproductive to try and outsmart the virtual machine or JIT compiler, but I've seen co-workers use this method, and I am curious whether this is a good practice to pick up or one of those voodoo programming superstitions?
Typically, no.
But like all things: it depends. The GC in Java these days is VERY good, and everything should be cleaned up very shortly after it is no longer reachable. For local variables, that's just after leaving the method; for fields, it's when the class instance is no longer referenced.
You only need to explicitly null if you know it would remain referenced otherwise. For example an array which is kept around. You may want to null the individual elements of the array when they are no longer needed.
For example, this code from ArrayList:
public E remove(int index) {
RangeCheck(index);
modCount++;
E oldValue = (E) elementData[index];
int numMoved = size - index - 1;
if (numMoved > 0)
System.arraycopy(elementData, index+1, elementData, index,
numMoved);
elementData[--size] = null; // Let gc do its work
return oldValue;
}
Also, explicitly nulling an object will not cause an object to be collected any sooner than if it just went out of scope naturally as long as no references remain.
Both:
void foo() {
Object o = new Object();
/// do stuff with o
}
and:
void foo() {
Object o = new Object();
/// do stuff with o
o = null;
}
Are functionally equivalent.
In my experience, more often than not, people null out references out of paranoia not out of necessity. Here is a quick guideline:
If object A references object B and you no longer need this reference and object A is not eligible for garbage collection then you should explicitly null out the field. There is no need to null out a field if the enclosing object is getting garbage collected anyway. Nulling out fields in a dispose() method is almost always useless.
There is no need to null out object references created in a method. They will get cleared automatically once the method terminates. The exception to this rule is if you're running in a very long method or some massive loop and you need to ensure that some references get cleared before the end of the method. Again, these cases are extremely rare.
I would say that the vast majority of the time you will not need to null out references. Trying to outsmart the garbage collector is useless. You will just end up with inefficient, unreadable code.
A good article on this is today's Coding Horror post.
The way GCs work is by looking for objects that do not have any pointers to them; the area of their search is the heap/stack and any other spaces they have. So if you set a variable to null, the actual object is no longer pointed to by anyone, and hence could be GC'd.
But since the GC might not run at that exact instant, you might not actually be buying yourself anything. If your method is fairly long (in terms of execution time), though, it might be worth it, since you increase the chances of the GC collecting that object.
The picture can also be complicated by code optimizations: if you never use the variable after you set it to null, it would be a safe optimization to remove the line that sets the value to null (one less instruction to execute). So you might not actually get any improvement.
So, in summary: yes, it can help, but it will not be deterministic.
At least in Java, it's not voodoo programming at all. When you create an object in Java using something like
Foo bar = new Foo();
you do two things: first, you create a reference to an object, and second, you create the Foo object itself. So long as that reference or another exists, the specific object can't be GC'd. However, when you assign null to that reference...
bar = null ;
and assuming nothing else holds a reference to the object, it's freed and available for GC the next time the garbage collector passes by.
It depends.
Generally speaking, the shorter you keep references to your objects, the faster they'll get collected.
If your method takes, say, 2 seconds to execute and you don't need an object anymore after one second of method execution, it makes sense to clear any references to it. If the GC sees that after one second your object is still referenced, next time it might check it in a minute or so.
Anyway, setting all references to null by default is, to me, premature optimization, and nobody should do it except in specific rare cases where it measurably decreases memory consumption.
Explicitly setting a reference to null instead of just letting the variable go out of scope does not help the garbage collector, unless the object held is very large, in which case setting it to null as soon as you are done with it is a good idea.
Generally, setting references to null means, to the READER of the code, that this object is completely done with and should not be a concern any more.
A similar effect can be achieved by introducing a narrower scope by putting in an extra set of braces
{
int l;
{ // <- here
String bigThing = ....;
l = bigThing.length();
} // <- and here
}
this allows the bigThing to be garbage collected right after leaving the nested braces.
public class JavaMemory {
private final int dataSize = (int) (Runtime.getRuntime().maxMemory() * 0.6);
public void f() {
{
byte[] data = new byte[dataSize];
//data = null;
}
byte[] data2 = new byte[dataSize];
}
public static void main(String[] args) {
JavaMemory jmp = new JavaMemory();
jmp.f();
}
}
The above program throws OutOfMemoryError. If you uncomment data = null;, the OutOfMemoryError goes away: even though data has gone out of its inner block, the stack slot holding the reference is not otherwise cleared, so nulling it explicitly is what makes the first array collectable. It is good practice to set a large, unused variable like this to null.
I was working on a video conferencing application one time and noticed a huge difference in performance when I took the time to null references as soon as I didn't need the object anymore. This was in 2003-2004, and I can only imagine the GC has gotten even smarter since. In my case I had hundreds of objects coming and going out of scope every second, so I noticed the GC when it kicked in periodically. However, after I made it a point to null objects, the GC stopped pausing my application.
So it depends on what you're doing...
Yes.
From "The Pragmatic Programmer" p.292:
By setting a reference to NULL you reduce the number of pointers to the object by one ... (which will allow the garbage collector to remove it)
I assume the OP is referring to things like this:
private void Blah()
{
MyObj a;
MyObj b;
try {
a = new MyObj();
b = new MyObj();
// do real work
} finally {
a = null;
b = null;
}
}
In this case, wouldn't the VM mark them for GC as soon as they leave scope anyway?
Or, from another perspective, would explicitly setting the items to null cause them to be GC'd before they would be if they just went out of scope? If so, the VM may spend time GC'ing the objects when the memory isn't needed anyway, which would actually mean worse CPU usage, because it would be GC'ing earlier and more often.
Even if nullifying the reference were marginally more efficient, would it be worth the ugliness of having to pepper your code with these ugly nullifications? They would only be clutter and would obscure the intent of the code that contains them.
It's a rare codebase that has no better candidate for optimisation than trying to outsmart the garbage collector (rarer still are developers who succeed in outsmarting it). Your efforts will most likely be better spent elsewhere, ditching that crufty XML parser or finding some opportunity to cache computation. Those optimisations will be easier to quantify and don't require you to dirty up your codebase with noise.
Oracle doc point out "Assign null to Variables That Are No Longer Needed" https://docs.oracle.com/cd/E19159-01/819-3681/abebi/index.html
"It depends"
I do not know about Java, but in .NET (C#, VB.NET...) it is usually not required to assign null when you no longer require an object.
However, note that it is "usually not required".
By analyzing your code, the .NET compiler makes a good valuation of the lifetime of the variable... to accurately tell when the object is not being used anymore. So if you write obj = null it might actually look as if obj is still being used... in that case it is counterproductive to assign null.
There are a few cases where it might actually help to assign null. One example is a huge method that runs for a long time, a method running in a different thread, or some loop. In such cases it might help to assign null so that it is easy for the GC to know it's not being used anymore.
There is no hard & fast rule for this. Going by the above, place null-assigns in your code and run a profiler to see if they help in any way. Most probably you will not see a benefit.
If it is .NET code you are trying to optimize, then my experience has been that taking good care with Dispose and Finalize methods is actually more beneficial than bothering about nulls.
Some references on the topic:
http://blogs.msdn.com/csharpfaq/archive/2004/03/26/97229.aspx
http://weblogs.asp.net/pwilson/archive/2004/02/20/77422.aspx
In the future execution of your program, the values of some data members will be used to compute an output visible outside the program. Others might or might not be used, depending on future (and impossible to predict) inputs to the program. Still other data members are guaranteed never to be used. All resources, including memory, allocated to that unused data are wasted. The job of the garbage collector (GC) is to eliminate that wasted memory. It would be disastrous for the GC to eliminate something that was needed, so the algorithm used might be conservative, retaining more than the strict minimum. It might use heuristic optimizations to improve its speed, at the cost of retaining some items that are not actually needed. There are many potential algorithms the GC might use. Therefore it is possible that changes you make to your program, which do not affect its correctness, might nevertheless affect the operation of the GC, either making it run faster to do the same job, or identify unused items sooner. So this kind of change, setting an unused object reference to null, in theory is not always voodoo.
Is it voodoo? There are reportedly parts of the Java library code that do this. The writers of that code are much better than average programmers and either know, or cooperate with, programmers who know, details of the garbage collector implementations. So that suggests there is sometimes a benefit.
As you said, there are optimizations: the JVM knows the point where a variable was last used, and the object referenced by it can be GC'ed right after that point (while still executing in the current scope). So nulling out references in most cases does not help the GC.
But it can be useful to avoid the "nepotism" (or "floating garbage") problem (read more here, or watch the video). The problem exists because the heap is split into Old and Young generations, with different GC mechanisms applied: Minor GC (which is fast and happens often, to clean the Young gen) and Major GC (which causes a longer pause, to clean the Old gen). "Nepotism" prevents garbage in the Young gen from being collected when it is referenced by garbage that was already tenured to the Old gen.
This is 'pathological' because ANY promoted node will result in the promotion of ALL following nodes until a GC resolves the issue.
To avoid nepotism it's a good idea to null out references from an object which is supposed to be removed. You can see this technique applied in JDK classes: LinkedList and LinkedHashMap
private E unlinkFirst(Node<E> f) {
final E element = f.item;
final Node<E> next = f.next;
f.item = null;
f.next = null; // help GC
// ...
}

Garbage collection vs manual memory management

This is a very basic question. I will formulate it using C++ and Java, but it's really language-independent.
Consider a well-known problem in C++:
#include <boost/shared_ptr.hpp>

struct Obj
{
boost::shared_ptr<Obj> m_field;
};
{
boost::shared_ptr<Obj> obj1(new Obj);
boost::shared_ptr<Obj> obj2(new Obj);
obj1->m_field = obj2;
obj2->m_field = obj1;
}
This is a memory leak, and everybody knows it :). The solution is also well-known: one should use weak pointers to break the "refcount interlocking". It is also known that this problem cannot be resolved automatically in principle. It's solely programmer's responsibility to resolve it.
But there's a positive thing: a programmer has full control over the refcount values. I can pause my program in a debugger and examine the refcounts for obj1 and obj2 and understand that there's a problem. I can also set a breakpoint in the destructor of an object and observe the moment of destruction (or find out that the object has not been destroyed).
My question is about Java, C#, ActionScript and other "Garbage Collection" languages. I might be missing something, but in my opinion they
Do not let me examine refcount of objects
Do not let me know when object is destroyed (okay, when object is exposed to GC)
I often hear that these languages just do not allow a programmer to leak memory and that's why they are great. As far as I understand it, they just hide memory management problems and make them hard to solve.
Finally, the questions themselves:
Java:
public class Obj
{
public Obj m_field;
}
{
Obj obj1 = new Obj();
Obj obj2 = new Obj();
obj1.m_field = obj2;
obj2.m_field = obj1;
}
Is it memory leak?
If yes: how do I detect and fix it?
If no: why?
Managed memory systems are built on the assumption that you don't want to trace memory-leak issues in the first place. Instead of making them easier to solve, you try to make sure they never happen at all.
Java does have a loose notion of a "memory leak", meaning any growth in memory which could impact your application, but there is never a point at which the managed heap cannot clean up all the unreachable memory.
JVMs don't use reference counting, for a number of reasons:
it cannot handle circular references, as you have observed;
it has significant memory and threading overhead to maintain accurately;
there are much better, simpler ways of handling such situations in managed memory.
While the JLS doesn't ban the use of reference counts, they are not used in any JVM AFAIK.
Instead, Java keeps track of a number of root contexts (e.g. each thread stack) and can trace which objects need to be kept and which can be discarded, based on whether those objects are strongly reachable. It also provides the facility for weak references (which are retained only as long as the objects have not been cleaned up) and soft references (which are not generally cleaned up, but can be at the garbage collector's discretion).
AFAIK, Java GC works by starting from a set of well-defined initial references and computing a transitive closure of objects which can be reached from these references. Anything not reachable is "leaked" and can be GC-ed.
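That transitive-closure description can be sketched as a toy mark phase. This is a model only (class and method names are made up, and real JVM collectors are far more sophisticated): objects are nodes, references are edges, and anything not marked from the roots is garbage.

```java
import java.util.*;

// Toy model of the GC mark phase: objects are nodes, references are edges.
public class ToyGc {
    static Set<Integer> reachable(Map<Integer, List<Integer>> refs, List<Integer> roots) {
        Set<Integer> marked = new HashSet<>();
        Deque<Integer> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Integer obj = work.pop();
            if (marked.add(obj)) {                        // first visit only
                work.addAll(refs.getOrDefault(obj, List.of()));
            }
        }
        return marked;                                    // everything else is garbage
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> refs = new HashMap<>();
        refs.put(1, List.of(2));   // root 1 -> 2
        refs.put(3, List.of(4));   // cycle 3 <-> 4, reached by no root
        refs.put(4, List.of(3));
        System.out.println(reachable(refs, List.of(1))); // [1, 2]: the cycle is garbage
    }
}
```

Note how the cyclically-referencing pair 3 and 4 never gets marked: this is exactly why reference cycles are not leaks under a tracing collector.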
Java has a unique memory management strategy. Everything (except a few specific things) is allocated on the heap, and isn't freed until the GC gets to work.
For example:
public class Obj {
public Object example;
public Obj m_field;
}
// isPrime(int) is assumed to be defined elsewhere
public static void main(String[] args) {
int lastPrime = 2;
while (true) {
Obj obj1 = new Obj();
Obj obj2 = new Obj();
obj1.example = new Object();
obj1.m_field = obj2;
obj2.m_field = obj1;
int prime = lastPrime++;
while (!isPrime(prime)) {
prime++;
}
lastPrime = prime;
System.out.println("Found a prime: " + prime);
}
}
C handles this situation by requiring you to manually free the memory of both objects, and C++ (with shared_ptr) counts references to them and automatically destroys them when they go out of scope.
Java does not free this memory, at least not at first.
The Java runtime waits a while until it feels like there is too much memory being used. After that the Garbage collector kicks in.
Let's say the java garbage collector decides to clean up after the 10,000th iteration of the outer loop. By this time, 10,000 objects have been created (which would have already been freed in C/C++).
Although there have been 10,000 iterations of the outer loop, only the most recently created obj1 and obj2 could possibly still be referenced by the code.
These references are the GC 'roots', which Java uses to find all objects that could possibly be reached. The garbage collector then recursively walks the object graph from those roots, marking obj1, obj2, and the 'example' object they reference as live, in addition to the roots themselves.
All those other objects are then destroyed by the garbage collector.
This does come with a performance penalty, but this process has been heavily optimized, and isn't significant for most applications.
Unlike in C++, you don't have to worry about reference cycles at all, since only objects reachable from the GC roots will live.
With Java applications you do still have to worry about memory (think of lists holding onto the objects from all iterations), but it isn't as significant as in other languages.
As for debugging: Java's approach to debugging high memory use is to run a special memory analyzer to find out which objects are still on the heap, rather than worrying about what references what.
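The "lists holding onto objects" concern can be sketched in a few lines (the class and field names here are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class RetentionDemo {
    // A long-lived collection keeps every element strongly reachable,
    // so the GC cannot reclaim them even after the loop is done with them.
    static final List<byte[]> cache = new ArrayList<>();

    static int fillAndForget(int iterations) {
        for (int i = 0; i < iterations; i++) {
            cache.add(new byte[1024]); // each iteration's buffer is retained
        }
        return cache.size(); // all of them are still reachable
    }

    static int release() {
        cache.clear();       // dropping the references makes the buffers
        return cache.size(); // eligible for collection
    }

    public static void main(String[] args) {
        System.out.println(fillAndForget(100)); // 100 buffers still live
        System.out.println(release());          // 0: now collectible
    }
}
```

This is exactly the kind of retention a memory analyzer surfaces: the objects are still reachable, so from the GC's point of view they are not garbage.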
The critical difference is that in Java etc. you are not involved in the disposal problem at all. This may feel like a pretty scary position to be in, but it is surprisingly empowering. All the decisions you used to have to make about who is responsible for disposing of a created object are gone.
It does actually make sense. The system knows much more about what is reachable and what is not than you. It can also make much more flexible and intelligent decisions about when to tear down structures etc.
Essentially - in this environment you can juggle objects in a much more complex way without worrying about dropping one. The only thing you now need to worry about is if you accidentally glue one to the ceiling.
As an ex C programmer having moved to Java I feel your pain.
Re - your final question - it is not a memory leak. When GC kicks in everything is discarded except what is reachable. In this case, assuming you have released obj1 and obj2 neither is reachable so they will both be discarded.
Garbage collection is not simple ref counting.
The circular reference example which you demonstrate will not occur in a garbage collected managed language because the garbage collector will want to trace allocation references all the way back to something on the stack. If there isn't a stack reference somewhere it's garbage. Ref counting systems like shared_ptr are not that smart and it's possible (like you demonstrate) to have two objects somewhere in the heap which keep each other from being deleted.
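As a sketch of that point, a cycle like the one above is still collected in Java once nothing on the stack reaches it. The class here is invented for illustration; a WeakReference is used only to observe the collection, and System.gc() is a request rather than a command, so the timing is not strictly guaranteed:

```java
import java.lang.ref.WeakReference;

public class CycleDemo {
    static class Node { Node peer; }

    // Builds a two-object cycle, drops the stack references, and reports
    // whether a tracing GC reclaimed it. A pure reference counter never would.
    static boolean cycleCollected() throws InterruptedException {
        Node a = new Node();
        Node b = new Node();
        a.peer = b; // a and b reference each other: a cycle
        b.peer = a;
        WeakReference<Node> probe = new WeakReference<>(a);

        a = null; // no stack reference reaches the cycle any more
        b = null;

        for (int i = 0; i < 20 && probe.get() != null; i++) {
            System.gc();      // request (not force) a collection
            Thread.sleep(10);
        }
        return probe.get() == null;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("cycle collected: " + cycleCollected());
    }
}
```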
Garbage-collected languages don't let you inspect a reference count because there isn't one: garbage collection is an entirely different thing from reference-counted memory management. The real difference is in determinism.
{
std::fstream file( "example.txt" );
// do something with file
}
// ... later on
{
std::fstream file( "example.txt" );
// do something else with file
}
In C++ you have the guarantee that example.txt has been closed when the first block is exited, whether normally or because an exception was thrown. Compare that with Java:
{
    FileInputStream file = null;
    try
    {
        file = new FileInputStream( "example.txt" );
        // do something with file
    }
    finally
    {
        if( file != null )
            file.close();
    }
}
// ..later on
{
    FileInputStream file = null;
    try
    {
        file = new FileInputStream( "example.txt" );
        // do something else with file
    }
    finally
    {
        if( file != null )
            file.close();
    }
}
As you see, you have traded memory management for the management of every other resource. That is the real difference: refcounted objects still have deterministic destruction. In garbage-collected languages you must manually release resources and check for exceptions. One may argue that explicit memory management is tedious and error prone, but in modern C++ it is mitigated by smart pointers and standard containers. You still have some responsibilities (circular references, for example), but think of how many catch/finally blocks you can avoid using deterministic destruction, and how much typing a Java/C#/etc. programmer must do instead (as they have to manually close/release every resource other than memory). And I know that there's the using syntax in C# (and try-with-resources in newer Java), but it covers only block-scoped lifetimes, not the more general problem of shared ownership.

Difference between new operator in C++ and new operator in java

As far as I know, the new operator does the following things: (please correct me if I am wrong.)
Allocates memory, and then returns a reference to the first block of the allocated memory. (The memory is allocated from the heap, obviously.)
Initializes the object (calling the constructor).
Also, the operator new[] works in a similar fashion, except that it does this for each and every element in the array.
Can anybody tell me how both of these operators are different in C++ and Java:
In terms of their life cycle.
What if they fail to allocate memory.
In C++, T * p = new T;...
allocates enough memory for an object of type T,
constructs an object of type T in that memory, possibly initializing it, and
returns a pointer to the object. (The pointer has the same value as the address of the allocated memory for the standard new, but this needn't be the case for the array form new[].)
In case the memory allocation fails, an exception of type std::bad_alloc is thrown, no object is constructed and no memory is allocated.
In case the object constructor throws an exception, no object is (obviously) constructed, the memory is automatically released immediately, and the exception is propagated.
Otherwise a dynamically allocated object has been constructed, and the user must manually destroy the object and release the memory, typically by saying delete p;.
The actual allocation and deallocation function can be controlled in C++. If there is nothing else, a global, predefined function ::operator new() is used, but this may be replaced by the user; and if there exists a static member function T::operator new, that one will be used instead.
In Java it's fairly similar, only that the return value of new is something that can bind to a Java variable of type T (or a base thereof, such as Object), and you must always have an initializer (so you'd say T x = new T();). The object's lifetime is indeterminate, but guaranteed to last at least as long as any variable still refers to the object, and there is no way to (nor any need to) destroy the object manually. Java has no explicit notion of memory, and you cannot control the internals of the allocation.
Furthermore, C++ allows lots of different forms of new expressions (so-called placement forms). They all create dynamic-storage objects which must be destroyed manually, but they can be fairly arbitrary. To my knowledge Java has no such facilities.
The biggest difference is probably in use: in Java, you use new all the time for everything, and you have to, since it's the one and only way to create (class-type) objects. By contrast, in C++ you should almost never have a naked new in user code. C++ has unconstrained variables, so variables themselves can be objects, and that is how objects are usually used in C++.
In your "statement", I don't think "returns the reference of the first block of the allocated memory" is quite right. new returns a pointer (to the type of the object allocated). This is subtly different from a reference, although conceptually similar.
Answers to your questions:
In C++, an object stays around in memory (see note) until it is explicitly deleted with delete or delete[], and you must use the form matching the allocation: a new int[1] occupies the same amount of memory as a new int, but it cannot be freed with delete, and conversely delete[] cannot be used for a new int. In Java, the memory gets freed by the garbage collector at some point in the future, once there are no more references to it.
Both throw an exception (C++ throws std::bad_alloc, Java something like OutOfMemoryError), but in C++ you can use new(std::nothrow) ..., in which case new returns NULL if there isn't enough memory available to satisfy the call.
Note: It is, as per the comment, technically possible to "destroy" the object without freeing its memory. This is a rather unusual case, and not something you should do unless you are REALLY experienced with C++ and have a VERY good reason to do so. The typical use case for this is paired with a placement new (where new is called with an already existing memory address just to perform the construction of the object(s)). Again, placement new is a pretty special use of new, and not something you can expect to see much of in normal C++ code.
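On the Java side of the allocation-failure question, a small sketch (the class name is made up) of what happens when new cannot be satisfied:

```java
public class OomDemo {
    // Attempts a hopeless allocation and reports whether it failed.
    static boolean allocationFails() {
        try {
            // This would need roughly 16 GB, and Integer.MAX_VALUE also
            // exceeds HotSpot's array-length limit, so it throws.
            long[] huge = new long[Integer.MAX_VALUE];
            return huge.length < 0; // never reached
        } catch (OutOfMemoryError e) {
            // Unlike C++'s std::bad_alloc exception or nothrow-new's
            // null pointer, Java signals failure with an Error.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("allocation failed: " + allocationFails());
    }
}
```

Catching OutOfMemoryError as done here is for demonstration only; in real code it is rarely recoverable.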
I don't know about details in Java, but here is what new and new[] do in C++:
Allocate memory
When you have an expression new T or new T(args), the compiler determines which function to call for getting memory
If the type T has an appropriate member operator new that one is called
Otherwise, if the user provided an appropriate global operator new that one is called.
If operator new cannot allocate the requested memory, then it calls a new handler function, which you can set with set_new_handler. That function may free some space so the allocation can succeed, it may terminate the program, or it may throw an exception of type std::bad_alloc or derived from that. The default new handler just throws std::bad_alloc.
The same happens for new T[n] except that operator new[] is called for memory allocation.
Construct the object (or objects) in the newly allocated memory.
For new T(args) the corresponding constructor of the object is called. If the constructor throws an exception, the memory is deallocated by calling the corresponding operator delete (which can be found in the same places as operator new)
For new T it depends if T is POD (i.e. a built-in type or basically a C struct/union) or not. If T is POD, nothing happens, otherwise it is treated like new T().
For new T[n] it also depends on whether T is POD. Again, PODs are not initialized. For non-PODs the default constructor is in turn called for each of the objects in order. If one object's default constructor throws, no further constructors are called, but the already constructed objects (which doesn't include the one whose constructor just threw) are destructed (i.e. have the destructor called) in reverse order. Then the memory is deallocated with the appropriate operator delete[].
Returns a pointer to the newly created object(s). Note that for new[] the pointer will likely not point to the beginning of the allocated memory because there will likely be some information about the number of allocated objects preceding the constructed objects, which is used by delete[] to figure out how many objects to destruct.
In all cases, the objects live until they are destroyed with delete ptr (for objects allocated with normal new) or delete[] ptr (for objects created with array new T[n]). Unless added with a third-party library, there's no garbage collection in C++.
Note that you also can call operator new and operator delete directly to allocate raw memory. The same is true for operator new[] and operator delete[]. However note that even for those low-level functions you may not mix the calls, e.g. by deallocating memory with operator delete that you allocated with operator new[].
You can also construct an object in allocated memory (no matter how you obtained it) with the so-called placement new. This is done by giving the pointer to the raw memory as an argument to new, like this: new(pMem) T(args). To destruct such an explicitly constructed object, you can call the object's destructor directly: p->~T().
Placement new works by calling an operator new which takes the pointer as additional argument and just returns it. This same mechanism can also be used to provide other information to operator new overloads which take corresponding additional arguments. However while you can define corresponding operator delete, those are only used for cleaning up when an object throws an exception during construction. There's no "placement delete" syntax.
One other use of the placement new syntax which is already provided by C++ is nothrow new. That one takes an additional parameter std::nothrow and differs from normal new only in that it returns a null pointer if allocation fails.
Also note that new is not the only memory management mechanism in C++. On the one hand, there are the C functions malloc and free. While usually operator new and operator new[] just call malloc, this is not guaranteed. Therefore you may not mix those forms (e.g. by calling free on a pointer pointing to memory allocated with operator new). On the other hand, STL containers handle their allocations through allocators, which are objects which manage the allocation/deallocation of objects as well as construction/destruction of objects in containers.
And finally, there are those objects whose lifetime is controlled directly by the language, namely those of static and automatic lifetime. Automatic-lifetime objects are allocated by simply defining a variable of the type at local scope. They are automatically created when execution passes that line, and automatically destroyed when execution leaves the scope (including if the scope is left through an exception). Static-lifetime objects are defined at global/namespace scope, or at local scope using the keyword static. They are created at program startup (global/namespace scope) or when their definition line is first executed (local scope), and they live until the end of the program, when they are automatically destroyed in reverse order of construction.
Generally, automatic or static variables are to be preferred to dynamic allocation (i.e. everything you allocate with new or allocators), because there the compiler takes care of proper destruction, unlike dynamic allocation where you have to do that on your own. If you do have dynamically allocated objects, it's desirable to have their lifetime managed by automatic/static objects (containers, smart pointers) for the same reason.
You seem to have the operation of new correct in that it allocates and initializes memory.
Once the new completes successfully, you, the programmer, are responsible for deleteing that memory. The best way to make sure that this happens is to never use new directly yourself, instead preferring standard containers and algorithms, and stack-based objects. But if you do need to allocate memory, the C++ idiom is to use a smart pointer like unique_ptr from C++11 or shared_ptr from boost or C++11. That makes sure that the memory is reclaimed properly.
If an allocation fails, the new call will throw an exception after cleaning up any portion of the object that has been constructed prior to the failure. You can use the (nothrow) version of new to return a null pointer instead of throwing an exception, but that places even more burden of cleanup onto the client code.
The new keyword
The new operator is somewhat similar in the two languages. The main difference is that every object and array must be allocated via new in Java. (And indeed arrays are actually objects in Java.) So whilst the following is legal in C/C++ and would allocate the array from the stack...
// C/C++ : allocate array from the stack
void myFunction() {
int x[2];
x[0] = 1;
...
}
...in Java, we would have to write the following:
// Java : have to use 'new'; JVM allocates
// memory where it chooses.
void myFunction() {
int[] x = new int[2];
...
}
Reference: https://www.javamex.com/java_equivalents/new.shtml
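To illustrate the point that Java arrays are heap-allocated objects (a small sketch, not taken from the linked page):

```java
public class ArrayDemo {
    public static void main(String[] args) {
        int[] x = new int[2]; // allocated on the heap, like any Java object
        x[0] = 1;

        // Arrays carry a length field and a runtime class, and they
        // subtype Object like every other object.
        System.out.println(x.length);                     // 2
        System.out.println(x instanceof Object);          // true
        System.out.println(x.getClass().getSimpleName()); // int[]
    }
}
```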

Does assigning objects to null in Java impact garbage collection?

Does assigning an unused object reference to null in Java improve the garbage collection process in any measurable way?
My experience with Java (and C#) has taught me that it is often counterintuitive to try to outsmart the virtual machine or JIT compiler, but I've seen co-workers use this method, and I am curious whether this is a good practice to pick up or one of those voodoo programming superstitions.
Typically, no.
But like all things: it depends. The GC in Java these days is VERY good, and everything should be cleaned up very shortly after it is no longer reachable. For local variables, that is just after leaving the method; for fields, when the class instance is no longer referenced.
You only need to explicitly null if you know it would remain referenced otherwise. For example an array which is kept around. You may want to null the individual elements of the array when they are no longer needed.
For example, this code from ArrayList:
public E remove(int index) {
    RangeCheck(index);

    modCount++;
    E oldValue = (E) elementData[index];

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // Let gc do its work

    return oldValue;
}
Also, explicitly nulling an object will not cause an object to be collected any sooner than if it just went out of scope naturally as long as no references remain.
Both:
void foo() {
    Object o = new Object();
    // do stuff with o
}
and:
void foo() {
    Object o = new Object();
    // do stuff with o
    o = null;
}
are functionally equivalent.
In my experience, more often than not, people null out references out of paranoia not out of necessity. Here is a quick guideline:
If object A references object B and you no longer need this reference and object A is not eligible for garbage collection then you should explicitly null out the field. There is no need to null out a field if the enclosing object is getting garbage collected anyway. Nulling out fields in a dispose() method is almost always useless.
There is no need to null out object references created in a method. They will get cleared automatically once the method terminates. The exception to this rule is if you're running in a very long method or some massive loop and you need to ensure that some references get cleared before the end of the method. Again, these cases are extremely rare.
I would say that the vast majority of the time you will not need to null out references. Trying to outsmart the garbage collector is useless. You will just end up with inefficient, unreadable code.
A good article on this is today's Coding Horror post.
The way GCs work is by looking for objects that do not have any pointers to them; the areas they search are the heap/stack and any other root spaces they have. So if you set a variable to null, the actual object is no longer pointed to by anyone, and hence could be GC'd.
But since the GC might not run at that exact instant, you might not actually be buying yourself anything. But if your method is fairly long (in terms of execution time) it might be worth it since you will be increasing your chances of GC collecting that object.
The problem can also be complicated with code optimizations, if you never use the variable after you set it to null, it would be a safe optimization to remove the line that sets the value to null (one less instruction to execute). So you might not actually be getting any improvement.
So in summary, yes it can help, but it will not be deterministic.
At least in java, it's not voodoo programming at all. When you create an object in java using something like
Foo bar = new Foo();
you do two things: first, you create a reference to an object, and second, you create the Foo object itself. So long as that reference, or another one, exists, the specific object can't be GC'd. However, when you assign null to that reference...
bar = null;
...and assuming nothing else has a reference to the object, it's freed and available for GC the next time the garbage collector passes by.
It depends.
Generally speaking, the shorter you keep references to your objects, the faster they'll get collected.
If your method takes say 2 seconds to execute and you don't need an object anymore after one second of method execution, it makes sense to clear any references to it. If GC sees that after one second, your object is still referenced, next time it might check it in a minute or so.
Anyway, setting all references to null by default is, to me, premature optimization, and nobody should do it except in the specific rare cases where it measurably decreases memory consumption.
Explicitly setting a reference to null, instead of just letting the variable go out of scope, does not help the garbage collector, unless the object held is very large, in which case setting it to null as soon as you are done with it is a good idea.
Generally, setting references to null signals to the READER of the code that this object is completely done with and should not be a concern any more.
A similar effect can be achieved by introducing a narrower scope by putting in an extra set of braces
{
    int l;
    { // <- here
        String bigThing = ....;
        l = bigThing.length();
    } // <- and here
}
this allows the bigThing to be garbage collected right after leaving the nested braces.
public class JavaMemory {
    private final int dataSize = (int) (Runtime.getRuntime().maxMemory() * 0.6);

    public void f() {
        {
            byte[] data = new byte[dataSize];
            //data = null;
        }
        byte[] data2 = new byte[dataSize];
    }

    public static void main(String[] args) {
        JavaMemory jmp = new JavaMemory();
        jmp.f();
    }
}
The program above throws an OutOfMemoryError. If you uncomment data = null;, the OutOfMemoryError goes away, because the first buffer becomes unreachable before the second allocation. It is always good practice to set an unused variable to null.
I was working on a video conferencing application one time and noticed a huge difference in performance when I took the time to null references as soon as I didn't need the object anymore. This was in 2003-2004, and I can only imagine the GC has gotten even smarter since. In my case I had hundreds of objects coming and going out of scope every second, so I noticed the GC when it kicked in periodically. However, after I made it a point to null objects, the GC stopped pausing my application.
So it depends on what you're doing...
Yes.
From "The Pragmatic Programmer" p.292:
By setting a reference to NULL you reduce the number of pointers to the object by one ... (which will allow the garbage collector to remove it)
I assume the OP is referring to things like this:
private void Blah()
{
    MyObj a;
    MyObj b;
    try {
        a = new MyObj();
        b = new MyObj();
        // do real work
    } finally {
        a = null;
        b = null;
    }
}
In this case, wouldn't the VM mark them for GC as soon as they leave scope anyway?
Or, from another perspective, would explicitly setting the items to null cause them to be GC'd before they would be if they just went out of scope? If so, the VM may spend time GC'ing the objects when the memory isn't needed anyway, which would actually cause worse CPU usage, because it would be GC'ing more, and earlier.
Even if nullifying the references were marginally more efficient, would it be worth the ugliness of having to pepper your code with these nullifications? They would only be clutter and would obscure the intent of the code that contains them.
It's a rare codebase that has no better candidate for optimisation than trying to outsmart the garbage collector (rarer still are developers who succeed in outsmarting it). Your efforts will most likely be better spent elsewhere, ditching that crufty XML parser or finding some opportunity to cache computation. These optimisations will be easier to quantify and don't require you to dirty up your codebase with noise.
Oracle doc point out "Assign null to Variables That Are No Longer Needed" https://docs.oracle.com/cd/E19159-01/819-3681/abebi/index.html
"It depends"
I do not know about Java, but in .NET (C#, VB.NET...) it is usually not required to assign null when you no longer require an object.
However note that it is "usually not required".
By analyzing your code, the .NET compiler makes a good evaluation of the lifetime of the variable, to accurately tell when the object is no longer being used. If you write obj = null, it might actually look as if obj is still being used; in this case it is counterproductive to assign null.
There are a few cases where it might actually help to assign null. One example is when you have a huge method that runs for a long time, a method that is running in a different thread, or some loop. In such cases it might help to assign null so that it is easy for the GC to know it's no longer being used.
There is no hard and fast rule for this. Going by the above, place null assignments in your code and run a profiler to see whether it helps in any way. Most probably you will not see a benefit.
If it is .net code you are trying to optimize, then my experience has been that taking good care with Dispose and Finalize methods is actually more beneficial than bothering about nulls.
Some references on the topic:
http://blogs.msdn.com/csharpfaq/archive/2004/03/26/97229.aspx
http://weblogs.asp.net/pwilson/archive/2004/02/20/77422.aspx
In the future execution of your program, the values of some data members will be used to compute an output visible externally to the program. Others might or might not be used, depending on future (and impossible to predict) inputs to the program. Still other data members might be guaranteed not to be used. All resources, including memory, allocated to that unused data are wasted. The job of the garbage collector (GC) is to eliminate that wasted memory. It would be disastrous for the GC to eliminate something that was needed, so the algorithm used might be conservative, retaining more than the strict minimum. It might use heuristic optimizations to improve its speed, at the cost of retaining some items that are not actually needed. There are many potential algorithms the GC might use. Therefore it is possible that changes you make to your program, which do not affect its correctness, might nevertheless affect the operation of the GC, either making it run faster to do the same job, or making it identify unused items sooner. So this kind of change, setting an unused object reference to null, is in theory not always voodoo.
Is it voodoo? There are reportedly parts of the Java library code that do this. The writers of that code are much better than average programmers and either know, or cooperate with, programmers who know details of the garbage collector implementations. So that suggests there is sometimes a benefit.
As you said, there are optimizations, i.e. the JVM knows the place where a variable was last used, and the object referenced by it can be GC'd right after that last point (still executing in the current scope). So nulling out references in most cases does not help the GC.
But it can be useful to avoid the "nepotism" (or "floating garbage") problem (read more here or watch the video). The problem exists because the heap is split into the Old and Young generations, and different GC mechanisms are applied: Minor GC (which is fast and happens often, to clean the Young gen) and Major GC (which causes a longer pause, to clean the Old gen). "Nepotism" prevents garbage in the Young gen from being collected when it is referenced by garbage that was already tenured to the Old gen.
This is 'pathological' because ANY promoted node will result in the promotion of ALL following nodes until a GC resolves the issue.
To avoid nepotism, it's a good idea to null out references from an object which is about to be removed. You can see this technique applied in the JDK classes LinkedList and LinkedHashMap:
private E unlinkFirst(Node<E> f) {
    final E element = f.item;
    final Node<E> next = f.next;
    f.item = null;
    f.next = null; // help GC
    // ...
}
