What's the Java equivalent of .net's GC.KeepAlive?


.NET has a function called GC.KeepAlive(Object). Its sole purpose is to ensure the lifetime of the referenced object lasts until code flow reaches the call.
This is normally not necessary unless one is interoperating with native code.
I have a situation where I've got a graph of C++ objects accessed via JNI, where certain root objects need to be kept alive to keep the children alive. Both the root objects and the child objects have mirrors in JVM land. If the root object is collected and freed on the C++ side (by way of a SWIG-generated finalizer), however, the child objects will become invalid, since their C++ backing object will have been freed.
This can be solved by ensuring the local variables that root the object graph have a lifetime that exceeds the last use of a child object. So I need an idiomatic function that does nothing to an object, yet won't be optimized away or moved (e.g. hoisted out of a loop). That's what GC.KeepAlive(Object) does in .NET.
What is the approximate equivalent in Java?
PS: some possibly illustrative code:
class Parent {
long ptr;
void finalize() { free(ptr); }
Child getChild() { return new Child(expensive_operation(ptr)); }
}
class Child {
long ptr;
void doStuff() { do_stuff(ptr); }
}
// BAD CODE with potential for SIGSEGV
for (Parent p : getParents()) {
p.getChild().doStuff();
}
The trouble is that the GC freeing Parent p will free the memory allocated for Child while doStuff is executing. GC has been observed to do this in practice. A potential fix if GC.KeepAlive was available:
// Potential fix (if GC.KeepAlive existed in Java)
for (Parent p : getParents()) {
p.getChild().doStuff();
GC.KeepAlive(p);
}
I could e.g. call toString on p, but I won't be doing anything with its output. I could poke p into an array temporarily, but how do I know the JVM won't discard the store? Etc.

I guess you could use JMH Blackhole for this. It was designed for ensuring that the reference doesn't get eliminated in benchmarks so it should work.
Basically it just compares the given object reference against a stored volatile reference and reassigns the latter with some small and decreasing probability (storing is expensive so it gets minimized).

Whether or not the garbage collector is aggressive enough to reclaim the object while a native method is being invoked (few people in the Java world seem to care, so either the problem doesn't exist or there's a lot of buggy code around), this other SO answer seems to provide a reasonable alternative to GC.KeepAlive(Object): use non-static native JNI methods, which should reasonably prevent garbage collection of the instance on which these methods are invoked.
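For readers on Java 9 or later, the closest standard-library counterpart to GC.KeepAlive that I am aware of is java.lang.ref.Reference.reachabilityFence(Object), which keeps its argument strongly reachable at least until the call. Below is a minimal sketch of the loop from the question using it. NativeParent, NativeChild and KeepAliveSketch are hypothetical pure-Java stand-ins invented for illustration (there is no real native memory here, and in this toy the list itself also keeps the parents alive); the try/finally shape is the usage pattern suggested in the method's Javadoc.

```java
import java.lang.ref.Reference;
import java.util.List;

// Hypothetical stand-in for the SWIG-generated Parent (no real native memory here).
class NativeParent {
    final long ptr;

    NativeParent(long ptr) {
        this.ptr = ptr;
    }

    NativeChild getChild() {
        return new NativeChild(ptr + 1); // pretend this derives a child pointer
    }
}

// Hypothetical stand-in for Child; in the real code 'ptr' is only valid while the parent lives.
class NativeChild {
    final long ptr;

    NativeChild(long ptr) {
        this.ptr = ptr;
    }

    long doStuff() {
        return ptr; // placeholder for the native call
    }
}

class KeepAliveSketch {
    // Processes every parent and returns how many were handled.
    static int processAll(List<NativeParent> parents) {
        int processed = 0;
        for (NativeParent p : parents) {
            try {
                p.getChild().doStuff();
                processed++;
            } finally {
                // p stays strongly reachable at least until this point (Java 9+).
                Reference.reachabilityFence(p);
            }
        }
        return processed;
    }
}
```

On older JVMs this method does not exist, so the approaches in the answers above remain the fallback.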

Related

Avoiding objects garbage collection

TLDR: How can I force the JVM not to garbage collect my objects, even if I don't want to use them in any meaningful way?
Longer story:
I have some Items which are loaded from a permanent storage and held as weak references in a cache. The weak reference usage means that, unless someone is actually using a specific Item, there are no strong references to them and unused ones are eventually garbage collected. This is all desired behaviour and works just fine. Additionally, sometimes it is necessary to propagate changes of an Item into the permanent storage. This is done asynchronously in a dedicated writer thread. And here comes the problem, because I obviously cannot allow the Item to be garbage collected before the update is finished. The solution I currently have is to include a strong reference to the Item inside the update object (the Item is never actually used during the update process, just held).
public class Item {
public final String name;
public String value;
}
public class PendingUpdate {
public final Item strongRef; // not actually necessary, just to avoid GC
public final String name;
public final String newValue;
}
But after some thinking and digging I found this paragraph in JavaSE specs (12.6.1):
Optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable. For example, a Java compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner.
Which, if I understand it correctly, means that java can just decide that the Item is garbage anyway. One solution would be to do some unnecessary operation on the Item like item.hashCode(); at the end of the storage update code. But I expect that a JVM might be smart enough to remove such unnecessary code anyway and I cannot think of any reasonable solution that a sufficiently smart JVM wouldn't be able to release sooner than needed.
public void performStorageUpdate(PendingUpdate update) {
final Transaction transaction = this.getDataManager().beginTransaction();
try {
// ... some permanent storage update code
} catch (final Throwable t) {
transaction.abort();
}
transaction.commit();
// The Item should never be garbage collected before this point
update.strongRef.hashCode(); // Trying to avoid GC of the item; probably not enough
}
Has anyone encountered a similar problem with weak references? Are there some language guarantees that I can use to avoid GC for such objects? (Ideally causing as small a performance hit as possible.) Or am I overthinking it and the specification paragraph means something different?
Edit: Why I cannot allow the Item to be garbage collected before the storage update finishes:
Problematic event sequence:
Item is loaded into cache and is used (held as a strong reference)
An update to the item is enqueued
Strong reference to the Item is dropped and there are no other strong references to the item (besides the one in the PendingUpdate, but as I explained, I think that that one can be optimized away by JVM).
Item is garbage collected
Item is requested again and is loaded from the permanent storage and a new strong reference to it is created
Update to the storage is performed
Result state: There are inconsistent data inside the cache and the permanent storage. Therefore, I need to hold the strong reference to the Item until the storage update finishes, but I just need to hold it; I don't actually need to do anything with it (so the JVM is probably free to think that it is safe to get rid of).

TL;DR How can I force the JVM not to garbage collect my objects, even if I don't want to use them in any meaningful way?
Make them strongly reachable; e.g. by adding them to a strongly reachable data structure. If objects are strongly reachable then the garbage collector won't break weak references to them.
When you have finished the processing where the objects need to remain in the cache, you can clear the data structure to break the above strong references. The next GC run will then be able to break the weak references.
Which, if I understand it correctly, means that java can just decide that the Item is garbage anyway.
That's not what it means.
What it really means is that the infrastructure may be able to determine that an object is effectively unreachable, even though there is still a reference to it in a variable. For example:
public void example() {
int[] big = new int[1000000];
// long computation that doesn't use 'big'
}
If the compiler / runtime can determine that the object that big refers to cannot be used [1] during the long computation, it is permitted to garbage collect it ... during the long computation.
But here's the thing. It can only do this if the object cannot be used. And if it cannot be used, there is no reason not to garbage collect it.
[1] ... without traversing a reference object.
For what it is worth, the definition of strongly reachable isn't just that there is a reference in a local variable. The definition (in the javadocs) is:
"An object is strongly reachable if it can be reached by some thread without traversing any reference objects. A newly-created object is strongly reachable by the thread that created it."
It doesn't specify how the object can be reached by the thread. Or how the runtime could / might deduce that no thread can reach it.
But the implication is clear that if threads can only access the object via a reference object, then it is not strongly reachable.
Ergo ... make the object strongly reachable.
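The advice above can be sketched as follows. This is only an illustration under assumptions: CachedItem is a cut-down, hypothetical version of the question's Item, and PendingUpdates with its pin/unpin/isPinned methods is a name invented here. The idea is that the writer thread pins an Item when an update is enqueued and unpins it after the commit; while pinned, the Item is strongly reachable through a static field, so the cache's weak reference to it cannot be cleared.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical Item, reduced to the fields from the question.
class CachedItem {
    final String name;
    volatile String value;

    CachedItem(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

class PendingUpdates {
    // Reachable from a static field, so every member is strongly reachable too.
    // Membership relies on CachedItem's default identity-based equals/hashCode.
    private static final Set<CachedItem> PINNED = ConcurrentHashMap.newKeySet();

    // Call when an update for the item is enqueued.
    static void pin(CachedItem item) {
        PINNED.add(item);
    }

    // Call after the storage update has been committed.
    static void unpin(CachedItem item) {
        PINNED.remove(item);
    }

    static boolean isPinned(CachedItem item) {
        return PINNED.contains(item);
    }
}
```

A set is used rather than a field on the update object purely for illustration; a field on an update object that sits in a strongly reachable queue gives the same guarantee. If several updates for the same Item can be pending at once, a counting structure would be needed instead of a plain set.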

java - Memory management - Destroying a list when we don't need it [duplicate]

Does assigning an unused object reference to null in Java improve the garbage collection process in any measurable way?
My experience with Java (and C#) has taught me that it is often counterintuitive to try and outsmart the virtual machine or JIT compiler, but I've seen co-workers use this method and I am curious if this is a good practice to pick up or one of those voodoo programming superstitions?

Typically, no.
But like all things: it depends. The GC in Java these days is VERY good and everything should be cleaned up very shortly after it is no longer reachable. This is just after leaving a method for local variables, and when a class instance is no longer referenced for fields.
You only need to explicitly null if you know it would remain referenced otherwise. For example an array which is kept around. You may want to null the individual elements of the array when they are no longer needed.
For example, this code from ArrayList:
public E remove(int index) {
RangeCheck(index);
modCount++;
E oldValue = (E) elementData[index];
int numMoved = size - index - 1;
if (numMoved > 0)
System.arraycopy(elementData, index+1, elementData, index,
numMoved);
elementData[--size] = null; // Let gc do its work
return oldValue;
}
Also, explicitly nulling an object will not cause an object to be collected any sooner than if it just went out of scope naturally as long as no references remain.
Both:
void foo() {
Object o = new Object();
/// do stuff with o
}
and:
void foo() {
Object o = new Object();
/// do stuff with o
o = null;
}
Are functionally equivalent.

In my experience, more often than not, people null out references out of paranoia not out of necessity. Here is a quick guideline:
If object A references object B and you no longer need this reference and object A is not eligible for garbage collection then you should explicitly null out the field. There is no need to null out a field if the enclosing object is getting garbage collected anyway. Nulling out fields in a dispose() method is almost always useless.
There is no need to null out object references created in a method. They will get cleared automatically once the method terminates. The exception to this rule is if you're running in a very long method or some massive loop and you need to ensure that some references get cleared before the end of the method. Again, these cases are extremely rare.
I would say that the vast majority of the time you will not need to null out references. Trying to outsmart the garbage collector is useless. You will just end up with inefficient, unreadable code.

Good article is today's coding horror.
The way GCs work is by looking for objects that do not have any pointers to them; the area of their search is the heap/stack and any other spaces they have. So if you set a variable to null, the actual object is no longer pointed to by anything, and hence could be GC'd.
But since the GC might not run at that exact instant, you might not actually be buying yourself anything. But if your method is fairly long (in terms of execution time) it might be worth it since you will be increasing your chances of GC collecting that object.
The problem can also be complicated with code optimizations, if you never use the variable after you set it to null, it would be a safe optimization to remove the line that sets the value to null (one less instruction to execute). So you might not actually be getting any improvement.
So in summary, yes it can help, but it will not be deterministic.

At least in java, it's not voodoo programming at all. When you create an object in java using something like
Foo bar = new Foo();
you do two things: first, you create a reference to an object, and second, you create the Foo object itself. So long as that reference or another exists, the specific object can't be gc'd. However, when you assign null to that reference...
bar = null ;
and assuming nothing else has a reference to the object, it's freed and available for gc the next time the garbage collector passes by.

It depends.
Generally speaking, the shorter you keep references to your objects, the faster they'll get collected.
If your method takes say 2 seconds to execute and you don't need an object anymore after one second of method execution, it makes sense to clear any references to it. If GC sees that after one second, your object is still referenced, next time it might check it in a minute or so.
Anyway, setting all references to null by default is, to me, premature optimization, and nobody should do it except in specific rare cases where it measurably decreases memory consumption.

Explicitly setting a reference to null instead of just letting the variable go out of scope, does not help the garbage collector, unless the object held is very large, where setting it to null as soon as you are done with is a good idea.
Generally, setting references to null signals to the READER of the code that this object is completely done with and need not be of concern any more.
A similar effect can be achieved by introducing a narrower scope by putting in an extra set of braces
{
int l;
{ // <- here
String bigThing = ....;
l = bigThing.length();
} // <- and here
}
this allows the bigThing to be garbage collected right after leaving the nested braces.

public class JavaMemory {
private final int dataSize = (int) (Runtime.getRuntime().maxMemory() * 0.6);
public void f() {
{
byte[] data = new byte[dataSize];
//data = null;
}
byte[] data2 = new byte[dataSize];
}
public static void main(String[] args) {
JavaMemory jmp = new JavaMemory();
jmp.f();
}
}
The above program throws OutOfMemoryError. If you uncomment data = null;, the OutOfMemoryError is solved. It is always good practice to set the unused variable to null.

I was working on a video conferencing application one time and noticed a huge huge huge difference in performance when I took the time to null references as soon as I didn't need the object anymore. This was in 2003-2004 and I can only imagine the GC has gotten even smarter since. In my case I had hundreds of objects coming and going out of scope every second, so I noticed the GC when it kicked in periodically. However after I made it a point to null objects the GC stopped pausing my application.
So it depends on what you're doing...

Yes.
From "The Pragmatic Programmer" p.292:
By setting a reference to NULL you reduce the number of pointers to the object by one ... (which will allow the garbage collector to remove it)

I assume the OP is referring to things like this:
private void Blah()
{
MyObj a;
MyObj b;
try {
a = new MyObj();
b = new MyObj();
// do real work
} finally {
a = null;
b = null;
}
}
In this case, wouldn't the VM mark them for GC as soon as they leave scope anyway?
Or, from another perspective, would explicitly setting the items to null cause them to get GC'd before they would if they just went out of scope? If so, the VM may spend time GC'ing the object when the memory isn't needed anyway, which would actually cause worse performance CPU usage wise because it would be GC'ing earlier.

Even if nullifying the reference were marginally more efficient, would it be worth the ugliness of having to pepper your code with these ugly nullifications? They would only be clutter and obscure the intent of the code that contains them.
It's a rare codebase that has no better candidate for optimisation than trying to outsmart the garbage collector (rarer still are developers who succeed in outsmarting it). Your efforts will most likely be better spent elsewhere instead, ditching that crufty XML parser or finding some opportunity to cache computation. These optimisations will be easier to quantify and don't require that you dirty up your codebase with noise.

The Oracle docs point out "Assign null to Variables That Are No Longer Needed": https://docs.oracle.com/cd/E19159-01/819-3681/abebi/index.html

"It depends"
I do not know about Java but in .net (C#, VB.net...) it is usually not required to assign a null when you no longer require an object.
However note that it is "usually not required".
By analyzing your code, the .net compiler makes a good evaluation of the lifetime of the variable, to accurately tell when the object is not being used anymore. So if you write obj=null it might actually look as if obj is still being used; in this case it is counterproductive to assign a null.
There are a few cases where it might actually help to assign a null. One example is you have a huge code that runs for long time or a method that is running in a different thread, or some loop. In such cases it might help to assign null so that it is easy for the GC to know its not being used anymore.
There is no hard & fast rule for this. Going by the above place null-assigns in your code and do run a profiler to see if it helps in any way. Most probably you might not see a benefit.
If it is .net code you are trying to optimize, then my experience has been that taking good care with Dispose and Finalize methods is actually more beneficial than bothering about nulls.
Some references on the topic:
http://blogs.msdn.com/csharpfaq/archive/2004/03/26/97229.aspx
http://weblogs.asp.net/pwilson/archive/2004/02/20/77422.aspx

In the future execution of your program, the values of some data members will be used to compute an output visible external to the program. Others might or might not be used, depending on future (and impossible to predict) inputs to the program. Other data members might be guaranteed not to be used. All resources, including memory, allocated to those unused data are wasted. The job of the garbage collector (GC) is to eliminate that wasted memory. It would be disastrous for the GC to eliminate something that was needed, so the algorithm used might be conservative, retaining more than the strict minimum. It might use heuristic optimizations to improve its speed, at the cost of retaining some items that are not actually needed. There are many potential algorithms the GC might use. Therefore it is possible that changes you make to your program, and which do not affect the correctness of your program, might nevertheless affect the operation of the GC, either making it run faster to do the same job, or to sooner identify unused items. So this kind of change, setting an unused object reference to null, in theory is not always voodoo.
Is it voodoo? There are reportedly parts of the Java library code that do this. The writers of that code are much better than average programmers and either know, or cooperate with, programmers who know details of the garbage collector implementations. So that suggests there is sometimes a benefit.

As you said there are optimizations, i.e. JVM knows the place when the variable was last used and the object referenced by it can be GCed right after this last point (still executing in current scope). So nulling out references in most cases does not help GC.
But it can be useful to avoid the "nepotism" (or "floating garbage") problem. The problem exists because the heap is split into Old and Young generations and different GC mechanisms are applied: Minor GC (which is fast and happens often to clean the Young gen) and Major GC (which causes a longer pause to clean the Old gen). "Nepotism" does not allow garbage in the Young gen to be collected if it is referenced by garbage which was already tenured to the Old gen.
This is 'pathological' because ANY promoted node will result in the promotion of ALL following nodes until a GC resolves the issue.
To avoid nepotism it's a good idea to null out references from an object which is supposed to be removed. You can see this technique applied in JDK classes: LinkedList and LinkedHashMap
private E unlinkFirst(Node<E> f) {
final E element = f.item;
final Node<E> next = f.next;
f.item = null;
f.next = null; // help GC
// ...
}

Garbage collection vs manual memory management

This is a very basic question. I will formulate it using C++ and Java, but it's really language-independent.
Consider a well-known problem in C++:
struct Obj
{
boost::shared_ptr<Obj> m_field;
};
{
boost::shared_ptr<Obj> obj1(new Obj);
boost::shared_ptr<Obj> obj2(new Obj);
obj1->m_field = obj2;
obj2->m_field = obj1;
}
This is a memory leak, and everybody knows it :). The solution is also well-known: one should use weak pointers to break the "refcount interlocking". It is also known that this problem cannot be resolved automatically in principle. It's solely the programmer's responsibility to resolve it.
But there's a positive thing: a programmer has full control over refcount values. I can pause my program in a debugger, examine the refcount for obj1 and obj2, and understand that there's a problem. I can also set a breakpoint in the destructor of an object and observe the moment of destruction (or find out that the object has not been destroyed).
My question is about Java, C#, ActionScript and other "Garbage Collection" languages. I might be missing something, but in my opinion they
Do not let me examine refcount of objects
Do not let me know when object is destroyed (okay, when object is exposed to GC)
I often hear that these languages just do not allow a programmer to leak memory and that's why they are great. As far as I understand, they just hide memory management problems and make it hard to solve them.
Finally, the questions themselves:
Java:
public class Obj
{
public Obj m_field;
}
{
Obj obj1 = new Obj();
Obj obj2 = new Obj();
obj1.m_field = obj2;
obj2.m_field = obj1;
}
Is it a memory leak?
If yes: how do I detect and fix it?
If no: why?

Managed memory systems are built on the assumption that you don't want to be tracing memory leak issues in the first place. Instead of making them easier to solve, you try to make sure they never happen.
Java does have a loose term for "Memory Leak" which means any growth in memory which could impact your application, but there is never a point at which the managed memory cannot clean up all the memory.
JVMs don't use reference counting for a number of reasons:
it cannot handle circular references, as you have observed.
it has significant memory and threading overhead to maintain accurately.
there are much better, simpler ways of handling such situations for managed memory.
While the JLS doesn't ban the use of reference counts, it is not used in any JVM AFAIK.
Instead, Java keeps track of a number of root contexts (e.g. each thread stack) and can trace which objects need to be kept and which can be discarded based on whether those objects are strongly reachable. It also provides the facility for weak references (which are retained as long as the objects are not cleaned up) and soft references (which are not generally cleaned up but can be at the garbage collector's discretion).

AFAIK, Java GC works by starting from a set of well-defined initial references and computing a transitive closure of objects which can be reached from these references. Anything not reachable is "leaked" and can be GC-ed.
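The transitive-closure idea described above can be illustrated with a toy "mark" pass over an explicit object graph. This is only a sketch of the principle, not how any real JVM collector is implemented; HeapNode and ReachabilityTracer are hypothetical names invented for the example. It shows why the obj1/obj2 cycle from the question is not a leak: nothing reachable from a root points into the cycle, so neither object gets marked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy heap object: a name plus its outgoing references.
class HeapNode {
    final String name;
    final List<HeapNode> refs = new ArrayList<>();

    HeapNode(String name) {
        this.name = name;
    }
}

class ReachabilityTracer {
    // Returns every node reachable from the roots; anything else would be garbage.
    static Set<HeapNode> mark(List<HeapNode> roots) {
        // Identity-based set: two distinct objects are never confused with each other.
        Set<HeapNode> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<HeapNode> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            HeapNode current = pending.pop();
            if (marked.add(current)) {
                // First visit: queue everything this node points to.
                pending.addAll(current.refs);
            }
        }
        return marked;
    }
}
```

Because visited nodes are remembered, cycles among reachable objects terminate as well; the trace never needs reference counts.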

Java has a unique memory management strategy. Everything (except a few specific things) is allocated on the heap, and isn't freed until the GC gets to work.
For example:
public class Obj {
public Object example;
public Obj m_field;
}
public static void main(String[] args) {
int lastPrime = 2;
while (true) {
Obj obj1 = new Obj();
Obj obj2 = new Obj();
obj1.example = new Object();
obj1.m_field = obj2;
obj2.m_field = obj1;
int prime = lastPrime++;
while (!isPrime(prime)) {
prime++;
}
lastPrime = prime;
System.out.println("Found a prime: " + prime);
}
}
C handles this situation by requiring you to manually free the memory of both 'obj', and C++ counts references to 'obj' and automatically destroys them when they go out of scope.
Java does not free this memory, at least not at first.
The Java runtime waits a while until it feels like there is too much memory being used. After that the Garbage collector kicks in.
Let's say the java garbage collector decides to clean up after the 10,000th iteration of the outer loop. By this time, 10,000 objects have been created (which would have already been freed in C/C++).
Although there are 10,000 iterations of the outer loop, only the newly created obj1 and obj2 could possibly be referenced by the code.
These are the GC 'roots', which java uses to find all objects which could possibly be referenced. The garbage collector then recursively iterates down the object tree, marking 'example' as active in addition to the garbage collector roots.
All those other objects are then destroyed by the garbage collector.
This does come with a performance penalty, but this process has been heavily optimized, and isn't significant for most applications.
Unlike in C++, you don't have to worry about reference cycles at all, since only objects reachable from the GC roots will live.
With java applications you do have to worry about memory (Think lists holding onto the objects from all iterations), but it isn't as significant as other languages.
As for debugging: Java's approach to debugging high memory usage is to use a special 'memory-analyzer' to find out what objects are still on the heap, not worrying about what is referencing what.

The critical difference is that in Java etc. you are not involved in the disposal problem at all. This may feel like a pretty scary position to be in, but it is surprisingly empowering. All the decisions you used to have to make as to who is responsible for disposing of a created object are gone.
It does actually make sense. The system knows much more about what is reachable and what is not than you. It can also make much more flexible and intelligent decisions about when to tear down structures etc.
Essentially - in this environment you can juggle objects in a much more complex way without worrying about dropping one. The only thing you now need to worry about is if you accidentally glue one to the ceiling.
As an ex C programmer having moved to Java I feel your pain.
Re - your final question - it is not a memory leak. When GC kicks in everything is discarded except what is reachable. In this case, assuming you have released obj1 and obj2 neither is reachable so they will both be discarded.

Garbage collection is not simple ref counting.
The circular reference example which you demonstrate will not occur in a garbage collected managed language because the garbage collector will want to trace allocation references all the way back to something on the stack. If there isn't a stack reference somewhere it's garbage. Ref counting systems like shared_ptr are not that smart and it's possible (like you demonstrate) to have two objects somewhere in the heap which keep each other from being deleted.

Garbage-collected languages don't let you inspect a refcount because they don't have one. Garbage collection is an entirely different thing from refcounted memory management. The real difference is in determinism.
{
std::fstream file( "example.txt" );
// do something with file
}
// ... later on
{
std::fstream file( "example.txt" );
// do something else with file
}
In C++ you have the guarantee that example.txt has been closed after the first block is closed, or if an exception is thrown. Comparing it with Java:
{
FileInputStream file = null; // declared outside try so that finally can see it
try
{
file = new FileInputStream( "example.txt" );
// do something with file
}
finally
{
if( file != null )
file.close();
}
}
// ..later on
{
FileInputStream file = null; // declared outside try so that finally can see it
try
{
file = new FileInputStream( "example.txt" );
// do something else with file
}
finally
{
if( file != null )
file.close();
}
}
As you see, you have traded memory management for the management of all other resources. That is the real difference: refcounted objects still keep deterministic destruction. In garbage-collected languages you must manually release resources and check for exceptions. One may argue that explicit memory management can be tedious and error prone, but in modern C++ it is mitigated by smart pointers and standard containers. You still have some responsibilities (circular references, for example), but think of how many catch/finally blocks you can avoid using deterministic destruction and how much typing a Java/C#/etc. programmer must do instead (as they have to manually close/release resources other than memory). And I know that there's the using syntax in C# (and something similar in the newest Java) but it covers only the block scope lifetime and not the more general problem of shared ownership.
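The "something similar in the newest Java" is presumably the try-with-resources statement introduced in Java 7, which closes any AutoCloseable when the block exits, normally or via an exception. A minimal sketch, using a hypothetical TrackedResource class (invented here so the example needs no file on disk) in place of FileInputStream:

```java
// Hypothetical resource that records whether close() has run.
class TrackedResource implements AutoCloseable {
    private boolean closed = false;

    boolean isClosed() {
        return closed;
    }

    String read() {
        if (closed) {
            throw new IllegalStateException("already closed");
        }
        return "data";
    }

    @Override
    public void close() {
        closed = true;
    }
}

class ResourceDemo {
    // Uses the resource inside try-with-resources and returns it for inspection.
    static TrackedResource useAndReturn() {
        TrackedResource handle = new TrackedResource();
        try (TrackedResource r = handle) {
            r.read(); // do something with the resource
        } // r.close() is called here, even if read() had thrown
        return handle;
    }
}
```

As the answer notes, this only covers lifetimes tied to a block; it does not address resources with shared ownership.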

How smart is the Java JVM about GC'ing during a reduce() operation on a long Scala list or Stream?

OK, let me see if I can explain.
I have some code that wraps a Java iterator (from Hadoop, as it happens) in a Scala Stream, so that it potentially can be read more than once, by client code that I have no direct control over. The last thing that gets done with this Stream is a reduce() operation. Stream remembers all the items that it's already seen. Unfortunately, in some circumstances the iterator will be extremely large, so that storing all the items in it will lead to out-of-memory errors. However, in general, the situations where the client code needs the multiple-iteration facility are not the same ones with the memory-busting Iterators, and if such cases do exist, that's not my problem.
What I want to ensure is that I can provide the memoizing capability for code that needs it, but not for code that doesn't need it (in particular, for code that never looks at the Stream at all).
The code for reduce() in Stream says that it's written in a way to allow for GC of the already-visited parts of the Stream to happen while reducing. So if I can make sure this actually happens, I'll be fine. But in practice how can I make sure that this happens? In particular, if function A creates and passes the stream to function B, and function B passes the stream to function C, and function C then calls reduce(), then what about the references to the stream still in functions A, B and C? In all these cases, there will be no further use of the stream in any of the three functions, although the calls aren't necessarily tail-recursive. Is the JVM smart enough to ensure that its reference count is 0 from functions A, B and C at the time that reduce() is called, so that the GC can happen? Essentially this means that the JVM notices in function A that the last thing it does with the item is call function B, so it eliminates its own handle at the same time it calls B, and likewise for B to C, and C to reduce().
If this works properly, does it also work if A, B or C has a local variable holding onto the item? (Which, again, won't be used, afterwards.) That's because it's rather more tricky to code this properly without using local vars.

A variable which is in scope but which will never be read from is dead. A JVM is free to ignore dead variables for the purposes of garbage collection; an object which is only pointed to by dead variables is unreachable, and may be collected. The relevant bit of the JLS is, obscurely enough, §12.6.1 Implementing Finalization, which says:
A reachable object is any object that can be accessed in any potential continuing computation from any live thread.
And explains that:
Optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable. For example, a Java compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner.
Another example of this occurs if the values in an object's fields are stored in registers. The program may then access the registers instead of the object, and never access the object again. This would imply that the object is garbage. Note that this sort of optimization is only allowed if references are on the stack, not stored in the heap.
If your method A has only dead variables referring to the stream, then it won't obstruct its collection.
Note, however, that this applies only to local variables: if you have fields which refer to the stream (including closed-over local variables from a method enclosing a nested class), then it doesn't apply; I don't think the JVM is allowed to treat these as dead. In other words, here:
public Callable<String> foo(final Object o) {
return new Callable<String>() {
public String call() throws InterruptedException {
String s = o.toString();
Thread.sleep(1000000);
return s;
}
};
}
The object o cannot be collected until the anonymous Callable is collected, even though it is never used after the toString call, because there is a synthetic field referring to it in the Callable.
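One way to see this for yourself is to watch the captured object through a WeakReference while the Callable is still held. This is only a sketch: the class and method names are made up for illustration, and System.gc() is merely a hint. The check relies on the guarantee that a strongly reachable object is never cleared from a WeakReference.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.Callable;

class CapturedReferenceDemo {

    // Same shape as foo() above: the anonymous class keeps o in a synthetic field.
    static Callable<String> capture(final Object o) {
        return new Callable<String>() {
            @Override
            public String call() {
                return o.toString();
            }
        };
    }

    // Returns true if the captured object is still reachable after a GC request
    // made while the Callable is strongly held.
    static boolean survivesWhileCallableIsHeld() {
        Object payload = new Object();
        WeakReference<Object> probe = new WeakReference<>(payload);
        Callable<String> task = capture(payload);
        payload = null;   // drop the local; only the Callable's field still refers to it
        System.gc();      // a hint only; it cannot clear a strongly reachable referent
        boolean alive = probe.get() != null;
        try {
            task.call();  // use the Callable afterwards so it is not itself a dead variable
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return alive;
    }

    public static void main(String[] args) {
        System.out.println("captured object still reachable: " + survivesWhileCallableIsHeld());
    }
}
```

The local variable payload is nulled before the GC request, so the only thing keeping the object alive at that point is the field inside the Callable.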

Does assigning objects to null in Java impact garbage collection?

Does assigning an unused object reference to null in Java improve the garbage collection process in any measurable way?
My experience with Java (and C#) has taught me that it is often counterproductive to try to outsmart the virtual machine or JIT compiler, but I've seen co-workers use this method, and I am curious whether it is a good practice to pick up or one of those voodoo programming superstitions.

Typically, no.
But like all things: it depends. The GC in Java these days is VERY good, and objects become eligible for collection as soon as they are no longer reachable. For local variables that is at the latest when the method returns; for fields, it is when the enclosing instance is itself no longer referenced.
You only need to explicitly null if you know it would remain referenced otherwise. For example an array which is kept around. You may want to null the individual elements of the array when they are no longer needed.
For example, this code from ArrayList:
public E remove(int index) {
RangeCheck(index);
modCount++;
E oldValue = (E) elementData[index];
int numMoved = size - index - 1;
if (numMoved > 0)
System.arraycopy(elementData, index+1, elementData, index,
numMoved);
elementData[--size] = null; // Let gc do its work
return oldValue;
}
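The same pattern applies to any container you write yourself on top of an array. Here is a minimal sketch of a stack (SlotClearingStack and slotIsClear are hypothetical names, not JDK classes) whose pop() clears the vacated slot, so the backing array does not keep a popped element reachable:

```java
import java.util.Arrays;

class SlotClearingStack<E> {
    private Object[] elements = new Object[4];
    private int size = 0;

    void push(E e) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2); // grow the backing array
        }
        elements[size++] = e;
    }

    @SuppressWarnings("unchecked")
    E pop() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
        E top = (E) elements[--size];
        elements[size] = null; // clear the slot so the array no longer pins the element
        return top;
    }

    int size() {
        return size;
    }

    // For the demo only: true if the backing array holds no reference at index i.
    boolean slotIsClear(int i) {
        return elements[i] == null;
    }

    public static void main(String[] args) {
        SlotClearingStack<String> stack = new SlotClearingStack<>();
        stack.push("a");
        stack.push("b");
        System.out.println(stack.pop());          // prints b
        System.out.println(stack.slotIsClear(1)); // prints true
    }
}
```

Without the null assignment, the array would keep referring to "b" beyond the logical end of the stack until that slot was overwritten by a later push.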
Also, explicitly nulling a reference will not cause the object to be collected any sooner than if the variable simply went out of scope, as long as no other references remain.
Both:
void foo() {
Object o = new Object();
// do stuff with o
}
and:
void foo() {
Object o = new Object();
// do stuff with o
o = null;
}
Are functionally equivalent.

In my experience, more often than not, people null out references out of paranoia, not out of necessity. Here is a quick guideline:
If object A references object B and you no longer need this reference and object A is not eligible for garbage collection then you should explicitly null out the field. There is no need to null out a field if the enclosing object is getting garbage collected anyway. Nulling out fields in a dispose() method is almost always useless.
There is no need to null out object references created in a method. They will get cleared automatically once the method terminates. The exception to this rule is if you're running in a very long method or some massive loop and you need to ensure that some references get cleared before the end of the method. Again, these cases are extremely rare.
I would say that the vast majority of the time you will not need to null out references. Trying to outsmart the garbage collector is useless. You will just end up with inefficient, unreadable code.
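For the rare long-method case mentioned above, the shape would be something like the following sketch. The names are hypothetical, and whether the assignment changes anything depends on the JVM and on whether the method has been JIT-compiled, since a compiled method may already treat the variable as dead after its last use.

```java
import java.util.Arrays;

class EarlyReleaseDemo {

    // Builds a large temporary buffer, reduces it to a number, then keeps running
    // for a while without needing the buffer again.
    static long checksumThenWait(int size, long waitMillis) {
        byte[] buffer = new byte[size];
        Arrays.fill(buffer, (byte) 1);

        long sum = 0;
        for (byte b : buffer) {
            sum += b;
        }

        buffer = null; // last use was above; drop the reference before the long phase

        try {
            Thread.sleep(waitMillis); // stands in for the long-running remainder of the method
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(checksumThenWait(1_000_000, 100)); // prints 1000000
    }
}
```

If you find yourself writing this, splitting the method in two so that the buffer lives only in the first half is usually the cleaner fix.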

Good article is today's coding horror.
GCs work by looking for objects that nothing points to; they search the heap, the stacks, and any other root areas. So if you set a variable to null, the object it referred to is no longer pointed to by that variable and, if nothing else points to it, can be GC'd.
But since the GC might not run at that exact instant, you might not actually be buying yourself anything. If your method is fairly long (in terms of execution time), it might be worth it, since you increase the chances of the GC collecting that object.
The picture is further complicated by code optimizations: if you never use the variable after you set it to null, it is a safe optimization to remove the line that sets it to null (one less instruction to execute). So you might not actually be getting any improvement.
So in summary, yes it can help, but it will not be deterministic.

At least in Java, it's not voodoo programming at all. When you create an object in Java using something like
Foo bar = new Foo();
you do two things: first, you create a reference to an object, and second, you create the Foo object itself. So long as that reference or another one exists, the specific object can't be GC'd. However, when you assign null to that reference...
bar = null;
and assuming nothing else has a reference to the object, it becomes eligible for collection the next time the garbage collector passes by.

It depends.
Generally speaking, the shorter you keep references to your objects, the faster they'll get collected.
If your method takes, say, 2 seconds to execute and you don't need an object anymore after one second of method execution, it makes sense to clear any references to it. If the GC sees after one second that your object is still referenced, it might not check it again for a minute or so.
Anyway, setting all references to null by default is, to me, premature optimization, and nobody should do it except in specific rare cases where it measurably decreases memory consumption.

Explicitly setting a reference to null, instead of just letting the variable go out of scope, does not help the garbage collector unless the object held is very large, in which case setting it to null as soon as you are done with it is a good idea.
Generally, setting a reference to null signals to the READER of the code that this object is completely done with and need not be thought about any more.
A similar effect can be achieved by introducing a narrower scope with an extra set of braces:
{
int l;
{ // <- here
String bigThing = ....;
l = bigThing.length();
} // <- and here
}
this allows the bigThing to be garbage collected right after leaving the nested braces.

public class JavaMemory {
private final int dataSize = (int) (Runtime.getRuntime().maxMemory() * 0.6);
public void f() {
{
byte[] data = new byte[dataSize];
//data = null;
}
byte[] data2 = new byte[dataSize];
}
public static void main(String[] args) {
JavaMemory jmp = new JavaMemory();
jmp.f();
}
}
The above program throws an OutOfMemoryError. If you uncomment data = null;, the error goes away. It is always good practice to set an unused variable to null.

I was working on a video conferencing application one time and noticed a huge huge huge difference in performance when I took the time to null references as soon as I didn't need the object anymore. This was in 2003-2004 and I can only imagine the GC has gotten even smarter since. In my case I had hundreds of objects coming and going out of scope every second, so I noticed the GC when it kicked in periodically. However after I made it a point to null objects the GC stopped pausing my application.
So it depends on what you're doing...

Yes.
From "The Pragmatic Programmer" p.292:
By setting a reference to NULL you reduce the number of pointers to the object by one ... (which will allow the garbage collector to remove it)

I assume the OP is referring to things like this:
private void Blah()
{
MyObj a;
MyObj b;
try {
a = new MyObj();
b = new MyObj();
// do real work
} finally {
a = null;
b = null;
}
}
In this case, wouldn't the VM mark them for GC as soon as they leave scope anyway?
Or, from another perspective, would explicitly setting the items to null cause them to get GC'd sooner than if they just went out of scope? If so, the VM may spend time GC'ing the objects when the memory isn't needed anyway, which would actually cost more CPU time because the collection happens earlier than necessary.

Even if nullifying the reference were marginally more efficient, would it be worth the ugliness of having to pepper your code with these nullifications? They would only be clutter and obscure the intent of the code that contains them.
It's a rare codebase that has no better candidate for optimisation than trying to outsmart the garbage collector (rarer still are developers who succeed in outsmarting it). Your efforts will most likely be better spent elsewhere: ditching that crufty XML parser or finding some opportunity to cache computation. These optimisations will be easier to quantify and don't require you to dirty up your codebase with noise.

The Oracle documentation recommends it under "Assign null to Variables That Are No Longer Needed": https://docs.oracle.com/cd/E19159-01/819-3681/abebi/index.html

"It depends"
I do not know about Java, but in .NET (C#, VB.NET...) it is usually not required to assign null when you no longer need an object.
However, note that it is "usually not required".
By analyzing your code, the .NET compiler makes a good evaluation of the lifetime of the variable, so it can tell accurately when the object is no longer being used. So if you write obj = null, it might actually look as if obj is still being used; in that case assigning null is counterproductive.
There are a few cases where it might actually help to assign null. One example is a large piece of code that runs for a long time, a method that runs in a different thread, or some loop. In such cases assigning null might make it easier for the GC to know the object is not being used anymore.
There is no hard and fast rule for this. Going by the above, place null assignments in your code and run a profiler to see whether they help in any way. Most probably you will not see a benefit.
If it is .NET code you are trying to optimize, then my experience has been that taking good care with Dispose and Finalize methods is actually more beneficial than bothering about nulls.
Some references on the topic:
http://blogs.msdn.com/csharpfaq/archive/2004/03/26/97229.aspx
http://weblogs.asp.net/pwilson/archive/2004/02/20/77422.aspx

In the future execution of your program, the values of some data members will be used to compute an output visible outside the program. Others might or might not be used, depending on future (and impossible to predict) inputs to the program. Other data members might be guaranteed not to be used. All resources, including memory, allocated to that unused data are wasted. The job of the garbage collector (GC) is to eliminate that wasted memory. It would be disastrous for the GC to eliminate something that was needed, so the algorithm used might be conservative, retaining more than the strict minimum. It might use heuristic optimizations to improve its speed, at the cost of retaining some items that are not actually needed. There are many potential algorithms the GC might use. Therefore it is possible that changes you make to your program, which do not affect its correctness, might nevertheless affect the operation of the GC, either making it run faster to do the same job or letting it identify unused items sooner. So this kind of change, setting an unused object reference to null, is in theory not always voodoo.
Is it voodoo? There are reportedly parts of the Java library code that do this. The writers of that code are much better than average programmers and either know, or cooperate with, programmers who know details of the garbage collector implementations. So that suggests there is sometimes a benefit.

As you said, there are optimizations: the JVM knows where a variable was last used, and the object it references can be GC'd right after that point (while still executing in the current scope). So nulling out references in most cases does not help the GC.
But it can be useful for avoiding the "nepotism" (or "floating garbage") problem. The problem exists because the heap is split into Old and Young generations, and different GC mechanisms are applied to each: Minor GC (which is fast and runs often to clean the Young generation) and Major GC (which causes a longer pause to clean the Old generation). "Nepotism" prevents garbage in the Young generation from being collected if it is referenced by garbage that has already been tenured to the Old generation.
This is 'pathological' because ANY promoted node will result in the promotion of ALL following nodes until a GC resolves the issue.
To avoid nepotism, it's a good idea to null out the references held by an object that is about to be removed. You can see this technique applied in JDK classes such as LinkedList and LinkedHashMap:
private E unlinkFirst(Node<E> f) {
final E element = f.item;
final Node<E> next = f.next;
f.item = null;
f.next = null; // help GC
// ...
}
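If you maintain your own linked structure, the same technique looks roughly like this. It is a sketch modelled on the JDK snippet above, not the JDK code itself; UnlinkingQueue, Node and headNode are hypothetical names, and headNode exists only so the demo can inspect a removed node.

```java
class UnlinkingQueue<E> {

    // Package-private so the demo below can inspect a node after removal.
    static final class Node<E> {
        E item;
        Node<E> next;

        Node(E item) {
            this.item = item;
        }
    }

    private Node<E> head;
    private Node<E> tail;

    void add(E e) {
        Node<E> n = new Node<>(e);
        if (tail == null) {
            head = n;
        } else {
            tail.next = n;
        }
        tail = n;
    }

    E poll() {
        Node<E> f = head;
        if (f == null) {
            return null;
        }
        E element = f.item;
        Node<E> next = f.next;
        f.item = null;
        f.next = null; // a dead (possibly tenured) node must not keep younger nodes alive
        head = next;
        if (next == null) {
            tail = null;
        }
        return element;
    }

    // For the demo only.
    Node<E> headNode() {
        return head;
    }

    public static void main(String[] args) {
        UnlinkingQueue<String> q = new UnlinkingQueue<>();
        q.add("x");
        q.add("y");
        Node<String> first = q.headNode();
        System.out.println(q.poll());           // prints x
        System.out.println(first.next == null); // prints true
    }
}
```

Once poll() returns, the removed node refers to nothing, so even if that node has been promoted to the Old generation it cannot drag the rest of the chain along with it.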
