How does Java GC call finalize() method? - java

As far as I understand, GC starts with some set of initial objects (stack, static objects) and recursively traverses it building a graph of reachable objects. Then it marks the memory taken by these objects as occupied and assumes all the rest of the memory free.
But what if this 'free' memory contains an object with finalize method? GC has to call it, but I don't see how it can even know about objects that aren't reachable anymore.
I suppose GC can keep track of all 'finalizable' objects while they are alive. If so, does having finalizable objects make garbage collecting more expensive even when they are still alive?

Consider the Reference API.
It offers some references with special semantics to the GC, i.e Weak, Soft, and Phantom references. There’s simply another non-public type of special reference, for objects needing finalization.
Now, when the garbage collector traverses the object graph and encounters such a special reference object, it will not mark objects reachable through this reference as strongly reachable, but reachable with the special semantics. So if an object is only finalizer-reachable, the reference will be enqueued, so that one (or one of the) finalizer thread(s) can poll the queue and execute the finalize() method (it’s not the garbage collector itself calling this method).
In other words, the garbage collector never processes entirely unreachable objects here. To apply a special semantic to the reachability, the reference object must be reachable, so the referent can be reached through that reference. In case of finalizer-reachability, Finalizer.register is called when an object is created and it creates an instance of Finalizer in turn, a subclass of FinalReference, and right in its constructor, it calls an add() method which will insert the reference into a global linked list. So all these FinalReference instances are reachable through that list until an actual finalization happens.
Since this FinalReference will be created right on the instantiation of the object, if its class declares a non-trivial finalize() method, there is already some overhead due to having a finalization requirement, even if the object has not collected yet.
The other issue is that an object processed by a finalizer thread is reachable by that thread and might even escape, depending on what the finalize() method does. But the next time, this object becomes unreachable, the special reference object does not exist anymore, so it can be treated like any other unreachable object.
This would only be a performance issue, if memory is very low and the next garbage collection had to be performed earlier to eventually reclaim that object. But this doesn’t happen in the reference implementation (aka “HotSpot” or “OpenJDK”). In fact, there could be an OutOfMemoryError while objects are pending in the finalizer queue, whose processing could make more memory reclaimable. There is no guaranty that finalization runs fast enough for you’re purposes. That’s why you should not rely on it.

But what if this 'free' memory contains an object with finalize
method? GC has to call it, but I don't see how it can even know about
objects that aren't reachable anymore.
Let's say we use CMS garbage collector. After it successfully marked all live objects in a first phase, it will then scan memory again and remove all dead objects. GC thread does not call finalize method directly for these objects.
During creation, they are wrapped and added to finalizer queue by JVM (see java.lang.ref.Finalizer.register(Object)). This queue is processed in another thread (java.lang.ref.Finalizer.FinalizerThread), finalize method will be called when there are no references to the object. More details are covered in this blog post.
If so, does having finalizable objects make garbage collecting more
expensive even when they are still alive?
As you can now see, most of the time it does not.

The finalise method is called when an object is about to get garbage collected. That means, when GC determines that the object is no longer being referenced, it can call the finalise method on it. It doesn't have to keep track of objects to be finalised.

According to javadoc, finalize
Called by the garbage collector on an object when garbage collection determines that there are no more references to the object.
So the decision is based on reference counter or something like that.
Actually it is possible not to have this method called at all. So it may be not a good idea to use it as destructor.

Related

Java - HashMap and WeakHashMap references used in Application

Just trying to understand something from GC viewpoint
public Set<Something> returnFromDb(String id) {
LookupService service = fromSomewhere();
Map<String,Object> where = new WeakHashMap<>() {}
where.put("id",id);
return service.doLookupByKVPair(where); // where doesn't need to be serializable
}
what I understand is that once this method call leaves the stack, there is no reference to where regardless of using HashMap or WeakHashMap - but since weak reference is weakly reachable wouldn't this be GCd faster? But if the method call leaves the stack, then there is no reachable reference anyway.
I guess the real question that I have is - "Would using WeakHashMap<> here actually matters at all" - I think it's a "No, because the impact is insignificant" - but a second answer wouldn't hurt my knowledge.
When you use a statement like where.put("id",id); you’re associating a value with a String instance created from a literal, permanently referenced by the code containing it. So the weak semantic of the association is pointless, as long as the code is reachable, this specific key object will never get garbage collected.
When the entire WeakHashMap becomes unreachable, the weak nature of the references has no impact on the garbage collection, as unreachable objects have in general. As discussed in this answer, the garbage collection performance mainly depends on the reachable objects, not the unreachable ones.
Keep in mind the documentation:
The relationship between a registered reference object and its queue is one-sided. That is, a queue does not keep track of the references that are registered with it. If a registered reference becomes unreachable itself, then it will never be enqueued. It is the responsibility of the program using reference objects to ensure that the objects remain reachable for as long as the program is interested in their referents.
In other words, a WeakReference has no impact when it is unreachable, as it will be treated like any other garbage, i.e. not treated at all.
When you have a strong reference to a WeakHashMap while a garbage collection is in progress, it will reduce the performance, as the garbage collector has to keep track of the encountered reachable WeakReference instances, to clear and enqueue them if their referent has not been encountered and marked as strongly reachable. This additional effort is the price you have to pay for allowing the earlier collection of the keys and the subsequent cleanup, which is needed to remove the strongly referenced value.
As said, when, like in your example, the key will never become garbage collected, this additional effort is wasted. But if no garbage collection happens while the WeakHashMap is used, there will be no impact, as said, as the collection of an entire object graph happens at once, regardless of what kind of objects are in the garbage.

How finalizable objects takes at least 2 garbage collection cycles before it can be reclaimed?

I'm reading this article and I can't really understand how the finalizable objects (objects which override the finalize method) takes at least 2 GC cycles before it can be reclaimed.
It takes at least two garbage collection cycles (in the best case) before a finalizeable object can be reclaimed.
Can someone also explain in detail how is it possible for a finalizable object to take more than one GC cycle for reclamation?
My logical argument is that when we override finalize method, the runtime will have to register this object with the garbage-collector (so that GC can call finalize of this object, which makes me think that GC will have reference to all the finalizable objects). And for this, GC will have to keep a strong reference to the finalizable object. If that is the case then how this object became a candidate for reclamation by GC in the first place? I reach a contradiction by this theory.
PS: I understand that overriding finalize is not the recommended approach and this method is deprecated since Java 9.
You are right in that the garbage collector needs a reference to finalizable objects. Of course, this particular reference must not be considered when deciding whether the object is still reachable before the finalization. This implies special knowledge about the nature of this reference to the garbage collector.
When the garbage collector determines that an object is eligible for finalization, the finalizer will run, which implies that the object becomes strongly reachable again, at least as long as the finalizer is executed. After its finalization, the object must become unreachable again and this must be detected, before the object’s memory can be reclaimed. That’s why it takes at least two garbage collection cycles.
In case of the widely used Hotspot/OpenJDK environment (and likely also in IBM’s JVM), this is implemented by creating an instance of a special, non-public subclass of Reference, a Finalizer, right when an object, whose class has a non-trivial finalize() method, is created. Like with weak & soft references, these references are enqueued by the garbage collector when no strong reference to the referent exist, but they are not cleared, so the finalizer thread can read the object, making it strongly reachable again for the finalization. At this point, the Finalizer is cleared, but also not referenced anymore, so it would get collected like an ordinary object anyway, so by the next time the referent becomes unreachable, no special reference to it exists anymore.
For objects whose class has a “trivial finalizer”, i.e. the finalize() method inherited by java.lang.Object or an empty finalize() method, the JVM will take a short-cut and not create the Finalizer instance in the first place, so you could say, these objects, which make the majority of all objects, behave as if their finalizer did already run, right from the start.
Though you got your answer (which is absolutely correct), I want to add a small-ish addendum here. In general, references are of two types : strong and weak. Weak References are WeakReference/SoftReference/PhantomReference and Finalizer(s).
When a certain GC cycle traverses the heap graph and sees one of these weak references, it treats it in a special way. When it first encounters a dead finalizer reference (let's consider this being the first GC cycle), it has to resurrect the instance. finalize is an instance method, and it needs an actual instance to be invoked. So a GC first saw that this Object is dead, only to revive it moments later, to be able to call finalize on it. Once it calls that method on it, it marks the fact that it has already been called; so when the next cycle happens, it can be actually be GC-ed.
It would be incorrect to call this the second GC.
For example G1GC does partial clean-up of the heap (young and mixed), so it might not even capture this reference in the next cycle. It might not fall under its radar, as simple as that.
Other GCs, like Shenandoah, have flags that control on which iteration to handle these special references (ShenandoahRefProcFrequency, 5 by default).
So indeed there is a need for two cycles, but they do not have to be subsequent.

Use of clear in java.lang.ref.Reference class

ReferenceQueue q = new ReferenceQueue();
Reference r = q.remove();
r.clear();
I see that the java doc says that the clear method clears this reference object. I don't understand the meaning of this. Does this clear from the memory and thus in other words the object has been garbage collected?
java.lang.Reference is a base class for few special references which are treated in special way by garbage collection.
Under certain circumstances garbage collector may push reference object in it's reference queue (reference may be queued only once in a lifetime).
clear() method can be used to suppress special handling (and thus additional work for garbage collector). If reference object is already in queue it doesn't make sense to clear it, it is already cleared by garbage collector.
This project on github has an implementation of resource management using PhantomReferences made for educational purpose. clear() is used if resource is disposed explicitly to avoid extra work for GC in that case.
clear() simply sets the internal reference to null. Since references are automatically cleared when being enqueued by the garbage collector (with the exception of phantom references, but this oddity can be ignored, it will be eliminated in Java 9), there is usually no need to call clear() on a reference received via ReferenceQueue.remove().
In principle, there is the possibility to enqueue references manually via enqueue() without clearing them, but there is little sense in that, as the primary purpose of the reference queue is to learn about references being enqueued by the garbage collector which will be cleared.
When you call clear() on a Reference object that has not been enqueued yet, it may allow the referent to get collected without enqueuing the Reference object. On the other hand, when you don’t need the Reference object anymore, you can let the JVM collect it like an ordinary object, together with the referent if there are no other references left, as in that case, it won’t get enqueued as well, making clear() unnecessary.

Garbage Created By Object that is Never Referenced

Is any garbage created by an object that is never referenced?
The example I am thinking of is using a static factory method to create an object then having that object perform a function but never creating a reference to it.
For Example:
LoggerFactory.getLogger(Foo.class).info("logging some stuff");
Does this just create an unreferenced object in eden space that will be garbage collected as soon as the next collection happens?
getLogger returns an instance - whether it creates a new one or returns a previously cached one is up to the LoggerFactory's implementation. If this object is no longer referenced from inside the factory in some way, it would be eligible for garbage collection.
Provided that getLogger() doesn't store the created Logger somewhere (which is quite possible), then yes. The garbage collector is very good at disposing short lived objects, so it should get GCd quite quickly.
Of course nobody would write that specific logging line, since it makes no sense.
Java GC works by periodically analyzing which objects are reachable via a chain of references. That does not depend on whether those objects ever were reachable in the first place.
Your question suggests that you think GC may watch for references to be reclaimed to determine which objects to collect. Although GC is not forbidden from doing so, it cannot rely exclusively on such a strategy, and I am unaware of any existing Java GC implementation employing it.
Does this just create an unreferenced object in eden space that will be garbage collected as soon as the next collection happens?
Maybe, maybe not. The Logger instance is referenced as this inside info()
E.g. if info() then creates an anonymous inner class or a this-capturing lambda and puts it on a task queue then the Logger instance may live longer than the line of code in your question.
In most scenarios it is likely still be very short-lived. But you cannot know that for certain from the single line of code.
On the other end of the spectrum the object may never be allocated on the heap in the first place, i.e. not even in eden space, due to Escape Analysis

What if a finalizer makes an object reachable?

In Java, finalize is called on an object (that overrides it) when it's about to be garbage collectioned, so when it's unreachable. But what if the finalizer makes the object reachable again, what happens then?
Then the object doesn't get garbage collected, basically. This is called object resurrection. Perform a search for that term, and you should get a bunch of interesting articles. As Jim mentioned, one important point is that the finalizer will only be run once.
The object will not be collected until it gets unreachable again.
According to the JavaDoc, finalize() will not be called again.
If you read the API description carefully, you'll see that the finalizer can make the object reachable again. The object won't be discarded until it is unreachable (again), but finalize() won't be called more than once.
Yeah, this is why you don't use finalizers (Well, one of the many reasons).
There is a reference collection that is made to do this stuff. I'll look it up and post it here in a sec, but I think it's PhantomReference.
Yep, PhantomReference:
Phantom reference objects, which are enqueued after the collector determines that their referents may otherwise be reclaimed. Phantom references are most often used for scheduling pre-mortem cleanup actions in a more flexible way than is possible with the Java finalization mechanism.
It actually does another pass to check and make sure there are no more references to the object. Since it will fail that test on its second pass, you'll end up not freeing the memory for the object.
Because finalize is only called a single time for any given object, the next time through when it has no references, it will just free the memory without calling finalize. Some good information here on finalization.

Categories