I am developing some concurrent algorithms which deal with Reference objects. I am using Java 17.
The thing is, I don't know what the memory semantics of operations like get, clear or refersTo are. They aren't documented in the Javadoc.
Looking into the OpenJDK source code, the referent field has no modifier such as volatile (while the next pointer used by reference queues is volatile).
Also, the get implementation is trivial, but it is an intrinsic candidate, and clear and refersTo are native. So I don't know what they really do.
When the GC clears a reference, I have to assume that all threads will see it cleared, as otherwise they would see a reference to an object that is (in the process of being) garbage collected, but that's just an informal guess.
Is there any guarantee about the memory semantics of all these operations?
If there isn't, is there a way to obtain the same guarantees as a volatile access, for instance by invoking a fence operation before and/or after calling one of these operations?
When you invoke clear() on a reference object, it will only clear this particular Reference object without any impact on the rest of your application and no special memory semantics. It’s exactly like you have seen in the code, an assignment of null to a field which has no volatile modifier.
Mind the documentation of clear():
This method is invoked only by Java code; when the garbage collector clears references it does so directly, without invoking this method.
So this is not related to the event of the GC clearing a reference. Your assumption “that all threads will see it cleared” when the GC clears a reference is correct. The documentation of WeakReference states:
Suppose that the garbage collector determines at a certain point in time that an object is weakly reachable. At that time it will atomically clear all weak references to that object and all weak references to any other weakly-reachable objects from which that object is reachable through a chain of strong and soft references.
So at this point, not only all threads will agree that a weak reference has been cleared, they will also agree that all weak references to the same object have been cleared. A similar statement can be found at SoftReference and PhantomReference.
The Java Language Specification, §12.6.2. Interaction with the Memory Model refers to points where such an atomic clear may happen as reachability decision points. It specifies interactions between these points and other program actions, in terms of “comes-before di” and “comes-after di” relationships, the most important ones being:
If r is a read that sees a write w and r comes-before di, then w must come-before di.
If x and y are synchronization actions on the same variable or monitor such that so(x, y) (§17.4.4) and y comes-before di, then x must come-before di.
So, the GC action will be inserted into the synchronization order and even a racy read could not subvert it, but it’s important to keep in mind that the exact location of the reachability decision point is not known to the application. It’s obviously somewhere between the last point where get() returned a non-null reference or refersTo(null) returned false and the first point where get() returned null or refersTo(null) returned true.
For practical applications, it is enough to know that once the reference reports the object as garbage collected, you can be sure that it won’t reappear anywhere¹. Just keep the reference object private, to be sure that nobody else invokes clear() on it.
¹ Leaving things like finalizer resurrection aside.
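If the algorithm itself also needs to "clear" the reference from Java code with volatile semantics, one option is to not rely on Reference.clear() at all (it is a plain write, as explained above) and instead keep the WeakReference in a private volatile field that is set to null. The following is a minimal sketch under that assumption; the class name WeakSlot and the design are hypothetical, not a JDK facility, and refersTo requires Java 16 or newer. It deliberately sidesteps the question of fences.

```java
import java.lang.ref.WeakReference;

/**
 * Hypothetical helper: keeps the WeakReference private and models a
 * Java-initiated clear as a volatile write instead of Reference.clear().
 */
final class WeakSlot<T> {
    // volatile gives the logical clear the usual happens-before guarantees;
    // clearing by the GC itself is covered by the JLS rules quoted above.
    private volatile WeakReference<T> ref;

    WeakSlot(T referent) {
        this.ref = new WeakReference<>(referent);
    }

    /** Returns the referent, or null if it was collected or logically cleared. */
    T get() {
        WeakReference<T> r = ref;          // single volatile read
        return r == null ? null : r.get(); // call get() once; callers should use the local
    }

    /** True once the slot was cleared via clear() or the referent was collected. */
    boolean isCleared() {
        WeakReference<T> r = ref;
        return r == null || r.refersTo(null);
    }

    /** Logical clear: a volatile write, visible to threads that read ref afterwards. */
    void clear() {
        ref = null;
    }
}
```

Since both kinds of clearing are monotonic (a null never turns back into the object), a caller only has to read get() once into a local variable and work with that local.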
Just trying to understand something from a GC viewpoint:
public Set<Something> returnFromDb(String id) {
LookupService service = fromSomewhere();
Map<String,Object> where = new WeakHashMap<>();
where.put("id",id);
return service.doLookupByKVPair(where); // where doesn't need to be serializable
}
What I understand is that once this method returns, there is no reference to where left, regardless of whether I use HashMap or WeakHashMap. But since a weakly referenced object is weakly reachable, wouldn't it be GCed faster? Then again, once the method returns, there is no reachable reference anyway.
I guess the real question I have is: “Would using WeakHashMap<> here actually matter at all?” I think the answer is “No, because the impact is insignificant”, but a second opinion wouldn't hurt my knowledge.
When you use a statement like where.put("id",id); you’re associating a value with a String instance created from a literal, permanently referenced by the code containing it. So the weak semantic of the association is pointless, as long as the code is reachable, this specific key object will never get garbage collected.
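A small sketch of the point about the literal key (class and method names are made up for illustration). System.gc() is only a hint, but the result does not depend on it, because the key can never become weakly reachable:

```java
import java.util.Map;
import java.util.WeakHashMap;

final class LiteralKeyDemo {
    /** Returns whether the entry keyed by the literal "id" survives a GC request. */
    static boolean literalKeySurvivesGc() {
        Map<String, Object> where = new WeakHashMap<>();
        // "id" is a string literal: the same String instance stays referenced by the
        // code containing it, so it remains strongly reachable while that code is reachable.
        where.put("id", "some id");
        System.gc(); // only a hint; the result below does not depend on it
        return where.containsKey("id"); // the weak key can never be cleared here
    }

    public static void main(String[] args) {
        System.out.println(literalKeySurvivesGc()); // true
    }
}
```

With a key created at runtime, such as new String("id"), the entry could disappear after a collection, but whether and when that happens is up to the GC, so no output is claimed for that case.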
When the entire WeakHashMap becomes unreachable, the weak nature of its references has no impact on garbage collection, just as unreachable objects in general have none. As discussed in this answer, garbage collection performance mainly depends on the reachable objects, not the unreachable ones.
Keep in mind the documentation:
The relationship between a registered reference object and its queue is one-sided. That is, a queue does not keep track of the references that are registered with it. If a registered reference becomes unreachable itself, then it will never be enqueued. It is the responsibility of the program using reference objects to ensure that the objects remain reachable for as long as the program is interested in their referents.
In other words, a WeakReference has no impact when it is unreachable, as it will be treated like any other garbage, i.e. not treated at all.
When you have a strong reference to a WeakHashMap while a garbage collection is in progress, it will reduce the performance, as the garbage collector has to keep track of the encountered reachable WeakReference instances, to clear and enqueue them if their referent has not been encountered and marked as strongly reachable. This additional effort is the price you have to pay for allowing the earlier collection of the keys and the subsequent cleanup, which is needed to remove the strongly referenced value.
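To illustrate the "subsequent cleanup" part, here is a heavily simplified, hypothetical model of the idea behind WeakHashMap; it is not the real implementation (which hashes keys by equals() and replaces existing mappings). Each entry is a WeakReference to the key that strongly holds the value, the cache keeps its entries reachable so they can be enqueued at all, and stale entries are expunged from the queue on each operation. The put method returns the entry only so that a demo can simulate the GC with Reference.enqueue().

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical, heavily simplified model of the WeakHashMap cleanup idea; not its real code. */
final class TinyWeakCache<K, V> {
    // An entry weakly references its key but strongly references its value.
    private static final class Entry<K, V> extends WeakReference<K> {
        final V value;

        Entry(K key, V value, ReferenceQueue<? super K> queue) {
            super(key, queue);
            this.value = value;
        }
    }

    private final ReferenceQueue<K> queue = new ReferenceQueue<>();
    // The cache keeps its entries reachable; an unreachable entry would never be enqueued.
    private final Set<Entry<K, V>> entries = new HashSet<>();

    /** Adds an entry (no replacement of existing keys, to keep it short). The returned
     *  Reference is exposed only so a demo can simulate collection via enqueue(). */
    Reference<K> put(K key, V value) {
        Objects.requireNonNull(key);
        expungeStaleEntries();
        Entry<K, V> e = new Entry<>(key, value, queue);
        entries.add(e);
        return e;
    }

    V get(K key) {
        if (key == null) {
            return null;
        }
        expungeStaleEntries();
        for (Entry<K, V> e : entries) {
            if (e.refersTo(key)) { // identity comparison, unlike the real WeakHashMap
                return e.value;
            }
        }
        return null;
    }

    int size() {
        expungeStaleEntries();
        return entries.size();
    }

    // The extra cleanup step: a cleared entry would otherwise keep its value strongly reachable.
    private void expungeStaleEntries() {
        Reference<? extends K> r;
        while ((r = queue.poll()) != null) {
            entries.remove(r);
        }
    }
}
```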
As said, when the key will never be garbage collected, as in your example, this additional effort is wasted. But if no garbage collection happens while the WeakHashMap is in use, there will be no impact at all, since an entire unreachable object graph is collected at once, regardless of what kind of objects are in the garbage.
I'm reading this article and I can't really understand how finalizable objects (objects which override the finalize method) take at least 2 GC cycles before they can be reclaimed.
It takes at least two garbage collection cycles (in the best case) before a finalizeable object can be reclaimed.
Can someone also explain in detail how it is possible for a finalizable object to take more than one GC cycle to be reclaimed?
My reasoning is that when we override the finalize method, the runtime has to register this object with the garbage collector (so that the GC can call finalize on it, which makes me think that the GC holds references to all finalizable objects). For this, the GC would have to keep a strong reference to the finalizable object. If that is the case, how did this object become a candidate for reclamation by the GC in the first place? I reach a contradiction with this theory.
PS: I understand that overriding finalize is not the recommended approach and this method is deprecated since Java 9.
You are right that the garbage collector needs a reference to finalizable objects. Of course, this particular reference must not be considered when deciding whether the object is still reachable before its finalization. This implies that the garbage collector has special knowledge about the nature of this reference.
When the garbage collector determines that an object is eligible for finalization, the finalizer will run, which implies that the object becomes strongly reachable again, at least as long as the finalizer is executed. After its finalization, the object must become unreachable again and this must be detected, before the object’s memory can be reclaimed. That’s why it takes at least two garbage collection cycles.
In case of the widely used HotSpot/OpenJDK environment (and likely also in IBM’s JVM), this is implemented by creating an instance of a special, non-public subclass of Reference, a Finalizer, right when an object whose class has a non-trivial finalize() method is created. Like with weak and soft references, these references are enqueued by the garbage collector when no strong reference to the referent exists, but they are not cleared, so the finalizer thread can read the object, making it strongly reachable again for the finalization. At this point, the Finalizer is cleared, but also not referenced anymore, so it would get collected like an ordinary object anyway; by the next time the referent becomes unreachable, no special reference to it exists anymore.
For objects whose class has a “trivial finalizer”, i.e. the finalize() method inherited from java.lang.Object or an empty finalize() method, the JVM takes a shortcut and does not create the Finalizer instance in the first place. So you could say that these objects, which make up the majority of all objects, behave as if their finalizer had already run, right from the start.
Though you got your answer (which is absolutely correct), I want to add a small-ish addendum here. In general, references are of two types : strong and weak. Weak References are WeakReference/SoftReference/PhantomReference and Finalizer(s).
When a certain GC cycle traverses the heap graph and sees one of these weak references, it treats it in a special way. When it first encounters a dead finalizer reference (let's consider this being the first GC cycle), it has to resurrect the instance. finalize is an instance method, and it needs an actual instance to be invoked. So a GC first saw that this Object is dead, only to revive it moments later, to be able to call finalize on it. Once it calls that method on it, it marks the fact that it has already been called; so when the next cycle happens, it can actually be GC-ed.
It would be incorrect to call this the second GC.
For example, G1GC does partial clean-ups of the heap (young and mixed collections), so it might not even capture this reference in the next cycle. The object might simply not be on its radar, as simple as that.
Other GCs, like Shenandoah, have flags that control on which iteration to handle these special references (ShenandoahRefProcFrequency, 5 by default).
So indeed there is a need for two cycles, but they do not have to be subsequent.
ReferenceQueue<Object> q = new ReferenceQueue<>();
Reference<?> r = q.remove(); // blocks until a reference is enqueued; may throw InterruptedException
r.clear();
I see that the Javadoc says that the clear method “clears this reference object”. I don't understand what this means. Does it clear the object from memory, in other words, has the object been garbage collected?
java.lang.ref.Reference is the base class for a few special reference types which are treated in a special way by the garbage collector.
Under certain circumstances, the garbage collector may push a reference object into its reference queue (a reference may be enqueued only once in its lifetime).
The clear() method can be used to suppress this special handling (and thus the additional work for the garbage collector). If the reference object is already in the queue, it doesn't make sense to clear it, as it has already been cleared by the garbage collector.
This project on github has an implementation of resource management using PhantomReferences made for educational purposes. clear() is used if a resource is disposed explicitly, to avoid extra work for the GC in that case.
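The pattern from the last paragraph can be sketched as follows. This is a hypothetical tracker written for illustration, not the code of the project mentioned above: an explicit close() runs the cleanup immediately and calls clear(), so the GC has no special work left for that reference, while processCollected() handles owners that were dropped without being closed. The cleanup action must not capture the owner, otherwise the owner would never become phantom reachable.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical resource tracker: cleanup runs either on explicit close() or after GC. */
final class ResourceTracker {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps the PhantomReferences reachable until they are processed or closed.
    private final Set<Handle> live = ConcurrentHashMap.newKeySet();

    static final class Handle extends PhantomReference<Object> implements AutoCloseable {
        private final Set<Handle> live;
        private final Runnable cleanup; // must not reference the owner

        private Handle(Object owner, ReferenceQueue<Object> queue, Set<Handle> live, Runnable cleanup) {
            super(owner, queue);
            this.live = live;
            this.cleanup = cleanup;
        }

        /** Explicit disposal: run the cleanup now and tell the GC not to bother. */
        @Override
        public void close() {
            if (live.remove(this)) { // ensures the cleanup runs at most once
                clear();             // no special reference processing needed anymore
                cleanup.run();
            }
        }
    }

    Handle register(Object owner, Runnable cleanup) {
        Handle h = new Handle(owner, queue, live, cleanup);
        live.add(h);
        return h;
    }

    /** Runs the cleanup for owners the GC found unreachable; returns how many ran. */
    int processCollected() {
        int n = 0;
        Reference<?> r;
        while ((r = queue.poll()) != null) {
            Handle h = (Handle) r;
            if (live.remove(h)) {
                h.cleanup.run();
                n++;
            }
        }
        return n;
    }

    int liveCount() {
        return live.size();
    }
}
```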
clear() simply sets the internal reference to null. Since references are automatically cleared when being enqueued by the garbage collector (with the exception of phantom references, but this oddity can be ignored, it will be eliminated in Java 9), there is usually no need to call clear() on a reference received via ReferenceQueue.remove().
In principle, there is the possibility to enqueue references manually via enqueue() without clearing them, but there is little sense in that, as the primary purpose of the reference queue is to learn about references being enqueued by the garbage collector which will be cleared.
When you call clear() on a Reference object that has not been enqueued yet, it may allow the referent to get collected without enqueuing the Reference object. On the other hand, when you don’t need the Reference object anymore, you can let the JVM collect it like an ordinary object, together with the referent if there are no other references left, as in that case, it won’t get enqueued as well, making clear() unnecessary.
Java allows writing:
new PhantomReference(new Object(), null)
In this case, will the new Object() be collected?
As I understand it, a phantom reference is an alternative to using the finalize() method.
And after the reference appears in the queue, I need to do some additional actions and then call clear().
The Javadoc states:
It is possible to create a phantom reference with a null queue, but such a reference is completely useless: Its get method will always return null and, since it does not have a queue, it will never be enqueued
What does it mean that it will never be enqueued?
As I understand it, it means that after the finalize method invocation, the reference will not be added to the ReferenceQueue. Thus it may lead to:
1. the object's memory being cleared at once
2. the object's memory not being cleared
Which case is correct?
Well, as you noticed yourself, a PhantomReference is not automatically cleared. This implies that as long as you keep a strong reference to the PhantomReference, the referent will stay phantom reachable. As the documentation says: “An object that is reachable via phantom references will remain so until all such references are cleared or themselves become unreachable.”
However, considering when an object is unreachable (now I’m talking about the “phantom references themselves”) can lead to many surprises. Especially as it’s very likely that the reference object, not providing useful operations, will not be subsequently touched anymore.
Since the PhantomReference without a queue will never be enqueued and its get() method will always return null, it is indeed not useful.
So why does the constructor allow constructing such a useless object? Well, the documentation of the very first version (1.2) states that it will throw a NullPointerException if the queue is null. This statement persists until 1.4; Java 5 is the first version stating that you can construct a PhantomReference without a queue, despite it being useless. My guess is that it always inherited the superclass’ behavior of allowing a null queue, contradicting the documentation, and this was noticed so late that the decision was made to stay compatible and adapt the documentation rather than change the behavior.
The question, even harder to answer, is why a PhantomReference isn’t automatically cleared. The documentation only says that a phantom reachable object will remain so, which is the consequence of not being cleared, but doesn’t explain why this has any relevance.
This question has been brought up on SO, but the answer isn’t really satisfying. It says “to allow performing cleanup before an object is garbage collected”, which might even match the mindset of whoever made that design decision, but since the cleanup code can’t access the object, it has no relevance whether it is executed before or after the object is reclaimed. As said above, since this rule depends on the reachability of the PhantomReference object, which is subject to optimizing code transformations, it might be even the case that the object is reclaimed together with the PhantomReference instance before the cleanup code completes, without anyone noticing.
I also found a similar question on the HotSpot developer mailing list back in 2013 which also lacks an answer.
There is the enhancement request JDK-8071507 to change that behavior and clear PhantomReferences just like the others, which has the status “fixed” for Java 9, and indeed, its documentation now states that they are cleared like any other reference.
This, unfortunately, implies that the answer at the beginning of my post will be wrong starting with Java 9. Then, new PhantomReference(new Object(), null) will make the newly created Object instance immediately eligible for garbage collection, regardless of whether you keep a strong reference to the PhantomReference instance or not.
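A short sketch of what "useless" means in practice, written for Java 17 (refersTo only exists since Java 16, long after this answer). Reference.enqueue() is used here to mimic what the GC would do: a PhantomReference registered with a queue can be enqueued and polled, while one created with a null queue can never be enqueued, and get() returns null for both.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

public class PhantomQueueDemo {
    public static void main(String[] args) {
        Object referent = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();

        PhantomReference<Object> withQueue = new PhantomReference<>(referent, queue);
        PhantomReference<Object> withoutQueue = new PhantomReference<>(referent, null);

        // get() is useless for both: a PhantomReference always returns null.
        System.out.println(withQueue.get());    // null
        System.out.println(withoutQueue.get()); // null

        // refersTo (Java 16+) can test the referent without retrieving it.
        System.out.println(withoutQueue.refersTo(referent)); // true

        // enqueue() mimics the GC: only a reference registered with a queue can be enqueued.
        System.out.println(withQueue.enqueue());       // true
        System.out.println(withoutQueue.enqueue());    // false: no queue, never enqueued
        System.out.println(queue.poll() == withQueue); // true

        Reference.reachabilityFence(referent); // keep the referent strongly reachable up to here
    }
}
```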
Say there are two objects, A and B, and there is a pointer A.x --> B, and we create, say, WeakReferences to both A and B, with an associated ReferenceQueue.
Assume that both A and B become unreachable. Intuitively B cannot be considered unreachable before A is. In such a case, do we somehow get a guarantee that the respective references will be enqueued in the intuitive (topological when there are no cycles) order in the ReferenceQueue? I.e. ref(A) before ref(B). I don't know - what if the GC marked a bunch of objects as unreachable, and then enqueued them in no particular order?
I was reviewing Guava's Finalizer.java and saw this snippet:
private void cleanUp(Reference<?> reference) throws ShutDown {
...
if (reference == frqReference) {
/*
* The client no longer has a reference to the
* FinalizableReferenceQueue. We can stop.
*/
throw new ShutDown();
}
frqReference is a PhantomReference to the used ReferenceQueue, so if this is GC'ed, no Finalizable{Weak, Soft, Phantom}References can be alive, since they reference the queue. So they have to be GC'ed before the queue itself can be GC'ed. But still, do we get the guarantee that these references will be enqueued to the ReferenceQueue in the order they get "garbage collected" (as if they were GC'ed one by one)? The code implies that there is some kind of guarantee, otherwise unprocessed references could theoretically remain in the queue.
I'm pretty sure that the answer is no.
The JVM spec says this about finalizer methods:
The Java virtual machine imposes no ordering on finalize method calls. Finalizers may be called in any order or even concurrently. (JVM spec 2.17.7)
From this I infer that there are no guarantees that references are queued in topological order.
There is no ordering guarantee. In the case of Finalizer.java, the thread can be shut down before all references are processed. See the docs for FinalizableReferenceQueue:
Keep a strong reference to this object until all of the associated referents have been finalized. If this object is garbage collected earlier, the backing thread will not invoke finalizeReferent() on the remaining references.
This is intentional behavior. For example, we use FRQ to clean up map entries when references to the keys and/or values are cleared. If the user no longer has a reference to the map, and in turn no longer has a reference to the FRQ, there's no point in processing those references.
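The consequence can be shown with a simplified, single-threaded model of the shutdown check. This is a hypothetical sketch, not Guava's code (the real Finalizer runs in its own thread and its sentinel is a PhantomReference to the FinalizableReferenceQueue): processing stops as soon as the sentinel is polled, so whatever is still queued at that point is abandoned. Since the poll order of a ReferenceQueue is not specified, the test below accepts either outcome.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the FRQ shutdown check: processing stops at the sentinel. */
final class SentinelDrain {
    /** Processes references until the queue is empty or the sentinel shows up. */
    static List<Reference<?>> drain(ReferenceQueue<Object> queue, Reference<?> sentinel) {
        List<Reference<?>> processed = new ArrayList<>();
        Reference<?> r;
        while ((r = queue.poll()) != null) {
            if (r == sentinel) {
                break; // like Guava's ShutDown: references still in the queue are abandoned
            }
            processed.add(r); // stand-in for invoking finalizeReferent()
        }
        return processed;
    }
}
```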
I think there is no such guarantee. The GC itself does not have a complete and immediate view of the RAM (it cannot, since the GC runs on the CPU which can only look at a few bytes at a time). In your example, assuming a basic "mark & sweep" GC, chances are that A and B will be declared unreachable in the same mark phase, and swept together in no particular order. Maintaining topological order would probably be expensive.
As for Finalizer, it seems that it is meant to be used through a FinalizableReferenceQueue instance only, which does some classloader-related magic. The Finalizer uses its own facilities to detect when the FinalizableReferenceQueue on which it functionally depends becomes unreachable itself; this is the point when the thread which runs Finalizer knows that it should exit. From what I understand, if the application lets the GC reclaim the FRQ, then the finalizer thread will exit, and any references enqueued "after" the FRQ reference will not be processed. This depends on topological order or the lack thereof, but I cannot decide whether this is a problem or not. I think that the application is not supposed to drop its FRQ as long as processing of reclaimed referenced objects is important.