How are weak references implemented? - java

I wonder how weak references work internally, for example in .NET or in Java. My three general ideas are:
"Intrusive" - add a list of weak references to the top-most class (the object class). Then, when an object is destroyed, all its weak references can be iterated and set to null.
"Non-intrusive" - maintain a hashtable mapping object pointers to lists of weak references. When a weak reference A is created to an object B, the hashtable entry whose key is the pointer to B would be created or modified.
"Dirty" - store a special hash value with each object, which would be zeroed when the object is destroyed. Weak references would copy that hash value and compare it with the object's value to check whether the object is alive. However, this would cause access violation errors when used directly, so I think there would need to be an additional object holding that hash value.
Neither of these solutions seems clean or efficient. Does anyone know how it is actually done?

In .NET, when a WeakReference is created, the GC is asked for a handle/opaque token representing the reference. Then, when needed, WeakReference uses this handle to ask the GC if that handle is still valid (i.e. the original object still exists) - and if so, it can get the actual object reference.
So this amounts to maintaining a table of tokens/handles mapped to object addresses (and presumably updating that table when objects move during compaction, etc.).
I'm not sure I 100% understand the three bullets, so I hesitate to guess which (if any) it is closest to.

I'm not sure I understood your question, but you can have a look at the implementation of the class WeakReference and its superclass Reference in Java. It is well commented, and you can see that it has a field treated specially by the GC and another one used directly by the VM.
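A minimal sketch of the public side of that API, as a hedged illustration: a WeakReference registered with a ReferenceQueue. The class WeakRefDemo is hypothetical, and because only the JVM decides when a reference is actually cleared, the sketch calls clear() and enqueue() by hand to mimic what the collector would do.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

class WeakRefDemo {
    static String describe() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object target = new Object();
        // The referent is fixed at construction; registering a queue is optional.
        WeakReference<Object> ref = new WeakReference<>(target, queue);

        // While 'target' is strongly reachable, get() returns it.
        boolean alive = ref.get() == target;

        // Do by hand what the GC does once 'target' is only weakly reachable:
        // clear the referent, then enqueue the reference object itself.
        ref.clear();
        ref.enqueue();

        boolean cleared = ref.get() == null;
        // The queue delivers the WeakReference object, never the referent.
        boolean queued = queue.poll() == ref;
        return "alive=" + alive + ", cleared=" + cleared + ", queued=" + queued;
    }

    public static void main(String[] args) {
        System.out.println(describe()); // alive=true, cleared=true, queued=true
    }
}
```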

Python's PEP 205 has a decent explanation of how weak references should behave in Python, and this gives some insight into how they can be implemented. Since a weak reference is immutable, you could have just one for each object, to which you pass out references as needed. Thus, when the object is destroyed, only one weak reference needs to be invalidated.

It seems that the implementation of weak references is a well-kept secret in the industry ;-). For example, as of now, the Wikipedia article lacks any implementation details. And look at the answers above (including the accepted one): "go look at the source" or "I think" ;-\ .
Of all the answers, only the one referencing Python's PEP 205 is insightful. As it says, for any single object there need be at most one weak reference, if we treat the weakref as an entity in itself.
The rest describes the Squirrel language implementation. A weakref is itself an object: when you put a weak reference to an object into some container, you actually put a reference to the weakref object. Each ref-countable object has a field to store a pointer to its weakref, which is NULL until a weakref to that object is actually requested. Each object has a method to request a weakref, which either returns the existing (singleton) weakref from the field, or creates it and caches it in the field.
Of course, the weakref points to the original object. Then you just need to go through all the places where references to objects are handled and add transparent handling of weakrefs (i.e. automatically dereference them). (The "transparent" alternative is to add a virtual "access" method, which is the identity for most objects and an actual dereference for a weakref.)
And since the object has a pointer to its weakref, the object can NULLify the weakref in its own destructor.
This implementation is pretty clean (no magic "calls into the GC" and such) and has O(1) runtime cost. Of course, it's fairly greedy with memory: every object needs one extra pointer field, even though for typically 90+% of objects it will be NULL. Then again, very high-level languages (VHLLs) already have a large per-object memory overhead, and there may be a chance to compact the different "extra" fields. For example, the object type is typically a small enumeration, so it may be possible to merge the type and some kind of weakref reference into a single machine word (say, keep weakref objects in a separate arena and use an index into it).
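A toy Java sketch of the scheme described above, assuming hand-managed reference counts. The names RcObject, WeakRef, retain() and release() are hypothetical, and this is not Squirrel's actual source. Each object caches at most one weakref, and its "destructor" nulls that weakref in O(1).

```java
/** A toy ref-counted object; the count is managed by hand, Java's GC plays no part. */
class RcObject {
    private int refCount = 1;   // the creator holds the first strong reference
    private WeakRef weakRef;    // stays null until someone asks for a weak reference

    void retain() { refCount++; }

    /** Drops one strong reference; when the count reaches zero the "destructor" runs. */
    void release() {
        if (--refCount == 0) destroy();
    }

    private void destroy() {
        // The object knows its single weakref, so invalidating it is O(1).
        if (weakRef != null) weakRef.target = null;
    }

    /** Returns the cached (singleton) weakref, creating it on first request. */
    WeakRef weakRef() {
        if (weakRef == null) weakRef = new WeakRef(this);
        return weakRef;
    }
}

/** The weakref is itself an object; containers hold it instead of the target. */
class WeakRef {
    RcObject target;

    WeakRef(RcObject target) { this.target = target; }

    /** Dereference: returns null once the target has been destroyed. */
    RcObject get() { return target; }
}
```

Because every holder shares the same WeakRef instance, destroying the object only has to clear one pointer, which is the point PEP 205 makes above.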

The normal approach, I think, is for the system to maintain some sort of list of weak references. When the garbage collector executes, before dead objects are removed, the system iterates through the list of weak references and invalidates any reference whose target has not been tagged live. Depending upon the system, this may occur before or after the system temporarily resurrects objects that are eligible for immediate finalization. (In .NET there are two kinds of WeakReference: one is effectively processed before the system scans for finalizers, meaning that it becomes invalid when its target becomes eligible for finalization, and the other is processed after.)
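To make the "list of weak references" approach concrete, here is a toy mark-and-sweep collector in Java over a simulated heap. All names (Obj, WeakSlot, ToyCollector) are hypothetical, and the sketch ignores finalization, so it only models the variant that is cleared before finalizers would run.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy heap object; 'fields' are its strong outgoing references. */
class Obj {
    final String name;
    final List<Obj> fields = new ArrayList<>();
    Obj(String name) { this.name = name; }
}

/** Toy weak reference: the collector never traces through 'target'. */
class WeakSlot {
    Obj target;
    WeakSlot(Obj target) { this.target = target; }
}

/** Mark-and-sweep over a toy heap, with a global list of weak references. */
class ToyCollector {
    final List<Obj> heap = new ArrayList<>();
    final List<Obj> roots = new ArrayList<>();
    final List<WeakSlot> weakRefs = new ArrayList<>();

    Obj alloc(String name) {
        Obj o = new Obj(name);
        heap.add(o);
        return o;
    }

    WeakSlot newWeak(Obj target) {
        WeakSlot w = new WeakSlot(target);
        weakRefs.add(w); // the system keeps track of every weak reference
        return w;
    }

    void collect() {
        // 1. Mark everything strongly reachable from the roots.
        Set<Obj> live = new HashSet<>();
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (live.add(o)) work.addAll(o.fields);
        }
        // 2. Before anything is freed, invalidate weak refs whose target was not marked.
        for (WeakSlot w : weakRefs) {
            if (w.target != null && !live.contains(w.target)) w.target = null;
        }
        // 3. Sweep: everything unmarked is garbage.
        heap.retainAll(live);
        // Cleared weak references no longer need to be tracked.
        weakRefs.removeIf(w -> w.target == null);
    }
}
```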
Incidentally, if I were designing a GC-based framework, I would add a couple of other goodies: (1) a means of declaring a reference-type storage location as holding a reference that's primarily of interest to someone else, and (2) a variety of WeakReference that could indicate that the only references to an object are in "of interest to someone else" storage locations. Although WeakReference is a useful type, the act of turning a weak reference into a strong reference may prevent the system from ever recognizing that nobody would mind if its target disappeared.

Related

Is there a way to retrieve an object without having a reference to it?

Suppose the following code:
Object obj = new Object();
obj = null;
At this point, I don't have any reference to this object, but it's still on the heap, because garbage collection doesn't happen instantly. Is there a way to re-obtain a reference to this object before it is collected by the GC?
The only possible way I've seen so far is to use Unsafe, which provides direct memory access, but I would need to know exactly where in memory the object is allocated. There are also Weak/SoftReference, but they are implemented through special GC behavior.
P.S. To predict questions like "Why do you need it?" - Because science is not about why, it's about why not! (c)
This is highly JVM-implementation specific. In a naive implementation that associates memory allocation information with each object, you could find an object whose memory has not been freed yet, and it seems you are thinking in that direction.
However, sophisticated JVMs don’t work that way. Associating allocation information with each object would create a giant overhead, given that you may have millions of objects in your runtime, not only in memory requirements but also in the amount of work needed to maintain this information when allocating or freeing an object.
So what makes a part of your heap memory an object? Only the reference you are holding to it. The garbage collector traverses existing references and within the objects found this way, it will find meta information (i.e. a pointer to class specific information) needed to understand how much memory belongs to the object and how to interpret the contained data (to traverse the sub-references, if any). Everything unreferenced is unused per se and might contain old objects or might have never been used at all, who knows. Once all references to an object are gone, there is no information left about the former existence of this object.
Getting to the point, there is no explicit freeing action. When the garbage collector has found surviving objects, they will be copied to a dedicated new place and their old place is considered to be free, regardless of how many objects there were before and how much memory each individual object occupied when it was alive.
When you search memory that is considered unused, you may find remnants of old objects, but without references to their starting points, it’s impossible to say whether a bit pattern that looks like an object really is a dead object or just a coincidence. And even if you managed to resurrect an object that way, it would have nothing to do with your original idea, which was to re-obtain a reference before the GC has run.
Note that all modifications to this ordinary life time work by holding another reference to the object. E.g., when the class defines a non-trivial finalize() method, the JVM has to add a reference to the queue of objects needing finalization. Similarly, soft, weak and phantom references encapsulate a reference to the object in question. Also a debugger may keep a reference to an object, once it has seen it.
But for your trivial code Object obj = new Object(); obj = null;, assuming there’s no breakpoint set in between, there will be no additional reference and hence no way of resurrecting the object. A JVM may even elide the entire allocation when optimizing the code at runtime, in which case you wouldn’t even find remnants of the object in RAM, as the object effectively never existed.
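The only supported way to get the object back is to have taken another reference beforehand. A minimal sketch (the class name KeepHandle is hypothetical): a WeakReference created before the strong reference is dropped lets you re-obtain the object until the GC clears it, and whether that has happened by the time you call get() is entirely up to the JVM.

```java
import java.lang.ref.WeakReference;

class KeepHandle {
    public static void main(String[] args) {
        Object obj = new Object();
        // Take the extra reference *before* the strong one is dropped;
        // afterwards there is nothing left to take it from.
        WeakReference<Object> handle = new WeakReference<>(obj);
        obj = null;

        System.gc(); // only a hint: the JVM may or may not collect here

        // get() returns the object if it has not been collected yet, otherwise null.
        Object again = handle.get();
        System.out.println(again != null ? "re-obtained " + again : "already collected");
    }
}
```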
At this point, i don't have any reference to this object, but it's still on the heap, because garbage collection don't happens instantly.
It is undefined where it is, and it is also undefined whether or not garbage collection happens instantly.
Is there a way to re obtain reference on this object, before it'll be collected by GC?
You already had one and you threw it away. Just keep it.
I will need to know where in memory exactly object is allocated.
There is nothing in standard Java that will tell you that, and no useful way you could make use of the information if you could get it.
Also, there is Weak/SoftReference, but they are implemented by special GC behavior.
I don't see how this affects your question, whatever it is.

Is there a practical use for weak references? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Weak references - how useful are they?
Since weak references can be claimed by the garbage collector at any time, is there any practical reason for using them?
If you want to keep a reference to something only as long as it is used elsewhere, e.g. a listener, you can use a weak reference.
WeakHashMap can be used as a short-lived cache mapping keys to derived data. It can also be used to keep information about objects that are used elsewhere when you don't know when those objects are discarded.
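A minimal sketch of the first use, assuming a hypothetical derive() method standing in for the expensive computation. Note that WeakHashMap compares keys with equals() and hashCode(), not identity, and its Javadoc warns that values are held strongly.

```java
import java.util.Map;
import java.util.WeakHashMap;

class DerivedDataCache {
    // Keys are held weakly: once a key is no longer strongly reachable elsewhere,
    // the GC may clear it, and WeakHashMap then drops the entry on a later access.
    // Values are held strongly, so a value must not refer back to its own key.
    private static final Map<Object, String> CACHE = new WeakHashMap<>();

    /** Hypothetical stand-in for an expensive but reproducible computation. */
    private static String derive(Object key) {
        return key.getClass().getSimpleName() + ":" + key.toString().length();
    }

    /** WeakHashMap is not thread-safe, hence the synchronization. */
    static synchronized String infoFor(Object key) {
        return CACHE.computeIfAbsent(key, DerivedDataCache::derive);
    }
}
```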
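BTW, soft references are like weak references, but the objects they refer to are not necessarily cleared as soon as possible. The GC clears a weak reference as soon as it finds its referent is only weakly reachable, whereas it keeps softly reachable objects as long as memory allows (all soft references to softly reachable objects are guaranteed to be cleared before an OutOfMemoryError is thrown).
There is another kind of reference called a phantom reference. It is used in the GC clean-up process and refers to an object that isn't accessible to "normal" code because it's in the process of being cleaned up.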
Since weak reference can be claimed by garbage collector at any time, is there any practical reason to use it?
Of course there are practical reasons to use it. It would be awfully strange if the framework designers went to the enormous expense of building a weak reference system that was impractical, don't you think?
I think the question you intended to ask was:
What are realistic situations in which people use weak references?
There are many. A common one is to achieve a performance goal. When performance tuning an application, one often must make a tradeoff between more memory usage and more time usage. Suppose, for example, there is a complex calculation that you must perform many times, but the computation is "pure": the answer depends only on the arguments, not upon exogenous state. You can build a cache (a map from the arguments to the result), but that then uses memory. You might never ask the question again, and that memory would then be wasted.
Weak references possibly solve this problem; the cache can get quite large, and therefore time is saved if the same question is asked many times. But if the cache gets large enough that the garbage collector needs to reclaim space, it can do so safely.
The downside is of course that the cleanup policy of the garbage collector is tuned to meet the goals of the whole system, not your specific cache problem. If the GC policy and your desired cache policy are sufficiently aligned then weak references are a highly pragmatic solution to this problem.
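A hedged Java sketch of such a cache for a pure function. It uses SoftReference rather than WeakReference for the cached results, since in Java a result held only by a weak reference tends to be cleared at the next collection (the following answer makes the same point). The class SoftMemo is hypothetical, and the sketch never purges map entries whose results were reclaimed; a production version would drain a ReferenceQueue for that.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Memoizes a pure function; cached results may be reclaimed by the GC under memory pressure. */
class SoftMemo<K, V> {
    private final Map<K, SoftReference<V>> cache = new HashMap<>();
    private final Function<K, V> compute;

    SoftMemo(Function<K, V> compute) { this.compute = compute; }

    synchronized V get(K key) {
        SoftReference<V> ref = cache.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Never computed, or the GC reclaimed the cached result: recompute.
            value = compute.apply(key);
            cache.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

The tradeoff described above shows up directly: a hit costs a map lookup, and a reclaimed entry only costs a recomputation, never a wrong answer.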
If a WeakReference is the only reference to an object, and you want the object to hang around, you should probably be using a SoftReference instead.
WeakReferences are best used in cases where there will be other references to the object, but you can't (or don't want to have to) detect when those other references are no longer used. Then, the other reference will prevent the object from being garbage collected, and the WeakReference will just be another way of getting to the same object.
Two common use cases are:
For holding additional (often expensively calculated but reproducible) information about specific objects that you cannot modify directly, and whose lifecycle you have little control over. WeakHashMap is a perfect way of holding these references: the key in the WeakHashMap is only weakly held, and so when the key is garbage collected, the value can be removed from the Map too, and hence be garbage collected.
For implementing some kind of eventing or notification system, where "listeners" are registered with some kind of coordinator, so they can be informed when something occurs – but where you don't want to prevent these listeners from being garbage collected when they come to the end of their life. A WeakReference will point to the object while it is still alive, but point to "null" once the original object has been garbage collected.
We use it for that reason - in our example, we have a variety of listeners that must register with a service. The service keeps weak references to the listeners, while the instantiated classes keep strong references. If the classes at any time get GC'ed, the weak reference is all that remains of the listeners, which will then be GC'ed as well. It makes keeping track of the intermediary classes much easier.
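The listener-registration pattern described above can be sketched as follows (WeakListenerService is a hypothetical name). The service holds only WeakReferences and prunes cleared ones whenever it fires an event.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Notifies listeners without keeping them alive. */
class WeakListenerService<E> {
    private final List<WeakReference<Consumer<E>>> listeners = new ArrayList<>();

    synchronized void register(Consumer<E> listener) {
        listeners.add(new WeakReference<>(listener));
    }

    /** Notifies live listeners, drops cleared references, and returns how many were notified. */
    synchronized int fire(E event) {
        int notified = 0;
        for (Iterator<WeakReference<Consumer<E>>> it = listeners.iterator(); it.hasNext(); ) {
            Consumer<E> listener = it.next().get();
            if (listener == null) {
                it.remove();             // its owner was collected; forget the stale entry
            } else {
                listener.accept(event);
                notified++;
            }
        }
        return notified;
    }
}
```

One caveat of this design: a listener that nobody else references strongly, such as a lambda created inline in the register() call, may be collected almost immediately, so the registering class has to keep its own strong reference, as in the answer above.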
The most common usage of weak references is for values in "lookup" Maps.
With normal (hard) value references, if the value in the map is no longer referenced elsewhere, you often don't need the lookup any more, yet the map still keeps it alive. With weakly referenced map values, once there are no other references to a value, the object becomes a candidate for garbage collection.
The fact that the map itself has a (the only) reference to the object does not stop it from being garbage collected, because that reference is a weak reference.
To prevent memory leaks, see this article for details.
A weak reference is a reference that does not protect the referent object from collection by a garbage collector.
An object referenced only by weak references is considered unreachable (or "weakly reachable") and so may be collected at any time.
Weak references are used to avoid keeping memory referenced by unneeded objects. Some garbage-collected languages feature or support various levels of weak references, such as Java, C#, Python, Perl, PHP or Lisp.
Garbage collection is used to reduce the potential for memory leaks and data corruption. There are two main types of garbage collection: tracing and reference counting. Reference counting schemes record the number of references to a given object and collect the object when the reference count becomes zero. Reference-counting cannot collect cyclic (or circular) references because only one object may be collected at a time. Groups of mutually referencing objects which are not directly referenced by other objects and are unreachable can thus become permanently resident; if an application continually generates such unreachable groups of unreachable objects this will have the effect of a memory leak. Weak references may be used to solve the problem of circular references if the reference cycles are avoided by using weak references for some of the references within the group.
Weak references are also used to minimize the number of unnecessary objects in memory by allowing the program to indicate which objects are not critical by only weakly referencing them.
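To illustrate the cycle point, here is a toy reference-counting simulation in Java. Java's own collector is tracing, so the counts here are maintained by hand, and RcNode and its methods are hypothetical. With a strong back edge, parent and child keep each other's counts above zero after the last external reference is dropped; making the back edge weak lets both be freed.

```java
/** Toy reference counting; counts are maintained by hand, Java's own GC plays no part. */
class RcNode {
    int strongCount = 0;
    boolean freed = false;
    RcNode child;              // strong edge: counted in the child's strongCount
    RcNode parent;             // back edge: counted only if parentIsStrong
    boolean parentIsStrong;

    static void incRef(RcNode n) {
        if (n != null) n.strongCount++;
    }

    static void decRef(RcNode n) {
        if (n == null || --n.strongCount > 0) return;
        n.freed = true;
        RcNode c = n.child;
        n.child = null;
        // A weak back edge must not dangle once its target is freed.
        if (c != null && c.parent == n && !c.parentIsStrong) c.parent = null;
        decRef(c);                                // release the strong edge to the child
        if (n.parentIsStrong) decRef(n.parent);   // release a strong back edge, if any
    }

    /** parent -> child is always strong; child -> parent is weak if weakBackEdge is true. */
    static void link(RcNode parent, RcNode child, boolean weakBackEdge) {
        parent.child = child;
        incRef(child);
        child.parent = parent;
        child.parentIsStrong = !weakBackEdge;
        if (!weakBackEdge) incRef(parent);        // a strong back edge closes a counted cycle
    }
}
```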
I use it generally for some type of cache. Recently accessed items are available immediately and in the case of cache miss you reload the item (DB, FS, whatever).
I use WeakSet to encode links in a graph. If a node is deleted, the links automatically disappear.
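The JDK has no WeakSet class; a common Java equivalent is a set view over a WeakHashMap obtained with Collections.newSetFromMap. A sketch of the graph-link idea with a hypothetical GraphNode class: a link does not keep its target alive, so once a node is no longer strongly referenced elsewhere (for example, after it is removed from the graph's own node list), the links to it disappear after a later collection.

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

/** Graph node whose outgoing links do not keep their targets alive. */
class GraphNode {
    final String label;
    // A set view over a WeakHashMap behaves like a weak set. Membership uses
    // equals/hashCode; GraphNode keeps Object's identity-based versions.
    final Set<GraphNode> links = Collections.newSetFromMap(new WeakHashMap<>());

    GraphNode(String label) { this.label = label; }

    void linkTo(GraphNode other) { links.add(other); }
}
```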

When a PhantomReference/SoftReference/WeakReference is queued, how do you know what it referred to?

I haven't used PhantomReferences. There seem to be very few good examples of real-world use.
When a phantom shows up in your queue, how do you know which object it is/was? The get() method appears to be useless. According to the JavaDoc,
Because the referent of a phantom reference is always inaccessible,
this method always returns null.
I think that unless your object is a singleton, you always want to use a subclass of PhantomReference, in which you place whatever mementos you need in order to understand what died.
Is this correct, or did I miss something?
Is this also true for SoftReferences?
For WeakReferences?
Links to relevant examples of usage would be great.
I think that unless your object is a singleton, you always want to use a subclass of PhantomReference, in which you place whatever mementos you need in order to understand what died.
You could also use a Map<Reference<?>, SomeMetadataClassOrInterface> to recover whatever metadata you need. Since ReferenceQueue<T> returns a Reference<T>, you either have to cast it to whatever subclass of PhantomReference you expect, or let a Map<> do it for you.
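A sketch of the subclass approach in Java, with hypothetical TrackedRef and CleanupTracker classes: the PhantomReference subclass carries a memento, and the tracker keeps the reference objects strongly reachable until they are dequeued. The same structure works for WeakReference and SoftReference subclasses. Since collection timing cannot be forced, calling enqueue() by hand is a convenient way to exercise it.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** A PhantomReference that remembers what it stood for; get() on it always returns null. */
class TrackedRef extends PhantomReference<Object> {
    final String memento;   // e.g. an id or file name; must NOT refer to the referent itself

    TrackedRef(Object referent, ReferenceQueue<Object> queue, String memento) {
        super(referent, queue);
        this.memento = memento;
    }
}

class CleanupTracker {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // The reference objects must stay strongly reachable themselves,
    // otherwise they can be collected before they are ever enqueued.
    private final Set<TrackedRef> pending = new HashSet<>();

    TrackedRef track(Object referent, String memento) {
        TrackedRef ref = new TrackedRef(referent, queue, memento);
        pending.add(ref);
        return ref;
    }

    /** Drains the queue without blocking and returns the mementos of dead referents. */
    List<String> drain() {
        List<String> dead = new ArrayList<>();
        Reference<?> r;
        while ((r = queue.poll()) != null) {
            TrackedRef ref = (TrackedRef) r;  // the queue only knows Reference<?>; we know the subclass
            pending.remove(ref);
            ref.clear();                      // needed for phantom references before Java 9
            dead.add(ref.memento);
        }
        return dead;
    }
}
```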
For what it's worth, it looks like using PhantomReferences puts some burden on you:
Unlike soft and weak references, phantom references are not automatically cleared by the garbage collector as they are enqueued. An object that is reachable via phantom references will remain so until all such references are cleared or themselves become unreachable.
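so you'd have to clear() the references yourself in order for the memory to be reclaimed (why it is useful to have to do this yourself rather than letting the JVM do it is beyond me). Note that this quote describes Java 8 and earlier; since Java 9, phantom references are cleared automatically by the garbage collector as they are enqueued, just like soft and weak references.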
Your question has caused me to look into it a little more, and I found this very well written explanation and examples of all the reference types. He even talks about some (tenuous) uses of phantom references.
http://weblogs.java.net/blog/2006/05/04/understanding-weak-references

Does circular GC work in a map?

I have a User object which strongly refers to a Data object.
If I create a Map<Data, User> (with Guava's MapMaker) with weak keys, such a key would only be removed if it's not referenced anywhere else. However, it is always referred to by the User object it maps to, which in turn is only removed from the map when the Data key is removed, i.e. never, unless the GC's circular-reference detection also works across a map (I hope you understand what I mean :P).
Will Users+Datas be garbage collected if they're no longer used elsewhere in the application, or do I need to specify weak values as well?
The GC doesn't detect circular references because it doesn't need to.
The approach it takes is to keep all objects that are strongly reachable from root nodes, e.g. thread stacks. This way, objects that are not strongly reachable (whether or not they are part of a cycle) are collected.
EDIT: This may help explain the "myth"
http://www.javacoffeebreak.com/articles/thinkinginjava/abitaboutgarbagecollection.html
Reference counting is commonly used to explain one kind of garbage collection but it doesn't seem to be used in any JVM implementations.
This is an interesting link http://www.ibm.com/developerworks/library/j-jtp10283/
In documentation you see:
weakKeys()
Specifies that each key (not value) stored in the map should be wrapped in a WeakReference (by default, strong references are used).
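Since the key is weakly referenced, it will be collected, but only once nothing reachable from the map refers to it strongly. The map holds its values strongly, so a User value that strongly references its Data key keeps that key alive.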
http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/MapMaker.html
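Relevant to the question: the WeakHashMap Javadoc warns that its values are held by ordinary strong references, so a value that strongly refers to its own key (as User refers to Data here) keeps that key from being discarded. A hedged Java sketch of one way around it, using plain WeakHashMap instead of Guava and the question's Data and User as placeholder classes, is to hold the value weakly as well; with MapMaker, the analogous setting would be adding weakValues() alongside weakKeys().

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

class Data { }

class User {
    final Data data;               // strong User -> Data edge, as in the question
    User(Data data) { this.data = data; }
}

/** Weak keys AND weak values: neither side of the map keeps a Data or a User alive. */
class UserIndex {
    private final Map<Data, WeakReference<User>> byData = new WeakHashMap<>();

    synchronized void put(User u) {
        byData.put(u.data, new WeakReference<>(u));
    }

    /** Returns the user, or null if it was never added or has since been collected. */
    synchronized User get(Data d) {
        WeakReference<User> ref = byData.get(d);
        return ref == null ? null : ref.get();
    }
}
```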

Can someone explain the difference between Strong, Soft, Weak and Phantom references and the usage of it? [duplicate]

This question already has answers here:
What's the difference between SoftReference and WeakReference in Java?
(12 answers)
Closed 6 years ago.
I have been trying to understand the difference between the different kinds of references, but the theory alone doesn't help me visualize them.
Could anyone please explain in brief the different references?
An example for each would do better.
Another good article on the topic:
Java Reference Objects or How I Learned to Stop Worrying and Love OutOfMemoryError, with nice diagrams
Extract:
As you might guess, adding three new optional states to the object life-cycle diagram makes for a mess.
Although the documentation indicates a logical progression from strongly reachable through soft, weak, and phantom, to reclaimed, the actual progression depends on what reference objects your program creates.
If you create a WeakReference but don't create a SoftReference, then an object progresses directly from strongly-reachable to weakly-reachable to finalized to collected.
It's also important to remember that not all objects are attached to reference objects — in fact, very few of them should be.
A reference object is a layer of indirection: you go through the reference object to reach the referred object, and clearly you don't want that layer of indirection throughout your code.
Most programs, in fact, will use reference objects to access a relatively small number of the objects that the program creates.
References and Referents
A reference object provides a layer of indirection between your program code and some other object, called the referent.
Each reference object is constructed around its referent, and provides a get() method to access the referent. Once you create a reference, you cannot change its referent. Once the referent has been collected, the get() method returns null.
Even more examples: Java Programming: References' Package
Diagram: http://www.pabrantes.net/blog/space/start/2007-09-16/1/referenceTypes.png
Case 1: This is the regular case where Object is said to be strongly reachable.
Case 2: There are two paths to Object, so the strongest one is chosen, which is the one with the strong reference hence the object is strongly reachable.
Case 3: Once again there are two paths to the Object, the strongest one is the Weak Reference (since the other one is a Phantom Reference), so the object is said to be weakly reachable.
Case 4: There is only one path and the weakest link is a weak reference, so the object is weakly reachable.
Case 5: Only one path, and the weakest link is the phantom reference, hence the object is phantom reachable.
Case 6: There are now two paths and the strongest path is the one with a soft reference, so the object is now said to be softly reachable.
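The rule behind these six cases (a path is only as strong as its weakest link, and an object is as reachable as its strongest path) is a bottleneck-path computation. A toy Java sketch on a labeled graph, with hypothetical names; it collapses each reference object and its referent into a single labeled edge, which the real JVM does not do.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

enum Strength { PHANTOM, WEAK, SOFT, STRONG }   // declared weakest to strongest

class ReachabilityGraph {
    private static final class Edge {
        final String to;
        final Strength strength;
        Edge(String to, Strength strength) { this.to = to; this.strength = strength; }
    }

    private final Map<String, List<Edge>> edges = new HashMap<>();

    void addEdge(String from, String to, Strength s) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, s));
    }

    /** Strongest level at which 'target' is reachable from 'root', or null if unreachable. */
    Strength reachability(String root, String target) {
        Strength[] levels = Strength.values();
        for (int i = levels.length - 1; i >= 0; i--) {   // try STRONG first
            if (reachableUsing(root, target, levels[i])) return levels[i];
        }
        return null;
    }

    /** Breadth-first search that only follows edges at least as strong as 'min'. */
    private boolean reachableUsing(String root, String target, Strength min) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.add(root);
        while (!work.isEmpty()) {
            String node = work.poll();
            if (node.equals(target)) return true;
            if (!seen.add(node)) continue;
            for (Edge e : edges.getOrDefault(node, Collections.emptyList())) {
                if (e.strength.compareTo(min) >= 0) work.add(e.to);
            }
        }
        return false;
    }
}
```

For example, with a strong path to an intermediate object followed by a weak link, plus a soft link followed by a phantom link, the object comes out weakly reachable, matching the reasoning in Case 3.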
An article explaining these types of references (including examples): Understanding Weak References - weblogs.java.net (From the Web Archive)
There's a really simple rule:
strongly-referenced objects are the ordinary case, created by code like Object a = new Object(). The referenced Object is not garbage as long as the reference (a, above) is "reachable". Hence anything that has no reachable strong reference can be deemed garbage.
So then we look at the non-strong reference types:
weakly-referenced objects will probably get collected by the JVM as soon as they become eligible for GC (and the WeakReference cleared). A weak reference to a would look like new WeakReference<Object>(a). Weak references are useful in the case that you want a cache whereby the data is only needed if the keys exist as strongly-reachable elsewhere (e.g. HttpSessions)
softly-referenced objects will probably hang around in the JVM until it absolutely needs to recover memory. Soft references are useful for caches where the values are long-lived but can be collected if necessary.
I'm never too sure about phantom ones!
