How does a weak hash map know to garbage-collect an object?

How does a weak hash map know to garbage-collect an object? - java

I recently found out about the WeakHashMap data structure in Java.
However, I don't understand what it means by it garbage-collects a mapping when it is no longer in ordinary use. How does the data structure know I will no longer use a key in my program? What if I don't refer to a key for a long time?

However, I don't understand what it means by it garbage-collects a mapping when it is no longer in ordinary use.
OK. Under normal circumstances, when the garbage collector runs it will remove objects that your program can no longer use. The technical term is an "unreachable object", and it means that the program execution has no way to get hold of a reference to the object any more. An object that is unreachable, may be collected in the next GC cycle ... or not. Either way, it is no longer the application's concern.
In this case, the WeakHashMap uses a special class called WeakReference to refer to the keys1. A weak reference is an object that acts sort of like an indirect pointer (a pointer to an object holding a pointer). It has the interesting property that the garbage collector is allowed to break the reference; i.e. replace the reference it contains with null. And the rule is that a weak reference to an object will be broken when the GC notices that the object is no longer reachable via a chain of normal (strong) or soft references2.
The phrase "no longer in ordinary use" really means that the key object is no longer strongly or softly reachable; i.e. via a chain of strong and / or soft references.
How does the data structure know I will no longer use a key in my program?
The WeakHashmap doesn't do it. Rather, it is the GC that notices that the key is not strongly reachable.
As part of its normal traversal, the GC will find and mark all strongly reachable objects. Then it goes through all of the WeakReference objects and checks to see if
the objects they refer to have been marked, and breaks them if they have not. (Or something like that ... I've never looked at the actual GC implementation. And it is complicated by the fact that it has to deal with SoftReference and PhantomReference objects as well.)
The only involvement that WeakHashmap has is that:
it created and uses WeakReference objects for the keys, and
it expunges hash table entries whose key WeakReferences have been cleared by the GC.
What if I don't refer to a key for a long time?
The criterion for deciding that a weak reference should be broken is not time based.
But it is possible that timing influences whether not a key is removed. For instance, a key could 1) cease to be strongly reference, 2) be retrieved from the map, and 3) be assigned to a reachable variable making it strongly referenced once again. If the GC doesn't run during the window in which the key is not strongly reachable, the key and its associated value will stay in the map. (Which is what you'd want to happen ...)
1 - Implementation detail: in recent Java releases, the weak references actually refer to the map's internal Entry objects rather than the keys. This allows broken references to be purged from the map more efficiently. Look at the code for details.
2 - Soft references are a kind of reference that the GC is allowed to break if there is a shortage of heap memory.

Java has a system of references where the language can tell your code whether or not some object is still in use. You can use references to detect when some object has been specifically identified as no longer in use or usable, and can then take action accordingly. This tutorial covers references in some depth, in case you're curious how to use them.
Internally, WeakHashMap likely uses these references to automatically detect when a given key cannot be used any more. The implementation can then remove those objects from the hash table so that they no longer take up any space.
Hope this helps!

The JVM garbage collects in this order:
Unreferenced objects
Objects whose only reference is a WeakReference
Objects whose only reference is a SoftReference
Normally, the garbage collector only garbage collects unreferenced objects.
A weak reference to an object does not count as a reference as far as the garbage collector is concerned. The garbage collector may, or may not collect them. Typically, it will not collect them unless memory is running low, but there are no guarantees.
If the JVM is about run out of memory, the garbage collector will collect softly referenced objects. All weak referenced objects will be garbage collected before any soft referenced objects are garbage collected.
From the javadoc of SoftReference:
All soft references to softly-reachable objects are guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError

I read your question as asking about the specific wording "regular use" and so assume you already know about strong and weak references. Regular use refers to the case where a weak hashmap contains keys that are also referenced (strongly) by some other data structure(s). The existence of at least one strong reference to a key is "regular use." The key can't be garbage collected as long as this other data structure references it. When the other data structure is no longer reachable (pointers to it no longer exist), the key also becomes unreachable. It's no longer in regular use because the only reference to it is the weak one in the mapping. The garbage collector can eventually reclaim it, and the mapping disappears.
This comes up when you'd like to extend a type C by subclassing but can't: for example when C is an interface with many implementations. You can work around this problem by using a weak hashmap with key type C and new fields to be added wrapped in a new class E. Whenever you create a C instance, you also create an E instance and add the pair to the map, which is then used to access the new fields during the life of the C intance. When the C instance becomes garbage, the mapping and E instance do also. This is automatic because the hashmap is weak. If it weren't, it would have to be cleaned up manually in the same manner you have to explicitly free storage in languages with no garbage collector.

Related

Java - HashMap and WeakHashMap references used in Application

Just trying to understand something from GC viewpoint
public Set<Something> returnFromDb(String id) {
LookupService service = fromSomewhere();
Map<String,Object> where = new WeakHashMap<>() {}
where.put("id",id);
return service.doLookupByKVPair(where); // where doesn't need to be serializable
}
what I understand is that once this method call leaves the stack, there is no reference to where regardless of using HashMap or WeakHashMap - but since weak reference is weakly reachable wouldn't this be GCd faster? But if the method call leaves the stack, then there is no reachable reference anyway.
I guess the real question that I have is - "Would using WeakHashMap<> here actually matters at all" - I think it's a "No, because the impact is insignificant" - but a second answer wouldn't hurt my knowledge.

When you use a statement like where.put("id",id); you’re associating a value with a String instance created from a literal, permanently referenced by the code containing it. So the weak semantic of the association is pointless, as long as the code is reachable, this specific key object will never get garbage collected.
When the entire WeakHashMap becomes unreachable, the weak nature of the references has no impact on the garbage collection, as unreachable objects have in general. As discussed in this answer, the garbage collection performance mainly depends on the reachable objects, not the unreachable ones.
Keep in mind the documentation:
The relationship between a registered reference object and its queue is one-sided. That is, a queue does not keep track of the references that are registered with it. If a registered reference becomes unreachable itself, then it will never be enqueued. It is the responsibility of the program using reference objects to ensure that the objects remain reachable for as long as the program is interested in their referents.
In other words, a WeakReference has no impact when it is unreachable, as it will be treated like any other garbage, i.e. not treated at all.
When you have a strong reference to a WeakHashMap while a garbage collection is in progress, it will reduce the performance, as the garbage collector has to keep track of the encountered reachable WeakReference instances, to clear and enqueue them if their referent has not been encountered and marked as strongly reachable. This additional effort is the price you have to pay for allowing the earlier collection of the keys and the subsequent cleanup, which is needed to remove the strongly referenced value.
As said, when, like in your example, the key will never become garbage collected, this additional effort is wasted. But if no garbage collection happens while the WeakHashMap is used, there will be no impact, as said, as the collection of an entire object graph happens at once, regardless of what kind of objects are in the garbage.

Garbage collection of List items

I have a Java ArrayList with references to other objects stored within the List.
If I mark the List as null then when it is garbage collected, will all the stored items in it also get claimed by GC (assuming there are no other references to them)?
thanks,
Jakao

If you know it (can access it from code / have a reference to it anywhere) it´s there (and won´t ever be collected), if you don´t it might be gone. When is it gone? None of your concern, that´s the point of having a garbage collector.

Assuming there are no other references to them, they will be GC.
In java object reference is an abstract concept, you should not worry about how the JVM manages object storage, but if you interested in memory managment in java, I suggest you deepen the arguments of the weak and soft reference and memory pools.

android - java - WeakReferences with an ArrayList?

I know that with a WeakReference, if I make a WeakReference to something that unless there's a direct reference to it that it will be Garbage Collected with the next GC cycle. My question becomes, what if I make an ArrayList of WeakReferences?
For example:
ArrayList<WeakReference<String>> exArrayList;
exArrayList = new ArrayList<WeakReference<String>>();
exArrayList.add(new WeakReference<String>("Hello"));
I can now access the data with exArrayList.get(0).get().
My question becomes: This is WeakReference data, will the data located at exArrayList.get(0) be GC'd with the next GC cycle? (even IF I don't make another direct reference to it) or will this particular reference stick around until the arraylist is emptied? (eg: exArrayList.clear();).
If this is a duplicate I haven't found it with my keywords in google.

exArrayList.add(new WeakReference<String>("Hello")); is a bad example because String literals are never GC-ed
if it were e.g. exArrayList.add(new WeakReference<Object>(new Object())); then after a GC the object would be GC-ed, but exArrayList.get(0) would still return WeakReference, though exArrayList.get(0).get() would return null

The data at exArrayList.get(0) is the WeakReference. It is not by itself a weak reference, so it will not be collected...
BUT the object referenced by exArrayList.get(0) is weakly referenced, so it might be GCed at any time (of course, that requires that there are no strong references to that object).
So
data.get(0) won't become null, but data.get(0).get() might become.
In other words, the list does not references the weak referenced object but the weak reference itself.

This is a bad idea as the other posters explained above (reference objects not freed). Use a WeakHashMap with the objects as keys and some dummy values ("" or Boolean.TRUE or similar).

Java: difference between strong/soft/weak/phantom reference

I have read this article about different types of references in Java (strong, soft, weak, phantom), but I don't really understand it.
What is the difference between these reference types, and when would each type be used?

Java provides two different types/classes of Reference Objects: strong and weak. Weak Reference Objects can be further divided into soft and phantom.
Strong
Weak
soft
phantom
Let's go point by point.
Strong Reference Object
StringBuilder builder = new StringBuilder();
This is the default type/class of Reference Object, if not differently specified: builder is a strong Reference Object. This kind of reference makes the referenced object not eligible for GC. That is, whenever an object is referenced by a chain of strong Reference Objects, it cannot be garbage collected.
Weak Reference Object
WeakReference<StringBuilder> weakBuilder = new WeakReference<StringBuilder>(builder);
Weak Reference Objects are not the default type/class of Reference Object and to be used they should be explicitly specified like in the above example. This kind of reference makes the reference object eligible for GC. That is, in case the only reference reachable for the StringBuilder object in memory is, actually, the weak reference, then the GC is allowed to garbage collect the StringBuilder object. When an object in memory is reachable only by Weak Reference Objects, it becomes automatically eligible for GC.
Levels of Weakness
Two different levels of weakness can be enlisted: soft and phantom.
A soft Reference Object is basically a weak Reference Object that remains in memory a bit more: normally, it resists GC cycle until no memory is available and there is risk of OutOfMemoryError (in that case, it can be removed).
On the other hand, a phantom Reference Object is useful only to know exactly when an object has been effectively removed from memory: normally they are used to fix weird finalize() revival/resurrection behavior, since they actually do not return the object itself but only help in keeping track of their memory presence.
Weak Reference Objects are ideal to implement cache modules. In fact, a sort of automatic eviction can be implemented by allowing the GC to clean up memory areas whenever objects/values are no longer reachable by strong references chain. An example is the WeakHashMap retaining weak keys.

Weak Reference :
A weak reference, simply put, is a reference that isn't strong enough to force an object to remain in memory. Weak references allow you to leverage the garbage collector's ability to determine reachability for you, so you don't have to do it yourself.
Soft Reference :
A soft reference is exactly like a weak reference, except that it is less eager to throw away the object to which it refers. An object which is only weakly reachable (the strongest references to it are WeakReferences) will be discarded at the next garbage collection cycle, but an object which is softly reachable will generally stick around for a while.
Phantom Reference :
A phantom reference is quite different than either SoftReference or WeakReference. Its grip on its object is so tenuous that you can't even retrieve the object -- its get() method always returns null. The only use for such a reference is keeping track of when it gets enqueued into a ReferenceQueue, as at that point you know the object to which it pointed is dead.
This text was extracted from: https://weblogs.java.net/blog/2006/05/04/understanding-weak-references

This article can be super helpful to understand strong, soft, weak and phantom references.
To give you a summary,
If you have a strong reference to an object, then the object can never be collected/reclaimed by GC (Garbage Collector).
If you only have weak references to an object (with no strong references), then the object will be reclaimed by GC in the very next GC cycle.
If you only have soft references to an object (with no strong references), then the object will be reclaimed by GC only when JVM runs out of memory.
We create phantom references to an object to keep track of when the object gets enqueued into the ReferenceQueue. Once you know that you can perform fine-grained finalization. (This would save you from accidentally resurrecting the object as phantom-reference don't give you the referrant). I'd suggest you reading this article to get in-depth detail about this.
So you can say that, strong references have ultimate power (can never be collected by GC)
Soft references are powerful than weak references (as they can escape GC cycle until JVM runs out of memory)
Weak references are even less powerful than soft references (as they cannot escape any GC cycle and will be reclaimed if object have no other strong reference).
Restaurant Analogy
Waiter - GC
You - Object in heap
Restaurant area/space - Heap space
New Customer - New object that wants table in restaurant
Now if you are a strong customer (analogous to strong reference), then even if a new customer comes in the restaurant or what so ever happnes, you will never leave your table (the memory area on heap). The waiter has no right to tell you (or even request you) to leave the restaurant.
If you are a soft customer (analogous to soft reference), then if a new customer comes in the restaurant, the waiter will not ask you to leave the table unless there is no other empty table left to accomodate the new customer. (In other words the waiter will ask you to leave the table only if a new customer steps in and there is no other table left for this new customer)
If you are a weak customer (analogous to weak reference), then waiter, at his will, can (at any point of time) ask you to leave the restaurant :P

The simple difference between SoftReference and WeakReference is provided by Android Developer.
The difference between a SoftReference and a WeakReference is the point of time at which the decision is made to clear and enqueue the reference:
A SoftReference should be cleared and enqueued as late as possible,
that is, in case the VM is in danger of running out of memory.
A WeakReference may be cleared and enqueued as soon as is known to be
weakly-referenced.

The three terms that you have used are mostly related to Object's eligibility to get Garbage collected .
Weak Reference :: Its a reference that is not strong enough to force the object to remain in memory . Its the garbage collector's whims to collect that object for garbage collection.
You can't force that GC not to collect it .
Soft Reference :: Its more or less same like the weak reference . But you can say that it holds the object a bit more strongly than the weak reference from garbage collection.
If the Garbage collectors collect the weak reference in the first life cycle itself, it will collect the soft reference in the next cycle of Garbage collection.
Strong Reference :: Its just opposite to the above two kind of references .
They are less like to get garbage collected (Mostly they are never collected.)
You can refer to the following link for more info :
http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/ref/Reference.html

Strong References
These are your regular object references which we code daily:
Employee emp = new Employee();
The variable “emp” holds a strong reference to an Employee object and objects that are reachable through any chain of strong references are not eligible for garbage collection.
Usually, this is what you want but not always. Now suppose we are fetching lots of employees from database in a collection or map, and we need to do a lot of processing on them regularly, So in order keep performance we will keep them in the cache.
As far as this is good but now we need different data and we don’t need those Employee objects and these are not referenced from anywhere except the cache. Which is causing a memory leak because these objects are not in use but still not eligible for the garbage collection and we cannot remove those objects from cache because we don’t have reference to them?
So here either we need to empty the entire cache manually which is tedious or we could use other kind references e.g. Weak References.
Weak References
A weak reference does not pin an object into memory and will be GC’d in next GC cycle if not referenced from other references. We can use WeakReference class which is provided by Java to create above kind of caches, which will not store objects which are not referenced from somewhere else.
WeakReference<Cache> cache = new WeakReference<Cache>(data);
To access data you need to call cache.get(). This call to get may return null if the weak reference was garbage collected: you must check the returned value to avoid NPEs.
Java provides collections that use weak references e.g., the WeakHashMap class stores keys (not values) as weak references. If the key is GC’d then the value will automatically be removed from the map too.
Since weak references are objects too we need a way to clean them up (they’re no longer useful when the object they were referencing has been GC’d). If you pass a ReferenceQueue into the constructor for a weak reference then the garbage collector will append that weak reference to the ReferenceQueue before they’re finalized or GC’d. You can periodically process this queue and deal with dead references.
Soft References
A SoftReference is like a WeakReference but it is less likely to be garbage collected. Soft references are cleared at the discretion of the garbage collector in response to memory demand. The virtual machine guarantees that all soft references to softly reachable objects will have been cleared before it would ever throw an OutOfMemoryError.
Phantom References
Phantom references are the weakest of all reference types, calling get on them will always return null. An object is phantomly referenced after it has been finalized, but before its allocated memory has been reclaimed, As opposed to weak references which are enqueued before they’re finalized or GC’d Phantom references are rarely used.
So how are they useful? When you construct a phantom reference you must always pass in a ReferenceQueue. This indicates that you can use a phantom reference to see when your object is GC’d.
Hey, so if weak references are enqueued when they’re considered finalize but not yet GC’d we could create a new strong reference to the object in the finalizer block and prevent the object being GC’d. Yep, you can but you probably shouldn’t do this. To check for this case the GC cycle will happen at least twice for each object unless that object is reachable only by a phantom reference. This is why you can run out of heap even when your memory contains plenty of garbage. Phantom references can prevent this.
You can read more on my article Types of References in Java(Strong, Soft, Weak, Phantom).

4 degrees of reference - Strong, Weak, Soft, Phantom
Strong - is a kind of reference, which makes the referenced object not
eligible for GC. builder classes. eg - StringBuilder
Weak - is a reference which is eligible for GC.
Soft - is a kind of reference whose object is eligible for GC until memory is avaiable. Best for image cache. It will hold them till the memory is available.
Phantom - is a kind of reference whose object is directly eligible for GC. Used only to know when an object is removed from memory.
uses:
Allows you to identify when an object is exactly removed from memory.
when finalize() method is overloaded, then GC might not happen in timely fashion for GC eligible objects of the two classes. So phantom reference makes them eligible for GC before finalize(), is why you can get OutOfMemoryErrors even when most of the heap is garbage.
Weak references are ideal to implement the cache modules.

How does Java Garbage collector handle self-reference?

Hopefully a simple question. Take for instance a Circularly-linked list:
class ListContainer
{
private listContainer next;
<..>
public void setNext(listContainer next)
{
this.next = next;
}
}
class List
{
private listContainer entry;
<..>
}
Now since it's a circularly-linked list, when a single elemnt is added, it has a reference to itself in it's next variable. When deleting the only element in the list, entry is set to null. Is there a need to set ListContainer.next to null as well for Garbage Collector to free it's memory or does it handle such self-references automagically?

Garbage collectors which rely solely on reference counting are generally vulnerable to failing to collection self-referential structures such as this. These GCs rely on a count of the number of references to the object in order to calculate whether a given object is reachable.
Non-reference counting approaches apply a more comprehensive reachability test to determine whether an object is eligible to be collected. These systems define an object (or set of objects) which are always assumed to be reachable. Any object for which references are available from this object graph is considered ineligible for collection. Any object not directly accessible from this object is not. Thus, cycles do not end up affecting reachability, and can be collected.
See also, the Wikipedia page on tracing garbage collectors.

Circular references is a (solvable) problem if you rely on counting the references in order to decide whether an object is dead. No java implementation uses reference counting, AFAIK. Newer Sun JREs uses a mix of several types of GC, all mark-and-sweep or copying I think.
You can read more about garbage collection in general at Wikipedia, and some articles about java GC here and here, for example.

The actual answer to this is implementation dependent. The Sun JVM keeps track of some set of root objects (threads and the like), and when it needs to do a garbage collection, traces out which objects are reachable from those and saves them, discarding the rest. It's actually more complicated than that to allow for some optimizations, but that is the basic principle. This version does not care about circular references: as long as no live object holds a reference to a dead one, it can be GCed.
Other JVMs can use a method known as reference counting. When a reference is created to the object, some counter is incremented, and when the reference goes out of scope, the counter is decremented. If the counter reaches zero, the object is finalized and garbage collected. This version, however, does allow for the possibility of circular references that would never be garbage collected. As a safeguard, many such JVMs include a backup method to determine which objects actually are dead which it runs periodically to resolve self-references and defrag the heap.

As a non-answer aside (the existing answers more than suffice), you might want to check out a whitepaper on the JVM garbage collection system if you are at all interested in GC. (Any, just google JVM Garbage Collection)
I was amazed at some of the techniques used, and when reading through some of the concepts like "Eden" I really realized for the first time that Java and the JVM actually could beat C/C++ in speed. (Whenever C/C++ frees an object/block of memory, code is involved... When Java frees an object, it actually doesn't do anything at all; since in good OO code, most objects are created and freed almost immediately, this is amazingly efficient.)
Modern GC's tend to be very efficient, managing older objects much differently than new objects, being able to control GCs to be short and half-assed or long and thorough, and a lot of GC options can be managed by command line switches so it's actually useful to know what all the terms actually refer to.
Note: I just realized this was misleading. C++'s STACK allocation is very fast--my point was about allocating objects that are able to exist after the current routine has finished (which I believe SHOULD be all objects--it's something you shouldn't have to think about if you are going to think in OO, but in C++ speed may make this impractical).
If you are only allocating C++ classes on the stack, it's allocation will be at least as fast as Java's.

Java collects any objects that are not reachable. If nothing else has a reference to the entry, then it will be collected, even though it has a reference to itself.

yes Java Garbage collector handle self-reference!
How?
There are special objects called called garbage-collection roots (GC roots). These are always reachable and so is any object that has them at its own root.
A simple Java application has the following GC roots:
Local variables in the main method
The main thread
Static variables of the main class
To determine which objects are no longer in use, the JVM intermittently runs what is very aptly called a mark-and-sweep algorithm. It works as follows
The algorithm traverses all object references, starting with the GC
roots, and marks every object found as alive.
All of the heap memory that is not occupied by marked objects is
reclaimed. It is simply marked as free, essentially swept free of
unused objects.
So if any object is not reachable from the GC roots(even if it is self-referenced or cyclic-referenced) it will be subjected to garbage collection.

Simply, Yes. :)
Check out http://www.ibm.com/developerworks/java/library/j-jtp10283/
All JDKs (from Sun) have a concept of "reach-ability". If the GC cannot "reach" an object, it goes away.
This isn't any "new" info (your first to respondents are great) but the link is useful, and brevity is something sweet. :)

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.