Why do we need weak references in Java?

I understand that weak references are at the mercy of the garbage collector, and that we cannot guarantee a weakly referenced object will still exist. I could not see a need for weak references, but surely there must be a reason.
Why do we need weak references in Java?
What are some practical uses of weak references in Java? If you can share how you used them in your project, that would be great!

It's actually quite often a bad idea to use weak hash maps. For one, they are easy to get wrong, but even worse, they are usually used to implement some kind of cache.
What this means is the following: your program runs fine with good performance for some time; under stress it allocates more and more memory (more requests = more memory pressure = probably more cache entries), which then leads to a GC.
Now suddenly, while your system is under high stress, you not only get the GC but also lose your whole cache, just when you need it the most. This problem is no fun, so you at least have to use a reasonably sized, hard-referenced LRU cache to mitigate it. You can still use weak references then, but only as an additional help.
I've seen more than one project hit by that "bug".
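The mitigation described above can be sketched roughly like this. It is only an illustration, not code from any of the projects mentioned: the class name TwoTierCache, its methods, and the capacity parameter are all made up for the example. A bounded LinkedHashMap in access order serves as the hard-referenced LRU tier, and entries evicted from it are demoted to weak references instead of being dropped outright.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical two-tier cache: a bounded strong LRU tier plus a weak overflow tier. */
final class TwoTierCache<K, V> {
    private final Map<K, V> strongTier;
    private final Map<K, WeakReference<V>> weakTier = new HashMap<>();

    TwoTierCache(final int strongCapacity) {
        // accessOrder = true makes the map evict the least recently used entry first.
        this.strongTier = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > strongCapacity) {
                    // Demote instead of dropping: the GC decides when the value really dies.
                    weakTier.put(eldest.getKey(), new WeakReference<>(eldest.getValue()));
                    return true;
                }
                return false;
            }
        };
    }

    synchronized void put(K key, V value) {
        weakTier.remove(key);
        strongTier.put(key, value);
    }

    synchronized V get(K key) {
        V value = strongTier.get(key);
        if (value != null) {
            return value;
        }
        WeakReference<V> ref = weakTier.remove(key);
        value = (ref == null) ? null : ref.get();
        if (value != null) {
            strongTier.put(key, value); // promote back into the strong tier
        }
        return value;
    }

    synchronized int strongSize() {
        return strongTier.size();
    }
}
```

Even if a GC clears every weak entry under load, the most recently used strongCapacity entries survive. Note that this sketch never purges cleared entries from the weak tier unless they are looked up; a real implementation would want a ReferenceQueue for that.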

The most "unambiguously sensible" use of weak references I've seen is Guava's Striped, which does lock striping. Basically, if no threads currently hold a reference to a lock variable, then there's no reason to keep that lock around, is there? That lock might have been used in the past, but it's not necessary now.
If I had to give a rule for when to use which, I'd say that soft references are preferable when you might use something in the future but could recompute it if you really needed to; weak references are especially preferable when you can be sure the value will never be needed once nothing else references it. (For example, if you use the default reference equality for a particular equals(Object) implementation for a map key type, and the object stops being referenced anywhere else, then you can be sure that entry will never be referenced again.)

The main reason for me to use weak references is indirectly, through a WeakHashMap.
You might want to store a collection of objects in a Map (as a cache or for any other reason), but don't want them to be in memory for as long as the Map exists, especially if the objects are relatively large.
By using a WeakHashMap with those objects as keys, you can make sure that the reference from the Map isn't the only thing keeping an object in memory, since the object can be garbage collected once no other references to it exist.

Say you need to keep some information for as long as an object is referenced, but you don't know when the object will go away. You can use a weak reference to keep track of the information.

Yes, and it has a good impact.
For the "widget serial number" problem above, the easiest thing to do is use the built-in WeakHashMap class. WeakHashMap works exactly like HashMap, except that the keys (not the values!) are referred to using weak references. If a WeakHashMap key becomes garbage, its entry is removed automatically. This avoids the pitfalls I described and requires no changes other than the switch from HashMap to a WeakHashMap. If you're following the standard convention of referring to your maps via the Map interface, no other code needs to even be aware of the change.
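A minimal sketch of that idea, assuming a hypothetical WidgetSerials helper (the class, its method names, and the use of plain Object as the widget type are invented for illustration; the quoted article's own code is not shown here):

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical side table that attaches a serial number to objects we do not own. */
final class WidgetSerials {
    // Keys are held weakly; values are held strongly, so a value must never reference its key.
    private static final Map<Object, Integer> SERIALS =
            Collections.synchronizedMap(new WeakHashMap<>());
    private static int next = 1;

    /** Returns the serial for this widget, assigning a new one on first use. */
    static synchronized int serialOf(Object widget) {
        return SERIALS.computeIfAbsent(widget, w -> next++);
    }

    /** Number of widgets currently tracked; shrinks some time after widgets are collected. */
    static int trackedCount() {
        return SERIALS.size();
    }
}
```

Since the map declaration is the only place that names WeakHashMap, swapping it for a HashMap (or back) touches one line. Keep in mind that WeakHashMap compares keys with equals/hashCode, so it suits keys with identity semantics; interned strings or cached boxed values as keys may never be collected.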

Weak reference objects are needed on the JVM platform as a safeguard against memory leaks.
As Java developers should know, Java can leak memory, more than one might expect. This is particularly true when an object is no longer used but some collection still strongly references that instance: a very simple but recurrent example of a memory leak, because that memory will not be deallocated as long as the strong reference exists.
In the above case, holding a weak reference in the collection ensures that, once no chain of strong references remains and only weak chains lead to the instance, it becomes eligible for garbage collection.
In my opinion, all features provided by the Java platform are useful to some extent: a very skilled programmer writing very high quality code can make Java as fast and reliable as C++.

Decoupling from pub-sub or event bus
A WeakReference is good when you want to let an object head for garbage collection without having to gracefully remove itself from other objects holding a reference to it.
In scenarios such as publish-subscribe or an event bus, a collection of references to subscribing objects is held. Those references should be weak, so that the subscriber can conveniently go out of scope without bothering to unsubscribe. The subscriber can just “disappear” after all other places in the app have released their strong reference. At that point, there is no need for the subscription list or event bus to keep hanging on to the subscriber. The use of WeakReference allows the object to continue on its way into oblivion.
The subscribing object may have been subscribed without its knowledge, submitted to the pub-sub or event bus by some other 3rd-party object. Coordinating a call to unsubscribe later in the life-cycle of the subscriber can be quite cumbersome. So letting the subscriber fade away without formally unsubscribing may greatly simplify your code, and can avoid difficult bugs if that unsubscribing coordination were to fail.
Here is an example of a thread-safe set of weak references, as seen in this Answer and this Answer.
this.subscribersSet =
        Collections.synchronizedSet(
                Collections.newSetFromMap(
                        new WeakHashMap<>()
                )
        );
Note that the entry in the set is not actually removed until after garbage collection actually executes, as discussed in the linked Answer above. While it is merely a candidate for garbage collection (no remaining strong references held anywhere), the item remains in the set.
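To show how such a set might be used, here is a rough sketch of an event bus built around it. WeakEventBus and its methods are hypothetical names chosen for this example, not an existing library API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.WeakHashMap;
import java.util.function.Consumer;

/** Hypothetical event bus that never keeps a subscriber alive on its own. */
final class WeakEventBus<E> {
    private final Set<Consumer<E>> subscribers =
            Collections.synchronizedSet(
                    Collections.newSetFromMap(new WeakHashMap<Consumer<E>, Boolean>()));

    void subscribe(Consumer<E> subscriber) {
        subscribers.add(subscriber);
    }

    /** Delivers the event to every subscriber still alive and returns how many received it. */
    int publish(E event) {
        List<Consumer<E>> snapshot;
        synchronized (subscribers) { // iteration over a synchronizedSet must be locked manually
            snapshot = new ArrayList<>(subscribers);
        }
        for (Consumer<E> subscriber : snapshot) {
            subscriber.accept(event);
        }
        return snapshot.size();
    }
}
```

One consequence of this design: whoever subscribes must keep its own strong reference to the subscriber. A lambda passed straight into subscribe(...) with no other reference may be collected at any time and silently stop receiving events.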

It is required to make Java garbage collection deterministic. (From the slightly satirical point of view with some truth to it).

Related

Do callback on Java object just BEFORE it's garbage collected

Backstory: So I had this great idea, right? Sometimes you're collecting a massive amount of data, and you don't need to access all of it all the time, but you also may not need it after the program has finished, and you don't really want to muck around with database tables, etc. What if you had a library that would silently and automatically serialize objects to disk when you're not using them, and silently bring them back when you needed them? So I started writing a library; it has a number of collections like "DiskList" or "DiskMap" where you put your objects. They keep your objects via WeakReferences. While you're still using a given object, it has strong references to it, so it stays in memory. When you stop using it, the object is garbage collected, and just before that happens, the collection serializes it to disk (*). When you want the object again, you ask for it by index or key, like usual, and the collection deserializes it (or returns it from its inner cache, if it hasn't been GCd yet).
(*) See now, this is the sticking point. In order for this to work, I need to be able to be notified JUST BEFORE the object is GCd - after no other references to it exist (and therefore the object can no longer be modified), but before the object is wiped from memory. This is proving difficult. I thought briefly that using a ReferenceQueue would save me, but alas, it returns a Reference, whose referent has thus far always been null.
Is there a way, having been given an arbitrary object, to receive (via callback or queue, etc.) the object after it is ready to be garbage collected, but before it IS garbage collected?
I know (Object).finalize() can basically do that, but I'll have to deal with classes that don't belong to me, and whose finalize methods I can't legitimately override. I'd prefer not to go as arcane as custom classloaders, bytecode manipulation, or reflection, but I will if I have to.
(Also, if you know of existing libraries that do transparent disk caching, I'd look favorably on that, though my requirements on such a library would be fairly stringent.)

You can look for a cache that supports "write behind caching" and tiering. Notable products would be EHCache, Hazelcast, Infinispan.
Or you can construct something by yourself with a cache and a time to idle expiry.
Then, the cache access would be "the usage" of the object.
Is there a way, having been given an arbitrary object, to receive (via callback or queue, etc.) the object after it is ready to be garbage collected, but before it IS garbage collected?
This interferes heavily with garbage collection. Chances are high that it will bring down your application or the whole system. What you want to do is start disk I/O, and potentially allocate additional objects, when the system is low on or out of memory. If you manage to get it to work, you'll end up using more heap than before, since the heap must always be extended when the GC kicks in.
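The ReferenceQueue behaviour the question ran into can be seen in a small sketch. KeyedRef and QueueDemo are hypothetical names for this example. A weak reference is cleared before it is enqueued, so the queue can only tell you which reference died; anything you need at that point (a key, a file name) has to be stored on the reference object itself.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

/** Hypothetical reference type that remembers a key, because the referent is gone on dequeue. */
final class KeyedRef<T> extends WeakReference<T> {
    final String key;

    KeyedRef(String key, T referent, ReferenceQueue<? super T> queue) {
        super(referent, queue);
        this.key = key;
    }
}

final class QueueDemo {
    /** Polls the queue once and returns the key of the dequeued reference, or null if empty. */
    static String pollKey(ReferenceQueue<Object> queue) {
        Reference<?> ref = queue.poll();
        if (ref == null) {
            return null;
        }
        // ref.get() is null here: the referent can no longer be reached or serialized.
        return ((KeyedRef<?>) ref).key;
    }
}
```

For experimenting without waiting for a real GC, calling clear() and then enqueue() on a KeyedRef stands in for what the collector does. Either way, the key comes back from the queue but the object does not, which is why the serialize-on-collection design cannot be built on this mechanism.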

Weak Reference and Soft Reference

Weak references allow the GC to collect the referent in the next GC cycle, whereas a soft reference keeps the referent until memory is full; before throwing an OutOfMemoryError, the JVM clears soft references.
Where will we be using these references?
Which reference will be best for implementing caching?
For example: if I use soft references for caching, then the cache will be cleared when memory is full.
But let's suppose I have fetched some database details, put them in memory, and cached them via a soft reference. If I now remove some key/value from memory, it will still be there in the cache. Do we need to use a weak reference in this case? How should the decision be made?

Soft references are used for caching in most cases. You want to keep data in RAM for as long as possible, but it is better to purge the cache than to die with an OutOfMemoryError.
Weak references can be used (for example) to keep extra info about your class. Say you have a class User and you want to store some additional info that should be deleted at the moment the user is deleted (you do not want to do it manually, since that is boilerplate code). So you use a WeakHashMap with User as the key, and when there is no longer any reference to the user, its entry is removed from the map as well.
By the way: in languages with reference counting, weak references are used to prevent reference cycles, but the Java GC removes "islands of isolation", so this usage of weak references does not apply to Java.

Weak references allow the GC to collect the referent in the next GC cycle, whereas a soft reference keeps the referent until memory is full; before throwing an OutOfMemoryError, the JVM clears soft references.
You are reading more into the docs than they actually say. All objects that are softly reachable (which precludes them being strongly reachable) will be cleaned up and released before the VM throws an OutOfMemoryError, but the VM is in no way required to preserve them past the point when they are initially determined to be softly reachable. The docs do not forbid that they be reclaimed in the same GC cycle in which they are found to be softly reachable.
Generally speaking, however, you can suppose that the GC will prefer to process phantom reachable and weakly reachable objects first.
Where will we be using these references? Which reference will be best for implementing caching?
The docs say this:
Soft references are for implementing memory-sensitive caches, weak references are for implementing canonicalizing mappings that do not prevent their keys (or values) from being reclaimed, and phantom references are for scheduling pre-mortem cleanup actions in a more flexible way than is possible with the Java finalization mechanism.
You go on to ask:
For example: if I use soft references for caching, then the cache will be cleared when memory is full. But let's suppose I have fetched some database details, put them in memory, and cached them via a soft reference. If I now remove some key/value from memory, it will still be there in the cache. Do we need to use a weak reference in this case? How should the decision be made?
If you want to build a cache that can discard entries whose keys cease to be strongly reachable (which may mean that those entries can no longer be retrieved), that is squarely in the center of the intended purpose for weak references. The cache internally holds only a weak reference to the key of each entry, and it registers those references with a ReferenceQueue that lets it know when they should be discarded. This is precisely how WeakHashMap works.
If you want to build a cache that can respond to high memory demand by discarding entries, then that is the intended purpose of soft references (to the values); such a cache would work similarly to WeakHashMap, but with use of soft references to the values instead of weak references to the keys. The two could be combined, of course.
Note, by the way, that Reference objects become relevant to GC only when their referents cease to be strongly reachable. In particular, having a SoftReference to an object does not in itself guarantee that that object will ever be reclaimed, no matter what the demand is on memory. No object that is strongly reachable is ever eligible for finalization or reclamation.
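As a rough sketch of the second kind of cache described above (soft references to the values, cleaned up through a ReferenceQueue), something like the following could work. SoftValueCache, its Entry class, and the loader parameter are hypothetical names for this example; this is not how WeakHashMap itself is implemented.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical memory-sensitive cache: strong keys, softly referenced values. */
final class SoftValueCache<K, V> {
    /** Soft reference that remembers its key so the map entry can be found after clearing. */
    private static final class Entry<K, V> extends SoftReference<V> {
        final K key;

        Entry(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, Entry<K, V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    /** Returns the cached value, reloading it if it was never cached or has been cleared. */
    synchronized V get(K key, Function<? super K, ? extends V> loader) {
        purge();
        Entry<K, V> entry = map.get(key);
        V value = (entry == null) ? null : entry.get();
        if (value == null) {
            value = loader.apply(key);
            map.put(key, new Entry<>(key, value, queue));
        }
        return value;
    }

    synchronized int size() {
        purge();
        return map.size();
    }

    /** Drops map entries whose values the collector has already cleared. */
    private void purge() {
        Reference<? extends V> ref;
        while ((ref = queue.poll()) != null) {
            @SuppressWarnings("unchecked")
            Entry<K, V> dead = (Entry<K, V>) ref;
            map.remove(dead.key, dead); // only if the entry was not replaced in the meantime
        }
    }
}
```

As long as a caller still holds a value strongly, get(...) keeps returning that same instance; only values nobody else references become candidates for clearing under memory pressure.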

If you are implementing your own cache, use a Soft Reference.
I once maintained a legacy system that used a cache of Weak References to store large objects that were very expensive to create. Almost every time a thread tried to fetch an object from that cache, it had already been GC'ed so the objects had to be expensively recreated a lot! It was practically like the cache wasn't there.
But let's suppose I have fetched some database details, put them in memory, and cached them via a soft reference. If I now remove some key/value from memory, it will still be there in the cache. Do we need to use a weak reference in this case? How should the decision be made?
I'm not sure I understand your question. When the original "hard" reference to those details goes away, the details can still be in the cache of soft references. If you remove the item from the cache, then there are no more references to the details at all, so they will be GC'ed next time no matter what kind of reference used to point to them.
There are very few times when a WeakReference is useful. See "Weak references - how useful are they?" for some examples.

Which reference will be best for implementing caching?
For caching, use the SoftReference class. The whole point of caching things is to keep them ready for quick use if memory is available, so when there is little memory, it's OK to flush your cache.
WeakReferences are perfect for avoiding reference leaks, which happen when you have some static object or thread that keeps a reference to an object whose lifetime is shorter than that of the static object or thread. I use a lot of WeakReferences in Android development, especially with AsyncTasks, whose lifetime is often longer than that of the Activity that created them.
... Do we need to use a weak reference in this case?
Once you remove such an entry from your cache collection, it will be garbage collected, so there is no need for a WeakReference.
I have never used SoftReferences, but that's because I mostly code for the Android platform, and according to its docs (http://developer.android.com/reference/java/lang/ref/SoftReference.html) they are useless for caching, at least on this platform.
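The leak-avoidance pattern mentioned above (a long-lived task holding its short-lived owner weakly) might look like this in plain Java. WeakCallbackTask is a hypothetical stand-in, not the Android AsyncTask API, and the Consumer plays the role of the Activity.

```java
import java.lang.ref.WeakReference;
import java.util.function.Consumer;

/** Hypothetical long-lived task that must not keep its short-lived owner alive. */
final class WeakCallbackTask implements Runnable {
    private final WeakReference<Consumer<String>> ownerRef;

    WeakCallbackTask(Consumer<String> owner) {
        this.ownerRef = new WeakReference<>(owner);
    }

    @Override
    public void run() {
        String result = "done"; // stand-in for slow background work
        // Take a strong reference once, so the owner cannot vanish between the check and the call.
        Consumer<String> owner = ownerRef.get();
        if (owner != null) {
            owner.accept(result);
        }
        // Otherwise the owner is gone and the result is dropped.
    }

    /** Test hook that mimics the collector clearing the reference. */
    void simulateOwnerCollected() {
        ownerRef.clear();
    }
}
```

The important detail is reading ownerRef.get() into a local variable exactly once; checking get() != null and then calling get() again would race with the collector.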

How to test code that relies on a SoftReference?

I have certain code that uses many instances of a SoftReference subclass. I would like to test that it works correctly in cases when all / only some / none of these references are staged for clearing in the ReferenceQueue. For the case of "none" this is quite easy: create strong references to the objects, and the soft references are guaranteed to stay. However, how would I go about guaranteeing them to be cleared? As I understand, System.gc() is only a request to run the garbage collector, and even if it actually runs, it may well decide to not collect all unreachable objects...
Also the code is quite performance critical, so it's not a good idea to alter it just for testing purposes. (Adding a test-only method that doesn't affect other methods is fine, but adding paths that are used only for testing in other methods is something to avoid).

If it is an option to access your SoftReference instances from the test you could simulate the GC behavior by calling methods directly on the SoftReference instances.
Calling SoftReference.clear() would correspond to the first step, where the reference is cleared. Then you could call SoftReference.enqueue() to enqueue it in the reference queue, corresponding to the enqueuing step the GC does [some time] after clearing the reference.
By calling these methods on a subset of your SoftReferences, you could simulate that only some of the references have been cleared and enqueued.
I really think the above approach is the one to recommend, since you get control of which references are cleared, and that is a good thing in a test.
However, if you can not access your SoftReferences directly you are pretty much limited to allocating memory in an attempt to make the GC clear them. For example, as illustrated in this question and its answers.
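The clear()/enqueue() approach could be wrapped in a small test helper along these lines. SoftRefTestSupport and its method names are made up for this sketch; only Reference.clear(), Reference.enqueue() and ReferenceQueue.poll() are real JDK calls.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

/** Hypothetical test helper that plays the collector's role for chosen references. */
final class SoftRefTestSupport {
    /** Mimics what the collector does: clear the referent, then enqueue the reference. */
    static void simulateCollection(Reference<?> ref) {
        ref.clear();
        ref.enqueue();
    }

    /** Drains the queue and returns how many references were waiting in it. */
    static int drain(ReferenceQueue<?> queue) {
        int count = 0;
        while (queue.poll() != null) {
            count++;
        }
        return count;
    }
}
```

In a test you would keep strong references to every referent (so the real GC cannot interfere), call simulateCollection on the subset you want "collected", and then run the code under test against the queue.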

Why doesn't the JVM destroy a resource as soon as its reference count hits 0?

I have always wondered why the garbage collector in Java activates whenever it feels like it rather than do:
if (obj.refCount == 0)
{
    delete obj;
}
Are there any big advantages to how Java does it that I overlooked?
Thanks

Each JVM is different, but the HotSpot JVM does not primarily rely on reference counting as a means for garbage collection. Reference counting has the advantage of being simple to implement, but it is inherently error-prone. In particular, if you have a reference cycle (a set of objects that all refer to one another in a cycle), then reference counting will not correctly reclaim those objects, because they all have a nonzero reference count. This forces you to run an auxiliary garbage collector from time to time, which tends to be slower (Mozilla Firefox had this exact problem, and IIRC their solution was to add a garbage collector on top of reference counting). This is why, for example, languages like C++ tend to combine shared_ptrs, which use reference counting, with weak_ptrs, which do not contribute to the count and so can break reference cycles.
Additionally, associating a reference count with each object makes the cost of assigning a reference greater than normal, because of the extra bookkeeping involved in adjusting the reference count (which only gets worse in the presence of multithreading). Furthermore, using reference counting precludes the use of certain types of fast memory allocators, which can be a problem. It also tends to lead to heap fragmentation in its naive form, since objects are scattered through memory rather than tightly packed, increasing allocation times and causing poor locality.
The HotSpot JVM uses a variety of different techniques to do garbage collection, but its primary garbage collector is called a stop-and-copy collector. This collector works by allocating objects contiguously in memory next to one another, and allows for extremely fast (one or two assembly instructions) allocation of new objects. When space runs out, all of the new objects are GC'ed simultaneously, which usually kills off most of the new objects that were constructed. As a result, the GC is much faster than a typical reference-counting implementation, and ends up having better locality and better performance.
For a comparison of techniques in garbage collecting, along with a quick overview of how the GC in HotSpot works, you may want to check out these lecture slides from a compilers course that I taught last summer. You may also want to look at the HotSpot garbage collection white paper that goes into much more detail about how the garbage collector works, including ways of tuning the collector on an application-by-application basis.
Hope this helps!

Reference counting has the following limitations:
It is VERY bad for multithreading performance (basically, every assignment of an object reference must be protected).
You cannot free cycles automatically

Because it doesn't work strictly based on reference counting.
Consider circular references which are no longer reachable from the "root" of the application.
For example:
APP has a reference to SOME_SCREEN
SOME_SCREEN has a reference to SOME_CHILD
SOME_CHILD has a reference to SOME_SCREEN
Now, APP drops its reference to SOME_SCREEN.
In this case, SOME_SCREEN still has a reference to SOME_CHILD, and SOME_CHILD still has a reference to SOME_SCREEN - so, in this case, your example doesn't work.
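To make the scenario concrete, here is a toy simulation of the naive scheme from the question. CountedNode is a made-up class with a hand-maintained counter; it has nothing to do with how the JVM manages memory and only models the bookkeeping.

```java
/** Toy object with a manual reference count, modelling the scheme from the question. */
final class CountedNode {
    final String name;
    int refCount;
    boolean freed;
    private CountedNode target;

    CountedNode(String name) {
        this.name = name;
    }

    void retain() {
        refCount++;
    }

    void release() {
        if (--refCount == 0) {
            freed = true; // the "delete obj" step
            if (target != null) {
                target.release(); // a dying object gives up the reference it holds
            }
        }
    }

    /** Makes this node hold a counted reference to the other node. */
    void pointTo(CountedNode other) {
        other.retain();
        if (target != null) {
            target.release();
        }
        target = other;
    }
}
```

Replaying the steps above: APP retains SOME_SCREEN (count 1), SOME_SCREEN points to SOME_CHILD (count 1), SOME_CHILD points back to SOME_SCREEN (count 2). When APP releases SOME_SCREEN the count drops to 1, not 0, so neither object is ever freed even though the application can no longer reach them.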
Now, others (Apple with ARC, Microsoft with COM, many others) have solutions for this and work more similarly to how you describe it.
With ARC you have to annotate your references with keywords like strong and weak to let ARC know how to deal with these references (and avoid circular references). Don't read too far into my specific example with ARC, because ARC handles these things ahead of time during compilation and doesn't require a specific runtime per se. So it can definitely be done similarly to how you describe it, but it's just not workable with some of the features of Java. I also believe COM works more similarly to how you describe, but again, that's not free of some amount of consideration on the developer's part.
In fact, no "simple" reference counting scheme would ever be workable without some amount of thought by the application developer (to avoid circular references, etc.).

Because the garbage collector in modern JVMs does not track reference counts. Reference counting is used to teach how GC works, but it is both resource-consuming and error-prone (e.g. cyclic dependencies).

Because the garbage collector in Java is based on a copying collector for "young generation" objects, and mark-and-sweep for "tenured generation" objects.
Resources from: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html

What is a use case for a soft reference in Java?

What is a use case for a soft reference in Java? Would it be useful to garbage collect non-critical items when a JVM has run out of memory in order to free up enough resources to perhaps dump critical information before shutting down the JVM?
Are they called soft references in that they are soft and break when "put under stress", i.e. when the JVM has run out of memory? I understand weak references and phantom references, but not really when these would be needed.

One use is for caching. Imagine you want to maintain an in-memory cache of large objects but you don't want that cache to consume memory that could be used for other purposes (for the cache can always be rebuilt). By maintaining a cache of soft-references to the objects, the referenced objects can be freed by the JVM and the memory they occupied reused for other purposes. The cache would merely need to clear out broken soft-references when it encounters them.
Another use may be for maintaining application images on a memory-constrained device, such as a mobile phone. As the user opens applications, the previous application images could be maintained as soft references, so that they can be cleared out if the memory is needed for something else but will still be there if there is no demand for memory. This allows the user to return to the application more quickly if there is no pressure on memory, and allows the previous application's memory to be reclaimed if it is needed for something else.

This article gave me a good understanding of each of them (weak, soft and phantom references). Here's a summarized quote:
A weak reference, simply put, is a reference that isn't strong enough to force an object to remain in memory. Weak references allow you to leverage the garbage collector's ability to determine reachability for you, so you don't have to do it yourself.
A soft reference is exactly like a weak reference, except that it is less eager to throw away the object to which it refers. An object which is only weakly reachable (the strongest references to it are WeakReferences) will be discarded at the next garbage collection cycle, but an object which is softly reachable will generally stick around for a while.
A phantom reference is quite different than either SoftReference or WeakReference. Its grip on its object is so tenuous that you can't even retrieve the object -- its get() method always returns null. The only use for such a reference is keeping track of when it gets enqueued into a ReferenceQueue, as at that point you know the object to which it pointed is dead.
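The difference in what get() returns can be checked directly. The following is a small sketch (ReferenceKinds and visibleWhileAlive are hypothetical names) that compares the three reference types while the referent is still strongly held by the caller.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

final class ReferenceKinds {
    /**
     * Reports, for soft, weak and phantom references in that order, whether get()
     * hands back the referent while the caller still holds it strongly.
     */
    static boolean[] visibleWhileAlive(Object referent) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        boolean soft = new SoftReference<>(referent).get() == referent;
        boolean weak = new WeakReference<>(referent).get() == referent;
        // PhantomReference.get() is specified to return null, even for a live referent.
        boolean phantom = new PhantomReference<>(referent, queue).get() != null;
        return new boolean[] { soft, weak, phantom };
    }
}
```

Soft and weak references give the object back for as long as it has not been cleared; a phantom reference never does, so its only use is the notification through the queue.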

The best example I can think of is a cache. You might not mind dumping the oldest entries in the cache if memory became a problem. Caching large object graphs might make this likely as well.

An example of how a SoftReference can be used as a cache can be found in this post.
