What is a use case for a soft reference in Java? Would it be useful to garbage collect non-critical items when a JVM has run out of memory in order to free up enough resources to perhaps dump critical information before shutting down the JVM?
Are they called soft references because they are soft and break when "put under stress", i.e. when the JVM has run out of memory? I understand weak references and phantom references, but not really when soft references would be needed.
One use is for caching. Imagine you want to maintain an in-memory cache of large objects but you don't want that cache to consume memory that could be used for other purposes (for the cache can always be rebuilt). By maintaining a cache of soft-references to the objects, the referenced objects can be freed by the JVM and the memory they occupied reused for other purposes. The cache would merely need to clear out broken soft-references when it encounters them.
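The caching idea above can be sketched roughly as follows. This is a minimal, single-threaded illustration, not a production cache; the class name SoftCache, the loader function, and the simulateClear helper are assumptions introduced for the example.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical name; a sketch of a cache that holds its values only softly.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<>();
    private final Function<K, V> loader; // rebuilds a value when it is missing

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = entries.get(key);
        // get() returns null once the reference has been cleared.
        V value = (ref != null) ? ref.get() : null;
        if (value == null) {
            value = loader.apply(key); // rebuild the entry
            // Overwrites the broken reference, if there was one.
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }

    // Demonstration helper (assumption): mimics the collector clearing an entry.
    void simulateClear(K key) {
        SoftReference<V> ref = entries.get(key);
        if (ref != null) {
            ref.clear();
        }
    }
}
```

A caller would construct it with whatever rebuilds the object, e.g. new SoftCache<String, byte[]>(path -> loadFromDisk(path)), where loadFromDisk stands for the caller's own (hypothetical) loading code.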
Another use may be for maintaining application images on a memory-constrained device, such as a mobile phone. As the user opens applications, the previous application images could be kept as soft references so that they can be cleared out if the memory is needed for something else, but will still be there if there is no demand for memory. This lets the user return to the application more quickly when there is no pressure on memory, and lets the previous application's memory be reclaimed when it is needed for something else.
This article gave me a good understanding of each of them (weak, soft and phantom references). Here's a summarized quote:
A weak reference, simply put, is a reference that isn't strong enough to force an object to remain in memory. Weak references allow you to leverage the garbage collector's ability to determine reachability for you, so you don't have to do it yourself.
A soft reference is exactly like a weak reference, except that it is less eager to throw away the object to which it refers. An object which is only weakly reachable (the strongest references to it are WeakReferences) will be discarded at the next garbage collection cycle, but an object which is softly reachable will generally stick around for a while.
A phantom reference is quite different than either SoftReference or WeakReference. Its grip on its object is so tenuous that you can't even retrieve the object -- its get() method always returns null. The only use for such a reference is keeping track of when it gets enqueued into a ReferenceQueue, as at that point you know the object to which it pointed is dead.
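As a small illustration of the quoted descriptions, the sketch below creates one reference of each kind to the same object while it is still strongly reachable. The class and method names are made up for the example. It only shows the behaviour that is deterministic (what get() returns while the object is alive); when the collector actually clears weak or soft references is up to the JVM.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKinds {
    // Returns { weak.get() == referent, soft.get() == referent, phantom.get() == null }.
    static boolean[] probe(Object referent) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        WeakReference<Object> weak = new WeakReference<>(referent);
        SoftReference<Object> soft = new SoftReference<>(referent);
        // A phantom reference must be registered with a queue.
        PhantomReference<Object> phantom = new PhantomReference<>(referent, queue);

        boolean phantomIsNull = phantom.get() == null; // phantom get() never exposes the object
        boolean weakHolds = weak.get() == referent;    // still strongly reachable via 'referent'
        boolean softHolds = soft.get() == referent;
        return new boolean[] { weakHolds, softHolds, phantomIsNull };
    }

    public static void main(String[] args) {
        boolean[] r = probe(new Object());
        System.out.println("weak=" + r[0] + " soft=" + r[1] + " phantomNull=" + r[2]);
    }
}
```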
The best example I can think of is a cache. You might not mind dumping the oldest entries in the cache if memory became a problem. Caching large object graphs might make this likely as well.
An example of how a SoftReference can be used as a cache can be found in this post.
Backstory: So I had this great idea, right? Sometimes you're collecting a massive amount of data, and you don't need to access all of it all the time, but you also may not need it after the program has finished, and you don't really want to muck around with database tables, etc. What if you had a library that would silently and automatically serialize objects to disk when you're not using them, and silently bring them back when you needed them? So I started writing a library; it has a number of collections like "DiskList" or "DiskMap" where you put your objects. They keep your objects via WeakReferences. While you're still using a given object, it has strong references to it, so it stays in memory. When you stop using it, the object is garbage collected, and just before that happens, the collection serializes it to disk (*). When you want the object again, you ask for it by index or key, like usual, and the collection deserializes it (or returns it from its inner cache, if it hasn't been GCd yet).
(*) See now, this is the sticking point. In order for this to work, I need to be able to be notified JUST BEFORE the object is GCd - after no other references to it exist (and therefore the object can no longer be modified), but before the object is wiped from memory. This is proving difficult. I thought briefly that using a ReferenceQueue would save me, but alas, it returns a Reference, whose referent has thus far always been null.
Is there a way, having been given an arbitrary object, to receive (via callback or queue, etc.) the object after it is ready to be garbage collected, but before it IS garbage collected?
I know (Object).finalize() can basically do that, but I'll have to deal with classes that don't belong to me, and whose finalize methods I can't legitimately override. I'd prefer not to go as arcane as custom classloaders, bytecode manipulation, or reflection, but I will if I have to.
(Also, if you know of existing libraries that do transparent disk caching, I'd look favorably on that, though my requirements on such a library would be fairly stringent.)
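For what it's worth, the null referent seen through the ReferenceQueue is by design: weak and soft references are cleared before they are enqueued, and a phantom reference's get() always returns null. A common workaround is to subclass the reference so that it carries some metadata (a key, say), which survives even though the object itself does not. The sketch below shows that pattern; KeyedWeakReference and QueueDemo are hypothetical names, and the clear()/enqueue() calls in the usage note only simulate what the collector would do.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical helper: a weak reference that remembers which key it belonged to.
class KeyedWeakReference<K, V> extends WeakReference<V> {
    final K key;

    KeyedWeakReference(K key, V value, ReferenceQueue<? super V> queue) {
        super(value, queue);
        this.key = key;
    }
}

class QueueDemo {
    // Polls the queue once and returns the key of the dequeued reference,
    // or null if nothing has been enqueued yet.
    static Object pollKey(ReferenceQueue<?> queue) {
        Reference<?> ref = queue.poll();
        if (ref instanceof KeyedWeakReference) {
            // The referent is already gone here; only the key is still available.
            return ((KeyedWeakReference<?, ?>) ref).key;
        }
        return null;
    }
}
```

This tells the collection which entry died, but it cannot hand back the object for serialization, which is exactly the limitation described in the question. To try it without waiting for a collection, call ref.clear() followed by ref.enqueue() on a KeyedWeakReference and then pollKey(queue).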
You can look for a cache that supports "write behind caching" and tiering. Notable products would be EHCache, Hazelcast, Infinispan.
Or you can construct something by yourself with a cache and a time to idle expiry.
Then, the cache access would be "the usage" of the object.
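A hand-rolled version of the "time to idle" idea might look roughly like the sketch below. It uses plain Java rather than any of the products named above, and every name in it (IdleExpiryCache, the writeBehind callback, the injected clock) is an assumption made for illustration. Idle entries are handed to the callback, where they could be serialized to disk, before being dropped from memory.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.LongSupplier;

// Hypothetical sketch: entries not accessed for maxIdleMillis are written out and evicted.
class IdleExpiryCache<K, V> {
    private static final class Slot<T> {
        final T value;
        long lastAccess;

        Slot(T value, long lastAccess) {
            this.value = value;
            this.lastAccess = lastAccess;
        }
    }

    private final Map<K, Slot<V>> entries = new HashMap<>();
    private final long maxIdleMillis;
    private final LongSupplier clock;           // e.g. System::currentTimeMillis
    private final BiConsumer<K, V> writeBehind; // e.g. serialize the value to disk

    IdleExpiryCache(long maxIdleMillis, LongSupplier clock, BiConsumer<K, V> writeBehind) {
        this.maxIdleMillis = maxIdleMillis;
        this.clock = clock;
        this.writeBehind = writeBehind;
    }

    void put(K key, V value) {
        entries.put(key, new Slot<>(value, clock.getAsLong()));
    }

    // Accessing an entry counts as "usage" and resets its idle timer.
    V get(K key) {
        Slot<V> slot = entries.get(key);
        if (slot == null) {
            return null; // a real implementation would reload from disk here
        }
        slot.lastAccess = clock.getAsLong();
        return slot.value;
    }

    // Writes out and removes every entry that has been idle too long; returns the count.
    int evictIdle() {
        long now = clock.getAsLong();
        int evicted = 0;
        Iterator<Map.Entry<K, Slot<V>>> it = entries.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, Slot<V>> e = it.next();
            if (now - e.getValue().lastAccess >= maxIdleMillis) {
                writeBehind.accept(e.getKey(), e.getValue().value);
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }
}
```

Because eviction is driven by the application rather than by the garbage collector, the disk I/O happens at a time of your choosing instead of when the heap is nearly full.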
Is there a way, having been given an arbitrary object, to receive (via callback or queue, etc.) the object after it is ready to be garbage collected, but before it IS garbage collected?
This interferes heavily with garbage collection. Chances are high that it will bring down your application or the whole system. What you want to do is start disk I/O, and potentially allocate additional objects, when the system is low on or out of memory. If you manage to get it to work, you'll end up using more heap than before, since the heap must always be extended when the GC kicks in.
Weak references allow the GC to collect the object in the next GC cycle, whereas a soft reference keeps the object until memory is full; soft references are cleared before an OutOfMemoryError is thrown.
Where would we use these references?
Which reference will be best for implementing caching?
E.g., if I use a soft reference for caching, it will be cleared when memory is full.
But let's suppose I have fetched some database details, put them in memory, and cached them behind a soft reference. If I now remove some key/value from memory, it will still be there in the cache. Do we need to use a weak reference in this case? How should the decision be made?
Soft references are used for caching in most cases. You want to keep data in RAM as long as possible, but it is better to purge the cache than to die with an OutOfMemoryError.
Weak references can be used (for example) to keep extra info about your class. Say you have a class User and you want to store some additional info that should be deleted the moment the user is deleted (you do not want to do it manually, since that is boilerplate code). So you use a WeakHashMap with User as the key, and when there is no longer any reference to the user, its entry is removed from the map as well.
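A minimal sketch of that User example, with class and method names invented for illustration. Note that, per the WeakHashMap documentation, the values must not strongly refer back to their own keys, or the entries can never be dropped.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Stand-in for the User class mentioned above.
class User {
    final String name;

    User(String name) {
        this.name = name;
    }
}

// Hypothetical side table that attaches extra info to users without keeping them alive.
class UserNotes {
    // Keys are held weakly: an entry becomes removable once its User is otherwise unreachable.
    private final Map<User, String> notes = new WeakHashMap<>();

    void attach(User user, String note) {
        notes.put(user, note);
    }

    String noteFor(User user) {
        return notes.get(user);
    }
}
```

As long as some other part of the program holds the User, noteFor(user) returns the note; once the User is gone, the map drops the entry on its own at some point after a collection.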
By the way: in languages with reference counting, weak references are used to prevent reference cycles, but the Java GC removes "islands of isolation", so this use of weak references does not apply to Java.
Weak references allow the GC to collect the object in the next GC cycle, whereas a soft reference keeps the object until memory is full; soft references are cleared before an OutOfMemoryError is thrown.
You are reading more into the docs than they actually say. All objects that are softly reachable (which precludes them being strongly reachable) will be cleaned up and released before the VM throws an OutOfMemoryError, but the VM is in no way required to preserve them past the point when they are initially determined to be softly reachable. The docs do not forbid that they be reclaimed in the same GC cycle in which they are found to be softly reachable.
Generally speaking, however, you can suppose that the GC will prefer to process phantom reachable and weakly reachable objects first.
Where would we use these references? Which reference will be best for implementing caching?
The docs say this:
Soft references are for implementing memory-sensitive caches, weak references are for implementing canonicalizing mappings that do not prevent their keys (or values) from being reclaimed, and phantom references are for scheduling pre-mortem cleanup actions in a more flexible way than is possible with the Java finalization mechanism.
You go on to ask:
E.g., if I use a soft reference for caching, it will be cleared when memory is full. But let's suppose I have fetched some database details, put them in memory, and cached them behind a soft reference. If I now remove some key/value from memory, it will still be there in the cache. Do we need to use a weak reference in this case? How should the decision be made?
If you want to build a cache that can discard entries whose keys cease to be strongly reachable (which may mean that those entries can no longer be retrieved), that is squarely in the center of the intended purpose for weak references. The cache internally holds only a weak reference to the key of each entry, and it registers those references with a ReferenceQueue that lets it know when they should be discarded. This is precisely how WeakHashMap works.
If you want to build a cache that can respond to high memory demand by discarding entries, then that is the intended purpose of soft references (to the values); such a cache would work similarly to WeakHashMap, but with use of soft references to the values instead of weak references to the keys. The two could be combined, of course.
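A cache of the second kind could be sketched along these lines: values are held through soft references registered with a ReferenceQueue, and stale entries are purged whenever the cache is touched, in the same spirit as WeakHashMap. This is only an illustrative, single-threaded sketch; SoftValueCache and simulateCollection are hypothetical names, and simulateCollection exists purely so the purge path can be exercised without real memory pressure.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

class SoftValueCache<K, V> {
    // A soft reference that remembers its key so the map entry can be found again.
    private static final class SoftValue<K, V> extends SoftReference<V> {
        final K key;

        SoftValue(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, SoftValue<K, V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    void put(K key, V value) {
        purge();
        map.put(key, new SoftValue<>(key, value, queue));
    }

    V get(K key) {
        purge();
        SoftValue<K, V> ref = map.get(key);
        return (ref == null) ? null : ref.get();
    }

    int size() {
        purge();
        return map.size();
    }

    // Removes entries whose references the collector has cleared and enqueued.
    private void purge() {
        Reference<? extends V> ref;
        while ((ref = queue.poll()) != null) {
            SoftValue<?, ?> stale = (SoftValue<?, ?>) ref;
            // Two-argument remove: only drop the mapping if it still points at this reference.
            map.remove(stale.key, stale);
        }
    }

    // Demonstration helper (assumption): mimics the collector clearing one value.
    void simulateCollection(K key) {
        SoftValue<K, V> ref = map.get(key);
        if (ref != null) {
            ref.clear();
            ref.enqueue();
        }
    }
}
```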
Note, by the way, that Reference objects become relevant to GC only when their referents cease to be strongly reachable. In particular, having a SoftReference to an object does not in itself guarantee that that object will ever be reclaimed, no matter what the demand is on memory. No object that is strongly reachable is ever eligible for finalization or reclamation.
If you are implementing your own cache, use a Soft Reference.
I once maintained a legacy system that used a cache of Weak References to store large objects that were very expensive to create. Almost every time a thread tried to fetch an object from that cache, it had already been GC'ed so the objects had to be expensively recreated a lot! It was practically like the cache wasn't there.
But let's suppose I have fetched some database details, put them in memory, and cached them behind a soft reference. If I now remove some key/value from memory, it will still be there in the cache. Do we need to use a weak reference in this case? How should the decision be made?
I'm not sure I understand your question. When the original "hard" reference to those details is GC'ed it can still be in the cache of soft references. If you remove the item from the cache then there are no more references to the details at all so it will be GC'ed next time no matter what kind of reference used to point to it.
There are very few times when a WeakReference is useful. See "Weak references - how useful are they?" for some examples.
Which reference will be best for implementing caching?
For caching, use the SoftReference class. The whole point of caching things is to keep them ready for quick use, if memory is available. So when there is little memory, it's OK to flush your cache.
WeakReferences are perfect for avoiding reference leaks, which happen when a static object or thread keeps a reference to an object whose lifetime is shorter than that of the object or thread holding it. I use a lot of WeakReferences in Android development, especially with AsyncTasks, whose lifetime is often longer than that of the Activity that created them.
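The leak-avoidance pattern can be sketched without any Android classes. In the example below, Screen is a made-up stand-in for a short-lived owner such as an Activity, and LoadTask for a longer-lived background task; neither name comes from a real API.

```java
import java.lang.ref.WeakReference;

// Stand-in for a short-lived owner such as a UI screen (hypothetical type).
class Screen {
    String shown;

    void show(String text) {
        shown = text;
    }
}

// A task that may outlive its Screen and therefore must not keep it alive.
class LoadTask implements Runnable {
    private final WeakReference<Screen> screenRef;

    LoadTask(Screen screen) {
        // Only a weak reference is stored, so the task never pins the screen in memory.
        this.screenRef = new WeakReference<>(screen);
    }

    @Override
    public void run() {
        String result = "loaded"; // placeholder for slow work
        Screen screen = screenRef.get(); // null if the screen has been collected meanwhile
        if (screen != null) {
            screen.show(result);
        }
        // Otherwise the result is simply dropped: nobody is left to display it.
    }
}
```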
... Do we need to use weak reference in this case?
Once you remove such an entry from your cache collection, it will be garbage collected, so there is no need for a WeakReference.
I have never used SoftReferences, but that's because I mostly code for the Android platform, and according to its docs (http://developer.android.com/reference/java/lang/ref/SoftReference.html) they are useless for caching, at least on this platform.
I have a cache built from a Map to SoftReferences. When they are added they get put into another queue to be lazily compressed down via gzip or some such.
My idea is this: I want to have WeakReferences to the objects in the compress queue, so that when the compressor task gets to the object, if it is already gone we needn't bother compressing it - and also that the compressor's queue doesn't keep objects alive that would otherwise be GC'd.
So if there is exactly one SoftReference and one WeakReference, does the semantic of SoftReference apply still?
Yes, the semantics of SoftReference still apply: SoftReferences are stronger than WeakReferences.
WeakReferences are basically treated as nonexistent by the GC, so an object that is only weakly reachable may be collected immediately. Objects whose strongest reference is a SoftReference, however, are only considered for collection when demands on memory need to be fulfilled.
So if there are both soft and weak references, the semantics of SoftReference apply.
Weak reference objects, which do not prevent their referents from being made finalizable, finalized, and then reclaimed.
http://download.oracle.com/javase/6/docs/api/java/lang/ref/WeakReference.html
Soft reference objects, which are cleared at the discretion of the garbage collector in response to memory demand. Soft references are most often used to implement memory-sensitive caches.
http://download.oracle.com/javase/6/docs/api/java/lang/ref/SoftReference.html
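The setup from the question, a soft reference in the cache plus a weak reference in the compressor's queue, might be sketched like this. All names are hypothetical and the compress step is a placeholder. The queue's weak references do not keep anything alive; the cache's soft references decide how long the data stays around.

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

class CompressingCache {
    private final Map<String, SoftReference<byte[]>> cache = new HashMap<>();
    private final Queue<WeakReference<byte[]>> pending = new ArrayDeque<>();

    void put(String key, byte[] data) {
        cache.put(key, new SoftReference<>(data)); // kept until memory is needed
        pending.add(new WeakReference<>(data));    // does not keep the data alive by itself
    }

    // Works through the queue; returns how many entries were still alive and got compressed.
    int drainCompressQueue() {
        int processed = 0;
        WeakReference<byte[]> ref;
        while ((ref = pending.poll()) != null) {
            byte[] data = ref.get();
            if (data == null) {
                continue; // already collected, nothing to compress
            }
            compress(data);
            processed++;
        }
        return processed;
    }

    private void compress(byte[] data) {
        // Placeholder: a real implementation might gzip the bytes here.
    }
}
```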
Yes, there is no problem with the GC collecting an object that has as many soft/weak references as you want, as long as it does not have at least one strong reference.
You should note that, in practice, weak references are cleared before soft references, and that soft references are often used for caching.
That is: the object is not needed right now, but you may need it again at some point in the future, and if it is still around Java does not have to do the work of instantiating a new object.
Has anyone researched the runtime costs involved in creating and garbage collecting Java WeakReference objects? Are there any performance issues (e.g. contention) for multi-threaded applications?
EDIT: Obviously the actual answer(s) will be JVM dependent, but general observations are also welcome.
EDIT 2: If anyone has done some benchmarking of the performance, or can point to some benchmarking results, that would be ideal. (Sorry, but the bounty has expired ...)
WeakReferences have a negative impact on the CMS garbage collector. As far as I can see from the behavior of our server, they increase the time of the parallel remark phase. All application threads are stopped during this phase, so that is extremely undesirable. So you need to be careful with WeakReferences.
I implemented a Java garbage collector once, so whatever I was able to accomplish is a (weak :) lower bound on what is possible.
In my implementation, there is a small constant amount of additional overhead for each weak reference when it is visited during garbage collection.
So the upshot is: I wouldn't worry about it, it's not a big problem unless you are using zillions of weak references.
Most importantly, the cost is proportional to the number of weak references in existence, not the size of the overall heap.
However, that's not to say that a garbage collector that supports weak references will be as fast as one that does not. The presumed question here is, given that Java supports weak references, what is the incremental cost of using them?
Mine was a simple "stop the world" mark/sweep garbage collector. During garbage collection, it makes a determination for every object whether that object is live or not and sets a LIVE bit in the object header. Then it goes through and frees all the non-live objects.
To handle weak references you just add the following:
Ignore weak references when setting LIVE bits (i.e., they don't cause the LIVE bit on the referenced object to be set).
During the sweep step, add a special check as follows: if the object you're visiting is LIVE, and it's a WeakReference, then check the object that it weakly references, and if that object is not LIVE, clear the reference.
Small variations of this logic work for soft and phantom references.
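The two rules above can be tried out on a toy heap model. The sketch below is not the linked implementation and not how a real JVM is written; it is a small simulation with made-up names (ToyHeap, Obj, WeakObj) in which "objects" are plain Java instances, marking follows only strong fields, and the sweep clears weak references whose referent was not marked.

```java
import java.util.ArrayList;
import java.util.List;

class ToyHeap {
    static class Obj {
        final String name;
        final List<Obj> strongFields = new ArrayList<>();
        boolean live; // the LIVE bit

        Obj(String name) {
            this.name = name;
        }
    }

    // Models a WeakReference object: 'referent' is a weak edge.
    static class WeakObj extends Obj {
        Obj referent;

        WeakObj(String name, Obj referent) {
            super(name);
            this.referent = referent;
        }
    }

    final List<Obj> objects = new ArrayList<>();
    final List<Obj> roots = new ArrayList<>();

    <T extends Obj> T alloc(T obj) {
        objects.add(obj);
        return obj;
    }

    void collect() {
        for (Obj o : objects) {
            o.live = false;
        }
        for (Obj root : roots) {
            mark(root);
        }
        // Sweep, part 1: a live weak reference whose referent is not live gets cleared.
        for (Obj o : objects) {
            if (o.live && o instanceof WeakObj) {
                WeakObj w = (WeakObj) o;
                if (w.referent != null && !w.referent.live) {
                    w.referent = null;
                }
            }
        }
        // Sweep, part 2: free everything that was not marked.
        objects.removeIf(o -> !o.live);
    }

    private void mark(Obj o) {
        if (o.live) {
            return;
        }
        o.live = true;
        for (Obj field : o.strongFields) {
            mark(field); // the weak edge in WeakObj is deliberately not followed
        }
    }
}
```

The extra work is one check per live weak reference, which matches the point made above that the cost scales with the number of weak references rather than with the size of the heap.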
Implementation is here if you're really curious.
A cache using weak references may significantly slow down your app if it's rebuilt on demand, e.g. in getters:
public Object getSomethingExpensiveToFind() {
    // Look the entry up once: with weak references it could vanish
    // between a separate contains() check and the get() call.
    Object cached = cache.get(EXPENSIVE_OBJ_KEY);
    if (cached != null) {
        return cached;
    }
    Object sth = obtainSomethingExpensiveToFind(); // computationally expensive
    cache.put(EXPENSIVE_OBJ_KEY, sth);
    return sth;
}
imagine this scenario:
1) app is running low on memory
2) GC cleans weak references, thus cache is cleared too
3) app continues, a lot of methods like getSomethingExpensiveToFind() are invoked and rebuild the cache
4) app is running low on memory again
5) GC cleans weak references, clears cache
6) app continues, a lot of methods like getSomethingExpensiveToFind() are invoked and rebuild the cache again
7) and so on...
I ran into this problem: the app was interrupted by the GC very often, and it completely defeated the whole point of caching.
In other words, if improperly managed, weak references can slow down your application.
In order to perform some testing, I'd like to check how my application behaves when some or all of the objects I have stored in a cache of SoftReference'd objects are disposed of.
In order to do this, I'd like to manually clear the references stored in the cached SoftReference objects - simulating the VM disposing of those objects - but only if nothing else currently has a strong reference to that object (which could be the case if another process had recently retrieved the referenced object from the cache).
My application is single-threaded, so I don't need to worry about the soft reachability of a cached object changing as this code is executing. This also means that I don't currently have any locking mechanisms - if I did have, I could possibly have used these to determine whether or not an object was 'being used' and hence strongly reachable, but alas I don't have need of such locking.
One approach I have tried is to create an additional SoftReference to every object stored in the cache which is registered with a ReferenceQueue. My hope was that in doing so, all of the softly reachable objects in the cache would have their additional SoftReference added to the queue, so all I had to do was loop over the queue, and remove those objects from my cache. However, it seems that the GC enqueues softly reachable objects to their respective queues at its leisure, so it is not guaranteed that anything will be added to the queue once I've finished iterating through the objects in the cache.
One thing that I have also looked at is the -XX:SoftRefLRUPolicyMSPerMB JVM option with a very small value. With judicious memory allocation, this will quite probably clear softly reachable objects from the cache for me the moment they are softly reachable, but I'd really like the app to run normally until I receive a request to clear the softly reachable objects from the cache. As a JVM option, I don't believe I can alter this value while my app is running.
So, does anyone have any ideas as to how I can determine whether or not an object is only softly reachable (and hence can be cleared)?
Edit: A few extra points that may not have been clear:
The app will probably be doing useful work at the times that I want to clear these softly referenced objects out. So I'd prefer not to try to cause the GC to clear objects out for me.
It would be preferable if I could select which softly reachable objects were cleared out.
I would like the app to run normally, i.e. using production memory settings. Changing settings in code, which can then be reset back to their production values, is fine.
IIRC, soft references are guaranteed (in some sense) to be cleared before an OutOfMemoryError is thrown. So, if you allocate lots of memory, they should get cleared if the objects are not strongly referenced. (Not tested.)
Mixing some answers: as Tom Hawtin said, allocate memory until you get an OutOfMemoryError, for example with this code:
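import java.util.ArrayList;
import java.util.List;

private void doOutOfMemory() {
    try {
        // Keep every chunk reachable so the heap really fills up.
        List<byte[]> list = new ArrayList<byte[]>();
        while (true) {
            list.add(new byte[200 * 1024 * 1024]); // 200 MB per allocation
        }
    } catch (OutOfMemoryError ex) {
        // Expected: soft references should have been cleared before this was thrown.
    }
}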
If you want to control which objects are cleared, take a strong reference to the objects you want to keep.
You may also use WeakReferences instead and just call System.gc() to clear them, though there is no guarantee they will always be cleared.
Substitute a weakreference system for your current soft reference system while testing.
With weak references, an object with no other incoming references is cleared at the next garbage collection, instead of waiting until the JVM is under memory pressure.
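One way to make that substitution painless is to let the cache take the kind of reference as a parameter, so tests can ask for weak references while production keeps soft ones. The sketch below is only an illustration under that assumption; RefCache, production(), forTesting() and referenceFor() are invented names.

```java
import java.lang.ref.Reference;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache whose reference strength is chosen at construction time.
class RefCache<K, V> {
    private final Map<K, Reference<V>> entries = new HashMap<>();
    private final Function<V, Reference<V>> refFactory;

    RefCache(Function<V, Reference<V>> refFactory) {
        this.refFactory = refFactory;
    }

    // Production behaviour: values survive until the JVM needs the memory.
    static <K, V> RefCache<K, V> production() {
        return new RefCache<K, V>(v -> new SoftReference<>(v));
    }

    // Test behaviour: values not strongly held elsewhere can go at any collection.
    static <K, V> RefCache<K, V> forTesting() {
        return new RefCache<K, V>(v -> new WeakReference<>(v));
    }

    void put(K key, V value) {
        entries.put(key, refFactory.apply(value));
    }

    V get(K key) {
        Reference<V> ref = entries.get(key);
        return (ref == null) ? null : ref.get();
    }

    // Exposed so a test can inspect or clear() an individual reference.
    Reference<V> referenceFor(K key) {
        return entries.get(key);
    }
}
```

Calling referenceFor(key).clear() in a test also gives a deterministic way to simulate the collector discarding one chosen entry, without involving the GC at all.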