I have certain code that uses many instances of a SoftReference subclass. I would like to test that it works correctly in cases when all / only some / none of these references are staged for clearing in the ReferenceQueue. For the case of "none" this is quite easy: create strong references to the objects, and the soft references are guaranteed to stay. However, how would I go about guaranteeing that they are cleared? As I understand, System.gc() is only a request to run the garbage collector, and even if it actually runs, it may well decide to not collect all unreachable objects...
Also the code is quite performance critical, so it's not a good idea to alter it just for testing purposes. (Adding a test-only method that doesn't affect other methods is fine, but adding paths that are used only for testing in other methods is something to avoid).
If it is an option to access your SoftReference instances from the test you could simulate the GC behavior by calling methods directly on the SoftReference instances.
Calling SoftReference.clear() would correspond to the first step, where the reference is cleared. Then you could call SoftReference.enqueue() to enqueue it in the reference queue, corresponding to the enqueuing step the GC does [some time] after clearing the reference.
Calling these methods on a subset of your SoftReferences you could simulate that only some of the references have been cleared and enqueued.
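For example, here is a minimal, self-contained sketch of that simulation. It uses plain SoftReference; in your test these would be instances of your own subclass, and all names are placeholders:

import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;

public class SoftReferenceSimulationTest {
    public static void main(String[] args) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        // One reference we "clear" as the GC would, one we leave untouched.
        SoftReference<Object> cleared = new SoftReference<>(new Object(), queue);
        SoftReference<Object> kept    = new SoftReference<>(new Object(), queue);

        cleared.clear();    // step 1: clear the referent, as the GC would
        cleared.enqueue();  // step 2: enqueue it on its ReferenceQueue

        System.out.println(cleared.get() == null);   // true
        System.out.println(kept.get() != null);      // true
        System.out.println(queue.poll() == cleared); // true
    }
}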
I really recommend the above approach, since it gives you control over which references are cleared, and that is a good thing in a test.
However, if you cannot access your SoftReferences directly, you are pretty much limited to allocating memory in an attempt to make the GC clear them, for example as illustrated in this question and its answers.
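A crude sketch of that fallback, relying on the documented guarantee that all soft references to softly-reachable objects are cleared before the JVM throws an OutOfMemoryError (the allocation sizes here are arbitrary):

import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.List;

public class ForceSoftReferenceClearing {
    public static void main(String[] args) {
        SoftReference<byte[]> ref = new SoftReference<>(new byte[1024]);

        // Allocate until the heap is exhausted; soft references must be
        // cleared before the OutOfMemoryError is thrown.
        try {
            List<byte[]> hog = new ArrayList<>();
            while (true) {
                hog.add(new byte[1024 * 1024]); // 1 MB chunks
            }
        } catch (OutOfMemoryError expected) {
            // the hog list becomes unreachable once we leave the try block
        }

        System.out.println(ref.get()); // almost certainly null now
    }
}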
Related
We have a Java API that is a wrapper around a C API.
As such, we end up with several Java classes that are wrappers around C++ classes.
These classes implement the finalize method in order to free the memory that has been allocated for them.
Generally, this works fine. However, in high-load scenarios we get out of memory exceptions.
Memory dumps indicate that virtually all the memory (around 6 GB in this case) is filled with the finalizer queue and the objects waiting to be finalized.
For comparison, the C API on its own never goes over around 150 Mb of memory usage.
Under low load, the Java implementation can run indefinitely. So this doesn't seem to be a memory leak as such. It just seems to be that under high load, new objects that require finalizing are generated faster than finalizers get executed.
Obviously, the 'correct' fix is to reduce the number of objects being created. However, that's a significant undertaking and will take a while. In the meantime, is there a mechanism that might help alleviate this issue? For example, by giving the GC more resources.
Java was designed around the idea that finalizers could be used as the primary cleanup mechanism for objects that go out of scope. Such an approach may have been almost workable when the total number of objects was small enough that the overhead of an "always scan everything" garbage collector would have been acceptable, but there are relatively few cases where finalization would be an appropriate cleanup measure in a system with a generational garbage collector (which nearly all JVM implementations are going to have, because it offers a huge speed boost compared to always scanning everything).
Using Closeable along with a try-with-resources construct is a vastly superior approach whenever it's workable. There is no guarantee that finalize methods will get called with any degree of timeliness, and there are many situations where patterns of interrelated objects may prevent them from getting called at all. While finalize can be useful for some purposes, such as identifying objects which got improperly abandoned while holding resources, there are relatively few purposes for which it would be the proper tool.
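For illustration, a minimal sketch of that pattern; NativeHandle is a hypothetical wrapper around some native resource, not a real class:

// Hypothetical wrapper around a native resource.
class NativeHandle implements AutoCloseable {
    void use() { /* work with the resource */ }

    @Override
    public void close() {   // runs deterministically, unlike finalize()
        // free the underlying native resource here
    }
}

public class TryWithResourcesDemo {
    public static void main(String[] args) {
        try (NativeHandle handle = new NativeHandle()) {
            handle.use();
        }   // close() is called here, even if use() throws
    }
}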
If you do need to use finalizers, you should understand an important principle: contrary to popular belief, finalizers do not trigger when an object is actually garbage collected; they fire when an object would have been garbage collected but for the existence of a finalizer somewhere [including, but not limited to, the object's own finalizer]. No object can actually be garbage collected while any reference to it exists in any local variable, in any other object to which any reference exists, or in any object with a finalizer that hasn't run to completion. Further, to avoid having to examine all objects on every garbage-collection cycle, objects which have been alive for a while will be given a "free pass" on most GC cycles. Thus, if an object with a finalizer is alive for a while before it is abandoned, it may take quite a while for its finalizer to run, and it will keep objects to which it holds references around long enough that they're likely to also earn a "free pass".
I would thus suggest that, to the extent possible, even when it's necessary to use finalizers, you should limit their use to privately-held objects which in turn avoid holding strong references to anything which isn't explicitly needed for their cleanup task.
Phantom references are an alternative to finalizers available in Java.
Phantom references allow you to better control resource reclamation process.
you can combine explicit resource disposal (e.g. the try-with-resources construct) with GC-based disposal
you can employ multiple threads for post-mortem housekeeping
Using phantom references is complicated, though. In this article you can find a minimal example of phantom-reference-based resource housekeeping.
In modern Java there is also the Cleaner class, which is based on phantom references too, but provides the infrastructure (reference queue, worker threads, etc.) for ease of use.
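A hedged sketch of the Cleaner approach (Java 9+); Resource and its State class are made-up names, and the key constraint is that the cleanup action must not hold a reference back to the tracked object:

import java.lang.ref.Cleaner;

public class CleanerDemo {
    private static final Cleaner CLEANER = Cleaner.create();

    static class Resource implements AutoCloseable {
        // Cleanup state must NOT reference the Resource itself, or it could
        // never become phantom reachable.
        private static class State implements Runnable {
            @Override
            public void run() {
                System.out.println("resource released"); // free native memory, handles, etc.
            }
        }

        private final Cleaner.Cleanable cleanable;

        Resource() {
            this.cleanable = CLEANER.register(this, new State());
        }

        @Override
        public void close() {
            cleanable.clean();  // explicit, deterministic release
        }
    }

    public static void main(String[] args) {
        try (Resource r = new Resource()) {
            // use the resource; close() releases it, the Cleaner acts as a safety net
        }
    }
}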
I understand that weak references are at the mercy of the garbage collector, and we cannot guarantee that the weak reference will exist. I could not see a need for weak references, but surely there must be a reason.
Why do we need weak references in Java?
What are some practical uses of weak references in Java? If you can share how you used them in your projects, that would be great!
It's actually quite often a bad idea to use weak hashmaps. For one it's easy to get wrong, but even worse it's usually used to implement some kind of cache.
What this means is the following: your program runs fine with good performance for some time; under stress it allocates more and more memory (more requests = more memory pressure = probably more cache entries), which then leads to a GC.
Now, suddenly, while your system is under high stress, you not only get the GC but also lose your whole cache, just when you'd need it the most. Not a fun problem, so at the very least you have to use a reasonably sized, hard-referenced LRU cache to mitigate it (see the sketch below) - you can still use the weak references then, but only as additional help.
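As a minimal sketch of such a hard-referenced LRU cache (sizes and names are arbitrary, and weak or soft references could still be layered on top as the extra help mentioned above):

import java.util.LinkedHashMap;
import java.util.Map;

// Keeps the most recently used maxEntries values alive regardless of GC pressure.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true);     // access-order, so get() refreshes an entry
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict by count, not at the GC's whim
    }
}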
I've seen more than one project hit by that "bug".
The most "unambiguously sensible" use of weak references I've seen is Guava's Striped, which does lock striping. Basically, if no threads currently hold a reference to a lock variable, then there's no reason to keep that lock around, is there? That lock might have been used in the past, but it's not necessary now.
If I had to give a rule for when to use which, I'd say that soft references are preferable when you might use something in the future, but could recompute it if you really needed it; weak references are especially preferable when you can be sure the value will never be needed after the reference goes away. (For example, if you use the default reference equality for a particular equals(Object) implementation for a map key type, and the object stops being referenced anywhere else, then you can be sure that entry will never be referenced again.)
The main reason for me to use weak references is indirectly, through a WeakHashMap.
You might want to store a collection of objects in a Map (as a cache or for any other reason), but don't want them to be in memory for as long as the Map exists, especially if the objects are relatively large.
By using a WeakHashMap, you can make sure that the reference from the Map isn't the only thing keeping the object in memory, since it'll be garbage collected if no other references exist.
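A small sketch of that behaviour; note that System.gc() is only a hint, so the timing of the removal is not guaranteed:

import java.util.Map;
import java.util.WeakHashMap;

public class WeakHashMapDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Object, String> cache = new WeakHashMap<>();

        Object key = new Object();
        cache.put(key, "some large cached value");
        System.out.println(cache.size());  // 1

        key = null;       // drop the only strong reference to the key
        System.gc();      // only a request; the entry is removed eventually
        Thread.sleep(100);

        System.out.println(cache.size());  // usually 0 by now, but not guaranteed
    }
}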
Say you need to keep some information around for as long as an object is referenced, but you don't know when it will go away; you can use a weak reference to keep track of that information.
Yes, and it can have a good impact.
Example of "widget serial number" problem above, the easiest thing to do is use the built-in WeakHashMap class. WeakHashMap works exactly like HashMap, except that the keys (not the values!) are referred to using weak references. If a WeakHashMap key becomes garbage, its entry is removed automatically. This avoids the pitfalls I described and requires no changes other than the switch from HashMap to a WeakHashMap. If you're following the standard convention of referring to your maps via the Map interface, no other code needs to even be aware of the change.
Weak reference objects are needed on the JVM platform as a means of guarding against memory leaks.
As Java developers should know, Java can leak more than expected. This is particularly true in those cases where an object is no longer used but some collection still strongly references that instance: a very simple but recurrent example of a memory leak, since that memory will not be deallocated as long as the strong reference exists.
In the above case, using weak reference objects in the collection ensures that, with no chain of strong references but only weak ones, the instance can become eligible for garbage collection.
In my opinion, all features provided by the Java platform are useful to some extent: a very skilled programmer can make Java as fast and reliable as C++ by writing very high-quality code.
Decoupling from pub-sub or event bus
A WeakReference is good when you want to let an object head for garbage collection without having to gracefully remove itself from other objects holding a reference.
In scenarios such as publish-subscribe or an event bus, a collection of references to subscribing objects is held. Those references should be weak, so that the subscriber can conveniently go out of scope without bothering to unsubscribe. The subscriber can just “disappear” after all other places in the app have released their strong reference. At that point, there is no need for the subscription list or event bus to keep hanging on to the subscriber. The use of WeakReference allows the object to continue on its way into oblivion.
The subscribing object may have been subscribed without its knowledge, submitted to the pub-sub or event bus by some other 3rd-party object. Coordinating a call to unsubscribe later in the life-cycle of the subscriber can be quite cumbersome. So letting the subscriber fade away without formally unsubscribing may greatly simplify your code, and can avoid difficult bugs if that unsubscribing coordination were to fail.
Here is an example of a thread-safe set of weak references, as seen in this Answer and this Answer.
// A Set whose members are weakly held: entries vanish once no one else
// holds a strong reference to the subscriber. (Wrapped for thread-safety.)
this.subscribersSet =
        Collections.synchronizedSet(
                Collections.newSetFromMap(
                        new WeakHashMap<>()
                )
        );
Note that the entry in the set is not actually removed until after garbage-collection actually executes, as discussed in linked Answer above. While existing as a candidate for garbage-collection (no remaining strong references held anywhere), the item remains in the set.
It is required to make Java garbage collection deterministic. (Said from a slightly satirical point of view, but with some truth to it.)
I am currently debugging a web application which has been causing intermittent problems especially when the hit rate is high. As each new user is given a session containing personalised navigation menu I thought I would try to see how much memory the session variables need.
In a code section of the JSP which handles the creation of session objects, I added the following code:
System.gc();
long orgHeap = Runtime.getRuntime().freeMemory();
... create session variables ...
System.gc();
log.info("Heap delta: " + (Runtime.getRuntime().freeMemory() - orgHeap));
What surprised me is that adding System.gc() breaks the app. It does not work any more as some objects are getting lost (or so it seems). Values which had been initialised are null. When the gc() calls are removed the app works fine. gc() should only remove objects which have been marked for deletion - right?
Has any one else had similar problems?
The only case I can think of where this might actually happen as you describe is if you're using WeakReferences (or expired SoftReferences, which are broadly similar in this situation) to refer to objects that you still want to keep. The objects might then get into a state where they're collectable by the GC; but until it actually runs, you'll still be able to reach them through your references. Calling System.gc(), even though it doesn't have any guaranteed semantics, might cause the collector to run and harvest all these weakly-reachable objects.
That seems unlikely, though, because
Accidentally using WeakReferences for strongly-reachable objects doesn't seem like an easy mistake to fall into. Even if you're using libraries, I find it hard to think of a case where you might feasibly end up using weak references by mistake.
If this did happen, the behaviour of the application would be undefined anyway. Garbage collections could happen at any time, so you'd likely see inconsistent behaviour without the System.gc() call. You'd always have some bit of code that runs just after a collection anyway and so couldn't find its referent object.
System.gc() theoretically doesn't do anything, so it shouldn't be causing this.
That last point is the important one - why are you calling System.gc() anyway, when it is almost always counterproductive to call? I don't believe that you have a legitimate need to call it, it doesn't do anything you can rely on, and apparently it's breaking your application.
So if your app works fine without making the call, then just stop making it.
I would still consider inspecting how your app fits together though, because this is not going to be the actual cause of the problem and you likely have a deeper issue which is very fragile and just waiting to break on you later.
EDIT: Another possible reason for this could be simple timing. Calling System.gc() is likely to take a non-negligible amount of time. During this period, other threads could progress and change state in a way that the GCing thread isn't expecting. Hence when it returns from the call, the state of the world breaks its expectations and hence logic errors result. Again, this is just a guess, but is more plausible than the WeakReference one.
Garbage collection will never remove live objects, so I think your problem is elsewhere.
As you state this happens under load - "hit rate is high" - this could be a case of using EJBs incorrectly, where you expect something to happen because it does so under low load (like getting the same EJB repeatedly when asking for it), but that changes under high load (somebody else got that EJB, you get another).
What is a use case for a soft reference in Java? Would it be useful to garbage collect non-critical items when a JVM has run out of memory in order to free up enough resources to perhaps dump critical information before shutting down the JVM?
Are they called soft references in that they are soft and break when "put under stress", i.e. when the JVM has run out of memory? I understand weak references and phantom references but not really when these would be needed.
One use is for caching. Imagine you want to maintain an in-memory cache of large objects but you don't want that cache to consume memory that could be used for other purposes (for the cache can always be rebuilt). By maintaining a cache of soft-references to the objects, the referenced objects can be freed by the JVM and the memory they occupied reused for other purposes. The cache would merely need to clear out broken soft-references when it encounters them.
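A minimal sketch of such a cache, where the loader function stands in for however the value actually gets rebuilt:

import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Values are held softly, so the JVM may drop them under memory pressure;
// a broken reference is simply replaced by rebuilding the value on demand.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();
    private final Function<K, V> loader;  // placeholder for the real rebuild logic

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    synchronized V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get(); // null if cleared by the GC
        if (value == null) {
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }
}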
Another use may be for maintaining application images on a memory-constrained device, such as a mobile phone. As the user opens applications, the previous application images could be maintained as soft-references so that they can be cleared out if the memory is needed for something else but will still be there if there is not demand for memory. This will allow the user to return to the application more quickly if there is no pressure on memory and allow the previous application's memory to be reclaimed if it is needed for something else.
This article gave me a good understanding of each of them (weak, soft and phantom references). Here's a summarized excerpt:
A weak reference, simply put, is a reference that isn't strong enough to force an object to remain in memory. Weak references allow you to leverage the garbage collector's ability to determine reachability for you, so you don't have to do it yourself.
A soft reference is exactly like a weak reference, except that it is less eager to throw away the object to which it refers. An object which is only weakly reachable (the strongest references to it are WeakReferences) will be discarded at the next garbage collection cycle, but an object which is softly reachable will generally stick around for a while.
A phantom reference is quite different than either SoftReference or WeakReference. Its grip on its object is so tenuous that you can't even retrieve the object -- its get() method always returns null. The only use for such a reference is keeping track of when it gets enqueued into a ReferenceQueue, as at that point you know the object to which it pointed is dead.
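A small sketch of that behaviour using a PhantomReference and a ReferenceQueue; since System.gc() is only a request, the enqueueing is not guaranteed to happen within the timeout:

import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

public class PhantomDemo {
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object obj = new Object();
        PhantomReference<Object> phantom = new PhantomReference<>(obj, queue);

        System.out.println(phantom.get());  // always null for a phantom reference

        obj = null;    // drop the last strong reference
        System.gc();   // only a request; collection may or may not happen now

        Reference<?> enqueued = queue.remove(1000);  // wait up to one second
        System.out.println(enqueued == phantom);     // true once the object is dead
    }
}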
The best example I can think of is a cache. You might not mind dumping the oldest entries in the cache if memory became a problem. Caching large object graphs might make this likely as well.
An example of how a SoftReference can be used as a cache can be found in this post.
Has anyone researched the runtime costs involved in creating and garbage collecting Java WeakReference objects? Are there any performance issues (e.g. contention) for multi-threaded applications?
EDIT: Obviously the actual answer(s) will be JVM dependent, but general observations are also welcome.
EDIT 2: If anyone has done some benchmarking of the performance, or can point to some benchmarking results, that would be ideal. (Sorry, but the bounty has expired ...)
WeakReferences have a negative impact on the CMS garbage collector. As far as I can see from the behavior of our server, they influence the parallel remark phase time. During this phase all app threads are stopped, so it's extremely undesirable. So you need to be careful with WeakReferences.
I implemented a Java garbage collector once, so whatever I was able to accomplish is a (weak :) lower bound on what is possible.
In my implementation, there is a small constant amount of additional overhead for each weak reference when it is visited during garbage collection.
So the upshot is: I wouldn't worry about it, it's not a big problem unless you are using zillions of weak references.
Most importantly, the cost is proportional to the number of weak references in existence, not the size of the overall heap.
However, that's not to say that a garbage collector that supports weak references will be as fast as one that does not. The presumed question here is, given that Java supports weak references, what is the incremental cost of using them?
Mine was a simple "stop the world" mark/sweep garbage collector. During garbage collection, it makes a determination for every object whether that object is live or not and sets a LIVE bit in the object header. Then it goes through and frees all the non-live objects.
To handle weak references you just add the following:
Ignore weak references when setting LIVE bits (i.e., they don't cause the LIVE bit on the referenced object to be set).
During the sweep step, add a special check as follows: if the object you're visiting is LIVE, and it's a WeakReference, then check the object that it weakly references, and if that object is not LIVE, clear the reference.
Small variations of this logic work for soft and phantom references; a toy sketch of the two weak-reference rules follows below.
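As a toy illustration only (HeapObject and the heap list are made-up stand-ins for the collector's internal structures, not real JVM code):

import java.util.ArrayList;
import java.util.List;

// Toy model of an object header: a LIVE bit plus an optional weakly-referenced target.
class HeapObject {
    boolean live;               // the LIVE bit
    HeapObject weakReferent;    // non-null only if this object acts as a "WeakReference"
}

public class MarkSweepSketch {
    // Mark step: weak referents are deliberately NOT traversed (rule 1 above).
    static void mark(HeapObject root) {
        if (root == null || root.live) return;
        root.live = true;
        // strong fields would be traversed recursively here; weakReferent is ignored
    }

    // Sweep step: clear weak references to non-LIVE objects (rule 2), then free the rest.
    static void sweep(List<HeapObject> heap) {
        for (HeapObject obj : heap) {
            if (obj.live && obj.weakReferent != null && !obj.weakReferent.live) {
                obj.weakReferent = null;            // clear the weak reference
            }
        }
        heap.removeIf(obj -> !obj.live);            // free non-live objects
        heap.forEach(obj -> obj.live = false);      // reset LIVE bits for the next cycle
    }

    public static void main(String[] args) {
        HeapObject referent = new HeapObject();
        HeapObject weakRef = new HeapObject();
        weakRef.weakReferent = referent;

        List<HeapObject> heap = new ArrayList<>(List.of(referent, weakRef));
        mark(weakRef);                              // only the weak ref is a root
        sweep(heap);
        System.out.println(weakRef.weakReferent);   // null: the referent was not LIVE
    }
}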
Implementation is here if you're really curious.
A cache using weak references may significantly slow down your app if it's rebuilt on demand, e.g. in getters:
public Object getSomethingExpensiveToFind() {
    Object cached = cache.get(EXPENSIVE_OBJ_KEY); // may be null if the weak reference was already cleared
    if (cached != null) {
        return cached;
    }
    Object sth = obtainSomethingExpensiveToFind(); // computationally expensive
    cache.put(EXPENSIVE_OBJ_KEY, sth);
    return sth;
}
imagine this scenario:
1) app is running low on memory
2) GC cleans weak references, thus cache is cleared too
3) app continues, a lot of methods like getSomethingExpensiveToFind() are invoked and rebuild the cache
4) app is running low on memory again
5) GC cleans weak references, clears cache
6) app continues, a lot of methods like getSomethingExpensiveToFind() are invoked and rebuild the cache again
7) and so on...
I came upon such a problem - the app was interrupted by GC very often and it completely defeated the whole point of caching.
In other words, if improperly managed, weak references can slow down your application.