Backstory: So I had this great idea, right? Sometimes you're collecting a massive amount of data, and you don't need to access all of it all the time, but you also may not need it after the program has finished, and you don't really want to muck around with database tables, etc. What if you had a library that would silently and automatically serialize objects to disk when you're not using them, and silently bring them back when you needed them? So I started writing a library; it has a number of collections like "DiskList" or "DiskMap" where you put your objects. They keep your objects via WeakReferences. While you're still using a given object, it has strong references to it, so it stays in memory. When you stop using it, the object is garbage collected, and just before that happens, the collection serializes it to disk (*). When you want the object again, you ask for it by index or key, like usual, and the collection deserializes it (or returns it from its inner cache, if it hasn't been GCd yet).
(*) See now, this is the sticking point. In order for this to work, I need to be able to be notified JUST BEFORE the object is GCd - after no other references to it exist (and therefore the object can no longer be modified), but before the object is wiped from memory. This is proving difficult. I thought briefly that using a ReferenceQueue would save me, but alas, it returns a Reference, whose referent has thus far always been null.
Is there a way, having been given an arbitrary object, to receive (via callback or queue, etc.) the object after it is ready to be garbage collected, but before it IS garbage collected?
I know (Object).finalize() can basically do that, but I'll have to deal with classes that don't belong to me, and whose finalize methods I can't legitimately override. I'd prefer not to go as arcane as custom classloaders, bytecode manipulation, or reflection, but I will if I have to.
(Also, if you know of existing libraries that do transparent disk caching, I'd look favorably on that, though my requirements on such a library would be fairly stringent.)
You can look for a cache that supports "write behind caching" and tiering. Notable products would be EHCache, Hazelcast, Infinispan.
Or you can construct something by yourself with a cache and a time to idle expiry.
Then, the cache access would be "the usage" of the object.
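A rough sketch of that do-it-yourself idea, for illustration only: entries that have not been accessed for a while are serialized to temp files and read back on the next access. The class name IdleSpillCache, its methods, and the explicit `now` timestamps are all made up for this example (the timestamps just keep the behaviour deterministic), and real code would need thread safety and cleanup of the temp directory.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch: values idle for maxIdleMillis are moved from heap to disk.
final class IdleSpillCache<K, V extends Serializable> {
    private static final class Slot<V> {
        final V value;
        long lastAccess;
        Slot(V value, long lastAccess) { this.value = value; this.lastAccess = lastAccess; }
    }

    private final Map<K, Slot<V>> inMemory = new HashMap<>();
    private final Map<K, Path> onDisk = new HashMap<>();
    private final long maxIdleMillis;
    private final Path dir;

    IdleSpillCache(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
        try {
            this.dir = Files.createTempDirectory("idle-spill");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void put(K key, V value, long now) {
        inMemory.put(key, new Slot<>(value, now));
        Path stale = onDisk.remove(key);          // a fresh value replaces any spilled copy
        if (stale != null) delete(stale);
    }

    // Accessing an entry is "the usage": it refreshes the idle timer.
    V get(K key, long now) {
        Slot<V> slot = inMemory.get(key);
        if (slot == null) {
            Path file = onDisk.remove(key);
            if (file == null) return null;        // unknown key
            slot = new Slot<>(read(file), now);   // bring it back from disk
            inMemory.put(key, slot);
        }
        slot.lastAccess = now;
        return slot.value;
    }

    // Serializes every entry that has been idle too long; returns how many were spilled.
    int spillIdle(long now) {
        int spilled = 0;
        Iterator<Map.Entry<K, Slot<V>>> it = inMemory.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, Slot<V>> e = it.next();
            if (now - e.getValue().lastAccess >= maxIdleMillis) {
                onDisk.put(e.getKey(), write(e.getValue().value));
                it.remove();
                spilled++;
            }
        }
        return spilled;
    }

    int inMemoryCount() { return inMemory.size(); }

    private Path write(V value) {
        try {
            Path file = Files.createTempFile(dir, "entry", ".ser");
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
                out.writeObject(value);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    private V read(Path file) {
        try {
            V value;
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
                value = (V) in.readObject();
            }
            Files.delete(file);
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    private void delete(Path file) {
        try {
            Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that a caller who keeps its own reference to a spilled value and mutates it will not see that change reflected in the cache; that is the trade-off for not depending on the garbage collector.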
Is there a way, having been given an arbitrary object, to receive (via callback or queue, etc.) the object after it is ready to be garbage collected, but before it IS garbage collected?
This interferes heavily with garbage collection. Chances are high that it will bring down your application or whole system. What you want to do is to start disk I/O and potentially allocate additional objects, when the system is low or out of memory. If you manage to make it work, you'll end up using more heap than before, since the heap must always be extended when the GC kicks in.
Related
We have a Java API that is a wrapper around a C API.
As such, we end up with several Java classes that are wrappers around C++ classes.
These classes implement the finalize method in order to free the memory that has been allocated for them.
Generally, this works fine. However, in high-load scenarios we get out of memory exceptions.
Memory dumps indicate that virtually all the memory (around 6Gb in this case) is filled with the finalizer queue and the objects waiting to be finalized.
For comparison, the C API on its own never goes over around 150 Mb of memory usage.
Under low load, the Java implementation can run indefinitely. So this doesn't seem to be a memory leak as such. It just seems to be that under high load, new objects that require finalizing are generated faster than finalizers get executed.
Obviously, the 'correct' fix is to reduce the number of objects being created. However, that's a significant undertaking and will take a while. In the meantime, is there a mechanism that might help alleviate this issue? For example, by giving the GC more resources.
Java was designed around the idea that finalizers could be used as the primary cleanup mechanism for objects that go out of scope. Such an approach may have been almost workable when the total number of objects was small enough that the overhead of an "always scan everything" garbage collector would have been acceptable, but there are relatively few cases where finalization would be an appropriate cleanup measure in a system with a generational garbage collector (which nearly all JVM implementations are going to have, because it offers a huge speed boost compared to always scanning everything).
Using Closeable (or AutoCloseable) along with a try-with-resources construct is a vastly superior approach whenever it's workable. There is no guarantee that finalize methods will get called with any degree of timeliness, and there are many situations where patterns of interrelated objects may prevent them from getting called at all. While finalize can be useful for some purposes, such as identifying objects which got improperly abandoned while holding resources, there are relatively few purposes for which it would be the proper tool.
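As a minimal sketch of the try-with-resources approach: NativeBuffer below is a hypothetical wrapper, and its counter merely stands in for the real native allocate/free calls, which the question doesn't show.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper around a native allocation.
final class NativeBuffer implements AutoCloseable {
    // Stand-in for "how much native memory is currently held".
    static final AtomicInteger LIVE = new AtomicInteger();
    private boolean closed;

    NativeBuffer() {
        LIVE.incrementAndGet();      // the native allocate call would go here
    }

    @Override
    public void close() {
        if (!closed) {               // idempotent: a second close() is a no-op
            closed = true;
            LIVE.decrementAndGet();  // the native free call would go here
        }
    }
}

final class NativeBufferDemo {
    // Returns the number of live buffers observed while the resource is open.
    static int liveWhileInUse() {
        try (NativeBuffer buffer = new NativeBuffer()) {
            return NativeBuffer.LIVE.get();
        } // close() runs here, whether the body returns or throws
    }
}
```

The release happens at a known point on the calling thread, so nothing piles up waiting for a finalizer thread.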
If you do need to use finalizers, you should understand an important principle: contrary to popular belief, finalizers do not trigger when an object is actually garbage collected; they fire when an object would have been garbage collected but for the existence of a finalizer somewhere [including, but not limited to, the object's own finalizer]. No object can actually be garbage collected while any reference to it exists in any local variable, in any other object to which any reference exists, or any object with a finalizer that hasn't run to completion. Further, to avoid having to examine all objects on every garbage-collection cycle, objects which have been alive for a while will be given a "free pass" on most GC cycles. Thus, if an object with a finalizer is alive for a while before it is abandoned, it may take quite a while for its finalizer to run, and it will keep objects to which it holds references around long enough that they're likely to also earn a "free pass".
I would thus suggest that to the extent possible, even when it's necessary to use finalizers, you should limit their use to privately-held objects which in turn avoid holding strong references to anything which isn't explicitly needed for their cleanup task.
Phantom references are an alternative to finalizers available in Java.
Phantom references give you better control over the resource reclamation process:
you can combine explicit resource disposal (e.g. the try-with-resources construct) with GC-based disposal;
you can employ multiple threads for postmortem housekeeping.
Using phantom references is complicated, though. In this article you can find a minimal example of phantom-reference-based resource housekeeping.
In modern Java there is also the Cleaner class, which is based on phantom references too but provides the infrastructure (reference queue, worker threads, etc.) for ease of use.
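Here is a small sketch of the Cleaner approach, assuming Java 9 or later. TrackedResource, its State class and the RELEASED counter are invented for the example; the counter stands in for the real native free call.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical resource that is released explicitly via close(),
// with the Cleaner as a safety net if close() is never called.
final class TrackedResource implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();
    // Stand-in for the native free call: counts how many releases happened.
    static final AtomicInteger RELEASED = new AtomicInteger();

    // The cleanup action must not reference the TrackedResource itself,
    // otherwise the object would never become phantom reachable.
    private static final class State implements Runnable {
        private final long handle;
        State(long handle) { this.handle = handle; }
        @Override
        public void run() {
            RELEASED.incrementAndGet();  // free(handle) would go here
        }
    }

    private final Cleaner.Cleanable cleanable;

    TrackedResource(long handle) {
        this.cleanable = CLEANER.register(this, new State(handle));
    }

    @Override
    public void close() {
        cleanable.clean();  // runs the action at most once, even if called repeatedly
    }
}
```

With this shape, well-behaved callers release promptly through close(), and only abandoned objects fall back to the cleaner thread.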
So what I have here is a Java program that manipulates a huge amount of data and stores it in objects (mainly hash maps). At some point during the run the data becomes useless and I need to discard it so I can free up some memory.
My question is what would be the best behavior to discard these data to be garbage collected ?
I have tried the map.clear(), however this is not enough to clear the memory allocated by the map.
EDIT (To add alternatives I have tried)
I have also tried System.gc() to force the garbage collector to run, however it did not help.
HashMap#clear will throw all entries out of the HashMap, but it will not shrink it back to its initial capacity. That means you will have an empty backing array with (in your case, I guess) space for tens of thousands of entries.
If you do not intend to re-use the HashMap (with roughly the same amount of data), just throw away the whole HashMap instance (set it to null).
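To illustrate the difference, here is a tiny sketch; the Accumulator class and its method names are hypothetical. Replacing the map drops the enlarged backing array together with the entries, whereas clear() would keep the array.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for a large, temporary data set.
final class Accumulator {
    private Map<String, long[]> data = new HashMap<>();

    void record(String key, long[] values) {
        data.put(key, values);
    }

    int size() {
        return data.size();
    }

    // Drops the whole map instance instead of calling clear(), so the old
    // backing array becomes unreachable along with its entries.
    void discard() {
        data = new HashMap<>();
    }
}
```

This only helps if nothing else still holds a reference to the old map or its values.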
In addition to the above:
if the entries of the Map are still referenced by some other part of your system, they won't be garbage-collected even when they are removed from the Map (because they are needed elsewhere)
Garbage collection happens in the background, and only when it is required. So you may not immediately see a lot of memory being freed, and this may not be a problem.
System.gc() is not recommended, as the JVM should be the only one taking care of garbage collection. Use the WeakHashMap<K,V> class in this case.
The entries will automatically be removed once the key is no longer referenced anywhere else.
Please read this link for reference
I understand that weak references are at the mercy of the garbage collector, and we cannot guarantee that the referent will still exist. I could not see a need for weak references, but surely there must be a reason.
Why do we need weak references in Java?
What are the practical (some) uses of weak reference in java? If you can share how you used in your project it will be great!
It's actually quite often a bad idea to use weak hashmaps. For one it's easy to get wrong, but even worse it's usually used to implement some kind of cache.
What this does mean is the following: Your program runs fine with good performance for some time, under stress we allocate more and more memory (more requests = more memory pressure = probably more cache entries) which then leads to a GC.
Now suddenly while your system is under high stress you not only get the GC, but also lose your whole cache, just when you'd need it the most. Not fun this problem, so you at least have to use a reasonably sized hard referenced LRU cache to mitigate that problem - you can still use the weakrefs then but only as an additional help.
I've seen more than one project hit by that "bug"..
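A strongly referenced LRU cache of bounded size can be sketched with LinkedHashMap in access order; the class name LruCache is just for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded cache with strong references: entries survive GC pressure and are
// evicted only when the size limit is exceeded, least recently used first.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true);  // true = iterate in access order, not insertion order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;  // called by put(); drops the eldest entry when over the limit
    }
}
```

Like LinkedHashMap itself, this is not thread-safe, so a shared cache needs external synchronization.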
The most "unambiguously sensible" use of weak references I've seen is Guava's Striped, which does lock striping. Basically, if no threads currently hold a reference to a lock variable, then there's no reason to keep that lock around, is there? That lock might have been used in the past, but it's not necessary now.
If I had to give a rule for when you use which, I'd say that soft references are preferable when you might use something in the future, but could recompute it if you really needed it; weak references are especially preferable when you can be sure the value will never be needed after the last strong reference to it goes away. (For example, if you use the default reference equality for a particular equals(Object) implementation for a map key type, and the object stops being referenced anywhere else, then you can be sure that entry will never be referenced again.)
The main reason for me to use weak references is indirectly, through a WeakHashMap.
You might want to store a collection of objects in a Map (as a cache or for any other reason), but don't want them to be in memory for as long as the Map exists, especially if the objects are relatively large.
By using a WeakHashMap, you can make sure that the reference from the Map isn't the only thing keeping the object in memory, since it'll be garbage collected if no other references exist.
Say you need to keep some information as long as an object is referenced, but you don't know when it will go away, you can use a weak reference to keep track of the information.
Yes and it has good impact.
For the "widget serial number" problem above, the easiest thing to do is use the built-in WeakHashMap class. WeakHashMap works exactly like HashMap, except that the keys (not the values!) are referred to using weak references. If a WeakHashMap key becomes garbage, its entry is removed automatically. This avoids the pitfalls I described and requires no changes other than the switch from HashMap to a WeakHashMap. If you're following the standard convention of referring to your maps via the Map interface, no other code needs to even be aware of the change.
Reference
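A sketch of that serial-number idea, with made-up names (SerialNumbers, serialFor): extra information is attached to objects you don't own, and the map does not keep those objects alive.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical registry that hands out a serial number per widget.
final class SerialNumbers {
    // Keys are held weakly; an entry disappears some time after its key
    // becomes unreachable elsewhere and the collector has run.
    private final Map<Object, Integer> serials = new WeakHashMap<>();
    private int next = 1;

    int serialFor(Object widget) {
        return serials.computeIfAbsent(widget, w -> next++);
    }

    int tracked() {
        return serials.size();
    }
}
```

Two caveats: WeakHashMap compares keys with equals()/hashCode() rather than identity, and a value that strongly references its own key will keep the entry alive forever.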
Weak reference objects are needed on the JVM platform as a means of guarding against memory leaks.
As Java developers should know, Java can leak more than expected. This is particularly true in cases where an object is no longer used but some collection still strongly references that instance: a very simple but recurrent example of a memory leak, because that memory will not be deallocated as long as the strong reference exists.
In the above case, a weak reference object in the collection ensures that an instance reachable only through weak chains, with no chain of strong references, becomes eligible for garbage collection.
In my opinion, all features provided by the Java platform are useful to some extent: a very skilled programmer can drive Java to be as fast and reliable as C++ by writing very high quality code.
Decoupling from pub-sub or event bus
A WeakReference is good when you want to let an object head for garbage-collection without having to gracefully remove itself from other objects holding a reference.
In scenarios such as publish-subscribe or an event bus, a collection of references to subscribing objects is held. Those references should be weak, so that the subscriber can conveniently go out of scope without bothering to unsubscribe. The subscriber can just “disappear” after all other places in the app have released their strong reference. At that point, there is no need for the subscription list or event bus to keep hanging on to the subscriber. The use of WeakReference allows the object to continue on its way into oblivion.
The subscribing object may have been subscribed without its knowledge, submitted to the pub-sub or event bus by some other 3rd-party object. Coordinating a call to unsubscribe later in the life-cycle of the subscriber can be quite cumbersome. So letting the subscriber fade away without formally unsubscribing may greatly simplify your code, and can avoid difficult bugs if that unsubscribing coordination were to fail.
Here is an example of a thread-safe set of weak references, as seen in this Answer and this Answer.
this.subscribersSet =
        Collections.synchronizedSet(
                Collections.newSetFromMap(
                        new WeakHashMap <>()
                )
        );
Note that the entry in the set is not actually removed until after garbage-collection actually executes, as discussed in linked Answer above. While existing as a candidate for garbage-collection (no remaining strong references held anywhere), the item remains in the set.
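To show how such a set might be used, here is a minimal event-bus sketch. WeakEventBus and its methods are hypothetical names, not taken from the linked Answers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.WeakHashMap;
import java.util.function.Consumer;

// Hypothetical bus that never keeps its subscribers alive on its own.
final class WeakEventBus<E> {
    private final Set<Consumer<E>> subscribers =
            Collections.synchronizedSet(
                    Collections.newSetFromMap(
                            new WeakHashMap<>()));

    void subscribe(Consumer<E> subscriber) {
        subscribers.add(subscriber);
    }

    void publish(E event) {
        List<Consumer<E>> snapshot;
        // Iterating a synchronized collection requires holding its lock;
        // the copy also gives us strong references for the duration of delivery.
        synchronized (subscribers) {
            snapshot = new ArrayList<>(subscribers);
        }
        for (Consumer<E> subscriber : snapshot) {
            subscriber.accept(event);
        }
    }
}
```

The flip side is that somebody has to hold a strong reference to each subscriber: a lambda or anonymous listener that is passed straight to subscribe() and stored nowhere else may silently stop receiving events after a collection.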
It is required to make Java garbage collection deterministic. (From the slightly satirical point of view with some truth to it).
What is a use case for a soft reference in Java? Would it be useful to garbage collect non-critical items when a JVM has run out of memory in order to free up enough resources to perhaps dump critical information before shutting down the JVM?
Are they called soft references in that they are soft and break when "put under stress", i.e. when the JVM has run out of memory? I understand weak references and phantom references, but not really when these would be needed.
One use is for caching. Imagine you want to maintain an in-memory cache of large objects but you don't want that cache to consume memory that could be used for other purposes (for the cache can always be rebuilt). By maintaining a cache of soft-references to the objects, the referenced objects can be freed by the JVM and the memory they occupied reused for other purposes. The cache would merely need to clear out broken soft-references when it encounters them.
Another use may be for maintaining application images on a memory-constrained device, such as a mobile phone. As the user opens applications, the previous application images could be maintained as soft-references so that they can be cleared out if the memory is needed for something else but will still be there if there is not demand for memory. This will allow the user to return to the application more quickly if there is no pressure on memory and allow the previous application's memory to be reclaimed if it is needed for something else.
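The caching use could look roughly like the sketch below. SoftCache and its loader function are assumptions for the example; the point is only that a cleared reference is treated as a cache miss and the value is rebuilt.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache whose values may be reclaimed under memory pressure
// and are recomputed on demand.
final class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();
    private final Function<K, V> loader;

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Either never cached or cleared by the collector: rebuild it.
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

The loader is expected to return a non-null value; a null result would simply be reloaded on every call.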
This article gave me a good understanding of each of them (weak, soft and phantom references). Here's a summarized cite:
A weak reference, simply put, is a reference that isn't strong enough to force an object to remain in memory. Weak references allow you to leverage the garbage collector's ability to determine reachability for you, so you don't have to do it yourself.
A soft reference is exactly like a weak reference, except that it is less eager to throw away the object to which it refers. An object which is only weakly reachable (the strongest references to it are WeakReferences) will be discarded at the next garbage collection cycle, but an object which is softly reachable will generally stick around for a while.
A phantom reference is quite different than either SoftReference or WeakReference. Its grip on its object is so tenuous that you can't even retrieve the object -- its get() method always returns null. The only use for such a reference is keeping track of when it gets enqueued into a ReferenceQueue, as at that point you know the object to which it pointed is dead.
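Since the referent can't be retrieved, the usual trick is to subclass the reference and carry along whatever you need for housekeeping. A small sketch, where KeyedPhantom and its key field are made up for illustration:

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

// Hypothetical phantom reference that remembers which entry it belonged to.
final class KeyedPhantom<T> extends PhantomReference<T> {
    final String key;

    KeyedPhantom(T referent, ReferenceQueue<? super T> queue, String key) {
        super(referent, queue);
        this.key = key;
    }

    // Polls the queue once; returns the key of a dead referent, or null if
    // nothing has been enqueued yet.
    static String pollKey(ReferenceQueue<?> queue) {
        Reference<?> ref = queue.poll();
        return (ref instanceof KeyedPhantom) ? ((KeyedPhantom<?>) ref).key : null;
    }
}
```

The KeyedPhantom instance itself must stay strongly reachable (for example in a set) until it has been dequeued, otherwise it can be collected along with its referent and never show up in the queue.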
The best example I can think of is a cache. You might not mind dumping the oldest entries in the cache if memory became a problem. Caching large object graphs might make this likely as well.
An example of how a SoftReference can be used as a cache can be found in this post.
In order to perform some testing, I'd like to check how my application behaves when some or all of the objects I have stored in a cache of SoftReference'd objects are disposed of.
In order to do this, I'd like to manually clear the references stored in the cached SoftReference objects - simulating the VM disposing of those objects - but only if nothing else currently has a strong reference to that object (which could be the case if another process had recently retrieved the referenced object from the cache).
My application is single-threaded, so I don't need to worry about the soft reachability of a cached object changing as this code is executing. This also means that I don't currently have any locking mechanisms - if I did have, I could possibly have used these to determine whether or not an object was 'being used' and hence strongly reachable, but alas I don't have need of such locking.
One approach I have tried is to create an additional SoftReference to every object stored in the cache which is registered with a ReferenceQueue. My hope was that in doing so, all of the softly reachable objects in the cache would have their additional SoftReference added to the queue, so all I had to do was loop over the queue, and remove those objects from my cache. However, it seems that the GC enqueues softly reachable objects to their respective queues at its leisure, so it is not guaranteed that anything will be added to the queue once I've finished iterating through the objects in the cache.
One thing that I have also looked at is the -XX:SoftRefLRUPolicyMSPerMB JVM option with a very small value. With judicious memory allocation, this will quite probably clear softly reachable objects from the cache for me the moment they are softly reachable, but I'd really like the app to run normally until I receive a request to clear the softly reachable objects from the cache. As a JVM option, I don't believe I can alter this value while my app is running.
So, does anyone have any ideas as to how I can determine whether or not an object is only softly reachable (and hence can be cleared)?
Edit: A few extra points that may not have been clear:
The app will probably be doing useful work at the times that I want to clear these softly referenced objects out. So I'd prefer to not try and cause the GC to clear objects out for me.
It would be preferable if I could select which softly reachable objects were cleared out.
I would like the app to run normally, i.e. using production memory settings. Changing settings in code, which can then be reset back to their production values, is fine.
IIRC, soft references are guaranteed (in some sense) to be cleared before an OutOfMemoryError is thrown. So, if you allocate lots of memory they should get cleared if the objects are not strongly referenced. (Not tested.)
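Mixing some answers: as Tom Hawtin said, allocate memory until you hit an OutOfMemoryError, for example with this code:
// Needs java.util.ArrayList and java.util.List.
private void doOutOfMemory() {
    try {
        List<byte[]> list = new ArrayList<byte[]>();
        while (true) {
            // Keep allocating 200 MB chunks until the heap is exhausted.
            list.add(new byte[200 * 1024 * 1024]);
        }
    } catch (OutOfMemoryError ex) {
        // Expected: softly reachable objects have been cleared by now.
        // The list goes out of scope here, so its memory can be reclaimed.
    }
}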
If you want to control which objects are cleared, take a strong reference to the objects you want to keep.
You may also use WeakReferences instead and just call System.gc() to clear them, though there is no guarantee they will always be cleared.
Substitute a weakreference system for your current soft reference system while testing.
The weak reference system will drop an object with no other incoming references at the next garbage collection, instead of waiting until the JVM is under memory pressure.
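One way to make that substitution easy is to let the cache take the reference type as a parameter. The sketch below is only an illustration; RefCache, soft(), weak() and clearForTest() are invented names. Note that clearForTest() clears the reference unconditionally: it does not check whether something else still holds the object strongly, which the JVM gives you no direct way to ask.

```java
import java.lang.ref.Reference;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache whose reference strength is chosen at construction time.
final class RefCache<K, V> {
    private final Map<K, Reference<V>> entries = new HashMap<>();
    private final Function<V, Reference<V>> wrap;

    private RefCache(Function<V, Reference<V>> wrap) {
        this.wrap = wrap;
    }

    // Production flavour: values survive until memory gets tight.
    static <K, V> RefCache<K, V> soft() {
        return new RefCache<K, V>(v -> new SoftReference<V>(v));
    }

    // Test flavour: values can vanish at any collection once unreferenced.
    static <K, V> RefCache<K, V> weak() {
        return new RefCache<K, V>(v -> new WeakReference<V>(v));
    }

    void put(K key, V value) {
        entries.put(key, wrap.apply(value));
    }

    V get(K key) {
        Reference<V> ref = entries.get(key);
        if (ref == null) return null;
        V value = ref.get();
        if (value == null) entries.remove(key);  // drop the stale entry
        return value;
    }

    // Simulates the collector clearing one chosen entry (unconditionally).
    void clearForTest(K key) {
        Reference<V> ref = entries.get(key);
        if (ref != null) ref.clear();
    }
}
```

Callers that already hold the value keep working after clearForTest(); only later lookups through the cache see the miss, which is the same thing they would observe after a real collection.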