Has anyone researched the runtime costs involved in creating and garbage collecting Java WeakReference objects? Are there any performance issues (e.g. contention) for multi-threaded applications?
EDIT: Obviously the actual answer(s) will be JVM dependent, but general observations are also welcome.
EDIT 2: If anyone has done some benchmarking of the performance, or can point to some benchmarking results, that would be ideal. (Sorry, but the bounty has expired ...)
WeakReferences have a negative impact on the CMS garbage collector. As far as I can see from the behavior of our server, they influence the parallel remark phase time. During this phase all application threads are stopped, which is extremely undesirable. So you need to be careful with WeakReferences.
I implemented a Java garbage collector once, so whatever I was able to accomplish is a (weak :) lower bound on what is possible.
In my implementation, there is a small constant amount of additional overhead for each weak reference when it is visited during garbage collection.
So the upshot is: I wouldn't worry about it, it's not a big problem unless you are using zillions of weak references.
Most importantly, the cost is proportional to the number of weak references in existence, not the size of the overall heap.
However, that's not to say that a garbage collector that supports weak references will be as fast as one that does not. The presumed question here is, given that Java supports weak references, what is the incremental cost of using them?
Mine was a simple "stop the world" mark/sweep garbage collector. During garbage collection, it makes a determination for every object whether that object is live or not and sets a LIVE bit in the object header. Then it goes through and frees all the non-live objects.
To handle weak references you just add the following:
Ignore weak references when setting LIVE bits (i.e., they don't cause the LIVE bit on the referenced object to be set).
During the sweep step, add a special check as follows: if the object you're visiting is LIVE, and it's a WeakReference, then check the object that it weakly references, and if that object is not LIVE, clear the reference.
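To make that concrete, here is a minimal sketch of the sweep-step check in Java; HeapObject, WeakRef, and the live bit are hypothetical stand-ins for my collector's internal structures, not real JDK types:

import java.util.List;

// Illustrative-only heap model for the sweep-step check described above.
class HeapObject {
    boolean live; // the LIVE mark bit set during the mark phase
}

class WeakRef extends HeapObject {
    HeapObject referent; // the edge the mark phase deliberately ignores
}

class Sweeper {
    // Clear weak references whose referents did not survive marking,
    // then free every non-live object.
    static void sweep(List<HeapObject> heap) {
        for (HeapObject obj : heap) {
            if (obj.live && obj instanceof WeakRef) {
                WeakRef ref = (WeakRef) obj;
                if (ref.referent != null && !ref.referent.live) {
                    ref.referent = null; // the referent is about to be freed
                }
            }
        }
        heap.removeIf(o -> !o.live); // "free" everything that wasn't marked
    }
}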
Small variations of this logic work for soft and phantom references.
Implementation is here if you're really curious.
A cache using weak references may significantly slow down your app if it is rebuilt on demand, e.g. in getters:
// assume 'cache' holds its values via weak references, so the GC may
// clear an entry at any time
public Object getSomethingExpensiveToFind() {
    // fetch once instead of contains() followed by get(): the entry could
    // be cleared by the GC between the two calls
    Object sth = cache.get(EXPENSIVE_OBJ_KEY);
    if (sth == null) {
        sth = obtainSomethingExpensiveToFind(); // computationally expensive
        cache.put(EXPENSIVE_OBJ_KEY, sth);
    }
    return sth;
}
imagine this scenario:
1) app is running low on memory
2) GC cleans weak references, thus cache is cleared too
3) app continues, a lot of methods like getSomethingExpensiveToFind() are invoked and rebuild the cache
4) app is running low on memory again
5) GC cleans weak references, clears cache
6) app continues, a lot of methods like getSomethingExpensiveToFind() are invoked and rebuild the cache again
7) and so on...
I ran into exactly this problem: the app was interrupted by GC very often, and it completely defeated the whole point of caching.
In other words, if improperly managed, weak references can slow down your application.
Related
We have a Java API that is a wrapper around a C API.
As such, we end up with several Java classes that are wrappers around C++ classes.
These classes implement the finalize method in order to free the memory that has been allocated for them.
Generally, this works fine. However, in high-load scenarios we get out of memory exceptions.
Memory dumps indicate that virtually all the memory (around 6 GB in this case) is filled with the finalizer queue and the objects waiting to be finalized.
For comparison, the C API on its own never goes over around 150 MB of memory usage.
Under low load, the Java implementation can run indefinitely, so this doesn't seem to be a memory leak as such. It just seems that under high load, new objects that require finalization are generated faster than the finalizers get executed.
Obviously, the 'correct' fix is to reduce the number of objects being created. However, that's a significant undertaking and will take a while. In the meantime, is there a mechanism that might help alleviate this issue? For example, by giving the GC more resources.
Java was designed around the idea that finalizers could be used as the primary cleanup mechanism for objects that go out of scope. Such an approach may have been almost workable when the total number of objects was small enough that the overhead of an "always scan everything" garbage collector would have been acceptable, but there are relatively few cases where finalization would be an appropriate cleanup measure in a system with a generational garbage collector (which nearly all JVM implementations are going to have, because it offers a huge speed boost compared to always scanning everything).
Using AutoCloseable along with the try-with-resources construct is a vastly superior approach whenever it's workable. There is no guarantee that finalize methods will get called with any degree of timeliness, and there are many situations where patterns of interrelated objects may prevent them from getting called at all. While finalize can be useful for some purposes, such as identifying objects which got improperly abandoned while holding resources, there are relatively few purposes for which it is the proper tool.
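As an illustration, here is a minimal sketch of the try-with-resources approach for a native wrapper; nativeAllocate and nativeFree are hypothetical stand-ins for the actual JNI calls that allocate and free the underlying C++ object:

class NativeWrapper implements AutoCloseable {
    private long handle = nativeAllocate(); // pointer to the C++ object

    @Override
    public void close() {
        if (handle != 0) {
            nativeFree(handle); // deterministic release, no finalizer queue involved
            handle = 0;
        }
    }

    private static native long nativeAllocate();
    private static native void nativeFree(long handle);

    static void useIt() {
        try (NativeWrapper w = new NativeWrapper()) {
            // ... call into native code via w ...
        } // close() runs here, even if an exception is thrown
    }
}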
If you do need to use finalizers, you should understand an important principle: contrary to popular belief, finalizers do not trigger when an object is actually garbage collected; they fire when an object would have been garbage collected but for the existence of a finalizer somewhere [including, but not limited to, the object's own finalizer]. No object can actually be garbage collected while any reference to it exists in any local variable, in any other object to which any reference exists, or in any object with a finalizer that hasn't run to completion. Further, to avoid having to examine all objects on every garbage-collection cycle, objects which have been alive for a while are given a "free pass" on most GC cycles. Thus, if an object with a finalizer is alive for a while before it is abandoned, it may take quite a while for its finalizer to run, and it will keep the objects it references around long enough that they are likely to earn a "free pass" of their own.
I would thus suggest that, to the extent possible, even when it is necessary to use finalizers, you should limit their use to privately-held objects which in turn avoid holding strong references to anything that isn't explicitly needed for their cleanup task.
Phantom references are an alternative to finalizers available in Java.
Phantom references allow you to better control resource reclamation process.
You can combine explicit resource disposal (e.g. the try-with-resources construct) with GC-based disposal.
You can employ multiple threads for postmortem housekeeping.
Using phantom references is complicated, though. In this article you can find a minimal example of phantom-reference-based resource housekeeping.
In modern Java there is also the Cleaner class, which is based on phantom references too, but provides the infrastructure (reference queue, worker threads, etc.) for ease of use.
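A minimal sketch of the Cleaner-based pattern (Java 9+); the cleanup action is deliberately a static class, since capturing a reference to the resource itself would keep it reachable and defeat the mechanism:

import java.lang.ref.Cleaner;

class NativeResource implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // the cleanup action must not reference the NativeResource itself
    private static final class State implements Runnable {
        @Override public void run() {
            // release native memory, file handles, etc.
        }
    }

    private final Cleaner.Cleanable cleanable = CLEANER.register(this, new State());

    @Override
    public void close() {
        cleanable.clean(); // explicit disposal; GC-triggered cleanup is only a backstop
    }
}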
I am generating a large data structure and writing it to hard disk. Afterwards I want to get rid of the object to reduce memory consumption. My problem is that even after I have forced a garbage collection, the amount of used memory is at least as high as it was before the collection. I have added a minimal working example of what I am doing.
DataStructure data = new DataStructure();
data.generateStructure(pathToData);
Writer.writeData(data);
WeakReference<Object> ref = new WeakReference<Object>(data);
data = null;
while (ref.get() != null) {
System.gc();
}
The code should force a garbage collection of the data object, as recommended in this thread:
Forcing Garbage Collection in Java?
I know this does not guarantee the collection of the data object, but in the past I was more successful using garbage collection as described at the link than by simply calling System.gc().
Maybe someone has an answer about the best way to get rid of large objects.
It seems that this is premature optimization (or rather an illusion of it). System.gc() is not guaranteed to force a garbage collection; what you are doing here is busy-waiting for a non-guaranteed GC to happen. And if the heap does not fill up, the JVM might not start a garbage collection at all.
I think that you should start thinking about this problem when you stress test your application and you can identify this part as a bottleneck.
So in a nutshell, you can't really force a GC, and this is intentional. The JVM will know when and how to free up space. I think that if you clear your references and call System.gc(), you can move on without caring whether it gets cleaned up or not. You may read the official documentation about how to fine-tune the garbage collector. Rather than asking Java to GC from your code, you should apply GC tuning according to the documentation.
Just a sidenote: the JVM will expand some of the heap's generations if the need arises. As far as I know, there is a configuration option to set the thresholds at which the JVM will shrink a generation. Use -XX:MinHeapFreeRatio/-XX:MaxHeapFreeRatio if you don't want Java to reserve memory which it does not need.
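For example (the values here are illustrative only), a lower MaxHeapFreeRatio makes the JVM give unused heap back more aggressively:

java -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30 -jar yourApp.jar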
This idiom is broken for a whole range of reasons, here are some:
System.gc() doesn't force anything; it is just a hint to the garbage collector;
there is no guarantee when a weak reference will be cleared. The spec says "Suppose that the garbage collector determines at a certain point in time that an object is weakly reachable". When that happens, it is up to the implementation;
even after the weak reference is cleared, there is no telling when its referent's memory will actually be reclaimed. The only thing you know at that point is that the object has transitioned from "weakly reachable" to "finalizable". It may even be resurrected from the finalizer.
From my experience, just calling System.gc() a fixed number of times, for example three, with delays between them in the range of half a second to a second (your GC could be ConcurrentMarkSweep), gives much more stable results than these supposedly "smart" approaches.
A final note: never use System.gc in production code. It is almost impossible to make it bring any value to your project. I use it only for microbenchmarking.
UPDATE
In the comments you provide a key piece of information which is not in your question: you are interested in reducing the total heap size (Runtime#totalMemory) after you are done with your object, and not just the heap occupancy (Runtime#totalMemory-Runtime#freeMemory). This is completely outside of programmatic control and on certain JVM implementations it never happens: once the heap has increased, the memory is never released back to the operating system.
Is it possible to mark java objects non-collectable from gc perspective to save on gc-sweep time?
Something along the lines of http://wwwasd.web.cern.ch/wwwasd/lhc++/Objectivity/V5.2/Java/guide/jgdStorage.fm.html and specifically non-garbage-collectible containers there (non-garbage-collectable?).
The problem is that I have lots of ordinary temporary objects, but I have even bigger (several Gigs) of objects that are stored for Cache purposes. For no reason should the Java GC traverse all those Cache gigabytes trying to find anything to collect, because they contain cached data which have their own timeouts.
This way I could partition my data in a custom way into infinite-lived and normal-lived objects, and hopefully the GC would be quite fast, because normal objects don't live long and add up to a smaller volume.
There are some workarounds to this problem, such as Apache DirectMemory and the commercial Terracotta BigMemory (http://terracotta.org/products/bigmemory), but a Java-native solution would be nicer (I mean free, and probably more reliable?). Also I want to avoid serialization overhead, which means it should happen within the same JVM. To my understanding, DirectMemory and BigMemory operate mainly off heap, which means that objects must be serialized/deserialized to/from memory outside the JVM. Simply marking non-GC regions within the JVM would seem a better solution. Using files for the cache is not an option either; it has the same unaffordable serialization/deserialization overhead. The use case is an HA server with lots of data used in random (human) order, where low latency is needed.
Any memory the JVM manages is also garbage-collected by the JVM. And any “live” objects which are directly available to Java methods without deserialization have to live in JVM memory. Therefore in my understanding you cannot have live objects which are immune to garbage collection.
On the other hand, the usage you describe should make the generational approach to garbage collection quite efficient. If your big objects stay around for a while, they will be checked for reclamation less often. So I doubt there is much to be gained from avoiding those checks.
Is it possible to mark java objects non-collectable from gc perspective to save on gc-sweep time?
No it is not possible.
You can prevent objects from being garbage collected by keeping them reachable, but the GC will still need to trace them to check reachability on each full GC (at least).
It is simply my assumption that when the JVM is starving it begins scanning all those unnecessary objects too.
Yes. That is correct. However, unless you've got LOTS of objects that you want to be treated this way, the overhead is likely to be insignificant. (And anyway, a better idea is to give the JVM more memory ... if that is possible.)
Quite simply, for you to be able to do this, the garbage collection algorithm would need to be aware of such a flag, and take it into account when doing its work.
I'm not aware of any of the standard GC algorithms having such a flag, so for this to work you would need to write your own GC algorithm (after deciding on some feasible way to communicate this information to it).
In principle, in fact, you've already started down this track - you're deciding how garbage collection should be done rather than being happy to leaving it to the JVM's GC algo. Is the situation you describe a measurable problem for you; something for which the existing garbage collection is insufficient, but your plan would work? Garbage collectors are extremely well-tuned, so I wouldn't be surprised if the "inefficient" default strategy is actually faster than your naively-optimal one.
(Doing manual memory management is tricky and error-prone at the best of times; managing some memory yourself while using a stock garbage collector to handle the rest seems even worse. I expect you'd run into a lot of edge cases where the GC assumes it "knows" what's happening with the whole heap, which would no longer be true. Steer clear if you can...)
The recommended approaches would be to use either a commercial RTSJ implementation to avoid GC, or to use off-heap memory. One could also look into soft references for caches (they do get collected).
This is not recommended:
If for some reason you do not believe these options are sufficient, you could look into direct memory access, which is UNSAFE (part of sun.misc.Unsafe). You can use the 'theUnsafe' field to get the 'Unsafe' instance. Unsafe allows you to allocate/deallocate memory via 'allocateMemory' and 'freeMemory'. This is not under GC control, nor limited by the JVM heap size. The impact on the GC/application once you go down this route is not guaranteed, which is why using byte buffers might be the way to go (if you're not using an RTSJ-like implementation).
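As a sketch of the byte-buffer route (sizes and offsets here are illustrative), a direct ByteBuffer keeps its payload outside the garbage-collected heap, so the GC only ever traces the small wrapper object:

import java.nio.ByteBuffer;

public class OffHeapDemo {
    public static void main(String[] args) {
        // 64 MB outside the Java heap; the GC never scans the payload itself
        ByteBuffer offHeap = ByteBuffer.allocateDirect(64 * 1024 * 1024);
        offHeap.putLong(0, 42L);                // write at an absolute offset
        System.out.println(offHeap.getLong(0)); // read it back: 42
    }
}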
Hope this helps.
Live Java objects will always be part of the GC life cycle. Or, said another way, marking an object as non-GC has the same order of overhead as having your object referenced by a root reference (a static final map, for instance).
But thinking a bit further, data put in a cache is most likely temporary and will eventually be evicted. At that point you will start to like the JVM and the GC again.
If you have 100's of GBs of permanent data, you may want to rethink the architecture of your application, and try to shard and distribute your data (horizontally scalability).
Last but not least, lots of work has been done around serialization, and the overhead of serialization (I'm not speaking about the poor reputation of ObjectInputStream and ObjectOutputStream) is not that big.
More than that, if your data is mainly composed of primitive types (including byte arrays), there are efficient ways to readInt() or readBytes() from off-heap buffers (for instance netty.io's ChannelBuffer). This could be a way to go.
I have always wondered why the garbage collector in Java activates whenever it feels like it, rather than doing:
if(obj.refCount == 0)
{
delete obj;
}
Are there any big advantages to how Java does it that I overlooked?
Thanks
Each JVM is different, but the HotSpot JVM does not primarily rely on reference counting as a means for garbage collection. Reference counting has the advantage of being simple to implement, but it is inherently error-prone. In particular, if you have a reference cycle (a set of objects that all refer to one another in a cycle), then reference counting will not correctly reclaim those objects because they all have a nonzero reference count. This forces you to use an auxiliary garbage collector from time to time, which tends to be slower (Mozilla Firefox had this exact problem, and IIRC their solution was to add in a garbage collector on top of reference counting). This is why, for example, languages like C++ tend to have a combination of shared_ptrs that use reference counting and weak_ptrs that don't contribute to the reference count, precisely so that reference cycles can be broken.
Additionally, associating a reference count with each object makes the cost of assigning a reference greater than normal, because of the extra bookkeeping involved in adjusting the reference count (which only gets worse in the presence of multithreading). Furthermore, using reference counting precludes the use of certain types of fast memory allocators, which can be a problem. It also tends to lead to heap fragmentation in its naive form: objects end up scattered through memory rather than tightly packed, increasing allocation times and causing poor locality.
The HotSpot JVM uses a variety of different techniques to do garbage collection, but its primary garbage collector is called a stop-and-copy collector. This collector works by allocating objects contiguously in memory next to one another, and allows for extremely fast (one or two assembly instructions) allocation of new objects. When space runs out, all of the new objects are GC'ed simultaneously, which usually kills off most of the new objects that were constructed. As a result, the GC is much faster than a typical reference-counting implementation, and it ends up having better locality and better performance.
For a comparison of techniques in garbage collecting, along with a quick overview of how the GC in HotSpot works, you may want to check out these lecture slides from a compilers course that I taught last summer. You may also want to look at the HotSpot garbage collection white paper that goes into much more detail about how the garbage collector works, including ways of tuning the collector on an application-by-application basis.
Hope this helps!
Reference counting has the following limitations:
It is VERY bad for multithreading performance (basically, every assignment of an object reference must be protected).
You cannot free cycles automatically
Because it doesn't work strictly based on reference counting.
Consider circular references which are no longer reachable from the "root" of the application.
For example:
APP has a reference to SOME_SCREEN
SOME_SCREEN has a reference to SOME_CHILD
SOME_CHILD has a reference to SOME_SCREEN
now, APP drops its reference to SOME_SCREEN.
Now SOME_SCREEN still has a reference to SOME_CHILD, and SOME_CHILD still has a reference to SOME_SCREEN, so your example doesn't work.
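A minimal sketch of that cycle in Java (class and field names are illustrative):

class Screen { Child child; }
class Child { Screen parent; }

class App {
    public static void main(String[] args) {
        Screen screen = new Screen();  // APP -> SOME_SCREEN
        screen.child = new Child();    // SOME_SCREEN -> SOME_CHILD
        screen.child.parent = screen;  // SOME_CHILD -> SOME_SCREEN
        screen = null;                 // APP drops its reference
        // Under naive reference counting both objects would leak here; the
        // JVM's tracing GC reclaims them because neither is reachable from
        // any GC root, despite the mutual references.
    }
}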
Now, others (Apple with ARC, Microsoft with COM, many others) have solutions for this and work more similarly to how you describe it.
With ARC you have to annotate your references with keywords like strong and weak to let ARC know how to deal with these references (and avoid circular references). (Don't read too far into my specific example with ARC, because ARC handles these things ahead of time during the compilation process and doesn't require a specific runtime per se.) So it can definitely be done similarly to how you describe it, but it's just not workable with some of the features of Java. I also believe COM works more similarly to how you describe... but again, that's not free of some amount of consideration on the developer's part.
In fact, no "simple" reference counting scheme would ever be workable without some amount of thought by the application developer (to avoid circular references, etc)
Because the garbage collector in modern JVMs no longer tracks reference counts. That algorithm is used to teach how GC works, but it is both resource-consuming and error-prone (e.g. with cyclic dependencies).
Because the garbage collector in Java uses a copying collector for 'young generation' objects and mark-and-sweep for 'tenured generation' objects.
Resources from: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html
What is a use case for a soft reference in Java? Would it be useful to garbage collect non-critical items when a JVM has run out of memory in order to free up enough resources to perhaps dump critical information before shutting down the JVM?
Are they called soft references in that they are soft and break when "put under stress", i.e. when the JVM has run out of memory? I understand weak references and phantom references, but not really when these would be needed.
One use is for caching. Imagine you want to maintain an in-memory cache of large objects, but you don't want that cache to consume memory that could be used for other purposes (since the cache can always be rebuilt). By maintaining a cache of soft references to the objects, the referenced objects can be freed by the JVM and the memory they occupied reused for other purposes. The cache would merely need to clear out cleared soft references when it encounters them.
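A minimal sketch of such a cache; loadImage() here is a hypothetical stand-in for the expensive operation:

import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

class ImageCache {
    private final Map<String, SoftReference<byte[]>> cache = new HashMap<>();

    byte[] get(String name) {
        SoftReference<byte[]> ref = cache.get(name);
        byte[] image = (ref == null) ? null : ref.get();
        if (image == null) {         // never cached, or cleared under memory pressure
            image = loadImage(name); // rebuild the evicted entry
            cache.put(name, new SoftReference<>(image));
        }
        return image;
    }

    private byte[] loadImage(String name) {
        return new byte[1024 * 1024]; // stand-in for the real expensive load
    }
}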
Another use may be for maintaining application images on a memory-constrained device, such as a mobile phone. As the user opens applications, the previous application images could be maintained as soft references so that they can be cleared out if the memory is needed for something else, but will still be there if there is no demand for memory. This allows the user to return to the application more quickly if there is no pressure on memory, and allows the previous application's memory to be reclaimed if it is needed for something else.
This article gave me a good understanding of each of them (weak, soft and phantom references). Here's a summarized excerpt:
A weak reference, simply put, is a reference that isn't strong enough to force an object to remain in memory. Weak references allow you to leverage the garbage collector's ability to determine reachability for you, so you don't have to do it yourself.
A soft reference is exactly like a weak reference, except that it is less eager to throw away the object to which it refers. An object which is only weakly reachable (the strongest references to it are WeakReferences) will be discarded at the next garbage collection cycle, but an object which is softly reachable will generally stick around for a while.
A phantom reference is quite different than either SoftReference or WeakReference. Its grip on its object is so tenuous that you can't even retrieve the object -- its get() method always returns null. The only use for such a reference is keeping track of when it gets enqueued into a ReferenceQueue, as at that point you know the object to which it pointed is dead.
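A minimal sketch of that enqueue-based pattern; note that System.gc() is only a hint, so the timeout may expire without the reference being enqueued:

import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;

public class PhantomDemo {
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object payload = new Object();
        PhantomReference<Object> ref = new PhantomReference<>(payload, queue);

        System.out.println(ref.get()); // always null, as described above
        payload = null;                // drop the last strong reference
        System.gc();                   // hint only

        // blocks until the GC enqueues the reference, or 2 s elapse
        boolean collected = queue.remove(2000) != null;
        System.out.println("referent collected: " + collected);
    }
}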
The best example I can think of is a cache. You might not mind dumping the oldest entries in the cache if memory became a problem. Caching large object graphs might make this likely as well.
An example of how a SoftReference can be used as a cache can be found in this post.