I want to write a cache using SoftReferences using as much memory as possible, as long as it doesn't get too inefficient.
Trying to estimate the used size by calculating object sizes or by getting some used memory approximation of the JVM are dead ends.
The javadoc even states that SoftReferences are good for memory-aware caches, but there is no hard rule on how a JVM implementation shall handle SoftReferences. I'm only talking about the Oracle implementation of the JVM (Java 6 update 22 and above, and Java 7).
Now my questions (please feel free to answer partially, grouped, or in any way you please):
Does the JVM take the last access of the object into account and only remove the old ones? Javadoc states: Virtual machine implementations are, however, encouraged to bias against clearing recently-created or recently-used soft references.
What happens when memory gets tight? The JVM panics and just eats all objects?
Is there a parameter for telling the JVM to only eat as much as needed to survive (no OOMEs) and live healthy (not having the CPU only run the GC)?
I don't think there is a fixed order. (I'm not sure, though, about the order of events.)
What happens with soft references is that it is always guaranteed that they will be cleared before an OutOfMemoryError is thrown - unless you have a hard reference pointing to the referents.
But you should be aware that you might try to access them and find they are gone. My guess is that the garbage collector will just clear the first soft references that free up the amount of memory needed for the operation.
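As an illustration of that last point, here is a minimal sketch of a soft-reference-backed cache entry; getOrReload() and reload() are hypothetical names for re-fetching the value once the GC has cleared it:

```java
import java.lang.ref.SoftReference;

// Minimal sketch: the cached value may have been cleared by the GC,
// so every access must be prepared for get() returning null.
public class SoftCacheEntry {
    private SoftReference<byte[]> ref =
            new SoftReference<byte[]>(new byte[1024 * 1024]);

    public byte[] getOrReload() {
        byte[] data = ref.get();
        if (data == null) {                        // cleared under memory pressure
            data = reload();                       // hypothetical re-computation / re-fetch
            ref = new SoftReference<byte[]>(data);
        }
        return data;
    }

    private byte[] reload() {
        return new byte[1024 * 1024];              // placeholder for the real work
    }
}
```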
Although SoftReferences are a cool feature, I personally don't dare using them in large
projects where I don't know the memory requirements of every other component. Will a memory-hogging SoftReference cache make other parts perform badly?
Instead of using SoftReferences, I'd consider using EHCache. It lets you limit the size of particular caches in terms of number of entries or, even better, the bytes used in memory (a new feature in the upcoming version 2.5). Different eviction strategies can be configured, of course, such as LRU. There's lots you can configure with EHCache.
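For illustration, here is a rough sketch of such an entry-count-limited, LRU-evicting cache, assuming the Ehcache 2.x API (net.sf.ehcache); the cache name and the 10000-entry limit are example values:

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.store.MemoryStoreEvictionPolicy;

// Rough sketch of a size-bounded, LRU-evicting Ehcache (2.x API assumed).
public class EhcacheExample {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create();
        Cache cache = new Cache(
                new CacheConfiguration("imageCache", 10000)        // at most 10000 in-memory entries
                        .memoryStoreEvictionPolicy(MemoryStoreEvictionPolicy.LRU));
        manager.addCache(cache);

        cache.put(new Element("key", "value"));
        Element hit = cache.get("key");
        System.out.println(hit == null ? null : hit.getObjectValue());

        manager.shutdown();
    }
}
```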
If you're using Spring, then version 3.1 will also provide you with some nice @Cacheable method-level annotations; EHCache can be used as the caching implementation there.
What happens when memory gets tight? The JVM panics and just eats all objects?
I know for a fact that with the Oracle 1.6 JVM this is not the case. I am aware of a situation where a server that processes concurrent requests uses a response that contains the actual data inside a soft reference. I have observed that when a low-memory situation is reported by one thread, the other threads' soft references continue to hold on to their contents (the referenced objects).
Is there a parameter for telling the JVM to only eat as much as needed to survive (no OOMEs) and live healthy (not having the CPU only run the GC)?
What is enough to survive? You mean that if X amount of memory is required, then only reclaim soft references until X is available? I didn't find any such tuning parameter, but as I said, the JVM does not seem to reclaim all soft references when it needs to reclaim one.
Related
We have a Java API that is a wrapper around a C API.
As such, we end up with several Java classes that are wrappers around C++ classes.
These classes implement the finalize method in order to free the memory that has been allocated for them.
Generally, this works fine. However, in high-load scenarios we get out of memory exceptions.
Memory dumps indicate that virtually all the memory (around 6Gb in this case) is filled with the finalizer queue and the objects waiting to be finalized.
For comparison, the C API on its own never goes over around 150 Mb of memory usage.
Under low load, the Java implementation can run indefinitely. So this doesn't seem to be a memory leak as such. It just seems to be that under high load, new objects that require finalizing are generated faster than finalizers get executed.
Obviously, the 'correct' fix is to reduce the number of objects being created. However, that's a significant undertaking and will take a while. In the meantime, is there a mechanism that might help alleviate this issue? For example, by giving the GC more resources.
Java was designed around the idea that finalizers could be used as the primary cleanup mechanism for objects that go out of scope. Such an approach may have been almost workable when the total number of objects was small enough that the overhead of an "always scan everything" garbage collector would have been acceptable, but there are relatively few cases where finalization would be an appropriate cleanup measure in a system with a generational garbage collector (which nearly all JVM implementations are going to have, because it offers a huge speed boost compared to always scanning everything).
Using Closeable along with the try-with-resources construct is a vastly superior approach whenever it's workable. There is no guarantee that finalize methods will get called with any degree of timeliness, and there are many situations where patterns of interrelated objects may prevent them from getting called at all. While finalize can be useful for some purposes, such as identifying objects which got improperly abandoned while holding resources, there are relatively few purposes for which it would be the proper tool.
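For illustration, here is a minimal sketch of that approach; NativeBuffer and its allocate/free calls are hypothetical placeholders for a real native resource:

```java
// Hypothetical wrapper: cleanup happens deterministically in close(),
// not in a finalizer.
final class NativeBuffer implements AutoCloseable {
    private long address = allocate();            // placeholder for a native allocation

    @Override
    public void close() {
        if (address != 0) {
            free(address);                        // placeholder for the native free call
            address = 0;
        }
    }

    private static long allocate() { return 1L; }
    private static void free(long address) { }
}

class Demo {
    static void work() {
        // close() runs as soon as this block exits, normally or exceptionally
        try (NativeBuffer buffer = new NativeBuffer()) {
            // use buffer here
        }
    }
}
```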
If you do need to use finalizers, you should understand an important principle: contrary to popular belief, finalizers do not trigger when an object is actually garbage collected; they fire when an object would have been garbage collected but for the existence of a finalizer somewhere [including, but not limited to, the object's own finalizer]. No object can actually be garbage collected while any reference to it exists in any local variable, in any other object to which any reference exists, or in any object with a finalizer that hasn't run to completion. Further, to avoid having to examine all objects on every garbage-collection cycle, objects which have been alive for a while will be given a "free pass" on most GC cycles. Thus, if an object with a finalizer is alive for a while before it is abandoned, it may take quite a while for its finalizer to run, and it will keep objects to which it holds references around long enough that they're likely to also earn a "free pass".
I would thus suggest that, to the extent possible, even when it's necessary to use finalizers, you should limit their use to privately held objects which in turn avoid holding strong references to anything that isn't explicitly needed for their cleanup task.
Phantom references are an alternative to finalizers available in Java.
Phantom references allow you to better control the resource reclamation process:
you can combine explicit resource disposal (e.g. the try-with-resources construct) with GC-based disposal
you can employ multiple threads for postmortem housekeeping
Using phantom references is complicated, though. In this article you can find a minimal example of phantom-reference-based resource housekeeping.
In modern Java there is also the Cleaner class, which is likewise based on phantom references but provides the infrastructure (reference queue, worker threads, etc.) for ease of use.
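For illustration, here is a minimal sketch of Cleaner-based housekeeping (Java 9+); the native handle and freeNative() call are hypothetical stand-ins for a real resource:

```java
import java.lang.ref.Cleaner;

public class NativeWrapper implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup action must not reference the wrapper itself,
    // otherwise the wrapper could never become phantom reachable.
    private static final class State implements Runnable {
        private final long handle;                // hypothetical native pointer
        State(long handle) { this.handle = handle; }
        @Override public void run() { freeNative(handle); }
    }

    private final Cleaner.Cleanable cleanable;

    public NativeWrapper(long handle) {
        this.cleanable = CLEANER.register(this, new State(handle));
    }

    // Preferred path: explicit disposal via try-with-resources.
    @Override public void close() { cleanable.clean(); }

    private static void freeNative(long handle) {
        // placeholder for the real native free call
    }
}
```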
I know that garbage collection is used to get rid of orphaned objects (the ones that lose their references), but is it possible to set custom intervals for garbage collecting in Java?
It is not advisable for an application to tell the GC to run. It is better to leave it to the JVM to make the decision.
Why?
Because the JVM knows best. The JVM has access to information that allows it to run the GC at the best time, optimizing either for high throughput or for low pause times. It can monitor the size of the various heap "spaces" and estimate the best time to initiate a collection, and what kind of collection to initiate. The decision making is complicated.
By contrast, if an application calls System.gc() on a fixed time interval, it may run when it doesn't need to, using CPU cycles unnecessarily. Indeed, if you run the GC when there is no garbage, it will spend a lot of time scanning all of the live objects ... and then achieve nothing.
The other thing to note is that if System.gc() is not ignored, a common behavior is to run a full garbage collection. Depending on your JVM's GC options, this may cause all application threads to be frozen. If the heap is large, the "GC pauses" for full collections can be significant.
Another answer suggests using the sun.rmi.dgc.client.gcInterval property. This is designed to deal with the collection of remote references in an RMI application. It may work in other contexts, but it is inadvisable for the reasons stated above.
Using Runtime.getRuntime().gc() or System.gc() you can suggest that the JVM perform a garbage collection, but you cannot set intervals or force it in any way.
public static void gc()
Calling the gc method suggests that the Java Virtual Machine expend effort toward recycling unused objects in order to make the memory they currently occupy available for quick reuse. When control returns from the method call, the Java Virtual Machine has made a best effort to reclaim space from all discarded objects.
The whole point of garbage collection is that the developer does not need to worry about memory management whatsoever (as always, there might be exceptions, but these are rare).
As pointed out, you can only suggest to the JVM from your Java code that it trigger a GC.
If you have problems with your GC interval, maybe you can instead pass some parameters to your JVM?
Maybe you can use the parameter:
gcInterval(ms) = max interval between GC
But I have never used it myself, so no experience on this one.
Is it possible to mark java objects non-collectable from gc perspective to save on gc-sweep time?
Something along the lines of http://wwwasd.web.cern.ch/wwwasd/lhc++/Objectivity/V5.2/Java/guide/jgdStorage.fm.html and specifically non-garbage-collectible containers there (non-garbage-collectable?).
The problem is that I have lots of ordinary temporary objects, but I also have even bigger objects (several gigabytes) that are stored for caching purposes. There is no reason for the Java GC to traverse all those cache gigabytes trying to find anything to collect, because they contain cached data which have their own timeouts.
This way I could partition my data in a custom way into infinite-lived and normal-lived objects, and hopefully the GC would be quite fast, because normal objects don't live so long and amount to much smaller volumes.
There are some workarounds to this problem, such as Apache DirectMemory and the commercial Terracotta BigMemory (http://terracotta.org/products/bigmemory), but a Java-native solution would be nicer (I mean free and probably more reliable?). Also I want to avoid serialization overhead, which means it should happen within the same JVM. To my understanding, DirectMemory and BigMemory operate mainly off heap, which means that the objects must be serialized/deserialized to/from memory outside the JVM. Simply marking non-GC regions within the JVM would seem a better solution. Using files for the cache is not an option either; it has the same unaffordable serialization/deserialization overhead - the use case is an HA server with lots of data used in random (human) order and low latency needed.
Any memory the JVM manages is also garbage-collected by the JVM. And any “live” objects which are directly available to Java methods without deserialization have to live in JVM memory. Therefore in my understanding you cannot have live objects which are immune to garbage collection.
On the other hand, the usage you describe should make the generational approach to garbage collection quite efficient. If your big objects stay around for a while, they will be checked for reclamation less often. So I doubt there is much to be gained from avoiding those checks.
Is it possible to mark java objects non-collectable from gc perspective to save on gc-sweep time?
No it is not possible.
You can prevent objects from being garbage collected by keeping them reachable, but the GC will still need to trace them to check reachability on each full GC (at least).
It is simply my assumption that when the JVM is starving it begins scanning all those unnecessary objects too.
Yes. That is correct. However, unless you've got LOTS of objects that you want to be treated this way, the overhead is likely to be insignificant. (And anyway, a better idea is to give the JVM more memory ... if that is possible.)
Quite simply, for you to be able to do this, the garbage collection algorithm would need to be aware of such a flag, and take it into account when doing its work.
I'm not aware of any of the standard GC algorithms having such a flag, so for this to work you would need to write your own GC algorithm (after deciding on some feasible way to communicate this information to it).
In principle, in fact, you've already started down this track - you're deciding how garbage collection should be done rather than being happy to leave it to the JVM's GC algorithm. Is the situation you describe a measurable problem for you, something for which the existing garbage collection is insufficient but your plan would work? Garbage collectors are extremely well tuned, so I wouldn't be surprised if the "inefficient" default strategy were actually faster than your naively optimal one.
(Doing manual memory management is tricky and error-prone at the best of times; managing some memory yourself while using a stock garbage collector to handle the rest seems even worse. I expect you'd run into a lot of edge cases where the GC assumes it "knows" what's happening with the whole heap, which would no longer be true. Steer clear if you can...)
The recommended approaches would be to use either a commercial RTSJ implementation to avoid GC, or to use off-heap memory. One could also look into soft references for caches as well (they do get collected).
This is not recommended:
If for some reason you do not believe these options are sufficient, you could look into direct memory access, which is UNSAFE (part of sun.misc.Unsafe). You can use the 'theUnsafe' field to get the 'Unsafe' instance. Unsafe allows you to allocate/deallocate memory via 'allocateMemory' and 'freeMemory'. This is not under GC control nor limited by the JVM heap size. The impact on GC/application, once you go down this route, is not guaranteed - which is why using byte buffers might be the way to go (if you're not using an RTSJ-like implementation).
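As an illustration of the byte-buffer route mentioned above, here is a minimal sketch; the 64 MB size is an arbitrary example value:

```java
import java.nio.ByteBuffer;

// A direct ByteBuffer lives outside the Java heap, so its contents are not
// traced by the collector (the small buffer object itself still is).
public class OffHeapExample {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024 * 1024); // 64 MB off heap
        buf.putLong(0, 42L);           // write at an absolute offset
        long value = buf.getLong(0);   // read it back
        System.out.println(value);
    }
}
```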
Hope this helps.
Living Java objects will always be part of the GC life cycle. Or, said another way, marking an object as non-GC is the same order of overhead as having your object referenced by a root reference (a static final map, for instance).
But thinking a bit further, data put in a cache is most likely temporary and will eventually be evicted. At that point you will start to appreciate the JVM and the GC again.
If you have hundreds of GBs of permanent data, you may want to rethink the architecture of your application, and try to shard and distribute your data (horizontal scalability).
Last but not least, lots of work has been done around serialization, and the overhead of serialization (I'm not speaking about the poor reputation of ObjectInputStream and ObjectOutputStream) is not that big.
More than that, if your data is mainly composed of primitive types (including byte arrays), there are efficient ways to readInt() or readBytes() from off-heap buffers (for instance netty.io's ChannelBuffer). This could be a way to go.
Per the documentation for Guava's MapMaker.softValues():
Warning: in most circumstances it is better to set a per-cache maximum size instead of using soft references. You should only use this method if you are well familiar with the practical consequences of soft references.
I have an intermediate understanding of soft references - their behavior, uses, and their contract with garbage collection. However, I'm wondering what these practical consequences are which the doc alludes to. Why exactly is it better to use a maximum size rather than soft references? Don't the algorithms and behavior of soft references make their use more efficient than a hardcoded ceiling, in terms of implementing a cache?
I think that all they are alluding to is that you should be prepared for maximum memory usage, and potentially more GC activity, if you use a soft-reference map, since references are only collected as memory needs to be freed up.
If you know you only need the last n values in the cache, then using an LRU cache is a leaner approach, with more predictable resource usage for a running application.
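For illustration, here is a rough sketch of the size-bounded alternative, assuming a Guava version that provides CacheBuilder; the 1000-entry limit is an example value:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

// Rough sketch of a size-bounded cache with (approximately) LRU eviction.
public class BoundedCacheExample {
    public static void main(String[] args) {
        Cache<String, byte[]> cache = CacheBuilder.newBuilder()
                .maximumSize(1000)                // evict beyond 1000 entries
                .build();

        cache.put("key", new byte[1024]);
        byte[] value = cache.getIfPresent("key");
        System.out.println(value != null);
    }
}
```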
Furthermore, according to this, it seems there are subtle differences in behaviour between -server and -client JVMs.
The Sun JRE does treat SoftReferences differently from WeakReferences. We attempt to hold on to object referenced by a SoftReference if there isn't pressure on the available memory. One detail: the policy for the "-client" and "-server" JRE's are different: the -client JRE tries to keep your footprint small by preferring to clear SoftReferences rather than expand the heap, whereas the -server JRE tries to keep your performance high by preferring to expand the heap (if possible) rather than clear SoftReferences. One size does not fit all.
One of the practical problems with using SoftReferences is that they tend to be discarded all at once. The reason you have a cache is to provide pretty good performance most of the time.
However, using SoftReferences for a cache can mean that after your application has stopped for a GC, it will run slowly until the cache is rebuilt, i.e. just at the time you need the application to catch up.
Note: You can use a LinkedHashMap as an LRU cache; it doesn't have to be complex.
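For illustration, here is a minimal sketch of such a LinkedHashMap-based LRU cache:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: access-ordered LinkedHashMap that evicts the eldest
// entry once the maximum number of entries is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true);                   // accessOrder = true gives LRU ordering
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;               // evict the least recently used entry
    }
}
```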
I'm on my way with implementing a caching mechanism for my Android application.
I use SoftReference, like many examples I've found. The problem is that when I scroll up or down in my ListView, most of the images are already cleared. I can see in LogCat that my application is garbage collected every time the application loads new images. That means that most of the non-visible images in the ListView are gone.
So, every time I scroll back to an earlier position (where I had already downloaded images) I have to download the images once again - they're not cached.
I've also researched this topic. According to Mark Murphy in this article, it seems that there is (or was?) a bug with SoftReference. Some other results indicate the same thing (or the same result): SoftReferences are getting cleared too early.
Is there any working solution?
SoftReferences are the poor man's cache. The JVM can keep those references alive longer, but doesn't have to. As soon as there's no hard reference anymore, the JVM can garbage-collect the soft-referenced object. The behavior of the JVM you're experiencing is correct, since the JVM doesn't have to keep such objects around any longer. Of course, most JVMs try to keep soft-referenced objects alive to some degree.
Therefore SoftReferences are kind of a dangerous cache. If you really want to ensure caching behavior, you need a real cache, like an LRU cache. Especially if your caching is performance-critical, you should use a proper cache.
From Android Training site:
http://developer.android.com/training/displaying-bitmaps/cache-bitmap.html
In the past, a popular memory cache implementation was a SoftReference or WeakReference bitmap cache, however this is not recommended. Starting from Android 2.3 (API Level 9) the garbage collector is more aggressive with collecting soft/weak references which makes them fairly ineffective. In addition, prior to Android 3.0 (API Level 11), the backing data of a bitmap was stored in native memory which is not released in a predictable manner, potentially causing an application to briefly exceed its memory limits and crash.
More information in link.
We should use LruCache instead (a sketch follows below).
Cache each image on persistent storage instead of just in memory.
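For illustration, here is a minimal sketch of a bitmap memory cache along the lines of the Android training guide linked above; sizing the cache to one eighth of the available heap is just an example policy:

```java
import android.graphics.Bitmap;
import android.util.LruCache;

// Minimal bitmap memory cache, measured in kilobytes.
public class BitmapMemoryCache {
    private final LruCache<String, Bitmap> cache;

    public BitmapMemoryCache() {
        int maxMemoryKb = (int) (Runtime.getRuntime().maxMemory() / 1024);
        int cacheSizeKb = maxMemoryKb / 8;        // example: one eighth of the heap
        cache = new LruCache<String, Bitmap>(cacheSizeKb) {
            @Override
            protected int sizeOf(String key, Bitmap bitmap) {
                return bitmap.getByteCount() / 1024;   // entry size in KB (API 12+)
            }
        };
    }

    public void put(String key, Bitmap bitmap) {
        if (cache.get(key) == null) {
            cache.put(key, bitmap);
        }
    }

    public Bitmap get(String key) {
        return cache.get(key);
    }
}
```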
Gamlor's answer is correct in your situation. However, for additional information, see the GC FAQ, question 32.
The Java HotSpot Server VM uses the maximum possible heap size (as set by the -Xmx option) to calculate free space remaining.
The Java HotSpot Client VM uses the current heap size to calculate the free space.
This means that the general tendency is for the Server VM to grow the heap rather than flush soft references, and -Xmx therefore has a significant effect on when soft references are garbage collected.
The JVM follows this simple rule to decide whether a soft reference should be kept:
interval <= free_heap * ms_per_mb
interval is the time elapsed between the last GC cycle's timestamp and the last access timestamp of the soft reference.
free_heap is the free heap space (in MB) available at that moment.
ms_per_mb is the number of milliseconds allowed per MB of free heap (default constant: 1000 ms).
If the above inequality is false, the reference gets cleared.
So, even if you have a lot of free memory, if your soft references have not been accessed for long enough, they will get cleared.
The -XX:SoftRefLRUPolicyMSPerMB= JVM argument can be used to tweak the ms_per_mb constant.
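As an illustrative example (the numbers are made up): with 512 MB of free heap and the default 1000 ms per MB, a soft reference survives a collection only if it was accessed within the last 512,000 ms (about 8.5 minutes); setting -XX:SoftRefLRUPolicyMSPerMB=10 would shrink that window to roughly 5 seconds.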