Empty the cache before OutOfMemoryError

Is there a reliable approach to empty the cache before the memory is full?
Or even better limit the cache according to current available "actual" free memory (hard-referenced objects)?
A soft-reference cache is not a good idea due to the high GC penalty: once the limit is hit, all cache entries need to be reloaded.
Also, the value of runtime.freeMemory() is not reliable for my purpose: even if it reports a low value, there might be plenty of free space after the next GC cycle, so it is not a good indication of the memory actually in use.
I tried to figure out how much memory each primitive type would consume, so I would know the actual memory usage of the cache and could put a limit on it, but I couldn't find a reliable way to determine how much memory is used to store a String of length n.
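(For reference, only a rough estimate is possible, and the exact numbers are JVM-specific. The sketch below assumes a 64-bit HotSpot JVM with compressed oops and the pre-Java-9 char[]-backed String; the constants are approximations, not exact values.)

    // Rough estimate only: headers, fields and padding add a fixed overhead,
    // and each character takes two bytes in a char[]-backed String.
    static long estimateStringBytes(int length) {
        long stringObject = 32;   // String header + value/offset/count/hash fields, padded
        long charArray = 16;      // char[] header + length field, padded
        return stringObject + charArray + 2L * length;
    }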

Have two or three collections. If you want the cache to degrade gracefully as memory availability drops, you can have:
a map of the most recent entries, e.g. a LinkedHashMap.
a map of soft references.
a map of weak references.
You can control how large each map should be with the knowledge that weak references can be cleared after a minor collection, soft references will be cleared if needed, and the strong references map has the core data which will always be retained.
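A minimal sketch of that tiered layout follows (the class name, tier size and demotion policy are illustrative, and the sketch is not synchronized for concurrent use):

    import java.lang.ref.Reference;
    import java.lang.ref.SoftReference;
    import java.lang.ref.WeakReference;
    import java.util.HashMap;
    import java.util.LinkedHashMap;
    import java.util.Map;

    class TieredCache<K, V> {
        private static final int STRONG_ENTRIES = 1000;

        private final Map<K, SoftReference<V>> soft = new HashMap<K, SoftReference<V>>();
        private final Map<K, WeakReference<V>> weak = new HashMap<K, WeakReference<V>>();
        private final Map<K, V> strong = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > STRONG_ENTRIES) {
                    // demote instead of discarding: the soft reference is only
                    // cleared when memory gets tight
                    soft.put(eldest.getKey(), new SoftReference<V>(eldest.getValue()));
                    return true;
                }
                return false;
            }
        };

        V get(K key) {
            V value = strong.get(key);
            if (value == null) {
                value = unwrap(soft.get(key));
            }
            if (value == null) {
                // last resort: still works if the value is strongly reachable elsewhere
                value = unwrap(weak.get(key));
            }
            return value;               // null means the caller has to reload and put()
        }

        void put(K key, V value) {
            strong.put(key, value);
            weak.put(key, new WeakReference<V>(value));
        }

        private static <T> T unwrap(Reference<T> ref) {
            return ref == null ? null : ref.get();
        }

        // Note: cleared references are left in place here; a real implementation
        // would prune stale entries, e.g. via a ReferenceQueue.
    }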
BTW: If you are hitting your memory limit often, you should consider buying more memory up to about 32 GB per JVM. You can buy 32 GB for less than $200.

Try one of the more recent Oracle 1.7 releases. They offer GarbageCollectorMXBeans that emit GarbageCollectionNotificationInfo notifications. Use these to monitor the amount of used/unused memory after each GC cycle. There is some sample code here.
You can then use a multi-level cache as suggested by Peter to clean out the outer level when memory is tight, but retain the smaller first-level cache.
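A sketch of that monitoring approach, assuming a HotSpot/Oracle JVM whose GC beans implement NotificationEmitter; the onLowMemory callback and the usage-ratio threshold are illustrative, not part of any standard API:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;
    import javax.management.Notification;
    import javax.management.NotificationEmitter;
    import javax.management.NotificationListener;
    import javax.management.openmbean.CompositeData;
    import com.sun.management.GarbageCollectionNotificationInfo;

    public class GcMonitor {
        /** Runs the callback whenever heap usage after a GC exceeds the given ratio. */
        public static void install(final Runnable onLowMemory, final double usedRatioLimit) {
            for (GarbageCollectorMXBean gcBean : ManagementFactory.getGarbageCollectorMXBeans()) {
                NotificationEmitter emitter = (NotificationEmitter) gcBean; // HotSpot-specific cast
                emitter.addNotificationListener(new NotificationListener() {
                    public void handleNotification(Notification notification, Object handback) {
                        if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                                .equals(notification.getType())) {
                            return;
                        }
                        GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                                .from((CompositeData) notification.getUserData());
                        long used = 0, max = 0;
                        for (MemoryUsage pool : info.getGcInfo().getMemoryUsageAfterGc().values()) {
                            used += pool.getUsed();
                            if (pool.getMax() > 0) {
                                max += pool.getMax();   // getMax() is -1 for undefined pools
                            }
                        }
                        if (max > 0 && used > usedRatioLimit * max) {
                            onLowMemory.run();          // e.g. clear the outer cache level
                        }
                    }
                }, null, null);
            }
        }
    }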

I would suggest that the simplest solution would be to change your references to weak references.
This way the referenced objects can still be finalized and garbage collected once all strong references to them have gone out of scope.
See: http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/ref/WeakReference.html
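A minimal sketch of that idea (the class name is illustrative): values are wrapped in WeakReference, so the map entry no longer keeps them alive by itself. Keep in mind that weakly referenced values can be collected as soon as no strong reference remains, so hit rates may be low compared to soft references.

    import java.lang.ref.WeakReference;
    import java.util.HashMap;
    import java.util.Map;

    class WeakValueCache<K, V> {
        private final Map<K, WeakReference<V>> map = new HashMap<K, WeakReference<V>>();

        void put(K key, V value) {
            map.put(key, new WeakReference<V>(value));
        }

        V get(K key) {
            WeakReference<V> ref = map.get(key);
            return ref == null ? null : ref.get();   // null if never cached or already collected
        }
    }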

Related

Does garbage collection change the object addresses in Java?

I read that garbage collection can lead to memory fragmentation problems at run-time. To solve this problem, compacting is done by the JVM where it takes all the active objects and assigns them contiguous memory.
Does this mean that the object addresses must change from time to time? Also, if this happens,
Are the references to these objects also re-assigned?
Won't this cause significant performance issues? How does Java cope with it?
I read that garbage collection can lead to memory fragmentation problems at run-time.
This is not a problem exclusive to garbage-collected heaps. When you have a manually managed heap and free memory in a different order than the preceding allocations, you may end up with a fragmented heap as well. And being able to have object lifetimes that differ from the last-in-first-out order of automatic storage (aka stack memory) is one of the main motivations for using heap memory in the first place.
To solve this problem, compacting is done by the JVM where it takes all the active objects and assigns them contiguous memory.
Not necessarily all objects. Typical implementation strategies will divide the memory into logical regions and only move objects from a specific region to another, but not all existing objects at a time. These strategies may incorporate the age of the objects, like generational collectors moving objects of the young generation from the Eden space to a Survivor space, or the distribution of the remaining objects, like the “Garbage First” collector which will, as the name suggests, evacuate the fragment with the highest garbage ratio first, which implies the least work to obtain a contiguous block of free memory.
Does this mean that the object addresses must change from time to time?
Of course, yes.
Also, if this happens,
Are the references to these objects also re-assigned?
The specification does not mandate how object references are implemented. An indirect pointer may eliminate the need to adapt all references, see also this Q&A. However, for JVMs using direct pointers, this does indeed imply that these pointers need to get adapted.
Won't this cause significant performance issues? How does Java cope with it?
First, we have to consider what we gain from that. To “eliminate fragmentation” is not an end in itself. If we don’t do it, we have to scan the reachable objects for gaps between them and create a data structure maintaining this information, which we would then call “free memory”. We would also need to implement memory allocation as a search for matching chunks in this data structure, or as splitting chunks when no exact match is found. This is a rather expensive operation compared to allocating from a contiguous free memory block, where we only have to bump the pointer to the next free byte by the required size.
Given that allocations happen much more often than garbage collection, which only runs when the memory is full (or a threshold has been crossed), this already justifies the more expensive copy operations. It also implies that just using a larger heap can solve performance issues, as it reduces the number of required garbage collector runs, whereas the number of surviving objects does not scale with the heap size (unreachable objects stay unreachable, regardless of how long you defer the collection). In fact, deferring the collection raises the chances that more objects have become unreachable in the meantime. Compare also with this answer.
The costs of adapting references are not much higher than the costs of traversing references in the marking phase. In fact, non-concurrent collectors could even combine these two steps, transferring an object on first encounter and adapting subsequently encountered references, instead of marking the object. The actual copying is the more expensive aspect, but as explained above, it is reduced by not copying all objects but using certain strategies based on typical application behavior, like generational approaches or the “garbage first” strategy, to minimize the required work.
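To make the allocation contrast concrete, here is a toy illustration (not JVM code, just the idea): allocating from a contiguous free block is a pointer bump, whereas a fragmented heap would need a search through a free-list data structure.

    // Toy model only: a contiguous region where allocation is a pointer bump.
    final class BumpAllocator {
        private final byte[] region = new byte[1 << 20];
        private int top = 0;                       // next free offset

        /** Returns the offset of the allocated block, or -1 if the region is exhausted. */
        int allocate(int size) {
            if (top + size > region.length) {
                return -1;                         // a real VM would trigger a collection here
            }
            int offset = top;
            top += size;                           // "bump the pointer"
            return offset;
        }
    }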
If you move an object around in memory, its address will change, and references pointing to it will need to be updated. Memory fragmentation occurs when an object in a contiguous (in memory) sequence of objects gets deleted. This creates a hole in the memory space, which is generally bad because contiguous chunks of memory have faster access times and a higher probability of fitting in cache lines, among other things. It should be noted that the use of indirection tables can prevent reference updates up to the maximum level of indirection used.
Garbage collection has a moderate performance overhead, not just in Java but in other languages as well, such as C#. As for how Java copes with this, the strategies for performing garbage collection and for minimizing its impact on performance depend on the particular JVM being used, since each JVM can implement garbage collection however it pleases; the only requirement is that it meets the JVM specification.
However, as a programmer, there are some best practices you should follow to get the most out of garbage collection and to minimize its performance impact on your application. See this, also this, this, this blog post, and this other blog post. You might want to check the JVM specs, but they're a bit dense.

Java SoftReference, panicking GC and GC behavior

I want to write a cache using SoftReferences that uses as much memory as possible, as long as it doesn't get too inefficient.
Trying to estimate the used size by calculating object sizes, or by querying the JVM for an approximation of used memory, is a dead end.
The javadoc even states that SoftReferences are good for memory-aware caches, but there is no hard rule on how a JVM implementation shall handle SoftReferences. I'm only talking about the Oracle implementation of the JVM (Version 6.22 and above and Version 7).
Now my questions (please feel free to answer partial, grouped or in any way you please):
Does the JVM take the last access of the object into account and only remove the old ones? Javadoc states: Virtual machine implementations are, however, encouraged to bias against clearing recently-created or recently-used soft references.
What happens when memory gets tight? The JVM panics and just eats all objects?
Is there a parameter for telling the JVM to only eat as much as needed to survive (no OOMEs) and stay healthy (not having the CPU only run the GC)?
I don't think there is an order. (I'm not sure though about the order of events)
But what is guaranteed with soft references is that they will be cleared before an OutOfMemoryError is thrown, unless you have a hard reference pointing to the referenced object.
But you should be aware that when you try to access them, they may already be gone. My guess is that the garbage collector will just clear the first soft references that free up the amount of memory needed for the operation.
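In practice that means every read must be prepared for the referent to be gone. A minimal sketch (the class name and the Loader callback are illustrative):

    import java.lang.ref.SoftReference;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class SoftCacheSketch<K, V> {
        interface Loader<K, V> { V load(K key); }

        private final Map<K, SoftReference<V>> cache = new ConcurrentHashMap<K, SoftReference<V>>();
        private final Loader<K, V> loader;           // hypothetical reload hook

        SoftCacheSketch(Loader<K, V> loader) {
            this.loader = loader;
        }

        V get(K key) {
            SoftReference<V> ref = cache.get(key);
            V value = ref == null ? null : ref.get();
            if (value == null) {                     // never cached, or cleared by the GC
                value = loader.load(key);
                cache.put(key, new SoftReference<V>(value));
            }
            return value;
        }
    }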
Although SoftReferences are a cool feature, I personally don't dare to use them in large projects where I don't know the memory requirements of every other component. Will a memory-hogging SoftReference cache make other parts perform badly?
Instead of using SoftReferences, I'd consider using EHCache. It lets you limit the size of particular caches in terms of number of entries or, even better, the bytes used in memory (this is a new feature in the upcoming version 2.5). Different eviction strategies can be configured as well, such as LRU. There's lots you can configure in EHCache.
If you're using Spring, then version 3.1 will also provide you with some nice @Cacheable method-level annotations; EHCache can be used as the caching implementation there.
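For illustration, a hypothetical service method using that annotation (the cache name "quotes" and the lookup method are made-up examples, not from the answer above):

    import org.springframework.cache.annotation.Cacheable;

    public class QuoteService {

        // The result is stored in the "quotes" cache; the method body only runs on a miss.
        @Cacheable("quotes")
        public String getQuote(String symbol) {
            return expensiveLookup(symbol);
        }

        private String expensiveLookup(String symbol) {
            // stand-in for a slow database or network call
            return "quote for " + symbol;
        }
    }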
What happens when memory gets tight? The JVM panics and just eats all objects?
I know for a fact that with the Oracle 1.6 JVM this is not the case. I am aware of a situation where a server that processes concurrent requests uses a response that contains the actual data inside a soft reference. I have observed that when a low-memory situation is reported by one thread, the other threads' soft references continue to hold on to their contents (the referenced objects).
Is there a parameter for telling the JVM to only eat as much as needed to survive (no OOMEs) and stay healthy (not having the CPU only run the GC)?
What is enough to survive? Do you mean that if X amount of memory is required, then only reclaim soft references until X is available? I didn't find any such tuning parameter, but as I said, the JVM does not seem to reclaim all soft references when it needs to reclaim one.

If there is a list holding 300,000 objects all the time, will GC have bad performance?

There is a list holding 300,000 objects all the time, which won't be cleaned up by the GC.
If the JVM option -Xmx has a big enough value, will this big list give the GC bad performance?
I'm asking this because I want to use a big list as a data cache in my application. If a big list doesn't affect the GC, it's the best choice, because a list inside the JVM has better performance than alternatives such as memcached or an in-memory DB.
In general, probably not. The GC will see that those objects are long-lived, and move them to an area of the heap that is designed to hold long-lived objects.
First, why use a list as a cache? I guess you'll need a lot of lookups in this cache; if that's the case, you should probably consider a Map implementation.
About GC performance: if your cached objects stay referenced for a long time, they'll automatically be moved to the old generation (the area of the heap containing long-lived objects), which is collected far less often, so it's better in terms of performance.
If you want to learn more about GC with JDK6, here is a good link:
http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html

What are the "practical consequences" of using soft references?

Per the documentation for Guava's MapMaker.softValues():
Warning: in most circumstances it is better to set a per-cache maximum size instead of using soft references. You should only use this method if you are well familiar with the practical consequences of soft references.
I have an intermediate understanding of soft references - their behavior, uses, and their contract with garbage collection. However, I'm wondering what these practical consequences are which the doc alludes to. Why exactly is it better to use a maximum size rather than soft references? Don't the algorithms and behavior of soft references make their use more efficient than a hard-coded ceiling, in terms of implementing a cache?
I think that all they are alluding to is that you should be prepared for maximum memory usage, and potentially more GC activity, if you use a soft-reference map, since references are only cleared as memory needs to be freed up.
If you know you only need the last n values in the cache then using a LRU Cache is a leaner approach, with more predictable resource usage for a running application.
Furthermore, according to this, it seems there are subtle differences in behaviour between -server and -client JVMs.
The Sun JRE does treat SoftReferences differently from WeakReferences. We attempt to hold on to an object referenced by a SoftReference if there isn't pressure on the available memory. One detail: the policies for the "-client" and "-server" JREs are different: the -client JRE tries to keep your footprint small by preferring to clear SoftReferences rather than expand the heap, whereas the -server JRE tries to keep your performance high by preferring to expand the heap (if possible) rather than clear SoftReferences. One size does not fit all.
One of the practical problems with using SoftReferences is that they tend to be discarded all at once. The reason you have a cache is to provide pretty good performance, most of the time.
However, using SoftReferences for a cache can mean that after your application has stopped for a GC, it will run slowly until the cache is rebuilt, i.e. just at the time you need the application to catch up.
Note: You can use a LinkedHashMap as an LRU cache; it doesn't have to be complex.
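For example, a minimal version along those lines (the 10,000-entry bound is arbitrary):

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class LruCache<K, V> extends LinkedHashMap<K, V> {
        private static final int MAX_ENTRIES = 10000;

        public LruCache() {
            super(16, 0.75f, true);              // access order: get() refreshes an entry
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > MAX_ENTRIES;         // evict the least recently used entry
        }
    }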

LRU LinkedHashMap that limits size based on available memory

I want to create a LinkedHashMap which will limit its size based on available memory (i.e. when freeMemory + (maxMemory - allocatedMemory) gets below a certain threshold). This will be used as a form of cache, probably using "least recently used" as a caching strategy.
My concern though is that allocatedMemory also includes (I assume) un-garbage collected data, and thus will over-estimate the amount of used memory. I'm concerned about the unintended consequences this might have.
For example, the LinkedHashMap may keep deleting items because it thinks there isn't enough free memory, but the free memory doesn't increase because these deleted items aren't being garbage collected immediately.
Does anyone have any experience with this type of thing? Is my concern warranted? If so, can anyone suggest a good approach?
I should add that I also want to be able to "lock" the cache, basically saying "ok, from now on don't delete anything because of memory usage issues".
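For reference, the availability check described in the question would look roughly like this (thresholdBytes is whatever margin you choose); the caveat from the question stands: totalMemory() minus freeMemory() still counts garbage that has not been collected yet.

    // Estimate of memory still obtainable before an OutOfMemoryError.
    static boolean memoryIsLow(long thresholdBytes) {
        Runtime rt = Runtime.getRuntime();
        long allocated = rt.totalMemory();                      // heap currently claimed from the OS
        long free = rt.freeMemory();                            // unused part of the claimed heap
        long available = free + (rt.maxMemory() - allocated);   // freeMemory + (maxMemory - allocatedMemory)
        return available < thresholdBytes;
    }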
I know I'm biased, but I really have to strongly recommend our MapMaker for this. Use the softKeys() or softValues() feature, depending on whether it's GC collection of the key or of the value that more aptly describes when an entry can be cleaned up.
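A sketch of that usage, assuming the google-collections / early-Guava MapMaker API the answer refers to (the field name and value types are illustrative):

    import java.util.concurrent.ConcurrentMap;
    import com.google.common.collect.MapMaker;

    class MapMakerCacheExample {
        final ConcurrentMap<String, byte[]> cache = new MapMaker()
                .softValues()      // values may be reclaimed by the GC under memory pressure
                .makeMap();
    }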
Caches tend to be problematic. IIRC, there's a SoftCache in Sun's JRE that has had many problems.
Anyway, the simplest thing is to use SoftReferences in the map. This should work fine so long as the overhead of SoftReference plus Map.Entry is significantly lower than the cached data.
Alternatively you can, like WeakHashMap, use a ReferenceQueue and either poll it or have a thread blocking on it (one thread per instance, unfortunately). Be careful with synchronisation issues.
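A sketch of that SoftReference-plus-ReferenceQueue approach (names are illustrative; the queue is drained inline here rather than by a dedicated thread):

    import java.lang.ref.Reference;
    import java.lang.ref.ReferenceQueue;
    import java.lang.ref.SoftReference;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class SoftValueMap<K, V> {
        private final Map<K, KeyedRef<K, V>> map = new ConcurrentHashMap<K, KeyedRef<K, V>>();
        private final ReferenceQueue<V> queue = new ReferenceQueue<V>();

        // A SoftReference that remembers its key, so cleanup can remove the right entry.
        private static class KeyedRef<K, V> extends SoftReference<V> {
            final K key;
            KeyedRef(K key, V value, ReferenceQueue<V> queue) {
                super(value, queue);
                this.key = key;
            }
        }

        void put(K key, V value) {
            drainQueue();
            map.put(key, new KeyedRef<K, V>(key, value, queue));
        }

        V get(K key) {
            drainQueue();
            KeyedRef<K, V> ref = map.get(key);
            return ref == null ? null : ref.get();   // null: never cached, or already collected
        }

        // Could equally run in a background thread blocking on queue.remove().
        private void drainQueue() {
            Reference<? extends V> cleared;
            while ((cleared = queue.poll()) != null) {
                map.remove(((KeyedRef<?, ?>) cleared).key);
            }
        }
    }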
"Locking" the map, you probably want to avoid if necessary. You'd need keep strong references to all the data (and evict if not null). That is going to be ugly.
I would strongly suggest using something like Ehcache instead of re-inventing a caching system. It's super simple to use, very configurable, and works great.
As matt b said, something like Ehcache or JbossCache is a good first step.
If you want something light-weight and in-process, look at Google Collections. For example, you can use MapMaker (http://google-collections.googlecode.com/svn/trunk/javadoc/index.html?com/google/common/collect/BiMap.html) to make a map with soft/weak keys and values, so it would cache only those items it has room for (though you wouldn't get LRU).
I had the same need in the past and this is how I implemented my cache:
there is a cache memory manager, which has a minimum and a maximum memory limit (only the max limit matters anyway)
every registered cache has the following (important) parameters: maximum capacity (most of the time you have an upper limit; you don't want to hold more than X items) & percent memory usage
I use a LinkedHashMap and a ReentrantReadWriteLock to guard the cache.
every X puts I calculate the mean memory consumption per entry and trigger eviction (asynchronously) if the calculated memory usage > allowed memory limit.
of course the memory computation doesn't show the real memory consumption exactly, but comparing the computed value with the real value (using a profiler), I found it to be close enough.
I was also planning to put an additional guard on the cache, to evict in case puts come in faster than the memory-based evictions, but so far I haven't needed it.
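A rough sketch of that design (all names, the 10% eviction step and the per-entry estimate are illustrative, not the original code; eviction is also done inline here rather than asynchronously):

    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    class MemoryBoundedCache<K, V> {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        // Insertion order; with access order, get() would mutate the map and need the write lock.
        private final Map<K, V> map = new LinkedHashMap<K, V>();
        private final long allowedBytes;                      // this cache's share of the memory budget
        private volatile long estimatedBytesPerEntry = 256;   // refreshed every X puts in the real design

        MemoryBoundedCache(long allowedBytes) {
            this.allowedBytes = allowedBytes;
        }

        V get(K key) {
            lock.readLock().lock();
            try {
                return map.get(key);
            } finally {
                lock.readLock().unlock();
            }
        }

        void put(K key, V value) {
            lock.writeLock().lock();
            try {
                map.put(key, value);
                if (map.size() * estimatedBytesPerEntry > allowedBytes) {
                    evictOldest(map.size() / 10 + 1);   // drop roughly 10% of the oldest entries
                }
            } finally {
                lock.writeLock().unlock();
            }
        }

        private void evictOldest(int count) {
            Iterator<K> it = map.keySet().iterator();
            for (int i = 0; i < count && it.hasNext(); i++) {
                it.next();
                it.remove();
            }
        }
    }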
