Java fixed memory map

Is there a simple, efficient Map implementation that allows a limit on the memory used by the map?
My use case is that I want to dynamically allocate most of the memory available at the time of the map's creation, but I don't want an OutOfMemoryError at any time in the future. Basically, I want to use this map as a cache, but I want to avoid heavy cache implementations like EHCache. My needs are simple (at most an LRU algorithm).
I should further clarify that objects in my cache are char[] or similar primitives that will not hold references to other objects.
I can put an upper limit on max size for each entry.

You can use a LinkedHashMap to limit the number of entries in the Map:
removeEldestEntry(Map.Entry<K,V> eldest): Returns true if this map should remove its eldest entry. This method is invoked by put and putAll after inserting a new entry into the map. It provides the implementor with the opportunity to remove the eldest entry each time a new one is added. This is useful if the map represents a cache: it allows the map to reduce memory consumption by deleting stale entries.
Sample use: this override will allow the map to grow up to 100 entries and then delete the eldest entry each time a new entry is added, maintaining a steady state of 100 entries.
private static final int MAX_ENTRIES = 100;

protected boolean removeEldestEntry(Map.Entry eldest) {
    return size() > MAX_ENTRIES;
}
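Putting that together with the access-order constructor (third constructor argument accessOrder = true), a minimal self-contained sketch might look like this; the class name LruCache and the cap of 100 entries are illustrative choices, not from the question:

import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache built on LinkedHashMap: accessOrder = true makes the
// iteration order least-recently-accessed first, and removeEldestEntry
// evicts once the cap is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final int MAX_ENTRIES = 100;

    public LruCache() {
        super(MAX_ENTRIES + 1, 0.75f, true); // access order, not insertion order
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > MAX_ENTRIES;
    }
}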
Related questions
How do I limit the number of entries in a java hashtable?
Easy, simple to use LRU cache in java
What is a data structure kind of like a hash table, but infrequently-used keys are deleted?

For caches, a SoftHashMap is much more appropriate than a WeakHashMap. A WeakHashMap is usually used when you want to maintain an association with an object for as long as that object is alive, but without preventing it from being reclaimed.
In contrast, a SoftReference is more closely involved with memory allocation. See No SoftHashMap? for details on the differences.
WeakHashMap is also not usually appropriate as it has the association around the wrong way for a cache - it uses weak keys and hard values. That is, the key and value are removed from the map when the key is cleared by the garbage collector. This is typically not what you want for a cache - where the keys are usually lightweight identifiers (e.g. strings, or some other simple value type) - caches usually operate such that the key/value is reclaimed when the value reference is cleared.
The Commons Collections has a ReferenceMap where you can plug in what types of references you wish to use for keys and values. For a memory-sensitive cache, you will probably use hard references for keys, and soft references for values.
To obtain LRU semantics for a given number of references N, maintain a list of the last N entries fetched from the cache - as an entry is retrieved from the cache it is added to the head of the list (and the tail of the list removed.) To ensure this does not hold on to too much memory, you can create a soft reference and use that as a trigger to evict a percentage of the entries from the end of the list. (And create a new soft reference for the next trigger.)
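As a rough, hand-rolled illustration of the hard-key / soft-value arrangement, here is a sketch using plain java.lang.ref.SoftReference instead of Commons Collections; the class and method names are invented for the example:

import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hard keys, soft values: the GC may clear a value under memory pressure,
// in which case get() reports a miss and drops the stale mapping.
public class SoftValueCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<K, SoftReference<V>>();

    public void put(K key, V value) {
        map.put(key, new SoftReference<V>(value));
    }

    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key); // the value was reclaimed; clean up the entry
        }
        return value;
    }
}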

Java Platform Solutions
If all you're looking for is a Map whose keys can be cleaned up to avoid OutOfMemoryErrors, you might want to look into WeakHashMap. It uses WeakReferences in order to allow the garbage collector to reap the map entries. It won't enforce any sort of LRU semantics, though, except those present in the generational garbage collection.
There's also LinkedHashMap, which has this in the documentation:
A special constructor is provided to create a linked hash map whose order of iteration is the order in which its entries were last accessed, from least-recently accessed to most-recently (access-order). This kind of map is well-suited to building LRU caches. Invoking the put or get method results in an access to the corresponding entry (assuming it exists after the invocation completes). The putAll method generates one entry access for each mapping in the specified map, in the order that key-value mappings are provided by the specified map's entry set iterator. No other methods generate entry accesses. In particular, operations on collection-views do not affect the order of iteration of the backing map.
So if you use this constructor to make a map whose iterator iterates in LRU order, it becomes pretty easy to prune the map. The one (fairly big) caveat is that LinkedHashMap is not synchronized whatsoever, so you're on your own for concurrency. You can just wrap it in a synchronized wrapper, but that may have throughput issues.
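For instance, a minimal sketch of the access-ordered constructor combined with a synchronized wrapper (the key/value types and sizes are arbitrary):

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class AccessOrderExample {
    public static void main(String[] args) {
        // Access-ordered LinkedHashMap: iteration runs from least- to
        // most-recently accessed entry. The synchronized wrapper adds basic
        // thread safety at the cost of contention on every call.
        Map<String, char[]> cache = Collections.synchronizedMap(
                new LinkedHashMap<String, char[]>(128, 0.75f, true));
        cache.put("a", new char[] { 'x' });
        cache.get("a"); // moves "a" to the most-recently-accessed position
    }
}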
Roll Your Own Solution
If I had to write my own data structure for this use-case, I'd probably create some sort of data structure with a map, queue, and ReadWriteLock along with a janitor thread to handle the cleanup when too many entries were in the map. It would be possible to go slightly over the desired max size, but in the steady-state you'd stay under it.
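A very rough sketch of that idea, with the eviction policy simplified to FIFO and a ConcurrentHashMap standing in for the explicit ReadWriteLock (all names are invented; a real implementation would need more care around shutdown and re-inserted keys):

import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Bounded cache with a background "janitor" thread. Writers may briefly push
// the map above maxEntries; the janitor trims it back down, so the bound is
// approximate rather than strict.
public class JanitorCache<K, V> {
    private final Map<K, V> map = new ConcurrentHashMap<K, V>();
    private final Queue<K> insertionOrder = new ConcurrentLinkedQueue<K>();
    private final int maxEntries;

    public JanitorCache(int maxEntries) {
        this.maxEntries = maxEntries;
        Thread janitor = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                while (map.size() > this.maxEntries) {
                    K eldest = insertionOrder.poll();
                    if (eldest == null) {
                        break;
                    }
                    map.remove(eldest);
                }
                try {
                    Thread.sleep(100); // check periodically
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "cache-janitor");
        janitor.setDaemon(true);
        janitor.start();
    }

    public void put(K key, V value) {
        if (map.put(key, value) == null) {
            insertionOrder.add(key); // track first-time insertions for eviction order
        }
    }

    public V get(K key) {
        return map.get(key);
    }
}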

WeakHashMap won't necessarily serve your purpose: if enough strong references to the keys are held by your app, you WILL still see an OOME.
Alternatively you could look into SoftReference, which will null out the content once the heap is scarce. However, most of the comments I have seen indicate that it will not null out the reference until the heap is really, really low and a lot of GC kicks in with a severe performance hit (so I don't recommend using it for your purpose).
My recommendation is to use a simple LRU map, e.g. http://commons.apache.org/collections/apidocs/org/apache/commons/collections/LRUMap.html
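Assuming Commons Collections 3.x is on the classpath, usage might look roughly like this (the 3.x LRUMap is not generic, hence the raw types):

import java.util.Map;
import org.apache.commons.collections.map.LRUMap;

public class LruMapExample {
    public static void main(String[] args) {
        Map cache = new LRUMap(1000);     // holds at most 1000 entries
        cache.put("key", new char[1024]);
        cache.get("key");                 // marks the entry as recently used
    }
}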

Thanks for the replies, guys!
As jasonmp85 pointed out, LinkedHashMap has a constructor that allows access order. I missed that bit when I looked at the API docs. The implementation also looks quite efficient (see below). Combined with a max-size cap for each entry, that should solve my problem.
I will also look closely at SoftReference. Just for the record, Google Collections seems to have a pretty good API for soft keys, soft values, and Maps in general.
Here is a snippet from Java's LinkedHashMap class that shows how it maintains LRU behavior.
/**
 * Removes this entry from the linked list.
 */
private void remove() {
    before.after = after;
    after.before = before;
}

/**
 * Inserts this entry before the specified existing entry in the list.
 */
private void addBefore(Entry<K,V> existingEntry) {
    after  = existingEntry;
    before = existingEntry.before;
    before.after = this;
    after.before = this;
}

/**
 * This method is invoked by the superclass whenever the value
 * of a pre-existing entry is read by Map.get or modified by Map.set.
 * If the enclosing Map is access-ordered, it moves the entry
 * to the end of the list; otherwise, it does nothing.
 */
void recordAccess(HashMap<K,V> m) {
    LinkedHashMap<K,V> lm = (LinkedHashMap<K,V>)m;
    if (lm.accessOrder) {
        lm.modCount++;
        remove();
        addBefore(lm.header);
    }
}

Related

Map clear vs null

I have a map that I use to store dynamic data that are discarded as soon as they are created (i.e. used; they are consumed quickly). It responds to user interaction in the sense that when the user clicks a button, the map is filled, the data is used to do some work, and then the map is no longer needed.
So my question is: what's the better approach for emptying the map? Should I set it to null each time, or should I call clear()? I know clear() is linear in time, but I don't know how to compare that cost with the cost of creating the map each time. The size of the map is not constant, though it may run from n to 3n elements between creations.
If the map is not referenced from other objects (where it might be hard to point them all at a new one), simply nulling out the old map and starting from scratch is probably lighter-weight than calling clear(), because no linear-time cleanup needs to happen. With garbage collection costs being tiny on modern systems, there is a good chance you would save some CPU cycles this way. You can avoid resizing the map multiple times by specifying the initial capacity.
One situation where clear() is preferred would be when the map object is shared among multiple objects in your system. For example, if you create a map, give it to several objects, and then keep some shared information in it, setting the map to a new one in all these objects may require keeping references to objects that have the map. In situations like that it's easier to keep calling clear() on the same shared map object.
Well, it depends on how much memory you can throw at it. If you have a lot, then it doesn't matter. However, setting the map itself to null means that you have freed up the garbage collector: if only the map holds references to the instances inside it, the garbage collector can collect not only the map but also any instances inside it. clear() does empty the map, but it has to iterate over everything in the map to set each reference to null, and this happens during your execution time, which you could be using for something else - the garbage collector essentially has to do this work anyway, so let it do its thing. Just note that setting the map to null doesn't let you reuse it. A typical pattern to reuse a map variable is:
Map<String, String> whatever = new HashMap<String, String>();
// ... do something with the map
whatever = new HashMap<String, String>();
This allows you to reuse the variable without setting it to null at all; you silently discard the reference to the old map. This would be atrocious practice in non-memory-managed languages, since you must keep a reference to the old pointer in order to free it (dropping it would leak that memory), but in Java, since nothing references the old map anymore, the GC marks it as eligible for collection.
I feel nulling the existing map is cheaper than clear(), as object creation is very cheap in modern JVMs.
Short answer: use Collection.clear() unless it is too complicated to keep the collection around.
Detailed answer: In Java, the allocation of memory is almost instantaneous - it is little more than a pointer that gets moved inside the VM. However, the initialization of those objects might add up to something significant. Also, all objects that use an internal buffer are sensitive to resizing and copying of their content. Using clear() ensures that buffers eventually stabilize at some size, so that reallocating memory and copying the old buffer to the new one will never be necessary.
Another important issue is that allocating and then releasing a lot of objects will require more frequent runs of the garbage collector, which might cause sudden lag.
If you always hold on to the map, it will be promoted to the old generation. If each user has one corresponding map, the number of maps in the old generation is proportional to the number of users. That may trigger full GCs more frequently as the number of users increases.
You can use both with similar results.
One prior answer notes that clear is expected to take constant time in a mature map implementation. Without checking the source code of the likes of HashMap, TreeMap, ConcurrentHashMap, I would expect their clear method to take constant time, plus amortized garbage collection costs.
Another poster notes that a shared map cannot be nulled out. Well, it can if you want it to be, but you do it by using a proxy object that encapsulates a proper map and nulls it out when needed. Of course, you'd have to implement the proxy map class yourself.
Map<Foo, Bar> myMap = new ProxyMap<Foo, Bar>();
// Internally, the above object holds a reference to a proper map,
// for example, a hash map. Furthermore, this delegates all calls
// to the underlying map. A true proxy.
myMap.clear();
// The clear method simply reinitializes the underlying map.
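A rough sketch of what such a ProxyMap might look like (my own illustration; a real version would implement the full Map contract):

import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Delegates to an internal HashMap and reinitializes that map on clear(), so
// callers sharing the proxy see an emptied map without per-entry nulling.
public class ProxyMap<K, V> extends AbstractMap<K, V> {
    private Map<K, V> delegate = new HashMap<K, V>();

    @Override public V put(K key, V value) { return delegate.put(key, value); }
    @Override public V get(Object key) { return delegate.get(key); }
    @Override public V remove(Object key) { return delegate.remove(key); }
    @Override public Set<Map.Entry<K, V>> entrySet() { return delegate.entrySet(); }

    @Override
    public void clear() {
        // Replace the backing map instead of nulling out each bucket.
        delegate = new HashMap<K, V>();
    }
}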
Unless you did something like the above, clear and nulling out are equivalent in the ways that matter, but I think it's more mature to assume your map, even if not currently shared, may become shared at a later time due to forces you can't foresee.
There is another reason to clear instead of nulling out, even if the map is not shared. Your map may be instantiated by an external client, like a factory, so if you clear your map by nulling it out, you might end up coupling yourself to the factory unnecessarily. Why should the object that clears the map have to know that you instantiate your maps using Guava's Maps.newHashMap() with God knows what parameters? Even if this is not a realistic concern in your project, it still pays off to align yourself to mature practices.
For the above reasons, and all else being equal, I would vote for clear.
HTH.

Replace a big hashmap in AS

I have a hashmap which stores around 1 GB of data in terms of key-value pairs. This hashmap changes every 15 days. It is loaded into memory and used from there.
When a new hashmap has to be loaded into memory, there will be several transactions already accessing the hashmap in memory. How can I replace the old hashmap with the new one without affecting the current transactions accessing the old hashmap? Is there a way to hot swap the hashmap in memory?
Use an AtomicReference<Map<Foo, Bar>> rather than exposing a direct (hard) reference to the map. Consumers of the map will use #get(), and when you're ready to swap out the map, your "internal" code will use #set() or #getAndSet().
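A minimal sketch of that approach (the class and method names are invented):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Readers always see a fully built map; the swap is one atomic pointer change.
public class HotSwappableLookup {
    private final AtomicReference<Map<String, String>> current =
            new AtomicReference<Map<String, String>>(new HashMap<String, String>());

    // Readers: grab the current snapshot and use it for the whole transaction.
    public Map<String, String> get() {
        return current.get();
    }

    // Loader: build the replacement completely, then publish it in one step.
    public void reload(Map<String, String> freshlyLoaded) {
        current.set(freshlyLoaded);
    }
}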
Provide a getter to the map
Mark the map private and volatile
When updating the map, create a new one, populate it and when it is ready, assign it to your private map variable.
Reference assignments are atomic in Java, and volatile ensures visibility (see the sketch after the caveats below).
Caveats:
you will have two maps in memory at some stage
if some code keeps a reference to the old map, it will access stale data. If that is an issue, you can completely hide the map and provide a get(K key) method instead, so that users always access the latest map.
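A minimal sketch of the volatile approach (names invented):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class LookupTable {
    private volatile Map<String, String> map = new HashMap<String, String>();

    public Map<String, String> getMap() {
        return map; // callers see whichever map was most recently published
    }

    public void reload(Map<String, String> newData) {
        // Build the replacement completely, then publish it with a single
        // atomic reference assignment; in-flight readers keep using the old
        // map until they call getMap() again.
        map = Collections.unmodifiableMap(new HashMap<String, String>(newData));
    }
}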
I would suggest using a caching tool like memcached if the data size is as large as yours. That way you can invalidate individual items or the entire cache as per your requirements.

Need an efficient Map or Set that does NOT produce any garbage when adding and removing

So because Javolution does not work (see here), I am in deep need of a Java Map implementation that is efficient and produces no garbage under simple usage. java.util.Map will produce garbage as you add and remove keys. I checked Trove and Guava, but it does not look like they have the Set<E> implementations I need. Where can I find a simple and efficient alternative for java.util.Map?
Edit for EJP:
An entry object is allocated when you add an entry, and released to GC when you remove it. :(
void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length);
}
Taken literally, I am not aware of any existing implementation of Map or Set that never produces any garbage on adding and removing a key.
In fact, the only way that it would even be technically possible (in Java, using the Map and Set APIs as defined) is if you were to place a strict upper bound on the number of entries. Practical Map and Set implementations need extra state proportional to the number of elements they hold. This state has to be stored somewhere, and when the current allocation is exceeded that storage needs to be expanded. In Java, that means that new nodes need to be allocated.
(OK, you could design a data structure class that held onto old useless nodes forever, and therefore never generated any collectable garbage ... but it would still be generating garbage.)
So what can you do about this in practice to reduce the amount of garbage generated? Let's take HashMap as an example:
Garbage is created when you remove an entry. This is unavoidable, unless you replace the hash chains with an implementation that never releases the nodes that represent the chain entries. (And that's a bad idea ... unless you can guarantee that the free node pool size will always be small. See below for why it is a bad idea.)
Garbage is created when the main hash array is resized. This can be avoided in a couple of ways:
You can give a 'capacity' argument in the HashMap constructor to set the size of the initial hash array large enough that you never need to resize it. (But that potentially wastes space ... especially if you can't accurately predict how big the HashMap is going to grow.)
You can supply a ridiculously large value for the 'load factor' argument to cause the HashMap to never resize itself. (But that results in a HashMap whose hash chains are unbounded, and you end up with O(N) behaviour for lookup, insertion, deletion, etc.)
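A small illustration of both constructor arguments (the numbers are arbitrary):

import java.util.HashMap;
import java.util.Map;

public class PresizedMapExample {
    public static void main(String[] args) {
        int expectedEntries = 10_000;

        // Option 1: size the table so the default load factor (0.75) is never
        // exceeded, and the map never resizes for the expected population.
        Map<Integer, String> preSized =
                new HashMap<Integer, String>((int) (expectedEntries / 0.75f) + 1);

        // Option 2: an absurdly large load factor means the table never grows,
        // but hash chains get long and lookups degrade toward O(n).
        Map<Integer, String> neverResizes =
                new HashMap<Integer, String>(1024, 1_000_000f);
    }
}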
In fact, creating garbage is not necessarily bad for performance. Indeed, hanging onto nodes so that the garbage collector doesn't collect them can actually be worse for performance.
The cost of a GC run (assuming a modern copying collector) is mostly in three areas:
Finding nodes that are not garbage.
Copying those non-garbage nodes to the "to-space".
Updating references in other non-garbage nodes to point to objects in "to-space".
(If you are using a low-pause collector there are other costs too ... generally proportional to the amount of non-garbage.)
The only part of the GC's work that actually depends on the amount of garbage, is zeroing the memory that the garbage objects once occupied to make it ready for reuse. And this can be done with a single bzero call for the entire "from-space" ... or using virtual memory tricks.
Suppose your application / data structure hangs onto nodes to avoid creating garbage. Now, when the GC runs, it has to do extra work to traverse all of those extra nodes, and copy them to "to-space", even though they contain no useful information. Furthermore, those nodes are using memory, which means that if the rest of the application generates garbage there will be less space to hold it, and the GC will need to run more often.
And if you've used weak/soft references to allow the GC to claw back nodes from your data structure, then that's even more work for the GC ... and space to represent those references.
Note: I'm not claiming that object pooling always makes performance worse, just that it often does, especially if the pool gets unexpectedly big.
And of course, that's why HashMap and similar general-purpose data structure classes don't do any object pooling. If they did, they would perform significantly worse in situations where the programmer doesn't expect it ... and they would be genuinely broken, IMO.
Finally, there is an easy way to tune a HashMap so that an add immediately followed by a remove of the same key produces no garbage (guaranteed). Wrap it in a Map class that caches the last entry "added", and only does the put on the real HashMap when the next entry is added. Of course, this is NOT a general purpose solution, but it does address the use case of your earlier question.
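A sketch of that wrapper idea (a hypothetical class with only a handful of operations, not the full Map contract):

import java.util.HashMap;

// Defers the most recent put: an add immediately followed by a remove of the
// same key never reaches the backing HashMap, so no Entry node is allocated.
public class DeferredPutMap<K, V> {
    private final HashMap<K, V> backing = new HashMap<K, V>();
    private K pendingKey;
    private V pendingValue;
    private boolean hasPending;

    public void put(K key, V value) {
        if (hasPending && pendingKey.equals(key)) {
            pendingValue = value;                  // overwrite the deferred entry in place
            return;
        }
        if (hasPending) {
            backing.put(pendingKey, pendingValue); // flush the previously deferred entry
        }
        pendingKey = key;
        pendingValue = value;
        hasPending = true;
    }

    public V get(K key) {
        if (hasPending && pendingKey.equals(key)) {
            return pendingValue;
        }
        return backing.get(key);
    }

    public V remove(K key) {
        if (hasPending && pendingKey.equals(key)) {
            hasPending = false;                    // the deferred entry never hit the HashMap
            V value = pendingValue;
            pendingKey = null;
            pendingValue = null;
            backing.remove(key);                   // drop any previously flushed value too
            return value;
        }
        return backing.remove(key);
    }
}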
I guess you need a version of HashMap that uses open addressing, and you'll want something better than linear probing. I don't know of a specific recommendation though.
http://sourceforge.net/projects/high-scale-lib/ has implementations of Set and Map which do not create garbage on add or remove of keys. The implementation uses a single array with alternating keys and values, so put(k,v) does not create an Entry object.
Now, there are some caveats:
Rehashing creates garbage, because it replaces the underlying array
I think this map will rehash given enough interleaved put & delete operations, even if the overall size is stable. (To harvest tombstone values)
This map will create Entry object if you ask for the entry set (one at a time as you iterate)
The class is called NonBlockingHashMap.
One option is to try to fix the HashMap implementation to use a pool of entries. I have done that. :) There are also other optimizations for speed you can do there. I agree with you: that issue with Javolution FastMap is mind-boggling. :(

Map.clear() vs new Map : Which one will be better? [duplicate]

This question already has answers here:
Better practice to re-instantiate a List or invoke clear()
(4 answers)
Closed 1 year ago.
I have a Map declared as Map<String, String> testMap = new HashMap<String, String>();.
This map can contain around 1000 entries.
When my application requires a new list of data, I must clear the Map. But when I looked at the code of Map.clear():
/**
 * Removes all of the mappings from this map.
 * The map will be empty after this call returns.
 */
public void clear() {
    modCount++;
    Entry[] tab = table;
    for (int i = 0; i < tab.length; i++)
        tab[i] = null;
    size = 0;
}
I realize that the clear() method loops n times (where n is the number of entries in the Map). So I thought there might be a way to simply redefine the Map, as in testMap = new HashMap<String, String>();,
and let the previously used Map be garbage collected.
But I am not sure this would be a good approach. I am working on a mobile application.
Can you please guide me?
Complicated question. Let's see what happens.
You instantiate a new instance, which is backed by a new array. The garbage collector then has to clear all the keys and values from the previous map, as well as the reference to the map itself. So an O(n) algorithm is executed anyway, just in the garbage collector thread. For 1000 records you won't see any difference.
BUT. The performance guides tell you that it is always better not to create new objects if you can avoid it. So I would go with the clear() method.
Anyway, try both variants and try to measure. Always measure!
When you call Map.clear() on a Map of size n, you are asking the GC to clean up 2*n objects (keys and values). When you set the same Map to null, you are asking the GC to clean up 2*n+1 objects (the extra one being the Map itself). Then you will have to create a new Map instance, which is yet another overhead. So go for Map.clear(). You would also be wise to preset the size of the Map when instantiating it.
I thought creating objects in Java was more expensive in terms of memory, so it is better to go with clear(): you reuse the same object instead of creating a new one.
The idea of having the clear() method is to remove references to other objects from the map, so that the keys/values are not kept from being garbage collected if the map is referenced somewhere else.
But if your map is a local map only used by your specific code (i.e. the map is not referenced somewhere else), then go ahead and use a new map instead - although setting 1000 references to null won't be a big performance hit anyway.
Don't forget the repopulation of the map.
If you don't specify the capacity on the new map, you will get quite a bit of overhead on the newly created map because of rehashes (each of which is O(n) at the time and happens O(log(n)) times; while this might amortize to O(n) total, you are still better off if they don't happen in the first place).
This won't happen with the cleared map, because its capacity doesn't change.
I think calling new HashMap() is a better idea, since it will not have to do as much processing as clearing the hashmap. Also, by creating a new hashmap you remove the chance that the old hashmap is still bound to the control that uses the data, which would cause problems when the hashmap is cleared.
map.clear() will remove all data. Note that this only discards all entries but keeps the internal array used to store the entries at the same size (rather than shrinking it to the initial capacity). If you also need to eliminate that, the easiest way is to discard the whole HashMap and replace it with a new instance. That of course only works if you control who has a reference to the map.
As for reclaiming the memory, you will have to let the garbage collector do its work.
Are your values also Long? In this case, you may want to look at a more (memory-) efficient implementation than the generic HashMap, such as the TLongLongHashMap found in the GNU Trove library. That should save a lot of memory.

Can I constrain a HashMap by the amount of memory it takes up?

I am implementing a simple cache using LinkedHashMap based on the instructions found here. I use the following code:
import java.util.LinkedHashMap;
import java.util.Map;

public class Cache<K, V> extends LinkedHashMap<K, V> {

    private final int capacity;

    public Cache(int capacity) {
        super(capacity + 1, 1.1f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
This is very easy. However, it simply imposes a fixed size on the map. I am running on a very small heap, and depending on the size of the cached objects and my chosen capacity, this could still run out of memory. The objects are arbitrary, so I can't estimate how big they might be. I don't want to depend on SoftReferences to prune the cache, because the way those are cleaned up is unreliable: it changes from VM to VM, and they might either get reclaimed too soon, or they might never get reclaimed until they fill up my heap.
Is there any way for me to monitor the size of the map and constrain it based on that?
If soft/weak references are out of the question, then I see 2 (non-trivial) options:
1) Use Java instrumentation to check the actual size of the items added to the map. The instrumentation interface provides the "shallow" size of an object, and you will need more code to explore the references (and avoid counting duplicates!). Here is a solution that calculates the deep size of one object.
2) Use JMX to track the heap size after GCs, and change the map's behavior when some dangerous threshold is reached. See the "notifications" section in the MemoryMXBean javadoc (a sketch of this approach follows below).
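A sketch of the JMX approach (the 80% threshold and the shrinkCache() hook are placeholders):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

public class HeapPressureEviction {

    // Placeholder hook: a real cache would evict a chunk of entries here.
    static void shrinkCache() {
        System.out.println("Heap threshold crossed - evicting cache entries");
    }

    public static void main(String[] args) {
        // Arm a usage threshold on each heap pool that supports one, e.g. at
        // 80% of its maximum size.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.isUsageThresholdSupported()) {
                long max = pool.getUsage().getMax();
                if (max > 0) {
                    pool.setUsageThreshold((long) (max * 0.8));
                }
            }
        }

        // The MemoryMXBean is documented to also be a NotificationEmitter.
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
        NotificationEmitter emitter = (NotificationEmitter) memoryBean;
        emitter.addNotificationListener(new NotificationListener() {
            @Override
            public void handleNotification(Notification notification, Object handback) {
                if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED
                        .equals(notification.getType())) {
                    shrinkCache();
                }
            }
        }, null, null);
    }
}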
The map itself contains only fixed-size entries, which hold references to the actual objects "contained" in the map. You would need to override all map-mutating methods (put(), the copy constructor, etc.) to keep track of the sizes of objects referenced from the map (can you even determine how much memory a Java object takes up?). Then consider that objects you add to the cache might themselves contain references to other objects and/or collections. How deep do you go?
Take a look at http://www.javapractices.com/topic/TopicAction.do?Id=83
As others have mentioned, you can use agent instrumentation to do this. The SizeOf project provides a handy utility for this approach. It can be used with ConcurrentLinkedHashMap's concept of weighted values, where a Weigher determines how many units of capacity a value consumes. That enables caches to properly handle collection or memory limits in addition to the traditional maximum-number-of-entries constraint.
If you wish to bound by the heap, then there is a fork of an earlier version of ConcurrentLinkedHashMap that does this. This retains the Apache license of the original so it could be adapted for your needs since it is packaged with Voldemort.
http://sizeof.sourceforge.net/
http://code.google.com/p/concurrentlinkedhashmap/
http://github.com/Omega1/voldemort/blob/master/src/java/voldemort/store/memory/ConcurrentLinkedHashMap.java
You could wrap a Map implementation and enforce the size in the put and putAll methods.
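For example, a bare-bones sketch that enforces an entry count; a memory estimate could be substituted for the count, and a real version would cover the rest of the Map interface:

import java.util.LinkedHashMap;
import java.util.Map;

// Wrapper that refuses to grow beyond a fixed number of entries. A real
// version might evict instead of throwing.
public class BoundedMap<K, V> {
    private final Map<K, V> delegate = new LinkedHashMap<K, V>();
    private final int maxEntries;

    public BoundedMap(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    public V put(K key, V value) {
        if (!delegate.containsKey(key) && delegate.size() >= maxEntries) {
            throw new IllegalStateException("Map is full");
        }
        return delegate.put(key, value);
    }

    public void putAll(Map<? extends K, ? extends V> m) {
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            put(e.getKey(), e.getValue());
        }
    }

    public V get(K key) {
        return delegate.get(key);
    }
}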
