I have a hashmap which stores around 1 GB of data in terms of key-value pairs. This hashmap changes every 15 days. It will be loaded into memory and used from there.
When a new hashmap has to be loaded into memory, there will be several transactions already accessing the hashmap in memory. How can I replace the old hashmap with the new one without affecting the current transactions accessing the old hashmap? Is there a way to hot swap the hashmap in memory?
Use an AtomicReference<Map<Foo, Bar>> rather than exposing a direct (hard) reference to the map. Consumers of the map will use #get(), and when you're ready to swap out the map, your "internal" code will use #set() or #getAndSet().
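A minimal sketch of that approach (the holder class and its names are illustrative):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

class MapHolder {
    private final AtomicReference<Map<String, String>> ref =
            new AtomicReference<Map<String, String>>(new HashMap<String, String>());

    // Consumers always read through the reference.
    public Map<String, String> current() {
        return ref.get();
    }

    // Build the replacement map completely, then swap it in with one atomic call.
    public void reload(Map<String, String> newMap) {
        ref.set(newMap);
    }
}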
Provide a getter to the map
Mark the map private and volatile
When updating the map, create a new one, populate it and when it is ready, assign it to your private map variable.
Reference assignments are atomic in Java and volatile ensures visibility.
Caveats:
you will have two maps in memory at some stage
if some code keeps a reference to the old map it will access stale data. If that is an issue, you can completely hide the map and provide a get(K key) method instead, so that callers always access the latest map (see the sketch below).
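A sketch of this volatile variant with the map completely hidden behind a get(K key) method (names are illustrative):

import java.util.HashMap;
import java.util.Map;

class HiddenMapStore<K, V> {
    // volatile ensures readers see the newly assigned map immediately.
    private volatile Map<K, V> map = new HashMap<K, V>();

    // Callers never get the map itself, only the value for a key,
    // so nobody can hold on to a stale map.
    public V get(K key) {
        return map.get(key);
    }

    // Populate the replacement fully before this single atomic assignment.
    public void reload(Map<K, V> newMap) {
        map = newMap;
    }
}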
I would suggest using a caching tool like memcached if the data size is as large as yours. That way you can invalidate individual items or the entire cache as your requirements dictate.
Related
I have a situation where I load some data at application level inside a HashMap in my Android application. I use one of the entries (with a particular key, keyA) in this map to initialise some data inside my Activity, and this Activity hangs around for a while. While the user is on this Activity, the HashMap from which I referenced the object for keyA might change. The code to update the HashMap is written by me. When I want to update the HashMap, I want to clear the entire HashMap (so that its size() returns 0) and then populate everything again. If I call HashMap.clear(), would it make the old objects eligible for garbage collection?
If yes, what is the best way to clear the entire HashMap so that I don't lose the old objects if they are referred to anywhere else in the code? And what would be the best way to reassign values to the HashMap in this case?
PS: I am using ReentrantReadWriteLock for maintaining the access if that might help.
If I call HashMap.clear(), would it make the old objects eligible for garbage collection?
No. No Java object will ever be garbage collected if it is reachable. If there is any reference (or chain of references) that allows your code to access an object, then the object is safe from garbage collection.
If I call HashMap.clear(), would it make the old objects eligible for garbage collection?
A HashMap is just storing a single reference to an object. When you call clear(), then the object can be gc'd if (and only if) the only reference to the object was the HashMap. If other parts of the code have references to the object then it won't be. That's how gc works. HashMap isn't magic.
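For example, a tiny illustration of reachability (names are arbitrary):

import java.util.HashMap;
import java.util.Map;

Map<String, Object> map = new HashMap<String, Object>();
map.put("key", new Object());

Object alias = map.get("key"); // a second reference, outside the map
map.clear();                   // the entry is gone from the map...
// ...but the object is still reachable through 'alias',
// so it is not eligible for garbage collection.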
What would be the best way to reassign values to the HashMap in this case?
If you are updating the values in a shared map then your code is going to have to re-get the values from the map whenever they use them – or maybe every so often. I'd first re-get the value each time you use it and then prove to yourself that it's too expensive before doing anything else.
PS: I am using ReentrantReadWriteLock for maintaining the access if that might help.
I'd switch to using a ConcurrentHashMap and not bother with the expense and code maintenance of doing the locking yourself.
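A sketch of what that looks like (loadFreshValue() is a hypothetical loader, not part of any API):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// ConcurrentHashMap coordinates readers and writers internally,
// so no explicit ReentrantReadWriteLock is needed.
Map<String, Object> shared = new ConcurrentHashMap<String, Object>();

// Updater: replace an entry in place; readers never block.
shared.put("keyA", loadFreshValue()); // loadFreshValue() is hypothetical

// Reader: re-get the value each time it is used, as suggested above.
Object current = shared.get("keyA");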
If you need to update specifically keyA alone, then you can use hashMap.put("keyA", "value");
This will replace the value of keyA in the HashMap object with the value you specify, and it will not affect the other values saved in the HashMap object.
I'm storing some information inside a MbGlobalMap (embedded Global Cache) of the IBM Integration Bus. If the map is called EXAMPLE.MAP I can access the values as follows:
MbGlobalMap map = MbGlobalMap.getGlobalMap("EXAMPLE.MAP");
Object value = map.get(key);
But I want to get all values of the EXAMPLE.MAP, even if I don't know all keys of the map. I can't iterate over the MbGlobalMap and a cast to java.util.Map won't work at all.
This is the documentation of the class: https://www.ibm.com/support/knowledgecenter/SSMKHH_9.0.0/com.ibm.etools.mft.plugin.doc/com/ibm/broker/plugin/MbGlobalMap.html. There is no method provided to return all elements inside the map.
A workaround could be to keep a list of all current keys, so that you can fetch this list and use it to get all values inside the map. But I don't think this is a clean solution.
After some time of research, I want to answer this question myself:
The solution is the workaround I mentioned in my question. You can put a Java HashMap into the Global Cache and write all your objects into this map. An example would look something like the following:
MbGlobalMap globalMap = MbGlobalMap.getGlobalMap("EXAMPLE.MAP");
HashMap<String,Object> map = new HashMap<String,Object>();
// Put some objects into the map
globalMap.put("ALL", map);
Now you have a Java HashMap inside the MbGlobalMap of the Global Cache, and you can access the data without knowing the keys, as follows:
MbGlobalMap globalMap = MbGlobalMap.getGlobalMap("EXAMPLE.MAP");
HashMap<String,Object> map = (HashMap<String,Object>)globalMap.get("ALL");
Set<String> allKeys = map.keySet();
Iterator<String> iter = allKeys.iterator();
while (iter.hasNext()) {
    // Do something with map.get(iter.next());
}
First I thought this solution would not be a clean one, because now the map has to be locked for every write operation. But it seems that the Global Cache locks the map for every write operation anyway:
As JAMESHART mentioned in his contribution on IBM developerWorks, the eXtreme Scale grid underneath the Global Cache is configured with a pessimistic locking strategy. According to the entry in the IBM Knowledge Center, this means the following:
Pessimistic locking: Acquires locks on entries, then holds the locks until commit time. This locking strategy provides good consistency at the expense of throughput.
So the use of the described workaround won't have such a big impact on write access and performance.
There's now an enhancement request on IBM's Community RFE website in order to get this feature:
http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=94875
Please give your vote for this request if you are interested in this feature, because IBM considers RFEs based on their votes.
Arcdic, the best way with the API at hand will be to use putAll, which takes in a java.util.Map, and then use an entry set to get the values you are interested in.
public void putAll(Map m)
throws MbException
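A sketch of that approach, using the putAll signature above and keeping a local java.util.Map whose entry set you iterate (key names are illustrative; MbException handling is omitted):

import java.util.HashMap;
import java.util.Map;
import com.ibm.broker.plugin.MbGlobalMap;

Map<String, Object> local = new HashMap<String, Object>();
local.put("key1", "value1");
local.put("key2", "value2");

MbGlobalMap globalMap = MbGlobalMap.getGlobalMap("EXAMPLE.MAP");
globalMap.putAll(local); // copies every entry into the global map

// The local map's entry set gives you all keys and values without
// needing to enumerate the MbGlobalMap itself.
for (Map.Entry<String, Object> entry : local.entrySet()) {
    // Do something with entry.getKey() / entry.getValue()
}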
I have a map that I use to store dynamic data that is discarded as soon as it is used (it is consumed quickly). It responds to user interaction in the sense that when the user clicks a button, the map is filled, the data is used to do some work, and then the map is no longer needed.
So my question is: what's the better approach for emptying the map? Should I set it to null each time, or should I call clear()? I know clear() is linear in time, but I don't know how to compare that cost with the cost of creating the map each time. The size of the map is not constant, though it may run from n to 3n elements between creations.
If a map is not referenced from other objects where it may be hard to set a new one, simply null-ing out an old map and starting from scratch is probably lighter-weight than calling a clear(), because no linear-time cleanup needs to happen. With the garbage collection costs being tiny on modern systems, there is a good chance that you would save some CPU cycles this way. You can avoid resizing the map multiple times by specifying the initial capacity.
One situation where clear() is preferred would be when the map object is shared among multiple objects in your system. For example, if you create a map, give it to several objects, and then keep some shared information in it, setting the map to a new one in all these objects may require keeping references to objects that have the map. In situations like that it's easier to keep calling clear() on the same shared map object.
Well, it depends on how much memory you can throw at it. If you have a lot, then it doesn't matter. However, setting the map itself to null means that you have freed up the garbage collector: if only the map has references to the instances inside it, the garbage collector can collect not only the map but also any instances inside it. clear() does empty the map, but it has to iterate over everything in the map to set each reference to null, and this takes place during your execution time. The garbage collector essentially has to do this work anyway, so let it do its thing. Just note that setting the map to null doesn't let you reuse it. A typical pattern to reuse a map variable is:
Map<String, String> whatever = new HashMap<String, String>();
// .. do something with map
whatever = new HashMap<String, String>();
This allows you to reuse the variable without setting it to null at all; you silently discard the reference to the old map. This would be atrocious practice in a non-memory-managed language, since code there must keep the old pointer in order to free it (otherwise the memory simply leaks), but in Java, since nothing references the old map anymore, the GC marks it as eligible for collection.
I feel nulling the existing map is cheaper than clear(), as object creation is very cheap in modern JVMs.
Short answer: use Collection.clear() unless it is too complicated to keep the collection around.
Detailed answer: in Java, the allocation of memory is almost instantaneous; it is little more than a pointer that gets moved inside the VM. However, the initialization of those objects might add up to something significant. Also, all objects that use an internal buffer are sensitive to resizing and copying of their content. Using clear() makes sure that buffers eventually stabilize at some dimension, so that reallocation of memory and copying of the old buffer to the new one will never be necessary.
Another important issue is that allocating and then releasing a lot of objects requires more frequent runs of the garbage collector, which might cause sudden lag.
If you always hold on to the map, it will be promoted to the old generation. If each user has one corresponding map, the number of maps in the old generation is proportional to the number of users. That may trigger full GCs more frequently as the number of users increases.
You can use both with similar results.
One prior answer notes that clear is expected to take constant time in a mature map implementation. Without checking the source code of the likes of HashMap, TreeMap, ConcurrentHashMap, I would expect their clear method to take constant time, plus amortized garbage collection costs.
Another poster notes that a shared map cannot be nulled out. Well, it can if you want it to be, but you do it by using a proxy object which encapsulates a proper map and nulls it out when needed. Of course, you'd have to implement the proxy map class yourself.
Map<Foo, Bar> myMap = new ProxyMap<Foo, Bar>();
// Internally, the above object holds a reference to a proper map,
// for example, a hash map. Furthermore, this delegates all calls
// to the underlying map. A true proxy.
myMap.clear();
// The clear method simply reinitializes the underlying map.
Unless you did something like the above, clear and nulling out are equivalent in the ways that matter, but I think it's more mature to assume your map, even if not currently shared, may become shared at a later time due to forces you can't foresee.
There is another reason to clear instead of nulling out, even if the map is not shared. Your map may be instantiated by an external client, like a factory, so if you clear your map by nulling it out, you might end up coupling yourself to the factory unnecessarily. Why should the object that clears the map have to know that you instantiate your maps using Guava's Maps.newHashMap() with God knows what parameters? Even if this is not a realistic concern in your project, it still pays off to align yourself to mature practices.
For the above reasons, and all else being equal, I would vote for clear.
HTH.
If I do the following:
Create a HashMap (in a final field)
Populate HashMap
Wrap HashMap with unmodifiable wrapper Map
Start other threads which will access but not modify the Map
As I understand it the Map has been "safely published" because the other threads were started after the Map was fully populated so I think it is ok to access the Map from multiple threads as it cannot be modified after this point.
Is this right?
This is perfectly fine as far as the map itself is concerned. But you need to realize that making the map unmodifiable will only make the map itself unmodifiable, not its keys and values. So if you have, for example, a Map<String, SomeMutableObject> such as Map<String, List<String>>, then threads will still be able to alter the values with, for example, map.get("foo").add("bar");. To avoid this, you'd want to make the keys/values immutable/unmodifiable as well.
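A small illustration of that pitfall:

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

Map<String, List<String>> inner = new HashMap<String, List<String>>();
inner.put("foo", new ArrayList<String>());
Map<String, List<String>> map = Collections.unmodifiableMap(inner);

// map.put("baz", new ArrayList<String>()); // would throw UnsupportedOperationException
map.get("foo").add("bar"); // succeeds: the List value itself is still mutable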
As I understand it the Map has been "safely published" because the other threads were started after the Map was fully populated so I think it is ok to access the Map from multiple threads as it cannot be modified after this point.
Yes. Just make sure that the other threads are started in a synchronized manner, i.e. make sure you have a happens-before relation between publishing the map, and starting the threads.
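For example, populating the map fully and only then starting the reader thread gives exactly that happens-before edge (a minimal sketch):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

Map<String, String> config = new HashMap<String, String>();
config.put("k", "v"); // populate completely before publication

final Map<String, String> shared = Collections.unmodifiableMap(config);

Thread reader = new Thread(new Runnable() {
    public void run() {
        // Thread.start() below establishes the happens-before edge,
        // so this thread is guaranteed to see the fully populated map.
        System.out.println(shared.get("k"));
    }
});
reader.start(); // start only after the map is fully built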
This is discussed in this blog post:
[...] This is how Collections.unmodifiableMap() works.
[...]
Because of the special meaning of the keyword "final", instances of this class can be shared with multiple threads without using any additional synchronization; when another thread calls get() on the instance, it is guaranteed to get the object you put into the map, without doing any additional synchronization. You should probably use something that is thread-safe to perform the handoff between threads (like LinkedBlockingQueue or something), but if you forget to do this, then you still have the guarantee.
In short, no you don't need the map to be thread-safe if the reads are non-destructive and the map reference is safely published to the client.
In the example there are two important happens-before relationships established here. The final-field publication (if and only if the population is done inside the constructor and the reference doesn't leak outside the constructor) and the calls to start the threads.
Anything that modifies the map after these calls is not safely published with respect to the client reading from the map.
We have, for example, a CopyOnWriteMap with a non-thread-safe underlying map that is copied on each write. This is as fast as possible in situations where there are many more reads than writes (caching configuration data is a good example).
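CopyOnWriteMap here is an in-house class, not a JDK one; a minimal sketch of the idea might look like this:

import java.util.HashMap;
import java.util.Map;

class CopyOnWriteMap<K, V> {
    // Readers see an effectively immutable snapshot, lock-free.
    private volatile Map<K, V> snapshot = new HashMap<K, V>();

    public V get(K key) {
        return snapshot.get(key);
    }

    // Each write copies the underlying map and publishes the copy.
    public synchronized void put(K key, V value) {
        Map<K, V> copy = new HashMap<K, V>(snapshot);
        copy.put(key, value);
        snapshot = copy; // volatile write publishes the new snapshot
    }
}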
That said, if the intention really is to not change the map, setting an immutable version of the map into the field is always the best way to go as it guarantees the client will see the correct thing.
Lastly, there are some Map implementations that have destructive reads such as a LinkedHashMap with access ordering, or a WeakHashMap where entries can disappear. These types of maps must be accessed serially.
You are correct. There is no need to ensure exclusive access to the data structure by different threads, using mutexes or otherwise, since it's immutable. This usually greatly increases performance.
Also note that if you only wrap the original map rather than creating a copy, i.e. the unmodifiable map delegates method calls to the inner HashMap, then modifying the underlying map may introduce race conditions.
An immutable map is inherently thread-safe. You could use ImmutableMap from Guava.
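For example, assuming Guava is on the classpath:

import com.google.common.collect.ImmutableMap;
import java.util.Map;

// An ImmutableMap can be shared across threads with no synchronization at all.
Map<String, Integer> scores = ImmutableMap.of("alice", 1, "bob", 2);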
Is there a simple, efficient Map implementation that allows a limit on the memory to be used by the map?
My use case is that I want to allocate dynamically most of the memory available at the time of its creation, but I don't want an OutOfMemoryError at any time in the future. Basically, I want to use this map as a cache, but I want to avoid heavy cache implementations like EHCache. My needs are simple (at most an LRU algorithm).
I should further clarify that objects in my cache are char[] or similar primitives that will not hold references to other objects.
I can put an upper limit on max size for each entry.
You can use a LinkedHashMap to limit the number of entries in the Map:
removeEldestEntry(Map.Entry<K,V> eldest): Returns true if this map should remove its eldest entry. This method is invoked by put and putAll after inserting a new entry into the map. It provides the implementor with the opportunity to remove the eldest entry each time a new one is added. This is useful if the map represents a cache: it allows the map to reduce memory consumption by deleting stale entries.
Sample use: this override will allow the map to grow up to 100 entries and then delete the eldest entry each time a new entry is added, maintaining a steady state of 100 entries.
private static final int MAX_ENTRIES = 100;
protected boolean removeEldestEntry(Map.Entry eldest) {
    return size() > MAX_ENTRIES;
}
Related questions
How do I limit the number of entries in a java hashtable?
Easy, simple to use LRU cache in java
What is a data structure kind of like a hash table, but infrequently-used keys are deleted?
For caches, a SoftHashMap is much more appropriate than a WeakHashMap. A WeakHashMap is usually used when you want to maintain an association with an object for as long as that object is alive, but without preventing it from being reclaimed.
In contrast, a SoftReference is more closely involved with memory allocation. See No SoftHashMap? for details on the differences.
WeakHashMap is also not usually appropriate as it has the association around the wrong way for a cache - it uses weak keys and hard values. That is, the key and value are removed from the map when the key is cleared by the garbage collector. This is typically not what you want for a cache - where the keys are usually lightweight identifiers (e.g. strings, or some other simple value type) - caches usually operate such that the key/value is reclaimed when the value reference is cleared.
The Commons Collections has a ReferenceMap where you can plug in what types of references you wish to use for keys and values. For a memory-sensitive cache, you will probably use hard references for keys, and soft references for values.
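A sketch of such a configuration, assuming Commons Collections 3.x (package and constant names as in that release):

import java.util.Map;
import org.apache.commons.collections.map.ReferenceMap;

// Hard keys, soft values: the GC may clear the soft value references
// under memory pressure, and the map then drops those entries.
Map cache = new ReferenceMap(ReferenceMap.HARD, ReferenceMap.SOFT);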
To obtain LRU semantics for a given number of references N, maintain a list of the last N entries fetched from the cache - as an entry is retrieved from the cache it is added to the head of the list (and the tail of the list removed.) To ensure this does not hold on to too much memory, you can create a soft reference and use that as a trigger to evict a percentage of the entries from the end of the list. (And create a new soft reference for the next trigger.)
Java Platform Solutions
If all you're looking for is a Map whose keys can be cleaned up to avoid OutOfMemoryErrors, you might want to look into WeakHashMap. It uses WeakReferences in order to allow the garbage collector to reap the map entries. It won't enforce any sort of LRU semantics, though, except those present in the generational garbage collection.
There's also LinkedHashMap, which has this in the documentation:
A special constructor is provided to create a linked hash map whose order of iteration is the order in which its entries were last accessed, from least-recently accessed to most-recently (access-order). This kind of map is well-suited to building LRU caches. Invoking the put or get method results in an access to the corresponding entry (assuming it exists after the invocation completes). The putAll method generates one entry access for each mapping in the specified map, in the order that key-value mappings are provided by the specified map's entry set iterator. No other methods generate entry accesses. In particular, operations on collection-views do not affect the order of iteration of the backing map.
So if you use this constructor to make a map whose iterator iterates in LRU order, it becomes pretty easy to prune the map. The one (fairly big) caveat is that LinkedHashMap is not synchronized whatsoever, so you're on your own for concurrency. You can just wrap it in a synchronized wrapper, but that may have throughput issues.
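Putting the pieces together, a possible sketch of such a bounded LRU cache (the cap of 1000 entries and the char[] value type are just illustrative):

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

private static final int MAX_ENTRIES = 1000; // illustrative cap

// accessOrder = true (third constructor argument) orders iteration from
// least- to most-recently accessed, so removeEldestEntry evicts the LRU entry.
Map<String, char[]> cache = Collections.synchronizedMap(
        new LinkedHashMap<String, char[]>(MAX_ENTRIES + 1, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, char[]> eldest) {
                return size() > MAX_ENTRIES;
            }
        });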
Roll Your Own Solution
If I had to write my own data structure for this use-case, I'd probably create some sort of data structure with a map, queue, and ReadWriteLock along with a janitor thread to handle the cleanup when too many entries were in the map. It would be possible to go slightly over the desired max size, but in the steady-state you'd stay under it.
WeakHashMap won't necessarily attain your purpose, since if enough strong references to the keys are held by your app, you WILL see an OOME.
Alternatively, you could look into SoftReference, which will null out the content once the heap is scarce. However, most of the comments I've seen indicate that the references will not be cleared until the heap is really, really low and a lot of GC kicks in with a severe performance hit (so I don't recommend using it for your purpose).
My recommendation is to use a simple LRU map, e.g. http://commons.apache.org/collections/apidocs/org/apache/commons/collections/LRUMap.html
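Usage is essentially a one-liner (assuming Commons Collections is on the classpath):

import java.util.Map;
import org.apache.commons.collections.LRUMap;

// Evicts the least-recently-used entry once 1000 entries are reached.
Map cache = new LRUMap(1000);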
Thanks for the replies, guys!
As jasonmp85 pointed out, LinkedHashMap has a constructor that allows access order. I missed that bit when I looked at the API docs. The implementation also looks quite efficient (see below). Combined with a max size cap for each entry, that should solve my problem.
I will also look closely at SoftReference. Just for the record, Google Collections seems to have pretty good API for SoftKeys and SoftValues and Maps in general.
Here is a snippet from Java's LinkedHashMap class that shows how it maintains LRU behavior.
/**
 * Removes this entry from the linked list.
 */
private void remove() {
    before.after = after;
    after.before = before;
}

/**
 * Inserts this entry before the specified existing entry in the list.
 */
private void addBefore(Entry<K,V> existingEntry) {
    after = existingEntry;
    before = existingEntry.before;
    before.after = this;
    after.before = this;
}

/**
 * This method is invoked by the superclass whenever the value
 * of a pre-existing entry is read by Map.get or modified by Map.set.
 * If the enclosing Map is access-ordered, it moves the entry
 * to the end of the list; otherwise, it does nothing.
 */
void recordAccess(HashMap<K,V> m) {
    LinkedHashMap<K,V> lm = (LinkedHashMap<K,V>)m;
    if (lm.accessOrder) {
        lm.modCount++;
        remove();
        addBefore(lm.header);
    }
}