I have a Map declared as Map<String, String> testMap = new HashMap<String, String>();.
This map can hold around 1000 entries.
When my application needs a fresh set of data, I must clear the Map. But when I looked at the source of HashMap.clear():
/**
 * Removes all of the mappings from this map.
 * The map will be empty after this call returns.
 */
public void clear() {
    modCount++;
    Entry[] tab = table;
    for (int i = 0; i < tab.length; i++)
        tab[i] = null;
    size = 0;
}
I realized that the clear() method loops n times (where n is the size of the map's backing table). So I thought there might be a better way: redefine the Map as testMap = new HashMap<String, String>(); and let the previously used Map be garbage collected.
But I am not sure whether that is a good approach. I am working on a mobile application.
Can you please guide me?
Complicated question. Let's see what happens.
You instantiate a new instance, which is backed by a new array. The garbage collector then has to clear all the keys and values of the previous map, plus the map itself. So an O(n) algorithm is executed anyway, just in the garbage collector thread. For 1000 records you won't see any difference.
BUT. The performance guides tell you that it is always better not to create new objects if you can avoid it. So I would go with the clear() method.
Anyway, try both variants and try to measure. Always measure!
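For instance, a rough micro-benchmark along these lines (the 1000-entry fill and the round count are made up for illustration, and a serious measurement would also need JIT warm-up handling, e.g. with JMH):

import java.util.HashMap;
import java.util.Map;

public class ClearVsNew {
    static final int ENTRIES = 1000;  // size mentioned in the question
    static final int ROUNDS = 10000;  // enough repetitions to see a trend

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();

        long t0 = System.nanoTime();
        for (int r = 0; r < ROUNDS; r++) {
            fill(map);
            map.clear();                         // variant 1: reuse the same map
        }
        long clearTime = System.nanoTime() - t0;

        t0 = System.nanoTime();
        for (int r = 0; r < ROUNDS; r++) {
            fill(map);
            map = new HashMap<String, String>(); // variant 2: fresh instance
        }
        long newTime = System.nanoTime() - t0;

        System.out.println("clear(): " + clearTime / 1000000 + " ms, "
                + "new HashMap(): " + newTime / 1000000 + " ms");
    }

    static void fill(Map<String, String> map) {
        for (int i = 0; i < ENTRIES; i++) {
            map.put("key" + i, "value" + i);
        }
    }
}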
When you call Map.clear() on a Map of size n, you are asking the GC to clean up 2n objects (keys and values). When you null out the same Map, you are asking the GC to clean up 2n+1 objects (the extra one being the Map itself), and then you have to create a new Map instance, which is yet another overhead. So go for Map.clear(). You would also be wise to preset the size of the Map when instantiating it.
Creating objects in Java is relatively expensive in terms of memory, so it is better to go with clear(): you keep using the same object instead of creating a new one.
The point of having a clear() method is to remove the references the map holds to other objects, so that the keys and values are not kept from being garbage collected while the map itself is still referenced somewhere else.
But if your map is a local map only used by your specific code (i.e. the map is not referenced anywhere else), then go ahead and use a new map instead; setting 1000 references to null won't be a big performance hit anyway.
Don't forget the repopulation of the map. If you don't specify the capacity of the new map, you will incur quite a bit of overhead on the newly created map because of rehashes: each one is O(n) at the time it happens, and they occur O(log n) times. This may amortize to O(n) total, but if the rehashes never happen in the first place you are still better off.
None of this happens with the cleared map, because its capacity doesn't change.
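For illustration: with the question's ~1000 entries and HashMap's default load factor of 0.75, presetting the capacity makes sure no rehash ever happens (the 1334 figure is just 1000 / 0.75 rounded up):

// Default construction: capacity 16, load factor 0.75, so the table
// is rehashed several times on the way to 1000 entries.
Map<String, String> resized = new HashMap<String, String>();

// Preset the capacity so no rehash happens: capacity >= size / loadFactor.
Map<String, String> presized = new HashMap<String, String>(1334);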
I think calling new HashMap() is a better idea, since it does not have to do as much processing as clearing the hashmap. Also, by creating a new hashmap you remove the chance that the old one is still bound to the control that uses the data, which could cause problems when the hashmap is cleared.
map.clear() will remove all data. Note that this only discards all entries but keeps the internal array used to store them at the same size (rather than shrinking it to the initial capacity). If you also need to eliminate that, the easiest way is to discard the whole HashMap and replace it with a new instance. That, of course, only works if you control everyone who holds a reference to the map.
As for reclaiming the memory, you will have to let the garbage collector do its work.
Are your keys and values also Long? In that case, you may want to look at a more memory-efficient implementation than the generic HashMap, such as the TLongLongHashMap found in the GNU Trove library. That should save a lot of memory.
Related
I have a ConcurrentSkipListMap. I need to remove elements whose keys are lower than a given key.
Here is how I can perform it:
private ConcurrentNavigableMap<Double, MyObject> myObjectsMap = new ConcurrentSkipListMap<>();
//...
myObjectsMap = myObjectsMap.tailMap(10.25, false);
Looks OK, but I am confused about these facts:
1.
The returned map is backed by this map, so changes in the returned map are reflected in this map, and vice-versa.
Does it mean that old values won't be removed by the garbage collector?
I.e. we removed the old map and now we have a new map. But this new map is backed by the old map. So, what happens to the old map? Will it be removed, or will it sit in memory forever?
2.
The returned map will throw an IllegalArgumentException on an attempt to insert a key outside its range.
So now I can't put new keys which are less than 10.25, or greater than the last maximum value?
I'm confused. What is the correct way to remove elements from a ConcurrentSkipListMap?
Does it mean that old values won't be removed by the garbage collector?
I.e. we removed the old map and now we have a new map. But this new map is backed by the old map. So, what happens to the old map? Will it be removed, or will it sit in memory forever?
Yes, in point of fact. The old map is still around, and it'll stay around.
If you want to remove keys < 10.25, then do
map.headMap(10.25, false).clear();
...which will create that sub-map, remove all its elements -- removing them from the original map, too -- and then discard that submap view, letting it get garbage collected and leaving you with the original map object containing only keys >= 10.25.
Mind you, while this is guaranteed to remove keys that were < 10.25 when the operation started, there are no guarantees that new keys haven't been concurrently inserted, or that new keys might get inserted later. There's nothing you can do about that, really. If you want to be very sure you're only operating over values >= 10.25, then go ahead and use map.tailMap(10.25, true), but other values less than 10.25 might still be getting inserted, and they'll still be in memory.
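A small self-contained sketch of that approach (keys and values made up for illustration):

import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class PruneExample {
    public static void main(String[] args) {
        ConcurrentNavigableMap<Double, String> map =
                new ConcurrentSkipListMap<Double, String>();
        map.put(5.0, "a");
        map.put(10.25, "b");
        map.put(42.0, "c");

        // Remove every entry with key < 10.25, in place, via the headMap view.
        map.headMap(10.25, false).clear();

        System.out.println(map); // {10.25=b, 42.0=c} (same map object, pruned)
    }
}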
I have a map that I use to store dynamic data that is discarded as soon as it is created (i.e. used; the data is consumed quickly). The map responds to user interaction in the sense that when the user clicks a button, the map is filled, the data is used to do some work, and then the map is no longer needed.
So my question is: what's the better approach for emptying the map? Should I set it to null each time, or should I call clear()? I know clear() is linear in time, but I don't know how to compare that cost with the cost of creating the map each time. The size of the map is not constant, though it may run from n to 3n elements between creations.
If a map is not referenced from other objects where it may be hard to set a new one, simply null-ing out an old map and starting from scratch is probably lighter-weight than calling a clear(), because no linear-time cleanup needs to happen. With the garbage collection costs being tiny on modern systems, there is a good chance that you would save some CPU cycles this way. You can avoid resizing the map multiple times by specifying the initial capacity.
One situation where clear() is preferred would be when the map object is shared among multiple objects in your system. For example, if you create a map, give it to several objects, and then keep some shared information in it, setting the map to a new one in all these objects may require keeping references to objects that have the map. In situations like that it's easier to keep calling clear() on the same shared map object.
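A minimal sketch of that sharing situation (the Holder class is made up for illustration):

import java.util.HashMap;
import java.util.Map;

class Holder {
    final Map<String, String> shared;
    Holder(Map<String, String> shared) { this.shared = shared; }
}

public class SharedMapDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();
        Holder a = new Holder(map);
        Holder b = new Holder(map);

        map.put("k", "v");
        map.clear();                            // both holders see the empty map
        System.out.println(a.shared.isEmpty()); // true

        map = new HashMap<String, String>();    // rebinds only the local variable
        map.put("k2", "v2");
        System.out.println(a.shared.containsKey("k2")); // false: the holders
                                                        // still point at the old map
    }
}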
Well, it depends on how much memory you can throw at it. If you have a lot, then it doesn't matter. However, setting the map itself to null means you have freed up the garbage collector: if only the map holds references to the instances inside it, the garbage collector can collect not only the map but also everything inside it. clear() does empty the map, but it has to iterate over everything in the map to set each reference to null, and this happens during your execution time. The garbage collector essentially has to do this work anyway, so let it do its thing. Just note that setting the map to null doesn't let you reuse it. A typical pattern to reuse a map variable may be:
Map<String, String> whatever = new HashMap<String, String>();
// ... do something with the map
whatever = new HashMap<String, String>();
This allows you to reuse the variable without setting it to null at all; you silently discard the reference to the old map. This would be atrocious practice in a non-memory-managed language, where you must keep the old pointer around to free it (otherwise it is leaked as a dangling allocation), but in Java, since nothing references the old map any more, the GC marks it as eligible for collection.
I feel nulling the existing map is cheaper than clear(), as object creation is very cheap in modern JVMs.
Short answer: use Collection.clear() unless it is too complicated to keep the collection around.
Detailed answer: in Java, the allocation of memory is almost instantaneous; it is little more than a pointer that gets moved inside the VM. However, the initialization of those objects might add up to something significant. Also, all objects that use an internal buffer are sensitive to resizing and copying of their content. Using clear() makes sure that the buffers eventually stabilize at some size, so that reallocating memory and copying the old buffer to the new one is never necessary.
Another important issue is that reallocating and then releasing a lot of objects will require more frequent runs of the garbage collector, which might cause sudden lags.
If you always hold on to the map, it will be promoted to the old generation. If each user has one corresponding map, the number of maps in the old generation is proportional to the number of users. This may trigger full GCs more frequently as the number of users increases.
You can use both with similar results.
One prior answer notes that clear is expected to take constant time in a mature map implementation. Without checking the source code of the likes of HashMap, TreeMap, ConcurrentHashMap, I would expect their clear method to take constant time, plus amortized garbage collection costs.
Another poster notes that a shared map cannot be nulled out. Well, it can if you want to, but you do it by using a proxy object which encapsulates a proper map and nulls it out when needed. Of course, you'd have to implement the proxy map class yourself.
Map<Foo, Bar> myMap = new ProxyMap<Foo, Bar>();
// Internally, the above object holds a reference to a proper map,
// for example, a hash map. Furthermore, this delegates all calls
// to the underlying map. A true proxy.
myMap.clear();
// The clear method simply reinitializes the underlying map.
Unless you did something like the above, clear and nulling out are equivalent in the ways that matter, but I think it's more mature to assume your map, even if not currently shared, may become shared at a later time due to forces you can't foresee.
There is another reason to clear instead of nulling out, even if the map is not shared. Your map may be instantiated by an external client, like a factory, so if you clear your map by nulling it out, you might end up coupling yourself to the factory unnecessarily. Why should the object that clears the map have to know that you instantiate your maps using Guava's Maps.newHashMap() with God knows what parameters? Even if this is not a realistic concern in your project, it still pays off to align yourself to mature practices.
For the above reasons, and all else being equal, I would vote for clear.
HTH.
I am holding an array of hashmaps. I want maximum performance and minimal memory usage, so I would like to reuse the hashmaps inside the array.
When a hashmap in the array is no longer needed and I want to add a new hashmap to the array, I just clear the old hashmap and use put() to add the new values.
I also need to copy values back when I retrieve a hashmap from the array.
I am not sure if this is better than creating a new HashMap() every time.
What is better?
UPDATE
I need to cycle through about 50 million hashmaps, each with about 10 key-value pairs. If the size of the array is 20,000, I need just 20,000 hashmaps instead of 50 million new HashMap() calls.
Be very careful with this approach. Although it may be better performance-wise to recycle objects, you may get into trouble by modifying the same reference several times, as illustrated in the following example:
public class A {
    public int counter = 0;

    public static void main(String[] args) {
        A a = new A();
        a.counter = 5;
        A b = a;        // I want to save a into b and then recycle a for other purposes
        a.counter = 10; // now b.counter is also 10
    }
}
I'm sure you got the point; however, if you are not copying references to HashMaps around from the array, it should be OK.
Doesn't matter. Premature optimization. Come back when you have profiler results telling you where you're actually spending most memory or CPU cycles
It is entirely unclear why re-using maps in this manner would improve performance and/or memory usage. For all we know, it might make no difference, or might have the opposite effect.
You should do whatever results in the most readable code, then profile, and finally optimize the parts of the code that the profiler highlights as bottlenecks.
In most cases you will not feel any difference.
Typically the number of map entries is MUCH higher than the number of map objects. When you populate a map, you create an instance of Map.Entry per entry. This is a relatively lightweight object, but you invoke new for it anyway. The map itself, without data, is lightweight too, so you will not get any benefit from these tricks unless your map is supposed to hold only 1-2 entries.
Bottom line.
Forget about premature optimization. Implement your application. If you have performance problems, profile the application, find the bottlenecks and fix them. I can 99% guarantee you that the bottleneck will never be in the new HashMap() call.
I think what you want is an object pool kind of thing, where you get an object (in your case, a HashMap) from the object pool, perform your operations, and, if that object is no longer needed, put it back in the pool.
Check the Object Pool design pattern; for further reference see this link:
http://sourcemaking.com/design_patterns/object_pool
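For illustration, a minimal non-thread-safe sketch of such a pool (the class name and the String/String types are assumptions):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// A tiny pool of reusable HashMaps: acquire() hands out a recycled map if
// one is available, release() scrubs a map and returns it to the pool.
class MapPool {
    private final Deque<Map<String, String>> free =
            new ArrayDeque<Map<String, String>>();

    Map<String, String> acquire() {
        Map<String, String> m = free.poll();
        return (m != null) ? m : new HashMap<String, String>();
    }

    void release(Map<String, String> m) {
        m.clear(); // scrub old entries before the map is handed out again
        free.push(m);
    }
}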
The problem you have is that most of the objects are the Map.Entry objects inside the HashMap. While you can recycle the HashMap itself (and its array), these are only a small portion of the objects. One way around this is to use FastMap from Javolution, which recycles everything and has support for managing the lifecycle (it is designed to minimise garbage this way).
I suspect the most efficient way is to use an EnumMap if possible (if you have a known set of key attributes), or POJOs, even if most fields are not used.
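For the EnumMap suggestion, a minimal sketch with a made-up key enum; an EnumMap is backed by a plain array indexed by the enum ordinal, so a put allocates no Map.Entry object:

import java.util.EnumMap;
import java.util.Map;

public class EnumMapDemo {
    // Hypothetical fixed set of keys.
    enum Field { NAME, AGE, CITY }

    public static void main(String[] args) {
        Map<Field, String> record = new EnumMap<Field, String>(Field.class);
        record.put(Field.NAME, "Alice");
        record.put(Field.AGE, "30");
        System.out.println(record.get(Field.NAME)); // Alice
    }
}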
There are a few problems with reusing HashMaps.
Even if the key and value data were to take no memory (shared from other places), the Map.Entry objects would dominate memory usage yet not be reused (unless you did something a bit special).
Because of generational GC, having old objects point to new ones is generally expensive (and it is relatively difficult to see what's going on). This might not be an issue if you are keeping millions of these.
More complicated code is more difficult to optimise. So keep it simple, and then do the big optimisations, which probably involve changing the data structures.
Is there a simple, efficient Map implementation that allows a limit on the memory to be used by the map?
My use case is that I want to allocate dynamically most of the memory available at the time of its creation, but I don't want an OutOfMemoryError at any time in the future. Basically, I want to use this map as a cache, but I want to avoid heavy cache implementations like EHCache. My need is simple: at most an LRU algorithm.
I should further clarify that the objects in my cache are char[] or similar primitive arrays that will not hold references to other objects.
I can put an upper limit on the maximum size of each entry.
You can use a LinkedHashMap to limit the number of entries in the Map:
removeEldestEntry(Map.Entry<K,V> eldest): Returns true if this map should remove its eldest entry. This method is invoked by put and putAll after inserting a new entry into the map. It provides the implementor with the opportunity to remove the eldest entry each time a new one is added. This is useful if the map represents a cache: it allows the map to reduce memory consumption by deleting stale entries.
Sample use: this override will allow the map to grow up to 100 entries and then delete the eldest entry each time a new entry is added, maintaining a steady state of 100 entries.
private static final int MAX_ENTRIES = 100;

protected boolean removeEldestEntry(Map.Entry eldest) {
    return size() > MAX_ENTRIES;
}
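Putting the doc sample into a complete class, a minimal sketch; the capacity of 100 follows the sample above, and passing accessOrder = true to the superclass constructor (optional) turns the eviction policy from insertion-order into true LRU:

import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final int MAX_ENTRIES = 100;

    public LruCache() {
        // initialCapacity 16, loadFactor 0.75, accessOrder true (LRU iteration)
        super(16, 0.75f, true);
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > MAX_ENTRIES; // evict the eldest once we exceed the cap
    }
}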
Related questions
How do I limit the number of entries in a java hashtable?
Easy, simple to use LRU cache in java
What is a data structure kind of like a hash table, but infrequently-used keys are deleted?
For caches, a SoftHashMap is much more appropriate than a WeakHashMap. A WeakHashMap is usually used when you want to maintain an association with an object for as long as that object is alive, but without preventing it from being reclaimed.
In contrast, a SoftReference is more closely tied to memory allocation. See "No SoftHashMap?" for details on the differences.
WeakHashMap is also not usually appropriate, as it has the association around the wrong way for a cache: it uses weak keys and hard values. That is, the key and value are removed from the map when the key is cleared by the garbage collector. This is typically not what you want for a cache, where the keys are usually lightweight identifiers (e.g. strings, or some other simple value type); caches usually operate such that the key/value pair is reclaimed when the value reference is cleared.
The Commons Collections has a ReferenceMap where you can plug in what types of references you wish to use for keys and values. For a memory-sensitive cache, you will probably use hard references for keys, and soft references for values.
To obtain LRU semantics for a given number of references N, maintain a list of the last N entries fetched from the cache - as an entry is retrieved from the cache it is added to the head of the list (and the tail of the list removed.) To ensure this does not hold on to too much memory, you can create a soft reference and use that as a trigger to evict a percentage of the entries from the end of the list. (And create a new soft reference for the next trigger.)
Java Platform Solutions
If all you're looking for is a Map whose keys can be cleaned up to avoid OutOfMemoryErrors, you might want to look into WeakHashMap. It uses WeakReferences in order to allow the garbage collector to reap the map entries. It won't enforce any sort of LRU semantics, though, except those present in the generational garbage collection.
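As a quick illustration of that reaping behavior, a hedged sketch; note that System.gc() is only a hint, so the timing of the cleanup is not guaranteed:

import java.util.Map;
import java.util.WeakHashMap;

public class WeakDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Object, String> cache = new WeakHashMap<Object, String>();
        Object key = new Object();
        cache.put(key, "payload");

        key = null;  // drop the only strong reference to the key
        System.gc(); // only a hint to the collector
        Thread.sleep(100);

        System.out.println(cache.size()); // typically 0 once the key is reclaimed
    }
}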
There's also LinkedHashMap, which has this in the documentation:
A special constructor is provided to create a linked hash map whose order of iteration is the order in which its entries were last accessed, from least-recently accessed to most-recently (access-order). This kind of map is well-suited to building LRU caches. Invoking the put or get method results in an access to the corresponding entry (assuming it exists after the invocation completes). The putAll method generates one entry access for each mapping in the specified map, in the order that key-value mappings are provided by the specified map's entry set iterator. No other methods generate entry accesses. In particular, operations on collection-views do not affect the order of iteration of the backing map.
So if you use this constructor to make a map whose Iterator iterates in LRU, it becomes pretty easy to prune the map. The one (fairly big) caveat is that LinkedHashMap is not synchronized whatsoever, so you're on your own for concurrency. You can just wrap it in a synchronized wrapper, but that may have throughput issues.
Roll Your Own Solution
If I had to write my own data structure for this use-case, I'd probably create some sort of data structure with a map, queue, and ReadWriteLock along with a janitor thread to handle the cleanup when too many entries were in the map. It would be possible to go slightly over the desired max size, but in the steady-state you'd stay under it.
WeakHashMap won't necessarily attain your purpose: if enough strong references to the keys are held by your app, you WILL still see an OOME.
Alternatively you could look into SoftReference, which nulls out its referent once the heap is scarce. However, most of the comments I have seen indicate that it will not clear the reference until the heap is really, really low and a lot of GC kicks in, with a severe performance hit (so I don't recommend using it for your purpose).
My recommendation is to use a simple LRU map, e.g. http://commons.apache.org/collections/apidocs/org/apache/commons/collections/LRUMap.html
Thanks for the replies, guys!
As jasonmp85 pointed out, LinkedHashMap has a constructor that allows access order. I missed that bit when I looked at the API docs. The implementation also looks quite efficient (see below). Combined with a max size cap for each entry, that should solve my problem.
I will also look closely at SoftReference. Just for the record, Google Collections seems to have a pretty good API for SoftKeys and SoftValues and Maps in general.
Here is a snippet from Java's LinkedHashMap class that shows how it maintains LRU behavior.
/**
 * Removes this entry from the linked list.
 */
private void remove() {
    before.after = after;
    after.before = before;
}

/**
 * Inserts this entry before the specified existing entry in the list.
 */
private void addBefore(Entry<K,V> existingEntry) {
    after = existingEntry;
    before = existingEntry.before;
    before.after = this;
    after.before = this;
}

/**
 * This method is invoked by the superclass whenever the value
 * of a pre-existing entry is read by Map.get or modified by Map.set.
 * If the enclosing Map is access-ordered, it moves the entry
 * to the end of the list; otherwise, it does nothing.
 */
void recordAccess(HashMap<K,V> m) {
    LinkedHashMap<K,V> lm = (LinkedHashMap<K,V>)m;
    if (lm.accessOrder) {
        lm.modCount++;
        remove();
        addBefore(lm.header);
    }
}
What's the quickest way to remove an element from a Map by value in Java?
Currently I'm using:
DomainObj valueToRemove = new DomainObj();
String removalKey = null;

for (Map.Entry<String, DomainObj> entry : map.entrySet()) {
    if (valueToRemove.equals(entry.getValue())) {
        removalKey = entry.getKey();
        break;
    }
}

if (removalKey != null) {
    map.remove(removalKey);
}
The correct and fast one-liner would actually be:
while (map.values().remove(valueObject));
Kind of strange that most examples above assume the valueObject to be unique.
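To make the difference between the two concrete, a small sketch (keys and values made up for illustration):

import java.util.HashMap;
import java.util.Map;

public class RemoveByValueDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();
        map.put("a", "x");
        map.put("b", "x"); // duplicate value
        map.put("c", "y");

        map.values().remove("x");         // removes only ONE mapping to "x"
        System.out.println(map.size());   // 2

        while (map.values().remove("x")); // loops until no mapping to "x" is left
        System.out.println(map.containsValue("x")); // false
    }
}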
Here's the one-line solution:
map.values().remove(valueToRemove);
That's probably faster than defining your own iterator, since the JDK collection code has been significantly optimized.
As others have mentioned, a bimap will have faster value removes, though it requires more memory and takes longer to populate. Also, a bimap only works when the values are unique, which may or may not be the case in your code.
Without using a Bi-directional map (commons-collections and google collections have them), you're stuck with iterating the Map
map.values().removeAll(Collections.singleton(null));
Referring to the question "How to filter 'Null' values from HashMap<String, String>?", in Java 8 we can do the following:
map.values().removeIf(valueToRemove::equals);
If you don't have a reverse map, I'd go for an iterator.
DomainObj valueToRemove = new DomainObj();

for (Iterator<Map.Entry<String, DomainObj>> iter = map.entrySet().iterator();
        iter.hasNext(); ) {
    Map.Entry<String, DomainObj> entry = iter.next();
    if (valueToRemove.equals(entry.getValue())) {
        iter.remove();
        break; // if you only want to remove the first match
    }
}
You could always use the values collection, since any change made to that collection is reflected in the map. So calling map.values().remove(valueToRemove) should work, though I'm not sure you'll see performance better than that of your loop. One idea would be to extend or override the map class so that the backing collection is always sorted by value; that would enable a binary search on the value, which may be faster.
Edit: This is essentially the same as Alcon's answer, except I don't think his will work, since the entrySet is still going to be ordered by key, in which case you can't call .remove() with the value.
This also assumes that the value is supposed to be unique, or that you would want to remove any duplicates from the Map as well.
I would use this:
Map<Integer, String> x = new HashMap<Integer, String>();
x.put(1, "value1");
x.put(2, "value2");
x.put(3, "value3");
x.put(4, "value4");
x.put(5, "value5");
x.put(6, "value6");

x.values().remove("value4");
Edit: this works because objects are referenced by "pointer", not by value, and the values() view is backed by the map.
If you have no way to figure out the key from the DomainObj, then I don't see how you can improve on that. There's no built-in method to get the key from the value, so you have to iterate through the map.
If this is something you're doing all the time, you might maintain two maps (string->DomainObj and DomainObj->Key).
Like most of the other posters have said, it's generally an O(N) operation because you're going to have to look through the whole list of hashtable values regardless. #tackline has the right solution for keeping the memory usage at O(1) (I gave him an up-vote for that).
Your other option is to sacrifice memory space for the sake of speed. If your map is reasonably sized, you could store two maps in parallel.
If you have a Map<String, DomainObj>, maintain a Map<DomainObj, String> in parallel to it. When you insert into or remove from one map, do the same on the other. Granted, this is uglier because you're wasting space and you'll have to make sure the hashCode method of DomainObj is written properly, but your removal time drops from O(N) to O(1), because you can look up the key/object mapping in constant time in either direction.
Not generally the best solution, but if your number one concern is speed, I think this is probably as fast as you're gonna get.
====================
Addendum: This is essentially what #msaeed suggested, just sans the third-party library.
A shorter usage of iterator is to use a values() iterator.
DomainObj valueToRemove = new DomainObj();

for (Iterator<DomainObj> it = map.values().iterator(); it.hasNext(); ) {
    if (valueToRemove.equals(it.next())) {
        it.remove();
        break;
    }
}
We know this situation arises rarely, but it is extremely helpful when it does. I'd prefer BidiMap from org.apache.commons.collections.
I don't think this will happen only once in the lifetime of your app.
So what I would do is delegate to another object the responsibility of maintaining a reference to the objects added to that map.
So the next time you need to remove one, you use that "reverse map"...
class MapHolder {
    private Map<String, DomainObj> originalMap;
    private Map<DomainObj, String> reverseMap;

    public void remove(DomainObj value) {
        if (reverseMap.containsKey(value)) {
            originalMap.remove(reverseMap.get(value));
            reverseMap.remove(value);
        }
    }
}
This is much, much faster than iterating.
Obviously you need to keep the two maps synchronized, but that should not be hard if you refactor your code so that one object is responsible for the state of the map.
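For instance, a minimal sketch of that refactoring, routing every mutation through the holder so the two maps always agree (DomainObj is the question's class; the rest is assumed):

import java.util.HashMap;
import java.util.Map;

class MapHolder {
    private final Map<String, DomainObj> originalMap = new HashMap<String, DomainObj>();
    private final Map<DomainObj, String> reverseMap = new HashMap<DomainObj, String>();

    // Every insertion goes through the holder, so both maps stay in sync.
    public void put(String key, DomainObj value) {
        originalMap.put(key, value);
        reverseMap.put(value, key);
    }

    public void remove(DomainObj value) {
        String key = reverseMap.remove(value);
        if (key != null) {
            originalMap.remove(key);
        }
    }
}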
Remember that in OOP we have objects that have state and behavior. If your data is passed around in variables all over the place, you are creating unnecessary dependencies between objects.
Yes, it will take you some time to correct the code, but the time spent correcting it will save you a lot of headaches in the future. Think about it.