Say I am iterating over a Map in Java... I am unclear about what I can do to that Map while in the process of iterating over it. I guess I am mostly confused by this warning in the Javadoc for the Iterator interface's remove method:
[...] The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.
I know for sure that I can invoke the remove method without any issues. But while iterating over the Map collection, can I:
Change the value associated with a key with the Map class put method (put with an existing key)?
Add a new entry with the Map class put method (put with a new key)?
Remove an entry with the Map class remove method?
My guess is that I can probably safely do #1 (put to an existing key) but not safely do #2 or #3.
Thanks in advance for any clarification on this.
You can use Iterator.remove(), and if you are using an entrySet iterator (of Map.Entry objects) you can use Map.Entry.setValue(). Anything else and all bets are off: you should not change the map directly, and some maps will not even permit one or both of the aforementioned methods.
Specifically, your #1, #2 and #3 are not permitted.
You might get away with setting an existing key's value through the Map object, but the Set.iterator() documentation specifically precludes that, and the result will be implementation-specific:
If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation, or through the setValue operation on a map entry returned by the iterator) the results of the iteration are undefined. (emphasis added)
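To make the two sanctioned operations concrete, here is a minimal sketch (class and variable names are mine): removal through the iterator and setValue on an entry returned by an entrySet iterator, the only two modifications the quoted Javadoc allows.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class EntryIterationDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> entry = it.next();
            if (entry.getKey().equals("b")) {
                it.remove();                           // sanctioned: remove via the iterator
            } else {
                entry.setValue(entry.getValue() * 10); // sanctioned: setValue on the entry
            }
        }
        System.out.println(map); // {a=10, c=30} (iteration order may vary)
    }
}
```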
If you take a look at the HashMap class, you'll see a field called 'modCount'. This is how the map knows when it's been modified during iteration. Any method that increments modCount while you're iterating will cause the iterator to throw a ConcurrentModificationException on its next call to next().
That said, you CAN put a value into a map if the key already exists, effectively updating the entry with the new value:
Map<String, Object> test = new HashMap<String, Object>();
test.put("test", 1);
for(String key : test.keySet())
{
test.put(key, 2); // this works!
}
System.out.println(test); // will print "{test=2}"
When you ask whether you can perform these operations 'safely', you don't have to worry too much, because HashMap is designed to throw that ConcurrentModificationException as soon as it runs into a problem like this. These operations fail fast; they won't leave the map in an inconsistent state.
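For contrast, a small sketch (names are illustrative) of what happens when you put a new key during iteration: with at least one more entry left to visit, the iterator's next call to next() detects the structural change.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastPutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        boolean threw = false;
        try {
            for (String key : map.keySet()) {
                map.put("c", 3); // new key: structural change, modCount is incremented
            }
        } catch (ConcurrentModificationException e) {
            threw = true;        // detected on the iterator's next call to next()
        }
        System.out.println("CME thrown: " + threw); // true here (best-effort, not guaranteed)
    }
}
```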
There is no global answer; the Map interface leaves the choice to implementations. Unfortunately, I think all the implementations in the JDK use the fail-fast approach (here is the definition of fail-fast, as stated in the HashMap Javadoc):
The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
In general, if you want to change the Map while iterating over it, you should use one of the iterator's methods. I have not actually tested whether #1 will work, but the others definitely will not.
Here's what I know:
Fail-fast iterators will throw a ConcurrentModificationException if I try to modify the collection while iterating through it, without using the iterator's own methods (like iterator.remove()).
It's not guaranteed that a fail-fast iterator will ALWAYS throw a CME.
Fail-safe iterators won't throw CME.
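The last point can be sketched with ConcurrentHashMap, whose iterators are weakly consistent (often described as fail-safe): direct modification mid-iteration never throws a CME, though whether the iterator reflects the change is unspecified. Names below are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeaklyConsistentDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        // Weakly consistent iterator: removing directly from the map
        // during iteration never throws ConcurrentModificationException
        for (String key : map.keySet()) {
            map.remove("b");
        }
        System.out.println(map.size()); // 1
    }
}
```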
I'm reading a book where I came across the following sentence:
A HashMap provides its set of keys and a Java application can iterate
over them. Thus, a HashMap is fail-fast.
The part that I don't understand is where it says "Thus...". If someone would tell me that a HashMap provides its set of keys, I still wouldn't know whether it's a fail-fast or fail-safe (based on that alone).
So why does, providing its own set of keys, make the HashMap fail-fast?
What's the connection between those two things?
It is actually the first sentence which explains why HashMap is fail-fast:
A HashMap provides its set of keys and a Java application can iterate over them.
Fail-safe iterators iterate over a private copy of the original collection, not the collection itself. Therefore any change to the original collection goes unnoticed by the iterator, and hence it never throws a CME.
Since HashMap provides its actual set of keys, as in the quote above (rather than a copy of it), it is therefore fail-fast.
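The copy-based behaviour described above can be sketched with CopyOnWriteArraySet, whose iterator walks a snapshot of the backing array (names below are illustrative):

```java
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArraySet;

public class SnapshotIteratorDemo {
    public static void main(String[] args) {
        CopyOnWriteArraySet<String> set = new CopyOnWriteArraySet<>();
        set.add("a");
        set.add("b");

        Iterator<String> it = set.iterator(); // snapshot of the set at this moment
        set.add("c");                         // modification after iterator creation

        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;                           // no CME: the iterator walks the copy
        }
        System.out.println(seen);             // 2: the snapshot never sees "c"
    }
}
```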
The author just didn't complete the idea.
From javadocs of HashMap (https://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html#keySet()):
"Returns a Set view of the keys contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa. If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation), the results of the iteration are undefined. The set supports element removal, which removes the corresponding mapping from the map, via the Iterator.remove, Set.remove, removeAll, retainAll, and clear operations. It does not support the add or addAll operations."
The idea that wasn't expressed is that the hash map is iterated over using the keySet-provided Set (well, let's take that as iterating over the map...). As that set is fail-fast (as per the doc above), the map is also fail-fast.
Remember that other methods also allow iterating over the map (though luckily, as far as I could see, they're also fail-fast). Check https://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html#entrySet()
Looking at internet pages that contain that exact sentence, I think the context was a comparison of HashMap with Hashtable. It can be made clearer if you look at what the Javadoc says for Hashtable:
The iterators returned by the iterator method of the collections returned by all of this class's "collection view methods" are fail-fast: if the Hashtable is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future. The Enumerations returned by Hashtable's keys and elements methods are not fail-fast.
So, armed with this background, we can figure out that the author of the HashMap vs Hashtable comparison was trying to say that HashMap is fail-fast because it has a keySet() method, which returns fail-fast iterators as described by the above Javadoc. However, this gives incomplete information because it can be taken to imply that, unlike HashMap, Hashtable isn't fail-fast. In fact, Hashtable implements Map and therefore also has the keySet() method, so it also has fail-fast iterators just like HashMap. Another problem with that sentence is that it is misleading: it is not the HashMap that is fail-fast, but the iterators it returns.
In the java.util.TreeMap javadoc there is this statement:
All Map.Entry pairs returned by methods in this class and its views represent snapshots of mappings at the time they were produced. They do not support the Entry.setValue method. (Note however that it is possible to change mappings in the associated map using put.)
I don't get this line. In what way do they not support the setValue method? When I use entrySet() and iterate over the Map.Entry objects, setValue works fine.
Map<String, Integer> map = new TreeMap<>();
map.put("dbc", 1);
map.put("abc", 1);
map.put("cbc", 1);
for(Map.Entry<String, Integer> item: map.entrySet()) {
item.setValue(1);
}
This is a known issue.
There is an OpenJDK tracker entry (JDK-8038146) that notes that this is a javadoc error. But there is more to it than this.
There is also a Java Bug Database entry (bug id 7006877) that explains that the javadoc was changed to say that in Java 6, and that it is actually true for the alternative version of TreeMap that you get (got) if you run the JVM with aggressive optimizations enabled.
This ticket also says that the issue affected Java 7, and was fixed in Java 8. They apparently removed the alternative TreeMap implementation ... though they didn't change the javadoc.
Commentary:
If the issue trackers are to be believed (and I've understood them correctly), then the javadoc probably ought to say that the Entry.setValue method may not be supported in Java 6 and Java 7. But the misleading sentences could be entirely removed for Java 8 onwards.
Whether that is the correct thing to do is somewhat debatable, because some people need to understand how their new Java code might run on older platforms. Maybe it would be best to leave this as a historical footnote.
It appears the comment you quoted isn't entirely accurate.
The TreeMap class has a number of methods which return single Map.Entry objects: firstEntry, lastEntry, higherEntry, lowerEntry, etc. I believe the comment refers to those methods. They return a Map.Entry by making an immutable copy of the underlying entry (via AbstractMap.SimpleImmutableEntry).
These immutable copies will throw an UnsupportedOperationException if you call their setValue method.
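A quick sketch (illustrative names) contrasting the immutable snapshot returned by firstEntry with the live, writable entries from entrySet():

```java
import java.util.Map;
import java.util.TreeMap;

public class TreeMapEntryDemo {
    public static void main(String[] args) {
        TreeMap<String, Integer> map = new TreeMap<>();
        map.put("a", 1);

        // firstEntry() returns an immutable snapshot (AbstractMap.SimpleImmutableEntry)
        Map.Entry<String, Integer> snapshot = map.firstEntry();
        boolean threw = false;
        try {
            snapshot.setValue(2);
        } catch (UnsupportedOperationException e) {
            threw = true;
        }
        System.out.println("snapshot setValue threw: " + threw); // true

        // entrySet() entries, by contrast, write through to the map
        map.entrySet().iterator().next().setValue(2);
        System.out.println(map.get("a")); // 2
    }
}
```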
After reading the comment you quoted, I would have thought that entrySet would return immutable copies too. But the javadoc of the TreeMap.entrySet method states:
Returns a Set view of the mappings contained in this map. The set's iterator returns the entries in ascending key order. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa. If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation, or through the setValue operation on a map entry returned by the iterator) the results of the iteration are undefined. The set supports element removal, which removes the corresponding mapping from the map, via the Iterator.remove, Set.remove, removeAll, retainAll and clear operations. It does not support the add or addAll operations.
Given that changes to the Set returned by entrySet must operate on the underlying map I can see why TreeMap doesn't attempt to wrap the Entries - that would severely complicate efforts to keep the Set and the Map in sync.
As is well known, we can't modify a non-thread-safe collection while iterating over it, since doing so throws a ConcurrentModificationException.
But what I want to know is: what would happen if it did not throw an exception and let the iteration and modification happen concurrently?
For example, remove an element from a HashMap while iterating it.
Remove. Since a remove operation does not change the length of the underlying table in the HashMap, I think it's not an issue for the iteration.
Put. Maybe a problem only occurs when the put triggers resize(), since the underlying table will be shuffled.
Is my analysis correct?
Short answer: no, your analysis is not correct.
If you remove something from a collection while you're iterating over it (without using the iterator), the iterator doesn't have a good way to keep track of where it is. Use a simpler example: a List. Say the iterator is at index 10, and you remove index 5. That removal shifts all of the indices. Now you call next() on the iterator, and you... what? Go to index 11? Stay at index 10? The iterator has no way to know.
Similarly, if you add something to a collection while you're iterating over it (without using the iterator), the iterator doesn't know if that was added before or after the current index, so the next() function is broken.
This doesn't even get into data structures where the iterator order depends on what's in the collection, but the issues are similar to the ones I listed above.
But what I want to know is: what would happen if it did not throw an exception and let the iteration and modification happen concurrently?
This is hypothetical because the respective (non-concurrent) collections don't work that way. If we hypothesize that they did allow "concurrent" modification, then we still cannot answer without making assumptions about how iteration would then be implemented. Finally, assuming that we just removed the fail-fast checks, the behaviour will be collection-specific.
Looking at your analysis for the HashMap case, you have to consider the internal state of the iterator object. I haven't looked at any specific implementation code, but a typical HashMap iterator will keep an index into the main hash array and a pointer to a node within the current hash chain:
A Map.remove won't change the hash table's size, so the chain index won't be invalidated. However, if the wrong entry was removed, the iterator's node pointer could end up referring to a node that is no longer in the chain. This could cause the iteration to return deleted map entries.
You are correct that a Map.put that triggered a resize could cause the entries to be redistributed. This could cause some entries to be skipped, and others to be returned twice.
Usually the traditional collection classes in the java.util package use an int variable (modCount) to keep track of modifications (additions and deletions).
When we ask one of these collection classes for an Iterator, the Iterator object that is returned records the current modification count as its expected modification count.
Upon invoking the next() method, the Iterator object checks the collection's current modification count against its expected value.
In case of a mismatch, it fails fast by throwing a ConcurrentModificationException (a RuntimeException in the java.util package).
Do not confuse the size of the collection object (the Map in your question) with the total number of buckets available. Moreover, it's not about the size: an addition increments the modification count, and so does a deletion.
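The mechanism above is why removal through the iterator is the safe path: a minimal sketch (illustrative names), where it.remove() deletes the entry and resynchronizes the iterator's expected modification count.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class IteratorRemoveDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        Iterator<String> it = map.keySet().iterator();
        while (it.hasNext()) {
            if (it.next().equals("b")) {
                // deletes the entry and updates the iterator's expected
                // modification count, so iteration continues without a CME
                it.remove();
            }
        }
        System.out.println(map.size()); // 2
    }
}
```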
I have need of iterating over some HashMaps on each frame of an OpenGL loop. I do it like this:
for (Map.Entry<MyKey, MyValue> entry : myMap.entrySet()) {...}
My concern is whether this call to entrySet() actually instantiates and populates a brand-new Set of Map.Entry objects every time it's called, because if it does, the GC will be busier than I'd like when animating in OpenGL. My gut says no, because the HashMap documentation says that you can directly modify the HashMap using the returned entry set, but I don't know how to tell for sure.
And I'd also like to know about other Map implementations as well, like Hashtable, TreeMap, and LinkedHashMap.
After reviewing the source, the answer is no. It lazily instantiates the entry set on the first call to entrySet() and then returns a reference to the same object on each subsequent call.
The same is true for LinkedHashMap, Hashtable, and TreeMap.
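This is easy to confirm without reading the source, since the cached view is the very same object on every call (a sketch; names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class EntrySetViewDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);

        Set<Map.Entry<String, Integer>> view1 = map.entrySet();
        Set<Map.Entry<String, Integer>> view2 = map.entrySet();
        System.out.println(view1 == view2); // true: the same cached view object

        map.put("b", 2);
        System.out.println(view1.size());   // 2: the view tracks later changes
    }
}
```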
No, it does not create a new Set; rather, it returns a light-weight wrapper, like e.g. Arrays.asList does. The API says: The set is backed by the map, so changes to the map are reflected in the set, and vice-versa.
The implementation of HashMap.entrySet is simply a cheap view, and takes O(1) time to create every time you call it. The entry objects you get by iterating over it are, in fact, the objects used by the map to implement its internal data structures.
(That, and there's really no other way to do the things you would want to do with it.)
I am iterating through a Hashtable and at one point, I add something in to the Hashtable which is clearly giving me a ConcurrentModificationException. I understand why I am getting the error, but is there a way around this such that I could still iterate through the Hashtable and add values simultaneously?
From the docs:
The iterators returned by the iterator method of the collections returned by all of this class's "collection view methods" are fail-fast: if the Hashtable is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future. The Enumerations returned by Hashtable's keys and elements methods are not fail-fast.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
If you need this kind of behavior, you can safely copy the set of keys and iterate through the copy. Another option, if the hashtable is large and copying the key set is likely to be expensive, is to add to a separate collection during the iteration and merge the elements of that collection into the hashtable after the iteration.
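The copy-the-keys option might look like this (a sketch; names are mine, and I've used Hashtable to match the question):

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

public class CopyKeysDemo {
    public static void main(String[] args) {
        Hashtable<String, Integer> table = new Hashtable<>();
        table.put("a", 1);
        table.put("b", 2);

        // Iterate over a copy of the keys; puts on the table are then safe,
        // because the loop no longer touches the table's own iterator
        List<String> keys = new ArrayList<>(table.keySet());
        for (String key : keys) {
            table.put(key + "-derived", table.get(key) + 100);
        }
        System.out.println(table.size()); // 4
    }
}
```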
You may also want to know about CopyOnWriteArraySet, which is specifically designed for safe iteration while the set is modified. Note that an iterator sees only the original set; any additions will not be visible until the next iteration.
http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/CopyOnWriteArraySet.html
This is most useful in many readers / few writers scenario. It is unlikely to be the most efficient solution if reading and writing happens in the same code path.
Make a new Hashtable that you add new entries to; then when you are done iterating, add in the entries from the first table.
Optionally, if you need to, you can skip keys that exist in the original table.
Another alternative would be to use a ConcurrentHashMap instead of the Hashtable. However:
The iterators for a ConcurrentHashMap are defined to return objects reflecting a state some time at or after the creation of the iterator. A more precise statement of the behaviour is in the javadocs for the relevant methods.
A ConcurrentHashMap is probably slower than a regular HashMap.
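A sketch of the ConcurrentHashMap alternative (illustrative names): puts during iteration never throw a ConcurrentModificationException, though whether this particular traversal sees the new entry is unspecified.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentPutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        // Weakly consistent iteration: adding during the loop is allowed,
        // and putIfAbsent keeps the addition idempotent
        for (String key : map.keySet()) {
            map.putIfAbsent("c", 3);
        }
        System.out.println(map.containsKey("c")); // true
    }
}
```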