Understanding concurrentHashMap - java

As known, the ConcurrenthashMap class allows us to use iterators safely. As far as I understood from the sources of the Map it's achieved by storing the current Map state into the iterator itself. Here is the inner class representing the iterator (There's a child that is created when iterator()'s called):
abstract class HashIterator {
int nextSegmentIndex;
int nextTableIndex;
HashEntry<K,V>[] currentTable;
HashEntry<K, V> nextEntry;
HashEntry<K, V> lastReturned;
//Methods and ctor
}
But what if some thread writes to the Map something during construction of the iterator? Do we get non-determenistic state of the map then?
The thing is neither of the methods of the Map are synchronized. There's a ReentrantLock for put method, but that's it (as far as I could find). So, I don't understand how the iterator can support a correct state even if some thread writes to the map during its construction?.

The Iterator offers a weakly consistent state. It doesn't offer a transactional view of the data. It only offers that you will see all the keys/values if it is not altered and if it is, you may or may not see that change, but you won't get an error.

From the java doc of ConcurrentHashMap:
Retrieval operations (including get) generally do not block, so may
overlap with update operations (including put and remove). Retrievals
reflect the results of the most recently completed update operations
holding upon their onset. For aggregate operations such as putAll and
clear, concurrent retrievals may reflect insertion or removal of only
some entries. Similarly, Iterators and Enumerations return elements
reflecting the state of the hash table at some point at or since the
creation of the iterator/enumeration. They do not throw
ConcurrentModificationException. However, iterators are designed to be
used by only one thread at a time.
Now answering the questions.
But what if some thread writes to the Map something during
construction of the iterator?
As mentioned, an iterator represents the state at some point of time. So it may not be the most recent state.
how the iterator can support a correct state even if some thread
writes to the map during its construction?
The guarantee is that things will not break if you put/remove during iteration. However, there is no guarantee that one thread will see the changes to the map that the other thread performs (without obtaining a new iterator from the map). The iterator is guaranteed to reflect the state of the map at the time of it's creation. Futher changes may be reflected in the iterator, but they do not have to be.

Related

ArrayList$Itr.checkForComodification throws java.util.ConcurrentModificationException on iterated NOT modified array list [duplicate]

I am using a Collection (a HashMap used indirectly by the JPA, it so happens), but apparently randomly the code throws a ConcurrentModificationException. What is causing it and how do I fix this problem? By using some synchronization, perhaps?
Here is the full stack-trace:
Exception in thread "pool-1-thread-1" java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
at java.util.HashMap$ValueIterator.next(Unknown Source)
at org.hibernate.collection.AbstractPersistentCollection$IteratorProxy.next(AbstractPersistentCollection.java:555)
at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
This is not a synchronization problem. This will occur if the underlying collection that is being iterated over is modified by anything other than the Iterator itself.
Iterator it = map.entrySet().iterator();
while (it.hasNext()) {
Entry item = it.next();
map.remove(item.getKey());
}
This will throw a ConcurrentModificationException when the it.hasNext() is called the second time.
The correct approach would be
Iterator it = map.entrySet().iterator();
while (it.hasNext()) {
Entry item = it.next();
it.remove();
}
Assuming this iterator supports the remove() operation.
Try using a ConcurrentHashMap instead of a plain HashMap
Modification of a Collection while iterating through that Collection using an Iterator is not permitted by most of the Collection classes. The Java library calls an attempt to modify a Collection while iterating through it a "concurrent modification". That unfortunately suggests the only possible cause is simultaneous modification by multiple threads, but that is not so. Using only one thread it is possible to create an iterator for the Collection (using Collection.iterator(), or an enhanced for loop), start iterating (using Iterator.next(), or equivalently entering the body of the enhanced for loop), modify the Collection, then continue iterating.
To help programmers, some implementations of those Collection classes attempt to detect erroneous concurrent modification, and throw a ConcurrentModificationException if they detect it. However, it is in general not possible and practical to guarantee detection of all concurrent modifications. So erroneous use of the Collection does not always result in a thrown ConcurrentModificationException.
The documentation of ConcurrentModificationException says:
This exception may be thrown by methods that have detected concurrent modification of an object when such modification is not permissible...
Note that this exception does not always indicate that an object has been concurrently modified by a different thread. If a single thread issues a sequence of method invocations that violates the contract of an object, the object may throw this exception...
Note that fail-fast behavior cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast operations throw ConcurrentModificationException on a best-effort basis.
Note that
the exception may be throw, not must be thrown
different threads are not required
throwing the exception cannot be guaranteed
throwing the exception is on a best-effort basis
throwing the exception happens when the concurrent modification is detected, not when it is caused
The documentation of the HashSet, HashMap, TreeSet and ArrayList classes says this:
The iterators returned [directly or indirectly from this class] are fail-fast: if the [collection] is modified at any time after the iterator is created, in any way except through the iterator's own remove method, the Iterator throws a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
Note again that the behaviour "cannot be guaranteed" and is only "on a best-effort basis".
The documentation of several methods of the Map interface say this:
Non-concurrent implementations should override this method and, on a best-effort basis, throw a ConcurrentModificationException if it is detected that the mapping function modifies this map during computation. Concurrent implementations should override this method and, on a best-effort basis, throw an IllegalStateException if it is detected that the mapping function modifies this map during computation and as a result computation would never complete.
Note again that only a "best-effort basis" is required for detection, and a ConcurrentModificationException is explicitly suggested only for the non concurrent (non thread-safe) classes.
Debugging ConcurrentModificationException
So, when you see a stack-trace due to a ConcurrentModificationException, you can not immediately assume that the cause is unsafe multi-threaded access to a Collection. You must examine the stack-trace to determine which class of Collection threw the exception (a method of the class will have directly or indirectly thrown it), and for which Collection object. Then you must examine from where that object can be modified.
The most common cause is modification of the Collection within an enhanced for loop over the Collection. Just because you do not see an Iterator object in your source code does not mean there is no Iterator there! Fortunately, one of the statements of the faulty for loop will usually be in the stack-trace, so tracking down the error is usually easy.
A trickier case is when your code passes around references to the Collection object. Note that unmodifiable views of collections (such as produced by Collections.unmodifiableList()) retain a reference to the modifiable collection, so iteration over an "unmodifiable" collection can throw the exception (the modification has been done elsewhere). Other views of your Collection, such as sub lists, Map entry sets and Map key sets also retain references to the original (modifiable) Collection. This can be a problem even for a thread-safe Collection, such as CopyOnWriteList; do not assume that thread-safe (concurrent) collections can never throw the exception.
Which operations can modify a Collection can be unexpected in some cases. For example, LinkedHashMap.get() modifies its collection.
The hardest cases are when the exception is due to concurrent modification by multiple threads.
Programming to prevent concurrent modification errors
When possible, confine all references to a Collection object, so its is easier to prevent concurrent modifications. Make the Collection a private object or a local variable, and do not return references to the Collection or its iterators from methods. It is then much easier to examine all the places where the Collection can be modified. If the Collection is to be used by multiple threads, it is then practical to ensure that the threads access the Collection only with appropriate synchonization and locking.
In Java 8, you can use lambda expression:
map.keySet().removeIf(key -> key condition);
removeIf is a convenient default method in Collection which uses Iterator internally to iterate over the elements of the calling collection.
The extraction of the removal condition is expressed by allowing the caller to provide a Predicate<? super E>.
"I'll perform the iteration for you and test your Predicate on each one of the elements in the collection. If an element causes the test method of the Predicate to return true, I'll remove it."
It sounds less like a Java synchronization issue and more like a database locking problem.
I don't know if adding a version to all your persistent classes will sort it out, but that's one way that Hibernate can provide exclusive access to rows in a table.
Could be that isolation level needs to be higher. If you allow "dirty reads", maybe you need to bump up to serializable.
Note that the selected answer cannot be applied to your context directly before some modification, if you are trying to remove some entries from the map while iterating the map just like me.
I just give my working example here for newbies to save their time:
HashMap<Character,Integer> map=new HashMap();
//adding some entries to the map
...
int threshold;
//initialize the threshold
...
Iterator it=map.entrySet().iterator();
while(it.hasNext()){
Map.Entry<Character,Integer> item=(Map.Entry<Character,Integer>)it.next();
//it.remove() will delete the item from the map
if((Integer)item.getValue()<threshold){
it.remove();
}
Try either CopyOnWriteArrayList or CopyOnWriteArraySet depending on what you are trying to do.
I ran into this exception when try to remove x last items from list.
myList.subList(lastIndex, myList.size()).clear(); was the only solution that worked for me.

Adding and removing values from ConcurrentHashMap while iterating over it

I have this code:
private ConcurrentMap<String, Integer> myMap = new ConcurrentHashMap<>();
#Scheduled(fixedDelay = 600_000)
public void foo(){
myMap.values().stream().
filter(predicate()).
forEach(this::remove);
}
public void insert(String str, Integer value){
myMap.put(str, value);
}
What would happen if while iterating over this map - someone will put a new value in it or remove an existing value from it?
The documentation for ConcurrentHashMap has some details about the behavior. First we look at what ConcurrentHashMap.values() does:
Returns a Collection view of the values contained in this map...
The view's iterators and spliterators are weakly consistent.
The view's spliterator reports Spliterator.CONCURRENT and Spliterator.NONNULL.
What's interesting are the terms "weakly consistent" and Spliterator.CONCURRENT, where the former is described as:
Most concurrent Collection implementations (including most Queues) also differ from the usual java.util conventions in that their Iterators and Spliterators provide weakly consistent rather than fast-fail traversal:
they may proceed concurrently with other operations
they will never throw ConcurrentModificationException
they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect any modifications subsequent to construction.
and Spliterator.CONCURRENT is described as:
Characteristic value signifying that the element source may be safely concurrently modified (allowing additions, replacements, and/or removals) by multiple threads without external synchronization. If so, the Spliterator is expected to have a documented policy concerning the impact of modifications during traversal.
From all these documentations, and being consistent with the concurrency model of the ConcurrentHashMap, it means that the stream pipeline is completely thread-safe and will traverse the elements as they existed upon the creation of the iterator.

thread safe LinkedHashMap without Collections.synchronized

I am using a LinkedHashMap and the environment is multi threaded so this structure needs to be thread safe. During specific events I need to read the entire map push to db and clear all.
Most of time only writes happen to this map. This map has a limit 50 entries.
I am using Oracle MAF and it does not have Collections.syncronizedMap available. So, what are things I need to put in synchronized blocks to make sure writing and reading doesn't hit me concurrentModificationException etc
Few requirements:
I need to behave it like a circular queue so Overriding removeEldestEntry method of the LinkedHashMap.
I need to preserve the order
So, what are things I need to put in synchronized blocks to make sure writing and reading doesn't hit me concurrentModificationException etc
Everything method call should be in a synchronized block.
The tricky one being the use of an Iterator, as you have to hold the lock for the life of the Iterator. e.g.
// pre Java 5.0 code
synchronized(map) { // the lock has to be held for the whole loop.
for(Iterator iter = map.entrySet().iterator(); iter.hashNext(); ) {
Map.Entry entry = iter.next();
String key = (String) entry.getKey();
MyType value = (MyType) entry.getValue();
// do something with key and value.
}
}
If you are using a java version 1.5 or newer you can use java.util.concurrent.ConcurrentHashMap.
This is the most efficient implementation of a Map to use in a multithread environment.
It adds also some method like putIfAbsent very useful for atomic operations on the map.
From java doc:
Retrieval operations (including get) generally do not block, so may
overlap with update operations (including put and remove). Retrievals
reflect the results of the most recently completed update operations
holding upon their onset. For aggregate operations such as putAll and
clear, concurrent retrievals may reflect insertion or removal of only
some entries
So verify is this is the behaviour you expect from your class.
If your map has only 50 records and needs to be used as a circular Queue why you use a Map? Is not better to use one of the Queue implementations?
If you need to use a LinkedHashMap use the following:
Map m = Collections.synchronizedMap(new LinkedHashMap());
From javadoc of LinkedHashMap:
Note that this implementation is not synchronized. If multiple threads
access a linked hash map concurrently, and at least one of the threads
modifies the map structurally, it must be synchronized externally.
This is typically accomplished by synchronizing on some object that
naturally encapsulates the map. If no such object exists, the map
should be "wrapped" using the Collections.synchronizedMap method. This
is best done at creation time, to prevent accidental unsynchronized
access to the map:
Map m = Collections.synchronizedMap(new LinkedHashMap(...));
https://docs.oracle.com/javase/7/docs/api/java/util/LinkedHashMap.html
Most LinkedHashMap operations require synchronization in a multi-threaded environment, even the ones that look pure like get(key), get(key) actually mutates some internal nodes. The easiest you could do is using Collections.synchronizedMap.
Map<K,V> map = Collections.synchronizedMap(new LinkedHashMap<>());
Now if it is not available, you can easily add it, as it is just a simple decorator around map that synchronize all operation.
class SyncMap<T,U> implements Map<T,U>{
SyncMap<T,U>(LinkedHashMap<T,U> map){
..
}
public synchronized U get(T t){
..
}
}

Is Hashtable.entrySet() thread safe?

Is it safe to iterate over the set returned by Hashtable.entrySet()? What about Hashtable.values() and Hashtable.keySet()?
What I intent to do: I want update the entries of a Hastable while the table is used by different other threads. For this I have to iterate over all entries currently in the map. It's not important if any entries added/removed from other threads during the iteration are handled or not. Synchronizing on the Hashtable is not an option because the update may take to long.
From the JavaDoc:
Unlike the new collection implementations, Hashtable is synchronized. If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable. If a thread-safe highly-concurrent implementation is desired, then it is recommended to use ConcurrentHashMap in place of Hashtable.
Better use a ConcurrentHashMap.
No, iterating over the views of a Hashtable is not safe in the face of concurrent modification. From the javadoc (emphasis mine):
The iterators returned by the iterator method of the collections returned by all of this class's "collection view methods" are fail-fast: if the Hashtable is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future. The Enumerations returned by Hashtable's keys and elements methods are not fail-fast.
jabu.10245 is correct that a ConcurrentHashMap is more appropriate, and does meet your requirement of allowing iteration while concurrent modification is occurring.
I think we're just going to quote the documentation of Hashtable at you.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
I think the docs say it all: You can't depend on the iterator telling you when you've modified a Hashtable incorrectly. Either synchronize on the Hashtable, or use a different class.

Why do we need synchronized ArrayLists when we already have Vectors?

When do we use synchronized ArrayList? We already have Vector which is synchronized.
I think that you've got this wrong. ArrayList is unsynchronized, Vector is.
Being synchronized means that every operation is thread safe - if you use the same vector from two threads at the same time, they can't corrupt the state. However, this makes it slower.
If you are working in a single threaded environment (or the list is limited to a thread and never shared), use ArrayList. If you are working with multiple threads that share the same collection, either use Vector, or use ArrayList but synchronize in some other way (e.g., manually or via a wrapper).
ArrayList is not synchronized via http://java.sun.com/javase/6/docs/api/java/util/ArrayList.html
ArrayList is not synchronized out of the box.
Resizable-array implementation of the
List interface. Implements all
optional list operations, and permits
all elements, including null. In
addition to implementing the List
interface, this class provides methods
to manipulate the size of the array
that is used internally to store the
list. (This class is roughly
equivalent to Vector, except that it
is unsynchronized.)
This avoids some performance issues in situations where you know that you won't need thread safety (e.g., entirely encapsulated private data). However both ArrayList and Vector have issues when using iterators over them: when iterating through either type of collection, if data is added or removed, you will throw a ConcurrentModificationException:
Note that this implementation is not
synchronized. If multiple threads
access an ArrayList instance
concurrently, and at least one of the
threads modifies the list
structurally, it must be synchronized
externally. (A structural modification
is any operation that adds or deletes
one or more elements, or explicitly
resizes the backing array; merely
setting the value of an element is not
a structural modification.) This is
typically accomplished by
synchronizing on some object that
naturally encapsulates the list. If no
such object exists, the list should be
"wrapped" using the
Collections.synchronizedList method.
This is best done at creation time, to
prevent accidental unsynchronized
access to the list:
List list =
Collections.synchronizedList(new
ArrayList(...));
The iterators returned by this class's
iterator and listIterator methods are
fail-fast: if the list is structurally
modified at any time after the
iterator is created, in any way except
through the iterator's own remove or
add methods, the iterator will throw a
ConcurrentModificationException. Thus,
in the face of concurrent
modification, the iterator fails
quickly and cleanly, rather than
risking arbitrary, non-deterministic
behavior at an undetermined time in
the future.
Note that the fail-fast behavior of an
iterator cannot be guaranteed as it
is, generally speaking, impossible to
make any hard guarantees in the
presence of unsynchronized concurrent
modification. Fail-fast iterators
throw ConcurrentModificationException
on a best-effort basis. Therefore, it
would be wrong to write a program that
depended on this exception for its
correctness: the fail-fast behavior of
iterators should be used only to
detect bugs.
ArrayList comes in a variety of useful flavors, however, while Vector does not. My personal favorite is the CopyOnWriteArrayList:
A thread-safe variant of ArrayList in which all mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array.
This is ordinarily too costly, but may be more efficient than alternatives when traversal operations vastly outnumber mutations, and is useful when you cannot or don't want to synchronize traversals, yet need to preclude interference among concurrent threads. The "snapshot" style iterator method uses a reference to the state of the array at the point that the iterator was created. This array never changes during the lifetime of the iterator, so interference is impossible and the iterator is guaranteed not to throw ConcurrentModificationException. The iterator will not reflect additions, removals, or changes to the list since the iterator was created. Element-changing operations on iterators themselves (remove, set, and add) are not supported. These methods throw UnsupportedOperationException.
CopyOnWriteArrayLists are tremendously useful in GUI work, especially in situations where you are displaying an updating set of data (e.g., moving icons on a screen). If you can tolerate having your displayed list of data be one frame out of date (because your producer thread is slightly behind your graphical update thread), CopyOnWriteArrayLists are the perfect data structure.
What does it mean array list is synchronized in java?
It means it is thread-safe.
Vectors are synchronized. Any method that touches the Vector's contents is thread safe.
ArrayList, on the other hand, is unsynchronized, making them, therefore, not thread safe.
I read ArrayList is synchronized in
java..
But the array list api says that
Note that this implementation is not
synchronized
So you use an ArrayList when you are sure that you wont be dealing with concurrency.
Using Vector might be an overkill, and may result in performance issues.

Categories