Adding and removing values from ConcurrentHashMap while iterating over it - java

I have this code:
private ConcurrentMap<String, Integer> myMap = new ConcurrentHashMap<>();

@Scheduled(fixedDelay = 600_000)
public void foo() {
    myMap.values().stream()
         .filter(predicate())
         .forEach(this::remove);
}

public void insert(String str, Integer value) {
    myMap.put(str, value);
}
What would happen if, while iterating over this map, someone puts a new value into it or removes an existing value from it?

The documentation for ConcurrentHashMap gives some details about this behavior. First, let's look at what ConcurrentHashMap.values() does:
Returns a Collection view of the values contained in this map...
The view's iterators and spliterators are weakly consistent.
The view's spliterator reports Spliterator.CONCURRENT and Spliterator.NONNULL.
What's interesting are the terms "weakly consistent" and Spliterator.CONCURRENT, where the former is described as:
Most concurrent Collection implementations (including most Queues) also differ from the usual java.util conventions in that their Iterators and Spliterators provide weakly consistent rather than fast-fail traversal:
they may proceed concurrently with other operations
they will never throw ConcurrentModificationException
they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect any modifications subsequent to construction.
and Spliterator.CONCURRENT is described as:
Characteristic value signifying that the element source may be safely concurrently modified (allowing additions, replacements, and/or removals) by multiple threads without external synchronization. If so, the Spliterator is expected to have a documented policy concerning the impact of modifications during traversal.
Taken together, and consistent with the concurrency model of ConcurrentHashMap, this means the stream pipeline is thread-safe: it is guaranteed to traverse the elements as they existed when the spliterator was created, and it may (but is not guaranteed to) reflect modifications made while it runs.
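In practical terms, the scheduled sweep above is safe as written. A more direct sketch of the same cleanup (the predicate below is a stand-in for whatever predicate() returns) uses the write-through values() view:

private final ConcurrentMap<String, Integer> myMap = new ConcurrentHashMap<>();

@Scheduled(fixedDelay = 600_000)
public void foo() {
    // The values() view writes through to the map, so removeIf() deletes
    // matching entries. Each removal is individually atomic, but the sweep
    // as a whole is weakly consistent, not transactional.
    myMap.values().removeIf(value -> value < 0); // stand-in predicate
}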

Related

Java stream, remove and perform action from ConcurrentLinkedQueue

I am unsure how to do this. I'd like to iterate over the ConcurrentLinkedQueue (all of it), removing each item in turn and running some code on it.
This is what I used to do:
public static class Input {
    public static final ConcurrentLinkedQueue<TreeNode> treeNodes = new ConcurrentLinkedQueue<>();
}

public static class Current {
    public static final ConcurrentHashMap<Integer, TreeNode> treeNodes = new ConcurrentHashMap<>();
}

TreeNode is a simple class:

TreeNode treeNode = Input.treeNodes.poll();
while (treeNode != null) {
    treeNode.init(gl3);
    Current.treeNodes.put(treeNode.getId(), treeNode);
    treeNode = Input.treeNodes.poll();
}
This is how I am trying to do using stream:
Input.treeNodes.stream()
    .forEach(treeNode -> {
        Input.treeNodes.remove(treeNode);
        treeNode.init(gl3);
        Current.treeNodes.put(treeNode.getId(), treeNode);
    });
I am afraid that removing the item inside the forEach action may be error-prone.
So my question is:
Is this safe and/or are there any better ways to do it?
Just as you've assumed, you generally should not modify the backing collection while processing a stream, because you might get a ConcurrentModificationException (just as when modifying a collection inside a for-each loop).
On the other hand, it is not very clear which TreeNode you are trying to remove; in the current case you seemingly wish to remove all elements from the queue, perform some action on each, and put them in a Map.
You may safely achieve your current logic via:
Input.treeNodes.stream()
    .forEach(treeNode -> {
        treeNode.init(gl3);
        Current.treeNodes.put(treeNode.getId(), treeNode);
    });
Input.treeNodes.clear();
This behavior is determined by the Spliterator used to construct the Stream. The documentation of ConcurrentLinkedQueue.spliterator() says:
Returns a Spliterator over the elements in this queue.
The returned spliterator is weakly consistent.
“weakly consistent” implies:
Most concurrent Collection implementations (including most Queues) also differ from the usual java.util conventions in that their Iterators and Spliterators provide weakly consistent rather than fast-fail traversal:
they may proceed concurrently with other operations
they will never throw ConcurrentModificationException
they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect any modifications subsequent to construction.
This implies that removing the encountered elements should not have any impact.
On the other hand, when other threads add or remove elements, the outcome of your Stream operation regarding these elements is unpredictable.
However, you should consider that remove(Object) is not the intended usage pattern for a queue; poll() is.
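If you want to keep a stream-style pipeline while still draining via poll(), a sketch (Java 9+ for takeWhile; assuming a single draining pass) could look like this:

// Each poll() atomically removes the head, so no element is handled twice
// even if other threads drain concurrently; the stream ends at the first null.
Stream.generate(Input.treeNodes::poll)
      .takeWhile(Objects::nonNull)
      .forEach(treeNode -> {
          treeNode.init(gl3);
          Current.treeNodes.put(treeNode.getId(), treeNode);
      });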

Understanding ConcurrentHashMap

As is well known, the ConcurrentHashMap class allows us to use iterators safely. As far as I understood from the sources of the Map, this is achieved by storing the current map state in the iterator itself. Here is the inner class representing the iterator (a subclass of it is created when iterator() is called):
abstract class HashIterator {
    int nextSegmentIndex;
    int nextTableIndex;
    HashEntry<K, V>[] currentTable;
    HashEntry<K, V> nextEntry;
    HashEntry<K, V> lastReturned;
    // Methods and ctor
}
But what if some thread writes something to the Map during construction of the iterator? Do we get a non-deterministic state of the map then?
The thing is, none of the methods of the Map are synchronized. There's a ReentrantLock for the put method, but that's it (as far as I could find). So I don't understand how the iterator can maintain a correct state even if some thread writes to the map during its construction.
The iterator offers a weakly consistent view. It doesn't offer a transactional view of the data. It only guarantees that you will see all the keys/values if the map is not altered; if it is, you may or may not see that change, but you won't get an error.
From the Javadoc of ConcurrentHashMap:
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries. Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.
Now answering the questions.
But what if some thread writes something to the Map during construction of the iterator?
As mentioned, the iterator represents the state at some point in time, so it may not reflect the most recent state.
How can the iterator maintain a correct state even if some thread writes to the map during its construction?
The guarantee is that things will not break if you put/remove during iteration. However, there is no guarantee that one thread will see the changes to the map that another thread performs (without obtaining a new iterator from the map). The iterator is guaranteed to reflect the state of the map at the time of its creation. Further changes may be reflected in the iterator, but they do not have to be.
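A small demonstration of that guarantee (with made-up keys and values):

ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
map.put("a", 1);
map.put("b", 2);

Iterator<String> it = map.keySet().iterator();
map.put("c", 3); // modification after the iterator was created

// Never throws ConcurrentModificationException. "a" and "b" are guaranteed
// to appear; "c" may or may not be seen by this iterator.
while (it.hasNext()) {
    System.out.println(it.next());
}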

Advantages of using a HashSet over HashMap

According to the Javadoc API, a HashSet is just a wrapper around a HashMap. Can there ever be any performance benefit to using a HashSet over a HashMap? Ever? Or is it just to have a different API which suits other cases?
No, there's no "performance" benefit; there's just the benefit that comes from using the right API for the right problem...which is considerable.
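For the curious, the wrapping is quite literal. An abridged sketch of what HashSet does internally (based on the OpenJDK source):

public class HashSet<E> {
    private transient HashMap<E, Object> map = new HashMap<>();
    // Dummy value shared by every key in the backing map.
    private static final Object PRESENT = new Object();

    public boolean add(E e) {
        return map.put(e, PRESENT) == null; // null means e was absent
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    public boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }
}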
Not really, there are no performance benefits; they each serve a different purpose.
About HashSet, from the API documentation:
This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
Note that this implementation is not synchronized. If multiple threads access a hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set:
Set s = Collections.synchronizedSet(new HashSet(...));
The iterators returned by this class's iterator method are fail-fast: if the set is modified at any time after the iterator is created, in any way except through the iterator's own remove method, the Iterator throws a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
Check this reference: HashSet
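A short single-threaded demonstration of that fail-fast behavior:

Set<String> set = new HashSet<>(List.of("a", "b"));
for (String s : set) {
    set.add("c"); // structural modification outside the iterator
}
// -> throws ConcurrentModificationException on the next iteration step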

Looking for an unbounded, queue-based, concurrent implementation of java.util.Set

I'm looking for an implementation of java.util.Set with the following features:
Should be concurrent without synchronized locking; so obviously I don't want to use Collections.synchronizedSet().
Should keep insertion order. So ConcurrentSkipListSet is not preferable, because it uses compareTo() as equals() and requires either an implementation of Comparable or a Comparator. There is also a ConcurrentLinkedHashMap which, unlike LinkedHashMap, doesn't keep insertion order.
Should be unbounded.
Preferably a FIFO linked list, as my operations touch only the first element of the queue.
As far as I could find, the only suitable implementation is CopyOnWriteArraySet, but its documentation states:
Mutative operations (add, set, remove, etc.) are expensive since they usually entail copying the entire underlying array.
In my case, I have lots of insertions at the tail of the queue (set) and lots of deletions (and reads) at the head. So, any recommendations?
The following solution has a race condition on removal. It also behaves somewhat differently from the standard JDK Set implementations.
However, it uses standard JDK classes and is a simple implementation. Only you can decide whether this race condition is acceptable, or whether you're willing to invest the time to find/implement a solution without races.
public class FifoSet<T> {

    private final ConcurrentHashMap<T, T> _map = new ConcurrentHashMap<>();
    private final ConcurrentLinkedQueue<T> _queue = new ConcurrentLinkedQueue<>();

    public void add(T obj) {
        // put() returns the previous value: non-null means obj was already present.
        if (_map.put(obj, obj) != null)
            return;
        _queue.add(obj);
    }

    public T removeFirst() {
        T obj = _queue.remove(); // throws NoSuchElementException if empty
        _map.remove(obj);
        return obj;
    }
}
Some more explanation: the ConcurrentHashMap exists solely as a guard on the ConcurrentLinkedQueue; its put() method acts as an atomic test-and-set here. So you ensure that you don't have anything in the map before adding to the queue, and you don't remove from the map until you remove from the queue.
The race condition on remove is that there's a space of time between removing the item from the queue and removing it from the map. In that space of time, add will fail, because it still thinks the item is in the queue.
This is, in my opinion, a relatively minor race condition, and one that's far less important than the gap in time between removing the item from the queue and actually doing something with that item.
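The same guard can also be written with putIfAbsent(), which makes the atomic test-and-set explicit:

public void add(T obj) {
    // Atomically inserts only when absent; returns null when our insert won.
    if (_map.putIfAbsent(obj, obj) == null) {
        _queue.add(obj);
    }
}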

Does java have a "LinkedConcurrentHashMap" data structure?

I need a data structure that is a LinkedHashMap and is thread-safe.
How can I do that?
You can wrap the map in Collections.synchronizedMap to get a synchronized hash map that maintains insertion order. This is not as efficient as a ConcurrentHashMap (and doesn't implement the extra interface methods of ConcurrentMap), but it does get you the (somewhat) thread-safe behavior.
Even the mighty Google Collections doesn't appear to have solved this particular problem yet. However, there is one project that does try to tackle the problem.
I say somewhat on the synchronization because iteration is still not thread-safe, in the sense that ConcurrentModificationException can still happen.
There are a number of different approaches to this problem. You could use:
Collections.synchronizedMap(new LinkedHashMap());
as the other responses have suggested, but this has several gotchas you'll need to be aware of. Most notably, you will often need to hold the collection's synchronized lock while iterating over it, which in turn prevents other threads from accessing the collection until the iteration completes. (See Java theory and practice: Concurrent collections classes.) For example:
synchronized (map) {
    for (Object key : map.keySet()) {
        // Do work here
    }
}
Using
new ConcurrentHashMap();
is probably a better choice, as you won't need to lock the collection to iterate over it.
Finally, you might want to consider a more functional programming approach. That is, you could treat the map as essentially immutable: instead of adding to an existing Map, you would create a new one that contains the contents of the old map plus the new addition. This sounds pretty bizarre at first, but it is actually the way Scala deals with concurrency and collections.
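A sketch of that copy-on-write style in plain Java (the names here are illustrative):

// Publish an immutable snapshot through an AtomicReference and replace it
// wholesale on every update; readers iterate their snapshot lock-free.
final AtomicReference<Map<String, Integer>> ref =
        new AtomicReference<>(Collections.unmodifiableMap(new LinkedHashMap<String, Integer>()));

void put(String key, Integer value) {
    ref.updateAndGet(old -> {
        Map<String, Integer> copy = new LinkedHashMap<>(old); // keeps insertion order
        copy.put(key, value);
        return Collections.unmodifiableMap(copy);
    });
}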
There is one implementation available under Google code. A quote from their site:
A high performance version of java.util.LinkedHashMap for use as a software cache.
Design
A concurrent linked list runs through a ConcurrentHashMap to provide eviction ordering.
Supports insertion and access ordered eviction policies (FIFO, LRU, and Second Chance).
You can use a ConcurrentSkipListMap, available since Java SE 6. It is order-preserving in that keys are sorted according to their natural ordering; you need to provide a Comparator or make the keys Comparable. To mimic linked-hash-map behavior (iteration order is the order in which entries were added), I implemented my key objects to always compare as greater than a given other object unless it is equal (whatever that means for your object).
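A simpler variant of the same idea, using an insertion sequence number as the sort key instead of a custom comparator, might look like this:

// Monotonically increasing keys make the skip list iterate in insertion order.
final AtomicLong seq = new AtomicLong();
final ConcurrentSkipListMap<Long, String> map = new ConcurrentSkipListMap<>();

map.put(seq.getAndIncrement(), "first");
map.put(seq.getAndIncrement(), "second");
// map.values() now iterates "first", "second", ... in insertion order.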
A wrapped synchronized linked hash map did not suffice because as stated in
http://www.ibm.com/developerworks/java/library/j-jtp07233.html: "The synchronized collections wrappers, synchronizedMap and synchronizedList, are sometimes called conditionally thread-safe -- all individual operations are thread-safe, but sequences of operations where the control flow depends on the results of previous operations may be subject to data races. The first snippet in Listing 1 shows the common put-if-absent idiom -- if an entry does not already exist in the Map, add it. Unfortunately, as written, it is possible for another thread to insert a value with the same key between the time the containsKey() method returns and the time the put() method is called. If you want to ensure exactly-once insertion, you need to wrap the pair of statements with a synchronized block that synchronizes on the Map m."
So the only thing that helps is a ConcurrentSkipListMap, which is 3-5 times slower than a plain ConcurrentHashMap.
Collections.synchronizedMap(new LinkedHashMap())
Since ConcurrentHashMap offers a few important extra methods that are not in the Map interface, simply wrapping a LinkedHashMap with synchronizedMap won't give you the same functionality; in particular, it won't give you anything like the putIfAbsent(), replace(key, oldValue, newValue), and remove(key, oldValue) methods which make ConcurrentHashMap so useful.
Unless there's some Apache library that has implemented what you want, you'll probably have to use a LinkedHashMap and provide suitable synchronized{} blocks of your own.
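For example, a put-if-absent over a wrapped LinkedHashMap has to be locked by hand (a sketch; map is assumed to be the synchronizedMap wrapper from above):

synchronized (map) { // same lock the wrapper uses internally
    if (!map.containsKey(key)) {
        map.put(key, value);
    }
}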
I just tried a synchronized, bounded LRU map based on an insertion-ordered LinkedHashMap, with a read/write lock for synchronization. When you use an iterator, you have to acquire the write lock to avoid ConcurrentModificationException. This is better than Collections.synchronizedMap.
public class LinkedConcurrentHashMap<K, V> {

    private final LinkedHashMap<K, V> linkedHashMap;
    private final int cacheSize;
    private final ReadWriteLock readWriteLock = new ReentrantReadWriteLock();

    public LinkedConcurrentHashMap(LinkedHashMap<K, V> psCacheMap, int size) {
        this.linkedHashMap = psCacheMap;
        this.cacheSize = size;
    }

    public void put(K key, V value) {
        Lock writeLock = readWriteLock.writeLock();
        writeLock.lock();
        try {
            // Evict the oldest (first-inserted) entry when the cache is full.
            if (linkedHashMap.size() >= cacheSize && cacheSize > 0) {
                K oldAgedKey = linkedHashMap.keySet().iterator().next();
                remove(oldAgedKey); // the write lock is reentrant
            }
            linkedHashMap.put(key, value);
        } finally {
            writeLock.unlock();
        }
    }

    public V get(K key) {
        Lock readLock = readWriteLock.readLock();
        readLock.lock();
        try {
            return linkedHashMap.get(key);
        } finally {
            readLock.unlock();
        }
    }

    public boolean containsKey(K key) {
        Lock readLock = readWriteLock.readLock();
        readLock.lock();
        try {
            return linkedHashMap.containsKey(key);
        } finally {
            readLock.unlock();
        }
    }

    public V remove(K key) {
        Lock writeLock = readWriteLock.writeLock();
        writeLock.lock();
        try {
            return linkedHashMap.remove(key);
        } finally {
            writeLock.unlock();
        }
    }

    public ReadWriteLock getLock() {
        return readWriteLock;
    }

    // Callers must hold the write lock (via getLock()) while iterating.
    public Set<Map.Entry<K, V>> entrySet() {
        return linkedHashMap.entrySet();
    }
}
The answer is pretty much no: there's nothing equivalent to ConcurrentHashMap that keeps ordering (like LinkedHashMap does). As other people pointed out, you can wrap your collection using Collections.synchronizedMap(-yourmap-), but this will not give you the same level of fine-grained locking; it will simply block the entire map on every operation.
Your best bet is either to use synchronized around any access to the map (where it matters, of course; you may not care about dirty reads, for example) or to write a wrapper around the map that determines when it should or should not lock.
How about this.
Take your favourite open-source concurrent HashMap implementation. Sadly it can't be Java's ConcurrentHashMap, as it's basically impossible to copy and modify that class due to the huge amount of package-private stuff. (Why do the Java authors always do that?)
Add a ConcurrentLinkedDeque field.
Modify all of the put methods so that if an insertion is successful the Entry is added to the end of the deque. Modify all of the remove methods so that any removed entries are also removed from the deque. Where a put method replaces the existing value, we don't have to do anything to the deque.
Change all iterator/spliterator methods so that they delegate to the deque.
There's no guarantee that the deque and the map have exactly the same contents at all times, but concurrent hash maps don't make those sort of promises anyway.
Removal won't be super fast (it has to scan the deque), but most maps are never (or very rarely) asked to remove entries anyway.
You could also achieve this by extending ConcurrentHashMap, or decorating it (decorator pattern).
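A minimal sketch of that decorator idea (a hypothetical class, with the caveat from above that the queue and the map may briefly disagree):

public class InsertionOrderedConcurrentMap<K, V> {

    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final ConcurrentLinkedQueue<K> order = new ConcurrentLinkedQueue<>();

    public V put(K key, V value) {
        V previous = map.put(key, value);
        if (previous == null) {
            order.add(key); // record first insertion; replacements keep their slot
        }
        return previous;
    }

    public V remove(K key) {
        V removed = map.remove(key);
        if (removed != null) {
            order.remove(key); // O(n) scan, as noted above
        }
        return removed;
    }

    // Weakly consistent view of the keys in (approximate) insertion order.
    public Iterable<K> insertionOrderedKeys() {
        return order;
    }
}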
