Behavior of entrySet().removeIf in ConcurrentHashMap

I would like to use ConcurrentHashMap to let one thread delete some items from the map periodically and other threads to put and get items from the map at the same time.
I'm using map.entrySet().removeIf(lambda) in the removing thread, and I'm wondering what assumptions I can make about its behavior. I can see that the removeIf method uses an iterator to go through the elements in the map, checks the given condition, and then removes matching elements using iterator.remove().
Documentation gives some info about ConcurrentHashMap iterators behavior:
Similarly, Iterators, Spliterators and Enumerations return elements
reflecting the state of the hash table at some point at or since the
creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.
As the whole removeIf call happens in one thread, I can be sure that the iterator is not used by more than one thread at a time. Still, I'm wondering whether the course of events described below is possible:
Map contains mapping: 'A'->0
Deleting Thread starts executing map.entrySet().removeIf(entry->entry.getValue()==0)
Deleting Thread calls .iterator() inside the removeIf call and gets an iterator reflecting the current state of the collection
Another thread executes map.put('A', 1)
Deleting thread still sees 'A'->0 mapping (iterator reflects the old state) and because 0==0 is true it decides to remove A key from the map.
The map now contains 'A'->1 but deleting thread saw the old value of 0 and the 'A' ->1 entry is removed even though it shouldn't be. The map is empty.
I can imagine that the behavior may be prevented by the implementation in many ways. For example: maybe iterators are not reflecting put/remove operations but are always reflecting value updates or maybe the remove method of the iterator checks if the whole mapping (both key and value) is still present in the map before calling remove on the key. I couldn't find info about any of those things happening and I'm wondering if there's something which makes that use case safe.

I also managed to reproduce such case on my machine.
I think, the problem is that EntrySetView (which is returned by ConcurrentHashMap.entrySet()) inherits its removeIf implementation from Collection, and it looks like:
default boolean removeIf(Predicate<? super E> filter) {
    Objects.requireNonNull(filter);
    boolean removed = false;
    final Iterator<E> each = iterator();
    while (each.hasNext()) {
        // `test` returns `true` for some entry
        if (filter.test(each.next())) {
            // entry has been just changed, `test` would return `false` now
            each.remove(); // ...but we still remove
            removed = true;
        }
    }
    return removed;
}
In my humble opinion, this cannot be considered as a correct implementation for ConcurrentHashMap.

After discussion with user Zielu in comments below Zielu's answer I have gone deeper into the ConcurrentHashMap code and found out that:
ConcurrentHashMap implementation provides remove(key, value) method which calls replaceNode(key, null, value)
replaceNode checks if both key and value are still present in the map before removing so using it should be fine. Documentation says that it
Replaces node value with v, conditional upon match of cv if
* non-null.
In the case mentioned in the question ConcurrentHashMap's .entrySet() is called which returns EntrySetView class. Then removeIf method calls .iterator() which returns EntryIterator.
EntryIterator extends BaseIterator and inherits remove implementation that calls map.replaceNode(p.key, null, null) which disables conditional removal and just always removes the key.
The negative course of events could be still prevented if iterators always iterated over 'current' values and never returned old ones if some value is modified. I still don't know if that happens or not, but the test case mentioned below seems to verify the whole thing.
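The sequence from the question can also be replayed on a single thread by driving the iterator by hand. The following is only a sketch: the class name IteratorRemoveDemo and the method replay are made up for illustration, and the put call simply stands in for the second thread. If the iterator's remove() deletes by key only, as the source reading above suggests, the map should end up empty.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IteratorRemoveDemo {
    // Replays the steps from the question on one thread (illustrative only).
    static Map<String, Integer> replay() {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("A", 0);

        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        Map.Entry<String, Integer> seen = it.next();  // observes A -> 0
        boolean matches = seen.getValue() == 0;       // predicate evaluated on the old value

        map.put("A", 1);                              // stands in for the other thread's put

        if (matches) {
            it.remove();                              // removal is keyed on "A" only
        }
        return map;
    }

    public static void main(String[] args) {
        // Expected to be empty if remove() ignores the current value.
        System.out.println(replay());
    }
}
```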
I think I have created a test case which shows that the behavior described in my question can really happen. Please correct me if there are any mistakes in the code.
The code starts two threads. One of them (DELETING_THREAD) removes all entries mapped to 'false' boolean value. Another one (ADDING_THREAD) randomly puts (1, true) or (1,false) values into the map. If it puts true in the value it expects that the entry will still be there when checked and throws an exception if it is not. It throws an exception quickly when I run it locally.
package test;

import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

public class MainClass {
    private static final Random RANDOM = new Random();
    private static final ConcurrentHashMap<Integer, Boolean> MAP = new ConcurrentHashMap<Integer, Boolean>();
    private static final Integer KEY = 1;

    private static final Thread DELETING_THREAD = new Thread() {
        @Override
        public void run() {
            while (true) {
                MAP.entrySet().removeIf(entry -> entry.getValue() == false);
            }
        }
    };

    private static final Thread ADDING_THREAD = new Thread() {
        @Override
        public void run() {
            while (true) {
                boolean val = RANDOM.nextBoolean();
                MAP.put(KEY, val);
                if (val == true && !MAP.containsKey(KEY)) {
                    throw new RuntimeException("TRUE value was removed");
                }
            }
        }
    };

    public static void main(String[] args) throws InterruptedException {
        DELETING_THREAD.setDaemon(true);
        ADDING_THREAD.start();
        DELETING_THREAD.start();
        ADDING_THREAD.join();
    }
}
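One possible way to avoid the problem, independent of how a particular JDK implements removeIf, is to do the removal through the conditional remove(key, value) method mentioned above, so that an entry is deleted only if it still maps to the value the predicate saw. This is a minimal sketch; ConditionalRemoval and removeIfStillMatching are hypothetical names, not part of the JDK.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

public class ConditionalRemoval {
    // Removes entries whose value matches the filter, but only when the
    // mapping is still (key, seenValue) at the moment of removal.
    static <K, V> boolean removeIfStillMatching(ConcurrentHashMap<K, V> map,
                                                Predicate<? super V> filter) {
        boolean removed = false;
        for (Map.Entry<K, V> e : map.entrySet()) {
            V seen = e.getValue();
            // remove(key, value) is a no-op if another thread changed the value.
            if (filter.test(seen) && map.remove(e.getKey(), seen)) {
                removed = true;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("A", 0);
        map.put("B", 1);
        removeIfStillMatching(map, v -> v == 0);
        System.out.println(map); // prints {B=1}
    }
}
```

With this approach a concurrent put("A", 1) that lands between the test and the removal leaves the new mapping in place, because the value comparison inside remove(key, value) fails.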

Related

LRUCache entry reordering when using get

I checked the official Android documentation for LruCache, which says: Each time a value is accessed, it is moved to the head of a queue. When a value is added to a full cache, the value at the end of that queue is evicted and may become eligible for garbage collection.
I suppose this queue is the doubly linked list maintained by the LinkedHashMap that the cache uses. To check this behavior, I looked at the source code for LruCache and its get(K key) method. It calls the map's get method, which gets the value from the underlying hash map and calls the recordAccess method.
public V get(Object key) {
LinkedHashMapEntry<K,V> e = (LinkedHashMapEntry<K,V>)getEntry(key);
if (e == null)
return null;
e.recordAccess(this);
return e.value;
}
recordAccess method in turn moves the accessed entry to the end of the list in case accessOrder is set to true (for my problem let's assume it is), else it does nothing.
/**
* This method is invoked by the superclass whenever the value
* of a pre-existing entry is read by Map.get or modified by Map.set.
* If the enclosing Map is access-ordered, it moves the entry
* to the end of the list; otherwise, it does nothing.
*/
void recordAccess(HashMap<K,V> m) {
LinkedHashMap<K,V> lm = (LinkedHashMap<K,V>)m;
if (lm.accessOrder) {
lm.modCount++;
remove();
addBefore(lm.header);
}
}
This sounds contradictory to the above statement where it's said that the element is moved to the head of the queue. Instead it's moved to the last element of the list (using head.before). Surely, I'm missing something here, any help?

You are not missing anything; you are just reading the LruCache documentation as if it described LinkedHashMap. LinkedHashMap has its own documentation, in particular about its accessOrder (the Java docs say the same).
[...when accessOrder=true...] order of iteration is the order in which its entries were last accessed, from least-recently accessed to most-recently (access-order)
So LinkedHashMap puts the most recently used entries at the end, and it is documented.
Practically LruCache describes how such cache works in theory, but LinkedHashMap shows how to implement it without adding separate backward-moving iterators: by putting the recent elements at the end, trimming can use the already available (forward-moving) iterator to access (and remove) old elements efficiently.
Though here and now I could not tell what was wrong with removeEldestEntry. Perhaps it did not exist in the past.
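The access-order behavior described above can be checked with a few lines against a plain LinkedHashMap. This is only an illustrative sketch; the class name AccessOrderDemo and the keys are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;

public class AccessOrderDemo {
    // Returns the iteration order after touching the oldest key.
    static List<String> keysAfterAccess() {
        // Third argument true = access order, as in LruCache's constructor.
        LinkedHashMap<String, Integer> map = new LinkedHashMap<>(16, 0.75f, true);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a"); // "a" becomes the most recently accessed entry
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysAfterAccess()); // prints [b, c, a]
    }
}
```

The accessed key moves to the end of the iteration order, so the least recently used entry is the first one an iterator returns.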

From the javadoc of LinkedHashMap:
If the three argument constructor is used, and accessOrder is specified as true, the iteration will be in the order that entries were accessed. The access order is affected by put, get, and putAll operations, but not by operations on the collection views.
This is exactly the case in LruCache:
public LruCache(int maxSize) {
if (maxSize <= 0) {
throw new IllegalArgumentException("maxSize <= 0");
}
this.maxSize = maxSize;
this.map = new LinkedHashMap<K, V>(0, 0.75f, true);
}
Let's see what recordAccess() does:
void recordAccess(HashMap<K,V> m) {
LinkedHashMap<K,V> lm = (LinkedHashMap<K,V>)m;
if (lm.accessOrder) { // true, because `LruCache` instantiated this
// map with `accessOrder = true`
lm.modCount++;
remove(); // remove this `LinkedHashMapEntry` from the map
addBefore(lm.header); // adds this entry before the current header of
// the map, thus this entry becomes the header
}
}
Instead it's moved to the last element of the list (using head.before).
I cannot see how your statement is valid.

Is it safe to use hashmap value reference when it may be updated in another thread

Is it safe to use getParameter while another thread may be calling update?
I can tolerate a value that is not the latest,
as long as the next call gets the latest value of Parameter.
The code looks like this:
public class ParameterManager {
private volatile Map<String, Parameter> scenarioParameterMap = Maps.newHashMap();
public ParameterManager(String appName) throws DarwinClientException {
}
public Parameter getParameter(String scenario) {
return scenarioParameterMap.get(scenario);
}
public void update(String scenario, Map<String, String> parameters) {
if (scenarioParameterMap.containsKey(scenario)) {
Parameter parameter = scenarioParameterMap.get(scenario);
parameter.update(parameters);
} else {
scenarioParameterMap.put(scenario, new Parameter(scenario, parameters));
}
}
}
Or the update could simply use:
scenarioParameterMap.put(scenario, new Parameter(scenario, parameters));

volatile does not help here at all. It only protects the reference held in scenarioParameterMap, not the contents of that map. Since you're not reassigning it to point to a different map at any point, volatile is extraneous.
This code is not threadsafe. You need to use proper synchronization, be that via synchronized, or using a concurrent map, or other equivalent method.
Since I can tolerate the value is not latest.
Thread non-safety can be more dangerous than that. It could give you wrong results. It could crash. You can't get by thinking that the worst case is stale data. That's not the case.
Imagine that Map.put() is in the middle of updating the map and has the internal data in some temporarily invalid state. If Map.get() runs at the same time who knows what might go wrong. Sometimes adding an entry to a hash map will cause the whole thing to be reallocated and re-bucketed. Another thread reading the map at that time would be very confused.
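As one possible shape for the "concurrent map" option, here is a minimal sketch that swaps the HashMap for a ConcurrentHashMap and replaces the whole value on update. The nested Parameter class is a hypothetical immutable stand-in for the question's class, whose real definition is not shown, and SafeParameterManager is an invented name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeParameterManager {
    // Hypothetical immutable stand-in for the question's Parameter class.
    static final class Parameter {
        final String scenario;
        final Map<String, String> values;

        Parameter(String scenario, Map<String, String> values) {
            this.scenario = scenario;
            // Defensive copy so later changes by the caller are not visible.
            this.values = Collections.unmodifiableMap(new HashMap<>(values));
        }
    }

    private final ConcurrentHashMap<String, Parameter> scenarioParameterMap =
            new ConcurrentHashMap<>();

    public Parameter getParameter(String scenario) {
        return scenarioParameterMap.get(scenario);
    }

    public void update(String scenario, Map<String, String> parameters) {
        // Replace the whole value; readers see either the old or the new object.
        scenarioParameterMap.put(scenario, new Parameter(scenario, parameters));
    }

    public static void main(String[] args) {
        SafeParameterManager manager = new SafeParameterManager();
        manager.update("demo", Collections.singletonMap("k", "v"));
        System.out.println(manager.getParameter("demo").values); // prints {k=v}
    }
}
```

Because each Parameter is immutable in this sketch, a reader that still holds an old reference only ever sees stale data, never a half-updated object.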

LinkedHashMap LRU Cache - Determine what values are removed?

Background Information
You can make a LRU cache with a LinkedHashMap as shown at this link. Basically, you just:
Extend linked hash map.
Provide a capacity parameter.
Initialize the super class (LinkedHashMap) with parameters to tell it its capacity, scaling factor (which should never be used), and to keep items in insertion/reference order.
Override removeEldestEntry to remove the oldest entry when the capacity is breached.
My Question
This is a pretty standard LRU cache implementation. But one thing that I can't figure out how to do is how to be notified when the LinkedHashMap removes an entry due to it not being used recently enough.
I know I can make removeEldestEntry provide some form of notification... but is there any way to retrieve the element that is removed from the cache right when a new one is inserted (put) into the underlying map? Alternatively, is there a way to query for the last item that was removed from the cache?

You can get it to work with some creative use of thread local storage:
class LRUCacheLHM<K,V> extends LinkedHashMap<K,V> {
    private int capacity;
    public LRUCacheLHM(int capacity) {
        //1 extra element as add happens before remove (101), and load factor big
        //enough to avoid triggering resize. True = keep in access order.
        super(capacity + 1, 1.1f, true);
        this.capacity = capacity;
    }
    private ThreadLocal<Map.Entry<K,V>> removed = new ThreadLocal<Map.Entry<K,V>>();
    // withInitial gives every thread a default of false; an instance initializer
    // calling report.set(false) would only cover the constructing thread.
    private ThreadLocal<Boolean> report = ThreadLocal.withInitial(() -> false);
    @Override
    public boolean removeEldestEntry(Map.Entry<K,V> eldest) {
        boolean res = size() > capacity;
        if (res && report.get()) {
            removed.set(eldest);
        }
        return res;
    }
    public Map.Entry<K,V> place(K k, V v) {
        report.set(true);
        put(k, v);
        try {
            return removed.get();
        } finally {
            removed.set(null);
            report.set(false);
        }
    }
}
The idea behind the place(K,V) method is to signal to removeEldestEntry that we would like to get the eldest entry by setting a thread-local report flag to true. When removeEldestEntry sees this flag and knows that an entry is being removed, it places the eldest entry in the removed variable, which is thread-local as well.
The call to removeEldestEntry happens inside the call to the put method. After that the eldest entry is either null, or is sitting inside the removed variable ready to be harvested.
Calling set(null) on removed is important to avoid lingering memory leaks.

is there any way to retrieve the element that is removed from the cache right when a new one is inserted (put) into the underlying map?
The removeEldestEntry is notified of the entry to be removed. You can add a listener which this method calls if you want to make it dynamically configurable.
From the Javadoc
protected boolean removeEldestEntry(Map.Entry eldest)
eldest - The least recently inserted entry in the map, or if this is an access-ordered map, the least recently accessed entry. This is the entry that will be removed it this method returns true. If the map was empty prior to the put or putAll invocation resulting in this invocation, this will be the entry that was just inserted; in other words, if the map contains a single entry, the eldest entry is also the newest.
is there a way to query for the last item that was removed from the cache?
The last item removed has been removed, however you could have the sub-class store this entry in a field you can retrieve later.
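The listener idea can be sketched roughly as follows. The class name NotifyingLruCache and the onEvict callback are invented for this example; the only LinkedHashMap API it relies on is the three-argument constructor and removeEldestEntry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

public class NotifyingLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;
    private final BiConsumer<K, V> onEvict; // hypothetical eviction callback

    public NotifyingLruCache(int capacity, BiConsumer<K, V> onEvict) {
        super(16, 0.75f, true); // true = access order
        this.capacity = capacity;
        this.onEvict = onEvict;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        boolean evict = size() > capacity;
        if (evict) {
            // Report the entry just before LinkedHashMap removes it.
            onEvict.accept(eldest.getKey(), eldest.getValue());
        }
        return evict;
    }

    public static void main(String[] args) {
        List<String> evicted = new ArrayList<>();
        NotifyingLruCache<String, Integer> cache =
                new NotifyingLruCache<>(2, (k, v) -> evicted.add(k));
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "b" is now the least recently used
        cache.put("c", 3); // exceeds capacity, evicts "b"
        System.out.println(evicted);        // prints [b]
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Storing the last evicted entry in a field instead of calling a listener would work the same way; only the body of the if block changes.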

What is the use of LinkedHashMap.removeEldestEntry?

I am aware the answer to this question is easily available on the internet. I need to know what happens if I choose not to removeEldestEntry. Below is my code:
package collection;
import java.util.*;
public class MyLinkedHashMap {
private static final int MAX_ENTRIES = 2;
public static void main(String[] args) {
LinkedHashMap lhm = new LinkedHashMap(MAX_ENTRIES, 0.75F, false) {
protected boolean removeEldestEntry(Map.Entry eldest) {
return false;
}
};
lhm.put(0, "H");
lhm.put(1, "E");
lhm.put(2, "L");
lhm.put(3, "L");
lhm.put(4, "O");
System.out.println("" + lhm);
}
}
Even though I am not allowing the removeEldestEntry my code works fine.
So, internally what is happening?

removeEldestEntry always gets checked after an element was inserted. For example, if you override the method to always return true, the LinkedHashMap will always be empty, since after every put or putAll insertion, the eldest element will be removed, no matter what. The JavaDoc shows a very sensible example on how to use it:
protected boolean removeEldestEntry(Map.Entry eldest){
return size() > MAX_SIZE;
}
In an alternative way, you might only want to remove an entry if it is unimportant:
protected boolean removeEldestEntry(Map.Entry eldest){
    if(size() > MAX_ENTRIES){
        if(isImportant(eldest)){
            //Handle an important entry here, like reinserting it to the back of the list
            this.remove(eldest.getKey());
            this.put(eldest.getKey(), eldest.getValue());
            //removeEldestEntry will be called again, now with the next entry
            //so the size should not exceed the MAX_ENTRIES value
            //WARNING: If every element is important, this will loop indefinitely!
        } else {
            return true; //Element is unimportant
        }
    }
    return false; //Size not reached or eldest element was already handled otherwise
}

Why can't people just answer the OP's simple question!
If removeEldestEntry returns false then no items will ever be removed from the map and it will essentially behave like a normal Map.
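To see the difference side by side, here is a small sketch based on the question's code. The class name EldestEntryDemo and the bounded flag are additions for illustration: with the flag off the override always returns false, as in the question; with it on, the map is capped at MAX_ENTRIES.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EldestEntryDemo {
    private static final int MAX_ENTRIES = 2;

    // Fills a map with the question's five entries; 'bounded' toggles eviction.
    static Map<Integer, String> fill(final boolean bounded) {
        Map<Integer, String> lhm =
                new LinkedHashMap<Integer, String>(MAX_ENTRIES, 0.75F, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                // Returning false (the default) never evicts anything.
                return bounded && size() > MAX_ENTRIES;
            }
        };
        lhm.put(0, "H");
        lhm.put(1, "E");
        lhm.put(2, "L");
        lhm.put(3, "L");
        lhm.put(4, "O");
        return lhm;
    }

    public static void main(String[] args) {
        System.out.println(fill(false)); // prints {0=H, 1=E, 2=L, 3=L, 4=O}
        System.out.println(fill(true));  // prints {3=L, 4=O}
    }
}
```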

Expanding on the answer by DavidNewcomb:
I'm assuming that you are learning how to implement a cache.
The method LinkedHashMap.removeEldestEntry is a method very commonly used in cache data structures, where the size of the cache is limited to a certain threshold. In such cases, the removeEldestEntry method can be set to automatically remove the oldest entry when the size exceeds the threshold (defined by the MAX_ENTRIES attribute) - as in the example provided here.
On the other hand, when you override the removeEldestEntry method this way, you are ensuring that nothing ever happens when the MAX_ENTRIES threshold is exceeded. In other words, the data structure would not behave like a cache, but rather a normal map.

Your removeEldestEntry method is identical to the default implementation of LinkedHashMap.removeEldestEntry, so your LinkedHashMap will simply behave like a normal LinkedHashMap with no overridden methods, retaining whatever keys and values you put into it unless and until you explicitly remove them by calling remove, removeAll, clear, etc. The advantage of using LinkedHashMap is that the collection views (keySet(), values(), entrySet()) always return Iterators that traverse the keys and/or values in the order they were added to the Map.

Updating a PriorityQueue when iterating it

I need to update some fixed-priority elements in a PriorityQueue based on their ID. I think it's quite a common scenario, here's an example snippet (Android 2.2):
for (Entry e : mEntries) {
if (e.getId().equals(someId)) {
e.setData(newData);
}
}
I've then made Entry "immutable" (no setter methods) so that a new Entry instance is created and returned by setData(). I modified my method into this:
for (Entry e : mEntries) {
if (e.getId().equals(someId)) {
Entry newEntry = e.setData(newData);
mEntries.remove(e);
mEntries.add(newEntry);
}
}
The code seems to work fine, but someone pointed out that modifying a queue while iterating over it is a bad idea: it may throw a ConcurrentModificationException and I'd need to add the elements I want to remove to an ArrayList and remove it later. He didn't explain why, and it looks quite an overhead to me, but I couldn't find any specific explanation on internet.
(This post is similar, but there priorities can change, which is not my case)
Can anyone clarify what's wrong with my code, how should I change it and - most of all - why?
Thanks,
Rippel
PS: Some implementation details...
PriorityQueue<Entry> mEntries = new PriorityQueue<Entry>(1, new Entry.EntryComparator());
with:
public static class EntryComparator implements Comparator<Entry> {
public int compare(Entry my, Entry their) {
if (my.mPriority < their.mPriority) {
return 1;
}
else if (my.mPriority > their.mPriority) {
return -1;
}
return 0;
}
}

This code is in the Java 6 implementation of PriorityQueue:
private class Itr implements Iterator<E> {
/**
* The modCount value that the iterator believes that the backing
* Queue should have. If this expectation is violated, the iterator
* has detected concurrent modification.
*/
private int expectedModCount = modCount;
public E next() {
if(expectedModCount != modCount) {
throw new ConcurrentModificationException();
}
}
}
Now, why is this code here? If you look at the Javadoc for ConcurrentModificationException you will find that the behaviour of an iterator is undefined if modification occurs to the underlying collection before iteration completes. As such, many of the collections implement this modCount mechanism.
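The modCount check can be triggered on purpose with a few lines. This is just a sketch with an invented class name, QueueCmeDemo, using Integer elements instead of the question's Entry type; it adds to the queue inside a for-each loop so that the next call to next() sees a changed modCount.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.PriorityQueue;

public class QueueCmeDemo {
    // Returns true if modifying the queue mid-iteration was detected.
    static boolean throwsOnModification() {
        PriorityQueue<Integer> queue = new PriorityQueue<>(Arrays.asList(1, 2, 3));
        try {
            for (Integer i : queue) {
                // Structural change behind the iterator's back.
                queue.add(i + 10);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsOnModification());
    }
}
```

Note that the check only runs inside next(). If the modification happens while the iterator is on its last element and the size does not grow, hasNext() returns false and the loop ends without an exception, which may be why code like the question's appears to work.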
To fix your code
You need to ensure that you don't modify the queue mid-loop. If your code is single threaded (as it appears to be) then you can simply do as your coworker suggested and copy it into a list for later inclusion. Also, the use of the Iterator.remove() method is documented to prevent ConcurrentModificationExceptions. An example:
List<Entry> toAdd = new ArrayList<Entry>();
Iterator<Entry> it = mEntries.iterator();
while(it.hasNext()) {
    Entry e = it.next();
    if(e.getId().equals(someId)) {
        Entry newEntry = e.setData(newData);
        it.remove();
        toAdd.add(newEntry);
    }
}
mEntries.addAll(toAdd);

The Javadoc for PriorityQueue says explicitly:
"Note that this implementation is not synchronized. Multiple threads should not access a PriorityQueue instance concurrently if any of the threads modifies the list structurally. Instead, use the thread-safe PriorityBlockingQueue class."
This seems to be your case.

What's wrong in your code was already explained: implementing an iterator that can consistently iterate through a collection while it is being modified is a rather hard task. You need to specify how to deal with removed items (will they be seen through the iterator?), added items, modified items... Even if you can do it consistently, it will be a rather complex and inefficient implementation, and mostly not very useful, since the use case "iterate without modifications" is much more common. So the Java architects chose to deny modification while iterating, and most collections in the Java collection API follow this and throw ConcurrentModificationException if such a modification is detected.
As for your code: in my opinion, you just should not make the items immutable. Immutability is a great thing, but it should not be overused. If the Entry object you use here is some kind of domain object, and you really want it to be immutable, you can just create some kind of temporary data holder (MutableEntry) object, use it inside your algorithm, and copy the data to an Entry before returning. From my point of view that would be the best solution.

A slightly better implementation is:
List<Entry> toAdd = new ArrayList<Entry>();
for (Iterator<Entry> it = mEntries.iterator(); it.hasNext();) {
    Entry e = it.next();
    if (e.getId().equals(someId)) {
        Entry newEntry = e.setData(newData);
        it.remove();
        toAdd.add(newEntry);
    }
}
mEntries.addAll(toAdd);
this uses the remove of the iterator and a bulk add afterwards
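For completeness, the same remove-then-bulk-add approach as a self-contained program. The nested Entry class here is a hypothetical immutable stand-in for the question's class (withData plays the role of the question's setData), and QueueUpdateDemo, newQueue and updateData are invented names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class QueueUpdateDemo {
    // Hypothetical immutable stand-in for the question's Entry class.
    static final class Entry {
        final String id;
        final int priority;
        final String data;

        Entry(String id, int priority, String data) {
            this.id = id;
            this.priority = priority;
            this.data = data;
        }

        // Returns a copy with new data; id and priority stay fixed.
        Entry withData(String newData) {
            return new Entry(id, priority, newData);
        }
    }

    static PriorityQueue<Entry> newQueue() {
        // Highest priority first, as in the question's comparator.
        return new PriorityQueue<>(
                Comparator.comparingInt((Entry e) -> e.priority).reversed());
    }

    static void updateData(PriorityQueue<Entry> queue, String id, String newData) {
        List<Entry> replacements = new ArrayList<>();
        for (Iterator<Entry> it = queue.iterator(); it.hasNext();) {
            Entry e = it.next();
            if (e.id.equals(id)) {
                it.remove();                           // safe: goes through the iterator
                replacements.add(e.withData(newData)); // re-added after the loop
            }
        }
        queue.addAll(replacements);
    }

    public static void main(String[] args) {
        PriorityQueue<Entry> queue = newQueue();
        queue.add(new Entry("a", 1, "old"));
        queue.add(new Entry("b", 2, "old"));
        updateData(queue, "a", "new");
        while (!queue.isEmpty()) {
            Entry e = queue.poll();
            System.out.println(e.id + " " + e.priority + " " + e.data);
        }
    }
}
```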
