Does ConcurrentHashMap need synchronization when incrementing its values? - java

I know ConcurrentHashMap is thread-safe for single operations such as putIfAbsent, replace, etc., but I was wondering: is a block of code like the one below safe?
if (accumulator.containsKey(key)) { // accumulator is a ConcurrentHashMap
    accumulator.put(key, accumulator.get(key) + 1);
} else {
    accumulator.put(key, 0);
}
Keep in mind that the accumulator value for a key may be accessed by two different threads simultaneously, which would cause a problem in a normal HashMap. So do I need something like this?
ConcurrentHashMap<Integer, Object> locks;
...
locks.putIfAbsent(key, new Object());
synchronized (locks.get(key)) {
    if (accumulator.containsKey(key)) {
        accumulator.put(key, accumulator.get(key) + 1);
    } else {
        accumulator.put(key, 0);
    }
}

if (accumulator.containsKey(key)) { // accumulator is a ConcurrentHashMap
    accumulator.put(key, accumulator.get(key) + 1);
} else {
    accumulator.put(key, 0);
}
No, this code is not thread-safe: accumulator.get(key) can change between the get and the put, or the entry can be added between the containsKey and the put. If you're on Java 8, you can write accumulator.compute(key, (k, v) -> (v == null) ? 0 : v + 1), or any of the many equivalents, and it'll work. If you're not, the thing to do is write something like
while (true) {
    Integer old = accumulator.get(key);
    if (old == null) {
        if (accumulator.putIfAbsent(key, 0) == null) {
            // note: it's a little surprising that you want to put 0 in this case;
            // are you sure you don't mean 1?
            break;
        }
    } else if (accumulator.replace(key, old, old + 1)) {
        break;
    }
}
...which loops until it manages to make the atomic swap. This sort of loop is pretty much how you have to do it: it's how AtomicInteger works, and what you're asking for is AtomicInteger across many keys.
Alternately, you can use a library: e.g. Guava has AtomicLongMap and ConcurrentHashMultiset, which also do things like this.
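For reference, spelled out as a method, the Java 8 version might look something like this. This is only a sketch; it keeps the question's semantics of storing 0 on the first occurrence, and shows merge() as the variant that counts from 1.
// A minimal sketch, assuming Java 8+ (ConcurrentMap/ConcurrentHashMap from java.util.concurrent);
// 'accumulator' and 'key' follow the names used in the question.
ConcurrentMap<Integer, Integer> accumulator = new ConcurrentHashMap<>();

void count(Integer key) {
    // Same semantics as the question: first occurrence stores 0, later occurrences add 1.
    accumulator.compute(key, (k, v) -> (v == null) ? 0 : v + 1);

    // If you actually want an occurrence count starting at 1, merge() is the usual idiom:
    // accumulator.merge(key, 1, Integer::sum);
}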

I think the best solution for you would be to use an AtomicInteger as the map value. The nice feature here is that it is mutable, thread-safe and non-blocking. You can use the replace method offered by the CHM, but with that the segment/bucket entry has to stay locked until the replace completes.
With an AtomicInteger you get quick non-blocking updates.
ConcurrentMap<Key, AtomicInteger> map;
then
map.get(key).incrementAndGet();
If you are using Java 8, LongAdder would be better.
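One caveat: map.get(key).incrementAndGet() throws a NullPointerException if the key has never been inserted. On Java 8+ you can combine this answer with computeIfAbsent so the counter is created atomically on first use; a minimal sketch (the map names are placeholders):
// Assumes java.util.concurrent.* and java.util.concurrent.atomic.* imports.
ConcurrentMap<String, AtomicInteger> counters = new ConcurrentHashMap<>();
ConcurrentMap<String, LongAdder> adders = new ConcurrentHashMap<>();

void increment(String key) {
    // computeIfAbsent is atomic on ConcurrentHashMap, so at most one counter is created per key.
    counters.computeIfAbsent(key, k -> new AtomicInteger()).incrementAndGet();

    // LongAdder variant, usually better under heavy write contention:
    adders.computeIfAbsent(key, k -> new LongAdder()).increment();
}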

You are correct that your first code snippet is unsafe. It's entirely possible for the thread to be preempted right after the check has been performed and for another thread to begin executing. Therefore, in the first snippet, the following could happen:
[Thread 1]: Check for key, return false
[Thread 2]: Check for key, return false
[Thread 2]: Put value 0 in for key
[Thread 1]: Put value 0 in for key
In this example, the end state you would want is the value for that key set to 1, not 0.
Therefore locking is necessary.

Only individual actions on ConcurrentHashMap are thread-safe; taking multiple actions in sequence is not. Your first block of code is not thread-safe. It is possible, for example:
THREAD A: accumulator.containsKey(key) = false
THREAD B: accumulator.containsKey(key) = false
THREAD B: accumulator.put(key, 0)
THREAD A: accumulator.put(key, 0)
Similarly, it is not thread-safe to get the accumulator value for a given key, increment it, and then put it back in the map. This is a three-step process, and it is possible for another thread to interrupt at any point.
Your second synchronized block of code is thread-safe.

Related

Under which circumstances can toSet throw an java.lang.IllegalArgumentException?

Based on our Crashlytics logs it seems that we're running into the following exception from time to time:
Fatal Exception: java.lang.IllegalArgumentException
Illegal initial capacity: -1
...
java.util.HashMap.<init> (HashMap.java:448)
java.util.LinkedHashMap.<init> (LinkedHashMap.java:371)
java.util.HashSet.<init> (HashSet.java:161)
java.util.LinkedHashSet.<init> (LinkedHashSet.java:146)
kotlin.collections.CollectionsKt___CollectionsKt.toSet (CollectionsKt___CollectionsKt.java:1316)
But we're not sure when it is possible that this exception is actually thrown. The relevant code for this statement looks something like this:
private val markersMap = mutableMapOf<Any, Marker>()
...
synchronized(markersMap) {
    val currentMarkers = markersMap.values.toSet() // it crashes here
    // performing some operation on the markers
}
Right now we're suspecting multithreading to be the cause, as the markersMap is modified in multiple places, but since the map is initialized up front we're not really sure how its size can end up below zero. We also took a look at the toSet implementation:
if (this is Collection) {
    return when (size) {
        0 -> emptySet()
        1 -> setOf(if (this is List) this[0] else iterator().next())
        else -> toCollection(LinkedHashSet<T>(mapCapacity(size)))
    }
}
Based on this, we'd assume that mapCapacity(size) returns -1, but we weren't able to find the actual implementation of mapCapacity to verify when this can happen.
Does anybody know when -1 is returned here, which in turn causes the constructor to fail?
Java collections are not synchronized, and if you need to access a Map or any other collection from multiple threads then you are required to take care of synchronization yourself, as stated in LinkedHashMap's Javadoc header:
Note that this implementation is not synchronized. If multiple threads access a linked hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
My guess is that you are performing structural modifications (a mix of put and remove) on the Map without synchronization, which can cause this issue. For example:
fun main() {
    val markersMap = mutableMapOf<Any, Any>()
    (1..1000).forEach { markersMap.put(it, "$it") }
    val t1 = Thread {
        (1..1000).forEach {
            markersMap.remove(it)
            if (markersMap.size < 0) {
                print("SIZE IS ${markersMap.size}")
            }
        }
    }
    val t2 = Thread {
        (1..1000).forEach {
            markersMap.remove(it)
            if (markersMap.size < 0) {
                print("SIZE IS ${markersMap.size}")
            }
        }
    }
    t1.start()
    t2.start()
}
On my machine this code prints SIZE IS -128, SIZE IS -127 and many other negative values, and when I added markersMap.values.toSet() inside one of the if blocks it failed with the same IllegalArgumentException as in the question.
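Whatever language the map lives in, the fix is to make every access path (puts, removes, and any iteration or copying) synchronize on the same monitor, or to switch to a concurrent map. A minimal Java sketch of the first option (Marker stands in for the question's value type; method names are illustrative):
// Every thread must take the same lock for every read and write of the map,
// not only around the values().toSet() call.
final Map<Object, Marker> markersMap = new LinkedHashMap<>();

void addMarker(Object key, Marker marker) {
    synchronized (markersMap) { markersMap.put(key, marker); }
}

void removeMarker(Object key) {
    synchronized (markersMap) { markersMap.remove(key); }
}

Set<Marker> snapshotMarkers() {
    synchronized (markersMap) { return new LinkedHashSet<>(markersMap.values()); }
}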

Java ConcurrentHashMap actions atomicity

This may be a duplicate question, but I've found this part of code in a book about concurrency. This is said to be thread-safe:
ConcurrentHashMap<String, Integer> counts = new ...;

private void countThing(String thing) {
    while (true) {
        Integer currentCount = counts.get(thing);
        if (currentCount == null) {
            if (counts.putIfAbsent(thing, 1) == null)
                break;
        } else if (counts.replace(thing, currentCount, currentCount + 1)) {
            break;
        }
    }
}
From my (concurrency beginner's) point of view, thread t1 and thread t2 could both read currentCount = 1. Then both threads could change the map's value to 2. Can someone please explain to me whether the code is okay or not?
The trick is that replace(K key, V oldValue, V newValue) provides the atomicity for you. From the docs (emphasis mine):
Replaces the entry for a key only if currently mapped to a given value. ... the action is performed atomically.
The key word is "atomically." Within replace, the "check if the old value is what we expect, and only if it is, replace it" happens as a single chunk of work, with no other threads able to interleave with it. It's up to the implementation to do whatever synchronization it needs to make sure that it provides this atomicity.
So, it can't be that both threads see currentCount == 1 from within the replace function. One of them will see it as 1, and thus its invocation of replace will return true. The other will see it as 2 (because of the first call), and thus return false and loop back to try again, this time with the new value of currentCount == 2.
Of course, it could be that a third thread has updated currentCount to 3 in the meanwhile, in which case that second thread will just keep trying until it's lucky enough to not have anyone jump ahead of it.
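To make that concrete, the documentation describes replace(key, oldValue, newValue) as roughly equivalent to the following check-then-put, except that the whole thing happens atomically. This is a sketch of the contract, not of the real implementation; the map, key and value types are placeholders:
// What map.replace(key, oldValue, newValue) promises, minus the atomicity:
boolean replaceIfEqual(ConcurrentMap<String, Integer> map, String key, Integer oldValue, Integer newValue) {
    if (map.containsKey(key) && Objects.equals(map.get(key), oldValue)) {
        map.put(key, newValue);   // the swap happened
        return true;
    }
    return false;                 // someone else got there first; retry with the fresh value
}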
Can someone please explain me if the code is okay or not?
In addition to yshavit's answer, you can avoid writing your own loop by using compute which was added in Java 8.
ConcurrentMap<String, Integer> counts = new ...;

private void countThing(String thing) {
    counts.compute(thing, (k, prev) -> prev == null ? 1 : 1 + prev);
}
With put you can also replace the value.
if (currentCount == null) {
    counts.put(thing, 2);
}

Concurrent hashmap simultaneous insertion

I have read that in ConcurrentHashMap in Java, simultaneous insertions are possible because the map is divided into segments and a separate lock is taken for each segment.
But if two insertions are going to happen on the same segment, then they cannot run simultaneously.
My question is: what will happen in such a case? Will the second insertion wait till the first one gets completed, or what?
In general you don't need to be too concerned with how ConcurrentHashMap is implemented. It simply complies with the contract of ConcurrentMap, which ensures that concurrent modifications are possible.
But to answer your question: yes, one insertion may wait for completion of the other one. Internally, it uses locks which ensure that one thread waits until the other one releases the lock. The class Segment used internally actually inherits from ReentrantLock. Here is a shortened version of Segment.put():
final V put(K key, int hash, V value, boolean onlyIfAbsent) {
    HashEntry<K,V> node = tryLock() ? null : scanAndLockForPut(key, hash, value);
    V oldValue;
    try {
        // modifications
    } finally {
        unlock();
    }
    return oldValue;
}

private HashEntry<K,V> scanAndLockForPut(K key, int hash, V value) {
    // ...
    int retries = -1; // negative while locating node
    while (!tryLock()) {
        if (retries < 0) {
            // ...
        }
        else if (++retries > MAX_SCAN_RETRIES) {
            lock();
            break;
        }
        else if ((retries & 1) == 0 && (f = entryForHash(this, hash)) != first) {
            e = first = f; // re-traverse if entry changed
            retries = -1;
        }
    }
    return node;
}
This could give you an idea.
ConcurrentHashMap does not block when performing retrieval operations, and there is no locking of the entire table for the usual operations.
The heuristic with most Concurrent Data Structures is that there's a backing data structure that gets modified first, with a front-facing data structure that's visible to outside methods. Then, when the modification is complete, the backing data structure is made the public data structure and the public data structure is pushed to the back. There's way more to it than that, but that's the typical contract.
If 2 updates try to happen on the same segment they will go into contention with each other and one of them will have to wait. You can optimise this by choosing a concurrencyLevel value which takes into account the number of threads which will be concurrently updating the hashmap.
You can find all the details in the javadoc for the class
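As a side note, the concurrencyLevel mentioned above is the third argument of one of the constructors; a small sketch (on Java 8+ it is only a sizing hint, no longer a fixed segment count):
// initialCapacity = 16, loadFactor = 0.75f, concurrencyLevel = 32
// (an estimate of how many threads will update the map concurrently)
ConcurrentMap<String, String> map = new ConcurrentHashMap<>(16, 0.75f, 32);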
ConcurrentHashMap contains an array of Segments, each of which holds an array of HashEntry objects. Each HashEntry holds a key, a value, and a pointer to its next adjacent entry.
But it acquires the lock at segment level. Hence you are correct, i.e. the second insertion waits till the first one gets completed.
Take a look at the javadoc for ConcurrentMap. It describes the extra methods available to deal with concurrent map mutations.

ArrayList vs Vector - does this illustrate difference in synchronization?

I'm trying to understand the difference in behaviour of an ArrayList and a Vector. Does the following snippet in any way illustrate the difference in synchronization? The output for the ArrayList (f1) is unpredictable while the output for the Vector (f2) is predictable. I think it may just be luck that f2 has predictable output, because modifying f2 slightly to get the thread to sleep for even a millisecond (f3) causes an empty vector! What's causing that?
public class D implements Runnable {
    ArrayList<Integer> al;
    Vector<Integer> vl;

    public D(ArrayList al_, Vector vl_) {
        al = al_;
        vl = vl_;
    }

    public void run() {
        if (al.size() < 20)
            f1();
        else
            f2();
    } // 1

    public void f1() {
        if (al.size() == 0)
            al.add(0);
        else
            al.add(al.get(al.size() - 1) + 1);
    }

    public void f2() {
        if (vl.size() == 0)
            vl.add(0);
        else
            vl.add(vl.get(vl.size() - 1) + 1);
    }

    public void f3() {
        if (vl.size() == 0) {
            try {
                Thread.sleep(1);
                vl.add(0);
            } catch (InterruptedException e) {
                System.out.println(e.getMessage());
            }
        } else {
            vl.add(vl.get(vl.size() - 1) + 1);
        }
    }

    public static void main(String... args) {
        Vector<Integer> vl = new Vector<Integer>(20);
        ArrayList<Integer> al = new ArrayList<Integer>(20);
        for (int i = 1; i < 40; i++) {
            new Thread(new D(al, vl), Integer.toString(i)).start();
        }
    }
}
To answer the question: yes, Vector is synchronized. This means that concurrent actions on the data structure itself won't lead to unexpected behavior (e.g. NullPointerExceptions or the like). Hence calls like size() are perfectly safe with a Vector in concurrent situations, but not with an ArrayList (note that if there are only read accesses, ArrayLists are safe too; we get into problems as soon as at least one thread writes to the data structure, e.g. via add/remove).
The problem is that this low-level synchronization is basically useless, and your code already demonstrates this.
if (al.size() == 0)
    al.add(0);
else
    al.add(al.get(al.size() - 1) + 1);
What you want here is to add a number to your data structure depending on the current size (i.e. if N threads execute this, in the end we'd want the list to contain the numbers [0..N)). Sadly that does not work:
Assume that 2 threads execute this code sample concurrently on an empty list/vector. The following timeline is quite possible:
T1: size() # go to true branch of if
T2: size() # alas we again take the true branch.
T1: add(0)
T2: add(0) # ouch
Both execute size() and get back the value 0. They then both go into the true branch of the if and both add 0 to the data structure. That's not what you want.
Hence you'll have to synchronize in your business logic anyway to make sure that size() and add() are executed atomically, which makes Vector's built-in synchronization quite useless in almost any scenario. (Contrary to some claims, on modern JVMs the performance hit of an uncontended lock is completely negligible; and the Collections API is much nicer, so why not use it.)
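Concretely, the check-then-act from f1/f2 has to sit inside a single synchronized block on the list itself; a sketch (the same applies to the Vector):
// One lock around the whole "read size, then add" sequence; Vector's per-method
// synchronization cannot make these two calls atomic for you.
synchronized (al) {
    if (al.isEmpty()) {
        al.add(0);
    } else {
        al.add(al.get(al.size() - 1) + 1);
    }
}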
In The Beginning (Java 1.0) there was the "synchronized vector".
Which entailed a potentially HUGE performance hit.
Hence the addition of "ArrayList" and friends in Java 1.2 onwards.
Your code illustrates the rationale for making vectors synchronized in the first place. But it's simply unnecessary most of the time, and better done in other ways most of the rest of the time.
IMHO...
PS:
An interesting link:
http://www.coderanch.com/t/523384/java/java/ArrayList-Vector-size-incrementation
Vectors are thread-safe. ArrayLists are not. That is why ArrayList is faster than Vector.
The link below has good info about this.
http://www.javaworld.com/javaworld/javaqa/2001-06/03-qa-0622-vector.html
I'm trying to understand the difference in behaviour of an ArrayList and a Vector
Vector is synchronized while ArrayList is not. ArrayList is not thread-safe.
Does the following snippet in any way illustrate the difference in synchronization?
No difference, since only Vector is synchronized.

Java synchronized block using method call to get synch object

We are writing some locking code and have run into a peculiar question. We use a ConcurrentHashMap for fetching instances of Object that we lock on. So our synchronized blocks look like this
synchronized(locks.get(key)) { ... }
We have overridden the get method of ConcurrentHashMap to make it always return a new object if it did not contain one for the key.
@Override
public Object get(Object key) {
    Object o = super.get(key);
    if (null == o) {
        Object no = new Object();
        o = putIfAbsent((K) key, no);
        if (null == o) {
            o = no;
        }
    }
    return o;
}
But is there a state in which the get method has returned the object, but the thread has not yet entered the synchronized block, allowing other threads to get the same object and lock on it?
We have a potential race condition where:
thread 1: gets the object with key A, but does not enter the synchronized block
thread 2: gets the object with key A, enters a synchronized block
thread 2: removes the object from the map, exits synchronized block
thread 1: enters the synchronized block with the object that is no longer in the map
thread 3: gets a new object for key A (not the same object as thread 1 got)
thread 3: enters a synchronized block, while thread 1 also is in its synchronized block both using key A
This situation would not be possible if Java entered the synchronized block directly after the call to get has returned. If not, does anyone have any input on how we could remove keys without having to worry about this race condition?
As I see it, the problem originates from the fact that you lock on map values, while in fact you need to lock on the key (or some derivation of it). If I understand correctly, you want to prevent 2 threads from running the critical section using the same key.
Is it possible for you to lock on the keys? Can you guarantee that you always use the same instance of the key?
A nice alternative:
Don't delete the locks at all. Use a ReferenceMap with weak values. This way, a map entry is removed only if it is not currently in use by any thread.
Note:
1) Now you will have to synchronize this map (using Collections.synchronizedMap(..)).
2) You also need to synchronize the code that generates/returns a value for a given key.
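If Guava is on the classpath, one way to get such a weak-valued map is MapMaker (shown here as an alternative to the ReferenceMap the answer names; treat it as an illustrative sketch). It already returns a ConcurrentMap, so point 1 does not apply to it:
// com.google.common.collect.MapMaker: an entry can be collected once no thread
// still holds a strong reference to its lock object.
ConcurrentMap<Object, Object> locks = new MapMaker().weakValues().makeMap();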
You have 2 options:
a. You could check the map again once inside the synchronized block.
Object o = map.get(k);
synchronized (o) {
    if (map.get(k) != o) {
        // object removed, handle...
    }
}
b. You could extend your values to contain a flag indicating their status. When a value is removed from the map, you set a flag indicating that it was removed (within the sync block).
CacheValue v = map.get(k);
synchronized (v) {
    if (v.isRemoved()) {
        // object removed, handle...
    }
}
The code, as is, is thread-safe. That being said, if you are removing from the CHM then any assumptions made when synchronizing on an object returned from the collection will be lost.
But is there a state in which the get method has returned the object, but the thread has not yet entered the synchronized block, allowing other threads to get the same object and lock on it?
Yes, but that happens any time you synchronize on an Object. What is guaranteed is that the other thread will not enter the synchronized block until the first one exits.
If not, does anyone have any input on how we could remove keys without having to worry about this race condition?
The only real way of ensuring this atomicity is to either synchronize on the CHM or another object (shared by all threads). The best way is to not remove from the CHM.
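If moving to Java 8+ is an option, the "don't remove" approach also removes the need for the overridden get(): computeIfAbsent creates the lock object atomically on first use. A minimal sketch (method and parameter names are illustrative):
ConcurrentMap<Object, Object> locks = new ConcurrentHashMap<>();

void doWithLock(Object key, Runnable action) {
    // Creates the lock object at most once per key; entries are simply never removed.
    Object lock = locks.computeIfAbsent(key, k -> new Object());
    synchronized (lock) {
        action.run();
    }
}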
Thanks for all the great suggestions and ideas, really appreciate it! Eventually this discussion made me come up with a solution that does not use objects for locking.
Just a brief description of what we're actually doing.
We have a cache that receives data continuously from our environment. The cache has several 'buckets' for each key and aggregates events into the buckets as they come in. The events coming in have a key that determines the cache entry to be used, and a timestamp determining the bucket in the cache entry that should be incremented.
The cache also has an internal flush task that runs periodically. It will iterate all cache entries and flushes all buckets but the current one to database.
Now the timestamps of the incoming data can be for any time in the past, but the majority of them are for very recent timestamps. So the current bucket will get more hits than buckets for previous time intervals.
Knowing this, I can demonstrate the race condition we had. All this code is for one single cache entry, since the issue was isolated to concurrent writing and flushing of single cache elements.
// buckets :: ConcurrentMap<Long, AtomicLong>

void incrementBucket(long timestamp, long value) {
    long key = bucketKey(timestamp, LOG_BUCKET_INTERVAL);
    AtomicLong bucket = buckets.get(key);
    if (null == bucket) {
        AtomicLong newBucket = new AtomicLong(0);
        bucket = buckets.putIfAbsent(key, newBucket);
        if (null == bucket) {
            bucket = newBucket;
        }
    }
    bucket.addAndGet(value);
}

Map<Long, Long> flush() {
    long now = System.currentTimeMillis();
    long nowKey = bucketKey(now, LOG_BUCKET_INTERVAL);
    Map<Long, Long> flushedValues = new HashMap<Long, Long>();
    for (Long key : new TreeSet<Long>(buckets.keySet())) {
        if (key != nowKey) {
            AtomicLong bucket = buckets.remove(key);
            if (null != bucket) {
                long databaseKey = databaseKey(key);
                long n = bucket.get();
                if (!flushedValues.containsKey(databaseKey)) {
                    flushedValues.put(databaseKey, n);
                } else {
                    long sum = flushedValues.get(databaseKey) + n;
                    flushedValues.put(databaseKey, sum);
                }
            }
        }
    }
    return flushedValues;
}
What could happen was: (fl = flush thread, it = increment thread)
it: enters incrementBucket, executes until just before the call to addAndGet(value)
fl: enters flush and iterates the buckets
fl: reaches the bucket that is being incremented
fl: removes it and calls bucket.get() and stores the value to the flushed values
it: increments the bucket (which will be lost now, because the bucket has been flushed and removed)
The solution:
void incrementBucket(long timestamp, long value) {
    long key = bucketKey(timestamp, LOG_BUCKET_INTERVAL);
    boolean done = false;
    while (!done) {
        AtomicLong bucket = buckets.get(key);
        if (null == bucket) {
            AtomicLong newBucket = new AtomicLong(0);
            bucket = buckets.putIfAbsent(key, newBucket);
            if (null == bucket) {
                bucket = newBucket;
            }
        }
        synchronized (bucket) {
            // double check if the bucket still is the same
            if (buckets.get(key) != bucket) {
                continue;
            }
            done = true;
            bucket.addAndGet(value);
        }
    }
}

Map<Long, Long> flush() {
    long now = System.currentTimeMillis();
    long nowKey = bucketKey(now, LOG_BUCKET_INTERVAL);
    Map<Long, Long> flushedValues = new HashMap<Long, Long>();
    for (Long key : new TreeSet<Long>(buckets.keySet())) {
        if (key != nowKey) {
            AtomicLong bucket = buckets.get(key);
            if (null != bucket) {
                synchronized (bucket) {
                    buckets.remove(key);
                    long databaseKey = databaseKey(key);
                    long n = bucket.get();
                    if (!flushedValues.containsKey(databaseKey)) {
                        flushedValues.put(databaseKey, n);
                    } else {
                        long sum = flushedValues.get(databaseKey) + n;
                        flushedValues.put(databaseKey, sum);
                    }
                }
            }
        }
    }
    return flushedValues;
}
I hope this will be useful for others that might run in to the same problem.
The two code snippets you've provided are fine, as they are. What you've done is similar to how lazy instantiation with Guava's MapMaker.makeComputingMap() might work, but I see no problems with the way that the keys are lazily created.
You're right, by the way, that it's entirely possible for a thread to be preempted after the get() lookup of a lock object, but before entering synchronized.
My problem is with the third bullet point in your race condition description. You say:
thread 2: removes the object from the map, exits synchronized block
Which object, and which map? In general, I presumed that you were looking up a key to lock on, and then would be performing some other operations on other data structures, within the synchronized block. If you're talking about removing the lock object from the ConcurrentHashMap mentioned at the start, that's a massive difference.
And the real question is whether this is necessary at all. In a general purpose environment, I don't think there will be any memory issues with just remembering all of the lock objects for all the keys that have ever been looked up (even if those keys no longer represent live objects). It is much harder to come up with some way of safely disposing of an object that may be stored in a local variable of some other thread at any time, and if you do want to go down this route I have a feeling that performance will degrade to that of a single coarse lock around the key lookup.
If I've misunderstood what's going on there then feel free to correct me.
Edit: OK - in which case I stand by my above claim that the easiest way to do this is to not remove the keys; this might not actually be as problematic as you think, since the rate at which the space grows will be very small. By my calculations (which may well be off, I'm not an expert in space calculations and your JVM may vary) the map grows by about 14Kb/hour. You'd have to have a year of continuous uptime before this map used up 100MB of heap space.
But let's assume that the keys really do need to be removed. This poses the problem that you can't remove a key until you know that no threads are using it. This leads to the chicken-and-egg problem that you'll require all threads to synchronize on something else in order to get atomicity (of checking) and visibility across threads, which then means that you can't do much else than slap a single synchronized block around the whole thing, completely subverting your lock striping strategy.
Let's revisit the constraints. The main thing here is that things get cleared up eventually. It's not a correctness constraint but just a memory issue. Hence what we really want to do is identify some point at which the key could definitely no longer be used, and then use this as the trigger to remove it from the map. There are two cases here:
You can identify such a condition, and logically test for it. In which case you can remove the keys from the map with (in the worst case) some kind of timer thread, or hopefully some logic that's more cleanly integrated with your application.
You cannot identify any condition by which you know that a key will no longer be used. In this case, by definition, there is no point at which it's safe to remove the keys from the map. So in fact, for correctness' sake, you must leave them in.
In any case, this effectively boils down to manual garbage collection. Remove the keys from the map when you can lazily determine that they're no longer going to be used. Your current solution is too eager here since (as you point out) it's doing the removal before this situation holds.
