I'am trying to clarify HashMap vs ConcurrentHashMap regarding type-safety and also performance. I came across a lot of good articles, but still getting troubles figuring it all out.
Let's take the following example using a ConcurrentHashMap, where I will try to add a value for a key not already there and returning it, the new way of doing it would be:
private final Map<K,V> map = new ConcurrentHashMap<>();
return map.putIfAbsent(k, new Object());
let's assume we don't want to use the putIfAbsent method, the above code should look something like this:
private final Map<K,V> map = new ConcurrentHashMap<>();
synchronized (map) {
V value = map.get(key); //Edit adding the value fetch inside synchronized block
if (!nonNull(value)) {
map.put(key, new Object());
}
}
return map.get(key)
Is the problem with this approach the fact that the whole map is locked whereas in first approach the putIfAbsent method only synchronizes on the bucket on which the hash of the key is, and thus leading to less performance ? Would the second approach work fine with just a HashMap ?
Is the problem with this approach the fact that the whole map is locked
There are two problems with this approach.
It's not intrinsic
The fact that you've acquired the lock on the map reference has zero effect whatsoever, except in regards to any other code that (tries) to acquire this lock. Crucially, ConcurrentHashmap itself does not acquire this lock.
So, if, during that second snippet (with synchronized), some other thread does this:
map.putIfAbsent(key, new Object());
Then it may occur that your map.get(key) call returns null, and nevertheless your followup map.put call ends up overwriting. In other words, that both your thread, and that hypothetical thread running putIfAbsent, both decided to write.
Presumably, if that is just fine in your book, that'd be weird. Why use putIfAbsent and check if map.get returns null in the first place?
Had the other thread done this:
synchronized (map) {
map.putIfAbsent(key, new Object());
}
then there'd be no problem; either your get-check-if-null-then-set code will set and the putIfAbsent call is a noop, or vice versa, but they couldn't possibly both 'decide to write'.
Which leads us to;
This is pointless
There are two different ways to achieve concurrency with maps: Intrinsic and extrinsic. There is zero point in doing both, and they do not interact.
If you have structure whereby all access (both read and write) out of a plain old entirely non-multicore capable java.util.HashMap goes through some shared lock (the hashmap instance itself, or any other lock, long as all threads that interact with that particular map instance use the same one), then that works fine and there is therefore no reason or point to using ConcurrentHashMap instead.
The point of ConcurrentHashMap is to streamline concurrent processes without the use of extrinsic locking: To let the map do the locking.
One of the reasons you want this is that the ConcurrentHashMap impl is significantly faster at the jobs it is capable of doing; these jobs are spelled out explicitly: It's the methods that ConcurrentHashMap has.
Atomicity
The central problem of your code snippet is that it lacks atomicity. Check-then-act is fundamentally broken in concurrent models (in your case: Check: Is key 'k' associated with no value or null?, then Act: Set the mapping of key 'k' to value 'v'). This is broken because what if the thing you checked changes in between? What if you have two threads that both 'check-and-act' and then run simultaneously; then they both check first, then both act first, and broken things ensue: One of the two threads will be acting upon a state that isn't equal to the state as it was when you checked, which means your check's broken.
The right model is act-then-check: Act first, and then check the result of the operation. Of course, this requires redefining, and integrating, the code you wrote explicitly in your snippet, into the very definition of your 'act' phase.
In other words, putIfAbsent is not a convenience method! is a fundamental operation! It's the only way (short of extrinsic locking) to convey the notion of: "Perform the action of associating 'v' with 'k', but only if there is no association yet. I'll check the results of this operation next". There is no way to break that down into if (!map.containsKey(key)) map.put(key, v); because check-then-act does not work in concurrent modelling.
Conclusions
Either get rid of concurrenthashmap, or get rid of synchronized. Having code that uses both is probably broken and even if it isn't, it's error prone, confusing, and I can guarantee you there's a much better way to write it (better in that it is more idiomatic, easier to read, more flexible in the face of future change requests, easier to test, and less likely to have hard-to-test-for bugs in it).
If you can state all operations you need to perform 100% in terms of the methods that CHM has, then do that, because CHM is vastly superior. It even has mechanisms for arbitrary operations: For example, unlike basic hashmaps, you can iterate through a CHM even if other threads are also messing with it, whereas with a normal hashmap you need to hold the lock for the entire duration of the operation, which means any other thread trying to do anything to that hashmap, even just 'ask for its size', need to wait. Hence, for most use cases, CHM results in orders of magnitude better performance.
in first approach the putIfAbsent method only synchronizes on the bucket
That is incorrect, ConcurrentHashMap doesn't synchronize on anything, it uses different mechanics to ensure thread safety.
Would the second approach work fine with just a HashMap ?
Yes, except the second approach is flawed. If using synchronization to make a Map thread-safe, then all access of the Map should use synchronization. As such, it would be best to call Collections.synchronizedMap(map). Performance will be worse than using ConcurrentHashMap.
private final Map<Integer, Object> map = Collections.synchronizedMap(new HashMap<>());
let's assume we don't want to use the putIfAbsent method.
Why? Oh, because it wastes a allocation if the key is already in the map, which is why we should be using computeIfAbsent() instead
map.computeIfAbsent(key, k -> new Object());
Related
I don't know why I can't get my head around this question.
I see examples on the internet where people talk about using striped locking to synchronize a map. The idea is to use multiple locks to allow for more concurrency, while retaining correctness. But is an approach like this actually correct?
I believe that the idea is to have one lock per bucket, the same thing that ConcurrentHashMap does under the hood, but how can one achieve such a feat given the fact the the mapping key -> bucket is an internal Map implementation detail, and is not actually possible to match it from the outside of the Map?
Take this example:
public void concurrentMethod(String key) {
val lock = stripedLock.get(key))
lock.lock()
// do work on map[key]
lock.unlock()
}
there's no guarantee that stripedLocks.get(key) will return the same lock for 2 keys that will end up in the same bucket. So, as far as I can tell, this is not correct and the synchronization doesn't actually work.
Is my reasoning wrong here? Can an approach like this lead to correct synchronization?
I am caching an object, which is created by a thread, into a map. The creation of the object is expensive, so I don't want multiple threads running to create the object because the put() hasn't returned. Once a thread tries to create an object for that key, other threads shouldn't try to create the object, even if put is not yet complete. Will using computeIfAbsent() work to acquire a 'lock' on that particular key? If not, is there another way to achieve this?
> Will using computeIfAbsent() work to acquire a 'lock' on that particular key?
Yes; per the Javadoc for ConcurrentHashMap.computeIfAbsent(...):
The entire method invocation is performed atomically, so the function is applied at most once per key.
That's really the whole point of the method.
However, to be clear, the lock is not completely specific to that one key; rather, ConcurrentHashMap typically works by splitting the map into multiple segments, and having one lock per segment. This allows a great deal of concurrency, and is usually the most efficient approach; but you should be aware that it means that some threads might block on your object creation even if they're not actually touching the same key.
If this is a problem for you, then another approach is to use something like ConcurrentHashMap<K, AtomicReference<V>> to decouple adding the map entry from populating the map entry. (AtomicReference<V> doesn't have a computeIfAbsent method, but at that point you can just use normal double-checked locking with a combination of get() and synchronized.)
Took some research, but we were probably after is the Java ConcurrentHashMap equivalent of .NET's .TryAdd method. Which is Java world is:
putIfAbsent
public V putIfAbsent(K key, V value);
If the specified key is not already associated with a value, associate it with the given value. This is equivalent to:
if (!map.containsKey(key))
return map.put(key, value);
else
return map.get(key);
except that the action is performed atomically.
I knew an atomic add operation had to exist; just was not easy to find. (which is odd because it's like the very first thing anyone would ever need to call).
Hashtable and Collections.synchronizedMap are thread safe but still compound operations like
if (!map_obj.containsKey(key)) {
map_obj.put(key, value);
}
needs external synchronization as:
synchronized(map_obj) {
if (!map_obj.containsKey(key)) {
map_obj.put(key, value);
}
}
Suppose we have ConcurrentHashMap(CHM) instead of Hashtable or HashMap. CHM provides an alternative putIfAbsent() method for the above compound operation, thus removing the need of external synchronization.
But suppose there is no putIfAbsent() provided by CHM. Then can we write following code:
synchronized(concurrenthashmap_obj) {
if (!concurrenthashmap_obj.containsKey(key)) {
concurrenthashmap_obj.put(key, value);
}
}
I mean can we use external synchronization on CHM object?Will it work?
For above compound operation there is putIfAbsent() method in CHM but how can we achieve thread safety for other compound operations if we are using CHM. I mean can we use external synchronization on CHM object?
No, you cannot use external synchronization to ensure atomicity of compound operations over ConcurrentHashMap.
To be precise, you can use external synchronization to ensure atomicity of compound operations, but only if all operations with ConcurrentHashMap are synchronized over the same lock as well (though use of ConcurrentHashMap won't make sense in this case - you can replace it with regular HashMap).
Approach with external synchronization works with Hashtable and Collections.synchronizedMap() only because they guarantee that their primitive operations are synchronized over these objects as well. Since ConcurrentHashMap doesn't provide such a guarantee, primitive operations may interfere with execution of your compound operations, breaking their atomicity.
However, ConcurrentHashMap provides number of methods that can be used to implement compound operations in optimistic manner:
putIfAbsent(key, value)
remove(key, value)
replace(key, value)
replace(key, oldValue, newValue)
You can use these operation to implement certain compound operations without explict synchronization, the same way as you would do for AtomicReference, etc.
There isn't any reason why you can't. Traditional synchronization works with everything, there aren't special exceptions against them. ConcurrentHashMaps simply use more optimized thread-safety mechanisms, if you wish to do something more complex, falling back to traditional synchronization may actually be your only option (that and using locks).
You can always use a synchronized block. The fancy collections in java.util.concurrent do not prohibit it, they just make it redundant for most common use cases. If you are performing a compound operation (e.g. - you want to insert two keys which must always have the same value), not only can you use external synchronization - you must.
E.g.:
String key1 = getKeyFromSomewhere();
String key2 = getKeyFromSomewhereElse();
String value = getValue();
// We want to put two pairs in the map - [key1, value] and [key2, value]
// and be sure that in any point in time both key1 and key2 have the same
// value
synchronized(concurrenthashmap_obj) {
concurrenthashmap_obj.put(key1, value);
// without external syncronoziation, key1's value may have already been
// overwritten from a different thread!
concurrenthashmap_obj.put(key2, value);
}
As the ConcurrentHashMap implements the Map Interface, it does support all features every basic Map does as well. So yes: you can use it like any other map and ignore all the extra features. But then you will essentially have a slower HashMap.
The main difference between a synchronized Map and a concurrent Map is - as the name says - concurrency. Imagine you have 100 threads wanting to read from the Map, if you synchronize you block out 99 threads and 1 can do the work. If you use concurrency 100 threads can work at the same time.
Now if you think about the actual reason why you use threads, you soon come to the conclusion that you should get rid of every possible synchronized block that you can.
It all depends on what you mean by "other compound operation" and by "working". Synchronization works with ConcurrentHashMap exactly the same way as it works with any other object.
So, if you want some complex shared state change to be seen as an atomic change, then all accesses to this shared state must be synchronized, on the same lock. This lock could be the Map itself, or it could be another object.
About java.util.concurrent.ConcurrentHashMap
"is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details: they do not throw ConcurrentModificationException."
"allows concurrency among update operations"
About java memory
Generally speaking the reads are safe from a synchronization standpoint but not a memory standpoint.
See also "http://www.ibm.com/developerworks/java/library/j-jtp03304/".
So synchronizaton and volatile should be used to manage concurrent reading (vs. writing).
About putIfAbsent
putIfAbsent is your friend:
If the specified key is not already associated with a value, associate
it with the given
value. This is equivalent to
if (!map.containsKey(key))
return map.put(key, value);
else
return map.get(key);
except that the action is performed !!!atomically!!!.
I am considering using EnumMap in a concurrent environment. However, the environment is atypical, here's why:
EnumMap is always full: there are no unmapped keys when the map is exposed to the concurrent environment
Only put() and get() operations will be used (no iterating over, no remove(), etc.)
It is completely acceptable if a call to get() does not reflect a call to put() immediately or orderly.
From what I could gather, including relevant method source code, this seems to be a safe scenario (unlike if iterations were allowed). Is there anything I might have overlooked?
In general, using non-thread-safe classes across threads is fraught with many problems. In your particular case, assuming safe publication after all keys have had values assigned (such that map.size() == TheEnum.values().length), the only problem I can see from a quickish glance of EnumMap's code in Java 1.6 is that a put may not ever get reflected in another thread. But that's only true because of the internals of EnumMap's implementation, which could change in the future. In other words, future changes could break the use case in more dangerous, subtle ways.
It's possible to write correct code that still contains data races -- but it's tricky. Why not just wrap the instance in a Collections.synchronizedMap?
Straight from the JavaDoc:
Like most collection implementations EnumMap is not synchronized. If multiple threads access an enum map concurrently, and at least one of the threads modifies the map, it should be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the enum map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap(java.util.Map<K, V>) method. This is best done at creation time, to prevent accidental unsynchronized access:
Map<EnumKey, V> m = Collections.synchronizedMap(new EnumMap<EnumKey, V>(...));
The problem you have is that threads may not ever see the change made by another thread or they may see partially made changes. It's the same reason double-check-locking was broken before java 5 introduced volatile.
It might work if you made the EnumMap reference volatile but I'm not 100% sure even then, you might need the internal references inside the EnumMap to be volatile and obviously you can't do that without doing your own version of EnumMap.
this is a passage from JavaDoc regarding ConcurrentHashMap. It says retrieval operations generally do not block, so may overlap with update operations. Does this mean the get() method is not thread safe?
"However, even though all operations are thread-safe, retrieval
operations do not entail locking, and there is not any support for
locking the entire table in a way that prevents all access. This class
is fully interoperable with Hashtable in programs that rely on its
thread safety but not on its synchronization details.
Retrieval operations (including get) generally do not block, so may
overlap with update operations (including put and remove). Retrievals
reflect the results of the most recently completed update operations
holding upon their onset."
The get() method is thread-safe, and the other users gave you useful answers regarding this particular issue.
However, although ConcurrentHashMap is a thread-safe drop-in replacement for HashMap, it is important to realize that if you are doing multiple operations you may have to change your code significantly. For example, take this code:
if (!map.containsKey(key))
return map.put(key, value);
else
return map.get(key);
In a multi-thread environment, this is a race condition. You have to use the ConcurrentHashMap.putIfAbsent(K key, V value) and pay attention to the return value, which tells you if the put operation was successful or not. Read the docs for more details.
Answering to a comment that asks for clarification on why this is a race condition.
Imagine there are two threads A, B that are going to put two different values in the map, v1 and v2 respectively, having the same key. The key is initially not present in the map. They interleave in this way:
Thread A calls containsKey and finds out that the key is not present, but is immediately suspended.
Thread B calls containsKey and finds out that the key is not present, and has the time to insert its value v2.
Thread A resumes and inserts v1, "peacefully" overwriting (since put is threadsafe) the value inserted by thread B.
Now thread B "thinks" it has successfully inserted its very own value v2, but the map contains v1. This is really a disaster because thread B may call v2.updateSomething() and will "think" that the consumers of the map (e.g. other threads) have access to that object and will see that maybe important update ("like: this visitor IP address is trying to perform a DOS, refuse all the requests from now on"). Instead, the object will be soon garbage collected and lost.
It is thread-safe. However, the way it is being thread-safe may not be what you expect. There are some "hints" you can see from:
This class is fully interoperable with Hashtable in programs that
rely on its thread safety but not on its synchronization details
To know the whole story in a more complete picture, you need to be aware of the ConcurrentMap interface.
The original Map provides some very basic read/update methods. Even I was able to make a thread-safe implementation of Map; there are lots of cases that people cannot use my Map without considering my synchronization mechanism. This is a typical example:
if (!threadSafeMap.containsKey(key)) {
threadSafeMap.put(key, value);
}
This piece of code is not thread-safe, even though the map itself is. Two threads calling containsKey() at the same time could think there is no such key they both therefore insert into the Map.
In order to fix the problem, we need to do extra synchronization explicitly. Assume the thread-safety of my Map is achieved by synchronized keywords, you will need to do:
synchronized(threadSafeMap) {
if (!threadSafeMap.containsKey(key)) {
threadSafeMap.put(key, value);
}
}
Such extra code needs you to know about the "synchronization details" of the map. In the above example, we need to know that the synchronization is achieved by "synchronized".
ConcurrentMap interface take this one step further. It defines some common "complex" actions that involves multiple access to map. For example, the above example is exposed as putIfAbsent(). With these "complex" actions, users of ConcurrentMap (in most case) don't need to synchronise actions with multiple access to the map. Hence, the implementation of Map can perform more complicated synchronization mechanism for better performance. ConcurrentHashhMap is a good example. Thread-safety is in fact maintained by keeping separate locks for different partitions of the map. It is thread-safe because concurrent access to the map will not corrupt the internal data structure, or cause any update lost unexpected, etc.
With all the above in mind, the meaning of Javadoc will be clearer:
"Retrieval operations (including get) generally do not block" because ConcurrentHashMap is not using "synchronized" for its thread-safety. The logic of get itself takes care of the thread-safeness; and If you look further in the Javadoc:
The table is internally partitioned to try to permit the indicated number
of concurrent updates without contention
Not only is retrieval non-blocking, even updates can happen concurrently. However, non-blocking/concurrent-updates does not means that it is thread-UNsafe. It simply means that it is using some ways other than simple "synchronized" for thread-safety.
However, as the internal synchronization mechanism is not exposed, if you want to do some complicated actions other than those provided by ConcurrentMap, you may need to consider changing your logic, or consider not using ConcurrentHashMap. For example:
// only remove if both key1 and key2 exists
if (map.containsKey(key1) && map.containsKey(key2)) {
map.remove(key1);
map.remove(key2);
}
ConcurrentHashmap.get() is thread-safe, in the sense that
It will not throw any exception, including ConcurrentModificationException
It will return a result that was true at some (recent) time in past. This means that two back-to-back calls to get can return different results. Of course, this true of any other Map as well.
HashMap is divided into "buckets" based on hashCode. ConcurrentHashMap uses this fact. Its synchronization mechanism is based on blocking buckets rather than on entire Map. This way few threads can simultaneously write to few different buckets (one thread can write to one bucket at a time).
Reading from ConcurrentHashMap almost doesn't use synchronization. Synchronization is used when while fetching value for key, it sees null value. Since ConcurrentHashMap can't store null as values (yes, aside from keys, values also can't be nulls) it suggests that fetching null while reading happened in the middle of initializing map entry (key-value pair) by another thread: when key was assigned, but value not yet, and it still holds default null.
In such case reading thread will need to wait until entry will be written fully.
So results from read() will be based on current state of map. If you read value of key that was in the middle of updating you will likely get old value since writing process hasn't finished yet.
get() in ConcurrentHashMap is thread-safe because It reads the value
which is Volatile. And in cases when value is null of any key, then
get() method waits till it gets the lock and then it reads the updated
value.
When put() method is updating CHM, then it sets the value of that key to null, and then it creates a new entry and updates the CHM. This null value is used by get() method as signal that another thread is updating the CHM with the same key.
It just means that when one thread is updating and one thread is reading there is no guarantee that the one that called the ConcurrentHashMap method first, in time, will have their operation occur first.
Think about an update on the item telling where Bob is. If one thread asks where Bob is at about the same time that another thread updates to say he came 'inside', you can't predict whether the reader thread will get Bob's status as 'inside' or 'outside'. Even if the update thread calls the method first, the reader thread might get the 'outside' status.
The threads will not cause each other problems. The code is ThreadSafe.
One thread won't go into an infinite loop or start generating wierd NullPointerExceptions or get "itside" with half of the old status and half of the new.