Are size(), put(), remove(), get() atomic in Java synchronized HashMap?

Are size(), put(), remove(), get() atomic in Java synchronized HashMap? - java

I am declaring a Java Map as
Map<String, String> map = Collections.synchronizedMap(new HashMap<String, String>());
to deal with the concurrency issues, and synchronizing on the map for all the operations on it. However, I read that synchronization isn't necessary on a synchronizedMap when the operations are atomic. I checked the Java API and the documentation of HashMap doesn't seem to mention which are atomic, so I'm not sure which are.
I'm synchronizing on the following calls to the map:
map.size()
map.put()
map.remove()
map.get()
But if some are atomic, it seems synchronization isn't necessary for these. Which are atomic?

A synchronized map as the name suggests is synchronized. Every operation on it is atomic in respect to any other operation on it.
You can think of it as if every method of your synchronized map is declared with a synchronized keyword.
Please bear in mind that although individual operations are atomic, if you combine them they're no longer atomic, for instance:
String value = map.get("key");
map.put("key", value+"2");
is not equivalent to your custom synchronized code:
synchronized (map) {
String value = map.get("key");
map.put("key", value+"2");
}
but rather:
synchronized (map) {
String value = map.get("key");
}
synchronized (map) {
map.put("key", value+"2");
}

A HashMap is not guaranteed to have atomic operations. Calling any of its methods from different threads (even size()) may corrupt the map. However, a map obtained using Collections.synchronizedMap will have each call synchronized (and hence thread-safe).
However, you may need higher-level synchronization. For instance, if you test whether a key is present, read the size, or otherwise access something from the map and then do something else with the map based on the result, the map may have changed between the two calls. In that case, you need a synchronized block to make the entire transaction atomic, rather than a synchronized map (that just makes each call atomic).

The map itself is synchronized, not some internal locks. Running more than one operation on the map does require a synchronized block. In any event, if you are using a JDK 1.6 or greater, you should consider using ConcurrentHashMap
ConcurrentHashMap is optimal when you need to ensure data consistency, and each of your threads need a current view of the map. If performance is critical, and each thread only inserts data to the map, with reads happening less frequently, then use the path you've outlined. That said, performance may only be poorer when only a single thread accesses a ConcurrentHashMap at a time, but significantly better when multiple threads access the map concurrently.

Related

ConcurrentHashMap thread-safety without using putIfAbsent

I'am trying to clarify HashMap vs ConcurrentHashMap regarding type-safety and also performance. I came across a lot of good articles, but still getting troubles figuring it all out.
Let's take the following example using a ConcurrentHashMap, where I will try to add a value for a key not already there and returning it, the new way of doing it would be:
private final Map<K,V> map = new ConcurrentHashMap<>();
return map.putIfAbsent(k, new Object());
let's assume we don't want to use the putIfAbsent method, the above code should look something like this:
private final Map<K,V> map = new ConcurrentHashMap<>();
synchronized (map) {
V value = map.get(key); //Edit adding the value fetch inside synchronized block
if (!nonNull(value)) {
map.put(key, new Object());
}
}
return map.get(key)
Is the problem with this approach the fact that the whole map is locked whereas in first approach the putIfAbsent method only synchronizes on the bucket on which the hash of the key is, and thus leading to less performance ? Would the second approach work fine with just a HashMap ?

Is the problem with this approach the fact that the whole map is locked
There are two problems with this approach.
It's not intrinsic
The fact that you've acquired the lock on the map reference has zero effect whatsoever, except in regards to any other code that (tries) to acquire this lock. Crucially, ConcurrentHashmap itself does not acquire this lock.
So, if, during that second snippet (with synchronized), some other thread does this:
map.putIfAbsent(key, new Object());
Then it may occur that your map.get(key) call returns null, and nevertheless your followup map.put call ends up overwriting. In other words, that both your thread, and that hypothetical thread running putIfAbsent, both decided to write.
Presumably, if that is just fine in your book, that'd be weird. Why use putIfAbsent and check if map.get returns null in the first place?
Had the other thread done this:
synchronized (map) {
map.putIfAbsent(key, new Object());
}
then there'd be no problem; either your get-check-if-null-then-set code will set and the putIfAbsent call is a noop, or vice versa, but they couldn't possibly both 'decide to write'.
Which leads us to;
This is pointless
There are two different ways to achieve concurrency with maps: Intrinsic and extrinsic. There is zero point in doing both, and they do not interact.
If you have structure whereby all access (both read and write) out of a plain old entirely non-multicore capable java.util.HashMap goes through some shared lock (the hashmap instance itself, or any other lock, long as all threads that interact with that particular map instance use the same one), then that works fine and there is therefore no reason or point to using ConcurrentHashMap instead.
The point of ConcurrentHashMap is to streamline concurrent processes without the use of extrinsic locking: To let the map do the locking.
One of the reasons you want this is that the ConcurrentHashMap impl is significantly faster at the jobs it is capable of doing; these jobs are spelled out explicitly: It's the methods that ConcurrentHashMap has.
Atomicity
The central problem of your code snippet is that it lacks atomicity. Check-then-act is fundamentally broken in concurrent models (in your case: Check: Is key 'k' associated with no value or null?, then Act: Set the mapping of key 'k' to value 'v'). This is broken because what if the thing you checked changes in between? What if you have two threads that both 'check-and-act' and then run simultaneously; then they both check first, then both act first, and broken things ensue: One of the two threads will be acting upon a state that isn't equal to the state as it was when you checked, which means your check's broken.
The right model is act-then-check: Act first, and then check the result of the operation. Of course, this requires redefining, and integrating, the code you wrote explicitly in your snippet, into the very definition of your 'act' phase.
In other words, putIfAbsent is not a convenience method! is a fundamental operation! It's the only way (short of extrinsic locking) to convey the notion of: "Perform the action of associating 'v' with 'k', but only if there is no association yet. I'll check the results of this operation next". There is no way to break that down into if (!map.containsKey(key)) map.put(key, v); because check-then-act does not work in concurrent modelling.
Conclusions
Either get rid of concurrenthashmap, or get rid of synchronized. Having code that uses both is probably broken and even if it isn't, it's error prone, confusing, and I can guarantee you there's a much better way to write it (better in that it is more idiomatic, easier to read, more flexible in the face of future change requests, easier to test, and less likely to have hard-to-test-for bugs in it).
If you can state all operations you need to perform 100% in terms of the methods that CHM has, then do that, because CHM is vastly superior. It even has mechanisms for arbitrary operations: For example, unlike basic hashmaps, you can iterate through a CHM even if other threads are also messing with it, whereas with a normal hashmap you need to hold the lock for the entire duration of the operation, which means any other thread trying to do anything to that hashmap, even just 'ask for its size', need to wait. Hence, for most use cases, CHM results in orders of magnitude better performance.

in first approach the putIfAbsent method only synchronizes on the bucket
That is incorrect, ConcurrentHashMap doesn't synchronize on anything, it uses different mechanics to ensure thread safety.
Would the second approach work fine with just a HashMap ?
Yes, except the second approach is flawed. If using synchronization to make a Map thread-safe, then all access of the Map should use synchronization. As such, it would be best to call Collections.synchronizedMap(map). Performance will be worse than using ConcurrentHashMap.
private final Map<Integer, Object> map = Collections.synchronizedMap(new HashMap<>());
let's assume we don't want to use the putIfAbsent method.
Why? Oh, because it wastes a allocation if the key is already in the map, which is why we should be using computeIfAbsent() instead
map.computeIfAbsent(key, k -> new Object());

What is difference between HashMap in synchronized block vs Collections.synchronizedMap().

What is difference between HashMap in synchronized block vs Collections.synchronizedMap().
HashMap<String,String> hm = new HashMap<String,String>();
hm.put("key1","value1");
hm.put("key2","value2");
synchronized(hm)
{
// Thread safe operation
}
Map<String, String> synchronizedMap = Collections.synchronizedMap(hm);
// Use synchronizedMap for Thread safe concurrent operation
Which is better out of these two?

Using the synchronizedMap method is more convenient and safer in that you know all accesses to the map will be protected (as long as you don't bypass it by calling methods on hm directly).
But using the synchronized block gives you the ability to control the granularity of locking, by holding the lock over multiple statements, which the synchronizedMap option doesn't allow for.
So if you need multiple statements to be executed without being interleaved with calls from other threads, you would have to choose the synchronized blocks (or else switch to something like ConcurrentHashMap if you're looking for something like putIfAbsent or similar functionality). If you don't need that, the synchronizedMap is easier.

They're the same. synchronizedMap() is far easier than handling the syncing yourself, though.

ConcurrentHashMap and compound operations

Hashtable and Collections.synchronizedMap are thread safe but still compound operations like
if (!map_obj.containsKey(key)) {
map_obj.put(key, value);
}
needs external synchronization as:
synchronized(map_obj) {
if (!map_obj.containsKey(key)) {
map_obj.put(key, value);
}
}
Suppose we have ConcurrentHashMap(CHM) instead of Hashtable or HashMap. CHM provides an alternative putIfAbsent() method for the above compound operation, thus removing the need of external synchronization.
But suppose there is no putIfAbsent() provided by CHM. Then can we write following code:
synchronized(concurrenthashmap_obj) {
if (!concurrenthashmap_obj.containsKey(key)) {
concurrenthashmap_obj.put(key, value);
}
}
I mean can we use external synchronization on CHM object?Will it work?
For above compound operation there is putIfAbsent() method in CHM but how can we achieve thread safety for other compound operations if we are using CHM. I mean can we use external synchronization on CHM object?

No, you cannot use external synchronization to ensure atomicity of compound operations over ConcurrentHashMap.
To be precise, you can use external synchronization to ensure atomicity of compound operations, but only if all operations with ConcurrentHashMap are synchronized over the same lock as well (though use of ConcurrentHashMap won't make sense in this case - you can replace it with regular HashMap).
Approach with external synchronization works with Hashtable and Collections.synchronizedMap() only because they guarantee that their primitive operations are synchronized over these objects as well. Since ConcurrentHashMap doesn't provide such a guarantee, primitive operations may interfere with execution of your compound operations, breaking their atomicity.
However, ConcurrentHashMap provides number of methods that can be used to implement compound operations in optimistic manner:
putIfAbsent(key, value)
remove(key, value)
replace(key, value)
replace(key, oldValue, newValue)
You can use these operation to implement certain compound operations without explict synchronization, the same way as you would do for AtomicReference, etc.

There isn't any reason why you can't. Traditional synchronization works with everything, there aren't special exceptions against them. ConcurrentHashMaps simply use more optimized thread-safety mechanisms, if you wish to do something more complex, falling back to traditional synchronization may actually be your only option (that and using locks).

You can always use a synchronized block. The fancy collections in java.util.concurrent do not prohibit it, they just make it redundant for most common use cases. If you are performing a compound operation (e.g. - you want to insert two keys which must always have the same value), not only can you use external synchronization - you must.
E.g.:
String key1 = getKeyFromSomewhere();
String key2 = getKeyFromSomewhereElse();
String value = getValue();
// We want to put two pairs in the map - [key1, value] and [key2, value]
// and be sure that in any point in time both key1 and key2 have the same
// value
synchronized(concurrenthashmap_obj) {
concurrenthashmap_obj.put(key1, value);
// without external syncronoziation, key1's value may have already been
// overwritten from a different thread!
concurrenthashmap_obj.put(key2, value);
}

As the ConcurrentHashMap implements the Map Interface, it does support all features every basic Map does as well. So yes: you can use it like any other map and ignore all the extra features. But then you will essentially have a slower HashMap.
The main difference between a synchronized Map and a concurrent Map is - as the name says - concurrency. Imagine you have 100 threads wanting to read from the Map, if you synchronize you block out 99 threads and 1 can do the work. If you use concurrency 100 threads can work at the same time.
Now if you think about the actual reason why you use threads, you soon come to the conclusion that you should get rid of every possible synchronized block that you can.

It all depends on what you mean by "other compound operation" and by "working". Synchronization works with ConcurrentHashMap exactly the same way as it works with any other object.
So, if you want some complex shared state change to be seen as an atomic change, then all accesses to this shared state must be synchronized, on the same lock. This lock could be the Map itself, or it could be another object.

About java.util.concurrent.ConcurrentHashMap
"is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details: they do not throw ConcurrentModificationException."
"allows concurrency among update operations"
About java memory
Generally speaking the reads are safe from a synchronization standpoint but not a memory standpoint.
See also "http://www.ibm.com/developerworks/java/library/j-jtp03304/".
So synchronizaton and volatile should be used to manage concurrent reading (vs. writing).
About putIfAbsent
putIfAbsent is your friend:
If the specified key is not already associated with a value, associate
it with the given
value. This is equivalent to
if (!map.containsKey(key))
return map.put(key, value);
else
return map.get(key);
except that the action is performed !!!atomically!!!.

How to synchronize Map between one r/w Thread and one read-only Thread?

I have a synchronized Map (via Collections.synchronizedMap()) that is read and updated by Thread A. Thread B accesses the Map only via Map.keySet() (read-only).
How should I synchronize this? The docs say that keySet() (for a Collections.synchronizedMap) "Needn't be in synchronized block". I can put Thread A's read/write access within a synchronized block, but is that even necessary?
I guess it seems odd to me to even use a synchronized Map, or a synchronized block, if Map.keySet doesn't need to be synchronized (according to the docs link above)...
Update: I missed that iteration of the keySet must be synchronized, even though retrieving the keySet does not require sync. Not particularly exciting to have the keySet without being able to look through it, so end result = synchronization required. Using a ConcurrentHashMap instead.

To make a truly read/write versus read/only locking Map wrapper, you can take a look at the wrapper the Collections uses for synchronizedMap() and replace all of the synchronized statements with a ReentrantReadWriteLock. This is a good bit of work. Instead, you should consider switching to using a ConcurrentHashMap which does all of the right things there.
In terms of keySet(), it doesn't need to be in a synchronized block because it is already being synchronized by the Collections.synchronizedMap(). The Javadocs is just pointing out that if you are iterating through the map, you need to synchronize on it because you are doing multiple operations, but you don't need to synchronize when you are getting the keySet() which is wrapped in a SynchronizedSet class which does its own synchronization.
Lastly, your question seemed to be implying that you don't need to synchronize on something if you are just reading from it. You have to remember that synchronization not only protects against race conditions but also ensures that the data is properly shared by each of the processors. Even if you are accessing a Map as read-only, you still need to synchronize on it if any other thread is updating it.

The docs are telling you how to properly synchronize multi-step operations that need to be atomic, in this case iterating over the map:
Map m = Collections.synchronizedMap(new HashMap());
...
Set s = m.keySet(); // Needn't be in synchronized block
...
synchronized(m) { // Synchronizing on m, not s!
Iterator i = s.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
Note how the actual iteration must be in a synchronized block. The docs are just saying that it doesn't matter if obtaining the keySet() is in the synchronized block, because it's a live view of the Map. If the keys in the map change between the reference to the key set being obtained and the beginning of the synchronized block, the key set will reflect those changes.
And by the way, the docs you cite are only for a Map returned by Collections.synchronizedMap. The statement does not necessarily apply to all Maps.

The docs are correct. The map returned from Collections.synchronizedMap() will properly wrap synchronized around all calls sent to the original Map. However, the set impl returned by keySet() does not have the same property, so you must ensure it is read under the same lock.
Without this synchronization, there is no guarantee that Thread B will ever see any update made by Thread A.
You might want to investigate ConcurrentHashMap. It provides useful semantics for exactly this use case. Iterating over a collection view in CHM (like keySet()) gives useful concurrent behavior ("weakly consistent" iterators). You will traverse all keys from the state of the collection at iteration and you may or may not see changes after the iterator was created.

ConcurrentHashMap vs Synchronized HashMap

What is the difference between using the wrapper class, SynchronizedMap, on a HashMap and ConcurrentHashMap?
Is it just being able to modify the HashMap while iterating it (ConcurrentHashMap)?

Synchronized HashMap：
Each method is synchronized using an object level lock. So the get and put methods on synchMap acquire a lock.
Locking the entire collection is a performance overhead. While one thread holds on to the lock, no other thread can use the collection.
ConcurrentHashMap was introduced in JDK 5.
There is no locking at the object level,The locking is at a much finer granularity. For a ConcurrentHashMap, the locks may be at a hashmap bucket level.
The effect of lower level locking is that you can have concurrent readers and writers which is not possible for synchronized collections. This leads too much more scalability.
ConcurrentHashMap does not throw a ConcurrentModificationException if one thread tries to modify it while another is iterating over it.
This article Java 7: HashMap vs ConcurrentHashMap is a very good read. Highly recommended.

The short answer:
Both maps are thread-safe implementations of the Map interface. ConcurrentHashMap is implemented for higher throughput in cases where high concurrency is expected.
Brian Goetz's article on the idea behind ConcurrentHashMap is a very good read. Highly recommended.

ConcurrentHashMap is thread safe without synchronizing the whole map. Reads can happen very fast while write is done with a lock.

We can achieve thread safety by using both ConcurrentHashMap and synchronisedHashmap. But there is a lot of difference if you look at their architecture.
synchronisedHashmap
It will maintain the lock at the object level. So if you want to perform any operation like put/get then you have to acquire the lock first. At the same time, other threads are not allowed to perform any operation. So at a time, only one thread can operate on this. So the waiting time will increase here. We can say that performance is relatively low when you are comparing with ConcurrentHashMap.
ConcurrentHashMap
It will maintain the lock at the segment level. It has 16 segments and maintains the concurrency level as 16 by default. So at a time, 16 threads can be able to operate on ConcurrentHashMap. Moreover, read operation doesn't require a lock. So any number of threads can perform a get operation on it.
If thread1 wants to perform put operation in segment 2 and thread2 wants to perform put operation on segment 4 then it is allowed here. Means, 16 threads can perform update(put/delete) operation on ConcurrentHashMap at a time.
So that the waiting time will be less here. Hence the performance is relatively better than synchronisedHashmap.

SynchronizedMap and ConcurrentHashMap are both thread safe class and can be used in multithreaded application, the main difference between them is regarding how they achieve thread safety.
SynchronizedMap acquires lock on the entire Map instance , while ConcurrentHashMap divides the Map instance into multiple segments and locking is done on those.

Both are synchronized version of HashMap, with difference in their core functionality and their internal structure.
ConcurrentHashMap consist of internal segments which can be viewed as independent HashMaps Conceptually.
All such segments can be locked by separate threads in high concurrent executions.
So, multiple threads can get/put key-value pairs from ConcurrentHashMap without blocking/waiting for each other.
This is implemented for higher throughput.
whereas
Collections.synchronizedMap(), we get a synchronized version of HashMap and it is accessed in blocking manner. This means if multiple threads try to access synchronizedMap at same time, they will be allowed to get/put key-value pairs one at a time in synchronized manner.

ConcurrentHashMap uses finer-grained locking mechanism known as lock stripping to allow greater degree of shared access. Due to this it provides better concurrency and scalability.
Also iterators returned for ConcurrentHashMap are weakly consistent instead of fail fast technique used by Synchronized HashMap.

Methods on SynchronizedMap hold the lock on the object, whereas in ConcurrentHashMap there's a concept of "lock striping" where locks are held on buckets of the contents instead. Thus improved scalability and performance.

ConcurrentHashMap :
1)Both maps are thread-safe implementations of the Map interface.
2)ConcurrentHashMap is implemented for higher throughput in cases where high concurrency is expected.
3) There is no locking in object level.
Synchronized Hash Map:
1) Each method is synchronized using an object level lock.

ConcurrentHashMap allows concurrent access to data. Whole map is divided into segments.
Read operation ie. get(Object key) is not synchronized even at segment level.
But write operations ie. remove(Object key), get(Object key) acquire lock at segment level. Only part of whole map is locked, other threads still can read values from various segments except locked one.
SynchronizedMap on the other hand, acquire lock at object level. All threads should wait for current thread irrespective of operation(Read/Write).

A simple performance test for ConcurrentHashMap vs Synchronized HashMap
. The test flow is calling put in one thread and calling get in three threads on Map concurrently. As #trshiv said, ConcurrentHashMap has higher throughput and speed for whose reading operation without lock. The result is when operation times is over 10^7, ConcurrentHashMap is 2x faster than Synchronized HashMap.

Synchronized HashMap
Lock mechanism - It Locks the whole map, so Multiple threads can't access the map concurrently. So, performance is relatively less.
2.Null key or Value - It will allow null as a key or value.
3.Concurrent modification exception - Iterator return by synchronized map throws concurrent modification exception
ConcurrentHashMap
1.Lock mechanism -Locks the portion, Concurrent hashmap allows concurrent read and write. So performance is relatively better than a synchronized map
2.Null key or Value - It doesn't allow null as a key or value. If you use it will throw java.lang.NullPointerException at Runtime.
3.Concurrent modification exception - It doesn't throw concurrent modification exceptions.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
public class Ex_ConcurrentHashMap {
public static void main(String[] args) {
Map<String, String> map = new ConcurrentHashMap<>();
map.put("one", "one");
map.put("two", "two");
map.put("three", "three");
System.out.println("1st map : "+map);
String key = null;
for(Map.Entry<String, String> itr : map.entrySet())
{
key = itr.getKey();
if("three".equals(key))
{
map.put("FOUR", "FOUR");
}
System.out.println(key+" ::: "+itr.getValue());
}
System.out.println("2nd map : "+map);
//map.put("FIVE", null);//java.lang.NullPointerException
map.put(null, "FIVE");//java.lang.NullPointerException
System.out.println("3rd map : "+map);
}
}
Synchronized HashMap Example
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Map.Entry;
public class Ex_Synchronizedmap {
public static void main(String[] args) {
Map<String, String> map = new HashMap<>();
map.put("one", "one");
map.put("two", "two");
map.put("three", "three");
map.put("FOUR", null);
map.put(null, "FIVE");
System.out.println("map : "+map);
Map<String, String> map1 =
Collections.synchronizedMap(map);
System.out.println("map1 : "+map1);
String key = null;
for(Map.Entry<String, String> itr : map1.entrySet())
{
key = itr.getKey();
if("three".equals(key))
{
map1.put("ABC", "ABC");
}
System.out.println(key+" ::: "+itr.getValue());
}
System.out.println("New Map :: "+map1);
Iterator<Entry<String, String>> iterator = map1.entrySet().iterator();
int i = 0;
while(iterator.hasNext())
{
if(i == 1)
{
map1.put("XYZ", "XYZ");
}
Entry<String, String> next = iterator.next();
System.out.println(next.getKey()+" :: "+next.getValue());
i++;
}
}
}

As per java doc's
Hashtable and Collections.synchronizedMap(new HashMap()) are
synchronized. But ConcurrentHashMap is "concurrent".
A concurrent collection is thread-safe, but not governed by a single exclusion lock.
In the particular case of ConcurrentHashMap, it safely permits
any number of concurrent reads as well as a tunable number of
concurrent writes. "Synchronized" classes can be useful when you need
to prevent all access to a collection via a single lock, at the
expense of poorer scalability.
In other cases in which multiple
threads are expected to access a common collection, "concurrent"
versions are normally preferable. And unsynchronized collections are
preferable when either collections are unshared, or are accessible
only when holding other locks.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.