What is the difference between using the wrapper class, SynchronizedMap, on a HashMap and ConcurrentHashMap?
Is it just being able to modify the HashMap while iterating it (ConcurrentHashMap)?
Synchronized HashMap:
Each method is synchronized using an object-level lock. So the get and put methods on the synchronized map both acquire that lock.
Locking the entire collection is a performance overhead. While one thread holds on to the lock, no other thread can use the collection.
ConcurrentHashMap was introduced in JDK 5.
There is no locking at the object level; the locking is at a much finer granularity. For a ConcurrentHashMap, the locks may be at the hashmap bucket level.
The effect of lower-level locking is that you can have concurrent readers and writers, which is not possible for synchronized collections. This leads to much more scalability.
ConcurrentHashMap does not throw a ConcurrentModificationException if one thread tries to modify it while another is iterating over it.
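As a small illustration of that last point (the class name here is made up for this sketch), modifying each map while iterating it shows the difference; the ConcurrentHashMap iterator is weakly consistent, while the fail-fast iterator of the synchronized map will typically throw:
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
public class IterationDemo {
    public static void main(String[] args) {
        // ConcurrentHashMap: weakly consistent iterator, no exception
        Map<String, Integer> concurrent = new ConcurrentHashMap<>();
        concurrent.put("a", 1);
        concurrent.put("b", 2);
        for (String key : concurrent.keySet()) {
            concurrent.put("c", 3); // allowed; the iterator may or may not see "c"
            System.out.println("concurrent: " + key);
        }
        // Synchronized HashMap: fail-fast iterator
        Map<String, Integer> synced = Collections.synchronizedMap(new HashMap<>());
        synced.put("a", 1);
        synced.put("b", 2);
        try {
            for (String key : synced.keySet()) {
                synced.put("c", 3); // structural modification during iteration
                System.out.println("synchronized: " + key);
            }
        } catch (java.util.ConcurrentModificationException e) {
            System.out.println("synchronized map iterator failed fast");
        }
    }
}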
This article Java 7: HashMap vs ConcurrentHashMap is a very good read. Highly recommended.
The short answer:
Both maps are thread-safe implementations of the Map interface. ConcurrentHashMap is implemented for higher throughput in cases where high concurrency is expected.
Brian Goetz's article on the idea behind ConcurrentHashMap is a very good read. Highly recommended.
ConcurrentHashMap is thread-safe without synchronizing the whole map. Reads can happen very fast, while writes are done with a lock.
We can achieve thread safety with both ConcurrentHashMap and synchronizedMap, but there is a lot of difference if you look at their architecture.
synchronizedMap
It maintains the lock at the object level, so if you want to perform any operation like put/get you have to acquire the lock first. At the same time, other threads are not allowed to perform any operation, which means only one thread can operate on the map at a time. The waiting time increases, so performance is relatively low compared with ConcurrentHashMap.
ConcurrentHashMap
It maintains the lock at the segment level. It has 16 segments and maintains a concurrency level of 16 by default, so up to 16 threads can operate on the ConcurrentHashMap at a time. Moreover, read operations don't require a lock, so any number of threads can perform a get operation on it.
If thread1 wants to perform a put in segment 2 and thread2 wants to perform a put in segment 4, that is allowed; up to 16 threads can perform update (put/remove) operations on the ConcurrentHashMap at a time.
So the waiting time is lower, and performance is relatively better than with synchronizedMap.
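For illustration (a minimal sketch, class name made up), the expected concurrency level can be tuned through the three-argument constructor; note that from Java 8 onward the segment design is gone and this value is only treated as a sizing hint:
import java.util.concurrent.ConcurrentHashMap;
public class ConcurrencyLevelDemo {
    public static void main(String[] args) {
        // initial capacity 64, load factor 0.75, and an expected 32
        // concurrently updating threads (the default is 16)
        ConcurrentHashMap<String, Integer> map =
                new ConcurrentHashMap<>(64, 0.75f, 32);
        map.put("hits", 1);
        System.out.println(map);
    }
}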
SynchronizedMap and ConcurrentHashMap are both thread-safe classes that can be used in multithreaded applications; the main difference between them is how they achieve thread safety.
SynchronizedMap acquires a lock on the entire Map instance, while ConcurrentHashMap divides the Map instance into multiple segments and locking is done on those.
Both are synchronized versions of HashMap, with differences in their core functionality and internal structure.
ConcurrentHashMap consists of internal segments which can conceptually be viewed as independent HashMaps.
All such segments can be locked by separate threads in highly concurrent executions.
So, multiple threads can get/put key-value pairs from ConcurrentHashMap without blocking/waiting for each other.
This is implemented for higher throughput.
whereas with Collections.synchronizedMap() we get a synchronized version of HashMap which is accessed in a blocking manner. This means that if multiple threads try to access the synchronizedMap at the same time, they are allowed to get/put key-value pairs one at a time, in a synchronized manner.
ConcurrentHashMap uses a finer-grained locking mechanism known as lock striping to allow a greater degree of shared access. Due to this it provides better concurrency and scalability.
Also, the iterators returned by ConcurrentHashMap are weakly consistent, instead of the fail-fast iterators used by a synchronized HashMap.
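To make that difference concrete, here is a minimal sketch (class name made up): iterating a synchronizedMap safely requires holding the map's lock, while a ConcurrentHashMap iterator needs no external locking:
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
public class IteratorLockingDemo {
    public static void main(String[] args) {
        Map<String, String> syncMap = Collections.synchronizedMap(new HashMap<>());
        syncMap.put("k1", "v1");
        // The javadoc requires manual synchronization while iterating, otherwise
        // a concurrent put from another thread can break the traversal.
        synchronized (syncMap) {
            for (Map.Entry<String, String> e : syncMap.entrySet()) {
                System.out.println(e.getKey() + "=" + e.getValue());
            }
        }
        Map<String, String> chm = new ConcurrentHashMap<>();
        chm.put("k1", "v1");
        // Weakly consistent iterator: no external lock, never fail-fast.
        for (Map.Entry<String, String> e : chm.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}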
Methods on SynchronizedMap hold the lock on the object, whereas in ConcurrentHashMap there's a concept of "lock striping" where locks are held on buckets of the contents instead. This gives improved scalability and performance.
ConcurrentHashMap:
1) Both maps are thread-safe implementations of the Map interface.
2) ConcurrentHashMap is implemented for higher throughput in cases where high concurrency is expected.
3) There is no locking at the object level.
Synchronized HashMap:
1) Each method is synchronized using an object-level lock.
ConcurrentHashMap allows concurrent access to data. The whole map is divided into segments.
The read operation, i.e. get(Object key), is not synchronized even at the segment level.
But write operations, i.e. put(Object key, Object value) and remove(Object key), acquire a lock at the segment level. Only part of the whole map is locked; other threads can still read values from the various segments except the locked one.
SynchronizedMap, on the other hand, acquires the lock at the object level. All threads have to wait for the current thread irrespective of the operation (read/write).
A simple performance test for ConcurrentHashMap vs synchronized HashMap: the test flow is calling put in one thread and calling get in three threads on the map concurrently. As #trshiv said, ConcurrentHashMap has higher throughput and speed because its read operations work without a lock. The result is that when the number of operations is over 10^7, ConcurrentHashMap is 2x faster than the synchronized HashMap.
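A rough sketch of such a test follows (not a rigorous benchmark; the class name and operation count are made up, and warm-up, GC and other effects are ignored):
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
public class RoughMapBenchmark {
    static final int OPS = 10_000_000;
    public static void main(String[] args) throws InterruptedException {
        System.out.println("synchronizedMap:   " + run(Collections.synchronizedMap(new HashMap<>())) + " ms");
        System.out.println("ConcurrentHashMap: " + run(new ConcurrentHashMap<>()) + " ms");
    }
    // One writer thread calling put, three reader threads calling get.
    static long run(Map<Integer, Integer> map) throws InterruptedException {
        Thread writer = new Thread(() -> {
            for (int i = 0; i < OPS; i++) map.put(i, i);
        });
        Thread[] readers = new Thread[3];
        for (int r = 0; r < readers.length; r++) {
            readers[r] = new Thread(() -> {
                for (int i = 0; i < OPS; i++) map.get(i);
            });
        }
        long start = System.nanoTime();
        writer.start();
        for (Thread t : readers) t.start();
        writer.join();
        for (Thread t : readers) t.join();
        return (System.nanoTime() - start) / 1_000_000;
    }
}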
Synchronized HashMap
1. Lock mechanism - It locks the whole map, so multiple threads can't access the map concurrently. Performance is therefore relatively lower.
2. Null key or value - It allows null as a key or value.
3. ConcurrentModificationException - The iterator returned by a synchronized map throws a ConcurrentModificationException if the map is structurally modified during iteration.
ConcurrentHashMap
1. Lock mechanism - It locks only a portion of the map; ConcurrentHashMap allows concurrent reads and writes, so performance is relatively better than a synchronized map.
2. Null key or value - It doesn't allow null as a key or value. If you use one, it will throw java.lang.NullPointerException at runtime.
3. ConcurrentModificationException - It doesn't throw ConcurrentModificationException.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class Ex_ConcurrentHashMap {
    public static void main(String[] args) {
        Map<String, String> map = new ConcurrentHashMap<>();
        map.put("one", "one");
        map.put("two", "two");
        map.put("three", "three");
        System.out.println("1st map : " + map);
        String key = null;
        for (Map.Entry<String, String> itr : map.entrySet()) {
            key = itr.getKey();
            if ("three".equals(key)) {
                map.put("FOUR", "FOUR"); // modifying while iterating is allowed; no ConcurrentModificationException
            }
            System.out.println(key + " ::: " + itr.getValue());
        }
        System.out.println("2nd map : " + map);
        //map.put("FIVE", null);  // java.lang.NullPointerException
        map.put(null, "FIVE");    // java.lang.NullPointerException
        System.out.println("3rd map : " + map);
    }
}
Synchronized HashMap Example
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Map.Entry;

public class Ex_Synchronizedmap {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("one", "one");
        map.put("two", "two");
        map.put("three", "three");
        map.put("FOUR", null);
        map.put(null, "FIVE");
        System.out.println("map : " + map);
        Map<String, String> map1 = Collections.synchronizedMap(map);
        System.out.println("map1 : " + map1);
        String key = null;
        for (Map.Entry<String, String> itr : map1.entrySet()) {
            key = itr.getKey();
            if ("three".equals(key)) {
                // structural modification during iteration: the fail-fast iterator
                // will typically throw ConcurrentModificationException on the next call to next()
                map1.put("ABC", "ABC");
            }
            System.out.println(key + " ::: " + itr.getValue());
        }
        System.out.println("New Map :: " + map1);
        Iterator<Entry<String, String>> iterator = map1.entrySet().iterator();
        int i = 0;
        while (iterator.hasNext()) {
            if (i == 1) {
                map1.put("XYZ", "XYZ"); // likewise triggers ConcurrentModificationException below
            }
            Entry<String, String> next = iterator.next();
            System.out.println(next.getKey() + " :: " + next.getValue());
            i++;
        }
    }
}
As per the Java docs:
Hashtable and Collections.synchronizedMap(new HashMap()) are synchronized. But ConcurrentHashMap is "concurrent".
A concurrent collection is thread-safe, but not governed by a single exclusion lock. In the particular case of ConcurrentHashMap, it safely permits any number of concurrent reads as well as a tunable number of concurrent writes. "Synchronized" classes can be useful when you need to prevent all access to a collection via a single lock, at the expense of poorer scalability.
In other cases in which multiple threads are expected to access a common collection, "concurrent" versions are normally preferable. And unsynchronized collections are preferable when either collections are unshared, or are accessible only when holding other locks.
Related
What is the difference between using a HashMap in a synchronized block vs Collections.synchronizedMap()?
HashMap<String, String> hm = new HashMap<String, String>();
hm.put("key1", "value1");
hm.put("key2", "value2");
synchronized (hm) {
    // Thread-safe operation
}
Map<String, String> synchronizedMap = Collections.synchronizedMap(hm);
// Use synchronizedMap for thread-safe concurrent operation
Which is better out of these two?
Using the synchronizedMap method is more convenient and safer in that you know all accesses to the map will be protected (as long as you don't bypass it by calling methods on hm directly).
But using the synchronized block gives you the ability to control the granularity of locking, by holding the lock over multiple statements, which the synchronizedMap option doesn't allow for.
So if you need multiple statements to be executed without being interleaved with calls from other threads, you would have to choose the synchronized blocks (or else switch to something like ConcurrentHashMap if you're looking for something like putIfAbsent or similar functionality). If you don't need that, the synchronizedMap is easier.
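For example, here is a minimal sketch of that trade-off (key names are made up): a check-then-put needs an external synchronized block on the wrapped map, while ConcurrentHashMap offers putIfAbsent as a single atomic call:
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
public class CompoundOpsDemo {
    public static void main(String[] args) {
        Map<String, String> syncMap = Collections.synchronizedMap(new HashMap<>());
        // The check and the put must sit in one synchronized block, otherwise
        // another thread could insert "key1" between the two calls.
        synchronized (syncMap) {
            if (!syncMap.containsKey("key1")) {
                syncMap.put("key1", "value1");
            }
        }
        // ConcurrentHashMap provides the same compound operation atomically.
        Map<String, String> concurrentMap = new ConcurrentHashMap<>();
        concurrentMap.putIfAbsent("key1", "value1");
    }
}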
They're the same. synchronizedMap() is far easier than handling the syncing yourself, though.
This code is the base (fastest):
Map<String,String> map = new HashMap<>();
for (E e:source) map.put(e.getKey(), e.getValue());
This code is slower (x2):
Map<String,String> map = new HashMap<>();
synchronized(map) {
    for (E e:source) map.put(e.getKey(), e.getValue());
}
This code is worse (x20):
Map<String,String> map = new HashMap<>();
synchronized(map) {
    source.forEach(map::put);
}
For more detailed measurements, see a related question of mine. For the full source code, see the GitHub repository.
Why such big discrepancies? If a HashMap is truly lightweight and not thread-safe (no synchronization), then the overhead should have been negligible. Besides, locks are supposed to be reentrant.
When using Properties, I actually get the reverse effect, as I would have expected: I save time by acquiring a single lock beforehand (before the loop starts).
Can someone explain these discrepancies?
Note that I am using the following JVM option: -Xms4g
UPDATE: a good article on benchmarking - http://www.ibm.com/developerworks/library/j-benchmark1/
If a HashMap is truly lightweight and not thread-safe (no synchronization), then the overhead should have been negligible.
That's a complete non-sequitur. The more lightweight the operation inside the synchronized block, the higher the relative overhead of synchronization.
Besides locks are supposed to be reentrant.
They are. So? There is no re-entrancy here.
I am declaring a Java Map as
Map<String, String> map = Collections.synchronizedMap(new HashMap<String, String>());
to deal with the concurrency issues, and synchronizing on the map for all the operations on it. However, I read that synchronization isn't necessary on a synchronizedMap when the operations are atomic. I checked the Java API and the documentation of HashMap doesn't seem to mention which are atomic, so I'm not sure which are.
I'm synchronizing on the following calls to the map:
map.size()
map.put()
map.remove()
map.get()
But if some are atomic, it seems synchronization isn't necessary for these. Which are atomic?
A synchronized map, as the name suggests, is synchronized. Every operation on it is atomic with respect to any other operation on it.
You can think of it as if every method of your synchronized map is declared with a synchronized keyword.
Please bear in mind that although individual operations are atomic, if you combine them they're no longer atomic, for instance:
String value = map.get("key");
map.put("key", value+"2");
is not equivalent to your custom synchronized code:
synchronized (map) {
    String value = map.get("key");
    map.put("key", value+"2");
}
but rather:
synchronized (map) {
    String value = map.get("key");
}
synchronized (map) {
    map.put("key", value+"2");
}
A HashMap is not guaranteed to have atomic operations. Calling any of its methods from different threads (even size()) may corrupt the map. However, a map obtained using Collections.synchronizedMap will have each call synchronized (and hence thread-safe).
However, you may need higher-level synchronization. For instance, if you test whether a key is present, read the size, or otherwise access something from the map and then do something else with the map based on the result, the map may have changed between the two calls. In that case, you need a synchronized block to make the entire transaction atomic, rather than a synchronized map (that just makes each call atomic).
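A minimal sketch of that point (the key name is made up): incrementing a counter stored in the map is a get-then-put sequence, so on a synchronized map it must sit in one synchronized block, whereas ConcurrentHashMap can perform the whole update atomically with merge:
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
public class AtomicIncrementDemo {
    public static void main(String[] args) {
        Map<String, Integer> counts = Collections.synchronizedMap(new HashMap<>());
        // get + put is two calls; both must happen under the same lock,
        // or another thread's update could be lost in between.
        synchronized (counts) {
            Integer old = counts.get("hits");
            counts.put("hits", old == null ? 1 : old + 1);
        }
        // ConcurrentHashMap performs the whole read-modify-write atomically.
        ConcurrentHashMap<String, Integer> concurrentCounts = new ConcurrentHashMap<>();
        concurrentCounts.merge("hits", 1, Integer::sum);
    }
}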
The map itself is synchronized, not some internal locks. Running more than one operation on the map does require a synchronized block. In any event, if you are using JDK 1.6 or later, you should consider using ConcurrentHashMap.
ConcurrentHashMap is optimal when you need to ensure data consistency, and each of your threads need a current view of the map. If performance is critical, and each thread only inserts data to the map, with reads happening less frequently, then use the path you've outlined. That said, performance may only be poorer when only a single thread accesses a ConcurrentHashMap at a time, but significantly better when multiple threads access the map concurrently.
I'm attempting to create a ConcurrentHashMap that supports "snapshots" in order to provide consistent iterators, and am wondering if there's a more efficient way to do this. The problem is that if two iterators are created at the same time then they need to read the same values, and the definition of the concurrent hash map's weakly consistent iterators does not guarantee this to be the case. I'd also like to avoid locks if possible: there are several thousand values in the map and processing each item takes several dozen milliseconds, and I don't want to have to block writers during this time as this could result in writers blocking for a minute or longer.
What I have so far:
The ConcurrentHashMap's keys are Strings, and its values are instances of ConcurrentSkipListMap<Long, T>
When an element is added to the hashmap with putIfAbsent, then a new skiplist is allocated, and the object is added via skipList.put(System.nanoTime(), t).
To query the map, I use map.get(key).lastEntry().getValue() to return the most recent value. To query a snapshot (e.g. with an iterator), I use map.get(key).lowerEntry(iteratorTimestamp).getValue(), where iteratorTimestamp is the result of System.nanoTime() called when the iterator was initialized.
If an object is deleted, I use map.get(key).put(timestamp, SnapShotMap.DELETED), where DELETED is a static final object.
Questions:
Is there a library that already implements this? Or barring that, is there a data structure that would be more appropriate than the ConcurrentHashMap and the ConcurrentSkipListMap? My keys are comparable, so maybe some sort of concurrent tree would better support snapshots than a concurrent hash table.
How do I prevent this thing from continually growing? I can delete all of the skip list entries with keys less than X (except for the last key in the map) after all iterators that were initialized on or before X have completed, but I don't know of a good way to determine when this has happened: I can flag that an iterator has completed when its hasNext method returns false, but not all iterators are necessarily going to run to completion; I can keep a WeakReference to an iterator so that I can detect when it's been garbage collected, but I can't think of a good way to detect this other than by using a thread that iterates through the collection of weak references and then sleeps for several minutes - ideally the thread would block on the WeakReference and be notified when the wrapped reference is GC'd, but I don't think this is an option.
ConcurrentSkipListMap<Long, WeakReference<Iterator>> iteratorMap;
while(true) {
long latestGC = 0;
for(Map.Entry<Long, WeakReference<Iterator>> entry : iteratorMap.entrySet()) {
if(entry.getValue().get() == null) {
iteratorMap.remove(entry.getKey());
latestGC = entry.getKey();
} else break;
}
// remove ConcurrentHashMap entries with timestamps less than `latestGC`
Thread.sleep(300000); // five minutes
}
Edit: To clear up some confusion in the answers and comments, I'm currently passing weakly consistent iterators to code written by another division in the company, and they have asked me to increase the strength of the iterators' consistency. They are already aware of the fact that it is infeasible for me to make 100% consistent iterators, they just want a best effort on my part. They care more about throughput than iterator consistency, so coarse-grained locks are not an option.
What is your actual use case that requires a special implementation? From the Javadoc of ConcurrentHashMap (emphasis added):
Retrievals reflect the results of the most recently completed update operations holding upon their onset. ... Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.
So the regular ConcurrentHashMap.values().iterator() will give you a "consistent" iterator, but only for one-time use by a single thread. If you need to use the same "snapshot" multiple times and/or by multiple threads, I suggest making a copy of the map.
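A minimal sketch of the copy approach (class and method names are made up; it assumes the values themselves are immutable or otherwise safe to share):
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
public class SnapshotByCopy<K, V> {
    private final ConcurrentHashMap<K, V> live = new ConcurrentHashMap<>();
    public void put(K key, V value) {
        live.put(key, value);
    }
    // The copy constructor walks the live map with its weakly consistent
    // iterator; the resulting HashMap is a private, stable snapshot that
    // can be handed to multiple threads (treated as read-only by convention).
    public Map<K, V> snapshot() {
        return new HashMap<>(live);
    }
}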
EDIT: With the new information and the insistence for a "strongly consistent" iterator, I offer this solution. Please note that the use of a ReadWriteLock has the following implications:
Writes will be serialized (only one writer at a time) so write performance may be impacted.
Concurrent reads are allowed as long as there is no write in progress, so read performance impact should be minimal.
Active readers block writers but only as long as it takes to retrieve the reference to the current "snapshot". Once a thread has the snapshot, it no longer blocks writers no matter how long it takes to process the information in the snapshot.
Readers are blocked while any write is active; once the write finishes then all readers will have access to the new snapshot until a new write replaces it.
Consistency is achieved by serializing the writes and making a copy of the current values on each and every write. Readers that hold a reference to a "stale" snapshot can continue to use the old snapshot without worrying about modification, and the garbage collector will reclaim old snapshots as soon as no one is using it any more. It is assumed that there is no requirement for a reader to request a snapshot from an earlier point in time.
Because snapshots are potentially shared among multiple concurrent threads, the snapshots are read-only and cannot be modified. This restriction also applies to the remove() method of any Iterator instances created from the snapshot.
import java.util.*;
import java.util.concurrent.locks.*;

public class StackOverflow16600019<K, V> {
    private final ReadWriteLock locks = new ReentrantReadWriteLock();
    private final HashMap<K, V> map = new HashMap<>();
    private Collection<V> valueSnapshot = Collections.emptyList();

    public V put(K key, V value) {
        locks.writeLock().lock();
        try {
            V oldValue = map.put(key, value);
            updateSnapshot();
            return oldValue;
        } finally {
            locks.writeLock().unlock();
        }
    }

    public V remove(K key) {
        locks.writeLock().lock();
        try {
            V removed = map.remove(key);
            updateSnapshot();
            return removed;
        } finally {
            locks.writeLock().unlock();
        }
    }

    public Collection<V> values() {
        locks.readLock().lock();
        try {
            return valueSnapshot; // read-only!
        } finally {
            locks.readLock().unlock();
        }
    }

    /** Callers MUST hold the WRITE LOCK. */
    private void updateSnapshot() {
        valueSnapshot = Collections.unmodifiableCollection(
                new ArrayList<V>(map.values())); // copy
    }
}
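Usage would look roughly like this (keys and values are made up for the sketch):
import java.util.Collection;
public class StackOverflow16600019Demo {
    public static void main(String[] args) {
        StackOverflow16600019<String, Integer> holder = new StackOverflow16600019<>();
        holder.put("a", 1);
        holder.put("b", 2);
        // Grab the current snapshot once; later writes swap in a new snapshot
        // but never mutate this collection, so iteration needs no lock.
        Collection<Integer> snapshot = holder.values();
        holder.put("c", 3); // does not affect the snapshot we already hold
        for (Integer v : snapshot) {
            System.out.println(v);
        }
    }
}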
I've found that the ctrie is the ideal solution - it's a concurrent hash array mapped trie with constant-time snapshots.
Solution 1) What about just synchronizing on the puts and on the iteration? That should give you a consistent snapshot.
Solution 2) Start iterating and set a boolean flag to say so, then override put and putAll so that they go into a queue; when the iteration is finished, simply apply those queued puts with the changed values.
I have enough knowledge of creating synchronized static objects.
However for a Map (Collection) in Java,
I found separate default implementations in Java (one for a synchronized map and one for a singleton map):
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Collections.html#synchronizedMap(java.util.Map)
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Collections.html#singletonMap(K, V)
I am thinking of getting the desired result with the following implementation:
Map<K,V> initMap = new HashMap<K,V>();
Map<K,V> syncSingMap = Collections.synchronizedMap(Collection.singletonMap(initMap));
Am I making sense here? Because the documentation at Oracle shows a warning about this:
It is imperative that the user manually synchronize on the returned map when iterating over any of its collection views:
Map m = Collections.synchronizedMap(new HashMap());
...
Set s = m.keySet(); // Needn't be in synchronized block
...
synchronized(m) { // Synchronizing on m, not s!
Iterator i = s.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
Failure to follow this advice may result in non-deterministic behavior
How about using ConcurrentMap instead?
Requirement: a static synchronized singleton map which will be used by tons of threads for some processing operations.
UPDATE
After going through a few articles, I found that ConcurrentMap is much preferable to HashMap in a multi-threaded environment:
http://java.dzone.com/articles/java-7-hashmap-vs
Collections.singletonMap returns an immutable Map with exactly one entry, not a "singleton" in the sense of "only one exists in your application." (If you use Collections.singletonMap, there's no need to synchronize it, since it's unmodifiable.)
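A quick sketch of that distinction (class name made up):
import java.util.Collections;
import java.util.Map;
public class SingletonMapDemo {
    public static void main(String[] args) {
        // An immutable map holding exactly one entry; not an application-wide
        // "singleton" instance.
        Map<String, String> single = Collections.singletonMap("key", "value");
        System.out.println(single);
        try {
            single.put("another", "value"); // modification is not allowed
        } catch (UnsupportedOperationException expected) {
            System.out.println("singletonMap is immutable: " + expected);
        }
    }
}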
Use ConcurrentMap if you are using Java 6+:
public class MapHolder {
    public static final ConcurrentMap<String, Object> A_MAP = new ConcurrentHashMap<String, Object>();
}
It's better to use ConcurrentHashMap for performance reasons as well: synchronizedMap locks the whole map instance and reduces performance, whereas ConcurrentHashMap uses highly optimized algorithms to achieve a high level of concurrency.
For example, ConcurrentHashMap has a lock for each hash bucket, so multiple threads can update the map at the same time.
ConcurrentHashMap is better than synchronizedMap.