I have a container class holding a collection that is going to be used by multiple threads:
    public class Container {

        private Map<String, String> map;
        // ctor, other methods reading the map

        public void doSomeWithMap(String key, String value) {
            // do some thread-safe action
            map.put(key, value);
            // do something else, also thread-safe
        }
    }
What would be better, to declare the method synchronized:
public synchronized void doSomeWithMap(String key, String value)
or to use the standard thread-safe decorator?
Collections.synchronizedMap(map);
Generally speaking, synchronizing the map will protect most access to it without your having to think about it further. However, the "synchronized map" is not safe for iteration, which may be an issue depending on your use case: it is imperative that the user manually synchronize on the returned map when iterating over any of its collection views.
Consider using ConcurrentHashMap if that will meet your use case.
If there is other state to this object that needs to be protected from concurrency errors, then you will need to use synchronized or a Lock.
If your doSomeWithMap method will access the map more than once, you must synchronize the doSomeWithMap method. If the only access is the put() call shown, then it's better to use a ConcurrentHashMap.
Note that "more than once" is any call, and an iterator is by nature many "gets".
If you look at the implementation of SynchronizedMap, you'll see that it's simply a wrapper around a non-thread-safe map that synchronizes on a mutex before calling any method:
    public V get(Object key) {
        synchronized (mutex) {return m.get(key);}
    }

    public V put(K key, V value) {
        synchronized (mutex) {return m.put(key, value);}
    }

    public Set<Map.Entry<K,V>> entrySet() {
        synchronized (mutex) {
            if (entrySet == null)
                entrySet = new SynchronizedSet<>(m.entrySet(), mutex);
            return entrySet;
        }
    }
If all you want is protecting get and put, this implementation does it for you.
However it's not suitable if you want a Map that can be iterated over and updated by two or more threads, in which case you should use a ConcurrentHashMap.
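For reference, this is the iteration pattern the synchronizedMap Javadoc requires, shown as a minimal, self-contained sketch (the class and variable names are mine):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    public class SynchronizedMapIteration {
        public static void main(String[] args) {
            Map<String, String> map = Collections.synchronizedMap(new HashMap<String, String>());
            map.put("a", "1");

            // The wrapper locks each individual call, but iteration spans many calls,
            // so the caller must hold the map's monitor for the whole traversal.
            synchronized (map) {
                for (Map.Entry<String, String> entry : map.entrySet()) {
                    System.out.println(entry.getKey() + "=" + entry.getValue());
                }
            }
        }
    }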
If the other things you do inside doSomeWithMap would cause problems if done concurrently by different threads (for example, they update class-level variables) then you should synchronise the whole method. If this is not the case then you should use a synchronised Map in order to minimise the length of time the synchronisation lock is left in place.
Whether you need the synchronized block depends on your requirements. Please note the following.
When you use a concurrent collection like ConcurrentHashMap, or the Collections wrapper methods such as synchronizedMap(), synchronizedList(), etc., only the Map/List itself is synchronized. To explain a little further,
Consider,
Map<String, Object> map = new HashMap<>();
Map<String, Object> synchMap = Collections.synchronizedMap(map);
This makes the map's operations (such as get) synchronized, but not the objects stored inside it.
Object o = synchMap.get("1");// Object referenced by o is not synchronized. Only the map is.
If you want to protect the objects inside the Map, then you also have to put the code that uses them inside a synchronized block. This is good to remember, as many people forget to safeguard the objects themselves.
See the Javadoc for Collections.synchronizedMap for a little more info, too.
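Here is a small sketch of that distinction (the class name, the StringBuilder values, and the appendTo method are illustrative assumptions, not from the question):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    public class ValueNotProtected {

        private final Map<String, StringBuilder> synchMap =
                Collections.synchronizedMap(new HashMap<String, StringBuilder>());

        public void appendTo(String key, String text) {
            StringBuilder sb = synchMap.get(key);  // this call is synchronized by the wrapper
            if (sb != null) {
                // sb itself is NOT protected by the map's lock; two threads appending
                // concurrently could corrupt it, so every access to the value must
                // synchronize on the value (or on some other agreed lock).
                synchronized (sb) {
                    sb.append(text);
                }
            }
        }
    }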
Related
I need to add a static, thread-safe HashMap. I have something like this:
    private static Map<String, ConcurrentHashMap<Integer, ClassA>> myCache =
            new ConcurrentHashMap<String, ConcurrentHashMap<Integer, ClassA>>();
Even though I am using ConcurrentHashMap, I see that if the main thread adds an element to myCache, and the same key is accessed in another thread, even at a later time, that thread does not have the latest data in the myCache object. For example:
Thread 1: Adds entry to the map
myCache = {a1={1=com.x.y.z.ClassA@3ec8b657}}
Thread 2: Accesses the same key a1, but does not see the data added by Thread 1. Instead it sees an empty value for this key: myCache = {a1={}}
As a result, data is getting corrupted. Entries added for the a1 key in Thread 1 are not visible in Thread 2.
Thanks in advance for any pointers on how I can update this map in a thread-safe manner.
Even though I am using ConcurrentHashMap, I see that if the main thread adds an element to myCache, and the same key is accessed in another thread, even at a later time, that thread does not have the latest data in the myCache object.
ConcurrentHashMap is running in a huge number of applications around the world at this very moment. If your fairly typical use case didn't work correctly, many critical systems would be failing.
Something is going on but chances are high that it is nothing to do with ConcurrentHashMap. So here are some questions for you to help you debug your code:
Are you sure that the cache lookup happens after the cache put? ConcurrentHashMap doesn't save you from race conditions in your code.
Any chance this is a problem with the hashCode() or equals() methods of the key object? By default, hashCode() and equals() compare object identity, not object value. See the consistency requirements.
Any chance that the cache lookup happens after a cache remove or a timeout of the cache value? Is there cache cleanup logic?
Do you have logic that does two operations on the ConcurrentHashMap, for example testing for the existence of a cache entry and then making another call to put the value? If so, you should be using putIfAbsent(...) or one of the other atomic calls (see the sketch after this answer).
If you edit your post and show a small sample of your code with the key object, the real source of the issue may be revealed.
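On the check-then-put point above, here is a minimal sketch of the racy pattern next to the atomic alternative (class and method names are mine, and String stands in for the question's ClassA):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class CacheExample {

        private final ConcurrentMap<String, ConcurrentMap<Integer, String>> myCache =
                new ConcurrentHashMap<String, ConcurrentMap<Integer, String>>();

        // Racy: another thread may insert between containsKey() and put()
        public void addRacy(String outerKey) {
            if (!myCache.containsKey(outerKey)) {
                myCache.put(outerKey, new ConcurrentHashMap<Integer, String>());
            }
        }

        // Atomic: exactly one inner map ever wins for a given key
        public ConcurrentMap<Integer, String> addAtomic(String outerKey) {
            ConcurrentMap<Integer, String> fresh = new ConcurrentHashMap<Integer, String>();
            ConcurrentMap<Integer, String> existing = myCache.putIfAbsent(outerKey, fresh);
            return existing != null ? existing : fresh;
        }
    }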
I've never had good results with ConcurrentHashMap. When I need thread safety, I usually do something like this:
    public class Cache<K, V> {

        private Map<K, V> cache = new HashMap<>();

        public V get(K key) {
            synchronized (cache) {
                return cache.get(key);
            }
        }

        public void put(K key, V value) {
            synchronized (cache) {
                cache.put(key, value);
            }
        }
    }
@Ryan's answer is essentially correct.
Remember that you must proxy every Map method that you wish to use,
and you must synchronize on the cache field within every proxied method.
For example:
    public void clear() {
        synchronized (cache) {
            cache.clear();
        }
    }
I have the class below:
    public class LRUCache {

        private HashMap<String, String> dataMap;
        private HashMap<String, String> analyticsMap;

        public void put(String key, String value) {
            dataMap.put(key, value);
            String date = getCurrentDateAsString();
            analyticsMap.put(key, date);
        }

        public String get(String key) {
            String date = analyticsMap.get(key);
            boolean dateExpired = isDateExpired(date);
            String value = null;
            if (!dateExpired)
                value = dataMap.get(key);
            return value;
        }
    }
In the above class I have two HashMaps, which are accessed in the get and put methods. How do I make this class thread-safe?
Do I need to synchronize both get and put, and would that solve my problem?
In general, if I have more than one piece of state in a class, should I guard them with synchronized methods instead of giving each of them its own ConcurrentHashMap?
Merely using ConcurrentHashMap structures doesn't make your LRUCache class thread-safe. You'd need to properly control access so no other thread can modify the underlying contents when you're doing multi-step put/get operations. This can be accomplished with synchronized methods, or with ReentrantReadWriteLock read/write locks.
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html
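A rough sketch of the read/write-lock approach applied to the LRUCache from the question (the lock field is mine; getCurrentDateAsString and isDateExpired are assumed from the question and stubbed out here only so the example compiles):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class LRUCache {

        private final Map<String, String> dataMap = new HashMap<>();
        private final Map<String, String> analyticsMap = new HashMap<>();
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        public void put(String key, String value) {
            lock.writeLock().lock();           // exclusive: both maps are updated together
            try {
                dataMap.put(key, value);
                analyticsMap.put(key, getCurrentDateAsString());
            } finally {
                lock.writeLock().unlock();
            }
        }

        public String get(String key) {
            lock.readLock().lock();            // shared: many readers, no writer
            try {
                String date = analyticsMap.get(key);
                return (date != null && !isDateExpired(date)) ? dataMap.get(key) : null;
            } finally {
                lock.readLock().unlock();
            }
        }

        // stubs standing in for the question's helpers
        private String getCurrentDateAsString() { return String.valueOf(System.currentTimeMillis()); }
        private boolean isDateExpired(String date) { return false; }
    }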
From the official Javadoc (my highlights) https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html :
A hash table supporting full concurrency of retrievals and high expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access. This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
I'd use a ReentrantLock (https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantLock.html); then you can synchronize just a block rather than the whole method.
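A minimal sketch of that suggestion (the class, field, and method names are mine):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.locks.ReentrantLock;

    public class LockedCache {

        private final Map<String, String> map = new HashMap<>();
        private final ReentrantLock lock = new ReentrantLock();

        public void put(String key, String value) {
            lock.lock();          // lock only the critical section, not the whole method
            try {
                map.put(key, value);
            } finally {
                lock.unlock();    // always release in finally
            }
        }
    }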
Today I was asked a question in my interview.
The question was that Collections.synchronizedMap() is
used to synchronize maps that are not thread-safe by default, such as HashMap.
His follow-up was: but we can pass any kind of map into this method.
So what is the effect of passing a Hashtable into this method, given that Hashtable is synchronized by default?
The behavior of the map will be the same, but the performance will be affected, because each method will acquire two synchronization locks instead of one.
For example, consider calling the method size() on the resulting map. The implementation in the Collections.SynchronizedMap class looks like this:
    public int size() {
        synchronized (mutex) {return m.size();} // first lock
    }
... where m.size() calls the implementation in Hashtable:
    public synchronized int size() { // second lock
        return count;
    }
The first lock object is the mutex field in SynchronizedMap. The second lock is implicit - the Hashtable instance itself.
You would have two synchronization levels: one at the level of the synchronized map itself, implemented by a mutex object, and one at the level of the wrapped instance:
    public boolean isEmpty() {
        // first level synchronization
        synchronized (mutex) {
            // second level synchronization if c is a Hashtable
            return c.isEmpty();
        }
    }
The additional synchronization is not needed and can lead to lower performance.
Another effect is that you won't be able to use API from Hashtable like Hashtable#elements since the wrapped collection is now strictly a Map instance.
It will get wrapped into a SynchronizedMap, from java.util.Collections:
    public static <K,V> Map<K,V> synchronizedMap(Map<K,V> m) {
        return new SynchronizedMap<>(m);
    }
The synchronizedMap() method does not distinguish between the types of Maps passed into it.
"His question is but we can pass any kind of map inside this method."
The answer is yes, because the constructor of SynchronizedMap accepts every Map in it's signature.
"So what is the effect when we pass a hashtable inside this method because hashtable is by default synchronized"
The answer is: We are showing ignorance to the ConcurrentHashMap which is most likely the tool to be uses instead of a blocking implementation.
If you look at the code of SynchronizedCollection, you'll see that its methods delegate the call to the underlying collection but wrap it in a synchronized block, something like this:
    public int size() {
        synchronized (mutex) {return c.size();}
    }
The implementation of size() looks like this in the Hashtable class:
    public synchronized int size() {
        return count;
    }
So, if you pass a Hashtable to SynchronizedCollection, a thread accessing it through the SynchronizedCollection has to take locks at two levels: once for the synchronized block and once more for the synchronized method.
If there are other threads using the Hashtable object directly, they can block the threads using the SynchronizedCollection even while those threads hold the SynchronizedCollection's lock.
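A tiny sketch of the double-locking setup these answers describe (the class and variable names are mine):

    import java.util.Collections;
    import java.util.Hashtable;
    import java.util.Map;

    public class DoubleLockDemo {
        public static void main(String[] args) {
            Hashtable<String, String> table = new Hashtable<>();              // lock #2: the table's own monitor
            Map<String, String> wrapped = Collections.synchronizedMap(table); // lock #1: the wrapper's mutex

            // Each call below first acquires the wrapper's mutex, and then Hashtable's
            // synchronized method acquires the table's monitor as well.
            wrapped.put("a", "1");
            wrapped.size();

            // A thread using `table` directly contends only on lock #2, so it can delay
            // threads going through `wrapped` even while they already hold lock #1.
            table.get("a");
        }
    }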
Does a static ConcurrentHashMap need to be externally synchronized using synchronized blocks or locks?
Yes and no. It depends on what you're doing. ConcurrentHashMap is thread-safe for all of its methods (e.g. get and put). However, it is not thread-safe for non-atomic operations. Here is an example of a method that performs a non-atomic operation:
    public class Foo {

        Map<String, Object> map = new ConcurrentHashMap<String, Object>();

        public Object getFoo(String bar) {
            Object value = map.get(bar);
            if (value == null) {
                value = new Object();
                map.put(bar, value);
            }
            return value;
        }
    }
The flaw here is that it is possible for two threads calling getFoo to receive different Objects. Remember that when dealing with any data structure or type, even one as simple as an int, non-atomic operations always require external synchronization. Classes such as AtomicInteger and ConcurrentHashMap assist in making some common operations thread-safe, but do not protect against check-then-set operations such as in getFoo above.
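Assuming Java 8 or later, here is a sketch of how that lookup can be made atomic with the map's own API (computeIfAbsent; on older JDKs, putIfAbsent is the usual substitute):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class Foo {

        private final Map<String, Object> map = new ConcurrentHashMap<>();

        // Atomic check-then-put: every caller passing the same key gets the same Object
        public Object getFoo(String bar) {
            return map.computeIfAbsent(bar, k -> new Object());
        }
    }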
You only need external synchronization if you need to obtain a lock on the collection. The collection doesn't expose its internal locks.
ConcurrentMap has putIfAbsent; however, if the creation of the object is expensive you may not want to use it.
    final ConcurrentMap<Key, Value> map = new ConcurrentHashMap<>();

    public Value get(Key key) {
        // allow concurrent read
        return map.get(key);
    }

    public Value getOrCreate(Key key) {
        // could put an extra check here to avoid synchronization.
        synchronized (map) {
            Value val = map.get(key);
            if (val == null)
                map.put(key, val = new ExpensiveValue(key));
            return val;
        }
    }
As far as I know, all the necessary locking is done inside this class, so you don't need to worry about it too much unless you are doing something specific that requires compound operations.
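Here is a sketch of the "extra check" the comment in the code above hints at, so the common case avoids taking the lock at all (the concrete types and the createExpensiveValue stub are mine, standing in for the answer's Key/Value/ExpensiveValue):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class ExpensiveValueCache {

        private final ConcurrentMap<String, Object> map = new ConcurrentHashMap<>();

        public Object getOrCreate(String key) {
            Object val = map.get(key);      // fast path: no lock when the value already exists
            if (val != null) {
                return val;
            }
            synchronized (map) {            // slow path: create at most once
                val = map.get(key);         // re-check under the lock
                if (val == null) {
                    val = createExpensiveValue(key);
                    map.put(key, val);
                }
                return val;
            }
        }

        private Object createExpensiveValue(String key) {  // stand-in for the expensive constructor
            return new Object();
        }
    }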
On http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html it says:
However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access.
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset.
So as long as this does not present a problem for your specific application, you do not need to worry about it.
No: No need to synchronise externally.
All methods on the java.util.concurrent classes are thread-safe.
Is the following code set up to correctly synchronize the calls on synchronizedMap?
    public class MyClass {

        private static Map<String, List<String>> synchronizedMap =
                Collections.synchronizedMap(new HashMap<String, List<String>>());

        public void doWork(String key) {
            List<String> values = null;
            while ((values = synchronizedMap.remove(key)) != null) {
                // do something with values
            }
        }

        public static void addToMap(String key, String value) {
            synchronized (synchronizedMap) {
                if (synchronizedMap.containsKey(key)) {
                    synchronizedMap.get(key).add(value);
                } else {
                    List<String> valuesList = new ArrayList<String>();
                    valuesList.add(value);
                    synchronizedMap.put(key, valuesList);
                }
            }
        }
    }
From my understanding, I need the synchronized block in addToMap() to prevent another thread from calling remove() or containsKey() before I get through the call to put(). However, I do not need a synchronized block in doWork(), because another thread cannot enter the synchronized block in addToMap() before remove() returns, since I created the map with Collections.synchronizedMap(). Is that correct? Is there a better way to do this?
Collections.synchronizedMap() guarantees that each atomic operation you want to run on the map will be synchronized.
Running two (or more) operations on the map, however, must be synchronized in a block.
So yes - you are synchronizing correctly.
If you are using JDK 6 then you might want to check out ConcurrentHashMap
Note the putIfAbsent method in that class.
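For reference, a sketch of what addToMap could look like with putIfAbsent (a sketch under the assumption that the value lists are also made thread-safe, e.g. with Collections.synchronizedList; note the caveat about synchronization details discussed in the next answer):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class MyClass {

        private static final ConcurrentMap<String, List<String>> map =
                new ConcurrentHashMap<String, List<String>>();

        public static void addToMap(String key, String value) {
            List<String> values = map.get(key);
            if (values == null) {
                List<String> fresh = Collections.synchronizedList(new ArrayList<String>());
                List<String> existing = map.putIfAbsent(key, fresh);   // atomic insert-if-missing
                values = (existing != null) ? existing : fresh;
            }
            // Caveat: a value added here just after another thread has removed the list
            // may be lost, so this only fits some usage patterns (see the next answer).
            values.add(value);
        }
    }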
There is the potential for a subtle bug in your code.
[UPDATE: Since he's using map.remove() this description isn't totally valid. I missed that fact the first time thru. :( Thanks to the question's author for pointing that out. I'm leaving the rest as is, but changed the lead statement to say there is potentially a bug.]
In doWork() you get the List value from the Map in a thread-safe way. Afterward, however, you are accessing that list in an unsafe manner. For instance, one thread may be using the list in doWork() while another thread invokes synchronizedMap.get(key).add(value) in addToMap(). Those two accesses are not synchronized. The rule of thumb is that a collection's thread-safety guarantees don't extend to the keys or values it stores.
You could fix this by inserting a synchronized list into the map like
List<String> valuesList = new ArrayList<String>();
valuesList.add(value);
synchronizedMap.put(key, Collections.synchronizedList(valuesList)); // sync'd list
Alternatively you could synchronize on the map while you access the list in doWork():
    public void doWork(String key) {
        List<String> values = null;
        while ((values = synchronizedMap.remove(key)) != null) {
            synchronized (synchronizedMap) {
                // do something with values
            }
        }
    }
The last option will limit concurrency a bit, but is somewhat clearer IMO.
Also, a quick note about ConcurrentHashMap. This is a really useful class, but is not always an appropriate replacement for synchronized HashMaps. Quoting from its Javadocs,
This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
In other words, putIfAbsent() is great for atomic inserts but does not guarantee other parts of the map won't change during that call; it guarantees only atomicity. In your sample program, you are relying on the synchronization details of (a synchronized) HashMap for things other than put()s.
Last thing. :) This great quote from Java Concurrency in Practice always helps me in designing and debugging multi-threaded programs.
For each mutable state variable that may be accessed by more than one thread, all accesses to that variable must be performed with the same lock held.
Yes, you are synchronizing correctly. I will explain this in more detail.
You only need to synchronize a sequence of two or more method calls on the synchronizedMap object when a later call in the sequence relies on the result of an earlier one.
Let’s take a look at this code:
    synchronized (synchronizedMap) {
        if (synchronizedMap.containsKey(key)) {
            synchronizedMap.get(key).add(value);
        } else {
            List<String> valuesList = new ArrayList<String>();
            valuesList.add(value);
            synchronizedMap.put(key, valuesList);
        }
    }
In this code
synchronizedMap.get(key).add(value);
and
synchronizedMap.put(key, valuesList);
method calls rely on the result of the previous
synchronizedMap.containsKey(key)
method call.
If the sequence of method calls were not synchronized the result might be wrong.
For example, say thread 1 is executing the method addToMap() and thread 2 is executing the method doWork().
The sequence of method calls on the synchronizedMap object might be as follows:
Thread 1 has executed the method
synchronizedMap.containsKey(key)
and the result is "true".
After that, the operating system has switched execution control to thread 2 and it has executed
synchronizedMap.remove(key)
After that, execution control has been switched back to thread 1 and it has executed, for example,
synchronizedMap.get(key).add(value);
believing the synchronizedMap object still contains the key, and a NullPointerException will be thrown because synchronizedMap.get(key) will return null.
If the method calls in the sequence on the synchronizedMap object do not depend on each other's results, then you don't need to synchronize the sequence.
For example you don't need to synchronize this sequence:
synchronizedMap.put(key1, valuesList1);
synchronizedMap.put(key2, valuesList2);
Here
synchronizedMap.put(key2, valuesList2);
method call does not rely on the results of the previous
synchronizedMap.put(key1, valuesList1);
method call (it does not care if some thread has interfered in between the two method calls and for example has removed the key1).
That looks correct to me. If I were to change anything, I would stop using the Collections.synchronizedMap() and synchronize everything the same way, just to make it clearer.
Also, I'd replace
    if (synchronizedMap.containsKey(key)) {
        synchronizedMap.get(key).add(value);
    } else {
        List<String> valuesList = new ArrayList<String>();
        valuesList.add(value);
        synchronizedMap.put(key, valuesList);
    }
with
    List<String> valuesList = synchronizedMap.get(key);
    if (valuesList == null) {
        valuesList = new ArrayList<String>();
        synchronizedMap.put(key, valuesList);
    }
    valuesList.add(value);
The way you have synchronized is correct, but there is a catch.
The synchronized wrapper provided by the collections framework ensures that the individual method calls, i.e. add/get/contains, run mutually exclusively.
However, in the real world you generally query the map before putting in a value. You therefore need to perform two operations, and hence a synchronized block is needed. So the way you have used it is correct.
However, you could have used a concurrent Map implementation from the collections framework. ConcurrentHashMap's benefits are:
a. It has a putIfAbsent API, which does the same thing in a more efficient manner.
b. It is efficient: ConcurrentHashMap locks only a portion of the map internally rather than blocking the whole map, whereas you have locked the keys as well as the values.
c. You could pass the reference to your map object somewhere else in your codebase, where you or another developer on your team might end up using it incorrectly, i.e. just calling add() or get() without locking on the map object. That call would not run mutually exclusively to your synchronized block. Using a concurrent implementation gives you peace of mind that it
can never be used incorrectly in that way.
Check out Google Collections' Multimap, e.g. page 28 of this presentation.
If you can't use that library for some reason, consider using ConcurrentHashMap instead of SynchronizedHashMap; it has a nifty putIfAbsent(K,V) method with which you can atomically add the element list if it's not already there. Also, consider using CopyOnWriteArrayList for the map values if your usage patterns warrant doing so.
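A brief sketch of that combination (class and method names are mine; computeIfAbsent is the Java 8+ relative of putIfAbsent, and whether CopyOnWriteArrayList fits depends on how write-heavy the value lists are):

    import java.util.List;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.CopyOnWriteArrayList;

    public class MultimapSketch {

        private final ConcurrentMap<String, List<String>> map = new ConcurrentHashMap<>();

        public void add(String key, String value) {
            // computeIfAbsent atomically creates the list on first use;
            // CopyOnWriteArrayList keeps concurrent add() and iteration safe,
            // at the cost of copying its array on every write.
            map.computeIfAbsent(key, k -> new CopyOnWriteArrayList<String>()).add(value);
        }
    }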