Hashtable and Collections.synchronizedMap are thread safe but still compound operations like
if (!map_obj.containsKey(key)) {
map_obj.put(key, value);
}
needs external synchronization as:
synchronized(map_obj) {
if (!map_obj.containsKey(key)) {
map_obj.put(key, value);
}
}
Suppose we have ConcurrentHashMap(CHM) instead of Hashtable or HashMap. CHM provides an alternative putIfAbsent() method for the above compound operation, thus removing the need of external synchronization.
But suppose there is no putIfAbsent() provided by CHM. Then can we write following code:
synchronized(concurrenthashmap_obj) {
if (!concurrenthashmap_obj.containsKey(key)) {
concurrenthashmap_obj.put(key, value);
}
}
I mean can we use external synchronization on CHM object?Will it work?
For above compound operation there is putIfAbsent() method in CHM but how can we achieve thread safety for other compound operations if we are using CHM. I mean can we use external synchronization on CHM object?
No, you cannot use external synchronization to ensure atomicity of compound operations over ConcurrentHashMap.
To be precise, you can use external synchronization to ensure atomicity of compound operations, but only if all operations with ConcurrentHashMap are synchronized over the same lock as well (though use of ConcurrentHashMap won't make sense in this case - you can replace it with regular HashMap).
Approach with external synchronization works with Hashtable and Collections.synchronizedMap() only because they guarantee that their primitive operations are synchronized over these objects as well. Since ConcurrentHashMap doesn't provide such a guarantee, primitive operations may interfere with execution of your compound operations, breaking their atomicity.
However, ConcurrentHashMap provides number of methods that can be used to implement compound operations in optimistic manner:
putIfAbsent(key, value)
remove(key, value)
replace(key, value)
replace(key, oldValue, newValue)
You can use these operation to implement certain compound operations without explict synchronization, the same way as you would do for AtomicReference, etc.
There isn't any reason why you can't. Traditional synchronization works with everything, there aren't special exceptions against them. ConcurrentHashMaps simply use more optimized thread-safety mechanisms, if you wish to do something more complex, falling back to traditional synchronization may actually be your only option (that and using locks).
You can always use a synchronized block. The fancy collections in java.util.concurrent do not prohibit it, they just make it redundant for most common use cases. If you are performing a compound operation (e.g. - you want to insert two keys which must always have the same value), not only can you use external synchronization - you must.
E.g.:
String key1 = getKeyFromSomewhere();
String key2 = getKeyFromSomewhereElse();
String value = getValue();
// We want to put two pairs in the map - [key1, value] and [key2, value]
// and be sure that in any point in time both key1 and key2 have the same
// value
synchronized(concurrenthashmap_obj) {
concurrenthashmap_obj.put(key1, value);
// without external syncronoziation, key1's value may have already been
// overwritten from a different thread!
concurrenthashmap_obj.put(key2, value);
}
As the ConcurrentHashMap implements the Map Interface, it does support all features every basic Map does as well. So yes: you can use it like any other map and ignore all the extra features. But then you will essentially have a slower HashMap.
The main difference between a synchronized Map and a concurrent Map is - as the name says - concurrency. Imagine you have 100 threads wanting to read from the Map, if you synchronize you block out 99 threads and 1 can do the work. If you use concurrency 100 threads can work at the same time.
Now if you think about the actual reason why you use threads, you soon come to the conclusion that you should get rid of every possible synchronized block that you can.
It all depends on what you mean by "other compound operation" and by "working". Synchronization works with ConcurrentHashMap exactly the same way as it works with any other object.
So, if you want some complex shared state change to be seen as an atomic change, then all accesses to this shared state must be synchronized, on the same lock. This lock could be the Map itself, or it could be another object.
About java.util.concurrent.ConcurrentHashMap
"is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details: they do not throw ConcurrentModificationException."
"allows concurrency among update operations"
About java memory
Generally speaking the reads are safe from a synchronization standpoint but not a memory standpoint.
See also "http://www.ibm.com/developerworks/java/library/j-jtp03304/".
So synchronizaton and volatile should be used to manage concurrent reading (vs. writing).
About putIfAbsent
putIfAbsent is your friend:
If the specified key is not already associated with a value, associate
it with the given
value. This is equivalent to
if (!map.containsKey(key))
return map.put(key, value);
else
return map.get(key);
except that the action is performed !!!atomically!!!.
Related
I'am trying to clarify HashMap vs ConcurrentHashMap regarding type-safety and also performance. I came across a lot of good articles, but still getting troubles figuring it all out.
Let's take the following example using a ConcurrentHashMap, where I will try to add a value for a key not already there and returning it, the new way of doing it would be:
private final Map<K,V> map = new ConcurrentHashMap<>();
return map.putIfAbsent(k, new Object());
let's assume we don't want to use the putIfAbsent method, the above code should look something like this:
private final Map<K,V> map = new ConcurrentHashMap<>();
synchronized (map) {
V value = map.get(key); //Edit adding the value fetch inside synchronized block
if (!nonNull(value)) {
map.put(key, new Object());
}
}
return map.get(key)
Is the problem with this approach the fact that the whole map is locked whereas in first approach the putIfAbsent method only synchronizes on the bucket on which the hash of the key is, and thus leading to less performance ? Would the second approach work fine with just a HashMap ?
Is the problem with this approach the fact that the whole map is locked
There are two problems with this approach.
It's not intrinsic
The fact that you've acquired the lock on the map reference has zero effect whatsoever, except in regards to any other code that (tries) to acquire this lock. Crucially, ConcurrentHashmap itself does not acquire this lock.
So, if, during that second snippet (with synchronized), some other thread does this:
map.putIfAbsent(key, new Object());
Then it may occur that your map.get(key) call returns null, and nevertheless your followup map.put call ends up overwriting. In other words, that both your thread, and that hypothetical thread running putIfAbsent, both decided to write.
Presumably, if that is just fine in your book, that'd be weird. Why use putIfAbsent and check if map.get returns null in the first place?
Had the other thread done this:
synchronized (map) {
map.putIfAbsent(key, new Object());
}
then there'd be no problem; either your get-check-if-null-then-set code will set and the putIfAbsent call is a noop, or vice versa, but they couldn't possibly both 'decide to write'.
Which leads us to;
This is pointless
There are two different ways to achieve concurrency with maps: Intrinsic and extrinsic. There is zero point in doing both, and they do not interact.
If you have structure whereby all access (both read and write) out of a plain old entirely non-multicore capable java.util.HashMap goes through some shared lock (the hashmap instance itself, or any other lock, long as all threads that interact with that particular map instance use the same one), then that works fine and there is therefore no reason or point to using ConcurrentHashMap instead.
The point of ConcurrentHashMap is to streamline concurrent processes without the use of extrinsic locking: To let the map do the locking.
One of the reasons you want this is that the ConcurrentHashMap impl is significantly faster at the jobs it is capable of doing; these jobs are spelled out explicitly: It's the methods that ConcurrentHashMap has.
Atomicity
The central problem of your code snippet is that it lacks atomicity. Check-then-act is fundamentally broken in concurrent models (in your case: Check: Is key 'k' associated with no value or null?, then Act: Set the mapping of key 'k' to value 'v'). This is broken because what if the thing you checked changes in between? What if you have two threads that both 'check-and-act' and then run simultaneously; then they both check first, then both act first, and broken things ensue: One of the two threads will be acting upon a state that isn't equal to the state as it was when you checked, which means your check's broken.
The right model is act-then-check: Act first, and then check the result of the operation. Of course, this requires redefining, and integrating, the code you wrote explicitly in your snippet, into the very definition of your 'act' phase.
In other words, putIfAbsent is not a convenience method! is a fundamental operation! It's the only way (short of extrinsic locking) to convey the notion of: "Perform the action of associating 'v' with 'k', but only if there is no association yet. I'll check the results of this operation next". There is no way to break that down into if (!map.containsKey(key)) map.put(key, v); because check-then-act does not work in concurrent modelling.
Conclusions
Either get rid of concurrenthashmap, or get rid of synchronized. Having code that uses both is probably broken and even if it isn't, it's error prone, confusing, and I can guarantee you there's a much better way to write it (better in that it is more idiomatic, easier to read, more flexible in the face of future change requests, easier to test, and less likely to have hard-to-test-for bugs in it).
If you can state all operations you need to perform 100% in terms of the methods that CHM has, then do that, because CHM is vastly superior. It even has mechanisms for arbitrary operations: For example, unlike basic hashmaps, you can iterate through a CHM even if other threads are also messing with it, whereas with a normal hashmap you need to hold the lock for the entire duration of the operation, which means any other thread trying to do anything to that hashmap, even just 'ask for its size', need to wait. Hence, for most use cases, CHM results in orders of magnitude better performance.
in first approach the putIfAbsent method only synchronizes on the bucket
That is incorrect, ConcurrentHashMap doesn't synchronize on anything, it uses different mechanics to ensure thread safety.
Would the second approach work fine with just a HashMap ?
Yes, except the second approach is flawed. If using synchronization to make a Map thread-safe, then all access of the Map should use synchronization. As such, it would be best to call Collections.synchronizedMap(map). Performance will be worse than using ConcurrentHashMap.
private final Map<Integer, Object> map = Collections.synchronizedMap(new HashMap<>());
let's assume we don't want to use the putIfAbsent method.
Why? Oh, because it wastes a allocation if the key is already in the map, which is why we should be using computeIfAbsent() instead
map.computeIfAbsent(key, k -> new Object());
I am declaring a Java Map as
Map<String, String> map = Collections.synchronizedMap(new HashMap<String, String>());
to deal with the concurrency issues, and synchronizing on the map for all the operations on it. However, I read that synchronization isn't necessary on a synchronizedMap when the operations are atomic. I checked the Java API and the documentation of HashMap doesn't seem to mention which are atomic, so I'm not sure which are.
I'm synchronizing on the following calls to the map:
map.size()
map.put()
map.remove()
map.get()
But if some are atomic, it seems synchronization isn't necessary for these. Which are atomic?
A synchronized map as the name suggests is synchronized. Every operation on it is atomic in respect to any other operation on it.
You can think of it as if every method of your synchronized map is declared with a synchronized keyword.
Please bear in mind that although individual operations are atomic, if you combine them they're no longer atomic, for instance:
String value = map.get("key");
map.put("key", value+"2");
is not equivalent to your custom synchronized code:
synchronized (map) {
String value = map.get("key");
map.put("key", value+"2");
}
but rather:
synchronized (map) {
String value = map.get("key");
}
synchronized (map) {
map.put("key", value+"2");
}
A HashMap is not guaranteed to have atomic operations. Calling any of its methods from different threads (even size()) may corrupt the map. However, a map obtained using Collections.synchronizedMap will have each call synchronized (and hence thread-safe).
However, you may need higher-level synchronization. For instance, if you test whether a key is present, read the size, or otherwise access something from the map and then do something else with the map based on the result, the map may have changed between the two calls. In that case, you need a synchronized block to make the entire transaction atomic, rather than a synchronized map (that just makes each call atomic).
The map itself is synchronized, not some internal locks. Running more than one operation on the map does require a synchronized block. In any event, if you are using a JDK 1.6 or greater, you should consider using ConcurrentHashMap
ConcurrentHashMap is optimal when you need to ensure data consistency, and each of your threads need a current view of the map. If performance is critical, and each thread only inserts data to the map, with reads happening less frequently, then use the path you've outlined. That said, performance may only be poorer when only a single thread accesses a ConcurrentHashMap at a time, but significantly better when multiple threads access the map concurrently.
this is a passage from JavaDoc regarding ConcurrentHashMap. It says retrieval operations generally do not block, so may overlap with update operations. Does this mean the get() method is not thread safe?
"However, even though all operations are thread-safe, retrieval
operations do not entail locking, and there is not any support for
locking the entire table in a way that prevents all access. This class
is fully interoperable with Hashtable in programs that rely on its
thread safety but not on its synchronization details.
Retrieval operations (including get) generally do not block, so may
overlap with update operations (including put and remove). Retrievals
reflect the results of the most recently completed update operations
holding upon their onset."
The get() method is thread-safe, and the other users gave you useful answers regarding this particular issue.
However, although ConcurrentHashMap is a thread-safe drop-in replacement for HashMap, it is important to realize that if you are doing multiple operations you may have to change your code significantly. For example, take this code:
if (!map.containsKey(key))
return map.put(key, value);
else
return map.get(key);
In a multi-thread environment, this is a race condition. You have to use the ConcurrentHashMap.putIfAbsent(K key, V value) and pay attention to the return value, which tells you if the put operation was successful or not. Read the docs for more details.
Answering to a comment that asks for clarification on why this is a race condition.
Imagine there are two threads A, B that are going to put two different values in the map, v1 and v2 respectively, having the same key. The key is initially not present in the map. They interleave in this way:
Thread A calls containsKey and finds out that the key is not present, but is immediately suspended.
Thread B calls containsKey and finds out that the key is not present, and has the time to insert its value v2.
Thread A resumes and inserts v1, "peacefully" overwriting (since put is threadsafe) the value inserted by thread B.
Now thread B "thinks" it has successfully inserted its very own value v2, but the map contains v1. This is really a disaster because thread B may call v2.updateSomething() and will "think" that the consumers of the map (e.g. other threads) have access to that object and will see that maybe important update ("like: this visitor IP address is trying to perform a DOS, refuse all the requests from now on"). Instead, the object will be soon garbage collected and lost.
It is thread-safe. However, the way it is being thread-safe may not be what you expect. There are some "hints" you can see from:
This class is fully interoperable with Hashtable in programs that
rely on its thread safety but not on its synchronization details
To know the whole story in a more complete picture, you need to be aware of the ConcurrentMap interface.
The original Map provides some very basic read/update methods. Even I was able to make a thread-safe implementation of Map; there are lots of cases that people cannot use my Map without considering my synchronization mechanism. This is a typical example:
if (!threadSafeMap.containsKey(key)) {
threadSafeMap.put(key, value);
}
This piece of code is not thread-safe, even though the map itself is. Two threads calling containsKey() at the same time could think there is no such key they both therefore insert into the Map.
In order to fix the problem, we need to do extra synchronization explicitly. Assume the thread-safety of my Map is achieved by synchronized keywords, you will need to do:
synchronized(threadSafeMap) {
if (!threadSafeMap.containsKey(key)) {
threadSafeMap.put(key, value);
}
}
Such extra code needs you to know about the "synchronization details" of the map. In the above example, we need to know that the synchronization is achieved by "synchronized".
ConcurrentMap interface take this one step further. It defines some common "complex" actions that involves multiple access to map. For example, the above example is exposed as putIfAbsent(). With these "complex" actions, users of ConcurrentMap (in most case) don't need to synchronise actions with multiple access to the map. Hence, the implementation of Map can perform more complicated synchronization mechanism for better performance. ConcurrentHashhMap is a good example. Thread-safety is in fact maintained by keeping separate locks for different partitions of the map. It is thread-safe because concurrent access to the map will not corrupt the internal data structure, or cause any update lost unexpected, etc.
With all the above in mind, the meaning of Javadoc will be clearer:
"Retrieval operations (including get) generally do not block" because ConcurrentHashMap is not using "synchronized" for its thread-safety. The logic of get itself takes care of the thread-safeness; and If you look further in the Javadoc:
The table is internally partitioned to try to permit the indicated number
of concurrent updates without contention
Not only is retrieval non-blocking, even updates can happen concurrently. However, non-blocking/concurrent-updates does not means that it is thread-UNsafe. It simply means that it is using some ways other than simple "synchronized" for thread-safety.
However, as the internal synchronization mechanism is not exposed, if you want to do some complicated actions other than those provided by ConcurrentMap, you may need to consider changing your logic, or consider not using ConcurrentHashMap. For example:
// only remove if both key1 and key2 exists
if (map.containsKey(key1) && map.containsKey(key2)) {
map.remove(key1);
map.remove(key2);
}
ConcurrentHashmap.get() is thread-safe, in the sense that
It will not throw any exception, including ConcurrentModificationException
It will return a result that was true at some (recent) time in past. This means that two back-to-back calls to get can return different results. Of course, this true of any other Map as well.
HashMap is divided into "buckets" based on hashCode. ConcurrentHashMap uses this fact. Its synchronization mechanism is based on blocking buckets rather than on entire Map. This way few threads can simultaneously write to few different buckets (one thread can write to one bucket at a time).
Reading from ConcurrentHashMap almost doesn't use synchronization. Synchronization is used when while fetching value for key, it sees null value. Since ConcurrentHashMap can't store null as values (yes, aside from keys, values also can't be nulls) it suggests that fetching null while reading happened in the middle of initializing map entry (key-value pair) by another thread: when key was assigned, but value not yet, and it still holds default null.
In such case reading thread will need to wait until entry will be written fully.
So results from read() will be based on current state of map. If you read value of key that was in the middle of updating you will likely get old value since writing process hasn't finished yet.
get() in ConcurrentHashMap is thread-safe because It reads the value
which is Volatile. And in cases when value is null of any key, then
get() method waits till it gets the lock and then it reads the updated
value.
When put() method is updating CHM, then it sets the value of that key to null, and then it creates a new entry and updates the CHM. This null value is used by get() method as signal that another thread is updating the CHM with the same key.
It just means that when one thread is updating and one thread is reading there is no guarantee that the one that called the ConcurrentHashMap method first, in time, will have their operation occur first.
Think about an update on the item telling where Bob is. If one thread asks where Bob is at about the same time that another thread updates to say he came 'inside', you can't predict whether the reader thread will get Bob's status as 'inside' or 'outside'. Even if the update thread calls the method first, the reader thread might get the 'outside' status.
The threads will not cause each other problems. The code is ThreadSafe.
One thread won't go into an infinite loop or start generating wierd NullPointerExceptions or get "itside" with half of the old status and half of the new.
I have the following class for a Router's table with synchronised methods:
public class RouterTable {
private String tableForRouter;
private Map<String,RouterTableEntry> table;
public RouterTable(String router){
tableForRouter = router;
table = new HashMap<String,RouterTableEntry>();
}
public String owner(){
return tableForRouter;
}
public synchronized void add(String network, String ipAddress, int distance){
table.put(network, new RouterTableEntry(ipAddress, distance));
}
public synchronized boolean exists(String network){
return table.containsKey(network);
}
}
Multiple threads will read and write to the HashMap. I was wondering if it would be best to remove the synchronized on the methods and just use Collections.synchronizedMap(new HashMap<String,RouterTableEntry())` what is the most sensible way in Java to do this?
I would suggest using a ConcurrentHashmap. This is a newer data structure introduced in later version of Java. It provides thread safety and allows concurrent operations, as opposed to a synchronized map, which will do one operation at a time.
If the map is the only place where thread safety is required, then just using the ConcurrentHashmap is fine. However, if you have atomic operations involving more state variables, I would suggest using synchronized code blocks instead of synchronized functions
In the absence of strict requirements about happens-before relationships and point in time correctness, the sensible thing to do in modern java is usually just use a ConcurrentMap.
Otherwise, yes, using a Collections#synchronizedMap is both safer and likely more performant (because you won't enclose any tertiary code that doesn't need synchronization) than manually synchronizing everything yourself.
The best is to use a java.util.concurrent.ConcurrentHashMap, which is designed from the ground up for concurrent access (read & write).
Using synchronization like you do works, but shows high contention and therefore not optimal performance. A collection obtained through Collections.synchronizedMap() would do just the same (it only wraps a standart collection with synchronized methods).
ConcurrentHashMap, on the contrary, used various techniques to be thread-safe and provide good concurrency ; for example, it has (by default) 16 regions, each guarded by a distinct lock, so that up to 16 threads can use it concurrently.
Synchronizing the map will prevent users of your class from doing meaningful synchronization.
They will have no way of knowing if the result from exists is still valid, once they get into there if statement, and will need to do external synchronization.
With the synchronized methods as you show, they could lock on your class until they are done with a block of method calls.
The other option is to do no synchronization and let the user handle that, which they need to do anyway to be safe.
Adding your own synchronization is what was wrong with HashTable.
The current common style tends to prefer Synchronized collections over explicit synchronized qualification on the methods that access them. However, this is not set in stone, and your decision should depend on the way you use this code/will use this code in the future.
Points to consider:
(a) If your map is going to be used by code that is outside of the RouterTable then you need to use a SynchronizedMap.
(b) OTOH, if you are going to add some additional fields to RouterTable, and their values need to be consistent with the values in the map (in other words: you want changes to the map and to the additional fields to happen in one atomic quantum), then you need to use synchrnoized method.
It's been a while since I've used hashtable for anything significant, but I seem to recall the get() and put() methods being synchronized.
The JavaDocs don't reflect this. They simply say that the class Hashtable is synchronized. What can I assume? If several threads access the hashtable at the same time (assuming they are not modifying the same entry), the operations will succeed, right? I guess what I'm asking is "Is java.util.Hashtable thread safe?"
Please Guide me to get out of this issue...
It is threadsafe because the get, put, contains methods etc are synchronized. Furthermore, Several threads will not be able to access the hashtable at the same time, regardless of which entries they are modifying.
edit - amended to include the provisio that synchronization makes the hashtable internally threadsafe in that it is modified atomically; it doesn't guard against race conditions in outside code caused by concurrent access to the hashtable by multiple threads.
For general usage it is thread safe.
But you have to understand that it doesent make your application logic around it thread-safe. For e.g. consider implementing to put a value in a map, if its not there already.
This idiom is called putIfAbsent. Its difficult to implement this in a thread-safe manner using HashTable alone. Similarly for the idiom replace(k,V,V).
Hence for certain idioms like putIfAbsent and and replace(K,V,V), I would recommend using ConcurrentHashMap
Hashtable is deprecated. Forget it. If you want to use synchronized collections, use Collections.syncrhonize*() wrapper for that purpose. But these ones are not recommended. In Java 5, 6 new concurrent algorithms have been implemented. Copy-on-write, CAS, lock-free algorithms.
For Map interface there are two concurrent implementations. ConcurrentHashMap (concurrent hash map) and ConcurrentSkipListMap - concurrent sorted map implementaion.
The first one is optimized for reading, so retrievals do not block even while the table is being updated. Writes are also work much faster comparing with synchronized wrappers cause a ConcurrentHashMap consists of not one but a set of tables, called segments. It can be managed by the last argument in the constructor:
public ConcurrentHashMap(int initialCapacity,
float loadFactor,
int concurrencyLevel);
ConcurrentHashMap is indispensable in highly concurrent contexts, where it performs far better than any available alternative.
No. It is 'threadsafe' only to the extent that its methods are synchronized. However it is not threadsafe in general, and it can't be, because classes that export internal state such as Iterators or Enumerations require the use of the internal state to be synchronized as well. That's why the new Collections classes are not synchronized, as the Java designers recognized that thread-safety is up to the user of the class, not the class itself.
I'm asking is "Is java.util.Hashtable thread safe?".
Yes Hashtable is thread safe, If a thread safe is not needed in your application then go through HashMap, In case, If a thread-safe implementation is desired,then it is recommended to use ConcurrentHashMap in place of Hashtable.
Note, that a lot of the answers state that Hashtable is synchronised. but this will give you a very little. The synchronization is on the accessor / mutator methods will stop two threads adding or removing from the map concurrently, but in the real world you will often need additional synchronisation.
Even iterating over a Hashtable's entries is not thread safe unless you also guard the Map from being modified through additional synchronization.
If you look into Hashtable code, you will see that methods are synchronized such as:
public synchronized V get(Object key)
public synchronized V put(K key, V value)
public synchronized boolean containsKey(Object key)
You can keep pressing on control key (command for mac) and then click on any method name in the eclipse to go to the java source code.
Unlike the new collection implementations, Hashtable is synchronized. *If a thread-safe implementation is not needed, it is recommended to use HashMap* in place of Hashtable. If a thread-safe highly-concurrent implementation is desired, then it is recommended to use ConcurrentHashMap in place of Hashtable.
http://download.oracle.com/javase/7/docs/api/java/util/Hashtable.html
Yes, Hashtable thread safe, so only one thread can access a hashtable at any time
HashMap, on the other side, is not thread safe (and thus 'faster').