I am new to the world of Java and am exploring ConcurrentHashMap. While going through the ConcurrentHashMap API, I discovered the putIfAbsent() method:
public V putIfAbsent(K paramK, V paramV) {
    if (paramV == null)
        throw new NullPointerException();
    int i = hash(paramK.hashCode());
    return segmentFor(i).put(paramK, i, paramV, true);
}
Please advise what its functionality is and when we practically need it; if possible, please explain with a small, simple example.
A ConcurrentHashMap is designed so that it can be used by a large number of concurrent Threads.
Now, if you used only the methods provided by the standard Map interface, you would probably write something like this:
if(!map.containsKey("something")) {
map.put("something", "a value");
}
This looks good and seems to do the job, but it is not thread safe. So you might then think, "Ah, but I know about the synchronized keyword," and change it to this:
synchronized(map) {
if(!map.containsKey("something")) {
map.put("something", "a value");
}
}
Which fixes the issue.
What you have now done is lock the entire map for both reads and writes while you check whether the key exists and then add it to the map.
This is a very crude solution. You could implement your own scheme with double-checked locking, per-key locks, and so on, but that is a lot of complicated code that is very prone to bugs.
So, instead you use the solution provided by the JDK.
The ConcurrentHashMap is a clever implementation that divides the Map into regions and locks them individually so that you can have concurrent, thread safe, reads and writes of the map without external locking.
Like the other methods in the implementation, putIfAbsent locks only the key's region, not the whole map, and therefore allows operations to proceed in other regions in the meantime.
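For example, a minimal sketch of putIfAbsent's return-value contract (the class and key names here are illustrative, not from the original post):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PutIfAbsentDemo {
    static final ConcurrentMap<String, String> map = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        // Key absent: the value is stored and null is returned.
        String first = map.putIfAbsent("something", "a value");
        // Key present: the map is unchanged and the existing value is returned.
        String second = map.putIfAbsent("something", "another value");
        System.out.println(first);                 // null
        System.out.println(second);                // a value
        System.out.println(map.get("something"));  // a value
    }
}
```

The check and the insert happen as one atomic step, so no external lock is needed.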
ConcurrentHashMap is used when several threads may access the same map concurrently. In that case, implementing putIfAbsent() manually, like below, is not acceptable:
if (!map.containsKey(key)) {
map.put(key, value);
}
Indeed, two threads might execute the above block in parallel and run into a race condition, where both first test that the key is absent, and then both put their own value in the map, breaking the invariants of the program.
The ConcurrentHashMap thus provides the putIfAbsent() operation which makes sure this is done in an atomic way, avoiding the race condition.
Imagine we need a cache of lazy-initialized named singleton beans. Below is a ConcurrentHashMap based lock-free implementation:
ConcurrentMap<String, Object> map = new ConcurrentHashMap<>();

<T> T getBean(String name, Class<T> cls) throws Exception {
    T b1 = (T) map.get(name);
    if (b1 != null) {
        return b1;
    }
    b1 = cls.newInstance();
    T b2 = (T) map.putIfAbsent(name, b1);
    if (b2 != null) {
        return b2;
    }
    return b1;
}
Note that it solves the same problem as double-checked locking but with no locking.
Related
Will there be a performance hit if I synchronize a SynchronizedMap?
For example:
private static Map<Integer, Integer> intMap = Collections.synchronizedMap(new HashMap<Integer, Integer>());

public static void doSomething(int mapId) {
    synchronized (intMap) {
        Integer id = intMap.get(mapId);
        if (id != null) {
            // do something
        }
        if (id == null) {
            intMap.put(mapId, mapId); // put() needs a value; the original snippet omitted it
        }
    }
}
I have to synchronize explicitly in the above method since it is a combination of operations on the synchronizedMap. Since I have to synchronize anyway, is it better to use a normal HashMap instead of a synchronizedMap?
Will there be a performance issue if I synchronize a synchronizedMap?
Yes, there will be a performance hit. It may not be much, because the JVM is capable of optimizing these locks based on usage, but it will always be there.
There's no reason to use a synchronized map if you're already using the synchronized keyword on that object, unless you have code in other parts of your program that needs a synchronized Map.
Yes. You are doing unnecessary synchronization on an already synchronized collection. It will be just as safe and efficient (in fact more efficient) to remove the extra synchronized block.
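To illustrate, here is a hedged sketch of the same method with no explicit locking at all, assuming the check-then-put is the only compound operation (the stored value is a placeholder, since the original snippet omitted it):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class IntMapDemo {
    private static final ConcurrentMap<Integer, Integer> intMap = new ConcurrentHashMap<>();

    public static void doSomething(int mapId) {
        // Atomic check-then-put: returns the previous value, or null if absent.
        Integer previous = intMap.putIfAbsent(mapId, mapId); // placeholder value
        if (previous != null) {
            // key already existed: do something with the existing value
        }
    }

    static Integer peek(int mapId) {
        return intMap.get(mapId);
    }
}
```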
Further, I performed a small-scale test: https://gist.github.com/AgentTroll/02b77680efe3b92e350f
(Ignore the license; I had it on one of my projects on GitHub, and it's there so people don't think I stole the code.)
I'm trying to find a way to perform multiple operations on a ConcurrentHashMap in an atomic manner.
My logic is like this:
if (!map.contains(key)) {
map.put(key, value);
doSomethingElse();
}
I know there is the putIfAbsent method. But if I use it, I still won't be able to call the doSomethingElse atomically.
Is there any way of doing such things apart from resorting to synchronization / client-side locking?
If it helps, the doSomethingElse in my case would be pretty complex, involving creating and starting a thread that looks for the key that we just added to the map.
If it helps, the doSomethingElse in my case would be pretty complex, involving creating and starting a thread that looks for the key that we just added to the map.
If that's the case, you would generally have to synchronize externally.
In some circumstances (depending on what doSomethingElse() expects the state of the map to be, and what the other threads might do the map), the following may also work:
if (map.putIfAbsent(key, value) == null) {
doSomethingElse();
}
This will ensure that only one thread goes into doSomethingElse() for any given key.
This would work, unless you want all putting threads to wait until the first successful thread puts into the map:
if(map.get(key) == null){
Object ret = map.putIfAbsent(key,value);
if(ret == null){ // I won the put
doSomethingElse();
}
}
Now if many threads are putting with the same key only one will win and only one will doSomethingElse().
If your design demands that the map access and the other operation be grouped without anybody else accessing the map, then you have no choice but to lock them. Perhaps the design can be revisited to avoid this need?
This also implies that all other accesses to the map must be serialized behind the same lock.
You might keep a lock per entry. That would allow concurrent non-locking updates, unless two threads try to access the same element.
class LockedReference<T> {
    final Lock lock = new ReentrantLock();
    final T value;
    LockedReference(T value) { this.value = value; }
}

LockedReference<T> ref = new LockedReference<T>(value);
ref.lock.lock(); // lock on the new reference; there is no contention here
try {
    if (map.putIfAbsent(key, ref) == null) {
        // we have locked on the key before inserting the element
        doSomethingElse();
    }
} finally {
    ref.lock.unlock();
}
Later:
T value;
while (true) {
    LockedReference<T> ref = map.get(key);
    if (ref != null) {
        ref.lock.lock();
        // there is no contention, unless a thread is already working on this entry
        try {
            if (map.containsKey(key)) {
                value = ref.value;
                break;
            } else {
                // key was removed between get and lock; retry
            }
        } finally {
            ref.lock.unlock();
        }
    } else {
        value = null; // key not present
        break;
    }
}
A fancier approach would be rewriting ConcurrentHashMap and have a version of putIfAbsent that accepts a Runnable (which is executed if the element was put). But that would be far far more complex.
Basically, ConcurrentHashMap implements locked segments, which is a middle ground between one lock per entry and one global lock for the whole map.
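For what it's worth, Java 8 later added essentially that "putIfAbsent with a Runnable" idea to the standard API: ConcurrentHashMap.computeIfAbsent() runs the supplied mapping function atomically when the key is absent. A sketch under that assumption (the worker-thread body is hypothetical):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ComputeIfAbsentDemo {
    static final ConcurrentMap<String, Thread> workers = new ConcurrentHashMap<>();

    // Only the thread that actually creates the mapping starts a worker,
    // giving the "doSomethingElse() exactly once per key" guarantee.
    static Thread workerFor(String key) {
        return workers.computeIfAbsent(key, k -> {
            Thread t = new Thread(() -> { /* hypothetical: look for the key */ });
            t.start(); // keep work inside the mapping function short
            return t;
        });
    }
}
```

Note that the Javadoc asks the mapping function to be short and simple, so anything long-running should still be moved outside the call.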
I'm wondering if there's a way in Java to synchronize using two lock objects.
I don't mean locking on either object, I mean locking only on both.
e.g. if I have 4 threads:
Thread A requests a lock using Object1 and Object2
Thread B requests a lock using Object1 and Object3
Thread C requests a lock using Object4 and Object2
Thread D requests a lock using Object1 and Object2
In the above scenario, Thread A and Thread D would share a lock, but Thread B and Thread C would have their own locks. Even though they overlap with one of the two objects, the same lock only applies if it overlaps on both.
So I have a method called by many threads which is going to perform a specific activity type based on a specific database. I have identifier objects for both the database and the activity, and I can guarantee that the action will be thread safe as long as it is not the same activity based on the same database as another thread.
My ideal code would look something like:
public void doActivity(DatabaseIdentifier dbID, ActivityIdentifier actID) {
synchronized( dbID, actID ) { // <--- Not real Java
// Do an action that can be guaranteed thread-safe per unique
// combination of dbIT and actID, but needs to share a
// lock if they are both the same.
}
}
I could create a hashmap of lock objects that are keyed by both the DatabaseIdentifier and the ActivityIdentifier, but I'm going to run into the same synchronization issue when I need to create/access those locks in a thread-safe way.
For now I'm just synchronizing on the DatabaseIdentifier. It's much less likely that there will be multiple activities going on at the same time for one DBIdentifier, so I will only rarely be over-locking. (Can't say the same for the opposite direction though.)
Anyone have a good way to handle this that doesn't involve forcing unnecessary threads to wait?
Thanks!
Have each DatabaseIdentifier keep a set of locks keyed by the ActivityIdentifiers it owns, so that you can call:
public void doActivity(DatabaseIdentifier dbID, ActivityIdentifier actID) {
synchronized( dbID.getLock(actID) ) {
// Do an action that can be guaranteed thread-safe per unique
// combination of dbIT and actID, but needs to share a
// lock if they are both the same.
}
}
then you only need a (short) lock on the underlying collection (use a ConcurrentHashMap) in dbID
in other words
ConcurrentHashMap<ActivityIdentifier, Object> locks = new ConcurrentHashMap<ActivityIdentifier, Object>();

public Object getLock(ActivityIdentifier actID) {
    Object res = locks.get(actID); // avoid unnecessary allocations of Object
    if (res == null) {
        Object newLock = new Object();
        res = locks.putIfAbsent(actID, newLock);
        return res != null ? res : newLock;
    } else {
        return res;
    }
}
This is better than locking the full action on dbID (especially when it is a long action), but still worse than your ideal scenario.
Update in response to comments about EnumMap:
private final Map<ActivityIdentifier, Object> locks; // Map, since unmodifiableMap() doesn't return an EnumMap

/**
 * Initializer ensuring all values are initialized.
 */
{
    EnumMap<ActivityIdentifier, Object> tmp = new EnumMap<ActivityIdentifier, Object>(ActivityIdentifier.class);
    for (ActivityIdentifier e : ActivityIdentifier.values()) {
        tmp.put(e, new Object());
    }
    // the read-only view ensures no modifications can happen after initialization, making this thread-safe
    locks = Collections.unmodifiableMap(tmp);
}

public Object getLock(ActivityIdentifier actID) {
    return locks.get(actID);
}
I think you should go the way of the hashmap, but encapsulate that in a flyweight factory. Ie, you call:
FlyweightAllObjectsLock lockObj = FlyweightAllObjectsLock.newInstance(dbID, actID);
Then lock on that object. The flyweight factory can get a read lock on the map to see if the key is in there, and only do a write lock if it is not. It should reduce the concurrency factor.
You might also want to look into using weak references on that map as well, to avoid keeping memory from garbage collection.
I can't think of a way to do this that really captures your idea of locking a pair of objects. Some low-level concurrency boffin might be able to invent one, but I have my doubts about whether we have the necessary primitives to implement it in Java.
I think the idea of using the pairs as keys to identify lock objects is a good one. If you want to avoid locking, then arrange the lookup so that it doesn't do any.
I would suggest a two-level map, vaguely like:
Map<DatabaseIdentifier, Map<ActivityIdentifier, Object>> locks;
Used vaguely thus:
synchronized (locks.get(databaseIdentifier).get(activityIdentifier)) {
performSpecificActivityOnDatabase();
}
If you know what all the databases and activities are upfront, then just create a perfectly normal map containing all the combinations when your application starts up, and use it exactly as above. The only locking is on the lock objects, and there is no contention.
If you don't know what the databases and activities will be, or there are too many combinations to create a complete map upfront, then you will need to create the map incrementally. This is where Concurrency Fun Times begin.
The straightforward solution is to lazily create the inner maps and the locks, and to protect these actions with normal locks:
Map<ActivityIdentifier, Object> locksForDatabase;
synchronized (locks) {
locksForDatabase = locks.get(databaseIdentifier);
if (locksForDatabase == null) {
locksForDatabase = new HashMap<ActivityIdentifier, Object>();
locks.put(databaseIdentifier, locksForDatabase);
}
}
Object lock;
synchronized (locksForDatabase) {
    lock = locksForDatabase.get(activityIdentifier);
    if (lock == null) {
        lock = new Object();
        locksForDatabase.put(activityIdentifier, lock);
    }
}
synchronized (lock) {
performSpecificActivityOnDatabase();
}
As you are evidently aware, this will lead to too much contention. I mention it only for didactic completeness.
You can improve it by making the outer map concurrent:
ConcurrentMap<DatabaseIdentifier, Map<ActivityIdentifier, Object>> locks;
And:
Map<ActivityIdentifier, Object> newHashMap = new HashMap<ActivityIdentifier, Object>();
Map<ActivityIdentifier, Object> locksForDatabase = locks.putIfAbsent(databaseIdentifier, newHashMap);
if (locksForDatabase == null) locksForDatabase = newHashMap;
Object lock;
synchronized (locksForDatabase) {
    lock = locksForDatabase.get(activityIdentifier);
    if (lock == null) {
        lock = new Object();
        locksForDatabase.put(activityIdentifier, lock);
    }
}
synchronized (lock) {
performSpecificActivityOnDatabase();
}
Your only lock contention there will be on the per-database maps, for the duration of a put and a get, and according to your report, there won't be much of that. You could convert the inner map to a ConcurrentMap to avoid that, but that sounds like overkill.
There will, however, be a steady stream of HashMap instances being created to be fed to putIfAbsent and then being thrown away. You can avoid that with a sort of postmodern atomic remix of double-checked locking; replace the first three lines with:
Map<ActivityIdentifier, Object> locksForDatabase = locks.get(databaseIdentifier);
if (locksForDatabase == null) {
Map<ActivityIdentifier, Object> newHashMap = new HashMap<ActivityIdentifier, Object>();
locksForDatabase = locks.putIfAbsent(databaseIdentifier, newHashMap);
if (locksForDatabase == null) locksForDatabase = newHashMap;
}
In the common case that the per-database map already exists, this will do a single concurrent get. In the uncommon case that it does not, it will do an additional but necessary new HashMap() and putIfAbsent. In the very rare case that it does not, but another thread has also discovered that, one of the threads will be doing a redundant new HashMap() and putIfAbsent. That should not be expensive.
Actually, it occurs to me that this is all a terrible idea, and that you should just stick the two identifiers together to make one double-size key, and use that for lookups in a single ConcurrentHashMap. Sadly, I am too lazy and vain to delete the above. Consider this advice a special prize for reading this far.
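That last suggestion might look like the sketch below; the Pair class is a hypothetical helper, and its equals/hashCode are what make it usable as a map key:

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PairLocks {
    // Hypothetical composite key; any class with proper equals/hashCode works.
    static final class Pair {
        final Object a, b;
        Pair(Object a, Object b) { this.a = a; this.b = b; }
        @Override public boolean equals(Object o) {
            return o instanceof Pair && ((Pair) o).a.equals(a) && ((Pair) o).b.equals(b);
        }
        @Override public int hashCode() { return Objects.hash(a, b); }
    }

    static final ConcurrentMap<Pair, Object> locks = new ConcurrentHashMap<>();

    // One lock object per (dbID, actID) combination, created on demand.
    static Object lockFor(Object dbId, Object actId) {
        Pair key = new Pair(dbId, actId);
        Object lock = locks.get(key);
        if (lock == null) {
            Object fresh = new Object();
            Object raced = locks.putIfAbsent(key, fresh);
            lock = (raced != null) ? raced : fresh;
        }
        return lock;
    }
}
```

Callers would then do `synchronized (PairLocks.lockFor(dbID, actID)) { ... }`.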
PS It always mildly annoys me to see an instance of Object used as nothing but a lock. I propose calling them LockGuffins.
Your hashmap suggestion is what I've done in the past. The only change I'd make is using a ConcurrentHashMap, to minimize the synchronization.
The other issue is how to cleanup the map if the possible keys are going to change.
Does a static ConcurrentHashMap need to be externally synchronized using synchronized blocks or locks?
Yes and no. It depends on what you're doing. ConcurrentHashMap is thread safe for all of its individual methods (e.g. get and put). However, it is not thread safe for compound, non-atomic operations. Here is an example of a method that performs a non-atomic operation:
public class Foo {
    Map<String, Object> map = new ConcurrentHashMap<String, Object>();

    public Object getFoo(String bar) {
        Object value = map.get(bar);
        if (value == null) {
            value = new Object();
            map.put(bar, value);
        }
        return value;
    }
}
The flaw here is that it is possible for two threads calling getFoo to receive different Objects. Remember that when dealing with any data structure or type, even one as simple as an int, non-atomic compound operations always require external synchronization. Classes such as AtomicInteger and ConcurrentHashMap assist in making some common operations thread safe, but do not protect against check-then-act sequences such as the one in getFoo above.
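A corrected sketch of the method above (renamed FooFixed here to keep it distinct), using putIfAbsent so the check and the insert become one atomic step:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class FooFixed {
    final ConcurrentMap<String, Object> map = new ConcurrentHashMap<>();

    public Object getFoo(String bar) {
        Object value = map.get(bar);
        if (value == null) {
            Object fresh = new Object();
            // Atomic check-then-set: only one thread's value ends up in the map,
            // and every caller returns that single winning value.
            Object existing = map.putIfAbsent(bar, fresh);
            value = (existing != null) ? existing : fresh;
        }
        return value;
    }
}
```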
You only need external synchronization if you need to obtain a lock on the collection. The collection doesn't expose its internal locks.
ConcurrentMap has putIfAbsent, however if the creation of the object is expensive you may not want to use this.
final ConcurrentMap<Key, Value> map = new ConcurrentHashMap<>();
public Value get(Key key) {
// allow concurrent read
return map.get(key);
}
public Value getOrCreate(Key key) {
// could put an extra check here to avoid synchronization.
synchronized(map) {
Value val = map.get(key);
if (val == null)
map.put(key, val = new ExpensiveValue(key));
return val;
}
}
As far as I know, all the necessary locking is done inside this class, so you don't need to worry about it too much unless you are doing something specific that needs stronger guarantees.
On http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html it says:
However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access.
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset.
So in case this does not represent any problems in your specific application you do not need to worry about it.
No: there is no need to synchronize externally.
All methods on the java.util.concurrent classes are thread-safe.
Is the following code set up to correctly synchronize the calls on synchronizedMap?
public class MyClass {
private static Map<String, List<String>> synchronizedMap = Collections.synchronizedMap(new HashMap<String, List<String>>());
public void doWork(String key) {
List<String> values = null;
while ((values = synchronizedMap.remove(key)) != null) {
//do something with values
}
}
public static void addToMap(String key, String value) {
synchronized (synchronizedMap) {
if (synchronizedMap.containsKey(key)) {
synchronizedMap.get(key).add(value);
}
else {
List<String> valuesList = new ArrayList<String>();
valuesList.add(value);
synchronizedMap.put(key, valuesList);
}
}
}
}
From my understanding, I need the synchronized block in addToMap() to prevent another thread from calling remove() or containsKey() before I get through the call to put(). However, I do not need a synchronized block in doWork(), because another thread cannot enter the synchronized block in addToMap() before remove() returns, since I created the map originally with Collections.synchronizedMap(). Is that correct? Is there a better way to do this?
Collections.synchronizedMap() guarantees that each individual operation you run on the map is synchronized.
Running two (or more) operations as a unit, however, must be synchronized in a block.
So yes - you are synchronizing correctly.
If you are using JDK 5 or later then you might want to check out ConcurrentHashMap
Note the putIfAbsent method in that class.
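For illustration, here is a hedged sketch of the class above rewritten on ConcurrentHashMap (renamed to avoid clashing with the original; the lists are wrapped with Collections.synchronizedList because putIfAbsent protects the map, not the values):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MyClassCHM {
    private static final ConcurrentMap<String, List<String>> map = new ConcurrentHashMap<>();

    public static void addToMap(String key, String value) {
        List<String> values = map.get(key);
        if (values == null) {
            List<String> fresh = Collections.synchronizedList(new ArrayList<String>());
            // If another thread installed a list first, use that one instead.
            List<String> raced = map.putIfAbsent(key, fresh);
            values = (raced != null) ? raced : fresh;
        }
        values.add(value); // the list itself is synchronized
    }

    public static List<String> take(String key) {
        return map.remove(key); // as in doWork() above
    }
}
```

One caveat: a value can still be added to a list that take() has just removed, so this is only equivalent to the synchronized version if occasionally losing such a value is acceptable.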
There is the potential for a subtle bug in your code.
[UPDATE: Since he's using map.remove() this description isn't totally valid. I missed that fact the first time thru. :( Thanks to the question's author for pointing that out. I'm leaving the rest as is, but changed the lead statement to say there is potentially a bug.]
In doWork() you get the List value from the Map in a thread-safe way. Afterward, however, you are accessing that list in an unsafe manner. For instance, one thread may be using the list in doWork() while another thread invokes synchronizedMap.get(key).add(value) in addToMap(). Those two accesses are not synchronized. The rule of thumb is that a collection's thread-safe guarantees don't extend to the keys or values it stores.
You could fix this by inserting a synchronized list into the map like
List<String> valuesList = new ArrayList<String>();
valuesList.add(value);
synchronizedMap.put(key, Collections.synchronizedList(valuesList)); // sync'd list
Alternatively you could synchronize on the map while you access the list in doWork():
public void doWork(String key) {
List<String> values = null;
while ((values = synchronizedMap.remove(key)) != null) {
synchronized (synchronizedMap) {
//do something with values
}
}
}
The last option will limit concurrency a bit, but is somewhat clearer IMO.
Also, a quick note about ConcurrentHashMap. This is a really useful class, but is not always an appropriate replacement for synchronized HashMaps. Quoting from its Javadocs,
This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
In other words, putIfAbsent() is great for atomic inserts but does not guarantee other parts of the map won't change during that call; it guarantees only atomicity. In your sample program, you are relying on the synchronization details of (a synchronized) HashMap for things other than put()s.
Last thing. :) This great quote from Java Concurrency in Practice always helps me in designing and debugging multi-threaded programs.
For each mutable state variable that may be accessed by more than one thread, all accesses to that variable must be performed with the same lock held.
Yes, you are synchronizing correctly. I will explain this in more detail.
You must synchronize two or more method calls on the synchronizedMap object only when a later method call in the sequence relies on the results of the previous call(s).
Let’s take a look at this code:
synchronized (synchronizedMap) {
if (synchronizedMap.containsKey(key)) {
synchronizedMap.get(key).add(value);
}
else {
List<String> valuesList = new ArrayList<String>();
valuesList.add(value);
synchronizedMap.put(key, valuesList);
}
}
In this code
synchronizedMap.get(key).add(value);
and
synchronizedMap.put(key, valuesList);
method calls rely on the result of the previous
synchronizedMap.containsKey(key)
method call.
If the sequence of method calls were not synchronized the result might be wrong.
For example thread 1 is executing the method addToMap() and thread 2 is executing the method doWork()
The sequence of method calls on the synchronizedMap object might be as follows:
Thread 1 has executed the method
synchronizedMap.containsKey(key)
and the result is "true".
After that, the operating system switched execution control to thread 2, and it executed
synchronizedMap.remove(key)
After that, execution control was switched back to thread 1, and it executed, for example,
synchronizedMap.get(key).add(value);
believing the synchronizedMap object contains the key; a NullPointerException will be thrown, because synchronizedMap.get(key)
will return null.
If the sequence of method calls on the synchronizedMap object is not dependent on the results of each other then you don't need to synchronize the sequence.
For example you don't need to synchronize this sequence:
synchronizedMap.put(key1, valuesList1);
synchronizedMap.put(key2, valuesList2);
Here
synchronizedMap.put(key2, valuesList2);
method call does not rely on the results of the previous
synchronizedMap.put(key1, valuesList1);
method call (it does not matter whether some thread has interfered between the two method calls and, for example, removed key1).
That looks correct to me. If I were to change anything, I would stop using the Collections.synchronizedMap() and synchronize everything the same way, just to make it clearer.
Also, I'd replace
if (synchronizedMap.containsKey(key)) {
synchronizedMap.get(key).add(value);
}
else {
List<String> valuesList = new ArrayList<String>();
valuesList.add(value);
synchronizedMap.put(key, valuesList);
}
with
List<String> valuesList = synchronizedMap.get(key);
if (valuesList == null)
{
    valuesList = new ArrayList<String>();
    synchronizedMap.put(key, valuesList);
}
valuesList.add(value);
The way you have synchronized is correct. But there is a catch:
The synchronized wrapper provided by the collections framework ensures that individual method calls (i.e. add/get/contains) run mutually exclusively.
However, in the real world you generally query the map before putting in a value. You therefore need to perform two operations, and hence a synchronized block is needed. So the way you have used it is correct. However:
You could have used a concurrent implementation of Map available in the collections framework. The benefits of ConcurrentHashMap are:
a. It has a putIfAbsent API that does the same thing, but more efficiently.
b. It is efficient: ConcurrentHashMap locks only a portion of the map rather than the whole map. You, by contrast, have blocked both keys and values.
c. You could pass the reference of your map object somewhere else in your codebase, where you or another developer on your team might end up using it incorrectly, i.e. just calling add() or get() without locking on the map object. That call would not run mutually exclusively with your synchronized block. Using a concurrent implementation gives you peace of mind that it can never be used incorrectly.
Check out Google Collections' Multimap, e.g. page 28 of this presentation.
If you can't use that library for some reason, consider using ConcurrentHashMap instead of a synchronized HashMap; it has a nifty putIfAbsent(K,V) method with which you can atomically add the element list if it's not already there. Also, consider using CopyOnWriteArrayList for the map values if your usage patterns warrant doing so.
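A sketch combining those two suggestions (the class name is illustrative): a ConcurrentHashMap of CopyOnWriteArrayList values, where putIfAbsent installs the list atomically and the copy-on-write list makes the append itself thread-safe:

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class MultimapSketch {
    static final ConcurrentMap<String, List<String>> map = new ConcurrentHashMap<>();

    static void add(String key, String value) {
        // Atomically install an empty list for new keys, then append.
        List<String> fresh = new CopyOnWriteArrayList<>();
        List<String> existing = map.putIfAbsent(key, fresh);
        (existing != null ? existing : fresh).add(value);
    }
}
```

The spare allocation on every call could be avoided with a map.get() check first; also note that CopyOnWriteArrayList only pays off when reads greatly outnumber writes.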