Performance overhead in synchronizing a SynchronizedMap - java

Will there be a performance hit if I synchronize a SynchronizedMap?
For example:
private static Map<Integer, Integer> intMap= Collections.synchronizedMap(new HashMap<Integer, Integer>());
public static int doSomething(int mapId) {
synchronized(intMap) {
Integer id = intMap.get(mapId);
if (id != null) {
//do something
}
if (id == null) {
intMap.put(mapId);
}
}
}
I have to synchronize explicitly in the above method since it's a combination of operations on the synchronizedMap. Since I have to synchronize anyway, is it better to use a normal HashMap instead of a synchronizedMap?
Will there be a performance issue if I synchronize a synchronizedMap?

Yes, there will be a performance hit. It may not be much, because the JVM is capable of optimizing these locks based on usage, but it will always be there.
There's no reason to use a synchronized map if you're already using the synchronized keyword on that object, unless you have code in other parts of your program that needs a synchronized Map.
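A minimal sketch of that advice, assuming every access to the map in the whole program goes through the same synchronized block. The class name IdRegistry, the method getOrRegister and the defaultValue parameter are made up for illustration; the question does not say which value is stored.

```java
import java.util.HashMap;
import java.util.Map;

class IdRegistry {
    // Plain HashMap: all access happens inside synchronized (intMap),
    // so a synchronizedMap wrapper would only add a second, redundant lock acquisition.
    private static final Map<Integer, Integer> intMap = new HashMap<>();

    // Returns the value stored for mapId, storing defaultValue first if the key is absent.
    // defaultValue is a placeholder; the original question does not say what is stored.
    static int getOrRegister(int mapId, int defaultValue) {
        synchronized (intMap) {
            Integer id = intMap.get(mapId);
            if (id == null) {
                id = defaultValue;
                intMap.put(mapId, id);
            }
            return id;
        }
    }
}
```

If any other code touches the map without holding the same lock, this is no longer safe, which is the case where keeping the synchronized wrapper still makes sense.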

Yes. You are doing unnecessary synchronization on an already synchronized collection. It will be just as safe and efficient (in fact more efficient) to remove the extra synchronized block.
Further, I performed a small-scale test: https://gist.github.com/AgentTroll/02b77680efe3b92e350f
(Ignore the license, I had it on one of my projects on GitHub, it's there so people don't think I steal code)

Related

Is `synchronized` the only way to ensure thread-safety when modifying collection inside ConcurrentHashMap?

I have a static final ConcurrentHashMap<Long, Queue<User>> MAP, which contains ConcurrentLinkedQueue for values, and I need to frequently modify queues inside the map while ensuring that no other thread can intervene. I've tried to gather pieces of information and best practice on how to handle thread-safety of nested collections, and it seems that the only way is to synchronize all modifications to the nested collection.
My add method
public static void add(Long id, User user) {
Queue<User> q = MAP.get(id);
if (q != null) {
synchronized(q) {
if (q != null) {
q.offer(user);
}
}
}
}
My remove method
public static void remove(Long id, User user) {
Queue<User> q = MAP.get(id);
if (q != null) {
synchronized (q) {
if(q != null) {
q.remove(user);
}
}
}
}
Is it correct or maybe there is a better solution?
Access to the nested collection needs to be thread-safe, but the use of synchronized is just one means of providing thread-safety (and often not the preferred means):
For collections that are already thread-safe and specifically designed for efficient concurrent access, including ConcurrentLinkedQueue, you don't generally need additional synchronisation-- you should use these where possible;
If the collection is not thread-safe and there is no available equivalent that is naturally thread-safe, then synchronized can be a simple solution, but it comes with the caveats that it will exclusively lock the collection during all reads and writes (so even while the collection is not being modified and theoretically, multiple threads could read at the same time, only one read at a time is possible), and in most implementations is "last in, first out", meaning that under high concurrency, accessors at the back of the queue can get starved access for longer periods of time-- still, it can be appropriate when the actual reads/writes are very fast;
Using one of Java's explicit locks (e.g. ReentrantLock, ReentrantReadWriteLock) can be more appropriate when the actual collection read/write operations are more complex and/or you need to guarantee fairness and/or reads outnumber writes and you want to allow concurrent reads while the collection is not being modified.
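As a rough illustration of the explicit-lock option, here is a sketch that guards a non-thread-safe ArrayList with a fair ReentrantReadWriteLock. The GuardedList class and its methods are hypothetical names, not something from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GuardedList<T> {
    private final List<T> items = new ArrayList<>();
    // Passing true requests the fair ordering policy.
    private final ReadWriteLock lock = new ReentrantReadWriteLock(true);

    // Writers take the exclusive write lock.
    void add(T item) {
        lock.writeLock().lock();
        try {
            items.add(item);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers share the read lock, so several can run at once while no writer is active.
    boolean contains(T item) {
        lock.readLock().lock();
        try {
            return items.contains(item);
        } finally {
            lock.readLock().unlock();
        }
    }

    int size() {
        lock.readLock().lock();
        try {
            return items.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```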
All of the above presupposes that:
Having one collection nested within another is indeed the appropriate data structure for your purpose;
Your access to the outer ConcurrentHashMap is correct (e.g. in your example, you must have pre-populated the map; if you are adding new queues on the fly, you need to deal properly with the potential race condition of two threads concurrently trying to create a queue for the same ID for the first time).
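If queues are created on the fly, one way to avoid the race mentioned in the last point is ConcurrentHashMap.computeIfAbsent (Java 8+), which creates the queue atomically for a given key. This is only a sketch: String stands in for the User type, and the UserQueues class and its size helper are invented for the example.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

class UserQueues {
    // String stands in for the question's User type (assumption).
    private static final ConcurrentMap<Long, Queue<String>> MAP = new ConcurrentHashMap<>();

    // Creates the queue for this id if it is missing, then appends to it.
    // Two threads racing on a new id end up sharing the same queue.
    static void add(Long id, String user) {
        MAP.computeIfAbsent(id, k -> new ConcurrentLinkedQueue<>()).offer(user);
    }

    // ConcurrentLinkedQueue is already thread-safe, so no synchronized block is needed here.
    static boolean remove(Long id, String user) {
        Queue<String> q = MAP.get(id);
        return q != null && q.remove(user);
    }

    static int size(Long id) {
        Queue<String> q = MAP.get(id);
        return q == null ? 0 : q.size();
    }
}
```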

concurrentHashMap putIfAbsent method functionality

I am new to Java and exploring ConcurrentHashMap. While reading the ConcurrentHashMap API, I came across the putIfAbsent() method:
public V putIfAbsent(K paramK, V paramV)
{
if (paramV == null)
throw new NullPointerException();
int i = hash(paramK.hashCode());
return segmentFor(i).put(paramK, i, paramV, true);
}
Please explain what it does and when it is needed in practice, if possible with a small, simple example.
A ConcurrentHashMap is designed so that it can be used by a large number of concurrent Threads.
Now, if you used the methods provided by the standard Map interface you would probably write something like this
if(!map.containsKey("something")) {
map.put("something", "a value");
}
This looks good and seems to do the job, but it is not thread safe. So you would then think, "Ah, but I know about the synchronized keyword" and change it to this
synchronized(map) {
if(!map.containsKey("something")) {
map.put("something", "a value");
}
}
Which fixes the issue.
Now what you have done is locked the entire map for both read and write while you check if the key exists and then add it to the map.
This is a very crude solution. Now you could implement your own solution with double checked locks and re-locking on the key etc. but that is a lot of very complicated code that is very prone to bugs.
So, instead you use the solution provided by the JDK.
The ConcurrentHashMap is a clever implementation that divides the Map into regions and locks them individually so that you can have concurrent, thread safe, reads and writes of the map without external locking.
Like all other methods in the implementation putIfAbsent locks the key's region and not the whole Map and therefore allows other things to go on in other regions in the meantime.
ConcurrentHashMap is used when several threads may access the same map concurrently. In that case, implementing putIfAbsent() manually, like below, is not acceptable:
if (!map.containsKey(key)) {
map.put(key, value);
}
Indeed, two threads might execute the above block in parallel and enter in a race condition, where both first test if the key is absent, and then both put their own value in the map, breaking the invariants of the program.
The ConcurrentHashMap thus provides the putIfAbsent() operation which makes sure this is done in an atomic way, avoiding the race condition.
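A small sketch of how the return value of putIfAbsent() tells the caller whether its value was the one inserted. The PutIfAbsentDemo class and register method are illustrative names only.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class PutIfAbsentDemo {
    static final ConcurrentMap<String, String> map = new ConcurrentHashMap<>();

    // putIfAbsent returns the previous value, or null if there was none.
    // A null result therefore means this call performed the insert.
    static boolean register(String key, String value) {
        return map.putIfAbsent(key, value) == null;
    }
}
```

The first call for a key returns true; later calls for the same key return false and leave the stored value unchanged.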
Imagine we need a cache of lazy-initialized named singleton beans. Below is a ConcurrentHashMap based lock-free implementation:
ConcurrentMap<String, Object> map = new ConcurrentHashMap<>();
<T> T getBean(String name, Class<T> cls) throws Exception {
T b1 = (T) map.get(name);
if (b1 != null) {
return b1;
}
b1 = cls.newInstance();
T b2 = (T) map.putIfAbsent(name, b1);
if (b2 != null) {
return b2;
}
return b1;
}
Note that it solves the same problem as double-checked locking but with no locking.

Atomically perform multiple operations

I'm trying to find a way to perform multiple operations on a ConcurrentHashMap in an atomic manner.
My logic is like this:
if (!map.containsKey(key)) {
map.put(key, value);
doSomethingElse();
}
I know there is the putIfAbsent method. But if I use it, I still won't be able to call the doSomethingElse atomically.
Is there any way of doing such things apart from resorting to synchronization / client-side locking?
If it helps, the doSomethingElse in my case would be pretty complex, involving creating and starting a thread that looks for the key that we just added to the map.
If that's the case, you would generally have to synchronize externally.
In some circumstances (depending on what doSomethingElse() expects the state of the map to be, and what the other threads might do to the map), the following may also work:
if (map.putIfAbsent(key, value) == null) {
doSomethingElse();
}
This will ensure that only one thread goes into doSomethingElse() for any given key.
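To make the "only one thread" claim concrete, here is a sketch in which several threads race on the same key and a counter stands in for doSomethingElse(). The SingleWinnerDemo class, the race method and the counter are assumptions made for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class SingleWinnerDemo {
    // Starts the given number of threads, lets them race on one key,
    // and returns how many of them ran the follow-up action.
    static int race(int threads) {
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
        AtomicInteger followUps = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final String value = "value-" + i;
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // hold all threads until released together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (map.putIfAbsent("key", value) == null) {
                    followUps.incrementAndGet(); // stands in for doSomethingElse()
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return followUps.get();
    }
}
```

Note that this only guarantees a single winner; the other threads may carry on before the winner's follow-up action has finished.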
This would work unless you want all putting threads to wait until the first successful thread puts in the map.
if(map.get(key) == null){
Object ret = map.putIfAbsent(key,value);
if(ret == null){ // I won the put
doSomethingElse();
}
}
Now if many threads are putting with the same key only one will win and only one will doSomethingElse().
If your design demands that the map access and the other operation be grouped without anybody else accessing the map, then you have no choice but to lock them. Perhaps the design can be revisited to avoid this need?
This also implies that all other accesses to the map must be serialized behind the same lock.
You might keep a lock per entry. That would allow concurrent non-locking updates, unless two threads try to access the same element.
class LockedReference<T> {
    Lock lock = new ReentrantLock();
    T value;
    LockedReference(T value) { this.value = value; }
}
LockedReference<T> ref = new LockedReference<>(value);
ref.lock.lock(); // lock on the new reference; there is no contention here
try {
    if (map.putIfAbsent(key, ref) == null) {
        // we have locked on the key before inserting the element
        doSomethingElse();
    }
} finally {
    ref.lock.unlock();
}
later
Object value;
while (true) {
    LockedReference<T> ref = map.get(key);
    if (ref != null) {
        ref.lock.lock();
        // there is no contention, unless a thread is already working on this entry
        try {
            if (map.containsKey(key)) {
                value = ref.value;
                break;
            } else {
                /* key was removed between get and lock; loop and retry */
            }
        } finally {
            ref.lock.unlock();
        }
    } else {
        value = null; // no entry for this key
        break;        // without this break the loop would never terminate
    }
}
A fancier approach would be rewriting ConcurrentHashMap and have a version of putIfAbsent that accepts a Runnable (which is executed if the element was put). But that would be far far more complex.
Basically, ConcurrentHashMap implements locked segments, which is in the middle between one lock per entry, and one global lock for the whole map.

Is there a way to synchronize using two lock objects in Java?

I'm wondering if there's a way in Java to synchronize using two lock objects.
I don't mean locking on either object, I mean locking only on both.
e.g. if I have 4 threads:
Thread A requests a lock using Object1 and Object2
Thread B requests a lock using Object1 and Object3
Thread C requests a lock using Object4 and Object2
Thread D requests a lock using Object1 and Object2
In the above scenario, Thread A and Thread D would share a lock, but Thread B and Thread C would have their own locks. Even though they overlap with one of the two objects, the same lock only applies if it overlaps on both.
So I have a method called by many threads which is going to perform a specific activity type based on a specific database. I have identifier objects for both the database and the activity, and I can guarantee that the action will be thread safe as long as it is not the same activity based on the same database as another thread.
My ideal code would look something like:
public void doActivity(DatabaseIdentifier dbID, ActivityIdentifier actID) {
synchronized( dbID, actID ) { // <--- Not real Java
// Do an action that can be guaranteed thread-safe per unique
// combination of dbID and actID, but needs to share a
// lock if they are both the same.
}
}
I could create a hashmap of lock objects that are keyed by both the DatabaseIdentifier and the ActivityIdentifier, but I'm going to run into the same synchronization issue when I need to create/access those locks in a thread-safe way.
For now I'm just synchronizing on the DatabaseIdentifier. It's much less likely that there will be multiple activities going on at the same time for one DBIdentifier, so I will only rarely be over-locking. (Can't say the same for the opposite direction though.)
Anyone have a good way to handle this that doesn't involve forcing unnecessary threads to wait?
Thanks!
have each DatabaseIdentifier keep a set of locks keyed to ActivityIdentifiers that it owns
so you can call
public void doActivity(DatabaseIdentifier dbID, ActivityIdentifier actID) {
synchronized( dbID.getLock(actID) ) {
// Do an action that can be guaranteed thread-safe per unique
// combination of dbID and actID, but needs to share a
// lock if they are both the same.
}
}
then you only need a (short) lock on the underlying collection (use a ConcurrentHashMap) in dbID
in other words
ConcurrentHashMap<ActivityIdentifier ,Object> locks = new...
public Object getLock(ActivityIdentifier actID){
Object res = locks.get(actID); //avoid unnecessary allocations of Object
if(res==null) {
Object newLock = new Object();
res = locks.putIfAbsent(actID, newLock);
return res!=null?res:newLock;
} else return res;
}
this is better than locking the full action on dbID (especially when it's a long action) but still worse than your ideal scenario
update in response to comments about EnumMap
private final Map<ActivityIdentifier, Object> locks;
/**
 * Initializer ensuring all values are initialized.
 */
{
    EnumMap<ActivityIdentifier, Object> tmp = new EnumMap<ActivityIdentifier, Object>(ActivityIdentifier.class);
    for (ActivityIdentifier e : ActivityIdentifier.values()) {
        tmp.put(e, new Object());
    }
    // The read-only view ensures no modifications happen after initialization, making this thread-safe.
    locks = Collections.unmodifiableMap(tmp);
}
public Object getLock(ActivityIdentifier actID){
return locks.get(actID);
}
I think you should go the way of the hashmap, but encapsulate that in a flyweight factory. I.e., you call:
FlyweightAllObjectsLock lockObj = FlyweightAllObjectsLock.newInstance(dbID, actID);
Then lock on that object. The flyweight factory can get a read lock on the map to see if the key is in there, and only do a write lock if it is not. It should reduce the concurrency factor.
You might also want to look into using weak references on that map as well, to avoid keeping memory from garbage collection.
I can't think of a way to do this that really captures your idea of locking a pair of objects. Some low-level concurrency boffin might be able to invent one, but I have my doubts about whether we would have the necessary primitives to implement it in Java.
I think the idea of using the pairs as keys to identify lock objects is a good one. If you want to avoid locking, then arrange the lookup so that it doesn't do any.
I would suggest a two-level map, vaguely like:
Map<DatabaseIdentifier, Map<ActivityIdentifier, Lock>> locks;
Used vaguely thus:
synchronized (locks.get(databaseIdentifier).get(activityIdentifier)) {
performSpecificActivityOnDatabase();
}
If you know what all the databases and activities are upfront, then just create a perfectly normal map containing all the combinations when your application starts up, and use it exactly as above. The only locking is on the lock objects, and there is no contention.
If you don't know what the databases and activities will be, or there are too many combinations to create a complete map upfront, then you will need to create the map incrementally. This is where Concurrency Fun Times begin.
The straightforward solution is to lazily create the inner maps and the locks, and to protect these actions with normal locks:
Map<ActivityIdentifier, Object> locksForDatabase;
synchronized (locks) {
locksForDatabase = locks.get(databaseIdentifier);
if (locksForDatabase == null) {
locksForDatabase = new HashMap<ActivityIdentifier, Object>();
locks.put(databaseIdentifier, locksForDatabase);
}
}
Object lock;
synchronized (locksForDatabase) {
    lock = locksForDatabase.get(activityIdentifier);
    if (lock == null) {
        lock = new Object();
        locksForDatabase.put(activityIdentifier, lock);
    }
}
synchronized (lock) {
performSpecificActivityOnDatabase();
}
As you are evidently aware, this will lead to too much contention. I mention it only for didactic completeness.
You can improve it by making the outer map concurrent:
ConcurrentMap<DatabaseIdentifier, Map<ActivityIdentifier, Object>> locks;
And:
Map<ActivityIdentifier, Object> newHashMap = new HashMap<ActivityIdentifier, Object>();
Map<ActivityIdentifier, Object> locksForDatabase = locks.putIfAbsent(databaseIdentifier, newHashMap);
if (locksForDatabase == null) locksForDatabase = newHashMap;
Object lock;
synchronized (locksForDatabase) {
    lock = locksForDatabase.get(activityIdentifier);
    if (lock == null) {
        lock = new Object();
        locksForDatabase.put(activityIdentifier, lock);
    }
}
synchronized (lock) {
performSpecificActivityOnDatabase();
}
Your only lock contention there will be on the per-database maps, for the duration of a put and a get, and according to your report, there won't be much of that. You could convert the inner map to a ConcurrentMap to avoid that, but that sounds like overkill.
There will, however, be a steady stream of HashMap instances being created to be fed to putIfAbsent and then being thrown away. You can avoid that with a sort of postmodern atomic remix of double-checked locking; replace the first three lines with:
Map<ActivityIdentifier, Object> locksForDatabase = locks.get(databaseIdentifier);
if (locksForDatabase == null) {
Map<ActivityIdentifier, Object> newHashMap = new HashMap<ActivityIdentifier, Object>();
locksForDatabase = locks.putIfAbsent(databaseIdentifier, newHashMap);
if (locksForDatabase == null) locksForDatabase = newHashMap;
}
In the common case that the per-database map already exists, this will do a single concurrent get. In the uncommon case that it does not, it will do an additional but necessary new HashMap() and putIfAbsent. In the very rare case that it does not, but another thread has also discovered that, one of the threads will be doing a redundant new HashMap() and putIfAbsent. That should not be expensive.
Actually, it occurs to me that this is all a terrible idea, and that you should just stick the two identifiers together to make one double-size key, and use that to make lookups in a single ConcurrentHashMap. Sadly, I am too lazy and vain to delete the above. Consider this advice a special prize for reading this far.
PS It always mildly annoys me to see an instance of Object used as nothing but a lock. I propose calling them LockGuffins.
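The "one double-size key" idea might look roughly like the sketch below. It is an illustration under assumptions: String ids stand in for DatabaseIdentifier and ActivityIdentifier, the PairLocks and Key classes are made up, and computeIfAbsent requires Java 8 or later.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class PairLocks {
    // Hypothetical composite key; the String fields stand in for the two identifier types.
    static final class Key {
        final String dbId;
        final String actId;

        Key(String dbId, String actId) {
            this.dbId = dbId;
            this.actId = actId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return dbId.equals(other.dbId) && actId.equals(other.actId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(dbId, actId);
        }
    }

    private static final ConcurrentMap<Key, Object> LOCKS = new ConcurrentHashMap<>();

    // Returns the single lock object shared by everyone using this (dbId, actId) pair.
    static Object lockFor(String dbId, String actId) {
        return LOCKS.computeIfAbsent(new Key(dbId, actId), k -> new Object());
    }

    // Runs the action while holding the lock for exactly this pair.
    static void doActivity(String dbId, String actId, Runnable action) {
        synchronized (lockFor(dbId, actId)) {
            action.run();
        }
    }
}
```

Threads that agree on both identifiers share a lock, and threads that overlap on only one of them do not. The map never shrinks in this sketch, so it only suits a bounded set of pairs.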
Your hashmap suggestion is what I've done in the past. The only change I'd make is using a ConcurrentHashMap, to minimize the synchronization.
The other issue is how to cleanup the map if the possible keys are going to change.

Does the static ConcurrentHashMap need external synchronisation

Does the static ConcurrentHashMap need to be externally synchronized using a synchronized block or locks?
Yes and no. It depends on what you're doing. ConcurrentHashMap is thread safe for all of its methods (e.g. get and put). However, it is not thread safe for non-atomic operations. Here is an example of a method that performs a non-atomic operation:
public class Foo {
    Map<String, Object> map = new ConcurrentHashMap<String, Object>();
    public Object getFoo(String bar) {
        Object value = map.get(bar);   // check ...
        if (value == null) {
            value = new Object();
            map.put(bar, value);       // ... then act: the pair is not atomic
        }
        return value;
    }
}
The flaw here is that it is possible for two threads calling getFoo to receive a different Object. Remember that when dealing with any data structure or type, even as simple as an int, non-atomic operations always require external synchronization. Classes such as AtomicInteger and ConcurrentHashMap assist in making some common operations thread safe, but do not protect against check-then-set operations such as in getFoo above.
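One possible repair of getFoo, sketched with putIfAbsent so that every caller ends up with the same instance for a given key. The FooCache class name is invented; the technique is the standard ConcurrentMap.putIfAbsent call.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class FooCache {
    private final ConcurrentMap<String, Object> map = new ConcurrentHashMap<>();

    Object getFoo(String bar) {
        Object value = map.get(bar);
        if (value == null) {
            Object created = new Object();
            // Atomic insert: if another thread got there first, its object is returned
            // and ours is discarded, so all callers agree on one instance per key.
            Object existing = map.putIfAbsent(bar, created);
            value = (existing != null) ? existing : created;
        }
        return value;
    }
}
```

The cost is that a losing thread may create an object that is thrown away, which matters only when construction is expensive.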
You only need external synchronization if you need to obtain a lock on the collection. The collection doesn't expose its internal locks.
ConcurrentMap has putIfAbsent, however if the creation of the object is expensive you may not want to use this.
final ConcurrentMap<Key, Value> map =
public Value get(Key key) {
// allow concurrent read
return map.get(key);
}
public Value getOrCreate(Key key) {
// could put an extra check here to avoid synchronization.
synchronized(map) {
Value val = map.get(key);
if (val == null)
map.put(key, val = new ExpensiveValue(key));
return val;
}
}
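On Java 8 and later, ConcurrentHashMap.computeIfAbsent is another option for expensive values, since its documentation states that the mapping function is applied at most once per key. This is a different technique from the synchronized block above, shown here only as a sketch. The ExpensiveCache class, the expensiveCreate method and the creations counter are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class ExpensiveCache {
    // Counts how often the factory actually runs (for demonstration only).
    static final AtomicInteger creations = new AtomicInteger();
    private static final ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();

    // Stand-in for an expensive constructor such as new ExpensiveValue(key).
    static String expensiveCreate(String key) {
        creations.incrementAndGet();
        return "value-for-" + key;
    }

    // The factory runs only when the key is absent; later calls return the cached value.
    static String getOrCreate(String key) {
        return map.computeIfAbsent(key, ExpensiveCache::expensiveCreate);
    }
}
```

Other updates that hit the same part of the map may block while the function runs, so the function should be short and must not modify the map itself.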
As far as I know, all the necessary locking is done inside this class, so you don't need to worry about it unless you are doing something specific that requires extra coordination.
On http://download.oracle.com/javase/1,5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html it says:
However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access.
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset.
So as long as this does not cause any problems in your specific application, you do not need to worry about it.
No: No need to synchronise externally.
All methods on the java.util.concurrent classes are threadsafe.
