Does a static ConcurrentHashMap need to be externally synchronized using a synchronized block or locks?
Yes and no. It depends on what you're doing. ConcurrentHashMap is thread-safe for each of its individual methods (e.g. get and put). However, a sequence of several calls is not atomic as a whole, so compound operations are not thread-safe. Here is an example of a method that performs a non-atomic operation:
public class Foo {
    Map<String, Object> map = new ConcurrentHashMap<String, Object>();

    public Object getFoo(String bar) {
        Object value = map.get(bar);
        if (value == null) {
            // Race: another thread may insert its own value between get and put.
            value = new Object();
            map.put(bar, value);
        }
        return value;
    }
}
The flaw here is that two threads calling getFoo with the same key can each receive a different Object. Remember that when dealing with any data structure or type, even one as simple as an int, non-atomic operations always require external synchronization. Classes such as AtomicInteger and ConcurrentHashMap help make some common operations thread-safe, but they do not protect against check-then-act sequences such as the one in getFoo above.
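One way to close the race in getFoo is to let the map do the check and the insert as a single atomic step. The following is a minimal sketch, assuming Java 8 or later; the class name FooCache is hypothetical and not part of the original example.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical rewrite of Foo: the check and the insert happen atomically inside the map.
class FooCache {
    private final ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<>();

    public Object getFoo(String bar) {
        // ConcurrentHashMap.computeIfAbsent applies the factory at most once per absent key,
        // so every caller receives the same instance for a given key.
        return map.computeIfAbsent(bar, key -> new Object());
    }
}
```

With this version, two threads asking for the same key always get the same Object, without any external lock.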
You only need external synchronization if you need to obtain a lock on the collection. The collection doesn't expose its internal locks.
ConcurrentMap has putIfAbsent; however, the value has to be created before the call, so if creating the object is expensive you may not want to use it.
final ConcurrentMap<Key, Value> map =
public Value get(Key key) {
// allow concurrent read
return map.get(key);
}
public Value getOrCreate(Key key) {
// could put an extra check here to avoid synchronization.
synchronized(map) {
Value val = map.get(key);
if (val == null)
map.put(key, val = new ExpensiveValue(key));
return val;
}
}
As far as I know, all the necessary locking is done inside this class, so you don't need to worry about it unless you are doing something specific that requires locking the map yourself.
On http://download.oracle.com/javase/1,5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html it says:
However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access.
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset.
So in case this does not represent any problems in your specific application you do not need to worry about it.
No: No need to synchronise externally.
All methods on the java.util.concurrent classes are threadsafe.
Related
I have a static final ConcurrentHashMap<Long, Queue<User>> MAP, which contains ConcurrentLinkedQueue instances as values, and I need to frequently modify the queues inside the map while ensuring that no other thread can intervene. I've tried to gather information and best practices on how to handle thread safety of nested collections, and it seems that the only way is to synchronize all modifications to the nested collection.
My add method
public static void add(Long id, User user) {
Queue<User> q = MAP.get(id);
if (q != null) {
synchronized(q) {
if (q != null) {
q.offer(user);
}
}
}
}
My remove method
public static void remove(Long id, User user) {
Queue<User> q = MAP.get(id);
if (q != null) {
synchronized (q) {
if(q != null) {
q.remove(user);
}
}
}
}
Is it correct or maybe there is a better solution?
Access to the nested collection needs to be thread-safe, but the use of synchronized is just one means of providing thread-safety (and often not the preferred means):
For collections that are already thread-safe and specifically designed for efficient concurrent access, including ConcurrentLinkedQueue, you don't generally need additional synchronisation; you should use these where possible;
If the collection is not thread-safe and there is no available equivalent that is naturally thread-safe, then synchronized can be a simple solution, but it comes with caveats: it locks the collection exclusively during all reads and writes (so even while the collection is not being modified and multiple threads could theoretically read at the same time, only one read at a time is possible), and in most implementations it is "last in, first out", meaning that under high concurrency, accessors at the back of the queue can be starved of access for longer periods of time. Still, it can be appropriate when the actual reads/writes are very fast;
Using one of Java's explicit locks (e.g. ReentrantLock, ReentrantReadWriteLock) can be more appropriate when the actual collection read/write operations are more complex and/or you need to guarantee fairness and/or reads outnumber writes and you want to allow concurrent reads while the collection is not being modified.
All of the above presupposes that:
Having one collection nested within another is indeed the appropriate data structure for your purpose;
Your access to the outer ConcurrentHashMap is correct (e.g. in your example, you must have pre-populated the map; if you are adding new queues on the fly, you need to deal properly with the potential race condition of two threads concurrently trying to create a queue for the same ID for the first time).
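The race mentioned in the last point, two threads creating a queue for the same ID at the same time, can be avoided by letting the outer map create the queue atomically. This is only a sketch, assuming Java 8 or later: the class name UserQueues is hypothetical, and String stands in for the User type, which the question does not show.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical holder; String is a stand-in for the question's User type.
class UserQueues {
    private static final ConcurrentHashMap<Long, Queue<String>> MAP = new ConcurrentHashMap<>();

    static void add(Long id, String user) {
        // The queue is created at most once per id, even if several threads arrive together.
        MAP.computeIfAbsent(id, k -> new ConcurrentLinkedQueue<>()).offer(user);
    }

    static boolean remove(Long id, String user) {
        Queue<String> q = MAP.get(id);
        // ConcurrentLinkedQueue.remove is itself thread-safe, so no synchronized block is needed.
        return q != null && q.remove(user);
    }

    static int size(Long id) {
        Queue<String> q = MAP.get(id);
        return q == null ? 0 : q.size();
    }
}
```

Because each call touches the queue with a single thread-safe operation, no extra locking is required here. A sequence of several queue operations that must appear atomic would still need a lock.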
I have the class below:
public class LRUCache {
    private HashMap<String, String> dataMap;
    private HashMap<String, String> analyticsMap;

    public void put(String key, String value) {
        dataMap.put(key, value);
        String date = getCurrentDateAsString();
        analyticsMap.put(key, date);
    }

    public String get(String key) {
        String date = analyticsMap.get(key);
        boolean dateExpired = isDateExpired(date);
        String value = null;
        if (!dateExpired)
            value = dataMap.get(key);
        return value;
    }
}
In the above class I have two HashMaps, which are accessed in the get and put methods. How do I make this class thread-safe?
Do I need to synchronize both get and put, and would that solve my problem?
In general, if a class has more than one piece of state, should I guard access with synchronized methods instead of making each map a ConcurrentHashMap?
Merely using ConcurrentHashMap structures doesn't make your LRUCache class thread-safe. You'd need to properly control access so no other thread can modify the underlying contents when you're doing multi-step put/get operations. This can be accomplished with synchronized methods, or with ReentrantReadWriteLock read/write locks.
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html
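As a rough illustration of the read/write lock approach, here is a sketch in which both maps are only ever touched while holding the same lock. The names GuardedCache, writeTimes and ttlMillis are hypothetical, and millisecond timestamps stand in for the question's getCurrentDateAsString and isDateExpired helpers, which are not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical variant of LRUCache: one lock guards both maps, so they always change together.
class GuardedCache {
    private final Map<String, String> dataMap = new HashMap<>();
    private final Map<String, Long> writeTimes = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final long ttlMillis; // assumed expiry rule: entries older than this are ignored

    GuardedCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    public void put(String key, String value) {
        lock.writeLock().lock(); // exclusive: no reader can see one map updated without the other
        try {
            dataMap.put(key, value);
            writeTimes.put(key, System.currentTimeMillis());
        } finally {
            lock.writeLock().unlock();
        }
    }

    public String get(String key) {
        lock.readLock().lock(); // shared: many readers may proceed at once
        try {
            Long written = writeTimes.get(key);
            if (written == null || System.currentTimeMillis() - written > ttlMillis) {
                return null;
            }
            return dataMap.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Plain HashMaps are sufficient here because every access happens under the lock; making them ConcurrentHashMaps would not add any safety to the two-step operations.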
From the official Javadoc (my highlights) https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html :
A hash table supporting full concurrency of retrievals and high
expected concurrency for updates. This class obeys the same functional
specification as Hashtable, and includes versions of methods
corresponding to each method of Hashtable. However, even though all
operations are thread-safe, retrieval operations do not entail
locking, and there is not any support for locking the entire table in
a way that prevents all access. This class is fully interoperable with
Hashtable in programs that rely on its thread safety but not on its
synchronization details.
I'd use a ReentrantLock https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantLock.html then you can just synchronize a block rather than the whole method.
The single-check idiom can be used to implement a thread-safe lazy init, with the possible drawback (compared to the double-check idiom) of wasting some computation time when multiple threads initialize concurrently. It looks like this:
Single-check idiom
private volatile FieldType field;
FieldType getField() {
FieldType result = field;
if (result == null) {
field = result = computeFieldValue();
}
return result;
}
Here, the field must be volatile so that a partially initialized object is never passed to another thread; that is, the assignment (write) implicitly performs the necessary synchronization.
I would like to implement a parametrized lazy init cache, which is essentially represented by a Map<Integer, Object>, where each element is created using lazy init.
My question is: is it enough to use a ConcurrentHashMap to avoid the issue with partial initialization? That is, in that case a thread-safe implementation of a lazy-init cache using the single-check idiom would be given by
private final ConcurrentHashMap<Integer, ItemType> items = new ConcurrentHashMap<Integer, ItemType>();
ItemType getItem(Integer index) {
ItemType result = items.get(index);
if (result == null) {
result = computeItemValue(index);
items.put(index, result);
}
return result;
}
In other words: I assume that 'items.put(index, result)' performs the necessary synchronization (since it is a write). Note that the question might be twofold here: first I wonder if this works in the (a) current JVM implementation, second (more importantly) I wonder if this is guaranteed (given the documentation/contract) of ConcurrentHashMap.
Note: Here I assume that computeItemValue generates an immutable object and guarantees thread safety in the sense of the single-check idiom (that is, once object construction is completed, the object returned behaves identically for all threads). I assume this is item 71 in J. Bloch's book.
In Java 8, you can use computeIfAbsent to avoid even the possibility of duplicate initialization:
private final ConcurrentHashMap<Integer, ItemType> items = new ConcurrentHashMap<Integer, ItemType>();
ItemType getItem(Integer index) {
return items.computeIfAbsent(index, this::computeItemValue);
}
is it enough to use a ConcurrentHashMap to avoid the issue with
partial initialization.
Yes, there is an explicit happens-before ordering for objects read from a CHM relative to being written.
So, you cover visibility, but haven't touched on atomicity. There are two components to atomicity:
memoization
put if not present
You aren't achieving either. That is, for memoization, you can create more than one object (not actually lazy-init). And for the put if not present, the CHM can receive more than one value since you are not invoking putIfAbsent.
Based on your last update, if they are going to be the same object and are immutable, then you should be fine despite the extra construction.
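If a single canonical instance per key does matter, the "put if not present" half can be addressed with putIfAbsent while still accepting the occasional redundant computation. The sketch below is only illustrative: the class name ItemCache is hypothetical, String stands in for ItemType, and computeItemValue is a placeholder for the real (immutable) value factory.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical single-check cache; String is a stand-in for ItemType.
class ItemCache {
    private final ConcurrentHashMap<Integer, String> items = new ConcurrentHashMap<>();

    // Placeholder for the real computation; assumed to return an immutable object.
    private String computeItemValue(Integer index) {
        return "item-" + index;
    }

    String getItem(Integer index) {
        String result = items.get(index);
        if (result == null) {
            // Several threads may compute a candidate, but only the first insert wins.
            String candidate = computeItemValue(index);
            String existing = items.putIfAbsent(index, candidate);
            result = (existing != null) ? existing : candidate;
        }
        return result;
    }
}
```

The computation may still run more than once under contention, but every caller ends up with the instance that is actually stored in the map.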
I am a newbie to the world of Java and am exploring ConcurrentHashMap. While exploring the ConcurrentHashMap API, I discovered the putIfAbsent() method:
public V putIfAbsent(K paramK, V paramV)
{
if (paramV == null)
throw new NullPointerException();
int i = hash(paramK.hashCode());
return segmentFor(i).put(paramK, i, paramV, true);
}
Now please advise what its functionality is and when we would practically need it. If possible, please explain with a small, simple example.
A ConcurrentHashMap is designed so that it can be used by a large number of concurrent Threads.
Now, if you used the methods provided by the standard Map interface you would probably write something like this
if(!map.containsKey("something")) {
map.put("something", "a value");
}
This looks good and seems to do the job, but it is not thread-safe. So you would then think, "Ah, but I know about the synchronized keyword" and change it to this
synchronized(map) {
if(!map.containsKey("something")) {
map.put("something", "a value");
}
}
Which fixes the issue.
Now what you have done is locked the entire map for both read and write while you check if the key exists and then add it to the map.
This is a very crude solution. Now you could implement your own solution with double checked locks and re-locking on the key etc. but that is a lot of very complicated code that is very prone to bugs.
So, instead you use the solution provided by the JDK.
The ConcurrentHashMap is a clever implementation that divides the Map into regions and locks them individually so that you can have concurrent, thread safe, reads and writes of the map without external locking.
Like all other methods in the implementation putIfAbsent locks the key's region and not the whole Map and therefore allows other things to go on in other regions in the meantime.
ConcurrentHashMap is used when several threads may access the same map concurrently. In that case, implementing putIfAbsent() manually, like below, is not acceptable:
if (!map.containsKey(key)) {
map.put(key, value);
}
Indeed, two threads might execute the above block in parallel and enter in a race condition, where both first test if the key is absent, and then both put their own value in the map, breaking the invariants of the program.
The ConcurrentHashMap thus provides the putIfAbsent() operation which makes sure this is done in an atomic way, avoiding the race condition.
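To make the atomicity concrete, here is a small sketch in which several threads race to register a value for the same key. The class and method names (PutIfAbsentRace, countWinners) are hypothetical; the only library behaviour relied on is that putIfAbsent returns null to the caller whose value was actually inserted.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class PutIfAbsentRace {
    // Starts the given number of threads, all calling putIfAbsent for the same key,
    // and returns how many of them saw the key as absent (i.e. actually inserted).
    static int countWinners(int threads) {
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
        AtomicInteger winners = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            String value = "value-from-thread-" + i;
            workers[i] = new Thread(() -> {
                // null means "no previous mapping", so this thread's value was stored.
                if (map.putIfAbsent("something", value) == null) {
                    winners.incrementAndGet();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return winners.get();
    }
}
```

However many threads take part, exactly one of them inserts; all the others get the winner's value back from putIfAbsent and leave the map unchanged.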
Imagine we need a cache of lazy-initialized named singleton beans. Below is a ConcurrentHashMap based lock-free implementation:
ConcurrentMap<String, Object> map = new ConcurrentHashMap<>();
<T> T getBean(String name, Class<T> cls) throws Exception {
T b1 = (T) map.get(name);
if (b1 != null) {
return b1;
}
b1 = cls.newInstance();
T b2 = (T) map.putIfAbsent(name, b1);
if (b2 != null) {
return b2;
}
return b1;
}
Note that it solves the same problem as double-checked locking but with no locking.
I'm wondering if there's a way in Java to synchronize using two lock objects.
I don't mean locking on either object, I mean locking only on both.
e.g. if I have 4 threads:
Thread A requests a lock using Object1 and Object2
Thread B requests a lock using Object1 and Object3
Thread C requests a lock using Object4 and Object2
Thread D requests a lock using Object1 and Object2
In the above scenario, Thread A and Thread D would share a lock, but Thread B and Thread C would have their own locks. Even though they overlap with one of the two objects, the same lock only applies if it overlaps on both.
So I have a method called by many threads which is going to perform a specific activity type based on a specific database. I have identifier objects for both the database and the activity, and I can guarantee that the action will be thread safe as long as it is not the same activity based on the same database as another thread.
My ideal code would look something like:
public void doActivity(DatabaseIdentifier dbID, ActivityIdentifier actID) {
synchronized( dbID, actID ) { // <--- Not real Java
// Do an action that can be guaranteed thread-safe per unique
// combination of dbID and actID, but needs to share a
// lock if they are both the same.
}
}
I could create a hashmap of lock objects that are keyed by both the DatabaseIdentifier and the ActivityIdentifier, but I'm going to run into the same synchronization issue when I need to create/access those locks in a thread-safe way.
For now I'm just synchronizing on the DatabaseIdentifier. It's much less likely that there will be multiple activities going on at the same time for one DBIdentifier, so I will only rarely be over-locking. (Can't say the same for the opposite direction though.)
Anyone have a good way to handle this that doesn't involve forcing unnecessary threads to wait?
Thanks!
Have each DatabaseIdentifier keep a set of locks that it owns, keyed by ActivityIdentifier,
so you can call
public void doActivity(DatabaseIdentifier dbID, ActivityIdentifier actID) {
synchronized( dbID.getLock(actID) ) {
// Do an action that can be guaranteed thread-safe per unique
// combination of dbID and actID, but needs to share a
// lock if they are both the same.
}
}
Then you only need a (short) lock on the underlying collection (use a ConcurrentHashMap) in dbID.
In other words:
ConcurrentHashMap<ActivityIdentifier ,Object> locks = new...
public Object getLock(ActivityIdentifier actID) {
    Object res = locks.get(actID); // avoid unnecessary allocations of Object
    if (res == null) {
        Object newLock = new Object();
        // putIfAbsent returns the existing lock, or null if newLock was inserted
        res = locks.putIfAbsent(actID, newLock);
        return res != null ? res : newLock;
    } else return res;
}
This is better than locking the full action on dbID (especially when it's a long action) but still worse than your ideal scenario.
Update in response to comments about EnumMap:
private final Map<ActivityIdentifier, Object> locks;
/**
 * Initializer ensuring all values are initialized.
 */
{
    EnumMap<ActivityIdentifier, Object> tmp = new EnumMap<ActivityIdentifier, Object>(ActivityIdentifier.class);
    for (ActivityIdentifier e : ActivityIdentifier.values()) {
        tmp.put(e, new Object());
    }
    // Read-only view ensures no modifications happen after initialization, making this thread-safe.
    locks = Collections.unmodifiableMap(tmp);
}
public Object getLock(ActivityIdentifier actID){
return locks.get(actID);
}
I think you should go the way of the hashmap, but encapsulate that in a flyweight factory. I.e., you call:
FlyweightAllObjectsLock lockObj = FlyweightAllObjectsLock.newInstance(dbID, actID);
Then lock on that object. The flyweight factory can take a read lock on the map to see if the key is in there, and only take a write lock if it is not. That should reduce contention.
You might also want to look into using weak references on that map as well, to avoid keeping memory from garbage collection.
I can't think of a way to do this that really captures your idea of locking a pair of objects. Some low-level concurrency boffin might be able to invent one, but I have my doubts about whether we would have the necessary primitives to implement it in Java.
I think the idea of using the pairs as keys to identify lock objects is a good one. If you want to avoid locking, then arrange the lookup so that it doesn't do any.
I would suggest a two-level map, vaguely like:
Map<DatabaseIdentifier, Map<ActivityIdentifier, Object>> locks;
Used vaguely thus:
synchronized (locks.get(databaseIdentifier).get(activityIdentifier)) {
performSpecificActivityOnDatabase();
}
If you know what all the databases and activities are upfront, then just create a perfectly normal map containing all the combinations when your application starts up, and use it exactly as above. The only locking is on the lock objects, and there is no contention.
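For the case where everything is known upfront, a sketch of such a pre-built map could look like the following. The class name PrebuiltLocks is hypothetical, and String stands in for both DatabaseIdentifier and ActivityIdentifier, which are not shown in the question.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder: every (database, activity) lock is created once, at construction time.
class PrebuiltLocks {
    private final Map<String, Map<String, Object>> locks;

    PrebuiltLocks(List<String> databases, List<String> activities) {
        Map<String, Map<String, Object>> outer = new HashMap<>();
        for (String db : databases) {
            Map<String, Object> inner = new HashMap<>();
            for (String act : activities) {
                inner.put(act, new Object());
            }
            outer.put(db, Collections.unmodifiableMap(inner));
        }
        // Never modified after construction and stored in a final field,
        // so lookups need no synchronization.
        this.locks = Collections.unmodifiableMap(outer);
    }

    Object lockFor(String db, String act) {
        return locks.get(db).get(act);
    }
}
```

A caller would then write synchronized (prebuilt.lockFor(db, act)) { ... } around the activity; the lookup itself never blocks.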
If you don't know what the databases and activities will be, or there are too many combinations to create a complete map upfront, then you will need to create the map incrementally. This is where Concurrency Fun Times begin.
The straightforward solution is to lazily create the inner maps and the locks, and to protect these actions with normal locks:
Map<ActivityIdentifier, Object> locksForDatabase;
synchronized (locks) {
    locksForDatabase = locks.get(databaseIdentifier);
    if (locksForDatabase == null) {
        locksForDatabase = new HashMap<ActivityIdentifier, Object>();
        locks.put(databaseIdentifier, locksForDatabase);
    }
}
Object lock;
synchronized (locksForDatabase) {
    lock = locksForDatabase.get(activityIdentifier);
    if (lock == null) {
        lock = new Object();
        locksForDatabase.put(activityIdentifier, lock);
    }
}
synchronized (lock) {
    performSpecificActivityOnDatabase();
}
As you are evidently aware, this will lead to too much contention. I mention it only for didactic completeness.
You can improve it by making the outer map concurrent:
ConcurrentMap<DatabaseIdentifier, Map<ActivityIdentifier, Object>> locks;
And:
Map<ActivityIdentifier, Object> newHashMap = new HashMap<ActivityIdentifier, Object>();
Map<ActivityIdentifier, Object> locksForDatabase = locks.putIfAbsent(databaseIdentifier, newHashMap);
if (locksForDatabase == null) locksForDatabase = newHashMap;
Object lock;
synchronized (locksForDatabase) {
    lock = locksForDatabase.get(activityIdentifier);
    if (lock == null) {
        lock = new Object();
        locksForDatabase.put(activityIdentifier, lock);
    }
}
synchronized (lock) {
    performSpecificActivityOnDatabase();
}
Your only lock contention there will be on the per-database maps, for the duration of a put and a get, and according to your report, there won't be much of that. You could convert the inner map to a ConcurrentMap to avoid that, but that sounds like overkill.
There will, however, be a steady stream of HashMap instances being created to be fed to putIfAbsent and then being thrown away. You can avoid that with a sort of postmodern atomic remix of double-checked locking; replace the first three lines with:
Map<ActivityIdentifier, Object> locksForDatabase = locks.get(databaseIdentifier);
if (locksForDatabase == null) {
Map<ActivityIdentifier, Object> newHashMap = new HashMap<ActivityIdentifier, Object>();
locksForDatabase = locks.putIfAbsent(databaseIdentifier, newHashMap);
if (locksForDatabase == null) locksForDatabase = newHashMap;
}
In the common case that the per-database map already exists, this will do a single concurrent get. In the uncommon case that it does not, it will do an additional but necessary new HashMap() and putIfAbsent. In the very rare case that it does not, but another thread has also discovered that, one of the threads will be doing a redundant new HashMap() and putIfAbsent. That should not be expensive.
Actually, it occurs to me that this is all a terrible idea, and that you should just stick the two identifiers together to make one double-size key, and use that to make lookups in a single ConcurrentHashMap. Sadly, I am too lazy and vain to delete the above. Consider this advice a special prize for reading this far.
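A minimal sketch of that single-map idea, assuming Java 8 or later: the class name PairLocks is hypothetical, String stands in for both identifier types, and java.util.AbstractMap.SimpleImmutableEntry is used as a ready-made pair key because it already implements equals and hashCode over both components.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical lock registry keyed by the (database, activity) pair.
class PairLocks {
    private final ConcurrentHashMap<Map.Entry<String, String>, Object> locks = new ConcurrentHashMap<>();

    Object lockFor(String dbId, String actId) {
        // The lock for a pair is created atomically, at most once, on first use.
        return locks.computeIfAbsent(new SimpleImmutableEntry<>(dbId, actId), k -> new Object());
    }

    void doActivity(String dbId, String actId, Runnable action) {
        // Only callers with the same database AND the same activity contend for this lock.
        synchronized (lockFor(dbId, actId)) {
            action.run();
        }
    }
}
```

Each call allocates a small key object, and locks for pairs that are no longer used stay in the map, so some cleanup strategy is still needed if the set of keys changes over time.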
PS It always mildly annoys me to see an instance of Object used as nothing but a lock. I propose calling them LockGuffins.
Your hashmap suggestion is what I've done in the past. The only change I'd make is using a ConcurrentHashMap, to minimize the synchronization.
The other issue is how to clean up the map if the possible keys are going to change.