I have code which implements a "lock handler" for arbitrary keys. Given a key, it ensures that only one thread at a time can process that key (or any equal key), which here means making the externalSystem.process(key) call.
So far, I have code like this:
public class MyHandler {
private final SomeWorkExecutor someWorkExecutor;
private final ConcurrentHashMap<Key, Lock> lockMap = new ConcurrentHashMap<>();
public void handle(Key key) {
// This can lead to OOM as it creates locks without removing them
Lock keyLock = lockMap.computeIfAbsent(
key, (k) -> new ReentrantLock()
);
keyLock.lock();
try {
someWorkExecutor.process(key);
} finally {
keyLock.unlock();
}
}
}
I understand that this code can lead to an OutOfMemoryError because nothing ever clears the map.
I have been thinking about how to make a map that holds only a limited number of elements. When the limit is exceeded, the least recently accessed element should be replaced with the new one (and this code should be synchronized using the oldest element as the monitor). But I don't know how to get a callback that tells me the limit has been exceeded.
Please share your thoughts.
P.S.
I reread the task and now I see that there is a limitation: the handle method cannot be invoked by more than 8 threads. I don't know how that can help me, but I thought I'd mention it.
P.S.2
Boris the Spider suggested a nice and simple solution:
} finally {
lockMap.remove(key);
keyLock.unlock();
}
But Boris then noticed that this code is not thread-safe because it breaks the locking behavior.
Let's examine 3 threads invoked with equal keys:
Thread#1 acquires the lock and is now just before map.remove(key);
Thread#2 is invoked with an equal key, so it waits until thread#1 releases the lock.
Then thread#1 executes map.remove(key). After this, thread#3 invokes the handle method. It finds no lock for this key in the map, so it creates a new lock and acquires it.
Thread#1 releases the lock and thus thread#2 acquires it.
Thus thread#2 and thread#3 can run in parallel for equal keys, which must not be allowed.
To avoid this situation, before removing the entry from the map we would have to prevent any new thread from acquiring the lock until every thread already waiting on it has acquired and released it. That seems to require fairly complicated synchronization, and it would make the algorithm slow. Maybe we should instead clear the map from time to time, when its size exceeds some limit.
I have spent a lot of time on this, but unfortunately I have no idea how to achieve it.
You don't need to try to limit the size to some arbitrary value - as it turns out, you can accomplish this kind of "lock handler" idiom while only storing exactly the number of keys currently locked in the map.
The idea is to use a simple convention: successfully adding the mapping to the map counts as the "lock" operation, and removing it counts as the "unlock" operation. This neatly avoids the issue of removing a mapping while some thread still has it locked and other race conditions.
At this point, the value in the mapping is only used to block other threads who arrive with the same key and need to wait until the mapping is removed.
Here's an example¹ with CountDownLatch rather than Lock as the map value:
public void handle(Key key) throws InterruptedException {
CountDownLatch latch = new CountDownLatch(1);
// try to acquire the lock by inserting our latch as a
// mapping for key
while(true) {
CountDownLatch existing = lockMap.putIfAbsent(key, latch);
if (existing != null) {
// there is an existing key, wait on it
existing.await();
} else {
break;
}
}
try {
externalSystem.process(key);
} finally {
lockMap.remove(key);
latch.countDown();
}
}
Here, the lifetime of the mapping is only as long as the lock is held. The map will never have more entries than there are concurrent requests for different keys.
The difference with your approach is that the mappings are not "re-used" - each handle call will create a new latch and mapping. Since you are already doing expensive atomic operations, this isn't likely to be much of a slowdown in practice. Another downside is that with many waiting threads, all are woken when the latch counts down, but only one will succeed in putting a new mapping in and hence acquiring the lock - the rest go back to sleep on the new lock.
You could build another version of this which re-uses the mappings when threads come along and wait on an existing mapping. Basically, the unlocking thread just does a "handoff" to one of the waiting threads. Only one mapping will be used for an entire set of threads that wait on the same key; it is handed off to each one in sequence. The size is still bounded because once no more threads are waiting on a given mapping, it is removed.
To implement that, you replace the CountDownLatch with a map value that can count the number of waiting threads. When a thread does the unlock, it first checks to see if any threads are waiting, and if so wakes one to do the handoff. If no threads are waiting, it "destroys" the object (i.e., sets a flag that the object is no longer in the mapping) and removes it from the map.
You need to do the above manipulations under a proper lock, and there are a few tricky details. In practice I find the short and sweet example above works great.
¹ Written on the fly, not compiled and not tested, but the idea works.
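Since the latch example above was written on the fly, here is a minimal self-contained sketch of the same idea that compiles on its own. The class name LatchKeyLocker and the method runLocked are hypothetical, and a Runnable stands in for externalSystem.process(key). It uses the two-argument remove(key, latch) so that a thread can only ever remove its own mapping.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical generic wrapper around the latch idea: K is the key type and
// the Runnable stands in for externalSystem.process(key).
class LatchKeyLocker<K> {
    private final ConcurrentMap<K, CountDownLatch> lockMap = new ConcurrentHashMap<>();

    void runLocked(K key, Runnable work) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        CountDownLatch existing;
        // Successfully inserting our latch is the "lock" operation.
        while ((existing = lockMap.putIfAbsent(key, latch)) != null) {
            // Someone holds the key: wait until they remove their mapping, then retry.
            existing.await();
        }
        try {
            work.run();
        } finally {
            // Removing our own mapping is the "unlock" operation...
            lockMap.remove(key, latch);
            // ...and counting down wakes every thread waiting on our latch.
            latch.countDown();
        }
    }

    // Number of keys currently locked (0 when nobody is inside runLocked).
    int size() {
        return lockMap.size();
    }
}
```

All woken waiters loop back to putIfAbsent; only one wins, and the others then wait on the winner's latch.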
You could rely on the method compute(K key, BiFunction<? super K,? super V,? extends V> remappingFunction) to synchronize calls to your process method for a given key. You then no longer need Lock as the value type of your map, since you don't rely on it anymore.
The idea is to rely on the internal locking mechanism of your ConcurrentHashMap to execute your method; this allows threads to execute the process method in parallel for keys whose corresponding hashes are not part of the same bin. This is equivalent to the approach based on striped locks, except that you don't need an additional third-party library.
The striped locks approach is interesting because it is very light in terms of memory footprint: you only need a limited number of locks, so the memory needed for your locks is known and never changes. That is not the case for approaches that use one lock per key (like the one in your question), which is why approaches based on striped locks are generally better/recommended for this kind of need.
So your code could be something like this:
// This will create a ConcurrentHashMap with an initial table size of 16
// bins by default; you may provide an initialCapacity and loadFactor
// if that is too many or too few, to get the expected table size in order
// to increase or reduce the concurrency level of your map
// NB: We don't care much of the type of the value so I arbitrarily
// used Void but it could be any type like simply Object
private final ConcurrentMap<Key, Void> lockMap = new ConcurrentHashMap<>();
public void handle(Key lockKey) {
// Execute the method process through the remapping Function
lockMap.compute(
lockKey,
(key, value) -> {
// Execute the process method under the protection of the
// lock of the bin of hashes corresponding to the key
someWorkExecutor.process(key);
// Returns null to keep the Map empty
return null;
}
);
}
NB 1: As we always return null, the map will always be empty, so you will never run out of memory because of this map.
NB 2: As we never assign a value to a given key, please note that it could also be done using the method computeIfAbsent(K key, Function<? super K,? extends V> mappingFunction):
public void handle(Key lockKey) {
// Execute the method process through the remapping Function
lockMap.computeIfAbsent(
lockKey,
key -> {
// Execute the process method under the protection of the
// lock of the bin of hashes corresponding to the key
someWorkExecutor.process(key);
// Returns null to keep the Map empty
return null;
}
);
}
NB 3: Make sure that your process method never calls the handle method for any key, as you would end up with infinite loops (same key) or deadlocks (other non-ordered keys; for example, if one thread calls handle(key1) and process internally calls handle(key2), while another thread calls handle(key2) in parallel and process internally calls handle(key1), you will get a deadlock whatever the approach used). This behavior is not specific to this approach; it will occur with any approach.
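A minimal self-contained sketch of the compute approach, assuming the work can be passed in as a Consumer (ComputeKeyLocker and runLocked are illustrative names, not part of the answer above). One caveat worth keeping in mind: the ConcurrentHashMap.compute Javadoc asks that the remapping function be short and simple and not update other mappings of the map, because other updates may be blocked while it runs; running a long process call inside it knowingly goes against that advice for the sake of simplicity.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Sketch: compute() runs the remapping function while holding the lock of the
// bin that the key hashes to, so calls for equal keys never overlap.
class ComputeKeyLocker<K> {
    // The value type does not matter; nothing is ever stored.
    private final ConcurrentHashMap<K, Boolean> lockMap = new ConcurrentHashMap<>();

    void runLocked(K key, Consumer<K> work) {
        lockMap.compute(key, (k, ignored) -> {
            work.accept(k); // executed under the bin's lock
            return null;    // returning null keeps the map empty
        });
    }

    int size() {
        return lockMap.size();
    }
}
```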
One approach is to dispense with the concurrent hash map entirely, and just use a regular HashMap with locking to perform the required manipulation of the map and lock state atomically.
At first glance, this seems to reduce the concurrency of the system, but if we assume that the process(key) call is lengthy relative to the very fast lock manipulations, it works well because the process() calls still run concurrently. Only a small and fixed amount of work occurs in the exclusive critical section.
Here's a sketch:
public class MyHandler {
private static class LockHolder {
ReentrantLock lock = new ReentrantLock();
int refcount = 0;
void lock(){
lock.lock();
}
}
private final SomeWorkExecutor someWorkExecutor;
private final Lock mapLock = new ReentrantLock();
private final HashMap<Key, LockHolder> lockMap = new HashMap<>();
public void handle(Key key) {
// lock the map
mapLock.lock();
LockHolder holder = lockMap.computeIfAbsent(key, k -> new LockHolder());
// the lock in holder is either unlocked (newly created by us), or an existing lock, let's increment refcount
holder.refcount++;
mapLock.unlock();
holder.lock();
try {
someWorkExecutor.process(key);
} finally {
mapLock.lock();
// release the per-key lock, then drop our reference under the map lock
holder.lock.unlock();
if (--holder.refcount == 0) {
// no more users, remove lock holder
lockMap.remove(key);
}
mapLock.unlock();
}
}
}
We use refcount, which is only manipulated under the shared mapLock, to keep track of how many users of the lock there are. Whenever the refcount drops to zero, we can get rid of the entry as we exit the handler. This approach is nice in that it is fairly easy to reason about and will perform well if the process() call is relatively expensive compared to the locking overhead. Since the map manipulation occurs under a shared lock, it is also straightforward to add additional logic, e.g., keeping some Holder objects in the map, keeping track of statistics, etc.
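The same reference-counting idea can also be sketched without the separate mapLock, by letting ConcurrentHashMap.compute perform the per-key increment and decrement atomically. This is a variation on the answer, not its code; RefCountedKeyLocker and runLocked are hypothetical names, and a Runnable stands in for someWorkExecutor.process(key).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Variation of the refcount approach: compute() replaces the global mapLock.
class RefCountedKeyLocker<K> {
    private static final class Holder {
        final ReentrantLock lock = new ReentrantLock();
        int refcount; // only read or written inside compute() for this key
    }

    private final ConcurrentHashMap<K, Holder> lockMap = new ConcurrentHashMap<>();

    void runLocked(K key, Runnable work) {
        // Atomically obtain (or create) the holder and register as a user.
        Holder holder = lockMap.compute(key, (k, h) -> {
            if (h == null) {
                h = new Holder();
            }
            h.refcount++;
            return h;
        });
        holder.lock.lock();
        try {
            work.run();
        } finally {
            holder.lock.unlock();
            // Deregister; returning null removes the entry when the last user leaves.
            lockMap.compute(key, (k, h) -> --h.refcount == 0 ? null : h);
        }
    }

    int size() {
        return lockMap.size();
    }
}
```

Because registration happens in the same atomic step that hands out the holder, a thread that is still waiting on holder.lock keeps the count above zero, so the entry cannot be removed under it.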
Thanks to Ben Manes.
I have found this variant.
public class MyHandler {
private final int THREAD_COUNT = 8;
private final int K = 100;
private final Striped<Lock> striped = Striped.lazyWeakLock(THREAD_COUNT * K);
private final SomeWorkExecutor someWorkExecutor = new SomeWorkExecutor();
public void handle(Key key) throws InterruptedException {
Lock keyLock = striped.get(key);
keyLock.lock();
try {
someWorkExecutor.process(key);
} finally {
keyLock.unlock();
}
}
}
Here's a short and sweet version that leverages the weak version of Guava's Interner class to do the heavy lifting of coming up with a "canonical" object for each key to use as the lock, and implementing weak reference semantics so that unused entries are cleaned up.
public class InternerHandler {
private final Interner<Key> interner = Interners.newWeakInterner();
public void handle(Key key) throws InterruptedException {
Key canonKey = interner.intern(key);
synchronized (canonKey) {
someWorkExecutor.process(key);
}
}
}
Basically we ask for a canonical canonKey which is equal() to key, and then lock on this canonKey. Everyone will agree on the canonical key and hence all callers that pass equal keys will agree on the object on which to lock.
The weak nature of the Interner means that any time the canonical key isn't being used, the entry can be removed, so you avoid accumulation of entries in the interner. Later, if an equal key again comes in, a new canonical entry is chosen.
The simple code above relies on the built-in monitor to synchronize - but if this doesn't work for you (e.g., it's already used for another purpose) you can include a lock object in the Key class or create a holder object.
class MyHandler {
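// Caveat: a WeakHashMap entry only stays alive while the key instance that created it
// is strongly reachable; if that instance is collected while another thread still holds
// the lock via an equal key, a later caller can get a fresh lock for an equal key.
private final Map<Key, Lock> lockMap = Collections.synchronizedMap(new WeakHashMap<>());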
private final SomeWorkExecutor someWorkExecutor = new SomeWorkExecutor();
public void handle(Key key) throws InterruptedException {
Lock keyLock = lockMap.computeIfAbsent(key, (k) -> new ReentrantLock());
keyLock.lock();
try {
someWorkExecutor.process(key);
} finally {
keyLock.unlock();
}
}
}
Creating and removing the lock object for a key each time is a costly operation in terms of performance. When you add/remove locks in a concurrent map (say, a cache), you have to ensure that putting/removing objects in the cache is itself thread-safe. So this does not seem like a good idea, but it can be implemented via ConcurrentHashMap.
The lock striping approach (also used internally by ConcurrentHashMap) is a better approach. The Google Guava docs explain it as follows:
When you want to associate a lock with an object, the key guarantee
you need is that if key1.equals(key2), then the lock associated with
key1 is the same as the lock associated with key2.
The crudest way to do this is to associate every key with the same
lock, which results in the coarsest synchronization possible. On the
other hand, you can associate every distinct key with a different
lock, but this requires linear memory consumption and concurrency
management for the system of locks itself, as new keys are discovered.
Striped allows the programmer to select a number of locks, which are
distributed between keys based on their hash code. This allows the
programmer to dynamically select a tradeoff between concurrency and
memory consumption, while retaining the key invariant that if
key1.equals(key2), then striped.get(key1) == striped.get(key2)
code:
//declare globally; e.g. class field level
Striped<Lock> rwLockStripes = Striped.lock(16);
Lock lock = rwLockStripes.get("key");
lock.lock();
try {
// do your work here
} finally {
lock.unlock();
}
The following snippet of code can help with implementing the putting/removal of locks.
private ConcurrentHashMap<String, ReentrantLock> caches = new ConcurrentHashMap<>();
public void processWithLock(String key) {
ReentrantLock lock = findAndGetLock(key);
lock.lock();
try {
// do your work here
} finally {
unlockAndClear(key, lock);
}
}
private void unlockAndClear(String key, ReentrantLock lock) {
// *** Step 1: Release the lock.
lock.unlock();
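// *** Step 2: Attempt to remove the lock
// This is done by calling computeIfPresent: if the current lock object in the
// cache is the same instance as 'lock', remove it from the cache. If not, some
// other thread has succeeded in putting a new lock object, and we leave the
// removal of that lock object to that thread.
// Caveat: a thread that already obtained 'lock' from findAndGetLock() but is
// still blocked in lock.lock() can acquire it after this removal, while a new
// caller creates a fresh lock for the same key; the two can then run
// concurrently (the same race described in the question).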
caches.computeIfPresent(key, (k, current) -> lock == current ? null : current);
}
private ReentrantLock findAndGetLock(String key) {
// The merge method gives us access to the previous (if available) and
// newer lock objects together.
return caches.merge(key, new ReentrantLock(), (older, newer) -> nonNull(older) ? older : newer);
}
Instead of writing your own, you might try something like JKeyLockManager. From the project's description:
JKeyLockManager provides fine-grained locking with application
specific keys.
Example code given on site:
public class WeatherServiceProxy {
private final KeyLockManager lockManager = KeyLockManagers.newManager();
public void updateWeatherData(String cityName, float temperature) {
lockManager.executeLocked(cityName, () -> delegate.updateWeatherData(cityName, temperature));
}
}
New values will be added when you call
lockMap.computeIfAbsent()
so you can just check lockMap.size() for the item count.
But how are you going to find the first added item? It would be better to just remove items after you have used them.
You can use an in-process cache that stores object references, like Caffeine, Guava, EHCache or cache2k. Here is an example of how to build a cache with cache2k:
final Cache<Key, Lock> locks =
new Cache2kBuilder<Key, Lock>(){}
.loader(
new CacheLoader<Key, Lock>() {
@Override
public Lock load(Key o) {
return new ReentrantLock();
}
}
)
.storeByReference(true)
.entryCapacity(1000)
.build();
The usage pattern is as you have in the question:
Lock keyLock = locks.get(key);
keyLock.lock();
try {
externalSystem.process(key);
} finally {
keyLock.unlock();
}
Since the cache is limited to 1000 entries, locks that are no longer in use are cleaned up automatically.
There is the potential that a lock in use is evicted by the cache if the capacity and the number of threads in the application are mismatched. This solution has worked perfectly for years in our applications. The cache will evict a lock that is in use only when there is a sufficiently long-running task AND the capacity is exceeded. In a real application you always control the number of live threads; e.g., in a web container you would limit the number of processing threads to (for example) 100, so you know that there are never more than 100 locks in use. If this is accounted for, this solution has minimal overhead.
Keep in mind that the locking only works as long as your application runs on a single VM. You may want to take a look at distributed lock managers (DLM). Examples of products that provide distributed locks: Hazelcast, Infinispan, Terracotta, Redis/Redisson.
Related
I have a REST API which has one method M which does something.
Of course it's called by multiple threads sometimes simultaneously.
This method M has an input String businessID
(which comes from the payload of the caller/client).
Now... I want to protect one particular section from method M's body against simultaneous execution by multiple threads. But I want to defend it only if I have simultaneous executions by two threads T1 and T2 for the same businessID value. So after some thinking I decided to go with this approach.
public void M() {
// non-critical work 1
String bid = businessID.intern();
synchronized (bid){
// do some critical work here
}
// non-critical work 2
}
That means I intend to use the interned version of the String businessID as a lock to my critical section of code.
Is this going to work as intended? I think so... but I want to be absolutely sure.
Also, does anyone have any alternative ideas how to implement this? I wonder if there's some ready-made solution, like an idiomatic way of doing this in Java, without having to implement my own cache, my own eviction mechanism, etc. etc.
Note that delays caused by this synchronization do not worry me. It is a very rare scenario for two threads to call method M with the same business ID at the same time (it happens only once or twice per day). Also, the critical section takes no more than 1-2 seconds to complete. So delays caused by threads waiting to obtain the lock are not a concern here.
It seems like a bad idea because:
String#intern() is a native method and it uses native Hash Table which apparently is much slower than a typical ConcurrentHashMap.
You probably don't want the pool of strings to grow indefinitely. How would you invalidate entries from there?
I think using your own map of strings will be the preferred way. Maybe Guava's cache could be leveraged because you need to evict items from that map eventually. But this needs further research.
Leasing a lock
Another option is to have a set of predefined lock objects. E.g. a HashMap of size 513. Then to acquire a lock use bid.hashCode() mod 513:
int hash = Math.abs(bid.hashCode() % 513);
Object lock = locks.get(hash);
synchronized(lock) {...}
This will occasionally lock unrelated transactions, but at least you don't have to bother with eviction.
PS: Math.floorMod(bid.hashCode(), 513) computes a true (always non-negative) modulus, so the Math.abs call is not needed.
I think you can maintain a registry of the current locks held for each businessId: before starting the critical section, look into this registry to get/create a lock, and after you are done with the critical section, release the lock.
Something like the code below (it's not production-ready):
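A minimal sketch of the fixed-pool idea, using an array instead of a HashMap and Math.floorMod for the index. StripedMonitors, lockFor and runLocked are hypothetical names; 513 is just the pool size suggested above.

```java
// Fixed pool of monitors: equal keys always get the same monitor,
// unrelated keys occasionally share one.
class StripedMonitors {
    private final Object[] locks;

    StripedMonitors(int stripes) {
        locks = new Object[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new Object();
        }
    }

    Object lockFor(Object key) {
        // floorMod returns a value in [0, stripes) even for negative hash codes
        return locks[Math.floorMod(key.hashCode(), locks.length)];
    }

    void runLocked(Object key, Runnable work) {
        synchronized (lockFor(key)) {
            work.run();
        }
    }
}
```

Memory use is fixed at the pool size; the price is that unrelated business IDs whose hashes collide modulo the pool size occasionally wait for each other.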
import java.util.HashMap;
import java.util.Map;
class Lock {
public static void main(String[] args) {
String businessId ="bid1";
Lock lock = getLockObjectForBusinessId(businessId);
synchronized (lock) {
//critical section start
//do work
//critical section end
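// Caveat: a thread already waiting on this same lock object can still enter after
// this removal, while a new caller gets a fresh lock for the same businessId.
releaseLockForBusinessId(businessId);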
}
}
public static Map<String, Lock> currentLocks = new HashMap<>();
public static synchronized Lock getLockObjectForBusinessId(String businessId){
Lock currentLock = currentLocks.get(businessId);
if(currentLock==null){
Lock lock = new Lock();
currentLocks.put(businessId,lock);
return lock;
}
else{
return currentLock;
}
}
public static synchronized void releaseLockForBusinessId(String businessId){
currentLocks.remove(businessId);
}
}
I have several threads that save information to my ConcurrentHashMap<K,V>. I'm supposed to take a snapshot of the whole Map in a parent thread, process the information in it and eventually empty it from all values and keys.
How can I make sure that during the read (in the parent thread), it is not gonna be updated by any child-threads writing to it until I'm done with it? Is it possible to lock this data structure with a semaphore or a mutex?
Try something like this: use a monitor to guard the map field, so that while you're taking the snapshot no one else can put values into it.
import java.util.HashMap;
import java.util.Map;
public class Example<K, V> {
private final Map<K, V> map = new HashMap<>();
private final Object monitor = new Object();
public Map<K, V> snapshot() {
synchronized (monitor) {
// take the snapshot (a copy) while no writer can modify the map
return new HashMap<>(map);
}
}
public V put(K key, V value) {
synchronized (monitor) {
return map.put(key, value);
}
}
}
Also in this example you can simplify it by using a simple HashMap instead of a ConcurrentHashMap because you have the monitor guarding accesses to that field.
Use a ReadWriteLock.
This gives you a pair of locks:
A read lock, which many threads can acquire at the same time
A write lock, which only one thread can hold, and whilst held no thread can hold the read lock.
Despite the names, there is no reason these locks have to be used for reading and writing specifically:
Acquire (and release) the read lock for threads that are updating the map
Acquire (and release) the write lock for the thread which has to see the whole map at once.
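A minimal sketch of that inverted use of ReentrantReadWriteLock. SnapshotMap and snapshotAndClear are illustrative names. The child threads share the read lock, so they can still write concurrently to the ConcurrentHashMap, while the parent's write lock excludes all of them while it copies and clears the map:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class SnapshotMap<K, V> {
    private final Map<K, V> map = new ConcurrentHashMap<>();
    private final ReadWriteLock rw = new ReentrantReadWriteLock();

    // Called by the child threads: many can hold the read lock at once.
    void put(K key, V value) {
        rw.readLock().lock();
        try {
            map.put(key, value);
        } finally {
            rw.readLock().unlock();
        }
    }

    // Called by the parent thread: the write lock waits for all writers to leave
    // and blocks new ones until the copy has been taken and the map emptied.
    Map<K, V> snapshotAndClear() {
        rw.writeLock().lock();
        try {
            Map<K, V> copy = new HashMap<>(map);
            map.clear();
            return copy;
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```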
I'm attempting to create a ConcurrentHashMap that supports "snapshots" in order to provide consistent iterators, and am wondering if there's a more efficient way to do this. The problem is that if two iterators are created at the same time then they need to read the same values, and the definition of the concurrent hash map's weakly consistent iterators does not guarantee this to be the case. I'd also like to avoid locks if possible: there are several thousand values in the map and processing each item takes several dozen milliseconds, and I don't want to have to block writers during this time as this could result in writers blocking for a minute or longer.
What I have so far:
The ConcurrentHashMap's keys are Strings, and its values are instances of ConcurrentSkipListMap<Long, T>
When an element is added to the hashmap with putIfAbsent, then a new skiplist is allocated, and the object is added via skipList.put(System.nanoTime(), t).
To query the map, I use map.get(key).lastEntry().getValue() to return the most recent value. To query a snapshot (e.g. with an iterator), I use map.get(key).lowerEntry(iteratorTimestamp).getValue(), where iteratorTimestamp is the result of System.nanoTime() called when the iterator was initialized.
If an object is deleted, I use map.get(key).put(timestamp, SnapShotMap.DELETED), where DELETED is a static final object.
Questions:
Is there a library that already implements this? Or barring that, is there a data structure that would be more appropriate than the ConcurrentHashMap and the ConcurrentSkipListMap? My keys are comparable, so maybe some sort of concurrent tree would better support snapshots than a concurrent hash table.
How do I prevent this thing from continually growing? I can delete all of the skip list entries with keys less than X (except for the last key in the map) after all iterators that were initialized on or before X have completed, but I don't know of a good way to determine when this has happened: I can flag that an iterator has completed when its hasNext method returns false, but not all iterators are necessarily going to run to completion; I can keep a WeakReference to an iterator so that I can detect when it's been garbage collected, but I can't think of a good way to detect this other than by using a thread that iterates through the collection of weak references and then sleeps for several minutes - ideally the thread would block on the WeakReference and be notified when the wrapped reference is GC'd, but I don't think this is an option.
ConcurrentSkipListMap<Long, WeakReference<Iterator>> iteratorMap;
while(true) {
long latestGC = 0;
for(Map.Entry<Long, WeakReference<Iterator>> entry : iteratorMap.entrySet()) {
if(entry.getValue().get() == null) {
iteratorMap.remove(entry.getKey());
latestGC = entry.getKey();
} else break;
}
// remove ConcurrentHashMap entries with timestamps less than `latestGC`
Thread.sleep(300000); // five minutes
}
Edit: To clear up some confusion in the answers and comments, I'm currently passing weakly consistent iterators to code written by another division in the company, and they have asked me to increase the strength of the iterators' consistency. They are already aware of the fact that it is infeasible for me to make 100% consistent iterators, they just want a best effort on my part. They care more about throughput than iterator consistency, so coarse-grained locks are not an option.
What is your actual use case that requires a special implementation? From the Javadoc of ConcurrentHashMap (emphasis added):
Retrievals reflect the results of the most recently completed update operations holding upon their onset. ... Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.
So the regular ConcurrentHashMap.values().iterator() will give you a "consistent" iterator, but only for one-time use by a single thread. If you need to use the same "snapshot" multiple times and/or by multiple threads, I suggest making a copy of the map.
EDIT: With the new information and the insistence for a "strongly consistent" iterator, I offer this solution. Please note that the use of a ReadWriteLock has the following implications:
Writes will be serialized (only one writer at a time) so write performance may be impacted.
Concurrent reads are allowed as long as there is no write in progress, so read performance impact should be minimal.
Active readers block writers but only as long as it takes to retrieve the reference to the current "snapshot". Once a thread has the snapshot, it no longer blocks writers no matter how long it takes to process the information in the snapshot.
Readers are blocked while any write is active; once the write finishes then all readers will have access to the new snapshot until a new write replaces it.
Consistency is achieved by serializing the writes and making a copy of the current values on each and every write. Readers that hold a reference to a "stale" snapshot can continue to use the old snapshot without worrying about modification, and the garbage collector will reclaim old snapshots as soon as no one is using them any more. It is assumed that there is no requirement for a reader to request a snapshot from an earlier point in time.
Because snapshots are potentially shared among multiple concurrent threads, the snapshots are read-only and cannot be modified. This restriction also applies to the remove() method of any Iterator instances created from the snapshot.
import java.util.*;
import java.util.concurrent.locks.*;
public class StackOverflow16600019 <K, V> {
private final ReadWriteLock locks = new ReentrantReadWriteLock();
private final HashMap<K,V> map = new HashMap<>();
private Collection<V> valueSnapshot = Collections.emptyList();
public V put(K key, V value) {
locks.writeLock().lock();
try {
V oldValue = map.put(key, value);
updateSnapshot();
return oldValue;
} finally {
locks.writeLock().unlock();
}
}
public V remove(K key) {
locks.writeLock().lock();
try {
V removed = map.remove(key);
updateSnapshot();
return removed;
} finally {
locks.writeLock().unlock();
}
}
public Collection<V> values() {
locks.readLock().lock();
try {
return valueSnapshot; // read-only!
} finally {
locks.readLock().unlock();
}
}
/** Callers MUST hold the WRITE LOCK. */
private void updateSnapshot() {
valueSnapshot = Collections.unmodifiableCollection(
new ArrayList<V>(map.values())); // copy
}
}
I've found that the Ctrie is the ideal solution: it's a concurrent hash array mapped trie with constant-time snapshots.
Solution1) What about just synchronizing on the puts, and on the iteration. That should give you a consistent snapshot.
Solution2) When you start iterating, set a boolean flag to say so, and override put and putAll so that while the flag is set the writes go into a queue; when the iteration is finished, simply apply the queued puts with the changed values.
I'm trying to find a way to perform multiple operations on a ConcurrentHashMap in an atomic manner.
My logic is like this:
if (!map.containsKey(key)) {
map.put(key, value);
doSomethingElse();
}
I know there is the putIfAbsent method. But if I use it, I still won't be able to call the doSomethingElse atomically.
Is there any way of doing such things apart from resorting to synchronization / client-side locking?
If it helps, the doSomethingElse in my case would be pretty complex, involving creating and starting a thread that looks for the key that we just added to the map.
If that's the case, you would generally have to synchronize externally.
In some circumstances (depending on what doSomethingElse() expects the state of the map to be, and what the other threads might do to the map), the following may also work:
if (map.putIfAbsent(key, value) == null) {
doSomethingElse();
}
This will ensure that only one thread goes into doSomethingElse() for any given key.
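A small self-contained sketch of the putIfAbsent guard. PutOnce, putAndMaybeAct and the sideEffects counter are hypothetical; doSomethingElse() here only counts its calls. It guarantees that doSomethingElse() runs once per key, but it is not atomic with the put: other threads can see the new key before doSomethingElse() has finished.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class PutOnce {
    final ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
    final AtomicInteger sideEffects = new AtomicInteger(); // counts doSomethingElse() calls

    void putAndMaybeAct(String key, String value) {
        // Only the thread whose putIfAbsent returns null "won" the insertion.
        if (map.putIfAbsent(key, value) == null) {
            doSomethingElse();
        }
    }

    // Hypothetical stand-in for the question's doSomethingElse().
    private void doSomethingElse() {
        sideEffects.incrementAndGet();
    }
}
```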
This would work unless you want all putting threads to wait until the first successful thread puts in the map.
if(map.get(key) == null){
Object ret = map.putIfAbsent(key,value);
if(ret == null){ // I won the put
doSomethingElse();
}
}
Now if many threads are putting with the same key only one will win and only one will doSomethingElse().
If your design demands that the map access and the other operation be grouped without anybody else accessing the map, then you have no choice but to lock them. Perhaps the design can be revisited to avoid this need?
This also implies that all other accesses to the map must be serialized behind the same lock.
You might keep a lock per entry. That would allow concurrent non-locking updates, unless two threads try to access the same element.
class LockedReference<T> {
Lock lock = new ReentrantLock();
T value;
LockedReference(T value) {this.value=value;}
}
LockedReference<T> ref = new LockedReference<>(value);
ref.lock.lock(); //lock on the new reference, there is no contention here
try {
if (map.putIfAbsent(key, ref)==null) {
//we have locked on the key before inserting the element
doSomethingElse();
}
} finally {ref.lock.unlock();}
Later:
Object value;
while (true) {
LockedReference<T> ref = map.get(key);
if (ref != null) {
ref.lock.lock();
//there is no contention, unless a thread is already working on this entry
try {
if (map.get(key) == ref) {
value = ref.value;
break;
} else {
/*key was removed (or replaced) between get and lock; retry*/
}
} finally {ref.lock.unlock();}
} else {
value = null;
break; // no entry for key: stop instead of looping forever
}
}
A fancier approach would be rewriting ConcurrentHashMap and have a version of putIfAbsent that accepts a Runnable (which is executed if the element was put). But that would be far far more complex.
Basically, ConcurrentHashMap implements locked segments, which is in the middle between one lock per entry, and one global lock for the whole map.
I'm wondering if there's a way in Java to synchronize using two lock objects.
I don't mean locking on either object; I mean locking only on the combination of both.
For example, if I have 4 threads:
Thread A requests a lock using Object1 and Object2
Thread B requests a lock using Object1 and Object3
Thread C requests a lock using Object4 and Object2
Thread D requests a lock using Object1 and Object2
In the above scenario, Thread A and Thread D would share a lock, but Thread B and Thread C would have their own locks. Even though B and C each overlap with A on one of the two objects, a lock is shared only when both objects match.
So I have a method called by many threads which is going to perform a specific activity type based on a specific database. I have identifier objects for both the database and the activity, and I can guarantee that the action will be thread safe as long as it is not the same activity based on the same database as another thread.
My ideal code would look something like:
public void doActivity(DatabaseIdentifier dbID, ActivityIdentifier actID) {
    synchronized (dbID, actID) { // <--- Not real Java
        // Do an action that can be guaranteed thread-safe per unique
        // combination of dbID and actID, but needs to share a
        // lock if they are both the same.
    }
}
I could create a hashmap of lock objects that are keyed by both the DatabaseIdentifier and the ActivityIdentifier, but I'm going to run into the same synchronization issue when I need to create/access those locks in a thread-safe way.
For now I'm just synchronizing on the DatabaseIdentifier. It's much less likely that there will be multiple activities going on at the same time for one DBIdentifier, so I will only rarely be over-locking. (Can't say the same for the opposite direction though.)
Anyone have a good way to handle this that doesn't involve forcing unnecessary threads to wait?
Thanks!
Have each DatabaseIdentifier keep a collection of locks, keyed by the ActivityIdentifiers that it owns,
so you can call:
public void doActivity(DatabaseIdentifier dbID, ActivityIdentifier actID) {
    synchronized (dbID.getLock(actID)) {
        // Do an action that can be guaranteed thread-safe per unique
        // combination of dbID and actID, but needs to share a
        // lock if they are both the same.
    }
}
Then you only need a (short) lock on the underlying collection in dbID (use a ConcurrentHashMap).
In other words:
ConcurrentHashMap<ActivityIdentifier, Object> locks = new...

public Object getLock(ActivityIdentifier actID) {
    Object res = locks.get(actID); // avoid unnecessary allocations of Object
    if (res == null) {
        Object newLock = new Object();
        res = locks.putIfAbsent(actID, newLock);
        return res != null ? res : newLock;
    } else {
        return res;
    }
}
This is better than locking the full action on dbID (especially when it's a long action), but still worse than your ideal scenario.
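On Java 8 or later, the get/putIfAbsent dance in getLock can be collapsed into a single computeIfAbsent call, which allocates a lock object only when the key really is absent. A minimal sketch; the class PerDatabaseLocks (which would live inside DatabaseIdentifier) and the string activity names are assumptions standing in for the question's types.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical holder for one database's lock table; A is the activity type.
class PerDatabaseLocks<A> {
    private final ConcurrentMap<A, Object> locks = new ConcurrentHashMap<>();

    // Returns the same lock object for equal activity identifiers.
    Object getLock(A actID) {
        return locks.computeIfAbsent(actID, k -> new Object());
    }

    public static void main(String[] args) {
        PerDatabaseLocks<String> db = new PerDatabaseLocks<>();
        System.out.println(db.getLock("backup") == db.getLock("backup")); // true
        System.out.println(db.getLock("backup") == db.getLock("vacuum")); // false
    }
}
```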
Update in response to comments about EnumMap:
private final Map<ActivityIdentifier, Object> locks; // Map: unmodifiableMap returns a plain Map

// Instance initializer ensuring every ActivityIdentifier value gets a lock object.
{
    EnumMap<ActivityIdentifier, Object> tmp = new EnumMap<ActivityIdentifier, Object>(ActivityIdentifier.class);
    for (ActivityIdentifier e : ActivityIdentifier.values()) {
        tmp.put(e, new Object());
    }
    // The read-only view stops later modification, and assigning it to a final
    // field publishes it safely, so concurrent reads need no further locking.
    locks = Collections.unmodifiableMap(tmp);
}

public Object getLock(ActivityIdentifier actID) {
    return locks.get(actID);
}
I think you should go the way of the hashmap, but encapsulate it in a flyweight factory. That is, you call:
FlyweightAllObjectsLock lockObj = FlyweightAllObjectsLock.newInstance(dbID, actID);
Then lock on that object. The flyweight factory can take a read lock on the map to see whether the key is in there, and take a write lock only if it is not. That should reduce contention.
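A sketch of such a factory, assuming a ReentrantReadWriteLock guards a plain HashMap. The name FlyweightAllObjectsLock.newInstance comes from the answer, but its body here is my own assumption. One detail matters: ReentrantReadWriteLock cannot upgrade a read lock to a write lock, so the code releases the read lock and checks again under the write lock.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Flyweight factory: hands out one shared lock object per (first, second) pair.
final class FlyweightAllObjectsLock {
    private static final Map<List<Object>, FlyweightAllObjectsLock> INSTANCES = new HashMap<>();
    private static final ReadWriteLock RW = new ReentrantReadWriteLock();

    private FlyweightAllObjectsLock() {}

    static FlyweightAllObjectsLock newInstance(Object first, Object second) {
        // Arrays.asList gives a key whose equals/hashCode compare both elements.
        List<Object> key = Arrays.asList(first, second);

        // Fast path: many threads may look up existing instances concurrently.
        RW.readLock().lock();
        try {
            FlyweightAllObjectsLock existing = INSTANCES.get(key);
            if (existing != null) {
                return existing;
            }
        } finally {
            RW.readLock().unlock();
        }

        // Slow path: a read lock cannot be upgraded, so take the write lock and
        // check again, because another thread may have inserted in between.
        RW.writeLock().lock();
        try {
            return INSTANCES.computeIfAbsent(key, k -> new FlyweightAllObjectsLock());
        } finally {
            RW.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        FlyweightAllObjectsLock a = newInstance("db1", "report");
        System.out.println(a == newInstance("db1", "report")); // true
        System.out.println(a == newInstance("db1", "import")); // false
    }
}
```

Usage would then be `synchronized (FlyweightAllObjectsLock.newInstance(dbID, actID)) { ... }`.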
You might also want to look into using weak references in that map, so that lock objects nobody uses any more can still be garbage-collected.
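WeakHashMap will not do this directly, because it holds its keys weakly, whereas here it is the lock values that should become collectable. One possible sketch (WeakLockMap is a hypothetical name) stores a WeakReference per key and swaps in a fresh lock only when the old one has been cleared. That is safe because any thread synchronizing on a lock holds a strong reference to it, so a lock in use is never cleared. Keys whose references were cleared stay in the map until overwritten; purging them would need a ReferenceQueue, which this sketch omits.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hands out one lock object per key but only remembers it weakly, so a lock
// that no thread references any more can be garbage-collected.
class WeakLockMap<K> {
    private final ConcurrentMap<K, WeakReference<Object>> locks = new ConcurrentHashMap<>();

    Object getLock(K key) {
        while (true) {
            WeakReference<Object> ref = locks.get(key);
            Object lock = (ref == null) ? null : ref.get();
            if (lock != null) {
                return lock; // still reachable, so everyone gets this same object
            }
            Object fresh = new Object();
            WeakReference<Object> freshRef = new WeakReference<>(fresh);
            // Install the new lock only if the map still holds what we saw;
            // otherwise another thread got there first, so look again.
            boolean installed = (ref == null)
                    ? locks.putIfAbsent(key, freshRef) == null
                    : locks.replace(key, ref, freshRef);
            if (installed) {
                return fresh;
            }
        }
    }

    public static void main(String[] args) {
        WeakLockMap<String> m = new WeakLockMap<>();
        Object held = m.getLock("k"); // strong reference keeps the lock alive
        synchronized (held) {
            System.out.println(held == m.getLock("k")); // true
        }
    }
}
```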
I can't think of a way to do this that really captures your idea of locking a pair of objects. Some low-level concurrency boffin might be able to invent one, but I have my doubts about whether we would have the necessary primitives to implement it in Java.
I think the idea of using the pairs as keys to identify lock objects is a good one. If you want to avoid locking, then arrange the lookup so that it doesn't do any.
I would suggest a two-level map, vaguely like:
Map<DatabaseIdentifier, Map<ActivityIdentifier, Object>> locks;
Used vaguely thus:
synchronized (locks.get(databaseIdentifier).get(activityIdentifier)) {
    performSpecificActivityOnDatabase();
}
If you know what all the databases and activities are upfront, then just create a perfectly normal map containing all the combinations when your application starts up, and use it exactly as above. The only locking is on the lock objects, and there is no contention on the map itself.
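A sketch of that startup-built table, assuming the database and activity identifiers are known as collections when the application starts (PrebuiltLocks and lockFor are made-up names). Storing the finished, unmodifiable maps in a final field relies on Java's final-field semantics to publish them safely, after which lookups need no locking.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Builds every (database, activity) lock up front; afterwards the maps are
// only read, so lookups need no locking of their own.
final class PrebuiltLocks<D, A> {
    private final Map<D, Map<A, Object>> locks; // final field: safe publication

    PrebuiltLocks(Collection<D> databases, Collection<A> activities) {
        Map<D, Map<A, Object>> outer = new HashMap<>();
        for (D db : databases) {
            Map<A, Object> inner = new HashMap<>();
            for (A act : activities) {
                inner.put(act, new Object());
            }
            outer.put(db, Collections.unmodifiableMap(inner));
        }
        locks = Collections.unmodifiableMap(outer);
    }

    Object lockFor(D db, A act) {
        Map<A, Object> forDb = locks.get(db);
        Object lock = (forDb == null) ? null : forDb.get(act);
        if (lock == null) {
            throw new IllegalArgumentException("pair not known at startup: " + db + ", " + act);
        }
        return lock;
    }

    public static void main(String[] args) {
        PrebuiltLocks<String, String> locks = new PrebuiltLocks<>(
                Arrays.asList("db1", "db2"), Arrays.asList("report", "import"));
        synchronized (locks.lockFor("db1", "report")) {
            System.out.println("working on db1/report");
        }
    }
}
```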
If you don't know what the databases and activities will be, or there are too many combinations to create a complete map upfront, then you will need to create the map incrementally. This is where Concurrency Fun Times begin.
The straightforward solution is to lazily create the inner maps and the locks, and to protect these actions with normal locks:
Map<ActivityIdentifier, Object> locksForDatabase;
synchronized (locks) {
    locksForDatabase = locks.get(databaseIdentifier);
    if (locksForDatabase == null) {
        locksForDatabase = new HashMap<ActivityIdentifier, Object>();
        locks.put(databaseIdentifier, locksForDatabase);
    }
}
Object lock;
synchronized (locksForDatabase) {
    lock = locksForDatabase.get(activityIdentifier); // keyed by the activity, not the map
    if (lock == null) {
        lock = new Object();
        locksForDatabase.put(activityIdentifier, lock);
    }
}
synchronized (lock) {
    performSpecificActivityOnDatabase();
}
As you are evidently aware, this will lead to too much contention. I mention it only for didactic completeness.
You can improve it by making the outer map concurrent:
ConcurrentMap<DatabaseIdentifier, Map<ActivityIdentifier, Object>> locks;
And:
Map<ActivityIdentifier, Object> newHashMap = new HashMap<ActivityIdentifier, Object>();
Map<ActivityIdentifier, Object> locksForDatabase = locks.putIfAbsent(databaseIdentifier, newHashMap);
if (locksForDatabase == null) locksForDatabase = newHashMap;
Object lock;
synchronized (locksForDatabase) {
    lock = locksForDatabase.get(activityIdentifier);
    if (lock == null) {
        lock = new Object();
        locksForDatabase.put(activityIdentifier, lock);
    }
}
synchronized (lock) {
    performSpecificActivityOnDatabase();
}
Your only lock contention there will be on the per-database maps, for the duration of a put and a get, and according to your report, there won't be much of that. You could convert the inner map to a ConcurrentMap to avoid that, but that sounds like overkill.
There will, however, be a steady stream of HashMap instances being created to be fed to putIfAbsent and then being thrown away. You can avoid that with a sort of postmodern atomic remix of double-checked locking; replace the first three lines with:
Map<ActivityIdentifier, Object> locksForDatabase = locks.get(databaseIdentifier);
if (locksForDatabase == null) {
    Map<ActivityIdentifier, Object> newHashMap = new HashMap<ActivityIdentifier, Object>();
    locksForDatabase = locks.putIfAbsent(databaseIdentifier, newHashMap);
    if (locksForDatabase == null) locksForDatabase = newHashMap;
}
In the common case that the per-database map already exists, this will do a single concurrent get. In the uncommon case that it does not, it will do an additional but necessary new HashMap() and putIfAbsent. In the very rare case that it does not, but another thread has also discovered that, one of the threads will be doing a redundant new HashMap() and putIfAbsent. That should not be expensive.
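On Java 8 or later, computeIfAbsent removes the need for this double-checked dance: it creates the inner map only when the key is absent, and the same call works at the inner level, which also covers the option of making the inner map concurrent. A hedged sketch; TwoLevelLocks is a made-up name, and string identifiers stand in for DatabaseIdentifier and ActivityIdentifier.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Two-level lock table where both levels are concurrent maps and creation
// happens inside computeIfAbsent, so no throwaway HashMap is allocated.
final class TwoLevelLocks<D, A> {
    private final ConcurrentMap<D, ConcurrentMap<A, Object>> locks = new ConcurrentHashMap<>();

    Object lockFor(D db, A act) {
        return locks.computeIfAbsent(db, d -> new ConcurrentHashMap<>())
                    .computeIfAbsent(act, x -> new Object());
    }

    public static void main(String[] args) {
        TwoLevelLocks<String, String> locks = new TwoLevelLocks<>();
        synchronized (locks.lockFor("db1", "report")) {
            System.out.println("working on db1/report");
        }
    }
}
```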
Actually, it occurs to me that this is all a terrible idea, and that you should just stick the two identifiers together to make one double-size key, and use that to make lookups in a single ConcurrentHashMap. Sadly, I am too lazy and vain to delete the above. Consider this advice a special prize for reading this far.
P.S. It always mildly annoys me to see an instance of Object used as nothing but a lock. I propose calling them LockGuffins.
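A sketch of that single double-size key, assuming the identifiers implement equals and hashCode properly; PairLocks and its nested Key are hypothetical names. Two lookups share a lock only when both halves are equal, which matches the scenario in the question: threads A and D share a lock, while B and C each get their own.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class PairLocks<D, A> {
    // The "double-size" key: equal only when both identifiers are equal.
    private static final class Key {
        private final Object db;
        private final Object act;

        Key(Object db, Object act) {
            this.db = db;
            this.act = act;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return Objects.equals(db, other.db) && Objects.equals(act, other.act);
        }

        @Override
        public int hashCode() {
            return Objects.hash(db, act);
        }
    }

    private final ConcurrentMap<Key, Object> locks = new ConcurrentHashMap<>();

    // One lock object per distinct (db, act) pair, created on first use.
    Object lockFor(D db, A act) {
        return locks.computeIfAbsent(new Key(db, act), k -> new Object());
    }

    public static void main(String[] args) {
        PairLocks<String, String> locks = new PairLocks<>();
        Object a = locks.lockFor("Object1", "Object2"); // thread A
        Object b = locks.lockFor("Object1", "Object3"); // thread B
        Object c = locks.lockFor("Object4", "Object2"); // thread C
        Object d = locks.lockFor("Object1", "Object2"); // thread D
        System.out.println(a == d); // true
        System.out.println(a == b || a == c || b == c); // false
    }
}
```

Like the other map-based answers, this map only ever grows.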
Your hashmap suggestion is what I've done in the past. The only change I'd make is using a ConcurrentHashMap, to minimize the synchronization.
The other issue is how to clean up the map if the possible keys are going to change.
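One way to clean up, sketched under the assumption of Java 8+ (RefCountedKeyLocks and withLock are hypothetical names): count the threads that hold or wait for each key, and remove the entry only when that count drops to zero. Both the increment and the removal happen inside ConcurrentHashMap.compute / computeIfPresent, which are atomic per key. A newcomer therefore always finds the existing lock while any other thread still holds or waits on it, so it never creates a second lock for the same key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Per-key locks whose map entries disappear once no thread holds or waits
// for that key, so the map cannot grow without bound.
class RefCountedKeyLocks<K> {
    private static final class Entry {
        final ReentrantLock lock = new ReentrantLock();
        int users; // only read/written inside compute/computeIfPresent for its key
    }

    private final ConcurrentHashMap<K, Entry> entries = new ConcurrentHashMap<>();

    void withLock(K key, Runnable action) {
        // Register interest atomically: create the entry if needed and count us.
        Entry e = entries.compute(key, (k, existing) -> {
            Entry entry = (existing == null) ? new Entry() : existing;
            entry.users++;
            return entry;
        });
        e.lock.lock();
        try {
            action.run();
        } finally {
            e.lock.unlock();
            // Deregister atomically; returning null removes the mapping.
            entries.computeIfPresent(key, (k, entry) -> --entry.users == 0 ? null : entry);
        }
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) throws InterruptedException {
        RefCountedKeyLocks<String> locks = new RefCountedKeyLocks<>();
        int[] counter = {0}; // plain int, protected only by the per-key lock
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    locks.withLock("same-key", () -> counter[0]++);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(counter[0] + " " + locks.size()); // expected: 4000 0
    }
}
```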