Java 8 ConcurrentHashMap per key wait() within compute()

I want to be able to release the lock of an atomic execution like compute() to wait for a condition, how can I do that?
(Edited) Is there something that is like wait() (wait for a condition) just for the current key within compute() of ConcurrentHashMap (or getAnd*() functions of AtomicReference), but will actually release the lock in Java 8? I'm also fine with using a totally different API.
I know I can do what I want if I have a separate list of objects/locks for each key, and use a plain old synchronized block, but I am looking for a less clunky way.
Pseudocode to illustrate:
public class Test {
    ConcurrentHashMap<String, Integer> map;

    public void illustration(String key) {
        map.computeIfPresent(key, (k, v) -> {
            Integer new_v = v;
            if (!/* Condition on v */) {
                // Pretend this will release the lock held by compute()
                k.wait(timeout);
                new_v = map.get(k);
            }
            if (/* Same condition on new_v */) {
                return /* Result of operation on new_v */;
            } else {
                throw new RuntimeException();
            }
        });
    }
}

I want to be able to release the lock of an atomic execution like compute() to wait for a condition, how can I do that?
Your pseudo-code using wait() doesn't release the lock(s) on the map. It is a bad idea.
The compute function will be called while holding a lock on part of the map. As the javadoc states:
"Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map."
A wait() call implies the possibility that other threads using the map may be blocked for the duration of the wait.
I guess your idea of "something equivalent" is another way of waiting on the condition. That will have the same issues.
As for your follow-up / modified question:
Is there something that is like wait() (wait for a condition) just for the current key within compute() of ConcurrentHashMap (or getAnd*() functions of AtomicReference), but will actually release the lock in Java 8? I'm also fine with using a totally different API.
No there isn't. The only way to get the computeIfPresent call to release its lock(s) on parts of the map is to terminate the call; i.e. return from your compute method or throw an exception.
This is just the way that mutual exclusion locking works in Java ... and in other languages. (No matter how you ask the question, the fundamental problem is the same.)
I think you need to go with your "clunky" approach.
The other thing to consider is that k.wait(...) is only allowed if you are holding the intrinsic lock (monitor) of k. But k appears to be a String. This is liable to be problematic unless your map keys have been canonicalized, e.g. by interning them.
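For completeness, here is a minimal sketch of that "clunky" approach, with per-key monitor objects held in a ConcurrentHashMap; conditionHolds and operateOn are hypothetical stand-ins for the commented-out condition and operation in the question's pseudocode:

import java.util.concurrent.ConcurrentHashMap;

public class PerKeyWait {
    // One canonical monitor per key: computeIfAbsent hands every thread
    // using an equal key the same object to synchronize and wait on.
    private final ConcurrentHashMap<String, Object> monitors = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();

    public void illustration(String key, long timeout) throws InterruptedException {
        Object monitor = monitors.computeIfAbsent(key, k -> new Object());
        synchronized (monitor) {
            Integer v = map.get(key);
            if (!conditionHolds(v)) {
                // Releases only this key's monitor (none of the map's
                // internal locks) until a notify or the timeout.
                monitor.wait(timeout);
                v = map.get(key);
            }
            if (conditionHolds(v)) {
                map.put(key, operateOn(v));
            } else {
                throw new RuntimeException("condition still not met");
            }
        }
    }

    private boolean conditionHolds(Integer v) { return v != null && v > 0; }
    private Integer operateOn(Integer v) { return v - 1; }
}

A writer thread would update map and then call monitor.notifyAll() inside synchronized (monitor) on the same monitor object; production code would also loop around wait() to guard against spurious wakeups, rather than waiting once as the question's pseudocode does.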

Related

Synchronized lock by particular ID

I have a REST API which has one method M which does something. Of course it's called by multiple threads, sometimes simultaneously. This method M has an input String businessID (which comes from the payload of the caller/client).
Now... I want to protect one particular section of method M's body against simultaneous execution by multiple threads. But I want to protect it only against simultaneous executions by two threads T1 and T2 for the same businessID value. So after some thinking I decided to go with this approach.
public void M() {
    // non-critical work 1
    String bid = businessID.intern();
    synchronized (bid) {
        // do some critical work here
    }
    // non-critical work 2
}
That means I intend to use the interned version of the String businessID as a lock to my critical section of code.
Is this going to work as intended? I think so... but I want to be absolutely sure.
Also, does anyone have any alternative ideas how to implement this? I wonder if there's some ready-made solution, like an idiomatic way of doing this in Java, without having to implement my own cache, my own eviction mechanism, etc. etc.
Note that delays caused by this synchronization do not worry me. It is a very rare scenario for two threads to call method M with the same business ID at the same time (it happens only once or twice per day), and the critical section takes no more than 1-2 seconds to complete. So delays caused by threads waiting to obtain the lock are not a concern here.
It seems like a bad idea because:
String#intern() is a native method and it uses a native hash table, which is apparently much slower than a typical ConcurrentHashMap.
You probably don't want the pool of strings to grow indefinitely. How would you invalidate entries from there?
I think using your own map of strings will be the preferred way. Maybe Guava's cache could be leveraged, because you need to evict items from that map eventually. But this needs further research.
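For what it's worth, Guava's weak Interner (used by a later answer on this page for a similar problem) is one ready-made option along these lines; a minimal sketch of the intern-then-synchronize variant, assuming the method shape from the question:

import com.google.common.collect.Interner;
import com.google.common.collect.Interners;

public class BusinessIdGuard {
    // Weak interner: a canonical string is garbage-collected once no
    // thread holds it, so the pool does not grow indefinitely.
    private final Interner<String> interner = Interners.newWeakInterner();

    public void m(String businessID) {
        // non-critical work 1
        String bid = interner.intern(businessID);
        synchronized (bid) {
            // do some critical work here
        }
        // non-critical work 2
    }
}

Unlike String.intern(), this keeps the canonical strings out of the JVM-wide string pool and lets the garbage collector evict them, which addresses both objections above.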
Leasing a lock
Another option is to have a set of predefined lock objects, e.g. a map or array of size 513. Then, to acquire a lock, use bid.hashCode() mod 513:
int hash = Math.abs(bid.hashCode() % 513);
Object lock = locks.get(hash);
synchronized(lock) {...}
This will occasionally lock unrelated transactions, but at least you don't have to bother with eviction.
PS: there is a method to calculate a true mod in the Math class: Math.floorMod.
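A self-contained sketch of the leased-lock idea, using Math.floorMod as that "true mod" (the pool size 513 is taken from the answer; doCritical is a hypothetical wrapper):

public class LeasedLocks {
    private static final int POOL_SIZE = 513;
    private final Object[] locks = new Object[POOL_SIZE];

    public LeasedLocks() {
        for (int i = 0; i < POOL_SIZE; i++) {
            locks[i] = new Object();
        }
    }

    public void doCritical(String bid, Runnable criticalSection) {
        // floorMod always yields a non-negative index, even for
        // negative hash codes, so no Math.abs is needed.
        Object lock = locks[Math.floorMod(bid.hashCode(), POOL_SIZE)];
        synchronized (lock) {
            criticalSection.run();
        }
    }
}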
I think you can maintain a registry of the current locks held for each businessId: before starting the critical section, peek into this registry to get/create a lock, and after you are done with the critical section, release the lock.
Well, it's not production ready, but something like below:
import java.util.HashMap;
import java.util.Map;

class Lock {
    public static void main(String[] args) {
        String businessId = "bid1";
        Lock lock = getLockObjectForBusinessId(businessId);
        synchronized (lock) {
            // critical section start
            // do work
            // critical section end
            releaseLockForBusinessId(businessId);
        }
    }

    public static Map<String, Lock> currentLocks = new HashMap<>();

    public static synchronized Lock getLockObjectForBusinessId(String businessId) {
        Lock currentLock = currentLocks.get(businessId);
        if (currentLock == null) {
            Lock lock = new Lock();
            currentLocks.put(businessId, lock);
            return lock;
        } else {
            return currentLock;
        }
    }

    public static synchronized void releaseLockForBusinessId(String businessId) {
        currentLocks.remove(businessId);
    }
}

Lock handler for arbitrary keys

I have code which implements a "lock handler" for arbitrary keys. Given a key, it ensures that only one thread at a time can process that (or an equal) key, which here means calling externalSystem.process(key).
So far, I have code like this:
public class MyHandler {
    private final SomeWorkExecutor someWorkExecutor;
    private final ConcurrentHashMap<Key, Lock> lockMap = new ConcurrentHashMap<>();

    public void handle(Key key) {
        // This can lead to OOM as it creates locks without removing them
        Lock keyLock = lockMap.computeIfAbsent(key, (k) -> new ReentrantLock());
        keyLock.lock();
        try {
            someWorkExecutor.process(key);
        } finally {
            keyLock.unlock();
        }
    }
}
I understand that this code can lead to an OutOfMemoryError because the map is never cleared.
I am thinking about how to make a map which will hold only a limited count of elements. When the limit is exceeded, we should replace the oldest accessed element with a new one (synchronizing on the oldest element as the monitor). But I don't know how to get a callback that tells me the limit has been exceeded.
Please share your thoughts.
P.S.
I reread the task and now I see that I have a limitation: the handle method cannot be invoked by more than 8 threads. I don't know how that can help me, but I just mention it.
P.S.2
A nice and simple solution was suggested by @Boris the Spider:
} finally {
    lockMap.remove(key);
    keyLock.unlock();
}
But Boris then noticed that this code is not thread safe because it breaks the required behavior:
let's trace 3 threads invoked with an equal key:
Thread#1 acquires the lock and is now just before map.remove(key);.
Thread#2 is invoked with an equal key, so it waits for thread#1 to release the lock.
Then thread#1 executes map.remove(key);. After this, thread#3 invokes the handle method. It sees that the lock for this key is absent from the map, so it creates a new lock and acquires it.
Thread#1 releases the (old) lock, and thus thread#2 acquires it.
Thus thread#2 and thread#3 can run in parallel for equal keys. But that should not be allowed.
To avoid this situation, before clearing the map we would have to block any thread from acquiring a new lock until all threads from the wait set have acquired and released the old lock. That looks like complicated enough synchronization that it would slow the algorithm down. Maybe we should instead clear the map from time to time, when its size exceeds some limit.
I spent a lot of time on this, but unfortunately I have no ideas how to achieve it.
You don't need to try to limit the size to some arbitrary value - as it turns out, you can accomplish this kind of "lock handler" idiom while only storing exactly the number of keys currently locked in the map.
The idea is to use a simple convention: successfully adding the mapping to the map counts as the "lock" operation, and removing it counts as the "unlock" operation. This neatly avoids the issue of removing a mapping while some thread still has it locked and other race conditions.
At this point, the value in the mapping is only used to block other threads who arrive with the same key and need to wait until the mapping is removed.
Here's an example1 with CountDownLatch rather than Lock as the map value:
public void handle(Key key) throws InterruptedException {
    CountDownLatch latch = new CountDownLatch(1);
    // try to acquire the lock by inserting our latch as a
    // mapping for key
    while (true) {
        CountDownLatch existing = lockMap.putIfAbsent(key, latch);
        if (existing != null) {
            // there is an existing key, wait on it
            existing.await();
        } else {
            break;
        }
    }
    try {
        externalSystem.process(key);
    } finally {
        lockMap.remove(key);
        latch.countDown();
    }
}
Here, the lifetime of the mapping is only as long as the lock is held. The map will never have more entries than there are concurrent requests for different keys.
The difference with your approach is that the mappings are not "re-used" - each handle call will create a new latch and mapping. Since you are already doing expensive atomic operations, this isn't likely to be much of a slowdown in practice. Another downside is that with many waiting threads, all are woken when the latch counts down, but only one will succeed in putting a new mapping in and hence acquiring the lock - the rest go back to sleep on the new lock.
You could build another version of this which re-uses the mappings when threads come along and wait on an existing mapping. Basically, the unlocking thread just does a "handoff" to one of the waiting threads. Only one mapping will be used for an entire set of threads that wait on the same key - it is handed off to each one in sequence. The size is still bounded, because once no more threads are waiting on a given mapping, it is removed.
To implement that, you replace the CountDownLatch with a map value that can count the number of waiting threads. When a thread does the unlock, it first checks to see if any threads are waiting, and if so wakes one to do the handoff. If no threads are waiting, it "destroys" the object (i.e., sets a flag that the object is no longer in the mapping) and removes it from the map.
You need to do the above manipulations under a proper lock, and there are a few tricky details. In practice I find the short and sweet example above works great.
1 Written on the fly, not compiled and not tested, but the idea works.
You could rely on the method compute(K key, BiFunction<? super K,? super V,? extends V> remappingFunction) to synchronize calls to your method process for a given key. You don't even need to use Lock as the type of the values of your map anymore, as you no longer rely on it.
The idea is to rely on the internal locking mechanism of your ConcurrentHashMap to execute your method; this will allow threads to execute the process method in parallel for keys whose corresponding hashes are not part of the same bin. This is equivalent to the approach based on striped locks, except that you don't need an additional third-party library.
The striped-locks approach is interesting because it is very light in terms of memory footprint: you only need a limited number of locks, so the memory footprint of your locks is known and never changes, which is not the case for approaches that use one lock per key (like in your question). That is why approaches based on striped locks are generally better/recommended for this kind of need.
So your code could be something like this:
// This will create a ConcurrentHashMap with an initial table size of 16
// bins by default; you may provide an initialCapacity and loadFactor
// if that is too much or not enough for the expected table size, in
// order to increase or reduce the concurrency level of your map.
// NB: We don't care much about the type of the value, so I arbitrarily
// used Void, but it could be any type, like simply Object.
private final ConcurrentMap<Key, Void> lockMap = new ConcurrentHashMap<>();

public void handle(Key lockKey) {
    // Execute the method process through the remapping Function
    lockMap.compute(
        lockKey,
        (key, value) -> {
            // Execute the process method under the protection of the
            // lock of the bin of hashes corresponding to the key
            someWorkExecutor.process(key);
            // Return null to keep the Map empty
            return null;
        }
    );
}
NB 1: As we always return null, the map will always be empty, so you will never run out of memory because of this map.
NB 2: As we never map a value to a given key, note that this could also be done using the method computeIfAbsent(K key, Function<? super K,? extends V> mappingFunction):
public void handle(Key lockKey) {
    // Execute the method process through the mapping Function
    lockMap.computeIfAbsent(
        lockKey,
        key -> {
            // Execute the process method under the protection of the
            // lock of the bin of hashes corresponding to the key
            someWorkExecutor.process(key);
            // Return null to keep the Map empty
            return null;
        }
    );
}
NB 3: Make sure that your method process never calls the method handle for any key, as you would end up with infinite loops (same key) or deadlocks (other, non-ordered keys; for example: if one thread calls handle(key1) and process then internally calls handle(key2), while another thread calls handle(key2) in parallel and process then internally calls handle(key1), you will get a deadlock whatever approach is used). This behavior is not specific to this approach; it will occur with any of them.
One approach is to dispense with the concurrent hash map entirely, and just use a regular HashMap with locking to perform the required manipulation of the map and lock state atomically.
At first glance, this seems to reduce the concurrency of the system, but if we assume that the process(key) call is lengthy relative to the very fast lock manipulations, it works well because the process() calls still run concurrently. Only a small and fixed amount of work occurs in the exclusive critical section.
Here's a sketch:
public class MyHandler {
    private static class LockHolder {
        final ReentrantLock lock = new ReentrantLock();
        int refcount = 0;

        void lock() {
            lock.lock();
        }

        void unlock() {
            lock.unlock();
        }
    }

    private final SomeWorkExecutor someWorkExecutor;
    private final Lock mapLock = new ReentrantLock();
    private final HashMap<Key, LockHolder> lockMap = new HashMap<>();

    public void handle(Key key) {
        // lock the map
        mapLock.lock();
        LockHolder holder = lockMap.computeIfAbsent(key, k -> new LockHolder());
        // the lock in holder is either unlocked (newly created by us),
        // or an existing lock; either way, increment its refcount
        holder.refcount++;
        mapLock.unlock();

        holder.lock();
        try {
            someWorkExecutor.process(key);
        } finally {
            mapLock.lock();
            holder.unlock();
            if (--holder.refcount == 0) {
                // no more users, remove lock holder
                lockMap.remove(key);
            }
            mapLock.unlock();
        }
    }
}
We use refcount, which is only manipulated under the shared mapLock to keep track of how many users of the lock there are. Whenever the refcount is zero, we can get rid of the entry as we exit the handler. This approach is nice in that it is fairly easy to reason about and will perform well if the process() call is relatively expensive compared to the locking overhead. Since the map manipulation occurs under a shared lock, it is also straightforward to add additional logic, e.g., keeping some Holder objects in the map, keeping track of statistics, etc.
Thanks to Ben Mane, I have found this variant.
public class MyHandler {
    private final int THREAD_COUNT = 8;
    private final int K = 100;
    private final Striped<Lock> striped = Striped.lazyWeakLock(THREAD_COUNT * K);
    private final SomeWorkExecutor someWorkExecutor = new SomeWorkExecutor();

    public void handle(Key key) throws InterruptedException {
        Lock keyLock = striped.get(key);
        keyLock.lock();
        try {
            someWorkExecutor.process(key);
        } finally {
            keyLock.unlock();
        }
    }
}
}
Here's a short and sweet version that leverages the weak version of Guava's Interner class to do the heavy lifting of coming up with a "canonical" object for each key to use as the lock, and implementing weak reference semantics so that unused entries are cleaned up.
public class InternerHandler {
    private final Interner<Key> interner = Interners.newWeakInterner();

    public void handle(Key key) throws InterruptedException {
        Key canonKey = interner.intern(key);
        synchronized (canonKey) {
            someWorkExecutor.process(key);
        }
    }
}
Basically we ask for a canonical canonKey which is equal() to key, and then lock on this canonKey. Everyone will agree on the canonical key and hence all callers that pass equal keys will agree on the object on which to lock.
The weak nature of the Interner means that any time the canonical key isn't being used, the entry can be removed, so you avoid accumulation of entries in the interner. Later, if an equal key again comes in, a new canonical entry is chosen.
The simple code above relies on the built-in monitor to synchronize - but if this doesn't work for you (e.g., it's already used for another purpose) you can include a lock object in the Key class or create a holder object.
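A hedged sketch of that holder-object variant (KeyHolder and its fields are hypothetical): equality delegates to the wrapped key, so the weak interner hands every caller with an equal key the same holder, and the holder carries an explicit lock instead of relying on the key's monitor.

import java.util.concurrent.locks.ReentrantLock;
import com.google.common.collect.Interner;
import com.google.common.collect.Interners;

class KeyHolder {
    final Key key;
    final ReentrantLock lock = new ReentrantLock();

    KeyHolder(Key key) { this.key = key; }

    // Equality ignores the lock, so equal keys intern to one holder.
    @Override public boolean equals(Object o) {
        return o instanceof KeyHolder && ((KeyHolder) o).key.equals(key);
    }

    @Override public int hashCode() { return key.hashCode(); }
}

class HolderHandler {
    private final Interner<KeyHolder> interner = Interners.newWeakInterner();
    private final SomeWorkExecutor someWorkExecutor = new SomeWorkExecutor();

    public void handle(Key key) {
        KeyHolder canonical = interner.intern(new KeyHolder(key));
        canonical.lock.lock();
        try {
            someWorkExecutor.process(key);
        } finally {
            canonical.lock.unlock();
        }
    }
}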
class MyHandler {
    private final Map<Key, Lock> lockMap = Collections.synchronizedMap(new WeakHashMap<>());
    private final SomeWorkExecutor someWorkExecutor = new SomeWorkExecutor();

    public void handle(Key key) throws InterruptedException {
        Lock keyLock = lockMap.computeIfAbsent(key, (k) -> new ReentrantLock());
        keyLock.lock();
        try {
            someWorkExecutor.process(key);
        } finally {
            keyLock.unlock();
        }
    }
}
Creating and removing the lock object for a key each time is a costly operation in terms of performance. When you add/remove a lock to/from a concurrent map (say, a cache), you have to ensure that putting/removing the object is itself thread-safe. So this seems like a bad idea, but it can be implemented via ConcurrentHashMap.
The striped locking approach (also used by ConcurrentHashMap internally) is the better approach. From the Google Guava docs it is explained as:
"When you want to associate a lock with an object, the key guarantee you need is that if key1.equals(key2), then the lock associated with key1 is the same as the lock associated with key2.
The crudest way to do this is to associate every key with the same lock, which results in the coarsest synchronization possible. On the other hand, you can associate every distinct key with a different lock, but this requires linear memory consumption and concurrency management for the system of locks itself, as new keys are discovered.
Striped allows the programmer to select a number of locks, which are distributed between keys based on their hash code. This allows the programmer to dynamically select a tradeoff between concurrency and memory consumption, while retaining the key invariant that if key1.equals(key2), then striped.get(key1) == striped.get(key2)."
code:
// declare globally, e.g. at class field level
Striped<Lock> rwLockStripes = Striped.lock(16);

Lock lock = rwLockStripes.get("key");
lock.lock();
try {
    // do your work here
} finally {
    lock.unlock();
}
The following snippet of code can help in implementing the putting/removal of the lock:
private final ConcurrentHashMap<String, ReentrantLock> caches = new ConcurrentHashMap<>();

public void processWithLock(String key) {
    ReentrantLock lock = findAndGetLock(key);
    lock.lock();
    try {
        // do your work here
    } finally {
        unlockAndClear(key, lock);
    }
}

private void unlockAndClear(String key, ReentrantLock lock) {
    // *** Step 1: Release the lock.
    lock.unlock();
    // *** Step 2: Attempt to remove the lock.
    // This is done by calling computeIfPresent: if the current lock
    // object in the cache is the same instance as 'lock', remove it.
    // If not, some other thread has succeeded in putting a new lock
    // object, and we can leave its removal to that thread.
    caches.computeIfPresent(key, (k, current) -> lock == current ? null : current);
}

private ReentrantLock findAndGetLock(String key) {
    // The merge method gives us access to both the previous lock object
    // (if present) and the new one.
    // Requires: import static java.util.Objects.nonNull;
    return caches.merge(key, new ReentrantLock(), (older, newer) -> nonNull(older) ? older : newer);
}
Instead of writing your own, you might try something like JKeyLockManager. From the project's description:
JKeyLockManager provides fine-grained locking with application
specific keys.
Example code given on the site:
public class WeatherServiceProxy {
    private final KeyLockManager lockManager = KeyLockManagers.newManager();

    public void updateWeatherData(String cityName, float temperature) {
        lockManager.executeLocked(cityName, () -> delegate.updateWeatherData(cityName, temperature));
    }
}
New values will be added when you call lockMap.computeIfAbsent(), so you can just check lockMap.size() for the item count.
But how are you going to find the first added item? It would be better to just remove items after you have used them.
You can use an in-process cache that stores object references, like Caffeine, Guava, EHCache or cache2k. Here is an example of how to build a cache with cache2k:
final Cache<Key, Lock> locks =
    new Cache2kBuilder<Key, Lock>(){}
        .loader(
            new CacheLoader<Key, Lock>() {
                @Override
                public Lock load(Key o) {
                    return new ReentrantLock();
                }
            }
        )
        .storeByReference(true)
        .entryCapacity(1000)
        .build();
The usage pattern is as you have in the question:
Lock keyLock = locks.get(key);
keyLock.lock();
try {
    externalSystem.process(key);
} finally {
    keyLock.unlock();
}
Since the cache is limited to 1000 entries, locks that are no longer in use are cleaned up automatically.
There is the potential that a lock still in use is evicted by the cache, if the capacity and the number of threads in the application are mismatched. This solution has worked perfectly for years in our applications. The cache will only evict a lock that is in use when there is a sufficiently long-running task AND the capacity is exceeded. In a real application you always control the number of live threads, e.g. in a web container you would limit the number of processing threads to (for example) 100. So you know that there are never more than 100 locks in use. If this is accounted for, this solution has minimal overhead.
Keep in mind that the locking only works as long as your application runs on a single JVM. You may want to take a look at distributed lock managers (DLM). Examples of products that provide distributed locks: Hazelcast, Infinispan, Terracotta, Redis/Redisson.
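As an illustration of the distributed option, a sketch using Redisson (RedissonClient, getLock and RLock are the real Redisson API; the address and key prefix are placeholders):

import org.redisson.Redisson;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class DistributedKeyLock {
    private final RedissonClient redisson;

    public DistributedKeyLock() {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");
        redisson = Redisson.create(config);
    }

    public void handle(String key) {
        // One named lock per key, shared by every JVM talking to this Redis
        RLock lock = redisson.getLock("lock:" + key);
        lock.lock();
        try {
            // process(key) would go here
        } finally {
            lock.unlock();
        }
    }
}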

Does partial thread-safety make a Java class thread-safe?

I came across the example below of a Java class which was claimed to be thread-safe. Could anyone please explain how it could be thread-safe? I can clearly see that the last method in the class is not being guarded against concurrent access of any reader thread. Or, am I missing something here?
public class Account {
    private Lock lock = new ReentrantLock();
    private int value = 0;

    public void increment() {
        lock.lock();
        value++;
        lock.unlock();
    }

    public void decrement() {
        lock.lock();
        value--;
        lock.unlock();
    }

    public int getValue() {
        return value;
    }
}
The code is not thread-safe.
Suppose that one thread calls decrement and then a second thread calls getValue. What happens?
The problem is that there is no "happens before" relationship between the decrement and the getValue. That means that there is no guarantee that the getValue call will see the results of the decrement. Indeed, getValue could "miss" the results of an indefinite sequence of increment and decrement calls.
Actually, unless we see the code that uses the Account class, the question of thread-safety is ill-defined. The conventional notion of thread-safety1 of a program is about whether the code behaves correctly irrespective of thread-related non-determinacy. In this case, we don't have a specification of what "correct" behaviour is, or indeed an executable program to test or examine.
But my reading of the code2 is that there is an implied API requirement / correctness criterion that getValue returns the current value of the account. That cannot be guaranteed if there are multiple threads, therefore the class is not thread-safe.
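A minimal sketch of what would restore the guarantee, assuming the Account class above: read value under the same lock (declaring it volatile works too), which establishes a happens-before edge with the locked writes.

public int getValue() {
    lock.lock();
    try {
        return value; // ordered after any previous locked update
    } finally {
        lock.unlock();
    }
}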
Related links:
http://blogs.msdn.com/b/ericlippert/archive/2009/10/19/what-is-this-thing-you-call-thread-safe.aspx
1 - The Concurrency in Practice quote in @CKing's answer is also appealing to a notion of "correctness" by mentioning "invalid state" in the definition. However, the JLS sections on the memory model don't specify thread-safety. Instead, they talk about "well-formed executions".
2 - This reading is supported by the OP's comment below. However, if you don't accept that this requirement is real (e.g. because it is not stated explicitly), then the flip-side is that the behaviour of the "account" abstraction depends on how code outside of the Account class uses it ... which makes this a "leaky abstraction".
This is not thread safe, purely because there are no guarantees about how the compiler can re-order. Since value is not volatile, here is your classic example:
while (account.getValue() != 0) {
}
The JIT compiler is free to hoist the (non-volatile) read out of the loop, so this can effectively become:
if (account.getValue() != 0) {
    while (true) {
    }
}
I can imagine there are other permutations of compiler fun which can cause this to subtly fail. But accessing getValue from multiple threads without synchronization can result in failure.
There are several distinct issues here:
Q: If multiple threads make overlapped calls to increment() and decrement(), and then they stop, and then enough time passes with no threads calling increment() or decrement(), will getValue() return the correct number?
A: Yes. The locking in the increment and decrement methods ensures that each increment and decrement operation happens atomically. They cannot interfere with one another.
Q: How long is enough time?
A: That's hard to say. The Java Language Specification does not guarantee that a thread calling getValue() will ever see the latest value written by some other thread, because getValue() accesses the value without any synchronization at all.
If you change getValue() to lock and unlock the same lock object, or if you declare value to be volatile, then zero time would be enough.
Q: Can a call to getValue() return an invalid value?
A: No. It can only ever return the initial value, the result of a complete increment() call, or the result of a complete decrement() call.
But, the reason for this has nothing to do with the lock. The lock does not prevent any thread from calling getValue() while some other thread is in the middle of incrementing or decrementing the value.
The thing that prevents getValue() from returning a completely invalid value is that value is an int, and the JLS guarantees that updates and reads of int variables are always atomic.
The short answer:
By definition, Account is a thread-safe class even though the getValue method is not guarded.
The long answer
From Java Concurrency in Practice, a class is said to be thread-safe when:
"No set of operations performed sequentially or concurrently on instances of a thread-safe class can cause an instance to be in an invalid state."
Since the getValue method will not result in the Account class being in an invalid state at any given time, your class is said to be thread-safe.
The documentation for Collections#synchronizedCollection echoes this sentiment:
"Returns a synchronized (thread-safe) collection backed by the specified collection. In order to guarantee serial access, it is critical that all access to the backing collection is accomplished through the returned collection. It is imperative that the user manually synchronize on the returned collection when iterating over it:"
Collection c = Collections.synchronizedCollection(myCollection);
...
synchronized (c) {
    Iterator i = c.iterator(); // Must be in the synchronized block
    while (i.hasNext())
        foo(i.next());
}
Notice how the documentation says that the collection (which is an object of an inner class named SynchronizedCollection in the Collections class) is thread-safe and yet asks the client code to guard the collection while iterating over it. In fact, the iterator method in SynchronizedCollection is not synchronized. This is very similar to your example, where Account is thread-safe but client code still needs to ensure atomicity when calling getValue.
It's completely thread safe.
Nobody can simultaneously increment and decrement value, so you won't lose or gain a count in error.
The fact that getValue() will return different values over time is something that will happen anyway: simultaneity is not relevant.
You do not have to protect getValue. Accessing it from multiple threads at the same time does not lead to any negative effects. The object state cannot become invalid no matter when or from how many threads you call this method (because it does not change the state).
Having said that - you can write a non-thread-safe code that uses this class.
For example, something like
if (acc.getValue() > 0) acc.decrement();
is potentially dangerous because it can lead to race conditions. Why?
Let's say you have a business rule "never decrement below 0", your current value is 1, and there are two threads executing this code. There's a chance that they'll do it in the following order:
Thread 1 checks that acc.getValue() is > 0. Yes!
Thread 2 checks that acc.getValue() is > 0. Yes!
Thread 1 calls decrement. value is 0.
Thread 2 calls decrement. value is now -1.
What happened? Each thread made sure it was not going below zero, but together they managed to go below it anyway. This is called a race condition.
To avoid this, you must not protect just the elementary operations, but rather any piece of code that must be executed without interruption.
So, this class is thread-safe, but only for very limited use.
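A sketch of that advice applied to the example above, assuming the "never decrement below 0" rule: the check and the update are fused into one method inside Account, guarded by the same lock as increment() and decrement().

public boolean decrementIfPositive() {
    lock.lock();
    try {
        // Check and update under one lock: there is no window for
        // another thread to slip in between the test and the decrement.
        if (value > 0) {
            value--;
            return true;
        }
        return false;
    } finally {
        lock.unlock();
    }
}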

Atomically perform multiple operations

I'm trying to find a way to perform multiple operations on a ConcurrentHashMap in an atomic manner.
My logic is like this:
if (!map.containsKey(key)) {
    map.put(key, value);
    doSomethingElse();
}
I know there is the putIfAbsent method. But if I use it, I still won't be able to call the doSomethingElse atomically.
Is there any way of doing such things apart from resorting to synchronization / client-side locking?
If it helps, the doSomethingElse in my case would be pretty complex, involving creating and starting a thread that looks for the key that we just added to the map.
If it helps, the doSomethingElse in my case would be pretty complex, involving creating and starting a thread that looks for the key that we just added to the map.
If that's the case, you would generally have to synchronize externally.
In some circumstances (depending on what doSomethingElse() expects the state of the map to be, and what the other threads might do to the map), the following may also work:
if (map.putIfAbsent(key, value) == null) {
    doSomethingElse();
}
This will ensure that only one thread goes into doSomethingElse() for any given key.
This would work, unless you want all putting threads to wait until the first successful thread puts into the map:
if (map.get(key) == null) {
    Object ret = map.putIfAbsent(key, value);
    if (ret == null) { // I won the put
        doSomethingElse();
    }
}
Now if many threads are putting with the same key, only one will win and only one will doSomethingElse().
If your design demands that the map access and the other operation be grouped without anybody else accessing the map, then you have no choice but to lock them. Perhaps the design can be revisited to avoid this need?
This also implies that all other accesses to the map must be serialized behind the same lock.
You might keep a lock per entry. That would allow concurrent non-locking updates, unless two threads try to access the same element.
class LockedReference<T> {
    Lock lock = new ReentrantLock();
    T value;
    LockedReference(T value) { this.value = value; }
}

LockedReference<T> ref = new LockedReference<>(value);
ref.lock.lock(); // lock on the new reference; there is no contention here
try {
    if (map.putIfAbsent(key, ref) == null) {
        // we have locked on the key before inserting the element
        doSomethingElse();
    }
} finally {
    ref.lock.unlock();
}
Later:
T value;
while (true) {
    LockedReference<T> ref = map.get(key);
    if (ref != null) {
        ref.lock.lock();
        // there is no contention, unless a thread is already working on this entry
        try {
            if (map.containsKey(key)) {
                value = ref.value;
                break;
            } else {
                // key was removed between get and lock; retry
            }
        } finally {
            ref.lock.unlock();
        }
    } else {
        value = null;
        break; // no entry for this key
    }
}
A fancier approach would be rewriting ConcurrentHashMap to have a version of putIfAbsent that accepts a Runnable (which is executed if the element was put). But that would be far, far more complex.
Basically, ConcurrentHashMap locks at the granularity of segments (bins, in the Java 8 implementation), which is in the middle between one lock per entry and one global lock for the whole map.
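For what it's worth, Java 8's ConcurrentHashMap.computeIfAbsent is essentially that "putIfAbsent that accepts a Runnable": the mapping function runs atomically for the key, as another answer on this page exploits. A sketch, with the usual caveat that the function runs while part of the map is locked, so it must stay short and must not touch other mappings:

map.computeIfAbsent(key, k -> {
    doSomethingElse(); // runs at most once per key, under the bin lock
    return value;
});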

specific question on java threading + synchronization

I know this question sounds crazy, but consider the following java snippets:
Part - I:
class Consumer implements Runnable {
    private boolean shouldTerminate = false;

    public void run() {
        while (!shouldTerminate) {
            // consume and perform some operation.
        }
    }

    public void terminate() {
        this.shouldTerminate = true;
    }
}
So, the first question is: should I ever need to synchronize on the shouldTerminate boolean? If so, why? I don't mind missing the flag being set to true for one or two cycles (cycle = 1 loop execution). And second, can a boolean variable ever be in an inconsistent state (anything other than true or false)?
Part - II of the question:
class Cache<K, V> {
    private Map<K, V> cache = new HashMap<K, V>();

    public V getValue(K key) {
        if (!cache.containsKey(key)) {
            synchronized (this.cache) {
                V value = loadValue(key);
                cache.put(key, value);
            }
        }
        return cache.get(key);
    }
}
Should access to the whole map be synchronized? Is there any possibility that two threads try to run this method, with one "writer thread" halfway through the process of storing a value into the map while, simultaneously, a "reader thread" invokes the "contains" method? Will this cause the JVM to blow up? (I don't mind overwriting values in the map if two writer threads try to load at the same time.)
Both of the code examples have broken concurrency.
The first one requires at least that the field be marked volatile, or else the other thread might never see the variable being changed (it may cache its value in a CPU cache or a register and never re-check whether the value in memory has changed).
The second one is even more broken, because the internals of HashMap are not thread-safe. And it's not just a single value but a complex data structure; using it from many threads produces completely unpredictable results. The general rule is that both reading and writing the shared state must be synchronized. You may also use a ConcurrentHashMap for better performance.
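A sketch of the first fix, assuming the Consumer class from the question; volatile guarantees that the write made by terminate() becomes visible to the loop in run():

class Consumer implements Runnable {
    // volatile: the write in terminate() is guaranteed to become
    // visible to the reading loop in run()
    private volatile boolean shouldTerminate = false;

    public void run() {
        while (!shouldTerminate) {
            // consume and perform some operation.
        }
    }

    public void terminate() {
        this.shouldTerminate = true;
    }
}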
Unless you either synchronize on the variable or mark the variable as volatile, there is no guarantee that separate threads' views of the object ever get reconciled. To quote the Wikipedia article on the Java Memory Model:
The major caveat of this is that as-if-serial semantics do not prevent different threads from having different views of the data.
Realistically, so long as the two threads synchronize on some lock at some time, the update to the variable will be seen.
I am wondering why you wouldn't want to mark the variable volatile?
It's not that the JVM will "blow up" as such. But both cases are incorrectly synchronised, and so the results will be unpredictable. The bottom line is that JVMs are designed to behave in a particular way if you synchronise in a particular way; if you don't synchronise correctly, you lose that guarantee.
It's not uncommon for people to think they've found a reason why certain synchronisation can be omitted, or to unknowingly omit necessary synchronisation but with no immediately obvious problem. But with inadequate synchronisation, there is a danger that your program could appear to work fine in one environment, only for an issue to appear later when a particular factor is changed (e.g. moving to a machine with more CPUs, or an update to the JVM that adds a particular optimisation).
Synchronizing shouldTerminate: see Dilum's answer.
Your boolean value will never be in an inconsistent state.
If one thread is calling cache.containsKey(key) while another thread is calling cache.put(key, value), the JVM will not blow up by throwing a ConcurrentModificationException. Something bad might happen if that put call causes the map to grow, but it will usually mostly work (which is worse than a clean failure).
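And for Part II, a sketch of the ConcurrentHashMap variant suggested above; computeIfAbsent makes the check-load-put sequence atomic per key (loadValue stands in for the loader from the question):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

abstract class Cache<K, V> {
    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();

    public V getValue(K key) {
        // Atomic per key: the loader runs at most once per absent key,
        // and reads of other keys are not blocked meanwhile.
        return cache.computeIfAbsent(key, this::loadValue);
    }

    // The expensive load from the question, supplied by a subclass.
    protected abstract V loadValue(K key);
}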
