There is a term "correctly synchronized" in the JLS:
A program is correctly synchronized if and only if all sequentially consistent executions are free of data races.
If a program is correctly synchronized, then all executions of the program will appear to be sequentially consistent (§17.4.3).
Can this "correctly synchronized" property be applied to something smaller than the whole program, like a collection class?
In other words, imagine that I want to create my custom concurrent collection class.
I want the code of my collection to never produce data races in any program which uses my collection.
Would it be enough to check only that every possible sequentially consistent execution is free of data races, in order to guarantee that non-sequentially-consistent executions also cannot produce data races?
You need to ensure that there would be no race conditions in your custom concurrent collection.
That is, multiple threads can access the same element simultaneously as long as they are only reading its value; that's fine. But if it's possible for one thread to update an element while another is retrieving its value, or for multiple threads to change the same element at the same time (overwriting each other's results), that's a problem. It can be avoided by synchronizing the mutating code.
§17.4.5. Happens-before Order
When a program contains two conflicting accesses (§17.4.1) that are not ordered by a happens-before relationship, it is said to contain a data race.
§17.4.1. Shared Variables
Two accesses to (reads of or writes to) the same variable are said to be conflicting if at least one of the accesses is a write.
Here's a toy example of a concurrent list which allows reading values from multiple threads simultaneously, but if one of the threads is writing a value, all others are blocked.
This can be achieved by using ReentrantReadWriteLock.
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class MyConcurrentArrayList<T> {
    private Object[] array = new Object[16]; // initial capacity (growth omitted)
    private int size;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Lock read = lock.readLock();
    private final Lock write = lock.writeLock();

    // non-mutating operation
    @SuppressWarnings("unchecked")
    public T get(int i) {
        T result;
        read.lock(); // take the read lock; from this moment only non-mutating operations (guarded by the read lock) are allowed
        try {
            result = (T) array[i]; // access the element at the given index
        } finally {
            read.unlock(); // release the read lock
        }
        return result;
    }

    // mutating operation
    public void add(T item) {
        write.lock(); // take the write lock; no other thread can read or update the list until it is released
        try {
            // check if the list needs to grow (omitted)
            array[size++] = item; // add a new element
        } finally {
            write.unlock(); // release the write lock
        }
    }
}
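A quick usage sketch of the class above:

MyConcurrentArrayList<String> list = new MyConcurrentArrayList<>();
list.add("hello");           // exclusive: blocks all readers and writers
String first = list.get(0);  // shared: concurrent get() calls may overlap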
I need to add a static thread-safe HashMap. I have something like this:
private static Map<String, ConcurrentHashMap<Integer, ClassA>> myCache =
new ConcurrentHashMap<String, ConcurrentHashMap<Integer, ClassA>>();
Even though I am using ConcurrentHashMap, I see that if the main thread adds an element to myCache, and the same key is accessed in another thread even at a later time, that thread does not see the latest data in the myCache object. For example:
Thread 1: Adds entry to the map
myCache = {a1={1=com.x.y.z.ClassA@3ec8b657}}
Thread 2: Accesses the same key a1, but it does not see the data added by Thread 1. Rather, it sees an empty value for this key: myCache = {a1={}}
As a result, data is getting corrupted. Entries added for the a1 key in Thread 1 are not visible in Thread 2.
Thanks in advance for any pointers on how I can update this map in a thread-safe manner.
Even though I am using ConcurrentHashMap, I see that if the main thread adds an element to myCache, and the same key is accessed in another thread even at a later time, that thread does not see the latest data in the myCache object.
ConcurrentHashMap is running in a large number of applications around the world at this instant. If your fairly typical use case didn't work appropriately, many critical systems would be failing.
Something is going on but chances are high that it is nothing to do with ConcurrentHashMap. So here are some questions for you to help you debug your code:
Are you sure that the cache lookup happens after the cache put? ConcurrentHashMap doesn't save you from race conditions in your code.
Any chance this is a problem with the hashCode() or equals() methods of the key object? By default, hashCode() and equals() compare object identity, not object value. See the consistency requirements.
Any chance that the cache lookup happens after a cache remove or a timeout of the cache value? Is there cache cleanup logic?
Do you have logic that performs two operations on the ConcurrentHashMap, for example testing for the existence of a cache entry and then making another call to put the value? If so, you should be using putIfAbsent(...) or one of the other atomic methods; see the sketch below.
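For that last point, here is a minimal hedged sketch, reusing the myCache declaration from the question, of the broken check-then-act pattern next to its atomic replacement:

// Broken check-then-act: two threads can both see null and both put,
// so one thread's inner map (and the entries added to it) is lost.
ConcurrentHashMap<Integer, ClassA> inner = myCache.get("a1");
if (inner == null) {
    inner = new ConcurrentHashMap<>();
    myCache.put("a1", inner);
}

// Atomic alternative: the check and the insert are a single operation.
ConcurrentHashMap<Integer, ClassA> safe =
        myCache.computeIfAbsent("a1", k -> new ConcurrentHashMap<>());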
If you edit your post and show a small sample of your code with the key object, the real source of the issue may be revealed.
I've never had good results with ConcurrentHashMap. When I need thread safety, I usually do something like this:
public class Cache<K, V> {
    private final Map<K, V> cache = new HashMap<>();

    public V get(K key) {
        synchronized (cache) {
            return cache.get(key);
        }
    }

    public void put(K key, V value) {
        synchronized (cache) {
            cache.put(key, value);
        }
    }
}
@Ryan's answer is essentially correct.
Remember that you must proxy every Map method that you wish to use,
and you must synchronize on the cache object within every proxied method.
For example:
public void clear()
{
    synchronized (cache)
    {
        cache.clear();
    }
}
I have code which implements a "lock handler" for arbitrary keys. Given a key, it ensures that only one thread at a time can process that (or an equal) key (which here means calling the externalSystem.process(key) call).
So far, I have code like this:
public class MyHandler {
    private final SomeWorkExecutor someWorkExecutor;
    private final ConcurrentHashMap<Key, Lock> lockMap = new ConcurrentHashMap<>();

    public void handle(Key key) {
        // This can lead to OOM as it creates locks without removing them
        Lock keyLock = lockMap.computeIfAbsent(key, k -> new ReentrantLock());

        keyLock.lock();
        try {
            someWorkExecutor.process(key);
        } finally {
            keyLock.unlock();
        }
    }
}
I understand that this code can lead to an OutOfMemoryError because nothing ever clears the map.
I am thinking about how to make a map that holds only a limited number of elements. When the limit is exceeded, the oldest-accessed element should be replaced with the new one (this code should synchronize on the oldest element as a monitor). But I don't know how to get a callback telling me that the limit has been exceeded.
Please share your thoughts.
P.S.
I reread the task and now I see that I have a limitation: the handle method cannot be invoked by more than 8 threads at once. I don't know how that can help me, but I just mention it.
P.S.2
a nice and simple solution was suggested by @Boris the Spider:
} finally {
    lockMap.remove(key);
    keyLock.unlock();
}
But then Boris noticed that this code is not thread safe, because it breaks the required behavior:
Let's trace 3 threads invoked with an equal key:
Thread #1 acquires the lock and is now just before map.remove(key);.
Thread #2 is invoked with an equal key, so it waits for thread #1 to release the lock.
Then thread #1 executes map.remove(key);. After this, thread #3 invokes the handle method. It sees that no lock for this key is present in the map, so it creates a new lock and acquires it.
Thread #1 releases the old lock, and thus thread #2 acquires it.
Thus thread #2 and thread #3 can run in parallel for equal keys. But this must not be allowed.
To avoid this situation, before clearing the map we would have to block any thread from acquiring a new lock until every thread in the wait set has acquired and released the existing lock. That looks like it needs rather complicated synchronization and would slow the algorithm down. Maybe we should instead clear the map from time to time, when its size exceeds some limit.
I have spent a lot of time on this, but unfortunately I have no idea how to achieve it.
You don't need to try to limit the size to some arbitrary value - as it turns out, you can accomplish this kind of "lock handler" idiom while only storing exactly the number of keys currently locked in the map.
The idea is to use a simple convention: successfully adding the mapping to the map counts as the "lock" operation, and removing it counts as the "unlock" operation. This neatly avoids the issue of removing a mapping while some thread still has it locked and other race conditions.
At this point, the value in the mapping is only used to block other threads who arrive with the same key and need to wait until the mapping is removed.
Here's an example1 with CountDownLatch rather than Lock as the map value:
public void handle(Key key) throws InterruptedException {
    CountDownLatch latch = new CountDownLatch(1);

    // try to acquire the lock by inserting our latch as a
    // mapping for key
    while (true) {
        CountDownLatch existing = lockMap.putIfAbsent(key, latch);
        if (existing != null) {
            existing.await(); // there is an existing key, wait on it
        } else {
            break;
        }
    }

    try {
        externalSystem.process(key);
    } finally {
        lockMap.remove(key);
        latch.countDown();
    }
}
Here, the lifetime of the mapping is only as long as the lock is held. The map will never have more entries than there are concurrent requests for different keys.
The difference with your approach is that the mappings are not "re-used" - each handle call will create a new latch and mapping. Since you are already doing expensive atomic operations, this isn't likely to be much of a slowdown in practice. Another downside is that with many waiting threads, all are woken when the latch counts down, but only one will succeed in putting a new mapping in and hence acquiring the lock - the rest go back to sleep on the new lock.
You could build another version of this which re-uses the mappings when threads come along and wait on an existing mapping. Basically, the unlocking thread just does a "handoff" to one of the waiting threads. Only one mapping will be used for an entire set of threads that wait on the same key - it is handed off to each one in sequence. The size is still bounded, because once no more threads are waiting on a given mapping, it is removed.
To implement that, you replace the CountDownLatch with a map value that can count the number of waiting threads. When a thread does the unlock, it first checks to see if any threads are waiting, and if so wakes one to do the handoff. If no threads are waiting, it "destroys" the object (i.e., sets a flag that the object is no longer in the mapping) and removes it from the map.
You need to do the above manipulations under a proper lock, and there are a few tricky details. In practice I find the short and sweet example above works great.
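For illustration only, here is a rough, untested sketch of that handoff variant; the KeyLock class and its method names are invented for this example, and lockMap is assumed to be a ConcurrentHashMap<Key, KeyLock>:

class KeyLock {
    private int waiters = 0;          // threads currently blocked on this entry
    private boolean locked = true;    // the creating thread starts as owner
    private boolean retired = false;  // set once the entry has left the map

    // Wait for a handoff; returns false if the entry was retired first.
    synchronized boolean tryEnter() throws InterruptedException {
        if (retired) return false;
        waiters++;
        try {
            while (locked) wait();
        } finally {
            waiters--;
        }
        locked = true; // we are now the owner
        return true;
    }

    // Release ownership; returns true if the caller should remove the entry.
    synchronized boolean exit() {
        locked = false;
        if (waiters > 0) {
            notifyAll(); // wake waiters; exactly one will take ownership
            return false;
        }
        retired = true; // nobody is waiting: retire this entry
        return true;
    }
}

public void handle(Key key) throws InterruptedException {
    KeyLock owned = null;
    while (owned == null) {
        KeyLock fresh = new KeyLock(); // created in the owned state
        KeyLock existing = lockMap.putIfAbsent(key, fresh);
        if (existing == null) {
            owned = fresh;                 // we inserted it, we own it
        } else if (existing.tryEnter()) {
            owned = existing;              // ownership handed off to us
        }                                  // else: entry retired, retry
    }
    try {
        externalSystem.process(key);
    } finally {
        if (owned.exit()) {
            lockMap.remove(key, owned); // remove only our own entry
        }
    }
}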
1 Written on the fly, not compiled and not tested, but the idea works.
You could rely on the method compute(K key, BiFunction<? super K,? super V,? extends V> remappingFunction) to synchronize calls to your method process for a given key; you no longer even need Lock as the type of your map's values, as you don't rely on it anymore.
The idea is to rely on the internal locking mechanism of your ConcurrentHashMap to execute your method; this allows threads to execute the process method in parallel for keys whose corresponding hashes are not part of the same bin. This is equivalent to the approach based on striped locks, except that you don't need an additional third-party library.
The striped locks approach is interesting because it is very light in terms of memory footprint: you only need a limited number of locks, so the memory footprint needed for your locks is known and never changes. That is not the case for approaches that use one lock per key (like in your question), which is why approaches based on striped locks are generally better/recommended for such needs.
So your code could be something like this:
// This will create a ConcurrentHashMap with an initial table size of 16
// bins by default; you may provide an initialCapacity and loadFactor
// if that is too much or not enough for the expected table size, in
// order to increase or reduce the concurrency level of your map.
// NB: we don't care much about the type of the value, so Void is used
// arbitrarily; it could be any type, such as simply Object.
private final ConcurrentMap<Key, Void> lockMap = new ConcurrentHashMap<>();

public void handle(Key lockKey) {
    // Execute the process method through the remapping function
    lockMap.compute(
        lockKey,
        (key, value) -> {
            // Execute the process method under the protection of the
            // lock of the bin of hashes corresponding to the key
            someWorkExecutor.process(key);
            // Return null to keep the map empty
            return null;
        }
    );
}
NB 1: As we always return null, the map will always be empty, so you will never run out of memory because of this map.
NB 2: As we never assign a value to a given key, note that it could also be done using the method computeIfAbsent(K key, Function<? super K,? extends V> mappingFunction):
public void handle(Key lockKey) {
    // Execute the process method through the mapping function
    lockMap.computeIfAbsent(
        lockKey,
        key -> {
            // Execute the process method under the protection of the
            // lock of the bin of hashes corresponding to the key
            someWorkExecutor.process(key);
            // Return null to keep the map empty
            return null;
        }
    );
}
NB 3: Make sure that your process method never calls the handle method for any key, as you would end up with infinite loops (same key) or deadlocks (other, non-ordered keys; for example, if one thread calls handle(key1) and process internally calls handle(key2), while another thread calls handle(key2) in parallel and its process internally calls handle(key1), you will get a deadlock whatever the approach used). This behavior is not specific to this approach; it will occur with any approach.
One approach is to dispense with the concurrent hash map entirely, and just use a regular HashMap with locking to perform the required manipulation of the map and lock state atomically.
At first glance, this seems to reduce the concurrency of the system, but if we assume that the process(key) call is lengthy relative to the very fast lock manipulations, it works well because the process() calls still run concurrently. Only a small and fixed amount of work occurs in the exclusive critical section.
Here's a sketch:
public class MyHandler {

    private static class LockHolder {
        final ReentrantLock lock = new ReentrantLock();
        int refcount = 0;

        void lock() {
            lock.lock();
        }

        void unlock() {
            lock.unlock();
        }
    }

    private final SomeWorkExecutor someWorkExecutor;
    private final Lock mapLock = new ReentrantLock();
    private final HashMap<Key, LockHolder> lockMap = new HashMap<>();

    public void handle(Key key) {
        // lock the map
        mapLock.lock();
        LockHolder holder = lockMap.computeIfAbsent(key, k -> new LockHolder());
        // the lock in holder is either unlocked (newly created by us)
        // or an existing lock; increment its refcount
        holder.refcount++;
        mapLock.unlock();

        holder.lock();
        try {
            someWorkExecutor.process(key);
        } finally {
            mapLock.lock();
            holder.unlock();
            if (--holder.refcount == 0) {
                // no more users, remove the lock holder
                lockMap.remove(key);
            }
            mapLock.unlock();
        }
    }
}
We use refcount, which is manipulated only under the shared mapLock, to keep track of how many users of the lock there are. Whenever the refcount drops to zero, we can get rid of the entry as we exit the handler. This approach is nice in that it is fairly easy to reason about, and it will perform well if the process() call is relatively expensive compared to the locking overhead. Since the map manipulation occurs under a shared lock, it is also straightforward to add additional logic, e.g., keeping some LockHolder objects in the map, keeping track of statistics, etc.
Thanks Ben Mane, I have found this variant:
import com.google.common.util.concurrent.Striped;

public class MyHandler {
    private static final int THREAD_COUNT = 8;
    private static final int K = 100;
    private final Striped<Lock> striped = Striped.lazyWeakLock(THREAD_COUNT * K);
    private final SomeWorkExecutor someWorkExecutor = new SomeWorkExecutor();

    public void handle(Key key) throws InterruptedException {
        Lock keyLock = striped.get(key);
        keyLock.lock();
        try {
            someWorkExecutor.process(key);
        } finally {
            keyLock.unlock();
        }
    }
}
Here's a short and sweet version that leverages the weak version of Guava's Interner class to do the heavy lifting of coming up with a "canonical" object for each key to use as the lock, implementing weak reference semantics so that unused entries are cleaned up.
public class InternerHandler {
    private final Interner<Key> interner = Interners.newWeakInterner();

    public void handle(Key key) throws InterruptedException {
        Key canonKey = interner.intern(key);
        synchronized (canonKey) {
            someWorkExecutor.process(key);
        }
    }
}
Basically we ask for a canonical canonKey which is equal() to key, and then lock on this canonKey. Everyone will agree on the canonical key and hence all callers that pass equal keys will agree on the object on which to lock.
The weak nature of the Interner means that any time the canonical key isn't being used, the entry can be removed, so you avoid accumulation of entries in the interner. Later, if an equal key again comes in, a new canonical entry is chosen.
The simple code above relies on the built-in monitor to synchronize - but if that doesn't work for you (e.g., the monitor is already used for another purpose), you can include a lock object in the Key class or create a holder object, as sketched below.
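For instance, a hedged sketch of the holder-object variant (the KeyMonitor class is invented for this example):

// Two KeyMonitors are equal iff their keys are equal, so the weak
// interner yields one canonical monitor per distinct key value.
final class KeyMonitor {
    final Key key;

    KeyMonitor(Key key) { this.key = key; }

    @Override
    public boolean equals(Object o) {
        return o instanceof KeyMonitor && key.equals(((KeyMonitor) o).key);
    }

    @Override
    public int hashCode() { return key.hashCode(); }
}

private final Interner<KeyMonitor> interner = Interners.newWeakInterner();

public void handle(Key key) {
    KeyMonitor monitor = interner.intern(new KeyMonitor(key));
    synchronized (monitor) { // lock on the canonical monitor, not the key
        someWorkExecutor.process(key);
    }
}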
class MyHandler {
    private final Map<Key, Lock> lockMap = Collections.synchronizedMap(new WeakHashMap<>());
    private final SomeWorkExecutor someWorkExecutor = new SomeWorkExecutor();

    public void handle(Key key) throws InterruptedException {
        Lock keyLock = lockMap.computeIfAbsent(key, k -> new ReentrantLock());

        keyLock.lock();
        try {
            someWorkExecutor.process(key);
        } finally {
            keyLock.unlock();
        }
    }
}
Creating and removing the lock object for a key each time is a costly operation in terms of performance. When you add/remove a lock to/from a concurrent map (say, a cache), it has to be ensured that putting/removing the object from the cache is itself thread-safe. So this does not seem like a good idea, but it can be implemented via ConcurrentHashMap.
The striped locking approach (also used by ConcurrentHashMap internally) is a better approach. The Google Guava docs explain it as follows:
When you want to associate a lock with an object, the key guarantee
you need is that if key1.equals(key2), then the lock associated with
key1 is the same as the lock associated with key2.
The crudest way to do this is to associate every key with the same
lock, which results in the coarsest synchronization possible. On the
other hand, you can associate every distinct key with a different
lock, but this requires linear memory consumption and concurrency
management for the system of locks itself, as new keys are discovered.
Striped allows the programmer to select a number of locks, which are
distributed between keys based on their hash code. This allows the
programmer to dynamically select a tradeoff between concurrency and
memory consumption, while retaining the key invariant that if
key1.equals(key2), then striped.get(key1) == striped.get(key2)
code:
// declare globally, e.g. at class field level
Striped<Lock> rwLockStripes = Striped.lock(16);

Lock lock = rwLockStripes.get("key");
lock.lock();
try {
    // do your work here
} finally {
    lock.unlock();
}
The following snippet of code can help in implementing the putting/removal of the lock.
// nonNull below is java.util.Objects.nonNull (static import)
private ConcurrentHashMap<String, ReentrantLock> caches = new ConcurrentHashMap<>();

public void processWithLock(String key) {
    ReentrantLock lock = findAndGetLock(key);
    lock.lock();
    try {
        // do your work here
    } finally {
        unlockAndClear(key, lock);
    }
}

private void unlockAndClear(String key, ReentrantLock lock) {
    // *** Step 1: Release the lock.
    lock.unlock();
    // *** Step 2: Attempt to remove the lock.
    // This is done by calling the compute method: if the given lock is
    // present in the cache and the current lock object in the cache is
    // the same instance as 'lock', remove it from the cache. If not,
    // some other thread succeeded in putting a new lock object, and we
    // can leave the removal of that lock object to that thread.
    caches.computeIfPresent(key, (k, current) -> lock == current ? null : current);
}

private ReentrantLock findAndGetLock(String key) {
    // The merge method gives us access to the previous lock object
    // (if available) and the new one together.
    return caches.merge(key, new ReentrantLock(), (older, newer) -> nonNull(older) ? older : newer);
}
Instead of writing your own, you might try something like JKeyLockManager. From the project's description:
JKeyLockManager provides fine-grained locking with application
specific keys.
Example code given on site:
public class WeatherServiceProxy {
    private final KeyLockManager lockManager = KeyLockManagers.newManager();

    public void updateWeatherData(String cityName, float temperature) {
        lockManager.executeLocked(cityName, () -> delegate.updateWeatherData(cityName, temperature));
    }
}
New values will be added when you call lockMap.computeIfAbsent(), so you can just check lockMap.size() for the item count.
But how are you going to find the first added item? It would be better to just remove items after you have used them.
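If a size-bounded map really is wanted, note that LinkedHashMap offers exactly such a callback through removeEldestEntry; a hedged sketch (MAX_ENTRIES is an assumed limit, and plain LinkedHashMap is not thread safe, so every access would need external synchronization):

// Evicts the least recently accessed entry once the limit is exceeded.
Map<Key, Lock> bounded = new LinkedHashMap<Key, Lock>(16, 0.75f, true) {
    private static final int MAX_ENTRIES = 1000; // assumed limit

    @Override
    protected boolean removeEldestEntry(Map.Entry<Key, Lock> eldest) {
        return size() > MAX_ENTRIES; // called after every put
    }
};

Note that this can evict a lock some thread still holds, which is exactly the hazard the approaches above try to avoid.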
You can use an in-process cache that stores object references, like Caffeine, Guava, EHCache or cache2k. Here is an example of how to build a cache with cache2k:
final Cache<Key, Lock> locks =
    new Cache2kBuilder<Key, Lock>(){}
        .loader(
            new CacheLoader<Key, Lock>() {
                @Override
                public Lock load(Key o) {
                    return new ReentrantLock();
                }
            }
        )
        .storeByReference(true)
        .entryCapacity(1000)
        .build();
The usage pattern is as you have in the question:
Lock keyLock = locks.get(key);
keyLock.lock();
try {
    externalSystem.process(key);
} finally {
    keyLock.unlock();
}
Since the cache is limited to 1000 entries, there is automatic cleanup of locks that are no longer in use.
There is the potential for a lock that is still in use to be evicted by the cache, if the capacity and the number of threads in the application are mismatched: the cache will evict a lock that is in use when there is a sufficiently long-running task AND the capacity is exceeded. In a real application you always control the number of live threads; e.g., in a web container you would limit the number of processing threads to (for example) 100, so you know that there are never more than 100 locks in use. If this is accounted for, this solution has minimal overhead; it has worked perfectly for years in our applications.
Keep in mind that the locking only works as long as your application runs on a single VM. You may want to take a look at distributed lock managers (DLM). Examples of products that provide distributed locks: Hazelcast, Infinispan, Terracotta, Redis/Redisson.
I have a container class holding a collection which is going to be used by multiple threads:
public class Container {
    private Map<String, String> map;

    // ctor, other methods reading the map

    public void doSomeWithMap(String key, String value) {
        // do some thread-safe action
        map.put(key, value);
        // do something else, also thread safe
    }
}
What would be better, to declare the method synchronized:
public synchronized void doSomeWithMap(String key, String value)
or to use the standard thread-safe decorator?
Collections.synchronizedMap(map);
Generally speaking, synchronizing the map will protect most access to it without having to think about it further. However, the "synchronized map" is not safe for iteration which may be an issue depending on your use case. It is imperative that the user manually synchronize on the returned map when iterating over any of its collection views.
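A minimal sketch of the manual synchronization the javadoc calls for:

Map<String, String> map = Collections.synchronizedMap(new HashMap<>());

// Individual calls are synchronized, but iteration is a sequence of
// calls, so the whole traversal must hold the map's monitor.
synchronized (map) {
    for (Map.Entry<String, String> e : map.entrySet()) {
        System.out.println(e.getKey() + "=" + e.getValue());
    }
}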
Consider using ConcurrentHashMap if that will meet your use case.
If there is other state to this object that needs to be protected from concurrency errors, then you will need to use synchronized or a Lock.
If your doSomeWithMap method will access the map more than once, you must synchronize the doSomeWithMap method. If the only access is the put() call shown, then it's better to use a ConcurrentHashMap.
Note that "more than once" is any call, and an iterator is by nature many "gets".
If you look at the implementation of SynchronizedMap, you'll see that it's simply a wrapper around a non-thread-safe map that acquires a mutex before calling any method:
public V get(Object key) {
    synchronized (mutex) {return m.get(key);}
}

public V put(K key, V value) {
    synchronized (mutex) {return m.put(key, value);}
}

public Set<Map.Entry<K,V>> entrySet() {
    synchronized (mutex) {
        if (entrySet==null)
            entrySet = new SynchronizedSet<>(m.entrySet(), mutex);
        return entrySet;
    }
}
If all you want is protecting get and put, this implementation does it for you.
However it's not suitable if you want a Map that can be iterated over and updated by two or more threads, in which case you should use a ConcurrentHashMap.
If the other things you do inside doSomeWithMap would cause problems if done concurrently by different threads (for example, if they update class-level variables), then you should synchronise the whole method. If this is not the case, then you should use a synchronised Map in order to minimise the length of time the synchronisation lock is held.
Whether you synchronize a block or the whole method should be driven by your requirements. Please note this:
When you use a synchronized collection like ConcurrentHashMap, or Collections' methods like synchronizedMap(), synchronizedList(), etc., only the Map/List itself is synchronized. To explain a little further,
Consider,
Map<String, Object> map = new HashMap<>();
Map<String, Object> synchMap = Collections.synchronizedMap(map);
This makes the map's get operation synchronized, not the objects inside it.
Object o = synchMap.get("1"); // Object referenced by o is not synchronized; only the map is.
If you want to protect the objects inside the map, then you also have to put that code inside a synchronized block. This is good to remember, as many people forget to safeguard the value objects.
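For example, assuming the value is a mutable list (a hypothetical variation on the map above):

Map<String, List<String>> synchMap =
        Collections.synchronizedMap(new HashMap<>());

List<String> values = synchMap.get("1"); // this call is synchronized...
if (values != null) {
    synchronized (values) { // ...but mutating the value object is not,
        values.add("x");    // so guard the value itself
    }
}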
Have a look at Collections.synchronizedMap for a little more info too.
So I have a HashMap that is declared at class level like so:
private static volatile HashMap<String, ArrayList<String>> map =
    new HashMap<String, ArrayList<String>>();
I have several threads updating the same map, and the threads are declared at class level like so:
private class UpdateThread extends Thread {
    @Override
    public void run() {
        // update map here
        // map actually gets updated here
    }
}
But after the threads exit:
for (UpdateThread thread : listOfThreads) {
    thread.start();
}

for (UpdateThread thread : listOfThreads) {
    try {
        thread.join();
        // map not updated anymore :-[
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
Why are the map changes that are occurring inside the thread not persisting after the thread is done? I've declared the map static and volatile already...
Thanks in advance
Why are the map changes that are occurring inside the thread not persisting after the thread is done? I've declared the map static and volatile already...
It depends highly on how you are updating the map.
// update map here -- what's happening here?
As @Louis points out, if multiple threads are updating the same map instance, volatile won't help you, and you should be using a ConcurrentHashMap. As @Gerhard points out, volatile only protects the updating of the HashMap reference, not the innards of the map itself. You need to fully lock the map if the threads update it in parallel, or use a concurrent map.
However, if each thread is replacing the map with a new map, then the volatile approach would work. Then again, each thread may be overwriting the central map because of race conditions.
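For example, a hedged sketch of that single-writer replace idiom (the key and value literals are placeholders):

// Build a replacement map off to the side, then publish it with a single
// volatile write; readers see either the old map or the complete new one.
HashMap<String, ArrayList<String>> updated = new HashMap<>(map);
updated.computeIfAbsent("sku", k -> new ArrayList<>()).add("value");
map = updated; // the volatile write publishes the fully built map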
If you show us your update code, we should be able to explain it better.
The keyword volatile only makes the reference to the HashMap visible to all threads.
If you want to access a HashMap from several threads, you need to use a synchronized map. The easiest choices are using java.util.Hashtable or Collections.synchronizedMap(map). The volatile declaration is useless in your case, since your variable is only assigned once, at initialization.
The semantics of volatile apply only to the variable you are declaring.
In your case, the variable that holds your reference to map is volatile, and so the JVM will go to lengths to ensure that changes you make to the reference contained by map are visible to other threads.
However, the object referred to by map is not covered by any such guarantee, and in order for changes to any object or object graph to be viewed by other threads, you will need to establish a happens-before relationship. With mutable state objects, this usually means synchronizing on a lock or using a thread-safe object designed for concurrency. Happily, in your case, a high-performance Map implementation designed for concurrent access is part of the Java library: ConcurrentHashMap.
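Applied to the question's declaration, one safe setup could be sketched as follows (the key and value literals are placeholders):

// A concurrent map of synchronized lists: the map handles concurrent
// puts/gets, and each list guards its own mutation.
private static final Map<String, List<String>> map = new ConcurrentHashMap<>();

// inside each UpdateThread's run():
map.computeIfAbsent("someKey", k -> Collections.synchronizedList(new ArrayList<>()))
   .add("someValue");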