I would like to implement a variation on the "Map of Sets" collection that will be constantly accessed by multiple threads. I am wondering whether the synchronization I am doing is sufficient to guarantee that no issues will manifest.
So given the following code, where Map, HashMap, and Set are the Java implementations, and Key and Value are some arbitrary Objects:
public class MapOfSets {
private Map<Key, Set<Value>> map;
public MapOfSets() {
map = Collections.synchronizedMap(new HashMap<Key, Set<Value>>());
}
//adds value to the set mapped to key
public void add(Key key, Value value) {
Set<Value> old = map.get(key);
//if no previous set exists on this key, create it and add value to it
if(old == null) {
old = new HashSet<Value>();
old.add(value);
map.put(key, old);
}
//otherwise simply insert the value to the existing set
else {
old.add(value);
}
}
//similar to add
public void remove(Key key, Value value) {...}
//perform some operation on all elements in the set mapped to key
public void foo(Key key) {
Set<Value> set = map.get(key);
for(Value v : set)
v.bar();
}
}
The idea here is that because I've synchronized the Map itself, the get() and put() method should be atomic right? So there should be no need to do additional synchronization on the Map or the Sets contained in it. So will this work?
Alternatively, would the above code be advantageous over another possible synchronization solution:
public class MapOfSets {
private Map<Key, Set<Value>> map;
public MapOfSets() {
map = new HashMap<Key, Set<Value>>();
}
public synchronized void add(Key key, Value value) {
Set<Value> old = map.get(key);
//if no previous set exists on this key, create it and add value to it
if(old == null) {
old = new HashSet<Value>();
old.add(value);
map.put(key, old);
}
//otherwise simply insert the value to the existing set
else {
old.add(value);
}
}
//similar to add
public synchronized void remove(Key key, Value value) {...}
//perform some operation on all elements in the set mapped to key
public synchronized void foo(Key key) {
Set<Value> set = map.get(key);
for(Value v : set)
v.bar();
}
}
Where I leave the data structures unsynchronized but synchronize all the possible public methods instead. So which ones will work, and which one is better?
The first implementation you posted is not thread safe. Consider what happens when the add method is accessed by two concurrent threads with the same key:
thread A executes line 1 of the method, and gets a null reference because no item with the given key is present
thread B executes line 1 of the method, and gets a null reference because no item with the given key is present — this will happen after A returns from the first call, as the map is synchronized
thread A evaluates the if condition (old == null) to true
thread B evaluates the if condition (old == null) to true
From that point on, the two threads will carry on with execution of the true branch of the if statement, and you will lose one of the two value objects.
The second variant of the method you posted looks safer.
However, if you can use third-party libraries, I would suggest checking out Google Guava, which offers concurrent multimaps (docs).
The second one is correct, but the first one isn't.
Think about it a minute, and suppose two threads are calling add() in parallel. Here's what could occur:
Thread 1 calls add("foo", "bar");
Thread 2 calls add("foo", "baz");
Thread 1 gets the set for "foo" : null
Thread 2 gets the set for "foo" : null
Thread 1 creates a new set and adds "bar" in it
Thread 2 creates a new set and adds "baz" in it
Thread 1 puts its set in the map
Thread 2 puts its set in the map
At the end of the story, the map contains one value for "foo" instead of two.
Synchronizing the map makes sure that its internal state is coherent, and that each method you call on the map is thread-safe. But it doesn't make the get-then-put operation atomic.
Consider using one of Guava's SetMultimap implementations, which do everything for you. Wrap it in a call to Multimaps.synchronizedSetMultimap(SetMultimap) to make it thread-safe.
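If a third-party library is not an option, the create-or-add step can also be made atomic with the JDK alone. The following is only a sketch, assuming Java 8 or later; the class name ConcurrentMapOfSets and the count helper are made up for illustration, and the generic parameters K and V stand in for Key and Value from the question.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Consumer;

// Hypothetical stand-in for the MapOfSets class from the question.
class ConcurrentMapOfSets<K, V> {
    private final ConcurrentMap<K, Set<V>> map = new ConcurrentHashMap<>();

    // computeIfAbsent creates the set at most once per key, atomically,
    // so two racing threads always end up adding to the same set.
    public void add(K key, V value) {
        map.computeIfAbsent(key, k -> ConcurrentHashMap.newKeySet()).add(value);
    }

    public boolean remove(K key, V value) {
        Set<V> set = map.get(key);
        return set != null && set.remove(value);
    }

    // Iteration over a concurrent key set is weakly consistent: it does not
    // throw ConcurrentModificationException but may miss concurrent updates.
    public void forEach(K key, Consumer<? super V> action) {
        Set<V> set = map.get(key);
        if (set != null) {
            set.forEach(action);
        }
    }

    // Illustrative helper: number of values currently stored under key.
    public int count(K key) {
        Set<V> set = map.get(key);
        return set == null ? 0 : set.size();
    }
}
```

Note that this sketch never removes empty sets from the map; doing so safely needs extra care.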
Your second implementation will work, but it holds locks for longer than it needs to (an inevitable problem with using synchronized methods rather than synchronized blocks), which will reduce concurrency. If you find that the limit on concurrency here is a bottleneck, you could shrink the locked regions a bit.
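One way to shrink the locked regions, sketched here under assumptions: the class name LockedMapOfSets, the private lock object, and the Consumer-based forEach (standing in for foo and v.bar()) are all illustrative and not from the question. The idea is to copy the set while holding the lock and run the per-element work after releasing it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical variant of the second implementation with smaller critical sections.
class LockedMapOfSets<K, V> {
    private final Object lock = new Object();
    private final Map<K, Set<V>> map = new HashMap<>();

    public void add(K key, V value) {
        synchronized (lock) {
            map.computeIfAbsent(key, k -> new HashSet<>()).add(value);
        }
    }

    public void forEach(K key, Consumer<? super V> action) {
        List<V> snapshot;
        synchronized (lock) {
            // Only the copy happens under the lock.
            Set<V> set = map.get(key);
            if (set == null) {
                snapshot = Collections.emptyList();
            } else {
                snapshot = new ArrayList<>(set);
            }
        }
        // The possibly slow per-element work runs without holding the lock.
        for (V v : snapshot) {
            action.accept(v);
        }
    }
}
```

The trade-off is that the action may run on a value that another thread has removed in the meantime.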
Alternatively, you could use some of the lock-free collections provided by java.util.concurrent. Here's my attempt at that; this isn't tested, and it requires Key (and Value, for the skip-list set) to be comparable, but it should not perform any locking ever:
public class MapOfSets {
private final ConcurrentMap<Key, Set<Value>> map;
public MapOfSets() {
map = new ConcurrentSkipListMap<Key, Set<Value>>();
}
private static ThreadLocal<Set<Value>> freshSets = new ThreadLocal<Set<Value>>() {
@Override
protected Set<Value> initialValue() {
return new ConcurrentSkipListSet<Value>();
}
};
public void add(Key key, Value value) {
Set<Value> freshSet = freshSets.get();
Set<Value> set = map.putIfAbsent(key, freshSet);
if (set == null) {
set = freshSet;
freshSets.remove();
}
set.add(value);
}
public void remove(Key key, Value value) {
Set<Value> set = map.get(key);
if (set != null) {
set.remove(value);
}
}
//perform some operation on all elements in the set mapped to key
public void foo(Key key) {
Set<Value> set = map.get(key);
if (set != null) {
for (Value v: set) {
v.bar();
}
}
}
}
For your Map implementation you could just use a ConcurrentHashMap. You wouldn't have to worry about ensuring thread safety for access, whether it's insertion or retrieval, as the implementation takes care of that for you.
And if you really want to use a Set, you could call
Collections.newSetFromMap(new ConcurrentHashMap<Object,Boolean>())
on your ConcurrentHashMap.
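As a small sketch of that call (the class and method names here are illustrative only), the set returned by Collections.newSetFromMap is backed by the concurrent map, so individual add, remove, and contains calls are thread-safe. It does not by itself make a get-then-put sequence on the outer map atomic.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper class, for illustration only.
class ConcurrentSetExample {
    // Builds a thread-safe Set view over a fresh ConcurrentHashMap.
    static <T> Set<T> newConcurrentSet() {
        return Collections.newSetFromMap(new ConcurrentHashMap<T, Boolean>());
    }
}
```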
I need to provide thread-safe implementation of the following container:
public interface ParameterMetaData<ValueType> {
public String getName();
}
public interface Parameters {
public <M> M getValue(ParameterMetaData<M> pmd);
public <M> void put(ParameterMetaData<M> p, M value);
public int size();
}
The thing is, the size method should return the accurate number of parameters currently contained in a Parameters instance. So, my first attempt was to try delegating thread safety as follows:
public final class ConcurrentParameters implements Parameters{
private final ConcurrentMap<ParameterMetaData<?>, Object> parameters =
new ConcurrentHashMap<>();
//Should represent the ACCURATE size of the internal map
private final AtomicInteger size = new AtomicInteger();
@Override
public <M> M getValue(ParameterMetaData<M> pmd) {
@SuppressWarnings("unchecked")
M value = (M) parameters.get(pmd);
return value;
}
@Override
public <M> void put(ParameterMetaData<M> p, M value){
if(value == null)
return;
//The problem is in the code below
M previous = (M) parameters.putIfAbsent(p, value);
if(previous != null) {
//throw an exception indicating that the parameter already exists
}
size.incrementAndGet();
}
@Override
public int size() {
return size.intValue();
}
}
The problem is that I can't just call parameters.size() on the ConcurrentHashMap instance to return the actual size, because that operation performs a traversal without locking and there's no guarantee that it will retrieve the actual size. That isn't acceptable in my case. So, I decided to maintain a field containing the size.
QUESTION: Is it possible somehow to delegate thread safety and preserve the invariants?
The outcome you want is not atomic as written. You want to modify the map and then read an element count that is consistent from the point of view of a single thread. The only way to achieve that is to make the whole flow one atomic operation by synchronizing access to the map; that is the only way to ensure the count does not change because of modifications made by another thread.
Guard the modify-then-count sequence with synchronized or a Semaphore so that only one thread at a time can modify the map and count its elements.
Using an additional field as a counter does not guarantee thread safety here: after the map is modified and before the counter is updated, another thread can modify the map, and the counter value will no longer be valid.
This is why the map does not keep its size internally but has to traverse its elements: to give the most accurate result at a given point in time.
EDIT:
To be 100% clear, this is the most convenient way to achieve this:
synchronized(yourMap){
doSomethingWithTheMap();
yourMap.size();
}
If you change every map operation to use such a block, size() is guaranteed to return an accurate element count. The only condition is that all data manipulation is done inside such synchronized blocks.
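Applied to a container like the one in the question, that could look roughly like the sketch below. It is simplified on purpose: the class name SynchronizedParameters is made up, plain String names replace ParameterMetaData keys, and IllegalStateException is an arbitrary choice for the "already exists" case. Every access goes through the same monitor, so size() always reflects completed put calls.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified container: String keys stand in for ParameterMetaData.
class SynchronizedParameters {
    private final Map<String, Object> parameters = new HashMap<>();

    // Insert-if-absent; both the check and the insert happen under one lock.
    public synchronized void put(String name, Object value) {
        if (value == null) {
            return;
        }
        if (parameters.putIfAbsent(name, value) != null) {
            throw new IllegalStateException("Parameter already exists: " + name);
        }
    }

    public synchronized Object getValue(String name) {
        return parameters.get(name);
    }

    // Accurate because no put can be half-done while this lock is held.
    public synchronized int size() {
        return parameters.size();
    }
}
```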
Suppose we have different methods which do some http calls, each of those are called with some specific argument... and we want to compare last value of method + argument and see if response was different and only then proceed...
method1(Arg arg)
method2(Arg arg)
when we make a particular call we have a hash of the response so that we can put them in a map...
{"key" : "method1|arg", "value" : "hash"}
now the next time we get the response we retrieve this particular "hash" from that cache store and compare it...
But all the method|arg calls are concurrent, and there might be many calls of the same combination running in parallel. The only concurrency issue can occur at the entry level, when one call tries to update or read the cache while another call with the same combination is updating it.
So we need to synchronize on an entry object, so that only calls with exactly the same "method|arg" combination can block each other, without blocking unrelated calls.
I wonder if there is a library (cache) for this purpose already?
If not, is there any Map implementation that allows getting an Entry by key, or should I keep another map?
And generally, is it safe to use a HashMap and synchronize on Entry objects? (I can't really imagine what happens when the HashMap is rehashing while some concurrent gets are executing...)
UPDATE
Here is the implementation I've come up with. Although ConcurrentHashMap probably covers this case, the idea was to lock only on an entry, not the entire map (well, except on writes).
public class HashCache {
final HashMap<String, Holder> hashCache = new HashMap<>();
public boolean hasChanged(String key, Object hash) {
assert key != null && hash != null;
Holder holder = hashCache.get(key);
if (holder == null) {
synchronized (hashCache) {
hashCache.put(key, new Holder(hash));
}
return true; // first hash
} else {
synchronized (holder) {
if (Objects.equals(holder.object, hash)) {
return false; // hash not changed
} else {
holder.object = hash;
return true; // hash changed
}
}
}
}
private static class Holder {
Object object;
Holder(Object object) {
this.object = object;
}
}
}
if you see a possible bug please comment :)
I think you'd be OK with a ConcurrentHashMap. I don't believe you need a cache for this, since you don't need to cache the response, but to store response's hash.
ConcurrentHashMap is a highly optimized Map which avoids thread contention as much as possible, especially for reads (I believe this matches your case).
You could use another approach and lock on every entry once you get it from a common HashMap, however I don't think it's worth the effort. I'd go first with the ConcurrentHashMap and test it, and would only change the implementation if behavior differs from expected results.
EDIT:
As per your edit, I must insist on recommending that you use a ConcurrentHashMap. Anyway, if for some reason this is not an option for you, I believe you should double-check when putting the value in the map for the first time:
public boolean hasChanged(String key, Object hash) {
assert key != null && hash != null;
Holder holder = hashCache.get(key);
if (holder == null) {
synchronized (hashCache) { // Double-check that value hasn't been changed
// before entering synchronized block
holder = hashCache.get(key);
if (holder == null) {
hashCache.put(key, new Holder(hash));
return true; // first hash
} // inner if
} // sync block
} // outer if
// No more else!
synchronized (holder) {
if (Objects.equals(holder.object, hash)) {
return false; // hash not changed
} else {
holder.object = hash;
return true; // hash changed
}
}
}
The double-check is needed because another thread might have put a value for the same key after your first get() but before you enter the synchronized block.
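For comparison, here is a rough sketch of the ConcurrentHashMap route recommended above. The class name ConcurrentHashCache is invented for illustration. It relies on put returning the previous value atomically, so no Holder object or explicit locking is needed.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical replacement for HashCache based on ConcurrentHashMap.
class ConcurrentHashCache {
    private final ConcurrentMap<String, Object> hashes = new ConcurrentHashMap<>();

    public boolean hasChanged(String key, Object hash) {
        Objects.requireNonNull(key);
        Objects.requireNonNull(hash);
        // put is atomic and returns the previous value, or null on first insert.
        Object previous = hashes.put(key, hash);
        return !hash.equals(previous);
    }
}
```

When two calls for the same key race with different hashes, each one is compared against whatever value was stored immediately before its own put.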
I'm using a WeakHashMap concurrently. I want to achieve fine-grained locking based on an Integer parameter; if thread A needs to modify a resource identified by Integer a and thread B does the same for resource identified by Integer b, then they need not to be synchronized. However, if there are two threads using the same resource, say thread C is also using a resource identified by Integer a, then of course thread A and C need to synchronize on the same Lock.
When there are no more threads that need the resource with ID X then the Lock in the Map for key=X can be removed. However, another thread can come in at that moment and try to use the lock in the Map for ID=X, so we need global synchronization when adding/removing the lock. (This would be the only place where every thread must synchronize, regardless of the Integer parameter) But, a thread cannot know when to remove the lock, because it doesn't know it is the last thread using the lock.
That's why I'm using a WeakHashMap: when the ID is no longer used, the key-value pair can be removed when the GC wants it.
To make sure I have a strong reference to the key of an already existing entry, and exactly that object reference that forms the key of the mapping, I need to iterate the keySet of the map:
synchronized (mrLocks){
// ... do other stuff
for (Integer entryKey : mrLocks.keySet()) {
if (entryKey.equals(id)) {
key = entryKey;
break;
}
}
// if key==null, no thread has a strong reference to the Integer
// key, so no thread is doing work on resource with id, so we can
// add a mapping (new Integer(id) => new ReentrantLock()) here as
// we are in a synchronized block. We must keep a strong reference
// to the newly created Integer, because otherwise the id-lock mapping
// may already have been removed by the time we start using it, and
// then other threads will not use the same Lock object for this
// resource
}
Now, can the content of the Map change while iterating it? I think not, because by calling mrLocks.keySet(), I created a strong reference to all keys for the scope of iteration. Is that correct?
As the API makes no assertions about the keySet(), I would recommend a cache usage like this:
private static Map<Integer, Reference<Integer>> lockCache = Collections.synchronizedMap(new WeakHashMap<>());
public static Object getLock(Integer i)
{
Integer monitor = null;
synchronized(lockCache) {
Reference<Integer> old = lockCache.get(i);
if (old != null)
monitor = old.get();
// if no monitor exists yet
if (monitor == null) {
/* clone i for avoiding strong references
to the map's key besides the Object returned
by this method.
*/
monitor = new Integer(i);
lockCache.remove(monitor); //just to be sure
lockCache.put(monitor, new WeakReference<>(monitor));
}
}
return monitor;
}
This way you are holding a reference to the monitor (the key itself) while locking on it and allow the GC to finalize it when not using it anymore.
Edit:
After the discussion about payload in the comments I thought about a solution with two caches:
private static Map<Integer, Reference<ReentrantLock>> lockCache = new WeakHashMap<>();
private static Map<ReentrantLock, Integer> keyCache = new WeakHashMap<>();
public static ReentrantLock getLock(Integer i)
{
ReentrantLock lock = null;
synchronized(lockCache) {
Reference<ReentrantLock> old = lockCache.get(i);
if (old != null)
lock = old.get();
// if no lock exists or got cleared from keyCache already but not from lockCache yet
if (lock == null || !keyCache.containsKey(lock)) {
/* clone i for avoiding strong references
to the map's key besides the Object returned
by this method.
*/
Integer cacheKey = new Integer(i);
lock = new ReentrantLock();
lockCache.remove(cacheKey); // just to be sure
lockCache.put(cacheKey, new WeakReference<>(lock));
keyCache.put(lock, cacheKey);
}
}
return lock;
}
As long as a strong reference to the payload (the lock) exists, the strong reference to the mapped integer in keyCache avoids the removal of the payload from the lockCache cache.
I have the following code:
public class Cache {
private final Map map = new ConcurrentHashMap();
public Object get(Object key) {
Object value = map.get(key);
if (value == null) {
value = new SomeObject();
map.put(key, value);
}
return value;
}
}
My question is:
The put and get methods of the map are thread safe, but since the whole block is not synchronized, could multiple threads add the same key twice?
put and get are thread safe in the sense that calling them from different threads cannot corrupt the data structure (as, e.g., is possible with a normal java.util.HashMap).
However, since the block is not synchronized, you may still have multiple threads adding the same key:
Both threads may pass the null check; one adds the key and returns its value, and then the second overwrites that value with a new one and returns it.
As of Java 8, you can also prevent this addition of duplicate keys with:
public class Cache {
private final Map map = new ConcurrentHashMap();
public Object get(Object key) {
// the lambda parameter must not reuse the name of the method parameter "key"
Object value = map.computeIfAbsent(key, k -> {
return new SomeObject();
});
return value;
}
}
The API docs state:
If the specified key is not already associated with a value, attempts
to compute its value using the given mapping function and enters it
into this map unless null. The entire method invocation is performed
atomically, so the function is applied at most once per key. Some
attempted update operations on this map by other threads may be
blocked while computation is in progress, so the computation should be
short and simple, and must not attempt to update any other mappings of
this map.
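The "at most once per key" guarantee quoted above can be observed with a small experiment. This is only a sketch; the class and method names are made up. It starts several threads that all call computeIfAbsent for the same key on a ConcurrentHashMap and counts how often the mapping function actually runs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, for illustration only.
class ComputeIfAbsentDemo {
    // Returns how many times the mapping function ran for a single shared key.
    static int countFactoryCalls(int threads) {
        ConcurrentMap<String, Object> map = new ConcurrentHashMap<>();
        AtomicInteger calls = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all threads at roughly the same time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                map.computeIfAbsent("key", k -> {
                    calls.incrementAndGet();
                    return new Object();
                });
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return calls.get();
    }
}
```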
could multiple threads add the same key twice?
Yes, they could. To fix this problem you can:
1) Use the putIfAbsent method instead of put. It is very fast, but unnecessary SomeObject instances can be created.
2) Use double checked locking:
Object value = map.get(key);
if (value == null) {
synchronized (map) {
value = map.get(key);
if (value == null) {
value = new SomeObject();
map.put(key, value);
}
}
}
return value;
Locking is much slower, but only the necessary objects will be created.
You could also combine the check with putIfAbsent, such as:
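Object value = map.get(key);
if (value == null) {
Object fresh = new SomeObject();
// putIfAbsent returns the previous value, or null if fresh was inserted
Object previous = map.putIfAbsent(key, fresh);
return previous != null ? previous : fresh;
}
return value;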
thereby reducing the unnecessary new objects to cases where new entries are introduced in the short time between the check and the putIfAbsent.
If you are feeling lucky and reads vastly outnumber writes to your map, you can also create your own copy-on-write map similar to CopyOnWriteArrayList.
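A copy-on-write map along those lines might look like the following sketch. The class name CopyOnWriteCache and the factory parameter are assumptions made for illustration. Reads go through a volatile reference to a map that is never mutated after publication, and writers copy the whole map under a lock, so this only pays off when writes are rare.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical copy-on-write cache, in the spirit of CopyOnWriteArrayList.
class CopyOnWriteCache<K, V> {
    // The published map is never modified; writers replace it wholesale.
    private volatile Map<K, V> snapshot = Collections.emptyMap();

    public V get(K key, Function<? super K, ? extends V> factory) {
        V value = snapshot.get(key); // lock-free read
        if (value != null) {
            return value;
        }
        synchronized (this) {
            value = snapshot.get(key); // re-check under the lock
            if (value == null) {
                value = factory.apply(key);
                Map<K, V> copy = new HashMap<>(snapshot);
                copy.put(key, value);
                snapshot = copy; // volatile write publishes the new map
            }
            return value;
        }
    }

    public int size() {
        return snapshot.size();
    }
}
```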
I need to keep track of multiple values against unique keys i.e. 1(a,b) 2(c,d) etc...
The solution is accessed by multiple threads so effectively I have the following defined;
ConcurrentSkipListMap<key, ConcurrentSkipListSet<values>>
My question is: does the removal of the key when the value set size is 0 need to be synchronized? I know that the two classes are "concurrent" and I've looked through the OpenJDK source code, but there would appear to be a window between one thread T1 checking that the Set is empty and removing it from the Map in remove(...) and another thread T2 calling add(...). The result is that T1 removes the last Set entry and removes the Set from the Map, interleaved with T2 adding a Set entry. Thus the Set, together with T2's entry, is removed by T1 and data is lost.
Do I just "synchronize" the add() and remove() methods or is there a "better" way?
The Map is modified by multiple threads but only through two methods.
Code snippet as follows;
protected static class EndpointSet extends U4ConcurrentSkipListSet<U4Endpoint> {
private static final long serialVersionUID = 1L;
public EndpointSet() {
super();
}
}
protected static class IDToEndpoint extends U4ConcurrentSkipListMap<String, EndpointSet> {
private static final long serialVersionUID = 1L;
protected Boolean add(String id, U4Endpoint endpoint) {
EndpointSet endpoints = get(id);
if (endpoints == null) {
endpoints = new EndpointSet();
put(id, endpoints);
}
endpoints.add(endpoint);
return true;
}
protected Boolean remove(String id, U4Endpoint endpoint) {
EndpointSet endpoints = get(id);
if (endpoints == null) {
return false;
} else {
endpoints.remove(endpoint);
if (endpoints.size() == 0) {
remove(id);
}
return true;
}
}
}
As it is your code has data races. Examples of what could happen:
a thread could add between if (endpoints.size() == 0) and remove(id); - you saw that
in add, a thread could read a non null value in EndpointSet endpoints = get(id); and another thread could remove data from that set, remove the set from the map because the set is empty. The initial thread would then add a value to the set, which is not held in the map any longer => data gets lost too as it becomes unreachable.
The easiest way to solve your issue is to make both add and remove synchronized. But you then lose all the performance benefits of using a ConcurrentMap.
Alternatively, you could simply leave the empty sets in the map - unless you have memory constraints. You would still need some form of synchronization but it would be easier to optimise.
If contention (performance) is an issue, you could try a more fine grained locking strategy by synchronizing on the keys or values but it could be quite tricky (and locking on Strings is not such a good idea because of String pooling).
It seems that in all cases, you could use a non concurrent set as you will need to synchronize it externally yourself.
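One further option, which swaps in a different collection from the one in the question: ConcurrentHashMap documents compute and computeIfPresent as atomic per key, whereas the default ConcurrentMap versions used by ConcurrentSkipListMap are not guaranteed to be. With that substitution, the add and the remove-when-empty steps can each run as a single atomic update. This is an untested-in-production sketch; the class name IdToEndpoints and the containsId helper are illustrative.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical alternative to IDToEndpoint, built on ConcurrentHashMap.
class IdToEndpoints<E> {
    private final ConcurrentHashMap<String, Set<E>> map = new ConcurrentHashMap<>();

    public void add(String id, E endpoint) {
        // Create-if-missing and add happen in one atomic update for this key.
        map.compute(id, (k, set) -> {
            if (set == null) {
                set = ConcurrentHashMap.newKeySet();
            }
            set.add(endpoint);
            return set;
        });
    }

    public boolean remove(String id, E endpoint) {
        boolean[] removed = {false};
        map.computeIfPresent(id, (k, set) -> {
            removed[0] = set.remove(endpoint);
            // Returning null removes the mapping, atomically with the removal above.
            return set.isEmpty() ? null : set;
        });
        return removed[0];
    }

    public boolean containsId(String id) {
        return map.containsKey(id);
    }
}
```

The sets are still concurrent key sets here so that readers outside compute can inspect them safely; unlike the original, this map is not sorted by key.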