I need to provide a thread-safe implementation of the following container:
public interface ParameterMetaData<ValueType> {
public String getName();
}
public interface Parameters {
public <M> M getValue(ParameterMetaData<M> pmd);
public <M> void put(ParameterMetaData<M> p, M value);
public int size();
}
The thing is that the size method should return the accurate number of parameters currently contained in a Parameters instance. So, my first attempt was to try delegating thread safety as follows:
public final class ConcurrentParameters implements Parameters{
private final ConcurrentMap<ParameterMetaData<?>, Object> parameters =
new ConcurrentHashMap<>();
//Should represent the ACCURATE size of the internal map
private final AtomicInteger size = new AtomicInteger();
@Override
public <M> M getValue(ParameterMetaData<M> pmd) {
@SuppressWarnings("unchecked")
M value = (M) parameters.get(pmd);
return value;
}
@Override
public <M> void put(ParameterMetaData<M> p, M value){
if(value == null)
return;
//The problem is in the code below
M previous = (M) parameters.putIfAbsent(p, value);
if(previous != null)
//the parameter already exists
throw new IllegalStateException("parameter already exists: " + p.getName());
size.incrementAndGet();
}
@Override
public int size() {
return size.intValue();
}
}
The problem is that I can't just call parameters.size() on the ConcurrentHashMap instance to return the actual size, because that operation works without locking and there's no guarantee that it will return the actual size. That isn't acceptable in my case, so I decided to maintain a separate field containing the size.
QUESTION: Is it possible somehow to delegate thread safety and preserve the invariants?
The outcome you want to achieve is not atomic as written. You want to modify the map and then read an element count that is consistent within the scope of a single thread. The only way to achieve that is to turn this sequence into an atomic operation by synchronizing access to the map. Only then can you be sure the count will not change because of modifications made in another thread.
Synchronize the modify-and-count access to the map with synchronized or a Semaphore, so that only a single thread at a time can modify the map and count its elements.
Using an additional field as a counter does not guarantee thread safety here: after the map is modified and before the counter is updated, another thread can modify the map, and the counter value will no longer be valid.
This is the reason why the map does not expose a single, always exact size field but computes its size on demand: to give the most accurate result available at a given point in time.
EDIT:
To be 100% clear, this is the most convenient way to achieve this:
synchronized(yourMap){
doSomethingWithTheMap();
yourMap.size();
}
So if you change every map operation to use such a block, you guarantee that size() returns an accurate element count. The only condition is that all data manipulation is done inside such a synchronized block.
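Applied to the Parameters interface from the question, the same idea could look like the sketch below. It is only an illustration: the class name SynchronizedParameters is made up here, the interfaces are copied from the question so the block compiles on its own, and a plain HashMap is assumed to be enough because every access goes through the same lock.

```java
import java.util.HashMap;
import java.util.Map;

// Interfaces copied from the question so the sketch is self-contained.
interface ParameterMetaData<ValueType> {
    String getName();
}

interface Parameters {
    <M> M getValue(ParameterMetaData<M> pmd);
    <M> void put(ParameterMetaData<M> p, M value);
    int size();
}

// Hypothetical name. Every method locks on "this", so the map and its size
// can never be observed in an inconsistent state.
final class SynchronizedParameters implements Parameters {
    private final Map<ParameterMetaData<?>, Object> parameters = new HashMap<>();

    @Override
    public synchronized <M> M getValue(ParameterMetaData<M> pmd) {
        @SuppressWarnings("unchecked")
        M value = (M) parameters.get(pmd);
        return value;
    }

    @Override
    public synchronized <M> void put(ParameterMetaData<M> p, M value) {
        if (value == null) {
            return; // null values are ignored, as in the question
        }
        if (parameters.putIfAbsent(p, value) != null) {
            throw new IllegalStateException("parameter already exists: " + p.getName());
        }
    }

    @Override
    public synchronized int size() {
        // No separate counter is needed: the map's own size is exact under the lock.
        return parameters.size();
    }
}
```

The price is that readers and writers all contend for one lock, which is the trade-off the answer describes.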
Consider the following implementation of some kind of fixed size cache, that allows lookup by an integer handle:
static class HandleCache {
private final AtomicInteger counter = new AtomicInteger();
private final Map<Data, Integer> handles = new ConcurrentHashMap<>();
private final Data[] array = new Data[100_000];
int getHandle(Data data) {
return handles.computeIfAbsent(data, k -> {
int i = counter.getAndIncrement();
if (i >= array.length) {
throw new IllegalStateException("array overflow");
}
array[i] = data;
return i;
});
}
Data getData(int handle) {
return array[handle];
}
}
There is an array store inside the compute function, which is not synchronized in any way. Would the Java memory model allow other threads to read a null value from this array later on?
PS: Would the outcome change if the id returned from getHandle was stored in a final field and only accessed through this field from other threads?
The read access isn't thread safe. You could make it thread safe indirectly, but that is likely to be brittle. I would implement it in a much simpler way and only optimise it later should it prove to be a performance problem, e.g. because you see it in a profiler for a realistic test.
static class HandleCache {
private final Map<Data, Integer> handles = new HashMap<>();
private final List<Data> dataByIndex = new ArrayList<>();
synchronized int getHandle(Data data) {
Integer id = handles.get(data);
if (id == null) {
id = handles.size();
handles.put(data, id);
dataByIndex.add(data);
}
return id;
}
synchronized Data getData(int handle) {
return dataByIndex.get(handle);
}
}
Assuming that you determine the index for the array read from the value of counter, then yes, you may get a null read.
The simplest example (there are others) is as follows:
T1 calls getHandle(data) and is suspended just after int i = counter.getAndIncrement();
T2 reads array[counter.get() - 1] and gets null, because the slot has been reserved but not yet written.
You should be able to easily verify this with a strategically placed sleep and two threads.
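A sketch of such a check is shown below. Instead of a sleep it uses two CountDownLatch objects, which are an assumption of this example and not part of the original cache, to hold the writer exactly between the counter increment and the array store. String stands in for the Data type, and the class and method names are invented for the illustration. Note that this shows the timing window only; it does not exercise the memory-visibility question from the original post.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class NullReadDemo {
    private final AtomicInteger counter = new AtomicInteger();
    private final Map<String, Integer> handles = new ConcurrentHashMap<>();
    private final String[] array = new String[16];

    // Demo-only latches that replace the "strategically placed sleep".
    private final CountDownLatch incremented = new CountDownLatch(1);
    private final CountDownLatch mayStore = new CountDownLatch(1);

    int getHandle(String data) {
        return handles.computeIfAbsent(data, k -> {
            int i = counter.getAndIncrement();
            incremented.countDown(); // slot i is reserved ...
            await(mayStore);         // ... and the writer is paused before the store
            array[i] = data;
            return i;
        });
    }

    String getData(int handle) {
        return array[handle];
    }

    // Plays the role of T2: reads the reserved slot while T1 is paused.
    String readWhileWriterIsPaused(String data) {
        Thread t1 = new Thread(() -> getHandle(data));
        t1.start();
        await(incremented);
        String seen = array[counter.get() - 1]; // slot reserved, not yet written
        mayStore.countDown();                   // let T1 finish
        try {
            t1.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling readWhileWriterIsPaused on a fresh instance returns null, while getData(0) returns the stored value once the writer thread has been joined.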
From the documentation of ConcurrentHashMap#computeIfAbsent:
The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.
The documentation's reference to blocking refers only to update operations on the Map, so if any other thread attempts to access array directly (rather than through an update operation on the Map), there can be race conditions and null can be read.
I came across a performance issue when implementing a data structure for a non-duplicate concurrent ArrayList (or ConcurrentLinkedQueue).
public class NonDuplicateList implements Outputable {
private Map<Term, Integer> map;
private List<Term> terms;
public NonDuplicateList() {
this.map = new HashMap<>();
this.terms = new ArrayList<>();
}
public synchronized int addTerm(Term term) { //bad performance :(
Integer index = map.get(term);
if (index == null) {
index = terms.size();
terms.add(term);
map.put(term, index);
}
return index;
}
@Override
public void output(DataOutputStream out) throws IOException {
out.writeInt(terms.size());
for (Term term : terms) {
term.output(out);
}
}
}
Note that Term and NonDuplicateList both implement Outputable interface to output.
In order to keep NonDuplicateList thread-safe, I use synchronized to guard the method addTerm(Term), and the performance is as bad as expected when addTerm is invoked concurrently.
It seems that ConcurrentHashMap isn't suitable for this case, since it doesn't keep strong data consistency. Any idea how to improve the performance of addTerm without losing its thread-safety?
EDIT:
The output method, i.e. iteration through NonDuplicateList, does not need to be thread-safe, since only one thread will access this method after the concurrent addTerm calls have finished. However, addTerm must return the index value as soon as a term is added to the NonDuplicateList.
You can reuse ConcurrentHashMap in your implementation if you can sacrifice the addTerm return type. Instead of returning the actual index, you can return a boolean that indicates whether the addition succeeded or the term was a duplicate. This also allows you to remove the method synchronization and improve performance:
private ConcurrentMap<Term, Boolean> map;
private List<Term> terms; // must itself be thread-safe here, e.g. Collections.synchronizedList(new ArrayList<>())
public boolean addTerm(Term term) {
Boolean previousValue = map.putIfAbsent(term, Boolean.TRUE);
if (previousValue == null) {
terms.add(term);
return true;
}
return false;
}
I am afraid you will not get a much faster solution here. The point is to avoid synchronization when you don't need it. If you don't mind weak consistency, using the ConcurrentHashMap iterator can be significantly cheaper than either preventing other threads from adding items while you're iterating or taking a consistent snapshot when the iterator is created.
On the other hand, when you need synchronization and a consistent iterator, you'll need an alternative to ConcurrentHashMap. One that comes to mind is java.util.Collections#synchronizedMap, but it synchronizes at the object level, so every read/write operation needs to acquire the lock, which adds overhead.
Take a look at ConcurrentSkipListMap, which guarantees average O(log(n)) performance on a wide variety of operations. It also has a number of operations that ConcurrentHashMap doesn't: ceilingEntry/Key, floorEntry/Key, etc. It also maintains a sort order, which would otherwise have to be calculated (at notable expense) if you were using a ConcurrentHashMap. It may be possible to get rid of the list+map pair and use a ConcurrentSkipListMap instead; the index of an element might then be computed using the ConcurrentSkipListMap API.
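If the index must still be returned, as the question's EDIT requires, one further option is to let ConcurrentHashMap.computeIfAbsent hand out indices from an AtomicInteger. This is a sketch that goes beyond the answers above, not something they propose: the class name IndexedTermSet and its methods are invented here, and it relies on the question's statement that output runs in a single thread after all additions are done. ConcurrentHashMap applies the mapping function at most once per key, so each distinct term gets exactly one index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical replacement for the map + list pair in NonDuplicateList.
class IndexedTermSet<T> {
    private final ConcurrentHashMap<T, Integer> indexByTerm = new ConcurrentHashMap<>();
    private final AtomicInteger nextIndex = new AtomicInteger();

    // Returns the existing index for a duplicate, or assigns the next free one.
    // Relies on ConcurrentHashMap running the function at most once per key;
    // ConcurrentSkipListMap does not give that guarantee.
    int addTerm(T term) {
        return indexByTerm.computeIfAbsent(term, t -> nextIndex.getAndIncrement());
    }

    // Rebuilds the list in index order. Intended to be called only after
    // all concurrent addTerm calls have completed.
    List<T> termsInIndexOrder() {
        List<T> ordered = new ArrayList<>(Collections.<T>nCopies(indexByTerm.size(), null));
        indexByTerm.forEach((term, index) -> ordered.set(index, term));
        return ordered;
    }
}
```

The list is no longer maintained on every insert; it is reconstructed once at output time, which moves the ordering cost out of the contended path.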
I need to keep track of multiple values against unique keys i.e. 1(a,b) 2(c,d) etc...
The solution is accessed by multiple threads so effectively I have the following defined;
ConcurrentSkipListMap<key, ConcurrentSkipListSet<values>>
My question is: does the removal of the key when the value set size is 0 need to be synchronized? I know that the two classes are "concurrent" and I've looked through the OpenJDK source code, but there would appear to be a window between one thread T1 checking that the Set is empty and removing it from the Map in remove(...) and another thread T2 calling add(...). The result is that T1 removes the last Set entry and removes the Set from the Map, interleaved with T2 adding a Set entry. Thus the Set and T2's entry are removed by T1 and data is lost.
Do I just "synchronize" the add() and remove() methods or is there a "better" way?
The Map is modified by multiple threads but only through two methods.
Code snippet as follows;
protected static class EndpointSet extends U4ConcurrentSkipListSet<U4Endpoint> {
private static final long serialVersionUID = 1L;
public EndpointSet() {
super();
}
}
protected static class IDToEndpoint extends U4ConcurrentSkipListMap<String, EndpointSet> {
private static final long serialVersionUID = 1L;
protected Boolean add(String id, U4Endpoint endpoint) {
EndpointSet endpoints = get(id);
if (endpoints == null) {
endpoints = new EndpointSet();
put(id, endpoints);
}
endpoints.add(endpoint);
return true;
}
protected Boolean remove(String id, U4Endpoint endpoint) {
EndpointSet endpoints = get(id);
if (endpoints == null) {
return false;
} else {
endpoints.remove(endpoint);
if (endpoints.size() == 0) {
remove(id);
}
return true;
}
}
}
As it is, your code has race conditions. Examples of what could happen:
a thread could add between if (endpoints.size() == 0) and remove(id); - you saw that
in add, a thread could read a non null value in EndpointSet endpoints = get(id); and another thread could remove data from that set, remove the set from the map because the set is empty. The initial thread would then add a value to the set, which is not held in the map any longer => data gets lost too as it becomes unreachable.
The easiest way to solve your issue is to make both add and remove synchronized. But you then lose all the performance benefits of using a ConcurrentMap.
Alternatively, you could simply leave the empty sets in the map - unless you have memory constraints. You would still need some form of synchronization but it would be easier to optimise.
If contention (performance) is an issue, you could try a more fine grained locking strategy by synchronizing on the keys or values but it could be quite tricky (and locking on Strings is not such a good idea because of String pooling).
It seems that in all cases, you could use a non concurrent set as you will need to synchronize it externally yourself.
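One way to get per-key atomicity without a global lock is ConcurrentHashMap.compute, which runs the remapping function atomically for a given key and removes the mapping when the function returns null. The sketch below rests on assumptions: it swaps the question's ConcurrentSkipListMap for a ConcurrentHashMap (so the sorted order is given up), it uses plain HashSet values that are only ever touched inside compute calls, and the class and method names are invented for the example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for IDToEndpoint; E plays the role of U4Endpoint.
class IdToEndpoints<E> {
    private final ConcurrentHashMap<String, Set<E>> map = new ConcurrentHashMap<>();

    void add(String id, E endpoint) {
        // Create-if-missing and add happen as one atomic step for this key.
        map.compute(id, (key, set) -> {
            Set<E> target = (set == null) ? new HashSet<>() : set;
            target.add(endpoint);
            return target;
        });
    }

    // Returns true if the endpoint was present (unlike the original, which
    // returned true whenever the id existed).
    boolean remove(String id, E endpoint) {
        boolean[] removed = {false};
        map.computeIfPresent(id, (key, set) -> {
            removed[0] = set.remove(endpoint);
            return set.isEmpty() ? null : set; // null drops the empty set atomically
        });
        return removed[0];
    }

    // Copies the set inside the same per-key critical section, so readers
    // never iterate a HashSet that is being modified.
    Set<E> snapshot(String id) {
        Set<E> copy = new HashSet<>();
        map.computeIfPresent(id, (key, set) -> {
            copy.addAll(set);
            return set;
        });
        return copy;
    }

    boolean containsId(String id) {
        return map.containsKey(id);
    }
}
```

Because the empty check and the removal run inside the same remapping function, another thread cannot add to the set between the two steps.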
I would like to implement a variation on the "Map of Sets" collection that will be constantly accessed by multiple threads. I am wondering whether the synchronization I am doing is sufficient to guarantee that no issues will manifest.
So given the following code, where Map, HashMap, and Set are the Java implementations, and Key and Value are some arbitrary Objects:
public class MapOfSets {
private Map<Key, Set<Value>> map;
public MapOfSets() {
map = Collections.synchronizedMap(new HashMap<Key, Set<Value>>());
}
//adds value to the set mapped to key
public void add(Key key, Value value) {
Set<Value> old = map.get(key);
//if no previous set exists on this key, create it and add value to it
if(old == null) {
old = new HashSet<Value>();
old.add(value);
map.put(key, old);
}
//otherwise simply insert the value to the existing set
else {
old.add(value);
}
}
//similar to add
public void remove(Key key, Value value) {...}
//perform some operation on all elements in the set mapped to key
public void foo(Key key) {
Set<Value> set = map.get(key);
for(Value v : set)
v.bar();
}
}
The idea here is that because I've synchronized the Map itself, the get() and put() method should be atomic right? So there should be no need to do additional synchronization on the Map or the Sets contained in it. So will this work?
Alternatively, would the above code be advantageous over another possible synchronization solution:
public class MapOfSets {
private Map<Key, Set<Value>> map;
public MapOfSets() {
map = new HashMap<Key, Set<Value>>();
}
public synchronized void add(Key key, Value value) {
Set<Value> old = map.get(key);
//if no previous set exists on this key, create it and add value to it
if(old == null) {
old = new HashSet<Value>();
old.add(value);
map.put(key, old);
}
//otherwise simply insert the value to the existing set
else {
old.add(value);
}
}
//similar to add
public synchronized void remove(Key key, Value value) {...}
//perform some operation on all elements in the set mapped to key
public synchronized void foo(Key key) {
Set<Value> set = map.get(key);
for(Value v : set)
v.bar();
}
}
Where I leave the data structures unsynchronized but synchronize all the possible public methods instead. So which ones will work, and which one is better?
The first implementation you posted is not thread safe. Consider what happens when the add method is accessed by two concurrent threads with the same key:
thread A executes line 1 of the method, and gets a null reference because no item with the given key is present
thread B executes line 1 of the method, and gets a null reference because no item with the given key is present — this will happen after A returns from the first call, as the map is synchronized
thread A evaluates the if condition (old == null) to true
thread B evaluates the if condition (old == null) to true
From that point on, the two threads will carry on with execution of the true branch of the if statement, and you will lose one of the two value objects.
The second variant of the method you posted looks safer.
However, if you can use third-party libraries, I would suggest checking out Google Guava, which offers multimaps with synchronized wrappers.
The second one is correct, but the first one isn't.
Think about it a minute, and suppose two threads are calling add() in parallel. Here's what could occur:
Thread 1 calls add("foo", "bar");
Thread 2 calls add("foo", "baz");
Thread 1 gets the set for "foo" : null
Thread 2 gets the set for "foo" : null
Thread 1 creates a new set and adds "bar" in it
Thread 2 creates a new set and adds "baz" in it
Thread 1 puts its set in the map
Thread 2 puts its set in the map
At the end of the story, the map contains one value for "foo" instead of two.
Synchronizing the map makes sure that its internal state is coherent and that each method you call on the map is thread-safe, but it doesn't make the get-then-put operation atomic.
Consider using one of Guava's SetMultimap implementations, which does everything for you. Wrap it in a call to Multimaps.synchronizedSetMultimap(SetMultimap) to make it thread-safe.
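Without Guava, the get-then-put sequence can be made atomic by holding the synchronized map's own monitor around the compound operation. The wrapper returned by Collections.synchronizedMap locks on itself, so a synchronized (map) block excludes its individual methods as well. The sketch below is a minimal illustration with invented names (SynchronizedMapOfSets, values) and generic type parameters in place of Key and Value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SynchronizedMapOfSets<K, V> {
    private final Map<K, Set<V>> map = Collections.synchronizedMap(new HashMap<>());

    void add(K key, V value) {
        synchronized (map) { // same monitor the wrapper uses internally
            Set<V> set = map.get(key);
            if (set == null) {
                set = new HashSet<>();
                map.put(key, set);
            }
            set.add(value); // the inner HashSet is also protected by this lock
        }
    }

    // Returns a copy so callers can iterate without holding the lock.
    List<V> values(K key) {
        synchronized (map) {
            Set<V> set = map.get(key);
            if (set == null) {
                return new ArrayList<>();
            }
            return new ArrayList<>(set);
        }
    }
}
```

With the block in place, the interleaving listed above cannot occur: the second thread only runs its get after the first thread's put has completed.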
Your second implementation will work, but it holds locks for longer than it needs to (an inevitable problem with using synchronized methods rather than synchronized blocks), which will reduce concurrency. If you find that the limit on concurrency here is a bottleneck, you could shrink the locked regions a bit.
Alternatively, you could use some of the lock-free collections provided by java.util.concurrent. Here's my attempt at that; this isn't tested, and it requires Key (and, for the ConcurrentSkipListSet, Value) to be comparable, but it should not perform any locking ever:
public class MapOfSets {
private final ConcurrentMap<Key, Set<Value>> map;
public MapOfSets() {
map = new ConcurrentSkipListMap<Key, Set<Value>>();
}
private static ThreadLocal<Set<Value>> freshSets = new ThreadLocal<Set<Value>>() {
@Override
protected Set<Value> initialValue() {
return new ConcurrentSkipListSet<Value>();
}
};
public void add(Key key, Value value) {
Set<Value> freshSet = freshSets.get();
Set<Value> set = map.putIfAbsent(key, freshSet);
if (set == null) {
set = freshSet;
freshSets.remove();
}
set.add(value);
}
public void remove(Key key, Value value) {
Set<Value> set = map.get(key);
if (set != null) {
set.remove(value);
}
}
//perform some operation on all elements in the set mapped to key
public void foo(Key key) {
Set<Value> set = map.get(key);
if (set != null) {
for (Value v: set) {
v.bar();
}
}
}
}
For your Map implementation you could just use a ConcurrentHashMap. You wouldn't have to worry about ensuring thread safety for access, whether it's input or retrieval, as the implementation takes care of that for you.
And if you really want to use a Set, you could create a concurrent one backed by a ConcurrentHashMap by calling
Collections.newSetFromMap(new ConcurrentHashMap<Object,Boolean>())
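Put together, the two suggestions might look like the sketch below. It assumes that sets are never removed from the outer map, which is what keeps the plain get in remove and count safe; the class name and the count helper are made up for the example.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ConcurrentMapOfSets<K, V> {
    private final ConcurrentMap<K, Set<V>> map = new ConcurrentHashMap<>();

    void add(K key, V value) {
        // computeIfAbsent creates the set at most once per key; the set itself
        // is concurrent, so adding to it needs no extra locking.
        map.computeIfAbsent(key,
                k -> Collections.newSetFromMap(new ConcurrentHashMap<V, Boolean>()))
           .add(value);
    }

    boolean remove(K key, V value) {
        Set<V> set = map.get(key);
        return set != null && set.remove(value); // empty sets are left in place
    }

    int count(K key) {
        Set<V> set = map.get(key);
        return set == null ? 0 : set.size();
    }
}
```

Leaving empty sets in the map avoids the remove-while-adding race, at the cost of some memory for keys that are no longer in use.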
I have code similar to following:
public class Cache{
private final Object lock = new Object();
private HashMap<Integer, TreeMap<Long, Integer>> cache =
new HashMap<Integer, TreeMap<Long, Integer>>();
private AtomicLong FREESPACE = new AtomicLong(102400);
private void putInCache(TreeMap<Long, Integer> tempMap, int fileNr){
int length; //holds the length of data in tempMap
synchronized(lock){
if(checkFreeSpace(length)){
cache.get(fileNr).putAll(tempMap);
FREESPACE.getAndAdd(-length);
}
}
}
private boolean checkFreeSpace(int length){
while(FREESPACE.get() < length && thereIsSomethingToDelete()){
// deleteSomething returns the length of deleted data or 0 if
// it could not delete anything
FREESPACE.getAndAdd(deleteSomething(length));
}
// true only if there is now enough free space for the new data
return FREESPACE.get() >= length;
}
}
putInCache is called by about 139 threads a second. Can I be sure that these two methods will synchronize on both cache and FREESPACE? Also, is checkFreeSpace() multithread-safe i.e can I be sure that there will be only one invocation of this method at a time? Can the "multithread-safety" of this code be improved?
To have your question answered fully, you would need to show the implementations of the thereIsSomethingToDelete() and deleteSomething() methods.
Given that checkFreeSpace is a public method (does it really need to be?), and is unsynchronized, it is possible it could be called by another thread while the synchronized block in the putInCache() method is running. This by itself might not break anything, since it appears that the checkFreeSpace method can only increase the amount of free space, not reduce it.
What would be more serious (and the code sample doesn't allow us to determine this) is if the thereIsSomethingToDelete() and deleteSomething() methods don't properly synchronize their access to the cache object, using the same Object lock as used by putInCache().
You don't usually synchronize on the fields you want to control access to directly.
The fields that you want to synchronize access to must only be accessed from within synchronized blocks (on the same object) to be considered thread safe. You are already doing this in putInCache().
Therefore, because checkFreeSpace() accesses shared state in an unsynchronized fashion, it is not thread safe.
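A sketch of what "all shared state behind one lock" could look like is given below. Several parts are assumptions made for the illustration, since the original thereIsSomethingToDelete() and deleteSomething() are not shown: the eviction policy (drop an arbitrary file), the usedByFile bookkeeping map, the explicit length parameter, and the class name BoundedCache. Because every read and write of the free-space counter happens under the lock, a plain long is used instead of an AtomicLong.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

class BoundedCache {
    private final Object lock = new Object();
    private final Map<Integer, TreeMap<Long, Integer>> cache = new HashMap<>();
    private final Map<Integer, Long> usedByFile = new HashMap<>(); // hypothetical bookkeeping
    private long freeSpace; // guarded by lock

    BoundedCache(long capacity) {
        this.freeSpace = capacity;
    }

    boolean putInCache(TreeMap<Long, Integer> tempMap, int fileNr, int length) {
        synchronized (lock) {
            if (!ensureFreeSpace(length)) {
                return false;
            }
            cache.computeIfAbsent(fileNr, k -> new TreeMap<>()).putAll(tempMap);
            usedByFile.merge(fileNr, (long) length, Long::sum);
            freeSpace -= length;
            return true;
        }
    }

    // Private and only called while holding lock, so it cannot race with putInCache.
    private boolean ensureFreeSpace(int length) {
        while (freeSpace < length && !cache.isEmpty()) {
            freeSpace += deleteSomething();
        }
        return freeSpace >= length;
    }

    // Hypothetical eviction: removes one arbitrary file and returns the space it freed.
    private long deleteSomething() {
        Iterator<Map.Entry<Integer, Long>> it = usedByFile.entrySet().iterator();
        Map.Entry<Integer, Long> victim = it.next();
        int fileNr = victim.getKey();
        long freed = victim.getValue();
        it.remove();
        cache.remove(fileNr);
        return freed;
    }

    boolean contains(int fileNr) {
        synchronized (lock) {
            return cache.containsKey(fileNr);
        }
    }

    long freeSpace() {
        synchronized (lock) {
            return freeSpace;
        }
    }
}
```

The key design choice is that no method touches cache or freeSpace outside a synchronized (lock) block, so the check and the update always see the same state.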