ConcurrentSkipListMap how to make remove and add calls atomic - java

I have N threads that add values and one thread that removes them. I am looking for the best way to synchronize adding to an existing list of values with removing that list.
I think the following case is possible:
thread 1 checks containsKey and enters the else block
thread 2 removes the value
thread 1 tries to add a value to the existing list, but get returns null
I think the only approach I can use is to synchronize on the map value (in our case the List), both when adding and when deleting.
private ConcurrentSkipListMap<LocalDateTime, List<Task>> tasks = new ConcurrentSkipListMap<>();
//Thread1,3...N
public void add(LocalDateTime time, Task task) {
if (!tasks.containsKey(time)) {
tasks.computeIfAbsent(time, k -> createValue(task));
} else {
//potentially should be synced
tasks.get(time).add(task);
}
}
private List<Task> createValue(Task val) {
return new ArrayList<>(Arrays.asList(val));
}
//thread 2
public void remove() {
while(true){
Map.Entry<LocalDateTime, List<Task>> keyVal = tasks.firstEntry();
if (isSomeCondition(keyVal)) {
tasks.remove(keyVal.getKey());
for (Task t : keyVal.getValue()) {
//do task processing
}
}
}
}

For the add part you would be really inclined to use merge, but the documentation of ConcurrentSkipListMap.merge is pretty clear about it: the function is not guaranteed to be applied once atomically.
I would replace your add with merge, but under a lock.
SomeLock lock ...
public void add(LocalDateTime time, Task task) {
    lock.lock();
    try {
        tasks.merge...
    } finally {
        // always release the lock, even if merge throws
        lock.unlock();
    }
}
The same goes for the remove method. But then, if you are doing everything under a lock, there is no need for ConcurrentSkipListMap in the first place.
On the other hand, if you are OK with switching to ConcurrentHashMap, its merge is documented as atomic.
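As a rough illustration of the ConcurrentHashMap route, the sketch below appends tasks with merge and detaches a whole list with remove. The class name, the String keys and the String tasks are assumptions made for brevity (the question uses LocalDateTime and Task), and note that a ConcurrentHashMap gives up the key ordering that firstEntry() relies on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical holder class; key and task types are simplified to String.
class MergeAddSketch {
    // Lists stored in the map are never mutated; merge replaces them with new lists.
    private final ConcurrentHashMap<String, List<String>> tasks = new ConcurrentHashMap<>();

    void add(String time, String task) {
        tasks.merge(time, Collections.singletonList(task), (oldList, newList) -> {
            List<String> combined = new ArrayList<>(oldList);
            combined.addAll(newList);
            return Collections.unmodifiableList(combined);
        });
    }

    // Atomically detaches and returns the list for a key (null if there is none).
    List<String> take(String time) {
        return tasks.remove(time);
    }

    // Starts several adders on one key and returns how many tasks were stored.
    static int demo(int threads, int perThread) {
        MergeAddSketch store = new MergeAddSketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    store.add("slot", "task-" + id + "-" + i);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return store.take("slot").size();
    }
}
```

Because every merge is applied atomically for its key, no task added by any thread is lost.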

It’s not entirely clear what your remove() method is supposed to do. In its current form, it’s an infinite loop: first, it iterates over the head elements and removes them until the condition is not met for the head element; then it repeatedly polls that head element and re-evaluates the condition. Unless it manages to remove all elements, in which case it bails out with an exception.
If you want to process all elements currently in the map, you may simply loop over it, the weakly consistent iterators allow you to proceed while modifying it; you may notice ongoing concurrent updates or not.
If you want to process the matching head elements only, you have to insert a condition to either return to the caller or put the thread to sleep (or, better, add a notification mechanism), to avoid burning the CPU with a repeatedly failing test (or even throwing when the map is empty).
Besides that, you can implement the operations using ConcurrentSkipListMap when you ensure that there is no interference between the functions. Assuming remove is supposed to process all current elements once, the implementation may look like
public void add(LocalDateTime time, Task task) {
tasks.merge(time, Collections.singletonList(task),
(l1,l2) -> Stream.concat(l1.stream(),l2.stream()).collect(Collectors.toList()));
}
public void remove() {
for(Map.Entry<LocalDateTime, List<Task>> keyVal : tasks.entrySet()) {
final List<Task> values = keyVal.getValue();
if(isSomeCondition(keyVal) && tasks.remove(keyVal.getKey(), values)) {
for (Task t : values) {
//do task processing
}
}
}
}
The key point is that the lists contained in the map are never modified. The merge(time, Collections.singletonList(task), … operation will even store an immutable list of a single task if there was no previous mapping. In case there are previous tasks, the merge function (l1,l2) -> Stream.concat(l1.stream(),l2.stream()).collect(Collectors.toList()) will create a new list rather than modifying the existing ones. This may have a performance impact when the lists become much larger, especially when the operation has to be repeated in the case of contention, but that’s the price for not needing lock nor additional synchronization.
The remove operation uses the remove(key, value) method which only succeeds if the map’s value still matches the expected one. This relies on the fact that neither of our methods ever modifies the lists contained in the map, but replaces them with new list instances when merging. If remove(key, value) succeeds, the list can be processed; at this time, it is not contained in the map anymore. Note that during the evaluation of isSomeCondition(keyVal), the list is still contained in the map, therefore, isSomeCondition(keyVal) must not modify it, though, I assume that this should be the case for a testing method like isSomeCondition anyway. Of course, evaluating the list within isSomeCondition also relies on the other methods never modifying the list.
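The same pattern can be tried out in isolation with a small, self-contained sketch. The class name, the Integer keys (standing in for LocalDateTime), the String tasks and the "key <= limit" test (standing in for isSomeCondition) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical store; Integer keys and String tasks keep the example short.
class SkipListTaskStore {
    private final ConcurrentSkipListMap<Integer, List<String>> tasks =
            new ConcurrentSkipListMap<>();

    void add(int time, String task) {
        tasks.merge(time, Collections.singletonList(task), (l1, l2) -> {
            List<String> merged = new ArrayList<>(l1);
            merged.addAll(l2);
            return merged; // a new list; it is never modified once stored
        });
    }

    // Detaches every entry whose key is <= limit and returns its tasks in key order.
    List<String> drainUpTo(int limit) {
        List<String> processed = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> entry : tasks.entrySet()) {
            if (entry.getKey() > limit) {
                break; // keys are sorted, nothing further can match
            }
            List<String> values = entry.getValue();
            // Succeeds only if no add() replaced the list since we read it;
            // otherwise the entry is left for a later pass.
            if (tasks.remove(entry.getKey(), values)) {
                processed.addAll(values);
            }
        }
        return processed;
    }

    int size() {
        return tasks.size();
    }
}
```

If remove(key, value) fails because a concurrent add replaced the list, nothing is lost: the newer, longer list stays in the map and is picked up by the next call.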

Related

Accurate data on a map accessed by many threads

I am trying to sort objects into five separate groups depending on a weight given to them at instantiation.
Now, I want to sort these objects into the five groups by their weights. In order to do this, each one must be compared to the other.
The problem I'm having is that these objects are added to the groups on separate worker threads. After an object has finished downloading a picture, it is sent to the synchronized sorting function, which compares it against all members currently in the five groups.
The groups have been set up as two different maps. The first being a Hashtable, which crashes the program throwing an unknown ConcurrencyIssue. When I use a ConcurrentHashMap, the data is wrong because it doesn't remove the entry in time before the next object is compared against the ConcurrentHashmap. So this causes a logic error and yields groups that are sorted correctly only half of the time.
I need the hashmap to immediately remove the entry from the map before the next sort occurs... I thought synchronizing the function would do this but it still doesn't seem to work.
Is there a better way to sort objects against each other that are being added to a datastructure by worker threads? Thanks! I'm a little lost on this one.
private synchronized void sortingHat(Moment moment) {
try {
ConcurrentHashMap[] helperList = {postedOverlays, chanl_2, chanl_3, chanl_4, chanl_5};
Moment moment1 = moment;
//Iterate over all channels going from highest channel to lowest
for (int i = channelCount - 1; i > 0; i--) {
ConcurrentHashMap<String, Moment> table = helperList[i];
Set<String> keys = table.keySet();
boolean mOverlap = false;
double width = getWidthbyChannel(i);
//If there is no objects in table, don't bother trying to compare...
if (!table.isEmpty()) {
//Iterate over all objects currently in the hashmap
for (String objId : keys) {
Moment moment2 = table.get(objId);
//x-Overlap
if ((moment2.x + width >= moment1.x - width) ||
(moment2.x - width <= moment1.x + width)) {
//y-Overlap
if ((moment2.y + width >= moment1.y - width) ||
(moment2.y - width <= moment1.y + width)) {
//If there is overlap, only replace the moment with the greater weight.
if (moment1.weight >= moment2.weight) {
mOverlap = true;
table.remove(objId);
table.put(moment1.id, moment1);
}
}
}
}
}
//If there is no overlap, add to channel anyway
if (!mOverlap) {
table.put(moment1.id, moment1);
}
}
} catch (Exception e) {
Log.d("SortingHat", e.toString());
}
}
The table.remove(objId) is where the problems occur. Moment A gets sent to sorting function, and has no problems. Moment B is added, it overlaps, it compares against Moment A. If Moment B is less weight than Moment A, everything is fine. If Moment B is weighted more and A has to be removed, then when moment C gets sorted moment A will still be in the hashmap along with moment B. And so that seems to be where the logic error is.
You are having an issue with your synchronization.
The synchronized modifier you use synchronizes on the "this" lock. You can imagine it like this:
public synchronized void foo() { ... }
is the same as
public void foo() {
synchronized(this) {
....
}
}
This means that, before entering, the current thread tries to acquire "this" object as a lock. If a worker thread calls a synchronized method on a different object (for example, one that adds entries to the table), it acquires a different lock, so the two methods do not exclude each other. What you want is for one thread to finish its work before the next one can start.
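To make that concrete, here is a minimal sketch in which every thread synchronizes on the same lock object, so a check-then-act sequence runs as one unit. The class and method names are hypothetical and not taken from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: all access to 'members' goes through one shared lock.
class SharedLockSketch {
    private final Object lock = new Object(); // the single lock everyone agrees on
    private final List<Integer> members = new ArrayList<>();

    // The check and the add happen while holding the same lock.
    void addIfAbsent(int value) {
        synchronized (lock) {
            if (!members.contains(value)) {
                members.add(value);
            }
        }
    }

    int size() {
        synchronized (lock) {
            return members.size();
        }
    }

    // Several threads try to add the same values; duplicates must never appear.
    static int demo(int threads, int values) {
        SharedLockSketch shared = new SharedLockSketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int v = 0; v < values; v++) {
                    shared.addIfAbsent(v);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.size();
    }
}
```

Had each thread synchronized on its own object instead, two threads could both pass the contains check before either one adds.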
The first being a Hashtable, which crashes the program throwing an unknown ConcurrencyIssue.
This problem occurs because the table is modified while it is being iterated. Hashtable's own methods are synchronized, so a put(key, value) and a remove(key) coming from two threads simply run one after the other. Its iterators, however, are fail-fast: if an entry is added or removed while a loop over keySet() is in progress (by another thread, or by the loop body itself, as table.remove(objId) does in your code), the next step of the iteration throws a ConcurrentModificationException. Note: this is a very simplified explanation!
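A ConcurrentModificationException does not even need a second thread. The following sketch (class and method names are made up for illustration) reproduces it on a Hashtable by removing entries inside a for-each loop, and shows the variant that removes through the iterator instead.

```java
import java.util.ConcurrentModificationException;
import java.util.Hashtable;
import java.util.Iterator;
import java.util.Map;

// Hypothetical demo of fail-fast iteration on a Hashtable.
class FailFastSketch {
    static Map<String, Integer> sample() {
        Map<String, Integer> table = new Hashtable<>();
        table.put("a", 1);
        table.put("b", 2);
        table.put("c", 3);
        return table;
    }

    // Removing through the map while a for-each loop runs trips the fail-fast check.
    static boolean removeInsideForEach() {
        Map<String, Integer> table = sample();
        try {
            for (String key : table.keySet()) {
                table.remove(key);
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    // Removing through the iterator keeps the iteration consistent.
    static int removeThroughIterator() {
        Map<String, Integer> table = sample();
        for (Iterator<String> it = table.keySet().iterator(); it.hasNext(); ) {
            it.next();
            it.remove();
        }
        return table.size();
    }
}
```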
When I use a ConcurrentHashMap, the data is wrong because it doesn't remove the entry in time before the next object is compared against the ConcurrentHashmap
ConcurrentHashMap is a utility for avoiding exactly these concurrency issues; it is not a magical, multifunctional, unicorn-hunting butter knife. It makes each individual method call thread-safe, so a single put or remove is atomic and never corrupts the map. It does not provide the functionality of a lock, which would give one thread exclusive access to the map across several calls.
There could be one thread that wants to call add and one that wants to call remove. ConcurrentHashMap only guarantees that those calls cannot interfere with each other. Which one comes first? You have power over that (in this scenario). What you want is for one thread to finish its work before the next one can do its work.
What you really need is up to you. The java.util.concurrent package brings a whole arsenal of classes you could use. For example:
You could use a lock for each Map. With that, each thread (sorting, removing, adding or whatever) would first acquire the Lock for that Map and then work on it, like this:
public class Worker implements Runnable {
    private int idOfMap = ...;
    @Override
    public void run() {
        Lock lock = getLock(idOfMap);
        // Acquire before the try block, so unlock() only runs if lock() succeeded
        lock.lock();
        try {
            // The work goes here
            //...
        } finally {
            lock.unlock();
        }
    }
}
Once lock.lock() returns, no other thread is working on the Map or modifying it, so this thread has exclusive access to the Map. No one sorts before you have finished removing the right element.
Of course, you would have to keep those locks somewhere, for example in a data object. Alternatively, you could use a Semaphore, use synchronized(map) in each thread, or formulate your work on the Map as Runnables and pass them to another thread that runs every Runnable it receives one by one. The possibilities are nearly endless. I personally would recommend starting with the lock.
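The "pass Runnables to another thread" option can be sketched with a single-thread executor from java.util.concurrent. The class, its methods and the counting workload are assumptions chosen only to show the shape of the idea: since one thread performs all map operations, a plain HashMap is sufficient.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical wrapper: every map operation is queued and run by one thread.
class SerialMapWorker {
    private final Map<String, Integer> counts = new HashMap<>();
    private final ExecutorService mapThread = Executors.newSingleThreadExecutor();

    void increment(String key) {
        mapThread.execute(() -> counts.merge(key, 1, Integer::sum));
    }

    // Reads are queued too, so they run after all previously submitted work.
    int get(String key) {
        Callable<Integer> read = () -> counts.getOrDefault(key, 0);
        try {
            return mapThread.submit(read).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    void shutdown() {
        mapThread.shutdown();
    }

    static int demo(int threads, int perThread) {
        SerialMapWorker worker = new SerialMapWorker();
        try {
            List<Thread> callers = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                Thread c = new Thread(() -> {
                    for (int i = 0; i < perThread; i++) {
                        worker.increment("x");
                    }
                });
                callers.add(c);
                c.start();
            }
            for (Thread c : callers) {
                try {
                    c.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
            }
            return worker.get("x");
        } finally {
            worker.shutdown();
        }
    }
}
```

The trade-off is that callers no longer work on the map directly; anything that needs a result has to wait for its queued task.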

How do I iterate over a list that could be added to by another thread

I have a list that I iterate over and perform some actions on. One of the actions that can be performed can result in work being passed off to another thread, which may add more elements to my list while I'm still iterating over it in the first thread.
Is there a way for me to iterate over the list in such a way that the iterator includes the additions from the other thread?
Here's some pseudo Java
Class1{
List multiThreadList = getMyList(); //points to list in Class2
for(Element element:multiThreadList)
//perform some actions, these actions may results in another thread being called
//which will cause addToMyList() to be called while I'm still iterating over it
//I want to keep iterating if an element gets added on the other thread
}
Class2{
List originalList = new ArrayList();
public getMyList()
return originalList;
void addToMyList(Element element)
originalList.add(element);
}
Are you sure List is the kind of collection you need?
I would use BlockingQueue, and remove elements from BlockingQueue in one Thread, and add in another. This way you would not need any additional concurrency control.
BlockingQueue<String> bounded = new LinkedBlockingQueue<String>();
bounded.put("Value");
String value = bounded.take();
Your pseudo code becomes
Class1{
BlockingQueue queue = getMyList();
Object element = queue.poll(0, TimeUnit.SECONDS);
while(element != null) {
//perform some actions, these actions may results in another thread being called
//which will cause addToMyList() to be called while I'm still iterating over it
//I want to keep iterating if an element gets added on the other thread
element = queue.poll(0, TimeUnit.SECONDS);
}
}
Class2{
BlockingQueue originalList = new LinkedBlockingQueue();
public BlockingQueue getMyList()
return originalList;
void addToMyList(Element element)
originalList.put(element);
}
One thing you need to understand, though, is that this task in its current form gives inconsistent results. Since you don't control the other thread, your loop might finish before a new element has been added, and you will miss it; depending on the state of the system, you can miss anywhere from zero to all of the new elements. So you either need to join all created threads before you finish iterating, or change the approach.
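One possible way to change the approach is to let the producing side say explicitly when it is finished, so the consumer can block on take() instead of guessing from an empty queue. The sketch below uses a sentinel object for that; the class name, the DONE marker and the String items are assumptions, not part of the question.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo: the producer signals completion with a sentinel object.
class QueueDrainSketch {
    private static final Object DONE = new Object(); // end-of-work marker

    static int demo(int items) {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            for (int i = 0; i < items; i++) {
                queue.add("item-" + i);
            }
            queue.add(DONE); // nothing more will follow
        });
        producer.start();

        int processed = 0;
        try {
            while (true) {
                Object element = queue.take(); // blocks until something arrives
                if (element == DONE) {
                    break;
                }
                processed++; // stands in for the real per-element work
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed;
    }
}
```

With several producers, each would need its own marker, or the consumer would need to count them; that bookkeeping is left out here.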
You would have to use some type of concurrency control mechanism, such as a Mutex lock, a semaphore, or a monitor. These create a 'critical section' of your code which would allow only one thread to access it at a time.
Unfortunately, there is no solution to the problem without somehow locking the code and making a portion of it serial.

Iterating over ConcurrentSkipListSet with different thread removing elements

I have a ConcurrentSkipListSet, and I'm iterating over values in this set with a for-each loop.
Another thread at some point is going to remove an element from this set.
I think I'm running into a situation where one thread removes an element that I'm yet to iterate over (or maybe I've just started to iterate over it) and so a call being made from within the loop fails.
Some code for clarity:
for(Foo foo : fooSet) {
//do stuff
//At this point in time, another thread removes this element from the set
//do some more stuff
callService(foo.getId()); // Fails
}
Reading the docs I can't work out if this is possible or not:
Iterators are weakly consistent, returning elements reflecting the state of the set at some point at or since the creation of the iterator. They do not throw ConcurrentModificationException, and may proceed concurrently with other operations.
So is this possible, and if so, what's a good way of handling this?
Thanks
Will
I think I'm running into a situation where one thread removes an element that I'm yet to iterate over (or maybe I've just started to iterate over it) and so a call being made from within the loop fails.
I don't think that's what the javadocs are saying:
Iterators are weakly consistent, returning elements reflecting the state of the set at some point at or since the creation of the iterator. They do not throw ConcurrentModificationException, and may proceed concurrently with other operations.
This is saying that you don't have to worry about someone removing from the ConcurrentSkipListSet at the same time that you are iterating across the list. There certainly is going to be a race condition as you are moving across the iterator however. Either foo gets removed right after your iterator gets it or it was removed right before and the iterator doesn't see it.
callService(foo.getId()); // this shouldn't "fail"
If foo gets returned by the iterator, your service call won't "fail" unless it is assuming that the foo is still in the list and somehow checking it. The worst case is that you might do some operations on foo and call the service with it even though it was just removed from the list by the other thread.
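The behaviour can be observed in a single thread, with a removal inside the loop standing in for the other thread. This is only a sketch with made-up names: it shows that the iteration does not throw, and that a contains() check can filter out elements that are already gone. In a truly concurrent setting that check narrows the race but cannot close it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical demo of weakly consistent iteration over a ConcurrentSkipListSet.
class WeakIterationSketch {
    static List<Integer> iterateWhileRemoving() {
        ConcurrentSkipListSet<Integer> set = new ConcurrentSkipListSet<>();
        for (int i = 1; i <= 5; i++) {
            set.add(i);
        }
        List<Integer> visited = new ArrayList<>();
        for (Integer value : set) { // no ConcurrentModificationException here
            if (value == 1) {
                set.remove(3); // stands in for the other thread's removal
            }
            // 3 never ends up in 'visited': either the iterator skips it
            // or this check rejects it.
            if (set.contains(value)) {
                visited.add(value);
            }
        }
        return visited;
    }
}
```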
I've hit this problem as well with queues that are written to and read by different threads. One approach is to mark instead of remove elements that are no longer needed. You can run a cleanup iterator after you go through the whole list. You need a global lock just for removing elements from the list, and the rest of the time your code can run in parallel. Schematically it works like this:
writer:
while (true) {
    set.add(something);
    // later, once the element is no longer needed
    something.markForDelete();
}
reader:
while (true) {
    // process async
    for (Foo something : set) {
        ... work, check something.isMarkedForDelete() ...
    }
    // delete, sync
    globalLock.lock();
    try {
        for (Iterator<Foo> iter = set.iterator(); iter.hasNext(); ) {
            if (iter.next().isMarkedForDelete()) {
                iter.remove();
            }
        }
    } finally {
        // release once, after the whole cleanup pass
        globalLock.unlock();
    }
}

Creating a ConcurrentHashMap that supports "snapshots"

I'm attempting to create a ConcurrentHashMap that supports "snapshots" in order to provide consistent iterators, and am wondering if there's a more efficient way to do this. The problem is that if two iterators are created at the same time then they need to read the same values, and the definition of the concurrent hash map's weakly consistent iterators does not guarantee this to be the case. I'd also like to avoid locks if possible: there are several thousand values in the map and processing each item takes several dozen milliseconds, and I don't want to have to block writers during this time as this could result in writers blocking for a minute or longer.
What I have so far:
The ConcurrentHashMap's keys are Strings, and its values are instances of ConcurrentSkipListMap<Long, T>
When an element is added to the hashmap with putIfAbsent, then a new skiplist is allocated, and the object is added via skipList.put(System.nanoTime(), t).
To query the map, I use map.get(key).lastEntry().getValue() to return the most recent value. To query a snapshot (e.g. with an iterator), I use map.get(key).lowerEntry(iteratorTimestamp).getValue(), where iteratorTimestamp is the result of System.nanoTime() called when the iterator was initialized.
If an object is deleted, I use map.get(key).put(timestamp, SnapShotMap.DELETED), where DELETED is a static final object.
Questions:
Is there a library that already implements this? Or barring that, is there a data structure that would be more appropriate than the ConcurrentHashMap and the ConcurrentSkipListMap? My keys are comparable, so maybe some sort of concurrent tree would better support snapshots than a concurrent hash table.
How do I prevent this thing from continually growing? I can delete all of the skip list entries with keys less than X (except for the last key in the map) after all iterators that were initialized on or before X have completed, but I don't know of a good way to determine when this has happened: I can flag that an iterator has completed when its hasNext method returns false, but not all iterators are necessarily going to run to completion; I can keep a WeakReference to an iterator so that I can detect when it's been garbage collected, but I can't think of a good way to detect this other than by using a thread that iterates through the collection of weak references and then sleeps for several minutes - ideally the thread would block on the WeakReference and be notified when the wrapped reference is GC'd, but I don't think this is an option.
ConcurrentSkipListMap<Long, WeakReference<Iterator>> iteratorMap;
while(true) {
long latestGC = 0;
for(Map.Entry<Long, WeakReference<Iterator>> entry : iteratorMap.entrySet()) {
if(entry.getValue().get() == null) {
iteratorMap.remove(entry.getKey());
latestGC = entry.getKey();
} else break;
}
// remove ConcurrentHashMap entries with timestamps less than `latestGC`
Thread.sleep(300000); // five minutes
}
Edit: To clear up some confusion in the answers and comments, I'm currently passing weakly consistent iterators to code written by another division in the company, and they have asked me to increase the strength of the iterators' consistency. They are already aware of the fact that it is infeasible for me to make 100% consistent iterators, they just want a best effort on my part. They care more about throughput than iterator consistency, so coarse-grained locks are not an option.
What is your actual use case that requires a special implementation? From the Javadoc of ConcurrentHashMap (emphasis added):
Retrievals reflect the results of the most recently completed update operations holding upon their onset. ... Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.
So the regular ConcurrentHashMap.values().iterator() will give you a "consistent" iterator, but only for one-time use by a single thread. If you need to use the same "snapshot" multiple times and/or by multiple threads, I suggest making a copy of the map.
EDIT: With the new information and the insistence for a "strongly consistent" iterator, I offer this solution. Please note that the use of a ReadWriteLock has the following implications:
Writes will be serialized (only one writer at a time) so write performance may be impacted.
Concurrent reads are allowed as long as there is no write in progress, so read performance impact should be minimal.
Active readers block writers but only as long as it takes to retrieve the reference to the current "snapshot". Once a thread has the snapshot, it no longer blocks writers no matter how long it takes to process the information in the snapshot.
Readers are blocked while any write is active; once the write finishes then all readers will have access to the new snapshot until a new write replaces it.
Consistency is achieved by serializing the writes and making a copy of the current values on each and every write. Readers that hold a reference to a "stale" snapshot can continue to use the old snapshot without worrying about modification, and the garbage collector will reclaim old snapshots as soon as no one is using it any more. It is assumed that there is no requirement for a reader to request a snapshot from an earlier point in time.
Because snapshots are potentially shared among multiple concurrent threads, the snapshots are read-only and cannot be modified. This restriction also applies to the remove() method of any Iterator instances created from the snapshot.
import java.util.*;
import java.util.concurrent.locks.*;
public class StackOverflow16600019 <K, V> {
private final ReadWriteLock locks = new ReentrantReadWriteLock();
private final HashMap<K,V> map = new HashMap<>();
private Collection<V> valueSnapshot = Collections.emptyList();
public V put(K key, V value) {
locks.writeLock().lock();
try {
V oldValue = map.put(key, value);
updateSnapshot();
return oldValue;
} finally {
locks.writeLock().unlock();
}
}
public V remove(K key) {
locks.writeLock().lock();
try {
V removed = map.remove(key);
updateSnapshot();
return removed;
} finally {
locks.writeLock().unlock();
}
}
public Collection<V> values() {
locks.readLock().lock();
try {
return valueSnapshot; // read-only!
} finally {
locks.readLock().unlock();
}
}
/** Callers MUST hold the WRITE LOCK. */
private void updateSnapshot() {
valueSnapshot = Collections.unmodifiableCollection(
new ArrayList<V>(map.values())); // copy
}
}
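If blocking writers is a concern, the same copy-on-write idea can also be sketched without a lock by swapping an immutable map inside an AtomicReference. This is a different technique from the ReadWriteLock class above, shown only as an assumption-laden variant: the class name is made up, and every write still copies the whole map, which may be too expensive for large maps or frequent writes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical lock-free variant: each write publishes a fresh immutable copy.
class CopyOnWriteMapSketch<K, V> {
    private final AtomicReference<Map<K, V>> current =
            new AtomicReference<>(Collections.<K, V>emptyMap());

    void put(K key, V value) {
        // The function may be retried under contention, so it has no side effects.
        current.updateAndGet(old -> {
            Map<K, V> copy = new HashMap<>(old);
            copy.put(key, value);
            return Collections.unmodifiableMap(copy);
        });
    }

    void remove(K key) {
        current.updateAndGet(old -> {
            Map<K, V> copy = new HashMap<>(old);
            copy.remove(key);
            return Collections.unmodifiableMap(copy);
        });
    }

    // The returned map never changes; later writes create new maps.
    Map<K, V> snapshot() {
        return current.get();
    }
}
```

Two iterators created from the same snapshot() result see exactly the same values, and old snapshots are garbage collected once no reader holds them.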
I've found that the ctrie is the ideal solution - it's a concurrent hash array mapped trie with constant time snapshots
Solution1) What about just synchronizing on the puts, and on the iteration. That should give you a consistent snapshot.
Solution2) Start iterating and make a boolean to say so, then override the puts, putAll so that they go into a queue, when the iteration is finished simply make those puts with the changed values.

Atomically perform multiple operations

I'm trying to find a way to perform multiple operations on a ConcurrentHashMap in an atomic manner.
My logic is like this:
if (!map.containsKey(key)) {
map.put(key, value);
doSomethingElse();
}
I know there is the putIfAbsent method. But if I use it, I still won't be able to call the doSomethingElse atomically.
Is there any way of doing such things apart from resorting to synchronization / client-side locking?
If it helps, the doSomethingElse in my case would be pretty complex, involving creating and starting a thread that looks for the key that we just added to the map.
If it helps, the doSomethingElse in my case would be pretty complex, involving creating and starting a thread that looks for the key that we just added to the map.
If that's the case, you would generally have to synchronize externally.
In some circumstances (depending on what doSomethingElse() expects the state of the map to be, and what the other threads might do to the map), the following may also work:
if (map.putIfAbsent(key, value) == null) {
doSomethingElse();
}
This will ensure that only one thread goes into doSomethingElse() for any given key.
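The "only one thread wins" property can be checked with a small race. In this sketch the class name is hypothetical and incrementing a counter stands in for doSomethingElse().

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: many threads race to insert the same key.
class PutIfAbsentRace {
    static int winners(int threads) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        AtomicInteger winners = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // line all threads up before racing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (map.putIfAbsent("key", "value-" + id) == null) {
                    winners.incrementAndGet(); // stands in for doSomethingElse()
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return winners.get();
    }
}
```

Note that the losing threads continue immediately; they do not wait for the winner to finish its follow-up work.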
This would work unless you want all putting threads to wait until the first successful thread puts in the map.
if(map.get(key) == null){
Object ret = map.putIfAbsent(key,value);
if(ret == null){ // I won the put
doSomethingElse();
}
}
Now if many threads are putting with the same key only one will win and only one will doSomethingElse().
If your design demands that the map access and the other operation be grouped without anybody else accessing the map, then you have no choice but to lock them. Perhaps the design can be revisited to avoid this need?
This also implies that all other accesses to the map must be serialized behind the same lock.
You might keep a lock per entry. That would allow concurrent non-locking updates, unless two threads try to access the same element.
class LockedReference<T> {
    final Lock lock = new ReentrantLock();
    T value;
    LockedReference(T value) { this.value = value; }
}
LockedReference<T> ref = new LockedReference<>(value);
ref.lock.lock(); // lock on the new reference, there is no contention here
try {
    if (map.putIfAbsent(key, ref) == null) {
        // we have locked on the key before inserting the element
        doSomethingElse();
    }
} finally {
    ref.lock.unlock();
}
later
T value;
while (true) {
    LockedReference<T> ref = map.get(key);
    if (ref != null) {
        ref.lock.lock();
        // there is no contention, unless a thread is already working on this entry
        try {
            if (map.containsKey(key)) {
                value = ref.value;
                break;
            } else {
                /* key was removed between get and lock; retry */
            }
        } finally {
            ref.lock.unlock();
        }
    } else {
        value = null;
        break; // no mapping: stop instead of spinning forever
    }
}
A fancier approach would be rewriting ConcurrentHashMap and have a version of putIfAbsent that accepts a Runnable (which is executed if the element was put). But that would be far far more complex.
Basically, ConcurrentHashMap (up to Java 7) implements locked segments, which sit in the middle between one lock per entry and one global lock for the whole map; since Java 8 it locks individual hash bins instead.
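Since Java 8, ConcurrentHashMap.computeIfAbsent comes close to the "putIfAbsent that accepts a Runnable" idea: its mapping function runs atomically and at most once per absent key. The sketch below (hypothetical class name, a counter standing in for the extra work) illustrates that. The Javadoc asks for the function to be short and forbids updating other mappings of the same map from inside it, so this is probably not a fit for a long-running doSomethingElse() such as starting a thread that reads the map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: the mapping function runs once even when many threads race.
class ComputeIfAbsentSketch {
    static int initializations(int threads) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        AtomicInteger initCount = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                map.computeIfAbsent("key", k -> {
                    initCount.incrementAndGet(); // short work tied to the insertion
                    return "value";
                });
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return initCount.get();
    }
}
```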
