Is this operation on ConcurrentHashMap thread-safe? - java

private final ConcurrentHashMap<Float, VoteItem> datum = new ConcurrentHashMap<>();
public void vote(float graduation) {
datum.putIfAbsent(graduation, new VoteItem(graduation, new AtomicInteger(0)));
datum.get(graduation).getNum().incrementAndGet();
}
Is the vote method completely thread-safe? VoteItem.getNum() returns an AtomicInteger. Or is there a better way to achieve this?

If VoteItem#getNum() is thread-safe (e.g. it returns a final field) and no removals are performed in parallel threads, your code is also thread-safe: putIfAbsent() never overwrites an existing entry, so get() can never return an entry that has been overwritten.
But there is a more common way to achieve this, using the result of putIfAbsent(), which returns the existing value if one is already present for the given key:
public void vote(float graduation) {
VoteItem i = datum.putIfAbsent(graduation, new VoteItem(graduation, new AtomicInteger(1)));
if (i != null)
i.getNum().incrementAndGet();
}
This handles the possibility of concurrent removals as well. In your code, a concurrent removal can happen between putIfAbsent() and get(), causing a NullPointerException; here no such situation can occur.
Also consider using computeIfAbsent() instead of putIfAbsent() in order to avoid unnecessary VoteItem creations:
public void vote(float graduation) {
datum.computeIfAbsent(graduation, g -> new VoteItem(g, new AtomicInteger(0)))
.getNum()
.incrementAndGet();
}
Calling getNum() on the result is possible because, in contrast to putIfAbsent(), which returns null if no value existed prior to insertion, computeIfAbsent() returns the value it has just computed.
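As a rough way to check the computeIfAbsent() variant, here is a minimal, self-contained sketch. The class VoteCounter, its methods votesFor() and runDemo(), and the simplification of VoteItem to a bare AtomicInteger are assumptions made for illustration; they are not part of the question's code. It also assumes no entries are ever removed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the question's class; VoteItem is reduced to an AtomicInteger.
class VoteCounter {
    private final ConcurrentHashMap<Float, AtomicInteger> datum = new ConcurrentHashMap<>();

    void vote(float graduation) {
        // computeIfAbsent returns the existing or the newly created counter, so no null check is needed.
        datum.computeIfAbsent(graduation, g -> new AtomicInteger(0)).incrementAndGet();
    }

    int votesFor(float graduation) {
        AtomicInteger n = datum.get(graduation);
        return n == null ? 0 : n.get();
    }

    // Casts threads * votesPerThread votes for a single key and returns the final tally.
    static int runDemo(int threads, int votesPerThread) {
        VoteCounter counter = new VoteCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < votesPerThread; i++) {
                    counter.vote(4.5f);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("voters did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.votesFor(4.5f);
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 1000));
    }
}
```

Because every increment goes through the same AtomicInteger, the tally should equal threads * votesPerThread once all voters have finished.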

Related

Do parallel streams treat upstream iterators in a thread safe way?

Today I was using a stream that performed a parallel() operation after a map; however, the underlying source is an iterator that is not thread-safe, similar to the BufferedReader.lines implementation.
I originally thought that trySplit would be called on the thread that created the stream; however, I observed that accesses to the iterator came from multiple threads.
For example, the following silly iterator implementation is set up with just enough elements to cause splitting, and it also keeps track of the unique threads that accessed the hasNext method.
class SillyIterator implements Iterator<String> {
private final ArrayDeque<String> src =
IntStream.range(1, 10000)
.mapToObj(Integer::toString)
.collect(toCollection(ArrayDeque::new));
private Map<String, String> ts = new ConcurrentHashMap<>();
public Set<String> threads() { return ts.keySet(); }
private String nextRecord = null;
@Override
public boolean hasNext() {
var n = Thread.currentThread().getName();
ts.put(n, n);
if (nextRecord != null) {
return true;
} else {
nextRecord = src.poll();
return nextRecord != null;
}
}
@Override
public String next() {
if (nextRecord != null || hasNext()) {
var rec = nextRecord;
nextRecord = null;
return rec;
}
throw new NoSuchElementException();
}
}
Using this to create a stream as follows:
var iter = new SillyIterator();
StreamSupport
.stream(Spliterators.spliteratorUnknownSize(
iter, Spliterator.ORDERED | Spliterator.NONNULL
), false)
.map(n -> "value = " + n)
.parallel()
.collect(toList());
System.out.println(iter.threads());
On my system, this printed the two fork-join threads as well as the main thread, which kind of scared me.
[ForkJoinPool.commonPool-worker-1, ForkJoinPool.commonPool-worker-2, main]
Thread safety does not necessarily imply being accessed by only one thread. The important aspect is that there is no concurrent access, i.e. no access by more than one thread at the same time. If the access by different threads is temporally ordered and this ordering also ensures the necessary memory visibility, which is the responsibility of the caller, it still is a thread safe usage.
The Spliterator documentation says:
Despite their obvious utility in parallel algorithms, spliterators are not expected to be thread-safe; instead, implementations of parallel algorithms using spliterators should ensure that the spliterator is only used by one thread at a time. This is generally easy to attain via serial thread-confinement, which often is a natural consequence of typical parallel algorithms that work by recursive decomposition.
The spliterator doesn’t need to be confined to the same thread throughout its lifetime, but there should be a clear handover at the caller’s side ensuring that the old thread stops using it before the new thread starts using it.
But the important takeaway is, the spliterator doesn’t need to be thread safe, hence, the iterator wrapped by a spliterator also doesn’t need to be thread safe.
Note that a typical behavior is splitting and handing over before starting traversal, but since an ordinary Iterator doesn’t support splitting, the wrapping spliterator has to iterate and buffer elements to implement splitting. Therefore, the Iterator experiences traversal by different threads (but one at a time) when the traversal has not been started from the Stream implementation’s perspective.
That said, the lines() implementation of BufferedReader is a bad example which you should not follow. Since it’s centered around a single readLine() call, it would be natural to implement Spliterator directly instead of implementing a more complicated Iterator and have it wrapped via spliteratorUnknownSize(…).
Since your example is likewise centered around a single poll() call, it’s also straight-forward to implement Spliterator directly:
class SillySpliterator extends Spliterators.AbstractSpliterator<String> {
private final ArrayDeque<String> src = IntStream.range(1, 10000)
.mapToObj(Integer::toString).collect(toCollection(ArrayDeque::new));
SillySpliterator() {
super(Long.MAX_VALUE, ORDERED | NONNULL);
}
@Override
public boolean tryAdvance(Consumer<? super String> action) {
String nextRecord = src.poll();
if(nextRecord == null) return false;
action.accept(nextRecord);
return true;
}
}
Depending on your real life case, you may also pass the actual deque size to the constructor and provide the SIZED characteristic.
Then, you may use it like
var result = StreamSupport.stream(new SillySpliterator(), true)
.map(n -> "value = " + n)
.collect(toList());
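The SIZED variant mentioned above could look roughly like the following sketch. The class name SizedDequeSpliterator and the helper runDemo() are hypothetical, and the sketch assumes the deque is not modified by anyone else once the spliterator has been created, so that the reported size stays exact.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

// Hypothetical variant of SillySpliterator that reports the exact number of elements.
class SizedDequeSpliterator extends Spliterators.AbstractSpliterator<String> {
    private final ArrayDeque<String> src;

    SizedDequeSpliterator(ArrayDeque<String> src) {
        // The size is known up front, so SIZED can be reported in addition to ORDERED and NONNULL.
        super(src.size(), Spliterator.ORDERED | Spliterator.NONNULL | Spliterator.SIZED);
        this.src = src;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        String nextRecord = src.poll();
        if (nextRecord == null) {
            return false;
        }
        action.accept(nextRecord);
        return true;
    }

    // Builds a deque holding "1".."n" and maps it through a parallel stream.
    static List<String> runDemo(int n) {
        ArrayDeque<String> src = IntStream.rangeClosed(1, n)
                .mapToObj(Integer::toString)
                .collect(Collectors.toCollection(ArrayDeque::new));
        return StreamSupport.stream(new SizedDequeSpliterator(src), true)
                .map(s -> "value = " + s)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(runDemo(5000).size());
    }
}
```

Since the spliterator is ORDERED, the collected list keeps the encounter order of the deque even though the mapping runs in parallel.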

Check size and then perform operation - is it safe for ConcurrentLinkedDeque?

I need to replace the first value in a Deque with a new value, but only if the size would exceed the limit. I wrote this code to solve it:
final class Some {
final int buffer;
final Deque<Operation> operations = new ConcurrentLinkedDeque<>();
// constructors omitted
@Override
public void register(final Operation operation) {
if (this.operations.size() == this.buffer) {
// remove the oldest operation
this.operations.removeFirst();
}
// add new operation to the tail
this.operations.addLast(operation);
}
@Override
public void apply() {
// take the fresh operation from tail and perform it
this.operations.removeLast().perform();
}
}
As you can see, I have two methods that modify the Deque. I doubt that this code will work correctly in a multithreaded environment. The question is: is it safe to check size() and then perform operations that modify the ConcurrentLinkedDeque afterward? I want to have as few locks as possible. If this code won't work, I will have to introduce locking, and then there is no point in using ConcurrentLinkedDeque.
final class Some {
final int buffer;
final Deque<Operation> operations = new LinkedList<>();
final Lock lock = new ReentrantLock();
// constructors omitted
@Override
public void register(final Operation operation) {
this.lock.lock();
try {
if (this.operations.size() == this.buffer) {
// remove the oldest operation
this.operations.removeFirst();
}
// add new operation to the tail
this.operations.addLast(operation);
} finally {
lock.unlock();
}
}
@Override
public void apply() {
this.lock.lock();
try {
// take the fresh operation from tail and perform it
this.operations.removeLast().perform();
} finally {
this.lock.unlock();
}
}
}
This is the alternative with the Lock. Is that the only way to achieve what I want? I am especially interested in trying to use the concurrent collections.
Concurrent collections are thread-safe when it comes to internal state. In other words, they:
Allow multiple threads to read/write concurrently without having to worry that the internal state will become corrupted
Allow iteration and removal while other threads are modifying the collection
Not all, however. CopyOnWriteArrayList's Iterator does not support the remove() operation
Provide guarantees such as happens-before
Meaning a write by one thread will happen-before a read by a subsequent thread
However, they are not thread-safe across external method calls. When you call one method it will acquire whatever locks are necessary but those locks are released by the time the method returns. If you're not careful this can lead to a check-then-act race condition. Looking at your code
if (this.operations.size() == this.buffer) {
this.operations.removeFirst();
}
this.operations.addLast(operation);
the following can happen:
Thread-A checks size condition, result is false
Thread-A moves to add new Operation
Before Thread-A can add the Operation, Thread-B checks size condition which results in false as well
Thread-B goes to add new Operation
Thread-A does add new Operation
Oh, no! The Operation added by Thread-A causes the size threshold to be reached
Thread-B, already past the if statement, adds its Operation making the deque have one too many Operations
This is why a check-then-act requires external synchronization, which you do in your second example using a Lock. Note you could also use a synchronized block on the Deque.
Unrelated to your question: You call Operation.perform() in your second example while still holding the Lock. This means no other thread can attempt to add another Operation to the Deque while perform() executes. If this isn't desired you can change the code like so:
Operation op;
lock.lock();
try {
op = deque.pollLast(); // poll won't throw exception if there is no element
} finally {
lock.unlock();
}
if (op != null) {
op.perform();
}
From the doc of size()
Beware that, unlike in most collections, this method is NOT a constant-time operation. Because of the asynchronous nature of these deques, determining the current number of elements requires traversing them all to count them. Additionally, it is possible for the size to change during execution of this method, in which case the returned result will be inaccurate. Thus, this method is typically not very useful in concurrent applications.
While @Slaw is correct, I would add that an addition or removal can occur during the traversal.
I don't use size() in my software. I keep my own count of what is in the collection with an AtomicInteger. If count.get() < max, I can add. Being a little over max is ok for my usage. You can use a lock on count to force compliance.
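The counter idea can be sketched as follows. The class BoundedOps and all of its method names are hypothetical. The sketch follows the "a little over max is ok" approach: under contention the deque may briefly hold more than max elements, and the counter is only an approximation while operations are in flight.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical lock-free buffer that keeps its own element count instead of calling size().
class BoundedOps<T> {
    private final int max;
    private final ConcurrentLinkedDeque<T> deque = new ConcurrentLinkedDeque<>();
    private final AtomicInteger count = new AtomicInteger();

    BoundedOps(int max) {
        this.max = max;
    }

    void register(T op) {
        deque.addLast(op);
        if (count.incrementAndGet() > max) {
            // Over the limit: evict the oldest element, if one is still present.
            if (deque.pollFirst() != null) {
                count.decrementAndGet();
            }
        }
    }

    // Returns the most recently added element, or null if the buffer is empty.
    T takeNewest() {
        T op = deque.pollLast();
        if (op != null) {
            count.decrementAndGet();
        }
        return op;
    }

    // O(1), but only approximate while other threads are modifying the buffer.
    int approximateSize() {
        return count.get();
    }

    // Single-threaded walk-through: five elements into a buffer of three.
    static String runDemo() {
        BoundedOps<Integer> ops = new BoundedOps<>(3);
        for (int i = 1; i <= 5; i++) {
            ops.register(i);
        }
        int size = ops.approximateSize();
        Integer newest = ops.takeNewest();
        return size + ":" + newest;
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints 3:5
    }
}
```

If the limit must never be exceeded, even briefly, the check and the modification have to happen under one lock, as in the Lock-based version from the question.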

Delegating thread-safety to ConcurrentMap and AtomicInteger

I need to provide a thread-safe implementation of the following container:
public interface ParameterMetaData<ValueType> {
public String getName();
}
public interface Parameters {
public <M> M getValue(ParameterMetaData<M> pmd);
public <M> void put(ParameterMetaData<M> p, M value);
public int size();
}
The thing is that the size method should return the accurate number of parameters currently contained in a Parameters instance. So, my first attempt was to try delegating thread-safety as follows:
public final class ConcurrentParameters implements Parameters{
private final ConcurrentMap<ParameterMetaData<?>, Object> parameters =
new ConcurrentHashMap<>();
//Should represent the ACCURATE size of the internal map
private final AtomicInteger size = new AtomicInteger();
@Override
public <M> M getValue(ParameterMetaData<M> pmd) {
@SuppressWarnings("unchecked")
M value = (M) parameters.get(pmd);
return value;
}
@Override
public <M> void put(ParameterMetaData<M> p, M value){
if(value == null)
return;
//The problem is in the code below
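M previous = (M) parameters.putIfAbsent(p, value);
if(previous != null)
//throw an exception indicating that the parameter already exists (exception type here is illustrative)
throw new IllegalStateException("Parameter already exists: " + p.getName());
size.incrementAndGet();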
}
@Override
public int size() {
return size.intValue();
}
}
The problem is that I can't just call parameters.size() on the ConcurrentHashMap instance to return the actual size, as that operation performs a traversal without locking and there's no guarantee that it will retrieve the actual size. That isn't acceptable in my case. So, I decided to maintain a field containing the size.
QUESTION: Is it possible somehow to delegate thread safety and preserve the invariants?
The outcome you want to achieve is non-atomic. You want to modify the map and then get a count of elements that is consistent within the scope of a single thread. The only way to achieve that is to make this flow an atomic operation by synchronizing access to the map. This is the only way to ensure that the count will not change due to modifications made in another thread.
Synchronize modify-and-count access to the map via synchronized or a Semaphore so that only a single thread can modify the map and count its elements at a time.
Using an additional field as a counter does not guarantee thread safety here: after the map modification and before the counter update, another thread can modify the map, and the counter value will no longer be valid.
This is the reason why the map does not keep its size internally but has to traverse its elements: to give the most accurate result at a given point in time.
EDIT:
To be 100% clear, this is the most convenient way to achieve this:
synchronized(yourMap){
doSomethingWithTheMap();
yourMap.size();
}
So if you change every map operation to use such a block, you guarantee that size() will return an accurate count of elements. The only condition is that all data manipulations are done inside such a synchronized block.
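Applied to a container like the one in the question, the synchronized approach might look like this minimal sketch. The class SynchronizedParameters, the method putAndCount(), and the use of String keys instead of ParameterMetaData are simplifications assumed for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplified container: every access goes through the same monitor.
class SynchronizedParameters {
    private final Map<String, Object> parameters = new HashMap<>();

    // Inserts the value and returns the size observed atomically with the insertion.
    int putAndCount(String name, Object value) {
        synchronized (parameters) {
            if (parameters.putIfAbsent(name, value) != null) {
                throw new IllegalStateException("Parameter already exists: " + name);
            }
            return parameters.size();
        }
    }

    Object getValue(String name) {
        synchronized (parameters) {
            return parameters.get(name);
        }
    }

    int size() {
        synchronized (parameters) {
            return parameters.size();
        }
    }

    public static void main(String[] args) {
        SynchronizedParameters p = new SynchronizedParameters();
        System.out.println(p.putAndCount("timeout", 30)); // prints 1
        System.out.println(p.putAndCount("retries", 3));  // prints 2
    }
}
```

Because all access is already serialized by the monitor, a plain HashMap is sufficient here; a ConcurrentHashMap would add nothing.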

java.util.concurrent: external synchronize to remove map value

I need to keep track of multiple values against unique keys i.e. 1(a,b) 2(c,d) etc...
The solution is accessed by multiple threads so effectively I have the following defined;
ConcurrentSkipListMap<key, ConcurrentSkipListSet<values>>
My question is: does the removal of the key when the value set size is 0 need to be synchronized? I know that the two classes are "concurrent" and I've looked through the OpenJDK source code, but there would appear to be a window between one thread T1 checking that the Set is empty and removing the Map entry in remove(...) and another thread T2 calling add(...). The result is that T1 removes the last Set entry and removes the Map entry, interleaved with T2 just adding a Set entry. Thus the Map entry and T2's Set entry are removed by T1 and data is lost.
Do I just "synchronize" the add() and remove() methods or is there a "better" way?
The Map is modified by multiple threads but only through two methods.
Code snippet as follows;
protected static class EndpointSet extends U4ConcurrentSkipListSet<U4Endpoint> {
private static final long serialVersionUID = 1L;
public EndpointSet() {
super();
}
}
protected static class IDToEndpoint extends U4ConcurrentSkipListMap<String, EndpointSet> {
private static final long serialVersionUID = 1L;
protected Boolean add(String id, U4Endpoint endpoint) {
EndpointSet endpoints = get(id);
if (endpoints == null) {
endpoints = new EndpointSet();
put(id, endpoints);
}
endpoints.add(endpoint);
return true;
}
protected Boolean remove(String id, U4Endpoint endpoint) {
EndpointSet endpoints = get(id);
if (endpoints == null) {
return false;
} else {
endpoints.remove(endpoint);
if (endpoints.size() == 0) {
remove(id);
}
return true;
}
}
}
As it is your code has data races. Examples of what could happen:
a thread could add between if (endpoints.size() == 0) and remove(id); - you saw that
in add, a thread could read a non null value in EndpointSet endpoints = get(id); and another thread could remove data from that set, remove the set from the map because the set is empty. The initial thread would then add a value to the set, which is not held in the map any longer => data gets lost too as it becomes unreachable.
The easiest way to solve your issue is to make both add and remove synchronized. But you then lose all the performance benefits of using a ConcurrentMap.
Alternatively, you could simply leave the empty sets in the map - unless you have memory constraints. You would still need some form of synchronization but it would be easier to optimise.
If contention (performance) is an issue, you could try a more fine grained locking strategy by synchronizing on the keys or values but it could be quite tricky (and locking on Strings is not such a good idea because of String pooling).
It seems that in all cases, you could use a non concurrent set as you will need to synchronize it externally yourself.
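The simplest option above (synchronized add and remove over non-concurrent collections) could be sketched like this. The class IdToEndpoints, the containsId() helper, and the use of String in place of U4Endpoint are assumptions made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical replacement for IDToEndpoint: one monitor guards the map and all of its sets.
class IdToEndpoints {
    private final Map<String, Set<String>> byId = new HashMap<>();

    synchronized boolean add(String id, String endpoint) {
        // Look-up, creation of the set and insertion happen as one atomic step.
        return byId.computeIfAbsent(id, k -> new HashSet<>()).add(endpoint);
    }

    synchronized boolean remove(String id, String endpoint) {
        Set<String> endpoints = byId.get(id);
        if (endpoints == null) {
            return false;
        }
        boolean removed = endpoints.remove(endpoint);
        // No other thread can add to this set between the emptiness check and the removal.
        if (endpoints.isEmpty()) {
            byId.remove(id);
        }
        return removed;
    }

    synchronized boolean containsId(String id) {
        return byId.containsKey(id);
    }

    public static void main(String[] args) {
        IdToEndpoints m = new IdToEndpoints();
        m.add("1", "a");
        m.remove("1", "a");
        System.out.println(m.containsId("1")); // prints false
    }
}
```

Note that this sketch reports whether the endpoint was actually added or removed, whereas the original methods return true in more cases.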

Correct HashMap Synchronization

Let's say I have a HashMap declared as follows:
@GuardedBy("pendingRequests")
private final Map<UInt32, PendingRequest> pendingRequests = new HashMap<UInt32, PendingRequest>();
Access to the map is multi-threaded, and all access is guarded by synchronizing on this final instance of the map, e.g.:
synchronized (pendingRequests) {
pendingRequests.put(reqId, request);
}
Is this enough? Should the map be created using Collections.synchronizedMap()? Should I be locking on a dedicated lock object instead of the map instance? Or maybe both?
External synchronization (in addition to possibly using Collections.synchronizedMap()) is needed in a couple areas where multiple calls on the map must be atomic.
Synchronizing on the map itself is essentially what the Map returned by Collections.synchronizedMap() would do. For your situation it is a reasonable approach, and there is not much to recommend using a separate lock object other than personal preference (or if you wish to have more fine-grained control and use a ReentrantReadWriteLock to allow concurrent reading of the map).
E.g.
private Map<Integer,Object> myMap;
private ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
public void myReadMethod()
{
rwl.readLock().lock();
try
{
myMap.get(...);
...
} finally
{
rwl.readLock().unlock();
}
}
public void myWriteMethod()
{
// may want / need to call rwl.readLock().unlock() here,
// since if you are holding the readLock here already then
// you cannot get the writeLock (so be careful on how your
// methods lock/unlock and call each other).
rwl.writeLock().lock();
try
{
myMap.put(key1,item1);
myMap.put(key2,item2);
} finally
{
rwl.writeLock().unlock();
}
}
All calls to the map need to be synchronized, and Collections.synchronizedMap() gives you that.
However, there is also an aspect of compound logic. If you need the integrity of the compound logic, synchronization of individual calls is not enough. For example, consider the following code:
Object value = yourMap.get(key); // synchronized
if (value == null) {
// do more action
yourMap.put(key, newValue); // synchronized
}
Although individual calls (get() and put()) are synchronized, your logic will not be safe against concurrent access.
Another interesting case is iteration. For an iteration to be safe, you need to synchronize for its entire duration, or you may get a ConcurrentModificationException.
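Both cases can be handled by holding the map's monitor around the whole compound step. The following is a minimal sketch; the class PendingRegistry and its methods are hypothetical, and Integer/String stand in for the question's UInt32/PendingRequest types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry showing compound operations on a synchronized map.
class PendingRegistry {
    private final Map<Integer, String> pending = Collections.synchronizedMap(new HashMap<>());

    // Check-then-act made atomic: get() and put() run under one lock acquisition.
    String putIfMissing(int id, String request) {
        synchronized (pending) { // the synchronized wrapper locks on itself, so this is the same monitor
            String existing = pending.get(id);
            if (existing == null) {
                pending.put(id, request);
                return request;
            }
            return existing;
        }
    }

    // Iteration holds the lock for its entire duration and returns a private copy.
    List<Integer> snapshotIds() {
        synchronized (pending) {
            List<Integer> ids = new ArrayList<>();
            for (Integer id : pending.keySet()) {
                ids.add(id);
            }
            return ids;
        }
    }

    public static void main(String[] args) {
        PendingRegistry r = new PendingRegistry();
        System.out.println(r.putIfMissing(1, "first"));  // prints first
        System.out.println(r.putIfMissing(1, "second")); // prints first
    }
}
```

Returning a copy from snapshotIds() keeps callers from iterating over the live key set outside the lock.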
