Is an assignment inside ConcurrentHashMap.computeIfAbsent threadsafe? - java

Consider the following implementation of a fixed-size cache that allows lookup by an integer handle:
static class HandleCache {
private final AtomicInteger counter = new AtomicInteger();
private final Map<Data, Integer> handles = new ConcurrentHashMap<>();
private final Data[] array = new Data[100_000];
int getHandle(Data data) {
return handles.computeIfAbsent(data, k -> {
int i = counter.getAndIncrement();
if (i >= array.length) {
throw new IllegalStateException("array overflow");
}
array[i] = data;
return i;
});
}
Data getData(int handle) {
return array[handle];
}
}
There is an array store inside the compute function, which is not synchronized in any way. Would the Java Memory Model allow other threads to read a null value from this array later on?
PS: Would the outcome change if the id returned from getHandle were stored in a final field and only accessed through this field from other threads?

The read access isn't thread-safe. You could make it thread-safe indirectly, however that is likely to be brittle. I would implement it in a much simpler way and only optimise it later should it prove to be a performance problem, e.g. because you see it in a profiler during a realistic test.
static class HandleCache {
private final Map<Data, Integer> handles = new HashMap<>();
private final List<Data> dataByIndex = new ArrayList<>();
synchronized int getHandle(Data data) {
Integer id = handles.get(data);
if (id == null) {
id = handles.size();
handles.put(data, id);
dataByIndex.add(data);
}
return id;
}
synchronized Data getData(int handle) {
return dataByIndex.get(handle);
}
}

If you determine the index for the array read from the value of counter, then yes, you may get a null read.
The simplest example (there are others) is as follows:
T1 calls getHandle(data) and is suspended just after int i = counter.getAndIncrement();
T2 reads array[counter.get() - 1] (the slot T1 has just reserved) and gets null.
You should be able to easily verify this with a strategically placed sleep and two threads.
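A minimal sketch of that experiment, assuming a hypothetical harness rather than the original HandleCache: a CountDownLatch replaces the sleep so the interleaving is forced deterministically, String stands in for Data, and the calling thread plays T2.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class NullReadDemo {
    // Forces the T1/T2 interleaving described above and returns what T2 read.
    static String run() {
        AtomicInteger counter = new AtomicInteger();
        String[] array = new String[10];                 // stands in for Data[] array
        CountDownLatch reserved = new CountDownLatch(1); // T1 has taken an index
        CountDownLatch readDone = new CountDownLatch(1); // T2 has read the slot

        Thread t1 = new Thread(() -> {
            int i = counter.getAndIncrement();
            reserved.countDown();
            try {
                readDone.await(); // replaces the "strategically placed sleep"
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            array[i] = "data"; // the store from the compute function, now too late for T2
        });
        t1.start();
        try {
            reserved.await();
            // T2 (the calling thread) reads the slot that counter says is in use.
            String seen = array[counter.get() - 1];
            readDone.countDown();
            t1.join();
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints null
    }
}
```

The latch chain orders T2's read before T1's store, so the read cannot observe "data"; this reproduces the race described above, not a memory-model reordering.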

From the documentation of ConcurrentHashMap#computeIfAbsent:
The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.
The documentation's reference to blocking refers only to update operations on the Map, so if any other thread attempts to access array directly (rather than through an update operation on the Map), there can be race conditions and null can be read.
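The "applied at most once per key" part of that guarantee can be checked with a small sketch (hypothetical class and key names): several threads race on computeIfAbsent for the same key, and the side effect inside the function runs exactly once. Note that this says nothing about threads that read array directly without going through the map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AtMostOnceDemo {
    // Lets `threads` threads race on computeIfAbsent for one key and
    // returns how many times the mapping function actually ran.
    static int mappingCalls(int threads) {
        ConcurrentHashMap<String, Integer> handles = new ConcurrentHashMap<>();
        AtomicInteger calls = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all threads at roughly the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                handles.computeIfAbsent("same-key", k -> {
                    calls.incrementAndGet(); // side effect inside the function, like array[i] = data
                    return 0;
                });
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(mappingCalls(8)); // prints 1
    }
}
```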

Related

Is this the correct way to extract counts from a Concurrent Hash Map without missing some or double counting?

Working on something where I'm trying to count the number of times something is happening. Instead of spamming the database with millions of calls, I'm trying to sum the updates in-memory and then dumping the results into the database once per second (so like turning 10 +1s into a single +10)
I've noticed some strange inconsistency with the counts (like there should be exactly 1 million transactions but instead there are 1,000,016 or something).
I'm looking into other possible causes but I wanted to check that this is the correct way of doing things. The use case is that it needs to be eventually correct, so it's okay as long as the counts aren't double counted or dropped.
Here is my sample implementation.
public class Aggregator {
private Map<String, LongAdder> transactionsPerUser = new ConcurrentHashMap<>();
private StatisticsDAO statisticsDAO;
public Aggregator(StatisticsDAO statisticsDAO) {
this.statisticsDAO = statisticsDAO;
}
public void incrementCount(String userId) {
transactionsPerUser.computeIfAbsent(userId, k -> new LongAdder()).increment();
}
@Scheduled(every = "1s")
public void sendAggregatedStatisticsToDatabase() {
for (String userId : transactionsPerUser.keySet()) {
long count = transactionsPerUser.remove(userId).sum();
statisticsDAO.updateCount(userId, count);
}
}
}
You will have updates dropped in the following scenario:
Thread A calls incrementCount, and finds an already existing LongAdder instance for the given userId, this instance is returned from computeIfAbsent.
Thread B is at the same time handling a sendAggregatedStatisticsToDatabase call, which removes that LongAdder instance from the map.
Thread B calls sum() on the LongAdder instance.
Thread A, still executing that same incrementCount invocation, now calls increment() on the LongAdder instance.
This update is now dropped. It will not be seen by the next invocation of sendAggregatedStatisticsToDatabase, because the increment() call happened on an instance that was removed from the map in between the calls to computeIfAbsent() and increment() in the incrementCount method.
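The four steps can be replayed on a single thread to make the lost update visible. This is a sketch with a hypothetical user ID; each statement is labelled with the thread that would execute it in the scenario above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class LostUpdateReplay {
    // Replays the scenario above step by step and returns the total
    // that would ever reach the database for "alice".
    static long countedTotal() {
        Map<String, LongAdder> transactionsPerUser = new ConcurrentHashMap<>();
        transactionsPerUser.computeIfAbsent("alice", k -> new LongAdder()).increment(); // earlier increment

        // Step 1 (Thread A): computeIfAbsent returns the existing adder.
        LongAdder seenByA = transactionsPerUser.computeIfAbsent("alice", k -> new LongAdder());
        // Steps 2 and 3 (Thread B): remove the adder and sum it; this flush reports 1.
        long firstFlush = transactionsPerUser.remove("alice").sum();
        // Step 4 (Thread A): increment the adder that is no longer in the map.
        seenByA.increment();

        // The next flush only sees what is still in the map: nothing for "alice".
        long secondFlush = transactionsPerUser.containsKey("alice")
                ? transactionsPerUser.get("alice").sum()
                : 0;
        return firstFlush + secondFlush; // 1, although incrementCount ran twice
    }

    public static void main(String[] args) {
        System.out.println(countedTotal()); // prints 1
    }
}
```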
You might be better off reusing the LongAdder instances by doing something like this in sendAggregatedStatisticsToDatabase:
LongAdder longAdder = transactionsPerUser.get(userId);
long count = longAdder.sum();
longAdder.add(-count);
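A minimal sketch of that reuse approach, assuming the scheduled method is refactored into a hypothetical drainDeltas() that returns the deltas instead of calling StatisticsDAO. Because only the amount that was read is subtracted, an increment that lands between sum() and add(-count) stays in the adder for the next flush.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class ResettingAggregator {
    private final Map<String, LongAdder> transactionsPerUser = new ConcurrentHashMap<>();

    public void incrementCount(String userId) {
        transactionsPerUser.computeIfAbsent(userId, k -> new LongAdder()).increment();
    }

    // Stands in for sendAggregatedStatisticsToDatabase(); entries are never removed,
    // so an adder handed out by computeIfAbsent is always still in the map.
    public Map<String, Long> drainDeltas() {
        Map<String, Long> deltas = new HashMap<>();
        for (Map.Entry<String, LongAdder> e : transactionsPerUser.entrySet()) {
            LongAdder adder = e.getValue();
            long count = adder.sum();
            if (count != 0) {
                adder.add(-count); // subtract exactly what was reported
                deltas.put(e.getKey(), count);
            }
        }
        return deltas;
    }
}
```

The trade-off is that the map keeps one adder per user ID ever seen; whether that is acceptable depends on how many distinct users there are.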
I agree with the answer of @NorthernSky. My answer should be seen as an alternative solution to the problem, specifically addressing the comments on the accepted answer saying that a correct and performant solution would be more complex.
I would propose to use a producer/consumer pattern here, using an unbounded blocking queue. The producers call incrementCount() which just adds a userId to the queue.
The consumer is scheduled to run every second and reads the queue into a HashMap, and then pushes the map's data to the DAO.
public class Aggregator {
private final Queue<String> queue = new LinkedBlockingQueue<>();
private final StatisticsDAO statisticsDAO;
public Aggregator(StatisticsDAO statisticsDAO) {
this.statisticsDAO = statisticsDAO;
}
public void incrementCount(String userId) {
queue.add(userId);
}
@Scheduled(every = "1s")
public void sendAggregatedStatisticsToDatabase() {
int size = queue.size();
HashMap<String, LongAdder> counts = new HashMap<>();
for (int i = 0; i < size; i++) {
counts.computeIfAbsent(queue.remove(), k -> new LongAdder()).increment();
}
counts.forEach((userId, adder) -> statisticsDAO.updateCount(userId, adder.sum()));
}
}
Even better would be not to have a scheduled consumer, but one that keeps reading from the queue into a local HashMap until a timeout happens, a size threshold is reached, or even until the queue is empty.
Then it would process the current map, push it entirely into the DAO, clear the map and start reading the queue again until the next time there's enough data to process.
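One way to sketch that consumer, assuming a single consumer thread and a hypothetical nextBatch helper: BlockingQueue.poll(timeout, unit) waits for the first ID, then drainTo(collection, maxElements) takes whatever else is already queued without blocking, which implements the "until the queue is empty or a size threshold is reached" variant.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BatchingConsumer {
    // Waits up to timeoutMillis for the first user ID, then takes whatever else is
    // already queued (at most maxBatch IDs in total) and returns per-user counts.
    // An empty result means the wait timed out.
    static Map<String, Long> nextBatch(BlockingQueue<String> queue, int maxBatch, long timeoutMillis) {
        Map<String, Long> counts = new HashMap<>();
        String first;
        try {
            first = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // let the caller's loop notice and stop
            return counts;
        }
        if (first == null) {
            return counts;
        }
        List<String> batch = new ArrayList<>();
        batch.add(first);
        queue.drainTo(batch, maxBatch - 1); // non-blocking: stops when the queue is empty
        for (String userId : batch) {
            counts.merge(userId, 1L, Long::sum);
        }
        return counts;
    }
}
```

The consumer thread would call nextBatch in a loop and pass each non-empty result to statisticsDAO.updateCount, one call per user.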

Necessity of the locks while working with concurrent hash map

Here is the code in one of my classes:
class SomeClass {
private Map<Integer, Integer> map = new ConcurrentHashMap<>();
private volatile int counter = 0;
final AtomicInteger sum = new AtomicInteger(0); // will be used in other classes/threads too
private ReentrantLock l = new ReentrantLock();
public void put(String some) {
l.lock();
try {
int tmp = Integer.parseInt(some);
map.put(counter++, tmp);
sum.getAndAdd(tmp);
} finally {
l.unlock();
}
}
public Double get() {
l.lock();
try {
//... perform some map resizing operation ...
// some calculations including sum field ...
} finally {
l.unlock();
}
}
}
You can assume that this class will be used in concurrent environment.
The question is: do you think the locks are necessary? Does this code smell? :)
Let's look at the operations inside public void put(String some).
map.put(counter++, tmp);
sum.getAndAdd(tmp);
Now let's look at the individual parts.
counter is a volatile variable. So it only provides memory visibility but not atomicity. Since counter++ is a compound operation, you need a lock to achieve atomicity.
map.put(key, value) is atomic since it is a ConcurrentHashMap.
sum.getAndAdd(tmp) is atomic since it is an AtomicInteger.
As you can see, except counter++ every other operation is atomic. However, you are trying to achieve some function by combining all these operations. To achieve atomicity at the functionality level, you need a lock. This will help you to avoid surprising side effects when the threads interleave between the individual atomic operations.
So you need a lock because counter++ is not atomic and you want to combine a few atomic operations to achieve some functionality (assuming you want this to be atomic).
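The "counter++ is a compound operation" point can be observed with a small sketch (hypothetical class name): several threads increment a volatile int and an AtomicInteger the same number of times. The atomic counter always ends at the expected total; the volatile one can end lower because two threads may read the same value before either writes back, although a given run is not guaranteed to lose anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class VolatileIncrementDemo {
    static volatile int volatileCounter = 0;
    static final AtomicInteger atomicCounter = new AtomicInteger();

    // Each of `threads` threads increments both counters `perThread` times.
    // Returns {volatileCount, atomicCount}.
    static int[] run(int threads, int perThread) {
        volatileCounter = 0;
        atomicCounter.set(0);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    volatileCounter++;               // read, add, write: increments can be lost
                    atomicCounter.getAndIncrement(); // single atomic read-modify-write
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] {volatileCounter, atomicCounter.get()};
    }
}
```

Even with an atomic counter, the lock is still what makes the map put and the sum update appear as one step to get().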
Since you always increment counter when you use it as a key to put into this map:
map.put(counter++, tmp);
when you come to read it again:
return sum / map.get(counter);
map.get(counter) will be null, so this results in an NPE (unless you put more than 2^32 things into the map, of course, so that counter wraps around). (I'm assuming you mean sum.get(), otherwise it won't compile.)
As such, you can have equivalent functionality without any locks:
class SomeClass {
public void put(String some) { /* do nothing */ }
public Double get() {
throw new NullPointerException();
}
}
You've not really fixed the problem with your edit. divisor will still be null, so the equivalent functionality without locks would be:
class SomeClass {
private final AtomicInteger sum = new AtomicInteger(0);
public void put(String some) {
sum.getAndAdd(Integer.parseInt(some));
}
public Double get() {
return (double) sum.get();
}
}

Delegating thread-safety to ConcurrentMap and AtomicInteger

I need to provide thread-safe implementation of the following container:
public interface ParameterMetaData<ValueType> {
public String getName();
}
public interface Parameters {
public <M> M getValue(ParameterMetaData<M> pmd);
public <M> void put(ParameterMetaData<M> p, M value);
public int size();
}
The thing is, the size method should return the accurate number of parameters currently contained in a Parameters instance. So my first attempt was to delegate thread safety as follows:
public final class ConcurrentParameters implements Parameters{
private final ConcurrentMap<ParameterMetaData<?>, Object> parameters =
new ConcurrentHashMap<>();
//Should represent the ACCURATE size of the internal map
private final AtomicInteger size = new AtomicInteger();
@Override
public <M> M getValue(ParameterMetaData<M> pmd) {
@SuppressWarnings("unchecked")
M value = (M) parameters.get(pmd);
return value;
}
@Override
public <M> void put(ParameterMetaData<M> p, M value){
if(value == null)
return;
//The problem is in the code below
M previous = (M) parameters.putIfAbsent(p, value);
if(previous != null)
throw new IllegalArgumentException("Parameter already exists: " + p.getName());
size.incrementAndGet();
}
@Override
public int size() {
return size.intValue();
}
}
The problem is that I can't just call parameters.size() on the ConcurrentHashMap instance to return the actual size, because that operation performs a traversal without locking and there's no guarantee that it will return the actual size. That isn't acceptable in my case, so I decided to maintain a field containing the size.
QUESTION: Is it possible somehow to delegate thread safety and preserve the invariants?
The outcome you want to achieve is non-atomic: you want to modify the map and then get a count of elements that is consistent within the scope of a single thread. The only way to achieve that is to make this flow an "atomic operation" by synchronizing access to the map. This is the only way to ensure that the count will not change due to modifications made in another thread.
Synchronize modify-and-count access to the map via synchronized or a Semaphore, so that only a single thread can modify the map and count its elements at a time.
Using an additional field as a counter does not guarantee thread safety here: after a map modification and before the counter update, another thread can modify the map, and the counter value will no longer be valid.
This is also why ConcurrentHashMap cannot give you this guarantee on its own: its documentation notes that aggregate status methods such as size() are typically useful only when the map is not undergoing concurrent updates in other threads.
EDIT:
To be 100% clear, this is the most convenient way to achieve this:
synchronized(yourMap){
doSomethingWithTheMap();
yourMap.size();
}
So if you wrap every map operation in such a block, you guarantee that size() will return an accurate count of elements. The only condition is that all data manipulation is done inside such synchronized blocks.
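Applied to the question's interfaces, a minimal sketch might look like this. LockedParameters is a hypothetical name; synchronized methods lock this, which plays the role of synchronized(yourMap), and since every access holds that lock a plain HashMap is enough. Throwing on a duplicate follows the comment in the question's put.

```java
import java.util.HashMap;
import java.util.Map;

public class SynchronizedParameters {
    interface ParameterMetaData<ValueType> {
        String getName();
    }

    interface Parameters {
        <M> M getValue(ParameterMetaData<M> pmd);
        <M> void put(ParameterMetaData<M> p, M value);
        int size();
    }

    // Every method holds the same monitor (this), so size() always matches
    // the contents as of the last completed put.
    static final class LockedParameters implements Parameters {
        private final Map<ParameterMetaData<?>, Object> parameters = new HashMap<>();

        @Override
        public synchronized <M> M getValue(ParameterMetaData<M> pmd) {
            @SuppressWarnings("unchecked")
            M value = (M) parameters.get(pmd);
            return value;
        }

        @Override
        public synchronized <M> void put(ParameterMetaData<M> p, M value) {
            if (value == null) {
                return;
            }
            if (parameters.containsKey(p)) {
                throw new IllegalArgumentException("Parameter already exists: " + p.getName());
            }
            parameters.put(p, value);
        }

        @Override
        public synchronized int size() {
            return parameters.size(); // exact: no put can be half-done while we hold the lock
        }
    }
}
```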

Does a synchronized block trigger a full memory fence for arrays?

I am confused about sharing arrays safely between threads in Java, specifically memory fences and the keyword synchronized.
This Q&A is helpful, but does not answer all of my questions: Java arrays: synchronized + Atomic*, or synchronized suffices?
What follows is sample code to demonstrate the issue. Assume there is a pool of worker threads that populates the SharedTable via method addEntry(...). After all worker threads are done, a final thread reads and saves the data.
Sample code to demonstrate the issue:
public final class SharedTable {
// Column-oriented data entries
private final String[] data1Arr;
private final int[] data2Arr;
private final long[] data3Arr;
private final AtomicInteger nextIndex;
public SharedTable(int size) {
this.data1Arr = new String[size];
this.data2Arr = new int[size];
this.data3Arr = new long[size];
this.nextIndex = new AtomicInteger(0);
}
// Thread-safe: Called by worker threads
public void addEntry(String data1, int data2, long data3) {
final int index = nextIndex.getAndIncrement();
data1Arr[index] = data1;
data2Arr[index] = data2;
data3Arr[index] = data3;
}
// Not thread-safe: Called by clean-up/joiner/collator thread...
// after worker threads are complete
public void save() {
// Does this induce a full memory fence to ensure thread-safe reading of the arrays?
synchronized (this) {
final int usedSide = nextIndex.get();
for (int i = 0; i < usedSide; ++i) {
final String data1 = data1Arr[i];
final int data2 = data2Arr[i];
final long data3 = data3Arr[i];
// TODO: Save data here
}
}
}
}
The sample code above could also be implemented using Atomic*Array, which acts as an "array of volatile values/references".
public final class SharedTable2 {
// Column-oriented data entries
private final AtomicReferenceArray<String> data1Arr;
private final AtomicIntegerArray data2Arr;
private final AtomicLongArray data3Arr;
private final AtomicInteger nextIndex;
public SharedTable2(int size) { ... }
// Thread-safe: Called by worker threads
public void addEntry(String data1, int data2, long data3) {
final int index = nextIndex.getAndIncrement();
data1Arr.set(index, data1);
...
}
// Not thread-safe: Called by clean-up/joiner/collator thread...
// after worker threads are complete
public void save() {
final int usedSide = nextIndex.get();
for (int i = 0; i < usedSide; ++i) {
final String data1 = data1Arr.get(i);
final int data2 = data2Arr.get(i);
final long data3 = data3Arr.get(i);
// TODO: Save data here
}
}
}
Is SharedTable thread-safe (and cache coherent)?
Is SharedTable (much?) more efficient as only a single memory fence is required, whereas SharedTable2 invokes a memory fence for each call to Atomic*Array.set(...)?
If it helps, I am using Java 8 on 64-bit x86 hardware (Windows and Linux).
No, SharedTable is not thread-safe. A happens-before edge is only guaranteed if you read, from a synchronized block, something that was written from a synchronized block using the same lock.
Since the writes are made out of a synchronized block, the JMM doesn't guarantee that the writes will be visible by the reader thread.
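Following that rule, a minimal sketch (hypothetical class name, data3 column dropped for brevity) moves the writes into blocks guarded by the same lock as the read, which also lets a plain int replace the AtomicInteger. Note that in this small demo Thread.join() on the workers would by itself also create a happens-before edge, so the test only shows the pattern working, not that it is the only fix.

```java
import java.util.ArrayList;
import java.util.List;

public final class LockedSharedTable {
    // Column-oriented data entries; every access below holds the monitor of `this`.
    private final String[] data1Arr;
    private final int[] data2Arr;
    private int nextIndex; // guarded by `this`, so a plain int is enough

    public LockedSharedTable(int size) {
        this.data1Arr = new String[size];
        this.data2Arr = new int[size];
    }

    // Worker threads: the array writes happen while holding the lock the reader uses.
    public synchronized void addEntry(String data1, int data2) {
        int index = nextIndex++;
        data1Arr[index] = data1;
        data2Arr[index] = data2;
    }

    // Collator thread: stands in for save(); here it just sums one column.
    public synchronized long sumData2() {
        long total = 0;
        for (int i = 0; i < nextIndex; i++) {
            total += data2Arr[i];
        }
        return total;
    }

    // Starts `threads` workers that each add `perThread` entries, then sums.
    static long fillAndSum(int threads, int perThread) {
        LockedSharedTable table = new LockedSharedTable(threads * perThread);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    table.addEntry("row", 1);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return table.sumData2();
    }
}
```

The cost is one lock acquisition per addEntry call, compared with one volatile write per element in SharedTable2; which is cheaper in practice would need measuring.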
Mutating an object that is exchanged between threads can be done outside of a synchronized block.
Let's first introduce a very practical example. Imagine you have two threads: one produces jobs and the other consumes them. These threads communicate through a queue; let's assume a BlockingQueue. The producer thread can then use simple POJOs that have no internal synchronization and safely exchange them with the consumer thread. This is exactly how Java Executors work; in their documentation you will find a section on memory consistency effects.
Why does it work?
There needs to be a happens-before edge between writing of the fields of the job and reading the fields of the job.
class Job{int a;}
queue = new SomeBlockingQueue();
thread1:
job = new Job();
job.a=1; (1)
queue.put(job); (2)
thread2:
job=queue.take(); (3)
r1=job.a; (4)
There is a happens-before edge between (1) and (2) due to program order rule.
There is a happens-before edge between (2) and (3) due to either the monitor lock rule or volatile variable rule.
There is a happens-before edge between (3) and (4) due to the program order rule.
Because the happens-before relation is transitive, there is a happens-before edge between (1) and (4) and hence there is no data race.
So the above code will work fine. But if the producer modifies the Job after it has put it on the queue, then there could be a data race. So you need to make sure your code doesn't suffer from that problem.
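A runnable version of the pseudocode above, as a sketch that assumes ArrayBlockingQueue as the concrete BlockingQueue; the numbered comments match the steps (1) to (4).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueHandOff {
    static class Job {
        int a; // plain field: no volatile, no locks
    }

    static int handOff() {
        BlockingQueue<Job> queue = new ArrayBlockingQueue<>(1);
        int[] result = new int[1];
        Thread producer = new Thread(() -> {
            Job job = new Job();
            job.a = 1;              // (1)
            try {
                queue.put(job);     // (2)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // Modifying job.a here would reintroduce a data race.
        });
        Thread consumer = new Thread(() -> {
            try {
                Job job = queue.take(); // (3)
                result[0] = job.a;      // (4) guaranteed to see 1
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join(); // join also gives this thread a happens-before edge to result[0]
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }
}
```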

Synchronizing on two or more objects (Java)

I have code similar to following:
public class Cache{
private final Object lock = new Object();
private HashMap<Integer, TreeMap<Long, Integer>> cache =
new HashMap<Integer, TreeMap<Long, Integer>>();
private AtomicLong FREESPACE = new AtomicLong(102400);
private void putInCache(TreeMap<Long, Integer> tempMap, int fileNr){
int length; //holds the length of data in tempMap
synchronized(lock){
if(checkFreeSpace(length)){
cache.get(fileNr).putAll(tempMap);
FREESPACE.getAndAdd(-length);
}
}
}
private boolean checkFreeSpace(int length){
while(FREESPACE.get() < length && thereIsSomethingToDelete()){
// deleteSomething returns the length of deleted data or 0 if
// it could not delete anything
FREESPACE.getAndAdd(deleteSomething(length));
}
return FREESPACE.get() >= length;
}
}
putInCache is called by about 139 threads a second. Can I be sure that these two methods will synchronize on both cache and FREESPACE? Also, is checkFreeSpace() multithread-safe, i.e. can I be sure that there will be only one invocation of this method at a time? Can the "multithread-safety" of this code be improved?
To have your question answered fully, you would need to show the implementations of the thereIsSomethingToDelete() and deleteSomething() methods.
Given that checkFreeSpace is a public method (does it really need to be?), and is unsynchronized, it is possible it could be called by another thread while the synchronized block in the putInCache() method is running. This by itself might not break anything, since it appears that the checkFreeSpace method can only increase the amount of free space, not reduce it.
What would be more serious (and the code sample doesn't allow us to determine this) is if the thereIsSomethingToDelete() and deleteSomething() methods don't properly synchronize their access to the cache object, using the same Object lock as used by putInCache().
You don't usually synchronize on the fields you want to control access to directly.
The fields that you want to synchronize access to must only be accessed from within synchronized blocks (on the same object) to be considered thread safe. You are already doing this in putInCache().
Therefore, because checkFreeSpace() accesses shared state in an unsynchronized fashion, it is not thread safe.
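A minimal sketch of that advice, under stated assumptions: GuardedCache and makeRoom are hypothetical names, a LinkedHashMap's insertion order replaces the unseen thereIsSomethingToDelete()/deleteSomething(), each map entry is assumed to cost one unit of space, and the free-space counter becomes a plain long because it is only ever touched while holding lock.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class GuardedCache {
    private final Object lock = new Object();
    // Both fields are guarded by `lock` and are never touched outside it.
    // Insertion order of the LinkedHashMap is a hypothetical eviction order.
    private final Map<Integer, TreeMap<Long, Integer>> cache = new LinkedHashMap<>();
    private long freeSpace;

    public GuardedCache(long capacity) {
        this.freeSpace = capacity;
    }

    // Assumption: each entry of tempMap costs one unit of space.
    public boolean putInCache(TreeMap<Long, Integer> tempMap, int fileNr) {
        synchronized (lock) {
            if (!makeRoom(tempMap.size())) {
                return false;
            }
            TreeMap<Long, Integer> target = cache.computeIfAbsent(fileNr, k -> new TreeMap<>());
            int before = target.size();
            target.putAll(tempMap);
            freeSpace -= target.size() - before; // charge only for entries actually added
            return true;
        }
    }

    // Plays the role of checkFreeSpace(); callers must already hold `lock`.
    private boolean makeRoom(int length) {
        assert Thread.holdsLock(lock);
        Iterator<TreeMap<Long, Integer>> oldestFirst = cache.values().iterator();
        while (freeSpace < length && oldestFirst.hasNext()) {
            freeSpace += oldestFirst.next().size(); // evict a whole file, reclaim its entries
            oldestFirst.remove();
        }
        return freeSpace >= length;
    }

    public long freeSpace() {
        synchronized (lock) {
            return freeSpace;
        }
    }

    public boolean containsFile(int fileNr) {
        synchronized (lock) {
            return cache.containsKey(fileNr);
        }
    }
}
```

Because makeRoom is private and only called from inside the synchronized block, there can be only one invocation at a time. In this sketch a put that still does not fit leaves the evicted files evicted.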
