Is incrementing of current value while putting thread-safe in ConcurrentHashMap?


I wonder what happens when I modify current value while putting it into ConcurrentHashMap.
I have a ConcurrentHashMap (ConcurrentHashMap<String, Integer> attendance) with an existing mapping of conference hall names to the number of visitors to each.
Every invocation of the visit(conferenceHallName) method is supposed to increment the number of visitors to the given conference hall. Every invoker is a thread.
So here is the method:
public void visit(String conferenceHallName) {
attendance.put(conferenceHallName, attendance.get(conferenceHallName) + 1);
}
put() is a locking method; get() is not. But what happens first in this case:
the thread locks the segment with this mapping, then calculates the new value, then puts it and releases the lock, which would be perfect for me
or
the thread gets the old value, calculates the new one, and only then locks the segment and puts it, which means inconsistency for me, and I will need to find a workaround.
And if the second scenario is what happens in reality, will using AtomicInteger instead of Integer solve my issue?

The second description is closer to what actually happens: the thread reads the old value in a thread-safe way, constructs a new Integer with the updated count, and only then locks the map and replaces the object. Two threads can therefore read the same old value, and one of the increments is lost.
Using AtomicInteger instead of Integer will solve the issue:
attendance.get(conferenceHallName).getAndIncrement();
This is assuming that all conferenceHallName keys are properly set in the map.
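As a minimal sketch of the two safe variants (the class name AttendanceDemo, its method names, and the hall name are hypothetical, chosen only for illustration), the following runs several threads against one key. The first variant stores AtomicInteger values and uses computeIfAbsent so that a missing key is created on demand; the second keeps plain Integer values and relies on ConcurrentHashMap.merge, which performs the read-modify-write atomically for the key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; only ConcurrentHashMap and AtomicInteger come from the JDK.
class AttendanceDemo {
    private final ConcurrentHashMap<String, AtomicInteger> attendance = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Integer> boxedAttendance = new ConcurrentHashMap<>();

    // Variant 1: thread-safe lookup (creating the counter if needed), then an atomic increment.
    void visit(String hall) {
        attendance.computeIfAbsent(hall, k -> new AtomicInteger()).incrementAndGet();
    }

    // Variant 2: merge reads, adds and stores the new Integer as one atomic step.
    void visitBoxed(String hall) {
        boxedAttendance.merge(hall, 1, Integer::sum);
    }

    int visitors(String hall) {
        AtomicInteger count = attendance.get(hall);
        return count == null ? 0 : count.get();
    }

    int boxedVisitors(String hall) {
        return boxedAttendance.getOrDefault(hall, 0);
    }

    // Starts `threads` threads, each visiting the same hall `perThread` times, and waits for them.
    static AttendanceDemo run(int threads, int perThread) {
        AttendanceDemo demo = new AttendanceDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    demo.visit("Hall A");
                    demo.visitBoxed("Hall A");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return demo;
    }
}
```

With either variant, AttendanceDemo.run(4, 1000) should leave both counters for "Hall A" at 4000, whereas the put(get() + 1) version from the question may lose updates.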

Related

Concurrent frequency counter - concurrency issue

I would like to create a concurrent frequency counter class in Java.
The idea is that once a request is processed (by the processRequest method), the code checks the request's type (an integer) and counts how many requests of each type have been processed since a given time. The processRequest method will be called by multiple threads at the same time.
There are two other methods:
clearMap(): It will be called by one thread every 3 hours and clears the whole map.
getMap(): It can be called at any time by a web service and returns an immutable copy of the current state of the frequency map.
See below my initial plan to implement that.
public class FrequencyCounter {
private final ConcurrentHashMap<Integer,Long> frequencenyMap = new ConcurrentHashMap<>();
public void processRequest(Request request){
frequencenyMap.merge(request.type, 0L, (v, d) -> v+1);
}
public void clearMap(){
frequencenyMap.clear();
}
public Map<Integer,Long> getMap(){
return ImmutableMap.copyOf(frequencenyMap);
}
}
I checked the documentation of ConcurrentHashMap, and it says that the merge method is performed atomically.
So once the clear() method starts to clear the hash buckets of the map (locking per hash bucket), it can't touch a bucket while another thread is between reading the current count and writing the incremented one in the processRequest method, because the merge method is executed atomically.
Am I right?
Does my above plan seem to be fine?
Thank you for your advice.

First, replace Long with AtomicLong.
Second, use computeIfAbsent.
private final Map<Integer, AtomicLong> frequencyMap = new ConcurrentHashMap<>();
public void processRequest(Request request){
frequencyMap.computeIfAbsent(request.type, k -> new AtomicLong())
.incrementAndGet();
}
There are a few reasons why I believe this is a better solution:
The code in the question uses boxed objects, i.e. (v, d) -> v+1 is really (Long v, Long d) -> Long.valueOf(v.longValue() + 1).
That code generates extra garbage, which can be avoided by using AtomicLong.
The code here only allocates one object per key, and doesn't require any extra allocations to increment the counter, e.g. it will still only be the one object even if counter goes to the millions.
The unboxing, adding 1, boxing operation will likely take slightly longer than the tightly coded incrementAndGet() operation, so the map's internal lock is held a little longer inside the merge method, increasing the likelihood of contention between threads.
Code "purity". Using a method that takes a "value", which is then entirely ignored, seems wrong to me. It is unnecessary code noise.
These are of course my opinions. You can make your own decision, but I think this code clarifies the purpose, i.e. to increment a long counter, in a fully thread-safe way.
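To make the comparison concrete, here is a hedged sketch of the whole counter in the AtomicLong style. The class name FrequencyCounterSketch and the plain int type parameter are assumptions (the original Request class is not shown), and the snapshot uses only JDK collections rather than Guava. The two static helpers illustrate a detail of the code in the question: merge(key, 0L, (v, d) -> v + 1) stores 0 on the first call for a key, so it counts one request too few, while merge(key, 1L, Long::sum) counts every call.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only; the class name and the int "type" parameter are assumptions.
class FrequencyCounterSketch {
    private final ConcurrentHashMap<Integer, AtomicLong> frequencyMap = new ConcurrentHashMap<>();

    void processRequest(int type) {
        frequencyMap.computeIfAbsent(type, k -> new AtomicLong()).incrementAndGet();
    }

    void clearMap() {
        frequencyMap.clear();
    }

    // Copies the current counts into an unmodifiable map of plain Long values.
    Map<Integer, Long> getMap() {
        Map<Integer, Long> copy = new HashMap<>();
        frequencyMap.forEach((type, count) -> copy.put(type, count.get()));
        return Collections.unmodifiableMap(copy);
    }

    // merge with a 0L seed: the first call inserts 0, later calls add 1 (requires calls >= 1).
    static long mergeWithZeroSeed(int calls) {
        ConcurrentHashMap<Integer, Long> map = new ConcurrentHashMap<>();
        for (int i = 0; i < calls; i++) {
            map.merge(1, 0L, (v, d) -> v + 1);
        }
        return map.get(1);
    }

    // merge with a 1L seed: the first call inserts 1, later calls add 1 (requires calls >= 1).
    static long mergeWithOneSeed(int calls) {
        ConcurrentHashMap<Integer, Long> map = new ConcurrentHashMap<>();
        for (int i = 0; i < calls; i++) {
            map.merge(1, 1L, Long::sum);
        }
        return map.get(1);
    }
}
```

One trade-off to keep in mind with the AtomicLong version: an increment that races with clearMap() can land on a counter object that has just been removed from the map, so that single count is dropped. For a reset every few hours this is probably acceptable, but the merge-based version does not have this window.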

Adding to AtomicInteger within ConcurrentHashMap

I have the following defined
private ConcurrentMap<Integer, AtomicInteger> staffValues = new ConcurrentHashMap<Integer, AtomicInteger>();
private void add() {
staffValues.replace(100, staffValues.get(100), new AtomicInteger(staffValues.get(100).addAndGet(200)));
}
After testing, the values I am getting are not expected, and I think there is a race condition here. Does anyone know if this would be considered threadsafe by wrapping the get call in the replace function?

A good way to handle situations like this is to use the computeIfAbsent method (not the compute method that @the8472 recommends).
computeIfAbsent accepts 2 arguments: the key, and a Function<K, V> that will only be called if there is no existing value. Since an AtomicInteger can safely be incremented from multiple threads, you can use it easily in the following manner:
staffValues.computeIfAbsent(100, k -> new AtomicInteger(0)).addAndGet(200);

There are a few issues with your code. The biggest is that you're ignoring the return-value of ConcurrentHashMap.replace: if the replacement doesn't happen (due to another thread having made a replacement in parallel), you simply proceed as if it happened. This is the main reason you're getting wrong results.
I also think it's a design mistake to mutate an AtomicInteger and then immediately replace it with a different AtomicInteger; even if you can get this working, there's simply no reason for it.
Lastly, I don't think you should call staffValues.get(100) twice. I don't think that causes a bug in the current code — your correctness depends only on the second call returning a "newer" result than the first, which I think is actually guaranteed by ConcurrentHashMap — but it's fragile and subtle and confusing. In general, when you call ConcurrentHashMap.replace, its third argument should be something you computed using the second.
Overall, you can simplify your code either by not using AtomicInteger:
private ConcurrentMap<Integer, Integer> staffValues = new ConcurrentHashMap<>();
private void add() {
final Integer prevValue = staffValues.get(100);
staffValues.replace(100, prevValue, prevValue + 200);
}
or by not using replace (and perhaps not even ConcurrentMap, depending how else you're touching this map):
private Map<Integer, AtomicInteger> staffValues = new HashMap<>();
private void add() {
staffValues.get(100).addAndGet(200);
}
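If the Integer-based variant with replace is used from several threads, the boolean result of replace still has to be checked, as noted at the top of this answer. A minimal sketch of a retry loop follows; the class name ReplaceRetrySketch, the constructor that seeds key 100 with 0, and the run helper are assumptions added for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical demo class showing a compare-and-replace retry loop.
class ReplaceRetrySketch {
    private final ConcurrentMap<Integer, Integer> staffValues = new ConcurrentHashMap<>();

    ReplaceRetrySketch() {
        staffValues.put(100, 0); // assumption: the key is present before add() is called
    }

    void add() {
        Integer prevValue;
        do {
            prevValue = staffValues.get(100);
            // replace returns false if another thread changed the value in between; then retry.
        } while (!staffValues.replace(100, prevValue, prevValue + 200));
    }

    int value() {
        return staffValues.get(100);
    }

    // Starts `threads` threads, each calling add() `perThread` times, and returns the final value.
    static int run(int threads, int perThread) {
        ReplaceRetrySketch sketch = new ReplaceRetrySketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    sketch.add();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sketch.value();
    }
}
```

The third argument is computed from the second, so a failed replace simply means the loop reads the newer value and tries again.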

You don't need to use replace(). AtomicInteger is a mutable value that does not need to be substituted whenever you want to increment it. In fact addAndGet already increments it in place.
Instead use compute to put a default value (presumably 0) into the map when none is present and otherwise get the pre-existing value and increment that.
If, on the other hand, you want to use immutable values put Integer instances instead of AtomicInteger into the map and update them with the atomic compute/replace/merge operations.
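Both options from this answer can be sketched in a few lines. The class and method names below are hypothetical; compute and merge are the standard ConcurrentMap methods.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class contrasting mutable and immutable map values.
class ComputeSketch {
    private final ConcurrentMap<Integer, AtomicInteger> mutableValues = new ConcurrentHashMap<>();
    private final ConcurrentMap<Integer, Integer> immutableValues = new ConcurrentHashMap<>();

    // Mutable values: compute installs a default of 0 if the key is missing,
    // otherwise keeps the existing AtomicInteger, which is then incremented in place.
    int addMutable(int key, int delta) {
        AtomicInteger counter =
                mutableValues.compute(key, (k, v) -> v == null ? new AtomicInteger(0) : v);
        return counter.addAndGet(delta);
    }

    // Immutable values: merge atomically replaces the old Integer with old + delta.
    int addImmutable(int key, int delta) {
        return immutableValues.merge(key, delta, Integer::sum);
    }
}
```

In neither case is replace() needed, and no AtomicInteger is ever swapped for another one.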

Why or when would a Map.get(..) need synchronization?

This is a code snippet from collections's SynchronizedMap. My question is not specific to the code snippet below - but a generic one: Why does a get operation need synchronization?
public V get(Object key) {
synchronized (mutex) {return m.get(key);}
}

If your threads are only ever getting from the Map (and the map was fully built before it was shared with them), the synchronization is not needed. In this case it might be a good idea to express this fact by using an immutable map, like the one from the Guava libraries; this protects you from accidentally modifying the map anyway, because its mutator methods throw UnsupportedOperationException.
The trouble begins when multiple threads are reading and modifying the map, because the internal structure of, e.g., the HashMap implementation from the Java standard libraries is not prepared for that. In this case you can either wrap an external synchronization layer around that map or replace it:
use the synchronized keyword,
slightly safer: use a SynchronizedMap (Collections.synchronizedMap), because then you can't forget the synchronized keyword everywhere it's needed,
protect the map using a ReadWriteLock, which allows multiple concurrently reading threads (which is fine),
switch to a ConcurrentHashMap altogether, which is designed to be accessed by multiple threads.
But coming back to your original question, why is the synchronization needed in the first place: this is a bit hard to tell without looking at the code of the class. Possibly it would break when a put or remove from one thread causes the bucket count to change, which would cause a reading thread to see too many or too few elements because the resize is not finished yet. Maybe it is something completely different; I don't know, and it's not really important, because the exact reason(s) why it is unsafe can change at any time with a new Java release. The important fact is only that it is not supported and your code will likely blow up one way or another at runtime.
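Of the options listed above, the ReadWriteLock one is the least obvious to write down, so here is a minimal sketch. The wrapper class ReadWriteLockedMap is hypothetical and exposes only get and put; ReentrantReadWriteLock is the JDK implementation of ReadWriteLock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper: many readers may hold the read lock at once,
// while a writer gets exclusive access through the write lock.
class ReadWriteLockedMap<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    V get(K key) {
        lock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    V put(K key, V value) {
        lock.writeLock().lock();
        try {
            return map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Note that get still takes a lock here: the read lock is what guarantees that no resize is in progress while the lookup runs.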

If the table gets resized in the middle of the call to get(), it could potentially look in the wrong bucket and return null incorrectly.
Consider the steps that happen in m.get():
A hash is calculated for the key.
The current length of the table (the buckets in the HashMap) is read.
This length is used to calculate the correct bucket to get from the table.
The bucket is retrieved and the entries in the bucket are walked until a match is found or until the end of the bucket is reached.
If another thread changes the map and causes the table to be resized in between 2 & 3, the wrong bucket could be used to look for the entry, potentially giving an incorrect result.
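The dependence on the table length can be illustrated with a simplified version of the index calculation. This is a sketch, not the actual HashMap source: the real implementation also mixes the high bits of the hash first, and the class name BucketIndexSketch is made up. It does rely on the fact that HashMap table lengths are powers of two.

```java
// Simplified illustration of how a bucket index is derived from a hash and the table length.
class BucketIndexSketch {
    // For a power-of-two table length, the index is the low bits of the hash.
    static int bucketIndex(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }
}
```

For a hash of 17, a table of length 16 gives bucket 1, while a table of length 32 gives bucket 17. A reader that picked up the old length but ends up looking at the new table (or the other way round) therefore searches the wrong bucket.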

The reason synchronization is needed in a concurrent environment is that many Java operations aren't atomic. This means that a single Java statement like counter++ causes the underlying VM to execute more than one machine operation:
Read value
Increment value
Write value
While those three operations are performed by thread T1, another thread T2 may be scheduled and read the old value of that variable, e.g. 10. T1 increments that value and writes the value 11 back. But T2 has read the value 10! If T2 also increments this value, the result stays the same, namely 11 instead of 12.
The synchronisation will avoid such concurrent errors.
T1:
Set synchronizer token
Read value
Another thread T2 was invoked and tries to read the value. But since the synchronizer token was already set, T2 has to wait.
Increment value
Write value
Remove synchronizer token
T2:
Set synchronizer token
Read value
Increment value
Write value
Remove synchronizer token
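In Java, the "synchronizer token" in the sequence above corresponds to the monitor acquired by a synchronized method or block. A minimal sketch, with a hypothetical class name and a small run helper for trying it out:

```java
// Hypothetical demo class: the object's monitor plays the role of the synchronizer token.
class SynchronizedCounterSketch {
    private int counter;

    // Read, increment and write happen while holding the monitor, so they cannot interleave.
    synchronized void increment() {
        counter++;
    }

    synchronized int get() {
        return counter;
    }

    // Starts `threads` threads, each incrementing `perThread` times, and returns the final count.
    static int run(int threads, int perThread) {
        SynchronizedCounterSketch sketch = new SynchronizedCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    sketch.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sketch.get();
    }
}
```

Without the synchronized keyword on increment(), the same run may return less than threads * perThread because of the lost updates described above.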

By synchronising the get method you are forcing the thread to cross a memory barrier and read the value from main memory. If you didn't synchronise the get method, the JVM would be free to apply optimisations that might result in that thread, blissfully unaware, reading a stale value held in registers or caches.

Scalable patterns for thread-safe hashtable puts when keeping track of frequency

This was an interview question I got some time last week and it ended at a cliffhanger. The question was simple: Design a service that keeps track of the frequency of "messages" (a 1 line string, could be in different languages) passed to it. There are 2 broad APIs: submitMsg(String msg) and getFrequency(String msg). My immediate reaction was to use a HashMap that uses a String as the key (in this case, a message) and an Integer as the value (to keep track of counts/frequency).
The submitMsg api simply sees whether a message exists in the hashMap. If it doesn't, put the message and set the frequency to 1; if it does, then get the current count and increment it by 1. The interviewer then pointed out this would fail miserably in the event multiple threads access the SAME key at the SAME exact time.
For example: At 12:00:00:000 Thread1 would try to "submitMsg", and thereby my method would (1) do a get on the hashMap and see that the value is not null, it is in fact, say, 100, and (2) do a put, incrementing the frequency by 1 so that the key's value is 101. Meanwhile consider that Thread2 ALSO tried to do a submitMsg at exactly 12:00:00:000, and the method once again internally did a get on the hashMap (which returned 100; this is a race condition), after which it also sets the frequency to 101. Alas, the true frequency should have been 102 and not 101, and this is a major design flaw in a heavily multithreaded environment. I wasn't sure how to stop this from happening: putting a lock on simply the write isn't good enough, and having a lock on a read didn't make sense. What would have been ideal is to "lock" an element if a get was invoked internally via the submitMsg API, because we expect it to be "written to" soon after. The lock would be released once the frequency had been updated, but if someone were to use the getFrequency() API, having a pure lock wouldn't make sense. I'm not sure whether a mutex would help here because I don't have a strong background in distributed systems.
I'm looking to the SO community for help on the best way to think through a problem like this. Is the magic in the datastructure to be used or some kind of synchronization that I need to do in my api itself? How can we maintain the integrity of "frequency" while maintaining the scalability of the service as well?

Well, your initial idea isn't a million miles off, you just need to make it thread safe. For instance, you could use a ConcurrentHashMap<String, AtomicInteger>.
public void submitMsg(String msg) {
AtomicInteger previous = map.putIfAbsent(msg, new AtomicInteger(1));
if (null != previous) {
previous.incrementAndGet();
}
}
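A variation on the same idea, sketched under the assumption that a long count is acceptable: the ConcurrentHashMap class documentation describes using LongAdder values initialized via computeIfAbsent as a scalable frequency map. The class name MessageFrequencyService is hypothetical; the two method names follow the APIs named in the question.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical service class; LongAdder spreads contended increments over several cells.
class MessageFrequencyService {
    private final ConcurrentHashMap<String, LongAdder> freqs = new ConcurrentHashMap<>();

    void submitMsg(String msg) {
        // Creates the adder on first use, then increments it without locking the map entry.
        freqs.computeIfAbsent(msg, k -> new LongAdder()).increment();
    }

    long getFrequency(String msg) {
        LongAdder adder = freqs.get(msg);
        return adder == null ? 0L : adder.sum();
    }
}
```

Compared with putIfAbsent(msg, new AtomicInteger(1)), this does not allocate a new counter object on every call, and readers of getFrequency never block writers.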

The simplest solution is using Guava's com.google.common.collect.ConcurrentHashMultiset:
private final ConcurrentHashMultiset<String> multiset = ConcurrentHashMultiset.create();
public void submitMsg(String msg) {
multiset.add(msg);
}
public int count(String msg) {
return multiset.count(msg);
}
But this is basically the same as Aurand's solution, just that somebody has already implemented the boring details, like creating the counter if it doesn't exist yet.

Treat it as a Producer–consumer problem.
The service is the producer; it should add each message to a queue that feeds the consumer. You could run one queue per producer to ensure that the producers do not wait.
The consumer encapsulates the HashTable, and pulls the messages off the queue and updates the table.
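A rough sketch of that arrangement, using a single shared BlockingQueue and one consumer thread. Everything here is an assumption made for illustration: the class name QueuedFrequencyService, the stop sentinel, and the shutdown method are not part of the answer, and a real service would also need a thread-safe way to answer getFrequency while the consumer is running.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: producers only enqueue; one consumer thread owns the table.
class QueuedFrequencyService {
    // Sentinel object, recognized by identity, that tells the consumer to stop.
    private static final String STOP = new String("<stop>");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Map<String, Integer> counts = new HashMap<>(); // touched only by the consumer
    private final Thread consumer;

    QueuedFrequencyService() {
        consumer = new Thread(this::drain);
        consumer.start();
    }

    // Called by any number of producer threads; never touches the table.
    void submitMsg(String msg) {
        queue.add(msg);
    }

    private void drain() {
        try {
            while (true) {
                String msg = queue.take();
                if (msg == STOP) {
                    return;
                }
                counts.merge(msg, 1, Integer::sum); // plain HashMap is fine: single thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Lets the consumer finish everything queued so far, then returns the final counts.
    Map<String, Integer> shutdownAndGetCounts() {
        queue.add(STOP);
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts;
    }
}
```

Because only the consumer thread ever modifies the table, no locking is needed around the update itself; the queue is the only shared structure.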

Caching of instances

I am just curious about the java memory model a little.
Here is what I thought.
If i have the following class
public class Test {
int[] numbers = new int[Integer.MAX_VALUE]; // kids dont try this at home
void increment(int ind){
numbers[ind]++;
}
int get(int ind){
return numbers[ind];
}
}
There are multiple reader threads calling get() and one writer thread calling increment() on this class.
The question is: is there actually any synchronization that I have to do in order to leave the class in a consistent state after each method call?
The reason I am asking is that I am curious whether the elements in the array are cached in some way by the JVM, or whether this only applies to class members. If the elements inside the array can be cached, is there a way to define them as volatile?
Thanks
Roman

As an alternative to synchronizing those methods, you could also consider replacing the int[] with an array of AtomicIntegers (or an AtomicIntegerArray). This would have the benefit/downside (depending on your application) of allowing concurrent access to different elements in your array.
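A minimal sketch of this alternative using java.util.concurrent.atomic.AtomicIntegerArray, which gives every element volatile read semantics and an atomic increment. The class name AtomicArraySketch, the size parameter, and the run helper are assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical replacement for the Test class, with a caller-chosen array size.
class AtomicArraySketch {
    private final AtomicIntegerArray numbers;

    AtomicArraySketch(int size) {
        numbers = new AtomicIntegerArray(size);
    }

    void increment(int ind) {
        numbers.incrementAndGet(ind); // atomic read-modify-write on one element
    }

    int get(int ind) {
        return numbers.get(ind); // volatile read of one element
    }

    // Starts `threads` threads, each incrementing element 0 `perThread` times.
    static int run(int threads, int perThread) {
        AtomicArraySketch sketch = new AtomicArraySketch(4);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    sketch.increment(0);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sketch.get(0);
    }
}
```

Threads working on different indexes do not block each other, which is the main difference from synchronizing the two methods.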

You will definitely have to use some sort of synchronization (either on your class or the underlying data structure) in order to ensure the data is left in a consistent state after method calls. Consider the following situations, with two Threads A and B, with the integer array initially containing all zero values.
Thread A calls increment(0). The post-increment operation is not atomic; you can actually consider it to be broken down into at least three steps:
Read the current value; Add one to the current value; Store the value.
Thread B also calls increment(0). If this happens soon after Thread A has done the same, they will both read the same initial value for the element at index 0 of the array.
At this point, both Thread A and B have read a value of '0' for the element they want to increment. Both will increment the value to '1' and store it back in the first element of the array.
Thus, only the work of the Thread that last writes to the array is seen.
The situation is similar if you had a decrement() method. If both increment() and decrement() were called at near-simultaneous times by two separate threads, there is no telling what the outcome would be. The value might end up incremented by one or decremented by one, rather than the two operations "cancelling" each other out.
EDIT: Update to reflect Roman's (OP) comment below
Sorry, I mis-read the post. I think I understand your question, which is along the lines of:
"If I declare an array as volatile,
does that mean access to its elements
are treated as volatile as well?"
The quick answer is No: Please see this article for more information; the information in the previous answers here is also correct.

Yes, the VM is allowed to cache inside the thread any field that is not synchronized or volatile. To prevent this, you could mark the field as volatile, but the class still wouldn't be thread-safe: volatile on the array reference does not extend to its elements, and ++ is not an atomic operation. Add the synchronized keyword to both methods, and you're safe.
