Concurrent frequency counter - concurrency issue - java

I would like to create a concurrent frequency counter class in Java.
The idea is that whenever a request is processed (by the processRequest method), the code checks the request's type (an integer) and counts how many requests of each type have been processed since a given point in time. The processRequest method will be called by multiple threads at the same time.
There are two other methods:
clearMap(): It is called by a single thread every 3 hours and clears the whole map.
getMap(): It can be called at any time by a web service and returns an immutable copy of the current state of the frequency map.
Below is my initial plan for implementing this.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import com.google.common.collect.ImmutableMap;

public class FrequencyCounter {

    private final ConcurrentHashMap<Integer, Long> frequencyMap = new ConcurrentHashMap<>();

    public void processRequest(Request request) {
        // Start at 1 for an absent type, otherwise add 1 to the current count.
        frequencyMap.merge(request.type, 1L, (v, d) -> v + 1);
    }

    public void clearMap() {
        frequencyMap.clear();
    }

    public Map<Integer, Long> getMap() {
        // Guava's ImmutableMap gives the caller a snapshot it cannot modify.
        return ImmutableMap.copyOf(frequencyMap);
    }
}
I checked the documentation of ConcurrentHashMap and it says that the merge method is performed atomically.
So once the clear() method starts to clear the hash buckets of the map (locking per hash bucket), it cannot run while another thread is between reading a value from the frequency map and incrementing it in processRequest, because the merge method is executed atomically.
Am I right?
Does my above plan seem to be fine?
Thank you for your advice.

First, replace Long with AtomicLong.
Second, use computeIfAbsent.
private final Map<Integer, AtomicLong> frequencyMap = new ConcurrentHashMap<>();

public void processRequest(Request request) {
    frequencyMap.computeIfAbsent(request.type, k -> new AtomicLong())
                .incrementAndGet();
}
There are a few reasons why I believe this is a better solution:
The code in the question uses boxed objects, i.e. (v, d) -> v+1 is really (Long v, Long d) -> Long.valueOf(v.longValue() + 1).
That code generates extra garbage, which can be avoided by using AtomicLong.
The code here only allocates one object per key and doesn't require any extra allocations to increment the counter, e.g. it will still be just the one object even if the counter goes into the millions.
The unboxing, adding 1, and re-boxing will likely take slightly longer than the tightly coded incrementAndGet() operation, increasing the likelihood of a collision and hence of a retry inside the merge method.
Code "purity". Using a method that takes a "value" which is then entirely ignored seems wrong to me. It is unnecessary code noise.
These are of course my opinions. You can make your own decision, but I think this code clarifies the purpose, i.e. to increment a long counter, in a fully thread-safe way.
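If you do want to stay with boxed Long values, a minimal sketch of the merge-based variant (assuming the same Request.type field as in the question) that at least avoids the ignored parameter would be:

    private final ConcurrentHashMap<Integer, Long> frequencyMap = new ConcurrentHashMap<>();

    public void processRequest(Request request) {
        // Atomically adds 1 to the current count, starting at 1 for an absent key.
        frequencyMap.merge(request.type, 1L, Long::sum);
    }

It still boxes a new Long on every increment, so the AtomicLong version above remains the cheaper option under heavy contention.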

What does Atomicity imply for computeIfAbsent in a concurrent hash map? Atomicity vs. Synchronized

I know there have been many questions around computeIfAbsent.
Specifically what I am looking for is to understand the statement around atomicity for a concurrent hash map.
from the JavaDoc
The entire method invocation is performed atomically, so the function is applied at most once per key.
If two threads attempt to execute computeIfAbsent with different keys and find that in both cases the map does not contain them, might the resulting executions of the computeIfAbsent function be concurrent? I understand they would not be concurrent in the event that both threads were trying to add the SAME key.
The word Atomic is used and it is mentioned that this means applied at most once per key. But there isn't a specific mention of synchronized behaviour on the method.
As a side note, this is relevant to me in that the method called by computeIfAbsent modifies then uses a field of the class in its body.*
I want to understand if there is a threading concern resulting from two different thread executions of the computeIfAbsent method for the two different keys.
Essentially, do I have to look at something along the lines of synchronizing access to the field variable and its subsequent use within the function I pass to computeIfAbsent?
*(The method invoked by computeIfAbsent is the only method which modifies the field. There is no other invoker of the method outside of the call from the hash map's computeIfAbsent. There is only one instance of the concurrent hash map that calls the computeIfAbsent method that invokes the "atomic" method in question.)
My field is volatile to avoid potential visibility concerns.
There are situations where the mapping function could be executed concurrently for different key values, so it is important that your mapping function is thread-safe.
The computeIfAbsent method only guarantees that the mapping function isn't called simultaneously for the same key value. Also note that a Map works by hashing multiple keys into buckets of entries, and if computeIfAbsent(a, mapFunc) is called at the same time as computeIfAbsent(b, mapFunc) with a pair of keys a and b that map to the same sub-table of the ConcurrentHashMap, then mapFunc for each key will be run one after the other and not at the same time.
However, where the different keys do not resolve to the same sub-table within the ConcurrentHashMap, you should expect your mapping function to be called simultaneously by different threads for different key values.
Here is an example which shows a thread-safe mapping function that detects concurrent callers:
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public static void main(String[] args) throws InterruptedException {
    ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>(2096, 1.0f);
    // Counts how many threads are currently inside the mapping function.
    AtomicInteger concurrent = new AtomicInteger();
    Function<String, String> mappingFunction = s -> {
        int c = concurrent.incrementAndGet();
        String value = "Value:" + s + " concurrent=" + c + " thread=" + Thread.currentThread().getName();
        if (c != 1)
            System.out.println("Multiple callers for " + value);
        try { Thread.sleep(50); } catch (InterruptedException ignore) { }
        concurrent.decrementAndGet();
        return value;
    };
    // Run the same task on two threads so computeIfAbsent races on random keys.
    Runnable task = () -> {
        Random r = new Random();
        for (int i = 0; i < 10_000; i++)
            map.computeIfAbsent(String.valueOf(r.nextInt(10240)), mappingFunction);
    };
    Thread a = new Thread(task, "one");
    a.start();
    task.run();
    a.join();
    map.values().stream().limit(32).forEach(System.out::println);
}
If run enough times, there will be occasions where the counter inside mappingFunction shows that two instances are running at the same time on the pair of threads.
EDIT
To answer your comment about synchronized (r):
Note that there is an infinite loop inside computeIfAbsent which only exits on break or return, and mappingFunction.apply(key) is called in two places:
when the key is the first entry in the sub-table, it runs to the synchronized (r) block. As the line before declares Node<K,V> r = new ReservationNode<K,V>(), there is NEVER contention on r from different threads, but only one thread successfully enters the if (casTabAt(...)) { binCount = 1; ... } block and returns; the other, losing threads resume the loop.
when the key is not the first entry in the sub-table, it runs to the synchronized (f) block, which blocks all but one of the threads trying to computeIfAbsent for different keys that hash to the same sub-table. As each thread enters the block it verifies that f is unchanged, and if so returns the existing or newly computed value; otherwise it resumes the loop.
TL;DR
Oversimplifying it, when two threads execute computeIfAbsent for the same key, just one of them will be successful. The second thread will be blocked until the first one ends. Once it is done, the second thread will find the key and won't apply the mapping function.
Now going into detail:
computeIfAbsent is said to be atomic since it uses a mix of synchronization and compare-and-swap mechanisms to search for and set values in the map. This ensures that two threads won't collide when setting a new key, and that is why the documentation guarantees that the mapping function will be applied "at most once" per key.
If you take a quick look at computeIfAbsent source code in the JDK you will find the following:
synchronized (r) {
    if (casTabAt(tab, i, null, r)) {
        binCount = 1;
        Node<K,V> node = null;
        try {
            if ((val = mappingFunction.apply(key)) != null)
                node = new Node<K,V>(h, key, val);
        } finally {
            setTabAt(tab, i, node);
        }
    }
}
That snippet will granularly block and try to atomically apply the mapping function.
If you want to dig even more on that and you have a good understanding of how HashMaps work and CAS, you can also take a look at a more detailed explanation of the ConcurrentHashMap code here: ConcurrentHashmap in JDK8 code explanation

Is incrementing of current value while putting thread-safe in ConcurrentHashMap?

I wonder what happens when I modify the current value while putting it into a ConcurrentHashMap.
I have a ConcurrentHashMap (ConcurrentHashMap<String, Integer> attendance) with an existing mapping of conference hall names to the number of visitors to each.
Every invocation of the visit(conferenceHallName) method is supposed to increment the number of visitors to the given conference hall. Every invoker is a thread.
So here is the method:
public void visit(String conferenceHallName) {
    attendance.put(conferenceHallName, attendance.get(conferenceHallName) + 1);
}
put() is a locking method, get() is not. But which of these happens in this case:
the thread will lock the segment with this mapping, then calculate the new value, then put it and release the lock, which is perfect for me,
or
the thread will get the old value, calculate a new one, then lock the segment and put, which means inconsistency for me and I will need to find a workaround.
And if the second scenario is what happens in reality, will using AtomicInteger instead of Integer solve my issue?
The second description is closer to what actually happens: the thread will read the value in a thread-safe way, construct a new Integer with the updated count, then lock the relevant bucket of the map and replace the object.
Using AtomicInteger instead of Integer will solve the issue:
attendance.get(conferenceHallName).getAndIncrement();
This assumes that all conferenceHallName keys are already present in the map (now with AtomicInteger values).
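For completeness, a minimal sketch (the names attendanceAtomic and visitAtomic are illustrative, not from the question) of two race-free ways to do the increment without pre-populating the keys, both requiring Java 8+:

    // Option 1: keep Integer values and let the map do the read-modify-write atomically.
    private final ConcurrentMap<String, Integer> attendance = new ConcurrentHashMap<>();

    public void visit(String conferenceHallName) {
        attendance.merge(conferenceHallName, 1, Integer::sum);
    }

    // Option 2: keep AtomicInteger values and create them lazily.
    private final ConcurrentMap<String, AtomicInteger> attendanceAtomic = new ConcurrentHashMap<>();

    public void visitAtomic(String conferenceHallName) {
        attendanceAtomic.computeIfAbsent(conferenceHallName, k -> new AtomicInteger())
                        .incrementAndGet();
    }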

Adding to AtomicInteger within ConcurrentHashMap

I have the following defined
private ConcurrentMap<Integer, AtomicInteger> staffValues = new ConcurrentHashMap<Integer, AtomicInteger>();

private void add() {
    staffValues.replace(100, staffValues.get(100), new AtomicInteger(staffValues.get(100).addAndGet(200)));
}
After testing, the values I am getting are not expected, and I think there is a race condition here. Does anyone know if this would be considered threadsafe by wrapping the get call in the replace function?
A good way to handle situations like this is by using the computeIfAbsent method (not the compute method that #the8472 recommends).
computeIfAbsent accepts 2 arguments: the key, and a Function<K, V> that will only be called if the existing value is missing. Since an AtomicInteger is thread-safe to increment from multiple threads, you can use it easily in the following manner:
staffValues.computeIfAbsent(100, k -> new AtomicInteger(0)).addAndGet(200);
There are a few issues with your code. The biggest is that you're ignoring the return-value of ConcurrentHashMap.replace: if the replacement doesn't happen (due to another thread having made a replacement in parallel), you simply proceed as if it happened. This is the main reason you're getting wrong results.
I also think it's a design mistake to mutate an AtomicInteger and then immediately replace it with a different AtomicInteger; even if you can get this working, there's simply no reason for it.
Lastly, I don't think you should call staffValues.get(100) twice. I don't think that causes a bug in the current code — your correctness depends only on the second call returning a "newer" result than the first, which I think is actually guaranteed by ConcurrentHashMap — but it's fragile and subtle and confusing. In general, when you call ConcurrentHashMap.replace, its third argument should be something you computed using the second.
Overall, you can simplify your code either by not using AtomicInteger:
private ConcurrentMap<Integer, Integer> staffValues = new ConcurrentHashMap<>();

private void add() {
    // Retry until the compare-and-replace succeeds despite concurrent updates.
    Integer prevValue;
    do {
        prevValue = staffValues.get(100);
    } while (!staffValues.replace(100, prevValue, prevValue + 200));
}
or by not using replace (and perhaps not even ConcurrentMap, depending on how else you're touching this map):

private Map<Integer, AtomicInteger> staffValues = new HashMap<>();

private void add() {
    staffValues.get(100).addAndGet(200);
}
You don't need to use replace(). AtomicInteger is a mutable value that does not need to be substituted whenever you want to increment it. In fact addAndGet already increments it in place.
Instead use compute to put a default value (presumably 0) into the map when none is present and otherwise get the pre-existing value and increment that.
If, on the other hand, you want to use immutable values, put Integer instances instead of AtomicInteger into the map and update them with the atomic compute/replace/merge operations.
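A minimal sketch of that immutable-value approach (reusing the staffValues name and the hard-coded key and amount from the question):

    private final ConcurrentMap<Integer, Integer> staffValues = new ConcurrentHashMap<>();

    private void add() {
        // Atomically adds 200 to the current value, or stores 200 if the key is absent.
        staffValues.merge(100, 200, Integer::sum);
    }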

Why do we need to avoid mutations while coding? What is a mutation?

Why is the second code (the one with the stream) a better solution than the first?
First:

public static void main(String[] args) {
    List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6);
    int total = 0;
    for (int e : values) {
        total += e * 2;
    }
    System.out.println(total);
}

Second:

System.out.println(
    values.stream()
          .map(e -> e * 2)
          .reduce(0, (c, e) -> c + e));
Mutation is changing an object and is one common side effect in programming languages.
A method that has a functional contract will always return the same value for the same arguments and have no other side effects (like storing a file, printing, or reading). Thus even if you mutate temporary values inside your function, it's still pure from the outside. Putting your first example in a function demonstrates this:
public static int squareSum(final List<Integer> values) {
    int total = 0;
    for (int e : values) {
        total += e * 2; // mutates a local variable
    }
    return total;
}
A purely functional method doesn't even update local variables. If you put the second version in a function it would be pure:
public static int squareSum(final List<Integer> values) {
    return values.stream()
                 .map(e -> e * 2)
                 .reduce(0, (c, e) -> c + e);
}
For a person who knows other languages that have long preferred a functional style, map and reduce with lambdas are very natural. Both versions are easy to read and easy to test, which is the most important part.
Java has functional classes. java.lang.String is one of them.
Mutation is changing the state of an object, either the list or some custom object.
Your particular code does not cause a mutation of the list either way, so there's no practical benefit here of using lambdas instead of plain old iteration. And, blame me, but I would use the iteration approach in this case.
Some approaches say that whenever you need to modify an object/collection, you should return a new object/collection with the modified data instead of changing the original one. This is good for collections, for example when you concurrently access a collection while it's being changed from another thread.
Of course this could lead to excessive memory use, so there are algorithms for managing memory and mutability for collections, i.e. only the changed nodes are stored in another place in memory.
While Royal Bg is right that you're not mutating your data in either case, it's not true that there's no advantage to the second version: the second version can be heavily multithreaded without ambiguity.
Since we're not required to iterate the list in order, we can put the operations into a heavily multithreaded context, or even solve it on a GPU. In the latter version each data point in the collection is multiplied by 2 and then reduced (every element is added together), which can be done as a parallel reduction.
There are a number of potential advantages to the latter code not seen in the former. And while neither code example actually mutates, in the second one we are given the very clear contract that the items cannot mutate while that is happening. So we know that it doesn't matter if we iterate the list forwards, backwards, or process it with multiple threads, etc. The implementation details can be filled in later, but only because we know mutation can't happen, and streams simply don't allow it.
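As an illustration (a minimal sketch, not from the original answers), the stream pipeline can be switched to a parallel one without touching the per-element logic, precisely because nothing is mutated:

    int total = values.parallelStream()   // same pipeline, now potentially spread across many threads
                      .map(e -> e * 2)
                      .reduce(0, Integer::sum);
    System.out.println(total);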

Declaring a hashmap inside a method

Local variables are thread safe in Java. Is using a hashmap declared inside a method thread safe?
For Example-
void usingHashMap() {
    HashMap<Integer, String> map = new HashMap<>();
}
When two threads run the same method, here usingHashMap(), they are in no way related. Each thread will create its own copy of every local variable, and these variables will not interact with each other in any way.
If variables aren't local, then they are attached to the instance. In this case, two threads running the same method both see the same variable, and this isn't thread-safe.
public class usingHashMapNotThreadSafe {
    HashMap<Integer, String> map = new HashMap<Integer, String>();

    public void work() {
        // manipulating the shared hashmap here
    }
}

public class usingHashMapThreadSafe {
    public void worksafe() {
        HashMap<Integer, String> map = new HashMap<Integer, String>();
        // manipulating the local hashmap here
    }
}
In usingHashMapNotThreadSafe, two threads running on the same instance will see the same map. This could be dangerous, because both threads are trying to change that map! In the second, two threads running on the same instance of usingHashMapThreadSafe will see totally different maps, and can't affect each other.
As long as the reference to the HashMap object is not published (i.e. it does not escape the method and become visible to other threads), it is thread-safe.
The same applies to the keys/values stored in the map. They need to be either immutable (cannot change their states after being created) or used only within this method.
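A minimal sketch of that distinction (the class and field names here are only illustrative):

    import java.util.HashMap;
    import java.util.Map;

    public class PublishExample {

        private Map<Integer, String> shared;   // visible to other threads once assigned

        // Thread-safe: the map stays confined to this method and this thread.
        public int useLocally() {
            Map<Integer, String> map = new HashMap<>();
            map.put(1, "a");
            return map.size();
        }

        // Not thread-safe: assigning the map to a field publishes it,
        // so other threads may read or modify it while this thread still uses it.
        public void publish() {
            Map<Integer, String> map = new HashMap<>();
            map.put(1, "a");
            shared = map;
        }
    }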
I think that to ensure complete concurrency, a ConcurrentHashMap should be used in any case, even if it is local in scope. ConcurrentHashMap implements ConcurrentMap. The partitioning is essentially an attempt, as explained in the documentation, to:
The table is internally partitioned to try to permit the indicated number of concurrent updates without contention. Because placement in hash tables is essentially random, the actual concurrency will vary. Ideally, you should choose a value to accommodate as many threads as will ever concurrently modify the table. Using a significantly higher value than you need can waste space and time, and a significantly lower value can lead to thread contention.
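For reference, the estimated number of concurrently updating threads mentioned in that passage is supplied through the three-argument constructor; a minimal sketch, with the tuning values chosen purely as an example:

    // initial capacity 64, default load factor 0.75, about 16 writer threads expected
    ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>(64, 0.75f, 16);
    counts.merge("requests", 1, Integer::sum);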
