private static final Map<String, SampleClass> map = new ConcurrentHashMap<>();

public static SampleClass getSampleClass(String context) {
    if (map.get(context) != null) {
        return map.get(context);
    } else {
        SampleClass cls = new SampleClass(context);
        map.put(context, cls);
        return cls;
    }
}
In a multi-threaded environment, if two threads both see map.get(context) return null, then both threads will create a cls, and the puts will be serialized: thread 1 will put first, and then thread 2 will overwrite what thread 1 put.
Is this behavior correct?
In my case, I want the same value to be returned whenever map.get is called for a given key, so it seems to me that using a HashMap and synchronizing around it would be preferable.
Use CHM's atomic computeIfAbsent() method and you won't have to worry about synchronization:
return map.computeIfAbsent(context, SampleClass::new);
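Applied to the snippet above, the whole method collapses to a single call. A minimal sketch, assuming the same field and class names as in the question:

private static final Map<String, SampleClass> map = new ConcurrentHashMap<>();

public static SampleClass getSampleClass(String context) {
    // the mapping function runs at most once per key, so every thread
    // gets back the same SampleClass instance for a given context
    return map.computeIfAbsent(context, SampleClass::new);
}

Because the mapping function runs while ConcurrentHashMap holds an internal lock, the SampleClass constructor should be cheap and must not touch the map itself.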
Consider the following implementation of some kind of fixed-size cache that allows lookup by an integer handle:
static class HandleCache {
private final AtomicInteger counter = new AtomicInteger();
private final Map<Data, Integer> handles = new ConcurrentHashMap<>();
private final Data[] array = new Data[100_000];
int getHandle(Data data) {
return handles.computeIfAbsent(data, k -> {
int i = counter.getAndIncrement();
if (i >= array.length) {
throw new IllegalStateException("array overflow");
}
array[i] = data;
return i;
});
}
Data getData(int handle) {
return array[handle];
}
}
There is an array store inside the compute function, which is not synchronized in any way. Would the Java memory model allow other threads to read a null value from this array later on?
PS: Would the outcome change if the id returned from getHandle was stored in a final field and only accessed through this field from other threads?
The read access isn't thread-safe. You could make it thread-safe indirectly, but it's likely to be brittle. I would implement it in a much simpler way and only optimise it later should it prove to be a performance problem, e.g. because you see it in a profiler for a realistic test.
static class HandleCache {
private final Map<Data, Integer> handles = new HashMap<>();
private final List<Data> dataByIndex = new ArrayList<>();
synchronized int getHandle(Data data) {
Integer id = handles.get(data);
if (id == null) {
id = handles.size();
handles.put(data, id);
dataByIndex.add(data);
}
return id;
}
synchronized Data getData(int handle) {
return dataByIndex.get(handle);
}
}
Assuming that you determine the index for the array read from the value of counter, then yes - you may get a null read.
The simplest example (there are others) is as follows:
T1 calls getHandle(data) and is suspended just after int i = counter.getAndIncrement();
T2 reads array[counter.get()] and gets null.
You should be able to easily verify this with a strategically placed sleep and two threads.
From the documentation of ConcurrentHashMap#computeIfAbsent:
The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.
The documentation's reference to blocking refers only to update operations on the Map, so if any other thread attempts to access array directly (rather than through an update operation on the Map), there can be race conditions and null can be read.
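One way to address that, sketched below, is to swap the plain Data[] for an AtomicReferenceArray, whose get and set have volatile semantics. This is only an illustration of the idea, not the only fix: it safely publishes the stored element to any thread that reads a non-null slot, but a thread that invents an index on its own (e.g. from counter) can still observe null for a slot that has not been written yet.

static class HandleCache {
    private final AtomicInteger counter = new AtomicInteger();
    private final Map<Data, Integer> handles = new ConcurrentHashMap<>();
    // per-element volatile semantics instead of a plain Data[]
    private final AtomicReferenceArray<Data> array = new AtomicReferenceArray<>(100_000);

    int getHandle(Data data) {
        return handles.computeIfAbsent(data, k -> {
            int i = counter.getAndIncrement();
            if (i >= array.length()) {
                throw new IllegalStateException("array overflow");
            }
            array.set(i, data);       // volatile write
            return i;
        });
    }

    Data getData(int handle) {
        return array.get(handle);     // volatile read; null if the slot was never written
    }
}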
I already have a topic with the same code:
public abstract class Digest {
private Map<String, byte[]> cache = new HashMap<>();
public byte[] digest(String input) {
byte[] result = cache.get(input);
if (result == null) {
synchronized (cache) {
result = cache.get(input);
if (result == null) {
result = doDigest(input);
cache.put(input, result);
}
}
}
return result;
}
protected abstract byte[] doDigest(String input);
}
In the previous topic it was shown that this code is not thread-safe.
In this topic I want to present the solutions I have in mind, and I ask you to review them:
Solution#1 through ReadWriteLock:
public abstract class Digest {
private final ReadWriteLock rwl = new ReentrantReadWriteLock();
private final Lock readLock = rwl.readLock();
private final Lock writeLock = rwl.writeLock();
private Map<String, byte[]> cache = new HashMap<>(); // I still don't know whether I should use volatile or not
public byte[] digest(String input) {
byte[] result = null;
readLock.lock();
try {
result = cache.get(input);
} finally {
readLock.unlock();
}
if (result == null) {
writeLock.lock();
try {
result = cache.get(input);
if (result == null) {
result = doDigest(input);
cache.put(input, result);
}
} finally {
writeLock.unlock();
}
}
return result;
}
protected abstract byte[] doDigest(String input);
}
Solution#2 through CHM
public abstract class Digest {
private Map<String, byte[]> cache = new ConcurrentHashMap<>(); //should be volatile?
public byte[] digest(String input) {
return cache.computeIfAbsent(input, this::doDigest);
}
protected abstract byte[] doDigest(String input);
}
Please review the correctness of both solutions. This is not a question about which solution is better; I understand that the ConcurrentHashMap one is better. Please just review the correctness of the implementations.
Unlike the clusterfudge we got into in the last question, this is better.
As was shown in the previous question's duplicate, the original code is not thread-safe, since HashMap is not thread-safe and the initial get() can be called while the put() is being executed inside the synchronized block. This can break all sorts of things, so that's definitely not thread-safe.
The ReadWriteLock solution (Solution #1) is thread-safe, since all accesses to cache are done in guarded code. The initial get() is protected by the read lock, and the put() is done while holding the write lock, guaranteeing that threads can't read the cache while it's being written to, but are free to read it at the same time as other reading threads. No concurrency issues, no visibility issues, no chance of deadlocks. Everything's fine.
The last one (Solution #2, using computeIfAbsent()) is of course the most elegant. Since computeIfAbsent() is an atomic operation, it guarantees that the value is either returned directly or computed at most once. From the Javadoc:
If the specified key is not already associated with a value, attempts to compute its value using the given mapping function and enters it into this map unless null. The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.
The Map in question shouldn't be volatile, but it should be final. If it's not final, it could (at least in theory) be changed and it would be possible for 2 threads to work on different objects, which is not what you want.
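In other words, the field from Solution #2 would simply be declared like this (only the final modifier is added):

private final Map<String, byte[]> cache = new ConcurrentHashMap<>();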
I have the following code:
public class Cache {
private final Map map = new ConcurrentHashMap();
public Object get(Object key) {
Object value = map.get(key);
if (value == null) {
value = new SomeObject();
map.put(key, value);
}
return value;
}
}
My question is:
The put and get methods of the map are thread-safe, but since the whole block is not synchronized - could multiple threads add the same key twice?
put and get are thread safe in the sense that calling them from different threads cannot corrupt the data structure (as, e.g., is possible with a normal java.util.HashMap).
However, since the block is not synchronized, you may still have multiple threads adding the same key:
Both threads may pass the null check; one adds the key and returns its value, and then the second will override that value with a new one and return it.
As of Java 8, you can also prevent this addition of duplicate keys with:
public class Cache {
private final Map map = new ConcurrentHashMap();
public Object get(Object key) {
Object value = map.computeIfAbsent(key, k -> {
return new SomeObject();
});
return value;
}
}
The API docs state:
If the specified key is not already associated with a value, attempts to compute its value using the given mapping function and enters it into this map unless null. The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.
could multiple threads add the same key twice?
Yes, they could. To fix this problem you can:
1) Use the putIfAbsent method instead of put. It is very fast, but unnecessary SomeObject instances can be created.
2) Use double checked locking:
Object value = map.get(key);
if (value == null) {
synchronized (map) {
value = map.get(key);
if (value == null) {
value = new SomeObject();
map.put(key, value);
}
}
}
return value;
Locking is much slower, but only the necessary objects will be created.
you could also combine checking and putIfAbsent such as:
Object value = map.get(key);
if (value == null) {
    Object newValue = new SomeObject();
    Object previous = map.putIfAbsent(key, newValue);
    return previous != null ? previous : newValue;
}
return value;
thereby reducing the unnecessary new objects to cases where new entries are introduced in the short time between the check and the putIfAbsent. Note that putIfAbsent returns the previous value (null if the key was absent), so you have to fall back to the instance you just created rather than returning its result directly.
If you are feeling lucky and reads vastly outnumber writes to your map, you can also create your own copy-on-write map similar to CopyOnWriteArrayList.
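The JDK does not ship such a map, so here is a rough sketch of what a hand-rolled one could look like; the class and method names are made up for illustration, and it only pays off when writes are rare:

import java.util.HashMap;
import java.util.Map;

public class CopyOnWriteCache {
    // readers always see a consistent, fully built snapshot
    private volatile Map<Object, Object> snapshot = new HashMap<>();

    public Object get(Object key) {
        return snapshot.get(key);                           // lock-free read
    }

    // unlike Map.putIfAbsent, this returns the value that ends up in the map
    public synchronized Object putIfAbsent(Object key, Object value) {
        Object existing = snapshot.get(key);
        if (existing != null) {
            return existing;
        }
        Map<Object, Object> copy = new HashMap<>(snapshot); // copy, then mutate the copy
        copy.put(key, value);
        snapshot = copy;                                    // publish via the volatile write
        return value;
    }
}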
I have a Map object that could be null or simply empty when the application first starts. I need all threads accessing this map to block until the map is initialized, and only then signal all threads that they can access it.
This map holds configuration data and will be read-only, except when a single thread decides to refresh it with new configuration data (so, for the sake of performance, I don't think it needs to be synchronized). I tried using a Condition object from a ReentrantLock, but it threw IllegalMonitorStateException whenever I called signalAll() or await().
Here is a pseudo code for what I need to do:
void monitorThread{
while(someCondition){
map = updatedMap();
condition.signalAll();
}
}
String readValueFromMap(String key){
if(map == null){
condition.await();
}
return map.get(key);
}
CountDownLatch is all you need.
CountDownLatch latch = new CountDownLatch(1);
When you initialize the map, call latch.countDown(); in the reader threads, call latch.await().
void monitorThread{
map = updatedMap();
latch.countDown();
}
String readValueFromMap(String key){
latch.await();
return map.get(key);
}
Note that CountDownLatch's await() method only blocks while the count is greater than zero, so it only waits the first time, before the initial countDown().
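Filling in the pseudo code, a minimal sketch of that pattern might look like this; updatedMap() is the question's own placeholder, and the String value type is an assumption:

private final CountDownLatch latch = new CountDownLatch(1);
private volatile Map<String, String> map;

void monitorThread() {
    map = updatedMap();     // build the configuration map
    latch.countDown();      // releases every thread blocked in await()
}

String readValueFromMap(String key) throws InterruptedException {
    latch.await();          // blocks only until the count reaches zero
    return map.get(key);
}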
To do this right, you need a memory barrier, hence the volatile. Because the map may be null initially, you are going to need another lock object. The following should work:
private final Object lockObject = new Object();
private volatile Map<...> map;
void monitorThread() {
while (condition){
// do this outside of the synchronized in case it takes a while
Map<...> updatedMap = updatedMap();
synchronized (lockObject) {
map = updatedMap;
// notify everyone that may be waiting for the map to be initialized
lockObject.notifyAll();
}
}
}
String readValueFromMap(String key) throws InterruptedException {
// we grab a copy of the map to avoid race conditions in case the map is
// updated in the future
Map<...> mapRef = map;
// we have a while loop here to handle spurious signals
if (mapRef == null) {
synchronized (lockObject) {
while (map == null) {
// wait for the map to initialized
lockObject.wait();
}
mapRef = map;
}
}
return mapRef.get(key);
}
Sounds like all you need is a "Lock" object that guards access to the Map.
These are pretty easy to use:
Lock l = ...;
l.lock();
try {
// access the resource protected by this lock
} finally {
l.unlock();
}
You could probably use: java.util.concurrent.locks.ReentrantReadWriteLock.ReadLock
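A sketch of what that could look like here; the field names and the String value type are assumptions, and updatedMap() is the question's placeholder. Note that a lock by itself only guards access - if readers must block until the very first initialization, you would still combine it with something like the CountDownLatch shown above:

private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
private Map<String, String> map;                 // guarded by lock

void refresh() {
    Map<String, String> updated = updatedMap();  // build outside the lock
    lock.writeLock().lock();
    try {
        map = updated;
    } finally {
        lock.writeLock().unlock();
    }
}

String readValueFromMap(String key) {
    lock.readLock().lock();
    try {
        return map == null ? null : map.get(key);
    } finally {
        lock.readLock().unlock();
    }
}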
I would like to implement a variation on the "Map of Sets" collection that will be constantly accessed by multiple threads. I am wondering whether the synchronization I am doing is sufficient to guarantee that no issues will manifest.
So given the following code, where Map, HashMap, and Set are the Java implementations, and Key and Value are some arbitrary Objects:
public class MapOfSets {
private Map<Key, Set<Value>> map;
public MapOfSets() {
map = Collections.synchronizedMap(new HashMap<Key, Set<Value>>());
}
//adds value to the set mapped to key
public void add(Key key, Value value) {
Set<Value> old = map.get(key);
//if no previous set exists on this key, create it and add value to it
if(old == null) {
old = new HashSet<Value>();
old.add(value);
map.put(key, old);
}
//otherwise simply insert the value to the existing set
else {
old.add(value);
}
}
//similar to add
public void remove(Key key, Value value) {...}
//perform some operation on all elements in the set mapped to key
public void foo(Key key) {
Set<Value> set = map.get(key);
for(Value v : set)
v.bar();
}
}
The idea here is that because I've synchronized the Map itself, the get() and put() methods should be atomic, right? So there should be no need to do additional synchronization on the Map or the Sets contained in it. So will this work?
Alternatively, would the above code be advantageous over another possible synchronization solution:
public class MapOfSets {
private Map<Key, Set<Value>> map;
public MapOfSets() {
map = new HashMap<Key, Set<Value>>();
}
public synchronized void add(Key key, Value value) {
Set<Value> old = map.get(key);
//if no previous set exists on this key, create it and add value to it
if(old == null) {
old = new HashSet<Value>();
old.add(value);
map.put(key, old);
}
//otherwise simply insert the value to the existing set
else {
old.add(value);
}
}
//similar to add
public synchronized void remove(Key key, Value value) {...}
//perform some operation on all elements in the set mapped to key
public synchronized void foo(Key key) {
Set<Value> set = map.get(key);
for(Value v : set)
v.bar();
}
}
Here I leave the data structures unsynchronized but synchronize all the public methods instead. So which of these will work, and which one is better?
The first implementation you posted is not thread safe. Consider what happens when the add method is accessed by two concurrent threads with the same key:
thread A executes line 1 of the method, and gets a null reference because no item with the given key is present
thread B executes line 1 of the method, and gets a null reference because no item with the given key is present — this will happen after A returns from the first call, as the map is synchronized
thread A evaluates the if condition to false
thread B evaluates the if condition to false
From that point on, the two threads will carry on with execution of the true branch of the if statement, and you will lose one of the two value objects.
The second variant of the method you posted looks safer.
However, if you can use third-party libraries, I would suggest you check out Google Guava, as they offer concurrent multimaps (docs).
The second one is correct, but the first one isn't.
Think about it a minute, and suppose two threads are calling add() in parallel. Here's what could occur:
Thread 1 calls add("foo", "bar");
Thread 2 calls add("foo", "baz");
Thread 1 gets the set for "foo" : null
Thread 2 gets the set for "foo" : null
Thread 1 creates a new set and adds "bar" in it
Thread 2 creates a new set and adds "baz" in it
Thread 1 puts its set in the map
Thread 2 puts its set in the map
At the end of the story, the map contains one value for "foo" instead of two.
Synchronizing the map makes sure that its internal state is coherent, and that each method you call on the map is thread-safe, but it doesn't make the get-then-put operation atomic.
Consider using one of Guava's SetMultimap implementations, which do everything for you. Wrap it in a call to Multimaps.synchronizedSetMultimap(SetMultimap) to make it thread-safe.
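A sketch of that, assuming the question's Key and Value types and that Guava is on the classpath:

import com.google.common.collect.HashMultimap;
import com.google.common.collect.Multimaps;
import com.google.common.collect.SetMultimap;

SetMultimap<Key, Value> map =
        Multimaps.synchronizedSetMultimap(HashMultimap.<Key, Value>create());

map.put(key, value);        // creates the backing set on demand, under the wrapper's lock
map.remove(key, value);

Per the Javadoc, the one caveat is that iterating over the returned views (e.g. the set from map.get(key)) must still be done inside a synchronized (map) block.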
Your second implementation will work, but it holds locks for longer than it needs to (an inevitable problem with using synchronized methods rather than synchronized blocks), which will reduce concurrency. If you find that the limit on concurrency here is a bottleneck, you could shrink the locked regions a bit.
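For example, one way to shrink them (my own sketch, using the second implementation's fields) is to keep the lock around the map and set accesses but snapshot the set so that the potentially slow v.bar() calls run outside the lock:

public void foo(Key key) {
    List<Value> copy;
    synchronized (this) {
        Set<Value> set = map.get(key);
        if (set == null) {
            return;
        }
        copy = new ArrayList<Value>(set);   // snapshot while holding the lock
    }
    for (Value v : copy) {
        v.bar();                            // runs without holding the lock
    }
}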
Alternatively, you could use some of the lock-free collections provided by java.util.concurrent. Here's my attempt at that; this isn't tested, and it requires Key to be comparable, but it should not perform any locking ever:
public class MapOfSets {
private final ConcurrentMap<Key, Set<Value>> map;
public MapOfSets() {
map = new ConcurrentSkipListMap<Key, Set<Value>>();
}
private static ThreadLocal<Set<Value>> freshSets = new ThreadLocal<Set<Value>>() {
@Override
protected Set<Value> initialValue() {
return new ConcurrentSkipListSet<Value>();
}
};
public void add(Key key, Value value) {
Set<Value> freshSet = freshSets.get();
Set<Value> set = map.putIfAbsent(key, freshSet);
if (set == null) {
set = freshSet;
freshSets.remove();
}
set.add(value);
}
public void remove(Key key, Value value) {
Set<Value> set = map.get(key);
if (set != null) {
set.remove(value);
}
}
//perform some operation on all elements in the set mapped to key
public void foo(Key key) {
Set<Value> set = map.get(key);
if (set != null) {
for (Value v: set) {
v.bar();
}
}
}
}
For your Map implementation you could just use a ConcurrentHashMap - you wouldn't have to worry about ensuring thread safety for access, whether it's insertion or retrieval, as the implementation takes care of that for you.
And if you really want to use a Set, you could call
Collections.newSetFromMap(new ConcurrentHashMap<Object,Boolean>())
on your ConcurrentHashMap.
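Putting those two pieces together, a sketch of a JDK-only concurrent map of sets might look like this (computeIfAbsent requires Java 8+; on older versions the putIfAbsent dance from the previous answer works as well):

private final ConcurrentMap<Key, Set<Value>> map = new ConcurrentHashMap<>();

public void add(Key key, Value value) {
    map.computeIfAbsent(key,
            k -> Collections.newSetFromMap(new ConcurrentHashMap<Value, Boolean>()))
       .add(value);
}

public void remove(Key key, Value value) {
    Set<Value> set = map.get(key);
    if (set != null) {
        set.remove(value);
    }
}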