Suppose I have a shared ConcurrentHashMap<String, Integer> called map that already contains a single mapping ("One", 1), and suppose I also have 2 threads.
The first thread executes this code:
map.put("One", 2);
and the second thread executes this code:
synchronized (map) {
Integer number = map.get("One");
System.out.println(number == map.get("One"));
}
Since ConcurrentHashMap uses lock striping instead of locking the entire object, I don't think the described scenario is thread safe.
In particular, I don't know whether map.put("One", 2); in the first thread could interleave between the Integer number = map.get("One"); call and the System.out.println(number == map.get("One")); call in the second thread, even though both are inside a synchronized block.
So is it possible for that code to print false?
The methods of ConcurrentHashMap are thread-safe, but this does not mean that it synchronizes on the ConcurrentHashMap object itself. What you can do is synchronize the put and the map access code on the same reference. Your put code would have to be changed to this:
synchronized (map) {
map.put("One", 2);
}
And your access code can remain like:
synchronized (map) {
Integer number = map.get("One");
System.out.println(number == map.get("One"));
}
This will never be able to print false.
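To see why the unsynchronized version can fail, here is a minimal sketch (my own illustration, not part of the original answer): if the writer's put lands between the reader's two gets, the reader compares the Integer for 1 with the Integer for 2 and prints false.

import java.util.concurrent.ConcurrentHashMap;

public class FalseDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("One", 1);
        // Writer thread, as in the question.
        new Thread(() -> map.put("One", 2)).start();
        // Reader thread without the external lock; the two gets may straddle
        // the put above, in which case this prints false.
        new Thread(() -> {
            Integer number = map.get("One");
            System.out.println(number == map.get("One"));
        }).start();
    }
}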
Related
I have a piece of code that on startup creates a HashMap of key to ReentrantLock.
void constructor() {
this.lockMap = new HashMap<>();
for (int i=0; i<100; i++) {
this.lockMap.put(i, new ReentrantLock(true));
}
}
During concurrent execution, I try to lock the lock inside the lockMap in the following manner:
void runConcurrently() {
    int ii = 10;
    if (!lockMap.containsKey(ii)) {
        log.error("lock id is not found in the lockMap " + ii);
    }
    boolean locked = lockMap.get(ii).tryLock();
    if (!locked) {
        return;
    }
    runCriticialSection();
    lockMap.get(ii).unlock();
}
void runCriticialSection() {
log.info("hello");
log.info("I'm here");
}
Here is what I have seen happen once in a while during the 4 hours the code is running; it is a very rare occurrence.
I see these logs:
hello.
hello.
I'm here.
I'm here.
and then I see this log right after, on the third access of the HashMap with the same key ii = 10:
lock id is not found in the map 10.
NullPointerException ... trying to access the map.
whereas I should see, in guaranteed order:
hello.
I'm here.
hello.
I'm here.
The HashMap never gets modified during execution at all.
Is there an issue with HashMap not being a ConcurrentHashMap? Is get not thread-safe in the absence of modifications? I am specifically not using ConcurrentHashMap because of its locking slowness. But the HashMap is only created on startup and never modified afterwards. I find it very weird: it seems the lock has been acquired twice, and it seems like the element is missing from the map.
There is no concurrency issue with the map itself, provided the map is never modified after the constructor and is safely published to the other threads. In that case, threads will only ever see that final version of the map. Otherwise, the behaviour is undefined.
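As a hedged aside (my addition, not in the original answer): one common way to get that safe publication is a final field. A minimal sketch, with a hypothetical enclosing class name:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class LockRegistry { // hypothetical class name
    // final guarantees that any thread that sees this object also sees the
    // fully populated map; unmodifiableMap guards against accidental writes.
    private final Map<Integer, ReentrantLock> lockMap;

    LockRegistry() {
        Map<Integer, ReentrantLock> m = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            m.put(i, new ReentrantLock(true));
        }
        this.lockMap = Collections.unmodifiableMap(m);
    }
}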
No exclusive access to the critical section
From your output, it appears that (at least) two threads entered runCriticialSection simultaneously.
This is because you are using a different lock for each value of ii. A lock only excludes another thread if that other thread locks the same lock! Thus, threads that do not use the same value of ii will happily run runCriticialSection simultaneously. That can result in the output anomaly described above, as follows:
Thread 1 executes log.info("hello");
Thread 2 executes log.info("hello");
Thread 1 executes log.info("I'm here");
Thread 2 executes log.info("I'm here");
If you want exclusive access to a section, always use the same lock surrounding that section.
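For instance, a minimal sketch of that advice against the question's code (the field name is my own, hypothetical):

// One fair lock shared by every thread, so runCriticialSection is
// genuinely exclusive regardless of the value of ii.
private final ReentrantLock sectionLock = new ReentrantLock(true);

void runConcurrently() {
    if (!sectionLock.tryLock()) {
        return;
    }
    try {
        runCriticialSection();
    } finally {
        sectionLock.unlock();
    }
}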
Coding problems
When the check that ii maps to a lock fails, you should not continue but instead return or throw an exception. If you don't, locked = lockMap.get(ii).tryLock(); throws a NullPointerException, because lockMap.get(ii) returns null.
Between locking the lock and unlocking it, you are running user code, in the form of runCriticialSection. If you change the implementation of that method later and it starts throwing exceptions, your lock will never be unlocked! Always use try ... finally with a lock.
Fixing these issues could lead to the following code:
if (!lockMap.containsKey(ii)) {
log.error("lock id is not found in the lockMap " + ii);
return;
}
boolean locked = lockMap.get(ii).tryLock();
if (!locked) {
return;
}
try {
runCriticialSection();
}
finally {
lockMap.get(ii).unlock();
}
Actually, I would just put the lock in a local variable, but that is a matter of opinion.
ReentrantLock lock = lockMap.get(ii);
if (lock == null) {
log.error("lock id is not found in the lockMap " + ii);
return;
}
boolean locked = lock.tryLock();
if (!locked) {
return;
}
try {
runCriticialSection();
}
finally {
lock.unlock();
}
I have 4 threads: 2 of the threads do updates and 2 of the threads do reads on the ConcurrentHashMap. The code is as follows:
private static ConcurrentHashMap<String, String> myHashMap = new ConcurrentHashMap<>();
private static final Object lock = new Object();
Thread 1 and Thread 2's run method (key and value are strings):
synchronized (lock) {
if (!myHashMap.containsKey(key)) {
myHashMap.put(key, value);
} else {
String oldValue = myHashMap.get(key);
// do something with oldValue to compute the value to store
myHashMap.put(key, value);
}
}
Thread 3 and Thread 4's run method does the print
for (Entry<String, String> entry : myHashMap.entrySet()) {
String key = entry.getKey();
String value = entry.getValue();
System.out.println("key, " + key + " value " + value);
}
Is there any issue with the above usage of ConcurrentHashMap?
Because when I read the Javadoc and search the web, I found the following claim:
This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details. (Note - I understand the print thread result might not be the most recent result, but that is ok as long as the update thread does things correctly.)
There are also claims on some websites that the same Iterator cannot be used by 2 or more different threads. So I am wondering whether the print code above uses the same Iterator in the 2 threads. And why can we not use the same Iterator in 2 different threads?
As for the requirement: I want concurrent reads without blocking, which is why I chose ConcurrentHashMap.
Instead of using the if-else block you can use the putIfAbsent method of ConcurrentHashMap. And a second thing: you should not use external locking with a ConcurrentHashMap.
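A sketch of that idea against the code above (transform is a hypothetical helper standing in for the "do something with the value" step):

// The insert branch is atomic: putIfAbsent returns null if it inserted,
// or the existing value if the key was already present.
if (myHashMap.putIfAbsent(key, value) != null) {
    // The update branch also needs to be atomic; on Java 8+, compute
    // applies the remapping function atomically for this key.
    // transform is a hypothetical stand-in for the original "do something".
    myHashMap.compute(key, (k, v) -> v == null ? value : transform(v));
}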
I know ConcurrentHashMap is thread-safe, e.g. putIfAbsent, replace, etc., but I was wondering, is a block of code like the one below safe?
if (accumulator.containsKey(key)) { //accumulator is a ConcurrentHashMap
accumulator.put(key, accumulator.get(key)+1);
} else {
accumulator.put(key, 0);
}
Keep in mind that the accumulator value for a key may be updated by two different threads simultaneously, which would cause a problem in a normal HashMap. So do I need something like this?
ConcurrentHashMap<Integer,Object> locks;
...
locks.putIfAbsent(key,new Object());
synchronized(locks.get(key)) {
if (accumulator.containsKey(key)) {
accumulator.put(key, accumulator.get(key)+1);
} else {
accumulator.put(key, 0);
}
}
if (accumulator.containsKey(key)) { //accumulator is a ConcurrentHashMap
accumulator.put(key, accumulator.get(key)+1);
} else {
accumulator.put(key, 0);
}
No, this code is not thread-safe; accumulator.get(key) can be changed in between the get and the put, or the entry can be added between the containsKey and the put. If you're in Java 8, you can write accumulator.compute(key, (k, v) -> (v == null) ? 0 : v + 1), or any of the many equivalents, and it'll work. If you're not, the thing to do is write something like
while (true) {
Integer old = accumulator.get(key);
if (old == null) {
if (accumulator.putIfAbsent(key, 0) == null) {
// note: it's a little surprising that you want to put 0 in this case,
// are you sure you don't mean 1?
break;
}
} else if (accumulator.replace(key, old, old + 1)) {
break;
}
}
...which loops until it manages to make the atomic swap. This sort of loop is pretty much how you have to do it: it's how AtomicInteger works, and what you're asking for is AtomicInteger across many keys.
Alternately, you can use a library: e.g. Guava has AtomicLongMap and ConcurrentHashMultiset, which also do things like this.
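For instance, a minimal sketch of the AtomicLongMap variant (assuming Guava is on the classpath, and Integer keys as in the question):

import com.google.common.util.concurrent.AtomicLongMap;

AtomicLongMap<Integer> accumulator = AtomicLongMap.create();
// Atomically increments the per-key counter, starting it at zero if the
// key is new; no explicit locking is needed.
accumulator.incrementAndGet(key);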
I think the best solution for you would be to use an AtomicInteger as the map value. The nice feature here is that it is non-blocking, mutable and thread-safe. You can use the replace method offered by the CHM, but with that you will have to hold the lock of the segment/bucket-entry until the replace completes.
With the AtomicInteger you leverage quick non-blocking updates.
ConcurrentMap<Key, AtomicInteger> map;
then
map.get(key).incrementAndGet();
If you are using Java 8, LongAdder would be better.
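A sketch of the LongAdder variant (Java 8+, my illustration); using computeIfAbsent also avoids the NullPointerException that a bare map.get(key).incrementAndGet() would throw for a key that is not yet in the map:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

ConcurrentMap<Integer, LongAdder> counters = new ConcurrentHashMap<>();
// Lazily creates the counter the first time a key is seen, then
// increments it without blocking updates to other keys.
counters.computeIfAbsent(key, k -> new LongAdder()).increment();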
You are correct that your first code snippet is unsafe. It's entirely possible for the thread to be preempted right after the check has been performed and for another thread to begin executing. Therefore in the first snippet the following could happen:
[Thread 1]: Check for key, return false
[Thread 2]: Check for key, return false
[Thread 2]: Put value 0 in for key
[Thread 1]: Put value 0 in for key
In this example, the behavior you would want would leave you in a state with the value for that key being set to 1, not 0.
Therefore locking is necessary.
Only individual actions on ConcurrentHashMap are thread-safe; taking multiple actions in sequence is not. Your first block of code is not thread-safe. It is possible, for example:
THREAD A: accumulator.containsKey(key) = false
THREAD B: accumulator.containsKey(key) = false
THREAD B: accumulator.put(key, 0)
THREAD A: accumulator.put(key, 0)
Similarly, it is not thread-safe to get the accumulator value for a given key, increment it, and then put it back in the map. This is a three-step process, and it is possible for another thread to interleave at any point.
Your second synchronized block of code is thread-safe.
I am running a program which contains the following classes (among others, but these are the relevant ones for the question).
In my Results class I have a synchronized LinkedHashMap:
private static Map<Integer,Result> resultsHashMap=Collections.synchronizedMap(new LinkedHashMap<Integer, Result>());
and a getter method:
public static Map<Integer,Result> getResultsHashMap() {
return resultsHashMap;
}
I also have, inside my Result class, a constructor with this synchronized code:
public Result(){
synchronized (Lock.lock) {
uniqueIdResult++;
}
}
and a synchronized getter method as such:
public static int getUniqueIdResult() {
synchronized (Lock.lock) {
return uniqueIdResult;
}
}
uniqueIdResult is defined as follows:
private static int uniqueIdResult=0;
Also I have a Lock class containing this object:
public static final Lock lock=new Lock();
Now, here is the important issue I'm after. In my program I have the following 2 lines, which create a Result and put it into the HashMap:
Result result = new Result();
Results.getResultsHashMap().put(Result.getUniqueIdResult(), result);
I try to run my program with different numbers of threads. When it is run with 1 thread the output is as I expect it to be (specifically, though not that important, Results.resultsHashMap contains 433 keys, which is what it should be, and the keys start from 1).
But when I run it with a different number of threads, it gives a different output. For example, running with 6 threads gives a different number of keys each time: sometimes 430, sometimes 428, sometimes 427, etc., and the starting key is not always related to the total number of keys (e.g. total_number_of_keys-starting_key_number+1, which at first seemed to me to be a pattern, but I realized it is not).
The iteration is like this:
int counterOfResults=0;
for (Integer key : Results.getResultsHashMap().keySet()) {
System.out.println(key + " " + Results.getResultsHashMap().get(key));
counterOfResults++;
}
System.out.println(counterOfResults);
Also, when synchronizing only the getter method that returns the hashMap, without synchronizing the Result creation and the insertion into the hashMap, multiple threads give wrong output.
Also, when synchronizing only one of the two lines (the creation of the Result or the put into the hashMap), the output is not coherent under multiple threads.
However when I synchronize both these lines (the creation of Result and putting into the map) like so:
Result result;
synchronized (Lock.lock) {
result = new Result(currentLineTimeNationalityNameYearofbirth.getName(),currentLineTimeNationalityNameYearofbirth.getTime(),citycompetionwas,date,distance,stroke,gender,kindofpool);
Results.getResultsHashMap().put(Result.getUniqueIdResult(), result);
}
the output is perfect, no matter how many Threads I use.
Also, I will note that the output is printed only after all threads have finished, using the join method on all created threads.
So my question is:
As far as I know, before I synchronized those 2 lines (creating a Result and putting it into the hashMap), all of my critical sections (changing and getting uniqueIdResult, and getting resultsHashMap, since as mentioned I also tried synchronizing that getter) were synchronized on the same object. On top of that I took a further safe approach by wrapping the map with Collections.synchronizedMap, which, as far as I know, should make the hashMap thread-safe.
Why then is the output not as I expect it to be? Where is the safety problem?
There's no mutual exclusion around these lines:
Result result = new Result();
Results.getResultsHashMap().put(Result.getUniqueIdResult(), result);
If you have 4 threads, they might all execute the first line (which will increment the uniqueIdResult variable four times), and then all execute the second line (at which point they will all see the same return value from getUniqueIdResult()). That explains how your keys could start at 4 when you have 4 (or more) threads.
Because you have multiple threads potentially (and unpredictably) storing to the same key, you also end up with a variable number of entries in your map.
You should probably remove the increment from the Result class constructor and instead do it in the getUniqueIdResult method:
public static int getUniqueIdResult() {
synchronized (Lock.lock) {
return ++uniqueIdResult;
}
}
(Having done that, there is no longer any need to create instances of Result at all).
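An alternative sketch (my addition, not from the original answer): an AtomicInteger yields the same distinct, sequential ids without any explicit lock.

import java.util.concurrent.atomic.AtomicInteger;

private static final AtomicInteger uniqueIdResult = new AtomicInteger();

public static int getUniqueIdResult() {
    // incrementAndGet is atomic, so every caller receives a distinct id,
    // starting from 1, with no synchronized block needed.
    return uniqueIdResult.incrementAndGet();
}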
We are writing some locking code and have run into a peculiar question. We use a ConcurrentHashMap for fetching instances of Object that we lock on. So our synchronized blocks look like this
synchronized(locks.get(key)) { ... }
We have overridden the get method of ConcurrentHashMap to make it always return a new object if it did not contain one for the key.
#Override
public Object get(Object key) {
Object o = super.get(key);
if (null == o) {
Object no = new Object();
o = putIfAbsent((K) key, no);
if (null == o) {
o = no;
}
}
return o;
}
But is there a state in which the get method has returned the object, but the thread has not yet entered the synchronized block, allowing other threads to get the same object and lock on it?
We have a potential race condition where:
thread 1: gets the object with key A, but does not enter the synchronized block
thread 2: gets the object with key A, enters a synchronized block
thread 2: removes the object from the map, exits synchronized block
thread 1: enters the synchronized block with the object that is no longer in the map
thread 3: gets a new object for key A (not the same object as thread 1 got)
thread 3: enters a synchronized block, while thread 1 also is in its synchronized block both using key A
This situation would not be possible if Java entered the synchronized block directly after the call to get returned. If it doesn't, does anyone have any input on how we could remove keys without having to worry about this race condition?
As I see it, the problem originates from the fact that you lock on map values, while in fact you need to lock on the key (or some derivation of it). If I understand correctly, you want to prevent 2 threads from running the critical section with the same key.
Is it possible for you to lock on the keys? Can you guarantee that you always use the same instance of the key?
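If the keys are Strings, one hedged sketch of this is to lock on the interned key (note that interned Strings are shared JVM-wide, so unrelated code could contend on the same monitor):

// intern() returns the single canonical String instance for this value,
// so every thread locking on key.intern() uses the same monitor.
synchronized (key.intern()) {
    // critical section for this key
}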
A nice alternative:
Don't delete the locks at all. Use a ReferenceMap with weak values (a sketch follows the notes below). This way, a map entry is removed only if it is not currently in use by any thread.
Note:
1) Now you will have to synchronize this map (using Collections.synchronizedMap(..)).
2) You also need to synchronize the code that generates/returns a value for a given key.
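The answer does not name a specific library; as one assumed possibility, Apache Commons Collections provides such a map. A minimal sketch:

import java.util.Collections;
import java.util.Map;
import org.apache.commons.collections4.map.AbstractReferenceMap.ReferenceStrength;
import org.apache.commons.collections4.map.ReferenceMap;

// Hard keys, weak values: an entry becomes collectible once no thread
// still holds a reference to its lock object. ReferenceMap itself is not
// thread-safe, hence the synchronizedMap wrapper (note 1 above).
Map<String, Object> locks = Collections.synchronizedMap(
        new ReferenceMap<>(ReferenceStrength.HARD, ReferenceStrength.WEAK));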
You have 2 options:
a. You could check the map once inside the synchronized block.
Object o = map.get(k);
synchronized(o) {
if(map.get(k) != o) {
// object removed, handle...
}
}
b. You could extend your values to contain a flag indicating their status. When a value is removed from the map, you set a flag indicating that it was removed (within the sync block).
CacheValue v = map.get(k);
synchronized (v) {
if(v.isRemoved()) {
// object removed, handle...
}
}
The code, as is, is thread-safe. That being said, if you are removing from the CHM, then any assumptions made when synchronizing on an object returned from the collection are lost.
But is there a state in which the get method has returned the object, but the thread has not yet entered the synchronized block, allowing other threads to get the same object and lock on it?
Yes, but that happens any time you synchronize on an object. What is guaranteed is that the other thread will not enter the synchronized block until the first one exits.
If it doesn't, does anyone have any input on how we could remove keys without having to worry about this race condition?
The only real way of ensuring this atomicity is to either synchronize on the CHM or another object (shared by all threads). The best way is to not remove from the CHM.
Thanks for all the great suggestions and ideas, really appreciate it! Eventually this discussion made me come up with a solution that does not use objects for locking.
Just a brief description of what we're actually doing.
We have a cache that receives data continuously from our environment. The cache has several 'buckets' for each key and aggregates events into the buckets as they come in. The events coming in have a key that determines the cache entry to be used, and a timestamp that determines the bucket in the cache entry that should be incremented.
The cache also has an internal flush task that runs periodically. It will iterate all cache entries and flushes all buckets but the current one to database.
Now the timestamps of the incoming data can be for any time in the past, but the majority of them are for very recent timestamps. So the current bucket will get more hits than buckets for previous time intervals.
Knowing this, I can demonstrate the race condition we had. All this code is for one single cache entry, since the issue was isolated to concurrent writing and flushing of single cache elements.
// buckets :: ConcurrentMap<Long, AtomicLong>
void incrementBucket(long timestamp, long value) {
long key = bucketKey(timestamp, LOG_BUCKET_INTERVAL);
AtomicLong bucket = buckets.get(key);
if (null == bucket) {
AtomicLong newBucket = new AtomicLong(0);
bucket = buckets.putIfAbsent(key, newBucket);
if (null == bucket) {
bucket = newBucket;
}
}
bucket.addAndGet(value);
}
Map<Long, Long> flush() {
long now = System.currentTimeMillis();
long nowKey = bucketKey(now, LOG_BUCKET_INTERVAL);
Map<Long, Long> flushedValues = new HashMap<Long, Long>();
for (Long key : new TreeSet<Long>(buckets.keySet())) {
if (key != nowKey) {
AtomicLong bucket = buckets.remove(key);
if (null != bucket) {
long databaseKey = databaseKey(key);
long n = bucket.get();
if (!flushedValues.containsKey(databaseKey)) {
flushedValues.put(databaseKey, n);
} else {
long sum = flushedValues.get(databaseKey) + n;
flushedValues.put(databaseKey, sum);
}
}
}
}
return flushedValues;
}
What could happen was: (fl = flush thread, it = increment thread)
it: enters incrementBucket, executes until just before the call to addAndGet(value)
fl: enters flush and iterates the buckets
fl: reaches the bucket that is being incremented
fl: removes it and calls bucket.get() and stores the value to the flushed values
it: increments the bucket (which will be lost now, because the bucket has been flushed and removed)
The solution:
void incrementBucket(long timestamp, long value) {
long key = bucketKey(timestamp, LOG_BUCKET_INTERVAL);
boolean done = false;
while (!done) {
AtomicLong bucket = buckets.get(key);
if (null == bucket) {
AtomicLong newBucket = new AtomicLong(0);
bucket = buckets.putIfAbsent(key, newBucket);
if (null == bucket) {
bucket = newBucket;
}
}
synchronized (bucket) {
// double check if the bucket still is the same
if (buckets.get(key) != bucket) {
continue;
}
done = true;
bucket.addAndGet(value);
}
}
}
Map<Long, Long> flush() {
long now = System.currentTimeMillis();
long nowKey = bucketKey(now, LOG_BUCKET_INTERVAL);
Map<Long, Long> flushedValues = new HashMap<Long, Long>();
for (Long key : new TreeSet<Long>(buckets.keySet())) {
if (key != nowKey) {
AtomicLong bucket = buckets.get(key);
if (null != bucket) {
synchronized(bucket) {
buckets.remove(key);
long databaseKey = databaseKey(key);
long n = bucket.get();
if (!flushedValues.containsKey(databaseKey)) {
flushedValues.put(databaseKey, n);
} else {
long sum = flushedValues.get(databaseKey) + n;
flushedValues.put(databaseKey, sum);
}
}
}
}
}
return flushedValues;
}
I hope this will be useful for others that might run in to the same problem.
The two code snippets you've provided are fine, as they are. What you've done is similar to how lazy instantiation with Guava's MapMaker.makeComputingMap() might work, but I see no problems with the way that the keys are lazily created.
You're right, by the way, that it's entirely possible for a thread to be preempted after the get() lookup of a lock object, but before entering synchronized.
My problem is with the third bullet point in your race condition description. You say:
thread 2: removes the object from the map, exits synchronized block
Which object, and which map? In general, I presumed that you were looking up a key to lock on, and then would be performing some other operations on other data structures, within the synchronized block. If you're talking about removing the lock object from the ConcurrentHashMap mentioned at the start, that's a massive difference.
And the real question is whether this is necessary at all. In a general purpose environment, I don't think there will be any memory issues with just remembering all of the lock objects for all the keys that have ever been looked up (even if those keys no longer represent live objects). It is much harder to come up with some way of safely disposing of an object that may be stored in a local variable of some other thread at any time, and if you do want to go down this route I have a feeling that performance will degrade to that of a single coarse lock around the key lookup.
If I've misunderstood what's going on there then feel free to correct me.
Edit: OK - in which case I stand by my above claim that the easiest way to do this is to not remove the keys; this might not actually be as problematic as you think, since the rate at which the space grows will be very small. By my calculations (which may well be off, I'm not an expert in space calculations and your JVM may vary) the map grows by about 14Kb/hour. You'd have to have a year of continuous uptime before this map used up 100MB of heap space.
But let's assume that the keys really do need to be removed. This poses the problem that you can't remove a key until you know that no threads are using it. This leads to the chicken-and-egg problem that you'll require all threads to synchronize on something else in order to get atomicity (of checking) and visibility across threads, which then means that you can't do much else than slap a single synchronized block around the whole thing, completely subverting your lock striping strategy.
Let's revisit the constraints. The main thing here is that things get cleared up eventually. It's not a correctness constraint but just a memory issue. Hence what we really want to do is identify some point at which the key could definitely no longer be used, and then use this as the trigger to remove it from the map. There are two cases here:
You can identify such a condition, and logically test for it. In which case you can remove the keys from the map with (in the worst case) some kind of timer thread, or hopefully some logic that's more cleanly integrated with your application.
You cannot identify any condition by which you know that a key will no longer be used. In this case, by definition, there is no point at which it's safe to remove the keys from the map. So in fact, for correctness' sake, you must leave them in.
In any case, this effectively boils down to manual garbage collection. Remove the keys from the map when you can lazily determine that they're no longer going to be used. Your current solution is too eager here since (as you point out) it's doing the removal before this situation holds.