If I have two or more threads accessing a HashMap, but guarantee that they'll never access the same key at the same time, could that still lead to a race condition?
In @dotsid's answer he says this:
If you change a HashMap in any way then your code is simply broken.
He is correct. A HashMap that is updated without synchronization will break even if the threads are using disjoint sets of keys. Here are just some[1] of the things that can go wrong.
If one thread does a put, then another thread may see a stale value for the hashmap's size.
If one thread does a put with a key that is (currently) in the same hash bucket as the second thread's key, the second thread's map entry might get lost, temporarily or permanently. It depends on how the hash chains (or whatever) are implemented.
When a thread does a put that triggers a rebuild of the table, another thread may see transient or stale versions of the hashtable array reference, its size, its contents or the hash chains. Chaos may ensue.
When a thread does a put for a key that collides with some key used by some other thread, and the latter thread does a put for its key, then the latter might see a stale copy of hash chain reference. Chaos may ensue.
When one thread probes the table with a key that collides with one of some other thread's keys, it may encounter that key on the chain. It will call equals on that key, and if the threads are not synchronized, the equals method may encounter stale state in that key.
And if you have two threads simultaneously doing put or remove requests, there are numerous opportunities for race conditions.
I can think of three solutions:
Use a ConcurrentHashMap.
Use a regular HashMap but synchronize on the outside; e.g. using primitive mutexes, Lock objects, etcetera. But beware that this could lead to a concurrency bottleneck due to lock contention.
Use a different HashMap for each thread. If the threads really have a disjoint set of keys, then there should be no need (from an algorithmic perspective) for them to share a single Map. Indeed, if your algorithms involve the threads iterating the keys, values or entries of the map at some point, splitting the single map into multiple maps could give a significant speedup for that part of the processing.
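A minimal sketch of the first option, assuming integer keys partitioned by `key % threads` (the class name `DisjointKeysDemo` and method `fillConcurrently` are hypothetical, chosen for illustration). Each thread writes only its own keys into a shared ConcurrentHashMap, and after `join()` the calling thread sees every entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example: several threads write disjoint key sets into one shared map.
final class DisjointKeysDemo {

    // Thread t writes every key k in [0, threads * perThread) with k % threads == t.
    static Map<Integer, Integer> fillConcurrently(int threads, int perThread) {
        Map<Integer, Integer> shared = new ConcurrentHashMap<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread w = new Thread(() -> {
                for (int k = id; k < threads * perThread; k += threads) {
                    shared.put(k, k * k); // safe: ConcurrentHashMap handles concurrent structural changes
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join() also makes each worker's writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(4, 10_000).size()); // prints 40000
    }
}
```

Swapping `new ConcurrentHashMap<>()` for `new HashMap<>()` in this sketch gives exactly the broken setup described above; it may appear to work on some runs, which is why testing cannot prove it safe.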
[1] We cannot enumerate all of the possible things that could go wrong. For a start, we can't predict how every JVM on every platform will handle the unspecified aspects of the JMM (Java Memory Model). But you should NOT be relying on that kind of information anyway. All you need to know is that it is fundamentally wrong to use a HashMap like this. An application that does this is broken, even if you haven't observed the symptoms of the brokenness yet.
Just use a ConcurrentHashMap. Up to Java 7, ConcurrentHashMap used multiple locks (segments), each covering a range of hash buckets, to reduce the chance of a lock being contended; since Java 8 it uses CAS operations and per-bucket locking instead. Either way, the performance cost of acquiring an uncontended lock is marginal.
To answer your original question: according to the javadoc, as long as the structure of the map doesn't change, you are fine. This means no removing elements at all and no adding new keys that are not already in the map. Replacing the value associated with an existing key is fine.
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)
However, the javadoc makes no guarantees about visibility, so you have to be willing to accept occasionally retrieving stale values.
It depends on what you mean by "accessing". If you are only reading, threads can even read the same keys, as long as the visibility of the data is guaranteed by the "happens-before" rules. This means that the HashMap must not change, and all changes (the initial construction) must happen-before any reader starts accessing the HashMap.
If you change a HashMap in any way, then your code is simply broken. @Stephen C provides a very good explanation of why.
EDIT: If the first case is your actual situation, I recommend using Collections.unmodifiableMap() to be sure that your HashMap is never changed. The objects referenced by the HashMap should not change either, so aggressive use of the final keyword can help.
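A minimal sketch of the read-only case, assuming a hypothetical `ReadOnlyTable` class: the map is fully built, copied, wrapped with Collections.unmodifiableMap() and stored in a `final` field. Final-field semantics (JLS 17.5) then guarantee that any thread seeing the constructed object also sees the fully built map, provided `this` does not escape the constructor.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical read-only lookup table: built once, then only read by many threads.
final class ReadOnlyTable {
    // final field: once the constructor finishes, any thread that sees this
    // instance also sees the fully built map behind it.
    private final Map<String, Integer> table;

    ReadOnlyTable(Map<String, Integer> source) {
        // Defensive copy, then wrap so that later callers cannot mutate it.
        this.table = Collections.unmodifiableMap(new HashMap<>(source));
    }

    Integer lookup(String key) {
        return table.get(key);
    }

    Map<String, Integer> view() {
        return table; // put/remove on this view throws UnsupportedOperationException
    }

    public static void main(String[] args) {
        Map<String, Integer> src = new HashMap<>();
        src.put("a", 1);
        ReadOnlyTable t = new ReadOnlyTable(src);
        src.put("b", 2); // does not affect t, because of the defensive copy
        System.out.println(t.lookup("a") + " " + t.lookup("b")); // prints "1 null"
    }
}
```

The defensive copy matters: unmodifiableMap() only returns a read-only view, so without the copy, whoever still holds `source` could keep mutating the underlying HashMap. On Java 10+, Map.copyOf() produces an unmodifiable copy in one step.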
And as @Lars Andren says, ConcurrentHashMap is the best choice in most cases.
Modifying a HashMap without proper synchronization from two threads may easily lead to a race condition.
When a put() triggers a resize of the internal table, the resize takes some time, and meanwhile the other thread may continue writing to the old table.
Two put() calls for different keys update the same bucket if the keys' hash codes are equal modulo the table size. (Actually, the relation between hash code and bucket index is more complicated, but collisions may still occur.)
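To make the "more complicated" relation concrete, here is a sketch that mirrors how OpenJDK 8+ HashMap picks a bucket: it XORs the high 16 bits of hashCode() into the low bits, then masks with (table length - 1). This is a private implementation detail, not a public API, and the `BucketIndex` class is hypothetical. It shows two different keys sharing a bucket, and a resize separating them.

```java
// Sketch of how OpenJDK 8+ HashMap maps a key to a bucket. This mirrors the
// JDK's private logic; it is an implementation detail, not a public API.
final class BucketIndex {

    // Same bit-spreading as OpenJDK's HashMap.hash(Object).
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // The table length is always a power of two, so (n - 1) & hash keeps only the low bits.
    static int bucket(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        // Different keys, same bucket in a 16-slot table (the default capacity)...
        System.out.println(bucket(1, 16) + " " + bucket(17, 16)); // prints "1 1"
        // ...but after a resize to 32 slots, key 17 moves to a different bucket.
        System.out.println(bucket(1, 32) + " " + bucket(17, 32)); // prints "1 17"
    }
}
```

So keys 1 and 17 (which might belong to two different threads) share a bucket until the table grows, and the resize moves entries between buckets; both are windows where unsynchronized writers can corrupt each other's entries.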
This question was closed as a duplicate of: Is a HashMap thread-safe for different keys?
Suppose we have multiple threads and we're dividing the possible key set between them (e.g. key k goes to thread k % numThreads), so no two threads ever use the same key.
Can we safely use HashMap<K, V> instead of ConcurrentHashMap<K, V>?
No, for several reasons. Some (but not all) would be:
HashMap rehashes your hash, so you won't even know what the actual key hash is.
You'd have to look in HashMap's implementation to even know how many buckets there are.
Resizing would have to copy state from the old array, and that wouldn't be safely published across threads.
Other state, like the map's current size, wouldn't be safely updated.
If you constructed your map with a particular JVM implementation in mind, made sure that it never resized, and knew you'd never care about that extra state, it's maybe possible, in the strictest of senses. But for any practical purpose, no.
The Javadoc is very clear about how you should use HashMap in multithreaded applications (I didn't even add the emphasis on "must" myself):
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
It's not beating about the bush here. You can use it safely provided you synchronize appropriately (which basically means ensuring mutually exclusive access to the map across threads).
If you do anything other than what it directly tells you, you are on your own.
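The javadoc leaves the choice of lock open. A minimal sketch, assuming a hypothetical counter store (`GuardedCounts`) in which each thread uses its own key: every access to the plain HashMap, reads included, happens inside `synchronized` on one private lock object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical counter store: a plain HashMap made thread-safe by guarding
// every access (reads included) with the same lock.
final class GuardedCounts {
    private final Object lock = new Object();
    private final Map<String, Integer> counts = new HashMap<>();

    void increment(String key) {
        synchronized (lock) {
            counts.merge(key, 1, Integer::sum);
        }
    }

    int get(String key) {
        synchronized (lock) { // reads must lock too, otherwise visibility is not guaranteed
            return counts.getOrDefault(key, 0);
        }
    }

    static GuardedCounts run(int threads, int perThread) {
        GuardedCounts g = new GuardedCounts();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            String key = "k" + t; // each thread uses its own key, as in the question
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    g.increment(key);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return g;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100_000).get("k2")); // prints 100000
    }
}
```

Collections.synchronizedMap(new HashMap<>()) performs the same wrapping for individual calls, but per its javadoc, iterating over it still requires manually synchronizing on the returned map.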
Maybe it will work if you pre-populate the map with dummy values for each key before starting the threads. But:
it may depend on the implementation
you always run the risk of someone later modifying the code to read/write data that is in the scope of a different thread.
if a key arrives that wasn't pre-populated, you risk undefined behavior.
A better option is to use a local map for each thread and merge the local maps at the end to collect the results.
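A sketch of that per-thread approach, assuming a hypothetical word-length tally (`LocalMapsThenMerge`): each thread fills its own HashMap with no sharing and no locks, and the calling thread merges the results only after `join()`, which provides the happens-before edge needed to read each local map safely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical word-length tally: each thread fills its own HashMap (no sharing,
// no locks), and the calling thread merges the results after join().
final class LocalMapsThenMerge {

    static Map<Integer, Integer> countLengths(List<List<String>> partitions) {
        List<Map<Integer, Integer>> locals = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (List<String> part : partitions) {
            Map<Integer, Integer> local = new HashMap<>(); // confined to one worker until join()
            locals.add(local);
            Thread w = new Thread(() -> {
                for (String s : part) {
                    local.merge(s.length(), 1, Integer::sum);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // happens-before: the worker's writes to its local map are now visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        Map<Integer, Integer> total = new HashMap<>();
        for (Map<Integer, Integer> local : locals) {
            local.forEach((k, v) -> total.merge(k, v, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> parts = List.of(List.of("a", "bb"), List.of("cc", "ddd"));
        System.out.println(countLengths(parts).get(2)); // prints 2
    }
}
```

Because each local map is touched by exactly one thread while the workers run, none of the HashMap hazards above apply; the only shared step is the final merge, which runs on a single thread.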
What is a generic approach to achieve thread safety when an object (e.g. a HashMap or an ArrayList or some POJO) is always modified by a single (same) thread but can be accessed by multiple threads?
HashMap is of most interest for me but I need a generic approach.
Is it enough to make it volatile?
Thanks.
Maybe you should take a look at ConcurrentHashMap.
public class ConcurrentHashMap<K,V>
extends AbstractMap<K,V>
implements ConcurrentMap<K,V>, Serializable
A hash table supporting full concurrency of retrievals and high expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access. This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.) For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries. Similarly, Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time. Bear in mind that the results of aggregate status methods including size, isEmpty, and containsValue are typically useful only when a map is not undergoing concurrent updates in other threads. Otherwise the results of these methods reflect transient states that may be adequate for monitoring or estimation purposes, but not for program control.
More info here:
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html
Apart from being thread-safe, it behaves programmatically much like a classic HashMap; one notable difference is that it does not allow null keys or values.
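A minimal sketch of the single-writer case from the question, using ConcurrentHashMap rather than a volatile field (the `SingleWriterDemo` name is hypothetical). Making the field that holds a HashMap `volatile` would only make the reference visible, not the map's internal state as the writer keeps changing it. Here the reader spins on plain get() calls until it sees the writer's last value, which the happens-before guarantee quoted above makes possible.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical single-writer, multi-reader example: one thread updates the map,
// another reads it with plain get() calls and no extra locking.
final class SingleWriterDemo {

    // Assumes updates >= 1; otherwise the reader below would spin forever.
    static ConcurrentHashMap<String, Integer> runWriter(int updates) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        Thread writer = new Thread(() -> {
            for (int i = 1; i <= updates; i++) {
                map.put("latest", i); // a completed put happens-before a get that sees it
            }
        });
        Thread reader = new Thread(() -> {
            // Spin until the final value becomes visible. With an unsynchronized
            // HashMap there would be no guarantee this loop ever observes the update.
            Integer seen;
            while ((seen = map.get("latest")) == null || seen != updates) {
                Thread.onSpinWait();
            }
        });
        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(runWriter(1000).get("latest")); // prints 1000
        try {
            new ConcurrentHashMap<String, Integer>().put("k", null);
        } catch (NullPointerException e) {
            System.out.println("null values are rejected"); // a HashMap would accept this put
        }
    }
}
```

Thread.onSpinWait() requires Java 9 or newer; it is only a hint to the runtime and can be removed on older versions.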
A generic approach will not be easy to implement, will need a lot of effort on your part, and things will still go wrong (it is too difficult). So your best bet is ConcurrentHashMap, as suggested by Gomoku7.
A generic approach will require an implementation based on locks: you have to acquire a lock before each update and release it afterwards. There are different types of locks in Java, so choose the ones that suit your needs. Some tips:
Final is your friend: do not mutate objects unless it is necessary, and make fields final where possible
Avoid creating temporary variables
Use Executors and Fork/Join where possible
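One way to package the lock-based approach generically, sketched under assumptions: the `Guarded` class and its `read`/`write` methods are hypothetical, not a JDK API. Every access to the wrapped object runs inside a lambda while the matching lock of a ReentrantReadWriteLock is held, so many readers can proceed in parallel while the single writer gets exclusive access.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical generic guard: every access to the wrapped object goes through
// a lambda that runs while holding the appropriate lock.
final class Guarded<T> {
    private final T value;
    private final ReadWriteLock rw = new ReentrantReadWriteLock();

    Guarded(T value) {
        this.value = value;
    }

    // Many readers may hold the read lock at once.
    <R> R read(Function<? super T, ? extends R> reader) {
        rw.readLock().lock();
        try {
            return reader.apply(value);
        } finally {
            rw.readLock().unlock();
        }
    }

    // The write lock is exclusive: no readers or other writers run concurrently.
    void write(Consumer<? super T> writer) {
        rw.writeLock().lock();
        try {
            writer.accept(value);
        } finally {
            rw.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        Guarded<Map<String, Integer>> g = new Guarded<>(new HashMap<>());
        g.write(m -> m.put("x", 42));
        Integer x = g.read(m -> m.get("x"));
        System.out.println(x); // prints 42
    }
}
```

The guard only helps if the lambdas never let the object itself, its iterators, or mutable parts of it escape; anything returned from `read` must be a value or an independent copy.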
I am trying to track down a race condition and all signs seem to be pointing to ConcurrentHashMap.putIfAbsent(). Is it possible that if 2 threads call putIfAbsent() on an empty map with the same key that both could do their lookup to see that the key does not exist yet so both threads then try to add it? For some reason when I first started using putIfAbsent() I did not think the call would need to be synchronized. But now I can't see how it would prevent both threads from adding their values if the timing was right. I have not been able to reproduce this outside of production.
Thanks
None of the operations for any concurrent collection needs to use synchronized.
This is by design, and in fact locking the collection has no effect on other operations (unless they lock it as well), in which case it will just make them slower.
Is it possible that if 2 threads call putIfAbsent() on an empty map with the same key that both could do their lookup to see that the key does not exist yet so both threads then try to add it?
Both can try, but only one will succeed. It is not possible for two threads to appear to have succeeded.
For some reason when I first started using putIfAbsent() I did not think the call would need to be synchronized.
It doesn't.
But now I can't see how it would prevent both threads from adding their values if the timing was right.
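It relies on an atomic compare-and-swap (CAS) operation, which means only one operation can succeed and each thread knows whether it was the one. A CAS doesn't need locking because it uses the underlying atomic hardware instruction. In fact, you would normally implement a lock using CAS, rather than the other way around. (Strictly speaking, since Java 8 ConcurrentHashMap uses a CAS to insert into an empty bucket and briefly locks just that bucket otherwise; either way, only one thread can win.)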
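A stress-test sketch of that guarantee (the `PutIfAbsentRace` class is hypothetical): several threads wait on a CountDownLatch so they start together, then all call putIfAbsent on the same key. Per the ConcurrentHashMap contract, exactly one call returns null (its value was stored), and every other call gets back the winner's value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stress test: N threads race to putIfAbsent the same key.
final class PutIfAbsentRace {

    // Returns how many threads saw putIfAbsent return null, i.e. actually inserted.
    static int winners(int threads) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger wins = new AtomicInteger();
        List<Thread> racers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread r = new Thread(() -> {
                try {
                    start.await(); // line everyone up to maximize contention
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (map.putIfAbsent("key", id) == null) {
                    wins.incrementAndGet(); // null means this thread's value was stored
                }
            });
            racers.add(r);
            r.start();
        }
        start.countDown(); // release all racers at once
        for (Thread r : racers) {
            try {
                r.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return wins.get();
    }

    public static void main(String[] args) {
        System.out.println(winners(8)); // prints 1
    }
}
```

If a race condition persists in production, the usual culprit is code that ignores the return value of putIfAbsent and keeps using its own, losing, value.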
Is it possible that if 2 threads call putIfAbsent on an empty map with the same key that both could do their lookup to see that the key does not exist yet so both threads then try to add it?
Not according to the documentation for putIfAbsent():
If the specified key is not already associated with a value, associate it with the given value. This is equivalent to
if (!map.containsKey(key))
return map.put(key, value);
else
return map.get(key);
except that the action is performed atomically.
This means it is not possible for both threads to insert the key-value pair: one call inserts its value, and the other returns the value that is already there.
In my application, I have a key-value map that serves as a central repository for storing data that is used to return to a defined state after a crash or restart (checkpointing).
The application is multithreaded, and several threads may put key-value pairs into that map. One thread is responsible for regularly creating a checkpoint, i.e. serializing the map to persistent storage.
While the checkpoint is being written, the map should remain unchanged. It's rather easy to avoid new items being added, but what about other threads changing members of "their" objects inside the map?
I could have a single object whose monitor is seized when the checkpointing starts and wrap all write access to any member of the map, and members thereof, in blocks synchronizing on that object. This seems very error-prone and tedious to me.
I could also make the map private to the checkpointer and only put copies of the submitted objects in it. But then I would have to ensure that the copies are deep copies, and the data in the map would no longer be updated automatically: on every change to a submitted object, the submitter would have to re-submit it. This seems like a lot of overhead and is also error-prone, as I have to remember to put resubmit code in all the right places.
What's an elegant and reliable way to solve this?
what about other threads changing members of "their" objects inside the map
Here you have a problem :) and it cannot be solved by any kind of Map...
One solution would be to allow only immutable objects in your Map, but this may be impossible for you.
Otherwise you have to share a lock with all threads that may change data referenced by your map and block them all during your snapshot; but this is a stop-the-world approach...
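A sketch of the immutable-objects route, assuming a hypothetical `Position` value class: threads never mutate a value in place; they replace the whole map entry atomically with ConcurrentHashMap.compute(). A checkpoint then only needs a shallow copy of the map, since nothing inside it can change afterwards.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical immutable value type: threads never mutate a Position in place,
// they replace the map entry with a new instance.
final class Position {
    final String account;
    final long quantity;

    Position(String account, long quantity) {
        this.account = account;
        this.quantity = quantity;
    }

    Position withQuantity(long q) {
        return new Position(account, q); // "modification" creates a new object
    }
}

final class ImmutableCheckpoint {
    final ConcurrentHashMap<String, Position> live = new ConcurrentHashMap<>();

    // Atomically replace the entry for one key with an updated copy.
    void add(String account, long delta) {
        live.compute(account, (k, old) ->
                old == null ? new Position(k, delta) : old.withQuantity(old.quantity + delta));
    }

    // A shallow copy is enough because values are immutable. Note: while other threads
    // keep updating, the copy may mix older and newer entries (each entry is consistent).
    Map<String, Position> snapshot() {
        return new HashMap<>(live);
    }

    public static void main(String[] args) {
        ImmutableCheckpoint c = new ImmutableCheckpoint();
        c.add("alice", 5);
        Map<String, Position> snap = c.snapshot();
        c.add("alice", 3);
        System.out.println(snap.get("alice").quantity + " " + c.live.get("alice").quantity); // prints "5 8"
    }
}
```

This gives per-entry consistency without blocking anyone; if the checkpoint must capture all entries at one instant, a lock such as the ones below is still needed.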
pgras is right that immutability would fix things, but that would also be tough. You could just lock the whole thing but that could be a performance problem. I can think of two good ideas.
The first is to use a ReadWriteLock (which requires Java 1.5 or newer). Threads that modify the map or the objects in it acquire the write lock, and the checkpointer acquires the read lock, so it can be sure nothing changes while it works; when no checkpoint is running, performance should be pretty good. This is still a pretty coarse lock, so you may also want to do #2...
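A minimal sketch of that scheme using ReentrantReadWriteLock. The `CheckpointedStore` class is hypothetical, and `checkpoint()` renders the state as text as a stand-in for real serialization. Every mutation of the map, or of a value inside it, holds the write lock; the checkpointer holds the read lock while it copies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical checkpointed store: every mutation of the map or of objects inside it
// holds the write lock; the checkpointer holds the read lock while it serializes,
// so nothing can change underneath it.
final class CheckpointedStore {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private final Map<String, StringBuilder> data = new HashMap<>(); // values are mutable on purpose

    void append(String key, String text) {
        rw.writeLock().lock();
        try {
            data.computeIfAbsent(key, k -> new StringBuilder()).append(text);
        } finally {
            rw.writeLock().unlock();
        }
    }

    // Stand-in for "serialize to persistent storage": render the state as sorted text.
    String checkpoint() {
        rw.readLock().lock();
        try {
            StringBuilder out = new StringBuilder();
            data.keySet().stream().sorted()
                .forEach(k -> out.append(k).append('=').append(data.get(k)).append('\n'));
            return out.toString();
        } finally {
            rw.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        CheckpointedStore s = new CheckpointedStore();
        s.append("b", "2");
        s.append("a", "1");
        s.append("a", "x");
        System.out.print(s.checkpoint()); // prints "a=1x" and "b=2" on separate lines
    }
}
```

In this form the write lock also serializes the updaters with each other. A common variant inverts the roles: updaters that only touch their own objects share the read lock, and the checkpointer takes the exclusive write lock, which lets updates run in parallel between checkpoints.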
The second is to break things up. Each area of the program could keep its own map (the map for GUI stuff, the map for user settings, the map for hardware settings, whatever). Each one would have its own lock, and things would go about as usual. When it came time to checkpoint, the checkpointer would grab ALL the locks (so things are consistent) and then do its job. The catch here is that you have to define an order in which the locks are grabbed (say, alphabetical); otherwise you'll end up with deadlocks.
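A sketch of the per-area maps with a fixed lock order, assuming hypothetical area names and an `Areas` class. A TreeMap keyed by area name gives the alphabetical order suggested above; normal updates take one area's lock, and the checkpointer takes all of them in that order and releases them in reverse.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical per-area stores ("gui", "hardware", "user", ...), each with its own lock.
final class Areas {
    static final class Area {
        final ReentrantLock lock = new ReentrantLock();
        final Map<String, String> settings = new HashMap<>();
    }

    // TreeMap iterates area names alphabetically, giving every thread the same lock order.
    // It is filled only in the constructor and published via the final field.
    private final TreeMap<String, Area> areas = new TreeMap<>();

    Areas(String... names) {
        for (String n : names) {
            areas.put(n, new Area());
        }
    }

    // Normal updates take only the lock of the area they touch.
    void set(String area, String key, String value) {
        Area a = areas.get(area);
        a.lock.lock();
        try {
            a.settings.put(key, value);
        } finally {
            a.lock.unlock();
        }
    }

    // The checkpointer takes ALL locks, always in alphabetical order, so it cannot deadlock
    // with another thread that acquires several of them in that same order.
    Map<String, Map<String, String>> checkpoint() {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (Area a : areas.values()) {
                a.lock.lock();
                held.add(a.lock);
            }
            Map<String, Map<String, String>> copy = new TreeMap<>();
            areas.forEach((name, a) -> copy.put(name, new HashMap<>(a.settings)));
            return copy;
        } finally {
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock(); // release in reverse order of acquisition
            }
        }
    }

    public static void main(String[] args) {
        Areas s = new Areas("user", "gui", "hardware");
        s.set("gui", "theme", "dark");
        System.out.println(s.checkpoint().get("gui").get("theme")); // prints dark
    }
}
```

Any other code path that needs more than one area lock must follow the same alphabetical order; a single out-of-order acquisition is enough to reintroduce the deadlock risk.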
If the maps are orthogonal to each other (updates to one don't require updates to another to be consistent) then the easiest thing may be to push the updates to a central "backup" map in the checkpointer, not unlike something you described.
My biggest question to you would be: how much of a problem is this, performance-wise? Are updates very frequent, or are they rare? That would help me advise you, since my last idea (previous paragraph) could be slow, but it's easy and the slowness may not matter.
There is a fantastic book called Java Concurrency in Practice which is basically the Java threading bible. It discusses how to figure out this kind of stuff and strategies to avoid problems or make solving them easier. If you are going to be doing more threading, it's a very useful read.
Actually, if your key values are orthogonal to each other, things are really easy. The ConcurrentMap interface (with implementations such as ConcurrentHashMap) would solve your problems, since its operations make changes atomically, so readers wouldn't see inconsistent data. But if you have two (or more) keys that must be updated at the same time, this won't cover you.
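A minimal sketch of the single-key case, assuming a hypothetical hit counter (`AtomicPerKey`): each update touches exactly one key, so ConcurrentHashMap's atomic per-key merge() is sufficient and no external lock is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical hit counter: each update touches exactly one key, so the
// map's own atomic per-key operations are sufficient.
final class AtomicPerKey {

    static ConcurrentMap<String, Long> count(int threads, int perThread) {
        ConcurrentMap<String, Long> hits = new ConcurrentHashMap<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    hits.merge("page", 1L, Long::sum); // atomic read-modify-write for one key
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(count(4, 25_000).get("page")); // prints 100000
    }
}
```

Writing the same update as `hits.put(k, hits.get(k) + 1)` would lose increments, because the get and the put are two separate operations. The same limitation applies across keys: moving a value from one key to another takes two calls, and a reader can observe the state between them.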
I hope this helps. Threading access to shared data structures is complex stuff.