Java: modifying an object in a single thread, reading it in multiple threads

What is a generic approach to achieve thread safety when an object (e.g. a HashMap or an ArrayList or some POJO) is always modified by a single (same) thread but can be accessed by multiple threads?
HashMap is of most interest to me, but I need a generic approach.
Is it enough to make it volatile?
Thanks.

Maybe you should take a look at ConcurrentHashMap.
public class ConcurrentHashMap<K,V>
    extends AbstractMap<K,V>
    implements ConcurrentMap<K,V>, Serializable
A hash table supporting full concurrency of retrievals and high expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access. This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.) For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries. Similarly, Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time. Bear in mind that the results of aggregate status methods including size, isEmpty, and containsValue are typically useful only when a map is not undergoing concurrent updates in other threads. Otherwise the results of these methods reflect transient states that may be adequate for monitoring or estimation purposes, but not for program control.
More info here:
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html
Programmatically, it behaves exactly like a classic HashMap.
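For the original single-writer/multiple-readers question, a minimal sketch of what this could look like with ConcurrentHashMap; the key name, update interval and thread counts are made up for illustration:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class SingleWriterMultiReader {
        // Shared map: one writer thread updates it, any number of threads read it.
        private static final Map<String, Integer> scores = new ConcurrentHashMap<>();

        public static void main(String[] args) throws InterruptedException {
            ScheduledExecutorService pool = Executors.newScheduledThreadPool(3);

            // Single writer: updates the map periodically.
            pool.scheduleAtFixedRate(
                    () -> scores.merge("requests", 1, Integer::sum),
                    0, 10, TimeUnit.MILLISECONDS);

            // Multiple readers: each get() sees the latest completed update for the key.
            for (int i = 0; i < 2; i++) {
                pool.scheduleAtFixedRate(
                        () -> System.out.println(Thread.currentThread().getName()
                                + " sees " + scores.get("requests")),
                        0, 50, TimeUnit.MILLISECONDS);
            }

            Thread.sleep(300);
            pool.shutdownNow();
        }
    }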

A generic approach will not be easy to implement and will need a lot of effort on your part, and things can still go wrong because it is hard to get right. So your best bet is ConcurrentHashMap, as suggested by Gomoku7.
A generic approach will require an implementation based on locks: you will have to acquire a lock before each update and release it afterwards. There are different types of locks in Java, so choose the one that suits your needs (see the sketch after the tips below). Some tips:
final is your friend: do not mutate objects unless it is necessary, and make fields final where possible
Avoid creating temporary variables
Use Executors and Fork/Join where possible
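As a rough illustration of the lock-based approach above, here is a sketch of a hypothetical wrapper that guards a plain HashMap with a ReentrantReadWriteLock, assuming a single writer thread and many readers; the class and method names are invented:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Hypothetical wrapper: one thread calls put(), many threads call get().
    public class GuardedMap<K, V> {
        private final Map<K, V> map = new HashMap<>();
        private final ReadWriteLock lock = new ReentrantReadWriteLock();

        public void put(K key, V value) {
            lock.writeLock().lock();   // exclusive: blocks all readers and writers
            try {
                map.put(key, value);
            } finally {
                lock.writeLock().unlock();
            }
        }

        public V get(K key) {
            lock.readLock().lock();    // shared: many readers may hold it at once
            try {
                return map.get(key);
            } finally {
                lock.readLock().unlock();
            }
        }
    }

The same pattern works for an ArrayList or a POJO: every read path takes the read lock, every write path takes the write lock.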

Related

MultiThreading : Map Performance

In a multithreaded context I want to use a map which will be updated. Which Map will perform better: 1. HashMap or 2. ConcurrentHashMap? Also, will it perform slower if I make it volatile?
It is going to be used in a Java batch for approx. 20 million records.
Currently I am not sharing this map among threads.
Will sharing the map among threads reduce performance?
HashMap will be better performance-wise, as it is not synchronized in any way. ConcurrentHashMap adds overhead to manage concurrent read and - especially - concurrent write access.
That being said, in a multithreaded environment, you are responsible for synchronizing access to HashMap as needed, which will cost performance, too.
Therefore, I would go for HashMap only if the use case allows for very specific optimization of the synchronization logic. Otherwise, ConcurrentHashMap will save you a lot of time working out the synchronization.
However, please note that even with ConcurrentHashMap you will need to carefully consider what level of synchronization you need. ConcurrentHashMap is thread-safe, but not fully synchronized. For instance, if you absolutely need to synchronize each read access with each write access, you will still need custom logic, since for a read operation ConcurrentHashMap will provide the state after the last successfully finished write operation. That is, there might still be an ongoing write operation which will not be seen by the read.
As for volatile, this only ensures that writes to that particular field (i.e. replacing the map reference) are visible to other threads. Since you will likely not change the reference to the HashMap / ConcurrentHashMap, but work on the instance, the performance overhead will be negligible.
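To illustrate the distinction drawn above, a small sketch (the field and method names are invented): volatile helps when you publish a whole new map instance through the field, but it does nothing for mutations of the instance the field points to:

    import java.util.HashMap;
    import java.util.Map;

    public class VolatileReferenceExample {
        // volatile only guarantees visibility of writes to this field,
        // i.e. of swapping in a different Map instance.
        private volatile Map<String, String> snapshot = new HashMap<>();

        // Publication pattern: build a new map, then assign it to the volatile field.
        public void publish(Map<String, String> updated) {
            snapshot = new HashMap<>(updated);  // readers see either the old or the new snapshot
        }

        // NOT made safe by volatile: mutating the instance the field points to.
        public void unsafeMutate(String key, String value) {
            snapshot.put(key, value);           // no happens-before edge for other readers
        }

        public String read(String key) {
            return snapshot.get(key);
        }
    }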

Meaning of 'thread-safe' in ConcurrentHashMap

It's understood that CHM provides performance gains when used instead of Collections.synchronizedMap(...) or Hashtable, due to bucket-level locking.
It's also understood that when we do not want our iterator to throw ConcurrentModificationException, we use CHM.
BUT I am confused about the meaning of thread-safe in the context of CHM, since it does not prevent retrievals and writes from overlapping.
What thread safety means is that you are permitted to share a ConcurrentHashMap object across multiple threads, and to access/modify that object concurrently without external locking.
The exact semantics are explained in the documentation:
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset.
For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries. Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.
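A small sketch of that weakly consistent iteration behaviour; the keys, counts and thread setup are purely illustrative:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class WeaklyConsistentIteration {
        public static void main(String[] args) throws InterruptedException {
            Map<Integer, String> map = new ConcurrentHashMap<>();
            for (int i = 0; i < 5; i++) map.put(i, "v" + i);

            // Writer keeps adding entries while the main thread iterates.
            Thread writer = new Thread(() -> {
                for (int i = 5; i < 1000; i++) map.put(i, "v" + i);
            });
            writer.start();

            // Iteration never throws ConcurrentModificationException; it may or
            // may not see entries added after the iterator was created.
            int seen = 0;
            for (Map.Entry<Integer, String> e : map.entrySet()) {
                seen++;
            }
            writer.join();
            System.out.println("iterated over " + seen + " entries; map now holds " + map.size());
        }
    }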
As Aix mentioned, sharing a ConcurrentHashMap object across multiple threads without external locking is thread-safe.
The way it works is that ConcurrentHashMap (Sun's implementation at the time) divides the underlying map into a number of separate buckets. Getting an element does not require any locking per se, but it does use atomic/volatile operations, which imply a memory barrier (potentially costly, and interfering with other possible optimisations). In Collections.synchronizedMap(...), by contrast, a single lock is applied to the whole map, which is more costly (compare the sketch below).
Usage:
ConcurrentHashMap is implemented for higher throughput in cases where high concurrency is expected
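A hedged side-by-side sketch of the two alternatives; note that iterating over a Collections.synchronizedMap still requires manual locking on the wrapper, while ConcurrentHashMap does not:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class MapChoices {
        public static void main(String[] args) {
            // Every get/put goes through one intrinsic lock on the wrapper object.
            Map<String, Integer> coarse = Collections.synchronizedMap(new HashMap<>());

            // Reads are non-blocking; writes contend only when they hit the same bin.
            Map<String, Integer> fine = new ConcurrentHashMap<>();

            coarse.put("a", 1);
            fine.put("a", 1);

            // Iterating the synchronized wrapper must be guarded manually:
            synchronized (coarse) {
                for (Map.Entry<String, Integer> e : coarse.entrySet()) {
                    System.out.println(e);
                }
            }

            // Iterating the ConcurrentHashMap needs no external lock:
            for (Map.Entry<String, Integer> e : fine.entrySet()) {
                System.out.println(e);
            }
        }
    }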

Thread safe Hash Map?

I am writing an application which will return a HashMap to the user. The user will get a reference to this map.
On the backend, I will be running some threads which will update the map.
What I have done so far:
I have made all the backend threads share a common channel to update the map, so on the backend I am sure that concurrent write operations will not be an issue.
Issues I am having:
If the user tries to update the map while it is simultaneously being updated on the backend --> concurrent write problem.
If the user tries to read something from the map while it is simultaneously being updated on the backend --> concurrent read and write problem.
Until now I have not faced any such issue, but I am afraid that I may in the future. Please give suggestions.
I am using ConcurrentHashMap<String, String>.
You are on the right track using ConcurrentHashMap. For each point:
Check out the methods putIfAbsent and replace; both are thread-safe and combine checking the current state of the map and updating it into one atomic operation (see the sketch below).
The get method is not synchronized internally but will return the most recent value for the specified key available to it (check the ConcurrentHashMap class Javadoc for discussion).
The benefit of ConcurrentHashMap over something like Collections.synchronizedMap is the combined methods like putIfAbsent, which provide traditional Map get and put logic in an internally synchronized way. Use these methods and do not try to provide your own custom synchronization over ConcurrentHashMap, as it will not work. The java.util.concurrent collections are internally synchronized, and other threads will not respond to attempts at synchronizing on the object (e.g. synchronized(myConcurrentHashMap){} will not block other threads).
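A minimal sketch of those atomic methods in use; the keys and values are invented for illustration:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class AtomicMapOps {
        public static void main(String[] args) {
            ConcurrentMap<String, String> map = new ConcurrentHashMap<>();

            // Atomic "insert only if missing": no check-then-act race.
            String previous = map.putIfAbsent("config", "default");
            System.out.println(previous);    // null -> this call inserted the mapping

            // Atomic "replace only if currently mapped to the expected old value".
            boolean swapped = map.replace("config", "default", "tuned");
            System.out.println(swapped);     // true if no other thread changed it in between

            // Do NOT wrap these calls in synchronized(map) { ... }: the map does not
            // use its own monitor internally, so that adds no extra protection.
        }
    }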
Side Note:
You might want to look into the lock free hash table implementation by Cliff Click, it's part of the Highly Scalable Java library
(Here's a Google Talk by Cliff Click about this lock free hash.)
ConcurrentHashMap was designed and implemented to avoid any issues with the scenarios you describe. You have nothing to worry about.
A hash table supporting full concurrency of retrievals and adjustable expected concurrency for updates.
javadoc of ConcurrentHashMap

Is a HashMap thread-safe for different keys?

If I have multiple threads accessing a HashMap, but can guarantee that they'll never access the same key at the same time, could that still lead to a race condition?
In @dotsid's answer he says this:
If you change a HashMap in any way then your code is simply broken.
He is correct. A HashMap that is updated without synchronization will break even if the threads are using disjoint sets of keys. Here are just some¹ of the things that can go wrong.
If one thread does a put, then another thread may see a stale value for the hashmap's size.
If one thread does a put with a key that is (currently) in the same hash bucket as the second thread's key, the second thread's map entry might get lost, temporarily or permanently. It depends on how the hash chains (or whatever) are implemented.
When a thread does a put that triggers a rebuild of the table, another thread may see transient or stale versions of the hashtable array reference, its size, its contents or the hash chains. Chaos may ensue.
When a thread does a put for a key that collides with some key used by some other thread, and the latter thread does a put for its key, then the latter might see a stale copy of the hash chain reference. Chaos may ensue.
When one thread probes the table with a key that collides with one of some other thread's keys, it may encounter that key on the chain. It will call equals on that key, and if the threads are not synchronized, the equals method may encounter stale state in that key.
And if you have two threads simultaneously doing put or remove requests, there are numerous opportunities for race conditions.
I can think of three solutions:
Use a ConcurrentHashMap.
Use a regular HashMap but synchronize on the outside; e.g. using primitive mutexes, Lock objects, etcetera. But beware that this could lead to a concurrency bottleneck due to lock contention.
Use a different HashMap for each thread. If the threads really have a disjoint set of keys, then there should be no need (from an algorithmic perspective) for them to share a single Map. Indeed, if your algorithms involve the threads iterating the keys, values or entries of the map at some point, splitting the single map into multiple maps could give a significant speedup for that part of the processing. (A sketch of this per-thread approach follows after the footnote.)
1 - We cannot enumerate all of the possible things that could go wrong. For a start, we can't predict how all JVMs will handle the unspecified aspects of the JMM ... on all platforms. But you should NOT be relying on that kind of information anyway. All you need to know is that it is fundamentally wrong to use a HashMap like this. An application that does this is broken ... even if you haven't observed the symptoms of the brokenness yet.
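A sketch of the third option (one private HashMap per thread, merged only after the workers finish); the task split and key ranges are invented for illustration:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class MapPerThread {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            List<Future<Map<Integer, Integer>>> futures = new ArrayList<>();

            // Each task fills its own private HashMap over a disjoint key range.
            for (int t = 0; t < 4; t++) {
                final int offset = t * 1_000;
                futures.add(pool.submit((Callable<Map<Integer, Integer>>) () -> {
                    Map<Integer, Integer> local = new HashMap<>();
                    for (int i = 0; i < 1_000; i++) local.put(offset + i, i * i);
                    return local;
                }));
            }

            // Merge once all workers are done; no map is ever accessed concurrently.
            Map<Integer, Integer> merged = new HashMap<>();
            for (Future<Map<Integer, Integer>> f : futures) merged.putAll(f.get());
            pool.shutdown();
            System.out.println(merged.size());  // 4000
        }
    }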
Just use a ConcurrentHashMap. The ConcurrentHashMap uses multiple locks which cover a range of hash buckets to reduce the chances of a lock being contested. There is a marginal performance impact to acquiring an uncontested lock.
To answer your original question: According to the javadoc, as long as the structure of the map doesn't change, you are fine. This means no removing elements at all and no adding new keys that are not already in the map. Replacing the value associated with existing keys is fine.
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)
It makes no guarantees about visibility, though, so you have to be willing to accept occasionally retrieving stale associations.
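A tiny sketch of the pattern that quote permits, assuming all keys are inserted up front and other threads afterwards only read or replace values; as noted above, stale reads are still possible without synchronization:

    import java.util.HashMap;
    import java.util.Map;

    public class ReplaceValuesOnly {
        public static void main(String[] args) {
            // All keys inserted before any other thread ever touches the map.
            Map<String, Integer> stats = new HashMap<>();
            stats.put("hits", 0);
            stats.put("misses", 0);

            // Later, a writer only replaces values for existing keys
            // (no structural modification):
            stats.put("hits", 42);

            // Readers in other threads will not corrupt the table, but without
            // synchronization they may still observe a stale value such as 0.
            System.out.println(stats.get("hits"));
        }
    }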
It depends on what you mean by "accessing". If you are just reading, you can even read the same keys, as long as visibility of the data is guaranteed under the "happens-before" rules. This means that the HashMap shouldn't change and that all changes (the initial construction) should be completed before any reader starts accessing the HashMap.
If you change a HashMap in any way then your code is simply broken. @Stephen C provides a very good explanation of why.
EDIT: If the first case is your actual situation, I recommend using Collections.unmodifiableMap() to be sure that your HashMap is never changed (sketched below). The objects the HashMap points to should not change either, so aggressive use of the final keyword can help you.
And as @Lars Andren says, ConcurrentHashMap is the best choice in most cases.
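A small sketch of that suggestion: build the map once, freeze it with Collections.unmodifiableMap(), and publish it through a static final field (the config keys are invented):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    public class FrozenMapPublication {
        // Built once during class initialisation, then never structurally changed.
        private static final Map<String, String> CONFIG;

        static {
            Map<String, String> tmp = new HashMap<>();
            tmp.put("host", "localhost");
            tmp.put("port", "8080");
            CONFIG = Collections.unmodifiableMap(tmp);  // any later put() throws UnsupportedOperationException
        }

        public static String get(String key) {
            return CONFIG.get(key);  // safe to call from any thread
        }
    }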
Modifying a HashMap without proper synchronization from two threads may easily lead to a race condition.
When a put() leads to a resize of the internal table, this takes some time and the other thread continues to write to the old table.
Two put() calls for different keys lead to an update of the same bucket if the keys' hashcodes are equal modulo the table size. (Actually, the relation between hashcode and bucket index is more complicated, but collisions may still occur.)
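To make the second point concrete, a rough sketch of a bucket collision; the index calculation is a simplification of what HashMap really does internally, but "FB" and "Ea" genuinely share the same hashCode:

    public class BucketCollision {
        public static void main(String[] args) {
            int tableSize = 16;  // default HashMap capacity

            // Simplified bucket index: the real HashMap also spreads the high bits,
            // but the idea is the same - the index depends on hashCode modulo table size.
            String a = "FB";
            String b = "Ea";
            System.out.println(a.hashCode() + " -> bucket " + (a.hashCode() & (tableSize - 1)));
            System.out.println(b.hashCode() + " -> bucket " + (b.hashCode() & (tableSize - 1)));
            // Both print hashCode 2236 and bucket 12, so two threads putting these
            // keys concurrently would race on the same chain.
        }
    }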
