I am writing an application that returns a HashMap to the user. The user gets a reference to this map.
On the backend, I will be running some threads that update the map.
What have I done so far?
I have made all the backend threads share a common channel for updating the map, so on the backend I am sure that concurrent writes will not be an issue.
Issues I am having
If the user tries to update the map while it is being updated on the backend --> concurrent write problem.
If the user tries to read from the map while it is being updated on the backend --> concurrent read and write problem.
Until now I have not faced any such issue, but I am afraid I may face one in the future. Please give suggestions.
I am using ConcurrentHashMap<String, String>.
You are on the right track using ConcurrentHashMap. For each point:
Check out the methods putIfAbsent and replace. Both are thread-safe, and each combines checking the current state of the map and updating it into one atomic operation.
The get method is not synchronized internally but will return the most recent value for the specified key available to it (check the ConcurrentHashMap class Javadoc for discussion).
The benefit of ConcurrentHashMap over something like Collections.synchronizedMap is the combined methods like putIfAbsent, which provide traditional Map get and put logic in an internally synchronized way. Use these methods and do not try to layer your own custom synchronization over ConcurrentHashMap, as it will not work. The java.util.concurrent collections are internally synchronized and their operations do not lock on the collection object itself, so synchronizing on it has no effect on other threads (e.g. synchronized(myConcurrentHashMap){} will not block other threads from using the map).
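The two atomic methods mentioned above could be used roughly like this. This is only a minimal sketch: the class name AtomicMapOps, the helper method names, and the "status" key are made up for illustration and are not from the question.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class AtomicMapOps {
    // Hypothetical shared map; in the question this would be the map handed to the user.
    static final ConcurrentMap<String, String> MAP = new ConcurrentHashMap<>();

    // Inserts only when the key is missing. Returns true if this call stored the value.
    static boolean insertIfMissing(String key, String value) {
        return MAP.putIfAbsent(key, value) == null;
    }

    // Replaces only if the current value is still the one the caller last saw.
    static boolean updateIfUnchanged(String key, String expected, String next) {
        return MAP.replace(key, expected, next);
    }

    public static void main(String[] args) {
        System.out.println(insertIfMissing("status", "NEW"));            // true
        System.out.println(insertIfMissing("status", "OTHER"));          // false
        System.out.println(updateIfUnchanged("status", "NEW", "DONE"));  // true
        System.out.println(updateIfUnchanged("status", "NEW", "LATE"));  // false
        System.out.println(MAP.get("status"));                           // DONE
    }
}
```

If a user thread and a backend thread race on the same key, exactly one of two competing replace calls with the same expected value succeeds; the loser can re-read and retry.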
Side Note:
You might want to look into the lock-free hash table implementation by Cliff Click; it is part of the Highly Scalable Java library.
(Here's a Google Talk by Cliff Click about this lock free hash.)
ConcurrentHashMap was designed and implemented to avoid any issues with the scenarios you describe. You have nothing to worry about.
A hash table supporting full concurrency of retrievals and adjustable expected concurrency for updates.
javadoc of ConcurrentHashMap
Since Java 8 we can use the .compute* methods on ConcurrentHashMap to synchronize processing by key: if two threads call a .compute* method on the same key at the same time, the callbacks are still executed one after another and not simultaneously. But ConcurrentHashMap does not offer time-based removal of entries the way caches usually do.
Guava/Caffeine caches can automatically delete values based on time, but you lose the per-key synchronized processing that ConcurrentHashMap offers (you can get a ConcurrentMap using the asMap method, but its .compute* implementations don't provide synchronization based on key).
My goal is to have both processing synchronization by key as in ConcurrentHashMap AND removals by time as in Guava/Caffeine.
What is the best way to achieve it in Java?
I was wrong about Caffeine: it supports atomic operations with compute.
Guava added support in version 21.
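For reference, the per-key behaviour of compute described in the question can be sketched with the JDK alone. The class and method names below are hypothetical, and the sketch does not cover time-based eviction, which needs a cache library.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PerKeyCompute {
    // Starts `threads` workers that each increment the same key `perThread` times.
    static int countWithCompute(int threads, int perThread) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    // The remapping function is applied atomically for this key,
                    // so no increment is lost even under contention.
                    counts.compute("hits", (k, v) -> v == null ? 1 : v + 1);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        System.out.println(countWithCompute(4, 10_000)); // 40000
    }
}
```

The same read-modify-write done with separate get and put calls could lose updates, which is why the callback form matters here.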
What is a generic approach to achieve thread safety when an object (e.g. a HashMap or an ArrayList or some POJO) is always modified by a single (same) thread but can be accessed by multiple threads?
HashMap is of most interest to me, but I need a generic approach.
Is it enough to make it volatile?
Thanks.
Maybe you should take a look at ConcurrentHashMap.
public class ConcurrentHashMap<K,V>
extends AbstractMap<K,V>
implements ConcurrentMap<K,V>, Serializable
A hash table supporting full concurrency of retrievals and high expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access. This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.) For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries. Similarly, Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time. Bear in mind that the results of aggregate status methods including size, isEmpty, and containsValue are typically useful only when a map is not undergoing concurrent updates in other threads. Otherwise the results of these methods reflect transient states that may be adequate for monitoring or estimation purposes, but not for program control.
More info here :
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html
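The iterator guarantee quoted from the Javadoc above (no ConcurrentModificationException) can be seen even in a single thread. A minimal sketch, with an illustrative class name and made-up data:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WeaklyConsistentIteration {
    // Removes the even keys while iterating and returns how many entries remain.
    static int removeEvenKeysWhileIterating() {
        Map<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 10; i++) {
            map.put(i, "v" + i);
        }
        // With a plain HashMap, calling map.remove inside this loop would
        // typically fail with ConcurrentModificationException.
        for (Integer key : map.keySet()) {
            if (key % 2 == 0) {
                map.remove(key);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(removeEvenKeysWhileIterating()); // 5
    }
}
```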
Programmatically, it is used much like a classic HashMap (one difference: it does not permit null keys or values).
A generic approach will not be easy to implement and will need a lot of effort on your part, and still things will go wrong (as it is too difficult). So, your best bet is ConcurrentHashMap as suggested by Gomoku7.
A generic approach will require an implementation based on locks. You will have to lock objects before an update and release the lock afterwards. There are different types of locks in Java, so choose the ones that suit your needs. Some tips:
final is your friend: do not mutate objects unless it is necessary, and make objects final where possible
Avoid creating temporary variables
Use Executors and Fork/Join where possible
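As one possible shape of the lock-based approach described above, here is a minimal sketch using ReentrantReadWriteLock from the JDK. The GuardedMap class is hypothetical; it assumes that every access to the underlying HashMap goes through these methods.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GuardedMap<K, V> {
    // The plain HashMap never escapes this class, so the lock covers every access.
    private final Map<K, V> data = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Many readers may hold the read lock at the same time.
    V get(K key) {
        lock.readLock().lock();
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock is exclusive: it waits for readers and blocks new ones.
    void put(K key, V value) {
        lock.writeLock().lock();
        try {
            data.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        GuardedMap<String, Integer> map = new GuardedMap<>();
        map.put("answer", 42);
        System.out.println(map.get("answer")); // 42
    }
}
```

Making the field volatile alone would not help here, because volatile covers the reference and not the HashMap's internal state.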
In a multithreaded program I want to use a map that will be updated. Which map is better in terms of performance: 1. HashMap or 2. ConcurrentHashMap? Also, will it be slower if I make it volatile?
It is going to be used in a Java batch job for approx. 20 million records.
Currently I am not sharing this map among threads.
Will sharing the map among threads reduce performance?
HashMap will be better performance-wise, as it is not synchronized in any way. ConcurrentHashMap adds overhead to manage concurrent read and - especially - concurrent write access.
That being said, in a multithreaded environment, you are responsible for synchronizing access to HashMap as needed, which will cost performance, too.
Therefore, I would go for HashMap only if the use case allows for very specific optimization of the synchronization logic. Otherwise, ConcurrentHashMap will save you a lot of time working out the synchronization.
However, please note that even with ConcurrentHashMap you will need to carefully consider what level of synchronization you need. ConcurrentHashMap is thread-safe, but not fully synchronized. For instance, if you absolutely need to synchronize each read access with each write access, you will still need custom logic, since for a read operation ConcurrentHashMap will provide the state after the last successfully finished write operation. That is, there might still be an ongoing write operation which will not be seen by the read.
As for volatile, this only ensures that changes to that particular field (the reference itself) are visible to other threads. Since you will likely not change the reference to the HashMap / ConcurrentHashMap, but work on the instance, the performance overhead will be negligible.
I have a HashMap that is loaded from a database during setup. A change to the database triggers an update on the map. I need to clear out the entire map and reload it with new values. I need to block any other threads from being able to access the map's get method until the entire map is reloaded.
I cannot use a ConcurrentHashMap because the docs say:
For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries.
I have tried using a synchronized block and a ReadWriteLock for the method that reloads the map but both still allow other threads to read from it. I cannot put a lock on each read because that would hinder the performance of my app.
All read access to the map is provided through a getter. I have considered using a Semaphore but I am not sure it is the best solution. Is there any other solution that would allow optimal performance without a major implementation change?
this.getMap().clear();
this.getMap().putAll(data);
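One way to avoid readers seeing the half-empty state between clear and putAll is to build a fresh map and publish it with a single volatile write. This is a sketch under assumptions: the SnapshotCache class and its method names are invented, and it does not block readers during a reload. Instead they keep seeing the complete old map until the complete new one is published.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SnapshotCache {
    // Readers see either the old snapshot or the new one, never a partial load.
    private volatile Map<String, String> current = Collections.emptyMap();

    // Lock-free read: one volatile read, then a lookup on an unmodified snapshot.
    String get(String key) {
        return current.get(key);
    }

    // Copy the new data off to the side, then swap the reference in one step.
    void reload(Map<String, String> data) {
        current = Collections.unmodifiableMap(new HashMap<>(data));
    }

    public static void main(String[] args) {
        SnapshotCache cache = new SnapshotCache();
        Map<String, String> fromDb = new HashMap<>();
        fromDb.put("region", "eu");
        cache.reload(fromDb);
        System.out.println(cache.get("region")); // eu
    }
}
```

Whether a briefly stale read is acceptable depends on the application; if readers must truly wait for the reload, a read/write lock on every access is still needed.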
First, I'll describe what I want and then I'll elaborate on the possibilities I am considering. I don't know which is the best so I want some help.
I have a hash map on which I do read and write operations from a Servlet. Now, since this Servlet is on Tomcat, I need the hash map to be thread safe. Basically, when it is being written to, nothing else should write to it and nothing should be able to read it as well.
I have seen ConcurrentHashMap but noticed its get method is not thread-safe. Then, I have seen locks and something called synchronized.
I want to know which is the most reliable way to do it.
ConcurrentHashMap.get() is thread safe.
You can make HashMap thread safe by wrapping it with Collections.synchronizedMap().
EDIT: removed false information
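In any case, the synchronized keyword is a safe bet. While one thread is inside a synchronized block, it blocks any other thread from entering a block synchronized on the same object, so every access to the map has to synchronize on it.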
// Anything can modify map at this point, making it not thread safe
map.get(0);
as opposed to
// No other thread that also synchronizes on map can touch it until this block completes
synchronized(map) {
map.get(0);
}
I would suggest going with ConcurrentHashMap. I had the same kind of requirement in our application, although we were a little more focused on the performance side.
I ran both ConcurrentHashMap and the map returned by Collections.synchronizedMap() under various types of load, launching multiple threads at a time using JMeter, and monitored them using JProfiler. After all these tests we came to the conclusion that the map returned by Collections.synchronizedMap() did not perform as well as ConcurrentHashMap.
I have also written a post about my experience with both.
Thanks
Collections.synchronizedMap(new HashMap<K, V>());
Returns a synchronized (thread-safe) map backed by the specified map. In order to guarantee serial access, it is critical that all access to the backing map is accomplished through the returned map.
It is imperative that the user manually synchronize on the returned map when iterating over any of its collection views:
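The manual synchronization that the Javadoc asks for might look like this. It is a minimal sketch: the class name, the snapshotKeys helper, and the sample entries are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SynchronizedMapIteration {
    // Copies the keys out while holding the wrapper's own monitor.
    static List<String> snapshotKeys(Map<String, String> syncMap) {
        List<String> keys = new ArrayList<>();
        // Lock on the synchronized wrapper, not on the backing HashMap.
        synchronized (syncMap) {
            for (String key : syncMap.keySet()) {
                keys.add(key);
            }
        }
        Collections.sort(keys); // HashMap order is unspecified, so sort for stable output
        return keys;
    }

    public static void main(String[] args) {
        Map<String, String> map = Collections.synchronizedMap(new HashMap<String, String>());
        map.put("b", "2");
        map.put("a", "1");
        System.out.println(snapshotKeys(map)); // [a, b]
    }
}
```

Single calls such as get and put are already synchronized by the wrapper; only iteration over its views needs the explicit block.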
This is the point of the ConcurrentHashMap class. It protects your collection when more than one thread accesses it.