I need a static, thread-safe HashMap. I have something like this:
private static Map<String, ConcurrentHashMap<Integer, ClassA>> myCache =
new ConcurrentHashMap<String, ConcurrentHashMap<Integer, ClassA>>();
Even though I am using ConcurrentHashMap, I see that if the main thread adds an element to myCache, and the same key is accessed in another thread even at a later time, that thread does not see the latest data in the myCache object. For example:
Thread 1: Adds entry to the map
myCache = {a1={1=com.x.y.z.ClassA#3ec8b657}}
Thread 2: Accesses the same key a1, but it does not see the data added by Thread 1. Instead it sees an empty value for this key: myCache = {a1={}}
As a result, data is getting corrupted. Entries added for the a1 key in Thread 1 are not visible in Thread 2.
Thanks in advance for any pointers on how I can update this map in a thread-safe manner.
Even though I am using ConcurrentHashMap, I see that if the main thread adds an element to myCache, and the same key is accessed in another thread even at a later time, that thread does not see the latest data in the myCache object.
ConcurrentHashMap is running in a huge number of applications around the world at this very moment. If your fairly typical use case did not work correctly, many critical systems would be failing.
Something is going on, but chances are high that it has nothing to do with ConcurrentHashMap. So here are some questions to help you debug your code:
Are you sure that the cache lookup happens after the cache put? ConcurrentHashMap doesn't save you from race conditions in your code.
Any chance this is a problem with the hashCode() or equals() methods of the key object? By default, hashCode() and equals() compare object identity rather than object value. See the consistency requirements.
Any chance that the cache lookup happens after a cache remove or a timeout of the cache value? Is there cache cleanup logic?
Do you have logic that performs two separate operations on the ConcurrentHashMap, for example testing for the existence of a cache entry and then making another call to put the value? If so, you should be using putIfAbsent(...) or the other atomic calls, as in the sketch below.
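For illustration, here is a minimal sketch (the class and field names are mine, not from the question) contrasting a non-atomic check-then-put with putIfAbsent:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PutIfAbsentSketch {

    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    // Not atomic: another thread can insert or remove the key between the
    // containsKey check and the put, even though each call is thread-safe.
    public void racyPut(String key, String value) {
        if (!cache.containsKey(key)) {
            cache.put(key, value);
        }
    }

    // Atomic: either keeps the existing mapping or inserts the new one.
    public void atomicPut(String key, String value) {
        cache.putIfAbsent(key, value);
    }
}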
If you edit your post and show a small sample of your code with the key object, the real source of the issue may be revealed.
I've never had good results with ConcurrentHashMap. When I need thread safety, I usually do something like this:
import java.util.HashMap;
import java.util.Map;

public class Cache<K, V> {

    private final Map<K, V> cache = new HashMap<>();

    public V get(K key) {
        synchronized (cache) {
            return cache.get(key);
        }
    }

    public void put(K key, V value) {
        synchronized (cache) {
            cache.put(key, value);
        }
    }
}
The answer from @Ryan is essentially correct.
Remember that you must proxy every Map method that you wish to use,
and you must synchronize on the cache element within every proxied method.
For example:
public void clear() {
    synchronized (cache) {
        cache.clear();
    }
}
I am a bit confused regarding one pattern I have seen in some legacy code of ours.
The controller uses a map as a cache, with an approach that should be thread-safe; however, I am still not confident that it actually is. We have a map which is properly synchronized during addition and retrieval, but there is a bit of logic outside the synchronized block that does some additional filtering.
(the map itself and the lists are never accessed outside of this method, so concurrent modification is not an issue; the map holds some stable parameters, which basically never change, but are used often).
The code looks like the following sample:
public class FooBarController {

    private final Map<String, List<FooBar>> fooBarMap =
            new HashMap<String, List<FooBar>>();

    public FooBar getFooBar(String key, String foo, String bar) {
        List<FooBar> foobarList;
        synchronized (fooBarMap) {
            if (fooBarMap.get(key) == null) {
                foobarList = queryDbByKey(key);
                fooBarMap.put(key, foobarList);
            } else {
                foobarList = fooBarMap.get(key);
            }
        }
        for (FooBar fooBar : foobarList) {
            if (foo.equals(fooBar.getFoo()) && bar.equals(fooBar.getBar()))
                return fooBar;
        }
        return null;
    }

    private List<FooBar> queryDbByKey(String key) {
        // ... (simple Hibernate query)
    }

    // ...
}
Based on what I know about the JVM memory model, this should be fine: if one thread populates a list, another one can only retrieve it from the map with proper synchronization in place, ensuring that the entries of the list are visible (putting the list happens-before getting it).
However, we keep seeing cases where an entry expected to be in the map is not found, combined with the typical symptoms of concurrency issues (e.g. intermittent failures in production which I cannot reproduce in my development environment; different threads can properly retrieve the value, etc.).
I am wondering if iterating through the elements of the List like this is thread-safe?
The code you provided is correct in terms of concurrency. Here are the guarantees:
only one thread at a time adds values to the map, because of the synchronization on the map object
values added by one thread become visible to all other threads that enter the synchronized block
Given that, you can be sure that all threads that iterate a list see the same elements. The issues you described are indeed strange but I doubt they're related to the code you provided.
It can be thread-safe only if all accesses to fooBarMap are synchronized. A little out of scope, but it may be safer to use a ConcurrentHashMap.
There is a great article on how hashmaps can be synchronized here.
In a situation like this, the best option is to use ConcurrentHashMap.
Verify that all updates and reads happen in the expected order.
As I understood from your question, there is a fixed set of params which never changes. One of the approaches I prefer in a situation like this is:
I. Create the map cache during startup and keep only one instance of it.
II. Read that map instance anytime, anywhere in the application (see the sketch after this list).
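A minimal sketch of that approach, assuming the parameters can be bulk-loaded once at startup (ParameterCache and loadAllFromDb() are hypothetical names, not from the question):

import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class ParameterCache {

    // Built once during startup and never modified afterwards; the final field
    // and the unmodifiable wrapper make it safe to read from any thread.
    private static final Map<String, List<FooBar>> PARAMS;

    static {
        Map<String, List<FooBar>> tmp = new HashMap<>(loadAllFromDb());
        PARAMS = Collections.unmodifiableMap(tmp);
    }

    public static List<FooBar> get(String key) {
        return PARAMS.get(key);
    }

    // Hypothetical bulk query executed once at startup.
    private static Map<String, List<FooBar>> loadAllFromDb() {
        return new HashMap<>();
    }
}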
In the for loop you are returning a reference to the fooBar objects in foobarList.
So the method calling getFooBar() has access to the map's contents through this fooBar reference.
Try to clone fooBar before returning it from getFooBar(), as in the sketch below.
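A hedged sketch of that suggestion, assuming FooBar has (or is given) a copy constructor, which the question does not show:

for (FooBar fooBar : foobarList) {
    if (foo.equals(fooBar.getFoo()) && bar.equals(fooBar.getBar())) {
        // Return a defensive copy so callers cannot mutate the cached instance.
        return new FooBar(fooBar); // assumes a copy constructor exists
    }
}
return null;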
I am making some changes to some code I have written to try and change it into a multi-threaded solution. Some of the elements from my main class were originally static, and have had to be changed as part of the changes I am making. I had the idea to store them in a HashMap, using the Id of the Thread as the key for retrieving the items - that way I could store a reference to the Runnable class in the hash and access the desired attributes for the given thread by using getters/setters. I defined the below code to do this:
import java.util.HashMap;

public class ThreadContext {

    private static HashMap<String, HashMap<String, Object>> tContext;

    static {
        initThreadContext();
    }

    public static void initThreadContext() {
        String id = String.valueOf(Thread.currentThread().getId());
        tContext = new HashMap<>();
    }

    public static void setObject(String key, Object o) {
        String id = String.valueOf(Thread.currentThread().getId());
        HashMap<String, Object> hash = tContext.get(id);
        if (hash == null) {
            hash = new HashMap<>();
            tContext.put(id, hash);
        }
        hash.put(key, o);
    }

    public static Object getObject(String key) {
        String id = String.valueOf(Thread.currentThread().getId());
        HashMap<String, Object> hash = tContext.get(id);
        if (hash == null) {
            hash = new HashMap<>();
            tContext.put(id, hash);
        }
        Object o = hash.get(key);
        return o;
    }
}
My question is: is it safe to do this, or should I try and find another way to do this? My example appears to work OK, but I'm unsure of any other side effects which may come about because of this.
EDIT: Example usage:
Foo foo = ((Foo)ThreadContext.getObject(Foo.CLASS_IDENTIFIER));
foo.doStuff();
There is already a way to do this using the JDK's ThreadLocal, which stores distinct references for each (local) thread.
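A minimal sketch of the same API expressed with ThreadLocal (the clear() method is an addition here for pool-friendly cleanup; it is not part of the original code):

import java.util.HashMap;
import java.util.Map;

public class ThreadContext {

    // Each thread lazily gets its own map; no synchronization or
    // thread-id keys are needed.
    private static final ThreadLocal<Map<String, Object>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    public static void setObject(String key, Object o) {
        CONTEXT.get().put(key, o);
    }

    public static Object getObject(String key) {
        return CONTEXT.get().get(key);
    }

    // Call when the thread is done with its unit of work (important with thread pools).
    public static void clear() {
        CONTEXT.remove();
    }
}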
Not sure what you are trying to do; however, some of the points you should think about are:
HashMap is not synchronized, so it should only be used where you don't need to worry about multiple threads.
In your case you seem to assume the thread id will be unique, which will not be the case when running on application servers. Some application servers reuse thread ids, and they also use thread pools to reuse threads.
If you want data associated with a single thread, use ThreadLocal. Again, ThreadLocal should be used with caution, as the JVM cannot clear the contents of a ThreadLocal once your thread completes execution if there is a thread pool; you will have to set and clear the data yourself.
The ThreadLocal is certainly a better approach.
But you want feedback on this code, so here it is.
The static block and the init can all be inlined on the static declaration.
You could use an IdentityHashMap and store the thread instances themselves, avoiding the unclear risks around the thread id value stated above.
You could certainly use some static method synchronization for thread safety, but that would create contention. So a ConcurrentHashMap would locate the sub map for each thread, which in turn doesn't need synchronization (since only one thread could access it).
Regarding safety (visibility to other, unintended stack frames) when using a thread pool, an executor, and the like, you can code a try/finally or a closure (Java 8 lambda) to make sure you clean up when you leave your stack frames. No harder than the lock/unlock discipline.
BIG WARNING: if the code needing this custom thread-local (or ANY thread-local) runs inside a ForkJoinTask.compute() on a ForkJoinPool and calls ForkJoinTask.join(), your thread may run other, identical ForkJoinTask.compute() invocations (because of the thread continuation emulation), and your custom thread-local could be initialized again and again (meaning it will be clobbered) before the initial ForkJoinTask.compute() even returns. This means you would need a stack of initial values managed in your try/finally... to tolerate re-entrance.
I need to make the following class thread-safe:
// Shared among all threads
public class SharedCache {

    private Map<Object, Future<Collection<Integer>>> chachedFutures;

    {
        chachedFutures = new ConcurrentHashMap<>(); // not sure about that
    }

    public Future<Collection<Integer>> ensureFuture(Object value,
            FutureFactory<Collection<Integer>> ff) {
        if (chachedFutures.containsKey(value))
            return chachedFutures.get(value);
        Future<Collection<Integer>> ftr = ff.create();
        chachedFutures.put(value, ftr);
        return ftr;
    }

    public Future<Collection<Integer>> remove(Object value) {
        return chachedFutures.remove(value);
    }
}
After reading the article about the ConcurrentHashMap class, it is still difficult for me to make the right decision.
At first I tended to simply make the ensureFuture and remove methods synchronized. That would work, but from a performance standpoint it is not very good because of the mutual exclusion.
I don't know the exact (or even approximate) number of threads accessing the cache simultaneously, nor the size of the cache. Taking into account that
resizing this or any other kind of hash table is a relatively slow operation
I didn't specify the initial size of the map, nor the concurrencyLevel parameter. Is it justified to use ConcurrentHashMap here, or would synchronized methods be enough?
You have the following methods:
public Future<Collection<Integer>> ensureFuture(Object value,
        FutureFactory<Collection<Integer>> ff) {
    if (chachedFutures.containsKey(value))
        return chachedFutures.get(value);
    Future<Collection<Integer>> ftr = ff.create();
    chachedFutures.put(value, ftr);
    return ftr;
}

public Future<Collection<Integer>> remove(Object value) {
    return chachedFutures.remove(value);
}
There are some points to be noticed:
Suppose the ensureFuture method is not synchronized. In that case it is possible that one thread invokes containsKey, which returns true, but before the next line is executed another thread removes the entry for that key. This can lead to a race condition, as it is a check-then-act scenario. Check this as well.
Also, you are using chachedFutures.put(value, ftr), but IMO you should use chachedFutures.putIfAbsent(value, ftr). If the specified key is not already associated with a value (or is mapped to null), this method associates it with the given value and returns null; otherwise it returns the current value. Using this you can also avoid the containsKey check (see the sketch below).
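As an illustrative sketch (using the poster's names; computeIfAbsent is another Java 8 option that also avoids creating a Future which is then discarded when the key already exists):

public Future<Collection<Integer>> ensureFuture(Object value,
        FutureFactory<Collection<Integer>> ff) {
    // Atomically returns the existing future, or creates, registers and
    // returns a new one; no separate containsKey check is needed.
    return chachedFutures.computeIfAbsent(value, k -> ff.create());
}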
Is it justified to use ConcurrentHashMap here or synchronized methods
would be enough?
It depends: a ConcurrentHashMap needs more memory than a HashMap because of its extra bookkeeping. Another alternative is Collections.synchronizedMap, which provides synchronization over a regular HashMap.
I have a class container containing a collection which is going to be used by multiple threads:
public class Container {

    private Map<String, String> map;

    // ctor, other methods reading the map

    public void doSomeWithMap(String key, String value) {
        // do some thread-safe action
        map.put(key, value);
        // do something else, also thread-safe
    }
}
What would be better, to declare the method synchronized:
public synchronized void doSomeWithMap(String key, String value)
or to use standard thread-safe decorator?
Collections.synchronizedMap(map);
Generally speaking, synchronizing the map will protect most access to it without having to think about it further. However, the "synchronized map" is not safe for iteration, which may be an issue depending on your use case. It is imperative that the user manually synchronize on the returned map when iterating over any of its collection views.
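For reference, the iteration pattern for a map returned by Collections.synchronizedMap looks roughly like this (a standalone sketch, not the poster's code):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SynchronizedIterationSketch {
    public static void main(String[] args) {
        Map<String, String> map = Collections.synchronizedMap(new HashMap<>());
        map.put("a", "1");

        // Individual get/put calls are synchronized by the wrapper, but an
        // iterator is not, so iteration must hold the map's own monitor.
        synchronized (map) {
            for (Map.Entry<String, String> e : map.entrySet()) {
                System.out.println(e.getKey() + "=" + e.getValue());
            }
        }
    }
}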
Consider using ConcurrentHashMap if that will meet your use case.
If there is other state to this object that needs to be protected from concurrency errors, then you will need to use synchronized or a Lock.
If your doSomeWithMap method will access the map more than once, you must synchronize the doSomeWithMap method. If the only access is the put() call shown, then it's better to use a ConcurrentHashMap.
Note that "more than once" is any call, and an iterator is by nature many "gets".
If you look at the implementation of SynchronizedMap, you'll see that it's simply a wrapper around a non-thread-safe map that acquires a mutex before calling any method:
public V get(Object key) {
    synchronized (mutex) { return m.get(key); }
}

public V put(K key, V value) {
    synchronized (mutex) { return m.put(key, value); }
}

public Set<Map.Entry<K,V>> entrySet() {
    synchronized (mutex) {
        if (entrySet == null)
            entrySet = new SynchronizedSet<>(m.entrySet(), mutex);
        return entrySet;
    }
}
If all you want is protecting get and put, this implementation does it for you.
However it's not suitable if you want a Map that can be iterated over and updated by two or more threads, in which case you should use a ConcurrentHashMap.
If the other things you do inside doSomeWithMap would cause problems if done concurrently by different threads (for example, they update class-level variables) then you should synchronise the whole method. If this is not the case then you should use a synchronised Map in order to minimise the length of time the synchronisation lock is left in place.
Whether you synchronize the whole method or only a block should depend on your requirements. Please note the following.
When you use a synchronized collection like ConcurrentHashMap, or Collections methods like synchronizedMap(), synchronizedList(), etc., only the Map/List itself is synchronized. To explain a little further, consider:
Map<String, Object> map = new HashMap<>();
Map<String, Object> synchMap = Collections.synchronizedMap(map);
This makes the map's get operation synchronized, but not the objects inside the map.
Object o = synchMap.get("1");// Object referenced by o is not synchronized. Only the map is.
If you want to protect the objects inside the map, then you also have to put the code that uses them inside a synchronized block. This is good to remember, as many people forget to safeguard the objects themselves.
Also have a look at this for a little more info: Collections.synchronizedMap.
I'm in the following situation.
At web application startup I need to load a Map which is thereafter used by multiple incoming threads. That is, requests come in and the Map is used to find out whether it contains a particular key; if so, the value (the object) is retrieved and associated with another object.
Now, at times the content of the Map changes. I don't want to restart my application to reload the new situation. Instead I want to do this dynamically.
However, while the Map is being reloaded (removing all items and replacing them with the new ones), concurrent read requests on that Map still arrive.
What should I do to prevent all read threads from accessing that Map while it's being reloaded? How can I do this in the most performant way, given that I only need this while the Map is reloading, which will only occur sporadically (every x weeks)?
If the above (blocking) is not an option, how can I make sure that, while reloading, my read requests won't suffer from unexpected exceptions (because a key is no longer there, or a value is no longer present or is being reloaded)?
I was given the advice that a ReadWriteLock might help me out. Can someone provide an example of how I should use this ReadWriteLock with my readers and my writer?
Thanks,
E
I suggest handling this as follows:
Have your map accessible at a central place (could be a Spring singleton, a static ...).
When starting to reload, leave that instance as is and work on a different Map instance.
When that new map is filled, replace the old map with this new one (that's an atomic operation).
Sample code:
static volatile Map<U, V> map = ....;
// **************************
Map<U, V> tempMap = new ...;
load(tempMap);
map = tempMap;
Concurrency effects:
volatile helps with visibility of the variable to other threads.
While reloading the map, all other threads see the old value undisturbed, so they suffer no penalty whatsoever.
Any thread that retrieves the map the instant before it is changed will work with the old values.
It can perform several gets on the same old map instance, which is great for data consistency (not reading the first value from the older map and others from the newer one).
It will finish processing its request with the old map, but the next request will ask the map again, and will receive the newer values.
If the client threads do not modify the map, i.e. the contents of the map is solely dependent on the source from where it is loaded, you can simply load a new map and replace the reference to the map your client threads are using once the new map is loaded.
Other than using twice the memory for a short time, no performance penalty is incurred.
In case the map uses too much memory to have 2 of them, you can use the same tactic per object in the map; iterate over the map, construct a new mapped-to object and replace the original mapping once the object is loaded.
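A rough sketch of that per-entry refresh (the key/value types K and V and loadFromSource() are placeholders, not from the question; the map is assumed to be a ConcurrentHashMap so each put is safely published to readers):

// Refresh values in place instead of swapping the whole map.
for (Map.Entry<K, V> e : map.entrySet()) {
    V fresh = loadFromSource(e.getKey()); // hypothetical re-query of one entry
    map.put(e.getKey(), fresh);           // replaces that mapping atomically
}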
Note that changing the reference as suggested by others could cause problems if you rely on the map being unchanged for a while (e.g. if (map.containsKey(key)) { V value = map.get(key); ... }). If you need that, you should keep a local reference to the map:
static Map<U,V> map = ...;

void doSomething() {
    Map<U,V> local = map;
    if (local.containsKey(key)) {
        V value = local.get(key);
        ...
    }
}
EDIT:
The assumption is that you don't want costly synchronization for your client threads. As a trade-off, you allow client threads to finish the work they have already begun before your map changed, ignoring any changes to the map that happen while they are running. This way, you can safely make some assumptions about your map, e.g. that a key is present and always mapped to the same value for the duration of a single request. In the example above, if the reloading thread changed the map just after a client called map.containsKey(key), the client might get null from map.get(key), and you'd almost certainly end this request with a NullPointerException. So if you're doing multiple reads of the map and need to make assumptions like the one mentioned before, it's easiest to keep a local reference to the (maybe obsolete) map.
The volatile keyword isn't strictly necessary here. It would just make sure that the new map is used by other threads as soon as you change the reference (map = newMap). Without volatile, a subsequent read (local = map) could still return the old reference for some time (we're talking about less than a nanosecond though), especially on multicore systems if I remember correctly. I wouldn't care about it, but if you feel a need for that extra bit of multi-threading beauty, you're free to use it of course ;)
I like the volatile Map solution from KLE a lot and would go with that. Another idea that someone might find interesting is to use the map equivalent of a CopyOnWriteArrayList, basically a CopyOnWriteMap. We built one of these internally and it is non-trivial but you might be able to find a COWMap out in the wild:
http://old.nabble.com/CopyOnWriteMap-implementation-td13018855.html
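For illustration only (this is not the implementation from the link above), a minimal copy-on-write map sketch: reads are lock-free against a volatile snapshot, while writes are serialized and republish a fresh copy:

import java.util.HashMap;
import java.util.Map;

public class CopyOnWriteMap<K, V> {

    // Readers always see a complete snapshot that is never mutated after publication.
    private volatile Map<K, V> snapshot = new HashMap<>();

    public V get(K key) {
        return snapshot.get(key);
    }

    // Writers copy the current snapshot, mutate the copy, then republish it.
    public synchronized V put(K key, V value) {
        Map<K, V> copy = new HashMap<>(snapshot);
        V old = copy.put(key, value);
        snapshot = copy;
        return old;
    }

    public synchronized V remove(K key) {
        Map<K, V> copy = new HashMap<>(snapshot);
        V old = copy.remove(key);
        snapshot = copy;
        return old;
    }
}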
This is the example from the JDK javadocs for ReentrantReadWriteLock, an implementation of ReadWriteLock. A few years late, but still valid, especially if you don't want to rely only on volatile.
class RWDictionary {

    private final Map<String, Data> m = new TreeMap<String, Data>();
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private final Lock r = rwl.readLock();
    private final Lock w = rwl.writeLock();

    public Data get(String key) {
        r.lock();
        try { return m.get(key); }
        finally { r.unlock(); }
    }

    public String[] allKeys() {
        r.lock();
        // toArray(new String[0]) is needed so the result is a String[], not an Object[].
        try { return m.keySet().toArray(new String[0]); }
        finally { r.unlock(); }
    }

    public Data put(String key, Data value) {
        w.lock();
        try { return m.put(key, value); }
        finally { w.unlock(); }
    }

    public void clear() {
        w.lock();
        try { m.clear(); }
        finally { w.unlock(); }
    }
}