How to refresh HashMap while clients read from it - java

I have a static HashMap that is initialized on server startup. Clients initialize their data from this map when they log in.
Now I need to refresh this map, but clients can log in and read from this map at the same time.
Can I change the reference to the map, as shown below, while they read?
I can't use synchronized because many clients read at the same time and only one thread is writing.
public void refresh() {
    Map<String, Object> newMap = prepareData();
    map = newMap;
}

Let's assume that "refresh" means that you want to replace all entries in the HashMap with a fresh set loaded from (say) a file.
If the set of keys in the new mapping is a superset of the keys in the original mapping, AND if your application doesn't care if clients can see part of the old mapping and part of the new mapping at the same time, then you could use a ConcurrentHashMap instead of a HashMap, and replace the entries with a sequence of put calls.
However, if the keys are (or could be) different, or if the update needs to be atomic from the client's perspective, then a ConcurrentHashMap is not going to work. Instead, you need to declare map as volatile and implement your refresh() method as per your question.
As you point out, using synchronized (or a single-writer-multiple-reader lock) is liable to lead to a concurrency bottleneck.
Note: using a volatile is likely to give better performance than using a ConcurrentHashMap even in the cases where the latter is a viable solution.
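A minimal sketch of the volatile approach described above. The class name RefreshableConfig, the version parameter and the sample data are assumptions made for illustration; prepareData here is only a stand-in for whatever really loads the data.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder class; names and sample data are illustrative only.
class RefreshableConfig {
    // volatile: a reader that sees the new reference also sees the map's contents.
    private volatile Map<String, Object> map = Collections.emptyMap();

    // Stand-in for the question's prepareData(); a real one might read a file.
    private Map<String, Object> prepareData(int version) {
        Map<String, Object> fresh = new HashMap<>();
        fresh.put("version", version);
        return fresh;
    }

    // Build the replacement completely, then publish it with one reference write.
    public void refresh(int version) {
        map = Collections.unmodifiableMap(prepareData(version));
    }

    // Readers take one snapshot and use only that snapshot for the whole lookup.
    public Object lookup(String key) {
        Map<String, Object> snapshot = map;
        return snapshot.get(key);
    }
}
```

The map is never modified after it has been published, which is what makes the lock-free reads safe.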

First of all, your map needs to be declared as volatile to ensure that each thread sees the latest version of it. Then here is how you could proceed:
public void refresh() {
    synchronized (MyClass.class) {
        Map<String, Object> newMap = prepareData();
        map = Collections.unmodifiableMap(newMap);
    }
}
And your map would be declared as below:
private static volatile Map<String, Object> map = ...

If it's OK for clients to have stale data, then all you need to do is create a new map and point your static reference at it. If a new client comes along while you're doing this, they get the stale data and no harm is done; if they turn up after the switch (reassignment) to the new values has occurred, they will get the new values. Job done.
If it's not ok then you will also, probably, have to inform other clients that existed before the update about the change. In which case you want to use the observer pattern for the updates. In this pattern it's fine if the client connects during the update, because they will be updated as soon as possible after the update is complete.
BTW: in all cases, you really shouldn't be using 'static' for anything. It'll only lead to problems down the line. Rather, create a non-static singleton that holds the map and inject that into your clients/services/whatever.
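A rough sketch of how the observer idea and the non-static holder could fit together. ObservableMapHolder, subscribe and publish are hypothetical names, not part of any library; the sketch assumes listeners are quick and do not throw.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical non-static holder; one instance would be injected where needed.
class ObservableMapHolder {
    private final List<Consumer<Map<String, Object>>> listeners = new CopyOnWriteArrayList<>();
    private volatile Map<String, Object> current = Collections.emptyMap();

    // A new client registers and immediately receives the current data.
    // synchronized so a concurrent publish cannot be overtaken by an older snapshot.
    public synchronized void subscribe(Consumer<Map<String, Object>> listener) {
        listeners.add(listener);
        listener.accept(current);
    }

    // Swap in a defensive, read-only copy, then notify every registered client.
    public synchronized void publish(Map<String, Object> fresh) {
        Map<String, Object> snapshot = Collections.unmodifiableMap(new HashMap<>(fresh));
        current = snapshot;
        for (Consumer<Map<String, Object>> listener : listeners) {
            listener.accept(snapshot);
        }
    }

    // Lock-free read for callers that only need the latest snapshot.
    public Map<String, Object> current() {
        return current;
    }
}
```

Only subscribe and publish take the lock; plain reads through current() stay lock-free.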

Related

Best approach to store a set of static String values

I have a requirement to retrieve query params, let's say clId, clCtx and clName, from an OData URI and use them throughout the program across many classes. Retrieving the query params is an expensive operation in our customised client framework. What is the best approach to retrieve and store the params once per request and use them throughout the program?
My idea was to create a singleton class with a static Java map, as below. The first time, I retrieve the params and store them in the map to use them later.
I would also like to know whether my approach has any issues, like memory leaks or other drawbacks.
public class ClientContainer {
    private static Map<String,String> clientMap;
    private static ClientContainer instance;
    private ClientContainer(){}
    public static ClientContainer getInstance(){
        if(instance == null){
            instance = new ClientContainer();
        }
        return instance;
    }
    private void updateClientMap(Map<String,String> clientMap){
        // the field is static, so it is addressed through the class
        if(ClientContainer.clientMap==null){
            ClientContainer.clientMap = clientMap;
        }
    }
    private Map<String,String> getClientMap(){
        return ClientContainer.clientMap;
    }
}
"Also Would like to know Is my approach has any issues like memory leak or drawbacks."
1) Your ClientContainer class stores the parameters for only one client.
You use a Map<String,String> (values by param) and not a Map<Client, Map<String, String>> (values by param, grouped by client).
Because the singleton pattern means the class is instantiated only once, you are stuck storing the data of a single client request.
2) The lifespan of the request data (param-value pairs) is unbounded this way.
The Map will never be garbage collected, because it is referenced by the singleton, which itself never becomes eligible for garbage collection.
So if you keep the singleton pattern, you should at least key the values by client, and you should also clear the cache each time a request has been completely handled, to avoid a memory leak.
But if I were you, I would probably use another solution:
either I would create the Map as a local variable once, at the point where I receive the request, and then pass it explicitly to each method that needs it,
or I would use a ThreadLocal to store the Map, if each request is handled by a specific thread.
Note that I prefer the first way, which clearly exposes the dependency of the method.
Note also that a Map is not necessarily the best choice if the set of fields is stable and known: a custom class may be clearer, more readable and more robust.
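As a sketch of the preferred option (a custom class created once per request and passed explicitly), assuming the three params from the question are plain strings. ClientParams and RequestHandler are hypothetical names.

```java
// Hypothetical immutable value class holding the three params from the question.
final class ClientParams {
    private final String clId;
    private final String clCtx;
    private final String clName;

    ClientParams(String clId, String clCtx, String clName) {
        this.clId = clId;
        this.clCtx = clCtx;
        this.clName = clName;
    }

    String getClId() { return clId; }
    String getClCtx() { return clCtx; }
    String getClName() { return clName; }
}

// Hypothetical handler: the params are parsed once and then passed down explicitly.
class RequestHandler {
    String handle(ClientParams params) {
        return describe(params);
    }

    // Every method that needs the params declares them, so the dependency is visible.
    private String describe(ClientParams params) {
        return params.getClName() + "@" + params.getClCtx();
    }
}
```

Because the object lives only as long as the request that created it, nothing has to be cleaned up afterwards.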
If you are using the ClientContainer class in a multithreaded environment, it makes sense to make the getInstance() method thread-safe, so that the constructor cannot run more than once. To avoid memory leaks you can use WeakHashMap instead of HashMap.
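One common way to get a thread-safe getInstance() without explicit locking is the initialization-on-demand holder idiom. This is a sketch with a hypothetical class name, not the asker's actual class:

```java
// Hypothetical variant of the container using the holder idiom.
class SafeClientContainer {
    private SafeClientContainer() {}

    // The JVM initializes Holder lazily and exactly once, on first access.
    private static final class Holder {
        static final SafeClientContainer INSTANCE = new SafeClientContainer();
    }

    static SafeClientContainer getInstance() {
        return Holder.INSTANCE;
    }
}
```

Class initialization is guaranteed by the language to happen once, so no synchronized block is needed in getInstance().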

Is iterating over a list retrieved in a synchronized block thread-safe?

I am a bit confused regarding one pattern I have seen in some legacy code of ours.
The controller uses a map as a cache, with an approach that should be thread safe, however I am still not confident it indeed is. We have a map, which is properly synchronized during addition and retrieval, however, there is a bit of logic outside of the synchronized block, that does some additional filtering.
(the map itself and the lists are never accessed outside of this method, so concurrent modification is not an issue; the map holds some stable parameters, which basically never change, but are used often).
The code looks like the following sample:
public class FooBarController {
private final Map<String, List<FooBar>> fooBarMap =
new HashMap<String, List<FooBar>>();
public FooBar getFooBar(String key, String foo, String bar) {
List<FooBar> foobarList;
synchronized (fooBarMap) {
if (fooBarMap.get(key) == null) {
foobarList = queryDbByKey(key);
fooBarMap.put(key, foobarList);
} else {
foobarList = fooBarMap.get(key);
}
}
for(FooBar fooBar : foobarList) {
if(foo.equals(fooBar.getFoo()) && bar.equals(fooBar.getBar()))
return fooBar;
}
return null;
}
private List<FooBar> queryDbByKey(String key) {
// ... (simple Hibernate-query)
}
// ...
}
Based on what I know about the JVM memory model, this should be fine, since if one thread populates a list, another one can only retrieve it from the map with proper synchronization in place, ensuring that the entries of the list is visible. (putting the list happens-before getting it)
However, we keep seeing cases, where an entry expected to be in the map is not found, combined with the typical notorious symptoms of concurrency issues (e.g. intermittent failures in production, which I cannot reproduce in my development environment; different threads can properly retrieve the value etc.)
I am wondering if iterating through the elements of the List like this is thread-safe?
The code you provided is correct in terms of concurrency. Here are the guarantees:
only one thread at a time adds values to map, because of synchronization on map object
values added by thread become visible for all other threads, that enter synchronized block
Given that, you can be sure that all threads that iterate a list see the same elements. The issues you described are indeed strange but I doubt they're related to the code you provided.
It can be thread-safe only if all accesses to fooBarMap are synchronized. A little out of scope, but it may be safer to use a ConcurrentHashMap.
There is a great article on how hashmaps can be synchronized here.
In a situation like this, the best option is to use ConcurrentHashMap.
Verify that all updates and reads happen in the expected order.
As I understand from your question, there is a fixed set of params which never changes. One approach I prefer in a situation like this is:
I. Create the map cache during startup and keep only one instance of it.
II. Read that map instance anytime, anywhere in the application.
In the for loop you are returning a reference to a fooBar object in the foobarList.
So the method calling getFooBar() has access to an object held in the cached list through this fooBar reference.
Try to clone fooBar before returning it from getFooBar().
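For the ConcurrentHashMap suggestion, a minimal sketch might look like this. ListCache and the loader function are hypothetical stand-ins for the controller and its queryDbByKey method, and the sketch assumes the loader never returns null.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache; the loader stands in for the database query.
class ListCache<V> {
    private final Map<String, List<V>> cache = new ConcurrentHashMap<>();
    private final Function<String, List<V>> loader;

    ListCache(Function<String, List<V>> loader) {
        this.loader = loader;
    }

    // computeIfAbsent runs the loader at most once per key, atomically.
    // The stored list is a read-only copy, so callers cannot change cached data.
    List<V> get(String key) {
        return cache.computeIfAbsent(key,
                k -> Collections.unmodifiableList(new ArrayList<>(loader.apply(k))));
    }
}
```

Storing an unmodifiable copy guards the list itself; the elements inside it would still need to be immutable or copied if callers might modify them.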

Java multi threading atomic assignment

This is about the same code as in the question linked below.
Java multi-threading atomic reference assignment
In my code, there
HashMap<String,String> cache = new HashMap<String,String>();
public class myClass {
    private HashMap<String,String> cache = null;
    public void init() {
        refreshCache();
    }
    // this method can be called occasionally to update the cache.
    // Only one thread will get to this code.
    public void refreshCache() {
        HashMap<String,String> newcache = new HashMap<String,String>();
        // code to fill up the new cache
        // and then finally
        cache = newcache; // replace the old cache with the new one atomically
    }
    // Many threads will run this code
    public void getCache(Object key) {
        String ob = cache.get(key);
        // do something
    }
}
I have read sjlee's answer again and again, but I can't understand in which case this code will go wrong. Can anyone give me an example?
Remember, I don't care if the getCache function gets the old data.
I'm sorry I can't add comment to the above question because I don't have 50 reputation.
So I just add a new question.
Without a memory barrier you might see null or an old map, but you could also see an incomplete map, i.e. you see bits of it but not all. This is not a problem if you don't mind entries being missing, but you risk seeing the Map object but not anything it refers to, resulting in a possible NPE.
There is no guarantee you will see a complete Map.
final fields will be visible but non-final fields might not.
This is a very interesting problem, and it shows that one of your core assumptions
"Remember I don't care about the getCache function will get the old
data."
is not correct.
We tend to think that if refreshCache and getCache are not synchronized, the worst case is that we get old data, which is not true.
A call made by the initial thread may never be reflected in other threads. Since cache is not volatile, every thread is free to keep its own local copy of it and never make it consistent across threads.
This is the "visibility" aspect of multi-threading: unless we use appropriate locking or volatile, we do not establish a happens-before relationship, which is what forces the value of a shared variable to be consistent across the processors the threads are running on. That means cache may never appear initialized to a reader, causing an obvious NPE in getCache.
To understand this properly, I would recommend reading section 16.2.4 of the book "Java Concurrency in Practice", which deals with a similar problem in double-checked locking code.
Possible solutions:
Make both refreshCache and getCache synchronized, so that every thread picks up the current HashMap (locking only the writer gives readers no guarantee), or
make cache volatile, or
call refreshCache in every single thread that calls getCache, which kind of defeats the purpose of a common cache.
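A sketch of the locking option, under the assumption that both the refresh and the read go through the same lock. SynchronizedCache is a hypothetical name, and the fresh entries are passed in rather than loaded.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache where reader and writer use the same monitor.
class SynchronizedCache {
    // Starts empty instead of null, so an early reader cannot hit an NPE.
    private Map<String, String> cache = new HashMap<>();

    // Writer: build the new map from the given entries, then swap it in.
    public synchronized void refreshCache(Map<String, String> freshEntries) {
        cache = new HashMap<>(freshEntries);
    }

    // Reader: acquiring the same lock guarantees it sees the latest swap.
    public synchronized String get(String key) {
        return cache.get(key);
    }
}
```

This is simpler to reason about than volatile, at the cost of readers briefly contending for the lock.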

Declaring a hashmap inside a method

Local variables are thread safe in Java. Is using a hashmap declared inside a method thread safe?
For example:
void usingHashMap()
{
    HashMap<Integer, String> map = new HashMap<Integer, String>();
}
When two threads run the same method, here usingHashMap(), they are in no way related. Each thread will create its own version of every local variable, and these variables will not interact with each other in any way.
If variables aren't local, then they are attached to the instance. In this case, two threads running the same method on the same instance both see the one variable, and this isn't thread-safe.
public class usingHashMapNotThreadSafe {
    // shared by every thread that uses this instance
    HashMap<Integer, String> map = new HashMap<Integer, String>();
    public int work() {
        //manipulating the hashmap here
        return map.size();
    }
}
public class usingHashMapThreadSafe {
    public int worksafe() {
        // created anew for every call, so never shared
        HashMap<Integer, String> map = new HashMap<Integer, String>();
        //manipulating the hashmap here
        return map.size();
    }
}
In the first class, two threads running on the same instance of usingHashMapNotThreadSafe will see the same map. This could be dangerous, because the threads are trying to change map! In the second, two threads running on the same instance of usingHashMapThreadSafe will see totally different maps, and can't affect each other.
As long as the reference to the HashMap object is not published (is not passed to another method), it is threadsafe.
The same applies to the keys/values stored in the map. They need to be either immutable (cannot change their states after being created) or used only within this method.
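A small sketch of this thread confinement in action. LocalMapDemo and its methods are made up for the example; each call builds its own HashMap and the map never leaves the method.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo of a method-local map used from two threads.
class LocalMapDemo {
    // The map is created per call and never escapes, so no locking is needed.
    static int countDistinct(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts.size();
    }

    // Runs the method on two threads; each thread works on its own map.
    static int[] runOnTwoThreads() {
        int[] results = new int[2];
        Thread a = new Thread(() -> results[0] = countDistinct(new String[] {"x", "y", "x"}));
        Thread b = new Thread(() -> results[1] = countDistinct(new String[] {"p", "q", "r"}));
        a.start();
        b.start();
        try {
            // join() also makes the threads' writes to results visible here
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }
}
```

Only the int results are shared between threads, and each thread writes to its own array slot.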
I think to ensure complete concurrency, a ConcurrentHashMap should be used in any case. Even if it is local in scope. ConcurrentHashMap implements ConcurrentMap. The partitioning is essentially an attempt, as explained in the documentation to:
The table is internally partitioned to try to permit the indicated number of concurrent updates without contention. Because placement in hash tables is essentially random, the actual concurrency will vary. Ideally, you should choose a value to accommodate as many threads as will ever concurrently modify the table. Using a significantly higher value than you need can waste space and time, and a significantly lower value can lead to thread contention.

How to use ReadWriteLock?

I'm in the following situation.
At web application startup I need to load a Map which is thereafter used by multiple incoming threads. That is, requests come in and the Map is used to find out whether it contains a particular key, and if so the value (the object) is retrieved and associated with another object.
Now, at times the content of the Map changes. I don't want to restart my application to reload the new situation. Instead I want to do this dynamically.
However, at the time the Map is re-loading (removing all items and replacing them with the new ones), concurrent read requests on that Map still arrive.
What should I do to prevent all read threads from accessing that Map while it's being reloaded ? How can I do this in the most performant way, because I only need this when the Map is reloading which will only occur sporadically (each every x weeks) ?
If the above is not an option (blocking) how can I make sure that while reloading my read request won't suffer from unexpected exceptions (because a key is no longer there, or a value is no longer present or being reloaded) ?
I was given the advice that a ReadWriteLock might help me out. Can you someone provide me an example on how I should use this ReadWriteLock with my readers and my writer ?
Thanks,
E
I suggest handling this as follows:
Have your map accessible at a central place (could be a Spring singleton, a static ...).
When starting to reload, leave the existing instance as is and work in a different Map instance.
When that new map is filled, replace the old map with this new one (that's an atomic operation).
Sample code:
static volatile Map<U, V> map = ....;
// **************************
Map<U, V> tempMap = new ...;
load(tempMap);
map = tempMap;
Concurrency effects :
volatile helps with visibility of the variable to other threads.
While reloading the map, all other threads see the old value undisturbed, so they suffer no penalty whatsoever.
Any thread that retrieves the map the instant before it is changed will work with the old values.
It can ask several gets to the same old map instance, which is great for data consistency (not loading the first value from the older map, and others from the newer).
It will finish processing its request with the old map, but the next request will ask the map again, and will receive the newer values.
If the client threads do not modify the map, i.e. the contents of the map depend solely on the source from which it is loaded, you can simply load a new map and, once it is loaded, replace the reference to the map your client threads are using.
Other than using twice the memory for a short time, no performance penalty is incurred.
In case the map uses too much memory to have two of them, you can use the same tactic per object in the map: iterate over the map, construct a new mapped-to object and replace the original mapping once the object is loaded.
Note that changing the reference as suggested by others could cause problems if you rely on the map being unchanged for a while (e.g. if (map.containsKey(key)) {V value = map.get(key); ...}). If you need that, you should keep a local reference to the map:
static Map<U,V> map = ...;
void doWork() {
    Map<U,V> local = map;
    if (local.containsKey(key)) {
        V value = local.get(key);
        ...
    }
}
EDIT:
The assumption is that you don't want costly synchronization for your client threads. As a trade-off, you allow client threads to finish the work they had already begun before your map changed, ignoring any changes to the map that happen while they are running. This way, you can safely make some assumptions about your map, e.g. that a key is present and always mapped to the same value for the duration of a single request. In the example above, if your reloading thread changed the map just after a client checked that the key is present, the client might get null from map.get(key), and you'd almost certainly end this request with a NullPointerException. So if you're doing multiple reads on the map and need to make assumptions like the one mentioned before, it's easiest to keep a local reference to the (maybe obsolete) map.
The snippet above omits the volatile keyword, but under the Java memory model it is needed for the reference swap to be safe. Declaring map volatile guarantees that other threads see the new map, fully populated, once you have changed the reference (map = newMap). Without volatile, a subsequent read (local = map) may keep returning the old reference for an unbounded time, or may see the new map before all of its contents are visible, especially on multicore systems.
I like the volatile Map solution from KLE a lot and would go with that. Another idea that someone might find interesting is to use the map equivalent of a CopyOnWriteArrayList, basically a CopyOnWriteMap. We built one of these internally and it is non-trivial but you might be able to find a COWMap out in the wild:
http://old.nabble.com/CopyOnWriteMap-implementation-td13018855.html
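To give a feel for the copy-on-write idea, here is a deliberately simplified sketch. It is not the implementation behind the link above; CopyOnWriteMapSketch is a hypothetical name and it covers only get, put and a snapshot accessor.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, minimal copy-on-write map: cheap reads, expensive writes.
class CopyOnWriteMapSketch<K, V> {
    // Readers always see a complete, never-modified map.
    private volatile Map<K, V> current = Collections.emptyMap();

    // Lock-free read against whichever snapshot is current.
    public V get(K key) {
        return current.get(key);
    }

    // Each write copies the whole map, changes the copy and publishes it.
    public synchronized V put(K key, V value) {
        Map<K, V> copy = new HashMap<>(current);
        V previous = copy.put(key, value);
        current = Collections.unmodifiableMap(copy);
        return previous;
    }

    // A stable view for callers that need several consistent reads.
    public Map<K, V> snapshot() {
        return current;
    }
}
```

Every write costs a full copy, so this only pays off when writes are rare compared to reads, as in the reload scenario of the question.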
This is the answer from the JDK javadocs for ReentrantReadWriteLock implementation of ReadWriteLock. A few years late but still valid, especially if you don't want to rely only on volatile
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RWDictionary {
    private final Map<String, Data> m = new TreeMap<String, Data>();
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private final Lock r = rwl.readLock();
    private final Lock w = rwl.writeLock();
    // readers share the read lock, so they do not block each other
    public Data get(String key) {
        r.lock();
        try { return m.get(key); }
        finally { r.unlock(); }
    }
    public String[] allKeys() {
        r.lock();
        try { return m.keySet().toArray(new String[0]); }
        finally { r.unlock(); }
    }
    // writers take the exclusive write lock
    public Data put(String key, Data value) {
        w.lock();
        try { return m.put(key, value); }
        finally { w.unlock(); }
    }
    public void clear() {
        w.lock();
        try { m.clear(); }
        finally { w.unlock(); }
    }
}
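For the reload scenario in the question, the clear and the refill would have to happen under one write lock, otherwise readers could observe an empty or half-filled map in between. A sketch under that assumption follows; ReloadableMap and reload are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical map whose whole content is replaced under a single write lock.
class ReloadableMap<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Many readers may hold the read lock at the same time.
    public V get(K key) {
        lock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Readers are blocked only while the old content is swapped for the new.
    public void reload(Map<K, V> fresh) {
        lock.writeLock().lock();
        try {
            map.clear();
            map.putAll(fresh);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Loading the fresh data before calling reload keeps the time spent holding the write lock short.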
