I have a JSONArray which I am iterating to populate my Map as shown below. My ppJsonArray will have data like this -
[693,694,695,696,697,698,699,700,701,702]
Below is my code, which my static analysis tool flags as having thread-safety issues:
Map<Integer, Integer> m = new HashMap<Integer, Integer>();
ConcurrentMap<String, Map<Integer, Integer>> partitionsToNodeMap = new ConcurrentHashMap<String, Map<Integer, Integer>>();
int hostNum = 2;
JSONArray ppJsonArray = j.getJSONArray("pp");
for (int i = 0; i < ppJsonArray.length(); i++) {
m.put(Integer.parseInt(ppJsonArray.get(i).toString()), hostNum);
}
Map<Integer, Integer> tempMap = partitionsToNodeMap.get("PRIMARY");
if (tempMap != null) {
tempMap.putAll(m);
} else {
tempMap = m;
}
partitionsToNodeMap.put("PRIMARY", tempMap);
But when I run my static analysis tool, it complains:
Non-atomic use of get/check/put on partitionsToNodeMap.put("PRIMARY", tempMap)
Does that mean the code above is not thread safe? How can I resolve this issue?
The above code is not thread safe.
Does it need to be thread safe? (i.e., is partitionsToNodeMap used by more than one thread? Could more than one thread run this routine? Or could thread A update partitionsToNodeMap in some other routine while thread B runs this routine?)
If you answered "yes" to any of those questions, then you probably need to use some kind of synchronization.
partitionsToNodeMap is a ConcurrentHashMap. That will prevent the map structure itself from becoming corrupt if it is updated by more than one thread at one time; but the data in the map presumably aren't just random strings and integers. It probably means something to your program. The fact that the map structure itself is protected from corruption will not prevent the higher-level meaning of the map contents from becoming corrupt.
Can you provide an example how can I protect this?
Not a complete one, because thread-safety is a property of the whole program. You can't do thread-safety function-by-function.
Being thread-safe is all about protecting invariants. An invariant is an assertion about your data that must always be true. For example, if you were modeling a game of Monopoly, one invariant would say that the total amount of money in the game must always be $15,140.
If some thread in the Monopoly game processes a payment by taking X dollars away from one player, and returning it to the bank, that's a two step process, and in-between the two steps the invariant is broken. If the first thread were preempted in-between the two steps, and some other thread counted all of the money in the game, it would get the wrong total.
The main use-case for the Java synchronized keyword (or equivalently, for the java.util.concurrent.locks.ReentrantLock class) is to prevent other threads from seeing broken invariants.
Either way of locking is voluntary. To make it work, you must wrap every block of code that can temporarily break an invariant in a protected block:
synchronized (bankLock) {
    deductNDollarsFrom(N, player);
    giveNDollarsTo(N, bank);
}
AND every block of code that cares about the invariant must also be wrapped in a protected block:
synchronized (bankLock) {
    int totalDollars = countAllMoneyInGame(...);
    if (totalDollars != 15140) {
        throw new CheatingDetectedException(...);
    }
}
Java won't let the balance transfer and the audit happen at the same time because it never allows two threads to synchronize on the same object (bankLock, in this case) at the same time.
You will have to figure out what your invariants are. The static analyzer is telling you that the get()...put() sequence looks like a block of code that might care about an invariant. You have to figure out whether it really does or not. Is there something that some other thread could do in-between the get() and the put() that could cause things to go south? If so then both blocks of code should synchronize on the same object so that they can not both be executed at the same time.
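A minimal sketch of the idea above, assuming a toy game in which all money is held either by the bank or by a single player. The class and method names (MonopolyBank, payBank, countAllMoney) are hypothetical and only serve the illustration. Both the two-step transfer and the audit take the same lock, so the audit can never observe the moment between the two steps:

```java
// Toy model: all money is held either by the bank or by one player.
class MonopolyBank {
    static final int TOTAL_DOLLARS = 15140;

    private final Object bankLock = new Object();
    private int bankCash;
    private int playerCash;

    MonopolyBank(int playerStartingCash) {
        this.playerCash = playerStartingCash;
        this.bankCash = TOTAL_DOLLARS - playerStartingCash;
    }

    // Two-step update: the invariant is broken between the two statements,
    // so both steps run while holding bankLock.
    void payBank(int n) {
        synchronized (bankLock) {
            playerCash -= n;
            bankCash += n;
        }
    }

    // The audit cares about the invariant, so it takes the same lock.
    int countAllMoney() {
        synchronized (bankLock) {
            return bankCash + playerCash;
        }
    }
}
```

If either method skipped the synchronized block, the locking would stop protecting anything, which is what "voluntary" means here.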
Your static analysis tool is confused because what you're doing looks like a classic race condition.
Map<Integer, Integer> tempMap = partitionsToNodeMap.get("PRIMARY"); // GET
if (tempMap != null) { // CHECK
tempMap.putAll(m);
} else {
tempMap = m;
}
partitionsToNodeMap.put("PRIMARY", tempMap); // PUT
If another thread were to call partitionsToNodeMap.put("PRIMARY", ...) after you get the map and assign it to tempMap, you would overwrite the other thread's work, among a myriad of other potential bad things. It seems like you don't have multiple threads accessing it though, so it isn't an issue. However, it would be more clearly expressed as:
Map<Integer, Integer> primaryMap = partitionsToNodeMap.get("PRIMARY");
if (primaryMap != null) {
primaryMap.putAll(m);
} else {
partitionsToNodeMap.put("PRIMARY", m);
}
If you want to make the static analysis tool happy, swap out your concurrent map for a regular map. The code you've provided doesn't require a threadsafe data structure.
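If the map does turn out to be shared between threads, one way to remove the get/check/put race is to let ConcurrentHashMap do the check and the insert in one atomic step. A minimal sketch, assuming Java 8 or later; the class name PartitionRegistry and its methods are made up for illustration, and the inner maps are made concurrent as well so that writing to them from several threads is safe:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical wrapper around the map from the question.
class PartitionRegistry {
    private final ConcurrentMap<String, Map<Integer, Integer>> partitionsToNodeMap =
            new ConcurrentHashMap<>();

    // computeIfAbsent atomically creates the inner map if the key is missing,
    // so two threads can never install competing maps for the same key.
    void addPartitions(String key, Map<Integer, Integer> partitionToHost) {
        partitionsToNodeMap
                .computeIfAbsent(key, k -> new ConcurrentHashMap<>())
                .putAll(partitionToHost);
    }

    Map<Integer, Integer> partitionsFor(String key) {
        return partitionsToNodeMap.get(key);
    }
}
```

Note that putAll on the inner map is not atomic as a whole: a concurrent reader may see some of the new entries before the others. Whether that matters depends on the invariants of the program.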
I am trying to sort objects into five separate groups depending on a weight given to them at instantiation.
Now, I want to sort these objects into the five groups by their weights. In order to do this, each one must be compared to the other.
Now the problem I'm having is these objects are added to the groups on separate worker threads. Each one is sent to the synchronized sorting function, which compares against all members currently in the three groups, after an object has completed downloading a picture.
The groups have been set up as two different maps. The first being a Hashtable, which crashes the program throwing an unknown ConcurrencyIssue. When I use a ConcurrentHashMap, the data is wrong because it doesn't remove the entry in time before the next object is compared against the ConcurrentHashmap. So this causes a logic error and yields groups that are sorted correctly only half of the time.
I need the hashmap to immediately remove the entry from the map before the next sort occurs... I thought synchronizing the function would do this but it still doesn't seem to work.
Is there a better way to sort objects against each other that are being added to a datastructure by worker threads? Thanks! I'm a little lost on this one.
private synchronized void sortingHat(Moment moment) {
try {
ConcurrentHashMap[] helperList = {postedOverlays, chanl_2, chanl_3, chanl_4, chanl_5};
Moment moment1 = moment;
//Iterate over all channels going from highest channel to lowest
for (int i = channelCount - 1; i > 0; i--) {
ConcurrentHashMap<String, Moment> table = helperList[i];
Set<String> keys = table.keySet();
boolean mOverlap = false;
double width = getWidthbyChannel(i);
//If there is no objects in table, don't bother trying to compare...
if (!table.isEmpty()) {
//Iterate over all objects currently in the hashmap
for (String objId : keys) {
Moment moment2 = table.get(objId);
//x-Overlap
if ((moment2.x + width >= moment1.x - width) ||
(moment2.x - width <= moment1.x + width)) {
//y-Overlap
if ((moment2.y + width >= moment1.y - width) ||
(moment2.y - width <= moment1.y + width)) {
//If there is overlap, only replace the moment with the greater weight.
if (moment1.weight >= moment2.weight) {
mOverlap = true;
table.remove(objId);
table.put(moment1.id, moment1);
}
}
}
}
}
//If there is no overlap, add to channel anyway
if (!mOverlap) {
table.put(moment1.id, moment1);
}
}
} catch (Exception e) {
Log.d("SortingHat", e.toString());
}
}
The table.remove(objId) is where the problems occur. Moment A gets sent to sorting function, and has no problems. Moment B is added, it overlaps, it compares against Moment A. If Moment B is less weight than Moment A, everything is fine. If Moment B is weighted more and A has to be removed, then when moment C gets sorted moment A will still be in the hashmap along with moment B. And so that seems to be where the logic error is.
You are having an issue with your synchronization.
The synchronized modifier you use synchronizes on the "this" lock. You can imagine it like this:
public synchronized void foo() { ... }
is the same as
public void foo() {
synchronized(this) {
....
}
}
This means that, before entering the method, the current thread acquires "this" object as a lock. Now, if your worker thread class also has a synchronized method (for adding stuff to the table), that method locks on the worker object, not on this one, so the two do not exclude each other. What you want is for one thread to finish its work before the next one can start its work.
The first being a Hashtable, which crashes the program throwing an unknown ConcurrencyIssue.
This problem occurs when two threads work on the table at the same time. To illustrate, imagine one thread iterating over the keys while another thread calls put(key, value) or remove(key). The iterator can no longer tell what the Hashtable should look like, so it fails fast and throws a ConcurrentModificationException. The same exception is thrown within a single thread if you remove entries from the table while iterating over its keySet(), as the loop in your sortingHat method would. Note: this is a very simplified explanation!
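ConcurrentModificationException is most commonly triggered by structurally modifying a collection while iterating over it, even from a single thread. A minimal sketch of the usual workaround, removing through the iterator itself; the class and method names are hypothetical, and the integer values stand in for the weights in the question:

```java
import java.util.Iterator;
import java.util.Map;

class SafeRemoval {
    // Removes every entry whose value is below the threshold.
    // Removing through the iterator keeps the iterator's bookkeeping in sync;
    // calling table.remove(key) inside a for-each loop over table.keySet()
    // would instead risk a ConcurrentModificationException.
    static void removeBelow(Map<String, Integer> table, int threshold) {
        Iterator<Map.Entry<String, Integer>> it = table.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() < threshold) {
                it.remove();
            }
        }
    }
}
```

This only addresses modification during iteration; access from several threads still needs a lock or a concurrent collection.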
When I use a ConcurrentHashMap, the data is wrong because it doesn't remove the entry in time before the next object is compared against the ConcurrentHashmap
The ConcurrentHashMap is a utility for avoiding said concurrency issues; it is not a magical, multi-functional, unicorn-hunting butter knife. It makes each individual method call safe, so that concurrent calls to put, remove, and so on cannot corrupt the map. It does not have the same functionality as a lock of some sort, which would give one thread exclusive access to the map for a whole sequence of calls.
There could be one thread that wants to call put and one that wants to call remove. The ConcurrentHashMap only guarantees that those calls do not interfere with each other. Which comes first? The map does not decide that; you have to (in this scenario). What you want is for one thread to finish its work before the next one can do its work.
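When the whole check-then-act step concerns a single key, ConcurrentHashMap can run it as one atomic call. A minimal sketch, assuming Java 8 or later; the Moment class here is a hypothetical stand-in with only an id and a weight, and Channel and offer are made-up names, not the asker's actual code:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the asker's class.
class Moment {
    final String id;
    final double weight;

    Moment(String id, double weight) {
        this.id = id;
        this.weight = weight;
    }
}

class Channel {
    private final ConcurrentHashMap<String, Moment> slots = new ConcurrentHashMap<>();

    // Atomically keeps whichever moment has the greater weight for this slot.
    // merge() runs the comparison and the replacement as one step, so no other
    // thread can slip in between the "check" and the "put".
    Moment offer(String slot, Moment candidate) {
        return slots.merge(slot, candidate,
                (current, incoming) -> incoming.weight >= current.weight ? incoming : current);
    }
}
```

This does not cover comparisons that span several keys or several maps, as in the question; those still need a lock around the whole sequence.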
What you really need is up to you. The java.util.concurrent package brings a whole arsenal of classes you could use. For example:
You could use a lock for each Map. With that, each thread (whether sorting, removing, adding or whatever) could first fetch the Lock for said Map and then work on that Map, like this:
public class Worker implements Runnable {
    private int idOfMap = ...;
    @Override
    public void run() {
        Lock lock = getLock(idOfMap);
        // Acquire the lock before the try block, so that unlock() only runs if lock() succeeded.
        lock.lock();
        try {
            // The work goes here
            //...
        } finally {
            lock.unlock();
        }
    }
}
The line lock.lock() ensures that, once the call returns, no other thread is working on the Map and modifying it, so this thread has exclusive access to the Map. No one can sort before you have finished removing the right element.
Of course, you would have to keep those locks somewhere, for example in a data object. That being said, you could also use a Semaphore, use synchronized(map) in each thread, or express your work on the Map as Runnables and pass them to a single thread that runs every Runnable it receives one by one. The possibilities are nearly endless. I would personally recommend starting with the lock.
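One way to "hold said locks" is to keep each map together with the lock that guards it. A minimal sketch under that assumption; GuardedMap and withLock are hypothetical names, not part of java.util.concurrent:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical holder that pairs a map with the lock guarding it.
class GuardedMap<K, V> {
    private final Lock lock = new ReentrantLock();
    private final Map<K, V> map = new HashMap<>();

    // Runs the given work while holding the lock, so a multi-step
    // read-compare-remove-put sequence cannot interleave with another thread's.
    <R> R withLock(Function<Map<K, V>, R> work) {
        lock.lock();
        try {
            return work.apply(map);
        } finally {
            lock.unlock();
        }
    }
}
```

Because the map is never handed out except inside withLock, callers cannot forget to take the lock. The work passed in should not leak the map reference to code that runs later.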
The following code uses the double-checked locking pattern to initialize a variable. I believe the code is thread safe, as the map won't be partially assigned even if two threads enter the getMap() method at the same time, so I don't have to make the map volatile as well. Is the reasoning correct? NOTE: The map is immutable once it is initialized.
class A {
private Map<String, Integer> map;
private final Object lock = new Object();
public static Map<String, Integer> prepareMap() {
Map<String, Integer> map = new HashMap<>();
map.put("test", 1);
return map;
}
public Map<String, Integer> getMap() {
if (map == null) {
synchronized (lock) {
if (map == null) {
map = prepareMap();
}
}
}
return map;
}
}
According to the top names in the Java world, no, it is not thread safe. You can read why here: http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
You are better off using ConcurrentHashMap or synchronizing your Map.
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentHashMap.html
Edit: If you only want to make the initialization of the map thread safe (so that two or more maps are not accidentally created) then you can do two things. 1) initialize the map when it is declared. 2) make the getMap() method synchronized.
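The first option (initialize the map when it is declared) can be sketched as follows, assuming the map really is never modified after construction. EagerA is a hypothetical variant of the class from the question:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical variant of class A that builds the map eagerly.
class EagerA {
    // Built once during construction and never modified afterwards.
    // A final field assigned during construction is safely published to every
    // thread that obtains a reference to the fully constructed object.
    private final Map<String, Integer> map = Collections.unmodifiableMap(prepareMap());

    private static Map<String, Integer> prepareMap() {
        Map<String, Integer> m = new HashMap<>();
        m.put("test", 1);
        return m;
    }

    Map<String, Integer> getMap() {
        return map;
    }
}
```

The trade-off is that the map is built even if getMap() is never called.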
No, your reasoning is wrong, access to the map is not thread safe, because the threads that call getMap() after the initialization may not invoke synchronized(lock) and thus are not in happens-before relation to other threads.
The map has to be volatile.
The compiler could optimize the code by inlining prepareMap(), giving:
public Map<String,Integer> getMap()
{
if(map == null)
{
synchronized(lock)
{
if(map == null)
{
map = new HashMap<>(); // partial map exposed
map.put("test", 1);
}
}
}
return map;
}
}
Having a HashMap under concurrent read and write is VERY dangerous, don't do it. Google HashMap infinite loop.
Solutions -
Expand synchronized to the entire method, so that reading map variable is also under lock. This is a little expensive.
Declare map as volatile, to prevent reordering optimization. This is simple, and pretty cheap.
Use an immutable map. The final fields will also prevent exposing partial object state. In your particular example, we can use Collections.singletonMap. But for maps with more entries, I'm not sure JDK has a public implementation.
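The second option (a volatile field) is the usual way to make double-checked locking correct on Java 5 and later. A minimal sketch; LazyA is a hypothetical variant of the class from the question, and the local variable result is an addition that keeps the fast path to a single volatile read:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical variant of class A with a volatile field.
class LazyA {
    // volatile establishes a happens-before edge between the write of the
    // reference and any later read that sees it.
    private volatile Map<String, Integer> map;
    private final Object lock = new Object();

    private static Map<String, Integer> prepareMap() {
        Map<String, Integer> m = new HashMap<>();
        m.put("test", 1);
        return m;
    }

    Map<String, Integer> getMap() {
        Map<String, Integer> result = map;      // single volatile read on the fast path
        if (result == null) {
            synchronized (lock) {
                result = map;                   // re-check under the lock
                if (result == null) {
                    result = prepareMap();      // fully populate first...
                    map = result;               // ...then publish
                }
            }
        }
        return result;
    }
}
```

The map is fully populated before the reference is written, so a reader that sees a non-null reference also sees the entries.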
This is just one example of how things can go wrong. To fully understand the issues, there is no substitute for reading The "Double-Checked Locking is Broken" Declaration, referenced in a prior answer.
To get anything approaching the full flavor, think about two processors, A and B, each with its own caches, and a main memory that they share.
Suppose Thread A, running on Processor A, first calls getMap. It does several assignments inside the synchronized block. Suppose the assignment to map gets written to main memory first, before Thread A reaches the end of the synchronized block.
Meanwhile, on Processor B, Thread B also calls getMap, and does not happen to have the memory location representing map in its cache. It goes out to main memory to get it, and its read happens to hit just after Thread A's assignment to map, so it sees a non-null map. Thread B does not enter the synchronized block.
At this point, Thread B can go ahead and attempt to use the HashMap, despite the fact that Thread A's work on creating it has not yet been written to main memory. Thread B may even have the memory pointed to by map in its cache because of a prior use.
If you are tempted to try to work around this, consider the following quote from the referenced article:
There are lots of reasons it doesn't work. The first couple of reasons
we'll describe are more obvious. After understanding those, you may be
tempted to try to devise a way to "fix" the double-checked locking
idiom. Your fixes will not work: there are more subtle reasons why
your fix won't work. Understand those reasons, come up with a better
fix, and it still won't work, because there are even more subtle
reasons.
This answer only contains one of the most obvious reasons.
No, it is not thread safe.
The basic reason is that you can have reordering of operations you don't even see in the Java code. Let's imagine a similar pattern with an even simpler class:
class Simple {
int value = 42;
}
In the analogous getSimple() method, you assign /* non-volatile */ simple = new Simple (). What happens here?
1. the JVM allocates some space for the new object
2. the JVM sets some bits of this space to 42 (for value)
3. the JVM returns the address of this space, which is then assigned to simple
Without synchronization instructions to prohibit it, these instructions can be reordered. In particular, steps 2 and 3 can be ordered such that simple gets the new object's address before the constructor finishes! If another thread then reads simple.value, it'll see a value 0 (the field's default value) instead of 42. This is called seeing a partially-constructed object. Yes, that's weird; yes, I've seen things like that happen. It's a real bug.
You can imagine how if the object is a non-trivial object, like HashMap, the problem is even worse; there are a lot more operations, and so more possibilities for weird ordering.
Marking the field as volatile is a way of telling the JVM, "any thread that reads a value from this field must also read all operations that happened before that value was written." That prohibits those weird reorderings, which guarantees you'll see the fully-constructed object.
Unless you declare the map as volatile, this code may be compiled to non-thread-safe code.
The compiler may optimize the expression map == null, cache the value of the expression and thus read the map field only once.
Declaring the field as volatile Map<String, Integer> map instructs the Java VM to always read the field when it is accessed. This forbids such an optimization by the compiler.
Please refer to JLS Chapter 17. Threads and Locks
I have a wordCount(CharacterReader charReader) function which takes a stream of characters, converts them to words.
I also have a Collection<CharacterReader> characerReaders, containing multiple character streams. The number of readers in a collection can vary, I want to read from all streams and have a count of all words.
I'm a little confused about threads and couldn't find any examples which were similar to this.
I essentially want multiple threads outputting their words into a SortedMap so I can have a real time total word count.
How would I go about doing this?
Thanks
If you are going to have multiple threads writing to the map, you need to use a ConcurrentSkipListMap which is both a SortedMap and a ConcurrentMap.
You can create for each CharacterReader in the collection a Runnable which calls the wordCount function (which accesses the map described previously).
After creating the Runnables you can create an ExecutorService (for example using Executors.newCachedThreadPool()), pass it all the Runnables and wait for them to finish (see the example in the javadoc of class ExecutorService).
You can also create the Runnables just before sending them to the ExecutorService.
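The steps above can be sketched as follows. Since CharacterReader and wordCount are not shown in the question, this sketch assumes each source is simply a String of whitespace-separated words; ParallelWordCount and countAll are hypothetical names, and Java 8 or later is assumed for the lambdas and merge:

```java
import java.util.List;
import java.util.SortedMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ParallelWordCount {
    // Each String stands in for one CharacterReader (an assumption of this sketch).
    static SortedMap<String, Integer> countAll(List<String> sources) {
        ConcurrentSkipListMap<String, Integer> counts = new ConcurrentSkipListMap<>();
        ExecutorService pool = Executors.newCachedThreadPool();
        for (String source : sources) {
            // One task per source; all tasks write into the same concurrent map.
            pool.submit(() -> {
                for (String word : source.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word, 1, Integer::sum); // atomic per-key update
                    }
                }
            });
        }
        pool.shutdown(); // accept no new tasks; already submitted tasks still run
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES); // wait for the tasks to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counts;
    }
}
```

For a real-time total, other threads could read the map while the tasks are still running instead of waiting for awaitTermination.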
Create a WordMap class which encapsulates your sorted map, and makes sure all the accesses to the map are properly synchronized. Or use a concurrent map that is already thread safe.
Create an instance of this class. Use the Executors class to create an ExecutorService with the characteristics that you desire.
Then iterate through the collection, and for each reader, create a Callable or a Runnable filling the WordMap instance with the words found in this reader, and submit this Callable or Runnable to the ExecutorService.
vainolo and JB's answers are both good.
I will add one thing, which is a description of how to make a highly concurrent data structure to store your word counts.
As vainolo said, a ConcurrentSkipListMap is the basic data structure you want, because it is both sorted and concurrent. To make good use of it, you want to avoid doing any locking. That means you must avoid patterns which involve a lock-read-write-unlock cycle. That has two consequences: putting a new word in the map should not involve a lock, and incrementing the count of an existing word should not involve a lock.
You can safely add new things to the map using ConcurrentMap's putIfAbsent method. However, that alone is not quite enough, because you have to supply a potential value every time you use it, which is potentially expensive. The easiest thing to do is to use a sort of double-checked locking pattern, where you first simply try to get an existing value, then if you find there isn't one, add a new one with putIfAbsent (you can't simply call put, because there could be a race between two threads putting at the same time).
Incrementing without locking can easily be done by not storing integers in the map, but rather objects which themselves contain integers. That way, you never have to put an incremented value in the map, you just increment the object already there. AtomicInteger seems like a good candidate for this.
Putting that together, you get:
public class WordCounts {
private final ConcurrentMap<String, AtomicInteger> counts
= new ConcurrentSkipListMap<String, AtomicInteger>();
public void count(String word) {
AtomicInteger count = getCount(word);
count.incrementAndGet();
}
private AtomicInteger getCount(String word) {
AtomicInteger count = counts.get(word);
if (count == null) {
AtomicInteger newCount = new AtomicInteger();
count = counts.putIfAbsent(word, newCount);
if (count == null) count = newCount;
}
return count;
}
}
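On Java 8 or later, the same get-then-putIfAbsent idea is available as a single call. A minimal sketch, with WordCounts8 as a hypothetical name; LongAdder is swapped in for AtomicInteger here, which is a choice of this sketch rather than part of the answer above:

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.LongAdder;

class WordCounts8 {
    private final ConcurrentSkipListMap<String, LongAdder> counts =
            new ConcurrentSkipListMap<>();

    // computeIfAbsent returns the adder that is actually in the map. Under a
    // race the mapping function may run redundantly, but only one adder is
    // installed and every caller increments that one.
    void count(String word) {
        counts.computeIfAbsent(word, w -> new LongAdder()).increment();
    }

    long get(String word) {
        LongAdder adder = counts.get(word);
        return adder == null ? 0 : adder.sum();
    }
}
```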
I'm wondering if there's a way in Java to synchronize using two lock objects.
I don't mean locking on either object, I mean locking only on both.
e.g. if I have 4 threads:
Thread A requests a lock using Object1 and Object2
Thread B requests a lock using Object1 and Object3
Thread C requests a lock using Object4 and Object2
Thread D requests a lock using Object1 and Object2
In the above scenario, Thread A and Thread D would share a lock, but Thread B and Thread C would have their own locks. Even though they overlap with one of the two objects, the same lock only applies if it overlaps on both.
So I have a method called by many threads which is going to perform a specific activity type based on a specific database. I have identifier objects for both the database and the activity, and I can guarantee that the action will be thread safe as long as it is not the same activity based on the same database as another thread.
My ideal code would look something like:
public void doActivity(DatabaseIdentifier dbID, ActivityIdentifier actID) {
synchronized( dbID, actID ) { // <--- Not real Java
// Do an action that can be guaranteed thread-safe per unique
// combination of dbID and actID, but needs to share a
// lock if they are both the same.
}
}
I could create a hashmap of lock objects that are keyed by both the DatabaseIdentifier and the ActivityIdentifier, but I'm going to run into the same synchronization issue when I need to create/access those locks in a thread-safe way.
For now I'm just synchronizing on the DatabaseIdentifier. It's much less likely that there will be multiple activities going on at the same time for one DBIdentifier, so I will only rarely be over-locking. (Can't say the same for the opposite direction though.)
Anyone have a good way to handle this that doesn't involve forcing unnecessary threads to wait?
Thanks!
have each DatabaseIdentifier keep a set of locks keyed to ActivityIdentifiers that it owns
so you can call
public void doActivity(DatabaseIdentifier dbID, ActivityIdentifier actID) {
synchronized( dbID.getLock(actID) ) {
// Do an action that can be guaranteed thread-safe per unique
// combination of dbID and actID, but needs to share a
// lock if they are both the same.
}
}
then you only need a (short) lock on the underlying collection (use a ConcurrentHashMap) in dbID
in other words
ConcurrentHashMap<ActivityIdentifier ,Object> locks = new...
public Object getLock(ActivityIdentifier actID){
Object res = locks.get(actID); //avoid unnecessary allocations of Object
if(res==null) {
Object newLock = new Object();
res = locks.putIfAbsent(actID, newLock);
return res!=null?res:newLock;
} else return res;
}
this is better than locking the full action on dbID (especially when its a long action) but still worse than your ideal scenario
update in response to comments about EnumMap
private final Map<ActivityIdentifier, Object> locks;
/**
 * Initializer ensuring all values are initialized.
 */
{
    EnumMap<ActivityIdentifier, Object> tmp = new EnumMap<ActivityIdentifier, Object>(ActivityIdentifier.class);
    for (ActivityIdentifier e : ActivityIdentifier.values()) {
        tmp.put(e, new Object());
    }
    // Read-only view ensures no modifications happen after initialization, making this thread-safe.
    locks = Collections.unmodifiableMap(tmp);
}
public Object getLock(ActivityIdentifier actID) {
    return locks.get(actID);
}
I think you should go the way of the hashmap, but encapsulate that in a flyweight factory. Ie, you call:
FlyweightAllObjectsLock lockObj = FlyweightAllObjectsLock.newInstance(dbID, actID);
Then lock on that object. The flyweight factory can get a read lock on the map to see if the key is in there, and only do a write lock if it is not. It should reduce the concurrency factor.
You might also want to look into using weak references on that map as well, to avoid keeping memory from garbage collection.
I can't think of a way to do this that really captures your idea of locking a pair of objects. Some low-level concurrency boffin might be able to invent one, but i have my doubts about whether we would have the necessary primitives to implement it in Java.
I think the idea of using the pairs as keys to identify lock objects is a good one. If you want to avoid locking, then arrange the lookup so that it doesn't do any.
I would suggest a two-level map, vaguely like:
Map<DatabaseIdentifier, Map<ActivityIdentifier, Lock>> locks;
Used vaguely thus:
synchronized (locks.get(databaseIdentifier).get(activityIdentifier)) {
performSpecificActivityOnDatabase();
}
If you know what all the databases and activities are upfront, then just create a perfectly normal map containing all the combinations when your application starts up, and use it exactly as above. The only locking is on the lock objects, and there is no contention.
If you don't know what the databases and activities will be, or there are too many combinations to create a complete map upfront, then you will need to create the map incrementally. This is where Concurrency Fun Times begin.
The straightforward solution is to lazily create the inner maps and the locks, and to protect these actions with normal locks:
Map<ActivityIdentifier, Object> locksForDatabase;
synchronized (locks) {
locksForDatabase = locks.get(databaseIdentifier);
if (locksForDatabase == null) {
locksForDatabase = new HashMap<ActivityIdentifier, Object>();
locks.put(databaseIdentifier, locksForDatabase);
}
}
Object lock;
synchronized (locksForDatabase) {
    // Look up the lock by activity, not by the map itself.
    lock = locksForDatabase.get(activityIdentifier);
    if (lock == null) {
        lock = new Object();
        locksForDatabase.put(activityIdentifier, lock);
    }
}
synchronized (lock) {
performSpecificActivityOnDatabase();
}
As you are evidently aware, this will lead to too much contention. I mention it only for didactic completeness.
You can improve it by making the outer map concurrent:
ConcurrentMap<DatabaseIdentifier, Map<ActivityIdentifier, Object>> locks;
And:
Map<ActivityIdentifier, Object> newHashMap = new HashMap<ActivityIdentifier, Object>();
Map<ActivityIdentifier, Object> locksForDatabase = locks.putIfAbsent(databaseIdentifier, newHashMap);
if (locksForDatabase == null) locksForDatabase = newHashMap;
Object lock;
synchronized (locksForDatabase) {
    // Look up the lock by activity, not by the map itself.
    lock = locksForDatabase.get(activityIdentifier);
    if (lock == null) {
        lock = new Object();
        locksForDatabase.put(activityIdentifier, lock);
    }
}
synchronized (lock) {
performSpecificActivityOnDatabase();
}
Your only lock contention there will be on the per-database maps, for the duration of a put and a get, and according to your report, there won't be much of that. You could convert the inner map to a ConcurrentMap to avoid that, but that sounds like overkill.
There will, however, be a steady stream of HashMap instances being created to be fed to putIfAbsent and then being thrown away. You can avoid that with a sort of postmodern atomic remix of double-checked locking; replace the first three lines with:
Map<ActivityIdentifier, Object> locksForDatabase = locks.get(databaseIdentifier);
if (locksForDatabase == null) {
Map<ActivityIdentifier, Object> newHashMap = new HashMap<ActivityIdentifier, Object>();
locksForDatabase = locks.putIfAbsent(databaseIdentifier, newHashMap);
if (locksForDatabase == null) locksForDatabase = newHashMap;
}
In the common case that the per-database map already exists, this will do a single concurrent get. In the uncommon case that it does not, it will do an additional but necessary new HashMap() and putIfAbsent. In the very rare case that it does not, but another thread has also discovered that, one of the threads will be doing a redundant new HashMap() and putIfAbsent. That should not be expensive.
Actually, it occurs to me that this is all a terrible idea, and that you should just stick the two identifiers together to make one double-size key, and use that to make lookups in a single ConcurrentHashMap. Sadly, i am too lazy and vain to delete the above. Consider this advice a special prize for reading this far.
PS It always mildly annoys me to see an instance of Object used as nothing but a lock. I propose calling them LockGuffins.
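The single-map idea from the end of the answer above can be sketched like this, assuming Java 8 or later and assuming both identifier types implement equals and hashCode consistently. PairLocks, Key and lockFor are hypothetical names:

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hands out one lock object per (first, second) pair.
class PairLocks<A, B> {
    // Immutable composite key: two keys are equal when both parts are equal.
    private static final class Key {
        private final Object first;
        private final Object second;

        Key(Object first, Object second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return Objects.equals(first, other.first) && Objects.equals(second, other.second);
        }

        @Override
        public int hashCode() {
            return Objects.hash(first, second);
        }
    }

    private final ConcurrentMap<Key, Object> locks = new ConcurrentHashMap<>();

    // Returns the same lock object for equal pairs and different ones otherwise.
    Object lockFor(A a, B b) {
        return locks.computeIfAbsent(new Key(a, b), k -> new Object());
    }
}
```

A caller would then write synchronized (pairLocks.lockFor(dbID, actID)) { ... }. The map only grows in this sketch, so cleaning up unused entries remains an open question if the set of keys changes over time.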
Your hashmap suggestion is what I've done in the past. The only change I'd make is using a ConcurrentHashMap, to minimize the synchronization.
The other issue is how to cleanup the map if the possible keys are going to change.
I'm in the following situation.
At web application startup I need to load a Map which is thereafter used by multiple incoming threads. That is, requests comes in and the Map is used to find out whether it contains a particular key and if so the value (the object) is retrieved and associated to another object.
Now, at times the content of the Map changes. I don't want to restart my application to reload the new situation. Instead I want to do this dynamically.
However, at the time the Map is re-loading (removing all items and replacing them with the new ones), concurrent read requests on that Map still arrive.
What should I do to prevent all read threads from accessing that Map while it's being reloaded? How can I do this in the most performant way, given that I only need it while the Map is reloading, which will happen only sporadically (once every x weeks)?
If the above is not an option (blocking) how can I make sure that while reloading my read request won't suffer from unexpected exceptions (because a key is no longer there, or a value is no longer present or being reloaded) ?
I was given the advice that a ReadWriteLock might help me out. Can someone provide an example of how I should use this ReadWriteLock with my readers and my writer?
Thanks,
E
I suggest handling this as follows:
Have your map accessible at a central place (could be a Spring singleton, a static ...).
When starting to reload, let the instance as is, work in a different Map instance.
When that new map is filled, replace the old map with this new one (that's an atomic operation).
Sample code:
static volatile Map<U, V> map = ....;
// **************************
Map<U, V> tempMap = new ...;
load(tempMap);
map = tempMap;
Concurrency effects :
volatile helps with visibility of the variable to other threads.
While reloading the map, all other threads see the old value undisturbed, so they suffer no penalty whatsoever.
Any thread that retrieves the map the instant before it is changed will work with the old values.
It can ask several gets to the same old map instance, which is great for data consistency (not loading the first value from the older map, and others from the newer).
It will finish processing its request with the old map, but the next request will ask the map again, and will receive the newer values.
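A minimal sketch of the swap described above. ReloadableLookup, reload and snapshot are hypothetical names, and the fresh data is passed in as a Map here, whereas a real loader would read it from wherever the data lives:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ReloadableLookup<K, V> {
    // Readers always see either the previous map or the new one, never a mix.
    private volatile Map<K, V> current = Collections.emptyMap();

    // Build the replacement off to the side, then publish it with one volatile write.
    void reload(Map<K, V> freshData) {
        current = Collections.unmodifiableMap(new HashMap<>(freshData));
    }

    // A request should call this once and use the returned map throughout,
    // so that all of its lookups hit the same version of the data.
    Map<K, V> snapshot() {
        return current;
    }
}
```

Wrapping the published map in unmodifiableMap is an extra safeguard of this sketch: it prevents readers from accidentally changing a map that other threads are reading.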
If the client threads do not modify the map, i.e. the contents of the map is solely dependent on the source from where it is loaded, you can simply load a new map and replace the reference to the map your client threads are using once the new map is loaded.
Other than using twice the memory for a short time, no performance penalty is incurred.
In case the map uses too much memory to have 2 of them, you can use the same tactic per object in the map; iterate over the map, construct a new mapped-to object and replace the original mapping once the object is loaded.
Note that changing the reference as suggested by others could cause problems if you rely on the map being unchanged for a while (e.g. if (map.containsKey(key)) { V value = map.get(key); ... }). If you need that, you should keep a local reference to the map:
static Map<U,V> map = ...;
void doWork() {
    Map<U,V> local = map;
    if (local.containsKey(key)) {
        V value = local.get(key);
        ...
    }
}
EDIT:
The assumption is that you don't want costly synchronization for your client threads. As a trade-off, you allow client threads to finish the work they had already begun before your map changed, ignoring any changes to the map that happen while they are running. This way, you can safely make some assumptions about your map, e.g. that a key is present and always mapped to the same value for the duration of a single request. In the example above, if your reloading thread replaced the map just after a client called map.containsKey(key), the client might get null on map.get(key), and you'd almost certainly end this request with a NullPointerException. So if you're doing multiple reads on the map and need to make assumptions like the one mentioned before, it's easiest to keep a local reference to the (maybe obsolete) map.
The volatile keyword makes sure that the new map is seen by other threads as soon as you change the reference (map = newMap). Without volatile, a subsequent read (local = map) could still return the old reference for some time, especially on multicore systems, and the Java Memory Model gives no guarantee that a reader who does see the new reference also sees the new map fully populated. In practice the stale window is usually tiny, but if you want the guarantee, declare the field volatile ;)
I like the volatile Map solution from KLE a lot and would go with that. Another idea that someone might find interesting is to use the map equivalent of a CopyOnWriteArrayList, basically a CopyOnWriteMap. We built one of these internally and it is non-trivial but you might be able to find a COWMap out in the wild:
http://old.nabble.com/CopyOnWriteMap-implementation-td13018855.html
This is the answer from the JDK javadocs for ReentrantReadWriteLock implementation of ReadWriteLock. A few years late but still valid, especially if you don't want to rely only on volatile
class RWDictionary {
private final Map<String, Data> m = new TreeMap<String, Data>();
private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
private final Lock r = rwl.readLock();
private final Lock w = rwl.writeLock();
public Data get(String key) {
r.lock();
try { return m.get(key); }
finally { r.unlock(); }
}
public String[] allKeys() {
r.lock();
try { return m.keySet().toArray(new String[0]); }
finally { r.unlock(); }
}
public Data put(String key, Data value) {
w.lock();
try { return m.put(key, value); }
finally { w.unlock(); }
}
public void clear() {
w.lock();
try { m.clear(); }
finally { w.unlock(); }
}
}
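Applied to the reload scenario from the question, the same pattern might look like the sketch below. ReloadableDictionary and reload are hypothetical names; the point is that clear and putAll run under one write lock, so readers are blocked only for the duration of the reload and never see a half-filled map:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReloadableDictionary<K, V> {
    private final Map<K, V> m = new HashMap<>();
    private final ReadWriteLock rwl = new ReentrantReadWriteLock();

    // Many readers may hold the read lock at the same time.
    V get(K key) {
        rwl.readLock().lock();
        try {
            return m.get(key);
        } finally {
            rwl.readLock().unlock();
        }
    }

    // The write lock excludes all readers, so the clear and the refill
    // appear to them as a single step.
    void reload(Map<K, V> freshData) {
        rwl.writeLock().lock();
        try {
            m.clear();
            m.putAll(freshData);
        } finally {
            rwl.writeLock().unlock();
        }
    }
}
```

To keep the write lock short, the fresh data should be loaded into a temporary map first and only then passed to reload.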