strange HashMap.put() behaviour - java

I am attempting to troubleshoot an intermittent failure that appears to be related to removing an object from a HashMap and then putting that same object back using a new key. My HashMap is created as follows:
transactions = new HashMap<Short, TransactionBase>();
The code that does the re-assignment is as follows:
transactions.remove(transaction.tran_no);
transaction.tran_no = generate_transaction_id();
transactions.put(transaction.tran_no, transaction);
The intermittent behaviour that I am seeing is that code that executes immediately after this that depends upon the transaction object being locatable does not appear to find the transaction object using the new transaction id. However, at some future point, the transaction can be located. So pulling at straws, is there any sort of asynchronous effect to put() or remove that can cause this sort of behaviour?
I should mention that, to the best of my knowledge, the container is being accessed by only one thread. I have already read in he documentation that class HashMap is not "synchronised".

There is a slight difference between remove/get and put (although my guess is that you have a threading issue).
The parameter for remove/get is of type Object; for put it is of type K. The reason for that has been stated many times before. This means it has problems with boxing. I'm not even going to guess what the rules are. If a value gets boxed as a Byte in one place and a Short in another, then those two objects cannot be equal.
There is a similar issue with List.remove(int) and List.remove(Object).

I presume that every time you check for the presence of the item you're definitely using a short or Short argument to Map.get() or Map.contains()?
These methods take Object arguments so if you pass them an int it will be converted to an Integer and will never match any item in your Map because they will all have Short keys.

There are no "asynchronous" effects in the HashMap class. As soon as you put something in there, it's there. You should double- and triple- check to make sure that there are no threading issues.
The only other thing I can think of is that you're making a copy of the HashMap somewhere. The copy obviously won't be affected by you adding stuff into the original, or vice versa.

Just one suggestion...you're focusing on accesses to the HashMap but I wonder if you should also see if your generate_transaction_id() is thread safe or if it is behaving in an unexpected way.

Have you overridden equals() but not hashCode() in your model objects? How about compareTo() ? If you get these wrong, Collections will behave strangely indeed.
Check Java Practices on equals() and compareTo().

What does this generate_transaction_id() function do? If it is generating a 16-bit GUID-like thing, you could easily get hash collisions. Combined with threading, you could get:
T1: transaction1.tran_no = generate_transaction_id(); => 1729
T2: transaction2.tran_no = generate_transaction_id(); => 1729
T1: transactions.put(transaction1.tran_no, transaction1); => map.put(1729, transaction1)
T2: transactions.put(transaction2.tran_no, transaction2); => map.put(1729, transaction2)
T1: int tran_no = transactions.get(1729); => transaction2
T1: transactions.remove(transaction.tran_no); => map.remove(1729)
T2: int tran_no = transactions.get(1729); => null
Of course, this could only be a solution if that 'to the best of your knowledge' part is not true.

So you know HashMap's not thread safe. Are you sure it's being accessed by only one thread? After all, intermittent failures a frequently threading related. If not, you can wrap it with Collections.synchronizedMap(), like so:
Collections.synchronizedMap(transactions);
You could always just try so you can eliminate that possibility.
It should be pointed out that this just wraps the original map with one with all the methods synchronized. You may want to consider using a synchronized block if the access is localized.

Threading has been mentioned in a few responses already, but have you considered the visibility issue for objects used by multiple threads? It's possible (and quite common) that if you put an object into a collection in one thread, it will not be "published" to other threads unless you have properly synchronized on the collection.
Threads and Locks
Synchronization and thread safety in Java

As others have observed you HAVE to know whether the HashMap is accessed by just one Thread, or not. CollectionSpy is a new profiler that lets you find out instantly, for all containers, how many threads perform any accesses. See www.collectionspy.com for more details.

Related

thread safe map operation

I came across the following piece of code ,and noted a few inconsistencies - for a multi-thread safe code.
Map<String,Map<String,Set<String>> clusters = new HashMap<.........>;
Map<String,Set<String>> servers = clusters.get(clusterkey);
if(servers==null){
synchronized(clusterkey){
servers = clusters.get(clusterkey);
if(servers==null){....initialize new hashmap and put...}
}
}
Set<String> users=servers.get(serverkey);
if(users==null){
synchronized(serverkey){
users=servers.get(serverkey);
if(users==null){ ... initialize new hashset and put...}
}
}
users.add(userid);
Why would map be synchronized on clusterkey- shouldnt it be on the map as monitor itself?
Shouldnt the last users.add... be synchronized too?
This seems to be a lot of code to add a single user in a thread-safe manner.What would be a smarter implementation?
Here just some observations:
Synchronizing on a String is a very bad idea -> sync on clusterKey and serverKey will probably not work the way intended.
Better would be to use ConcurrentHashMaps and ConcurrentHashSets.
Though without more context it is not really possible to answer this question. It seems the code-author wanted to safely create just 1 mapping per clusterKey and serverKey so the user can be added just once.
A (probably better) way would be to just synchronize on the clusters map itself and then you're safe as only one thread can read and/or write to said map.
Another way would be to use custom Locks, maybe one for reading, and another one for writing, though this may lead again to inconsistencies if one thread is writing to the Map while another is reading that exact value from it.
The code looks like a non-thought through version of the Double checked locking idiom that sometimes is used for lazy initialisation. Read the provided link for why this is a really bad implementation of it.
The problem with the given code is that it fails intermittently. There is a race condition when there are several threads trying to work on the map using the same key (or keys with the same hashcode) which means that the map created first might be replaced by the second hashmap.
1 -The synch is trying to avoid that two threads, at the same time, create a new Entry in that Map. The second one must wait so his (servers==null) doesn't also return true.
2 - That users list seems to be out of scope, but seems like it doesn't need a synch. Maybe the programmer knows there is no duplicated userIds, or maybe he doesn't care about resetting the same user again and again.
3- ConcurrentHashMap maybe?

Is ConcurrentHashMap totally safe?

this is a passage from JavaDoc regarding ConcurrentHashMap. It says retrieval operations generally do not block, so may overlap with update operations. Does this mean the get() method is not thread safe?
"However, even though all operations are thread-safe, retrieval
operations do not entail locking, and there is not any support for
locking the entire table in a way that prevents all access. This class
is fully interoperable with Hashtable in programs that rely on its
thread safety but not on its synchronization details.
Retrieval operations (including get) generally do not block, so may
overlap with update operations (including put and remove). Retrievals
reflect the results of the most recently completed update operations
holding upon their onset."
The get() method is thread-safe, and the other users gave you useful answers regarding this particular issue.
However, although ConcurrentHashMap is a thread-safe drop-in replacement for HashMap, it is important to realize that if you are doing multiple operations you may have to change your code significantly. For example, take this code:
if (!map.containsKey(key))
return map.put(key, value);
else
return map.get(key);
In a multi-thread environment, this is a race condition. You have to use the ConcurrentHashMap.putIfAbsent(K key, V value) and pay attention to the return value, which tells you if the put operation was successful or not. Read the docs for more details.
Answering to a comment that asks for clarification on why this is a race condition.
Imagine there are two threads A, B that are going to put two different values in the map, v1 and v2 respectively, having the same key. The key is initially not present in the map. They interleave in this way:
Thread A calls containsKey and finds out that the key is not present, but is immediately suspended.
Thread B calls containsKey and finds out that the key is not present, and has the time to insert its value v2.
Thread A resumes and inserts v1, "peacefully" overwriting (since put is threadsafe) the value inserted by thread B.
Now thread B "thinks" it has successfully inserted its very own value v2, but the map contains v1. This is really a disaster because thread B may call v2.updateSomething() and will "think" that the consumers of the map (e.g. other threads) have access to that object and will see that maybe important update ("like: this visitor IP address is trying to perform a DOS, refuse all the requests from now on"). Instead, the object will be soon garbage collected and lost.
It is thread-safe. However, the way it is being thread-safe may not be what you expect. There are some "hints" you can see from:
This class is fully interoperable with Hashtable in programs that
rely on its thread safety but not on its synchronization details
To know the whole story in a more complete picture, you need to be aware of the ConcurrentMap interface.
The original Map provides some very basic read/update methods. Even I was able to make a thread-safe implementation of Map; there are lots of cases that people cannot use my Map without considering my synchronization mechanism. This is a typical example:
if (!threadSafeMap.containsKey(key)) {
threadSafeMap.put(key, value);
}
This piece of code is not thread-safe, even though the map itself is. Two threads calling containsKey() at the same time could think there is no such key they both therefore insert into the Map.
In order to fix the problem, we need to do extra synchronization explicitly. Assume the thread-safety of my Map is achieved by synchronized keywords, you will need to do:
synchronized(threadSafeMap) {
if (!threadSafeMap.containsKey(key)) {
threadSafeMap.put(key, value);
}
}
Such extra code needs you to know about the "synchronization details" of the map. In the above example, we need to know that the synchronization is achieved by "synchronized".
ConcurrentMap interface take this one step further. It defines some common "complex" actions that involves multiple access to map. For example, the above example is exposed as putIfAbsent(). With these "complex" actions, users of ConcurrentMap (in most case) don't need to synchronise actions with multiple access to the map. Hence, the implementation of Map can perform more complicated synchronization mechanism for better performance. ConcurrentHashhMap is a good example. Thread-safety is in fact maintained by keeping separate locks for different partitions of the map. It is thread-safe because concurrent access to the map will not corrupt the internal data structure, or cause any update lost unexpected, etc.
With all the above in mind, the meaning of Javadoc will be clearer:
"Retrieval operations (including get) generally do not block" because ConcurrentHashMap is not using "synchronized" for its thread-safety. The logic of get itself takes care of the thread-safeness; and If you look further in the Javadoc:
The table is internally partitioned to try to permit the indicated number
of concurrent updates without contention
Not only is retrieval non-blocking, even updates can happen concurrently. However, non-blocking/concurrent-updates does not means that it is thread-UNsafe. It simply means that it is using some ways other than simple "synchronized" for thread-safety.
However, as the internal synchronization mechanism is not exposed, if you want to do some complicated actions other than those provided by ConcurrentMap, you may need to consider changing your logic, or consider not using ConcurrentHashMap. For example:
// only remove if both key1 and key2 exists
if (map.containsKey(key1) && map.containsKey(key2)) {
map.remove(key1);
map.remove(key2);
}
ConcurrentHashmap.get() is thread-safe, in the sense that
It will not throw any exception, including ConcurrentModificationException
It will return a result that was true at some (recent) time in past. This means that two back-to-back calls to get can return different results. Of course, this true of any other Map as well.
HashMap is divided into "buckets" based on hashCode. ConcurrentHashMap uses this fact. Its synchronization mechanism is based on blocking buckets rather than on entire Map. This way few threads can simultaneously write to few different buckets (one thread can write to one bucket at a time).
Reading from ConcurrentHashMap almost doesn't use synchronization. Synchronization is used when while fetching value for key, it sees null value. Since ConcurrentHashMap can't store null as values (yes, aside from keys, values also can't be nulls) it suggests that fetching null while reading happened in the middle of initializing map entry (key-value pair) by another thread: when key was assigned, but value not yet, and it still holds default null.
In such case reading thread will need to wait until entry will be written fully.
So results from read() will be based on current state of map. If you read value of key that was in the middle of updating you will likely get old value since writing process hasn't finished yet.
get() in ConcurrentHashMap is thread-safe because It reads the value
which is Volatile. And in cases when value is null of any key, then
get() method waits till it gets the lock and then it reads the updated
value.
When put() method is updating CHM, then it sets the value of that key to null, and then it creates a new entry and updates the CHM. This null value is used by get() method as signal that another thread is updating the CHM with the same key.
It just means that when one thread is updating and one thread is reading there is no guarantee that the one that called the ConcurrentHashMap method first, in time, will have their operation occur first.
Think about an update on the item telling where Bob is. If one thread asks where Bob is at about the same time that another thread updates to say he came 'inside', you can't predict whether the reader thread will get Bob's status as 'inside' or 'outside'. Even if the update thread calls the method first, the reader thread might get the 'outside' status.
The threads will not cause each other problems. The code is ThreadSafe.
One thread won't go into an infinite loop or start generating wierd NullPointerExceptions or get "itside" with half of the old status and half of the new.

Missing items from synchronized HashMap in a threaded environment

I know you have to synchronize around anything that would change the structure of a hashmap (put or remove) but it seems to me you also have to synchronize around reads of the hashmap otherwise you might be reading while another thread is changing the structure of the hashmap.
So I sync around gets and puts to my hashmap.
The only machines I have available to me to test with all only have one processor so I never had any real concurrency until the system went to production and started failing. Items were missing out of my hashmap. I assume this is because two threads were writing at the same time, but based on the code below, this should not be possible. When I turned down the number of threads to 1 it started working flawlessly, so it's definitely a threading problem.
Details:
// something for all the threads to sync on
private static Object EMREPORTONE = new Object();
synchronized (EMREPORTONE)
{
reportdatacache.put("name.." + eri.recip_map_id, eri.name);
reportdatacache.put("subjec" + eri.recip_map_id, eri.subject);
etc...
}
... and elsewhere....
synchronized (EMREPORTONE)
{
eri.name = (String)reportdatacache.get("name.." + eri.recip_map_id);
eri.subject = (String)reportdatacache.get("subjec" + eri.recip_map_id);
etc...
}
and that's it. I pass around reportdatacache between functions, but that's just the reference to the hashmap.
Another important point is that this is running as a servlet in an appserver (iplanet to be specific, but I know none of you have ever heard of that)
But regardless, EMREPORTONE is global to the webserver process, no two threads should be able to step on each other, yet my hashmap is getting wrecked. Any thoughts?
In servlet container environment static variables depend on classloader. So you may think that you're dealing with same static instance, but in fact it could be completely different one.
Additionally, check if you do not use the map by escaped reference elsewhere and write/remove keys from it.
And yes, use ConcurrentHashMap instead.
Yes, synchronization is not only important when writing, but also when reading. While a write will be performed under mutually exclusion, a reader might access an errenous state of the map.
I cannot recommend you under any circumstances to synchronize the Java Collections manually, there are thread-safe counterparts: Collections.synchronizedMap and ConcurrentHashMap. Use them, they will ensure, that access to them is safe in a multithreaded environment.
Futher hints, it seems that everyone is accesing the datareportcache. Is there only one instance of that object? Why not synchronize then on the cache itself? But forget then when trying to solve your problems, use the sugar from java.util.concurrent.
As I see it there are 3 possibilities here:
You are locking on two different objects. EMREPORTONE is private static however and the code that accesses the reportdatacache is in one file only. Ok, that isn't it then. But I would recommend locking on reportdatacache instead of EMREPORTONE however. Cleaner code.
You are missing some read or write to reportdatacache somewhere. There are other accesses to the map that are not synchronized. Are things never removed from the cache?
This isn't a synchronization problem but rather a race condition issue. The data in the hashmap is fine but you are expecting things to be in the cache but they haven't be stored by the other thread yet. Maybe 2 requests come in for the same eri at the same time and they are both putting values into the cache? Maybe check to see if the old value returned by put(...) is always null? Maybe explaining more about how you know that items are missing from the map would help with this.
As an aside, you are doing this:
reportdatacache.put("name.." + eri.recip_map_id, eri.name);
reportdatacache.put("subjec" + eri.recip_map_id, eri.subject);
But it seems like you really should be storing the eri by its id.
reportdatacache.put(recip_map_id, eri);
Then you aren't creating fake keys with the "name.." prefix. Or maybe you should create a NameSubject private static class to store the name and subject in the cache. Cleaner.
Hope something here helps.

Does re-putting an object into a ConcurrentHashMap cause a "happens-before" memory relation?

I'm working with existing code that has an object store in the form of a ConcurrentHashMap. Within the map are stored mutable objects, use by multiple threads. No two threads try to modify an object at once by design. My concern is regarding the visibility of the modifications between the threads.
Currently the objects' code has synchronization on the "setters" (guarded by the object itself). There is no synchronization on the "getters" nor are the members volatile. This, to me, would mean that visibility is not guaranteed. However, when an object is modified it is re-put back into the map (the put() method is called again, same key). Does this mean that when another thread pulls the object out of the map, it will see the modifications?
I've researched this here on stackoverflow, in JCIP, and in the package description for java.util.concurrent. I've basically confused myself I think... but the final straw that caused me to ask this question was from the package description, it states:
Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread.
In relation to my question, do "actions" include the modifications to the objects stored in the map before the re-put()? If all this does result in visibility across threads, is this an efficient approach? I'm relatively new to threads and would appreciate your comments.
Edit:
Thank you all for you responses! This was my first question on StackOverflow and it has been very helpful to me.
I have to go with ptomli's answer because I think it most clearly addressed my confusion. To wit, establishing a "happens-before" relation doesn't necessarily affect modification visibility in this case. My "title question" is poorly constructed regarding my actual question described in the text. ptomli's answer now jives with what I read in JCIP: "To ensure all threads see the most up-to-date values of shared mutable variables, the reading and writing threads must synchronize on a common lock" (page 37). Re-putting the object back into the map doesn't provide this common lock for the modification to the inserted object's members.
I appreciate all the tips for change (immutable objects, etc), and I wholeheartedly concur. But for this case, as I mentioned there is no concurrent modification because of careful thread handling. One thread modifies an object, and another thread later reads the object (with the CHM being the object conveyer). I think the CHM is insufficient to ensure that the later executing thread will see the modifications from the first given the situation I provided. However, I think many of you correctly answered the title question.
You call concurrHashMap.put after each write to an object. However you did not specified that you also call concurrHashMap.get before each read. This is necessary.
This is true of all forms of synchronization: you need to have some "checkpoints" in both threads. Synchronizing only one thread is useless.
I haven't checked the source code of ConcurrentHashMap to make sure that put and get trigger an happens-before, but it is only logical that they should.
There is still an issue with your method however, even if you use both put and get. The problem happens when you modify an object and it is used (in an inconsistent state) by the other thread before it is put. It's a subtle problem because you might think the old value would be read since it hasn't been put yet and it would not cause a problem. The problem is that when you don't synchronize, you are not guaranteed to get a consistent older object, but rather the behavior is undefined. The JVM can update whatever part of the object in the other threads, at any time. It's only when using some explicit synchronization that you are sure you are updating the values in a consistent way across threads.
What you could do:
(1) synchronize all accesses (getters and setters) to your objects everywhere in the code. Be careful with the setters: make sure that you can't set the object in an inconsistent state. For example, when setting first and last name, having two synchronized setters is not sufficient: you must get the object lock for both operations together.
or
(2) when you put an object in the map, put a deep copy instead of the object itself. That way the other threads will never read an object in an inconsistent state.
EDIT:
I just noticed
Currently the objects' code has synchronization on the "setters"
(guarded by the object itself). There is no synchronization on the
"getters" nor are the members volatile.
This is not good. As I said above synchronizing on only one thread is no synchronization at all. You might synchronize on all your writer threads, but who cares since the readers won't get the right values.
I think this has been already said across a few answers but to sum it up
If your code goes
CHM#get
call various setters
CHM#put
then the "happens-before" provided by the put will guarantee that all the mutate calls are executed before the put. This means that any subsequent get will be guaranteed to see those changes.
Your problem is that the actual state of the object will not be deterministic because if the actual flow of events is
thread 1: CHM#get
thread 1: call setter
thread 2: CHM#get
thread 1: call setter
thread 1: call setter
thread 1: CHM#put
then there is no guarantee over what the state of the object will be in thread 2. It might see the object with the value provided by the first setter or it might not.
The immutable copy would be the best approach as then only completely consistent objects are published. Making the various setters synchronized (or the underlying references volatile) still doesn't let you publish consistent state, it just means that the object will always see the latest value for each getter on each call.
I think your question relates more to the objects you're storing in the map, and how they react to concurrent access, than the concurrent map itself.
If the instances you're storing in the map have synchronized mutators, but not synchronized accessors, then I don't see how they can be thread safe as described.
Take the Map out of the equation and determine if the instances you're storing are thread safe by themselves.
However, when an object is modified it is re-put back into the map (the put() method is called again, same key). Does this mean that when another thread pulls the object out of the map, it will see the modifications?
This exemplifies the confusion. The instance that is re-put into the Map will be retrieved from the Map by another thread. This is the guarantee of the concurrent map. That has nothing to do with visibility of the state of the stored instance itself.
My understanding is that it should work for all gets after the re-put, but this would be a very unsafe method of synchronization.
What happens to gets that happen before the re-put, but while modifications are happening. They may see only some of the changes, and the object would have an inconsistent state.
If you can, I'd recommend store immutable objects in the map. Then any get will retrieve a version of the object that was current when it did the get.
That's a code snippet from java.util.concurrent.ConcurrentHashMap (Open JDK 7):
919 public V get(Object key) {
920 Segment<K,V> s; // manually integrate access methods to reduce overhead
921 HashEntry<K,V>[] tab;
922 int h = hash(key.hashCode());
923 long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
924 if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
925 (tab = s.table) != null) {
926 for (HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
927 (tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
928 e != null; e = e.next) {
929 K k;
930 if ((k = e.key) == key || (e.hash == h && key.equals(k)))
931 return e.value;
932 }
933 }
934 return null;
935 }
UNSAFE.getObjectVolatile() is documented as getter with internal volatile semantics, thus the memory barrier will be crossed when getting the reference.
yes, put incurs a volatile write, even if key-value already exists in the map.
using ConcurrentHashMap to publish objects across thread is pretty effecient. Objects should not be modified further once they are in the map. (They don't have to be strictly immutable (with final fields))

Java concurrency scenario -- do I need synchronization or not?

Here's the deal. I have a hash map containing data I call "program codes", it lives in an object, like so:
Class Metadata
{
private HashMap validProgramCodes;
public HashMap getValidProgramCodes() { return validProgramCodes; }
public void setValidProgramCodes(HashMap h) { validProgramCodes = h; }
}
I have lots and lots of reader threads each of which will call getValidProgramCodes() once and then use that hashmap as a read-only resource.
So far so good. Here's where we get interesting.
I want to put in a timer which every so often generates a new list of valid program codes (never mind how), and calls setValidProgramCodes.
My theory -- which I need help to validate -- is that I can continue using the code as is, without putting in explicit synchronization. It goes like this:
At the time that validProgramCodes are updated, the value of validProgramCodes is always good -- it is a pointer to either the new or the old hashmap. This is the assumption upon which everything hinges. A reader who has the old hashmap is okay; he can continue to use the old value, as it will not be garbage collected until he releases it. Each reader is transient; it will die soon and be replaced by a new one who will pick up the new value.
Does this hold water? My main goal is to avoid costly synchronization and blocking in the overwhelming majority of cases where no update is happening. We only update once per hour or so, and readers are constantly flickering in and out.
Use Volatile
Is this a case where one thread cares what another is doing? Then the JMM FAQ has the answer:
Most of the time, one thread doesn't
care what the other is doing. But when
it does, that's what synchronization
is for.
In response to those who say that the OP's code is safe as-is, consider this: There is nothing in Java's memory model that guarantees that this field will be flushed to main memory when a new thread is started. Furthermore, a JVM is free to reorder operations as long as the changes aren't detectable within the thread.
Theoretically speaking, the reader threads are not guaranteed to see the "write" to validProgramCodes. In practice, they eventually will, but you can't be sure when.
I recommend declaring the validProgramCodes member as "volatile". The speed difference will be negligible, and it will guarantee the safety of your code now and in future, whatever JVM optimizations might be introduced.
Here's a concrete recommendation:
import java.util.Collections;
class Metadata {
private volatile Map validProgramCodes = Collections.emptyMap();
public Map getValidProgramCodes() {
return validProgramCodes;
}
public void setValidProgramCodes(Map h) {
if (h == null)
throw new NullPointerException("validProgramCodes == null");
validProgramCodes = Collections.unmodifiableMap(new HashMap(h));
}
}
Immutability
In addition to wrapping it with unmodifiableMap, I'm copying the map (new HashMap(h)). This makes a snapshot that won't change even if the caller of setter continues to update the map "h". For example, they might clear the map and add fresh entries.
Depend on Interfaces
On a stylistic note, it's often better to declare APIs with abstract types like List and Map, rather than a concrete types like ArrayList and HashMap. This gives flexibility in the future if concrete types need to change (as I did here).
Caching
The result of assigning "h" to "validProgramCodes" may simply be a write to the processor's cache. Even when a new thread starts, "h" will not be visible to a new thread unless it has been flushed to shared memory. A good runtime will avoid flushing unless it's necessary, and using volatile is one way to indicate that it's necessary.
Reordering
Assume the following code:
HashMap codes = new HashMap();
codes.putAll(source);
meta.setValidProgramCodes(codes);
If setValidCodes is simply the OP's validProgramCodes = h;, the compiler is free to reorder the code something like this:
1: meta.validProgramCodes = codes = new HashMap();
2: codes.putAll(source);
Suppose after execution of writer line 1, a reader thread starts running this code:
1: Map codes = meta.getValidProgramCodes();
2: Iterator i = codes.entrySet().iterator();
3: while (i.hasNext()) {
4: Map.Entry e = (Map.Entry) i.next();
5: // Do something with e.
6: }
Now suppose that the writer thread calls "putAll" on the map between the reader's line 2 and line 3. The map underlying the Iterator has experienced a concurrent modification, and throws a runtime exception—a devilishly intermittent, seemingly inexplicable runtime exception that was never produced during testing.
Concurrent Programming
Any time you have one thread that cares what another thread is doing, you must have some sort of memory barrier to ensure that actions of one thread are visible to the other. If an event in one thread must happen before an event in another thread, you must indicate that explicitly. There are no guarantees otherwise. In practice, this means volatile or synchronized.
Don't skimp. It doesn't matter how fast an incorrect program fails to do its job. The examples shown here are simple and contrived, but rest assured, they illustrate real-world concurrency bugs that are incredibly difficult to identify and resolve due to their unpredictability and platform-sensitivity.
Additional Resources
The Java Language Specification - 17 Threads and Locks sections: §17.3 and §17.4
The JMM FAQ
Doug Lea's concurrency books
No, the code example is not safe, because there is no safe publication of any new HashMap instances. Without any synchronization, there is a possibility that a reader thread will see a partially initialized HashMap.
Check out #erickson's explanation under "Reordering" in his answer. Also I can't recommend Brian Goetz's book Java Concurrency in Practice enough!
Whether or not it is okay with you that reader threads might see old (stale) HashMap references, or might even never see a new reference, is beside the point. The worst thing that can happen is that a reader thread might obtain reference to and attempt to access a HashMap instance that is not yet initialized and not ready to be accessed.
No, by the Java Memory Model (JMM), this is not thread-safe.
There is no happens-before relation between writing and reading the HashMap implementation objects. So, although the writer thread appears to write out the object first and then the reference, a reader thread may not see the same order.
As also mentioned there is no guarantee that the reaer thread will ever see the new value. In practice with current compilers on existing hardware the value should get updated, unless the loop body is sufficienly small that it can be sufficiently inlined.
So, making the reference volatile is adequate under the new JMM. It is unlikely to make a substantial difference to system performance.
The moral of this story: Threading is difficult. Don't try to be clever, because sometimes (may be not on your test system) you wont be clever enough.
As others have already noted, this is not safe and you shouldn't do this. You need either volatile or synchronized here to force other threads to see the change.
What hasn't been mentioned is that synchronized and especially volatile are probably a lot faster than you think. If it's actually a performance bottleneck in your app, then I'll eat this web page.
Another option (probably slower than volatile, but YMMV) is to use a ReentrantReadWriteLock to protect access so that multiple concurrent readers can read it. And if that's still a performance bottleneck, I'll eat this whole web site.
public class Metadata
{
private HashMap validProgramCodes;
private ReadWriteLock lock = new ReentrantReadWriteLock();
public HashMap getValidProgramCodes() {
lock.readLock().lock();
try {
return validProgramCodes;
} finally {
lock.readLock().unlock();
}
}
public void setValidProgramCodes(HashMap h) {
lock.writeLock().lock();
try {
validProgramCodes = h;
} finally {
lock.writeLock().unlock();
}
}
}
I think your assumptions are correct. The only thing I would do is set the validProgramCodes volatile.
private volatile HashMap validProgramCodes;
This way, when you update the "pointer" of validProgramCodes you guaranty that all threads access the same latest HasMap "pointer" because they don't rely on local thread cache and go directly to memory.
The assignment will work as long as you're not concerned about reading stale values, and as long as you can guarantee that your hashmap is properly populated on initialization. You should at the least create the hashMap with Collections.unmodifiableMap on the Hashmap to guarantee that your readers won't be changing/deleting objects from the map, and to avoid multiple threads stepping on each others toes and invalidating iterators when other threads destroy.
( writer above is right about the volatile, should've seen that)
While this is not the best solution for this particular problem (erickson's idea of a new unmodifiableMap is), I'd like to take a moment to mention the java.util.concurrent.ConcurrentHashMap class introduced in Java 5, a version of HashMap specifically built with concurrency in mind. This construct does not block on reads.
Check this post about concurrency basics. It should be able to answer your question satisfactorily.
http://walivi.wordpress.com/2013/08/24/concurrency-in-java-a-beginners-introduction/
I think it's risky. Threading results in all kinds of subtly issues that are a giant pain to debug. You might want to look at FastHashMap, which is intended for read-only threading cases like this.
At the least, I'd also declare validProgramCodes to be volatile so that the reference won't get optimized into a register or something.
If I read the JLS correctly (no guarantees there!), accesses to references are always atomic, period. See Section 17.7 Non-atomic Treatment of double and long
So, if the access to a reference is always atomic and it doesn't matter what instance of the returned Hashmap the threads see, you should be OK. You won't see partial writes to the reference, ever.
Edit: After review of the discussion in the comments below and other answers, here are references/quotes from
Doug Lea's book (Concurrent Programming in Java, 2nd Ed), p 94, section 2.2.7.2 Visibility, item #3: "
The first time a thread access a field
of an object, it sees either the
initial value of the field or the
value since written by some other
thread."
On p. 94, Lea goes on to describe risks associated with this approach:
The memory model guarantees that, given the eventual occurrence of the above operations, a particular update to a particular field made by one thread will eventually be visible to another. But eventually can be an arbitrarily long time.
So when it absolutely, positively, must be visible to any calling thread, volatile or some other synchronization barrier is required, especially in long running threads or threads that access the value in a loop (as Lea says).
However, in the case where there is a short lived thread, as implied by the question, with new threads for new readers and it does not impact the application to read stale data, synchronization is not required.
#erickson's answer is the safest in this situation, guaranteeing that other threads will see the changes to the HashMap reference as they occur. I'd suggest following that advice simply to avoid the confusion over the requirements and implementation that resulted in the "down votes" on this answer and the discussion below.
I'm not deleting the answer in the hope that it will be useful. I'm not looking for the "Peer Pressure" badge... ;-)

Categories