Java memory model - can someone explain it?

For years and years, I've tried to understand the part of the Java specification that deals with the memory model and concurrency. I have to admit that I've failed miserably. Yes, I understand locks and "synchronized" and wait() and notify(). And I can use them just fine, thank you. I even have a vague idea of what "volatile" does. But all of that was derived not from the language spec but from general experience.
Here are two sample questions. I am not so much interested in the particular answers as in understanding how the answers are derived from the spec (or maybe how I can conclude that the spec has no answer).
What does "volatile" do, exactly?
Are writes to variable atomic? Does it depend on variable's type?

I'm not going to attempt to actually answer your questions here. Instead I'll redirect you to the book which I keep seeing recommended for advice on this topic: Java Concurrency in Practice.
One word of warning: if there are answers here, expect quite a few of them to be wrong. One of the reasons I'm not going to post details is because I'm pretty sure I'd get it wrong in at least some respects. I mean no disrespect whatsoever to the community when I say that the chance of everyone who thinks they can answer this question actually having enough rigour to get it right is practically zero. (Joe Duffy recently found a bit of the .NET memory model that he was surprised by. If he can get it wrong, so can mortals like us.)
I will offer some insight on just one aspect, because it's often misunderstood:
There's a difference between volatility and atomicity. People often think that an atomic write is volatile (i.e. you don't need to worry about the memory model if the write is atomic). That's not true.
Volatility is about whether one thread performing a read (logically, in the source code) will "see" changes made by another thread.
Atomicity is about whether there is any chance that if a change is seen, only part of the change will be seen.
For instance, take writing to an integer field. That is guaranteed to be atomic, but not volatile. That means that if we have (starting at foo.x = 0):
Thread 1: foo.x = 257;
Thread 2: int y = foo.x;
It's possible for y to be 0 or 257. It won't be any other value (e.g. 256 or 1), due to the atomicity constraint. However, even if you know that in "wall time" the code in thread 2 executed after the code in thread 1, y could still be 0 because of odd caching, memory accesses "moving", etc. Making the variable x volatile will fix this.
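To make the foo.x example concrete, here is a minimal sketch of the volatile version. The class name VisibilityDemo and the method awaitWrite are made up for illustration; only the field x and the value 257 come from the text above. Take it as one way to see the guarantee in action, not as a definitive test of the memory model.

```java
// A minimal sketch; VisibilityDemo and awaitWrite are illustrative names.
class VisibilityDemo {
    // volatile makes the writer's update visible to the reader; int writes
    // are atomic, so no value other than 0 or 257 can ever be observed.
    private volatile int x = 0;

    // Starts "thread 1" (the writer) and plays "thread 2" (the reader) itself.
    static int awaitWrite() {
        VisibilityDemo foo = new VisibilityDemo();
        Thread writer = new Thread(() -> foo.x = 257);
        writer.start();
        int y;
        // Spin until the write is observed. Without volatile on x, this loop
        // would not be guaranteed to terminate.
        while ((y = foo.x) == 0) {
            Thread.yield();
        }
        return y;
    }

    public static void main(String[] args) {
        System.out.println(awaitWrite()); // prints 257
    }
}
```

The reader loop only exits on a non-zero value, and atomicity rules out anything but 257, so the result is always 257.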
I'll leave the rest up to real honest-to-goodness experts.

Non-volatile variables can be cached thread-locally, so different threads may see different values at the same time; volatile prevents this.
Writes to variables of 32 bits or smaller, and to references, are guaranteed to be atomic. That is not so for non-volatile long and double (JLS §17.7 allows them to be written as two 32-bit halves), though 64-bit JVMs probably implement them as atomic operations. Declaring a long or double volatile makes its reads and writes atomic.

I won't try to explain these issues here but instead refer you to Brian Goetz's excellent book on the subject.
The book is "Java Concurrency in Practice"; it can be found at Amazon or any other well-stocked store for computer literature.

This is a good link which can give you a little in depth information:
http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html

I recently found an excellent article that explains volatile as follows:
First, you have to understand a little something about the Java memory model. I've struggled a bit over the years to explain it briefly and well. As of today, the best way I can think of to describe it is if you imagine it this way:
Each thread in Java takes place in a separate memory space (this is clearly untrue, so bear with me on this one).
You need to use special mechanisms to guarantee that communication happens between these threads, as you would on a message passing system.
Memory writes that happen in one thread can "leak through" and be seen by another thread, but this is by no means guaranteed. Without explicit communication, you can't guarantee which writes get seen by other threads, or even the order in which they get seen.
The Java volatile modifier is an example of a special mechanism to guarantee that communication happens between threads. When one thread writes to a volatile variable, and another thread sees that write, the first thread is telling the second about all of the contents of memory up until it performed the write to that volatile variable.
Additional links:
http://jeremymanson.blogspot.com/2008/11/what-volatile-means-in-java.html
http://www.javaperformancetuning.com/news/qotm030.shtml

JVM Memory model
High level diagram
Code sample
class MainClass {
    void method1() { // <- main
        int variable1 = 1;
        Class1 variable2 = new Class1();
        variable2.method2();
    }
}
class Class1 {
    static Class2 classVariable4 = new Class2();
    int instanceVariable5 = 0;
    Class2 instanceVariable6 = new Class2();
    void method2() {
        int variable3 = 3;
    }
}
class Class2 { }
Notes:
A thread's stack contains only local variables (primitives and references; the referenced objects live on the heap).
Members (class and instance variables) are stored on the heap even if they are primitives.
What does "volatile" do, exactly?
[Java volatile]
Are writes to variable atomic? Does it depend on variable's type?
[Atomic variable]
[Java thread safe of local variables]

Other answers above are absolutely correct in that your question is not for the faint of heart.
However, I understand your pain in really wanting to get what is under the hood. For this I would point you back to the world of compilers and the lower-level predecessors to Java, i.e. assembly, C and C++.
Read about different kinds of barriers ('fences'). Understanding what a memory barrier is, and where it is necessary, will help you have an intuitive grasp of what volatile does.

One notion might be helpful: data (datum) and copies.
If you declare a variable, let's say a byte, it resides somewhere in the memory, in a data segment (roughly speaking). There are 8 bits somewhere in the memory devoted to store that piece of information.
However, there can be several copies of that data moving around in your machine, for various technical reasons, e.g. thread-local storage or compiler optimizations. And if we have several copies, they might be out of sync.
So you should always keep this notion in mind. It's true not only for Java class fields, but also for C++ variables and database records (the record state data gets copied into several sessions, etc.). Variables, their hidden/visible copies and the subtle syncing issues will be around forever.

Another attempt to provide a summary of things I understood from the answers here and from other sources (the first attempt was pretty far off base. I hope this one is better).
Java memory model is about propagating values written to memory in one thread to other threads so that other threads can see them as they read from memory.
In short, if you obtain a lock on a mutex, anything written by any thread that released that mutex before will be visible to your thread.
If you read a volatile variable, anything written to that volatile variable before you read it is visible to the reading thread. Also, any earlier volatile write done by the thread that wrote to your variable is visible. Moreover, since Java 5, any write at all, volatile or not, that the writing thread performed before its write to your volatile variable will be visible to you.
After an object is constructed, you can pass it to another thread, and all final members will be visible and fully constructed in the new thread. There are no similar guarantees about non-final members. That makes me think that assignment to a final member acts as a write to volatile variable (memory fence).
Anything that a thread wrote before its Runnable exited is visible to the thread that executes join(). Anything that a thread wrote before executing start() will be visible to the spawned thread.
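The start()/join() rule is easy to try out. Below is a small sketch under my own naming (JoinDemo, compute and result are not from any of the answers): a plain field, neither volatile nor locked, is written before start(), updated by the worker, and read after join().

```java
// A minimal sketch; JoinDemo, compute and result are illustrative names.
class JoinDemo {
    private int result; // deliberately neither volatile nor guarded by a lock

    static int compute() {
        JoinDemo demo = new JoinDemo();
        demo.result = 1; // written before start(), so the worker sees it
        Thread worker = new Thread(() -> demo.result += 41);
        worker.start();
        try {
            worker.join(); // everything the worker wrote is visible after join()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return demo.result;
    }

    public static void main(String[] args) {
        System.out.println(compute()); // prints 42
    }
}
```

No other synchronization is needed here because start() and join() themselves provide the ordering; reading result before join() returns would be a data race.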
Another thing to mention: volatile variables and synchronization have a function that's rarely mentioned. Besides flushing the thread cache and providing one-thread-at-a-time access, they also prevent the compiler and CPU from reordering reads and writes across the synchronization boundary.
None of it is new and the other answers have stated it better. I just wanted to write this up to clear my head.

This explains it using cities (threads) and planets (main memory).
http://mollypages.org/tutorials/javamemorymodel.mp
There are no direct flights from city to city.
You have to first go to another planet (Mars in this case) and then to another city on your home planet. So, from NYC to Tokyo, you have to go:
NYC -> Mars -> Tokyo
Now replace NYC and Tokyo with 2 threads, Mars with Main memory and the flights as acquiring/releasing locks and you have the JMM.

Related

Q: Code isn't working without synchronized method [duplicate]

I have a class that contains a boolean field like this one:
public class MyClass
{
private bool boolVal;
public bool BoolVal
{
get { return boolVal; }
set { boolVal = value; }
}
}
The field can be read and written from many threads using the property. My question is whether I should fence the getter and setter with a lock statement. Or should I simply use the volatile keyword and save the locking? Or should I totally ignore multithreading, since getting and setting boolean values is atomic?
There are several issues here.
The simple first. Yes, reading and writing a boolean variable is an atomic operation. (clarification: What I mean is that read and write operations by themselves are atomic operations for booleans, not reading and writing, that will of course generate two operations, which together will not be atomic)
However, unless you take extra steps, the compiler might optimize away such reading and writing, or move the operations around, which could make your code operate differently from what you intend.
Marking the field as volatile means that the operations will not be optimized away, the directive basically says that the compiler should never assume the value in this field is the same as the previous one, even if it just read it in the previous instruction.
However, on multicore and multicpu machines, different cores and cpus might have a different value for the field in their cache, and thus you add a lock { } clause, or anything else that forces a memory barrier. This will ensure that the field value is consistent across cores. Additionally, reads and writes will not move past a memory barrier in the code, which means you have predictability in where the operations happen.
So if you suspect, or know, that this field will be written to and read from multiple threads, I would definitely add locking and volatile to the mix.
Note that I'm no expert in multithreading. I'm able to hold my own, but I usually program defensively. It may well be possible (I would assume it is highly likely) to implement something that doesn't use a lock (there are many lock-free constructs), but sadly I'm not experienced enough in this topic to handle those things. Thus my advice is to add both a lock clause and a volatile directive.
volatile alone is not enough and serves a different purpose. lock should be fine, but in the end it depends on whether anyone is going to set boolVal in MyClass itself; who knows, you may have a worker thread spinning in there. It also depends on how you are using boolVal internally. You may also need protection elsewhere. If you ask me, if you are not DEAD SURE you are going to use MyClass in more than one thread, then it's not worth even thinking about it.
P.S. you may also want to read this section

Is unsynchronized read of integer threadsafe in java?

I see this code quite frequently in some OSS unit tests, but is it thread safe ? Is the while loop guaranteed to see the correct value of invoc ?
If no; nerd points to whoever also knows which CPU architecture this may fail on.
private int invoc = 0;
private synchronized void increment() {
    invoc++;
}
public void isItThreadSafe() throws InterruptedException {
    for (int i = 0; i < TOTAL_THREADS; i++) {
        new Thread(new Runnable() {
            public void run() {
                // do some stuff
                increment();
            }
        }).start();
    }
    while (invoc != TOTAL_THREADS) {
        Thread.sleep(250);
    }
}
No, it's not threadsafe. invoc needs to be declared volatile, or accessed while synchronizing on the same lock, or changed to use AtomicInteger. Just using the synchronized method to increment invoc, but not synchronizing to read it, isn't good enough.
The JVM does a lot of optimizations, including CPU-specific caching and instruction reordering. It uses the volatile keyword and locking to decide when it can optimize freely and when it has to have an up-to-date value available for other threads to read. So when the reader doesn't use the lock the JVM can't know not to give it a stale value.
This quote from Java Concurrency in Practice (section 3.1.3) discusses how both writes and reads need to be synchronized:
Intrinsic locking can be used to guarantee that one thread sees the effects of another in a predictable manner, as illustrated by Figure 3.1. When thread A executes a synchronized block, and subsequently thread B enters a synchronized block guarded by the same lock, the values of variables that were visible to A prior to releasing the lock are guaranteed to be visible to B upon acquiring the lock. In other words, everything A did in or prior to a synchronized block is visible to B when it executes a synchronized block guarded by the same lock. Without synchronization, there is no such guarantee.
The next section (3.1.4) covers using volatile:
The Java language also provides an alternative, weaker form of synchronization, volatile variables, to ensure that updates to a variable are propagated predictably to other threads. When a field is declared volatile, the compiler and runtime are put on notice that this variable is shared and that operations on it should not be reordered with other memory operations. Volatile variables are not cached in registers or in caches where they are hidden from other processors, so a read of a volatile variable always returns the most recent write by any thread.
Back when we all had single-CPU machines on our desktops we'd write code and never have a problem until it ran on a multiprocessor box, usually in production. Some of the factors that give rise to the visibility problems, things like CPU-local caches and instruction reordering, are things you would expect from any multiprocessor machine. Elimination of apparently unneeded instructions could happen on any machine, though. There's nothing forcing the JVM to ever make the reader see the up-to-date value of the variable; you're at the mercy of the JVM implementors. So it seems to me this code would not be a good bet on any CPU architecture.
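As a sketch of the "accessed while synchronizing on the same lock" option from this answer, the polling loop can read invoc through a synchronized getter. The class name InvocationCounter, the method names current and runThreads, and the 10 ms poll interval are my own choices, not part of the question.

```java
// A minimal sketch; InvocationCounter, current and runThreads are illustrative names.
class InvocationCounter {
    private int invoc = 0;

    private synchronized void increment() {
        invoc++;
    }

    // Reading under the same lock gives the happens-before edge that the
    // unsynchronized read in the question lacks.
    private synchronized int current() {
        return invoc;
    }

    static int runThreads(int totalThreads) {
        InvocationCounter counter = new InvocationCounter();
        for (int i = 0; i < totalThreads; i++) {
            new Thread(counter::increment).start();
        }
        try {
            while (counter.current() != totalThreads) {
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.current();
    }

    public static void main(String[] args) {
        System.out.println(runThreads(8)); // prints 8
    }
}
```

Declaring the field volatile or switching to AtomicInteger would work as well; the synchronized getter just keeps the change closest to the original code.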
Well!
private volatile int invoc = 0;
Will do the trick.
And see "Are java primitive ints atomic by design or by accident?", which cites some of the relevant Java definitions. Apparently int is fine, but double and long might not be.
Edit, add-on: the question asks, "see the correct value of invoc ?". What is "the correct value"? As in the space-time continuum, simultaneity doesn't really exist between threads. One of the above posts notes that the value will eventually get flushed, and the other thread will get it. Is the code "thread safe"? I would say "yes", because it won't "misbehave" based on the vagaries of sequencing, in this case.
Theoretically, it is possible that the read is cached. Nothing in Java memory model prevents that.
Practically, that is extremely unlikely to happen (in your particular example). The question is, whether JVM can optimize across a method call.
read #1
method();
read #2
For the JVM to reason that read #2 can reuse the result of read #1 (which can be stored in a CPU register), it must know for sure that method() contains no synchronization actions. This is generally impossible, unless method() is inlined and the JVM can see from the flattened code that there's no sync/volatile or other synchronization action between read #1 and read #2; then it can safely eliminate read #2.
Now in your example, the method is Thread.sleep(). One way to implement it is to busy-loop for a certain number of iterations, depending on CPU frequency. Then the JVM may inline it, and then eliminate read #2.
But of course such implementation of sleep() is unrealistic. It is usually implemented as a native method that calls OS kernel. The question is, can JVM optimize across such a native method.
Even if JVM has knowledge of internal workings of some native methods, therefore can optimize across them, it's improbable that sleep() is treated that way. sleep(1ms) takes millions of CPU cycles to return, there is really no point optimizing around it to save a few reads.
--
This discussion reveals the biggest problem of data races - it takes too much effort to reason about it. A program is not necessarily wrong, if it is not "correctly synchronized", however to prove it's not wrong is not an easy task. Life is much simpler, if a program is correctly synchronized and contains no data race.
As far as I understand the code it should be safe. The bytecode can be reordered, yes. But eventually invoc should be in sync with the main thread again. Synchronize guarantees that invoc is incremented correctly so there is a consistent representation of invoc in some register. At some time this value will be flushed and the little test succeeds.
It is certainly not nice and I would go with the answer I voted for and would fix code like this because it smells. But thinking about it I would consider it safe.
If you're not required to use "int", I would suggest AtomicInteger as a thread-safe alternative.

Does volatile influence non-volatile variables?

Okay, suppose I have a bunch of variables, one of them declared volatile:
int a;
int b;
int c;
volatile int v;
If one thread writes to all four variables (writing to v last), and another thread reads from all four variables (reading from v first), does that second thread see the values written to a, b and c by the first thread, even though they are not themselves declared volatile? Or can it possibly see stale values?
Since there seems to be some confusion: I'm not deliberately trying to do something unsafe. I just want to understand the Java memory model and the semantics of the volatile keyword. Pure curiosity.
I'm going to speak to what I think you may really be probing about—piggybacking synchronization.
The technique that it looks like you're trying to use involves using one volatile variable as a synchronization guard in concert with one or more other non-volatile variables. This technique is applicable when the following conditions hold true:
Only one thread will write to the set of values meant to be guarded.
The threads reading the set of values will read them only if the volatile guard value meets some criteria.
You don't mention the second condition holding true for your example, but we can examine it anyway. The model for the writer is as follows:
Write to all the non-volatile variables, assuming that no other thread will try to read them.
Once complete, write a value to the volatile guard variable that indicates that the readers' criteria is met.
The readers operate as follows:
Read the volatile guard variable at any time, and if its value meets the criteria, then
Read the other non-volatile variables.
The readers must not read the other non-volatile variables if the volatile guard variable does not yet indicate a proper value.
The guard variable is acting as a gate. It's closed until the writer sets it to a particular value, or set of values that all meet the criteria of indicating that the gate is now open. The non-volatile variables are guarded behind the gate. The reader is not permitted to read them until the gate opens. Once the gate is open, the reader will see a consistent view of the set of non-volatile variables.
Note that it is not safe to run this protocol repeatedly. The writer can't keep changing the non-volatile variables once it's opened the gate. At that point, multiple reader threads may be reading those other variables, and they can—though are not guaranteed—see updates to those variables. Seeing some but not all of those updates would yield inconsistent views of the set.
Backing up, the trick here is to control access to a set of variables without either
creating a structure to hold them all, to which an atomic reference could be swapped, um, atomically, or
using a lock to make writing to and reading from the entire set of variables mutually exclusive activities.
Piggybacking on top of the volatile guard variable is a clever stunt—not one to be done casually. Subsequent updates to the program can break the aforementioned fragile conditions, removing the consistency guarantees afforded by the Java memory model. Should you choose to use this technique, document its invariants and requirements in the code clearly.
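For what it's worth, the gate protocol described above can be sketched roughly like this. GuardedValues, publish, awaitValues and the boolean guard named ready are names I made up; the question itself uses a, b, c and a volatile int v. The sketch assumes exactly one writer that publishes once.

```java
// A minimal sketch; GuardedValues, publish, awaitValues and ready are illustrative names.
class GuardedValues {
    private int a, b, c;            // plain fields, written once before the gate opens
    private volatile boolean ready; // the volatile guard, written last and read first

    // Writer: fill in the plain fields, then open the gate with the volatile write.
    void publish(int x, int y, int z) {
        a = x;
        b = y;
        c = z;
        ready = true;
    }

    // Reader: touch the plain fields only after the guard has been seen open.
    int[] awaitValues() {
        while (!ready) {
            Thread.yield();
        }
        return new int[] { a, b, c };
    }

    static int[] demo() {
        GuardedValues guarded = new GuardedValues();
        new Thread(() -> guarded.publish(1, 2, 3)).start();
        return guarded.awaitValues();
    }

    public static void main(String[] args) {
        int[] values = demo();
        System.out.println(values[0] + " " + values[1] + " " + values[2]); // prints 1 2 3
    }
}
```

If the writer kept changing a, b or c after setting ready, readers could see a mix of old and new values, which is exactly the fragility the answer warns about.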
Yes. volatile, locks, etc. set up the happens-before relationship, and it affects all variables (in the new Java Memory Model (JMM), in effect since Java SE 5/JDK 1.5). Kind of makes it useful for non-primitive volatiles...
does that second thread see the values written to a, b and c by the first thread, even though they are not themselves declared volatile? Or can it possibly see stale values?
You can get stale reads, because you can't ensure that the values of a, b, c are the ones set after reading of v. Using a state machine (but you need CAS to change the state) is a way to tackle similar issues, but it's beyond the scope of this discussion.
Perhaps this part is unclear: after writing to v last and reading from v first, you'd get the right results (non-stale reads). The main issue is that if you do
if (v==STATE1){...proceed...}, there is no guarantee that some other thread is not modifying the state of a/b/c in the meantime. In that case, there will be stale reads.
If you modify a/b/c and v only once, you'd get the correct result.
Mastering concurrency and lock-free structures is really hard. Doug Lea has a good book on it, and most talks/articles by Dr. Cliff Click are a wonderful wealth of material if you need something to start digging into.
Yes, volatile write "happens-before" next volatile read on the same variable.
While @seh is right about consistency problems with multiple variables, there are use cases where less consistency is required.
For example, a writer thread updates some state variables; a reader thread displays them promptly. There's not much relation among the variables; we only care about reading the new values promptly. We could make every state variable volatile. Or we could use only one volatile variable as a visibility guard.
However, the saving is only on paper; performance-wise there's hardly any difference. In either version, every state variable must be "flushed" by the writer and "loaded" by the reader. No free lunch.

Does a variable accessed by multiple threads in a java servlet need to be declared volatile?

In the book Java Servlet Programming, there's an example servlet on page 54 which searches for primes in a background thread. Each time a client accesses the servlet, the most recently found prime number is returned.
The variable which is used to store the most recently found prime is declared as such:
long lastprime = 0;
Since this variable is being accessed from multiple threads (the background thread that's doing the calculations and any client threads that are accessing it), doesn't it need to be declared volatile or have its access synchronized in some way?
Yes, assuming you really want to see the most recently calculated prime on any thread, it should either be volatile or be accessed in a thread-safe way via synchronized blocks/methods. Additionally, as pointed out in the comments, non-volatile long variables may not be updated atomically - so you could see the top 32 bits of an old value and the bottom 32 bits of a new value (or vice versa).
I forgot about the atomicity side of things earlier because it's almost always solved automatically when you make sure you get the most recently published value, and make sure you fully publish new values. In practice this is almost always what you want, so atomicity becomes a non-issue if your code is working properly to start with.
It's not a SingleThreadModel servlet is it? That would obviously make a difference.
Another alternative would have been to use AtomicLong.
Yes. A servlet's variables aren't thread-safe.
If there is a clean read/write split between the threads, where one thread "publishes" the last prime for others to read, then you can get away with making it volatile.
If the access pattern involved some read-modify-write sequences or the like, then you'd have to synchronize the access to the field.
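Here is a rough sketch of the single-writer case: one background thread publishes primes through a volatile long and any other thread just reads it. PrimeSearcher, search, getLastPrime and searchInBackground are hypothetical names, not the code from the book, and the trial-division isPrime is only there to keep the example self-contained.

```java
// A minimal sketch; PrimeSearcher and its methods are illustrative names.
class PrimeSearcher {
    // volatile makes each write visible to readers and also rules out a torn
    // 64-bit read, which a plain long field would permit.
    private volatile long lastPrime = 0;

    long getLastPrime() {
        return lastPrime;
    }

    // The only writer: tests candidates up to limit and publishes each prime.
    void search(long limit) {
        for (long n = 2; n <= limit; n++) {
            if (isPrime(n)) {
                lastPrime = n;
            }
        }
    }

    private static boolean isPrime(long n) {
        if (n < 2) {
            return false;
        }
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Runs the search on a background thread and returns the final value.
    static long searchInBackground(long limit) {
        PrimeSearcher searcher = new PrimeSearcher();
        Thread background = new Thread(() -> searcher.search(limit));
        background.start();
        try {
            background.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return searcher.getLastPrime();
    }

    public static void main(String[] args) {
        System.out.println(searchInBackground(100)); // prints 97
    }
}
```

In a servlet, client threads would call getLastPrime() while the search is still running; the join() here only exists so the example has a deterministic result.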
Assuming Java 5 or later, declaring it volatile gives well-defined semantics as described here. On the principle of removing doubt from the code maintainer's mind I would use volatile, saying "yes I know that multiple threads use this variable".
The interesting question is the effect of not declaring it volatile. Provided that you got a prime, does it matter if it's the very latest available? Volatile ensures that values are taken from memory, not any "CPU" caches, so you should get a more up-to-date value.
What about the possibility of seeing a partial assignment? Could you get really unlucky and see a long whose LSBs are part of an old value and MSBs part of a different value? Well, assignments to longs and doubles are not atomic, so in theory yes!
Ergo, volatile or synchronized is not just a nice-to-have ... you need it.
The semantics of a volatile variable in Java are not strong enough to make the increment operation (lastprime++) atomic, unless you can guarantee that the variable is written only from a single thread, which is not the case in a servlet.
On the other hand, using AtomicXXX variables is thread-safe, as long as no compound operations are performed. There will be a window of vulnerability when updating more than one atomic variable, even though each individual call is atomic.
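To illustrate the atomic increment case, here is a small sketch using AtomicLong.incrementAndGet() from several threads. AtomicIncrementDemo and countWith are names I invented for the example; the thread and iteration counts are arbitrary.

```java
import java.util.concurrent.atomic.AtomicLong;

// A minimal sketch; AtomicIncrementDemo and countWith are illustrative names.
class AtomicIncrementDemo {
    static long countWith(int threads, int incrementsPerThread) {
        AtomicLong counter = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    // One atomic read-modify-write; a volatile long with ++
                    // could lose updates here.
                    counter.incrementAndGet();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWith(4, 10_000)); // prints 40000
    }
}
```

Each call is atomic on its own. As the answer says, a sequence of calls, or updates to two different atomic variables, still needs a lock if it must appear as one step.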

Java concurrency scenario -- do I need synchronization or not?

Here's the deal. I have a hash map containing data I call "program codes", it lives in an object, like so:
class Metadata
{
    private HashMap validProgramCodes;
    public HashMap getValidProgramCodes() { return validProgramCodes; }
    public void setValidProgramCodes(HashMap h) { validProgramCodes = h; }
}
I have lots and lots of reader threads each of which will call getValidProgramCodes() once and then use that hashmap as a read-only resource.
So far so good. Here's where we get interesting.
I want to put in a timer which every so often generates a new list of valid program codes (never mind how), and calls setValidProgramCodes.
My theory -- which I need help to validate -- is that I can continue using the code as is, without putting in explicit synchronization. It goes like this:
At the time that validProgramCodes are updated, the value of validProgramCodes is always good -- it is a pointer to either the new or the old hashmap. This is the assumption upon which everything hinges. A reader who has the old hashmap is okay; he can continue to use the old value, as it will not be garbage collected until he releases it. Each reader is transient; it will die soon and be replaced by a new one who will pick up the new value.
Does this hold water? My main goal is to avoid costly synchronization and blocking in the overwhelming majority of cases where no update is happening. We only update once per hour or so, and readers are constantly flickering in and out.
Use Volatile
Is this a case where one thread cares what another is doing? Then the JMM FAQ has the answer:
Most of the time, one thread doesn't care what the other is doing. But when it does, that's what synchronization is for.
In response to those who say that the OP's code is safe as-is, consider this: There is nothing in Java's memory model that guarantees that this field will be flushed to main memory when a new thread is started. Furthermore, a JVM is free to reorder operations as long as the changes aren't detectable within the thread.
Theoretically speaking, the reader threads are not guaranteed to see the "write" to validProgramCodes. In practice, they eventually will, but you can't be sure when.
I recommend declaring the validProgramCodes member as "volatile". The speed difference will be negligible, and it will guarantee the safety of your code now and in future, whatever JVM optimizations might be introduced.
Here's a concrete recommendation:
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
class Metadata {
    // volatile publishes each new map reference safely to reader threads
    private volatile Map validProgramCodes = Collections.emptyMap();
    public Map getValidProgramCodes() {
        return validProgramCodes;
    }
    public void setValidProgramCodes(Map h) {
        if (h == null)
            throw new NullPointerException("validProgramCodes == null");
        // defensive copy, wrapped so that readers cannot modify it
        validProgramCodes = Collections.unmodifiableMap(new HashMap(h));
    }
}
Immutability
In addition to wrapping it with unmodifiableMap, I'm copying the map (new HashMap(h)). This makes a snapshot that won't change even if the caller of setter continues to update the map "h". For example, they might clear the map and add fresh entries.
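A quick way to see what the copy-then-wrap step buys you is the sketch below. SnapshotDemo and snapshot are hypothetical names, and I've used generics with String keys and values purely for the example (the code above uses raw types).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// A minimal sketch; SnapshotDemo and snapshot are illustrative names.
class SnapshotDemo {
    // Copy first, then wrap: later changes to source cannot leak into the
    // result, and the result itself rejects modification.
    static Map<String, String> snapshot(Map<String, String> source) {
        return Collections.unmodifiableMap(new HashMap<>(source));
    }

    public static void main(String[] args) {
        Map<String, String> codes = new HashMap<>();
        codes.put("A1", "Alpha");
        Map<String, String> published = snapshot(codes);

        codes.clear(); // the caller keeps mutating its own map
        System.out.println(published.size()); // prints 1

        try {
            published.put("B2", "Beta");
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only"); // prints read-only
        }
    }
}
```

Wrapping without copying would only give a read-only view of the caller's map, so the clear() above would still be visible to readers.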
Depend on Interfaces
On a stylistic note, it's often better to declare APIs with abstract types like List and Map, rather than concrete types like ArrayList and HashMap. This gives flexibility in the future if concrete types need to change (as I did here).
Caching
The result of assigning "h" to "validProgramCodes" may simply be a write to the processor's cache. Even when a new thread starts, "h" will not be visible to a new thread unless it has been flushed to shared memory. A good runtime will avoid flushing unless it's necessary, and using volatile is one way to indicate that it's necessary.
Reordering
Assume the following code:
HashMap codes = new HashMap();
codes.putAll(source);
meta.setValidProgramCodes(codes);
If setValidProgramCodes is simply the OP's validProgramCodes = h;, the compiler is free to reorder the code into something like this:
1: meta.validProgramCodes = codes = new HashMap();
2: codes.putAll(source);
Suppose after execution of writer line 1, a reader thread starts running this code:
1: Map codes = meta.getValidProgramCodes();
2: Iterator i = codes.entrySet().iterator();
3: while (i.hasNext()) {
4: Map.Entry e = (Map.Entry) i.next();
5: // Do something with e.
6: }
Now suppose that the writer thread calls "putAll" on the map between the reader's line 2 and line 3. The map underlying the Iterator has experienced a concurrent modification, and throws a runtime exception—a devilishly intermittent, seemingly inexplicable runtime exception that was never produced during testing.
Concurrent Programming
Any time you have one thread that cares what another thread is doing, you must have some sort of memory barrier to ensure that actions of one thread are visible to the other. If an event in one thread must happen before an event in another thread, you must indicate that explicitly. There are no guarantees otherwise. In practice, this means volatile or synchronized.
Don't skimp. It doesn't matter how fast an incorrect program fails to do its job. The examples shown here are simple and contrived, but rest assured, they illustrate real-world concurrency bugs that are incredibly difficult to identify and resolve due to their unpredictability and platform-sensitivity.
Additional Resources
The Java Language Specification - 17 Threads and Locks sections: §17.3 and §17.4
The JMM FAQ
Doug Lea's concurrency books
No, the code example is not safe, because there is no safe publication of any new HashMap instances. Without any synchronization, there is a possibility that a reader thread will see a partially initialized HashMap.
Check out @erickson's explanation under "Reordering" in his answer. Also I can't recommend Brian Goetz's book Java Concurrency in Practice enough!
Whether or not it is okay with you that reader threads might see old (stale) HashMap references, or might even never see a new reference, is beside the point. The worst thing that can happen is that a reader thread might obtain reference to and attempt to access a HashMap instance that is not yet initialized and not ready to be accessed.
No, by the Java Memory Model (JMM), this is not thread-safe.
There is no happens-before relation between writing and reading the HashMap implementation objects. So, although the writer thread appears to write out the object first and then the reference, a reader thread may not see the same order.
As also mentioned, there is no guarantee that the reader thread will ever see the new value. In practice, with current compilers on existing hardware, the value should get updated, unless the loop body is small enough to be inlined aggressively.
So, making the reference volatile is adequate under the new JMM. It is unlikely to make a substantial difference to system performance.
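A minimal sketch of what "making the reference volatile" could look like, assuming the map holds String keys and values (the original code uses raw types) and that nobody mutates a map after it has been published. The class name VolatileMetadata and the defensive copy are my own additions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class VolatileMetadata {
    // A write to a volatile field happens-before every later read of that field,
    // so a reader that sees the new reference also sees the fully built map.
    private volatile Map<String, String> validProgramCodes = new HashMap<>();

    Map<String, String> getValidProgramCodes() {
        return validProgramCodes;
    }

    void setValidProgramCodes(Map<String, String> newCodes) {
        // Populate a private copy completely, then publish it with one volatile write.
        Map<String, String> copy = new HashMap<>(newCodes);
        validProgramCodes = copy;
    }

    public static void main(String[] args) {
        VolatileMetadata meta = new VolatileMetadata();
        Map<String, String> fresh = new HashMap<>();
        fresh.put("A1", "Alpha");
        meta.setValidProgramCodes(fresh);
        System.out.println(meta.getValidProgramCodes());
    }
}
```

The order matters: everything done to the copy before the volatile write is visible to a thread that later reads the field. Changes made to the map after publication get no such guarantee.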
The moral of this story: Threading is difficult. Don't try to be clever, because sometimes (maybe not on your test system) you won't be clever enough.
As others have already noted, this is not safe and you shouldn't do this. You need either volatile or synchronized here to force other threads to see the change.
What hasn't been mentioned is that synchronized and especially volatile are probably a lot faster than you think. If it's actually a performance bottleneck in your app, then I'll eat this web page.
Another option (probably slower than volatile, but YMMV) is to use a ReentrantReadWriteLock to protect access so that multiple concurrent readers can read it. And if that's still a performance bottleneck, I'll eat this whole web site.
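import java.util.HashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class Metadata
{
    private HashMap validProgramCodes;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Any number of readers may hold the read lock at the same time.
    public HashMap getValidProgramCodes() {
        lock.readLock().lock();
        try {
            return validProgramCodes;
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock is exclusive: it waits for current readers to finish.
    public void setValidProgramCodes(HashMap h) {
        lock.writeLock().lock();
        try {
            validProgramCodes = h;
        } finally {
            lock.writeLock().unlock();
        }
    }
}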
I think your assumptions are correct. The only thing I would do is make validProgramCodes volatile.
private volatile HashMap validProgramCodes;
This way, when you update the "pointer" of validProgramCodes, you guarantee that all threads see the same, latest HashMap "pointer", because they don't rely on a thread-local cache and go directly to memory.
The assignment will work as long as you're not concerned about reading stale values, and as long as you can guarantee that your HashMap is properly populated on initialization. You should at the least wrap the HashMap with Collections.unmodifiableMap to guarantee that your readers won't be changing/deleting objects from the map, and to avoid multiple threads stepping on each other's toes and invalidating one another's iterators.
(The writer above is right about volatile; I should've seen that.)
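To make the Collections.unmodifiableMap suggestion concrete, here is a small sketch. The class and method names (ReadOnlyCodes, snapshotOf, rejectsWrites) are hypothetical, and String keys and values are an assumption. One detail that is easy to miss: unmodifiableMap returns a view, not a copy, so the sketch copies the source first.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class ReadOnlyCodes {

    /** Builds a read-only snapshot that later changes to 'source' cannot affect. */
    static Map<String, String> snapshotOf(Map<String, String> source) {
        // Copy first: without the copy, writes to 'source' would show through the wrapper.
        return Collections.unmodifiableMap(new HashMap<>(source));
    }

    /** Returns true if the map refuses modification attempts from a reader. */
    static boolean rejectsWrites(Map<String, String> map) {
        try {
            map.put("X9", "rogue entry");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> source = new HashMap<>();
        source.put("A1", "Alpha");
        Map<String, String> snapshot = snapshotOf(source);
        System.out.println("rejects writes: " + rejectsWrites(snapshot));
    }
}
```

The wrapper only stops readers from modifying the map. It does nothing for visibility, so the reference to the snapshot still has to be handed to other threads safely (volatile field, lock, and so on).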
While this is not the best solution for this particular problem (erickson's idea of a new unmodifiableMap is), I'd like to take a moment to mention the java.util.concurrent.ConcurrentHashMap class introduced in Java 5, a version of HashMap specifically built with concurrency in mind. This construct does not block on reads.
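For comparison, a rough sketch of how ConcurrentHashMap behaves when the map changes during iteration. The class name and sample keys are invented for the example. Its iterators are weakly consistent: they never throw ConcurrentModificationException, and they may or may not reflect entries added after the iterator was created.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class WeaklyConsistentIterationDemo {

    /** Iterates while inserting a new key; returns how many entries were visited. */
    static int iterateWhileWriting() {
        Map<String, String> codes = new ConcurrentHashMap<>();
        codes.put("A1", "Alpha");
        codes.put("B2", "Beta");

        int visited = 0;
        for (Map.Entry<String, String> e : codes.entrySet()) {
            // With a plain HashMap this insert would invalidate the iterator.
            codes.put("C3", "Gamma");
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        // The two original entries are always visited; the new one may or may not be.
        System.out.println("entries visited: " + iterateWhileWriting());
    }
}
```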
Check this post about concurrency basics. It should be able to answer your question satisfactorily.
http://walivi.wordpress.com/2013/08/24/concurrency-in-java-a-beginners-introduction/
I think it's risky. Threading results in all kinds of subtle issues that are a giant pain to debug. You might want to look at FastHashMap, which is intended for read-only threading cases like this.
At the least, I'd also declare validProgramCodes to be volatile so that the reference won't get optimized into a register or something.
If I read the JLS correctly (no guarantees there!), accesses to references are always atomic, period. See Section 17.7, "Non-atomic Treatment of double and long".
So, if the access to a reference is always atomic and it doesn't matter what instance of the returned Hashmap the threads see, you should be OK. You won't see partial writes to the reference, ever.
Edit: After review of the discussion in the comments below and other answers, here are references/quotes from
Doug Lea's book (Concurrent Programming in Java, 2nd Ed), p 94, section 2.2.7.2 Visibility, item #3: "The first time a thread access a field of an object, it sees either the initial value of the field or the value since written by some other thread."
On p. 94, Lea goes on to describe risks associated with this approach:
The memory model guarantees that, given the eventual occurrence of the above operations, a particular update to a particular field made by one thread will eventually be visible to another. But eventually can be an arbitrarily long time.
So when it absolutely, positively, must be visible to any calling thread, volatile or some other synchronization barrier is required, especially in long running threads or threads that access the value in a loop (as Lea says).
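The "value read in a loop" case is the classic one, so here is a sketch of it. StoppableWorker and its method names are hypothetical, and the 50 ms sleep and 5 second join timeout are arbitrary values. Without volatile on the flag, the JIT is allowed to read it once and spin forever; with volatile, the worker must observe the write.

```java
final class StoppableWorker implements Runnable {
    // volatile forces each loop iteration to observe the latest write to the flag.
    private volatile boolean running = true;
    private long iterations;

    void stop() {
        running = false;
    }

    @Override
    public void run() {
        while (running) {
            iterations++; // stand-in for real work
        }
    }

    /** Starts a worker, asks it to stop, and reports whether it terminated. */
    static boolean stopsPromptly() {
        StoppableWorker worker = new StoppableWorker();
        Thread t = new Thread(worker);
        t.setDaemon(true); // do not keep the JVM alive if something goes wrong
        t.start();
        try {
            Thread.sleep(50);
            worker.stop();
            t.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + stopsPromptly());
    }
}
```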
However, in the case where there is a short lived thread, as implied by the question, with new threads for new readers and it does not impact the application to read stale data, synchronization is not required.
@erickson's answer is the safest in this situation, guaranteeing that other threads will see the changes to the HashMap reference as they occur. I'd suggest following that advice simply to avoid the confusion over the requirements and implementation that resulted in the "down votes" on this answer and the discussion below.
I'm not deleting the answer in the hope that it will be useful. I'm not looking for the "Peer Pressure" badge... ;-)
