Here's the deal. I have a hash map containing data I call "program codes", it lives in an object, like so:
Class Metadata
{
private HashMap validProgramCodes;
public HashMap getValidProgramCodes() { return validProgramCodes; }
public void setValidProgramCodes(HashMap h) { validProgramCodes = h; }
}
I have lots and lots of reader threads each of which will call getValidProgramCodes() once and then use that hashmap as a read-only resource.
So far so good. Here's where we get interesting.
I want to put in a timer which every so often generates a new list of valid program codes (never mind how), and calls setValidProgramCodes.
My theory -- which I need help to validate -- is that I can continue using the code as is, without putting in explicit synchronization. It goes like this:
At the time that validProgramCodes are updated, the value of validProgramCodes is always good -- it is a pointer to either the new or the old hashmap. This is the assumption upon which everything hinges. A reader who has the old hashmap is okay; he can continue to use the old value, as it will not be garbage collected until he releases it. Each reader is transient; it will die soon and be replaced by a new one who will pick up the new value.
Does this hold water? My main goal is to avoid costly synchronization and blocking in the overwhelming majority of cases where no update is happening. We only update once per hour or so, and readers are constantly flickering in and out.
Use Volatile
Is this a case where one thread cares what another is doing? Then the JMM FAQ has the answer:
Most of the time, one thread doesn't
care what the other is doing. But when
it does, that's what synchronization
is for.
In response to those who say that the OP's code is safe as-is, consider this: There is nothing in Java's memory model that guarantees that this field will be flushed to main memory when a new thread is started. Furthermore, a JVM is free to reorder operations as long as the changes aren't detectable within the thread.
Theoretically speaking, the reader threads are not guaranteed to see the "write" to validProgramCodes. In practice, they eventually will, but you can't be sure when.
I recommend declaring the validProgramCodes member as "volatile". The speed difference will be negligible, and it will guarantee the safety of your code now and in future, whatever JVM optimizations might be introduced.
Here's a concrete recommendation:
import java.util.Collections;
class Metadata {
private volatile Map validProgramCodes = Collections.emptyMap();
public Map getValidProgramCodes() {
return validProgramCodes;
}
public void setValidProgramCodes(Map h) {
if (h == null)
throw new NullPointerException("validProgramCodes == null");
validProgramCodes = Collections.unmodifiableMap(new HashMap(h));
}
}
Immutability
In addition to wrapping it with unmodifiableMap, I'm copying the map (new HashMap(h)). This makes a snapshot that won't change even if the caller of setter continues to update the map "h". For example, they might clear the map and add fresh entries.
Depend on Interfaces
On a stylistic note, it's often better to declare APIs with abstract types like List and Map, rather than a concrete types like ArrayList and HashMap. This gives flexibility in the future if concrete types need to change (as I did here).
Caching
The result of assigning "h" to "validProgramCodes" may simply be a write to the processor's cache. Even when a new thread starts, "h" will not be visible to a new thread unless it has been flushed to shared memory. A good runtime will avoid flushing unless it's necessary, and using volatile is one way to indicate that it's necessary.
Reordering
Assume the following code:
HashMap codes = new HashMap();
codes.putAll(source);
meta.setValidProgramCodes(codes);
If setValidCodes is simply the OP's validProgramCodes = h;, the compiler is free to reorder the code something like this:
1: meta.validProgramCodes = codes = new HashMap();
2: codes.putAll(source);
Suppose after execution of writer line 1, a reader thread starts running this code:
1: Map codes = meta.getValidProgramCodes();
2: Iterator i = codes.entrySet().iterator();
3: while (i.hasNext()) {
4: Map.Entry e = (Map.Entry) i.next();
5: // Do something with e.
6: }
Now suppose that the writer thread calls "putAll" on the map between the reader's line 2 and line 3. The map underlying the Iterator has experienced a concurrent modification, and throws a runtime exception—a devilishly intermittent, seemingly inexplicable runtime exception that was never produced during testing.
Concurrent Programming
Any time you have one thread that cares what another thread is doing, you must have some sort of memory barrier to ensure that actions of one thread are visible to the other. If an event in one thread must happen before an event in another thread, you must indicate that explicitly. There are no guarantees otherwise. In practice, this means volatile or synchronized.
Don't skimp. It doesn't matter how fast an incorrect program fails to do its job. The examples shown here are simple and contrived, but rest assured, they illustrate real-world concurrency bugs that are incredibly difficult to identify and resolve due to their unpredictability and platform-sensitivity.
Additional Resources
The Java Language Specification - 17 Threads and Locks sections: §17.3 and §17.4
The JMM FAQ
Doug Lea's concurrency books
No, the code example is not safe, because there is no safe publication of any new HashMap instances. Without any synchronization, there is a possibility that a reader thread will see a partially initialized HashMap.
Check out #erickson's explanation under "Reordering" in his answer. Also I can't recommend Brian Goetz's book Java Concurrency in Practice enough!
Whether or not it is okay with you that reader threads might see old (stale) HashMap references, or might even never see a new reference, is beside the point. The worst thing that can happen is that a reader thread might obtain reference to and attempt to access a HashMap instance that is not yet initialized and not ready to be accessed.
No, by the Java Memory Model (JMM), this is not thread-safe.
There is no happens-before relation between writing and reading the HashMap implementation objects. So, although the writer thread appears to write out the object first and then the reference, a reader thread may not see the same order.
As also mentioned there is no guarantee that the reaer thread will ever see the new value. In practice with current compilers on existing hardware the value should get updated, unless the loop body is sufficienly small that it can be sufficiently inlined.
So, making the reference volatile is adequate under the new JMM. It is unlikely to make a substantial difference to system performance.
The moral of this story: Threading is difficult. Don't try to be clever, because sometimes (may be not on your test system) you wont be clever enough.
As others have already noted, this is not safe and you shouldn't do this. You need either volatile or synchronized here to force other threads to see the change.
What hasn't been mentioned is that synchronized and especially volatile are probably a lot faster than you think. If it's actually a performance bottleneck in your app, then I'll eat this web page.
Another option (probably slower than volatile, but YMMV) is to use a ReentrantReadWriteLock to protect access so that multiple concurrent readers can read it. And if that's still a performance bottleneck, I'll eat this whole web site.
public class Metadata
{
private HashMap validProgramCodes;
private ReadWriteLock lock = new ReentrantReadWriteLock();
public HashMap getValidProgramCodes() {
lock.readLock().lock();
try {
return validProgramCodes;
} finally {
lock.readLock().unlock();
}
}
public void setValidProgramCodes(HashMap h) {
lock.writeLock().lock();
try {
validProgramCodes = h;
} finally {
lock.writeLock().unlock();
}
}
}
I think your assumptions are correct. The only thing I would do is set the validProgramCodes volatile.
private volatile HashMap validProgramCodes;
This way, when you update the "pointer" of validProgramCodes you guaranty that all threads access the same latest HasMap "pointer" because they don't rely on local thread cache and go directly to memory.
The assignment will work as long as you're not concerned about reading stale values, and as long as you can guarantee that your hashmap is properly populated on initialization. You should at the least create the hashMap with Collections.unmodifiableMap on the Hashmap to guarantee that your readers won't be changing/deleting objects from the map, and to avoid multiple threads stepping on each others toes and invalidating iterators when other threads destroy.
( writer above is right about the volatile, should've seen that)
While this is not the best solution for this particular problem (erickson's idea of a new unmodifiableMap is), I'd like to take a moment to mention the java.util.concurrent.ConcurrentHashMap class introduced in Java 5, a version of HashMap specifically built with concurrency in mind. This construct does not block on reads.
Check this post about concurrency basics. It should be able to answer your question satisfactorily.
http://walivi.wordpress.com/2013/08/24/concurrency-in-java-a-beginners-introduction/
I think it's risky. Threading results in all kinds of subtly issues that are a giant pain to debug. You might want to look at FastHashMap, which is intended for read-only threading cases like this.
At the least, I'd also declare validProgramCodes to be volatile so that the reference won't get optimized into a register or something.
If I read the JLS correctly (no guarantees there!), accesses to references are always atomic, period. See Section 17.7 Non-atomic Treatment of double and long
So, if the access to a reference is always atomic and it doesn't matter what instance of the returned Hashmap the threads see, you should be OK. You won't see partial writes to the reference, ever.
Edit: After review of the discussion in the comments below and other answers, here are references/quotes from
Doug Lea's book (Concurrent Programming in Java, 2nd Ed), p 94, section 2.2.7.2 Visibility, item #3: "
The first time a thread access a field
of an object, it sees either the
initial value of the field or the
value since written by some other
thread."
On p. 94, Lea goes on to describe risks associated with this approach:
The memory model guarantees that, given the eventual occurrence of the above operations, a particular update to a particular field made by one thread will eventually be visible to another. But eventually can be an arbitrarily long time.
So when it absolutely, positively, must be visible to any calling thread, volatile or some other synchronization barrier is required, especially in long running threads or threads that access the value in a loop (as Lea says).
However, in the case where there is a short lived thread, as implied by the question, with new threads for new readers and it does not impact the application to read stale data, synchronization is not required.
#erickson's answer is the safest in this situation, guaranteeing that other threads will see the changes to the HashMap reference as they occur. I'd suggest following that advice simply to avoid the confusion over the requirements and implementation that resulted in the "down votes" on this answer and the discussion below.
I'm not deleting the answer in the hope that it will be useful. I'm not looking for the "Peer Pressure" badge... ;-)
Related
Same with the follow link, I use the same code with the questioner.
Java multi-threading atomic reference assignment
In my code, there
HashMap<String,String> cache = new HashMap<String,String>();
public class myClass {
private HashMap<String,String> cache = null;
public void init() {
refreshCache();
}
// this method can be called occasionally to update the cache.
//Only one threading will get to this code.
public void refreshCache() {
HashMap<String,String> newcache = new HashMap<String,String>();
// code to fill up the new cache
// and then finally
cache = newcache; //assign the old cache to the new one in Atomic way
}
//Many threads will run this code
public void getCache(Object key) {
ob = cache.get(key)
//do something
}
}
I read the sjlee's answer again and again, I can't understand in which case these code will go wrong. Can anyone give me a example?
Remember I don't care about the getCache function will get the old data.
I'm sorry I can't add comment to the above question because I don't have 50 reputation.
So I just add a new question.
Without a memory barrier you might see null or an old map but you could see an incomplete map. I.e. you see bits of it but not all. Thus is not a problem if you don't mind entries being missing but you risk seeing the Map object but not anything it refers to resulting in a possible NPE.
There is no guarantee you will see a complete Map.
final fields will be visible but non - final fields might not.
this is a very interesting problem, and it shows that one of your core assumptions
"Remember I don't care about the getCache function will get the old
data."
is not correct.
we think, that if "refreshCache" and "getCache" is not synchronized, then we will only get old data, which is not true.
Their call by the initial thread may never reflect in other threads. Since cache is not volatile, every thread is free to keep it's own local copy of it and never make it consistent across threads.
Because the "visibility" aspect of multi-threading, which says that unless we use appropriate locking, or use volatile, we do not trigger a happens-before scenario, which forces threads to make shared variable value consistent across the multiple processors they are running on, which means "cache" , may never get initialized causing an obvious NPE in getCache
to understand this properly, i would recommend reading section 16.2.4 of "Java concurrency in practice" book which deals with a similar problem in double checked locking code.
Solution: would be
To make refreshCache synchronized to force, all threads to update their copy of HashMap whenever any one thread calls it, or
To make cache volatile or
You would have to call refreshCache in every single thread that calls getCache which kind of defeats the purpose of a common cache.
I have been working on a daily basis with the Java Memory Model for some years now. I think I have a good understanding about the concept of data races and the different ways to avoid them (e.g, synchronized blocks, volatile variables, etc). However, there's still something that I don't think I fully understand about the memory model, which is the way that final fields of classes are supposed to be thread safe without any further synchronization.
So according to the specification, if an object is properly initialized (that is, no reference to the object escapes in its constructor in such a way that the reference can be seen by another thread), then, after construction, any thread that sees the object will be guaranteed to see the references to all the final fields of the object (in the state they were when constructed), without any further synchronization.
In particular, the standard (http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4) says:
The usage model for final fields is a simple one: Set the final fields
for an object in that object's constructor; and do not write a
reference to the object being constructed in a place where another
thread can see it before the object's constructor is finished. If this
is followed, then when the object is seen by another thread, that
thread will always see the correctly constructed version of that
object's final fields. It will also see versions of any object or
array referenced by those final fields that are at least as up-to-date
as the final fields are.
They even give the following example:
class FinalFieldExample {
final int x;
int y;
static FinalFieldExample f;
public FinalFieldExample() {
x = 3;
y = 4;
}
static void writer() {
f = new FinalFieldExample();
}
static void reader() {
if (f != null) {
int i = f.x; // guaranteed to see 3
int j = f.y; // could see 0
}
}
}
In which a thread A is supposed to run "reader()", and a thread B is supposed to run "writer()".
So far, so good, apparently.
My main concern has to do with... is this really useful in practice? As far as I know, in order to make thread A (which is running "reader()") see the reference to "f", we must use some synchronization mechanism, such as making f volatile, or using locks to synchronize access to f. If we don't do so, we are not even guaranteed that "reader()" will be able to see an initialized "f", that is, since we have not synchronized access to "f", the reader will potentially see "null" instead of the object that was constructed by the writer thread. This issue is stated in http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#finalWrong , which is one of the main references for the Java Memory Model [bold emphasis mine]:
Now, having said all of this, if, after a thread constructs an
immutable object (that is, an object that only contains final fields),
you want to ensure that it is seen correctly by all of the other
thread, you still typically need to use synchronization. There is no
other way to ensure, for example, that the reference to the immutable
object will be seen by the second thread. The guarantees the program
gets from final fields should be carefully tempered with a deep and
careful understanding of how concurrency is managed in your code.
So if we are not even guaranteed to see the reference to "f", and we must therefore use typical synchronization mechanisms (volatile, locks, etc.), and these mechanisms do already cause data races to go away, the need for final is something I would not even consider. I mean, if in order to make "f" visible to other threads we still need to use volatile or synchronized blocks, and they already make internal fields be visible to the other threads... what's the point (in thread safety terms) in making a field final in the first place?
I think that you are misunderstanding what the JLS example is intended to show:
static void reader() {
if (f != null) {
int i = f.x; // guaranteed to see 3
int j = f.y; // could see 0
}
}
This code does not guarantee that the latest value of f will be seen by the thread that calls reader(). But what it is saying is that if you do see f as non-null, then f.x is guaranteed to be 3 ... despite the fact that we didn't actually do any explicit synchronizing.
Well is this implicit synchronization for finals in constructors useful? Certainly it is ... IMO. It means that we don't need to do any extra synchronization each time we accessed an immutable object's state. That is a good thing, because synchronization typically entails cache read-through or write-through, and that slows your program down.
But what Pugh is saying is that you will typically need to synchronize to get hold of the reference to the immutable object in the first place. He is making the point that using immutable objects (implemented using final) does not excuse you from the need to synchronize ... or from the need to understand the concurrency / synchronization implementation of your application.
The problem is that we still need to be sure that reader will se a non-null "f", and that's only possible if we use other synchronization mechanism that will already provide the semantics of allowing us to see 3 for f.x. And if that's the case, why bother using final for thread safety stuff?
There is a difference between synchronizing to get the reference and synchronizing to use the reference. The first one I may need to do only once. The second one I may need to do lots of times ... with the same reference. And even if it is one-to-one, I have still halved the number of synchronizing operations ... if I (hypothetically) implement the immutable object as thread-safe.
TL;DR: Most software developers should ignore the special rules regarding final variables in the Java Memory Model. They should adhere to the general rule: If a program is free of data races, all executions will appear to be sequentially consistent. In most cases, final variables can not be used to improve the performance of concurrent code, because the special rule in the Java Memory Model creates some additional costs for final variables, what makes volatile superior to final variables for almost all use cases.
The special rule about final variables prevents in some cases, that a final variable can show different values. However, performance-wise the rule is irrelevant.
Having said that, here is a more detailed answer. But I have to warn you. The following description might contain some precarious information, that most software developers should never care about, and it's better if they don't know about it.
The special rule about final variables in the Java Memory Model somehow implies, that it makes a difference for the Java VM and Java JIT compiler, if a member variable is final or if it's not.
public class Int {
public /* final */ int value;
public Int(int value) {
this.value = value;
}
}
If you take a look at the Hotspot source code, you will see that the compiler checks if the constructor of a class writes at least one final variable. If it does so, the compiler will emit additional code for the constructor, more precisely a memory release barrier. You will also find the following comment in the source code:
This method (which must be a constructor by the rules of Java)
wrote a final. The effects of all initializations must be
committed to memory before any code after the constructor
publishes the reference to the newly constructor object.
Rather than wait for the publication, we simply block the
writes here. Rather than put a barrier on only those writes
which are required to complete, we force all writes to complete.
That means the initialization of a final variable is similar to a write of a volatile variable. It implies some kind of memory release barrier. However, as can be seen from the quoted comment, final variables might be even more expensive. And what's even worse, you have these additional costs for final variables regardless whether they are used in concurrent code or not.
That's awful, because we want software developers to use final variables in order to increase the readability and maintainability of source code. Unfortunately, using final variables can significantly impact the performance of a program.
The question remains: Are there any use cases where the special rule regarding final variables helps to improve the performance of concurrent code?
That's hard to tell, because it depends on the actual implementation of the Java VM and the memory architecture of the machine. I haven't seen any such use cases until now. A quick glance at the source code of the package java.util.concurrent has also revealed nothing.
The problem is: The initialization of a final variable is about as expensive as a write of a volatile or atomic variable. If you use a volatile variable for the reference of the newly created object, you get the same behaviour and costs with the exception, that the reference will also be published immediately. So, there is basically no benefit in using final variables for concurrent programming.
You are right, since locking makes stronger guarantees, the guarantee about availability of finals is not particularly useful in the presence of locking. However, locking is not always necessary to ensure reliable concurrent access.
As far as I know, in order to make thread A (which is running "reader()") see the reference to "f", we must use some synchronization mechanism, such as making f volatile, or using locks to synchronize access to f.
Making f volatile is not a synchronization mechanism; it forces threads to read the memory each time the variable is accessed, but it does not synchronize access to a memory location. Locking is a way to synchronize access, but it is not necessary in practice to guarantee that the two threads share data reliably. For example, you could use a ConcurrentLinkedQueue<E> class, which is a lock-free concurrent collection* , to pass data from a reader thread to a writer thread, and avoid synchronization. You could also use AtomicReference<T> to ensure reliable concurrent access to an object without locking.
It is when you use lock-free concurrency that the guarantee about the visibility of final fields come in handy. If you make a lock-free collection, and use it to store immutable objects, your threads would be able to access the content of the objects without additional locking.
* ConcurrentLinkedQueue<E> is not only lock-free, but also a wait-free collection (i.e. a lock-free collection with additional guarantees not relevant to this discussion).
Yes final final fields are useful in terms of thread-safety. It may not be useful in your example, however if you look at the old ConcurrentHashMap implementation the get method doesn't apply any locking while it search for the value, though there is a risk that while look up is happening the list might change (think of ConcurrentModificationException ). However CHM uses the list made of final filed for 'next' field guaranteeing the consistency of the list (the items in the front/yet-to see will not grow or shrink). So the advantage is thread-safety is established without synchronization.
From the article
Exploiting immutability
One significant source of inconsistency is avoided by making the Entry
elements nearly immutable -- all fields are final, except for the
value field, which is volatile. This means that elements cannot be
added to or removed from the middle or end of the hash chain --
elements can only be added at the beginning, and removal involves
cloning all or part of the chain and updating the list head pointer.
So once you have a reference into a hash chain, while you may not know
whether you have a reference to the head of the list, you do know that
the rest of the list will not change its structure. Also, since the
value field is volatile, you will be able to see updates to the value
field immediately, greatly simplifying the process of writing a Map
implementation that can deal with a potentially stale view of memory.
While the new JMM provides initialization safety for final variables,
the old JMM does not, which means that it is possible for another
thread to see the default value for a final field, rather than the
value placed there by the object's constructor. The implementation
must be prepared to detect this as well, which it does by ensuring
that the default value for each field of Entry is not a valid value.
The list is constructed such that if any of the Entry fields appear to
have their default value (zero or null), the search will fail,
prompting the get() implementation to synchronize and traverse the
chain again.
Article link: https://www.ibm.com/developerworks/library/j-jtp08223/
I hope this isn't too silly a question...
I have code similar to the following in my project:
public class ConfigStore {
public static class Config {
public final String setting1;
public final String setting2;
public final String setting3;
public Config(String setting1, String setting2, String setting3) {
this.setting1 = setting1;
this.setting2 = setting2;
this.setting3 = setting3;
}
}
private volatile HashMap<String, Config> store = new HashMap<String, Config>();
public void swapConfigs(HashMap<String, Config> newConfigs) {
this.store = newConfigs;
}
public Config getConfig(String name) {
return this.store.get(name);
}
}
As requests are processed, each thread will request a config to use from the store using the getConfig() function. However, periodically (every few days most likely), the configs are updated and swapped out using the swapConfigs() function. The code that calls swapConfigs() does not keep a reference to the Map it passes in as it is simply the result of parsing a configuration file.
In this case, is the volatile keyword still needed on the store instance variable?
Will the volatile keyword introduce any potential performance bottlenecks that I should be aware of or can avoid given that the rate of reads greatly exceeds the rate of writes?
Thanks very much,
Since changing references is an atomic operation, you won't end up with one thread modifying the reference, and the other seeing a garbage reference, even if you drop volatile. However, the new map may not get instantly visible for some threads, which may consequently keep reading configuration from the old map for an indefinite time (or forever). So keep volatile.
Update
As #BeeOnRope pointed out in a comment below, there is an even stronger reason to use volatile:
"non-volatile writes [...] don't establish a happens-before relationship between the write and subsequent reads that see the written value. This means that a thread can see a new map published through the instance variable, but this new map hasn't been fully constructed yet. This is not intuitive, but it's a consequence of the memory model, and it happens in the real word. For an object to be safely published, it must be written to a volatile, or use a handful of other techniques.
Since you change the value very rarely, I don't think volatile would cause any noticeable performance difference. But at any rate, correct behaviour trumps performance.
No, this is not thread safe without volatile, even apart from the issues of seeing stale values. Even though there are no writes to the map itself, and reference assignment is atomic, the new Map<> has not been safely published.
For an object to be safely published, it must be communicated to other threads using some mechanism that either establishes a happens-before relationship between the object construction, the reference publication and the reference read, or it must use a handful of narrower methods which are guaranteed to be safe for publishing:
Initializing an object reference from a static initializer.
Storing a reference to it into a final field.
Neither of those two publication specific ways applies to you, so you'll need volatile to establish happens-before.
Here is a longer version of this reasoning, including links to the JLS and some examples of real-world things that can happen if you don't publish safely.
More details on safe publication can be found in JCIP (highly recommended), or here.
Your code is fine. You need volatile, otherwise your code would be 100% thread-safe (updating a reference is atomic), however the change might not be visible to all the threads. It means some threads will still see the old value of store.
That being said volatile is obligatory in your example. You might consider AtomicReference, but it won't give you anything more in your case.
You cannot trade correctness for performance so your second question is not really valid. It will have some performance impact, but probably only during update, which happens very rarely as you said. Basically JVM will ensure the change is visible to all the threads by "flushing" it, but after that it will be accessible as any other local variable (up until next update).
BTW I like Config class being immutable, please also consider immutable Map implementation just in case.
Would it work for you to use a ConcurrentHashMap and instead of swapping the entire config update the affected values in the hash map?
I see this code quite frequently in some OSS unit tests, but is it thread safe ? Is the while loop guaranteed to see the correct value of invoc ?
If no; nerd points to whoever also knows which CPU architecture this may fail on.
private int invoc = 0;
private synchronized void increment() {
invoc++;
}
public void isItThreadSafe() throws InterruptedException {
for (int i = 0; i < TOTAL_THREADS; i++) {
new Thread(new Runnable() {
public void run() {
// do some stuff
increment();
}
}).start();
}
while (invoc != TOTAL_THREADS) {
Thread.sleep(250);
}
}
No, it's not threadsafe. invoc needs to be declared volatile, or accessed while synchronizing on the same lock, or changed to use AtomicInteger. Just using the synchronized method to increment invoc, but not synchronizing to read it, isn't good enough.
The JVM does a lot of optimizations, including CPU-specific caching and instruction reordering. It uses the volatile keyword and locking to decide when it can optimize freely and when it has to have an up-to-date value available for other threads to read. So when the reader doesn't use the lock the JVM can't know not to give it a stale value.
This quote from Java Concurrency in Practice (section 3.1.3) discusses how both writes and reads need to be synchronized:
Intrinsic locking can be used to guarantee that one thread sees the effects of another in a predictable manner, as illustrated by Figure 3.1. When thread A executes a synchronized block, and subsequently thread B enters a synchronized block guarded by the same lock, the values of variables that were visible to A prior to releasing the lock are guaranteed to be visible to B upon acquiring the lock. In other words, everything A did in or prior to a synchronized block is visible to B when it executes a synchronized block guarded by the same lock. Without synchronization, there is no such guarantee.
The next section (3.1.4) covers using volatile:
The Java language also provides an alternative, weaker form of synchronization, volatile variables, to ensure that updates to a variable are propagated predictably to other threads. When a field is declared volatile, the compiler and runtime are put on notice that this variable is shared and that operations on it should not be reordered with other memory operations. Volatile variables are not cached in registers or in caches where they are hidden from other processors, so a read of a volatile variable always returns the most recent write by any thread.
Back when we all had single-CPU machines on our desktops we'd write code and never have a problem until it ran on a multiprocessor box, usually in production. Some of the factors that give rise to the visiblity problems, things like CPU-local caches and instruction reordering, are things you would expect from any multiprocessor machine. Elimination of apparently unneeded instructions could happen for any machine, though. There's nothing forcing the JVM to ever make the reader see the up-to-date value of the variable, you're at the mercy of the JVM implementors. So it seems to me this code would not be a good bet for any CPU architecture.
Well!
private volatile int invoc = 0;
Will do the trick.
And see Are java primitive ints atomic by design or by accident? which sites some of the relevant java definitions. Apparently int is fine, but double & long might not be.
edit, add-on. The question asks, "see the correct value of invoc ?". What is "the correct value"? As in the timespace continuum, simultaneity doesn't really exist between threads. One of the above posts notes that the value will eventually get flushed, and the other thread will get it. Is the code "thread safe"? I would say "yes", because it won't "misbehave" based on the vagaries of sequencing, in this case.
Theoretically, it is possible that the read is cached. Nothing in Java memory model prevents that.
Practically, that is extremely unlikely to happen (in your particular example). The question is, whether JVM can optimize across a method call.
read #1
method();
read #2
For JVM to reason that read#2 can reuse the result of read#1 (which can be stored in a CPU register), it must know for sure that method() contains no synchronization actions. This is generally impossible - unless, method() is inlined, and JVM can see from the flatted code that there's no sync/volatile or other synchronization actions between read#1 and read#2; then it can safely eliminate read#2.
Now in your example, the method is Thread.sleep(). One way to implement it is to busy loop for certain times, depending on CPU frequency. Then JVM may inline it, and then eliminate read#2.
But of course such implementation of sleep() is unrealistic. It is usually implemented as a native method that calls OS kernel. The question is, can JVM optimize across such a native method.
Even if JVM has knowledge of internal workings of some native methods, therefore can optimize across them, it's improbable that sleep() is treated that way. sleep(1ms) takes millions of CPU cycles to return, there is really no point optimizing around it to save a few reads.
--
This discussion reveals the biggest problem of data races - it takes too much effort to reason about it. A program is not necessarily wrong, if it is not "correctly synchronized", however to prove it's not wrong is not an easy task. Life is much simpler, if a program is correctly synchronized and contains no data race.
As far as I understand the code it should be safe. The bytecode can be reordered, yes. But eventually invoc should be in sync with the main thread again. Synchronize guarantees that invoc is incremented correctly so there is a consistent representation of invoc in some register. At some time this value will be flushed and the little test succeeds.
It is certainly not nice and I would go with the answer I voted for and would fix code like this because it smells. But thinking about it I would consider it safe.
If you're not required to use "int", I would suggest AtomicInteger as an thread-safe alternative.
There is a case where a map will be constructed, and once it is initialized, it will never be modified again. It will however, be accessed (via get(key) only) from multiple threads. Is it safe to use a java.util.HashMap in this way?
(Currently, I'm happily using a java.util.concurrent.ConcurrentHashMap, and have no measured need to improve performance, but am simply curious if a simple HashMap would suffice. Hence, this question is not "Which one should I use?" nor is it a performance question. Rather, the question is "Would it be safe?")
Jeremy Manson, the god when it comes to the Java Memory Model, has a three part blog on this topic - because in essence you are asking the question "Is it safe to access an immutable HashMap" - the answer to that is yes. But you must answer the predicate to that question which is - "Is my HashMap immutable". The answer might surprise you - Java has a relatively complicated set of rules to determine immutability.
For more info on the topic, read Jeremy's blog posts:
Part 1 on Immutability in Java:
http://jeremymanson.blogspot.com/2008/04/immutability-in-java.html
Part 2 on Immutability in Java:
http://jeremymanson.blogspot.com/2008/07/immutability-in-java-part-2.html
Part 3 on Immutability in Java:
http://jeremymanson.blogspot.com/2008/07/immutability-in-java-part-3.html
Your idiom is safe if and only if the reference to the HashMap is safely published. Rather than anything relating the internals of HashMap itself, safe publication deals with how the constructing thread makes the reference to the map visible to other threads.
Basically, the only possible race here is between the construction of the HashMap and any reading threads that may access it before it is fully constructed. Most of the discussion is about what happens to the state of the map object, but this is irrelevant since you never modify it - so the only interesting part is how the HashMap reference is published.
For example, imagine you publish the map like this:
class SomeClass {
public static HashMap<Object, Object> MAP;
public synchronized static setMap(HashMap<Object, Object> m) {
MAP = m;
}
}
... and at some point setMap() is called with a map, and other threads are using SomeClass.MAP to access the map, and check for null like this:
HashMap<Object,Object> map = SomeClass.MAP;
if (map != null) {
.. use the map
} else {
.. some default behavior
}
This is not safe even though it probably appears as though it is. The problem is that there is no happens-before relationship between the set of SomeObject.MAP and the subsequent read on another thread, so the reading thread is free to see a partially constructed map. This can pretty much do anything and even in practice it does things like put the reading thread into an infinite loop.
To safely publish the map, you need to establish a happens-before relationship between the writing of the reference to the HashMap (i.e., the publication) and the subsequent readers of that reference (i.e., the consumption). Conveniently, there are only a few easy-to-remember ways to accomplish that[1]:
Exchange the reference through a properly locked field (JLS 17.4.5)
Use static initializer to do the initializing stores (JLS 12.4)
Exchange the reference via a volatile field (JLS 17.4.5), or as the consequence of this rule, via the AtomicX classes
Initialize the value into a final field (JLS 17.5).
The ones most interesting for your scenario are (2), (3) and (4). In particular, (3) applies directly to the code I have above: if you transform the declaration of MAP to:
public static volatile HashMap<Object, Object> MAP;
then everything is kosher: readers who see a non-null value necessarily have a happens-before relationship with the store to MAP and hence see all the stores associated with the map initialization.
The other methods change the semantics of your method, since both (2) (using the static initalizer) and (4) (using final) imply that you cannot set MAP dynamically at runtime. If you don't need to do that, then just declare MAP as a static final HashMap<> and you are guaranteed safe publication.
In practice, the rules are simple for safe access to "never-modified objects":
If you are publishing an object which is not inherently immutable (as in all fields declared final) and:
You already can create the object that will be assigned at the moment of declarationa: just use a final field (including static final for static members).
You want to assign the object later, after the reference is already visible: use a volatile fieldb.
That's it!
In practice, it is very efficient. The use of a static final field, for example, allows the JVM to assume the value is unchanged for the life of the program and optimize it heavily. The use of a final member field allows most architectures to read the field in a way equivalent to a normal field read and doesn't inhibit further optimizationsc.
Finally, the use of volatile does have some impact: no hardware barrier is needed on many architectures (such as x86, specifically those that don't allow reads to pass reads), but some optimization and reordering may not occur at compile time - but this effect is generally small. In exchange, you actually get more than what you asked for - not only can you safely publish one HashMap, you can store as many more not-modified HashMaps as you want to the same reference and be assured that all readers will see a safely published map.
For more gory details, refer to Shipilev or this FAQ by Manson and Goetz.
[1] Directly quoting from shipilev.
a That sounds complicated, but what I mean is that you can assign the reference at construction time - either at the declaration point or in the constructor (member fields) or static initializer (static fields).
b Optionally, you can use a synchronized method to get/set, or an AtomicReference or something, but we're talking about the minimum work you can do.
c Some architectures with very weak memory models (I'm looking at you, Alpha) may require some type of read barrier before a final read - but these are very rare today.
The reads are safe from a synchronization standpoint but not a memory standpoint. This is something that is widely misunderstood among Java developers including here on Stackoverflow. (Observe the rating of this answer for proof.)
If you have other threads running, they may not see an updated copy of the HashMap if there is no memory write out of the current thread. Memory writes occur through the use of the synchronized or volatile keywords, or through uses of some java concurrency constructs.
See Brian Goetz's article on the new Java Memory Model for details.
After a bit more looking, I found this in the java doc (emphasis mine):
Note that this implementation is not
synchronized. If multiple threads
access a hash map concurrently, and at
least one of the threads modifies the
map structurally, it must be
synchronized externally. (A structural
modification is any operation that
adds or deletes one or more mappings;
merely changing the value associated
with a key that an instance already
contains is not a structural
modification.)
This seems to imply that it will be safe, assuming the converse of the statement there is true.
One note is that under some circumstances, a get() from an unsynchronized HashMap can cause an infinite loop. This can occur if a concurrent put() causes a rehash of the Map.
http://lightbody.net/blog/2005/07/hashmapget_can_cause_an_infini.html
There is an important twist though. It's safe to access the map, but in general it's not guaranteed that all threads will see exactly the same state (and thus values) of the HashMap. This might happen on multiprocessor systems where the modifications to the HashMap done by one thread (e.g., the one that populated it) can sit in that CPU's cache and won't be seen by threads running on other CPUs, until a memory fence operation is performed ensuring cache coherence. The Java Language Specification is explicit on this one: the solution is to acquire a lock (synchronized (...)) which emits a memory fence operation. So, if you are sure that after populating the HashMap each of the threads acquires ANY lock, then it's OK from that point on to access the HashMap from any thread until the HashMap is modified again.
According to http://www.ibm.com/developerworks/java/library/j-jtp03304/ # Initialization safety you can make your HashMap a final field and after the constructor finishes it would be safely published.
...
Under the new memory model, there is something similar to a happens-before relationship between the write of a final field in a constructor and the initial load of a shared reference to that object in another thread.
...
This question is addressed in Brian Goetz's "Java Concurrency in Practice" book (Listing 16.8, page 350):
#ThreadSafe
public class SafeStates {
private final Map<String, String> states;
public SafeStates() {
states = new HashMap<String, String>();
states.put("alaska", "AK");
states.put("alabama", "AL");
...
states.put("wyoming", "WY");
}
public String getAbbreviation(String s) {
return states.get(s);
}
}
Since states is declared as final and its initialization is accomplished within the owner's class constructor, any thread who later reads this map is guaranteed to see it as of the time the constructor finishes, provided no other thread will try to modify the contents of the map.
So the scenario you described is that you need to put a bunch of data into a Map, then when you're done populating it you treat it as immutable. One approach that is "safe" (meaning you're enforcing that it really is treated as immutable) is to replace the reference with Collections.unmodifiableMap(originalMap) when you're ready to make it immutable.
For an example of how badly maps can fail if used concurrently, and the suggested workaround I mentioned, check out this bug parade entry: bug_id=6423457
Be warned that even in single-threaded code, replacing a ConcurrentHashMap with a HashMap may not be safe. ConcurrentHashMap forbids null as a key or value. HashMap does not forbid them (don't ask).
So in the unlikely situation that your existing code might add a null to the collection during setup (presumably in a failure case of some kind), replacing the collection as described will change the functional behaviour.
That said, provided you do nothing else concurrent reads from a HashMap are safe.
[Edit: by "concurrent reads", I mean that there are not also concurrent modifications.
Other answers explain how to ensure this. One way is to make the map immutable, but it's not necessary. For example, the JSR133 memory model explicitly defines starting a thread to be a synchronised action, meaning that changes made in thread A before it starts thread B are visible in thread B.
My intent is not to contradict those more detailed answers about the Java Memory Model. This answer is intended to point out that even aside from concurrency issues, there is at least one API difference between ConcurrentHashMap and HashMap, which could scupper even a single-threaded program which replaced one with the other.]
http://www.docjar.com/html/api/java/util/HashMap.java.html
here is the source for HashMap. As you can tell, there is absolutely no locking / mutex code there.
This means that while its okay to read from a HashMap in a multithreaded situation, I'd definitely use a ConcurrentHashMap if there were multiple writes.
Whats interesting is that both the .NET HashTable and Dictionary<K,V> have built in synchronization code.
If the initialization and every put is synchronized you are save.
Following code is save because the classloader will take care of the synchronization:
public static final HashMap<String, String> map = new HashMap<>();
static {
map.put("A","A");
}
Following code is save because the writing of volatile will take care of the synchronization.
class Foo {
volatile HashMap<String, String> map;
public void init() {
final HashMap<String, String> tmp = new HashMap<>();
tmp.put("A","A");
// writing to volatile has to be after the modification of the map
this.map = tmp;
}
}
This will also work if the member variable is final because final is also volatile. And if the method is a constructor.