Using volatile to publish immutable objects? - java

public class VolatileCachedFactorizer extends GenericServlet implements Servlet {
private volatile OneValueCache cache = new OneValueCache(null, null);
public void service(ServletRequest req, ServletResponse resp) {
BigInteger i = extractFromRequest(req);
BigInteger[] factors = cache.getFactors(i);
if (factors == null) {
factors = factor(i); //----------> thread A
cache = new OneValueCache(i, factors); //---------> thread B
}
encodeIntoResponse(resp, factors);
}
}
public class OneValueCache {
private final BigInteger lastNum;
private final BigInteger[] lastFactors;
public OneValueCache(BigInteger i, BigInteger[] lastFactors){
this.lastNum = i;
this.lastFactors = lastFactors;
}
public BigInteger[] getFactors(BigInteger i){
if(lastNum == null || !lastNum.equals(i))
return null;
else
return Arrays.copyOf(lastFactors, lastFactors.length);
}
}
This code is from the book Java Concurrency in Practice. My question is about this code specifically: we can remove the final keyword from the fields of OneValueCache and still preserve thread safety, right? I am not sure why these final keywords are necessary.
Thanks.

Strictly speaking, it is not necessary in this particular situation, but the code is harder to reason about without the "final" keywords.
Basically there are two concurrency problems we are trying to solve:
1) The visibility of the "cache" reference - solved by using "volatile" here.
2) State consistency (safe publication) of the OneValueCache object. As stated in the "Java Concurrency In Practice" book:
The publication requirements for an object depend on its mutability:
Immutable objects can be published through any mechanism;
Effectively immutable objects must be safely published;
...
So if you remove "final" usages from OneValueCache then you are making this class more of an effectively immutable class, at least from the visibility standpoint, because "final" has memory visibility semantics (somewhat similar to "volatile") under concurrency.
So now instead of forgetting about object state consistency for any usages of the class you are forcing yourself to always think about safe publication when using it.
It also resembles what is described in chapter "16.1.4 Piggybacking on synchronization", because you would use the happens-before of writing/reading the volatile reference to guarantee that the OneValueCache object is in consistent state to all the threads after the construction. Basically it seems to be just a different explanation of the "safe publication" problem in this context.
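To make the distinction concrete, here is a minimal sketch (not from the book) of what the class looks like once the `final` modifiers are removed. The names `EffectivelyImmutableCache` and `CacheHolder` are hypothetical. Without `final`, readers are only guaranteed to see a fully constructed object when they reach it through a safe publication mechanism, which here is the `volatile` field.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical variant of OneValueCache with the final modifiers removed.
// Its fields are never written after construction, so it is "effectively
// immutable", but it no longer gets the final-field guarantees of the JMM.
class EffectivelyImmutableCache {
    private BigInteger lastNum;
    private BigInteger[] lastFactors;

    EffectivelyImmutableCache(BigInteger i, BigInteger[] factors) {
        this.lastNum = i;
        this.lastFactors = (factors == null) ? null : Arrays.copyOf(factors, factors.length);
    }

    BigInteger[] getFactors(BigInteger i) {
        if (lastNum == null || !lastNum.equals(i)) {
            return null;
        }
        return Arrays.copyOf(lastFactors, lastFactors.length);
    }
}

// Hypothetical holder contrasting the two ways the reference could be stored.
class CacheHolder {
    // Safe: the volatile write happens-before any later volatile read, so a
    // reader that sees the new reference also sees the constructor's writes.
    private volatile EffectivelyImmutableCache published =
            new EffectivelyImmutableCache(null, null);

    // Shown for contrast only and never read here: through a plain field, a
    // reader could in principle see the reference before the object's fields.
    private EffectivelyImmutableCache racy = new EffectivelyImmutableCache(null, null);

    void update(BigInteger n, BigInteger[] factors) {
        published = new EffectivelyImmutableCache(n, factors);
    }

    BigInteger[] lookup(BigInteger n) {
        return published.getFactors(n);
    }
}
```

With `final` fields, the `racy` style of storage would still never expose a half-initialized object (although readers might see a stale reference); without them, only the `volatile` field gives that guarantee.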

Related

Java concurrency in practice - safe publication, immutable object and volatile

I'm reading "Java concurrency in practice" and one thing is confusing me.
class OneValueCache {
private final BigInteger lastNumber;
private final BigInteger[] lastFactors;
public OneValueCache(BigInteger lastNumber, BigInteger[] lastFactors) {
this.lastNumber = lastNumber;
this.lastFactors = Arrays.copyOf(lastFactors, lastFactors.length);
}
public BigInteger[] getFactors(BigInteger i) {
if (lastNumber == null || !lastNumber.equals(i)) {
return null;
}
return Arrays.copyOf(lastFactors, lastFactors.length);
}
}
class VolatileCachedFactorized implements Servlet {
private volatile OneValueCache cache = new OneValueCache(null, null);
public void service(ServletRequest req, ServletResponse resp) {
BigInteger i = extractFromRequest(req);
BigInteger[] factors = cache.getFactors(i);
if (factors == null) {
factors = factor(i);
cache = new OneValueCache(i, factors);
}
encodeIntoResponse(resp, factors);
}
}
In the code above, the author uses volatile with a reference to the immutable OneValueCache, but a few pages later he writes:
Immutable objects can be used safely by any thread without additional synchronization, even when synchronization is not used to publish them.
So... is volatile not necessary in the code above?
There are two levels of thread safety being applied here. One is at the reference level (done using volatile). Consider one thread reading the reference as null while another thread sees a reference that was assigned in between. Volatile guarantees that a reference published by one thread is visible to the others. But another level of thread safety is required to safeguard the internal members themselves, which could otherwise be changed. A volatile reference does nothing to protect later changes to the data inside the cache (like lastNumber and lastFactors), so immutability helps in that case.
As a general rule of good thread-safe programming practice:
Do not assume that declaring a reference volatile guarantees safe
publication of the members of the referenced object
This is the same reason why putting the volatile keyword in front of a HashMap variable does not make it thread-safe.
cache is not a cache, it is a reference to a cache. The reference needs to be volatile in order that the switch of cache is visible to all threads.
Even after assignment to cache, other threads may be using the old cache, which they can safely do. But if you want the new cache to be seen as soon as it is switched, volatile is needed. There is still a window where threads might be using the old cache, but volatile guarantees that subsequent accessors will see the new cache. Do not confuse 'safety' with 'timeliness'.
Another way to look at this is to note that immutability is a property of the cache object, and cannot affect the use of any reference to that object. (And obviously the reference is not immutable, since we assign to it).
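The split between the two concerns can be sketched as follows. This is an illustrative example, not code from the book, and `Snapshot` and `SnapshotHolder` are hypothetical names. Immutability is a property of the `Snapshot` object, while `volatile` is a property of the field that points at it.

```java
// Immutable value: final fields, no setters. Any thread holding a reference
// to it can use it without further synchronization.
final class Snapshot {
    private final int version;
    private final String payload;

    Snapshot(int version, String payload) {
        this.version = version;
        this.payload = payload;
    }

    int version() { return version; }
    String payload() { return payload; }
}

class SnapshotHolder {
    // The reference itself is mutable shared state, so it is volatile:
    // a write here is visible to every later read of the field.
    private volatile Snapshot current = new Snapshot(0, "initial");

    Snapshot get() {
        return current;
    }

    void replace(String payload) {
        // Read-then-write is not atomic as a whole; concurrent callers may
        // overwrite each other, which this sketch accepts, as the servlet does.
        Snapshot old = current;
        current = new Snapshot(old.version() + 1, payload);
    }
}
```

A thread that fetched the old `Snapshot` before `replace` ran can keep using it safely; it simply works with older data, which is the 'safety' versus 'timeliness' distinction made above.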

Does making all fields final of an object guarantees safe publication?

I was reading about safe publication in "Java Concurrency in Practice" and need help understanding this one example. I know it is simple, but it looks like I read too much into it and got confused.
public class VolatileCachedFactorizer implements Servlet {
private volatile OneValueCache cache =
new OneValueCache(null, null);
public void service(ServletRequest req, ServletResponse resp) {
BigInteger i = extractFromRequest(req);
BigInteger[] factors = cache.getFactors(i);
if (factors == null) {
factors = factor(i);
cache = new OneValueCache(i, factors);
}
encodeIntoResponse(resp, factors);
}
}
class OneValueCache {
private final BigInteger lastNumber;
private final BigInteger[] lastFactors;
public OneValueCache(BigInteger i,
BigInteger[] factors) {
lastNumber = i;
lastFactors = Arrays.copyOf(factors, factors.length);
}
public BigInteger[] getFactors(BigInteger i) {
if (lastNumber == null || !lastNumber.equals(i))
return null;
else
return Arrays.copyOf(lastFactors, lastFactors.length);
}
}
Above are the two classes. One is VolatileCachedFactorizer, a servlet that is initialized only once by the container; each request calls the service method to get the factors of the number passed.
OneValueCache is an immutable object that caches the latest number along with its factors.
According to the book, it is safely published.
My question: the cache field is not declared final in VolatileCachedFactorizer, although all the fields of OneValueCache are final. When the constructor of OneValueCache is executed from the service method, isn't the following scenario possible?
lastNumber would be properly initialized (as it is final) but lastFactors would not, since each of the two assignments is atomic only on its own. So is there a chance that the object might be observed in an improper state?
If the cache field were declared final in VolatileCachedFactorizer, the JVM would guarantee that it is properly initialized.
Thanks
The short answer to your specific question is no. The thing to note is that the OneValueCache instance is created in the service method and then assigned to the volatile field "cache", which is what makes it visible to other threads. If thread A writes to a volatile variable, then any thread B that subsequently reads that variable sees all the changes thread A made before the write.
A volatile write cannot be reordered with the memory operations that precede it, so the constructor's writes are complete before the reference is stored, and a volatile read always observes the most recent write rather than a stale, locally cached value.
If your confusion is about whether the cache object is properly constructed, then yes: only a fully constructed immutable cache object is ever made visible to other threads.
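The visibility guarantee described in this answer can be sketched with a plain field and a volatile flag. The class `HappensBeforeDemo` and its members are hypothetical and exist only for illustration.

```java
class HappensBeforeDemo {
    private int data;               // plain, non-volatile field
    private volatile boolean ready; // volatile flag used for publication

    void writer() {
        data = 42;      // (1) plain write
        ready = true;   // (2) volatile write: everything before it is published too
    }

    int reader() {
        while (!ready) {    // (3) volatile read, repeated until it observes (2)
            Thread.yield();
        }
        return data;        // (4) must see 42: (1) happens-before (2), (2) before (3), (3) before (4)
    }

    // Runs the writer on another thread and returns what the reader observed.
    static int run() {
        HappensBeforeDemo demo = new HappensBeforeDemo();
        Thread t = new Thread(demo::writer);
        t.start();
        int seen = demo.reader();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }
}
```

In the servlet, the constructor's field assignments play the role of step (1) and the assignment to the volatile `cache` field plays the role of step (2).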

Ensuring safe publication and thread safety in java by means of static factories

The class below is meant to be immutable (but see edit):
public final class Position extends Data {
double latitude;
double longitude;
String provider;
private Position() {}
private static enum LocationFields implements
Fields<Location, Position, List<Byte>> {
LAT {
@Override
public List<byte[]> getData(Location loc, final Position out) {
final double lat = loc.getLatitude();
out.latitude = lat;
// return an arrayList
}
@Override
public void parse(List<Byte> list, final Position pos)
throws ParserException {
try {
pos.latitude = listToDouble(list);
} catch (NumberFormatException e) {
throw new ParserException("Malformed file", e);
}
}
}/* , LONG, PROVIDER, TIME (field from Data superclass)*/;
}
// ========================================================================
// Static API (factories essentially)
// ========================================================================
public static Position saveData(Context ctx, Location data)
throws IOException {
final Position out = new Position();
final List<byte[]> listByteArrays = new ArrayList<byte[]>();
for (LocationFields bs : LocationFields.values()) {
listByteArrays.add(bs.getData(data, out).get(0));
}
Persist.saveData(ctx, FILE_PREFIX, listByteArrays);
return out;
}
public static List<Position> parse(File f) throws IOException,
ParserException {
List<EnumMap<LocationFields, List<Byte>>> entries;
// populate entries from f
final List<Position> data = new ArrayList<Position>();
for (EnumMap<LocationFields, List<Byte>> enumMap : entries) {
Position p = new Position();
for (LocationFields field : enumMap.keySet()) {
field.parse(enumMap.get(field), p);
}
data.add(p);
}
return data;
}
/**
* Constructs a Position instance from the given string. Complete copy
* paste just to get the picture
*/
public static Position fromString(String s) {
if (s == null || s.trim().equals("")) return null;
final Position p = new Position();
String[] split = s.split(N);
p.time = Long.valueOf(split[0]);
int i = 0;
p.longitude = Double.valueOf(split[++i].split(IS)[1].trim());
p.latitude = Double.valueOf(split[++i].split(IS)[1].trim());
p.provider = split[++i].split(IS)[1].trim();
return p;
}
}
Being immutable it is also thread safe and all that. As you see the only way to construct instances of this class - except reflection which is another question really - is by using the static factories provided.
Questions :
Is there any case an object of this class might be unsafely published ?
Is there a case the objects as returned are thread unsafe ?
EDIT : please do not comment on the fields not being private - I realize this is not an immutable class by the dictionary, but the package is under my control and I won't ever change the value of a field manually (after construction ofc). No mutators are provided.
The fields not being final on the other hand is the gist of the question. Of course I realize that if they were final the class would be truly immutable and thread safe (at least after Java5). I would appreciate providing an example of bad use in this case though.
Finally - I do not mean to say that the factories being static has anything to do with thread safety as some of the comments seem(ed) to imply. What is important is that the only way to create instances of this class is through those (static of course) factories.
Yes, instances of this class can be published unsafely. This class is not immutable, so if the instantiating thread makes an instance available to other threads without a memory barrier, those threads may see the instance in a partially constructed or otherwise inconsistent state.
The term you are looking for is effectively immutable: the instance fields could be modified after initialization, but in fact they are not.
Such objects can be used safely by multiple threads, but it all depends on how other threads get access to the instance (i.e., how they are published). If you put these objects on a concurrent queue to be consumed by another thread—no problem. If you assign them to a field visible to another thread in a synchronized block, and notify() a wait()-ing thread which reads them—no problem. If you create all the instances in one thread which then starts new threads that use them—no problem!
But if you just assign them to a non-volatile field and sometime "later" another thread happens to read that field, that's a problem! Both the writing thread and the reading thread need synchronization points so that the write truly can be said to have happened before the read.
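As one illustration of the "no problem" cases, the sketch below hands an effectively immutable object to another thread through a `BlockingQueue`. The classes `Reading` and `QueueHandoff` are hypothetical stand-ins and are not taken from the question's code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Effectively immutable: the fields are not final, but nothing writes them
// after the constructor returns.
class Reading {
    double latitude;
    double longitude;

    Reading(double latitude, double longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }
}

class QueueHandoff {
    // Produces on a new thread and consumes on the calling thread.
    static Reading produceAndConsume() {
        BlockingQueue<Reading> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> queue.add(new Reading(51.5, -0.12)));
        producer.start();
        try {
            // Placing an element into a concurrent collection happens-before
            // its removal by another thread, so the consumer sees the fields
            // as the constructor wrote them.
            Reading r = queue.take();
            producer.join();
            return r;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Had the producer stored the object in a plain shared field instead, nothing would order its constructor writes before the consumer's reads.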
Your code doesn't do any publication, so I can't say if you are doing it safely. You could ask the same question about this object:
class Option {
private boolean value;
Option(boolean value) { this.value = value; }
boolean get() { return value; }
}
If you are doing something "extra" in your code that you think would make a difference to the safe publication of your objects, please point it out.
Position is not immutable, the fields have package visibility and are not final, see definition of immutable classes here: http://www.javapractices.com/topic/TopicAction.do?Id=29.
Furthermore Position is not safely published because the fields are not final and there is no other mechanism in place to ensure safe publication. The concept of safe publication is explained in many places, but this one seems particularly relevant: http://www.ibm.com/developerworks/java/library/j-jtp0618/
There are also relevant sources on SO.
In a nutshell, safe publication is about what happens when you give the reference to your constructed instance to another thread: will that thread see the field values as intended? The answer here is no, because the Java compiler and the JIT compiler are free to reorder the field initialization with the publication of the reference, leading to half-baked state becoming visible to other threads.
This last point is crucial. From the OP's comment on one of the answers below, he appears to believe static methods somehow work differently from other methods; that is not the case. A static method can get inlined much like any other method, and the same is true for constructors (the exception being final fields in constructors post Java 1.5). To be clear, while the JMM doesn't guarantee that the construction is safe, it may well work fine on certain or even all JVMs. For ample discussion, examples and industry expert opinions see this discussion on the concurrency-interest mailing list: http://jsr166-concurrency.10961.n7.nabble.com/Volatile-stores-in-constructors-disallowed-to-see-the-default-value-td10275.html
The bottom line is, it may work, but it is not safe publishing according to JMM. If you can't prove it is safe, it isn't.
The fields of the Position class are not final, so I believe that their values are not safely published by the constructor. The constructor is therefore not thread-safe, so no code (such as your factory methods) that uses it produces thread-safe objects.
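For comparison, here is a minimal sketch of a variant with final fields. `ImmutablePosition` and its `of` factory are hypothetical, and the enum-driven parsing from the question is deliberately left out. The key change is that the factory gathers all values first and calls the constructor once at the end, instead of mutating a half-built instance.

```java
// Hypothetical immutable counterpart of Position: every field is final and
// assigned exactly once in the private constructor.
final class ImmutablePosition {
    private final double latitude;
    private final double longitude;
    private final String provider;

    private ImmutablePosition(double latitude, double longitude, String provider) {
        this.latitude = latitude;
        this.longitude = longitude;
        this.provider = provider;
    }

    // Static factory: validate and collect the values, then construct once.
    static ImmutablePosition of(double latitude, double longitude, String provider) {
        if (provider == null) {
            throw new IllegalArgumentException("provider must not be null");
        }
        return new ImmutablePosition(latitude, longitude, provider);
    }

    double latitude() { return latitude; }
    double longitude() { return longitude; }
    String provider() { return provider; }
}
```

Because the fields are final and `this` does not escape the constructor, any thread that obtains a reference to an instance sees the values the constructor assigned, however the reference was handed over.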

Java, lazily initialized field without synchronization

Sometimes, when I need a lazily initialized field, I use the following design pattern.
class DictionaryHolder {
private volatile Dictionary dict; // some heavy object
public Dictionary getDictionary() {
Dictionary d = this.dict;
if (d == null) {
d = loadDictionary(); // costly operation
this.dict = d;
}
return d;
}
}
It looks like the double-checked locking idiom, but not exactly. There is no synchronization, and it is possible for the loadDictionary method to be called several times.
I use this pattern when the concurrency is pretty low. I also bear in mind the following assumptions when using this pattern:
loadDictionary method always returns the same data.
loadDictionary method is thread-safe.
My questions:
Is this pattern correct? In other words, is it possible for getDictionary() to return invalid data?
Is it possible to make dict field non-volatile for more efficiency?
Is there any better solution?
I personally feel that the Initialization on demand holder idiom is a good fit for this case. From the wiki:
public class Something {
private Something() {}
private static class LazyHolder {
private static final Something INSTANCE = new Something();
}
public static final Something getInstance() {
return LazyHolder.INSTANCE;
}
}
Though this might look like a pattern intended purely for singleton control, you can do many more cool things with it. For e.g. the holder class can invoke a method which in turn populates some kind of data.
Also, it seems that in your case if multiple threads queue on the loadDictionary call (which is synchronized), you might end up loading the same thing multiple times.
The simplest solution is to rely on the fact that a class is not loaded until it is needed. i.e. it is lazy loaded anyway. This way you can avoid having to do those checks yourself.
public enum Dictionary {
INSTANCE;
private Dictionary() {
// load dictionary
}
}
There shouldn't be a need to make it any more complex, certainly you won't make it more efficient.
EDIT: If Dictionary needs to implement List or Map, you can do this.
public enum Dictionary implements List<String> { }
OR a better approach is to use a field.
public enum Dictionary {
INSTANCE;
public final List<String> list = new ArrayList<String>();
}
OR use a static field initializer
public class Dictionary extends ArrayList<String> {
public static final Dictionary INSTANCE = new Dictionary();
private Dictionary() { }
}
Your code is correct. To avoid loading more than once, synchronized{} would be nice.
You can remove volatile, if Dictionary is immutable.
private Dictionary dict; // not volatile; assume Dictionary is immutable
public Dictionary getDict() {
    if (dict == null) {
        dict = load();
    }
    return dict;
}
If we add double checked locking, it's perfect
public Dictionary getDict() {
    if (dict == null) {
        synchronized (this) {
            if (dict == null) {
                dict = load();
            }
        }
    }
    return dict;
}
Double checked locking works great for immutable objects, without need of volatile.
Unfortunately the above 2 getDict() methods aren't theoretically bullet proof. The weak java memory model will allow some spooky actions - in theory. To be 100% correct by the book, we must add a local variable, which clutters our code:
public Dictionary getDict() {
    Dictionary local = dict; // read the shared field only once
    if (local == null) {
        synchronized (this) {
            local = dict;
            if (local == null) {
                local = dict = load();
            }
        }
    }
    return local;
}
1.Is this pattern correct? In other words, is it possible for getDictionary() to return invalid data?
Yes, if it's okay that loadDictionary() can be called by several threads simultaneously and thus different calls to getDictionary() can return different objects. Otherwise you need a solution with synchronization.
2.Is it possible to make dict field non-volatile for more efficiency?
No, it can cause memory visibility problems.
3.Is there any better solution?
As long as you want a solution without synchronization (either explicit or implicit), no (as far as I understand). Otherwise, there are a lot of idioms such as using an enum or an inner holder class (but they use implicit synchronization).
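One lock-free variation, offered only as a sketch and not taken from any of the answers, keeps the "loader may run more than once" property but makes every caller return the same instance by letting `AtomicReference.compareAndSet` pick a single winner. The class `LazyRef` is hypothetical, and it assumes the loader never returns null.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Lock-free lazy holder: the loader may run more than once under contention,
// but every caller gets the single instance that won the compareAndSet.
class LazyRef<T> {
    private final AtomicReference<T> ref = new AtomicReference<>();
    private final Supplier<T> loader; // assumed to be thread-safe and non-null-returning

    LazyRef(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        T value = ref.get();
        if (value == null) {
            T candidate = loader.get();               // possibly duplicated work
            if (ref.compareAndSet(null, candidate)) {
                value = candidate;                    // this thread's result won
            } else {
                value = ref.get();                    // another thread won; use its result
            }
        }
        return value;
    }
}
```

This still does not prevent duplicate loads; if the load is expensive enough that it must run at most once, one of the locking or class-initialization idioms is the better fit.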
Just a quick stab at this but what about...
class DictionaryHolder {
    private volatile Dictionary dict; // some heavy object
    public Dictionary getDictionary() {
        Dictionary d = this.dict;
        if (d == null) {
            synchronized (this) {
                d = this.dict;
                if (d == null) { // gated test for null
                    this.dict = d = loadDictionary(); // costly operation
                }
            }
        }
        return d;
    }
}
Is it possible to make dict field non-volatile for more efficiency?
No. That would hurt visibility, i.e. when one thread initializes dict, other threads may not see the updated reference in time (or at all). This in turn would result in multiple heavy initializations, and thus lots of useless work, not to mention returning references to multiple distinct objects.
Anyway, when dealing with concurrency, micro-optimizations for efficiency would be my last thought.
Initialize-on-demand holder class idiom
This method relies on the JVM only initializing the class members upon first reference to the class. In this case, we have an inner class that is only referenced within the getDictionary() method. This means DictionaryLazyHolder will get initialized on the first call to getDictionary().
public class DictionaryHolder {
    private DictionaryHolder() {
    }

    public static Dictionary getDictionary() {
        return DictionaryLazyHolder.instance;
    }

    private static class DictionaryLazyHolder {
        // Created once, when DictionaryLazyHolder is first initialized;
        // replace with however the heavy object is actually loaded.
        static final Dictionary instance = new Dictionary();
    }
}

How do you ensure multiple threads can safely access a class field?

When a class field is accessed via a getter method by multiple threads, how do you maintain thread safety? Is the synchronized keyword sufficient?
Is this safe:
public class SomeClass {
private int val;
public synchronized int getVal() {
return val;
}
private void setVal(int val) {
this.val = val;
}
}
or does the setter introduce further complications?
If you use 'synchronized' on the setter here too, this code is threadsafe. However it may not be sufficiently granular; if you have 20 getters and setters and they're all synchronized, you may be creating a synchronization bottleneck.
In this specific instance, with a single int variable, then eliminating the 'synchronized' and marking the int field 'volatile' will also ensure visibility (each thread will see the latest value of 'val' when calling the getter) but it may not be synchronized enough for your needs. For example, expecting
int old = someThing.getVal();
if (old == 1) {
someThing.setVal(2);
}
to set val to 2 if and only if it's already 1 is incorrect. For this you need an external lock, or some atomic compare-and-set method.
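A minimal sketch of the compare-and-set route, using `java.util.concurrent.atomic.AtomicInteger`. The wrapper class `CasValue` and its method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasValue {
    private final AtomicInteger val = new AtomicInteger();

    CasValue(int initial) {
        val.set(initial);
    }

    int getVal() {
        return val.get();
    }

    // Atomically sets the value to `update` only if it currently equals
    // `expected`; returns whether the swap happened.
    boolean setIfEquals(int expected, int update) {
        return val.compareAndSet(expected, update);
    }
}
```

Calling `setIfEquals(1, 2)` performs the check and the write as one indivisible step, so no other thread can change the value between the two.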
I strongly suggest you read Java Concurrency In Practice by Brian Goetz et al, it has the best coverage of Java's concurrency constructs.
In addition to Cowan's comment, you could do the following for a compare and store:
synchronized(someThing) {
int old = someThing.getVal();
if (old == 1) {
someThing.setVal(2);
}
}
This works because the lock defined via a synchronized method is implicitly the same as the object's lock (see java language spec).
From my understanding you should use synchronized on both the getter and the setter methods, and that is sufficient.
Edit: Here is a link to some more information on synchronization and what not.
If your class contains just one variable, then another way of achieving thread-safety is to use the existing AtomicInteger object.
public class ThreadSafeSomeClass {
private final AtomicInteger value = new AtomicInteger(0);
public void setValue(int x){
value.set(x);
}
public int getValue(){
return value.get();
}
}
However, if you add additional variables such that they are dependent (state of one variable depends upon the state of another), then AtomicInteger won't work.
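To illustrate the dependent-variables case, the sketch below guards two fields that share an invariant with a single lock. The class `NumberRange` and its default bounds are assumptions made for the example.

```java
// Two fields tied together by the invariant lower <= upper. Each update must
// check and write under the same lock; two independent AtomicIntegers could
// not enforce this, because the check and the write would not be one step.
class NumberRange {
    private int lower = 0;
    private int upper = 10;

    synchronized void setLower(int value) {
        if (value > upper) {
            throw new IllegalArgumentException("lower would exceed upper");
        }
        lower = value;
    }

    synchronized void setUpper(int value) {
        if (value < lower) {
            throw new IllegalArgumentException("upper would fall below lower");
        }
        upper = value;
    }

    synchronized boolean contains(int x) {
        return x >= lower && x <= upper;
    }
}
```

Every method that reads or writes either field takes the same lock, so no thread can observe or create a state in which the invariant is broken.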
Echoing the suggestion to read "Java Concurrency in Practice".
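For simple objects this may suffice. In most cases you should avoid synchronized methods, which lock on the object itself, because outside code that locks on the same object can then cause a deadlock. Synchronize on a private lock object instead.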
Example:
public class SomeClass {
private final Object mutex = new Object();
private int val = -1; // TODO: Adjust initialization to a reasonable start
// value
public int getVal() {
synchronized ( mutex ) {
return val;
}
}
private void setVal( int val ) {
synchronized ( mutex ) {
this.val = val;
}
}
}
This ensures that only one thread at a time reads or writes the instance field.
Read the book "Concurrent Programming in Java(tm): Design Principles and Patterns (Java (Addison-Wesley))", maybe http://java.sun.com/docs/books/tutorial/essential/concurrency/index.html is also helpful...
Synchronization exists to protect against thread interference and memory consistency errors. By synchronizing getVal(), the code guarantees that other synchronized methods on SomeClass do not execute at the same time. Since there are no other synchronized methods, it isn't providing much value. Also note that reads and writes of most primitives (all types except long and double) are atomic. That means that, with careful programming, one doesn't need to synchronize access to the field, although atomicity alone does not guarantee that one thread's write is visible to another.
Read Synchronization.
Not really sure why this was dropped to -3. I'm simply summarizing what the Synchronization tutorial from Sun says (as well as my own experience).
Using simple atomic variable access is
more efficient than accessing these
variables through synchronized code,
but requires more care by the
programmer to avoid memory consistency
errors. Whether the extra effort is
worthwhile depends on the size and
complexity of the application.
