How to "safely publish" lazily-generated effectively-immutable array - java

Java's present memory model guarantees that if the only reference to an object "George" is stored into a final field of some other object "Joe", and neither George nor Joe has ever been seen by any other thread, then all operations on George performed before the store will be seen by all threads as having happened before the store. This works out very nicely in cases where it makes sense to store into a final field a reference to an object which will never be mutated afterward.
Is there any efficient way of achieving such semantics in cases where an object of mutable type is supposed to be lazily created (sometime after the owning object's constructor has finished execution)? Consider the fairly simple class ArrayThing, which encapsulates an immutable array but also offers a method (shown below in three versions with the same nominal purpose) to return the sum of all elements prior to a specified one. For purposes of this example, assume that many instances will be constructed without ever using that method, but on instances where that method is used, it will be used a lot; consequently, it's not worthwhile to precompute the sums when every instance of ArrayThing is constructed, but it is worthwhile to cache them.
class ArrayThing {
    final int[] mainArray;

    ArrayThing(int[] initialContents) {
        mainArray = (int[]) initialContents.clone();
    }

    public int getElementAt(int index) {
        return mainArray[index];
    }

    int[] makeNewSumsArray() {
        int[] temp = new int[mainArray.length + 1];
        int sum = 0;
        for (int i = 0; i < mainArray.length; i++) {
            temp[i] = sum;
            sum += mainArray[i];
        }
        temp[mainArray.length] = sum; // 'i' is out of scope here; index the last slot explicitly
        return temp;
    }

    // Unsafe version (a thread could be seen as setting sumOfPrevElements1
    // before it's seen as populating the array).
    int[] sumOfPrevElements1;

    public int getSumOfElementsBefore_v1(int index) {
        int[] localElements = sumOfPrevElements1;
        if (localElements == null) {
            localElements = makeNewSumsArray();
            sumOfPrevElements1 = localElements;
        }
        return localElements[index];
    }

    static class Holder {
        public final int[] it;
        public Holder(int[] dat) { it = dat; }
    }

    // Safe version, but slower to read (adds another level of indirection);
    // no thread can possibly see a write to sumOfPrevElements2
    // before the final field and the underlying array have been written.
    Holder sumOfPrevElements2;

    public int getSumOfElementsBefore_v2(int index) {
        Holder localElements = sumOfPrevElements2;
        if (localElements == null) {
            localElements = new Holder(makeNewSumsArray());
            sumOfPrevElements2 = localElements;
        }
        return localElements.it[index];
    }

    // Safe version, I think, and with no penalty on reading speed.
    // Before storing the reference to the new array, however, it
    // creates a temporary object which is almost immediately
    // discarded; that seems rather hokey.
    int[] sumOfPrevElements3;

    public int getSumOfElementsBefore_v3(int index) {
        int[] localElements = sumOfPrevElements3;
        if (localElements == null) {
            localElements = (new Holder(makeNewSumsArray())).it;
            sumOfPrevElements3 = localElements;
        }
        return localElements[index];
    }
}
As with the String#hashCode() method, it is possible that two or more threads might see that a computation hasn't been performed, decide to perform it, and store the result. Since all threads would end up producing identical results, that wouldn't be an issue. With getSumOfElementsBefore_v1(), however, there is a different problem: Java could re-order program execution so the array reference gets written to sumOfPrevElements1 before all the elements of the array have been written. Another thread which called getSumOfElementsBefore_v1() at that moment could see that the array wasn't null, and then proceed to read an array element which hadn't yet been written. Oops.
From what I understand, getSumOfElementsBefore_v2() fixes that problem, since storing the array reference in the final field Holder#it establishes a happens-before relationship between the array-element writes and any read made through that field. Unfortunately, that version of the code needs to create and maintain an extra heap object, and requires that every access to the sum-of-elements array go through an extra level of indirection.
I think getSumOfElementsBefore_v3() would be cheaper but still safe. The JVM guarantees that all actions which were done to a new object before a reference is stored into a final field will be visible to all threads by the time any thread can see that reference. Thus, even if other threads don't use Holder#it directly, the fact that they are using a reference which was copied from that field would establish that they can't see the reference until after everything that was done before the store has actually happened.
Even though the latter method limits the overhead (versus the unsafe method) to the times when the new array is created (rather than adding overhead to every read), it still seems rather ugly to create a new object purely for the purpose of writing and reading back a final field. Making the array field volatile would achieve legitimate semantics, but would add memory-system overhead every time the field is read (a volatile qualifier requires the code to notice if the field has been written by another thread, but that's overkill for this application; what's necessary is merely that any thread which does see that the field has been written also sees all writes which occurred to the array identified thereby before the reference was stored). Is there any way to achieve similar semantics without having to either create and abandon a superfluous temporary object, or add additional overhead every time the field is read?

Your third version does not work. The guarantees made for a properly constructed object stored in a final instance field apply only to reads of that final field. Since the other threads never read that final field, no guarantee is made.
Most notably, the fact that the initialization of the array has to be completed before the array reference is stored in the final Holder.it field does not say anything about when the sumOfPrevElements3 variable will be written (as seen by other threads). In practice, a JVM might optimize away the entire Holder instance creation, as it has no side effects, so the resulting code behaves like an ordinary unsafe publication of an int[] array.
To use the final-field publication guarantee you have to publish the Holder instance containing the final field; there is no way around it.
But if that additional instance annoys you, you should really consider using a simple volatile variable. After all, you are only making assumptions about the cost of that volatile variable; in other words, you are engaging in premature optimization.
Besides, detecting a change made by another thread doesn't have to be expensive; on x86, for example, it doesn't even need an access to main memory, thanks to cache coherence. It's also possible that an optimizer detects that you never write to the variable again once it has become non-null, enabling almost all the optimizations possible for ordinary fields once a non-null reference has been read.
So the conclusion is, as always: measure, don't guess. And start optimizing only once you have found a real bottleneck.

I think your second and third examples do work (sort of; as you say, the reference itself might not be noticed by another thread, which might then re-assign the array, and that's a lot of extra work!).
But those examples are based on a faulty premise: it is not true that a volatile field requires the reader to "notice" the change. In fact, volatile and final fields perform essentially the same operation. The read of a volatile or a final field has no overhead on most CPU architectures; I believe a volatile write carries a tiny amount of extra overhead.
So I would just use volatile here, and not worry about your supposed "optimizations". The difference in speed, if any, is going to be extremely slight; we're talking about maybe an extra 4 bytes written with a bus lock, if that. And your "optimized" code is pretty god-awful to read.
As a minor pedantic point, it is not true that final fields require you to hold the sole reference to an object to make it immutable and thread safe. The spec only requires you to prevent changes to the object. Holding the sole reference to an object is one way to prevent changes, sure. But objects that are already immutable (like java.lang.String, for example) can be shared without problems.
In summary: premature optimization is the root of all evil. Lose the tricky nonsense and just write a simple array update with assignment to a volatile.
volatile int[] sumOfPrevElements;

public int getSumOfElementsBefore(int index) {
    if (sumOfPrevElements != null) return sumOfPrevElements[index];
    sumOfPrevElements = makeNewSumsArray();
    return sumOfPrevElements[index];
}
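A minor variant, assuming the same makeNewSumsArray() helper, reads the volatile field into a local once, so the null check and the element access cannot observe two different arrays and only one volatile read is paid per call:

volatile int[] sumOfPrevElements;

public int getSumOfElementsBefore(int index) {
    int[] local = sumOfPrevElements;   // single volatile read
    if (local == null) {
        local = makeNewSumsArray();    // may race; duplicate results are identical
        sumOfPrevElements = local;     // volatile write publishes the populated array
    }
    return local[index];               // plain read through the local reference
}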

Related

How to delete an (effectively) final array in Java

I want to remove the reference to a large array by nulling it after I have used it. This gives me a compiler error, however, since the parallel assignment to the array requires the variable to be (effectively) final (at least, that is what I think the problem is...). How can I allow the garbage collector to reclaim the array?
double[][] arr = new double[n][n];
IntStream.range(0, n).parallel().forEach(i -> {
    for (int j = 0; j < i; j++) {
        arr[i][j] = arr[j][i] = ...;
    }
});
// Use arr here...
arr = null; // arr no longer needed.
// This gives the error "Local variable defined in an enclosing scope must be final or effectively final."
I want to remove the reference to a large array by nulling it after I have used it
Don't.
All JVM implementations that I am aware of will scan thread stacks to find out which objects are reachable. This means that the scope of the method has nothing to do with how long an object is kept alive. In simpler words:
void yourMethod() {
    byte[] bytes = ....
    // use bytes array somehow
    // stop using the byte array
    // .... 10_000 lines of code
    // done
}
Immediately after the line // stop using the byte array, bytes IS eligible for garbage collection; it does not have to wait until the method ends. The scope of the method (everything between { and }) does not influence how long bytes stays alive; here is an example that proves this.
The array becomes eligible for garbage collection at the latest when the method returns - you don't need to set it to null.
If you have a long method and are concerned that the array is kept around for the rest of it, the solution is to write smaller methods. Dividing the functionality among smaller methods may also improve readability and reusability.
If you can't or don't want to write smaller methods, introducing separate blocks in the method may help. Local variable declarations are local to the block, so this "trick" also lets you re-use a variable name in different blocks in the method.
void largeMethod() {
    first: {
        final int number = 1;
    }
    second: {
        final int number = 2;
    }
}
Technically, the array becomes eligible for garbage collection after its last use, which can be in the middle of the method - before the variable goes out of scope. This is explicitly allowed by section 12.6.1 in the language specification:
Optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable. For example, a Java compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner.
While the specification allows this optimization, it does not require it. If you find that the optimization is not being made in a particular situation and you need a better guarantee, splitting the big method into smaller methods will help.
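A minimal sketch of that smaller-method approach, assuming the fill logic from the question (the real distance computation is elided there, so a placeholder is used; requires import java.util.stream.IntStream):

void process(int n) {
    computeAndUse(n);   // the array is only reachable while this call is running
    // ... long remainder of the method; the array can already be collected here
}

private void computeAndUse(int n) {
    double[][] arr = new double[n][n];
    IntStream.range(0, n).parallel().forEach(i -> {
        for (int j = 0; j < i; j++) {
            arr[i][j] = arr[j][i] = 0.0;   // placeholder for the real distance computation
        }
    });
    // use arr here ...
}   // arr goes out of scope; no need to assign null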
Use AtomicReference<double[][]> ar = new AtomicReference<>(); ar.set(arr);
This gives you an effectively final variable holding the array.
Then use the ar.get() and ar.set() methods to access and replace the array.
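A rough sketch of that idea, keeping the question's n and fill logic (placeholder computation; requires java.util.concurrent.atomic.AtomicReference and java.util.stream.IntStream):

AtomicReference<double[][]> ar = new AtomicReference<>(new double[n][n]);
IntStream.range(0, n).parallel().forEach(i -> {
    double[][] a = ar.get();        // 'ar' itself is effectively final, so the lambda compiles
    for (int j = 0; j < i; j++) {
        a[i][j] = a[j][i] = 0.0;    // placeholder for the real computation
    }
});
// ... use ar.get() here ...
ar.set(null);                       // drop the only reference; the array becomes eligible for GC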

How to create thread safe object array in Java?

I've searched for this question and I only found answers for primitive-type arrays.
Let's say I have a class called MyClass and I want to have an array of its objects in my another class.
class AnotherClass {
    [modifiers(?)] MyClass[] myObjects;

    void initFunction( ... ) {
        // some code
        myObjects = new MyClass[] { ... };
    }

    MyClass accessFunction(int index) {
        return myObjects[index];
    }
}
I read somewhere that declaring an array volatile does not give volatile access to its elements, but assigning a new array to the variable is safe.
So, if I understand it well, if I give my array a volatile modifier in my example code, it would be (kinda?) safe, as long as I never change its values with the [] operator.
Or am I wrong? And what should I do if I want to change one of its values? Should I create a new instance of the array and replace the old one with the new one in a single assignment?
AtomicXYZArray is not an option because it is only good for primitive-type arrays. AtomicIntegerArray uses native code for get() and set(), so it didn't help me.
Edit 1:
Collections.synchronizedList(...) can be a good alternative I think, but now I'm looking for arrays.
Edit 2: initFunction() is called from a different class.
AtomicReferenceArray seems to be a good answer. I didn't know about it until now. (I'm still interested in whether my example code would work with the volatile modifier before the array, with only these two functions called from somewhere else.)
This is my first question. I hope I managed to reach the formal requirements. Thanks.
Yes, you are correct when you say that the volatile keyword will not fulfil your requirement, as it will protect the reference to the array and not its elements.
If you want both, Collections.synchronizedList(...) or the synchronized collections are the easiest way to go.
Using modifiers as you are inclined to do is not the way to do this, as you will not affect the elements.
If you really must use an array like this one: new MyClass[]{ ... };
then AnotherClass is the one that needs to take responsibility for its safety, and you are probably looking for lower-level synchronization here: the synchronized keyword and locks.
The synchronized keyword is the easier option; you may create blocks and methods that lock on an object, or on the class instance by default.
At higher levels you can use Streams to perform a job for you. But in the end, I would suggest you use a synchronized version of an ArrayList if you are already using arrays, and a volatile reference to it, if necessary. If you do not update the reference to your array after your class is created, you don't need volatile, and you had better make it final, if possible.
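A minimal sketch of the synchronized-collection suggestion; the initFunction signature is hypothetical since the original one is elided (requires java.util.List, java.util.ArrayList, java.util.Collections):

class AnotherClass {
    private final List<MyClass> myObjects =
            Collections.synchronizedList(new ArrayList<>());

    void initFunction(List<MyClass> initial) {
        myObjects.addAll(initial);      // each call on the wrapper locks the list internally
    }

    MyClass accessFunction(int index) {
        return myObjects.get(index);    // synchronized read
    }
}

Note that iterating over such a list still requires manually synchronizing on it for the duration of the iteration.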
For your data to be thread-safe you want to ensure that there are no simultaneous:
write/write operations
read/write operations
by threads to the same object. This is known as the readers/writers problem. Note that it is perfectly fine for two threads to simultaneously read data at the same time from the same object.
You can enforce the above properties to a satisfactory level in normal circumstances by using the synchronized modifier (which acts as a lock on objects) and atomic constructs (which perform operations "instantaneously") in methods and on members. This essentially ensures that no two threads can access the same resource at the same time in a way that would lead to bad interleaving.
if I give my array a volatile modifier in my example code, it would be (kinda?) safe.
The volatile keyword will place the array reference in main memory and ensure that no thread caches a stale local copy of it in its private memory, which helps with visibility between threads, although it won't guarantee thread safety by itself. Also, volatile should be used sparingly except by experienced programmers, as it may cause unintended effects on the program.
And what should I do if I want to change one of its value? Should I create a new instance of the array an replace the old value with the new in the initial assignment?
Create synchronized mutator methods for the mutable members of your class if they need to be changed, or use the methods provided by atomic objects within your classes. This is the simplest approach to changing your data without causing unintended side effects (for example, removing an object from the array whilst another thread is accessing the data in the object being removed), as sketched below.
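As a sketch of the atomic-object route, AtomicReferenceArray (which the question's edit mentions) gives per-element volatile semantics; CAPACITY and updateFunction here are hypothetical (requires java.util.concurrent.atomic.AtomicReferenceArray):

class AnotherClass {
    private static final int CAPACITY = 16;   // hypothetical size
    private final AtomicReferenceArray<MyClass> myObjects =
            new AtomicReferenceArray<>(CAPACITY);

    MyClass accessFunction(int index) {
        return myObjects.get(index);           // volatile-read semantics for the element
    }

    void updateFunction(int index, MyClass newValue) {
        myObjects.set(index, newValue);        // volatile-write semantics for the element
    }
}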
Volatile does actually work in this case with one caveat: all the operations on MyClass may only read values.
Compared to all what you might read about what volatile does, it has one purpose in the JMM: creating a happens-before relationship. It only affects two kinds of operations:
volatile read (eg. accessing the field)
volatile write (eg. assignment to the field)
That's it. A happens-before relationship, straight from the JLS §17.4.5:
Two actions can be ordered by a happens-before relationship. If one action happens-before another, then the first is visible to and ordered before the second.
A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y).
These relationships are transitive. Taken all together this implies some important points: All actions taken on a single thread happen-before that thread's volatile write to that field (third point above). A volatile write of a field happens-before a read of that field (point two). So any other thread that reads the volatile field would see all the updates, including all referred-to objects like array elements in this case, as visible (first point). Importantly, they are only guaranteed to see the updates that were visible when the field was written. This means that if you fully construct an object, then assign it to a volatile field, and then never mutate it or any of the objects it refers to, it will never be seen in an inconsistent state. This is safe, taken with the caveat above:
class AnotherClass {
    private volatile MyClass[] myObjects = null;

    void initFunction( ... ) {
        // Using a volatile write with a fully constructed object.
        myObjects = new MyClass[] { ... };
    }

    MyClass accessFunction(int index) {
        // volatile read
        MyClass[] local = myObjects;
        if (local == null) {
            return null; // or something else
        }
        else {
            // should probably check length too
            return local[index];
        }
    }
}
I'm assuming you're only calling initFunction once. Even if you did call it more than once you would just clobber the values there, it wouldn't ever be in an inconsistent state.
You're also correct that updating this structure is not quite straightforward because you aren't allowed to mutate the array. Copy and replace, as you stated, is common. Assuming that only one thread will be updating the values, you can simply grab a reference to the current array, copy the values into a new array, and then re-assign the newly constructed array back to the volatile reference. Example:
private void add(MyClass newClass) {
    // volatile read
    MyClass[] local = myObjects;
    if (local == null) {
        // volatile write
        myObjects = new MyClass[] { newClass };
    }
    else {
        MyClass[] withUpdates = new MyClass[local.length + 1];
        System.arraycopy(local, 0, withUpdates, 0, local.length);
        withUpdates[local.length] = newClass;
        // volatile write
        myObjects = withUpdates;
    }
}
If you're going to have more than one thread updating, then you're going to run into issues where you lose additions to the array: two threads could each copy the old array, create a new array with their own new element, and then the last write would win. In that case you need either more synchronization or AtomicReferenceFieldUpdater.
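A hedged sketch of one such approach, using AtomicReference with a compareAndSet retry loop rather than the AtomicReferenceFieldUpdater mentioned above (requires java.util.Arrays and java.util.concurrent.atomic.AtomicReference):

private final AtomicReference<MyClass[]> myObjectsRef =
        new AtomicReference<>(new MyClass[0]);

private void add(MyClass newClass) {
    while (true) {
        MyClass[] current = myObjectsRef.get();
        MyClass[] updated = Arrays.copyOf(current, current.length + 1);
        updated[current.length] = newClass;
        if (myObjectsRef.compareAndSet(current, updated)) {
            return;                 // our copy won; no addition was lost
        }
        // another thread published a new array first; loop and retry against the fresh copy
    }
}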

Are final fields really useful regarding thread-safety?

I have been working on a daily basis with the Java Memory Model for some years now. I think I have a good understanding about the concept of data races and the different ways to avoid them (e.g, synchronized blocks, volatile variables, etc). However, there's still something that I don't think I fully understand about the memory model, which is the way that final fields of classes are supposed to be thread safe without any further synchronization.
So according to the specification, if an object is properly initialized (that is, no reference to the object escapes in its constructor in such a way that the reference can be seen by another thread), then, after construction, any thread that sees the object will be guaranteed to see the references to all the final fields of the object (in the state they were when constructed), without any further synchronization.
In particular, the standard (http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4) says:
The usage model for final fields is a simple one: Set the final fields
for an object in that object's constructor; and do not write a
reference to the object being constructed in a place where another
thread can see it before the object's constructor is finished. If this
is followed, then when the object is seen by another thread, that
thread will always see the correctly constructed version of that
object's final fields. It will also see versions of any object or
array referenced by those final fields that are at least as up-to-date
as the final fields are.
They even give the following example:
class FinalFieldExample {
final int x;
int y;
static FinalFieldExample f;
public FinalFieldExample() {
x = 3;
y = 4;
}
static void writer() {
f = new FinalFieldExample();
}
static void reader() {
if (f != null) {
int i = f.x; // guaranteed to see 3
int j = f.y; // could see 0
}
}
}
In which a thread A is supposed to run "reader()", and a thread B is supposed to run "writer()".
So far, so good, apparently.
My main concern has to do with... is this really useful in practice? As far as I know, in order to make thread A (which is running "reader()") see the reference to "f", we must use some synchronization mechanism, such as making f volatile, or using locks to synchronize access to f. If we don't do so, we are not even guaranteed that "reader()" will be able to see an initialized "f", that is, since we have not synchronized access to "f", the reader will potentially see "null" instead of the object that was constructed by the writer thread. This issue is stated in http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#finalWrong , which is one of the main references for the Java Memory Model [bold emphasis mine]:
Now, having said all of this, if, after a thread constructs an
immutable object (that is, an object that only contains final fields),
you want to ensure that it is seen correctly by all of the other
threads, you still typically need to use synchronization. There is no
other way to ensure, for example, that the reference to the immutable
object will be seen by the second thread. The guarantees the program
gets from final fields should be carefully tempered with a deep and
careful understanding of how concurrency is managed in your code.
So if we are not even guaranteed to see the reference to "f", and we must therefore use typical synchronization mechanisms (volatile, locks, etc.), and these mechanisms already make data races go away, then I don't see the need for final at all. I mean, if in order to make "f" visible to other threads we still need to use volatile or synchronized blocks, and they already make the internal fields visible to the other threads... what's the point (in thread-safety terms) of making a field final in the first place?
I think that you are misunderstanding what the JLS example is intended to show:
static void reader() {
if (f != null) {
int i = f.x; // guaranteed to see 3
int j = f.y; // could see 0
}
}
This code does not guarantee that the latest value of f will be seen by the thread that calls reader(). But what it is saying is that if you do see f as non-null, then f.x is guaranteed to be 3 ... despite the fact that we didn't actually do any explicit synchronizing.
Well is this implicit synchronization for finals in constructors useful? Certainly it is ... IMO. It means that we don't need to do any extra synchronization each time we accessed an immutable object's state. That is a good thing, because synchronization typically entails cache read-through or write-through, and that slows your program down.
But what Pugh is saying is that you will typically need to synchronize to get hold of the reference to the immutable object in the first place. He is making the point that using immutable objects (implemented using final) does not excuse you from the need to synchronize ... or from the need to understand the concurrency / synchronization implementation of your application.
The problem is that we still need to be sure that the reader will see a non-null "f", and that's only possible if we use another synchronization mechanism that will already provide the semantics of allowing us to see 3 for f.x. And if that's the case, why bother using final for thread-safety purposes?
There is a difference between synchronizing to get the reference and synchronizing to use the reference. The first one I may need to do only once. The second one I may need to do lots of times ... with the same reference. And even if it is one-to-one, I have still halved the number of synchronizing operations ... if I (hypothetically) implement the immutable object as thread-safe.
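A small sketch of that point, using a hypothetical immutable class (assumes a writer thread has already stored a non-null reference under the same lock):

final class ImmutablePoint {
    final int x;
    final int y;
    ImmutablePoint(int x, int y) { this.x = x; this.y = y; }
}

class Reader {
    private final Object lock = new Object();
    private ImmutablePoint shared;              // written by some other thread while holding 'lock'

    int useMany() {
        ImmutablePoint p;
        synchronized (lock) {                   // synchronize once, to get the reference
            p = shared;
        }
        // all subsequent reads need no further synchronization:
        // the final fields of p are guaranteed to be fully visible
        return p.x * p.x + p.y * p.y;
    }
}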
TL;DR: Most software developers should ignore the special rules regarding final variables in the Java Memory Model. They should adhere to the general rule: if a program is free of data races, all executions will appear to be sequentially consistent. In most cases, final variables cannot be used to improve the performance of concurrent code, because the special rule in the Java Memory Model creates some additional costs for final variables, which makes volatile superior to final variables for almost all use cases.
In some cases, the special rule about final variables prevents a final variable from appearing to have different values. Performance-wise, however, the rule is irrelevant.
Having said that, here is a more detailed answer. But I have to warn you: the following description might contain some precarious information that most software developers should never care about, and it's better if they don't know about it.
The special rule about final variables in the Java Memory Model implies that it makes a difference to the Java VM and the JIT compiler whether a member variable is final or not.
public class Int {
    public /* final */ int value;
    public Int(int value) {
        this.value = value;
    }
}
If you take a look at the HotSpot source code, you will see that the compiler checks whether the constructor of a class writes at least one final variable. If it does, the compiler will emit additional code for the constructor, more precisely a memory release barrier. You will also find the following comment in the source code:
This method (which must be a constructor by the rules of Java)
wrote a final. The effects of all initializations must be
committed to memory before any code after the constructor
publishes the reference to the newly constructed object.
Rather than wait for the publication, we simply block the
writes here. Rather than put a barrier on only those writes
which are required to complete, we force all writes to complete.
That means the initialization of a final variable is similar to a write of a volatile variable. It implies some kind of memory release barrier. However, as can be seen from the quoted comment, final variables might be even more expensive. And what's even worse, you have these additional costs for final variables regardless of whether they are used in concurrent code or not.
That's awful, because we want software developers to use final variables in order to increase the readability and maintainability of source code. Unfortunately, using final variables can significantly impact the performance of a program.
The question remains: Are there any use cases where the special rule regarding final variables helps to improve the performance of concurrent code?
That's hard to tell, because it depends on the actual implementation of the Java VM and the memory architecture of the machine. I haven't seen any such use cases until now. A quick glance at the source code of the package java.util.concurrent has also revealed nothing.
The problem is: the initialization of a final variable is about as expensive as a write of a volatile or atomic variable. If you use a volatile variable for the reference to the newly created object, you get the same behaviour and costs, with the exception that the reference will also be published immediately. So, there is basically no benefit in using final variables for concurrent programming.
You are right: since locking makes stronger guarantees, the guarantee about visibility of finals is not particularly useful in the presence of locking. However, locking is not always necessary to ensure reliable concurrent access.
As far as I know, in order to make thread A (which is running "reader()") see the reference to "f", we must use some synchronization mechanism, such as making f volatile, or using locks to synchronize access to f.
Making f volatile is not a synchronization mechanism; it forces threads to read the memory each time the variable is accessed, but it does not synchronize access to a memory location. Locking is a way to synchronize access, but it is not necessary in practice to guarantee that the two threads share data reliably. For example, you could use a ConcurrentLinkedQueue<E>, which is a lock-free concurrent collection*, to pass data from a writer thread to a reader thread, and avoid synchronization. You could also use AtomicReference<T> to ensure reliable concurrent access to an object without locking.
It is when you use lock-free concurrency that the guarantee about the visibility of final fields comes in handy. If you make a lock-free collection and use it to store immutable objects, your threads will be able to access the content of the objects without additional locking.
* ConcurrentLinkedQueue<E> is not only lock-free, but also a wait-free collection (i.e. a lock-free collection with additional guarantees not relevant to this discussion).
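A hedged sketch of that kind of lock-free handoff, using a hypothetical immutable Message class (requires java.util.concurrent.ConcurrentLinkedQueue):

final class Message {
    final String text;                          // final: safely visible once the object is published
    Message(String text) { this.text = text; }
}

class Handoff {
    private final ConcurrentLinkedQueue<Message> queue = new ConcurrentLinkedQueue<>();

    void producerThread() {
        queue.offer(new Message("hello"));      // publish without any explicit lock
    }

    void consumerThread() {
        Message m = queue.poll();
        if (m != null) {
            System.out.println(m.text);         // no extra synchronization needed to read the final field
        }
    }
}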
Yes, final fields are useful in terms of thread safety. They may not be useful in your example, but if you look at the old ConcurrentHashMap implementation, the get method doesn't apply any locking while it searches for the value, even though there is a risk that the list might change while the lookup is happening (think of ConcurrentModificationException). However, CHM builds its hash chains from entries whose 'next' field is final, guaranteeing the consistency of the list (the portion of the chain already reachable from an entry will not grow or shrink). So the advantage is that thread safety is established without synchronization.
From the article
Exploiting immutability
One significant source of inconsistency is avoided by making the Entry
elements nearly immutable -- all fields are final, except for the
value field, which is volatile. This means that elements cannot be
added to or removed from the middle or end of the hash chain --
elements can only be added at the beginning, and removal involves
cloning all or part of the chain and updating the list head pointer.
So once you have a reference into a hash chain, while you may not know
whether you have a reference to the head of the list, you do know that
the rest of the list will not change its structure. Also, since the
value field is volatile, you will be able to see updates to the value
field immediately, greatly simplifying the process of writing a Map
implementation that can deal with a potentially stale view of memory.
While the new JMM provides initialization safety for final variables,
the old JMM does not, which means that it is possible for another
thread to see the default value for a final field, rather than the
value placed there by the object's constructor. The implementation
must be prepared to detect this as well, which it does by ensuring
that the default value for each field of Entry is not a valid value.
The list is constructed such that if any of the Entry fields appear to
have their default value (zero or null), the search will fail,
prompting the get() implementation to synchronize and traverse the
chain again.
Article link: https://www.ibm.com/developerworks/library/j-jtp08223/
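A stripped-down illustration of the entry structure the article describes (a simplification for this answer, not the actual ConcurrentHashMap source):

final class HashEntry<K, V> {
    final K key;                    // final: fixed once the entry is visible
    final int hash;                 // final
    final HashEntry<K, V> next;     // final: the rest of the chain can never change underneath you
    volatile V value;               // volatile: value updates become visible immediately

    HashEntry(K key, int hash, HashEntry<K, V> next, V value) {
        this.key = key;
        this.hash = hash;
        this.next = next;
        this.value = value;
    }
}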

Effectively Immutable Object

I want to make sure that I correctly understand the 'Effectively Immutable Objects' behavior according to Java Memory Model.
Let's say we have a mutable class which we want to publish as effectively immutable:
class Outworld {
    // This MAY be accessed by multiple threads
    public static volatile MutableLong published;
}

// This class is mutable
class MutableLong {
    private long value;

    public MutableLong(long value) {
        this.value = value;
    }

    public void increment() {
        value++;
    }

    public long get() {
        return value;
    }
}
We do the following:
// Create a mutable object and modify it
MutableLong val = new MutableLong(1);
val.increment();
val.increment();

// No more modifications
// UPDATED: Let's say for this example we are completely sure
// that no one will ever call increment() from now on

// Publish it safely and consider it effectively immutable
Outworld.published = val;
The question is:
Does Java Memory Model guarantee that all threads MUST have Outworld.published.get() == 3 ?
According to Java Concurrency In Practice this should be true, but please correct me if I'm wrong.
3.5.3. Safe Publication Idioms
To publish an object safely, both the reference to the object and the
object's state must be made visible to other threads at the same time.
A properly constructed object can be safely published by:
- Initializing an object reference from a static initializer;
- Storing a reference to it into a volatile field or AtomicReference;
- Storing a reference to it into a final field of a properly constructed object; or
- Storing a reference to it into a field that is properly guarded by a lock.
3.5.4. Effectively Immutable Objects
Safely published effectively immutable objects can be used safely by
any thread without additional synchronization.
Yes. The writes to the MutableLong happen-before the volatile write to Outworld.published, which in turn happens-before any subsequent volatile read of that field.
(It is possible that a thread reads Outworld.published and passes it on to another thread unsafely. In theory, that other thread could see earlier state. In practice, I don't see it happening.)
There are a couple of conditions which must be met for the Java Memory Model to guarantee that Outworld.published.get() == 3:
the snippet of code you posted, which creates and increments the MutableLong and then sets the Outworld.published field, must happen with visibility between the steps. One way to achieve this trivially is to have all that code running in a single thread, guaranteeing "as-if-serial" semantics. I assume that's what you intended, but thought it worth pointing out.
reads of Outworld.published must happen after the assignment (in the happens-before sense). An example of this could be having the same thread execute Outworld.published = val; and then launch the other threads which read the value. This would guarantee "as-if-serial" semantics, preventing re-ordering of the reads before the assignment.
If you are able to provide those guarantees, then the JMM will guarantee all threads see Outworld.published.get() == 3.
However, if you're interested in general program design advice in this area, read on.
For the guarantee that no other threads ever see a different value for Outworld.published.get(), you (the developer) have to guarantee that your program does not modify the value in any way. Either by subsequently executing Outworld.published = differentVal; or Outworld.published.increment();. While that is possible to guarantee, it can be so much easier if you design your code to avoid both the mutable object, and using a static non-final field as a global point of access for multiple threads:
instead of publishing MutableLong, copy the relevant values into a new instance of a different class whose state cannot be modified. E.g.: introduce the class ImmutableLong (sketched after this list), which assigns value to a final field on construction and doesn't have an increment() method.
instead of multiple threads accessing a static non-final field, pass the object as a parameter to your Callable/Runnable implementations. This will prevent the possibility of one rogue thread from reassigning the value and interfering with the others, and is easier to reason about than static field reassignment. (Admittedly, if you're dealing with legacy code, this is easier said than done).
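A minimal sketch of the ImmutableLong idea from the first point above (a hypothetical class, not part of the JDK):

final class ImmutableLong {
    private final long value;                   // assigned once in the constructor

    ImmutableLong(long value) {
        this.value = value;
    }

    long get() {
        return value;
    }

    ImmutableLong increment() {
        return new ImmutableLong(value + 1);    // "mutation" yields a new instance instead
    }
}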
The question is: Does Java Memory Model guarantee that all threads
MUST have Outworld.published.get() == 3 ?
The short answer is no, because other threads might access Outworld.published before the assignment has been performed (and would then see null).
After the moment when Outworld.published = val; has been performed, and on the condition that no other modifications are made to val - yes - it will always be 3.
But if any thread performs val.increment(), then its value might be different for other threads.

Thread Safe Copying of Objects in Java

I have a static array of instances of a class similar to the following:
public class Entry {
    private String sharedvariable1 = "";
    private String sharedvariable2 = "";
    private int sharedvariable3 = -1;
    private int mutablevariable1 = -1;
    private int mutablevariable2 = -2;

    public Entry (String sharedvariable1,
                  String sharedvariable2,
                  int sharedvariable3) {
        this.sharedvariable1 = sharedvariable1;
        this.sharedvariable2 = sharedvariable2;
        this.sharedvariable3 = sharedvariable3;
    }

    public Entry (Entry entry) { // copy constructor
        this(entry.getSharedvariable1(),
             entry.getSharedvariable2(),
             entry.getSharedvariable3());
    }

    ....
    /* other methods including getters and setters */
}
At some point in my program I access an instance of this object and make a copy of it using the copy constructor above. I then change the value of the two mutable variables. This program is running in a multithreaded environment. Please note: ALL VARIABLES ARE SET TO THEIR INITIAL VALUES PRIOR TO THREADING. Only after the program is threaded and a copy is made are the variables changed. I believe that this is thread safe, because I am only reading the static object, not writing to it (even sharedvariable3, although an int and mutable, is only read), and I am only making changes to the copy of the static object (and the copy is being made within a thread). But I want to confirm that my thinking is correct here.
Can someone please evaluate what I am doing?
It is not thread-safe. You need to wrap anything that modifies the sharedvariables thusly:
synchronized (this) {
    this.sharedvariable1 = newValue;
}
For setters, you can do this instead:
public synchronized void setSharedvariable1(String sharedvariable1) {
    this.sharedvariable1 = sharedvariable1;
}
Then in your copy constructor, you'll do similarly:
public Entry (Entry entry) {
    synchronized (entry) {
        this.setSharedvariable1(entry.getSharedvariable1());
        this.setSharedvariable2(entry.getSharedvariable2());
        this.setSharedvariable3(entry.getSharedvariable3());
    }
}
This ensures that if modifications are being made to an instance, the copy operation will wait until the modifications are done.
It is not thread-safe; you should synchronize in your copy constructor. You are reading each of the three variables from the original object in your copy constructor, and these operations are not atomic together. So it could be that while you are reading the first value, the third value gets changed by another thread. In this case you have a "copied" object in an inconsistent state.
It's not thread safe. And I mean that it does not guarantee thread safety for multiple threads that use the same Entry instance.
The problem I see here is as follows:
Thread 1 starts constructing an Entry instance. It does not keep that instance hidden from other threads' access.
Thread 2 accesses that instance, using its copy constructor, while it is still in the middle of construction.
Considering the initial value of Entry's field private int sharedvariable3 = -1;, the result might be that the new "copied" instance created by Thread 2 has its sharedvariable3 field set to 0 (the default for int fields in Java).
That's the problem.
If it bothers you, you've got to either synchronize the read/write operations, or take care of Entry instances publication. Meaning, don't allow access of other threads to an Entry instance that is in the middle of construction.
I don't really understand why you consider private instance variables as shared. Usually shared fields are static and not private; I recommend that you not share private instance variables. For thread safety you should synchronize the operations that mutate the variables' values.
You can use the synchronized keyword for that, but choose the correct monitor object (I think the entry itself should do). Another alternative is to use a lock implementation from java.util.concurrent. Usually locks offer higher throughput and better granularity (for example, multiple parallel reads but only one write at any given time).
Another thing you have to think about is what is called the memory barrier. Have a look at this interesting article http://java.dzone.com/articles/java-memory-model-programer%E2%80%99s
You can enforce happens-before semantics with the volatile keyword. Explicit synchronization (locks or synchronized code) also crosses the memory barrier and enforces happens-before semantics.
Finally a general piece of advice: You should avoid shared mutable state at all costs. Synchronization is a pain in the ass (performance and maintenance wise). Bugs that result from incorrect synchronization are incredibly hard to detect. It is better to design for immutability or isolated mutability (e.g. actors).
The answer is that it is thread safe under the conditions outlined since I am only reading from the variables in their static state and only changing the copies.
