What is the purpose or value of creating a local reference to a static volatile variable that is already kept as a member field? This code is from java.util.Scanner in JDK 6b14.
class Scanner {
    private static volatile Pattern linePattern;
    ...
    private static Pattern linePattern() {
        Pattern lp = linePattern;
        if (lp == null)
            linePattern = lp = Pattern.compile("...");
        return lp;
    }
    ...
}
The Java Tutorials: "Reads and writes are atomic for all variables declared volatile (including long and double variables)... any write to a volatile variable establishes a happens-before relationship with subsequent reads of that same variable. "
This means that reading the reference to the Pattern object won't fail half-way because it has changed. The volatile keyword is supposed to protect exactly these kinds of accesses, so I don't think the duplicating local variable is meant to ensure that a valid value is returned.
Also, the lazy initialization can be done on the member field without needing an intermediary local variable:
if (linePattern == null) linePattern = Pattern.compile("...");
It looks to be a bytecode optimization, as seen here and here. Using local variables produces smaller bytecode (fewer instructions) as well as fewer accesses to the actual value (each of which is an expensive volatile read). However, they have not used the final variable optimization along with it, so I'm skeptical about drawing this conclusion.
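For reference, this is roughly what the method would look like with the local also declared final, as hinted at above - a sketch only, not the actual JDK source:

private static Pattern linePattern() {
    final Pattern lp = linePattern;                    // single volatile read
    if (lp != null)
        return lp;
    final Pattern compiled = Pattern.compile("...");   // same elided pattern as above
    linePattern = compiled;                            // volatile write; the race is benign
    return compiled;
}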
Lazy initialization, i.e. delay the work until it's really necessary.
This "speed" up things. Access to volatile variables is expensive. Use can get away with this overhead by assign it to a stack variable and access that instead
It guarantees that returned value is not NULL - even if the static variable is set to NULL between check and return.
And at the same time it is an unsynchronized lazy init with re-initialization if needed ;).
For volatile fields, linePattern might change between one statement and the next. Copying the reference to a local variable makes certain that you can't observe an inconsistent state. For example, if you had written
if (linePattern == null)
    linePattern = Pattern.compile("...");
then linePattern might have stopped being null while the Pattern.compile was executing.
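To make the hazard concrete, here is a sketch of the method written without the local copy; it assumes, hypothetically, that some other code could set the field back to null:

private static Pattern linePattern() {
    if (linePattern == null)                   // volatile read #1
        linePattern = Pattern.compile("...");
    return linePattern;                        // volatile read #2: another thread could have
                                               // written null in between, so the caller may
                                               // receive null despite the check above
}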
Related
I have got the following code, which is somewhat abstracted from a real implementation I had in a Java program:
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(System.in));
String line;
while ((line = bufferedReader.readLine()) != null) {
    String lineReference = line;
    runLater(() -> consumeString(lineReference));
}
Here I need to use a reference copy for the lambda expression, when I try to use line I get:
Local variables referenced from a lambda expression must be final or effectively final
It seems rather awkward to me, as all I do to fix it is obtain a new reference to the object; this is something the compiler could also figure out by itself.
So I would say line is effectively final here, as it only gets the assignment in the loop and nowhere else.
Could anyone shed some more light on this and explain why exactly it is needed here and why the compiler cannot fix it?
So I would say line is effectively final here, as it only gets the assignment in the loop and nowhere else.
No, it's not final because during the variable's lifetime it is getting assigned a new value on every loop iteration. This is the complete opposite of final.
I get: 'Local variables referenced from a lambda expression must be final or effectively final'. It seems rather awkward to me.
Consider this: You're passing the lambda to runLater(...). When the lambda finally executes, which value of line should it use? The value it had when the lambda was created, or the value it had when the lambda executed?
The rule is that lambdas (appear to) use the current value at time of lambda execution. They do not (appear to) create a copy of the variable. Now, how is this rule implemented in practice?
If line is a static field, it's easy because there is no state for the lambda to capture. The lambda can read the current value of the field whenever it needs to, just as any other code can.
If line is an instance field, that's also fairly easy. The lambda can capture the reference to the object in a private hidden field in each lambda object, and access the line field through that.
If line is a local variable within a method (as it is in your example), this is suddenly not easy. At an implementation level, the lambda expression is in a completely different method, and there is no easy way for outside code to share access to the variable which only exists within the one method.
To enable access to the local variable, the compiler would have to box the variable into some hidden, mutable holder object (such as a 1-element array) so that the holder object could be referenced from both the enclosing method and the lambda, giving them both access to the variable within.
Although that solution would technically work, the behavior it achieves would be undesirable for a bundle of reasons. Allocating the holder object would give local variables an unnatural performance characteristic which would not be obvious from reading the code. (Merely defining a lambda that used a local variable would make the variable slower throughout the method.) Worse than that, it would introduce subtle race conditions into otherwise simple code, depending on when the lambda is executed.
In your example, by the time the lambda executes, any number of loop iterations could have happened, or the method might have returned, so the line variable could have any value or no defined value, and almost certainly wouldn't have the value you wanted. So in practice you'd still need the separate, unchanging lineReference variable! The only difference is that the compiler wouldn't require you to do that, so it would allow you to write broken code. Since the lambda could ultimately execute on a different thread, this would also introduce subtle concurrency and thread visibility complexity to local variables, which would require the language to allow the volatile modifier on local variables, and other bother.
So, for the lambda to see the current changing values of local variables would introduce a lot of fuss (and no advantages since you can do the mutable holder trick manually if you ever need to). Instead, the language says no to the whole kerfuffle by simply demanding that the variable be final (or effectively final). That way, the lambda can capture the value of the local variable at lambda creation time, and it doesn't need to worry about detecting changes because it knows there can't be any.
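For completeness, the manual mutable-holder trick mentioned above would look roughly like this, using the bufferedReader, runLater and consumeString from the question; it compiles because the array reference itself is effectively final, but it reintroduces exactly the race described above:

String[] holder = new String[1];                       // one-element array as a mutable holder
while ((holder[0] = bufferedReader.readLine()) != null) {
    // By the time this lambda actually runs, holder[0] may already hold a later line.
    runLater(() -> consumeString(holder[0]));
}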
This is something the compiler could also figure out by itself
It did figure it out, which is why it disallows it. The lineReference variable is of absolutely no benefit to the compiler, which could easily capture the current value of line for use in the lambda at each lambda object's creation time. But since the lambda wouldn't detect changes to the variable (which would be impractical and undesirable for the reasons explained above), the subtle difference between capture of fields and capture of locals would be confusing. The "final or effectively final" rule is for the programmer's benefit: it prevents you from wondering why changes to a variable don't appear within a lambda, by preventing you from changing them at all. Here's an example of what would happen without that rule:
String field = "A";
void foo() {
String local = "A";
Runnable r = () -> System.out.println(field + local);
field = "B";
local = "B";
r.run(); // output: "BA"
}
That confusion goes away if any local variables referenced within the lambda are (effectively) final.
In your code, lineReference is effectively final. Its value is assigned exactly once during its lifetime, before it goes out of scope at the end of each loop iteration, which is why you can use it in the lambda.
There is an alternative arrangement of your loop possible by declaring line inside the loop body:
for (;;) {
    String line = bufferedReader.readLine();
    if (line == null) break;
    runLater(() -> consumeString(line));
}
This is allowed because line now goes out of scope at the end of each loop iteration. Each iteration effectively has a fresh variable, assigned exactly once. (However, at a low level the variable still occupies the same slot in the stack frame, so it's not like it has to be repeatedly "created" and "destroyed". What I mean is, there is happily no extra cost to declaring variables inside a loop like this, so it's fine.)
Note: All this is not unique to lambdas. It also applies identically to any classes declared lexically inside the method, from which lambdas inherited the rules.
Note 2: It could be argued that lambdas would be simpler if they followed the rule of always capturing the values of variables they use at lambda creation time. Then there would be no difference in behavior between fields and locals, and no need for the "final or effectively final" rule because it would be well-established that lambdas don't see changes made after lambda creation time. But this rule would have its own uglinesses. As one example, for an instance field x accessed within a lambda, there would be a difference between the behavior of reading x (capturing final value of x) and this.x (capturing final value of this, seeing its field x changing). Language design is hard.
If you would use line instead of lineReference in the lambda expression, you would be passing to your runLater method a lambda expression that would execute consumeString on a String referred by line.
But line keeps changing as you assign new lines to it. When you finally execute the method of the functional interface returned by the lambda expression, only then will it get the current value of line and use it in the call to consumeString. At this point the value of line would not be the same as it was when you passed the lambda expression to the runLater method.
After reading a bunch of questions / articles on this topic there is still one thing unclear to me.
From what I understand (and please correct me if I'm wrong), the value of a variable can be cached locally by a thread, so if one thread updates the value of that variable the change may not be visible to another thread. The use of volatile then is essentially to force all threads to read the value of the variable from the same location. Furthermore, all literature on this topic states that synchronizing on that variable will have the same effect.
My problem is that nothing I've read ever explicitly states that synchronizing on a different variable will cause this same behavior, but it frequently provides code examples stating that in the following two cases the value read from the variable will be up to date:
volatile int x;
...
int y = x;
and
final Object lock = new Object();
int x;
...
synchronized(lock) {
    int y = x;
}
The question is then: is it the case that synchronizing on any arbitrary variable will force every variable access within the synchronized block to access the most up to date value of that variable?
is it the case that synchronizing on any arbitrary variable will force every variable access within the synchronized block to access the most up to date value of that variable?
You can synchronize on any object for a read, so long as the write of that variable was done under synchronization on the same object.
In your example, so long as something like the following happens, then all writes that occurred prior to the write of x will be visible after the subsequent (synchronized) read:
synchronized(lock){
    x = 10;
}
So to your earlier point:
...nothing I've read ever explicitly states that synchronizing on a different
variable will cause this same behavior...
That is because it doesn't offer the same behavior. The happens-before relationship occurs in a few situations; two important ones in your case are:
Write and subsequent read of the same volatile variable
The exit and subsequent enter of a monitor on the same object
There is an enlightening article here where it is mentioned:
In the Java Memory Model a volatile field has a store barrier inserted after a write to it and a load barrier inserted before a read of it. ...
Note that there is nothing specific to what field is being accessed. This means that accessing any volatile field generates a barrier for all cached variables.
Synchronising has similar functionality.
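Putting both rules together, here is a minimal sketch of correctly paired synchronization (the class and method names are made up for illustration):

class Shared {
    private final Object lock = new Object();
    private int x;                            // not volatile

    void writer() {
        synchronized (lock) {                 // monitor exit at the end of this block...
            x = 10;
        }
    }

    int reader() {
        synchronized (lock) {                 // ...happens-before this monitor enter,
            return x;                         // so this read is guaranteed to see 10
        }
    }
}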
I'm working my way through item 71, "Use lazy initialization judiciously", of Effective Java (second edition). It suggests the use of the double-check idiom for lazy initialization of instance fields using this code (pg 283):
private volatile FieldType field;

FieldType getField() {
    FieldType result = field;
    if (result == null) {  // First check (no locking)
        synchronized(this) {
            result = field;
            if (result == null)  // Second check (with locking)
                field = result = computeFieldValue();
        }
    }
    return result;
}
So, I actually have several questions:
Why is the volatile modifier required on field given that initialization takes place in a synchronized block? The book offers this supporting text: "Because there is no locking if the field is already initialized, it is critical that the field be declared volatile". Therefore, is it the case that once the field is initialized, volatile is the only guarantee that multiple threads see a consistent view of field, given the lack of other synchronization? If so, why not synchronize getField(), or is it the case that the above code offers better performance?
The text suggests that the not-required local variable, result, is used to "ensure that field is read only once in the common case where it's already initialized", thereby improving performance. If result was removed, how would field be read multiple times in the common case where it was already initialized?
Why is the volatile modifier required on field given that initialization takes place in a synchronized block?
The volatile is necessary because of the possible reordering of instructions around the construction of objects. The Java memory model allows the compiler and runtime to reorder instructions so that field initialization is effectively moved outside of the object's constructor, after the reference has already been assigned.
This means that thread 1 can initialize the field inside a synchronized block, but thread 2 may see the object not fully initialized. Non-final fields do not have to be initialized before the object reference is assigned to the field. The volatile keyword ensures that the object has been fully initialized before the reference to it becomes visible through field.
This is an example of the famous "double check locking" bug.
If result was removed, how would field be read multiple times in the common case where it was already initialized?
Anytime you access a volatile field, it causes a memory-barrier to be crossed. This can be expensive compared to accessing a normal field. Copying a volatile field into a local variable is a common pattern if it is to be accessed in any way multiple times in the same method.
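For comparison, here is a sketch of getField() with result removed; on the common, already-initialized path the volatile field is now read twice instead of once:

FieldType getField() {
    if (field == null) {                      // volatile read #1
        synchronized (this) {
            if (field == null)                // extra volatile read on the slow path only
                field = computeFieldValue();
        }
    }
    return field;                             // volatile read #2
}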
See my answer here for more examples of the perils of sharing an object without memory-barriers between threads:
About reference to object before object's constructor is finished
This is fairly complicated, but it is related to how the compiler can rearrange things.
Basically the Double Checked Locking pattern does not work in Java unless the variable is volatile.
This is because, in some cases, the compiler can assign the variable to something other than null, then do the initialisation of the variable and reassign it. Another thread would see that the variable is not null and attempt to read it - this can cause all sorts of very strange outcomes.
Take a look at this other SO question on the topic.
Good questions.
Why is the volatile modifier required on field given that initialization takes place in a synchronized block?
If you have no synchronization and you assign to that shared global field, there is no promise that all writes that occur during construction of that object will be seen. For instance, imagine FieldType looks like:
public class FieldType {
    Object obj = new Object();
    Object obj2 = new Object();
    public Object getObject() { return obj; }
    public Object getObject2() { return obj2; }
}
It is possible that getField() returns a non-null instance but that this instance's getObject() and getObject2() methods return null values. This is because, without synchronization, the writes to those fields can race with the construction of the object.
How is this fixed with volatile? All writes that occur prior to a volatile write are visible to any thread that subsequently reads that volatile field.
If result was removed, how would field be read multiple times in the common case where it was already initialized?
Storing the field into a local once and reading that local throughout the method means a single read of the shared field, with all later reads being thread-local. You can argue premature optimization in those regards, but I like this style because you won't run into the strange reordering problems that can occur if you don't.
Say that I have a private variable and a setVariable() method for it which is synchronized. Isn't that exactly the same as using the volatile modifier?
No. Volatile means the variable isn't cached in any per-thread cache, and its value is always retrieved from main memory when needed. Synchronization means that those per-thread caches will be kept in sync at certain points. In theory, using a volatile variable can come with a great speed penalty if many threads need to read the value of the variable, but it is changed only rarely.
No, calling a synchronized getXXX/setXXX method is not the same as reading/writing to a volatile variable.
Multiple threads can concurrently read from or write to a volatile variable. But only one thread at a time can read from or write to a variable that is guarded by a synchronized block.
volatile variables are not synchronized (at least, not in the way synchronized stuff is synchronized). What volatile does is ensure that a variable is retrieved each time it's used (ie: it prevents certain kinds of optimization), and IIRC that it's read and written in the correct order. This could conceivably emulate some kinds of synchronization, but it can't work the same if your setter has to set more than one thing. (If you set two volatile variables, for example, there will be a point where one is set and the other isn't.)
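A small sketch of that last point; the Range class and its fields are invented for illustration:

class Range {
    private volatile int low, high;

    // Volatile alone: a reader may observe the new low together with the old high,
    // i.e. a pair of values that was never set as a whole.
    void setWithVolatileOnly(int newLow, int newHigh) {
        low = newLow;
        high = newHigh;
    }

    // Synchronized setter and getter: readers see either the old pair or the new pair,
    // never a mix, because only one thread holds the lock at a time.
    synchronized void setWithLock(int newLow, int newHigh) {
        low = newLow;
        high = newHigh;
    }

    synchronized int[] getWithLock() {
        return new int[] { low, high };
    }
}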
Actually, no.
volatile is actually a weaker form of synchronization. When a field is declared volatile, the compiler and runtime understand that this variable is shared and that operations on it shouldn't be reordered with other memory operations. Volatile variables aren't cached in registers or in caches where they are hidden from other processors, so a read of a volatile variable always returns the most recent write by any thread.
Just an example:
The first thread runs:
while (!stopped) {
    ... do something
}
The second thread runs:
stopped = true;
It's useful to declare stopped as a volatile boolean so that the first thread sees a fresh value of it.
There is no relation.
Basically:
Volatile => the latest value of the variable is always retrieved
Synchronized => only one thread is served at the same time
I came across this statement:
In properly constructed objects, all threads will see correct values of final fields, regardless of how the object is published.
Then why is a volatile variable used to safely publish an immutable object?
I'm really confused. Can anybody make it clear with a suitable example?
In this case, the volatility would only ensure visibility of the new object; any other threads that happened to get hold of your object via a non-volatile field would indeed see the correct values of final fields as per JSR-133's initialization safety guarantees.
Still, making the variable volatile doesn't hurt; it is correct from a memory model perspective anyway, and would be necessary for non-final fields initialised in a constructor (although there shouldn't be any of these in an immutable object). If you wish to share variables between threads, you'll need to ensure adequate synchronization to give visibility anyway; though in this case you're right that there's no danger to the atomicity of the constructor.
Thanks to Tom Hawtin for pointing out I'd completely overlooked the JMM guarantees on final fields; previous incorrect answer is given below.
The reason for the volatile variable is that it establishes a happens-before relationship (according to the Java Memory Model) between the construction of the object and the assignment of the variable. This achieves two things:
Subsequent reads of that variable from different threads are guaranteed to see the new value. Without marking the variable as volatile, these threads could see stale values of the reference.
The happens-before relationship places limits on what reorderings the compiler can do. Without a volatile variable, the assignment to the variable could happen before the object's constructor runs - hence other threads could get a reference to the object before it was fully constructed.
Since one of the fundamental rules of immutable objects is that you don't publish references during the constructor, it's this second point that is likely being referenced here. In a multithreaded environment without proper concurrent handling, it is possible for a reference to the object to be "published" before that object has been constructed. Thus another thread could get that object, see that one of its fields is null, and then later see that this "immutable" object has changed.
Note that you don't have to use volatile fields to achieve this if you have other appropriate synchronization primitives - for example, if the assignment (and all later reads) are done in a synchronized block on a given monitor - but in a "standalone" sense, marking the variable as volatile is the easiest way to tell the JVM "this might be read by multiple threads, please make the assignment safe in that context."
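A minimal sketch of the publication scenario being described; the Point and Holder classes are invented names:

class Point {                                 // immutable: final fields, no setters
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

class Holder {
    volatile Point current;                   // without volatile, readers might see a stale
                                              // null; with it, a reader sees either null or
                                              // a fully constructed Point, never a half-built one

    void publish() { current = new Point(1, 2); }   // volatile write
    Point read()   { return current; }               // volatile read
}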
A volatile reference to an immutable object could be useful. This would allow you to swap one object for another to make the new data available to other threads.
I would suggest you look at using AtomicReference first, however.
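For example, atomically swapping one immutable snapshot for another; the Config class here is made up:

import java.util.concurrent.atomic.AtomicReference;

final class Config {                          // immutable snapshot
    final String endpoint;
    Config(String endpoint) { this.endpoint = endpoint; }
}

class ConfigHolder {
    private final AtomicReference<Config> ref = new AtomicReference<>(new Config("initial"));

    Config current() { return ref.get(); }       // always a complete, fully built snapshot
    void update(Config next) { ref.set(next); }  // makes the new data visible to other threads
}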
If you need final volatile fields you have a problem. All fields, including final ones, are available to other threads as soon as the constructor returns. So if you pass an object to another thread in the constructor, it is possible for the other thread to see an inconsistent state. IMHO you should consider a different solution so you don't have to do this.
You can't really see the difference in an immutable class. See the example below, in Myclass:
public static Foo getInstance(){
    if(INSTANCE == null){
        INSTANCE = new Foo();
    }
    return INSTANCE;
}
In the above code, if Foo is declared final (final Foo INSTANCE;), it guarantees that it won't publish references during the constructor call; partial object construction is not possible.
Consider this: if Myclass is immutable, its state is not going to change after object construction, making the volatile keyword (volatile final Foo INSTANCE;) redundant. But if this class allows its object state to be changed (not immutable), multiple threads can actually update the object and some updates may not be visible to other threads; hence the volatile keyword ensures safe publication of objects in a non-immutable class.