Lambdas: local variables need final, instance variables don't - java

In a lambda, local variables need to be final, but instance variables don't. Why so?

The fundamental difference between a field and a local variable is that the local variable is copied when the JVM creates a lambda instance. Fields, on the other hand, can be changed freely, because changes to them are visible through the enclosing class instance as well (their scope is the whole enclosing class, as Boris pointed out below).
The easiest way of thinking about anonymous classes, closures and lambdas is from the variable scope perspective: imagine a copy constructor added for all local variables you pass to a closure.
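To make the copy-versus-field difference concrete, here is a minimal sketch. The class name CaptureDemo and the method observe are made up for illustration; the point is only that the local's value is fixed at capture time, while the field is read through this on every call.

```java
import java.util.function.IntSupplier;

public class CaptureDemo {
    private int field = 1;

    // Returns what each lambda sees after the field has been reassigned.
    public int[] observe() {
        int local = 10;                       // effectively final: never reassigned
        IntSupplier readLocal = () -> local;  // the value 10 is copied into the lambda
        IntSupplier readField = () -> field;  // really this.field, read on each call
        field = 2;                            // allowed: the field is reached through 'this'
        return new int[] { readLocal.getAsInt(), readField.getAsInt() };
    }

    public static void main(String[] args) {
        int[] seen = new CaptureDemo().observe();
        System.out.println(seen[0] + " " + seen[1]); // prints "10 2"
    }
}
```

Reassigning local anywhere in observe would turn the first lambda into a compile-time error, while the field can be reassigned freely.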

The Project Lambda document State of the Lambda v4, in Section 7 (Variable capture), says:
It is our intent to prohibit capture of mutable local variables. The
reason is that idioms like this:
int sum = 0;
list.forEach(e -> { sum += e.size(); });
are fundamentally serial; it is quite difficult to write lambda bodies
like this that do not have race conditions. Unless we are willing to
enforce—preferably at compile time—that such a function cannot escape
its capturing thread, this feature may well cause more trouble than it
solves.
Another thing to note: when an inner class accesses local variables, they are passed to it through its constructor. That cannot work for non-final variables, because their values could change after construction.
For an instance variable, the compiler instead passes a reference to the enclosing object, and that reference is used to access the field. So the restriction is not needed for instance variables.
PS: Anonymous classes can access only final local variables in Java SE 7, while in Java SE 8 both lambdas and inner classes can also access effectively final variables.
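The sum idiom quoted above does not compile, but the same result can usually be expressed as a reduction, which needs no captured mutable local. A minimal sketch, assuming the elements are lists (the class and method names are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

public class SumSizes {
    // Sums the sizes with a reduction instead of mutating a captured local.
    static int totalSize(List<List<String>> list) {
        return list.stream().mapToInt(List::size).sum();
    }

    public static void main(String[] args) {
        List<List<String>> list = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"));
        System.out.println(totalSize(list)); // prints 3
    }
}
```

Because nothing outside the pipeline is mutated, the same code stays correct if the stream is made parallel.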

The book Java 8 in Action explains the situation as follows:
You may be asking yourself why local variables have these restrictions.
First, there’s a key
difference in how instance and local variables are implemented behind the scenes. Instance
variables are stored on the heap, whereas local variables live on the stack. If a lambda could
access the local variable directly and the lambda were used in a thread, then the thread using the
lambda could try to access the variable after the thread that allocated the variable had
deallocated it. Hence, Java implements access to a free local variable as access to a copy of it
rather than access to the original variable. This makes no difference if the local variable is
assigned to only once—hence the restriction.
Second, this restriction also discourages typical imperative programming patterns (which, as we
explain in later chapters, prevent easy parallelization) that mutate an outer variable.

Because instance variables are always accessed through a field access operation on a reference to some object, i.e. some_expression.instance_variable. Even when you don't explicitly access it through dot notation, like instance_variable, it is implicitly treated as this.instance_variable (or if you're in an inner class accessing an outer class's instance variable, OuterClass.this.instance_variable, which is under the hood this.<hidden reference to outer this>.instance_variable).
Thus an instance variable is never directly accessed, and the real "variable" you're directly accessing is this (which is "effectively final" since it is not assignable), or a variable at the beginning of some other expression.

Putting up some concepts for future visitors:
Basically, it all boils down to one point: the compiler must be able to tell deterministically that the lambda body is not working on a stale copy of a variable.
For local variables, the compiler can only be sure of this when the variable is final or effectively final, so local variables must be one or the other.
For instance fields, the compiler prepends this to the field access inside the lambda (if you have not written it explicitly). Since this is effectively final, the compiler knows the lambda body always reads the current value of the field (multi-threading is out of scope for this discussion). Therefore instance variables need not be final or effectively final.
Note, however, that if a lambda accessing an instance field runs in a multi-threaded environment, you can still run into problems.

It seems like you are asking about variables that you can reference from a lambda body.
From the JLS §15.27.2
Any local variable, formal parameter, or exception parameter used but not declared in a lambda expression must either be declared final or be effectively final (§4.12.4), or a compile-time error occurs where the use is attempted.
So you don't need to declare the variables final; you just need to make sure that they are "effectively final". The same rule applies to anonymous classes.
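As a small illustration of "effectively final", here is a sketch in which neither the local variable nor the parameter carries the final keyword, yet both can be captured because they are never reassigned. The class and method names are invented for the example.

```java
import java.util.function.Supplier;

public class EffectivelyFinalDemo {
    static String greet(String name) {
        String prefix = "Hello, ";                 // no 'final', but assigned exactly once
        Supplier<String> s = () -> prefix + name;  // captures a local and a parameter
        // prefix = "Hi, ";  // uncommenting this makes the capture above a compile-time error
        return s.get();
    }

    public static void main(String[] args) {
        System.out.println(greet("World")); // prints "Hello, World"
    }
}
```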

Within lambda expressions you can use effectively final variables from the surrounding scope.
Effectively final means that you do not have to declare the variable final, but you must never reassign it after its initialization, inside or outside the lambda expression.
You can also use this within closures. Here this means the enclosing object, not the lambda itself, because closures are anonymous functions and have no class of their own associated with them.
So when you use a field of the enclosing class (say private Integer i;) that is neither declared final nor effectively final, it still works, because the compiler inserts this on your behalf (this.i).
private Integer i = 0;

public void process() {
    Consumer<Integer> c = (i) -> System.out.println(++this.i);
    c.accept(i);
}

Here is a code example. I didn't expect this either; I expected to be unable to modify anything outside my lambda.
import java.util.concurrent.Callable;

public class LambdaNonFinalExample {
    static boolean odd = false;

    public static void main(String[] args) throws Exception {
        // boolean odd = false; - If declared inside the method then I get the expected "Effectively Final" compile error
        runLambda(() -> odd = true);
        System.out.println("Odd=" + odd);
    }

    public static void runLambda(Callable<Boolean> c) throws Exception {
        c.call();
    }
}
Output:
Odd=true

Yes, you can change the member variables of an instance, but you cannot reassign the variable that refers to the instance, just as you cannot reassign any other captured local variable.
For example:
class Car {
    public String name;
}

public void testLocal() {
    int theLocal = 6;
    Car bmw = new Car();
    bmw.name = "BMW";
    Stream.iterate(0, i -> i + 2).limit(2)
        .forEach(i -> {
            // bmw = new Car(); // LINE - 1;
            bmw.name = "BMW NEW"; // LINE - 2;
            System.out.println("Testing local variables: " + (theLocal + i));
        });
    // have to comment this to ensure it's `effectively final`;
    // theLocal = 2;
}
The basic principle behind restricting local variables is the validity of data and computation:
If the lambda, evaluated by the second thread, were given the ability to mutate local variables. Even the ability to read the value of mutable local variables from a different thread would introduce the necessity for synchronization or the use of volatile in order to avoid reading stale data.
But recall the principal purpose of lambdas:
Amongst the different reasons for this, the most pressing one for the Java platform is that they make it easier to distribute processing of collections over multiple threads.
Unlike a local variable, an instance referenced by a local variable can be mutated, because the object lives on the heap and is shared. The heap and stack difference makes this clearer:
Whenever an object is created, it’s always stored in the Heap space and stack memory contains the reference to it. Stack memory only contains local primitive variables and reference variables to objects in heap space.
So to sum up, there are two points I think really matter:
It would be really hard to make an instance effectively final, and doing so would impose a senseless burden (just imagine deeply nested classes);
the instance itself is already shared, and a lambda can also be shared among threads, so they work together properly as long as we know we are handling the mutation and want to pass it around.
The balance point is clear: if you know what you are doing, you can do it easily; if not, the default restriction helps you avoid insidious bugs.
P.S. If mutating the instance requires synchronization, you can use the stream reduction methods directly; if the mutations depend on one another, you can still chain the steps with thenApply or thenCompose (methods of CompletableFuture) or similar methods.

First, there is a key difference in how local and instance variables are implemented behind the scenes. Instance variables are stored on the heap, whereas local variables are stored on the stack.
If the lambda could access the local variable directly and the lambda were used in a thread, then the thread using the lambda could try to access the variable after the thread that allocated the variable had deallocated it.
In short: to ensure that another thread does not overwrite the original value, the lambda is given access to a copy of the variable rather than to the original.

Related

Why variable used in lambda expression should be final or effectively final

This question has been asked before, and my follow-up question about why the rule exists has also been answered.
But I have some doubts about the answer.
The answer provided mentions-
Although other answers prove the requirement, they don't explain why the requirement exists.
The JLS mentions why in §15.27.2:
The restriction to effectively final variables prohibits access to dynamically-changing local variables, whose capture would likely introduce concurrency problems.
To lower the risk of bugs, they decided to ensure captured variables are never mutated.
I am confused by the statement that it would lead to concurrency problems.
I read the article about concurrency problems on Baeldung, but I am still a bit confused about how this would cause concurrency problems. Can anybody help me out with an example?
Thanks in advance.
I'd like to preface this answer by saying what I show below is not actually how lambdas are implemented. The actual implementation involves java.lang.invoke.LambdaMetafactory if I'm not mistaken. My answer makes use of some inaccuracies to better demonstrate the point.
Let's say you have the following:
public static void main(String[] args) {
    String foo = "Hello, World!";
    Runnable r = () -> System.out.println(foo);
    r.run();
}
Remember that a lambda expression is shorthand for declaring an implementation of a functional interface. The lambda body is the implementation of the single abstract method of said functional interface. At run-time an actual object is created. So the above results in an object whose class implements Runnable.
Now, the above lambda body references a local variable from the enclosing method. The instance created as a result of the lambda expression "captures" the value of that local variable. It's almost (but not really) like you have the following:
public static void main(String[] args) {
    String foo = "Hello, World!";

    final class GeneratedClass implements Runnable {
        private final String generatedField;

        private GeneratedClass(String generatedParam) {
            generatedField = generatedParam;
        }

        @Override
        public void run() {
            System.out.println(generatedField);
        }
    }

    Runnable r = new GeneratedClass(foo);
    r.run();
}
And now it should be easier to see the problems with supporting concurrency here:
Local variables are not considered "shared variables". This is stated in §17.4.1 of the Java Language Specification:
Memory that can be shared between threads is called shared memory or heap memory.
All instance fields, static fields, and array elements are stored in heap memory. In this chapter, we use the term variable to refer to both fields and array elements.
Local variables (§14.4), formal method parameters (§8.4.1), and exception handler parameters (§14.20) are never shared between threads and are unaffected by the memory model.
In other words, local variables are not covered by the concurrency rules of Java and cannot be shared between threads.
At a source code level you only have access to the local variable. You don't see the generated field.
I suppose Java could be designed so that modifying the local variable inside the lambda body only writes to the generated field, and modifying the local variable outside the lambda body only writes to the local variable. But as you can probably imagine that'd be confusing and counterintuitive. You'd have two variables that appear to be one variable based on the source code. And what's worse those two variables can diverge in value.
The other option is to have no generated field. But consider the following:
public static void main(String[] args) {
    String foo = "Hello, World!";
    Runnable r = () -> {
        foo = "Goodbye, World!"; // won't compile
        System.out.println(foo);
    };
    new Thread(r).start();
    System.out.println(foo);
}
What is supposed to happen here? If there is no generated field then the local variable is being modified by a second thread. But local variables cannot be shared between threads. Thus this approach is not possible, at least not without a likely non-trivial change to Java and the JVM.
So, as I understand it, the designers put in the rule that the local variable must be final or effectively final in this context in order to avoid concurrency problems and confusing developers with esoteric problems.
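For contrast, here is a minimal sketch of what sharing mutable state between threads looks like when it is allowed: the state lives in a heap object that the memory model covers, and the local variable only holds an effectively final reference to it. The class and method names are hypothetical, and AtomicInteger is just one possible choice of holder.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedCounterDemo {
    // Increments one shared counter from two threads and returns the final count.
    static int countFromTwoThreads(int perThread) {
        AtomicInteger counter = new AtomicInteger(); // reference is effectively final; the object is on the heap
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.incrementAndGet();           // atomic update, safe across threads
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();                               // join also makes the updates visible here
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countFromTwoThreads(1000)); // prints 2000
    }
}
```

The lambda never writes to the local variable counter itself; it only calls a method on the object the variable refers to, which is why the capture rule is satisfied.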
When an instance of a lambda expression is created, any variables in the enclosing scope that it refers to are copied into it. Now suppose the original could be modified afterwards: the lambda would be working with a stale value in its copy. Conversely, suppose the copy is modified inside the lambda: the value in the enclosing scope would not be updated, leaving an inconsistency. To prevent such situations, the language designers imposed this restriction. It probably made their lives easier, too. A related answer for an anonymous inner class can be found here.
Another point is that you can pass the lambda expression around; if it escapes and a different thread executes it while the current thread is updating the same local variable, there will be concurrency issues too.
It is for the same reason that anonymous classes require the variables they use from outside their own scope to be read-only, that is, final.
final int finalInt = 0;
int effectivelyFinalInt = 0;
int brokenInt = 0;
brokenInt = 0;

Supplier<Integer> supplier = new Supplier<Integer>() {
    @Override
    public Integer get() {
        // Three alternatives; only one return can be active at a time.
        return finalInt;               // compiles
        // return effectivelyFinalInt; // compiles
        // return brokenInt;           // doesn't compile
    }
};
Lambda expressions are only shortcuts for instances implementing an interface with exactly one abstract method (@FunctionalInterface).
Supplier<Integer> supplier = () -> finalInt; // compiles
Supplier<Integer> supplier = () -> effectivelyFinalInt; // compiles
Supplier<Integer> supplier = () -> brokenInt; // doesn't compile
I struggle to find passages in the Java Language Specification that support the statements below; however, they are logical:
Note that evaluation of a lambda expression produces an instance of a functional interface.
Note that instantiating an interface requires implementing all its abstract methods. Doing so in an expression produces an anonymous class.
Note that an anonymous class is always an inner class.
Each inner class can access only final or effectively final local variables from outside its scope (see Accessing Members of an Enclosing Class):
In addition, a local class has access to local variables. However, a local class can only access local variables that are declared final. When a local class accesses a local variable or parameter of the enclosing block, it captures that variable or parameter.

Is this local or static variable capture?

In the following code, the lambda expressions capture a static variable.
However, it is also local to the scope of the enclosing class, so would this be local variable capture or static variable capture?
public class ExampleImpl {
    static String someStaticVar = "text";

    Example lam = () -> {
        System.out.println(someStaticVar);
    };

    interface Example {
        void sample();
    }
}
The terms “local variable capture” and “static variable capture” do not appear anywhere in the specification, so their meaning would be up to whoever coined these terms.
The most likely interpretation is that “local variable capture” just means “capture of a local variable” and likewise “static variable capture” means “capture of a static variable”; in other words, capture of a variable that happens to be of a given kind: local, instance field, or static field. Then the answer is quite simple: the nature of a variable doesn’t change when you place a lambda expression in a different scope.
In your example, someStaticVar always is a static variable, regardless of where you access it.
It’s not clear why this distinction matters to you. There might be technical differences under the hood, which are intentionally unspecified, hence, implementation specific. The most relevant aspect of the type of the captured variables would be that capturing an instance variable will cause the generated instance to keep a reference to the instance. But first, this doesn’t apply to local variables or static variables, second, this is a natural relationship, that code potentially accessing an instance field may prevent the garbage collection of that instance.

Why does java allow class level variables to be reassigned in anonymous inner class, whereas same is not allowed for local variables [duplicate]

This question is similar to Lambdas: local variables need final, instance variables don't, but the only difference is that this question is valid even without lambda expressions, i.e. it is valid even on Java 7.
Here is the code snippet below.
public class MyClass {
    Integer globalInteger = new Integer(1);

    public void someMethod() {
        Integer localInt = new Integer(2);
        Runnable runnable = new Runnable() {
            @Override
            public void run() {
                globalInteger = new Integer(11); // no error
                localInt = new Integer(22);      // error here
            }
        };
    }
}
I am allowed to assign a new value to globalInteger but not to localInt. Why this difference?
To understand why non-local variables are allowed to change, we first need to understand why local variables aren't. And that's because local variables are stored on the stack (which instance (or static) variables aren't).
The problem with stack variables is that they're going to disappear once their containing method returns. However the instance of your anonymous class might live longer than that. So if accessing local variables were implemented naively, using the local variable from inside the inner class after the method returned would access a variable on a stack frame that no longer exists. That would either lead to a crash, an exception or undefined behavior depending on the exact implementation. Since that's clearly bad, access to local variables is implemented via copying instead. That is, all the local variables that are used by the class (including the special variable this) are copied into the anonymous object. So when a method of the inner class accesses a local variable x, it's not actually accessing that local variable. It's accessing a copy of it stored inside the object.
But what would happen if a local variable changed after the object was created or if a method of the object changed the variable? Well, the former would cause the local variable to change, but not the copy in the object, and the latter would change the copy, but not the original. So either way the two versions of the variable would no longer be the same, which would be very counter-intuitive to any programmer who doesn't know about the copying going on. So to avoid this problem, you're only allowed to access local variables if their value is never changed.
Instance variables don't need to be copied because they won't disappear until their containing object is garbage collected (and static variables never disappear) - since the anonymous object will contain a reference to the outer this, this won't happen until the anonymous object is garbage collected as well. So since they aren't copied, modifying them doesn't cause any issues and there's no reason to disallow it.
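The lifetime argument above can be seen in a few lines. In this sketch (names are made up for illustration) the method that declared the local variable has already returned by the time the captured value is read, which only works because the value was copied into the returned object.

```java
import java.util.function.Supplier;

public class OutlivesMethodDemo {
    static Supplier<String> makeSupplier() {
        String message = "still here"; // local variable in makeSupplier's stack frame
        return () -> message;          // the value is copied into the returned object
    }

    public static void main(String[] args) {
        Supplier<String> s = makeSupplier(); // makeSupplier has returned; its frame is gone
        System.out.println(s.get());         // prints "still here"
    }
}
```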
Because the JVM has no machine instruction to assign to a variable located in a stack frame other than the current one.
Because the lambda function is not part of the class.
Consider the following code (a small change to yours):
public Runnable func() {
    Integer localInt = new Integer(2);
    Runnable runnable = new Runnable() {
        @Override
        public void run() {
            globalInteger = new Integer(11); // no error
            localInt = new Integer(22);      // error here
        }
    };
    return runnable;
}

// Somewhere in the code:
Runnable r = func();
r.run(); // At this point localInt no longer exists.
The compiler reports that a variable used in an inner class must be final or effectively final. This is due to Java 8's closures and how the JVM captures references. The restriction is that a reference captured in the lambda body must be final (not reassignable), and the compiler needs to ensure that the lambda does not work on stale copies of local variables.
So if you access an instance variable, your lambda is really referencing this, the instance of the surrounding class, which is effectively final (a non-changing reference). In addition, if you use a wrapper class or an array, the compiler error also goes away.
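The array variant of that workaround might look like the sketch below (class and method names are hypothetical). The array reference is effectively final, so it can be captured, while the element inside it remains mutable. This is only safe here because forEach on a List runs in the calling thread; it is not a thread-safe pattern.

```java
import java.util.Arrays;
import java.util.List;

public class ArrayHolderDemo {
    static int countNonEmpty(List<String> items) {
        int[] count = {0};            // the reference is effectively final; the element is not
        items.forEach(s -> {
            if (!s.isEmpty()) {
                count[0]++;           // mutates the array element, not the captured variable
            }
        });
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countNonEmpty(Arrays.asList("a", "", "b"))); // prints 2
    }
}
```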

Local variable log defined in an enclosing scope must be final or effectively final

I'm new to lambdas and Java 8. I'm facing the following error:
Local variable log defined in an enclosing scope must be final or effectively final
public JavaRDD<String> modify(JavaRDD<String> filteredRdd) {
    filteredRdd.map(log -> {
        placeHolder.forEach(text -> {
            // error comes here
            log = log.replace(text, ",");
        });
        return log;
    });
    return null;
}
The message says exactly what the problem is: your variable log must be final (that is: carry the keyword final) or be effectively final (that is: you only assign a value to it once outside of the lambda). Otherwise, you can't use that variable within your lambda statement.
But of course, that conflicts with your usage of log. The point is: you can't write to something external from within the lambda ... so you have to step back and look for other ways for whatever you intend to do.
In that sense: just believe the compiler.
Beyond that, there is one core point to understand: you cannot use a local variable that you can write to. Local variables are "copied" into the context of the lambda at runtime, and to achieve deterministic behavior they can only be read and should behave like constants.
If your use case is to write to some object, then it should be a field of your enclosing class for example!
So, long story short:
local variables used (read) inside a lambda must act like a constant
you cannot write to local variables!
or, the other way round: if you need something to write to, you have to use a field of your surrounding class, for example (or provide a callback method)
The reason for this limitation is the same as the reason for the Java language feature that local variables accessed from within (anonymous) inner classes must be (effectively) final.
This answer by rgettman gets into the details of it. rgettman explains the limitations in clear detail, and I link to that answer because lambda expressions should behave the same way as anonymous inner classes. Note, however, that no such limitation exists for class or instance variables. The main reason for this is slightly complicated, and I couldn't explain it better than Roedy Green does here. I am copying it so that everything is in one place:
The rule is anonymous inner classes may only access final local
variables of the enclosing method. Why? Because the inner class’s
methods may be invoked later, long after the method that spawned it
has terminated, e.g. by an AWT (Advanced Windowing Toolkit) event. The
local variables are long gone. The anonymous class then must work with
flash frozen copies of just the ones it needs squirreled away covertly
by the compiler in the anonymous inner class object. You might ask,
why do the local variables have to be final? Could not the compiler
just as well take a copy of non-final local variables, much the way it
does for a non-final parameters? If it did so, you would have two
copies of the variable. Each could change independently, much like
caller and callee’s copy of a parameter, however you would use the
same syntax to access either copy. This would be confusing. So Sun
insisted the local be final. This makes irrelevant that there are
actually two copies of it.
The ability for an anonymous class to access the caller’s final local
variables is really just syntactic sugar for automatically passing in
some local variables as extra constructor parameters. The whole thing
smells to me of diluted eau de kludge.
Remember that inner classes defined in a method can't modify any value from their surrounding method. The second lambda expression, in forEach, is trying to modify a variable of its surrounding scope (log).
To solve this, you can avoid the lambda in forEach and use a simple for-each loop that replaces all the values in log.
filteredRdd.map(log -> {
    for (String text : placeHolder) {
        log = log.replace(text, ",");
    }
    return log;
});
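Here is a self-contained version of the same idea that can be run without Spark. It is only a sketch: a plain List and a Java stream stand in for JavaRDD, placeHolder is passed in as a parameter, and the class name is invented. The loop works on a separate local (result) that no inner lambda captures.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ReplacePlaceholders {
    // Replaces every placeholder in every line with a comma.
    static List<String> modify(List<String> lines, List<String> placeHolder) {
        return lines.stream().map(log -> {
            String result = log;                   // not captured by any inner lambda
            for (String text : placeHolder) {
                result = result.replace(text, ","); // literal replacement, not a regex
            }
            return result;
        }).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> out = modify(Arrays.asList("a|b;c"), Arrays.asList("|", ";"));
        System.out.println(out); // prints [a,b,c]
    }
}
```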
In some use cases there can be a workaround. The following code complains about the startTime variable not being effectively final:
List<Report> reportsBeforeTime = reports.stream()
    .filter(r -> r.getTime().isAfter(startTime))
    .collect(Collectors.toList());
So, just copy the value to a final variable before passing it to the lambda:
final LocalTime finalStartTime = startTime;
List<Report> reportsBeforeTime = reports.stream()
    .filter(r -> r.getTime().isAfter(finalStartTime))
    .collect(Collectors.toList());
However, if you need to change a local variable inside a lambda function, that won't work.
If you do not want to create your own object wrapper, you can use AtomicReference, for example:
AtomicReference<String> data = new AtomicReference<>();
Test.lamdaTest(() -> {
    // data = ans.get(); <--- can't do this, so we do as below
    data.set("to change local variable");
});
return data.get();
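A runnable version of the AtomicReference idea could look like this. It is a sketch with invented names; a plain Runnable invoked directly stands in for the Test.lamdaTest call above.

```java
import java.util.concurrent.atomic.AtomicReference;

public class AtomicReferenceDemo {
    static String changeViaLambda() {
        AtomicReference<String> data = new AtomicReference<>("initial");
        // The variable 'data' is never reassigned; only the value it holds changes.
        Runnable r = () -> data.set("changed inside lambda");
        r.run();
        return data.get();
    }

    public static void main(String[] args) {
        System.out.println(changeViaLambda()); // prints "changed inside lambda"
    }
}
```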
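One solution is to encapsulate the code in an enclosing (inner) class. You can define this:
public abstract class ValueContext<T> {
    public T value;
    // Constructor added so that an initial value can be passed in.
    protected ValueContext(T value) { this.value = value; }
    public abstract void run();
}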
And then use it like this (example of a String value):
final ValueContext<String> context = new ValueContext<String>(myString) {
    @Override
    public void run() {
        // Your code here; lambda or other enclosing classes that want to work on myString,
        // but use 'value' instead of 'myString'
        value = doSomethingWithMyString(value);
    }
};
context.run();
myString = context.value;

What are captured variables in Java Local Classes

The Java documentation for Local Classes says that:
In addition, a local class has access to local variables. However, a
local class can only access local variables that are declared final.
When a local class accesses a local variable or parameter of the
enclosing block, it captures that variable or parameter. For example,
the PhoneNumber constructor can access the local variable numberLength
because it is declared final; numberLength is a captured variable.
What is a captured variable, what is its use, and why is it needed? Please help me understand the concept.
A captured variable is one that has been copied so it can be used in a nested class. It has to be copied because the object may outlive the current context. It has to be final (or effectively final in Java 8) so there is no confusion about whether changes to the variable will be seen (because they won't).
Note: Groovy does not have this rule, and a change to the local variable can mean a change to the value in the enclosing class, which is especially confusing if multiple threads are involved.
An example of a captured variable:
public void writeToDataBase(final Object toWrite) {
    executor.submit(new Runnable() {
        public void run() {
            writeToDBNow(toWrite);
        }
    });
    // if toWrite were mutable and you changed it now, what would happen !?
}
// after the method returns, toWrite no longer exists for this thread...
Here is a post describing it: http://www.devcodenote.com/2015/04/variable-capture-in-java.html
Here is a snippet from the post:
“It is imposed as a mandate by Java that if an inner class defined within a method references a local variable of that method, that local variable should be defined as final.”
This is because the function may complete execution and get removed from the process stack, with all the variables destroyed but it may be the case that objects of the inner class are still on the heap referencing a particular local variable of that function. To counter this, Java makes a copy of the local variable and gives that as a reference to the inner class. To maintain consistency between the 2 copies, the local variable is mandated to be “final” and non-modifiable.
A captured variable is one from the outside of your local class - one declared in the surrounding block. In some languages this is called a closure.
In the example from the Oracle Docs (simplified) the variable numberLength, declared outside of class PhoneNumber, is "captured".
final int numberLength = 10; // in JDK7 and earlier must be final...

class PhoneNumber {
    // you can refer to numberLength here... it has been "captured"
}
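A complete, runnable variant of that snippet might look like the following. It is a simplified sketch rather than the code from the Oracle tutorial: the enclosing class, the isValid method and the PhoneNumber body are made up for illustration.

```java
public class LocalClassDemo {
    static boolean isValid(String digits) {
        int numberLength = 10; // effectively final, so the local class may capture it (Java 8+)

        class PhoneNumber {
            final String value;

            PhoneNumber(String candidate) {
                // numberLength is the captured variable from the enclosing method
                value = candidate.length() == numberLength ? candidate : null;
            }
        }

        return new PhoneNumber(digits).value != null;
    }

    public static void main(String[] args) {
        System.out.println(isValid("1234567890")); // prints true
        System.out.println(isValid("123"));        // prints false
    }
}
```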
