As I understand, in languages such as Haskell, and also as part of the lambda calculus, each lambda expression has its own scope, so if I have nested lambda expressions such as: \x -> (\x -> x) then the first \x parameter is different to the second \x.
In Java if you do this you get a compilation error, just like if you use x again as the parameter name or a local variable name within the lambda if it has already been used inside the enclosing scope, e.g. as a method parameter.
Does anybody know why Java implemented lambda expressions this way - why not have them introduce a new level of scope and behave like an anonymous class would? I'm assuming it's because of some limitation or optimisation, or possibly because lambdas had to be hacked into the existing language?
This is the same behaviour as for other code blocks in Java.
This gives a compilation error
int a;
{
int a;
}
while this does not
{
int a;
}
{
int a;
}
You can read about this topic in section 6.4 of the JLS, together with some reasoning.
A lambda block is a new block, aka scope, but it does not establish a new context/level, like an anonymous class implementation does.
From Java Language Specification 15.27.2 Lambda Body:
Unlike code appearing in anonymous class declarations, the meaning of names and the this and super keywords appearing in a lambda body, along with the accessibility of referenced declarations, are the same as in the surrounding context (except that lambda parameters introduce new names).
And from JLS 6.4 Shadowing and Obscuring:
These rules allow redeclaration of a variable or local class in nested class declarations (local classes (§14.3) and anonymous classes (§15.9)) that occur in the scope of the variable or local class. Thus, the declaration of a formal parameter, local variable, or local class may be shadowed in a class declaration nested within a method, constructor, or lambda expression; and the declaration of an exception parameter may be shadowed inside a class declaration nested within the Block of the catch clause.
There are two design alternatives for handling name clashes created by lambda parameters and other variables declared in lambda expressions. One is to mimic class declarations: like local classes, lambda expressions introduce a new "level" for names, and all variable names outside the expression can be redeclared. Another is a "local" strategy: like catch clauses, for loops, and blocks, lambda expressions operate at the same "level" as the enclosing context, and local variables outside the expression cannot be shadowed. The above rules use the local strategy; there is no special dispensation that allows a variable declared in a lambda expression to shadow a variable declared in an enclosing method.
Example:
class Test {
private int f;
public void test() {
int a;
a = this.f; // VALID
{
int a; // ERROR: Duplicate local variable a
a = this.f; // VALID
}
Runnable r1 = new Runnable() {
@Override
public void run() {
int a; // VALID (new context)
a = this.f; // ERROR: f cannot be resolved or is not a field
// (this refers to the instance of Runnable)
a = Test.this.f; // VALID
}
};
Runnable r2 = () -> {
int a; // ERROR: Lambda expression's local variable a cannot redeclare another local variable defined in an enclosing scope.
a = this.f; // VALID
};
}
}
Lambdas in Java do introduce a new scope - any variable declared in a lambda is only accessible within the lambda.
What you really ask about is shadowing - changing binding of a variable already bound in some outer scope.
It is logical to allow some level of shadowing: you want to be able to shadow global names by local names, because otherwise you can break local code just by adding a new name to some global namespace. A lot of languages, for the sake of simplicity, simply extend this rule down to local names.
On the other hand, rebinding local names is a code smell and can be a source of subtle mistakes, while - at the same time - not offering any technical advantage. Since you mentioned Haskell, you can look at this discussion on Lambda the Ultimate.
This is why Java disallows shadowing of local variables (like many other potentially dangerous things), but allows shadowing attributes by local variables (so that adding attributes will never break a method that already used the name).
So, the designers of Java 8 had to answer a question if lambdas should behave more like code blocks (no shadowing) or like inner classes (shadowing) and made a conscious decision to treat them like the former.
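As a rough sketch of the two rules described above (the class, field, and method names here are made up for illustration): nested lambdas need distinct parameter names because local names cannot be shadowed, while a lambda parameter may shadow a field of the enclosing class.

```java
import java.util.function.Function;

class ShadowingSketch {
    // A field named x: lambda parameters and locals are allowed to shadow it.
    static int x = 10;

    static int applyNested() {
        // The Haskell-style \x -> (\x -> x) is rejected in Java,
        // so the inner parameter needs a different name.
        Function<Integer, Function<Integer, Integer>> nested = a -> (b -> a + b);
        return nested.apply(1).apply(2);
    }

    static int shadowField() {
        // Inside the lambda body, x is the parameter, not the static field.
        Function<Integer, Integer> twice = x -> x * 2;
        // Outside the lambda, the field is reached with a qualified name.
        return twice.apply(4) + ShadowingSketch.x;
    }

    public static void main(String[] args) {
        System.out.println(applyNested()); // prints 3
        System.out.println(shadowField()); // prints 18
    }
}
```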
While the other answers make it seem like this was a clear-cut decision by the language designers, there is actually a JEP that proposes to introduce shadowing for lambda parameters (emphasis mine):
Lambda parameters are not allowed to shadow variables in the enclosing
scopes. [...] It would be desirable to lift this restriction, and
allow lambda parameters (and locals declared with a lambda) to shadow
variables defined in enclosing scopes.
The proposal is relatively old and has obviously not found its way into the JDK yet. But since it also includes a better treatment of the underscore (which was deprecated as an identifier in Java 8 to pave the way for this treatment), I could imagine that the proposal as a whole is not completely off the table.
Related
This question has been previously asked over here
My question regarding the "why" was answered over here
But I have some doubts about the answer.
The answer provided mentions-
Although other answers prove the requirement, they don't explain why the requirement exists.
The JLS mentions why in §15.27.2:
The restriction to effectively final variables prohibits access to dynamically-changing local variables, whose capture would likely introduce concurrency problems.
To lower the risk of bugs, they decided to ensure captured variables are never mutated.
I am confused by the statement that it would lead to concurrency problems.
I read the article about concurrency problems on Baeldung, but I am still a bit confused about how this would cause concurrency problems. Can anybody help me out with an example?
Thanks in advance.
I'd like to preface this answer by saying what I show below is not actually how lambdas are implemented. The actual implementation involves java.lang.invoke.LambdaMetafactory if I'm not mistaken. My answer makes use of some inaccuracies to better demonstrate the point.
Let's say you have the following:
public static void main(String[] args) {
String foo = "Hello, World!";
Runnable r = () -> System.out.println(foo);
r.run();
}
Remember that a lambda expression is shorthand for declaring an implementation of a functional interface. The lambda body is the implementation of the single abstract method of said functional interface. At run-time an actual object is created. So the above results in an object whose class implements Runnable.
Now, the above lambda body references a local variable from the enclosing method. The instance created as a result of the lambda expression "captures" the value of that local variable. It's almost (but not really) like you have the following:
public static void main(String[] args) {
String foo = "Hello, World!";
final class GeneratedClass implements Runnable {
private final String generatedField;
private GeneratedClass(String generatedParam) {
generatedField = generatedParam;
}
@Override
public void run() {
System.out.println(generatedField);
}
}
Runnable r = new GeneratedClass(foo);
r.run();
}
And now it should be easier to see the problems with supporting concurrency here:
Local variables are not considered "shared variables". This is stated in §17.4.1 of the Java Language Specification:
Memory that can be shared between threads is called shared memory or heap memory.
All instance fields, static fields, and array elements are stored in heap memory. In this chapter, we use the term variable to refer to both fields and array elements.
Local variables (§14.4), formal method parameters (§8.4.1), and exception handler parameters (§14.20) are never shared between threads and are unaffected by the memory model.
In other words, local variables are not covered by the concurrency rules of Java and cannot be shared between threads.
At a source code level you only have access to the local variable. You don't see the generated field.
I suppose Java could be designed so that modifying the local variable inside the lambda body only writes to the generated field, and modifying the local variable outside the lambda body only writes to the local variable. But as you can probably imagine that'd be confusing and counterintuitive. You'd have two variables that appear to be one variable based on the source code. And what's worse those two variables can diverge in value.
The other option is to have no generated field. But consider the following:
public static void main(String[] args) {
String foo = "Hello, World!";
Runnable r = () -> {
foo = "Goodbye, World!"; // won't compile
System.out.println(foo);
};
new Thread(r).start();
System.out.println(foo);
}
What is supposed to happen here? If there is no generated field then the local variable is being modified by a second thread. But local variables cannot be shared between threads. Thus this approach is not possible, at least not without a likely non-trivial change to Java and the JVM.
So, as I understand it, the designers put in the rule that the local variable must be final or effectively final in this context in order to avoid concurrency problems and confusing developers with esoteric problems.
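As a rough sketch of what does work across threads (this is not part of the example above; the class and method names are made up), state that a lambda needs to write can live in a heap object such as an AtomicReference. The local variable foo then only holds a reference and is never reassigned, so it stays effectively final:

```java
import java.util.concurrent.atomic.AtomicReference;

class SharedHolderSketch {
    static String runOnSecondThread() {
        // foo is effectively final; only the object it refers to is mutated.
        AtomicReference<String> foo = new AtomicReference<>("Hello, World!");
        Thread writer = new Thread(() -> foo.set("Goodbye, World!"));
        writer.start();
        try {
            // join() makes the writer thread's update visible to this thread.
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return foo.get();
    }

    public static void main(String[] args) {
        System.out.println(runOnSecondThread()); // prints "Goodbye, World!"
    }
}
```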
When an instance of a lambda expression is created, any variables in the enclosing scope that it refers to are copied into it. Now, suppose the enclosing scope were allowed to modify such a variable: the lambda would then be working with a stale value in its copy. On the other hand, suppose the copy were modified inside the lambda: the value in the enclosing scope would still not be updated, leaving an inconsistency. Thus, to prevent such occurrences, the language designers have imposed this restriction. It would probably have made their life easier too. A related answer for an anonymous inner class can be found here.
Another point is that you will be able to pass the lambda expression around and if it is escaped and a different thread executes it, while current thread is updating the same local variable, then there will be some concurrency issues too.
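To make the stale-copy problem concrete, here is a hand-written stand-in for a capturing lambda (Captured is a hypothetical class, not what the compiler actually generates). Because the value is passed in explicitly, the local can be reassigned afterwards, and the two values diverge:

```java
import java.util.function.Supplier;

class StaleCopySketch {
    // Hypothetical stand-in for a lambda instance holding a copy of a local.
    static final class Captured implements Supplier<String> {
        private final String copy;

        Captured(String value) {
            this.copy = value;
        }

        @Override
        public String get() {
            return copy;
        }
    }

    static String[] demo() {
        String foo = "Hello";
        Supplier<String> supplier = new Captured(foo);
        // Legal here, because foo was passed explicitly rather than captured.
        foo = "Goodbye";
        // The local and the copy now disagree.
        return new String[] { foo, supplier.get() };
    }

    public static void main(String[] args) {
        String[] values = demo();
        System.out.println(values[0]); // prints "Goodbye"
        System.out.println(values[1]); // prints "Hello"
    }
}
```

With a real lambda the source would show a single name, foo, for both values, which is the confusion the effectively-final rule avoids.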
It is for the same reason that anonymous classes require the variables they use from outside their own scope to be read-only, i.e. final.
final int finalInt = 0;
int effectivelyFinalInt = 0;
int brokenInt = 0;
brokenInt = 0;
Supplier<Integer> supplier = new Supplier<Integer>() {
@Override
public Integer get() {
return finalInt; // compiles
return effectivelyFinalInt; // compiles
return brokenInt; // doesn't compile
}
};
Lambda expressions are only shortcuts for instances implementing the interface with only one abstract method (@FunctionalInterface).
Supplier<Integer> supplier = () -> finalInt; // compiles
Supplier<Integer> supplier = () -> effectivelyFinalInt; // compiles
Supplier<Integer> supplier = () -> brokenInt; // doesn't compile
I struggle to find support for my statements below in the Java Language Specification; however, they are logical:
Note that evaluation of a lambda expression produces an instance of a functional interface.
Note that instantiating an interface requires implementing all its abstract methods. Doing so as an expression produces an anonymous class.
Note that an anonymous class is always an inner class.
Each inner class can access only final or effectively-final variables outside of its scope: Accessing Members of an Enclosing Class
In addition, a local class has access to local variables. However, a local class can only access local variables that are declared final. When a local class accesses a local variable or parameter of the enclosing block, it captures that variable or parameter.
In the following code, the lambda expressions capture a static variable.
However, it is also local to the scope of the enclosing class, so would this be local variable capture or static variable capture?
public class ExampleImpl{
static String someStaticVar = "text";
Example lam = () -> {
System.out.println(someStaticVar);
};
interface Example {
void sample();
}
}
The terms “local variable capture” and “static variable capture” do not appear anywhere in the specification, so their meaning would be up to whoever coined these terms.
The most likely interpretation is that “local variable capture” just means “capture of a local variable” and likewise “static variable capture” means “capture of a static variable”; in other words, capture of a variable which happens to be of one kind or another: local, instance field, or static field. Then the answer is quite simple: the nature of a variable doesn’t change when you place a lambda expression in a different scope.
In your example, someStaticVar always is a static variable, regardless of where you access it.
It’s not clear why this distinction matters to you. There might be technical differences under the hood, which are intentionally unspecified, hence, implementation specific. The most relevant aspect of the type of the captured variables would be that capturing an instance variable will cause the generated instance to keep a reference to the instance. But first, this doesn’t apply to local variables or static variables, second, this is a natural relationship, that code potentially accessing an instance field may prevent the garbage collection of that instance.
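A small sketch (the class and method names are illustrative, not from the question) suggesting that a static field referenced in a lambda body is read when the lambda runs, rather than copied when the lambda is created:

```java
import java.util.function.Supplier;

class StaticReadSketch {
    static String someStaticVar = "text";

    static String demo() {
        // The lambda refers to the static field; nothing is copied here.
        Supplier<String> reader = () -> someStaticVar;
        // Static fields need not be effectively final, so this is allowed.
        someStaticVar = "changed";
        // The field is read at invocation time, so the new value is seen.
        return reader.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "changed"
    }
}
```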
A variable usage is basically every occurrence of a variable after its declaration in the same scope, where some operation may be applied to it. Variable usage highlighting is even supported in some IDEs like IntelliJ and Eclipse.
I was wondering if there is a way to find variable usages using ANTLR? I have already generated the Lexer, Parser, and BaseListener classes by running ANTLR on Java8.g4. I can find variable declarations but am not able to find variable usages in given Java source code. How can I do this?
Example :
int i; // Variable declaration
i++; // Variable usage
i = 2; // Variable usage
foo(i); // Variable 'i' usage
I am able to capture the declaration but not usage using the Listener class. I am parsing Java source codes here.
I'll assume you're only considering local variables.
You'll need scopes and resolving to do this.
Scope will represent Java's variable scope. It will hold information about which variables are declared in the given scope. You'll have to create it when you enter a Java scope (start of block, method, ...) and get rid of it upon leaving the scope. You'll keep a stack of scopes to represent nested blocks / scopes (Java doesn't allow hiding a local variable in nested scope, but you still need to track when a variable goes out of scope at the end of a nested scope).
And then you'll need to resolve each name you encounter in the parsed input - determine whether the name refers to a variable or not (using scope). Basically, it refers to a local variable, whenever it is the first part of name (before any .), is not followed by ( and matches a name of a local variable.
Parser cannot do this for you, because whether a name refers to a variable or not depends on available variables:
private static class A {
B out = new B();
}
private static class B {
void println(String foo) {
System.out.println("ha");
}
}
public static void main(String[] args) {
{
A System = new A();
System.out.println("a");
}
System.out.println("b");
}
ha
b
If you're considering also instance and static fields instead of just local variables, the resolving part becomes much more complicated, because you'll need to consider all classes in the current class' hierarchy, their instance and static fields, visibilities etc. to determine whether a variable of a given name exists and is visible.
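The scope stack described above can be sketched independently of ANTLR. This is a minimal illustration under assumed names (ScopeStackSketch, enterScope, exitScope, declare, and isLocalVariable are all hypothetical); in a real listener, the enter and exit calls would be made from the callbacks for blocks and method bodies, and declare from the callback for local variable declarations.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

class ScopeStackSketch {
    // Innermost scope is at the head of the deque.
    private final Deque<Set<String>> scopes = new ArrayDeque<>();

    void enterScope() {
        scopes.push(new HashSet<>());
    }

    void exitScope() {
        scopes.pop();
    }

    // Records a local variable in the innermost scope (a scope must be open).
    void declare(String name) {
        scopes.peek().add(name);
    }

    // A name resolves to a local variable if any enclosing scope declares it.
    boolean isLocalVariable(String name) {
        for (Set<String> scope : scopes) {
            if (scope.contains(name)) {
                return true;
            }
        }
        return false;
    }

    // Mirrors the System example: the name is a local only inside the block.
    static boolean[] demo() {
        ScopeStackSketch resolver = new ScopeStackSketch();
        resolver.enterScope();                 // method body
        resolver.enterScope();                 // nested block
        resolver.declare("System");            // A System = new A();
        boolean inside = resolver.isLocalVariable("System");
        resolver.exitScope();                  // end of nested block
        boolean outside = resolver.isLocalVariable("System");
        resolver.exitScope();
        return new boolean[] { inside, outside };
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println(result[0]); // prints true
        System.out.println(result[1]); // prints false
    }
}
```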
I'm new to lambda and Java8. I'm facing following error.
Local variable log defined in an enclosing scope must be final or
effectively final
public JavaRDD<String> modify(JavaRDD<String> filteredRdd) {
filteredRdd.map(log -> {
placeHolder.forEach(text -> {
//error comes here
log = log.replace(text, ",");
});
return log;
});
return null;
}
The message says exactly what the problem is: your variable log must be final (that is: carry the keyword final) or be effectively final (that is: you only assign a value to it once outside of the lambda). Otherwise, you can't use that variable within your lambda statement.
But of course, that conflicts with your usage of log. The point is: you can't write to something external from within the lambda ... so you have to step back and look for other ways for whatever you intend to do.
In that sense: just believe the compiler.
Beyond that, there is one core point to understand: you can not use a local variable that you can write to. Local variables are "copied" into the context of the lambda at runtime, and in order to achieve deterministic behavior, they can only be read, and should be constants.
If your use case is to write to some object, then it should be a field of your enclosing class for example!
So, long story short:
local variables used (read) inside a lambda must act like a constant
you can not write to local variables!
or the other way round: if you need something to write to, you have to use a field of your surrounding class for example (or provide a call back method)
The reason for this limitation is the same as the reason for the Java language feature that local variables accessed from within (anonymous) inner classes must be (effectively) final.
This answer by rgettman gets into the details of it. rgettman explains the limitations in clear detail and I link to that answer because the behavior of lambda expressions should be the same as that of anonymous inner classes. Note that no such limitation exists for class or instance variables, however. The main reason for this is slightly complicated and I couldn't explain it better than Roedy Green does here. Copying it here only so that it is all in one place:
The rule is anonymous inner classes may only access final local
variables of the enclosing method. Why? Because the inner class’s
methods may be invoked later, long after the method that spawned it
has terminated, e.g. by an AWT (Advanced Windowing Toolkit) event. The
local variables are long gone. The anonymous class then must work with
flash frozen copies of just the ones it needs squirreled away covertly
by the compiler in the anonymous inner class object. You might ask,
why do the local variables have to be final? Could not the compiler
just as well take a copy of non-final local variables, much the way it
does for a non-final parameters? If it did so, you would have two
copies of the variable. Each could change independently, much like
caller and callee’s copy of a parameter, however you would use the
same syntax to access either copy. This would be confusing. So Sun
insisted the local be final. This makes irrelevant that there are
actually two copies of it.
The ability for an anonymous class to access the caller’s final local
variables is really just syntactic sugar for automatically passing in
some local variables as extra constructor parameters. The whole thing
smells to me of diluted eau de kludge.
Remember that inner classes declared in a method can't modify any value from their surrounding method. Your second lambda expression, in forEach, is trying to modify a variable from its enclosing scope (log).
To solve this, you can avoid using a lambda in forEach and instead use a simple for-each loop to replace all the values in log.
filteredRdd.map(log -> {
for (String text:placeHolder){
log = log.replace(text,",");
}
return log;
});
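For readers without Spark at hand, the same pattern can be sketched with a plain java.util.stream pipeline. This is only an illustration: the List-based modify method and the class name are assumptions, not the asker's JavaRDD code, and the result is accumulated in a separate local declared inside the lambda.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ReplaceInLambdaSketch {
    static List<String> modify(List<String> lines, List<String> placeHolder) {
        return lines.stream()
                .map(log -> {
                    // A local declared inside the lambda can be reassigned freely.
                    String result = log;
                    for (String text : placeHolder) {
                        result = result.replace(text, ",");
                    }
                    return result;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> out = modify(Arrays.asList("a;b|c"), Arrays.asList(";", "|"));
        System.out.println(out); // prints [a,b,c]
    }
}
```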
In some use cases there can be a work around. The following code complains about the startTime variable not being effectively final:
List<Report> reportsBeforeTime = reports.stream()
.filter(r->r.getTime().isAfter(startTime))
.collect(Collectors.toList());
So, just copy the value to a final variable before passing it to lambda:
final LocalTime finalStartTime = startTime;
List<Report> reportsBeforeTime = reports.stream()
.filter(r->r.getTime().isAfter(finalStartTime))
.collect(Collectors.toList());
However, if you need to change a local variable inside a lambda function, that won't work.
If you do not want to create your own object wrapper, you can use AtomicReference, for example:
AtomicReference<String> data = new AtomicReference<>();
Test.lamdaTest(()-> {
//data = ans.get(); <--- can't do this, so we do as below
data.set("to change local variable");
});
return data.get();
One solution is to encapsulate the code in an enclosing (inner) class. You can define this:
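public abstract class ValueContext<T> {
public T value;
// Constructor needed by the usage below, which passes the initial value.
protected ValueContext(T value) {
this.value = value;
}
public abstract void run();
}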
And then use it like this (example of a String value):
final ValueContext<String> context = new ValueContext<String>(myString) {
@Override
public void run() {
// Your code here; lambda or other enclosing classes that want to work on myString,
// but use 'value' instead of 'myString'
value = doSomethingWithMyString(value);
}};
context.run();
myString = context.value;
In a lambda, local variables need to be final, but instance variables don't. Why so?
The fundamental difference between a field and a local variable is that the local variable is copied when JVM creates a lambda instance. On the other hand, fields can be changed freely, because the changes to them are propagated to the outside class instance as well (their scope is the whole outside class, as Boris pointed out below).
The easiest way of thinking about anonymous classes, closures and lambdas is from the variable scope perspective; imagine a copy constructor added for all local variables you pass to a closure.
In a document of project lambda, State of the Lambda v4, under Section 7. Variable capture, it is mentioned that:
It is our intent to prohibit capture of mutable local variables. The
reason is that idioms like this:
int sum = 0;
list.forEach(e -> { sum += e.size(); });
are fundamentally serial; it is quite difficult to write lambda bodies
like this that do not have race conditions. Unless we are willing to
enforce—preferably at compile time—that such a function cannot escape
its capturing thread, this feature may well cause more trouble than it
solves.
Another thing to note here is that local variables are passed to the constructor of an inner class when you access them inside the inner class, and this won't work with non-final variables because the value of a non-final variable can be changed after construction.
While in case of an instance variable, the compiler passes a reference of the object and object reference will be used to access instance variables. So, it is not required in case of instance variables.
PS : It is worth mentioning that anonymous classes can access only final local variables (in Java SE 7), while in Java SE 8 you can access effectively final variables also inside lambda as well as inner classes.
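The serial idiom quoted above from State of the Lambda (accumulating into sum from inside forEach) can usually be written as a reduction instead, so that no local variable is mutated from the lambda. A minimal sketch, with a made-up class name and sample data:

```java
import java.util.Arrays;
import java.util.List;

class SumReductionSketch {
    // Sums the sizes of the inner lists without mutating a captured local.
    static int totalSize(List<List<String>> list) {
        return list.stream()
                .mapToInt(List::size)
                .sum();
    }

    public static void main(String[] args) {
        List<List<String>> data = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"));
        System.out.println(totalSize(data)); // prints 3
    }
}
```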
In Java 8 in Action book, this situation is explained as:
You may be asking yourself why local variables have these restrictions.
First, there’s a key
difference in how instance and local variables are implemented behind the scenes. Instance
variables are stored on the heap, whereas local variables live on the stack. If a lambda could
access the local variable directly and the lambda were used in a thread, then the thread using the
lambda could try to access the variable after the thread that allocated the variable had
deallocated it. Hence, Java implements access to a free local variable as access to a copy of it
rather than access to the original variable. This makes no difference if the local variable is
assigned to only once—hence the restriction.
Second, this restriction also discourages typical imperative programming patterns (which, as we
explain in later chapters, prevent easy parallelization) that mutate an outer variable.
Because instance variables are always accessed through a field access operation on a reference to some object, i.e. some_expression.instance_variable. Even when you don't explicitly access it through dot notation, like instance_variable, it is implicitly treated as this.instance_variable (or if you're in an inner class accessing an outer class's instance variable, OuterClass.this.instance_variable, which is under the hood this.<hidden reference to outer this>.instance_variable).
Thus an instance variable is never directly accessed, and the real "variable" you're directly accessing is this (which is "effectively final" since it is not assignable), or a variable at the beginning of some other expression.
Putting up some concepts for future visitors:
Basically it all boils down to the point that compiler should be able to deterministically tell that lambda expression body is not working on a stale copy of the variables.
In the case of local variables, the compiler has no way to be sure that the lambda expression body is not working on a stale copy of the variable unless that variable is final or effectively final, so local variables must be either final or effectively final.
Now, in the case of instance fields, when you access an instance field inside the lambda expression, the compiler prepends this to that variable access (if you have not done so explicitly). Since this is effectively final, the compiler is sure that the lambda expression body will always see the latest value of the variable (please note that multi-threading is out of scope for this discussion). So, in the case of instance fields, the compiler can tell that the lambda body has the latest value of the instance variable, and instance variables need not be final or effectively final. Please refer to the screenshot below from an Oracle slide:
Also, please note that if you are accessing an instance field in a lambda expression that is executed in a multi-threaded environment, you could potentially run into problems.
It seems like you are asking about variables that you can reference from a lambda body.
From the JLS §15.27.2
Any local variable, formal parameter, or exception parameter used but not declared in a lambda expression must either be declared final or be effectively final (§4.12.4), or a compile-time error occurs where the use is attempted.
So you don't need to declare variables as final you just need to make sure that they are "effectively final". This is the same rule as applies to anonymous classes.
Within Lambda expressions you can use effectively final variables from the surrounding scope.
Effectively final means that it is not mandatory to declare the variable final, but you must make sure it is never reassigned, either inside or outside the lambda expression.
You can also use this within closures; "this" refers to the enclosing object, not the lambda itself, because closures are anonymous functions and do not have a class associated with them.
So when you use any field (let's say private Integer i;) from the enclosing class which is neither declared final nor effectively final, it will still work, as the compiler does the trick on your behalf and inserts "this" (this.i).
private Integer i = 0;
public void process(){
Consumer<Integer> c = (i)-> System.out.println(++this.i);
c.accept(i);
}
Here is a code example. I didn't expect this either; I expected to be unable to modify anything outside my lambda.
public class LambdaNonFinalExample {
static boolean odd = false;
public static void main(String[] args) throws Exception {
//boolean odd = false; - If declared inside the method then I get the expected "Effectively Final" compile error
runLambda(() -> odd = true);
System.out.println("Odd=" + odd);
}
public static void runLambda(Callable c) throws Exception {
c.call();
}
}
Output:
Odd=true
YES, you can change the member variables of the instance but you CANNOT change the instance itself just like when you handle variables.
Something like this as mentioned:
class Car {
public String name;
}
public void testLocal() {
int theLocal = 6;
Car bmw = new Car();
bmw.name = "BMW";
Stream.iterate(0, i -> i + 2).limit(2)
.forEach(i -> {
// bmw = new Car(); // LINE - 1;
bmw.name = "BMW NEW"; // LINE - 2;
System.out.println("Testing local variables: " + (theLocal + i));
});
// have to comment this to ensure it's `effectively final`;
// theLocal = 2;
}
The basic principle behind restricting local variables is about data and computation validity:
Suppose a lambda evaluated by a second thread were given the ability to mutate local variables. Even the ability to merely read the value of mutable local variables from a different thread would introduce the necessity for synchronization or the use of volatile in order to avoid reading stale data.
But as we know the principal purpose of the lambdas
Amongst the different reasons for this, the most pressing one for the Java platform is that they make it easier to distribute processing of collections over multiple threads.
Unlike local variables, an instance can be mutated, because it is shared globally. We can understand this better via the difference between heap and stack:
Whenever an object is created, it’s always stored in the Heap space and stack memory contains the reference to it. Stack memory only contains local primitive variables and reference variables to objects in heap space.
So to sum up, there are two points I think really matter:
It's really hard to make the instance effectively final, which might cause lots of senseless burden (just imagine the deep-nested class);
the instance itself is already globally shared and lambda is also shareable among threads, so they can work together properly since we know we're handling the mutation and want to pass this mutation around;
Balance point here is clear: if you know what you are doing, you can do it easily but if not then the default restriction will help to avoid insidious bugs.
P.S. If instance mutation requires synchronization, you can use the stream reduction methods directly; if the mutation has a dependency issue, you can still chain with thenApply or thenCompose (the CompletableFuture methods) while mapping, or with similar methods.
First, there is a key difference in how local and instance variables are implemented behind the scenes. Instance variables are stored on the heap, whereas local variables are stored on the stack.
If the lambda could access the local variable directly and the lambda were used in a thread, then the thread using the lambda could try to access the variable after the thread that allocated the variable had deallocated it.
In short: to ensure another thread does not overwrite the original value, it is better to provide access to a copy of the variable rather than to the original one.