Use ANTLR to find Variable usage/reference in Java source-code?

A variable usage is basically every occurrence of a variable after its declaration in the same scope, where some operation may be applied to it. Variable usage highlighting is even supported in some IDEs like IntelliJ and Eclipse.
I was wondering if there is a way to find variable usages using ANTLR. I have already generated the Lexer, Parser, and BaseListener classes by running ANTLR on Java8.g4. I can find variable declarations but am not able to find variable usages in given Java source code. How can I do this?
Example :
int i; // Variable declaration
i++; // Variable usage
i = 2; // Variable usage
foo(i); // Variable 'i' usage
I am able to capture the declaration but not the usages using the Listener class. I am parsing Java source code here.

I'll assume you're only considering local variables.
You'll need scopes and resolving to do this.
A scope represents a Java variable scope: it records which variables are declared in it. Create one when you enter a Java scope (start of a block, method, ...) and discard it when you leave that scope. Keep a stack of scopes to represent nested blocks (Java doesn't allow hiding a local variable in a nested scope, but you still need to track when a variable goes out of scope at the end of a nested block).
Then you need to resolve each name you encounter in the parsed input, that is, determine from the scopes whether the name refers to a variable. Basically, a name refers to a local variable when it is the first part of a name (before any .), is not followed by (, and matches the name of a local variable in scope.
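The scope stack described above can be sketched roughly as follows. This is a minimal illustration, not part of ANTLR: the class name ScopeStack and its methods are hypothetical, and in a generated listener you would call them from whichever enter/exit callbacks correspond to blocks, method bodies, and variable declarations in your grammar.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not an ANTLR class): tracks which local names are visible.
class ScopeStack {
    // The innermost scope sits at the head of the deque.
    private final Deque<Set<String>> scopes = new ArrayDeque<>();

    // Call when the listener enters a block or method body.
    void enterScope() {
        scopes.push(new HashSet<>());
    }

    // Call when the listener leaves that block; its declarations disappear.
    void exitScope() {
        scopes.pop();
    }

    // Call when a local variable declaration is seen.
    // Assumes enterScope() has been called at least once before.
    void declare(String name) {
        scopes.peek().add(name);
    }

    // True if the name matches a local declared in any enclosing scope.
    boolean isLocal(String name) {
        for (Set<String> scope : scopes) {
            if (scope.contains(name)) {
                return true;
            }
        }
        return false;
    }
}
```

For each identifier the listener meets, a check such as isLocal(name) would then decide whether to report it as a usage. The syntactic conditions (first part of a dotted name, not followed by a parenthesis) still have to be checked separately.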
The parser cannot do this for you, because whether a name refers to a variable depends on which variables are available:
private static class A {
B out = new B();
}
private static class B {
void println(String foo) {
System.out.println("ha");
}
}
public static void main(String[] args) {
{
A System = new A();
System.out.println("a");
}
System.out.println("b");
}
Output:
ha
b
If you're also considering instance and static fields rather than just local variables, the resolving part becomes much more complicated, because you'll need to consider all classes in the current class's hierarchy, their instance and static fields, visibilities etc. to determine whether a variable of a given name exists and is visible.

Related

Local variable log defined in an enclosing scope must be final or effectively final

I'm new to lambdas and Java 8. I'm facing the following error:
Local variable log defined in an enclosing scope must be final or effectively final
public JavaRDD<String> modify(JavaRDD<String> filteredRdd) {
filteredRdd.map(log -> {
placeHolder.forEach(text -> {
//error comes here
log = log.replace(text, ",");
});
return log;
});
return null;
}

The message says exactly what the problem is: your variable log must be final (that is: carry the keyword final) or be effectively final (that is: you only assign a value to it once outside of the lambda). Otherwise, you can't use that variable within your lambda statement.
But of course, that conflicts with your usage of log. The point is: you can't write to something external from within the lambda ... so you have to step back and look for other ways for whatever you intend to do.
In that sense: just believe the compiler.
Beyond that, there is one core point to understand: inside a lambda you cannot use a local variable that is written to. Local variables are "copied" into the context of the lambda at runtime, and to achieve deterministic behavior they can only be read and should act as constants.
If your use case is to write to some object, then it should be a field of your enclosing class for example!
So, long story short:
local variables used (read) inside a lambda must act like a constant
you can not write to local variables!
or the other way round: if you need something to write to, you have to use a field of your surrounding class for example (or provide a call back method)
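As a small illustration of the summary above, here is a sketch in which a lambda writes to a field instead of a local variable. The class Tally and its members are hypothetical names chosen for this example only.

```java
import java.util.List;

// Hypothetical example class: a lambda may write to a field, not to a local.
class Tally {
    // A field: the lambda reaches it through 'this', so writing is allowed.
    private int total = 0;

    int sumLengths(List<String> words) {
        total = 0;
        // int local = 0;
        // words.forEach(w -> local += w.length()); // would not compile: local is written to
        words.forEach(w -> total += w.length());
        return total;
    }
}
```

Writing to a field this way compiles, but it is not thread-safe, so it is best kept to single-threaded code.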

The reason for this limitation is the same as the reason for the Java language feature that local variables accessed from within (anonymous) inner classes must be (effectively) final.
This answer by rgettman gets into the details of it. rgettman explains the limitations in clear detail, and I link to that answer because the behavior of lambda expressions should be the same as that of anonymous inner classes. Note, however, that no such limitation exists for class or instance variables. The main reason for this is slightly complicated, and I couldn't explain it better than Roedy Green does here. Copying it here so it is all in one place:
The rule is anonymous inner classes may only access final local
variables of the enclosing method. Why? Because the inner class’s
methods may be invoked later, long after the method that spawned it
has terminated, e.g. by an AWT (Advanced Windowing Toolkit) event. The
local variables are long gone. The anonymous class then must work with
flash frozen copies of just the ones it needs squirreled away covertly
by the compiler in the anonymous inner class object. You might ask,
why do the local variables have to be final? Could not the compiler
just as well take a copy of non-final local variables, much the way it
does for a non-final parameters? If it did so, you would have two
copies of the variable. Each could change independently, much like
caller and callee’s copy of a parameter, however you would use the
same syntax to access either copy. This would be confusing. So Sun
insisted the local be final. This makes irrelevant that there are
actually two copies of it.
The ability for an anonymous class to access the caller’s final local
variables is really just syntactic sugar for automatically passing in
some local variables as extra constructor parameters. The whole thing
smells to me of diluted eau de kludge.
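The "extra constructor parameters" idea from the quote can be made concrete with a hand-written equivalent. This is only a sketch of the concept, not the code the compiler actually generates; the names CaptureDesugared, Explicit, and numberCopy are made up for illustration.

```java
// Hypothetical illustration of capture as "extra constructor parameters".
class CaptureDesugared {
    // What the programmer writes: an anonymous class reading locals of the method.
    static Runnable sugared(final int number, final StringBuilder out) {
        return new Runnable() {
            @Override
            public void run() {
                out.append(number); // uses the captured copy of 'number'
            }
        };
    }

    // Roughly the same thing written by hand: the copies become final fields.
    static final class Explicit implements Runnable {
        private final int numberCopy;
        private final StringBuilder out;

        Explicit(int numberCopy, StringBuilder out) {
            this.numberCopy = numberCopy;
            this.out = out;
        }

        @Override
        public void run() {
            out.append(numberCopy);
        }
    }

    static Runnable desugared(int number, StringBuilder out) {
        return new Explicit(number, out);
    }
}
```

Because each copy is taken once, at construction time, a later change to the original local could never reach the object, which is why the language requires the local not to change.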

Remember that method-local inner classes can't modify any value from their surrounding method. Your second lambda expression, the one inside forEach, is trying to modify a variable of its surrounding scope (log).
To solve this, avoid the lambda in forEach: use a simple for-each loop instead and replace all the values in log.
filteredRdd.map(log -> {
for (String text:placeHolder){
log = log.replace(text,",");
}
return log;
});
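The same loop can be tried without Spark. The sketch below swaps JavaRDD for a plain java.util.stream pipeline purely so it runs standalone; the class name PlaceholderStripper and the method signature are assumptions made for this example.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical standalone variant: a plain List and Stream stand in for JavaRDD.
class PlaceholderStripper {
    static List<String> modify(List<String> lines, List<String> placeHolder) {
        return lines.stream()
                .map(log -> {
                    // 'log' is this lambda's own parameter and is not captured by
                    // another lambda, so reassigning it here is allowed.
                    for (String text : placeHolder) {
                        log = log.replace(text, ",");
                    }
                    return log;
                })
                .collect(Collectors.toList());
    }
}
```

The difference from the failing version is that the assignment no longer happens inside a second, nested lambda that would have to capture log.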

In some use cases there can be a work around. The following code complains about the startTime variable not being effectively final:
List<Report> reportsBeforeTime = reports.stream()
.filter(r->r.getTime().isAfter(startTime))
.collect(Collectors.toList());
So, just copy the value to a final variable before passing it to lambda:
final LocalTime finalStartTime = startTime;
List<Report> reportsBeforeTime = reports.stream()
.filter(r->r.getTime().isAfter(finalStartTime))
.collect(Collectors.toList());
However, if you need to change a local variable inside a lambda function, that won't work.

If you do not want to create your own object wrapper, you can use AtomicReference, for example:
AtomicReference<String> data = new AtomicReference<>();
Test.lamdaTest(()-> {
//data = ans.get(); <--- can't do this, so we do as below
data.set("to change local variable");
});
return data.get();
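A self-contained variant of the snippet above might look like this. The class and method names are hypothetical, and a plain Runnable stands in for the Test.lamdaTest call, whose definition is not shown.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo: the holder is effectively final, only its content changes.
class AtomicHolderDemo {
    static String capture(String input) {
        AtomicReference<String> data = new AtomicReference<>("initial");
        // The lambda never reassigns 'data'; it only calls set() on the object.
        Runnable task = () -> data.set(input.toUpperCase());
        task.run();
        return data.get();
    }
}
```

The variable data is assigned exactly once, so it satisfies the effectively final rule, while the value it holds can still be replaced from inside the lambda.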

One solution is to encapsulate the code in an enclosing (inner) class. You can define this:
public abstract class ValueContext<T> {
public T value;
public ValueContext(T value) {
this.value = value;
}
public abstract void run();
}
And then use it like this (example of a String value):
final ValueContext<String> context = new ValueContext<String>(myString) {
@Override
public void run() {
// Your code here; lambda or other enclosing classes that want to work on myString,
// but use 'value' instead of 'myString'
value = doSomethingWithMyString(value);
}};
context.run();
myString = context.value;

Why do java lambda expressions not introduce a new level of scope?

As I understand, in languages such as Haskell, and also as part of the lambda calculus, each lambda expression has its own scope, so if I have nested lambda expressions such as: \x -> (\x -> x) then the first \x parameter is different to the second \x.
In Java if you do this you get a compilation error, just like if you use x again as the parameter name or a local variable name within the lambda if it has already been used inside the enclosing scope, e.g. as a method parameter.
Does anybody know why Java implemented lambda expressions this way - why not have them introduce a new level of scope and behave like an anonymous class would? I'm assuming it's because of some limitation or optimisation, or possibly because lambdas had to be hacked into the existing language?

This is the same behaviour as for other code blocks in Java.
This gives a compilation error
int a;
{
int a;
}
while this does not
{
int a;
}
{
int a;
}
You can read about this topic in section 6.4 of the JLS, together with some reasoning.

A lambda block is a new block, aka scope, but it does not establish a new context/level, like an anonymous class implementation does.
From Java Language Specification 15.27.2 Lambda Body:
Unlike code appearing in anonymous class declarations, the meaning of names and the this and super keywords appearing in a lambda body, along with the accessibility of referenced declarations, are the same as in the surrounding context (except that lambda parameters introduce new names).
And from JLS 6.4 Shadowing and Obscuring:
These rules allow redeclaration of a variable or local class in nested class declarations (local classes (§14.3) and anonymous classes (§15.9)) that occur in the scope of the variable or local class. Thus, the declaration of a formal parameter, local variable, or local class may be shadowed in a class declaration nested within a method, constructor, or lambda expression; and the declaration of an exception parameter may be shadowed inside a class declaration nested within the Block of the catch clause.
There are two design alternatives for handling name clashes created by lambda parameters and other variables declared in lambda expressions. One is to mimic class declarations: like local classes, lambda expressions introduce a new "level" for names, and all variable names outside the expression can be redeclared. Another is a "local" strategy: like catch clauses, for loops, and blocks, lambda expressions operate at the same "level" as the enclosing context, and local variables outside the expression cannot be shadowed. The above rules use the local strategy; there is no special dispensation that allows a variable declared in a lambda expression to shadow a variable declared in an enclosing method.
Example:
class Test {
private int f;
public void test() {
int a;
a = this.f; // VALID
{
int a; // ERROR: Duplicate local variable a
a = this.f; // VALID
}
Runnable r1 = new Runnable() {
@Override
public void run() {
int a; // VALID (new context)
a = this.f; // ERROR: f cannot be resolved or is not a field
// (this refers to the instance of Runnable)
a = Test.this.f; // VALID
}
};
Runnable r2 = () -> {
int a; // ERROR: Lambda expression's local variable a cannot redeclare another local variable defined in an enclosing scope.
a = this.f; // VALID
};
}
}

Lambdas in Java do introduce a new scope - any variable declared in a lambda is only accessible within the lambda.
What you really ask about is shadowing - changing binding of a variable already bound in some outer scope.
It is logical to allow some level of shadowing: you want to be able to shadow global names by local names, because otherwise you can break local code just by adding a new name to some global namespace. A lot of languages, for the sake of simplicity, simply extend this rule down to local names.
On the other hand, rebinding local names is a code smell and can be a source of subtle mistakes, while - at the same time - not offering any technical advantage. Since you mentioned Haskell, you can look at this discussion on Lambda the Ultimate.
This is why Java disallows shadowing of local variables (like many other potentially dangerous things), but allows shadowing attributes by local variables (so that adding attributes will never break a method that already used the name).
So, the designers of Java 8 had to decide whether lambdas should behave more like code blocks (no shadowing) or like inner classes (shadowing), and made a conscious decision to treat them like the former.
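The distinction between shadowing a field and shadowing a local can be seen in a few lines. This is a sketch with made-up names (ShadowingDemo, addField); the commented-out line shows the case the compiler rejects.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical demo: a lambda parameter may shadow a field, but not a local.
class ShadowingDemo {
    private int x = 10; // a field named x

    int run() {
        // Allowed: inside the lambda, plain 'x' is the parameter and 'this.x' is the field.
        IntUnaryOperator addField = x -> x + this.x;

        int y = 1;
        // IntUnaryOperator bad = y -> y + 1; // would not compile: y is already a local here

        return addField.applyAsInt(5) + y - 1;
    }
}
```

Here this.x still refers to the field of ShadowingDemo, because a lambda body does not introduce its own this.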

While the other answers make it seem like this was a clear-cut decision by the language designers, there is actually a JEP that proposes to introduce shadowing for lambda parameters (emphasis mine):
Lambda parameters are not allowed to shadow variables in the enclosing
scopes. [...] It would be desirable to lift this restriction, and
allow lambda parameters (and locals declared with a lambda) to shadow
variables defined in enclosing scopes.
The proposal is relatively old and has obviously not found its way into the JDK yet. But since it also includes a better treatment of the underscore (which was deprecated as an identifier in Java 8 to pave the way for this treatment), I could imagine that the proposal as a whole is not completely off the table.

Lambdas: local variables need final, instance variables don't

In a lambda, local variables need to be final, but instance variables don't. Why so?

The fundamental difference between a field and a local variable is that the local variable is copied when the JVM creates a lambda instance. Fields, on the other hand, can be changed freely, because changes to them are propagated to the outside class instance as well (their scope is the whole outside class, as Boris pointed out below).
The easiest way of thinking about anonymous classes, closures and lambdas is from the variable scope perspective; imagine a copy constructor added for all local variables you pass to a closure.

In a document of project lambda, State of the Lambda v4, under Section 7. Variable capture, it is mentioned that:
It is our intent to prohibit capture of mutable local variables. The
reason is that idioms like this:
int sum = 0;
list.forEach(e -> { sum += e.size(); });
are fundamentally serial; it is quite difficult to write lambda bodies
like this that do not have race conditions. Unless we are willing to
enforce—preferably at compile time—that such a function cannot escape
its capturing thread, this feature may well cause more trouble than it
solves.
Another thing to note is that local variables are passed to the constructor of an inner class when you access them inside that inner class. This won't work with a non-final variable, because the value of a non-final variable can change after construction.
In the case of an instance variable, the compiler passes a reference to the object, and that object reference is used to access the instance variable. So the restriction is not required for instance variables.
PS: It is worth mentioning that anonymous classes can access only final local variables in Java SE 7, while in Java SE 8 both lambdas and inner classes can also access effectively final variables.
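For the sum idiom quoted from State of the Lambda, a reduction avoids the mutable local altogether. The following is a sketch of one possible rewrite, with a hypothetical class name and an assumed element type of List<String>.

```java
import java.util.List;

// Hypothetical rewrite of the 'sum += e.size()' idiom as a reduction.
class SizeSum {
    static int totalSize(List<List<String>> list) {
        // No captured mutable local: each element is mapped to its size,
        // and the stream combines the sizes itself.
        return list.stream()
                .mapToInt(List::size)
                .sum();
    }
}
```

Because nothing outside the pipeline is mutated, the same code stays correct if the stream is later made parallel.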

In Java 8 in Action book, this situation is explained as:
You may be asking yourself why local variables have these restrictions.
First, there’s a key
difference in how instance and local variables are implemented behind the scenes. Instance
variables are stored on the heap, whereas local variables live on the stack. If a lambda could
access the local variable directly and the lambda were used in a thread, then the thread using the
lambda could try to access the variable after the thread that allocated the variable had
deallocated it. Hence, Java implements access to a free local variable as access to a copy of it
rather than access to the original variable. This makes no difference if the local variable is
assigned to only once—hence the restriction.
Second, this restriction also discourages typical imperative programming patterns (which, as we
explain in later chapters, prevent easy parallelization) that mutate an outer variable.

Because instance variables are always accessed through a field access operation on a reference to some object, i.e. some_expression.instance_variable. Even when you don't explicitly access it through dot notation, like instance_variable, it is implicitly treated as this.instance_variable (or if you're in an inner class accessing an outer class's instance variable, OuterClass.this.instance_variable, which is under the hood this.<hidden reference to outer this>.instance_variable).
Thus an instance variable is never directly accessed, and the real "variable" you're directly accessing is this (which is "effectively final" since it is not assignable), or a variable at the beginning of some other expression.
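A short sketch can show the consequence: since the lambda captures this rather than a copy of the field, it observes later changes to the field. The class and method names below are hypothetical.

```java
import java.util.function.IntSupplier;

// Hypothetical demo: a lambda reading a field goes through the captured 'this'.
class FieldThroughThis {
    private int value = 1;

    IntSupplier reader() {
        // Equivalent to () -> this.value; no copy of 'value' is taken here.
        return () -> value;
    }

    void setValue(int newValue) {
        value = newValue;
    }
}
```

If reader() is called first and setValue(2) afterwards, the supplier returns 2, because the field is read at call time through the captured reference.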

Putting up some concepts for future visitors:
Basically, it all boils down to this: the compiler should be able to tell deterministically that the lambda expression body is not working on a stale copy of a variable.
In the case of local variables, the compiler has no way to be sure that the lambda expression body is not working on a stale copy unless the variable is final or effectively final, so local variables must be either final or effectively final.
Now, in the case of instance fields, when you access an instance field inside the lambda expression, the compiler prepends this to that access (if you have not done so explicitly). Since this is effectively final, the compiler is sure that the lambda body always sees the latest value of the field (please note that multi-threading is out of scope for this discussion). So instance fields need not be final or effectively final. Please refer to the screenshot below from an Oracle slide:
Also, please note that if you access an instance field in a lambda expression that runs in a multi-threaded environment, you could potentially run into problems.

It seems like you are asking about variables that you can reference from a lambda body.
From the JLS §15.27.2
Any local variable, formal parameter, or exception parameter used but not declared in a lambda expression must either be declared final or be effectively final (§4.12.4), or a compile-time error occurs where the use is attempted.
So you don't need to declare variables as final; you just need to make sure that they are "effectively final". This is the same rule that applies to anonymous classes.

Within lambda expressions you can use effectively final variables from the surrounding scope.
Effectively final means that it is not mandatory to declare the variable final, but you must make sure it is never reassigned, inside the lambda expression or outside it.
You can also use this within closures. Here this means the enclosing object, not the lambda itself, because closures are anonymous functions and do not have a class associated with them.
So when you use a field (say, private Integer i;) of the enclosing class that is neither declared final nor effectively final, it still works, because the compiler does the trick on your behalf and inserts this (this.i).
private Integer i = 0;
public void process(){
Consumer<Integer> c = (i)-> System.out.println(++this.i);
c.accept(i);
}

Here is a code example. I didn't expect this either; I expected to be unable to modify anything outside my lambda.
import java.util.concurrent.Callable;
public class LambdaNonFinalExample {
static boolean odd = false;
public static void main(String[] args) throws Exception {
//boolean odd = false; - If declared inside the method then I get the expected "Effectively Final" compile error
runLambda(() -> odd = true);
System.out.println("Odd=" + odd);
}
public static void runLambda(Callable<Boolean> c) throws Exception {
c.call();
}
}
Output:
Odd=true

Yes, you can change the member variables of an instance, but you cannot reassign the variable that refers to the instance, just as with any other local variable.
Something like this as mentioned:
class Car {
public String name;
}
public void testLocal() {
int theLocal = 6;
Car bmw = new Car();
bmw.name = "BMW";
Stream.iterate(0, i -> i + 2).limit(2)
.forEach(i -> {
// bmw = new Car(); // LINE - 1;
bmw.name = "BMW NEW"; // LINE - 2;
System.out.println("Testing local variables: " + (theLocal + i));
});
// have to comment this to ensure it's `effectively final`;
// theLocal = 2;
}
The basic principle to restrict the local variables is about data and computation validity
If the lambda, evaluated by the second thread, were given the ability to mutate local variables. Even the ability to read the value of mutable local variables from a different thread would introduce the necessity for synchronization or the use of volatile in order to avoid reading stale data.
But as we know the principal purpose of the lambdas
Amongst the different reasons for this, the most pressing one for the Java platform is that they make it easier to distribute processing of collections over multiple threads.
Quite unlike local variables, the state of an instance can be mutated, because the instance is shared globally. We can understand this better via the difference between heap and stack:
Whenever an object is created, it’s always stored in the Heap space and stack memory contains the reference to it. Stack memory only contains local primitive variables and reference variables to objects in heap space.
So to sum up, there are two points I think really matter:
It's really hard to make an instance effectively final, and requiring it would impose a lot of senseless burden (just imagine a deeply nested class);
The instance itself is already globally shared and a lambda is also shareable among threads, so they can work together properly, since we know we're handling the mutation and want to pass this mutation around;
The balance point here is clear: if you know what you are doing, you can do it easily, but if not, the default restriction helps to avoid insidious bugs.
P.S. If synchronization is required for the instance mutation, you can use the stream reduction methods directly. If there is a dependency between the mutations, you can still chain the steps while mapping, for example with thenApply or thenCompose or similar methods.

First, there is a key difference in how local and instance variables are implemented behind the scenes. Instance variables are stored on the heap, whereas local variables are stored on the stack.
If the lambda could access the local variable directly and the lambda were used in another thread, then that thread could try to access the variable after the thread that allocated it had deallocated it.
In short: to ensure another thread does not overwrite the original value, the lambda is given access to a copy of the variable rather than to the original.

What are captured variables in Java Local Classes

The Java documentation for Local Classes says that:
In addition, a local class has access to local variables. However, a
local class can only access local variables that are declared final.
When a local class accesses a local variable or parameter of the
enclosing block, it captures that variable or parameter. For example,
the PhoneNumber constructor can access the local variable numberLength
because it is declared final; numberLength is a captured variable.
What is a captured variable, what is its use, and why is it needed? Please help me understand the concept.

What is a captured variable, what is its use, and why is it needed?
A captured variable is one that has been copied so it can be used in a nested class. It has to be copied because the object may outlive the current context. It has to be final (or effectively final in Java 8) so there is no confusion about whether changes to the variable will be seen (because they won't).
Note: Groovy does not have this rule, and a change to the local variable can mean a change to the value in the enclosing class, which is especially confusing if multiple threads are involved.
An example of a captured variable:
public void writeToDataBase(final Object toWrite) {
executor.submit(new Runnable() {
public void run() {
writeToDBNow(toWrite);
}
});
// if toWrite were mutable and you changed it now, what would happen !?
}
// after the method returns, toWrite no longer exists for this thread...

Here is a post describing it: http://www.devcodenote.com/2015/04/variable-capture-in-java.html
Here is a snippet from the post:
“It is imposed as a mandate by Java that if an inner class defined within a method references a local variable of that method, that local variable should be defined as final.”
This is because the function may complete execution and get removed from the process stack, with all the variables destroyed but it may be the case that objects of the inner class are still on the heap referencing a particular local variable of that function. To counter this, Java makes a copy of the local variable and gives that as a reference to the inner class. To maintain consistency between the 2 copies, the local variable is mandated to be “final” and non-modifiable.
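The situation described above, where the method has finished but the object still needs the local's value, can be reproduced in a few lines. This is a minimal sketch with hypothetical names; a lambda is used instead of an inner class, and the capture rule is the same.

```java
import java.util.function.Supplier;

// Hypothetical demo: the captured value survives after the method has returned.
class CaptureOutlivesMethod {
    static Supplier<String> greeter(String name) {
        // A local that is assigned once, so it is effectively final.
        String greeting = "Hello, " + name;
        // The lambda keeps its own copy of the value of 'greeting'.
        return () -> greeting;
    }
}
```

By the time get() is called on the returned supplier, the stack frame of greeter is gone, yet the value is still available because it was copied into the lambda object when it was created.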

A captured variable is one from the outside of your local class - one declared in the surrounding block. In some languages this is called a closure.
In the example from the Oracle Docs (simplified) the variable numberLength, declared outside of class PhoneNumber, is "captured".
final int numberLength = 10; // in JDK7 and earlier must be final...
class PhoneNumber {
// you can refer to numberLength here... it has been "captured"
}

Local variables in java

I went through the concepts of local variables and class variables.
But I got stuck on one doubt:
"Why can't we declare local variables as static?"
For example, suppose we have a play() function:
void play( )
{
static int i=5;
System.out.println(i);
}
It gives me an error in Eclipse: Illegal modifier for parameter i;
I have this doubt because of the following concepts I have read:
Variables inside a method: scope is local, i.e. within that method.
When a variable is declared as static, it is present for the entire class, i.e. not tied to a particular object.
Could anyone please help me clarify the concept?
Thanks.

Because the scope of the local variables is limited to the surrounding block. That's why they cannot be referred to (neither statically, nor non-statically), from other classes or methods.
Wikipedia says about static local variables (in C++ for example):
Static local variables are declared inside a function, just like automatic local variables. They have the same scope as normal local variables, differing only in "storage duration": whatever values the function puts into static local variables during one call will still be present when the function is called again.
That doesn't exist in Java. And in my opinion - for the better.

Java doesn't have static variables like C. Instead, since every method has a class (or instance of a class) associated with it, the persistent scoped variables are best stored at that level (e.g., as private or static private fields). The only real difference is that other methods in the same class can refer to them; since all those methods are constrained to a single file anyway, it's not a big problem in practice.

Static members (variables, functions, etc.) serve to allow callers of the class, whether they're within the class or outside of the class, to execute functions and utilize variables without referring to a specific instance of the class. Because of this, the concept of a "static local" doesn't make sense, as there would be no way for a caller outside of the function to refer to the variable (since it's local to that function).
There are some languages (VB.NET, for example), that have a concept of "static" local variables, though the term "static" is inconsistently used in this scenario; VB.NET static local variables are more like hidden instance variables, where subsequent calls on the same instance will have the previous value intact. For example
Public Class Foo
Public Sub Bar()
Static i As Integer
i = i + 1
Console.WriteLine(i)
End Sub
End Class
...
Dim f As New Foo()
Dim f2 As New Foo()
f.Bar() ' Prints "1"
f.Bar() ' Prints "2"
f2.Bar() ' Prints "1"
So, as you can see, the keyword "static" is not used in the conventional OO meaning here, as it's still specific to a particular instance of Foo.
Because this behavior can be confusing (or, at the very least, unintuitive), other languages like Java and C# are less flexible when it comes to variable declarations. Depending on how you want it to behave, you should declare your variable either as an instance variable or a static/class variable:
If you'd like the variable to exist beyond the scope of the function but be particular to a single instance of the class (like VB.NET does), then create an instance variable:
public class Foo
{
private int bar;
public void Bar()
{
bar++;
System.out.println(bar);
}
}
If you want it to be accessible to all instances of the class (or even without an instance), make it static:
public class Foo
{
private static int bar;
public static void Bar()
{
bar++;
System.out.println(bar);
}
}
(Note that I made Bar() static in the last example, but there is no reason that it has to be.)
