Using ANTLR to identify global variable declarations in a JavaScript file - java

I've been using the ANTLR-supplied ECMAScript grammar with the objective of identifying JavaScript global variables. An AST is produced and I'm now wondering what the best way of filtering out the global variable declarations is.
I'm interested in finding all of the outermost "variableDeclaration" tokens in my AST; how to actually do this is eluding me though. Here's my setup code so far:
String input = "var a, b; var c;";
CharStream cs = new ANTLRStringStream(input);
JavaScriptLexer lexer = new JavaScriptLexer(cs);
CommonTokenStream tokens = new CommonTokenStream();
tokens.setTokenSource(lexer);
JavaScriptParser parser = new JavaScriptParser(tokens);
program_return programReturn = parser.program();
Being new to ANTLR, can anyone offer any pointers?

I guess you're using this grammar.
Although that grammar suggests a proper AST is created, this is not the case. It uses some inline operators to exclude certain tokens from the parse-tree, but it never creates any roots for the tree, resulting in a completely flat parse tree. From this, you can't get all global vars in a reasonable way.
You'll need to adjust the grammar slightly:
Add the following under the options { ... } at the top of the grammar file:
tokens
{
VARIABLE;
FUNCTION;
}
Now replace the following rules: functionDeclaration, functionExpression and variableDeclaration with these:
functionDeclaration
: 'function' LT* Identifier LT* formalParameterList LT* functionBody
-> ^(FUNCTION Identifier formalParameterList functionBody)
;
functionExpression
: 'function' LT* Identifier? LT* formalParameterList LT* functionBody
-> ^(FUNCTION Identifier? formalParameterList functionBody)
;
variableDeclaration
: Identifier LT* initialiser?
-> ^(VARIABLE Identifier initialiser?)
;
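After editing the grammar, regenerate the lexer and parser before re-running your code. With the ANTLR 3 tool that is roughly the following (a sketch, assuming the grammar file is named JavaScript.g and you have the ANTLR 3 complete jar at hand):
java -jar antlr-3.5.2-complete.jar JavaScript.g
javac -cp antlr-3.5.2-complete.jar *.java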
Now a more suitable tree is generated. If you now parse the source:
var a = 1; function foo() { var b = 2; } var c = 3;
the following tree is generated:
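(The original answer displays an image of the AST at this point. Roughly, based on the rewrite rules above and leaving out the initialiser and function-body details, it looks like this:)
(root)
+-- VARIABLE a (= 1)
+-- FUNCTION foo (...) containing VARIABLE b (= 2)
+-- VARIABLE c (= 3)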
All you now have to do is iterate over the children of the root of your tree and when you stumble upon a VARIABLE token, you know it's a "global" since all other variables will be under FUNCTION nodes.
Here's how to do that:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String source = "var a = 1; function foo() { var b = 2; } var c = 3;";
        ANTLRStringStream in = new ANTLRStringStream(source);
        JavaScriptLexer lexer = new JavaScriptLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        JavaScriptParser parser = new JavaScriptParser(tokens);
        JavaScriptParser.program_return returnValue = parser.program();
        CommonTree tree = (CommonTree)returnValue.getTree();
        for (Object o : tree.getChildren()) {
            CommonTree child = (CommonTree)o;
            if (child.getType() == JavaScriptParser.VARIABLE) {
                System.out.println("Found a global var: " + child.getChild(0));
            }
        }
    }
}
which produces the following output:
Found a global var: a
Found a global var: c

Related

In antlr visitor pattern how to navigate from one method to another

I am a newbie to ANTLR. I want to know how to navigate from one parsed method to another, entering each method, and I want the implementation below done using ANTLR4. I have the functions written below.
Here is the GitHub link of the project: https://github.com/VIKRAMAS/AntlrNestedFunctionParser/tree/master
1. FUNCTION.add(Integer a,Integer b)
2. FUNCTION.concat(String a,String b)
3. FUNCTION.mul(Integer a,Integer b)
And I am storing the functions metadata like this.
Map<String,String> map=new HashMap<>();
map.put("FUNCTION.add","Integer:Integer,Integer");
map.put("FUNCTION.concat","String:String,String");
map.put("FUNCTION.mul","Integer:Integer,Integer");
Here, Integer:Integer,Integer means that Integer is the return type and the input parameters the function accepts are Integer,Integer.
If the input is something like this
FUNCTION.concat(Function.substring(String,Integer,Integer),String)
or
FUNCTION.concat(Function.substring("test",1,1),String)
Using the visitor implementation, I want to check whether the input is valid or not against the function metadata stored in the map.
Below is the lexer and parser that I'm using:
Lexer MyFunctionsLexer.g4:
lexer grammar MyFunctionsLexer;
FUNCTION: 'FUNCTION';
NAME: [A-Za-z0-9]+;
DOT: '.';
COMMA: ',';
L_BRACKET: '(';
R_BRACKET: ')';
Parser MyFunctionsParser.g4:
parser grammar MyFunctionsParser;
options {
tokenVocab=MyFunctionsLexer;
}
function : FUNCTION '.' NAME '('(function | argument (',' argument)*)')';
argument: (NAME | function);
WS : [ \t\r\n]+ -> skip;
I am using Antlr4.
Below is the implementation I'm using as per the suggested answer.
Visitor Implementation:
public class FunctionValidateVisitorImpl extends MyFunctionsParserBaseVisitor {
Map<String, String> map = new HashMap<String, String>();
public FunctionValidateVisitorImpl()
{
map.put("FUNCTION.add", "Integer:Integer,Integer");
map.put("FUNCTION.concat", "String:String,String");
map.put("FUNCTION.mul", "Integer:Integer,Integer");
map.put("FUNCTION.substring", "String:String,Integer,Integer");
}
@Override
public String visitFunctions(@NotNull MyFunctionsParser.FunctionsContext ctx) {
System.out.println("entered the visitFunctions::");
for (int i = 0; i < ctx.getChildCount(); ++i)
{
ParseTree c = ctx.getChild(i);
if (c.getText() == "<EOF>")
continue;
String top_level_result = visit(ctx.getChild(i));
System.out.println(top_level_result);
if (top_level_result == null)
{
System.out.println("Failed semantic analysis: "+ ctx.getChild(i).getText());
}
}
return null;
}
@Override
public String visitFunction( MyFunctionsParser.FunctionContext ctx) {
// Get function name and expected type information.
String name = ctx.getChild(2).getText();
String type=map.get("FUNCTION." + name);
if (type == null)
{
return null; // not declared in function table.
}
String result_type = type.split(":")[0];
String args_types = type.split(":")[1];
String[] expected_arg_type = args_types.split(",");
int j = 4;
ParseTree a = ctx.getChild(j);
if (a instanceof MyFunctionsParser.FunctionContext)
{
String v = visit(a);
if (v != result_type)
{
return null; // Handle type mismatch.
}
} else {
for (int i = j; i < ctx.getChildCount(); i += 2)
{
ParseTree parameter = ctx.getChild(i);
String v = visit(parameter);
if (v != expected_arg_type[(i - j)/2])
{
return null; // Handle type mismatch.
}
}
}
return result_type;
}
@Override
public String visitArgument(ArgumentContext ctx){
ParseTree c = ctx.getChild(0);
if (c instanceof TerminalNodeImpl)
{
// Unclear what this is supposed to parse:
// Mutate "1" to "Integer"?
// Mutate "Integer" to "String"?
// Or what?
return c.getText();
}
else
return visit(c);
}
}
Test class:
public class FunctionValidate {
public static void main(String[] args) {
String input = "FUNCTION.concat(FUNCTION.substring(String,Integer,Integer),String)";
ANTLRInputStream str = new ANTLRInputStream(input);
MyFunctionsLexer lexer = new MyFunctionsLexer(str);
CommonTokenStream tokens = new CommonTokenStream(lexer);
MyFunctionsParser parser = new MyFunctionsParser(tokens);
parser.removeErrorListeners(); // remove ConsoleErrorListener
parser.addErrorListener(new VerboseListener()); // add ours
FunctionsContext tree = parser.functions();
FunctionValidateVisitorImpl visitor = new FunctionValidateVisitorImpl();
visitor.visit(tree);
}
}
Lexer:
lexer grammar MyFunctionsLexer;
FUNCTION: 'FUNCTION';
NAME: [A-Za-z0-9]+;
DOT: '.';
COMMA: ',';
L_BRACKET: '(';
R_BRACKET: ')';
WS : [ \t\r\n]+ -> skip;
Parser:
parser grammar MyFunctionsParser;
options { tokenVocab=MyFunctionsLexer; }
functions : function* EOF;
function : FUNCTION '.' NAME '(' (function | argument (',' argument)*) ')';
argument: (NAME | function);
Verbose Listener:
public class VerboseListener extends BaseErrorListener {
@Override
public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine, String msg, RecognitionException e) {
List<String> stack = ((Parser)recognizer).getRuleInvocationStack();
Collections.reverse(stack);
throw new FunctionInvalidException("line "+line+":"+charPositionInLine+" at "+ offendingSymbol+": "+msg);
}
}
Output:
It is not entering the visitor implementation, as it never prints the System.out.println("entered the visitFunctions::"); statement. I am not able to walk through the child nodes using the visit method.
You have a version skew between your generated parser and the runtime. Further, you have a version skew in your generated .java files, as though you downloaded and ran two Antlr tool versions (4.4 and 4.7.2), once without the -visitor option, then again with it. The source for MyFunctionsParser.java is in AntlrNestedFunctionParser\FunctionValidator\target\generated-sources\antlr4\com\functionvalidate\validate. At the top of the file, it says
// Generated from MyFunctionsParser.g4 by ANTLR 4.4
The source for MyFunctionsParserVisitor.java is
// Generated from com\functionvalidate\validate\MyFunctionsParser.g4 by ANTLR 4.7.2
The runtime is 4.7.2, which you state in pom.xml in AntlrNestedFunctionParser\FunctionValidator. MyFunctionsLexer.tokens is defined in at least two locations; which one you are picking up is anyone's guess. I'm not familiar with the Antlr build rules associated with the pom.xml, but what was generated is a mess (which is why I wrote my own build rules and editor for Antlr for C#). Make sure you clean the target directory completely, generate fresh, up-to-date .java files, and confirm you are using the right Antlr runtime, 4.7.2.
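For example, a minimal sketch of the relevant pom.xml pieces (assuming the standard layout where the grammars live under src/main/antlr4, as the plugin expects) that pins the tool and the runtime to the same version and generates visitor classes:
<dependency>
    <groupId>org.antlr</groupId>
    <artifactId>antlr4-runtime</artifactId>
    <version>4.7.2</version>
</dependency>
...
<plugin>
    <groupId>org.antlr</groupId>
    <artifactId>antlr4-maven-plugin</artifactId>
    <version>4.7.2</version>
    <configuration>
        <visitor>true</visitor>
        <listener>true</listener>
    </configuration>
    <executions>
        <execution>
            <goals>
                <goal>antlr4</goal>
            </goals>
        </execution>
    </executions>
</plugin>
Then run mvn clean compile so no stale 4.4-generated files linger under target/.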

How to implement the visitor pattern for nested function

I am a newbie to ANTLR and I want the implementation below done using ANTLR4. I have the functions written below.
1. FUNCTION.add(Integer a,Integer b)
2. FUNCTION.concat(String a,String b)
3. FUNCTION.mul(Integer a,Integer b)
And I am storing the functions metadata like this.
Map<String,String> map=new HashMap<>();
map.put("FUNCTION.add","Integer:Integer,Integer");
map.put("FUNCTION.concat","String:String,String");
map.put("FUNCTION.mul","Integer:Integer,Integer");
Here, Integer:Integer,Integer means that Integer is the return type and the input parameters the function accepts are Integer,Integer.
If the input is something like this
FUNCTION.concat(Function.substring(String,Integer,Integer),String)
or
FUNCTION.concat(Function.substring("test",1,1),String)
Using the visitor implementation, I want to check whether the input is valid or not against the function metadata stored in the map.
Below is the lexer and parser that I'm using:
Lexer MyFunctionsLexer.g4:
lexer grammar MyFunctionsLexer;
FUNCTION: 'FUNCTION';
NAME: [A-Za-z0-9]+;
DOT: '.';
COMMA: ',';
L_BRACKET: '(';
R_BRACKET: ')';
Parser MyFunctionsParser.g4:
parser grammar MyFunctionsParser;
options {
tokenVocab=MyFunctionsLexer;
}
function : FUNCTION '.' NAME '('(function | argument (',' argument)*)')';
argument: (NAME | function);
WS : [ \t\r\n]+ -> skip;
I am using Antlr4.
Below is the implementation I'm using as per the suggested answer.
Visitor Implementation:
public class FunctionValidateVisitorImpl extends MyFunctionsParserBaseVisitor {
Map<String, String> map = new HashMap<String, String>();
public FunctionValidateVisitorImpl()
{
map.put("FUNCTION.add", "Integer:Integer,Integer");
map.put("FUNCTION.concat", "String:String,String");
map.put("FUNCTION.mul", "Integer:Integer,Integer");
map.put("FUNCTION.substring", "String:String,Integer,Integer");
}
@Override
public String visitFunctions(@NotNull MyFunctionsParser.FunctionsContext ctx) {
System.out.println("entered the visitFunctions::");
for (int i = 0; i < ctx.getChildCount(); ++i)
{
ParseTree c = ctx.getChild(i);
if (c.getText() == "<EOF>")
continue;
String top_level_result = visit(ctx.getChild(i));
System.out.println(top_level_result);
if (top_level_result == null)
{
System.out.println("Failed semantic analysis: "+ ctx.getChild(i).getText());
}
}
return null;
}
@Override
public String visitFunction( MyFunctionsParser.FunctionContext ctx) {
// Get function name and expected type information.
String name = ctx.getChild(2).getText();
String type=map.get("FUNCTION." + name);
if (type == null)
{
return null; // not declared in function table.
}
String result_type = type.split(":")[0];
String args_types = type.split(":")[1];
String[] expected_arg_type = args_types.split(",");
int j = 4;
ParseTree a = ctx.getChild(j);
if (a instanceof MyFunctionsParser.FunctionContext)
{
String v = visit(a);
if (v != result_type)
{
return null; // Handle type mismatch.
}
} else {
for (int i = j; i < ctx.getChildCount(); i += 2)
{
ParseTree parameter = ctx.getChild(i);
String v = visit(parameter);
if (v != expected_arg_type[(i - j)/2])
{
return null; // Handle type mismatch.
}
}
}
return result_type;
}
@Override
public String visitArgument(ArgumentContext ctx){
ParseTree c = ctx.getChild(0);
if (c instanceof TerminalNodeImpl)
{
// Unclear what this is supposed to parse:
// Mutate "1" to "Integer"?
// Mutate "Integer" to "String"?
// Or what?
return c.getText();
}
else
return visit(c);
}
}
Test class:
public class FunctionValidate {
public static void main(String[] args) {
String input = "FUNCTION.concat(FUNCTION.substring(String,Integer,Integer),String)";
ANTLRInputStream str = new ANTLRInputStream(input);
MyFunctionsLexer lexer = new MyFunctionsLexer(str);
CommonTokenStream tokens = new CommonTokenStream(lexer);
MyFunctionsParser parser = new MyFunctionsParser(tokens);
parser.removeErrorListeners(); // remove ConsoleErrorListener
parser.addErrorListener(new VerboseListener()); // add ours
FunctionsContext tree = parser.functions();
FunctionValidateVisitorImpl visitor = new FunctionValidateVisitorImpl();
visitor.visit(tree);
}
}
Lexer:
lexer grammar MyFunctionsLexer;
FUNCTION: 'FUNCTION';
NAME: [A-Za-z0-9]+;
DOT: '.';
COMMA: ',';
L_BRACKET: '(';
R_BRACKET: ')';
WS : [ \t\r\n]+ -> skip;
Parser:
parser grammar MyFunctionsParser;
options { tokenVocab=MyFunctionsLexer; }
functions : function* EOF;
function : FUNCTION '.' NAME '(' (function | argument (',' argument)*) ')';
argument: (NAME | function);
Verbose Listener:
public class VerboseListener extends BaseErrorListener {
@Override
public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine, String msg, RecognitionException e) {
List<String> stack = ((Parser)recognizer).getRuleInvocationStack();
Collections.reverse(stack);
throw new FunctionInvalidException("line "+line+":"+charPositionInLine+" at "+ offendingSymbol+": "+msg);
}
}
Output:
It is not entering the visitor implementation, as it never prints the System.out.println("entered the visitFunctions::"); statement.
Below is a solution in C#. This should give you an idea of how to proceed. You should be able to easily translate the code to Java.
For ease, I implemented the code using my extension AntlrVSIX for Visual Studio 2019 with NET Core C#. It makes life easier using a full IDE that supports the building of split lexer/parser grammars, debugging, and a plug-in that is suited for editing Antlr grammars.
There are several things to note with your grammar. First, your parser grammar isn't accepted by Antlr 4.7.2. The production "WS : [ \t\r\n]+ -> skip;" is a lexer rule; it can't go in a parser grammar. It has to go into the lexer grammar (or you define a combined grammar). Second, I personally wouldn't define lexer symbols like DOT and then use the RHS of the lexer symbol directly in the parser grammar, e.g., '.'. It's confusing, and I'm pretty sure there isn't an IDE or editor that would know how to go to the definition "DOT: '.';" in the lexer grammar if you positioned your cursor on the '.' in the parser grammar. I never understood why it's allowed in Antlr, but c'est la vie. I would instead use the lexer symbol you define. Third, I would consider augmenting the parser grammar in the usual way with EOF, e.g., "functions : function* EOF". But this is entirely up to you.
Now, on the problem statement, your example input contains an inconsistency. In the first case, "substring(String,Integer,Integer)", the input is in a meta-like description of substring(). In the second case, "substring(\"test\",1,1)", you are parsing code. The first case parses with your grammar, the second does not--there's no string literal lexer rule defined in your lexer grammar. It's unclear what you really want to parse.
Overall, I defined the visitor code over strings, i.e., each method returns a string representing the output type of the function or argument, e.g., "Integer" or "String", or null if there was an error (or you could throw an exception for static semantic errors). Then, using Visit() on each child in the parse tree node, check whether the resulting string is what is expected, and handle mismatches as you like.
One other thing to note. You can solve this problem via a visitor or listener class. The visitor class is useful for purely synthesized attributes. In this example solution, I return a string that represents the type of the function or arg up the associated parse tree, checking the value for each important child. The listener class is useful for L-attributed grammars--i.e., where you are passing attributes in a DFS-oriented manner, left to right at each node in the tree. For this example, you could use the listener class and only override the Exit() functions, but you would then need a Map/Dictionary to map a "context" into an attribute (string).
lexer grammar MyFunctionsLexer;
FUNCTION: 'FUNCTION';
NAME: [A-Za-z0-9]+;
DOT: '.';
COMMA: ',';
L_BRACKET: '(';
R_BRACKET: ')';
WS : [ \t\r\n]+ -> skip;
parser grammar MyFunctionsParser;
options { tokenVocab=MyFunctionsLexer; }
functions : function* EOF;
function : FUNCTION '.' NAME '(' (function | argument (',' argument)*) ')';
argument: (NAME | function);
using Antlr4.Runtime;
namespace AntlrConsole2
{
public class Program
{
static void Main(string[] args)
{
var input = @"FUNCTION.concat(FUNCTION.substring(String,Integer,Integer),String)";
var str = new AntlrInputStream(input);
var lexer = new MyFunctionsLexer(str);
var tokens = new CommonTokenStream(lexer);
var parser = new MyFunctionsParser(tokens);
var listener = new ErrorListener<IToken>();
parser.AddErrorListener(listener);
var tree = parser.functions();
if (listener.had_error)
{
System.Console.WriteLine("error in parse.");
}
else
{
System.Console.WriteLine("parse completed.");
}
var visitor = new Validate();
visitor.Visit(tree);
}
}
}
namespace AntlrConsole2
{
using System;
using Antlr4.Runtime.Misc;
using System.Collections.Generic;
class Validate : MyFunctionsParserBaseVisitor<string>
{
Dictionary<String, String> map = new Dictionary<String, String>();
public Validate()
{
map.Add("FUNCTION.add", "Integer:Integer,Integer");
map.Add("FUNCTION.concat", "String:String,String");
map.Add("FUNCTION.mul", "Integer:Integer,Integer");
map.Add("FUNCTION.substring", "String:String,Integer,Integer");
}
public override string VisitFunctions([NotNull] MyFunctionsParser.FunctionsContext context)
{
for (int i = 0; i < context.ChildCount; ++i)
{
var c = context.GetChild(i);
if (c.GetText() == "<EOF>")
continue;
var top_level_result = Visit(context.GetChild(i));
if (top_level_result == null)
{
System.Console.WriteLine("Failed semantic analysis: "
+ context.GetChild(i).GetText());
}
}
return null;
}
public override string VisitFunction(MyFunctionsParser.FunctionContext context)
{
// Get function name and expected type information.
var name = context.GetChild(2).GetText();
map.TryGetValue("FUNCTION." + name, out string type);
if (type == null)
{
return null; // not declared in function table.
}
string result_type = type.Split(":")[0];
string args_types = type.Split(":")[1];
string[] expected_arg_type = args_types.Split(",");
const int j = 4;
var a = context.GetChild(j);
if (a is MyFunctionsParser.FunctionContext)
{
var v = Visit(a);
if (v != result_type)
{
return null; // Handle type mismatch.
}
} else {
for (int i = j; i < context.ChildCount; i += 2)
{
var parameter = context.GetChild(i);
var v = Visit(parameter);
if (v != expected_arg_type[(i - j)/2])
{
return null; // Handle type mismatch.
}
}
}
return result_type;
}
public override string VisitArgument([NotNull] MyFunctionsParser.ArgumentContext context)
{
var c = context.GetChild(0);
if (c is Antlr4.Runtime.Tree.TerminalNodeImpl)
{
// Unclear what this is supposed to parse:
// Mutate "1" to "Integer"?
// Mutate "Integer" to "String"?
// Or what?
return c.GetText();
}
else
return Visit(c);
}
}
}
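As a rough Java counterpart of the listener alternative mentioned above (a sketch only, not part of the original answer; it assumes the generated base class MyFunctionsParserBaseListener and uses ANTLR's ParseTreeProperty to attach a type string to each node on exit):
import org.antlr.v4.runtime.tree.ParseTreeProperty;
import org.antlr.v4.runtime.tree.TerminalNode;
import java.util.HashMap;
import java.util.Map;

public class FunctionTypeListener extends MyFunctionsParserBaseListener {
    private final Map<String, String> signatures = new HashMap<>();
    private final ParseTreeProperty<String> types = new ParseTreeProperty<>();

    public FunctionTypeListener() {
        signatures.put("FUNCTION.concat", "String:String,String");
        signatures.put("FUNCTION.substring", "String:String,Integer,Integer");
    }

    @Override
    public void exitArgument(MyFunctionsParser.ArgumentContext ctx) {
        // A NAME argument is taken literally as its type; a nested function
        // passes up the type computed in exitFunction below.
        if (ctx.getChild(0) instanceof TerminalNode) {
            types.put(ctx, ctx.getText());
        } else {
            types.put(ctx, types.get(ctx.getChild(0)));
        }
    }

    @Override
    public void exitFunction(MyFunctionsParser.FunctionContext ctx) {
        String name = ctx.getChild(2).getText();
        String sig = signatures.get("FUNCTION." + name);
        if (sig == null) {
            return; // undeclared function: leave its type unset, i.e. invalid
        }
        String resultType = sig.split(":")[0];
        String[] expected = sig.split(":")[1].split(",");
        // Children at indices 4, 6, 8, ... are the arguments (or one nested function).
        for (int i = 4, arg = 0; i < ctx.getChildCount() - 1; i += 2, arg++) {
            String actual = types.get(ctx.getChild(i));
            if (arg >= expected.length || !expected[arg].equals(actual)) {
                return; // type mismatch: leave the type unset
            }
        }
        types.put(ctx, resultType);
    }
}
You can drive it with new ParseTreeWalker().walk(listener, tree) and treat a node with no recorded type as a failed validation; this sketch does not check for too few arguments.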

How can I detect if a user enters a string which does not follow my ANTLR grammar rules?

I am making a Computer Algebra System which will take an algebraic expression and simplify or differentiate it.
As you can see from the following code, the user input is taken, but if it is a string that does not conform to my grammar rules, the error
line 1:6 mismatched input '<EOF>' expecting {'(', INT, VAR}
occurs and the program continues running.
How would I catch the error and stop the program from running? Thank you in advance for any help.
Controller class:
public static void main(String[] args) throws IOException {
String userInput = "x*x*x+";
getAST(userInput);
}
public static AST getAST(String userInput) {
ParseTree tree = null;
ExpressionLexer lexer = null;
ANTLRInputStream input = new ANTLRInputStream(userInput);
try {
lexer = new ExpressionLexer(input);
}catch(Exception e) {
System.out.println("Incorrect grammar");
}
System.out.println("Lexer created");
CommonTokenStream tokens = new CommonTokenStream(lexer);
System.out.println("Tokens created");
ExpressionParser parser = new ExpressionParser(tokens);
System.out.println("Tokens parsed");
tree = parser.expr();
System.out.println("Tree created");
System.out.println(tree.toStringTree(parser)); // print LISP-style tree
Trees.inspect(tree, parser);
ParseTreeWalker walker = new ParseTreeWalker();
ExpressionListener listener = new buildAST();
walker.walk(listener, tree);
listener.printAST();
listener.extractExpression();
return new AST();
}
}
My Grammar:
grammar Expression;
@header {
package exprs;
}
@members {
// This method makes the parser stop running if it encounters
// invalid input and throw a RuntimeException.
public void reportErrorsAsExceptions() {
//removeErrorListeners();
addErrorListener(new ExceptionThrowingErrorListener());
}
private static class ExceptionThrowingErrorListener extends BaseErrorListener {
@Override
public void syntaxError(Recognizer<?, ?> recognizer,
Object offendingSymbol, int line, int charPositionInLine,
String msg, RecognitionException e) {
throw new RuntimeException(msg);
}
}
}
@rulecatch {
// ANTLR does not generate its normal rule try/catch
catch(RecognitionException e) {
throw e;
}
}
expr : left=expr op=('*'|'/'|'^') right=expr
| left=expr op=('+'|'-') right=expr
| '(' expr ')'
| atom
;
atom : INT|VAR;
INT : ('0'..'9')+ ;
VAR : ('a' .. 'z') | ('A' .. 'Z') | '_';
WS : [ \t\r\n]+ -> skip ;
A typical parse run with ANTLR4 consists of 2 stages:
A "quick'n dirty" run with SLL prediction mode that bails out on the first found syntax error.
A normal run using the LL prediction mode which tries to recover from parser errors. This second step only needs to be executed if there was an error in the first step.
The first step is a somewhat loose parse run which doesn't resolve certain ambiguities and hence can report an error which doesn't really exist (when resolved in LL mode). But the first step is faster and so delivers a quicker result for syntactically correct input. This (JS) code shows the setup:
this.parser.removeErrorListeners();
this.parser.addErrorListener(this.errorListener);
this.parser.errorHandler = new BailErrorStrategy();
this.parser.interpreter.setPredictionMode(PredictionMode.SLL);
try {
this.tree = this.parser.grammarSpec();
} catch (e) {
if (e instanceof ParseCancellationException) {
this.tokenStream.seek(0);
this.parser.reset();
this.parser.errorHandler = new DefaultErrorStrategy();
this.parser.interpreter.setPredictionMode(PredictionMode.LL);
this.tree = this.parser.grammarSpec();
} else {
throw e;
}
}
In order to avoid any recovery attempt for syntax errors in the first step you also have to set the BailErrorStrategy. This strategy simply throws a ParseCancellationException in case of a syntax error (similar to what you do in your code). You could add your own handling in the catch clause to ask the user for correct input and rerun the parse step.
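Translated to Java for the grammar in the question (a sketch only; the class and rule names ExpressionLexer, ExpressionParser and expr come from the question, everything else is the stock ANTLR 4 runtime API), the two-stage setup might look like this:
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.atn.PredictionMode;
import org.antlr.v4.runtime.misc.ParseCancellationException;
import org.antlr.v4.runtime.tree.ParseTree;

public class TwoStageParse {
    public static ParseTree parse(String userInput) {
        ExpressionLexer lexer = new ExpressionLexer(CharStreams.fromString(userInput));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        ExpressionParser parser = new ExpressionParser(tokens);

        // Stage 1: fast SLL prediction, bail out on the first syntax error.
        parser.removeErrorListeners();
        parser.setErrorHandler(new BailErrorStrategy());
        parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
        try {
            return parser.expr();
        } catch (ParseCancellationException e) {
            // Stage 2: full LL prediction with normal error reporting and recovery.
            tokens.seek(0);
            parser.reset();
            parser.addErrorListener(ConsoleErrorListener.INSTANCE);
            parser.setErrorHandler(new DefaultErrorStrategy());
            parser.getInterpreter().setPredictionMode(PredictionMode.LL);
            return parser.expr();
        }
    }
}
If you want the program to stop on invalid input rather than recover, keep the exception-throwing error listener from the question's @members block on the stage-2 parser and let its RuntimeException propagate.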

Accessing string template rule names from ANTLR base listener

Working on a pretty printer. Based on my understanding of ANTLR and StringTemplate so far, if I want to match all my grammar rules to templates and apply the template each time the grammar rule is invoked, I can create my templates with names matching my grammar rules.
[Side question: Is this how I should approach it? It seems like ANTLR should be doing the work of matching the parsed text to the output templates. My job will be to make sure the parser rules and templates are complete/correct.]
I think ANTLR 3 allowed directly setting templates inside of the ANTLR grammar, but ANTLR 4 seems to have moved away from that.
Based on the above assumptions, it looks like the MyGrammarBaseListener class that ANTLR generates is going to be doing all the work.
I've been able to collect the names of the rules invoked while parsing the text input by converting this example to ANTLR 4. I ended up with this for my enterEveryRule():
@Override public void enterEveryRule(ParserRuleContext ctx) {
if (builder.length() > 0) {
builder.append(' ');
}
if (ctx.getChildCount() > 0) {
builder.append('(');
}
int ruleIndex = ctx.getRuleIndex();
String ruleName;
if (ruleIndex >= 0 && ruleIndex < ruleNames.length) {
ruleName = ruleNames[ruleIndex];
System.out.println(ruleName); // this part works as intended
}
else {
ruleName = Integer.toString(ruleIndex);
}
builder.append(ruleName);
// CONFUSION HERE:
// get template names (looking through the API to figure out how to do this)
Set<String> templates = (MyTemplates.stg).getTemplateNames()
// or String[] for return value? Java stuff
// for each ruleName in ruleNames
// if (ruleName == templateName)
// run template using rule children as parameters
// write pretty-printed version to file
}
The linked example applies the changes to create the text output in exitEveryRule() so I'm not sure where to actually implement my template-matching algorithm. I'll experiment with both enter and exit to see what works best.
My main question is: How do I access the template names in MyTemplates.stg? What do I have to import, etc.?
(I'll probably be back to ask about matching up rule children to template parameters in a different question...)
The following demonstrates a simple way of dynamically accessing and rendering named StringTemplates. The intent is to build the varMap values in the listener (or visitor) in its corresponding context, keyed by parameter name, and call the context-dependent named template to incrementally render the content of the template.
public class Render {
private static final String templateDir = "some/path/to/templates";
private STGroupFile blocksGroup;
private STGroupFile stmtGroup;
public Render() {
blocksGroup = new STGroupFile(Strings.concatAsClassPath(templateDir, "Blocks.stg"));
stmtGroup = new STGroupFile(Strings.concatAsClassPath(templateDir, "Statements.stg"));
}
public String gen(GenType type, String name) {
return gen(type, name, null);
}
/**
* type is an enum, identifying the group template
* name is the template name within the group
* varMap contains the named values to be passed to the template
*/
public String gen(GenType type, String name, Map<String, Object> varMap) {
Log.debug(this, name);
STGroupFile stf = null;
switch (type) {
case BLOCK:
stf = blocksGroup;
break;
case STMT:
stf = stmtGroup;
break;
}
ST st = stf.getInstanceOf(name);
if (varMap != null) {
for (String varName : varMap.keySet()) {
try {
st.add(varName, varMap.get(varName));
} catch (NullPointerException e) {
Log.error(this, "Error adding attribute: " + name + ":" + varName + " [" + e.getMessage() + "]");
}
}
}
return st.render();
}
}
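To the direct question of matching rule names against the templates in MyTemplates.stg: a minimal sketch (the group file name comes from the question; isDefined() and getInstanceOf() are standard ST4 API) that you could call from enterEveryRule()/exitEveryRule() might look like this:
import org.stringtemplate.v4.ST;
import org.stringtemplate.v4.STGroup;
import org.stringtemplate.v4.STGroupFile;

public class TemplateLookup {
    private final STGroup group = new STGroupFile("MyTemplates.stg");

    /** Renders the template named after the rule, or returns null if none is defined. */
    public String renderForRule(String ruleName) {
        if (!group.isDefined(ruleName)) {
            return null; // no template matches this grammar rule
        }
        ST st = group.getInstanceOf(ruleName);
        // st.add("children", ...); // attribute names depend on your templates
        return st.render();
    }
}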

Remove LineComment in Java AST

I'm trying to remove a LineComment from a Java file via the AST. I read the document from a source file, create an AST parser (AST.JLS3), and afterwards create a CompilationUnit and an ASTRewrite instance.
doc = new Document( doctext );
parser = ASTParser.newParser( AST.JLS3 );
parser.setSource( doc.get().toCharArray() );
cu = (CompilationUnit) parser.createAST( null );
astRewrite = ASTRewrite.create( cu.getAST() );
Nothing special so far; I'm able to add and remove fields and so on. Now I'm trying to remove comments from the unit with the following code:
@SuppressWarnings( "unchecked" )
final List<Comment> comments = (List<Comment>) cu.getCommentList();
final Iterator<Comment> commentIter = comments.iterator();
while ( commentIter.hasNext() ) {
final Comment curComment = commentIter.next();
if ( curComment.isLineComment() ) {
final LineComment lineComment = (LineComment) curComment;
lineComment.accept( new CommentCopyrightFieldVisitor( cu, document.get(), astRewrite ) );
}
}
Here's the visitor that should perform the action and remove the comment.
public class CommentFieldVisitor extends ASTVisitor {
final CompilationUnit cu;
final String sourceCode;
final ASTRewrite astRewrite;
public CommentFieldVisitor( final CompilationUnit cu, final String sourceCode, final ASTRewrite astRewrite ) {
this.cu = cu;
this.sourceCode = sourceCode;
this.astRewrite = astRewrite;
}
@Override
public boolean visit( final LineComment commentNode ) {
int start = commentNode.getStartPosition();
int end = start + commentNode.getLength();
final String comment = sourceCode.substring( start, end );
final String fieldComment = Config.INSTANCE.getTargetFieldComment();
if ( comment != null && comment.equalsIgnoreCase( fieldComment ) ) {
System.out.println( "REMOVE COMMENT" );
assert astRewrite != null : "ERROR: AST Rewriter is null";
astRewrite.remove( commentNode, null );
}
return false;
}
}
I iterate over all comments in the compilation unit and create a visitor for every comment in the list. This visitor checks whether the content of the comment equals a preconfigured string. If it does, it should be removed. Though if I call
astRewrite.remove( commentNode, null );
I always get an NPE from inside the remove method. astRewrite and commentNode are defined (because the remove code is reached).
Does anyone have an idea what I might be doing wrong? Or another approach for how to remove such a comment via the AST?
I managed it via a workaround which uses Comment.getStartPosition() and Comment.getLength(). I use these methods to extract the comments from my source code file and replace them with "". After that I need to re-create the AST tree from the modified source code. This is far from perfect, but I didn't find an alternative solution.
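A minimal sketch of that workaround (assuming the JLS3 parser setup from the question; the comments are deleted back to front so that earlier offsets stay valid):
import java.util.List;
import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTParser;
import org.eclipse.jdt.core.dom.Comment;
import org.eclipse.jdt.core.dom.CompilationUnit;

public class StripLineComments {
    @SuppressWarnings("unchecked")
    public static CompilationUnit strip(String source) {
        ASTParser parser = ASTParser.newParser(AST.JLS3);
        parser.setSource(source.toCharArray());
        CompilationUnit cu = (CompilationUnit) parser.createAST(null);

        StringBuilder stripped = new StringBuilder(source);
        List<Comment> comments = cu.getCommentList();
        // Walk backwards so deletions don't shift the positions of earlier comments.
        for (int i = comments.size() - 1; i >= 0; i--) {
            Comment c = comments.get(i);
            if (c.isLineComment()) {
                stripped.delete(c.getStartPosition(), c.getStartPosition() + c.getLength());
            }
        }

        // Re-create the AST from the modified source.
        ASTParser newParser = ASTParser.newParser(AST.JLS3);
        newParser.setSource(stripped.toString().toCharArray());
        return (CompilationUnit) newParser.createAST(null);
    }
}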
A tiny bit late here, but I'm sure the answer will help other people.
I can confirm the NPE with the remove method when it lacks the required context of an ICompilationUnit to execute. This results in:
REMOVE COMMENT
Exception in thread "main" java.lang.NullPointerException: Cannot invoke "org.eclipse.jdt.core.dom.StructuralPropertyDescriptor.isChildListProperty()" because "property" is null
at org.eclipse.jdt.core.dom.rewrite.ASTRewrite.remove(ASTRewrite.java:398)
Here's how I do when I don't have such context:
String sourceCode = "/* block comment */ public class Hello {} // line comment";
Document doc = new Document(sourceCode);
ASTParser parser = ASTParser.newParser(AST.getJLSLatest());
parser.setSource(doc.get().toCharArray());
CompilationUnit cu = (CompilationUnit) parser.createAST(null);
ASTRewrite astRewrite = ASTRewrite.create(cu.getAST());
TextEdit edits = astRewrite.rewriteAST(doc, null);
final List<Comment> comments = cu.getCommentList();
List<TextEdit> textEdits = new ArrayList<>();
for (Comment curComment : comments) {
if (curComment.isLineComment()) {
final LineComment lineComment = (LineComment) curComment;
int commentStart = lineComment.getStartPosition();
int commentLength = lineComment.getLength();
int commentEnd = commentStart + commentLength;
String comment = sourceCode.substring(commentStart, commentEnd);
if (comment != null && comment.equalsIgnoreCase("// line comment")) {
textEdits.add(new DeleteEdit(commentStart, commentLength));
}
}
}
edits.addChildren(textEdits.toArray(TextEdit[]::new));
try {
edits.apply(doc);
System.out.println(doc.get());
} catch (MalformedTreeException | BadLocationException e) {
e.printStackTrace();
}
See also this project https://github.com/JnRouvignac/AutoRefactor and especially these 2 classes below:
ASTCommentRewriter.java for the solution above with the list of DeleteEdit
CommentsCleanUp.java for the usage of ASTRewrite.remove()
