AST how to deal with empty nodes


I am building an expression evaluator using Java, JFlex (lexer generator), and Jacc (parser generator). I need to:
generate the lexer
generate the parser
generate the AST
display the AST graph
evaluate expression
I was able to create the lexer, the parser, and the AST. Now I am trying to draw the AST graph using the visitor pattern, and this exposed a problem with my generated AST. My calculator needs to handle parentheses, and each pair creates an empty node in my AST (which, I suppose, makes it a parse tree rather than an AST). Here is the relevant part of my grammar:
Calc : /* empty */
| AddExpr { ast = new Calc($1); }
;
AddExpr : ModExpr
| AddExpr '+' ModExpr { $$ = new AddExpr($1, $3, "+"); }
| AddExpr '-' ModExpr { $$ = new AddExpr($1, $3, "-"); }
;
ModExpr : IntDivExpr
| ModExpr MOD IntDivExpr { $$ = new ModExpr($1, $3); }
;
IntDivExpr : MultExpr
| IntDivExpr DIV MultExpr { $$ = new IntDivExpr($1, $3); }
;
MultExpr : UnaryExpr
| MultExpr '*' UnaryExpr { $$ = new MultExpr($1, $3, "*"); }
| MultExpr '/' UnaryExpr { $$ = new MultExpr($1, $3, "/"); }
;
UnaryExpr : ExpExpr
| '-' UnaryExpr { $$ = new UnaryExpr($2, "-"); }
| '+' UnaryExpr { $$ = new UnaryExpr($2, "+"); }
;
ExpExpr : Value
| ExpExpr '^' Value { $$ = new ExpExpr($1, $3); }
;
Value : DoubleLiteral
| '(' AddExpr ')' { $$ = new Value($2); }
;
DoubleLiteral : DOUBLE { $$ = $1; }
;
Here is an example expression:
1*(2+3)/(4-5)*((((6))))
and the resulting image:
This leaves me with Value nodes for each pair of parentheses. I have a few ideas on how to handle this, but I am not sure how to proceed:
Try to handle this in my grammar (not sure how as I am not allowed to use precedence directives)
Handle this in my evaluator

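If you don't want Value nodes, then just replace the action { $$ = new Value($2); } with { $$ = $2; }. The parenthesized alternative then passes the inner expression's node through unchanged instead of wrapping it.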
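As a rough illustration of why the pass-through action is enough, here is a minimal Java sketch. The node classes (Node, Num, BinOp) and the paren helper are hypothetical stand-ins, since the question's own AST classes are not shown; paren plays the role of { $$ = $2; }.

```java
// Hypothetical node types; the real AddExpr, MultExpr, etc. from the question are not shown.
interface Node {
    double eval();
}

final class Num implements Node {
    private final double value;

    Num(double value) {
        this.value = value;
    }

    public double eval() {
        return value;
    }
}

final class BinOp implements Node {
    private final Node left;
    private final Node right;
    private final char op;

    BinOp(Node left, Node right, char op) {
        this.left = left;
        this.right = right;
        this.op = op;
    }

    public double eval() {
        double l = left.eval();
        double r = right.eval();
        switch (op) {
            case '+': return l + r;
            case '-': return l - r;
            case '*': return l * r;
            case '/': return l / r;
            default: throw new IllegalStateException("unknown operator: " + op);
        }
    }
}

final class ParenDemo {
    // Mirrors the action { $$ = $2; }: parentheses return the inner node, no wrapper is created.
    static Node paren(Node inner) {
        return inner;
    }

    // Builds the tree for 1*(2+3)/(4-5)*((((6)))) by hand, in the order the parser would reduce it.
    static Node sample() {
        Node sum = paren(new BinOp(new Num(2), new Num(3), '+'));
        Node diff = paren(new BinOp(new Num(4), new Num(5), '-'));
        Node six = paren(paren(paren(paren(new Num(6)))));
        Node product = new BinOp(new Num(1), sum, '*');
        Node quotient = new BinOp(product, diff, '/');
        return new BinOp(quotient, six, '*');
    }

    public static void main(String[] args) {
        System.out.println(sample().eval()); // prints -30.0
    }
}
```

Grouping is already encoded by the shape of the tree, so the evaluator and the graph visitor never need to know that parentheses were present in the input.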

Related

Antlr3 grammar generates parsing error on encountering the Pound char

ANTLR 3 generates an error when it encounters the pound character ("£"), which in French takes the place of the hash character ("#") of English, even though the Unicode values of the three special characters @, #, and $ are specified in the lexer rule.
FYI: The Unicode value of Pound char (of the French language) = The Unicode value of Hash char (of ENGLISH language).
The lexer/parser rules:
grammar SimpleCalc;
options
{
k = 8;
language = Java;
//filter = true;
}
tokens {
PLUS = '+' ;
MINUS = '-' ;
MULT = '*' ;
DIV = '/' ;
}
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
expr : n1=NUMBER ( exp = ( PLUS | MINUS ) n2=NUMBER )*
{
if ($exp.text.equals("+"))
System.out.println("Plus Result = " + $n1.text + $n2.text);
else
System.out.println("Minus Result = " + $n1.text + $n2.text);
}
;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
NUMBER : (DIGIT)+ ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
fragment DIGIT : '0'..'9' | '£' | ('\u0040' | '\u0023' | '\u0024');
The text file is also read as UTF-8:
public static void main(String[] args) throws Exception
{
try
{
args = new String[1];
args[0] = new String("antlr_test.txt");
SimpleCalcLexer lex = new SimpleCalcLexer(new ANTLRFileStream(args[0], "UTF-8"));
CommonTokenStream tokens = new CommonTokenStream(lex);
SimpleCalcParser parser = new SimpleCalcParser(tokens);
parser.expr();
//System.out.println(tokens);
}
catch (Exception e)
{
e.printStackTrace();
}
}
The input file has only one line:
£3 + 4£
the error is:
antlr_test.txt line 1:1 no viable alternative at character '£'
antlr_test.txt line 1:7 no viable alternative at character '£'
What is wrong with my approach?
or did I miss something?

I cannot reproduce what you describe. When I test your grammar without modifications, I get a NumberFormatException, which is expected, because Integer.parseInt("£3") cannot succeed.
When I change your embedded code into this:
{
if ($exp.text.equals("+"))
System.out.println("Result = " + (Integer.parseInt($n1.text.replaceAll("\\D", "")) + Integer.parseInt($n2.text.replaceAll("\\D", ""))));
else
System.out.println("Result = " + (Integer.parseInt($n1.text.replaceAll("\\D", "")) - Integer.parseInt($n2.text.replaceAll("\\D", ""))));
}
and regenerate lexer and parser classes (something you might not have done) and rerun the driver code, I get the following output:
Result = 7
EDIT
Perhaps the pound sign in the grammar is the issue? What if you try:
fragment DIGIT : '0'..'9' | '\u00A3' | ('\u0040' | '\u0023' | '\u0024');
instead of:
fragment DIGIT : '0'..'9' | '£' | ('\u0040' | '\u0023' | '\u0024');
?
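For reference, the pound sign and the hash sign are different code points, which a few lines of Java can confirm. The sketch below is only an illustration (the class name PoundSignCheck is made up). It also shows what the UTF-8 bytes of "£" look like when decoded as ISO-8859-1. Whether such an encoding mismatch on the grammar file is the actual cause here is an assumption, but it is the kind of problem the '\u00A3' escape sidesteps.

```java
import java.nio.charset.StandardCharsets;

final class PoundSignCheck {
    // Written as an escape so the value does not depend on how this source file is encoded.
    static final char POUND = '\u00A3';

    // Formats a character as its Unicode code point, e.g. U+0023.
    static String describe(char c) {
        return String.format("U+%04X", (int) c);
    }

    // Number of bytes the character occupies when encoded as UTF-8.
    static int utf8Length(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.UTF_8).length;
    }

    // Encodes as UTF-8, then decodes with the wrong charset to show the resulting garbling.
    static String decodeUtf8AsLatin1(char c) {
        byte[] bytes = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(describe(POUND));    // prints U+00A3
        System.out.println(describe('#'));      // prints U+0023
        System.out.println(utf8Length(POUND));  // prints 2
        System.out.println(utf8Length('#'));    // prints 1
        System.out.println(decodeUtf8AsLatin1(POUND).length()); // prints 2
    }
}
```

If the grammar file and the tool disagree about the encoding, a literal '£' in the grammar can end up as a different character in the generated lexer, while the escape is plain ASCII and unaffected.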

How to get the declared variables ANTLR

I have the following parser rule
study: 'study' '(' ( assign* | ( assign (',' assign)*) ) ')' NEWLINE;
assign: ID '=' (INT | DATA );
INT : [0-9]+ ;
DATA : '"' ID '"' | '"' INT '"';
ID : [a-zA-Z]+ ;
My problem now is how to retrieve the variables defined in the study from the enterStudy method:
@Override
public void enterStudy(StudyParser.StudyContext ctx) {
// get the declared variables
// study(hello = "hello",world = "world")
// study(hello = "hello",world = "world",name = "name")
System.out.println("enterStudy");
}

Add the following snippet to your grammar:
@members {
public final java.util.List<java.util.Map.Entry<String, String>> parameters = new java.util.ArrayList<>();
}
Modify your assign rule:
assign: name=ID '=' value=(INT | DATA ) {
parameters.add(new java.util.AbstractMap.SimpleImmutableEntry<>($name.text, $value.text));
};
Now you can use the parameters field of your StudyParser instance to access the required information:
StudyParser parser = ...;
parser.study();
System.out.println(parser.parameters);
Also please note that your grammar is probably slightly wrong: the assign* alternative accepts assignments without separating commas, so it allows the following input: study(x=1y=2).
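Once the parameters list is filled, it can be post-processed in ordinary Java. The following is only a sketch: StudyParametersSketch and toMap are hypothetical names, and it assumes DATA values arrive with their surrounding double quotes, as the DATA rule suggests.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StudyParametersSketch {
    // Turns the collected entries into an ordered map, stripping the quotes around DATA values.
    static Map<String, String> toMap(List<Map.Entry<String, String>> parameters) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : parameters) {
            String value = entry.getValue();
            boolean quoted = value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"");
            result.put(entry.getKey(), quoted ? value.substring(1, value.length() - 1) : value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for parser.parameters after parsing study(hello = "hello",count = 42).
        List<Map.Entry<String, String>> parameters = new ArrayList<>();
        parameters.add(new AbstractMap.SimpleImmutableEntry<>("hello", "\"hello\""));
        parameters.add(new AbstractMap.SimpleImmutableEntry<>("count", "42"));
        System.out.println(toMap(parameters)); // prints {hello=hello, count=42}
    }
}
```

A LinkedHashMap keeps the declaration order; if the same name is assigned twice, the later value wins in this sketch.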

Antlr parser rule fails to match either of specified lexer rules

I have a small work-in-progress Antlr grammar that looks like:
filterExpression returns [ActivityPredicate pred]
: NAME OPERATOR (PACE | NUMBER) {
if ($PACE != null) {
$pred = new SingleActivityPredicate($NAME.text, Operator.fromCharacter($OPERATOR.text), $PACE.text);
} else {
$pred = new SingleActivityPredicate($NAME.text, Operator.fromCharacter($OPERATOR.text), $NUMBER.text);
}
};
OPERATOR: ('>' | '<' | '=') ;
NAME: ('A'..'Z' | 'a'..'z')+ ;
NUMBER: ('0'..'9')+ ('.' ('0'..'9')+)? ;
PACE: ('0'..'9')('0'..'9')? ':' ('0'..'5')('0'..'9');
WS: (' ' | '\t' | '\r'| '\n')+ -> skip;
Hoping to parse things like:
distance = 4 or pace < 8:30
However, parsing either of those inputs leaves both PACE and NUMBER null.
Dropping the alternative and just picking PACE works fine (it also works fine the other way, opting for NUMBER):
filterExpression returns [ActivityPredicate pred]
: NAME OPERATOR PACE { ... };
Why is it that when I provide the option, they're both null?

Try this.
filterExpression returns [ActivityPredicate pred]
: n=NAME o=OPERATOR (p=PACE | i=NUMBER) {
if ($p != null) {
$pred = new SingleActivityPredicate(
$n.text, Operator.fromCharacter($o.text), $p.text);
} else {
$pred = new SingleActivityPredicate(
$n.text, Operator.fromCharacter($o.text), $i.text);
}
};

ANTLR : AST eval problems

Hello,
I would like to evaluate an AST that I generated.
I wrote a grammar that generates an AST, and now I'm trying to write the tree grammar to evaluate this tree.
Here's my grammar :
tree grammar XHTML2CSVTree;
options {
tokenVocab=XHTML2CSV;
ASTLabelType=CommonTree;
}
@members {
// variables and methods to be included in the java file generated
}
/*------------------------------------------------------------------
* TREE RULES
*------------------------------------------------------------------*/
// example
tableau returns [String csv]
: ^(TABLEAU {String retour="";}(l=ligne{retour += $l.csv;})* {System.out.println(retour);})
;
ligne returns [String csv]
: ^(LIGNE {String ret="";}(c=cellule{ret += $c.csv;})+)
;
cellule returns [String csv]
: ^(CELLULE s=CHAINE){ $csv = $s.text;}
;
And here's the grammar building the AST :
grammar XHTML2CSV;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
CELLULE;
LIGNE;
TABLEAU;
CELLULEG = '<td>'; // simple lexemes
CELLULED = '</td>';
DEBUTCOL = '<tr>';
FINCOL = '</tr>';
DTAB = '<table';
FTAB = '>';
FINTAB = '</table>';
// anonymous tokens (useful to give meaningful names to AST labels)
// simple lexemes
}
@members {
// variables and methods to be included in the java file generated
}
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
tableau
: DTAB STAB* FTAB ligne* FINTAB -> ^(TABLEAU ligne*)
;
ligne
: DEBUTCOL cellule+ FINCOL -> ^(LIGNE cellule+)
;
cellule
: CELLULEG CHAINE CELLULED -> ^(CELLULE CHAINE)
;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
STAB
: ' '.*'=\"'.*'\"'
;
WS
: ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ {$channel = HIDDEN;}
; // skip white spaces
CHAINE : (~('\"' | ',' | '\n' | '<' | '>'))+
;
// complex lexemes
XHTML2CSV.g works, and I can see the generated AST in ANTLRWorks,
but I cannot walk this AST to generate CSV output.
I get these errors:
XHTML2CSVTree.java:144: ';' expected
match(input, Token.DOWN, null);
^
XHTML2CSVTree.java:144: not a statement
match(input, Token.DOWN, null);
^
XHTML2CSVTree.java:144: ';' expected
match(input, Token.DOWN, null);
^
XHTML2CSVTree.java:144: not a statement
match(input, Token.DOWN, null);
^
XHTML2CSVTree.java:144: ';' expected
match(input, Token.DOWN, null);
^
5 errors
If someone could help me,
Thanks.
eo
Edit :
My main class looks like :
import org.antlr.runtime.ANTLRFileStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.tree.CommonTreeNodeStream;
import org.antlr.runtime.tree.Tree;
public class Main {
public static void main(String args[]) throws Exception {
try {
XHTML2CSVLexer lex = new XHTML2CSVLexer(new ANTLRFileStream(args[0])); // create lexer to read the file specified from command line (i.e., first argument, e.g., java Main test1.xhtml)
CommonTokenStream tokens = new CommonTokenStream(lex); // transform it into a token stream
XHTML2CSVParser parser = new XHTML2CSVParser(tokens); // create the parser that reads from the token stream
Tree t = (Tree) parser.cellule().tree; // (try to) parse a given rule specified in the parser file, e.g., my_main_rule
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t); // transform it into a common data structure readable by the tree pattern
nodes.setTokenStream(tokens); // declare which token to use (i.e., labels of the nodes defined in the parser, mainly anonymous tokens)
XHTML2CSVTree tparser = new XHTML2CSVTree(nodes); // instantiate the tree pattern
System.out.println(tparser.cellule()); // apply patterns
} catch (Exception e) {
e.printStackTrace();
}
}
}

The ligne rule in your tree grammar:
ligne returns [String csv]
: ^(LIGNE {Sting ret="";r}(c=cellule{ret += $c.csv;})+)
; //       ^            ^
  //       |            |
  //   problem 1    problem 2
has 2 problems:
it contains Sting where it should be String;
there's a trailing r that is messing up your custom Java code.
It should be:
ligne returns [String csv]
: ^(LIGNE {String ret="";}(c=cellule{ret += $c.csv;})+)
;
EDIT
If I generate a lexer and parser (1), generate a tree walker (2), compile all .java source files (3) and run the Main class (4):
java -cp antlr-3.3.jar org.antlr.Tool XHTML2CSV.g
java -cp antlr-3.3.jar org.antlr.Tool XHTML2CSVTree.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main test.txt
the following gets printed to the console:
table data
where the file test.txt contains:
<td>table data</td>
So I don't see any problem. Perhaps you're trying to parse a <table>? This would go wrong since both your parser and tree-walker are invoking the cellule rule, not the tableau rule.

How to handle escape sequences in string literals in ANTLR 3?

I've been looking through the ANTLR v3 documentation (and my trusty copy of "The Definitive ANTLR reference"), and I can't seem to find a clean way to implement escape sequences in string literals (I'm currently using the Java target). I had hoped to be able to do something like:
fragment
ESCAPE_SEQUENCE
: '\\' '\'' { setText("'"); }
;
STRING
: '\'' (ESCAPE_SEQUENCE | ~('\'' | '\\'))* '\''
{
// strip the quotes from the resulting token
setText(getText().substring(1, getText().length() - 1));
}
;
For example, I would want the input token "'Foo\'s House'" to become the String "Foo's House".
Unfortunately, the setText(...) call in the ESCAPE_SEQUENCE fragment sets the text for the entire STRING token, which is obviously not what I want.
Is there a way to implement this grammar without adding a method to go back through the resulting string and manually replace escape sequences (e.g., with something like setText(escapeString(getText())) in the STRING rule)?

Here is how I accomplished this in the JSON parser I wrote.
STRING
@init{StringBuilder lBuf = new StringBuilder();}
:
'"'
( escaped=ESC {lBuf.append(getText());} |
normal=~('"'|'\\'|'\n'|'\r') {lBuf.appendCodePoint(normal);} )*
'"'
{setText(lBuf.toString());}
;
fragment
ESC
: '\\'
( 'n' {setText("\n");}
| 'r' {setText("\r");}
| 't' {setText("\t");}
| 'b' {setText("\b");}
| 'f' {setText("\f");}
| '"' {setText("\"");}
| '\'' {setText("\'");}
| '/' {setText("/");}
| '\\' {setText("\\");}
| ('u')+ i=HEX_DIGIT j=HEX_DIGIT k=HEX_DIGIT l=HEX_DIGIT
{setText(ParserUtil.hexToChar(i.getText(),j.getText(),
k.getText(),l.getText()));}
)
;
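The ParserUtil.hexToChar helper is not shown in the answer. A minimal sketch of what such a helper might look like follows; the class name ParserUtilSketch and the String return type are assumptions, chosen so that the result can be passed straight to setText.

```java
final class ParserUtilSketch {
    // Combines four hex digits (the text of four HEX_DIGIT matches) into the one character they encode.
    static String hexToChar(String a, String b, String c, String d) {
        int codeUnit = Integer.parseInt(a + b + c + d, 16);
        return String.valueOf((char) codeUnit);
    }

    public static void main(String[] args) {
        System.out.println(hexToChar("0", "0", "4", "1")); // prints A
    }
}
```

Because the lexer only hands over text that already matched HEX_DIGIT, the sketch does no validation of its own; Integer.parseInt throws a NumberFormatException if it is called with anything else.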

For ANTLR 4 with the Java target and a standard escaped-string grammar, I used the CharSupport utility class, which ships with ANTLR, to translate the string:
STRING : '"'
( ESC
| ~('"'|'\\'|'\n'|'\r')
)*
'"' {
setText(
org.antlr.v4.misc.CharSupport.getStringFromGrammarStringLiteral(
getText()
)
);
}
;
As I saw in the V4 documentation and confirmed by experiment, @init is no longer supported in lexer rules!

Another (possibly more efficient) alternative is to use rule arguments:
STRING
@init { final StringBuilder buf = new StringBuilder(); }
:
'"'
(
ESCAPE[buf]
| i = ~( '\\' | '"' ) { buf.appendCodePoint(i); }
)*
'"'
{ setText(buf.toString()); };
fragment ESCAPE[StringBuilder buf] :
'\\'
( 't' { buf.append('\t'); }
| 'n' { buf.append('\n'); }
| 'r' { buf.append('\r'); }
| '"' { buf.append('\"'); }
| '\\' { buf.append('\\'); }
| 'u' a = HEX_DIGIT b = HEX_DIGIT c = HEX_DIGIT d = HEX_DIGIT { buf.append(ParserUtil.hexChar(a, b, c, d)); }
);
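To try out the buffer logic without generating a lexer, the same idea can be sketched in plain Java. This is an illustration under assumptions, not the answer's code: StringLiteralUnescaper is a made-up name, and it handles only the escapes listed in the ESCAPE rule above (t, n, r, a double quote, a backslash, and u followed by four hex digits).

```java
final class StringLiteralUnescaper {
    // Takes a double-quoted literal and returns its content with escape sequences resolved.
    static String unescape(String literal) {
        int end = literal.length() - 1;
        if (end < 1 || literal.charAt(0) != '"' || literal.charAt(end) != '"') {
            throw new IllegalArgumentException("not a double-quoted literal: " + literal);
        }
        StringBuilder buf = new StringBuilder();
        for (int i = 1; i < end; i++) {
            char ch = literal.charAt(i);
            if (ch != '\\') {
                buf.append(ch); // ordinary character, copied as-is
                continue;
            }
            if (++i >= end) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char esc = literal.charAt(i);
            switch (esc) {
                case 't': buf.append('\t'); break;
                case 'n': buf.append('\n'); break;
                case 'r': buf.append('\r'); break;
                case '"': buf.append('"'); break;
                case '\\': buf.append('\\'); break;
                case 'u':
                    // The four hex digits must all lie before the closing quote.
                    if (i + 4 >= end) {
                        throw new IllegalArgumentException("truncated unicode escape");
                    }
                    int value = 0;
                    for (int k = 1; k <= 4; k++) {
                        int digit = Character.digit(literal.charAt(i + k), 16);
                        if (digit < 0) {
                            throw new IllegalArgumentException("bad hex digit in unicode escape");
                        }
                        value = value * 16 + digit;
                    }
                    buf.append((char) value);
                    i += 4; // skip the digits just consumed
                    break;
                default:
                    throw new IllegalArgumentException("unknown escape: " + esc);
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\"say \\\"hi\\\"\"")); // prints say "hi"
    }
}
```

Like the grammar version, this appends to a single StringBuilder as it goes, so no intermediate strings are created for each escape.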

I needed to do just that, but my target was C and not Java. Here's how I did it based on answer #1 (and comment), in case anyone needs something alike:
QUOTE : '\'';
STR
@init{ pANTLR3_STRING unesc = GETTEXT()->factory->newRaw(GETTEXT()->factory); }
: QUOTE ( reg = ~('\\' | '\'') { unesc->addc(unesc, reg); }
| esc = ESCAPED { unesc->appendS(unesc, GETTEXT()); } )+ QUOTE { SETTEXT(unesc); };
fragment
ESCAPED : '\\'
( '\\' { SETTEXT(GETTEXT()->factory->newStr8(GETTEXT()->factory, (pANTLR3_UINT8)"\\")); }
| '\'' { SETTEXT(GETTEXT()->factory->newStr8(GETTEXT()->factory, (pANTLR3_UINT8)"\'")); }
)
;
HTH.
