ANTLR : AST eval problems - java

Hello,
I would like to evaluate an AST that I generated.
I wrote a grammar that builds an AST, and now I'm trying to write the tree grammar that evaluates it.
Here's my tree grammar:
tree grammar XHTML2CSVTree;
options {
tokenVocab=XHTML2CSV;
ASTLabelType=CommonTree;
}
@members {
// variables and methods to be included in the java file generated
}
/*------------------------------------------------------------------
* TREE RULES
*------------------------------------------------------------------*/
// example
tableau returns [String csv]
: ^(TABLEAU {String retour="";}(l=ligne{retour += $l.csv;})* {System.out.println(retour);})
;
ligne returns [String csv]
: ^(LIGNE {String ret="";}(c=cellule{ret += $c.csv;})+)
;
cellule returns [String csv]
: ^(CELLULE s=CHAINE){ $csv = $s.text;}
;
And here's the grammar building the AST :
grammar XHTML2CSV;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
CELLULE;
LIGNE;
TABLEAU;
CELLULEG = '<td>'; // simple lexemes
CELLULED = '</td>';
DEBUTCOL = '<tr>';
FINCOL = '</tr>';
DTAB = '<table';
FTAB = '>';
FINTAB = '</table>';
// anonymous tokens (useful to give meaningful names to AST labels)
// simple lexemes
}
@members {
// variables and methods to be included in the java file generated
}
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
tableau
: DTAB STAB* FTAB ligne* FINTAB -> ^(TABLEAU ligne*)
;
ligne
: DEBUTCOL cellule+ FINCOL -> ^(LIGNE cellule+)
;
cellule
: CELLULEG CHAINE CELLULED -> ^(CELLULE CHAINE)
;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
STAB
: ' '.*'=\"'.*'\"'
;
WS
: ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ {$channel = HIDDEN;}
; // skip white spaces
CHAINE : (~('\"' | ',' | '\n' | '<' | '>'))+
;
// complex lexemes
XHTML2CSV.g works: I can see the generated AST in ANTLRWorks,
but I cannot walk this AST to generate the CSV output.
I get these errors:
XHTML2CSVTree.java:144: ';' expected
match(input, Token.DOWN, null);
^
XHTML2CSVTree.java:144: not a statement
match(input, Token.DOWN, null);
^
XHTML2CSVTree.java:144: ';' expected
match(input, Token.DOWN, null);
^
XHTML2CSVTree.java:144: not a statement
match(input, Token.DOWN, null);
^
XHTML2CSVTree.java:144: ';' expected
match(input, Token.DOWN, null);
^
5 errors
If someone could help me,
Thanks.
eo
Edit :
My main class looks like :
import org.antlr.runtime.ANTLRFileStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.tree.CommonTreeNodeStream;
import org.antlr.runtime.tree.Tree;
public class Main {
public static void main(String args[]) throws Exception {
try {
XHTML2CSVLexer lex = new XHTML2CSVLexer(new ANTLRFileStream(args[0])); // create lexer to read the file specified from command line (i.e., first argument, e.g., java Main test1.xhtml)
CommonTokenStream tokens = new CommonTokenStream(lex); // transform it into a token stream
XHTML2CSVParser parser = new XHTML2CSVParser(tokens); // create the parser that reads from the token stream
Tree t = (Tree) parser.cellule().tree; // (try to) parse a given rule specified in the parser file, e.g., my_main_rule
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t); // transform it into a common data structure readable by the tree pattern
nodes.setTokenStream(tokens); // declare which token to use (i.e., labels of the nodes defined in the parser, mainly anonymous tokens)
XHTML2CSVTree tparser = new XHTML2CSVTree(nodes); // instantiate the tree pattern
System.out.println(tparser.cellule()); // apply patterns
} catch (Exception e) {
e.printStackTrace();
}
}
}

The ligne rule in your tree grammar:
ligne returns [String csv]
: ^(LIGNE {Sting ret="";r}(c=cellule{ret += $c.csv;})+)
; // ^ ^
// | |
// problem 1, problem 2
has 2 problems:
it contains Sting where it should be String;
there's a trailing r that is messing up your custom Java code.
It should be:
ligne returns [String csv]
: ^(LIGNE {String ret="";}(c=cellule{ret += $c.csv;})+)
;
EDIT
If I generate a lexer and parser (1), generate a tree walker (2), compile all .java source files (3) and run the Main class (4):
java -cp antlr-3.3.jar org.antlr.Tool XHTML2CSV.g
java -cp antlr-3.3.jar org.antlr.Tool XHTML2CSVTree.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main test.txt
the following gets printed to the console:
table data
where the file test.txt contains:
<td>table data</td>
So I don't see any problem. Perhaps you're trying to parse a <table>? This would go wrong since both your parser and tree-walker are invoking the cellule rule, not the tableau rule.
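For reference, the walk that the tree grammar is meant to perform can be sketched in plain Java, without ANTLR. This is only an illustration under assumptions: the Node class and the cell helper are hypothetical stand-ins for the AST, and the comma and newline separators are my own choice (the tree grammar in the question concatenates cells without any separator). The method names simply mirror the rule names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the AST: TABLEAU -> LIGNE* -> CELLULE -> CHAINE.
class CsvTreeWalkDemo {
    static final class Node {
        final String type;
        final String text;
        final List<Node> children;

        Node(String type, String text, Node... children) {
            this.type = type;
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    // cellule : ^(CELLULE CHAINE) -> the text of the single CHAINE child
    static String cellule(Node n) {
        return n.children.get(0).text;
    }

    // ligne : ^(LIGNE cellule+) -> cells joined with commas (assumed separator)
    static String ligne(Node n) {
        List<String> cells = new ArrayList<>();
        for (Node c : n.children) {
            cells.add(cellule(c));
        }
        return String.join(",", cells);
    }

    // tableau : ^(TABLEAU ligne*) -> rows joined with newlines (assumed separator)
    static String tableau(Node n) {
        List<String> rows = new ArrayList<>();
        for (Node l : n.children) {
            rows.add(ligne(l));
        }
        return String.join("\n", rows);
    }

    // Helper that builds ^(CELLULE CHAINE) for a given cell text.
    static Node cell(String s) {
        return new Node("CELLULE", null, new Node("CHAINE", s));
    }

    public static void main(String[] args) {
        Node table = new Node("TABLEAU", null,
                new Node("LIGNE", null, cell("a"), cell("b")),
                new Node("LIGNE", null, cell("c"), cell("d")));
        System.out.println(tableau(table));
    }
}
```

The entry point matters here in the same way as in the answer: calling cellule on a TABLEAU node would not produce the table, so the walk has to start at the rule that matches the root of the tree.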

Related

ANTLR 3 grammar generates a parsing error on encountering the pound character

ANTLR 3 reports an error on encountering the pound character ("£") of the French language, which is the equivalent of the hash character "#" in English, even though the Unicode values for the three special characters @, #, and $ are specified in the lexer rule.
FYI: The Unicode value of Pound char (of the French language) = The Unicode value of Hash char (of ENGLISH language).
The lexer/parser rules:
grammar SimpleCalc;
options
{
k = 8;
language = Java;
//filter = true;
}
tokens {
PLUS = '+' ;
MINUS = '-' ;
MULT = '*' ;
DIV = '/' ;
}
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
expr : n1=NUMBER ( exp = ( PLUS | MINUS ) n2=NUMBER )*
{
if ($exp.text.equals("+"))
System.out.println("Plus Result = " + $n1.text + $n2.text);
else
System.out.println("Minus Result = " + $n1.text + $n2.text);
}
;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
NUMBER : (DIGIT)+ ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
fragment DIGIT : '0'..'9' | '£' | ('\u0040' | '\u0023' | '\u0024');
The text file is also read as UTF-8:
public static void main(String[] args) throws Exception
{
try
{
args = new String[1];
args[0] = new String("antlr_test.txt");
SimpleCalcLexer lex = new SimpleCalcLexer(new ANTLRFileStream(args[0], "UTF-8"));
CommonTokenStream tokens = new CommonTokenStream(lex);
SimpleCalcParser parser = new SimpleCalcParser(tokens);
parser.expr();
//System.out.println(tokens);
}
catch (Exception e)
{
e.printStackTrace();
}
}
The input file has only one line:
£3 + 4£
the error is:
antlr_test.txt line 1:1 no viable alternative at character '£'
antlr_test.txt line 1:7 no viable alternative at character '£'
What is wrong with my approach?
or did I miss something?
I cannot reproduce what you describe. When I test your grammar without modifications, I get a NumberFormatException, which is expected, because Integer.parseInt("£3") cannot succeed.
When I change your embedded code into this:
{
if ($exp.text.equals("+"))
System.out.println("Result = " + (Integer.parseInt($n1.text.replaceAll("\\D", "")) + Integer.parseInt($n2.text.replaceAll("\\D", ""))));
else
System.out.println("Result = " + (Integer.parseInt($n1.text.replaceAll("\\D", "")) - Integer.parseInt($n2.text.replaceAll("\\D", ""))));
}
and regenerate lexer and parser classes (something you might not have done) and rerun the driver code, I get the following output:
Result = 7
EDIT
Perhaps the pound sign in the grammar is the issue? What if you try:
fragment DIGIT : '0'..'9' | '\u00A3' | ('\u0040' | '\u0023' | '\u0024');
instead of:
fragment DIGIT : '0'..'9' | '£' | ('\u0040' | '\u0023' | '\u0024');
?
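As a quick check of the code points involved: \u0040 is @, \u0023 is # and \u0024 is $, while the pound sign is \u00A3, so none of the three escapes in the original rule covers it. The sketch below prints the code points and also shows, as an assumption about what might go wrong, how a UTF-8 encoded pound sign turns into two characters when the bytes are decoded with a different charset (ISO-8859-1 is picked only as an example). The class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PoundSignCheck {
    // Formats a character as its Unicode code point, e.g. U+0023.
    static String codePoint(char c) {
        return String.format("U+%04X", (int) c);
    }

    // Encodes text as UTF-8 and decodes the bytes with another charset,
    // imitating a file that is read with the wrong encoding.
    static String reinterpret(String text, Charset wrong) {
        return new String(text.getBytes(StandardCharsets.UTF_8), wrong);
    }

    public static void main(String[] args) {
        System.out.println(codePoint('\u00A3')); // pound sign
        System.out.println(codePoint('#'));      // hash
        // Number of characters seen when the UTF-8 bytes are misread.
        System.out.println(reinterpret("\u00A3", StandardCharsets.ISO_8859_1).length());
    }
}
```

Writing the escape in the grammar sidesteps the question of which encoding the grammar file itself is saved and read in.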

ANTLR mismatched input 'foo(some_foo)' expecting {'foo'}

I'm writing a parser using ANTLR and am now at the stage of testing my parser/lexer.
I stumbled over a strange bug while trying to parse basically a variable assignment. (Like this)
Foo = mpsga(LT);
I get the error: line 1:6 mismatched input 'mpsga(LT)' expecting 'mpsga'
This is especially strange because when I remove the brackets (or the argument LT),
the parser recognizes mpsga and only reports the missing brackets (or the missing argument).
My Grammar looks something like this:
Lexer
lexer grammar FooLexer;
COMMENT
:
'#' ~[\r\n]* -> channel ( HIDDEN )
;
NEWLINE
:
(
'\r'? '\n'
| '\r'
)+ -> channel ( HIDDEN )
;
EQUALSSIGN
:
'='
;
SEMICOLON
:
';'
;
MPSGA_255_1
:
'LT'
;
MPSGA
:
'mpsga'
;
WHITESPACE
:
(
' '
| '\t'
)+ -> channel ( HIDDEN )
;
BRACKET_OPEN
:
'('
;
BRACKET_CLOSED
:
')'
;
VAR
:
[a-zA-Z][0-9a-zA-Z_]*
;
Parser
parser grammar FooParser;
options {
tokenVocab = FooLexer;
}
stmt_block
:
stmt_list EOF
;
stmt
:
VAR EQUALSSIGN expr SEMICOLON NEWLINE?
;
stmt_list
:
stmt
| stmt_list stmt
;
expr
:
extvar
;
extvar
:
MPSGA BRACKET_OPEN mpsga_field BRACKET_CLOSED
;
mpsga_field
:
MPSGA_255_1
;
When I try to parse Foo = mpsga(LT); in Java, I get the error.
Any help is appreciated!
Edit:
My parse hierarchy looks like the following:
Foo = mpsga(LT);
stmt_block
->stmt_list:1
-->stmt
--->"Foo"
--->"="
--->expr
---->extvar
----->"mpsga(LT)"
---->";"
-><EOF>
Foo = mpsga(LT;
stmt_block
->stmt_list:1
-->stmt
--->"Foo"
--->"="
--->expr
---->extvar
----->"mpsga"
----->"("
----->mpsga_field
------>"LT"
----->"<missing ')'>"
---->";"
-><EOF>
DISCLAIMER: I solved the problem. For anyone experiencing the same issue: I had some Lexer rules that were ambiguous for the mpsga part.
It's the argument: your grammar accepts 'foo' or 'foo2' as constants, not some_foo.
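The parse hierarchy in the question shows "mpsga(LT)" arriving as a single token, which fits the asker's note about ambiguous lexer rules. An ANTLR 4 lexer picks the rule that matches the longest stretch of input and, on a tie, the rule defined first. The sketch below imitates that selection with java.util.regex; it is not ANTLR code, and the CALL_TEXT rule is hypothetical, since the offending rule was never shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestMatchDemo {
    // Returns the name of the rule that wins at the start of the input:
    // longest match first, earliest-declared rule on a tie.
    static String winningRule(Map<String, Pattern> rules, String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> e : rules.entrySet()) {
            Matcher m = e.getValue().matcher(input);
            // Strictly greater, so an earlier rule keeps the win on a tie.
            if (m.lookingAt() && m.end() > bestLen) {
                best = e.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    // MPSGA and VAR follow the lexer in the question; CALL_TEXT is a
    // hypothetical overly broad rule added for illustration.
    static Map<String, Pattern> rules(boolean withBroadRule) {
        Map<String, Pattern> r = new LinkedHashMap<>();
        r.put("MPSGA", Pattern.compile("mpsga"));
        r.put("VAR", Pattern.compile("[a-zA-Z][0-9a-zA-Z_]*"));
        if (withBroadRule) {
            r.put("CALL_TEXT", Pattern.compile("[a-z]+\\([A-Z]+\\)"));
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(winningRule(rules(false), "mpsga(LT);"));
        System.out.println(winningRule(rules(true), "mpsga(LT);"));
    }
}
```

With the broad rule present, the whole "mpsga(LT)" text becomes one token, so the parser never sees a separate MPSGA token, which matches the reported message.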

AST how to deal with empty nodes

I am building an expression evaluator using Java, JFlex (lexer generator) and Jacc (parser generator). I need to:
generate the lexer
generate the parser
generate the AST
display the AST graph
evaluate expression
I was able to create the lexer and the parser and the AST. Now I am trying to make the AST graph using the visitor pattern, but this made a problem with my generated AST evident (so to speak). In my calculator I need to handle parentheses and they create empty nodes in my AST (and that makes my parse tree not an AST I suppose). Here is the relevant part of my grammar:
Calc : /* empty */
| AddExpr { ast = new Calc($1); }
;
AddExpr : ModExpr
| AddExpr '+' ModExpr { $$ = new AddExpr($1, $3, "+"); }
| AddExpr '-' ModExpr { $$ = new AddExpr($1, $3, "-"); }
;
ModExpr : IntDivExpr
| ModExpr MOD IntDivExpr { $$ = new ModExpr($1, $3); }
;
IntDivExpr : MultExpr
| IntDivExpr DIV MultExpr { $$ = new IntDivExpr($1, $3); }
;
MultExpr : UnaryExpr
| MultExpr '*' UnaryExpr { $$ = new MultExpr($1, $3, "*"); }
| MultExpr '/' UnaryExpr { $$ = new MultExpr($1, $3, "/"); }
;
UnaryExpr : ExpExpr
| '-' UnaryExpr { $$ = new UnaryExpr($2, "-"); }
| '+' UnaryExpr { $$ = new UnaryExpr($2, "+"); }
;
ExpExpr : Value
| ExpExpr '^' Value { $$ = new ExpExpr($1, $3); }
;
Value : DoubleLiteral
| '(' AddExpr ')' { $$ = new Value($2); }
;
DoubleLiteral : DOUBLE { $$ = $1; }
;
Here is an example expression:
1*(2+3)/(4-5)*((((6))))
and the resulting image:
This leaves me with Value nodes for each pair of parentheses. I have a few ideas on how to handle this, but I am not sure how to proceed:
Try to handle this in my grammar (not sure how as I am not allowed to use precedence directives)
Handle this in my evaluator
If you don't want Value nodes, then just replace { $$ = new Value($2); } with { $$ = $2; }.
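The effect of that change can be sketched in plain Java. The node classes below are hypothetical and much smaller than the ones in the question; Paren plays the role of the Value wrapper, and the wrap flag switches between the two grammar actions. The sketch assumes the semantic value types are compatible, so that passing the inner node through is allowed in your setup.

```java
class ParenPassThroughDemo {
    interface Node {
        double eval();
        int size(); // number of nodes in this subtree
    }

    static final class Num implements Node {
        final double value;
        Num(double value) { this.value = value; }
        public double eval() { return value; }
        public int size() { return 1; }
    }

    static final class Add implements Node {
        final Node left, right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        public double eval() { return left.eval() + right.eval(); }
        public int size() { return 1 + left.size() + right.size(); }
    }

    // Wrapper node, analogous to `new Value($2)` in the grammar action.
    static final class Paren implements Node {
        final Node inner;
        Paren(Node inner) { this.inner = inner; }
        public double eval() { return inner.eval(); }
        public int size() { return 1 + inner.size(); }
    }

    // Applies the parenthesis action `depth` times around `inner`:
    // wrap == true  mimics { $$ = new Value($2); }
    // wrap == false mimics { $$ = $2; }
    static Node parenthesize(Node inner, int depth, boolean wrap) {
        Node result = inner;
        for (int i = 0; i < depth; i++) {
            result = wrap ? new Paren(result) : result;
        }
        return result;
    }

    public static void main(String[] args) {
        Node sum = new Add(new Num(2), new Num(3));
        System.out.println(parenthesize(sum, 4, true).size());
        System.out.println(parenthesize(sum, 4, false).size());
    }
}
```

Both variants evaluate to the same number; the only difference is that the pass-through version adds no extra node per pair of parentheses, because the grouping is already encoded in the shape of the tree.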

How to create dynamic ANTLR4 lexer rule

I have the following grammar:
grammar SearchEngine;
@lexer::members {
private java.util.Set<String> extraCriteria;
public SearchEngineLexer(CharStream input, java.util.Set<String> extraCriteria) {
this(input);
this.extraCriteria = extraCriteria;
}
}
query: expression EOF;
expression: criteria operator literal_value | expression 'AND' expression | expression 'OR' expression;
criteria : 'SERVICE_NAME' | ..a lot of hardcoded criterias here | EXTRA_CRITERIA;
EXTRA_CRITERIA: {extraCriteria.stream().filter(c -> c.equals(getText())).findFirst().isPresent()}? . ;
It accepts queries like SERVICE_NAME = 'something' OR EXCEPTION IS NULL and so on. The rest of my grammar is not important, because it works without the EXTRA_CRITERIA definition.
So I created a new lexer with "TestCriteria" as an extra criterion and tried to parse my query:
Lexer lexer = new SearchEngineLexer(CharStreams.fromString("TestCriteria != 'test' OR SERVICE_NAME = 'EchoService'"), Collections.singleton("TestCriteria"));
TokenStream tokenStream = new CommonTokenStream(lexer);
SearchEngineParser parser = new SearchEngineParser(tokenStream);
parser.setErrorHandler(new BailErrorStrategy());
SearchEngineParser.QueryContext context = parser.query();
But when I execute this code I get:
line 1:0 token recognition error at: 'Te'
line 1:2 token recognition error at: 'st'
line 1:4 token recognition error at: 'C'
line 1:5 token recognition error at: 'ri'
line 1:7 token recognition error at: 'te'
line 1:9 token recognition error at: 'ri'
line 1:11 token recognition error at: 'a'
org.antlr.v4.runtime.misc.ParseCancellationException
at org.antlr.v4.runtime.BailErrorStrategy.recoverInline(BailErrorStrategy.java:66)
at de.telekom.tvpp.mtool.language.SearchEngineParser.criteria(SearchEngineParser.java:277)
at de.telekom.tvpp.mtool.language.SearchEngineParser.expression(SearchEngineParser.java:180)
at de.telekom.tvpp.mtool.language.SearchEngineParser.query(SearchEngineParser.java:117)
at de.telekom.tvpp.mtool.language.App.main(App.java:22)
Caused by: org.antlr.v4.runtime.InputMismatchException
at org.antlr.v4.runtime.BailErrorStrategy.recoverInline(BailErrorStrategy.java:61)
... 4 more
Where am I going wrong? How do I set up ANTLR4 to use a dynamic rule?
I had a mistake in the expression. This is my solution:
grammar SearchEngine;
@lexer::members {
private java.util.Set<String> extraCriteria;
public SearchEngineLexer(CharStream input, java.util.Set<String> extraCriteria) {
this(input);
this.extraCriteria = extraCriteria;
}
private boolean isExtraCriteria() {
return extraCriteria.stream().anyMatch(term -> ahead(term, _input));
}
private boolean ahead(final String word, final CharStream input) {
for (int i = 0; i < word.length(); i++) {
char wordChar = word.charAt(i);
int inputChar = input.LA(i + 1);
if (inputChar != wordChar) {
return false;
}
}
input.seek(input.index() + word.length() - 1);
return true;
}
}
query: expression EOF;
expression: criteria operator literal_value | expression 'AND' expression | expression 'OR' expression;
criteria : 'SERVICE_NAME' | ..a lot of hardcoded criterias here | EXTRA_CRITERIA;
EXTRA_CRITERIA: {isExtraCriteria()}? . ;
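The working version differs from the first attempt in that the predicate inspects the upcoming characters (through LA) instead of calling getText(). The core of that check can be tried outside ANTLR on a plain String. The sketch below is only an approximation under assumptions: the class and method names are made up, a String with an explicit position replaces the CharStream, and it returns the length of the longest matching criterion rather than moving the input position.

```java
import java.util.Collections;
import java.util.Set;

class ExtraCriteriaLookahead {
    // Length of the longest extra criterion that starts at `pos`,
    // or 0 if none of them starts there.
    static int matchLength(String input, int pos, Set<String> extraCriteria) {
        int best = 0;
        for (String word : extraCriteria) {
            if (input.startsWith(word, pos) && word.length() > best) {
                best = word.length();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> extra = Collections.singleton("TestCriteria");
        String query = "TestCriteria != 'test' OR SERVICE_NAME = 'EchoService'";
        System.out.println(matchLength(query, 0, extra));
    }
}
```

Preferring the longest candidate is a design choice of this sketch; it avoids cutting a criterion short when one extra criterion happens to be a prefix of another.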

Antlr 3.3 return values in java

I am trying to figure out how to get values from the parser.
My input is 'play the who' and it should return a string with 'the who'.
Sample.g:
text returns [String value]
: speech = wordExp space name {$value = $speech.text;}
;
name returns [String value]
: SongArtist = WORD (space WORD)* {$value = $SongArtist.text;}
;
wordExp returns [String value]
: command = PLAY {$value = $command.text;} | command = SEARCH {$value = $command.text;}
;
PLAY : 'play';
SEARCH : 'search';
space : ' ';
WORD : ( 'a'..'z' | 'A'..'Z' )*;
WS
: ('\t' | '\r'| '\n') {$channel=HIDDEN;}
;
If I enter 'play the who', this tree comes up:
http://i.stack.imgur.com/ET61P.png
I created a Java file to capture the output. If I call parser.wordExp(), I expect to get 'the who', but it returns the object and this EOF failure (see the output below). parser.text() returns 'play'.
import org.antlr.runtime.*;
import a.b.c.SampleLexer;
import a.b.c.SampleParser;
public class Main {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("play the who");
SampleLexer lexer = new SampleLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
SampleParser parser = new SampleParser(tokens);
System.out.println(parser.text());
System.out.println(parser.wordExp());
}
}
The console returns this:
play
a.b.c.SampleParser$wordExp_return@1d0ca25a
line 1:12 no viable alternative at input '<EOF>'
How can I capture 'the who'? It is strange to me that I cannot get this string. The interpreter creates the tree correctly.
First, in your grammar, speech only gets assigned the return value of parser rule wordExp. If you want to manipulate the return value of rule name as well, you can do this with an additional variable like the example below.
text returns [String value]
: a=wordExp space b=name {$value = $a.text+" "+$b.text;}
;
Second, invoking parser.text() parses the entire input. A second invocation (in your case parser.wordExp()) thus finds EOF. If you remove the second call, the "no viable alternative at input 'EOF'" error goes away.
There may be a better way to do this, but in the meantime this may help you out.
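Both points can be illustrated with a small hand-written parser in plain Java. It is a sketch under assumptions, not what ANTLR generates: the class is hypothetical, tokens are obtained by splitting on spaces, and the shared pos field stands in for the position of the token stream.

```java
import java.util.Arrays;
import java.util.List;

class SingleParseDemo {
    private final List<String> tokens;
    private int pos = 0; // shared cursor, like the parser's token stream position

    SingleParseDemo(String input) {
        this.tokens = Arrays.asList(input.split(" "));
    }

    // text : a=wordExp b=name -> combines both parts, as in the answer
    String text() {
        String a = wordExp();
        String b = name();
        return a + " " + b;
    }

    // wordExp : PLAY | SEARCH
    String wordExp() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("no viable alternative at input '<EOF>'");
        }
        String t = tokens.get(pos);
        if (!t.equals("play") && !t.equals("search")) {
            throw new IllegalStateException("unexpected token '" + t + "'");
        }
        pos++;
        return t;
    }

    // name : WORD (WORD)* -> all remaining words, joined with spaces
    String name() {
        StringBuilder sb = new StringBuilder();
        while (pos < tokens.size()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(tokens.get(pos++));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SingleParseDemo parser = new SingleParseDemo("play the who");
        System.out.println(parser.text());
        try {
            parser.wordExp(); // the input is already consumed
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The second call fails because the first one already consumed every token. To obtain only 'the who', the value computed for name would be returned on its own instead of being combined with the command.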
