How to get all logical operator tokens in a rule - java

I have a problem getting all the operator tokens in a rule. For example, if my input is (states = failed) and (states1 = nominal) or (states2 = nominal), I want to get "and" and "or". I already have a grammar that can parse my input, but words like 'and' and 'or' are keywords (literal tokens) in my grammar, so they show up in the parse tree but are not matched by a rule of their own.
I want to do this with a listener, but I don't know how to get at these tokens.
My lexer file:
lexer grammar TransitionLexer;
BOOLEAN: 'true' | 'false';
IF: 'if';
THEN: 'then';
ELSE: 'else';
NAME: (ALPHA | CHINESE | '_')(ALPHA | CHINESE | '_'|DIGIT)*;
ALPHA: [a-zA-Z];
CHINESE: [\u4e00-\u9fa5];
NUMBER: INT | REAL;
INT: DIGIT+
|'(-'DIGIT+')';
REAL: DIGIT+ ('.' DIGIT+)?
| '(-' DIGIT+ ('.' DIGIT+)? ')';
fragment DIGIT: [0-9];
OPCOMPARE: '='|'>='|'<='|'!='|'>'|'<';
WS: [ \t\n\r]+ ->skip;
SL_COMMENT: '/*' .*? '*/' ->skip;
My grammar file:
grammar TransitionCondition;
import TransitionLexer;
@parser::header{
import java.util.*;
}
@parser::members{
private List<String> keywords = new ArrayList<String>();
public boolean isKeyWord(){
return keywords.contains(getCurrentToken().getText());
}
public List<String> getKeywords(){
return keywords;
}
}
condition : stat+ EOF;
stat : expr;
expr: pair (('and' | 'or') pair)*
| '(' pair ')';
pair: '(' var OPCOMPARE value ')' # keyValuePair
| booleanExpr # booleanPair
| BOOLEAN # plainBooleanPair
;
var: localStates # localVar
| globalStates # globalVar
| connector # connectorVar
;
localStates: NAME;
globalStates: 'Top' ('.' brick)+ '.' NAME;
connector: brick '.' NAME;
value: {isKeyWord()}? userDefinedValue
|basicValue
;
userDefinedValue: NAME;
basicValue: arithmeticExpr | booleanExpr;
booleanExpr: booleanExpr op=('and' | 'or') booleanExpr
| BOOLEAN
| relationExpr
| 'not' booleanExpr
| '(' booleanExpr ')'
;
relationExpr: arithmeticExpr
| arithmeticExpr OPCOMPARE arithmeticExpr
;
arithmeticExpr: arithmeticExpr op=('*'|'/') arithmeticExpr
| arithmeticExpr op=('+'|'-') arithmeticExpr
| 'min' '(' arithmeticExpr (',' arithmeticExpr)* ')'
| 'max' '(' arithmeticExpr (',' arithmeticExpr)* ')'
| globalStates
| connector
| localStates
| NUMBER
| '(' arithmeticExpr ')'
;
brick: NAME;
My input file t.expr contains: (states = failed) and (states1 = nominal) or (states2 = nominal)
I get the parse tree on the command line using 'grun'.

If you label your parser rule expr:
expr
: pair (operators+=('and' | 'or') pair)* #logicalExpr
| '(' pair ')' #parensExpr
;
your (generated) listener class will contain these methods:
void enterLogicalExpr(TransitionConditionParser.LogicalExprContext ctx);
void enterParensExpr(TransitionConditionParser.ParensExprContext ctx);
Inside enterLogicalExpr, the and/or tokens are available as a java.util.List<Token> on the context: ctx.operators.

Related

ANTLR "no viable alternative at input" Error for parsing SAS code If Then Else

I am new to ANTLR and am working on a parser for SAS code that mainly consists of if / then / else if statements. I have created the following grammar, but I get errors in IntelliJ when I run it with a sample application.
Grammar created :
grammar SASDTModel;
parse
: if_block+
| score_block
;
//Model
// : If_block+
// | Score_block
// ;
if_block
: (if_statement|if_in_block)
| else_if_statement+
| else_statement
;
if_statement
: IF '(' if_condition ')' THEN Identifier'='Value ';'
| IF Identifier'='Value THEN Identifier'='Value ';'
;
else_if_statement
: ELSEIF '(' if_condition ')' THEN Identifier'='Value ';'
| ELSEIF Identifier'='Value THEN Identifier'='Value ';'
;
if_condition
: Value ComparisionOperators Identifier ComparisionOperators Value
| Value ComparisionOperators Value;
else_statement
: ELSE Identifier'='Value ';'
;
if_in_block
: IF Identifier IN '(' StringArray ')' THEN Identifier'='Value ';'
;
score_block
: Identifier'='Arithmetic_expression ';'
;
Arithmetic_expression:
| ( ArithmeticOperators '(' Value ')' )+
| ( ArithmeticOperators '(' Value ArithmeticOperators Identifier ')' )+
;
WS : ( ' ' | '\t' | '\r' | '\n' )-> channel(HIDDEN);
//WS : [ \t\n\r]+ -> channel(HIDDEN) ;
//WS : (' ' | '\t')+ -> channel(HIDDEN);
//COMMENT : '/*' .*? '*/' -> skip ;
//LINE_COMMENT : '*' ~[\r\n]* -> skip ;
ArithmeticOperators:
| '+'
| '-'
| '*'
| '/'
| '**'
;
ComparisionOperators
: '=='
| '<'
| '>'
| '<='
| '>='
;
IF: 'IF' | 'if' ;
ELSE: 'ELSE' | 'else' ;
ELSEIF: 'ELSE IF' | 'else if' ;
THEN: 'THEN' | 'then';
IN: 'IN' | 'in';
Value : INT
| DOUBLE
| '-'DOUBLE
| '-'INT
| Identifier
|'null';
INT : [0-9];
DOUBLE : INT+ PT INT+
| PT INT+
| INT+
;
PT : '.';
Identifier : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')* ;
StringArray : (('\'')(Value)('\''))+;
Input:
if scored = null then scored = -0.05;
else if ( 0 < scored <= 300 ) then scored = -0.5;
else if ( 300 < scored <= 500 ) then scored = -0.4;
else if ( 500 < scored <= 800 ) then scored = -0.8;
else if ( 800 < scored <= 1000 ) then scored = 0.9;
else if ( scored > 1000 ) then scored = 1.735409628;
else scored = 0;
The errors I am getting:
line 1:4 no viable alternative at input 'IF scored'
line 1:61 mismatched input '<=' expecting ')'
line 1:112 mismatched input '<=' expecting ')'
line 1:163 mismatched input '<=' expecting ')'
line 1:214 mismatched input '<=' expecting ')'
line 1:276 mismatched input 'scored' expecting Identifier
line 1:303 mismatched input 'scored' expecting Identifier
All the errors are reported on line 1 because I preprocess the SAS code, removing any comments and collapsing it into a single line.
After preprocessing, the input becomes:
IF scored = null THEN scored = -0.05;ELSE IF ( 0 < scored <= 300 )
THEN scored = -0.5;ELSE IF ( 300 < scored <= 500 ) THEN scored =
-0.4;ELSE IF ( 500 < scored <= 800 ) THEN scored = -0.8;ELSE IF ( 800 < scored <= 1000 ) THEN scored = 0.9;ELSE IF ( scored > 1000 ) THEN
scored = 1.735409628;ELSE scored = 0;
Here are a couple of things that might be causing problems:
by making StringArray : (('\'')(Value)('\''))+; a lexer rule, you will only match 'foo123mu' (values without spaces). You should make StringArray a parser rule (and then Value should also become a parser rule)
your ELSEIF rule: ELSEIF: 'ELSE IF' | 'else if' ; is rather fragile: whenever there are 2 or more spaces between ELSE and IF, your rule will not be matched. You should remove this rule and use the existing ELSE and IF rules in your parser rule(s)
the rules ArithmeticOperators and Arithmetic_expression match empty strings. Lexer rules must never match empty strings (the lexer can produce an infinite amount of empty-string tokens)
the lexer rule Arithmetic_expression should be a parser rule: whenever lexer rules are used to "glue" other tokens to each other, you should "promote" them to parser rules
your naming convention for lexer rules is inconsistent: use either PascalCase or UPPER_CASE, not both
as already mentioned, INT : [0-9]; should be INT : [0-9]+; otherwise 4 would be tokenised as an INT and 42 as a DOUBLE
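To illustrate the ELSEIF point from the list above, here is a small sketch in plain Java. It does not use ANTLR at all; the class and method names are made up for the illustration, and the second method compares keywords case-insensitively only for brevity.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not ANTLR code: it contrasts one literal "ELSE IF" token
// with two separate ELSE and IF tokens when whitespace is discarded.
final class ElseIfTokenDemo {

    // Mirrors ELSEIF: 'ELSE IF' | 'else if' ; exactly one space is allowed.
    static boolean startsWithElseIfLiteral(String input) {
        return input.startsWith("ELSE IF") || input.startsWith("else if");
    }

    // Mirrors a parser rule that expects an ELSE token followed by an IF token,
    // with whitespace thrown away between tokens.
    static boolean startsWithElseThenIf(String input) {
        List<String> tokens = Arrays.asList(input.trim().split("\\s+"));
        return tokens.size() >= 2
                && tokens.get(0).equalsIgnoreCase("ELSE")
                && tokens.get(1).equalsIgnoreCase("IF");
    }

    public static void main(String[] args) {
        String oneSpace = "ELSE IF ( scored > 1000 )";
        String threeSpaces = "ELSE   IF ( scored > 1000 )";
        System.out.println(startsWithElseIfLiteral(oneSpace));    // true
        System.out.println(startsWithElseIfLiteral(threeSpaces)); // false
        System.out.println(startsWithElseThenIf(threeSpaces));    // true
    }
}
```

The literal version breaks as soon as the spacing changes, while the token-based version does not care how much whitespace separates the two keywords.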
These are just a few of the things I saw while reading your question, so there may be more things incorrect. I suggest you first take the time to learn a bit more ANTLR before trying to write a SAS grammar. Or, better yet, try to find an existing (ANTLR) grammar for this language instead of writing your own.
Here's an existing one you could take a look at: https://github.com/xueqilsj/sas-grammar (no idea how accurate it is)
The syntax of your input is incorrect: == should be used instead of =.
UPDATE:
Also, although the syntax of INT and DOUBLE should work, it would be better expressed like so:
INT : [0-9]+;
DOUBLE : INT PT INT
| PT INT
| INT
;
otherwise, 300 would be identified as a DOUBLE, not as an INT.
UPDATE 2
As @Raven has commented:
INT : [0-9]+;
DOUBLE : INT PT INT
| PT INT
;
I have completed my grammar and resolved all the errors thanks to @Bart, @Seelenvirtuose and @Maurice.
Following is the ANTLR grammar for parsing SAS If Else and simple Assignment statements.
grammar SASDTModel;
parse : block+ EOF;
block
: if_block+ # oneOrMoreIfBlock
| assignment_block+ # assignmentBlocks
;
if_block
: if_statement (else_if_statement)* else_statement?
;
/*nested_if_else_statement
: If if_condition Then Do? ';'? if_statement (else_if_statement)* else_statement? End? ';'?
;*/
if_statement
: If '('? if_condition ')'? Then if_block # nestedIfStatement
| If '('? if_condition ')'? Then expression Equal expression ';' # ifStatement
| If expression In '(' expression_list+ ')' Then expression Equal expression ';' # ifInBlock
;
else_if_statement
: Else If '('? if_condition ')'? Then expression Equal expression ';' # elseIf
| Else If expression In '(' expression_list+ ')' Then expression Equal expression ';' # elseIfInBlock
;
if_condition
: Identifier (Equal|ComparisionOperators) Quote? expression+ Quote? # equalCondition
| expression # expressionCondition
| expression equals_to_null # checkIfNull
| expression op=(And|Or) expression # andOrExpression
;
/*if_range_condition
: expression ComparisionOperators expression ComparisionOperators expression
;*/
else_statement
: Else expression Equal expression ';'
;
assignment_block
: Identifier Equal Identifier '(' function_parameter ')' ';' # functionCall
| Identifier Equal expression expression* ';' # assignValue
;
expression
: Value # value
| Identifier # identifier
| SignedFloat # signedFloat
| '(' expression ')' # expressionBracket
| expression '(' expression_list? ')' # expressionBracketList
| Not expression # notExpression
| expression (Min|Max) expression # minMaxExpression
| expression op=('*'|'/') expression # mulDivideExpression
| expression op=('+'|'-') expression # addSubtractExpression
| expression ('||' | '!!' ) expression # orOperatorExpression
| expression ComparisionOperators expression ComparisionOperators expression # inRangeExpression
| expression ComparisionOperators Quote? expression+ Quote? # ifPlainCondition
| expression (Equal|ComparisionOperators) Quote {_input.get(_input.index() -1).getType() == WS}? Quote # ifSpaceStringCondition
| expression Equal expression # equalExpression
;
expression_list
: Quote? expression+ Quote? Comma? # generalExpressionList
| Quote ({_input.get(_input.index() -1).getType() == WS}?)? Quote Comma? # spaceString
;
function_parameter
: expression+
;
equals_to_null : Equal Pt ;
/*ArithmeticOperators
: '+'
| '-'
| '*'
| '/'
| '**'
;*/
Equal : '=' ;
ComparisionOperators
: '<'
| '>'
| '<='
| '>='
;
And : '&' | 'and';
Or
: '|'
| '!'
;
Not
: '^'
| '~'
;
Min : '><';
Max : '<>';
If
: 'IF'
| 'if'
| 'If'
;
Else
: 'ELSE'
| 'else'
| 'Else';
Then
: 'THEN'
| 'then'
| 'Then'
;
In : 'IN' | 'in';
Do : 'do' | 'Do';
End : 'end' | 'END';
Value
: Int
| DOUBLE
| '-'DOUBLE
| '-'Int
| SignedFloat
| 'null';
Int : [0-9]+;
SignedFloat
: UnaryOperator? UnsignedFloat
;
MUL : '*' ; // assigns token name to '*' used above in grammar
DIV : '/' ;
ADD : '+' ;
SUB : '-' ;
DOUBLE
: Int Pt Int
| Pt Int
| Int
;
Pt : '.';
UnaryOperator
: '+'
| '-'
;
UnsignedFloat
: ('0'..'9')+ '.' ('0'..'9')* Exponent?
| '.' ('0'..'9')+ Exponent?
| ('0'..'9')+ Exponent
;
Exponent : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
Comma : ',';
Quote
: '\''
| '"'
;
Identifier : [a-zA-Z_] [a-zA-Z_0-9]* ;
WS : ( ' ' | '\t' | '\r' | '\n' )-> channel(HIDDEN);

Parsing script tag with jsp scriptlet with antlr4

I am having difficulty parsing a <script> tag that contains a scriptlet expression using antlr4. I started with the existing HTMLParser and HTMLLexer grammars and am trying to modify them for my needs.
The Parser Grammar is:
parser grammar HTMLParser;
options { tokenVocab=HTMLLexer; }
htmlDocument
: script*
;
script
: SCRIPT_OPEN scriptAttribute* SCRIPT_TAG_CLOSE? WORD? SCRIPT_TAG_FULL_CLOSE
| SCRIPT_OPEN scriptAttribute* SCRIPT_TAG_SLASH_CLOSE
;
scriptAttribute
: scriptAttributeName SCRIPT_EQUALS QUOTE SCRIPLET_INSIDE_SCRIPT javaExpression SCRIPTLET_TAG_CLOSE scriptAttributeValue? QUOTE
| scriptAttributeName SCRIPT_EQUALS QUOTE scriptAttributeValue QUOTE
| scriptAttributeName
;
scriptAttributeName
: WORD
;
scriptAttributeValue
: SCRIPT_ATTRIBUTE
;
scriptlet
: SCRIPTLET_TAG_OPEN javaExpression SCRIPTLET_TAG_CLOSE
;
jspElementName
: TAG_NAME
;
jspElementAttribute
: jspAttributeName TAG_EQUALS jspAttributeValue
;
jspAttributeName
: TAG_NAME
;
jspAttributeValue
: ATTVALUE_VALUE
;
javaExpression
: VALID_JAVA_CHARS | SEA_WS*
;
The lexer grammar:
lexer grammar HTMLLexer;
SCRIPT_OPEN
: '<script' ->pushMode(SCRIPT)
;
SCRIPTLET_TAG_OPEN
: ('<%!' | '<%=' | '<%') ->pushMode(SCRIPTVALUE)
;
SEA_WS
: (' '|'\t'|'\r'? '\n')+
;
TAG_NAME
: TAG_NameStartChar TAG_NameChar*
// | TAG_NameStartChar* ':' TAG_NameStartChar*
;
TAG_WHITESPACE
: [ \t\r\n] -> channel(HIDDEN)
;
fragment
HEXDIGIT
: [a-fA-F0-9]
;
fragment
DIGIT
: [0-9]
;
TAG_NameChar
: TAG_NameStartChar
// | ':'
| '-'
| '_'
| '.'
| DIGIT
| '\u00B7'
| '\u0300'..'\u036F'
| '\u203F'..'\u2040'
;
TAG_NameStartChar
: [a-zA-Z]
| '\u2070'..'\u218F'
| '\u2C00'..'\u2FEF'
| '\u3001'..'\uD7FF'
| '\uF900'..'\uFDCF'
| '\uFDF0'..'\uFFFD'
;
TAG_EQUALS
: '=' -> pushMode(ATTVALUE)
;
//
// attribute values
//
mode ATTVALUE;
// an attribute value may have spaces b/t the '=' and the value
ATTVALUE_VALUE
: [ ]* ATTRIBUTE -> popMode
;
ATTRIBUTE
: DOUBLE_QUOTE_STRING
| SINGLE_QUOTE_STRING
| ATTCHARS
| HEXCHARS
| DECCHARS
;
fragment ATTCHAR
: '-'
| '_'
| '.'
| '/'
| '+'
| ','
| '?'
| '='
| ':'
| ';'
| '#'
| [0-9a-zA-Z]
;
fragment ATTCHARS
: ATTCHAR+ ' '?
;
fragment HEXCHARS
: '#' [0-9a-fA-F]+
;
fragment DECCHARS
: [0-9]+ '%'?
;
fragment DOUBLE_QUOTE_STRING
: '"' ~[<"]* '"'
;
fragment SINGLE_QUOTE_STRING
: '\'' ~[<']* '\''
;
//
// <scripts>
//
mode SCRIPT;
SCRIPT_TAG_FULL_CLOSE
: '</script>' ->popMode
;
SCRIPT_TAG_CLOSE
: '>'
;
SCRIPT_TAG_SLASH_CLOSE
: '/>' -> popMode
;
SCRIPT_EQUALS
: '='
;
SCRIPLET_INSIDE_SCRIPT
: '<%' ->pushMode(SCRIPTVALUE)
;
SCRIPT_ATTRIBUTE
: SCRIPT_ATTCHARS
;
fragment SCRIPT_ATTCHARS
: SCRIPT_ATTCHAR+
;
SCRIPTTAG_WS
: [ \r\n\t]+ -> channel(HIDDEN)
;
WORD
: [a-zA-Z]+
;
QUOTE
: '"'
;
fragment SCRIPT_ATTCHAR
: '-'
| '_'
| '.'
| '/'
| ','
| ';'
| '\''
// | '"'
| [0-9a-zA-Z]
;
mode SCRIPTVALUE;
SCRIPTLET_TAG_CLOSE
: '%>' ->popMode
;
VALID_JAVA_CHARS
: SCRIPTCHARS+
;
SCRIPT_WS
: [\r\n\t]+ -> channel(HIDDEN)
;
fragment SCRIPTCHARS
: SCRIPTCHAR+ ' '?
;
fragment SCRIPTCHAR
: '-'
| '_'
| '.'
| '/'
| '+'
| ','
| '?'
| '='
| ':'
| ';'
| '#'
| '('
| ')'
| '}'
| '{'
| '#'
| '*'
| '!'
| '%'[0-9]+
| '&'
| '['
| ']'
| '~'
| '+'
| '^'
| '\r'
| '\t'
| '\n'
| ' '
| '"'
| '\''
| [0-9a-zA-Z]
;
Please note: currently I am just testing the parser on an unrealistically simple two-line file with the following text:
<script type="text/javascript" src="<%= request.getContextPath() %>"></script>
<script type="text/javascript" src="<%= request.getContextPath() %>/scripts/Main.js"></script>
The output is:
line 1:8 no viable alternative at input '<script type'
^<script^ start start index: 0 start stop index: 6
^<script^ start start index: 0 start stop index: 6
^
^ stop start index: 78 stop stop index: 79
^<script^ start start index: 80 start stop index: 86
^
Please note that the stop token's start index begins at the previous line's newline character. My final goal is to create a JSP grammar with which I can identify each script, link, style, and JSP tag library tag and replace it with something else. The rest of the grammar is in place, but I am stuck on parsing the script tag.

Why does the whole g4 grammar stop working after I add the "cppInclude" parser rule?

g4:
grammar KBDP;
@header {package kbdp.translator.antlr;}
COMMENT: '/*' .*? '*/' -> skip ;
LINE_COMMENT: '//' ~[\r\n]* '\r'? '\n' -> skip;
KS:'[' ('KS_'|'KA_') MIX+ ']';
STRING : '"' ~[\r\n"]+ '"';
fragment NUM:[0-9]+;
VARNAME:[_a-zA-Z-0-9]+;
INCLUDE :'#include' ;
MIX : CHINESE | VARNAME ;
CHINESE : ('\u4E00'..'\u9FA5')+ ;
ARG : VARNAME|STRING ;
DB : '[' '数据库' ']';
SQL : '[' 'SQL' NUM ']';
SQLRESULT: '[''SQL' NUM '有数据'']';
SQLREADLINE:'[' '列' NUM ']';
RESULTWIRTELINE:'[' '结果集' NUM ']';
RETURNMULTI:'[' '返回值' NUM '有数据' ']';
RETURNSINGLE:'[' '返回值' NUM ']';
PRINT:'[' '打印' ']';
WS: [\r\n \t] -> skip;
prog: kinglangStat+ | cppStat+;
block:'{' prog* '}';
kinglangStat:kinglangServiceDeclaration |
kinglangDBOpen |
kinglangSQL |
kinglangSQLResult |
kinglangSQLReadLine |
kinglangResultDeclare |
kinglangResultWriteLine |
kinglangFunctionCall |
kinglangFunctionReturnSingle |
kinglangFunctionReturnMulti |
kinglangPrintStatus;
kinglangServiceDeclaration: KS '(' VARNAME? (',' VARNAME)* ')' '{' prog* '}'; kinglangDBOpen:(VARNAME '=')? DB '(' (VARNAME|STRING) ')' ';';
kinglangSQL:(VARNAME '=')? SQL '(' STRING? ')' ';' ;
kinglangSQLResult:SQLRESULT block; kinglangSQLReadLine:SQLREADLINE '(' VARNAME ')' ';'; kinglangResultDeclare:RESULTWIRTELINE ';';
kinglangResultWriteLine:RESULTWIRTELINE '(' kinglangArg? (',' kinglangArg)* ')' ';'; kinglangArg : VARNAME|STRING ;
kinglangFunctionCall:KS '(' (VARNAME|STRING)? (',' (VARNAME|STRING))* ')' ';';
kinglangFunctionReturnSingle:RETURNSINGLE '(' VARNAME? (',' VARNAME)* ')' ';';
kinglangFunctionReturnMulti:RETURNMULTI block;
kinglangPrintStatus:PRINT '(' VARNAME|STRING ')' ';';
cppStat: block |
cppBreakStat |
cppContinueStat|
cppReturnStat |
cppSingleStat |
cppIfStat |
cppWhileStat |
cppGotoStat |
cppLabelStat |
cppForStat |
cppInclude;
cppIfStat: cppIfStatPart cppElseifPart* cppElsePart?;
cppIfStatPart:'if' '(' expression ')' (cppSingleStat|block)?;
cppElseifPart:'else if' '(' expression ')' (cppSingleStat|block)? ;
cppElsePart:'else' (cppSingleStat|block)?;
cppWhileStat:'while' '(' expression ')' block;
cppForStat:'for' '(' cppForCondition1?';' cppForCondition2?';'cppForCondition3?')' block; cppForCondition1:expression; cppForCondition2:expression; cppForCondition3:expression;
cppBreakStat:'break' ';';
cppContinueStat:'continue' ';';
cppGotoStat:'goto' expression ';' ;
cppLabelStat:VARNAME ':' ;
cppReturnStat: 'return' VARNAME? ';';
cppSingleStat: expression ';';
cppInclude: INCLUDE ('<'|'"') VARNAME '.' VARNAME ('>'|'"') ';';
expression: VARNAME |
STRING |
'!' expression |
expression '=' expression |
expression ('<'|'>') expression |
expression expression |
expression ('+'|'-'|'*'|'/'|'%') expression |
expression ('=='|'!=') expression |
expression ('>='|'<=') expression |
expression ('&&'|'||') expression |
expression ('++'|'--') |
('++'|'--') expression |
'(' expression ')'|
'\'' expression '\'' |
expression ',' expression |
expression expression | //var decl
expression '<' expression '>' expression | //class template
expression '[' expression ']' | //array
expression '.' expression | //class obj
expression '(' expression ')'; //func call
text:
[KS_MyTest](name,code)
{
char szCredit[1024];
memset(szCredit,0,sizeof(szCredit));
[数据库]("DB");
[SQL1]("select * from myTable where name='#name' and code='#code'");
[SQL1有数据]
{
[列0](szCredit);
}
[结果集1];
[结果集1]("ERROE_SUCCESS",0,0);
[结果集2];
[结果集2](szCredit);
}
Before I added the "cppInclude" parser rule, everything worked well.
But when the input is:
#include <iostream.h>
[KS_MyTest](name,code)
{
}
the parser no longer works.
It reports:
line 20:0 extraneous input '[KS_MyTest]' expecting {<EOF>, '{', '(', 'if', 'while', 'for', 'break', 'continue', 'goto', 'return', '!', '++', '--', ''', STRING, VARNAME, '#include'}
How could I fix the bug?
This production:
prog: kinglangStat+ | cppStat+;
says that a prog is either a sequence of kinglangStat or a sequence of cppStat.
Your example is a cppStat followed by a kinglangStat (I think). That isn't a prog.
How could I fix the bug?
Try this:
prog: ( kinglangStat | cppStat )+;
or
prog: stat+;
stat: kinglangStat | cppStat;
(I am not an ANTLR expert. I'm just reading the grammar ... as a grammar.)
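The difference between the two shapes of prog can be pictured with a loose regular-expression analogy in plain Java. This is only a sketch and not ANTLR code: the class name is made up, 'K' stands for one kinglangStat and 'C' for one cppStat.

```java
import java.util.regex.Pattern;

// Hypothetical analogy, not ANTLR code: 'K' stands for one kinglangStat and
// 'C' for one cppStat, so a program is a string such as "CK".
final class ProgAlternativesDemo {

    // prog: kinglangStat+ | cppStat+;
    static final Pattern ORIGINAL = Pattern.compile("K+|C+");

    // prog: ( kinglangStat | cppStat )+;
    static final Pattern FIXED = Pattern.compile("(K|C)+");

    static boolean accepts(Pattern prog, String statements) {
        // matches() requires the whole input to be consumed
        return prog.matcher(statements).matches();
    }

    public static void main(String[] args) {
        // An #include statement followed by a [KS_...] service declaration
        String includeThenService = "CK";
        System.out.println(accepts(ORIGINAL, includeThenService)); // false
        System.out.println(accepts(FIXED, includeThenService));    // true
        System.out.println(accepts(ORIGINAL, "KKK"));              // true
    }
}
```

The original rule accepts a run of one kind of statement only, which is why the input worked until a cppStat was placed in front of a kinglangStat.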

Define a rule for double-type in ANTLR v4

I wrote this grammar (*.g4):
ID : [a-zA-Z]+;
INT : [0-9];
DBL : INT+ (PT INT+)?;
PT : '.';
...
prog: stat+;
stat: expr NEWLINE # printExpr
| ID EQL expr NEWLINE # assign
| 'clear' # clear
| NEWLINE # blank
;
expr: expr op=(MUL|DIV) expr # MulDiv
| expr op=(ADD|SUB) expr # AddSub
| DBL # double
| ID # id
| LBR expr RBR # parens
;
My ANTLR and Java files compile with no problems, but if I run this input:
193.2
a =5.2
b= 6
c= a+b*2.2
c
there is a problem with the line b= 6: line 3:3 no viable alternative at input '6',
followed by a NullPointerException in visit().
I assume there might be some ambiguity within my expr rule.
What am I doing wrong?
Found the problem!
I redefined my previous DBL as:
dbl : INT+ PT INT+
| PT INT+
| INT+
;
That did the trick!
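A likely explanation for why only the line with 6 failed: an ANTLR lexer picks the rule that matches the longest piece of input, and when two rules match the same length, the rule defined first wins. A single digit matches both INT and DBL with length 1, so it becomes an INT, which expr does not accept. The sketch below imitates that selection with java.util.regex; it is not the ANTLR runtime, and the class and method names are made up for the illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: it imitates how an ANTLR lexer picks a token type.
// It is not generated by ANTLR and does not use the ANTLR runtime.
final class LexerTieBreakDemo {

    // Returns the name of the rule that wins at the start of `input`:
    // the longest match wins, and on a tie the rule listed first wins.
    static String winningRule(Map<String, Pattern> rulesInOrder, String input) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : rulesInOrder.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // lookingAt() anchors at the start but does not require a full match;
            // the strict '>' keeps the earlier rule when lengths are equal
            if (m.lookingAt() && m.end() > bestLength) {
                best = rule.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }

    // Lexer rules from the question, in the order they appear in the grammar.
    static Map<String, Pattern> questionRules() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("ID", Pattern.compile("[a-zA-Z]+"));
        rules.put("INT", Pattern.compile("[0-9]"));
        rules.put("DBL", Pattern.compile("[0-9]+(\\.[0-9]+)?"));
        rules.put("PT", Pattern.compile("\\."));
        return rules;
    }

    public static void main(String[] args) {
        Map<String, Pattern> rules = questionRules();
        for (String text : new String[] {"193.2", "5.2", "6", "2.2"}) {
            System.out.println(text + " -> " + winningRule(rules, text));
        }
    }
}
```

In this imitation 193.2, 5.2 and 2.2 come out as DBL while 6 comes out as INT. With the redefinition above, the number is presumably assembled by a parser rule from INT and PT tokens, so a lone INT is accepted as well.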

ANTLR: Lexer rule accepting strictly one letter, and a token of multiple chars, instead of just one (Java)

I've written the below grammar for ANTLR parser and lexer for building trees for logical formulae and had a couple of questions if someone could help:
class AntlrFormulaParser extends Parser;
options {
buildAST = true;
}
biconexpr : impexpr (BICONDITIONAL^ impexpr)*;
impexpr : orexpr (IMPLICATION^ orexpr)*;
orexpr : andexpr (DISJUNCTION^ andexpr)*;
andexpr : notexpr (CONJUNCTION^ notexpr)*;
notexpr : (NEGATION^)? formula;
formula
: atom
| LEFT_PAREN! biconexpr RIGHT_PAREN!
;
atom
: CHAR
| TRUTH
| FALSITY
;
class AntlrFormulaLexer extends Lexer;
// Atoms
CHAR: 'a'..'z';
TRUTH: ('\u22A4' | 'T');
FALSITY: ('\u22A5' | 'F');
// Grouping
LEFT_PAREN: '(';
RIGHT_PAREN: ')';
NEGATION: ('\u00AC' | '~' | '!');
CONJUNCTION: ('\u2227' | '&' | '^');
DISJUNCTION: ('\u2228' | '|' | 'V');
IMPLICATION: ('\u2192' | "->");
BICONDITIONAL: ('\u2194' | "<->");
WHITESPACE : (' ' | '\t' | '\r' | '\n') { $setType(Token.SKIP); };
The tree grammar:
tree grammar AntlrFormulaTreeParser;
options {
tokenVocab=AntlrFormula;
ASTLabelType=CommonTree;
}
expr returns [Formula f]
: ^(BICONDITIONAL f1=expr f2=expr) {
$f = new Biconditional(f1, f2);
}
| ^(IMPLICATION f1=expr f2=expr) {
$f = new Implication(f1, f2);
}
| ^(DISJUNCTION f1=expr f2=expr) {
$f = new Disjunction(f1, f2);
}
| ^(CONJUNCTION f1=expr f2=expr) {
$f = new Conjunction(f1, f2);
}
| ^(NEGATION f1=expr) {
$f = new Negation(f1);
}
| CHAR {
$f = new Atom($CHAR.getText());
}
| TRUTH {
$f = Atom.TRUTH;
}
| FALSITY {
$f = Atom.FALSITY;
}
;
The problems I'm having with the above grammar are these:
The tokens, IMPLICATION and BICONDITIONAL, in the java code for AntlrFormulaLexer only seem to be checking for their respective first character (i.e. '-' and '<') to match the token, instead of the whole string, as specified in the grammar.
When testing the java code for AntlrFormulaParser, if I pass a string such as "~ab", it returns a tree of "(~ a)" (and a string "ab&c" returns just "a"), when it should really be returning an error/exception, since an atom can only have one letter according to the above grammar. It doesn't give any error/exception at all with these sample strings.
I'd really appreciate if someone could help me solve these couple of problems. Thank you :)
I would change the following definitions to:
IMPLICATION: ('\u2192' | '->');
BICONDITIONAL: ('\u2194' | '<->');
Note the single quotes: '->' rather than "->".
And to solve the error issue:
formula
: (
atom
| LEFT_PAREN! biconexpr RIGHT_PAREN!
) EOF
;
from here:
http://www.antlr.org/wiki/pages/viewpage.action?pageId=4554943
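The effect of requiring EOF at the end of the input can be pictured with a loose analogy in plain Java regular expressions. This is only a sketch and not ANTLR code: the class name, the method names and the toy pattern are made up for the illustration.

```java
import java.util.regex.Pattern;

// Hypothetical analogy, not ANTLR code: a regex stands in for the start rule.
final class EofAnchorDemo {

    // A toy "formula": an optional negation followed by exactly one lowercase letter.
    static final Pattern FORMULA = Pattern.compile("~?[a-z]");

    // Like a start rule without EOF: succeeds as soon as a valid prefix is found
    // and silently ignores whatever follows.
    static boolean parsesPrefix(String input) {
        return FORMULA.matcher(input).lookingAt();
    }

    // Like a start rule that ends in EOF: the whole input must be consumed.
    static boolean parsesWholeInput(String input) {
        return FORMULA.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(parsesPrefix("~ab"));     // true, the trailing 'b' is ignored
        System.out.println(parsesWholeInput("~ab")); // false, the trailing 'b' is an error
        System.out.println(parsesWholeInput("~a"));  // true
    }
}
```

This mirrors the behaviour described in the question, where "~ab" produced the tree "(~ a)" without any error.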
Fixed grammar that compiles against ANTLR 3.3 (save as AntlrFormula.g):
grammar AntlrFormula;
options {
output = AST;
}
program : formula ;
formula : atom | LEFT_PAREN! biconexpr RIGHT_PAREN! ;
biconexpr : impexpr (BICONDITIONAL^ impexpr)*;
impexpr : orexpr (IMPLICATION^ orexpr)*;
orexpr : andexpr (DISJUNCTION^ andexpr)*;
andexpr : notexpr (CONJUNCTION^ notexpr)*;
notexpr : (NEGATION^)? formula;
atom
: CHAR
| TRUTH
| FALSITY
;
// Atoms
CHAR: 'a'..'z';
TRUTH: ('\u22A4' | 'T');
FALSITY: ('\u22A5' | 'F');
// Grouping
LEFT_PAREN: '(';
RIGHT_PAREN: ')';
NEGATION: ('\u00AC' | '~' | '!');
CONJUNCTION: ('\u2227' | '&' | '^');
DISJUNCTION: ('\u2228' | '|' | 'V');
IMPLICATION: ('\u2192' | '->');
BICONDITIONAL: ('\u2194' | '<->');
WHITESPACE : (' ' | '\t' | '\r' | '\n') { $channel = HIDDEN; };
Link to the ANTLR 3.3 binary: http://www.antlr.org/download/antlr-3.3-complete.jar
You will need to match the program rule in order to match the complete file.
It can be tested with this class:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) {
AntlrFormulaLexer lexer = new AntlrFormulaLexer(new ANTLRStringStream("(~ab)"));
AntlrFormulaParser p = new AntlrFormulaParser(new CommonTokenStream(lexer));
try {
p.program();
if ( p.failed() || p.getNumberOfSyntaxErrors() != 0) {
System.out.println("failed");
}
} catch (RecognitionException e) {
e.printStackTrace();
}
}
}
