ANTLR Grammar line 1:6 mismatched input '<EOF>' expecting '.' - java

I am playing with ANTLR4 grammar files, and I wanted to write my own JSONPath grammar.
I've come up with this:
grammar ObjectPath;
objectPath : dnot;
dnot : ROOT expr ('.' expr)
| EOF
;
expr : select #selectExpr
| ID #idExpr
;
select : ID '[]' #selectAll
| ID '[' INT ']' #selectIndex
| ID '[' INT (',' INT)* ']' #selectIndexes
| ID '[' INT ':' INT ']' #selectRange
| ID '[' INT ':]' #selectFrom
| ID '[:' INT ']' #selectUntil
| ID '[-' INT ':]' #selectLast
| ID '[?(' query ')]' #selectQuery
;
query : expr (AND|OR) expr # andOr
| ALL # all
| QPREF ID # prop
| QPREF ID GT INT # gt
| QPREF ID LT INT # lt
| QPREF ID EQ INT # eq
| QPREF ID GTE INT # gte
| QPREF ID LTE INT # lte
;
/** Lexer **/
ROOT : '$.' ;
QPREF : '#.' ;
ID : [a-zA-Z][a-zA-Z0-9]* ;
INT : '0' | [1-9][0-9]* ;
AND : '&&' ;
OR : '||' ;
GT : '>' ;
LT : '<' ;
EQ : '==' ;
GTE : '>=' ;
LTE : '<=' ;
ALL : '*' ;
After running this on a simple expression:
CharStream input = CharStreams.fromString("$.name");
ObjectPathLexer lexer = new ObjectPathLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
ObjectPathParser parser = new ObjectPathParser(tokens);
ParseTree parseTree = parser.dnot();
ObjectPathDefaultVisitor visitor = ...
System.out.println(visitor.visit(parseTree));
System.out.println(parseTree.toStringTree(parser));
The output is OK, meaning that "name" is actually retrieved from the JSON, but there's a warning I cannot explain:
line 1:6 mismatched input '<EOF>' expecting '.'
I've read that I need to explicitly add an EOF to my start rule (dnot), but this doesn't seem to work.
Any idea what I can do?

Your input $.name cannot be parsed by your rule:
dnot : ROOT expr ('.' expr)
| EOF
;
$.name produces 2 tokens:
ROOT
ID
But your first alternative, ROOT expr ('.' expr), expects 2 expressions separated by a '.'. Perhaps you meant to make the trailing ('.' expr) part optional or repeatable, like this:
dnot : ROOT expr ('.' expr)*
| EOF
;
The EOF is generally added at the end of your start rule, to force the parser to consume all tokens. As you have it now, EOF is a separate alternative, so the parser successfully parsed ROOT expr, but then failed to parse further and produced the warning you saw (expecting '.').
Since objectPath seems to be your start rule, I think this is what you want to do:
objectPath : dnot EOF;
dnot : ROOT expr ('.' expr)?
;
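To illustrate what the trailing EOF buys you, here is a minimal hand-written sketch in plain Java. It is not what ANTLR generates: the class DnotSketch, its token names and its methods are made up for this illustration. It recognises ROOT ID ('.' ID)* and, separately, checks whether the next token is EOF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written illustration only; all names here are hypothetical.
final class DnotSketch {
    // Token shapes borrowed from the question: ROOT is "$.", ID is a name, DOT is ".".
    private static final Pattern TOKEN =
            Pattern.compile("(\\$\\.)|([a-zA-Z][a-zA-Z0-9]*)|(\\.)");

    // Turns the input into a list of token type names, always ending with "EOF".
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at index " + pos);
            }
            if (m.group(1) != null) {
                tokens.add("ROOT");
            } else if (m.group(2) != null) {
                tokens.add("ID");
            } else {
                tokens.add("DOT");
            }
            pos = m.end();
        }
        tokens.add("EOF");
        return tokens;
    }

    // Matches ROOT ID ('.' ID)* and returns the index of the first token it did NOT consume.
    static int parseDnot(List<String> tokens) {
        int i = expect(tokens, 0, "ROOT");
        i = expect(tokens, i, "ID");
        while (tokens.get(i).equals("DOT")) {
            i = expect(tokens, i + 1, "ID");
        }
        return i;
    }

    private static int expect(List<String> tokens, int i, String type) {
        if (!tokens.get(i).equals(type)) {
            throw new IllegalStateException(
                    "mismatched input " + tokens.get(i) + " expecting " + type);
        }
        return i + 1;
    }

    // The equivalent of "objectPath : dnot EOF;": the whole input must be consumed.
    static boolean acceptsWithEof(String input) {
        List<String> tokens = tokenize(input);
        return tokens.get(parseDnot(tokens)).equals("EOF");
    }

    public static void main(String[] args) {
        String input = "$.name$.x"; // a valid path followed by leftover tokens
        System.out.println("tokens consumed: " + parseDnot(tokenize(input)));
        System.out.println("accepted with EOF check: " + acceptsWithEof(input));
    }
}
```

In this sketch, parseDnot on its own is satisfied with the "$.name" prefix and never looks at the leftover tokens, while acceptsWithEof rejects the same input. That is the difference between a start rule with and without a trailing EOF.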
Also, tokens like '[]' and '[?(' look suspicious. I'm not really familiar with Object Path, but by gluing these characters together, input like [ ] (a [ and a ] separated by a space) will not be matched by '[]'. So if foo[ ] is valid, I'd write it like this instead:
select : ID '[' ']' #selectAll
| ...
and skip spaces in the lexer:
SPACES : [ \t\r\n]+ -> skip;


ANTLR "no viable alternative at input" Error for parsing SAS code If Then Else

I am new to ANTLR and working on a parser for SAS code that mainly consists of if/then/else-if statements. I have created the following grammar to parse the code, but I get errors in IntelliJ when I run it with a sample application.
Grammar created :
grammar SASDTModel;
parse
: if_block+
| score_block
;
//Model
// : If_block+
// | Score_block
// ;
if_block
: (if_statement|if_in_block)
| else_if_statement+
| else_statement
;
if_statement
: IF '(' if_condition ')' THEN Identifier'='Value ';'
| IF Identifier'='Value THEN Identifier'='Value ';'
;
else_if_statement
: ELSEIF '(' if_condition ')' THEN Identifier'='Value ';'
| ELSEIF Identifier'='Value THEN Identifier'='Value ';'
;
if_condition
: Value ComparisionOperators Identifier ComparisionOperators Value
| Value ComparisionOperators Value;
else_statement
: ELSE Identifier'='Value ';'
;
if_in_block
: IF Identifier IN '(' StringArray ')' THEN Identifier'='Value ';'
;
score_block
: Identifier'='Arithmetic_expression ';'
;
Arithmetic_expression:
| ( ArithmeticOperators '(' Value ')' )+
| ( ArithmeticOperators '(' Value ArithmeticOperators Identifier ')' )+
;
WS : ( ' ' | '\t' | '\r' | '\n' )-> channel(HIDDEN);
//WS : [ \t\n\r]+ -> channel(HIDDEN) ;
//WS : (' ' | '\t')+ -> channel(HIDDEN);
//COMMENT : '/*' .*? '*/' -> skip ;
//LINE_COMMENT : '*' ~[\r\n]* -> skip ;
ArithmeticOperators:
| '+'
| '-'
| '*'
| '/'
| '**'
;
ComparisionOperators
: '=='
| '<'
| '>'
| '<='
| '>='
;
IF: 'IF' | 'if' ;
ELSE: 'ELSE' | 'else' ;
ELSEIF: 'ELSE IF' | 'else if' ;
THEN: 'THEN' | 'then';
IN: 'IN' | 'in';
Value : INT
| DOUBLE
| '-'DOUBLE
| '-'INT
| Identifier
|'null';
INT : [0-9];
DOUBLE : INT+ PT INT+
| PT INT+
| INT+
;
PT : '.';
Identifier : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')* ;
StringArray : (('\'')(Value)('\''))+;
Input:
if scored = null then scored = -0.05;
else if ( 0 < scored <= 300 ) then scored = -0.5;
else if ( 300 < scored <= 500 ) then scored = -0.4;
else if ( 500 < scored <= 800 ) then scored = -0.8;
else if ( 800 < scored <= 1000 ) then scored = 0.9;
else if ( scored > 1000 ) then scored = 1.735409628;
else scored = 0;
Error I am getting
line 1:4 no viable alternative at input 'IF scored'
line 1:61 mismatched input '<=' expecting ')'
line 1:112 mismatched input '<=' expecting ')'
line 1:163 mismatched input '<=' expecting ')'
line 1:214 mismatched input '<=' expecting ')'
line 1:276 mismatched input 'scored' expecting Identifier
line 1:303 mismatched input 'scored' expecting Identifier
All the errors are reported on line 1 because I preprocess the SAS code, removing any comments and converting it into a single line.
So after preprocessing, the input is converted to the following:
IF scored = null THEN scored = -0.05;ELSE IF ( 0 < scored <= 300 )
THEN scored = -0.5;ELSE IF ( 300 < scored <= 500 ) THEN scored =
-0.4;ELSE IF ( 500 < scored <= 800 ) THEN scored = -0.8;ELSE IF ( 800 < scored <= 1000 ) THEN scored = 0.9;ELSE IF ( scored > 1000 ) THEN
scored = 1.735409628;ELSE scored = 0;
Here are a couple of things that might be causing problems:
by making StringArray : (('\'')(Value)('\''))+; a lexer rule, you will only match 'foo123mu' (values without spaces). You should make StringArray a parser rule (and then Value should also become a parser rule)
your else-if rule, ELSEIF: 'ELSE IF' | 'else if' ;, is rather fragile: whenever there are 2 or more spaces between ELSE and IF, the rule will not match. You should remove this rule and use the existing ELSE and IF rules in your parser rule(s)
the rules ArithmeticOperators and Arithmetic_expression match empty strings. Lexer rules must never match empty strings (the lexer can produce an infinite number of empty-string tokens)
the lexer rule Arithmetic_expression should be a parser rule: whenever lexer rules are used to "glue" other tokens to each other, you should "promote" them to parser rules
your naming convention for lexer rules is inconsistent: use either PascalCase or UPPER_CASE, not both
as already mentioned, INT : [0-9]; should be INT : [0-9]+; otherwise 4 would be tokenised as an INT and 42 as a DOUBLE
These are just a few of the things I saw while reading your question, so there may be more things incorrect. I suggest you first take the time to learn a bit more ANTLR before trying to write a SAS grammar. Or, better yet, try to find an existing (ANTLR) grammar for this language instead of writing your own.
Here's an existing one you could take a look at: https://github.com/xueqilsj/sas-grammar (no idea how accurate it is)
The syntax of your input is incorrect: == should be used instead of =.
UPDATE:
Also, although the syntax of INT and DOUBLE should work, it would be better expressed like so:
INT : [0-9]+;
DOUBLE : INT PT INT
| PT INT
| INT
;
otherwise, 300 would be identified as a DOUBLE, not as an INT.
UPDATE 2
As @Raven has commented:
INT : [0-9]+;
DOUBLE : INT PT INT
| PT INT
;
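The reason the order and shape of INT and DOUBLE matter is that the ANTLR lexer picks the rule that matches the most characters and, on a tie, the rule defined first. The following plain-Java sketch mimics only that selection step with regular expressions. The class MunchSketch and its methods are hypothetical, and it looks at INT and DOUBLE in isolation, ignoring the other lexer rules (such as Value) in the grammar.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration of "longest match wins, first rule wins on a tie"; names are hypothetical.
final class MunchSketch {
    // Returns "RULE:text" for the first token of the input.
    static String firstToken(Map<String, String> rules, String input) {
        String bestRule = null;
        int bestLength = 0;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            Matcher m = Pattern.compile(rule.getValue()).matcher(input);
            // Strict '>' keeps the earlier rule when two rules match the same length.
            if (m.lookingAt() && m.end() > bestLength) {
                bestRule = rule.getKey();
                bestLength = m.end();
            }
        }
        if (bestRule == null) {
            throw new IllegalArgumentException("no rule matches: " + input);
        }
        return bestRule + ":" + input.substring(0, bestLength);
    }

    // INT : [0-9];  DOUBLE : INT+ PT INT+ | PT INT+ | INT+ ;  (as in the question)
    static Map<String, String> originalRules() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("INT", "[0-9]");
        rules.put("DOUBLE", "[0-9]+\\.[0-9]+|\\.[0-9]+|[0-9]+");
        return rules;
    }

    // INT : [0-9]+;  DOUBLE : INT PT INT | PT INT ;  (as in UPDATE 2)
    static Map<String, String> fixedRules() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("INT", "[0-9]+");
        rules.put("DOUBLE", "[0-9]+\\.[0-9]+|\\.[0-9]+");
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(firstToken(originalRules(), "42"));
        System.out.println(firstToken(fixedRules(), "300"));
        System.out.println(firstToken(fixedRules(), "0.05"));
    }
}
```

With the original pair of rules, a multi-digit number such as 42 can only be matched in full by DOUBLE, so DOUBLE wins on length. With the rules from UPDATE 2, only INT can match 300 and only DOUBLE can match 0.05.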
I have completed my grammar and resolved all the errors thanks to @Bart, @Seelenvirtuose and @Maurice.
Following is the ANTLR grammar for parsing SAS If Else and simple Assignment statements.
grammar SASDTModel;
parse : block+ EOF;
block
: if_block+ # oneOrMoreIfBlock
| assignment_block+ # assignmentBlocks
;
if_block
: if_statement (else_if_statement)* else_statement?
;
/*nested_if_else_statement
: If if_condition Then Do? ';'? if_statement (else_if_statement)* else_statement? End? ';'?
;*/
if_statement
: If '('? if_condition ')'? Then if_block # nestedIfStatement
| If '('? if_condition ')'? Then expression Equal expression ';' # ifStatement
| If expression In '(' expression_list+ ')' Then expression Equal expression ';' # ifInBlock
;
else_if_statement
: Else If '('? if_condition ')'? Then expression Equal expression ';' # elseIf
| Else If expression In '(' expression_list+ ')' Then expression Equal expression ';' # elseIfInBlock
;
if_condition
: Identifier (Equal|ComparisionOperators) Quote? expression+ Quote? # equalCondition
| expression # expressionCondition
| expression equals_to_null # checkIfNull
| expression op=(And|Or) expression # andOrExpression
;
/*if_range_condition
: expression ComparisionOperators expression ComparisionOperators expression
;*/
else_statement
: Else expression Equal expression ';'
;
assignment_block
: Identifier Equal Identifier '(' function_parameter ')' ';' # functionCall
| Identifier Equal expression expression* ';' # assignValue
;
expression
: Value # value
| Identifier # identifier
| SignedFloat # signedFloat
| '(' expression ')' # expressionBracket
| expression '(' expression_list? ')' # expressionBracketList
| Not expression # notExpression
| expression (Min|Max) expression # minMaxExpression
| expression op=('*'|'/') expression # mulDivideExpression
| expression op=('+'|'-') expression # addSubtractExpression
| expression ('||' | '!!' ) expression # orOperatorExpression
| expression ComparisionOperators expression ComparisionOperators expression # inRangeExpression
| expression ComparisionOperators Quote? expression+ Quote? # ifPlainCondition
| expression (Equal|ComparisionOperators) Quote {_input.get(_input.index() -1).getType() == WS}? Quote # ifSpaceStringCondition
| expression Equal expression # equalExpression
;
expression_list
: Quote? expression+ Quote? Comma? # generalExpressionList
| Quote ({_input.get(_input.index() -1).getType() == WS}?)? Quote Comma? # spaceString
;
function_parameter
: expression+
;
equals_to_null : Equal Pt ;
/*ArithmeticOperators
: '+'
| '-'
| '*'
| '/'
| '**'
;*/
Equal : '=' ;
ComparisionOperators
: '<'
| '>'
| '<='
| '>='
;
And : '&' | 'and';
Or
: '|'
| '!'
;
Not
: '^'
| '~'
;
Min : '><';
Max : '<>';
If
: 'IF'
| 'if'
| 'If'
;
Else
: 'ELSE'
| 'else'
| 'Else';
Then
: 'THEN'
| 'then'
| 'Then'
;
In : 'IN' | 'in';
Do : 'do' | 'Do';
End : 'end' | 'END';
Value
: Int
| DOUBLE
| '-'DOUBLE
| '-'Int
| SignedFloat
| 'null';
Int : [0-9]+;
SignedFloat
: UnaryOperator? UnsignedFloat
;
MUL : '*' ; // assigns token name to '*' used above in grammar
DIV : '/' ;
ADD : '+' ;
SUB : '-' ;
DOUBLE
: Int Pt Int
| Pt Int
| Int
;
Pt : '.';
UnaryOperator
: '+'
| '-'
;
UnsignedFloat
: ('0'..'9')+ '.' ('0'..'9')* Exponent?
| '.' ('0'..'9')+ Exponent?
| ('0'..'9')+ Exponent
;
Exponent : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
Comma : ',';
Quote
: '\''
| '"'
;
Identifier : [a-zA-Z_] [a-zA-Z_0-9]* ;
WS : ( ' ' | '\t' | '\r' | '\n' )-> channel(HIDDEN);

ANTLR: Parse a date within a quote string

I have a problem figuring out how to parse a date in my grammar.
The thing is that it shares its definition with a String, but according to the Antlr 4 documentation, it should follow the precedence by looking at the order of declaration.
Here is my grammar:
grammar formula;
/* entry point */
parse: expr EOF;
expr
: value # argumentArithmeticExpr
| l=expr operator=('*'|'/'|'%') r=expr # multdivArithmeticExpr // TODO: test the % operator
| l=expr operator=('+'|'-') r=expr # addsubtArithmeticExpr
| '-' expr # minusArithmeticExpr
| FUNCTION_NAME '(' (expr ( ',' expr )* ) ? ')'# functionExpr
| '(' expr ')' # parensArithmeticExpr
;
value
: number
| variable
| date
| string
| bool;
/* Atomes */
bool
: BOOL
;
variable
: '[' (~(']') | ' ')* ']'
;
date
: DQUOTE date_format DQUOTE
| QUOTE date_format QUOTE
;
date_format
: year=INT '-' month=INT '-' day=INT (hour=INT ':' minutes=INT ':' seconds=INT)?
;
string
: STRING_LITERAL
;
number
: ('+'|'-')? NUMERIC_LITERAL
;
/* lexemes de base */
QUOTE : '\'';
DQUOTE : '"';
MINUS : '-';
COLON : ':';
DOT : '.';
PIPE : '|';
BOOL : T R U E | F A L S E;
FUNCTION_NAME: IDENTIFIER ;
IDENTIFIER
: [a-zA-Z_] [a-zA-Z_0-9]* // TODO: do we more chars in this set?
;
NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )? // ex: 0.05e3
| '.' DIGIT+ ( E [-+]? DIGIT+ )? // ex: .05e3
;
INT: DIGIT+;
STRING_LITERAL
: '\'' ( ~'\'' | '\'\'' )* '\''
| '"' ( ~'"' | '""' )* '"'
;
WS: [ \t\n]+ -> skip;
UNEXPECTED_CHAR: . ;
fragment DIGIT: [0-9];
fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');
The important part here is this:
value
: number
| variable
| date
| string
| bool;
date
: DQUOTE date_format DQUOTE
| QUOTE date_format QUOTE
;
date_format
: year=INT '-' month=INT '-' day=INT (hour=INT ':' minutes=INT ':' seconds=INT)?
;
My grammar expects these things:
"a quoted string" -> gives a string
"2015-03 TOTOTo" -> gives a string because the date format doesn't match.
"2015-03-15" -> gives a date because it matches DQUOTE INT '-' INT '-' INT DQUOTE
And I (tried?) to make sure that the parser tries to match a date before trying to match a string: value: ...| date | string| ....
But when I use the grun utility (and my unit tests...), I can see that it categorizes the date as a string, as if it never bothered to check the date format.
Can you tell me why it is so?
I suspect there's a catch with the order in which I declare my grammar rules, but I tried some permutations and didn't get anything.
The problem is that the lexer tokenizes the input without any knowledge of the parser rules: conceptually, it runs to completion before any of the parser rules are considered.
That means, the STRING_LITERAL lexer rule will consume all strings, dates included, and output just STRING_LITERAL tokens. The date and related parser subrules are never even considered by the parser.
Perhaps the minimal solution is to modify the STRING_LITERAL lexer rule to
STRING_LITERAL
: { notDateString() }?
( QUOTE .*? QUOTE
| DQUOTE .*? DQUOTE
)
;
The notDateString predicate requires native code to perform the essential disambiguation between date formats and other strings.
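As a rough idea of what that native code could look like, here is a sketch of the date check in plain Java. Everything in it is an assumption: the class DateStringSketch is hypothetical, and unlike the zero-argument notDateString() above, this helper takes the candidate text (including its quotes) as a String, so how a lexer predicate would obtain that text from its character stream is left open. The pattern mirrors the date_format rule but only allows whitespace between the date and the time part.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of ANTLR and not the original author's code.
final class DateStringSketch {
    // A quote, INT-INT-INT, optionally followed by INT:INT:INT, then the same quote again.
    private static final Pattern QUOTED_DATE = Pattern.compile(
            "(['\"])\\d+-\\d+-\\d+(?:\\s+\\d+:\\d+:\\d+)?\\1");

    // True when the quoted text should be treated as an ordinary string, not a date.
    static boolean notDateString(String quotedText) {
        return !QUOTED_DATE.matcher(quotedText).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "\"a quoted string\"",
            "\"2015-03 TOTOTo\"",
            "\"2015-03-15\"",
            "'2015-03-15 10:30:00'"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> notDateString = " + notDateString(sample));
        }
    }
}
```

The same check could also be applied after lexing, to the text of each STRING_LITERAL token, if changing the lexer rule is not an option.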
Another alternative is to promote the STRING_LITERAL rule entirely to the parser. Doable, but a bit messy depending on whether there is a need to preserve whitespace within 'real' strings.
BTW, you may wish to add a token stream dump to your standard series of unit tests.

How to integrate the generated lexer/parser from Antlr4 into my java project

Please bear with me, I'm not a coding expert.
I built a grammar in ANTLR4 using ANTLRWorks 2. I tested the grammar with various test strings and it works fine there. What I'm having trouble with is using the generated lexer and parser in my own code. As the code generation target I'm using Java.
Here is the code I'm trying:
String s = "query(std::map .find(x) == y): bla";
ANTLRInputStream input = new ANTLRInputStream(s);
TokenStream tokens = new CommonTokenStream(new pqlcLexer(input));
pqlcParser parser = new pqlcParser(tokens);
ParseTree tree = parser.query();
System.out.println(tree.toStringTree());
The output of that is just "query", which is my starting rule. I would expect something like the output from ANTLRWorks:
"(query (quant_expr query ( (match std::map . find ( (cm x) ) == (cm (numeral 256))) ) : (query (qexpr bla))))"
Here is the tree visually: http://puu.sh/94Nlx/00dc35bb05.png
Which methods do I have to call to get the proper syntax tree as output?
Here is the generated Parser for reference: http://pastebin.com/Lb34TyRW and the grammar:
// Lexer
// Keywords
EXISTS: 'exists';
REDUCE: 'reduce';
QUERY: 'query';
INT: 'int';
DOUBLE: 'double';
CONST: 'const';
STDVECTOR: 'std::vector';
STDMAP: 'std::map';
STDSET: 'std::set';
INTEGER_LITERAL : (DIGIT)+ ;
fragment DIGIT: '0'..'9';
DOUBLE_LITERAL : DIGIT '.' DIGIT+;
LPAREN : '(';
RPAREN : ')';
LBRACK : '[';
RBRACK : ']';
DOT : '.';
EQUAL : '==';
LE : '<=';
GE : '>=';
GT : '>';
LT : '<';
ADD : '+';
MUL : '*';
AND : '&&';
COLON : ':';
IDENTIFIER : JavaLetter JavaLetterOrDigit*;
fragment JavaLetter : [a-zA-Z$_]; // these are the "java letters" below 0xFF
fragment JavaLetterOrDigit : [a-zA-Z0-9$_]; // these are the "java letters or digits" below 0xFF
WS
: [ \t\r\n\u000C]+ -> skip
;
COMMENT
: '/*' .*? '*/' -> skip
;
LINE_COMMENT
: '//' ~[\r\n]* -> skip
;
// Parser
//start_rule: query;
query :
quant_expr
| qexpr+
| IDENTIFIER // order IDENTIFIER and qexpr+?
| numeral
//| c_expr TODO
;
c_type : INT | DOUBLE | CONST;
bin_op: AND | ADD | MUL | EQUAL | LT | GT | LE| GE;
qexpr:
LPAREN query RPAREN bin_op_query?
// query bin_op query
| IDENTIFIER bin_op_query? // copied from query to resolve left recursion problem
| numeral bin_op_query? // ^
| quant_expr bin_op_query? // ^
// query.find(query)
| IDENTIFIER find_query? // copied from query to resolve left recursion problem
| numeral find_query? // ^
| quant_expr find_query?
// query[query]
| IDENTIFIER array_query? // copied from query to resolve left recursion problem
| numeral array_query? // ^
| quant_expr array_query?
// | qexpr bin_op_query // bad, resolved by quexpr+ in query
;
bin_op_query: bin_op query bin_op_query?; // resolve left recursion of query bin_op query
find_query: '.''find' LPAREN query RPAREN;
array_query: LBRACK query RBRACK;
quant_expr:
quant id ':' query
| QUERY LPAREN match RPAREN ':' query
| REDUCE LPAREN IDENTIFIER RPAREN id ':' query
;
match:
STDVECTOR LBRACK id RBRACK EQUAL cm
| STDMAP '.''find' LPAREN cm RPAREN EQUAL cm
| STDSET '.''find' LPAREN cm RPAREN
;
cm:
IDENTIFIER
| numeral
// | c_expr TODO
;
quant :
EXISTS;
id :
c_type IDENTIFIER
| IDENTIFIER // According to page 2, but not the overview. According to the overview id -> but then rule 1 would be without +
;
numeral :
INTEGER_LITERAL
| DOUBLE_LITERAL
;
Apart from the fact that Java class names should start with an uppercase letter (so you should rename your grammar so that it starts with an uppercase letter), your last line should be
System.out.println(tree.toStringTree(parser));
to print the tree. Otherwise the tree doesn't know which parser (and thus which rule names) to use and only outputs what you described.
EDIT
When naming your grammar PQLC the following code
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;
public class Test {
public static void main(String[] args) throws Exception {
String query = "query(std::map .find(x) == y): bla";
ANTLRInputStream input = new ANTLRInputStream(query);
PQLCLexer lexer = new PQLCLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
PQLCParser parser = new PQLCParser(tokens);
ParseTree tree = parser.query(); // begin parsing at query rule
System.out.println(tree.toStringTree(parser)); // print LISP-style tree
}
}
produces this output with ANTLR v4.2 on my machine:
(query (quant_expr query ( (match std::map . find ( (cm x) ) == (cm y)) ) : (query (qexpr bla))))

Antlr4 - Parser for multi line file

I'm trying to use ANTLR4 to parse the output of an SSH command, but I cannot figure out why this grammar doesn't work; I keep getting an "extraneous input" error.
Here is a sample of the file I'm trying to parse :
system
home[1] HOME-NEW
sp
cpu[1]
cpu[2]
home[2] SECOND-HOME
sp
cpu[1]
cpu[2]
Here is my grammar file :
listAll
: ( system | home | NL)*
;
elements
: (sp | cpu )*
;
home
: 'home[' number ']' value NL elements
;
system
: 'system' NL
;
sp
: 'sp' NL
;
cpu
: 'cpu[' number ']' NL
;
value
: VALUE
;
number
: INT
;
VALUE : STRING+;
STRING: ('a'..'z'|'A'..'Z'| '-' | ' ' | '(' | ')' | '/' | '.' | '[' | ']');
INT : ('0'..'9')+ ;
NL : '\r'? '\n';
WS : (' '|'\t')* {skip();} ;
The entry point is 'listAll'.
Here is the result I get :
(listAll \r\n (system system \r\n) home[1] HOME-NEW \r\n sp \r\n cpu[1] \r\n cpu[2] \r\n[...])
The parsing failed after 'system'. And I get this error :
line 2:1 extraneous input 'home[1] HOME-NEW' expecting {, system', NL, WS}
Does anybody know why this is not working ?
I am a beginner with Antlr, and I'm not sure I really understand how it works !
Thank you all !
You need to combine NL and WS into a single WS rule and skip it using -> skip (not {skip()}).
Since WS will be skipped automatically, there is no need to mention it in the parser rules.
Also, your STRING rule included a space (' '), which let VALUE swallow the following input and caused the error.
Here is your complete grammar :
listAll : ( system | home )* ;
elements : ( sp | cpu )* ;
home : 'home[' number ']' value elements;
system : 'system' ;
sp : 'sp' ;
cpu : 'cpu[' number ']' ;
value : VALUE ;
number : INT ;
VALUE : STRING+;
STRING : ('a'..'z'|'A'..'Z'| '-' | '(' | ')' | '/' | '.' | '[' | ']') ;
INT : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
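To see why the space inside STRING matters, here is a small plain-Java sketch that mimics only the greedy matching of VALUE with a regular expression. The class GreedyValueSketch is hypothetical and this is not ANTLR's full token selection; it just shows how far a greedy character class reaches with and without the space.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only; the class and method names are hypothetical.
final class GreedyValueSketch {
    // Character sets copied from the two versions of the STRING rule, repeated like VALUE : STRING+.
    static final Pattern WITH_SPACE = Pattern.compile("[a-zA-Z\\-()/.\\[\\] ]+");
    static final Pattern WITHOUT_SPACE = Pattern.compile("[a-zA-Z\\-()/.\\[\\]]+");

    // Returns the text a greedy VALUE-like rule would take from the start of the input.
    static String consumed(Pattern value, String input) {
        Matcher m = value.matcher(input);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        String rest = "] HOME-NEW"; // what follows "home[1" on the second line of the sample
        System.out.println("with space:    '" + consumed(WITH_SPACE, rest) + "'");
        System.out.println("without space: '" + consumed(WITHOUT_SPACE, rest) + "'");
    }
}
```

With the space in the set, the match runs across the blank and takes the closing bracket together with the name. Since the lexer prefers the longest match, the parser then never sees a separate ']' token. Without the space, the match stops at the blank.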
I'd also suggest going through the ANTLR4 documentation.

ANTLR: Syntax Errors are ignored when running parser programmatically

I am currently creating a more or less simple expression evaluator using ANTLR.
My grammar is straightforward (at least I hope so) and looks like this:
grammar SXLGrammar;
options {
language = Java;
output = AST;
}
tokens {
OR = 'OR';
AND = 'AND';
NOT = 'NOT';
GT = '>'; //greater than
GE = '>='; //greater than or equal
LT = '<'; //lower than
LE = '<='; //lower than or equal
EQ = '=';
NEQ = '!='; //Not equal
PLUS = '+';
MINUS = '-';
MULTIPLY = '*';
DIVISION = '/';
CALL;
}
@header {
package somepackage;
}
@members {
}
@lexer::header {
package rise.spics.sxl;
}
rule
: ('='|':')! expression
;
expression
: booleanOrExpression
;
booleanOrExpression
:
booleanAndExpression ('OR'^ booleanAndExpression)*
;
booleanAndExpression
:
booleanNotExpression ('AND'^ booleanNotExpression)*
;
booleanNotExpression
:
('NOT'^)? booleanAtom
;
booleanAtom
:
| compareExpression
;
compareExpression
:
commonExpression (('<' | '>' | '=' | '<=' | '>=' | '!=' )^ commonExpression)?
;
commonExpression
:
multExpr
(
(
'+'^
| '-'^
)
multExpr
)*
| DATE
;
multExpr
:
atom (('*'|'/')^ atom)*
| '-'^ atom
;
atom
:
INTEGER
| DECIMAL
| BOOLEAN
| ID
| '(' expression ')' -> expression
| functionCall
;
functionCall
:
ID '(' arguments ')' -> ^(CALL ID arguments?)
;
arguments
:
(expression) (','! expression)*
| WS
;
BOOLEAN
:
'true'
| 'false'
;
ID
:
(
'a'..'z'
| 'A'..'Z'
)+
;
INTEGER
:
('0'..'9')+
;
DECIMAL
:
('0'..'9')+ ('.' ('0'..'9')*)?
;
DATE
:
'!' '0'..'9' '0'..'9' '0'..'9' '0'..'9' '-' '0'..'9' '0'..'9' '-' '0'..'9' '0'..'9' (' ' '0'..'9' '0'..'9' ':''0'..'9' '0'..'9' (':''0'..'9' '0'..'9')?)?
;
WS
: (' '|'\t' | '\n' | '\r' | '\f')+ { $channel = HIDDEN; };
Now if I try to parse an invalid expression like "= true NOT true", the graphical test tool of the Eclipse plugin throws a NoViableAltException: line 1:6 no viable alternative at input 'NOT', which is correct and expected.
But if I try to parse the expression in a Java program, nothing happens. The program
String expression = "=true NOT false";
CharStream input = new ANTLRStringStream(expression);
SXLGrammarLexer lexer = new SXLGrammarLexer(input);
TokenStream tokenStream = new CommonTokenStream(lexer);
SXLGrammarParser parser = new SXLGrammarParser(tokenStream);
CommonTree tree = (CommonTree) parser.rule().getTree();
System.out.println(tree.toStringTree());
System.out.println(parser.getNumberOfSyntaxErrors());
would output:
true
0
That means the AST created by the parser consists of one node and ignores the rest. I'd like to handle syntax errors in my application, but that's not possible if the generated parser doesn't find any errors.
I also tried to alter the parser by overriding the displayRecognitionError() method with something like this:
public void displayRecognitionError(String[] tokenNames,
RecognitionException e) {
String msg = getErrorMessage(e, tokenNames);
throw new RuntimeException("Error at position "+e.index+" " + msg);
}
but displayRecognitionError never gets called.
If I try something like "=1+", an error gets displayed. I guess there's something wrong with my grammar, but why does the Eclipse plugin throw that error while the generated parser does not?
If you want rule to consume the entire token-stream, you have to specify where you expect the end of your input. Like this:
rule
: ('='|':')! expression EOF
;
Without the EOF, your parser reads the true as a boolean and ignores the rest.
