Only recognizing one token in antlr4 grammar - java

I want my grammar to recognize the following expression &COL[0]. I have built the following grammar:
array:
ARRAY_NAME L_RIGHT_PAR (ARRAY_DIGIT|STRING) R_RIGHT_PAR;
ARRAY_DIGIT:DIGIT+;
ARRAY_NAME: '&''COL';
STRING : QUOT ('\\"' | ~'"')* QUOT
;
L_RIGHT_PAR : '[' ;
R_RIGHT_PAR : ']' ;
fragment
DIGIT : '0'..'9' ;
This gives the error:
mismatched input '[1]' expecting '['
It only works if I write &COL[ 0], with whitespace inside the brackets.

I changed the grammar a bit to make it complete enough to run. The text &COL[0] lexes fine with this amended grammar.
grammar test1; // different name for my test rig
test1: ARRAY_NAME L_RIGHT_PAR (ARRAY_DIGIT|STRING) R_RIGHT_PAR;
ARRAY_DIGIT:DIGIT+;
ARRAY_NAME: '&''COL';
STRING : QUOT ('\\"' | ~'"')* QUOT
;
QUOT: '"'; // assumed this
L_RIGHT_PAR : '[' ;
R_RIGHT_PAR : ']' ;
fragment
DIGIT : '0'..'9' ;
WS : [ \t\r\n] -> skip; // added whitespace just so I could add \r\n
Here's the tokenized output:
[#0,0:3='&COL',<ARRAY_NAME>,1:0]
[#1,4:4='[',<'['>,1:4]
[#2,5:5='0',<ARRAY_DIGIT>,1:5]
[#3,6:6=']',<']'>,1:6]
[#4,9:8='<EOF>',<EOF>,2:0]
That answers the question as asked, although I'm still not sure about your definition of STRING. In any case, &COL[0] now lexes and parses correctly.

Related

Xtend Syntax Content Assist

I'm trying to implement content assist in my RCP application. For that, I'm using Xtend and AbstractJavaBasedContentProposalProvider. So, I made my AbstratMyDSLProposalProvider and now I'm writing the MyDSLProposalProvider class. Below are the Xtend file and an extract of my grammar:
//Xtend file
override void completeKeyword(Keyword keyword, ContentAssistContext contentAssistContext, ICompletionProposalAcceptor acceptor) {
//acceptor.accept(createCompletionProposal(keyword, context))
if(keyword.getValue().equals("const")){
return;
}
super.completeKeyword(keyword, contentAssistContext, acceptor);
}
// Grammar File
grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals
generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"
Model:
NEWLINE*
(sections+=Options_sect?)?
(sections+=Parameters_sect)?
;
Options_sect
: name=SEC_OPTIONS QUOTE_COMMENT? NEWLINE+ suiteOpt=Suite_options?
;
Suite_options
: {Suite_options} INDENT (options+=Opt)* DEDENT NEWLINE?
;
Opt
: name=OPTION_NAME EQUAL (value=DECIMALINTEGER) NEWLINE+
;
Parameters_sect
: name=SEC_PARAMETERS QUOTE_COMMENT? NEWLINE+ suiteParam=Suite_parameters?
;
Suite_parameters
: {Suite_parameters} INDENT (params+=Param)* DEDENT NEWLINE?
;
Param
: CONST name=NAME EQUAL value=DECIMALINTEGER NEWLINE+
;
terminal SEC_OPTIONS : 'options'SPACES*':';
terminal SEC_PARAMETERS : 'parameters'SPACES*':';
terminal EQUAL : '=';
terminal DECIMALINTEGER : '0'|('1'..'9'(('_'|'0'..'9')*'0'..'9')?);
terminal NAME
: ( ( PP_LABEL* ID_START ID_CONTINUE* PP_LABEL* ) | PP_LABEL )( '.' (PP_LABEL|ID_CONTINUE)* )*
;
terminal PP_LABEL
: '%'ID_START ID_CONTINUE*'%'
;
terminal fragment ID_START
: '_'
| 'A'..'Z'
| 'a'..'z'
;
terminal fragment ID_CONTINUE
: ID_START
| '0'..'9'
;
terminal OPTION_NAME : '$'NAME;
terminal CONST : 'const';
terminal NEWLINE : ((NLINE SPACES?)+);
terminal fragment NLINE:( '\r'? '\n' | '\r' );
terminal SPACES: (' '|'\t')+;
terminal QUOTE_COMMENT : INVERTED_COMMA -> INVERTED_COMMA;
terminal INVERTED_COMMA : '\"';
// Indentation
terminal INDENT :'µµµ';
terminal DEDENT : '£££';
But the content assist doesn't work. Is this the right way to implement content assist in Xtext?
Thank you
You have to override the complete method specific to your terminal rule, complete_CONST, not completeKeyword. If you go to the place in your class where you would write the new method, the editor offers proposals for the methods you can override.

ANTLR: Parse a date within a quote string

I have a problem figuring out how to parse a date in my grammar.
The thing is that it shares its definition with a String, but according to the Antlr 4 documentation, it should follow the precedence by looking at the order of declaration.
Here is my grammar:
grammar formula;
/* entry point */
parse: expr EOF;
expr
: value # argumentArithmeticExpr
| l=expr operator=('*'|'/'|'%') r=expr # multdivArithmeticExpr // TODO: test the % operator
| l=expr operator=('+'|'-') r=expr # addsubtArithmeticExpr
| '-' expr # minusArithmeticExpr
| FUNCTION_NAME '(' (expr ( ',' expr )* ) ? ')'# functionExpr
| '(' expr ')' # parensArithmeticExpr
;
value
: number
| variable
| date
| string
| bool;
/* Atomes */
bool
: BOOL
;
variable
: '[' (~(']') | ' ')* ']'
;
date
: DQUOTE date_format DQUOTE
| QUOTE date_format QUOTE
;
date_format
: year=INT '-' month=INT '-' day=INT (hour=INT ':' minutes=INT ':' seconds=INT)?
;
string
: STRING_LITERAL
;
number
: ('+'|'-')? NUMERIC_LITERAL
;
/* lexemes de base */
QUOTE : '\'';
DQUOTE : '"';
MINUS : '-';
COLON : ':';
DOT : '.';
PIPE : '|';
BOOL : T R U E | F A L S E;
FUNCTION_NAME: IDENTIFIER ;
IDENTIFIER
: [a-zA-Z_] [a-zA-Z_0-9]* // TODO: do we more chars in this set?
;
NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )? // ex: 0.05e3
| '.' DIGIT+ ( E [-+]? DIGIT+ )? // ex: .05e3
;
INT: DIGIT+;
STRING_LITERAL
: '\'' ( ~'\'' | '\'\'' )* '\''
| '"' ( ~'"' | '""' )* '"'
;
WS: [ \t\n]+ -> skip;
UNEXPECTED_CHAR: . ;
fragment DIGIT: [0-9];
fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');
The important part here is this:
value
: number
| variable
| date
| string
| bool;
date
: DQUOTE date_format DQUOTE
| QUOTE date_format QUOTE
;
date_format
: year=INT '-' month=INT '-' day=INT (hour=INT ':' minutes=INT ':' seconds=INT)?
;
My grammar expects these things:
"a quoted string" -> gives a string
"2015-03 TOTOTo" -> gives a string because the date format doesn't match.
"2015-03-15" -> gives a date because it matches DQUOTE INT '-' INT '-' INT DQUOTE
And I (tried?) to make sure that the parser tries to match a date before trying to match a string: value: ...| date | string| ....
But when I use the grun utility (and my unit tests...), I can see that it categorizes the date as a string, as if it never bothered to check the date format.
Can you tell me why it is so?
I suspect there's a catch with the order in which I declare my grammar rules, but I tried some permutations and didn't get anything.
The problem stems from the failure to understand that the lexer runs to completion before any of the parser rules are effectively considered.
That means, the STRING_LITERAL lexer rule will consume all strings, dates included, and output just STRING_LITERAL tokens. The date and related parser subrules are never even considered by the parser.
Perhaps the minimal solution is to modify the STRING_LITERAL lexer rule to
STRING_LITERAL
: { notDateString() }?
( QUOTE .*? QUOTE
| DQUOTE .*? DQUOTE
)
;
The notDateString predicate requires native code to perform the essential disambiguation between date formats and other strings.
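A minimal sketch of the kind of check that could sit behind such a predicate, written as a plain Java helper that works on the text of a quoted literal. The class name DateStringCheck, the method signatures, and the regular expression are assumptions made for illustration; the answer above does not specify them. The pattern mirrors the date_format rule and assumes the optional time part is separated from the date by whitespace. The same helper could also be used after lexing, to reclassify STRING_LITERAL tokens in a visitor.

```java
import java.util.regex.Pattern;

class DateStringCheck {
    // Mirrors date_format: INT '-' INT '-' INT (INT ':' INT ':' INT)?
    // Assumption: the optional time part follows the date after whitespace.
    private static final Pattern DATE_BODY =
            Pattern.compile("\\d+-\\d+-\\d+(\\s+\\d+:\\d+:\\d+)?");

    /** Returns true if the quoted literal (quotes included) has a date-shaped body. */
    static boolean isDateString(String literal) {
        if (literal == null || literal.length() < 2) {
            return false;
        }
        char open = literal.charAt(0);
        char close = literal.charAt(literal.length() - 1);
        // Both delimiters must be the same quote character.
        if (open != close || (open != '"' && open != '\'')) {
            return false;
        }
        String body = literal.substring(1, literal.length() - 1);
        return DATE_BODY.matcher(body).matches();
    }

    /** Counterpart using the name from the predicate above (hypothetical signature). */
    static boolean notDateString(String literal) {
        return !isDateString(literal);
    }

    public static void main(String[] args) {
        System.out.println(isDateString("\"2015-03-15\""));      // true
        System.out.println(isDateString("\"2015-03 TOTOTo\""));  // false
        System.out.println(isDateString("\"a quoted string\"")); // false
    }
}
```

Inside a real lexer predicate the text is not yet available as a String, so the check would have to look ahead in the input stream instead; this sketch only shows the classification logic.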
Another alternative is to promote the STRING_LITERAL rule entirely to the parser. Doable, but a bit messy depending on whether there is a need to preserve whitespaces within 'real' strings.
BTW, you may wish to add a token stream dump to your standard series of unit tests.

ANTLR4-based lexer loses syntax highlighting during typing on NetBeans

I've written a simple lexer and parser using ANTLR4 grammars to build a language plugin for NetBeans 7.3, to help our team write layout files more quickly. These files mix XHTML with widget definitions, which also take the form of XHTML tags but have custom properties and characteristics and differ in some ways from XHTML syntax.
Template file example:
<div style="dyn_layout_panel">
#symbol#
<w_label=label, text="Try to close this window" />
<w_buttonclose=button, text = "CLOSE", on_press=press_close />
<w_buttonterminate=button, text="TERMINATE", on_press=press_terminate />
<w_mydatepicker=datepicker, parent=tab0, ary=[10, "str", /regex/i], start_date=2013-10-05, on_selected=datepicker_selected />
<w_myeditbox=editbox, parent=tab0, validation=USER_REGEX, validation_regex=/^[0-9]+[a-z]*$/i,
validation_msg="User regex don't match editbox contents.", on_keyreturn=tab0_editbox_keyreturn />
<div style="dyn_layout_panel">
$SYMBOL_2$
Some text that make a text node.
</div>
</div>
I use ANTLRWorks 2 to write and debug the lexer and parser, and everything seems fine. In NetBeans I don't get any exceptions either and the parser works properly, but while editing or typing I lose token colors near the cursor.
Screenshot of problem:
After adding debug console output for each keystroke, I can see that the lexer enters IN_TAG or IN_WIDGET mode correctly, but after a WHITESPACE it returns to the default mode and matches the rest of the text inside the tag as a TEXT_NODE token.
I know that a lexer can have only one active mode at a time, so why does it match the TEXT_NODE rule when it is in IN_TAG or IN_WIDGET mode?
Lexer grammar file:
lexer grammar LayoutLexer;
COMMENT
: '/*' .*? '*/' -> channel(HIDDEN)
;
WS : ( ' '
| '\t'
| EOL
)+? -> channel(HIDDEN)
;
WDG_START_OPEN : '<w_' PROPERTY -> pushMode(IN_WIDGET) ;
WDG_END_OPEN : '</w_' PROPERTY -> pushMode(IN_WIDGET) ;
TAG_START_OPEN : '<' ATTRIBUTE -> pushMode(IN_TAG) ;
TAG_END_OPEN : '</' ATTRIBUTE -> pushMode(IN_TAG) ;
EXT_REF
: ( ('#' REF_NAME '#') | ('$' SYMBOL '$') | ('§' REF_NAME '§') )
;
fragment
REF_NAME
: ( [a-z]+ [0-9a-z_]*? )
;
fragment
EOL : ( '\r\n' | '\n\r' | '\n' )
;
EQUAL
: '='
;
TEXT_NODE
: ( (~('\r'|'\n'|'<'|'#'|'$'|'§'))+ )
;
ERROR
: ( .+? )
;
mode IN_TAG;
TAG_CLOSE : '>' -> popMode ;
TAG_EMPTY_CLOSE : '/>' -> popMode ;
TAG_WS : WS -> type(WS), channel(HIDDEN) ;
TAG_COMMENT : COMMENT -> type(COMMENT), channel(HIDDEN) ;
TAG_EQ : EQUAL -> type(EQUAL) ;
ATTRIBUTE
: ( LITERAL [0-9a-zA-Z_]* )
;
VAL
: ( '"' ( ESC_SEQ | ~('\\'|'"') )*? '"'
| '\'' ( ESC_SEQ | ~('\\'|'\'') )*? '\'' )
;
TAG_ERR : ERROR -> type(ERROR) ;
mode IN_WIDGET;
WDG_CLOSE : '>' -> popMode ;
WDG_EMPTY_CLOSE : '/>' -> popMode ;
WDG_WS : WS -> type(WS), mode(IN_WIDGET), channel(HIDDEN) ;
WDG_COMMENT : COMMENT -> type(COMMENT), channel(HIDDEN) ;
WDG_EQ : EQUAL -> type(EQUAL), pushMode(WDG_ASSIGN) ;
COMMA
: ','
;
fragment
MINUS
: '-'
;
STRING
: ( '"' ( ESC_SEQ | ~('\\'|'"') )*? '"'
| '\'' ( ESC_SEQ | ~('\\'|'\'') )*? '\'' )
;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UNICODE_ESC
| OCTAL_ESC
;
fragment
OCTAL_ESC
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UNICODE_ESC
: '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
;
fragment
HEX_DIGIT
: [0-9a-fA-F]
;
fragment
DIGIT
: [0-9]
;
fragment
HEX_NUMBER
: '0x' HEX_DIGIT+
;
fragment
HTML_NUMBER
: (INT_NUMBER | FLOAT_NUMBER) HTML_UNITS
;
fragment
FLOAT_NUMBER
: MINUS? INT_NUMBER '.' DIGIT+
;
fragment
INT_NUMBER
: MINUS? DIGIT+
;
EVENT_HANDLER
: 'on_' PROPERTY
;
PROPERTY
: ( LITERAL [0-9a-zA-Z_]* )
;
fragment
LITERAL
: ( LITERAL_U | LITERAL_L )
;
fragment
LITERAL_U
: [A-Z]+
;
fragment
LITERAL_L
: [a-z]+
;
WDG_ERR : ERROR -> type(ERROR) ;
mode WDG_ASSIGN;
PHP_REF
: ( LITERAL_L ('_' | LITERAL_L | [0-9])* ) -> popMode
;
VALUE : (WDG_VAL | ARRAY) -> popMode;
ASGN_WS : WS -> type(WS), channel(HIDDEN);
ASGN_COMMA : COMMA -> type(COMMA);
ARY_START
: '['
;
ARY_END
: ']'
;
BIT_OR
: '|'
;
ARRAY
: ARY_START ARY_VALUE (ASGN_COMMA ARY_VALUE)* ARY_END
;
fragment
ARY_VALUE : ASGN_WS? WDG_VAL ASGN_WS? -> type(VALUE);
fragment
WDG_VAL
: (STRING
| UTC_DATE
| HEX_NUMBER
| HTML_NUMBER
| FLOAT_NUMBER
| INT_NUMBER
| BOOLEAN
| BITFIELD
| REGEX
| CSS_CLASS)
;
fragment
HTML_UNITS
: ('%'|'in'|'cm'|'mm'|'em'|'ex'|'pt'|'pc'|'px')
;
fragment
BOOLEAN
: ('true'|'false')
;
fragment
BITFIELD
: SYMBOL (WS? BIT_OR WS? SYMBOL)*
;
SYMBOL
: LITERAL_U [0-9A-Z_]*
;
UTC_DATE
: (DIGIT DIGIT DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT)
;
REGEX
: ('/' ('\\'.|.)*? '/' ('g'|'m'|'i')* )
;
CSS_CLASS
: ( LITERAL_L ('-' | '_' | LITERAL_L | [0-9])* )
;
WDG_ASSIGN_ERR : ERROR -> type(ERROR), popMode;
Parser grammar file:
parser grammar LayoutParser;
options
{
tokenVocab=LayoutLexer;
language=Java;
}
document : (element | TEXT_NODE | EXT_REF)* EOF;
element
locals
[
String currentTag
]
: ( ( html_open_tag (element | TEXT_NODE | EXT_REF)* html_close_tag )
| ( wdg_open_tag (element | TEXT_NODE | EXT_REF)* wdg_close_tag )
| ( html_empty_tag | wdg_empty_tag ) )
;
html_empty_tag
: TAG_START_OPEN (ATTRIBUTE EQUAL VAL)* TAG_EMPTY_CLOSE
;
html_open_tag
: ( tag=TAG_START_OPEN (ATTRIBUTE EQUAL VAL)* TAG_CLOSE )
{$element::currentTag = $tag.text.substring(1);}
;
html_close_tag
: tag=TAG_END_OPEN TAG_CLOSE
{
if (!$element::currentTag.equals($tag.text.substring(2)))
notifyErrorListeners("HTML tag mismatch '" + $element::currentTag + "' - '" + $tag.text.substring(2) + "'");
}
;
wdg_empty_tag
: WDG_START_OPEN EQUAL PHP_REF ( COMMA (wdg_prop | wdg_event) )* WDG_EMPTY_CLOSE
;
wdg_open_tag
: tag=WDG_START_OPEN EQUAL PHP_REF ( COMMA (wdg_prop | wdg_event) )* WDG_CLOSE
{$element::currentTag = $tag.text.substring(1);}
;
wdg_close_tag
: tag=WDG_END_OPEN WDG_CLOSE
{
if (!$element::currentTag.equals($tag.text.substring(2)))
notifyErrorListeners("Widget alias mismatch '" + $element::currentTag + "' - '" + $tag.text + "'");
}
;
wdg_prop
: PROPERTY (EQUAL (ARRAY | VALUE | PHP_REF | UTC_DATE | REGEX | CSS_CLASS))?
;
wdg_event
: EVENT_HANDLER EQUAL PHP_REF
;
Depending on the implementation of syntax highlighting, the IDE may or may not start at the beginning of the document when lexing the input for syntax highlighting. If it does not start at the beginning of the document, then before returning any tokens, you need to ensure that the lexer instance is initialized in the correct mode (both the _mode and _modeStack fields need to be initialized to their correct state at the point where lexing starts).
If your lexer reads or writes any custom fields during lexing, you may need to restore those fields as well.
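The save-and-restore idea can be sketched without any ANTLR or NetBeans dependency. In the sketch below, ToyModeLexer is a hypothetical stand-in for a generated lexer (its mode and modeStack fields play the role of _mode and _modeStack), and ModeSnapshot is an assumed name for the state object. The value-style equals and hashCode are included on the assumption that the host editor may compare states; that is not confirmed here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/** Immutable snapshot of a lexer's current mode and its mode stack. */
final class ModeSnapshot {
    final int mode;
    final List<Integer> modeStack; // bottom of the stack first

    ModeSnapshot(int mode, List<Integer> modeStack) {
        this.mode = mode;
        // Defensive copy: later pushes/pops on the live lexer must not leak in.
        this.modeStack = Collections.unmodifiableList(new ArrayList<>(modeStack));
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ModeSnapshot)) {
            return false;
        }
        ModeSnapshot other = (ModeSnapshot) o;
        return mode == other.mode && modeStack.equals(other.modeStack);
    }

    @Override
    public int hashCode() {
        return Objects.hash(mode, modeStack);
    }
}

/** Hypothetical stand-in for a generated lexer; only the mode machinery is modeled. */
class ToyModeLexer {
    static final int DEFAULT_MODE = 0;
    static final int IN_TAG = 1;

    int mode = DEFAULT_MODE;
    final List<Integer> modeStack = new ArrayList<>();

    void pushMode(int newMode) {
        modeStack.add(mode); // remember where to return to
        mode = newMode;
    }

    void popMode() {
        mode = modeStack.remove(modeStack.size() - 1);
    }

    ModeSnapshot snapshot() {
        return new ModeSnapshot(mode, modeStack);
    }

    void restore(ModeSnapshot saved) {
        mode = saved.mode;
        modeStack.clear();
        modeStack.addAll(saved.modeStack);
    }

    public static void main(String[] args) {
        ToyModeLexer first = new ToyModeLexer();
        first.pushMode(IN_TAG);                 // e.g. just after "<div"
        ModeSnapshot saved = first.snapshot();  // state at the restart point

        ToyModeLexer resumed = new ToyModeLexer(); // fresh instance, mid-document
        resumed.restore(saved);
        System.out.println(resumed.mode == IN_TAG);       // true
        resumed.popMode();                                // e.g. on ">"
        System.out.println(resumed.mode == DEFAULT_MODE); // true
    }
}
```

Without the restore call, the fresh instance would start in the default mode and lex the remainder of the tag with the default-mode rules.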
Examples
GoWorks (NetBeans based, LGPL License). This implementation does not use the lexer facilities in the NetBeans API, but instead implements the functionality at a lower level. For now you can ignore the MarkOccurrences* and SemanticHighlighter classes.
package org.tvl.goworks.editor.go.highlighter
package org.antlr.works.editor.antlr4.highlighting
ANTLR 4 IntelliJ Plugin (IntelliJ IDEA, BSD license).
package org.antlr.intellij.adaptor.lexer
package org.antlr.intellij.plugin (in particular, the SyntaxHighlighter classes)
Additional efficiency notes
Your REF_NAME, VAL, and STRING rules use non-greedy loops that do not need to be non-greedy. In each of these rules, change +? to + and change *? to *.
Your WS and ERROR rules use a non-greedy operator +? which is equivalent to not having a closure at all. The unnecessary use of a non-greedy operator in these cases only serves to slow down your lexer. To preserve the existing behavior, you can remove +? from these rules (replacing with + would change behavior).
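The effect of a trailing non-greedy loop can be illustrated with java.util.regex, used here purely as an analogy: it is not ANTLR's lexer, but a reluctant quantifier with nothing after it behaves the same way and stops after the minimum number of repetitions. The class and method names are made up for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonGreedyDemo {
    /** Returns the prefix of input matched by regex at position 0, or null if none. */
    static String matchPrefix(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(matchPrefix(".+?", "abc")); // a   (reluctant: stops after one char)
        System.out.println(matchPrefix(".", "abc"));   // a   (no closure: same result)
        System.out.println(matchPrefix(".+", "abc"));  // abc (greedy: different result)
    }
}
```

This is why dropping +? keeps the behavior unchanged, while switching to + would not.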
Additional functionality notes
ANTLR 4 does not perform any error correction during lexing. If the input does not match a token, then the input simply does not match a token. This issue affects your VAL and STRING tokens in particular, which will not get syntax highlighting prior to adding the closing " or ' character. For syntax highlighting these types of tokens, I prefer to use an additional mode in the lexer, allowing me to produce separate tokens for the escape sequences embedded in the string, as well as syntax highlighting an unterminated string at the end of the line (unless your language allows strings to span multiple lines, in which case you'd stop at the end of the input).
For future reference
All the problems came from my incorrect implementation of the NetBeans Lexer<T> class. Many tutorials on the web do not take into account that a lexer may have more than one mode, and that the lexer state must be saved and restored between Lexer allocations and releases, as mentioned by 280Z28.
This is the code I use to make syntax highlighting consistent:
import org.antlr.v4.runtime.Token;
import org.antlr.v4.runtime.misc.IntegerStack;
import org.netbeans.spi.lexer.Lexer;
import org.netbeans.spi.lexer.LexerRestartInfo;

// AntlrCharStream, ErrorListener, LayoutLexer, LayoutTokenId and
// LayoutLanguageHierarchy are classes from the plugin project itself.
public class LayoutEditorLexer implements Lexer<LayoutTokenId> {
    private LexerRestartInfo<LayoutTokenId> info;
    private LayoutLexer lexer;

    // Snapshot of the ANTLR lexer's current mode and mode stack.
    private class LexerState {
        public int Mode = -1;
        public IntegerStack Stack = null;

        public LexerState(int mode, IntegerStack stack)
        {
            Mode = mode;
            Stack = new IntegerStack(stack); // copy, so later pushes/pops don't alter the snapshot
        }
    }

    public LayoutEditorLexer(LexerRestartInfo<LayoutTokenId> info) {
        this.info = info;
        AntlrCharStream charStream = new AntlrCharStream(info.input(), "LayoutEditor", false);
        lexer = new LayoutLexer(charStream);
        lexer.removeErrorListeners();
        lexer.addErrorListener(ErrorListener.INSTANCE);
        // Restore the mode and mode stack saved at the point where lexing restarts.
        LexerState lexerMode = (LexerState)info.state();
        if (lexerMode != null)
        {
            lexer._mode = lexerMode.Mode;
            lexer._modeStack.addAll(lexerMode.Stack);
        }
    }

    @Override
    public org.netbeans.api.lexer.Token<LayoutTokenId> nextToken() {
        Token token = lexer.nextToken();
        int ttype = token.getType();
        if (ttype != LayoutLexer.EOF)
        {
            LayoutTokenId tokenId = LayoutLanguageHierarchy.getToken(ttype);
            return info.tokenFactory().createToken(tokenId);
        }
        return null;
    }

    @Override
    public Object state()
    {
        // Many tutorials simply return null here.
        return new LexerState(lexer._mode, lexer._modeStack);
    }

    @Override
    public void release()
    {
    }
}

Antlr4 - Parser for multi line file

I'm trying to use antlr4 to parse the output of an SSH command, but I cannot figure out why this code doesn't work; I keep getting an "extraneous input" error.
Here is a sample of the file I'm trying to parse :
system
home[1] HOME-NEW
sp
cpu[1]
cpu[2]
home[2] SECOND-HOME
sp
cpu[1]
cpu[2]
Here is my grammar file :
listAll
: ( system | home | NL)*
;
elements
: (sp | cpu )*
;
home
: 'home[' number ']' value NL elements
;
system
: 'system' NL
;
sp
: 'sp' NL
;
cpu
: 'cpu[' number ']' NL
;
value
: VALUE
;
number
: INT
;
VALUE : STRING+;
STRING: ('a'..'z'|'A'..'Z'| '-' | ' ' | '(' | ')' | '/' | '.' | '[' | ']');
INT : ('0'..'9')+ ;
NL : '\r'? '\n';
WS : (' '|'\t')* {skip();} ;
The entry point is 'listAll'.
Here is the result I get :
(listAll \r\n (system system \r\n) home[1] HOME-NEW \r\n sp \r\n cpu[1] \r\n cpu[2] \r\n[...])
The parsing failed after 'system'. And I get this error :
line 2:1 extraneous input 'home[1] HOME-NEW' expecting {, system', NL, WS}
Does anybody know why this is not working?
I am a beginner with Antlr, and I'm not sure I really understand how it works!
Thank you all!
You need to combine NL and WS into a single WS rule and skip it using -> skip (not {skip()}).
Since WS is then skipped automatically, there is no need to mention it in any of the parser rules.
Also, your STRING rule included a space (' '), which caused the error: the lexer always prefers the longest match, so VALUE could run across spaces and swallow input that belonged to the following tokens.
Here is the complete grammar:
listAll : ( system | home )* ;
elements : ( sp | cpu )* ;
home : 'home[' number ']' value elements;
system : 'system' ;
sp : 'sp' ;
cpu : 'cpu[' number ']' ;
value : VALUE ;
number : INT ;
VALUE : STRING+;
STRING : ('a'..'z'|'A'..'Z'| '-' | '(' | ')' | '/' | '.' | '[' | ']') ;
INT : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
I'd also suggest going through the ANTLR 4 documentation.
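The role of the stray space in STRING can be sketched with a small longest-match tokenizer in plain Java. This is not ANTLR and does not reproduce its exact output; the class LongestMatchDemo, the rule table, and the regular expressions are assumptions that only model the relevant rules: literals from parser rules come first, the longest match wins, and ties go to the earlier rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestMatchDemo {
    /** Builds an ordered rule table; spaceInValue toggles the ' ' inside VALUE. */
    static Map<String, String> rules(boolean spaceInValue) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("'home['", "home\\[");
        rules.put("']'", "\\]");
        rules.put("VALUE", "[a-zA-Z\\-()/.\\[\\]" + (spaceInValue ? " " : "") + "]+");
        rules.put("INT", "[0-9]+");
        rules.put("WS", "[ \\t\\r\\n]+");
        return rules;
    }

    /** At each position the longest match wins; ties go to the earlier rule. */
    static List<String> tokenize(Map<String, String> rules, String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestLen = 0;
            for (Map.Entry<String, String> rule : rules.entrySet()) {
                Matcher m = Pattern.compile(rule.getValue()).matcher(input);
                m.region(pos, input.length());
                // Strictly longer only, so the earlier rule keeps a tie.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestName = rule.getKey();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (!bestName.equals("WS")) { // WS is dropped, like "-> skip"
                tokens.add(bestName + "(" + input.substring(pos, pos + bestLen) + ")");
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "home[1] HOME-NEW";
        System.out.println(tokenize(rules(true), line));
        System.out.println(tokenize(rules(false), line));
    }
}
```

With the space allowed in VALUE, the sketch yields 'home['(home[), INT(1), VALUE(] HOME-NEW): the closing bracket is swallowed by VALUE, so a rule such as home can no longer match. Without the space it yields 'home['(home[), INT(1), ']'(]), VALUE(HOME-NEW).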

ANTLR generating invalid java exceptions throws code

I've been using ANTLRWorks 1.5 recently, together with the ANTLR runtime 3.5. Here is a weird thing I found:
ANTLR is generating this kind of Java code for me:
public final BLABLABLAParser.addExpression_return addExpression() throws {
blablabla...
}
Notice that the throws clause lists no exception, which is invalid in Java, so I have to correct these mistakes manually.
Does anyone know why?
Here is the sample grammar; it's taken directly from the book Language Implementation Patterns.
// START: header
grammar Cymbol; // my grammar is called Cymbol
options {
output = AST;
ASTLabelType = CommonTree;
}
tokens{
METHOD_DECL;
ARG_DECL;
BLOCK;
VAR_DECL;
CALL;
ELIST;
EXPR;
}
// define a SymbolTable field in generated parser
compilationUnit // pass symbol table to start rule
: (methodDeclaration | varDeclaration)+ // recognize at least one variable declaration
;
// END: header
methodDeclaration
: type ID '(' formalParameters? ')' block
-> ^(METHOD_DECL type ID formalParameters? block)
;
formalParameters
: type ID (',' type ID)* -> ^(ARG_DECL type ID)+
;
// START: type
type
: 'float'
| 'int'
| 'void'
;
// END: type
block : '{' statement* '}' -> ^(BLOCK statement*)
;
// START: decl
varDeclaration
: type ID ('=' expression)? ';' -> ^(VAR_DECL type ID expression?)// E.g., "int i = 2;", "int i;"
;
// END: decl
statement
: block
| varDeclaration
| 'return' expression? ';' -> ^('return' expression?)
| postfixExpression
(
'=' expression -> ^('=' postfixExpression expression)
| -> ^(EXPR postfixExpression)
) ';'
;
expressionList
: expression(',' expression)* -> ^(ELIST expression+)
| -> ELIST
;
expression
: addExpression -> ^(EXPR addExpression)
;
addExpression
: postfixExpression('+'^ postfixExpression)*
;
postfixExpression
: primary (lp='('^ expressionList ')'! {$lp.setType(CALL);})*
;
// START: primary
primary
: ID // reference variable in an expression
| INT
| '(' expression ')' -> expression
;
// END: primary
// LEXER RULES
ID : LETTER (LETTER | '0'..'9')*
;
fragment
LETTER : ('a'..'z' | 'A'..'Z')
;
INT : '0'..'9'+
;
WS : (' '|'\r'|'\t'|'\n') {$channel=HIDDEN;}
;
SL_COMMENT
: '//' ~('\r'|'\n')* '\r'? '\n' {$channel=HIDDEN;}
;
Edit: This is a bug in ANTLRWorks 1.5 that has already been fixed for the next release.
#5: ANTLRworks fails to generate proper Java Code
I used the exact configuration you described above, with a copy/pasted grammar. The signature generated for the rule you mention was the following:
// $ANTLR start "addExpression"
// C:\\dev\\Cymbol.g:72:1: addExpression : postfixExpression ( '+' ^ postfixExpression )* ;
public final CymbolParser.addExpression_return addExpression() throws RecognitionException {
Can you post the first line of the generated file? It should start with // $ANTLR 3.5 like the following:
// $ANTLR 3.5 C:\\dev\\Cymbol.g 2013-02-13 09:55:44
