I downloaded latest release of ANTLR - 4.2.2 (antlr-4.2.2-complete.jar)
When I use it to generate parsers for grammar file Java.g4 it prints me some warnings like:
"Java.g4:525:16: rule 'expression' contains an 'assoc' terminal option in an unrecognized location"
Files was generated but didn't compile
Previous version works fine.
Whats wrong?
The <assoc> should now be moved left of the "expression".
It must be placed always right to the surrounding |:
Look here: https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Left-recursive+rules
...
| expression '&&' expression
| expression '||' expression
| expression '?' expression ':' expression
|<assoc=right> expression
( '='
| '+='
| '-='
| '*='
| '/='
| '&='
| '|='
| '^='
| '>>='
| '>>>='
| '<<='
| '%='
)
expression
Related
I use antlr JavaParser.
expression
: primary
| expression bop='.'
(
identifier
| methodCall
| THIS
| NEW nonWildcardTypeArguments? innerCreator
| SUPER superSuffix
| explicitGenericInvocation
)
| expression '[' expression ']'
| methodCall
| NEW creator
| '(' annotation* typeType ('&' typeType)* ')' expression
| expression postfix=('++' | '--')
| prefix=('+'|'-'|'++'|'--') expression
| prefix=('~'|'!') expression
| expression bop=('*'|'/'|'%') expression
| expression bop=('+'|'-') expression
| expression ('<' '<' | '>' '>' '>' | '>' '>') expression
| expression bop=('<=' | '>=' | '>' | '<') expression
| expression bop=INSTANCEOF (typeType | pattern)
| expression bop=('==' | '!=') expression
| expression bop='&' expression
| expression bop='^' expression
| expression bop='|' expression
| expression bop='&&' expression
| expression bop='||' expression
| <assoc=right> expression bop='?' expression ':' expression
| <assoc=right> expression
bop=('=' | '+=' | '-=' | '*=' | '/=' | '&=' | '|=' | '^=' | '>>=' | '>>>=' | '<<=' | '%=')
expression
But some illegal expressions cannot be detected directly.
Such as 3 > 3 ? 3 : 45 : 45, 2 > 3 > 4.
I wonder how the java syntax detects such illegal expressions, it's extra coding work, or the ability to use antlr itself.
If it's extra coding work, how to do it?
Note that from 3 > 3 ? 3 : 45 : 45 only 3 > 3 ? 3 : 45 is recognized as the expression (which is correct). If you'd test the parser with the extra rule:
single_expression
: expression EOF
;
then the input 3 > 3 ? 3 : 45 : 45 will not be valid.
As for 2 > 3 > 4: it is syntactically correct. To validate if it is semantically correct (which is not the work for a parser!), you'll need to implement type checking yourself. ANTLR's listener- and visitor classes can help, but the bulk of the work will have to be done by yourself. Also see How to implement type checking in a Listener
first of all, thank you in advance for your answer, this problem is killing me
My first question is how can ignore certain text?
I wanna ignore certain text from my document, I have the next text:
And I wanna ignore the text enclosed by the rectangle...when the lexer find the "demandante" word it will stop to ignore...
I used this grammar
grammar A;
documento:((acciondemandante acciondemandado) | (acciondemandado acciondemandante));
acciondemandante: PALABRASDEMANDA informacionentidad+;
acciondemandado: PALABRASDEMANDADO informacionentidad+;
informacionentidad: nombres distancia? identificacion;
nombres: nombrenormal|nombremayuscula;
nombrenormal: WORDCAPITALIZE WORDCAPITALIZE+;
nombremayuscula: WORDUPPER WORDUPPER+;
distancia: WORDLOWER;
identificacion: tipo indicador? INT+;
tipo: cedula | NIT;
cedula: CEDULA | LCASE_LETTER LCASE_LETTER | UCASE_LETTER UCASE_LETTER;
indicador: WORDCAPITALIZE | WORDLOWER;
CEDULA: 'cedula' | 'cc' | 'CC';
NIT: 'NIT' | 'nit';
PALABRASDEMANDADO: 'demandados' | 'demandado';
PALABRASDEMANDA: 'demandante' | 'demandantes';
WORDUPPER: UCASE_LETTER UCASE_LETTER+;
WORDLOWER: LCASE_LETTER LCASE_LETTER+;
WORDCAPITALIZE: UCASE_LETTER LCASE_LETTER+;
LCASE_LETTER: 'a'..'z' | 'ñ' | 'á' | 'é' | 'í' | 'ó' | 'ú';
UCASE_LETTER: 'A'..'Z' | 'Ñ' | 'Á' | 'É' | 'Í' | 'Ó' | 'Ú';
INT: DIGIT+;
DIGIT: '0'..'9';
SPECIAL_CHAR: '.' -> skip;
WS : [ \t\r\n]+ -> skip;
//ANY: ~[ ]+;
I have tried a trick skipping the whitespaces WS : [ \t\r\n]+ -> skip; and then ignoring what is not whitespaces ANY: ~[ ]+; But it does not work because the lexer never recognize the ANY token...
What I would like my grammar to read
bullshit bullshit demandado Julian Solarte c.c 120109321 bullshit bullshit
My second problem is that I get the "mismatched input ''" problem, and in order to resolve this problem I add this rule "SKIPEND: EOF ->skip;" but it does not works...
Thank you thank you so much.
My approach to this problem would be 2 steps:
Find the keyword in the input stream (here demandado).
Let a parser parse from this position without forcing an EOF for the input in the grammar. It will go as far as possible ignoring everything it doesn't understand after what was understood.
This will make your grammar much simpler and you will get a parse tree only for the relevant input.
I am trying to parse the following:
SELECT name-of-key[random text]
This is part of a larger grammar which I am trying to construct. I left it our for clarity.
I came up with the following rules:
select : 'select' NAME '[' anything ']'
;
anything : (ANYTHING | NAME)+
;
NAME : ('a'..'z' | 'A'..'Z' | '0'..'9' | '-' | '_')+
;
ANYTHING : (~(']' | '['))+
;
WHITESPACE : ('\t' | ' ' | '\r' | '\n')+ -> skip
;
This doesn't seem to work. For example, input SELECT a[hello world!] gives the following error:
line 1:0 mismatched input 'SELECT a' expecting 'SELECT'
This goes wrong because the input SELECT a is recognized by ANYTHING, instead of select. How do I fix that? I feel that I am missing some concept(s) here, but it is difficult to get started.
Maybe the concept you are missing is rule priority.
[1] Lexer rules matching the longest possible string have priority.
As you mentioned, the ANYTHING token rule above matches "select a", which is longer than what the (implicit) token rule 'select' matches, hence its precedence. Non-greedy behaviour is indicated by a question mark.
ANYTHING : (~(']' | '['))+?
Just making the ANYTHING rule non-greedy doesn't completely solve your problem though, because after properly matching 'select', the lexer will produce an ANYTHING token for the space, because ...
[2] Lexer rules appearing first have priority.
Switching lexer rules WHITE_SPACE and ANYTHING fixes this. The grammar below should parse your example.
select : 'select' NAME '[' anything ']'
;
anything : (ANYTHING | NAME)+
;
NAME : ('a'..'z' | 'A'..'Z' | '0'..'9' | '-' | '_')+
;
WHITESPACE : ('\t' | ' ' | '\r' | '\n')+ -> skip
;
ANYTHING : (~(']' | '['))+?
;
I personally avoid implicit token rules, especially if your grammar is complex, precisely because of token rule priority. I would thus write this.
SELECT : 'select' ;
L_BRACKET : '[';
R_BRACKET : ']';
NAME : ('a'..'z' | 'A'..'Z' | '0'..'9' | '-' | '_')+ ;
WHITESPACE : ('\t' | ' ' | '\r' | '\n')+ -> skip ;
ANY : . ;
select : SELECT NAME L_BRACKET anything R_BRACKET ;
anything : (~R_BRACKET)+ ;
Also note that the space in "hello world" will be swallowed by the WHITESPACE rule. To properly manage this, you need ANTLR island grammars.
'Hope this helps!
Hey I have a quick question. I am using ANTLRworks to create an interpreter in Java from a set of grammar. I was going to write it out by hand but then realized I didn't have to because of antlrworks. I am getting this error though
T.g:9:23: label ID conflicts with token with same name
Is ANTLRworks the way to go when creating a interpreter from grammar. And do y'all see any error in my code?
I am trying to make ID one letter from a-z and not case sensitive. and to have white space in between every lexeme. THANK YOU
grammar T;
programs : ID WS compound_statement;
statement:
if_statement|assignment_statement|while_statement|print_statement|compound_statement;
compound_statement: 'begin' statement_list 'end';
statement_list: statement|statement WS statement_list;
if_statement: 'if' '(' boolean_expression ')' 'then' statement 'else' statement;
while_statement: 'while' boolean_expression 'do' statement;
assignment_statement: ID = arithmetic_expression;
print_statement: 'print' ID;
boolean_expression: operand relative_op operand;
operand : ID |INT;
relative_op: '<'|'<='|'>'|'>='|'=='|'/=';
arithmetic_expression: operand|operand WS arithmetic_op WS operand;
arithmetic_op: '+'|'-'|'*'|'/';
ID : ('a'..'z'|'A'..'Z'|'_').
;
INT : '0'..'9'+
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
and here is the grammar
<program> → program id <compound_statement>
<statement> → <if_statement> | <assignment_statement> | <while_statement> |
<print_statement> | <compound_statement>
<compound_statement> → begin <statement_list> end
<statement_list> → <statement> | <statement> ; <statement_list>
<if_statement> → if <boolean_expression> then <statement> else <statement>
<while_statement> → while <boolean_expression> do <statement>
<assignment_statement> -> id := <arithmetic_expression>
<print_statement> → print id
<boolean_expression> → <operand> <relative_op> <operand>
<operand> → id | constant
<relative_op> → < | <= | > | >= | = | /=
<arithmetic_expression> → <operand> | <operand> <arithmetic_op> <operand>
<arithmetic_op> → + | - | * | /
Is ANTLRworks the way to go when creating a interpreter from grammar.
No.
ANTLRWorks can only be used to write your grammar and possibly test to see if it input properly (through its debugger or interpreter). It cannot be used to create an interpreter for the language you've written the grammar for. ANTLRWorks is just a fancy text-editor, nothing more.
And do y'all see any error in my code?
As indicated by Treebranch: you didn't have quotes around the = sign in:
assignment_statement: ID = arithmetic_expression;
making ANTLR "think" you wanted to assign the label ID to the parser rule arithmetic_expression, which is illegal: you can't have a label-name that is also the name of a rule (ID, in your case).
Some possible issues in your code:
I think you want your ID tag to have a + regex so that it can be of length 1 or more, like so:
ID : ('a'..'z'|'A'..'Z'|'_')+
;
It also looks like you are missing quotes around your = sign:
assignment_statement: ID '=' arithmetic_expression;
EDIT
Regarding your left recursion issue: ANTLR is very powerful because of the regex functionality. While an EBNF (like the one you have presented) may be limited in the way it can express things, ANTLR can be used to express certain grammar rules in a much simpler way. For instance, if you want to have a statement_list in your compound_statement, just use your statement rule with closure (*). Like so:
compound_statement: 'begin' statement* 'end';
Suddently, you can remove unnecessary rules like statement_list.
What are the rules for Ant path style patterns.
The Ant site itself is surprisingly uninformative.
Ant-style path patterns matching in spring-framework:
The mapping matches URLs using the following rules:
? matches one character
* matches zero or more characters
** matches zero or more 'directories' in a path
{spring:[a-z]+} matches the regexp [a-z]+ as a path variable named "spring"
Some examples:
com/t?st.jsp - matches com/test.jsp but also com/tast.jsp or com/txst.jsp
com/*.jsp - matches all .jsp files in the com directory
com/**/test.jsp - matches all test.jsp files underneath the com path
org/springframework/**/*.jsp - matches all .jsp files underneath the org/springframework path
org/**/servlet/bla.jsp - matches org/springframework/servlet/bla.jsp but also org/springframework/testing/servlet/bla.jsp and org/servlet/bla.jsp
com/{filename:\\w+}.jsp will match com/test.jsp and assign the value test to the filename variable
http://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/util/AntPathMatcher.html
I suppose you mean how to use path patterns
If it is about whether to use slashes or backslashes these will be translated to path-separators on the platform used during execution-time.
Most upvoted answer by #user11153 using tables for a more readable format.
The mapping matches URLs using the following rules:
+-----------------+---------------------------------------------------------+
| Wildcard | Description |
+-----------------+---------------------------------------------------------+
| ? | Matches exactly one character. |
| * | Matches zero or more characters. |
| ** | Matches zero or more 'directories' in a path |
| {spring:[a-z]+} | Matches regExp [a-z]+ as a path variable named "spring" |
+-----------------+---------------------------------------------------------+
Some examples:
+------------------------------+--------------------------------------------------------+
| Example | Matches: |
+------------------------------+--------------------------------------------------------+
| com/t?st.jsp | com/test.jsp but also com/tast.jsp or com/txst.jsp |
| com/*.jsp | All .jsp files in the com directory |
| com/**/test.jsp | All test.jsp files underneath the com path |
| org/springframework/**/*.jsp | All .jsp files underneath the org/springframework path |
| org/**/servlet/bla.jsp | org/springframework/servlet/bla.jsp |
| also: | org/springframework/testing/servlet/bla.jsp |
| also: | org/servlet/bla.jsp |
| com/{filename:\\w+}.jsp | com/test.jsp & assign value test to filename variable |
+------------------------------+--------------------------------------------------------+
ANT Style Pattern Matcher
Wildcards
The utility uses three different wildcards.
+----------+-----------------------------------+
| Wildcard | Description |
+----------+-----------------------------------+
| * | Matches zero or more characters. |
| ? | Matches exactly one character. |
| ** | Matches zero or more directories. |
+----------+-----------------------------------+
As #user11153 mentioned, Spring's AntPathMatcher implements and documents the basics of Ant-style path pattern matching.
In addition, Java 7's nio APIs added some built in support for basic pattern matching via FileSystem.getPathMatcher