ANTLRworks creating interpreter from grammar - java

Hey, I have a quick question. I am using ANTLRWorks to create an interpreter in Java from a grammar. I was going to write it out by hand, but then realized I didn't have to because of ANTLRWorks. I am getting this error though:
T.g:9:23: label ID conflicts with token with same name
Is ANTLRWorks the way to go when creating an interpreter from a grammar? And do y'all see any error in my code?
I am trying to make ID a single letter from a-z, not case sensitive, and to have white space between every lexeme. THANK YOU
grammar T;
programs : ID WS compound_statement;
statement:
if_statement|assignment_statement|while_statement|print_statement|compound_statement;
compound_statement: 'begin' statement_list 'end';
statement_list: statement|statement WS statement_list;
if_statement: 'if' '(' boolean_expression ')' 'then' statement 'else' statement;
while_statement: 'while' boolean_expression 'do' statement;
assignment_statement: ID = arithmetic_expression;
print_statement: 'print' ID;
boolean_expression: operand relative_op operand;
operand : ID |INT;
relative_op: '<'|'<='|'>'|'>='|'=='|'/=';
arithmetic_expression: operand|operand WS arithmetic_op WS operand;
arithmetic_op: '+'|'-'|'*'|'/';
ID : ('a'..'z'|'A'..'Z'|'_').
;
INT : '0'..'9'+
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
And here is the grammar:
<program> → program id <compound_statement>
<statement> → <if_statement> | <assignment_statement> | <while_statement> |
<print_statement> | <compound_statement>
<compound_statement> → begin <statement_list> end
<statement_list> → <statement> | <statement> ; <statement_list>
<if_statement> → if <boolean_expression> then <statement> else <statement>
<while_statement> → while <boolean_expression> do <statement>
<assignment_statement> → id := <arithmetic_expression>
<print_statement> → print id
<boolean_expression> → <operand> <relative_op> <operand>
<operand> → id | constant
<relative_op> → < | <= | > | >= | = | /=
<arithmetic_expression> → <operand> | <operand> <arithmetic_op> <operand>
<arithmetic_op> → + | - | * | /

Is ANTLRWorks the way to go when creating an interpreter from a grammar?
No.
ANTLRWorks can only be used to write your grammar and to test whether it handles input properly (through its debugger or interpreter). It cannot be used to create an interpreter for the language you've written the grammar for. ANTLRWorks is just a fancy text editor, nothing more.
And do y'all see any error in my code?
As indicated by Treebranch: you didn't have quotes around the = sign in:
assignment_statement: ID = arithmetic_expression;
making ANTLR "think" you wanted to assign the label ID to the parser rule arithmetic_expression, which is illegal: you can't have a label-name that is also the name of a rule (ID, in your case).

Some possible issues in your code:
I think you want your ID token to have a + quantifier so that it can be of length 1 or more, like so:
ID : ('a'..'z'|'A'..'Z'|'_')+
;
It also looks like you are missing quotes around your = sign:
assignment_statement: ID '=' arithmetic_expression;
EDIT
Regarding your left recursion issue: ANTLR is very powerful because of its regex-like (EBNF) operators. While a plain BNF (like the one you have presented) may be limited in the way it can express things, ANTLR can express certain grammar rules in a much simpler way. For instance, if you want to have a statement_list in your compound_statement, just use your statement rule with closure (*). Like so:
compound_statement: 'begin' statement* 'end';
Suddenly, you can remove unnecessary rules like statement_list.

Related

How do I make a Simple ANTLR grammar extension?

I'm writing a framework that uses ANTLR to parse Java-style expressions. I had in mind to create a new type of free-form literal. The literal will look similar to a string, so I thought to extend the Java8 grammar I'm using with a new literal identical to StringLiteral, but bounded by '`' characters instead of '"'.
So I created:
ExternalLiteral
: '`' StringCharacters? '`'
;
in the Lexer and modified:
fragment
StringCharacter
: ~["`\\\r\n]
| EscapeSequence
;
and
fragment
EscapeSequence
: '\\' [btnfr"'`\\]
| OctalEscape
| UnicodeEscape // This is not in the spec but prevents having to preprocess the input
;
so '`' would be treated as a special character, identically to '"'. Then in the grammar I modified
literal
: IntegerLiteral
| FloatingPointLiteral
| BooleanLiteral
| CharacterLiteral
| StringLiteral
| ExternalLiteral
| NullLiteral
;
That seems like it would work to me, but when I try to parse any such expression, e.g. `0`, I get:
line 1:3 mismatched input '<EOF>' expecting {'boolean', 'byte', 'char', 'double', 'float', 'int', 'long', 'new', 'short', 'super', 'this', 'void', IntegerLiteral, FloatingPointLiteral, BooleanLiteral, CharacterLiteral, StringLiteral, ExternalLiteral, 'null', '(', '!', '~', '++', '--', '+', '-', Identifier, '#'}
line 1:3 missing {IntegerLiteral, FloatingPointLiteral, BooleanLiteral, CharacterLiteral, StringLiteral, ExternalLiteral, 'null'} at '<EOF>'
I've had fights with ANTLR before, and I don't know if it's ANTLR or me that is more the problem. Does anyone with more experience than me see what I might've done wrong?

antlr4 - any text and keywords

I am trying to parse the following:
SELECT name-of-key[random text]
This is part of a larger grammar which I am trying to construct. I left it out for clarity.
I came up with the following rules:
select : 'select' NAME '[' anything ']'
;
anything : (ANYTHING | NAME)+
;
NAME : ('a'..'z' | 'A'..'Z' | '0'..'9' | '-' | '_')+
;
ANYTHING : (~(']' | '['))+
;
WHITESPACE : ('\t' | ' ' | '\r' | '\n')+ -> skip
;
This doesn't seem to work. For example, input SELECT a[hello world!] gives the following error:
line 1:0 mismatched input 'SELECT a' expecting 'SELECT'
This goes wrong because the input SELECT a is recognized by ANYTHING, instead of select. How do I fix that? I feel that I am missing some concept(s) here, but it is difficult to get started.
Maybe the concept you are missing is rule priority.
[1] Lexer rules matching the longest possible string have priority.
As you mentioned, the ANYTHING token rule above matches "select a", which is longer than what the (implicit) token rule 'select' matches, hence its precedence. Non-greedy behaviour is indicated by a question mark.
ANYTHING : (~(']' | '['))+?
Just making the ANYTHING rule non-greedy doesn't completely solve your problem though, because after properly matching 'select', the lexer will produce an ANYTHING token for the space, because ...
[2] Lexer rules appearing first have priority.
Switching the lexer rules WHITESPACE and ANYTHING fixes this. The grammar below should parse your example.
select : 'select' NAME '[' anything ']'
;
anything : (ANYTHING | NAME)+
;
NAME : ('a'..'z' | 'A'..'Z' | '0'..'9' | '-' | '_')+
;
WHITESPACE : ('\t' | ' ' | '\r' | '\n')+ -> skip
;
ANYTHING : (~(']' | '['))+?
;
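To see the two priority rules in action without running ANTLR, here is a minimal Python sketch of a "longest match wins, earlier rule breaks ties" tokenizer. It is not how ANTLR is implemented; the rule tables GREEDY and FIXED and the function tokenize are made up for this illustration, and the non-greedy ANYTHING rule is approximated by a pattern that matches exactly one character.

```python
import re

# Implicit tokens ('select', '[', ']') come first, as they would in the grammar above.
COMMON = [
    ("SELECT", r"select"),
    ("L_BRACKET", r"\["),
    ("R_BRACKET", r"\]"),
    ("NAME", r"[A-Za-z0-9_-]+"),
]

# Original rule order, with a greedy ANYTHING.
GREEDY = COMMON + [("ANYTHING", r"[^\[\]]+"), ("WHITESPACE", r"[ \t\r\n]+")]

# Fixed version: WHITESPACE before ANYTHING, and ANYTHING limited to one
# character (an approximation of the non-greedy +? loop).
FIXED = COMMON + [("WHITESPACE", r"[ \t\r\n]+"), ("ANYTHING", r"[^\[\]]")]


def tokenize(text, rules, skip=("WHITESPACE",)):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name not in skip:
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("select a[hello world!]", GREEDY))
print(tokenize("select a[hello world!]", FIXED))
```

With GREEDY, the first token is ANYTHING covering "select a", which is the situation behind the error in the question. With FIXED, the input splits into SELECT, NAME, the brackets, two NAME tokens and a single ANYTHING for the "!".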
I personally avoid implicit token rules, especially in complex grammars, precisely because of token rule priority. I would thus write this:
SELECT : 'select' ;
L_BRACKET : '[';
R_BRACKET : ']';
NAME : ('a'..'z' | 'A'..'Z' | '0'..'9' | '-' | '_')+ ;
WHITESPACE : ('\t' | ' ' | '\r' | '\n')+ -> skip ;
ANY : . ;
select : SELECT NAME L_BRACKET anything R_BRACKET ;
anything : (~R_BRACKET)+ ;
Also note that the space in "hello world" will be swallowed by the WHITESPACE rule. To properly manage this, you need ANTLR island grammars.
Hope this helps!

ANTLR Decision can match input using multiple alternatives

I have this simple grammar:
expr: factor;
factor: atom (('*' ^ | '/'^) atom)*;
atom: INT
| ':' expr;
INT: ('0'..'9')+;
When I run it, it says:
Decision can match input such as '*' using multiple alternatives 1,2
Decision can match input such as '/' using multiple alternatives 1,2
I can't spot the ambiguity. What are the red arrows pointing at?
Any help would be appreciated.
Let's say you want to parse the input:
:3*4*:5*6
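The parser generated by your grammar could parse this input in more than one way, because the atom alternative ':' expr re-enters factor. After :3 has been matched, the following *4 can be consumed either by the inner factor loop (inside the ':' expr atom) or by the outer one, and the two choices produce different parse trees.
(The original answer showed two such trees as images, with the colons omitted to keep the trees clearer.)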
Note that what you see is just a warning. By specifically instructing ANTLR that (('*' | '/') atom)* needs to be matched greedily, like this:
factor
: atom (options{greedy=true;}: ('*'^ | '/'^) atom)*
;
the parser "knows" which alternative to take, and no warning is emitted.
EDIT
I tested the grammar with ANTLR 3.3 as follows:
grammar T;
options {
output=AST;
}
parse
: expr EOF!
;
expr
: factor
;
factor
: atom (options{greedy=true;}: ('*'^ | '/'^) atom)*
;
atom
: INT
| ':'^ expr
;
INT : ('0'..'9')+;
And then from the command line:
java -cp antlr-3.3.jar org.antlr.Tool T.g
which does not produce any warning (or error).

Own DSL with XText. Problem with unlimited brackets ("(", ")")

I am developing my own DSL in XText.
I want to do something like this:
1 AND (2 OR (3 OR 4))
Here is my current .xtext file:
grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals
generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"
Model:
(greetings+=CONDITION_LEVEL)
;
terminal NUMBER :
('1'..'9') ('0'..'9')*
;
AND:
' AND '
;
OR:
' OR '
;
OPERATOR :
AND | OR
;
CONDITION_LEVEL:
('('* NUMBER (=>')')* OPERATOR)+ NUMBER ')'*
;
The problem I am having is that the DSL should allow an unlimited number of brackets, but show an error when the programmer doesn't close every opened bracket.
Example:
1 AND (2 OR (3 OR 4)
One bracket is missing --> should produce an error.
I don't know how I can implement this in XText. Can anybody help?
Thanks for helping.
Try this:
CONDITION_LEVEL
: ATOM ((AND | OR) ATOM)*
;
ATOM
: NUMBER
| '(' CONDITION_LEVEL ')'
;
Note that I have no experience with XText (so I did not test this), but this does work with ANTLR, on which XText is built (or perhaps it only uses ANTLR...).
Also, you probably don't want to surround your operator tokens with spaces, but rather put whitespace on a hidden channel:
grammar org.xtext.example.mydsl.MyDsl hidden(SPACE)
...
terminal SPACE : (' '|'\t'|'\r'|'\n')+;
...
Otherwise source like this would fail:
1 AND(2 OR 3)
For details, see Hidden Terminal Symbols from the XText user guide.
You need to make your syntax recursive. The basic idea is that a CONDITION_LEVEL can be, for example, two CONDITION_LEVEL separated by an OPERATOR.
I don't know the specifics of the XText syntax, but using a BNF-like syntax you could have:
CONDITION_LEVEL:
NUMBER
| '(' CONDITION_LEVEL OPERATOR CONDITION_LEVEL ')'
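To illustrate why a recursive rule catches unbalanced brackets, here is a small hand-written recursive-descent sketch in Python that mirrors the CONDITION_LEVEL / ATOM rules suggested above. It is not XText or ANTLR output; the function names are invented for this example, whitespace is simply skipped (which plays the role of the hidden terminal), and NUMBER is simplified to any run of digits.

```python
import re


def parse(src):
    """Parse a condition such as '1 AND (2 OR 3)' into nested tuples."""
    # Whitespace is dropped; any other stray character becomes its own
    # token so that the parser can reject it.
    tokens = re.findall(r"\d+|AND|OR|[()]|\S", src)
    node, pos = parse_condition(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return node


def parse_condition(tokens, pos):
    # CONDITION_LEVEL : ATOM ((AND | OR) ATOM)*
    node, pos = parse_atom(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("AND", "OR"):
        op = tokens[pos]
        right, pos = parse_atom(tokens, pos + 1)
        node = (op, node, right)
    return node, pos


def parse_atom(tokens, pos):
    # ATOM : NUMBER | '(' CONDITION_LEVEL ')'
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok.isdigit():
        return int(tok), pos + 1
    if tok == "(":
        # Recursion: every '(' must be followed by a full condition and a ')'.
        node, pos = parse_condition(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("missing ')'")
        return node, pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")


print(parse("1 AND (2 OR (3 OR 4))"))
```

Because each opening bracket is handled by its own call that insists on a closing bracket before returning, parse("1 AND (2 OR (3 OR 4)") raises a SyntaxError, while "1 AND(2 OR 3)" parses fine since the spaces are not part of the operator tokens.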

What is the equivalent for epsilon in ANTLR BNF grammar notation?

While working with ANTLR 3.3, I'm changing my current grammar to also support input without parentheses. Here's the first version of my grammar:
grammar PropLogic;
NOT : '!' ;
OR : '+' ;
AND : '.' ;
IMPLIES : '->' ;
SYMBOLS : ('a'..'z') | '~' ;
OP : '(' ;
CP : ')' ;
prog : formula EOF ;
formula : NOT formula
| OP formula( AND formula CP | OR formula CP | IMPLIES formula CP)
| SYMBOLS ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
Then I changed it this way to support the appropriate features :
grammar PropLogic;
NOT : '!' ;
OR : '+' ;
AND : '.' ;
IMPLIES : '->' ;
SYMBOL : ('a'..'z') | '~' ;
OP : '(' ;
CP : ')' ;
EM : '' ;
prog : formula EOF ;
formula : OP formula( AND formula CP | OR formula CP | IMPLIES formula CP)
| ( NOT formula | SYMBOL )( AND formula | OR formula | IMPLIES formula | EM ) ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
But I am faced with the following error:
error<100>: syntax error: invalid char literal: ''
error<100>: syntax error: invalid char literal: ''
Does anybody know how I can overcome this error?
Your EM token:
EM : '' ;
is invalid: you can't match an empty string in lexer rules.
To match epsilon (nothing), you should do:
rule
: A
| B
| /* epsilon */
;
Of course, the comment /* epsilon */ can safely be removed.
Note that when you do it like that in your current grammar, ANTLR will complain that there can be rules matched using multiple alternatives. This is because your grammar is ambiguous.
I'm not an ANTLR expert, but you might try:
formula : term ((AND | OR | IMPLIES ) term )*;
term : OP formula CP | NOT term | SYMBOL ;
If you want traditional precedence of operators this won't do the trick, but that's another issue.
EDIT: OP raised the ante; he wants precedence too. I'll meet him halfway, since it wasn't part of the original question. I've added precedence to the grammar so that IMPLIES has lower precedence than the other operators, and I leave it to OP to figure out how to do the rest.
formula: disjunction ( IMPLIES disjunction )* ;
disjunction: term (( AND | OR ) term )* ;
term: OP formula CP | NOT term | SYMBOL ;
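To make the effect of the layered rules concrete, here is a rough recursive-descent sketch in Python with one method per rule (formula, disjunction, term). It is only an illustration of how the rule nesting yields precedence, not code generated by ANTLR; the Parser class and the tuple-based tree format are assumptions made for this example.

```python
import re


class Parser:
    def __init__(self, src):
        # '->' must be tried before single characters; whitespace is skipped.
        self.tokens = re.findall(r"->|[!+.()~a-z]|\S", src)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def formula(self):
        # formula: disjunction ( IMPLIES disjunction )* ;
        node = self.disjunction()
        while self.peek() == "->":
            self.advance()
            node = ("->", node, self.disjunction())
        return node

    def disjunction(self):
        # disjunction: term (( AND | OR ) term )* ;
        node = self.term()
        while self.peek() in ("+", "."):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        # term: OP formula CP | NOT term | SYMBOL ;
        tok = self.advance()
        if tok == "(":
            node = self.formula()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok == "!":
            return ("!", self.term())
        if tok is not None and re.fullmatch(r"[a-z~]", tok):
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(src):
    parser = Parser(src)
    node = parser.formula()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return node


print(parse("!p + q -> r . s"))
```

Because formula only sees whole disjunctions, the input "!p + q -> r . s" groups as (!p + q) -> (r . s). Note that the ( ... )* loops make every binary operator left-associative in this sketch, IMPLIES included.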
OP additionally asked, "how to convert (!p or q ) into p -> q". I think he should have asked this as a separate question. However, I'm already here. What he needs to do is walk the tree, looking for the pattern he doesn't like, change the tree into one he does, and then prettyprint the answer. It is possible to do all this with ANTLR, which is part of the reason it is popular.
As a practical matter, procedurally walking the tree, checking the node types, and splicing out old nodes and splicing in new ones is doable, but a royal PitA, especially if you want to do this for lots of transformations.
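For a single pattern, the procedural walk described above might look like the following Python sketch. It assumes a hypothetical tree format of nested tuples, ("+", left, right) for OR, ("!", child) for NOT and plain strings for symbols; an ANTLR-generated AST would use its own node classes, so treat the function names and the tuple layout as placeholders.

```python
def rewrite(node):
    """Bottom-up walk that turns (!p + q) into (p -> q)."""
    if isinstance(node, str):
        return node  # a symbol: nothing to rewrite
    op, *children = node
    children = [rewrite(child) for child in children]
    # Pattern check: an OR node whose left child is a NOT node.
    if op == "+" and isinstance(children[0], tuple) and children[0][0] == "!":
        # Splice out the NOT and replace the OR by an IMPLIES.
        return ("->", children[0][1], children[1])
    return (op, *children)


def pretty(node):
    """Print a tree back as text, fully parenthesised."""
    if isinstance(node, str):
        return node
    if node[0] == "!":
        return "!" + pretty(node[1])
    op, left, right = node
    return f"({pretty(left)} {op} {pretty(right)})"


tree = ("+", ("!", "p"), "q")
print(pretty(tree), "=>", pretty(rewrite(tree)))
```

Even in this tiny case the type checks and the manual splicing are visible, and each additional transformation needs another hand-written pattern test of this kind.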
A more effective way to do this is to use a program transformation system, which allows surface syntax patterns to be expressed for matching and replacement. Program transformation systems of course include parsing machinery, and the more powerful ones let you (and indeed insist that you) define a grammar up front, much as you do for ANTLR.
Our DMS Software Reengineering Toolkit is such a program transformation tool, and with a suitably defined grammar for propositions, the following DMS transformation rule would carry out OP's additional request:
domain proplogic; // tell DMS to use OP's definition of logic as a grammar
rule normalize_implies_from_or( p: term, q: term): formula -> formula
" NOT \p OR \q " -> " \p IMPLIES \q ";
The " ... " is "domain notation", i.e., surface syntax from the proplogic domain, and the "\" are meta-escapes, so "\p" and "\q" represent any arbitrary term from the proplogic grammar. Notice the rule has to reach "across" precedence levels when being applied, as "NOT \p OR \q" isn't a formula and "\p IMPLIES \q" is; DMS takes care of all this (the "formula -> formula" notation is how DMS knows what to do). This rule does a tree-to-tree rewrite. The resulting tree can be prettyprinted by DMS.
You can see a complete example of something very similar, namely a grammar for conventional algebra and rewrite rules to simplify algebraic equations.
