I'm working on a parser and need a custom error to be thrown for every keyword. My code is as follows:
SKIP: { " " | "\t" | "\n" | "\r" }
TOKEN: { "DEF" | "MAIN" | <NAME: (["A"-"Z"])+> | <PARAM: (["a"-"z"])+> | <NUM: (["0"-"9"])+> }
void Start(): {} {(Def() Func())+ <EOF>}
void Def(): {} {"DEF" | { throw new ParseException("expected keyword DEF"); }}
void Func(): {} {"MAIN" | Name() Param() | { throw new ParseException("Expected MAIN or NAME PARAM"); }}
void Name(): {} {<NAME> | { throw new ParseException("invalid function name"); }}
void Param(): {} { <PARAM> | { throw new ParseException("invalid PARAM"); }}
JavaCC reports an error for Start(): Expansion within "(...)+" can be matched by empty string. I think the problem is in the Name() Param() part of Func(), but I don't know how to change this while still throwing custom error messages. Can anyone provide some pointers?
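While I agree with the comment from user207421, you could do something like the following. Take the { throw ... } alternatives out of the productions used inside the loop, so that nothing in the loop can match the empty string, and raise the error only when the first, mandatory item is missing:
void oneOrMoreThings() : {} {
    ( Thing() | { throw new ParseException( ... ); } )
    ( Thing() )*
}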
Alternatively, make DEF optional, then check whether it was found and raise the exception if it wasn't:
void Start(): { Token tk = null; }
{ [ tk = "DEF" ] { if (tk == null) throw ...; } "MAIN" /* etc. */ }
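Both suggestions come down to the same rule: nothing inside the (...)+ loop may be able to match the empty string, so the custom error has to come from an explicit check rather than from an empty alternative. As an illustration only, here is that shape as a hand-written recursive-descent parser in plain Java, not JavaCC-generated code; the DefParser class, its whitespace tokenizer and the NAME/PARAM checks are assumptions modelled on the question's grammar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical hand-written parser (not JavaCC output) for
 *   Start := Def Func (Def Func)* EOF ;  Def := "DEF" ;  Func := "MAIN" | NAME PARAM
 * Every method either consumes a token or throws, so the loop can never match nothing.
 */
final class DefParser {
    static final class ParseException extends RuntimeException {
        ParseException(String message) { super(message); }
    }

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    DefParser(String input) {
        // Naive tokenizer (assumption): tokens are separated by whitespace
        String trimmed = input.trim();
        if (!trimmed.isEmpty()) tokens.addAll(Arrays.asList(trimmed.split("\\s+")));
    }

    // Current token, or null at end of input
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private static boolean isName(String t) {
        return t != null && t.matches("[A-Z]+") && !t.equals("DEF") && !t.equals("MAIN");
    }

    private static boolean isParam(String t) { return t != null && t.matches("[a-z]+"); }

    // Def := "DEF"  (an explicit check-then-throw instead of an empty { throw ... } alternative)
    private void def() {
        if (!"DEF".equals(peek())) throw new ParseException("expected keyword DEF");
        pos++;
    }

    // Func := "MAIN" | NAME PARAM
    private void func() {
        if ("MAIN".equals(peek())) { pos++; return; }
        if (!isName(peek())) throw new ParseException("Expected MAIN or NAME PARAM");
        pos++;
        if (!isParam(peek())) throw new ParseException("invalid PARAM");
        pos++;
    }

    // The first Def Func pair is mandatory; after that, keep going until end of input
    void start() {
        def();
        func();
        while (peek() != null) {
            def();
            func();
        }
    }
}
```

Here "DEF MAIN DEF FOO bar" parses, "DEF FOO 42" fails with "invalid PARAM", and an empty input fails with "expected keyword DEF".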
I'm trying to design a simple query language as follows:
grammar FilterExpression;
// Lexer rules
AND : 'AND' ;
OR : 'OR' ;
NOT : 'NOT';
GT : '>' ;
GE : '>=' ;
LT : '<' ;
LE : '<=' ;
EQ : '=' ;
DECIMAL : '-'?[0-9]+('.'[0-9]+)? ;
KEY : ~[ \t\r\n\\"~=<>:(),]+ ;
QUOTED_WORD: ["] ('\\"' | ~["])* ["] ;
NEWLINE : '\r'? '\n';
WS : [ \t\r\n]+ -> skip ;
StringFilter : KEY ':' QUOTED_WORD;
NumericalFilter : KEY (GT | GE | LT | LE | EQ) DECIMAL;
condition : StringFilter # stringCondition
| NumericalFilter # numericalCondition
| StringFilter op=(AND|OR) StringFilter # combinedStringCondition
| NumericalFilter op=(AND|OR) NumericalFilter # combinedNumericalCondition
| condition AND condition # combinedCondition
| '(' condition ')' # parens
;
I added a few tests to check that the grammar works as expected. To my surprise, some cases that should clearly be rejected passed.
For instance, when I type
(brand:"apple" AND t>3) 1>3
where the 1>3 is deliberately put there as an error, ANTLR still happily generates a parse tree for it.
Is there a problem with my grammar that I haven't noticed?
I also tried the IntelliJ plugin (in case grun was not behaving as expected), but the result is the same.
Here is the test code I'm using. Note that I also tried BailErrorStrategy, but it doesn't seem to help:
import java.util.BitSet;

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.atn.ATNConfigSet;
import org.antlr.v4.runtime.dfa.DFA;
import org.junit.Test;

public class ParserTest {
    // Lexer that aborts on the first unrecognized character instead of recovering
    private static class BailLexer extends FilterExpressionLexer {
        public BailLexer(CharStream input) {
            super(input);
        }

        @Override
        public void recover(LexerNoViableAltException e) {
            throw new RuntimeException(e);
        }
    }

    private FilterExpressionParser createParser(String filterString) {
        //FilterExpressionLexer lexer = new FilterExpressionLexer(CharStreams.fromString(filterString));
        FilterExpressionLexer lexer = new BailLexer(CharStreams.fromString(filterString));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        FilterExpressionParser parser = new FilterExpressionParser(tokens);
        parser.setErrorHandler(new BailErrorStrategy());
        parser.addErrorListener(new ANTLRErrorListener() {
            @Override
            public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line,
                                    int charPositionInLine, String msg, RecognitionException e) {
                System.out.print("here1");
            }

            @Override
            public void reportAmbiguity(Parser recognizer, DFA dfa, int startIndex, int stopIndex,
                                        boolean exact, BitSet ambigAlts, ATNConfigSet configs) {
                System.out.print("here2");
            }

            @Override
            public void reportAttemptingFullContext(Parser recognizer, DFA dfa, int startIndex, int stopIndex,
                                                    BitSet conflictingAlts, ATNConfigSet configs) {
                System.out.print("here3");
            }

            @Override
            public void reportContextSensitivity(Parser recognizer, DFA dfa, int startIndex, int stopIndex,
                                                 int prediction, ATNConfigSet configs) {
                System.out.print("here4");
            }
        });
        return parser;
    }

    @Test
    public void test() {
        FilterExpressionParser parser = createParser("(brand:\"apple\" AND t>3) 1>3");
        parser.condition();
    }
}
It looks like I finally found the answer.
The reason is that my grammar didn't end with EOF. Without it, ANTLR is perfectly happy to parse just a prefix of the input and ignore the rest, which is why the trailing 1>3 in the test string (brand:"apple" AND t>3) 1>3 is accepted.
See discussion here: https://github.com/antlr/antlr4/issues/351
I then changed the grammar slightly to add an EOF at the end, so the start rule reads condition EOF, and now everything works.
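To see why the missing EOF matters, here is a toy example in plain Java. PrefixDemo is a hypothetical hand-written recursive-descent parser, not ANTLR-generated code, and its tokenizer simply treats anything that is not a parenthesis or AND as an atom. parsePrefix behaves like invoking a start rule that does not end in EOF: it stops once the longest valid prefix has been parsed and reports success. parseAll adds the equivalent of condition EOF by failing unless every token was consumed.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy parser (not ANTLR output) for
 *   cond : atom ('AND' atom)* ;   atom : '(' cond ')' | WORD ;
 */
final class PrefixDemo {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private PrefixDemo(String input) {
        // Naive tokenizer (assumption): pad parentheses with spaces, then split on whitespace
        for (String t : input.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
    }

    // Current token, or null at end of input
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private void expect(String want) {
        if (!want.equals(peek())) throw new IllegalStateException("expected " + want + " but found " + peek());
        pos++;
    }

    private void cond() {
        atom();
        while ("AND".equals(peek())) {
            pos++;
            atom();
        }
    }

    private void atom() {
        String t = peek();
        if ("(".equals(t)) {
            pos++;
            cond();
            expect(")");
        } else if (t == null || t.equals(")") || t.equals("AND")) {
            throw new IllegalStateException("unexpected " + t);
        } else {
            pos++; // any other token counts as a WORD
        }
    }

    // Like invoking a start rule without EOF: returns how many tokens were consumed
    static int parsePrefix(String input) {
        PrefixDemo p = new PrefixDemo(input);
        p.cond();
        return p.pos;
    }

    // Like a start rule "cond EOF": fails unless the whole input was consumed
    static void parseAll(String input) {
        PrefixDemo p = new PrefixDemo(input);
        p.cond();
        if (p.pos != p.tokens.size()) {
            throw new IllegalStateException("extraneous input at token " + p.pos + ": " + p.peek());
        }
    }
}
```

With the question's input, parsePrefix consumes 5 of the 6 tokens and silently leaves 1>3 unread, while parseAll rejects it.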
I'm currently stuck on my project, a Fuseki triple store browser. I need to display all the data from a triple store and make the app browsable. The only problem is that the QuerySolution leaves out the "< >" that appear around values from the triple store.
If I use ResultSetFormatter.asText(ResultSet), it returns this:
-------------------------------------------------------------------------------------------------------------------------------------
| subject | predicate | object |
=====================================================================================================================================
| <urn:animals:data> | <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> | <http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq> |
| <urn:animals:data> | <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> | <urn:animals:lion> |
| <urn:animals:data> | <http://www.w3.org/1999/02/22-rdf-syntax-ns#_2> | <urn:animals:tarantula> |
| <urn:animals:data> | <http://www.w3.org/1999/02/22-rdf-syntax-ns#_3> | <urn:animals:hippopotamus> |
-------------------------------------------------------------------------------------------------------------------------------------
Notice that some of the data contains the less-than and greater-than signs "<" and ">". As soon as I try to read the data from the ResultSet, those signs are removed, so the data looks like this:
-------------------------------------------------------------------------------------------------------------------------------
| subject | predicate | object |
===============================================================================================================================
| urn:animals:data | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq |
| urn:animals:data | http://www.w3.org/1999/02/22-rdf-syntax-ns#_1 | urn:animals:lion |
| urn:animals:data | http://www.w3.org/1999/02/22-rdf-syntax-ns#_2 | urn:animals:tarantula |
| urn:animals:data | http://www.w3.org/1999/02/22-rdf-syntax-ns#_3 | urn:animals:hippopotamus |
As you can see, the data doesn't contain the "<" and ">" signs.
This is how I parse the data from the ResultSet:
while (rs.hasNext()) {
// Moves onto the next result
QuerySolution sol = rs.next();
// Return the value of the named variable in this binding.
// A return of null indicates that the variable is not present in
// this solution
RDFNode object = sol.get("object");
RDFNode predicate = sol.get("predicate");
RDFNode subject = sol.get("subject");
// Fill the table with the data
DefaultTableModel modelTable = (DefaultTableModel) this.getModel();
modelTable.addRow(new Object[] { subject, predicate, object });
}
It's quite hard to explain this problem, but is there a way to keep the "< >" signs after parsing the data?
The '<>' are used by the formatter to indicate that the value is a URI rather than a string: so "http://example.com/" is a literal text value, whereas <http://example.com/> is a URI.
You can do the same yourself:
RDFNode node; // subject, predicate, or object
if (node.isURIResource()) {
return "<" + node.asResource().getURI() + ">";
} else {
...
}
But it's much easier to use FmtUtils:
String nodeAsString = FmtUtils.stringForRDFNode(subject); // or predicate, or object
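What you need to do is get that code invoked when a table cell is rendered: currently the table uses Object::toString().
In outline, register the renderer on the JTable itself (setDefaultRenderer is a JTable method, not a DefaultTableModel one), and register it for Object.class: DefaultTableModel reports every column's class as Object.class, so a renderer registered only for RDFNode.class would not be used unless you also override getColumnClass().
this.setDefaultRenderer(Object.class, new MyRDFNodeRenderer()); // 'this' is the JTable in your code
Then see http://docs.oracle.com/javase/tutorial/uiswing/components/table.html#renderer about how to create a simple renderer. Note that value will be whatever the cell holds, which in your table is an RDFNode:
import javax.swing.table.DefaultTableCellRenderer;
import org.apache.jena.rdf.model.RDFNode;     // Jena 3+ package names
import org.apache.jena.sparql.util.FmtUtils;

static class MyRDFNodeRenderer extends DefaultTableCellRenderer {
    @Override
    protected void setValue(Object value) {
        // RDF nodes get <uri> / "literal" formatting; anything else falls back to toString()
        setText(value instanceof RDFNode ? FmtUtils.stringForRDFNode((RDFNode) value)
                : value == null ? "" : value.toString());
    }
}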
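To make the convention concrete without depending on Jena, here is a minimal plain-Java sketch. TermFormat is a hypothetical helper, not part of Jena: it wraps URIs in angle brackets and plain literals in double quotes, escaping backslashes and quotes, which is roughly what the formatter does for the simple values in the question. FmtUtils also covers cases this skips, such as typed or language-tagged literals and blank nodes, so prefer it in the real table.

```java
// Hypothetical helper illustrating how RDF terms are usually written in text output
final class TermFormat {
    private TermFormat() {}

    // URIs go inside angle brackets, e.g. <urn:animals:lion>
    static String uri(String uri) {
        return "<" + uri + ">";
    }

    // Plain literals go inside double quotes; backslashes are escaped first, then quotes
    static String literal(String lexicalForm) {
        String escaped = lexicalForm.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }
}
```

So uri("urn:animals:lion") gives <urn:animals:lion>, while literal("http://example.com/") gives "http://example.com/", which is exactly the URI-versus-text distinction the brackets encode.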
The following is a snippet from an ANTLR grammar I have been working on:
compoundEvaluation returns [boolean evalResult]
: singleEvaluation (('AND'|'OR') singleEvaluation)*
;
//overall rule to evaluate a single expression
singleEvaluation returns [boolean evalResult]
: simpleStringEvaluation {$evalResult = $simpleStringEvaluation.evalResult;}
| stringEvaluation {$evalResult = $stringEvaluation.evalResult;}
| simpleDateEvaluation {$evalResult = $simpleDateEvaluation.evalResult;}
| dateEvaluatorWithModifier1 {$evalResult = $dateEvaluatorWithModifier1.evalResult;}
| dateEvaluatorWithoutModifier1 {$evalResult = $dateEvaluatorWithoutModifier1.evalResult;}
| simpleIntegerEvaluator {$evalResult = $simpleIntegerEvaluator.evalResult;}
| integerEvaluator {$evalResult = $integerEvaluator.evalResult;}
| integerEvaluatorWithModifier {$evalResult = $integerEvaluatorWithModifier.evalResult;}
;
Here's a sample of one of those evaluation rules:
simpleStringEvaluation returns [boolean evalResult]
: op1=STR_FIELD_IDENTIFIER operator=(EQ|NE) '"'? op2=(SINGLE_VALUE|INTEGER) '"'?
{
// I don't want these to be equal by default
String op1Value = op1.getText();
String op2Value = op2.getText();
try {
// get the values of the bean property specified by the value of op1 and op2
op1Value = BeanUtils.getProperty(policy,op1.getText());
} catch (NoSuchMethodException e) {
e.printStackTrace();
} catch (InvocationTargetException e) {
e.printStackTrace();
} catch (IllegalAccessException e) {
e.printStackTrace();
}
String strOperator = operator.getText();
if (strOperator.equals("=")) {
evalResult = op1Value.equals(op2Value);
}
if (strOperator.equals("<>")) {
evalResult = !op1Value.equals(op2Value);
}
}
;
Obviously I'm a newbie since I'm not building a tree, but the code works so I'm reasonably happy with it. However, the next step is to perform logical evaluations on multiple singleEvaluation statements. Since I'm embedding the code in the grammar, I was hoping someone could point me in the right direction to figure out how to evaluate 0 or more results.
There is no need to store the values in a set.
Why not simply do something like this?
compoundOrEvaluation returns [boolean evalResult]
  : a=singleEvaluation { $evalResult = $a.evalResult; }
    ( ('OR') b=singleEvaluation { $evalResult = $evalResult || $b.evalResult; } )*
  ;
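The same left-to-right accumulation can also be written as an ordinary Java method, which may be easier to unit-test than actions embedded in the grammar. BooleanFold is a hypothetical helper, not ANTLR-generated code: it combines the singleEvaluation results in input order for both AND and OR, as accumulating actions in a rule like compoundEvaluation would, so AND gets no precedence over OR.

```java
import java.util.List;

// Hypothetical helper: folds r0 op0 r1 op1 r2 ... strictly left to right
final class BooleanFold {
    private BooleanFold() {}

    static boolean foldLeft(List<Boolean> results, List<String> ops) {
        if (results.isEmpty() || ops.size() != results.size() - 1) {
            throw new IllegalArgumentException("need n results and n-1 operators");
        }
        boolean acc = results.get(0);
        for (int i = 0; i < ops.size(); i++) {
            boolean next = results.get(i + 1);
            switch (ops.get(i)) {
                case "AND": acc = acc && next; break;
                case "OR":  acc = acc || next; break;
                default: throw new IllegalArgumentException("unknown operator: " + ops.get(i));
            }
        }
        return acc;
    }
}
```

For example, true OR true AND false evaluates to false here, because it is read as (true OR true) AND false. If you want the usual precedence, give AND its own rule nested below the OR rule.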
Here's how I did it. I created a Set as a member, then in each statement's @init, I reinitialized the Set. As each statement was evaluated, it populated the set. Since the only legal values in the set are true and false, I end up with a set of 0, 1, or 2 members.
The OR evaluation looks like this:
compoundOrEvaluation returns [boolean evalResult]
@init {evaluationResults = new HashSet<Boolean>();}
: a=singleEvaluation {evaluationResults.add($a.evalResult);} (('OR') b=singleEvaluation {evaluationResults.add($b.evalResult);})+
{
if (evaluationResults.size()==1) {
evalResult = evaluationResults.contains(true);
} else {
evalResult = true;
}
}
;
The AND evaluation only differs in the else statement, where evalResult will be set to false. So far, this passes the unit tests I can throw at it.
Eventually I may use a tree and a visitor class, but the code currently works.
I am trying to implement error reporting and recovery in a JavaCC grammar.
I have put the following code in my .jjt grammar file:
void Stm() :
{}
{
try {
(
IfStm()
|
WhileStm()
)
}catch (ParseException e) {
error_skipto(SEMICOLON);
}
}
void error_skipto(int kind) {
ParseException e = generateParseException(); // generate the exception object.
System.out.println(e.toString()); // print the error message
Token t;
do {
t = getNextToken();
} while (t.kind != kind);
}
When I execute the command jjtree CMinus.jjt, I get the following error:
Reading from file CMinus_ragu.jjt . . .
Error parsing input: org.javacc.jjtree.ParseException: Encountered " "{" "{ "" at line 111, column 30.
Was expecting one of:
"throws" ...
":" ...
"#" ...
What is the error in the code and how should I handle the error recovery?
The keyword JAVACODE must be added before the error-handler method in the grammar file, so the method should look like this:
JAVACODE
void error_skipto(int kind) {
ParseException e = generateParseException(); // generate the exception object.
System.out.println(e.toString()); // print the error message
Token t;
do {
t = getNextToken();
} while (t.kind != kind && t.kind != EOF); // also stop at end of input, or this loops forever
}
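This is needed because a production written as a plain Java method must be marked with JAVACODE. Without it, JJTree tries to read the method as a BNF production and, after void error_skipto(int kind), expects "throws", ":" or "#" rather than "{", which is exactly the error reported.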
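The idea behind error_skipto (often called panic-mode recovery) can be sketched without JavaCC. In the hypothetical toy parser below, which is not JavaCC output, a statement is "print" WORD ";"; when a statement fails, the error is recorded and tokens are skipped up to and including the next ";", so parsing resumes at the following statement. The skip loop also stops at end of input, since a loop waiting only for SEMICOLON would spin forever on the EOF token that getNextToken() keeps returning once the input is exhausted. The offending token is only peeked before skipping, so a stray ";" ends the skip immediately instead of swallowing the next statement.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy parser (not JavaCC output) showing panic-mode recovery:
 *   program := stm* ;   stm := "print" WORD ";"
 */
final class PanicModeDemo {
    private final List<String> tokens;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();
    final List<String> printed = new ArrayList<>();

    PanicModeDemo(List<String> tokens) { this.tokens = tokens; }

    // Current token, or null at end of input (plays the role of EOF)
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    // Consume the expected token, or throw without consuming anything
    private void expect(String want) {
        if (!want.equals(peek())) throw new IllegalStateException("expected " + want + " but found " + peek());
        pos++;
    }

    private void stm() {
        expect("print");
        String word = peek();
        if (word == null || word.equals(";")) throw new IllegalStateException("expected a word but found " + word);
        pos++;
        expect(";");
        printed.add(word);
    }

    // Counterpart of error_skipto(SEMICOLON): skip through the next 'kind' token, stopping at end of input
    private void skipTo(String kind) {
        while (peek() != null) {
            if (tokens.get(pos++).equals(kind)) return;
        }
    }

    void program() {
        while (peek() != null) {
            try {
                stm();
            } catch (IllegalStateException e) {
                errors.add(e.getMessage()); // report, then resynchronize
                skipTo(";");
            }
        }
    }
}
```

For the tokens print x ; oops y ; print z ;, the unexpected oops is reported, the tokens up to the next ; are skipped, and the final print z statement is still parsed.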
I need some help. I have two simple classes, Tree and Node (I've shown only their interfaces to save space; I can easily modify them). I also have a JFlex file and a CUP parser file, and I need to build an AST (abstract syntax tree), i.e. put the tokens into Node objects and fill the Tree correctly.
public class Tree {
Node root;
public void AddNode(Node n){}
public void Evaluate(){}
}
public class Node {
public String value;
public int type;
Node left, right;
}
This is the parser (CUP) file:
import java_cup.runtime.*;
parser code {:
public boolean result = true;
public void report_fatal_error(String message, Object info) throws java.lang.Exception {
done_parsing();
System.out.println("report_fatal_error");
report_error();
}
public void syntax_error(Symbol cur_token) {
System.out.println("syntax_error");
report_error();
}
public void unrecovered_syntax_error(Symbol cur_token) throws java.lang.Exception {
System.out.println("unrecovered_syntax_error");
report_fatal_error("Fatal error, parsing cannot continue", cur_token);
}
public void report_error(){
System.out.println("report_error");
result = false;
}
:}
init with {: result = true; :};
/* Terminals (tokens returned by the scanner). */
terminal AND, OR, NOT;
terminal LPAREN, RPAREN;
terminal ITEM;
terminal OPEN, CLOSE, MON, MOFF, TIMEOUT, ESERR, BAE, I, O, BUS, EXT, PUSHB;
terminal VAL, OK, BUS_BR_L, BUS_BR_R, SH_CRT_L, SH_CRT_R, BUS_ALL, EXT_ALL, NO_TIMEOUT, NO_ES_ERR, IBUS_OK, CFG_OK, SYNTAX;
terminal OUT;
/* Non-terminals */
non terminal extension;
non terminal Integer expr;
/* Precedences */
precedence left AND, OR;
/* The grammar */
expr ::=
|
expr:e1 AND expr:e2
{:
//System.out.println("AND");
RESULT = 1;
:}
|
expr:e1 OR expr:e2
{:
//System.out.println("OR");
RESULT = 2;
:}
|
NOT expr:e1
{:
//System.out.println("NOT");
RESULT = 3;
:}
|
LPAREN expr:e RPAREN
{:
//System.out.println("()");
RESULT = 4;
:}
|
ITEM extension:e1
{:
//System.out.println("ITEM.");
RESULT = 5;
:}
|
error
{:
System.out.println("error");
parser.report_error();
RESULT = 0;
:}
;
extension ::=
OPEN
|
MON
|
CLOSE
|
MOFF
|
TIMEOUT
|
ESERR
|
BAE
|
I
|
O
|
BUS
|
EXT
|
PUSHB
|
VAL
|
OK
|
BUS_BR_L
|
BUS_BR_R
|
SH_CRT_L
|
SH_CRT_R
|
BUS_ALL
|
EXT_ALL
|
NO_TIMEOUT
|
NO_ES_ERR
|
IBUS_OK
|
CFG_OK
|
SYNTAX
|
OUT
;
This is the lexer (JFlex) specification:
import java.util.ArrayList;
import java_cup.runtime.Symbol;
%%
%{
public boolean result = true;
// Fills expression with the tokens, for renaming
public Expression expression=new Expression();
//
public ArrayList<String> items = new ArrayList<String>();
public ArrayList<Integer> extensions = new ArrayList<Integer>();
// include the token's position information
private Symbol new_symbol(int type) {
return new Symbol(type, yyline+1, yycolumn);
}
// include the token's position information
private Symbol new_symbol(int type, Object value) {
return new Symbol(type, yyline+1, yycolumn, value);
}
%}
%cup
%xstate COMMENT
%eofval{
return new_symbol(sym.EOF);
%eofval}
%line
%column
%%
" " {}
"\b" {}
"\t" {}
"\r\n" {}
"\f" {}
"open" {extensions.add(sym.OPEN); return new_symbol(sym.OPEN);}
"close" {extensions.add(sym.CLOSE); return new_symbol(sym.CLOSE);}
"m_on" {extensions.add(sym.MON); return new_symbol(sym.MON);}
"m_off" {extensions.add(sym.MOFF); return new_symbol(sym.MOFF);}
"timeout" {extensions.add(sym.TIMEOUT); return new_symbol(sym.TIMEOUT);}
"es_err" {extensions.add(sym.ESERR); return new_symbol(sym.ESERR);}
"bae" {extensions.add(sym.BAE); return new_symbol(sym.BAE);}
"i" {extensions.add(sym.I); return new_symbol(sym.I);}
"o" {extensions.add(sym.O); return new_symbol(sym.O);}
"bus" {extensions.add(sym.BUS); return new_symbol(sym.BUS);}
"ext" {extensions.add(sym.EXT); return new_symbol(sym.EXT);}
"pushb" {extensions.add(sym.PUSHB); return new_symbol(sym.PUSHB);}
"val" {extensions.add(sym.VAL); return new_symbol(sym.VAL);}
"ok" {extensions.add(sym.OK); return new_symbol(sym.OK);}
"bus_br_l" {extensions.add(sym.BUS_BR_L); return new_symbol(sym.BUS_BR_L);}
"bus_br_r" {extensions.add(sym.BUS_BR_R); return new_symbol(sym.BUS_BR_R);}
"sh_crt_l" {extensions.add(sym.SH_CRT_L); return new_symbol(sym.SH_CRT_L);}
"sh_crt_r" {extensions.add(sym.SH_CRT_R); return new_symbol(sym.SH_CRT_R);}
"bus_all" {extensions.add(sym.BUS_ALL); return new_symbol(sym.BUS_ALL);}
"ext_all" {extensions.add(sym.EXT_ALL); return new_symbol(sym.EXT_ALL);}
"no_timeout" {extensions.add(sym.NO_TIMEOUT); return new_symbol(sym.NO_TIMEOUT);}
"no_es_err" {extensions.add(sym.NO_ES_ERR); return new_symbol(sym.NO_ES_ERR);}
"ibus_ok" {extensions.add(sym.IBUS_OK); return new_symbol(sym.IBUS_OK);}
"cfg_ok" {extensions.add(sym.CFG_OK); return new_symbol(sym.CFG_OK);}
"syntax" {extensions.add(sym.SYNTAX); return new_symbol(sym.SYNTAX);}
"out" {extensions.add(sym.OUT); return new_symbol(sym.OUT);}
"!" { return new_symbol(sym.NOT);}
"&" { return new_symbol(sym.AND);}
"|" { return new_symbol(sym.OR);}
"(" { return new_symbol(sym.LPAREN);}
")" { return new_symbol(sym.RPAREN);}
([[:jletter:]])[[:jletterdigit:]]* \. {items.add(yytext().substring(0, yytext().length()-1)); return new_symbol (sym.ITEM);}
. {result = false;}
The problem is how to create the AST from here. The input is an expression such as
A.open && b.i
Can anybody help?
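The places in your parser where you have commented-out print statements, such as
//System.out.println("OR");
are where you'll want to build your AST using the Tree data structure you have. Based on your grammar, decide which node each production creates and where it attaches: for expr:e1 OR expr:e2, for example, create an OR node whose left and right children are the nodes already built for e1 and e2, and return it as RESULT. For that to work, expr has to be declared with your node type (non terminal Node expr;) instead of Integer.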
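As an illustration of that approach (a sketch under assumptions, not the asker's classes): AstNode below is a hypothetical immutable variant of the question's Node, with a kind instead of an int type and an eval method playing the role of Tree.Evaluate. The class comment shows how CUP actions would create the nodes bottom-up. To get item text into the tree, the lexer would also have to pass a value with the ITEM symbol (for example via the existing new_symbol(type, value) helper), and the CUP file would declare ITEM with a type, e.g. terminal String ITEM;.

```java
import java.util.Map;

/**
 * Hypothetical AST node for the boolean expressions above. With "non terminal AstNode expr;"
 * in the CUP file, each action would build one node instead of returning an Integer, e.g.
 *   expr:e1 AND expr:e2   {: RESULT = AstNode.binary(AstNode.Kind.AND, e1, e2); :}
 *   NOT expr:e1           {: RESULT = AstNode.not(e1); :}
 *   LPAREN expr:e RPAREN  {: RESULT = e; :}   (parentheses need no node of their own)
 */
final class AstNode {
    enum Kind { AND, OR, NOT, ITEM }

    final Kind kind;
    final String text;         // item text such as "A.open"; null for operators
    final AstNode left, right; // NOT uses only left; ITEM has no children

    private AstNode(Kind kind, String text, AstNode left, AstNode right) {
        this.kind = kind;
        this.text = text;
        this.left = left;
        this.right = right;
    }

    static AstNode item(String text) { return new AstNode(Kind.ITEM, text, null, null); }

    static AstNode not(AstNode operand) { return new AstNode(Kind.NOT, null, operand, null); }

    static AstNode binary(Kind op, AstNode left, AstNode right) {
        if (op != Kind.AND && op != Kind.OR) throw new IllegalArgumentException("not a binary operator: " + op);
        return new AstNode(op, null, left, right);
    }

    // Evaluates the tree, looking up each item's truth value in the given map
    boolean eval(Map<String, Boolean> values) {
        switch (kind) {
            case AND: return left.eval(values) && right.eval(values);
            case OR:  return left.eval(values) || right.eval(values);
            case NOT: return !left.eval(values);
            default:
                Boolean v = values.get(text);
                if (v == null) throw new IllegalArgumentException("no value for item " + text);
                return v;
        }
    }

    @Override
    public String toString() {
        switch (kind) {
            case ITEM: return text;
            case NOT:  return "NOT(" + left + ")";
            default:   return kind + "(" + left + ", " + right + ")";
        }
    }
}
```

Built bottom-up the way the parser reduces, the tree for A.open AND NOT b.i prints as AND(A.open, NOT(b.i)) and, with A.open true and b.i false, evaluates to true.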