ANTLR Evaluation of a Compound Rule - java

The following is a snippet from an ANTLR grammar I have been working on:
compoundEvaluation returns [boolean evalResult]
: singleEvaluation (('AND'|'OR') singleEvaluation)*
;
//overall rule to evaluate a single expression
singleEvaluation returns [boolean evalResult]
: simpleStringEvaluation {$evalResult = $simpleStringEvaluation.evalResult;}
| stringEvaluation {$evalResult = $stringEvaluation.evalResult;}
| simpleDateEvaluation {$evalResult = $simpleDateEvaluation.evalResult;}
| dateEvaluatorWithModifier1 {$evalResult = $dateEvaluatorWithModifier1.evalResult;}
| dateEvaluatorWithoutModifier1 {$evalResult = $dateEvaluatorWithoutModifier1.evalResult;}
| simpleIntegerEvaluator {$evalResult = $simpleIntegerEvaluator.evalResult;}
| integerEvaluator {$evalResult = $integerEvaluator.evalResult;}
| integerEvaluatorWithModifier {$evalResult = $integerEvaluatorWithModifier.evalResult;}
;
Here's a sample of one of those evaluation rules:
simpleStringEvaluation returns [boolean evalResult]
: op1=STR_FIELD_IDENTIFIER operator=(EQ|NE) '"'? op2=(SINGLE_VALUE|INTEGER) '"'?
{
// I don't want these to be equal by default
String op1Value = op1.getText();
String op2Value = op2.getText();
try {
// get the values of the bean property specified by the value of op1 and op2
op1Value = BeanUtils.getProperty(policy,op1.getText());
} catch (NoSuchMethodException e) {
e.printStackTrace();
} catch (InvocationTargetException e) {
e.printStackTrace();
} catch (IllegalAccessException e) {
e.printStackTrace();
}
String strOperator = operator.getText();
if (strOperator.equals("=")) {
evalResult = op1Value.equals(op2Value);
}
if (strOperator.equals("<>")) {
evalResult = !op1Value.equals(op2Value);
}
}
;
Obviously I'm a newbie since I'm not building a tree, but the code works so I'm reasonably happy with it. However, the next step is to perform logical evaluations on multiple singleEvaluation statements. Since I'm embedding the code in the grammar, I was hoping someone could point me in the right direction to figure out how to evaluate 0 or more results.

There is no need to store the values in a set.
Why not simply do something like this:
compoundOrEvaluation returns [boolean evalResult]
: a=singleEvaluation { $evalResult = $a.evalResult; }
( ('OR') b=singleEvaluation { $evalResult ||= $b.evalResult; } )*
;
?

Here's how I did it. I created a Set as a member, then in each statement's #init, I reinitialized the Set. As the statement was evaluated, it populated the set. Since the only legal values of the set are true or false, I end up having a set with 0, 1, or two members.
The OR evaluation looks like this:
compoundOrEvaluation returns [boolean evalResult]
#init {evaluationResults = new HashSet<Boolean>();}
: a=singleEvaluation {evaluationResults.add($a.evalResult);} (('OR') b=singleEvaluation {evaluationResults.add($b.evalResult);})+
{
if (evaluationResults.size()==1) {
evalResult = evaluationResults.contains(true);
} else {
evalResult = true;
}
}
;
The AND evaluation only differs in the else statement, where evalResult will be set to false. So far, this passes the unit tests I can throw at it.
Eventually I may use a tree and a visitor class, but the code currently works.

Related

How can I detect if a user enters a string which does not follow my ANTLR grammar rules?

I am making a Computer Algebra System which will take an algebraic expression and simplify or differentiate it.
As you can see by the following code the user input is taken, but if it is a string which does not conform to my grammar rules the error,
line 1:6 mismatched input '' expecting {'(', INT, VAR}, occurs and the program continues running.
How would I catch the error and stop the program from running? Thank you in advance for any help.
Controller class:
public static void main(String[] args) throws IOException {
String userInput = "x*x*x+";
getAST(userInput);
}
public static AST getAST(String userInput) {
ParseTree tree = null;
ExpressionLexer lexer = null;
ANTLRInputStream input = new ANTLRInputStream(userInput);
try {
lexer = new ExpressionLexer(input);
}catch(Exception e) {
System.out.println("Incorrect grammar");
}
System.out.println("Lexer created");
CommonTokenStream tokens = new CommonTokenStream(lexer);
System.out.println("Tokens created");
ExpressionParser parser = new ExpressionParser(tokens);
System.out.println("Tokens parsed");
tree = parser.expr();
System.out.println("Tree created");
System.out.println(tree.toStringTree(parser)); // print LISP-style tree
Trees.inspect(tree, parser);
ParseTreeWalker walker = new ParseTreeWalker();
ExpressionListener listener = new buildAST();
walker.walk(listener, tree);
listener.printAST();
listener.extractExpression();
return new AST();
}
}
My Grammar:
grammar Expression;
#header {
package exprs;
}
#members {
// This method makes the parser stop running if it encounters
// invalid input and throw a RuntimeException.
public void reportErrorsAsExceptions() {
//removeErrorListeners();
addErrorListener(new ExceptionThrowingErrorListener());
}
private static class ExceptionThrowingErrorListener extends BaseErrorListener {
#Override
public void syntaxError(Recognizer<?, ?> recognizer,
Object offendingSymbol, int line, int charPositionInLine,
String msg, RecognitionException e) {
throw new RuntimeException(msg);
}
}
}
#rulecatch {
// ANTLR does not generate its normal rule try/catch
catch(RecognitionException e) {
throw e;
}
}
expr : left=expr op=('*'|'/'|'^') right=expr
| left=expr op=('+'|'-') right=expr
| '(' expr ')'
| atom
;
atom : INT|VAR;
INT : ('0'..'9')+ ;
VAR : ('a' .. 'z') | ('A' .. 'Z') | '_';
WS : [ \t\r\n]+ -> skip ;
A typical parse run with ANTLR4 consists of 2 stages:
A "quick'n dirty" run with SLL prediction mode that bails out on the first found syntax error.
A normal run using the LL prediction mode which tries to recover from parser errors. This second step only needs to be executed if there was an error in the first step.
The first step is kinda loose parse run which doesn't resolve certain ambiquities and hence can report an error which doesn't really exist (when resolved in LL mode). But the first step is faster and delivers so a quicker result for syntactically correct input. This (JS) code shows the setup:
this.parser.removeErrorListeners();
this.parser.addErrorListener(this.errorListener);
this.parser.errorHandler = new BailErrorStrategy();
this.parser.interpreter.setPredictionMode(PredictionMode.SLL);
try {
this.tree = this.parser.grammarSpec();
} catch (e) {
if (e instanceof ParseCancellationException) {
this.tokenStream.seek(0);
this.parser.reset();
this.parser.errorHandler = new DefaultErrorStrategy();
this.parser.interpreter.setPredictionMode(PredictionMode.LL);
this.tree = this.parser.grammarSpec();
} else {
throw e;
}
}
In order to avoid any resolve attempt for syntax errors in the first step you also have to set the BailErrorStrategy. This strategy simply throws a ParseCancellationException in case of a syntax error (similar like you do in your code). You could add your own handling in the catch clause to ask the user for correct input and respin the parse step.

Left Recursion: ANTLR

I am trying to do Lexical Analysis in Java grammar, but got stack in that error. I am in expression part right now, doing it in parts (just using string_expression):
expression:
( expression8)
;
expression8:
{Expression8Action}
((
( "+"
| "+=" )
e2=expression )e1=expression8)?
;
Solved with turning on backtrack (file .mwe2):
language = StandardLanguage {
name = "org.xtext.example.mydsl.MyDsl"
fileExtensions = "mydsl"
serializer = {
generateStub = false
}
validator = {
// composedCheck = "org.eclipse.xtext.validation.NamesAreUniqueValidator"
}
parserGenerator = {
options = {
backtrack = true
}
}
}

Only execute once for every ID in array list

I'm getting results from a PHP, and parse them to a String array
ParseResults[0] is the ID returned from the database.
What I'm trying to do, is make a message box, which is only shown once (until the application is restarted of course).
My code looks something like this, but I can't figure out what stops it from working properly.
public void ShowNotification() {
try {
ArrayList<String> SearchGNArray = OverblikIntetSvar(Main.BrugerID);
// SearchGNArray = Gets undecoded rows of information from DB
for(int i=0; i<SearchGNArray.size(); i++){
String[] ParseTilArray = ParseResultater(SearchGNArray.get(i));
// ParseToArray = Parse results and decode to useable results
// ParseToArray[0] = the index containing the ID we'd like
// to keep track of, if it already had shown a popup about it
if (SearchPopUpArray.size() == 0) {
// No ID's yet in SearchPopUpArray
// SearchPopUpArray = Where we'd like to store our already shown ID's
SearchPopUpArray.add(ParseTilArray[0]);
// Create Messagebox
}
boolean match = false ;
for(int ii=0; ii<SearchPopUpArray.size(); ii++){
try {
match = SearchPopUpArray.get(ii).equals(ParseTilArray[0]);
} catch (Exception e) {
e.printStackTrace();
}
if(match){
// There is a match
break; // Break to not create a popup
} else {
// No match in MatchPopUpArray
SearchPopUpArray.add(ParseTilArray[0]);
// Create a Messagebox
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
As of now I have 2 rows, so there should be two ID's. There's 101 and 102. It shows 102 once, and then it just keeps creating messageboxes about 101.
You are not incrementing the right variable in the second for-loop:
for(int ii = 0; ii <SearchPopUpArray.size();i++){
/* ^
|
should be ii++
*/
}
It might be help to use more descriptive variable name like indexGN and indexPopup instead to avoid this sort of issue
I've deleted the second for loop:
for(int ii=0; ii<SearchPopUpArray.size(); ii++){
try {
match = SearchPopUpArray.get(ii).equals(ParseTilArray[0]);
} catch (Exception e) {
e.printStackTrace();
}
if(match){
// There is a match
} else {
// No match in MatchPopUpArray
SearchPopUpArray.add(ParseTilArray[0]);
// Create a Messagebox
}
}
And replaced with
if (SearchPopUpArray.contains(ParseTilArray[0])) {
// Match
} else {
// No match i MatchPopUpArray
SearchPopUpArray.add(ParseTilArray[0]);
// Create a Messagebox
}
Much more simple.

Xtext: Inferring type in variable declaration not working in interface generation

I am writing my DSL's Model inferrer, which extends from AbstractModelInferrer. Until now, I have successfully generated classes for some grammar constructs, however when I try to generate an interface the type inferrer does not work and I get the following Exception:
0 [Worker-2] ERROR org.eclipse.xtext.builder.BuilderParticipant - Error during compilation of 'platform:/resource/pascani/src/org/example/namespaces/SLA.pascani'.
java.lang.IllegalStateException: equivalent could not be computed
The Model inferrer code is:
def dispatch void infer(Namespace namespace, IJvmDeclaredTypeAcceptor acceptor, boolean isPreIndexingPhase) {
acceptor.accept(processNamespace(namespace, isPreIndexingPhase))
}
def JvmGenericType processNamespace(Namespace namespace, boolean isPreIndexingPhase) {
namespace.toInterface(namespace.fullyQualifiedName.toString) [
if (!isPreIndexingPhase) {
documentation = namespace.documentation
for (e : namespace.expressions) {
switch (e) {
Namespace: {
members +=
e.toMethod("get" + Strings.toFirstUpper(e.name), typeRef(e.fullyQualifiedName.toString)) [
abstract = true
]
members += processNamespace(e, isPreIndexingPhase);
}
XVariableDeclaration: {
members += processNamespaceVarDecl(e)
}
}
}
}
]
}
def processNamespaceVarDecl(XVariableDeclaration decl) {
val EList<JvmMember> members = new BasicEList();
val field = decl.toField(decl.name, inferredType(decl.right))[initializer = decl.right]
// members += field
members += decl.toMethod("get" + Strings.toFirstUpper(decl.name), field.type) [
abstract = true
]
if (decl.isWriteable) {
members += decl.toMethod("set" + Strings.toFirstUpper(decl.name), typeRef(Void.TYPE)) [
parameters += decl.toParameter(decl.name, field.type)
abstract = true
]
}
return members
}
I have tried using the lazy initializer after the acceptor.accept method, but it still does not work.
When I uncomment the line members += field, which adds a field to an interface, the model inferrer works fine; however, as you know, interfaces cannot have fields.
This seems like a bug to me. I have read tons of posts in the Eclipse forum but nothing seems to solve my problem. In case it is needed, this is my grammar:
grammar org.pascani.Pascani with org.eclipse.xtext.xbase.Xbase
import "http://www.eclipse.org/xtext/common/JavaVMTypes" as types
import "http://www.eclipse.org/xtext/xbase/Xbase"
generate pascani "http://www.pascani.org/Pascani"
Model
: ('package' name = QualifiedName ->';'?)?
imports = XImportSection?
typeDeclaration = TypeDeclaration?
;
TypeDeclaration
: MonitorDeclaration
| NamespaceDeclaration
;
MonitorDeclaration returns Monitor
: 'monitor' name = ValidID
('using' usings += [Namespace | ValidID] (',' usings += [Namespace | ValidID])*)?
body = '{' expressions += InternalMonitorDeclaration* '}'
;
NamespaceDeclaration returns Namespace
: 'namespace' name = ValidID body = '{' expressions += InternalNamespaceDeclaration* '}'
;
InternalMonitorDeclaration returns XExpression
: XVariableDeclaration
| EventDeclaration
| HandlerDeclaration
;
InternalNamespaceDeclaration returns XExpression
: XVariableDeclaration
| NamespaceDeclaration
;
HandlerDeclaration
: 'handler' name = ValidID '(' param = FullJvmFormalParameter ')' body = XBlockExpression
;
EventDeclaration returns Event
: 'event' name = ValidID 'raised' (periodically ?= 'periodically')? 'on'? emitter = EventEmitter ->';'?
;
EventEmitter
: eventType = EventType 'of' emitter = QualifiedName (=> specifier = RelationalEventSpecifier)? ('using' probe = ValidID)?
| cronExpression = CronExpression
;
enum EventType
: invoke
| return
| change
| exception
;
RelationalEventSpecifier returns EventSpecifier
: EventSpecifier ({RelationalEventSpecifier.left = current} operator = RelationalOperator right = EventSpecifier)*
;
enum RelationalOperator
: and
| or
;
EventSpecifier
: (below ?= 'below' | above ?= 'above' | equal ?= 'equal' 'to') value = EventSpecifierValue
| '(' RelationalEventSpecifier ')'
;
EventSpecifierValue
: value = Number (percentage ?= '%')?
| variable = QualifiedName
;
CronExpression
: seconds = CronElement // 0-59
minutes = CronElement // 0-59
hours = CronElement // 0-23
days = CronElement // 1-31
months = CronElement // 1-2 or Jan-Dec
daysOfWeek = CronElement // 0-6 or Sun-Sat
| constant = CronConstant
;
enum CronConstant
: reboot // Run at startup
| yearly // 0 0 0 1 1 *
| annually // Equal to #yearly
| monthly // 0 0 0 1 * *
| weekly // 0 0 0 * * 0
| daily // 0 0 0 * * *
| hourly // 0 0 * * * *
| minutely // 0 * * * * *
| secondly // * * * * * *
;
CronElement
: RangeCronElement | PeriodicCronElement
;
RangeCronElement hidden()
: TerminalCronElement ({RangeCronElement.start = current} '-' end = TerminalCronElement)?
;
TerminalCronElement
: expression = (IntLiteral | ValidID | '*' | '?')
;
PeriodicCronElement hidden()
: expression = TerminalCronElement '/' elements = RangeCronList
;
RangeCronList hidden()
: elements += RangeCronElement (',' elements +=RangeCronElement)*
;
IntLiteral
: INT
;
UPDATE
The use of a field was a way to continue working in other stuff until I find a solution. The actual code is:
def processNamespaceVarDecl(XVariableDeclaration decl) {
val EList<JvmMember> members = new BasicEList();
val type = if (decl.right != null) inferredType(decl.right) else decl.type
members += decl.toMethod("get" + Strings.toFirstUpper(decl.name), type) [
abstract = true
]
if (decl.isWriteable) {
members += decl.toMethod("set" + Strings.toFirstUpper(decl.name), typeRef(Void.TYPE)) [
parameters += decl.toParameter(decl.name, type)
abstract = true
]
}
return members
}
From the answer in the Eclipse forum:
i dont know if that you are doing is a good idea. the inferrer maps
your concepts to java concepts and this enables the scoping for the
expressions. if you do not have a place for your expressions then it
wont work. their types never will be computed
thus i think you have a usecase which is not possible using xbase
without customizations. your semantics is not quite clear to me.
Christian Dietrich
My answer:
Thanks Christian, I though I was doing something wrong. If it seems not to be a common use case, then there is no problem, I will make sure the user explicitly defines a variable type.
Just to clarify a little bit, a Namespace is intended to define variables that are used in Monitors. That's why a Namespace becomes an interface, and a Monitor becomes a class.
Read the Eclipse forum thread

Iterating over tokens in HIDDEN channel

I am currently working on creating an IDE for the custom, very lua-like scripting language MobTalkerScript (MTS), which provides me with an ANTLR4 lexer. Since the specifications from the language file for MTS puts comments into the HIDDEN_CHANNEL channel, I need to tell the lexer to actually read from the HIDDEN_CHANNEL channel. This is how I tried to do that.
Mts3Lexer lexer = new Mts3Lexer(new ANTLRInputStream("<replace this with the input>"));
lexer.setTokenFactory(new CommonTokenFactory(false));
lexer.setChannel(Token.HIDDEN_CHANNEL);
Token token = lexer.emit();
int type = token.getType();
do {
switch(type) {
case Mts3Lexer.LINE_COMMENT:
case Mts3Lexer.COMMENT:
System.out.println("token "+token.getText()+" is a comment");
default:
System.out.println("token "+token.getText()+" is not a comment");
}
} while((token = lexer.nextToken()) != null && (type = token.getType()) != Token.EOF);
Now, if I use this code on the following input, nothing but token ... is not a comment gets printed to the console.
function foo()
-- this should be a single-line comment
something = "blah"
--[[ this should
be a multi-line
comment ]]--
end
The tokens containing the comments never show up, though. So I searched for the source of this problem and found the following method in the ANTLR4 Lexer class:
/** Return a token from this source; i.e., match a token on the char
* stream.
*/
#Override
public Token nextToken() {
if (_input == null) {
throw new IllegalStateException("nextToken requires a non-null input stream.");
}
// Mark start location in char stream so unbuffered streams are
// guaranteed at least have text of current token
int tokenStartMarker = _input.mark();
try{
outer:
while (true) {
if (_hitEOF) {
emitEOF();
return _token;
}
_token = null;
_channel = Token.DEFAULT_CHANNEL;
_tokenStartCharIndex = _input.index();
_tokenStartCharPositionInLine = getInterpreter().getCharPositionInLine();
_tokenStartLine = getInterpreter().getLine();
_text = null;
do {
_type = Token.INVALID_TYPE;
// System.out.println("nextToken line "+tokenStartLine+" at "+((char)input.LA(1))+
// " in mode "+mode+
// " at index "+input.index());
int ttype;
try {
ttype = getInterpreter().match(_input, _mode);
}
catch (LexerNoViableAltException e) {
notifyListeners(e); // report error
recover(e);
ttype = SKIP;
}
if ( _input.LA(1)==IntStream.EOF ) {
_hitEOF = true;
}
if ( _type == Token.INVALID_TYPE ) _type = ttype;
if ( _type ==SKIP ) {
continue outer;
}
} while ( _type ==MORE );
if ( _token == null ) emit();
return _token;
}
}
finally {
// make sure we release marker after match or
// unbuffered char stream will keep buffering
_input.release(tokenStartMarker);
}
}
The line that caught my eye was the following.
_channel = Token.DEFAULT_CHANNEL;
I don't know much about ANTLR, but apparently this line keeps the lexer in the DEFAULT_CHANNEL channel.
Is the way I tried to read from the HIDDEN_CHANNEL channel right or can't I use nextToken() with the hidden channel?
I found out why the lexer didn't give me any tokens containing the comments - I seem to have missed that the grammar file skips comments instead of putting them into the hidden channel. Contacted the author, changed the grammar file and now it works.
Note to myself: pay more attention to what you read.
For Go (golang) this snippet works for me:
import (
"github.com/antlr/antlr4/runtime/Go/antlr"
)
type antlrparser interface {
GetParser() antlr.Parser
}
func fullText(prc antlr.ParserRuleContext) string {
p := prc.(antlrparser).GetParser()
ts := p.GetTokenStream()
tx := ts.GetTextFromTokens(prc.GetStart(), prc.GetStop())
return tx
}
just pass your ctx.GetSomething() into fullText. Of course, as shown above, whitespace has to go to the hidden channel in the *.g4 file:
WS: [ \t\r\n] -> channel(HIDDEN);

Categories