Left Recursion: ANTLR - java

I am trying to do lexical analysis for a Java grammar, but I got stuck on a left-recursion error. I am in the expression part right now, doing it in parts (just using string_expression):
expression:
    (expression8)
    ;

expression8:
    {Expression8Action}
    ((("+" | "+=") e2=expression) e1=expression8)?
    ;

Solved by turning on backtracking (in the .mwe2 workflow file):
language = StandardLanguage {
    name = "org.xtext.example.mydsl.MyDsl"
    fileExtensions = "mydsl"

    serializer = {
        generateStub = false
    }
    validator = {
        // composedCheck = "org.eclipse.xtext.validation.NamesAreUniqueValidator"
    }
    parserGenerator = {
        options = {
            backtrack = true
        }
    }
}
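Backtracking works, but the usual Xtext pattern for binary operators avoids the left recursion altogether: start from the next-higher-precedence rule and loop, using a tree-rewrite action. A sketch in Xtext notation (the rule and type names here are illustrative, not from the original grammar):

Expression:
    Expression8;

Expression8 returns Expression:
    Expression9 ({BinaryExpression.left=current} op=("+" | "+=") right=Expression9)*;

With this shape the generated ANTLR parser never re-enters the same rule as its first step, so backtrack = true is not needed.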

Related

How to implement the visitor pattern for nested functions

I am a newbie to ANTLR and I want the below implementation to be done using ANTLR 4. I have the functions written below:
1. FUNCTION.add(Integer a,Integer b)
2. FUNCTION.concat(String a,String b)
3. FUNCTION.mul(Integer a,Integer b)
And I am storing the function metadata like this:
Map<String,String> map=new HashMap<>();
map.put("FUNCTION.add","Integer:Integer,Integer");
map.put("FUNCTION.concat","String:String,String");
map.put("FUNCTION.mul","Integer:Integer,Integer");
where Integer:Integer,Integer means that Integer is the return type and the input parameters the function accepts are Integer,Integer.
If the input is something like this:
FUNCTION.concat(Function.substring(String,Integer,Integer),String)
or
FUNCTION.concat(Function.substring("test",1,1),String)
Using a visitor implementation, I want to check whether the input is valid or not against the function metadata stored in the map.
Below is the lexer and parser that I'm using:
Lexer MyFunctionsLexer.g4:
lexer grammar MyFunctionsLexer;
FUNCTION: 'FUNCTION';
NAME: [A-Za-z0-9]+;
DOT: '.';
COMMA: ',';
L_BRACKET: '(';
R_BRACKET: ')';
Parser MyFunctionsParser.g4:
parser grammar MyFunctionsParser;
options {
    tokenVocab=MyFunctionsLexer;
}
function : FUNCTION '.' NAME '('(function | argument (',' argument)*)')';
argument: (NAME | function);
WS : [ \t\r\n]+ -> skip;
I am using Antlr4.
Below is the implementation I'm using as per the suggested answer.
Visitor Implementation:
public class FunctionValidateVisitorImpl extends MyFunctionsParserBaseVisitor<String> {

    Map<String, String> map = new HashMap<String, String>();

    public FunctionValidateVisitorImpl() {
        map.put("FUNCTION.add", "Integer:Integer,Integer");
        map.put("FUNCTION.concat", "String:String,String");
        map.put("FUNCTION.mul", "Integer:Integer,Integer");
        map.put("FUNCTION.substring", "String:String,Integer,Integer");
    }

    @Override
    public String visitFunctions(@NotNull MyFunctionsParser.FunctionsContext ctx) {
        System.out.println("entered the visitFunctions::");
        for (int i = 0; i < ctx.getChildCount(); ++i) {
            ParseTree c = ctx.getChild(i);
            if (c.getText().equals("<EOF>")) // compare string contents, not references
                continue;
            String top_level_result = visit(ctx.getChild(i));
            System.out.println(top_level_result);
            if (top_level_result == null) {
                System.out.println("Failed semantic analysis: " + ctx.getChild(i).getText());
            }
        }
        return null;
    }

    @Override
    public String visitFunction(MyFunctionsParser.FunctionContext ctx) {
        // Get function name and expected type information.
        String name = ctx.getChild(2).getText();
        String type = map.get("FUNCTION." + name);
        if (type == null) {
            return null; // not declared in function table.
        }
        String result_type = type.split(":")[0];
        String args_types = type.split(":")[1];
        String[] expected_arg_type = args_types.split(",");
        int j = 4; // first argument child: FUNCTION '.' NAME '(' <arg> ...
        ParseTree a = ctx.getChild(j);
        if (a instanceof MyFunctionsParser.FunctionContext) {
            String v = visit(a);
            if (!result_type.equals(v)) { // use equals(), not !=, for strings
                return null; // Handle type mismatch.
            }
        } else {
            for (int i = j; i < ctx.getChildCount(); i += 2) {
                ParseTree parameter = ctx.getChild(i);
                String v = visit(parameter);
                if (!expected_arg_type[(i - j) / 2].equals(v)) {
                    return null; // Handle type mismatch.
                }
            }
        }
        return result_type;
    }

    @Override
    public String visitArgument(ArgumentContext ctx) {
        ParseTree c = ctx.getChild(0);
        if (c instanceof TerminalNodeImpl) {
            // Unclear what this is supposed to parse:
            // Mutate "1" to "Integer"?
            // Mutate "Integer" to "String"?
            // Or what?
            return c.getText();
        } else {
            return visit(c);
        }
    }
}
Test class:
public class FunctionValidate {
    public static void main(String[] args) {
        String input = "FUNCTION.concat(FUNCTION.substring(String,Integer,Integer),String)";
        ANTLRInputStream str = new ANTLRInputStream(input);
        MyFunctionsLexer lexer = new MyFunctionsLexer(str);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        MyFunctionsParser parser = new MyFunctionsParser(tokens);
        parser.removeErrorListeners(); // remove ConsoleErrorListener
        parser.addErrorListener(new VerboseListener()); // add ours
        FunctionsContext tree = parser.functions();
        FunctionValidateVisitorImpl visitor = new FunctionValidateVisitorImpl();
        visitor.visit(tree);
    }
}
Lexer:
lexer grammar MyFunctionsLexer;
FUNCTION: 'FUNCTION';
NAME: [A-Za-z0-9]+;
DOT: '.';
COMMA: ',';
L_BRACKET: '(';
R_BRACKET: ')';
WS : [ \t\r\n]+ -> skip;
Parser:
parser grammar MyFunctionsParser;
options { tokenVocab=MyFunctionsLexer; }
functions : function* EOF;
function : FUNCTION '.' NAME '(' (function | argument (',' argument)*) ')';
argument: (NAME | function);
Verbose Listener:
public class VerboseListener extends BaseErrorListener {
    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine, String msg, RecognitionException e) {
        List<String> stack = ((Parser) recognizer).getRuleInvocationStack();
        Collections.reverse(stack);
        throw new FunctionInvalidException("line " + line + ":" + charPositionInLine + " at " + offendingSymbol + ": " + msg);
    }
}
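FunctionInvalidException is not shown in the question; since syntaxError cannot throw a checked exception, it is presumably a small unchecked exception along these lines (a sketch):

public class FunctionInvalidException extends RuntimeException {
    public FunctionInvalidException(String message) {
        super(message);
    }
}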
Output:
It is not entering the visitor implementation: the System.out.println("entered the visitFunctions::"); statement never prints.
Below is a solution in C#. This should give you an idea of how to proceed. You should be able to easily translate the code to Java.
For ease, I implemented the code using my extension AntlrVSIX for Visual Studio 2019 with .NET Core C#. It makes life easier to use a full IDE that supports building split lexer/parser grammars, debugging, and a plug-in suited for editing ANTLR grammars.
There are several things to note with your grammar. First, your parser grammar isn't accepted by ANTLR 4.7.2. The production "WS : [ \t\r\n]+ -> skip;" is a lexer rule; it can't go in a parser grammar. It has to go into the lexer grammar (or you define a combined grammar). Second, I personally wouldn't define lexer symbols like DOT and then use the RHS of the lexer symbol directly in the parser grammar, e.g., '.'. It's confusing, and I'm pretty sure there isn't an IDE or editor that would know how to go to the definition "DOT: '.';" in the lexer grammar if you positioned your cursor on the '.' in the parser grammar. I never understood why it's allowed in ANTLR, but c'est la vie. I would instead use the lexer symbol you define. Third, I would consider augmenting the parser grammar in the usual way with EOF, e.g., "functions : function* EOF". But this is entirely up to you.
Now, on the problem statement, your example input contains an inconsistency. In the first case, "substring(String,Integer,Integer)", the input is in a meta-like description of substring(). In the second case, "substring(\"test\",1,1)", you are parsing code. The first case parses with your grammar, the second does not--there's no string literal lexer rule defined in your lexer grammar. It's unclear what you really want to parse.
Overall, I defined the visitor code over strings, i.e., each method returns a string representing the output type of the function or argument, e.g., "Integer" or "String" or null if there was an error (or you could throw an exception for static semantic errors). Then, using Visit() on each child in the parse tree node, check the resulting string if it is expected, and handle matches as you like.
One other thing to note. You can solve this problem via a visitor or listener class. The visitor class is useful for purely synthesized attributes. In this example solution, I return a string that represents the type of the function or arg up the associated parse tree, checking the value for each important child. The listener class is useful for L-attributed grammars--i.e., where you are passing attributes in a DFS-oriented manner, left to right at each node in the tree. For this example, you could use the listener class and only override the Exit() functions, but you would then need a Map/Dictionary to map a "context" into an attribute (string).
lexer grammar MyFunctionsLexer;
FUNCTION: 'FUNCTION';
NAME: [A-Za-z0-9]+;
DOT: '.';
COMMA: ',';
L_BRACKET: '(';
R_BRACKET: ')';
WS : [ \t\r\n]+ -> skip;
parser grammar MyFunctionsParser;
options { tokenVocab=MyFunctionsLexer; }
functions : function* EOF;
function : FUNCTION '.' NAME '(' (function | argument (',' argument)*) ')';
argument: (NAME | function);
using Antlr4.Runtime;

namespace AntlrConsole2
{
    public class Program
    {
        static void Main(string[] args)
        {
            var input = @"FUNCTION.concat(FUNCTION.substring(String,Integer,Integer),String)";
            var str = new AntlrInputStream(input);
            var lexer = new MyFunctionsLexer(str);
            var tokens = new CommonTokenStream(lexer);
            var parser = new MyFunctionsParser(tokens);
            var listener = new ErrorListener<IToken>();
            parser.AddErrorListener(listener);
            var tree = parser.functions();
            if (listener.had_error)
            {
                System.Console.WriteLine("error in parse.");
            }
            else
            {
                System.Console.WriteLine("parse completed.");
            }
            var visitor = new Validate();
            visitor.Visit(tree);
        }
    }
}
namespace AntlrConsole2
{
    using System;
    using Antlr4.Runtime.Misc;
    using System.Collections.Generic;

    class Validate : MyFunctionsParserBaseVisitor<string>
    {
        Dictionary<String, String> map = new Dictionary<String, String>();

        public Validate()
        {
            map.Add("FUNCTION.add", "Integer:Integer,Integer");
            map.Add("FUNCTION.concat", "String:String,String");
            map.Add("FUNCTION.mul", "Integer:Integer,Integer");
            map.Add("FUNCTION.substring", "String:String,Integer,Integer");
        }

        public override string VisitFunctions([NotNull] MyFunctionsParser.FunctionsContext context)
        {
            for (int i = 0; i < context.ChildCount; ++i)
            {
                var c = context.GetChild(i);
                if (c.GetText() == "<EOF>")
                    continue;
                var top_level_result = Visit(context.GetChild(i));
                if (top_level_result == null)
                {
                    System.Console.WriteLine("Failed semantic analysis: "
                        + context.GetChild(i).GetText());
                }
            }
            return null;
        }

        public override string VisitFunction(MyFunctionsParser.FunctionContext context)
        {
            // Get function name and expected type information.
            var name = context.GetChild(2).GetText();
            map.TryGetValue("FUNCTION." + name, out string type);
            if (type == null)
            {
                return null; // not declared in function table.
            }
            string result_type = type.Split(":")[0];
            string args_types = type.Split(":")[1];
            string[] expected_arg_type = args_types.Split(",");
            const int j = 4;
            var a = context.GetChild(j);
            if (a is MyFunctionsParser.FunctionContext)
            {
                var v = Visit(a);
                if (v != result_type)
                {
                    return null; // Handle type mismatch.
                }
            }
            else
            {
                for (int i = j; i < context.ChildCount; i += 2)
                {
                    var parameter = context.GetChild(i);
                    var v = Visit(parameter);
                    if (v != expected_arg_type[(i - j) / 2])
                    {
                        return null; // Handle type mismatch.
                    }
                }
            }
            return result_type;
        }

        public override string VisitArgument([NotNull] MyFunctionsParser.ArgumentContext context)
        {
            var c = context.GetChild(0);
            if (c is Antlr4.Runtime.Tree.TerminalNodeImpl)
            {
                // Unclear what this is supposed to parse:
                // Mutate "1" to "Integer"?
                // Mutate "Integer" to "String"?
                // Or what?
                return c.GetText();
            }
            else
                return Visit(c);
        }
    }
}

How can I change JavaCC adder.jj to receive a String instead of a stream from the command prompt?

I created an adder.jj file following this tutorial (up to page 13, just before it starts the calculator example) to create an adder. It works great for computing the result of syntactically correct sequences of numbers and plus signs (e.g. "4+3 +7" returns 14, while "4++3" gives an error); those numbers and + signs come from a text file (this is explained in a bit).
Here is the code I use to generate the classes that do what is explained above:
options
{
    STATIC = false ;
}

PARSER_BEGIN(Adder)
class Adder
{
    public static void main (String[] args)
        throws ParseException, TokenMgrError, NumberFormatException
    {
        Adder parser = new Adder (System.in) ;
        int val = parser.Start() ;
        System.out.println(val) ;
    }
}
PARSER_END(Adder)

SKIP : { " " }
SKIP : { "\n" | "\r" | "\r\n" }
TOKEN : { < PLUS : "+" > }
TOKEN : { < NUMBER : (["0"-"9"])+ > }

int Start() throws NumberFormatException :
{
    int i ;
    int value ;
}
{
    value = Primary()
    (
        <PLUS>
        i = Primary()
        { value += i ; }
    )*
    { return value ; }
}

int Primary() throws NumberFormatException :
{
    Token t ;
}
{
    t = <NUMBER>
    { return Integer.parseInt( t.image ) ; }
}
The classes are generated with
javacc adder.jj
Then I compile the generated classes with
javac *.java
And finally
java Adder < ex1.txt
Gives the right output if the content of ex1.txt has the format I explained before.
How can I change this code to receive a String, so I can actually use it in my project, instead of reading from the command-line stream?
Try replacing
Adder parser = new Adder (System.in) ;
with
Reader reader = new StringReader( someString ) ;
Adder parser = new Adder( reader ) ;
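Putting it together, the PARSER_BEGIN section of adder.jj could look something like this (a minimal sketch; the parseString helper and the hard-coded sample string are illustrative, standing in for whatever your project supplies):

PARSER_BEGIN(Adder)
class Adder
{
    // Parse an in-memory String instead of System.in.
    public static int parseString (String someString)
        throws ParseException, TokenMgrError, NumberFormatException
    {
        java.io.Reader reader = new java.io.StringReader( someString ) ;
        Adder parser = new Adder( reader ) ;
        return parser.Start() ;
    }

    public static void main (String[] args)
        throws ParseException, TokenMgrError, NumberFormatException
    {
        System.out.println( parseString( "4+3 +7" ) ) ; // prints 14
    }
}
PARSER_END(Adder)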

Iterating over tokens in HIDDEN channel

I am currently working on creating an IDE for the custom, very Lua-like scripting language MobTalkerScript (MTS), which provides me with an ANTLR 4 lexer. Since the grammar file for MTS puts comments into the HIDDEN channel, I need to tell the lexer to actually read from the HIDDEN channel. This is how I tried to do that:
Mts3Lexer lexer = new Mts3Lexer(new ANTLRInputStream("<replace this with the input>"));
lexer.setTokenFactory(new CommonTokenFactory(false));
lexer.setChannel(Token.HIDDEN_CHANNEL);
Token token = lexer.emit();
int type = token.getType();
do {
    switch (type) {
        case Mts3Lexer.LINE_COMMENT:
        case Mts3Lexer.COMMENT:
            System.out.println("token " + token.getText() + " is a comment");
            break; // without this break, the comment case falls through to the default case
        default:
            System.out.println("token " + token.getText() + " is not a comment");
    }
} while ((token = lexer.nextToken()) != null && (type = token.getType()) != Token.EOF);
Now, if I use this code on the following input, nothing but token ... is not a comment gets printed to the console.
function foo()
    -- this should be a single-line comment
    something = "blah"
    --[[ this should
    be a multi-line
    comment ]]--
end
The tokens containing the comments never show up, though. So I searched for the source of this problem and found the following method in the ANTLR4 Lexer class:
/** Return a token from this source; i.e., match a token on the char
 *  stream.
 */
@Override
public Token nextToken() {
    if (_input == null) {
        throw new IllegalStateException("nextToken requires a non-null input stream.");
    }

    // Mark start location in char stream so unbuffered streams are
    // guaranteed at least have text of current token
    int tokenStartMarker = _input.mark();
    try {
        outer:
        while (true) {
            if (_hitEOF) {
                emitEOF();
                return _token;
            }

            _token = null;
            _channel = Token.DEFAULT_CHANNEL;
            _tokenStartCharIndex = _input.index();
            _tokenStartCharPositionInLine = getInterpreter().getCharPositionInLine();
            _tokenStartLine = getInterpreter().getLine();
            _text = null;
            do {
                _type = Token.INVALID_TYPE;
                // System.out.println("nextToken line "+tokenStartLine+" at "+((char)input.LA(1))+
                //                    " in mode "+mode+
                //                    " at index "+input.index());
                int ttype;
                try {
                    ttype = getInterpreter().match(_input, _mode);
                }
                catch (LexerNoViableAltException e) {
                    notifyListeners(e); // report error
                    recover(e);
                    ttype = SKIP;
                }
                if ( _input.LA(1)==IntStream.EOF ) {
                    _hitEOF = true;
                }
                if ( _type == Token.INVALID_TYPE ) _type = ttype;
                if ( _type == SKIP ) {
                    continue outer;
                }
            } while ( _type == MORE );
            if ( _token == null ) emit();
            return _token;
        }
    }
    finally {
        // make sure we release marker after match or
        // unbuffered char stream will keep buffering
        _input.release(tokenStartMarker);
    }
}
The line that caught my eye was the following.
_channel = Token.DEFAULT_CHANNEL;
I don't know much about ANTLR, but apparently this line keeps the lexer in the DEFAULT_CHANNEL channel.
Is the way I tried to read from the HIDDEN_CHANNEL channel right or can't I use nextToken() with the hidden channel?
I found out why the lexer didn't give me any tokens containing the comments: I had missed that the grammar file skips comments instead of putting them into the hidden channel. I contacted the author, changed the grammar file, and now it works.
Note to myself: pay more attention to what you read.
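For reference, once the grammar puts comments on the hidden channel instead of skipping them, a plain nextToken() loop does return them: channels only affect token streams such as CommonTokenStream, not the lexer itself. A minimal sketch (Mts3Lexer is from the question; source stands in for your input string):

Mts3Lexer lexer = new Mts3Lexer(new ANTLRInputStream(source));
for (Token token = lexer.nextToken();
        token.getType() != Token.EOF;
        token = lexer.nextToken()) {
    if (token.getChannel() == Token.HIDDEN_CHANNEL) {
        System.out.println("hidden token: " + token.getText());
    }
}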
For Go (golang) this snippet works for me:
import (
    "github.com/antlr/antlr4/runtime/Go/antlr"
)

type antlrparser interface {
    GetParser() antlr.Parser
}

func fullText(prc antlr.ParserRuleContext) string {
    p := prc.(antlrparser).GetParser()
    ts := p.GetTokenStream()
    tx := ts.GetTextFromTokens(prc.GetStart(), prc.GetStop())
    return tx
}
just pass your ctx.GetSomething() into fullText. Of course, as shown above, whitespace has to go to the hidden channel in the *.g4 file:
WS: [ \t\r\n]+ -> channel(HIDDEN);

Lucene multi word tokens with delimiter

I am just starting with Lucene, so it's probably a beginner's question. We are trying to implement a semantic search on digital books and already have a concept generator; for example, the concepts generated for a new article could be:
|Green Beans | Spring Onions | Cooking |
I am using Lucene to create an index on the books/articles using only the extracted concepts (stored in a temporary document for that purpose). But the standard analyzer creates single-word tokens: Green, Beans, Spring, Onions, Cooking, which of course is not the same thing.
My question: is there an analyzer that is able to detect delimiters around tokens (the | characters in our example), or an analyzer that is able to detect multi-word constructs?
I'm afraid we'll have to create our own analyzer, but I don't quite know where to start for that one.
Creating an analyzer is pretty easy. An analyzer is just a tokenizer optionally followed by token filters. In your case, you'd have to create your own tokenizer. Fortunately, you have a convenient base class for this: CharTokenizer.
You implement the isTokenChar method and make sure it returns false on the | character and true on any other character. Everything else will be considered part of a token.
Once you have the tokenizer, the analyzer should be straightforward, just look at the source code of any existing analyzer and do likewise.
Oh, and if you can have spaces between your | chars, just add a TrimFilter to the analyzer.
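A minimal sketch of such an analyzer, assuming a recent Lucene line (the package locations and the CharTokenizer/TrimFilter signatures have moved around between major Lucene releases, so adjust the imports to your version; the class name is illustrative):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.miscellaneous.TrimFilter;
import org.apache.lucene.analysis.util.CharTokenizer;

public final class ConceptAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Everything except '|' counts as a token character, so
        // "Green Beans" survives as one token.
        Tokenizer tokenizer = new CharTokenizer() {
            @Override
            protected boolean isTokenChar(int c) {
                return c != '|';
            }
        };
        // Strip the whitespace left around each concept.
        TokenStream stream = new TrimFilter(tokenizer);
        return new TokenStreamComponents(tokenizer, stream);
    }
}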
I came across this question because I am doing something with my Lucene mechanisms that creates data structures to do with sequencing, in effect "hijacking" the Lucene classes. Otherwise I can't imagine why anyone would want knowledge of the separators ("delimiters") between tokens, but as it was quite tricky, I thought I'd put it here for the benefit of anyone who might need it.
You have to rewrite your own versions of StandardTokenizer and StandardTokenizerImpl. These are both final classes so you can't extend them.
SeparatorDeliveringTokeniserImpl (tweaked from source of StandardTokenizerImpl):
3 new fields:
private int startSepPos = 0;
private int endSepPos = 0;
private String originalBufferAsString;
Tweak these methods:
public final void getText(CharTermAttribute t) {
    t.copyBuffer(zzBuffer, zzStartRead, zzMarkedPos - zzStartRead);
    if (originalBufferAsString == null) {
        originalBufferAsString = new String(zzBuffer, 0, zzBuffer.length);
    }
    // startSepPos == -1 is a "flag condition": it means that this token is the last one and it won't be followed by a sep
    if (startSepPos != -1) {
        // if the flag is NOT set, record the start pos of the next sep...
        startSepPos = zzMarkedPos;
    }
}
public final void yyreset(java.io.Reader reader) {
    zzReader = reader;
    zzAtBOL = true;
    zzAtEOF = false;
    zzEOFDone = false;
    zzEndRead = zzStartRead = 0;
    zzCurrentPos = zzMarkedPos = 0;
    zzFinalHighSurrogate = 0;
    yyline = yychar = yycolumn = 0;
    zzLexicalState = YYINITIAL;
    if (zzBuffer.length > ZZ_BUFFERSIZE)
        zzBuffer = new char[ZZ_BUFFERSIZE];
    // reset fields responsible for delivering separator...
    originalBufferAsString = null;
    startSepPos = 0;
    endSepPos = 0;
}
(inside getNextToken:)
if ((zzAttributes & 1) == 1) {
    zzAction = zzState;
    zzMarkedPosL = zzCurrentPosL;
    if ((zzAttributes & 8) == 8) {
        // every occurrence of a separator char leads here...
        endSepPos = zzCurrentPosL;
        break zzForAction;
    }
}
And make a new method:
String getPrecedingSeparator() {
    String sep = null;
    if (originalBufferAsString == null) {
        sep = new String(zzBuffer, 0, endSepPos);
    }
    else if (startSepPos == -1 || endSepPos <= startSepPos) {
        sep = "";
    }
    else {
        sep = originalBufferAsString.substring(startSepPos, endSepPos);
    }
    if (zzMarkedPos < startSepPos) {
        // ... then this is a sign that the next token will be the last one... and will NOT have a trailing separator
        // so set a "flag condition" for next time this method is called
        startSepPos = -1;
    }
    return sep;
}
SeparatorDeliveringTokeniser (tweaked from source of StandardTokenizer):
Add this:
private String separator;

String getSeparator() {
    // normally this delivers a preceding separator... but after incrementToken returns false, if there is a trailing
    // separator, it then delivers that...
    return separator;
}
(inside incrementToken:)
while (true) {
    int tokenType = scanner.getNextToken();
    // added NB this gives you the separator which PRECEDES the token
    // which you are about to get from scanner.getText( ... )
    separator = scanner.getPrecedingSeparator();
    if (tokenType == SeparatorDeliveringTokeniserImpl.YYEOF) {
        // NB at this point sep is equal to the trailing separator...
        return false;
    }
    ...
Usage:
In my FilteringTokenFilter subclass, called TokenAndSeparatorExamineFilter, the methods accept and end look like this:
@Override
public boolean accept() throws IOException {
    String sep = ((SeparatorDeliveringTokeniser) input).getSeparator();
    // a preceding separator can only be an empty String if we are currently
    // dealing with the first token and if the sequence starts with a token
    if (!sep.isEmpty()) {
        // ... do something with the preceding separator
    }
    // then get the token...
    String token = getTerm();
    // ... do something with the token
    // my filter does no filtering! Every token is accepted...:
    return true;
}

@Override
public void end() throws IOException {
    // deals with a trailing separator at the end of a sequence of tokens and separators
    // (if there is one, i.e. if the input doesn't end with a token)
    String sep = ((SeparatorDeliveringTokeniser) input).getSeparator();
    // NB will be an empty String if there is no trailing separator
    if (!sep.isEmpty()) {
        // ... do something with this trailing separator
    }
}

ANTLR Evaluation of a Compound Rule

The following is a snippet from an ANTLR grammar I have been working on:
compoundEvaluation returns [boolean evalResult]
    : singleEvaluation (('AND'|'OR') singleEvaluation)*
    ;

// overall rule to evaluate a single expression
singleEvaluation returns [boolean evalResult]
    : simpleStringEvaluation        {$evalResult = $simpleStringEvaluation.evalResult;}
    | stringEvaluation              {$evalResult = $stringEvaluation.evalResult;}
    | simpleDateEvaluation          {$evalResult = $simpleDateEvaluation.evalResult;}
    | dateEvaluatorWithModifier1    {$evalResult = $dateEvaluatorWithModifier1.evalResult;}
    | dateEvaluatorWithoutModifier1 {$evalResult = $dateEvaluatorWithoutModifier1.evalResult;}
    | simpleIntegerEvaluator        {$evalResult = $simpleIntegerEvaluator.evalResult;}
    | integerEvaluator              {$evalResult = $integerEvaluator.evalResult;}
    | integerEvaluatorWithModifier  {$evalResult = $integerEvaluatorWithModifier.evalResult;}
    ;
Here's a sample of one of those evaluation rules:
simpleStringEvaluation returns [boolean evalResult]
    : op1=STR_FIELD_IDENTIFIER operator=(EQ|NE) '"'? op2=(SINGLE_VALUE|INTEGER) '"'?
      {
          // I don't want these to be equal by default
          String op1Value = op1.getText();
          String op2Value = op2.getText();
          try {
              // get the value of the bean property named by op1
              op1Value = BeanUtils.getProperty(policy, op1.getText());
          } catch (NoSuchMethodException e) {
              e.printStackTrace();
          } catch (InvocationTargetException e) {
              e.printStackTrace();
          } catch (IllegalAccessException e) {
              e.printStackTrace();
          }
          String strOperator = operator.getText();
          if (strOperator.equals("=")) {
              evalResult = op1Value.equals(op2Value);
          }
          if (strOperator.equals("<>")) {
              evalResult = !op1Value.equals(op2Value);
          }
      }
    ;
Obviously I'm a newbie since I'm not building a tree, but the code works, so I'm reasonably happy with it. However, the next step is to perform logical evaluations across multiple singleEvaluation results. Since I'm embedding the code in the grammar, I was hoping someone could point me in the right direction for evaluating zero or more results.
There is no need to store the values in a set.
Why not simply do something like this:
compoundOrEvaluation returns [boolean evalResult]
    : a=singleEvaluation { $evalResult = $a.evalResult; }
      ( 'OR' b=singleEvaluation { $evalResult = $evalResult || $b.evalResult; } )*
    ;
?
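The same idea extends to the original compoundEvaluation rule, which mixes AND and OR: branch on the operator token in the action (a sketch, keeping the asker's left-to-right evaluation order):

compoundEvaluation returns [boolean evalResult]
    : a=singleEvaluation { $evalResult = $a.evalResult; }
      ( op=('AND'|'OR') b=singleEvaluation
        {
            if ($op.text.equals("AND")) {
                $evalResult = $evalResult && $b.evalResult;
            } else {
                $evalResult = $evalResult || $b.evalResult;
            }
        }
      )*
    ;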
Here's how I did it. I created a Set as a member; then, in each statement's @init block, I reinitialized the Set. As the statement was evaluated, it populated the set. Since the only legal values in the set are true and false, I end up with a set of zero, one, or two members.
The OR evaluation looks like this:
compoundOrEvaluation returns [boolean evalResult]
@init { evaluationResults = new HashSet<Boolean>(); }
    : a=singleEvaluation { evaluationResults.add($a.evalResult); }
      (('OR') b=singleEvaluation { evaluationResults.add($b.evalResult); })+
      {
          if (evaluationResults.size() == 1) {
              evalResult = evaluationResults.contains(true);
          } else {
              evalResult = true;
          }
      }
    ;
The AND evaluation only differs in the else statement, where evalResult will be set to false. So far, this passes the unit tests I can throw at it.
Eventually I may use a tree and a visitor class, but the code currently works.
