Hopefully there are a few experts in the EpochX framework around here... I'm not sure that the user group is still active.
I am attempting to implement simple recursion within their representation of a BNF grammar and have run into the following issue:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -9
at java.lang.String.substring(String.java:1911)
at org.epochx.epox.EpoxParser.parse(EpoxParser.java:235)
at org.epochx.epox.EpoxParser.parse(EpoxParser.java:254)
at org.epochx.tools.eval.EpoxInterpreter.eval(EpoxInterpreter.java:89)
at org.epochx.ge.model.epox.SAGE.getFitness(SAGE.java:266)
at org.epochx.ge.representation.GECandidateProgram.getFitness(GECandidateProgram.java:304)
at org.epochx.stats.StatField$7.getStatValue(StatField.java:97)
at org.epochx.stats.Stats.getStat(Stats.java:134)
at org.epochx.stats.StatField$8.getStatValue(StatField.java:117)
at org.epochx.stats.Stats.getStat(Stats.java:134)
at org.epochx.stats.Stats.getStats(Stats.java:162)
at org.epochx.stats.Stats.print(Stats.java:194)
at org.epochx.stats.Stats.print(Stats.java:178)
at org.epochx.ge.model.epox.Tester$1.onGenerationEnd(Tester.java:41)
at org.epochx.life.Life.fireGenerationEndEvent(Life.java:634)
at org.epochx.core.InitialisationManager.initialise(InitialisationManager.java:207)
at org.epochx.core.RunManager.run(RunManager.java:166)
at org.epochx.core.Model.run(Model.java:147)
at org.epochx.ge.model.GEModel.run(GEModel.java:82)
at org.epochx.ge.model.epox.Tester.main(Tester.java:55)
Java Result: 1
My simple grammar is structured as follows, where terminals are passed in separately to the evaluation function:
public static final String GRAMMAR_FRAGMENT = "<program> ::= <node>\n"
+ "<node> ::= <s_list>\n"
+ "<s_list> ::= <s> | <s> <s_list>\n"
+ "<s> ::= FUNCTION( <terminal> )\n"
+ "<terminal> ::= ";
Edit: Terminal creation -
// Generate the input sequences.
inputValues = BoolUtils.generateBoolSequences(4);
argNames = new String[4];
argNames[0] = "void";
argNames[1] = "bubbleSort";
argNames[2] = "int*";
argNames[3] = "numbers";
...
// Evaluate all possible inputValues.
for (final boolean[] vars: inputValues) {
// Convert to object array.
final Boolean[] objVars = ArrayUtils.toObject(vars);
Boolean result = null;
try {
interpreter.eval(program.getSourceCode(),
argNames, objVars);
score = (double)program.getParseTreeDepth();
} catch (final MalformedProgramException e) {
// Assign worst possible fitness and stop evaluating.
score = 0;
break;
}
}
The stack trace shows that the problem is actually in the EpoxParser. This means that it's not so much the grammar that is ill-formed, but rather that the programs that get generated cannot be parsed.
Because you're using the EpoxInterpreter, the programs that get generated get parsed as Epox programs. Epox is the name used to refer to the language that the tree representation of EpochX uses (a sort of corrupted form of Lisp which you can add your own literals/functions to). The parsing expects the S-expression format, tries to identify each function and terminal, and builds a tree made up of equivalent Node objects (see the org.epochx.epox.* packages). Then the tree can be evaluated to run the program.
But in Epox there's no built-in function called FUNCTION, nor any known literals 'void', 'bubbleSort', 'int*' or 'numbers', so the parsing fails. You need to add these constructs to the EpoxParser so that it knows how to parse them into nodes. You can do this with the declareFunction, declareLiteral and declareVariable methods (see the JavaDoc for the EpoxParser: http://www.epochx.org/javadoc/1.4/).
I am trying to parse Java class files using Java.g4 grammar and Antlr4.
There is a particular parser rule as follows:
classOrInterfaceType
: Identifier typeArguments? ('.' Identifier typeArguments? )*
;
I am parsing it in my visitor class in this way:
public String visitClassOrInterfaceType(JavaParser.ClassOrInterfaceTypeContext ctx) {
StringBuilder clsIntr = new StringBuilder("");
int n = ctx.getChildCount();
for(int i = 0; i < n; i++){
TerminalNode id = ctx.Identifier(i);
if(id!=null){
clsIntr.append(id.getText()).append(" ");
}
TypeArgumentsContext typArgCtx =ctx.typeArguments(i);
if(typArgCtx!=null){
String val = this.visitTypeArguments(typArgCtx);
clsIntr.append(val);
}
}
return clsIntr.toString();
}
Is this correct, or is there some other way to do this?
Your approach looks OK, although that ultimately depends on what you are actually trying to do. My crystal ball tells me you are trying to reconstruct the original input text by walking the parse tree. However, you can get that much more simply. Each parse context has start and stop members that hold the tokens delimiting the text range this context stands for. You could use those to get the original text exactly as it was entered (via the token stream and the tokens' positions).
Using Antlr4, I want to generate the parse tree in the form of Java/JavaScript code.
This is what my main.Java looks like
String sql = "SELECT log AS x FROM t1 \n" +
"GROUP BY x\n" +
"HAVING count(*) >= 4 \n" +
"ORDER BY max(n) + 0";
// Create a lexer and parser for the input.
SQLiteLexer lexer = new SQLiteLexer(new ANTLRInputStream(sql));
SQLiteParser parser = new SQLiteParser(new CommonTokenStream(lexer));
// Invoke the `select_stmt` production.
ParseTree tree = parser.select_stmt();
ParseTreeWalker walker = new ParseTreeWalker();
SQLiteListener listener = new SQLiteBaseListener();
ParseTreeWalker.DEFAULT.walk(listener, tree);
System.out.println(listener.);
What function should I invoke to generate the parse tree in code format?
I'm not sure what you meant by "hierarchical" view. There is a -gui option in ANTLR's TestRig command line tool (commonly aliased as grun), if that is what you are looking for. Otherwise, you can print how the grammar is evaluated by adding actions and/or Sysouts in the enter/exit listener methods created by ANTLR. For example, if you have the following grammar:
grammar Grammar;
@lexer::header{
//package name where Java files will be created
}
@parser::header{
//package name where Java files will be created
}
value : letters | number | string;
letters : LETTERS;
number : NUMBER;
string : STRING;
LETTERS : '/*' {System.out.println("Found Letters!");};
NUMBER : [0-9]+ {System.out.println("Found Number!");};
STRING : [a-zA-Z0-9]+ {System.out.println("Found String!");};
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
ANTLR4 will generate a GrammarListener.java (assuming that your grammar is called Grammar.g4) if you right-click the .g4 file in Eclipse and select "Generate ANTLR Recognizer" with the ANTLR4 IDE plugin installed. You can also generate the parser and lexer from Java by calling the static org.antlr.v4.Tool.main:
Tool.main(new String[]{grammarFile, "-o", outputDirectory});
The generated interface will contain methods like the following:
public interface GrammarListener extends ParseTreeListener {
void enterValue(GrammarParser.ValueContext ctx);
void exitValue(GrammarParser.ValueContext ctx);
void enterLetters(GrammarParser.LettersContext ctx);
void exitLetters(GrammarParser.LettersContext ctx);
.
.
}
You would need to implement this interface...
public class GrammarListenerImpl implements GrammarListener {
.
.
@Override
public void enterLetters(GrammarParser.LettersContext ctx) {
System.out.println("Enter: Letters");
// do other stuff
.
.
}
and add Sysouts, in this case, or other business logic to handle when this match occurs in the grammar. The Sysouts can generate something like:
Enter value
Enter Letters
Do something...
Exit Letters
Exit value
This would show, as nested output (indent it with tabs or spaces), the call sequence in which the grammar is evaluated.
This line:
ParseTree tree = parser.select_stmt();
is your parse tree. Check out the API docs to see what methods it has: http://www.antlr.org/api/JavaTool/org/antlr/v4/runtime/tree/ParseTree.html
You'll probably be interested in its getChild(...) and getParent() methods.
JSpeech Grammar Format allows user to specify tags for separate strings in curly brackets as follows:
<jump> = jump { primitive jump } [up] |
jump [to the] (left { primitive jump_left } |right { primitive jump_right } );
or
<effects> = nothing happens { NOTHING_HAPPENS } | ( [will] die | dies ) { OBJECT_DESTRUCTION } | (get|gets) new (coin|coins) { COIN_INCREASE };
Using tags is described more thoroughly in section 4.6.1 of the referenced specification.
In Sphinx4 you can catch these tags using the getTags() method in RuleParse. So if the user says "jump to the left", the following tag will be returned: "primitive jump_left".
Now, I would like to do exactly the opposite: given the tag, I would like to match it to the string. So for "NOTHING_HAPPENS" I would like to get "nothing happens", or for "OBJECT_DESTRUCTION" an array with all possible options: "will die, die, dies".
Is there any method that can parse grammar files in such a way, or do I have to hardcode it?
My solution to this is to generate all possible sentences defined by the JSGF file. This can be done easily with the dumpRandomSentences or getRandomSentence methods provided by the Grammar class in Sphinx; the sentences are then given back to the Recognizer, which will print out the tags.
Sample code from my project:
for (int i = 0; i < 20000; i++) {
String utterance = grammar.getRandomSentence();
String tags;
try {
tags = parser.getTagString(utterance);
System.out.println(tags+" ==> "+utterance);
} catch (GrammarException e) {
error(e.toString());
}
}
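The loop above only prints each pair. To get the reverse lookup the question asks for, one option is to collect every sampled utterance under the tag it produces. The following is a minimal sketch, not Sphinx4 API: TagIndex, build and the tagOf parameter are hypothetical names, and tagOf stands in for a wrapper around the parser.getTagString(utterance) call shown above. Because the sentences are sampled at random, rare alternatives may be missed unless enough samples are drawn.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical helper (not part of Sphinx4): groups utterances by the tag they produce.
final class TagIndex {

    // tagOf is an assumption: any function mapping an utterance to its tag string,
    // for example a wrapper around parser.getTagString(utterance).
    static Map<String, SortedSet<String>> build(Iterable<String> utterances,
                                                Function<String, String> tagOf) {
        Map<String, SortedSet<String>> index = new LinkedHashMap<>();
        for (String utterance : utterances) {
            String tag = tagOf.apply(utterance);
            if (tag == null || tag.isEmpty()) {
                continue; // this utterance carries no tag
            }
            // A set removes the duplicates that random sampling produces.
            index.computeIfAbsent(tag, k -> new TreeSet<>()).add(utterance);
        }
        return index;
    }

    public static void main(String[] args) {
        // Stand-in tagger for the <effects> rule; a real one would ask the recognizer.
        Function<String, String> tagOf = u ->
                u.equals("nothing happens") ? "NOTHING_HAPPENS"
              : u.endsWith("die") || u.equals("dies") ? "OBJECT_DESTRUCTION"
              : "";
        List<String> sampled =
                Arrays.asList("dies", "will die", "nothing happens", "die", "dies");
        System.out.println(build(sampled, tagOf));
    }
}
```

Looking up index.get("OBJECT_DESTRUCTION") then yields the set of sentences seen for that tag.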
I have to retrieve a set of column values from D/B and check it as a condition.
For example, I will have strings like "value > 2", "4 < value < 6" in a D/B column. (value is the one which is compared all the time). I will have a variable value declared in my code and I should evaluate this condition.
int value = getValue();
if (value > 2) //(the string retrieved from the D/B)
doSomething();
How can I do this? Any help is much appreciated. Thanks.
Here is an example using the standard (Java 1.6+) scripting library:
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
public class Test {
public static void main(String[] args) throws Exception {
ScriptEngineManager factory = new ScriptEngineManager();
ScriptEngine engine = factory.getEngineByName("JavaScript");
engine.eval("value = 10");
Boolean greaterThan5 = (Boolean) engine.eval("value > 5");
Boolean lessThan5 = (Boolean) engine.eval("value < 5");
System.out.println("10 > 5? " + greaterThan5); // true
System.out.println("10 < 5? " + lessThan5); // false
}
}
You are basically evaluating a scripted expression. Depending what is allowed in that expression, you can get away with something very simple (regular expression identifying the groups) or very complex (embed a javascript engine?).
I'm assuming you're looking at the simple case, where the expression is:
[boundary0] [operator] "value" [operator] [boundary1]
Where one, but not both, of the [boundary] [operator] groups might be omitted. (And if both are present, the operator should be the same.)
I would use a regular expression for that with capturing groups.
Something like: (?:(\d+)\s*([<>])\s*)?value(?:\s*([<>])\s*(\d+))?
And then:
boundary1 = group(1); operator1 = group(2); operator2 = group(3); boundary2 = group(4)
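As a rough illustration of how those four groups could be turned into a result, here is a minimal sketch. ConditionEvaluator, evaluate and compare are hypothetical names. It assumes integer boundaries and only the < and > operators, and it does not enforce that both operators are the same.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: evaluates strings such as "value > 2" or "4 < value < 6".
final class ConditionEvaluator {

    // Same pattern as above, with backslashes escaped for a Java string literal.
    private static final Pattern CONDITION = Pattern.compile(
            "(?:(\\d+)\\s*([<>])\\s*)?value(?:\\s*([<>])\\s*(\\d+))?");

    static boolean evaluate(String condition, int value) {
        Matcher m = CONDITION.matcher(condition.trim());
        // Reject anything that does not match, and a bare "value" with no boundary.
        if (!m.matches() || (m.group(1) == null && m.group(4) == null)) {
            throw new IllegalArgumentException("Unsupported condition: " + condition);
        }
        boolean result = true;
        if (m.group(1) != null) { // left side: boundary <op> value
            result &= compare(Integer.parseInt(m.group(1)), m.group(2), value);
        }
        if (m.group(4) != null) { // right side: value <op> boundary
            result &= compare(value, m.group(3), Integer.parseInt(m.group(4)));
        }
        return result;
    }

    private static boolean compare(int left, String op, int right) {
        return op.equals("<") ? left < right : left > right;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("value > 2", 3));
        System.out.println(evaluate("4 < value < 6", 7));
    }
}
```

In the question's terms, the call would be evaluate(conditionFromDb, getValue()), where conditionFromDb is the string read from the column.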
It's not going to be trivial: you need a parser for the expression language used in your database. If it's some standard, well-specified language, then you might be able to find one on the Internet, but if it's an in-house thing, then you may need to write your own (perhaps using a parser generator like ANTLR.)
The javax.script package contains some tools for integrating external scripting engines like a Javascript interpreter. An alternative idea would be to bring in a scripting engine and feed the expressions to that.
You should try parsing the string inside the if statement by doing something like
if(parseMyString(getValue()))
doSomething();
In parseMyString you can determine what you need to evaluate for the condition. If you don't know how to create a parser, then take a look at: http://www.javapractices.com/topic/TopicAction.do?Id=87
This doesn't answer your question per se; it offers an alternate solution that may effect the same result.
Instead of storing a single database column with pseudocode that defines a condition, make a table in which the schema defines the types of conditions that must be satisfied and the values of those conditions. This simplifies programmatic evaluation of those conditions, but it may become complicated if you have a variety of types of conditions to evaluate.
For example, you might have a table that looks like the following.
CONDITION_ID | MINIMUM | MAXIMUM | IS_PRIME | ETC.
______________________________________________________
1 | 2 | NULL | NULL | ...
2 | 4 | 6 | NULL | ...
Those row entries map, respectively, to the rules value > 2 and 6 > value > 4.
This confers a number of benefits over the approach you provide.
Improved performance and cleanliness
Your conditions can be evaluated at the database level, and can be used to filter queries
You needn't worry about handling scenarios in which your pseudocode syntax is broken
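On the Java side, a row of such a table could be evaluated roughly as follows. This is only a sketch under assumptions: RangeCondition and matches are hypothetical names, a NULL column is represented by a null Integer, both bounds are treated as exclusive (as in the two rules above), and columns such as IS_PRIME are left out.

```java
// Hypothetical in-memory form of one row of the conditions table.
final class RangeCondition {
    private final Integer minimum; // null means "no lower bound" (NULL column)
    private final Integer maximum; // null means "no upper bound" (NULL column)

    RangeCondition(Integer minimum, Integer maximum) {
        this.minimum = minimum;
        this.maximum = maximum;
    }

    // Exclusive bounds, matching the rules "value > 2" and "6 > value > 4".
    boolean matches(int value) {
        return (minimum == null || value > minimum)
            && (maximum == null || value < maximum);
    }

    public static void main(String[] args) {
        RangeCondition row1 = new RangeCondition(2, null); // CONDITION_ID 1
        RangeCondition row2 = new RangeCondition(4, 6);    // CONDITION_ID 2
        System.out.println(row1.matches(3));
        System.out.println(row2.matches(6));
    }
}
```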
For evaluating the conditions with maximum flexibility, use a scripting language designed for embedding; for instance, MVEL can parse and evaluate simple conditional expressions like the ones in the question.
Using MVEL has one huge advantage over using the scripting engine in Java 1.6+ (in particular, with JavaScript): with MVEL you can compile the scripts to bytecode, making their evaluation much more efficient at runtime.
The latest version of Java (Java 7) allows switch statements on Strings. If there are not many possible variations, you could just do this or something similar:
int value = getValue();
switch(myString) {
case "value > 2" : if (value > 2) { doSomething();} break;
case "4 < value < 6" : if (value > 4 && value < 6) { doSomethingElse();} break;
default : doDefault();
}
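A very good way of doing this, apart from using Java 7's switch on strings, is to use enums.
Declare an enum as shown below:
enum MyEnum
{
val1("value > 2"), val2("4<value<6");
private final String value;
private MyEnum(String value)
{
this.value = value;
}
// Look up the constant whose stored string matches the database value.
// (MyEnum.valueOf only matches constant names such as "val1".)
static MyEnum fromString(String text)
{
for (MyEnum e : values())
{
if (e.value.equals(text)) return e;
}
throw new IllegalArgumentException("Unknown condition: " + text);
}
}
The above enum has a collection of constants whose values are set to the strings that you expect to be returned from the database. As you can use enums in switch statements, the remaining code becomes easy:
public static void chooseStrategy(MyEnum enumVal)
{
int value = getValue();
switch(enumVal)
{
case val1:
if(value > 2){}
break;
case val2:
if(4 < value && value < 6) {}
break;
default:
}
}
public static void main(String[] args)
{
String str = "4<value<6";
chooseStrategy(MyEnum.fromString(str));
}
All you have to do is pass your string to the fromString lookup method, and it will return the appropriate enum constant, which is then used in a switch block to perform the conditional operation. Note that the built-in MyEnum.valueOf method would not work here, because it matches constant names (val1, val2) rather than the stored strings. In the above code you can pass any string stored in the enum in place of the one used in this example.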
I'm trying to write a simple interactive (using System.in as source) language using antlr, and I have a few problems with it. The examples I've found on the web are all using a per line cycle, e.g.:
while(readline)
result = parse(line)
doStuff(result)
But what if I'm writing something like pascal/smtp/etc, with a "first line looks like X" requirement? I know it can be checked in doStuff, but I think logically it is part of the syntax.
Or what if a command is split into multiple lines? I can try
while(readline)
lines.add(line)
try
result = parse(lines)
lines = []
doStuff(result)
catch
nop
But with this I'm also hiding real errors.
Or I could reparse all lines every time, but:
it will be slow
there are instructions I don't want to run twice
Can this be done with ANTLR, or if not, with something else?
Dutow wrote:
Or I could reparse all lines every time, but:
it will be slow
there are instructions I don't want to run twice
Can this be done with ANTLR, or if not, with something else?
Yes, ANTLR can do this. Perhaps not out of the box, but with a bit of custom code, it sure is possible. You also don't need to re-parse the entire token stream for it.
Let's say you want to parse a very simple language line by line, where each line is either a program declaration, a uses declaration, or a statement.
It should always start with a program declaration, followed by zero or more uses declarations followed by zero or more statements. uses declarations cannot come after statements and there can't be more than one program declaration.
For simplicity, a statement is just a simple assignment: a = 4 or b = a.
An ANTLR grammar for such a language could look like this:
grammar REPL;
parse
: programDeclaration EOF
| usesDeclaration EOF
| statement EOF
;
programDeclaration
: PROGRAM ID
;
usesDeclaration
: USES idList
;
statement
: ID '=' (INT | ID)
;
idList
: ID (',' ID)*
;
PROGRAM : 'program';
USES : 'uses';
ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
INT : '0'..'9'+;
SPACE : (' ' | '\t' | '\r' | '\n') {skip();};
But we'll need to add a couple of checks, of course. Also, by default, a parser takes a token stream in its constructor, but since we're planning to trickle tokens into the parser line by line, we'll need to create a new constructor in our parser. You can add custom members to your lexer or parser classes by putting them in a @parser::members { ... } or @lexer::members { ... } section respectively. We'll also add a couple of boolean flags to keep track of whether the program declaration has happened already and whether uses declarations are allowed. Finally, we'll add a process(String source) method which, for each new line, creates a lexer which gets fed to the parser.
All of that would look like:
@parser::members {
boolean programDeclDone;
boolean usesDeclAllowed;
public REPLParser() {
super(null);
programDeclDone = false;
usesDeclAllowed = true;
}
public void process(String source) throws Exception {
ANTLRStringStream in = new ANTLRStringStream(source);
REPLLexer lexer = new REPLLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
super.setTokenStream(tokens);
this.parse(); // the entry point of our parser
}
}
Now, inside our grammar, we're going to check through a couple of gated semantic predicates whether we're parsing declarations in the correct order. After parsing a certain declaration or statement, we'll want to flip certain boolean flags to allow or disallow declarations from then on. The flipping of these boolean flags is done through each rule's @after { ... } section, which gets executed (not surprisingly) after the tokens from that parser rule are matched.
Your final grammar file now looks like this (including some System.out.println's for debugging purposes):
grammar REPL;
@parser::members {
boolean programDeclDone;
boolean usesDeclAllowed;
public REPLParser() {
super(null);
programDeclDone = false;
usesDeclAllowed = true;
}
public void process(String source) throws Exception {
ANTLRStringStream in = new ANTLRStringStream(source);
REPLLexer lexer = new REPLLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
super.setTokenStream(tokens);
this.parse();
}
}
parse
: programDeclaration EOF
| {programDeclDone}? (usesDeclaration | statement) EOF
;
programDeclaration
@after{
programDeclDone = true;
}
: {!programDeclDone}? PROGRAM ID {System.out.println("\t\t\t program <- " + $ID.text);}
;
usesDeclaration
: {usesDeclAllowed}? USES idList {System.out.println("\t\t\t uses <- " + $idList.text);}
;
statement
@after{
usesDeclAllowed = false;
}
: left=ID '=' right=(INT | ID) {System.out.println("\t\t\t " + $left.text + " <- " + $right.text);}
;
idList
: ID (',' ID)*
;
PROGRAM : 'program';
USES : 'uses';
ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
INT : '0'..'9'+;
SPACE : (' ' | '\t' | '\r' | '\n') {skip();};
which can be tested with the following class:
import org.antlr.runtime.*;
import java.util.Scanner;
public class Main {
public static void main(String[] args) throws Exception {
Scanner keyboard = new Scanner(System.in);
REPLParser parser = new REPLParser();
while(true) {
System.out.print("\n> ");
String input = keyboard.nextLine();
if(input.equals("quit")) {
break;
}
parser.process(input);
}
System.out.println("\nBye!");
}
}
To run this test class, do the following:
# generate a lexer and parser:
java -cp antlr-3.2.jar org.antlr.Tool REPL.g
# compile all .java source files:
javac -cp antlr-3.2.jar *.java
# run the main class on Windows:
java -cp .;antlr-3.2.jar Main
# or on Linux/Mac:
java -cp .:antlr-3.2.jar Main
As you can see, you can only declare a program once:
> program A
program <- A
> program B
line 1:0 rule programDeclaration failed predicate: {!programDeclDone}?
uses cannot come after statements:
> program X
program <- X
> uses a,b,c
uses <- a,b,c
> a = 666
a <- 666
> uses d,e
line 1:0 rule usesDeclaration failed predicate: {usesDeclAllowed}?
and you must start with a program declaration:
> uses foo
line 1:0 rule parse failed predicate: {programDeclDone}?
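The ordering rules that the two flags and the predicates enforce can also be read as a small state machine. The sketch below restates them in plain Java purely for illustration. DeclarationOrder, Kind and accept are hypothetical names; none of this is generated by ANTLR, and it only mirrors the checks in the grammar above.

```java
// Hypothetical restatement of the grammar's ordering checks; not ANTLR-generated code.
final class DeclarationOrder {
    enum Kind { PROGRAM, USES, STATEMENT }

    private boolean programDeclDone = false; // same role as the flag in the grammar
    private boolean usesDeclAllowed = true;  // same role as the flag in the grammar

    // Returns true if a line of this kind is allowed at this point in the session.
    boolean accept(Kind kind) {
        switch (kind) {
            case PROGRAM:
                if (programDeclDone) {
                    return false;        // mirrors {!programDeclDone}?
                }
                programDeclDone = true;  // mirrors the @after block
                return true;
            case USES:
                // mirrors {programDeclDone}? followed by {usesDeclAllowed}?
                return programDeclDone && usesDeclAllowed;
            case STATEMENT:
                if (!programDeclDone) {
                    return false;        // mirrors {programDeclDone}?
                }
                usesDeclAllowed = false; // mirrors the @after block
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        DeclarationOrder order = new DeclarationOrder();
        System.out.println(order.accept(Kind.PROGRAM));
        System.out.println(order.accept(Kind.USES));
        System.out.println(order.accept(Kind.STATEMENT));
        System.out.println(order.accept(Kind.USES));
    }
}
```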
Here's an example of how to parse input from System.in without first manually parsing it one line at a time and without making major compromises in the grammar. I'm using ANTLR 3.4. ANTLR 4 may have addressed this problem already. I'm still using ANTLR 3, though, and maybe someone else with this problem still is too.
Before getting into the solution, here are the hurdles I ran into that keep this seemingly trivial problem from being easy to solve:
The built-in ANTLR classes that derive from CharStream consume the entire stream of data up-front. Obviously an interactive mode (or any other indeterminate-length stream source) can't provide all the data.
The built-in BufferedTokenStream and derived class(es) will not end on a skipped or off-channel token. In an interactive setting, this means that the current statement can't end (and therefore can't execute) until the first token of the next statement or EOF has been consumed when using one of these classes.
The end of the statement itself may be indeterminate until the next statement begins.
Consider a simple example:
statement: 'verb' 'noun' ('and' 'noun')*
;
WS: //etc...
Interactively parsing a single statement (and only a single statement) isn't possible. Either the next statement has to be started (that is, hitting "verb" in the input), or the grammar has to be modified to mark the end of the statement, e.g. with a ';'.
I haven't found a way to manage a multi-channel lexer with my solution. It doesn't hurt me since I can replace my $channel = HIDDEN with skip(), but it's still a limitation worth mentioning.
A grammar may need a new rule to simplify interactive parsing.
For example, my grammar's normal entry point is this rule:
script
: statement* EOF -> ^(STMTS statement*)
;
My interactive session can't start at the script rule because it won't end until EOF. But it can't start at statement either because STMTS might be used by my tree parser.
So I introduced the following rule specifically for an interactive session:
interactive
: statement -> ^(STMTS statement)
;
In my case, there are no "first line" rules, so I can't say how easy or hard it would be to do something similar for them. It may be a matter of making a rule like so and execute it at the beginning of the interactive session:
interactive_start
: first_line
;
The code behind a grammar (e.g., code that tracks symbols) may have been written under the assumption that the lifespan of the input and the lifespan of the parser object would effectively be the same. For my solution, that assumption doesn't hold. The parser gets replaced after each statement, so the new parser must be able to pick up the symbol tracking (or whatever) where the last one left off. This is a typical separation-of-concerns problem so I don't think there's much else to say about it.
The first problem mentioned, the limitations of the built-in CharStream classes, was my only major hang-up. ANTLRStringStream has all the workings that I need, so I derived my own CharStream class off of it. The base class's data member is assumed to have all the past characters read, so I needed to override all the methods that access it. Then I changed the direct read to a call to (new method) dataAt to manage reading from the stream. That's basically all there is to this. Please note that the code here may have unnoticed problems and does no real error handling.
public class MyInputStream extends ANTLRStringStream {
private InputStream in;
public MyInputStream(InputStream in) {
super(new char[0], 0);
this.in = in;
}
@Override
// copied almost verbatim from ANTLRStringStream
public void consume() {
if (p < n) {
charPositionInLine++;
if (dataAt(p) == '\n') {
line++;
charPositionInLine = 0;
}
p++;
}
}
@Override
// copied almost verbatim from ANTLRStringStream
public int LA(int i) {
if (i == 0) {
return 0; // undefined
}
if (i < 0) {
i++; // e.g., translate LA(-1) to use offset i=0; then data[p+0-1]
if ((p + i - 1) < 0) {
return CharStream.EOF; // invalid; no char before first char
}
}
// Read ahead
return dataAt(p + i - 1);
}
@Override
public String substring(int start, int stop) {
if (stop >= n) {
//Read ahead.
dataAt(stop);
}
return new String(data, start, stop - start + 1);
}
private int dataAt(int i) {
ensureRead(i);
if (i < n) {
return data[i];
} else {
// Nothing to read at that point.
return CharStream.EOF;
}
}
private void ensureRead(int i) {
if (i < n) {
// The data has been read.
return;
}
int distance = i - n + 1;
ensureCapacity(n + distance);
// Crude way to copy from the byte stream into the char array.
for (int pos = 0; pos < distance; ++pos) {
int read;
try {
read = in.read();
} catch (IOException e) {
// TODO handle this better.
throw new RuntimeException(e);
}
if (read < 0) {
break;
} else {
data[n++] = (char) read;
}
}
}
private void ensureCapacity(int capacity) {
if (capacity > n) {
char[] newData = new char[capacity];
System.arraycopy(data, 0, newData, 0, n);
data = newData;
}
}
}
Launching an interactive session is similar to the boilerplate parsing code, except that UnbufferedTokenStream is used and the parsing takes place in a loop:
MyLexer lex = new MyLexer(new MyInputStream(System.in));
TokenStream tokens = new UnbufferedTokenStream(lex);
//Handle "first line" parser rule(s) here.
while (true) {
MyParser parser = new MyParser(tokens);
//Set up the parser here.
MyParser.interactive_return r = parser.interactive();
//Do something with the return value.
//Break on some meaningful condition.
}
Still with me? Okay, well that's it. :)
If you are using System.in as source, which is an input stream, why not just have ANTLR tokenize the input stream as it is read and then parse the tokens?
You have to put it in doStuff....
For instance, if you're declaring a function, the parse would return a function, right? It has no body yet, but that's fine, because the body will come later. You'd do what most REPLs do.