I tried to match an extra space at the beginning of a line, but it didn't work. How should I modify the lexer rules so that such a line still matches?
TestParser.g4:
parser grammar TestParser;
options { tokenVocab=TestLexer; }
root
: choice+ EOF
;
choice:
QUESTION OPTION+;
TestLexer.g4:
lexer grammar TestLexer;
@lexer::members {
private boolean aheadIsNotAnOption(IntStream _input) {
int nextChar = _input.LA(1);
return nextChar != 'A' && nextChar != 'B' && nextChar != 'C' && nextChar != 'D';
}
}
QUESTION: {getCharPositionInLine() == 0}? DIGIT DOT CONTENT -> pushMode(OPTION_MODE);
OTHER: . -> skip;
mode OPTION_MODE;
OPTION: OPTION_HEADER DOT CONTENT;
NOT_OPTION_LINE: NEWLINE SPACE* {aheadIsNotAnOption(_input)}? -> popMode, skip;
OPTION_OTHER: OTHER -> skip;
fragment DIGIT: [0-9]+;
fragment OPTION_HEADER: [A-D];
fragment CONTENT: [a-zA-Z0-9 ,.'?/()!]+? {_input.LA(1) == '\n'}?;
fragment DOT: '.';
fragment NEWLINE: '\n';
fragment SPACE: ' ';
Text:
1.title
A.aaa
B.bbb
 C.ccc
2.title
A.aaa
Java code:
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Lexer;
import org.antlr.v4.runtime.tree.ParseTree;
import java.io.IOException;
import java.net.URISyntaxException;
public class TestParseTest {
public static void main(String[] args) throws URISyntaxException, IOException {
CharStream charStream = CharStreams.fromString("1.title\n" +
"A.aaa\n" +
"B.bbb\n" +
" C.ccc\n" +
"2.title\n" +
"A.aaa\n");
Lexer lexer = new TestLexer(charStream);
CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
ParseTree parseTree = parser.root();
System.out.println(parseTree.toStringTree(parser));
}
}
The output is as follows:
(root (choice 1.title A.aaa B.bbb) (choice 2.title A.aaa) <EOF>)
The idea is that when a non-option line is encountered in OPTION_MODE, the mode is popped. However, when a line starts with an extra space, the option on that line is not matched as expected.
It seems that the \n before C.ccc matches NOT_OPTION_LINE, which pops the mode. I want C.ccc to be matched as an OPTION. Thanks.
I think you're making it a bit too complex. As I see it, lines either start as a question ([ \t]* [0-9]+) or as an option [ \t]* [A-Z]. In all other cases, just ignore the line (. -> skip). That boils down to the following grammar:
lexer grammar TestLexer;
QuestionStart
: {getCharPositionInLine() == 0}? [ \t]* [0-9]+ '.' -> pushMode(ContentMode)
;
OptionStart
: {getCharPositionInLine() == 0}? [ \t]* [A-Z] '.' -> pushMode(ContentMode)
;
Ignored
: . -> skip
;
mode ContentMode;
Content
: ~[\r\n]+
;
QuestionEnd
: [\r\n]+ -> skip, popMode
;
A parser grammar could then look like this:
parser grammar TestParser;
options { tokenVocab=TestLexer; }
root
: question+ EOF
;
question
: QuestionStart Content option+
;
option
: OptionStart Content+
;
And the Java code:
String source = "1.title\n" +
"A.aaa\n" +
"B.bbb\n" +
" C.ccc\n" +
" ...ignored ...\n" +
"2.title\n" +
"A.aaa\n";
Lexer lexer = new TestLexer(CharStreams.fromString(source));
CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
ParseTree parseTree = parser.root();
System.out.println(parseTree.toStringTree(parser));
will then print:
(root (question 1. title (option A. aaa) (option B. bbb) (option C. ccc)) (question 2. title (option A. aaa)) <EOF>)
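To see the three-way line classification on its own, here is a minimal plain-Java sketch that mirrors the shapes of QuestionStart, OptionStart and Ignored with java.util.regex. The class LineClassifier and its classify method are made up for illustration; this is not how the generated lexer works internally, only the same idea applied line by line.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR: mirrors the three lexer cases with regexes.
class LineClassifier {
    // Optional leading blanks, digits, then a dot: the shape of QuestionStart.
    private static final Pattern QUESTION = Pattern.compile("^[ \\t]*[0-9]+\\..*");
    // Optional leading blanks, one capital letter, then a dot: the shape of OptionStart.
    private static final Pattern OPTION = Pattern.compile("^[ \\t]*[A-Z]\\..*");

    static String classify(String line) {
        if (QUESTION.matcher(line).matches()) return "QUESTION";
        if (OPTION.matcher(line).matches()) return "OPTION";
        return "IGNORED"; // everything else is skipped, like the Ignored rule
    }

    public static void main(String[] args) {
        String[] lines = {"1.title", "A.aaa", " C.ccc", " ...ignored ...", "2.title"};
        for (String line : lines) {
            System.out.println(classify(line) + " <- \"" + line + "\"");
        }
    }
}
```

Because the leading blanks are part of both patterns, " C.ccc" is classified as an option just like "A.aaa".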
EDIT
Given that you already have target specific code in your grammar, you could just trim the spaces from an option like this (untested!):
OptionStart
: {getCharPositionInLine() == 0}? [ \t]* [A-Z] '.'
{setText(getText().trim());}
-> pushMode(ContentMode)
;
ANTLR 3 reports an error when it encounters the pound character ("£") used in French, which is the equivalent of the hash character "#" in English, even though the Unicode values for the three special characters @, #, and $ are specified in the lexer rule.
FYI: the Unicode value of the pound character (in French) is the same as the Unicode value of the hash character (in English).
The lexer/parser rules:
grammar SimpleCalc;
options
{
k = 8;
language = Java;
//filter = true;
}
tokens {
PLUS = '+' ;
MINUS = '-' ;
MULT = '*' ;
DIV = '/' ;
}
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
expr : n1=NUMBER ( exp = ( PLUS | MINUS ) n2=NUMBER )*
{
if ($exp.text.equals("+"))
System.out.println("Plus Result = " + $n1.text + $n2.text);
else
System.out.println("Minus Result = " + $n1.text + $n2.text);
}
;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
NUMBER : (DIGIT)+ ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
fragment DIGIT : '0'..'9' | '£' | ('\u0040' | '\u0023' | '\u0024');
The text file is also read as UTF-8:
public static void main(String[] args) throws Exception
{
try
{
args = new String[1];
args[0] = new String("antlr_test.txt");
SimpleCalcLexer lex = new SimpleCalcLexer(new ANTLRFileStream(args[0], "UTF-8"));
CommonTokenStream tokens = new CommonTokenStream(lex);
SimpleCalcParser parser = new SimpleCalcParser(tokens);
parser.expr();
//System.out.println(tokens);
}
catch (Exception e)
{
e.printStackTrace();
}
}
The input file has only one line:
£3 + 4£
The error is:
antlr_test.txt line 1:1 no viable alternative at character '£'
antlr_test.txt line 1:7 no viable alternative at character '£'
What is wrong with my approach? Or did I miss something?
I cannot reproduce what you describe. When I test your grammar without modifications, I get a NumberFormatException, which is expected, because Integer.parseInt("£3") cannot succeed.
When I change your embedded code into this:
{
if ($exp.text.equals("+"))
System.out.println("Result = " + (Integer.parseInt($n1.text.replaceAll("\\D", "")) + Integer.parseInt($n2.text.replaceAll("\\D", ""))));
else
System.out.println("Result = " + (Integer.parseInt($n1.text.replaceAll("\\D", "")) - Integer.parseInt($n2.text.replaceAll("\\D", ""))));
}
and regenerate lexer and parser classes (something you might not have done) and rerun the driver code, I get the following output:
Result = 7
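The digit stripping used in the modified action can be tried on its own. The sketch below is plain Java with a made-up helper name (parseDigits); it only demonstrates what replaceAll("\\D", "") does to the token text before Integer.parseInt sees it.

```java
// Hypothetical helper for illustration; the name parseDigits is not from the grammar.
class DigitStripper {
    // Removes every non-digit character, then parses what is left.
    // Note: text without any digit becomes "" and Integer.parseInt then throws.
    static int parseDigits(String tokenText) {
        return Integer.parseInt(tokenText.replaceAll("\\D", ""));
    }

    public static void main(String[] args) {
        String first = "\u00A33";  // pound sign followed by 3
        String second = "4\u00A3"; // 4 followed by pound sign
        System.out.println("Result = " + (parseDigits(first) + parseDigits(second)));
    }
}
```

With the two NUMBER tokens from the input line, this prints the same "Result = 7" as above.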
EDIT
Perhaps the pound sign in the grammar is the issue? What if you try:
fragment DIGIT : '0'..'9' | '\u00A3' | ('\u0040' | '\u0023' | '\u0024');
instead of:
fragment DIGIT : '0'..'9' | '£' | ('\u0040' | '\u0023' | '\u0024');
?
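For reference, the pound sign and the hash sign are different code points (U+00A3 and U+0023), which is why '\u0023' in the grammar does not cover '£'. The plain-Java sketch below prints both and also shows what a pound sign looks like when its UTF-8 bytes are decoded with a different charset. An encoding mismatch between the grammar file and the tool is only one possible cause, not a confirmed diagnosis; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, unrelated to the generated lexer.
class PoundSignCheck {
    // Formats a character as its Unicode code point, e.g. U+0023.
    static String hex(char c) {
        return String.format("U+%04X", (int) c);
    }

    // Decodes the UTF-8 bytes of s as ISO-8859-1, as a reader with the wrong charset would.
    static String misread(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println("pound: " + hex('\u00A3'));
        System.out.println("hash:  " + hex('#'));
        // The pound sign is two bytes in UTF-8, so a misreading yields two characters.
        System.out.println("misread length: " + misread("\u00A3").length());
    }
}
```

Writing the character as '\u00A3' in the grammar sidesteps the question of how the grammar file itself is encoded.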
I'm trying to figure out how to get values from the parser.
My input is 'play the who' and it should return a string with 'the who'.
Sample.g:
text returns [String value]
: speech = wordExp space name {$value = $speech.text;}
;
name returns [String value]
: SongArtist = WORD (space WORD)* {$value = $SongArtist.text;}
;
wordExp returns [String value]
: command = PLAY {$value = $command.text;} | command = SEARCH {$value = $command.text;}
;
PLAY : 'play';
SEARCH : 'search';
space : ' ';
WORD : ( 'a'..'z' | 'A'..'Z' )*;
WS
: ('\t' | '\r'| '\n') {$channel=HIDDEN;}
;
If I enter 'play the who', this tree comes up:
http://i.stack.imgur.com/ET61P.png
I created a Java file to capture the output. When I call parser.wordExp() I expect to get 'the who', but it returns an object and the EOF error shown in the output below. parser.text() returns 'play'.
import org.antlr.runtime.*;
import a.b.c.SampleLexer;
import a.b.c.SampleParser;
public class Main {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("play the who");
SampleLexer lexer = new SampleLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
SampleParser parser = new SampleParser(tokens);
System.out.println(parser.text());
System.out.println(parser.wordExp());
}
}
The console prints this:
play
a.b.c.SampleParser$wordExp_return@1d0ca25a
line 1:12 no viable alternative at input '<EOF>'
How can I get 'the who'? I don't understand why I cannot get this string, since the interpreter builds the tree correctly.
First, in your grammar, speech only gets assigned the return value of parser rule wordExp. If you want to manipulate the return value of rule name as well, you can do this with an additional variable like the example below.
text returns [String value]
: a=wordExp space b=name {$value = $a.text+" "+$b.text;}
;
Second, invoking parser.text() parses the entire input. A second invocation (in your case parser.wordExp()) therefore finds only EOF. If you remove the second call, the "no viable alternative at input '<EOF>'" message goes away.
There may be a better way to do this, but in the meantime this may help you out.
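The effect of calling a second rule on an already consumed token stream can be illustrated without ANTLR. The sketch below is a hypothetical stand-in (OneShotParser is not ANTLR's API): both methods share one cursor, so whatever the first call consumes is gone for the second.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a parser over a token stream; not ANTLR's API.
class OneShotParser {
    private final List<String> tokens;
    private int pos = 0; // shared cursor: every rule call advances it

    OneShotParser(String input) {
        this.tokens = Arrays.asList(input.split(" "));
    }

    // Returns the next token, or an EOF marker once the input is exhausted.
    private String next() {
        return pos < tokens.size() ? tokens.get(pos++) : "<EOF>";
    }

    // Like the text rule: a command followed by all remaining words.
    String text() {
        String command = next();
        StringBuilder name = new StringBuilder();
        while (pos < tokens.size()) {
            if (name.length() > 0) name.append(' ');
            name.append(next());
        }
        return command + " " + name;
    }

    // Like the wordExp rule: expects a single command token.
    String wordExp() {
        return next();
    }

    public static void main(String[] args) {
        OneShotParser parser = new OneShotParser("play the who");
        System.out.println(parser.text());    // consumes every token
        System.out.println(parser.wordExp()); // nothing left, so only EOF is seen
    }
}
```

The design point is the same as in the answer: collect everything you need from a single rule invocation instead of calling a second rule afterwards.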
I'm using ANTLR to create an interpreter, and I need help reading every line of a file (one by one), because when I call parser.program() it parses only one line of the file.
The program rule is the first parser rule in my grammar file.
I need to create an interpreter rather than a compiler.
I use the NetBeans IDE with Java.
public static void main(String[] args) {
File archivo = new File ("PROYECTO.PSE");
try{
FileReader fr = new FileReader (archivo);
BufferedReader br = new BufferedReader(fr);
String linea;
StringTokenizer token;
while((linea = br.readLine())!=null){
token = new StringTokenizer(linea, " ");
while(token.hasMoreTokens()){
String palabra = token.nextToken();
ANTLRInputStream input = new ANTLRInputStream(palabra);
JayGrammarLexer lexer = new JayGrammarLexer (input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
JayGrammarParser parser = new JayGrammarParser(tokens);
ParseTree t = parser.program();
}
}
}catch(Exception e){
e.printStackTrace();
}
}
Grammar:
program: KEYWORD_VOI KEYWORD_MAI SEPARATOR_PAB SEPARATOR_PCD SEPARATOR_LAB (declarations statements) SEPARATOR_LCD;
declarations: (declaration)*;
declaration: (type identifiers);
type: (KEYWORD_INT | KEYWORD_BOO);
identifiers: (IDENTIFIER)*;
statements: (statement)*;
statement: (block | assignment | ifstatement | whilestatementk);
block: SEPARATOR_LAB statements SEPARATOR_LCD;
assignment: (IDENTIFIER OPERATOR_IGU expression);
ifstatement: KEYWORD_IF SEPARATOR_PAB expression SEPARATOR_PCD SEPARATOR_LAB statement SEPARATOR_LCD (KEYWORD_ELS SEPARATOR_LAB statement SEPARATOR_LCD)?;
whilestatementk: KEYWORD_WHI SEPARATOR_PAB expression SEPARATOR_PCD SEPARATOR_LAB statement SEPARATOR_LCD;
expression: conjunction ((OPERATOR_O) conjunction)*;
conjunction: relation ((OPERATOR_Y) relation)*;
relation: addition ((OPERATOR_REL) addition)*;
addition: term ((OPERATOR_SUM|OPERATOR_RES) term)*;
term: negation ((OPERATOR_POR|OPERATOR_DIV) negation)*;
negation:(OPERATOR_NO) factor;
factor: IDENTIFIER|LITERAL|SEPARATOR_PAB expression SEPARATOR_PCD;
INPUTELEMENT: (WHITESPACE|COMMENT|TOKEN);
WHITESPACE: (' '|'\t'|'\r'|'\n'|'\f') -> skip;
COMMENT: ('//');
TOKEN: (IDENTIFIER|KEYWORD_BOO|KEYWORD_ELS|KEYWORD_IF|KEYWORD_MAI|KEYWORD_VOI|KEYWORD_WHI|LITERAL
|SEPARATOR_COM|SEPARATOR_LAB|SEPARATOR_LCD|SEPARATOR_PAB|SEPARATOR_PCD|SEPARATOR_PYC
|OPERATOR_REL|OPERATOR_DIV|OPERATOR_IGU|OPERATOR_NO|OPERATOR_O|OPERATOR_POR|OPERATOR_RES|OPERATOR_SUM|OPERATOR_Y);
LITERAL: (BOOLEAN INTEGER);
KEYWORD_BOO: 'boolean';
KEYWORD_ELS:'else';
KEYWORD_IF: 'if';
KEYWORD_INT: 'int';
KEYWORD_MAI: 'main';
KEYWORD_VOI: 'void';
KEYWORD_WHI: 'while';
BOOLEAN: ('true'|'false');
INTEGER: (DIGIT+);
IDENTIFIER: (LETTER (LETTER| DIGIT)*);
DIGIT: ('0'..'9')+;
LETTER: ('a'..'z'|'A'..'Z')+;
SEPARATOR_PAB: '(';
SEPARATOR_PCD: ')';
SEPARATOR_LAB: '{';
SEPARATOR_LCD: '}';
SEPARATOR_PYC: ';';
SEPARATOR_COM: ',';
OPERATOR_IGU: ('=');
OPERATOR_SUM: ('+');
OPERATOR_RES: ('-');
OPERATOR_POR: ('*');
OPERATOR_DIV: ('/');
OPERATOR_REL: ('<'|'<='|'>'|'>='|'=='|'!=');
OPERATOR_Y: ('&&');
OPERATOR_O: ('||');
OPERATOR_NO: ('!');
I have a grammar that uses the $ character at the start of many terminal rules, such as $video{, $audio{, $image{, $link{ and others that are like this.
However, I'd also like to match all the $ and { and } characters that don't match these rules too. For example, my grammar does not properly match $100 in the CHUNK rule, but adding the $ to the long list of acceptable characters in CHUNK causes the other production rules to break.
How can I change my grammar so that it's smart enough to distinguish normal $, { and } characters from my special production rules?
Basically, what I'd like to be able to do is say, "if the $ character doesn't have {, video, image, audio, link, etc. after it, then it should go to CHUNK".
grammar Text;
@header {
}
@lexer::members {
private boolean readLabel = false;
private boolean readUrl = false;
}
@members {
private int numberOfVideos = 0;
private int numberOfAudios = 0;
private StringBuilder builder = new StringBuilder();
public String getResult() {
return builder.toString();
}
}
text
: expression*
;
expression
: fillInTheBlank
{
builder.append($fillInTheBlank.value);
}
| image
{
builder.append($image.value);
}
| video
{
builder.append($video.value);
}
| audio
{
builder.append($audio.value);
}
| link
{
builder.append($link.value);
}
| everythingElse
{
builder.append($everythingElse.value);
}
;
fillInTheBlank returns [String value]
: BEGIN_INPUT LABEL END_COMMAND
{
$value = "<input type=\"text\" id=\"" +
$LABEL.text +
"\" name=\"" +
$LABEL.text +
"\" class=\"FillInTheBlankAnswer\" />";
}
;
image returns [String value]
: BEGIN_IMAGE URL END_COMMAND
{
$value = "<img src=\"" + $URL.text + "\" />";
}
;
video returns [String value]
: BEGIN_VIDEO URL END_COMMAND
{
numberOfVideos++;
StringBuilder b = new StringBuilder();
b.append("<div id=\"video1\">Loading the player ...</div>\r\n");
b.append("<script type=\"text/javascript\">\r\n");
b.append("\tjwplayer(\"video" + numberOfVideos + "\").setup({\r\n");
b.append("\t\tflashplayer: \"/trainingdividend/js/jwplayer/player.swf\", file: \"");
b.append($URL.text);
b.append("\"\r\n\t});\r\n");
b.append("</script>\r\n");
$value = b.toString();
}
;
audio returns [String value]
: BEGIN_AUDIO URL END_COMMAND
{
numberOfAudios++;
StringBuilder b = new StringBuilder();
b.append("<p id=\"audioplayer_");
b.append(numberOfAudios);
b.append("\">Alternative content</p>\r\n");
b.append("<script type=\"text/javascript\">\r\n");
b.append("\tAudioPlayer.embed(\"audioplayer_");
b.append(numberOfAudios);
b.append("\", {soundFile: \"");
b.append($URL.text);
b.append("\"});\r\n");
b.append("</script>\r\n");
$value = b.toString();
}
;
link returns [String value]
: BEGIN_LINK URL END_COMMAND
{
$value = "" + $URL.text + "";
}
;
everythingElse returns [String value]
: CHUNK
{
$value = $CHUNK.text;
}
;
BEGIN_INPUT
: '${'
{
readLabel = true;
}
;
BEGIN_IMAGE
: '$image{'
{
readUrl = true;
}
;
BEGIN_VIDEO
: '$video{'
{
readUrl = true;
}
;
BEGIN_AUDIO
: '$audio{'
{
readUrl = true;
}
;
BEGIN_LINK
: '$link{'
{
readUrl = true;
}
;
END_COMMAND
: { readLabel || readUrl }?=> '}'
{
readLabel = false;
readUrl = false;
}
;
URL
: { readUrl }?=> 'http://' ('a'..'z'|'A'..'Z'|'0'..'9'|'.'|'/'|'-'|'_'|'%'|'&'|'?'|':')+
;
LABEL
: { readLabel }?=> ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9')*
;
CHUNK
//: (~('${'|'$video{'|'$image{'|'$audio{'))+
: ('a'..'z'|'A'..'Z'|'0'..'9'|' '|'\t'|'\n'|'\r'|'-'|','|'.'|'?'|'\''|':'|'\"'|'>'|'<'|'/'|'_'|'='|';'|'('|')'|'&'|'!'|'#'|'%'|'*')+
;
You can't negate more than a single character. So, the following is invalid:
~('${')
But why not simply add '$', '{' and '}' to your CHUNK rule and remove the + at the end of the CHUNK rule? Otherwise it would gobble up too much, possibly a '$video{' further on in the source, as you have already noticed yourself.
Now a CHUNK token will always consist of a single character, but you could create a production rule to fix this:
chunk
: CHUNK+
;
and use chunk in your production rules instead of CHUNK (or use CHUNK+, of course).
Input like "{ } $foo $video{" would be tokenized as follows:
CHUNK {
CHUNK
CHUNK }
CHUNK
CHUNK $
CHUNK f
CHUNK o
CHUNK o
CHUNK
BEGIN_VIDEO $video{
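The token list above can be reproduced with a small hand-written scanner. This is only a plain-Java sketch of the idea (try the command prefixes first, otherwise emit a one-character CHUNK); the class ChunkTokenizer is made up, and it ignores the readLabel/readUrl predicates and the URL, LABEL and END_COMMAND rules of the real grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner for illustration; not generated by ANTLR.
class ChunkTokenizer {
    // Token name and literal text of each command prefix from the grammar.
    private static final String[][] COMMANDS = {
        {"BEGIN_VIDEO", "$video{"}, {"BEGIN_IMAGE", "$image{"},
        {"BEGIN_AUDIO", "$audio{"}, {"BEGIN_LINK", "$link{"},
        {"BEGIN_INPUT", "${"}
    };

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            String matched = null;
            // Try every command prefix at the current position first.
            for (String[] command : COMMANDS) {
                if (input.startsWith(command[1], i)) {
                    matched = command[0] + " " + command[1];
                    i += command[1].length();
                    break;
                }
            }
            // No command here: emit a CHUNK holding exactly one character.
            if (matched == null) {
                matched = "CHUNK " + input.charAt(i);
                i++;
            }
            tokens.add(matched);
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("{ } $foo $video{")) {
            System.out.println(token);
        }
    }
}
```

Because a CHUNK never spans more than one character, it can never swallow the start of a later command.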
EDIT
And if you let your parser output an AST, you can easily merge all the text matched by one or more CHUNK tokens into a single AST node whose token is of type CHUNK, like this:
grammar Text;
options {
output=AST;
}
...
chunk
: CHUNK+ -> {new CommonTree(new CommonToken(CHUNK, $text))}
;
...
An alternative solution which doesn't generate that many single-character tokens would be to allow chunks to contain a $ sign only as the first character. That way your input data will get split up at the dollar signs only.
You can achieve this by introducing a fragment lexer rule (i.e., a rule that does not define a token itself but can be used in other token regular expressions):
fragment CHUNKBODY
: 'a'..'z'|'A'..'Z'|'0'..'9'|' '|'\t'|'\n'|'\r'|'-'|','|'.'|'?'|'\''|':'|'\"'|'>'|'<'|'/'|'_'|'='|';'|'('|')'|'&'|'!'|'#'|'%'|'*';
The CHUNK rule then looks like:
CHUNK
: { !readLabel && !readUrl }?=> (CHUNKBODY|'$')CHUNKBODY*
;
This seems to work for me.
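The shape of that CHUNK rule, (CHUNKBODY|'$')CHUNKBODY*, can be tried out as an ordinary regular expression. In the sketch below CHUNKBODY is simplified to "any character except $, { and }", which is an assumption made for brevity (the real fragment lists the allowed characters explicitly), and the class name DollarSplit is made up. It only shows where the text gets split, not how the command tokens are recognized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of the CHUNK shape; not the generated lexer.
class DollarSplit {
    // (CHUNKBODY | '$') CHUNKBODY*, with CHUNKBODY simplified to [^${}].
    private static final Pattern CHUNK = Pattern.compile("(?:[^${}]|\\$)[^${}]*");

    static List<String> chunks(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = CHUNK.matcher(input);
        while (m.find()) {
            result.add(m.group()); // each match is one CHUNK-like piece
        }
        return result;
    }

    public static void main(String[] args) {
        for (String chunk : chunks("costs $100 and $5")) {
            System.out.println("[" + chunk + "]");
        }
    }
}
```

A dollar sign can only start a piece, so the text is cut at each '$' and nowhere else.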
Hello,
I would like to evaluate an AST that I generated.
I wrote a grammar that generates an AST, and now I'm trying to write the tree grammar that evaluates it.
Here's my grammar:
tree grammar XHTML2CSVTree;
options {
tokenVocab=XHTML2CSV;
ASTLabelType=CommonTree;
}
@members {
// variables and methods to be included in the java file generated
}
/*------------------------------------------------------------------
* TREE RULES
*------------------------------------------------------------------*/
// example
tableau returns [String csv]
: ^(TABLEAU {String retour="";}(l=ligne{retour += $l.csv;})* {System.out.println(retour);})
;
ligne returns [String csv]
: ^(LIGNE {String ret="";}(c=cellule{ret += $c.csv;})+)
;
cellule returns [String csv]
: ^(CELLULE s=CHAINE){ $csv = $s.text;}
;
And here's the grammar building the AST:
grammar XHTML2CSV;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
CELLULE;
LIGNE;
TABLEAU;
CELLULEG = '<td>'; // simple lexemes
CELLULED = '</td>';
DEBUTCOL = '<tr>';
FINCOL = '</tr>';
DTAB = '<table';
FTAB = '>';
FINTAB = '</table>';
// anonymous tokens (useful to give meaningful names to AST labels)
// simple lexemes
}
@members {
// variables and methods to be included in the java file generated
}
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
tableau
: DTAB STAB* FTAB ligne* FINTAB -> ^(TABLEAU ligne*)
;
ligne
: DEBUTCOL cellule+ FINCOL -> ^(LIGNE cellule+)
;
cellule
: CELLULEG CHAINE CELLULED -> ^(CELLULE CHAINE)
;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
STAB
: ' '.*'=\"'.*'\"'
;
WS
: ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ {$channel = HIDDEN;}
; // skip white spaces
CHAINE : (~('\"' | ',' | '\n' | '<' | '>'))+
;
// complex lexemes
XHTML2CSV.g works: I can see the generated AST in ANTLRWorks, but I cannot walk this AST to generate CSV code.
I get these errors:
XHTML2CSVTree.java:144: ';' expected
match(input, Token.DOWN, null);
^
XHTML2CSVTree.java:144: not a statement
match(input, Token.DOWN, null);
^
XHTML2CSVTree.java:144: ';' expected
match(input, Token.DOWN, null);
^
XHTML2CSVTree.java:144: not a statement
match(input, Token.DOWN, null);
^
XHTML2CSVTree.java:144: ';' expected
match(input, Token.DOWN, null);
^
5 errors
If someone could help me,
Thanks.
eo
Edit:
My main class looks like this:
import org.antlr.runtime.ANTLRFileStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.tree.CommonTreeNodeStream;
import org.antlr.runtime.tree.Tree;
public class Main {
public static void main(String args[]) throws Exception {
try {
XHTML2CSVLexer lex = new XHTML2CSVLexer(new ANTLRFileStream(args[0])); // create lexer to read the file specified from command line (i.e., first argument, e.g., java Main test1.xhtml)
CommonTokenStream tokens = new CommonTokenStream(lex); // transform it into a token stream
XHTML2CSVParser parser = new XHTML2CSVParser(tokens); // create the parser that reads from the token stream
Tree t = (Tree) parser.cellule().tree; // (try to) parse a given rule specified in the parser file, e.g., my_main_rule
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t); // transform it into a common data structure readable by the tree pattern
nodes.setTokenStream(tokens); // declare which token to use (i.e., labels of the nodes defined in the parser, mainly anonymous tokens)
XHTML2CSVTree tparser = new XHTML2CSVTree(nodes); // instantiate the tree pattern
System.out.println(tparser.cellule()); // apply patterns
} catch (Exception e) {
e.printStackTrace();
}
}
}
The ligne rule in your tree grammar:
ligne returns [String csv]
: ^(LIGNE {Sting ret="";r}(c=cellule{ret += $c.csv;})+)
; // ^ ^
// | |
// problem 1, problem 2
has 2 problems:
it contains Sting where it should be String;
there's a trailing r that is messing up your custom Java code.
It should be:
ligne returns [String csv]
: ^(LIGNE {String ret="";}(c=cellule{ret += $c.csv;})+)
;
EDIT
If I generate a lexer and parser (1), generate a tree walker (2), compile all .java source files (3) and run the Main class (4):
java -cp antlr-3.3.jar org.antlr.Tool XHTML2CSV.g
java -cp antlr-3.3.jar org.antlr.Tool XHTML2CSVTree.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main test.txt
the following gets printed to the console:
table data
where the file test.txt contains:
<td>table data</td>
So I don't see any problem. Perhaps you're trying to parse a <table>? This would go wrong since both your parser and tree-walker are invoking the cellule rule, not the tableau rule.
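To make the difference between starting at cellule and starting at tableau concrete, here is a plain-Java sketch of the same three-level walk over a hand-built tree. Everything in it is hypothetical: Node stands in for ANTLR's CommonTree, and the "," and newline separators are assumptions, since the tree grammar as posted concatenates the cells without any separator. The sketch also returns each accumulated string explicitly, whereas in the tree grammar as posted the local variables ret and retour are never assigned to $csv.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical walker for illustration; ANTLR's tree classes are not used here.
class CsvWalker {
    // Minimal AST node: a type label, optional text, and child nodes.
    static final class Node {
        final String type;
        final String text;
        final List<Node> children;

        Node(String type, String text, Node... children) {
            this.type = type;
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    // ^(CELLULE CHAINE): the cell value is the text of its only child.
    static String cellule(Node cell) {
        return cell.children.get(0).text;
    }

    // ^(LIGNE cellule+): join the cells of one row (separator is an assumption).
    static String ligne(Node row) {
        List<String> cells = new ArrayList<>();
        for (Node cell : row.children) cells.add(cellule(cell));
        return String.join(",", cells);
    }

    // ^(TABLEAU ligne*): join the rows, one per line.
    static String tableau(Node table) {
        List<String> rows = new ArrayList<>();
        for (Node row : table.children) rows.add(ligne(row));
        return String.join("\n", rows);
    }

    // Convenience builder for a CELLULE node wrapping one CHAINE leaf.
    static Node cell(String text) {
        return new Node("CELLULE", null, new Node("CHAINE", text));
    }

    public static void main(String[] args) {
        Node table = new Node("TABLEAU", null,
                new Node("LIGNE", null, cell("a"), cell("b")),
                new Node("LIGNE", null, cell("c"), cell("d")));
        System.out.println(tableau(table));
    }
}
```

Starting the walk at cellule only ever yields a single cell; starting at tableau is what produces the whole CSV.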