Get active Antlr rule - java

Is it possible to get the "active" ANTLR rule from which an action method was called?
Something like this log function in ANTLR pseudo-code, which should show the start and end positions of some rules without handing over the $start and $end tokens with every log() call:
@members {
private void log() {
System.out.println("Start: " + $activeRule.start.pos +
"End: " + $activeRule.stop.pos);
}
}
expr: multExpr (('+'|'-') multExpr)* {log(); }
;
multExpr
: atom('*' atom)* {log(); }
;
atom: INT
| ID {log(); }
| '(' expr ')'
;

No, there is no way to get the name of the rule the parser is currently in. Realize that parser rules are, by default, simply Java methods returning void, and Java has no language construct that lets a method ask for its own name at run time.
If you set output=AST in the options { ... } of your grammar, every parser rule creates (and returns) an instance of a ParserRuleReturnScope called retval: so you could use that for your purposes:
// ...
options {
output=AST;
}
// ...
@parser::members {
private void log(ParserRuleReturnScope rule) {
System.out.println("Rule: " + rule.getClass().getName() +
", start: " + rule.start +
", end: " + rule.stop);
}
}
expr: multExpr (('+'|'-') multExpr)* {log(retval);}
;
multExpr
: atom('*' atom)* {log(retval);}
;
atom: INT
| ID {log(retval);}
| '(' expr ')'
;
// ...
This is however not a very reliable thing to do: the name of the variable may very well change in the next version of ANTLR.
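A possible workaround that stays outside ANTLR's own API is to inspect the call stack. The sketch below assumes that each parser rule is generated as a Java method named after the rule, which is how ANTLR 3's Java target normally works, and that log() is called directly from that method. The class and the two stand-in rule methods are hypothetical; they only imitate generated code.

```java
public class RuleNameFromStack {
    // Returns the name of the method that called this helper.
    public static String callerName() {
        // Element 0 is callerName() itself, element 1 is its direct caller.
        StackTraceElement[] stack = new Throwable().getStackTrace();
        return stack[1].getMethodName();
    }

    // Hypothetical stand-ins for generated rule methods; real ones are produced by ANTLR.
    public static String expr() {
        return callerName();
    }

    public static String multExpr() {
        return callerName();
    }

    public static void main(String[] args) {
        System.out.println("Rule: " + expr());
        System.out.println("Rule: " + multExpr());
    }
}
```

Like the retval approach, this depends on how the code is generated, so it is best treated as a debugging aid rather than something to build on.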

(for Antlr4)
I was googling on how to get the name of the active rule and found this post. After some more research, I have found how to do it :
prog: statement[this.getRuleNames() /* parser rule names */]* EOF
;
statement [String[] rule_names]
locals [String rule_name]
@after { System.out.println("The statement is a " + $rule_name + " : `" + $text + "`"); }
: stmt_a[rule_names] {$rule_name = $stmt_a.rule_name;}
;
stmt_a [String[] rule_names] returns [String rule_name]
: 'stmt_a' { $rule_name = rule_names[$ctx.getRuleIndex()]; }
;
A more general solution passes the context on to the surrounding rule, from which you can extract all the information about the last active rule.
File RuleName.g4 :
grammar RuleName;
prog
@init {System.out.println("Last update 1026");}
: statement[this.getRuleNames() /* parser rule names */]* EOF
;
statement [String[] rule_names]
locals [String rule_name, ParserRuleContext context]
@after { $rule_name = rule_names[$context.getRuleIndex()];
System.out.println("The statement is a " + $rule_name + " : `" + $text + "`" + " from " + $start + " to " + $stop); }
: stmt_a {$context = (ParserRuleContext)$stmt_a.context;}
| stmt_b {$context = (ParserRuleContext)$stmt_b.context;}
| stmt_c {$context = (ParserRuleContext)$stmt_c.context;}
;
stmt_a returns [Stmt_aContext context]
: 'stmt_a' more { $context = $ctx; }
;
stmt_b returns [Stmt_bContext context]
: 'stmt_b' more { $context = $ctx; }
;
stmt_c returns [Stmt_cContext context]
: 'stmt_c' more { $context = $ctx; }
;
more
: ID+
;
ID : [A-Z] ;
WS : [ \t]+ -> channel(HIDDEN) ;
NL : [\r\n]+ -> skip ;
File input.txt :
stmt_c X Y Z
stmt_a A B C
stmt_b D E F
Execution :
$ export CLASSPATH=".:/usr/local/lib/antlr-4.9-complete.jar"
$ alias a4='java -jar /usr/local/lib/antlr-4.9-complete.jar'
$ alias grun='java org.antlr.v4.gui.TestRig'
$ a4 -no-listener RuleName.g4
$ javac RuleName*.java
$ grun RuleName prog -tokens input.txt
[@0,0:5='stmt_c',<'stmt_c'>,1:0]
[@1,6:6=' ',<WS>,channel=1,1:6]
[@2,7:7='X',<ID>,1:7]
[@3,8:8=' ',<WS>,channel=1,1:8]
[@4,9:9='Y',<ID>,1:9]
[@5,10:10=' ',<WS>,channel=1,1:10]
[@6,11:11='Z',<ID>,1:11]
...
[@21,39:38='<EOF>',<EOF>,4:0]
Last update 1026
The statement is a stmt_c : `stmt_c X Y Z` from [@0,0:5='stmt_c',<3>,1:0] to [@6,11:11='Z',<4>,1:11]
The statement is a stmt_a : `stmt_a A B C` from [@7,13:18='stmt_a',<1>,2:0] to [@13,24:24='C',<4>,2:11]
The statement is a stmt_b : `stmt_b D E F` from [@14,26:31='stmt_b',<2>,3:0] to [@20,37:37='F',<4>,3:11]

Antlr3 grammar generates parsing error on encountering the Pound char

ANTLR 3 generates an error when it encounters the pound character ("£") of the French language, which is the equivalent of the English hash character ("#"), even though the Unicode values for the three special characters @, #, and $ are specified in the lexer rule.
FYI: the Unicode value of the pound character (French) = the Unicode value of the hash character (English).
The lexer/parser rules:
grammar SimpleCalc;
options
{
k = 8;
language = Java;
//filter = true;
}
tokens {
PLUS = '+' ;
MINUS = '-' ;
MULT = '*' ;
DIV = '/' ;
}
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
expr : n1=NUMBER ( exp = ( PLUS | MINUS ) n2=NUMBER )*
{
if ($exp.text.equals("+"))
System.out.println("Plus Result = " + $n1.text + $n2.text);
else
System.out.println("Minus Result = " + $n1.text + $n2.text);
}
;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
NUMBER : (DIGIT)+ ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
fragment DIGIT : '0'..'9' | '£' | ('\u0040' | '\u0023' | '\u0024');
The text file is also read as UTF-8:
public static void main(String[] args) throws Exception
{
try
{
args = new String[1];
args[0] = new String("antlr_test.txt");
SimpleCalcLexer lex = new SimpleCalcLexer(new ANTLRFileStream(args[0], "UTF-8"));
CommonTokenStream tokens = new CommonTokenStream(lex);
SimpleCalcParser parser = new SimpleCalcParser(tokens);
parser.expr();
//System.out.println(tokens);
}
catch (Exception e)
{
e.printStackTrace();
}
}
The input file has only one line:
£3 + 4£
The error is:
antlr_test.txt line 1:1 no viable alternative at character '£'
antlr_test.txt line 1:7 no viable alternative at character '£'
What is wrong with my approach? Or did I miss something?
I cannot reproduce what you describe. When I test your grammar without modifications, I get a NumberFormatException, which is expected, because Integer.parseInt("£3") cannot succeed.
When I change your embedded code into this:
{
if ($exp.text.equals("+"))
System.out.println("Result = " + (Integer.parseInt($n1.text.replaceAll("\\D", "")) + Integer.parseInt($n2.text.replaceAll("\\D", ""))));
else
System.out.println("Result = " + (Integer.parseInt($n1.text.replaceAll("\\D", "")) - Integer.parseInt($n2.text.replaceAll("\\D", ""))));
}
and regenerate lexer and parser classes (something you might not have done) and rerun the driver code, I get the following output:
Result = 7
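The digit-stripping step used above can be tried in isolation. This is a minimal sketch with a hypothetical helper name (parseDigits); it assumes each token text contains at least one digit, since Integer.parseInt throws a NumberFormatException on an empty string.

```java
public class StripNonDigits {
    // Removes every non-digit character, then parses what is left.
    public static int parseDigits(String text) {
        return Integer.parseInt(text.replaceAll("\\D", ""));
    }

    public static void main(String[] args) {
        // "\u00A3" is the pound sign; the escape avoids source-encoding issues.
        int n1 = parseDigits("\u00A33");
        int n2 = parseDigits("4\u00A3");
        System.out.println("Result = " + (n1 + n2)); // prints "Result = 7"
    }
}
```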
EDIT
Perhaps the pound sign in the grammar is the issue? What if you try:
fragment DIGIT : '0'..'9' | '\u00A3' | ('\u0040' | '\u0023' | '\u0024');
instead of:
fragment DIGIT : '0'..'9' | '£' | ('\u0040' | '\u0023' | '\u0024');
?

JTB generating jj file that compiles but throws NullPointerExceptions on grammatically valid input

I am very new to JavaCC and JTB. I am trying to get a moderately complicated parser/validator going using a JTB file to generate the AST for me. I'm having a lot of problems making that work, so I decided to do the simplest example possible.
I am using Eclipse 3.8.1. My JavaCC and JTB plugins are:
plugins/sf.eclipse.javacc_1.5.30/jars/javacc-5.0.jar
plugins/sf.eclipse.javacc_1.5.30/jars/jtb-1.4.9.jar
I believe these to be fairly recent versions of what's available and so they should work OK.
I created a project and inside that project, I created a new JTB file. The plugin generated a bunch of code.
/**
* JTB template file created by SF JavaCC plugin 1.5.28+ wizard for JTB 1.4.0.2+ and JavaCC 1.5.0+
*/
options
{
static = true;
JTB_P = "";
}
PARSER_BEGIN(SimpleGrammar)
// this import is not needed as it is generated by JTB
// import syntaxtree.*;
// this import is needed as it is not generated by JTB
import visitor.*;
public class SimpleGrammar
{
public static void main(String args [])
{
System.out.println("Reading from standard input...");
System.out.print("Enter an expression like \"1+(2+3)*var;\" :");
new SimpleGrammar(System.in);
try
{
Start start = SimpleGrammar.Start();
DepthFirstVoidVisitor v = new MyVisitor();
start.accept(v);
}
catch (Exception e)
{
System.out.println("Oops.");
System.out.println(e);
System.out.println(e.getMessage());
}
}
}
class MyVisitor extends DepthFirstVoidVisitor
{
public void visit(NodeToken n)
{
System.out.println("visit " + n.tokenImage);
}
}
PARSER_END(SimpleGrammar)
SKIP :
{
" "
| "\t"
| "\n"
| "\r"
| < "//" (~[ "\n", "\r" ])*
(
"\n"
| "\r"
| "\r\n"
) >
| < "/*" (~[ "*" ])* "*"
(
~[ "/" ] (~[ "*" ])* "*"
)*
"/" >
}
TOKEN : /* LITERALS */
{
< INTEGER_LITERAL :
< DECIMAL_LITERAL > ([ "l", "L" ])?
| < HEX_LITERAL > ([ "l", "L" ])?
| < OCTAL_LITERAL > ([ "l", "L" ])?
>
| < #DECIMAL_LITERAL : [ "1"-"9" ] ([ "0"-"9" ])* >
| < #HEX_LITERAL : "0" [ "x", "X" ] ([ "0"-"9", "a"-"f", "A"-"F" ])+ >
| < #OCTAL_LITERAL : "0" ([ "0"-"7" ])* >
}
TOKEN : /* IDENTIFIERS */
{
< IDENTIFIER :
< LETTER >
(
< LETTER >
| < DIGIT >
)* >
| < #LETTER : [ "_", "a"-"z", "A"-"Z" ] >
| < #DIGIT : [ "0"-"9" ] >
}
void Start() :
{}
{
Expression() ";"
}
void Expression() :
{}
{
AdditiveExpression()
}
void AdditiveExpression() :
{}
{
MultiplicativeExpression()
(
(
"+"
| "-"
)
MultiplicativeExpression()
)*
}
void MultiplicativeExpression() :
{}
{
UnaryExpression()
(
(
"*"
| "/"
| "%"
)
UnaryExpression()
)*
}
void UnaryExpression() :
{}
{
"(" Expression() ")"
| Identifier()
| MyInteger()
}
void Identifier() :
{}
{
< IDENTIFIER >
}
void MyInteger() :
{}
{
< INTEGER_LITERAL >
}
It actually had a bunch of ?parser_name? tags wherever you see SimpleGrammar which I noticed and changed.
So I right-click on the SimpleGrammar.jtb file and hit "Compile with JavaCC | JJTree | JTB" and it generates all the files just like it's supposed to. So I then press F11 to run the program. Here is the output of that console session:
Reading from standard input...
Enter an expression like "1+(2+3)*var;" :1+2
Oops.
java.lang.NullPointerException
null
That's funny, it probably shouldn't be doing that. So I change the exception catching to ParseException so that it won't catch the NullPointerException and I'll get a debug console. After I do that and hunt around a bit, I find this snippet of code:
static final public MyInteger MyInteger() throws ParseException {
// --- JTB generated node declarations ---
NodeToken n0 = null;
Token n1 = null;
jj_consume_token(INTEGER_LITERAL);
n0 = JTBToolkit.makeNodeToken(n1);
{if (true) return new MyInteger(n0);}
throw new Error("Missing return statement in function");
}
So what's happening here is that n1 is referenced before anything is assigned to it, so it is still null when it is passed to makeNodeToken. I don't know why this is happening.
I've also looked in the generated jtb.out.jj file to see what's there. Here's the snippet corresponding to the code that I'm seeing:
MyInteger MyInteger() :
{
// --- JTB generated node declarations ---
NodeToken n0 = null;
Token n1 = null;
}
{
< INTEGER_LITERAL >
{ n0 = JTBToolkit.makeNodeToken(n1); }
{ return new MyInteger(n0); }
}
Is that wrong? I'm honestly not sure. I was under the impression that you did everything in the JTB file and all the files that were generated after that were not to be touched.
Any insight as to what's going wrong here would be greatly appreciated.
EDIT
So I've done a bit of poking around and I've found that if I change
<INTEGER_LITERAL>
to
n1 = <INTEGER_LITERAL>
in the above-mentioned snippet, then there is no NullPointerException anymore: I can successfully enter "1;" at the console and it parses.
I think what I'm trying to figure out is why JTB is generating a bogus jtb.out.jj rather than what can I do to stop the null pointer exceptions. I've got a decent sized, non-toy grammar I want to work with and manually editing the jtb.out.jj file every time is not a scalable solution.
I had the same problem; it's a bug in the generation. If you use JTB 1.4.7, it will probably work.
You can find the available versions here: https://java.net/projects/jtb/sources/svn/show/trunk/lib?rev=75 (the download section doesn't really work for me)

JavaCC - XPath parser

I need to create a (very) simple parser of XPath expressions. I'm trying to use JavaCC for that purpose. I'm completely new to JavaCC (although we learned Flex & Bison at school), and so I'm trying to build the JJ script stepwise, by adding a tiny piece of functionality at a time.
So far I got up to the following grammar:
XPATHEXPRESSION ::= ("/" <STEP>)+
STEP ::= <AXIS_NAME> ":" <NODE_TEST> ( "[" <EXPRESSION> "]" )*
EXPRESSION ::= <XPATHEXPRESSION> "=" """ <IDENTIFIER> """
And the related JJ file looks like this:
options {
STATIC = false ;
}
PARSER_BEGIN(XPathParser)
package cz.me.generator.parser;
import cz.me.generator.expression.*;
import java.io.Reader;
import java.io.StringReader;
public class XPathParser
{
public static XPathExpr parse(String exprLiteral)
throws TokenMgrError, ParseException
{
Reader in = new StringReader(exprLiteral);
XPathParser parser = new XPathParser(in);
return parser.XPathExpr();
}
}
PARSER_END(XPathParser)
SKIP : { " " }
TOKEN : { < SLASH : "/" > }
TOKEN : { < COLON : ":" > }
TOKEN : { < OPEN_PAR : "[" > }
TOKEN : { < CLOSE_PAR : "]" > }
TOKEN : { < QUOTE : "\"" > }
TOKEN : { < EQ : "=" > }
TOKEN : { < GT : ">" > }
TOKEN : { < LT : "<" > }
TOKEN : { < IDENTIFIER : (["a"-"z","A"-"Z","0"-"9"])+ > }
TOKEN : { < NUMBER : (["0"-"9"])+ > }
Expression Expression() :
{
Token t;
XPathExpr xPathExpr;
String value;
}
{
xPathExpr = XPathExpr()
<EQ>
<QUOTE>
t = <IDENTIFIER>
{ value = t.image; }
<QUOTE>
{ return new EqExpr(xPathExpr, new StringLiteral(value)); }
}
XPathExpr XPathExpr() :
{
XPathExpr xPathExpr;
Step step;
}
{
{ xPathExpr = new XPathExpr(); }
(
<SLASH>
step = Step()
{ xPathExpr.addStep(step); }
)+
<EOF>
{ return xPathExpr; }
}
Step Step() :
{
Token t;
Step step;
Axis axis;
NodeTest nodeTest;
Expression predicate;
}
{
t = <IDENTIFIER>
{ axis = Axis.valueOf(t.image); }
<COLON>
t = <IDENTIFIER>
{ nodeTest = new NodeNameTest(t.image); }
{ step = new Step(axis, nodeTest); }
(
<OPEN_PAR>
predicate = Expression()
{ step.addPredicate(predicate); }
<CLOSE_PAR>
)*
{ return step; }
}
However, this does not work.
The following code:
XPathExpr expr = XPathParser.parse("/self:house/child:window[/child:material = \"glass\"]");
System.out.println(expr);
yields the following output (when running JavaCC in debug mode):
Call: XPathExpr
Consumed token: <"/" at line 1 column 1>
Call: Step
Consumed token: <<IDENTIFIER>: "self" at line 1 column 2>
Consumed token: <":" at line 1 column 6>
Consumed token: <<IDENTIFIER>: "house" at line 1 column 7>
Return: Step
Consumed token: <"/" at line 1 column 12>
Call: Step
Consumed token: <<IDENTIFIER>: "child" at line 1 column 13>
Consumed token: <":" at line 1 column 18>
Consumed token: <<IDENTIFIER>: "window" at line 1 column 19>
Consumed token: <"[" at line 1 column 25>
Call: Expression
Call: XPathExpr
Consumed token: <"/" at line 1 column 26>
Call: Step
Consumed token: <<IDENTIFIER>: "child" at line 1 column 27>
Consumed token: <":" at line 1 column 32>
Consumed token: <<IDENTIFIER>: "material" at line 1 column 33>
Return: Step
Return: XPathExpr
Return: Expression
Return: Step
Return: XPathExpr
Exception in thread "main" cz.me.generator.parser.ParseException: Encountered " "=" "= "" at line 1, column 42.
Was expecting one of:
<EOF>
"/" ...
"[" ...
at cz.me.generator.parser.XPathParser.generateParseException(XPathParser.java:270)
at cz.me.generator.parser.XPathParser.jj_consume_token(XPathParser.java:207)
at cz.me.generator.parser.XPathParser.XPathExpr(XPathParser.java:65)
at cz.me.generator.parser.XPathParser.Expression(XPathParser.java:32)
at cz.me.generator.parser.XPathParser.Step(XPathParser.java:100)
at cz.me.generator.parser.XPathParser.XPathExpr(XPathParser.java:54)
at cz.me.generator.parser.XPathParser.parse(XPathParser.java:22)
at cz.me.generator.Main.main(Main.java:17)
I have tried some variations to see if JavaCC is not having problems with the recursion, but it seems that the problem is elsewhere.
What is wrong?
The problem is the <EOF> in XPathExpr. Take that out. Add a production
void Start() :
{ }
{
XPathExpr()
< EOF >
}
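and rewrite parse to use Start. Since parse returns an XPathExpr, Start would also have to return the result of XPathExpr() instead of being void.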
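The reason the <EOF> has to move becomes clearer in a hand-written analogue. The sketch below is not JavaCC output; it is a hypothetical recursive-descent recognizer for a simplified form of the grammar (no quotes, no whitespace, names may contain ':'). Because path() is also called from inside a predicate, it cannot demand end of input itself; only the top-level entry point does.

```java
public class EofAtTopLevel {
    private final String input;
    private int pos = 0;

    private EofAtTopLevel(String input) {
        this.input = input;
    }

    // start := path EOF   (the only place that checks for end of input)
    public static boolean accepts(String input) {
        EofAtTopLevel parser = new EofAtTopLevel(input);
        try {
            parser.path();
            parser.expectEof();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // path := ( "/" step )+   (no EOF here, so it can be nested in a predicate)
    private void path() {
        do {
            expect('/');
            step();
        } while (peek() == '/');
    }

    // step := NAME ( "[" path "=" NAME "]" )*
    private void step() {
        name();
        while (peek() == '[') {
            expect('[');
            path();
            expect('=');
            name();
            expect(']');
        }
    }

    // NAME := one or more letters, digits or ':' (a simplification for this sketch)
    private void name() {
        int start = pos;
        while (pos < input.length()
                && (Character.isLetterOrDigit(input.charAt(pos)) || input.charAt(pos) == ':')) {
            pos++;
        }
        if (pos == start) {
            throw new IllegalStateException("name expected at " + pos);
        }
    }

    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalStateException("expected '" + c + "' at " + pos);
        }
        pos++;
    }

    private void expectEof() {
        if (pos != input.length()) {
            throw new IllegalStateException("expected end of input at " + pos);
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("/self:house/child:window[/child:material=glass]")); // true
        System.out.println(accepts("/self:house]")); // false
    }
}
```

If expectEof() were called at the end of path(), the nested call for /child:material would fail at the "=" sign, which mirrors the ParseException in the question.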

JavaCC lexer doesn't work as expected (whitespace not ignored)

I'm trying to implement a parser for the example file listed below. I'd like to recognize quoted strings with '+' between them as a single token. So I created a jj file, but it doesn't match such strings. I was under the impression that JavaCC is supposed to match the longest possible match for each token spec. But that doesn't seem to be the case for me.
What am I doing wrong here? Why isn't my <STRING> token matching the '+' even though it's specified in there? Why is whitespace not being ignored?
options {
TOKEN_FACTORY = "Token";
}
PARSER_BEGIN(Parser)
package com.example.parser;
public class Parser {
public static void main(String args[]) throws ParseException {
ParserTokenManager manager = new ParserTokenManager(new SimpleCharStream(Parser.class.getResourceAsStream("example")));
Token token = manager.getNextToken();
while (token != null && token.kind != ParserConstants.EOF) {
System.out.println(token.toString() + "[" + token.kind + "]");
token = manager.getNextToken();
}
Parser parser = new Parser(Parser.class.getResourceAsStream("example"));
parser.start();
}
}
PARSER_END(Parser)
// WHITE SPACE
<DEFAULT, IN_STRING_KEYWORD>
SKIP :
{
" " // <-- skipping spaces
| "\t"
| "\n"
| "\r"
| "\f"
}
// TOKENS
TOKEN :
{
< KEYWORD1 : "keyword1" > : IN_STRING_KEYWORD
}
<IN_STRING_KEYWORD>
TOKEN : {<STRING : <CONCAT_STRING> | <UNQUOTED_STRING> > : DEFAULT
| <#CONCAT_STRING : <QUOTED_STRING> ("+" <QUOTED_STRING>)+ >
// <-- CONCAT_STRING never matches "+" part when input is "'smth' +", because whitespace is not ignored!?
| <#QUOTED_STRING : <SINGLEQUOTED_STRING> | <DOUBLEQUOTED_STRING> >
| <#SINGLEQUOTED_STRING : "'" (~["'"])* "'" >
| <#DOUBLEQUOTED_STRING :
"\""
(
(~["\"", "\\"]) |
("\\" ["n", "t", "\"", "\\"])
)*
"\""
>
| <#UNQUOTED_STRING : (~[" ","\t", ";", "{", "}", "/", "*", "'", "\"", "\n", "\r"] | "/" ~["/", "*"] | "*" ~["/"])+ >
}
void start() :
{}
{
(<KEYWORD1><STRING>";")+ <EOF>
}
Here's an example file that should get parsed:
keyword1 "foo" + ' bar';
I'd like to match the argument of the first keyword1 as a single <STRING> token.
Current output:
keyword1[6]
Exception in thread "main" com.example.parser.TokenMgrError: Lexical error at line 1, column 15. Encountered: " " (32), after : "\"foo\""
at com.example.parser.ParserTokenManager.getNextToken(ParserTokenManager.java:616)
at com.example.parser.Parser.main(Parser.java:12)
I'm using JavaCC 5.0.
STRING is expanding to the longest sequence that can be matched, which is "foo" as the error indicates. The space after the closing double quote is not part of the definition of the private token CONCAT_STRING. Skip tokens do not apply within the definition of other tokens, so you must incorporate the space directly into the definition, on either side of the +.
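The same effect can be reproduced with java.util.regex, used here only as an analogy for the JavaCC token definitions. In this sketch the pattern names are hypothetical: STRICT mirrors the original CONCAT_STRING, and LENIENT shows the suggested fix of allowing whitespace on either side of the + inside the token itself.

```java
import java.util.regex.Pattern;

public class ConcatStringToken {
    // A single-quoted or double-quoted string, following the token definitions in the question.
    static final String QUOTED =
            "(?:'[^']*'|\"(?:[^\"\\\\]|\\\\[nt\"\\\\])*\")";

    // Mirrors the original CONCAT_STRING: no whitespace allowed around '+'.
    static final Pattern STRICT =
            Pattern.compile(QUOTED + "(?:\\+" + QUOTED + ")+");

    // Whitespace is made part of the token, on either side of '+'.
    static final Pattern LENIENT =
            Pattern.compile(QUOTED + "(?:\\s*\\+\\s*" + QUOTED + ")+");

    public static boolean matchesStrict(String text) {
        return STRICT.matcher(text).matches();
    }

    public static boolean matchesLenient(String text) {
        return LENIENT.matcher(text).matches();
    }

    public static void main(String[] args) {
        String argument = "\"foo\" + ' bar'";
        System.out.println(matchesStrict(argument));  // false
        System.out.println(matchesLenient(argument)); // true
    }
}
```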
As an aside, I recommend having a final token definition like so:
<each-state-in-which-the-empty-string-cannot-be-recognized>
TOKEN : {
< ILLEGAL : ~[] >
}
This prevents TokenMgrErrors from being thrown and makes debugging a bit easier.

How can I parse a special character differently in two terminal rules using antlr?

I have a grammar that uses the $ character at the start of many terminal rules, such as $video{, $audio{, $image{, $link{ and others that are like this.
However, I'd also like to match all the $ and { and } characters that don't match these rules too. For example, my grammar does not properly match $100 in the CHUNK rule, but adding the $ to the long list of acceptable characters in CHUNK causes the other production rules to break.
How can I change my grammar so that it's smart enough to distinguish normal $, { and } characters from my special production rules?
Basically what I'd like to be able to do is say, "if the $ character doesn't have {, video, image, audio, link, etc. after it, then it should go to CHUNK".
grammar Text;
@header {
}
@lexer::members {
private boolean readLabel = false;
private boolean readUrl = false;
}
@members {
private int numberOfVideos = 0;
private int numberOfAudios = 0;
private StringBuilder builder = new StringBuilder();
public String getResult() {
return builder.toString();
}
}
text
: expression*
;
expression
: fillInTheBlank
{
builder.append($fillInTheBlank.value);
}
| image
{
builder.append($image.value);
}
| video
{
builder.append($video.value);
}
| audio
{
builder.append($audio.value);
}
| link
{
builder.append($link.value);
}
| everythingElse
{
builder.append($everythingElse.value);
}
;
fillInTheBlank returns [String value]
: BEGIN_INPUT LABEL END_COMMAND
{
$value = "<input type=\"text\" id=\"" +
$LABEL.text +
"\" name=\"" +
$LABEL.text +
"\" class=\"FillInTheBlankAnswer\" />";
}
;
image returns [String value]
: BEGIN_IMAGE URL END_COMMAND
{
$value = "<img src=\"" + $URL.text + "\" />";
}
;
video returns [String value]
: BEGIN_VIDEO URL END_COMMAND
{
numberOfVideos++;
StringBuilder b = new StringBuilder();
b.append("<div id=\"video1\">Loading the player ...</div>\r\n");
b.append("<script type=\"text/javascript\">\r\n");
b.append("\tjwplayer(\"video" + numberOfVideos + "\").setup({\r\n");
b.append("\t\tflashplayer: \"/trainingdividend/js/jwplayer/player.swf\", file: \"");
b.append($URL.text);
b.append("\"\r\n\t});\r\n");
b.append("</script>\r\n");
$value = b.toString();
}
;
audio returns [String value]
: BEGIN_AUDIO URL END_COMMAND
{
numberOfAudios++;
StringBuilder b = new StringBuilder();
b.append("<p id=\"audioplayer_");
b.append(numberOfAudios);
b.append("\">Alternative content</p>\r\n");
b.append("<script type=\"text/javascript\">\r\n");
b.append("\tAudioPlayer.embed(\"audioplayer_");
b.append(numberOfAudios);
b.append("\", {soundFile: \"");
b.append($URL.text);
b.append("\"});\r\n");
b.append("</script>\r\n");
$value = b.toString();
}
;
link returns [String value]
: BEGIN_LINK URL END_COMMAND
{
$value = "" + $URL.text + "";
}
;
everythingElse returns [String value]
: CHUNK
{
$value = $CHUNK.text;
}
;
BEGIN_INPUT
: '${'
{
readLabel = true;
}
;
BEGIN_IMAGE
: '$image{'
{
readUrl = true;
}
;
BEGIN_VIDEO
: '$video{'
{
readUrl = true;
}
;
BEGIN_AUDIO
: '$audio{'
{
readUrl = true;
}
;
BEGIN_LINK
: '$link{'
{
readUrl = true;
}
;
END_COMMAND
: { readLabel || readUrl }?=> '}'
{
readLabel = false;
readUrl = false;
}
;
URL
: { readUrl }?=> 'http://' ('a'..'z'|'A'..'Z'|'0'..'9'|'.'|'/'|'-'|'_'|'%'|'&'|'?'|':')+
;
LABEL
: { readLabel }?=> ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9')*
;
CHUNK
//: (~('${'|'$video{'|'$image{'|'$audio{'))+
: ('a'..'z'|'A'..'Z'|'0'..'9'|' '|'\t'|'\n'|'\r'|'-'|','|'.'|'?'|'\''|':'|'\"'|'>'|'<'|'/'|'_'|'='|';'|'('|')'|'&'|'!'|'#'|'%'|'*')+
;
You can't negate more than a single character. So, the following is invalid:
~('${')
But why not simply add '$', '{' and '}' to your CHUNK rule and remove the + at the end of the CHUNK rule (otherwise it would gobble up too much, possibly '$video{' further in the source, as you have noticed yourself already)?
Now a CHUNK token will always consist of a single character, but you could create a production rule to fix this:
chunk
: CHUNK+
;
and use chunk in your production rules instead of CHUNK (or use CHUNK+, of course).
Input like "{ } $foo $video{" would be tokenized as follows:
CHUNK {
CHUNK
CHUNK }
CHUNK
CHUNK $
CHUNK f
CHUNK o
CHUNK o
CHUNK
BEGIN_VIDEO $video{
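The tokenization above, and the later merging of adjacent CHUNK tokens, can be sketched in plain Java. This is a hand-written illustration and not ANTLR-generated code: the Token class and method names are hypothetical, and the readLabel/readUrl predicates (and therefore END_COMMAND, URL and LABEL) are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkTokens {
    // Hypothetical token representation for this sketch.
    static final class Token {
        final String type;
        final String text;

        Token(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Command openers from the grammar, paired with their token names.
    static final String[][] COMMANDS = {
        {"BEGIN_INPUT", "${"}, {"BEGIN_IMAGE", "$image{"}, {"BEGIN_VIDEO", "$video{"},
        {"BEGIN_AUDIO", "$audio{"}, {"BEGIN_LINK", "$link{"}
    };

    // Emits a command token where one starts, otherwise a single-character CHUNK.
    public static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Token match = null;
            for (String[] command : COMMANDS) {
                if (input.startsWith(command[1], pos)) {
                    match = new Token(command[0], command[1]);
                    break;
                }
            }
            if (match == null) {
                match = new Token("CHUNK", input.substring(pos, pos + 1));
            }
            tokens.add(match);
            pos += match.text.length();
        }
        return tokens;
    }

    // Plays the role of the chunk parser rule: joins each run of CHUNK tokens into one.
    public static List<Token> mergeChunks(List<Token> tokens) {
        List<Token> merged = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (Token token : tokens) {
            if (token.type.equals("CHUNK")) {
                run.append(token.text);
                continue;
            }
            if (run.length() > 0) {
                merged.add(new Token("CHUNK", run.toString()));
                run.setLength(0);
            }
            merged.add(token);
        }
        if (run.length() > 0) {
            merged.add(new Token("CHUNK", run.toString()));
        }
        return merged;
    }

    public static void main(String[] args) {
        for (Token token : mergeChunks(tokenize("{ } $foo $video{"))) {
            System.out.println(token.type + " [" + token.text + "]");
        }
    }
}
```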
EDIT
And if you let your parser output an AST, you can easily merge all the text matched by one or more CHUNK tokens into a single AST node, whose inner token is of type CHUNK, like this:
grammar Text;
options {
output=AST;
}
...
chunk
: CHUNK+ -> {new CommonTree(new CommonToken(CHUNK, $text))}
;
...
An alternative solution which doesn't generate that many single-character tokens would be to allow chunks to contain a $ sign only as the first character. That way your input data will get split up at the dollar signs only.
You can achieve this by introducing a fragment lexer rule (i.e., a rule that does not define a token itself but can be used in other token regular expressions):
fragment CHUNKBODY
: 'a'..'z'|'A'..'Z'|'0'..'9'|' '|'\t'|'\n'|'\r'|'-'|','|'.'|'?'|'\''|':'|'\"'|'>'|'<'|'/'|'_'|'='|';'|'('|')'|'&'|'!'|'#'|'%'|'*';
The CHUNK rule then looks like:
CHUNK
: { !readLabel && !readUrl }?=> (CHUNKBODY|'$')CHUNKBODY*
;
This seems to work for me.
