ANTLR3: No viable alternative at character - java

I have this ANTLR3 grammar:
grammar wft;
@header {
package com.mycompany.wftdiff.parser;
import com.mycompany.wftdiff.model.*;
}
@lexer::header {
package com.mycompany.wftdiff.parser;
}
@members {
private final WftFile wftFile = new WftFile();
public WftFile getParsingResult() {
return wftFile;
}
}
wftFile:
{
System.out.println("Heyo!");
}
(CommentLine | assignment | NewLine)*
itemTypeDefinition
EOF
;
/**
* ItemTypeDefinition
* DEFINE ITEM_TYPE
* END ITEM_TYPE
*/
itemTypeDefinition:
'DEFINE ITEM_TYPE' NewLine
(KeyName|TransStmt|BaseStmt|NewLine)+
WhiteSpace* 'DEFINE ITEM_ATTRIBUTE' NewLine
(KeyName|TransStmt|BaseStmt)*
WhiteSpace* 'END ITEM_ATTRIBUTE' NewLine
'END ITEM_TYPE'
;
/**
* KeyName
* KEY NAME VARCHAR2(8)
*/
KeyName: WhiteSpace* KeyNameStart .* {$channel = HIDDEN;} NewLine;
fragment KeyNameStart: 'KEY NAME VARCHAR2(';
/**
* TransStmt
* TRANS DISPLAY_NAME VARCHAR2(80)
*/
TransStmt: WhiteSpace* TransStmtStart .* {$channel = HIDDEN;} NewLine;
fragment TransStmtStart: 'TRANS';
/**
* BaseStmt
BASE PROTECT_LEVEL NUMBER
*/
BaseStmt: WhiteSpace* BaseStmtStart .* {$channel = HIDDEN;} NewLine;
fragment BaseStmtStart: 'BASE';
/**
* Assignment
*/
assignment returns [Assignment assignment]:
{
System.out.println("Assignment found!");
}
target=AssignmentTarget
WhiteSpace '=' WhiteSpace
value=String {
assignment = new Assignment(target.getText(), value.getText());
wftFile.addAssignment(new Assignment(target.getText(), value.getText()));
}
NewLine;
AssignmentTarget: A (A|D|'_')*;
String: '"' ~'"'* '"'
;
/**
* Comment
*/
CommentLine: CommentStart .* {$channel = HIDDEN;} NewLine;
fragment CommentStart: '#';
// Lexer rules
fragment D: '0'..'9';
fragment A: 'A'..'Z'
| 'a'..'z';
StringLength: D+;
NewLine : '\r' '\n' | '\n' | '\r';
WhiteSpace: ' ';
Then I generate a parser for it using
java -cp "D:\wftdiff\lib\antlr-3.5.2\antlr-3.5.2-complete.jar" org.antlr.Tool -o src/com/mycompany/wftdiff/parser/ grammar-src/wft.g
...and call it like this:
val lexer = wftLexer(ANTLRFileStream(fileName))
val parser = wftParser(CommonTokenStream(lexer))
parser.wftFile()
System.out.println("Test")
fileName points to a text file with the following contents:
# Oracle Workflow Process Definition
# $Header$
VERSION_MAJOR = "2"
VERSION_MINOR = "6"
LANGUAGE = "GERMAN"
ACCESS_LEVEL = "100"
DEFINE ITEM_TYPE
KEY NAME VARCHAR2(8)
TRANS DISPLAY_NAME VARCHAR2(80)
TRANS DESCRIPTION VARCHAR2(240)
BASE PROTECT_LEVEL NUMBER
BASE CUSTOM_LEVEL NUMBER
BASE WF_SELECTOR VARCHAR2(240)
BASE READ_ROLE REFERENCES ROLE
BASE WRITE_ROLE REFERENCES ROLE
BASE EXECUTE_ROLE REFERENCES ROLE
BASE PERSISTENCE_TYPE VARCHAR2(8)
BASE PERSISTENCE_DAYS NUMBER
DEFINE ITEM_ATTRIBUTE
KEY NAME VARCHAR2(30)
TRANS DISPLAY_NAME VARCHAR2(80)
TRANS DESCRIPTION VARCHAR2(240)
BASE PROTECT_LEVEL NUMBER
BASE CUSTOM_LEVEL NUMBER
BASE TYPE VARCHAR2(8)
BASE FORMAT VARCHAR2(240)
BASE VALUE_TYPE VARCHAR2(8)
BASE DEFAULT VARCHAR2(4000)
END ITEM_ATTRIBUTE
END ITEM_TYPE
I get the following output:
Heyo!
Assignment found!
Assignment found!
Assignment found!
Assignment found!
test-data/partialSample01.wft line 25:2 no viable alternative at character 'D'
test-data/partialSample01.wft line 35:2 no viable alternative at character 'E'
Test
How should I change my grammar in order to get rid of the no viable alternative at character 'D' error?
Note that I don't need to parse this section of the file (I'm not interested in this particular information; it comes later in the file).
Update 1: Tried to ignore the whole thing as suggested here (using skip()), but it didn't help.
New grammar file:
grammar wft;
@header {
package com.mycompany.wftdiff.parser;
import com.mycompany.wftdiff.model.*;
}
@lexer::header {
package com.mycompany.wftdiff.parser;
}
@members {
private final WftFile wftFile = new WftFile();
public WftFile getParsingResult() {
return wftFile;
}
}
wftFile:
{
System.out.println("Heyo!");
}
(CommentLine | assignment | NewLine)*
itemTypeDefinition
EOF
;
/**
* ItemTypeDefinition
* DEFINE ITEM_TYPE
* END ITEM_TYPE
*/
itemTypeDefinition:
'DEFINE ITEM_TYPE' NewLine
(KeyName|TransStmt|BaseStmt|NewLine)+
WhiteSpace*
NewLine
DefineItemAttribute
WhiteSpace*
'END ITEM_TYPE'
;
DefineItemAttribute: 'DEFINE ITEM_ATTRIBUTE' .* 'END ITEM_ATTRIBUTE' {skip();};
/**
* KeyName
* KEY NAME VARCHAR2(8)
*/
KeyName: WhiteSpace* KeyNameStart .* {$channel = HIDDEN;} NewLine;
fragment KeyNameStart: 'KEY NAME VARCHAR2(';
/**
* TransStmt
* TRANS DISPLAY_NAME VARCHAR2(80)
*/
TransStmt: WhiteSpace* TransStmtStart .* {$channel = HIDDEN;} NewLine;
fragment TransStmtStart: 'TRANS';
/**
* BaseStmt
BASE PROTECT_LEVEL NUMBER
*/
BaseStmt: WhiteSpace* BaseStmtStart .* {$channel = HIDDEN;} NewLine;
fragment BaseStmtStart: 'BASE';
/**
* Assignment
*/
assignment returns [Assignment assignment]:
{
System.out.println("Assignment found!");
}
target=AssignmentTarget
WhiteSpace '=' WhiteSpace
value=String {
assignment = new Assignment(target.getText(), value.getText());
wftFile.addAssignment(new Assignment(target.getText(), value.getText()));
}
NewLine;
AssignmentTarget: A (A|D|'_')*;
String: '"' ~'"'* '"'
;
/**
* Comment
*/
CommentLine: CommentStart .* {$channel = HIDDEN;} NewLine;
fragment CommentStart: '#';
// Lexer rules
fragment D: '0'..'9';
fragment A: 'A'..'Z'
| 'a'..'z';
StringLength: D+;
NewLine : '\r' '\n' | '\n' | '\r';
WhiteSpace: ' ';
Parsing result:
Heyo!
Assignment found!
Assignment found!
Assignment found!
Assignment found!
test-data/partialSample01.wft line 25:2 no viable alternative at character 'D'
test-data/partialSample01.wft line 36:0 missing DefineItemAttribute at 'END ITEM_TYPE'
Test
Bounty terms
I will award the bounty to a person who accomplishes the following heroic deeds:
Creates a parser that can recognize all parts of this file that are marked as relevant in the comments, that is:
1.1. everything inside BEGIN ACTIVITY and END ACTIVITY tags,
1.2. everything inside BEGIN ACTIVITY_TRANSITION and END ACTIVITY_TRANSITION tags,
1.3. everything inside BEGIN PROCESS_ACTIVITY and END PROCESS_ACTIVITY tags.
By "recognize everything" I mean there must be ANTLR 3 code, which allows me to put Java statements that would process the data extracted from the file like in the assignment rule in the original post. You don't need to write any Java code there, but it must be possible for me to add that code later.
All parts which are not marked as relevant can be ignored by the parser (similar to the comments in the original grammar).
Your grammar must be compatible with ANTLR 3, Java 8, and Windows 7.
You can remove the code in the original version (like here), so you don't get compiler errors.
The parser must either be generated using java -cp "D:\wftdiff\lib\antlr-3.5.2\antlr-3.5.2-complete.jar" org.antlr.Tool -o src/com/mycompany/wftdiff/parser/ grammar-src/wft.g, or, if you use any special settings, you need to specify them in your answer. The point is, I need to be able to reproduce your result.
When I feed the sample file to the parser, it must consume it without complaining (without printing any ANTLR error messages, without crashing and without throwing technical exceptions like NullPointerException).

Here is the grammar. It recognizes all parts, and you can add Java actions wherever you want.
Compiled and tested with jdk1.8, antlr 3.5.2 and the provided sample input.
grammar wft;
@header {
package com.mycompany.wftdiff.parser;
}
@lexer::header {
package com.mycompany.wftdiff.parser;
}
@members {
}
wftFile : (COMMENT|assignment|definition|flow)*
;
assignment
: ID EQ STRING
;
definition
: 'DEFINE' ID
(COMMENT | (dclass ID type) | definition)*
'END' ID
;
dclass : 'KEY' | 'BASE' | 'TRANS'
;
type : tnum | tvarchar | tref | tdate
;
tnum : 'NUMBER'
;
tvarchar: 'VARCHAR2' '(' INT ')'
;
tref : 'REFERENCES' ID
;
tdate : 'DATE'
;
flow : 'BEGIN' ID (STRING)+
(COMMENT|assignment|flow)+
'END' ID
;
EQ : '='
;
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
NL : '\r'? '\n' {$channel=HIDDEN;}
;
COMMENT
: '#' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
;
WS : ( ' '
| '\t'
) {$channel=HIDDEN;}
;
STRING
: '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UNICODE_ESC
| OCTAL_ESC
;
fragment
OCTAL_ESC
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UNICODE_ESC
: '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
;
INT : '0'..'9'+
;
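
The grammar above sends NL, WS and COMMENT to the hidden channel, which is why its parser rules never have to mention whitespace or line breaks. As a rough illustration of that idea outside ANTLR, here is a plain-Java sketch. The class name HiddenChannelSketch, the method visibleTokens and the regular expression are assumptions made for this example; this is not code generated by ANTLR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: imitates the effect of {$channel=HIDDEN;}
// by recognizing comments and whitespace but not handing them to the "parser".
class HiddenChannelSketch {
    // "hidden" loosely mirrors COMMENT, WS and NL; "visible" mirrors STRING, ID, INT, '=', '(' and ')'.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<hidden>#[^\\r\\n]*|[ \\t\\r\\n]+)"
        + "|(?<visible>\"(?:\\\\.|[^\"\\\\])*\"|[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[=()])");

    // Returns only the tokens a parser rule would see.
    static List<String> visibleTokens(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                    "no viable alternative at character '" + input.charAt(pos) + "'");
            }
            if (m.group("visible") != null) {
                tokens.add(m.group("visible"));
            }
            pos = m.end(); // every alternative consumes at least one character
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sample = "# $Header$\nVERSION_MAJOR = \"2\"\n  KEY NAME VARCHAR2(8)\n";
        System.out.println(visibleTokens(sample));
    }
}
```

Because indentation and line ends never reach the token list, a rule such as assignment can be written as ID EQ STRING without any WhiteSpace or NewLine references.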

Related

Antlr3 grammar generates parsing error on encountering the Pound char

ANTLR 3 reports an error on encountering the pound character ("£") of the French language, which is the equivalent of the English hash character "#", even though the Unicode values for the three special characters @, #, and $ are specified in the lexer rule.
FYI: the Unicode value of the pound character (French) = the Unicode value of the hash character (English).
The lexer/parser rules:
grammar SimpleCalc;
options
{
k = 8;
language = Java;
//filter = true;
}
tokens {
PLUS = '+' ;
MINUS = '-' ;
MULT = '*' ;
DIV = '/' ;
}
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
expr : n1=NUMBER ( exp = ( PLUS | MINUS ) n2=NUMBER )*
{
if ($exp.text.equals("+"))
System.out.println("Plus Result = " + $n1.text + $n2.text);
else
System.out.println("Minus Result = " + $n1.text + $n2.text);
}
;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
NUMBER : (DIGIT)+ ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
fragment DIGIT : '0'..'9' | '£' | ('\u0040' | '\u0023' | '\u0024');
The text file is also read as UTF-8:
public static void main(String[] args) throws Exception
{
try
{
args = new String[1];
args[0] = new String("antlr_test.txt");
SimpleCalcLexer lex = new SimpleCalcLexer(new ANTLRFileStream(args[0], "UTF-8"));
CommonTokenStream tokens = new CommonTokenStream(lex);
SimpleCalcParser parser = new SimpleCalcParser(tokens);
parser.expr();
//System.out.println(tokens);
}
catch (Exception e)
{
e.printStackTrace();
}
}
The input file has only one line:
£3 + 4£
the error is:
antlr_test.txt line 1:1 no viable alternative at character '£'
antlr_test.txt line 1:7 no viable alternative at character '£'
What is wrong with my approach?
or did I miss something?
I cannot reproduce what you describe. When I test your grammar without modifications, I get a NumberFormatException, which is expected, because Integer.parseInt("£3") cannot succeed.
When I change your embedded code into this:
{
if ($exp.text.equals("+"))
System.out.println("Result = " + (Integer.parseInt($n1.text.replaceAll("\\D", "")) + Integer.parseInt($n2.text.replaceAll("\\D", ""))));
else
System.out.println("Result = " + (Integer.parseInt($n1.text.replaceAll("\\D", "")) - Integer.parseInt($n2.text.replaceAll("\\D", ""))));
}
and regenerate lexer and parser classes (something you might not have done) and rerun the driver code, I get the following output:
Result = 7
EDIT
Perhaps the pound sign in the grammar is the issue? What if you try:
fragment DIGIT : '0'..'9' | '\u00A3' | ('\u0040' | '\u0023' | '\u0024');
instead of:
fragment DIGIT : '0'..'9' | '£' | ('\u0040' | '\u0023' | '\u0024');
?
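
To check the premise stated in the question, the following small Java sketch (class and method names are hypothetical) prints the code points of the pound sign and the hash sign, and shows that the pound sign takes two bytes in UTF-8. If those two bytes are decoded with a single-byte charset, they turn into two different characters, which is one plausible way for a grammar file and an input file to disagree about '£'.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for inspecting the characters involved.
class PoundSignCheck {
    // Formats a char as its Unicode code point, e.g. U+0023.
    static String describe(char c) {
        return String.format("U+%04X", (int) c);
    }

    // Number of bytes the character occupies when encoded as UTF-8.
    static int utf8Length(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        char pound = '\u00A3'; // pound sign, written as an escape to avoid source-encoding issues
        char hash = '#';
        System.out.println(describe(pound)); // U+00A3
        System.out.println(describe(hash));  // U+0023

        // Decode the UTF-8 bytes of the pound sign with the wrong charset.
        byte[] utf8 = String.valueOf(pound).getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(utf8.length + " bytes, misread as " + misread.length() + " chars");
    }
}
```

The pound sign and the hash sign are therefore distinct characters, so listing '\u0023' in the DIGIT rule does not cover '£'.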

ANTLR mismatched input 'foo(some_foo)' expecting {'foo'}

I'm writing a parser using ANTLR and am now at the stage of testing my parser/lexer.
I stumbled over a strange bug while trying to parse basically a variable assignment. (Like this)
Foo = mpsga(LT);
I get the error: line 1:6 mismatched input 'mpsga(LT)' expecting 'mpsga'
This is especially strange because when I remove the brackets (or the argument LT),
the parser recognizes mpsga and only complains about the missing brackets (or the missing argument).
My Grammar looks something like this:
Lexer
lexer grammar FooLexer;
COMMENT
:
'#' ~[\r\n]* -> channel ( HIDDEN )
;
NEWLINE
:
(
'\r'? '\n'
| '\r'
)+ -> channel ( HIDDEN )
;
EQUALSSIGN
:
'='
;
SEMICOLON
:
';'
;
MPSGA_255_1
:
'LT'
;
MPSGA
:
'mpsga'
;
WHITESPACE
:
(
' '
| '\t'
)+ -> channel ( HIDDEN )
;
BRACKET_OPEN
:
'('
;
BRACKET_CLOSED
:
')'
;
VAR
:
[a-zA-Z][0-9a-zA-Z_]*
;
Parser
parser grammar FooParser;
options {
tokenVocab = FooLexer;
}
stmt_block
:
stmt_list EOF
;
stmt
:
VAR EQUALSSIGN expr SEMICOLON NEWLINE?
;
stmt_list
:
stmt
| stmt_list stmt
;
expr
:
extvar
;
extvar
:
MPSGA BRACKET_OPEN mpsga_field BRACKET_CLOSED
;
mpsga_field
:
MPSGA_255_1
;
When I try to parse this Foo = mpsga(LT); in Java, I get the error.
Any help is appreciated!
Edit:
My parse hierarchy looks like the following:
Foo = mpsga(LT);
stmt_block
->stmt_list:1
-->stmt
--->"Foo"
--->"="
--->expr
---->extvar
----->"mpsga(LT)"
---->";"
-><EOF>
Foo = mpsga(LT;
stmt_block
->stmt_list:1
-->stmt
--->"Foo"
--->"="
--->expr
---->extvar
----->"mpsga"
----->"("
----->mpsga_field
------>"LT"
----->"<missing ')'>"
---->";"
-><EOF>
DISCLAIMER: I solved the problem. For anyone experiencing the same issue: I had some Lexer rules that were ambiguous for the mpsga part.
It's the argument: your grammar accepts 'foo' or 'foo2' as constants, not some_foo.
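
The asker's own resolution points at overlapping lexer rules. An ANTLR 4 lexer picks the rule that matches the longest prefix of the input (ties go to the rule defined first), so a broader rule can swallow mpsga(LT) as a single token before the parser ever sees mpsga. The plain-Java sketch below imitates that selection. The class name, the firstToken method and especially the CALL_LIKE rule are invented for illustration and are not taken from the asker's grammar.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of longest-match token selection.
class LongestMatchSketch {
    // Returns "RULE:text" for the rule matching the longest prefix of input.
    // On a tie the rule listed first wins, because only a strictly longer match replaces it.
    static String firstToken(Map<String, String> rules, String input) {
        String bestRule = null;
        String bestText = "";
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            Matcher m = Pattern.compile(rule.getValue()).matcher(input);
            if (m.lookingAt() && m.group().length() > bestText.length()) {
                bestRule = rule.getKey();
                bestText = m.group();
            }
        }
        return bestRule + ":" + bestText;
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("MPSGA", "mpsga");
        rules.put("VAR", "[a-zA-Z][0-9a-zA-Z_]*");
        System.out.println(firstToken(rules, "mpsga(LT);")); // MPSGA:mpsga

        // Assumed overlapping rule, standing in for whatever was ambiguous in the real lexer.
        rules.put("CALL_LIKE", "[a-z]+\\([A-Z]+\\)");
        System.out.println(firstToken(rules, "mpsga(LT);")); // CALL_LIKE:mpsga(LT)
    }
}
```

With the extra rule present, the first token is the whole text mpsga(LT), which matches the shape of the reported error: the parser expects the token 'mpsga' but receives one token with the text 'mpsga(LT)'.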

antlr3 - ignore a token / parse it only once

At the moment I am trying to parse text like
"play by the way by band by"
as a command for a media player.
I run into problems if the play or by token appears in the name of a song or an artist.
How can I ignore the repeated tokens in the song name and artist, or match each token only once, at the position I want?
That is my .g file:
text returns [String value]
: speech=wordExp (space s1=name)? (byartist space a1=name)? {
command = $speech.text;
match = $s1.text;
artist = $a1.text;
}
;
name
: s1 = (WORD (space s2 = WORD)*)
;
byartist
: space BY
;
wordExp
: PLAY | STOP
;
//Lexer
PLAY : 'play';
STOP : 'stop';
BY : 'by';
space : ' ';
WORD : ( 'a'..'z' | 'A'..'Z' )*; // digits in here?
WS : ('\t' | '\r'| '\n') {
$channel=HIDDEN;
}
;

ANTLR : AST eval problems

Allo,
I would like to evaluate an AST that I generated.
I wrote a grammar that generates an AST, and now I'm trying to write the tree grammar to evaluate this tree.
Here's my grammar:
tree grammar XHTML2CSVTree;
options {
tokenVocab=XHTML2CSV;
ASTLabelType=CommonTree;
}
@members {
// variables and methods to be included in the java file generated
}
/*------------------------------------------------------------------
* TREE RULES
*------------------------------------------------------------------*/
// example
tableau returns [String csv]
: ^(TABLEAU {String retour="";}(l=ligne{retour += $l.csv;})* {System.out.println(retour);})
;
ligne returns [String csv]
: ^(LIGNE {String ret="";}(c=cellule{ret += $c.csv;})+)
;
cellule returns [String csv]
: ^(CELLULE s=CHAINE){ $csv = $s.text;}
;
And here's the grammar building the AST :
grammar XHTML2CSV;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
CELLULE;
LIGNE;
TABLEAU;
CELLULEG = '<td>'; // simple lexemes
CELLULED = '</td>';
DEBUTCOL = '<tr>';
FINCOL = '</tr>';
DTAB = '<table';
FTAB = '>';
FINTAB = '</table>';
// anonymous tokens (useful to give meaningful names to AST labels)
// simple lexemes
}
@members {
// variables and methods to be included in the java file generated
}
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
tableau
: DTAB STAB* FTAB ligne* FINTAB -> ^(TABLEAU ligne*)
;
ligne
: DEBUTCOL cellule+ FINCOL -> ^(LIGNE cellule+)
;
cellule
: CELLULEG CHAINE CELLULED -> ^(CELLULE CHAINE)
;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
STAB
: ' '.*'=\"'.*'\"'
;
WS
: ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ {$channel = HIDDEN;}
; // skip white spaces
CHAINE : (~('\"' | ',' | '\n' | '<' | '>'))+
;
// complex lexemes
XHTML2CSV.g works; I can see the generated AST in ANTLRWorks,
but I cannot walk this AST to generate CSV output.
I get these errors:
XHTML2CSVTree.java:144: ';' expected
match(input, Token.DOWN, null);
^
XHTML2CSVTree.java:144: not a statement
match(input, Token.DOWN, null);
^
XHTML2CSVTree.java:144: ';' expected
match(input, Token.DOWN, null);
^
XHTML2CSVTree.java:144: not a statement
match(input, Token.DOWN, null);
^
XHTML2CSVTree.java:144: ';' expected
match(input, Token.DOWN, null);
^
5 errors
If someone could help me,
Thanks.
eo
Edit :
My main class looks like :
import org.antlr.runtime.ANTLRFileStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.tree.CommonTreeNodeStream;
import org.antlr.runtime.tree.Tree;
public class Main {
public static void main(String args[]) throws Exception {
try {
XHTML2CSVLexer lex = new XHTML2CSVLexer(new ANTLRFileStream(args[0])); // create lexer to read the file specified from command line (i.e., first argument, e.g., java Main test1.xhtml)
CommonTokenStream tokens = new CommonTokenStream(lex); // transform it into a token stream
XHTML2CSVParser parser = new XHTML2CSVParser(tokens); // create the parser that reads from the token stream
Tree t = (Tree) parser.cellule().tree; // (try to) parse a given rule specified in the parser file, e.g., my_main_rule
CommonTreeNodeStream nodes = new CommonTreeNodeStream(t); // transform it into a common data structure readable by the tree pattern
nodes.setTokenStream(tokens); // declare which token to use (i.e., labels of the nodes defined in the parser, mainly anonymous tokens)
XHTML2CSVTree tparser = new XHTML2CSVTree(nodes); // instantiate the tree pattern
System.out.println(tparser.cellule()); // apply patterns
} catch (Exception e) {
e.printStackTrace();
}
}
}
The ligne rule in your tree grammar:
ligne returns [String csv]
: ^(LIGNE {Sting ret="";r}(c=cellule{ret += $c.csv;})+)
; // ^ ^
// | |
// problem 1, problem 2
has 2 problems:
it contains Sting where it should be String;
there's a trailing r that is messing up your custom Java code.
It should be:
ligne returns [String csv]
: ^(LIGNE {String ret="";}(c=cellule{ret += $c.csv;})+)
;
EDIT
If I generate a lexer and parser (1), generate a tree walker (2), compile all .java source files (3) and run the Main class (4):
java -cp antlr-3.3.jar org.antlr.Tool XHTML2CSV.g
java -cp antlr-3.3.jar org.antlr.Tool XHTML2CSVTree.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main test.txt
the following gets printed to the console:
table data
where the file test.txt contains:
<td>table data</td>
So I don't see any problem. Perhaps you're trying to parse a <table>? This would go wrong since both your parser and tree-walker are invoking the cellule rule, not the tableau rule.

How to handle escape sequences in string literals in ANTLR 3?

I've been looking through the ANTLR v3 documentation (and my trusty copy of "The Definitive ANTLR reference"), and I can't seem to find a clean way to implement escape sequences in string literals (I'm currently using the Java target). I had hoped to be able to do something like:
fragment
ESCAPE_SEQUENCE
: '\\' '\'' { setText("'"); }
;
STRING
: '\'' (ESCAPE_SEQUENCE | ~('\'' | '\\'))* '\''
{
// strip the quotes from the resulting token
setText(getText().substring(1, getText().length() - 1));
}
;
For example, I would want the input token "'Foo\'s House'" to become the String "Foo's House".
Unfortunately, the setText(...) call in the ESCAPE_SEQUENCE fragment sets the text for the entire STRING token, which is obviously not what I want.
Is there a way to implement this grammar without adding a method to go back through the resulting string and manually replace escape sequences (e.g., with something like setText(escapeString(getText())) in the STRING rule)?
Here is how I accomplished this in the JSON parser I wrote.
STRING
@init{StringBuilder lBuf = new StringBuilder();}
:
'"'
( escaped=ESC {lBuf.append(getText());} |
normal=~('"'|'\\'|'\n'|'\r') {lBuf.appendCodePoint(normal);} )*
'"'
{setText(lBuf.toString());}
;
fragment
ESC
: '\\'
( 'n' {setText("\n");}
| 'r' {setText("\r");}
| 't' {setText("\t");}
| 'b' {setText("\b");}
| 'f' {setText("\f");}
| '"' {setText("\"");}
| '\'' {setText("\'");}
| '/' {setText("/");}
| '\\' {setText("\\");}
| ('u')+ i=HEX_DIGIT j=HEX_DIGIT k=HEX_DIGIT l=HEX_DIGIT
{setText(ParserUtil.hexToChar(i.getText(),j.getText(),
k.getText(),l.getText()));}
)
;
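
The ParserUtil.hexToChar helper called above is not shown in the answer. A minimal sketch of what such a helper might look like follows, assuming it takes the four hex digits as strings and returns a one-character string; the signature is a guess based on the call site, not the answerer's actual code.

```java
// Hypothetical reconstruction of the helper; the real ParserUtil is not shown in the thread.
class ParserUtil {
    // Combines four hex digits (as in a \\uXXXX escape) into the character they encode.
    static String hexToChar(String a, String b, String c, String d) {
        int codeUnit = Integer.parseInt(a + b + c + d, 16);
        return String.valueOf((char) codeUnit);
    }

    public static void main(String[] args) {
        System.out.println(hexToChar("0", "0", "4", "1")); // prints "A"
    }
}
```

Returning a String keeps the result directly usable as the argument of setText(...) in the ESC rule.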
For ANTLR 4 with the Java target and a standard escaped-string grammar, I used a dedicated class, CharSupport, to translate the string. It is available in the ANTLR API:
STRING : '"'
( ESC
| ~('"'|'\\'|'\n'|'\r')
)*
'"' {
setText(
org.antlr.v4.misc.CharSupport.getStringFromGrammarStringLiteral(
getText()
)
);
}
;
As I saw in the v4 documentation and confirmed by experiment, @init is no longer supported in lexer rules!
Another (possibly more efficient) alternative is to use rule arguments:
STRING
@init { final StringBuilder buf = new StringBuilder(); }
:
'"'
(
ESCAPE[buf]
| i = ~( '\\' | '"' ) { buf.appendCodePoint(i); }
)*
'"'
{ setText(buf.toString()); };
fragment ESCAPE[StringBuilder buf] :
'\\'
( 't' { buf.append('\t'); }
| 'n' { buf.append('\n'); }
| 'r' { buf.append('\r'); }
| '"' { buf.append('\"'); }
| '\\' { buf.append('\\'); }
| 'u' a = HEX_DIGIT b = HEX_DIGIT c = HEX_DIGIT d = HEX_DIGIT { buf.append(ParserUtil.hexChar(a, b, c, d)); }
);
I needed to do just that, but my target was C and not Java. Here's how I did it based on answer #1 (and comment), in case anyone needs something alike:
QUOTE : '\'';
STR
@init{ pANTLR3_STRING unesc = GETTEXT()->factory->newRaw(GETTEXT()->factory); }
: QUOTE ( reg = ~('\\' | '\'') { unesc->addc(unesc, reg); }
| esc = ESCAPED { unesc->appendS(unesc, GETTEXT()); } )+ QUOTE { SETTEXT(unesc); };
fragment
ESCAPED : '\\'
( '\\' { SETTEXT(GETTEXT()->factory->newStr8(GETTEXT()->factory, (pANTLR3_UINT8)"\\")); }
| '\'' { SETTEXT(GETTEXT()->factory->newStr8(GETTEXT()->factory, (pANTLR3_UINT8)"\'")); }
)
;
HTH.
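
For comparison, the post-processing route that the question mentions (an escapeString-style helper applied to the token text in the STRING rule) can be sketched in plain Java. The class and method names here are hypothetical, and only the escapes \', \\, \n and \t are handled.

```java
// Hypothetical post-processing helper: strips the quotes and resolves escapes in one pass.
class StringLiteralSketch {
    // Expects the raw token text including its surrounding single quotes.
    static String unescape(String literal) {
        String body = literal.substring(1, literal.length() - 1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                char next = body.charAt(++i); // consume the escaped character
                switch (next) {
                    case 'n': out.append('\n'); break;
                    case 't': out.append('\t'); break;
                    default:  out.append(next); // covers \' and \\
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("'Foo\\'s House'")); // prints "Foo's House"
    }
}
```

This costs a second pass over each string token, which is the overhead the in-lexer approaches above avoid.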
