I want to parse the sentence "i am looking for a java developer from india".
The output I need is language=java and place=india.
I created a grammar file as follows:
grammar Job;
eval returns [String value]
: output=jobExp {System.out.println($output.text); $value = $output.text;}
;
jobExp returns [String value]
: ind=indro whitespace var1=language ' developer from ' var2=place
{
System.out.println($var1.text);
System.out.println($var2.text);
$value = $var1.text+$var2.text; }
;
indro
:
'i am looking for a'
|
'i am searching for a'
;
language :
'java' | 'python' | 'cpp'
;
place :
'india' | 'america' | 'africa'
;
whitespace :
(' '|'\t')+
;
Inside jobExp I am getting the values for place and language, and I am returning only those two variables. But in eval I am getting the whole sentence (i am looking for a java developer from india). What do I need to do to get only the matching output in eval? Is it possible to get the output as JSON or a HashMap in ANTLR?
My Java class for testing the grammar is as follows:
import org.antlr.runtime.*;
public class JobTest {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("i am looking for a java developer from india" );
JobLexer lexer = new JobLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
JobParser parser = new JobParser(tokens);
System.out.println(parser.eval()); // print the value
}
}
You can add an @header block to your grammar; its contents are copied into the generated parser file.
grammar Job;
@header {
import java.util.HashMap;
}
After that, you can use the HashMap class in your grammar file just as you are using String.
There's also the @members block to define fields and methods of the parser. You can check an example using both blocks to define an expression evaluator in this tutorial.
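To make that a bit more concrete, here is a sketch of the plain Java that such an action could call. JobResult, toMap and toJson are hypothetical helper names, not part of ANTLR, and the JSON rendering is hand-rolled and only escapes quotes and backslashes, so treat it as an illustration rather than a finished solution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that a grammar action could call; not part of ANTLR.
class JobResult {

    /** Collects the two matched values under fixed keys, keeping insertion order. */
    public static Map<String, String> toMap(String language, String place) {
        Map<String, String> result = new LinkedHashMap<>();
        result.put("language", language);
        result.put("place", place);
        return result;
    }

    /** Renders a flat string map as a JSON object (minimal escaping only). */
    public static String toJson(Map<String, String> map) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : map.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    // Escapes backslashes and double quotes; control characters are not handled.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(toJson(toMap("java", "india")));
    }
}
```

In the grammar, jobExp could then return such a map instead of a String, and eval would read $output.value rather than $output.text, because .text always refers to the matched input text.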
I have a .g4 file for my grammar and it works fine.
In my java program, user input must follow some rules which are the rules in the .g4 file.
How can I use it in my Java code to check whether the user input is valid?
BTW, my IDE is IntelliJ IDEA.
Here is my ANTLR grammar:
grammar CFG;
/*
* Parser Rules
*/
cfg: (rull NewLine)+;
rull: Variable TransitionOperator sententialForm (Or sententialForm)*;
sententialForm: ((Variable | Literal)+) | Landa;
/*
* Lexer Rules
*/
Literal: [a-z];
Variable: [A-Z];
TransitionOperator: '->';
Or: '|';
OpenParenthesis: '(';
CloseParenthesis: ')';
// Star: '*';
// Plus: '+';
Landa: 'λ';
WhiteSpace: ' ' -> skip;
NewLine: '\n';
That's pretty easy to do: set up your parsing pipeline as usual:
using Antlr4.Runtime;
using Antlr4.Runtime.Tree;
public void MyParseMethod() {
String input = "your text to parse here";
ICharStream stream = CharStreams.fromstring(input);
ITokenSource lexer = new CFGLexer(stream);
ITokenStream tokens = new CommonTokenStream(lexer);
CFGParser parser = new CFGParser(tokens);
// parser.BuildParseTree = true;
IParseTree tree = parser.cfg();
}
(written here in C#). Once the parse run is done, check the parser's number of syntax errors (the NumberOfSyntaxErrors property in the C# runtime, getNumberOfSyntaxErrors() in Java) to see whether the input contained an error. For more fine-grained handling, set up your own error listener and collect the reported errors.
I am using the MySQL grammar from here: https://github.com/antlr/grammars-v4/tree/master/mysql and have generated the Java files using Maven. Now I am trying to parse a query, but I can't figure out how to do so.
I basically want to 'get' all the different components of a query, like the list of selected columns, where conditions, sub queries, table names, etc. But I have no idea how to proceed. I have written the code below so far. Can someone please suggest a simple example so that I can understand the usage and take up more complex tasks? Here is my code:
public static void main( String[] args )
{
String sql="select cust_name from database..table where cust_name like 'Kash%'";
ANTLRInputStream input = new ANTLRInputStream(sql);
MySqlLexer mySqlLexer = new MySqlLexer(input);
CommonTokenStream tokens = new CommonTokenStream(mySqlLexer);
MySqlParser mySqlParser = new MySqlParser(tokens);
ParseTree tree = mySqlParser.dmlStatement();
ParseTreeWalker walker = new ParseTreeWalker();
MySqlParserBaseListener listener=new MySqlParserBaseListener();
ParseTreeWalker.DEFAULT.walk(listener, tree);
System.out.println(?);
}
Using the above code, I am getting the following output:
line 1:11 no viable alternative at input '_'
(dmlStatement _ . . _ 'Kash%')
Thanks For Help :)
I basically want to 'get' all the different components of a query, like the list columns selected, where conditions, sub queries, table names, etc.
Your tree variable holds all that data: ParseTree tree = mySqlParser.dmlStatement();
line 1:11 no viable alternative at input '_'
If you look at the lexer rules:
SELECT: 'SELECT';
ID: ID_LITERAL;
fragment ID_LITERAL: [A-Z_$0-9]*?[A-Z_$]+?[A-Z_$0-9]*;
it appears that keywords and identifiers cannot contain lowercase letters.
If you run it like this:
String sql = "SELECT CUST_NAME FROM CUSTOMERS WHERE CUST_NAME LIKE 'Kash%'";
MySqlLexer lexer = new MySqlLexer(CharStreams.fromString(sql));
MySqlParser parser = new MySqlParser(new CommonTokenStream(lexer));
ParseTree root = parser.dmlStatement();
System.out.println(root.toStringTree(parser));
you will see the following output (indented for easier reading):
(dmlStatement
(selectStatement
(querySpecification SELECT
(selectElements
(selectElement
(fullColumnName
(uid
(simpleId CUST_NAME)))))
(fromClause FROM
(tableSources
(tableSource
(tableSourceItem
(tableName
(fullId
(uid
(simpleId CUSTOMERS))))))) WHERE
(expression
(predicate
(predicate
(expressionAtom
(fullColumnName
(uid
(simpleId CUST_NAME))))) LIKE
(predicate
(expressionAtom
(constant
(stringLiteral 'Kash%'))))))))))
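If the queries cannot be rewritten by hand, one possible workaround for the upper-case-only lexer rules is to upper-case the SQL text outside of quoted literals before handing it to the lexer. The sketch below is only an illustration under some assumptions: the class and method names are made up, it only understands single-quoted literals, and it does not handle backslash-escaped quotes, double-quoted strings, or backtick identifiers.

```java
// Hypothetical pre-processing helper; not part of ANTLR or the MySQL grammar.
class UpperCaseOutsideQuotes {

    /** Upper-cases every character that is not inside a single-quoted literal. */
    public static String upperCaseOutsideQuotes(String sql) {
        StringBuilder out = new StringBuilder(sql.length());
        boolean inQuote = false;
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (c == '\'') {
                // A doubled '' inside a literal toggles twice, so the state is preserved.
                inQuote = !inQuote;
            }
            out.append(inQuote ? c : Character.toUpperCase(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sql = "select cust_name from customers where cust_name like 'Kash%'";
        // The result can then be passed to CharStreams.fromString(...) as shown above.
        System.out.println(upperCaseOutsideQuotes(sql));
    }
}
```

Note that this changes the text of the identifiers themselves (cust_name becomes CUST_NAME), which is fine for MySQL column names in most setups but would matter for case-sensitive table names.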
I wrote a grammar with antlr 4.4 like this :
grammar CSV;
file
: row+ EOF
;
row
: value (Comma value)* (LineBreak | EOF)
;
value
: SimpleValue
| QuotedValue
;
Comma
: ','
;
LineBreak
: '\r'? '\n'
| '\r'
;
SimpleValue
: ~(',' | '\r' | '\n' | '"')+
;
QuotedValue
: '"' ('""' | ~'"')* '"'
;
Then I used ANTLR 4.4 to generate the parser and lexer,
and this process was successful.
After generating the classes, I wrote some Java code that uses the grammar:
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
public class Main {
public static void main(String[] args)
{
String source = "\"a\",\"b\",\"c";
CSVLexer lex = new CSVLexer(new ANTLRInputStream(source));
CommonTokenStream tokens = new CommonTokenStream(lex);
tokens.fill();
CSVParser parser = new CSVParser(tokens);
CSVParser.FileContext file = parser.file();
}
}
All of the above code is a parser for CSV strings.
For example: "a","b","c
Window Output :
line 1:8 token recognition error at: '"c'
line 1:10 missing {SimpleValue, QuotedValue} at '<EOF>'
I want to know how I can get these errors from a method (getErrors() or similar) in my code, rather than as text in the output window.
Can anyone help me?
Using ANTLR for CSV parsing is a nuclear option IMHO, but since you're at it...
Implement the interface ANTLRErrorListener. You may extend BaseErrorListener for that. Collect the errors and append them to a list.
Call parser.removeErrorListeners() to remove the default listeners
Call parser.addErrorListener(yourListenerInstance) to add your own listener
Parse your input
Now, for the lexer, you may either do the same thing removeErrorListeners/addErrorListener, or add the following rule at the end:
UNKNOWN_CHAR : . ;
With this rule, the lexer will never fail (it will generate UNKNOWN_CHAR tokens when it can't do anything else) and all errors will be generated by the parser (because it won't know what to do with these UNKNOWN_CHAR tokens). I recommend this approach.
I am trying to develop a tool using ANTLR 4. I am very new to ANTLR and advanced Java. I downloaded the package antlr-4.2.2-complete.jar, and ANTLR is working fine.
I have a few questions.
I took a very basic grammar, given below:
grammar test;
start : (aa) | (bb);
aa : A C D;
bb : A C B;
A : 'a';
B : 'b';
C : 'c';
D : 'd';
WS : [ \t\r\n] ->skip;
Now I am using the command prompt to parse a string with it:
C:\javalib\test>java org.antlr.v4.Tool test.g4
C:\javalib\test>javac test*.java
C:\javalib\test>java org.antlr.v4.runtime.misc.TestRig test start -gui -tree
acb
^Z
(start (bb a c b))
The string acb was parsed, and the output obtained was (start (bb a c b)).
Now I want to know how I can parse many strings, or a file, in ANTLR. Each line in that file will have a different start rule.
For example, the file we have to parse will look like this (input file):
start : acb
bb : acb
aa : acd
I can't take the advice of changing the grammar so that one start rule can be used for all the strings, because the grammar I am working on is really vast.
I can change the format of my input strings so that they can be parsed easily in ANTLR. The basic idea is that I have many strings, each with a different start rule; how can I parse them in ANTLR?
To parse each line with a given rule you could do this.
testcase :
singletest
( Linebreak singletest) *
;
singletest:
'ruleA' ':' ruleA
| 'ruleB' ':' ruleB
|...
;
Whitespace: [ \t] -> skip; // no line break!
Linebreak: '\r\n' | '\r' | '\n';
Now, i want know how can i parse manystrings/ a file in ANTLR.
To parse the file input.txt, do:
testLexer lexer = new testLexer(new ANTLRFileStream("input.txt"));
testParser parser = new testParser(new CommonTokenStream(lexer));
ParseTree tree = parser.start();
System.out.println(tree.toStringTree());
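If you would rather keep the grammar untouched, another option is to pick the start rule per line from Java. ANTLR generates one public no-argument method per parser rule, so the method can be looked up by name with reflection. A minimal sketch follows; DemoParser is a hypothetical stand-in so that the example runs without generated code, and the hand-maintained RULES list is likewise an assumption. With the real testParser you would build a new lexer and parser for the text of each line and invoke the rule method on that parser instead.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

class RuleDispatchDemo {
    // Rule names allowed as start rules (assumption: maintained by hand in this sketch).
    private static final List<String> RULES = Arrays.asList("start", "aa", "bb");

    /** Parses one line of the form "rule : text" by invoking the rule method named before the colon. */
    public static String parseLine(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected 'rule : text' but got: " + line);
        }
        String ruleName = line.substring(0, colon).trim();
        String text = line.substring(colon + 1).trim();
        if (!RULES.contains(ruleName)) {
            throw new IllegalArgumentException("unknown rule: " + ruleName);
        }
        try {
            // Look up the public no-arg method that has the same name as the rule.
            Method rule = DemoParser.class.getMethod(ruleName);
            return (String) rule.invoke(new DemoParser(text));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not run rule " + ruleName, e);
        }
    }

    public static void main(String[] args) {
        for (String line : Arrays.asList("start : acb", "bb : acb", "aa : acd")) {
            System.out.println(parseLine(line));
        }
    }
}

// Hypothetical stand-in for a generated parser: one public no-arg method per rule.
class DemoParser {
    private final String input;

    DemoParser(String input) {
        this.input = input;
    }

    public String start() {
        return "(start " + (input.equals("acd") ? aa() : bb()) + ")";
    }

    public String aa() {
        return match("aa", "acd");
    }

    public String bb() {
        return match("bb", "acb");
    }

    // Accepts only the exact text the toy rule expects and returns a tree-like string.
    private String match(String rule, String expected) {
        if (!input.equals(expected)) {
            throw new IllegalStateException("rule " + rule + " cannot match '" + input + "'");
        }
        return "(" + rule + " " + input + ")";
    }
}
```

The advantage over the wrapper-rule approach is that the grammar stays as it is; the cost is that the line format ("rule : text") is handled by hand in Java instead of by ANTLR.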
This is my tree grammar:
grammar t;
options{
output = AST;
}
type
:
'NVARCHAR' -> "VARCHAR"
;
ANTLR3 3.1.3 says:
syntax error: antlr: t.g:12:5: unexpected token: 'NVARCHAR'
What's wrong here? I took it from this article.
ps. I'm using this grammar later in order to get AST out of it. Once the AST is retrieved I'm walking through it and add every token's text to some string buffer. The idea of the rewriting above is to replace certain tokens. I'm doing language-to-language mapping (SQL to SQL dialect, to be more specific).
Note the first sentence Terence starts with: "just had some cool ideas about a semantic rule specification language...". That's what the first example is: an idea. It's not valid syntax.
There are (at least) two options for you:
1. rewrite the text in the token immediately
grammar T;
options{
output=AST;
}
@parser::members {
public static void main(String[] args) throws Exception {
TLexer lexer = new TLexer(new ANTLRStringStream("NVARCHAR"));
TParser parser = new TParser(new CommonTokenStream(lexer));
parser.type();
}
}
type
: NVARCHAR {System.out.println("token=" + $NVARCHAR.text);}
;
NVARCHAR
: 'NVARCHAR' {setText("VARCHAR");}
;
But this only adjusts the text, not the type of the token, which remains NVARCHAR.
2. use an imaginary token:
grammar T;
options{
output=AST;
}
tokens {
VARCHAR='VARCHAR';
}
@parser::members {
public static void main(String[] args) throws Exception {
TLexer lexer = new TLexer(new ANTLRStringStream("NVARCHAR"));
TParser parser = new TParser(new CommonTokenStream(lexer));
parser.type();
}
}
type
: NVARCHAR -> VARCHAR
;
NVARCHAR
: 'NVARCHAR'
;
which changes the text and type of the token.
As you can see, with both demos, token=VARCHAR is being printed to the console:
bart@hades:~/Programming/ANTLR/Demos/T$ java -cp antlr-3.3.jar org.antlr.Tool T.g
bart@hades:~/Programming/ANTLR/Demos/T$ javac -cp antlr-3.3.jar *.java
bart@hades:~/Programming/ANTLR/Demos/T$ java -cp .:antlr-3.3.jar TParser
token=VARCHAR
In ANTLR 4, replacing both the text and the type of a token can be achieved with a lexer action combined with the type command:
OldTokenType:
('Token1' | 'Token2' | 'Token3' ) {setText("New Token");}
-> type(NewTokenType);