I'm new to PEG parsing and trying to write a simple parser to parse out an expression like: "term1 OR term2 anotherterm" ideally into an AST that would look something like:
             OR
    ---------|---------
    |                 |
"term1"      "term2 anotherterm"
I'm currently using Grappa (https://github.com/fge/grappa), but it isn't matching even the simpler expression "term1 OR term2". This is what I have:
package grappa;

import com.github.fge.grappa.annotations.Label;
import com.github.fge.grappa.parsers.BaseParser;
import com.github.fge.grappa.rules.Rule;

public class ExprParser extends BaseParser<Object> {

    @Label("expr")
    Rule expr() {
        return sequence(terms(), wsp(), string("OR"), wsp(), terms(), push(match()));
    }

    @Label("terms")
    Rule terms() {
        return sequence(whiteSpaces(),
            join(term()).using(wsp()).min(0),
            whiteSpaces());
    }

    @Label("term")
    Rule term() {
        return sequence(oneOrMore(character()), push(match()));
    }

    Rule character() {
        return anyOf(
            "0123456789" +
            "abcdefghijklmnopqrstuvwxyz" +
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
            "-_");
    }

    @Label("whiteSpaces")
    Rule whiteSpaces() {
        return join(zeroOrMore(wsp())).using(sequence(optional(cr()), lf())).min(0);
    }
}
Can anyone point me in the right direction?
(author of grappa here...)
OK, so, what you seem to want is in fact a parse tree.
Quite recently, an extension to grappa (2.0.x+) was developed that can answer your needs: https://github.com/ChrisBrenton/grappa-parsetree.
Grappa, by default, only "blindly" matches text and has a stack at its disposal, so you could have, for instance:
public Rule oneOrOneOrEtc()
{
    return join(one(), push(match())).using(or()).min(1);
}
But then all of your matches would just be sitting on the stack... Not very practical, but still usable in some situations (see, for instance, sonar-sslr-grappa).
In your case you want this package. You can do this with it:
// define your root node
// define your root node
public final class Root
    extends ParseNode
{
    public Root(final String match, final List<ParseNode> children)
    {
        super(match, children);
    }
}

// define your parse node
public final class Alternative
    extends ParseNode
{
    public Alternative(final String match, final List<ParseNode> children)
    {
        super(match, children);
    }
}
That is the minimal implementation. And then your parser can look like this:
@GenerateNode(Alternative.class)
public Rule alternative() // or whatever
{
    return // whatever an alternative is
}

@GenerateNode(Root.class)
public Rule root()
{
    return join(alternative())
        .using(or())
        .min(1);
}
What happens here: since the root node is matched before the alternatives, if, say, you have the string:
a or b or c or d
then the root node will match the whole sequence, and it will have four Alternative children, matching a, b, c, and d respectively.
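To see the match happen, here is a minimal sketch of driving a grappa 2.x rule, assuming the standard Grappa.createParser / ListeningParseRunner API; MyParser is a hypothetical parser class holding the root() rule above, and the wiring grappa-parsetree needs to actually materialize its nodes is not shown (see that project's README):

import com.github.fge.grappa.Grappa;
import com.github.fge.grappa.run.ListeningParseRunner;
import com.github.fge.grappa.run.ParsingResult;

public final class Demo {
    public static void main(final String... args) {
        // createParser instantiates a bytecode-enhanced version of the parser
        final MyParser parser = Grappa.createParser(MyParser.class);
        final ListeningParseRunner<Object> runner
            = new ListeningParseRunner<>(parser.root());
        final ParsingResult<Object> result = runner.run("a or b or c or d");
        System.out.println(result.isSuccess()); // true if the input matched
    }
}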
Full credits here go to Christopher Brenton for coming up with this idea in the first place!
Related
I am new to ANTLR. I have a list of functions, most of which are nested.
Below are the examples for functions:
1. Function.add(Integer a,Integer b)
2. Function.concat(String a,String b)
3. Function.mul(Integer a,Integer b)
The input can also be nested, for example:
Function.concat(Function.substring(String,Integer,Integer),String)
Using ANTLR from a Java program, how do I define and validate that the function names are correct and that the parameter counts and datatypes are correct? The check has to be recursive, since the functions can be deeply nested.
Validation test class:
public class FunctionValidate {

    public static void main(String[] args) {
        FunctionValidate fun = new FunctionValidate();
        fun.test("FUNCTION.concat(1,2)");
    }

    private String test(String source) {
        CodePointCharStream input = CharStreams.fromString(source);
        return compile(input);
    }

    private String compile(CharStream source) {
        MyFunctionsLexer lexer = new MyFunctionsLexer(source);
        CommonTokenStream tokenStream = new CommonTokenStream(lexer);
        MyFunctionsParser parser = new MyFunctionsParser(tokenStream);
        FunctionContext tree = parser.function();
        ArgumentContext tree1 = parser.argument();
        FunctionValidateVisitorImpl visitor = new FunctionValidateVisitorImpl();
        visitor.visitFunction(tree);
        visitor.visitArgument(tree1);
        return null;
    }
}
Visitor impl:
public class FunctionValidateVisitorImpl extends MyFunctionsParserBaseVisitor<String> {

    @Override
    public String visitFunction(MyFunctionsParser.FunctionContext ctx) {
        String function = ctx.getText();
        System.out.println("------>" + function);
        return null;
    }

    @Override
    public String visitArgument(MyFunctionsParser.ArgumentContext ctx) {
        String param = ctx.getText();
        System.out.println("------>" + param);
        return null;
    }
}
The statement System.out.println("------>"+param); is not printing the argument; it only prints ------>.
This task can be accomplished by implementing two main steps:
1) Parse given input and build an Abstract Syntax Tree (AST).
2) Traverse the tree and validate each function and each argument, one after another, using the Listener or Visitor pattern.
Fortunately, ANTLR provides tools for implementing both steps.
Here's a simple grammar I wrote based on your example. It does recursive parsing and builds the AST. You may want to extend its functionality to meet your needs.
Lexer:
lexer grammar MyFunctionsLexer;
FUNCTION: 'FUNCTION';
NAME: [a-zA-Z0-9]+; // letters and digits, so names like 'concat' and literal arguments like '1' tokenize
DOT: '.';
COMMA: ',';
L_BRACKET: '(';
R_BRACKET: ')';
WS : [ \t\r\n]+ -> skip;
Parser:
parser grammar MyFunctionsParser;
options {
tokenVocab=MyFunctionsLexer;
}
function: FUNCTION '.' NAME '(' (argument (',' argument)*) ')';
argument: (NAME | function);
An important thing to notice here: the parser does not distinguish between valid (from your point of view) and invalid functions, arguments, argument counts, etc.
So a function like Function.whatever(InvalidArg) is also a valid construction from the parser's point of view. To further validate the input and test whether it meets your requirements (a predefined list of functions and their arguments), you have to traverse the tree using a Listener or a Visitor (I think a Visitor fits here perfectly).
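For instance, here is a minimal Visitor sketch for validating names and argument counts, assuming the generated MyFunctionsParserBaseVisitor base class; the signature table and the exception choice are illustrative, not prescribed:

import java.util.HashMap;
import java.util.Map;

public class FunctionSignatureVisitor extends MyFunctionsParserBaseVisitor<Void> {

    // function name -> expected argument count; extend with datatypes as needed
    private static final Map<String, Integer> SIGNATURES = new HashMap<>();
    static {
        SIGNATURES.put("add", 2);
        SIGNATURES.put("concat", 2);
        SIGNATURES.put("mul", 2);
    }

    @Override
    public Void visitFunction(MyFunctionsParser.FunctionContext ctx) {
        String name = ctx.NAME().getText();
        Integer expected = SIGNATURES.get(name);
        if (expected == null) {
            throw new IllegalArgumentException("unknown function: " + name);
        }
        if (ctx.argument().size() != expected) {
            throw new IllegalArgumentException(name + " expects " + expected
                    + " arguments, got " + ctx.argument().size());
        }
        // visitChildren recurses into the arguments, so nested
        // function calls are validated the same way
        return visitChildren(ctx);
    }
}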
To get a better understanding of what these are, I'd recommend reading this and this. But if you want to get deeper into the subject, you should definitely look at "the Dragon Book", which covers the topic exhaustively.
Hi everyone. I have the following code in my .jjt file for my abstract syntax tree; it keeps track of where the nodes are created within the file that is passed to it, but I cannot access this variable from my semantic checker class.
The code is below, and any help would be appreciated! I've tried everything and I'm losing hope at this stage.
This is the integer in the .jjt file I'd like to access:
TOKEN_MGR_DECLS :
{
    static int commentNesting = 0;
    public static int linenumber = 0;
}

SKIP : /* STRUCTURES AND CHARACTERS TO ESCAPE */
{
    " "
    | "\t"
    | "\n" {linenumber++;}
    | "\r"
    | "\f"
}
An example of one of my nodes
void VariableDeclaration() #VariableDeclaration : {Token t; String id; String type;}
{
    t = <VARIABLE> id = Identifier() <COLON> type = Type()
}
My semantic checker class
public class SemanticCheckVisitor implements "My jjt file visitor" {
    public Object visit(VariableDeclaration node, Object data) {
        node.childrenAccept(this, data);
        return data;
    }
}

How would it be possible to get the line number at which this node was declared?
Thanks everyone.
You can see an example of this in the Teaching Machine's Java parser, which is here.
First you need to modify your SimpleNode type to include a field for the line number. In the TM I added a declaration
private SourceCoords myCoords ;
where SourceCoords is a type that includes not only the line number, but also information about what file the line was in. You can just use an int field. Also in SimpleNode you need to declare some methods like this
public void setCoords( SourceCoords toSet ) { myCoords = toSet ; }
public SourceCoords getCoords() { return myCoords ; }
You might want to declare them in the Node interface too.
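If you do not need the file information, a plain int works just as well; a minimal sketch of the corresponding additions to SimpleNode:

// Alternative to SourceCoords: store only the line number.
private int lineNumber ;

public void setLineNumber( int line ) { lineNumber = line ; }
public int getLineNumber() { return lineNumber ; }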
Over in your .jjt file, use the option
NODE_SCOPE_HOOK=true;
And declare two methods in your parser class
void jjtreeOpenNodeScope(Node n) {
    // 'file' is a field of the parser identifying the current source file
    ((SimpleNode)n).setCoords( new SourceCoords( file, getToken(1).beginLine ) ) ;
}

void jjtreeCloseNodeScope(Node n) {
}
Hmm. I probably should have declared the methods in Node to avoid that ugly cast.
One more thing: you are keeping count of the lines yourself. It's better to get the line number from the token, as I did. Your counter will generally be one token ahead, and when the parser looks ahead, it could be several tokens ahead.
If the token manager isn't keeping count of the lines correctly, then use your own count, but communicate it to the parser through an extra field in the Token class.
Generally it's a bad idea to compute anything in the token manager and then use it in the parser, unless it's information you store in the tokens.
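Once the node carries its line number, your visitor can just read it back; a sketch assuming the int field suggested above ("My jjt file visitor" stands in for whatever visitor interface JJTree generated for you):

public class SemanticCheckVisitor implements "My jjt file visitor" {
    public Object visit(VariableDeclaration node, Object data) {
        // set by jjtreeOpenNodeScope when the node was created
        System.out.println("declared at line " + node.getLineNumber());
        node.childrenAccept(this, data);
        return data;
    }
}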
The target class is:
class Example {
    public void m() {
        System.out.println("Hello" + 1);
    }
}
I want to get the full string of the MethodInvocation, "System.out.println("Hello" + 1)", for a regex check. How should I write this?
public class Rule extends BaseTreeVisitor implements JavaFileScanner {

    @Override
    public void visitMethodInvocation(MethodInvocationTree tree) {
        // get the string of the MethodInvocation
        // some regex check
        super.visitMethodInvocation(tree);
    }
}
I have written code inspection rules using Eclipse JDT and IntelliJ PSI, whose expression tree nodes have these attributes. I wonder why Sonar's only has the first and last tokens instead.
Thanks!
An old question, but I have a solution.
This works for any sort of tree.
@Override
public void visitMethodInvocation(MethodInvocationTree tree) {
    int firstLine = tree.firstToken().line();
    int lastLine = tree.lastToken().line();
    String rawText = getRelevantLines(firstLine, lastLine);
    // do your thing here with rawText
}

private String getRelevantLines(int startLine, int endLine) {
    StringBuilder builder = new StringBuilder();
    // token lines are 1-based while the list of file lines is 0-based,
    // hence the startLine - 1
    context.getFileLines().subList(startLine - 1, endLine).forEach(builder::append);
    return builder.toString();
}
If you want to refine further, you can also use firstToken().column(), or perhaps use the method name in your regex.
If you want more lines/a bigger scope, just use the parent of that tree: tree.parent().
This will also handle cases where the expression/params/etc span multiple lines.
There might be a better way... but I don't know of any other way. May update if I figure out something better.
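For reference, the context used in getRelevantLines above is the JavaFileScannerContext; assuming the usual sonar-java custom-rule skeleton (the rule class name here is hypothetical), it is captured in scanFile roughly like this:

public class MyRule extends BaseTreeVisitor implements JavaFileScanner {

    private JavaFileScannerContext context;

    @Override
    public void scanFile(JavaFileScannerContext context) {
        this.context = context;   // keep it for use inside the visit methods
        scan(context.getTree());  // walk this file's syntax tree
    }
}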
I have an enum defined in a class as:
public enum EnumSample {
SPACE,
NASA,
SPUTNIK;
}
In the class Test, I have a method with the following code snippet:
if (str.contains(<>)) {
Is it possible to search for all of the enum values with the contains method of String?
You can iterate over the .values() of the enum and apply .contains() for each of them.
For example:
for (EnumSample value : EnumSample.values()) {
    if (str.contains(value.name())) {
        // do your thing
    }
}
One problem with contains is that it finds parts of words - for example, it would find "NASA" in "NASAL DECONGESTANTS". If you would like your comparison to be fast, and look for specific words, not parts of words, use regex search instead.
The regex for your example would look like this:
\b(SPACE|NASA|SPUTNIK)\b
You can construct and use it like this:
static Pattern allEnumVals;

static {
    StringBuilder b = new StringBuilder("\\b(");
    boolean first = true;
    for (EnumSample e : EnumSample.values()) {
        if (!first) {
            b.append("|");
        } else {
            first = false;
        }
        b.append(e.name());
    }
    b.append(")\\b");
    allEnumVals = Pattern.compile(b.toString());
}

static boolean check(String str) {
    return allEnumVals.matcher(str).find();
}
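For example, with the enum above (the inputs are hypothetical):

check("launched by NASA in 1957"); // true: "NASA" matches as a whole word
check("NASAL DECONGESTANTS");      // false: "NASA" is only part of "NASAL"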
I'm trying to update some of my old Scala code to new APIs.
In one of the libraries I use, a case class has been converted to a simple POJO for compatibility reasons.
I was wondering if it is still possible somehow to use pattern matching for the Java class.
Imagine I have a simple Java class like:
public class A {
    private int i;

    public A(int i) {
        this.i = i;
    }

    public int getI() {
        return i;
    }
}
After compilation, I would like to use it in pattern matching somehow like:
class Main extends App {
  val a = ...
  a match {
    case _ @ A(i) =>
      println(i)
  }
}
For the code above, I obviously get an error: Main.scala:7: error: object A is not a case class constructor, nor does it have an unapply/unapplySeq method.
Is there any trick I could use here?
Thanks in advance!
It's a little late in the night here for subtlety, but
object `package` {
  val A = AX
}

object AX {
  def unapply(a: A): Option[Int] = Some(a.getI)
}

object Test extends App {
  Console println {
    new A(42) match {
      case A(i) => i
    }
  }
}
Write unapply yourself:
object A {
  def unapply(x: A) = Some(x.getI)
}
@som-snytt's answer is correct - but if you are doing this just for, e.g., pattern matching, then I prefer the more succinct approach:
import spray.httpx.{UnsuccessfulResponseException => UrUnsuccessfulResponseException}

object UnsuccessfulResponseException {
  def unapply(a: UrUnsuccessfulResponseException): Option[HttpResponse] =
    Some(a.response)
}

... match {
  case Failure(UnsuccessfulResponseException(r)) => r
  case ...
}
Ur is a pretentious way of saying "original", but it only takes two letters.