Understanding trees in ANTLR - java

I'm trying to use Antlr for some text IDE-like functions -- specifically parsing a file to identify the points for code folding, and for applying syntax highlighting.
First question - is Antlr suitable for this requirement, or is it overkill? This could be achieved using regex and/or a hand-rolled parser ... but it seems that Antlr is out there to do this work for me.
I've had a look through the ... and the excellent tutorial resource here.
I've managed to get a Java grammar built (using the standard grammar), and get everything parsed neatly into a tree. However, I'd have expected to see elements nested within the tree. In actual fact, everything is a child of the very top element.
E.g., given:
package com.example
public class Foo {
String myString = "Hello World"
// etc
}
I'd have expected the tree node for Foo to be a child of the node for the package declaration. Likewise, myString would be a child of Foo.
Instead, I'm finding that Foo and myString (and everything else for that matter) are all children of package.
Here's the relevant excerpt doing the parsing:
public void init() throws Exception {
    CharStream c = new ANTLRFileStream(
            "src/com/inversion/parser/antlr/Test.code");
    Lexer lexer = new JavaLexer(c);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    JavaParser parser = new JavaParser(tokens);
    parser.setTreeAdaptor(adaptor);
    compilationUnit_return result = parser.compilationUnit();
}
static final TreeAdaptor adaptor = new CommonTreeAdaptor() {
    public Object create(Token payload) {
        if (payload != null) {
            System.out.println("Create " + JavaParser.tokenNames[payload.getType()] + ": L" + payload.getLine() + ":C" + payload.getCharPositionInLine() + " " + payload.getText());
        }
        return new CommonTree(payload);
    }
};
Examining result.getTree() returns a CommonTree instance, whose children are the result of the parsing.
Expected value (perhaps incorrectly)
package com.example (4 tokens)
|
+-- public class Foo (3 tokens)
    |
    +--- String myString = "Hello World" (4 tokens)
    +--- Comment "// etc"
(or something similar)
Actual value (All values are children of the root node of result.getTree() )
package
com
.
example
public
class
Foo
String
myString
=
"Hello World"
Is my understanding of how this should be working correct?
I'm a complete noob at Antlr so far, and I'm finding the learning curve quite steep.

The Java-6 grammar at the top of the file sharing section on antlr.org does not include tree building. You'll need to do two things. First, tell ANTLR you want to build an AST:
options {
output=AST;
}
Second, you need to tell it what the tree should look like by either using the tree operators or by using the rewrite rules. See the documentation on tree construction. I usually end up doing a combination of both.

To build a tree, you should set output=AST (abstract syntax tree).
As far as I know, in ANTLR only one token can be the root of a tree, so you can't get exactly what you're looking for, but you can get close.
Check out:
http://www.antlr.org/wiki/display/ANTLR3/Tree+construction
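To make the single-root point concrete, here is a minimal sketch in plain Java of the shape such a tree could take for the question's input. It does not use the ANTLR API: AstNode is a hypothetical stand-in for a tree node, and UNIT is an assumed imaginary root token chosen only for illustration. Each subtree has exactly one token at its root, so "public class Foo" cannot be a root as a whole, but "class" can be, with Foo and public as children.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an AST node; this is NOT ANTLR's CommonTree.
final class AstNode {
    final String token;                               // the single token rooting this subtree
    final List<AstNode> children = new ArrayList<>();

    AstNode(String token) { this.token = token; }

    AstNode add(AstNode child) { children.add(child); return this; }

    // LISP-style rendering: a leaf prints as its token, anything else as (root child ...).
    String toStringTree() {
        if (children.isEmpty()) return token;
        StringBuilder sb = new StringBuilder("(").append(token);
        for (AstNode c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }
}

final class AstShapeDemo {
    // Hand-builds a nested tree for the question's sample source.
    static AstNode build() {
        AstNode pkg = new AstNode("package")
                .add(new AstNode(".").add(new AstNode("com")).add(new AstNode("example")));
        AstNode field = new AstNode("=")
                .add(new AstNode("String"))
                .add(new AstNode("myString"))
                .add(new AstNode("\"Hello World\""));
        AstNode cls = new AstNode("class")
                .add(new AstNode("Foo"))
                .add(new AstNode("public"))
                .add(field);
        // "UNIT" is an assumed imaginary token that holds the top-level pieces together.
        return new AstNode("UNIT").add(pkg).add(cls);
    }

    public static void main(String[] args) {
        System.out.println(build().toStringTree());
    }
}
```

Running main prints (UNIT (package (. com example)) (class Foo public (= String myString "Hello World"))). In a real grammar, the tree operators or rewrite rules mentioned above would decide which token becomes the root of each subtree.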

Related

How to get class field name using ANTLR4?

I have java9 source code and I need to extract fields of some classes using antlr4. This is my listener:
private static class FieldListener extends Java9BaseListener {
    @Override
    public void enterFieldDeclaration(Java9Parser.FieldDeclarationContext ctx) {
        for (ParseTree subTree : ctx.children) {
            System.out.println(subTree.getText());
        }
        String fieldName = ??????;
    }
}
And this is the code
//Now, let's do some testing. First, we construct the lexer:
Java9Lexer java9Lexer = new Java9Lexer(CharStreams.fromString(classContent));
//Then, we instantiate the parser:
CommonTokenStream tokens = new CommonTokenStream(java9Lexer);
Java9Parser parser = new Java9Parser(tokens);
ParseTree tree = parser.compilationUnit();
//And then, the walker and the listener:
ParseTreeWalker walker = new ParseTreeWalker();
walker.walk(new FieldListener(), tree);
However, I can only iterate all field tokens but can't get specific token - class fieldName. Could anyone say how to get it?
There's no magic bullet here. Static semantic analysis is not easy because you have to run around a tree, but it is straightforward. You're going to have to extract the names of the fields by a tree walk from the variableDeclaratorList. If you print out a parse tree for some input, you can see that a variableDeclaratorList contains a list of variableDeclarator nodes, each variableDeclarator contains a variableDeclaratorId, each variableDeclaratorId contains an identifier with optional dims, and each identifier is a subtree of tokens: a JavaLetter followed by more JavaLetterOrDigit.
How should you compute the attributes of the parse tree? That's up to you. Usually, people compute the synthesized attributes with the ANTLR listener framework. You can define a mapping from each node to a data structure containing its attributes, so that when you want the attributes for nodes lower in the tree, you can do a lookup (in this case, maybe resulting in a list of strings for the variableDeclaratorList?).
You can access a particular node using the accessor function for that child, so you don't have to compute attributes from immediate descendants. Look at the generated parser code to see what is available, e.g., FieldDeclarationContext.variableDeclaratorList(). Or, if you like, you could call a function to derive the attributes for this specific variableDeclaratorList rather than use the ANTLR listener framework.
The ANTLR listener walk cannot be short-circuited. So if you write your own function to walk the variableDeclaratorList, you may prefer ParseTreeVisitor.visit() over ParseTreeWalker.walk() with a listener.
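The node-to-attribute mapping described above can be sketched in plain Java without the generated parser. Everything here is an assumption made for illustration: PNode is a hypothetical stand-in for the generated context classes, the tree for "int a, b[];" is built by hand, and the rule names are simply the ones mentioned in this answer. The walk is post-order, which is the point at which a listener's exit callbacks would fire.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parse-tree node; real code would use the
// generated context classes such as Java9Parser.FieldDeclarationContext.
final class PNode {
    final String rule;          // rule name, or "TOKEN" for a leaf
    final String text;          // token text for leaves, null for rule nodes
    final List<PNode> children;

    private PNode(String rule, String text, List<PNode> children) {
        this.rule = rule;
        this.text = text;
        this.children = children;
    }

    static PNode of(String ruleName, PNode... kids) {
        return new PNode(ruleName, null, Arrays.asList(kids));
    }

    static PNode leaf(String text) {
        return new PNode("TOKEN", text, Collections.<PNode>emptyList());
    }

    // Concatenated text of all leaves below this node.
    String getText() {
        if (text != null) return text;
        StringBuilder sb = new StringBuilder();
        for (PNode c : children) sb.append(c.getText());
        return sb.toString();
    }
}

final class FieldNameCollector {
    // node -> synthesized attribute: the names declared at or below that node
    private final Map<PNode, List<String>> names = new IdentityHashMap<>();

    // Post-order walk: every child is finished before its parent is handled.
    void walk(PNode node) {
        for (PNode c : node.children) walk(c);
        List<String> found = new ArrayList<>();
        if (node.rule.equals("variableDeclaratorId")) {
            // The first child is the identifier; optional dims are skipped.
            found.add(node.children.get(0).getText());
        } else {
            // Synthesize this node's attribute from its children's attributes.
            for (PNode c : node.children) found.addAll(names.get(c));
        }
        names.put(node, found);
    }

    List<String> namesOf(PNode node) { return names.get(node); }
}

final class FieldNameDemo {
    // Hand-built tree for the field declaration: int a, b[];
    static List<String> declaredNames() {
        PNode list = PNode.of("variableDeclaratorList",
            PNode.of("variableDeclarator",
                PNode.of("variableDeclaratorId",
                    PNode.of("identifier", PNode.leaf("a")))),
            PNode.leaf(","),
            PNode.of("variableDeclarator",
                PNode.of("variableDeclaratorId",
                    PNode.of("identifier", PNode.leaf("b")),
                    PNode.of("dims", PNode.leaf("["), PNode.leaf("]")))));
        PNode field = PNode.of("fieldDeclaration",
            PNode.of("unannType", PNode.leaf("int")), list, PNode.leaf(";"));

        FieldNameCollector collector = new FieldNameCollector();
        collector.walk(field);
        return collector.namesOf(list);   // lookup of the synthesized attribute
    }

    public static void main(String[] args) {
        System.out.println(declaredNames());
    }
}
```

Running main prints [a, b]. With the real generated classes, the same idea would go through the child accessors on the context objects instead of the hand-built PNode tree.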

Clojure RT/Compiler: How to Iterate through forms?

I am working on a Java project that has some Clojure involved. I know how to compile and run Clojure code:
public static void main(String[] args) throws Exception {
    RT.init();
    runCode();
}
public static Object runCode() {
    String str = "(ns my-ns)" +
            "(defn add [a b] (+ a b))" +
            "(println (add 1 2))";
    Compiler.load(new StringReader(str));
    /* I know how to invoke it: */
    Var foo = RT.var("my-ns", "add");
    return foo.invoke(1,2);
}
What would be very useful at this point is a way to iterate over forms in Java and, in some sense, "analyze" the compiler output. The basic things I want to know are:
What is the text source of a form?
What function is being called in a form?
What arguments are being passed to the function (forms are OK)?
How can I do this on top-level forms, or drill in as needed?
Is there a way to do this using the Clojure compiler or runtime (or other Java classes in Clojure)? I see compiler methods such as analyze, for example:
Expr target = analyze(C.EXPRESSION, RT.second(form));
It's not clear to me yet how form was constructed, though, and there is no Javadoc :-). Do I need to go through the compiler source and figure out how it works?

How to invoke Xtend code from Java?

I have a code generator, which takes a syntax tree and converts it into a source file (text).
Basically, it traverses through all nodes of the tree, maps the node to text and appends the resulting texts to a StringBuilder.
Now I want the node to text mappers to be implemented using Xtend like this:
public class NodeXMapper
{
    private XtendRunner xtendRunner = ...;
    public String map(final NodeX aNode)
    {
        return xtendRunner.runScript("def String map(NodeX aNode) {
            ''' «aNode.fieldX» - «aNode.fieldY» '''
        }", aNode);
    }
}
xtendRunner.runScript(String aScript, final Object... aParams) is a method, which passes the parameters aParams to Xtend script aScript and returns the result.
How can I implement that method?
Update 1: Here I found this piece of code, which seems to run Xtend code in Java:
// setup
XtendFacade f = XtendFacade.create("my::path::MyExtensionFile");
// use
f.call("sayHello",new Object[]{"World"});
But I can't find the XtendFacade class in the Type Hierarchy view of Eclipse.
The interpreter you found was for the old Xtend1 language, which is not what you are looking for.
The new Xtend you are referring to is compiled, so there is no interpreter.
However, you could build an interpreted expression language using Xbase. See the documentation and Github for an example on how to do that. Then you could run the interpreter of your expression language from Java.

Way to view a parsed file output?

I am Vinod, and I am interested in using ANTLR v3.3 for C parser generation in a Java project and in rendering the parsed tree in some viewable form. I got help writing the grammar from this tutorial.
ANTLR generates lexer and parser files for the grammar, but I don't quite understand how the output of these generated files can be viewed. For example, in a few examples from the above article, the author generated output using ASTFrame. I found only an interpreter option in ANTLRWorks, which shows a tree but gives an error when the grammar contains many predicates.
Any good reference book or article would be really helpful.
There's only one book you need:
The Definitive ANTLR Reference: Building Domain-Specific Languages.
After that, many more excellent books exist (w.r.t. DSL creation), but this is the book for getting started with ANTLR.
As you saw, ANTLRWorks will print both parse trees and the AST but won't work with predicates and the C target. While not a nice picture like ANTLRWorks, this will print a text version of the AST when you pass it the root of your tree.
void printNodes(pANTLR3_BASE_TREE thisNode, int level)
{
    ANTLR3_UINT32 numChildren = thisNode->getChildCount(thisNode);
    //printf("Child count %d\n",numChildren);
    pANTLR3_BASE_TREE loopNode;
    for(int i=0;i<numChildren;i++)
    {
        //Need to cast since these can hold anything
        loopNode = (pANTLR3_BASE_TREE)thisNode->getChild(thisNode,i);
        //Print this node
        pANTLR3_STRING thisText = loopNode->getText(loopNode);
        for(int j=0;j<level;j++)
            printf(" ");
        printf("%s\n",thisText->chars);
        //If this node has a child
        if(loopNode->getChildCount(loopNode) > 0)
            printNodes(loopNode, level + 2);
    }
}

Get declared methods in order they appear in source code

The situation seems to be abnormal, but I was asked to build serializer that will parse an object into string by concatenating results of "get" methods. The values should appear in the same order as their "get" equivalent is declared in source code file.
So, for example, we have
class testBean1 {
    public String getValue1() {
        return "value1";
    }
    public String getValue2() {
        return "value2";
    }
}
The result should be:
"value1 - value2"
And not
"value2 - value1"
It can't be done with Class object according to the documentation. But I wonder if I can find this information in "*.class" file or is it lost? If such data exists, maybe, someone knows a ready to use tool for that purpose? If such information can't be found, please, suggest the most professional way of achieving the goal. I thought about adding some kind of custom annotations to the getters of the class that should be serialized.
If you want that you have to parse the source code, not the byte code.
There are a number of libraries that parse a source file into a node tree, my favorite is the javaparser (hosted at code.google.com), which, in a slightly modified version, is also used by spring roo.
On the usage page you can find some samples. Basically, you will want to use a Visitor that visits MethodDeclaration nodes.
Although reflection no longer (as of Java 7, I think) gives you the methods in the order in which they appear in the source code, the class file still appears (as of Java 8) to contain the methods in source order.
So, you can parse the class file looking for method names and then sort the methods based on the file offset in which each method was found.
If you want to do it in a less hacky way you can use Javassist, which will give you the line number of each declared method, so you can sort methods by line number.
I don't think the information is retained.
JAXB, for example, has @XmlType(propOrder = {"field1", "field2"}), where you define the order of the fields when they are serialized to XML. You can implement something similar.
Edit: This works only on concrete classes (the class to inspect must have its own .class file). I changed the code below to reflect this. The limitation will remain until I dive deeper into the ClassFileAnalyzer library and work with classes directly instead of reading them from a temporary file.
The following approach works for me:
Download and import the ClassFileAnalyzer library.
Add the following two static methods (attention: getClassDump() needs a small modification to write the class file out to a temporary file; I removed my code for that here because it is very specific to my setup):
public static String getClassDump(Class<?> c) throws Exception {
    String classFileName = c.getSimpleName() + ".class";
    URL resource = c.getResource(classFileName);
    if (resource == null) {
        throw new RuntimeException("Works only for concrete classes!");
    }
    String absolutePath = ...; // write to temp file and get absolute path
    ClassFile classFile = new ClassFile(absolutePath);
    classFile.parse();
    Info infos = new Info(classFile, absolutePath);
    StringBuffer infoBuffer = infos.getInfos();
    return infoBuffer.toString();
}
public static <S extends List<Method>> S sortMethodsBySourceOrder(Class<?> c, S methods) throws Exception {
    String classDump = getClassDump(c);
    int index = classDump.indexOf("constant_pool_count:");
    final String dump = classDump.substring(index);
    // Assumption: lineSeparator was defined elsewhere in the original class;
    // the platform line separator is used here.
    final String lineSeparator = System.getProperty("line.separator");
    Collections.sort(methods, new Comparator<Method>() {
        public int compare(Method o1, Method o2) {
            // Order by where each method name first appears in the dump.
            Integer i1 = Integer.valueOf(dump.indexOf(" " + o1.getName() + lineSeparator));
            Integer i2 = Integer.valueOf(dump.indexOf(" " + o2.getName() + lineSeparator));
            return i1.compareTo(i2);
        }
    });
    return methods;
}
Now you can call the sortMethodsBySourceOrder with any List of methods (because sorting arrays is not very comfortable) and you will get the list back sorted.
It works by looking at the constant pool in the class dump, which the library produces.
Greetz,
GHad
Write your custom annotation to store ordering data, then use Method.getAnnotation(Class annotationClass)
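A minimal sketch of that annotation approach, using only the standard reflection API. The names SerialOrder, TestBean1 and OrderedSerializer are hypothetical and chosen for illustration; the " - " separator follows the expected output in the question. The annotation needs RUNTIME retention, otherwise Method.getAnnotation() would not see it.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical annotation carrying the position of a getter in the output.
@Retention(RetentionPolicy.RUNTIME)   // must be visible to reflection at run time
@Target(ElementType.METHOD)
@interface SerialOrder {
    int value();
}

// Bean mirroring the one in the question, with explicit ordering added.
class TestBean1 {
    @SerialOrder(1)
    public String getValue1() { return "value1"; }

    @SerialOrder(2)
    public String getValue2() { return "value2"; }
}

final class OrderedSerializer {
    // Joins the results of all annotated no-arg methods, sorted by @SerialOrder.
    static String serialize(Object bean) {
        List<Method> getters = new ArrayList<>();
        for (Method m : bean.getClass().getDeclaredMethods()) {
            if (m.isAnnotationPresent(SerialOrder.class) && m.getParameterCount() == 0) {
                getters.add(m);
            }
        }
        // The order returned by reflection is unspecified, so sort explicitly.
        getters.sort(Comparator.comparingInt(
                (Method m) -> m.getAnnotation(SerialOrder.class).value()));

        StringBuilder sb = new StringBuilder();
        try {
            for (Method m : getters) {
                if (sb.length() > 0) sb.append(" - ");
                sb.append(m.invoke(bean));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not invoke getter", e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(serialize(new TestBean1()));
    }
}
```

Running main prints value1 - value2. The cost of this design is that the order has to be maintained by hand, but it does not depend on how any particular compiler or JVM lays out the class.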
