Converting Antlr syntax tree into useful objects - java

I'm currently pondering how best to take an AST generated using Antlr and convert it into useful objects which I can use in my program.
The purpose of my grammar (apart from learning) is to create an executable (runtime-interpreted) language.
For example, how would I take an attribute sub-tree and have a specific Attribute class instantiated? E.g.
The following code in my language:
Print(message:"Hello stackoverflow")
would produce the following AST:
My current line of thinking is that a factory class could read the tree and pull out the name (message), the type (STRING) and the value ("Hello stackoverflow"). Knowing the type, I could then instantiate the correct class (e.g. a StringAttribute class) and pass in the required attribute data: the name and value.
The same approach could be used for a definition factory: pull out the definition name (Print), instantiate the Print class, and then pass in the attributes generated by the attribute factory.
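The factory idea described above could be sketched roughly as follows. This is only an illustration under assumptions: `Node` is a hypothetical stand-in for whatever tree type the parser produces, the subtree shape `(ATTRIBUTE name (TYPE value))` is assumed because the AST itself is not shown here, and `IntAttribute` and `AttributeFactory` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an AST node; in a real project this role is
// played by the tree type produced by the parser.
final class Node {
    final String text;
    final List<Node> children = new ArrayList<>();

    Node(String text, Node... kids) {
        this.text = text;
        for (Node k : kids) children.add(k);
    }
}

// Hypothetical runtime objects that the factory produces.
abstract class Attribute {
    final String name;
    Attribute(String name) { this.name = name; }
    abstract Object value();
}

final class StringAttribute extends Attribute {
    private final String value;
    StringAttribute(String name, String value) { super(name); this.value = value; }
    @Override Object value() { return value; }
}

final class IntAttribute extends Attribute {
    private final int value;
    IntAttribute(String name, int value) { super(name); this.value = value; }
    @Override Object value() { return value; }
}

final class AttributeFactory {
    // Assumes a subtree shaped like (ATTRIBUTE name (TYPE value)).
    static Attribute create(Node attr) {
        String name = attr.children.get(0).text;
        Node typed = attr.children.get(1);
        String raw = typed.children.get(0).text;
        // The type node decides which concrete class gets instantiated.
        switch (typed.text) {
            case "STRING": return new StringAttribute(name, raw);
            case "INT":    return new IntAttribute(name, Integer.parseInt(raw));
            default: throw new IllegalArgumentException("Unknown attribute type: " + typed.text);
        }
    }
}
```

A definition factory could follow the same pattern: switch on the definition name and hand the attributes built by `AttributeFactory.create` to the constructor of the matching class.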
Things do get a bit more complicated with a more complicated program:
Program(args:[1,2,3,4,5])
{
    If(isTrue:IsInArray(array:{Program.args} value:5))
    {
        Then {
            Print(message:"5 is in the array")
        } Else {
            Print(message:"More complex " + "message")
        }
    }
}
ANY/ALL help or thoughts are very welcome. Many thanks.
Previous related questions by me (Could be useful):
How do I make a tree parser
Solving LL recursion problem
Antlr3 conditional tree rewrites

I recommend reading chapter 9, Building High-Level Interpreters, from Language Implementation Patterns by Terence Parr.
EDIT
Okay, to get you through the time waiting for that book, here's what you're (at least) going to need:
a global memory space;
function spaces (each function space will also have a (local) memory space);
and classes that spring to mind (in UML-ish style):
class Interpreter
    global : MemorySpace
    functions : Stack<Function>
    ...
class MemorySpace
    vars : Map<String, Object>
    ...
class Function
    local : MemorySpace
    execute() : void
    ...
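In Java, the classes outlined above might look something like the following minimal sketch. The `call` and `lookup` methods, and the `Interpreter` parameter on `execute`, are assumptions added for illustration; they are not part of the outline.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// A named-variable store; used for both the global and the local spaces.
class MemorySpace {
    private final Map<String, Object> vars = new HashMap<>();

    void put(String name, Object value) { vars.put(name, value); }
    Object get(String name) { return vars.get(name); }
    boolean contains(String name) { return vars.containsKey(name); }
}

// Each function owns a local memory space.
abstract class Function {
    final MemorySpace local = new MemorySpace();

    // Assumption: the interpreter is passed in so the body can resolve names.
    abstract void execute(Interpreter interpreter);
}

class Interpreter {
    final MemorySpace global = new MemorySpace();
    private final Deque<Function> functions = new ArrayDeque<>();

    // Push the function for the duration of its execution, then pop it.
    void call(Function f) {
        functions.push(f);
        try {
            f.execute(this);
        } finally {
            functions.pop();
        }
    }

    // Look in the innermost function's locals first, then in the globals.
    Object lookup(String name) {
        Function current = functions.peek();
        if (current != null && current.local.contains(name)) {
            return current.local.get(name);
        }
        return global.get(name);
    }
}
```

With this layout, a built-in such as Print would be a `Function` subclass whose `execute` reads its message through `lookup`.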

Here's one with ANTLR -> LLVM:

Once you have the AST, all you need is an iterator to walk the tree and the template to emit the objects you want.

This tutorial is based on Flex and Bison, but at the end the author details how he converts his AST to LLVM assembly code, so it might be helpful.

Related

Constructing AST in ANTLR version4

I am developing a compiler and have already implemented the lexer, parser and semantic analyzer (using a listener and visitor) using ANTLR4. For code generation I am planning to generate LLVM IR using StringTemplate (ST).
To do so, I am thinking of first constructing an AST and then generating the code.
My question is: do I need to construct an AST, or can I use the parse tree?
If I need an AST, I have not been able to find any examples of manually constructing one using visitors or listeners. Even a small grammar example would be very helpful.
Thank you.
No, there is no fundamental need to construct an AST. In the simplest case, you can walk the parse-tree and output the IR, directly or using ST.
Where transformations are required for output as IR, the two basic approaches are to (1) analyze and annotate the parse-tree describing the necessary changes; or (2) walk the parse-tree, constructing a separate AST, and then walk and transform the AST.
For the annotation strategy, extend ParseTreeProperty to create property classes specific to each context node type. See the comment in that class for how to use it.
The AST strategy is not discouraged -- it was the primary strategy used in Antlr3 -- but is essentially unsupported in Antlr4. As for why Antlr4 favors the annotation strategy, see the last few paragraphs of this answer.
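To illustrate the annotation strategy without depending on the ANTLR runtime, here is a small stand-in sketch. `PNode`, `NodeProperty` and `Annotator` are hypothetical names; `NodeProperty` imitates the `get`/`put` usage of ParseTreeProperty by keying values on node identity and keeping them outside the tree. In real code you would use the ANTLR4 parse-tree contexts and ParseTreeProperty itself.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-in for a parse-tree node.
final class PNode {
    final String text;
    final PNode[] children;

    PNode(String text, PNode... children) {
        this.text = text;
        this.children = children;
    }
}

// Stand-in for an annotation map: values are keyed by node identity,
// so the tree classes themselves stay untouched.
final class NodeProperty<V> {
    private final Map<PNode, V> annotations = new IdentityHashMap<>();

    void put(PNode node, V value) { annotations.put(node, value); }
    V get(PNode node) { return annotations.get(node); }
}

final class Annotator {
    // Bottom-up pass: annotate every node with the integer value of its subtree.
    static void evaluate(PNode n, NodeProperty<Integer> values) {
        for (PNode c : n.children) {
            evaluate(c, values);
        }
        if (n.text.equals("+")) {
            values.put(n, values.get(n.children[0]) + values.get(n.children[1]));
        } else {
            values.put(n, Integer.parseInt(n.text));
        }
    }
}
```

A later pass (for example the one that emits IR) can then read the annotations back with `get` instead of recomputing them or rebuilding the tree.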

JavaParser - Get typed AST

I want to get a typed AST from JavaParser or another parser of Java code. That is, I want to be able to get the type of a specific variable, or the parameter and return types of a method. I googled a lot about this feature of JavaParser but didn't find anything; I assume this is because JavaParser produces an untyped AST. So please advise me how I can get this. But please don't tell me to parse all the code and build my own set of types: I tried, and it is very hard, I think harder than writing my own AST parser.
I am a JavaParser contributor and I just did that in Clojure, on top of JavaParser. I explain how to implement it in a post: How to build a symbol solver for Java, in Clojure
JavaParser, like any other parser, just builds an Abstract Syntax Tree (AST) of the code; you then have to resolve symbols to understand which references are associated with which declarations.
Suppose you have in your code something like:
a = 1;
Now, to understand the type of a you need to find where it is declared. It could be a reference to a parameter, to a local variable, to a field declared in the current class, or to an inherited field. If it is an inherited field, you have to find the code (or the bytecode) of the parent class and look there for the declaration of a. A parser does not do that; a parser just takes a string (or a file) and builds an AST.
Building a symbol resolver is not rocket science, but it requires a bit of work. The solution I described in the post linked above is available on GitHub, and I would be glad to help you use it if you want (even though it is written in Clojure, you can call it from Java quite easily).
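The lookup order described above (local variable, then parameter, then field, then inherited field) can be sketched as a chain of scopes. This is a deliberately simplified illustration, not the linked solver: `Scope` is a hypothetical class, types are plain strings, and an inherited field is modeled as just another outer scope.

```java
import java.util.HashMap;
import java.util.Map;

final class Scope {
    private final Scope parent;
    private final Map<String, String> declaredTypes = new HashMap<>();

    Scope(Scope parent) { this.parent = parent; }

    void declare(String name, String type) { declaredTypes.put(name, type); }

    // Walks outwards and returns the type of the nearest declaration, or null.
    String resolve(String name) {
        for (Scope s = this; s != null; s = s.parent) {
            String type = s.declaredTypes.get(name);
            if (type != null) {
                return type;
            }
        }
        return null;
    }
}
```

For `a = 1;`, a resolver would build such a chain while walking the AST (superclass fields, class fields, method parameters, block locals) and call `resolve("a")` on the innermost scope.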

How to get the variables name declared within a method in java

How can I get the names of the variables declared within a method in a Java class?
For example:
import java.lang.reflect.Method;

public class A {
    String x;

    public void xyz() {
        int i;
        String z = null;
        // some code here
    }

    public static void main(String[] args) {
        Method[] methods = A.class.getDeclaredMethods();
        for (Method q : methods) {
            // some code here to get the variables declared inside method (i.e. q)
        }
    }
}
How can I do that?
Thanks in advance..
There is no simple way to do this.
If those were fields, you could get their names using reflection. However, local variable and parameter names are not loaded into the JVM. So you would need to resort to reading the "A.class" file and extracting the debug information for that method. And the bad news is that if the class wasn't compiled with debug information, then even that wouldn't work.
There are libraries around for reading ".class" files for various purposes, but I can't give a specific recommendation.
But the $64,000 question is "But why ...?". What is the point of listing the local variable names for a method from Java? Can't you just look at the source code? Can't you dump the ".class" file with "javap" or decompile it with some 3rd party decompiler?
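For contrast, here is a minimal sketch of what reflection does give you, as mentioned above: field and method names, but nothing about the locals inside a method. `Sample` and `MemberLister` are hypothetical names used only for this illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class Sample {
    String x;

    public void xyz() {
        int i = 0;          // local variables: not visible through reflection
        String z = null;
    }
}

final class MemberLister {
    // Lists "Type name" for every non-synthetic declared field.
    static List<String> fieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (!f.isSynthetic()) {
                names.add(f.getType().getSimpleName() + " " + f.getName());
            }
        }
        return names;
    }

    // Lists the names of every non-synthetic declared method.
    static List<String> methodNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (!m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }
}
```

Calling `MemberLister.fieldNames(Sample.class)` reports the field `x`, while `i` and `z` never show up, which is why the class file's debug information or a source-level parser is needed for locals.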
I thought that for big programs it would be useful, for understanding and analyzing them, to be able to get at the variables, their types, the method names, their parameter lists and so on.
I think you just need a decent IDE ...
To paraphrase another answer, there's no simple way to do this with reflection.
There is a way to do it. You need a full Java source code parser and name/type resolver ("symbol tables").
The Java compiler offers internal APIs to get at that information. Eclipse JDT offers something similar. Our DMS Software Reengineering Toolkit offers a full parser with this information easily accessible, and considerable additional help to build analyzers and/or code generators that take advantage of this extra information. (You can see this information extracted by DMS in the example Java Source Code Browser at my site, see bio).

Any high-level byte-code editor?

Suppose I have the following in Scala:
object Foo {
    var functions: List[String => String] = Nil // can be any type to any type.
    def addFunc(f: String => String) = functions = f :: functions
}
At runtime, I am given Foo with some functions added. I now want to construct a new .class file implementing something like the following in Scala:
object MyObject {
def process1(s:String) = // call Foo.functions(1)
}
I then want to save MyObject in bytecode that can be executed later on even when Foo is not there.
The above is just an example to show what I want to do. I am given the names MyObject and process1, and I have to generate an executable file MyObject.class. The source of MyObject is not needed (it could well have been Java source).
So, at a high level, we need to take a memory "snapshot" of Foo.functions(1), convert that snapshot into bytecode to store, and generate the bytecode of MyObject using this.
All the bytecode engineering libraries I found are too low-level, so I was wondering if there is a higher level library that lets me deal with abstract objects such as functions etc.
Have you looked at the Tree model of ASM? I've only used the Event model before, but the Tree sounds like just what you're looking for. You'll find an overview in section 1.2.2 of the ASM user guide (a PDF--I don't think there's an HTML version, or I'd link that).
I would also recommend the ASM framework. There is a paper from AOSD'07 about implementing common bytecode transformation patterns with ASM. The sections "Merging Two Classes into One" and "Inline Method" describe bytecode transformations very close to yours.

How to serialise an antlr3 AST

I have just started using antlr3 and am trying to serialize the AST output of a .g grammar.
Thanks,
Lezan
As Vladimir pointed out, you can use a custom AST node class that has serialization capabilities built in. You could also use a tree adaptor to create the types of nodes you need.
If you only need serialization, and not de-serialization, you could probably just do:
ast.toStringTree()
The above will give you a LISP-like tree structure. An easy way to do serialization would be to use that in combination with a custom AST node class with an overridden toString(). Since toStringTree() uses each node's toString() method, it'll essentially serialize whatever you put in toString(). Make its output sufficient and useful and you should be set.
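To show the idea without pulling in the ANTLR runtime, here is a stand-in sketch. `SNode` is a hypothetical node class whose `toStringTree()` imitates the LISP-like layout (a leaf prints as its `toString()`, an inner node as `(node child child ...)`); with ANTLR 3 you would override `toString()` on your own tree class instead and let the library build the tree string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a custom AST node class.
class SNode {
    final String type;
    final String text;
    final List<SNode> children = new ArrayList<>();

    SNode(String type, String text, SNode... kids) {
        this.type = type;
        this.text = text;
        for (SNode k : kids) children.add(k);
    }

    // Everything that should survive serialization goes into toString().
    @Override
    public String toString() {
        return type + ":" + text;
    }

    // Leaf: just the node text. Inner node: (node child child ...).
    String toStringTree() {
        if (children.isEmpty()) {
            return toString();
        }
        StringBuilder sb = new StringBuilder("(").append(this);
        for (SNode c : children) {
            sb.append(' ').append(c.toStringTree());
        }
        return sb.append(')').toString();
    }
}
```

Note that this simple format is one-way: node text containing spaces or parentheses would need escaping before the output could be parsed back reliably.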
CommonTree nodes produced by the parser are not Serializable.
I'd suggest serializing the tokens and using a secondary grammar to parse the (deserialized) stream of tokens later. In the book (The Definitive ANTLR Reference), in the chapter A Quick Tour for the Impatient, Terence Parr gives exactly this scenario, without serialization though, but serialization is trivial for tokens as they are just text.
My understanding is also that you can replace the Tree class with your own:
options {
ASTLabelType = MyOwnTreeClass;
}
But I haven't tried it.
