How to serialise an antlr3 AST - java

I have just started using antlr3 and am trying to serialize the AST output of a .g grammar.
Thanks,
Lezan

As Vladimir pointed out, you can use a custom AST node class that has serialization capabilities built in. You could also use a tree adaptor to create the types of nodes you need.
If you only need serialization, and not de-serialization, you could probably just do:
ast.toStringTree()
The above will give you a LISP-like tree structure. An easy way to serialize would be to combine it with a custom AST node class that overrides toString(). Since toStringTree() uses each node's toString() method, it will essentially serialize whatever you put in toString(). Make that output sufficiently informative and you should be set.
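To make the idea concrete, here is a minimal sketch of how an overridden toString() feeds into a LISP-like tree dump. SketchNode is a stand-in written for this example, not ANTLR's CommonTree, and the "TYPE:text" format is just an assumption about what you might want to keep:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for an ANTLR 3 tree node; NOT the real CommonTree class.
class SketchNode {
    private final String type;   // e.g. "PLUS", "INT"
    private final String text;   // the token text
    private final List<SketchNode> children = new ArrayList<>();

    SketchNode(String type, String text) {
        this.type = type;
        this.text = text;
    }

    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }

    // Whatever toString() returns is what ends up in the serialized tree.
    @Override
    public String toString() {
        return type + ":" + text;
    }

    // Leaf -> "node", parent -> "(node child child ...)".
    String toStringTree() {
        if (children.isEmpty()) {
            return toString();
        }
        StringBuilder sb = new StringBuilder("(").append(toString());
        for (SketchNode child : children) {
            sb.append(' ').append(child.toStringTree());
        }
        return sb.append(')').toString();
    }
}

class ToStringTreeDemo {
    static String demo() {
        SketchNode plus = new SketchNode("PLUS", "+")
                .add(new SketchNode("INT", "1"))
                .add(new SketchNode("INT", "2"));
        return plus.toStringTree();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints (PLUS:+ INT:1 INT:2)
    }
}
```

With a real ANTLR tree you would put the toString() override in your own node class and call toStringTree() on the root.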

CommonTree nodes produced by the parser are not Serializable.
I'd suggest serializing the tokens and using a secondary grammar to parse the (deserialized) stream of tokens later. In The Definitive ANTLR Reference, in the chapter "A Quick Tour for the Impatient", Terence Parr gives exactly this scenario, though without serialization; serialization is trivial for tokens, as they are just text.
My understanding is also that you can replace the tree class with your own:
options {
ASTLabelType = MyOwnTreeClass;
}
But I haven't tried it.
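The token-serialization suggestion above could look roughly like this. It is only a sketch: SketchToken and TokenStreamIO are hypothetical names invented here, and a real setup would store whatever fields your lexer's tokens carry (line, column, channel and so on):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a lexer token: a type code plus the matched text.
class SketchToken implements Serializable {
    private static final long serialVersionUID = 1L;
    final int type;
    final String text;

    SketchToken(int type, String text) {
        this.type = type;
        this.text = text;
    }
}

class TokenStreamIO {
    // Write the whole token list using standard Java serialization.
    static byte[] write(List<SketchToken> tokens) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(tokens));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read the list back; a second parser could then consume these tokens.
    @SuppressWarnings("unchecked")
    static List<SketchToken> read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (List<SketchToken>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<SketchToken> tokens = new ArrayList<>();
        tokens.add(new SketchToken(1, "x"));
        tokens.add(new SketchToken(2, "="));
        tokens.add(new SketchToken(3, "42"));
        for (SketchToken t : read(write(tokens))) {
            System.out.println(t.type + " " + t.text);
        }
    }
}
```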

Related

Constructing AST in ANTLR version4

I am developing a compiler and have already implemented the lexer, parser and semantic analyzer (using listeners and visitors) with ANTLR4. For code generation I am planning to generate LLVM IR using StringTemplate (ST).
To do so I am thinking of first constructing an AST and then generating the code.
My question is: do I need to construct an AST, or can I use the parse tree?
If I need to use AST, I am not able to find any examples of manually constructing AST using visitors or listeners. Even a small grammar example will be very helpful.
Thank you.
No, there is no fundamental need to construct an AST. In the simplest case, you can walk the parse-tree and output the IR, directly or using ST.
Where transformations are required for output as IR, the two basic approaches are to (1) analyze and annotate the parse-tree describing the necessary changes; or (2) walk the parse-tree, constructing a separate AST, and then walk and transform the AST.
For the annotation strategy, extend ParseTreeProperty to create property classes specific to each context node type. See the comment in that class for how to use it.
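As a rough illustration of the annotation strategy, the sketch below keeps per-node values in a side table keyed by node identity, which is, as far as I know, essentially what ParseTreeProperty does internally. PtNode and NodeProperty are stand-ins invented for this example so that it runs without the ANTLR runtime; with ANTLR 4 you would key on the real context objects instead:

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a parse-tree node; a real ANTLR 4 tree would use its context classes.
class PtNode {
    final String text;
    final List<PtNode> children = new ArrayList<>();

    PtNode(String text, PtNode... kids) {
        this.text = text;
        for (PtNode kid : kids) {
            children.add(kid);
        }
    }
}

// Hypothetical annotation table, keyed by node identity rather than equals().
class NodeProperty<V> {
    private final Map<PtNode, V> annotations = new IdentityHashMap<>();

    V get(PtNode node) { return annotations.get(node); }
    void put(PtNode node, V value) { annotations.put(node, value); }
}

class AnnotateDemo {
    // Post-order walk: annotate the children first, then combine their values at the parent.
    static void annotate(PtNode node, NodeProperty<Integer> values) {
        for (PtNode child : node.children) {
            annotate(child, values);
        }
        if (node.children.isEmpty()) {
            values.put(node, Integer.parseInt(node.text));
        } else {
            int left = values.get(node.children.get(0));
            int right = values.get(node.children.get(1));
            values.put(node, node.text.equals("+") ? left + right : left * right);
        }
    }

    static int demo() {
        // Tree for (1 + 2) * 4
        PtNode tree = new PtNode("*",
                new PtNode("+", new PtNode("1"), new PtNode("2")),
                new PtNode("4"));
        NodeProperty<Integer> values = new NodeProperty<>();
        annotate(tree, values);
        return values.get(tree);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 12
    }
}
```

The tree itself is never modified; a later pass (for example the one emitting IR) only reads the annotations.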
The AST strategy is not discouraged -- it was the primary strategy used in Antlr3 -- but is essentially unsupported in Antlr4. As for why Antlr4 favors the annotation strategy, see the last few paragraphs of this answer.

Parser with objects of custom class as a tokens

I'm looking for a Java parser generator whose generated parser accepts objects of a custom class implementing the Comparable interface (or perhaps some parser-specific interface) as input tokens.
ANTLR seems to be too character stream-oriented.
Any suggestions?
Generally I use JavaCC. It is very simple and you can produce what you want.
Here you can find a simple tutorial.

What's the correct way to extend the functionality of DOM elements?

After taking quite a long break from active coding I am just starting to get accustomed to Java again, so this might be considered a "newbie question". Any help is appreciated.
Consider the following scenario. I am parsing an XML document as DOM. I am using javax.xml.parsers.DocumentBuilder to obtain an org.w3c.dom.Document node and scan through its org.w3c.dom.Element nodes, and I am fine with that.
However, I would like to extend the functionality of my org.w3c.dom.Element objects. Say, I would like to have a convenient way to extract some information from the nodes by giving them some public FancyObject toFancyObject() method. What's the right way of doing this?
Considering that org.w3c.dom.Element is an interface, inheritance seems to be no option. Composition, on the other hand, seems to be quite cumbersome in this case, since this would be like 5% new functionality and 95% delegation of the existing methods.
Also, I am aware that I could always write a static utility method to obtain my FancyObject, but I would like to avoid this solution.
You have a couple of options:
Use the user data field of the Node interface. You can attach arbitrary objects to it and build something that resembles your static variant.
Use JDOM or DOM4J instead. These APIs are better suited to your requirements with respect to extending base implementation classes. For example, with JDOM you can supply a custom factory (JDOMFactory) that creates your customized Element implementations.
Use JAXB to unmarshal the XML into an object graph. In this case, you have almost complete freedom to implement custom behavior.
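The first option can be sketched with the standard DOM Level 3 methods Node.setUserData and Node.getUserData. FancyObject, the "fancy" key and the way the label is built are all assumptions made up for this example:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical domain object derived from an element.
class FancyObject {
    final String label;

    FancyObject(String label) {
        this.label = label;
    }
}

class UserDataDemo {
    private static final String KEY = "fancy"; // arbitrary key chosen for this sketch

    // Returns the FancyObject attached to the element, creating and attaching it on first use.
    static FancyObject toFancyObject(Element element) {
        FancyObject cached = (FancyObject) element.getUserData(KEY);
        if (cached == null) {
            cached = new FancyObject(element.getTagName() + ":" + element.getAttribute("id"));
            element.setUserData(KEY, cached, null); // null handler: no clone/import callbacks
        }
        return cached;
    }

    static String demo() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            byte[] xml = "<item id=\"7\"/>".getBytes(StandardCharsets.UTF_8);
            Document doc = builder.parse(new ByteArrayInputStream(xml));
            Element root = doc.getDocumentElement();
            FancyObject first = toFancyObject(root);
            FancyObject second = toFancyObject(root); // served from the user data slot
            return first.label + " " + (first == second);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints item:7 true
    }
}
```

This is still a static helper, but the derived object now travels with the node instead of being recomputed or kept in a separate map.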

Existing implementations of Trees in Java?

I'm looking for any implementation of a pure Tree data structure for Java (that aren't graphical ones in java.awt), preferably generic.
With a generic tree I'd like to add elements that are not supposed to be sorted and do something like this:
TreeNode anotherNode = new TreeNode();
node.add(anotherNode);
…and then I'd like to traverse the nodes (so that I can save and preserve the structure in a file when I load the tree from the same file again).
Does anyone know what implementations exist, or have any other idea for achieving this?
You can use the DefaultMutableTreeNode defined in the javax.swing.tree package. It contains methods getUserObject() and setUserObject(Object) allowing you to attach data to each tree node. It allows an arbitrary number of child nodes for each parent and provides methods for iterating over the tree in a breadth-first or depth-first fashion (breadthFirstEnumeration() / depthFirstEnumeration()).
Also note that despite residing in the javax.swing.tree package, this class does not contain any UI code; it is merely the underlying model of JTree.
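A small sketch of building and traversing such a tree follows. It uses preorderEnumeration(), another traversal method on the same class, together with getLevel(), because a parent-before-children dump with indentation is easy to write to a file and read back; the node labels are made up:

```java
import java.util.Enumeration;
import javax.swing.tree.DefaultMutableTreeNode;

class SwingTreeDemo {
    // Builds a tiny tree: root -> (a -> a1), b
    static DefaultMutableTreeNode build() {
        DefaultMutableTreeNode root = new DefaultMutableTreeNode("root");
        DefaultMutableTreeNode a = new DefaultMutableTreeNode("a");
        a.add(new DefaultMutableTreeNode("a1"));
        root.add(a);
        root.add(new DefaultMutableTreeNode("b"));
        return root;
    }

    // Renders one node per line in pre-order, indented two spaces per level.
    static String dump(DefaultMutableTreeNode root) {
        StringBuilder sb = new StringBuilder();
        Enumeration<?> nodes = root.preorderEnumeration();
        while (nodes.hasMoreElements()) {
            DefaultMutableTreeNode node = (DefaultMutableTreeNode) nodes.nextElement();
            for (int i = 0; i < node.getLevel(); i++) {
                sb.append("  ");
            }
            sb.append(node.getUserObject()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump(build()));
    }
}
```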
Scala has a nice Tree data structure. It's a "General Balanced Tree." It's not exactly Java, but it's close, and can serve as a good model.
It is hard to believe, given how much is in the base Java libraries, but there is no good generic Tree structure.
For a start, the TreeSet and TreeMap in the runtime are red-black-tree implementations.
Assuming you don't want to store arbitrary Java objects on the nodes, you could use the W3C DOM. It even comes with its own serialization format (I forget what it's called :-).
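For the W3C DOM route, the serialization format is XML. A minimal sketch using only JDK classes might look like this; the element and attribute names are invented for the example:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomTreeDemo {
    // Builds a two-level tree and serializes it to an XML string.
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            for (String name : new String[] {"a", "b"}) {
                Element child = doc.createElement("child");
                child.setAttribute("name", name); // data lives in attributes or text nodes
                root.appendChild(child);
            }
            // The identity transformer copies the DOM to the output unchanged.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Loading the tree again is a matter of parsing the saved XML with a DocumentBuilder.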

Converting Antlr syntax tree into useful objects

I'm currently pondering how best to take an AST generated using Antlr and convert it into useful objects which I can use in my program.
The purpose of my grammar (apart from learning) is to create an executable (runtime-interpreted) language.
For example, how would I take an attribute sub-tree and have a specific Attribute class instantiated?
The following code in my language:
Print(message:"Hello stackoverflow")
would produce the following AST:
My current line of thinking is that a factory class could read the tree and pull out the name (message), type (STRING) and value ("Hello stackoverflow"). Now, knowing the type, I could instantiate the correct class (e.g. a StringAttribute class) and pass in the required attribute data: the name and value.
The same approach could be used for a definition factory, pulling out the definition name (Print), instantiating the Print class, and then passing in the attributes generated by the attribute factory.
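A sketch of the attribute factory described above might look like the following. Every class name is hypothetical apart from StringAttribute, which comes from the description; IntAttribute and the "INT" type are added only to show a second case, and the factory takes the three values already pulled out of the sub-tree rather than a real ANTLR node:

```java
// Base class for all attributes; holds the attribute name (e.g. "message").
abstract class Attribute {
    final String name;

    Attribute(String name) {
        this.name = name;
    }

    abstract Object value();
}

class StringAttribute extends Attribute {
    private final String value;

    StringAttribute(String name, String value) {
        super(name);
        this.value = value;
    }

    @Override
    Object value() {
        return value;
    }
}

// Assumed second attribute kind, included only to illustrate the dispatch.
class IntAttribute extends Attribute {
    private final int value;

    IntAttribute(String name, int value) {
        super(name);
        this.value = value;
    }

    @Override
    Object value() {
        return value;
    }
}

class AttributeFactory {
    // name, type and rawText are the three pieces read from the attribute sub-tree.
    static Attribute create(String name, String type, String rawText) {
        switch (type) {
            case "STRING":
                // Strip the surrounding double quotes from the token text.
                return new StringAttribute(name, rawText.substring(1, rawText.length() - 1));
            case "INT":
                return new IntAttribute(name, Integer.parseInt(rawText));
            default:
                throw new IllegalArgumentException("Unknown attribute type: " + type);
        }
    }

    public static void main(String[] args) {
        Attribute a = create("message", "STRING", "\"Hello stackoverflow\"");
        System.out.println(a.name + " = " + a.value()); // prints message = Hello stackoverflow
    }
}
```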
Things do get a bit more complicated with a more complicated program:
Program(args:[1,2,3,4,5])
{
If(isTrue:IsInArray(array:{Program.args} value:5))
{
Then {
Print(message:"5 is in the array")
} Else {
Print(message:"More complex " + "message")
}
}
}
ANY/ALL help or thoughts are very welcome. Many thanks.
Previous related questions by me (Could be useful):
How do I make a tree parser
Solving LL recursion problem
Antlr3 conditional tree rewrites
I recommend reading chapter 9, Building High-Level Interpreters, from Language Implementation Patterns by Terence Parr.
EDIT
Okay, to get you through the time waiting for that book, here's what you're (at least) going to need:
a global memory space;
function spaces (each function space will also have a (local) memory space);
and classes that spring to mind (in UML-ish style):
class Interpreter
global : MemorySpace
functions : Stack<Function>
...
class MemorySpace
vars : Map<String, Object>
...
class Function
local: MemorySpace
execute(): void
...
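Translated into Java, the outline above might start out like this. It is only a sketch: the call and resolve methods, and the rule that locals shadow globals, are assumptions added to make it runnable, and execute takes the interpreter as a parameter, which the outline does not specify:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class MemorySpace {
    private final Map<String, Object> vars = new HashMap<>();

    void put(String name, Object value) { vars.put(name, value); }
    Object get(String name) { return vars.get(name); }
    boolean has(String name) { return vars.containsKey(name); }
}

abstract class Function {
    final MemorySpace local = new MemorySpace();

    abstract void execute(Interpreter interpreter);
}

class Interpreter {
    final MemorySpace global = new MemorySpace();
    private final Deque<Function> functions = new ArrayDeque<>(); // call stack

    // Push the function for the duration of its execution, then pop it again.
    void call(Function function) {
        functions.push(function);
        try {
            function.execute(this);
        } finally {
            functions.pop();
        }
    }

    // Look in the innermost function's locals first, then fall back to the globals.
    Object resolve(String name) {
        Function current = functions.peek();
        if (current != null && current.local.has(name)) {
            return current.local.get(name);
        }
        return global.get(name);
    }
}

class InterpreterDemo {
    static String demo() {
        Interpreter interpreter = new Interpreter();
        interpreter.global.put("x", 1);
        Function f = new Function() {
            @Override
            void execute(Interpreter i) {
                local.put("x", 2);                    // shadows the global x
                i.global.put("seen", i.resolve("x")); // records what the function saw
            }
        };
        interpreter.call(f);
        return interpreter.global.get("seen") + "," + interpreter.resolve("x");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 2,1
    }
}
```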
Here's one with ANTLR -> LLVM:
Once you have the AST, all you need is an iterator to walk the tree and the template to emit the objects you want.
This tutorial is based on Flex and Bison, but at the end the author details how the AST is converted to LLVM assembly code, so it might be helpful.
