Parser with objects of a custom class as tokens - java

I'm looking for a Java parser generator whose resulting parser accepts objects of a custom class implementing the Comparable interface (or perhaps some parser-specific interface) as input tokens.
ANTLR seems to be too character stream-oriented.
Any suggestions?

Generally I use JavaCC. It is very simple and you can produce what you want.
Here you can find a simple tutorial.

Related

Constructing AST in ANTLR version4

I am developing a compiler and have already implemented the lexer, parser, and semantic analyzer (using listeners and visitors) with ANTLR4. For code generation I am planning to generate LLVM IR using StringTemplate (ST).
To do so, I am thinking of first constructing an AST and then generating the code from it.
My question is: do I need to construct an AST, or can I use the parse tree?
If I need to use AST, I am not able to find any examples of manually constructing AST using visitors or listeners. Even a small grammar example will be very helpful.
Thank you.
No, there is no fundamental need to construct an AST. In the simplest case, you can walk the parse-tree and output the IR, directly or using ST.
Where transformations are required for output as IR, the two basic approaches are to (1) analyze and annotate the parse-tree describing the necessary changes; or (2) walk the parse-tree, constructing a separate AST, and then walk and transform the AST.
For the annotation strategy, extend ParseTreeProperty to create property classes specific to each context node type. See the comment in that class for how to use it.
The AST strategy is not discouraged -- it was the primary strategy used in Antlr3 -- but is essentially unsupported in Antlr4. As for why Antlr4 favors the annotation strategy, see the last few paragraphs of this answer.
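To make the annotation strategy concrete, here is a minimal sketch. It does not use ANTLR itself: SketchNode stands in for a parse-tree node, and a plain IdentityHashMap plays the role of ParseTreeProperty (which, as far as I know, is a thin wrapper around such an identity-keyed map). The class names and the LLVM-like output format are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parse-tree node (ANTLR4 would give you ParseTree contexts).
class SketchNode {
    final String text;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String text, SketchNode... kids) {
        this.text = text;
        for (SketchNode k : kids) {
            children.add(k);
        }
    }
}

class AnnotationSketch {
    // Side table keyed by node identity; the tree itself is never modified.
    private final Map<SketchNode, String> irName = new IdentityHashMap<>();
    private final StringBuilder out = new StringBuilder();
    private int nextTemp = 0;

    // Post-order walk: annotate every node with the name that holds its value,
    // emitting one LLVM-like instruction per binary operator node.
    String emit(SketchNode n) {
        String name;
        if (n.children.isEmpty()) {
            name = n.text; // leaf: an integer literal is used directly
        } else {
            String left = emit(n.children.get(0));
            String right = emit(n.children.get(1));
            name = "%t" + nextTemp++;
            String op = n.text.equals("+") ? "add" : "mul";
            out.append(name).append(" = ").append(op).append(" i32 ")
               .append(left).append(", ").append(right).append("\n");
        }
        irName.put(n, name);
        return name;
    }

    String output() {
        return out.toString();
    }

    String nameOf(SketchNode n) {
        return irName.get(n);
    }

    public static void main(String[] args) {
        // Tree for 1 + 2 * 3
        SketchNode tree = new SketchNode("+",
                new SketchNode("1"),
                new SketchNode("*", new SketchNode("2"), new SketchNode("3")));
        AnnotationSketch sketch = new AnnotationSketch();
        sketch.emit(tree);
        System.out.print(sketch.output());
    }
}
```

With real ANTLR4 classes the same shape would presumably live in a listener or visitor, with the map replaced by a ParseTreeProperty field.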

org.w3c.dom.NodeList doesn't extend Iterable

Is there any reason why the authors of the Java org.w3c.dom library chose not to support the Iterable interface? For example, the interface NodeList seems like a perfect fit for extending Iterable.
The World Wide Web consortium has defined the Document Object Model (DOM) as follows:
The Document Object Model is a platform- and language-neutral
interface that will allow programs and scripts to dynamically access
and update the content, structure and style of documents.
Its implementations for a number of languages look very much alike, which smart people thought was a good idea when they designed it many years ago.
As a result, it doesn't look like anything familiar in any language.
If you want to use an alternative to the w3c DOM that does look like a Java library, use JDOM. Or map your XML to Java objects using a mapping/binding solution, such as JAXB.
But if you need to interface with existing libraries that already use w3c DOM (like the built-in XSLT and XSD processors), then you're stuck with it. Unfortunately.
To @eis:
Yes there is a reason that you can't add an interface such as Iterable to NodeList, and that reason is that the Java binding of the Document Object Model is defined in the standard. Take NodeList, it is 100% defined in the standard. No room for any extra interfaces.
org/w3c/dom/NodeList.java:
package org.w3c.dom;

public interface NodeList {
    public Node item(int index);
    public int getLength();
}
There is no binding in the standard for C#, but there is one for EcmaScript. I believe the IXMLDocument interfaces that you mention are also used for their EcmaScript implementation (but I could be wrong), in which case they still need to stick to the standard in terms of what methods they support and what the type hierarchy is.
The difference is that the EcmaScript binding only describes which methods should exist, while the Java binding describes the exact method in the interface.
There is no reason though in Java that the class that implements NodeList can't implement Iterable too. However, if your code depended on that it would not work with the DOM standard, but with a particular implementation only.
Microsoft has never really bothered with this fine distinction since they generally don't cater for multiple standards compliant implementations - if you use any of the methods that Microsoft has labelled with "* Denotes an extension to the World Wide Web Consortium (W3C) DOM." in Microsoft's implementation, then you're not using the DOM standard.
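Since the interface itself cannot change, the usual workaround is a small adapter on the caller's side. The following is only a sketch: NodeLists, iterable, and childNames are names made up here, not part of any standard API. It relies solely on item(int) and getLength(), so it should work with any conforming DOM implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class NodeLists {
    // Adapts a NodeList to Iterable<Node> using only the two standard methods.
    static Iterable<Node> iterable(final NodeList list) {
        return () -> new Iterator<Node>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < list.getLength();
            }

            @Override
            public Node next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return list.item(index++);
            }
        };
    }

    // Demo helper: names of the element children of the document root.
    static List<String> childNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> names = new ArrayList<>();
            for (Node n : iterable(doc.getDocumentElement().getChildNodes())) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(n.getNodeName());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childNames("<root><a/><b/></root>"));
    }
}
```

Keep in mind that a NodeList is live, so modifying the document while iterating can change what the adapter sees.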

What's the correct way to extend the functionality of DOM elements?

After taking quite a long break from active coding I am just starting to get accustomed to Java again, so this might be considered a "newbie question". Any help is appreciated.
Consider the following scenario. I am parsing an XML document as DOM. I am using javax.xml.parsers.DocumentBuilder to obtain an org.w3c.dom.Document node and scan through its org.w3c.dom.Element nodes, and I am fine with that.
However, I would like to extend the functionality of my org.w3c.dom.Element objects. Say, I would like to have a convenient way to extract some information from the nodes by giving them a public FancyObject toFancyObject() method. What's the right way of doing this?
Considering that org.w3c.dom.Element is an interface, inheritance seems to be no option. Composition, on the other hand, seems to be quite cumbersome in this case, since this would be like 5% new functionality and 95% delegation of the existing methods.
Also, I am aware that I could always write a static utility method to obtain my FancyObject, but I would like to avoid this solution.
You have a couple of options:
Use the user data field of the Node interface. You can attach arbitrary objects to it and build something that resembles your static variant.
Use JDOM or DOM4J instead. These APIs are better suited for your requirements w.r.t. extending base implementation classes. For example, with JDOM you can define a custom NodeFactory that can create the customized Element implementations.
Use JAXB to unmarshal the XML into an object graph. In this case, you have almost complete freedom to implement custom behavior.
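The first option (user data on the Node) might look roughly like the sketch below. FancyObject is the hypothetical class from the question; FancyElements, the "fancy" key, and the label format are assumptions made for illustration. setUserData and getUserData are DOM Level 3 methods on org.w3c.dom.Node, so this assumes a DOM Level 3 implementation such as the one bundled with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical domain object from the question.
class FancyObject {
    final String label;

    FancyObject(String label) {
        this.label = label;
    }
}

class FancyElements {
    // Key under which the object is stored on the node (arbitrary choice).
    private static final String KEY = "fancy";

    // Builds the FancyObject once and caches it on the element itself.
    static FancyObject toFancyObject(Element e) {
        FancyObject cached = (FancyObject) e.getUserData(KEY);
        if (cached == null) {
            cached = new FancyObject(e.getTagName() + ":" + e.getAttribute("id"));
            e.setUserData(KEY, cached, null); // no UserDataHandler needed here
        }
        return cached;
    }

    // Demo helper: parse a string and return the root element.
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot("<item id=\"7\"/>");
        System.out.println(toFancyObject(root).label);
    }
}
```

This is still a static helper, but the derived object travels with the node, so repeated lookups return the same instance.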

implementation of shift reduce parser in java

I need to implement a shift-reduce parser for college, and I need to know how I can implement it in Java.
Are there any implementations already, or any sample ones?
Are there any implementations already?
Unless the task is to actually practice writing it yourself, I'd recommend using a parser generator such as JavaCUP or ANTLR. (I used JavaCUP in one of my compiler courses, but perhaps you have a different scope in your course.)
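If the point is to write one by hand, a toy version can be sketched in a few lines. The grammar below (E -> E + T | T, T -> id) and the class name are assumptions chosen for illustration. The reduce decisions are hard-coded for this grammar rather than driven by a generated LR table, which is what tools like JavaCUP produce for you.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShiftReduceSketch {
    // Toy grammar:  E -> E + T | T      T -> id
    // Returns true if the token sequence reduces to a single E.
    static boolean parse(List<String> input) {
        List<String> stack = new ArrayList<>();
        int pos = 0;
        while (true) {
            if (reduce(stack)) {
                continue;                      // keep reducing while a handle is on top
            }
            if (pos < input.size()) {
                stack.add(input.get(pos++));   // shift the next token
                continue;
            }
            break;                             // nothing to reduce, nothing to shift
        }
        return stack.size() == 1 && stack.get(0).equals("E");
    }

    // Tries to replace a handle on top of the stack with its left-hand side.
    private static boolean reduce(List<String> stack) {
        int n = stack.size();
        if (n >= 1 && stack.get(n - 1).equals("id")) {
            replaceTop(stack, 1, "T");         // T -> id
            return true;
        }
        if (n >= 3 && stack.get(n - 3).equals("E")
                && stack.get(n - 2).equals("+")
                && stack.get(n - 1).equals("T")) {
            replaceTop(stack, 3, "E");         // E -> E + T
            return true;
        }
        if (n == 1 && stack.get(0).equals("T")) {
            replaceTop(stack, 1, "E");         // E -> T (only at the bottom of the stack)
            return true;
        }
        return false;
    }

    private static void replaceTop(List<String> stack, int count, String symbol) {
        for (int i = 0; i < count; i++) {
            stack.remove(stack.size() - 1);
        }
        stack.add(symbol);
    }

    public static void main(String[] args) {
        System.out.println(parse(Arrays.asList("id", "+", "id")));
        System.out.println(parse(Arrays.asList("id", "+")));
    }
}
```

For anything beyond a toy grammar, the shift/reduce choice needs lookahead and a parse table, which is where a generator earns its keep.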

How to serialise an antlr3 AST

I have just started using antlr3 and am trying to serialize the AST output of a .g grammar.
Thanks,
Lezan
As Vladimir pointed out, you can use a custom AST node class that has serialization capabilities built in. You could also use a tree adaptor to create the types of nodes you need.
If you only need serialization, and not de-serialization, you could probably just do:
ast.toStringTree()
The above will give you a LISP-like tree structure. An easy way to do serialization would be to use that in combination with a custom AST node class with an overridden toString(). Since toStringTree() calls each node's toString() method, it'll essentially serialize whatever you put in toString(). Make its output sufficient and useful and you should be set.
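To illustrate the idea without pulling in ANTLR, here is a stand-in sketch. SketchTree is a made-up class, not ANTLR's CommonTree, and its toStringTree() only mimics the LISP-like output described above as I understand it. The "text<type>" format chosen for toString() is likewise just an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a custom AST node class.
class SketchTree {
    final String tokenText;
    final int tokenType;
    final List<SketchTree> children = new ArrayList<>();

    SketchTree(String tokenText, int tokenType, SketchTree... kids) {
        this.tokenText = tokenText;
        this.tokenType = tokenType;
        for (SketchTree k : kids) {
            children.add(k);
        }
    }

    // Whatever is put here ends up in the serialized form.
    @Override
    public String toString() {
        return tokenText + "<" + tokenType + ">";
    }

    // LISP-like rendering: a leaf prints as itself, an inner node as (root child child ...).
    String toStringTree() {
        if (children.isEmpty()) {
            return toString();
        }
        StringBuilder sb = new StringBuilder("(").append(this);
        for (SketchTree child : children) {
            sb.append(' ').append(child.toStringTree());
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        SketchTree sum = new SketchTree("+", 4,
                new SketchTree("1", 5),
                new SketchTree("2", 5));
        System.out.println(sum.toStringTree());
    }
}
```

If token text can contain spaces or parentheses, toString() would need to escape them for the output to stay unambiguous.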
CommonTree nodes produced by Parser are not Serializable.
I'd suggest serializing the Tokens and using a secondary grammar to parse the (deserialized) stream of Tokens later. In the book (The Definitive ANTLR Reference), in the "A Quick Tour for the Impatient" chapter, Terence Parr gives exactly this scenario, though without serialization. Serializing tokens is trivial, as they are just text.
My understanding is also that you can replace the Tree class with your own:
options {
ASTLabelType = MyOwnTreeClass;
}
But I haven't tried it.
