I'm working on a university project in which we have to develop a programming language from scratch. We use ANTLR 4 to generate the parse tree. I'm currently trying to get a for loop to work: I have the grammar and can extract the values from it. My current problem is how to execute the statements in the body of the for loop repeatedly.
Here is my grammar:
grammar LCT;
program: stmt*;
stmt: assignStmt
| invocationStmt
| show
| forStatement
;
assignStmt: VAR ID '=' expr;
invocationStmt: name=ID ((expr COMMA)* expr)?;
expr: ID | INT | STRING;
show: 'show' (INT | STRING | ID);
block : '{' statement* '}' ;
statement : block
| show
| assignStmt
;
forStatement : 'loop' ('(')? forConditions (')')? statement* ;
forConditions : iterator=expr 'from' startExpr=INT range='to' endExpr=INT ;
//tokens
COMMA: ',';
VAR: 'var';
INT: [0-9]+;
STRING: '"' (~('\n' | '"'))* '"';
ID: [a-zA-Z_] [a-zA-Z0-9_]*;
WS: [ \n\t\r]+ -> skip;
And this is the current listener, which supports assigning and printing ints:
package LCTlang;
import java.util.HashMap;
public class LCTCustomBaseListener extends LCTBaseListener {
HashMap<String, Integer> variableMap = new HashMap<>();
String[] keyWords = {"show", "var"};
@Override public void exitAssignStmt(LCTParser.AssignStmtContext ctx) {
this.variableMap.put(ctx.ID().getText(),
Integer.parseInt(ctx.expr().getText()));
}
@Override public void exitInvocationStmt(LCTParser.InvocationStmtContext ctx) {
this.variableMap.put(ctx.name.getText(),
Integer.parseInt(ctx.ID().getText()));
}
@Override
public void exitShow(LCTParser.ShowContext ctx) {
if(ctx.INT() != null) {
System.out.println(ctx.INT().getText());
}
if(ctx.STRING() != null) {
System.out.println(ctx.STRING().getText());
}
else if(ctx.ID() != null) {
System.out.println(this.variableMap.get(ctx.ID().getText()));
}
}
@Override public void exitForStatement(LCTParser.ForStatementContext ctx) {
int start = Integer.parseInt(ctx.forConditions().startExpr.getText());
int end = Integer.parseInt(ctx.forConditions().endExpr.getText());
String varName = ctx.forConditions().iterator.getText();
int i;
for (i = start; i < end; i++) {
for (LCTParser.StatementContext state : ctx.statement()){
System.out.println(state);
}
}
}
}
My problem is in the looping of the statements, and how that is done.
A listener is going to be a poor choice for execution. You've turned the tree navigation over to a Tree Walker (hitting each node only once), that calls you back when it encounters nodes you're interested in. You won't convince it to walk the children nodes of some iteration node (while, for, etc.) more than once, and that's pretty much the point of iteration structures. It won't detect that a node is a call to a function and then navigate to that function. It's JUST walking through the ParseTree.
For some fairly simple grammars (usually something like an expression evaluator, maybe a calculator), you could set up a visitor that returns whatever your expression datatype is (probably a Float for a calculator).
In your case, I'd suggest that ANTLR has provided its value. It's a Parser and has provided a ParseTree for you. Now it's up to you to write code that utilizes that parse tree to execute the functionality. You're now in the world of creating a runtime for your language. Thank ANTLR for making it easy to parse, and providing nice error messages and robust error recovery.
To execute your logic, you'll need to write code that uses those data structures, keeps up with the current value of variables, and, based on those values, decides to execute everything contained in that for/while/... loop. You'll have similar runtime work to evaluate boolean expressions to decide whether to execute children in if/else statements, etc. This runtime will also have to keep up with call stacks of functions calling other functions, etc. In short, executing your resulting logic will involve referencing the parsed input, but won't particularly look like navigating your parse Tree.
(Note: many people find a full parse tree to be a bit tedious to navigate (it tends to have a lot of intermediate nodes). In the past, I've written a Visitor to produce something more like an AST (Abstract Syntax Tree). It's a trimmed down tree that has the structure I want for further processing. This is not necessarily required, but you may find it easier to work with.)
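As a rough illustration of the "write your own runtime" idea, here is a minimal sketch of a tree-walking interpreter. It assumes the parse tree has already been converted into a small AST; the node classes (Stmt, Assign, Show, Loop) and the Interpreter class are hypothetical names made up for this example, not anything ANTLR generates. The point is only that the Loop node executes its own children as many times as it needs to, which a ParseTreeWalker will not do for you. The upper bound is treated as exclusive, matching the i < end test in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical AST node: every statement knows how to execute itself.
interface Stmt {
    void execute(Interpreter in);
}

// var name = value
final class Assign implements Stmt {
    final String name;
    final int value;
    Assign(String name, int value) { this.name = name; this.value = value; }
    public void execute(Interpreter in) { in.vars.put(name, value); }
}

// show name (records the text instead of printing, so it is easy to inspect)
final class Show implements Stmt {
    final String name;
    Show(String name) { this.name = name; }
    public void execute(Interpreter in) {
        in.output.add(String.valueOf(in.vars.get(name)));
    }
}

// loop (iterator from start to end) body...
final class Loop implements Stmt {
    final String iterator;
    final int start;
    final int end;
    final List<Stmt> body;
    Loop(String iterator, int start, int end, List<Stmt> body) {
        this.iterator = iterator;
        this.start = start;
        this.end = end;
        this.body = body;
    }
    public void execute(Interpreter in) {
        for (int i = start; i < end; i++) {
            in.vars.put(iterator, i);   // make the loop variable visible to the body
            for (Stmt s : body) {
                s.execute(in);          // the body is executed once per iteration
            }
        }
    }
}

// Runtime state: variable values plus collected output.
final class Interpreter {
    final Map<String, Integer> vars = new HashMap<String, Integer>();
    final List<String> output = new ArrayList<String>();

    List<String> run(List<Stmt> program) {
        for (Stmt s : program) {
            s.execute(this);
        }
        return output;
    }
}

class LoopDemo {
    // Roughly: loop (i from 0 to 3) show i
    static List<String> demo() {
        List<Stmt> body = Arrays.<Stmt>asList(new Show("i"));
        List<Stmt> program = Arrays.<Stmt>asList(new Loop("i", 0, 3, body));
        return new Interpreter().run(program);
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

A visitor over the parse tree would be one way to build these nodes (for example, visitForStatement returning a Loop), after which execution no longer depends on the walker at all.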
Related
I am using Antlr4 to parse Java.g4 grammar file. The parser rule I was working with is :
typeArgument
: typeType
| '?' (('extends' | 'super') typeType)?
;
I have implemented the visitor method for this parser rule in the following way:
public String visitTypeArgument(JavaParser.TypeArgumentContext ctx) {
StringBuilder typArg = new StringBuilder();
if(ctx.getChild(0).getText().equalsIgnoreCase("?")){
// '?' (('extends' | 'super') typeType)?
typArg.append("?").append(" ");
TypeTypeContext typTypCtx = ctx.typeType();
if(typTypCtx != null){
typArg.append(ctx.getChild(1).getText()).append(" "); // <- Confusion is here
typArg.append(this.visitTypeType(typTypCtx));
}
}
else{
TypeTypeContext typTypCtx = ctx.typeType();
typArg.append(this.visitTypeType(typTypCtx));
}
return typArg.toString();
}
I have marked the point of confusion in my code with a comment. I am parsing a typeArgument like <? extends SomeClassIdentifier>.
Why does ctx.getChild(1).getText() return "extends" instead of "extends SomeClassIdentifier"?
According to the rule '?' (('extends' | 'super') typeType)? there should be only two child contexts, i.e. one for ? and another for ('extends' | 'super') typeType. Can anyone help me clear up my confusion?
According to rule '?' (('extends' | 'super') typeType)? there should be only two child contexts i.e. one for ? and another for ('extends' | 'super') typeType'.
I don't think this is correct. Without seeing more of your grammar, I think you should get three children from this rule, assuming the optional (?) phrase is present in the input text:
? as an implicit lexer token
either extends or super as an implicit lexer token
typeType as a child context of its own, possibly with its own set of children, since your rule is recursive in that typeType can itself contain a typeType
Does that help? Examine the tree of children and I think it will make sense. The context tree for right-recursive rules can get pretty deep depending on your input text.
I got a small problem with the output of ANTLR.
I have a really small grammar which looks like this:
test : states;
states : '.states' state+;
state : stateID=ID {
System.out.println("state: " + $stateID.text);}
| stateID=ID '{' state* '}' {
System.out.println("SubState: " + $stateID.text);};
And what I want to parse looks like this:
a{
b
c{
d
}
}
Well, the problem is, the first token I'll get is 'b' followed by 'd' and then 'c'.
But my intention is to parse it into my datastructure and I need to know their parents.
What I know by this order is, c is the parent of d, but what about b?
If I rewrote the example to this form:
a{
c{
d
}
b
}
Everything is fine. So is there a way to know who is the parent of b, without having the constraint to write it in the last example?
In ANTLR 4 using grammar-actions is no longer recommended. The parser may visit and test different rules and alternatives in unexpected orders, so unless you're adding error-handling code it's better to let the process run normally and then inspect the result.
So you let the parser create its tree, and then write a custom listener that will emit your println calls at each step. For example, suppose you're working with a grammar called Foo, so that ANTLR autogenerates a FooBaseListener class.
So first you'd make something like:
public class PrintingFooListener extends FooBaseListener {
@Override
public void enterState(FooParser.StateContext ctx)
{
// It is possible to get all sorts of token/subrule/text
// information from the ctx input, especially if you labeled
// the parser/lexer rules.
System.out.println("I entered State");
}
}
Then use the ParseTreeWalker utility class to navigate through the parse tree with your listener in tow:
// Assume lexing, etc. already done before this point
ParserRuleContext tree = parser.myMainRule(); // Do parse
ParseTreeWalker walker = new ParseTreeWalker(); // Premade utility class
PrintingFooListener listener = new PrintingFooListener(); // Your customized subclass
walker.walk(listener, tree);
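To get back to the original question of knowing each state's parent: one possible approach, sketched below, is to keep a stack of the states that are currently "open" and push in enterState and pop in exitState. The sketch does not use the ANTLR runtime so that it runs on its own; StateNode is a hypothetical stand-in for the generated StateContext, and the walk method imitates the enter-before-children, exit-after-children order that ParseTreeWalker follows. With a real listener you would read the identifier from the context instead of from a field.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a parse-tree node of the 'state' rule.
final class StateNode {
    final String id;
    final List<StateNode> children;
    StateNode(String id, StateNode... children) {
        this.id = id;
        this.children = Arrays.asList(children);
    }
}

// Mirrors the enterState/exitState pair of a generated listener.
final class ParentTrackingListener {
    private final Deque<String> open = new ArrayDeque<String>(); // states entered but not yet exited
    final List<String> edges = new ArrayList<String>();          // recorded "parent -> child" pairs

    void enterState(StateNode ctx) {
        // Whatever is on top of the stack is the enclosing state.
        String parent = open.isEmpty() ? "<root>" : open.peek();
        edges.add(parent + " -> " + ctx.id);
        open.push(ctx.id);
    }

    void exitState(StateNode ctx) {
        open.pop();
    }
}

class ParentDemo {
    // Depth-first walk: enter, then children, then exit.
    static void walk(ParentTrackingListener listener, StateNode node) {
        listener.enterState(node);
        for (StateNode child : node.children) {
            walk(listener, child);
        }
        listener.exitState(node);
    }

    // Tree for the input a{ b c{ d } }
    static List<String> demo() {
        StateNode a = new StateNode("a",
                new StateNode("b"),
                new StateNode("c", new StateNode("d")));
        ParentTrackingListener listener = new ParentTrackingListener();
        walk(listener, a);
        return listener.edges;
    }

    public static void main(String[] args) {
        for (String edge : demo()) {
            System.out.println(edge);
        }
    }
}
```

Because the parent is taken from the stack at the moment a state is entered, the order in which b and c appear inside a no longer matters.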
I've been puzzling over this for days and searching doesn't seem to give any results. Makes me wonder if it's possible. For example:
funct functionNAME (Object o) { o+1 };
The point is that the user has to use the identifier 'o' within the curly braces and not some other identifier. This is of course determined by the (Object o) part of the input, where 'o' can be anything. Basically, the identifier within the curly braces must be the same as the identifier defined in the parameter list. I know I can store the matched token and print it to the screen, but is it possible to use it as a lexical token itself? Thanks.
Yes there is a better way to do this. You need a symbol table. The job of a symbol table is to keep track of which identifiers can be used at each point in the program. Generally the symbol table also contains other information about the identifiers, such as what they represent (e.g. variable or function name) and what their types are.
Using a symbol table you can detect the use of variables that are not in scope during parsing for many languages but not all. E.g. C and Pascal are languages where identifiers must be declared before they are used (with a few exceptions). But other languages (e.g. Java) allow identifiers to be declared after they are used and in that case it is best not to try to detect errors such as use of an undeclared variable until after the program is parsed. (Indeed in Java you need to wait until all files are parsed, as identifiers might be declared in another file.)
I'll assume a simple scenario, which is that you only need to record information about variables, that there is no type information, and that things must be declared before use. That will get you started. I haven't bothered about adding the function name to the symbol table.
Suppose a symbol table is a stack of things called frames. Each frame is a mutable set of strings. (Later you may want to change that to a mutable map from strings to some additional information.)
void Start(): { }
{
<FUNCTION>
<IDENTIFIER>
{symtab.pushNewFrame() ;}
<LBRACKET> Parameters() <RBRACKET>
<LBRACE> Expression() <RBRACE>
{symtab.popFrame() ; }
}
void Parameters() : {}
{
( Parameter() (<COMMA> Parameter() )* )?
}
void Parameter() : { Token x ; }
{
<OBJECT> x=<IDENTIFIER>
{ if( symtab.topFrame().contains(x.image) ) reportError( ... ) ; }
{ symtab.topFrame().add(x.image) ; }
}
void Expression() : { }
{
Exp1() ( <PLUS> Exp1() )*
}
void Exp1() : { Token y ; }
{
y = <IDENTIFIER>
{ if( ! symtab.topFrame().contains(y.image) ) reportError( ... ) ; }
|
<NUMBER>
}
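The grammar above assumes a symtab object with pushNewFrame, popFrame and topFrame operations but does not show it. A minimal sketch of such a class, following the "stack of frames, each a mutable set of strings" description, might look like this. The class name SymbolTable and the demo class are assumptions made for the example; only the three method names come from the grammar.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// A stack of frames; each frame is the set of identifiers declared in one scope.
final class SymbolTable {
    private final Deque<Set<String>> frames = new ArrayDeque<Set<String>>();

    // Called when a new scope opens, e.g. at the start of a function.
    void pushNewFrame() {
        frames.push(new HashSet<String>());
    }

    // Called when the scope closes; its identifiers are forgotten.
    void popFrame() {
        frames.pop();
    }

    // The innermost scope. Throws NoSuchElementException if no frame is open.
    Set<String> topFrame() {
        return frames.getFirst();
    }
}

class SymbolTableDemo {
    public static void main(String[] args) {
        SymbolTable symtab = new SymbolTable();
        symtab.pushNewFrame();                                 // entering the function
        symtab.topFrame().add("o");                            // parameter declared
        System.out.println(symtab.topFrame().contains("o"));  // use of a declared name
        System.out.println(symtab.topFrame().contains("p"));  // use of an undeclared name
        symtab.popFrame();                                     // leaving the function
    }
}
```

As in the grammar, lookups here consult only the top frame. A language with nested scopes would search the frames from innermost to outermost instead.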
You can store the value of the identifier matching o, then check whether the identifier inside the curly braces is the same, and throw an exception if it is not.
Okay I have worked out a way to get what I want based on the example I gave in OP. It is a simple variant of the solution I have implemented in mine just to give a proof of concept. Trivial things such as token definitions will be left out for simplicity.
void Start():
{
Token x, y;
}
{
<FUNCTION>
<FUNCTION_NAME>
<LBRACKET>
<OBJECT>
x = <PARAMETER>
<RBRACKET>
<LBRACE>
y = <PARAMETER>
{
if (x.image.equals(y.image) == false)
{
System.out.println("Identifier must be specified in the parameters.");
System.exit(0);
}
}
<PLUS>
<DIGIT>
<RBRACE>
<COLON>
}
Is there a better way to do this?
I've got the following problem: I have a tree of objects of different classes where an action in the child class invalidates the parent. In imperative languages, it is trivial to do. For example, in Java:
public class A {
private List<B> m_children = new LinkedList<B>();
private boolean m_valid = true;
public void invalidate() {
m_valid = false;
}
public void addChild(B child) {
m_children.add(child);
child.m_parent = this;
}
}
public class B {
public A m_parent = null;
private int m_data = 0;
public void setData(int data) {
m_data = data;
m_parent.invalidate();
}
}
public class Main {
public static void main(String[] args) {
A a = new A();
B b = new B();
b.setData(0); //invalidates A
}
}
How do I do the above in Haskell? I cannot wrap my mind around this, since once I construct an object in Haskell, it cannot be changed.
I would be much obliged if the relevant Haskell code is posted.
EDIT: the problem I am trying to solve is the following:
I have an application that edits documents. A document is a hierarchy of objects. When properties of children objects are modified, the document needs to be set to an invalid state, so as that the user knows that the document needs to be validated.
Modifying a tree which might require frequent excursions up the path to the root and back seems like the perfect job for a variant of the Zipper data structure with "scars", in the terminology of the original paper by Huet; the code samples from the paper also suggest a name of "memorising zipper". Of course, with some care, a regular zipper could also be used, but the augmented version might be more convenient and/or efficient to use.
The basic idea is the same as that behind a regular zipper, which already allows one to move up and down a tree in a purely functional manner (without any explicit back-pointers), but a "go up" operation followed by a "go down" operation becomes a no-op, leaving the focus at the original node (whereas with the regular zipper it would move it to the leftmost sibling of the original node).
Here's a link to the paper: Gérard Huet, Functional Pearl: The Zipper. It's just six pages, but the ideas contained therein are of great usefulness to any functional programmer.
To answer the question in your title: Yes, you can create nodes which have links to their parents as well as their children. Example:
-- parent children
data Tree = Node (Maybe Tree) [Tree]
root = Node Nothing [a,b] -- I can "forward reference" a and b because haskell is lazy
a = Node (Just root) []
b = Node (Just root) []
The question is whether that's useful for your particular use-case (often times it isn't).
Now the question in your body: You're right, you can't change a value after it's been created. So once you have a valid tree, you'll always have a valid tree as long as the variable referencing that tree is in scope.
You didn't really describe what problem you're trying to solve, so I can't tell you how to functionally model what you're trying to do, but I'm sure there's a way without mutating the tree.
Here is some zipper code that demonstrates easy modification of the data a cursor points at as well as a "global" property of the tree. We build a tree, move the cursor to the node initially containing a 1, change it to a 3, and are left with a cursor pointing at that node in a fully updated tree.
import Data.Maybe (fromJust)
import Data.Tree
import Data.Tree.Zipper
type NodeData = Either Bool Int
type TreePath a = [TreePos Full a -> TreePos Full a]
firstChild' = fromJust . firstChild
parent' = fromJust . parent
prev' = fromJust . prev
next' = fromJust . next
-- Determine the path from the root of the tree to the cursor.
pathToMe :: TreePos Full NodeData -> TreePath NodeData
pathToMe t | isRoot t = []
| isFirst t = firstChild' : pathToMe (parent' t)
| otherwise = next' : pathToMe (prev' t)
-- Mark a tree as invalid, but leave the cursor in the same place.
invalidate :: TreePos Full NodeData -> TreePos Full NodeData
invalidate t = foldr ($) (setLabel (Left False) (root t)) (pathToMe t)
-- Set a node's internal data.
setData :: Int -> TreePos Full NodeData -> TreePos Full NodeData
setData = (invalidate . ) . setLabel . Right
main = let tree1 = Node (Left True) [Node (Right 1) [], Node (Right 2) []]
Just cursor = firstChild (fromTree tree1)
tree2 = setData 3 cursor
in do putStrLn (drawTree (fmap show tree1))
putStrLn (drawTree (fmap show (toTree tree2)))
putStrLn $ "Cursor at "++show (label tree2)
Output:
Left True
|
+- Right 1
|
`- Right 2
Left False
|
+- Right 3
|
`- Right 2
Cursor at Right 3
I don't have much experience with Haskell, but as far as I know it's not possible to have cycles in the reference graph in pure functional languages. That means that:
You can't have two-way lists, children in trees pointing to their parents, etc.*
It is usually not enough to change just one node. Any node that is changed requires changes in every node starting from the "root" of the data structures all the way to the node you wish to change.
The bottom line is, I wouldn't try to take a Java (or any other imperative language) algorithm and try to convert it to Haskell. Instead, try to find a more functional algorithm (and maybe even a different data structure) to solve the problem.
EDIT:
From your clarification it's not entirely clear whether you need to invalidate only the direct parent of the object that changed or all its ancestors in the hierarchy, but that doesn't actually matter that much. Since invalidating an object basically means changing it, and that's not possible, you have to create a modified duplicate of that object. Then you have to make its parent point to the duplicate too, so you have to create a new object for the parent as well. This goes on until you get to the root. If you have some recursion to traverse the tree in order to "modify" your object, then you can recreate the path from that object to the root on your way out of the recursion.
Hope that made sense. :s
*As pointed out in the comments by jberryman and in other answers, it is possible to create circular reference graphs in Haskell using lazy evaluation.
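Since the question started from Java, here is a small sketch of the path-rebuilding idea from the EDIT above, written in Java with an immutable node class. Everything in it (the Doc class, its fields, and addressing a node by a list of child indices) is an assumption made for illustration. The recursion descends to the target node and, on the way back out, builds a new copy of every node on the path with its valid flag cleared; subtrees off the path are shared rather than copied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Immutable document node: "modifying" it means building a new one.
final class Doc {
    final boolean valid;
    final int data;
    final List<Doc> children;

    Doc(boolean valid, int data, List<Doc> children) {
        this.valid = valid;
        this.data = data;
        this.children = Collections.unmodifiableList(new ArrayList<Doc>(children));
    }

    // Returns a new tree in which the node reached by following `path`
    // (child indices, starting from this node) carries newData and every
    // ancestor of that node on the path is marked invalid.
    Doc setData(List<Integer> path, int newData) {
        if (path.isEmpty()) {
            return new Doc(valid, newData, children);   // the target node itself
        }
        int i = path.get(0);
        List<Doc> copy = new ArrayList<Doc>(children);
        // Rebuild only the child on the path; its siblings are reused as-is.
        copy.set(i, children.get(i).setData(path.subList(1, path.size()), newData));
        return new Doc(false, data, copy);              // ancestor becomes invalid
    }
}

class DocDemo {
    public static void main(String[] args) {
        Doc leaf1 = new Doc(true, 1, Collections.<Doc>emptyList());
        Doc leaf2 = new Doc(true, 2, Collections.<Doc>emptyList());
        Doc root = new Doc(true, 0, Arrays.asList(leaf1, leaf2));

        Doc updated = root.setData(Arrays.asList(0), 3);

        System.out.println(root.valid + " " + root.children.get(0).data);
        System.out.println(updated.valid + " " + updated.children.get(0).data);
    }
}
```

The old root is untouched, so anything still holding it continues to see the valid, unmodified document.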
Look into using the Functor instance of the Maybe type.
For example, maybe your problem is something like this: you want to insert an element into a binary tree, but only if it isn't already present. You could do that with something like:
data Tree a = Node a (Tree a) (Tree a)
| Tip
maybeInsert :: Ord a => a -> Tree a -> Maybe (Tree a)
maybeInsert a Tip = Just $ Node a Tip Tip
maybeInsert a (Node a' l r)
| a == a' = Nothing
| a < a' = fmap (\l'-> Node a' l' r) (maybeInsert a l)
| a > a' = fmap (\r'-> Node a' l r') (maybeInsert a r)
So the function will return Nothing if we found the element to be already present, or return Just the new tree with the element inserted.
Hopefully that is relevant to whatever you are trying to do.
Couldn't laziness take care of making sure validation doesn't happen too often? That way, you don't need to store the m_valid field.
For example, if you only validate on save, then you can edit the objects to your heart's content without revalidating all the time; only when the user presses the 'Save' button is the value of validateDoc computed. Since I don't know for sure what your notion of valid means and what you need it for, I might be totally off the mark.
Untried & incomplete code:
data Document = Document { subDocs :: [SubDoc] }
data SubDoc = SubDoc { content :: String }
addSubDoc :: SubDoc -> (Document -> Document)
addSubDoc = error "not yet implemented: addSubDoc"
modifySubDoc :: Int -> (SubDoc -> SubDoc) -> (Document -> Document)
modifySubDoc = error "not yet implemented: modifySubDoc"
validateDoc :: Document -> Bool
validateDoc = all validateSubDoc . subDocs
validateSubDoc :: SubDoc -> Bool
validateSubDoc = not . null . content
I'm assuming the overall validity of the document depends only on the subdocuments (simulated here by ensuring that they contain a non-empty string).
By the way, I think you forgot a a.addChild(b); in main.
The following is the Xtext grammar for my DSL.
Model:
variableTypes=VariableTypes predicateTypes=PredicateTypes variableDeclarations=
VariableDeclarations rules=Rules;
VariableType:
name=ID;
VariableTypes:
'var types' (variableTypes+=VariableType)+;
PredicateTypes:
'predicate types' (predicateTypes+=PredicateType)+;
PredicateType:
name=ID '(' (variableTypes+=[VariableType|ID])+ ')';
VariableDeclarations:
'vars' (variableDeclarations+=VariableDeclaration)+;
VariableDeclaration:
name=ID ':' type=[VariableType|ID];
Rules:
'rules' (rules+=Rule)+;
Rule:
head=Head ':-' body=Body;
Head:
predicate=Predicate;
Body:
(predicates+=Predicate)+;
Predicate:
predicateType=[PredicateType|ID] '(' (terms+=Term)+ ')';
Term:
variable=Variable;
Variable:
variableDeclaration=[VariableDeclaration|ID];
terminal WS:
(' ' | '\t' | '\r' | '\n' | ',')+;
And, the following is a program in the above DSL.
var types
Node
predicate types
Edge(Node, Node)
Path(Node, Node)
vars
x : Node
y : Node
z : Node
rules
Path(x, y) :- Edge(x, y)
Path(x, y) :- Path(x, z) Path(z, y)
The following is my subclass of the generated Switch class that demonstrates the getPredicateType() returns null on a Predicate node.
public class ModelPrinter extends MyDSLSwitch<Object> {
protected Object visitChildren(EObject object) {
for (EObject eobj : object.eContents()) {
doSwitch(eobj);
}
return object;
}
@Override
public Object casePredicate(Predicate object) {
System.out.println(object.getPredicateType());
return object;
}
@Override
public Object defaultCase(EObject object) {
return visitChildren(object);
}
}
When I used the ModelPrinter class to traverse the EMF object model corresponding to the above program, I realized that the nodes are not linked together properly. For example, the getPredicateType() method on a Predicate node returns null. Having read the Xtext user's guide, my impression is that the Xtext default linking semantics should work for my DSL. But, for some reason, the AST nodes of my DSL don't get linked together properly. Can anyone help me in diagnosing this problem?
Finally, I figured out the problem. The links were not set properly because I wasn't loading the model properly. I had just used the parser to load the model. So, I didn't get the links. Therefore, I used the following code snippet from Xtext FAQ to load the model correctly. Then, I passed the returned model to my switch class.
// "workspace" is a string that contains the path to the workspace containing the DSL program.
new org.eclipse.emf.mwe.utils.StandaloneSetup().setPlatformUri(workspace);
Injector injector = new MyDslStandaloneSetup().createInjectorAndDoEMFRegistration();
XtextResourceSet resourceSet = injector.getInstance(XtextResourceSet.class);
resourceSet.addLoadOption(XtextResource.OPTION_RESOLVE_ALL, Boolean.TRUE);
// "DSLProgram" is a string that contains the path to the file of the DSL program relative to the workspace set above.
Resource resource = resourceSet.getResource(URI.createURI("platform:/resource/" + DSLProgram), true);
Model model = (Model) resource.getContents().get(0);
I have tried it out, but since I am not familiar with the Switch class, I used Xpand/Xtend instead to access the predicateTypes from each Predicate and generate their names.
Template.xpt:
«IMPORT myDsl»;
«DEFINE main FOR Model-»
«FILE "output.txt"-»
«FOREACH this.rules.rules.body.last().predicates AS p-»
«p.predicateType.name»
«ENDFOREACH-»
«ENDFILE-»
«ENDDEFINE»
and the output.txt:
Path
Path
I guess this is the expected behaviour.