The following is the Xtext grammar for my DSL.
Model:
variableTypes=VariableTypes predicateTypes=PredicateTypes variableDeclarations=VariableDeclarations rules=Rules;
VariableType:
name=ID;
VariableTypes:
'var types' (variableTypes+=VariableType)+;
PredicateTypes:
'predicate types' (predicateTypes+=PredicateType)+;
PredicateType:
name=ID '(' (variableTypes+=[VariableType|ID])+ ')';
VariableDeclarations:
'vars' (variableDeclarations+=VariableDeclaration)+;
VariableDeclaration:
name=ID ':' type=[VariableType|ID];
Rules:
'rules' (rules+=Rule)+;
Rule:
head=Head ':-' body=Body;
Head:
predicate=Predicate;
Body:
(predicates+=Predicate)+;
Predicate:
predicateType=[PredicateType|ID] '(' (terms+=Term)+ ')';
Term:
variable=Variable;
Variable:
variableDeclaration=[VariableDeclaration|ID];
terminal WS:
(' ' | '\t' | '\r' | '\n' | ',')+;
And, the following is a program in the above DSL.
var types
Node
predicate types
Edge(Node, Node)
Path(Node, Node)
vars
x : Node
y : Node
z : Node
rules
Path(x, y) :- Edge(x, y)
Path(x, y) :- Path(x, z) Path(z, y)
The following is my subclass of the generated Switch class; it demonstrates that getPredicateType() returns null on a Predicate node.
public class ModelPrinter extends MyDSLSwitch<Object> {
protected Object visitChildren(EObject object) {
for (EObject eobj : object.eContents()) {
doSwitch(eobj);
}
return object;
}
@Override
public Object casePredicate(Predicate object) {
System.out.println(object.getPredicateType());
return object;
}
@Override
public Object defaultCase(EObject object) {
return visitChildren(object);
}
}
When I used the ModelPrinter class to traverse the EMF object model corresponding to the above program, I realized that the nodes are not linked together properly. For example, the getPredicateType() method on a Predicate node returns null. Having read the Xtext user's guide, my impression is that the Xtext default linking semantics should work for my DSL. But, for some reason, the AST nodes of my DSL don't get linked together properly. Can anyone help me in diagnosing this problem?
Finally, I figured out the problem. The links were not set because I wasn't loading the model properly: I had used only the parser to load the model, so the cross-references were never resolved. I therefore used the following code snippet from the Xtext FAQ to load the model correctly, and then passed the returned model to my switch class.
// "workspace" is a string that contains the path to the workspace containing the DSL program.
new org.eclipse.emf.mwe.utils.StandaloneSetup().setPlatformUri(workspace);
Injector injector = new MyDslStandaloneSetup().createInjectorAndDoEMFRegistration();
XtextResourceSet resourceSet = injector.getInstance(XtextResourceSet.class);
resourceSet.addLoadOption(XtextResource.OPTION_RESOLVE_ALL, Boolean.TRUE);
// "DSLProgram" is a string that contains the path to the file of the DSL program relative to the workspace set above.
Resource resource = resourceSet.getResource(URI.createURI("platform:/resource/" + DSLProgram), true);
Model model = (Model) resource.getContents().get(0);
I have tried it out, but since I am not familiar with the Switch, I used Xpand/Xtend instead to access the predicateTypes from each Predicate and generate their names.
Template.xpt:
«IMPORT myDsl»;
«DEFINE main FOR Model-»
«FILE "output.txt"-»
«FOREACH this.rules.rules.body.last().predicates AS p-»
«p.predicateType.name»
«ENDFOREACH-»
«ENDFILE-»
«ENDDEFINE»
and the output.txt:
Path
Path
I guess this is the expected behaviour.
I'm working on a university project in which we have to develop a programming language from scratch. We use ANTLR 4 to generate the parse tree. I'm currently working on getting a for loop to work; I have the grammar and can extract the values. My current problem is how to loop over the statements in the body of the for loop.
Here is my grammar:
grammar LCT;
program: stmt*;
stmt: assignStmt
| invocationStmt
| show
| forStatement
;
assignStmt: VAR ID '=' expr;
invocationStmt: name=ID ((expr COMMA)* expr)?;
expr: ID | INT | STRING;
show: 'show' (INT | STRING | ID);
block : '{' statement* '}' ;
statement : block
| show
| assignStmt
;
forStatement : 'loop' ('(')? forConditions (')')? statement* ;
forConditions : iterator=expr 'from' startExpr=INT range='to' endExpr=INT ;
//tokens
COMMA: ',';
VAR: 'var';
INT: [0-9]+;
STRING: '"' (~('\n' | '"'))* '"';
ID: [a-zA-Z_] [a-zA-Z0-9_]*;
WS: [ \n\t\r]+ -> skip;
And this is the current listener that supports assigning and printing ints
package LCTlang;
import java.util.HashMap;
public class LCTCustomBaseListener extends LCTBaseListener {
HashMap<String, Integer> variableMap = new HashMap<>();
String[] keyWords = {"show", "var"};
@Override public void exitAssignStmt(LCTParser.AssignStmtContext ctx) {
this.variableMap.put(ctx.ID().getText(),
Integer.parseInt(ctx.expr().getText()));
}
@Override public void exitInvocationStmt(LCTParser.InvocationStmtContext ctx) {
this.variableMap.put(ctx.name.getText(),
Integer.parseInt(ctx.ID().getText()));
}
@Override
public void exitShow(LCTParser.ShowContext ctx) {
if(ctx.INT() != null) {
System.out.println(ctx.INT().getText());
}
else if(ctx.STRING() != null) {
// Print the string literal itself; ctx.ID() is null in this branch.
System.out.println(ctx.STRING().getText());
}
else if(ctx.ID() != null) {
System.out.println(this.variableMap.get(ctx.ID().getText()));
}
}
@Override public void exitForStatement(LCTParser.ForStatementContext ctx) {
int start = Integer.parseInt(ctx.forConditions().startExpr.getText());
int end = Integer.parseInt(ctx.forConditions().endExpr.getText());
String varName = ctx.forConditions().iterator.getText();
int i;
for (i = start; i < end; i++) {
for (LCTParser.StatementContext state : ctx.statement()){
System.out.println(state);
}
}
}
}
My problem is in the looping of the statements, and how that is done.
A listener is going to be a poor choice for execution. You've turned the tree navigation over to a tree walker (which hits each node only once) that calls you back when it encounters nodes you're interested in. You won't convince it to walk the child nodes of some iteration node (while, for, etc.) more than once, and that's pretty much the point of iteration structures. It won't detect that a node is a call to a function and then navigate to that function. It's JUST walking through the ParseTree.
For some fairly simple grammars (usually something like an expression evaluator, maybe a calculator), you could set up a visitor that returns whatever your expression datatype is (probably a Float for a calculator).
In your case, I'd suggest that ANTLR has provided its value. It's a Parser and has provided a ParseTree for you. Now it's up to you to write code that utilizes that parse tree to execute the functionality. You're now in the world of creating a runtime for your language. Thank ANTLR for making it easy to parse, and providing nice error messages and robust error recovery.
To execute your logic, you'll need to write code that uses those data structures, keeps up with the current value of variables, and, based on those values, decides to execute everything contained in that for/while/... loop. You'll have similar runtime work to evaluate boolean expressions to decide whether to execute children in if/else statements, etc. This runtime will also have to keep up with call stacks of functions calling other functions, etc. In short, executing your resulting logic will involve referencing the parsed input, but won't particularly look like navigating your parse Tree.
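A minimal sketch of that idea in plain Java, with no ANTLR dependency. The names `Stmt`, `Show`, `Loop`, and `MiniRuntime` are hypothetical, and the small statement tree stands in for whatever structure you build from the parse tree. The point it illustrates is that the runtime, not the tree walker, decides how many times the loop body is executed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical statement tree; in a real project it would be built from the ANTLR parse tree.
interface Stmt {
    void exec(MiniRuntime rt);
}

// 'show <name>': records the current value of a variable.
final class Show implements Stmt {
    private final String name;
    Show(String name) { this.name = name; }
    public void exec(MiniRuntime rt) {
        rt.output.add(name + "=" + rt.vars.get(name));
    }
}

// 'loop <var> from <start> to <end> <body>': runs the body once per iteration.
final class Loop implements Stmt {
    private final String var;
    private final int start;
    private final int end;
    private final List<Stmt> body;
    Loop(String var, int start, int end, List<Stmt> body) {
        this.var = var; this.start = start; this.end = end; this.body = body;
    }
    public void exec(MiniRuntime rt) {
        for (int i = start; i < end; i++) {   // end is exclusive, as in the question's listener
            rt.vars.put(var, i);
            for (Stmt s : body) {
                s.exec(rt);                   // the same body nodes are executed again
            }
        }
    }
}

// Runtime state: variable values plus the collected output lines.
final class MiniRuntime {
    final Map<String, Integer> vars = new HashMap<>();
    final List<String> output = new ArrayList<>();

    static List<String> run(List<Stmt> program) {
        MiniRuntime rt = new MiniRuntime();
        for (Stmt s : program) {
            s.exec(rt);
        }
        return rt.output;
    }
}
```

Running a program that consists of a single `Loop("i", 0, 3, ...)` whose body is one `Show("i")` executes that body three times, which a single listener pass over the tree cannot do.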
(Note: many people find a full parse tree to be a bit tedious to navigate (it tends to have a lot of intermediate nodes). In the past, I've written a Visitor to produce something more like an AST (Abstract Syntax Tree). It's a trimmed down tree that has the structure I want for further processing. This is not necessarily required, but you may find it easier to work with.)
I am using ANTLR 4 to parse Java with the Java.g4 grammar file. The parser rule I am working with is:
typeArgument
: typeType
| '?' (('extends' | 'super') typeType)?
;
I have implemented the visitor method for this parser rule in the following way:
public String visitTypeArgument(JavaParser.TypeArgumentContext ctx) {
StringBuilder typArg = new StringBuilder();
if(ctx.getChild(0).getText().equalsIgnoreCase("?")){
// '?' (('extends' | 'super') typeType)?
typArg.append("?").append(" ");
TypeTypeContext typTypCtx = ctx.typeType();
if(typTypCtx != null){
typArg.append(ctx.getChild(1).getText()).append(" "); // <- Confusion is here
typArg.append(this.visitTypeType(typTypCtx));
}
}
else{
TypeTypeContext typTypCtx = ctx.typeType();
typArg.append(this.visitTypeType(typTypCtx));
}
return typArg.toString();
}
I have indicated the confusion in my code with a comment. I am parsing a typeArgument like <? extends SomeClassIdentifier>.
Why does ctx.getChild(1).getText() return "extends" instead of "extends SomeClassIdentifier"?
According to the rule '?' (('extends' | 'super') typeType)? there should be only two child contexts, i.e. one for ? and another for ('extends' | 'super') typeType. Please help me clear up my confusion!
According to the rule '?' (('extends' | 'super') typeType)? there should be only two child contexts, i.e. one for ? and another for ('extends' | 'super') typeType.
I don't think this is correct. Without seeing more of your grammar, I think you should get three children from this rule, assuming the optional (?) phrase is present in the input text:
? as an implicit lexer token
either extends or super as an implicit lexer token
typeType as a child context of its own, possibly with its own set of children, since your rule is recursive in that typeType can itself contain a typeType
Does that help? Examine the tree of children and I think it will make sense. The context tree for right-recursive rules can get pretty deep depending on your input text.
I am currently implementing cross-referencing for my Xtext DSL. A DSL file can contain more than one XImportSection, and in some special cases an XImportSection does not necessarily contain all import statements. This means I need to customize the "XImportSectionNamespaceScopeProvider" to find/build the correct XImportSection. During the implementation I ran into unexpected behavior of the editor and/or some validation.
I used the following DSL code snippet for testing my implementation:
delta MyDelta {
adds {
package my.pkg;
import java.util.List;
public class MyClass
implements List
{
}
}
modifies my.pkg.MyClass { // (1)
adds import java.util.ArrayList;
adds superclass ArrayList<String>;
}
}
The dsl source code is described by the following grammar rules (not complete!):
AddsUnit:
{AddsUnit} 'adds' '{' unit=JavaCompilationUnit? '}';
ModifiesUnit:
'modifies' unit=[ClassOrInterface|QualifiedName] '{'
modifiesPackage=ModifiesPackage?
modifiesImports+=ModifiesImport*
modifiesSuperclass=ModifiesInheritance?
'}';
JavaCompilationUnit:
=> (annotations+=Annotation*
'package' name=QualifiedName EOL)?
importSection=XImportSection?
typeDeclarations+=ClassOrInterfaceDeclaration;
ClassOrInterfaceDeclaration:
annotations+=Annotation* modifiers+=Modifier* classOrInterface=ClassOrInterface;
ClassOrInterface: // (2a)
ClassDeclaration | InterfaceDeclaration | EnumDeclaration | AnnotationTypeDeclaration;
ClassDeclaration: // (2b)
'class' name=QualifiedName typeParameters=TypeParameters?
('extends' superClass=JvmTypeReference)?
('implements' interfaces=Typelist)?
body=ClassBody;
To provide better tool support, a ModifiesUnit references the class which is modified. This Xtext specific implementation enables hyperlinking to the class.
I am currently working on a customized XImportSectionScopeProvider which provides all namespace scopes for a ModifiesUnit. The default implementation contains a method protected List<ImportNormalizer> internalGetImportedNamespaceResolvers(EObject context, boolean ignoreCase) which assumes that there is only one class-like element in a source file. But in my language there can be more than one. For this reason I have to customize it.
My idea now is the following implementation (using the Xtend programming language):
override List<ImportNormalizer> internalGetImportedNamespaceResolvers(EObject context, boolean ignoreCase) {
switch (context) {
ModifiesUnit: context.buildImportSection
default: // ... anything else
}
}
Before I started this work, the reference worked fine and nothing unexpected happened. My goal now is to build a customized XImportSection for the ModifiesUnit which is used by Xbase to resolve references to JVM types. To do that, I need a copy of the XImportSection of the referenced ClassOrInterface. To get access to the XImportSection, I first call ModifiesUnit.getUnit(). Directly after this call is executed, the editor shows the unexpected behaviour. The minimal implementation which leads to the error looks like this:
def XImportSection buildImportSection(ModifiesUnit u) {
val ci = u.unit // Since this expression is executed, the error occurs!
// ...
}
Here, I don't know what is going on internally, but it reports an error. The editor shows the following error on the qualified name at (1): "Cyclic linking detected : ModifiesUnit.unit->ModifiesUnit.unit".
My questions are: What does it mean? Why does Xtext show this error? Why does it appear if I access the referenced object?
I also figured out a strange thing there: In my first approach my code threw a NullPointerException. Ok, I tried to figure out why by printing the object ci. The result is:
org.deltaj.scoping.deltaJ.impl.ClassOrInterfaceImpl@4642f064 (eProxyURI: platform:/resource/Test/src/My.dj#xtextLink_::0.0.0.1.1::0::/2)
org.deltaj.scoping.deltaJ.impl.ClassDeclarationImpl@1c70366 (name: MyClass)
OK, it seems that this method is executed twice and that Xtext resolves the proxy between the first and the second execution. That is fine for me as long as the received object is the correct one at some point. I handle it with an if-instanceof statement.
But why do I get two references there? Does it rely on the ParserRule ClassOrInterface (2a) which only is an abstract super rule of ClassDeclaration (2b)? But why is Xtext not able to resolve the reference for the ClassOrInterface?
OK, now I found a solution for my problem. While I was experimenting with my implementation, I saw that the "Problems" view still contained unresolved references. This made me rethink what my implementation did. First, I decided to build the returned list List<ImportNormalizer> directly instead of building an XImportSection which would then be converted to this list. While implementing this, I noticed that I had built the scope only for ModifiesUnit elements instead of for the elements within a ModifiesUnit which need the scope. This is the reason for the cyclic linking error. Now, I am building the list only if it is needed. The result is that the cyclic linking error does not occur any more and all references to JVM types are resolved correctly without any errors in the Problems view.
My implementation now looks like this:
class DeltaJXImportSectionNamespaceScopeProvider extends XImportSectionNamespaceScopeProvider {
override List<ImportNormalizer> internalGetImportedNamespaceResolvers(EObject context, boolean ignoreCase) {
// A scope will only be provided for elements which really need a scope. A scope is only necessary for elements
// which are contained in a JavaCompilationUnit or a ModifiesUnit.
if (context.checkElement) { // (1)
return Collections.emptyList
}
// Finding the container which contains the import section
val container = context.jvmUnit // (2)
// For a non null container create the import normalizer list depending of returned element. If the container is
// null, no scope is needed.
return if (container != null) { // (3)
switch (container) {
JavaCompilationUnit: container.provideJcuImportNormalizerList(ignoreCase)
ModifiesUnit: container.provideMcuImportNormalizerList(ignoreCase)
}
} else {
Collections.emptyList
}
}
// Iterates upwards through the AST until a ModifiesUnit or a JavaCompilationUnit is found. (2)
def EObject jvmUnit(EObject o) {
switch (o) {
ModifiesUnit: o
JavaCompilationUnit: o
default: o.eContainer.jvmUnit
}
}
// Creates the list with all imports of a JCU (3a)
def List<ImportNormalizer> provideJcuImportNormalizerList(JavaCompilationUnit jcu, boolean ignoreCase) {
val is = jcu.importSection
return if (is != null) {
is.getImportedNamespaceResolvers(ignoreCase)
} else {
Collections.emptyList
}
}
// Creates the list of all imports of a ModifiesUnit. This implementation is similar to
// getImportedNamespaceResolvers(XImportSection, boolean) // (3b)
def List<ImportNormalizer> provideMcuImportNormalizerList(ModifiesUnit mu, boolean ignoreCase) {
val List<ImportNormalizer> result = Lists.newArrayList
result.addAll((mu.unit.jvmUnit as JavaCompilationUnit).provideJcuImportNormalizerList(ignoreCase))
for (imp : mu.modifiesImports) {
if (imp instanceof AddsImport) {
val decl = imp.importDeclaration
if (!decl.static) {
result.add(decl.transform(ignoreCase))
}
}
}
result
}
// Creates an ImportNormalizer for a given XImportDeclaration
def ImportNormalizer transform(XImportDeclaration decl, boolean ignoreCase) {
var value = decl.importedNamespace
if (value == null) {
value = decl.importedTypeName
}
return value.createImportedNamespaceResolver(ignoreCase)
}
// Determines whether an element can be skipped because it does not need a scope. (1)
def checkElement(EObject o) {
return o instanceof DeltaJUnit || o instanceof Delta || o instanceof AddsUnit || o instanceof ModifiesUnit ||
o instanceof RemovesUnit
}
}
As one can see, elements which do not need namespaces for correct scopes are ignored (1).
For each element which might need namespaces for a correct scope, the ancestor element which directly contains the imports is determined (2).
With the correct ancestor element, the namespace list can be calculated (3) for JCUs (3a) and MUs (3b).
I have a small problem with the output of ANTLR.
I have a really small grammar which looks like this:
test : states;
states : '.states' state+;
state : stateID=ID {
System.out.println("state: " + $stateID.text);}
| stateID=ID '{' state* '}' {
System.out.println("SubState: " + $stateID.text);};
And what I want to parse looks like this:
a{
b
c{
d
}
}
Well, the problem is, the first token I'll get is 'b' followed by 'd' and then 'c'.
But my intention is to parse it into my datastructure and I need to know their parents.
What I know by this order is, c is the parent of d, but what about b?
If I rewrote the example to this form:
a{
c{
d
}
b
}
Everything is fine. So is there a way to know who is the parent of b, without having the constraint to write it in the last example?
In ANTLR 4 using grammar-actions is no longer recommended. The parser may visit and test different rules and alternatives in unexpected orders, so unless you're adding error-handling code it's better to let the process run normally and then inspect the result.
So you let the parser create its tree, and then write a custom listener that will emit your println calls at each step. For example, suppose you're working with a grammar called Foo, so that ANTLR autogenerates a FooBaseListener class.
So first you'd make something like:
public class PrintingFooListener extends FooBaseListener {
@Override
public void enterState(FooParser.StateContext ctx)
{
// It is possible to get all sorts of token/subrule/text
// information from the ctx input, especially if you labeled
// the parser/lexer rules.
System.out.println("I entered State");
}
}
Then use the ParseTreeWalker utility class to navigate through the parse tree with your listener in tow:
// Assume lexing, etc. already done before this point
ParserRuleContext tree = parser.myMainRule(); // Do parse
ParseTreeWalker walker = new ParseTreeWalker(); // Premade utility class
PrintingFooListener listener = new PrintingFooListener(); // Your customized subclass
walker.walk(listener, tree);
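For the original question (who is the parent of `b`?), one way to recover parents is to keep a stack that is pushed in the enter callback and popped in the exit callback. The sketch below is plain Java with no ANTLR dependency. `ParentTracker`, `enter`, and `exit` are hypothetical names standing in for the logic you would put into the `enterState`/`exitState` methods of a generated listener, and the calls in `demo` replay the order in which a walker would visit `a{ b c{ d } }`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: the logic that would live in enterState/exitState of a listener.
final class ParentTracker {
    private final Deque<String> open = new ArrayDeque<>();        // states currently being walked
    private final Map<String, String> parentOf = new LinkedHashMap<>();

    // Call from enterState with the state's ID text.
    void enter(String id) {
        parentOf.put(id, open.peek());   // peek() is null for a top-level state
        open.push(id);
    }

    // Call from exitState.
    void exit() {
        open.pop();
    }

    String parent(String id) {
        return parentOf.get(id);
    }

    // Replays the enter/exit order a tree walker produces for: a{ b c{ d } }
    static ParentTracker demo() {
        ParentTracker t = new ParentTracker();
        t.enter("a");
        t.enter("b"); t.exit();
        t.enter("c");
        t.enter("d"); t.exit();
        t.exit();
        t.exit();
        return t;
    }
}
```

Because the parent is recorded on entry, the result does not depend on whether `b` is written before or after `c{ d }`.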
I've got the following problem: I have a tree of objects of different classes where an action in the child class invalidates the parent. In imperative languages, it is trivial to do. For example, in Java:
public class A {
private List<B> m_children = new LinkedList<B>();
private boolean m_valid = true;
public void invalidate() {
m_valid = false;
}
public void addChild(B child) {
m_children.add(child);
child.m_parent = this;
}
}
public class B {
public A m_parent = null;
private int m_data = 0;
public void setData(int data) {
m_data = data;
m_parent.invalidate();
}
}
public class Main {
public static void main(String[] args) {
A a = new A();
B b = new B();
b.setData(0); //invalidates A
}
}
How do I do the above in Haskell? I cannot wrap my mind around this, since once I construct an object in Haskell, it cannot be changed.
I would be much obliged if the relevant Haskell code is posted.
EDIT: the problem I am trying to solve is the following:
I have an application that edits documents. A document is a hierarchy of objects. When properties of child objects are modified, the document needs to be set to an invalid state, so that the user knows that the document needs to be validated.
Modifying a tree which might require frequent excursions up the path to the root and back seems like the perfect job for a variant of the Zipper data structure with "scars", in the terminology of the original paper by Huet; the code samples from the paper also suggest a name of "memorising zipper". Of course, with some care, a regular zipper could also be used, but the augmented version might be more convenient and/or efficient to use.
The basic idea is the same as that behind a regular zipper, which already allows one to move up and down a tree in a purely functional manner (without any explicit back-pointers), but a "go up" operation followed by a "go down" operation becomes a no-op, leaving the focus at the original node (whereas with the regular zipper it would move it to the leftmost sibling of the original node).
Here's a link to the paper: Gérard Huet, Functional Pearl: The Zipper. It's just six pages, but the ideas contained therein are of great usefulness to any functional programmer.
To answer the question in your title: Yes, you can create nodes which have links to their parents as well as their children. Example:
-- parent children
data Tree = Node (Maybe Tree) [Tree]
root = Node Nothing [a,b] -- I can "forward reference" a and b because Haskell is lazy
a = Node (Just root) []
b = Node (Just root) []
The question is whether that's useful for your particular use-case (often times it isn't).
Now the question in your body: You're right, you can't change a value after it's been created. So once you have a valid tree, you'll always have a valid tree as long as the variable referencing that tree is in scope.
You didn't really describe what problem you're trying to solve, so I can't tell you how to functionally model what you're trying to do, but I'm sure there's a way without mutating the tree.
Here is some zipper code that demonstrates easy modification of the data a cursor points at as well as a "global" property of the tree. We build a tree, move the cursor to the node initially containing a 1, change it to a 3, and are left with a cursor pointing at that node in a fully updated tree.
import Data.Maybe (fromJust)
import Data.Tree
import Data.Tree.Zipper
type NodeData = Either Bool Int
type TreePath a = [TreePos Full a -> TreePos Full a]
firstChild' = fromJust . firstChild
parent' = fromJust . parent
prev' = fromJust . prev
next' = fromJust . next
-- Determine the path from the root of the tree to the cursor.
pathToMe :: TreePos Full NodeData -> TreePath NodeData
pathToMe t | isRoot t = []
| isFirst t = firstChild' : pathToMe (parent' t)
| otherwise = next' : pathToMe (prev' t)
-- Mark a tree as invalid, but leave the cursor in the same place.
invalidate :: TreePos Full NodeData -> TreePos Full NodeData
invalidate t = foldr ($) (setLabel (Left False) (root t)) (pathToMe t)
-- Set a node's internal data.
setData :: Int -> TreePos Full NodeData -> TreePos Full NodeData
setData = (invalidate . ) . setLabel . Right
main = let tree1 = Node (Left True) [Node (Right 1) [], Node (Right 2) []]
Just cursor = firstChild (fromTree tree1)
tree2 = setData 3 cursor
in do putStrLn (drawTree (fmap show tree1))
putStrLn (drawTree (fmap show (toTree tree2)))
putStrLn $ "Cursor at "++show (label tree2)
Output:
Left True
|
+- Right 1
|
`- Right 2
Left False
|
+- Right 3
|
`- Right 2
Cursor at Right 3
I don't have much experience with Haskell, but as far as I know it's not possible to have cycles in the reference graph in pure functional languages. That means that:
You can't have doubly linked lists, children in trees pointing to their parents, etc.*
It is usually not enough to change just one node. Any node that is changed requires changes in every node starting from the "root" of the data structures all the way to the node you wish to change.
The bottom line is, I wouldn't try to take a Java (or any other imperative language) algorithm and try to convert it to Haskell. Instead, try to find a more functional algorithm (and maybe even a different data structure) to solve the problem.
EDIT:
From your clarification it's not entirely clear whether you need to invalidate only the direct parent of the object that changed or all its ancestors in the hierarchy, but that doesn't actually matter that much. Since invalidating an object basically means changing it, and that's not possible, you have to create a modified duplicate of that object. Then you have to make its parent point to the duplicate too, so you have to create a new object for the parent as well. This goes on until you get to the root. If you have some recursion to traverse the tree in order to "modify" your object, then you can recreate the path from that object to the root on your way out of the recursion.
Hope that made sense. :s
*As pointed out in the comments by jberryman and in other answers, it is possible to create circular reference graphs in Haskell using lazy evaluation.
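The path-copying idea from the edit above can be sketched as follows. Java is used here only to mirror the question's code. `Doc`, `Item`, and `withData` are hypothetical names, and the objects are immutable, so "invalidating" means building a new parent that shares the untouched children.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Immutable child: changing its data yields a new Item.
final class Item {
    final int data;
    Item(int data) { this.data = data; }
}

// Immutable parent holding the validity flag and the children.
final class Doc {
    final boolean valid;
    final List<Item> children;

    Doc(boolean valid, List<Item> children) {
        this.valid = valid;
        this.children = Collections.unmodifiableList(new ArrayList<>(children));
    }

    // "Modifies" child number index: copies the path (here just Doc -> Item),
    // reuses all other children, and marks the new Doc as invalid.
    Doc withData(int index, int data) {
        List<Item> copy = new ArrayList<>(children);
        copy.set(index, new Item(data));
        return new Doc(false, copy);
    }
}
```

The old `Doc` stays valid and unchanged; callers simply continue with the value returned by `withData`. With a deeper hierarchy the same step is repeated once per level on the way back to the root.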
Look into using the Functor instance of the Maybe type.
For example, maybe your problem is something like this: you want to insert an element into a binary tree, but only if it isn't already present. You could do that with something like:
data Tree a = Node a (Tree a) (Tree a)
| Tip
maybeInsert :: Ord a => a -> Tree a -> Maybe (Tree a)
maybeInsert a Tip = Just $ Node a Tip Tip
maybeInsert a (Node a' l r)
| a == a' = Nothing
| a < a' = fmap (\l'-> Node a' l' r) (maybeInsert a l)
| a > a' = fmap (\r'-> Node a' l r') (maybeInsert a r)
So the function will return Nothing if we found the element to be already present, or return Just the new tree with the element inserted.
Hopefully that is relevant to whatever you are trying to do.
Couldn't laziness take care of making sure validation doesn't happen too often? That way, you don't need to store the m_valid field.
For example, if you only validate on save, then you can edit the objects to your heart's content without revalidating all the time; only when the user presses the 'Save' button is the value of validateDoc computed. Since I don't know for sure what your notion of valid means and what you need it for, I might be totally off the mark.
Untried & incomplete code:
data Document = Document { subDocs :: [SubDoc] }
data SubDoc = SubDoc { content :: String }
addSubDoc :: SubDoc -> (Document -> Document)
addSubDoc = error "not yet implemented: addSubDoc"
modifySubDoc :: Int -> (SubDoc -> SubDoc) -> (Document -> Document)
modifySubDoc = error "not yet implemented: modifySubDoc"
validateDoc :: Document -> Bool
validateDoc = all validateSubDoc . subDocs
validateSubDoc :: SubDoc -> Bool
validateSubDoc = not . null . content
I'm assuming the overall validity of the document depends only on the subdocuments (simulated here by ensuring that they contain a non-empty string).
By the way, I think you forgot a a.addChild(b); in main.