Where to Add Checks for Validity in a Binary Search Tree


I know this is a really specific question, but I wasn't quite sure how to phrase it more broadly, so if you recognize the broader question here, please answer it. I have implemented most of a BinarySearchTree class, but I'm not sure where to put the checks to make sure that it is a valid BinarySearchTree. Currently I have no checks, but I am considering checking before every relevant operation on that tree (i.e. check before searching, check before doing a tree walk, check before minimum, etc) as well as when the tree is constructed. If the tree isn't valid then I will throw an IllegalArgumentException. However, before I implemented this I considered instead making a new BinarySearchTreeNode class which extends BinaryTreeNode (which is the class I am currently using) and instead making the checks in the BinarySearchTreeNode class (on construction and whenever the children are set). So, I don't need an implementation of any of this (I already have an isValidBinaryTree method), but which is best practice: never checking if the tree is valid, checking if the tree is valid during the construction of the tree and before each method call on the tree, or making a BinarySearchTreeNode class which checks during construction and whenever the children are set?
Just for reference you don't really need to know what a binary search tree is. It is just a type of binary tree (each node has 0-2 children) where the value of the left child is <= the value of the parent and the value of the right child is >= the value of the parent for all nodes in the tree (this is what I refer to as a valid tree).

Your class will have contracts. A reasonable expectation for a binary search tree would be the contract that every such tree is indeed a binary search tree. This is called an invariant.
Every operation that manipulates such a tree should do it in such a way that this invariant is never broken.
This means that for every manipulation, you need to make sure that when the method finishes, the object still represents a binary search tree with all invariants intact.
Of course, you can make this easy or hard for yourself through your choice of API. If your tree exposes methods to the public that allow the invariants to be broken, then your design is broken.
So, you should design your public API in such a way that you can indeed keep the invariants for every public method.
For example, the empty tree is indeed a binary search tree. If your add and remove methods make sure that the invariants hold (and no other methods that manipulate the state of the tree exist), then there is no reason at all to check if these invariants are true before searching. You can write a proof that they must be true at that point, if you did it right.
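As a minimal sketch of that idea (the class and method names here are hypothetical, not taken from the question), a tree whose nodes are private and whose only mutator is add cannot be put into an invalid state by callers, so contains can rely on the ordering without checking it:

```java
/** Sketch only: nodes are private, so outside code cannot break the ordering. */
class IntSearchTree {
    // Hypothetical node type; private so callers cannot rewire children.
    private static final class Node {
        final int value;
        Node left, right;

        Node(int value) {
            this.value = value;
        }
    }

    private Node root; // null is the empty tree, which is trivially valid

    /** The only mutator; it places each value so the ordering still holds. */
    void add(int value) {
        root = add(root, value);
    }

    private static Node add(Node node, int value) {
        if (node == null) {
            return new Node(value);
        }
        if (value <= node.value) {
            node.left = add(node.left, value);   // duplicates go left
        } else {
            node.right = add(node.right, value);
        }
        return node;
    }

    /** Relies on the invariant instead of re-validating the tree. */
    boolean contains(int value) {
        Node current = root;
        while (current != null) {
            if (value == current.value) {
                return true;
            }
            current = value < current.value ? current.left : current.right;
        }
        return false;
    }
}
```

Because there is no setter for children and no constructor that accepts pre-built nodes, there is nothing for a validity check to catch at query time.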

In my opinion, the most logical time to check whether your tree (or any other data structure) is valid is when it changes.
For a binary search tree, check validity each time a node is inserted or removed.
If you construct the tree simply by inserting nodes, then you get the validity check during construction for free.
If construction is done some other way (e.g., blindly copying some data), then you'd want to check validity right after construction as well.
It wouldn't make much sense to do a validity check each time a search tree is queried, since querying doesn't change it (and you can be sure that the tree structure is good if you do the checks when changing it).
Also, if possible, check if a modification would result in an invalid structure before actually doing the modification (and throw an exception instead).
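For the case where the tree is built from existing nodes rather than by insertion, a one-time check in the constructor might look like the sketch below. BinaryTreeNode is the class name from the question, but its fields and constructor here are assumptions, as is the choice to make nodes immutable so that the check cannot be invalidated later. Note that comparing each child only with its direct parent is not enough; every value in a subtree has to stay within the bounds set by all of its ancestors.

```java
/** Assumed shape of the question's node class; immutable in this sketch. */
class BinaryTreeNode {
    final int value;
    final BinaryTreeNode left, right;

    BinaryTreeNode(int value, BinaryTreeNode left, BinaryTreeNode right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }
}

class BinarySearchTree {
    private final BinaryTreeNode root;

    /** Wraps a pre-built structure, so it is validated once, right here. */
    BinarySearchTree(BinaryTreeNode root) {
        if (!isValid(root, Long.MIN_VALUE, Long.MAX_VALUE)) {
            throw new IllegalArgumentException("not a binary search tree");
        }
        this.root = root;
    }

    /** True if every value in the subtree lies within [min, max]. */
    static boolean isValid(BinaryTreeNode node, long min, long max) {
        if (node == null) {
            return true; // the empty tree is valid
        }
        if (node.value < min || node.value > max) {
            return false;
        }
        // Left values may not exceed this node; right values may not be below it.
        return isValid(node.left, min, node.value)
            && isValid(node.right, node.value, max);
    }

    boolean isEmpty() {
        return root == null;
    }
}
```

With this arrangement the query methods need no checks of their own, since no operation can change the nodes after construction.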

Related

Can yacc be used to generate three address code for Java 1?

I have read that yacc generates bottom up parser for LALR(1) grammars. I have a grammar for Java 1 that can be used for generating three address code and is strictly LALR(1), but the translation scheme I am employing makes it L-attributed. Now I have read that L-attributed LR grammars cannot be translated during bottom up parsing. So, can yacc be used here or not? And if yes, how does yacc get around this problem?

You're not going to get a good answer unless you ask a specific, detailed question. Here's a vague sketch of an approach.
Synthesized attributes are obviously not a problem for a bottom-up parser, since they are computed in the final reduction action for the corresponding non-terminal. So the question comes down to "How can a bottom-up parser compute inherited attributes?"
Since the grammar is L-attributed, we know that any inherited attribute is computed from the attributes of its left siblings. Yacc/bison allows actions to be inserted anywhere in a right-hand side, and these "Mid-Rule Actions" (MRAs) are executed as they are encountered. A MRA has available to it precisely its left-siblings, which is all that is needed to compute an inherited attribute.
However, that doesn't show how the attribute can actually be inherited. A MRA inserted just before a grammar symbol in some rule can certainly be used to partially compute an inherited attribute of that symbol, but an inherited attribute can also use synthesized attributes of the children.
To accomplish that, we need to do two things:
1. Insert a MRA just before the non-terminal, which gathers together the left-sibling attributes. Since MRAs are also grammar symbols, this MRA will be the last left-sibling, in effect the youngest uncle of the non-terminal's children. (You don't necessarily need a MRA; you can insert a "marker": a non-terminal whose only production is empty and whose action is the MRA body. But that's not so convenient because the action will have to get at the semantic values of the preceding grammar symbols. Or you could split the production into two pieces, so that both actions are final.)
2. Access the uncle's attributes in the non-terminal's reduction action.
Bison/yacc allow the second step by letting you use a non-positive symbol index to refer to slots in the parser stack. In particular, $0 refers to the symbol immediately preceding the non-terminal in the parent production (what I called the uncle above). Of course, for that to work, you have to ensure that the uncle is the same non-terminal (or at least has the same semantic type) in every production in which the non-terminal appears. This may require adding some markers.
Speaking of semantic values, you may be able to satisfy yourself that all the uncles of a given non-terminal are the same, or at least have the same type. But bison does not do this analysis, so it can't warn you if you get it wrong. Be careful! And as another consequence, you have to tell bison what the type is, so you can't just write $0: you need $<tag>0.
Note:
It is not always possible to handle inherited attributes in an L-attributed LR grammar, because at the moment in which the non-terminal is encountered, the parser may not yet know that the non-terminal will in fact form part of the parse tree. This problem does not occur in LL grammars, because in LL parsing the parser can only predict a non-terminal which is guaranteed to be present in the parse if the rest of the input is valid.
Any LL grammar can be parsed bottom-up, so there is no problem with L-attributed LL grammars. But the bottom-up parser can do better than that; it doesn't require that the full grammar be LL. Only those decision points for non-terminals which are about to be assigned an inherited attribute need to be LL-deterministic.
This restriction is enforced by the technique of placing a MRA or a marker immediately before the non-terminal. In other words, adding a marker (or an MRA) at certain points of an LR grammar might invalidate the LR property. There is a good discussion of this issue in the bison manual, so I won't elaborate on it here, except to observe one detail.
The technique outlined above for propagating inherited attributes uses MRAs (or markers) at strategic points to hold the inherited attributes. These productions must be reduced in order to proceed with the parse, so as noted in the above-mentioned section of the bison manual it may be necessary to rearrange the grammar in order to remove conflicts. In rare cases, this rewriting is not even possible.
However, removing the conflict might still result in a grammar in which an inherited attribute is propagated in case some non-terminal needs the value, without any guarantee that the non-terminal will eventually be reduced. In this case, the inherited attribute will be needlessly computed and then later ignored. But that shouldn't be a problem. Inherent in the concept of attributes is the idea that attributes are functional; in other words, that the computation is free of side-effects.
The absence of side effects means that an attribute grammar parser should be free to evaluate attributes in any order which respects attribute dependency. In particular, this means that you can trivially achieve correct evaluation of attributes by turning the attribute computations into continuations, a technique sometimes referred to as lazy evaluation or "thunking".
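To make "thunking" concrete, here is a small sketch in Java rather than in yacc's C actions. The Lazy class and the depth attributes are hypothetical names invented for illustration. Each attribute is wrapped in a memoizing thunk, so it is computed at most once and only if something actually asks for its value:

```java
import java.util.function.Supplier;

/** A memoizing thunk: the computation runs at most once, on first demand. */
final class Lazy<T> implements Supplier<T> {
    private Supplier<T> computation; // cleared after the first evaluation
    private T value;

    Lazy(Supplier<T> computation) {
        this.computation = computation;
    }

    @Override
    public T get() {
        if (computation != null) {
            value = computation.get();
            computation = null;
        }
        return value;
    }
}

class LazyAttributeDemo {
    static int evaluations = 0; // counts how often the sibling attribute is computed

    public static void main(String[] args) {
        // Hypothetical attribute of a left sibling.
        Lazy<Integer> siblingDepth = new Lazy<>(() -> {
            evaluations++;
            return 1;
        });
        // Hypothetical inherited attribute that depends on the sibling's value.
        Lazy<Integer> inheritedDepth = new Lazy<>(() -> siblingDepth.get() + 1);

        System.out.println(evaluations);          // nothing has been forced yet
        System.out.println(inheritedDepth.get()); // forces both thunks
        System.out.println(evaluations);
    }
}
```

If the non-terminal that would have used inheritedDepth is never reduced, the thunk is simply never forced, which is only safe because the computation has no side effects.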
But there is always the temptation to use MRAs precisely in order to perform side-effects. One very common such side effect is printing three-address code to the output stream. Another one is mutating persistent data structures such as symbol tables. That's no longer L-attributed parsing, and so the suggestions offered here might not work for such applications.

Can we directly point to a node from Eclipse AST instead of visiting all the nodes

I am trying to parse a Java file using Eclipse JDT's AST. ASTVisitor provides a nice API to traverse all the nodes and work with the node that we want. What I want to know is: can we go directly to a target node, let's say of type MethodDeclaration, or to all the nodes of that type, instead of traversing all the nodes? This would save time if I have to get all the nodes of a particular type in a whole package. Thanks in advance.

Finding all nodes of a given type inherently is traversing. ASTVisitor is suitable for this exact task.
If you are concerned about unnecessary traversal below the node you are interested in, just return false from the corresponding visit() method, and the visitor will not descend into children of the current node.
I'd be surprised, though, if traversing actually were a performance bottleneck. Creating the AST in the first place is more expensive than that.
If you only want to address few nodes (identified, e.g., by a name pattern), then performing a search (which relies on an index) could perhaps be faster, but this probably pays off only if a significant number of files can be skipped entirely.
Finally, as you mention MethodDeclaration: perhaps you don't even need the AST, and the Java Model (which is much more lightweight) is sufficient for your task?

Java tree with N leaves?

I'm solving a Java problem which needs a Java implementation of a tree with N leaves. For that, I decided to use an XML DOM tree to represent the problem.
Is that possible with Dom4j?
The problem is basically a game tree which represents all moves, used to calculate the minimum number of moves required to win the game. Is there any useful sample for Dom4j? Thanks.

I will attempt to answer but with some considerations in mind:
First of all, to directly answer your question about whether it is possible to use an XML DOM tree to represent the problem: yes, it would be possible, but in my opinion using an XML DOM tree to represent a data structure is not very natural (in fact I would call it overkill) unless you want to serialise that data structure to disk or across the network as the result of an API call.
Examples are present here : http://dom4j.sourceforge.net/dom4j-1.6.1/guide.html
A cleaner and simpler alternative would be to define a Node class like the one below:
import java.util.List;

class Node
{
    int value; // assuming the node holds an integer value
    List<Node> childNodes; // these are the N child nodes
}
Your game algorithm can then iterate through the list of child nodes to perform its computation logic.
One limitation that may appear with the above definition is that if you want to search for a particular child node, you need to iterate over the list, while with DOM4j you could use XPath. But using DOM and XPath has its own limitations in terms of memory consumption - refer to this for details.
The simple data structure mentioned above, however, would not have the same memory implications, and it would also be easier to manipulate during the computation process.
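As a rough sketch of how such a structure could drive the computation, the variant below assumes a single-player game in which some positions are simply marked as winning. GameNode, the winning flag, and minMovesToWin are hypothetical names; your game's rules would decide what a node really stores.

```java
import java.util.ArrayList;
import java.util.List;

/** One position in the game; each child is the position after one move. */
class GameNode {
    final boolean winning; // assumed flag: true if this position wins the game
    final List<GameNode> children = new ArrayList<>();

    GameNode(boolean winning) {
        this.winning = winning;
    }

    /** Adds a child and returns this node so calls can be chained. */
    GameNode addChild(GameNode child) {
        children.add(child);
        return this;
    }

    /** Fewest moves from here to a winning position, or -1 if none is reachable. */
    int minMovesToWin() {
        if (winning) {
            return 0;
        }
        int best = -1;
        for (GameNode child : children) {
            int moves = child.minMovesToWin();
            if (moves >= 0 && (best < 0 || moves + 1 < best)) {
                best = moves + 1;
            }
        }
        return best;
    }
}
```

This walks the whole tree recursively, which is fine for small trees; a very deep game tree may call for an iterative or breadth-first search instead.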
Hope this helps.

What's the name of a list with only one element

What is the 'formal' name of a list that has only one element?

Java Collections calls this a singleton list, although I don't think this is a formal name.
public static <T> List<T> singletonList(T o) Returns an immutable list containing only the specified object. The returned list is serializable.
I much prefer "single element list".
Edit: It appears that singleton is a valid mathematical term for a single element set.
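For reference, a short usage sketch of Collections.singletonList (the demo class name is arbitrary). Because the returned list is immutable, trying to add a second element throws UnsupportedOperationException:

```java
import java.util.Collections;
import java.util.List;

class SingletonListDemo {
    public static void main(String[] args) {
        List<String> one = Collections.singletonList("only");
        System.out.println(one.size());  // prints 1
        System.out.println(one.get(0));  // prints only

        try {
            one.add("second"); // not allowed: the list is immutable
        } catch (UnsupportedOperationException e) {
            System.out.println("cannot add to a singleton list");
        }
    }
}
```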

I would call it a unitList (to avoid confusion with the Singleton pattern). In mathematics, a set containing only a single element is also known as a unit set.

You could call it a singleton, since that is the correct mathematical term for a set with exactly one element. The term singleton, however, is used differently in computer science, where it commonly refers to an object of which there is exactly one instance at any given time during the execution of the program. It might therefore be misleading or confusing, so I'd avoid using it for a one-element set.
In mathematics, a set with exactly one element is also known as unit set. This is a name I'd go for.

What should a Tree class contain?

My class is currently doing Binary Trees as part of our data structures unit. I understand that the Node class must have 3 parameters (value, left node, right node) in its constructor. As part of a requirement, we must have a Tree class. What is the purpose of a tree class? Is it for managing the entire set of Nodes? Does it simply contain the functions necessary to insert, remove and search for specific Nodes?
Thank you in advance.
My Node class:
class Node {
    protected int data;
    protected Node leftNode;
    protected Node rightNode;

    Node(int data, Node leftNode, Node rightNode) {
        this.data = data;
        this.leftNode = leftNode;
        this.rightNode = rightNode;
    }
}

Yes, it is supposed to give the functional interface of a Tree by encapsulating all the behavior and algorithms related to the internal structure.
Why is this good?
Because you will define something that just provides some functionality and works in a stand-alone way, so that everyone is able to use your tree class without caring about nodes, algorithms and so on.
Ideally the class should be parametric so that you'll have Tree<T> and you'll be able to have generic methods for example
T getRoot()
Basically you'll have to design it so that it allows you to
insert values
delete values
search values
visit the whole tree
whatever

As I see it, a Tree class should hold a reference to the root node of the tree, and to methods and attributes that operate on the tree as a whole, as opposed to methods that belong to each node, like getLeft(), getValue().
For example, I'd define a size attribute in the Tree class, and the methods that add or remove nodes to the tree (which also happen to be in this class) would be responsible for keeping the size up to date.
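A minimal sketch of that split, assuming an ordered tree of distinct int values (IntTree and its method names are made up for illustration; the Node fields mirror the ones in the question). The Tree class owns the root and the size, and insert is the only place where size changes:

```java
import java.util.ArrayList;
import java.util.List;

class IntTree {
    // Node stays an internal detail; callers only ever see IntTree.
    private static final class Node {
        final int data;
        Node leftNode, rightNode;

        Node(int data, Node leftNode, Node rightNode) {
            this.data = data;
            this.leftNode = leftNode;
            this.rightNode = rightNode;
        }
    }

    private Node root;
    private int size; // kept up to date by insert

    int size() {
        return size;
    }

    /** Inserts a value; returns false (and leaves size alone) if already present. */
    boolean insert(int data) {
        if (root == null) {
            root = new Node(data, null, null);
            size++;
            return true;
        }
        Node current = root;
        while (true) {
            if (data == current.data) {
                return false;
            }
            if (data < current.data) {
                if (current.leftNode == null) {
                    current.leftNode = new Node(data, null, null);
                    size++;
                    return true;
                }
                current = current.leftNode;
            } else {
                if (current.rightNode == null) {
                    current.rightNode = new Node(data, null, null);
                    size++;
                    return true;
                }
                current = current.rightNode;
            }
        }
    }

    /** Visits the whole tree in ascending order. */
    List<Integer> inOrder() {
        List<Integer> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<Integer> out) {
        if (node == null) {
            return;
        }
        collect(node.leftNode, out);
        out.add(node.data);
        collect(node.rightNode, out);
    }
}
```

A remove method would follow the same pattern and decrement size only when a node is actually taken out.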

The purpose of any data structure is to give you a way to hold onto collections of related values and manipulate them in meaningful ways.
A tree is a special kind of data structure. It's a good choice for data that is hierarchical: those with natural parent-child relationships. A binary tree has, at most, two children for every parent node.
One other feature of trees that deserves special mention is that they are self-similar: every Node in a tree is itself the root of a sub-tree. Recursion exploits this fact.
Yes, those are good methods to start with. You might want to have a look at the java.util.Map interface for others that might be useful.

The assumption is, as you suspect, a little bit flawed. The Node type that you have defined is a perfectly fine instance of a binary tree.
Jack mentions that you want to have a Tree type around the whole set of nodes in order to provide operations like getRoot, inserts, deletes and so on. This is of course not a bad idea, but it is by no means a requirement. Many of these operations could go on Node itself, and you don't necessarily have to have all of these operations; for example, there are arguments both for and against having a getRoot operation. An argument against it is that if you have an algorithm that at one point only needs a subtree, then the Tree object that holds on to the root Node prevents garbage collection of Nodes that your algorithm no longer needs.
But more generally, the real question that needs to be asked here is this: which of the things you're dealing with are interface, and which are implementation? Because it's one thing if you want to expose binary trees as an interface to callers, and another if you want to provide some sort of finite map as the interface, and the binary tree is just an implementation detail. In the former case, the client would "know" that a tree is either null or a Node with a value and branches, and would be allowed to, for example, recurse on the structure of the tree; in the latter case, all that the client would "know" is that your class supports some form of put, get and delete operations, but it's not allowed to rely on the fact that this is stored as a search tree. In many variants of the latter case you indeed would use a Tree type as a front end to hide the nodes from the clients. (For example, Java's built-in java.util.TreeMap class does this.)
So the shortest answer is, really, it depends. The slightly longer answer is it depends on what the contract is between the binary tree implementation and its users; what things are the callers allowed to assume about the trees, what details of the binary tree are things that the author expects to be able to change in the future, etc.

"Is it for managing the entire set of Nodes?"
Yes.
"Does it simply contain the functions necessary to insert, remove and search for specific Nodes?"
Basically, yes. For that reason, it should contain at least a reference to the root node.
