How to construct a unordered binary tree from text file - java

Dear Friends I am an intermediate java user. I am stuck in a following problem. I want to construct a unordered binary tree [or general tree having at most two nodes] from a multi line (let say 40 lines) text file. The text file is then divided into two halfs; let say 20:20 lines. Then for each half a specific (let say hash) value is calculated and stored in the root node. So each node contains four elements. Two pointers to the two children (left and right) and two hashes of the two halfs of the original file. Next for each half (20 lines) the process is repeated until at each leaf we have a single line of text. Let the node have
public class BinaryTree {
private BinaryTreeNode leftNode, rightNode;
private String leftHash,rightHash;
}
I need help for writing the tree construction and searching functions. Well searching is performed by entering a line. Then hash code is created for this query line and compared against the two hashes saved at each node. If the hash of query line is close to leftHas then leftNode is accessed and if the hash of query line is close to rightHash then rightNode is accessed. The process continues until an exact hash is found.
I just need the tree construction and search teachnique. The hash comparison etc are not a problem

You'll need to start by reading the file into a string.
The first character in the string could be used as the root. Root + 1 would be the left, root + 2 would be the right
Consider left node of the root (Root + 1), you could also consider this as Root + N. Meaning that the right node would be Root + N + 1.
You can now recursively solve this problem by establishing which Node you are currently on, and setting the left and right now respectively.
So lets think about it,
You have the root node, left node, and right node established. At this point you have used 3 letters/numbers (it really doesnt matter if it is unordered). The next step would be to move down one level and start filling the left, you have the root, you need left and right nodes. Then move to the right node, do the left and right node of this and so on and so forth.
Think about that for a little bit and see where you get.
Cheers,
Mike
EDIT:
To search,
Searching a binary tree is also a recursive theme. (I thought you previously said the tree was unordered, which may change how the tree is laid out if it is suppose to be order).
If it is unordered, you can simply recurse the tree in a manner such that
A.) Check root node
B.) Check left node
C.) Continue checking left nodes until either there is a match, or no more left nodes to check
D.) Recurse back 1, check right node
E.) Check left nodes,
F.) Recuse back, check right node
This theme will continue until eventually you have checked ALL left nodes first, and then the right nodes. The KEY to this, is at any point you have a root node, go left first, then right. (I forget what traversal type this is, but there are others if you wish to implement them over this, i personally think this is the easiest to remember).
You will then repeat for right child of Root node.
If at any time you get a match, exit.
Remember this is recursive, so make sure you think your way through this step by step. It is recursive by definition, in that you will always do steps x,y,z for each part of the tree.
To beat a dead horse, lets look at just 3 nodes to start.
(simplified)
First the root,
if(root == (what your looking for))
{
return root
}
else if(root.leftNode == (what your looking for))
{
return root.leftNode
}
else if(root.rightNode == (what your looking for))
{
return root.rightNode
}
else
{
System.out.println("Value not found")
}
If you have 5 nodes, that would be root would have a left and right, and the root.leftNode would have a left and right... You would repeat the steps above on root.leftNode also, then search root.rightNode
If you have 7 nodes, you would search ALL of root.leftNode and then recurse back to search root.leftNode.
I hope this helps,
pictures work much better in my opinion when talking about traversing trees.
Perhaps look here for a better visual
http://www.newthinktank.com/2013/03/binary-tree-in-java/

Related

Missing link in understanding recursive in order tree Traversal

I mostly get the recursively programmed in order tree Traversal, but theres just a small gap in my understanding that I want to clarify. when the first recursive method call moves the node pointer all the way to the left, why does the fact that there is a null node cause the method to stop at the last node, move past the recursive call and print the last node. Part of me can just accept that that's just what happens, but what I really don't get is why once you print the last left node, why does it move back up to the right, its as if it treats the recently printed node like it treats the null node. In my mind it would keep moving left, as if repeatedly hitting a wall. I'm not sure how well I'm explaining what I don't get, I'll happily try to rephrase and use pictures if it's not clear. I assume the recursive in order tree Traversal is very standard, though I know of two implementations that do the same thing, but I'll add the method code I'm using in Java to be sure.
public void print(Node<E> e){
if (e != null) {
print(e.left);
System.out.println(e.element);
print(e.right)
}
}
why does the fact that there is a null node cause the method to stop at the last node, move past the recursive call and print the last node
As long as you haven't reached the left most node, you will keep calling print(e.left). Once you reached the left most node, e.left of that node would be null, and print(null) will end the recursion. Once it returns, System.out.println(e.element); will print the left most node.
why once you print the last left node, why does it move back up to the right
Then print(e.right) will be executed. If the left most node has a right child, print(e.right) will recursively print the sub-tree of that right child. Otherwise, it will execute print(null), ending the recursion on that path.
Once print() of the left most node returns, you go back to print() of the parent of that node, so you call System.out.println(e.element); for that parent node, and so on...

How exactly is the print statement working in these two methods?

This is wrt to Binary Search Tree.I am traversing the tree in 2 ways.1)InOrder Traversal
An inorder traversal of tree t is a recursive algorithm that follows the the left subtree; once there are no more left subtrees to process, we process the right subtree. The elements are processed in left-root-right order.
2)PostOrder Traversal
A postorder traversal of tree t is a recursive algorithm that follows the the left and right subtrees before processing the root element. The elements are processed left-right-root order.
I am confused over how the recursion methods and print statements are working. Could you please enlighten me?
static void inOrder(Leaf root){
if(root != null){
inOrder(root.left);
System.out.print(root.value+" ");
inOrder(root.right);
}
}
static void postOrder(Leaf root){
if(root != null){
postOrder(root.left);
postOrder(root.right);
System.out.print(root.value+" ");
}
}
The way this works is each function will print out a single, long line containing each of the values if the tree. As each of the algorithms goes through the tree, it appends the current value of the node to the output along with a space. For example, if we consider the following tree:
2
/ \
1 3
The first algorithm will start at the 2, then call itself on the left child, 1. The function will then call itself on the left child, which is null, so it will return immediately. The function will then print "1 " to the console. The function will call itself on the right child, which is null, so it will return immediately. The function will then return to the 2, and "2 " will be printed to the console. The function will then call itself on the right child, which is the 3. The function will call itself on the left child , which returns, then print "3 " to the console, then call itself on the right child, which returns. The function will then return to the 2, and that will return to whatever called it. The console at the end will say
1 2 3
A similar thing would happen for the second algorithm, except the function would go to and print the left and right children, 1 and 3, respectively, before printing the root node, 2, resulting in the following output:
1 3 2
If you are having trouble understanding it, it would benefit you to draw yourself a tree and follow the code step by step to see what the computer is doing.
As every recursion method, it needs a base case, a condition to stop the recursion. In this case, it's when "root" is null.
Observe that the variable root, will only be the actual root of the Tree at the first call in the method. Consecutive calls are passing the element on it's left or right.
On the method inOrder, it prints the elements that are further down on the left branch of the Tree. So it "goes down a level" on the tree before calling the print. This means that if the element has a leaf on a left branch, it will print said leaf before going to the right branch.
in the postOrder Method, it will go the furthest down on the tree it can go and then print the elements, doing that for both branches ( left and right ). This meaning that the leafs of thre tree will be printed first while the actual root will be the last Element.
If you're having trouble visualizing, i suggest you draw the tree in a paper and run the code with a small sample tree using the Debug on Eclipse, so you can follow the execution line by line and see how the algorithm is traversing the Tree.

Graph traversal - finding and returning the shortest distance

I am using a Breadth first search in a program that is trying to find and return the shortest path between two nodes on an unweighted digraph.
My program works like the wikipedia page psuedo code
The algorithm uses a queue data structure to store intermediate results as it traverses the graph, as follows:
Enqueue the root node
Dequeue a node and examine it
If the element sought is found in this node, quit the search and return a result.
Otherwise enqueue any successors (the direct child nodes) that have not yet been discovered.
If the queue is empty, every node on the graph has been examined – quit the search and return "not found".
If the queue is not empty, repeat from Step 2.
So I have been thinking of how to track number of steps made but I am having trouble with the limitations of java (I am not very knowledgeable of how java works). I originally was thinking that I could create some queue made up of a data type I made that stores steps and nodes, and as it traverses the graph it keeps track of the steps. If ever the goal is reached just simply return the steps.
I don't know how to make this work in java so I had to get rid of that idea and I moved on to using that wonky Queue = new LinkedList implementation of a queue. So basically I think it is a normal integer queue, I couldn't get my data type I made to work with it.
So now I have to find a more basic approach so I tried to use a simple counter, this doesn't work because the traversal algorithm searches down many paths before reaching the shortest one so I had an idea. I added a second queue that tracked steps, and I added a couple counters. Any time a node is added to the first queue I add to the counter, meaning I know that I am inspecting new nodes so I am not a distance further away. Once all those have been inspected I can then increase the step counter and any time a node is added to the first queue I add the step value to the step queue. The step queue is managed just like the node queue so that when the goal node is found the corresponding step should be the one to be dequeued out.
This doesn't work though and I was having a lot of problems with it, I am actually not sure why.
I deleted most of my code in panic and frustration but I will start to try and recreate it and post it here if anyone needs me to.
Were any of my ideas close and how can I make them work? I am sure there is a standard and simple way of doing this as well that I am not clever enough to see.
Code would help. What data structure are you using to store the partial or candidate solutions? You say your using a queue to store nodes to be examined, but really the objects stored in the queue should wrap some structure (e.g. List) that indicates the nodes traversed to get to the node to be examined. So, instead of simple Nodes being stored in the queue, some more complex object would be needed to make available the information necessary to know the complete path taken to that point. A simple node would only have information about itself, and it's children. But if you're examining node X, you also need to know how you arrived to node X. Just knowing node X isn't enough, and the only way (I know of) to know the path taken to node X is to store the path in the object that represents a "partial solution" or "candidate solution". If this is done, then finding the length of the path is trivial, because it's just the length of this list (or whichever data structure chosen). Hope I'm making some sense here. If not, post code and I'll take a look.
EDIT
These bits of code help show what I mean (they're by no means complete):
public class Solution {
List<Node> path;
}
Queue<Solution> q;
NOT
Queue<Node> q;
EDIT 2
If all you need is the length of the path, and not the path, per se, then try something like this:
public class Solution {
Node node; // whatever represents a node in you algorithm.
int len; // the length of the path to this node.
}
// Your queue:
LinkedList<Solution> q;
With this, before enqueuing a candidate solution (node), you do something like:
Solution sol = new Solution();
sol.node = childNodeToEnqueue;
sol.len = parentNode.len + 1;
q.add(sol);
The easiest solution in order to track distance during a traversal is to add a simple array (or a map if you vertices are not indexed by integers).
Here is pseudo code algorithm:
shortest_path(g, src, dst):
q = new empty queue
distances = int array of length order of g
for i = 0 to order: distances[i] = -1
distances[src] = 0
enqueue src in q
while q is not empty:
cur = pop next element in q
if cur is dst: return distances[dst]
foreach s in successors of cur in g:
if distances[s] == -1:
distances[s] = distances[cur] + 1
enqueue s in q
return not found
Note: order of a graph is the number of vertices
You don't need special data structures, the queue can just contains vertices' id (probably integers). In Java, LinkedList class implements the Queue interface, so it's a good candidate for your queue. For the distances array, if your vertices are identified by integers an integer array is enough, otherwise you need a kind of map.
You can also separate the vertex tainting (the -1 in my algo) using a separate boolean array or a set, but it's not really necessary and will waste some space.
If you want the path, you can also do that with a simple parent array: for each vertex you store its parent in the traversal, just add parent[s] = cur when you enqueue the successor. Then retrieving the path (in reverse order) is a simple like this:
path = new empty stack
cur = dst
while cur != src:
push cur in path
cur = parent[cur]
push src in path
And there you are …

Traversing a Huffman Tree

So Currently I have a program that creates a huffman tree. the tree is made up of "Hnodes" with these fields: right (points to right child) left (points to left child) code (string of integers, ideally the 0's and 1's that will be the huffman code of this node) character (the character contained in the node).
I have created the huffman tree by adding nodes from a linked list - i know the tree was created correctly. As i created the tree, i told the node when i gave it a parent node, that if it was the parent's "right", its code string was 1, if left 0. However obviously after the entire tree is created, each node is only going to have either a 0 or 1, but not yet a string like 00100101. My question is, now that I have this tree, can can I traverse it? I understand the thought would be to give each child its parent's code + the child's own code, but I do not understand how to loop through the tree to accomplish this.
Thank you in advance.
Maybe this?
ExpandBinaryPaths(node, prefix)
1. if node is null then return
2. else then
3. node.binary = prefix concat node.binary
4. ExpandBinaryPaths(node.left, node.binary)
5. ExpandBinaryPaths(node.right, node.binary)
6. return
The idea is you would call this on the root with no prefix... ExpandBinaryPaths(root, "").

Java Algorithm for finding the largest set of independent nodes in a binary tree

By independent nodes, I mean that the returned set can not contain nodes that are in immediate relations, parent and child cannot both be included. I tried to use Google, with no success. I don't think I have the right search words.
A link, any help would be very much appreciated. Just started on this now.
I need to return the actual set of independent nodes, not just the amount.
You can compute this recursive function with dynamic programming (memoization):
MaxSet(node) = 1 if "node" is a leaf
MaxSet(node) = Max(1 + Sum{ i=0..3: MaxSet(node.Grandchildren[i]) },
Sum{ i=0..1: MaxSet(node.Children[i]) })
The idea is, you can pick a node or choose not to pick it. If you pick it, you can't pick its direct children but you can pick the maximum set from its grandchildren. If you don't pick it, you can pick maximum set from the direct children.
If you need the set itself, you just have to store how you selected "Max" for each node. It's similar to the LCS algorithm.
This algorithm is O(n). It works on trees in general, not just binary trees.
I would take-and-remove all leaves first while marking their parents as not-to-take, then remove all leaves that are marked until no such leaves are left, then recurse until the tree is empty. I don't have a proof that this always produces the largest possible set, but I believe it should.
I've provided an answer to a question for the same problem, although the solution is in python, the explanation, algorithm, and test cases could be applicable.

Categories