Print random element from binary tree in O(logn) - java

Given a binary tree with TreeNode like:
class TreeNode {
    int data;
    TreeNode left;
    TreeNode right;
    int size;
}
Where size is the number of nodes in the subtree rooted at that node (nodes in left subtree + nodes in right subtree + 1).
Print a random element from the tree in O(logn) running time.
Note: This post is different from this one, as it is clearly mentioned that we have a size associated with each node in this problem.
PS: Wrote this post inspired by this.

There is an easy approach which gives O(n) complexity.
Generate a random number in the range of 1 to root.size
Do a BFS or DFS traversal
Stop iterating at random numbered element and print it.
For improving the complexity, we need to create an ordering of our own where we branch at each iteration instead of going sequentially as in BFS and DFS. We can use the size property of each node to decide whether to traverse through the left sub-tree or right sub-tree. Following is the approach:
Generate a random number in the range of 1 to root.size (Say r)
Start traversing from the root node and decide whether to go to left sub-tree, right-subtree or print root:
if r <= root.left.size, traverse through the left sub-tree
if r == root.left.size + 1, print the root (we have found the random node to print)
if r > root.left.size + 1, traverse through the right sub-tree
(treat root.left.size as 0 when the left child is null)
Essentially, we have defined an order where current node is ordered at (size of left subtree of current) + 1.
Since each step descends into only one sub-tree, the running time is O(h), where h is the height of the tree; for a balanced tree this is O(log n).
The pseudo-code would look something like this:
traverse(root, r)
    if r == root.left.size + 1
        print root
    else if r > root.left.size + 1
        traverse(root.right, r - root.left.size - 1)
    else
        traverse(root.left, r)
Following is an implementation in Java:
public static void printRandom(TreeNode n, int randomNum) {
    int leftSize = n.left == null ? 0 : n.left.size;
    if (randomNum == leftSize + 1)
        System.out.println(n.data);
    else if (randomNum > leftSize + 1)
        printRandom(n.right, randomNum - (leftSize + 1));
    else
        printRandom(n.left, randomNum);
}
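A minimal way to exercise this (a sketch: the TreeNode class and the RandomNodeDemo wrapper name are mine, and the size fields of this tiny hand-built tree are set manually):

```java
import java.util.Random;

class TreeNode {
    int data;
    TreeNode left, right;
    int size;
    TreeNode(int data) { this.data = data; this.size = 1; }
}

public class RandomNodeDemo {
    public static void printRandom(TreeNode n, int randomNum) {
        int leftSize = n.left == null ? 0 : n.left.size;
        if (randomNum == leftSize + 1)
            System.out.println(n.data);
        else if (randomNum > leftSize + 1)
            printRandom(n.right, randomNum - (leftSize + 1));
        else
            printRandom(n.left, randomNum);
    }

    public static void main(String[] args) {
        // Build the tree 2 <- 4 -> 6 and set sizes by hand.
        TreeNode root = new TreeNode(4);
        root.left = new TreeNode(2);
        root.right = new TreeNode(6);
        root.size = 3;
        // r is uniform in [1, root.size]; r == 1, 2, 3 land on 2, 4, 6 respectively.
        int r = 1 + new Random().nextInt(root.size);
        printRandom(root, r);
    }
}
```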

Use size!
Pick a random number q between 0 and n.
Start from the root. If left->size == q, return the current node's value. If left->size < q then go right, else go left. If you go right, subtract: q -= left->size + 1. Repeat until you output a node.
This gives you O(height of tree). If the tree is balanced you get O(log N).
If the tree is not balanced but you still want to keep O(log N), you can do the same thing but cap the maximum number of iterations. Note that in this case not all nodes have the same probability of being returned.

Yes, use size!
As Sorin said, pick a random number i between 0 and n - 1 (not n)
Then perform the following instruction:
TreeNode rand_node = select(root, i);
Where select could be as follows:
TreeNode select_rec(TreeNode r, int i)
{
    if (i == r.left.size)
        return r;
    if (i < r.left.size)
        return select_rec(r.left, i);
    return select_rec(r.right, i - r.left.size - 1);
}
Now a very important trick: the null node must be a sentinel node with size set to 0, which makes sense because the empty tree has zero nodes. You can avoid the sentinel, but then the select() operation is slightly more complex.
If the tree is balanced, then select() is O(log n).
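For reference, a sketch of the sentinel-free version in Java (the null check on the left child is the "slightly more complex" part; the TreeNode class and SelectDemo wrapper name are mine):

```java
class TreeNode {
    int data;
    TreeNode left, right;
    int size;
    TreeNode(int data) { this.data = data; this.size = 1; }
}

public class SelectDemo {
    // 0-based rank select without a sentinel: treat a null child as size 0.
    public static TreeNode select(TreeNode r, int i) {
        int leftSize = r.left == null ? 0 : r.left.size;
        if (i == leftSize) return r;
        if (i < leftSize) return select(r.left, i);
        return select(r.right, i - leftSize - 1);
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode(4);
        root.left = new TreeNode(2);
        root.right = new TreeNode(6);
        root.size = 3;
        System.out.println(select(root, 0).data); // smallest element: 2
    }
}
```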

Related

Binary Search Tree select method implementation

I'm currently going over Robert Sedgewick's Algorithms book. In the book I'm trying to understand the implementation of the select method in a Binary Search Tree. The author uses a BST to implement a symbol table. The author describes the select method as follow:
Suppose that we seek the key of rank k (the key such that precisely k other keys in the BST are smaller). If the number of keys t in the left subtree is larger than k, we look (recursively) for the key of rank k in the left subtree; if t is equal to k, we return the key at the root; and if t is smaller than k, we look (recursively) for the key of rank k - t - 1 in the right subtree. As usual, this description serves both as the basis for the recursive select() method on the facing page and for a proof by induction that it works as expected.
I want to understand specifically why k - t - 1 is passed to the recursive select call when the size t of the left subtree is less than k.
public Key select(int k)
{
    return select(root, k).key;
}
private Node select(Node x, int k)
{ // Return Node containing key of rank k.
    if (x == null) return null;
    int t = size(x.left);
    if (t > k) return select(x.left, k);
    else if (t < k) return select(x.right, k-t-1);
    else return x;
}
As you can see in the above implementation of the select method of a Binary Search Tree, when t < k the author passes k-t-1 to the recursive select call, but I still can't figure out why.
k − t − 1 is equal to k − (t + 1). When t < k and we recurse into the right subtree, the t elements in the left subtree and the 1 root count toward the rank of the elements in the right subtree, so we need to adjust k to match.
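A small self-contained sketch you can trace (the BstSelect wrapper and the build helper are mine, not from the book): selecting rank 5 in a BST holding keys 1..7 first finds t = 3 at the root (key 4), so it recurses right with k = 5 - 3 - 1 = 1, and at key 6 finds t = 1 = k.

```java
public class BstSelect {
    static class Node {
        int key, size;
        Node left, right;
        Node(int key) { this.key = key; this.size = 1; }
    }

    static int size(Node x) { return x == null ? 0 : x.size; }

    // Sedgewick-style select: return the node whose key has rank k (0-based).
    static Node select(Node x, int k) {
        if (x == null) return null;
        int t = size(x.left);
        if (t > k) return select(x.left, k);
        else if (t < k) return select(x.right, k - t - 1); // skip left subtree + root
        else return x;
    }

    // Build a balanced BST over keys lo..hi with correct subtree sizes.
    static Node build(int lo, int hi) {
        if (lo > hi) return null;
        int mid = (lo + hi) / 2;
        Node n = new Node(mid);
        n.left = build(lo, mid - 1);
        n.right = build(mid + 1, hi);
        n.size = 1 + size(n.left) + size(n.right);
        return n;
    }

    public static void main(String[] args) {
        Node root = build(1, 7);
        System.out.println(select(root, 5).key); // rank 5 -> key 6
    }
}
```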

Why is number of array accesses during the union and find operations said to be of the order lg(N) in Weighted QuickUnion?

The general code for Quick Union Algorithm.
public class QuickUnionUF
{
    private int[] id;
    public QuickUnionUF(int N)
    {
        id = new int[N];
        for (int i = 0; i < N; i++) id[i] = i;
    }
    private int root(int i)
    {
        while (i != id[i]) i = id[i];
        return i;
    }
    public boolean connected(int p, int q)
    {
        return root(p) == root(q);
    }
    public void union(int p, int q)
    {
        int i = root(p);
        int j = root(q);
        id[i] = j;
    }
}
Here it is clear that for operations like "initialize", "union" and "find if connected", the number of array accesses will be of the order of N (which is quite clear from the code).
However, my book claims that if we modify the QuickUnion to Weighted QuickUnion then the number of array accesses becomes of the order of lg(N). But I can't see how from the code.
The only change that has been made for Weighted QuickUnion is the part within the union() function, as follows:
int i = root(p);
int j = root(q);
if (i == j) return;
if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
else { id[j] = i; sz[i] += sz[j]; }
Here we maintain an extra array sz[] to count the number of objects in the tree rooted at i.
But here, I don't see how number of array accesses for union is of the order lg(N). Array accesses must be of the order N as we have to call the root() method twice. Also how is it of the order lg(N) even for "find if connected" operation?
I'm confused as to how are they getting the lg(N). Could someone please explain?
In the non-modified version, you get the linear dependency because the links to the parent can be arbitrary. So, in the worst case, you might end up with a single long list (i.e. you have to traverse every other element if you're at the end of the list).
The modification (union-by-rank) aims at producing shallower trees by making the smaller subtree a child of the root of the larger subtree. This heuristic makes the trees much more balanced, i.e. the length of the path from any leaf to its root becomes O(log n). Remember that the height of a full binary tree with k nodes is O(log k).
For a more formal proof, please refer to existing literature.
Additional note: I mentioned that union-by-rank is only a heuristic. Ideally, you would want to make the decision based on the height of both subtrees. However, keeping track of the height is pretty hard. That's why you usually use the size of the subtree, which correlates with its height.
Sure. It is clear that if the complexity of the root method is O(lg n), then union will be O(lg n) as well.
The weighted quick union guarantees this lg(n) complexity. Here is how:
The WQU (weighted quick union) algorithm keeps the elements in several tree structures. The root method finds the root of the tree which contains the element i, so its complexity is bounded by the maximum height of such a tree.
Now let h(i) be the height of the tree which contains element i and w(i) the size (weight) of that tree.
We maintain the invariant h(i) <= lg(w(i)). Let's see what happens when we make the union of two trees (call them i and j).
Since we are binding one tree to the root of the other, the height of the new tree can be at most max(h(i), h(j)) + 1.
Say w(i) <= w(j); then we bind i to the root of j. If h(j) > h(i) we have nothing to worry about: the height doesn't change. If not, the new height will be h(i) + 1. Since w(i) + w(j) >= 2*w(i), we have lg(w(i) + w(j)) >= lg(2*w(i)) = 1 + lg(w(i)) >= 1 + h(i). So h(i) + 1 <= lg(new size), and the invariant holds (the height of the new tree is at most lg of the new weight). This means that, in the worst case, with a tree of size N, the root method will need at most lg(N) steps.
Weighted union preserves the invariant that for every tree with height h and size n, h ≤ log(n)+1.
This is trivially true for the initial set of trees, with n=1 and h=1.
Say we merge two trees with heights h₁, h₂, and sizes n₁, n₂, where n₁ ≥ n₂.
Weighted union ensures that the new height is either h₁ or h₂ + 1, and the new size is n₁ + n₂. In both these cases, the invariant is preserved:
h₁ ≤ log(n₁) + 1 ⇒ h₁ ≤ log(n₁+n₂) + 1
and
h₂ ≤ log(n₂) + 1 ⇒ h₂ + 1 ≤ log(n₂) + 2 ⇒ h₂ + 1 ≤ log(2n₂) + 1 ⇒ h₂ + 1 ≤ log(n₁+n₂) + 1
because n₁ ≥ n₂.
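Putting the question's two snippets together, a complete weighted quick-union class might look like the following sketch (sz[] initialized alongside id[]; the structure follows the code quoted above):

```java
public class WeightedQuickUnionUF {
    private int[] id; // parent links
    private int[] sz; // sz[i] = number of objects in the tree rooted at i

    public WeightedQuickUnionUF(int N) {
        id = new int[N];
        sz = new int[N];
        for (int i = 0; i < N; i++) { id[i] = i; sz[i] = 1; }
    }

    private int root(int i) {
        while (i != id[i]) i = id[i]; // at most ~lg N hops thanks to weighting
        return i;
    }

    public boolean connected(int p, int q) {
        return root(p) == root(q);
    }

    public void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        if (i == j) return;
        // Always hang the smaller tree under the root of the larger one.
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else              { id[j] = i; sz[i] += sz[j]; }
    }

    public static void main(String[] args) {
        WeightedQuickUnionUF uf = new WeightedQuickUnionUF(10);
        uf.union(0, 1);
        uf.union(2, 3);
        uf.union(1, 3);
        System.out.println(uf.connected(0, 2)); // true
        System.out.println(uf.connected(0, 9)); // false
    }
}
```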

Print Specific nodes at a every level calculated by a given function

In an interview, I was given a function:
f(n)= square(f(n-1)) - square(f(n-2)); for n>2
f(1) = 1;
f(2) = 2;
Here n is the level of an N-ary tree. f(n) = 1, 2, 3, 5, 16, ...
For every level n of a given N-ary tree, I have to print node number f(n) at that level. For example:
At level 1 print node number 1 (i.e. root)
At level 2 print node number 2 (from left)
At level 3 print node number 3 (from left)
At level 4 print node number 5... and so on
If the number of nodes (say nl) at any level n is less than f(n), then I have to print node number nl % f(n), counting from the left.
I did a basic level order traversal using a queue, but I was stuck on how to count the nodes at every level and how to handle the condition when the number of nodes at level n is less than f(n).
Please suggest a way to proceed with the remaining part of the problem.
You need to perform Level Order Traversal.
In the code below I am assuming two methods:
One is getFn(int level) which takes in an int and returns the f(n) value;
Another is printNth(int i, Node n) that takes in an int and Node and beautifully prints that "{n} is the desired one for level {i}".
The code is simple to implement now. Comments explain it like a story...
public void printNth(Node root) throws IllegalArgumentException {
    if (root == null) throw new IllegalArgumentException("Null root provided");
    //Beginning at level 1 - adding the root. Assumed that levels start from 1
    LinkedList<Node> parent = new LinkedList<>();
    parent.add(root);
    int level = 1;
    printNth(level, root);
    //Now beginning from the first set of parents, i.e. the root itself,
    //we dump all the children from left to right in another list.
    while (parent.size() > 0) { //Last level will have no children. Loop will break there.
        //Starting with a list to store the children
        LinkedList<Node> children = new LinkedList<>();
        //For each parent node add both children, left to right
        //(assumes a binary Node; for a true N-ary node, iterate over its child list)
        for (Node n : parent) {
            if (n.left != null) children.add(n.left);
            if (n.right != null) children.add(n.right);
        }
        if (children.isEmpty()) break; //we have gone past the last level
        //Now get the value of f(n) for this level
        level++;
        int f_n = getFn(level);
        //If there are not enough elements in the list, print node number
        //children.size() % f(n), which is just children.size(), i.e. the last node
        if (children.size() < f_n) printNth(level, children.getLast());
        else printNth(level, children.get(f_n - 1)); //f_n - 1 because index starts from 0
        //Finally, the children list of this level will serve as the new parent list
        //for the level below.
        parent = children;
    }
}
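The answer above assumes a getFn helper. An iterative sketch of it, using the recurrence from the question (the LevelFn wrapper name is mine; note the values grow extremely fast, hence long):

```java
public class LevelFn {
    // f(1) = 1, f(2) = 2, f(n) = f(n-1)^2 - f(n-2)^2 for n > 2
    static long getFn(int n) {
        if (n == 1) return 1;
        if (n == 2) return 2;
        long prev2 = 1, prev1 = 2; // f(n-2), f(n-1)
        long cur = 0;
        for (int i = 3; i <= n; i++) {
            cur = prev1 * prev1 - prev2 * prev2;
            prev2 = prev1;
            prev1 = cur;
        }
        return cur;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++)
            System.out.print(getFn(n) + " "); // 1 2 3 5 16
        System.out.println();
    }
}
```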
Added my solution here.
I have used a queue to read all the nodes at a particular level. Before reading the nodes, I check whether the given level n equals the current level; then, if the size of the queue is at least f(n), I read f(n) nodes from the queue and print that node; otherwise I perform the mod operation and print node number nl % f(n).

How is this program a pre-order traversal?

int count(Node node) {
    if (node == null)
        return 0;
    int r = count(node.right);
    int l = count(node.left);
    return 1 + r + l;
}
This function returns number of nodes in Binary Tree rooted at node. A few articles say this is a pre-order traversal, but to me this looks like a post-order traversal, because we are visiting the left and the right parts before we visit the root. Am I wrong here ? Or is my notion of "visited" at fault ?
In this code, no actual processing is being done at each node, so there would be no difference between a pre-order and post-order traversal. If there were processing, the difference would be:
pre-order
int count(Node node) {
    if (node == null)
        return 0;
    process(node);
    int r = count(node.right);
    int l = count(node.left);
    return 1 + r + l;
}
post-order
int count(Node node) {
    if (node == null)
        return 0;
    int r = count(node.right);
    int l = count(node.left);
    process(node);
    return 1 + r + l;
}
(Actually, in these cases—unlike with your code—you'd probably want to recurse on node.left before node.right to preserve the conventional left-to-right ordering of processing children.)
Counting nodes is a case in which it is hard to say whether the algorithm is pre-order or post-order, because we don't know "when" we "count" the 1 for the current node.
But if we change case to printing it becomes clear:
pre-order:
void visit(Node node) {
    ...
    node.print(); // pre-order : root comes first
    visit(node.left);
    visit(node.right);
    ...
}
post-order
void visit(Node node) {
    ...
    visit(node.left);
    visit(node.right);
    node.print(); // post-order : root comes last
    ...
}
As you can see we can say which print() comes first.
With counting we cannot say if root is counted (+1) prior to subtrees or not.
This is question of convention.
We could say this is a pre-order traversal, as the count function is applied to a node before it is applied to its children.
But the question is rather tricky, as you are using direct recursion, doing both the traversal and the "action" in the same function.
Yes, your notion of visited is wrong! Here, visited means that you are at the current node and then trying to traverse the tree. The counting is done at the root first, then you count the right subtree and then the left, so yes, it is pre-order.
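To make the difference concrete, a small sketch (class and method names are mine) that replaces print with appending to a string, on the tree 1 with children 2 and 3:

```java
public class TraversalOrderDemo {
    static class Node {
        int data;
        Node left, right;
        Node(int data) { this.data = data; }
    }

    static String preOrder(Node node) {
        if (node == null) return "";
        // root first, then subtrees
        return node.data + " " + preOrder(node.left) + preOrder(node.right);
    }

    static String postOrder(Node node) {
        if (node == null) return "";
        // subtrees first, then root
        return postOrder(node.left) + postOrder(node.right) + node.data + " ";
    }

    public static void main(String[] args) {
        Node root = new Node(1);
        root.left = new Node(2);
        root.right = new Node(3);
        System.out.println(preOrder(root).trim());  // 1 2 3
        System.out.println(postOrder(root).trim()); // 2 3 1
    }
}
```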

Depth of a Node in BST including duplicates

I have implemented a function to find the depth of a node in a binary search tree, but my implementation does not take care of duplicates. My code is below; I would like some suggestions on how to handle the duplicates case in this function. Would really appreciate your help.
public int depth(Node n) {
    if (n == null || n == getRoot())
        return 0;
    return depth(getRoot(), n, 0);
}
public int depth(Node temp, Node n, int result) {
    int cmp = n.getData().compareTo(temp.getData());
    if (cmp == 0) {
        return result;
    } else if (cmp < 0) {
        return depth(temp.getLeftChild(), n, result + 1);
    } else {
        return depth(temp.getRightChild(), n, result + 1);
    }
}
In the code you show, there is no way to prefer one node with same value over another. You need to have some criteria for differentiation.
You can retrieve the list of all duplicate nodes depths using the following approach, for example:
Find the depth of your node.
Find depth of the same node for the left subtree emerging from the found node - stop if not found.
Add depth of the previously found node (in 1) to the depth of the duplicate
Find depth of the same node for the right subtree emerging from the found node (in 1) - stop if not found.
Add depth of the previously found node (in 1) to the depth of the duplicate
Repeat for left and right subtrees.
Also see here: What's the case for duplications in BST?
Well, if there's duplicates, then the depth of a node with a given value doesn't make any sense on its own, because there may be multiple nodes with that value, hence multiple depths.
You have to decide what it means, which could be (not necessarily an exhaustive list):
the depth of the deepest node with that value.
the depth of the shallowest node with that value.
the depth of the first node found with that value.
the average depth of all nodes with that value.
the range (min/max) of depths of all nodes with that value.
a list of depths of all nodes with that value.
an error code indicating your query made little sense.
Any of those could make sense in specific circumstances.
Of course, if n is an actual pointer to a node, you shouldn't be comparing values of nodes at all, you should be comparing pointers. That way, you will only ever find one match and the depth of it makes sense.
Something like the following pseudo-code should do:
def getDepth (Node needle, Node haystack, int depth):
    // Gone beyond leaf, it's not in tree
    if haystack == NULL: return -1
    // Pointers equal, you've found it.
    if needle == haystack: return depth
    // Data not equal, search either left or right subtree.
    if needle.data < haystack.data:
        return getDepth (needle, haystack.left, depth + 1)
    if needle.data > haystack.data:
        return getDepth (needle, haystack.right, depth + 1)
    // Data equal, need to search BOTH subtrees.
    tryDepth = getDepth (needle, haystack.left, depth + 1)
    if tryDepth == -1:
        tryDepth = getDepth (needle, haystack.right, depth + 1)
    return tryDepth
The reason why you have to search both subtrees when the values are equal is because the desired node may be in either subtree. Where the values are unequal, you know which subtree it's in. So, for the case where they're equal, you check one subtree and, if not found, you check the other.
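In Java, where node references play the role of pointers, the pseudo-code might translate to the following sketch (the Node class and NodeDepth wrapper are mine; duplicates are inserted to the right here, which is one common convention):

```java
public class NodeDepth {
    static class Node {
        int data;
        Node left, right;
        Node(int data) { this.data = data; }
    }

    // Depth of the exact node `needle` (compared by reference), or -1 if absent.
    static int getDepth(Node needle, Node haystack, int depth) {
        if (haystack == null) return -1;      // fell off the tree
        if (needle == haystack) return depth; // reference match: found it
        if (needle.data < haystack.data)
            return getDepth(needle, haystack.left, depth + 1);
        if (needle.data > haystack.data)
            return getDepth(needle, haystack.right, depth + 1);
        // Equal data: the node could sit in either subtree, so try both.
        int tryDepth = getDepth(needle, haystack.left, depth + 1);
        if (tryDepth == -1)
            tryDepth = getDepth(needle, haystack.right, depth + 1);
        return tryDepth;
    }

    public static void main(String[] args) {
        // BST with a duplicate value 5 stored in the right subtree of the root.
        Node root = new Node(5);
        root.right = new Node(5);
        root.right.right = new Node(7);
        System.out.println(getDepth(root.right, root, 0)); // 1, not 0
    }
}
```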
