I'm currently working through Robert Sedgewick's Algorithms book, and I'm trying to understand the implementation of the select method in a Binary Search Tree. The author uses a BST to implement a symbol table, and describes the select method as follows:
Suppose that we seek the key of rank k (the key such that precisely k
other keys in the BST are smaller). If the number of keys t in the
left subtree is larger than k, we look (recursively) for the key of
rank k in the left subtree; if t is equal to k, we return the key at
the root; and if t is smaller than k, we look (recursively) for the
key of rank k - t - 1 in the right subtree. As usual, this
description serves both as the basis for the recursive select()
method on the facing page and for a proof by induction that it works
as expected.
Specifically, I want to understand the purpose of passing k - t - 1 to the recursive select call when the size of the left subtree is less than k.
public Key select(int k)
{
    return select(root, k).key;
}

private Node select(Node x, int k)
{   // Return Node containing key of rank k.
    if (x == null) return null;
    int t = size(x.left);
    if      (t > k) return select(x.left,  k);
    else if (t < k) return select(x.right, k-t-1);
    else            return x;
}
Above is the implementation of the select method of a Binary Search Tree. When t < k, the author passes k-t-1 to the recursive select call, but I still can't figure out why that is.
k − t − 1 is equal to k − (t + 1). When t < k and we recurse into the right subtree, the t keys in the left subtree and the root together account for t + 1 keys that are smaller than everything in the right subtree, so we subtract them from k to get the correct rank within the right subtree.
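A quick worked example (a made-up five-key BST, with subtree sizes in parentheses). To find the key of rank 4:

        D (5)
       /     \
   B (3)     E (1)
   /   \
 A (1) C (1)

select(D, 4): t = size(B) = 3, and t < k, so recurse right with k = 4 - 3 - 1 = 0
select(E, 0): t = size(null) = 0, and t == k, so return E

The keys of ranks 0..3 (A, B, C and the root D) are exactly the t + 1 keys skipped when stepping into the right subtree, so subtracting t + 1 from k gives the rank within that subtree; E is indeed the key of rank 4.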
Here is the general code for the Quick Union algorithm:
public class QuickUnionUF
{
    private int[] id;

    public QuickUnionUF(int N)
    {
        id = new int[N];
        for (int i = 0; i < N; i++) id[i] = i;
    }

    private int root(int i)
    {
        while (i != id[i]) i = id[i];
        return i;
    }

    public boolean connected(int p, int q)
    {
        return root(p) == root(q);
    }

    public void union(int p, int q)
    {
        int i = root(p);
        int j = root(q);
        id[i] = j;
    }
}
Here it is clear that for operations like "initialize", "union" and "find if connected", the number of array accesses will be of the order of N in the worst case (which is quite clear from the code).
However, my book claims that if we modify the QuickUnion to Weighted QuickUnion then the number of array accesses becomes of the order of lg(N). But I can't see how from the code.
The only change that has been made for Weighted QuickUnion is the part within the union() function, as follows:
int i = root(p);
int j = root(q);
if (i == j) return;
if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
else { id[j] = i; sz[i] += sz[j]; }
Here we maintain an extra array sz[], where sz[i] counts the number of objects in the tree rooted at i.
But here, I don't see how the number of array accesses for union becomes of the order of lg(N). It seems the accesses must be of the order of N, as we still have to call the root() method twice. And how is it of the order of lg(N) even for the "find if connected" operation?
I'm confused as to how they are getting the lg(N). Could someone please explain?
In the non-modified version, you get the linear dependency because the links to the parent can be arbitrary. So, in the worst case, you might end up with a single long list (i.e. you have to traverse every other element if you're at the end of the list).
The modification (union-by-rank) aims at producing shallower trees by making the smaller subtree a child of the root of the larger subtree. This heuristic makes the trees much more balanced, i.e. the length of the path from any leaf to its root becomes O(log n). Remember that the height of a full binary tree with k nodes is O(log k).
For a more formal proof, please refer to existing literature.
Additional note: I mentioned that union-by-rank is only a heuristic. Ideally, you would want to make the decision based on the height of both subtrees. However, keeping track of the height is pretty hard. That's why you usually use the size of the subtree, which correlates with its height.
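For reference, here is a minimal sketch of the complete weighted version (the union() body follows the fragment in the question; the constructor and the sz[] initialization are filled in here, so treat those details as illustrative):

public class WeightedQuickUnionUF
{
    private int[] id;   // id[i] = parent of i
    private int[] sz;   // sz[i] = number of objects in the tree rooted at i

    public WeightedQuickUnionUF(int N)
    {
        id = new int[N];
        sz = new int[N];
        for (int i = 0; i < N; i++) { id[i] = i; sz[i] = 1; }
    }

    private int root(int i)
    {
        // With weighting, tree height stays at most lg(N), so this loop
        // performs on the order of lg(N) array accesses.
        while (i != id[i]) i = id[i];
        return i;
    }

    public boolean connected(int p, int q)
    {
        return root(p) == root(q);
    }

    public void union(int p, int q)
    {
        int i = root(p);
        int j = root(q);
        if (i == j) return;
        // Always hang the smaller tree under the root of the larger one.
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }
}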
Sure.
First, it's clear that if the complexity of the root() method is of the order of lg(N), then union() is lg(N) as well: union() just calls root() twice and then does a constant amount of extra work.
Weighted quick union guarantees that lg(N) bound for root(). Here is how:
The WQU (weighted quick union) algorithm stores the elements as several tree structures. The root() method finds the root of the tree that contains element i, so its complexity is bounded by the maximum height of such a tree.
Now let h(i) be the height of the tree which contains element i, and w(i) the size (weight) of that tree.
We maintain the invariant h(i) <= lg(w(i)). Let's see what happens when we take the union of two trees i and j.
Since we are binding one tree to the root of the other, the height of the new tree can be at most max(h(i), h(j)) + 1.
Say w(i) <= w(j), so we bind i to the root of j. If h(j) > h(i), the height doesn't change and we have nothing to worry about. Otherwise, the new height is h(i) + 1. Now:

w(i) + w(j) >= 2 * w(i)
=> lg(w(i) + w(j)) >= lg(2 * w(i)) = 1 + lg(w(i)) >= 1 + h(i)

So h(i) + 1 <= lg(new size), and the invariant is preserved: the height of the new tree is at most lg of its new weight. Hence, in the worst case, for a tree of size N, the root() method needs at most lg(N) steps.
Weighted union preserves the invariant that for every tree with height h and size n, h ≤ log(n)+1.
This is trivially true for the initial set of trees, with n=1 and h=1.
Say we merge two trees with heights h₁, h₂ and sizes n₁, n₂, where n₁ ≥ n₂.
Weighted union ensures that the new height is either h₁ or h₂ + 1, and the new size is n₁ + n₂. In both these cases, the invariant is preserved:
h₁ ≤ log(n₁) + 1 ⇒ h₁ ≤ log(n₁+n₂) + 1
and
h₂ ≤ log(n₂) + 1 ⇒ h₂ + 1 ≤ log(n₂) + 2 ⇒ h₂ + 1 ≤ log(2n₂) + 1 ⇒ h₂ + 1 ≤ log(n₁+n₂) + 1
because n₁ ≥ n₂.
Given a binary tree with TreeNode like:
class TreeNode {
    int data;
    TreeNode left;
    TreeNode right;
    int size;
}
Where size is the number of nodes in the subtree rooted at that node (size of left subtree + size of right subtree + 1).
Print a random element from the tree in O(log n) running time.
Note: This post is different from this one, as it is clearly mentioned that we have a size associated with each node in this problem.
There is an easy approach which gives O(n) complexity.
Generate a random number r in the range 1 to root.size.
Do a BFS or DFS traversal.
Stop at the r-th element visited and print it.
To improve the complexity, we need to define an ordering of our own where we branch at each step instead of going sequentially as in BFS or DFS. We can use the size property of each node to decide whether to traverse the left sub-tree or the right sub-tree. Following is the approach:
Generate a random number in the range 1 to root.size (say r).
Start traversing from the root node and decide whether to go to the left sub-tree, the right sub-tree, or print the root:
if r <= root.left.size, traverse through the left sub-tree
if r == root.left.size + 1, print the root (we have found the random node to print)
if r > root.left.size + 1, traverse through the right sub-tree
Essentially, we have defined an order where the current node's position is (size of its left subtree) + 1.
Since we eliminate either the left or the right sub-tree at each step, the running time is O(log n) for a balanced tree (strictly speaking, O(height)).
The pseudo-code would look something like this:
traverse(node, r)
    if r == node.left.size + 1
        print node
    else if r > node.left.size + 1
        traverse(node.right, r - node.left.size - 1)
    else
        traverse(node.left, r)
Following is an implementation in Java:
public static void printRandom(TreeNode n, int randomNum) {
    // Treat a null child as an empty subtree of size 0.
    int leftSize = n.left == null ? 0 : n.left.size;
    if (randomNum == leftSize + 1)
        System.out.println(n.data);                       // current node is the r-th
    else if (randomNum > leftSize + 1)
        printRandom(n.right, randomNum - (leftSize + 1)); // skip left subtree + node
    else
        printRandom(n.left, randomNum);                   // r-th is in the left subtree
}
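A minimal driver might look like this (a sketch of my own; it assumes the TreeNode from the question and the printRandom method above being visible, e.g. in the same class):

import java.util.Random;

class Demo {
    public static void main(String[] args) {
        // Hand-build a tiny tree: 2 at the root, 1 and 3 as leaves.
        TreeNode left = new TreeNode();   left.data = 1;  left.size = 1;
        TreeNode right = new TreeNode();  right.data = 3; right.size = 1;
        TreeNode root = new TreeNode();   root.data = 2;  root.size = 3;
        root.left = left;
        root.right = right;

        // Random number in the range 1..root.size, as described above.
        int r = new Random().nextInt(root.size) + 1;
        printRandom(root, r);
    }
}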
Use size!
Pick a random number q between 0 and n.
Start from the root. If left->size == q, return the current node's value. If left->size < q, go right, else go left. If you go right, subtract: q -= left->size + 1. Repeat until you output a node.
This gives you O(height of tree). If the tree is balanced you get O(log N).
If the tree is not balanced but you still want to keep O(log N), you can do the same thing but cap the maximum number of iterations. Note that in this case not all nodes have the same probability of being returned.
Yes, use size!
As Sorin said, pick a random number i between 0 and n - 1 (not n)
Then perform the following instruction:
TreeNode rand_node = select(root, i);
Where select could be as follows:
TreeNode select_rec(TreeNode r, int i)
{
    if (i == r.left.size)
        return r;
    if (i < r.left.size)
        return select_rec(r.left, i);
    return select_rec(r.right, i - r.left.size - 1);
}
Now a very important trick: the null node must be a sentinel node with its size set to 0, which makes sense because the empty tree has zero nodes. You can avoid the sentinel, but then the select() operation becomes slightly more complex.
If the tree is balanced, then select() is O(log n).
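A rough sketch of the sentinel idea (NIL and makeLeaf are hypothetical names of mine, assuming the TreeNode from the earlier question):

class SentinelDemo {
    // One shared sentinel stands in for every null child, so r.left.size
    // and r.right.size are always safe to read; Java zero-initializes
    // NIL.size to 0, matching the empty tree.
    static final TreeNode NIL = new TreeNode();

    static TreeNode makeLeaf(int data) {
        TreeNode n = new TreeNode();
        n.data = data;
        n.size = 1;
        n.left = NIL;
        n.right = NIL;
        return n;
    }
}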
Say we have an array, sorted by absolute value:
a[] = {1, 2, -3, 3, -3, -3, 4, -4, 5}
and want to find the position of 3 (which would be index 3).
There will be no multiple indexes for an answer.
It must be efficient, and NOT linear.
I was thinking of doing a binary search of the array, but instead of comparing the raw values, I wanted to compare the absolute values: abs(a[i]) and abs(n) [n is the input number]. Then, if those are equal, I do another comparison, now with the raw values a[i] and n.
But I run into a problem: in the situation above with the same array {1,2,-3,3,-3,-3,4,-4,5}, looking for 3, there are multiple -3 values that get in the way (thus, if comparing the raw values a[i] and n does not work, I have to check a[i+1] and a[i-1]).
OK, I'm just rambling now. Am I overthinking this?
Help me out, thanks! :D
It is a modified binary search problem. The difference between this and regular binary search is that you need to find and test all of the elements that compare as equal according to the sorting criterion.
I would:
use a tweaked binary search algorithm to find the index of the left-most element that matches
iterate through the indexes until you find the element you are looking for, or an element whose absolute value no longer matches.
That should be O(logN) for the first step. The second step is O(1) on average if you assume that the element values are evenly distributed. (The worst case for the second step is O(N); e.g. when the elements all have the same absolute value, and the one you want is the last in the array.)
Here's the method to solve your problem:
/**
 * @param a   array sorted by absolute value
 * @param key value to find (must be positive)
 * @return    position of the first occurrence of the key, or -1 if key not found
 */
public static int binarySearch(int[] a, int key) {
    int low = 0;
    int high = a.length - 1;
    while (low <= high) {
        int mid = (low + high) >>> 1;
        int midVal = Math.abs(a[mid]);
        if (midVal < key)
            low = mid + 1;
        else if (midVal > key || (midVal == key && mid > 0 && Math.abs(a[mid-1]) == key))
            high = mid - 1;
        else
            return mid; // key found
    }
    return -1; // key not found.
}
It's a modification of Arrays.binarySearch from the JDK. There are several changes. First, we compare absolute values. Second, since you want not just any matching position but the first one, I modified a condition: when we find a key, we check whether the previous array item has the same absolute value; if yes, we continue searching to the left. This way the algorithm remains O(log N) even in special cases where many values are equal to the key.
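To finish the job described in the earlier answer (its step 2, scanning for the exact signed value), a helper could look like this; findExact is a hypothetical name of mine:

// Locates the left-most absolute match with binarySearch() above, then
// scans forward while |a[i]| still matches, looking for the exact sign.
public static int findExact(int[] a, int n) {
    int i = binarySearch(a, Math.abs(n));
    while (i >= 0 && i < a.length && Math.abs(a[i]) == Math.abs(n)) {
        if (a[i] == n) return i;   // signs match too: done
        i++;
    }
    return -1;                     // absolute value absent, or sign never matched
}

The scan is O(1) on average for evenly distributed values, as noted above, so the whole lookup stays fast in the typical case.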
I have implemented a function to find the depth of a node in a binary search tree, but my implementation does not take care of duplicates. My code is below; I would like some suggestions on how to handle the duplicates case in this function. Would really appreciate your help.
public int depth(Node n) {
    if (n == null || n == getRoot())
        return 0;
    return depth(getRoot(), n, 0);
}

public int depth(Node temp, Node n, int result) {
    int cmp = n.getData().compareTo(temp.getData());
    if (cmp == 0) {
        return result;
    }
    else if (cmp < 0) {
        return depth(temp.getLeftChild(), n, ++result);
    }
    else {
        return depth(temp.getRightChild(), n, ++result);
    }
}
In the code you show, there is no way to prefer one node with same value over another. You need to have some criteria for differentiation.
You can retrieve the list of all duplicate nodes depths using the following approach, for example:
1. Find the depth of your node.
2. Find the depth of a node with the same value in the left subtree emerging from the node found in 1; stop if there is none.
3. Add the depth of the node found in 1 to the depth of that duplicate.
4. Find the depth of a node with the same value in the right subtree emerging from the node found in 1; stop if there is none.
5. Add the depth of the node found in 1 to the depth of that duplicate.
6. Repeat for the left and right subtrees of each duplicate found.
Also see here: What's the case for duplications in BST?
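A rough sketch of this idea in Java (collectDepths is a hypothetical helper of mine, reusing the Node accessors from the question's code):

import java.util.List;

// Collects the depths of every node whose data equals the given key.
// When the comparison ties, a duplicate may sit in either subtree
// (depending on the insertion policy), so both are searched.
@SuppressWarnings("unchecked")
static void collectDepths(Node t, Comparable key, int depth, List<Integer> out) {
    if (t == null) return;
    int cmp = key.compareTo(t.getData());
    if (cmp == 0) {
        out.add(depth);
        collectDepths(t.getLeftChild(),  key, depth + 1, out);
        collectDepths(t.getRightChild(), key, depth + 1, out);
    } else if (cmp < 0) {
        collectDepths(t.getLeftChild(),  key, depth + 1, out);
    } else {
        collectDepths(t.getRightChild(), key, depth + 1, out);
    }
}

Calling it with getRoot(), the key, depth 0, and an empty ArrayList fills that list with the depths of all duplicates.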
Well, if there are duplicates, then the depth of a node with a given value doesn't make any sense on its own, because there may be multiple nodes with that value, hence multiple depths.
You have to decide what it means, which could be (not necessarily an exhaustive list):
the depth of the deepest node with that value.
the depth of the shallowest node with that value.
the depth of the first node found with that value.
the average depth of all nodes with that value.
the range (min/max) of depths of all nodes with that value.
a list of depths of all nodes with that value.
an error code indicating your query made little sense.
Any of those could make sense in specific circumstances.
Of course, if n is an actual pointer to a node, you shouldn't be comparing values of nodes at all, you should be comparing pointers. That way, you will only ever find one match and the depth of it makes sense.
Something like the following pseudo-code should do:
def getDepth (Node needle, Node haystack, int depth):
    // Gone beyond a leaf: it's not in the tree.
    if haystack == NULL: return -1

    // Pointers equal: you've found it.
    if needle == haystack: return depth

    // Data not equal: search either the left or the right subtree.
    if needle.data < haystack.data:
        return getDepth (needle, haystack.left, depth + 1)
    if needle.data > haystack.data:
        return getDepth (needle, haystack.right, depth + 1)

    // Data equal: need to search BOTH subtrees.
    tryDepth = getDepth (needle, haystack.left, depth + 1)
    if tryDepth == -1:
        tryDepth = getDepth (needle, haystack.right, depth + 1)
    return tryDepth
The reason you have to search both subtrees when the values are equal is that the desired node may be in either subtree. Where the values are unequal, you know which subtree it's in. So, for the case where they're equal, you check one subtree and, if not found, you check the other.
If you are provided the head of a linked list, and are asked to reverse every k-sequence of nodes, how might this be done in Java? e.g., a->b->c->d->e->f->g->h with k = 3 would become c->b->a->f->e->d->h->g.
Any general help or even pseudocode would be greatly appreciated! Thanks!
If k is expected to be reasonably small, I would just go for the simplest thing: ignore the fact that it's a linked list at all, and treat each subsequence as just an array-type thing of things to be reversed.
So, if your linked list's node class is a Node<T>, create a Node<?>[] of size k. For each segment, load k nodes into the array, then just reverse their elements with a simple for loop. In pseudocode:
// reverse the elements within the k nodes
for i from 0 to k/2:
    nodeI = segment[i]
    nodeE = segment[segment.length - i - 1]
    tmp = nodeI.elem
    nodeI.elem = nodeE.elem
    nodeE.elem = tmp
Pros: very simple, O(N) performance, takes advantage of an easily recognizable reversing algorithm.
Cons: requires a k-sized array (just once, since you can reuse it per segment)
Also note that this means that each Node doesn't move in the list, only the objects the Node holds. This means that each Node will end up holding a different item than it held before. This could be fine or not, depending on your needs.
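A minimal Java sketch of this element-swapping approach (the Node<T> class and all names here are mine, not from the question; note that it also reverses a final partial segment, matching the example in the question):

import java.util.ArrayList;
import java.util.List;

class Node<T> {
    T elem;
    Node<T> next;
    Node(T elem) { this.elem = elem; }
}

class SegmentReverser {
    // Reverses the *elements* of every k-node segment in place; the node
    // objects themselves never move, only the values they hold.
    static <T> void reverseEveryK(Node<T> head, int k) {
        List<Node<T>> segment = new ArrayList<>(k);  // reused for each segment
        Node<T> cur = head;
        while (cur != null) {
            segment.clear();
            while (cur != null && segment.size() < k) {  // load up to k nodes
                segment.add(cur);
                cur = cur.next;
            }
            int len = segment.size();
            for (int i = 0; i < len / 2; i++) {          // swap elements pairwise
                Node<T> a = segment.get(i);
                Node<T> b = segment.get(len - 1 - i);
                T tmp = a.elem;
                a.elem = b.elem;
                b.elem = tmp;
            }
        }
    }
}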
This is pretty high-level, but I think it'll give some guidance.
I'd have a helper method like void swap3(Node first, Node last) that takes three elements at an arbitrary position of the list and reverses them. This shouldn't be hard, and could be done recursively (swap the outer elements, recurse on the inner elements until the size of the sublist is 0 or 1). Now that I think of it, you could generalize this into swapK() easily if you're using recursion.
Once that is done, you can just walk along your linked list and call swapK() every k nodes. If the size of the list isn't divisible by k, you could either just not swap that last bit, or reverse the last length % k nodes using your swapping technique.
TIME O(n); SPACE O(1)
A usual requirement of list reversal is that you do it in O(n) time and O(1) space. This eliminates recursion or stack or temporary array (what if K==n?), etc.
Hence the challenge here is to modify an in-place reversal algorithm to account for the K factor. Instead of K I use dist for distance.
Here is a simple in-place reversal algorithm: Use three pointers to walk the list in place: b to point to the head of the new list; c to point to the moving head of the unprocessed list; a to facilitate swapping between b and c.
A->B->C->D->E->F->G->H->I->J->L   //original
A<-B<-C<-D  E->F->G->H->I->J->L   //during processing
         ^  ^
         |  |
         b  c

`a` is the variable that allows us to move `b` and `c` without losing either of the lists.
Node simpleReverse(Node n) { // n is head
    if (null == n || null == n.next)
        return n;
    Node a = n, b = a.next, c = b.next;
    a.next = null;   // the old head becomes the tail
    b.next = a;
    while (null != c) {
        a = c;
        c = c.next;
        a.next = b;
        b = a;       // b is always the head of the reversed part
    }
    return b;
}
To convert the simpleReverse algorithm to a chunkReverse algorithm, do the following:
1] After reversing the first chunk, set head to b; head is the permanent head of the resulting list.
2] For all the other chunks, set tail.next to b; recall that b is the head of the chunk just processed.
Some other details:
3] If the list has one or fewer nodes, or dist is 1 or less, then return the list without processing.
4] Use a counter cnt to track when dist consecutive nodes have been reversed.
5] Use a variable tail to track the tail of the chunk just processed, and tmp to track the tail of the chunk being processed.
6] Notice that before a chunk is processed, its head, which is bound to become its tail, is the first node you encounter: so set it to tmp, the temporary tail.
public Node reverse(Node n, int dist) {
    if (dist <= 1 || null == n || null == n.next)
        return n;
    Node tail = n, head = null, tmp = null;
    while (true) {
        // Reverse the next chunk of up to dist nodes, as in simpleReverse.
        Node a = n, b = a.next;
        n = b.next;
        a.next = null;   // the chunk's head becomes its tail
        b.next = a;
        int cnt = 2;
        while (null != n && cnt < dist) {
            a = n; n = n.next; a.next = b; b = a;
            cnt++;
        }
        // b is now the head of the chunk just reversed.
        if (null == head) head = b;   // first chunk: this is the result's head
        else {
            tail.next = b;            // link the previous chunk to this one
            tail = tmp;
        }
        tmp = n;                      // head (future tail) of the next chunk
        if (null == n) return head;
        if (null == n.next) {         // a single node remains: append it
            tail.next = n;
            return head;
        }
    } // while(true)
}
E.g. in Common Lisp:
(defun rev-k (k sq)
  (if (<= (length sq) k)
      (reverse sq)
      (concatenate 'list (reverse (subseq sq 0 k)) (rev-k k (subseq sq k)))))
Another way, e.g. in F# using a Stack:

open System.Collections.Generic

let rev_k k (list: 'T list) =
    seq {
        let stack = new Stack<'T>()
        for x in list do
            stack.Push(x)
            if stack.Count = k then
                while stack.Count > 0 do
                    yield stack.Pop()
        // flush the final partial group
        while stack.Count > 0 do
            yield stack.Pop()
    }
    |> Seq.toList
Use a stack and recursively remove k items from the list, push them onto the stack, then pop them and add them back in place. Not sure if it's the best solution, but stacks offer a natural way of inverting things. Notice that this also works if you had a queue instead of a list: simply dequeue k items, push them onto the stack, pop them from the stack, and enqueue them :)
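A quick sketch of the queue variant in Java (the class and method names are mine; java.util's ArrayDeque serves as the stack):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedList;
import java.util.Queue;

class QueueKReverser {
    // Dequeues up to k items at a time, pushes them on a stack, then pops
    // them into the output queue, reversing each group of k.
    static <T> Queue<T> reverseEveryK(Queue<T> in, int k) {
        Queue<T> out = new LinkedList<>();
        Deque<T> stack = new ArrayDeque<>();
        while (!in.isEmpty()) {
            for (int i = 0; i < k && !in.isEmpty(); i++)
                stack.push(in.poll());
            while (!stack.isEmpty())
                out.add(stack.pop());
        }
        return out;
    }
}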
This implementation uses the ListIterator class:

LinkedList<T> list;

// Inside the method, after the parameter checks: reverses the first k
// elements by swapping values from both ends toward the middle.
ListIterator<T> it = list.listIterator();          // walks forward from index 0
ListIterator<T> reverseIt = list.listIterator(k);  // walks backward from index k
for (int i = 0; i < k / 2; i++)
{
    T element = it.next();
    it.set(reverseIt.previous());
    reverseIt.set(element);
}