How to Compute Space Complexity for Binary Subtree Finding - Java

This problem is from the book Cracking the Coding Interview. I have trouble understanding the space complexity of the solution.
Problem:
You have two very large binary trees: T1, with millions of nodes, and T2, with hundreds of nodes. Create an algorithm to decide if T2 is a subtree of T1.
Solution (in Java):
public static boolean containsTree(TreeNode t1, TreeNode t2) {
    if (t2 == null)
        return true; // The empty tree is a subtree of every tree.
    else
        return subTree(t1, t2);
}
/* Checks if the binary tree rooted at r1 contains the binary tree
 * rooted at r2 as a subtree somewhere within it.
 */
public static boolean subTree(TreeNode r1, TreeNode r2) {
    if (r1 == null)
        return false; // big tree empty & subtree still not found
    if (r1.data == r2.data) {
        if (matchTree(r1, r2)) return true;
    }
    return subTree(r1.left, r2) || subTree(r1.right, r2);
}
/* Checks if the binary tree rooted at r1 contains the
 * binary tree rooted at r2 as a subtree starting at r1.
 */
public static boolean matchTree(TreeNode r1, TreeNode r2) {
    if (r1 == null && r2 == null)
        return true; // nothing left in the subtree
    if (r1 == null || r2 == null)
        return false; // exactly one tree is empty, so they can't match
    if (r1.data != r2.data)
        return false; // data doesn't match
    return matchTree(r1.left, r2.left) &&
           matchTree(r1.right, r2.right);
}
The book says the space complexity of this solution is O(log(n) + log(m)), where m is the number of nodes in T1 (the larger tree) and n is the number of nodes in T2.
To me, it appears that the solution has O(log(m) * log(n)) space complexity, since the subTree function makes O(log(m)) recursive calls and each recursive call executes the matchTree function, which triggers O(log(n)) recursive calls of its own.
Why is this solution O(log(n) + log(m)) complexity?

Since we're not creating any objects on the heap, the space complexity is the size of the stack. So the question is not how many total calls occur, but how big the stack can grow.
containsTree() can only call subTree(), subTree() can call itself or matchTree(), and matchTree() can only call itself. So at any point where matchTree() has been called, the stack looks like this:
[containsTree] [subTree] ... [subTree] [matchTree] ... [matchTree]
This is why you don't multiply the space complexities here: while each call to subTree() can call matchTree(), those calls to matchTree() leave the stack before subTree() continues recursing.
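To make the additive bound concrete, here is a small, self-contained sketch that tracks the maximum stack depth. The recursion shapes are stand-ins for subTree() and matchTree() (not the book's code): the outer function recurses c times, and at each level first runs an inner recursion of depth d whose frames are popped before the outer recursion continues.

```java
public class StackDepthDemo {
    static int depth = 0;
    static int maxDepth = 0;

    static void enter() { depth++; maxDepth = Math.max(maxDepth, depth); }
    static void leave() { depth--; }

    // Stands in for matchTree(): a chain of d nested recursive calls.
    static void inner(int d) {
        enter();
        if (d > 0) inner(d - 1);
        leave();
    }

    // Stands in for subTree(): at each of c levels it first calls
    // inner(), whose frames are popped before the next outer recursion.
    static void outer(int c, int d) {
        enter();
        if (c > 0) {
            inner(d);
            outer(c - 1, d);
        }
        leave();
    }

    public static void main(String[] args) {
        outer(10, 5);
        // Peak depth is c + d + 1, not c * d: the inner frames are gone
        // by the time the outer recursion goes one level deeper.
        System.out.println(maxDepth); // 16
    }
}
```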
Analysis along the lines of the "correct answer"
If the question doesn't specify whether the trees are balanced, then a real worst-case analysis would assume they might not be. However, you and the book are assuming they are. We can set aside that question for later by saying the depth of T1 is c and the depth of T2 is d. c is O(log(m)) if T1 is balanced, and O(m) otherwise. The same goes for T2's d.
Worst case for matchTree() is O(d), because the furthest it could recurse would be the height of T2.
Worst case for subTree() is O(c) for its own recursion, because the furthest it could recurse is the height of T1; add the cost of one active call to matchTree() and you get a total of O(c + d).
And containsTree() just adds a constant on top of calling subTree(), so that doesn't change the space complexity.
So if both T1 and T2 are balanced, by replacing c and d you can see that O(log(m)+log(n)) seems reasonable.
Problems with the "correct answer"
Like I said before, it's not right to assume binary trees are balanced until you know for a fact that they are. So a better answer might be O(m+n).
But wait! The question states that the size of T2 is less than the size of T1. That means that n is O(m), and log(n) is O(log(m)). So why have we been wasting time worrying about n?
If the trees are balanced, the space complexity is simply O(log(m)). In the general case where you don't know what's balanced or not, the real answer should be O(m), the size of the larger tree.


How to calculate the runtime of this function?

I am trying to calculate the runtime of a function I wrote in Java. I wrote a function that calculates the sum of all the right children in a binary tree.
I used recursion in the function, and I don't really understand how to calculate the runtime of a recursive function, much less one on a binary tree (I just started studying the subject).
This is the code I wrote:
public int sumOfRightChildren() {
    return sumOfRightChildren(this.root);
}

private int sumOfRightChildren(Node root) {
    if (root == null)       // O(1)
        return 0;           // O(1)
    int sum = 0;            // O(1)
    if (root.right != null)      // O(1)
        sum += root.right.data;  // O(1)
    sum += sumOfRightChildren(root.right); // worst case O(n)?
    if (root.left != null) {
        sum += sumOfRightChildren(root.left); // worst case O(n)?
    }
    return sum;
}
I tried writing down the runtimes I think it takes, but I don't think I am doing it right.
If someone can help guide me I'd be very thankful.
I'm trying to calculate T(n).
Since you visit every node exactly once, it is easy to see that the runtime cost is T(n) = n * K, where n is the number of nodes in the binary tree and K is the expected per-node cost.
If you want to explicitly consider the cost of certain operations, you may not be able to calculate it exactly (without having a concrete input). For example, calculating the number of times sum += ... is executed is not possible in general, because it depends on the particular tree.
In this case the worst case is a full binary tree; if n = 1, 2, ... is its depth:
the complexity is O(2^n) (no matter the operations, since all of them take O(1), as you have posted).
the cost of sum += root.right.data; is T(n) = 2^n - 1 (all internal nodes).
the cost of all the sum += ... statements together is T(n) = 3 * (2^n - 1) (twice for every internal node and once more for each node).
...
(NOTE: the exact final expression may vary, since your if (root.left != null) check is not useful; it is preferable to leave that work to the if (root == null) base case.)
OK, I think I understood.
The worst case is that it has to check all the nodes in the tree, so the answer is O(n).
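For completeness, here is a runnable version of the same traversal on a tiny hand-built tree (the minimal Node class is a stand-in, assumed for illustration). Each node is visited exactly once, which is where the O(n) comes from:

```java
public class RightChildSum {
    static class Node {
        int data;
        Node left, right;
        Node(int data) { this.data = data; }
    }

    // Same logic as the method above: add each right child's value once,
    // then recurse into both subtrees.
    static int sumOfRightChildren(Node root) {
        if (root == null) return 0;
        int sum = 0;
        if (root.right != null) sum += root.right.data;
        return sum + sumOfRightChildren(root.left)
                   + sumOfRightChildren(root.right);
    }

    public static void main(String[] args) {
        //   1
        //  / \
        // 2   3
        //      \
        //       4
        Node root = new Node(1);
        root.left = new Node(2);
        root.right = new Node(3);
        root.right.right = new Node(4);
        System.out.println(sumOfRightChildren(root)); // 3 + 4 = 7
    }
}
```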

Determining the Big O of a recursive method with two recursive cases?

I am currently struggling with computing the Big O for a recursive exponent function that takes a shortcut whenever n%2 == 0. The code is as follows:
public static int fasterExponent(int x, int n) {
    if (n == 0) return 1;
    if (n % 2 == 0) {
        int temp = fasterExponent(x, n / 2);
        return temp * temp;
    }
    return x * fasterExponent(x, --n);
}
I understand that, without the (n % 2 == 0) case, this recursive exponent function would be O(n). The inclusion of the (n % 2 == 0) case speeds up the execution time, but I do not know how to determine its complexity or the value of its witness constant c.
The answer is O(log n).
Reason: fasterExponent(x, n/2) halves the input at each step, and when it reaches 0 we are done. This obviously needs log n steps.
But what about fasterExponent(x, --n)? We do this when the input is odd, so in the next step it will be even and we fall back to the n/2 case. Let's consider the worst case, where we have to do this every time we divide n by 2. Then we do the second recursive step once for every time we do the first recursive step, so we need 2 * log n operations. That is still O(log n).
I hope my explanation helps.
It's intuitive to see that at each stage you are cutting the problem size in half. For instance, to find x^4, you find x^2 (let's call this A), and return the result as A*A. Again, x^2 itself is found by dividing the problem into x and x.
Considering multiplication of two numbers as a primitive operation, you can see that the recurrence is:
T(N) = T(N/2) + O(1)
Solving this recurrence (using, say, the Master Theorem) yields:
T(N) = O(log N)
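The same recurrence shows up in the standard iterative square-and-multiply technique. This is a sketch of that technique, not the poster's exact code; the loop body runs once per bit of n, i.e. about log2(n) times:

```java
public class FastPow {
    // Iterative square-and-multiply: inspect n bit by bit, squaring the
    // base each round. The loop iterates about log2(n) times, matching
    // the T(N) = T(N/2) + O(1) recurrence above.
    static long fasterExponent(long x, int n) {
        long result = 1;
        long base = x;
        while (n > 0) {
            if ((n & 1) == 1) result *= base; // this bit of n is set
            base *= base;                     // square for the next bit
            n >>= 1;                          // drop the bit we handled
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fasterExponent(2, 10)); // 1024
    }
}
```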

Cracking the Coding Interview, 6th edition, 2.8

Problem Statement: Given a circular linked list, implement an algorithm that returns the node at the beginning of the loop.
The answer key gives a more complicated solution than what I propose. What's wrong with mine?:
public static Node loopDetection(Node n1) {
    ArrayList<Node> nodeStorage = new ArrayList<Node>();
    while (n1.next != null) {
        nodeStorage.add(n1);
        if (nodeStorage.contains(n1.next)) {
            return n1;
        } else {
            n1 = n1.next;
        }
    }
    return null;
}
Your solution is O(n^2) time (each contains() call on an ArrayList is O(n)) and O(n) space (for storing nodeStorage), while the "more complicated" solution is O(n) time and O(1) space.
The book offers the following solution, for whoever is interested, which is O(n) time and O(1) space:
If we move two pointers, one with speed 1 and another with speed 2,
they will end up meeting if the linked list has a loop. Why? Think
about two cars driving on a track—the faster car will always pass the
slower one! The tricky part here is finding the start of the loop.
Imagine, as an analogy, two people racing around a track, one running
twice as fast as the other. If they start off at the same place, when
will they next meet? They will next meet at the start of the next lap.
Now, let’s suppose Fast Runner had a head start of k meters on an n
step lap. When will they next meet? They will meet k meters before the
start of the next lap. (Why? Fast Runner would have made k + 2(n - k)
steps, including its head start, and Slow Runner would have made n - k
steps. Both will be k steps before the start of the loop.) Now, going
back to the problem, when Fast Runner (n2) and Slow Runner (n1) are
moving around our circular linked list, n2 will have a head start on
the loop when n1 enters. Specifically, it will have a head start of k,
where k is the number of nodes before the loop. Since n2 has a head
start of k nodes, n1 and n2 will meet k nodes before the start of the
loop. So, we now know the following:
1. Head is k nodes from LoopStart (by definition).
2. MeetingPoint for n1 and n2 is k nodes from LoopStart (as shown above). Thus, if we move n1 back to Head and keep n2 at MeetingPoint,
and move them both at the same pace, they will meet at LoopStart.
LinkedListNode FindBeginning(LinkedListNode head) {
    LinkedListNode n1 = head;
    LinkedListNode n2 = head;

    // Find meeting point
    while (n2.next != null) {
        n1 = n1.next;
        n2 = n2.next.next;
        if (n1 == n2) {
            break;
        }
    }

    // Error check - there is no meeting point, and therefore no loop
    if (n2.next == null) {
        return null;
    }

    /* Move n1 to Head. Keep n2 at Meeting Point. Each are k steps
     * from the Loop Start. If they move at the same pace, they must
     * meet at Loop Start. */
    n1 = head;
    while (n1 != n2) {
        n1 = n1.next;
        n2 = n2.next;
    }

    // Now n2 points to the start of the loop.
    return n2;
}
I had trouble visualizing what was going on with this algorithm. Hopefully this helps someone else.
At time t = k (3 in this example), p2 is twice the distance from the head (position 0) as p1, so for them to get back in line we need p2 to 'catch up' to p1, which takes L - k (5) more steps, since p2 is travelling at 2x the speed of p1.
At time t = k + (L - k) = L (8), p2 needs to travel k steps forward to get back to position k. If we reset p1 back to the head (0), we know that p1 and p2 will both meet back at k (positions 3 and 19 respectively, the same node since the list is circular) if p2 now travels at the same speed as p1.
There is the solution given by amit. The problem is that you either know it or you don't, and you won't be able to figure it out in an interview. Since I have never had a need to find a cycle in a linked list, knowing it is pointless to me except for passing interviews. So for an interviewer, stating this as an interview question and expecting amit's answer (which is nice because it has linear time and zero extra space) is quite stupid.
So your solution is mostly fine, except that you should use a hash table, and you must make sure that the hash table hashes references to nodes and not node contents. Say you have a node containing a string and a "next" pointer, and the hash function hashes the string and compares nodes as equal if the strings are equal. In that case you'd find the first node with a duplicate string, and not the node at the start of the loop, unless you are careful.
(amit's solution has a very similar problem in languages where == compares the objects and not the references. For example, in Swift you'd have to use === (which compares references) and not == (which compares objects).)
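Along those lines, here is a sketch of that hash-based variant in Java, using an identity-based set so that lookups compare references rather than node contents. The class and method names are my own, not from the book:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class LoopDetection {
    static class Node {
        int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    // O(n) time / O(n) space variant of the question's approach: an
    // identity-based hash set gives O(1) expected lookups and compares
    // references, not node values.
    static Node loopStart(Node head) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Node n = head; n != null; n = n.next) {
            if (!seen.add(n)) return n; // first node seen twice = loop start
        }
        return null; // reached the end: no loop
    }

    public static void main(String[] args) {
        Node a = new Node(1), b = new Node(2), c = new Node(3);
        a.next = b; b.next = c; c.next = b; // loop starts at b
        System.out.println(loopStart(a).data); // 2
    }
}
```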

What is the big oh for comparing two binary trees?

If one binary tree has x nodes and the other has y nodes, where x is bigger than y: I was thinking O(n^2), because searching for each node is O(n).
And how about inserting then comparing the trees?
Assuming your binary trees are sorted, this is an O(n) operation (where n is the sum of the nodes in both trees, not the product).
You can simply run two "indexes" side by side through the trees stopping when an element is different. If you get to the end of both and no differences were found, then the trees were identical, something like the following pseudo-code:
def areEqual (tree1, tree2):
    pos1 = first (tree1)
    pos2 = first (tree2)
    while pos1 != END and pos2 != END:
        if tree1[pos1] != tree2[pos2]:
            return false
        pos1 = next (tree1, pos1)
        pos2 = next (tree2, pos2)
    if pos1 != END or pos2 != END:
        return false
    return true
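For sorted trees (taking "sorted" to mean binary search trees, which is an assumption), the side-by-side walk above can be sketched in Java with explicit stacks standing in for the first/next iterator functions:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SortedTreeEquality {
    static class Node {
        int data;
        Node left, right;
        Node(int data) { this.data = data; }
    }

    // Push a node and all of its left descendants; the stack top is then
    // the next node in sorted (in-order) order.
    static void pushLeft(Deque<Node> stack, Node n) {
        for (; n != null; n = n.left) stack.push(n);
    }

    // Side-by-side in-order walk of two BSTs, as in the pseudo-code
    // above: each node of each tree is pushed and popped once, so the
    // whole comparison is O(x + y).
    static boolean areEqual(Node tree1, Node tree2) {
        Deque<Node> s1 = new ArrayDeque<>(), s2 = new ArrayDeque<>();
        pushLeft(s1, tree1);
        pushLeft(s2, tree2);
        while (!s1.isEmpty() && !s2.isEmpty()) {
            Node n1 = s1.pop(), n2 = s2.pop();
            if (n1.data != n2.data) return false;
            pushLeft(s1, n1.right);
            pushLeft(s2, n2.right);
        }
        return s1.isEmpty() && s2.isEmpty(); // both walks must end together
    }

    public static void main(String[] args) {
        // Different shapes, same sorted contents {1, 2, 3}.
        Node t1 = new Node(2);
        t1.left = new Node(1); t1.right = new Node(3);
        Node t2 = new Node(1);
        t2.right = new Node(2); t2.right.right = new Node(3);
        System.out.println(areEqual(t1, t2)); // true
    }
}
```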
If they're not sorted, and you have no other information that may allow you to optimise the function, and cannot use extra data structures, it will be O(n^2), since you'll have to find an arbitrary equal node in the second tree for every single node in the first (as well as mark it somehow to indicate you've used it).
Keep in mind there are usually ways to trade space for time if the former is more important (and it often is).
For example, even with totally unordered trees, you can reduce the complexity considerably by using hashing for example:
def areEqual (tree1, tree2):
    hash = {}

    # Add all items from first tree.
    for item in tree1.allItems():
        if not exists hash[item]:
            hash[item] = 0
        hash[item] += 1

    # Subtract all items from second tree.
    for item in tree2.allItems():
        if not exists hash[item]:
            hash[item] = 0
        hash[item] -= 1
        if hash[item] == 0:
            delete hash[item]

    if hash.size != 0:
        return false
    return true
Since hashing tends to amortise toward O(1), the problem as a whole can be considered O(n).
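A Java sketch of the same counting idea follows. It is a slight variant of the pseudo-code above: instead of keeping negative counts and checking the map size at the end, it rejects as soon as the second collection has an item the first doesn't. The item sources are plain Iterables standing in for tree1.allItems() and tree2.allItems():

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TreeMultisetEquality {
    // Count items from the first collection, subtract items from the
    // second; equal multisets leave the map empty. Each HashMap
    // operation is amortised O(1), so the whole check is O(n).
    static boolean sameItems(Iterable<Integer> items1, Iterable<Integer> items2) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int item : items1)
            counts.merge(item, 1, Integer::sum);
        for (int item : items2) {
            Integer c = counts.get(item);
            if (c == null) return false;     // item missing from first tree
            if (c == 1) counts.remove(item);
            else counts.put(item, c - 1);
        }
        return counts.isEmpty(); // leftovers mean the first tree had extras
    }

    public static void main(String[] args) {
        System.out.println(sameItems(List.of(1, 2, 2, 3),
                                     List.of(3, 2, 1, 2))); // true
    }
}
```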

Recursive toString() method in a binary search tree. What is the time complexity for this?

I'm a beginner in Java and looking for some help.
So I've made this binary tree in Java, and I'm supposed to implement a method which sorts all elements in order and converts them into a string. It's supposed to look like e.g. "[1, 2, 3, 4]". I used a StringBuilder in order to do this.
My code for the method looks like this:
/**
 * Converts all nodes in the current tree to a string. The string consists
 * of all elements, in order.
 * Complexity: ?
 *
 * @return string
 */
public String toString() {
    StringBuilder string = new StringBuilder("[");
    helpToString(root, string);
    string.append("]");
    return string.toString();
}
/**
 * Recursive help method for toString.
 *
 * @param node
 * @param string
 */
private void helpToString(Node<T> node, StringBuilder string) {
    if (node == null)
        return; // Tree is empty, so leave.
    if (node.left != null) {
        helpToString(node.left, string);
        string.append(", ");
    }
    string.append(node.data);
    if (node.right != null) {
        string.append(", ");
        helpToString(node.right, string);
    }
}
So my question is, how do I calculate the time complexity of this? Also, if there are any suggestions on how to make this method better, I would gladly appreciate them.
The easiest answer is: O(n). You visit every node once and do a constant amount of work (call it a) for each. The calculation would go like
O(a*n)
and because we ignore constant factors, the simple answer would be
O(n)
But one could also argue you're doing just a little bit more: you also return each time you visit a place where there is no leaf. This again is a constant amount of work (call it b).
Let's call those invisible leaves for a moment. Following this idea, every value in the tree is a node which has zero, one or two invisible leaves.
So now, we have the following to do (for each node):
a      | if a node has two child nodes
a + b  | if a node has one child node
a + 2b | if a node has no child nodes.
for a worst case binary tree (totally unbalanced), we have (n-1) nodes with one child node and one node with no child node:
"1"
\
"2"
\
"3"
\
"4"
And so, the calculation is
(n-1)*(a+b) + (a+2b)
<=> an + bn - a - b + a + 2b
<=> n(a+b) + b
=> O(an + bn) // we ignore `+ b` because we always look at really big n's
Fortunately, even in the worst-case scenario, the time complexity is still linear; only the constant is higher than in the easy answer. In most cases, usually when Big-O is needed to compare algorithms, we even ignore the factor and are happy to say that the algorithm's time complexity is linear, O(n).
The time complexity is O(n) since you are visiting every node once. You cannot do any better than that in order to walk the tree.
Time complexity is linear in the number of nodes in the tree: you are visiting each node exactly once.
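A side note on why the StringBuilder matters here: repeated String concatenation with + copies the growing prefix on every append, turning the build into O(n^2) total character copies, while StringBuilder appends in amortised O(1) and keeps the whole build linear. A minimal sketch contrasting the two (illustrative helper names, not from the original code):

```java
public class ConcatCost {
    // Appending n pieces with '+' copies the whole prefix each time:
    // 1 + 2 + ... + n character copies, i.e. O(n^2) work overall.
    static String viaConcat(int n) {
        String s = "";
        for (int i = 0; i < n; i++) s += "x";
        return s;
    }

    // StringBuilder grows its internal buffer and appends in amortised
    // O(1), so n appends cost O(n). This is why the toString() above
    // stays linear in the number of nodes.
    static String viaBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append("x");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(viaConcat(5).equals(viaBuilder(5))); // true
    }
}
```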
