Time Complexity of a Binary Search Tree Insert method - java

Good day!
I have a question regarding the time complexity of a binary search tree insertion method. I read some of the answers regarding this but some were different from each other. Is the time complexity for a binary search tree insertion method O(log n) at average case and O(n) at worst case? Or is it O(n log n) for the average case and O(n^2) for the worst case? When does it become O(n log n) at average case and O(n^2) at worst case?

In avg case, is O(log n) for 1 insert operation since it consists of a test (constant time) and a recursive call (with half of the total number of nodes in the tree to visit), making the problem smaller in constant time. Thus for n insert operations, avg case is O(nlogn). The key is that the operation requieres time proportional to the height of the tree. In average, 1 insert operation is O(logn) but in the worst case the height is O(n)
If you're doing n operations, then avg is O(nlgn) and worst O(n^2)

Related

Time complexity of k-th smallest number using PriorityQueue in Java

I am trying to solve the popular interview question Find the k-th smallest number in an array of distinct integers. I read some solutions and found that a heap data structure suits this problem very well.
So, I have tried to implement a solution using the PriorityQueue class of the Collections framework, assuming that it is functionally identical to a heap.
Here is the code that I've tried:
public static int getKthMinimum(int[] input, int k){
PriorityQueue<Integer> heap = new PriorityQueue<Integer>();
// Total cost of building the heap - O(n) or O(n log n) ?
for(int num : input){
heap.add(num); // Add to heap - O(log n)
}
// Total cost of removing k smallest elements - O(k log n) < O(n) ?
while(k > 0){
heap.poll(); // Remove min - O(log n)
k--;
}
return heap.peek(); // Fetch root - O(1)
}
Based on the docs, the poll & add methods take O(log n) time and peek takes constant time.
What will be the time complexity of the while loop? (I think O(k log n)).
Should O(k log n) be considered higher than O(n) for the purpose of this question? Is there a threshold where it switches?
What will be the total time complexity of this code? Will it be O(n)?
If not already O(n), is there a way to solve it in O(n), while using the PriorityQueue class?
1. What will be the time complexity of the while loop? (I think O(k log n)).
O(k log n) is correct.
2. Should O(k log n) be considered higher than O(n) for the purpose of this question? Is there a threshold where it switches?
You cannot assume that. k can be anywhere from 0 to n−1, which means that k log n can be anywhere from 0 to n log n.
3. What will be the total time complexity of this code? Will it be O(n)?
O(n log n), because that's the cost of building the heap.
It's possible to build a heap in O(n) time, but your code doesn't do that; if it did, your total complexity would be O(n + k log n) or, equivalently, O(MAX(n, k log n)).
4. If not already O(n), is there a way to solve it in O(n), while using the PriorityQueue class?
No. There exist selection algorithms in worst-case O(n) time, but they're a bit complicated, and they don't use PriorityQueue.
The fastest PriorityQueue-based solutions would require O(MAX(n, n log MIN(k, n−k))) time. (The key is to keep only the k smallest elements in the heap while iterating — or the n−k largest elements and use a max-heap, if k is large enough for that to be worthwhile.)

Complexity after Arrays.sort and for loop

In java, if I sort using Arrays.sort() (n log n) and then use a for loop o(n) in the code, what will be the new complexity ? is it n^2 log n or n log n
Answer: O(nLog(n))
non nested complexities can be simply added. i.e. O(n) + O(nLog(n))
For large n, nLog(n) is significantly greater than n. Therefore, O(nLog(n)) is the answer.
Read this: https://en.wikipedia.org/wiki/Big_O_notation
Note:
if the complexities are nested then the complexities are multiplied, for example:
Inside a loop of order n, you are doing a sort of order nLog(n).
Then complexity will be O(n * nLog(n)). i.e. O(n²Log(n))
If you perform the for loop after, you have an O(nlog(n)) operation followed by an O(n) one. Since O(n) is negligible compared to O(nlog(n)), your overall complexity would be O(nlog(n)).

Should I use TreeSet or HashSet?

I have large number of strings, I need to print unique strings in sorted order.
TreeSet stores them in sorted order but insertion time is O(Logn) for each insertion. HashSet takes O(1) time to add but then I will have to get list of the set and then sort using Collections.sort() which takes O(nLogn) (I assumes there is no memory overhead here since only the references of Strings will be copied in the new collection i.e. List). Is it fair to say overall any choice is same since at the end total time will be same?
That depends on how close you look. Yes, the asymptotic time complexity is O(n log n) in either case, but the constant factors differ. So it's not like one method can get a 100 times faster than the other, but it's certainly possible that one method is twice a fast as the other.
For most parts of a program, a factor of 2 is totally irrelevant, but if your program actually spends a significant part of its running time in this algorithm, it would be a good idea to implement both approaches, and measure their performance.
Measuring is the way to go, but if you're talking purely theoretically and ignoring read from after sorting, then consider for number of strings = x:
HashSet:
x * O(1) add operations + 1 O(n log n) (where n is x) sort operation = approximately O(n + n log n) (ok, that's a gross oversimplification, but..)
TreeSet:
x * O(log n) (where n increases from 1 to x) + O(0) sort operation = approximately O(n log (n/2)) (also a gross oversimplification, but..)
And continuing in the oversimplification vein, O(n + n log n) > O(n log (n/2)). Maybe TreeSet is the way to go?
If you distinguish the total number of strings (n) and number of unique strings (m), you get more detailed results for both approaches:
Hash set + sort: O(n) + O(m log m)
TreeSet: O(n log m)
So if n is much bigger than m, using a hash set and sorting the result should be slightly better.
You should take into account which methods will be executed more frequently and base your decision on that.
Apart from HashSet and TreeSet you can use LinkedHashSet which provides better performance for sorted sets. If you want to learn more about their differences in performance I suggest your read 6 Differences between TreeSet HashSet and LinkedHashSet in Java

Deciding a Big-O notation for an algorithm

I have questions for my assignment.
I need to decide what is the Big-O characterization for this following algorithm:
I'm guessing the answer for Question 1 is O(n) and Question 2 is O(log n), but I kinda confused
how to state the reason. Are my answers correct? And could you explain the reason why the characterization is like that?
Question 1 : O(n) because it increments by constant (1).
first loop O(n) second loop also O(n)
total O(n) + O(n) = O(n)
Question 2 : O(lg n) it's binary search.
it's O(lg n), because problem halves every time.
if the array is size n at first second is n/2 then n/4 ..... 1.
n/2^i = 1 => n = 2^i => i = log(n) .
Yes, your answers are right.The first one is pretty simple. 2 separate for loops. So effectively its O(n).
The second one is actually tricky. You are actually dividing the input size by 2 (half), that would effectively lead to a time complexity of O(log n).

TreeRangeMap time and space complexities

I'm looking to guava TreeRangeMap that seem to very well suit my needs for a project. The java docs says that is based on a (java standard ?) TreeMap that have O(log(n)) time for get, put and next.
But the TreeRangeMap should be some kind of range tree implementation that according to this SO question have O(k + log(n)) time complexity for queries, O(n) space, with k being the range size?. Can somebody confirm this?
I'm also very interested in the time complexity of TreeRangeMap.subRangeMap() operation. Does it have the same O(k + log(n))?
Thanks.
It's a view, not an actual mutation or anything. subRangeMap returns in O(1) time, and the RangeMap it returns has O(log n) additive cost for each of its query operations -- that is, all of its operations still take O(log n), just with a higher constant factor.
Source: I'm "the guy who implemented it."
We generally use Range tree to find the points that lie in a given interval [x1, x2] and x1 < x2. However, if the range tree is a balanced-binary tree(as in the case of TreeMap which is implemented with Red-Black tree), the search paths to x1(or successor) and x2(or predecessor) have cost O(log n). When we find them, if there k number of points lie in this range, we will have to report it using Tree traversal which will have linear cost O(k). So in total O(k + log(n)).
I'm also very interested in the time complexity of
TreeRangeMap.subRangeMap() operation. Does it have the same O(k +
log(n))?
<K,V> subRangeMap(Range<K> subRange) Returns a view of the part of this range map that intersects with range, resulting in just another balanced-binary search Tree. So why not?

Categories