I found other entries for this question that dealt with specific methods, but nothing comprehensive. I'd like to verify my own understanding of the most commonly used methods of this data structure (ArrayList):
O(1) - Constant Time:
isEmpty()
add(x)
add(i, x)
set(i, x)
size()
get(i)
remove(i)
O(N) - Linear Time:
indexOf(x)
clear()
remove(x)
remove(i)
Is this correct? Thanks for your help.
The best resource is straight from the official API:
The size, isEmpty, get, set, iterator, and listIterator operations run in constant time. The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.
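To make the quoted costs concrete, here is a minimal sketch (the class and values are mine, not from the question) annotating common calls with the costs described above:

import java.util.ArrayList;
import java.util.List;

public class ArrayListCosts {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        list.add(42);                    // amortized O(1): occasional resize averages out
        list.add(0, 7);                  // O(n): shifts every element to the right
        list.get(0);                     // O(1): direct array index
        list.set(0, 99);                 // O(1): direct array index
        list.remove(0);                  // O(n): shifts every element to the left
        list.indexOf(42);                // O(n): linear scan using equals()
        System.out.println(list.size()); // O(1): reads a stored field
    }
}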
I have to construct a max-heap out of an array (called nums in the following code) and so I'm using java.util.PriorityQueue.
My code looks like this:
PriorityQueue<Integer> pq = new PriorityQueue<>(nums.length, (a, b) -> b - a);
for (int i = 0; i < nums.length; i++) {
pq.offer(nums[i]);
}
I'm trying to find the time complexity (in terms of Big-O notation) of the above for loop.
I understand that PriorityQueue doesn't specify the details of the growth of its underlying data structure. (And in the worst case it can be O(n), when the internal array is expanded and all the elements are copied over to the newly allocated space.)
But I assume that when I specify the initialCapacity and don't add more elements than this initialCapacity, the worst-case time complexity of the above loop should be O(n) instead of O(n log n). I understand from here that the build time of a heap is O(n) and that O(n log n) is a loose upper bound.
Am I correct, or am I missing something?
I just want to understand that if I configure my PriorityQueue with initialCapacity of n and add n elements in that priority-queue, what will be the time complexity of this building-heap process?
PS: I already saw this, but the answers for that question just claim things without explanation, and maybe they are not so Java-specific.
I also see that java.util.PriorityQueue has a constructor that takes in a Collection. What will be the time complexity of this constructor? Shouldn't it be O(n)?
I understand that PriorityQueue doesn't specify the details of the growth of its underlying data structure.
Let us be clear about this. The javadoc states that the policy for expanding the queue is unspecified.
(And in the worst case it can be O(n), when the internal array is expanded and all the elements are copied over to the newly allocated space.)
The current policy (Java 11) is:
// Double size if small; else grow by 50%
int newCapacity = oldCapacity + ((oldCapacity < 64) ?
                                 (oldCapacity + 2) :
                                 (oldCapacity >> 1));
The amortized cost is O(1) per insertion for a "double" policy. For a grow-by-50% policy the constant factor is not as good, but the amortized cost is still O(1), which is way better than O(n).
It is safe to assume that they wouldn't unilaterally change this policy to something with substantially worse complexity, no matter what the current specification (technically) permits.
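As a sanity check on that claim, here is a small self-contained sketch (hypothetical code, not from the JDK) that counts element copies under the Java 11 growth rule and reports the copies per insertion:

public class GrowthCost {
    public static void main(String[] args) {
        int n = 1_000_000;
        long copies = 0;
        int capacity = 11;              // PriorityQueue's default initial capacity
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += capacity;     // every element is copied when the array grows
                // Java 11 rule: double if small, else grow by 50%
                capacity += (capacity < 64) ? (capacity + 2) : (capacity >> 1);
            }
        }
        System.out.printf("%.3f copies per insertion%n", (double) copies / n);
        // Prints a small constant (about 2), i.e. amortized O(1) per insertion.
    }
}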
However, this is not germane to your question, since you are supplying an initialCapacity, either explicitly or implicitly when you populate the PriorityQueue from a collection.
I assume that when I specify the initialCapacity and don't add more elements than this initialCapacity, the worst-case time complexity of the above loop should be O(n) instead of O(n log n). I understand from here that the build time of a heap is O(n) and that O(n log n) is a loose upper bound.
Am I correct, or am I missing something?
I think you are missing something.
Assuming that your input array is unsorted, building the heap ("heapification") and then retrieving the elements in order is equivalent to sorting the elements into priority order. That is an O(n log n) operation on average. While the heapification itself is O(n) (the bulk constructor uses sift-down heapification; a loop of offer calls is O(n log n) in the worst case), you have actually put off some of the sorting cost until later.
So unless you only intend to retrieve an asymptotically smaller (i.e. not O(n)) subset of the elements that you put into the queue, the overall answer is O(n log n).
I just want to understand that if I configure my PriorityQueue with initialCapacity of n and add n elements in that priority-queue, what will be the time complexity of this building-heap process?
The overall complexity (adding and removing n elements) will be O(n log n) for the reason stated above.
I also see that PriorityQueue has a constructor that takes in a Collection. What will be the time complexity of this constructor? Shouldn't it be O(n)?
If the collection is unsorted, then the elements must be heapified; see above. There is some special-case code that deals with a SortedSet (or another PriorityQueue) and skips the heapification step.
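For illustration, a sketch of the two construction paths under discussion (variable names are mine):

import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

public class HeapBuild {
    public static void main(String[] args) {
        Integer[] nums = {5, 1, 4, 2, 3};

        // Path 1: n offer() calls, each O(log n) worst case -> O(n log n) to build.
        PriorityQueue<Integer> maxHeap =
                new PriorityQueue<>(nums.length, Comparator.reverseOrder());
        for (int x : nums) maxHeap.offer(x);

        // Path 2: bulk constructor, sift-down heapify -> O(n) to build.
        // (A plain Collection carries no comparator, so this is a min-heap.)
        PriorityQueue<Integer> minHeap = new PriorityQueue<>(Arrays.asList(nums));

        // Either way, draining the heap costs O(n log n):
        while (!minHeap.isEmpty()) System.out.print(minHeap.poll() + " "); // 1 2 3 4 5
    }
}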
Notes:
You can confirm the details above by reading the source code for PriorityQueue. Google can find it for you.
The Wikipedia page on Heapsort talks about heapification.
The proof that growing an array-based data structure by doubling gives O(1) amortized cost per insertion is given in good algorithms textbooks. The same analysis can be applied to growing by 50%.
Your lambda expression (a, b) -> b - a is not a correct ordering for integers in general: the subtraction b - a can overflow when the operands have opposite signs and large magnitudes. Use Integer.compare(b, a) or Comparator.reverseOrder() instead.
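A quick sketch of the hazard and the overflow-safe alternatives:

import java.util.Comparator;
import java.util.PriorityQueue;

public class ComparatorOverflow {
    public static void main(String[] args) {
        int a = Integer.MIN_VALUE, b = 1;
        // b > a, so b - a should be positive, but the subtraction overflows:
        System.out.println(b - a); // prints -2147483647

        // Overflow-safe max-heap comparators:
        PriorityQueue<Integer> pq1 = new PriorityQueue<>(Comparator.reverseOrder());
        PriorityQueue<Integer> pq2 = new PriorityQueue<>((x, y) -> Integer.compare(y, x));
    }
}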
On page 290 of the book Data Structures and Algorithms it is mentioned that the complexity of remove(i) for an ArrayList is O(1). My first question is: why not O(n)? It is also mentioned that add(i, e) for a linked list is O(n), so my second question is: why not O(min(i, n-i))?
Finally, my third question: is the complexity given as O(min(i, n-i)) due to it being a doubly linked list, meaning we could traverse either from the beginning (i steps) or from the end (n-i steps)?
The first one is debatable. When you remove the last element in an ArrayList, it's constant, but for a middle element you need to shift all successor elements to the left. Java does that using System.arraycopy(), a very fast native routine for copying arrays, but even that method is clearly O(n), not constant, so I'm inclined to agree with you. It's different for an append, where the amortized cost of the occasional resize averages out to a constant factor, so add(e) at the end is O(1).
The second one is, in fact, implemented that way: java.util.LinkedList's internal node(int) helper traverses from the head or from the tail, whichever end is closer to the index, precisely because the list is doubly linked. So the traversal really does cost O(min(i, n-i)) steps.
Finally, in Big-O notation less significant factors are discarded, and since min(i, n-i) is at most n/2, O(min(i, n-i)) is still bounded by O(n); the book's O(n) is correct, just not the tightest statement, even though the real world tells us that traversing from the nearer end is certainly an optimization.
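For reference, here is a simplified paraphrase (not the exact JDK source) of the "walk from the nearer end" trick in LinkedList:

// Simplified from java.util.LinkedList.node(int); not the exact JDK code.
class MiniLinkedList<E> {
    static final class Node<E> { E item; Node<E> next, prev; }
    Node<E> first, last;
    int size;

    Node<E> node(int index) {
        if (index < (size >> 1)) {       // first half: traverse from the head
            Node<E> x = first;
            for (int i = 0; i < index; i++) x = x.next;
            return x;                    // i steps
        } else {                         // second half: traverse from the tail
            Node<E> x = last;
            for (int i = size - 1; i > index; i--) x = x.prev;
            return x;                    // n - 1 - i steps
        }
    }
}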
Based on this post,
Time complexity of TreeMap operations - subMap, headMap, tailMap
subMap() itself is O(1), and O(n) comes from iterating the sub map.
So, why use get(key) then?
We can use subMap(key, true, key, true) instead,
which is O(1) and iterating this sub map is also O(1).
Faster than get(key), which is O(log n). Something must be wrong here...
We can use subMap(key, true, key, true) instead, which is O(1)
This is correct
and iterating this sub map is also O(1).
O(n) comes from the question. The answer says nothing to imply this, which is good, because it's not true.
Time complexity of iterating a subtree is O(log n + k), where n is the number of elements in the whole map and k is the number of elements in the sub map. In other words, it still takes O(log n) to get to the first position when you start iterating. Look up the getFirstEntry() implementation to see how it is done.
This brings the overall complexity of your approach to O(log n), but it is bound to be slower than a simple get, because an intermediate object is created and discarded in the process.
The answer is a bit confusing. Technically it's true that creating the sub map is a constant-time operation. But that's just because it actually does nothing apart from setting the low and high keys; it still shares the tree structure with the original tree.
As a result, any operation on the tree is actually postponed until the specific method is invoked. So get() still searches the original tree, only additionally checking that the key doesn't cross the low and high boundaries. Simply put, get() is still O(log n), where n comes from the original map, not from the sub map.
The construction of the subMap takes O(1) time; however, all retrieval operations take the same O(log n) time as in the original map, because the sub map just wraps the original object and, after a range check, delegates the invocation of the get() method to the original source map object.
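To see both paths side by side, a small sketch (the map contents are mine):

import java.util.TreeMap;

public class SubMapVsGet {
    public static void main(String[] args) {
        TreeMap<Integer, String> map = new TreeMap<>();
        for (int i = 0; i < 1000; i++) map.put(i, "v" + i);

        // Direct lookup: one O(log n) descent of the red-black tree.
        String direct = map.get(500);

        // Via a single-key sub map: O(1) to create the view, but the lookup
        // still descends the shared tree in O(log n), after a range check,
        // plus the overhead of the intermediate view object.
        String viaView = map.subMap(500, true, 500, true).get(500);

        System.out.println(direct.equals(viaView)); // true
    }
}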
I have read somewhere that ArrayList's add() and remove() operations run in "amortized constant" time. What does this mean exactly?
In the implementation of add(item) I can see that ArrayList uses an array buffer, which is at most 3/2 of the list's size, and if it is full, System.arraycopy() is called, which should execute in O(n), not O(1), time. Is it then that System.arraycopy does something smarter than copying elements one by one into the newly created array, since the time is actually O(1)?
Conclusion: add(item) runs in amortized constant time, but add(index, item) and remove(index) don't; they run in linear time (as explained in the answers).
I have read somewhere that ArrayList's add() and remove() operations run in "amortized constant" time.
I don't think that is true for remove() except under unusual conditions.
A remove(Object) call for a random element on average has to call equals on half of the entries in the list, and then copy the references for the other half.
A remove(int) call for a random element on average has to copy the references for half of the elements.
The only case where remove(...) is going to be O(1) on average is when you use remove(int) to remove elements at some constant offset from the end of the list.
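For example (a sketch; the cheap case is removal at the tail):

import java.util.ArrayList;
import java.util.List;

public class RemoveCosts {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("a", "b", "c", "d"));

        list.remove(list.size() - 1); // O(1): nothing to shift
        list.remove(0);               // O(n): shifts all remaining elements left
        list.remove("c");             // O(n): linear equals() scan, then a shift
        System.out.println(list);     // [b]
    }
}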
"Amortized" roughly means "averaged across the entire runtime". Yes, an array-copy will be O(n). But that only happens when the list is full, which happens 1 in n times.
I think that amortized constant time just means that it is pretty much constant time if you do tons of operations. So in one test add a million items to the list, then in another test add two million items. The latter should be about 2 times slower than the former; therefore, amortized constant time.
Amortized constant time is different from constant time.
Basically amortized O(1) means that over n operations, the average run time for any operation is O(1).
For array lists, this works something like:
(O(1) insert + O(1) insert + ... + O(n) array copy) / n operations = O(1)
There is an in-depth description of the meaning of amortized constant time in the thread Constant Amortized Time.
Amortised time explained in simple terms:
If you do an operation say a million times, you don't really care about the worst-case or the best-case of that operation - what you care about is how much time is taken in total when you repeat the operation a million times.
So it doesn't matter if the operation is very slow once in a while, as long as "once in a while" is rare enough for the slowness to be diluted away. Essentially amortised time means "average time taken per operation, if you do many operations". Amortised time doesn't have to be constant; you can have linear and logarithmic amortised time or whatever else.
Let's take mats' example of a dynamic array, to which you repeatedly add new items. Normally adding an item takes constant time (that is, O(1)). But each time the array is full, you allocate twice as much space, copy your data into the new region, and free the old space. Assuming allocates and frees run in constant time, this enlargement process takes O(n) time where n is the current size of the array.
So each time you enlarge, you take about twice as much time as the last enlarge. But you've also waited twice as long before doing it! The cost of each enlargement can thus be "spread out" among the insertions. This means that in the long term, the total time taken for adding m items to the array is O(m), and so the amortised time (i.e. time per insertion) is O(1).
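Concretely: growing by doubling from capacity 1 up to m copies 1 + 2 + 4 + ... + m/2 < m elements in total, so inserting m items costs O(m) work overall, i.e. O(1) amortized per insertion.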
I am brushing up algorithms and data structures and have a few questions as well as statements I would like you to check.
ArrayList - O(1) (size, get, set, ...), O(n) - add operation.
LinkedList - all operations O(1) (including add()), except for retrieving the n-th element, which is O(n). I assume the size() operation runs in O(1) as well, right?
TreeSet - all operations O(log n). The size() operation takes O(log n) too, right?
HashSet - all operations O(1) if a proper hash function is applied.
HashMap - all operations O(1), analogous to HashSet.
Any further explanations are highly welcome. Thank you in advance.
ArrayList.add() is amortized O(1). If the operation doesn't require a resize, it's O(1). If it does require a resize, it's O(n), but the size is then increased such that the next resize won't occur for a while.
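A consequence worth knowing (a sketch, assuming you know the final size up front): pre-sizing the list skips every intermediate resize, so each add is a plain O(1):

import java.util.ArrayList;

public class Presize {
    public static void main(String[] args) {
        int n = 1_000_000;
        ArrayList<Integer> list = new ArrayList<>(n); // no intermediate array copies
        for (int i = 0; i < n; i++) list.add(i);      // each add is plain O(1)
        System.out.println(list.size());
    }
}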
From the Javadoc:
The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.
The documentation is generally pretty good for Java collections, in terms of performance analysis.
The O(1) for hash algorithms isn't a matter of just applying a "proper" hash function - even with a very good hash function, you could still happen to get hash collisions. The usual complexity is O(1), but of course it can be O(n) if all the hashes happen to collide.
(Additionally, that's counting the cost of hashing as O(1) - in reality, if you're hashing strings for example, each call to hashCode may be O(k) in the length of the string.)
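To illustrate the collision worst case, a sketch with a deliberately bad (hypothetical) key class:

import java.util.HashSet;
import java.util.Set;

public class CollidingKey {
    final int id;
    CollidingKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; } // every key collides
    @Override public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }

    public static void main(String[] args) {
        Set<CollidingKey> set = new HashSet<>();
        for (int i = 0; i < 10_000; i++) set.add(new CollidingKey(i));
        // All entries land in one bucket, so add/contains degrade from O(1)
        // toward O(n). (Java 8+ tree-ifies large buckets, which helps most
        // when the keys are Comparable.)
        System.out.println(set.contains(new CollidingKey(9_999))); // true
    }
}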
Visit the following links; they should help clear up your doubts.
Data structures & their complexity
Java standard data structures Big O notation