Time complexity of TreeMap<> operations: get() and subMap() - java

Based on this post,
Time complexity of TreeMap operations- subMap, headMap, tailMap
subMap() itself is O(1), and the O(n) comes from iterating the sub map.
So why use get(key) at all? We could use subMap(key, true, key, true) instead, which is O(1), and iterating this single-entry sub map would also be O(1). That would be faster than get(key), which is O(log(n)). Something must be wrong here...

We can use subMap(key, true, key, true) instead, which is O(1)
This is correct
and iterating this sub map is also O(1).
The O(n) figure comes from your question; the linked answer says nothing to imply it, which is good, because it is not true.
Time complexity of iterating a subtree is O(log n + k), where n is the number of elements in the whole map, and k is the number of elements in the sub-map. In other words, it still takes O(log n) to get to the first position when you start iterating. Look up getFirstEntry() implementation to see how it is done.
This brings the overall complexity of your approach to O(log n), but it is bound to be slower than a simple get, because an intermediate object is created and discarded in the process.

The answer is a bit confusing. Technically it is true that creating the sub map is a constant-time operation, but only because it does nothing beyond recording the low and high keys; the view still shares the tree structure with the original map.
As a result, the real work is deferred until a specific method is invoked on the view. A get() still descends the original tree and merely checks that the key does not fall outside the low and high boundaries. Simply put, get() on the sub map is still O(log n), where n is the size of the original map, not of the sub map.

The construction of subMap takes O(1) time; however, all retrieval operations take the same O(log n) time as in the original map, because the sub map view just wraps the original map and ultimately performs a range check and delegates the get() call to the original map object.
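To make the point concrete, here is a small sketch (class name and map size are only for illustration): both the direct get and the lookups through the single-key sub map end up descending the same red-black tree, so the sub-map route cannot beat O(log n).

import java.util.NavigableMap;
import java.util.TreeMap;

public class SubMapVsGet {
    public static void main(String[] args) {
        NavigableMap<Integer, String> map = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put(i, "v" + i);
        }

        int key = 500;

        // Direct lookup: one O(log n) descent of the red-black tree.
        String direct = map.get(key);

        // Creating the "single-key" sub map is O(1), but the view only records
        // the bounds and keeps a reference back to the original tree...
        NavigableMap<Integer, String> view = map.subMap(key, true, key, true);

        // ...so both of these still descend the original tree in O(log n),
        // plus the overhead of the extra view object and the range checks.
        String viaViewGet = view.get(key);
        String viaIteration = view.firstEntry().getValue();

        System.out.println(direct + " " + viaViewGet + " " + viaIteration);
    }
}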

Related

Want to delete some elements from a list that exist in another list

I have a list, suppose
listA = [679, 890, 907, 780, 5230, 781]
and I want to delete the elements that exist in another list,
listB = [907, 5230]
in minimum time complexity.
I can do this using two for loops, i.e. O(n^2) time complexity, but I want to reduce this to O(n log(n)) or O(n).
Is it possible?
It's possible - if one of the lists is sorted. Assuming that list A is sorted and list B is unsorted, with respective dimensions M and N, the minimum time complexity to remove all of list B's elements from list A will be O((N+M)*log(M)). The way you can achieve this is by binary search - each lookup for an element in list A takes O(log(M)) time, and there are N lookups (one for each element in list B). Since it takes O(M*log(M)) time to sort A, it's more efficient for huge lists to sort and then remove all elements, with total time complexity O((N+M)*log(M)).
On the other hand, if you don't have a sorted list, just use Collection.removeAll, which has a time complexity of O(M*N) in this case. The reason for this time complexity is that removeAll does (by default) something like the following pseudocode:
public boolean removeAll(Collection<?> other)
    for each elem in this list
        if other contains elem
            remove elem from this list
Since contains has a time complexity of O(N) for lists, and you end up doing M iterations, this takes O(M*N) time in total.
Finally, if you want to minimize the time complexity of removeAll (with possibly degraded real world performance) you can do the following:
List<Integer> a = ...
List<Integer> b = ...
HashSet<Integer> lookup = new HashSet<>(b);
a.removeAll(lookup);
For bad values of b, constructing lookup could take up to O(N*log(N)) time, as shown here (see "pathologically distributed keys"). After that, each contains check during removeAll is O(1), so the M iterations take O(M) time. Therefore, the time complexity of this approach is O(M + N*log(N)).
So, there are three approaches here. One provides you with time complexity O((N+M)*log(M)), another provides you with time complexity O(M*N), and the last provides you with time complexity O(M + N*log(N)). Considering that the first and last approaches are similar in time complexity (as log tends to be very small even for large numbers), I would suggest going with the naive O(M*N) for small inputs, and the simplest O(M + N*log(N)) for medium-sized inputs. At the point where your memory usage starts to suffer from creating a HashSet to store the elements of B (very large inputs), I would finally switch to the more complex O((N+M)*log(M)) approach.
You can find an AbstractCollection.removeAll implementation here.
Edit:
The first approach doesn't work so well for ArrayLists - removing from the middle of list A takes O(M) time, apparently. Instead, sort list B (O(N*log(N))), and iterate through list A, removing items as appropriate. This takes O((M+N)*log(N)) time and is better than the O(M*N*log(M)) that you end up with when using an ArrayList. Unfortunately, the "removing items as appropriate" part of this algorithm requires that you create data to store the non-removed elements in O(M), as you don't have access to the internal data array of list A. In this case, it's strictly better to go with the HashSet approach. This is because (1) the time complexity of O((M+N)*log(N)) is actually worse than the time complexity for the HashSet method, and (2) the new algorithm doesn't save on memory. Therefore, only use the first approach when you have a List with O(1) time for removal (e.g. LinkedList) and a large amount of data. Otherwise, use removeAll. It's simpler, often faster, and supported by library designers (e.g. ArrayList has a custom removeAll implementation that allows it to take linear instead of quadratic time using negligible extra memory).
You can achieve this in the following way:
Sort the second list (you can sort either list; here I sort the second one). Then loop through the first list and, for each of its elements, do a binary search in the second list.
You can sort a list with the Collections.sort() method.
Total complexity:
For sorting: O(m log m), where m is the size of the second list (only the second list is sorted).
For removing: O(n log m)
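As a rough sketch of the sort-plus-binary-search idea (the class name is made up, and it builds a new result list instead of removing in place, since removing from the middle of an ArrayList costs O(n) per removal):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RemoveWithBinarySearch {
    public static void main(String[] args) {
        List<Integer> listA = new ArrayList<>(List.of(679, 890, 907, 780, 5230, 781));
        List<Integer> listB = new ArrayList<>(List.of(907, 5230));

        Collections.sort(listB);                                // O(m log m)

        List<Integer> result = new ArrayList<>();
        for (Integer value : listA) {                           // n iterations
            // binarySearch returns a non-negative index only when the value is found.
            if (Collections.binarySearch(listB, value) < 0) {   // O(log m) per lookup
                result.add(value);                              // keep elements not in listB
            }
        }
        System.out.println(result);                             // [679, 890, 780, 781]
    }
}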

HashMap vs. ArrayList insertion performance confusion

From my understanding, a HashMap insertion is O(1) and an ArrayList insertion is O(n), since for the HashMap the hash function computes the hash code and index and inserts the entry, while an ArrayList does a comparison every time it enters a new element.
Firstly, an operation of complexity O(1) does not always take less time than an operation of complexity O(n). O(1) only means that the operation takes constant time (which could be any value), regardless of the size of the input. O(n) means that the time required for the operation increases linearly with the size of the input. This means that O(1) is theoretically guaranteed to take less time than O(n) only as n approaches infinity.
Now coming to your examples, ArrayList.add() operation runs in amortized constant time, which means although it could take up to O(n) time for a particular iteration, the average complexity spread out over time is O(1). For more information on amortized constant time, refer to this question.
ArrayList is faster than HashMap when you add an item at the end of the ArrayList, because there is no need to shift existing elements to the right. You can see the efficiency of HashMap if you instead add items at the front of the ArrayList, like this: arrayList.add(0, str).
When checking this, use 1000 for the outer loop instead of 100000; otherwise it may hang.
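A rough, unscientific timing sketch of that comparison might look like the following (sizes are arbitrary and a serious benchmark would use a harness such as JMH with warm-up runs):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InsertionTiming {
    public static void main(String[] args) {
        int n = 1000;   // keep this small: add(0, ...) is O(n) per call

        List<String> frontList = new ArrayList<>();
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            frontList.add(0, "s" + i);      // insert at the front: shifts every element
        }
        long front = System.nanoTime() - t0;

        List<String> endList = new ArrayList<>();
        t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            endList.add("s" + i);           // append at the end: amortized O(1)
        }
        long append = System.nanoTime() - t0;

        Map<Integer, String> map = new HashMap<>();
        t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            map.put(i, "s" + i);            // expected O(1) per put
        }
        long put = System.nanoTime() - t0;

        System.out.printf("front insert: %d ns, append: %d ns, put: %d ns%n",
                front, append, put);
    }
}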

Java: Inserting into LinkedList efficiently

I am optimizing an implementation of a sorted LinkedList.
To insert an element I traverse the list and compare each element until I have the correct index, and then break loop and insert.
I would like to know if there is any other way that I can insert the element at the same time as traversing the list to reduce the insert from O(n + (n capped at size()/2)) to O(n).
A ListIterator is almost what I'm after because of its add() method, but unfortunately, when there are elements in the list equal to the one being inserted, the insert has to be placed after them. To implement this, ListIterator would need a peek() method, which it doesn't have.
Edit: I have my answer, but will add this anyway since a lot of people haven't understood correctly:
I am searching for an insertion point AND inserting, which combined is higher than O(n)
You may consider a skip list, which is implemented using multiple linked lists at varying granularity. E.g. the linked list at level 0 contains all items, level 1 only links to every 2nd item on average, level 2 to only every 4th item on average, etc.... Searching starts from the top level and gradually descends to lower levels until it finds an exact match. This logic is similar to a binary search. Thus search and insertion is an O(log n) operation.
A concrete example in the Java class library is ConcurrentSkipListSet (although it may not be directly usable for you here).
I'd favor Péter Török's suggestion, but I'd still like to add something about the iterator approach:
Note that ListIterator provides a previous() method for iterating through the list backwards. So first iterate forward until you find the first element that is greater than the one being inserted, then step back with previous() and call add(...). If you hit the end, i.e. no element is greater, just call add(...) without stepping back.
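A minimal sketch of that forward-then-step-back insertion, assuming the list is already sorted (class and method names are made up):

import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class SortedInsert {
    // Inserts value into an already-sorted list, after any equal elements,
    // using a single forward traversal with a ListIterator.
    static <T extends Comparable<T>> void insertSorted(List<T> list, T value) {
        ListIterator<T> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next().compareTo(value) > 0) {
                // We just stepped past the first greater element; step back
                // so add() inserts before it (and therefore after all equals).
                it.previous();
                break;
            }
        }
        it.add(value);
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>(List.of(1, 3, 3, 5));
        insertSorted(list, 3);
        insertSorted(list, 9);
        System.out.println(list);   // [1, 3, 3, 3, 5, 9]
    }
}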
I have my answer, but will add this anyway since a lot of people haven't understood correctly: I am searching for an insertion point AND inserting, which combined is higher than O(n).
You need to maintain a collection of (possibly) non-unique elements that can be iterated in an order given by an ordering function. This can be achieved in a variety of ways. (In the following I use "total insertion cost" to mean the cost of inserting a number (N) of elements into an initially empty data structure.)
A singly or doubly linked list offers O(N^2) total insertion cost (whether or not you combine the steps of finding the position and doing the insertion!), and O(N) iteration cost.
A TreeSet offers O(NlogN) total insertion cost and O(N) iteration cost. But has the restriction of no duplicates.
A tree-based multiset (e.g. TreeMultiset) has the same complexity as a TreeSet, but allows duplicates.
A skip-list data structure also has the same complexity as the previous two.
Clearly, the complexity measures say that a data structure that uses a linked list performs the worst as N gets large. For this particular group of requirements, a well-implemented tree-based multiset is probably the best, assuming there is only one thread accessing the collection. If the collection is heavily used by many threads (and it is a set), then a ConcurrentSkipListSet is probably better.
You also seem to have a misconception about how "big O" measures combine. If I have one step of an algorithm that is O(N) and a second step that is also O(N), then the two steps combined are STILL O(N) .... not "more than O(N)". You can derive this from the definition of "big O". (I won't bore you with the details, but the Math is simple.)
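If pulling in a library multiset (such as Guava's TreeMultiset) is not an option, one way to get the same O(N log N) total insertion cost and O(N) ordered iteration is a TreeMap from element to count. The class below is only a minimal illustration of the idea, not a drop-in replacement:

import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntConsumer;

public class CountingMultiset {
    // A tree-backed "multiset": each O(log n) insert bumps a counter, and
    // iteration expands the counts back into individual elements in order.
    private final TreeMap<Integer, Integer> counts = new TreeMap<>();

    void add(int value) {
        counts.merge(value, 1, Integer::sum);    // O(log n) per insert
    }

    void forEachInOrder(IntConsumer action) {
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                action.accept(e.getKey());       // O(n) over all stored elements
            }
        }
    }

    public static void main(String[] args) {
        CountingMultiset ms = new CountingMultiset();
        for (int v : new int[] {5, 1, 3, 3, 2}) {
            ms.add(v);
        }
        ms.forEachInOrder(v -> System.out.print(v + " "));   // 1 2 3 3 5
    }
}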

Worse case time complexity put/get HashMap

What is the worst-case time complexity of a HashMap when the hash codes of its keys are always equal?
In my understanding: as every key has the same hash code, it will always go to the same bucket and loop through it checking with the equals method, so for both get and put the time complexity should be O(n). Am I right?
I was looking at this HashMap get/put complexity question, but it doesn't answer mine.
Also, here Wiki Hash Table they state the worst-case time complexity for insert is O(1) and for get O(n). Why is that?
Yes, in the worst case your hash map will degenerate into a linked list and you will suffer an O(N) penalty for lookups, as well as inserts and deletions, both of which require a lookup operation (thanks to the comments for pointing out the mistake in my earlier answer).
There are some ways of mitigating the worst-case behavior, such as by using a self-balancing tree instead of a linked list for the bucket overflow - this reduces the worst-case behavior to O(logn) instead of O(n).
In Java 8's HashMap implementation (for when the key type implements Comparable):
Handle Frequent HashMap Collisions with Balanced Trees: In the case of high hash collisions, this will improve worst-case performance from O(n) to O(log n).
From here.
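A quick way to observe both behaviours is a key type with a deliberately constant hashCode; the class below is purely hypothetical. On Java 7 every lookup against such a map scans one long chain (O(n)); on Java 8+ the overflowing bucket is turned into a balanced tree, because the key is Comparable, and lookups degrade only to O(log n):

import java.util.HashMap;
import java.util.Map;

public class CollidingKeys {
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; }   // constant hash: every key collides
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 10000; i++) {
            map.put(new BadKey(i), i);    // all entries land in the same bucket
        }
        System.out.println(map.get(new BadKey(9999)));
    }
}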
In open hashing (separate chaining), you have a linked list to store objects which have the same hash code. So, for example, suppose you have a hash table of size 4.
1) Assume you want to store an object with hashcode = 0. The object will then be mapped to index (0 mod 4 =) 0.
2) Then you want to put another object with hashcode = 8. This object will be mapped to index (8 mod 4 =) 0. Since index 0 is already filled by our first object, we have to put the second one next to the first:
[0] => linkedList{object1, object2}
[1] => null
[2] => null
[3] => null
3) What are the steps for searching? First, you hash the key object; assume its hashcode is 8, so you are directed to index (8 mod 4 =) 0. Because more than one object is stored at that index, you have to search one by one through all the objects in the list until you find a match or reach the end of the list. In this example, two objects are stored at index 0 and the searched object lies at the very end of the linked list, so you need to walk through all the stored objects. That is why it is O(n) in the worst case.
The worst case occurs when all the stored objects land in the same index of the hash table, so they all end up in one linked list which we (may) need to walk through entirely to find the searched object.
Hope this helps.
HashMap complexity:
         Best   Average   Worst
Search   O(1)   O(1)      O(n)
Insert   O(1)   O(1)      O(n)
Delete   O(1)   O(1)      O(n)
Hope that helps, in short.
When inserting, it doesn't matter where in the bucket you put it, so you can just insert it anywhere, thus insertion is O(1).
Lookup is O(n) because you will have to loop through each object and verify that it is the one you were looking for (as you've stated).

Set time and speed complexity

I am brushing up algorithms and data structures and have a few questions as well as statements I would like you to check.
ArrayList - O(1) (size, get, set, ...), O(n) - add operation.
LinkedList - all operations O(1) (including add()), except for retrieving the n-th element, which is O(n). I assume the size() operation runs in O(1) as well, right?
TreeSet - all operations O(lg(N)). The size() operation takes O(lg(n)), right?
HashSet - all operations O(1) if a proper hash function is applied.
HashMap - all operations O(1), analogous to HashSet.
Any further explanations are highly welcome. Thank you in advance.
ArrayList.add() is amortized O(1). If the operation doesn't require a resize, it's O(1). If it does require a resize, it's O(n), but the size is then increased such that the next resize won't occur for a while.
From the Javadoc:
The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.
The documentation is generally pretty good for Java collections, in terms of performance analysis.
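To illustrate the amortized-resize argument, here is a deliberately simplified growable array (ArrayList itself grows by roughly 1.5x rather than doubling, and this sketch skips generics and bounds checks):

import java.util.Arrays;

public class GrowableArray {
    private Object[] data = new Object[4];
    private int size = 0;

    void add(Object element) {
        if (size == data.length) {
            // Occasional O(n) copy; spread over all adds it averages out to O(1).
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = element;
    }

    public static void main(String[] args) {
        GrowableArray a = new GrowableArray();
        for (int i = 0; i < 100; i++) {
            a.add(i);    // 100 adds trigger only a handful of resizes
        }
        System.out.println(a.size);   // 100
    }
}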
The O(1) for hash algorithms isn't a matter of just applying a "proper" hash function - even with a very good hash function, you could still happen to get hash collisions. The usual complexity is O(1), but of course it can be O(n) if all the hashes happen to collide.
(Additionally, that's counting the cost of hashing as O(1) - in reality, if you're hashing strings for example, each call to hashCode may be O(k) in the length of the string.)
Visit the following links. They will help clear up your doubts.
Data structures & their complexity
Java standard data structures Big O notation
