From my understanding, a HashMap insertion is O(1) and an ArrayList insertion is O(n), since for the HashMap the hash function computes the hash code and index and inserts the entry, whereas an ArrayList does a comparison every time a new element is entered.
Firstly, an operation of complexity O(1) does not always take less time than an operation of complexity O(n). O(1) only means that the operation takes constant time (which could be any value), regardless of the size of the input. O(n) means that the time required for the operation increases linearly with the size of the input. For example, an O(1) operation that always takes 10 ms is slower than an O(n) operation that takes n microseconds until n exceeds 10,000. This means that an O(1) operation is only guaranteed to take less time than an O(n) operation for sufficiently large n.
Now coming to your examples, the ArrayList.add() operation runs in amortized constant time, which means that although a particular call could take up to O(n) time, the average complexity spread out over many calls is O(1). For more information on amortized constant time, refer to this question.
ArrayList is faster than HashMap when you add an item at the end of the ArrayList, because no elements need to be shifted to the right. You can see the efficiency of the HashMap if you instead add items at the front of the ArrayList, like this: arrayList.add(0, str).
When checking this, use 1000 as the outer loop count instead of 100000, otherwise it may hang.
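A minimal sketch of such a comparison (the class name, the loop count of 1000, and the use of System.nanoTime for rough timing are my own illustrative choices, not from the original post):

import java.util.ArrayList;
import java.util.HashMap;

public class FrontInsertVsPut {
    public static void main(String[] args) {
        ArrayList<String> arrayList = new ArrayList<>();
        HashMap<Integer, String> hashMap = new HashMap<>();

        long t0 = System.nanoTime();
        for (int i = 0; i < 1000; i++) {
            arrayList.add(0, "item" + i); // insert at the front: shifts every existing element
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < 1000; i++) {
            hashMap.put(i, "item" + i);   // hash, find bucket, insert: no shifting
        }
        long t2 = System.nanoTime();

        System.out.println("ArrayList front inserts: " + (t1 - t0) + " ns");
        System.out.println("HashMap puts:            " + (t2 - t1) + " ns");
    }
}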
Related
Suppose I have a list
listA=[679,890,907,780,5230,781]
and I want to delete the elements that also exist in another list
listB=[907,5230]
in minimum time complexity.
I can solve this by using two "for loops", which means O(n^2) time complexity, but I want to reduce this to O(n log(n)) or O(n).
Is it possible?
It's possible - if one of the lists is sorted. Assuming that list A is sorted and list B is unsorted, with respective dimensions M and N, the minimum time complexity to remove all of list B's elements from list A will be O((N+M)*log(M)). The way you can achieve this is by binary search - each binary search in list A takes O(log(M)) time, and there are N such lookups (one for each element of list B). Since it takes O(M*log(M)) time to sort A, it's more efficient for huge lists to sort and then remove all elements, with total time complexity O((N+M)*log(M)).
On the other hand, if you don't have a sorted list, just use Collection.removeAll, which has a time complexity of O(M*N) in this case. The reason for this time complexity is that removeAll does (by default) something like the following simplified version of the AbstractCollection implementation:
public boolean removeAll(Collection<?> other) {
    boolean modified = false;
    for (Iterator<?> it = iterator(); it.hasNext(); )
        if (other.contains(it.next())) { it.remove(); modified = true; } // contains is O(N) for a list
    return modified;
}
Since contains has a time complexity of O(N) for lists, and you end up doing M iterations, this takes O(M*N) time in total.
Finally, if you want to minimize the time complexity of removeAll (with possibly degraded real world performance) you can do the following:
List<Integer> a = ...
List<Integer> b = ...
HashSet<Integer> lookup = new HashSet<>(b);
a.removeAll(lookup);
For bad values of b, the time to construct lookup could take up to O(N*log(N)) time, as shown here (see "pathologically distributed keys"). After that, invoking removeAll performs M iterations with an O(1) contains check each, taking O(M) time to execute. Therefore, the time complexity of this approach is O(M + N*log(N)).
So, there are three approaches here. One provides you with time complexity O((N+M)*log(M)), another provides you with time complexity O(M*N), and the last provides you with time complexity O(M + N*log(N)). Considering that the first and last approaches are similar in time complexity (as log tends to be very small even for large numbers), I would suggest going with the naive O(M*N) for small inputs, and the simplest O(M + N*log(N)) for medium-sized inputs. At the point where your memory usage starts to suffer from creating a HashSet to store the elements of B (very large inputs), I would finally switch to the more complex O((N+M)*log(M)) approach.
You can find an AbstractCollection.removeAll implementation here.
Edit:
The first approach doesn't work so well for ArrayLists - removing from the middle of list A takes O(M) time, apparently. Instead, sort list B (O(N*log(N))), and iterate through list A, removing items as appropriate. This takes O((M+N)*log(N)) time and is better than the O(M*N*log(M)) that you end up with when using an ArrayList. Unfortunately, the "removing items as appropriate" part of this algorithm requires that you create data to store the non-removed elements in O(M), as you don't have access to the internal data array of list A. In this case, it's strictly better to go with the HashSet approach. This is because (1) the time complexity of O((M+N)*log(N)) is actually worse than the time complexity for the HashSet method, and (2) the new algorithm doesn't save on memory. Therefore, only use the first approach when you have a List with O(1) time for removal (e.g. LinkedList) and a large amount of data. Otherwise, use removeAll. It's simpler, often faster, and supported by library designers (e.g. ArrayList has a custom removeAll implementation that allows it to take linear instead of quadratic time using negligible extra memory).
You can achieve this in the following way:
Sort the second list (you can sort either list; here I have sorted the second one). After that, loop through the first list, and for each of its elements do a binary search in the second list.
You can sort a list by using the Collections.sort() method.
Total complexity:
For sorting: O(m log m), where m is the size of the second list (only the second list is sorted).
For removing: O(n log m), where n is the size of the first list.
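A minimal sketch of this approach (the list values come from the question; expressing the removal step with removeIf and Collections.binarySearch is my own assumption):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RemoveWithBinarySearch {
    public static void main(String[] args) {
        List<Integer> listA = new ArrayList<>(Arrays.asList(679, 890, 907, 780, 5230, 781));
        List<Integer> listB = new ArrayList<>(Arrays.asList(907, 5230));

        Collections.sort(listB);                                       // O(m log m)
        listA.removeIf(x -> Collections.binarySearch(listB, x) >= 0);  // n binary searches: O(n log m)

        System.out.println(listA); // [679, 890, 780, 781]
    }
}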
Based on this post,
Time complexity of TreeMap operations- subMap, headMap, tailMap
subMap() itself is O(1), and O(n) comes from iterating the sub map.
So, why use get(key) then?
We can use subMap(key, true, key, true) instead,
which is O(1) and iterating this sub map is also O(1).
Faster than get(key), which is O(log(n)). Something wrong here...
"We can use subMap(key, true, key, true) instead, which is O(1)"
This is correct.
"and iterating this sub map is also O(1)."
The O(n) comes from the question itself; the linked answer says nothing to imply this, which is good, because it's not true.
Time complexity of iterating a subtree is O(log n + k), where n is the number of elements in the whole map, and k is the number of elements in the sub-map. In other words, it still takes O(log n) to get to the first position when you start iterating. Look up getFirstEntry() implementation to see how it is done.
This brings the overall complexity of your approach to O(log n), but it is bound to be slower than a simple get, because an intermediate object is created and discarded in the process.
The answer is a bit confusing. Technically it's true that creating the submap is a constant-time operation. But that's just because it actually does nothing apart from setting the low and high keys, and it still shares the tree structure with the original tree.
As a result, any operation on the tree is actually postponed until the specific method is invoked. So get() still searches the original tree and only additionally checks that the key doesn't cross the low and high boundaries. Simply put, get() is still O(log n), where n comes from the original map, not from the submap.
The construction of the subMap takes O(1) time; however, all retrieval operations take the same O(log n) time as in the original map, because the SubMap just wraps this object, performs a range check, and delegates the invocation of the get() method to the original source map object.
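To make the two paths being compared concrete, here is a minimal sketch (the map contents are made up for illustration):

import java.util.TreeMap;

public class SubMapVsGet {
    public static void main(String[] args) {
        TreeMap<Integer, String> map = new TreeMap<>();
        for (int i = 0; i < 100; i++) {
            map.put(i, "value" + i);
        }

        String direct = map.get(42);                            // O(log n) tree search
        String viaSub = map.subMap(42, true, 42, true).get(42); // subMap view is O(1) to create,
                                                                // but its get() is still an O(log n)
                                                                // search delegated to the original tree
        System.out.println(direct + " " + viaSub);
    }
}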
There is an ArrayList of 10000+ items. I am trying to make them unique through a HashSet, which is an operation of O(n) complexity. Is there any other algorithm / DS which can make a Collection unique with lower complexity than O(n)?
No, this is literally impossible. O(n) is the minimum complexity to so much as read through the ArrayList, let alone do anything with the elements.
Without going through all the elements at least once, it is not possible to confirm that your collection has all unique values. Hence, O(n) is the minimum possible.
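For reference, a minimal sketch of the O(n) de-duplication being discussed (using a LinkedHashSet to preserve the original order is my own choice, not something stated in the question):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class MakeUnique {
    public static void main(String[] args) {
        List<String> items = new ArrayList<>(Arrays.asList("a", "b", "a", "c", "b"));

        // One pass over the list; each LinkedHashSet insertion is expected O(1),
        // so the whole de-duplication is O(n).
        List<String> unique = new ArrayList<>(new LinkedHashSet<>(items));

        System.out.println(unique); // [a, b, c]
    }
}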
I have a program that does the following:
Iterates through a string, placing words into a HashMap<String, Integer> where the key represents the unique word and the value represents a running total of occurrences (incremented each time the word is found).
I believe up to this point we are O(n) since each of the insertions is constant time.
Then, I iterate through the hashmap and insert the values into a new HashMap<Integer, List<String>>. Each String goes into the List stored under the value that matches its count. I think that we are still at O(n) because the operations used on HashMaps and Lists are constant time.
Then, I iterate through the HashMap and print the Strings in each List.
Does anything in this program cause me to go above O(n) complexity?
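For reference, a minimal sketch of the program being described (splitting on whitespace and all variable names are my assumptions, since the original code isn't shown):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordFrequencies {
    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog the fox";

        // Step 1: word -> count; each insertion/update is expected O(1).
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }

        // Step 2: count -> words with that count; appending to an ArrayList is amortized O(1).
        Map<Integer, List<String>> byCount = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }

        // Step 3: print the strings in each list.
        byCount.forEach((count, words) -> System.out.println(count + ": " + words));
    }
}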
That is O(n), unless your word-parsing algorithm is not linear (but it should be).
You're correct, with a caveat. In a hash table, insertions and lookups take expected O(1) time each, so the expected runtime of your algorithm is O(n). If you have a bad hash function, there's a chance it will take longer than that, usually (for most reasonable hash table implementations) O(n^2) in the worst case.
Additionally, as Paul Draper pointed out, this assumes that the computation of the hash code for each string takes O(1) time and that comparing the strings in the table takes O(1) time. If you have strings whose lengths aren't bounded from above by some constant, it might take longer to compute the hash codes. In fact, a more accurate analysis would be that the runtime is O(n + L), where L is the total length of all the strings.
Hope this helps!
Beyond the two issues that Paul Draper and templatetypedef point out, there's another potential one. You write that your second map is a HashMap<Integer, List<String>>. This allows for a total linear complexity only if the implementation you choose for the list allows for (amortized) constant-time appending. This is the case if you use an ArrayList and add entries at the end, or if you choose a LinkedList and add entries at either end.
I think this covers the default choices for most developers, so it's not really an obstacle.
I have read somewhere that ArrayList's add() and remove() operations run in "amortized constant" time. What does this mean exactly?
In the implementation of add(item) I can see that ArrayList uses an array buffer, which is at most 3/2 of the list's size, and if it is full, System.arraycopy() is called, which should execute in O(n), not O(1) time. Is it then that System.arraycopy attempts to do something smarter than copying elements one by one into the newly created array, since the time is actually O(1)?
Conclusion: add(item) runs in amortized constant time, but add(index, item) and remove(index) don't; they run in linear time (as explained in the answers).
I have read somewhere that ArrayList's add() and remove() operations run in "amortized constant" time.
I don't think that is true for remove() except under unusual conditions.
A remove(Object) call for a random element on average has to call equals on half of the entries in the list, and then copy the references for the other half.
A remove(int) call for a random element on average has to copy the references for half of the elements.
The only case where remove(...) is going to be O(1) on average (i.e. amortized) is when you are using remove(int) to remove elements at some constant offset from the end of the list.
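For illustration, assuming list is an ArrayList<Integer> (a made-up variable for this example):

list.remove(list.size() - 1);      // O(1): nothing needs to shift
list.remove(0);                    // O(n): every remaining element shifts left by one
list.remove(Integer.valueOf(42));  // O(n): linear search with equals, then shifting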
"Amortized" roughly means "averaged across the entire runtime". Yes, an array-copy will be O(n). But that only happens when the list is full, which happens 1 in n times.
I think that amortized constant time just means that it is pretty much constant time if you do tons of operations. So in one test, add a million items to the list, then in another test add two million items to the list. The latter should be about ~2 times slower than the former; therefore, amortized constant time.
Amortized constant time is different than constant time.
Basically amortized O(1) means that over n operations, the average run time for any operation is O(1).
For array lists, this works something like:
(O(1) insert + O(1) insert + ... + O(n) array copy) / n operations = O(1)
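As a concrete made-up example: growing by doubling from capacity 1, inserting 8 items triggers copies of 1, 2, and 4 elements, so the 8 inserts cost 8 + (1 + 2 + 4) = 15 element writes in total, i.e. fewer than 2 per insert, and that ratio stays bounded however large the list grows.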
An in-depth description of the meaning of amortized constant time can be found in the thread Constant Amortized Time.
Amortised time explained in simple terms:
If you do an operation say a million times, you don't really care about the worst-case or the best-case of that operation - what you care about is how much time is taken in total when you repeat the operation a million times.
So it doesn't matter if the operation is very slow once in a while, as long as "once in a while" is rare enough for the slowness to be diluted away. Essentially amortised time means "average time taken per operation, if you do many operations". Amortised time doesn't have to be constant; you can have linear and logarithmic amortised time or whatever else.
Let's take mats' example of a dynamic array, to which you repeatedly add new items. Normally adding an item takes constant time (that is, O(1)). But each time the array is full, you allocate twice as much space, copy your data into the new region, and free the old space. Assuming allocates and frees run in constant time, this enlargement process takes O(n) time where n is the current size of the array.
So each time you enlarge, you take about twice as much time as the last enlarge. But you've also waited twice as long before doing it! The cost of each enlargement can thus be "spread out" among the insertions. This means that in the long term, the total time taken for adding m items to the array is O(m), and so the amortised time (i.e. time per insertion) is O(1).
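A minimal sketch of such a dynamic array (a stripped-down illustration of the doubling strategy, not the real ArrayList code):

import java.util.Arrays;

// Each add() is amortized O(1): the O(n) grow step only happens when the buffer is full,
// and doubling means that happens less and less often as the array gets larger.
class DynamicArray {
    private Object[] buffer = new Object[1];
    private int size = 0;

    void add(Object item) {
        if (size == buffer.length) {
            buffer = Arrays.copyOf(buffer, buffer.length * 2); // occasional O(n) copy
        }
        buffer[size++] = item;                                 // usual O(1) write
    }

    int size() { return size; }
}

Adding m items to it performs O(m) element writes in total, matching the amortised O(1) per insertion described above.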