Java: ArrayList add() and remove() performance, implementation? - java

I have read somewhere that ArrayList's add() and remove() operations run in "amortized constant" time. What does this mean exactly?
In the implementation of add(item) I can see that ArrayList uses an array buffer, which is at most 3/2 of the list's size, and if it is full, System.arraycopy() is called, which should execute in O(n), not O(1) time. Is it then that System.arraycopy attempts to do something smarter than copying elements one by one into the newly created array, since the time is actually O(1)?
Conclusion: add(item) runs in amortized constant time, but add(index, item) and remove(index) don't; they run in linear time (as explained in the answers).

I have read somewhere that ArrayList's add() and remove() operations run in "amortized constant" time.
I don't think that is true for remove() except under unusual conditions.
A remove(Object) call for a random element on average has to call equals on half of entries in the list, and then copy the references for the other half.
A remove(int) call for a random element on average has to copy the references for half of the elements.
The only case where remove(...) is going to be O(1) on average (i.e. amortized) is when you are using remove(int) to remove elements at some constant offset from the end of the list.
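To make the copying cost concrete, here is a minimal sketch of removal from an array-backed list. The class and method names are hypothetical and this is not the JDK source; it only assumes that the elements after the removed index are shifted left with System.arraycopy.

```java
import java.util.Arrays;

// Hypothetical sketch of index-based removal from an array-backed list.
final class ArrayRemoveSketch {
    /**
     * Removes the element at index from the first size slots of data
     * and returns how many references had to be shifted left.
     */
    static int removeAt(Object[] data, int size, int index) {
        int moved = size - index - 1;   // elements that follow the removed one
        if (moved > 0) {
            System.arraycopy(data, index + 1, data, index, moved);
        }
        data[size - 1] = null;          // clear the now-unused last slot
        return moved;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "c", "d", "e"};
        // Removing the first element shifts everything after it.
        System.out.println(removeAt(data, 5, 0) + " moved: " + Arrays.toString(data));
        // Removing the last element shifts nothing.
        System.out.println(removeAt(data, 4, 3) + " moved: " + Arrays.toString(data));
    }
}
```

The number of shifted references is size - index - 1, so it is zero only at the end of the list and grows linearly toward the front.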

"Amortized" roughly means "averaged across the entire runtime". Yes, an array-copy will be O(n). But that only happens when the list is full, which happens 1 in n times.

I think that amortized constant time just means that it is pretty much constant time per operation if you do tons of operations. So in one test add a million items to the list, then in another test add two million items to the list. The latter should be about 2 times slower than the former; therefore, amortized constant time.

Amortized constant time is different than constant time.
Basically amortized O(1) means that over n operations, the average run time for any operation is O(1).
For array lists, this works something like:
(O(1) insert + O(1) insert + ... + O(n) array copy) / n operations = O(1)

An in-depth description of the meaning of amortized constant time can be found in the thread "Constant Amortized Time".

Amortised time explained in simple terms:
If you do an operation say a million times, you don't really care about the worst-case or the best-case of that operation - what you care about is how much time is taken in total when you repeat the operation a million times.
So it doesn't matter if the operation is very slow once in a while, as long as "once in a while" is rare enough for the slowness to be diluted away. Essentially amortised time means "average time taken per operation, if you do many operations". Amortised time doesn't have to be constant; you can have linear and logarithmic amortised time or whatever else.
Let's take mats' example of a dynamic array, to which you repeatedly add new items. Normally adding an item takes constant time (that is, O(1)). But each time the array is full, you allocate twice as much space, copy your data into the new region, and free the old space. Assuming allocates and frees run in constant time, this enlargement process takes O(n) time where n is the current size of the array.
So each time you enlarge, you take about twice as much time as the last enlarge. But you've also waited twice as long before doing it! The cost of each enlargement can thus be "spread out" among the insertions. This means that in the long term, the total time taken for adding m items to the array is O(m), and so the amortised time (i.e. time per insertion) is O(1).
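The argument above can be checked with a small counting sketch. It assumes a doubling array that starts with capacity 1 and only counts how many element copies the enlargements cause; the class and method names are made up for illustration.

```java
// Hypothetical sketch: counts the element copies made by a doubling dynamic array.
final class DoublingCost {
    /** Returns how many element copies occur while appending m items. */
    static long copiesForAppends(int m) {
        int capacity = 1;   // assumed starting capacity
        int size = 0;
        long copies = 0;
        for (int i = 0; i < m; i++) {
            if (size == capacity) {
                copies += size;   // enlarging copies every existing element
                capacity *= 2;    // allocate twice as much space
            }
            size++;               // the append itself is a single O(1) write
        }
        return copies;
    }

    public static void main(String[] args) {
        for (int m : new int[] {1_000, 1_000_000}) {
            long copies = copiesForAppends(m);
            System.out.println(m + " appends caused " + copies + " copies ("
                    + (double) copies / m + " per append)");
        }
    }
}
```

Under these assumptions the total number of copies stays below 2m, so the copies per append are bounded by a constant no matter how large m gets.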

Related

Want to delete some elements from a list that exist in another list

Suppose I have a list
listA=[679,890,907,780,5230,781]
and want to delete the elements that also exist in another list
listB=[907,5230]
with minimum time complexity.
I can solve this with two nested for loops, which means O(n^2) time complexity, but I want to reduce this to O(n log(n)) or O(n).
Is it possible?
It's possible - if one of the lists is sorted. Assuming that list A is sorted and list B is unsorted, with respective dimensions M and N, the minimum time complexity to remove all of list B's elements from list A will be O((N+M)*log(M)). The way you can achieve this is by binary search - each lookup for an element in list A takes O(log(M)) time, and there are N lookups (one for each element in list B). Since it takes O(M*log(M)) time to sort A, it's more efficient for huge lists to sort and then remove all elements, with total time complexity O((N+M)*log(M)).
On the other hand, if you don't have a sorted list, just use Collection.removeAll, which has a time complexity of O(M*N) in this case. The reason for this time complexity is that removeAll does (by default) something like the following pseudocode:
public boolean removeAll(Collection<?> other)
    for each elem in this list
        if other contains elem
            remove elem from this list
Since contains has a time complexity of O(N) for lists, and you end up doing M iterations, this takes O(M*N) time in total.
Finally, if you want to minimize the time complexity of removeAll (with possibly degraded real world performance) you can do the following:
List<Integer> a = ...
List<Integer> b = ...
HashSet<Integer> lookup = new HashSet<>(b);
a.removeAll(lookup);
For bad values of b, constructing lookup could take up to O(N*log(N)) time, as shown here (see "pathologically distributed keys"). After that, each contains call inside removeAll takes O(1), so the M iterations take O(M) time in total. Therefore, the time complexity of this approach is O(M + N*log(N)).
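Applied to the lists from the question, the HashSet approach could look like the following sketch. The class and method names are hypothetical, and it works on a copy so that the input list is left untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating removeAll with a HashSet lookup.
final class RemoveAllWithHashSet {
    /** Returns a copy of a without any element that occurs in b. */
    static List<Integer> difference(List<Integer> a, List<Integer> b) {
        Set<Integer> lookup = new HashSet<>(b);     // contains is expected O(1)
        List<Integer> result = new ArrayList<>(a);  // mutable copy; a stays unchanged
        result.removeAll(lookup);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> listA = Arrays.asList(679, 890, 907, 780, 5230, 781);
        List<Integer> listB = Arrays.asList(907, 5230);
        System.out.println(difference(listA, listB)); // prints [679, 890, 780, 781]
    }
}
```

The copy matters here because Arrays.asList returns a fixed-size list, on which removeAll would fail.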
So, there are three approaches here. One provides you with time complexity O((N+M)*log(M)), another provides you with time complexity O(M*N), and the last provides you with time complexity O(M + N*log(N)). Considering that the first and last approaches are similar in time complexity (as log tends to be very small even for large numbers), I would suggest going with the naive O(M*N) for small inputs, and the simplest O(M + N*log(N)) for medium-sized inputs. At the point where your memory usage starts to suffer from creating a HashSet to store the elements of B (very large inputs), I would finally switch to the more complex O((N+M)*log(M)) approach.
You can find an AbstractCollection.removeAll implementation here.
Edit:
The first approach doesn't work so well for ArrayLists - removing from the middle of list A takes O(M) time, apparently. Instead, sort list B (O(N*log(N))), and iterate through list A, removing items as appropriate. This takes O((M+N)*log(N)) time and is better than the O(M*N*log(M)) that you end up with when using an ArrayList. Unfortunately, the "removing items as appropriate" part of this algorithm requires that you create data to store the non-removed elements in O(M), as you don't have access to the internal data array of list A. In this case, it's strictly better to go with the HashSet approach. This is because (1) the time complexity of O((M+N)*log(N)) is actually worse than the time complexity for the HashSet method, and (2) the new algorithm doesn't save on memory. Therefore, only use the first approach when you have a List with O(1) time for removal (e.g. LinkedList) and a large amount of data. Otherwise, use removeAll. It's simpler, often faster, and supported by library designers (e.g. ArrayList has a custom removeAll implementation that allows it to take linear instead of quadratic time using negligible extra memory).
You can achieve this in the following way:
Sort the second list (you can sort either list; here I have sorted the second one). Then loop through the first list and, for each of its elements, do a binary search in the second list.
You can sort a list using the Collections.sort() method.
Total complexity:
For sorting: O(m log m), where m is the size of the second list. I have sorted only the second list.
For removing: O(n log m), where n is the size of the first list.
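A rough sketch of this sort-and-search idea follows. The names are hypothetical, and it collects the surviving elements into a new list rather than removing them in place, which is an assumption made to avoid the cost of shifting elements on each removal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: sort the second list, then binary-search it for each element of the first.
final class RemoveWithBinarySearch {
    static List<Integer> difference(List<Integer> first, List<Integer> second) {
        List<Integer> sorted = new ArrayList<>(second);
        Collections.sort(sorted);                       // O(m log m)
        List<Integer> kept = new ArrayList<>(first.size());
        for (Integer value : first) {                   // n lookups, O(log m) each
            if (Collections.binarySearch(sorted, value) < 0) {
                kept.add(value);                        // not found in second list: keep it
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> listA = Arrays.asList(679, 890, 907, 780, 5230, 781);
        List<Integer> listB = Arrays.asList(5230, 907);
        System.out.println(difference(listA, listB)); // prints [679, 890, 780, 781]
    }
}
```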

About time complexity of arraylist and linkedlist

On page 290 of the book Data Structures and Algorithms, it is mentioned that the complexity of remove(i) for an ArrayList is O(1). My first question is why not O(n)? It is also mentioned that add(i,e) for a linked list is O(n), so my second question is why not O(min(i,n-i))?
Finally, my 3rd question is the reason the complexity is mentioned as O(min(i,n-i)) is it due to being a doubly linked list, meaning we could either traverse from beginning (i) or end (n-i)?
The first one is debatable. When you remove the last element in an ArrayList, it's constant, but for a middle element, you need to shift all successor elements to the left. Java does that using System.arraycopy(), a very fast native routine for copying arrays, but even that method is clearly O(n), not constant, so I'm inclined to agree with you. It's different for an append, where the cost of resizing the array is averaged out over many calls, so add(e) at the end is amortized O(1).
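The second one depends on the implementation. java.util.LinkedList is doubly linked, and its indexed operations walk from whichever end is closer to the index, so O(min(i,n-i)) describes it more precisely. A list that is only traversed from the beginning would need O(i) steps instead.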
Finally, in notations of Big-O complexity, less significant factors are discarded, so O(min(i,n-i)) is actually equivalent to O(n), even though the real world tells us that the former would certainly be an optimization.
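For illustration, here is a sketch of how a doubly linked list with references to both ends can reach index i by walking from the closer end. TwoEndedList and its hops counter are hypothetical names invented for this example; this is not the source of java.util.LinkedList.

```java
// Hypothetical doubly linked list that walks from the nearer end on indexed access.
final class TwoEndedList {
    private static final class Node {
        final int value;
        Node prev;
        Node next;
        Node(int value) { this.value = value; }
    }

    private Node head;
    private Node tail;
    private int size;
    int hops;   // links followed by the most recent get(), kept only for illustration

    void addLast(int value) {
        Node node = new Node(value);
        if (tail == null) {
            head = node;
        } else {
            tail.next = node;
            node.prev = tail;
        }
        tail = node;
        size++;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        hops = 0;
        Node node;
        if (index < size / 2) {
            node = head;                              // closer to the front
            for (int i = 0; i < index; i++) { node = node.next; hops++; }
        } else {
            node = tail;                              // closer to the back
            for (int i = size - 1; i > index; i--) { node = node.prev; hops++; }
        }
        return node.value;
    }

    public static void main(String[] args) {
        TwoEndedList list = new TwoEndedList();
        for (int i = 0; i < 100; i++) list.addLast(i);
        for (int index : new int[] {3, 50, 97}) {
            int value = list.get(index);
            System.out.println("get(" + index + ") = " + value + " after " + list.hops + " hops");
        }
    }
}
```

The hop count never exceeds half the list length, which is still proportional to n in the worst case.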

HashMap vs. ArrayList insertion performance confusion

From my understanding a hashmap insertion is O(1) and for an arraylist the insertion is O(n) since for the hashmap the hashfunction computes the hashcode and index and inserts the entry and an array list does a comparison every time it enters a new element.
Firstly, an operation of complexity O(1) does not always take less time than an operation of complexity O(n). O(1) only means that the operation takes a constant time (which could be any value), regardless of the size of the input. O(n) means that the time required for the operation increases linearly with the size of the input. This means that an O(1) operation is only guaranteed to take less time than an O(n) operation once n is sufficiently large.
Now coming to your examples, ArrayList.add() operation runs in amortized constant time, which means although it could take up to O(n) time for a particular iteration, the average complexity spread out over time is O(1). For more information on amortized constant time, refer to this question.
ArrayList is faster than the HashMap when you add an item at the end of the ArrayList, because there is no need to shift the elements in the ArrayList to the right. You can see the efficiency of the HashMap if you add items at the front of the ArrayList instead, like this: arrayList.add(0, str).
When you check this, use 1000 for the outer loop instead of 100000, otherwise it may hang.
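A rough way to see the difference between adding at the end and adding at the front is sketched below. The class name, method names and the size n are assumptions for illustration, and the timings are indicative only since this is not a proper benchmark.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison of appending versus inserting at index 0.
final class FrontVersusBack {
    /** Appends 0..n-1 at the end; no existing element is shifted. */
    static List<Integer> appendAll(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    /** Inserts 0..n-1 at index 0; every insert shifts all existing elements right. */
    static List<Integer> prependAll(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(0, i);
        }
        return list;
    }

    public static void main(String[] args) {
        int n = 20_000;   // assumed size, kept small so the slow case still finishes quickly
        long t0 = System.nanoTime();
        appendAll(n);
        long t1 = System.nanoTime();
        prependAll(n);
        long t2 = System.nanoTime();
        System.out.println("append:  " + (t1 - t0) / 1_000 + " us");
        System.out.println("prepend: " + (t2 - t1) / 1_000 + " us");
    }
}
```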

ArrayList Constant-time and Linear Time Access

I have been learning the tips of Java SE 7. I have read a statement about ArrayList:
Access is performed in constant time.
Insertion/deletion is performed in linear time.
I would like to know what is constant and linear time access?
Constant time means there is a hard bound on how much time each op will take to perform, regardless of the size of the list.
Linear time means the longer the ArrayList is (the more objects it contains), the longer the op will take. The relationship is linear, i.e. time(op) <= CONST * #elements
In complexity analysis we express this with big O notation: linear time is O(n), and constant time is O(1).
The reason for it is:
Access is plain array access, and it is done in constant time on a RAM machine (such as our PCs).
Insertion/Deletion - if it is not at the last element, requires shifting all following elements (insertion requires shifting to the right, and deletion to the left) - thus you actually need a linear number of ops to perform insertion/deletion (unless it is at the last element).
The meanings are:
constant means that the time is always the same, it doesn't matter the length of the List.
[constant time is also called O(1) in Big-O notation]
linear means that the more the List grows, the longer is the time, but in a linear way, so for example to perform a linear operation on a list that contains 20 elements it will take two times the time needed for a list with 10 elements.
[linear time is also called O(n) in Big-O notation]
A clarification: when comparing algorithms, the worst-case performance is normally what is quoted, so "linear" means that the time needed is less than or equal to linear.
In your case the implementation of the List is based on arrays (so the name ArrayList) like this:
The access time is constant because when the program knows where the first element of the list is and how big is every cell, it can directly get the n-th element using simple math like:
element_n_cell = element_1_cell + (cell_size * (n - 1))
Insertions and deletions are more time-expensive for two reasons:
When you insert or delete an element in a position, all the following elements need to be shifted.
An array can't be resized, so when you instantiate a new ArrayList, Java will create an array with a pre-defined length s, and it will use the same array as long as it fits. When you add the (s+1)-th element, the program needs to create a bigger array and copy all the elements in the new one.
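The points above can be put together in a toy array-backed list. TinyArrayList is a hypothetical class, the initial length of 4 and the doubling on growth are assumptions chosen for the example, and the real ArrayList differs in its details.

```java
import java.util.Arrays;

// Hypothetical toy list showing constant-time access and linear-time insertion.
final class TinyArrayList {
    private Object[] cells = new Object[4];   // assumed pre-defined length s = 4
    private int size;

    Object get(int index) {
        checkIndex(index, size);
        return cells[index];                  // plain array access: constant time
    }

    void add(int index, Object element) {
        checkIndex(index, size + 1);
        if (size == cells.length) {
            // The array is full: create a bigger one and copy all elements over.
            cells = Arrays.copyOf(cells, cells.length * 2);
        }
        // Shift the following elements one cell to the right.
        System.arraycopy(cells, index, cells, index + 1, size - index);
        cells[index] = element;
        size++;
    }

    int size() {
        return size;
    }

    private static void checkIndex(int index, int bound) {
        if (index < 0 || index >= bound) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }

    public static void main(String[] args) {
        TinyArrayList list = new TinyArrayList();
        list.add(0, "b");
        list.add(0, "a");   // shifts "b" to the right
        list.add(2, "c");   // insertion at the end shifts nothing
        for (int i = 0; i < list.size(); i++) {
            System.out.println(i + ": " + list.get(i));
        }
    }
}
```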
Understand constant-time access
java.util.ArrayList implements java.util.RandomAccess interface, which is a marker interface that signifies that you can directly access any element of this collection. This also implies that it takes the same amount of time (constant time) to access any element.
If we take java.util.LinkedList, it takes more time to access the last element than the first element.
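Since RandomAccess is only a marker interface, code can test for it with instanceof. The helper below is a small sketch with made-up names.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

// Hypothetical helper that checks whether a list advertises fast indexed access.
final class RandomAccessCheck {
    static boolean hasFastIndexing(List<?> list) {
        return list instanceof RandomAccess;
    }

    public static void main(String[] args) {
        System.out.println(hasFastIndexing(new ArrayList<String>()));  // prints true
        System.out.println(hasFastIndexing(new LinkedList<String>())); // prints false
    }
}
```

A check like this lets an algorithm choose between an indexed loop and an iterator depending on the list it is given.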

java List processing time

This is from wikipedia: http://en.wikipedia.org/wiki/Arraylist under Performance.
ArrayList: constant time for remove(), add() at end of array, linear time to add(), remove() at beginning.
LinkedList: constant time for both of the operations stated above; linear time for indexing.
1) Why the difference in ArrayList processing time between the two operations?
2) LinkedList is linear for indexing, constant for adding at the end, why?
1) Because to add/remove at the beginning it has to shift everything and reindex.
2) Because it maintains references to the head and tail (beginning & end). Indexing means traversing the list.
When you add to the end of an ArrayList that is already full, it grows itself to have some room to spare. For example, if the backing array doubles, a full ten-element ArrayList will internally allocate room for twenty elements, copy the ten it already had, and then add the eleventh. Then, when you add another element at the end, it just sticks that twelfth element into the space it already created.
This does not technically give it constant time insertion at the end, but it does give it amortized constant time insertion. That is to say, over a large number of operations, the cost per insertion approaches a constant; each time it grows, the capacity increases by a constant factor (doubling in this example; Java's ArrayList grows by about half), so you'll have an ever-larger number of "free" constant-time inserts before you have to grow-and-copy again.
When you insert at the beginning, it can't do this and must always shift every existing element one position over (linear time).
Removal from the end is always constant time because you just switch the last cell from being "filled" to "free space". You never need to copy the list.
As for your second question, a LinkedList keeps a pointer to the end of the list, so add and remove there just use that pointer and are thus constant time. There are no quick pointers into the middle of the list, so accessing an arbitrary element requires a linear-time traversal from start to (potentially) finish.
i) ArrayList -> You've got to push all the elements by one position in case of removal/addition in the beginning, hence linear time. At the end of array, you simply add or remove.
ii) LinkedList -> You have references to the head and tail, hence you can add/remove anything there (in constant time).
Because removing at the end does not require moving the data. Adding may require copying to resize the storage array, but its time is amortized.
Because adding at the end does not require walking the list, but indexing does.
