I have been learning Java SE 7 and have read the following statement about ArrayList:
Access is performed in constant time.
Insertion/deletion is performed in linear time.
I would like to know: what are constant-time and linear-time access?
Constant time means there is a hard bound on how long each operation takes, regardless of how many elements the list holds.
Linear time means that the longer the ArrayList is (the more objects it contains), the longer the operation will take. The connection is linear, i.e. time(op) <= CONST * #elements
In complexity analysis this is expressed with big-O notation: linear time is O(n) and constant time is O(1).
The reason for it is:
Access is plain array access, and it is done in constant time on a RAM machine (such as our PCs).
Insertion/deletion anywhere but the last position requires shifting all the following elements (insertion shifts them to the right, deletion to the left), so you actually need a linear number of operations to perform an insertion/deletion (unless it is at the last element).
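To make the shifting concrete, here is a minimal sketch (my own illustration, not the JDK source) of insertion and deletion in a backing array that still has spare capacity:

// Insert 'value' at 'index': every element from 'index' onward
// moves one slot to the right, so the work is O(size - index).
static void insertAt(int[] data, int size, int index, int value) {
    System.arraycopy(data, index, data, index + 1, size - index);
    data[index] = value;
}

// Delete at 'index': every element after it moves one slot left.
static void deleteAt(int[] data, int size, int index) {
    System.arraycopy(data, index + 1, data, index, size - index - 1);
}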
The meanings are:
constant means that the time is always the same; the length of the List doesn't matter.
[constant time is also called O(1) in Big-O notation]
linear means that the time grows proportionally with the List: for example, a linear operation on a list of 20 elements takes roughly twice the time it takes on a list of 10 elements.
[linear time is also called O(n) in Big-O notation]
One clarification: when comparing algorithms, the worst-case performance is normally given, so "linear" means the time needed is at most linear.
In your case the implementation of the List is based on an array (hence the name ArrayList):
The access time is constant because, when the program knows where the first element of the list is and how big every cell is, it can directly compute the location of the n-th element with simple math:
element_n_cell = element_1_cell + (cell_size * n)
Insertions and deletions are more time-expensive for two reasons:
When you insert or delete an element in a position, all the following elements need to be shifted.
An array can't be resized, so when you instantiate a new ArrayList, Java creates an array with a pre-defined length s and uses that same array as long as the elements fit. When you add the (s+1)-th element, the program needs to create a bigger array and copy all the elements into the new one.
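A rough sketch of that growth step (an illustration only; the real ArrayList uses Arrays.copyOf and grows to about 1.5 times the old capacity):

// The backing array of length s is full: allocate a bigger array
// and copy all s elements across, an O(s) operation.
static int[] grow(int[] old) {
    int newLength = Math.max(old.length + (old.length >> 1), old.length + 1);
    int[] bigger = new int[newLength];
    System.arraycopy(old, 0, bigger, 0, old.length);
    return bigger;
}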
Understanding constant-time access
java.util.ArrayList implements the java.util.RandomAccess interface, a marker interface signifying that you can directly access any element of this collection. This also implies that it takes the same amount of time (constant time) to access any element.
If we take java.util.LinkedList, accessing an element in the middle takes more time than accessing the first (or last) element, because the list has to be traversed node by node to reach it.
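As a simplified sketch of why that traversal is linear (the helper below is hypothetical; the real LinkedList.node(int) walks from whichever end is closer, but the cost is still linear in the distance to the nearer end):

class Node<E> { E item; Node<E> next; }

// Reaching index i costs i pointer hops.
static <E> Node<E> node(Node<E> first, int index) {
    Node<E> x = first;
    for (int i = 0; i < index; i++)
        x = x.next; // one hop per step
    return x;
}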
Related
I have a list suppose
listA=[679,890,907,780,5230,781]
and want to delete the elements from it that exist in another list
listB=[907,5230]
with minimum time complexity?
I can solve this with two for loops, which is O(n^2) time complexity, but I want to reduce it to O(n log n) or O(n).
Is it possible?
It's possible - if one of the lists is sorted. Assuming that list A is sorted and list B is unsorted, with respective dimensions M and N, the minimum time complexity to remove all of list B's elements from list A will be O((N+M)*log(M)). The way you can achieve this is by binary search - each lookup for an element in list A takes O(log(M)) time, and there are N lookups (one for each element in list B). Since it takes O(M*log(M)) time to sort A, it's more efficient for huge lists to sort and then remove all elements, with total time complexity O((N+M)*log(M)).
On the other hand, if you don't have a sorted list, just use Collection.removeAll, which has a time complexity of O(M*N) in this case. The reason for this time complexity is that the default removeAll (inherited from AbstractCollection) does essentially the following:
public boolean removeAll(Collection<?> other) {
    boolean modified = false;
    for (Iterator<?> it = iterator(); it.hasNext(); )
        if (other.contains(it.next())) {   // for a list, contains is O(N)
            it.remove(); modified = true;  // remove via the iterator
        }
    return modified;
}
Since contains has a time complexity of O(N) for lists, and you end up doing M iterations, this takes O(M*N) time in total.
Finally, if you want to minimize the time complexity of removeAll (with possibly degraded real world performance) you can do the following:
List<Integer> a = ...
List<Integer> b = ...
HashSet<Integer> lookup = new HashSet<>(b);
a.removeAll(lookup);
For bad values of b, the time to construct lookup could take up to time O(N*log(N)), as shown here (see "pathologically distributed keys"). After that, invoking removeAll will take O(1) for contains over M iterations, taking O(M) time to execute. Therefore, the time complexity of this approach is O(M + N*log(N)).
So, there are three approaches here. One provides you with time complexity O((N+M)*log(M)), another provides you with time complexity O(M*N), and the last provides you with time complexity O(M + N*log(N)). Considering that the first and last approaches are similar in time complexity (as log tends to be very small even for large numbers), I would suggest going with the naive O(M*N) for small inputs, and the simplest O(M + N*log(N)) for medium-sized inputs. At the point where your memory usage starts to suffer from creating a HashSet to store the elements of B (very large inputs), I would finally switch to the more complex O((N+M)*log(M)) approach.
You can find an AbstractCollection.removeAll implementation here.
Edit:
The first approach doesn't work so well for ArrayLists - removing from the middle of list A takes O(M) time, apparently. Instead, sort list B (O(N*log(N))), and iterate through list A, removing items as appropriate. This takes O((M+N)*log(N)) time and is better than the O(M*N*log(M)) that you end up with when using an ArrayList. Unfortunately, the "removing items as appropriate" part of this algorithm requires that you create data to store the non-removed elements in O(M), as you don't have access to the internal data array of list A. In this case, it's strictly better to go with the HashSet approach. This is because (1) the time complexity of O((M+N)*log(N)) is actually worse than the time complexity for the HashSet method, and (2) the new algorithm doesn't save on memory. Therefore, only use the first approach when you have a List with O(1) time for removal (e.g. LinkedList) and a large amount of data. Otherwise, use removeAll. It's simpler, often faster, and supported by library designers (e.g. ArrayList has a custom removeAll implementation that allows it to take linear instead of quadratic time using negligible extra memory).
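For reference, a simplified sketch of how such a custom removeAll stays linear: it compacts the surviving elements in a single pass instead of removing them one at a time (modeled loosely on the JDK's batchRemove; the helper below is my own illustration):

// One pass over the M elements; each element is tested once and
// written at most once, so the list work is O(M) plus M contains-calls.
static <E> int batchRemove(E[] data, int size, java.util.Collection<?> toRemove) {
    int w = 0; // write cursor for surviving elements
    for (int r = 0; r < size; r++)
        if (!toRemove.contains(data[r]))
            data[w++] = data[r];
    for (int i = w; i < size; i++)
        data[i] = null; // clear the tail so the GC can reclaim objects
    return w;          // the new logical size
}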
You can achieve this in the following way:
Sort the second list (you can sort either list; here I have sorted the second one). Then loop through the first list and, for each of its elements, do a binary search in the second list.
You can sort a list using the Collections.sort() method.
Total complexity:
For sorting: O(m log m), where m is the size of the second list (only the second list is sorted).
For searching: O(n log m), where n is the size of the first list.
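A minimal sketch of that approach (my own illustration; removeIf is used so the ArrayList is compacted in one pass rather than paying O(n) per removal):

import java.util.*;

List<Integer> listA = new ArrayList<>(Arrays.asList(679, 890, 907, 780, 5230, 781));
List<Integer> listB = new ArrayList<>(Arrays.asList(907, 5230));

Collections.sort(listB); // O(m log m)
// n binary searches at O(log m) each; removeIf compacts the list in one pass
listA.removeIf(x -> Collections.binarySearch(listB, x) >= 0);
System.out.println(listA); // [679, 890, 780, 781]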
I am currently reading my textbook and I am totally confused about why a dynamic array would require O(n) time to delete an item at the end. I understand that deleting an item from any other index is O(n), because you have to copy all the data and move it to fill the gap, but if it's at the end, don't we simply decrement the count and set the index to 0 or null? I included a picture from my book. It's weird because it says indexing is O(1), so we must know where the item is and don't have to traverse the array like a linked list.
First, let's look up what the book means by a "Dynamic Array":
Dynamic array (also called growable array, resizable array, dynamic table, or array list) is a random access, variable-size list data structure that allows elements to be added or removed.
[...]
Note: We will see the implementation for dynamic array in the Stacks, Queues and Hashing chapters.
From this we learn that array lists are examples of a "Dynamic Array" as the author of the book defines it.
But looking further, the book mentioned that:
As soon as that array becomes full, create the new array of size double than the original array. Similarly, reduce the array size to half if the elements in the array are less than half.
(emphasis added by me)
A Java ArrayList doesn't do this - it doesn't decrease in storage when elements are removed. But the author is talking about a dynamic array that does reduce the array size (or believes that ArrayList does).
In that case, from a worst-case perspective, you could say that the complexity is O(n), because reducing the size involves copying n elements into the smaller array.
Conclusion:
Although it's not true for Java ArrayList implementations, when the author of this book talks about "dynamic arrays" that "reduce the array size" on deletion when necessary, then the worst-case complexity of a delete at the end of the array is indeed O(n).
That entry seems like it's either
incorrect, or
true, but misleading.
You are absolutely right that you can just destroy the object at the final position in a dynamic array and then decrement the size to remove the last element. In many implementations of dynamic arrays, you'll sometimes need to perform resize operations to make sure that the size of the allocated array is within some constant factor of the number of elements. If that happens, then yes, you'll need to make a new array, copy over the old elements, and free the previous array, which does take time O(n).
However, you can show that these resizes are sufficiently infrequent that the average cost of doing a remove from the end is O(1). (In a more technical sense, we say that the amortized cost of removing an element from the end is O(1).) That is, as long as you care only about the total cost of performing a series of operations rather than the cost of any individual operation, you would not be wrong to just pretend each operation costs you time O(1).
I'd say that you're most likely looking at a typo in the materials. Looking at their entry for appending to the end, which differentiates between the not-full and full cases, I think this is likely a copy/paste error. Following the table's lead, that should say something to the effect of "O(1) if the array is not 'too empty,' O(n) otherwise." But again, keep in mind that the amortized efficiency of each operation is O(1), meaning that these scary O(n) terms aren't actually likely to burn you in practice unless you are in a specialized environment where each operation needs to work really quickly.
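As a concrete sketch of the resizing behavior being described (a hypothetical textbook-style dynamic array, not java.util.ArrayList, which never shrinks automatically; the quarter-full shrink threshold is a common choice that keeps grows and shrinks from alternating rapidly):

class DynArray {
    private Object[] data = new Object[4];
    private int size = 0;

    void add(Object x) {
        if (size == data.length)
            resize(data.length * 2);     // full: grow, O(n) copy
        data[size++] = x;
    }

    Object removeLast() {
        Object x = data[--size];
        data[size] = null;               // let the GC reclaim it
        if (size > 0 && size == data.length / 4)
            resize(data.length / 2);     // quarter full: shrink, O(n) copy
        return x;                        // otherwise O(1)
    }

    private void resize(int capacity) {
        Object[] copy = new Object[capacity];
        System.arraycopy(data, 0, copy, 0, size);
        data = copy;
    }
}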
In Java, for a dynamic array (ArrayList), the time complexity of deleting the last element is O(1); it does not copy the array.
The implementation checks whether the index is at the end:
// From ArrayList.remove(int index): elements are shifted only when
// the removed element is not the last one.
int numMoved = size - index - 1;
if (numMoved > 0)
    System.arraycopy(elementData, index + 1, elementData, index, numMoved);
elementData[--size] = null; // clear to let GC do its work
Insertion and deletion are operations that we generally do not perform on arrays, because they have a fixed length by nature. You cannot increase or decrease the length of something which is fixed by its nature.
When people speak of "dynamic arrays" in Java they tend to mean using class ArrayList, which is backed by an array, and it provides the illusion of the ability to insert and remove elements.
But in order for some piece of software to provide the illusion of performing insertions and deletions on an array, each time (or almost each time; there are optimizations possible) it has to either shift the elements after the affected position within the backing array, or allocate a new array of the desired length and copy the old array into it, skipping the removed element or adding the inserted element as the case may be. That shifting and copying is where the O(N) comes from.
And, due to the optimizations performed by ArrayList, the table that you posted is not accurate: it should rather say 'O(1) if the array has not shrunk by much, O(N) if the array has shrunk by so much that reallocation is deemed necessary'. But I guess that would have been too long to fit in the table cell.
As you mentioned, this can be confusing. When you add an element to a dynamic array and it runs out of room, it creates a new, larger array and copies the elements into it, as you may already know. When it shrinks in size, it will likewise shrink the array if needed.
For example, if the interval is 4, adding the 1st through 4th elements is fine, but when you add the 5th item the dynamic array grows into an 8-element array and copies all the elements into the new array.
It is the same when it is decreasing: if you remove one item from a 5-item array whose interval is 4, the dynamic array creates a new 4-element array and copies the elements.
Yes. When the dynamic array does not have to shrink, removing the last element is O(1), but when it has to shrink it is O(n), as you may have already figured out.
When you state a big-O bound you are describing the worst case, so it is O(n).
As far as I can tell:
Since it is a dynamic array, the system does not know its current length, so finding the length takes O(n) time, and then deleting the element at the end takes O(1) time.
Deleting an item from a dynamic array (ArrayList in Java) requires searching for the item and then deleting it.
If the element is at the end of the list, the search alone takes n computation steps. I hope this makes sense to you.
You can view source at http://www.docjar.com/html/api/java/util/ArrayList.java.html
Suppose you are given a list of integers that have already been sorted such as (1,7,13,14,50). It should be noted that the list will contain no duplicates.
Is there some data structure that could store this while allowing me to add any new element (at its proper location) in constant time? add(10) would yield (1,7,10,13,14,50).
Similarly, would I be able to update an element (such as changing 7 to 19) and shift the order accordingly in constant time? change(7,19) yields (1,13,14,19,50).
For a class I need to write a data structure that performs these operations as quickly as possible, but I just wanted to know if constant time could be done and if not, then what would the ideal runtime be?
Inserting in constant time, O(1), would only occur as a best case for any of these data structures. Hash tables generally have the best insertion time, but it might not always be O(1) when there are collisions and separate chaining is used. A hash table is not sorted, though, so its complexity is irrelevant here.
Binary trees have good insertion time and, as a bonus, stay sorted as you insert new nodes. Insertion takes O(log n) time on average, however. The best case for inserting is O(1), when the tree is empty.
Those were just a couple examples, see here for more info on the complexities of these operations: http://bigocheatsheet.com/
In general? No. Determining where to insert a new element, or re-ordering the list after insertion, involves analyzing the list's contents, which means reading elements of the list, which (in general) means iterating over some portion of the list. That portion is dependent on how many elements are in the list, which by definition is not a constant. Hence, a constant-time sorted insert is simply not possible except in special cases.
A binary tree, TreeSet, would be adequate. An array with Arrays.binarySearch and Arrays.copyOf would be fine too, because here we have ints and then we do not need the wrapper class Integer.
For real constant time, O(1), one must pay in space. Use a BitSet. To add 17, simply set bit 17 to true. There are optimized methods to find the next set bit and so on.
But I doubt optimizing is really needed at this spot. File I/O might pay off more.
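A small illustration of the BitSet idea (assuming, as in this question, non-negative int values; each operation below touches a single bit):

import java.util.BitSet;

BitSet set = new BitSet();
for (int v : new int[] {1, 7, 13, 14, 50})
    set.set(v);             // one membership bit per value

set.set(10);                // add(10): O(1)
set.clear(7);               // change(7, 19): clear the old value...
set.set(19);                // ...and set the new one, both O(1)

// Iterating over set bits yields the values in sorted order.
for (int i = set.nextSetBit(0); i >= 0; i = set.nextSetBit(i + 1))
    System.out.print(i + " ");  // prints: 1 10 13 14 19 50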
This is from wikipedia: http://en.wikipedia.org/wiki/Arraylist under Performance.
ArrayList: constant time for remove(), add() at end of array, linear time to add(), remove() at beginning.
LinkedList: both of the stated operations: constant time; indexing: linear time.
1) Why the difference in ArrayList processing time between the two operations?
2) LinkedList is linear for indexing and constant for adding at the end. Why?
1) Because to add/remove at the beginning it has to shift everything and reindex.
2) Because it maintains references to the head and tail (beginning & end). Indexing means traversing the list.
When you add to the end of an ArrayList, it will grow itself to have some room to spare. So if you have a full ten-element ArrayList, adding at the end may cause it to internally allocate room for twenty elements, copy the ten you already had, and then add one. Then, when you add another element at the end, it just sticks that twelfth element into the space it already created.
This does not technically give it constant-time insertion at the end, but it does give it amortized constant-time insertion. That is to say, over a large number of operations, the cost approaches constant time: each time it grows, the capacity is multiplied by a constant factor (doubling in this example; the real ArrayList grows to about 1.5 times its old capacity), so you get an ever-larger number of "free" constant-time inserts before you have to grow-and-copy again.
When you insert at the beginning, it can't do this and must always shift the whole list over by one position (linear time).
Removal from the end is always constant time because you just switch the last cell from being "filled" to "free space". You never need to copy the list.
As for your second question, a LinkedList keeps a pointer to the end of the list, so add and remove there just use that pointer and are thus constant time. There are no quick pointers into the middle of the list, so accessing an arbitrary element requires a linear-time traversal from start to (potentially) finish.
i) ArrayList -> You've got to shift all the elements by one position in case of removal/addition at the beginning, hence linear time. At the end of the array, you simply add or remove.
ii) LinkedList -> You have references to the head and tail, hence you can add/remove there in constant time.
Because removing at the end does not require moving the data. Adding may require copying to resize the storage array, but its time is amortized.
Because adding at the end does not require walking the list, but indexing does.
I have read somewhere that ArrayList's add() and remove() operations run in "amortized constant" time. What does this mean exactly?
In the implementation of add(item) I can see that ArrayList uses an array buffer, which is at most 3/2 of the list's size, and if it is full, System.arraycopy() is called, which should execute in O(n), not O(1), time. Is it that System.arraycopy attempts to do something smarter than copying elements one by one into the newly created array, since the time is actually O(1)?
Conclusion: add(item) runs in amortized constant time, but add(index, item) and remove(index) don't; they run in linear time (as explained in the answers).
I have read somewhere that ArrayList's add() and remove() operations run in "amortized constant" time.
I don't think that is true for remove(), except under unusual conditions.
A remove(Object) call for a random element on average has to call equals on half of the entries in the list, and then copy the references for the other half.
A remove(int) call for a random element on average has to copy the references for half of the elements.
The only case where remove(...) is going to be O(1) on average (i.e. amortized) is when you use remove(int) to remove elements at some constant offset from the end of the list.
"Amortized" roughly means "averaged across the entire runtime". Yes, an array-copy will be O(n). But that only happens when the list is full, which happens 1 in n times.
I think that amortized constant time just means that it is pretty much constant time if you do tons of operations. So in one test, add a million items to the list; then in another test, add two million items. The latter should be about ~2 times slower than the former; therefore, amortized constant time.
Amortized constant time is different than constant time.
Basically amortized O(1) means that over n operations, the average run time for any operation is O(1).
For array lists, this works something like:
(O(1) insert + O(1) insert + ... + O(n) array copy) / n operations = O(1)
There is an in-depth description of the meaning of amortized constant time in the thread Constant Amortized Time.
Amortised time explained in simple terms:
If you do an operation say a million times, you don't really care about the worst-case or the best-case of that operation - what you care about is how much time is taken in total when you repeat the operation a million times.
So it doesn't matter if the operation is very slow once in a while, as long as "once in a while" is rare enough for the slowness to be diluted away. Essentially amortised time means "average time taken per operation, if you do many operations". Amortised time doesn't have to be constant; you can have linear and logarithmic amortised time or whatever else.
Let's take mats' example of a dynamic array, to which you repeatedly add new items. Normally adding an item takes constant time (that is, O(1)). But each time the array is full, you allocate twice as much space, copy your data into the new region, and free the old space. Assuming allocates and frees run in constant time, this enlargement process takes O(n) time where n is the current size of the array.
So each time you enlarge, you take about twice as much time as the last enlarge. But you've also waited twice as long before doing it! The cost of each enlargement can thus be "spread out" among the insertions. This means that in the long term, the total time taken for adding m items to the array is O(m), and so the amortised time (i.e. time per insertion) is O(1).
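To make that concrete with a worked sum: with doubling, the copies triggered while adding m items cost at most 1 + 2 + 4 + ... + m < 2m element moves, plus the m individual writes, for a total under 3m = O(m). Divided over m insertions, that is O(1) amortized per insertion.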