On page 290 of the book Data Structures and Algorithms it is mentioned that the complexity of remove(i) for an ArrayList is O(1). My first question is: why not O(n)? It is also mentioned that add(i,e) for a linked list is O(n), so my second question is: why not O(min(i,n-i))?
Finally, my third question: where the complexity is given as O(min(i,n-i)), is that because it is a doubly linked list, meaning we could traverse either from the beginning (i steps) or from the end (n-i steps)?
The first one is debatable. When you remove the last element in an ArrayList, it's constant, but for a middle element you need to shift all successor elements to the left. Java does that using System.arraycopy(), a very fast native routine for copying arrays, but even that method is clearly O(n), not constant, so I'm inclined to agree with you. It's different for an append, where the amortized cost of the occasional array resize averages out to a constant factor, so add(e) is O(1).
The second one could be implemented that way, but it isn't: traversal starts from the beginning only. I'm guessing the choice was made to reduce the risk of accidents under unsynchronized access.
Finally, in Big-O notation less significant factors are discarded, so O(min(i,n-i)) is actually equivalent to O(n), even though in the real world the former would certainly be an optimization.
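To illustrate what that optimization would look like, here is a minimal sketch of a doubly linked list picking the nearer end for index-based access (the Node type and the head, tail and size fields are hypothetical, not taken from any particular library):

Node<E> nodeAt(int i) {
    if (i < size / 2) {
        Node<E> x = head;
        for (int k = 0; k < i; k++) x = x.next;        // i steps from the front
        return x;
    } else {
        Node<E> x = tail;
        for (int k = size - 1; k > i; k--) x = x.prev; // n - 1 - i steps from the back
        return x;
    }
}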
Related
I have a list, say
listA = [679, 890, 907, 780, 5230, 781]
and I want to delete the elements that also exist in another list,
listB = [907, 5230]
in minimum time complexity.
I can do this with two for loops, which means O(n²) time complexity, but I want to reduce this to O(n log(n)) or O(n).
Is it possible?
It's possible - if one of the lists is sorted. Assuming that list A is sorted and list B is unsorted, with respective dimensions M and N, the minimum time complexity to remove all of list B's elements from list A will be O((N+M)*log(M)). The way you can achieve this is by binary search - each lookup for an element in list A takes O(log(M)) time, and there are N lookups (one for each element in list B). Since it takes O(M*log(M)) time to sort A, it's more efficient for huge lists to sort and then remove all elements, with total time complexity O((N+M)*log(M)).
On the other hand, if you don't have a sorted list, just use Collection.removeAll, which has a time complexity of O(M*N) in this case. The reason for this time complexity is that removeAll does (by default) something like the following (a simplified Java rendering of AbstractCollection.removeAll):
public boolean removeAll(Collection<?> other) {
    boolean modified = false;
    for (Iterator<?> it = iterator(); it.hasNext(); )
        if (other.contains(it.next())) { it.remove(); modified = true; } // contains is O(N) on a list
    return modified;
}
Since contains has a time complexity of O(N) for lists, and you end up doing M iterations, this takes O(M*N) time in total.
Finally, if you want to minimize the time complexity of removeAll (with possibly degraded real world performance) you can do the following:
List<Integer> a = ...
List<Integer> b = ...
HashSet<Integer> lookup = new HashSet<>(b);
a.removeAll(lookup);
For bad values of b, constructing lookup could take up to O(N*log(N)) time, as shown here (see "pathologically distributed keys"). After that, removeAll does an O(1) contains check on each of its M iterations, taking O(M) time to execute. Therefore, the time complexity of this approach is O(M + N*log(N)).
So, there are three approaches here. One provides you with time complexity O((N+M)*log(M)), another provides you with time complexity O(M*N), and the last provides you with time complexity O(M + N*log(N)). Considering that the first and last approaches are similar in time complexity (as log tends to be very small even for large numbers), I would suggest going with the naive O(M*N) for small inputs, and the simplest O(M + N*log(N)) for medium-sized inputs. At the point where your memory usage starts to suffer from creating a HashSet to store the elements of B (very large inputs), I would finally switch to the more complex O((N+M)*log(M)) approach.
You can find an AbstractCollection.removeAll implementation here.
Edit:
The first approach doesn't work so well for ArrayLists - removing from the middle of list A takes O(M) time, apparently. Instead, sort list B (O(N*log(N))), and iterate through list A, removing items as appropriate. This takes O((M+N)*log(N)) time and is better than the O(M*N*log(M)) that you end up with when using an ArrayList. Unfortunately, the "removing items as appropriate" part of this algorithm requires that you create data to store the non-removed elements in O(M), as you don't have access to the internal data array of list A. In this case, it's strictly better to go with the HashSet approach. This is because (1) the time complexity of O((M+N)*log(N)) is actually worse than the time complexity for the HashSet method, and (2) the new algorithm doesn't save on memory. Therefore, only use the first approach when you have a List with O(1) time for removal (e.g. LinkedList) and a large amount of data. Otherwise, use removeAll. It's simpler, often faster, and supported by library designers (e.g. ArrayList has a custom removeAll implementation that allows it to take linear instead of quadratic time using negligible extra memory).
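As a rough sketch of that LinkedList case (sort B once, then remove through an iterator so each removal stays O(1)), assuming both lists hold Integers:

List<Integer> a = new LinkedList<>(Arrays.asList(679, 890, 907, 780, 5230, 781));
List<Integer> b = new ArrayList<>(Arrays.asList(907, 5230));
Collections.sort(b);                                     // O(N*log(N))
for (Iterator<Integer> it = a.iterator(); it.hasNext(); ) {
    if (Collections.binarySearch(b, it.next()) >= 0) {   // O(log(N)) lookup in the sorted b
        it.remove();                                     // O(1) removal on a LinkedList
    }
}
// a is now [679, 890, 780, 781]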
You can achieve this in the following way:
Sort the second list (you can sort either list; here I have sorted the second one). Then loop through the first list and, for each of its elements, do a binary search in the second list; a sketch follows below.
You can sort a list using the Collections.sort() method.
Total complexity:
For sorting: O(m log m), where m is the size of the second list (only the second list is sorted).
For removing: O(n log m)
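A minimal sketch of this approach, assuming both lists hold Integers and that building a filtered copy (rather than removing in place) is acceptable:

List<Integer> listA = new ArrayList<>(Arrays.asList(679, 890, 907, 780, 5230, 781));
List<Integer> listB = new ArrayList<>(Arrays.asList(907, 5230));
Collections.sort(listB);                                 // O(m log m)
List<Integer> result = new ArrayList<>();
for (Integer x : listA) {                                // n iterations
    if (Collections.binarySearch(listB, x) < 0) {        // O(log m) per lookup
        result.add(x);                                   // keep elements not found in listB
    }
}
// result is [679, 890, 780, 781]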
I am currently reading my textbook and I am totally confused about why a dynamic array would require O(n) time to delete an item at the end. I understand that deleting an item from any other index is O(n) because you have to copy all the data and shift it to fill the gap, but if it's at the end, don't we simply decrement the count and set that slot to 0 or null? I included a picture from my book. It's weird because it says indexing is O(1), so we must know where the item is and don't have to traverse the array like a linked list.
First, let's look up what the book means by a "dynamic array":
Dynamic array (also called as growable array, resizable array,
dynamic table, or array list) is a random access, variable-size list data structure that allows elements to be added or removed.
[...]
Note: We will see the implementation for dynamic array in the Stacks, Queues and Hashing chapters.
From this we learn that array lists are examples of a "Dynamic Array" as the author of the book defines it.
But looking further, the book mentions:
As soon as that array becomes full, create the new array of size
double than the original array. Similarly, reduce the array size to
half if the elements in the array are less than half.
(emphasis added by me)
A Java ArrayList doesn't do this - it doesn't shrink its backing storage when elements are removed. But the author is talking about a dynamic array that does (or believes that ArrayList does) reduce the array size.
In that case, from a worst-worst-case perspective, you could say that the complexity is O(n) because reducing the size involves copying n elements to the reduced array.
Conclusion:
Although it's not true for Java ArrayList implementations, when the author of this book talks about "dynamic arrays" that "reduce the array size" on deletion when necessary, then the worst-case complexity of a delete at the end of the array is indeed O(n).
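To make that concrete, here is a minimal sketch (not ArrayList's actual behaviour) of a dynamic array whose removeLast() occasionally shrinks its backing array, which is where the O(n) worst case comes from:

import java.util.Arrays;

class IntDynamicArray {
    private int[] data = new int[4];
    private int size = 0;

    void add(int value) {
        if (size == data.length) data = Arrays.copyOf(data, size * 2); // O(n) grow, amortized O(1)
        data[size++] = value;
    }

    int removeLast() {
        int value = data[--size];
        // Shrink when only a quarter of the capacity is used; the O(n) copy is
        // rare enough that the amortized cost per removal stays O(1).
        if (size > 0 && size == data.length / 4) data = Arrays.copyOf(data, data.length / 2);
        return value;
    }
}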
That entry seems like it's either
incorrect, or
true, but misleading.
You are absolutely right that you can just destroy the object at the final position in a dynamic array and then decrement the size to remove the last element. In many implementations of dynamic arrays, you'll sometimes need to perform resize operations to make sure that the size of the allocated array is within some constant factor of the number of elements. If that happens, then yes, you'll need to make a new array, copy over the old elements, and free the previous array, which does take time O(n). However, you can show that these resizes are sufficiently infrequent that the average cost of doing a remove from the end is O(1). (In a more technical sense, we say that the amortized cost of removing an element from the end is O(1)). That is, as long as you care only about the total cost of performing a series of operations rather than the cost of any individual operation, you would not be wrong to just pretend each operation costs you time O(1).
I'd say that you're most likely looking at a typo in the materials. Looking at their entry for appending to the end, which differentiates between the not-full and full cases, I think this is likely a copy/paste error. Following the table's lead, that should say something to the effect of "O(1) if the array is not 'too empty,' O(n) otherwise." But again, keep in mind that the amortized efficiency of each operation is O(1), meaning that these scary O(n) terms aren't actually likely to burn you in practice unless you are in a specialized environment where each operation needs to work really quickly.
In Java, for a dynamic array (ArrayList), the time complexity of deleting the last element is O(1); it does not copy the array. Internally, ArrayList.remove(int) checks whether the index is at the end before copying anything:
int numMoved = size - index - 1;
if (numMoved > 0)
    System.arraycopy(elementData, index + 1, elementData, index, numMoved); // shift the trailing elements left
Insertion and deletion are operations that we generally do not perform on arrays, because they have a fixed length by nature. You cannot increase or decrease the length of something which is fixed by its nature.
When people speak of "dynamic arrays" in Java they tend to mean using class ArrayList, which is backed by an array, and it provides the illusion of the ability to insert and remove elements.
But in order for some piece of software to provide the illusion of performing insertions and deletions on an array, each time (or almost each time, there are optimizations possible) it has to allocate a new array of the new desired length, and copy the old array into it, skipping the removed element or adding the inserted element as the case may be. That array copy is where the O(N) comes from.
And, due to the optimizations performed by ArrayList, the table that you posted is not accurate: it should rather say 'O(1) if the array has not shrunk by much, O(N) if the array has shrunk by so much that reallocation is deemed necessary'. But I guess that would have been too long to fit in the table cell.
As you have mentioned, this can be confusing. When you add an element to a full dynamic array, it grows: it creates a new, larger array and copies the elements into it, as you may already know. Likewise, when enough elements are removed, it shrinks if needed.
For example, if the initial capacity is 4, adding the 1st, 2nd, 3rd and 4th elements is fine, but when you add the 5th item the dynamic array grows into an 8-element array and copies all the elements over.
It is the same when it is decreasing: if you remove one item from a 5-item array (capacity 8), the dynamic array will create a new 4-element array and copy the elements into it.
Here is a good video tutorial that illustrates this.
Yes. When the dynamic array does not have to shrink, removing the element takes O(1), but when it has to shrink it is O(n), as you may have already figured out.
When you state the Big-O of an operation you are describing the worst case, so it is O(n).
As far as I can tell, since it is a dynamic array the system does not know the current length of the array, so finding the length takes O(n) time, and then it takes O(1) time to delete the element at the end.
Deleting an item from a dynamic array (ArrayList in Java) by value requires searching for the item and then deleting it. If the element is at the end of the list, the search itself takes n steps. I hope this makes sense to you.
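For what it's worth, here is a small illustration of the difference between removing by value (which searches) and removing by index (which does not), assuming an ArrayList of Integers:

List<Integer> list = new ArrayList<>(Arrays.asList(1, 7, 13, 14, 50));
list.remove(Integer.valueOf(50));  // remove(Object): linear search for the value, O(n)
list.remove(list.size() - 1);      // remove(int): no search, and nothing to shift when it is the last element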
You can view source at http://www.docjar.com/html/api/java/util/ArrayList.java.html
Suppose you are given a list of integers that have already been sorted such as (1,7,13,14,50). It should be noted that the list will contain no duplicates.
Is there some data structure that could store this while allowing me to add any new element (at its proper location) in constant time? add(10) would yield (1,7,10,13,14,50).
Similarly, would I be able to update an element (such as changing 7 to 19) and shift the order accordingly in constant time? change(7,19) yields (1,13,14,19,50).
For a class I need to write a data structure that performs these operations as quickly as possible, but I just wanted to know if constant time could be done and if not, then what would the ideal runtime be?
Inserting in constant time, O(1), would only occur as a best case for any of these data structures. Hash tables generally have the best insertion time, but it might not always be O(1) if there are collisions and separate chaining. A hash table is not sorted anyway, so its complexity is irrelevant here.
Binary trees have good insertion time and, as a bonus, the structure is already sorted after inserting a new node. Insertion takes O(log n) time on average, however. The best case for inserting is O(1), when the tree is empty.
Those were just a couple examples, see here for more info on the complexities of these operations: http://bigocheatsheet.com/
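For instance, a TreeSet (backed by a balanced binary search tree) handles both operations from the question in O(log n), though not in constant time; a rough sketch:

TreeSet<Integer> set = new TreeSet<>(Arrays.asList(1, 7, 13, 14, 50));
set.add(10);     // O(log n); lands in sorted position: [1, 7, 10, 13, 14, 50]
set.remove(7);   // change(7, 19) expressed as a remove followed by an add
set.add(19);     // [1, 10, 13, 14, 19, 50]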
In general? No. Determining where to insert a new element or re-ordering the list after insertion involves performing analysis of the list's contents, which involves reading the elements of the list, which (in general) means iterating over some portion of the length of the list. This (again, in general) is dependent on how many elements are in the list, which by definition is not a constant. Hence, a constant-time sorted insert is simply not possible except in special cases.
A binary tree, such as a TreeSet, would be adequate. An array with Arrays.binarySearch and Arrays.copyOf would be fine too, because here we have ints, so we do not need the wrapper class Integer.
For real constant time, O(1), one must pay in space. Use a BitSet. To add 17, simply set bit 17 to true. There are optimized methods to find the next set bit and so on.
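A rough sketch of that idea, assuming the values are non-negative and small enough to index a BitSet:

BitSet set = new BitSet();
for (int x : new int[] {1, 7, 13, 14, 50}) set.set(x);  // O(1) per insertion
set.set(10);                                            // add(10)
set.clear(7);                                           // change(7, 19): clear the old value...
set.set(19);                                            // ...and set the new one
for (int i = set.nextSetBit(0); i >= 0; i = set.nextSetBit(i + 1))
    System.out.print(i + " ");                          // prints 1 10 13 14 19 50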
But I doubt optimizing is really needed at this spot. File I/O might pay off more.
I have to choose between two data structures, ArrayList and LinkedList.
I have two operations op_one, op_two.
If I choose ArrayList -
I will end up with
for op_one ------ O(n), and at maximum n re-allocations
for op_two ------ O(1), and at maximum n re-allocations
If I choose LinkedList -
I will end up with
for op_one ------ O(n), and zero re-allocations
for op_two ------ O(n), and zero re-allocations
I will be storing millions of comparable elements, and I will be performing both operations with equal likelihood. Which one should I choose?
I suggest you time them together in a realistic way and see which is faster. If they are not significantly different, I would use the approach you believe is simplest.
While ArrayList and LinkedList are of the same order in terms of space, the ArrayList is much smaller in practice.
All the same, clarity is usually the most important consideration unless you know you have a performance issue.
(Question first understood as time complexity disregarding space. Request for clarifications made in comments)
I would use ArrayList (over LinkedList), not only because of time complexity, but because it is simpler and doesn't build a new node for each item.
Note that the overall complexity will be O(n) in either case: for ArrayList, O(n) + O(1), or prob*O(n) + (1-prob)*O(1), is still on the order of O(n).
Given equal complexity (O(n)), you should then measure the actual execution time, or choose whichever is easier to implement or maintain.
Consider that on modern architectures memory bandwidth is often the bottleneck. Sometimes computing a result is hence faster than storing and reading pre-computed values. In other words, computational complexity should be considered together with memory complexity. I would keep the memory usage small if the computations are not expensive and can work within the cache.
But in the end, you will have to test...
I am optimizing an implementation of a sorted LinkedList.
To insert an element I traverse the list and compare each element until I find the correct index, and then break out of the loop and insert.
I would like to know if there is any other way that I can insert the element at the same time as traversing the list to reduce the insert from O(n + (n capped at size()/2)) to O(n).
A ListIterator is almost what I'm after because of its add() method, but unfortunately, when there are elements in the list equal to the one being inserted, the insert has to be placed after them in the list. To implement this, ListIterator would need a peek(), which it doesn't have.
edit: I have my answer, but will add this anyway since a lot of people haven't understood correctly:
I am searching for an insertion point AND inserting, which combined is higher than O(n)
You may consider a skip list, which is implemented using multiple linked lists at varying granularity. E.g. the linked list at level 0 contains all items, level 1 only links to every 2nd item on average, level 2 to only every 4th item on average, etc.... Searching starts from the top level and gradually descends to lower levels until it finds an exact match. This logic is similar to a binary search. Thus search and insertion is an O(log n) operation.
A concrete example in the Java class library is ConcurrentSkipListSet (although it may not be directly usable for you here).
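For illustration, a small example (bearing in mind that, being a Set, it silently drops duplicates, which may or may not match your requirements):

NavigableSet<Integer> set = new ConcurrentSkipListSet<>();
Collections.addAll(set, 50, 1, 13, 7, 14);
set.add(10);                 // O(log n) insertion into sorted position
System.out.println(set);     // [1, 7, 10, 13, 14, 50]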
I'd favor Péter Török's suggestion, but I'd still like to add something about the iterator approach:
Note that ListIterator provides a previous() method to iterate through the list backwards. Thus first iterate until you find the first element that is greater and then go to the previous element and call add(...). If you hit the end, i.e. all elements are smaller, then just call add(...) without going back.
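A minimal sketch of that idea, assuming a List<Integer> kept in ascending order and that the new value must go after any elements equal to it:

static void sortedInsert(List<Integer> list, Integer value) {
    ListIterator<Integer> it = list.listIterator();
    while (it.hasNext()) {
        if (it.next() > value) {   // first element strictly greater than the new value
            it.previous();         // step back so add() inserts before that element
            break;
        }
    }
    it.add(value);                 // lands after any equal elements, before the first greater one
}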
I have my answer, but will add this anyway since a lot of people haven't understood correctly: I am searching for an insertion point AND inserting, which combined is higher than O(n).
You need to maintain a collection of (possibly) non-unique elements that can be iterated in an order given by an ordering function. This can be achieved in a variety of ways. (In the following I use "total insertion cost" to mean the cost of inserting a number (N) of elements into an initially empty data structure.)
A singly or doubly linked list offers O(N^2) total insertion cost (whether or not you combine the steps of finding the position and doing the insertion!), and O(N) iteration cost.
A TreeSet offers O(NlogN) total insertion cost and O(N) iteration cost. But has the restriction of no duplicates.
A tree-based multiset (e.g. TreeMultiset) has the same complexity as a TreeSet, but allows duplicates.
A skip-list data structure also has the same complexity as the previous two.
Clearly, the complexity measures say that a data structure that uses a linked list performs the worst as N gets large. For this particular group of requirements, a well-implemented tree-based multiset is probably the best, assuming there is only one thread accessing the collection. If the collection is heavily used by many threads (and it is a set), then a ConcurrentSkipListSet is probably better.
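For reference, a small example of such a multiset, assuming Guava's com.google.common.collect.TreeMultiset is what's meant and is on the classpath:

TreeMultiset<Integer> multiset = TreeMultiset.create();   // Guava's tree-based multiset
multiset.add(3);
multiset.add(1);
multiset.add(3);                                          // the duplicate is kept, unlike a TreeSet
for (Integer x : multiset) System.out.print(x + " ");     // prints 1 3 3 (sorted, duplicates retained)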
You also seem to have a misconception about how "big O" measures combine. If I have one step of an algorithm that is O(N) and a second step that is also O(N), then the two steps combined are STILL O(N) .... not "more than O(N)". You can derive this from the definition of "big O". (I won't bore you with the details, but the Math is simple.)