Is there an efficient method to remove a range - say the tail - of X elements from a List, e.g. LinkedList in Java?
It is obviously possible to remove the last elements one by one, which should result in O(X) level performance. At least for LinkedList instances it should be possible to have O(1) performance (by setting the references around the first element to be removed and setting the head/tail references). Unfortunately I don't see any method within List or LinkedList to remove the last elements all at once.
Currently I am thinking of replacing the list by using List.subList(), but I'm not sure whether that has equal performance. At least it would be clearer in the code; on the other hand, I would lose the additional functionality that LinkedList provides.
I'm mainly using the List as a stack, for which LinkedList seems to be the best option, at least regarding semantics.
subList(list.size() - N, list.size()).clear() is the recommended way to remove the last N elements. Indeed, the Javadoc for subList specifically recommends this idiom:
This method eliminates the need for explicit range operations (of the sort that commonly exist for arrays). Any operation that expects a list can be used as a range operation by passing a subList view instead of a whole list. For example, the following idiom removes a range of elements from a list:
list.subList(from, to).clear();
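I suspected that this idiom might be more efficient (albeit by a constant factor) than calling removeLast() N times, since once it finds the Nth-to-last node it would, in principle, only need to update a constant number of pointers rather than unlinking each of the last N nodes one at a time. In OpenJDK, however, LinkedList does not override removeRange, so clearing the view falls back to AbstractList.removeRange, which walks a ListIterator and unlinks the nodes one by one. Either way the cost is O(N); the idiom's main advantage is clarity.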
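A minimal sketch comparing the two approaches; removeLastN and removeLastNOneByOne are hypothetical helper names, and the example assumes Java 8 or later. On java.util.LinkedList both end up unlinking the removed nodes one at a time, because LinkedList does not override removeRange, so the difference is readability rather than asymptotic cost. The subList version has the advantage of working unchanged on any List, including an ArrayList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

class TailRemoval {
    // Removes the last n elements by clearing a subList view of the tail.
    static <E> void removeLastN(List<E> list, int n) {
        if (n < 0 || n > list.size()) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        list.subList(list.size() - n, list.size()).clear();
    }

    // The element-by-element alternative, using LinkedList.removeLast().
    static <E> void removeLastNOneByOne(LinkedList<E> list, int n) {
        if (n < 0 || n > list.size()) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        for (int i = 0; i < n; i++) {
            list.removeLast();
        }
    }

    public static void main(String[] args) {
        LinkedList<Integer> linked = new LinkedList<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        removeLastN(linked, 2);
        System.out.println(linked); // [1, 2, 3, 4]

        removeLastNOneByOne(linked, 1);
        System.out.println(linked); // [1, 2, 3]

        List<Integer> array = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        removeLastN(array, 2);
        System.out.println(array); // [1, 2, 3, 4]
    }
}
```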
Be aware that subList() returns a view of the original list, meaning:
- Any modification done to the view will be reflected in the original list
- The returned list is not a LinkedList - it's an inner implementation of List that's not serializable
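A small sketch of both points, assuming OpenJDK's collections (SubListView and staleViewFailsFast are hypothetical names). Writes through the view show up in the backing list, and clearing the view removes that range. The reverse direction matters too: the List Javadoc says the view's semantics become undefined if the backing list is structurally modified other than through the view. OpenJDK's sublists detect this and throw ConcurrentModificationException, which is what staleViewFailsFast relies on.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.LinkedList;
import java.util.List;

class SubListView {
    // Returns true if the view fails fast after the backing list is changed directly.
    static boolean staleViewFailsFast() {
        List<String> backing = new LinkedList<>(Arrays.asList("a", "b", "c", "d"));
        List<String> tail = backing.subList(2, 4);
        backing.add("e"); // structural change that bypasses the view
        try {
            tail.size(); // OpenJDK's sublists check for comodification here
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> backing = new LinkedList<>(Arrays.asList("a", "b", "c", "d"));
        List<String> tail = backing.subList(2, 4); // view of [c, d]

        System.out.println(tail instanceof LinkedList); // false: the view is not a LinkedList
        tail.set(0, "C");                               // write through the view
        System.out.println(backing);                    // [a, b, C, d]
        tail.clear();                                   // removes the range from the backing list
        System.out.println(backing);                    // [a, b]
        System.out.println(staleViewFailsFast());       // true
    }
}
```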
Anyway, using either removeFirst() or removeLast() should be efficient enough, because popping the first or last element of a linked list in Java is an O(1) operation - internally, LinkedList holds pointers to both ends of the list and removing either one is as simple as moving a pointer one position.
For removing m elements at once, you're stuck with O(m) performance with a LinkedList. Perhaps surprisingly, an ArrayList might be a better option: removing elements at the end of an ArrayList only means lowering its size (the index denoting the end of the used part of the array) and nulling out the vacated slots, with no node objects left behind for the garbage collector to reclaim. The best choice? Try both approaches, profile them and let the numbers speak for themselves.
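A rough way to "try both approaches": the sketch below (TailTrimTiming and timeTrim are hypothetical names) times subList(...).clear() on the tail of a LinkedList and of an ArrayList. It is only a sanity check, not a rigorous benchmark: JIT warm-up, garbage collection and list sizes all affect the numbers, and a dedicated harness such as JMH is the better tool for real measurements. No output is shown because the timings depend on the machine and JVM.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Supplier;

class TailTrimTiming {
    // Fills a fresh list with `size` elements, clears the last m, returns elapsed nanoseconds.
    static long timeTrim(Supplier<List<Integer>> factory, int size, int m) {
        List<Integer> list = factory.get();
        for (int i = 0; i < size; i++) {
            list.add(i);
        }
        long start = System.nanoTime();
        list.subList(size - m, size).clear();
        long elapsed = System.nanoTime() - start;
        if (list.size() != size - m) {
            throw new IllegalStateException("unexpected size " + list.size());
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int size = 1_000_000;
        int m = 500_000;
        // Several rounds so the JIT has a chance to warm up; treat early rounds as noise.
        for (int round = 0; round < 5; round++) {
            long linkedNs = timeTrim(LinkedList::new, size, m);
            long arrayNs = timeTrim(ArrayList::new, size, m);
            System.out.printf("round %d: LinkedList %d us, ArrayList %d us%n",
                    round, linkedNs / 1_000, arrayNs / 1_000);
        }
    }
}
```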
LinkedList is a data structure in which each element is coupled with a link to its next element (and, in Java's doubly linked implementation, to its previous one).
So, in theory, this data structure is made for freely iterating through a list, in whichever direction, while performing whatever operations (except, maybe, deleting the element you're currently at).
However, in practice, this isn't true. An Iterator returned by a LinkedList is subject to the general Iterator rules (i.e. no modifying the list while iterating, except through the iterator itself), and even a ListIterator, an improved Iterator that allows modifying the next/previous element of the iterator and lets you go forward/backward dynamically, still has severe limitations:
You can't delete elements from the beginning of the list if you're not currently there, and neither can you add elements to the end of the list, unless you're currently there.
So, is there a way to iterate freely through a LinkedList while making arbitrary modifications to the list? And if not, why isn't there one? Shouldn't that be one of the main goals of this data structure?
The choice to make the standard collections' Iterators fail-fast was a design decision, just that and nothing more.
Nothing stops you from taking the code and, starting from that, building a NotSoFailFastIterator for yourself if you think you can use it. However, I think you'll quickly stop using it once you see its behaviour and its results in usage scenarios where there's really a lot of concurrent activity going on in the underlying List of your iterator.
This behavior is not specific to LinkedLists. When you iterate over a List (any List) with a ListIterator, you can only make structural changes (adding or removing elements) in the current position of the iterator. Otherwise, continuing to use the iterator after a structural change of the List may yield unexpected results.
For adding elements to the start or end of the LinkedList, you have addFirst and addLast methods. You don't have to iterate over the List in order to do that.
A ListIterator instance maintains a state that allows it to locate the next and previous elements as well as support other operations (remove the current element, add an element at the current position). If you make a structural change to a List not via the ListIterator, the state of the iterator may become invalid, leading to unexpected results. Therefore all structural changes must be made via the ListIterator.
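To illustrate the difference, here is a sketch (IteratorModification and its method names are hypothetical) that makes the same kind of structural change once through the ListIterator and once directly on the list. The first version works; the second trips the fail-fast check on the next call to next(). The Javadoc describes fail-fast behavior as best-effort, so real code should never rely on the exception for correctness; directChangeFailsFast uses it only to demonstrate what OpenJDK's LinkedList iterator does.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class IteratorModification {
    // Safe: every structural change goes through the ListIterator itself.
    static List<Integer> duplicateEvens(List<Integer> input) {
        LinkedList<Integer> list = new LinkedList<>(input);
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            int value = it.next();
            if (value % 2 == 0) {
                it.add(value); // inserted after the current element; next() moves past it
            }
        }
        return list;
    }

    // Unsafe: changing the list directly invalidates the iterator's state.
    static boolean directChangeFailsFast(List<Integer> input) {
        LinkedList<Integer> list = new LinkedList<>(input);
        ListIterator<Integer> it = list.listIterator();
        try {
            while (it.hasNext()) {
                int value = it.next();
                if (value % 2 == 0) {
                    list.addFirst(value); // bypasses the iterator
                }
            }
            return false; // no even element, so no direct change was made
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4);
        System.out.println(duplicateEvens(input));        // [1, 2, 2, 3, 4, 4]
        System.out.println(directChangeFailsFast(input)); // true
    }
}
```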
I guess that the LinkedList class could supply an implementation of a more complex iterator that supports operations such as addFirst and addLast. I'm not sure how useful that would have been, and whether it would justify the added complexity.
If you want to move around freely by position, use an array or an array-backed list. Linked lists are meant to be traversed sequentially; they are useful when the memory for the data is allocated dynamically.
When you have a linked list data structure, you can add or remove at a particular node only when your cursor is pointing to the node where you want to add or remove. From the ListIterator documentation for add(E):
Inserts the specified element into the list (optional operation). The element is inserted immediately before the element that would be returned by next(), if any, and after the element that would be returned by previous(), if any. The new element is inserted before the implicit cursor: a subsequent call to next would be unaffected, and a subsequent call to previous would return the new element. (This call increases by one the value that would be returned by a call to nextIndex or previousIndex.)
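The cursor rules from the ListIterator.add documentation can be checked directly with a LinkedList. In this sketch (CursorInsert and insertB are hypothetical names) the cursor is placed between "a" and "c", "b" is inserted there, and a fresh iterator is built for each observation so the calls don't interfere with each other.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class CursorInsert {
    // Starts from [a, c], moves the cursor past "a", and inserts "b" at the cursor.
    static ListIterator<String> insertB() {
        List<String> list = new LinkedList<>(Arrays.asList("a", "c"));
        ListIterator<String> it = list.listIterator();
        it.next();   // cursor now sits between "a" and "c"
        it.add("b"); // list is [a, b, c]; cursor now sits between "b" and "c"
        return it;
    }

    public static void main(String[] args) {
        System.out.println(insertB().next());      // c: next() is unaffected by the insert
        System.out.println(insertB().previous());  // b: previous() returns the new element
        System.out.println(insertB().nextIndex()); // 2: it was 1 before add()
    }
}
```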
If instead it's an array structure, you access elements by index, and you can add or remove at any particular index, limited by the length of the array. ArrayList does that.
In Java, when you do this:
alist[0].remove();
What happens to the rest of the ArrayList? Do all of the objects move up by one, or do they stay where they are, leaving an empty slot at [0]?
If not, is there an efficient way of shifting each object's index down by one?
To clarify what I mean by more efficient:
You could just remove the first index and then iterate through the ArrayList, deleting each object and re-assigning it to a new index, but this seems very inefficient. It seems like there should be a way, but I have looked through the Javadoc page for the ArrayList class and do not see anything that would accomplish what I am trying to do.
Assuming you actually meant to ask about aList.remove(0)...
As documented by Oracle:
public E remove(int index)
Removes the element at the specified position in this list. Shifts any subsequent elements to the left (subtracts one from their indices).
So remove does as you require. However, you may not consider the implementation efficient since it requires time proportional to the number of elements remaining in the list. For example, if you have a list with 1 million items in it and you remove the item at index 0, then the remaining 999,999 items will need to be moved in memory.
Ignoring the code you posted that is irrelevant to an ArrayList, if you were to look at the source for ArrayList you'd find that when calling ArrayList.remove(obj) it finds the index (or if using remove(int) it already knows) then does:
System.arraycopy(elementData, index+1, elementData, index,
numMoved);
An ArrayList is backed by an array and it shifts everything in that backing array to the left.
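A simplified model of that shift on a plain array (ArrayShift.removeAt is a hypothetical helper, not the JDK source; the real ArrayList also tracks modCount and manages its capacity). Removing index 0 moves every remaining element one slot to the left, which is why the cost grows with the number of elements after the removed one.

```java
import java.util.Arrays;

class ArrayShift {
    // Removes elements[index] from the first `size` slots by shifting everything
    // after it one slot to the left, then clears the freed last slot.
    // Returns the new logical size.
    static int removeAt(Object[] elements, int size, int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException(String.valueOf(index));
        }
        int numMoved = size - index - 1; // elements to the right of index
        if (numMoved > 0) {
            System.arraycopy(elements, index + 1, elements, index, numMoved);
        }
        elements[size - 1] = null; // let the garbage collector reclaim the old reference
        return size - 1;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "c", "d"};
        int size = removeAt(data, 4, 0);
        System.out.println(size + " " + Arrays.toString(data)); // 3 [b, c, d, null]
    }
}
```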
In that case, the lookup is O(1) if you're using remove(int) or O(n) if providing an object, and the remove operation is O(n).
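If you were to use a LinkedList, the lookup is O(n) either way (by index or by object), but the removal itself is O(1) because it's a doubly linked list. Removing at either end, as with remove(0), needs no lookup at all.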
When choosing a data structure, it's important to consider how you're going to be using it; there are always trade-offs depending on your use pattern.
In the book Effective Java by Joshua Bloch, there is a discussion on how a class can provide "judiciously chosen protected methods" as hooks into its internal workings.
The author then cites the documentation in AbstractList.removeRange():
This method is called by the clear operation on this list and its subLists. Overriding this method to take advantage of the internals of the list implementation can substantially improve the performance of the clear operation on this list and its subLists.
My question is, how can overriding this method improve performance (more than simply not overriding it)? Can anyone give an example of this?
Let's take a concrete example - suppose that your implementation is backed by a dynamic array (this is how ArrayList works, for example). Now, suppose that you want to remove elements in the range [start, end). The default implementation of removeRange works by getting an iterator to position start, then calling remove() the appropriate number of times.
Each time remove() is called, the dynamic array implementation has to shuffle all the elements at position start + 1 and forward back one spot to fill the gap left by the removed element. This can take time O(n), because potentially all of the array elements might need to be shuffled down. This means that if you're removing a total of k elements from the list, the naive approach will take time O(kn), since you're doing O(n) work k times.
Now consider a much better approach: copy the element at position end to position start, then element end + 1 to position start + 1, etc. until all elements are copied. This requires you to only do a total of O(n) work, because every element is moved at most once. Compared with the O(kn) approach given by the naive algorithm, this is a huge performance improvement. Consequently, overriding removeRange to use this more efficient algorithm can dramatically increase performance.
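A sketch of such an override, assuming a hypothetical ArrayBackedList written from scratch on top of AbstractList (not java.util.ArrayList, although ArrayList also overrides removeRange with a single System.arraycopy). Because AbstractList.clear() and subList(from, to).clear() both end up in removeRange, the single-shift version is what those calls use.

```java
import java.util.AbstractList;
import java.util.Arrays;

// Hypothetical array-backed list used only to illustrate overriding removeRange.
class ArrayBackedList<E> extends AbstractList<E> {
    private Object[] elements = new Object[8];
    private int size;

    @Override
    public int size() {
        return size;
    }

    @Override
    @SuppressWarnings("unchecked")
    public E get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException(String.valueOf(index));
        return (E) elements[index];
    }

    @Override
    public void add(int index, E element) {
        if (index < 0 || index > size) throw new IndexOutOfBoundsException(String.valueOf(index));
        if (size == elements.length) elements = Arrays.copyOf(elements, size * 2);
        System.arraycopy(elements, index, elements, index + 1, size - index); // open a gap
        elements[index] = element;
        size++;
        modCount++;
    }

    @Override
    public E remove(int index) {
        E old = get(index); // also performs the bounds check
        System.arraycopy(elements, index + 1, elements, index, size - index - 1);
        elements[--size] = null;
        modCount++;
        return old;
    }

    // Called by clear() and subList(from, to).clear(). Shifts the tail once, so each
    // surviving element moves at most one time: O(n) rather than O(k * n).
    @Override
    protected void removeRange(int fromIndex, int toIndex) {
        int removed = toIndex - fromIndex;
        System.arraycopy(elements, toIndex, elements, fromIndex, size - toIndex);
        Arrays.fill(elements, size - removed, size, null); // drop stale references
        size -= removed;
        modCount++;
    }
}
```

For example, on a list holding 0 through 9, subList(2, 5).clear() leaves [0, 1, 5, 6, 7, 8, 9], moving each of the five trailing elements exactly once.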
Hope this helps!
As specified in the method's javadocs:
This implementation gets a list iterator positioned before fromIndex, and repeatedly calls ListIterator.next followed by ListIterator.remove until the entire range has been removed.
Since this abstract class does not know about the internals of its subclasses, it relies on this generic algorithm which will run in time proportional to the number of items being removed.
Suppose, for example, that you implemented a subclass that stored its elements as a linked list. You could then take advantage of this by overriding this method with a linked-list-specific algorithm (make the node just before fromIndex point directly to the node at toIndex), so that once the two boundary nodes are located, the whole range is unlinked with a single pointer update instead of one removal per element. You have thus improved performance because you took advantage of the internals.
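A sketch of the linked-list case, using a hypothetical singly linked SplicingList built on AbstractList. One caveat: locating the boundary nodes still takes a walk (to fromIndex, and through the range when it doesn't reach the end of the list), so the saving is the per-element removal bookkeeping; for a tail range the cut is a single relink once the node before fromIndex is found. The class keeps modCount up to date so AbstractList's fail-fast iterators and sublists keep working.

```java
import java.util.AbstractList;

// Hypothetical singly linked list (not java.util.LinkedList) whose removeRange
// unlinks a whole range by relinking once instead of removing node by node.
class SplicingList<E> extends AbstractList<E> {
    private static final class Node<E> {
        E value;
        Node<E> next;

        Node(E value, Node<E> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final Node<E> head = new Node<>(null, null); // sentinel before index 0
    private Node<E> tail = head;                         // last node, or the sentinel if empty
    private int size;

    @Override
    public int size() {
        return size;
    }

    // Walks to the node just before position index (the sentinel for index 0).
    private Node<E> nodeBefore(int index) {
        Node<E> n = head;
        for (int i = 0; i < index; i++) {
            n = n.next;
        }
        return n;
    }

    @Override
    public E get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException(String.valueOf(index));
        return nodeBefore(index).next.value;
    }

    @Override
    public void add(int index, E element) {
        if (index < 0 || index > size) throw new IndexOutOfBoundsException(String.valueOf(index));
        Node<E> prev = (index == size) ? tail : nodeBefore(index);
        prev.next = new Node<>(element, prev.next);
        if (prev == tail) tail = prev.next;
        size++;
        modCount++;
    }

    @Override
    public E remove(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException(String.valueOf(index));
        Node<E> prev = nodeBefore(index);
        Node<E> removed = prev.next;
        prev.next = removed.next;
        if (removed == tail) tail = prev;
        size--;
        modCount++;
        return removed.value;
    }

    // Used by clear() and subList(from, to).clear().
    @Override
    protected void removeRange(int fromIndex, int toIndex) {
        if (fromIndex >= toIndex) return;
        Node<E> prev = nodeBefore(fromIndex);
        if (toIndex == size) {
            prev.next = null; // drop the whole tail with one relink
            tail = prev;
        } else {
            Node<E> last = prev;
            for (int i = fromIndex; i < toIndex; i++) {
                last = last.next; // hop to the last node of the range
            }
            prev.next = last.next; // then relink once
        }
        size -= toIndex - fromIndex;
        modCount++;
    }
}
```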
By overriding this method you can replace the generic algorithm with one that suits how your class stores and indexes its elements. It is a protected method in AbstractList, and its default implementation works through repeated calls to remove(); in an array-backed list, each of those calls shifts every element to the right of the removed one left by one index. (ArrayList itself overrides removeRange for this reason.)
That is clearly inefficient, so overriding the method lets you do much better.
I am not clear on a point in the documentation of List.
It says:
i) Note that these operations may execute in time proportional to the index value for some implementations (the LinkedList class, for example).
ii) Thus, iterating over the elements in a list is typically preferable to indexing through it if the caller does not know the implementation.
Note that I put the (i) and (ii) in the quote.
Point (i) is pretty obvious due to the way we access a linked list vs the random access of an array.
I cannot understand point (ii), though.
What do we gain by preferring an iterator if we don't know the implementation?
I mean, if the implementation is a LinkedList, is there any difference in performance compared with accessing via the index?
I imagine not, since the Iterator would be manipulating a LinkedList anyway.
So there would be no difference.
So what is the meaning of the recommendation of (ii) in the doc?
The iterator of a linked list can just have a pointer to the next node in the list, and go to the next node each time next() is called. It doesn't start from the beginning every time. Whereas if you use an index and call get(i), the linked list has to iterate from the beginning until the ith element at each iteration.
What you missed is that the iterator implementation of an ArrayList and the one of a LinkedList are completely different.
No, if the implementation is a LinkedList then an iterator will be much more efficient - O(n) for iterating over the whole list instead of O(n^2). As the iterator is provided by the list, it has access to the internal data structures. It can just keep a reference to "the current node", making it a constant-time operation to get to the next one: just follow the link!
(If you're still confused, I suggest you just look at the implementation - that's likely to make it clearer.)
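A sketch contrasting the two loops (TraversalCost and its methods are hypothetical names). On a LinkedList, get(i) has to walk from the nearer end of the list on every call, while the for-each loop uses the list's own iterator, which simply follows the link to the next node. On an ArrayList both loops are O(n), which is why the Javadoc's advice matters when the caller doesn't know the implementation. The timings printed by main vary by machine, so no output is shown.

```java
import java.util.LinkedList;
import java.util.List;

class TraversalCost {
    // Indexed loop: on a LinkedList each get(i) walks from the nearer end,
    // so the whole loop does O(n^2) node hops.
    static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // Iterator loop: the iterator keeps a reference to the current node,
    // so each step is O(1) and the whole loop is O(n).
    static long sumByIterator(List<Integer> list) {
        long sum = 0;
        for (int value : list) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < 50_000; i++) {
            list.add(i);
        }
        long t0 = System.nanoTime();
        long a = sumByIndex(list);
        long t1 = System.nanoTime();
        long b = sumByIterator(list);
        long t2 = System.nanoTime();
        System.out.printf("index: %d ms, iterator: %d ms, same sum: %b%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a == b);
    }
}
```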
Insertion or deletion of an element at a specific point of a list, assuming that we have a pointer to the node already, is a constant-time operation.
- from the Wikipedia Article on Linked list
Linked list traversal in a singly linked list always starts from the head. We have to keep going until we satisfy a given condition.
So that will make any operation worst case O(n) unless we are dealing with the head node.
We CANNOT DIRECTLY go to a given pointer in a linked list. So why is it said that it is a constant time operation?
EDIT:
Even if we have a pointer to the node, don't we still have to start from the head? So how is it a constant-time operation?
First off: the LinkedList implemented in the Sun JDK effectively has a link to the last element as well as to the first element (there's only a head entry, but head.previous points to the last element; since JDK 7 the class keeps explicit first and last references instead). This means that even in the worst case, navigating through the list to an element indicated by an index should take about n/2 operations. It's also a doubly linked list.
Apart from that: inserting into the beginning or end of a LinkedList is trivially O(1), because you don't need to traverse all elements.
Inserting/removing anywhere else depends on how exactly you do it! If you use an Iterator (or a ListIterator for adding), then the operation can be O(1) as well, as the Iterator will already have a reference to the relevant entry.
If, however, you are using add(int, E) or remove(int), then the LinkedList will have to find the relevant entry (O(n)) and then remove the element (O(1)), so the entire operation will be O(n).
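A sketch of the difference (InsertDuringTraversal and its method names are hypothetical): both methods insert a separator after every element of a LinkedList and produce the same list, but the iterator version does O(1) work per insertion, while the index version repeats an O(n) walk for each add(int, E), giving O(n^2) overall.

```java
import java.util.LinkedList;
import java.util.ListIterator;

class InsertDuringTraversal {
    // One pass: each it.add() is O(1) because the iterator already holds the node, O(n) total.
    static void insertAfterEachViaIterator(LinkedList<String> list, String separator) {
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            it.next();
            it.add(separator); // links a new node right after the one just returned
        }
    }

    // Same result via add(int, E): each call first walks to the index, O(n^2) total.
    static void insertAfterEachViaIndex(LinkedList<String> list, String separator) {
        for (int i = 1; i <= list.size(); i += 2) {
            list.add(i, separator);
        }
    }
}
```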
You said it yourself: "assuming we have a pointer to the node already". That avoids the traversal you identify as the cause of the linear time.
Admittedly, the Wikipedia text is a bit ambiguous, since there are two nodes involved: the one being inserted, and the one in the list where to insert it.
" assuming that we have a pointer to the node already, is a constant-time operation"
You missed the first assumption, it seems.
I think you are missing the point here. It is just the INSERTION and DELETION that take constant time, not finding the point of insertion or deletion as well!
The time is constant because you simply need to set the references (links) to the previous and next item in the list, whereas with an ArrayList, for instance, inserting requires shifting every later element one slot to the right (and, when the backing array is full, allocating a larger array and transferring the existing data into it), and deleting requires shifting the later elements back to the left.
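A minimal sketch of what "having a pointer to the node" buys you, using a hypothetical DNode class, since java.util.LinkedList keeps its nodes private (from outside the class, a ListIterator is the way to hold such a position). Reaching the node may have cost O(n), but once you hold the reference, insertAfter and unlink touch only a fixed number of links, regardless of the list's length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal doubly linked node (java.util.LinkedList's Node class is private).
final class DNode<T> {
    final T value;
    DNode<T> prev;
    DNode<T> next;

    DNode(T value) {
        this.value = value;
    }

    // O(1): given a reference to this node, only a fixed number of links change.
    DNode<T> insertAfter(T newValue) {
        DNode<T> node = new DNode<>(newValue);
        node.prev = this;
        node.next = this.next;
        if (this.next != null) {
            this.next.prev = node;
        }
        this.next = node;
        return node;
    }

    // O(1): splice this node out by pointing its neighbours at each other.
    void unlink() {
        if (prev != null) prev.next = next;
        if (next != null) next.prev = prev;
        prev = null;
        next = null;
    }

    // O(n) helper for checking results: collects values from this node onward.
    List<T> valuesFromHere() {
        List<T> out = new ArrayList<>();
        for (DNode<T> n = this; n != null; n = n.next) {
            out.add(n.value);
        }
        return out;
    }
}
```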