I am not clear on a point in the documentation of List.
It says:
i) Note that these operations may execute in time proportional to the
index value for some implementations (the LinkedList class, for
example).
ii) Thus, iterating over the elements in a list is typically
preferable to indexing through it if the caller does not know the
implementation.
Note that I put the (i) and (ii) in the quote.
Point (i) is pretty obvious due to the way we access a linked list vs the random access of an array.
I cannot understand point (ii), though.
What do we gain by preferring an iterator if we don't know the implementation?
I mean, if the implementation is a LinkedList, is there any difference in performance compared with accessing elements by index?
I imagine not, since the Iterator would be manipulating a LinkedList anyway.
So there would be no difference.
So what is the meaning of the recommendation of (ii) in the doc?
The iterator of a linked list can just have a pointer to the next node in the list, and go to the next node each time next() is called. It doesn't start from the beginning every time. Whereas if you use an index and call get(i), the linked list has to iterate from the beginning until the ith element at each iteration.
What you missed is that the iterator implementation of an ArrayList and the one of a LinkedList are completely different.
No, if the implementation is a LinkedList then an iterator will be much more efficient - O(n) for iterating over the whole list instead of O(n^2). As the iterator is provided by the list, it has access to the internal data structures. It can just keep a reference to "the current node", making it a constant-time operation to get to the next one: just follow the link!
(If you're still confused, I suggest you just look at the implementation - that's likely to make it clearer.)
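To make the difference concrete, here is a minimal sketch of the two traversal styles. The class and method names (TraversalDemo, sumByIndex, sumByIterator) are made up for illustration. Both methods return the same result; what differs is how much work a LinkedList has to do to produce it.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

final class TraversalDemo {
    // Indexed traversal: on a LinkedList every get(i) has to walk from one end
    // of the list to position i, so the loop as a whole is O(n^2).
    static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // Iterator traversal: the iterator remembers where it is, so each next()
    // is a single step and the loop as a whole is O(n) for LinkedList too.
    static long sumByIterator(List<Integer> list) {
        long sum = 0;
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < 10_000; i++) {
            list.add(i);
        }
        // Same answer either way; only the cost differs.
        System.out.println(sumByIndex(list) == sumByIterator(list));
    }
}
```

An enhanced for loop (for (Integer x : list)) uses the list's iterator as well, so it behaves like the second method.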
I have a LinkedList. Suppose that I'm inserting an element at the end and I want to save the position where it was inserted, so that I can later call a function on the element next to it, whatever ends up being added to the collection after it. Is this possible with Java iterators? Many thanks.
Just to clarify, I'm not interested in reverse iteration. The application will be multithreaded, hence the weird requirement.
You can call List#listIterator(int index) with index = size() -1 to get an iterator to the current last element of the list. See documentation: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/List.html
However, you are going to be stuck from there.
Either the List implementation you are using isn't thread-safe, which is the case for LinkedList, ArrayList and most others, in which case any attempt to use the iterator after the list has been structurally modified is going to result in a ConcurrentModificationException being thrown.
A list is structurally modified when its size changes, i.e. on additions and removals.
Or the List implementation you are using is thread-safe, in which case you have no guarantee that the iterator will have access to the elements added to the list after the creation of the iterator.
For example, it wouldn't be the case with CopyOnWriteArrayList, for which the iterator iterates through data as it was at creation (like a snapshot).
You would need to find a List implementation whose documentation explicitly describes and guarantees this behavior. As far as I know, none exists, at least in the standard library.
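The two outcomes described in this answer can be reproduced with a small single-threaded sketch. The class and method names (IteratorAfterAddDemo, failsAfterAdd, countSeenBySnapshot) are hypothetical. It only demonstrates the iterator behavior and makes no attempt to be thread-safe.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class IteratorAfterAddDemo {
    // Returns true if an iterator positioned at the current last element
    // throws once the list has grown after the iterator was created.
    static boolean failsAfterAdd(List<String> list) {
        list.add("a");
        list.add("b");
        Iterator<String> it = list.listIterator(list.size() - 1);
        list.add("c"); // structural modification made after the iterator was created
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // A CopyOnWriteArrayList iterator walks a snapshot, so it never throws,
    // but it also never sees elements added after its creation.
    static int countSeenBySnapshot() {
        CopyOnWriteArrayList<String> list = new CopyOnWriteArrayList<>();
        list.add("a");
        list.add("b");
        Iterator<String> it = list.iterator();
        list.add("c"); // not visible to 'it'
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        return seen;
    }
}
```

Calling failsAfterAdd with a new LinkedList or ArrayList takes the exception path, while passing a CopyOnWriteArrayList does not. Neither behavior gives the "see whatever is added later" semantics the question asks for.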
When I declare LinkedList like:
List<String> names = new LinkedList<String>();
it does not support any of the LinkedList's special methods (ex: names.peekLast(), names.pollFirst() )
But when I declare like:
LinkedList<String> names = new LinkedList<String>();
then it supports these methods.
Yes, obviously the reason is the type of the reference: LinkedList declares those methods and List does not.
But my question is that when I want to work with LinkedList, which one is better and correct? Or what is the usage of them?
If you need to use LinkedList methods that don't exist in List, you should use a LinkedList reference (you could use a List reference and cast to LinkedList in order to call LinkedList specific methods, but that would make less sense).
Otherwise, it is preferable to hold the reference as the List interface: your code becomes more generic because it no longer depends on any specific List implementation.
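As a rough sketch of that trade-off (the class and method names are invented for illustration): a method that only needs List operations can accept any implementation, while a method that needs peekLast() has to ask for a type that declares it. One option not mentioned above is the java.util.Deque interface, which LinkedList implements and which declares peekLast() and pollFirst(), so the code still avoids naming a concrete class.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

final class ReferenceTypeDemo {
    // Depends only on the List interface, so any implementation is accepted.
    static String firstOrNull(List<String> names) {
        return names.isEmpty() ? null : names.get(0);
    }

    // Needs peekLast(), which List does not declare but Deque does.
    static String lastOrNull(Deque<String> names) {
        return names.peekLast();
    }

    public static void main(String[] args) {
        LinkedList<String> names = new LinkedList<>();
        names.add("Ann");
        names.add("Bob");

        System.out.println(firstOrNull(names));                  // LinkedList is a List
        System.out.println(firstOrNull(new ArrayList<>(names))); // so is ArrayList
        System.out.println(lastOrNull(names));                   // LinkedList is also a Deque
    }
}
```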
Well, an array-backed list (ArrayList in Java; the text below was written about List<T>, its .NET counterpart) is basically backed by an array which is usually bigger than the current number of items. The elements are put in an array, and a new array is created when the old one runs out of space. This is fast for access by index, but slow at removing or inserting elements within the list or at the start. Adding/removing entries at the end of the list is reasonably cheap.
LinkedList is a doubly-linked list - each node knows its previous entry and its next one. This is fast for inserting after/before a particular node (or the head/tail), but slow at access by index.
LinkedList will usually take more memory than an array-backed list because it needs space for all those next/previous references - and the data will probably have less locality of reference, as each node is a separate object. On the other hand, an array-backed list can have a backing array which is much larger than its current needs.
Reference from Difference between List<T> and LinkedList<T>
You can also refer to oracle docs
Linked List
All of the operations perform as could be expected for a doubly-linked list. Operations that index into the list will traverse the list from the beginning or the end, whichever is closer to the specified index.
List
The List interface provides four methods for positional (indexed) access to list elements. Lists (like Java arrays) are zero based. Note that these operations may execute in time proportional to the index value for some implementations (the LinkedList class, for example). Thus, iterating over the elements in a list is typically preferable to indexing through it if the caller does not know the implementation.
Well, there is a very simple explanation: an array-backed list such as ArrayList behaves like an array that allocates a new, larger array when it runs out of space, whereas LinkedList is a doubly-linked list in which each node holds a link to the previous node as well as to the next node.
You can find more in the Oracle docs:
https://docs.oracle.com/javase/7/docs/api/java/util/List.html
and
https://docs.oracle.com/javase/7/docs/api/java/util/LinkedList.html
You can compare the two yourself. :)
In the book Effective Java by Joshua Bloch, there is a discussion on how a class can provide "judiciously chosen protected methods" as hooks into its internal workings.
The author then cites the documentation in AbstractList.removeRange():
This method is called by the clear operation on this list and its
subLists. Overriding this method to take advantage of the internals of
the list implementation can substantially improve the performance of
the clear operation on this list and its subLists.
My question is, how can overriding this method improve performance (more than simply not overriding it)? Can anyone give an example of this?
Let's take a concrete example - suppose that your implementation is backed by a dynamic array (this is how ArrayList works, for example). Now, suppose that you want to remove elements in the range [start, end). The default implementation of removeRange works by getting an iterator to position start, then calling remove() the appropriate number of times.
Each time remove() is called, the dynamic array implementation has to shuffle all the elements from position start + 1 onward back one spot to fill the gap left by the removed element. This could potentially take time O(n), because potentially all of the array elements might need to get shuffled down. This means that if you're removing a total of k elements from the list, the naive approach will take time O(kn), since you're doing O(n) work k times.
Now consider a much better approach: copy the element at position end to position start, then element end + 1 to position start + 1, etc. until all elements are copied. This requires you to only do a total of O(n) work, because every element is moved at most once. Compared with the O(kn) approach given by the naive algorithm, this is a huge performance improvement. Consequently, overriding removeRange to use this more efficient algorithm can dramatically increase performance.
Hope this helps!
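Here is a minimal sketch of that idea, assuming a toy array-backed list. IntArrayList is a hypothetical class written for this example and is not how ArrayList itself is written. The only point of interest is removeRange, which closes the gap with a single System.arraycopy instead of k separate shifts.

```java
import java.util.AbstractList;
import java.util.Arrays;

final class IntArrayList extends AbstractList<Integer> {
    private int[] data = new int[8];
    private int size = 0;

    @Override
    public Integer get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[index];
    }

    @Override
    public int size() {
        return size;
    }

    @Override
    public void add(int index, Integer value) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2); // grow the backing array
        }
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = value;
        size++;
        modCount++;
    }

    @Override
    public Integer remove(int index) {
        Integer old = get(index);
        // Shifts every later element down by one: O(n) per call.
        System.arraycopy(data, index + 1, data, index, size - index - 1);
        size--;
        modCount++;
        return old;
    }

    // Called by clear() on this list and on its subLists.
    // Moves the tail down once, so every element is copied at most one time.
    @Override
    protected void removeRange(int fromIndex, int toIndex) {
        System.arraycopy(data, toIndex, data, fromIndex, size - toIndex);
        size -= toIndex - fromIndex;
        modCount++;
    }
}
```

With this override, list.subList(2, 5).clear() does one block copy. Without it, the inherited AbstractList.removeRange would call the remove path three times, shifting the tail each time.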
As specified in the method's javadocs:
This implementation gets a list iterator positioned before fromIndex, and repeatedly calls ListIterator.next followed by ListIterator.remove until the entire range has been removed.
Since this abstract class does not know about the internals of its subclasses, it relies on this generic algorithm which will run in time proportional to the number of items being removed.
If, for example, you implemented a subclass that stored its elements as a linked list, you could take advantage of this fact and override this method to use a linked-list-specific algorithm (make the node before fromIndex point to the node at toIndex), which runs in constant time once those nodes have been located. You have thus improved performance because you took advantage of internals.
By overriding this method you can replace the generic algorithm with one suited to your own data structure. It is a protected method in AbstractList (ArrayList has one too). The default implementation makes repeated calls to remove(), and in an array-backed list each of those calls has to shift every element to the right of the removed one down by one index.
Obviously that is not efficient, so you can make it work better.
Is there an efficient method to remove a range - say the tail - of X elements from a List, e.g. LinkedList in Java?
It is obviously possible to remove the last elements one by one, which should result in O(X) level performance. At least for LinkedList instances it should be possible to have O(1) performance (by setting the references around the first element to be removed and setting the head/tail references). Unfortunately I don't see any method within List or LinkedList to remove the last elements all at once.
Currently I am thinking of replacing the list by using List.subList(), but I'm not sure whether that has equal performance. At least it would be clearer in the code; on the other hand, I would lose the additional functionality that LinkedList provides.
I'm mainly using the List as a stack, for which LinkedList seems to be the best option, at least regarding semantics.
subList(list.size() - N, list.size()).clear() is the recommended way to remove the last N elements. Indeed, the Javadoc for subList specifically recommends this idiom:
This method eliminates the need for explicit range operations (of the sort that commonly exist for arrays). Any operation that expects a list can be used as a range operation by passing a subList view instead of a whole list. For example, the following idiom removes a range of elements from a list:
list.subList(from, to).clear();
Indeed, I suspect that this idiom might be more efficient (albeit by a constant factor) than calling removeLast() N times, just because once it finds the Nth-to-last node, it only needs to update a constant number of pointers in the linked list, rather than updating the pointers of each of the last N nodes one at a time.
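A small sketch of the idiom wrapped in a helper follows. The names TailRemovalDemo and removeLast are made up for this example, and clamping n to the list size is a design choice of the sketch rather than something the Javadoc prescribes.

```java
import java.util.LinkedList;
import java.util.List;

final class TailRemovalDemo {
    // Removes the last n elements of the list, or all of them if it is shorter.
    static <T> void removeLast(List<T> list, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        int from = Math.max(0, list.size() - n);
        // The subList is a view, so clearing it removes the range from 'list'.
        list.subList(from, list.size()).clear();
    }

    public static void main(String[] args) {
        List<Integer> stack = new LinkedList<>();
        for (int i = 1; i <= 5; i++) {
            stack.add(i);
        }
        removeLast(stack, 2);
        System.out.println(stack); // the last two elements are gone
    }
}
```

Because the helper takes a List, the caller can keep a LinkedList reference and still use its other methods afterwards.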
Be aware that subList() returns a view of the original list, meaning:
Any modification done to the view will be reflected in the original list
The returned list is not a LinkedList - it's an inner implementation of List that's not serializable
Anyway, using either removeFirst() or removeLast() should be efficient enough, because popping the first or last element of a linked list in Java is an O(1) operation - internally, LinkedList holds pointers to both ends of the list and removing either one is as simple as moving a pointer one position.
For removing m elements at once, you're stuck with O(m) performance with a LinkedList, although strangely enough an ArrayList might be a better option, because removing elements at the end of an ArrayList is as simple as moving an index pointer (denoting the end of the array) one position to its left, and no garbage nodes are left dangling as is the case with a LinkedList. The best choice? Try both approaches, profile them and let the numbers speak for themselves.
Insertion or deletion of an element at a specific point of a list, assuming that we have a pointer to the node already, is a constant-time operation.
- from the Wikipedia Article on Linked list
Linked list traversal in a single linked list always starts from the head. We have to keep going till we satisfy a given condition.
So that will make any operation worst case O(n) unless we are dealing with the head node.
We CANNOT DIRECTLY go to a given pointer in a linked list. So why is it said that it is a constant time operation?
EDIT:
Even if we have a pointer to the node, don't we still have to start from the head? So how is it a constant-time operation?
First off: the LinkedList implemented in the Sun JDK effectively has a link to the last element as well as to the first element (there's only a head entry, but head.previous points to the last element). This means that even in the worst case navigating through a list to an element indicated by an index should take n/2 operations. It's also a doubly linked list.
Apart from that: inserting into the beginning or end of a LinkedList is trivially O(1), because you don't need to traverse all elements.
Inserting/removing anywhere else depends on how exactly you do it! If you use an Iterator (or a ListIterator for adding) then the operation can be O(1) as well, as the Iterator will already have a reference to the relevant entry.
If, however, you are using add(int, E) or remove(int), then the LinkedList will have to find the relevant entry (O(n)) and then remove the element (O(1)), so the entire operation will be O(n).
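The iterator-based route might look like the sketch below. The names InsertDemo and insertAfterEach are hypothetical. The traversal that finds the insertion points is still O(n), but each individual ListIterator.add is a constant-time relinking on a LinkedList because the iterator is already standing at the right node.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

final class InsertDemo {
    // Inserts 'value' right after every element equal to 'marker', in one pass.
    static void insertAfterEach(List<String> list, String marker, String value) {
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            String current = it.next();
            if (marker.equals(current)) {
                // The cursor sits just after 'current'; add() inserts there and
                // leaves the cursor after the new element, so it is not revisited.
                it.add(value);
            }
        }
    }

    public static void main(String[] args) {
        List<String> list = new LinkedList<>(List.of("a", "m", "b", "m"));
        insertAfterEach(list, "m", "x");
        System.out.println(list);
    }
}
```

Doing the same with list.add(index, value) inside an indexed loop would make the LinkedList search for the position again on every insertion.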
You said it yourself: "assuming we have a pointer to the node already". That avoids the traversal you identify as the cause of the linear time.
Admittedly, the Wikipedia text is a bit ambiguous, since there are two nodes involved: the one being inserted, and the one in the list where to insert it.
" assuming that we have a pointer to the node already, is a constant-time operation"
You missed the first assumption, it seems.
I think you are missing the point here. It is just the INSERTION and DELETION themselves that take constant time, not finding the point of insertion or deletion as well!
The time is constant because you simply need to set the references (links) to the previous and next item in the list -- whereas for instance with ArrayList, in the case of insertion you need to allocate memory for (at least) one more item and transfer the existing data into the newly allocated array (or with deletion you have to shift elements in the array around once you deleted the item).