Effective Java Item 17: How can overriding removeRange() improve performance? - java

In the book Effective Java by Joshua Bloch, there is a discussion on how a class can provide "judiciously chosen protected methods" as hooks into its internal workings.
The author then cites the documentation in AbstractList.removeRange():
This method is called by the clear operation on this list and its subLists. Overriding this method to take advantage of the internals of the list implementation can substantially improve the performance of the clear operation on this list and its subLists.
My question is, how can overriding this method improve performance (more than simply not overriding it)? Can anyone give an example of this?

Let's take a concrete example - suppose that your implementation is backed by a dynamic array (this is how ArrayList works, for example). Now, suppose that you want to remove elements in the range [start, end). The default implementation of removeRange works by getting an iterator to position start, then calling remove() the appropriate number of times.
Each time remove() is called, the dynamic array implementation has to shift all the elements from position start + 1 onward back one spot to fill the gap left by the removed element. This can take O(n) time, because in the worst case all of the remaining array elements need to be shifted down. This means that if you're removing a total of k elements from the list, the naive approach will take time O(kn), since you're doing O(n) work k times.
Now consider a much better approach: copy the element at position end to position start, then element end + 1 to position start + 1, etc. until all elements are copied. This requires you to only do a total of O(n) work, because every element is moved at most once. Compared with the O(kn) approach given by the naive algorithm, this is a huge performance improvement. Consequently, overriding removeRange to use this more efficient algorithm can dramatically increase performance.
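To make this concrete, here is a minimal sketch of an array-backed list that overrides removeRange() with the bulk-copy approach just described. The class and field names (SimpleArrayList, data, size) are invented for illustration; this is a toy built on AbstractList, not the JDK's actual ArrayList.

import java.util.AbstractList;

public class SimpleArrayList<E> extends AbstractList<E> {
    private Object[] data = new Object[10];
    private int size;

    @Override
    @SuppressWarnings("unchecked")
    public E get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index + ", size: " + size);
        }
        return (E) data[index];
    }

    @Override
    public int size() {
        return size;
    }

    @Override
    public void add(int index, E element) {
        if (size == data.length) {                       // grow by doubling
            Object[] bigger = new Object[data.length * 2];
            System.arraycopy(data, 0, bigger, 0, size);
            data = bigger;
        }
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = element;
        size++;
        modCount++;
    }

    // One bulk shift instead of k individual remove() calls: every surviving
    // element moves at most once, so clearing a range is O(n) rather than O(kn).
    @Override
    protected void removeRange(int fromIndex, int toIndex) {
        System.arraycopy(data, toIndex, data, fromIndex, size - toIndex);
        int newSize = size - (toIndex - fromIndex);
        for (int i = newSize; i < size; i++) {
            data[i] = null;                              // let the GC reclaim dropped references
        }
        size = newSize;
        modCount++;
    }
}

With this override in place, both clear() (which AbstractList implements as removeRange(0, size())) and subList(from, to).clear() pick up the fast path automatically.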
Hope this helps!

As specified in the method's javadocs:
This implementation gets a list iterator positioned before fromIndex, and repeatedly calls ListIterator.next followed by ListIterator.remove until the entire range has been removed.
Since this abstract class does not know about the internals of its subclasses, it relies on this generic algorithm, which removes the elements one at a time and therefore pays the full cost of an individual remove() for every item in the range.
If, for example, you implemented a subclass that stores its elements as a linked list, you could take advantage of that fact and override this method with a linked-list-specific algorithm (relink the node before fromIndex to the node at toIndex), so that removing the whole range costs only a constant number of pointer updates once the boundary nodes are located. You have thus improved performance because you took advantage of the internals.
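As a rough sketch of that idea, assuming a simple singly linked custom list (the Node, head and nodeBefore names are made up for illustration, and bounds checks are omitted): locating the boundary nodes still costs a traversal, but unlinking the whole range is a single pointer assignment, no matter how many elements fall inside it.

import java.util.AbstractList;

public class SimpleLinkedList<E> extends AbstractList<E> {
    private static final class Node<E> {
        E value;
        Node<E> next;
        Node(E value) { this.value = value; }
    }

    private final Node<E> head = new Node<>(null);   // sentinel before the first element
    private int size;

    private Node<E> nodeBefore(int index) {          // node preceding position 'index': O(index)
        Node<E> n = head;
        for (int i = 0; i < index; i++) {
            n = n.next;
        }
        return n;
    }

    @Override public E get(int index) { return nodeBefore(index).next.value; }
    @Override public int size() { return size; }

    @Override
    public void add(int index, E element) {
        Node<E> prev = nodeBefore(index);
        Node<E> node = new Node<>(element);
        node.next = prev.next;
        prev.next = node;
        size++;
        modCount++;
    }

    @Override
    protected void removeRange(int fromIndex, int toIndex) {
        Node<E> before = nodeBefore(fromIndex);
        Node<E> last = before;
        for (int i = fromIndex; i < toIndex; i++) {
            last = last.next;                        // walk to the last node in the range
        }
        before.next = last.next;                     // drop [fromIndex, toIndex) in one step
        size -= (toIndex - fromIndex);
        modCount++;
    }
}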

By overriding this method you can replace the generic algorithm with one that suits your own indexing and storage scheme. removeRange is a protected method in AbstractList, and the inherited implementation works as iterative calls to remove(); in an array-backed list, each such call shifts every element to the right of the removed element by one index.
Obviously that is not efficient, so you can override it to work better (ArrayList, for example, overrides removeRange to shift the tail of the array in a single bulk copy).


Is get(0) on java.util.List always O(1)?

To my knowledge, there are the following implementations:
ArrayList
LinkedList
Vector
Stack
(based on http://tutorials.jenkov.com/java-collections/list.html; please correct me if I'm wrong)
ArrayList is a dynamic array implementation, so, as with an array, get is O(1); LinkedList has O(1) get from the head; Vector and Stack are also array-backed, hence O(1).
So is get(0) O(1) in EVERY case for any built-in implementation of List (built-in, because you could always write your own implementation whose get(0) has, say, O(n!) time complexity)?
Is get(0) on java.util.List always O(1)?
Let us assume that there is a parameter N which stands for the length of the list¹.
For the 4 implementations of List that you mentioned, get(0) is indeed an O(1) operation:
ArrayList, Vector and Stack all implement get(i) using array subscripting and that is an O(1) operation.
LinkedList.get(i) involves i link traversals which is O(i). But if i is a constant, that reduces to O(1).
However there are other "built in" implementations of List. Indeed, there are a considerable number of them if you include the various non-public implementations, such as the List classes that implement sublists, unmodifiable lists, and so on. Generalizing from those 4 to "all of them" is not sound².
But get(0) won't be O(1) for all possible implementations of List.
Consider a simple linked list where the elements are chained in reverse order. Here get(0) has to traverse to the end of the list, which takes N link traversals: O(N).
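For illustration, here is a contrived sketch of such a reverse-chained list; all names are invented, and bounds checks are omitted.

import java.util.AbstractList;

public class ReverseChainedList<E> extends AbstractList<E> {
    private static final class Node<E> {
        final E value;
        final Node<E> prev;                 // link to the element that comes before it
        Node(E value, Node<E> prev) { this.value = value; this.prev = prev; }
    }

    private Node<E> last;                   // the chain starts at the LAST element
    private int size;

    @Override
    public boolean add(E e) {
        last = new Node<>(e, last);
        size++;
        modCount++;
        return true;
    }

    @Override
    public E get(int index) {
        Node<E> n = last;
        for (int i = size - 1; i > index; i--) {
            n = n.prev;                     // get(0) follows all N links: O(N)
        }
        return n.value;
    }

    @Override
    public int size() { return size; }
}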
Consider a list that is fully populated from the rows in a database query's result set the first time that you attempt to retrieve a list element. The first get call will be at least O(N) because you are fetching N rows. (It could be worse than O(N) if the database query is not O(N).) So the worst case complexity for any call to get is O(N) ... or worse.
Indeed, with some ingenuity, one could invent a custom list where get(0) has any Big-O complexity that you care to propose.
1 - I am being deliberately vague here. On the one hand, we need to identify a variable N denoting the "problem" size for complexity analysis to make sense. (The length of the list is the obvious choice.) On the other hand, the length of a List is a surprisingly "rubbery" concept when you consider all of the possible ways to implement the interface.
2 - I assume that you are asking this question because you want to write some library code that relies on List.get(0) being O(1). Since you can't prevent someone from using your library with a non-builtin list implementation, your "assume it is builtin" constraint in your question doesn't really help ... even if we could check all possible (past, current or future) builtin List implementations for you.
Ignoring custom implementations and only looking at built-in implementations, as suggested at the end of the question, you still cannot say that get(0) will be O(1) regardless of list size.
As an example, calling get(0) on a sublist based on a LinkedList will be O(n):
List<Integer> list = new LinkedList<>(Arrays.asList(1,2,3,4,5,6,7,8,9));
List<Integer> subList = list.subList(4, 8);
Integer num = subList.get(0); // <===== O(n), not O(1)
In that code, subList.get(0) internally calls list.get(4), which has O(n) time complexity.
Yes, for all implementations of List you mentioned get(0) is O(1).

Implement a list with O(1) running time

Could I make a list where every operation has a fast running time? Is such a list possible? I can't wrap my head around how you could keep search or add times constant if the list needs to walk through nodes to find others, let alone to add.
At first you only said get(), add() and set(), but then you said search() as well. The first three all run in O(1) time in an ArrayList and similar implementations (amortized O(1) in the case of add()). You can't have O(1) search time in anything that would normally be considered a list.
Edit: Some people have pointed out, correctly, that you could get O(1) lookup time if the list implementation also stored element indices in a hashmap. Strictly speaking, as long as it implements a List interface, it is a list. I should have said that you can't do that with only a list.
Just posting my comment as an answer...
If you used an expanding list that grows in size as you need it, you could achieve amortized O(1) getting, setting, and adding: you do eventually need to expand its capacity, but that happens rarely enough that the average cost per operation stays constant. Pretty sure this is how Java's ArrayList class works.
You can read more about it here: https://stackoverflow.com/a/4450659/1572906
You mentioned searching, but with this kind of approach, O(1) doesn't seem very feasible. You could achieve O(1) using a hash table alongside the array.
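As a rough sketch of that last idea (the class and field names are made up, and it assumes distinct elements that are only appended or overwritten, since inserting or removing in the middle would force the index map to be rebuilt):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexedList<E> {
    private final List<E> elements = new ArrayList<>();
    private final Map<E, Integer> indexByElement = new HashMap<>();

    public void add(E e) {                  // amortized O(1): array doubling
        indexByElement.putIfAbsent(e, elements.size());
        elements.add(e);
    }

    public E get(int i) {                   // O(1): array subscripting
        return elements.get(i);
    }

    public E set(int i, E e) {              // O(1) plus map maintenance
        E old = elements.set(i, e);
        indexByElement.remove(old);
        indexByElement.put(e, i);
        return old;
    }

    public boolean contains(E e) {          // O(1) on average, thanks to the map
        return indexByElement.containsKey(e);
    }

    public int indexOf(E e) {               // likewise O(1) on average
        return indexByElement.getOrDefault(e, -1);
    }
}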

Removing range (tail) from a List

Is there an efficient method to remove a range - say the tail - of X elements from a List, e.g. LinkedList in Java?
It is obviously possible to remove the last elements one by one, which should result in O(X) level performance. At least for LinkedList instances it should be possible to have O(1) performance (by setting the references around the first element to be removed and setting the head/tail references). Unfortunately I don't see any method within List or LinkedList to remove the last elements all at once.
Currently I am thinking of replacing the list using List.subList(), but I'm not sure if that has equal performance. At least it would be clearer in the code; on the other hand, I would lose the additional functionality that LinkedList provides.
I'm mainly using the List as a stack, for which LinkedList seems to be the best option, at least regarding semantics.
subList(list.size() - N, list.size()).clear() is the recommended way to remove the last N elements. Indeed, the Javadoc for subList specifically recommends this idiom:
This method eliminates the need for explicit range operations (of the sort that commonly exist for arrays). Any operation that expects a list can be used as a range operation by passing a subList view instead of a whole list. For example, the following idiom removes a range of elements from a list:
list.subList(from, to).clear();
Indeed, I suspect that this idiom might be more efficient (albeit by a constant factor) than calling removeLast() N times, just because once it finds the Nth-to-last node, it only needs to update a constant number of pointers in the linked list, rather than updating the pointers of each of the last N nodes one at a time.
Be aware that subList() returns a view of the original list, meaning:
Any modification done to the view will be reflected in the original list
The returned list is not a LinkedList - it's an inner implementation of List that's not serializable
Anyway, using either removeFirst() or removeLast() should be efficient enough, because popping the first or last element of a linked list in Java is an O(1) operation - internally, LinkedList holds pointers to both ends of the list and removing either one is as simple as moving a pointer one position.
For removing m elements at once you're stuck with O(m) work either way, but strangely enough an ArrayList might be the better option: removing elements at the end of an ArrayList amounts to moving the size index back and nulling out the vacated slots, and no garbage nodes are left dangling as is the case with a LinkedList. The best choice? Try both approaches, profile them and let the numbers speak for themselves.
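In that spirit, here is a very rough timing sketch; the sizes are arbitrary, and this is not a proper benchmark (no warm-up, no JMH), so treat the numbers as indicative at best.

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class TailRemovalTiming {
    public static void main(String[] args) {
        int size = 1_000_000;
        int tail = 100_000;

        List<Integer> linked = new LinkedList<>();
        List<Integer> array = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            linked.add(i);
            array.add(i);
        }

        long t1 = System.nanoTime();
        linked.subList(size - tail, size).clear();   // tail removal via the subList idiom
        long t2 = System.nanoTime();
        array.subList(size - tail, size).clear();
        long t3 = System.nanoTime();

        System.out.printf("LinkedList tail clear: %d ms%n", (t2 - t1) / 1_000_000);
        System.out.printf("ArrayList  tail clear: %d ms%n", (t3 - t2) / 1_000_000);
    }
}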

Confusing point in List documentation

I am not clear on a point in the documentation of List.
It says:
i) Note that these operations may execute in time proportional to the index value for some implementations (the LinkedList class, for example).
ii) Thus, iterating over the elements in a list is typically preferable to indexing through it if the caller does not know the implementation.
Note that I put the (i) and (ii) in the quote.
Point (i) is pretty obvious due to the way we access a linked list vs the random access of an array.
I can not understand point (ii) though.
What do we gain by prefering an iterator if we don't know the implementation?
I mean, if the implementation is a LinkedList, is there any difference in performance compared to accessing it via an index?
I imagine not, since the Iterator would be manipulating a LinkedList anyway.
So there would be no difference.
So what is the meaning of the recommendation of (ii) in the doc?
The iterator of a linked list can just keep a pointer to the current node and move to the next node each time next() is called; it doesn't start from the beginning every time. With an index, by contrast, every call to get(i) has to walk the links to the ith element all over again.
What you missed is that the iterator implementation of an ArrayList and the one of a LinkedList are completely different.
No, if the implementation is a LinkedList then an iterator will be much more efficient: O(n) for iterating over the whole list instead of O(n²). As the iterator is provided by the list, it has access to the internal data structures. It can just keep a reference to "the current node", making it a constant-time operation to get to the next one: just follow the link!
(If you're still confused, I suggest you just look at the implementation - that's likely to make it clearer.)
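Here is a small sketch of the difference being described; the list size is an arbitrary assumption.

import java.util.LinkedList;
import java.util.List;

public class IterationDemo {
    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < 50_000; i++) {
            list.add(i);
        }

        // Indexed access: every get(i) walks the links again from an end of
        // the list, so the whole loop is O(n^2) on a LinkedList.
        long indexedSum = 0;
        for (int i = 0; i < list.size(); i++) {
            indexedSum += list.get(i);
        }

        // Iteration: the iterator remembers its current node and follows one
        // link per element, so the whole loop is O(n).
        long iteratedSum = 0;
        for (int value : list) {
            iteratedSum += value;
        }

        System.out.println(indexedSum == iteratedSum);   // same result, very different cost
    }
}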

Best way to remove and add elements from the java List

I have 100,000 objects in the list. I want to remove a few elements from the list based on a condition. Can anyone tell me what is the best approach to achieve this in terms of memory and performance?
The same question applies to adding objects based on a condition.
Thanks in Advance
Raju
Your container is not just a List. List is an interface that can be implemented by, for example, ArrayList or LinkedList. The performance will depend on which of these underlying classes is actually instantiated for the object you are polymorphically referring to as a List.
ArrayList can access elements in the middle of the list quickly, but if you delete one of them you need to shift a whole bunch of elements. LinkedList is the opposite in this respect: access requires iteration, but deletion is just a matter of reassigning pointers.
Your performance depends on the implementation of List, and the best choice of implementation depends on how you will be using the List and which operations are most frequent.
If you're going to be iterating over the list and applying a test to each element, then a LinkedList will be most efficient in terms of CPU time, because you don't have to shift any elements in the list. It will, however, consume more memory than an ArrayList, because each list element is wrapped in its own Entry object.
However, it might not matter. 100,000 is a small number, and if you aren't removing a lot of elements the cost of shifting an ArrayList will be low. And if you are removing a lot of elements, it's probably better to restructure the operation as a copy-with-filter.
However, the only real way to know is to write the code and benchmark it.
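To illustrate the two strategies mentioned above, here is a sketch using an arbitrary example condition (dropping even numbers); the class and method names are made up.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ConditionalRemoval {
    // In-place removal: safe while iterating, but on an ArrayList each
    // remove() shifts the trailing elements one slot to the left.
    static void removeInPlace(List<Integer> list) {
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
    }

    // Copy-with-filter: build a new list containing only the survivors;
    // O(n) time and O(n) extra memory, usually the better choice when
    // many elements are going to be removed.
    static List<Integer> copyWithFilter(List<Integer> list) {
        List<Integer> kept = new ArrayList<>(list.size());
        for (Integer value : list) {
            if (value % 2 != 0) {
                kept.add(value);
            }
        }
        return kept;
    }
}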
Collections2.filter (from Guava) produces a filtered collection based on a predicate.
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

import com.google.common.base.Predicate;
import com.google.common.collect.Collections2;

List<Number> myNumbers = Arrays.asList(Integer.valueOf(1), Double.valueOf(1e6));
// bigNumbers is a live, lazily evaluated view of myNumbers
Collection<Number> bigNumbers = Collections2.filter(
        myNumbers,
        new Predicate<Number>() {
            public boolean apply(Number n) {
                return n.doubleValue() >= 100d;
            }
        });
Note that some operations like size() are not efficient with this scheme. If you tend to follow Josh Bloch's advice and prefer isEmpty() and iterators to unnecessary size() checks, then this shouldn't bite you in practice.
LinkedList could be a good choice.
LinkedList removes and adds elements more efficiently than ArrayList, and there is no need to call a method such as ArrayList.trimToSize() to release unused memory. But LinkedList is a doubly linked list: each element is wrapped in an Entry object, which needs extra memory.
