To my knowledge, there are the following implementations:
ArrayList
LinkedList
Vector
Stack
(based on http://tutorials.jenkov.com/java-collections/list.html; please correct me if I'm wrong)
ArrayList is a dynamic array implementation, so, as with an array, get is O(1). LinkedList has O(1) get from the head. Vector and Stack are also array-based (Stack extends Vector), hence O(1).
So in EVERY case, is get(0) on any built-in implementation of List O(1)? (I say built-in because you could write your own implementation specifically to make get(0) have a time complexity of O(n!).)
Is get(0) on java.util.List always O(1)?
Let us assume that there is a parameter N which stands for the length of the list (see note 1).
For the 4 implementations of List that you mentioned, get(0) is indeed an O(1) operation:
ArrayList, Vector and Stack all implement get(i) using array subscripting and that is an O(1) operation.
LinkedList.get(i) involves up to i link traversals (it starts from whichever end of the list is nearer), which is O(i). But if i is a constant, that reduces to O(1).
However, there are other "built in" implementations of List. Indeed, there are a considerable number of them if you include the various non-public implementations, such as the List classes that implement sublists, unmodifiable lists, and so on. Generalizing from those 4 to "all of them" is not sound (see note 2).
But get(0) won't be O(1) for all possible implementations of List.
Consider a simple linked list where the elements are chained in reverse order. Here get(0) needs to traverse to the end of the chain, which takes N link traversals: O(N).
Consider a list that is fully populated from the rows in a database query's result set the first time that you attempt to retrieve a list element. The first get call will be at least O(N) because you are fetching N rows. (It could be worse than O(N) if the database query is not O(N).) So the worst case complexity for any call to get is O(N) ... or worse.
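Both examples above can be sketched against AbstractList. The class names, the hopsOfLastGet and loadCount counters, and the Supplier standing in for a database query are all made up for illustration; the counters exist only so you can see what each get costs.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.function.Supplier;

// Example 1 (hypothetical): elements are chained from the last one back toward
// the first, so the only stored reference is to index size - 1.
final class ReversedChainList<E> extends AbstractList<E> {
    private static final class Node<T> {
        final T value;
        final Node<T> prev; // link toward index 0
        Node(T value, Node<T> prev) { this.value = value; this.prev = prev; }
    }

    private Node<E> last; // node holding the element at index size - 1
    private int size;
    int hopsOfLastGet;    // instrumentation for this example only

    @Override
    public boolean add(E e) {
        last = new Node<>(e, last); // O(1): the new element becomes the new tail
        size++;
        modCount++;
        return true;
    }

    @Override
    public E get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        Node<E> node = last;
        int hops = 0;
        // Walk back from the tail: size - 1 - index hops, so get(0) costs N - 1 hops.
        for (int i = size - 1; i > index; i--) {
            node = node.prev;
            hops++;
        }
        hopsOfLastGet = hops;
        return node.value;
    }

    @Override
    public int size() { return size; }
}

// Example 2 (hypothetical): contents are fetched from a loader on first access.
// The Supplier stands in for "run the query and read all N rows"; no real database API is used.
final class LazyLoadedList<E> extends AbstractList<E> {
    private final Supplier<List<E>> loader;
    private List<E> loaded; // null until the first access
    int loadCount;          // instrumentation for this example only

    LazyLoadedList(Supplier<List<E>> loader) { this.loader = loader; }

    private List<E> rows() {
        if (loaded == null) {
            loaded = loader.get(); // at least O(N): every row is fetched here
            loadCount++;
        }
        return loaded;
    }

    @Override
    public E get(int index) { return rows().get(index); } // the first call pays for the load

    @Override
    public int size() { return rows().size(); } // size() triggers the load too
}
```

In the first class appends stay O(1) because the tail is the entry point; the price is that the low indices are the expensive ones.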
Indeed, with some ingenuity, one could invent a custom list where get(0) has any Big-O complexity that you care to propose.
1 - I am being deliberately vague here. On the one hand, we need to identify a variable N denoting the "problem" size for complexity analysis to make sense. (The length of the list is the obvious choice.) On the other hand, the length of a List is a surprisingly "rubbery" concept when you consider all of the possible ways to implement the interface.
2 - I assume that you are asking this question because you want to write some library code that relies on List.get(0) being O(1). Since you can't prevent someone from using your library with a non-builtin list implementation, your "assume it is builtin" constraint in your question doesn't really help ... even if we could check all possible (past, current or future) builtin List implementations for you.
Ignoring custom implementations, and only looking at built-in implementations, as suggested at the end of the question, you still cannot say that get(0) will be O(1) regardless of list size.
As an example, calling get(0) on a sublist based on a LinkedList will be O(n):
List<Integer> list = new LinkedList<>(Arrays.asList(1,2,3,4,5,6,7,8,9));
List<Integer> subList = list.subList(4, 8);
Integer num = subList.get(0); // <===== O(n), not O(1)
In that code, subList.get(0) internally calls list.get(4), which has O(n) time complexity.
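If library code wants to know whether indexed access is likely to be cheap, the closest thing the JDK offers is the java.util.RandomAccess marker interface. A small sketch (SubListAccessDemo and hasFastIndexing are made-up names) shows that the subList from the code above does not carry the marker, while a subList of an ArrayList does. Treat the marker as a hint that implementations opt into, not as a guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

final class SubListAccessDemo {
    // RandomAccess is a marker interface: implementations opt in to signal that
    // positional access such as get(i) is fast. It is a hint, not a guarantee.
    static boolean hasFastIndexing(List<?> list) {
        return list instanceof RandomAccess;
    }

    public static void main(String[] args) {
        List<Integer> linked = new LinkedList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));
        List<Integer> array = new ArrayList<>(linked);

        // A LinkedList subList delegates get(i) to the backing LinkedList,
        // which walks nodes to reach index 4 + i.
        System.out.println(hasFastIndexing(linked.subList(4, 8))); // false
        // An ArrayList subList indexes straight into the backing array.
        System.out.println(hasFastIndexing(array.subList(4, 8)));  // true
    }
}
```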
Yes, for all implementations of List you mentioned get(0) is O(1).
Related
I have a collection of objects that are guaranteed to be distinct (in particular, indexed by a unique integer ID). I also know exactly how many of them there are (and the number won't change), and was wondering whether Array would have a notable performance advantage over HashSet for storing/retrieving said elements.
On paper, Array guarantees constant time insertion (since I know the size ahead of time) and retrieval, but the code for HashSet looks much cleaner and adds some flexibility, so I'm wondering if I'm losing anything performance-wise using it, at least, theoretically.
It depends on your data:
HashSet gives you an O(1) contains() method but doesn't preserve order.
ArrayList contains() is O(n) but you can control the order of the entries.
With an array, if you need to insert anything in between, the worst case is O(n), since you have to move the data down to make room for the insertion. For a Set, you can use a SortedSet such as TreeSet, which keeps the elements ordered with O(log n) insertion and offers more flexible operations.
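To make the shifting cost concrete, here is a sketch of inserting into a partly filled int[]. The helper ArrayInsert.insert and its doubling growth policy are my own assumptions for illustration, not a JDK API.

```java
import java.util.Arrays;

final class ArrayInsert {
    // Inserts value at index among the first `used` slots of arr and returns the
    // (possibly reallocated) array. Elements at index..used-1 each move one slot
    // to the right, so the cost is O(used - index): O(n) when inserting at index 0.
    static int[] insert(int[] arr, int used, int index, int value) {
        if (index < 0 || index > used) {
            throw new IndexOutOfBoundsException(String.valueOf(index));
        }
        // Grow (assumed policy: double the length) only when every slot is taken.
        int[] target = used < arr.length ? arr : Arrays.copyOf(arr, Math.max(1, arr.length * 2));
        // System.arraycopy handles overlapping ranges in the same array correctly.
        System.arraycopy(target, index, target, index + 1, used - index);
        target[index] = value;
        return target;
    }
}
```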
I believe Set is more flexible.
The choice greatly depends on what you want to do with it.
If it is what you mentioned in your question:
I have a collection of objects that are guaranteed to be distinct (in particular, indexed by a unique integer ID). I also know exactly how many of them there are
If this is what you need to do, then you need neither of them. There is a size() method in Collection that gives you its size, i.e. how many objects there are in the collection.
If by "collection of objects" you don't mean an actual Collection, and you need to choose a type of collection to store your objects for further processing, then you need to know that different kinds of collections have different capabilities and characteristics.
First, I believe that for a fair comparison you should consider using ArrayList instead of an array, so that you don't need to deal with reallocation yourself.
Then it becomes a choice between ArrayList and HashSet, which is quite straightforward:
Do you need a List or a Set? They serve different purposes: Lists provide indexed access, and iteration is in index order. Sets, on the other hand, are mainly for keeping a distinct set of data, and given their nature, they don't offer indexed access.
Once you have decided between a List and a Set, it is a choice of implementation: normally for Lists you choose between ArrayList and LinkedList, while for Sets you choose between HashSet and TreeSet.
All of these choices depend on what you want to do with that collection of data; the implementations perform differently for different operations.
For example, indexed access is O(1) in an ArrayList and O(n) in a HashSet (though not meaningful there); just for your interest, it is O(n) in a LinkedList and also O(n) in a TreeSet, since you have to iterate to the i-th element.
Adding a new element is an amortized O(1) operation for both ArrayList and HashSet. Inserting in the middle is O(n) for ArrayList, while it doesn't make sense for HashSet. Both occasionally have to reallocate their internal storage, which costs O(n) (HashSet is normally slower at this, because every entry has to be redistributed into the new bucket array).
To find if certain element exists in the collection, ArrayList is O(n) and HashSet is O(1).
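A small sketch of the lookup difference, with both collections holding the IDs 0..n-1 (ContainsDemo and its helper names are made up for this example):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ContainsDemo {
    static List<Integer> buildList(int n) {
        List<Integer> list = new ArrayList<>(n); // presized: no reallocation while filling
        for (int id = 0; id < n; id++) list.add(id);
        return list;
    }

    static Set<Integer> buildSet(int n) {
        Set<Integer> set = new HashSet<>();
        for (int id = 0; id < n; id++) set.add(id);
        return set;
    }

    public static void main(String[] args) {
        List<Integer> list = buildList(100_000);
        Set<Integer> set = buildSet(100_000);
        // ArrayList.contains compares elements one by one with equals(): O(n) worst case.
        System.out.println(list.contains(99_999));
        // HashSet.contains hashes the key and checks a single bucket: expected O(1).
        System.out.println(set.contains(99_999));
        // Only the List offers indexed access; Set has no get(int).
        System.out.println(list.get(42));
    }
}
```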
There are many other operations you could perform, so it is rather meaningless to discuss performance without knowing what you want to do.
Theoretically, and as the SCJP 6 study guide says :D
arrays are faster than collections, and as said, most collections are built on arrays (Maps are not considered Collections, but they are included in the Collections Framework).
If you can guarantee that the number of elements won't change, why get stuck with objects built on objects (collections built on arrays) when you can use the underlying structure directly (arrays)?
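A minimal sketch of the plain-array approach for the question's setup, assuming the unique IDs are dense in the range [0, count). Item and ItemTable are hypothetical classes; the question only says each object has a unique integer ID.

```java
// Hypothetical element type with the unique integer ID from the question.
final class Item {
    final int id;
    final String name;
    Item(int id, String name) { this.id = id; this.name = name; }
}

final class ItemTable {
    private final Item[] byId; // slot i holds the item whose id == i

    // Assumes IDs are dense in [0, count); sparse IDs would waste slots or need a map.
    ItemTable(int count) { byId = new Item[count]; }

    void put(Item item) { byId[item.id] = item; } // O(1): no hashing, no boxing

    Item get(int id) { return byId[id]; }         // O(1): a plain array subscript

    boolean contains(int id) {
        return id >= 0 && id < byId.length && byId[id] != null;
    }
}
```

If the IDs are not dense, the HashMap approach in the next answer avoids managing the index mapping yourself.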
It looks like you want a HashMap that maps IDs to counts. In particular:
HashMap<Integer, Integer> counts = new HashMap<>();
counts.merge(uniqueID, 1, Integer::sum); // inserts 1 for a new ID, otherwise adds 1
This way, you get amortized O(1) adds, contains and retrievals. Essentially, an array with unique IDs associated with each object IS a HashMap. By using a HashMap, you get the added bonus of not having to manage the size of the array, not having to map the keys to array indices yourself, AND constant access time.
In the book Effective Java by Joshua Bloch, there is a discussion on how a class can provide "judiciously chosen protected methods" as hooks into its internal workings.
The author then cites the documentation in AbstractList.removeRange():
This method is called by the clear operation on this list and its subLists. Overriding this method to take advantage of the internals of the list implementation can substantially improve the performance of the clear operation on this list and its subLists.
My question is, how can overriding this method improve performance (more than simply not overriding it)? Can anyone give an example of this?
Let's take a concrete example - suppose that your implementation is backed by a dynamic array (this is how ArrayList works, for example). Now, suppose that you want to remove elements in the range [start, end). The default implementation of removeRange works by getting an iterator to position start, then calling remove() the appropriate number of times.
Each time remove() is called, the dynamic array implementation has to shuffle all the elements at position start + 1 and beyond back one spot to fill the gap left by the removed element. This could potentially take time O(n), because potentially all of the array elements might need to get shuffled down. This means that if you're removing a total of k elements from the list, the naive approach will take time O(kn), since you're doing O(n) work k times.
Now consider a much better approach: copy the element at position end to position start, then element end + 1 to position start + 1, etc. until all elements are copied. This requires you to only do a total of O(n) work, because every element is moved at most once. Compared with the O(kn) approach given by the naive algorithm, this is a huge performance improvement. Consequently, overriding removeRange to use this more efficient algorithm can dramatically increase performance.
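Here is a sketch of that override on a minimal array-backed list. CompactingArrayList is a made-up class for illustration (OpenJDK's ArrayList already overrides removeRange in a similar single-shift way). Without the override, AbstractList's default removeRange would call remove(int) once per element, and each of those calls shifts the tail.

```java
import java.util.AbstractList;
import java.util.Arrays;

// Hypothetical minimal array-backed list; clear() on it or on any subList ends up in removeRange.
final class CompactingArrayList<E> extends AbstractList<E> {
    private Object[] data = new Object[8];
    private int size;

    @Override
    public void add(int index, E e) {
        if (index < 0 || index > size) throw new IndexOutOfBoundsException(String.valueOf(index));
        if (size == data.length) data = Arrays.copyOf(data, size * 2);
        System.arraycopy(data, index, data, index + 1, size - index); // shift tail right
        data[index] = e;
        size++;
        modCount++;
    }

    @Override
    public E remove(int index) {
        E old = get(index);
        System.arraycopy(data, index + 1, data, index, size - index - 1); // O(n) shift per call
        data[--size] = null;
        modCount++;
        return old;
    }

    @Override
    @SuppressWarnings("unchecked")
    public E get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException(String.valueOf(index));
        return (E) data[index];
    }

    @Override
    public int size() { return size; }

    // One pass: move data[toIndex..size) down to fromIndex, each element at most once,
    // so the total work is O(n) instead of O(k * n) for k separate remove() calls.
    @Override
    protected void removeRange(int fromIndex, int toIndex) {
        System.arraycopy(data, toIndex, data, fromIndex, size - toIndex);
        int newSize = size - (toIndex - fromIndex);
        Arrays.fill(data, newSize, size, null); // drop stale references so they can be collected
        size = newSize;
        modCount++;
    }
}
```

Usage: list.subList(2, 5).clear() routes through the subList to this removeRange with the offsets already applied.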
Hope this helps!
As specified in the method's javadocs:
This implementation gets a list iterator positioned before fromIndex, and repeatedly calls ListIterator.next followed by ListIterator.remove until the entire range has been removed.
Since this abstract class does not know about the internals of its subclasses, it relies on this generic algorithm which will run in time proportional to the number of items being removed.
For example, suppose you implemented a subclass that stored elements as a linked list. Then you could take advantage of this fact and override this method with a linked-list-specific algorithm: make the node before fromIndex point to the node at toIndex. Once those two boundary nodes have been located, the unlinking itself takes constant time, instead of one removal per element. You have thus improved performance because you took advantage of the internals.
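A rough sketch of that linked-list idea, using a made-up singly linked SplicingLinkedList with a sentinel head node. It still walks to locate the boundary nodes, but the whole range is dropped with a single pointer assignment. It deliberately does not support remove(int), so without this override clear() would throw UnsupportedOperationException.

```java
import java.util.AbstractList;

// Hypothetical singly linked list: removeRange unlinks the whole range by
// redirecting one pointer instead of removing the nodes one at a time.
final class SplicingLinkedList<E> extends AbstractList<E> {
    private static final class Node<T> {
        T value;
        Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private final Node<E> head = new Node<>(null, null); // sentinel before index 0
    private Node<E> tail = head;
    private int size;

    // Returns the node just before position index (the sentinel for index 0).
    private Node<E> nodeBefore(int index) {
        Node<E> n = head;
        for (int i = 0; i < index; i++) n = n.next;
        return n;
    }

    @Override
    public boolean add(E e) {
        tail.next = new Node<>(e, null);
        tail = tail.next;
        size++;
        modCount++;
        return true;
    }

    @Override
    public E get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException(String.valueOf(index));
        return nodeBefore(index).next.value;
    }

    @Override
    public int size() { return size; }

    @Override
    protected void removeRange(int fromIndex, int toIndex) {
        if (fromIndex >= toIndex) return;
        Node<E> before = nodeBefore(fromIndex); // walk to the node before the range
        Node<E> last = before;
        for (int i = fromIndex; i < toIndex; i++) last = last.next; // last node in the range
        before.next = last.next;                // one relink drops the whole range
        if (last == tail) tail = before;        // keep the tail pointer valid
        size -= toIndex - fromIndex;
        modCount++;
    }
}
```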
By overriding this method you can replace the generic algorithm with one that suits your list's indexing. It is a protected method in AbstractList, and the default implementation there works by repeated calls to remove(); for an array-backed list, each of those calls shifts every element to the right of the removed one by one index.
That is clearly inefficient, so you can make it work better (ArrayList itself overrides removeRange so that the tail is shifted only once).
I am using a Vector of objects. My issue is that removal from a Vector is an expensive operation (O(n^2)). What would be the replacement for Vector in Java? In my use case, additions and removals happen extensively.
I am a C++ person and don't know much Java.
Well, the Vector class shouldn't be used. There are many other containers available in Java. A few of them:
ArrayList is good for random access, but is bad for inserting or removing from the middle of the list.
LinkedList is bad for random access, but is fairly good for iterating and for adding/removing elements in the middle of the container.
You can use ArrayList instead of Vector in Java.
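To illustrate the LinkedList point: the cheap middle insertions and removals come from editing through a ListIterator while you iterate, since each remove() or add() on a LinkedList's iterator just relinks nodes at the cursor. The method below is a made-up toy operation, chosen only to exercise both calls.

```java
import java.util.LinkedList;
import java.util.ListIterator;

final class LinkedListEditing {
    // Toy operation: drop negative numbers and duplicate every zero, in one pass.
    // Each it.remove() / it.add() is O(1) on a LinkedList: no elements are shifted.
    static void dropNegativesAndDuplicateZeros(LinkedList<Integer> list) {
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            int v = it.next();
            if (v < 0) {
                it.remove(); // unlinks the node just returned by next()
            } else if (v == 0) {
                it.add(0);   // inserts right after the current element; next() then skips past it
            }
        }
    }
}
```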
Check out this article:
http://www.javaworld.com/javaworld/javaqa/2001-06/03-qa-0622-vector.html
LinkedList can add/remove items in O(1) at either end, or at an iterator's current position.
First of all, Vector removal time complexity is O(n), not O(n^2). If you want a more performant class, you should choose LinkedList: removing an element is constant time once you have reached it (for example, with an iterator), although reaching an element by index is still O(n).
Maybe a list is not the ideal data structure for your use case: would you be better off using a HashSet if the ordering of elements is not important?
Actually, the difference between Vector and ArrayList is that Vector is synchronized whereas ArrayList is not. Generally, you don't need synchronization and thus you'd use ArrayList (much like StringBuffer <-> StringBuilder).
The replacement mostly depends on how you intend to use the collection.
Adding objects to an ArrayList is quite fast: when more space is required, the capacity grows geometrically (by 50% in OpenJDK's ArrayList; Vector doubles by default), so appends are amortized O(1), and if you know the size requirements in advance, even better.
Removing from an ArrayList is O(n) but iteration and random access are fast.
If you have frequent add or remove operations and otherwise iterate over the list, a LinkedList would be fine.
You could even consider using a LinkedHashMap which allows fast access as well as preserves the order of insertion.
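A minimal sketch of the LinkedHashMap option, assuming each object can be keyed by some integer id (OrderedRegistry and its methods are hypothetical names, and String stands in for your objects):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OrderedRegistry {
    // LinkedHashMap keeps insertion order while put/get/remove by key stay expected O(1).
    private final Map<Integer, String> byId = new LinkedHashMap<>();

    void add(int id, String value) { byId.put(id, value); }

    String remove(int id) { return byId.remove(id); } // unlinks one entry, no shifting

    List<String> inOrder() { return new ArrayList<>(byId.values()); } // insertion order
}
```

The trade-off is that you remove by key rather than by position, and each entry costs more memory than an array slot.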
I think Vector uses System.arraycopy, whose complexity is O(n^2).
It is correct that Vector will use System.arraycopy to move the elements. However, the System.arraycopy() call copies at most Vector.size() elements, and hence is O(N) where N is the vector's size.
Hence O(N^2) is incorrect for a single insertion / removal.
In fact, if you want better than O(N) insertion and deletion, you will need to use some kind of linked list type with a cursor abstraction that allows insertion and deletion at "the current position". Even then you only get better than O(N) if you can do the insertions / deletions in the right order; i.e. not random order.
FWIW, the Java List APIs don't provide such a cursor mechanism ... not least because it would be awkward to use, and only efficient in certain circumstances / implementations.
Thanks to everyone for their contributions, which helped me to solve this problem. I used a circular queue written with the help of Vector.
I have 100,000 objects in the list. I want to remove a few elements from the list based on a condition. Can anyone tell me the best approach in terms of memory and performance?
The same question applies to adding objects based on a condition.
Thanks in advance,
Raju
Your container is not just a List. List is an interface that can be implemented by, for example ArrayList and LinkedList. The performance will depend on which of these underlying classes is actually instantiated for the object you are polymorphically referring to as List.
ArrayList can access elements in the middle of the list quickly, but if you delete one of them you need to shift a whole bunch of elements. LinkedList is the opposite in this respect, requiring iteration for the access, but deletion is just a matter of reassigning pointers.
Your performance depends on the implementation of List, and the best choice of implementation depends on how you will be using the List and which operations are most frequent.
If you're going to be iterating a list and applying tests to each element, then a LinkedList will be most efficient in terms of CPU time, because you don't have to shift any elements in the list. It will, however, consume more memory than an ArrayList, because each list element is held in a separate node object.
However, it might not matter. 100,000 is a small number, and if you aren't removing many elements, the cost of shifting an ArrayList will be low. And if you are removing a lot of elements, it's probably better to restructure the code as a copy-with-filter.
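A sketch of both options, assuming Java 8+ (FilterDemo, keepIf and dropIf are illustrative names). keepIf is the copy-with-filter approach; dropIf relies on Collection.removeIf, which OpenJDK's ArrayList implements as a single compaction pass rather than one remove() per match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class FilterDemo {
    // Copy-with-filter: one O(n) pass builds a new list of the survivors,
    // instead of an O(n) shift for each individual ArrayList.remove.
    static <E> List<E> keepIf(List<E> source, Predicate<? super E> keep) {
        return source.stream().filter(keep).collect(Collectors.toCollection(ArrayList::new));
    }

    // In-place alternative: removeIf lets the list implementation pick its own
    // strategy (ArrayList compacts its array once).
    static <E> void dropIf(List<E> list, Predicate<? super E> drop) {
        list.removeIf(drop);
    }
}
```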
However, the only real way to know is to write the code and benchmark it.
Collections2.filter (from Guava) produces a filtered collection based on a predicate.
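import com.google.common.base.Predicate;
import com.google.common.collect.Collections2;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

List<Number> myNumbers = Arrays.asList(Integer.valueOf(1), Double.valueOf(1e6));
// Live view of myNumbers: the predicate is applied lazily as bigNumbers is iterated.
Collection<Number> bigNumbers = Collections2.filter(
    myNumbers,
    new Predicate<Number>() {
      @Override
      public boolean apply(Number n) {
        return n.doubleValue() >= 100d;
      }
    });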
Note that some operations like size() are not efficient with this scheme. If you tend to follow Josh Bloch's advice and prefer isEmpty() and iterators to unnecessary size() checks, then this shouldn't bite you in practice.
LinkedList could be a good choice.
LinkedList does "remove and add elements" more efficiently than ArrayList, and there is no need to call a method such as ArrayList.trimToSize() to release unused memory. But LinkedList is a doubly linked list: each element is wrapped in an Entry (a node), which needs extra memory.
Does anyone know a nice solution for EnumSet + List?
I mean, I need to store enum values, preserve their order, and be able to get the index of an enum value in the collection in O(1) time.
The closest thing I can come to think of, present in the API is the LinkedHashSet:
From http://java.sun.com/j2se/1.4.2/docs/api/java/util/LinkedHashSet.html:
Hash table and linked list implementation of the Set interface, with predictable iteration order.
I doubt it's possible to do what you want. Basically, you want to look up indexes in constant time, even after modifying the order of the list. Unless you allow remove / reorder operations to take O(n) time, I believe you can't get away with lower than O(log n) (which can be achieved with a balanced tree that tracks subtree sizes).
The only way I can see to satisfy ordering and O(1) access is to duplicate the data in a List and an array of indexes (wrapped in a nice little OrderedEnumSet, of course).
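A minimal sketch of that idea, assuming elements are only appended and that remove may cost O(n). OrderedEnumSet is the hypothetical wrapper named above (not a JDK class), and Color is a made-up enum for the usage example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical "OrderedEnumSet": a List holds the order and an int[] indexed by
// ordinal() holds each constant's position, so indexOf is O(1).
final class OrderedEnumSet<E extends Enum<E>> {
    private final List<E> order = new ArrayList<>();
    private final int[] positions; // positions[e.ordinal()] == index in order, or -1

    OrderedEnumSet(Class<E> type) {
        positions = new int[type.getEnumConstants().length];
        Arrays.fill(positions, -1);
    }

    boolean add(E e) {
        if (positions[e.ordinal()] >= 0) return false; // set semantics: no duplicates
        positions[e.ordinal()] = order.size();
        order.add(e);
        return true;
    }

    int indexOf(E e) { return positions[e.ordinal()]; } // O(1)

    E get(int index) { return order.get(index); }       // O(1)

    boolean remove(E e) {
        int i = positions[e.ordinal()];
        if (i < 0) return false;
        order.remove(i);
        positions[e.ordinal()] = -1;
        for (int j = i; j < order.size(); j++) {       // O(n): later elements moved left by one
            positions[order.get(j).ordinal()] = j;
        }
        return true;
    }

    List<E> asList() { return Collections.unmodifiableList(order); }
}

// Made-up enum for the usage example.
enum Color { RED, GREEN, BLUE }
```

Usage: after adding BLUE, RED, GREEN, indexOf(RED) is 1; removing BLUE renumbers the later entries so indexOf(RED) becomes 0.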