ArrayList vs LinkedList efficiency - java

I have a problem that performs many insertions into the list at the beginning, and afterwards search and retrieval operations are used extensively. Which approach is good and efficient?
Approach 1: Use LinkedList as my data structure for the whole program.
Approach 2: Use ArrayList as my data structure for the whole program.
Approach 3: Use LinkedList as my data structure at the beginning for insertion and do
ArrayList al = new ArrayList(ll);
for retrieval operations.
How much does changing the data structure cost? Is it actually worth doing?

Since they both implement the same interface, you can find this out for yourself by writing your code so that the constructor can be plugged in, and then testing your code both ways. Benchmarking can be done with JMH.
You can plug in the constructor by using the Supplier interface.
Depending on the nature of your problem you may find that using a Deque is appropriate.
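A minimal sketch of the pluggable-constructor idea, assuming a workload of front insertions followed by indexed reads. The names ListWorkload and run are made up for illustration, and a real measurement should go through JMH rather than a main method:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical harness: the class name, method name and sizes are illustrative only.
class ListWorkload {

    // Builds a list by inserting n values at the front, then sums them by index.
    static long run(Supplier<List<Integer>> factory, int n) {
        List<Integer> list = factory.get();
        for (int i = 0; i < n; i++) {
            list.add(0, i);          // insertion at the beginning
        }
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);      // retrieval by index
        }
        return sum;
    }

    public static void main(String[] args) {
        // Same workload, two implementations, selected by constructor reference.
        System.out.println(run(ArrayList::new, 1_000));
        System.out.println(run(LinkedList::new, 1_000));
    }
}
```

Because the workload only sees List, swapping the implementation is a one-token change at the call site.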

Allow me to suggest a 4th approach: using ArrayDeque for the whole program. It has efficient insertion at the front and the back (possibly even faster than LinkedList) and, like ArrayList, stores its elements in an array, so searching and iterating are comparably efficient. ArrayDeque is an unfairly overlooked class in the Java Collections Framework, possibly because it was added later (Java 6, I think). It does not implement the List interface, so there is no access by index and you will have to write your program specifically for it.
Other than that, the only two valid answers to your question are
Do not worry about efficiency until you absolutely have to.
If and when you have to worry about efficiency, you will have to make your own measurements of what performs satisfactorily on your data in your environment. No one here can tell you.
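For the ArrayDeque suggestion, a small sketch of what front insertion followed by lookup might look like. The class and method names are hypothetical, and it assumes the insertions really are at the front of the collection:

```java
import java.util.ArrayDeque;

// Hypothetical demo class; not taken from the original post.
class FrontInsertDemo {

    // Inserts 0..n-1 at the front, so the deque ends up in descending order.
    static ArrayDeque<Integer> build(int n) {
        ArrayDeque<Integer> deque = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            deque.addFirst(i);       // amortized constant time, no element shifting
        }
        return deque;
    }

    public static void main(String[] args) {
        ArrayDeque<Integer> deque = build(5);
        System.out.println(deque.peekFirst());   // most recently inserted element
        System.out.println(deque.peekLast());    // first inserted element
        System.out.println(deque.contains(3));   // linear scan over the backing array
    }
}
```

Note that ArrayDeque offers no get(index), so this only fits if retrieval means searching or iterating rather than positional access.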

It depends on the frequencies of the insert and retrieve operations. Here are the complexities:
ArrayList -> Insertion in the beginning : O(n)
ArrayList -> Retrieval based on index : O(1)
LinkedList -> Insertion in the beginning : O(1)
LinkedList -> Retrieval based on index : O(i) where i is number of elements to be scanned.
So, if you have more retrievals than insertions, go for ArrayList, if not, go for LinkedList.
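Approach 3 from the question can be sketched as follows, assuming the insertions are at the front of the list. The copy constructor makes one O(n) pass over the LinkedList, so the conversion costs about as much as a single full iteration. Names here are illustrative only:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical helper illustrating "build in a LinkedList, read from an ArrayList".
class BuildThenFreeze {

    static List<Integer> buildThenCopy(int n) {
        LinkedList<Integer> staging = new LinkedList<>();
        for (int i = 0; i < n; i++) {
            staging.addFirst(i);             // O(1) per insertion at the front
        }
        return new ArrayList<>(staging);     // one O(n) copy, paid once
    }

    public static void main(String[] args) {
        List<Integer> list = buildThenCopy(4);
        System.out.println(list);            // indexed reads from here on are O(1)
        System.out.println(list.get(2));
    }
}
```

Whether this beats using one structure throughout depends on the data size and access pattern, so it is worth measuring rather than assuming.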

Related

Faster Access Version of ArrayList?

Does anyone know of something similar to ArrayList that is better geared to handling really large amounts of data as quickly as possible?
I've got a program with a really large ArrayList that's getting choked up when it tries to explore or modify the ArrayList.
Presumably when you do:
//i is an int;
arrayList.remove(i);
The code behind the scenes runs something like:
public T remove(int i){
    //Let's say ArrayList stores its data in a T [] array called "contents".
    T output = contents[i];
    T [] overwrite = new T [contents.length - 1];
    //Yes, I know generic arrays aren't created this simply. Bear with me here...
    for(int x=0;x<i;x++){
        overwrite[x] = contents[x];
    }
    for(int x=i+1;x<contents.length;x++){
        overwrite[x-1] = contents[x];
    }
    contents = overwrite;
    return output;
}
When the size of the ArrayList is a couple million units or so, all those cycles rearranging the positions of items in the array would take a lot of time.
I've tried to alleviate this problem by creating my own custom ArrayList subclass which segments its data storage into smaller ArrayLists. Any process that requires the ArrayList to scan its data for a specific item generates a new search thread for each of the smaller ArrayLists within (to take advantage of my multiple CPU cores).
But this system doesn't work, because when the thread calling the search has an item in any of the ArrayLists synchronized, it can block those separate search threads from completing their search, which in turn locks up the original thread that called the search, essentially deadlocking the whole program.
I really need some kind of data storage class oriented to containing and manipulating large amounts of objects as quickly as the PC is capable.
Any ideas?
I really need some kind of data storage class oriented to containing and manipulating large amounts of objects as quickly as the PC is capable.
The answer depends a lot on what sort of data you are talking about and the specific operations you need. You use the word "explore" without defining it.
If you are talking about looking up a record then nothing beats a HashMap – ConcurrentHashMap for threaded operation. If you are talking about keeping in order, especially when dealing with threads, then I'd recommend a ConcurrentSkipListMap which has O(logN) lookup, insert, remove, etc..
You may also want to consider using multiple collections. You need to be careful that the collections don't get out of sync, which can be especially challenging with threads, but that might be faster depending on the various operations you are making.
When the size of the ArrayList is a couple million units or so, all those cycles rearranging the positions of items in the array would take a lot of time.
As mentioned, ConcurrentSkipListMap is O(logN) for rearranging an item, i.e. removing it and adding it back at its new position.
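A rough sketch of the ConcurrentSkipListMap suggestion, assuming the items can be addressed by a sortable key (integer keys and string values here are placeholders, not something the question specifies):

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical demo; the key and value types are assumptions for illustration.
class SkipListDemo {

    static ConcurrentSkipListMap<Integer, String> sample() {
        ConcurrentSkipListMap<Integer, String> map = new ConcurrentSkipListMap<>();
        map.put(30, "c");
        map.put(10, "a");
        map.put(20, "b");
        return map;                          // kept sorted by key at all times
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Integer, String> map = sample();
        System.out.println(map.firstKey());      // smallest key
        System.out.println(map.ceilingKey(15));  // smallest key >= 15

        // "Rearranging" an item: remove it and re-insert under a new key.
        String moved = map.remove(10);
        map.put(40, moved);
        System.out.println(map);
    }
}
```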
The [ArrayList.remove(i)] code behind the scenes runs something like: ...
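Well, not really. You can look at the code in the JDK, right? ArrayList uses System.arraycopy(...) for these sorts of operations: it shifts the tail of the existing array in place instead of allocating a new array and copying element by element. That is still O(N) in the number of elements shifted, but with a much smaller constant factor, so it may or may not be efficient enough for your case.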
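To illustrate the in-place shift, here is a simplified array-backed list of ints. It is a hypothetical stand-in written for this answer, not the JDK source, but its remove follows the same System.arraycopy idea:

```java
import java.util.Arrays;

// Simplified, hypothetical array-backed list; not the actual ArrayList code.
class IntArrayList {
    private int[] contents = new int[10];
    private int size;

    void add(int value) {
        if (size == contents.length) {
            contents = Arrays.copyOf(contents, size * 2);  // grow only when full
        }
        contents[size++] = value;
    }

    int remove(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int removed = contents[index];
        int tail = size - index - 1;         // number of elements after the removed one
        if (tail > 0) {
            // Shift the tail left by one slot within the same array.
            System.arraycopy(contents, index + 1, contents, index, tail);
        }
        size--;
        return removed;
    }

    int[] toArray() {
        return Arrays.copyOf(contents, size);
    }
}
```

No new array is allocated per removal, and removing near the end moves only a few elements.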
One example of good usage for a linked list is where the list elements are very large, i.e. large enough that only one or two can fit in the CPU cache at the same time. At this point the advantage that contiguous block containers like vectors or arrays have for iteration is more or less nullified, and a performance advantage may be possible if many insertions and removals are occurring in real time.
ref: Under what circumstances are linked lists useful?
ref : https://coderanch.com/t/508171/java/Collection-datastructure-large-data
Different collection types have different time complexities for various operations. Typical complexities are O(1), O(N), and O(log(N)). To choose a collection, you first need to decide which operations you use often, and avoid collections which have O(N) complexity for those operations. Here you often use ArrayList.remove(i), which is O(N). Even worse, you use remove(i) and not remove(element). If removing an element you have already located were the only frequent operation, then LinkedList could help: removing through an iterator positioned at the element is O(1). LinkedList.remove(element) still has to search for the element first, and LinkedList.remove(i) is also O(N).
I doubt that a List with remove(i) complexity of O(1) can be implemented. The best possible time is O(log(N)), which is definitely better than O(N). Java standard library has no such implementation. You can try to google it by "binary indexed tree" keywords.
But the first thing I would do is to review the algorithm and try to get rid of List.remove(i) operation.
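One way to get rid of repeated List.remove(i) calls, assuming the removals can be expressed as a condition on the element, is a single bulk pass with removeIf. The even-number predicate and the class name below are placeholders:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example; the predicate is an assumption chosen for illustration.
class BulkRemoval {

    // Returns a copy of the input with all even values removed in one pass.
    static List<Integer> dropEvens(List<Integer> source) {
        List<Integer> copy = new ArrayList<>(source);
        copy.removeIf(v -> v % 2 == 0);      // one traversal instead of many remove(i) shifts
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(dropEvens(Arrays.asList(1, 2, 3, 4, 5)));
    }
}
```

Calling remove(i) k times on an ArrayList costs up to O(k*N) in shifting, while a single filtering pass touches each element once.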

Most efficient collection for filtering a Java Stream?

I'm storing several Things in a Collection. The individual Things are unique, but their types aren't. The order in which they are stored also doesn't matter.
I want to use Java 8's Stream API to search it for a specific type with this code:
Collection<Thing> things = ...;
// ... populate things ...
Stream<Thing> filtered = things.stream().filter(thing -> thing.type.equals(searchType));
Is there a particular Collection that would make the filter() more efficient?
I'm inclined to think no, because the filter has to iterate through the entire collection.
On the other hand, if the collection is some sort of tree that is indexed by the Thing.type then the filter() might be able to take advantage of that fact. Is there any way to achieve this?
The stream operations like filter are not that specialized to take an advantage in special cases. For example, IntStream.range(0, 1_000_000_000).filter(x -> x > 999_999_000) will actually iterate all the input numbers, it cannot just "skip" the first 999_999_000. So your question is reduced to find the collection with the most efficient iteration.
The iteration is usually performed in the Spliterator.forEachRemaining method (for a non-short-circuiting stream) and in the Spliterator.tryAdvance method (for a short-circuiting stream), so you can take a look at the corresponding spliterator implementation and check how efficient it is. In my opinion the most efficient is an array (either bare or wrapped into a list with Arrays.asList): it has minimal overhead. ArrayList is also quite fast, but for a short-circuiting operation it will check the modCount (to detect concurrent modification) on every iteration, which adds a very slight overhead. Other types like HashSet or LinkedList are comparatively slower, though in most applications this difference is practically insignificant.
Note that parallel streams should be used with care. For example, the splitting of LinkedList is quite poor and you may experience worse performance than in sequential case.
The most important thing to understand, regarding this question, is that when you pass a lambda expression to a particular library like the Stream API, all the library receives is an implementation of a functional interface, e.g. an instance of Predicate. It has no knowledge about what that implementation will do and therefore has no way to exploit scenarios like filtering sorted data via comparison. The stream library simply doesn’t know that the Predicate is doing a comparison.
An implementation doing such an optimization would need an interaction between the JVM, which knows and understands the code, and the library, which knows the semantics. Such a thing does not happen in the current implementation and is far off, at least as far as I can see.
If the source is a tree or sorted list and you want to benefit from that for filtering, you have to do it using APIs operating on the source, before creating the stream. E.g. suppose, we have a TreeSet and want to filter it to get items within a particular range, like
// our made-up source
TreeSet<Integer> tree=IntStream.range(0, 100).boxed()
.collect(Collectors.toCollection(TreeSet::new));
// the naive implementation
tree.stream().filter(i -> i>=65 && i<91).forEach(i->System.out.print((char)i.intValue()));
We can do instead:
tree.tailSet(65).headSet(91).stream().forEach(i->System.out.print((char)i.intValue()));
which will utilize the sorted/tree nature. When we have a sorted list instead, say
List<Integer> list=new ArrayList<>(tree);
utilizing the sorted nature is more complex as the collection itself doesn’t know that it’s sorted and doesn’t offer operations utilizing that directly:
int ix=Collections.binarySearch(list, 65);
if(ix<0) ix=~ix;
if(ix>0) list=list.subList(ix, list.size());
ix=Collections.binarySearch(list, 91);
if(ix<0) ix=~ix;
if(ix<list.size()) list=list.subList(0, ix);
list.stream().forEach(i->System.out.print((char)i.intValue()));
Of course, the stream operations here are only exemplary and you don’t need a stream at all, when all you do then is forEach…
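For the question's actual case (looking things up by type), the same "use the source's API before streaming" idea could be sketched with a map built once and keyed by type. The Thing class below is a hypothetical stand-in with a String type field, since the question does not show its definition:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TypeIndex {

    // Hypothetical stand-in for the question's Thing class.
    static final class Thing {
        final String type;
        final String name;
        Thing(String type, String name) { this.type = type; this.name = name; }
    }

    // One full pass to build the index; later lookups skip the iteration.
    static Map<String, List<Thing>> indexByType(List<Thing> things) {
        return things.stream().collect(Collectors.groupingBy(t -> t.type));
    }

    // Returns the things of the given type, or an empty list if there are none.
    static List<Thing> ofType(Map<String, List<Thing>> index, String type) {
        return index.getOrDefault(type, Collections.emptyList());
    }

    public static void main(String[] args) {
        List<Thing> things = Arrays.asList(
                new Thing("a", "x"), new Thing("b", "y"), new Thing("a", "z"));
        Map<String, List<Thing>> index = indexByType(things);
        ofType(index, "a").stream().forEach(t -> System.out.println(t.name));
    }
}
```

This only pays off if the collection is searched by type many times; for a single search, the plain filter does the same amount of work.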
As far as I am aware, there's no such differentiation for normal streaming.
However, you might be better off with parallel streaming if you use a collection which is easily divisible, like ArrayList, over LinkedList or any type of Set.

Is an ArrayList or a LinkedList better for sorting?

I want to use data structure that needs to be sorted every now and again. The size of the data structure will hardly exceed 1000 items.
Which one is better - ArrayList or LinkedList?
Which sorting algorithm is better to use?
Up to Java 7, it made no difference because Collections.sort would dump the content of the list into an array.
With Java 8, using an ArrayList should be slightly faster because Collections.sort will call List.sort and ArrayList has a specialised version that sorts the backing array directly, saving a copy.
So bottom line is ArrayList is better as it gives a similar or better performance depending on the version of Java.
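A short sketch of sorting an ArrayList through List.sort (available since Java 8). The class and method names are illustrative only:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class.
class SortDemo {

    // Returns a sorted copy, leaving the caller's list untouched.
    static List<Integer> sorted(List<Integer> input) {
        ArrayList<Integer> copy = new ArrayList<>(input);
        copy.sort(Comparator.naturalOrder());   // ArrayList sorts its backing array directly
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.asList(3, 1, 2)));
    }
}
```

At around 1000 elements either list type sorts quickly, so the difference is unlikely to be noticeable in practice.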
If you're going to be using java.util.Collections.sort(List) then it really doesn't matter.
The list will get dumped into an array for purposes of sorting anyway.
(Thanks for keeping me honest Ralph. Looks like I confused the implementations of sort and shuffle. They're close enough to the same thing right?)
If you can use the Apache library, then have a look at TreeList. It addresses your problem correctly.
Only 1000 items? Why do you care?
I usually always use ArrayList unless I have specific need to do otherwise.
Have a look at the source code. I think sorting is based on arrays anyway, if I remember correctly.
If you are just sorting and not dynamically updating your sorted list, then either is fine and an array will be more memory efficient. Linked lists are really better if you want to maintain a sorted list. Inserting an object is fast into the middle of a linked list, but slow into an array.
Arrays are better if you want to find an object in the middle. With a sorted array, you can do a binary search and find whether a member is in the list in O(logN) time. With a linked list, you need to walk the list element by element, which is very slow.
I guess which is better for your application depends on what you want to do with the list after it is sorted.
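A minimal sketch of the binary-search idea on an ArrayList that is kept sorted, using Collections.binarySearch both for membership tests and for finding the insertion point. The helper names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helpers; assumes the list is already sorted in natural order.
class SortedInsert {

    // Inserts value at the position that keeps the list sorted.
    static void insertSorted(List<Integer> sorted, int value) {
        int ix = Collections.binarySearch(sorted, value);
        if (ix < 0) {
            ix = ~ix;                // not found: decode -(insertionPoint) - 1
        }
        sorted.add(ix, value);       // O(log N) to find the slot, O(N) to shift the tail
    }

    // O(log N) membership test on a random-access sorted list.
    static boolean contains(List<Integer> sorted, int value) {
        return Collections.binarySearch(sorted, value) >= 0;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 3, 5));
        insertSorted(list, 4);
        System.out.println(list);
        System.out.println(contains(list, 3));
    }
}
```

The shift on insertion is still linear, but for about 1000 elements it is a single small array move.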

Vector option in Java

I am using a vector of objects. My issue is that removal from the vector is an expensive operation (O(n^2)). What would be a replacement for vector in Java? In my use case, additions and removals happen extensively.
I am a C++ person and don't know much Java.
Well, the Vector class shouldn't be used. There are many containers available in Java. A few of them:
ArrayList is good for random access, but is bad for inserting or removing from the middle of the list.
LinkedList is bad for random access, but is fairly good for iterating and for adding/removing elements in the middle of the container.
You can use ArrayList instead of vector in Java.
Check out this article:
http://www.javaworld.com/javaworld/javaqa/2001-06/03-qa-0622-vector.html
LinkedList can add/remove items at O(1)
First of all, the time complexity of removal from a Vector is O(n), not O(n^2). If you want a more performant class, you could choose LinkedList: unlinking a node is constant time once you have reached it (for example through an iterator), although finding it by index or by value is still O(n).
Maybe a list is not the ideal data structure for your use case. Would you be better off using a HashSet if the ordering of elements is not important?
Actually, the difference between Vector and ArrayList is that Vector is synchronized whereas ArrayList is not. Generally, you don't need synchronization and thus you'd use ArrayList (much like StringBuffer <-> StringBuilder).
The replacement mostly depends on how you intend to use the collection.
Adding objects to an ArrayList is quite fast, since when more space is required the backing array is grown geometrically (by roughly half its current size in OpenJDK) rather than one slot at a time, and if you know the size requirements in advance, even better.
Removing from an ArrayList is O(n) but iteration and random access are fast.
If you have frequent add or remove operations and otherwise iterate over the list, a LinkedList would be fine.
You could even consider using a LinkedHashMap which allows fast access as well as preserves the order of insertion.
I think Vector uses System.arraycopy, whose complexity is O(n^2)
It is correct that Vector will use System.arraycopy to move the elements. However, the System.arraycopy() call copies at most Vector.size() elements, and hence is O(N), where N is the vector's size.
Hence O(N^2) is incorrect for a single insertion / removal.
In fact, if you want better than O(N) insertion and deletion, you will need to use some kind of linked list type with a cursor abstraction that allows insertion and deletion at "the current position". Even then you only get better than O(N) if you can do the insertions / deletions in the right order; i.e. not random order.
FWIW, the Java List APIs don't provide such a cursor mechanism ... not least because it would be awkward to use, and only efficient in certain circumstances / implementations.
Thanks to everyone for their contributions, which helped me to solve this problem. I used a circular queue written with the help of a vector.

Best way to remove and add elements from the java List

I have 100,000 objects in a list. I want to remove a few elements from the list based on a condition. Can anyone tell me the best approach in terms of memory and performance?
Same question for adding objects also based on condition.
Thanks in Advance
Raju
Your container is not just a List. List is an interface that can be implemented by, for example ArrayList and LinkedList. The performance will depend on which of these underlying classes is actually instantiated for the object you are polymorphically referring to as List.
ArrayList can access elements in the middle of the list quickly, but if you delete one of them you need to shift a whole bunch of elements. LinkedList is the opposite in this respect, requiring iteration for access, but deletion is just a matter of reassigning pointers.
Your performance depends on the implementation of List, and the best choice of implementation depends on how you will be using the List and which operations are most frequent.
If you're going to be iterating a list and applying tests to each element, then a LinkedList will be most efficient in terms of CPU time, because you don't have to shift any elements in the list. It will, however consume more memory than an ArrayList, because each list element is actually held in an entry.
However, it might not matter. 100,000 is a small number, and if you aren't removing a lot of elements the cost to shift an ArrayList will be low. And if you are removing a lot of elements, it's probably better to restructure as a copy-with filter.
However, the only real way to know is to write the code and benchmark it.
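The copy-with-filter restructuring mentioned above can be sketched like this. The method name keep and the threshold predicate in main are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper: builds a new list instead of removing from the old one.
class CopyWithFilter {

    // Copies only the elements that satisfy the predicate, in original order.
    static <T> List<T> keep(List<T> source, Predicate<? super T> wanted) {
        List<T> result = new ArrayList<>(source.size());
        for (T item : source) {
            if (wanted.test(item)) {
                result.add(item);    // appending never shifts existing elements
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 200, 3, 400);
        System.out.println(keep(numbers, n -> n >= 100));
    }
}
```

This trades some temporary memory (a second list) for a single linear pass, which tends to matter only when many elements are removed.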
Collections2.filter (from Guava) produces a filtered collection based on a predicate.
List<Number> myNumbers = Arrays.asList(Integer.valueOf(1), Double.valueOf(1e6));
Collection<Number> bigNumbers = Collections2.filter(
myNumbers,
new Predicate<Number>() {
public boolean apply(Number n) {
return n.doubleValue() >= 100d;
}
});
Note that some operations like size() are not efficient with this scheme. If you tend to follow Josh Bloch's advice and prefer isEmpty() and iterators to unnecessary size() checks, then this shouldn't bite you in practice.
LinkedList could be a good choice.
LinkedList removes and adds elements more efficiently than ArrayList, and there is no need to call a method such as ArrayList.trimToSize() to release unused memory. But LinkedList is a doubly linked list, and each element is wrapped in an Entry, which needs extra memory.
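A small sketch of conditional removal and insertion on a LinkedList through a ListIterator, which edits at the iterator's current position without any index lookups. The conditions (drop negatives, add a 0 after values above 100) are placeholders chosen for illustration:

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical example; the conditions are assumptions, not from the question.
class CursorEdits {

    // Removes negative values and inserts a 0 right after every value above 100.
    static void edit(List<Integer> list) {
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            int v = it.next();
            if (v < 0) {
                it.remove();         // unlinks the element just returned
            } else if (v > 100) {
                it.add(0);           // inserted after v; not revisited by next()
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>(Arrays.asList(5, -1, 200, 7));
        edit(list);
        System.out.println(list);
    }
}
```

The same code also works on an ArrayList, only with element shifting on each edit, which makes it easy to benchmark both.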
