ArrayList vs array for finding an element? - java

Which is the most efficient way of finding an element in terms of performance? Say I have hundreds of strings and I need to find whether a specified string is among them. ArrayList has a contains() method, but with an array I would need to iterate through it myself. Can anyone explain which is the better approach performance-wise?

Say I have hundreds of strings. I need to find whether a specified string is available in those bulk strings.
That sounds like you want a HashSet<String> - not a list or an array. At least, that's the case if the hundreds of strings are the same every time you want to search. If you're searching within a different set of strings every time, you're not going to do better than O(N) if you receive the set in an arbitrary order.
In general, checking for containment in a list/array is an O(N) operation, whereas in a hash-based data structure it's O(1). Of course there's also the cost of performing the hashing and equality checking, but that's a different matter.
Another option would be a sorted list, which would be O(log N).
If you care about the ordering, you might want to consider a LinkedHashSet<String>, which maintains insertion order but still has O(1) access. (It's basically a linked list combined with a hash set.)
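If the set of strings is built once and searched many times, the HashSet approach looks like this (a minimal sketch; the string values are made up for illustration):

```java
import java.util.HashSet;
import java.util.Set;

public class ContainsDemo {
    public static void main(String[] args) {
        // Build the set once; each contains() call is then O(1) on average.
        Set<String> names = new HashSet<>();
        names.add("alice");
        names.add("bob");
        names.add("carol");

        System.out.println(names.contains("bob"));   // true
        System.out.println(names.contains("dave"));  // false
    }
}
```

Building the set is O(N) once; each subsequent contains() is then O(1) on average, versus an O(N) scan per lookup with a list or array.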

An ArrayList uses an array as its backing data, so the performance will be the same for both.

Look at the implementation of ArrayList#contains, which calls indexOf():
public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i] == null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}
You would do exactly the same thing if you implemented contains() yourself for an array.

You don't have to worry about performance issues here; it will not matter much. It's good and easy to use the contains() method of ArrayList.

Related

How to increase the performance with Java ArrayList

I'm using a huge ArrayList with the code below:
public final List<MyClass> list = new ArrayList<>();

public void update(MyClass myClass) {
    int i;
    for (i = 0; i < list.size(); i++) {
        if (myClass.foo(list.get(i))) {
            list.set(i, myClass);
            break;
        }
    }
    if (i == list.size()) {
        list.add(myClass);
    }
}
The list is extremely large. Is there something else I can do to increase the performance in this scenario? Maybe using some Java 8 feature, replacing ArrayList, or something like that.
Another piece of code related to this list that is taking too long to run is below:
public List<MyClass> something(Integer amount) {
    list.sort((m1, m2) -> Double.compare(m2.getBar(), m1.getBar()));
    return list.stream()
            .limit(amount)
            .collect(Collectors.toList());
}
Any help is welcome. Thank you all.
It seems like ArrayList is not a good choice here.
In the first case, you attempt to find an object by its properties in the list. To find it, you have to check each element of the list, so the bigger the list, the longer the search takes. (You have a worst-case complexity of O(N) with ArrayList.)
If you use a HashMap instead of a List, you can use that property as the key of your map. That way, you can select the object you need to update directly, without checking each element of your list, and the execution time no longer depends on the number of entries. (You have an expected complexity of O(1) with HashMap.)
If you use a HashMap instead of an ArrayList, your update code will look like this:
public void update(MyClass myClass) {
    map.put(myClass.getKey(), myClass);
}
(where getKey() returns the property you compare in your foo method).
But this is only for the first case. With the information we have, it seems the best solution.
Is there something else I can do to increase the performance in this scenario?
The problem is that your algorithm has to apply myClass.foo to every element of the list until you find the first match. If you do this serially, then the worst-case complexity is O(N) where N is the list size. (And the list size is large.)
Now, you could do the searching in parallel. However, if there can be multiple matches, then matching the first one in the list is going to be tricky. And you still end up with O(N/C) where C is the number of cores available.
The only way to get better than O(N) is to use a different data structure. But without knowing what the MyClass::foo method does, it is hard to say what that data structure should be.
Your second problem seems to be trying to solve the "top K of N" problem. This can be implemented in O(N log K) and possibly better; see Optimal algorithm for returning top k values from an array of length N.
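One common way to get the top K without sorting the whole list is a bounded min-heap. This is a sketch using plain doubles in place of MyClass.getBar(), since MyClass itself isn't shown in the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class TopK {
    // Returns the k largest values in O(N log K) time instead of
    // sorting the entire list, which costs O(N log N).
    static List<Double> topK(List<Double> values, int k) {
        PriorityQueue<Double> heap = new PriorityQueue<>(); // min-heap
        for (double v : values) {
            heap.offer(v);
            if (heap.size() > k) {
                heap.poll(); // drop the smallest of the k+1 candidates
            }
        }
        List<Double> result = new ArrayList<>(heap);
        result.sort((a, b) -> Double.compare(b, a)); // largest first
        return result;
    }

    public static void main(String[] args) {
        List<Double> data = List.of(3.0, 9.5, 1.2, 7.7, 4.4);
        System.out.println(topK(data, 2)); // [9.5, 7.7]
    }
}
```

The final sort only touches K elements, so it doesn't change the O(N log K) bound.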

What is the best way to iterate over a list

I have worked quite a lot with collections, but I have a few doubts.
I am aware that we can iterate over a list with an iterator.
Another way is to go through it as below:
for (int i = 0; i < list.size(); i++) {
    list.get(i);
}
Here I think there is a problem: each iteration calls list.size(), which I worry will traverse the whole structure and hurt performance.
I thought of another solution as well:
int s = list.size();
for (int i = 0; i < s; i++) {
    list.get(i);
}
I think this can solve the problem. I am not much exposed to threads, so I am wondering whether this is the right approach or not.
Another way I thought of is:
for (Object obj : list) {
}
With this enhanced for loop, I think the compiler again checks the size of the list.
Please suggest the best of these, or an alternative performance-efficient approach. Thank you for your help.
Calling size() at each iteration is not really a problem. This operation is O(1) for all the collections I know of: size() simply returns the value of a field of the list, holding its size.
The main problem of the first way is the repeated call to get(i). This operation is O(1) for an ArrayList, but is O(n) for a LinkedList, making the whole iteration O(n²) instead of O(n): get(i) forces the list to start from the first element (or the last one) and follow next links until the i-th element.
Using an iterator, or using a foreach loop (which internally uses an iterator), guarantees that the most appropriate way of iterating is used, because the iterator knows about how the list is implemented and how best go from one element to the next.
BTW, this is also the only way to iterate through non-indexed collections, like Sets. So you'd better get used to using that kind of loop.
For your example, the best way is:
for (Object obj: list){
}
It is the same as in Java versions < 1.5:
for (Iterator it = hs.iterator(); it.hasNext(); ) { }
It uses the collection's iterator, so you don't actually need the size of the collection. The .size() method does not build any tree, but .get() can loop all the way to the given element. Both .get() and .size() depend on the List implementation: in ArrayList, .get() is actually O(1), not O(n).
UPDATE
In Java 8 you can use:
myList.forEach(elem -> {
    // do something
});
The best way to iterate the list in terms of performance would be to use iterators (your second approach, using for-each).
If you are using list.get(i), its performance depends upon the implementation of the list. For ArrayList, list.get(i) is O(1), whereas it is O(n) for LinkedList.
Also, list.size() is O(1) and should not have any impact over the performance.
for (Object obj: list){
}
The above code, for me, is the best way: it is clean and can be read easily.
The forEach in Java 8 is nice too.
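To make the contrast concrete, here is a small sketch of the two loop styles on a LinkedList; both produce the same result, but the indexed loop does quadratic work on a linked list:

```java
import java.util.LinkedList;
import java.util.List;

public class IterationDemo {
    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < 5; i++) {
            list.add(i);
        }

        // For-each uses the list's iterator: O(n) total even on a LinkedList.
        int sum = 0;
        for (int value : list) {
            sum += value;
        }
        System.out.println(sum); // 10

        // Indexed access on a LinkedList walks from the head on every call,
        // making this loop O(n^2) overall -- same result, far more work.
        int slowSum = 0;
        for (int i = 0; i < list.size(); i++) {
            slowSum += list.get(i);
        }
        System.out.println(slowSum); // 10
    }
}
```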

Java: What collection type should I use for this case?

What I need:
Fastest put/remove, this is used a lot.
Iteration, also used frequently.
Holds an object, e.g. Player. remove should be O(1), so maybe a hash map?
No duplicate keys.
Direct get() is never used; mainly iterating to retrieve data.
I don't worry about memory; I just want the fastest speed possible, even if it comes at the cost of memory.
For iteration, nothing is faster than a plain old array. Entries (or, for object arrays, their references) are stored sequentially in memory, so the JVM can get to the next entry simply by adding the length of one entry to its address.
Arrays are typically a bit of a hassle to deal with compared to maps or lists (e.g.: no dictionary-style lookups, fixed length). However, in your case I think it makes sense to go with a one- or two-dimensional array, since the length of the array will not change and dictionary-style lookups are not needed.
So if I understand you correctly you want to have a two-dimensional grid that holds information of which, if any, player is in specific tiles? To me it doesn't sound like you should be removing, or adding things to the grid. I would simply use a two-dimensional array that holds type Player or something similar. Then if no player is in a tile you can set that position to null, or some static value like Player.none() or Tile.empty() or however you'd want to implement it. Either way, a simple two-dimensional array should work fine. :)
The best Collection for your case is a LinkedList. Linked lists allow for fast iteration, and fast removal and addition at any place in the list. For example, if you use an ArrayList and you want to insert something at index i, then you have to move all the elements from i to the end one entry to the right. The same happens when you remove. In a linked list you can add and remove in constant time (once you are at the position in question).
Since you need two dimensions, you can use linked lists inside of linked lists:
List<List<Tile>> players = new LinkedList<>();
for (int i = 0; i < 20; ++i) {
    List<Tile> tiles = new LinkedList<>();
    for (int j = 0; j < 20; ++j) {
        tiles.add(new Tile());
    }
    players.add(tiles);
}
Use a map of sets to guarantee O(1) vertex lookups and amortized O(1) edge insertions and deletions:
HashMap<VertexT, HashSet<EdgeT>> incidenceMap;
There is no simple one-size-fits-all solution to this.
For example, if you only want to append, iterate and use Iterator.remove(), there are two obvious options: ArrayList and LinkedList
ArrayList uses less memory, but Iterator.remove() is O(N)
LinkedList uses more memory, but Iterator.remove() is O(1)
If you also want to do fast lookup (e.g. Collection.contains tests) or removal using Collection.remove, then HashSet is going to be better ... if the collections are likely to be large. A HashSet won't allow you to put an object into the collection multiple times, but that could be an advantage. It also uses more memory than either ArrayList or LinkedList.
If you were more specific on the properties required, and what you are optimizing for (speed, memory use, both?) then we could give you better advice.
The requirement of not allowing duplicates is effectively adding a requirement for efficient get().
Your options are either hash-based or O(log N). Most likely, hashing will be faster, unless for whatever reason calling hashCode() + equals() once is much slower than calling compareTo() log(N) times. This could be the case, for instance, if you're dealing with very long strings. log(N) is not very much, by the way: log(1,000,000,000) ≈ 30.
If you want to use a hash-based data structure, then HashSet is your friend. Make sure that Player has a good, fast implementation of hashCode(). If you know the number of entries ahead of time, specify the HashSet capacity: ceil(N / load_factor) + 1. (The default load factor is 0.75.)
If you want to use a sort-based structure, implement an efficient Player.compareTo(). Your choices are TreeSet or a skip list. They're pretty comparable in terms of characteristics. TreeSet is nice in that it's available out of the box in the JDK, whereas only a concurrent skip list (ConcurrentSkipListSet) is available. Both need to be rebalanced as you add data, which may take time, and I don't know how to predict which will be better.
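A short sketch of the presizing advice, using a hypothetical Player record as a stand-in (records get value-based equals()/hashCode() for free):

```java
import java.util.HashSet;
import java.util.Set;

public class PlayerSetDemo {
    // Hypothetical Player type with value-based equals()/hashCode(),
    // which a HashSet needs to behave well.
    record Player(String name) { }

    public static void main(String[] args) {
        int expected = 1000;
        // ceil(N / loadFactor) + 1 with the default load factor of 0.75
        // avoids rehashing while the expected entries are added.
        Set<Player> players = new HashSet<>((int) Math.ceil(expected / 0.75) + 1);

        players.add(new Player("alice"));
        players.add(new Player("alice")); // duplicate: ignored
        players.add(new Player("bob"));

        System.out.println(players.size()); // 2
    }
}
```

The no-duplicates behaviour comes for free here, which matches the "no duplicate keys" requirement above.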

Is adding to a set O(n)?

Since sets can only have unique values, does this mean that every time you add an element to a set it has to check whether it is equal to every element already there, and is hence O(n)?
Since this would make them much slower than ArrayLists if that is the case, is the only time you should actually use them when making sure your elements are all unique, or is there some other advantage to them?
This depends on the implementation of a set.
C++
An std::set in C++ is typically implemented as a red-black tree and guarantees an insert complexity of O(log(n)) (source).
std::set is an associative container that contains a sorted set of unique objects of type Key. Sorting is done using the key comparison function Compare. Search, removal, and insertion operations have logarithmic complexity.
C++11's std::unordered_set has an insert complexity of O(1) (source).
Unordered set is an associative container that contains set of unique objects of type Key. Search, insertion, and removal have average constant-time complexity.
JAVA
In Java, adding an element to a HashSet is O(1). From the documentation:
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets
Inserting an element into a TreeSet is O(log(n)).
This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).
All classes implementing Set can be found in the documentation.
Conclusion
Adding to a set is, in most cases, not slower than adding to an ArrayList or std::vector. However, a set does not necessarily keep items in the order in which they are inserted. Also, accessing some Nth element of a set has a worse complexity than the same operation on an ArrayList or std::vector. Each has their advantages and disadvantages and should be used accordingly.
You tagged this Java as well as C++, so I'll answer for both:
In C++ std::set is an ordered container, likely implemented as a tree. Regardless of implementation adding to a set and checking whether an element in a set are guaranteed to be O(log n). For std::unordered_set, which is new in C++11, those operations are O(1) (given a proper hashing function).
In Java java.util.Set is an interface that can have many different classes who implement them. The complexities of the operations are up to those classes. The most commonly used sets are TreeSet and HashSet. The operations on the former are O(log n) and for the latter they're O(1) (again, giving a proper hashing function).
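A small illustration of the two Java implementations discussed above; TreeSet iterates in sorted order, while HashSet makes no ordering promise:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class SetAddDemo {
    public static void main(String[] args) {
        // HashSet: O(1) average add/contains, no ordering guarantee.
        Set<Integer> hash = new HashSet<>();
        // TreeSet: O(log n) add/contains, iterates in sorted order.
        Set<Integer> tree = new TreeSet<>();

        for (int v : new int[] {5, 1, 4, 1, 3}) { // the duplicate 1 is dropped
            hash.add(v);
            tree.add(v);
        }

        System.out.println(hash.size()); // 4
        System.out.println(tree);        // [1, 3, 4, 5]
    }
}
```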
A C++ std::set is normally implemented as a red-black tree. This means that adding to it will be O(log n).
A C++ std::unordered_set is implemented as a hash table, so insertion is O(1) on average.
You forget that a set need not be a flat list of elements; it can be arranged (and indeed is) in such a way that searches are much faster than O(N).
http://en.wikipedia.org/wiki/Set_(abstract_data_type)#Implementations
It depends. Different languages provide different implementations. Even Java has two different sets: a TreeSet and a HashSet. Adding to a TreeSet is O(log n), since the elements are kept in order.
In C++, sets are typically implemented as binary search trees, so inserting requires O(log N) time. If you only need unique keys with constant-time insertion, you can try std::unordered_set instead.
Time complexity: Wrapper Classes vs Primitives.
When the value changes, especially multiple times, primitives give better time.
Example:
int counter = 0;
while (x > y) {
    counter++;
}
is much faster than:
Integer counter = 0;
while (x > y) {
    counter++;
}
When the value remains the same, the picture is more subtle. In Java, both primitives and object references are passed by value, so the call itself costs about the same either way; however, arithmetic on wrapper parameters forces unboxing inside the method, so the primitive version is generally at least as fast.
Example:
public int sum(int one, int two, int three) {
    int sum = one + two + three;
    return sum;
}
is no slower than
public int sum(Integer one, Integer two, Integer three) {
    int sum = one + two + three; // each operand is unboxed here
    return sum;
}
Moreover, if you pass primitive values to the wrapper-typed method, they are autoboxed at the call site, which adds further cost:
int a = 1; int b = 2; int c = 3;
sum(a, b, c); // a, b and c are each boxed to Integer before the call
The cumulative effect of boxing and unboxing in hot code can noticeably affect the performance of a program, so prefer primitives unless you specifically need objects (e.g. for nullability or generics).
As others have stated, sets are usually either trees, O(log n), or hash tables, O(1). However, there is one thing you can be sure about: no sane set implementation would have O(n) add behaviour.

Which one is better from a performance point of view: ArrayList or array?

I want to store 50,000 or more strings, and I need to perform several operations like retrieval of a specific string, deletion of a specific string, etc. I have been given only two options to choose from: an array list and an array. From a performance point of view, which one is better?
Neither. If you want retrieval of specific strings (e.g. get the string "Foo") and deleting specific strings (e.g. delete "Foo"), I would consider using a Set.
An array list or an array will give you O(N) retrieval (unless you keep it sorted). A Set will typically give you at least O(lg N) time for finding a specific item.
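A quick sketch of the Set suggestion (the sample strings are made up):

```java
import java.util.HashSet;
import java.util.Set;

public class StringStore {
    public static void main(String[] args) {
        Set<String> store = new HashSet<>();
        store.add("Foo");
        store.add("Bar");

        // Both lookup and removal are O(1) on average with a HashSet,
        // versus an O(N) scan for an unsorted array or ArrayList.
        System.out.println(store.contains("Foo")); // true
        store.remove("Foo");
        System.out.println(store.contains("Foo")); // false
    }
}
```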
ArrayList is backed by an array so performance wise you should see no difference.
If there is no error in your requirements, and you really have to choose between only an ArrayList and a raw array, I would suggest an ArrayList, since you get all the APIs to manipulate the data, which you would otherwise have to write yourself for a raw array of Strings.
An array is slightly more efficient performance-wise than an ArrayList, but unless you know how many elements you will be placing into the array, an ArrayList is the better option, since an ArrayList can grow as needed whereas a static array cannot.
An array will always have better performance than an ArrayList. In part, because when using an array you don't have to pay the extra cost of type-casting its elements (using generics doesn't mean that typecasts disappear, only that they're hidden from plain view).
To make my point: Trove and fastutil are a couple of very fast Java collections libraries, which rely on the fact of providing type-specific collections and not Object-based implementations like ArrayList does.
Also, there's a (small) cost for calling the get() method to access elements, and a cost for resizing operations, which can matter in huge ArrayLists with many insertions and deletions. Of course, this doesn't happen with arrays, because by their very nature they have a fixed size; that's both an advantage and a disadvantage.
Answering your question: if you know in advance the number of elements that you're going to need, and those elements aren't going to change much (insertions, deletion) then your best bet is to use an array. If some modification operations are needed and performance is of paramount importance, try using either Trove or fastutil.
Retrieval of a specific string, deletion of a specific string... I think ArrayList is not the best solution. Take a look at HashSet or LinkedHashSet.
If you look at the source code of ArrayList you will see:
/**
 * The array buffer into which the elements of the ArrayList are stored.
 * The capacity of the ArrayList is the length of this array buffer.
 */
private transient Object[] elementData;
It is using an array internally.
So ArrayList could never be faster than using an array.
Provided you initially size the ArrayList correctly, the main difference will come from additions, which do a range check that you could avoid with an array. But we are talking about a few CPU cycles here.
Apart from that there should be no noticeable difference. For example, the indexOf method in ArrayList looks like this:
public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i] == null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}
