I wish to:
1) Avoid duplicate items being inserted.
2) Get items back in insertion order when I iterate through the collection.
What should I consider when choosing between ArrayList (with an explicit contains check before each insertion) and LinkedHashSet?
Thanks.
Definitely use LinkedHashSet. It is made for exactly what you need. Searching the entire ArrayList every time you insert something will be a performance killer (O(n) for every insertion).
Use LinkedHashSet if you don't want duplicate items inserted.
A LinkedHashSet seems to fit the bill perfectly.
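The two options from the question can be compared in a minimal sketch. The class and method names here are made up for illustration; only ArrayList, LinkedHashSet and their standard methods come from the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UniqueInsertionOrderDemo {

    // ArrayList approach: contains() scans the list, so every insertion is O(n).
    static List<String> addAllUnique(List<String> input) {
        List<String> result = new ArrayList<>();
        for (String s : input) {
            if (!result.contains(s)) {
                result.add(s);
            }
        }
        return result;
    }

    // LinkedHashSet approach: duplicates are ignored on insertion,
    // and iteration follows insertion order.
    static Set<String> addAllToLinkedHashSet(List<String> input) {
        return new LinkedHashSet<>(input);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b", "a", "b", "c", "a");
        System.out.println(addAllUnique(input));          // [b, a, c]
        System.out.println(addAllToLinkedHashSet(input)); // [b, a, c]
    }
}
```

Both give the same result; the difference is the cost per insertion as the collection grows.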
When you build your own objects and plan to use them in a Collection such as LinkedHashSet, don't forget to override both equals and hashCode for the item type you are going to store in it.
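As a rough illustration of why this matters, here is a sketch with a hypothetical Point class (not from the question). Without the two overrides, the second new Point(1, 2) would be treated as a different element.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

class EqualsHashCodeDemo {

    // Hypothetical item type, used only for illustration.
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            // Must agree with equals: equal points produce equal hash codes.
            return Objects.hash(x, y);
        }
    }

    static int countDistinct() {
        Set<Point> points = new LinkedHashSet<>();
        points.add(new Point(1, 2));
        points.add(new Point(1, 2)); // equal to the first, so it is not added
        points.add(new Point(3, 4));
        return points.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct()); // 2
    }
}
```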
Please check this out:
http://wiki3.cosc.canterbury.ac.nz/images/e/e9/JavaCollections.png
LinkedHashSet is what you need, because it's an implementation of the Set interface. Set has one very cool habit: it doesn't allow duplicates by default. So we are done with your point 1.
What about 2?
We know that we need one of the Set implementations, but which one?
HashMap - you are able to store K,V pairs, but it is a Map rather than a Set, and there is no guaranteed order (the same is true of HashSet).
TreeSet - this is the slowest of these options (O(log n) per insertion), because it uses compareTo to keep every item sorted. This is also why you can pass a Comparator to it when constructing a TreeSet.
LinkedHashSet - gives back the elements in the order in which they were inserted. It is the ordered version of HashSet.
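The difference in iteration order can be seen in a small sketch. The helper name orderOf is made up for this example; the HashSet order is deliberately not shown because it is unspecified.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetOrderDemo {

    // Adds the same elements (with one duplicate) and returns the iteration order.
    static List<String> orderOf(Set<String> set) {
        Collections.addAll(set, "pear", "apple", "fig", "apple");
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(orderOf(new LinkedHashSet<>())); // [pear, apple, fig]
        System.out.println(orderOf(new TreeSet<>()));       // [apple, fig, pear]
        System.out.println(orderOf(new HashSet<>()));       // three elements, order unspecified
    }
}
```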
Please find a cool description here:
http://java67.blogspot.co.uk/2014/01/when-to-use-linkedhashset-vs-treeset-vs-hashset-java.html?_sm_au_=iVVMtMLHSDQ5P0P7
Please explain how the different collections are used in different scenarios.
By this I mean: how can I decide when to use a List, a Set or a Map?
Please provide some links to examples that give a clear explanation.
Also:
If insertion order is preserved, then we should go for List.
If insertion order is not preserved, then we should go for Set.
What does "insertion order is preserved" mean?
Insertion order
Insertion order is preserved when elements come back in the order in which you inserted them.
For example, suppose you inserted the data {1,2,3,4,5}.
A Set (e.g. a HashSet) may return something like {2,3,1,4,5},
while a List returns {1,2,3,4,5}. // It preserves the order of insertion
When to use List, Set and Map in Java
1) If you need to access elements frequently by index, then List is the way to go. Its implementations, e.g. ArrayList, provide fast access if you know the index.
2) If you want to store elements and keep them in the order in which they were inserted into the collection, then go for List again, as List is an ordered collection and maintains insertion order.
3) If you want a collection of unique elements with no duplicates, then choose any Set implementation, e.g. HashSet, LinkedHashSet or TreeSet. All Set implementations follow the general contract, e.g. uniqueness, but also add features: TreeSet is a SortedSet, and elements stored in a TreeSet are sorted using a Comparator or Comparable. LinkedHashSet also maintains insertion order.
4) If you store data in the form of keys and values, then Map is the way to go. You can choose from Hashtable, HashMap and TreeMap based on your subsequent needs.
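The four points above can be illustrated with a short sketch. The word-counting scenario and the method names are assumptions chosen only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ChoosingCollectionsDemo {

    // Map: associates each word (key) with how often it occurs (value).
    static Map<String, Integer> countOccurrences(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    // Set: keeps each word once; LinkedHashSet also keeps insertion order.
    static Set<String> distinct(List<String> words) {
        return new LinkedHashSet<>(words);
    }

    public static void main(String[] args) {
        // List: indexed access, insertion order, duplicates allowed.
        List<String> words = new ArrayList<>(Arrays.asList("x", "y", "x"));
        System.out.println(words.get(1));                    // y
        System.out.println(words);                           // [x, y, x]
        System.out.println(distinct(words));                 // [x, y]
        System.out.println(countOccurrences(words).get("x")); // 2
    }
}
```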
You will find some more useful info at http://java67.blogspot.com/2013/01/difference-between-set-list-and-map-in-java.html
I am looking for a Java collection that works best with random insertions. I will be inserting a lot and will only read the collection once at the end.
My desired functionality is adding an element at a specified index, anywhere between <0, current_length>. Which collection would be the most efficient to use?
Useful link for your reference:
http://www.coderfriendly.com/wp-content/uploads/2009/05/java_collections_v2.pdf
Not entirely sure how you will be reading the information after input (and how important that is to you). HashMap or ArrayList would make sense depending on what you are looking to do. I'm also not sure whether you need something thread-safe.
Hope it helps.
The inefficiency of using a List is inherent in the problem. Every time you add something, every subsequent element has to be re-indexed, as the javadoc states:
Shifts the element currently at that position (if any) and any
subsequent elements to the right (adds one to their indices).
From your question/comments, it would appear that you have a bunch of Objects, and you're sorting them as you go. I'd suggest a more efficient solution to this problem would be to write a Comparator (or make your object implement Comparable), and then use Collections.sort(list, comparator) (or Collections.sort(list)).
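A minimal sketch of that suggestion: append everything first, then sort once at the end. Sorting strings by length is an assumption made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortOnceDemo {

    // Appending is cheap; a single sort at the end replaces many indexed inserts.
    static List<String> sortedByLength(List<String> items) {
        List<String> copy = new ArrayList<>(items);
        Collections.sort(copy, Comparator.comparingInt(String::length));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortedByLength(Arrays.asList("ccc", "a", "bb"))); // [a, bb, ccc]
    }
}
```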
You might object that your Objects are being sorted on the basis of other variables. In that case, you could create a wrapper for the Object, with those other variables as fields, implementing Comparable, and with a method like getOriginal(). You add these wrapped objects to your list, sort, and then iterate through the list, adding the original objects (from getOriginal()) to a new list.
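One way the wrapper idea might look is sketched below. The Keyed class, the int key and the sortByKeys helper are hypothetical; only getOriginal() is named in the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class WrapperSortDemo {

    // Hypothetical wrapper: pairs an object with an external sort key.
    static final class Keyed<T> implements Comparable<Keyed<T>> {
        private final T original;
        private final int key;

        Keyed(T original, int key) {
            this.original = original;
            this.key = key;
        }

        T getOriginal() {
            return original;
        }

        @Override
        public int compareTo(Keyed<T> other) {
            return Integer.compare(key, other.key);
        }
    }

    // Wraps each item with its key, sorts the wrappers, then unwraps.
    static <T> List<T> sortByKeys(List<T> items, int[] keys) {
        List<Keyed<T>> wrapped = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            wrapped.add(new Keyed<>(items.get(i), keys[i]));
        }
        Collections.sort(wrapped);
        List<T> result = new ArrayList<>();
        for (Keyed<T> k : wrapped) {
            result.add(k.getOriginal());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("x", "y", "z");
        System.out.println(sortByKeys(items, new int[] {3, 1, 2})); // [y, z, x]
    }
}
```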
For info on the sorting algorithm of collections - see this SO question
If I am going to create a Java Collection, and only want to fill it with elements, and then iterate through it (without knowing the necessary size beforehand), i.e. all I need is Collection<E>.add(E) and Collection<E>.iterator(), which concrete class should I choose? Is there any advantage to using a Set rather than a List, for example? Which one would have the least overhead?
which concrete class should I choose?
I would probably just go with an ArrayList or a LinkedList. Both support the add and iterator methods, and neither of them has any considerable overhead.
Is there any advantage to using a Set rather than a List, for example?
No, I wouldn't say so. (Unless you rely on the order of the elements, in which case you must use a List, or want to disallow duplicates, in which case you should use a Set.)
(I don't see how any Set implementation could beat a list implementation for add / iterator methods, so I'd probably go with a List even if I don't care about order.)
Which one would have the least overhead?
Sounds like micro-benchmarking here, but if I were forced to guess, I'd say ArrayList (or perhaps LinkedList in corner cases where the ArrayList needs to reallocate memory often :-)
Do not go with a Set. Sets and Lists differ in their purpose, which you should always consider when choosing the right Collection:
a List is there for maintaining elements in the order you added them; and if you insert the same element twice it will be kept twice
a Set is there for holding one specific element exactly once (uniqueness); order is only relevant for specific implementations (like TreeSet), but still elements that are 'the same' would not be added twice
A Set is only meaningful if you want to make sure no duplicate element is 'registered' (or, with a sorted implementation such as TreeSet, to keep your objects sorted as well). Otherwise, an ArrayList is just fine.
However, if you want to add elements while iterating too, an ArrayBlockingQueue is better.
Here are some key points which can help you choose a collection according to your requirements:
List (ArrayList or LinkedList)
Duplicate values are allowed.
Insertion order is preserved.
Set
Duplicate values are not allowed.
Insertion order is not guaranteed to be preserved (it depends on the implementation).
So, according to your requirements, List seems to be a suitable choice.
Now, between ArrayList and LinkedList:
ArrayList is a random access list. Use it if your frequent operation is the retrieval of elements.
LinkedList can be the better option if your frequent operation is adding or removing elements, particularly at the ends of the list or through an iterator.
What changes need to be made to ArrayList to make it behave like a Set (meaning it should not accept any duplicate values)?
There are many ways to accomplish this. Here are two:
Store the elements of the ArrayList in arbitrary order. When inserting a new value, do a linear scan over the elements and see if the element you're adding already exists. If so, don't add it. Otherwise, append it to the elements.
Enforce that the elements of the ArrayList always be stored in sorted order. To insert a new element, do a binary search to find where that element should be placed, and if the element already exists don't insert it. Otherwise, insert it at the given position.
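The two approaches above could be sketched roughly as follows. The helper names addIfAbsent and addSorted are made up for this example; Collections.binarySearch is the standard JDK method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ArrayListAsSetDemo {

    // Approach 1: linear scan before appending, O(n) per insertion.
    static <T> boolean addIfAbsent(List<T> list, T value) {
        if (list.contains(value)) {
            return false;
        }
        return list.add(value);
    }

    // Approach 2: keep the list sorted. The search is O(log n),
    // but inserting in the middle still shifts elements, which is O(n).
    static <T extends Comparable<? super T>> boolean addSorted(List<T> sortedList, T value) {
        int pos = Collections.binarySearch(sortedList, value);
        if (pos >= 0) {
            return false; // already present
        }
        // binarySearch returns (-(insertion point) - 1) when the key is absent.
        sortedList.add(-pos - 1, value);
        return true;
    }

    public static void main(String[] args) {
        List<Integer> unsorted = new ArrayList<>();
        addIfAbsent(unsorted, 5);
        addIfAbsent(unsorted, 1);
        addIfAbsent(unsorted, 5);
        System.out.println(unsorted); // [5, 1]

        List<Integer> sorted = new ArrayList<>();
        addSorted(sorted, 5);
        addSorted(sorted, 1);
        addSorted(sorted, 3);
        addSorted(sorted, 3);
        System.out.println(sorted); // [1, 3, 5]
    }
}
```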
However, you shouldn't be doing this. These approaches are very slow compared to HashSet or TreeSet, which are specialized data structures optimized to handle this efficiently.
Create your own implementation of java.util.List:
override add() and addAll(), and make use of contains().
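A rough sketch of that idea, assuming it is acceptable to extend ArrayList rather than implement java.util.List from scratch. The class name UniqueArrayList is made up. Note that silently rejecting elements departs from what callers normally expect of a List, and set() is left untouched here, so it could still introduce a duplicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

class UniqueArrayList<E> extends ArrayList<E> {

    @Override
    public boolean add(E e) {
        if (contains(e)) {
            return false; // reject duplicates
        }
        return super.add(e);
    }

    @Override
    public void add(int index, E e) {
        if (!contains(e)) {
            super.add(index, e);
        }
    }

    @Override
    public boolean addAll(Collection<? extends E> c) {
        boolean changed = false;
        for (E e : c) {
            changed |= add(e);
        }
        return changed;
    }

    @Override
    public boolean addAll(int index, Collection<? extends E> c) {
        boolean changed = false;
        for (E e : c) {
            if (!contains(e)) {
                super.add(index++, e);
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        UniqueArrayList<String> list = new UniqueArrayList<>();
        list.add("a");
        list.add("a");
        list.addAll(Arrays.asList("b", "a", "c"));
        System.out.println(list); // [a, b, c]
    }
}
```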
As the others said, it's unclear why you need this.
Maybe LinkedHashSet is what you need?
http://download.oracle.com/javase/6/docs/api/java/util/LinkedHashSet.html
Other than already said, you could have a look at java.util.concurrent.CopyOnWriteArraySet. If you leave the "CopyOnWrite" part away, you have your ArraySet.
Given a List of MyClass objects (and a custom Comparator myComparitor if needed), what good options are there for checking if the List contains two "equal" objects?
Edit: if there are duplicates, return a reference to one or more of the duplicates.
Overriding MyClass.equals(MyClass) in this case is not an option.
My initial thought is to create a hash table of sorts, but I suspect that there's a non-hack way to accomplish the same thing:
SortedSet<MyClass> mySet = new TreeSet<MyClass>(myComparitor);
mySet.addAll(myList);
// Find duplicates in a sorted set in O(N) time
P.S. Is there a good reference on Markdown?
If the element's equals(Object) method does not give you the semantic that you require, then HashMap or HashSet are not options. Your choices are:
Use a TreeMap for de-duping. This is O(NlogN).
Sort the ArrayList or a copy, then iterate over looking for element i equals element i + 1. This is O(NlogN).
Find an alternative implementation of hash sets that allows you to provide a separate object to implement equality and hashing. (Neither Apache nor Google collections support this, so you'll need to look further afield.)
Create a wrapper class for your element type that overrides equals(Object) and hashCode(), and de-dup using a HashSet of wrapped objects. This is O(N), but the constant of proportionality will be larger than a simple HashSet due to creation of wrapper objects.
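The wrapper option might look like the sketch below. MyClass here is a hypothetical stand-in with a single name field, and equality by name is an assumption; the real notion of "equal" would come from your own requirements.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class WrapperDedupDemo {

    // Hypothetical stand-in for MyClass: equals/hashCode are NOT overridden.
    static final class MyClass {
        final String name;

        MyClass(String name) {
            this.name = name;
        }
    }

    // Wrapper that supplies equality and hashing on behalf of MyClass.
    static final class Key {
        final MyClass original;

        Key(MyClass original) {
            this.original = original;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).original.name.equals(original.name);
        }

        @Override
        public int hashCode() {
            return original.name.hashCode();
        }
    }

    // Returns the first element that duplicates an earlier one, or null if none.
    static MyClass firstDuplicate(List<MyClass> list) {
        Set<Key> seen = new HashSet<>();
        for (MyClass item : list) {
            if (!seen.add(new Key(item))) {
                return item;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<MyClass> list = Arrays.asList(new MyClass("a"), new MyClass("b"), new MyClass("a"));
        System.out.println(firstDuplicate(list).name); // a
    }
}
```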
When de-duping with a Set it is probably better to use a loop rather than addAll. This is necessary if you need to know what all of the duplicates are. If you don't need to know that, then using a loop allows you to stop when you find the first duplicate. The only case where addAll is likely to perform better is when there are likely to be no duplicates.
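A loop of that kind could be sketched as follows, relying on Set.add returning false when the element is already present. The method name findDuplicates and the case-insensitive comparator are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TreeSetDuplicatesDemo {

    // Collects every element the comparator considers equal to an earlier one.
    static <T> List<T> findDuplicates(List<T> list, Comparator<? super T> comparator) {
        Set<T> seen = new TreeSet<>(comparator);
        List<T> duplicates = new ArrayList<>();
        for (T item : list) {
            if (!seen.add(item)) {
                duplicates.add(item); // return here instead to stop at the first duplicate
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "B", "A", "b", "c");
        System.out.println(findDuplicates(input, String.CASE_INSENSITIVE_ORDER)); // [A, b]
    }
}
```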
If you already have a sorted list, you could just compare each element with the next element, and if any pair is the same you have dups.
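A minimal sketch of the adjacent-element check. It sorts a copy first in case the list is not already sorted, which is an addition for safety; the method name is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class AdjacentDuplicatesDemo {

    // After sorting, elements the comparator considers equal end up next to each other.
    static <T> boolean hasDuplicates(List<T> list, Comparator<? super T> comparator) {
        List<T> sorted = new ArrayList<>(list);
        sorted.sort(comparator);
        for (int i = 0; i + 1 < sorted.size(); i++) {
            if (comparator.compare(sorted.get(i), sorted.get(i + 1)) == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Comparator<Integer> natural = Comparator.naturalOrder();
        System.out.println(hasDuplicates(Arrays.asList(3, 1, 2, 3), natural)); // true
        System.out.println(hasDuplicates(Arrays.asList(3, 1, 2), natural));    // false
    }
}
```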
In your question you are using a TreeSet, which will have culled the duplicates already, so if you just need to know whether you have duplicates, compare the size of mySet with the size of myList. If they aren't the same, you have dups.