HashSet and LinkedHashSet [duplicate] - java

So I'm developing a program where I need to store unique customer data of primitive types. In this regard, I have been reading a book about data structures and came to the conclusion that I should use a HashSet.
Now this book states that a HashSet has faster insertion and removal than a LinkedHashSet. This baffles me a bit. I thought that the only difference between the two was that a LinkedHashSet uses some extra memory, using a LinkedList to keep order.
Can anyone elaborate?

Choose your data structure wisely.
You can use LinkedHashSet instead of HashSet if the order of insertion is important to you. The additional feature costs some memory and processor cycles.
Edit 1:
Things to consider other than the insertion order: because LinkedHashSet maintains a doubly linked list, it will be slightly slower for insertion and removal, but slightly faster in iteration.
To quote the Javadoc:
This class provides all of the optional Set operations, and permits null elements. Like HashSet, it provides constant-time performance for the basic operations (add, contains and remove), assuming the hash function disperses elements properly among the buckets. Performance is likely to be just slightly below that of HashSet, due to the added expense of maintaining the linked list, with one exception: Iteration over a LinkedHashSet requires time proportional to the size of the set, regardless of its capacity. Iteration over a HashSet is likely to be more expensive, requiring time proportional to its capacity.
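The ordering difference described in the quote can be seen in a small sketch. The class name `SetOrderDemo`, the helper `iterationOrder` and the sample values are made up for illustration; only the `java.util` collections are real API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SetOrderDemo {
    // Adds the given values to the set and returns its contents in iteration order.
    static List<String> iterationOrder(Set<String> set, String... values) {
        for (String v : values) {
            set.add(v); // duplicates are silently ignored by any Set
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        String[] customers = {"zoe", "adam", "mia", "adam"}; // "adam" appears twice

        // LinkedHashSet: iteration follows insertion order.
        System.out.println(iterationOrder(new LinkedHashSet<String>(), customers)); // [zoe, adam, mia]

        // HashSet: same three elements, but the order is unspecified.
        System.out.println(iterationOrder(new HashSet<String>(), customers));
    }
}
```

If only uniqueness matters, the HashSet line is all that is needed; the LinkedHashSet variant pays a small bookkeeping cost for the predictable order.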

TreeSet, LinkedHashSet and HashSet are three Set implementations in the Java Collections Framework and, like many others, they are used to store objects. The main feature of TreeSet is sorting, LinkedHashSet preserves insertion order, and HashSet is a general-purpose collection for storing objects. HashSet is implemented using HashMap, while TreeSet is implemented using TreeMap. TreeSet is a SortedSet implementation, which allows it to keep elements in the sorted order defined by either the Comparable or the Comparator interface. Comparable is used for natural-order sorting and Comparator for custom-order sorting of objects; a Comparator can be provided when creating the TreeSet instance. Before looking at the differences between TreeSet, LinkedHashSet and HashSet, let's see some similarities between them:
1) Duplicates: All three implement the Set interface, which means they cannot store duplicates.
2) Thread safety: HashSet, TreeSet and LinkedHashSet are not thread-safe; if you use them in a multi-threaded environment where at least one thread modifies the Set, you need to synchronize them externally.
3) Fail-fast iterator: The Iterators returned by TreeSet, LinkedHashSet and HashSet are fail-fast, i.e. if the Set is modified after the Iterator's creation in any way other than through the Iterator's own remove() method, the Iterator throws ConcurrentModificationException on a best-effort basis.
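The fail-fast behaviour in point 3 can be sketched as follows. The class and method names are hypothetical, and a LinkedHashSet is used only so that the iteration order (and therefore the point of failure) is predictable; the exception itself is still specified as best-effort.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

class FailFastDemo {
    // Returns true if removing directly from the set during iteration was detected.
    static boolean modifyDuringIteration(Set<Integer> set) {
        try {
            for (Integer n : set) {
                if (n % 2 == 0) {
                    set.remove(n); // structural change behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Safe alternative: remove through the iterator itself.
    static void removeEvens(Set<Integer> set) {
        for (Iterator<Integer> it = set.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        Set<Integer> a = new LinkedHashSet<>(Arrays.asList(1, 2, 3, 4));
        System.out.println(modifyDuringIteration(a)); // true

        Set<Integer> b = new LinkedHashSet<>(Arrays.asList(1, 2, 3, 4));
        removeEvens(b);
        System.out.println(b); // [1, 3]
    }
}
```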
Now let's see the differences between HashSet, LinkedHashSet and TreeSet in Java:
Performance and speed: The first difference is speed. HashSet is fastest, LinkedHashSet is second or almost equal to HashSet, and TreeSet is a bit slower because it has to keep its elements sorted on each insertion. TreeSet provides guaranteed O(log(n)) time for common operations like add, remove and contains, while HashSet and LinkedHashSet offer constant-time performance, i.e. O(1), for add, contains and remove, provided the hash function distributes elements uniformly among the buckets.
Ordering: HashSet does not maintain any order, LinkedHashSet maintains the insertion order of elements much like a List, and TreeSet maintains the sorted order of elements.
Internal implementation: HashSet is backed by a HashMap instance. LinkedHashSet extends HashSet but is backed by a LinkedHashMap (a hash table with a doubly linked list running through its entries). TreeSet is backed by a NavigableMap, which by default is a TreeMap.
null: Both HashSet and LinkedHashSet allow null, but a TreeSet with natural ordering does not: it throws java.lang.NullPointerException when you insert null, because TreeSet uses the compareTo() method of its elements to compare them, and that throws NullPointerException when comparing with null.
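A short sketch of the ordering and null-handling differences listed above. The class `SetComparisonDemo` and its helpers are illustrative names, not part of any library, and the TreeSet here uses natural ordering (no Comparator).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetComparisonDemo {
    // Inserts 30, 10, 20, 10 and returns the set's contents in iteration order.
    static List<Integer> fill(Set<Integer> set) {
        Collections.addAll(set, 30, 10, 20, 10);
        return new ArrayList<>(set);
    }

    // Returns true if the set accepts a null element, false if it throws NullPointerException.
    static boolean acceptsNull(Set<Integer> set) {
        try {
            set.add(null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(fill(new LinkedHashSet<Integer>())); // [30, 10, 20] (insertion order)
        System.out.println(fill(new TreeSet<Integer>()));       // [10, 20, 30] (sorted)
        System.out.println(fill(new HashSet<Integer>()));       // same elements, unspecified order

        System.out.println(acceptsNull(new HashSet<Integer>()));       // true
        System.out.println(acceptsNull(new LinkedHashSet<Integer>())); // true
        System.out.println(acceptsNull(new TreeSet<Integer>()));       // false
    }
}
```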

Related

Accessing an element in a TreeSet using an index

Suppose there is a String TreeSet (ts) with the elements 1,2,3,4,5,6,7,8,9,10.
Is there any built-in method in TreeSet that lets me access an element by position?
For example, to access 3, can I do ts[2], and to access 8, ts[7] (something like that)?
I used this method:
Iterator<String> it = ts.iterator();
int i = 0;
while (it.hasNext()) {
    String ele = it.next();
    if (i == 2) {
        System.out.println(ele + "");
    }
    i++;
}
However, when I ran it, it didn't show any output; but if I did i=0, it showed all the output, i.e. 1,2,3,4,5,6,7,8,9,10.
Secondly, can anyone tell me when it is best to use HashSet, TreeSet and LinkedHashSet?
If you want to access elements in your collection like ts[2], you are better off converting the collection into an array using the collection's built-in toArray() method.
Otherwise, using an iterator is the standard and efficient way to access elements in a collection.
For the second question: HashSet is a plain hash table; LinkedHashSet is a hash table that also remembers the order in which elements were inserted; TreeSet keeps its elements sorted and supports navigation (e.g. first(), ceiling()).
For complete details, check the Oracle documentation.
TreeSet is a NavigableSet, which means its items have an order (natural ordering by default, but you can define your own ordering with a Comparator or by implementing Comparable) and you can navigate through the items in that order. However, there is no index mechanism. A TreeSet is based on a TreeMap, which is a red-black tree. In such a data structure, indexes (element positions, not indexes in the sense of efficient access) are not very meaningful.
HashSet, on the other hand, is based on a HashMap, which is a classical hash table implementation. In this data structure there is no defined order. You can, however, look up each item in O(1) time thanks to the hash function.
LinkedHashSet is a subclass of HashSet. Other than the HashSet methods, no new methods are defined, so LinkedHashSet does not offer any extra capability such as natural ordering or indexes. However, it has an auxiliary linked list that keeps track of the order in which elements are inserted. This way, when you iterate over a LinkedHashSet with .iterator() or a for-each loop, you get the elements in the order you inserted them.
So basically a HashSet is more appropriate if you will access elements individually. Being the simplest Set implementation, it is also the one to use in generic cases. If you need to keep the order of insertion, use LinkedHashSet, and if you have to enforce a custom or natural ordering of items, use TreeSet.
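Since TreeSet has no index-based accessor, positional access has to go through a copy or an iterator. Below is a minimal sketch of the copy approach; `NthElementDemo` and `nth` are hypothetical names. It also illustrates a possible pitfall in the question: a TreeSet of Strings sorts lexicographically, so "10" comes right after "1".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class NthElementDemo {
    // Returns the element at the given position in the set's sorted order.
    // Copies the whole set first, so each call costs O(n).
    static <T> T nth(SortedSet<T> set, int index) {
        List<T> copy = new ArrayList<>(set);
        return copy.get(index);
    }

    public static void main(String[] args) {
        TreeSet<Integer> numbers = new TreeSet<>();
        TreeSet<String> strings = new TreeSet<>();
        for (int i = 1; i <= 10; i++) {
            numbers.add(i);
            strings.add(String.valueOf(i));
        }

        System.out.println(nth(numbers, 2)); // 3
        System.out.println(nth(numbers, 7)); // 8

        // Strings sort as "1", "10", "2", "3", ... so position 2 holds "2", not "3".
        System.out.println(nth(strings, 2)); // 2
    }
}
```

If positional access is needed often, keeping the data in a sorted List (or copying once and reusing the copy) is likely cheaper than copying on every lookup.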

copy elements of arraylist in a set with the same order java

I've sorted an ArrayList of ints in ascending order, but when I copy it into a set, the elements are not sorted anymore.
I'm using this:
HashSet<Integer> set = new HashSet<Integer>(sortedArray);
Why is that?
LinkedHashSet will keep the order. TreeSet will sort based either on an external Comparator or natural ordering through Comparable.
A general point of a Set is that order is irrelevant. Hashing is intended to put the elements in as random an order as possible. LinkedHashSet maintains a linked-list between references to the elements, so can maintain an order.
BitSet (which is not a Set) may, or may not, provide a more efficient data structure.
HashSets don't sort or maintain order, and the API will tell you this:
it does not guarantee that the order will remain constant over time.
Consider using another type of Set such as a TreeSet.
If you just care about uniqueness, use the HashSet. If you're after sorting, then consider the TreeSet.
You need to use TreeSet and implement a Comparator object or the Comparable interface for your data. See the Java tutorial on object ordering for details.
A hash set is designed for quick access to unique data, not for maintaining a particular order.
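The two suggestions from the answers above can be sketched side by side. `CopyOrderDemo` and its method names are illustrative only; the input list stands in for the asker's `sortedArray`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class CopyOrderDemo {
    // Keeps whatever order the source list already has and drops duplicates.
    static List<Integer> viaLinkedHashSet(List<Integer> source) {
        Set<Integer> set = new LinkedHashSet<>(source);
        return new ArrayList<>(set);
    }

    // Sorts by natural ordering regardless of the source order and drops duplicates.
    static List<Integer> viaTreeSet(List<Integer> source) {
        Set<Integer> set = new TreeSet<>(source);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<Integer> sorted = Arrays.asList(3, 17, 17, 42, 100);
        System.out.println(viaLinkedHashSet(sorted)); // [3, 17, 42, 100]

        List<Integer> unsorted = Arrays.asList(42, 3, 100, 17);
        System.out.println(viaLinkedHashSet(unsorted)); // [42, 3, 100, 17]
        System.out.println(viaTreeSet(unsorted));       // [3, 17, 42, 100]
    }
}
```

LinkedHashSet only preserves the order it is given, so elements added later are appended at the end; TreeSet keeps the set sorted as it changes.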

java significance of hashing linkedHashSet

I know the following things about LinkedHashSet:
it maintains insertion order
it uses a linked list to preserve order
My question is: how does hashing come into the picture?
I understand that if hashing is used, then the concept of bucketing comes in.
However, from checking the code in the JDK, it seems that the LinkedHashSet implementation contains only constructors and no other logic, so I guess all the logic happens in HashSet?
So does HashSet use a linked list by default?
Let me put my question this way: if the objective is to write a collection that
maintains unique values
preserves insertion order using a linked list
then it can easily be done without hashing; maybe we could call this collection LinkedSet.
I saw a similar question, "What's the difference between HashSet and LinkedHashSet", but it was not very helpful.
Let me know if I need to explain my question more.
False. The implementation of LinkedHashSet is really all in LinkedHashMap. (And the implementation of HashSet is really all in HashMap. Le gasp!)
HashSet has no linked list at all.
It's entirely possible to write a LinkedSet collection backed by a linked list, that keeps elements unique -- it's just that its performance will be pretty crappy.
It's an 'interesting' implementation. The constructors for LinkedHashSet defer to a package-private constructor in HashSet which sets up the data structure (a LinkedHashMap) for maintaining iteration order.
HashSet(int initialCapacity, float loadFactor, boolean dummy) {
    map = new LinkedHashMap<E,Object>(initialCapacity, loadFactor);
}
The API designers could simply have exposed this constructor as public, with appropriate documentation, but I guess they wanted the code to be more 'self-documenting'.
If you look closely, you will see it is actually using a package-private constructor on HashSet that is there just for it, not one of the regular ones, e.g.:
HashSet(int initialCapacity, float loadFactor, boolean dummy) {
    map = new LinkedHashMap<E,Object>(initialCapacity, loadFactor);
}
So the keySet being used to back the LinkedHashSet is in fact coming from the implementation of LinkedHashMap, not a regular HashMap like a regular HashSet. It doesn't actually use java.util.LinkedList. It just maintains pointers that form a list within the implementation of the bucket contents (Map.Entry<K,V>)
private static class Entry<K,V> extends HashMap.Entry<K,V> {
    // These fields comprise the doubly linked list used for iteration.
    Entry<K,V> before, after;

    Entry(int hash, K key, V value, HashMap.Entry<K,V> next) {
        super(hash, key, value, next);
    }
    // ... (rest of the class omitted)
}
Hashing comes into the picture because it's an easy way to create a collection that enforces uniqueness and offers constant-time performance for most operations. Sure, we could just use a linked list and add uniqueness checking, but the time for several operations would become O(N), because you'd have to iterate over the whole list to check for duplicates.
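To make the cost concrete, here is a deliberately naive sketch of the "LinkedSet" idea from the question: a linked list plus a uniqueness check. The class `NaiveLinkedSet` is hypothetical and not part of the JDK. Its `add` has to scan the list, so it is O(n), whereas LinkedHashSet locates the bucket by hash in expected O(1).

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class NaiveLinkedSet<E> {
    private final List<E> items = new LinkedList<>();

    // Adds the element only if it is not already present.
    // contains() walks the list, so this is O(n) per call.
    boolean add(E e) {
        if (items.contains(e)) {
            return false;
        }
        items.add(e);
        return true;
    }

    // Snapshot of the elements in insertion order.
    List<E> asList() {
        return new ArrayList<>(items);
    }

    public static void main(String[] args) {
        NaiveLinkedSet<String> set = new NaiveLinkedSet<>();
        System.out.println(set.add("a")); // true
        System.out.println(set.add("b")); // true
        System.out.println(set.add("a")); // false (duplicate found by linear scan)
        System.out.println(set.asList()); // [a, b]
    }
}
```

Filling such a set with n distinct elements takes on the order of n² comparisons in total, which is the "pretty crappy" performance mentioned above.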
Code Sample
Set<Registeration> registerationSet = new LinkedHashSet<>();
registerationSet.add(new Registeration());
Explanation of line 2:
1. Computes the hashCode of the Registeration object.
2. Uses the hash code to locate the bucket in registerationSet.
3. Checks the bucket for an equal object.
3.1. If an equal object is found, the set is left unchanged and add() returns false; the existing element is not replaced.
3.2. If none is found, a reference to the Registeration object is added to the bucket.
In parallel,
a linked list records the insertion order of all elements:
a new element's reference is always appended to the end;
re-inserting an element that is already present (3.1 above) does not change its position.
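One detail worth checking against the documented Set contract: adding an element that equals one already in a LinkedHashSet returns false, keeps the original instance, and leaves the insertion order untouched. The sketch below assumes a hypothetical `Registration` class whose equality is based on an `id` field; it is a stand-in, not the class from the code sample above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ReinsertDemo {
    // Hypothetical element type: equal when ids match, the label is just payload.
    static final class Registration {
        final int id;
        final String label;

        Registration(int id, String label) {
            this.id = id;
            this.label = label;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Registration && ((Registration) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    // Returns the result of the duplicate add() followed by the labels in iteration order.
    static List<String> labelsAfterReinsert() {
        Set<Registration> set = new LinkedHashSet<>();
        set.add(new Registration(1, "first"));
        set.add(new Registration(2, "second"));
        boolean added = set.add(new Registration(1, "duplicate of 1"));

        List<String> result = new ArrayList<>();
        result.add("added=" + added);
        for (Registration r : set) {
            result.add(r.label);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(labelsAfterReinsert()); // [added=false, first, second]
    }
}
```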
For a Specific answer to your question
how does hashing come into picture? (in a LinkedHashSet)
What the Java Docs say...
Like HashSet, it provides constant-time performance for the basic operations (add, contains and remove), assuming the hash function disperses elements properly among the buckets.
This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order).
The buckets, accessed by hash code, are used to speed up lookups, and the linked list lets the iterator return elements in insertion order.
Hope I have answered your question.

does java sortedhashset type collection exist?

Does such a thing exist anywhere? Basically I see Java has LinkedHashSet but no type of navigable hash set?
By its very nature, a hash-based data structure is not ordered. You can write wrappers which supplement it with an additional data structure (this is more or less what LinkedHashMap does). But while it makes some sense to keep a hash set and a list, in order to keep a good ordering you would need a tree or similar data structure. But the tree can work as a set by itself, so you would essentially be duplicating the information (more than in the case of set plus list, which differ more than two different set implementations). So the best solution is to just use TreeSet or another SortedSet if you need order.
It's not a HashSet, but as a descendant of Set you have the TreeSet
This class implements the Set interface, backed by a TreeMap instance. This class guarantees that the sorted set will be in ascending element order
You can traverse the elements using the iterator
public Iterator iterator()
Returns an iterator over the elements in this set. The elements are returned in ascending order
You can use a TreeSet, but all the operations in it are O(log n).
You can use a LinkedHashSet, which keeps a linked list on top of a hash set, but it only maintains insertion order (the first element inserted is the first returned by the iterator); you cannot have natural or custom ordering.
You could also use a TreeSet+HashSet approach, but two references will be kept for each element, and while add and remove would still be O(log n), contains would become expected O(1).
Choose wisely :)
I guess there's TreeMap which is...related but definitely not the same :)
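For reference, this is roughly what the "navigable" part looks like with a TreeSet. The class name `NavigableDemo` and the sample values are made up; the methods called are standard `java.util.NavigableSet` API.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

class NavigableDemo {
    // Builds a small sorted set from unsorted input.
    static NavigableSet<Integer> sample() {
        return new TreeSet<>(Arrays.asList(50, 10, 40, 20, 30));
    }

    public static void main(String[] args) {
        NavigableSet<Integer> set = sample();

        System.out.println(set);                 // [10, 20, 30, 40, 50]
        System.out.println(set.first());         // 10
        System.out.println(set.last());          // 50
        System.out.println(set.ceiling(25));     // 30 (smallest element >= 25)
        System.out.println(set.floor(25));       // 20 (largest element <= 25)
        System.out.println(set.headSet(30));     // [10, 20] (elements strictly below 30)
        System.out.println(set.descendingSet()); // [50, 40, 30, 20, 10]
    }
}
```

None of these queries has a natural counterpart on a hash table, which is why a sorted or navigable set is tree-based in the JDK.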

ordering a hashset example?

I need an example on how to use a comparable class on a HashSet to get an ascending order. Let’s say I have a HashSet like this one:
HashSet<String> hs = new HashSet<String>();
How can I get hs to be in ascending order?
Use a TreeSet instead. It has a constructor taking a Comparator. It will automatically sort the Set.
If you want to convert a HashSet to a TreeSet, then do so:
Set<YourObject> hashSet = getItSomehow();
Set<YourObject> treeSet = new TreeSet<YourObject>(new YourComparator());
treeSet.addAll(hashSet);
// Now it's sorted based on the logic as implemented in YourComparator.
If the items themselves already implement Comparable and their natural ordering is already what you want, then you don't need to supply a Comparator. You can then construct the TreeSet directly from the HashSet. E.g.
Set<String> hashSet = getItSomehow();
Set<String> treeSet = new TreeSet<String>(hashSet);
// Now it's sorted based on the logic as implemented in String#compareTo().
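Put together as a runnable sketch, both variants look like this. `SortHashSetDemo` is an illustrative name, and `Comparator.reverseOrder()` stands in for the placeholder `YourComparator` above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SortHashSetDemo {
    // Natural ordering: String already implements Comparable.
    static List<String> ascending(Set<String> source) {
        Set<String> sorted = new TreeSet<>(source);
        return new ArrayList<>(sorted);
    }

    // Custom ordering: pass a Comparator to the TreeSet constructor, then add the elements.
    static List<String> descending(Set<String> source) {
        Comparator<String> reverse = Comparator.reverseOrder();
        Set<String> sorted = new TreeSet<>(reverse);
        sorted.addAll(source);
        return new ArrayList<>(sorted);
    }

    public static void main(String[] args) {
        Set<String> hs = new HashSet<>(Arrays.asList("pear", "apple", "fig"));
        System.out.println(ascending(hs));  // [apple, fig, pear]
        System.out.println(descending(hs)); // [pear, fig, apple]
    }
}
```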
See also:
Object ordering tutorial
Collections tutorial - Set Implementations
HashSet "makes no guarantees as to the iteration order of the set." Use LinkedHashSet instead.
Addendum: I would second @BalusC's point about implementing Comparable and express a slight preference for LinkedHashSet, which offers "predictable iteration order ... without incurring the increased cost associated with TreeSet."
Addendum: @Stephen raises an important point, which favors @BalusC's suggestion of TreeSet. LinkedHashSet is a more efficient alternative only if the data is (nearly) static and already sorted.
HashSets do not guarantee iteration order:
This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.
You probably need to choose a different data structure if you want to be able to control the iteration order (or indeed have one at all!).
