I'm using LinkedHashSet. I want to insert items at the 0th position, like:
Set<String> set = new LinkedHashSet<String>();
for (int i = 0; i < n; i++) {
set.add(0, "blah" + i);
}
I'm not sure how LinkedHashSet is implemented. Will inserting physically move all the existing items, or does it cost the same as inserting into a linked list?
Thank you
------ Edit ---------------
Complete mess-up on my part: I was referencing the ArrayList docs. The Set interface has no add(index, object) method. Is there a way to iterate over the set backwards, then? Right now, to iterate I'm doing:
for (String it : set) {
}
can we do that in reverse?
Thanks
Sets are, by definition, independent of order. Thus, Set has no add(int, Object) method.
This is also true of LinkedHashSet http://download.oracle.com/javase/6/docs/api/java/util/LinkedHashSet.html
LinkedHashSet maintains insertion order and thus all elements are added at the end of the linked list. This is achieved using the LinkedHashMap. You can have a look at the method linkEntry in LinkedHashMap http://www.docjar.com/html/api/java/util/LinkedHashMap.java.html
Edit: in response to edited question
There is no API method available to do this, but you can do the following:
1. Copy the Set into a List using new ArrayList(Set)
2. Reverse it with Collections.reverse(List)
3. Iterate over this list
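The three steps above can be sketched as follows. This is only an illustration: the class name ReverseIterationDemo and the helper reversedCopy are made up for the example, and the copy costs O(n) time and memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ReverseIterationDemo {
    // Hypothetical helper: returns the set's elements in reverse iteration order.
    static <T> List<T> reversedCopy(Set<T> set) {
        List<T> copy = new ArrayList<>(set); // O(n) copy that keeps the set's iteration order
        Collections.reverse(copy);           // reverses the copy in place; the set is untouched
        return copy;
    }

    public static void main(String[] args) {
        Set<String> set = new LinkedHashSet<>();
        for (int i = 0; i < 3; i++) {
            set.add("blah" + i);
        }
        for (String s : reversedCopy(set)) {
            System.out.println(s); // prints blah2, blah1, blah0
        }
    }
}
```

The original set is left unmodified, so the copy has to be rebuilt if the set changes afterwards.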
Judging by the source code of LinkedHashMap (which backs LinkedHashSet -- see http://www.docjar.com/html/api/java/util/LinkedHashMap.java.html ), inserts are cheap, like in a linked list.
To answer your latest question, there is no reverse iterator feature available from LinkedHashSet, even though internally the implementation uses a doubly linked list.
There is an open Request For Enhancement about this:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4848853
Mark Peters links to functionality available in guava, though their reverse list actually generates a reverse list.
As already mentioned, LinkedHashSet is built on LinkedHashMap, which is built on HashMap :) The Javadoc says that it takes constant time to add an element to a HashMap, assuming that your hash function is implemented properly. If your hash function is not implemented well, it may take up to O(n).
Backwards iteration is not supported at this moment.
You can't add elements to the front of a LinkedHashSet... it has no method such as add(int, Object) nor any other methods that make use of the concept of an "index" in the set (that's a List concept). It only provides consistent iteration order, based on the order in which elements were inserted. The most recently inserted element that was not already in the set will be the last element when you iterate over it.
And the Javadoc for LinkedHashSet explicitly states:
Like HashSet, it provides constant-time performance for the basic operations (add, contains and remove), assuming the hash function disperses elements properly among the buckets.
Edit: There is not any way to iterate over a LinkedHashSet in reverse short of something like copying it to a List and iterating over that in reverse. Using Guava you could do that like:
for (String s : Lists.reverse(ImmutableList.copyOf(set))) { ... }
Note that while creating the ImmutableList does require iterating over each element of the original set, the reverse method simply provides a reverse view and doesn't iterate at all itself.
My question is: why does an iterator work on a Set?
Here is my example code,
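import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class Staticex {
    public static void main(String[] args) {
        // Parameterized types instead of raw HashSet/Iterator
        Set<Integer> set = new HashSet<>();
        set.add(1);
        set.add(2);
        set.add(3);
        set.add(4);
        set.add(5);
        Iterator<Integer> iter = set.iterator();
        // Visits every element exactly once, in no guaranteed order
        while (iter.hasNext()) {
            System.out.println(iter.next());
        }
    }
}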
I understand, set is unordered, In contrast List
So, How can get the values one by one through an iterator?
Is iterator changing set into like list which ordered data structure?
How can Iterator can using in set?
Like you are using it.
How can get the values one by one through an iterator?
Your code is doing that.
Is iterator changing set into like list which ordered data structure?
No.
The thing that you are missing is what "unordered" means. It means that the order in which the (set's) elements are returned is not predictable [1], and not specified in the javadocs. However, each element will be returned once and (since the elements of a set are unique!) only once for the iteration.
[1] Actually, this is not strictly true. If you have enough information about the element class, the element values, how they were created and how / when they were added to the HashSet, AND you analyze the specific HashSet implementation ... it is possible that you CAN predict what the iteration order is going to be. For example, if you create a HashSet<Integer> and add 1, 2, 3, 4, ... to it, you will see a clear (and repeatable) pattern when you iterate the elements. This is in part due to the way that Integer.hashCode() is specified.
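A small sketch of the Integer case mentioned above. The only specified fact it relies on is that Integer.hashCode() returns the wrapped int value; the class name IntegerHashDemo and the helper buildSet are made up for the example. The iteration order it prints is an implementation detail, so no expected order is shown.

```java
import java.util.HashSet;
import java.util.Set;

class IntegerHashDemo {
    // Hypothetical helper: adds 5, 4, 3, 2, 1 (descending) to a HashSet.
    static Set<Integer> buildSet() {
        Set<Integer> set = new HashSet<>();
        for (int i = 5; i >= 1; i--) {
            set.add(i);
        }
        return set;
    }

    public static void main(String[] args) {
        // Integer.hashCode() is specified to return the primitive int value.
        System.out.println(Integer.valueOf(42).hashCode()); // prints 42

        // Whatever order appears here comes from the hash table layout,
        // not from any guarantee in the Set contract.
        for (Integer value : buildSet()) {
            System.out.println(value);
        }
    }
}
```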
Referring to the documentation, we see that:
Iterator<E> iterator()
Returns an iterator over the elements in this collection. There are no guarantees concerning the order in which the elements are returned (unless this collection is an instance of some class that provides a guarantee).
Since iterator() makes no guarantee about the order in which the elements are returned, there is no conflict in applying it to a Set, which is unordered.
Further, it is not changing the Set into a List.
Set is unordered in a logical sense. When you have a bag of things, there isn't a sense of order when they are inside the bag. But when you take each thing out of the bag, one at a time, you end up with some order. And like the other answer has mentioned, you cannot rely on that order since it is purely accidental.
I understand, set is unordered, In contrast List
This is not necessarily true. SortedSet is a subinterface of Set. As the name implies, instances of this interface are ordered in some fashion. For example, TreeSets are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used. Also, the main distinction between Set and List is that List allows for duplicate objects to be contained, whereas Set does not.
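To illustrate the two TreeSet constructors mentioned above, here is a minimal sketch. The class name TreeSetOrderDemo, the helper names and the sample words are made up for the example; the comparator is only one possible choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class TreeSetOrderDemo {
    // No-comparator constructor: elements are kept in their natural (alphabetical) order.
    static List<String> naturalOrder(List<String> input) {
        return new ArrayList<>(new TreeSet<>(input));
    }

    // Comparator constructor: shorter strings first, ties broken alphabetically.
    // Without the tie-break, two strings of equal length would count as duplicates.
    static List<String> byLength(List<String> input) {
        Comparator<String> cmp = (a, b) -> a.length() != b.length()
                ? Integer.compare(a.length(), b.length())
                : a.compareTo(b);
        TreeSet<String> set = new TreeSet<>(cmp);
        set.addAll(input);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "banana", "fig", "kiwi");
        System.out.println(naturalOrder(words)); // [banana, fig, kiwi, pear]
        System.out.println(byLength(words));     // [fig, kiwi, pear, banana]
    }
}
```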
Now, if you are talking specifically about HashSet, then you are correct that it is unordered.
I think your confusion comes from asking yourself "why is the printout showing the numbers in numeric (insertion) order?" The answer is somewhat involved if you are new to hashing, but the numbers are printed in that order because you are inserting integers, whose hash codes are simply their numeric values, and the implementation of HashSet is backed by a hash table. So although there is no guarantee as to the order in which the elements of a hash set are returned when iterating, if you change the insertion order of those same values, the numbers will most likely still be printed in the same numeric order. Remember that, despite all that, the order is not guaranteed. It may not hold, for instance, if you change the set elements to String objects.
Suppose there is a TreeSet of strings (ts) with the elements 1,2,3,4,5,6,7,8,9,10.
Is there any built-in method in TreeSet that lets me access an element by position?
For example, can I access 3 with ts[2] and 8 with ts[7] (or something like that)?
I used this method:
Iterator<String> it = ts.iterator();
int i=0;
while(it.hasNext()) {
String ele=it.next();
if(i==2){
System.out.println(ele+"");
}
i++;
}
However, when I ran it, it didn't show any output, but with i=0 it showed all the output, i.e. 1,2,3,4,5,6,7,8,9,10.
Secondly, can anyone tell me when it is best to use HashSet, TreeSet and LinkedHashSet?
If you want to access elements in your collection by index, like ts[2], you are better off converting the collection to an array with the collection's built-in toArray() method.
Otherwise, using an iterator is the standard and efficient way to access the elements of a collection.
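A minimal sketch of the array conversion, using the question's own values. The class name TreeSetIndexDemo and the helper elementAt are made up for the example. Note that a TreeSet of strings sorts lexicographically, so "10" comes right after "1" and the positions differ from what numeric ordering would give.

```java
import java.util.Arrays;
import java.util.TreeSet;

class TreeSetIndexDemo {
    // Hypothetical helper: copies the set into an array (O(n)) and indexes into it.
    static String elementAt(TreeSet<String> ts, int index) {
        String[] snapshot = ts.toArray(new String[0]);
        return snapshot[index];
    }

    public static void main(String[] args) {
        TreeSet<String> ts = new TreeSet<>(
                Arrays.asList("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"));
        // Sorted as strings: 1, 10, 2, 3, 4, 5, 6, 7, 8, 9
        System.out.println(elementAt(ts, 1)); // prints 10
        System.out.println(elementAt(ts, 2)); // prints 2
    }
}
```

If the array is needed repeatedly, it is cheaper to build it once and reuse it than to call a helper like this for every lookup.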
For the second question: HashSet is a plain hash table; LinkedHashSet is a hash table that also keeps its elements in the order they were inserted; TreeSet keeps its elements sorted and supports navigation.
For complete details, check the Oracle documentation.
TreeSet is a NavigableSet, which means you have an order of items (natural ordering by default, but you can define your own ordering relationship by using the Comparator or Comparable interface) and you can navigate through the items in this order. However, there is no index mechanism. Basically, a TreeSet is based on a TreeMap, which is a red-black tree. In such a data structure, indexes (element indexes, not indexes in the sense of efficient access) are not very meaningful.
HashSet, on the other hand, is based on a HashMap, which is a classical hash table implementation. In this data structure there is no order defined. You can look up each item in O(1) time, though, thanks to the hash function used.
LinkedHashSet is a subclass of HashSet. Other than the HashSet methods, no new method is defined, so LinkedHashSet does not offer any extra capability like natural order or indexes. However, it has an auxiliary linked list that keeps track of the order in which elements are inserted. In this way, when you iterate over a LinkedHashSet with the .iterator() method or a for loop, you get the elements in the order you inserted them.
So basically a HashSet is more appropriate if you will access elements individually; being the simplest Set implementation, it is also the one to use in the general case. If you need to keep the order of insertion, you need to use LinkedHashSet, and if you have to enforce a custom or natural ordering of items, you should use TreeSet.
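The difference between the three implementations can be seen with a short sketch. The class name SetChoiceDemo, the helper names and the sample words are made up for the example. No expected order is shown for HashSet, since it does not guarantee one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

class SetChoiceDemo {
    // LinkedHashSet: duplicates dropped, first-insertion order kept.
    static List<String> insertionOrder(List<String> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    // TreeSet: duplicates dropped, elements kept in natural (sorted) order.
    static List<String> sortedOrder(List<String> input) {
        return new ArrayList<>(new TreeSet<>(input));
    }

    // HashSet: duplicates dropped, iteration order unspecified, so only the size is reported.
    static int distinctCount(List<String> input) {
        return new HashSet<>(input).size();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("delta", "alpha", "charlie", "alpha", "bravo");
        System.out.println(insertionOrder(words)); // [delta, alpha, charlie, bravo]
        System.out.println(sortedOrder(words));    // [alpha, bravo, charlie, delta]
        System.out.println(distinctCount(words));  // 4
    }
}
```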
According to this question I have ordered a Java Map, as follows:
ValueComparator bvc = new ValueComparator(originalMap);
Map<String,Integer> sortedMap = new TreeMap<String,Integer>(bvc);
sortedMap.putAll(originalMap);
Now, I would like to extract the K most relevant values from the map, in top-K fashion. Is there a highly efficient way of doing it without iterating through the map?
P.S., some similar questions (e.g., this) ask for a solution to the top-1 retrieval problem.
No, not if you use a Map. You'd have to iterate over it.
Have you considered using a PriorityQueue? It's Java's implementation of a heap. It has efficient operations for insertion of arbitrary elements and for removal of the "minimum". You might think about doing this here. Instead of a Map, you could put them into a PriorityQueue ordered by relevance, with the most relevant as the root. Then, to extract the K most relevant, you'd just pop K elements from the PriorityQueue.
If you need the map-like property (mapping from String to Integer), then you could write a class that internally keeps everything in both a PriorityQueue and a HashMap. When you insert, you insert into both; when you remove the minimal element, you pop from the PriorityQueue, and that then tells you which element you also need to remove from your HashMap. This will still give you log-time inserts and min-removals.
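A minimal sketch of the PriorityQueue idea, assuming the relevance scores live in a Map<String,Integer> as in the question. The class name TopKDemo, the helper topK and the sample data are made up for the example. The comparator is reversed so that the most relevant entry sits at the root.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopKDemo {
    // Hypothetical helper: returns the keys of the k highest-valued entries, best first.
    static List<String> topK(Map<String, Integer> scores, int k) {
        // Reversed comparison turns the min-heap into a max-heap on the value.
        Comparator<Map.Entry<String, Integer>> byValueDesc =
                (a, b) -> Integer.compare(b.getValue(), a.getValue());
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(byValueDesc);
        heap.addAll(scores.entrySet());

        List<String> result = new ArrayList<>();
        while (result.size() < k && !heap.isEmpty()) {
            result.add(heap.poll().getKey()); // each poll is O(log n)
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("a", 3);
        scores.put("b", 10);
        scores.put("c", 7);
        scores.put("d", 1);
        System.out.println(topK(scores, 2)); // [b, c]
    }
}
```

Entries with equal values may come out in either order, since a heap does not break ties in any particular way.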
I am working on a problem where I'm required to store elements with no duplication while maintaining order. I chose to go with LinkedHashSet, since it fulfilled both of my requirements.
Let's say I have this code:
LinkedHashSet<String> hs = new LinkedHashSet<>();
hs.add("B");
hs.add("A");
hs.add("D");
hs.add("E");
hs.add("C");
hs.add("F");
if (hs.contains("D")) {
    // do something to remove the elements added after "D", i.e. remove "E", "C" and "F"
    // maybe hs.removeAll(Collection<?> c) ??
}
Can anyone please guide me with the logic to remove these elements?
Am I using the wrong datastructure? If so, then what would be a better alternative?
I think you may need to use an iterator to do the removal if you are using a LinkedHashSet. That is to say, find the element, then keep removing until you get to the tail. This will be O(n). Even if you wrote your own LinkedHashSet (with a doubly linked list and a hash set) and had access to the raw linking structure, so that you could cut the linked list in O(1), you would still need to remove all the elements you just cut off from the hash set, which is where the O(n) cost would arise again.
So in summary: find the element, keep the iterator positioned at it, and continue to walk forward, removing elements until you get to the end. I'm not sure if LinkedHashSet exposes the required calls, but you can probably figure that out.
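A sketch of the walk described above. The iterator returned by LinkedHashSet supports Iterator.remove(), so the standard API is enough. The class name RemoveAfterDemo and the helper removeAfter are made up for the example.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

class RemoveAfterDemo {
    // Hypothetical helper: removes every element that comes after marker in iteration order.
    // Does nothing if marker is not in the set. Cost is O(n) in the size of the set.
    static <T> void removeAfter(Set<T> set, T marker) {
        boolean found = false;
        for (Iterator<T> it = set.iterator(); it.hasNext(); ) {
            T current = it.next();
            if (found) {
                it.remove();          // safe removal during iteration
            } else if (Objects.equals(current, marker)) {
                found = true;         // keep the marker itself, drop what follows
            }
        }
    }

    public static void main(String[] args) {
        Set<String> hs = new LinkedHashSet<>();
        for (String s : new String[] {"B", "A", "D", "E", "C", "F"}) {
            hs.add(s);
        }
        removeAfter(hs, "D");
        System.out.println(hs); // [B, A, D]
    }
}
```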
You could write your own version of an ArrayList that doesn't allow for duplicates, by overriding add() and addAll(). To my knowledge, there is no "common" 3rd party version of such, which has always surprised me. Anybody know of one?
Then the remove code is pretty simple (no need to use a ListIterator):
int idx = this.indexOf("D");
if (idx >= 0) {
for (int goInReverse = this.size()-1; goInReverse > idx; goInReverse--)
this.remove(goInReverse);
}
However, this is still O(N), because in the worst case you loop through every element of the List.
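One possible shape for such a list, as a rough sketch only. The class name UniqueArrayList and the method truncateAfter are made up for the example. It overrides only add(E) and addAll(Collection), as the answer suggests, so duplicates can still get in through add(int, E), addAll(int, Collection) or set. The duplicate check is an O(n) scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

class UniqueArrayList<E> extends ArrayList<E> {
    @Override
    public boolean add(E e) {
        if (contains(e)) {
            return false;             // already present: O(n) scan, list unchanged
        }
        return super.add(e);
    }

    @Override
    public boolean addAll(Collection<? extends E> c) {
        boolean changed = false;
        for (E e : c) {
            changed |= add(e);        // reuse the duplicate check above
        }
        return changed;
    }

    // Removes everything after marker, walking backwards as in the snippet above.
    void truncateAfter(E marker) {
        int idx = indexOf(marker);
        if (idx >= 0) {
            for (int i = size() - 1; i > idx; i--) {
                remove(i);            // remove(int index): removing from the end shifts nothing
            }
        }
    }

    public static void main(String[] args) {
        UniqueArrayList<String> list = new UniqueArrayList<>();
        list.addAll(Arrays.asList("B", "A", "D", "E", "C", "F", "A"));
        list.truncateAfter("D");
        System.out.println(list); // [B, A, D]
    }
}
```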
The basic problem here is that you have to maintain two data structures: a "map" one representing the key / value mapping, and a "list" one representing the insertion order.
There are "map" and "list" organizations that offer fast removal of the elements after a given point; e.g. ordered trees of various kinds and both array-based and chain-based lists (modulo the cost of locating the point).
However, it seems impossible to remove N elements from the two data structures in better than O(N). You have to visit all of the elements being removed to remove them from the 2nd data structure. (In fact, I suspect one could prove this mathematically ...)
In short, there is no data structure that has better complexity than what you are currently using.
The area where it is possible to improve performance (with a custom collection class!) is in avoiding an explicit use of an iterator. Using an iterator and the standard iterator API, the cost is O(N) on the total number of elements in the data structure. You could make this O(N) on the number of elements removed ... if the hash entry nodes also had next/prev links for the sequence.
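A rough sketch of such a custom collection, assuming a HashMap from each element to a node that carries prev/next links. The class name TruncatableLinkedSet and all its members are made up for the example; it is not how LinkedHashSet exposes its internals. Removal after a marker visits only the nodes being removed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TruncatableLinkedSet<E> {
    private static final class Node<E> {
        final E value;
        Node<E> prev, next;
        Node(E value) { this.value = value; }
    }

    private final Map<E, Node<E>> index = new HashMap<>(); // element -> its list node
    private Node<E> head, tail;

    // Appends value at the end unless it is already present.
    boolean add(E value) {
        if (index.containsKey(value)) {
            return false;
        }
        Node<E> node = new Node<>(value);
        if (tail == null) {
            head = node;
        } else {
            tail.next = node;
            node.prev = tail;
        }
        tail = node;
        index.put(value, node);
        return true;
    }

    // Removes everything inserted after marker; returns how many elements were removed.
    // Walks backwards from the tail, so the cost is proportional to the removed count.
    int removeAfter(E marker) {
        Node<E> start = index.get(marker); // O(1) expected lookup, no iteration needed
        if (start == null) {
            return 0;
        }
        int removed = 0;
        while (tail != start) {
            index.remove(tail.value);
            tail = tail.prev;
            removed++;
        }
        tail.next = null;
        return removed;
    }

    // Snapshot of the elements in insertion order.
    List<E> toList() {
        List<E> out = new ArrayList<>();
        for (Node<E> n = head; n != null; n = n.next) {
            out.add(n.value);
        }
        return out;
    }
}
```

With the elements B, A, D, E, C, F added in that order, removeAfter("D") returns 3 and leaves B, A, D.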
So, after trying a couple of the things mentioned above, I chose to implement a different data structure, since the O(n) cost is not an issue for this problem (my data is very small).
I used graphs; this library came in really handy: http://jgrapht.org/
What I am doing is adding all elements as vertices to a DirectedGraph and creating edges between them (the edges helped me solve another, unrelated problem as well). When it's time to remove the elements, I use a recursive function with the following pseudocode:
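removeElements(element) {
    tempEdge = graph.getOutgoingEdgeFrom(element)
    if (tempEdge == null)   // no outgoing edge: element is the last one, nothing left to remove
        return;
    tempVertex = graph.getTargetVertex(tempEdge)
    removeElements(tempVertex)   // first remove everything that follows tempVertex
    graph.remove(tempVertex)     // then remove tempVertex itself
}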
I agree that a graph data structure is not a good fit for this kind of problem, but under my conditions this works perfectly... Cheers!
I know the following things about LinkedHashSet:
it maintains insertion order
it uses a linked list to preserve order
My question is: how does hashing come into the picture?
I understand that if hashing is used, then the concept of bucketing comes in.
However, from checking the code in the JDK, it seems that the LinkedHashSet implementation contains only constructors and no other logic, so I guess all the logic happens in HashSet?
So does HashSet use a linked list by default?
Let me put my question this way: if the objective is to write a collection that
maintains unique values
preserves insertion order using a linked list
then it can easily be done without hashing. Maybe we could call this collection LinkedSet.
I saw a similar question, "what's the difference between HashSet and LinkedHashSet", but it was not very helpful.
Let me know if I need to explain my question further.
False. The implementation of LinkedHashSet is really all in LinkedHashMap. (And the implementation of HashSet is really all in HashMap. Le gasp!)
HashSet has no linked list at all.
It's entirely possible to write a LinkedSet collection backed by a linked list, that keeps elements unique -- it's just that its performance will be pretty crappy.
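For illustration, a minimal sketch of what such a LinkedSet might look like, assuming it is backed by java.util.LinkedList. The class and its methods are made up for the example. Every add and contains call scans the whole list, which is where the poor performance comes from.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class LinkedSet<E> implements Iterable<E> {
    private final List<E> items = new LinkedList<>();

    // O(n): the whole list is scanned to rule out a duplicate.
    boolean add(E e) {
        if (items.contains(e)) {
            return false;
        }
        items.add(e);
        return true;
    }

    // O(n): linear search, versus expected O(1) for a hash-based set.
    boolean contains(Object o) {
        return items.contains(o);
    }

    // O(n): linear search for the element to unlink.
    boolean remove(Object o) {
        return items.remove(o);
    }

    int size() {
        return items.size();
    }

    // Iterates in insertion order.
    @Override
    public Iterator<E> iterator() {
        return items.iterator();
    }
}
```

Adding n distinct elements to this sketch costs O(n^2) comparisons in total, while LinkedHashSet does the same work in expected O(n).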
It's an 'interesting' implementation. The constructors for LinkedHashSet defer to package-private constructors in HashSet which set up the data structure (a LinkedHashMap) for maintaining iteration order.
HashSet(int initialCapacity, float loadFactor, boolean dummy) {
map = new LinkedHashMap<E,Object>(initialCapacity, loadFactor);
}
The API designers could simply have exposed this constructor as public, with appropriate documentation, but I guess they wanted the code to be more 'self-documenting'.
If you look closely, you will see it is actually using some package-private constructors on HashSet that are there just for it, not the regular ones, e.g.:
HashSet(int initialCapacity, float loadFactor, boolean dummy) {
map = new LinkedHashMap<E,Object>(initialCapacity, loadFactor);
}
So the keySet backing the LinkedHashSet in fact comes from a LinkedHashMap, not from a regular HashMap as in a regular HashSet. It doesn't actually use java.util.LinkedList. It just maintains pointers that form a list within the implementation of the bucket contents (Map.Entry<K,V>):
private static class Entry<K,V> extends HashMap.Entry<K,V> {
    // These fields comprise the doubly linked list used for iteration.
    Entry<K,V> before, after;

    Entry(int hash, K key, V value, HashMap.Entry<K,V> next) {
        super(hash, key, value, next);
    }
    // ... (remainder of the class omitted)
}
Hashing comes into the picture because it's an easy way to create a collection that enforces uniqueness and offers constant-time performance for most operations. Sure, we could just use a linked list and add uniqueness checking, but the time for several operations would become O(N), because you'd have to iterate over the whole list to check for duplicates.
Code Sample
Set<Registeration> registerationSet = new LinkedHashSet<>();
registerationSet.add(new Registeration());
Explanation of line 2:
1. Compute the hashCode of the Registeration object.
2. Use that hashCode to locate the bucket in registerationSet.
3. Check the shortlisted bucket for an equal object.
3.1. If an equal object is found, the set is left unchanged and add returns false; the existing element is not replaced.
3.2. If none is found, add the Registeration object's reference to the bucket.
In parallel,
A linked list maintains the insertion order of all elements.
A new reference is always added to the end.
Re-inserting an element that is already present (3.1 above) does not change its position in the order.
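A small check of how LinkedHashSet behaves when an equal element is added a second time. Strings stand in for the Registeration class, whose equals and hashCode are not shown; the class name ReinsertDemo and its helper are made up for the example. Per the Set and LinkedHashSet documentation, add returns false for an element that is already present, and re-insertion does not affect insertion order.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class ReinsertDemo {
    // Returns true if the set still holds the first of two equal-but-distinct instances.
    static boolean firstInstanceKept() {
        String first = new String("x");
        String second = new String("x");   // equal to first, but a different object
        Set<String> set = new LinkedHashSet<>();
        set.add(first);
        set.add(second);                   // rejected: an equal element already exists
        return set.iterator().next() == first;
    }

    public static void main(String[] args) {
        Set<String> set = new LinkedHashSet<>();
        set.add("first");
        set.add("second");
        boolean added = set.add("first");  // already present
        System.out.println(added);         // false
        System.out.println(set);           // [first, second]
        System.out.println(firstInstanceKept()); // true
    }
}
```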
For a Specific answer to your question
how does hashing come into picture? (in a LinkedHashSet)
What the Java Docs say...
Like HashSet, it provides constant-time performance for the basic operations (add, contains and remove), assuming the hash function disperses elements properly among the buckets.
This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order).
The buckets, accessed by hash code, are used to speed up random access, and the linked list is there so that the iterator returns elements in insertion order.
I hope I have answered your question.