I'm wondering if a HashMap uses a HashSet to store its keys. I would guess it does, because a HashMap would correspond with a HashSet, while a TreeMap would correspond with a TreeSet.
I looked at the source code for the HashMap class, and the method returns an AbstractSet that's implemented by some kind of Iterator.
Additionally...when I write
HashMap map = new HashMap();
if(map.keySet() instanceof HashSet){
System.out.println("true");
}
The above if statement never runs. Now I'm unsure
Could someone explain how the HashMap stores its keys?
You're actually asking two different questions:
Does a HashMap use a HashSet to store its keys?
Does HashMap.keySet() return a HashSet?
The answer to both questions is no, and for the same reason, but there's no technical reason preventing either 1. or 2. from being true.
A HashSet is actually a wrapper around a HashMap; HashSet has the following member variable:
private transient HashMap<E,Object> map;
It populates a PRESENT sentinel value as the value of the map when an object is added to the set.
Now a HashMap stores it's data in an array of Entry objects holding the Key, Value pairs:
transient Entry<K,V>[] table;
And it's keySet() method returns an instance of the inner class KeySet:
public Set<K> keySet() {
Set<K> ks = keySet;
return (ks != null ? ks : (keySet = new KeySet()));
}
private final class KeySet extends AbstractSet<K> {
// minimal Set implementation to access the keys of the map
}
Since KeySet is a private inner class, as far as you should be concerned it is simply an arbitrary Set implementation.
Like I said, there's no reason this has to be the case. You could absolutely implement a Map class that used a HashSet internally, and then have your Map return a HashSet from .keySet(). However this would be inefficient and difficult to code; the existing implementation is both more robust and more efficient than naive Map/Set implementations.
Code snippets taken from Oracle JDK 1.7.0_17. You can view the source of your version of Java inside the src.zip file in your Java install directory.
I'm wondering if a HashMap uses a HashSet to store its keys.
That would not work too well, because a Set only keeps track of the keys. It has no way to store the associated value mapping.
The opposite (using a Map to store Set elements) is possible, though, and this approach is being used:
HashSet is implemented by using a HashMap (with a dummy value for all keys).
The set of keys returned by HashMap#keySet is implemented by a private inner class (HashMap.KeySet extends AbstractSet).
You can study the source for both class, for example on GrepCode: HashMap and HashSet.
Could someone explain how the HashMap stores its keys?
It uses an array of buckets. Each bucket has a linked list of entries. See also
How does Java HashMap store entries internally
Hashmap and how this works behind the scene
The set that is returned by the keySet is backed by the underlying map only.
As per javadoc
Returns a Set view of the keys contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa. If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation), the results of the iteration are undefined. The set supports element removal, which removes the corresponding mapping from the map, via the Iterator.remove, Set.remove, removeAll, retainAll, and clear operations. It does not support the add or addAll operations.
Blockquote
HashMap stores keys into buckets. Keys that have same hash code goes into the same bucket. When retrieving value for an key if more than one key is found in the bucket than equals method is used to find the right key and hence the right value.
Answer is: NO.
HashMap.keySet() is a VIEW of the keys contained in this map.
The data of the map is stored in Entry[] table of HashMap.
Related
This question already has answers here:
Iteration order of HashSet
(9 answers)
Closed 5 years ago.
I was wondering, both HashMap and HashSet do not return values in order?
Please someone clarify.
I am confused and why do you need these two?
In a word - yes. Neither HashMap or HashSet give any guarantee on the order of iteration.
The HashMap API does not define the order of iteration.
However, if you look at the implementation of HashMap, you can deduce that there is a complex transient relationship between the iteration order, the keys' hash values, the order in which the keys were inserted and the size of the hashtable. This relationship gets scrambled if the hashtable resizes itself.
Please refer:
Is the order of values retrieved from a HashMap the insertion order
If you want the values in the order you insert you have to use LinkedHasmap other wise you can use TreeMap where sort by the key. hash does not give any order because it uses the hascode to order values it may vary depend on the object.
A HashMap stores key value pairs. A HashSet is an unordered collection of objects in which each there can be no repeats. Neither are necessarily iterated in order insertion or otherwise. There are ordered implementations but you would not use the standard HashMap or HashSet classes.
Talked about further here
Linked Hash Map does maintain insertion order
HashMap is a implementation of Map interface. Map is a data structure to say that A corresponds to B.
Map<Integer, String> map = new HashMap<Integer, String>();
map.put(new Integer(1), "Test");
In the case above we know that every time that we look for 1, the correspondent is "Test".
Sets are something completely different. Sets ensures that no duplicate object will exists in your collection. HashSet is an implementation of Set interface.
To insert and retrieve something in order you could use LinkedHashMap, LinkedHashSet, ArrayList (implementation of List interface) and others
Wrapping it up:
Map - correspond objects
Set - Ensure unique objects in a collection
I'm a bit confused about the internal implementation of HashSet and HashMap in java.
This is my understanding, so please correct me if I'm wrong:
Neither HashSet or HashMap allow duplicate elements.
HashSet is backed by a HashMap, so in a HashSet when we call .add(element), we are calling the hashCode() method on the element and internally doing a put(k,v) to the internal HashMap, where the key is the hashCode and the value is the actual object. So if we try to add the same object to the Set, it will see that the hashCode is already there, and then replace the old value by the new one.
But then, this seems inconsistent to me when I read how a HashMap works when storing our own objects as keys in a HashMap.
In this case we must override the hashCode() and equals() methods and make them consistent between each other, because, if we find keys with the same hashCode, they will go to the same bucket, and then to distinguish between all the entries with the same hashCode we have to iterate over the list of entries to call the method equals() on each key and find a match.
So in this case, we allow to have the same hashCode and we create a bucket containing a list for all the objects with the same hashCode, however using a HashSet, if we find already a hashCode, we replace the old value by the new value.
I'm a bit confused, could someone clarify this to me please?
You are correct regarding the behavior of HashMap, but you are wrong about the implementation of HashSet.
HashSet is backed by a HashMap internally, but the element you are adding to the HashSet is used as the key in the backing HashMap. For the value, a dummy value is used. Therefore the HashSet's contains(element) simply calls the backing HashMap's containsKey(element).
The value we insert in HashMap acts as a Key to the map object and for its value, java uses a constant variable.So in the key-value pair, all the keys will have the same value.
you can refer to this link
https://www.geeksforgeeks.org/hashset-in-java/
Hash Map:-Basically Hash map working as key and value ,if we want to store data as key and value pair then we will go to the hash map, basically when we insert data by using hash map basically internally it will follow 3 think,
1.hashcode
2..equale
3.==
when we insert the data in hash map it will store the data in bucket(fast in) by using hash code , if there is 2 data store in the same bocket then key collision will happen to resolve this key collision we use (==) method, always == method check the reference of the object, if both object hashcode is same then first one replace to second one if the hashcode is not same then hashing Collision will happen to resolve this hashing collision we will use (.equal) method .equal method basically it will check the content , if both the content is same then it will return true other wise it will return false, so in the hash map it will check is the content is same ? if the content is same then first one replace to the second one if both content is different the it will create another one object in the bocket and store the data
Hash Set:- Basically Hash Set is use to store bunch of object at a time ,internally hash set also use hash map only , when we insert somethink by using add method internally it will call put method and it will store data in the hashmap key bcz hash map key always unique and duplicate are not allowed that's way hashset also unique and duplicate are not allowed and if we entered duplicate also in hashst it will not through any exception first one will replace to the second one and in the value it will store constant data "PRESENT".
You can observe that internal hashmap object contains the element of hashset as keys and constant “PRESENT” as their value.
Where present is constant which is defined as
private static final Object present = new Object()
I cannot understand the use of HashFunction in LinkedHashMap.
In the HashMap implementation, the use of hashFunction is to find the index of the internal array, which can be justified, following the hashfunction contract (same key will must have same hashcode, but distinct key can have same hashcode).
My questions are:
1) What is the use of hashfunction in LinkedHashMap?
2) How does the put and get method works for LinkedHashMap?
3) Why does it maintains the doublylinkedlist internally?
Whats wrong in using the HashMap as internal implementation(just like HashSet) and maintain a separate Array/List of indexes of the Entry array in the sequence of insertion?
Appreciate useful response and references.
1) LinkedHashMap extends HashMap so the hashfunction is the same of HashMap (if you check the code the hash function is inherited from HashMap), i.e. the function computes a the hash of the object inserted and it use to store in a data structure together with the elements with the same key hash; the hasfunction is used in the get method to retrieve the object with the key specified as a param.
2)Put and Get method are behave the same way as HashMap plus the track the insertion order of the elements so when you iterate over the the keyset you get the key values in the order you inserted into the map (see here for more details)
3)the LinkedHashMap uses a double linked list instead of an Array because it's more compact; a double linked list is the the most efficient data structure for list where you insert and remove items; if you mostly insert/append elements then an array based implementation may be better. Since the map sematic is a key-value implementation and removing elements from the map could be a frequent operation a double linked list is a better fit. The internal implmentation could be made with a LinkedList but my opionion is that using a low level data stucture is more efficient and decouples LinkedHashMap from other classes.
A LinkedHashMap does use a HashMap (in fact it extends from it), so the hashCode is used to identify the right hash bucket in the array of hash buckets, just as for HashMap. put and get work just as for HashMap (except that the before and after references for iterating over the entries are updated differently for the two implementations).
The reason insertion order is not kept by keeping an Array or ArrayList is that addition or removal in the middle of an ArrayList is an O(n) operation because you have to move all subsequent items along one place. You could do this with a LinkedList because addition and removal in the middle of a LinkedList is O(1) (all you have to do is break a few links and make a few new ones). However there's no point using a separate LinkedList because you may as well make the Map.Entry objects reference the previous and next Entry objects, which is exactly how LinkedHashMap works.
LinkedHashMap is a good choice for a data structure where you want to be able to put and get entries with O(1) running time, but you also need the behavior of a LinkedList. The internal hashing function is what allows you put and get entries with constant-time.
Here is how you use LinkedHashMap:
Map<String, Double> linkedHashMap = new LinkedHashMap<String, String>();
linkedHashMap.put("today", "Wednesday");
linkedHashMap.put("tomorrow", "Thursday");
String today = linkedHashMap.get("today"); // today is 'Wednesday'
There are several arguments against using a simple HashMap and maintaining a separate List for the insertion order. First, if you go this route it means you will have to maintain 2 data structures instead of one. This is error prone, and makes maintaining your code more difficult. Second, if you have to make your data structure Thread-safe, this would be complex for 2 data structures. On the other hand, if you use a LinkedHashMap you only would have to worry about making this single data structure thread-safe.
As for implementation details, when you do a put into a LinkedHashMap, the JVM will take your key and use a cryptographic mapping function to ultimately convert that key into a memory address where your value will be stored. Doing a get with a given key will also use this mapping function to find the exact location in memory where the value be stored. The entrySet() method returns a Set consisting of all the keys and values in the LinkedHashMap. By definition, sets are not ordered. The entrySet() is not guaranteed to be Thread-safe.
Ans. 2)
when we call put(map,key) of linkedhashmap. Internally it calls createEntry
void createEntry(int hash, K key, V value, int bucketIndex) {
HashMap.Entry<K,V> old = table[bucketIndex];
Entry<K,V> e = new Entry<K,V>(hash, key, value, old);
table[bucketIndex] = e;
e.addBefore(header);
size++;
Ans 3)
To efficiently maintain a linkedHashmap, you actually need a doubly linked list.
Consider three entries in order
A ---> B ---> C
Suppose you want to remove B. Obviously A should now point to C. But unless you know the entry before B you cannot efficiently say which entry should now point to C. To fix this, you need entries to point in both the directions Like this
---> --->
A B C
<--- <---
This way, when you remove B you can look at the entries before and after B (A and C) and update so that A and C point to each other.
similar post in this link discussed earlier
why linkedhashmap maintains doubly linked list for iteration
What are the practical scenario for choosing among the linkedhashmap and hashmap? I have gone through working of each and come to the conclusion that linkedhashmap maintains the order of insertion i.e elements will be retrieved in the same order as that of insertion order while hashmap won't maintain order.
So can someone tell in what practical scenarios selection of one of the collection framework and why?
LinkedHashMap will iterate in the order in which the entries were put into the map.
null Values are allowed in LinkedHashMap.
The implementation is not synchronized and uses double linked buckets.
LinkedHashMap is very similar to HashMap, but it adds awareness to the order at which items are added or accessed, so the iteration order is the same as insertion order depending on construction parameters.
LinkedHashMap also provides a great starting point for creating a Cache object by overriding the removeEldestEntry() method. This lets you create a Cache object that can expire data using some criteria that you define.
Based on linked list and hashing data structures with linked list (think of indexed-SkipList) capability to store data in the way it gets inserted in the tree. Best suited to implement LRU ( least recently used ).
LinkedHashMap extends HashMap.
It maintains a linked list of the entries in the map, in the order in which they were inserted. This allows insertion-order iteration over the map. That is,when iterating through a collection-view of a LinkedHashMap, the elements will be returned in the order in which they were inserted. Also if one inserts the key again into the LinkedHashMap, the original order is retained. This allows insertion-order iteration over the map. That is, when iterating a LinkedHashMap, the elements will be returned in the order in which they were inserted. You can also create a LinkedHashMap that returns its elements in the order in which they were last accessed.
LinkedHashMap constructors
LinkedHashMap( )
This constructor constructs an empty insertion-ordered LinkedHashMap instance with the default initial capacity (16) and load factor (0.75).
LinkedHashMap(int capacity)
This constructor constructs an empty LinkedHashMap with the specified initial capacity.
LinkedHashMap(int capacity, float fillRatio)
This constructor constructs an empty LinkedHashMap with the specified initial capacity and load factor.
LinkedHashMap(Map m)
This constructor constructs a insertion-ordered Linked HashMap with the same mappings as the specified Map.
LinkedHashMap(int capacity, float fillRatio, boolean Order)
This constructor construct an empty LinkedHashMap instance with the specified initial capacity, load factor and ordering mode.
Important methods supported by LinkedHashMap
Class clear( )
Removes all mappings from the map.
containsValue(object value )>
Returns true if this map maps one or more keys to the specified value.
get(Object key)
Returns the value to which the specified key is mapped, or null if this map contains no mapping for the key.
removeEldestEntry(Map.Entry eldest)
Below is an example of how you can use LinkedHashMap:
Map<Integer, String> myLinkedHashMapObject = new LinkedHashMap<Integer, String>();
myLinkedHashMapObject.put(3, "car");
myLinkedHashMapObject.put(5, "bus");
myLinkedHashMapObject.put(7, "nano");
myLinkedHashMapObject.put(9, "innova");
System.out.println("Modification Before" + myLinkedHashMapObject);
System.out.println("Vehicle exists: " +myLinkedHashMapObject.containsKey(3));
System.out.println("vehicle innova Exists: "+myLinkedHashMapObject.containsValue("innova"));
System.out.println("Total number of vehicles: "+ myLinkedHashMapObject.size());
System.out.println("Removing vehicle 9: " + myLinkedHashMapObject.remove(9));
System.out.println("Removing vehicle 25 (does not exist): " + myLinkedHashMapObject.remove(25));
System.out.println("LinkedHashMap After modification" + myLinkedHashMapObject);
Shopping Cart is a real life example, where we see cart number against Item we have chosen in order we selected the item. So map could be LinkedHashMap<Cart Number Vs Item Chosen>
HashMap makes absolutely no guarantees about the iteration order. It can (and will) even change completely when new elements are added.
LinkedHashMap will iterate in the order in which the entries were put into the map
LinkedHashMap also requires more memory than HashMap because of this ordering feature. As I said before LinkedHashMap uses doubly LinkedList to keep order of elements.
In most cases when using a Map you don't care whether the order of insertion is maintained. Use a HashMap if you don't care, and a LinkedHashMap is you care.
However, if you look when and where maps are used, in many cases it contains only a few entries, not enough for the performance difference of the different implementations to make a difference.
LinkedHashMap maintain insertion order of keys, i.e the order in which keys are inserted into LinkedHashMap. On the other hand HashMap doesn't maintain any order or keys or values. In terms of Performance there is not much difference between HashMap and LinkedHashMap but yes LinkedHashMap has more memory foot print than HashMap to maintain doubly linked list which it uses to keep track of insertion order of keys.
A HashMap has a better performance than a LinkedHashMap because a LinkedHashMap needs the expense of maintaining the linked list. The LinkedHashMap implements a normal hashtable, but with the added benefit of the keys of the hashtable being stored as a doubly-linked list.
Both of their methods are not synchronized.
Let's take a look their API documentation:
The HashMap is a hash table with buckets in each hash slot.
API documentation:
This implementation provides constant-time performance for the basic
operations (get and put), assuming the hash function disperses the
elements properly among the buckets. Iteration over collection views
requires time proportional to the "capacity" of the HashMap instance
(the number of buckets) plus its size (the number of key-value
mappings). Thus, it's very important not to set the initial capacity
too high (or the load factor too low) if iteration performance is
important.
LinkedHashMap is a linked list implementing the map interface. As
said in the API documentation:
Hash table and linked list implementation of the Map interface, with
predictable iteration order. This implementation differs from HashMap
in that it maintains a doubly-linked list running through all of its
entries. This linked list defines the iteration ordering, which is
normally the order in which keys were inserted into the map
(insertion-order).
One way that I have used these at work are for cached backend REST queries. These also have the added benefit of returning the data in the some order for the client. You can read more about it in the oracle docs:
https://docs.oracle.com/javase/8/docs/api/java/util/LinkedHashMap.html
This technique is particularly useful if a module takes a map on input, copies it, and later returns results whose order is determined by that of the copy. (Clients generally appreciate having things returned in the same order they were presented.)
A special constructor is provided to create a linked hash map whose order of iteration is the order in which its entries were last accessed, from least-recently accessed to most-recently (access-order). This kind of map is well-suited to building LRU caches. Invoking the put, putIfAbsent, get, getOrDefault, compute, computeIfAbsent, computeIfPresent, or merge methods results in an access to the corresponding entry (assuming it exists after the invocation completes). The replace methods only result in an access of the entry if the value is replaced. The putAll method generates one entry access for each mapping in the specified map, in the order that key-value mappings are provided by the specified map's entry set iterator. No other methods generate entry accesses. In particular, operations on collection-views do not affect the order of iteration of the backing map.
I have been wondering about the Map from java.util.
Why does values() method return a Collection while the keySet and entrySet return a Set?
What's the advantages/disadvantages of a Set and Collection?
A set guarantees that a given entry can only exist in it once. A Collection doesn't. Since a Map has no uniqueness guarantees in terms of values, the set of them isn't really a set at all but would have to be a Collection.
It's not really an issue of advantages and disadvantages -- it's what the keys, values and entries of a map represent that's important.
Keys in a map are unique
The keys in a Map are unique -- that is, there aren't going to be duplicate keys in a Map. A Collection which assures that duplicates don't exist is a Set.
Therefore, the keys are returned as a Set by the keySet method.
Values in a map are not necessarily unique
On the other hand, the values of a Map does not have to be unique.
For example, we could have an entry in a map with the key "fruit" map to the value "apple", and also have another entry with key "computer" mapping to the value "apple":
map {
key:"fruit" -> value:"apple"
key:"computer" -> value:"apple"
}
Having duplicate values in a map is allowed.
Therefore, we cannot use a Set, as that necessitates that all the entries unique. A good choice for the values of a Map is to return a plain-old Collection as it does not impose any restrictions to what the values are.
Entries in a map are also unique
The entries of the Map are unique -- they are a combination of the key and value, represented by the Map.Entry object. Since this key-value pair is unique, it is returned as a Set by the entrySet method.
Further reading
Wikipedia: associative arrays (map)
Wikipedia: set
Map internally manages Set of keys because keys are unique values aren't
Returns a Set view of the keys contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa. If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation), the results of the iteration are undefined. The set supports element removal, which removes the corresponding mapping from the map, via the Iterator.remove, Set.remove, removeAll, retainAll, and clear operations. It does not support the add or addAll operations.
Also See
Javadoc
Source Code
A Set is a Collection that contains no duplicate elements. The advantage of returning a Set when possible is that it makes the guarantee of uniqueness explicit.
As already pointed out by others, values() cannot return a Set because the collection of values may contain duplicates.
values() could be duplicated, so it is Collection.
keySet() and entrySet() couldn't be duplicated, so they are Set.
ps: Set is a non-duplicated Collection.