I cannot understand the use of HashFunction in LinkedHashMap.
In the HashMap implementation, the hash function is used to find the index in the internal array, which makes sense given the hashCode contract (equal keys must have the same hash code, but distinct keys can share a hash code).
My questions are:
1) What is the use of the hash function in LinkedHashMap?
2) How do the put and get methods work for LinkedHashMap?
3) Why does it maintain a doubly linked list internally?
What's wrong with using a HashMap as the internal implementation (just like HashSet) and maintaining a separate Array/List of indexes into the Entry array in insertion order?
Appreciate useful response and references.
1) LinkedHashMap extends HashMap, so the hash function is the same as HashMap's (if you check the code, it is inherited from HashMap). The function computes a hash of the key being inserted, and that hash selects the bucket where the entry is stored, together with any other entries whose keys land in the same bucket. The same hash function is used by get to find the entry for the key passed as a parameter.
2) put and get behave the same way as in HashMap, and in addition they track the insertion order of the entries, so when you iterate over the keySet you get the keys in the order you inserted them into the map.
3) LinkedHashMap uses a doubly linked list instead of an array because a doubly linked list is the most efficient structure for a list where you insert and remove items at arbitrary positions; if you mostly append elements, an array-based implementation may be better. Since a map has key-value semantics and removing elements from it can be a frequent operation, a doubly linked list is a better fit. The internal implementation could have been built on LinkedList, but my opinion is that using a low-level data structure is more efficient and decouples LinkedHashMap from other classes.
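To make point 2 concrete, here is a minimal sketch of insertion-order iteration. The class name InsertionOrderDemo and the sample keys are made up for illustration; only the standard java.util API is used.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class InsertionOrderDemo {
    // Returns the keys in the order in which the map iterates over them.
    static List<String> keysInIterationOrder() {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("zebra", 1);
        map.put("apple", 2);
        map.put("mango", 3);
        // keySet() of a LinkedHashMap iterates in insertion order.
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysInIterationOrder()); // [zebra, apple, mango]
    }
}
```

With a plain HashMap in place of LinkedHashMap, the same code would compile, but the order of the returned keys would be unspecified.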
A LinkedHashMap does use a HashMap (in fact it extends from it), so the hashCode is used to identify the right hash bucket in the array of hash buckets, just as for HashMap. put and get work just as for HashMap (except that the before and after references for iterating over the entries are updated differently for the two implementations).
The reason insertion order is not kept by keeping an Array or ArrayList is that addition or removal in the middle of an ArrayList is an O(n) operation, because you have to move all subsequent items along one place. You could do this with a linked list, because addition and removal in the middle of a linked list is O(1) once you hold a reference to the node (all you have to do is break a few links and make a few new ones). However, there's no point using a separate LinkedList, because you may as well make the Map.Entry objects reference the previous and next Entry objects, which is exactly how LinkedHashMap works.
LinkedHashMap is a good choice for a data structure where you want to be able to put and get entries with O(1) running time, but you also need the predictable iteration order of a linked list. The internal hashing function is what allows you to put and get entries in constant time.
Here is how you use LinkedHashMap:
Map<String, String> linkedHashMap = new LinkedHashMap<String, String>();
linkedHashMap.put("today", "Wednesday");
linkedHashMap.put("tomorrow", "Thursday");
String today = linkedHashMap.get("today"); // today is 'Wednesday'
There are several arguments against using a simple HashMap and maintaining a separate List for the insertion order. First, if you go this route it means you will have to maintain 2 data structures instead of one. This is error prone, and makes maintaining your code more difficult. Second, if you have to make your data structure Thread-safe, this would be complex for 2 data structures. On the other hand, if you use a LinkedHashMap you only would have to worry about making this single data structure thread-safe.
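As a rough sketch of the "single data structure" point: one common way to get a thread-safe LinkedHashMap is to wrap it with Collections.synchronizedMap. The class and method names below (SynchronizedLinkedMapDemo, newSynchronizedMap, joinKeys) are hypothetical; note that iteration still has to be guarded manually, as the Collections documentation requires.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class SynchronizedLinkedMapDemo {
    // Wraps a LinkedHashMap so that individual calls (put, get, ...) are synchronized.
    static Map<String, String> newSynchronizedMap() {
        return Collections.synchronizedMap(new LinkedHashMap<>());
    }

    // Iteration is not atomic, so the caller must hold the map's lock while iterating.
    static String joinKeys(Map<String, String> map) {
        StringBuilder sb = new StringBuilder();
        synchronized (map) {
            for (String key : map.keySet()) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(key);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = newSynchronizedMap();
        map.put("today", "Wednesday");
        map.put("tomorrow", "Thursday");
        System.out.println(joinKeys(map)); // today tomorrow
    }
}
```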
As for implementation details, when you do a put into a LinkedHashMap, the map takes your key's hashCode() and runs it through an internal (non-cryptographic) hash function to pick the bucket in its internal array where the entry will be stored. Doing a get with a given key uses the same function to find the bucket, then compares keys with equals() to find the entry. The entrySet() method returns a Set view of all the key-value mappings in the LinkedHashMap. Sets in general are not ordered, but this view iterates in the map's order (insertion order by default). Like the map itself, the entrySet() view is not thread-safe.
Ans. 2)
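When we call put(key, value) on a LinkedHashMap, internally it ends up calling createEntry:
void createEntry(int hash, K key, V value, int bucketIndex) {
    HashMap.Entry<K,V> old = table[bucketIndex];
    // The new entry becomes the head of the bucket's hash chain
    Entry<K,V> e = new Entry<K,V>(hash, key, value, old);
    table[bucketIndex] = e;
    // Link the entry in just before the header, i.e. at the end of the doubly linked list
    e.addBefore(header);
    size++;
}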
Ans 3)
To efficiently maintain a LinkedHashMap, you actually need a doubly linked list.
Consider three entries in order
A ---> B ---> C
Suppose you want to remove B. Obviously A should now point to C. But unless you know the entry before B, you cannot efficiently say which entry should now point to C. To fix this, you need entries to point in both directions, like this:
  --->   --->
A      B      C
  <---   <---
This way, when you remove B you can look at the entries before and after B (A and C) and update so that A and C point to each other.
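The removal described above can be sketched as follows. This is only an illustration: the Node class and the append/unlink/describe helpers are hypothetical names, not the JDK's own code.

```java
final class DoublyLinkedSketch {
    static final class Node {
        final String name;
        Node before, after; // neighbours in both directions
        Node(String name) { this.name = name; }
    }

    // Links node after tail and returns node as the new tail.
    static Node append(Node tail, Node node) {
        if (tail != null) {
            tail.after = node;
            node.before = tail;
        }
        return node;
    }

    // Removes node in O(1): both neighbours are reachable from the node itself.
    static void unlink(Node node) {
        if (node.before != null) node.before.after = node.after;
        if (node.after != null) node.after.before = node.before;
        node.before = null;
        node.after = null;
    }

    // Walks forward from head and renders the chain, e.g. "A -> C".
    static String describe(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.after) {
            if (sb.length() > 0) sb.append(" -> ");
            sb.append(n.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C");
        append(append(a, b), c);
        unlink(b);
        System.out.println(describe(a)); // A -> C
    }
}
```

With only forward links, unlink would first have to walk the list from the head to find the node before B, which is O(n).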
A similar post discussed this earlier:
why linkedhashmap maintains doubly linked list for iteration
What are the practical scenarios for choosing between LinkedHashMap and HashMap? I have gone through how each works and concluded that LinkedHashMap maintains insertion order, i.e. elements are retrieved in the same order in which they were inserted, while HashMap does not maintain any order.
So can someone tell me in which practical scenarios to select one or the other, and why?
LinkedHashMap will iterate in the order in which the entries were put into the map.
null values are allowed in LinkedHashMap.
The implementation is not synchronized and runs a doubly linked list through all of its entries.
LinkedHashMap is very similar to HashMap, but it is aware of the order in which items are added or accessed, so the iteration order is either insertion order or access order, depending on the constructor parameters.
LinkedHashMap also provides a great starting point for creating a Cache object by overriding the removeEldestEntry() method. This lets you create a Cache object that can expire data using some criteria that you define.
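A minimal sketch of such a cache, assuming a simple "at most N entries" eviction rule. The class name LruCache and the field maxEntries are made up for illustration; the three-argument constructor with accessOrder = true and the removeEldestEntry hook are the standard LinkedHashMap API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries; // hypothetical size limit chosen by the caller

    LruCache(int maxEntries) {
        // accessOrder = true: iteration goes from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by put/putAll after an insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" is now the most recently used entry
        cache.put("c", 3); // evicts "b", the least recently used entry
        System.out.println(cache.keySet()); // [a, c]
    }
}
```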
Based on linked list and hashing data structures, with linked-list capability (think of an indexed skip list) to keep data in the order in which it was inserted into the map. Best suited to implementing an LRU (least recently used) cache.
LinkedHashMap extends HashMap.
It maintains a linked list of the entries in the map, in the order in which they were inserted. This allows insertion-order iteration over the map: when iterating through a collection view of a LinkedHashMap, the elements are returned in the order in which they were inserted. If a key is inserted again into the LinkedHashMap, its original position is retained. You can also create a LinkedHashMap that returns its elements in the order in which they were last accessed.
LinkedHashMap constructors
LinkedHashMap( )
This constructor constructs an empty insertion-ordered LinkedHashMap instance with the default initial capacity (16) and load factor (0.75).
LinkedHashMap(int capacity)
This constructor constructs an empty LinkedHashMap with the specified initial capacity.
LinkedHashMap(int capacity, float fillRatio)
This constructor constructs an empty LinkedHashMap with the specified initial capacity and load factor.
LinkedHashMap(Map m)
This constructor constructs an insertion-ordered LinkedHashMap with the same mappings as the specified Map.
LinkedHashMap(int capacity, float fillRatio, boolean Order)
This constructor constructs an empty LinkedHashMap instance with the specified initial capacity, load factor and ordering mode.
Important methods supported by LinkedHashMap
void clear()
Removes all mappings from the map.
boolean containsValue(Object value)
Returns true if this map maps one or more keys to the specified value.
V get(Object key)
Returns the value to which the specified key is mapped, or null if this map contains no mapping for the key.
protected boolean removeEldestEntry(Map.Entry<K,V> eldest)
Returns true if this map should remove its eldest entry.
Below is an example of how you can use LinkedHashMap:
Map<Integer, String> myLinkedHashMapObject = new LinkedHashMap<Integer, String>();
myLinkedHashMapObject.put(3, "car");
myLinkedHashMapObject.put(5, "bus");
myLinkedHashMapObject.put(7, "nano");
myLinkedHashMapObject.put(9, "innova");
System.out.println("Before modification: " + myLinkedHashMapObject);
System.out.println("Vehicle exists: " +myLinkedHashMapObject.containsKey(3));
System.out.println("vehicle innova Exists: "+myLinkedHashMapObject.containsValue("innova"));
System.out.println("Total number of vehicles: "+ myLinkedHashMapObject.size());
System.out.println("Removing vehicle 9: " + myLinkedHashMapObject.remove(9));
System.out.println("Removing vehicle 25 (does not exist): " + myLinkedHashMapObject.remove(25));
System.out.println("LinkedHashMap after modification: " + myLinkedHashMapObject);
A shopping cart is a real-life example: we see each cart number against the item chosen, in the order in which the items were selected. So the map could be a LinkedHashMap from cart number to chosen item.
HashMap makes absolutely no guarantees about the iteration order. It can (and will) even change completely when new elements are added.
LinkedHashMap will iterate in the order in which the entries were put into the map
LinkedHashMap also requires more memory than HashMap because of this ordering feature. As I said before, LinkedHashMap uses a doubly linked list to keep the order of elements.
In most cases when using a Map you don't care whether the order of insertion is maintained. Use a HashMap if you don't care, and a LinkedHashMap if you care.
However, if you look when and where maps are used, in many cases it contains only a few entries, not enough for the performance difference of the different implementations to make a difference.
LinkedHashMap maintains the insertion order of keys, i.e. the order in which keys are inserted into the LinkedHashMap. HashMap, on the other hand, doesn't maintain any order of keys or values. In terms of performance there is not much difference between HashMap and LinkedHashMap, but LinkedHashMap has a larger memory footprint than HashMap because of the doubly linked list it uses to keep track of the insertion order of keys.
A HashMap has a better performance than a LinkedHashMap because a LinkedHashMap needs the expense of maintaining the linked list. The LinkedHashMap implements a normal hashtable, but with the added benefit of the keys of the hashtable being stored as a doubly-linked list.
Neither class is synchronized.
Let's take a look at their API documentation:
The HashMap is a hash table with buckets in each hash slot.
API documentation:
This implementation provides constant-time performance for the basic
operations (get and put), assuming the hash function disperses the
elements properly among the buckets. Iteration over collection views
requires time proportional to the "capacity" of the HashMap instance
(the number of buckets) plus its size (the number of key-value
mappings). Thus, it's very important not to set the initial capacity
too high (or the load factor too low) if iteration performance is
important.
LinkedHashMap is a hash table with a linked list running through its entries, implementing the Map interface. As
said in the API documentation:
Hash table and linked list implementation of the Map interface, with
predictable iteration order. This implementation differs from HashMap
in that it maintains a doubly-linked list running through all of its
entries. This linked list defines the iteration ordering, which is
normally the order in which keys were inserted into the map
(insertion-order).
One way that I have used these at work is for cached backend REST queries. These also have the added benefit of returning the data in the same order for the client. You can read more about it in the Oracle docs:
https://docs.oracle.com/javase/8/docs/api/java/util/LinkedHashMap.html
This technique is particularly useful if a module takes a map on input, copies it, and later returns results whose order is determined by that of the copy. (Clients generally appreciate having things returned in the same order they were presented.)
A special constructor is provided to create a linked hash map whose order of iteration is the order in which its entries were last accessed, from least-recently accessed to most-recently (access-order). This kind of map is well-suited to building LRU caches. Invoking the put, putIfAbsent, get, getOrDefault, compute, computeIfAbsent, computeIfPresent, or merge methods results in an access to the corresponding entry (assuming it exists after the invocation completes). The replace methods only result in an access of the entry if the value is replaced. The putAll method generates one entry access for each mapping in the specified map, in the order that key-value mappings are provided by the specified map's entry set iterator. No other methods generate entry accesses. In particular, operations on collection-views do not affect the order of iteration of the backing map.
According to this question I have ordered a Java Map, as follows:
ValueComparator bvc = new ValueComparator(originalMap);
Map<String,Integer> sortedMap = new TreeMap<String,Integer>(bvc);
sortedMap.putAll(originalMap);
Now, I would like to extract the K most relevant values from the map, in top-K fashion. Is there a highly efficient way of doing it without iterating through the map?
P.S., some similar questions (e.g., this) ask for a solution to the top-1 retrieval problem.
No, not if you use a Map. You'd have to iterate over it.
Have you considered using a PriorityQueue? It's Java's implementation of a heap. It has efficient operations for insertion of arbitrary elements and for removal of the "minimum". You might think about doing this here. Instead of a Map, you could put them into a PriorityQueue ordered by relevance, with the most relevant as the root. Then, to extract the K most relevant, you'd just pop K elements from the PriorityQueue.
If you need the map-like property (mapping from String to Integer), then you could write a class that internally keeps everything in both a PriorityQueue and a HashMap. When you insert, you insert into both; when you remove the minimal element, you pop from the PriorityQueue, and that then tells you which element you also need to remove from your HashMap. This will still give you log-time inserts and min-removals.
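A minimal sketch of the PriorityQueue idea, assuming "relevance" is simply the Integer value stored in the map (higher is more relevant). The class name TopKDemo and the method topK are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class TopKDemo {
    // Returns the keys of the k highest-valued entries, most relevant first.
    static List<String> topK(Map<String, Integer> scores, int k) {
        // Reversed comparator: the entry with the largest value sits at the root.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue().reversed());
        heap.addAll(scores.entrySet());

        List<String> result = new ArrayList<>();
        for (int i = 0; i < k && !heap.isEmpty(); i++) {
            result.add(heap.poll().getKey()); // each poll is O(log n)
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("a", 3);
        scores.put("b", 10);
        scores.put("c", 7);
        scores.put("d", 1);
        System.out.println(topK(scores, 2)); // [b, c]
    }
}
```

Building the heap still touches every entry once; the saving is that extracting the top K afterwards costs O(K log n) rather than requiring a full sort.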
I read that HashMap has the following implementation:
main array
↓
[Entry] → Entry → Entry ← linked-list implementation
[Entry]
[Entry] → Entry
[Entry]
[null ]
So, it has an array of Entry objects.
Questions:
I was wondering how can an index of this array store multiple Entry objects in case of same hashCode but different objects.
How is this different from LinkedHashMap implementation? Its doubly linked list implementation of map but does it maintain an array like the above and how does it store pointers to the next and previous element?
HashMap does not maintain insertion order, hence it does not maintain any doubly linked list.
The most salient feature of LinkedHashMap is that it maintains the insertion order of key-value pairs. LinkedHashMap uses a doubly linked list to do so.
Entry of LinkedHashMap looks like this-
static class Entry<K, V> {
    K key;
    V value;
    Entry<K,V> next;
    Entry<K,V> before, after; // For maintaining insertion order

    public Entry(K key, V value, Entry<K,V> next) {
        this.key = key;
        this.value = value;
        this.next = next;
    }
}
By using before and after, we keep track of each newly added entry in the LinkedHashMap, which helps us maintain insertion order.
before refers to the previous entry and
after refers to the next entry in the LinkedHashMap.
For diagrams and step by step explanation please refer http://www.javamadesoeasy.com/2015/02/linkedhashmap-custom-implementation.html
Thanks..!!
So, it has an array of Entry objects.
Not exactly. It has an array of Entry object chains. A HashMap.Entry object has a next field allowing the Entry objects to be chained as a linked list.
I was wondering how can an index of this array store multiple Entry objects in case of same hashCode but different objects.
Because (as the picture in your question shows) the Entry objects are chained.
How is this different from LinkedHashMap implementation? Its doubly linked list implementation of map but does it maintain an array like the above and how does it store pointers to the next and previous element?
In the LinkedHashMap implementation, the LinkedHashMap.Entry class extends the HashMap.Entry class, by adding before and after fields. These fields are used to assemble the LinkedHashMap.Entry objects into an independent doubly-linked list that records the insertion order. So, in the LinkedHashMap class, each entry object is in two distinct chains:
There are a number of singly linked hash chains that are accessed via the main hash array. These are used for (regular) hashmap lookups.
There is a separate doubly linked list that contains all of the entry objects. It is kept in entry insertion order, and is used when you iterate the entries, keys or values in the hashmap.
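The two chains can be sketched in a toy map. This is a simplified illustration, not the JDK code: TinyLinkedHashMap and its members are hypothetical names, the table has a fixed size of 16, and null keys, resizing and removal are left out.

```java
import java.util.ArrayList;
import java.util.List;

final class TinyLinkedHashMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;          // chain 1: next entry in the same bucket
        Entry<K, V> before, after; // chain 2: neighbours in insertion order
        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];
    private Entry<K, V> head, tail; // ends of the insertion-order list

    private int indexFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % table.length;
    }

    // Returns the previous value, or null if the key was not present.
    V put(K key, V value) {
        int i = indexFor(key);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.key.equals(key)) { // existing key: replace value, keep position
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        Entry<K, V> e = new Entry<>(key, value, table[i]);
        table[i] = e;               // new head of the bucket's hash chain
        if (tail == null) {
            head = e;
        } else {
            tail.after = e;
            e.before = tail;
        }
        tail = e;                   // appended to the insertion-order list
        return null;
    }

    // Lookup only walks the hash chain of one bucket.
    V get(Object key) {
        for (Entry<K, V> e = table[indexFor(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    // Iteration only walks the doubly linked list, never the table.
    List<K> keys() {
        List<K> out = new ArrayList<>();
        for (Entry<K, V> e = head; e != null; e = e.after) out.add(e.key);
        return out;
    }

    public static void main(String[] args) {
        TinyLinkedHashMap<String, Integer> m = new TinyLinkedHashMap<>();
        m.put("b", 2);
        m.put("a", 1);
        m.put("q", 3); // "a" and "q" share a bucket in a 16-slot table
        System.out.println(m.keys()); // [b, a, q]
    }
}
```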
Take a look for yourself. For future reference, you can just google:
java LinkedHashMap source
HashMap uses a linked chain of entries in each bucket to handle collisions, but the difference between HashMap and LinkedHashMap is that LinkedHashMap has a predictable iteration order, which is achieved through an additional doubly-linked list that usually maintains the insertion order of the keys. Re-inserting an existing key does not change this: the key keeps its original position in the list.
For reference, iterating through a LinkedHashMap is more efficient than iterating through a HashMap, but LinkedHashMap is less memory efficient.
In case it wasn't clear from my above explanation, the hashing process is the same, so you get the benefits of a normal hash, but you also get the iteration benefits stated above, since a doubly linked list maintains the ordering of your Entry objects. That list is independent of the linked chains used for collisions during hashing.
EDIT: (in response to OP's comment):
A HashMap is backed by an array, in which some slots contain chains of Entry objects to handle the collisions. To iterate through all of the (key,value) pairs, you would need to go through all of the slots in the array and then go through the chains; hence, your overall time would be proportional to the capacity plus the size.
When using a LinkedHashMap, all you need to do is traverse through the doubly-linked list, so the overall time is proportional to the size.
Since none of the other answers actually explain how something like this could be implemented I'll give it a shot.
One way would be to have some extra information in the value (of the key->value pair), not visible to the user, that holds a reference to the previous and next element inserted into the hash map. The benefits are as follows. You can still delete elements in constant time, because removing from a hash map is constant time and removing from a linked list is too in this case, since you already have a reference to the entry. You can still insert in constant time, because hash map insert is constant; linked list insert normally isn't, but here you have constant-time access to the right spot in the linked list. Lastly, retrieval is constant time because you only have to deal with the hash map part of the structure.
Keep in mind that a data structure like this does not come without costs. The size of the hash map will rise significantly because of all the extra references. Each of the main methods will be slightly slower (could matter if they are called repeatedly). And the indirection of the data structure (not sure if that's a real term :P) is increased, though this might not be as big a deal because the references are guaranteed to be pointing to stuff inside the hash map.
Since the only advantage of this type of structure is that it preserves order be careful when you use it. Also when reading the answer keep in mind I don't know that this is the way it's implemented but it is how I would do it if given the task.
In the Oracle docs there is a quote confirming some of my guesses.
This implementation differs from HashMap in that it maintains a doubly-linked list running through all of its entries.
Another relevant quote from the same website.
This class provides all of the optional Map operations, and permits null elements. Like HashMap, it provides constant-time performance for the basic operations (add, contains and remove), assuming the hash function disperses elements properly among the buckets. Performance is likely to be just slightly below that of HashMap, due to the added expense of maintaining the linked list, with one exception: Iteration over the collection-views of a LinkedHashMap requires time proportional to the size of the map, regardless of its capacity. Iteration over a HashMap is likely to be more expensive, requiring time proportional to its capacity.
hashCode will be mapped to a bucket by the hash function. If two keys collide in the same bucket, then HashMap resolves the collision by chaining, i.e. it adds the entry to that bucket's linked list. Below is the code which does this:
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
        V oldValue = e.value;
        e.value = value;
        e.recordAccess(this);
        return oldValue;
    }
}
You can clearly see that it traverses the linked list and, if it finds the key, replaces the old value with the new one; otherwise a new entry is added to the bucket's linked list.
But the difference between LinkedHashMap and HashMap is LinkedHashMap maintains the insertion order. From docs:
This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order). Note that insertion order is not affected if a key is re-inserted into the map. (A key k is reinserted into a map m if m.put(k, v) is invoked when m.containsKey(k) would return true immediately prior to the invocation).
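The re-insertion rule from the docs can be checked with a short sketch (the class name ReinsertDemo and the sample keys are made up for illustration):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReinsertDemo {
    // Puts three keys, re-puts the first one, and returns the resulting key order.
    static List<String> keysAfterReinsert() {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("first", 1);
        map.put("second", 2);
        map.put("third", 3);
        map.put("first", 100); // key already present: value changes, position does not
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysAfterReinsert()); // [first, second, third]
    }
}
```

This holds for the default insertion-ordered mode; a map constructed with accessOrder = true would move "first" to the end instead.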
I know the following things about LinkedHashSet:
it maintains insertion order
it uses a linked list to preserve order
My question is: how does hashing come into the picture?
I understand that if hashing is used, then the concept of bucketing comes in.
However, from checking the code in the JDK, it seems that the LinkedHashSet implementation contains only constructors and no other implementation, so I guess all the logic happens in HashSet?
So does HashSet use a linked list by default?
Let me put my question this way: if the objective is to write a collection that
maintains unique values
preserves insertion order using a linked list
then it can easily be done without hashing; maybe we can call this collection LinkedSet.
I saw a similar question, what's the difference between HashSet and LinkedHashSet, but it was not very helpful.
Let me know if I need to explain my question more.
False. The implementation of LinkedHashSet is really all in LinkedHashMap. (And the implementation of HashSet is really all in HashMap. Le gasp!)
HashSet has no linked list at all.
It's entirely possible to write a LinkedSet collection backed by a linked list, that keeps elements unique -- it's just that its performance will be pretty crappy.
It's an 'interesting' implementation. The constructors for LinkedHashSet defer to package-private constructors in HashSet which set up the data structure (a LinkedHashMap) for maintaining iteration order.
HashSet(int initialCapacity, float loadFactor, boolean dummy) {
    map = new LinkedHashMap<E,Object>(initialCapacity, loadFactor);
}
The API designers could simply have exposed this constructor as public, with appropriate documentation, but I guess they wanted the code to be more 'self-documenting'.
If you look closely, you will see it is actually using some package-private constructors on HashSet that are there just for it, not the regular ones, e.g.,
HashSet(int initialCapacity, float loadFactor, boolean dummy) {
    map = new LinkedHashMap<E,Object>(initialCapacity, loadFactor);
}
So the keySet being used to back the LinkedHashSet is in fact coming from the implementation of LinkedHashMap, not a regular HashMap like a regular HashSet. It doesn't actually use java.util.LinkedList. It just maintains pointers that form a list within the implementation of the bucket contents (Map.Entry<K,V>)
private static class Entry<K,V> extends HashMap.Entry<K,V> {
    // These fields comprise the doubly linked list used for iteration.
    Entry<K,V> before, after;

    Entry(int hash, K key, V value, HashMap.Entry<K,V> next) {
        super(hash, key, value, next);
    }
    // (rest of the class not shown)
}
Hashing comes into the picture because it's an easy way to create a collection that enforces uniqueness and offers constant-time performance for most operations. Sure, we could just use a linked list and add uniqueness checking, but the time for several operations would become O(N) because you'd have to iterate the whole list to check for duplicates.
Code Sample
Set<Registeration> registerationSet = new LinkedHashSet<>();
registerationSet.add(new Registeration());
Explanation of line 2:
1. compute the hashCode for the Registeration object
2. use the hashCode to locate the bucket in registerationSet
3. check for an equal object in that bucket
3.1. if an equal object is found, the set is left unchanged and add returns false
3.2. if not found, add the Registeration object's reference to the bucket
In parallel,
a linked list maintains the insertion order of all elements
a new reference is always added to the end
if an equal element is already present (3.1 above), its original position in the order is kept.
For a Specific answer to your question
how does hashing come into picture? (in a LinkedHashSet)
What the Java Docs say...
Like HashSet, it provides constant-time performance for the basic operations (add, contains and remove), assuming the hash function disperses elements properly among the buckets.
This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order).
The buckets accessed by hash code are used to speed up random access, and the linked list is there for returning an iterator which spits out elements in insertion order.
Hope I have answered your question?
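A short sketch of both roles in one place: hashing decides whether add finds a duplicate, and the linked list decides the iteration order. The class name LinkedHashSetDemo and the sample elements are made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class LinkedHashSetDemo {
    // Adds "b", "a", then "b" again, and reports what each step did.
    static String run() {
        Set<String> set = new LinkedHashSet<>();
        boolean firstAdd = set.add("b");     // new element: returns true
        set.add("a");
        boolean duplicateAdd = set.add("b"); // already present: returns false, set unchanged
        return firstAdd + " " + duplicateAdd + " " + set;
    }

    public static void main(String[] args) {
        System.out.println(run()); // true false [b, a]
    }
}
```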
I was wondering if the Collection view of the values contained in a HashMap is kept ordered when the HashMap changes.
For example, suppose I have a HashMap whose values() method returns L={a, b, c}.
What happens to L if I add a new element "d" to the map?
Is it added at the end, i.e. if I iterate through the elements, is the order kept?
In particular, if the addition of the new element "d" causes a rehash, will the order be kept in L?
Many thanks!
I was wondering if the Collection view of the values contained in a HashMap is kept ordered when the HashMap changes.
No, there is no such guarantee.
If this were the case, then the following program would be guaranteed to output an ordered sequence from 0 to 99
HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
for (int i = 0; i < 100; i++)
map.put(i, i);
System.out.println(map.values());
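(and nothing in the HashMap contract guarantees that it does; small Integer keys can happen to come out in order on some JDKs, but that is an implementation accident).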
There is a class that does precisely what you're asking for, and that is LinkedHashMap:
Hash table and linked list implementation of the Map interface, with predictable iteration order. This implementation differs from HashMap in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order).
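Applied to the question, a small sketch: the values() view of a LinkedHashMap is live, and an entry added later shows up at the end. The class name ValuesViewDemo and the sample data are made up for illustration.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

final class ValuesViewDemo {
    // Takes the values() view first, then adds "d", and renders the view afterwards.
    static String run() {
        Map<Integer, String> map = new LinkedHashMap<>();
        map.put(3, "a");
        map.put(1, "b");
        map.put(2, "c");
        Collection<String> values = map.values(); // a view backed by the map, not a copy
        map.put(0, "d");                          // appears at the end of the view
        return values.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // [a, b, c, d]
    }
}
```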
If it doesn't say it in the JavaDoc then there are no guarantees about it. Different versions of Java could do different things. Don't depend on undocumented behaviour.
You might want to look at LinkedHashMap.
HashMap in Java aren't ordered, so I think it will be safe to say that values() won't return an ordered Collection.
LinkedHashMap is an ordered version of HashMap (insertion order), but I don't know if values() will return an ordered Collection. I think the best thing is to try.
Generally there is no guarantee of order when you are using HashMap. The order might match the order in which you added elements for a few elements, but it can get reshuffled when the table is resized or when collisions have to be resolved.