When we are adding an entry or retrieving an entry to the HashMap, the equals() on the key would have been sufficient to find it in the particular index of the bucket. Why is hash also stored and checked?
You are correct that there is no need to recalculate the hash values for keys (in entries) when doing operations like lookup and removal. They are stored for performance related reasons.
Calculating the hash value for a key can be expensive.
When the ratio of entries to the size of a main array exceeds a certain value (the "load factor") the HashMap implementation expands the array. When this happens, all existing entries need to be redistributed to new hash chains ... based on the entry keys' has values. The hash values are stored in the Entry objects so that they doesn't have to be recalculated each time the entries need to be redistributed.
Having stored the hashcode values, they could also be used to accelerate lookup ...
// Version #1
if (node.key.equals(keyToTest)) {
...
}
// Versions #2
if (node.hashValue == hashValueToTest && node.key.equals(keyToTest)) {
...
}
If the key.equals method is expensive, then you can save some time (on average) by avoiding calling it when the hash values don't match. (But when they do match, the call must be made anyway!)
So, really, there are TWO reasons why the hash values are stored.
HashMap is an implementation of Map which maintains a table of entries, with references to the associated keys and values, organized according to their hash code. This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.
Please see:
Hashmap and how this works behind the scene
While it is correct that equals() would be sufficient to (ultimately) find the key, this would not provide constant-time performance.
There are other implementations of Map which do not use hash codes.
Related
I want to know how hashtable orders its values after using Put method.
For example:
a b c d e
Normal 2 weeks Next Save and Finish Go to Cases
hashtable.put("a","Normal"); ...
The order of the values will be different and not in the same order we put.
I think the order will be like this:
b a e c d
2 weeks Normal Go to Cases Next Save and Finish
Please suggest data structures that solve the problem.
Thanks.
As very often in these cases, the answer is in the documentation:
This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
Similar to HashMap, HashTable also does not guarantee insertion order for the elements.
Reason
HashTable is optimized for fast look up. This is achieved by calculating hash for key values stored. This ensures that searching for any value in HashTable is O(1), takes the same time irrespective the number entries in HashTable.
Thus the entries are stored based on the hash generated for key. This is the reason why HashTable does not guarantee the order of the elements in which they were inserted.
A hash value (or simply hash), also called a message digest, is a number generated from a string of text. The hash is substantially smaller than the text itself, and is generated by a formula in such a way that it is extremely unlikely that some other text will produce the same hash value.
http://www.webopedia.com/TERM/H/hashing.html
http://interactivepython.org/runestone/static/pythonds/SortSearch/Hashing.html
As previously explained, hashtable iterate order is just casual. If you want to preserve inserted order use LinkedHashMap. If you want to obtain natural order or a predefined one, use TreeMap. As natural order I mean the order of key, for example String, Integer, Long and so on, as implement Comparable interface, are automatically sorted as any other class that implements Comparable. Predefined order can be supplied by a Comparator too, creating the TreeMap.
I cannot understand the use of HashFunction in LinkedHashMap.
In the HashMap implementation, the use of hashFunction is to find the index of the internal array, which can be justified, following the hashfunction contract (same key will must have same hashcode, but distinct key can have same hashcode).
My questions are:
1) What is the use of hashfunction in LinkedHashMap?
2) How does the put and get method works for LinkedHashMap?
3) Why does it maintains the doublylinkedlist internally?
Whats wrong in using the HashMap as internal implementation(just like HashSet) and maintain a separate Array/List of indexes of the Entry array in the sequence of insertion?
Appreciate useful response and references.
1) LinkedHashMap extends HashMap so the hashfunction is the same of HashMap (if you check the code the hash function is inherited from HashMap), i.e. the function computes a the hash of the object inserted and it use to store in a data structure together with the elements with the same key hash; the hasfunction is used in the get method to retrieve the object with the key specified as a param.
2)Put and Get method are behave the same way as HashMap plus the track the insertion order of the elements so when you iterate over the the keyset you get the key values in the order you inserted into the map (see here for more details)
3)the LinkedHashMap uses a double linked list instead of an Array because it's more compact; a double linked list is the the most efficient data structure for list where you insert and remove items; if you mostly insert/append elements then an array based implementation may be better. Since the map sematic is a key-value implementation and removing elements from the map could be a frequent operation a double linked list is a better fit. The internal implmentation could be made with a LinkedList but my opionion is that using a low level data stucture is more efficient and decouples LinkedHashMap from other classes.
A LinkedHashMap does use a HashMap (in fact it extends from it), so the hashCode is used to identify the right hash bucket in the array of hash buckets, just as for HashMap. put and get work just as for HashMap (except that the before and after references for iterating over the entries are updated differently for the two implementations).
The reason insertion order is not kept by keeping an Array or ArrayList is that addition or removal in the middle of an ArrayList is an O(n) operation because you have to move all subsequent items along one place. You could do this with a LinkedList because addition and removal in the middle of a LinkedList is O(1) (all you have to do is break a few links and make a few new ones). However there's no point using a separate LinkedList because you may as well make the Map.Entry objects reference the previous and next Entry objects, which is exactly how LinkedHashMap works.
LinkedHashMap is a good choice for a data structure where you want to be able to put and get entries with O(1) running time, but you also need the behavior of a LinkedList. The internal hashing function is what allows you put and get entries with constant-time.
Here is how you use LinkedHashMap:
Map<String, Double> linkedHashMap = new LinkedHashMap<String, String>();
linkedHashMap.put("today", "Wednesday");
linkedHashMap.put("tomorrow", "Thursday");
String today = linkedHashMap.get("today"); // today is 'Wednesday'
There are several arguments against using a simple HashMap and maintaining a separate List for the insertion order. First, if you go this route it means you will have to maintain 2 data structures instead of one. This is error prone, and makes maintaining your code more difficult. Second, if you have to make your data structure Thread-safe, this would be complex for 2 data structures. On the other hand, if you use a LinkedHashMap you only would have to worry about making this single data structure thread-safe.
As for implementation details, when you do a put into a LinkedHashMap, the JVM will take your key and use a cryptographic mapping function to ultimately convert that key into a memory address where your value will be stored. Doing a get with a given key will also use this mapping function to find the exact location in memory where the value be stored. The entrySet() method returns a Set consisting of all the keys and values in the LinkedHashMap. By definition, sets are not ordered. The entrySet() is not guaranteed to be Thread-safe.
Ans. 2)
when we call put(map,key) of linkedhashmap. Internally it calls createEntry
void createEntry(int hash, K key, V value, int bucketIndex) {
HashMap.Entry<K,V> old = table[bucketIndex];
Entry<K,V> e = new Entry<K,V>(hash, key, value, old);
table[bucketIndex] = e;
e.addBefore(header);
size++;
Ans 3)
To efficiently maintain a linkedHashmap, you actually need a doubly linked list.
Consider three entries in order
A ---> B ---> C
Suppose you want to remove B. Obviously A should now point to C. But unless you know the entry before B you cannot efficiently say which entry should now point to C. To fix this, you need entries to point in both the directions Like this
---> --->
A B C
<--- <---
This way, when you remove B you can look at the entries before and after B (A and C) and update so that A and C point to each other.
similar post in this link discussed earlier
why linkedhashmap maintains doubly linked list for iteration
What are the practical scenario for choosing among the linkedhashmap and hashmap? I have gone through working of each and come to the conclusion that linkedhashmap maintains the order of insertion i.e elements will be retrieved in the same order as that of insertion order while hashmap won't maintain order.
So can someone tell in what practical scenarios selection of one of the collection framework and why?
LinkedHashMap will iterate in the order in which the entries were put into the map.
null Values are allowed in LinkedHashMap.
The implementation is not synchronized and uses double linked buckets.
LinkedHashMap is very similar to HashMap, but it adds awareness to the order at which items are added or accessed, so the iteration order is the same as insertion order depending on construction parameters.
LinkedHashMap also provides a great starting point for creating a Cache object by overriding the removeEldestEntry() method. This lets you create a Cache object that can expire data using some criteria that you define.
Based on linked list and hashing data structures with linked list (think of indexed-SkipList) capability to store data in the way it gets inserted in the tree. Best suited to implement LRU ( least recently used ).
LinkedHashMap extends HashMap.
It maintains a linked list of the entries in the map, in the order in which they were inserted. This allows insertion-order iteration over the map. That is,when iterating through a collection-view of a LinkedHashMap, the elements will be returned in the order in which they were inserted. Also if one inserts the key again into the LinkedHashMap, the original order is retained. This allows insertion-order iteration over the map. That is, when iterating a LinkedHashMap, the elements will be returned in the order in which they were inserted. You can also create a LinkedHashMap that returns its elements in the order in which they were last accessed.
LinkedHashMap constructors
LinkedHashMap( )
This constructor constructs an empty insertion-ordered LinkedHashMap instance with the default initial capacity (16) and load factor (0.75).
LinkedHashMap(int capacity)
This constructor constructs an empty LinkedHashMap with the specified initial capacity.
LinkedHashMap(int capacity, float fillRatio)
This constructor constructs an empty LinkedHashMap with the specified initial capacity and load factor.
LinkedHashMap(Map m)
This constructor constructs a insertion-ordered Linked HashMap with the same mappings as the specified Map.
LinkedHashMap(int capacity, float fillRatio, boolean Order)
This constructor construct an empty LinkedHashMap instance with the specified initial capacity, load factor and ordering mode.
Important methods supported by LinkedHashMap
Class clear( )
Removes all mappings from the map.
containsValue(object value )>
Returns true if this map maps one or more keys to the specified value.
get(Object key)
Returns the value to which the specified key is mapped, or null if this map contains no mapping for the key.
removeEldestEntry(Map.Entry eldest)
Below is an example of how you can use LinkedHashMap:
Map<Integer, String> myLinkedHashMapObject = new LinkedHashMap<Integer, String>();
myLinkedHashMapObject.put(3, "car");
myLinkedHashMapObject.put(5, "bus");
myLinkedHashMapObject.put(7, "nano");
myLinkedHashMapObject.put(9, "innova");
System.out.println("Modification Before" + myLinkedHashMapObject);
System.out.println("Vehicle exists: " +myLinkedHashMapObject.containsKey(3));
System.out.println("vehicle innova Exists: "+myLinkedHashMapObject.containsValue("innova"));
System.out.println("Total number of vehicles: "+ myLinkedHashMapObject.size());
System.out.println("Removing vehicle 9: " + myLinkedHashMapObject.remove(9));
System.out.println("Removing vehicle 25 (does not exist): " + myLinkedHashMapObject.remove(25));
System.out.println("LinkedHashMap After modification" + myLinkedHashMapObject);
Shopping Cart is a real life example, where we see cart number against Item we have chosen in order we selected the item. So map could be LinkedHashMap<Cart Number Vs Item Chosen>
HashMap makes absolutely no guarantees about the iteration order. It can (and will) even change completely when new elements are added.
LinkedHashMap will iterate in the order in which the entries were put into the map
LinkedHashMap also requires more memory than HashMap because of this ordering feature. As I said before LinkedHashMap uses doubly LinkedList to keep order of elements.
In most cases when using a Map you don't care whether the order of insertion is maintained. Use a HashMap if you don't care, and a LinkedHashMap is you care.
However, if you look when and where maps are used, in many cases it contains only a few entries, not enough for the performance difference of the different implementations to make a difference.
LinkedHashMap maintain insertion order of keys, i.e the order in which keys are inserted into LinkedHashMap. On the other hand HashMap doesn't maintain any order or keys or values. In terms of Performance there is not much difference between HashMap and LinkedHashMap but yes LinkedHashMap has more memory foot print than HashMap to maintain doubly linked list which it uses to keep track of insertion order of keys.
A HashMap has a better performance than a LinkedHashMap because a LinkedHashMap needs the expense of maintaining the linked list. The LinkedHashMap implements a normal hashtable, but with the added benefit of the keys of the hashtable being stored as a doubly-linked list.
Both of their methods are not synchronized.
Let's take a look their API documentation:
The HashMap is a hash table with buckets in each hash slot.
API documentation:
This implementation provides constant-time performance for the basic
operations (get and put), assuming the hash function disperses the
elements properly among the buckets. Iteration over collection views
requires time proportional to the "capacity" of the HashMap instance
(the number of buckets) plus its size (the number of key-value
mappings). Thus, it's very important not to set the initial capacity
too high (or the load factor too low) if iteration performance is
important.
LinkedHashMap is a linked list implementing the map interface. As
said in the API documentation:
Hash table and linked list implementation of the Map interface, with
predictable iteration order. This implementation differs from HashMap
in that it maintains a doubly-linked list running through all of its
entries. This linked list defines the iteration ordering, which is
normally the order in which keys were inserted into the map
(insertion-order).
One way that I have used these at work are for cached backend REST queries. These also have the added benefit of returning the data in the some order for the client. You can read more about it in the oracle docs:
https://docs.oracle.com/javase/8/docs/api/java/util/LinkedHashMap.html
This technique is particularly useful if a module takes a map on input, copies it, and later returns results whose order is determined by that of the copy. (Clients generally appreciate having things returned in the same order they were presented.)
A special constructor is provided to create a linked hash map whose order of iteration is the order in which its entries were last accessed, from least-recently accessed to most-recently (access-order). This kind of map is well-suited to building LRU caches. Invoking the put, putIfAbsent, get, getOrDefault, compute, computeIfAbsent, computeIfPresent, or merge methods results in an access to the corresponding entry (assuming it exists after the invocation completes). The replace methods only result in an access of the entry if the value is replaced. The putAll method generates one entry access for each mapping in the specified map, in the order that key-value mappings are provided by the specified map's entry set iterator. No other methods generate entry accesses. In particular, operations on collection-views do not affect the order of iteration of the backing map.
I am trying to figure something out about hashing in java.
If i want to store some data in a hashmap for example, will it have some kind of underlying hashtable with the hashvalues?
Or if someone could give a good and simple explanation of how hashing work, I would really appreciate it.
HashMap is basically implemented internally as an array of Entry[]. If you understand what is linkedList, this Entry type is nothing but a linkedlist implementation. This type actually stores both key and value.
To insert an element into the array, you need index. How do you calculate index? This is where hashing function(hashFunction) comes into picture. Here, you pass an integer to this hashfunction. Now to get this integer, java gives a call to hashCode method of the object which is being added as a key in the map. This concept is called preHashing.
Now once the index is known, you place the element on this index. This is basically called as BUCKET , so if element is inserted at Entry[0], you say that it falls under bucket 0.
Now assume that the hashFunction returns you same index say 0, for another object that you wanted to insert as a key in the map. This is where equals method is called and if even equals returns true, it simple means that there is a hashCollision. So under this case, since Entry is a linkedlist implmentation, on this index itself, on the already available entry at this index, you add one more node(Entry) to this linkedlist. So bottomline, on hashColission, there are more than one elements at a perticular index through linkedlist.
The same case is applied when you are talking about getting a key from map. Based on index returned by hashFunction, if there is only one entry, that entry is returned otherwise on linkedlist of entries, equals method is called.
Hope this helps with the internals of how it works :)
Hash values in Java are provided by objects through the implementation of public int hashCode() which is declared in Object class and it is implemented for all the basic data types. Once you implement that method in your custom data object then you don't need to worry about how these are used in miscellaneous data structures provided by Java.
A note: implementing that method requires also to have public boolean equals(Object o) implemented in a consistent manner.
If i want to store some data in a hashmap for example, will it have some kind of underlying hashtable with the hashvalues?
A HashMap is a form of hash table (and HashTable is another). They work by using the hashCode() and equals(Object) methods provided by the HashMaps key type. Depending on how you want you keys to behave, you can use the hashCode / equals methods implemented by java.lang.Object ... or you can override them.
Or if someone could give a good and simple explanation of how hashing work, I would really appreciate it.
I suggest you read the Wikipedia page on Hash Tables to understand how they work. (FWIW, the HashMap and HashTable classes use "separate chaining with linked lists", and some other tweaks to optimize average performance.)
A hash function works by turning an object (i.e. a "key") into an integer. How it does this is up to the implementor. But a common approach is to combine hashcodes of the object's fields something like this:
hashcode = (..((field1.hashcode * prime) + field2.hashcode) * prime + ...)
where prime is a smallish prime number like 31. The key is that you get a good spread of hashcode values for different keys. What you DON'T want is lots of keys all hashing to the same value. That causes "collisions" and is bad for performance.
When you implement the hashcode and equals methods, you need to do it in a way that satisfies the following constraints for the hash table to work correctly:
1. O1.equals(o2) => o1.hashcode() == o2.hashcode()
2. o2.equals(o2) == o2.equals(o1)
3. The hashcode of an object doesn't change while it is a key in a hash table.
It is also worth noting that the default hashCode and equals methods provided by Object are based on the target object's identity.
"But where is the hash values stored then? It is not a part of the HashMap, so is there an array assosiated to the HashMap?"
The hash values are typically not stored. Rather they are calculated as required.
In the case of the HashMap class, the hashcode for each key is actually cached in the entry's Node.hash field. But that is a performance optimization ... to make hash chain searching faster, and to avoid recalculating hashes if / when the hash table is resized. But if you want this level of understanding, you really need to read the source code rather than asking Questions.
This is the most fundamental contract in Java: the .equals()/.hashCode() contract.
The most important part of it here is that two objects which are considered .equals() should return the same .hashCode().
The reverse is not true: objects not considered equal may return the same hash code. But it should be as rare an occurrence as possible. Consider the following .hashCode() implementation, which, while perfectly legal, is as broken an implementation as can exist:
#Override
public int hashCode() { return 42; } // legal!!
While this implementation obeys the contract, it is pretty much useless... Hence the importance of a good hash function to begin with.
Now: the Set contract stipulates that a Set should not contain duplicate elements; however, the strategy of a Set implementation is left... Well, to the implementation. You will notice, if you look at the javadoc of Map, that its keys can be retrieved by a method called .keySet(). Therefore, Map and Set are very closely related in this regard.
If we take the case of a HashSet (and, ultimately, HashMap), it relies on .equals() and .hashCode(): when adding an item, it first calculates this item's hash code, and according to this hash code, attemps to insert the item into a given bucket. In contrast, a TreeSet (and TreeMap) relies on the natural ordering of elements (see Comparable).
However, if an object is to be inserted and the hash code of this object would trigger its insertion into a non empty hash bucket (see the legal, but broken, .hashCode() implementation above), then .equals() is used to determine whether that object is really unique.
Note that, internally, a HashSet is a HashMap...
Hashing is a way to assign a unique code for any variable/object after applying any function/algorithm on its properties.
HashMap stores key-value pair in Map.Entry static nested class implementation.
HashMap works on hashing algorithm and uses hashCode() and equals() method in put and get methods.
When we call put method by passing key-value pair, HashMap uses Key hashCode() with hashing to find out
the index to store the key-value pair. The Entry is stored in the LinkedList, so if there are already
existing entry, it uses equals() method to check if the passed key already exists, if yes it overwrites
the value else it creates a new entry and store this key-value Entry.
When we call get method by passing Key, again it uses the hashCode() to find the index
in the array and then use equals() method to find the correct Entry and return it’s value.
Below image will explain these detail clearly.
The other important things to know about HashMap are capacity, load factor, threshold resizing.
HashMap initial default capacity is 16 and load factor is 0.75. Threshold is capacity multiplied
by load factor and whenever we try to add an entry, if map size is greater than threshold,
HashMap rehashes the contents of map into a new array with a larger capacity.
The capacity is always power of 2, so if you know that you need to store a large number of key-value pairs,
for example in caching data from database, it’s good idea to initialize the HashMap with correct capacity
and load factor.
How exactly hash map store data internally ... I knew it will calculate HashCode value of key and store it.If two key having same hash code it will put into same bucket. But why if "two keys are same hashMap over write" the existing one?
Well, that's what it's designed to do. It's a mapping of key/value pairs, where any key is associated with 0 or 1 value. If you put a second value for a key, the entry for that key will be replaced.
It's not based just on the hash code though - it will test the keys for equality, too. Two keys can be unequal but have the same hash code. The important thing is that two equal keys must have the same hash code.
If you want to store multiple values for a single key, you should use something like Guava's Multimap.
It will not overwrite the value if the hashCode() is same. It will overwrite only if they are equal by the equals method.
See http://en.wikipedia.org/wiki/Hash_table and http://www.docjar.com/html/api/java/util/HashMap.java.html
A hash table or hash map is an array of linked lists, keyed by hashcode.
Hashcode code main purpose is to reduce the number of invocation of equals method in the hash based collection. Same hash code need not return true for equals method. But if you say its equals is true, then it hascode should be true.
Hash functions are generallyy used for eliminating duplicate data.That is why collections type
like Hashmap not allowing to store duplicate data.
This algorithms has been used in database as well to eliminate possible duplicates while retrieval.