Java HashMap resizing - java

Let's assume we have the following code:
import java.util.HashMap;

class WrongHashCode {
    public int code = 0;

    @Override
    public int hashCode() {
        return code;
    }
}

public class Rehashing {
    public static void main(String[] args) {
        // Initial capacity is 2 and load factor is 75%
        HashMap<WrongHashCode, String> hashMap = new HashMap<>(2, 0.75f);
        WrongHashCode wrongHashCode = new WrongHashCode();
        // Put the object that is expected to be lost
        hashMap.put(wrongHashCode, "Test1");
        // Change the hash code of the same key object
        wrongHashCode.code++;
        // This put triggers a resize because the load factor threshold is crossed
        hashMap.put(wrongHashCode, "Test2");
        // Always prints 2
        System.out.println("Keys count " + hashMap.keySet().size());
    }
}
So, my question is: why, after resizing the hashMap (which, as far as I understand, involves rehashing keys), do we still have 2 keys in keySet instead of 1 (since the key object is the same for both existing KV pairs)?

So, my question is why after resizing hashMap (which, as far as I understand, involves rehashing keys)
It actually does not involve rehashing keys, at least not in the HashMap code except in certain circumstances (see below). It involves repositioning them in the map buckets. Inside HashMap is an Entry class which has the following fields:
final K key;
V value;
Entry<K,V> next;
int hash;
The hash field is the stored hash code for the key, calculated when the put(...) call is made. This means that if you change the hash code in your object, it will not affect the entry in the HashMap unless you re-put it into the map. Of course, if you change the hash code for a key, you won't even be able to find it in the HashMap, because it now has a different hash code than the one stored in the entry's hash field.
we still have 2 keys in keySet instead of 1 (since key object is same for both existing KV pairs) ?
So even though you've changed the hash for the single object, the map contains 2 entries for it, each with a different value in its hash field.
All that said, there is code inside of HashMap which may rehash the keys when a HashMap is resized – see the package protected HashMap.transfer(...) method in jdk 7 (at least). This is why the hash field above is not final. It is only used however when initHashSeedAsNeeded(...) returns true to use "alternative hashing". The following sets the threshold of number of entries where the alt-hashing is enabled:
-Djdk.map.althashing.threshold=1
With this set on the VM, I'm actually able to get the hashcode() to be called again when the resizing happens but I'm not able to get the 2nd put(...) to be seen as an overwrite. Part of the problem is that the HashMap.hash(...) method is doing an XOR with the internal hashseed which is changed when the resizing happens, but after the put(...) records the new hash code for the incoming entry.
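The behaviour described in this answer can be observed from the outside. The following is a minimal sketch, assuming an OpenJDK-style HashMap that caches each key's hash in its entry; the class names MutableKeyDemo and Key are made up for illustration. It repeats the question's scenario and then looks the key up once with the new hash code and once with the old one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: a key whose hashCode() depends on a mutable field.
class MutableKeyDemo {
    static final class Key {
        int code = 0;

        @Override
        public int hashCode() {
            return code;
        }
    }

    // Repeats the question's scenario and reports what the map holds afterwards.
    static String run() {
        Map<Key, String> map = new HashMap<>(2, 0.75f);
        Key key = new Key();
        map.put(key, "Test1");              // entry created with stored hash 0
        key.code++;                         // the existing entry still remembers hash 0
        map.put(key, "Test2");              // hash is now 1, so no existing entry matches
        String withNewHash = map.get(key);  // searches for an entry with hash 1
        key.code = 0;
        String withOldHash = map.get(key);  // searches for an entry with hash 0
        return map.size() + " " + withNewHash + " " + withOldHash;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Both values stay reachable only because the key object can be switched back and forth between its two hash codes; with a key whose old hash code cannot be restored, the first entry would remain in the map but could no longer be looked up.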

The HashMap actually caches the hashCode for each key (as a key's hashCode may be expensive to compute). So, although you changed the hashCode for an existing key, the Entry to which it is linked in the HashMap still has the old code (and hence is placed in the bucket for the old code after a resize).
You can see this for yourself in the JDK source for HashMap.resize() (or, a little easier to see, in the Java 6 code for HashMap.transfer()).

I can't tell why two of the answers rely on HashMap.transfer as an example, when that method is not present in java-8 at all. As such, I will provide my small input taking java-8 into consideration.
Entries in a HashMap are indeed re-hashed, but not in the sense you might think. A re-hash here means re-computing a hash from the value that your Key#hashCode has already provided; there is a method for that:
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
So when you compute your hashcode, HashMap will basically say "I don't trust you enough", re-hash your hashcode and potentially spread the bits better (it's actually an XOR of the upper 16 bits and the lower 16 bits).
On the other hand, when a HashMap is re-sized, the number of bins/buckets is doubled; and because the number of bins is always a power of two, an entry from a current bin will either stay in the same bucket or move to the bucket whose index is the current index plus the old number of bins. You can find a few more details on how this is done in this question.
So once a re-size happens, there is no extra re-hashing; one more bit of the hash is taken into consideration, and thus an entry might move or stay where it is. Gray's answer is correct in this sense: each Entry has the hash field, which is computed only once, the first time you put that Entry.
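To make the "stay or move by the old number of bins" rule concrete, here is a small sketch. The class name ResizeIndexDemo and the helper indexFor are made up for illustration; spread mirrors the hash(...) method quoted above, and the bucket index is assumed to be computed as hash & (length - 1).

```java
// Hypothetical helper class; spread() mirrors the hash(...) method quoted above.
class ResizeIndexDemo {
    static int spread(int hashCode) {
        return hashCode ^ (hashCode >>> 16);
    }

    // Bucket index for a table whose length is a power of two.
    static int indexFor(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap * 2;
        // All three hash codes share bucket 5 in the old table.
        for (int hashCode : new int[] {5, 21, 37}) {
            int hash = spread(hashCode);
            System.out.println(hashCode + ": " + indexFor(hash, oldCap)
                    + " -> " + indexFor(hash, newCap));
        }
    }
}
```

Only the hash code 21 has the bit with value 16 set, so it is the only one of the three that moves (from bucket 5 to bucket 5 + 16 = 21); the other two stay in bucket 5. No hashCode() call is needed for this decision, just the stored hash.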

I can't find it clearly documented, but changing a key in a way that changes its hashCode() typically breaks a HashMap.
HashMap divides entries amongst b buckets. You can imagine that a key with hash h is assigned to bucket h%b.
When it receives a new entry, it works out which bucket the entry belongs to, then checks whether an equal key already exists in that bucket. Finally, it adds the entry to the bucket, replacing any matching key.
By changing the hash code, the object wrongHashCode will be (typically, and here actually) directed to another bucket the second time around, and its first entry won't be found or removed.
In short, changing the hash of an already inserted key breaks the HashMap, and what you get after that is unpredictable but may result in (a) not finding a key or (b) finding two or more equal keys.

Because HashMap stores the elements in an internal table and incrementing the code does not affect that table:
public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
addEntry(hash, key, value, i);
return null;
}
And addEntry
void addEntry(int hash, K key, V value, int bucketIndex) {
Entry<K,V> e = table[bucketIndex];
table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
if (size++ >= threshold)
resize(2 * table.length);
}
As you can see, table[bucketIndex] = new Entry<K,V>(hash, ...) stores the hash that was computed at insertion time, so although you increment code afterwards, the change is not reflected in the stored entry.
Try making the field code to be Integer and see what happens?

Related

Why does the get method of HashMap have a FOR loop?

I am looking at the source code for HashMap in Java 7, and I see that the put method will check if any entry is already present and if it is present then it will replace the old value with the new value.
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
So, basically it means that there would always be only one entry for the given key, I have seen this by debugging as well, but if I am wrong then please correct me.
Now, since there is only one entry for a given key, why does the get method have a FOR loop, since it could have simply returned the value directly?
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
return e.value;
}
I feel the above loop is unnecessary. Please help me understand if I am wrong.
table[indexFor(hash, table.length)] is the bucket of the HashMap that may contain the key we are looking for (if it is present in the Map).
However, each bucket may contain multiple entries (either different keys having the same hashCode(), or different keys with different hashCode() that still got mapped to the same bucket), so you must iterate over these entries until you find the key you are looking for.
Since the expected number of entries in each bucket should be very small, this loop is still executed in expected O(1) time.
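A quick way to see two different keys with the same hashCode() living side by side is the well-known pair of strings "Aa" and "BB". The sketch below (the class name CollisionDemo is made up for illustration) stores both and reads them back; the lookup can only tell them apart by walking the bucket and calling equals().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class showing two distinct keys with identical hash codes.
class CollisionDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same hashCode as "Aa", but not equal to it
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode());
        Map<String, Integer> map = build();
        System.out.println(map.size() + " " + map.get("Aa") + " " + map.get("BB"));
    }
}
```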
Look at the internal working of the get method of HashMap:
public V get(Object key) {
if (key == null)
return getForNullKey();
int hash = hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)];e != null;e = e.next)
{
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
return e.value;
}
return null;
}
First, it gets the hash code of the key object that is passed in and finds the bucket location.
Then it walks the entries in that bucket; if an entry with a matching key is found, it returns the value (e.value).
If no match is found, it returns null.
Sometimes there may be a hash code collision; to resolve it, HashMap uses equals() to tell the keys apart and stores the colliding elements in a linked list in the same bucket.
Let's take an example:
Fetch the data for key vaibhav:
map.get(new Key("vaibhav"));
Steps:
1. Calculate the hash code of Key {"vaibhav"}. Suppose it is generated as 118.
2. Calculate the index using the index method; it will be 6.
3. Go to index 6 of the array and compare the first element's key with the given key. If both are equal, return the value; otherwise check for the next element, if it exists.
4. In our case the key is not found in the first element, and next of the node object is not null.
5. If next of the node is null, return null.
6. If next of the node is not null, traverse to the next element and repeat step 3 until the key is found or next is null.
The for loop is used for this retrieval process.
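The retrieval steps can be sketched on a single bucket. This is a simplified illustration, not the JDK source: the Node class, the find method and the sample keys are assumptions made for the example, and the two keys are simply placed in one chain by hand.

```java
// Hypothetical sketch, not the JDK source: a single bucket as a chain of nodes.
class ChainLookupDemo {
    static final class Node {
        final int hash;      // hash cached when the node was created
        final String key;
        final String value;
        final Node next;     // next node in the same bucket, or null

        Node(String key, String value, Node next) {
            this.hash = key.hashCode();
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Walks the chain until a node with the same hash and an equal key is found.
    static String find(Node head, String key) {
        int hash = key.hashCode();
        for (Node e = head; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                return e.value;
            }
        }
        return null; // reached the end of the chain without a match
    }

    // The key we search for is deliberately not the first node of the bucket.
    static Node sampleBucket() {
        return new Node("someone", "first", new Node("vaibhav", "second", null));
    }

    public static void main(String[] args) {
        Node bucket = sampleBucket();
        System.out.println(find(bucket, "vaibhav"));
        System.out.println(find(bucket, "missing"));
    }
}
```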
For the record, in java-8 this loop is present as well (sort of, since there are also TreeNodes):
if ((e = first.next) != null) {
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
Basically (for the case when the bin is not a tree), it iterates over the entire bin until it finds the entry it is looking for.
Looking at this implementation, you might understand why providing a good hash matters: it keeps the entries from all ending up in the same bucket, which would make the search take longer.
I think @Eran has already answered your query well and @Prashant has also made a good attempt, along with the other people who have answered, so let me explain it using an example so that the concept becomes very clear.
Concepts
Basically, what @Eran is trying to say is that in a given bucket (that is, at a given index of the array) there can be more than one entry (an Entry object), and this happens when 2 or more keys give different hashes but the same index/bucket location.
Now, in order to put an entry in the hashmap, this is what happens at a high level (read carefully, because I have gone the extra mile to explain some things which are otherwise not part of your question):
Get the hash: first, a hash is calculated for the given key (notice that this is not the hashCode; the hash is calculated from the hashCode, so as to mitigate the risk of a poorly written hash function).
Get the index: this is the index into the array, in other words the bucket. The index is calculated instead of using the hash directly because the hash could be larger than the size of the table, so this step ensures that the index is always less than the table size.
When 2 keys give different hashes but the same index, both go in the same bucket, and that is the reason the FOR loop is important.
Example
Below is a simple example I have created to demonstrate the concept to you:
public class Person {
private int id;
Person(int _id){
id = _id;
}
public int getId() {
return id;
}
public void setId(int id) {
this.id = id;
}
@Override
public int hashCode() {
return id;
}
}
Test class:
import java.util.Map;
public class HashMapHashingTest {
public static void main(String[] args) {
Person p1 = new Person(129);
Person p2 = new Person(133);
Map<Person, String> hashMap = new MyHashMap<>(2);
hashMap.put(p1, "p1");
hashMap.put(p2, "p2");
System.out.println(hashMap);
}
}
Debug screenshot:
Notice that in the above example both Person objects give different hash values (136 and 140 respectively) but the same index of 0, so both objects go in the same bucket. In the screenshot, you can see that both objects are at index 0, and the first one has its next field populated, which points to the second object.
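The numbers 136 and 140 can be reproduced by hand. The sketch below assumes the supplemental hash function of the Java 6/7 HashMap (written here from my reading of that source, so treat it as an assumption) and a table of length 2; the class name SupplementalHashDemo is made up for illustration.

```java
// Hypothetical demo class; hash() is assumed to match the Java 6/7 supplemental hash.
class SupplementalHashDemo {
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Bucket index for a table whose length is a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        for (int id : new int[] {129, 133}) {
            int hash = hash(id);
            System.out.println("hashCode=" + id + " hash=" + hash
                    + " index=" + indexFor(hash, 2));
        }
    }
}
```

Both hashes are even, so with only 2 buckets the mask keeps just the lowest bit and both keys are sent to index 0.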
Update:
Another easy way to see more than one key going into the same bucket is to create a class and override the hashCode method to always return the same int value. All objects of that class will then give the same index/bucket location, but since you have not overridden the equals method, they will not be considered the same and hence will form a list at that index/bucket location.
As a further twist, suppose you override the equals method as well and make all objects compare equal; then only one object will be present at the index/bucket location, because all objects are equal.
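Both variants can be tried out with a few lines. The class names below (ConstantHashDemo, SameBucket, AllEqual) are made up for illustration: SameBucket always returns the same hash code but keeps the default identity-based equals, while AllEqual additionally treats every instance as equal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class for keys that always land in the same bucket.
class ConstantHashDemo {
    static class SameBucket {
        @Override
        public int hashCode() {
            return 42; // every instance hashes to the same value
        }
    }

    static class AllEqual extends SameBucket {
        @Override
        public boolean equals(Object other) {
            return other instanceof AllEqual; // every instance is equal to every other
        }
    }

    // Puts three fresh keys into a map and reports how many entries survive.
    static int sizeAfterThreePuts(boolean allEqual) {
        Map<Object, String> map = new HashMap<>();
        for (int i = 0; i < 3; i++) {
            Object key = allEqual ? new AllEqual() : new SameBucket();
            map.put(key, "value" + i);
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterThreePuts(false));
        System.out.println(sizeAfterThreePuts(true));
    }
}
```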
While the other answers explain what is going on, OP's comments on those answers lead me to think a different angle of explanation is required.
Simplified example
Let's say you are going to toss 10 strings into a hash map: "A", "B", "C", "Hi", "Bye", "Yo", "Yo-yo", "Z", "1", "2"
You are using HashMap as your hash map instead of making your own hash map (good choice). Some of the stuff below will not use HashMap implementation directly but will approach it from a more theoretical and abstract point of view.
HashMap does not magically know that you are going to add 10 strings to it, nor does it know what strings you will be putting into it later. It has to provide places to put whatever you might give to it... for all it knows you are going to put 100,000 strings in it - perhaps every word in the dictionary.
Let's say that, because of the constructor argument you chose when making your new HashMap(n) that your hash map has 20 buckets. We'll call them bucket[0] through bucket[19].
map.put("A", value); Let's say that the hash value for "A" is 5. The hash map can now do bucket[5] = new Entry("A", value);
map.put("B", value); Assume hash("B") = 3. So, bucket[3] = new Entry("B", value);
map.put("C", value); - hash("C") = 19 - bucket[19] = new Entry("C", value);
map.put("Hi", value); Now here's where it gets interesting. Let's say that your hash function is such that hash("Hi") = 3. So now hash map wants to do bucket[3] = new Entry("Hi", value); We have a problem! bucket[3] is where we put the key "B", and "Hi" is definitely a different key than "B"... but they have the same hash value. We have a collision!
Because of this possibility, the HashMap is not actually implemented this way. A hash map needs to have buckets that can hold more than 1 entry in them. NOTE: I did not say more than 1 entry with the same key, as we cannot have that, but it needs to have buckets that can hold more than 1 entry of different keys. We need a bucket that can hold both "B" and "Hi".
So let's not do bucket[n] = new Entry(key, value);, but instead let bucket be of type Bucket[] instead of Entry[]. So now we do bucket[n].add( new Entry(key, value) );
So let's change to...
bucket[3].add( new Entry("B", value) );
and
bucket[3].add( new Entry("Hi", value) );
As you can see, we now have the entries for "B" and "Hi" in the same bucket. Now, when we want to get them back out, we need to loop through everything in the bucket, for example, with a for loop.
So the looping is present because of the collisions. Not collisions of key, but collisions of hash(key).
Why do we use such a crazy data structure?
You might be asking at this point, "Wait, WHAT!?! Why would we do such a weird thing like that??? Why are we using such a contrived and convoluted data structure???" The answer to that question would be...
A hash map works like this because of the properties that such a peculiar setup provides to us due to the way the math works out. If you use a good hash function which minimizes conflicts, and if you size your HashMap to have more buckets than the number of entries that you guess will be in it, then you have an optimized hash map which will be the fastest data structure for inserts and queries of complex data.
Your HashMap may be too small
Since you say you are often seeing this for-loop being iterated over with multiple elements in your debugging, that means that your HashMap might be too small. If you have a reasonable guess as to how many things you might put into it, try to set the size to be larger than that. Notice in my example above that I was inserting 10 strings but had a hash map with 20 buckets. With a good hash function, this will yield very few collisions.
Note: the above example is a simplification of the matter and takes some shortcuts for brevity. A full explanation is slightly more complicated, but everything you need to know to answer the question as asked is here.
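One of the shortcuts taken above concerns the bucket count. In OpenJDK's java.util.HashMap, the requested capacity is rounded up to a power of two, so asking for 20 would give 32 buckets rather than 20. The following sketch shows that arithmetic under this assumption; CapacityDemo and its methods are illustrative helpers, not JDK API, and very large requests (which the real class caps) are ignored.

```java
// Hypothetical helper class illustrating capacity rounding and the resize threshold.
class CapacityDemo {
    // Smallest power of two that is >= requested (for small positive values).
    static int tableSizeFor(int requested) {
        if (requested <= 1) {
            return 1;
        }
        return Integer.highestOneBit(requested - 1) << 1;
    }

    // Number of entries the table holds before a resize is triggered.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = tableSizeFor(20);
        System.out.println(capacity + " buckets, resize after "
                + threshold(capacity, 0.75f) + " entries");
    }
}
```

With the default load factor of 0.75, a map created for 20 buckets can therefore take the 10 strings from the example without resizing.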
Hash tables have buckets because the hashes of objects do not have to be unique. If the hashes of two objects are equal, the objects may or may not be equal. If the hashes are different, the objects are definitely different.
Therefore, objects with the same hashes are grouped into buckets. The for loop is used to iterate over the objects contained in such a bucket.
In fact, this means that the algorithmic complexity of finding an object in such a hash table is not strictly constant (although on average very close to it); in the worst case, when many keys share a bucket, it degrades to something between logarithmic and linear.
I would like to put it in simple words. The put method has a FOR loop to iterate over the list of keys which fall into the same bucket.
What happens when you do put the key-value pair into the hashmap:
So for every key you pass to the HashMap, it will calculate the hashCode for it.
So many keys can fall under the same hashCode bucket. Now HashMap will check if the same key is already present or not in the same bucket.
In Java 7, HashMap maintains all the keys of the same bucket in a list. So before inserting the key it will traverse through the list to check if the same key is present or not. That's why there is a FOR loop.
So in average case its time complexity: O(1) and in worst case its time complexity is O(N).

Is different hashcode ensures a different bucket will be assigned in a hashmap? [duplicate]

As per my understanding I think:
It is perfectly legal for two objects to have the same hashcode.
If two objects are equal (using the equals() method) then they have the same hashcode.
If two objects are not equal then they cannot have the same hashcode
Am I correct?
Now if am correct, I have the following question:
The HashMap internally uses the hashcode of the object. So if two objects can have the same hashcode, then how can the HashMap track which key it uses?
Can someone explain how the HashMap internally uses the hashcode of the object?
A hashmap works like this (this is a little bit simplified, but it illustrates the basic mechanism):
It has a number of "buckets" which it uses to store key-value pairs in. Each bucket has a unique number - that's what identifies the bucket. When you put a key-value pair into the map, the hashmap will look at the hash code of the key, and store the pair in the bucket of which the identifier is the hash code of the key. For example: The hash code of the key is 235 -> the pair is stored in bucket number 235. (Note that one bucket can store more then one key-value pair).
When you lookup a value in the hashmap, by giving it a key, it will first look at the hash code of the key that you gave. The hashmap will then look into the corresponding bucket, and then it will compare the key that you gave with the keys of all pairs in the bucket, by comparing them with equals().
Now you can see how this is very efficient for looking up key-value pairs in a map: by the hash code of the key the hashmap immediately knows in which bucket to look, so that it only has to test against what's in that bucket.
Looking at the above mechanism, you can also see what requirements are necessary on the hashCode() and equals() methods of keys:
If two keys are the same (equals() returns true when you compare them), their hashCode() method must return the same number. If keys violate this, then keys that are equal might be stored in different buckets, and the hashmap would not be able to find key-value pairs (because it only looks in the one bucket selected by the hash code).
If two keys are different, then it doesn't matter if their hash codes are the same or not. They will be stored in the same bucket if their hash codes are the same, and in this case, the hashmap will use equals() to tell them apart.
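A key class that meets both requirements might look like the sketch below. The names EqualKeysDemo and Point are made up for illustration; the point is that a second, separately created but equal key finds the entry stored under the first one, because equal keys produce the same hash code and are then matched with equals().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical demo class with a key type that keeps equals() and hashCode() consistent.
class EqualKeysDemo {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof Point)) {
                return false;
            }
            Point p = (Point) other;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y); // equal points always produce the same hash code
        }
    }

    // Stores a value under one Point and looks it up with a different, equal Point.
    static String lookup() {
        Map<Point, String> map = new HashMap<>();
        map.put(new Point(1, 2), "found");
        return map.get(new Point(1, 2));
    }

    public static void main(String[] args) {
        System.out.println(lookup());
    }
}
```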
Your third assertion is incorrect.
It's perfectly legal for two unequal objects to have the same hash code. It's used by HashMap as a "first pass filter" so that the map can quickly find possible entries with the specified key. The keys with the same hash code are then tested for equality with the specified key.
You wouldn't want a requirement that two unequal objects couldn't have the same hash code, as otherwise that would limit you to 2^32 possible objects. (It would also mean that different types couldn't even use an object's fields to generate hash codes, as other classes could generate the same hash.)
HashMap is an array of Entry objects.
Consider HashMap as just an array of objects.
Have a look at what this Object is:
static class Entry<K,V> implements Map.Entry<K,V> {
final K key;
V value;
Entry<K,V> next;
final int hash;
…
}
Each Entry object represents a key-value pair. The field next refers to another Entry object if a bucket has more than one Entry.
Sometimes it might happen that hash codes for 2 different objects are the same. In this case, two objects will be saved in one bucket and will be presented as a linked list.
The entry point is the more recently added object. This object refers to another object with the next field and so on. The last entry refers to null.
When you create a HashMap with the default constructor
HashMap hashMap = new HashMap();
The array is created with size 16 and the default load factor of 0.75.
Adding a new key-value pair
Calculate hashcode for the key
Calculate position hash & (arrayLength-1) where element should be placed (bucket number)
If you try to add a value with a key which has already been saved in HashMap, then value gets overwritten.
Otherwise element is added to the bucket.
If the bucket already has at least one element, a new one gets added and placed in the first position of the bucket. Its next field refers to the old element.
Deletion
Calculate hashcode for the given key
Calculate bucket number hash & (arrayLength-1)
Get a reference to the first Entry object in the bucket and by means of equals method iterate over all entries in the given bucket. Eventually we will find the correct Entry.
If a desired element is not found, return null
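In the JDK sources the bucket number is computed with a bit mask, hash & (length - 1). For a table length that is a power of two this gives the same result as the non-negative remainder of hash divided by length, including for negative hashes. A small sketch of that equivalence (the class name BucketIndexDemo and its methods are made up for illustration):

```java
// Hypothetical helper class comparing the bit mask with a non-negative modulo.
class BucketIndexDemo {
    // Mask-based index; only valid when length is a power of two.
    static int byMask(int hash, int length) {
        return hash & (length - 1);
    }

    // Equivalent non-negative remainder, shown for comparison.
    static int byFloorMod(int hash, int length) {
        return Math.floorMod(hash, length);
    }

    public static void main(String[] args) {
        int length = 16;
        for (int hash : new int[] {118, 235, -7}) {
            System.out.println(hash + " -> " + byMask(hash, length)
                    + " / " + byFloorMod(hash, length));
        }
    }
}
```

The mask is preferred because it is a single cheap operation and never produces a negative index.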
You can find excellent information at http://javarevisited.blogspot.com/2011/02/how-hashmap-works-in-java.html
To Summarize:
HashMap works on the principle of hashing
put(key, value): HashMap stores both key and value object as Map.Entry. Hashmap applies hashcode(key) to get the bucket. if there is collision ,HashMap uses LinkedList to store object.
get(key): HashMap uses Key Object's hashcode to find out bucket location and then call keys.equals() method to identify correct node in LinkedList and return associated value object for that key in Java HashMap.
Here is a rough description of HashMap's mechanism, for Java 8 version, (it might be slightly different from Java 6).
Data structures
Hash table
Hash value is calculated via hash() on key, and it decide which bucket of the hashtable to use for a given key.
Linked list (singly)
When count of elements in a bucket is small, a singly linked list is used.
Red-Black tree
When count of elements in a bucket is large, a red-black tree is used.
Classes (internal)
Map.Entry
Represent a single entity in map, the key/value entity.
HashMap.Node
Linked list version of node.
It could represent:
A hash bucket.
Because it has a hash property.
A node in singly linked list, (thus also head of linkedlist).
HashMap.TreeNode
Tree version of node.
Fields (internal)
Node[] table
The bucket table, (head of the linked lists).
If a bucket don't contains elements, then it's null, thus only take space of a reference.
Set<Map.Entry> entrySet
Set of entities.
int size
Number of entities.
float loadFactor
Indicate how full the hash table is allowed, before resizing.
int threshold
The next size at which to resize.
Formula: threshold = capacity * loadFactor
Methods (internal)
int hash(key)
Calculate hash by key.
How to map hash to bucket?
Use following logic:
static int hashToBucket(int tableSize, int hash) {
return (tableSize - 1) & hash;
}
About capacity
In a hash table, capacity means the bucket count; it can be obtained from table.length.
It can also be calculated from threshold and loadFactor, so it does not need to be defined as a class field.
The effective capacity can be obtained via: capacity()
Operations
Find entity by key.
First find the bucket by hash value, then loop linked list or search sorted tree.
Add entity with key.
First find the bucket according to hash value of key.
Then try find the value:
If found, replace the value.
Otherwise, add a new node at beginning of linked list, or insert into sorted tree.
Resize
When the threshold is reached, the hashtable's capacity (table.length) is doubled, and all elements are redistributed into the new table (using their stored hash values) to rebuild it.
This could be an expensive operation.
Performance
get & put
Time complexity is O(1), because:
Bucket is accessed via array index, thus O(1).
Linked list in each bucket is of small length, thus could view as O(1).
Tree size is also limited, because will extend capacity & re-hash when element count increase, so could view it as O(1), not O(log N).
The hashcode determines which bucket for the hashmap to check. If there is more than one object in the bucket then a linear search is done to find which item in the bucket equals the desired item (using the equals()) method.
In other words, if you have a perfect hashcode then hashmap access is constant, you will never have to iterate through a bucket (technically you would also have to have MAX_INT buckets, the Java implementation may share a few hash codes in the same bucket to cut down on space requirements). If you have the worst hashcode (always returns the same number) then your hashmap access becomes linear since you have to search through every item in the map (they're all in the same bucket) to get what you want.
Most of the time a well written hashcode isn't perfect but is unique enough to give you more or less constant access.
You're mistaken on point three. Two entries can have the same hash code but not be equal. Take a look at the implementation of HashMap.get from the OpenJdk. You can see that it checks that the hashes are equal and the keys are equal. Were point three true, then it would be unnecessary to check that the keys are equal. The hash code is compared before the key because the former is a more efficient comparison.
If you're interested in learning a little more about this, take a look at the Wikipedia article on hash table collision resolution. Note that the OpenJDK implementation uses separate chaining (each bucket holds a linked list or tree of entries), which is the "bucket" approach the other answers describe, rather than open addressing.
import java.util.HashMap;
public class Students {
String name;
int age;
Students(String name, int age ){
this.name = name;
this.age=age;
}
@Override
public int hashCode() {
System.out.println("__hash__");
final int prime = 31;
int result = 1;
result = prime * result + age;
result = prime * result + ((name == null) ? 0 : name.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
System.out.println("__eq__");
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Students other = (Students) obj;
if (age != other.age)
return false;
if (name == null) {
if (other.name != null)
return false;
} else if (!name.equals(other.name))
return false;
return true;
}
public static void main(String[] args) {
Students S1 = new Students("taj",22);
Students S2 = new Students("taj",21);
System.out.println(S1.hashCode());
System.out.println(S2.hashCode());
HashMap<Students,String > HM = new HashMap<Students,String > ();
HM.put(S1, "tajinder");
HM.put(S2, "tajinder");
System.out.println(HM.size());
}
}
Output:
__hash__
116232
__hash__
116201
__hash__
__hash__
2
So here we see that because the objects S1 and S2 have different content, our overridden hashCode method generates different hash codes (116232 and 116201) for them. Since the hash codes are different, HashMap does not even bother to call the equals method, because different hash codes guarantee that the objects are not equal.
public static void main(String[] args) {
Students S1 = new Students("taj",21);
Students S2 = new Students("taj",21);
System.out.println(S1.hashCode());
System.out.println(S2.hashCode());
HashMap<Students,String > HM = new HashMap<Students,String > ();
HM.put(S1, "tajinder");
HM.put(S2, "tajinder");
System.out.println(HM.size());
}
}
Now let's change our main method a little bit, as shown above. The output after this change is
__hash__
116201
__hash__
116201
__hash__
__hash__
__eq__
1
We can clearly see from the __eq__ print statement that the equals method is called. Since both objects have the same hashcode, their content may or may not be the same, so HashMap internally calls the equals method to verify this.
Conclusion
If the hashcode is different, the equals method will not get called.
If the hashcode is the same, the equals method will get called.
Thanks, hope it helps.
If two objects are equal, they have the same hashcode, but not vice versa.
2 equal objects ------> they have same hashcode
2 objects have same hashcode ----xxxxx--> they are NOT necessarily equal
Java 8 update in HashMap-
Suppose you do this operation in your code:
myHashMap.put("old","old-value");
myHashMap.put("very-old","very-old-value");
Now suppose the hashcode returned for both keys "old" and "very-old" is the same. What will happen then?
myHashMap is a HashMap, and suppose that initially you didn't specify its capacity, so the default capacity of 16 is used and the map works with 16 buckets (in Java 8 the bucket array is allocated lazily, on the first put). Now when you execute the first statement:
myHashMap.put("old","old-value");
the hashcode for "old" is calculated, and because the hashcode could be a very large integer, Java internally does this (hash is the hashcode here and >>> is the unsigned right shift):
hash XOR (hash >>> 16)
The result is then masked with (table length - 1), which gives an index between 0 and 15. Your key-value pair "old" and "old-value" is converted to an Entry object's key and value instance variables, and this Entry object is stored in the bucket, that is, at that particular index.
FYI: Entry is a nested interface of the Map interface (Map.Entry); inside HashMap its implementation looks roughly like this:
class Entry{
final Key k;
value v;
final int hash;
Entry next;
}
Now when you execute the next statement:
myHashMap.put("very-old","very-old-value");
"very-old" gives the same hashcode as "old", so this new key-value pair is sent to the same index, i.e. the same bucket. But since this bucket is not empty, the next variable of the existing Entry object is used to link in the new key-value pair.
Entries whose keys land in the same bucket are stored as a linked list this way, but there is a TREEIFY_THRESHOLD with value 8 (UNTREEIFY_THRESHOLD is 6); once a bucket's list grows past that length (and the table has at least 64 buckets), the linked list is converted to a balanced tree (red-black tree).
Each Entry object represents key-value pair. Field next refers to other Entry object if a bucket has more than 1 Entry.
Sometimes it might happen that hashCodes for 2 different objects are the same. In this case 2 objects will be saved in one bucket and will be presented as LinkedList. The entry point is more recently added object. This object refers to other object with next field and so one. Last entry refers to null.
When you create a HashMap with the default constructor,
the array is created with size 16 and the default load factor of 0.75.
HashMap works on the principle of hashing.
The HashMap get(Key k) method calls the hashCode method on the key object and passes the returned hashValue to its own static hash function to find the bucket location (an index in the backing array) where keys and values are stored in the form of a nested class called Entry (an implementation of Map.Entry). So both the key and the value are stored in the bucket as an Entry object. Thinking that only the value is stored in the bucket is not correct and will not make a good impression on the interviewer.
Whenever we call the get(Key k) method on the HashMap object, it first checks whether the key is null. Note that there can be only one null key in a HashMap.
If the key is null, it always maps to hash 0 and therefore to index 0.
If the key is not null, it calls key.hashCode() on the key object, and the returned hashValue is then passed on like this:
int hash = hash(hashValue)
which applies HashMap's own hashing function to the returned hashValue.
We might wonder why the hash value is calculated again using hash(hashValue). The answer is that it defends against poor-quality hash functions.
The final hash value is then used to find the bucket location at which the Entry object is stored. Each Entry object in the bucket holds (hash, key, value, next); the bucket index itself is not stored but recomputed from the hash.
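The null-key rule mentioned above can be checked with the public API alone. This is a small sketch; the class name NullKeyDemo and the sample values are arbitrary.

```java
import java.util.HashMap;
import java.util.Map;

class NullKeyDemo {
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second"); // same (null) key: the value is overwritten
        map.put("k", "v");
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.size());    // 2
        System.out.println(map.get(null)); // second
    }
}
```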
I will not get into the details of how HashMap works, but will give an example so we can remember how HashMap works by relating it to reality.
We have Key, Value, HashCode and Bucket.
For the moment, we will relate each of them to the following:
Bucket -> A Society
HashCode -> Society's address (it identifies the society, not a single house)
Value -> A House in the Society
Key -> House address.
Using Map.get(key) :
Stevie wants to get to his friend's(Josse) house who lives in a villa in a VIP society, let it be JavaLovers Society.
Josse's address is his SSN(which is different for everyone).
There's an index maintained in which we find out the Society's name based on SSN.
This index can be considered to be an algorithm to find out the HashCode.
SSN Society's Name
92313(Josse's) -- JavaLovers
13214 -- AngularJSLovers
98080 -- JavaLovers
53808 -- BiologyLovers
This SSN (key) first gives us a HashCode (from the index table), which is nothing but the Society's name.
Now, multiple houses can be in the same society, so the HashCode can be common.
Suppose the Society is common to two houses. How do we identify which house we are going to? By using the SSN (key), which is nothing but the house address.
Using Map.put(key, Value):
This finds a suitable society for this Value by computing the HashCode, and then the value is stored there.
I hope this helps and this is open for modifications.
Bearing in mind the explanations here for the structure of a hashmap, perhaps someone could explain the following paragraph on Baeldung :-
Java has several implementations of the interface Map, each one with its own particularities.
However, none of the existing Java core Map implementations allow a Map to handle multiple values for a single key.
As we can see, if we try to insert two values for the same key, the second value will be stored, while the first one will be dropped.
It will also be returned (by every proper implementation of the put(K key, V value) method):
Map<String, String> map = new HashMap<>();
assertThat(map.put("key1", "value1")).isEqualTo(null);
assertThat(map.put("key1", "value2")).isEqualTo("value1");
assertThat(map.get("key1")).isEqualTo("value2");
It's going to be a long answer, so grab a drink and read on …
Hashing is all about storing key-value pairs in memory so that they can be read and written quickly. It stores keys in an array and values in a LinkedList.
Let's say I want to store 4 key-value pairs:
{
“girl” => “ahhan” ,
“misused” => “Manmohan Singh” ,
“horsemints” => “guess what”,
“no” => “way”
}
So to store the keys we need an array of 4 elements. Now how do I map each of these 4 keys to one of the 4 array indexes (0, 1, 2, 3)?
Java finds the hashCode of each key and maps it to a particular array index.
The hashCode formula for a String is:
1) Reverse the string.
2) Multiply the character code of each character by an increasing power of 31, then add up the terms (in 32-bit int arithmetic, so the sum wraps around on overflow).
3) So the hashCode() of girl would be as follows (the ASCII values of l, r, i, g are 108, 114, 105 and 103):
e.g. girl = 108 * 31^0 + 114 * 31^1 + 105 * 31^2 + 103 * 31^3 = 3173020
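The steps above can be written out literally and compared with String.hashCode(). This is only a sketch: manualHash is a made-up helper for illustration, while the JDK itself computes the same polynomial left to right.

```java
class StringHashSketch {
    // Hypothetical helper that follows the steps above literally:
    // walk the string from its last character, weighting by powers of 31.
    static int manualHash(String s) {
        int hash = 0;
        int power = 1; // 31^0, then 31^1, ... (wraps around on int overflow)
        for (int i = s.length() - 1; i >= 0; i--) {
            hash += s.charAt(i) * power;
            power *= 31;
        }
        return hash;
    }

    public static void main(String[] args) {
        System.out.println(manualHash("girl")); // 3173020
        System.out.println("girl".hashCode());  // 3173020
    }
}
```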
Hash and girl!! I know what you are thinking. Your fascination with that wild duet might have made you miss an important thing.
Why does Java multiply by 31?
It's because 31 is an odd prime of the form 2^5 - 1. An odd prime multiplier tends to reduce the chance of hash collisions, and 31 * i can be computed cheaply as (i << 5) - i.
Now how is this hash code mapped to an array index?
In this simplified picture the answer is Hash Code % (Array length - 1), so “girl” is mapped to (3173020 % 3) = 1, which is the second element of the array. (The real HashMap keeps the array length a power of two and computes (Array length - 1) & hash on the bit-mixed hash, but the idea is the same.)
The value “ahhan” is then stored in a LinkedList associated with array index 1.
Hash collision: if you compute the hashCode of the keys “misused” and “horsemints” using the formula described above, you'll see that both give the same value, 1069518484. Whoa! Lesson learnt:
2 equal objects must have the same hashCode, but there is no guarantee that objects with matching hashCodes are equal.
So it should store both values, for “misused” and “horsemints”, in bucket 1 (1069518484 % 3).
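The collision is easy to confirm. In this short sketch the helper name sameHashDifferentKeys is invented for the example.

```java
class StringCollisionDemo {
    // True when two distinct strings share a hash code.
    static boolean sameHashDifferentKeys(String a, String b) {
        return a.hashCode() == b.hashCode() && !a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println("misused".hashCode());    // 1069518484
        System.out.println("horsemints".hashCode()); // 1069518484
        System.out.println(sameHashDifferentKeys("misused", "horsemints")); // true
    }
}
```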
Now the hash map looks like –
Array Index 0 –
Array Index 1 - LinkedList (“ahhan” , “Manmohan Singh” , “guess what”)
Array Index 2 – LinkedList (“way”)
Array Index 3 –
Now if somebody tries to find the value for the key “horsemints”, Java quickly computes its hashCode, takes the modulus, and starts searching for the value in the LinkedList at index 1. This way we need not search all 4 array indexes, which makes data access faster.
But wait one sec. There are 3 values in the LinkedList at array index 1, so how does it find out which one is the value for the key “horsemints”?
Actually I lied when I said HashMap just stores values in a LinkedList.
It stores each key-value pair as a map entry. So the map actually looks like this:
Array Index 0 –
Array Index 1 - LinkedList (<”girl” => “ahhan”> , <”misused” => “Manmohan Singh”> , <”horsemints” => “guess what”>)
Array Index 2 – LinkedList (<”no” => “way”>)
Array Index 3 –
Now you can see that while traversing the LinkedList at array index 1, it compares the key of each entry in that LinkedList to “horsemints”, and when it finds a match it returns that entry's value.
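A quick check with a real HashMap shows the effect of that key comparison: the two colliding keys still return their own values. The class name LookupDemo is arbitrary, and the bucket layout inside the JDK's HashMap may differ from the simplified picture above.

```java
import java.util.HashMap;
import java.util.Map;

class LookupDemo {
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put("girl", "ahhan");
        map.put("misused", "Manmohan Singh");
        map.put("horsemints", "guess what");
        map.put("no", "way");
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        // Same hash code, different keys: equals() picks the right entry.
        System.out.println(map.get("horsemints")); // guess what
        System.out.println(map.get("misused"));    // Manmohan Singh
    }
}
```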
Hope you had fun while reading it :)
As it is said, a picture is worth 1000 words. I say: some code is better than 1000 words. Here's the source code of HashMap. Get method:
/**
* Implements Map.get and related methods
*
* @param hash hash for key
* @param key the key
* @return the node, or null if none
*/
final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {
if (first.hash == hash && // always check first node
((k = first.key) == key || (key != null && key.equals(k))))
return first;
if ((e = first.next) != null) {
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}
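So it becomes clear that the hash is used to find the "bucket", and the first element in that bucket is always checked first (stored hash, then == or equals on the key). If it does not match, the rest of the bucket is searched the same way, by walking the linked list or, for a TreeNode bin, the tree.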
Let's see the put() method:
/**
* Implements Map.put and related methods
*
* @param hash hash for key
* @param key the key
* @param value the value to put
* @param onlyIfAbsent if true, don't change existing value
* @param evict if false, the table is in creation mode.
* @return previous value, or null if none
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
else {
Node<K,V> e; K k;
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
++modCount;
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}
It's slightly more complicated, but it becomes clear that the new element is put in the tab at the position calculated based on hash:
i = (n - 1) & hash: here i is the index where the new element will be put (that is, the "bucket"), and n is the size of the tab array (the array of "buckets").
First, it tries to put the new node as the first element in that "bucket". If there is already an element there, and no existing node has an equal key, a new node is appended to the end of the list (or inserted into the tree).
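Because n is always a power of two in HashMap, the mask (n - 1) & hash gives the same result as a non-negative modulo, including for negative hashes. A small sketch to check this follows; the class and method names are made up for illustration.

```java
class MaskVsModSketch {
    // Index as computed in putVal/getNode.
    static int byMask(int hash, int n) {
        return (n - 1) & hash;
    }

    // Equivalent non-negative remainder, valid when n is a power of two.
    static int byMod(int hash, int n) {
        return Math.floorMod(hash, n);
    }

    public static void main(String[] args) {
        int[] hashes = {0, 1, 17, -1, Integer.MIN_VALUE, 3173020};
        for (int h : hashes) {
            System.out.println(h + " -> " + byMask(h, 16) + " / " + byMod(h, 16));
        }
    }
}
```

The mask is preferred because a bitwise AND is cheaper than a division and never yields a negative index.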

How does HashSet not allow duplicates?

I was going through the add method of HashSet. It is mentioned that
If this set already contains the element, the call leaves the set unchanged and returns false.
But the add method is internally saving the values in HashMap
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
The put method of HashMap states that
Associates the specified value with the specified key in this map. If the map previously contained a mapping for the key, the old value is replaced.
So if the put method of HashMap replaces the old value, how does the HashSet add method leave the set unchanged in case of duplicate elements?
PRESENT is just a dummy value -- the set doesn't really care what it is. What the set does care about is the map's keys. So the logic goes like this:
Set.add(a):
map.put(a, PRESENT) // so far, this is just what you said
the key "a" is in the map, so...
keep the "a" key, but map its value to the PRESENT we just passed in
also, return the old value (which we'll call OLD)
look at the return value: it's OLD, != null. So return false.
Now, the fact that OLD == PRESENT doesn't matter -- and note that Map.put doesn't change the key, just the value mapped to that key. Since the map's keys are what the Set really cares about, the Set is unchanged.
In fact, there has been some change to the underlying structures of the Set -- it replaced a mapping of (a, OLD) with (a, PRESENT). But that's not observable from outside the Set's implementation. (And as it happens, that change isn't even a real change, since OLD == PRESENT).
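The return values described above are easy to observe from outside. In this minimal sketch the helper addTwice is hypothetical and exists only to make the two results visible.

```java
import java.util.HashSet;
import java.util.Set;

class HashSetAddDemo {
    // Adds the same element twice and reports what add() returned each time.
    static boolean[] addTwice(Set<String> set, String element) {
        boolean first = set.add(element);  // no previous mapping: put returned null
        boolean second = set.add(element); // put returned the old PRESENT value
        return new boolean[] {first, second};
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        boolean[] results = addTwice(set, "a");
        System.out.println(results[0]); // true
        System.out.println(results[1]); // false
        System.out.println(set.size()); // 1
    }
}
```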
The answer that you may be looking for comes down to the fact that the backing HashMap maps the elements of the set to the value PRESENT, which is defined in HashSet.java as follows:
private static final Object PRESENT = new Object();
In the source code for HashMap.put we have:
386 public V put(K key, V value) {
387 if (key == null)
388 return putForNullKey(value);
389 int hash = hash(key.hashCode());
390 int i = indexFor(hash, table.length);
391 for (Entry<K,V> e = table[i]; e != null; e = e.next) {
392 Object k;
393 if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
394 V oldValue = e.value;
395 e.value = value;
396 e.recordAccess(this);
397 return oldValue;
398 }
399 }
400
401 modCount++;
402 addEntry(hash, key, value, i);
403 return null;
404 }
Because the key in question already exists, we will take the early return on line 397. You might think a change is being made to the map on line 395, where it appears that we are changing the value of a map entry. However, the value being assigned is PRESENT. Because PRESENT is static and final, there is only one such instance, so the assignment e.value = value doesn't actually change the map, and therefore the set, at all!
Update:
Once a HashSet is initialized:
- All the items in it are stored as keys in a HashMap
- All the values in that HashMap are ONE and the same object, PRESENT, which is a static field in HashSet
As you can see, the HashSet.add method passes the element to HashMap.put as a key, not as a value. The value is replaced in the HashMap, not the key.
See HashMap#put:
Associates the specified value with the specified key in this map. If
the map previously contained a mapping for the key, the old value is
replaced.
It replaces the value for the existing key rather than adding a second key; this way, no duplicates will be in the HashSet.
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
e is the key, so if e is already present, put will not return null. Hence add will return false.
JavaDoc for put :
the previous value associated with key, or null if there was no mapping for key. (A null return can also indicate that the map previously associated null with key.)
From javadocs for HashMap.put(),
"Associates the specified value with the specified key in this map. If the map previously contained a mapping for the key, the old value is replaced."
Thus the map value will be replaced (it is a constant static field in the HashSet class, so the same instance is written again), and the map key is kept untouched (the key, in fact, IS the Set collection item).
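That the key is left untouched can be observed with two equal but distinct objects. In this sketch the name KeptKeyDemo is invented, and new String(...) is used only to force two separate instances.

```java
import java.util.HashSet;

class KeptKeyDemo {
    // Returns true if the set rejects the second element and
    // still holds the instance that was added first.
    static boolean keepsFirstInstance() {
        String first = new String("a");
        String second = new String("a"); // equal to first, but a different object
        HashSet<String> set = new HashSet<>();
        set.add(first);
        boolean added = set.add(second);
        return !added && set.iterator().next() == first;
    }

    public static void main(String[] args) {
        System.out.println(keepsFirstInstance()); // true
    }
}
```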

Why HashMap space gets expanded if all entries got stored in linked fashion under same index

From Java HashMap's source code it is clear that its capacity is doubled when the size threshold is reached.
I thought about a use case where all 6 elements are stored under the same index as a linked list. The HashMap (size 10) with threshold 7 (10 * 0.75) gets expanded when the 7th element arrives. Here there is actually no need for expansion, since all entries are saved under one index.
Kindly enlighten me.
void addEntry(int hash, K key, V value, int bucketIndex)
{
Entry<K,V> e = table[bucketIndex];
table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
if (size++ >= threshold)
resize(2 * table.length);
}
void resize(int newCapacity)
{
Entry[] oldTable = table;
int oldCapacity = oldTable.length;
if (oldCapacity == MAXIMUM_CAPACITY) {
threshold = Integer.MAX_VALUE;
return;
}
Entry[] newTable = new Entry[newCapacity];
transfer(newTable);
table = newTable;
threshold = (int)(newCapacity * loadFactor);
}
You say there's no need to resize, since the HashMap can hold these entries.
However a HashMap ideally should be providing constant access time (O(1)). The resizing occurs in order to try and provide this access time. By reorganising the buckets a lookup for a key should ideally reference a bucket with only one value (to avoid iterating through a list of entries).
In the get() method you'll find this line:
for (Entry<K,V> e = table[indexFor(hash, table.length)];
The HashMap uses the indexFor() method to identify the bucket, and then iterates through the entries in that bucket to find a matching key. To optimise this, the iteration should ideally run only once (you can't avoid the bucket lookup).
This points to hash codes ideally being evenly distributed across the int range (-2^31 to 2^31-1). You can make an object's hashCode constant (e.g. 1), but then the HashMap can't do anything but dump all entries in one bucket, and performance suffers accordingly.
It's just a design decision, probably based on the fact that maps should be very fast at retrieval and storage, and if you end up linking many entries in one bucket, performance will suffer. Thus, rehashing will probably spread your items across more buckets instead of leaving them linked in just one.
It is a trade-off. Elements that share a bucket while the table is small can get scattered across buckets as the table grows, provided their hashes differ. This improves performance.
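A sketch of why doubling helps in the common case, and why it cannot help when the hashes are identical. The names are made up, and the mask-based index assumes a power-of-two table as in the JDK.

```java
class ResizeSplitSketch {
    // Bucket index for an already-mixed hash in a power-of-two table.
    static int indexFor(int hash, int capacity) {
        return (capacity - 1) & hash;
    }

    public static void main(String[] args) {
        // Hashes 1 and 17 collide in a 16-bucket table...
        System.out.println(indexFor(1, 16) + " " + indexFor(17, 16)); // 1 1
        // ...but are separated once the table is doubled to 32.
        System.out.println(indexFor(1, 32) + " " + indexFor(17, 32)); // 1 17
        // Identical hashes stay together no matter how large the table gets.
        System.out.println(indexFor(1, 1024) + " " + indexFor(1, 1024)); // 1 1
    }
}
```

The map cannot know in advance which case it is in, so it resizes on the total entry count rather than on the length of any single bucket.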

Why does a HashTable store the hash value of the key in the table in java

I was going through Java's implementation of the put method for a hashtable and came across this :
// Makes sure the key is not already in the hashtable.
Entry tab[] = table;
int hash = key.hashCode();
int index = (hash & 0x7FFFFFFF) % tab.length;
for (Entry<K,V> e = tab[index] ; e != null ; e = e.next) {
if ((e.hash == hash) && e.key.equals(key)) {
V old = e.value;
e.value = value;
return old;
}
}
While I understand that a key is required to check for collisions, why is Java storing the hash value of the key and also checking it ?
Because the same bucket (a slot of tab) can hold items with different hashes due to the % tab.length operation. Checking the hash first is a performance optimization that avoids calling equals() when the hashes differ.
To put this in an example: say you have two complex objects with a costly equals() method. One object has a hash equal to 1 while the other has a hash of 32. If you put both objects in a hash table with 31 buckets, they'll end up in the same bucket. When adding the second (different) object you must make sure it's not yet in the table. You could call equals() immediately, but this might be slow. Instead you first compare hashes, avoiding the costly equals() when it isn't necessary. In this example the hashes are different (despite sharing a bucket), so equals() is not needed.
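The example can be tried out with a key type that counts its equals() calls. CountingKey and the helper names are hypothetical; the expectation that the counter stays at 0 relies on Hashtable comparing stored hashes first, as in the source quoted in the question.

```java
import java.util.Hashtable;

// Hypothetical key with a fixed hash and an equals() that counts its calls.
final class CountingKey {
    static int equalsCalls = 0;
    final int hash;
    CountingKey(int hash) { this.hash = hash; }
    @Override public int hashCode() { return hash; }
    @Override public boolean equals(Object o) {
        equalsCalls++;
        return o instanceof CountingKey && ((CountingKey) o).hash == hash;
    }
}

class HashtableHashCheckDemo {
    // Same index formula as in the Hashtable source quoted in the question.
    static int bucketOf(int hash, int tableLength) {
        return (hash & 0x7FFFFFFF) % tableLength;
    }

    static int equalsCallsWhenInserting() {
        CountingKey.equalsCalls = 0;
        Hashtable<CountingKey, String> table = new Hashtable<>(31);
        table.put(new CountingKey(1), "one");
        table.put(new CountingKey(32), "thirty-two");
        return CountingKey.equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println(bucketOf(1, 31) + " " + bucketOf(32, 31)); // 1 1
        System.out.println(equalsCallsWhenInserting()); // 0 (hashes differ)
    }
}
```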
It makes access faster, since the hash value does not need to be recomputed for every access. This is important not just for explicit searches (where the hash is checked before doing equals) but also for rehash.
