Collision resolution in Java HashMap

Collision resolution in Java HashMap - java

Java HashMap uses put method to insert the K/V pair in HashMap.
Lets say I have used put method and now HashMap<Integer, Integer> has one entry with key as 10 and value as 17.
If I insert 10,20 in this HashMap it simply replaces the the previous entry with this entry due to collision because of same key 10.
If the key collides HashMap replaces the old K/V pair with the new K/V pair.
So my question is when does the HashMap use Chaining collision resolution technique?
Why it did not form a linkedlist with key as 10 and value as 17,20?

When you insert the pair (10, 17) and then (10, 20), there is technically no collision involved. You are just replacing the old value with the new value for a given key 10 (since in both cases, 10 is equal to 10 and also the hash code for 10 is always 10).
Collision happens when multiple keys hash to the same bucket. In that case, you need to make sure that you can distinguish between those keys. Chaining collision resolution is one of those techniques which is used for this.
As an example, let's suppose that two strings "abra ka dabra" and "wave my wand" yield hash codes 100 and 200 respectively. Assuming the total array size is 10, both of them end up in the same bucket (100 % 10 and 200 % 10). Chaining ensures that whenever you do map.get( "abra ka dabra" );, you end up with the correct value associated with the key. In the case of hash map in Java, this is done by using the equals method.

In a HashMap the key is an object, that contains hashCode() and equals(Object) methods.
When you insert a new entry into the Map, it checks whether the hashCode is already known. Then, it will iterate through all objects with this hashcode, and test their equality with .equals(). If an equal object is found, the new value replaces the old one. If not, it will create a new entry in the map.
Usually, talking about maps, you use collision when two objects have the same hashCode but they are different. They are internally stored in a list.

It could have formed a linked list, indeed. It's just that Map contract requires it to replace the entry:
V put(K key, V value)
Associates the specified value with the specified key in this map
(optional operation). If the map previously contained a mapping for
the key, the old value is replaced by the specified value. (A map m is
said to contain a mapping for a key k if and only if m.containsKey(k)
would return true.)
http://docs.oracle.com/javase/6/docs/api/java/util/Map.html
For a map to store lists of values, it'd need to be a Multimap. Here's Google's: http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Multimap.html
A collection similar to a Map, but which may associate multiple values
with a single key. If you call put(K, V) twice, with the same key but
different values, the multimap contains mappings from the key to both
values.
Edit: Collision resolution
That's a bit different. A collision happens when two different keys happen to have the same hash code, or two keys with different hash codes happen to map into the same bucket in the underlying array.
Consider HashMap's source (bits and pieces removed):
public V put(K key, V value) {
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
// i is the index where we want to insert the new element
addEntry(hash, key, value, i);
return null;
}
void addEntry(int hash, K key, V value, int bucketIndex) {
// take the entry that's already in that bucket
Entry<K,V> e = table[bucketIndex];
// and create a new one that points to the old one = linked list
table[bucketIndex] = new Entry<>(hash, key, value, e);
}
For those who are curious how the Entry class in HashMap comes to behave like a list, it turns out that HashMap defines its own static Entry class which implements Map.Entry. You can see for yourself by viewing the source code:
GrepCode for HashMap

First of all, you have got the concept of hashing a little wrong and it has been rectified by #Sanjay.
And yes, Java indeed implement a collision resolution technique. When two keys get hashed to a same value (as the internal array used is finite in size and at some point the hashcode() method will return same hash value for two different keys) at this time, a linked list is formed at the bucket location where all the informations are entered as an Map.Entry object that contains a key-value pair. Accessing an object via a key will at worst require O(n) if the entry in present in such a lists. Comparison between the key you passed with each key in such list will be done by the equals() method.
Although, from Java 8 , the linked lists are replaced with trees (O(log n))

Your case is not talking about collision resolution, it is simply replacement of older value with a new value for the same key because Java's HashMap can't contain duplicates (i.e., multiple values) for the same key.
In your example, the value 17 will be simply replaced with 20 for the same key 10 inside the HashMap.
If you are trying to put a different/new value for the same key, it is not the concept of collision resolution, rather it is simply replacing the old value with a new value for the same key. It is how HashMap has been designed and you can have a look at the below API (emphasis is mine) taken from here.
public V put(K key, V value)
Associates the specified value with the
specified key in this map. If the map previously contained a mapping
for the key, the old value is replaced.
On the other hand, collision resolution techniques comes into play only when multiple keys end up with the same hashcode (i.e., they fall in the same bucket location) where an entry is already stored. HashMap handles the collision resolution by using the concept of chaining i.e., it stores the values in a linked list (or a balanced tree since Java8, depends on the number of entries).

When multiple keys end up in same hash code which is present in same bucket.
When the same key has different values then the old value will be replaced with new value.
Liked list converted to balanced Binary tree from java 8 version on wards in worst case scenario.
Collision happen when 2 distinct keys generate the same hashcode() value.
When there are more collisions then there it will leads to worst performance of hashmap.
Objects which are are equal according to the equals method must return the same hashCode value.
When both objects return the same has code then they will be moved into the same bucket.

There is difference between collision and duplication.
Collision means hashcode and bucket is same, but in duplicate, it will be same hashcode,same bucket, but here equals method come in picture.
Collision detected and you can add element on existing key. but in case of duplication it will replace new value.

It isn't defined to do so. In order to achieve this functionality, you need to create a map that maps keys to lists of values:
Map<Foo, List<Bar>> myMap;
Or, you could use the Multimap from google collections / guava libraries

There is no collision in your example. You use the same key, so the old value gets replaced with the new one. Now, if you used two keys that map to the same hash code, then you'd have a collision. But even in that case, HashMap would replace your value! If you want the values to be chained in case of a collision, you have to do it yourself, e.g. by using a list as a value.

Related

Hashmap with two key object having same value but different bucket index

If Hashmap has two keys having same value, for eg:
HashMap map=new HashMap();
map.put("a","abc");
map.put("a","xyz");
So here put two key with "a" value and suppose for first bucketindex=1 and
second bucketindex=9
So my question is if bucket index for both is coming different after
applying hashing algorithm, in this how to handle for not inserting
duplicate key as it is already present and hashmap cannot have duplicate
key.
please suggest your view on this.

There won't be any such thing as "second bucket index".
I suggest you add something like System.out.println(map.toString()) in order to see what that second put() has done to your map.
EDIT:
In the method put(key,value), the "bucket index" is computed as a function of the key element's value, not the value element's value (so "a" and "a" give the same index for the bucket). This function is supposed to be deterministic so feeding it the same value ("a" in your case), the same hashCode() will come out and subsequently, the same bucket index.

In Java if a hashing function returns the same hash, equality of two objects is determined by equals() method. And if the objects are found equal, the old one is simply replaced by the new one.
Instead, if the objects are not equal, they just get chained in a linked list (or a balanced tree) and the map contains both objects, because they are different.
So, back to your question: "if bucket index for both is coming different after applying hashing algorithm" - this is impossible for equal objects. Equal objects must have the same hash code.

To make #Erwin's answer more clear, here's the source code of HashMap from JDK
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Digging more deep you will find that the bucket index is calculated from key's hash code.
To make it simple and straightforward, putting duplicate key with different values to the same HashMap will result just one single entry, which the second put is just overwriting the value of the entry.

If your question is how to create a hashmap that can handle more than one value for the same key, what you need is a Map> so it adds a new value to the arraylist everytime the key is the same.

How to calculate the complexity of the HashMap search algorithm? [duplicate]

This question already has answers here:
Is a Java hashmap search really O(1)?
(15 answers)
Closed 4 years ago.
How to calculate the complexity of the HashMap search algorithm? I'm googling result of this calculation - O(1), but I don't understand how they arrived at these findings.

HashMap works on the hashing principle.It is the data structure that allow you to store and retrieve data in O(1) time provided we know the key.
In hashing, hash functions are used to link key and value in HashMap. Objects are stored by calling put(key, value) method of HashMap and retrieved by calling get(key) method. When we call put method, hashcode() method of the key object is called so that hash function of the map can find a bucket location to store value object, which is actually an index of the internal array, known as the table. HashMap internally stores mapping in the form of Map.Entry object which contains both key and value object. When you want to retrieve the object, you call the get() method and again pass the key object. This time again key object generate same hash code (it's mandatory for it to do so to retrieve the object and that's why HashMap keys are immutable e.g. String) and we end up at same bucket location. If there is only one object then it is returned and that's your value object which you have stored earlier. Things get little tricky when collisions occur.
Collision : Since the internal array of HashMap is of fixed size, and if you keep storing objects, at some point of time hash function will return same bucket location for two different keys, this is called collision in HashMap. In this case, a linked list is formed at that bucket location and a new entry is stored as next node.
If we try to retrieve an object from this linked list, we need an extra check to search correct value, this is done by equals() method. Since each node contains an entry, HashMap keeps comparing entry's key object with the passed key using equals() and when it return true, Map returns the corresponding value.
Since searching inlined list is O(n) operation, in worst case hash collision reduce a map to linked list. This issue is recently addressed in Java 8 by replacing linked list to the tree to search in O(logN) time.
By the way, you can easily verify how HashMap works by looking at the code of HashMap.java in your Eclipse IDE if you are keenly interested in the code, otherwise the logic is explained above.
Information On Buckets : An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

Why does toString function of a HashMap prints itself with a different order?

I have this very simple piece of code, and I was just trying to play a bit with different kind of objects inside a Map.
//There's a bit of spanish, sorry about that
//just think 'persona1' as an object with
//a string and an int
Map mapa = new HashMap();
mapa.put('c', 12850);
mapa.put(38.6, 386540);
mapa.put("Andrés", 238761);
mapa.put(14, "Valor de 14");
mapa.put("p1", persona1);
mapa.put("Andrea", 34500);
System.out.println(mapa.toString());
And then I expect from console something like:
{c=12850, 38.6=386540, Andrés=238761, 14=Valor de 14, p1={nombre: Andres Perea, edad: 10}, Andrea=34500}
But susprisingly for me I got same data in different order:
{38.6=386540, Andrés=238761, c=12850, p1={nombre: Andres Perea, edad: 10}, Andrea=34500, 14=Valor de 14}
It doesn't matter if I try other kind of objects, even just Strings or numeric types, it always does the same, it makes a different without-apparently-any-sense order.
Can someone give me a hint why this happens? Or may be something too obvious I'm missing?
I'm using Java 1.7 and Eclipse Juno.

As per Oracle's documentation
The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls. This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
Refer to HashMap JavaDocs.

There are 3 class which implements map interface in java.
1. hashMap: Id does not guarantee any order.
2. Linked HashMap:It will store them in insertion order.
3. TreeMap: It will store in ascending order.(ASCII value)
So As per your requirement you can use Linked HashMap instead of HashMap.so instead of writing
Map mapa = new HashMap();
create object of Linked HashMap
Map mapa = new LinkedHashMap();
follow below link for more info.
http://docs.oracle.com/javase/tutorial/collections/interfaces/map.html

HashMap not guaranteed the order of element. If you want to keep order use LinkedHashMap.
See following case
Map<Integer,String> unOrderedMap=new HashMap<>();
unOrderedMap.put(1,"a");
unOrderedMap.put(3,"a");
unOrderedMap.put(2,"a");
System.out.println("HashMap output: "+unOrderedMap.toString());
Map<Integer,String> orderedMap=new LinkedHashMap<>();
orderedMap.put(1,"a");
orderedMap.put(3,"a");
orderedMap.put(2,"a");
System.out.println("LinkedHashMap output: "+orderedMap.toString());
Output:
HashMap output: {1=a, 2=a, 3=a}
LinkedHashMap output: {1=a, 3=a, 2=a}

Maps does not maintain the order the order in which elements were added, List will maintain the order of elements
"The order of a map is defined as the order in which the iterators on the map's collection views return their elements. Some map implementations, like the TreeMap class, make specific guarantees as to their order; others, like the HashMap class, do not."

This is how a hashmap works: (citing from another source)
It has a number of "buckets" which it uses to store key-value pairs in. Each bucket has a unique number - that's what identifies the bucket. When you put a key-value pair into the map, the hashmap will look at the hash code of the key, and store the pair in the bucket of which the identifier is the hash code of the key. For example: The hash code of the key is 235 -> the pair is stored in bucket number 235. (Note that one bucket can store more then one key-value pair).
When you lookup a value in the hashmap, by giving it a key, it will first look at the hash code of the key that you gave. The hashmap will then look into the corresponding bucket, and then it will compare the key that you gave with the keys of all pairs in the bucket, by comparing them with equals().
Now you can see how this is very efficient for looking up key-value pairs in a map: by the hash code of the key the hashmap immediately knows in which bucket to look, so that it only has to test against what's in that bucket.
Looking at the above mechanism, you can also see what requirements are necessary on the hashCode() and equals() methods of keys:
If two keys are the same (equals() returns true when you compare them), their hashCode() method must return the same number. If keys violate this, then keys that are equal might be stored in different buckets, and the hashmap would not be able to find key-value pairs (because it's going to look in the same bucket).
If two keys are different, then it doesn't matter if their hash codes are the same or not. They will be stored in the same bucket if their hash codes are the same, and in this case, the hashmap will use equals() to tell them apart.
Now, when you put all your "key-value" pairs in the hashmap, and print them, it prints them in some random order of the keys which got generated by hashing the value you supplied for keys.
If your requirement is still to maintain the ordering, you can use the LinkedHashMap in Java.
Hope this helps :-)
Edit: Original Post: How does a Java HashMap handle different objects with the same hash code?

Old values in hash map being overwritten by new values?

I have one hash map. I'm storing 12 different key,values pairs in it.
The first 8 values are stored fine, but when I try to put the 9th value it overwrites the old value. But the size increases.
If I try to get the old values, I get nulls. I have also checked the hash map table. Only 8 values are there. The old values are overwritten.
here have only 7 values but size is 9 . how it's possible ?
What could I be doing wrong?

Make sure you use different keys. If that's the case, make sure equals and hashcode for your key class work as required, i.e. when two objects are equal, their hashcodes must be same. And of course, equals for different key values (or what you'd expect to be distinct keys) must return false.
If that doesn't help, post a minimal, yet complete (compilable) example that demonstrates your problem.

As for the size=9 but only 7 values in the table, you are misunderstanding the internal workings of the HashMap. All values are not stored in the top-level table. The table is more like "buckets" that store entries grouped by certain hashcode ranges. Each "bucket" holds a chain of linked entries so what you are seeing in the table are just the first entries in each particular range chain. The size is always correct though, in terms of total number of entries in the map.
As for entries overwriting eachother, that happens only when you put en entry with a key that is identical (hashCode and equals) to en existing entry. So you are either adding with an existing key, or you are adding with null as key (null is permissible as key, but you can only have one entry with the key null).
Check your code, are you adding with null keys? If you are using instances of a custom class (one you created yourself) as key, have you implemented hashCode() and equals() according to the specifications (see http://download.oracle.com/javase/6/docs/api/java/lang/Object.html#hashCode%28%29)? Are you making sure that you are really using unique keys for all 12 put operations?

What does it mean by "the hash table is open" in Java?

I was reading the Java api docs on Hashtable class and came across several questions. In the doc, it says "Note that the hash table is open: in the case of a "hash collision", a single bucket stores multiple entries, which must be searched sequentially. " I tried the following code myself
Hashtable<String, Integer> me = new Hashtable<String, Integer>();
me.put("one", new Integer(1));
me.put("two", new Integer(2));
me.put("two", new Integer(3));
System.out.println(me.get("one"));
System.out.println(me.get("two"));
the out put was
1
3
Is this what it means by "open"?
what happened to the Integer 2? collected as garbage?
Is there an "closed" example?

No, this is not what is meant by "open".
Note the difference between a key collision and a hash collision.
The Hashtable will not allow more than one entry with the same key (as in your example, you put two entries with the key "two", the second one (3) replaced the first one (2), and you were left with only the second one in the Hashtable).
A hash collision is when two different keys have the same hashcode (as returned by their hashCode() method). Different hash table implementations could treat this in different ways, mostly in terms of low-level implementation. Being "open", Hashtable will store a linked list of entries whose keys hash to the same value. This can cause, in the worst case, O(N) performance for simple operations, that normally would be O(1) in a hash map where the hashes mostly were different values.

It means that two items with different keys that have the same hashcode end up in the same bucket.
In your case the keys "two" are the same and so the second put overwrites the first one.
But assuming that you have your own class
class Thingy {
private final String name;
public Thingy(String name) {
this.name = name;
}
public boolean equals(Object o) {
...
}
public int hashcode() {
//not the worlds best idea
return 1;
}
}
And created multiple instances of it. i.e.
Thingy a = new Thingy("a");
Thingy b = new Thingy("b");
Thingy c = new Thingy("c");
And inserted them into a map. Then one bucket i.e. the bucket containing the stuff with hashcode 1 will contain a list (chain) of the three items.
Map<Thingy, Thingy> map = new HashMap<Thingy, Thingy>();
map.put(a, a);
map.put(b, b);
map.put(c, c);
So getting an item by any Thingy key would result in a lookup of the hashcode O(1) followed by a linear search O(n) on the list of items in the bucket with hashcode 1.
Also be careful to ensure that you obey the correct relationship when implementing hashcode and equals. Namely if two objects are equal then they should have the same hascode, but not necessarily the otherway round as multiple keys are likely to get the same hashcode.
Oh and for the full definitions of Open hashing and Closed hash tables look here http://www.c2.com/cgi/wiki?HashTable

Open means that if two keys are not equal, but have the same hash value, then they will be stored in the same "bucket". In this case, you can think of each bucket as a linked list, so if many things are stored in the same bucket, search performance will decrease.
Bucket 0: Nothing
Bucket 1: Item 1
Bucket 2: Item 2 -> Item 3
Bucket 3: Nothing
Bucket 4: Item 4
In this case, if you search for a key that hashes to bucket 2, you have to then perform an O(n) search on the list to find the key that equals what you're searching for. If the key hashes to Bucket 0, 1, 3, or 4, then you get an O(1) search performance.

It means that Hashtable uses open hashing (also known as separate chaining) to deal with hash collisions. If two separate keys have the same hashcode, both of them will be stored in the same bucket (in a list).

A hash is a computed function that maps one object ("one" or "two" in your sample) to (in this case) an integer. This means that there may be multiple values that map to the same integer ( an integer has a finite number of permitted values while there may be an infinite number of inputs) . In this case "equals" must be able to tell these two apart. So your code example is correct, but there may be some other key that has the same hashcode (and will be put in the same bucket as "two")

Warning: there are contradictory definitions of "open hashing" in common usage:
Quoting from http://www.c2.com/cgi/wiki?HashTable cited in another answer:
Caution: some people use the term
"open hashing" to mean what I've
called "closed hashing" here! The
usage here is in accordance with that
in TheArtOfComputerProgramming and
IntroductionToAlgorithms, both of
which are recommended references if
you want to know more about hash
tables.
For example, the above page defines "open hashing" as follows:
There are two main strategies. Open
hashing, also called open addressing,
says: when the table entry you need
for a new key/value pair is already
occupied, find another unused entry
somehow and put it there. Closed
hashing says: each entry in the table
is a secondary data structure (usually
a linked list, but there are other
possibilities) containing the actual
data, and this data structure can be
extended without limit.
By contrast, the definition supplied by Wikipedia is:
In the strategy known as separate
chaining, direct chaining, or simply
chaining, each slot of the bucket
array is a pointer to a linked list
that contains the key-value pairs that
hashed to the same location. Lookup
requires scanning the list for an
entry with the given key. Insertion
requires appending a new entry record
to either end of the list in the
hashed slot. Deletion requires
searching the list and removing the
element. (The technique is also called
open hashing or closed addressing,
which should not be confused with
'open addressing' or 'closed
hashing'.)
If so-called "experts" cannot agree what the term "open hashing" means, it is best to avoid using it.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.