I know about collections. But i was thinking , when we overide the key in map , it simply overides the previous value assigned to it.
On the otherside , we say that when we put value in map it calls hashcode() and if it goes to same bucket then equal() is been called. In case not , there is no need to call equal() method.
If thats the case, when we overide the key in hashmap. It probably calls hashcode() and as if same hashcode is returned for key/value pair. Then equal is been called and after checking that its exactly same , why it gets overide??
The HashMap (and map implementations) do not call hashCode() method on the values - they are called on the keys. There is a check performed to see if the key is null - if this is the case, hashCode() is not called and the key is stored in position 0 in the map's table.
The hash function is called on the keys. We all know that a map retrieves elements in constant time O(1). Now comes the interesting part - it is not always the case that the hash code of a key resolves to a unique value. When the number of entries is large, the HashMap in java applies another hashing function to resize the map structure but this cannot be done infinitely. At a certain point, more than one key will resolve to the same value. In this case, when more than one key gets hashed into the same bucket(location within the map structure), the collision is resolved by storing the elements as a linked list (this happens very rarely). Now the look up time for an element that has gone into a bucket which points to a linked list is linear (O(n)).
Related
in my project I use HashMap in order to store some data, and I've recently discovered that when I mutate the keys of the HashMap, some unexpected wrong results may happen. For Example:
HashMap<ArrayList,Integer> a = new HashMap<>();
ArrayList list1 = new ArrayList<>();
a.put(list1, 1);
System.out.println(a.containsKey(new ArrayList<>())); // true
list1.add(5);
ArrayList list2 = new ArrayList<>();
list2.add(5);
System.out.println(a.containsKey(list2)); // false
Note that both a.keySet().iterator().next().hashCode() == list2.hashCode() and a.keySet().iterator().next().equals(list2) are true.
I cannot understand why it happens, referring to the fact that the two objects are equal and have the same hash-code. Do anyone know what is the cause of that, and if there is any other similar structure that allows mutation of the keys? Thanks.
Mutable keys are always a problem. Keys are to be considered mutable if the mutation could change their hashcode and/or the result of equals(). That being said, lists often generate their hashcodes and check equality based on their elements so they almost never are good candidates for map keys.
What is the problem in your example? When the key is added it is an empty list and thus produces a different hashcode than when it contains an element. Hence even though the hashcode of the key and list2 are the same after changing the key list you'll not find the element. Why? Simply because the map looks in the wrong bucket.
Example (simplified):
Let's start with a few assumptions:
an empty list returns a hashcode of 0
if the list contains the element 5 it returns the hashcode 5
our map has 16 buckets (default)
the bucket index is determined by hashcode % 16 (the number of our buckets)
If you now add the empty list it gets inserted into bucket 0 due to its hashcode.
When you do the lookup with list1 it will look in bucket 5 due to the hashcode of 5. Since that bucket is empty nothing will be found.
The problem is that your key list changes its hashcode and thus should be put into a different bucket but the map doesn't know this should happen (and doing so would probably cause a bunch of other problems).
According to the javadocs for Map:
Note: great care must be exercised if mutable objects are used as map
keys. The behavior of a map is not specified if the value of an object
is changed in a manner that affects equals comparisons while the
object is a key in the map. A special case of this prohibition is that
it is not permissible for a map to contain itself as a key. While it
is permissible for a map to contain itself as a value, extreme caution
is advised: the equals and hashCode methods are no longer well defined
on such a map.
Your lists are the keys and you're changing them. It would not be a problem if the contents of the list were not what determine the values for hash code and what is equal, however that is not your case. If you think about it, it doesn't make much sense to change the key of a map. The key is what identifies the value, and if that key changes, all bets are off.
The map inserts the value given the hash code upon insertion. When you search for it later, it uses the hash code of the parameter to determine if it is a hit. I think you'd find that had you inserted list1 with the value already inserted that you would see "true" printed out since list2.hashCode() would produce the same hash code as list1 when it was inserted.
That's because a HashMap uses the hashCode() Method of Object in combination with equals(Object obj) to check if this map contains an object.
See:
ArrayList<Integer> a = new ArrayList<>();
a.add(1);
System.out.println(a.hashCode());
a.add(2);
System.out.println(a.hashCode());
This example shows, that the hashCode of your ArrayList has changed.
You should never use a mutable object as a key in your hashmap.
So what basically going on when u put the list1 as key in line 3 is that the map calculates its hashCode which it would later compare in containsKey(someKey) .
but when u mutated the list1 in line 5 its hashCode is essentially changed.
so if u now do
System.out.println(a.containsKey(list1));
after line 5 it would say false
and if u do System.out.println(a.get(list1));
it would say null as its comparing two different hashCodes
Probably you didn't override equals() and hashCode() methods.
On every single article about HashMaps hash collision one thing is in common and my question revolves around that.
Let me explain what i understand about hashmaps internal working.
Saving two entries(e1,e2) with same hashcode using map.put(k,v)
1) when the map.put(k,v) is called, hashmap finds the hashCode() of the key 'k'.
2) then it uses this hashcode it found as a seed for its internal static hashing method & gets another hash value.
3) then this new found hash value is mapped to the internal index of bucket.
4) then a Entry is added to the bucket.
In case of a hash collision.
1) same as normal, when the map.put(k,v) is called, hashmap finds the hashCode() of the key 'k'.
2) again same as usual, then it uses this hashcode it found as a seed for its internal static hashing method & gets another hash value.
3) the new found hash value is mapped to the internal index of the bucket, now there is a problem as it already has a entry at this bucket position.
Resolution : since the Entry is actually a simple linked list, the new item with the collided hash is stored at the next of the previous Entry.
Fetching the entry e2 with map.get(k)
1) hash generated from key & again static hash method called using the hash obtain from the key as seed.
2) finding the mapped bucket using the hash value obtained by the static hash method, now if there are more than one entries here equals() method comes to the rescue.
that is the linked list would traverse & keep on calling the "equals()" method until it finds the match.
Now my question is where is this so called equals() method defined ?
I opened the official documentation of HashMap & it doesn't override the .equals() method, so where is it overriden? Or is it the default .equals() from the Object class ?
Both hashCode() and equals() methods belong to the class of the key object, not to the hash map.
The methods are defined in the Object class, but it is expected that the objects used as keys in a hash map provide their own implementation for both these methods. Therefore, it's not the default .equals() from Object class, it is the specific .equals() from the actual key class that gets called for collision resolution.
For example, if you use String objects as keys, the overrides of hashCode() and equals() provided by String would be used.
I'm a bit confused about the internal implementation of HashSet and HashMap in java.
This is my understanding, so please correct me if I'm wrong:
Neither HashSet or HashMap allow duplicate elements.
HashSet is backed by a HashMap, so in a HashSet when we call .add(element), we are calling the hashCode() method on the element and internally doing a put(k,v) to the internal HashMap, where the key is the hashCode and the value is the actual object. So if we try to add the same object to the Set, it will see that the hashCode is already there, and then replace the old value by the new one.
But then, this seems inconsistent to me when I read how a HashMap works when storing our own objects as keys in a HashMap.
In this case we must override the hashCode() and equals() methods and make them consistent between each other, because, if we find keys with the same hashCode, they will go to the same bucket, and then to distinguish between all the entries with the same hashCode we have to iterate over the list of entries to call the method equals() on each key and find a match.
So in this case, we allow to have the same hashCode and we create a bucket containing a list for all the objects with the same hashCode, however using a HashSet, if we find already a hashCode, we replace the old value by the new value.
I'm a bit confused, could someone clarify this to me please?
You are correct regarding the behavior of HashMap, but you are wrong about the implementation of HashSet.
HashSet is backed by a HashMap internally, but the element you are adding to the HashSet is used as the key in the backing HashMap. For the value, a dummy value is used. Therefore the HashSet's contains(element) simply calls the backing HashMap's containsKey(element).
The value we insert in HashMap acts as a Key to the map object and for its value, java uses a constant variable.So in the key-value pair, all the keys will have the same value.
you can refer to this link
https://www.geeksforgeeks.org/hashset-in-java/
Hash Map:-Basically Hash map working as key and value ,if we want to store data as key and value pair then we will go to the hash map, basically when we insert data by using hash map basically internally it will follow 3 think,
1.hashcode
2..equale
3.==
when we insert the data in hash map it will store the data in bucket(fast in) by using hash code , if there is 2 data store in the same bocket then key collision will happen to resolve this key collision we use (==) method, always == method check the reference of the object, if both object hashcode is same then first one replace to second one if the hashcode is not same then hashing Collision will happen to resolve this hashing collision we will use (.equal) method .equal method basically it will check the content , if both the content is same then it will return true other wise it will return false, so in the hash map it will check is the content is same ? if the content is same then first one replace to the second one if both content is different the it will create another one object in the bocket and store the data
Hash Set:- Basically Hash Set is use to store bunch of object at a time ,internally hash set also use hash map only , when we insert somethink by using add method internally it will call put method and it will store data in the hashmap key bcz hash map key always unique and duplicate are not allowed that's way hashset also unique and duplicate are not allowed and if we entered duplicate also in hashst it will not through any exception first one will replace to the second one and in the value it will store constant data "PRESENT".
You can observe that internal hashmap object contains the element of hashset as keys and constant “PRESENT” as their value.
Where present is constant which is defined as
private static final Object present = new Object()
I am trying to figure something out about hashing in java.
If i want to store some data in a hashmap for example, will it have some kind of underlying hashtable with the hashvalues?
Or if someone could give a good and simple explanation of how hashing work, I would really appreciate it.
HashMap is basically implemented internally as an array of Entry[]. If you understand what is linkedList, this Entry type is nothing but a linkedlist implementation. This type actually stores both key and value.
To insert an element into the array, you need index. How do you calculate index? This is where hashing function(hashFunction) comes into picture. Here, you pass an integer to this hashfunction. Now to get this integer, java gives a call to hashCode method of the object which is being added as a key in the map. This concept is called preHashing.
Now once the index is known, you place the element on this index. This is basically called as BUCKET , so if element is inserted at Entry[0], you say that it falls under bucket 0.
Now assume that the hashFunction returns you same index say 0, for another object that you wanted to insert as a key in the map. This is where equals method is called and if even equals returns true, it simple means that there is a hashCollision. So under this case, since Entry is a linkedlist implmentation, on this index itself, on the already available entry at this index, you add one more node(Entry) to this linkedlist. So bottomline, on hashColission, there are more than one elements at a perticular index through linkedlist.
The same case is applied when you are talking about getting a key from map. Based on index returned by hashFunction, if there is only one entry, that entry is returned otherwise on linkedlist of entries, equals method is called.
Hope this helps with the internals of how it works :)
Hash values in Java are provided by objects through the implementation of public int hashCode() which is declared in Object class and it is implemented for all the basic data types. Once you implement that method in your custom data object then you don't need to worry about how these are used in miscellaneous data structures provided by Java.
A note: implementing that method requires also to have public boolean equals(Object o) implemented in a consistent manner.
If i want to store some data in a hashmap for example, will it have some kind of underlying hashtable with the hashvalues?
A HashMap is a form of hash table (and HashTable is another). They work by using the hashCode() and equals(Object) methods provided by the HashMaps key type. Depending on how you want you keys to behave, you can use the hashCode / equals methods implemented by java.lang.Object ... or you can override them.
Or if someone could give a good and simple explanation of how hashing work, I would really appreciate it.
I suggest you read the Wikipedia page on Hash Tables to understand how they work. (FWIW, the HashMap and HashTable classes use "separate chaining with linked lists", and some other tweaks to optimize average performance.)
A hash function works by turning an object (i.e. a "key") into an integer. How it does this is up to the implementor. But a common approach is to combine hashcodes of the object's fields something like this:
hashcode = (..((field1.hashcode * prime) + field2.hashcode) * prime + ...)
where prime is a smallish prime number like 31. The key is that you get a good spread of hashcode values for different keys. What you DON'T want is lots of keys all hashing to the same value. That causes "collisions" and is bad for performance.
When you implement the hashcode and equals methods, you need to do it in a way that satisfies the following constraints for the hash table to work correctly:
1. O1.equals(o2) => o1.hashcode() == o2.hashcode()
2. o2.equals(o2) == o2.equals(o1)
3. The hashcode of an object doesn't change while it is a key in a hash table.
It is also worth noting that the default hashCode and equals methods provided by Object are based on the target object's identity.
"But where is the hash values stored then? It is not a part of the HashMap, so is there an array assosiated to the HashMap?"
The hash values are typically not stored. Rather they are calculated as required.
In the case of the HashMap class, the hashcode for each key is actually cached in the entry's Node.hash field. But that is a performance optimization ... to make hash chain searching faster, and to avoid recalculating hashes if / when the hash table is resized. But if you want this level of understanding, you really need to read the source code rather than asking Questions.
This is the most fundamental contract in Java: the .equals()/.hashCode() contract.
The most important part of it here is that two objects which are considered .equals() should return the same .hashCode().
The reverse is not true: objects not considered equal may return the same hash code. But it should be as rare an occurrence as possible. Consider the following .hashCode() implementation, which, while perfectly legal, is as broken an implementation as can exist:
#Override
public int hashCode() { return 42; } // legal!!
While this implementation obeys the contract, it is pretty much useless... Hence the importance of a good hash function to begin with.
Now: the Set contract stipulates that a Set should not contain duplicate elements; however, the strategy of a Set implementation is left... Well, to the implementation. You will notice, if you look at the javadoc of Map, that its keys can be retrieved by a method called .keySet(). Therefore, Map and Set are very closely related in this regard.
If we take the case of a HashSet (and, ultimately, HashMap), it relies on .equals() and .hashCode(): when adding an item, it first calculates this item's hash code, and according to this hash code, attemps to insert the item into a given bucket. In contrast, a TreeSet (and TreeMap) relies on the natural ordering of elements (see Comparable).
However, if an object is to be inserted and the hash code of this object would trigger its insertion into a non empty hash bucket (see the legal, but broken, .hashCode() implementation above), then .equals() is used to determine whether that object is really unique.
Note that, internally, a HashSet is a HashMap...
Hashing is a way to assign a unique code for any variable/object after applying any function/algorithm on its properties.
HashMap stores key-value pair in Map.Entry static nested class implementation.
HashMap works on hashing algorithm and uses hashCode() and equals() method in put and get methods.
When we call put method by passing key-value pair, HashMap uses Key hashCode() with hashing to find out
the index to store the key-value pair. The Entry is stored in the LinkedList, so if there are already
existing entry, it uses equals() method to check if the passed key already exists, if yes it overwrites
the value else it creates a new entry and store this key-value Entry.
When we call get method by passing Key, again it uses the hashCode() to find the index
in the array and then use equals() method to find the correct Entry and return it’s value.
Below image will explain these detail clearly.
The other important things to know about HashMap are capacity, load factor, threshold resizing.
HashMap initial default capacity is 16 and load factor is 0.75. Threshold is capacity multiplied
by load factor and whenever we try to add an entry, if map size is greater than threshold,
HashMap rehashes the contents of map into a new array with a larger capacity.
The capacity is always power of 2, so if you know that you need to store a large number of key-value pairs,
for example in caching data from database, it’s good idea to initialize the HashMap with correct capacity
and load factor.
How exactly hash map store data internally ... I knew it will calculate HashCode value of key and store it.If two key having same hash code it will put into same bucket. But why if "two keys are same hashMap over write" the existing one?
Well, that's what it's designed to do. It's a mapping of key/value pairs, where any key is associated with 0 or 1 value. If you put a second value for a key, the entry for that key will be replaced.
It's not based just on the hash code though - it will test the keys for equality, too. Two keys can be unequal but have the same hash code. The important thing is that two equal keys must have the same hash code.
If you want to store multiple values for a single key, you should use something like Guava's Multimap.
It will not overwrite the value if the hashCode() is same. It will overwrite only if they are equal by the equals method.
See http://en.wikipedia.org/wiki/Hash_table and http://www.docjar.com/html/api/java/util/HashMap.java.html
A hash table or hash map is an array of linked lists, keyed by hashcode.
Hashcode code main purpose is to reduce the number of invocation of equals method in the hash based collection. Same hash code need not return true for equals method. But if you say its equals is true, then it hascode should be true.
Hash functions are generallyy used for eliminating duplicate data.That is why collections type
like Hashmap not allowing to store duplicate data.
This algorithms has been used in database as well to eliminate possible duplicates while retrieval.