HashMap in Java. hash.containsKey returns unexpected - java

I have a problem with HashMap, more specifically with containsKey.
I want to check whether an object exists in my hash. The problem occurs when I call this method with 2 different objects that contain exactly the same data and should therefore have the same hashCode.
Person pers1,pers2;
pers1=new Person("EU",22);
pers2=new Person("EU",22);
public int hashCode(){ //From Person Class
return this.getName().hashCode()+age;
}
After inserting pers1 as a key in my hash, calling "hash.containsKey(pers1)" returns true but "hash.containsKey(pers2)" returns false. Why, and how can I fix this issue?
Thank you!

The cause of the issue seems to be that you did not override the equals method in the Person class. HashMap needs it to locate the key while searching.
The steps performed while searching the key are as follows :
1) use hashCode() on the object (key) to locate the appropriate bucket where the key can be placed.
2) Once bucket is located, try to find the particular Key using equals() method.

containsKey() uses the .equals() method, which you don't seem to override. .hashCode() (ideally) provides a uniform distribution across the hash table; it does not do any equality comparison (aside from the requirement that two equal objects have the same hash code).
As you can see in the source code:
if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
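A minimal sketch of the fix, assuming Person has just the two fields implied by the question (name and age; the field names and the demo class are illustrative, not taken from the asker's code). Once equals() is overridden consistently with hashCode(), a second instance with the same data is found:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical Person class; field names are assumed from the question.
final class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person other = (Person) o;
        return age == other.age && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        // Uses the same fields as equals(), so equal objects share a hash code.
        return Objects.hash(name, age);
    }
}

class PersonLookupDemo {
    // Inserts one Person and looks it up with a distinct but equal instance.
    static boolean containsEqualKey() {
        Map<Person, String> hash = new HashMap<>();
        hash.put(new Person("EU", 22), "first");
        return hash.containsKey(new Person("EU", 22));
    }

    public static void main(String[] args) {
        System.out.println(containsEqualKey()); // true
    }
}
```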

Related

Whose .equals() method is called to resolve hash collision in HashMaps?

Every article about HashMap hash collisions has one thing in common, and my question revolves around it.
Let me explain what I understand about HashMap's internal workings.
Saving two entries (e1, e2) with the same hashcode using map.put(k,v):
1) when the map.put(k,v) is called, hashmap finds the hashCode() of the key 'k'.
2) then it uses this hashcode it found as a seed for its internal static hashing method & gets another hash value.
3) then this new found hash value is mapped to the internal index of bucket.
4) then an Entry is added to the bucket.
In case of a hash collision.
1) same as normal, when the map.put(k,v) is called, hashmap finds the hashCode() of the key 'k'.
2) again same as usual, then it uses this hashcode it found as a seed for its internal static hashing method & gets another hash value.
3) the newly found hash value is mapped to the internal index of the bucket. Now there is a problem, as there is already an entry at this bucket position.
Resolution: since the Entry is actually a simple linked list, the new item with the collided hash is stored as the next of the previous Entry.
Fetching the entry e2 with map.get(k)
1) a hash is generated from the key, and the static hash method is called again using that hash as seed.
2) the mapped bucket is found using the value returned by the static hash method. If there is more than one entry here, the equals() method comes to the rescue:
the linked list is traversed, calling equals() on each entry until a match is found.
Now my question is: where is this so-called equals() method defined?
I opened the official documentation of HashMap and it doesn't override the .equals() method, so where is it overridden? Or is it the default .equals() from the Object class?
Both hashCode() and equals() methods belong to the class of the key object, not to the hash map.
The methods are defined in the Object class, but it is expected that the objects used as keys in a hash map provide their own implementation for both these methods. Therefore, it's not the default .equals() from Object class, it is the specific .equals() from the actual key class that gets called for collision resolution.
For example, if you use String objects as keys, the overrides of hashCode() and equals() provided by String would be used.
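A small illustration of that last point, offered as a sketch rather than anything from the original answer: the strings "Aa" and "BB" happen to have the same String.hashCode(), so they collide, and it is String.equals() that tells them apart during lookup. The demo class name is made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class StringCollisionDemo {
    // Stores two keys whose hash codes collide and looks one of them up.
    static String lookup(String key) {
        Map<String, String> map = new HashMap<>();
        map.put("Aa", "first");
        map.put("BB", "second");
        return map.get(key);
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println(lookup("Aa")); // first
        System.out.println(lookup("BB")); // second
    }
}
```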

Implementation of containsKey HashMap<> - Java

The whole purpose of using containsKey() is to check whether a given key is already in the HashMap. If it doesn't contain that key, then just add the key to that HashMap.
But it seems that when we call this method its parameter is of type Object, which means containsKey() checks whether the given argument (key) has the same memory address as any already-entered key.
Potential Solution:
One solution could be to get some unique data from object1 (the old key) and compare it with object2 (the new key). If they are the same, then don't use it in the HashMap. However, this means containsKey has no purpose at all. Am I right?
Sorry, I am not ranting, though I probably sound like I am. I would like to know the most efficient way to get around this problem.
I will be thankful for any kind of help.
But seems like when we call this method it's parameters are Object type that means, containsKey() checks whether given argument(key) has similar memory address with any other already entered key.
Wrong. Their equality is checked by comparing their hashCode() values first. Only if the hash values are equal, the objects themselves may be compared (but always using equals(), not ==). So any class where these two methods are implemented properly will work correctly as a key in a HashMap.
HashMap.containsKey() first locates the bucket from the key's hashCode() rather than comparing the key against every entry. If an entry with the same hash exists, it then checks whether the keys match by reference equality OR equals().
This is implemented in HashMap.getEntry() method:
/**
* Returns the entry associated with the specified key in the
* HashMap. Returns null if the HashMap contains no mapping
* for the key.
*/
final Entry<K,V> getEntry(Object key) {
int hash = (key == null) ? 0 : hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
}
return null;
}
But seems like when we call this method it's parameters are Object type
Yes, but the method will be called on the actual implementation type, not on Object.class. That's why it's so important to implement equals() and hashCode() properly.
Read: Effective Java, Item 9 (in fact you should buy and read the whole book)
But seems like when we call this method it's parameters are Object type that means, containsKey() checks whether given argument(key) has similar memory address with any other already entered key
This conclusion is wrong. containsKey(Object key) calls the equals() method on the passed key, so if the key's class has overridden equals(Object), the lookup resolves correctly according to the key's equivalence criteria. Of course, if the key class has not overridden equals(), it is a bad design to start with.
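To make the lookup path above observable, here is a hedged sketch with a hypothetical key class (CountingKey, invented for this example) that counts how often its own equals() is invoked. It shows that the key's override is the one being called, and that an equal but distinct instance is still found. The exact number of equals() calls is an implementation detail and is not asserted here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type that records calls to its own equals().
class CountingKey {
    static int equalsCalls = 0;
    private final String id;

    CountingKey(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        equalsCalls++;
        return o instanceof CountingKey && ((CountingKey) o).id.equals(id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}

class ContainsKeyDemo {
    // Returns the results for: same reference, equal copy, different key.
    static boolean[] run() {
        Map<CountingKey, String> map = new HashMap<>();
        CountingKey stored = new CountingKey("k1");
        map.put(stored, "value");
        boolean sameReference = map.containsKey(stored);
        boolean equalCopy = map.containsKey(new CountingKey("k1"));
        boolean differentKey = map.containsKey(new CountingKey("k2"));
        return new boolean[] {sameReference, equalCopy, differentKey};
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // true true false
        System.out.println("equals() calls: " + CountingKey.equalsCalls);
    }
}
```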

Equals method benefit for hashtable implementation in Java?

For the benefit of the hashtable we have two methods, hashcode and equals. Internally, when we add a key-value pair to a hashtable, it first goes into the hashcode method of the key and checks whether the result is equal to the hashcode value of any previous key. If it is not, it simply adds the key-value pair to the hashtable, but if it is equal, it goes into the equals method of the key, where we again provide some logic to check whether the objects are equal. So my question is: can we eliminate the work we do in the equals method and put the same kind of logic inside the hashcode method, returning different hashcodes depending on the logic we would have put in equals? That way we could manage the hashtable with the hashcode method only and eliminate the need for the equals method.
Take the example of an Employee class with id, salary and name as its state. We use Employee as the key in a hashtable. So we override hashcode in a way that satisfies the needs of both the hashcode and equals methods, and there is no need for the equals method.
I know I am missing something here. Looking for it.
Yes, you're missing something.
First: hashCode returns an int, and can thus only return 2^32 different values. equals is thus needed to be able to differentiate between values which have identical hash codes.
Second: the hash table uses the hashCode modulo the number of buckets it maintains. So, even if two keys have different hashCodes, they might fall in the same bucket, and equals will be necessary to differentiate them.
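The second point can be sketched with a little arithmetic. This assumes a table whose bucket count is a power of two, where masking with (buckets - 1) gives the same result as a modulo for non-negative hashes; bucketIndex is a hypothetical helper written for this example, not a JDK method.

```java
class BucketIndexDemo {
    // Hypothetical helper: maps a hash to a bucket index, assuming the
    // number of buckets is a power of two.
    static int bucketIndex(int hash, int buckets) {
        return hash & (buckets - 1);
    }

    public static void main(String[] args) {
        // Two different hash codes that land in the same bucket of a 16-bucket table.
        System.out.println(bucketIndex(1, 16));  // 1
        System.out.println(bucketIndex(17, 16)); // 1
        // A hash code that lands elsewhere.
        System.out.println(bucketIndex(2, 16));  // 2
    }
}
```

Since hashes 1 and 17 share a bucket, equals() is still needed to tell their keys apart even though the hash codes differ.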
The problem is that you can't guarantee (as a general condition) that the hashcode will always be unique.
You might be able to make a single class that can, for example Employee should be uniquely identified by employeeId. There would be no reason your hashcode could not simply be return employeeId; - you would guarantee uniqueness that way.
But, a general object will have much more. Consider a coordinate class
class Coordinate {
int x;
int y;
int z;
public boolean equals(Object o) {
if(o instanceof Coordinate) {
Coordinate c = (Coordinate)o;
return x == c.x && y == c.y && z == c.z;
}
return false;
}
public int hashCode() {
return x ^ y ^ z;
}
}
Your x, y and z allow 2^96 distinct combinations, but there are only 2^32 possible hashcodes. For example, 1,2,3 and 3,2,1 would both produce the same hashcode. Now you could improve this by making the hashcode something like
public int hashCode() {
    int c = x;
    // multiply-then-add; `c *= 31 + y` would always yield 0 whenever x == 0
    c = 31 * c + y;
    c = 31 * c + z;
    return c;
}
But this wouldn't get rid of the problem - you'd still be able to come up with thousands of combinations that would cause a hashcode collision.
But fear not - there are such things as what you describe: they're called Perfect Hashes
The problem is that hashCode() returns an int, and there are only 2^32 different hashcodes. Therefore, for classes with more than 2^32 different states (i.e. pretty much everything), you cannot avoid returning the same hashcode for some objects even though they are not equal.
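A runnable sketch of the Coordinate collision described above. The constructor and the demo class are additions for illustration; the equals() and XOR-based hashCode() follow the answer's example. Both keys share a hash code, yet the map keeps them as two separate entries because equals() distinguishes them.

```java
import java.util.HashMap;
import java.util.Map;

final class Coordinate {
    final int x;
    final int y;
    final int z;

    Coordinate(int x, int y, int z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Coordinate)) return false;
        Coordinate c = (Coordinate) o;
        return x == c.x && y == c.y && z == c.z;
    }

    @Override
    public int hashCode() {
        return x ^ y ^ z; // order-insensitive, so permutations collide
    }
}

class CoordinateCollisionDemo {
    static Map<Coordinate, String> build() {
        Map<Coordinate, String> map = new HashMap<>();
        map.put(new Coordinate(1, 2, 3), "a");
        map.put(new Coordinate(3, 2, 1), "b");
        return map;
    }

    public static void main(String[] args) {
        Map<Coordinate, String> map = build();
        System.out.println(new Coordinate(1, 2, 3).hashCode()
                == new Coordinate(3, 2, 1).hashCode());       // true
        System.out.println(map.size());                       // 2
        System.out.println(map.get(new Coordinate(3, 2, 1))); // b
    }
}
```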
The thing you're missing is that some data cannot be uniquely represented by a finite integer. A String is an example.
Also, equals isn't used only for when the hashCodes are the same. Elements are put into a "bucket" that usually covers millions of possible hashCode values (using the modulo operator). So even if every possible object had a unique hashCode you'd still need to double check everything.
So my Question here is the work we are doing in equals method we can eliminate that and put the same kind of logic inside hashcode method where we provide different hashcode (depending upon the logic we are putting in equals method).
The equals method is used to prevent duplicate keys from being inserted into a Map (if you go by the API documentation); this includes HashMaps and HashTables. The hashcode method on the other hand is used to optimize lookups, but cannot be relied on to compare equality of two keys as there is the possibility of hash collisions. The Map documentation specifically states:
Implementations are free to implement optimizations whereby the equals invocation is avoided, for example, by first comparing the hash codes of the two keys.
In the event of hash collisions among keys, a single bucket will store two or more values for two different keys, and the bucket must be traversed sequentially to find the value matching the key, which is the worst case. That's why the use of hashcode for comparison is an optimization, as the actual value matching the key can be obtained only via the equals method. Note that this assumes that the same fields used to calculate the hashcode are also used to compare for equality.

Java Set collection - override equals method

Is there any way to override the equals method used by a Set datatype? I wrote a custom equals method for a class called Fee. Now I have a LinkedList of Fee and I want to ensure that there are no duplicated entries. Thus I am considering using a Set instead of a LinkedList, but the criteria for deciding if two fees are equal reside in the overridden equals method in the Fee class.
If using a LinkedList, I will have to iterate over every list item and call the overriden equals method in the Fee class with the remaining entries as a parameter. Just reading this alone sounds like too much processing and will add to computational complexity.
Can I use Set with an overridden equals method? Should I?
As Jeff Foster said:
The Set.equals() method is only used to compare two sets for equality.
You can use a Set to get rid of the duplicate entries, but beware: HashSet doesn't rely only on the equals() method of its elements to determine equality.
A HashSet is backed by an internal HashMap whose keys are the set's elements, so it uses each element's hashCode() to find the bucket and then equals() to confirm a match.
One way to solve the issue is to override hashCode() in the Class that you put in the Set, so that it represents your equals() criteria
For Example:
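class Fee {
    String name;
    String getName() {
        return name;
    }
    @Override
    public boolean equals(Object o) {
        // Cast o first: a method call binds tighter than a cast
        return (o instanceof Fee) && ((Fee) o).getName().equals(this.getName());
    }
    @Override
    public int hashCode() {
        return name.hashCode();
    }
}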
You can and should use a Set to hold an object type with an overridden equals method, but you may need to override hashCode() too. Equal objects must have equal hash codes.
For example:
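public class Fee {
    public String fi;
    public String fo;
    @Override
    public int hashCode() {
        return fi.hashCode() ^ fo.hashCode();
    }
    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Fee)) {
            return false;
        }
        Fee other = (Fee) obj; // Object has no fi/fo fields, so cast first
        return fi.equals(other.fi) && fo.equals(other.fo);
    }
}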
(With null checks as necessary, of course.)
Sets often use hashCode() to optimize performance, and will misbehave if your hashCode method is broken. For example, HashSet uses an internal HashMap.
If you check the source code of HashMap, you'll see it depends on both the hashCode() and the equals() methods of the elements to determine equality:
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
If the hash is not generated correctly, your equals method may never get called.
To make your set faster, you should generate distinct hash codes for objects that are not equal, wherever possible.
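Putting this together for the asker's case, a minimal sketch of removing duplicates from a LinkedList by copying it into a HashSet. The single-field Fee class and the uniqueCount helper are assumptions made for the example; the real Fee class presumably has different fields.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Objects;

// Hypothetical Fee with one field; equals() and hashCode() use the same field.
final class Fee {
    private final String name;

    Fee(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Fee && Objects.equals(name, ((Fee) o).name);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(name);
    }
}

class FeeSetDemo {
    // Counts distinct fees by letting a HashSet drop the duplicates.
    static int uniqueCount(List<Fee> fees) {
        return new HashSet<>(fees).size();
    }

    public static void main(String[] args) {
        List<Fee> fees = new LinkedList<>(Arrays.asList(
                new Fee("late"), new Fee("late"), new Fee("annual")));
        System.out.println(uniqueCount(fees)); // 2
    }
}
```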
Set uses the equals method of the object added to the set. The JavaDoc states
A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element.
The Set.equals() method is only used to compare two sets for equality. It's never used as part of adding/remove items from the set.
One solution would be to use a TreeSet with a Comparator.
From the documentation:
TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal.
This approach would be much faster than scanning a LinkedList (O(log n) per operation instead of O(n)), but a bit slower than a HashSet, which is expected O(1).
It's worth noting that one side effect of using a TreeSet is that your set is sorted.
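A hedged sketch of the TreeSet approach, using a hypothetical FeeRecord class that deliberately does not override equals() or hashCode(), so that uniqueness comes only from the Comparator. The class and field names are invented for this example.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical element type with no equals()/hashCode() override.
class FeeRecord {
    final String name;
    final int amount;

    FeeRecord(String name, int amount) {
        this.name = name;
        this.amount = amount;
    }
}

class TreeSetDemo {
    // Builds a set in which two records count as duplicates when their names match.
    static TreeSet<FeeRecord> build() {
        Comparator<FeeRecord> byName = Comparator.comparing(f -> f.name);
        TreeSet<FeeRecord> set = new TreeSet<>(byName);
        set.add(new FeeRecord("late", 10));
        set.add(new FeeRecord("late", 99));   // rejected: same name per comparator
        set.add(new FeeRecord("annual", 50));
        return set;
    }

    public static void main(String[] args) {
        TreeSet<FeeRecord> set = build();
        System.out.println(set.size());       // 2
        System.out.println(set.first().name); // annual (the set is sorted by name)
    }
}
```

Note that the second "late" record is dropped even though its amount differs, because only the comparator decides what counts as a duplicate.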
Apache Commons Collections also provides PredicatedList and PredicatedSet.

How does Java implement hash tables?

Does anyone know how Java implements its hash tables (HashSet or HashMap)? Given the various types of objects that one may want to put in a hash table, it seems very difficult to come up with a hash function that would work well for all cases.
HashMap and HashSet are very similar. In fact, the second contains an instance of the first.
A HashMap contains an array of buckets that hold its entries. The array size is always a power of 2. If you don't specify another value, there are initially 16 buckets.
When you put an entry (key and value) in it, it decides which bucket the entry goes into by calculating it from the key's hashcode (the hashcode is not the key's memory address, and the hash is not a modulus). Different entries can collide in the same bucket, so they'll be put in a list.
Entries will be inserted until they reach the load factor. This factor is 0.75 by default, and changing it is not recommended unless you are very sure of what you're doing. A load factor of 0.75 means that a HashMap of 16 buckets can only contain 12 entries (16*0.75) before it resizes. Then a new array of buckets will be created, double the size of the previous one. All entries will be put again in the new array. This process is known as rehashing, and can be expensive.
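Therefore, a best practice, if you know how many entries will be inserted, is to construct the HashMap with a suitable initial capacity. Note that the constructor argument is the number of buckets, not the number of entries, so to avoid rehashing it should be at least the expected number of entries divided by the load factor:
new HashMap<>((int) Math.ceil(expectedEntries / 0.75));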
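The sizing arithmetic can be sketched as follows, assuming the default load factor of 0.75. The HashMap constructor argument is an initial capacity in buckets, so the expected entry count is divided by the load factor first. capacityFor and presized are hypothetical helpers written for this example, not JDK methods.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeDemo {
    static final double LOAD_FACTOR = 0.75; // HashMap's documented default

    // Smallest capacity whose resize threshold is not below the expected entry count.
    static int capacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / LOAD_FACTOR);
    }

    static Map<String, Integer> presized(int expectedEntries) {
        return new HashMap<>(capacityFor(expectedEntries));
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(12));  // 16
        System.out.println(capacityFor(100)); // 134
        Map<String, Integer> map = presized(100);
        for (int i = 0; i < 100; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size());       // 100
    }
}
```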
You can check the source of HashMap, for example.
Java depends on each class's implementation of the hashCode() method to distribute the objects evenly. Obviously, a bad hashCode() method will result in performance problems for large hash tables. If a class does not provide a hashCode() method, the default in the current implementation is to return some function (i.e. a hash) of the object's address in memory. Quoting from the API doc:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java(TM) programming language.)
There are two general ways to implement a HashMap. The difference is how one deals with collisions.
The first method, which is the one Java uses, makes every bucket in the HashMap contain a singly linked list. To accomplish this, each bucket contains an Entry type, which caches the hashCode and has a pointer to the key, a pointer to the value, and a pointer to the next entry. When a collision occurs in Java, another entry is added to the list.
The other method for handling collisions is to simply put the item into the next empty bucket. The advantage of this method is that it requires less space. However, it complicates removals: if the bucket following the removed item is not empty, one has to check whether that item is in the right or wrong bucket, and shift the item if it originally collided with the item being removed.
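The first approach (one linked list per bucket) can be sketched in a few dozen lines. This is a toy illustration under simplifying assumptions (fixed 16 buckets, no resizing, no removal), and TinyChainedMap is a made-up name; it is not how java.util.HashMap is actually written.

```java
import java.util.Objects;

// A minimal separate-chaining map: each bucket holds a singly linked list of nodes.
class TinyChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;   // cached hash code of the key
        final K key;
        V value;
        Node<K, V> next;  // next entry in the same bucket

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];
    private int size;

    private int indexFor(int hash) {
        return hash & (table.length - 1); // table length is a power of two
    }

    // Returns the previous value for the key, or null if there was none.
    V put(K key, V value) {
        int hash = Objects.hashCode(key);
        int i = indexFor(hash);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.hash == hash && Objects.equals(n.key, key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        table[i] = new Node<>(hash, key, value, table[i]); // prepend on collision
        size++;
        return null;
    }

    V get(Object key) {
        int hash = Objects.hashCode(key);
        for (Node<K, V> n = table[indexFor(hash)]; n != null; n = n.next) {
            if (n.hash == hash && Objects.equals(key, n.key)) {
                return n.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }
}
```

Storing the colliding keys "Aa" and "BB" in this sketch puts both nodes in the same bucket's list, and get() walks that list using equals() to pick the right one.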
In my own words:
An Entry object is created to hold the reference of the Key and Value.
The HashMap has an array of Entry objects.
The index for a given entry is derived from the hash returned by key.hashCode().
If there is a collision (two keys gave the same index), the entry is stored in the .next attribute of the existing entry.
That's how two objects with the same hash could be stored into the collection.
From this answer we get:
public V get(Object key) {
if (key == null)
return getForNullKey();
int hash = hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
return e.value;
}
return null;
}
Let me know if I got something wrong.
