Equals method benefit for hashtable implementation in Java? - java

A hash table relies on two methods: hashCode and equals. Internally, when we add a key-value pair to a Hashtable, it first calls the key's hashCode method and checks whether the result equals the hash code of any previous key. If it does not, it simply adds the key-value pair; if it does, it calls the key's equals method, where we again provide some logic to check whether the objects are equal. So my question is: can we eliminate the work done in the equals method and put the same kind of logic inside hashCode, returning a different hash code for each distinct object (based on the logic we would have put in equals)? That way we could manage the hashtable with the hashCode method alone and eliminate the need for equals.
Take the example of an Employee class with id, salary and name as its state. We use Employee as the key in a hashtable, and we override hashCode in a way that serves the purpose of both hashCode and equals. So there is no need for the equals method.
I know I am missing something here. Looking for it.

Yes, you're missing something.
First: hashCode returns an int, and can thus only return 2^32 different values. equals is thus needed to be able to differentiate between values which have identical hash codes.
Second: the hash table uses the hashCode modulo the number of buckets it maintains. So, even if two keys have different hashCodes, they might fall in the same bucket, and equals will be necessary to differentiate them.
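As a rough illustration of the second point, the sketch below reduces hash codes modulo a bucket count. The bucketIndex helper and the bucket count of 16 are assumptions made for this example; the real java.util.HashMap also mixes the hash bits before indexing, so this is a simplification and not the library's code.

```java
class BucketDemo {
    // Hypothetical helper: maps a hash code to one of `buckets` slots.
    static int bucketIndex(int hash, int buckets) {
        return Math.floorMod(hash, buckets);
    }

    public static void main(String[] args) {
        Integer a = 1;
        Integer b = 17;
        // Different hash codes (Integer.hashCode() is the int value itself)...
        System.out.println(a.hashCode() + " vs " + b.hashCode());
        // ...but the same bucket index when there are 16 buckets.
        System.out.println(bucketIndex(a.hashCode(), 16) + " vs " + bucketIndex(b.hashCode(), 16));
    }
}
```

Both keys land in slot 1, so only equals can tell them apart once the table has picked the bucket.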

The problem is that you can't guarantee (as a general condition) that the hashcode will always be unique.
You might be able to make a single class that can, for example Employee should be uniquely identified by employeeId. There would be no reason your hashcode could not simply be return employeeId; - you would guarantee uniqueness that way.
But, a general object will have much more. Consider a coordinate class
class Coordinate {
    int x;
    int y;
    int z;
    public boolean equals(Object o) {
        if (o instanceof Coordinate) {
            Coordinate c = (Coordinate) o;
            return x == c.x && y == c.y && z == c.z;
        }
        return false;
    }
    public int hashCode() {
        return x ^ y ^ z;
    }
}
Your x y and z would make for 2^96 different combinations of uniqueness, but only 2^32 possible hashcodes. For example 1,2,3 vs 3,2,1 would both be the same. Now you could improve this to make the hashcode something like
public int hashCode() {
    int c = x;
    c = 31 * c + y; // multiply the running hash by 31, then add the next field
    c = 31 * c + z;
    return c;
}
But this wouldn't get rid of the problem - you'd still be able to come up with thousands of combinations that would cause a hashcode collision.
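To make the collisions concrete, here is a small sketch that evaluates two hash functions on plain ints: the XOR hash, and a 31-based hash in the usual c = 31 * c + field form. The class and method names are made up for this example.

```java
class CoordinateHashes {
    // XOR of the three fields, as in the Coordinate class.
    static int xorHash(int x, int y, int z) {
        return x ^ y ^ z;
    }

    // Multiply-by-31 combination of the three fields.
    static int mulHash(int x, int y, int z) {
        int c = x;
        c = 31 * c + y;
        c = 31 * c + z;
        return c;
    }

    public static void main(String[] args) {
        // (1,2,3) and (3,2,1) collide under XOR: both give 0.
        System.out.println(xorHash(1, 2, 3) == xorHash(3, 2, 1));  // true
        // (0,0,31) and (0,1,0) collide under the 31-based hash: both give 31.
        System.out.println(mulHash(0, 0, 31) == mulHash(0, 1, 0)); // true
    }
}
```

A better mixing function makes collisions rarer, but with 2^96 inputs and 2^32 outputs they cannot be removed.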
But fear not - there are such things as what you describe: they're called Perfect Hashes

The problem is that hashCode() returns an int, and there are only 2^32 different hashcodes. Therefore, for classes with more than 2^32 different states (i.e. pretty much everything), you cannot avoid returning the same hashcode for some objects even though they are not equal.

The thing you're missing is that some data cannot be uniquely represented by a finite integer. A String is an example.
Also, equals isn't used only for when the hashCodes are the same. Elements are put into a "bucket" that usually covers millions of possible hashCode values (using the modulo operator). So even if every possible object had a unique hashCode you'd still need to double check everything.

So my Question here is the work we are doing in equals method we can eliminate that and put the same kind of logic inside hashcode method where we provide different hashcode (depending upon the logic we are putting in equals method).
The equals method is used to prevent duplicate keys from being inserted into a Map (if you go by the API documentation); this includes HashMaps and Hashtables. The hashCode method, on the other hand, is used to optimize lookups, but cannot be relied on to compare two keys for equality, as there is the possibility of hash collisions. The Map documentation specifically states:
Implementations are free to implement optimizations whereby the equals invocation is avoided, for example, by first comparing the hash codes of the two keys.
In the event of hash collisions among keys, a single bucket will store two or more values for two or more different keys, and the bucket must be traversed sequentially to find the value matching the key, which is the worst case. That's why the use of hashCode for comparison is an optimization: the actual value matching the key can be obtained only via the equals method. Note that this assumes that the same fields used to calculate the hash code are also used to compare for equality.
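A minimal sketch of such a collision, using two standard String keys that happen to share a hash code ("Aa" and "BB" both hash to 2112). The class name and the build helper are only for this example.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeys {
    static Map<String, Integer> build() {
        Map<String, Integer> m = new HashMap<>();
        m.put("Aa", 1);
        m.put("BB", 2); // same hash code as "Aa", but not equal to it
        return m;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        Map<String, Integer> m = build();
        System.out.println(m.size());    // 2
        System.out.println(m.get("Aa")); // 1
        System.out.println(m.get("BB")); // 2
    }
}
```

Both entries survive and each lookup returns its own value, which is only possible because the map falls back on equals inside the shared bucket.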

Related

Why do two different HashSets with the same data have the same HashCode?

I recently ran across a problem on leetcode which I solved with a nested hashset. This is the problem, if you're interested: https://leetcode.com/problems/group-anagrams/.
My intuition was to add all of the letters of each word into a hashset, then put that hashset into another hashset. At each iteration, I would check if the hashset already existed, and if it did, add to the existing hashset.
Oddly enough, that seems to work. Why do 2 hashsets share the same hashcode if they are different objects? Would something like if(set1.hashCode() == set2.hashCode()) doStuff() be valid code?
This is expected. HashSet extends AbstractSet. The hashCode() method in AbstractSet says:
Returns the hash code value for this set. The hash code of a set is defined to be the sum of the hash codes of the elements in the set, where the hash code of a null element is defined to be zero. This ensures that s1.equals(s2) implies that s1.hashCode()==s2.hashCode() for any two sets s1 and s2, as required by the general contract of Object.hashCode.
This implementation iterates over the set, calling the hashCode method on each element in the set, and adding up the results.
Here's the code from AbstractSet:
public int hashCode() {
    int h = 0;
    Iterator<E> i = iterator();
    while (i.hasNext()) {
        E obj = i.next();
        if (obj != null)
            h += obj.hashCode();
    }
    return h;
}
Why do 2 hashsets share the same hashcode if they are different objects?
With HashSet, the hashCode is calculated using the contents of the set. Since it's just numeric addition, the order of addition doesn't matter – just add them all up. So it makes sense that you have two sets, each containing objects which are equivalent (and thus should have matching hashCode() values), and then the sum of hashCodes within each set is the same.
Would something like if(set1.hashCode() == set2.hashCode()) doStuff() be valid code?
Sure.
EDIT: The best way of comparing two sets for equality is to use equals(). In the case of AbstractSet, calling set1.equals(set2) would result in individual calls to equals() at the level of the objects within the set (as well as some other checks).
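A short sketch of why comparing hash codes compiles and runs but is not an equality test. Because the set's hash code is the sum of its elements' hash codes, and Integer.hashCode() is the int value itself, the sets {1, 4} and {2, 3} share a hash code without being equal. The setOf helper is made up for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SetHashDemo {
    static Set<Integer> setOf(Integer... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    public static void main(String[] args) {
        Set<Integer> a = setOf(1, 4);
        Set<Integer> b = setOf(4, 1);
        Set<Integer> c = setOf(2, 3);
        // Same contents, different instances: equal, with equal hash codes.
        System.out.println(a.hashCode() == b.hashCode() && a.equals(b)); // true
        // Different contents, same sum of element hashes: 1 + 4 == 2 + 3.
        System.out.println(a.hashCode() == c.hashCode());                // true
        System.out.println(a.equals(c));                                 // false
    }
}
```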
Why do two different HashSets with the same data have the same HashCode?
Actually this is needed to fulfill another need that is specified in Java.
The equals method of Set is overridden to take in consideration that equals returns true (example a.equals(b)) if:
a is of type Set and b is of type Set.
both a and b have exactly the same size.
a contains all elements of b.
b contains all elements of a.
Since the default equals (which compares only the memory reference to be the same) is overridden for Set, according to java guidelines the hashCode method has to be overridden as well. So, this custom implementation of hashCode is provided in order to match with the custom implementation of equals.
In order to see why it is necessary to override hashCode method when the equals method is overridden, you can take a look at this previous answer of mine.
Why do 2 hashsets share the same hashcode if they are different objects?
Because as explained above this is needed so that Set can have the custom functionality for equals that it currently has.
If you want to just check if a and b are different instances of set you can still check this with operators == and !=.
a == b -> true means a and b point to the same instance of Set in memory
a != b -> true means a and b point to different instances of Set in memory

Why can't I just compare the hashCode of two objects in order to find out if they are equal or not?

Why do the equals methods implemented by Eclipse compare each value, wouldn't it be simpler to just compare the hashCodes of both objects?
From what I know:
hashCode always generates the same hash for the same input
So if two objects are equal, they should have the same hash
If objects that are equal have the same hash, I can just check the hash in order to determine if objects are equal or not
edit: Related question, why does one always implement the hashCode when equals is implemented, if the hashCode isn't actually needed for equals?
hashCode always generates the same hash for the same input
Correct.
So if two objects are equal, they should have the same hash
Correct.
If objects that are equal have the same hash, I can just check the hash in order to determine if objects are equal or not
Non sequitur. Objects that are unequal can also have the same hashcode. That is the purpose of a hashcode.
Related question, why does one always implement the hashCode when equals is implemented, if the hashCode isn't actually needed for equals?
Because it is needed for hashing, in HashMap, HashSet, and friends. If you think your object will never be so used, don't override it, and good luck with that.
To complement @EJP's answer, here is a perfectly valid, although useless, implementation of .hashCode():
@Override
public int hashCode()
{
    return 42; // The Answer
}
Putting this in very simple terms: while every squirrel is an animal, not every animal is a squirrel. The hashCode is usually used for quick lookup - it should be efficient and it should distribute data uniformly across a lookup table - see here. But a hash function can generate collisions, which is why it shouldn't be used as a means of verifying object equality.
It's all very much dependent on the implementation of hashCode - as you can also see in fge's answer.
As to why it usually needs to be reimplemented when you override equals: they are both used when storing and retrieving objects from collections (for example a HashMap). The hashCode determines the place in the map where the object will be inserted, while equals is used to identify the object inside a collision bucket.
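To see that a constant hash code is legal but leaves all the work to equals, here is a small sketch. The ConstantHash class is hypothetical and exists only for this example.

```java
import java.util.HashSet;
import java.util.Set;

class ConstantHash {
    final String name;

    ConstantHash(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof ConstantHash && ((ConstantHash) o).name.equals(name);
    }

    @Override
    public int hashCode() {
        return 42; // legal, but every instance lands in the same bucket
    }

    static Set<ConstantHash> build() {
        Set<ConstantHash> s = new HashSet<>();
        s.add(new ConstantHash("a"));
        s.add(new ConstantHash("b"));
        s.add(new ConstantHash("a")); // duplicate by equals, so it is rejected
        return s;
    }

    public static void main(String[] args) {
        Set<ConstantHash> s = build();
        System.out.println(s.size());                          // 2
        System.out.println(s.contains(new ConstantHash("b"))); // true
    }
}
```

The set still behaves correctly; it just loses the speed advantage, since every lookup has to walk the single shared bucket calling equals.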

What happens if we override only hashCode() in a class and use it in a Set?

This may not be a real-world scenario, but I am curious to know what happens. The code is below.
I am creating a set of objects of class UsingSet.
According to the hashing concept in Java, when I first add an object which contains "a", it will create a bucket with hashcode 97 and put the object inside it.
When it encounters another object with "a", it will call the overridden hashCode method in the class UsingSet and get hashcode 97, so what is next?
As I have not overridden the equals method, the default implementation will return false. So where will the object with value "a" be kept: in the same bucket as the previous object with hashcode 97, or will it create a new bucket?
Does anybody know how it will be stored internally?
import java.util.*;
import java.lang.*;
import java.io.*;
class UsingSet {
    String value;
    public UsingSet(String value){
        this.value = value;
    }
    public String toString() {
        return value;
    }
    public int hashCode() {
        int hash = value.hashCode();
        System.out.println("hashcode called" + hash);
        return hash;
    }
    public static void main(String args[]) {
        java.util.Set s = new java.util.HashSet();
        s.add(new UsingSet("A"));
        s.add(new UsingSet("b"));
        s.add(new UsingSet("a"));
        s.add(new UsingSet("b"));
        s.add(new UsingSet("a"));
        s.add(new Integer(1));
        s.add(new Integer(1));
        System.out.println("s = " + s);
    }
}
output is:
hashcode called65
hashcode called98
hashcode called97
hashcode called98
hashcode called97
s = [1, b, b, A, a, a]
HashCode & Equals methods
Only Override HashCode, Use the default Equals:
Only the references to the same object will return true. In other words, those objects you expected to be equal will not be equal by calling the equals method.
Only Override Equals, Use the default HashCode: There might be duplicates in the HashMap or HashSet. We write the equals method and expect {"abc", "ABC"} to be equal. However, when using a HashMap, they might land in different buckets, so the contains() method will not detect one when given the other.
James Large's answer is incorrect, or rather misleading (and partly incorrect as well). I will explain.
If two objects are equal according to their equals() method, they must also have the same hash code.
If two objects have the same hash code, they do NOT have to be equal too.
Here is the actual wording from the java.lang.Object documentation:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
It is true, that if two objects don't have the same hash then they are not equal. However, hashing is not a way to check equality - so it is wildly incorrect to say that it is a faster way to check equality.
Also, it is also wildly incorrect to say the hashCode function is an efficient way to do anything. This is all up to implementation, but the default implementation for hashCode of a string is very inefficient as the String gets large. It will perform a calculation based on each char of the String, so if you are using large Strings as keys, then this becomes very inefficient; moreso if you have a large number of buckets.
In a HashMap (HashSet uses a HashMap internally), there are buckets and in each bucket is a linked list. Java uses the hashCode() function to find out which bucket it belongs in (it actually will modify the hash, depending on how many buckets exist). Since two objects may share the same hash, it will iterate through the linked list sequentially next, checking the equals() method to see if the object is a duplicate. Per the java.util.Set documentation:
A collection that contains no duplicate elements.
So, if its hashCode() leads it to a bucket that already contains an Object for which .equals() evaluates to true, the new Object is treated as a duplicate: a HashSet leaves the existing element in place, and a HashMap keeps the existing key and replaces only its value. You can probably view here for more information:
How does a Java HashMap handle different objects with the same hash code?
Generally speaking though, it is good practice that if you override the hashCode function, you also override the equals function. (Strictly, the contract is broken in the opposite case, overriding equals without hashCode, but overriding only hashCode rarely gives the behavior you want.)
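The difference can be checked with a small sketch that adds two objects holding the same string to a HashSet, once with only hashCode overridden and once with both methods overridden. The HashOnly and HashAndEquals classes are hypothetical stand-ins for UsingSet.

```java
import java.util.HashSet;
import java.util.Set;

class SetUniqueness {
    // Overrides hashCode only, like UsingSet in the question.
    static class HashOnly {
        final String value;
        HashOnly(String value) { this.value = value; }
        @Override public int hashCode() { return value.hashCode(); }
    }

    // Overrides both methods consistently.
    static class HashAndEquals {
        final String value;
        HashAndEquals(String value) { this.value = value; }
        @Override public int hashCode() { return value.hashCode(); }
        @Override public boolean equals(Object o) {
            return o instanceof HashAndEquals && ((HashAndEquals) o).value.equals(value);
        }
    }

    static int sizeHashOnly() {
        Set<HashOnly> s = new HashSet<>();
        s.add(new HashOnly("a"));
        s.add(new HashOnly("a")); // same bucket, but equals() is identity: kept
        return s.size();
    }

    static int sizeHashAndEquals() {
        Set<HashAndEquals> s = new HashSet<>();
        s.add(new HashAndEquals("a"));
        s.add(new HashAndEquals("a")); // same bucket and equal: rejected
        return s.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeHashOnly());      // 2
        System.out.println(sizeHashAndEquals()); // 1
    }
}
```

This matches the output in the question, where "a" and "b" each appear twice but the two Integer(1) values collapse into one.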
Simply put, you can think of the hashCode and equals methods as a 2D search,
where the hash code selects the row and the list of objects in that row is the column.
Consider the following class structure.
public class obj
{
    int id; // was declared as "Id", which did not match this.id below
    String name;
    public obj(String name,int id)
    {
        this.id=id;
        this.name=name;
    }
}
now if you create the objects like this:-
obj obj1=new obj("Hassu",1);
obj obj2=new obj("Hoor",2);
obj obj3=new obj("Heniel",3);
obj obj4=new obj("Hameed",4);
obj obj5=new obj("Hassu",1);
and you place this objects in map like this :-
HashMap hMap=new HashMap();
1. hMap.put(obj1,"value1");
2. hMap.put(obj2,"value2");
3. hMap.put(obj3,"value3");
4. hMap.put(obj4,"value4");
5. hMap.put(obj5,"value5");
Now, if you have not overridden hashCode and equals, then after all the puts up to line 5, obj5 ends up in a different row (bucket) from obj1, because the default hashCode gives each object a different hash code.
So in memory at run time it will be stored like this.
|hashcode | Objects
|-----------| ---------
|000562 | obj1
|000552 | obj2
|000588 | obj3
|000546 | obj4
|000501 | obj5
Now if you create an object with the same content, like:
obj obj6 = new obj("Hassu",1);
and search for it in the map, like
if(hMap.containsKey(obj6))
or
hMap.get(obj6);
then, although a key (obj1) with the same content is present, you will get false and null respectively.
Now if you override only the equals method
and perform the same search, you will almost certainly still get null, because the hash code for obj6 is different and no key is found under that hash code.
Now if you override only the hashCode method,
you will reach the same bucket (hash code row), but the content cannot be compared, so the reference comparison inherited from Object is used.
So if you search for the key with hMap.get(obj6), you will get the correct hash code, 000562, but because obj1 and obj6 are different references you will get null.
A Set will behave differently from what you might expect.
Uniqueness will not be enforced, because uniqueness is achieved by the hashCode and equals methods together.
With both overridden, the output would look like s = [A, a, b, 1] instead of the earlier one.
Apart from that, remove and contains will not work as expected either.
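For comparison, here is a sketch of the case the walkthrough leads up to, where the key class overrides both methods. The Key class is a hypothetical, renamed version of obj, and Objects.hash is just one convenient way to combine the fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class KeyLookup {
    static final class Key {
        final String name;
        final int id;
        Key(String name, int id) { this.name = name; this.id = id; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return id == k.id && name.equals(k.name);
        }
        @Override public int hashCode() { return Objects.hash(name, id); }
    }

    static Map<Key, String> build() {
        Map<Key, String> m = new HashMap<>();
        m.put(new Key("Hassu", 1), "value1");
        m.put(new Key("Hoor", 2), "value2");
        m.put(new Key("Hassu", 1), "value5"); // equal key: replaces "value1"
        return m;
    }

    public static void main(String[] args) {
        Map<Key, String> m = build();
        System.out.println(m.size());                   // 2
        System.out.println(m.get(new Key("Hassu", 1))); // value5
    }
}
```

A freshly constructed key with the same content now finds the entry, and the second put with an equal key replaces the value instead of adding a new row.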
Without looking at your code...
The whole point of hash codes is to speed up the process of testing two objects for equality. It can be costly to test whether two large, complex objects are equal, but it is trivially easy to compare their hash codes, and hash codes can be pre-computed.
The rule is: If two objects don't have the same hash code, that means they are not equal. No need to do the expensive equality test.
So, the answer to the question in your title: If you define an equals() method that says object A is equal to object B, and you define a hashCode() method that says object A is not equal to object B (i.e., it says they have different hash codes), and then you hand those two objects to some library that cares whether they are equal or not (e.g., if you put them in a hash table), then the behavior of the library is going to be undefined (i.e., probably wrong).
Added information: Wow! I really missed seeing the forest for the trees here---thinking about the purpose of hashCode() without putting it in the context of HashMap. If m is a Map with N entries, and k is a key; what is the purpose of calling m.get(k)? The purpose, obviously, is to search the map for an entry whose key is equal to k.
What if hash codes and hash maps had not been invented? Well the best you could do, assuming that the keys have a natural, total order, is to search a TreeMap, comparing the given key for equality with O(log(N)) other keys. In the worst case, where the keys have no order, you would have to compare the given key for equality with every key in the map until you either find a match or tested them all. In other words, the complexity of m.get(k) would be O(N).
When m is a HashMap, the complexity of m.get(k) is O(1), whether the keys can be ordered or not.
So, I messed up by saying that the point of hash codes was to speed up the process of testing two objects for equality. It's really about testing an object for equality with a whole collection of other objects. That's where comparing hash codes doesn't just help a little; It helps by orders of magnitude...
...If the k.hashCode() and k.equals(o) methods obey the rule: j.hashCode()!=k.hashCode() implies !j.equals(k).
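A sketch of what happens when that rule is broken. The BadKey class is hypothetical: its equals ignores a field that its hashCode depends on, so two equal objects can report different hash codes, and a HashSet lookup misses an element that equals would have matched.

```java
import java.util.HashSet;
import java.util.Set;

class BrokenContract {
    static final class BadKey {
        final String name;
        final int salt; // ignored by equals, but used as the hash code
        BadKey(String name, int salt) { this.name = name; this.salt = salt; }

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
        @Override public int hashCode() { return salt; } // violates the contract
    }

    static boolean lookupFindsEqualKey() {
        Set<BadKey> s = new HashSet<>();
        s.add(new BadKey("x", 1));
        return s.contains(new BadKey("x", 2));
    }

    public static void main(String[] args) {
        System.out.println(new BadKey("x", 1).equals(new BadKey("x", 2))); // true
        System.out.println(lookupFindsEqualKey());                          // false
    }
}
```

The lookup never reaches equals, because the stored hash and the probe's hash differ.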

Do I need a equals and Hashcode method if my class implements comparable in Java?

I found this comment on can StringBuffer objects be keys in TreeSet in Java?
"There are 2 identifying strategies used with Maps in Java (more-or-less).
Hashing: An input "Foo" is converted into a best-as-possible attempt to generate a number that uniquely accesses an index into an array. (Purists, please don't abuse me, I am intentionally simplifying). This index is where your value is stored. There is the likely possibility that "Foo" and "Bar" actually generate the same index value meaning they would both be mapped to the same array position. Obviously this can't work and so that's where the "equals()" method comes in; it is used to disambiguate
Comparison: By using a comparative method you don't need this extra disambiguation step because comparison NEVER produces this collision in the first place. The only key that "Foo" is equal to is "Foo". A really good idea though is if you can is to define "equals()" as compareTo() == 0; for consistency sake. Not a requirement."
my question is as follows:
if my class implements comparable, then does it mean I dont have to override equals and hashcode method for using my objects as keys in Hash collections. eg
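class Person implements Comparable<Person> {
    int id;
    String name;
    public Person(int id, String name) {
        this.id=id;
        this.name=name;
    }
    public int compareTo(Person other) {
        // Integer.compare avoids the overflow that this.id-other.id can hit for extreme ids
        return Integer.compare(this.id, other.id);
    }
}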
Now, can I use my Person objects in hash-based collections?
The comment you quoted is talking about TreeSet. A TreeSet is a tree in which each node's position is determined by how its value compares with the other values already in the tree.
A Hashtable stores key/value pairs in a hash table. When using a Hashtable, you specify an object that is used as a key, and the value that you want linked to that key. The key is then hashed, and the resulting hash code is used as the index at which the value is stored within the table.
The difference between a Hashtable and a TreeSet is that a TreeSet doesn't need hashCode; it just needs to know whether to go left or right in the tree. For that you can use compareTo and nothing more.
In a Hashtable a comparison will not suffice, because it is built differently: each object gets to its cell by being hashed, not by being compared with the items already in the collection.
So the answer is no, you can't rely on compareTo alone to use Person in a Hashtable. You must override hashCode() and equals() for that.
I also suggest you read this article on hashtables.
HashTable does use equals and hashCode. Every class has those methods. If you don't implement them, you inherit them.
Whether you need to implement them depends on whether the inherited version is suitable for your purposes. In particular, since Person has no specified superclass, it inherits the Object methods. That means a Person object is equal only to itself.
Do you need two distinct Person objects to be treated as being equal as HashTable keys?
if my class implements comparable, then does it mean I dont have to override equals and hashcode method for using my objects as keys in Hash collections. eg
No, you still need to implement equals() and hashCode(). The methods perform very different functions and cannot be replaced by compareTo().
equals() returns a boolean based on equality of the object. By default (as inherited from Object) this is identity equality and not field equality. This can be very different from the fields used to compare an object in compareTo(...) although if it makes sense for the entity, the equals() method can be:
@Override
public boolean equals(Object obj) {
    if (obj == null || obj.getClass() != getClass()) {
        return false;
    } else {
        return compareTo((Person)obj) == 0;
    }
}
hashCode() returns an integer value for the instance which is used in hash tables to calculate the bucket it should be placed in. There is no equivalent way to get this value out of compareTo(...).
TreeSet needs Comparable, to add values to the right or left of the tree. HashMap needs the equals() and hashCode() methods, which are available from the Object class, but you have to override them for your purpose.
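The contrast can be sketched with a Person that implements Comparable but inherits equals and hashCode from Object. The surrounding ComparableOnly class and the sizeAfterAddingTwice helper are made up for this example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class ComparableOnly {
    static final class Person implements Comparable<Person> {
        final int id;
        final String name;
        Person(int id, String name) { this.id = id; this.name = name; }
        @Override public int compareTo(Person other) { return Integer.compare(id, other.id); }
        // equals and hashCode are deliberately inherited from Object.
    }

    // Adds two distinct instances with the same id and reports the set size.
    static int sizeAfterAddingTwice(Set<Person> set) {
        set.add(new Person(1, "Ann"));
        set.add(new Person(1, "Ann"));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAddingTwice(new TreeSet<>())); // 1
        System.out.println(sizeAfterAddingTwice(new HashSet<>())); // 2
    }
}
```

The TreeSet treats the second Person as a duplicate because compareTo returns 0, while the HashSet keeps both because the inherited equals only matches the same instance.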
If a class implements Comparable, that would suggest that instances of the class represent values of some sort; generally, when classes encapsulate values it will be possible for there to exist two distinct instances which hold the same value and should consequently be considered equivalent. Since the only way for distinct object instances to be considered equivalent is for them to override equals and hashCode, that would imply that things which implement Comparable should override equals and hashCode unless the encapsulated values upon which compareTo operates will be globally unique (implying that distinct instances should never be considered equivalent).
As a simple example, suppose a class includes a CreationRank field of type long; every time an instance is created, that member is set to a value fetched from a singleton AtomicLong, and compareTo uses that field to rank objects in the order of creation. No two distinct instances of the class will ever report the same CreationRank; consequently, the only way x.equals(y) should ever be true is if x and y refer to the same object instance, exactly the way the default equals and hashCode work.
BTW, having x.compareTo(y) return zero should generally imply that x.equals(y) will return true, and vice versa, but there are some cases where x.equals(y) may be false but x.compareTo(y) should nonetheless return zero. This may be the case when an object encapsulates some properties that can be ranked and others that cannot. Consider, for example, a hypothetical FutureAction type which encapsulates a DateTime and an implementation of a DoSomething interface. Such things could be ranked based upon the encapsulated date and time, but there may be no sensible way to rank two items which have the same date and time but different actions. Having equals report false while compareTo reports zero would make more sense than pretending that the clearly-non-equivalent items should be called "equal".
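The standard library has a well-known case of this (not mentioned in the answer above, added here only as an illustration): java.math.BigDecimal compares 1.0 and 1.00 as equal in order, but its equals also looks at the scale. The helper methods below are made up for this sketch.

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class CompareVsEquals {
    static final BigDecimal A = new BigDecimal("1.0");
    static final BigDecimal B = new BigDecimal("1.00");

    static int treeSize() {
        Set<BigDecimal> tree = new TreeSet<>();
        tree.add(A);
        tree.add(B); // compareTo == 0, so the TreeSet sees a duplicate
        return tree.size();
    }

    static int hashSize() {
        Set<BigDecimal> hash = new HashSet<>();
        hash.add(A);
        hash.add(B); // equals is false, so the HashSet keeps both
        return hash.size();
    }

    public static void main(String[] args) {
        System.out.println(A.compareTo(B)); // 0
        System.out.println(A.equals(B));    // false
        System.out.println(treeSize());     // 1
        System.out.println(hashSize());     // 2
    }
}
```

This is why the two kinds of collection can disagree about duplicates whenever compareTo and equals are not consistent with each other.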

equals and hashCode

I am running into a question about equals and hashCode contracts:
here it is
Given:
class SortOf {
    String name;
    int bal;
    String code;
    short rate;
    public int hashCode() {
        return (code.length() * bal);
    }
    public boolean equals(Object o) {
        // insert code here
    }
}
Which of the following will fulfill the equals() and hashCode() contracts for this
class? (Choose all that apply.)
Correct Answer
C:
return ((SortOf)o).code.length() * ((SortOf)o).bal == this.code.length() *
this.bal;
D:
return ((SortOf)o).code.length() * ((SortOf)o).bal * ((SortOf)o).rate ==
this.code.length() * this.bal * this.rate;
I have a question about the last choice D, say if the two objects
A: code.length=10, bal=10, rate = 100
B: code.length=10, bal=100, rate = 10
Then using the equals() method in D, we get A.equals(B) evaluating to true right? But then they get a different hashCode because they have different balances? Is it that I misunderstood the concept somewhere? Can someone clarify this for me?
You're right - D would be inappropriate because of this.
More generally, hashCode and equals should basically take the same fields into account, in the same way. This is a very strange equals implementation to start with, of course - you should normally be checking for equality between each of the fields involved. In a few cases fields may be inter-related in a way which would allow for multiplication etc, but I wouldn't expect that to involve a string length...
One important point which often confuses people is that it is valid for unequal objects to have the same hash code; it's the case you highlighted (equal objects having different hash codes) which is unacceptable.
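The asker's counterexample can be run directly. In this sketch, options C and D are written as separate methods (equalsC and equalsD, names made up here) so they can be compared side by side, the unused name field is left out, and the code string is an arbitrary 10-character value.

```java
class SortOfCheck {
    static final class SortOf {
        int bal;
        String code;
        short rate;

        SortOf(String code, int bal, short rate) {
            this.code = code;
            this.bal = bal;
            this.rate = rate;
        }

        @Override public int hashCode() { return code.length() * bal; }

        // Option C: compares exactly what hashCode() computes.
        boolean equalsC(SortOf o) {
            return o.code.length() * o.bal == code.length() * bal;
        }

        // Option D: also multiplies in rate, which hashCode() ignores.
        boolean equalsD(SortOf o) {
            return o.code.length() * o.bal * o.rate == code.length() * bal * rate;
        }
    }

    static final SortOf A = new SortOf("ABCDEFGHIJ", 10, (short) 100);
    static final SortOf B = new SortOf("ABCDEFGHIJ", 100, (short) 10);

    public static void main(String[] args) {
        System.out.println(A.equalsD(B));                 // true: 10000 == 10000
        System.out.println(A.hashCode() == B.hashCode()); // false: 100 vs 1000
        System.out.println(A.equalsC(B));                 // false: 1000 != 100
    }
}
```

Under D the two objects are equal but hash differently, which is exactly the forbidden combination; under C they are simply unequal, so no rule is broken.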
You have to check at least all the fields used by .hashCode() so that objects which are equal do have the same hash. But you can check more fields in equals; it's totally fine to have different objects with the same hash. It seems you're doing SCJP 1.6? This topic is well covered in the SCJP 1.6 book from Katherine Sierra and Bert Bates.
Note: that's why it's legitimate to implement a useful .equals() while returning a constant value from .hashCode()
It's all about fulfilling the contract (as far as this question is concerned). Different implementations (of hashCode and equals) have different limitations and their own advantages, so it's for the developer to check that.
but then they get different hashCode because they have a different balance?
Exactly! But that's why you should choose option C. The question wants to test your grasp of fulfilling the contract, not which hashCode would be better for the scenario.
More clarification:
The thing you need to check always is :
Your hashCode() implementation should use the same instance variables as used in equals() method.
Here these instance variables are : code.length() and bal used in hashCode() and hence you are limited to use these same variables in equals() as well. (Unless you can edit the hashCode() implementation and add rate to it)
The hashCode() method is used to get an integer for a given object; it is not guaranteed to be unique. This integer is used to determine the bucket location when the object needs to be stored in a hash-table-like data structure such as HashMap. By default, Object's hashCode() method returns an implementation-specific integer, typically described as derived from the memory address where the object is stored.
The equals() method, as the name suggests, is used to verify the equality of two objects. The default implementation simply compares the two object references.
Equal objects must have equal hash codes.
equals() must define an equivalence relation. If the objects are not modified, then it must keep returning the same value. o.equals(null) must always return false.
hashCode() must also be consistent: if the object is not modified in terms of equals(), it must keep returning the same value.
The relation between the two method is:
whenever a.equals(b) then a.hashCode() must be same as b.hashCode().
refer to: https://howtodoinjava.com/interview-questions/core-java-interview-questions-series-part-1/
In general, you should always override one if you override the other in a class. If you don't, you might find yourself getting into trouble when that class is used in hashmaps/hashtables, etc.
