Do java objects used as hash keys need to be 'fully' immutable? - java

If hashCode() calculation uses immutable fields and equals() uses all the fields would it be a problem when the class is used as a hash key? E.g.
import java.util.Objects;
public class Car {
protected final long vin;
protected String state;
protected String plateNumber;
public Car( long v, String s, String p ) {
vin = v; state = s; plateNumber = p;
}
public void move( String s, String p ) {
state = s; plateNumber = p;
}
public int hashCode() {
return (int)( vin % Integer.MAX_VALUE );
}
public boolean equals( Object other ) {
if (this == other) return true;
else if (!(other instanceof Car)) return false;
Car otherCar = (Car) other;
return vin == otherCar.vin
&& Objects.equals( state, otherCar.state )
&& Objects.equals( plateNumber, otherCar.plateNumber );
}
}
And move() is called on a car object after it is inserted into a hashset, possible via a reference kept elsewhere.
I am not after performance issues here. Only correctness.
I have read java hashCode contact, few answers on SO including this by venerable Jon Skeet and this from big blue. I feel that the last link gives the best explanation and imply that above code is correct.
Edit
Conclusion:
This class satisfy constraints placed on ‘equals()’ and ‘hashCode()’ in java. However it violates restrictions additional requirements placed on ‘equals()’ when used as keys in collections, hashed or not.
The additional requirement is that ‘equals()’ need to be consistent as long as the object is a key.
See the counter example by Louis Wasserman and the reference provided by Douglas below.
Few clarifications:
A) This class satisfy java object level constraints:
( carA == carB ) implies ( carA.hashCode() == carB.hashCode() )
( carA.hashCode() != carB.hashCode() ) implies ( carA != carB )
equals() need to be reflexive, symmetric, transitive.
hashCode() need to be consistent. i.e. Cannot change for an object during its lifetime.
equals() need to be consistent as long as neither object is modified.
Note that the reverse of ‘1.’ and ‘2.’ are not necessary. And the class above satisfies all the conditions.
Also java docs mention "equals() … implements the most discriminating possible equivalence relation on objects", but not sure if that is compulsory.
B) As for performance, the increment in collision avoidance probability decrease with each successive member variable we combine. Usually few well chosen member variables is sufficient.

It's correct if you never, ever call move after the Car is in the map. Otherwise it's wrong. Both hashCode and equals have to stay consistent after a key is in the map.

When considering only the hashCode and equals contracts, you are correct that this implementation satisfies their requirements. hashCode using a strict subset of the fields that equals uses is sufficient to guarantee that a.equals(b) implies a.hashCode() == b.hashCode() as required.
However, things change when you bring in Map. From the Map javadoc, "The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map."
After you call move on a Car that is a key in a Map, the behavior of that Map is now unspecified. In many cases it will in practice still work the way you want it to, but bizarre things could happen in ways that are hard to predict. While it would technically be valid for the Map to spontaneously empty itself or switch all lookups to use a random number generator, a more likely scenario might go like this:
Car car1 = ...
Car car2 = ... // a copy of car1
Map<Car, String> map1 = ...
map1.put(car1, "value");
assert map1.get(car2).equals("value"); // true
car1.move(...);
assert map1.get(car2).equals("value"); // NullPointerException on the equals call, car2 is no longer found
Notice that neither car2 nor the Map were changed themselves in any way, but the mapping of car2 changed (or rather, disappeared) anyway. This behavior is not officially specified, but I would guess most Map implementations do behave this way.

You may mutate your key candidates as much as you want, before or after (not during) they are used as keys.
In practice, it is very hard to enforce this rule. If you mutate objects you do not have a control if somebody uses them as keys or not.
Immutability for keys is just easier, removes source of subtle, hard-to-find bugs and just work better for key.
In your case I see no correctness issues. But why you ever bother not to include all fields in hashcode?

Short answer: it should be OK, but prepare for bizarre behavior.
Longer answer: when you change fields that participate in equals() on a key, the value keyed by that key will no longer be found.
Still longer answer: this looks as X/Y problem: you're asking about X, but you really need X to accomplish Y. Maybe you should ask about Y?
The car in your case is uniquely identified by vin. A car equals to itself. But, a car can be registered in different states. Maybe the answer is to have a Registration object (or a few of them) attached to the car? And then you can separate car.equals() from registration.equals().

Hash works by putting items into "buckets". Each bucket is calculated by the hashcode. After finding the bucket then the search continues comparing each item one by one using equals.
For example:
During insertion: an object whose id is 100 is placed in bucket 5 (the hashcode calculated 5).
During retrieval: you ask the hashmap to find the item 100. If the hash calculates 7 now then the algorithm will search for your object in bucket 7 but your object will never be found as it is dwelling in bucket 5.
In summary: the hash code and the actual key work together. The former is used to know in which bucket the item should be. The latter is used by the equals comparison seeking the actual item to return.

When your hashCode() implementation uses only limited number of fields (vs equals) you're reducing performance of almost any algorithm/data structure that uses hashing: HashMap, HashSet etc. You're increasing collision probability - it's the situation when two different objects (equals return false) have the same hash value.

The short answer is: No.
Long answer:
Fully immutability is not neccessary. BUT:
Equals must only depend on immutable values. Hashcode must depend on immutable values either a constant or a subset of the values used in equals or all values used in equals. Values that are not mentioned within equals mustn't be part of hashcode.
If you mutate values equals and hashcode rely on it is likely that you do not find your objects again in a hash based datastructure. Look at this:
public class Test {
private static class TestObject {
private String s;
public TestObject(String s) {
super();
this.s = s;
}
public void setS(String s) {
this.s = s;
}
#Override
public boolean equals(Object obj) {
boolean equals = false;
if (obj instanceof TestObject) {
TestObject that = (TestObject) obj;
equals = this.s.equals(that.s);
}
return equals;
}
#Override
public int hashCode() {
return this.s.hashCode();
}
}
public static void main(String[] args) {
TestObject a1 = new TestObject("A");
TestObject a2 = new TestObject("A");
System.out.println(a1.equals(a2)); // true
HashMap<TestObject, Object> hashSet = new HashMap<>(); // hash based datastructure
hashSet.put(a1, new Object());
System.out.println(hashSet.containsKey(a1)); // true
a1.setS("A*");
System.out.println(hashSet.containsKey(a1)); // false !!! // Object is not found as the hashcode used initially before mutation was used to determine the hash bucket
a2.setS("A*");
System.out.println(hashSet.containsKey(a2)); // false !!! Because a1 is in wrong hash bucket ...
System.out.println(a1.equals(a2)); // ... even if the objects are equals
}
}

Related

Why doesn't distinct work for a custom class in Java Streams? [duplicate]

Recently I read through this
Developer Works Document.
The document is all about defining hashCode() and equals() effectively and correctly, however I am not able to figure out why we need to override these two methods.
How can I take the decision to implement these methods efficiently?
Joshua Bloch says on Effective Java
You must override hashCode() in every class that overrides equals(). Failure to do so will result in a violation of the general contract for Object.hashCode(), which will prevent your class from functioning properly in conjunction with all hash-based collections, including HashMap, HashSet, and Hashtable.
Let's try to understand it with an example of what would happen if we override equals() without overriding hashCode() and attempt to use a Map.
Say we have a class like this and that two objects of MyClass are equal if their importantField is equal (with hashCode() and equals() generated by eclipse)
public class MyClass {
private final String importantField;
private final String anotherField;
public MyClass(final String equalField, final String anotherField) {
this.importantField = equalField;
this.anotherField = anotherField;
}
#Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result
+ ((importantField == null) ? 0 : importantField.hashCode());
return result;
}
#Override
public boolean equals(final Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
final MyClass other = (MyClass) obj;
if (importantField == null) {
if (other.importantField != null)
return false;
} else if (!importantField.equals(other.importantField))
return false;
return true;
}
}
Imagine you have this
MyClass first = new MyClass("a","first");
MyClass second = new MyClass("a","second");
Override only equals
If only equals is overriden, then when you call myMap.put(first,someValue) first will hash to some bucket and when you call myMap.put(second,someOtherValue) it will hash to some other bucket (as they have a different hashCode). So, although they are equal, as they don't hash to the same bucket, the map can't realize it and both of them stay in the map.
Although it is not necessary to override equals() if we override hashCode(), let's see what would happen in this particular case where we know that two objects of MyClass are equal if their importantField is equal but we do not override equals().
Override only hashCode
If you only override hashCode then when you call myMap.put(first,someValue) it takes first, calculates its hashCode and stores it in a given bucket. Then when you call myMap.put(second,someOtherValue) it should replace first with second as per the Map Documentation because they are equal (according to the business requirement).
But the problem is that equals was not redefined, so when the map hashes second and iterates through the bucket looking if there is an object k such that second.equals(k) is true it won't find any as second.equals(first) will be false.
Hope it was clear
Collections such as HashMap and HashSet use a hashcode value of an object to determine how it should be stored inside a collection, and the hashcode is used again in order to locate the object
in its collection.
Hashing retrieval is a two-step process:
Find the right bucket (using hashCode())
Search the bucket for the right element (using equals() )
Here is a small example on why we should overrride equals() and hashcode().
Consider an Employee class which has two fields: age and name.
public class Employee {
String name;
int age;
public Employee(String name, int age) {
this.name = name;
this.age = age;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public int getAge() {
return age;
}
public void setAge(int age) {
this.age = age;
}
#Override
public boolean equals(Object obj) {
if (obj == this)
return true;
if (!(obj instanceof Employee))
return false;
Employee employee = (Employee) obj;
return employee.getAge() == this.getAge()
&& employee.getName() == this.getName();
}
// commented
/* #Override
public int hashCode() {
int result=17;
result=31*result+age;
result=31*result+(name!=null ? name.hashCode():0);
return result;
}
*/
}
Now create a class, insert Employee object into a HashSet and test whether that object is present or not.
public class ClientTest {
public static void main(String[] args) {
Employee employee = new Employee("rajeev", 24);
Employee employee1 = new Employee("rajeev", 25);
Employee employee2 = new Employee("rajeev", 24);
HashSet<Employee> employees = new HashSet<Employee>();
employees.add(employee);
System.out.println(employees.contains(employee2));
System.out.println("employee.hashCode(): " + employee.hashCode()
+ " employee2.hashCode():" + employee2.hashCode());
}
}
It will print the following:
false
employee.hashCode(): 321755204 employee2.hashCode():375890482
Now uncomment hashcode() method , execute the same and the output would be:
true
employee.hashCode(): -938387308 employee2.hashCode():-938387308
Now can you see why if two objects are considered equal, their hashcodes must
also be equal? Otherwise, you'd never be able to find the object since the default
hashcode method in class Object virtually always comes up with a unique number
for each object, even if the equals() method is overridden in such a way that two
or more objects are considered equal. It doesn't matter how equal the objects are if
their hashcodes don't reflect that. So one more time: If two objects are equal, their
hashcodes must be equal as well.
You must override hashCode() in every
class that overrides equals(). Failure
to do so will result in a violation of
the general contract for
Object.hashCode(), which will prevent
your class from functioning properly
in conjunction with all hash-based
collections, including HashMap,
HashSet, and Hashtable.
   from Effective Java, by Joshua Bloch
By defining equals() and hashCode() consistently, you can improve the usability of your classes as keys in hash-based collections. As the API doc for hashCode explains: "This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable."
The best answer to your question about how to implement these methods efficiently is suggesting you to read Chapter 3 of Effective Java.
Why we override equals() method
In Java we can not overload how operators like ==, +=, -+ behave. They are behaving a certain way. So let's focus on the operator == for our case here.
How operator == works.
It checks if 2 references that we compare point to the same instance in memory. Operator == will resolve to true only if those 2 references represent the same instance in memory.
So now let's consider the following example
public class Person {
private Integer age;
private String name;
..getters, setters, constructors
}
So let's say that in your program you have built 2 Person objects on different places and you wish to compare them.
Person person1 = new Person("Mike", 34);
Person person2 = new Person("Mike", 34);
System.out.println ( person1 == person2 ); --> will print false!
Those 2 objects from business perspective look the same right? For JVM they are not the same. Since they are both created with new keyword those instances are located in different segments in memory. Therefore the operator == will return false
But if we can not override the == operator how can we say to JVM that we want those 2 objects to be treated as same. There comes the .equals() method in play.
You can override equals() to check if some objects have same values for specific fields to be considered equal.
You can select which fields you want to be compared. If we say that 2 Person objects will be the same if and only if they have the same age and same name, then the IDE will create something like the following for automatic generation of equals()
#Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Person person = (Person) o;
return age == person.age &&
name.equals(person.name);
}
Let's go back to our previous example
Person person1 = new Person("Mike", 34);
Person person2 = new Person("Mike", 34);
System.out.println ( person1 == person2 ); --> will print false!
System.out.println ( person1.equals(person2) ); --> will print true!
So we can not overload == operator to compare objects the way we want but Java gave us another way, the equals() method, which we can override as we want.
Keep in mind however, if we don't provide our custom version of .equals() (aka override) in our class then the predefined .equals() from Object class and == operator will behave exactly the same.
Default equals() method which is inherited from Object will check whether both compared instances are the same in memory!
Why we override hashCode() method
Some Data Structures in java like HashSet, HashMap store their elements based on a hash function which is applied on those elements. The hashing function is the hashCode()
If we have a choice of overriding .equals() method then we must also have a choice of overriding hashCode() method. There is a reason for that.
Default implementation of hashCode() which is inherited from Object considers all objects in memory unique!
Let's get back to those hash data structures. There is a rule for those data structures.
HashSet can not contain duplicate values and HashMap can not contain duplicate keys
HashSet is implemented with a HashMap behind the scenes where each value of a HashSet is stored as a key in a HashMap.
So we have to understand how a HashMap works.
In a simple way a HashMap is a native array that has some buckets. Each bucket has a linkedList. In that linkedList our keys are stored. HashMap locates the correct linkedList for each key by applying hashCode() method and after that it iterates through all elements of that linkedList and applies equals() method on each of these elements to check if that element is already contained there. No duplicate keys are allowed.
When we put something inside a HashMap, the key is stored in one of those linkedLists. In which linkedList that key will be stored is shown by the result of hashCode() method on that key. So if key1.hashCode() has as a result 4, then that key1 will be stored on the 4th bucket of the array, in the linkedList that exists there.
By default hashCode() method returns a different result for each different instance. If we have the default equals() which behaves like == which considers all instances in memory as different objects we don't have any problem.
But in our previous example we said we want Person instances to be considered equal if their ages and names match.
Person person1 = new Person("Mike", 34);
Person person2 = new Person("Mike", 34);
System.out.println ( person1.equals(person2) ); --> will print true!
Now let's create a map to store those instances as keys with some string as pair value
Map<Person, String> map = new HashMap();
map.put(person1, "1");
map.put(person2, "2");
In Person class we have not overridden the hashCode method but we have overridden equals method. Since the default hashCode provides different results for different java instances person1.hashCode() and person2.hashCode() have big chances of having different results.
Our map might end with those persons in different linkedLists.
This is against the logic of a HashMap
A HashMap is not allowed to have multiple equal keys!
But ours now has and the reason is that the default hashCode() which was inherited from Object Class was not enough. Not after we have overridden the equals() method on Person Class.
That is the reason why we must override hashCode() method after we have overridden equals method.
Now let's fix that. Let's override our hashCode() method to consider the same fields that equals() considers, namely age, name
public class Person {
private Integer age;
private String name;
..getters, setters, constructors
#Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Person person = (Person) o;
return age == person.age &&
name.equals(person.name);
}
#Override
public int hashCode() {
return Objects.hash(name, age);
}
}
Now let's try again to save those keys in our HashMap
Map<Person, String> map = new HashMap();
map.put(person1, "1");
map.put(person2, "2");
person1.hashCode() and person2.hashCode() will definitely be the same. Let's say it is 0.
HashMap will go to bucket 0 and in that LinkedList will save the person1 as key with the value "1". For the second put HashMap is intelligent enough and when it goes again to bucket 0 to save person2 key with value "2" it will see that another equal key already exists there. So it will overwrite the previous key. So in the end only person2 key will exist in our HashMap.
Now we are aligned with the rule of Hash Map that says no multiple equal keys are allowed!
Identity is not equality.
equals operator == test identity.
equals(Object obj) method compares equality test(i.e. we need to tell equality by overriding the method)
Why do I need to override the equals and hashCode methods in Java?
First we have to understand the use of equals method.
In order to identity differences between two objects we need to override equals method.
For example:
Customer customer1=new Customer("peter");
Customer customer2=customer1;
customer1.equals(customer2); // returns true by JVM. i.e. both are refering same Object
------------------------------
Customer customer1=new Customer("peter");
Customer customer2=new Customer("peter");
customer1.equals(customer2); //return false by JVM i.e. we have two different peter customers.
------------------------------
Now I have overriden Customer class equals method as follows:
#Override
public boolean equals(Object obj) {
if (this == obj) // it checks references
return true;
if (obj == null) // checks null
return false;
if (getClass() != obj.getClass()) // both object are instances of same class or not
return false;
Customer other = (Customer) obj;
if (name == null) {
if (other.name != null)
return false;
} else if (!name.equals(other.name)) // it again using bulit in String object equals to identify the difference
return false;
return true;
}
Customer customer1=new Customer("peter");
Customer customer2=new Customer("peter");
Insteady identify the Object equality by JVM, we can do it by overring equals method.
customer1.equals(customer2); // returns true by our own logic
Now hashCode method can understand easily.
hashCode produces integer in order to store object in data structures like HashMap, HashSet.
Assume we have override equals method of Customer as above,
customer1.equals(customer2); // returns true by our own logic
While working with data structure when we store object in buckets(bucket is a fancy name for folder). If we use built-in hash technique, for above two customers it generates two different hashcode. So we are storing the same identical object in two different places. To avoid this kind of issues we should override the hashCode method also based on the following principles.
un-equal instances may have same hashcode.
equal instances should return same hashcode.
Simply put, the equals-method in Object check for reference equality, where as two instances of your class could still be semantically equal when the properties are equal. This is for instance important when putting your objects into a container that utilizes equals and hashcode, like HashMap and Set. Let's say we have a class like:
public class Foo {
String id;
String whatevs;
Foo(String id, String whatevs) {
this.id = id;
this.whatevs = whatevs;
}
}
We create two instances with the same id:
Foo a = new Foo("id", "something");
Foo b = new Foo("id", "something else");
Without overriding equals we are getting:
a.equals(b) is false because they are two different instances
a.equals(a) is true since it's the same instance
b.equals(b) is true since it's the same instance
Correct? Well maybe, if this is what you want. But let's say we want objects with the same id to be the same object, regardless if it's two different instances. We override the equals (and hashcode):
public class Foo {
String id;
String whatevs;
Foo(String id, String whatevs) {
this.id = id;
this.whatevs = whatevs;
}
#Override
public boolean equals(Object other) {
if (other instanceof Foo) {
return ((Foo)other).id.equals(this.id);
}
}
#Override
public int hashCode() {
return this.id.hashCode();
}
}
As for implementing equals and hashcode I can recommend using Guava's helper methods
Let me explain the concept in very simple words.
Firstly from a broader perspective we have collections, and hashmap is one of the datastructure in the collections.
To understand why we have to override the both equals and hashcode method, if need to first understand what is hashmap and what is does.
A hashmap is a datastructure which stores key value pairs of data in array fashion. Lets say a[], where each element in 'a' is a key value pair.
Also each index in the above array can be linked list thereby having more than one values at one index.
Now why is a hashmap used?
If we have to search among a large array then searching through each if them will not be efficient, so what hash technique tells us that lets pre process the array with some logic and group the elements based on that logic i.e. Hashing
EG: we have array 1,2,3,4,5,6,7,8,9,10,11 and we apply a hash function mod 10 so 1,11 will be grouped in together. So if we had to search for 11 in previous array then we would have to iterate the complete array but when we group it we limit our scope of iteration thereby improving speed. That datastructure used to store all the above information can be thought of as a 2d array for simplicity
Now apart from the above hashmap also tells that it wont add any Duplicates in it. And this is the main reason why we have to override the equals and hashcode
So when its said that explain the internal working of hashmap , we need to find what methods the hashmap has and how does it follow the above rules which i explained above
so the hashmap has method called as put(K,V) , and according to hashmap it should follow the above rules of efficiently distributing the array and not adding any duplicates
so what put does is that it will first generate the hashcode for the given key to decide which index the value should go in.if nothing is present at that index then the new value will be added over there, if something is already present over there then the new value should be added after the end of the linked list at that index. but remember no duplicates should be added as per the desired behavior of the hashmap. so lets say you have two Integer objects aa=11,bb=11.
As every object derived from the object class, the default implementation for comparing two objects is that it compares the reference and not values inside the object. So in the above case both though semantically equal will fail the equality test, and possibility that two objects which same hashcode and same values will exists thereby creating duplicates. If we override then we could avoid adding duplicates.
You could also refer to Detail working
import java.util.HashMap;
public class Employee {
String name;
String mobile;
public Employee(String name,String mobile) {
this.name = name;
this.mobile = mobile;
}
#Override
public int hashCode() {
System.out.println("calling hascode method of Employee");
String str = this.name;
int sum = 0;
for (int i = 0; i < str.length(); i++) {
sum = sum + str.charAt(i);
}
return sum;
}
#Override
public boolean equals(Object obj) {
// TODO Auto-generated method stub
System.out.println("calling equals method of Employee");
Employee emp = (Employee) obj;
if (this.mobile.equalsIgnoreCase(emp.mobile)) {
System.out.println("returning true");
return true;
} else {
System.out.println("returning false");
return false;
}
}
public static void main(String[] args) {
// TODO Auto-generated method stub
Employee emp = new Employee("abc", "hhh");
Employee emp2 = new Employee("abc", "hhh");
HashMap<Employee, Employee> h = new HashMap<>();
//for (int i = 0; i < 5; i++) {
h.put(emp, emp);
h.put(emp2, emp2);
//}
System.out.println("----------------");
System.out.println("size of hashmap: "+h.size());
}
}
hashCode() :
If you only override the hash-code method nothing happens, because it always returns a new hashCode for each object as an Object class.
equals() :
If you only override the equals method, if a.equals(b) is true it means the hashCode of a and b must be the same but that does not happen since you did not override the hashCode method.
Note : hashCode() method of Object class always returns a new hashCode for each object.
So when you need to use your object in the hashing based collection, you must override both equals() and hashCode().
Java puts a rule that
"If two objects are equal using Object class equals method, then the hashcode method should give the same value for these two objects."
So, if in our class we override equals() we should override hashcode() method also to follow this rule.
Both methods, equals() and hashcode(), are used in Hashtable, for example, to store values as key-value pairs. If we override one and not the other, there is a possibility that the Hashtable may not work as we want, if we use such object as a key.
Adding to #Lombo 's answer
When will you need to override equals() ?
The default implementation of Object's equals() is
public boolean equals(Object obj) {
return (this == obj);
}
which means two objects will be considered equal only if they have the same memory address which will be true only if you are
comparing an object with itself.
But you might want to consider two objects the same if they have the same value for one
or more of their properties (Refer the example given in #Lombo 's answer).
So you will override equals() in these situations and you would give your own conditions for equality.
I have successfully implemented equals() and it is working great.So why are they asking to override hashCode() as well?
Well.As long as you don't use "Hash" based Collections on your user-defined class,it is fine.
But some time in the future you might want to use HashMap or HashSet and if you don't override and "correctly implement" hashCode(), these Hash based collection won't work as intended.
Override only equals (Addition to #Lombo 's answer)
myMap.put(first,someValue)
myMap.contains(second); --> But it should be the same since the key are the same.But returns false!!! How?
First of all,HashMap checks if the hashCode of second is the same as first.
Only if the values are the same,it will proceed to check the equality in the same bucket.
But here the hashCode is different for these 2 objects (because they have different memory address-from default implementation).
Hence it will not even care to check for equality.
If you have a breakpoint inside your overridden equals() method,it wouldn't step in if they have different hashCodes.
contains() checks hashCode() and only if they are the same it would call your equals() method.
Why can't we make the HashMap check for equality in all the buckets? So there is no necessity for me to override hashCode() !!
Then you are missing the point of Hash based Collections.
Consider the following :
Your hashCode() implementation : intObject%9.
The following are the keys stored in the form of buckets.
Bucket 1 : 1,10,19,... (in thousands)
Bucket 2 : 2,20,29...
Bucket 3 : 3,21,30,...
...
Say,you want to know if the map contains the key 10.
Would you want to search all the buckets? or Would you want to search only one bucket?
Based on the hashCode,you would identify that if 10 is present,it must be present in Bucket 1.
So only Bucket 1 will be searched !!
Because if you do not override them you will be use the default implentation in Object.
Given that instance equality and hascode values generally require knowledge of what makes up an object they generally will need to be redefined in your class to have any tangible meaning.
In order to use our own class objects as keys in collections like HashMap, Hashtable etc.. , we should override both methods ( hashCode() and equals() ) by having an awareness on internal working of collection. Otherwise, it leads to wrong results which we are not expected.
class A {
int i;
// Hashing Algorithm
if even number return 0 else return 1
// Equals Algorithm,
if i = this.i return true else false
}
put('key','value') will calculate the hash value using hashCode() to determine the
bucket and uses equals() method to find whether the value is already
present in the Bucket. If not it will added else it will be replaced with current value
get('key') will use hashCode() to find the Entry (bucket) first and
equals() to find the value in Entry
if Both are overridden,
Map<A>
Map.Entry 1 --> 1,3,5,...
Map.Entry 2 --> 2,4,6,...
if equals is not overridden
Map<A>
Map.Entry 1 --> 1,3,5,...,1,3,5,... // Duplicate values as equals not overridden
Map.Entry 2 --> 2,4,6,...,2,4,..
If hashCode is not overridden
Map<A>
Map.Entry 1 --> 1
Map.Entry 2 --> 2
Map.Entry 3 --> 3
Map.Entry 4 --> 1
Map.Entry 5 --> 2
Map.Entry 6 --> 3 // Same values are Stored in different hasCodes violates Contract 1
So on...
HashCode Equal Contract
Two keys equal according to equal method should generate same hashCode
Two Keys generating same hashCode need not be equal (In above example all even numbers generate same hash Code)
1) The common mistake is shown in the example below.
public class Car {
private String color;
public Car(String color) {
this.color = color;
}
public boolean equals(Object obj) {
if(obj==null) return false;
if (!(obj instanceof Car))
return false;
if (obj == this)
return true;
return this.color.equals(((Car) obj).color);
}
public static void main(String[] args) {
Car a1 = new Car("green");
Car a2 = new Car("red");
//hashMap stores Car type and its quantity
HashMap<Car, Integer> m = new HashMap<Car, Integer>();
m.put(a1, 10);
m.put(a2, 20);
System.out.println(m.get(new Car("green")));
}
}
the green Car is not found
2. Problem caused by hashCode()
The problem is caused by the un-overridden method hashCode(). The contract between equals() and hashCode() is:
If two objects are equal, then they must have the same hash code.
If two objects have the same hash code, they may or may not be equal.
public int hashCode(){
return this.color.hashCode();
}
It is useful when using Value Objects. The following is an excerpt from the Portland Pattern Repository:
Examples of value objects are things
like numbers, dates, monies and
strings. Usually, they are small
objects which are used quite widely.
Their identity is based on their state
rather than on their object identity.
This way, you can have multiple copies
of the same conceptual value object.
So I can have multiple copies of an
object that represents the date 16 Jan
1998. Any of these copies will be equal to each other. For a small
object such as this, it is often
easier to create new ones and move
them around rather than rely on a
single object to represent the date.
A value object should always override
.equals() in Java (or = in Smalltalk).
(Remember to override .hashCode() as
well.)
Assume you have class (A) that aggregates two other (B) (C), and you need to store instances of (A) inside hashtable. Default implementation only allows distinguishing of instances, but not by (B) and (C). So two instances of A could be equal, but default wouldn't allow you to compare them in correct way.
Consider collection of balls in a bucket all in black color. Your Job is to color those balls as follows and use it for appropriate game,
For Tennis - Yellow, Red.
For Cricket - White
Now bucket has balls in three colors Yellow, Red and White. And that now you did the coloring Only you know which color is for which game.
Coloring the balls - Hashing.
Choosing the ball for game - Equals.
If you did the coloring and some one chooses the ball for either cricket or tennis they wont mind the color!!!
I was looking into the explanation " If you only override hashCode then when you call myMap.put(first,someValue) it takes first, calculates its hashCode and stores it in a given bucket. Then when you call myMap.put(first,someOtherValue) it should replace first with second as per the Map Documentation because they are equal (according to our definition)." :
I think 2nd time when we are adding in myMap then it should be the 'second' object like myMap.put(second,someOtherValue)
The methods equals and hashcode are defined in the object class. By default if the equals method returns true, then the system will go further and check the value of the hash code. If the hash code of the 2 objects is also same only then the objects will be considered as same. So if you override only equals method, then even though the overridden equals method indicates 2 objects to be equal , the system defined hashcode may not indicate that the 2 objects are equal. So we need to override hash code as well.
Equals and Hashcode methods in Java
They are methods of java.lang.Object class which is the super class of all the classes (custom classes as well and others defined in java API).
Implementation:
public boolean equals(Object obj)
public int hashCode()
public boolean equals(Object obj)
This method simply checks if two object references x and y refer to the same object. i.e. It checks if x == y.
It is reflexive: for any reference value x, x.equals(x) should return true.
It is symmetric: for any reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
It is transitive: for any reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
It is consistent: for any reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the object is modified.
For any non-null reference value x, x.equals(null) should return
false.
public int hashCode()
This method returns the hash code value for the object on which this method is invoked. This method returns the hash code value as an integer and is supported for the benefit of hashing based collection classes such as Hashtable, HashMap, HashSet etc. This method must be overridden in every class that overrides the equals method.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.
This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Equal objects must produce the same hash code as long as they are
equal, however unequal objects need not produce distinct hash codes.
Resources:
JavaRanch
Picture
If you override equals() and not hashcode(), you will not find any problem unless you or someone else uses that class type in a hashed collection like HashSet.
People before me have clearly explained the documented theory multiple times, I am just here to provide a very simple example.
Consider a class whose equals() need to mean something customized :-
public class Rishav {
private String rshv;
public Rishav(String rshv) {
this.rshv = rshv;
}
/**
* #return the rshv
*/
public String getRshv() {
return rshv;
}
/**
* #param rshv the rshv to set
*/
public void setRshv(String rshv) {
this.rshv = rshv;
}
#Override
public boolean equals(Object obj) {
if (obj instanceof Rishav) {
obj = (Rishav) obj;
if (this.rshv.equals(((Rishav) obj).getRshv())) {
return true;
} else {
return false;
}
} else {
return false;
}
}
#Override
public int hashCode() {
return rshv.hashCode();
}
}
Now consider this main class :-
import java.util.HashSet;
import java.util.Set;
public class TestRishav {
public static void main(String[] args) {
Rishav rA = new Rishav("rishav");
Rishav rB = new Rishav("rishav");
System.out.println(rA.equals(rB));
System.out.println("-----------------------------------");
Set<Rishav> hashed = new HashSet<>();
hashed.add(rA);
System.out.println(hashed.contains(rB));
System.out.println("-----------------------------------");
hashed.add(rB);
System.out.println(hashed.size());
}
}
This will yield the following output :-
true
-----------------------------------
true
-----------------------------------
1
I am happy with the results. But if I have not overridden hashCode(), it will cause nightmare as objects of Rishav with same member content will no longer be treated as unique as the hashCode will be different, as generated by default behavior, here's the would be output :-
true
-----------------------------------
false
-----------------------------------
2
In the example below, if you comment out the override for equals or hashcode in the Person class, this code will fail to look up Tom's order. Using the default implementation of hashcode can cause failures in hashtable lookups.
What I have below is a simplified code that pulls up people's order by Person. Person is being used as a key in the hashtable.
public class Person {
String name;
int age;
String socialSecurityNumber;
public Person(String name, int age, String socialSecurityNumber) {
this.name = name;
this.age = age;
this.socialSecurityNumber = socialSecurityNumber;
}
#Override
public boolean equals(Object p) {
//Person is same if social security number is same
if ((p instanceof Person) && this.socialSecurityNumber.equals(((Person) p).socialSecurityNumber)) {
return true;
} else {
return false;
}
}
#Override
public int hashCode() { //I am using a hashing function in String.java instead of writing my own.
return socialSecurityNumber.hashCode();
}
}
public class Order {
String[] items;
public void insertOrder(String[] items)
{
this.items=items;
}
}
import java.util.Hashtable;
public class Main {
public static void main(String[] args) {
Person p1=new Person("Tom",32,"548-56-4412");
Person p2=new Person("Jerry",60,"456-74-4125");
Person p3=new Person("Sherry",38,"418-55-1235");
Order order1=new Order();
order1.insertOrder(new String[]{"mouse","car charger"});
Order order2=new Order();
order2.insertOrder(new String[]{"Multi vitamin"});
Order order3=new Order();
order3.insertOrder(new String[]{"handbag", "iPod"});
Hashtable<Person,Order> hashtable=new Hashtable<Person,Order>();
hashtable.put(p1,order1);
hashtable.put(p2,order2);
hashtable.put(p3,order3);
//The line below will fail if Person class does not override hashCode()
Order tomOrder= hashtable.get(new Person("Tom", 32, "548-56-4412"));
for(String item:tomOrder.items)
{
System.out.println(item);
}
}
}
hashCode() method is used to get a unique integer for given object. This integer is used for determining the bucket location, when this object needs to be stored in some HashTable, HashMap like data structure. By default, Object’s hashCode() method returns and integer representation of memory address where object is stored.
The hashCode() method of objects is used when we insert them into a HashTable, HashMap or HashSet. More about HashTables on Wikipedia.org for reference.
To insert any entry in map data structure, we need both key and value. If both key and values are user define data types, the hashCode() of the key will be determine where to store the object internally. When require to lookup the object from the map also, the hash code of the key will be determine where to search for the object.
The hash code only points to a certain "area" (or list, bucket etc) internally. Since different key objects could potentially have the same hash code, the hash code itself is no guarantee that the right key is found. The HashTable then iterates this area (all keys with the same hash code) and uses the key's equals() method to find the right key. Once the right key is found, the object stored for that key is returned.
So, as we can see, a combination of the hashCode() and equals() methods are used when storing and when looking up objects in a HashTable.
NOTES:
Always use same attributes of an object to generate hashCode() and equals() both. As in our case, we have used employee id.
equals() must be consistent (if the objects are not modified, then it must keep returning the same value).
Whenever a.equals(b), then a.hashCode() must be same as b.hashCode().
If you override one, then you should override the other.
http://parameshk.blogspot.in/2014/10/examples-of-comparable-comporator.html
String class and wrapper classes have different implementation of equals() and hashCode() methods than Object class. equals() method of Object class compares the references of the objects, not the contents. hashCode() method of Object class returns distinct hashcode for every single object whether the contents are same.
It leads problem when you use Map collection and the key is of Persistent type, StringBuffer/builder type. Since they don't override equals() and hashCode() unlike String class, equals() will return false when you compare two different objects even though both have same contents. It will make the hashMap storing same content keys. Storing same content keys means it is violating the rule of Map because Map doesnt allow duplicate keys at all.
Therefore you override equals() as well as hashCode() methods in your class and provide the implementation(IDE can generate these methods) so that they work same as String's equals() and hashCode() and prevent same content keys.
You have to override hashCode() method along with equals() because equals() work according hashcode.
Moreover overriding hashCode() method along with equals() helps to intact the equals()-hashCode() contract: "If two objects are equal, then they must have the same hash code."
When do you need to write custom implementation for hashCode()?
As we know that internal working of HashMap is on principle of Hashing. There are certain buckets where entrysets get stored. You customize the hashCode() implementation according your requirement so that same category objects can be stored into same index.
when you store the values into Map collection using put(k,v)method, the internal implementation of put() is:
put(k, v){
hash(k);
index=hash & (n-1);
}
Means, it generates index and the index is generated based on the hashcode of particular key object. So make this method generate hashcode according your requirement because same hashcode entrysets will be stored into same bucket or index.
That's it!
IMHO, it's as per the rule says - If two objects are equal then they should have same hash, i.e., equal objects should produce equal hash values.
Given above, default equals() in Object is == which does comparison on the address, hashCode() returns the address in integer(hash on actual address) which is again distinct for distinct Object.
If you need to use the custom Objects in the Hash based collections, you need to override both equals() and hashCode(), example If I want to maintain the HashSet of the Employee Objects, if I don't use stronger hashCode and equals I may endup overriding the two different Employee Objects, this happen when I use the age as the hashCode(), however I should be using the unique value which can be the Employee ID.
To help you check for duplicate Objects, we need a custom equals and hashCode.
Since hashcode always returns a number its always fast to retrieve an object using a number rather than an alphabetic key. How will it do? Assume we created a new object by passing some value which is already available in some other object. Now the new object will return the same hash value as of another object because the value passed is same. Once the same hash value is returned, JVM will go to the same memory address every time and if in case there are more than one objects present for the same hash value it will use equals() method to identify the correct object.
When you want to store and retrieve your custom object as a key in Map, then you should always override equals and hashCode in your custom Object .
Eg:
Person p1 = new Person("A",23);
Person p2 = new Person("A",23);
HashMap map = new HashMap();
map.put(p1,"value 1");
map.put(p2,"value 2");
Here p1 & p2 will consider as only one object and map size will be only 1 because they are equal.
public class Employee {
private int empId;
private String empName;
public Employee(int empId, String empName) {
super();
this.empId = empId;
this.empName = empName;
}
public int getEmpId() {
return empId;
}
public void setEmpId(int empId) {
this.empId = empId;
}
public String getEmpName() {
return empName;
}
public void setEmpName(String empName) {
this.empName = empName;
}
#Override
public String toString() {
return "Employee [empId=" + empId + ", empName=" + empName + "]";
}
#Override
public int hashCode() {
return empId + empName.hashCode();
}
#Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (!(this instanceof Employee)) {
return false;
}
Employee emp = (Employee) obj;
return this.getEmpId() == emp.getEmpId() && this.getEmpName().equals(emp.getEmpName());
}
}
Test Class
public class Test {
public static void main(String[] args) {
Employee emp1 = new Employee(101,"Manash");
Employee emp2 = new Employee(101,"Manash");
Employee emp3 = new Employee(103,"Ranjan");
System.out.println(emp1.hashCode());
System.out.println(emp2.hashCode());
System.out.println(emp1.equals(emp2));
System.out.println(emp1.equals(emp3));
}
}
In Object Class equals(Object obj) is used to compare address comparesion thats why when in Test class if you compare two objects then equals method giving false but when we override hashcode() the it can compare content and give proper result.
Both the methods are defined in Object class. And both are in its simplest implementation. So when you need you want add some more implementation to these methods then you have override in your class.
For Ex: equals() method in object only checks its equality on the reference. So if you need compare its state as well then you can override that as it is done in String class.
There's no mention in this answer of testing the equals/hashcode contract.
I've found the EqualsVerifier library to be very useful and comprehensive. It is also very easy to use.
Also, building equals() and hashCode() methods from scratch involves a lot of boilerplate code. The Apache Commons Lang library provides the EqualsBuilder and HashCodeBuilder classes. These classes greatly simplify implementing equals() and hashCode() methods for complex classes.
As an aside, it's worth considering overriding the toString() method to aid debugging. Apache Commons Lang library provides the ToStringBuilder class to help with this.

Mutable objects and hashCode

Have the following class:
public class Member {
private int x;
private long y;
private double d;
public Member(int x, long y, double d) {
this.x = x;
this.y = y;
this.d = d;
}
#Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + x;
result = (int) (prime * result + y);
result = (int) (prime * result + Double.doubleToLongBits(d));
return result;
}
#Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj instanceof Member) {
Member other = (Member) obj;
return other.x == x && other.y == y
&& Double.compare(d, other.d) == 0;
}
return false;
}
public static void main(String[] args) {
Set<Member> test = new HashSet<Member>();
Member b = new Member(1, 2, 3);
test.add(b);
System.out.println(b.hashCode());
b.x = 0;
System.out.println(b.hashCode());
Member first = test.iterator().next();
System.out.println(test.contains(first));
System.out.println(b.equals(first));
System.out.println(test.add(first));
}
}
It produces the following results:
30814
29853
false
true
true
Because the hashCode depends of the state of the object it can no longer by retrieved properly, so the check for containment fails. The HashSet in no longer working properly. A solution would be to make Member immutable, but is that the only solution? Should all classes added to HashSets be immutable? Is there any other way to handle the situation?
Regards.
Objects in hashsets should either be immutable, or you need to exercise discipline in not changing them after they've been used in a hashset (or hashmap).
In practice I've rarely found this to be a problem - I rarely find myself needing to use complex objects as keys or set elements, and when I do it's usually not a problem just not to mutate them. Of course if you've exposed the references to other code by this time, it can become harder.
Yes. While maintaining your class mutable, you can compute the hashCode and the equals methods based on immutable values of the class ( perhaps a generated id ) to adhere to the hashCode contract defined in Object class:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Depending on your situation this may be easier or not.
class Member {
private static long id = 0;
private long id = Member.id++;
// other members here...
public int hashCode() { return this.id; }
public boolean equals( Object o ) {
if( this == o ) { return true; }
if( o instanceOf Member ) { return this.id == ((Member)o).id; }
return false;
}
...
}
If you need a thread safe attribute, you may consider use: AtomicLong instead, but again, it depends on how are you going to use your object.
As already mentioned, one can accept the following three solutions:
Use immutable objects; even when your class is mutable, you may use immutable identities on your hashcode implementation and equals checking, eg an ID-like value.
Similarly to the above, implement add/remove to get a clone of the inserted object, not the actual reference. HashSet does not offer a get function (eg to allow you alter the object later on); thus, you are safe there won't exist duplicates.
Exercise discipline in not changing them after they've been used, as #Jon Skeet suggests
But, if for some reason you really need to modify objects after being inserted to a HashSet, you need to find a way of "informing" your Collection with the new changes. To achieve this functionality:
You can use the Observer design pattern, and extend HashSet to implement the Observer interface. Your Member objects must be Observable and update the HashSet on any setter or other method that affects hashcode and/or equals.
Note 1: Extending 3, using 4: we may accept alterations, but those that do not create an already existing object (eg I updated a user's ID, by assigning a new ID, not setting it to an existing one). Otherwise, you have to consider the scenario where an object is transformed in such a way that is now equal to another object already existing in the Set. If you accept this limitation, 4th suggestion will work fine, else you must be proactive and define a policy for such cases.
Note 2: You have to provide both previous and current states of the altered object on your update implementation, because you have to initially remove the older element (eg use getClone() before setting new values), then add the object with the new state. The following snippet is just an example implementation, it needs changes based on your policy of adding a duplicate.
#Override
public void update(Observable newItem, Object oldItem) {
remove(oldItem);
if (add(newItem))
newItem.addObserver(this);
}
I've used similar techniques on projects, where I require multiple indices on a class, so I can look up with O(1) for Sets of objects that share a common identity; imagine it as a MultiKeymap of HashSets (this is really useful, as you can then intersect/union indices and work similarly to SQL-like searching). In such cases I annotate methods (usually setters) that must fireChange-update each of the indices when a significant change occurs, so indices are always updated with the latest states.
Jon Skeet has listed all alternatives. As for why the keys in a Map or Set must not change:
The contract of a Set implies that at any time, there are no two objects o1 and o2 such that
o1 != o2 && set.contains(o1) && set.contains(o2) && o1.equals(o2)
Why that is required is especially clear for a Map. From the contract of Map.get():
More formally, if this map contains a mapping from a key
k to a value v such that (key==null ? k==null : key.equals(k)), then this method returns v, otherwise it returns null. (There can be at most one such mapping.)
Now, if you modify a key inserted into a map, you might make it equal to some other key already inserted. Moreover, the map can not know that you have done so. So what should the map do if you then do map.get(key), where key is equal to several keys in the map? There is no intuitive way to define what that would mean - chiefly because our intuition for these datatypes is the mathematical ideal of sets and mappings, which don't have to deal with changing keys, since their keys are mathematical objects and hence immutable.
Theoretically (and more often than not practically too) your class either:
has a natural immutable identity that can be inferred from a subset of its fields, in which case you can use those fields to generate the hashCode from.
has no natural identity, in which case using a Set to store them is unnecessary, you could just as well use a List.
Never change 'hashable field" after putting in hash based container.
As if you (Member) registered your phone number (Member.x) in yellow page(hash based container), but you changed your number, then no one can find you in the yellow page any more.

What is the use of hashCode in Java?

In Java, obj.hashCode() returns some value. What is the use of this hash code in programming?
hashCode() is used for bucketing in Hash implementations like HashMap, HashTable, HashSet, etc.
The value received from hashCode() is used as the bucket number for storing elements of the set/map. This bucket number is the address of the element inside the set/map.
When you do contains() it will take the hash code of the element, then look for the bucket where hash code points to. If more than 1 element is found in the same bucket (multiple objects can have the same hash code), then it uses the equals() method to evaluate if the objects are equal, and then decide if contains() is true or false, or decide if element could be added in the set or not.
From the Javadoc:
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java programming language.)
hashCode() is a function that takes an object and outputs a numeric value. The hashcode for an object is always the same if the object doesn't change.
Functions like HashMap, HashTable, HashSet, etc. that need to store objects will use a hashCode modulo the size of their internal array to choose in what "memory position" (i.e. array position) to store the object.
There are some cases where collisions may occur (two objects end up with the same hashcode), and that, of course, needs to be solved carefully.
The value returned by hashCode() is the object's hash code, which is the object's memory address in hexadecimal.
By definition, if two objects are equal, their hash code must also be equal. If you override the equals() method, you change the way two objects are equated and Object's implementation of hashCode() is no longer valid. Therefore, if you override the equals() method, you must also override the hashCode() method as well.
This answer is from the java SE 8 official tutorial documentation
A hashcode is a number generated from any object.
This is what allows objects to be stored/retrieved quickly in a Hashtable.
Imagine the following simple example:
On the table in front of you. you have nine boxes, each marked with a number 1 to 9. You also have a pile of wildly different objects to store in these boxes, but once they are in there you need to be able to find them as quickly as possible.
What you need is a way of instantly deciding which box you have put each object in. It works like an index. you decide to find the cabbage so you look up which box the cabbage is in, then go straight to that box to get it.
Now imagine that you don't want to bother with the index, you want to be able to find out immediately from the object which box it lives in.
In the example, let's use a really simple way of doing this - the number of letters in the name of the object. So the cabbage goes in box 7, the pea goes in box 3, the rocket in box 6, the banjo in box 5 and so on.
What about the rhinoceros, though? It has 10 characters, so we'll change our algorithm a little and "wrap around" so that 10-letter objects go in box 1, 11 letters in box 2 and so on. That should cover any object.
Sometimes a box will have more than one object in it, but if you are looking for a rocket, it's still much quicker to compare a peanut and a rocket, than to check a whole pile of cabbages, peas, banjos, and rhinoceroses.
That's a hash code. A way of getting a number from an object so it can be stored in a Hashtable. In Java, a hash code can be any integer, and each object type is responsible for generating its own. Lookup the "hashCode" method of Object.
Source - here
Although hashcode does nothing with your business logic, we have to take care of it in most cases. Because when your object is put into a hash based container(HashSet, HashMap...), the container puts/gets the element's hashcode.
hashCode() is a unique code which is generated by the JVM for every object creation.
We use hashCode() to perform some operation on hashing related algorithm like Hashtable, Hashmap etc..
The advantages of hashCode() make searching operation easy because when we search for an object that has unique code, it helps to find out that object.
But we can't say hashCode() is the address of an object. It is a unique code generated by JVM for every object.
That is why nowadays hashing algorithm is the most popular search algorithm.
One of the uses of hashCode() is building a Catching mechanism.
Look at this example:
class Point
{
public int x, y;
public Point(int x, int y)
{
this.x = x;
this.y = y;
}
#Override
public boolean equals(Object o)
{
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Point point = (Point) o;
if (x != point.x) return false;
return y == point.y;
}
#Override
public int hashCode()
{
int result = x;
result = 31 * result + y;
return result;
}
class Line
{
public Point start, end;
public Line(Point start, Point end)
{
this.start = start;
this.end = end;
}
#Override
public boolean equals(Object o)
{
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Line line = (Line) o;
if (!start.equals(line.start)) return false;
return end.equals(line.end);
}
#Override
public int hashCode()
{
int result = start.hashCode();
result = 31 * result + end.hashCode();
return result;
}
}
class LineToPointAdapter implements Iterable<Point>
{
private static int count = 0;
private static Map<Integer, List<Point>> cache = new HashMap<>();
private int hash;
public LineToPointAdapter(Line line)
{
hash = line.hashCode();
if (cache.get(hash) != null) return; // we already have it
System.out.println(
String.format("%d: Generating points for line [%d,%d]-[%d,%d] (no caching)",
++count, line.start.x, line.start.y, line.end.x, line.end.y));
}

Why should I override hashCode() when I override equals() method?

Ok, I have heard from many places and sources that whenever I override the equals() method, I need to override the hashCode() method as well. But consider the following piece of code
package test;
public class MyCustomObject {
int intVal1;
int intVal2;
public MyCustomObject(int val1, int val2){
intVal1 = val1;
intVal2 = val2;
}
public boolean equals(Object obj){
return (((MyCustomObject)obj).intVal1 == this.intVal1) &&
(((MyCustomObject)obj).intVal2 == this.intVal2);
}
public static void main(String a[]){
MyCustomObject m1 = new MyCustomObject(3,5);
MyCustomObject m2 = new MyCustomObject(3,5);
MyCustomObject m3 = new MyCustomObject(4,5);
System.out.println(m1.equals(m2));
System.out.println(m1.equals(m3));
}
}
Here the output is true, false exactly the way I want it to be and I dont care of overriding the hashCode() method at all. This means that hashCode() overriding is an option rather being a mandatory one as everyone says.
I want a second confirmation.
It works for you because your code does not use any functionality (HashMap, HashTable) which needs the hashCode() API.
However, you don't know whether your class (presumably not written as a one-off) will be later called in a code that does indeed use its objects as hash key, in which case things will be affected.
As per the documentation for Object class:
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
Because HashMap/Hashtable will lookup object by hashCode() first.
If they are not the same, hashmap will assert object are not the same and return not exists in the map.
The reason why you need to #Override neither or both, is because of the way they interrelate with the rest of the API.
You'll find that if you put m1 into a HashSet<MyCustomObject>, then it doesn't contains(m2). This is inconsistent behavior and can cause a lot of bugs and chaos.
The Java library has tons of functionalities. In order to make them work for you, you need to play by the rules, and making sure that equals and hashCode are consistent is one of the most important ones.
Most of the other comments already gave you the answer: you need to do it because there are collections (ie: HashSet, HashMap) that uses hashCode as an optimization to "index" object instances, an those optimizations expects that if: a.equals(b) ==> a.hashCode() == b.hashCode() (NOTE that the inverse doesn't hold).
But as an additional information you can do this exercise:
class Box {
private String value;
/* some boring setters and getters for value */
public int hashCode() { return value.hashCode(); }
public boolean equals(Object obj) {
if (obj != null && getClass().equals(obj.getClass()) {
return ((Box) obj).value.equals(value);
} else { return false; }
}
}
The do this:
Set<Box> s = new HashSet<Box>();
Box b = new Box();
b.setValue("hello");
s.add(b);
s.contains(b); // TRUE
b.setValue("other");
s.contains(b); // FALSE
s.iterator().next() == b // TRUE!!! b is in s but contains(b) returns false
What you learn from this example is that implementing equals or hashCode with properties that can be changed (mutable) is a really bad idea.
It is primarily important when searching for an object using its hashCode() value in a collection (i.e. HashMap, HashSet, etc.). Each object returns a different hashCode() value therefore you must override this method to consistently generate a hashCode value based on the state of the object to help the Collections algorithm locate values on the hash table.

In Java, why must equals() and hashCode() be consistent?

If I override either method on a class, it must make sure that if A.equals(B) == true then A.hashCode() == B.hashCode must also be true.
Can someone show me a simple example where if this is violated, it'll cause a problem? I think it has something to do with if you use that class as the type of keys to Hashmap?
Sure:
public class Test {
private final int m, n;
public Test(int m, int n) {
this.m = m;
this.n = n;
}
public int hashCode() { return n * m; }
public boolean equals(Object ob) {
if (ob.getClass() != Test.class) return false;
Test other = (Test)ob;
return m == other.m;
}
}
with:
Set<Test> set = new HashSet<Test>();
set.put(new Test(3,4));
boolean b = set.contains(new Test(3, 10)); // false
Technically that should be true because m == 3 in both cases.
In general a HashMap works like this: it has a variable number of what are commonly called "buckets". The number of buckets can change over time (as entries are added and removed) but it is always a power of 2.
Let's say a given HashMap has 16 buckets. When you call put() to add an entry, the hashCode() of the key is calculated and then a mask is taken depending on the size of the buckets. If you (bitwise) AND the hashCode() with 15 (0x0F) you will get the last 4 bits, equaling a number between 0 and 15 inclusive:
int factor = 4;
int buckets = 1 << (factor-1) - 1; // 16
int mask = buckets - 1; // 15
int code = key.hashCode();
int dest = code & mask; // a number from 0 to 15 inclusive
Now if there is already an entry in that bucket you have what's called a collision. There are multiple ways of dealing with this but the one used by HashMap (and is probably the most common overall) is bucketing. All the entries with the same masked hashCode are put in a list of some kind.
So to find if a given key is in the map already:
Calculate the masked hash code;
Find the appropriate bucket;
If it's empty, key not found;
If is isn't empty, loop through all entries in the bucket checking equals().
Looking through a bucket is a linear (O(n)) operation but it's on a small subset. The hashcode bucket determination is essentially constant (O(1)). If buckets are sufficiently small then access to a HashMap is usually described as "near O(1)".
You can make a couple of observations about this.
Firstly, if you have a bunch of objects that all return 42 as their hash code a HashMap will still work but it will operate as an expensive list. Access will be O(n) (as everything will be in the same bucket regardless of the number of buckets). I've actually been asked this in an interview.
Secondly, returning to your original point, if two objects are equal (meaning a.equals(b) == b.equals(a) == true) but have different hash codes then the HashMap will go looking in (probably) the wrong bucket resulting in unpredictable and undefined behaviour.
This is discussed in the Item 8: Always override hashCode when you override equals of Joshua Bloch's Effective Java:
A common source of bugs is the failure to override the hashCode method. You must
override hashCode in every class that overrides equals. Failure to do so will
result in a violation of the general contract for Object.hashCode, which will pre-
vent your class from functioning properly in conjunction with all hash-based collec-
tions, including HashMap, HashSet, and Hashtable.
Here is the contract, copied from the
java.lang.Object specification:
Whenever it is invoked on the same object more than once during an execution of an application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
The key provision that is violated when you fail to override hashCode is
the second one: Equal objects must have equal hash codes. Two distinct
instances may be logically equal according to the class’s equals method, but to
the Object class’s hashCode method, they’re just two objects with nothing much
in common. Therefore object’s hashCode method returns two seemingly random
numbers instead of two equal numbers as required by the contract.
For example, consider the following simplistic PhoneNumber class, whose
equals method is constructed according to the recipe in Item 7:
public final class PhoneNumber {
private final short areaCode;
private final short exchange;
private final short extension;
public PhoneNumber(int areaCode, int exchange,
int extension) {
rangeCheck(areaCode, 999, "area code");
rangeCheck(exchange, 999, "exchange");
rangeCheck(extension, 9999, "extension");
this.areaCode = (short) areaCode;
this.exchange = (short) exchange;
this.extension = (short) extension;
}
private static void rangeCheck(int arg, int max,
String name) {
if (arg < 0 || arg > max)
throw new IllegalArgumentException(name +": " + arg);
}
public boolean equals(Object o) {
if (o == this)
return true;
if (!(o instanceof PhoneNumber))
return false;
PhoneNumber pn = (PhoneNumber)o;
return pn.extension == extension &&
pn.exchange == exchange &&
pn.areaCode == areaCode;
}
// No hashCode method!
... // Remainder omitted
}
Suppose you attempt to use this class
with a HashMap:
Map m = new HashMap();
m.put(new PhoneNumber(408, 867, 5309), "Jenny");
At this point, you might expect
m.get(new PhoneNumber(408 , 867,
5309)) to return "Jenny", but it
returns null. Notice that two PhoneNumber instances are
involved: One is used for insertion
into the HashMap, and a second, equal,
instance is used for (attempted)
retrieval. The PhoneNumber class’s
failure to override hashCode causes
the two equal instances to have
unequal hash codes, in violation of
the hashCode contract. Therefore the
get method looks for the phone number
in a different hash bucket from the
one in which it was stored by the put
method. Fixing this problem is as
simple as providing a proper hashCode
method for the PhoneNumber class.
[...]
See the Chapter 3 for the full content.
Containers like HashSet rely on the hash function to determine where to put it, and where to get it from when asked for it. If A.equals(B), then a HashSet is expecting A to be in the same place as B. If you put A in with value V, and look up B, you should expect to get V back (since you've said A.equals(B)). But if A.hashcode() != B.hashcode(), then the hashset may not find where you put it.
Here's a little example:
Set<Foo> myFoos = new HashSet<Foo>();
Foo firstFoo = new Foo(123,"Alpha");
myFoos.add(firstFoo);
// later in the processing you get another Foo from somewhere
Foo someFoo = //use imagination here...;
// maybe you get it from a database... and it's equal to Foo(123,"Alpha)
if (myFoos.contains(someFoo)) {
// maybe you win a million bucks.
}
So, imagine that the hashCode that gets created for firstFoo is 99999 and it winds up at a specific spot in the myFoos HashSet. Later when you get the someFoo and you look for it in the myFoos HashSet, it needs to generate the same hashCode so you can find it.
It's exactly because of hash tables.
Because of the possibility of hash code collisions, hash tables need to check identity as well, otherwise the table can't determine if it found the object it was looking for, or one with the same hash code. So every get() in a hash table calls key.equals(potentialMatch) before returning a value.
If equals() and hashCode() are inconsistent you can get very inconsistent behavior. Say for two objects, a and b, a.equals(b) returns true, but a.hashCode() != b.hashCode(). Insert a and a HashSet will return false for .contains(b), but a List created from that set will return true (because the list doesn't use hash codes).
HashSet set = new HashSet();
set.add(a);
set.contains(b); // false
new ArrayList(set).contains(b); // true
Obviously, that could be bad.
The idea behind this is that two objects are "equal" if all of their fields have equal values. If all of fields have equal values, the two objects should have the same hash value.

Categories