How hashset remove duplicate entries? - java

public class Type extends SomeObject implements java.io.Serializable {
private Integer typeId;
private String typeName;
private String typeCode;
}
I am getting data from database using hibernate as Type object. Now my class do not (Nor its parent class) override equals method. SO if i insert all of Type object into a hashset it shouldn't remove duplicates but it is removing duplicate. My question is that how is it able to identify duplicates ?

Inside a Hibernate Session, a given entity only exists once. I.e. if you do
Type type1 = (Type) session.get(Type.class, 42);
Type type2 = (Type) session.get(Type.class, 42);
Type type3 = (Type) session.createQuery("select t from Type t where t.id = 42").uniqueResult();
then type1, type2 and type3 will all be references to a single object. Adding them all in a empty HashSet will thus lead to a HashSet of size 1, because Object.equals() returns true when comparing an object with itself.

if you're class is overriding the hashcode and equals method, the overridden methods are used, or else default methods, the ones present in Object class are used. But hashcode and equals are used in HashSets, Hashmaps, etc.
For default behaviour you can refer to doc
The equals method for class Object implements the most discriminating
possible equivalence relation on objects; that is, for any non-null
reference values x and y, this method returns true if and only if x
and y refer to the same object (x == y has the value true)
As much as is reasonably practical, the hashCode method defined by
class Object does return distinct integers for distinct objects. (This
is typically implemented by converting the internal address of the
object into an integer, but this implementation technique is not
required by the JavaTM programming language.)
Adding how does it find duplicates:
Lets look at hashing first:
Hashing is used to retrieve data quickly. Basically hash function generates an index value (this index value usually refers to a bucket that can store objects), once this index value is generated, it searches for that object inside that particular bucket.
i'd suggest you to look into hashing once
Hashing Wiki
Whenever we add to hashset, the hashcode method is used,it returns an index value, the object is stored in that bucket(the bucket corresponding to that index). when again the same object is being added, the hashcode value will be same, so it refers to the same bucket, now since the object is already present(here equals method is used to check equality amongst objects) in that bucket, it is not added again
class Data {
public Data(String name) {
super();
this.name = name;
}
String name;
}
public class ReadFromFile {
public static void main(String[] args) throws IOException {
Data s = new Data("yo");
HashSet<Data> set = new HashSet<>();
set.add(new Data("hi"));// hi
set.add(new Data("hi"));// hi but has a different address because a new
// object is created
System.out.println("Size before adding s " + set.size());
set.add(s);
set.add(s);// because same object is being added, default hashcode
// generates same value, as it uses address to generate
// hashcode
System.out.println("Size after adding s " + set.size());
}
}
output
Size before adding s 2
Size after adding s 3

Related

Why doesn't distinct work for a custom class in Java Streams? [duplicate]

Recently I read through this
Developer Works Document.
The document is all about defining hashCode() and equals() effectively and correctly, however I am not able to figure out why we need to override these two methods.
How can I take the decision to implement these methods efficiently?
Joshua Bloch says on Effective Java
You must override hashCode() in every class that overrides equals(). Failure to do so will result in a violation of the general contract for Object.hashCode(), which will prevent your class from functioning properly in conjunction with all hash-based collections, including HashMap, HashSet, and Hashtable.
Let's try to understand it with an example of what would happen if we override equals() without overriding hashCode() and attempt to use a Map.
Say we have a class like this and that two objects of MyClass are equal if their importantField is equal (with hashCode() and equals() generated by eclipse)
public class MyClass {
private final String importantField;
private final String anotherField;
public MyClass(final String equalField, final String anotherField) {
this.importantField = equalField;
this.anotherField = anotherField;
}
#Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result
+ ((importantField == null) ? 0 : importantField.hashCode());
return result;
}
#Override
public boolean equals(final Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
final MyClass other = (MyClass) obj;
if (importantField == null) {
if (other.importantField != null)
return false;
} else if (!importantField.equals(other.importantField))
return false;
return true;
}
}
Imagine you have this
MyClass first = new MyClass("a","first");
MyClass second = new MyClass("a","second");
Override only equals
If only equals is overriden, then when you call myMap.put(first,someValue) first will hash to some bucket and when you call myMap.put(second,someOtherValue) it will hash to some other bucket (as they have a different hashCode). So, although they are equal, as they don't hash to the same bucket, the map can't realize it and both of them stay in the map.
Although it is not necessary to override equals() if we override hashCode(), let's see what would happen in this particular case where we know that two objects of MyClass are equal if their importantField is equal but we do not override equals().
Override only hashCode
If you only override hashCode then when you call myMap.put(first,someValue) it takes first, calculates its hashCode and stores it in a given bucket. Then when you call myMap.put(second,someOtherValue) it should replace first with second as per the Map Documentation because they are equal (according to the business requirement).
But the problem is that equals was not redefined, so when the map hashes second and iterates through the bucket looking if there is an object k such that second.equals(k) is true it won't find any as second.equals(first) will be false.
Hope it was clear
Collections such as HashMap and HashSet use a hashcode value of an object to determine how it should be stored inside a collection, and the hashcode is used again in order to locate the object
in its collection.
Hashing retrieval is a two-step process:
Find the right bucket (using hashCode())
Search the bucket for the right element (using equals() )
Here is a small example on why we should overrride equals() and hashcode().
Consider an Employee class which has two fields: age and name.
public class Employee {
String name;
int age;
public Employee(String name, int age) {
this.name = name;
this.age = age;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public int getAge() {
return age;
}
public void setAge(int age) {
this.age = age;
}
#Override
public boolean equals(Object obj) {
if (obj == this)
return true;
if (!(obj instanceof Employee))
return false;
Employee employee = (Employee) obj;
return employee.getAge() == this.getAge()
&& employee.getName() == this.getName();
}
// commented
/* #Override
public int hashCode() {
int result=17;
result=31*result+age;
result=31*result+(name!=null ? name.hashCode():0);
return result;
}
*/
}
Now create a class, insert Employee object into a HashSet and test whether that object is present or not.
public class ClientTest {
public static void main(String[] args) {
Employee employee = new Employee("rajeev", 24);
Employee employee1 = new Employee("rajeev", 25);
Employee employee2 = new Employee("rajeev", 24);
HashSet<Employee> employees = new HashSet<Employee>();
employees.add(employee);
System.out.println(employees.contains(employee2));
System.out.println("employee.hashCode(): " + employee.hashCode()
+ " employee2.hashCode():" + employee2.hashCode());
}
}
It will print the following:
false
employee.hashCode(): 321755204 employee2.hashCode():375890482
Now uncomment hashcode() method , execute the same and the output would be:
true
employee.hashCode(): -938387308 employee2.hashCode():-938387308
Now can you see why if two objects are considered equal, their hashcodes must
also be equal? Otherwise, you'd never be able to find the object since the default
hashcode method in class Object virtually always comes up with a unique number
for each object, even if the equals() method is overridden in such a way that two
or more objects are considered equal. It doesn't matter how equal the objects are if
their hashcodes don't reflect that. So one more time: If two objects are equal, their
hashcodes must be equal as well.
You must override hashCode() in every
class that overrides equals(). Failure
to do so will result in a violation of
the general contract for
Object.hashCode(), which will prevent
your class from functioning properly
in conjunction with all hash-based
collections, including HashMap,
HashSet, and Hashtable.
   from Effective Java, by Joshua Bloch
By defining equals() and hashCode() consistently, you can improve the usability of your classes as keys in hash-based collections. As the API doc for hashCode explains: "This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable."
The best answer to your question about how to implement these methods efficiently is suggesting you to read Chapter 3 of Effective Java.
Why we override equals() method
In Java we can not overload how operators like ==, +=, -+ behave. They are behaving a certain way. So let's focus on the operator == for our case here.
How operator == works.
It checks if 2 references that we compare point to the same instance in memory. Operator == will resolve to true only if those 2 references represent the same instance in memory.
So now let's consider the following example
public class Person {
private Integer age;
private String name;
..getters, setters, constructors
}
So let's say that in your program you have built 2 Person objects on different places and you wish to compare them.
Person person1 = new Person("Mike", 34);
Person person2 = new Person("Mike", 34);
System.out.println ( person1 == person2 ); --> will print false!
Those 2 objects from business perspective look the same right? For JVM they are not the same. Since they are both created with new keyword those instances are located in different segments in memory. Therefore the operator == will return false
But if we can not override the == operator how can we say to JVM that we want those 2 objects to be treated as same. There comes the .equals() method in play.
You can override equals() to check if some objects have same values for specific fields to be considered equal.
You can select which fields you want to be compared. If we say that 2 Person objects will be the same if and only if they have the same age and same name, then the IDE will create something like the following for automatic generation of equals()
#Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Person person = (Person) o;
return age == person.age &&
name.equals(person.name);
}
Let's go back to our previous example
Person person1 = new Person("Mike", 34);
Person person2 = new Person("Mike", 34);
System.out.println ( person1 == person2 ); --> will print false!
System.out.println ( person1.equals(person2) ); --> will print true!
So we can not overload == operator to compare objects the way we want but Java gave us another way, the equals() method, which we can override as we want.
Keep in mind however, if we don't provide our custom version of .equals() (aka override) in our class then the predefined .equals() from Object class and == operator will behave exactly the same.
Default equals() method which is inherited from Object will check whether both compared instances are the same in memory!
Why we override hashCode() method
Some Data Structures in java like HashSet, HashMap store their elements based on a hash function which is applied on those elements. The hashing function is the hashCode()
If we have a choice of overriding .equals() method then we must also have a choice of overriding hashCode() method. There is a reason for that.
Default implementation of hashCode() which is inherited from Object considers all objects in memory unique!
Let's get back to those hash data structures. There is a rule for those data structures.
HashSet can not contain duplicate values and HashMap can not contain duplicate keys
HashSet is implemented with a HashMap behind the scenes where each value of a HashSet is stored as a key in a HashMap.
So we have to understand how a HashMap works.
In a simple way a HashMap is a native array that has some buckets. Each bucket has a linkedList. In that linkedList our keys are stored. HashMap locates the correct linkedList for each key by applying hashCode() method and after that it iterates through all elements of that linkedList and applies equals() method on each of these elements to check if that element is already contained there. No duplicate keys are allowed.
When we put something inside a HashMap, the key is stored in one of those linkedLists. In which linkedList that key will be stored is shown by the result of hashCode() method on that key. So if key1.hashCode() has as a result 4, then that key1 will be stored on the 4th bucket of the array, in the linkedList that exists there.
By default hashCode() method returns a different result for each different instance. If we have the default equals() which behaves like == which considers all instances in memory as different objects we don't have any problem.
But in our previous example we said we want Person instances to be considered equal if their ages and names match.
Person person1 = new Person("Mike", 34);
Person person2 = new Person("Mike", 34);
System.out.println ( person1.equals(person2) ); --> will print true!
Now let's create a map to store those instances as keys with some string as pair value
Map<Person, String> map = new HashMap();
map.put(person1, "1");
map.put(person2, "2");
In Person class we have not overridden the hashCode method but we have overridden equals method. Since the default hashCode provides different results for different java instances person1.hashCode() and person2.hashCode() have big chances of having different results.
Our map might end with those persons in different linkedLists.
This is against the logic of a HashMap
A HashMap is not allowed to have multiple equal keys!
But ours now has and the reason is that the default hashCode() which was inherited from Object Class was not enough. Not after we have overridden the equals() method on Person Class.
That is the reason why we must override hashCode() method after we have overridden equals method.
Now let's fix that. Let's override our hashCode() method to consider the same fields that equals() considers, namely age, name
public class Person {
private Integer age;
private String name;
..getters, setters, constructors
#Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Person person = (Person) o;
return age == person.age &&
name.equals(person.name);
}
#Override
public int hashCode() {
return Objects.hash(name, age);
}
}
Now let's try again to save those keys in our HashMap
Map<Person, String> map = new HashMap();
map.put(person1, "1");
map.put(person2, "2");
person1.hashCode() and person2.hashCode() will definitely be the same. Let's say it is 0.
HashMap will go to bucket 0 and in that LinkedList will save the person1 as key with the value "1". For the second put HashMap is intelligent enough and when it goes again to bucket 0 to save person2 key with value "2" it will see that another equal key already exists there. So it will overwrite the previous key. So in the end only person2 key will exist in our HashMap.
Now we are aligned with the rule of Hash Map that says no multiple equal keys are allowed!
Identity is not equality.
equals operator == test identity.
equals(Object obj) method compares equality test(i.e. we need to tell equality by overriding the method)
Why do I need to override the equals and hashCode methods in Java?
First we have to understand the use of equals method.
In order to identity differences between two objects we need to override equals method.
For example:
Customer customer1=new Customer("peter");
Customer customer2=customer1;
customer1.equals(customer2); // returns true by JVM. i.e. both are refering same Object
------------------------------
Customer customer1=new Customer("peter");
Customer customer2=new Customer("peter");
customer1.equals(customer2); //return false by JVM i.e. we have two different peter customers.
------------------------------
Now I have overriden Customer class equals method as follows:
#Override
public boolean equals(Object obj) {
if (this == obj) // it checks references
return true;
if (obj == null) // checks null
return false;
if (getClass() != obj.getClass()) // both object are instances of same class or not
return false;
Customer other = (Customer) obj;
if (name == null) {
if (other.name != null)
return false;
} else if (!name.equals(other.name)) // it again using bulit in String object equals to identify the difference
return false;
return true;
}
Customer customer1=new Customer("peter");
Customer customer2=new Customer("peter");
Insteady identify the Object equality by JVM, we can do it by overring equals method.
customer1.equals(customer2); // returns true by our own logic
Now hashCode method can understand easily.
hashCode produces integer in order to store object in data structures like HashMap, HashSet.
Assume we have override equals method of Customer as above,
customer1.equals(customer2); // returns true by our own logic
While working with data structure when we store object in buckets(bucket is a fancy name for folder). If we use built-in hash technique, for above two customers it generates two different hashcode. So we are storing the same identical object in two different places. To avoid this kind of issues we should override the hashCode method also based on the following principles.
un-equal instances may have same hashcode.
equal instances should return same hashcode.
Simply put, the equals-method in Object check for reference equality, where as two instances of your class could still be semantically equal when the properties are equal. This is for instance important when putting your objects into a container that utilizes equals and hashcode, like HashMap and Set. Let's say we have a class like:
public class Foo {
String id;
String whatevs;
Foo(String id, String whatevs) {
this.id = id;
this.whatevs = whatevs;
}
}
We create two instances with the same id:
Foo a = new Foo("id", "something");
Foo b = new Foo("id", "something else");
Without overriding equals we are getting:
a.equals(b) is false because they are two different instances
a.equals(a) is true since it's the same instance
b.equals(b) is true since it's the same instance
Correct? Well maybe, if this is what you want. But let's say we want objects with the same id to be the same object, regardless if it's two different instances. We override the equals (and hashcode):
public class Foo {
String id;
String whatevs;
Foo(String id, String whatevs) {
this.id = id;
this.whatevs = whatevs;
}
#Override
public boolean equals(Object other) {
if (other instanceof Foo) {
return ((Foo)other).id.equals(this.id);
}
}
#Override
public int hashCode() {
return this.id.hashCode();
}
}
As for implementing equals and hashcode I can recommend using Guava's helper methods
Let me explain the concept in very simple words.
Firstly from a broader perspective we have collections, and hashmap is one of the datastructure in the collections.
To understand why we have to override the both equals and hashcode method, if need to first understand what is hashmap and what is does.
A hashmap is a datastructure which stores key value pairs of data in array fashion. Lets say a[], where each element in 'a' is a key value pair.
Also each index in the above array can be linked list thereby having more than one values at one index.
Now why is a hashmap used?
If we have to search among a large array then searching through each if them will not be efficient, so what hash technique tells us that lets pre process the array with some logic and group the elements based on that logic i.e. Hashing
EG: we have array 1,2,3,4,5,6,7,8,9,10,11 and we apply a hash function mod 10 so 1,11 will be grouped in together. So if we had to search for 11 in previous array then we would have to iterate the complete array but when we group it we limit our scope of iteration thereby improving speed. That datastructure used to store all the above information can be thought of as a 2d array for simplicity
Now apart from the above hashmap also tells that it wont add any Duplicates in it. And this is the main reason why we have to override the equals and hashcode
So when its said that explain the internal working of hashmap , we need to find what methods the hashmap has and how does it follow the above rules which i explained above
so the hashmap has method called as put(K,V) , and according to hashmap it should follow the above rules of efficiently distributing the array and not adding any duplicates
so what put does is that it will first generate the hashcode for the given key to decide which index the value should go in.if nothing is present at that index then the new value will be added over there, if something is already present over there then the new value should be added after the end of the linked list at that index. but remember no duplicates should be added as per the desired behavior of the hashmap. so lets say you have two Integer objects aa=11,bb=11.
As every object derived from the object class, the default implementation for comparing two objects is that it compares the reference and not values inside the object. So in the above case both though semantically equal will fail the equality test, and possibility that two objects which same hashcode and same values will exists thereby creating duplicates. If we override then we could avoid adding duplicates.
You could also refer to Detail working
import java.util.HashMap;
public class Employee {
String name;
String mobile;
public Employee(String name,String mobile) {
this.name = name;
this.mobile = mobile;
}
#Override
public int hashCode() {
System.out.println("calling hascode method of Employee");
String str = this.name;
int sum = 0;
for (int i = 0; i < str.length(); i++) {
sum = sum + str.charAt(i);
}
return sum;
}
#Override
public boolean equals(Object obj) {
// TODO Auto-generated method stub
System.out.println("calling equals method of Employee");
Employee emp = (Employee) obj;
if (this.mobile.equalsIgnoreCase(emp.mobile)) {
System.out.println("returning true");
return true;
} else {
System.out.println("returning false");
return false;
}
}
public static void main(String[] args) {
// TODO Auto-generated method stub
Employee emp = new Employee("abc", "hhh");
Employee emp2 = new Employee("abc", "hhh");
HashMap<Employee, Employee> h = new HashMap<>();
//for (int i = 0; i < 5; i++) {
h.put(emp, emp);
h.put(emp2, emp2);
//}
System.out.println("----------------");
System.out.println("size of hashmap: "+h.size());
}
}
hashCode() :
If you only override the hash-code method nothing happens, because it always returns a new hashCode for each object as an Object class.
equals() :
If you only override the equals method, if a.equals(b) is true it means the hashCode of a and b must be the same but that does not happen since you did not override the hashCode method.
Note : hashCode() method of Object class always returns a new hashCode for each object.
So when you need to use your object in the hashing based collection, you must override both equals() and hashCode().
Java puts a rule that
"If two objects are equal using Object class equals method, then the hashcode method should give the same value for these two objects."
So, if in our class we override equals() we should override hashcode() method also to follow this rule.
Both methods, equals() and hashcode(), are used in Hashtable, for example, to store values as key-value pairs. If we override one and not the other, there is a possibility that the Hashtable may not work as we want, if we use such object as a key.
Adding to #Lombo 's answer
When will you need to override equals() ?
The default implementation of Object's equals() is
public boolean equals(Object obj) {
return (this == obj);
}
which means two objects will be considered equal only if they have the same memory address which will be true only if you are
comparing an object with itself.
But you might want to consider two objects the same if they have the same value for one
or more of their properties (Refer the example given in #Lombo 's answer).
So you will override equals() in these situations and you would give your own conditions for equality.
I have successfully implemented equals() and it is working great.So why are they asking to override hashCode() as well?
Well.As long as you don't use "Hash" based Collections on your user-defined class,it is fine.
But some time in the future you might want to use HashMap or HashSet and if you don't override and "correctly implement" hashCode(), these Hash based collection won't work as intended.
Override only equals (Addition to #Lombo 's answer)
myMap.put(first,someValue)
myMap.contains(second); --> But it should be the same since the key are the same.But returns false!!! How?
First of all,HashMap checks if the hashCode of second is the same as first.
Only if the values are the same,it will proceed to check the equality in the same bucket.
But here the hashCode is different for these 2 objects (because they have different memory address-from default implementation).
Hence it will not even care to check for equality.
If you have a breakpoint inside your overridden equals() method,it wouldn't step in if they have different hashCodes.
contains() checks hashCode() and only if they are the same it would call your equals() method.
Why can't we make the HashMap check for equality in all the buckets? So there is no necessity for me to override hashCode() !!
Then you are missing the point of Hash based Collections.
Consider the following :
Your hashCode() implementation : intObject%9.
The following are the keys stored in the form of buckets.
Bucket 1 : 1,10,19,... (in thousands)
Bucket 2 : 2,20,29...
Bucket 3 : 3,21,30,...
...
Say,you want to know if the map contains the key 10.
Would you want to search all the buckets? or Would you want to search only one bucket?
Based on the hashCode,you would identify that if 10 is present,it must be present in Bucket 1.
So only Bucket 1 will be searched !!
Because if you do not override them you will be use the default implentation in Object.
Given that instance equality and hascode values generally require knowledge of what makes up an object they generally will need to be redefined in your class to have any tangible meaning.
In order to use our own class objects as keys in collections like HashMap, Hashtable etc.. , we should override both methods ( hashCode() and equals() ) by having an awareness on internal working of collection. Otherwise, it leads to wrong results which we are not expected.
class A {
int i;
// Hashing Algorithm
if even number return 0 else return 1
// Equals Algorithm,
if i = this.i return true else false
}
put('key','value') will calculate the hash value using hashCode() to determine the
bucket and uses equals() method to find whether the value is already
present in the Bucket. If not it will added else it will be replaced with current value
get('key') will use hashCode() to find the Entry (bucket) first and
equals() to find the value in Entry
if Both are overridden,
Map<A>
Map.Entry 1 --> 1,3,5,...
Map.Entry 2 --> 2,4,6,...
if equals is not overridden
Map<A>
Map.Entry 1 --> 1,3,5,...,1,3,5,... // Duplicate values as equals not overridden
Map.Entry 2 --> 2,4,6,...,2,4,..
If hashCode is not overridden
Map<A>
Map.Entry 1 --> 1
Map.Entry 2 --> 2
Map.Entry 3 --> 3
Map.Entry 4 --> 1
Map.Entry 5 --> 2
Map.Entry 6 --> 3 // Same values are Stored in different hasCodes violates Contract 1
So on...
HashCode Equal Contract
Two keys equal according to equal method should generate same hashCode
Two Keys generating same hashCode need not be equal (In above example all even numbers generate same hash Code)
1) The common mistake is shown in the example below.
public class Car {
private String color;
public Car(String color) {
this.color = color;
}
public boolean equals(Object obj) {
if(obj==null) return false;
if (!(obj instanceof Car))
return false;
if (obj == this)
return true;
return this.color.equals(((Car) obj).color);
}
public static void main(String[] args) {
Car a1 = new Car("green");
Car a2 = new Car("red");
//hashMap stores Car type and its quantity
HashMap<Car, Integer> m = new HashMap<Car, Integer>();
m.put(a1, 10);
m.put(a2, 20);
System.out.println(m.get(new Car("green")));
}
}
the green Car is not found
2. Problem caused by hashCode()
The problem is caused by the un-overridden method hashCode(). The contract between equals() and hashCode() is:
If two objects are equal, then they must have the same hash code.
If two objects have the same hash code, they may or may not be equal.
public int hashCode(){
return this.color.hashCode();
}
It is useful when using Value Objects. The following is an excerpt from the Portland Pattern Repository:
Examples of value objects are things
like numbers, dates, monies and
strings. Usually, they are small
objects which are used quite widely.
Their identity is based on their state
rather than on their object identity.
This way, you can have multiple copies
of the same conceptual value object.
So I can have multiple copies of an
object that represents the date 16 Jan
1998. Any of these copies will be equal to each other. For a small
object such as this, it is often
easier to create new ones and move
them around rather than rely on a
single object to represent the date.
A value object should always override
.equals() in Java (or = in Smalltalk).
(Remember to override .hashCode() as
well.)
Assume you have class (A) that aggregates two other (B) (C), and you need to store instances of (A) inside hashtable. Default implementation only allows distinguishing of instances, but not by (B) and (C). So two instances of A could be equal, but default wouldn't allow you to compare them in correct way.
Consider collection of balls in a bucket all in black color. Your Job is to color those balls as follows and use it for appropriate game,
For Tennis - Yellow, Red.
For Cricket - White
Now bucket has balls in three colors Yellow, Red and White. And that now you did the coloring Only you know which color is for which game.
Coloring the balls - Hashing.
Choosing the ball for game - Equals.
If you did the coloring and some one chooses the ball for either cricket or tennis they wont mind the color!!!
I was looking into the explanation " If you only override hashCode then when you call myMap.put(first,someValue) it takes first, calculates its hashCode and stores it in a given bucket. Then when you call myMap.put(first,someOtherValue) it should replace first with second as per the Map Documentation because they are equal (according to our definition)." :
I think 2nd time when we are adding in myMap then it should be the 'second' object like myMap.put(second,someOtherValue)
The methods equals and hashcode are defined in the object class. By default if the equals method returns true, then the system will go further and check the value of the hash code. If the hash code of the 2 objects is also same only then the objects will be considered as same. So if you override only equals method, then even though the overridden equals method indicates 2 objects to be equal , the system defined hashcode may not indicate that the 2 objects are equal. So we need to override hash code as well.
Equals and Hashcode methods in Java
They are methods of java.lang.Object class which is the super class of all the classes (custom classes as well and others defined in java API).
Implementation:
public boolean equals(Object obj)
public int hashCode()
public boolean equals(Object obj)
This method simply checks if two object references x and y refer to the same object. i.e. It checks if x == y.
It is reflexive: for any reference value x, x.equals(x) should return true.
It is symmetric: for any reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
It is transitive: for any reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
It is consistent: for any reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the object is modified.
For any non-null reference value x, x.equals(null) should return
false.
public int hashCode()
This method returns the hash code value for the object on which this method is invoked. This method returns the hash code value as an integer and is supported for the benefit of hashing based collection classes such as Hashtable, HashMap, HashSet etc. This method must be overridden in every class that overrides the equals method.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.
This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Equal objects must produce the same hash code as long as they are
equal, however unequal objects need not produce distinct hash codes.
Resources:
JavaRanch
Picture
If you override equals() and not hashcode(), you will not find any problem unless you or someone else uses that class type in a hashed collection like HashSet.
People before me have clearly explained the documented theory multiple times, I am just here to provide a very simple example.
Consider a class whose equals() need to mean something customized :-
public class Rishav {
private String rshv;
public Rishav(String rshv) {
this.rshv = rshv;
}
/**
* #return the rshv
*/
public String getRshv() {
return rshv;
}
/**
* #param rshv the rshv to set
*/
public void setRshv(String rshv) {
this.rshv = rshv;
}
#Override
public boolean equals(Object obj) {
if (obj instanceof Rishav) {
obj = (Rishav) obj;
if (this.rshv.equals(((Rishav) obj).getRshv())) {
return true;
} else {
return false;
}
} else {
return false;
}
}
#Override
public int hashCode() {
return rshv.hashCode();
}
}
Now consider this main class :-
import java.util.HashSet;
import java.util.Set;
public class TestRishav {
public static void main(String[] args) {
Rishav rA = new Rishav("rishav");
Rishav rB = new Rishav("rishav");
System.out.println(rA.equals(rB));
System.out.println("-----------------------------------");
Set<Rishav> hashed = new HashSet<>();
hashed.add(rA);
System.out.println(hashed.contains(rB));
System.out.println("-----------------------------------");
hashed.add(rB);
System.out.println(hashed.size());
}
}
This will yield the following output :-
true
-----------------------------------
true
-----------------------------------
1
I am happy with the results. But if I have not overridden hashCode(), it will cause nightmare as objects of Rishav with same member content will no longer be treated as unique as the hashCode will be different, as generated by default behavior, here's the would be output :-
true
-----------------------------------
false
-----------------------------------
2
In the example below, if you comment out the override for equals or hashcode in the Person class, this code will fail to look up Tom's order. Using the default implementation of hashcode can cause failures in hashtable lookups.
What I have below is a simplified code that pulls up people's order by Person. Person is being used as a key in the hashtable.
public class Person {
String name;
int age;
String socialSecurityNumber;
public Person(String name, int age, String socialSecurityNumber) {
this.name = name;
this.age = age;
this.socialSecurityNumber = socialSecurityNumber;
}
#Override
public boolean equals(Object p) {
//Person is same if social security number is same
if ((p instanceof Person) && this.socialSecurityNumber.equals(((Person) p).socialSecurityNumber)) {
return true;
} else {
return false;
}
}
#Override
public int hashCode() { //I am using a hashing function in String.java instead of writing my own.
return socialSecurityNumber.hashCode();
}
}
public class Order {
String[] items;
public void insertOrder(String[] items)
{
this.items=items;
}
}
import java.util.Hashtable;
public class Main {
public static void main(String[] args) {
Person p1=new Person("Tom",32,"548-56-4412");
Person p2=new Person("Jerry",60,"456-74-4125");
Person p3=new Person("Sherry",38,"418-55-1235");
Order order1=new Order();
order1.insertOrder(new String[]{"mouse","car charger"});
Order order2=new Order();
order2.insertOrder(new String[]{"Multi vitamin"});
Order order3=new Order();
order3.insertOrder(new String[]{"handbag", "iPod"});
Hashtable<Person,Order> hashtable=new Hashtable<Person,Order>();
hashtable.put(p1,order1);
hashtable.put(p2,order2);
hashtable.put(p3,order3);
//The line below will fail if Person class does not override hashCode()
Order tomOrder= hashtable.get(new Person("Tom", 32, "548-56-4412"));
for(String item:tomOrder.items)
{
System.out.println(item);
}
}
}
hashCode() method is used to get a unique integer for given object. This integer is used for determining the bucket location, when this object needs to be stored in some HashTable, HashMap like data structure. By default, Object’s hashCode() method returns and integer representation of memory address where object is stored.
The hashCode() method of objects is used when we insert them into a HashTable, HashMap or HashSet. More about HashTables on Wikipedia.org for reference.
To insert any entry in map data structure, we need both key and value. If both key and values are user define data types, the hashCode() of the key will be determine where to store the object internally. When require to lookup the object from the map also, the hash code of the key will be determine where to search for the object.
The hash code only points to a certain "area" (or list, bucket etc) internally. Since different key objects could potentially have the same hash code, the hash code itself is no guarantee that the right key is found. The HashTable then iterates this area (all keys with the same hash code) and uses the key's equals() method to find the right key. Once the right key is found, the object stored for that key is returned.
So, as we can see, a combination of the hashCode() and equals() methods are used when storing and when looking up objects in a HashTable.
NOTES:
Always use same attributes of an object to generate hashCode() and equals() both. As in our case, we have used employee id.
equals() must be consistent (if the objects are not modified, then it must keep returning the same value).
Whenever a.equals(b), then a.hashCode() must be same as b.hashCode().
If you override one, then you should override the other.
http://parameshk.blogspot.in/2014/10/examples-of-comparable-comporator.html
String class and wrapper classes have different implementation of equals() and hashCode() methods than Object class. equals() method of Object class compares the references of the objects, not the contents. hashCode() method of Object class returns distinct hashcode for every single object whether the contents are same.
It leads problem when you use Map collection and the key is of Persistent type, StringBuffer/builder type. Since they don't override equals() and hashCode() unlike String class, equals() will return false when you compare two different objects even though both have same contents. It will make the hashMap storing same content keys. Storing same content keys means it is violating the rule of Map because Map doesnt allow duplicate keys at all.
Therefore you override equals() as well as hashCode() methods in your class and provide the implementation(IDE can generate these methods) so that they work same as String's equals() and hashCode() and prevent same content keys.
You have to override hashCode() method along with equals() because equals() work according hashcode.
Moreover overriding hashCode() method along with equals() helps to intact the equals()-hashCode() contract: "If two objects are equal, then they must have the same hash code."
When do you need to write custom implementation for hashCode()?
As we know that internal working of HashMap is on principle of Hashing. There are certain buckets where entrysets get stored. You customize the hashCode() implementation according your requirement so that same category objects can be stored into same index.
when you store the values into Map collection using put(k,v)method, the internal implementation of put() is:
put(k, v){
hash(k);
index=hash & (n-1);
}
Means, it generates index and the index is generated based on the hashcode of particular key object. So make this method generate hashcode according your requirement because same hashcode entrysets will be stored into same bucket or index.
That's it!
IMHO, it's as per the rule says - If two objects are equal then they should have same hash, i.e., equal objects should produce equal hash values.
Given above, default equals() in Object is == which does comparison on the address, hashCode() returns the address in integer(hash on actual address) which is again distinct for distinct Object.
If you need to use the custom Objects in the Hash based collections, you need to override both equals() and hashCode(), example If I want to maintain the HashSet of the Employee Objects, if I don't use stronger hashCode and equals I may endup overriding the two different Employee Objects, this happen when I use the age as the hashCode(), however I should be using the unique value which can be the Employee ID.
To help you check for duplicate Objects, we need a custom equals and hashCode.
Since hashcode always returns a number its always fast to retrieve an object using a number rather than an alphabetic key. How will it do? Assume we created a new object by passing some value which is already available in some other object. Now the new object will return the same hash value as of another object because the value passed is same. Once the same hash value is returned, JVM will go to the same memory address every time and if in case there are more than one objects present for the same hash value it will use equals() method to identify the correct object.
When you want to store and retrieve your custom object as a key in Map, then you should always override equals and hashCode in your custom Object .
Eg:
Person p1 = new Person("A",23);
Person p2 = new Person("A",23);
HashMap map = new HashMap();
map.put(p1,"value 1");
map.put(p2,"value 2");
Here p1 & p2 will consider as only one object and map size will be only 1 because they are equal.
public class Employee {
private int empId;
private String empName;
public Employee(int empId, String empName) {
super();
this.empId = empId;
this.empName = empName;
}
public int getEmpId() {
return empId;
}
public void setEmpId(int empId) {
this.empId = empId;
}
public String getEmpName() {
return empName;
}
public void setEmpName(String empName) {
this.empName = empName;
}
#Override
public String toString() {
return "Employee [empId=" + empId + ", empName=" + empName + "]";
}
#Override
public int hashCode() {
return empId + empName.hashCode();
}
#Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (!(this instanceof Employee)) {
return false;
}
Employee emp = (Employee) obj;
return this.getEmpId() == emp.getEmpId() && this.getEmpName().equals(emp.getEmpName());
}
}
Test Class
public class Test {
public static void main(String[] args) {
Employee emp1 = new Employee(101,"Manash");
Employee emp2 = new Employee(101,"Manash");
Employee emp3 = new Employee(103,"Ranjan");
System.out.println(emp1.hashCode());
System.out.println(emp2.hashCode());
System.out.println(emp1.equals(emp2));
System.out.println(emp1.equals(emp3));
}
}
In Object Class equals(Object obj) is used to compare address comparesion thats why when in Test class if you compare two objects then equals method giving false but when we override hashcode() the it can compare content and give proper result.
Both the methods are defined in Object class. And both are in its simplest implementation. So when you need you want add some more implementation to these methods then you have override in your class.
For Ex: equals() method in object only checks its equality on the reference. So if you need compare its state as well then you can override that as it is done in String class.
There's no mention in this answer of testing the equals/hashcode contract.
I've found the EqualsVerifier library to be very useful and comprehensive. It is also very easy to use.
Also, building equals() and hashCode() methods from scratch involves a lot of boilerplate code. The Apache Commons Lang library provides the EqualsBuilder and HashCodeBuilder classes. These classes greatly simplify implementing equals() and hashCode() methods for complex classes.
As an aside, it's worth considering overriding the toString() method to aid debugging. Apache Commons Lang library provides the ToStringBuilder class to help with this.

when do i need hashcode and equals method?

I have hashset whare i will store set to objects and i want to find particular object, in this case why do i need to override the hashcode and equals method which i read from below example
public class Emp
{
private int age ;
public Emp( int age )
{
super();
this.age = age;
}
public int hashCode()
{
return age;
}
public boolean equals( Object obj )
{
boolean flag = false;
Emp emp = ( Emp )obj;
if( emp.age == age )
flag = true;
return flag;
}
}
They are saying i would get false for the below query if i not override the hashcode and equals method.
System.out.println("HashSet Size--->>>"+hs.size());
System.out.println("hs.contains( new Emp(25))--->>>"+hs.contains(new Emp(25)));
System.out.println("hs.remove( new Emp(24)--->>>"+hs.remove( new Emp(24));
System.out.println("Now HashSet Size--->>>"+hs.size());
I got confuse how this is related to hashcode and equals just checking the contains(anyobject) and remove(anyobject) in hashset .
Can somebody explain me the above scenario?
At the base of your confusion is the concept of identity vs equality, and what does it mean for a set to contain an element. Let me try to explain.
Suppose in some place in your code you have done this:
HashSet<Emp> hs = new HashSet<>();
hs.add(new Emp(32));
In some other place, you want to see whether an employee for age 32 is in the set. How would you do it? You may consider this:
boolean isThere = hs.contains(new Emp(32));
what you're doing here is creating an instance of Emp passing 32 to the constructor, and then passing the instance to contain().
Note that this instance is not the same instance as the one you created when you added to the set. So, the question is: should contains() return true, as this instance is identical to the one you added, or should it return false, as it is not the very same instance?
The result depends on how hashCode() and equals() are implemented for Emp. With the default implementation, equals() returns true only if the instance passed is the very same that is contained (i.e. uses == to compare the instance passed to contains() and the one stored). In this case, it would return false.
To understand hashCode(), you need to understand how a HashSet works.
When you add an element to a HashSet, an index in an array is computed from the element using hashCode() % <size of the array>. The element is then set as value at the corresponding index.
As different elements may end up having the same hashCode(), more elements may be mapped at the same index, in which case a list of collisions is maintained.
So, going back to your case, why do you need to implement hashCode()? because the default implementation will return different numbers for different instances of Ent, event though age may be the same (The implementation is JVM dependent, for example it could return the address in memory of the instance). So, for contain to work, we need to make sure that the same index in the array is computed for the two instances, hence you need to implement it accordingly. for example, in this case, hashCode() could return the age itself.

Java Collections: IF the key of Hashmap is Immutable Object then do we need to override hashcode and equals?

I am getting confused with one concept. Can someone please throw some light on it.
Question: If the key of Hashmap is Immutable Object(create by developer) then do we need to override hashcode() and equals()? Or having immutable field as key solves the problem of overriding hashcode() and equals()?
Thanks.
Yes. I'll cite the example of java.lang.Integer here. If we wish to have a (sparse) mapping of integers to objects, we'd use something along the lines of HashMap<Integer, Object>. If we add an entry Integer.valueOf(2)=>"foo", and try to retrieve it with new Integer(2) then the overridden hashcode and equals are required.
These are slightly different categories of issues.
As in hexafraction's answer, having immutable instances is not sufficient to let you skip the step of writing an equals and hashCode, if two different instances could ever be considered to be the same. new Integer(2) should always be equal to every other new Integer(2), even though the objects are immutable and the instances are different.
That said, there are examples of "instance-controlled classes" where the default behavior of instance identity is enough:
Enum instances are created at compile time, one per value. There is (theoretically) no way to produce any other instance. If no two instances are equal, the default implementation of equals and hashCode is sufficient. Enum instances aren't compiler-guaranteed to be immutable, but you should treat them as such.
If your class's instances are guaranteed to be different from one another, regardless of whether they're immutable, you can skip equals and hashCode. One could imagine a Car object, where every Car that a CarFactory produces is different.
As a variation of the above, if you control object instantiation tightly enough that equal representations are always given the exact same instance, then that could be considered sufficient:
public class MyClass {
private MyClass(int foo) { /* ... */ }
private static final Map<Integer, MyClass> instanceCache = new HashMap<>();
/** Returns an existing MyClass(foo) if present; otherwise, creates one. */
public synchronized static MyClass create(int foo) {
// Neither leak-proof or thread-safe. Just demonstrating a concept.
if (instanceCache.contains(foo)) {
return instanceCache.get(foo);
}
MyClass newMyClass = new MyClass(foo);
instanceCache.put(foo, newMyClass);
return newMyClass;
}
}
Try it and see:
public class OverrideIt
{
static class MyKey
{
public final int i; // immutable
public MyKey(int i)
{
this.i = i;
}
}
public static void main(String[] args)
{
Map<MyKey, String> map = new HashMap<MyKey, String>();
map.put(new MyKey(1), "Value");
String result = map.get(new MyKey(1));
System.out.println(result); // null
}
}
which prints out null, showing that we failed to look up our value. This is because the two copies of MyKey are not equal and don't have the same hashcode, because we did not override .equals() and hashcode().
Facts
Basic data structure for hashMap is Entry[] (Entry is kind of LinkedList).
Key's hashcode used to locate the position in this Array
Once the Entry retried using hashcode then Key's Equal used to pick the correct Entry (By Iterating hasNext)
Default hashcode returns unique Integer value for that instance.
And you Agreed in comment section
"Is it possible that you'll store an object using one key,
and then try to retrieve it using a key which is an identical object,
but not the same object"
Even though both keys having same value the instances are different. then you might have different hashcode for both keys as per contract (Fact 4) . Thus you will have different position in array (Rule 2)
map.put(new key(1),"first element");
Here key Object does not override so it will return hashcode unquie per instance. (to avoid too complication assume the hashcode returned as 000025 . So Entry[25] is "First Element" )
map.get(new key(1))
Now this new key may return hashcode value as 000017 , So it would try to get value from Entry[17] and return null ( which is not excepted) .
Note I just gave sample as 000025 and 000017 for simplicity , actually hashmap would revisit the hashcode and change it based on array size
So far we have not discussed weather the Key is Mutable or Immutable . Irrespective of the key is Mutable or Immutable
If you store an object using one key,and then try to retrieve it using a key
which is an identical object,but not the same object
You need to override the hashcode and make sure it returns the same Integer , so that it would locate same bucket (position in Array) and get the element. Same applies to equals to get the correct element from Entry

HashSet behavior when changing field value

I just did the following code:
import java.util.HashSet;
import java.util.Set;
public class MyClass {
private static class MyObject {
private int field;
public int getField() {
return field;
}
public void setField(int aField) {
field = aField;
}
#Override
public boolean equals(Object other) {
boolean result = false;
if (other != null && other instanceof MyObject) {
MyObject that = (MyObject) other;
result = (this.getField() == that.getField());
}
return result;
}
#Override
public int hashCode() {
return field;
}
}
public static void main(String[] args) {
Set<MyObject> mySet = new HashSet<MyObject>();
MyObject object = new MyObject();
object.setField(3);
mySet.add(object);
object.setField(5);
System.out.println(mySet.contains(object));
MyObject firstElement = mySet.iterator().next();
System.out.println("The set object: " + firstElement + " the object itself: " + object);
}
}
It prints:
false
The set object: MyClass$MyObject#5 the object itself: MyClass$MyObject#5
Basically meaning that the object is not considered to be in the set, whiile its instance itself apparantly is in the set. this means that if I insert a object in a set, then change the value of a field that participates in the calculation of the hashCode method, then the HashSet method will seize working as expected. Isn;t this too big source of possible errors? How can someone defend against such cases?
Below is the quote from Set API. It explains everything.
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
http://docs.oracle.com/javase/7/docs/api/java/util/Set.html
HashSet is implemented on HashMap.
HashMap caches the hashCode of the key, So if you change the hashCode than even though the hash function maps the hashCode to the same bucket as the original object present but it will not find because before even checking the object equality it will check the hashCode.
see the line:
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
And if the hashCode maps to different bucket by hash function than the original object is present than obviously it can't find.
So even though same object if you change the hashCode hashSet can't find. Hope it helps.
So key for the HashMap or the object that you are putting into HashSet should be immutable or effective immutable.
#fazomisiek
public HashSet() {
map = new HashMap<E,Object>();
}
Similarly if you check the source of HashSet you can find it.
This problem is just a limitation of the implementation of java.util.HashSet and the underlying java.util.HashMap. Fundamentally, you're trading off the ability to modify elements in the set for faster insert/lookup performance - it's just part of the contract of using a hash set / map data structure.
If you can't guarantee everybody will remember they can't modify the objects in the set, the only way to absolutely guard against this happening is to only insert immutable objects into the set in the first place.

hashcode implementation on java classes containing collections

I have a class that contains a collection.
The two instances of the class are equal if the contents of the collection are equal.
While I am building my data structure I store the class in a HashSet, and the contents of the collection change. The changes cause a change in the hash code value. This seems to cause side effects where my data is lost in the Set.
Removing the collection from the hashcode calculation fixes the problem, but violates rule where all fields in equals should be used in the hashcode.
How would you implement the hashcode in this situation?
public class LeveZeroHolder
{
private final Set<LevelOneHolder> orgGroups = new HashSet<LevelOneHolder>();
private final String name;
public LeveZeroHolder(String name, LevelOneHolder og)
{
this.name = name;
orgGroups.add(og);
og.setFA(this);
}
#Override
public boolean equals(Object obj)
{
if (this == obj)
return true;
if (obj == null || obj.getClass () != getClass ())
return false;
LeveZeroHolder hobj = (LeveZeroHolder)obj;
return getOrgGroups().equals(hobj.getOrgGroups()) && getName().equals(hobj.getName());
}
#Override
public int hashCode()
{
int rs = 17;
rs = rs * 37 + ((getName() == null) ? 0 : getName().hashCode ());
rs = rs * 37 + ((getOrgGroups() == null) ? 0 : getOrgGroups().hashCode());
return rs;
}
public String getName()
{
return name;
}
public Set<LevelOneHolder> getOrgGroups()
{
return orgGroups;
}
public void addOrgGroup(LevelOneHolder o)
{
o.setFA(this);
orgGroups.add(o);
}
}
If you mean that when you store such objects as a key in a Map or in a Set they get lost, you might want to have a look at this thread which explains why storing mutable objects in a set in not a good idea, especially if they change while being held by the set.
Extract from the Set javadoc:
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
If your collection of LevelOneHolder really does affect the uniqueness of your object then you should make LevelZeroHolder immutable. If you add a LevelOneHolder to your LevelZeroHolder then you should, instead of updating the collection, return a completely new LevelZeroHolder with the collection copied from the pre-existing one and joined with the new one you want to add.
In this way the hashCode never changes but you end up with different LevelZeroHolders with different hashCodes. This is probably correct though considering that you're saying that the nested collection within LevelZeroHolder contributes to the uniqueness of that object.
If you put your object in a data structure where you depend on the hashCode or the equals() to find it, AND if you change some property of the object (like its name, or something in the list in it) when it is already in that data structure, then you wont find it. Is that your case?
The reason is that when you put the object in such a data structure, it is inserted according to its hashCode, at the time of insertion. If when its inside, you change something that changes the hashCode, then your object is still stored according to the old hashCode but when you try to get it, and so you compute its hashCode to see if thats your object, then you get the new hashCode value and so you dont see that its the one you want.
The rule is that for an object in a data structure that uses the object's hash code or equals NEVER change the object in a way that affects the hashCode or equals.
Try using immutable objects instead.

Categories