Understanding contains method of Java HashSet


Newbie question about java HashSet
Set<User> s = new HashSet<User>();
User u = new User();
u.setName("name1");
s.add(u);
u.setName("name3");
System.out.println(s.contains(u));
Can someone explain why this code outputs false? Moreover, this code does not even call the equals method of User, but according to the sources of HashSet and HashMap it has to call it. The equals method of User simply calls equals on the user's name, and hashCode returns the hashCode of the user's name.

If the hashCode method is based on the name field, and you then change the name after adding the object, the contains check will use the new hash value and won't find the object you were looking for. This is because a HashSet searches by hash code first, so it won't bother calling equals if that search fails.
The only way this lookup would succeed is if you got lucky and the old and new names happened to have the same hash code, so that the search still reached the stored entry. That is a really unlikely scenario, and you shouldn't rely on it.
In general, you should never update an object after you have added it to a HashSet if that change will also change its hashcode.
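A minimal sketch of the situation described above. The User class here is a hypothetical reconstruction (the question does not show it), assuming equals and hashCode both delegate to the name field:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class MutatedElementDemo {

    // Hypothetical reconstruction of the question's User class:
    // equals and hashCode both depend only on the mutable name field.
    static class User {
        private String name;

        void setName(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            return o instanceof User && Objects.equals(name, ((User) o).name);
        }

        @Override
        public int hashCode() { return Objects.hashCode(name); }
    }

    // Adds a user, renames it, then looks it up under the new hash code.
    static boolean containsAfterRename() {
        Set<User> s = new HashSet<>();
        User u = new User();
        u.setName("name1");
        s.add(u);            // stored under the hash of "name1"
        u.setName("name3");  // hash code changes while the object is in the set
        return s.contains(u);
    }

    // Same as above, but the original name is restored before the lookup.
    static boolean containsAfterRestoringName() {
        Set<User> s = new HashSet<>();
        User u = new User();
        u.setName("name1");
        s.add(u);
        u.setName("name3");
        u.setName("name1");  // hash code is back to the value used on insertion
        return s.contains(u);
    }

    public static void main(String[] args) {
        System.out.println(containsAfterRename());        // false
        System.out.println(containsAfterRestoringName()); // true
    }
}
```

The object never left the set; it is simply filed under a hash code that no longer matches, which is why restoring the name makes it findable again.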

Since your User now has a different hashcode than it had when it was added, the HashSet concludes that no equal element is present.
HashSets store their items according to their hashcodes.
The HashSet will only call equals if it finds an item with the same hashcode, to make sure that the two items are actually equal (as opposed to a hash collision)

Related

Consequences of different hashcodes but same equals for two java objects

I understand that two Java objects should have the same hash code when equals returns true for them, but I wanted to understand what the consequences would be, with respect to collections like HashMap, HashSet etc., if the hash codes differ while equals returns true.
Would it only impact the performance or will it impact the behavior/functionality of those collection classes.

Let's call the objects o1 and o2 where o1.equals(o2) but o1.hashCode() != o2.hashCode()
Consider the following:
Map map = new HashMap();
Set set = new HashSet();
map.put(o1, "foo");
set.add(o1);
The following assertions would fail
Assert.assertTrue(map.containsKey(o2));
Assert.assertTrue(set.contains(o2));

The consequences will be unexpected behavior.
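If a.equals(b) == true but a.hashCode() != b.hashCode(), then set.add(a) followed by set.contains(b) will return false in practice (assuming set is a HashSet), even though according to equals it should return true. Strictly speaking the result is unspecified once the contract is broken: two different hash codes can still be mapped to the same bucket of the HashSet/HashMap, and an implementation that compared bucket entries with equals alone would then return true. OpenJDK's HashMap, however, compares the stored hash values before it calls equals, so the lookup fails there even when both objects share a bucket.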

It would break the functionality. If you are looking for an object in a hashmap or hashset, it is using the hash code in order to find it. If the hash code is not consistent, it probably will not be able to find it.
The most basic requirement of a hash code is that two equal objects must have the same hash code. Everything else is secondary.

If two objects are equal, their hashCode methods must always return the same value. From the documentation of equals:
Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

The lookup will miss the bucket. HashMap is designed to store large amounts of data with O(1) fetch time in the best case. To do this, it stores key/value pairs in a slot derived from the key's hash code (generally called a bucket). This technique is called hashing.
So if you stored the object in the HashMap under hash code 100, say, and you now try to fetch it with a different hash code (say 200), you are looking inside a different bucket. Even though your object is in the HashMap, it cannot be retrieved, because the lookup searches the bucket for 200.
That is the reason two Java objects should have the same hash code when equals returns true for them.
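The 100 versus 200 scenario can be reproduced with a small sketch. BrokenKey is a hypothetical class written only for this illustration; it deliberately violates the contract by taking its hash code as a constructor argument:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class BrokenContractDemo {

    // Hypothetical key that deliberately breaks the contract:
    // equals compares the label, but hashCode returns an unrelated number.
    static class BrokenKey {
        final String label;
        final int hash;

        BrokenKey(String label, int hash) {
            this.label = label;
            this.hash = hash;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BrokenKey && label.equals(((BrokenKey) o).label);
        }

        @Override
        public int hashCode() { return hash; }
    }

    // Returns { o1.equals(o2), map.containsKey(o2), set.contains(o2) }.
    static boolean[] run() {
        BrokenKey o1 = new BrokenKey("same", 100);
        BrokenKey o2 = new BrokenKey("same", 200);

        Map<BrokenKey, String> map = new HashMap<>();
        Set<BrokenKey> set = new HashSet<>();
        map.put(o1, "foo");
        set.add(o1);

        return new boolean[] { o1.equals(o2), map.containsKey(o2), set.contains(o2) };
    }

    public static void main(String[] args) {
        // On OpenJDK: [true, false, false]
        System.out.println(Arrays.toString(run()));
    }
}
```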

HashMap/HashSet is meant to help you find the target object in the collection. If two equal objects have different hashcodes, you are unlikely to find the object in the hash bucket.

Should I override hashCode() of Collections?

Given that I have some class with various fields in it:
class MyClass {
    private String s;
    private MySecondClass c;
    private Collection<someInterface> coll;
    // ...
    @Override public int hashCode() {
        // ????
    }
}
and I have various objects of that class which I'd like to store in a HashMap. For that, I need to have the hashCode() of MyClass.
I'll have to go into all fields and respective parent classes recursively to make sure they all implement hashCode() properly, because otherwise hashCode() of MyClass might not take into consideration some values. Is this right?
What do I do with that Collection? Can I always rely on its hashCode() method? Will it take into consideration all child values that might exist in my someInterface object?
I OPENED A SECOND QUESTION regarding the actual problem of uniquely IDing an object here: How do I generate an (almost) unique hash ID for objects?
Clarification:
is there anything more or less unique in your class? The String s? Then only use that as hashcode.
MyClass hashCode() of two objects should definitely differ, if any of the values in the coll of one of the objects is changed. HashCode should only return the same value if all fields of two objects store the same values, recursively. Basically, there is some time-consuming calculation going on on a MyClass object. I want to spare this time, if the calculation had already been done with the exact same values some time ago. For this purpose, I'd like to look up in a HashMap, if the result is available already.
Would you be using MyClass in a HashMap as the key or as the value? If the key, you have to override both equals() and hashCode()
Thus, I'm using the hashCode OF MyClass as the key in a HashMap. The value (calculation result) will be something different, like an Integer (simplified).
What do you think equality should mean for multiple collections? Should it depend on element ordering? Should it only depend on the absolute elements that are present?
Wouldn't that depend on the kind of Collection that is stored in coll? Though I guess ordering not really important, no
The response you get from this site is gorgeous. Thank you all
@AlexWien that depends on whether that collection's items are part of the class's definition of equivalence or not.
Yes, yes they are.

I'll have to go into all fields and respective parent classes recursively to make sure they all implement hashCode() properly, because otherwise hashCode() of MyClass might not take into consideration some values. Is this right?
That's correct. It's not as onerous as it sounds because the rule of thumb is that you only need to override hashCode() if you override equals(). You don't have to worry about classes that use the default equals(); the default hashCode() will suffice for them.
Also, for your class, you only need to hash the fields that you compare in your equals() method. If one of those fields is a unique identifier, for instance, you could get away with just checking that field in equals() and hashing it in hashCode().
All of this is predicated upon you also overriding equals(). If you haven't overridden that, don't bother with hashCode() either.
What do I do with that Collection? Can I always rely on its hashCode() method? Will it take into consideration all child values that might exist in my someInterface object?
Yes, you can rely on any collection type in the Java standard library to implement hashCode() correctly. And yes, any List or Set will take into account its contents (it will mix together the items' hash codes).
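A short sketch for checking this yourself, using only standard collections (the class and method names are made up for the example). It shows that a List hash depends on contents and order, that a Set hash depends on contents only, and that the hash changes when the collection is mutated:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

class CollectionHashDemo {

    // Two different List implementations with the same elements in the same order.
    static boolean sameListContentsSameHash() {
        List<String> a = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> b = new LinkedList<>(Arrays.asList("a", "b"));
        return a.equals(b) && a.hashCode() == b.hashCode();
    }

    // For lists, element order is part of the hash code.
    static boolean listOrderChangesHash() {
        return Arrays.asList("a", "b").hashCode() != Arrays.asList("b", "a").hashCode();
    }

    // For sets, insertion order does not matter.
    static boolean setOrderIsIrrelevant() {
        Set<String> a = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> b = new HashSet<>(Arrays.asList("b", "a"));
        return a.equals(b) && a.hashCode() == b.hashCode();
    }

    // Adding an element changes the list's hash code.
    static boolean mutationChangesHash() {
        List<String> list = new ArrayList<>(Arrays.asList("a"));
        int before = list.hashCode();
        list.add("b");
        return before != list.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(sameListContentsSameHash()); // true
        System.out.println(listOrderChangesHash());     // true
        System.out.println(setOrderIsIrrelevant());     // true
        System.out.println(mutationChangesHash());      // true
    }
}
```

The last check is the one to keep in mind when the collection is a field of a map key: mutating it changes the key's hash code as well.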

So you want to do a calculation on the contents of your object that will give you a unique key you'll be able to check in a HashMap whether the "heavy" calculation that you don't want to do twice has already been done for a given deep combination of fields.
Using hashCode alone:
I believe hashCode is not the appropriate thing to use in the scenario you are describing.
hashCode should always be used in association with equals(). It's part of its contract, and it's an important part, because hashCode() returns an integer, and although one may try to make hashCode() as well-distributed as possible, it is not going to be unique for every possible object of the same class, except for very specific cases (It's easy for Integer, Byte and Character, for example...).
If you want to see for yourself, try generating strings of up to 4 letters (lower and upper case), and see how many of them have identical hash codes.
HashMap therefore uses both the hashCode() and equals() method when it looks for things in the hash table. There will be elements that have the same hashCode() and you can only tell if it's the same element or not by testing all of them using equals() against your class.
Using hashCode and equals together
In this approach, you use the object itself as the key in the hash map, and give it an appropriate equals method.
To implement the equals method you need to go deeply into all your fields. All of their classes must have equals() that matches what you think of as equal for the sake of your big calculation. Special care needs to be taken when your objects implement an interface. If the calculation is based on calls to that interface, and different objects that implement the interface return the same value in those calls, then they should implement equals in a way that reflects that.
And their hashCode is supposed to match the equals - when the values are equal, the hashCode must be equal.
You then build your equals and hashCode based on all those items. You may use Objects.equals(Object, Object) and Objects.hash(Object...) to save yourself a lot of boilerplate code.
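As a rough sketch of this approach: CalcKey, its fields, and the stand-in calculation below are all hypothetical, chosen only to mirror the question's String and Collection fields. The key is immutable and copies its collection so that it cannot change while it sits in the map:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class ValueKeyDemo {

    // Hypothetical immutable key built from the fields that feed the calculation.
    static final class CalcKey {
        private final String s;
        private final List<Integer> coll;

        CalcKey(String s, List<Integer> coll) {
            this.s = s;
            this.coll = new ArrayList<>(coll); // defensive copy: later changes cannot alter the key
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof CalcKey)) return false;
            CalcKey other = (CalcKey) o;
            return Objects.equals(s, other.s) && Objects.equals(coll, other.coll);
        }

        @Override
        public int hashCode() {
            return Objects.hash(s, coll); // exactly the fields compared in equals
        }
    }

    private static final Map<CalcKey, Integer> CACHE = new HashMap<>();
    static int calculations = 0; // counts how often the "expensive" part really ran

    // Stand-in for the expensive calculation; it only runs on a cache miss.
    static int compute(String s, List<Integer> coll) {
        return CACHE.computeIfAbsent(new CalcKey(s, coll), k -> {
            calculations++;
            return s.length() + coll.size();
        });
    }

    public static void main(String[] args) {
        System.out.println(compute("abc", Arrays.asList(1, 2))); // 5, calculated
        System.out.println(compute("abc", Arrays.asList(1, 2))); // 5, served from the cache
        System.out.println(calculations);                        // 1
    }
}
```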
But is this a good approach?
While you can cache the result of hashCode() in the object and re-use it without calculation as long as you don't mutate it, you can't do that for equals. This means that calculation of equals is going to be lengthy.
So depending on how many times the equals() method is going to be called for each object, this is going to be exacerbated.
If, for example, you are going to have 30 objects in the hashMap, but 300,000 objects are going to come along and be compared to them only to realize that they are equal to them, you'll be making 300,000 heavy comparisons.
If you're only going to have very few instances in which an object is going to have the same hashCode or fall in the same bucket in the HashMap, requiring comparison, then going the equals() way may work well.
If you decide to go this way, you'll need to remember:
If the object is a key in a HashMap, it should not be mutated as long as it's there. If you need to mutate it, you may need to make a deep copy of it and keep the copy in the hash map. Deep copying again requires consideration of all the objects and interfaces inside to see if they are copyable at all.
Creating a unique key for each object
Back to your original idea, we have established that hashCode is not a good candidate for a key in a hash map. A better candidate for that would be a hash function such as md5 or sha1 (or more advanced hashes, like sha256, but you don't need cryptographic strength in your case), where collisions are a lot rarer than a mere int. You could take all the values in your class, transform them into a byte array, hash it with such a hash function, and take its hexadecimal string value as your map key.
Naturally, this is not a trivial calculation. So you need to think if it's really saving you much time over the calculation you are trying to avoid. It is probably going to be faster than repeatedly calling equals() to compare objects, as you do it only once per instance, with the values it had at the time of the "big calculation".
For a given instance, you could cache the result and not calculate it again unless you mutate the object. Or you could just calculate it again only just before doing the "big calculation".
However, you'll need the "cooperation" of all the objects you have inside your class. That is, they will all need to be reasonably convertible into a byte array in such a way that two equivalent objects produce the same bytes (including the same issue with the interface objects that I mentioned above).
You should also beware of situations in which you have, for example, two strings "AB" and "CD" which will give you the same result as "A" and "BCD", and then you'll end up with the same hash for two different objects.
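A minimal sketch of such a digest key, assuming for simplicity that all fields are strings (digestKey and naiveKey are hypothetical helper names). Each field is prefixed with its byte length, which is one common way to avoid the "AB"/"CD" versus "A"/"BCD" ambiguity:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestKeyDemo {

    // Builds a hex SHA-256 key from string fields. Each field is prefixed with
    // its byte length so that ("AB", "CD") and ("A", "BCD") produce different input.
    static String digestKey(String... fields) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String field : fields) {
                byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
                md.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
                md.update(bytes);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Naive variant that simply concatenates the fields, shown for contrast.
    static String naiveKey(String... fields) {
        return digestKey(String.join("", fields));
    }

    public static void main(String[] args) {
        // Concatenation cannot tell the two field splits apart.
        System.out.println(naiveKey("AB", "CD").equals(naiveKey("A", "BCD")));   // true
        // Length-prefixed fields can.
        System.out.println(digestKey("AB", "CD").equals(digestKey("A", "BCD"))); // false
    }
}
```

The resulting string can then serve as the map key. Whether computing it is cheaper than the calculation being cached still has to be measured for the actual data.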

For future readers.
Yes, equals and hashCode go hand in hand.
Below shows a typical implementation using a helper library, but it really shows the "hand in hand" nature. And the helper library from apache keeps things simpler IMHO:
@Override
public boolean equals(Object o) {
    if (this == o) {
        return true;
    }
    if (o == null || getClass() != o.getClass()) {
        return false;
    }
    MyCustomObject castInput = (MyCustomObject) o;
    boolean returnValue = new org.apache.commons.lang3.builder.EqualsBuilder()
        .append(this.getPropertyOne(), castInput.getPropertyOne())
        .append(this.getPropertyTwo(), castInput.getPropertyTwo())
        .append(this.getPropertyThree(), castInput.getPropertyThree())
        .append(this.getPropertyN(), castInput.getPropertyN())
        .isEquals();
    return returnValue;
}
@Override
public int hashCode() {
    return new org.apache.commons.lang3.builder.HashCodeBuilder(17, 37)
        .append(this.getPropertyOne())
        .append(this.getPropertyTwo())
        .append(this.getPropertyThree())
        .append(this.getPropertyN())
        .toHashCode();
}
The 17 and 37 are arbitrary; you can pick your own values (the constructor expects two odd numbers).

From your clarifications:
You want to store MyClass in an HashMap as key.
This means the hashCode() is not allowed to change after adding the object.
So if your collections may change after object instantiation, they should not be part of the hashcode().
From http://docs.oracle.com/javase/8/docs/api/java/util/Map.html
Note: great care must be exercised if mutable objects are used as map
keys. The behavior of a map is not specified if the value of an object
is changed in a manner that affects equals comparisons while the
object is a key in the map.
For 20-100 objects it is not worth running the risk of an inconsistent hashCode() or equals() implementation.
There is no need to override hashCode() and equals() in your case.
If you don't override them, Java uses the unique object identity for equals() and hashCode() (and that works, especially because you stated that you don't need an equals() that considers the values of the object fields).
When using the default implementation, you are on the safe side.
Making an error like using a custom hashCode() as key in the HashMap, when that hash code changes after insertion because you used the hashCode() of the collections as part of your object's hash code, may result in an extremely hard to find bug.
If you need to find out whether the heavy calculation is finished, I would not abuse equals(). Just write your own method objectStateValue() and call hashCode() on the collection, too. This then does not interfere with the object's hashCode() and equals().
public int objectStateValue() {
// TODO make sure the fields are not null;
return 31 * s.hashCode() + coll.hashCode();
}
Another, simpler possibility: the code that does the time-consuming calculation can increment a calculationCounter by one as soon as the calculation is done. You then just check whether or not the counter has changed. This is much cheaper and simpler.

What happens if we override only hashCode() in a class and use it in a Set?

This may not be the real world scenario but just curious to know what happens, below is the code.
I am creating a set of object of class UsingSet.
According to hashing concept in Java, when I first add object which contains "a", it will create a bucket with hashcode 97 and put the object inside it.
Again when it encounters an object with "a", it will call the overridden hashcode method in the class UsingSet and it will get hashcode 97 so what is next?
As I have not overridden equals method, the default implementation will return false. So where will be the Object with value "a" be kept, in the same bucket where the previous object with hashcode 97 kept? or will it create new bucket?
anybody know how it will be stored internally?
/* package whatever; // don't place package name! */
import java.util.*;
import java.lang.*;
import java.io.*;
class UsingSet {
String value;
public UsingSet(String value){
this.value = value;
}
public String toString() {
return value;
}
public int hashCode() {
int hash = value.hashCode();
System.out.println("hashcode called" + hash);
return hash;
}
public static void main(String args[]) {
java.util.Set s = new java.util.HashSet();
s.add(new UsingSet("A"));
s.add(new UsingSet("b"));
s.add(new UsingSet("a"));
s.add(new UsingSet("b"));
s.add(new UsingSet("a"));
s.add(new Integer(1));
s.add(new Integer(1));
System.out.println("s = " + s);
}
}
output is:
hashcode called65
hashcode called98
hashcode called97
hashcode called98
hashcode called97
s = [1, b, b, A, a, a]

HashCode & Equals methods
Only Override HashCode, Use the default Equals:
Only references to the same object will compare as equal. In other words, the objects you expected to be equal will not be equal according to the equals method.
Only Override Equals, Use the default HashCode: There might be duplicates in the HashMap or HashSet. Say we write the equals method and expect "abc" and "ABC" to be equal. When using a HashMap, however, they might end up in different buckets, so the contains() method will not find one when given the other.
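The first case (only hashCode overridden) can be checked with a short sketch. HashOnly and HashAndEquals are hypothetical classes modelled on the question's UsingSet:

```java
import java.util.HashSet;
import java.util.Set;

class OnlyHashCodeDemo {

    // Overrides hashCode only; equals is inherited from Object (reference equality).
    static class HashOnly {
        final String value;
        HashOnly(String value) { this.value = value; }

        @Override
        public int hashCode() { return value.hashCode(); }
    }

    // Overrides both methods consistently.
    static class HashAndEquals {
        final String value;
        HashAndEquals(String value) { this.value = value; }

        @Override
        public int hashCode() { return value.hashCode(); }

        @Override
        public boolean equals(Object o) {
            return o instanceof HashAndEquals && value.equals(((HashAndEquals) o).value);
        }
    }

    static int sizeWithHashOnly() {
        Set<Object> s = new HashSet<>();
        s.add(new HashOnly("a"));
        s.add(new HashOnly("a")); // same hash code, but equals reports "different": kept as a second entry
        return s.size();
    }

    static int sizeWithBoth() {
        Set<Object> s = new HashSet<>();
        s.add(new HashAndEquals("a"));
        s.add(new HashAndEquals("a")); // same hash code and equal: rejected as a duplicate
        return s.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeWithHashOnly()); // 2
        System.out.println(sizeWithBoth());     // 1
    }
}
```

Both "a" objects land in the same bucket in either case; the difference is only whether equals lets the set recognize the second one as a duplicate.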

James Large's answer is incorrect, or rather misleading (and partly incorrect as well). I will explain.
If two objects are equal according to their equals() method, they must also have the same hash code.
If two objects have the same hash code, they do NOT have to be equal too.
Here is the actual wording from the java.lang.Object documentation:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
It is true, that if two objects don't have the same hash then they are not equal. However, hashing is not a way to check equality - so it is wildly incorrect to say that it is a faster way to check equality.
Also, it is also wildly incorrect to say the hashCode function is an efficient way to do anything. This is all up to implementation, but the default implementation for hashCode of a string is very inefficient as the String gets large. It will perform a calculation based on each char of the String, so if you are using large Strings as keys, then this becomes very inefficient; moreso if you have a large number of buckets.
In a HashMap (HashSet uses a HashMap internally), there are buckets and in each bucket is a linked list. Java uses the hashCode() function to find out which bucket an object belongs in (it actually will modify the hash, depending on how many buckets exist). Since two objects may share the same hash, it will iterate through the linked list sequentially next, checking the equals() method to see if the object is a duplicate. Per the java.util.Set documentation:
A collection that contains no duplicate elements.
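So, if its hashCode() leads it to a bucket that already contains an Object for which .equals() evaluates to true, the new Object is treated as a duplicate. In a HashSet, add() then returns false and the element already in the set stays; in a HashMap, put() replaces the value but keeps the existing key object. You can probably view here for more information: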
How does a Java HashMap handle different objects with the same hash code?
Generally speaking though, it is good practice that if you override the hashCode function, you also override the equals function (overriding only hashCode does not by itself violate the contract, but it rarely gives the behavior you want).
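What happens on a duplicate insert can be observed directly with a small sketch. Key is a hypothetical class whose equality is based on id only; the tag field exists just to tell the two instances apart:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class DuplicateInsertDemo {

    // Hypothetical key: equality is based on id only; tag just tells instances apart.
    static class Key {
        final int id;
        final String tag;

        Key(int id, String tag) {
            this.id = id;
            this.tag = tag;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override
        public int hashCode() { return Integer.hashCode(id); }
    }

    static String describe() {
        Key first = new Key(1, "first");
        Key second = new Key(1, "second"); // equal to first, but a different instance

        Set<Key> set = new HashSet<>();
        set.add(first);
        boolean addedAgain = set.add(second);        // duplicate: not added
        String keptInSet = set.iterator().next().tag;

        Map<Key, String> map = new HashMap<>();
        map.put(first, "v1");
        String previous = map.put(second, "v2");     // value replaced, old value returned
        String keptKey = map.keySet().iterator().next().tag;

        return addedAgain + "," + keptInSet + "," + previous + "," + keptKey + "," + map.get(first);
    }

    public static void main(String[] args) {
        System.out.println(describe()); // false,first,v1,first,v2
    }
}
```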

You can simply think of the hashCode and equals methods as a 2D search,
where the hash code selects the row and the list of objects in that row is the column.
Consider the following class structure.
public class obj
{
    int id;
    String name;
    public obj(String name, int id)
    {
        this.id = id;
        this.name = name;
    }
}
now if you create the objects like this:-
obj obj1=new obj("Hassu",1);
obj obj2=new obj("Hoor",2);
obj obj3=new obj("Heniel",3);
obj obj4=new obj("Hameed",4);
obj obj5=new obj("Hassu",1);
and you place this objects in map like this :-
HashMap hMap=new HashMap();
1. hMap.put(obj1,"value1");
2. hMap.put(obj2,"value2");
3. hMap.put(obj3,"value3");
4. hMap.put(obj4,"value4");
5. hMap.put(obj5,"value5");
Now, if you have not overridden hashCode and equals, then after putting all the objects up to line 5, obj5 ends up as a separate entry: the default hashCode gives it a different hash code, so the row (bucket) will be different.
So in runtime memory it will be stored like this.
|hashcode | Objects
|-----------| ---------
|000562 | obj1
|000552 | obj2
|000588 | obj3
|000546 | obj4
|000501 | obj5
Now if you create another object with the same content, like:
obj obj6 = new obj("Hassu",1);
and search for it in the map, like
if(hMap.containsKey(obj6))
or
hMap.get(obj6);
then, although a key (obj1) with the same content is present, you will get false and null respectively.
Now if you override only the equals method
and perform the same search, you will also get null, because the hash code of obj6 is different and no key will be found under that hash code.
Now if you override only the hashCode method,
you will reach the same bucket (hash code row), but the content cannot be compared, so the reference comparison inherited from Object is used.
So here, if you search with hMap.get(obj6), you will arrive at the correct hash code (000562), but because obj1 and obj6 are different references you will get null.

The Set will behave differently.
Uniqueness won't be enforced, because uniqueness is achieved by the hashCode and equals methods together.
With both overridden, the output would look like s = [A, a, b, 1] instead of the earlier one.
Apart from that, remove and contains won't work as expected either.

Without looking at your code...
The whole point of hash codes is to speed up the process of testing two objects for equality. It can be costly to test whether two large, complex objects are equal, but it is trivially easy to compare their hash codes, and hash codes can be pre-computed.
The rule is: If two objects don't have the same hash code, that means they are not equal. No need to do the expensive equality test.
So, the answer to the question in your title: If you define an equals() method that says object A is equal to object B, and you define a hashCode() method that says object A is not equal to object B (i.e., it says they have different hash codes), and then you hand those two objects to some library that cares whether they are equal or not (e.g., if you put them in a hash table), then the behavior of the library is going to be undefined (i.e., probably wrong).
Added information: Wow! I really missed seeing the forest for the trees here---thinking about the purpose of hashCode() without putting it in the context of HashMap. If m is a Map with N entries, and k is a key; what is the purpose of calling m.get(k)? The purpose, obviously, is to search the map for an entry whose key is equal to k.
What if hash codes and hash maps had not been invented? Well the best you could do, assuming that the keys have a natural, total order, is to search a TreeMap, comparing the given key for equality with O(log(N)) other keys. In the worst case, where the keys have no order, you would have to compare the given key for equality with every key in the map until you either find a match or tested them all. In other words, the complexity of m.get(k) would be O(N).
When m is a HashMap, the complexity of m.get(k) is O(1), whether the keys can be ordered or not.
So, I messed up by saying that the point of hash codes was to speed up the process of testing two objects for equality. It's really about testing an object for equality with a whole collection of other objects. That's where comparing hash codes doesn't just help a little; It helps by orders of magnitude...
...If the k.hashCode() and k.equals(o) methods obey the rule: j.hashCode()!=k.hashCode() implies !j.equals(k).
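The difference in the number of equality tests can be made visible with a rough sketch. CountingKey is a hypothetical class that counts calls to its equals method; the exact count for the hash set depends on the implementation, so only the list count is stated in the comments:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LookupCostDemo {

    static int equalsCalls = 0; // incremented on every equals call

    // Hypothetical key with consistent equals/hashCode that counts equals calls.
    static class CountingKey {
        final int id;
        CountingKey(int id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof CountingKey && ((CountingKey) o).id == id;
        }

        @Override
        public int hashCode() { return Integer.hashCode(id); }
    }

    // Returns { equals calls for the list lookup, equals calls for the hash set lookup }.
    static int[] measure(int n) {
        List<CountingKey> list = new ArrayList<>();
        Set<CountingKey> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            CountingKey k = new CountingKey(i);
            list.add(k);
            set.add(k);
        }
        CountingKey probe = new CountingKey(n - 1); // equal to the last element added

        equalsCalls = 0;
        list.contains(probe);   // linear scan: compares against every element in turn
        int listCalls = equalsCalls;

        equalsCalls = 0;
        set.contains(probe);    // jumps to one bucket, compares only candidates found there
        int setCalls = equalsCalls;

        return new int[] { listCalls, setCalls };
    }

    public static void main(String[] args) {
        int[] calls = measure(1000);
        System.out.println("list equals calls: " + calls[0]); // 1000
        System.out.println("set equals calls:  " + calls[1]); // a small constant
    }
}
```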

How HashSet works with regards to hashCode()?

I'm trying to understand java.util.Collection and java.util.Map a little deeper, but I have some doubts about HashSet functionality:
In the documentation, it says: This class implements the Set interface, backed by a hash table (actually a HashMap instance). OK, so I can see that a HashSet always has a hash table working in the background. A hash table is a structure that asks for a key and a value every time you want to add a new element to it. Then the value and the key are stored in a bucket based on the key's hashCode. If the hash codes of two keys are the same, both entries are added to the same bucket, using a linked list. Please correct me if I said something wrong.
So, my question is: if a HashSet always has a hash table acting in the background, then every time we add a new element to the HashSet using the HashSet.add() method, the HashSet should add it to its internal hash table. But the hash table asks for a value and a key, so what key does it use? Does it just use the value we're trying to add as the key as well, and then take its hashCode? Please correct me if I said something wrong about the HashSet implementation.
Another question that I have is: in general, what classes can use the hashCode() method of a Java object? I'm asking this because the documentation says that every time we override the equals() method we need to override the hashCode() method. OK, it really makes sense, but my doubt is whether it's just a recommendation to keep everything 'nice and perfect' (so to speak), or whether it's really necessary, because maybe a lot of Java's default classes constantly use the hashCode() method of your objects. As I see it, I can't think of classes using this method other than those related to Collections. Thank you very much guys

If you look at the actual Java source code of HashSet you can see what it does:
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
...
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
So the element you are adding is the key in the backing HashMap, with a dummy object as the value. This dummy value is never actually used by the HashSet.
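To make the idea concrete, here is a simplified illustration of a set built on top of a map. MiniSet is a hypothetical class written for this example, not the JDK source, and it leaves out everything except the core operations:

```java
import java.util.HashMap;
import java.util.Map;

class MiniSetDemo {

    // Simplified illustration of a set backed by a HashMap; not the JDK source.
    static class MiniSet<E> {
        // Shared dummy value stored for every key.
        private static final Object PRESENT = new Object();
        private final Map<E, Object> map = new HashMap<>();

        // The element itself is the map key; put returns null only for a new key.
        boolean add(E e) { return map.put(e, PRESENT) == null; }

        boolean contains(Object o) { return map.containsKey(o); }

        boolean remove(Object o) { return map.remove(o) == PRESENT; }

        int size() { return map.size(); }
    }

    public static void main(String[] args) {
        MiniSet<String> s = new MiniSet<>();
        System.out.println(s.add("x"));      // true, new element
        System.out.println(s.add("x"));      // false, already present
        System.out.println(s.contains("x")); // true
        System.out.println(s.size());        // 1
    }
}
```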
Your second question regarding overriding equals and hashcode:
It is really necessary to override both if you want to override either one. This is because the contract for hashCode says that equal objects must have the same hash code, and the default implementation of hashCode will typically give different values for each instance.
Therefore, if you override equals() but not hashCode(), this could happen:
object1.equals(object2) //true
MySet.add(object1);
MySet.contains(object2); //false but should be true if we overrode hashcode()
Since contains will use hashcode to find the bucket to search in we might get a different bucket back and not find the equal object.

If you look at the source for HashSet (the source comes with the JDK and is very informative), you will see that it creates an object to use as the value:
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
Each value that is added to the HashSet is used as a key to the backing HashMap with this PRESENT object as the value.
Regarding overriding equals() whenever you override hashCode() (and vice versa), it is very important that these two methods return consistent results. That is, they should agree with one another. For more details, see the book Effective Java by Josh Bloch.

Equal Objects must have equal hashcodes?

Equal objects must have equal hashcodes. As per my understanding, this statement is valid when we intend to use the object in hash-based data structures. This is part of the contract for the hashCode and equals methods in the Java docs. I explored the reason why this is said, looked at the implementation of Hashtable, and found the code below in the put method:
if ((e.hash == hash) && e.key.equals(key))
So I got it: the contract comes from the condition e.hash == hash above. I further tried to explore why Java checks the hash code when comparing two objects for equality. So here is my understanding:
If two equal objects have equal hash codes, then they can be stored in the same bucket, and this will be good in terms of lookup, which touches a single bucket only.
It's better to check the hash code than to actually call the equals method, because comparing hash codes is less costly than calling equals: here we just have to compare int values, whereas the equals method may involve comparing object fields. So the hash code comparison provides one extra filter.
Please correct me: are both of the above reasons valid?

Correct, just a small correction - if two unequal objects have the same hashcode.
Not exactly, It's better to check it first, as a filter for the non-equal, but if you want to make sure the objects are equal, you should call equals()

You got it wrong. equals just returns a boolean value (two possible values), and needs another object to compare against. hashCode returns an int (2^32 possible values), and only needs the object to be called.
The HashMap tries to distribute all the objects it holds among buckets. When put is called on the map, it has to decide which bucket it will use for the given object. It thus uses hashCode (modulo the number of buckets) to decide which bucket to use. Then, once the bucket is found, it has to check whether the key is already in the map or not. To do this, it compares every object in the bucket with the object to put in the map. And to do this, it uses equals. If the object isn't found, it adds it in the bucket.
hashCode isn't used because it's faster than equals. It's used because it allows distributing keys among a set of buckets. And it's much faster to compute the hashCode once and compare the object with (hopefully) zero, one or two objects in the same bucket than to compare the object with the thousands of objects already stored in the map.
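The bucket selection described above can be sketched in a few lines. bucketIndex is a hypothetical helper that uses a plain modulo; OpenJDK's HashMap additionally mixes the hash bits and uses a power-of-two table, so this only illustrates the principle:

```java
class BucketIndexDemo {

    // Simplified bucket selection: hash code modulo the number of buckets.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int bucketIndex(Object key, int bucketCount) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    public static void main(String[] args) {
        // Different hash codes (97 and 113) can still share a bucket.
        System.out.println(bucketIndex("a", 16)); // 1
        System.out.println(bucketIndex("q", 16)); // 1
        System.out.println(bucketIndex("b", 16)); // 2

        // Unequal objects can even have identical hash codes,
        // which is why equals is still needed inside the bucket.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println("Aa".equals("BB"));                  // false
    }
}
```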

" I further tried to Exlpore why java is checking Hashcode when comparing two objects for equality". Put method is not just checking for equality, it is trying to first narrow down the bucket and then use the equals. That is why we need to combine HashCode with Equals in case of bucketed collections.
But if your sole intention is to just check equality between two objects, you will never need a hashcode method.
Obj1.equals(Obj2) will never use the hashcode method by default.

It's a general kind of contract, so that when we store objects inside a hashing-based data structure, we can consistently put the same object into the hash table and get it back out.
It's a contract that is meant to be followed so that the put and get operations work smoothly.
