I understand that we should have same hashcodes incase equals are same for two java objects, but just wanted to understand if hashcodes are not same but equals returns true, what would be the consequences with respect to collections like HashMap, HashSet etc.
Would it only impact the performance or will it impact the behavior/functionality of those collection classes.
Let's call the objects o1 and o2 where o1.equals(o2) but o1.hashCode() != o2.hashCode()
Consider the following:
Map map = new HashMap();
Set set = new HashSet();
map.put(o1, "foo");
set.add(o1);
The following assertions would fail
Assert.assertTrue(map.containsKey(o2));
Assert.assertTrue(set.contains(o2));
The consequences will be unexpected behavior.
If a.equals(b) == true but a.hashCode()!=b.hashCode(), set.add(a) followed by set.contains(b) will most likely return false (assuming set is a HashSet), even though according to equals it should return true. (the reason it's most likely and not a certainty is that two different hash codes still have a chance of being mapped to the same bucket of the HashSet/HashMap, in which case you can still get true).
It would break the functionality. If you are looking for an object in a hashmap or hashset, it is using the hash code in order to find it. If the hash code is not consistent, it probably will not be able to find it.
The most basic requirement of a hash code is that two equal objects must have the same hash code. Everything else is secondary.
If two objects are equal, their hashcode will always return same value.
Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
Please read this
It will miss the bucket during fetch. HashMap is designed to store huge data with fetch time as O(1) in best possible scenario. To do this it stores key/values marked against hashcode(which we generally refer as bucket). This is called hashing technology.
So you stored it in hashmap with hashcode number, say 100 and now you are trying to fetch the object with different hashcode (say 200)(ie looking inside different bucket). So even though your object is inside hashmap, it will not be able to retrieve it because it will try to find in different bucket (i.e 200).
That is the reason we should have same hashcodes incase equals are same for two java objects
HashMap/HashSet is meant to help you find what the target object in the collections. If two equal objects have different hashcodes, you are unlikely to find the object in the hash bucket.
Related
I read in a book that hashCode() shows a memory area which helps (e.g. HashSets) to locate appropriate objects in memory. But how can that be true if we cannot manipulate memory in Java directly? There are no pointers, in addition to it objects are created and moved from one place to another and the developer doesn't know about it.
I read that realization like hashCode() {return 42;} is awful and terrible, but what's the difference if we can't instruct VM where to put our objects?
The question is: what is the purpose of hashCode() on deep level if we can't manipulate memory?
I read in a book that hashCode() shows a memory area which helps (e.g. HashSets) to locate appropriate objects in memory.
No, that's a completely bogus description of the purpose of hashCode. It's used to find potentially equal objects in an efficient manner. It's got nothing to do with the location of the object in memory.
The idea is that if you've got something like a HashMap, you want to find a matching key quickly when you do a lookup. So you first check the requested key's hash code, and then you can really efficiently find all the keys in your map with that hash code. You can then check each of those (and only those) candidate keys for equality against the requested key.
See the Wikipedia article on hash tables for more information.
I like Jon Skeet's answer (+1) but it requires knowing how hash tables work. A hash table is a data structure, basically an array of buckets, that uses the hashcode of the key to decide which bucket to stick that entry in. That way future calls to retrieve whatever's at that key don't have to sift through the whole list of things stored in the hashtable, the hashtable can calculate the hashcode for the key, then go straight to the matching bucket and look there. The hashcode has to be something that can be calculated quickly, and you'd rather it was unique but if it isn't it's not a disaster, except in the worst case (your return 42;), which is bad because everything ends up in the same bucket and you're back to sifting through everything.
The default value for Object#hashCode may be based on something like a memory location just because it's a convenient sort-of-random number, but as the object is shunted around during memory management that value is cached and nobody cares anyway. The hashcodes created by different objects, like String or BigDecimal, certainly have nothing to do with memory. It's just a number that is quickly generated and that you hope is unique more often than not.
A hash code is a just a "value". It has nothing more to do with "where you put it in memory" than "MyClass obj = new MyClass()" has to do with where "obj" is placed in memory.
So what is a Java hashCode() all about?
Here is a good discussion on the subject:
http://www.coderanch.com/t/269570/java-programmer-SCJP/certification/discuss-hashcode-contract
K&B says that the hashcode() contract are :
If two objects are equal according to the equals(Object) method, then calling the hashCode() method on each of the two objects must
produce the same integer result.
If two objects are unequal according to the equals(Object) method, there's no requirement about hashcode().
If calling hashcode() on two objects produce different integer result, then both of them must be unequal according to the
equals(Object).
A hashcode is a function that takes an object and outputs a numeric value. The hashcode for an object is always the same if the object doesn't change.
Functions like hashmaps that need to store objects, will use a hashcode modulo the size of their internal array to choose in what "memory position" (i.e. array position) to store the object.
There are some cases where collisions may occur (two objects end up with the same hashcode, and that, of course, need to be solved carefully). For the details, I would suggest a read of the wikipedia hashmap entry
HashCode is a encryption of an object and with that encryption java knows if the two of the objects for example in collections are the same or different . (SortedSet for an example)
I would recommend you read this article.
Yes, it has got nothing to do with the memory address, though it's typically implemented by converting the internal address of the object. The following statement found in the Object's hashCode() method makes it clear that the implementation is not forced to do it.
As much as is reasonably practical, the hashCode method defined by
class Object does return distinct integers for distinct objects. (This
is typically implemented by converting the internal address of the
object into an integer, but this implementation technique is not
required by the JavaTM programming language.
The hashCode() function takes an object and outputs a numeric value, which doesn't have to be unique. The hashcode for an object is always the same if the object doesn't change.
The value returned by hashCode() is the object's hash code, which is the object's memory address in hexadecimal.
By definition, if two objects are equal, their hash code must also be equal. If you override the equals() method, you change the way two objects are equated and Object's implementation of hashCode() is no longer valid. Therefore, if you override the equals() method, you must also override the hashCode() method as well.
For more, check out this Java hashcode article.
Why did java designers impose a mandate that
if obj1.equals(obj2) then
obj1.hashCode() MUST Be == obj2.hashCode()
Because a HashMap uses the following algorithm to find keys quickly:
get the hashCode() of the key in argument
deduce the bucket from this hash code
compare every key in the bucket with the key in argument (using equals()) to find the right one
If two equal objects didn't have the same hash code, the first two steps of the algorithm wouldn't work. And it's those two first steps that make a HashMap very fast (O(1)).
There is no mandate. It is a good practice since this is a required condition if your objects are meant to be used in hash based data structures like HashMap/HashSet etc.
As far as I know that's not baked into the language - you could technically have objects whose equals() method does not check the hashcode but you'll get pretty peculiar results.
In particular if you put a bunch of these objects into a HashMap or HashSet the map/set will use the hashCode() method to determine whether the objects may be duplicates - so you can have a situation where a collection will store 2 objects you've defined as equals (which should never happen) because they're each returning different hashCodes.
Because hashcodes are used to quickly determine if two objects are not equal.
Its sort of two steps matching to improve performance.
First Step: calculate hashcode()
Second Step: calculate equals()
Its because if you put your objects as keys in collections like hashmap, your keys will be compared first on hashcode() method if it finds matching hashcode it then goes on further to calculate equals().
Its like indexing for better search performance
I'd like to be able to determine whether I've encountered an object before - I have a graph implementation and I want to see if I've created a cycle, probably by iterating through the Node objects with a tortoise/hare floyd algorithm.
But I want to avoid a linear search through my list of "seen" nodes each time. This would be great if I had a hash table for just keys. Can I somehow hash an object? Aren't java objects just references to places in memory anyway? I wonder how much of a problem collisions would be if so..
The simple answer is to create a HashSet and add each node to the set the first time you encounter it.
The only case that this won't work is if you've overloaded hashCode() and equals(Object) for the node class to implement equality based on node contents (or whatever). Then you'll need to:
use the IdentityHashMap class which uses == and System.identityHashcode rather than equals(Object) and hashCode(), or
build a hashtable yourself using your own flavour of object identity.
Aren't java objects just references to places in memory anyway?
Yes and no. Yes, the reference is represented by a memory address (on most JVMs). The problem is that 1) you can't get hold of the address, and 2) it can change when the GC relocates the object. This means that you can't use the object address as a hashcode.
The identityHashCode method deals this by returning a value that is initially based on the memory address. If you then call identityHashCode again for the same object, you are guaranteed to get the same value as before ... even if the object has been relocated.
I wonder how much of a problem collisions would be if so..
The hash values produced by the identityHashCode method can collide. (That is, two distinct objects can have the same identity hashcode value.) Anything that uses these values has to deal with this. (The standard HashSet and IdentityHashMap classes take care of these collisions ... if you chose to use them.)
I'd like to be able to determine whether I've encountered an object
before
Use an IdentityHashMap. It is the ideal for your job since it is not an equals but a == implementation.
Take a look at HashSet. Note that in order for objects to work with HashSet, they need to provide correct implementations of hashCode and equals methods of the java.lang.Object class.
You'll need to implement a hash function for your objects. This is done by overriding hashCode() defined in java.lang.Object. This method is used by HashMap, HashSet etc to store objects. In hashCode() it's up to you to calculate a hash for the object. Don't forget to also implement the equals()-method!
Take a look at Java collection framework (http://docs.oracle.com/javase/tutorial/collections/)
Equal Objects must have equal hashcodes. As per my understanding this statement is valid when we have intention of using object in hashbased datastuctures. This is one of contract for hashcode and equals method in java docs. I explored the reason why this is said and looked in the implementation of hashtable and found out below code in put method
if ((e.hash == hash) && e.key.equals(key))
So I got it, contract came from condition e.hash == hash above. I further tried to explore why java is checking hashcode when comparing two objects for equality. So here is my understaing
If two equal object have equal hascodes then they can be stored in the same bucket and this will be good in terms of look up in single bucket only
Its better to check hashcode then actually calling equals method because hascode method is less costly than equals method, because here we just have to compare int value where in equals method may be invloving object field comparison. So hashcode method providing one extra filter.
Please correct me if both above reasons are valid?
Correct, just a small correction - if two unequal objects have the same hashcode.
Not exactly, It's better to check it first, as a filter for the non-equal, but if you want to make sure the objects are equal, you should call equals()
You got it wrong. equals just returns a boolean value (two possible values), and needs another object to compare against. hashCode returns an int (2^32 possible values), and only needs the object to be called.
The HashMap tries to distribute all the objects it holds among buckets. When put is called on the map, it has to decide which bucket it will use for the given object. It thus uses hashCode (modulo the number of buckets) to decide which bucket to use. Then, once the bucket is found, it has to check whether the key is already in the map or not. To do this, it compares every object in the bucket with the object to put in the map. And to do this, it uses equals. If the object isn't found, it adds it in the bucket.
hashCode isn't used because it's faster than equals. It's used because it allows distributing keys among a set of buckets. And it's much faster to compute the hashCode once and compare the object with (hopefully) 0, one or two objects in the same bucket that to compare the object with the thousands of objects already stored in the map.
" I further tried to Exlpore why java is checking Hashcode when comparing two objects for equality". Put method is not just checking for equality, it is trying to first narrow down the bucket and then use the equals. That is why we need to combine HashCode with Equals in case of bucketed collections.
But if your sole intention is to just check equality between two objects, you will never need a hashcode method.
Obj1.equals(Obj2) will never use the hashcode method by default.
Its a general type of contract so that when we store the objects inside a hashing based data structure, then we should always consistently put or get the same object to and from the hashtable.
Its a contract which we have created to be followed such that the entry/put processes occur smoothly.
why java Object class has two methods hashcode() and equals()? One of them looks redundant and its percolated to the bottom most derived class?
Why do you think one is redundant? They say different things:
hashCode is "give me some way of efficiently seeing whether two objects are likely to be equal"
equals is "check whether this object is genuinely equal to another"
You definitely need both - although I don't believe they should really be in Object in the first place.
You absolutely need hash codes in order to perform efficient lookups with hash tables - and you absolutely need further equality checks because hashes will collide (there are far more possible strings than hash codes, for example).
First of all, when you override equals() you MUST override hashcode() as well.
Failure to do so
will result in a violation of the general contract for Object.hashCode, which will
prevent your class from functioning properly in conjunction with all hash-based
collections, including HashMap, HashSet, and Hashtable.
Here is the contract, copied from the Object specification [JavaSE6]:
Whenever it is invoked on the same object more than once during an execu-
tion of an application, the hashCode method must consistently return the
same integer, provided no information used in equals comparisons on the
object is modified. This integer need not remain consistent from one execu-
tion of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then call-
ing the hashCode method on each of the two objects must produce the same
integer result.
It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects
must produce distinct integer results. However, the programmer should be
aware that producing distinct integer results for unequal objects may improve
the performance of hash tables.
The fundamental idea is that by comparing hashcode()s it's quick to check whether two objects are probably equal. If their hashcodes are equal, then the objects probably are equal (not necessarily, but it's a good guess). Then a more profound (and more expensive) check with equals() is performed. This is important to speed up all kind of look-ups (from maps etc).
equals is to compare objects, hashcode is used to generate a hash value from an object, which will then be used by the java map containers (Hashtable, Map etc).
it's common practice to override them together (if you override hashcode, you need to override equals and vice versa).