Related
This question is asked by interviewer that most of answers related to hash code is used for bucketing where it checks equals to search objects.
Is there any other general use case or scenario, where hash code is beneficial and can be used in a routine program?
As recently I have used JPA where it throws exception "Composite-id class does not override hashCode()" but again it is used by implementation class of hibernate. Sly, what other places or scenario we can use hashcode other then collections especially scenario where you have used it yourself.
class a {
public int hashCode() {
}
}
class b {
public static void main(String[] str) {
//In what ways can i use hashcode here?
}
}
Let's say your class will not be used in any collection ever( which is very unlikely though), it will be used in more than one place and by other developers. Any developer using your class will expect that if two instances of that class are equal based on equals method, they should produce same hashCode value. This fundamental assumption will be broken if hashCode is not overridden to be consistent with equals and that will prevent their code to function properly.
From Effective Java , 3rd Edition :
ITEM 11: ALWAYS OVERRIDE HASHCODE WHEN YOU OVERRIDE EQUALS
You must override hashCode in every class that overrides equals. If
you fail to do so, your class will violate the general contract for
hashCode, which will prevent it from functioning properly in
collections such as HashMap and HashSet. Here is the contract, adapted
from the Object specification :
• When the hashCode method is invoked on an object repeatedly during
an execution of an application, it must consistently return the same
value, provided no information used in equals comparisons is modified.
This value need not remain consistent from one execution of an
application to another.
• If two objects are equal according to the equals(Object) method,
then calling hashCode on the two objects must produce the same integer
result.
• If two objects are unequal according to the equals(Object) method,
it is not required that calling hashCode on each of the objects must
produce distinct results. However, the programmer should be aware that
producing distinct results for unequal objects may improve the
performance of hash tables.
The key provision that is violated when you fail to override hashCode
is the second one: equal objects must have equal hash codes. Two
distinct instances may be logically equal according to a class’s
equals method, but to Object’s hashCode method, they’re just two
objects with nothing much in common. Therefore, Object’s hashCode
method returns two seemingly random numbers instead of two equal
numbers as required by the contract.
A small semantic mistake in the interview question. Hash code is not used to check equality, it's used to detect inequality. If hash codes are different the objects are guaranteed to be inequal. If the codes are equal the objects may be equal and need to be checked with the equals-method.
That said, if the hash code is cached, it could be used to speed up the equals method.
Suppose you only override equals but not hashCode
This means that hashCode is inherited from Object
Object.hashCode always tries to return different hash codes for different objects (regardless if they are equal or not)
This means that you may end up with different hash codes for two objects that you consider to be equal.
This in turn causes these two equal objects to end up in different buckets in hash based collections such as HashSet.
This causes such collections to break.
More reference: https://programming.guide/java/overriding-hashcode-and-equals.html
This question already has answers here:
How is hashCode() calculated in Java
(11 answers)
Closed 9 years ago.
I have learned that hashcode is a Unique Identification reference number, which is a hexadecimal number.
My doubt is, does the reference number represents the memory address of the object?
For example:
Employeee e1=new Employee();
System.out.println(e1.hashcode());
Will this code return me the memory address of my object?
Hashcode is not a unique identification. It's just a number that helps you distinguish objects. Two different objects may have the same hash code and it is fine.
HashCode characteristics:
If obj1 and obj2 are equal, they must have the same hash code.
If obj1 and obj2 have the same hash code, they do not have to be equal.
If Employee class hasn't overridden the hashCode() method , then it will use the one defined in its super class, probably Object class . hashCode() in Object class says ,
As much as is reasonably practical, the hashCode method defined by class Object
does return distinct integers for distinct objects. (This is typically
implemented by converting the internal address of the object into an integer,
but this implementation technique is not required by the JavaTM
programming language.)
So, in short it may or may not depending upon the implementation.Suppose, if Employee class has overridden the hashCode() as(though bad practice and useless) :
public int hashCode() {
return 10;
}
Then, you can see that it doesn't return a memory address here.
Not necessarily the memory address. It should be kept different for different objects. But it might be anything. You can also override default hashCode definition with your own.
Hashcode is a number used by JVM for hashing in order to store and retrieve the objects. For example, when we add an object in hashmap, JVM looks for the hashcode implentation to decide where to put the object in memory. When we retrieve an object again hashcode is used to get the location of the object. Note that hashcode is not the actual memory address but its a link for JVM to fetch the object from a specified location with a complexity of O(1).
Simple Answer NO
A hashcode is an integer value that represents the state of the object upon which it was called. That is why an Integer that is set to 1 will return a hashcode of "1" because an Integer's hashcode and its value are the same thing. A character's hashcode is equal to it's ASCII character code. If you write a custom type you are responsible for creating a good hashCode implementation that will best represent the state of the current instance.
So for your class you can implement the hashcode Method and return whatever you want.
hashCode is the native implementation which provides the memory address to a certain extent.
Any ways you can ovveride it.
If you look close at API
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
I do not know how much you understand from others answers but my doubts were clear after
I read
The 3 things you should know about hashCode() by #Ralf Sternberg
According to Java documentation:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java programming language.)
The hashcode for objects is implementation-specific, but I would highly doubt any JVM implementation would use the memory address. Since garbage collection is a central feature of Java, that means the object could be moved and thereby have different memory addresses during its lifetime, even if its contents remain unchanged (and this would violate the hashcode spec).
Hashcode is the 32 bit signed integer that is uniquely assigned to that object. It is the result of the hash function when inserting into the hashmap
I read in a book that hashCode() shows a memory area which helps (e.g. HashSets) to locate appropriate objects in memory. But how can that be true if we cannot manipulate memory in Java directly? There are no pointers, in addition to it objects are created and moved from one place to another and the developer doesn't know about it.
I read that realization like hashCode() {return 42;} is awful and terrible, but what's the difference if we can't instruct VM where to put our objects?
The question is: what is the purpose of hashCode() on deep level if we can't manipulate memory?
I read in a book that hashCode() shows a memory area which helps (e.g. HashSets) to locate appropriate objects in memory.
No, that's a completely bogus description of the purpose of hashCode. It's used to find potentially equal objects in an efficient manner. It's got nothing to do with the location of the object in memory.
The idea is that if you've got something like a HashMap, you want to find a matching key quickly when you do a lookup. So you first check the requested key's hash code, and then you can really efficiently find all the keys in your map with that hash code. You can then check each of those (and only those) candidate keys for equality against the requested key.
See the Wikipedia article on hash tables for more information.
I like Jon Skeet's answer (+1) but it requires knowing how hash tables work. A hash table is a data structure, basically an array of buckets, that uses the hashcode of the key to decide which bucket to stick that entry in. That way future calls to retrieve whatever's at that key don't have to sift through the whole list of things stored in the hashtable, the hashtable can calculate the hashcode for the key, then go straight to the matching bucket and look there. The hashcode has to be something that can be calculated quickly, and you'd rather it was unique but if it isn't it's not a disaster, except in the worst case (your return 42;), which is bad because everything ends up in the same bucket and you're back to sifting through everything.
The default value for Object#hashCode may be based on something like a memory location just because it's a convenient sort-of-random number, but as the object is shunted around during memory management that value is cached and nobody cares anyway. The hashcodes created by different objects, like String or BigDecimal, certainly have nothing to do with memory. It's just a number that is quickly generated and that you hope is unique more often than not.
A hash code is a just a "value". It has nothing more to do with "where you put it in memory" than "MyClass obj = new MyClass()" has to do with where "obj" is placed in memory.
So what is a Java hashCode() all about?
Here is a good discussion on the subject:
http://www.coderanch.com/t/269570/java-programmer-SCJP/certification/discuss-hashcode-contract
K&B says that the hashcode() contract are :
If two objects are equal according to the equals(Object) method, then calling the hashCode() method on each of the two objects must
produce the same integer result.
If two objects are unequal according to the equals(Object) method, there's no requirement about hashcode().
If calling hashcode() on two objects produce different integer result, then both of them must be unequal according to the
equals(Object).
A hashcode is a function that takes an object and outputs a numeric value. The hashcode for an object is always the same if the object doesn't change.
Functions like hashmaps that need to store objects, will use a hashcode modulo the size of their internal array to choose in what "memory position" (i.e. array position) to store the object.
There are some cases where collisions may occur (two objects end up with the same hashcode, and that, of course, need to be solved carefully). For the details, I would suggest a read of the wikipedia hashmap entry
HashCode is a encryption of an object and with that encryption java knows if the two of the objects for example in collections are the same or different . (SortedSet for an example)
I would recommend you read this article.
Yes, it has got nothing to do with the memory address, though it's typically implemented by converting the internal address of the object. The following statement found in the Object's hashCode() method makes it clear that the implementation is not forced to do it.
As much as is reasonably practical, the hashCode method defined by
class Object does return distinct integers for distinct objects. (This
is typically implemented by converting the internal address of the
object into an integer, but this implementation technique is not
required by the JavaTM programming language.
The hashCode() function takes an object and outputs a numeric value, which doesn't have to be unique. The hashcode for an object is always the same if the object doesn't change.
The value returned by hashCode() is the object's hash code, which is the object's memory address in hexadecimal.
By definition, if two objects are equal, their hash code must also be equal. If you override the equals() method, you change the way two objects are equated and Object's implementation of hashCode() is no longer valid. Therefore, if you override the equals() method, you must also override the hashCode() method as well.
For more, check out this Java hashcode article.
I am curious how java generates hash values by using hashCode() method of the Object API ?
The hashCode() of Object is actually a native method and the implementation is actually not pure Java. Now, regarding the how it works, this answer from Tom Hawtin does a great job at explaining it:
Many people will claim that Object.hashCode will return the address of the object representation in memory. In modern implementations objects actually move within memory. Instead an area of the object header is used to store the value, which may be lazily derived from the memory address at the time that the value is first requested.
The whole answer is actually worth the read.
Java doesn't generate hashCode(), i.e. nothing automatic is happening here. However, Object generates a HashCode based on the memory address of the instance of the object. Most classes (especially if you are going to use it in any of the Collection API) should implement their own HashCode (and by contract their own equals method).
According to Java Platform API documentation, the calculation of hashcode is based on 32-bit internal JVM address of the Object.
It is true that the object moves during execution (AFAIK the only reason is garbage collector). But hashcode does not change.
So when you have an object like this
Person person1 = new Person();
person1.setName("Alex");
Person person2 = new Person();
person2.setName("Alex");
Person person3 = person2;
In this case person1.hashCode will not be equal to person2.hashCode because the memory addresses of these two objects are not the same.
But person2.hashCode will be equal to person3 because they are pointing to the same object.
So if you need to use hashCode method for your objects you must implement it yourself.
By the way String.hashCode implementation is different. It is something like this: (C# syntax)
public int hashCode(String str)
{
int h = 0;
for (int i = 0; i < str.Length; i++)
h = (h * 31) + str[i];
return h;
}
edit: No overflow check is done here, so the hashCode may be positive or negative.
The Object.hashCode() uses the System.identityHashCode() which is based on an id number for a given object.
The HashCode () function has several options for creating a hash code. It sets the JVM startup parameter. Function, which create hashCode() written on C++, and you can see code here
HashCode==0: Simply returns random numbers with no relation to where
in memory the object is found. As far as I can make out, the global
read-write of the seed is not optimal for systems with lots of
processors.
HashCode==1: Counts up the hash code values, not sure at what value
they start, but it seems quite high.
HashCode==2: Always returns the exact same identity hash code of 1.
This can be used to test code that relies on object identity. The
reason why JavaChampionTest returned Kirk's URL in the example above
is that all objects were returning the same hash code.
HashCode==3: Counts up the hash code values, starting from zero. It
does not look to be thread safe, so multiple threads could generate
objects with the same hash code.
HashCode==4: This seems to have some relation to the memory location
at which the object was created.
HashCode>=5: This is the default algorithm for Java 8 and has a
per-thread seed. It uses Marsaglia's xor-shift scheme to produce
pseudo-random numbers.
The information was taken from here
Java does not generate meaningful hashCode for you, it is your job as a programmer to generate a useful hashCode. The default hashCode is just the memory location.
The hashCode method outputs a numeric value. The hashcode for an object is always the same if the object doesn't change. It's important to mention, that it's not have to be unique.
Default implementation of hashCode() is given in such a way that it returns the Hash Code number for the object based on the address of the object. JVM automatically generates this unique number based on the address of the object.
If two objects are at same memory location (the same object referred by two reference variable) then hashcode number for both will be same but if two objects reside at different memory location then hashcode number for both object will be different.
You probably asked the question because you need to override the method. But, when to override it?
Default implementation of hashCode() method returns hashcode number (unique identity of object) based on the address of the object. But if my application requires to uniquely identify the objects based on some different parameter (rather than address of the object), then I should override hashCode() method and should give implementation as per the requirement.
An important note from Effective Java about the topic:
For more, check out this Java hashcode example.
If the hashCode() method is not overridden, what will be the result of invoking hashCode() on any object in Java?
In HotSpot JVM by default on the first invocation of non-overloaded Object.hashCode or System.identityHashCode a random number is generated and stored in the object header. The consequent calls to Object.hashCode or System.identityHashCode just extract this value from the header. By default it has nothing in common with object content or object location, just random number. This behavior is controlled by -XX:hashCode=n HotSpot JVM option which has the following possible values:
0: use global random generator. This is default setting in Java 7. It has the disadvantage that concurrent calls from multiple threads may cause a race condition which will result in generating the same hashCode for different objects. Also in highly-concurrent environment delays are possible due to contention (using the same memory region from different CPU cores).
5: use some thread-local xor-shift random generator which is free from the previous disadvantages. This is default setting in Java 8.
1: use object pointer mixed with some random value which is changed on the "stop-the-world" events, so between stop-the-world events (like garbage collection) generated hashCodes are stable (for testing/debugging purposes)
2: use always 1 (for testing/debugging purposes)
3: use autoincrementing numbers (for testing/debugging purposes, also global counter is used, thus contention and race conditions are possible)
4: use object pointer trimmed to 32 bit if necessary (for testing/debugging purposes)
Note that even if you set -XX:hashCode=4, the hashCode will not always point to the object address. Object may be moved later, but hashCode will stay the same. Also object addresses are poorly distributed (if your application uses not so much memory, most objects will be located close to each other), so you may end up having unbalanced hash tables if you use this option.
Typically, hashCode() just returns the object's address in memory if you don't override it.
From 1:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
1 http://java.sun.com/javase/6/docs/api/java/lang/Object.html#hashCode
The implementation of hashCode() may differ from class to class but the contract for hashCode() is very specific and stated clearly and explicitly in the Javadocs:
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
hashCode() is closely tied to equals() and if you override equals(), you should also override hashCode().
The default hashCode() implementation is nothing to do with object's memory address.
In openJDK, in version 6 and 7 it is a randomly generated number. In 8 and 9, it is a number based on the thread state.
Refer this link: hashCode != address
So the result of identity hash generation(the value returned by default implementation of hashCode() method) is generated once and cached in the object's header.
If you want to learn more about this you can go through OpenJDK which defines entry points for hashCode() at
src/share/vm/prims/jvm.h
and
src/share/vm/prims/jvm.cpp
If you go through this above directory, it seems hundred lines of functions that seems to be far more complicated to understand. So, To simplify this, the naively way to represent the default hashcode implementation is something like below,
if (obj.hash() == 0) {
obj.set_hash(generate_new_hash());
}
return obj.hash();
If hashcode is not overriden you will call Object's hashcode, here is an excerpt from its javadoc:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
the default hashcode implementation gives the internal address of the object in the jvm, as a 32 bits integer. Thus, two different (in memory) objects will have different hashcodes.
This is consistent with the default implementation of equals. If you want to override equals for your objects, you will have to adapt hashCode so that they are consistent.
See http://www.ibm.com/developerworks/java/library/j-jtp05273.html for a good overview.
You must override hashCode in every class that overrides equals. Failure to do so will result in a violation of the general contract for Object.hashCode, which will prevent your class from functioning properly in conjunction with all hash-based collections, including HashMap, HashSet, and Hashtable.
A hashcode is useful for storing an object in a collection, such as a hashset. By allowing an Object to define a Hashcode as something unique it allows the algorithm of the HashSet to work effectively.
Object itself uses the Object's address in memory, which is very unique, but may not be very useful if two different objects (for example two identical strings) should be considered the same, even if they are duplicated in memory.
You should try to implement the hash code so that different objects will give different results. I don't think there is a standard way of doing this.
Read this article for some information.
Two objects with different hash code must not be equal with regard to equals()
a.hashCode() != b.hashCode() must imply !a.equals(b)
However, two objects that are not equal with regard to equals() can have the same hash code. Storing these objects in a set or map will become less efficient if many objects have the same hash code.
Not really an answer but adding to my earlier comment
internal address of the object cannot be guaranteed to remain unchanged in the JVM, whose garbage collector might move it around during heap compaction.
I tried to do something like this:
public static void main(String[] args) {
final Object object = new Object();
while (true) {
int hash = object.hashCode();
int x = 0;
Runtime r = Runtime.getRuntime();
List<Object> list = new LinkedList<Object>();
while (r.freeMemory() / (double) r.totalMemory() > 0.3) {
Object p = new Object();
list.add(p);
x += object.hashCode();//ensure optimizer or JIT won't remove this
}
System.out.println(x);
list.clear();
r.gc();
if (object.hashCode() != hash) {
System.out.println("Voila!");
break;
}
}
}
But the hashcode indeed doesn't change... can someone tell me how Sun's JDK actually implements Obect.hashcode?
returns 6 digit hex number. This is usually the memory location of the slot where the object is addressed. From an algorithmic per-se, I guess JDK does double hashing (native implementation) which is one of the best hashing functions for open addressing. This double hashing scheme highly reduces the possibility of collisions.
The following post will give a supportive idea -
Java - HashMap confusion about collision handling and the get() method