I am curious: how does Java generate hash values in the hashCode() method of the Object API?
The hashCode() of Object is a native method, so its implementation is not pure Java. As for how it works, this answer from Tom Hawtin does a great job of explaining it:
Many people will claim that Object.hashCode will return the address of the object representation in memory. In modern implementations objects actually move within memory. Instead an area of the object header is used to store the value, which may be lazily derived from the memory address at the time that the value is first requested.
The whole answer is actually worth the read.
Java doesn't generate hashCode(), i.e. nothing automatic is happening here. However, Object generates a hash code based on the memory address of the instance of the object. Most classes (especially if you are going to use them in any of the Collection API) should implement their own hashCode (and, by contract, their own equals method).
According to the Java Platform API documentation, the hash code is typically calculated from the internal JVM address of the object, although the specification does not require this.
It is true that the object moves during execution (AFAIK the only reason is garbage collector). But hashcode does not change.
So when you have an object like this
Person person1 = new Person();
person1.setName("Alex");
Person person2 = new Person();
person2.setName("Alex");
Person person3 = person2;
In this case person1.hashCode will not be equal to person2.hashCode because the memory addresses of these two objects are not the same.
But person2.hashCode will be equal to person3 because they are pointing to the same object.
So if you need to use hashCode method for your objects you must implement it yourself.
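A small sketch of the behaviour described above, assuming a minimal Person class (hypothetical, with no hashCode() override). It only relies on what the platform guarantees: two references to the same object share a hash code, and the default hashCode() agrees with System.identityHashCode(). Whether person1 and person2 differ is very likely but not guaranteed, so the code prints that comparison without stating a result.

```java
class IdentityHashDemo {
    // True when two references report the same hash code.
    static boolean sameHash(Object a, Object b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        Person person1 = new Person();
        person1.setName("Alex");
        Person person2 = new Person();
        person2.setName("Alex");
        Person person3 = person2; // second reference to the same object

        // Guaranteed: same object, same hash code.
        System.out.println(sameHash(person2, person3));
        // Distinct objects: usually different, but a collision is allowed.
        System.out.println(sameHash(person1, person2));
        // Without an override, hashCode() is the identity hash code.
        System.out.println(person1.hashCode() == System.identityHashCode(person1));
    }
}

// Hypothetical Person class that does NOT override hashCode() or equals().
class Person {
    private String name;

    void setName(String name) { this.name = name; }

    String getName() { return name; }
}
```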
By the way, the String.hashCode implementation is different. It is something like this:
public static int hashCode(String str)
{
    int h = 0;
    for (int i = 0; i < str.length(); i++)
        h = (h * 31) + str.charAt(i);
    return h;
}
edit: No overflow check is done here, so the hashCode may be positive or negative.
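To check the formula against the real thing, here is a minimal sketch that recomputes the polynomial and compares it with String.hashCode(). The class and method names are made up for the example; the formula itself is the one documented for java.lang.String.

```java
class StringHashDemo {
    // h = 31 * h + c for every char, with ordinary int wrap-around.
    static int polyHash(String str) {
        int h = 0;
        for (int i = 0; i < str.length(); i++) {
            h = 31 * h + str.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String shortText = "abc";
        String longText = "a considerably longer string that overflows int";
        System.out.println(polyHash(shortText));                          // 96354
        System.out.println(polyHash(shortText) == shortText.hashCode());  // true
        System.out.println(polyHash(longText) == longText.hashCode());    // true
        // The sign of this value depends on how the arithmetic wraps around.
        System.out.println(polyHash(longText));
    }
}
```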
The Object.hashCode() uses the System.identityHashCode() which is based on an id number for a given object.
The hashCode() function in HotSpot has several options for creating a hash code, selected with a JVM startup parameter. The function that creates the hash code is written in C++, and you can see the code here
HashCode==0: Simply returns random numbers with no relation to where
in memory the object is found. As far as I can make out, the global
read-write of the seed is not optimal for systems with lots of
processors.
HashCode==1: Counts up the hash code values, not sure at what value
they start, but it seems quite high.
HashCode==2: Always returns the exact same identity hash code of 1.
This can be used to test code that relies on object identity. The
reason why JavaChampionTest returned Kirk's URL in the example above
is that all objects were returning the same hash code.
HashCode==3: Counts up the hash code values, starting from zero. It
does not look to be thread safe, so multiple threads could generate
objects with the same hash code.
HashCode==4: This seems to have some relation to the memory location
at which the object was created.
HashCode>=5: This is the default algorithm for Java 8 and has a
per-thread seed. It uses Marsaglia's xor-shift scheme to produce
pseudo-random numbers.
The information was taken from here
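The last option mentions Marsaglia's xor-shift scheme with a per-thread seed. A rough Java sketch of that idea follows. It is not HotSpot's code (which is C++); the class name, the seed, and the starting constants are assumptions chosen for illustration.

```java
class XorShiftHashSketch {
    // Generator state. The non-zero starting constants are illustrative assumptions.
    private int x;
    private int y = 842502087;
    private int z = 0x8767;
    private int w = 273326509;

    XorShiftHashSketch(int seed) {
        this.x = seed;
    }

    // One step of a 128-bit xorshift generator (shifts 11, 19 and 8).
    int next() {
        int t = x;
        t ^= t << 11;
        x = y;
        y = z;
        z = w;
        int v = w;
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8));
        w = v;
        return v;
    }

    // One generator per thread, so threads never share or contend on state.
    private static final ThreadLocal<XorShiftHashSketch> PER_THREAD =
        ThreadLocal.withInitial(() ->
            new XorShiftHashSketch(System.identityHashCode(Thread.currentThread())));

    static int nextForCurrentThread() {
        return PER_THREAD.get().next();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(Integer.toHexString(nextForCurrentThread()));
        }
    }
}
```

Because the state lives in a ThreadLocal, the contention described for option 0 does not arise in this sketch.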
Java does not generate meaningful hashCode for you, it is your job as a programmer to generate a useful hashCode. The default hashCode is just the memory location.
The hashCode method outputs a numeric value. The hashcode for an object is always the same if the object doesn't change. It's important to mention that it does not have to be unique.
Default implementation of hashCode() is given in such a way that it returns the Hash Code number for the object based on the address of the object. JVM automatically generates this unique number based on the address of the object.
If two objects are at same memory location (the same object referred by two reference variable) then hashcode number for both will be same but if two objects reside at different memory location then hashcode number for both object will be different.
You probably asked the question because you need to override the method. But, when to override it?
Default implementation of hashCode() method returns hashcode number (unique identity of object) based on the address of the object. But if my application requires to uniquely identify the objects based on some different parameter (rather than address of the object), then I should override hashCode() method and should give implementation as per the requirement.
An important note from Effective Java about the topic:
For more, check out this Java hashcode example.
Related
Is there any way to store HashMap in a specific memory location or within specific memory area (lets say memory address from 0 to 100).
As per my understanding it is possible by modifying the hashCode() function. Can anyone share the code for the same ?
hashCode()'s purpose is simply to be a unique id for distinct Java objects in the VM where the code is running. It does not affect the object's position inside the virtual memory used by the VM. It works the other way around: at least in Object's implementation of the method, hashCode() gets you an integer describing the object's address in this virtual memory.
If you are to change it, you should follow these guidelines. I think this explanation from the Java docs can make things clear:
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an
execution of a Java application, the hashCode method must consistently
return the same integer, provided no information used in equals
comparisons on the object is modified. This integer need not remain
consistent from one execution of an application to another execution
of the same application.
If two objects are equal according to the
equals(Object) method, then calling the hashCode method on each of the
two objects must produce the same integer result.
It is not required that if two objects are unequal according to the
equals(java.lang.Object) method, then calling the hashCode method on
each of the two objects must produce distinct integer results.
However, the programmer should be aware that producing distinct
integer results for unequal objects may improve the performance of
hash tables.
the hashCode method defined by class Object does return distinct integers for distinct objects.
Here comes the implementation part:
(This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
EDIT:
In the case of HashMap the hashCode() method is inherited from AbstractMap and works as described in the Java documentation:
Returns the hash code value for this map. The hash code of a map is
defined to be the sum of the hash codes of each entry in the map's
entrySet() view. This ensures that m1.equals(m2) implies that
m1.hashCode()==m2.hashCode() for any two maps m1 and m2, as required
by the general contract of Object.hashCode().
As you can see it follows the contract specified above
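The quoted rule can be checked directly. This is a minimal sketch (class and method names are made up) that sums the entry hash codes by hand and compares the result with what the map reports, using two maps filled in different orders.

```java
import java.util.HashMap;
import java.util.Map;

class MapHashDemo {
    // Sum of the hash codes of all entries, as the AbstractMap documentation defines it.
    static int sumOfEntryHashes(Map<?, ?> map) {
        int sum = 0;
        for (Map.Entry<?, ?> entry : map.entrySet()) {
            sum += entry.hashCode();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> m1 = new HashMap<>();
        m1.put("a", 1);
        m1.put("b", 2);

        Map<String, Integer> m2 = new HashMap<>();
        m2.put("b", 2); // same entries, different insertion order
        m2.put("a", 1);

        System.out.println(m1.equals(m2));                       // true
        System.out.println(m1.hashCode() == m2.hashCode());      // true
        System.out.println(sumOfEntryHashes(m1) == m1.hashCode()); // true
    }
}
```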
Hope this helps
If two objects of the same class have the same hashCode in Java then how would they be stored in a HashMap / HashTable? What is the actual architecture for hashcode and memory address. Where does hashCode reside in memory?
Example: There is a class A. When I create objects a1 and a2, each has its own memory address, but I overrode hashCode() to return the same value every time. I read in an article that hashCode functions generate the hash code from the memory address. Does this mean the memory addresses will be the same if the hash codes are the same? Please clear my doubt.
public class A {
    @Override
    public int hashCode() {
        return 1;
    }

    public static void main(String args[]) {
        A a1 = new A();
        A a2 = new A();
        System.out.println(a1.hashCode());
        System.out.println(a2.hashCode());
    }
}
No two objects (that exist at the same time) can have the same memory address.
They can have the same hash code, though hashCode implementations try to avoid that. And the default implementation of hashCode doesn't have to be based on the object's memory address (though it can be).
So if two objects have the same hash code, you can't assume that they have the same memory address. In fact, if two variables refer to different objects (i.e. comparing them with == returns false), they definitely do not have the same address.
The article you read about hash codes being based on memory addresses was referring to the default implementation of the hashCode method in the Object class. If you override hashCode in a subclass, you're not using that default implementation anymore. Your return 1 has nothing to do with memory addresses.
The default Object version of hashCode() is based on a memory address. When you override the hashCode() method and return a different value, it does not change the object's memory address. Nor does returning a constant 1 break HashMap, but it does severely affect performance.
System.out.println(a1==a2);
The Result is false.
Note that since objects commonly define their own implementation of hashcode and equals, based on their contents/value rather than object identity, hashcode is NOT reliably related to the object's address.
The identity hashcode -- which is also the default hashcode implementation provided by java.lang.Object -- may or may not be related to the object address, depending on how this JRE's garbage collector manages memory.
Hashcode and memory addresses are two different things. Hashcode is used to identify the bucket position in the memory to store the key. But two non equal objects with the same hashcode will reside in the same bucket but at a different memory address.
how would they be stored in a HashMap / HashTable?
Hashcode does not reside in the memory anywhere.
Any hashed collection uses hashed buckets architecture to decide where to store the object. This helps in the quick retrieval of objects. This is the saving mechanism:
Objects with different hashcodes that are not equal (equals() returns false for the two objects): will usually be saved in different hashed buckets
Objects with the same hashcode that are not equal: will be saved in the same hashed bucket, but in a linked list
Objects with the same hashcode that are equal: will overwrite each other when saved
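A minimal sketch of these cases with java.util.HashMap. The Key class is hypothetical: every instance returns the same hash code, and equality is decided by its id field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class CollisionDemo {
    // Hypothetical key type: constant hash code, equality by id.
    static final class Key {
        final String id;

        Key(String id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 1;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && Objects.equals(id, ((Key) o).id);
        }
    }

    public static void main(String[] args) {
        Map<Key, String> map = new HashMap<>();
        map.put(new Key("a"), "first");
        map.put(new Key("b"), "second");   // same hash, not equal: stored alongside
        map.put(new Key("a"), "replaced"); // same hash, equal: value is overwritten

        System.out.println(map.size());            // 2
        System.out.println(map.get(new Key("a"))); // replaced
        System.out.println(map.get(new Key("b"))); // second
    }
}
```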
What is the actual architecture for hashcode and memory address.
Where does hashCode reside in memory?
It is always calculated when you try to put/retrieve an element in a hashed collection. And hashcode method provides the logic.
I think root of the question is to understand the relation between hash value and memory location.
A hash map/table uses an array for storing keys and values.
The value obtained from the hash(key) function is used to determine the index in the array.
If you go one step deeper, to the native side, the memory address will be (memory address of the first element of the array + index). At this memory location, the address of the actual object will be stored.
As others have already answered, if two objects have the same hash value then those objects will be in the same bucket, meaning at the same index of the array. To handle such collisions, each element of the array can be a linked list, so objects with the same hash value are added to that list.
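The array-plus-linked-list layout can be sketched in a few lines. This is a toy table, not how java.util.HashMap is actually written; the class name, the fixed capacity, and the assumption of non-null keys are all simplifications made for the example.

```java
import java.util.LinkedList;

class TinyHashTable<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // The array of buckets; each bucket is a linked list of entries.
    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    TinyHashTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // The hash value only selects an array index, never a memory address.
    int indexFor(Object key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(K key, V value) {
        LinkedList<Entry<K, V>> bucket = buckets[indexFor(key)];
        for (Entry<K, V> entry : bucket) {
            if (entry.key.equals(key)) {
                entry.value = value; // equal key: overwrite
                return;
            }
        }
        bucket.add(new Entry<>(key, value)); // new key: append to the chain
    }

    V get(Object key) {
        for (Entry<K, V> entry : buckets[indexFor(key)]) {
            if (entry.key.equals(key)) {
                return entry.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TinyHashTable<String, Integer> table = new TinyHashTable<>(8);
        table.put("Aa", 1);
        table.put("BB", 2); // "Aa" and "BB" have the same hash code
        System.out.println(table.indexFor("Aa") == table.indexFor("BB")); // true
        System.out.println(table.get("Aa")); // 1
        System.out.println(table.get("BB")); // 2
    }
}
```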
I'd like to add some supplementary notes to the most-voted answer, by @Wyzard
The article you read about hash codes being based on memory addresses was referring to the default implementation of the hashCode method in the Object class.
In fact, there is not just one default implementation of hashCode; there are about six strategies (according to OpenJDK). You can see this in the method get_next_hash in synchronizer.cpp, which is central to calculating the hash code. For your convenience, I've posted a screenshot of the method below: [screenshot of get_next_hash]
What's more, different JDK versions use different strategies to calculate the hash code. For example, JDK 1.8 produces the result from thread state combined with xorshift, as seen here: [screenshot]
Now turn to the algorithm that generates a hash code based on the object's memory address. If it simply cast the memory address to a hash code, what would happen if GC occurred while the program was running? The address of the object may change, and then the hash code would change too. Hence there must be extra work to make sure the hash code remains consistent when this kind of algorithm is used to implement hashCode().
If two objects have the same hashcode and are equal according to equals(), the second will replace the first if both are added as keys to a Hashtable/HashMap.
I have learned that hashcode is a Unique Identification reference number, which is a hexadecimal number.
My doubt is: does the reference number represent the memory address of the object?
For example:
Employee e1 = new Employee();
System.out.println(e1.hashCode());
Will this code return me the memory address of my object?
Hashcode is not a unique identification. It's just a number that helps you distinguish objects. Two different objects may have the same hash code and it is fine.
HashCode characteristics:
If obj1 and obj2 are equal, they must have the same hash code.
If obj1 and obj2 have the same hash code, they do not have to be equal.
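Both characteristics can be seen with plain strings. The sketch below uses "Aa" and "BB", which have the same String hash code (2112) without being equal; the class and method names are made up for the example.

```java
class HashCharacteristicsDemo {
    // True when the two strings share a hash code but are not equal.
    static boolean collide(String a, String b) {
        return a.hashCode() == b.hashCode() && !a.equals(b);
    }

    public static void main(String[] args) {
        String first = new String("Alex");
        String second = new String("Alex");
        // Equal objects must have equal hash codes.
        System.out.println(first.equals(second) && first.hashCode() == second.hashCode()); // true
        // Equal hash codes do not imply equal objects.
        System.out.println(collide("Aa", "BB")); // true
    }
}
```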
If the Employee class hasn't overridden the hashCode() method, then it will use the one defined in its superclass, probably the Object class. The documentation of hashCode() in Object says:
As much as is reasonably practical, the hashCode method defined by class Object
does return distinct integers for distinct objects. (This is typically
implemented by converting the internal address of the object into an integer,
but this implementation technique is not required by the JavaTM
programming language.)
So, in short, it may or may not, depending on the implementation. Suppose the Employee class has overridden hashCode() as follows (though this is bad practice and useless):
public int hashCode() {
return 10;
}
Then, you can see that it doesn't return a memory address here.
Not necessarily the memory address. It should be kept different for different objects. But it might be anything. You can also override default hashCode definition with your own.
Hashcode is a number used for hashing in order to store and retrieve objects. For example, when we add an object to a HashMap, the map uses the key's hashCode implementation to decide which bucket to put the entry in. When we retrieve the object again, the hashcode is used to find that bucket. Note that hashcode is not the actual memory address; it's a way for the map to fetch the object from a specified location with an expected complexity of O(1).
Simple Answer NO
A hashcode is an integer value that represents the state of the object upon which it was called. That is why an Integer that is set to 1 will return a hashcode of "1", because an Integer's hashcode and its value are the same thing. A Character's hashcode is equal to its character code (the ASCII code for ASCII characters). If you write a custom type you are responsible for creating a good hashCode implementation that will best represent the state of the current instance.
So for your class you can implement the hashcode Method and return whatever you want.
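The two wrapper examples are easy to confirm. A tiny sketch, with a made-up helper name:

```java
class WrapperHashDemo {
    // Returns whatever hashCode() the (boxed) argument defines.
    static int hashOf(Object value) {
        return value.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(hashOf(1));   // 1  (Integer: the value itself)
        System.out.println(hashOf('A')); // 65 (Character: the char code)
    }
}
```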
hashCode is the native implementation which provides the memory address to a certain extent.
Anyway, you can override it.
If you look closely at the API:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
I do not know how much you understood from the other answers, but my doubts were cleared after I read The 3 things you should know about hashCode() by @Ralf Sternberg.
According to Java documentation:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java programming language.)
The hashcode for objects is implementation-specific, but I would highly doubt any JVM implementation would use the memory address. Since garbage collection is a central feature of Java, that means the object could be moved and thereby have different memory addresses during its lifetime, even if its contents remain unchanged (and this would violate the hashcode spec).
Hashcode is the 32 bit signed integer that is uniquely assigned to that object. It is the result of the hash function when inserting into the hashmap
I read in a book that hashCode() shows a memory area which helps (e.g. HashSets) to locate appropriate objects in memory. But how can that be true if we cannot manipulate memory in Java directly? There are no pointers, in addition to it objects are created and moved from one place to another and the developer doesn't know about it.
I read that an implementation like hashCode() {return 42;} is awful and terrible, but what's the difference if we can't instruct the VM where to put our objects?
The question is: what is the purpose of hashCode() on deep level if we can't manipulate memory?
I read in a book that hashCode() shows a memory area which helps (e.g. HashSets) to locate appropriate objects in memory.
No, that's a completely bogus description of the purpose of hashCode. It's used to find potentially equal objects in an efficient manner. It's got nothing to do with the location of the object in memory.
The idea is that if you've got something like a HashMap, you want to find a matching key quickly when you do a lookup. So you first check the requested key's hash code, and then you can really efficiently find all the keys in your map with that hash code. You can then check each of those (and only those) candidate keys for equality against the requested key.
See the Wikipedia article on hash tables for more information.
I like Jon Skeet's answer (+1) but it requires knowing how hash tables work. A hash table is a data structure, basically an array of buckets, that uses the hashcode of the key to decide which bucket to stick that entry in. That way future calls to retrieve whatever's at that key don't have to sift through the whole list of things stored in the hashtable, the hashtable can calculate the hashcode for the key, then go straight to the matching bucket and look there. The hashcode has to be something that can be calculated quickly, and you'd rather it was unique but if it isn't it's not a disaster, except in the worst case (your return 42;), which is bad because everything ends up in the same bucket and you're back to sifting through everything.
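To make the "everything ends up in the same bucket" point concrete, here is a small sketch that counts how many buckets get used under a given hash function. It assumes a simple table where the bucket is the hash modulo the number of buckets; the names and the sizes (100 keys, 16 buckets) are arbitrary.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntUnaryOperator;

class BucketSpreadDemo {
    // Counts the distinct buckets hit by keys 0..keys-1 when hashFn supplies
    // each key's hash code and the bucket is hash modulo bucketCount.
    static int bucketsUsed(int keys, int bucketCount, IntUnaryOperator hashFn) {
        Set<Integer> used = new HashSet<>();
        for (int key = 0; key < keys; key++) {
            used.add(Math.floorMod(hashFn.applyAsInt(key), bucketCount));
        }
        return used.size();
    }

    public static void main(String[] args) {
        // Constant hash: every key lands in one bucket, lookups degrade to a scan.
        System.out.println(bucketsUsed(100, 16, key -> 42));  // 1
        // Distinct hashes: the keys spread over all buckets.
        System.out.println(bucketsUsed(100, 16, key -> key)); // 16
    }
}
```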
The default value for Object#hashCode may be based on something like a memory location just because it's a convenient sort-of-random number, but as the object is shunted around during memory management that value is cached and nobody cares anyway. The hashcodes created by different objects, like String or BigDecimal, certainly have nothing to do with memory. It's just a number that is quickly generated and that you hope is unique more often than not.
A hash code is a just a "value". It has nothing more to do with "where you put it in memory" than "MyClass obj = new MyClass()" has to do with where "obj" is placed in memory.
So what is a Java hashCode() all about?
Here is a good discussion on the subject:
http://www.coderanch.com/t/269570/java-programmer-SCJP/certification/discuss-hashcode-contract
K&B says that the hashcode() contract is:
If two objects are equal according to the equals(Object) method, then calling the hashCode() method on each of the two objects must
produce the same integer result.
If two objects are unequal according to the equals(Object) method, there's no requirement about hashcode().
If calling hashcode() on two objects produce different integer result, then both of them must be unequal according to the
equals(Object).
A hashcode is a function that takes an object and outputs a numeric value. The hashcode for an object is always the same if the object doesn't change.
Data structures like hash maps that need to store objects will use the hashcode modulo the size of their internal array to choose in what "memory position" (i.e. array position) to store the object.
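As a side note on the modulo step: to my knowledge, OpenJDK's HashMap (Java 8 and later) keeps its table length a power of two, mixes the upper 16 bits of the hash into the lower 16, and then masks with length - 1 instead of dividing. A sketch of that arithmetic, with hypothetical method names:

```java
class BucketIndexDemo {
    // Fold the high bits into the low bits so they influence small tables too.
    static int spread(int hashCode) {
        return hashCode ^ (hashCode >>> 16);
    }

    // For a power-of-two table length, masking equals a non-negative modulo.
    static int bucketIndex(int hashCode, int tableLength) {
        return spread(hashCode) & (tableLength - 1);
    }

    // Plain modulo variant that also works for other table lengths.
    static int moduloIndex(int hashCode, int tableLength) {
        return Math.floorMod(hashCode, tableLength);
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex(0x12345678, 16)); // 12
        System.out.println(moduloIndex(-7, 16));         // 9
    }
}
```

Math.floorMod is used rather than % because hash codes can be negative and an array index cannot.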
There are some cases where collisions may occur (two objects end up with the same hashcode, and that, of course, need to be solved carefully). For the details, I would suggest a read of the wikipedia hashmap entry
A hash code is a numeric digest of an object (not an encryption). With it, Java's hash-based collections can quickly tell whether two objects may be the same or are certainly different (HashSet, for example).
I would recommend you read this article.
Yes, it has got nothing to do with the memory address, though it's typically implemented by converting the internal address of the object. The following statement found in the Object's hashCode() method makes it clear that the implementation is not forced to do it.
As much as is reasonably practical, the hashCode method defined by
class Object does return distinct integers for distinct objects. (This
is typically implemented by converting the internal address of the
object into an integer, but this implementation technique is not
required by the JavaTM programming language.)
The hashCode() function takes an object and outputs a numeric value, which doesn't have to be unique. The hashcode for an object is always the same if the object doesn't change.
The value returned by hashCode() is the object's hash code. For the default Object implementation it is often described as the object's memory address in hexadecimal, but the specification does not require that.
By definition, if two objects are equal, their hash code must also be equal. If you override the equals() method, you change the way two objects are equated and Object's implementation of hashCode() is no longer valid. Therefore, if you override the equals() method, you must also override the hashCode() method as well.
For more, check out this Java hashcode article.
If the hashCode() method is not overridden, what will be the result of invoking hashCode() on any object in Java?
In the HotSpot JVM, by default, on the first invocation of a non-overridden Object.hashCode or of System.identityHashCode a random number is generated and stored in the object header. Subsequent calls to Object.hashCode or System.identityHashCode just extract this value from the header. By default it has nothing in common with the object's content or location; it is just a random number. This behavior is controlled by the -XX:hashCode=n HotSpot JVM option, which has the following possible values:
0: use global random generator. This is default setting in Java 7. It has the disadvantage that concurrent calls from multiple threads may cause a race condition which will result in generating the same hashCode for different objects. Also in highly-concurrent environment delays are possible due to contention (using the same memory region from different CPU cores).
5: use some thread-local xor-shift random generator which is free from the previous disadvantages. This is default setting in Java 8.
1: use object pointer mixed with some random value which is changed on the "stop-the-world" events, so between stop-the-world events (like garbage collection) generated hashCodes are stable (for testing/debugging purposes)
2: use always 1 (for testing/debugging purposes)
3: use autoincrementing numbers (for testing/debugging purposes, also global counter is used, thus contention and race conditions are possible)
4: use object pointer trimmed to 32 bit if necessary (for testing/debugging purposes)
Note that even if you set -XX:hashCode=4, the hashCode will not always point to the object address. Object may be moved later, but hashCode will stay the same. Also object addresses are poorly distributed (if your application uses not so much memory, most objects will be located close to each other), so you may end up having unbalanced hash tables if you use this option.
Typically, hashCode() just returns the object's address in memory if you don't override it.
From [1]:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
[1] http://java.sun.com/javase/6/docs/api/java/lang/Object.html#hashCode
The implementation of hashCode() may differ from class to class but the contract for hashCode() is very specific and stated clearly and explicitly in the Javadocs:
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
hashCode() is closely tied to equals() and if you override equals(), you should also override hashCode().
The default hashCode() implementation has nothing to do with the object's memory address.
In OpenJDK versions 6 and 7 it is a randomly generated number. In 8 and 9, it is a number based on the thread state.
Refer to this link: hashCode != address
So the result of identity hash generation(the value returned by default implementation of hashCode() method) is generated once and cached in the object's header.
If you want to learn more about this you can go through OpenJDK which defines entry points for hashCode() at
src/share/vm/prims/jvm.h
and
src/share/vm/prims/jvm.cpp
If you browse these files, you will find hundreds of lines of functions that are rather complicated to understand. To simplify, a naive way to represent the default hashcode implementation is something like this:
if (obj.hash() == 0) {
obj.set_hash(generate_new_hash());
}
return obj.hash();
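The same "generate once, then cache" pattern can be imitated in plain Java. This is only an illustration of the pseudocode, not JVM internals: the class, the field standing in for the object header, and the injected generator are all assumptions.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.IntSupplier;

class LazyHashSketch {
    private int hash;                 // 0 means "not generated yet", as in the pseudocode
    private final IntSupplier generator;
    int generatorCalls;               // exposed only so the caching can be observed

    LazyHashSketch(IntSupplier generator) {
        this.generator = generator;
    }

    int identityHash() {
        if (hash == 0) {
            hash = generator.getAsInt(); // generated on first request only
            generatorCalls++;
        }
        return hash;                     // later calls read the cached value
    }

    public static void main(String[] args) {
        // Non-zero random values, so 0 can keep its "unset" meaning.
        LazyHashSketch obj = new LazyHashSketch(
            () -> ThreadLocalRandom.current().nextInt(1, Integer.MAX_VALUE));
        System.out.println(obj.identityHash() == obj.identityHash()); // true
        System.out.println(obj.generatorCalls);                       // 1
    }
}
```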
If hashCode is not overridden you will call Object's hashCode; here is an excerpt from its javadoc:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
The default hashCode implementation typically derives from the internal address of the object in the JVM, as a 32-bit integer. Thus, two different (in memory) objects will usually have different hash codes.
This is consistent with the default implementation of equals. If you want to override equals for your objects, you will have to adapt hashCode so that they are consistent.
See http://www.ibm.com/developerworks/java/library/j-jtp05273.html for a good overview.
You must override hashCode in every class that overrides equals. Failure to do so will result in a violation of the general contract for Object.hashCode, which will prevent your class from functioning properly in conjunction with all hash-based collections, including HashMap, HashSet, and Hashtable.
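A minimal sketch of a class that overrides both methods consistently, using a hypothetical Point type. With both in place, a HashSet finds a second instance that is equal to the stored one; with equals() alone, that lookup would usually fail because the two instances would report different default hash codes.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class EqualsHashCodeDemo {
    // Hypothetical value type: equality and hash code use the same fields.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Point)) {
                return false;
            }
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        System.out.println(points.contains(new Point(1, 2))); // true
        System.out.println(points.contains(new Point(2, 1))); // false
    }
}
```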
A hashcode is useful for storing an object in a collection, such as a hashset. By allowing an Object to define a Hashcode as something unique it allows the algorithm of the HashSet to work effectively.
Object itself uses the Object's address in memory, which is very unique, but may not be very useful if two different objects (for example two identical strings) should be considered the same, even if they are duplicated in memory.
You should try to implement the hash code so that different objects will give different results. I don't think there is a standard way of doing this.
Read this article for some information.
Two objects with different hash code must not be equal with regard to equals()
a.hashCode() != b.hashCode() must imply !a.equals(b)
However, two objects that are not equal with regard to equals() can have the same hash code. Storing these objects in a set or map will become less efficient if many objects have the same hash code.
Not really an answer but adding to my earlier comment
internal address of the object cannot be guaranteed to remain unchanged in the JVM, whose garbage collector might move it around during heap compaction.
I tried to do something like this:
import java.util.LinkedList;
import java.util.List;

public class HashCodeExperiment {
    public static void main(String[] args) {
        final Object object = new Object();
        // Repeats until the hash code changes, so it may never terminate.
        while (true) {
            int hash = object.hashCode();
            int x = 0;
            Runtime r = Runtime.getRuntime();
            List<Object> list = new LinkedList<Object>();
            // Allocate until less than 30% of the current heap is free.
            while (r.freeMemory() / (double) r.totalMemory() > 0.3) {
                Object p = new Object();
                list.add(p);
                x += object.hashCode(); // ensure optimizer or JIT won't remove this
            }
            System.out.println(x);
            list.clear();
            r.gc();
            if (object.hashCode() != hash) {
                System.out.println("Voila!");
                break;
            }
        }
    }
}
But the hash code indeed doesn't change... can someone tell me how Sun's JDK actually implements Object.hashCode?
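A bounded variant of the same experiment, as a sketch: it creates some garbage, requests a collection a fixed number of times, and reports whether the hash code stayed the same. The class name, round count, and allocation sizes are arbitrary choices, and System.gc() is only a request that the JVM may ignore.

```java
class HashStabilityDemo {
    // Returns true if the hash code of one object is unchanged after each of
    // the given number of allocation bursts and GC requests.
    static boolean hashStaysStable(int rounds) {
        Object object = new Object();
        int hash = object.hashCode();
        for (int round = 0; round < rounds; round++) {
            byte[][] garbage = new byte[1024][];
            for (int i = 0; i < garbage.length; i++) {
                garbage[i] = new byte[1024]; // about 1 MB of short-lived data per round
            }
            System.gc();
            if (object.hashCode() != hash) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hashStaysStable(5)); // true, as the hashCode contract requires
    }
}
```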
It returns a 6-digit hex number. This is usually the memory location of the slot where the object is addressed. From an algorithmic perspective, I guess the JDK does double hashing (native implementation), which is one of the best hashing schemes for open addressing. Double hashing greatly reduces the possibility of collisions.
The following post will give a supportive idea -
Java - HashMap confusion about collision handling and the get() method