What is an object's hash code if hashCode() is not overridden? - java

If the hashCode() method is not overridden, what will be the result of invoking hashCode() on any object in Java?

In the HotSpot JVM, by default, the first invocation of a non-overridden Object.hashCode or of System.identityHashCode generates a random number and stores it in the object header. Subsequent calls to Object.hashCode or System.identityHashCode just extract this value from the header. By default it has nothing in common with the object's content or location; it is just a random number. This behavior is controlled by the -XX:hashCode=n HotSpot JVM option, which has the following possible values:
0: use global random generator. This is default setting in Java 7. It has the disadvantage that concurrent calls from multiple threads may cause a race condition which will result in generating the same hashCode for different objects. Also in highly-concurrent environment delays are possible due to contention (using the same memory region from different CPU cores).
5: use some thread-local xor-shift random generator which is free from the previous disadvantages. This is default setting in Java 8.
1: use object pointer mixed with some random value which is changed on the "stop-the-world" events, so between stop-the-world events (like garbage collection) generated hashCodes are stable (for testing/debugging purposes)
2: use always 1 (for testing/debugging purposes)
3: use autoincrementing numbers (for testing/debugging purposes, also global counter is used, thus contention and race conditions are possible)
4: use object pointer trimmed to 32 bit if necessary (for testing/debugging purposes)
Note that even if you set -XX:hashCode=4, the hashCode will not always point to the object address. Object may be moved later, but hashCode will stay the same. Also object addresses are poorly distributed (if your application uses not so much memory, most objects will be located close to each other), so you may end up having unbalanced hash tables if you use this option.
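A minimal sketch of what can be observed from Java code, whichever generator the JVM uses. The class name IdentityHashDemo is made up for illustration; the sketch relies only on the documented behaviour of Object.hashCode, System.identityHashCode and Object.toString.

```java
// Illustrative only: shows observable properties of the default hash code.
class IdentityHashDemo {
    public static void main(String[] args) {
        Object o = new Object();

        int first = o.hashCode();   // on HotSpot, generated lazily on this first call
        int second = o.hashCode();  // the cached value is returned again

        // Stable for the lifetime of the object, even if the GC moves it.
        System.out.println(first == second);

        // Object does not override hashCode(), so both calls agree.
        System.out.println(first == System.identityHashCode(o));

        // Object.toString() appends the hash code in hexadecimal.
        System.out.println(o.toString().equals("java.lang.Object@" + Integer.toHexString(first)));
    }
}
```

All three lines print true. The actual numeric value differs from run to run, so the sketch deliberately does not print it.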

A common description is that hashCode() just returns the object's address in memory if you don't override it, but the documentation only says this is typical, not required.
From 1:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
1 http://java.sun.com/javase/6/docs/api/java/lang/Object.html#hashCode

The implementation of hashCode() may differ from class to class but the contract for hashCode() is very specific and stated clearly and explicitly in the Javadocs:
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
hashCode() is closely tied to equals() and if you override equals(), you should also override hashCode().
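As a hedged illustration of that last point, here is one way to override both methods together. Point is a hypothetical class, and Objects.hash is just one convenient way to combine fields; any function of the same fields that equals() compares would satisfy the contract.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class used only for illustration.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Point)) return false;
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    // Uses exactly the fields that equals() compares.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        // A distinct but equal instance is found, because the hash codes match.
        System.out.println(points.contains(new Point(1, 2))); // true
    }
}
```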

The default hashCode() implementation has nothing to do with the object's memory address.
In OpenJDK versions 6 and 7 it is a randomly generated number. In 8 and 9, it is a number based on the thread state.
See this link: hashCode != address
So the result of identity hash generation (the value returned by the default implementation of hashCode()) is generated once and cached in the object's header.
If you want to learn more about this, you can go through the OpenJDK sources, which define the entry points for hashCode() at
src/share/vm/prims/jvm.h
and
src/share/vm/prims/jvm.cpp
If you go through these files, you will find a hundred or so lines of functions that are fairly complicated to follow. To simplify, a naive way to represent the default hashCode implementation is something like this:
if (obj.hash() == 0) {
obj.set_hash(generate_new_hash());
}
return obj.hash();
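The same generate-once-and-cache pattern can be modelled in plain Java. This is only an analogy: HotSpot keeps the value in the object header and does the work in C++, whereas the sketch below uses an ordinary field. The names LazyIdentityHash and identityHash are hypothetical, and the sketch is not thread-safe.

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative model only; not how the JVM is actually written.
class LazyIdentityHash {
    private int hash; // 0 means "not generated yet"

    int identityHash() {
        if (hash == 0) {
            // Draw a non-zero value so that 0 can keep meaning "unset".
            hash = ThreadLocalRandom.current().nextInt(1, Integer.MAX_VALUE);
        }
        return hash;
    }

    public static void main(String[] args) {
        LazyIdentityHash obj = new LazyIdentityHash();
        int first = obj.identityHash();
        // The second call returns the cached value, not a new random number.
        System.out.println(first == obj.identityHash()); // true
    }
}
```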

If hashCode is not overridden you will call Object's hashCode; here is an excerpt from its javadoc:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)

The default hashcode implementation is typically described as giving the internal address of the object in the JVM, as a 32-bit integer. Thus, two different (in memory) objects will usually have different hashcodes.
This is consistent with the default implementation of equals. If you want to override equals for your objects, you will have to adapt hashCode so that they are consistent.
See http://www.ibm.com/developerworks/java/library/j-jtp05273.html for a good overview.

You must override hashCode in every class that overrides equals. Failure to do so will result in a violation of the general contract for Object.hashCode, which will prevent your class from functioning properly in conjunction with all hash-based collections, including HashMap, HashSet, and Hashtable.

A hashcode is useful for storing an object in a collection, such as a hashset. By allowing an Object to define a Hashcode as something unique it allows the algorithm of the HashSet to work effectively.
Object itself uses the Object's address in memory, which is very unique, but may not be very useful if two different objects (for example two identical strings) should be considered the same, even if they are duplicated in memory.

You should try to implement the hash code so that different objects will give different results. I don't think there is a standard way of doing this.
Read this article for some information.

Two objects with different hash code must not be equal with regard to equals()
a.hashCode() != b.hashCode() must imply !a.equals(b)
However, two objects that are not equal with regard to equals() can have the same hash code. Storing these objects in a set or map will become less efficient if many objects have the same hash code.
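A small example of the second case, using String because its hashCode() formula is fixed by the Javadoc. The class name is made up for illustration.

```java
// Two unequal strings that share a hash code.
class EqualHashUnequalObjects {
    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";
        System.out.println(a.hashCode()); // 2112  (65 * 31 + 97)
        System.out.println(b.hashCode()); // 2112  (66 * 31 + 66)
        System.out.println(a.equals(b));  // false
    }
}
```

A HashSet stores both strings correctly; they simply share a bucket and are told apart by equals().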

Not really an answer but adding to my earlier comment
internal address of the object cannot be guaranteed to remain unchanged in the JVM, whose garbage collector might move it around during heap compaction.
I tried to do something like this:
import java.util.LinkedList;
import java.util.List;

public static void main(String[] args) {
    final Object object = new Object();
    while (true) {
        int hash = object.hashCode();
        int x = 0;
        Runtime r = Runtime.getRuntime();
        List<Object> list = new LinkedList<Object>();
        // allocate until less than 30% of the heap is free, to provoke a GC
        while (r.freeMemory() / (double) r.totalMemory() > 0.3) {
            Object p = new Object();
            list.add(p);
            x += object.hashCode(); // ensure optimizer or JIT won't remove this
        }
        System.out.println(x);
        list.clear();
        r.gc();
        if (object.hashCode() != hash) {
            System.out.println("Voila!");
            break;
        }
    }
}
But the hashcode indeed doesn't change... can someone tell me how Sun's JDK actually implements Object.hashCode?

It returns a 6-digit hex number. This is usually the memory location of the slot where the object is addressed. From an algorithmic point of view, I guess the JDK does double hashing (native implementation), which is one of the best hashing functions for open addressing. This double hashing scheme greatly reduces the possibility of collisions.
The following post will give a supportive idea -
Java - HashMap confusion about collision handling and the get() method

Related

storing hashmap in specific memory in java

Is there any way to store HashMap in a specific memory location or within specific memory area (lets say memory address from 0 to 100).
As per my understanding it is possible by modifying the hashCode() function. Can anyone share the code for the same ?
hashCode()'s purpose is simply to be a unique id for distinct Java objects in the VM where the code is running. It does not affect the object's position inside the virtual memory used by the VM. It works the other way: at least in Object's implementation of the method, hashCode() gets you an integer describing the object's address in this virtual memory.
If you are to change it, you should follow these guidelines. I think this explanation from the Java docs can make things clear:
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an
execution of a Java application, the hashCode method must consistently
return the same integer, provided no information used in equals
comparisons on the object is modified. This integer need not remain
consistent from one execution of an application to another execution
of the same application.
If two objects are equal according to the
equals(Object) method, then calling the hashCode method on each of the
two objects must produce the same integer result.
It is not required that if two objects are unequal according to the
equals(java.lang.Object) method, then calling the hashCode method on
each of the two objects must produce distinct integer results.
However, the programmer should be aware that producing distinct
integer results for unequal objects may improve the performance of
hash tables.
the hashCode method defined by class Object does return distinct integers for distinct objects.
Here comes the implementation part:
(This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
EDIT:
In the case of HashMap the hashCode() method is inherited from AbstractMap and works as described in the Java documentation:
Returns the hash code value for this map. The hash code of a map is
defined to be the sum of the hash codes of each entry in the map's
entrySet() view. This ensures that m1.equals(m2) implies that
m1.hashCode()==m2.hashCode() for any two maps m1 and m2, as required
by the general contract of Object.hashCode().
As you can see it follows the contract specified above
Hope this helps
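To make the quoted AbstractMap rule concrete, here is a short sketch that recomputes a map's hash code by hand. MapHashDemo and sumOfEntryHashes are hypothetical names; the sketch only relies on the documented definitions of Map.hashCode() and Map.Entry.hashCode().

```java
import java.util.HashMap;
import java.util.Map;

class MapHashDemo {
    // Recomputes the documented value: the sum of the entries' hash codes.
    static int sumOfEntryHashes(Map<?, ?> map) {
        int sum = 0;
        for (Map.Entry<?, ?> entry : map.entrySet()) {
            sum += entry.hashCode();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> m1 = new HashMap<>();
        m1.put("a", 1);
        m1.put("b", 2);

        Map<String, Integer> m2 = new HashMap<>();
        m2.put("b", 2); // inserted in a different order
        m2.put("a", 1);

        System.out.println(m1.equals(m2));                         // true
        System.out.println(m1.hashCode() == m2.hashCode());        // true
        System.out.println(m1.hashCode() == sumOfEntryHashes(m1)); // true
    }
}
```

None of this depends on where the map lives in memory, which is the point of the answer above.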

hashCode() purpose in Java

I read in a book that hashCode() shows a memory area which helps (e.g. HashSets) to locate appropriate objects in memory. But how can that be true if we cannot manipulate memory in Java directly? There are no pointers, in addition to it objects are created and moved from one place to another and the developer doesn't know about it.
I read that realization like hashCode() {return 42;} is awful and terrible, but what's the difference if we can't instruct VM where to put our objects?
The question is: what is the purpose of hashCode() on deep level if we can't manipulate memory?
I read in a book that hashCode() shows a memory area which helps (e.g. HashSets) to locate appropriate objects in memory.
No, that's a completely bogus description of the purpose of hashCode. It's used to find potentially equal objects in an efficient manner. It's got nothing to do with the location of the object in memory.
The idea is that if you've got something like a HashMap, you want to find a matching key quickly when you do a lookup. So you first check the requested key's hash code, and then you can really efficiently find all the keys in your map with that hash code. You can then check each of those (and only those) candidate keys for equality against the requested key.
See the Wikipedia article on hash tables for more information.
I like Jon Skeet's answer (+1) but it requires knowing how hash tables work. A hash table is a data structure, basically an array of buckets, that uses the hashcode of the key to decide which bucket to stick that entry in. That way future calls to retrieve whatever's at that key don't have to sift through the whole list of things stored in the hashtable, the hashtable can calculate the hashcode for the key, then go straight to the matching bucket and look there. The hashcode has to be something that can be calculated quickly, and you'd rather it was unique but if it isn't it's not a disaster, except in the worst case (your return 42;), which is bad because everything ends up in the same bucket and you're back to sifting through everything.
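The bucket idea can be sketched with a toy model. This is not how java.util.HashMap is implemented (it spreads the hash bits and resizes); BucketDemo, distribute and ConstantHashKey are illustrative names. The sketch only counts how many keys would land in each bucket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of bucket selection in a hash table.
class BucketDemo {
    // Returns how many keys land in each of bucketCount buckets.
    static int[] distribute(List<?> keys, int bucketCount) {
        int[] sizes = new int[bucketCount];
        for (Object key : keys) {
            // floorMod keeps the index non-negative even for negative hash codes.
            sizes[Math.floorMod(key.hashCode(), bucketCount)]++;
        }
        return sizes;
    }

    // A key whose hashCode() is the constant from the question.
    static final class ConstantHashKey {
        @Override
        public int hashCode() {
            return 42;
        }
    }

    public static void main(String[] args) {
        List<Object> good = new ArrayList<>();
        List<Object> bad = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            good.add(Integer.valueOf(i)); // Integer.hashCode() is the int value itself
            bad.add(new ConstantHashKey());
        }
        System.out.println(Arrays.toString(distribute(good, 4))); // [2, 2, 2, 2]
        System.out.println(Arrays.toString(distribute(bad, 4)));  // [0, 0, 8, 0]
    }
}
```

With the constant hash every key ends up in bucket 2 (42 mod 4), so a lookup has to compare against all eight keys instead of two.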
The default value for Object#hashCode may be based on something like a memory location just because it's a convenient sort-of-random number, but as the object is shunted around during memory management that value is cached and nobody cares anyway. The hashcodes created by different objects, like String or BigDecimal, certainly have nothing to do with memory. It's just a number that is quickly generated and that you hope is unique more often than not.
A hash code is just a "value". It has nothing more to do with "where you put it in memory" than "MyClass obj = new MyClass()" has to do with where "obj" is placed in memory.
So what is a Java hashCode() all about?
Here is a good discussion on the subject:
http://www.coderanch.com/t/269570/java-programmer-SCJP/certification/discuss-hashcode-contract
K&B says that the hashCode() contract is:
If two objects are equal according to the equals(Object) method, then calling the hashCode() method on each of the two objects must
produce the same integer result.
If two objects are unequal according to the equals(Object) method, there's no requirement about hashcode().
If calling hashcode() on two objects produce different integer result, then both of them must be unequal according to the
equals(Object).
A hashcode is a function that takes an object and outputs a numeric value. The hashcode for an object is always the same if the object doesn't change.
Functions like hashmaps that need to store objects, will use a hashcode modulo the size of their internal array to choose in what "memory position" (i.e. array position) to store the object.
There are some cases where collisions may occur (two objects end up with the same hashcode, and that, of course, need to be solved carefully). For the details, I would suggest a read of the wikipedia hashmap entry
A hash code is a compact numeric summary of an object (not an encryption); with it, Java can quickly tell whether two objects in a hash-based collection (HashSet, for example) might be the same or are definitely different.
I would recommend you read this article.
Yes, it has got nothing to do with the memory address, though it's typically implemented by converting the internal address of the object. The following statement found in the Object's hashCode() method makes it clear that the implementation is not forced to do it.
As much as is reasonably practical, the hashCode method defined by
class Object does return distinct integers for distinct objects. (This
is typically implemented by converting the internal address of the
object into an integer, but this implementation technique is not
required by the JavaTM programming language.)
The hashCode() function takes an object and outputs a numeric value, which doesn't have to be unique. The hashcode for an object is always the same if the object doesn't change.
The value returned by hashCode() is the object's hash code. The default toString() prints it in hexadecimal, and it is often described as the object's memory address, although that is not guaranteed.
By definition, if two objects are equal, their hash code must also be equal. If you override the equals() method, you change the way two objects are equated and Object's implementation of hashCode() is no longer valid. Therefore, if you override the equals() method, you must also override the hashCode() method as well.
For more, check out this Java hashcode article.

compareTo involving non-comparable field: how to maintain transitivity?

Consider a class with a comparable (consistent with equals) and a non-comparable field (of a class about which I do not know whether it overrides Object#equals or not).
The class' instances shall be compared, where the resulting order shall be consistent with equals, i.e. 0 returned iff both fields are equal (as per Object#equals) and consistent with the order of the comparable field. I used System.identityHashCode to cover most of the cases not covered by these requirements (the order of instances with same comparable, but different other value is arbitrary), but am not sure whether this is the best approach.
public class MyClass implements Comparable<MyClass> {
    private Integer intField;
    private Object nonCompField;

    public int compareTo(MyClass other) {
        int intFieldComp = this.intField.compareTo(other.intField);
        if (intFieldComp != 0)
            return intFieldComp;
        if (this.nonCompField.equals(other.nonCompField))
            return 0;
        // ...and now? My current approach:
        if (System.identityHashCode(this.nonCompField) < System.identityHashCode(other.nonCompField))
            return -1;
        else
            return 1;
    }
}
Two problems I see here:
If System.identityHashCode is the same for two objects, each is greater than the other. (Can this happen at all?)
The order of instances with same intField value and different nonCompField values need not be consistent between runs of the program, as far as I understand what System.identityHashCode does.
Is that correct? Are there more problems? Most importantly, is there a way around this?
The first problem, although highly unlikely, could happen (I think you would need an enormous amount of memory and very bad luck). But it's solved by Guava's Ordering.arbitrary(), which uses the identity hash code behind the scenes, but maintains a cache of comparison results for the cases where two different objects have the same identity hash code.
Regarding your second question, no, the identity hash codes are not preserved between runs.
System.identityHashCode […] the same for two objects […] (Can this happen at all?)
Yes it can. Quoting from the Java API Documentation:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects.
identityHashCode(Object x) returns the same hash code for the given object as would be returned by the default method hashCode(), whether or not the given object's class overrides hashCode().
So you may encounter hash collisions, and with memory ever growing but hash codes staying fixed at 32 bit, they will become increasingly more likely.
The order of instances with same intField value and different nonCompField values need not be consistent between runs of the program, as far as I understand what System.identityHashCode does.
Right. It might even be different during a single invocation of the same program: You could have (1,foo) < (1,bar) < (1,baz) even though foo.equals(baz).
Most importantly, is there a way around this?
You can maintain a map which maps each distinct value of the non-comparable type to a sequence number which you increase for each distinct value you encounter.
Memory management will be tricky, though: You cannot use a WeakHashMap as the code might make your key object unreachable but still hold a reference to another object of the same value. So either you maintain a list of weak references to all the objects of a given value, or you simply use strong references and accept the fact that any uncomparable value ever encountered will never be garbage collected.
Note that this scheme will still not result in reproducible sequence numbers unless you create values reproducibly in just the same order.
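A minimal sketch of the sequence-number idea, under stated assumptions: FirstSeenOrder is a hypothetical name, the values must have equals() and hashCode() that agree with each other (true for the Object defaults, not for a class that overrides only equals()), the map holds strong references, and the class is not thread-safe.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Orders values by the order in which they were first seen by this comparator.
class FirstSeenOrder implements Comparator<Object> {
    private final Map<Object, Integer> sequence = new HashMap<>();

    // Assigns the next free number to a value not seen before.
    private int numberOf(Object value) {
        Integer n = sequence.get(value);
        if (n == null) {
            n = sequence.size();
            sequence.put(value, n);
        }
        return n;
    }

    @Override
    public int compare(Object a, Object b) {
        // Equal values share one number, so the result is 0 exactly when a.equals(b).
        return Integer.compare(numberOf(a), numberOf(b));
    }

    public static void main(String[] args) {
        FirstSeenOrder order = new FirstSeenOrder();
        System.out.println(order.compare("x", "y"));             // -1: "x" was seen first
        System.out.println(order.compare("y", "x"));             // 1
        System.out.println(order.compare(new String("x"), "x")); // 0: equal values
    }
}
```

In the compareTo from the question, the identity hash comparison would be replaced by a call to a shared comparator of this kind. The order is transitive because it is derived from plain integers.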
If the class of the nonCompField has implemented a reasonably good toString(), you might be able to use
return String.valueOf(this.nonCompField).compareTo(String.valueOf(other.nonCompField));
Unfortunately, the default Object.toString() uses the hashcode, which has potential issues as noted by others.

how does the hashCode() method of java works?

I am curious how java generates hash values by using hashCode() method of the Object API ?
The hashCode() of Object is actually a native method and the implementation is actually not pure Java. Now, regarding the how it works, this answer from Tom Hawtin does a great job at explaining it:
Many people will claim that Object.hashCode will return the address of the object representation in memory. In modern implementations objects actually move within memory. Instead an area of the object header is used to store the value, which may be lazily derived from the memory address at the time that the value is first requested.
The whole answer is actually worth the read.
Java doesn't generate hashCode(), i.e. nothing automatic is happening here. However, Object generates a HashCode based on the memory address of the instance of the object. Most classes (especially if you are going to use it in any of the Collection API) should implement their own HashCode (and by contract their own equals method).
According to Java Platform API documentation, the calculation of hashcode is based on 32-bit internal JVM address of the Object.
It is true that the object moves during execution (AFAIK the only reason is garbage collector). But hashcode does not change.
So when you have an object like this
Person person1 = new Person();
person1.setName("Alex");
Person person2 = new Person();
person2.setName("Alex");
Person person3 = person2;
In this case person1.hashCode will not be equal to person2.hashCode because the memory addresses of these two objects are not the same.
But person2.hashCode will be equal to person3 because they are pointing to the same object.
So if you need to use hashCode method for your objects you must implement it yourself.
By the way String.hashCode implementation is different. It is something like this: (C# syntax)
public int hashCode(String str)
{
int h = 0;
for (int i = 0; i < str.Length; i++)
h = (h * 31) + str[i];
return h;
}
edit: No overflow check is done here, so the hashCode may be positive or negative.
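For comparison, the same polynomial written in Java syntax and checked against the library. StringHashDemo and manualHash are illustrative names; the formula itself is the one documented for String.hashCode().

```java
class StringHashDemo {
    // Same polynomial as the snippet above, in Java syntax.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int arithmetic wraps around silently
        }
        return h;
    }

    public static void main(String[] args) {
        String word = "hello";
        // The hand-written loop agrees with the built-in method.
        System.out.println(manualHash(word) == word.hashCode()); // true
    }
}
```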
The Object.hashCode() uses the System.identityHashCode() which is based on an id number for a given object.
The hashCode() function has several options for creating a hash code; the choice is set by a JVM startup parameter. The function that creates the hash code is written in C++, and you can see the code here
HashCode==0: Simply returns random numbers with no relation to where
in memory the object is found. As far as I can make out, the global
read-write of the seed is not optimal for systems with lots of
processors.
HashCode==1: Counts up the hash code values, not sure at what value
they start, but it seems quite high.
HashCode==2: Always returns the exact same identity hash code of 1.
This can be used to test code that relies on object identity. The
reason why JavaChampionTest returned Kirk's URL in the example above
is that all objects were returning the same hash code.
HashCode==3: Counts up the hash code values, starting from zero. It
does not look to be thread safe, so multiple threads could generate
objects with the same hash code.
HashCode==4: This seems to have some relation to the memory location
at which the object was created.
HashCode>=5: This is the default algorithm for Java 8 and has a
per-thread seed. It uses Marsaglia's xor-shift scheme to produce
pseudo-random numbers.
The information was taken from here
Java does not generate meaningful hashCode for you, it is your job as a programmer to generate a useful hashCode. The default hashCode is just the memory location.
The hashCode method outputs a numeric value. The hashcode for an object is always the same if the object doesn't change. It's important to mention that it doesn't have to be unique.
Default implementation of hashCode() is given in such a way that it returns the Hash Code number for the object based on the address of the object. JVM automatically generates this unique number based on the address of the object.
If two objects are at same memory location (the same object referred by two reference variable) then hashcode number for both will be same but if two objects reside at different memory location then hashcode number for both object will be different.
You probably asked the question because you need to override the method. But, when to override it?
Default implementation of hashCode() method returns hashcode number (unique identity of object) based on the address of the object. But if my application requires to uniquely identify the objects based on some different parameter (rather than address of the object), then I should override hashCode() method and should give implementation as per the requirement.
An important note from Effective Java about the topic:
For more, check out this Java hashcode example.

hashCode uniqueness

Is it possible for two instances of Object to have the same hashCode()?
In theory an object's hashCode is derived from its memory address, so all hashCodes should be unique, but what if objects are moved around during GC?
I think the docs for object's hashCode method state the answer.
"As much as is reasonably practical,
the hashCode method defined by class
Object does return distinct integers
for distinct objects. (This is
typically implemented by converting
the internal address of the object
into an integer, but this
implementation technique is not
required by the JavaTM programming
language.)"
Given a reasonable collection of objects, having two with the same hash code is quite likely. In the best case it becomes the birthday problem, with a clash with tens of thousands of objects. In practice objects are created with a relatively small pool of likely hash codes, and clashes can easily happen with merely thousands of objects.
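The best-case birthday estimate can be worked out with the usual approximation p ≈ 1 - exp(-n(n-1)/(2N)). The sketch below assumes hash codes are drawn uniformly from all N = 2^32 int values, which is an idealisation; real JVMs may use fewer bits. BirthdayEstimate and collisionProbability are made-up names.

```java
class BirthdayEstimate {
    // Approximate probability that at least two of n objects share a hash code,
    // assuming codes are drawn uniformly from `space` possible values.
    static double collisionProbability(long n, double space) {
        return 1.0 - Math.exp(-((double) n * (n - 1)) / (2.0 * space));
    }

    public static void main(String[] args) {
        double space = Math.pow(2, 32); // all 32-bit values; an idealisation
        for (long n : new long[] {1_000, 10_000, 77_163, 200_000}) {
            System.out.printf("%,d objects: %.3f%n", n, collisionProbability(n, space));
        }
    }
}
```

Under this assumption the probability reaches about one half at roughly 77,000 objects, which matches the "tens of thousands" figure.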
Using memory address is just a way of obtaining a slightly random number. The Sun JDK source has a switch to enable use of a Secure Random Number Generator or a constant. I believe IBM (used to?) use a fast random number generator, but it was not at all secure. The mention in the docs of memory address appears to be of a historical nature (around a decade ago it was not unusual to have object handles with fixed locations).
Here's some code I wrote a few years ago to demonstrate clashes:
class HashClash {
    public static void main(String[] args) {
        final Object obj = new Object();
        final int target = obj.hashCode();
        Object clash;
        long ct = 0;
        do {
            clash = new Object();
            ++ct;
        } while (clash.hashCode() != target && ct<10L*1000*1000*1000L);
        if (clash.hashCode() == target) {
            System.out.println(ct+": "+obj+" - "+clash);
        } else {
            System.out.println("No clashes found");
        }
    }
}
RFE to clarify docs, because this comes up way too frequently: CR 6321873
Think about it. There are an infinite number of potential objects, and only 4 billion hash codes. Clearly, an infinity of potential objects share each hash code.
The Sun JVM either bases the Object hash code on a stable handle to the object or caches the initial hash code. Compaction during GC will not alter the hashCode(). Everything would break if it did.
Is it possible?
Yes.
Does it happen with any reasonable degree of frequency?
No.
I assume the original question is only about the hash codes generated by the default Object implementation. The fact is that hash codes must not be relied on for equality testing and are only used in some specific hash mapping operations (such as those implemented by the very useful HashMap implementation).
As such they have no need of being really unique - they only have to be unique enough to not generate a lot of clashes (which will render the HashMap implementation inefficient).
Also it is expected that when developers implement classes that are meant to be stored in HashMaps they will implement a hash code algorithm that has a low chance of clashes for objects of the same class (assuming you only store objects of the same class in application HashMaps), and knowing about the data makes it much easier to implement robust hashing.
Also see Ken's answer about equality necessitating identical hash codes.
Are you talking about the actual class Object or objects in general? You use both in the question. (And real-world apps generally don't create a lot of instances of Object)
For objects in general, it is common to write a class for which you want to override equals(); and if you do that, you must also override hashCode() so that two different instances of that class that are "equal" must also have the same hash code. You are likely to get a "duplicate" hash code in that case, among instances of the same class.
Also, when implementing hashCode() in different classes, they are often based on something in the object, so you end up with less "random" values, resulting in "duplicate" hash codes among instances of different classes (whether or not those objects are "equal").
In any real-world app, it is not unusual to find two different objects with the same hash code.
If there were as many hashcodes as memory addresses, then it would take the whole memory to store the hash itself. :-)
So, yes, the hash codes should sometimes happen to coincide.
