How do we decide on the best implementation of the hashCode() method for a collection (assuming that the equals method has been overridden correctly)?
The best implementation? That is a hard question because it depends on the usage pattern.
A reasonably good implementation for nearly all cases was proposed in Josh Bloch's Effective Java, Item 9 (second edition). The best thing is to look it up there, because the author explains why the approach is good.
A short version
Create an int result and assign a non-zero value.
For every field f tested in the equals() method, calculate a hash code c by:
If the field f is a boolean:
calculate (f ? 0 : 1);
If the field f is a byte, char, short or int: calculate (int)f;
If the field f is a long: calculate (int)(f ^ (f >>> 32));
If the field f is a float: calculate Float.floatToIntBits(f);
If the field f is a double: calculate Double.doubleToLongBits(f) and handle the return value like every long value;
If the field f is an object: Use the result of the hashCode() method or 0 if f == null;
If the field f is an array: see every field as separate element and calculate the hash value in a recursive fashion and combine the values as described next.
Combine the hash value c with result:
result = 37 * result + c
Return result
This should result in a proper distribution of hash values for most use situations.
If you're happy with the Effective Java implementation recommended by dmeister, you can use a library call instead of rolling your own:
@Override
public int hashCode() {
return Objects.hash(this.firstName, this.lastName);
}
This requires either Guava (com.google.common.base.Objects.hashCode) or the standard library in Java 7 (java.util.Objects.hash) but works the same way.
Although this is linked to Android documentation (Wayback Machine) and my own code on GitHub, it works for Java in general. My answer extends dmeister's answer with code that is much easier to read and understand.
@Override
public int hashCode() {
// Start with a non-zero constant. Prime is preferred
int result = 17;
// Include a hash for each field.
// Primitives
result = 31 * result + (booleanField ? 1 : 0); // 1 bit » 32-bit
result = 31 * result + byteField; // 8 bits » 32-bit
result = 31 * result + charField; // 16 bits » 32-bit
result = 31 * result + shortField; // 16 bits » 32-bit
result = 31 * result + intField; // 32 bits » 32-bit
result = 31 * result + (int)(longField ^ (longField >>> 32)); // 64 bits » 32-bit
result = 31 * result + Float.floatToIntBits(floatField); // 32 bits » 32-bit
long doubleFieldBits = Double.doubleToLongBits(doubleField); // 64 bits (double) » 64-bit (long) » 32-bit (int)
result = 31 * result + (int)(doubleFieldBits ^ (doubleFieldBits >>> 32));
// Objects
result = 31 * result + Arrays.hashCode(arrayField); // var bits » 32-bit
result = 31 * result + referenceField.hashCode(); // var bits » 32-bit (non-nullable)
result = 31 * result + // var bits » 32-bit (nullable)
(nullableReferenceField == null
? 0
: nullableReferenceField.hashCode());
return result;
}
EDIT
Typically, when you override hashCode(...), you also want to override equals(...). So for those who will implement or have already implemented equals, here is a good reference from my GitHub...
@Override
public boolean equals(Object o) {
// Optimization (not required).
if (this == o) {
return true;
}
// Return false if the other object has the wrong type, interface, or is null.
if (!(o instanceof MyType)) {
return false;
}
MyType lhs = (MyType) o; // lhs means "left hand side"
// Primitive fields
return booleanField == lhs.booleanField
&& byteField == lhs.byteField
&& charField == lhs.charField
&& shortField == lhs.shortField
&& intField == lhs.intField
&& longField == lhs.longField
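&& Float.compare(floatField, lhs.floatField) == 0 // consistent with Float.floatToIntBits in hashCode()
&& Double.compare(doubleField, lhs.doubleField) == 0 // consistent with Double.doubleToLongBits in hashCode()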
// Arrays
&& Arrays.equals(arrayField, lhs.arrayField)
// Objects
&& referenceField.equals(lhs.referenceField)
&& (nullableReferenceField == null
? lhs.nullableReferenceField == null
: nullableReferenceField.equals(lhs.nullableReferenceField));
}
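A subtlety when float or double fields take part in both methods: the == operator treats 0.0f and -0.0f as equal, but Float.floatToIntBits gives them different bit patterns, so an ==-based equals() can disagree with a bit-based hashCode(). The sketch below (class and method names are made up for illustration) shows the mismatch, and that Float.compare agrees with the bit-based hash.

```java
class FloatEqualityDemo {
    // True when the == operator and the bit-based hash agree for the two values.
    static boolean primitiveEqualsMatchesHash(float a, float b) {
        boolean equalByOperator = (a == b);
        boolean sameHash = Float.floatToIntBits(a) == Float.floatToIntBits(b);
        return equalByOperator == sameHash;
    }

    // True when Float.compare and the bit-based hash agree for the two values.
    static boolean compareMatchesHash(float a, float b) {
        boolean equalByCompare = Float.compare(a, b) == 0;
        boolean sameHash = Float.floatToIntBits(a) == Float.floatToIntBits(b);
        return equalByCompare == sameHash;
    }

    public static void main(String[] args) {
        // 0.0f == -0.0f is true, but the bit patterns (and so the hashes) differ.
        System.out.println(primitiveEqualsMatchesHash(0.0f, -0.0f)); // false
        // Float.compare tells the two zeros apart, matching the hash.
        System.out.println(compareMatchesHash(0.0f, -0.0f));         // true
        // Float.compare treats NaN as equal to itself, again matching the hash.
        System.out.println(compareMatchesHash(Float.NaN, Float.NaN)); // true
    }
}
```

The same reasoning applies to double fields with Double.compare and Double.doubleToLongBits.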
It is better to use the functionality provided by Eclipse which does a pretty good job and you can put your efforts and energy in developing the business logic.
First make sure that equals is implemented correctly. From an IBM DeveloperWorks article:
Symmetry: For two references, a and b, a.equals(b) if and only if b.equals(a)
Reflexivity: For all non-null references, a.equals(a)
Transitivity: If a.equals(b) and b.equals(c), then a.equals(c)
Then make sure that their relation with hashCode respects the contract (from the same article):
Consistency with hashCode(): Two equal objects must have the same hashCode() value
Finally a good hash function should strive to approach the ideal hash function.
about8.blogspot.com, you said
if equals() returns true for two objects, then hashCode() should return the same value. If equals() returns false, then hashCode() should return different values
I cannot agree with you. If two objects have the same hashcode it doesn't have to mean that they are equal.
If A equals B, then A.hashCode() must be equal to B.hashCode()
but
if A.hashCode() equals B.hashCode(), it does not mean that A must equal B
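A small illustration of that one-way relationship, using two well-known strings that share a hash code (the class and method names are only for this sketch):

```java
class HashContractDemo {
    // True when the two objects are unequal but still share a hash code.
    static boolean collides(Object x, Object y) {
        return !x.equals(y) && x.hashCode() == y.hashCode();
    }

    public static void main(String[] args) {
        // "Aa" and "BB" both hash to 2112, yet they are not equal.
        System.out.println("Aa".hashCode());      // 2112
        System.out.println("BB".hashCode());      // 2112
        System.out.println(collides("Aa", "BB")); // true

        // Equal objects, on the other hand, must share a hash code.
        String copy = new String("Aa");
        System.out.println("Aa".equals(copy) && "Aa".hashCode() == copy.hashCode()); // true
    }
}
```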
If you use Eclipse, you can generate equals() and hashCode() using:
Source -> Generate hashCode() and equals().
Using this function you can decide which fields you want to use for equality and hash code calculation, and Eclipse generates the corresponding methods.
There's a good implementation of Effective Java's hashCode() and equals() logic in Apache Commons Lang. Check out HashCodeBuilder and EqualsBuilder.
Just a quick note to complement the other, more detailed answers (in terms of code):
If I consider the question how-do-i-create-a-hash-table-in-java and especially the jGuru FAQ entry, I believe some other criteria upon which a hash code could be judged are:
synchronization (does the algorithm support concurrent access or not?)
fail-safe iteration (does the algorithm detect a collection that changes during iteration?)
null value (does the hash code support null values in the collection?)
If I understand your question correctly, you have a custom collection class (i.e. a new class that extends from the Collection interface) and you want to implement the hashCode() method.
If your collection class extends AbstractList, then you don't have to worry about it: there is already an implementation of equals() and hashCode() that works by iterating through all the elements and combining their hash codes.
public int hashCode() {
int hashCode = 1;
Iterator i = iterator();
while (i.hasNext()) {
Object obj = i.next();
hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode());
}
return hashCode;
}
Now if what you want is the best way to calculate the hash code for a specific class, I normally use the ^ (bitwise exclusive or) operator to process all fields that I use in the equals method:
public int hashCode(){
return intMember ^ (stringField != null ? stringField.hashCode() : 0);
}
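One caveat with a plain XOR combination, offered as a side note: XOR is symmetric and cancels equal values, so swapped or repeated field values collide. A minimal sketch (hypothetical class and method names) comparing it with a multiplier-based combination:

```java
class XorHashDemo {
    // XOR-combined hash of two int fields.
    static int xorHash(int x, int y) {
        return x ^ y;
    }

    // Order-sensitive alternative using the common 17/31 pattern.
    static int orderedHash(int x, int y) {
        int result = 17;
        result = 31 * result + x;
        result = 31 * result + y;
        return result;
    }

    public static void main(String[] args) {
        // Swapped fields collide under XOR.
        System.out.println(xorHash(1, 2) == xorHash(2, 1));         // true
        // Any pair of identical values hashes to 0 under XOR.
        System.out.println(xorHash(5, 5) == xorHash(7, 7));         // true
        // The multiplier-based version tells these apart.
        System.out.println(orderedHash(1, 2) == orderedHash(2, 1)); // false
    }
}
```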
@about8 : there is a pretty serious bug there.
Zam obj1 = new Zam("foo", "bar", "baz");
Zam obj2 = new Zam("fo", "obar", "baz");
same hashcode
you probably want something like
public int hashCode() {
return Integer.toString(getFoo().hashCode() + getBar().hashCode()).hashCode();
}
(Since hashCode() returns an int, you can also return the sum directly and skip the toString, which is ugly.)
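To make the collision concrete, here is a small sketch (class and method names are invented for the example) that compares hashing the concatenated string with hashing the two fields separately via java.util.Objects.hash:

```java
import java.util.Objects;

class ConcatHashDemo {
    // Hash of the concatenation: field boundaries are lost.
    static int concatHash(String foo, String bar) {
        return (foo + bar).hashCode();
    }

    // Hash that keeps the two fields apart.
    static int fieldHash(String foo, String bar) {
        return Objects.hash(foo, bar);
    }

    public static void main(String[] args) {
        // "foo" + "bar" and "fo" + "obar" are the same string, so they collide.
        System.out.println(concatHash("foo", "bar") == concatHash("fo", "obar")); // true
        // Hashing the fields separately tells the two objects apart.
        System.out.println(fieldHash("foo", "bar") == fieldHash("fo", "obar"));   // false
    }
}
```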
As you specifically asked about collections, I'd like to add an aspect that the other answers haven't mentioned yet: a HashMap doesn't expect its keys to change their hash code once they have been added to the collection. That would defeat the whole purpose...
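A minimal sketch of what goes wrong, assuming a mutable ArrayList is used as the key (the class and method names are only for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MutableKeyDemo {
    // Returns whether the map still finds the key after the key was mutated.
    static boolean foundAfterMutation() {
        Map<List<String>, String> map = new HashMap<>();
        List<String> key = new ArrayList<>();
        map.put(key, "value"); // stored under the hash code of the empty list
        key.add("x");          // the key's hash code changes
        return map.containsKey(key);
    }

    public static void main(String[] args) {
        // The entry is still in the map, but lookups use the new hash code.
        System.out.println(foundAfterMutation()); // false
    }
}
```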
Use the reflection methods on Apache Commons EqualsBuilder and HashCodeBuilder.
I use a tiny wrapper around Arrays.deepHashCode(...) because it handles arrays supplied as parameters correctly:
public static int hash(final Object... objects) {
return Arrays.deepHashCode(objects);
}
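A short sketch of the difference this makes when one of the arguments is itself an array (the class name and sample values are invented for the example):

```java
import java.util.Arrays;
import java.util.Objects;

class DeepHashDemo {
    // Same wrapper as above.
    static int hash(final Object... objects) {
        return Arrays.deepHashCode(objects);
    }

    public static void main(String[] args) {
        int[] first = {1, 2, 3};
        int[] second = {1, 2, 3};
        // Content-based: arrays with equal contents contribute equal hashes.
        System.out.println(hash("id", first) == hash("id", second)); // true
        // Objects.hash uses each array's identity hash code instead,
        // so this comparison is usually false.
        System.out.println(Objects.hash("id", first) == Objects.hash("id", second));
    }
}
```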
Any hashing method that evenly distributes the hash value over the possible range is a good implementation. See Effective Java ( http://books.google.com.au/books?id=ZZOiqZQIbRMC&dq=effective+java&pg=PP1&ots=UZMZ2siN25&sig=kR0n73DHJOn-D77qGj0wOxAxiZw&hl=en&sa=X&oi=book_result&resnum=1&ct=result ); there is a good tip in there for hashCode implementation (Item 9, I think...).
I prefer using the utility methods from the Objects class in the Google Collections library, which help me keep my code clean. Very often equals and hashCode methods are generated from an IDE's template, so they are not clean to read.
Here is another JDK 1.7+ approach that also accounts for superclass logic. I find it pretty convenient: it takes the Object class's hashCode() into account, depends only on the JDK, and needs no extra manual work. Please note that Objects.hash() is null-tolerant.
I have not included any equals() implementation, but in reality you will of course need one.
import java.util.Objects;
public class Demo {
public static class A {
private final String param1;
public A(final String param1) {
this.param1 = param1;
}
@Override
public int hashCode() {
return Objects.hash(
super.hashCode(),
this.param1);
}
}
public static class B extends A {
private final String param2;
private final String param3;
public B(
final String param1,
final String param2,
final String param3) {
super(param1);
this.param2 = param2;
this.param3 = param3;
}
@Override
public final int hashCode() {
return Objects.hash(
super.hashCode(),
this.param2,
this.param3);
}
}
public static void main(String [] args) {
A a = new A("A");
B b = new B("A", "B", "C");
System.out.println("A: " + a.hashCode());
System.out.println("B: " + b.hashCode());
}
}
The standard implementation is weak and using it leads to unnecessary collisions. Imagine a
class ListPair {
List<Integer> first;
List<Integer> second;
ListPair(List<Integer> first, List<Integer> second) {
this.first = first;
this.second = second;
}
public int hashCode() {
return Objects.hash(first, second);
}
...
}
Now,
new ListPair(List.of(a), List.of(b, c))
and
new ListPair(List.of(b), List.of(a, c))
have the same hashCode, namely 31*(a+b) + c plus a constant, as the multiplier used for List.hashCode gets reused here. Obviously, collisions are unavoidable, but producing needless collisions is just... needless.
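The collision is easy to reproduce. The sketch below assumes java.util.Objects.hash as the combiner and uses Arrays.asList rather than List.of so that it also runs on Java 8; the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class ListPairCollisionDemo {
    // Same combination as in ListPair.hashCode() above.
    static int pairHash(List<Integer> first, List<Integer> second) {
        return Objects.hash(first, second);
    }

    // True when ([a], [b, c]) and ([b], [a, c]) get the same hash code.
    static boolean collides(int a, int b, int c) {
        int h1 = pairHash(Arrays.asList(a), Arrays.asList(b, c));
        int h2 = pairHash(Arrays.asList(b), Arrays.asList(a, c));
        return h1 == h2;
    }

    public static void main(String[] args) {
        System.out.println(collides(1, 2, 3));      // true
        System.out.println(collides(10, -7, 4242)); // true
    }
}
```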
There's nothing substantially smart about using 31. The multiplier must be odd in order to avoid losing information (any even multiplier loses at least the most significant bit, multiples of four lose two, etc.). Any odd multiplier is usable. Small multipliers may lead to faster computation (the JIT can use shifts and additions), but given that multiplication has a latency of only three cycles on modern Intel/AMD, this hardly matters. Small multipliers also lead to more collisions for small inputs, which may be a problem sometimes.
Using a prime is pointless as primes have no meaning in the ring Z/(2**32).
So, I'd recommend using a randomly chosen big odd number (feel free to take a prime). As x86/amd64 CPUs can use a shorter instruction for operands fitting in a single signed byte, there is a tiny speed advantage for multipliers like 109. For minimizing collisions, take something like 0x58a54cf5.
Using different multipliers in different places is helpful, but probably not enough to justify the additional work.
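To illustrate the point about even multipliers losing information, here is a small sketch (names invented for the example) that runs the same polynomial hash with multiplier 32 and with multiplier 31. With 32 = 2^5, a value followed by seven more values ends up scaled by 2^35, which is 0 modulo 2^32, so it no longer affects the result.

```java
class MultiplierDemo {
    // Polynomial hash over the values with a caller-chosen multiplier.
    static int hash(int multiplier, int... values) {
        int result = 17;
        for (int v : values) {
            result = multiplier * result + v;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] x = {1, 0, 0, 0, 0, 0, 0, 0};
        int[] y = {2, 0, 0, 0, 0, 0, 0, 0};
        // Even multiplier: the first element has been shifted out completely.
        System.out.println(hash(32, x) == hash(32, y)); // true
        // Odd multiplier: the first element still influences the result.
        System.out.println(hash(31, x) == hash(31, y)); // false
    }
}
```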
When combining hash values, I usually use the combining method that's used in the boost c++ library, namely:
seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
This does a fairly good job of ensuring an even distribution. For some discussion of how this formula works, see the Stack Overflow post: Magic number in boost::hash_combine
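For Java readers, a rough port of that formula might look like the sketch below. It is an adaptation, not boost's code: the class and method names are made up, and >>> is used on the assumption that the right shift in the C++ version operates on an unsigned value.

```java
class HashCombineDemo {
    // Mixes one value's hash into the running seed, following the formula above.
    static int combine(int seed, int valueHash) {
        return seed ^ (valueHash + 0x9e3779b9 + (seed << 6) + (seed >>> 2));
    }

    // Combines the hash codes of all values in order; null counts as 0.
    static int hashAll(Object... values) {
        int seed = 0;
        for (Object v : values) {
            seed = combine(seed, v == null ? 0 : v.hashCode());
        }
        return seed;
    }

    public static void main(String[] args) {
        // The result depends on the order of the values.
        System.out.println(hashAll("foo", "bar") == hashAll("bar", "foo")); // false
    }
}
```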
There's a good discussion of different hash functions at: http://burtleburtle.net/bob/hash/doobs.html
For a simple class it is often easiest to implement hashCode() based on the class fields which are checked by the equals() implementation.
public class Zam {
private String foo;
private String bar;
private String somethingElse;
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
Zam otherObj = (Zam)obj;
if ((getFoo() == null && otherObj.getFoo() == null) || (getFoo() != null && getFoo().equals(otherObj.getFoo()))) {
if ((getBar() == null && otherObj.getBar() == null) || (getBar() != null && getBar().equals(otherObj.getBar()))) {
return true;
}
}
return false;
}
public int hashCode() {
return (getFoo() + getBar()).hashCode();
}
public String getFoo() {
return foo;
}
public String getBar() {
return bar;
}
}
The most important thing is to keep hashCode() and equals() consistent: if equals() returns true for two objects, then hashCode() should return the same value. If equals() returns false, then hashCode() should return different values.
Related
I am trying to implement a unique hashCode based on six different values. My class has the following attributes:
private int id_place;
private String algorithm;
private Date mission_date;
private int mission_hour;
private int x;
private int y;
I am calculating the hashCode as follows:
id_place * (7 * algorithm.hashCode()) + (31 * mission_date.hashCode()) + (23 * mission_hour + 89089) + (x * 19 + 67067) + (y * 11 + 97097);
How can I turn it into a unique hashCode? I'm not confident it is unique...
It doesn't have to be unique and it cannot be unique. hashCode() returns an int (32 bits), which means it could be unique if you only had one int property and nothing else.
The Integer class can (and does) have a unique hashCode(), but few other classes do.
Since you have multiple properties, some of which are int, a hashCode() that is a function of these properties can't be unique.
You should strive for a hashCode() function that gives a wide range of different values for different combinations of your properties, but it cannot be unique.
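Even a single long property already has more possible values than there are int hash codes, so collisions are unavoidable. A tiny sketch using the JDK's own Long.hashCode (the class and method names are just for the example):

```java
class HashNotUniqueDemo {
    // True when two long values share the same hash code.
    static boolean sameHash(long a, long b) {
        return Long.hashCode(a) == Long.hashCode(b);
    }

    public static void main(String[] args) {
        // Long.hashCode folds 64 bits into 32, so distinct values can collide:
        // both 0L and -1L hash to 0.
        System.out.println(sameHash(0L, -1L)); // true
    }
}
```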
The hash codes of two different objects need not be unique. According to https://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode() -
Whenever it is invoked on the same object more than once during an execution of a Java application, hashCode() must consistently return the same value, provided no information used in equals comparisons on the object is modified. This value need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode() method on each of the two objects must produce the same value.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
So you don't have to create a hashCode() function that returns a distinct hash code every time.
Unique is not a hard requirement, but the more unique the hash code is, the better.
Note first that the hash code is generally used by a HashMap as an index into a bucket. Hence, optimally, it should be unique modulo the number of buckets. However, that number may change as the map grows.
But okay, towards an optimal hash code:
Ranges are important: if x and y were in 0..255, they could be packed uniquely into two bytes, or, if in 0..999, as y*1000+x. For a LocalDateTime, one could take the long value in seconds (instead of ms or ns) counted from 2012-01-01, so you might assume a range from 0 up to, say, two years in the future.
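As a quick check of the y*1000+x idea, the sketch below (hypothetical names, and assuming both coordinates really stay within 0..999) counts the distinct hash values over the whole grid:

```java
class PackedHashDemo {
    // Unique as long as 0 <= x <= 999 and 0 <= y <= 999 (the assumed ranges).
    static int packedHash(int x, int y) {
        return y * 1000 + x;
    }

    // Counts the distinct hash values over the whole 1000 x 1000 grid.
    static int distinctHashes() {
        boolean[] seen = new boolean[1000 * 1000];
        int distinct = 0;
        for (int y = 0; y < 1000; y++) {
            for (int x = 0; x < 1000; x++) {
                int h = packedHash(x, y);
                if (!seen[h]) {
                    seen[h] = true;
                    distinct++;
                }
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        // One distinct value per (x, y) pair, so no collisions in this range.
        System.out.println(distinctHashes()); // 1000000
    }
}
```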
You can explore existing test data or generate plausible test data. One can then optimize the hash code function over its more or less arbitrary coefficients (7, 13, 23). This is linear optimisation, but it can also be done by simple trial and error: counting the clashes for varying (A, B, C).
//int[] coefficients = ...;
int[][] coefficientsCandidates = new int[NUM_OF_CANDIDATES][NUM_OF_COEFFS];
...
int[] collisionCounts = new int[NUM_OF_CANDIDATES];
for (Data data : allTestData) {
... update collisionCounts for every candidate
}
... take the candidate with smallest collision count
... or sort by collisionCounts and pick other candidates to try out
In general, such evaluation code is not needed for a working hash code, but it might detect bad hash codes where some pseudo-randomness goes wrong, for instance if a factor is far too large for the range (weekday * 1000), so that holes appear in the values.
But one also has to say, in all honesty, that all this effort is probably not needed.
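For anyone who wants to try the trial-and-error idea anyway, here is one possible runnable sketch. Everything in it is an assumption made for illustration: the three-field layout (x, y, hour), the synthetic test data, and the candidate coefficients.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class CoefficientSearchDemo {
    // Candidate hash: a linear combination of three fields (x, y, hour).
    static int hash(int[] coeffs, int[] row) {
        return coeffs[0] * row[0] + coeffs[1] * row[1] + coeffs[2] * row[2];
    }

    // Counts rows whose hash value was already produced by an earlier row.
    static int countCollisions(int[] coeffs, int[][] testData) {
        Set<Integer> seen = new HashSet<>();
        int collisions = 0;
        for (int[] row : testData) {
            if (!seen.add(hash(coeffs, row))) {
                collisions++;
            }
        }
        return collisions;
    }

    // Synthetic test data: x and y in 0..9, hour in 0..23 (2400 distinct rows).
    static int[][] sampleData() {
        int[][] data = new int[10 * 10 * 24][];
        int i = 0;
        for (int x = 0; x < 10; x++) {
            for (int y = 0; y < 10; y++) {
                for (int hour = 0; hour < 24; hour++) {
                    data[i++] = new int[] {x, y, hour};
                }
            }
        }
        return data;
    }

    // Index of the candidate with the fewest collisions (first one wins ties).
    static int bestCandidate(int[][] candidates, int[][] testData) {
        int best = 0;
        int bestCount = Integer.MAX_VALUE;
        for (int c = 0; c < candidates.length; c++) {
            int count = countCollisions(candidates[c], testData);
            if (count < bestCount) {
                bestCount = count;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] candidates = { {1, 1, 1}, {7, 13, 23}, {1, 10, 100} };
        int[][] data = sampleData();
        for (int[] c : candidates) {
            System.out.println(Arrays.toString(c) + " -> " + countCollisions(c, data));
        }
        System.out.println("best index: " + bestCandidate(candidates, data));
    }
}
```

With this particular data, the coefficients {1, 10, 100} match the field ranges exactly and produce no collisions, which ties back to the earlier point that knowing the ranges matters more than the search itself.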
In Eclipse, there is a function that generates the method public int hashCode() for you. I used the class attributes you provided and the result is as follows:
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((algorithm == null) ? 0 : algorithm.hashCode());
result = prime * result + id_place;
result = prime * result + ((mission_date == null) ? 0 : mission_date.hashCode());
result = prime * result + mission_hour;
result = prime * result + x;
result = prime * result + y;
return result;
}
It looks a lot like your calculation. However, as Andy Turner pointed out in a comment to your question and Eran in an answer, you simply cannot make a unique hash code for every single instance of an object if their number exceeds the number of possible distinct hash codes.
Because you have multiple fields, use:
public int hashCode() {
return Objects.hash(id_place, algorithm, mission_date, mission_hour, x, y);
}
If objA.equals(objB) is true, then objA and objB must return the same hash code.
If objA.equals(objB) is false, then objA and objB may still return the same hash code; if your hashing algorithm happens to return different hash codes in this case, that is good for performance.
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
ClassA classA = (ClassA) o;
return id_place == classA.id_place &&
mission_hour == classA.mission_hour &&
x == classA.x &&
y == classA.y &&
Objects.equals(algorithm, classA.algorithm) &&
Objects.equals(mission_date, classA.mission_date);
}
I have the following java class:
public class Person{
String name; //a unique name
Long DoB; //a unique time
.
.
.
@Override
public int hashCode(){
return name.hashCode() + DoB.hashCode();
}
}
Is my hashCode method correct (i.e. would it return a unique number for all combinations)?
I have a feeling I'm missing something here.
You could let java.util.Arrays do it for you:
return Arrays.hashCode(new Object[]{ name, DoB });
You might also want to use something more fluent and more NPE-bulletproof like Google Guava:
@Override
public int hashCode(){
return Objects.hashCode(name, DoB);
}
@Override
public boolean equals(Object o) {
if ( this == o ) {
return true;
}
if ( o == null || o.getClass() != Person.class ) {
return false;
}
final Person that = (Person) o;
return Objects.equal(name, that.name) && Objects.equal(DoB, that.DoB);
}
Edit:
IntelliJ IDEA and Eclipse can generate more efficient hashCode() and equals().
Aside from the obvious, which is that you might want to implement the equals method as well:
Summing two hash codes can overflow int, although in Java this simply wraps around.
The sum itself seems like a bit of a weak methodology to provide unique hash codes. I would instead try some bitwise manipulation and use a seed.
See Bloch's Effective Java #9.
But you should start with an initial value (so that subsequent zero values are significant), and combine the fields that apply to the result along with a multiplier so that order is significant (so that similar classes will have much different hashes.)
Also, you will have to treat things like long fields and Strings a little different. e.g., for longs:
(int) (field ^ (field>>>32))
So, this means something like:
@Override public int hashCode() {
int result = 17;
// name.hashCode() returns an int and can never be null; check the reference itself
result += (name == null) ? 0 : name.hashCode();
result = 31 * result + (int) (DoB ^ (DoB >>> 32));
return result;
}
31 is slightly magic, but odd primes can make it easier for the compiler to optimize the math to shift-subtraction. (Or you can do the shift-subtraction yourself, but why not let the compiler do it.)
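The two pieces of arithmetic mentioned above can be checked directly. This is only a sketch; times31 and foldLong are hypothetical helper names, and Long.hashCode(long) is the static method available since Java 8:

```java
class ThirtyOneDemo {
    // 31 * i written as a shift and a subtraction; int overflow wraps the same way in both forms.
    static int times31(int i) {
        return (i << 5) - i;
    }

    // Folds the upper and lower 32 bits of a long into one int.
    static int foldLong(long value) {
        return (int) (value ^ (value >>> 32));
    }

    public static void main(String[] args) {
        System.out.println(times31(7) == 31 * 7);                                    // true
        System.out.println(times31(Integer.MAX_VALUE) == 31 * Integer.MAX_VALUE);    // true
        System.out.println(foldLong(123456789012L) == Long.hashCode(123456789012L)); // true
    }
}
```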
Usually a hash code is built like so:
@Override
public int hashCode(){
return name.hashCode() ^ DoB.hashCode();
}
But the important thing to remember when writing a hashCode method is what it is used for: putting different objects into different buckets of a hash table or other hash-based collection. As such, it's important to have a method that gives different answers for different objects at a low run-time cost. The result doesn't have to be different for every item, though it's better that way.
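One caveat with XOR (and with plain addition) is that both are symmetric, so swapping two field values gives the same hash. The sketch below uses two String fields for simplicity, and the helper names are hypothetical:

```java
import java.util.Objects;

class SymmetricHashDemo {
    static int xorHash(String a, String b) { return a.hashCode() ^ b.hashCode(); }
    static int sumHash(String a, String b) { return a.hashCode() + b.hashCode(); }
    static int orderedHash(String a, String b) { return Objects.hash(a, b); }

    public static void main(String[] args) {
        // Swapping the two values cannot change a XOR or a sum.
        System.out.println(xorHash("ab", "cd") == xorHash("cd", "ab")); // true
        System.out.println(sumHash("ab", "cd") == sumHash("cd", "ab")); // true
        // XOR of two equal hash codes is always 0.
        System.out.println(xorHash("same", "same")); // 0
        // The multiply-and-add combination depends on the order of the fields.
        System.out.println(orderedHash("ab", "cd") == orderedHash("cd", "ab")); // false
    }
}
```

None of this breaks the hashCode contract; it only means more objects share a bucket than necessary.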
This hash is used by other code when storing or manipulating the
instance – the values are intended to be evenly distributed for varied
inputs in order to use in clustering. This property is important to
the performance of hash tables and other data structures that store
objects in groups ("buckets") based on their computed hash values
and
The general contract for overridden implementations of this method is
that they behave in a way consistent with the same object's equals()
method: that a given object must consistently report the same hash
value (unless it is changed so that the new version is no longer
considered "equal" to the old), and that two objects which equals()
says are equal must report the same hash value.
Your hash code implementation is fine and correct. It could be better if you follow any of the suggestions other people have made, but it satisfies the contract for hashCode, and collisions aren't particularly likely, though they could be made less likely.
I'm having some trouble writing a hashCode() method for a class I created. This class is meant to be used inside a TreeSet, and as such, it implements Comparable. The class has the following variables:
public class Node implements Comparable<Node> {
Matrix matrix;
int[] coordinates= new int[2];
Node father;
int depth;
int cost;
Here's the implementation of the compareTo() method. I want the TreeSet to organize these Node structures by their cost, therefore, compareTo() returns the result of a simple subtraction.
public int compareTo(Node nodeToCompare) {
return this.cost - nodeToCompare.cost;
}
I also implemented an equals() method.
public boolean equals(Object objectToCompare) {
if(objectToCompare== this) {return true;}
if(objectToCompare== null || objectToCompare.getClass()!= this.getClass()) {return false;}
Node objectNode= (Node) objectToCompare;
return this.father.equals(objectNode.father) &&
this.depth== objectNode.depth &&
this.cost== objectNode.cost &&
this.matrix.equals(objectNode.matrix) &&
Arrays.equals(this.coordinates, objectNode.coordinates);
}
Having said all of that, I have a few questions:
Since I implemented a new equals() method, should I implement a new hashCode() method?
How can I go about implementing a new hashCode method() with those variables? (Note that the variable matrix of the type Matrix has a hashCode() method implemented)
That's all!
Your compareTo method is not consistent with your equals method: your compareTo method says that two instances are equivalent if they have the same cost — such that a TreeSet can only ever contain at most one instance with a given cost — but your equals method says that they're only equivalent if they have the same cost and are the same in various other ways.
So, assuming that your equals method is correct:
you need to fix your compareTo method to be consistent with it.
you need to create a hashCode method that is consistent with it. I recommend using the same sort of logic as is used by java.util.List.hashCode(), which is a straightforward and effective way to assemble the hash-codes of component objects in a specific order; basically you would write something like:
int hashCode = 1;
hashCode = 31 * hashCode + (father == null ? 0 : father.hashCode());
hashCode = 31 * hashCode + depth;
hashCode = 31 * hashCode + cost;
hashCode = 31 * hashCode + matrix.hashCode();
hashCode = 31 * hashCode + java.util.Arrays.hashCode(coordinates);
return hashCode;
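The first point is easy to reproduce. In this sketch, SimpleNode is a hypothetical cut-down node with only depth and cost, and the ordering is passed to the TreeSet as a Comparator instead of implementing compareTo, purely to compare both orderings side by side:

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

class TreeSetConsistencyDemo {
    // Hypothetical, simplified node: only the two int fields from the question.
    static final class SimpleNode {
        final int depth;
        final int cost;
        SimpleNode(int depth, int cost) { this.depth = depth; this.cost = cost; }
    }

    // Orders by cost only, like the compareTo in the question
    // (Integer.compare avoids the overflow that subtraction can cause).
    static final Comparator<SimpleNode> BY_COST =
            (a, b) -> Integer.compare(a.cost, b.cost);

    // Breaks ties on the remaining field, so it returns 0 only when all fields match.
    static final Comparator<SimpleNode> BY_COST_THEN_DEPTH =
            Comparator.<SimpleNode>comparingInt(n -> n.cost).thenComparingInt(n -> n.depth);

    static int sizeAfterAdding(Comparator<SimpleNode> order) {
        Set<SimpleNode> set = new TreeSet<>(order);
        set.add(new SimpleNode(1, 10));
        set.add(new SimpleNode(2, 10)); // same cost, different depth
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAdding(BY_COST));            // 1, the second node is dropped
        System.out.println(sizeAfterAdding(BY_COST_THEN_DEPTH)); // 2
    }
}
```

For the real Node class, the tie-breaking would have to cover every field that equals() compares.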
IntelliJ IDEA can do this as a right-click feature. Just seeing it done correctly will teach you a lot.
And you should override both in any case.
The contract for the hashCode method states that if two objects are equal, then calling hashCode() should give you the same integer result. The opposite does not have to be true, i.e. if two hashCodes are the same the objects don't have to equal each other.
Looking at your equals method (which needs variable translation btw), you can add the hashCodes of all the internal member variables that need to be equals for your equals method to give true. e.g.
public int hashCode() {
return this.matrix.hashCode() +
this.coordinates[0] +
this.coordinates[1] +
this.father.hashCode() +
this.depth + this.cost;
}
The above assumes that matrix and father are never nulls, you need to make sure that you check for nulls if that's not the case.
If you feel more adventurous you can multiply a few of the above by a prime to reduce hashCode collisions for different data (this cannot rule collisions out, but it helps performance if you are using your class in hash tables and hash maps). If you need to cater for nulls, the above method can be written a bit better like this:
public int hashCode() {
return ((this.matrix == null) ? 0 : this.matrix.hashCode()) +
17 * this.coordinates[0] +
this.coordinates[1] +
((this.father == null) ? 0 : this.father.hashCode()) +
31 * this.depth + 19 * this.cost;
}
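The null checks and the per-element handling of the array can also be delegated to the standard library. This is a sketch under assumptions: GridNode is a hypothetical simplification of the Node class (the Matrix field is left out), using Objects.hashCode(Object) and Arrays.hashCode(int[]):

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical, simplified version of the Node class from the question (no Matrix field).
final class GridNode {
    final int[] coordinates;
    final GridNode father; // null for the root
    final int depth;
    final int cost;

    GridNode(int[] coordinates, GridNode father, int depth, int cost) {
        this.coordinates = coordinates;
        this.father = father;
        this.depth = depth;
        this.cost = cost;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        GridNode other = (GridNode) o;
        return depth == other.depth
                && cost == other.cost
                && Objects.equals(father, other.father)
                && Arrays.equals(coordinates, other.coordinates);
    }

    @Override
    public int hashCode() {
        int result = Objects.hashCode(father);               // 0 when father is null
        result = 31 * result + Arrays.hashCode(coordinates); // based on contents, unlike int[].hashCode()
        result = 31 * result + depth;
        result = 31 * result + cost;
        return result;
    }
}
```

Note that both methods walk the whole chain of fathers, which may matter for deep trees.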
If your collection is small, you can return a constant from the hashCode method. The hash code is used for quick lookup: hash codes are like boxes that hold elements. The rules are:
Equal elements must be in the same box (have the same hashCode), always;
Unequal elements can be either in the same box or in different boxes.
If you return a constant, you obey these two rules, but performance can degrade significantly on collections that are not small (because the lookup has to examine all elements rather than only the elements in the same box). So returning a constant is generally a bad approach.
PS: Sorry for my writing. English is not my native language.
PPS: Usually you should implement the hashCode method in the same way as equals (using the same fields).
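A quick sketch of the constant-hash case: Key below is a hypothetical class whose hashCode() always returns 1. The set still behaves correctly; every element just ends up in the same box:

```java
import java.util.HashSet;
import java.util.Set;

class ConstantHashDemo {
    // Hypothetical key type with a constant hash code: legal, but all instances share one bucket.
    static final class Key {
        final String id;
        Key(String id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id.equals(id);
        }

        @Override
        public int hashCode() {
            return 1;
        }
    }

    public static void main(String[] args) {
        Set<Key> set = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            set.add(new Key("k" + i));
        }
        System.out.println(set.size());                     // 1000
        System.out.println(set.contains(new Key("k42")));   // true
        System.out.println(set.contains(new Key("nope")));  // false
    }
}
```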
What value does the hashCode() method return in java?
I read that it is a memory reference of an object... The hash value for new Integer(1) is 1; the hash value for String("a") is 97.
I am confused: is it ASCII or what type of value is?
The value returned by hashCode() is by no means guaranteed to be the memory address of the object. I'm not sure of the implementation in the Object class, but keep in mind most classes will override hashCode() such that two instances that are semantically equivalent (but are not the same instance) will hash to the same value. This is especially important if the classes may be used within another data structure, such as Set, that relies on hashCode being consistent with equals.
There is no hashCode() that uniquely identifies an instance of an object no matter what. If you want a hashcode based on the underlying pointer (e.g. in Sun's implementation), use System.identityHashCode() - this will delegate to the default hashCode method regardless of whether it has been overridden.
Nevertheless, even System.identityHashCode() can return the same hash for multiple objects. See the comments for an explanation, but here is an example program that continuously generates objects until it finds two with the same System.identityHashCode(). When I run it, it quickly finds two System.identityHashCode()s that match, on average after adding about 86,000 Long wrapper objects (and Integer wrappers for the key) to a map.
public static void main(String[] args) {
Map<Integer,Long> map = new HashMap<>();
Random generator = new Random();
Collection<Integer> counts = new LinkedList<>();
Long object = generator.nextLong();
// We use the identityHashCode as the key into the map
// This makes it easier to check if any other objects
// have the same key.
int hash = System.identityHashCode(object);
while (!map.containsKey(hash)) {
map.put(hash, object);
object = generator.nextLong();
hash = System.identityHashCode(object);
}
System.out.println("Identical maps for size: " + map.size());
System.out.println("First object value: " + object);
System.out.println("Second object value: " + map.get(hash));
System.out.println("First object identityHash: " + System.identityHashCode(object));
System.out.println("Second object identityHash: " + System.identityHashCode(map.get(hash)));
}
Example output:
Identical maps for size: 105822
First object value: 7446391633043190962
Second object value: -8143651927768852586
First object identityHash: 2134400190
Second object identityHash: 2134400190
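As a smaller illustration of the difference between an overridden hashCode() and System.identityHashCode(), consider the sketch below. Fixed is a hypothetical class invented for the example; the last printed value changes from run to run, so no output is shown for it:

```java
class IdentityHashDemo {
    // Hypothetical class that overrides hashCode() with a fixed value.
    static final class Fixed {
        @Override
        public int hashCode() { return 42; }
    }

    public static void main(String[] args) {
        Object plain = new Object();
        // For a class that does not override hashCode(), both calls agree.
        System.out.println(plain.hashCode() == System.identityHashCode(plain)); // true

        // Equal strings hash alike even though they are distinct instances.
        String a = new String("a");
        String b = new String("a");
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a.hashCode());                 // 97

        Fixed f = new Fixed();
        System.out.println(f.hashCode());                 // 42
        // identityHashCode ignores the override; the value varies between runs.
        System.out.println(System.identityHashCode(f));
    }
}
```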
A hashcode is an integer value that represents the state of the object upon which it was called. That is why an Integer that is set to 1 will return a hashcode of 1: an Integer's hashcode and its value are the same thing. A Character's hashcode is equal to its character code (which matches ASCII for ASCII characters). If you write a custom type, you are responsible for creating a good hashCode implementation that represents the state of the current instance well.
If you want to know how they are implemented, I suggest you read the source. If you are using an IDE, you can navigate from a method call to its definition and see how the method is implemented. If you cannot do that, you can google for the source.
For example, Integer.hashCode() is implemented as
public int hashCode() {
return value;
}
and String.hashCode()
public int hashCode() {
int h = hash;
if (h == 0) {
int off = offset;
char val[] = value;
int len = count;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}
The hashCode() method is often used for identifying an object. I think the Object implementation returns the pointer (not a real pointer but a unique id or something like that) of the object. But most classes override the method, like the String class. Two String objects do not have the same pointer, but they are equal:
new String("a").hashCode() == new String("a").hashCode()
I think the most common use for hashCode() is in Hashtable, HashSet, etc..
Java API Object hashCode()
Edit: (due to a recent downvote and based on an article I read about JVM parameters)
With the JVM parameter -XX:hashCode you can change the way how the hashCode is calculated (see the Issue 222 of the Java Specialists' Newsletter).
HashCode==0: Simply returns random numbers with no relation to where
in memory the object is found. As far as I can make out, the global
read-write of the seed is not optimal for systems with lots of
processors.
HashCode==1: Counts up the hash code values, not sure at what value
they start, but it seems quite high.
HashCode==2: Always returns the exact same identity hash code of 1.
This can be used to test code that relies on object identity. The
reason why JavaChampionTest returned Kirk's URL in the example above
is that all objects were returning the same hash code.
HashCode==3: Counts up the hash code values, starting from zero. It
does not look to be thread safe, so multiple threads could generate
objects with the same hash code.
HashCode==4: This seems to have some relation to the memory location
at which the object was created.
HashCode>=5: This is the default algorithm for Java 8 and has a
per-thread seed. It uses Marsaglia's xor-shift scheme to produce
pseudo-random numbers.
I read that it is an memory reference of an object..
No. Object.hashCode() used to return a memory address about 14 years ago. Not since.
what type of value is
What it is depends entirely on what class you're talking about and whether or not it has overridden Object.hashCode().
From OpenJDK sources (JDK8):
Use default of 5 to generate hash codes:
product(intx, hashCode, 5,
"(Unstable) select hashCode generation algorithm")
Some constant data and a random generated number with a seed initiated per thread:
// thread-specific hashCode stream generator state - Marsaglia shift-xor form
_hashStateX = os::random() ;
_hashStateY = 842502087 ;
_hashStateZ = 0x8767 ; // (int)(3579807591LL & 0xffff) ;
_hashStateW = 273326509 ;
Then, this function creates the hashCode (defaulted to 5 as specified above):
static inline intptr_t get_next_hash(Thread * Self, oop obj) {
intptr_t value = 0 ;
if (hashCode == 0) {
// This form uses an unguarded global Park-Miller RNG,
// so it's possible for two threads to race and generate the same RNG.
// On MP system we'll have lots of RW access to a global, so the
// mechanism induces lots of coherency traffic.
value = os::random() ;
} else
if (hashCode == 1) {
// This variation has the property of being stable (idempotent)
// between STW operations. This can be useful in some of the 1-0
// synchronization schemes.
intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3 ;
value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
} else
if (hashCode == 2) {
value = 1 ; // for sensitivity testing
} else
if (hashCode == 3) {
value = ++GVars.hcSequence ;
} else
if (hashCode == 4) {
value = cast_from_oop<intptr_t>(obj) ;
} else {
// Marsaglia's xor-shift scheme with thread-specific state
// This is probably the best overall implementation -- we'll
// likely make this the default in future releases.
unsigned t = Self->_hashStateX ;
t ^= (t << 11) ;
Self->_hashStateX = Self->_hashStateY ;
Self->_hashStateY = Self->_hashStateZ ;
Self->_hashStateZ = Self->_hashStateW ;
unsigned v = Self->_hashStateW ;
v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
Self->_hashStateW = v ;
value = v ;
}
value &= markOopDesc::hash_mask;
if (value == 0) value = 0xBAD ;
assert (value != markOopDesc::no_hash, "invariant") ;
TEVENT (hashCode: GENERATE) ;
return value;
}
So we can see that at least in JDK8 the default is set to random thread specific.
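To see what the final branch does without reading C++, here is a rough Java port of the xor-shift step. It is a sketch, not HotSpot code: the class name is made up, the seed is supplied by the caller instead of os::random(), and the 31-bit HASH_MASK is an assumption standing in for markOopDesc::hash_mask:

```java
class XorShiftHashSketch {
    // Assumption: 31 hash bits; the real mask is platform dependent.
    static final int HASH_MASK = 0x7FFFFFFF;

    // Per-thread state; the three constants are the ones in the excerpt above.
    private int x;
    private int y = 842502087;
    private int z = 0x8767;
    private int w = 273326509;

    XorShiftHashSketch(int seed) {
        this.x = seed; // HotSpot uses os::random() here
    }

    int nextHash() {
        int t = x;
        t ^= (t << 11);
        x = y;
        y = z;
        z = w;
        int v = w;
        // >>> is Java's unsigned shift, matching >> on a C++ unsigned value
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8));
        w = v;
        int value = v & HASH_MASK;
        return value == 0 ? 0xBAD : value; // 0 is reserved for "no hash yet"
    }

    public static void main(String[] args) {
        XorShiftHashSketch generator = new XorShiftHashSketch(12345);
        for (int i = 0; i < 3; i++) {
            System.out.println(generator.nextHash());
        }
    }
}
```

The point is that the value comes from a per-thread pseudo-random sequence and never looks at the object's address.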
Definition: The String hashCode() method returns the hashcode value of the String as an int.
Syntax:
public int hashCode()
Hashcode is calculated using below formula
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
where:
s[i] is the ith character of the string
n is the length of the string
^ is the exponentiation operator
Example:
For example if you want to calculate hashcode for string "abc" then we have below details
s[] = {'a', 'b', 'c'}
n = 3
So the hashcode value will be calculated as:
s[0]*31^(2) + s[1]*31^1 + s[2]
= a*31^2 + b*31^1 + c*31^0
= (ASCII value of a = 97, b = 98 and c = 99)
= 97*961 + 98*31 + 99
= 93217 + 3038 + 99
= 96354
So the hashcode value for 'abc' is 96354
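The worked example can be double-checked in code. The sketch below evaluates the same formula with Horner's rule; polynomialHash is a hypothetical helper written for this illustration, not the JDK source:

```java
class StringHashDemo {
    // Evaluates s[0]*31^(n-1) + ... + s[n-1] with Horner's rule, in wrapping int arithmetic.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polynomialHash("abc")); // 96354
        System.out.println("abc".hashCode());      // 96354
    }
}
```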
Object.hashCode(), if memory serves correctly (check the JavaDoc for java.lang.Object), is implementation-dependent, and will change depending on the object (the Sun JVM derives the value from the value of the reference to the object).
Note that if you are implementing any nontrivial object, and want to correctly store them in a HashMap or HashSet, you MUST override hashCode() and equals(). hashCode() can do whatever you like (it's entirely legal, but suboptimal to have it return 1.), but it's vital that if your equals() method returns true, then the value returned by hashCode() for both objects are equal.
Confusion and lack of understanding of hashCode() and equals() is a big source of bugs. Make sure that you thoroughly familiarize yourself with the JavaDocs for Object.hashCode() and Object.equals(), and I guarantee that the time spent will pay for itself.
From the Javadoc:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#hashCode--
I'm surprised that no one mentioned this: although it is obvious that for any class other than Object your first step should be to read the source code, many classes simply inherit hashCode() from Object, in which case several different things may happen depending on your JVM implementation. Object.hashCode() returns the same value as System.identityHashCode(object).
Indeed using object address in memory is ancient history but many do not realise they can control this behaviour and how Object.hashcode() is computed via jvm argument -XX:hashCode=N where N can be a number from [0-5]...
0 – Park-Miller RNG (default before Java 8, blocking)
1 – f(address, global_statement)
2 – constant 1
3 – serial counter
4 – object address
5 – Thread-local Xorshift
Depending on an application you may see unexpected performance hits when .hashcode() is called, when that happens it is likely you are using one of the algorithms that shares global state and/or blocks.
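According to the Javadoc of Object.hashCode(), the "internal address of the object is converted into an integer" only typically, and this technique is not required. So it is clear that the hashCode() method does not simply return the internal address of the object as it is. The link is provided below.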
https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#hashCode--
To make this clear, see the following sample code:
public class HashCodeDemo
{
public static void main(String[] args)
{
final int CAPACITY_OF_MAP = 10000000;
/**
* hashCode as key, and Object as value
*/
java.util.HashMap<Integer, Object> hm1 = new java.util.HashMap<Integer, Object>(CAPACITY_OF_MAP);
int noOfDistinceObject = 0;
Object obj = null;
for(int i = 0; i < CAPACITY_OF_MAP; i++)
{
obj = new Object();
hm1.put(obj.hashCode(), new Object());
}
System.out.println("hm1.size() = "+hm1.size());
/**
* hashCode as key, and Object as value
*/
java.util.HashMap<Integer, Object> hm2 = new java.util.HashMap<Integer, Object>(CAPACITY_OF_MAP);
for(int i = 0; i < CAPACITY_OF_MAP; i++)
{
obj = new Object();
/**
* Each Object has unique memory location ,
* and if Object's hashCode is memory location then hashCode of Object is also unique
* then no object can put into hm2.
*
* If obj's hashCode is doesn't exists in hm1 then increment noOfDistinceObject , else add obj into hm2.
*/
if(hm1.get(obj.hashCode()) == null)
{
noOfDistinceObject++;
}
else
{
hm2.put(obj.hashCode(), new Object());
}
}
System.out.println("hm2.size() = "+hm2.size());
System.out.println("noOfDistinceObject = "+noOfDistinceObject);
}
}
Each Object has a unique memory location, so if Object's hashCode method returned the memory location, every hashCode would be unique as well. But if we run the sample code above, some objects share a hashcode value while others have a unique one.
So we can say that the hashCode method of the Object class does not return the memory location.