What value does the hashCode() method return in Java?
I read that it is a memory reference of an object... The hash value for new Integer(1) is 1; the hash value for String("a") is 97.
I am confused: is it the ASCII code, or what kind of value is it?
The value returned by hashCode() is by no means guaranteed to be the memory address of the object. I'm not sure of the implementation in the Object class, but keep in mind most classes will override hashCode() such that two instances that are semantically equivalent (but are not the same instance) will hash to the same value. This is especially important if the classes may be used within another data structure, such as Set, that relies on hashCode being consistent with equals.
There is no hashCode() that uniquely identifies an instance of an object no matter what. If you want a hashcode based on the underlying pointer (e.g. in Sun's implementation), use System.identityHashCode() - this will delegate to the default hashCode method regardless of whether it has been overridden.
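As a small sketch of the difference between an overridden hashCode() and System.identityHashCode(): the class and method names below are made up for illustration, and the last line prints values that vary from run to run.

```java
final class IdentityVsOverride {
    // True when the (overridden) String.hashCode() of two strings agrees.
    static boolean sameValueHash(String x, String y) {
        return x.hashCode() == y.hashCode();
    }

    // True when hashCode() and identityHashCode agree; this holds for
    // objects whose class does not override hashCode().
    static boolean defaultMatchesIdentity(Object o) {
        return o.hashCode() == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        String a = new String("a");
        String b = new String("a");
        System.out.println(sameValueHash(a, b));                  // true: same contents
        System.out.println(defaultMatchesIdentity(new Object())); // true: no override involved
        // Usually two different numbers, but a collision is possible.
        System.out.println(System.identityHashCode(a) + " " + System.identityHashCode(b));
    }
}
```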
Nevertheless, even System.identityHashCode() can return the same hash for multiple objects. See the comments for an explanation, but here is an example program that continuously generates objects until it finds two with the same System.identityHashCode(). When I run it, it quickly finds two System.identityHashCode()s that match, on average after adding about 86,000 Long wrapper objects (and Integer wrappers for the key) to a map.
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class IdentityHashCollision {
    public static void main(String[] args) {
        Map<Integer, Long> map = new HashMap<>();
        Random generator = new Random();
        Long object = generator.nextLong();
        // We use the identityHashCode as the key into the map.
        // This makes it easier to check if any other objects
        // have the same key.
        int hash = System.identityHashCode(object);
        while (!map.containsKey(hash)) {
            map.put(hash, object);
            object = generator.nextLong();
            hash = System.identityHashCode(object);
        }
        System.out.println("Identical maps for size: " + map.size());
        System.out.println("First object value: " + object);
        System.out.println("Second object value: " + map.get(hash));
        System.out.println("First object identityHash: " + System.identityHashCode(object));
        System.out.println("Second object identityHash: " + System.identityHashCode(map.get(hash)));
    }
}
Example output:
Identical maps for size: 105822
First object value: 7446391633043190962
Second object value: -8143651927768852586
First object identityHash: 2134400190
Second object identityHash: 2134400190
A hashcode is an integer value that represents the state of the object upon which it was called. That is why an Integer that is set to 1 will return a hashcode of 1: an Integer's hashcode and its value are the same thing. A character's hashcode is equal to its ASCII character code. If you write a custom type, you are responsible for creating a good hashCode implementation that will best represent the state of the current instance.
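A minimal sketch that prints the hash codes of a few built-in types, matching the values mentioned in the question. The class name BuiltInHashes is made up for this example.

```java
final class BuiltInHashes {
    static int[] sample() {
        return new int[] {
            Integer.valueOf(1).hashCode(),     // the int value itself
            Character.valueOf('a').hashCode(), // the char's numeric code
            "a".hashCode(),                    // one-character string: same as the char code
            Boolean.TRUE.hashCode()            // a fixed constant documented for Boolean
        };
    }

    public static void main(String[] args) {
        for (int h : sample()) {
            System.out.println(h); // prints 1, 97, 97, 1231 (one per line)
        }
    }
}
```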
If you want to know how they are implemented, I suggest you read the source. If you are using an IDE, you can jump to the declaration of a method you are interested in and see how it is implemented. If you cannot do that, you can google for the source.
For example, Integer.hashCode() is implemented as
public int hashCode() {
return value;
}
and String.hashCode()
public int hashCode() {
int h = hash;
if (h == 0) {
int off = offset;
char val[] = value;
int len = count;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}
The hashCode() method is often used for identifying an object. I think the Object implementation returns the pointer (not a real pointer but a unique id or something like that) of the object. But most classes override the method, like the String class. Two String objects with the same content are not the same instance, but they are equal and have the same hash code:
new String("a").hashCode() == new String("a").hashCode()
I think the most common use for hashCode() is in Hashtable, HashSet, etc..
Java API Object hashCode()
Edit: (due to a recent downvote and based on an article I read about JVM parameters)
With the JVM parameter -XX:hashCode you can change how the hashCode is calculated (see Issue 222 of the Java Specialists' Newsletter).
HashCode==0: Simply returns random numbers with no relation to where
in memory the object is found. As far as I can make out, the global
read-write of the seed is not optimal for systems with lots of
processors.
HashCode==1: Counts up the hash code values, not sure at what value
they start, but it seems quite high.
HashCode==2: Always returns the exact same identity hash code of 1.
This can be used to test code that relies on object identity. The
reason why JavaChampionTest returned Kirk's URL in the example above
is that all objects were returning the same hash code.
HashCode==3: Counts up the hash code values, starting from zero. It
does not look to be thread safe, so multiple threads could generate
objects with the same hash code.
HashCode==4: This seems to have some relation to the memory location
at which the object was created.
HashCode>=5: This is the default algorithm for Java 8 and has a
per-thread seed. It uses Marsaglia's xor-shift scheme to produce
pseudo-random numbers.
I read that it is an memory reference of an object..
No. Object.hashCode() used to return a memory address about 14 years ago. Not since.
what type of value is
What it is depends entirely on what class you're talking about and whether or not it has overridden Object.hashCode().
From OpenJDK sources (JDK8):
Use default of 5 to generate hash codes:
product(intx, hashCode, 5,
"(Unstable) select hashCode generation algorithm")
Some constant data and a random generated number with a seed initiated per thread:
// thread-specific hashCode stream generator state - Marsaglia shift-xor form
_hashStateX = os::random() ;
_hashStateY = 842502087 ;
_hashStateZ = 0x8767 ; // (int)(3579807591LL & 0xffff) ;
_hashStateW = 273326509 ;
Then, this function creates the hashCode (defaulted to 5 as specified above):
static inline intptr_t get_next_hash(Thread * Self, oop obj) {
intptr_t value = 0 ;
if (hashCode == 0) {
// This form uses an unguarded global Park-Miller RNG,
// so it's possible for two threads to race and generate the same RNG.
// On MP system we'll have lots of RW access to a global, so the
// mechanism induces lots of coherency traffic.
value = os::random() ;
} else
if (hashCode == 1) {
// This variation has the property of being stable (idempotent)
// between STW operations. This can be useful in some of the 1-0
// synchronization schemes.
intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3 ;
value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
} else
if (hashCode == 2) {
value = 1 ; // for sensitivity testing
} else
if (hashCode == 3) {
value = ++GVars.hcSequence ;
} else
if (hashCode == 4) {
value = cast_from_oop<intptr_t>(obj) ;
} else {
// Marsaglia's xor-shift scheme with thread-specific state
// This is probably the best overall implementation -- we'll
// likely make this the default in future releases.
unsigned t = Self->_hashStateX ;
t ^= (t << 11) ;
Self->_hashStateX = Self->_hashStateY ;
Self->_hashStateY = Self->_hashStateZ ;
Self->_hashStateZ = Self->_hashStateW ;
unsigned v = Self->_hashStateW ;
v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
Self->_hashStateW = v ;
value = v ;
}
value &= markOopDesc::hash_mask;
if (value == 0) value = 0xBAD ;
assert (value != markOopDesc::no_hash, "invariant") ;
TEVENT (hashCode: GENERATE) ;
return value;
}
So we can see that at least in JDK8 the default is set to random thread specific.
Definition: The String hashCode() method returns the hash code value of the String as an int.
Syntax:
public int hashCode()
The hash code is calculated using the formula below:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
where:
s[i] is the ith character of the string
n is the length of the string
^ denotes exponentiation
Example:
For example if you want to calculate hashcode for string "abc" then we have below details
s[] = {'a', 'b', 'c'}
n = 3
So the hashcode value will be calculated as:
s[0]*31^(2) + s[1]*31^1 + s[2]
= a*31^2 + b*31^1 + c*31^0
= (ASCII value of a = 97, b = 98 and c = 99)
= 97*961 + 98*31 + 99
= 93217 + 3038 + 99
= 96354
So the hashcode value for 'abc' is 96354
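The calculation above can be checked with a short sketch that evaluates the same polynomial by Horner's rule and compares it with String.hashCode(). The class and method names are made up for this example.

```java
final class StringHashByHand {
    // Evaluates s[0]*31^(n-1) + ... + s[n-1] with int overflow, like String.hashCode().
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // Horner's rule: multiply, then add the next char
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polyHash("abc"));   // 96354
        System.out.println("abc".hashCode());  // 96354
    }
}
```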
Object.hashCode(), if memory serves correctly (check the JavaDoc for java.lang.Object), is implementation-dependent, and will change depending on the object (the Sun JVM derives the value from the value of the reference to the object).
Note that if you are implementing any nontrivial class and want to correctly store its instances in a HashMap or HashSet, you MUST override hashCode() and equals(). hashCode() can do whatever you like (it's entirely legal, but suboptimal, to have it return 1), but it's vital that if your equals() method returns true, then the values returned by hashCode() for both objects are equal.
Confusion and lack of understanding of hashCode() and equals() is a big source of bugs. Make sure that you thoroughly familiarize yourself with the JavaDocs for Object.hashCode() and Object.equals(), and I guarantee that the time spent will pay for itself.
From the Javadoc:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#hashCode--
I'm surprised that no one mentioned this. Although it's obvious that for any non-Object class your first action should be to read the source code, many classes simply inherit hashCode() from Object, in which case several different interesting things may happen depending on your JVM implementation. Object.hashCode() delegates to System.identityHashCode(object).
Indeed, using the object's address in memory is ancient history, but many do not realise they can control this behaviour and how Object.hashCode() is computed via the JVM argument -XX:hashCode=N, where N can be a number in [0-5]...
0 – Park-Miller RNG (default before JDK 8, blocking)
1 – f(address, global state)
2 – constant 1
3 – serial counter
4 – object address
5 – Thread-local Xorshift
Depending on the application, you may see unexpected performance hits when hashCode() is called; when that happens, it is likely you are using one of the algorithms that shares global state and/or blocks.
According to the Javadoc of Object.hashCode(), the value is typically obtained by "converting the internal address of the object into an integer". So it is clear that the hashCode() method does not return the internal address of the object as-is. The link is provided below.
https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#hashCode--
To clear it please see following sample code:
public class HashCodeDemo
{
public static void main(String[] args)
{
final int CAPACITY_OF_MAP = 10000000;
/**
* hashCode as key, and Object as value
*/
java.util.HashMap<Integer, Object> hm1 = new java.util.HashMap<Integer, Object>(CAPACITY_OF_MAP);
int noOfDistinceObject = 0;
Object obj = null;
for(int i = 0; i < CAPACITY_OF_MAP; i++)
{
obj = new Object();
hm1.put(obj.hashCode(), new Object());
}
System.out.println("hm1.size() = "+hm1.size());
/**
* hashCode as key, and Object as value
*/
java.util.HashMap<Integer, Object> hm2 = new java.util.HashMap<Integer, Object>(CAPACITY_OF_MAP);
for(int i = 0; i < CAPACITY_OF_MAP; i++)
{
obj = new Object();
/**
* Each object has a unique memory location, so if an object's
* hashCode were its memory location, every hashCode would be unique
* and no object could ever be put into hm2.
*
* If obj's hashCode does not exist in hm1, increment noOfDistinceObject; otherwise add obj to hm2.
*/
if(hm1.get(obj.hashCode()) == null)
{
noOfDistinceObject++;
}
else
{
hm2.put(obj.hashCode(), new Object());
}
}
System.out.println("hm2.size() = "+hm2.size());
System.out.println("noOfDistinceObject = "+noOfDistinceObject);
}
}
Each object has a unique memory location, so if Object's hashCode method returned the memory location, every hash code would be unique as well. But when we run the sample code above, some objects have the same hashCode value as an earlier object and some have a unique one.
So we can say that the hashCode method of the Object class does not return the memory location.
Related
How do we decide on the best implementation of hashCode() method for a collection (assuming that equals method has been overridden correctly) ?
The best implementation? That is a hard question because it depends on the usage pattern.
A reasonably good implementation for nearly all cases was proposed in Josh Bloch's Effective Java, Item 9 (second edition). It is best to look it up there, because the author explains why the approach is good.
A short version
1. Create an int result and assign it a non-zero value.
2. For every field f tested in the equals() method, calculate a hash code c:
   - If the field f is a boolean: calculate (f ? 0 : 1);
   - If the field f is a byte, char, short or int: calculate (int)f;
   - If the field f is a long: calculate (int)(f ^ (f >>> 32));
   - If the field f is a float: calculate Float.floatToIntBits(f);
   - If the field f is a double: calculate Double.doubleToLongBits(f) and handle the return value like every long value;
   - If the field f is an object: use the result of the hashCode() method, or 0 if f == null;
   - If the field f is an array: treat every element as a separate field, calculate the hash value in a recursive fashion and combine the values as described next.
3. Combine the hash value c with result: result = 37 * result + c
4. Return result
This should result in a proper distribution of hash values for most use situations.
If you're happy with the Effective Java implementation recommended by dmeister, you can use a library call instead of rolling your own:
@Override
public int hashCode() {
return Objects.hash(this.firstName, this.lastName);
}
This requires either Guava (com.google.common.base.Objects.hashCode) or the standard library in Java 7 (java.util.Objects.hash) but works the same way.
Although this is linked to the Android documentation (Wayback Machine) and my own code on GitHub, it will work for Java in general. My answer is an extension of dmeister's answer, with code that is much easier to read and understand.
@Override
public int hashCode() {
// Start with a non-zero constant. Prime is preferred
int result = 17;
// Include a hash for each field.
// Primitives
result = 31 * result + (booleanField ? 1 : 0); // 1 bit » 32-bit
result = 31 * result + byteField; // 8 bits » 32-bit
result = 31 * result + charField; // 16 bits » 32-bit
result = 31 * result + shortField; // 16 bits » 32-bit
result = 31 * result + intField; // 32 bits » 32-bit
result = 31 * result + (int)(longField ^ (longField >>> 32)); // 64 bits » 32-bit
result = 31 * result + Float.floatToIntBits(floatField); // 32 bits » 32-bit
long doubleFieldBits = Double.doubleToLongBits(doubleField); // 64 bits (double) » 64-bit (long) » 32-bit (int)
result = 31 * result + (int)(doubleFieldBits ^ (doubleFieldBits >>> 32));
// Objects
result = 31 * result + Arrays.hashCode(arrayField); // var bits » 32-bit
result = 31 * result + referenceField.hashCode(); // var bits » 32-bit (non-nullable)
result = 31 * result + // var bits » 32-bit (nullable)
(nullableReferenceField == null
? 0
: nullableReferenceField.hashCode());
return result;
}
EDIT
Typically, when you override hashCode(), you also want to override equals(...). So for those who will implement or have already implemented equals, here is a good reference from my GitHub...
@Override
public boolean equals(Object o) {
// Optimization (not required).
if (this == o) {
return true;
}
// Return false if the other object has the wrong type, interface, or is null.
if (!(o instanceof MyType)) {
return false;
}
MyType lhs = (MyType) o; // lhs means "left hand side"
// Primitive fields
return booleanField == lhs.booleanField
&& byteField == lhs.byteField
&& charField == lhs.charField
&& shortField == lhs.shortField
&& intField == lhs.intField
&& longField == lhs.longField
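&& Float.compare(floatField, lhs.floatField) == 0 // consistent with Float.floatToIntBits in hashCode()
&& Double.compare(doubleField, lhs.doubleField) == 0 // consistent with Double.doubleToLongBits in hashCode()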
// Arrays
&& Arrays.equals(arrayField, lhs.arrayField)
// Objects
&& referenceField.equals(lhs.referenceField)
&& (nullableReferenceField == null
? lhs.nullableReferenceField == null
: nullableReferenceField.equals(lhs.nullableReferenceField));
}
It is better to use the functionality provided by Eclipse which does a pretty good job and you can put your efforts and energy in developing the business logic.
First make sure that equals is implemented correctly. From an IBM DeveloperWorks article:
Symmetry: For two references, a and b, a.equals(b) if and only if b.equals(a)
Reflexivity: For all non-null references, a.equals(a)
Transitivity: If a.equals(b) and b.equals(c), then a.equals(c)
Then make sure that their relation with hashCode respects the contract (from the same article):
Consistency with hashCode(): Two equal objects must have the same hashCode() value
Finally a good hash function should strive to approach the ideal hash function.
about8.blogspot.com, you said
if equals() returns true for two objects, then hashCode() should return the same value. If equals() returns false, then hashCode() should return different values
I cannot agree with you. If two objects have the same hashcode it doesn't have to mean that they are equal.
If A equals B then A.hashCode() must be equal to B.hashCode()
but
if A.hashCode() equals B.hashCode() it does not mean that A must equal B
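A well-known illustration of this one-way implication, as a minimal sketch: the strings "Aa" and "BB" are not equal, yet String.hashCode() gives both the same value. The class and method names are made up for this example.

```java
final class EqualHashUnequalObjects {
    // True when two strings are different but still share a hash code.
    static boolean collide(String x, String y) {
        return !x.equals(y) && x.hashCode() == y.hashCode();
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode());     // 2112 (65*31 + 97)
        System.out.println("BB".hashCode());     // 2112 (66*31 + 66)
        System.out.println(collide("Aa", "BB")); // true
    }
}
```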
If you use eclipse, you can generate equals() and hashCode() using:
Source -> Generate hashCode() and equals().
Using this function you can decide which fields you want to use for equality and hash code calculation, and Eclipse generates the corresponding methods.
There's a good implementation of Effective Java's hashCode() and equals() logic in Apache Commons Lang. Check out HashCodeBuilder and EqualsBuilder.
Just a quick note to complement the other, more detailed answers (in terms of code):
If I consider the question how-do-i-create-a-hash-table-in-java and especially the jGuru FAQ entry, I believe some other criteria upon which a hash code could be judged are:
synchronization (does the algo support concurrent access or not) ?
fail safe iteration (does the algo detect a collection which changes during iteration)
null value (does the hash code support null value in the collection)
If I understand your question correctly, you have a custom collection class (i.e. a new class that extends from the Collection interface) and you want to implement the hashCode() method.
If your collection class extends AbstractList, then you don't have to worry about it: there is already an implementation of equals() and hashCode() that works by iterating through all the elements and combining their hash codes.
public int hashCode() {
int hashCode = 1;
Iterator i = iterator();
while (i.hasNext()) {
Object obj = i.next();
hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode());
}
return hashCode;
}
Now if what you want is the best way to calculate the hash code for a specific class, I normally use the ^ (bitwise exclusive or) operator to process all fields that I use in the equals method:
public int hashCode(){
return intMember ^ (stringField != null ? stringField.hashCode() : 0);
}
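One caveat worth keeping in mind with XOR, shown as a hedged sketch rather than a verdict on the answer above: XOR is symmetric, so swapping two field values gives the same hash, and two equal values cancel to 0. The helper names xorHash and mulHash are hypothetical.

```java
final class XorHashCaveat {
    // Combines two int fields with XOR, in the style shown above.
    static int xorHash(int a, int b) {
        return a ^ b;
    }

    // Combines the same fields with the 31-multiplier scheme for comparison.
    static int mulHash(int a, int b) {
        return 31 * (31 + a) + b;
    }

    public static void main(String[] args) {
        System.out.println(xorHash(1, 2) == xorHash(2, 1)); // true: order is lost
        System.out.println(xorHash(5, 5));                  // 0: equal values cancel out
        System.out.println(mulHash(1, 2) == mulHash(2, 1)); // false: order matters
    }
}
```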
@about8: there is a pretty serious bug there.
Zam obj1 = new Zam("foo", "bar", "baz");
Zam obj2 = new Zam("fo", "obar", "baz");
These two objects have the same hashcode, because both concatenations yield "foobar".
You probably want something like
public int hashCode() {
    // Hash each field separately instead of hashing the concatenated string.
    return getFoo().hashCode() + getBar().hashCode();
}
(An int is already usable as a hash code, so there is no need to convert the sum to a String and hash that.)
As you specifically asked about collections, I'd like to add an aspect that the other answers haven't mentioned yet: a HashMap doesn't expect its keys to change their hash code once they have been added to the collection. That would defeat the whole purpose...
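A minimal sketch of what goes wrong when a key's hash code changes after insertion, here using a mutable ArrayList as the element of a HashSet. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MutatedKeyDemo {
    // Adds a list to a set, mutates the list, then looks it up again.
    static boolean foundAfterMutation() {
        List<Integer> key = new ArrayList<>();
        key.add(1);
        Set<List<Integer>> set = new HashSet<>();
        set.add(key);   // stored under the hash code of [1]
        key.add(2);     // the hash code of the list is now that of [1, 2]
        return set.contains(key);
    }

    public static void main(String[] args) {
        // The element is still in the set, but the lookup uses the new hash code.
        System.out.println(foundAfterMutation()); // false
    }
}
```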
Use the reflection methods on Apache Commons EqualsBuilder and HashCodeBuilder.
I use a tiny wrapper around Arrays.deepHashCode(...) because it handles arrays supplied as parameters correctly
public static int hash(final Object... objects) {
return Arrays.deepHashCode(objects);
}
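To illustrate why the wrapper matters, here is a sketch comparing it with java.util.Objects.hash when one of the arguments is an array. The class name DeepHashDemo is made up; the hash method is the wrapper from above.

```java
import java.util.Arrays;
import java.util.Objects;

final class DeepHashDemo {
    // The wrapper from the answer: hashes array arguments by content.
    static int hash(final Object... objects) {
        return Arrays.deepHashCode(objects);
    }

    public static void main(String[] args) {
        int[] x = {1, 2};
        int[] y = {1, 2};
        // Content-based: equal array contents give equal results.
        System.out.println(hash("k", x) == hash("k", y)); // true
        // Objects.hash calls hashCode() on each array, which is identity-based,
        // so this comparison is usually false.
        System.out.println(Objects.hash("k", x) == Objects.hash("k", y));
    }
}
```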
any hashing method that evenly distributes the hash value over the possible range is a good implementation. See effective java ( http://books.google.com.au/books?id=ZZOiqZQIbRMC&dq=effective+java&pg=PP1&ots=UZMZ2siN25&sig=kR0n73DHJOn-D77qGj0wOxAxiZw&hl=en&sa=X&oi=book_result&resnum=1&ct=result ) , there is a good tip in there for hashcode implementation (item 9 i think...).
I prefer using the utility methods from the Objects class of the Google Collections library, which help me keep my code clean. Very often equals and hashCode methods are generated from an IDE template, so they are not clean to read.
Here is another approach for JDK 1.7+ that also accounts for superclass logic. I find it pretty convenient: it takes Object's hashCode() into account, depends only on the JDK and needs no extra manual work. Please note that Objects.hash() is null tolerant.
I have not included any equals() implementation, but in reality you will of course need one.
import java.util.Objects;
public class Demo {
public static class A {
private final String param1;
public A(final String param1) {
this.param1 = param1;
}
@Override
public int hashCode() {
return Objects.hash(
super.hashCode(),
this.param1);
}
}
public static class B extends A {
private final String param2;
private final String param3;
public B(
final String param1,
final String param2,
final String param3) {
super(param1);
this.param2 = param2;
this.param3 = param3;
}
@Override
public final int hashCode() {
return Objects.hash(
super.hashCode(),
this.param2,
this.param3);
}
}
public static void main(String [] args) {
A a = new A("A");
B b = new B("A", "B", "C");
System.out.println("A: " + a.hashCode());
System.out.println("B: " + b.hashCode());
}
}
The standard implementation is weak and using it leads to unnecessary collisions. Imagine a
class ListPair {
List<Integer> first;
List<Integer> second;
ListPair(List<Integer> first, List<Integer> second) {
this.first = first;
this.second = second;
}
public int hashCode() {
return Objects.hashCode(first, second);
}
...
}
Now,
new ListPair(List.of(a), List.of(b, c))
and
new ListPair(List.of(b), List.of(a, c))
have the same hashCode, namely 31*(a+b) + c as the multiplier used for List.hashCode gets reused here. Obviously, collisions are unavoidable, but producing needless collisions is just... needless.
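The collision can be reproduced with a short sketch. It assumes java.util.Objects.hash as the combiner (the snippet above writes Objects.hashCode(first, second), which matches Guava's varargs method) and picks a = 1, b = 2, c = 3 as arbitrary values; List.of needs Java 9 or later.

```java
import java.util.List;
import java.util.Objects;

final class ListPairCollision {
    // Assumed combiner: the JDK's Objects.hash, which also uses multiplier 31.
    static int pairHash(List<Integer> first, List<Integer> second) {
        return Objects.hash(first, second);
    }

    public static void main(String[] args) {
        int h1 = pairHash(List.of(1), List.of(2, 3));
        int h2 = pairHash(List.of(2), List.of(1, 3));
        System.out.println(h1);       // 2979
        System.out.println(h2);       // 2979
        System.out.println(h1 == h2); // true: the two different pairs collide
    }
}
```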
There's nothing substantially smart about using 31. The multiplier must be odd in order to avoid losing information (any even multiplier loses at least the most significant bit, multiples of four lose two, etc.). Any odd multiplier is usable. Small multipliers may lead to faster computation (the JIT can use shifts and additions), but given that multiplication has a latency of only three cycles on modern Intel/AMD, this hardly matters. Small multipliers also lead to more collisions for small inputs, which may sometimes be a problem.
Using a prime is pointless as primes have no meaning in the ring Z/(2**32).
So, I'd recommend using a randomly chosen big odd number (feel free to take a prime). As i86/amd64 CPUs can use a shorter instruction for operands fitting in a single signed byte, there is a tiny speed advantage for multipliers like 109. For minimizing collisions, take something like 0x58a54cf5.
Using different multipliers in different places is helpful, but probably not enough to justify the additional work.
When combining hash values, I usually use the combining method that's used in the boost c++ library, namely:
seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
This does a fairly good job of ensuring an even distribution. For some discussion of how this formula works, see the StackOverflow post: Magic number in boost::hash_combine
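A rough Java adaptation of that formula, as a sketch only: Java's int is signed and 32 bits wide, so the unsigned right shift >>> stands in for the shift on an unsigned C++ type. The class and method names are made up for this example.

```java
final class HashCombine {
    // 32-bit adaptation of the boost-style combine step.
    static int combine(int seed, int valueHash) {
        return seed ^ (valueHash + 0x9e3779b9 + (seed << 6) + (seed >>> 2));
    }

    // Folds the hash codes of several objects into one value, in order.
    static int hashAll(Object... values) {
        int seed = 0;
        for (Object v : values) {
            seed = combine(seed, v == null ? 0 : v.hashCode());
        }
        return seed;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(combine(0, 0))); // 9e3779b9
        // The result depends on the order of the combined values.
        System.out.println(hashAll(1, 2) != hashAll(2, 1));      // true
    }
}
```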
There's a good discussion of different hash functions at: http://burtleburtle.net/bob/hash/doobs.html
For a simple class it is often easiest to implement hashCode() based on the class fields which are checked by the equals() implementation.
public class Zam {
private String foo;
private String bar;
private String somethingElse;
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
Zam otherObj = (Zam)obj;
if ((getFoo() == null && otherObj.getFoo() == null) || (getFoo() != null && getFoo().equals(otherObj.getFoo()))) {
if ((getBar() == null && otherObj.getBar() == null) || (getBar() != null && getBar().equals(otherObj.getBar()))) {
return true;
}
}
return false;
}
public int hashCode() {
return (getFoo() + getBar()).hashCode();
}
public String getFoo() {
return foo;
}
public String getBar() {
return bar;
}
}
The most important thing is to keep hashCode() and equals() consistent: if equals() returns true for two objects, then hashCode() should return the same value. If equals() returns false, then hashCode() should return different values.
I am trying to implement an unique hashCode based on six different values. My Class has the following attributes:
private int id_place;
private String algorithm;
private Date mission_date;
private int mission_hour;
private int x;
private int y;
I am calculating the hashCode as following:
id_place * (7 * algorithm.hashCode()) + (31 * mission_date.hashCode()) + (23 * mission_hour + 89089) + (x * 19 + 67067) + (y * 11 + 97097);
How can I turn it into an unique hashCode? I'm not confident it is unique...
It doesn't have to be unique and it cannot be unique. hashCode() returns an int (32 bits), which means it could be unique if you only had one int property and nothing else.
The Integer class can (and does) have a unique hashCode(), but few other classes do.
Since you have multiple properties, some of which are int, a hashCode() that is a function of these properties can't be unique.
You should strive for a hashCode() function that gives a wide range of different values for different combinations of your properties, but it cannot be unique.
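Even a single 64-bit field is enough to rule out uniqueness, since it has to be squeezed into 32 bits. A minimal sketch using the JDK's own Long.hashCode (the class and method names are made up for this example):

```java
final class LongHashCollision {
    // Long.hashCode folds the upper and lower 32 bits together with XOR.
    static int hashOf(long value) {
        return Long.hashCode(value);
    }

    public static void main(String[] args) {
        System.out.println(hashOf(0L));  // 0
        System.out.println(hashOf(-1L)); // 0: all-ones upper and lower halves cancel
    }
}
```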
The hash codes of two different objects need not be unique. According to https://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode() -
Whenever it is invoked on the same object more than once during an execution of a Java application, hashCode() must consistently return the same value, provided no information used in equals comparisons on the object is modified. This value needs not remain consistent from one execution of an application to another execution of the same application
If two objects are equal according to the equals(Object) method, then calling the hashCode() method on each of the two objects must produce the same value.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
So you don't have to create a hashCode() function that returns a distinct hash code every time.
Unique is not a hard requirement, but the more unique the hash code is, the better.
Note first that the hash code in general is used for a HashMap, as index into a 'bucket.' Hence optimally it should be unique modulo the bucket size, the number of slots in the bucket. However this may vary, when the map grows.
But okay, towards an optimal hash code:
Ranges are important: if x and y were in 0..255, they could be packed uniquely into two bytes, or, if in 0..999, as y*1000+x. For a LocalDateTime, one could take the long value in seconds (instead of ms or ns) counted from, say, 2012-01-01, so you might assume a range from 0 up to, say, two years in the future.
You can explore existing test data or generate plausible test data. One can then mathematically optimize the hash code function by its more or less arbitrary coefficients (7, 13, 23). This is linear optimisation, but one can also do it by simple trial and error: counting the clashes for varying (A, B, C).
//int[] coefficients = ...;
int[][] coefficientsCandidates = new int[NUM_OF_CANDIDATES][NUM_OF_COEFFS];
...
int[] collisionCounts = new int[NUM_OF_CANDIDATES];
for (Data data : allTestData) {
... update collisionCounts for every candidate
}
... take the candidate with smallest collision count
... or sort by collisionCounts and pick other candidates to try out
In general such evaluation code is not needed for a working hash code, but it can detect bad hash codes where some pseudo-randomness goes wrong, for instance when a factor is far too large for the range (weekday * 1000), so that holes appear in the value range.
But also one has to say in all honesty, that all this effort probably really is not needed.
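A runnable sketch of the trial-and-error idea, under simplifying assumptions: only two hypothetical int fields x and y, test data that covers every pair in a small range, and a hash of the form multiplier * x + y. All names here are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class MultiplierTrial {
    // Hypothetical candidate hash for two int fields.
    static int hash(int multiplier, int x, int y) {
        return multiplier * x + y;
    }

    // Counts how many (x, y) pairs in [0, range) x [0, range) reuse an earlier hash value.
    static int countCollisions(int multiplier, int range) {
        Set<Integer> seen = new HashSet<>();
        int collisions = 0;
        for (int x = 0; x < range; x++) {
            for (int y = 0; y < range; y++) {
                if (!seen.add(hash(multiplier, x, y))) {
                    collisions++;
                }
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        for (int m : new int[] {1, 7, 31, 1000}) {
            System.out.println(m + ": " + countCollisions(m, 100));
        }
    }
}
```

With x and y both below 100, a multiplier of 1000 packs the pair without any clash, in line with the y*1000+x remark above, while small multipliers produce many clashes.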
In Eclipse, there is a function that generates the method public int hashCode() for you. I used the class attributes you provided and the result is as follows:
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((algorithm == null) ? 0 : algorithm.hashCode());
result = prime * result + id_place;
result = prime * result + ((mission_date == null) ? 0 : mission_date.hashCode());
result = prime * result + mission_hour;
result = prime * result + x;
result = prime * result + y;
return result;
}
It looks a lot like your calculation. However, as Andy Turner pointed out in a comment to your question and Eran in an answer, you simply cannot make a unique hash code for every single instance of an object if their number exceeds the maximum number of possible different hash codes.
Because you have multiple fields, use:
public int hashCode() {
return Objects.hash(id_place, algorithm, mission_date, mission_hour, x, y);
}
If objA.equals(objB) is true, then objA and objB must return the same hash code.
If objA.equals(objB) is false, then objA and objB might still return the same hash code; if your hashing algorithm happens to return different hash codes in this case, that is good for performance.
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
ClassA classA = (ClassA) o;
return id_place == classA.id_place &&
mission_hour == classA.mission_hour &&
x == classA.x &&
y == classA.y &&
Objects.equals(algorithm, classA.algorithm) &&
Objects.equals(mission_date, classA.mission_date);
}
In this code I have declared and initialized a String variable and printed its hashcode, then reassigned it to another value and invoked the garbage collector to clear the dereferenced objects.
But when I reinitialize the String variable to its original value and print the hashcode, the same hashcode is getting printed. How?
public class TestGarbage1 {
public static void main(String args[]) {
String m = "JAVA";
System.out.println(m.hashCode());
m = "java";
System.gc();
System.out.println(m.hashCode());
m = "JAVA";
System.out.println(m.hashCode());
}
}
Hash code relates to object equality, not identity.
a.equals(b) implies a.hashCode() == b.hashCode()
(Provided the two methods have been implemented consistently)
Even if a gc were actually taking place here (and you weren't simply referencing strings in the constant pool), you wouldn't expect two string instances with the same sequence of chars not to be equal - hence, their hash codes will also be the same.
String a = new String("whatever");
String b = new String(a);
System.out.println(a == b); // false, they are not the same instance
System.out.println(a.equals(b)); // true, they represent the same string
System.out.println(a.hashCode() == b.hashCode()); // true, they represent the same string
I think you are misunderstanding something about how hashcodes work. Without going into too much detail, in Java, hashcodes are used for many things. One example is finding an item in a hash-based data structure like HashMap or HashSet.
A hash of the same value should always return the same hash. In this case, a hash of "JAVA" should never change because then it will break the agreement set forth in Java.
I think it's too complicated to go about how hashcodes for String are calculated. You can read more about it here. I can give you an example though.
Let's say you have a class Fruit and it has fields like shape, color and weight.
You must implement equals AND hashCode for this class. It is very important to do both, because otherwise you break the way HashMap works. Let's say you write this for your hashCode() method.
@Override
public int hashCode() {
int hash = 1;
hash = hash * 17 + this.color;
hash = hash * 31 + this.shape.hashCode();
hash = hash * 31 + this.weight;
return hash;
}
This will generate the same hash value EVERY TIME for the two Fruit instances that are equal. That is exactly what you would want.
Really quick, how is this actually used in a HashMap? Let's say you want to look up foo = new Fruit(). HashMap first calculates foo.hashCode() and checks whether there is anything in the bucket for that hash code. If there is, it calls equals() on the entries in that bucket until one returns true (or it runs out of entries). It must do this because different keys can collide on the same hash code. That's why equals and hashCode should be implemented together.
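Here is a minimal sketch of the whole thing put together. The field types (a String shape, an int color code, an int weight) are my assumptions, since the description above does not pin them down, and FruitLookupDemo is just a made-up driver class. I use Objects.hash for brevity instead of the hand-rolled multiplication.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Sketch of a Fruit key with equals() and hashCode() built from the same fields.
final class Fruit {
    private final String shape;
    private final int color;  // assumed: an int color code
    private final int weight; // assumed: weight in whole grams

    Fruit(String shape, int color, int weight) {
        this.shape = shape;
        this.color = color;
        this.weight = weight;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Fruit)) return false;
        Fruit other = (Fruit) o;
        return color == other.color
                && weight == other.weight
                && Objects.equals(shape, other.shape);
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields that equals() compares.
        return Objects.hash(shape, color, weight);
    }
}

class FruitLookupDemo {
    static String lookup() {
        Map<Fruit, String> names = new HashMap<>();
        names.put(new Fruit("round", 1, 150), "apple");
        // A different but equal instance lands in the same bucket and matches via equals().
        return names.get(new Fruit("round", 1, 150));
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // apple
    }
}
```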
I am using HashMap.
Below is example code.
Classes MyKey and MyValue simply inherit from Object.
Java documentation says for Object and methods hashCode() and equals():
"As much as is reasonably practical, the hashCode method defined
by class Object does return distinct integers for distinct objects.
(This is typically implemented by converting the internal address
of the object into an integer, but this implementation technique
is not required by the JavaTM programming language.)"
"The equals method for class Object implements the most discriminating
possible equivalence relation on objects; that is, for any non-null reference
values x and y, this method returns true if and only if x and y refer
to the same object (x == y has the value true). "
My question is:
Can I trust that HashMap works in my example?
If not what would be right way to put simple objects into map without rewriting methods
hashCode() and equals()?
I am not sure but I have heard that Java may change location and address
of user objects during execution of program (Was it GC which may do that?)
If address and hash code of key2 have changed before the line
MyValue v = m.get(key2);
then calling m.get(key2) would return wrong value, null?
If this is true then I believe that also IdentityHashMap()
is useless for same reasons.
class MyKey
{
Integer v;
//<Perhaps more fields>
MyKey(Integer v) {this.v=v;}
}
class MyValue
{
String s;
//<Perhaps more fields>
MyValue(String s) {this.s = s;}
}
Then some code:
Map<MyKey,MyValue> m = new HashMap<MyKey,MyValue>();
MyKey key1 = new MyKey(5);
MyKey key2 = new MyKey(6);
MyKey key3 = new MyKey(7);
m.put(key1, new MyValue("AAA"));
m.put(key2, new MyValue("BBB"));
m.put(key3, new MyValue("CCC"));
.
.
//Is it sure that I will get value "BBB" here
//if entry with key2 has not been removed from map m?
MyValue v = m.get(key2);
System.out.println("s="+v.s);
Can I trust that HashMap works in my example? If not what would be right way to put simple objects into map without rewriting methods hashCode() and equals()?
You cannot avoid providing sensible hashCode and equals methods; they are required for HashMap and other hash collections to work (with the exception of IdentityHashMap).
I am not sure but I have heard that Java may change location and address of user objects during execution of program (Was it GC which may do that?)
While this is true, it has nothing to do with your main question.
If address and hash code of key2 have changed before the line
The address and hashCode have nothing to do with one another. If the address changes it doesn't change the hashCode and if you change the hashCode it doesn't change the address.
If this is true then I believe that also IdentityHashMap() is useless for same reasons.
Even if you assume hashCode is useless, this doesn't affect IdentityHashMap, because it doesn't use the hashCode or equals methods; it uses System.identityHashCode() and == instead.
Objects are basically allocated contiguously in memory from the Eden space. If you run
Object[] objects = new Object[20];
for (int i = 0; i < objects.length; i++)
objects[i] = new Object();
Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
theUnsafe.setAccessible(true);
Unsafe unsafe = (Unsafe) theUnsafe.get(null);
for (int i = 0; i < objects.length; i++) {
int location = unsafe.getInt(objects, Unsafe.ARRAY_OBJECT_BASE_OFFSET + Unsafe.ARRAY_OBJECT_INDEX_SCALE * i);
System.out.println(Integer.toHexString(location) + ": hashCode=" + Integer.toHexString(objects[i].hashCode()));
}
you might expect the hash codes to be consecutive if they followed the memory locations, but they aren't:
eac89d10: hashCode=634e3372
eac89d20: hashCode=2313b44d
eac89d30: hashCode=62a23d38
eac89d40: hashCode=9615a1f
eac89d50: hashCode=233aa44
eac89d60: hashCode=59243f75
eac89d70: hashCode=5ac2480b
eac89d80: hashCode=907f8ba
eac89d90: hashCode=6a5a7ff7
eac89da0: hashCode=5b8767ad
eac89db0: hashCode=50ba0dfc
eac89dc0: hashCode=2198a037
eac89dd0: hashCode=2b3e8c1c
eac89de0: hashCode=17609872
eac89df0: hashCode=46b8705b
eac89e00: hashCode=76d88aa2
eac89e10: hashCode=275cea3
eac89e20: hashCode=4513098
eac89e30: hashCode=6e4d4d5e
eac89e40: hashCode=15128ee5
Java has four different ways of encoding references across 32-bit and 64-bit JVMs; however, if your maximum heap size is less than 2 GB it will be a simple 32-bit address, as it was when I ran this example.
Can I trust that HashMap works in my example? If not what would be right way to put simple objects into map without rewriting methods hashCode() and equals()?
Your example does not provide hashCode or equals, so it will use the defaults.
The defaults work with the object identity, which means o.equals(o2) will be true only if o and o2 refer to the same object.
MyKey m = new MyKey(1);
MyKey m2 = new MyKey(1);
MyKey m3 = m;
map.put(m,...);
map.get(m);//works
map.get(m2); //different object
map.get(m3);//works same object
I am not sure but I have heard that Java may change location and address of user objects during execution of program (Was it GC which may do that?)
The address of an object is not relevant. While the default hashCode might be derived from it, that happens only once for each object, and the value then stays the same.
First off, the default hashcode of an object does not change after it is constructed so IdentityHashMap is not useless. Java may move objects around in memory but it doesn't change their identity hash code.
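A quick way to convince yourself is the sketch below. Note that System.gc() is only a request, so this does not prove the object was actually moved; it only shows that the identity hash code and an IdentityHashMap lookup are unaffected by whatever the collector did. IdentityHashDemo is a made-up class name.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Sketch: the identity hash code of a live object does not change over time.
class IdentityHashDemo {
    static boolean hashStableAcrossGc() {
        Object key = new Object();
        int before = System.identityHashCode(key);
        System.gc(); // a hint only; the JVM may or may not relocate the object
        int after = System.identityHashCode(key);
        return before == after;
    }

    static String identityLookupAfterGc() {
        Map<Object, String> map = new IdentityHashMap<>();
        Object key = new Object();
        map.put(key, "BBB");
        System.gc();
        return map.get(key); // looked up by reference (==), not by equals()
    }

    public static void main(String[] args) {
        System.out.println(hashStableAcrossGc());    // true
        System.out.println(identityLookupAfterGc()); // BBB
    }
}
```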
Secondly, if you don't define your own equals() method, then every object you construct will not be equals() to any other object. This means if you want to use them as keys in a HashMap (or IdentityHashMap) you'll only be able to retrieve them by using the original object.
For example:
MyKey key = new MyKey(5);
m.put(key, value);
...
MyKey newKey = new MyKey(5);
m.get(newKey); // Will not find the value
As newKey and key are different objects, they are not ==. This is why you are recommended to override equals() (and hashCode()) for your objects.
A HashMap still works if you don't override equals() and hashCode() on your objects, but it will generally not do what you want. In this case, it becomes equivalent to IdentityHashMap.
If I override either method on a class, I must make sure that if A.equals(B) == true then A.hashCode() == B.hashCode() is also true.
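To make the difference concrete, here is a small sketch with two key classes. PlainKey mirrors the MyKey from the question (no overrides), while ValueKey is a hypothetical variant that compares by its field; both names, and KeyDemo, are mine.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Key that inherits equals()/hashCode() from Object: identity semantics.
class PlainKey {
    final Integer v;
    PlainKey(Integer v) { this.v = v; }
}

// Key that overrides both methods: value semantics.
class ValueKey {
    final Integer v;
    ValueKey(Integer v) { this.v = v; }

    @Override
    public boolean equals(Object o) {
        return o instanceof ValueKey && Objects.equals(v, ((ValueKey) o).v);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(v);
    }
}

class KeyDemo {
    static String plainLookup() {
        Map<PlainKey, String> m = new HashMap<>();
        m.put(new PlainKey(5), "AAA");
        return m.get(new PlainKey(5)); // null: a different object is never equal
    }

    static String valueLookup() {
        Map<ValueKey, String> m = new HashMap<>();
        m.put(new ValueKey(5), "AAA");
        return m.get(new ValueKey(5)); // "AAA": equal by value
    }

    public static void main(String[] args) {
        System.out.println(plainLookup()); // null
        System.out.println(valueLookup()); // AAA
    }
}
```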
Can someone show me a simple example where if this is violated, it'll cause a problem? I think it has something to do with if you use that class as the type of keys to Hashmap?
Sure:
public class Test {
private final int m, n;
public Test(int m, int n) {
this.m = m;
this.n = n;
}
public int hashCode() { return n * m; }
public boolean equals(Object ob) {
if (ob == null || ob.getClass() != Test.class) return false;
Test other = (Test)ob;
return m == other.m;
}
}
with:
Set<Test> set = new HashSet<Test>();
set.add(new Test(3,4));
boolean b = set.contains(new Test(3, 10)); // false
Technically that should be true because m == 3 in both cases.
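For comparison, here is a sketch of one way to repair it: make hashCode() depend only on the field that equals() compares. ConsistentTest and ConsistentTestDemo are made-up names so they don't clash with the class above.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: same equals() rule as before (only m matters), but hashCode() now agrees with it.
final class ConsistentTest {
    private final int m, n;

    ConsistentTest(int m, int n) {
        this.m = m;
        this.n = n;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(m); // n is deliberately ignored, as in equals()
    }

    @Override
    public boolean equals(Object ob) {
        if (!(ob instanceof ConsistentTest)) return false; // also handles null
        return m == ((ConsistentTest) ob).m;
    }
}

class ConsistentTestDemo {
    static boolean containsEqualElement() {
        Set<ConsistentTest> set = new HashSet<>();
        set.add(new ConsistentTest(3, 4));
        return set.contains(new ConsistentTest(3, 10));
    }

    public static void main(String[] args) {
        System.out.println(containsEqualElement()); // true
    }
}
```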
In general a HashMap works like this: it has a variable number of what are commonly called "buckets". The number of buckets can change over time (as entries are added and removed) but it is always a power of 2.
Let's say a given HashMap has 16 buckets. When you call put() to add an entry, the hashCode() of the key is calculated and then a mask is taken depending on the size of the buckets. If you (bitwise) AND the hashCode() with 15 (0x0F) you will get the last 4 bits, equaling a number between 0 and 15 inclusive:
int factor = 4;
int buckets = 1 << factor; // 16
int mask = buckets - 1; // 15
int code = key.hashCode();
int dest = code & mask; // a number from 0 to 15 inclusive
Now if there is already an entry in that bucket you have what's called a collision. There are multiple ways of dealing with this, but the one used by HashMap (and probably the most common overall) is bucketing, also known as separate chaining: all the entries with the same masked hashCode are put in a list of some kind.
So to find if a given key is in the map already:
Calculate the masked hash code;
Find the appropriate bucket;
If it's empty, key not found;
If it isn't empty, loop through all entries in the bucket checking equals().
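The steps above can be sketched as a toy hash set. This is only an illustration under simplifying assumptions: TinyHashSet is a made-up class, it never resizes, and it uses the raw hashCode() with a mask, whereas the real java.util.HashMap also mixes the high bits of the hash into the low ones before masking.

```java
import java.util.ArrayList;
import java.util.List;

// Toy hash set with a fixed, power-of-two number of buckets and list-based chaining.
class TinyHashSet<E> {
    private final List<List<E>> buckets = new ArrayList<>();
    private final int mask;

    // bucketCount is assumed to be a power of two, e.g. 16.
    TinyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
        mask = bucketCount - 1;
    }

    // Steps 1 and 2: mask the hash code and pick the bucket.
    private List<E> bucketFor(Object key) {
        return buckets.get(key.hashCode() & mask);
    }

    boolean add(E element) {
        List<E> bucket = bucketFor(element);
        if (bucket.contains(element)) return false; // List.contains uses equals()
        bucket.add(element);
        return true;
    }

    boolean contains(Object key) {
        List<E> bucket = bucketFor(key);
        if (bucket.isEmpty()) return false;       // Step 3: empty bucket, not found
        for (E candidate : bucket) {              // Step 4: linear scan with equals()
            if (candidate.equals(key)) return true;
        }
        return false;
    }
}
```

Usage is what you would expect: after set.add("JAVA"), a call to set.contains(new String("JAVA")) finds it, because equal strings hash to the same bucket and then match on equals().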
Looking through a bucket is a linear (O(n)) operation but it's on a small subset. The hashcode bucket determination is essentially constant (O(1)). If buckets are sufficiently small then access to a HashMap is usually described as "near O(1)".
You can make a couple of observations about this.
Firstly, if you have a bunch of objects that all return 42 as their hash code a HashMap will still work but it will operate as an expensive list. Access will be O(n) (as everything will be in the same bucket regardless of the number of buckets). I've actually been asked this in an interview.
Secondly, returning to your original point, if two objects are equal (meaning a.equals(b) == b.equals(a) == true) but have different hash codes then the HashMap will go looking in (probably) the wrong bucket resulting in unpredictable and undefined behaviour.
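The first observation is easy to check. In this sketch every key returns 42, and lookups still succeed because equals() sorts things out inside the single shared bucket; it is just slower. ConstantHashKey and ConstantHashDemo are made-up names.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a legal but terrible hashCode(). Equal keys still hash equally, so the contract holds.
final class ConstantHashKey {
    private final int id;

    ConstantHashKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ConstantHashKey && ((ConstantHashKey) o).id == id;
    }

    @Override
    public int hashCode() {
        return 42; // every key collides with every other key
    }
}

class ConstantHashDemo {
    static boolean allFound(int count) {
        Map<ConstantHashKey, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new ConstantHashKey(i), i);
        }
        for (int i = 0; i < count; i++) {
            Integer found = map.get(new ConstantHashKey(i));
            if (found == null || found != i) return false;
        }
        return map.size() == count;
    }

    public static void main(String[] args) {
        System.out.println(allFound(200)); // true
    }
}
```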
This is discussed in Item 8: Always override hashCode when you override equals of Joshua Bloch's Effective Java:
A common source of bugs is the failure to override the hashCode method. You must
override hashCode in every class that overrides equals. Failure to do so will
result in a violation of the general contract for Object.hashCode, which will
prevent your class from functioning properly in conjunction with all hash-based
collections, including HashMap, HashSet, and Hashtable.
Here is the contract, copied from the
java.lang.Object specification:
Whenever it is invoked on the same object more than once during an execution of an application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
The key provision that is violated when you fail to override hashCode is
the second one: Equal objects must have equal hash codes. Two distinct
instances may be logically equal according to the class’s equals method, but to
the Object class’s hashCode method, they’re just two objects with nothing much
in common. Therefore object’s hashCode method returns two seemingly random
numbers instead of two equal numbers as required by the contract.
For example, consider the following simplistic PhoneNumber class, whose
equals method is constructed according to the recipe in Item 7:
public final class PhoneNumber {
private final short areaCode;
private final short exchange;
private final short extension;
public PhoneNumber(int areaCode, int exchange,
int extension) {
rangeCheck(areaCode, 999, "area code");
rangeCheck(exchange, 999, "exchange");
rangeCheck(extension, 9999, "extension");
this.areaCode = (short) areaCode;
this.exchange = (short) exchange;
this.extension = (short) extension;
}
private static void rangeCheck(int arg, int max,
String name) {
if (arg < 0 || arg > max)
throw new IllegalArgumentException(name +": " + arg);
}
public boolean equals(Object o) {
if (o == this)
return true;
if (!(o instanceof PhoneNumber))
return false;
PhoneNumber pn = (PhoneNumber)o;
return pn.extension == extension &&
pn.exchange == exchange &&
pn.areaCode == areaCode;
}
// No hashCode method!
... // Remainder omitted
}
Suppose you attempt to use this class with a HashMap:
Map m = new HashMap();
m.put(new PhoneNumber(408, 867, 5309), "Jenny");
At this point, you might expect m.get(new PhoneNumber(408, 867, 5309)) to return "Jenny", but it returns null. Notice that two PhoneNumber instances are involved: One is used for insertion into the HashMap, and a second, equal, instance is used for (attempted) retrieval. The PhoneNumber class’s failure to override hashCode causes the two equal instances to have unequal hash codes, in violation of the hashCode contract. Therefore the get method looks for the phone number in a different hash bucket from the one in which it was stored by the put method. Fixing this problem is as simple as providing a proper hashCode method for the PhoneNumber class.
[...]
See Chapter 3 for the full content.
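As a rough idea of what such a fix could look like, here is a sketch. It is not copied from the book: the class name PhoneNumberWithHash is invented to keep it apart from the quoted class, the range checks are left out, and the multiply-by-31 accumulation is just one common recipe.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the PhoneNumber idea with a hashCode() that uses the same fields as equals().
final class PhoneNumberWithHash {
    private final short areaCode;
    private final short exchange;
    private final short extension;

    PhoneNumberWithHash(int areaCode, int exchange, int extension) {
        this.areaCode = (short) areaCode;
        this.exchange = (short) exchange;
        this.extension = (short) extension;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this) return true;
        if (!(o instanceof PhoneNumberWithHash)) return false;
        PhoneNumberWithHash pn = (PhoneNumberWithHash) o;
        return pn.extension == extension
                && pn.exchange == exchange
                && pn.areaCode == areaCode;
    }

    @Override
    public int hashCode() {
        int result = 17;
        result = 31 * result + areaCode;
        result = 31 * result + exchange;
        result = 31 * result + extension;
        return result;
    }
}

class PhoneNumberDemo {
    static String lookup() {
        Map<PhoneNumberWithHash, String> m = new HashMap<>();
        m.put(new PhoneNumberWithHash(408, 867, 5309), "Jenny");
        return m.get(new PhoneNumberWithHash(408, 867, 5309));
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // Jenny
    }
}
```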
Containers like HashSet and HashMap rely on the hash function to determine where to put an entry and where to look for it later. If A.equals(B), the container expects A and B to land in the same place. If you put A into a HashMap with value V and then look up B, you should get V back (since you've said A.equals(B)). But if A.hashCode() != B.hashCode(), the container may look in the wrong place and not find it.
Here's a little example:
Set<Foo> myFoos = new HashSet<Foo>();
Foo firstFoo = new Foo(123,"Alpha");
myFoos.add(firstFoo);
// later in the processing you get another Foo from somewhere
Foo someFoo = //use imagination here...;
// maybe you get it from a database... and it's equal to Foo(123,"Alpha")
if (myFoos.contains(someFoo)) {
// maybe you win a million bucks.
}
So, imagine that the hashCode that gets created for firstFoo is 99999 and it winds up at a specific spot in the myFoos HashSet. Later when you get the someFoo and you look for it in the myFoos HashSet, it needs to generate the same hashCode so you can find it.
It's exactly because of hash tables.
Because of the possibility of hash code collisions, hash tables need to check equality as well, otherwise the table can't determine if it found the object it was looking for, or one with the same hash code. So every get() in a hash table calls key.equals(potentialMatch) before returning a value.
If equals() and hashCode() are inconsistent you can get very inconsistent behavior. Say for two objects, a and b, a.equals(b) returns true, but a.hashCode() != b.hashCode(). Insert a and a HashSet will return false for .contains(b), but a List created from that set will return true (because the list doesn't use hash codes).
HashSet set = new HashSet();
set.add(a);
set.contains(b); // false
new ArrayList(set).contains(b); // true
Obviously, that could be bad.
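If you want to reproduce this, the sketch below uses a deliberately broken class. Inconsistent, its salt field, and InconsistentDemo are all made up for the demonstration: equals() looks only at id while hashCode() returns the unrelated salt, so two equal objects can be given different hash codes on purpose.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Set;

// Deliberately broken: equals() and hashCode() are based on different fields.
final class Inconsistent {
    private final int id;
    private final int salt;

    Inconsistent(int id, int salt) {
        this.id = id;
        this.salt = salt;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Inconsistent && ((Inconsistent) o).id == id;
    }

    @Override
    public int hashCode() {
        return salt; // ignores id, so equal objects may hash differently
    }
}

class InconsistentDemo {
    static boolean foundInSet() {
        Set<Inconsistent> set = new HashSet<>();
        set.add(new Inconsistent(1, 1));
        return set.contains(new Inconsistent(1, 2)); // hash codes differ: not found
    }

    static boolean foundInList() {
        Set<Inconsistent> set = new HashSet<>();
        set.add(new Inconsistent(1, 1));
        return new ArrayList<>(set).contains(new Inconsistent(1, 2)); // equals() only
    }

    public static void main(String[] args) {
        System.out.println(foundInSet());  // false
        System.out.println(foundInList()); // true
    }
}
```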
The idea behind this is that two objects are "equal" if all of their fields have equal values. If all of the fields have equal values, the two objects should have the same hash value.