Create an unique hashCode based on many values - java

I am trying to implement an unique hashCode based on six different values. My Class has the following attributes:
private int id_place;
private String algorithm;
private Date mission_date;
private int mission_hour;
private int x;
private int y;
I am calculating the hashCode as following:
id_place * (7 * algorithm.hashCode()) + (31 * mission_date.hashCode()) + (23 * mission_hour + 89089) + (x * 19 + 67067) + (y * 11 + 97097);
How can I turn it into an unique hashCode? I'm not confident it is unique...

It doesn't have to be unique and it cannot be unique. hashCode() returns an int (32 bits), which means it could be unique if you only had one int property and nothing else.
The Integer class can (and does) have a unique hashCode(), but few other classes do.
Since you have multiple properties, some of which are int, a hashCode() that is a function of these properties can't be unique.
You should strive for a hasCode() function that gives a wide range of different values for different combinations of your properties, but it cannot be unique.

HashCode for two different object needs not be unique. According to https://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode() -
Whenever it is invoked on the same object more than once during an execution of a Java application, hashCode() must consistently return the same value, provided no information used in equals comparisons on the object is modified. This value needs not remain consistent from one execution of an application to another execution of the same application
If two objects are equal according to the equals(Object) method, then calling the hashCode() method on each of the two objects must produce the same value.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
So , you don't have to create hashCode() function which returns distinct hash code everytime.

Unique is not a hard requirement, but the more unique the hash code is, the better.
Note first that the hash code in general is used for a HashMap, as index into a 'bucket.' Hence optimally it should be unique modulo the bucket size, the number of slots in the bucket. However this may vary, when the map grows.
But okay, towards an optimal hash code:
Ranges are important; if x and y where in 0..255, then they could be packed uniquely in two bytes, or when 0..999 then y*1000+x. For LocalDateTime, if one could take the long in seconds (i.o. ms or ns), and since 2012-01-01 so you might assume a range from 0 upto say two years in the future.
You can explore existing or generate plausible test data. One then can mathematically optimize your hash code function by their coincidental coefficients (7, 13, 23). This is linear optimisation, but one can also do it by simple trial-and-error: counting the clashes for varying (A, B, C).
//int[] coeffients = ...;
int[][] coefficientsCandidates = new int[NUM_OF_CANDIDATES][NUM_OF_COEFFS];
...
int[] collisionCounts = new int[NUM_OF_CANDIDATES];
for (Data data : allTestData) {
... update collisionCounts for every candidate
}
... take the candidate with smallest collision count
... or sort by collisionCounts and pick other candidates to try out
In general such evaluation code is not needed for a working hash code, but especially it might detect bad hash codes, were there is some pseudo-randomness going wrong. For instance if a factor is way too large for the range (weekday * 1000), so value holes appear.
But also one has to say in all honesty, that all this effort probably really is not needed.

In Eclipse, there is a function that generates the method public int hashCode() for you. I used the class attributes you provided and the result is as follows:
#Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((algorithm == null) ? 0 : algorithm.hashCode());
result = prime * result + id_place;
result = prime * result + ((mission_date == null) ? 0 : mission_date.hashCode());
result = prime * result + mission_hour;
result = prime * result + x;
result = prime * result + y;
return result;
}
It looks a lot like your calculation. However, as Andy Turner pointed out in a comment to your question and Eran in an answer, you simply cannot make a unique hash code for every single instance of an object if their amount exceeds the maximum amount of possible different hash codes.

Because you have multiple fields, use:
public int hashCode() {
return Objects.hash(id_place, algorithm, mission_date, mission_hour, x, y);
}
If objA.equals(objB) is true, then objA and objB must return the same hash code.
If objA.equals(objB) is false, then objA and objB might return the same hash code, if your hashing algo happens to return different hashCodes in this case, it ise good for performance reasons.
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
ClassA classA = (ClassA) o;
return id_place == classA.id_place &&
mission_hour == classA.mission_hour &&
x == classA.x &&
y == classA.y &&
Objects.equals(algorithm, classA.algorithm) &&
Objects.equals(mission_date, classA.mission_date);
}

Related

Uses of hashcode in Java apart from hashing collections [duplicate]

In Java, obj.hashCode() returns some value. What is the use of this hash code in programming?
hashCode() is used for bucketing in Hash implementations like HashMap, HashTable, HashSet, etc.
The value received from hashCode() is used as the bucket number for storing elements of the set/map. This bucket number is the address of the element inside the set/map.
When you do contains() it will take the hash code of the element, then look for the bucket where hash code points to. If more than 1 element is found in the same bucket (multiple objects can have the same hash code), then it uses the equals() method to evaluate if the objects are equal, and then decide if contains() is true or false, or decide if element could be added in the set or not.
From the Javadoc:
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java programming language.)
hashCode() is a function that takes an object and outputs a numeric value. The hashcode for an object is always the same if the object doesn't change.
Functions like HashMap, HashTable, HashSet, etc. that need to store objects will use a hashCode modulo the size of their internal array to choose in what "memory position" (i.e. array position) to store the object.
There are some cases where collisions may occur (two objects end up with the same hashcode), and that, of course, needs to be solved carefully.
The value returned by hashCode() is the object's hash code, which is the object's memory address in hexadecimal.
By definition, if two objects are equal, their hash code must also be equal. If you override the equals() method, you change the way two objects are equated and Object's implementation of hashCode() is no longer valid. Therefore, if you override the equals() method, you must also override the hashCode() method as well.
This answer is from the java SE 8 official tutorial documentation
A hashcode is a number generated from any object.
This is what allows objects to be stored/retrieved quickly in a Hashtable.
Imagine the following simple example:
On the table in front of you. you have nine boxes, each marked with a number 1 to 9. You also have a pile of wildly different objects to store in these boxes, but once they are in there you need to be able to find them as quickly as possible.
What you need is a way of instantly deciding which box you have put each object in. It works like an index. you decide to find the cabbage so you look up which box the cabbage is in, then go straight to that box to get it.
Now imagine that you don't want to bother with the index, you want to be able to find out immediately from the object which box it lives in.
In the example, let's use a really simple way of doing this - the number of letters in the name of the object. So the cabbage goes in box 7, the pea goes in box 3, the rocket in box 6, the banjo in box 5 and so on.
What about the rhinoceros, though? It has 10 characters, so we'll change our algorithm a little and "wrap around" so that 10-letter objects go in box 1, 11 letters in box 2 and so on. That should cover any object.
Sometimes a box will have more than one object in it, but if you are looking for a rocket, it's still much quicker to compare a peanut and a rocket, than to check a whole pile of cabbages, peas, banjos, and rhinoceroses.
That's a hash code. A way of getting a number from an object so it can be stored in a Hashtable. In Java, a hash code can be any integer, and each object type is responsible for generating its own. Lookup the "hashCode" method of Object.
Source - here
Although hashcode does nothing with your business logic, we have to take care of it in most cases. Because when your object is put into a hash based container(HashSet, HashMap...), the container puts/gets the element's hashcode.
hashCode() is a unique code which is generated by the JVM for every object creation.
We use hashCode() to perform some operation on hashing related algorithm like Hashtable, Hashmap etc..
The advantages of hashCode() make searching operation easy because when we search for an object that has unique code, it helps to find out that object.
But we can't say hashCode() is the address of an object. It is a unique code generated by JVM for every object.
That is why nowadays hashing algorithm is the most popular search algorithm.
One of the uses of hashCode() is building a Catching mechanism.
Look at this example:
class Point
{
public int x, y;
public Point(int x, int y)
{
this.x = x;
this.y = y;
}
#Override
public boolean equals(Object o)
{
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Point point = (Point) o;
if (x != point.x) return false;
return y == point.y;
}
#Override
public int hashCode()
{
int result = x;
result = 31 * result + y;
return result;
}
class Line
{
public Point start, end;
public Line(Point start, Point end)
{
this.start = start;
this.end = end;
}
#Override
public boolean equals(Object o)
{
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Line line = (Line) o;
if (!start.equals(line.start)) return false;
return end.equals(line.end);
}
#Override
public int hashCode()
{
int result = start.hashCode();
result = 31 * result + end.hashCode();
return result;
}
}
class LineToPointAdapter implements Iterable<Point>
{
private static int count = 0;
private static Map<Integer, List<Point>> cache = new HashMap<>();
private int hash;
public LineToPointAdapter(Line line)
{
hash = line.hashCode();
if (cache.get(hash) != null) return; // we already have it
System.out.println(
String.format("%d: Generating points for line [%d,%d]-[%d,%d] (no caching)",
++count, line.start.x, line.start.y, line.end.x, line.end.y));
}

Combining Hash of String and Hash of Long

I have the following java class:
public class Person{
String name; //a unique name
Long DoB; //a unique time
.
.
.
#Override
public int hashCode(){
return name.hashCode() + DoB.hashCode();
}
}
Is my hashCode method correct (i.e. would it return a unique number of all combinations.
I have a feeling I'm missing something here.
You could let java.util.Arrays do it for you:
return Arrays.hashCode(new Object[]{ name, DoB });
You might also want to use something more fluent and more NPE-bulletproof like Google Guava:
#Override
public int hashCode(){
return Objects.hashCode(name, DoB);
}
#Override
public boolean equals(Object o) {
if ( this == o ) {
return true;
}
if ( o == null || o.getClass() != Person.class ) {
return false;
}
final Person that = (Person) o;
return Objects.equal(name, that.name) && Objects.equal(DoB, that.DoB);
}
Edit:
IntelliJ IDEA and Eclipse can generate more efficient hashCode() and equals().
Aside for the obvious, which is, you might want to implement the equals method as well...
Summing two hash codes has the very small risk of overflowing int
The sum itself seems like a bit of a weak methodology to provide unique hash codes. I would instead try some bitwise manipulation and use a seed.
See Bloch's Effective Java #9.
But you should start with an initial value (so that subsequent zero values are significant), and combine the fields that apply to the result along with a multiplier so that order is significant (so that similar classes will have much different hashes.)
Also, you will have to treat things like long fields and Strings a little different. e.g., for longs:
(int) (field ^ (field>>>32))
So, this means something like:
#Override public int hashCode() {
int result = 17;
result += name.hashCode() == null ? 0 : name.hashCode();
result = 31 * result + (int) (DoB ^ (DoB >>> 32));
return result;
}
31 is slightly magic, but odd primes can make it easier for the compiler to optimize the math to shift-subtraction. (Or you can do the shift-subtraction yourself, but why not let the compiler do it.)
usually a hashcode is build like so:
#Override
public int hashCode(){
return name.hashCode() ^ DoB.hashCode();
}
but the important thing to remember when doing a hashcode method is the use of it. the use of hashcode method is to put different object in different buckets in a hashtable or other collection using hashcode. as such, it's impotent to have a method that gives different answers to different objects at a low run time but doesn't have to be different for every item, though it's better that way.
This hash is used by other code when storing or manipulating the
instance – the values are intended to be evenly distributed for varied
inputs in order to use in clustering. This property is important to
the performance of hash tables and other data structures that store
objects in groups ("buckets") based on their computed hash values
and
The general contract for overridden implementations of this method is
that they behave in a way consistent with the same object's equals()
method: that a given object must consistently report the same hash
value (unless it is changed so that the new version is no longer
considered "equal" to the old), and that two objects which equals()
says are equal must report the same hash value.
Your hash code implementation is fine and correct. It could be better if you follow any of the suggestions other people have made, but it satisfies the contract for hashCode, and collisions aren't particularly likely, though they could be made less likely.

What is the use of hashCode in Java?

In Java, obj.hashCode() returns some value. What is the use of this hash code in programming?
hashCode() is used for bucketing in Hash implementations like HashMap, HashTable, HashSet, etc.
The value received from hashCode() is used as the bucket number for storing elements of the set/map. This bucket number is the address of the element inside the set/map.
When you do contains() it will take the hash code of the element, then look for the bucket where hash code points to. If more than 1 element is found in the same bucket (multiple objects can have the same hash code), then it uses the equals() method to evaluate if the objects are equal, and then decide if contains() is true or false, or decide if element could be added in the set or not.
From the Javadoc:
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java programming language.)
hashCode() is a function that takes an object and outputs a numeric value. The hashcode for an object is always the same if the object doesn't change.
Functions like HashMap, HashTable, HashSet, etc. that need to store objects will use a hashCode modulo the size of their internal array to choose in what "memory position" (i.e. array position) to store the object.
There are some cases where collisions may occur (two objects end up with the same hashcode), and that, of course, needs to be solved carefully.
The value returned by hashCode() is the object's hash code, which is the object's memory address in hexadecimal.
By definition, if two objects are equal, their hash code must also be equal. If you override the equals() method, you change the way two objects are equated and Object's implementation of hashCode() is no longer valid. Therefore, if you override the equals() method, you must also override the hashCode() method as well.
This answer is from the java SE 8 official tutorial documentation
A hashcode is a number generated from any object.
This is what allows objects to be stored/retrieved quickly in a Hashtable.
Imagine the following simple example:
On the table in front of you. you have nine boxes, each marked with a number 1 to 9. You also have a pile of wildly different objects to store in these boxes, but once they are in there you need to be able to find them as quickly as possible.
What you need is a way of instantly deciding which box you have put each object in. It works like an index. you decide to find the cabbage so you look up which box the cabbage is in, then go straight to that box to get it.
Now imagine that you don't want to bother with the index, you want to be able to find out immediately from the object which box it lives in.
In the example, let's use a really simple way of doing this - the number of letters in the name of the object. So the cabbage goes in box 7, the pea goes in box 3, the rocket in box 6, the banjo in box 5 and so on.
What about the rhinoceros, though? It has 10 characters, so we'll change our algorithm a little and "wrap around" so that 10-letter objects go in box 1, 11 letters in box 2 and so on. That should cover any object.
Sometimes a box will have more than one object in it, but if you are looking for a rocket, it's still much quicker to compare a peanut and a rocket, than to check a whole pile of cabbages, peas, banjos, and rhinoceroses.
That's a hash code. A way of getting a number from an object so it can be stored in a Hashtable. In Java, a hash code can be any integer, and each object type is responsible for generating its own. Lookup the "hashCode" method of Object.
Source - here
Although hashcode does nothing with your business logic, we have to take care of it in most cases. Because when your object is put into a hash based container(HashSet, HashMap...), the container puts/gets the element's hashcode.
hashCode() is a unique code which is generated by the JVM for every object creation.
We use hashCode() to perform some operation on hashing related algorithm like Hashtable, Hashmap etc..
The advantages of hashCode() make searching operation easy because when we search for an object that has unique code, it helps to find out that object.
But we can't say hashCode() is the address of an object. It is a unique code generated by JVM for every object.
That is why nowadays hashing algorithm is the most popular search algorithm.
One of the uses of hashCode() is building a Catching mechanism.
Look at this example:
class Point
{
public int x, y;
public Point(int x, int y)
{
this.x = x;
this.y = y;
}
#Override
public boolean equals(Object o)
{
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Point point = (Point) o;
if (x != point.x) return false;
return y == point.y;
}
#Override
public int hashCode()
{
int result = x;
result = 31 * result + y;
return result;
}
class Line
{
public Point start, end;
public Line(Point start, Point end)
{
this.start = start;
this.end = end;
}
#Override
public boolean equals(Object o)
{
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Line line = (Line) o;
if (!start.equals(line.start)) return false;
return end.equals(line.end);
}
#Override
public int hashCode()
{
int result = start.hashCode();
result = 31 * result + end.hashCode();
return result;
}
}
class LineToPointAdapter implements Iterable<Point>
{
private static int count = 0;
private static Map<Integer, List<Point>> cache = new HashMap<>();
private int hash;
public LineToPointAdapter(Line line)
{
hash = line.hashCode();
if (cache.get(hash) != null) return; // we already have it
System.out.println(
String.format("%d: Generating points for line [%d,%d]-[%d,%d] (no caching)",
++count, line.start.x, line.start.y, line.end.x, line.end.y));
}

How is hashCode() calculated in Java

What value does the hashCode() method return in java?
I read that it is a memory reference of an object... The hash value for new Integer(1) is 1; the hash value for String("a") is 97.
I am confused: is it ASCII or what type of value is?
The value returned by hashCode() is by no means guaranteed to be the memory address of the object. I'm not sure of the implementation in the Object class, but keep in mind most classes will override hashCode() such that two instances that are semantically equivalent (but are not the same instance) will hash to the same value. This is especially important if the classes may be used within another data structure, such as Set, that relies on hashCode being consistent with equals.
There is no hashCode() that uniquely identifies an instance of an object no matter what. If you want a hashcode based on the underlying pointer (e.g. in Sun's implementation), use System.identityHashCode() - this will delegate to the default hashCode method regardless of whether it has been overridden.
Nevertheless, even System.identityHashCode() can return the same hash for multiple objects. See the comments for an explanation, but here is an example program that continuously generates objects until it finds two with the same System.identityHashCode(). When I run it, it quickly finds two System.identityHashCode()s that match, on average after adding about 86,000 Long wrapper objects (and Integer wrappers for the key) to a map.
public static void main(String[] args) {
Map<Integer,Long> map = new HashMap<>();
Random generator = new Random();
Collection<Integer> counts = new LinkedList<>();
Long object = generator.nextLong();
// We use the identityHashCode as the key into the map
// This makes it easier to check if any other objects
// have the same key.
int hash = System.identityHashCode(object);
while (!map.containsKey(hash)) {
map.put(hash, object);
object = generator.nextLong();
hash = System.identityHashCode(object);
}
System.out.println("Identical maps for size: " + map.size());
System.out.println("First object value: " + object);
System.out.println("Second object value: " + map.get(hash));
System.out.println("First object identityHash: " + System.identityHashCode(object));
System.out.println("Second object identityHash: " + System.identityHashCode(map.get(hash)));
}
Example output:
Identical maps for size: 105822
First object value: 7446391633043190962
Second object value: -8143651927768852586
First object identityHash: 2134400190
Second object identityHash: 2134400190
A hashcode is an integer value that represents the state of the object upon which it was called. That is why an Integer that is set to 1 will return a hashcode of "1" because an Integer's hashcode and its value are the same thing. A character's hashcode is equal to it's ASCII character code. If you write a custom type you are responsible for creating a good hashCode implementation that will best represent the state of the current instance.
If you want to know how they are implmented, I suggest you read the source. If you are using an IDE you can just + on a method you are interested in and see how a method is implemented. If you cannot do that, you can google for the source.
For example, Integer.hashCode() is implemented as
public int hashCode() {
return value;
}
and String.hashCode()
public int hashCode() {
int h = hash;
if (h == 0) {
int off = offset;
char val[] = value;
int len = count;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}
The hashCode() method is often used for identifying an object. I think the Object implementation returns the pointer (not a real pointer but a unique id or something like that) of the object. But most classes override the method. Like the String class. Two String objects have not the same pointer but they are equal:
new String("a").hashCode() == new String("a").hashCode()
I think the most common use for hashCode() is in Hashtable, HashSet, etc..
Java API Object hashCode()
Edit: (due to a recent downvote and based on an article I read about JVM parameters)
With the JVM parameter -XX:hashCode you can change the way how the hashCode is calculated (see the Issue 222 of the Java Specialists' Newsletter).
HashCode==0: Simply returns random numbers with no relation to where
in memory the object is found. As far as I can make out, the global
read-write of the seed is not optimal for systems with lots of
processors.
HashCode==1: Counts up the hash code values, not sure at what value
they start, but it seems quite high.
HashCode==2: Always returns the exact same identity hash code of 1.
This can be used to test code that relies on object identity. The
reason why JavaChampionTest returned Kirk's URL in the example above
is that all objects were returning the same hash code.
HashCode==3: Counts up the hash code values, starting from zero. It
does not look to be thread safe, so multiple threads could generate
objects with the same hash code.
HashCode==4: This seems to have some relation to the memory location
at which the object was created.
HashCode>=5: This is the default algorithm for Java 8 and has a
per-thread seed. It uses Marsaglia's xor-shift scheme to produce
pseudo-random numbers.
I read that it is an memory reference of an object..
No. Object.hashCode() used to return a memory address about 14 years ago. Not since.
what type of value is
What it is depends entirely on what class you're talking about and whether or not it has overridden `Object.hashCode().
From OpenJDK sources (JDK8):
Use default of 5 to generate hash codes:
product(intx, hashCode, 5,
"(Unstable) select hashCode generation algorithm")
Some constant data and a random generated number with a seed initiated per thread:
// thread-specific hashCode stream generator state - Marsaglia shift-xor form
_hashStateX = os::random() ;
_hashStateY = 842502087 ;
_hashStateZ = 0x8767 ; // (int)(3579807591LL & 0xffff) ;
_hashStateW = 273326509 ;
Then, this function creates the hashCode (defaulted to 5 as specified above):
static inline intptr_t get_next_hash(Thread * Self, oop obj) {
intptr_t value = 0 ;
if (hashCode == 0) {
// This form uses an unguarded global Park-Miller RNG,
// so it's possible for two threads to race and generate the same RNG.
// On MP system we'll have lots of RW access to a global, so the
// mechanism induces lots of coherency traffic.
value = os::random() ;
} else
if (hashCode == 1) {
// This variation has the property of being stable (idempotent)
// between STW operations. This can be useful in some of the 1-0
// synchronization schemes.
intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3 ;
value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
} else
if (hashCode == 2) {
value = 1 ; // for sensitivity testing
} else
if (hashCode == 3) {
value = ++GVars.hcSequence ;
} else
if (hashCode == 4) {
value = cast_from_oop<intptr_t>(obj) ;
} else {
// Marsaglia's xor-shift scheme with thread-specific state
// This is probably the best overall implementation -- we'll
// likely make this the default in future releases.
unsigned t = Self->_hashStateX ;
t ^= (t << 11) ;
Self->_hashStateX = Self->_hashStateY ;
Self->_hashStateY = Self->_hashStateZ ;
Self->_hashStateZ = Self->_hashStateW ;
unsigned v = Self->_hashStateW ;
v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
Self->_hashStateW = v ;
value = v ;
}
value &= markOopDesc::hash_mask;
if (value == 0) value = 0xBAD ;
assert (value != markOopDesc::no_hash, "invariant") ;
TEVENT (hashCode: GENERATE) ;
return value;
}
So we can see that at least in JDK8 the default is set to random thread specific.
Definition: The String hashCode() method returns the hashcode value of the String as an Integer.
Syntax:
public int hashCode()
Hashcode is calculated using below formula
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
where:
s is ith character in the string
n is length of the string
^ is exponential operand
Example:
For example if you want to calculate hashcode for string "abc" then we have below details
s[] = {'a', 'b', 'c'}
n = 3
So the hashcode value will be calculated as:
s[0]*31^(2) + s[1]*31^1 + s[2]
= a*31^2 + b*31^1 + c*31^0
= (ASCII value of a = 97, b = 98 and c = 99)
= 97*961 + 98*31 + 99
= 93217 + 3038 + 99
= 96354
So the hashcode value for 'abc' is 96354
Object.hashCode(), if memory serves correctly (check the JavaDoc for java.lang.Object), is implementation-dependent, and will change depending on the object (the Sun JVM derives the value from the value of the reference to the object).
Note that if you are implementing any nontrivial object, and want to correctly store them in a HashMap or HashSet, you MUST override hashCode() and equals(). hashCode() can do whatever you like (it's entirely legal, but suboptimal to have it return 1.), but it's vital that if your equals() method returns true, then the value returned by hashCode() for both objects are equal.
Confusion and lack of understanding of hashCode() and equals() is a big source of bugs. Make sure that you thoroughly familiarize yourself with the JavaDocs for Object.hashCode() and Object.equals(), and I guarantee that the time spent will pay for itself.
From the Javadoc:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#hashCode--
I'm surprised that no one mentioned this but although its obvious for any non Object class your first action should be to read the source code for many classes .hashcode() is simply extended from Object in which case there are several different interesting things that may happen depending on your JVM implementation. Object.hashcode() calls to System.identityHashcode(object).
Indeed using object address in memory is ancient history but many do not realise they can control this behaviour and how Object.hashcode() is computed via jvm argument -XX:hashCode=N where N can be a number from [0-5]...
0 – Park-Miller RNG (default, blocking)
1 – f(address, global_statement)
2 – constant 1
3 – serial counter
4 – object address
5 – Thread-local Xorshift
Depending on an application you may see unexpected performance hits when .hashcode() is called, when that happens it is likely you are using one of the algorithms that shares global state and/or blocks.
According to javaDoc of "internal address of the object is converted into an integer". So it is clear that hashCode() method do not return internal address of object as it is. Link is provided below.
https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#hashCode--
To clear it please see following sample code:
public class HashCodeDemo
{
public static void main(String[] args)
{
final int CAPACITY_OF_MAP = 10000000;
/**
* hashCode as key, and Object as value
*/
java.util.HashMap<Integer, Object> hm1 = new java.util.HashMap<Integer, Object>(CAPACITY_OF_MAP);
int noOfDistinceObject = 0;
Object obj = null;
for(int i = 0; i < CAPACITY_OF_MAP; i++)
{
obj = new Object();
hm1.put(obj.hashCode(), new Object());
}
System.out.println("hm1.size() = "+hm1.size());
/**
* hashCode as key, and Object as value
*/
java.util.HashMap<Integer, Object> hm2 = new java.util.HashMap<Integer, Object>(CAPACITY_OF_MAP);
for(int i = 0; i < CAPACITY_OF_MAP; i++)
{
obj = new Object();
/**
* Each Object has unique memory location ,
* and if Object's hashCode is memory location then hashCode of Object is also unique
* then no object can put into hm2.
*
* If obj's hashCode is doesn't exists in hm1 then increment noOfDistinceObject , else add obj into hm2.
*/
if(hm1.get(obj.hashCode()) == null)
{
noOfDistinceObject++;
}
else
{
hm2.put(obj.hashCode(), new Object());
}
}
System.out.println("hm2.size() = "+hm2.size());
System.out.println("noOfDistinceObject = "+noOfDistinceObject);
}
}
Each Object has unique memory location , and if Object's hashCode method return memory location then hashCode of Object is also unique but if we run above sample code then some Objects have same hashcode value and some have unique hashcode value.
So we can say that hashCode method from Object class does not return memory location.

In Java, why must equals() and hashCode() be consistent?

If I override either method on a class, it must make sure that if A.equals(B) == true then A.hashCode() == B.hashCode must also be true.
Can someone show me a simple example where if this is violated, it'll cause a problem? I think it has something to do with if you use that class as the type of keys to Hashmap?
Sure:
public class Test {
private final int m, n;
public Test(int m, int n) {
this.m = m;
this.n = n;
}
public int hashCode() { return n * m; }
public boolean equals(Object ob) {
if (ob.getClass() != Test.class) return false;
Test other = (Test)ob;
return m == other.m;
}
}
with:
Set<Test> set = new HashSet<Test>();
set.put(new Test(3,4));
boolean b = set.contains(new Test(3, 10)); // false
Technically that should be true because m == 3 in both cases.
In general a HashMap works like this: it has a variable number of what are commonly called "buckets". The number of buckets can change over time (as entries are added and removed) but it is always a power of 2.
Let's say a given HashMap has 16 buckets. When you call put() to add an entry, the hashCode() of the key is calculated and then a mask is taken depending on the size of the buckets. If you (bitwise) AND the hashCode() with 15 (0x0F) you will get the last 4 bits, equaling a number between 0 and 15 inclusive:
int factor = 4;
int buckets = 1 << (factor-1) - 1; // 16
int mask = buckets - 1; // 15
int code = key.hashCode();
int dest = code & mask; // a number from 0 to 15 inclusive
Now if there is already an entry in that bucket you have what's called a collision. There are multiple ways of dealing with this but the one used by HashMap (and is probably the most common overall) is bucketing. All the entries with the same masked hashCode are put in a list of some kind.
So to find if a given key is in the map already:
Calculate the masked hash code;
Find the appropriate bucket;
If it's empty, key not found;
If is isn't empty, loop through all entries in the bucket checking equals().
Looking through a bucket is a linear (O(n)) operation but it's on a small subset. The hashcode bucket determination is essentially constant (O(1)). If buckets are sufficiently small then access to a HashMap is usually described as "near O(1)".
You can make a couple of observations about this.
Firstly, if you have a bunch of objects that all return 42 as their hash code a HashMap will still work but it will operate as an expensive list. Access will be O(n) (as everything will be in the same bucket regardless of the number of buckets). I've actually been asked this in an interview.
Secondly, returning to your original point, if two objects are equal (meaning a.equals(b) == b.equals(a) == true) but have different hash codes then the HashMap will go looking in (probably) the wrong bucket resulting in unpredictable and undefined behaviour.
This is discussed in the Item 8: Always override hashCode when you override equals of Joshua Bloch's Effective Java:
A common source of bugs is the failure to override the hashCode method. You must
override hashCode in every class that overrides equals. Failure to do so will
result in a violation of the general contract for Object.hashCode, which will pre-
vent your class from functioning properly in conjunction with all hash-based collec-
tions, including HashMap, HashSet, and Hashtable.
Here is the contract, copied from the
java.lang.Object specification:
Whenever it is invoked on the same object more than once during an execution of an application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
The key provision that is violated when you fail to override hashCode is
the second one: Equal objects must have equal hash codes. Two distinct
instances may be logically equal according to the class’s equals method, but to
the Object class’s hashCode method, they’re just two objects with nothing much
in common. Therefore object’s hashCode method returns two seemingly random
numbers instead of two equal numbers as required by the contract.
For example, consider the following simplistic PhoneNumber class, whose
equals method is constructed according to the recipe in Item 7:
public final class PhoneNumber {
private final short areaCode;
private final short exchange;
private final short extension;
public PhoneNumber(int areaCode, int exchange,
int extension) {
rangeCheck(areaCode, 999, "area code");
rangeCheck(exchange, 999, "exchange");
rangeCheck(extension, 9999, "extension");
this.areaCode = (short) areaCode;
this.exchange = (short) exchange;
this.extension = (short) extension;
}
private static void rangeCheck(int arg, int max,
String name) {
if (arg < 0 || arg > max)
throw new IllegalArgumentException(name +": " + arg);
}
public boolean equals(Object o) {
if (o == this)
return true;
if (!(o instanceof PhoneNumber))
return false;
PhoneNumber pn = (PhoneNumber)o;
return pn.extension == extension &&
pn.exchange == exchange &&
pn.areaCode == areaCode;
}
// No hashCode method!
... // Remainder omitted
}
Suppose you attempt to use this class
with a HashMap:
Map m = new HashMap();
m.put(new PhoneNumber(408, 867, 5309), "Jenny");
At this point, you might expect
m.get(new PhoneNumber(408 , 867,
5309)) to return "Jenny", but it
returns null. Notice that two PhoneNumber instances are
involved: One is used for insertion
into the HashMap, and a second, equal,
instance is used for (attempted)
retrieval. The PhoneNumber class’s
failure to override hashCode causes
the two equal instances to have
unequal hash codes, in violation of
the hashCode contract. Therefore the
get method looks for the phone number
in a different hash bucket from the
one in which it was stored by the put
method. Fixing this problem is as
simple as providing a proper hashCode
method for the PhoneNumber class.
[...]
See the Chapter 3 for the full content.
Containers like HashSet rely on the hash function to determine where to put it, and where to get it from when asked for it. If A.equals(B), then a HashSet is expecting A to be in the same place as B. If you put A in with value V, and look up B, you should expect to get V back (since you've said A.equals(B)). But if A.hashcode() != B.hashcode(), then the hashset may not find where you put it.
Here's a little example:
Set<Foo> myFoos = new HashSet<Foo>();
Foo firstFoo = new Foo(123,"Alpha");
myFoos.add(firstFoo);
// later in the processing you get another Foo from somewhere
Foo someFoo = //use imagination here...;
// maybe you get it from a database... and it's equal to Foo(123,"Alpha)
if (myFoos.contains(someFoo)) {
// maybe you win a million bucks.
}
So, imagine that the hashCode that gets created for firstFoo is 99999 and it winds up at a specific spot in the myFoos HashSet. Later when you get the someFoo and you look for it in the myFoos HashSet, it needs to generate the same hashCode so you can find it.
It's exactly because of hash tables.
Because of the possibility of hash code collisions, hash tables need to check identity as well, otherwise the table can't determine if it found the object it was looking for, or one with the same hash code. So every get() in a hash table calls key.equals(potentialMatch) before returning a value.
If equals() and hashCode() are inconsistent you can get very inconsistent behavior. Say for two objects, a and b, a.equals(b) returns true, but a.hashCode() != b.hashCode(). Insert a and a HashSet will return false for .contains(b), but a List created from that set will return true (because the list doesn't use hash codes).
HashSet set = new HashSet();
set.add(a);
set.contains(b); // false
new ArrayList(set).contains(b); // true
Obviously, that could be bad.
The idea behind this is that two objects are "equal" if all of their fields have equal values. If all of fields have equal values, the two objects should have the same hash value.

Categories