I am using HashMap.
Below is example code.
Classes MyKey and MyValue inherit directly from Object, without overriding anything.
Java documentation says for Object and methods hashCode() and equals():
"As much as is reasonably practical, the hashCode method defined
by class Object does return distinct integers for distinct objects.
(This is typically implemented by converting the internal address
of the object into an integer, but this implementation technique
is not required by the Java™ programming language.)"
"The equals method for class Object implements the most discriminating
possible equivalence relation on objects; that is, for any non-null reference
values x and y, this method returns true if and only if x and y refer
to the same object (x == y has the value true). "
My question is:
Can I trust that HashMap works in my example?
If not, what would be the right way to put simple objects into a map without overriding the methods
hashCode() and equals()?
I am not sure, but I have heard that Java may change the location and address
of user objects during the execution of a program (was it the GC which may do that?)
If address and hash code of key2 have changed before the line
MyValue v = m.get(key2);
then would calling m.get(key2) return the wrong value, null?
If this is true, then I believe that IdentityHashMap
is also useless for the same reasons.
class MyKey
{
Integer v;
//<Perhaps more fields>
MyKey(Integer v) {this.v=v;}
}
class MyValue
{
String s;
//<Perhaps more fields>
MyValue(String s) {this.s = s;}
}
Then some code:
Map<MyKey,MyValue> m = new HashMap<MyKey,MyValue>();
MyKey key1 = new MyKey(5);
MyKey key2 = new MyKey(6);
MyKey key3 = new MyKey(7);
m.put(key1, new MyValue("AAA"));
m.put(key2, new MyValue("BBB"));
m.put(key3, new MyValue("CCC"));
.
.
//Is it guaranteed that I will get the value "BBB" here
//if the entry with key2 has not been removed from map m?
MyValue v = m.get(key2);
System.out.println("s="+v.s);
Can I trust that HashMap works in my example? If not, what would be the right way to put simple objects into a map without overriding the methods hashCode() and equals()?
You cannot avoid having sensible hashCode and equals methods; HashMap and the other hash-based collections need them to work (IdentityHashMap is the exception).
I am not sure, but I have heard that Java may change the location and address of user objects during the execution of a program (was it the GC which may do that?)
While this is true, it has nothing to do with your main question.
If address and hash code of key2 have changed before the line
The address and hashCode have nothing to do with one another. If the address changes it doesn't change the hashCode and if you change the hashCode it doesn't change the address.
If this is true then I believe that also IdentityHashMap() is useless for same reasons.
Even if you assume hashCode is useless, this doesn't affect IdentityHashMap, because it doesn't call the key's hashCode or equals methods; it uses System.identityHashCode and == instead.
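To make the difference concrete, here is a small sketch that puts two equal but distinct String instances into both map types. The class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical demo class, not part of the original question.
class IdentityMapDemo {

    // Stores two distinct but equal String instances and returns the resulting size.
    static int sizeAfterTwoEqualKeys(Map<String, Integer> map) {
        String k1 = new String("key"); // new String(...) forces two separate instances
        String k2 = new String("key");
        map.put(k1, 1);
        map.put(k2, 2);
        return map.size();
    }

    public static void main(String[] args) {
        // HashMap uses equals()/hashCode(): the second put overwrites the first.
        System.out.println(sizeAfterTwoEqualKeys(new HashMap<>()));         // 1
        // IdentityHashMap compares keys with ==: both keys are kept.
        System.out.println(sizeAfterTwoEqualKeys(new IdentityHashMap<>())); // 2
    }
}
```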
Objects are basically allocated continuously in memory from the Eden space. If you run
Object[] objects = new Object[20];
for (int i = 0; i < objects.length; i++)
objects[i] = new Object();
Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
theUnsafe.setAccessible(true);
Unsafe unsafe = (Unsafe) theUnsafe.get(null);
for (int i = 0; i < objects.length; i++) {
int location = unsafe.getInt(objects, Unsafe.ARRAY_OBJECT_BASE_OFFSET + Unsafe.ARRAY_OBJECT_INDEX_SCALE * i);
System.out.println(Integer.toHexString(location) + ": hashCode=" + Integer.toHexString(objects[i].hashCode()));
}
you might expect the hash codes to be continuous if they followed the memory location, but they aren't:
eac89d10: hashCode=634e3372
eac89d20: hashCode=2313b44d
eac89d30: hashCode=62a23d38
eac89d40: hashCode=9615a1f
eac89d50: hashCode=233aa44
eac89d60: hashCode=59243f75
eac89d70: hashCode=5ac2480b
eac89d80: hashCode=907f8ba
eac89d90: hashCode=6a5a7ff7
eac89da0: hashCode=5b8767ad
eac89db0: hashCode=50ba0dfc
eac89dc0: hashCode=2198a037
eac89dd0: hashCode=2b3e8c1c
eac89de0: hashCode=17609872
eac89df0: hashCode=46b8705b
eac89e00: hashCode=76d88aa2
eac89e10: hashCode=275cea3
eac89e20: hashCode=4513098
eac89e30: hashCode=6e4d4d5e
eac89e40: hashCode=15128ee5
Java has four different ways of encoding references in 32-bit and 64-bit JVMs; however, if your maximum heap size is less than 2 GB it will be a simple 32-bit address, as it was when I ran this example.
Can I trust that HashMap works in my example? If not what would be right way to put simple objects into map without rewriting methods hashCode() and equals()?
Your example does not provide hashCode or equals, so it will use the defaults.
The defaults work with object identity, which means o.equals(o2) will be true only if o and o2 refer to the same object.
MyKey m = new MyKey(1);
MyKey m2 = new MyKey(1);
MyKey m3 = m;
map.put(m,...);
map.get(m);//works
map.get(m2); //different object
map.get(m3);//works same object
I am not sure, but I have heard that Java may change the location and address of user objects during the execution of a program (was it the GC which may do that?)
The address of an object is not relevant. While the default hashCode might use it, this happens only once for each object, and the hash code then stays the same.
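A quick way to convince yourself of this is the sketch below. The class name is made up, and System.gc() is only a request, so the object may or may not actually be moved; the hash code stays the same either way.

```java
// Hypothetical demo class: the default hash code is fixed for an object's lifetime.
class StableIdentityHash {

    // Returns true if the default hash code is the same before and after a GC request.
    static boolean hashSurvivesGc() {
        Object key = new Object();
        int before = key.hashCode();
        System.gc(); // only a request; the JVM may or may not move the object
        int after = key.hashCode();
        return before == after;
    }

    public static void main(String[] args) {
        System.out.println(hashSurvivesGc()); // true
    }
}
```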
First off, the default hashcode of an object does not change after it is constructed so IdentityHashMap is not useless. Java may move objects around in memory but it doesn't change their identity hash code.
Secondly, if you don't define your own equals() method, then every object you construct will not be equals() to any other. This means that if you want to use them as keys in a HashMap (or IdentityHashMap), you'll only be able to retrieve the values by using the original key object.
For example:
MyKey key = new MyKey(5);
m.put(key, value);
...
MyKey newKey = new MyKey(5);
m.get(newKey); // Will not find the value
As newKey and key are different objects, they are not ==. This is why you are recommended to override equals() (and hashCode()) for your objects.
A HashMap still works if you don't override equals() and hashCode() on your objects, but it will generally not do what you want: in this case, it behaves like an IdentityHashMap.
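If lookup by an equal but distinct key is what you want, a value-based key could look like the following sketch. ValueKey is a hypothetical variant of the question's MyKey, and treating two keys with the same v as equal is an assumption about the intended semantics.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Sketch of the question's MyKey with value-based equality (an assumption about
// what "equal" should mean for this class).
final class ValueKey {
    private final Integer v;

    ValueKey(Integer v) { this.v = v; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ValueKey)) return false;
        return Objects.equals(v, ((ValueKey) o).v);
    }

    @Override
    public int hashCode() { return Objects.hashCode(v); }

    public static void main(String[] args) {
        Map<ValueKey, String> m = new HashMap<>();
        m.put(new ValueKey(5), "AAA");
        // A different instance with the same value now finds the entry.
        System.out.println(m.get(new ValueKey(5))); // AAA
    }
}
```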
If hashCode() calculation uses immutable fields and equals() uses all the fields would it be a problem when the class is used as a hash key? E.g.
import java.util.Objects;
public class Car {
protected final long vin;
protected String state;
protected String plateNumber;
public Car( long v, String s, String p ) {
vin = v; state = s; plateNumber = p;
}
public void move( String s, String p ) {
state = s; plateNumber = p;
}
public int hashCode() {
return (int)( vin % Integer.MAX_VALUE );
}
public boolean equals( Object other ) {
if (this == other) return true;
else if (!(other instanceof Car)) return false;
Car otherCar = (Car) other;
return vin == otherCar.vin
&& Objects.equals( state, otherCar.state )
&& Objects.equals( plateNumber, otherCar.plateNumber );
}
}
And move() is called on a car object after it is inserted into a hash set, possibly via a reference kept elsewhere.
I am not after performance issues here. Only correctness.
I have read the Java hashCode contract and a few answers on SO, including this by the venerable Jon Skeet and this from big blue. I feel that the last link gives the best explanation and implies that the above code is correct.
Edit
Conclusion:
This class satisfies the constraints placed on ‘equals()’ and ‘hashCode()’ in Java. However, it violates the additional requirement placed on ‘equals()’ when objects are used as keys in collections, hashed or not.
The additional requirement is that ‘equals()’ needs to be consistent for as long as the object is a key.
See the counter example by Louis Wasserman and the reference provided by Douglas below.
Few clarifications:
A) This class satisfies the Java object-level constraints:
1. carA.equals(carB) implies ( carA.hashCode() == carB.hashCode() )
2. ( carA.hashCode() != carB.hashCode() ) implies !carA.equals(carB)
3. equals() needs to be reflexive, symmetric and transitive.
4. hashCode() needs to be consistent, i.e. it cannot change for an object as long as no information used in equals() is modified.
5. equals() needs to be consistent as long as neither object is modified.
Note that the reverse of ‘1.’ and ‘2.’ is not necessary. The class above satisfies all of these conditions.
Also, the Java docs mention that "equals() … implements the most discriminating possible equivalence relation on objects", but I am not sure whether that is compulsory.
B) As for performance, the gain in collision avoidance diminishes with each additional member variable we combine. Usually a few well-chosen member variables are sufficient.
It's correct if you never, ever call move after the Car is in the map. Otherwise it's wrong. Both hashCode and equals have to stay consistent after a key is in the map.
When considering only the hashCode and equals contracts, you are correct that this implementation satisfies their requirements. hashCode using a strict subset of the fields that equals uses is sufficient to guarantee that a.equals(b) implies a.hashCode() == b.hashCode() as required.
However, things change when you bring in Map. From the Map javadoc, "The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map."
After you call move on a Car that is a key in a Map, the behavior of that Map is now unspecified. In many cases it will in practice still work the way you want it to, but bizarre things could happen in ways that are hard to predict. While it would technically be valid for the Map to spontaneously empty itself or switch all lookups to use a random number generator, a more likely scenario might go like this:
Car car1 = ...
Car car2 = ... // a copy of car1
Map<Car, String> map1 = ...
map1.put(car1, "value");
assert map1.get(car2).equals("value"); // true
car1.move(...);
assert map1.get(car2).equals("value"); // NullPointerException on the equals call, car2 is no longer found
Notice that neither car2 nor the Map were changed themselves in any way, but the mapping of car2 changed (or rather, disappeared) anyway. This behavior is not officially specified, but I would guess most Map implementations do behave this way.
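The scenario can be reproduced with a compact stand-in for the question's class. DemoCar and the sample values are illustrative, and the observed result is simply what java.util.HashMap does in practice; the Map contract itself leaves the behavior unspecified.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Compact stand-in for the question's Car: hashCode uses only vin, equals uses all fields.
class DemoCar {
    final long vin;
    String state;
    String plateNumber;

    DemoCar(long vin, String state, String plateNumber) {
        this.vin = vin; this.state = state; this.plateNumber = plateNumber;
    }

    void move(String s, String p) { state = s; plateNumber = p; }

    @Override public int hashCode() { return Long.hashCode(vin); }

    @Override public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof DemoCar)) return false;
        DemoCar c = (DemoCar) other;
        return vin == c.vin && Objects.equals(state, c.state)
                && Objects.equals(plateNumber, c.plateNumber);
    }

    public static void main(String[] args) {
        DemoCar car1 = new DemoCar(1L, "NY", "ABC-123");
        DemoCar car2 = new DemoCar(1L, "NY", "ABC-123"); // equal copy of car1
        Map<DemoCar, String> map = new HashMap<>();
        map.put(car1, "value");
        System.out.println(map.get(car2)); // value
        car1.move("CA", "XYZ-789");        // mutate the key while it is in the map
        System.out.println(map.get(car2)); // null: same bucket, but equals() now fails
    }
}
```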
You may mutate your key candidates as much as you want, before or after (not during) they are used as keys.
In practice, it is very hard to enforce this rule. If you mutate objects, you have no control over whether somebody else is using them as keys.
Immutable keys are just easier: they remove a source of subtle, hard-to-find bugs and simply work better as keys.
In your case I see no correctness issues. But why would you bother leaving fields out of hashCode?
Short answer: it should be OK, but prepare for bizarre behavior.
Longer answer: when you change fields that participate in equals() on a key, the value keyed by that key will no longer be found.
Still longer answer: this looks like an XY problem: you're asking about X, but you really need X to accomplish Y. Maybe you should ask about Y?
The car in your case is uniquely identified by its vin. A car is equal to itself. But a car can be registered in different states. Maybe the answer is to have a Registration object (or a few of them) attached to the car? Then you can separate car.equals() from registration.equals().
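One possible shape of that split is sketched below. RegisteredCar and its nested Registration are hypothetical names, and basing equality on vin alone is the assumption being illustrated.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical split: identity lives in the car, mutable data in its registration.
class RegisteredCar {

    static final class Registration {
        final String state;
        final String plateNumber;
        Registration(String state, String plateNumber) {
            this.state = state; this.plateNumber = plateNumber;
        }
    }

    private final long vin;            // immutable identity
    private Registration registration; // mutable, not part of equals/hashCode

    RegisteredCar(long vin, Registration registration) {
        this.vin = vin; this.registration = registration;
    }

    void move(Registration r) { registration = r; }

    @Override public boolean equals(Object o) {
        return o instanceof RegisteredCar && ((RegisteredCar) o).vin == vin;
    }

    @Override public int hashCode() { return Long.hashCode(vin); }

    public static void main(String[] args) {
        Set<RegisteredCar> cars = new HashSet<>();
        RegisteredCar car = new RegisteredCar(42L, new Registration("NY", "ABC-123"));
        cars.add(car);
        car.move(new Registration("CA", "XYZ-789"));
        // Re-registering does not affect equality, so lookups keep working.
        System.out.println(cars.contains(new RegisteredCar(42L, null))); // true
    }
}
```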
Hash works by putting items into "buckets". Each bucket is calculated by the hashcode. After finding the bucket then the search continues comparing each item one by one using equals.
For example:
During insertion: an object whose id is 100 is placed in bucket 5 (the hashcode calculated 5).
During retrieval: you ask the hashmap to find the item 100. If the hash calculates 7 now then the algorithm will search for your object in bucket 7 but your object will never be found as it is dwelling in bucket 5.
In summary: the hash code and the actual key work together. The former is used to know in which bucket the item should be. The latter is used by the equals comparison seeking the actual item to return.
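The two steps can be sketched with a toy hash set. This is for illustration only: ToyHashSet is a made-up class, and java.util.HashMap is considerably more elaborate (it spreads the hash bits and resizes its table, for example).

```java
import java.util.ArrayList;
import java.util.List;

// Toy hash set for illustration only; java.util.HashMap is more elaborate.
class ToyHashSet {
    private final List<List<Object>> buckets = new ArrayList<>();

    ToyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) buckets.add(new ArrayList<>());
    }

    // Step 1: the hash code selects the bucket.
    private List<Object> bucketFor(Object o) {
        return buckets.get(Math.floorMod(o.hashCode(), buckets.size()));
    }

    void add(Object o) {
        if (!contains(o)) bucketFor(o).add(o);
    }

    // Step 2: equals() is only tried against the items in that one bucket.
    boolean contains(Object o) {
        for (Object candidate : bucketFor(o)) {
            if (candidate.equals(o)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ToyHashSet set = new ToyHashSet(8);
        set.add(100);                          // Integer 100 hashes to 100, bucket 100 % 8 = 4
        System.out.println(set.contains(100)); // true
        System.out.println(set.contains(108)); // false: same bucket 4, but equals() fails
    }
}
```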
When your hashCode() implementation uses only a limited number of fields (compared with equals), you reduce the performance of almost any algorithm or data structure that uses hashing: HashMap, HashSet, etc. You increase the probability of collisions, i.e. the situation where two different objects (equals returns false) have the same hash value.
The short answer is: No.
Long answer:
Full immutability is not necessary, but:
equals must depend only on immutable values. hashCode must depend only on immutable values as well: either a constant, a subset of the values used in equals, or all of the values used in equals. Values that are not used in equals must not be part of hashCode.
If you mutate values that equals and hashCode rely on, you will likely not find your objects again in a hash-based data structure. Look at this:
public class Test {
private static class TestObject {
private String s;
public TestObject(String s) {
super();
this.s = s;
}
public void setS(String s) {
this.s = s;
}
@Override
public boolean equals(Object obj) {
boolean equals = false;
if (obj instanceof TestObject) {
TestObject that = (TestObject) obj;
equals = this.s.equals(that.s);
}
return equals;
}
@Override
public int hashCode() {
return this.s.hashCode();
}
}
public static void main(String[] args) {
TestObject a1 = new TestObject("A");
TestObject a2 = new TestObject("A");
System.out.println(a1.equals(a2)); // true
HashMap<TestObject, Object> hashSet = new HashMap<>(); // hash based datastructure
hashSet.put(a1, new Object());
System.out.println(hashSet.containsKey(a1)); // true
a1.setS("A*");
System.out.println(hashSet.containsKey(a1)); // false !!! // Object is not found as the hashcode used initially before mutation was used to determine the hash bucket
a2.setS("A*");
System.out.println(hashSet.containsKey(a2)); // false !!! Because a1 is in wrong hash bucket ...
System.out.println(a1.equals(a2)); // ... even if the objects are equals
}
}
In this code I have declared and initialized a String variable and printed its hashcode, then reassigned it to another value and invoked the garbage collector to clear the dereferenced objects.
But when I reassign the String variable to its original value and print the hashcode, the same hashcode is printed. How?
public class TestGarbage1 {
public static void main(String args[]) {
String m = "JAVA";
System.out.println(m.hashCode());
m = "java";
System.gc();
System.out.println(m.hashCode());
m = "JAVA";
System.out.println(m.hashCode());
}
}
Hash code relates to object equality, not identity.
a.equals(b) implies a.hashCode() == b.hashCode()
(Provided the two methods have been implemented consistently)
Even if a gc were actually taking place here (and you weren't simply referencing strings in the constant pool), you wouldn't expect two string instances with the same sequence of chars not to be equal - hence, their hash codes will also be the same.
String a = new String("whatever");
String b = new String(a);
System.out.println(a == b); // false, they are not the same instance
System.out.println(a.equals(b)); // true, they represent the same string
System.out.println(a.hashCode() == b.hashCode()); // true, they represent the same string
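For String in particular, the hash is a pure function of the characters, which is why "JAVA" always hashes to the same number no matter which instance you ask or whether a GC has run. A minimal sketch (the class and method names are made up) that recomputes it by hand with the recurrence given in the String javadoc:

```java
// Hypothetical helper that recomputes String.hashCode() by hand.
class StringHashDemo {

    // Same recurrence the String javadoc specifies: h = 31 * h + charAt(i).
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(manualHash("JAVA") == "JAVA".hashCode());            // true
        System.out.println(new String("JAVA").hashCode() == "JAVA".hashCode()); // true
        System.out.println("JAVA".hashCode() == "java".hashCode());             // false
    }
}
```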
I think you are misunderstanding something about how hashcodes work. Without going into too much detail: in Java, hashcodes are used for many things. One example is finding an item in a hash-based data structure like HashMap or HashSet.
Hashing the same value should always return the same hash code. In this case, the hash of "JAVA" should never change, because that would break the contract set forth by Java.
I think it's too complicated to go into how hashcodes for String are calculated. You can read more about it here. I can give you an example though.
Let's say you have a class Fruit and it has fields like shape, color and weight.
You must implement equals AND hashCode for this class. It is very important to do both, because otherwise you break the way HashMap works. Let's say you write this for your hashCode() method.
@Override
public int hashCode() {
int hash = 1;
hash = hash * 17 + this.color;
hash = hash * 31 + this.shape.hashCode();
hash = hash * 31 + this.weight;
return hash;
}
This will generate the same hash value EVERY TIME for two Fruit instances that are equal. That is exactly what you want.
Really quick, how would this actually be used in a HashMap? Let's say you want to see if the map contains foo = new Fruit(). HashMap first calculates foo.hashCode(). It checks whether there is anything in the bucket for that hashCode. If there is, it calls equals() on each entry in the bucket until one returns true. It must do this because there might be hashcode collisions. And that is why it is important that equals and hashCode are implemented together.
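A complete version of such a class might look like the sketch below. The field types are assumptions (the answer only names shape, color and weight), and Objects.hash is used here instead of the hand-rolled arithmetic; either works as long as equals and hashCode use the same fields.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Field types are assumptions; the original only names shape, color and weight.
final class Fruit {
    private final String shape;
    private final int color;  // e.g. an RGB value
    private final int weight; // e.g. grams

    Fruit(String shape, int color, int weight) {
        this.shape = shape; this.color = color; this.weight = weight;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Fruit)) return false;
        Fruit f = (Fruit) o;
        return color == f.color && weight == f.weight && Objects.equals(shape, f.shape);
    }

    // Uses exactly the fields that equals() compares.
    @Override
    public int hashCode() { return Objects.hash(shape, color, weight); }

    public static void main(String[] args) {
        Set<Fruit> basket = new HashSet<>();
        basket.add(new Fruit("round", 0xFF0000, 150));
        System.out.println(basket.contains(new Fruit("round", 0xFF0000, 150))); // true
        System.out.println(basket.contains(new Fruit("round", 0xFF0000, 151))); // false
    }
}
```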
I am confused about one concept. Can someone please shed some light on it?
Question: If the key of a HashMap is an immutable object (created by the developer), do we still need to override hashCode() and equals()? Or does having an immutable key remove the need to override hashCode() and equals()?
Thanks.
Yes. I'll cite the example of java.lang.Integer here. If we wish to have a (sparse) mapping of integers to objects, we'd use something along the lines of HashMap<Integer, Object>. If we add an entry Integer.valueOf(2) => "foo" and try to retrieve it with new Integer(2), the lookup only works because Integer overrides hashCode and equals.
These are slightly different categories of issues.
As in hexafraction's answer, having immutable instances is not sufficient to let you skip the step of writing an equals and hashCode, if two different instances could ever be considered to be the same. new Integer(2) should always be equal to every other new Integer(2), even though the objects are immutable and the instances are different.
That said, there are examples of "instance-controlled classes" where the default behavior of instance identity is enough:
Enum instances are created at compile time, one per value. There is (theoretically) no way to produce any other instance. If no two instances are equal, the default implementation of equals and hashCode is sufficient. Enum instances aren't compiler-guaranteed to be immutable, but you should treat them as such.
If your class's instances are guaranteed to be different from one another, regardless of whether they're immutable, you can skip equals and hashCode. One could imagine a Car object, where every Car that a CarFactory produces is different.
As a variation of the above, if you control object instantiation tightly enough that equal representations are always given the exact same instance, then that could be considered sufficient:
public class MyClass {
private MyClass(int foo) { /* ... */ }
private static final Map<Integer, MyClass> instanceCache = new HashMap<>();
/** Returns an existing MyClass(foo) if present; otherwise, creates one. */
public synchronized static MyClass create(int foo) {
// Neither leak-proof or thread-safe. Just demonstrating a concept.
if (instanceCache.containsKey(foo)) {
return instanceCache.get(foo);
}
MyClass newMyClass = new MyClass(foo);
instanceCache.put(foo, newMyClass);
return newMyClass;
}
}
Try it and see:
public class OverrideIt
{
static class MyKey
{
public final int i; // immutable
public MyKey(int i)
{
this.i = i;
}
}
public static void main(String[] args)
{
Map<MyKey, String> map = new HashMap<MyKey, String>();
map.put(new MyKey(1), "Value");
String result = map.get(new MyKey(1));
System.out.println(result); // null
}
}
which prints out null, showing that we failed to look up our value. This is because the two instances of MyKey are not equal and don't have the same hash code, because we did not override equals() and hashCode().
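For comparison, here is a sketch of the same lookup with value-based equality. It assumes Java 16 or later, where a record gets equals() and hashCode() derived from its components; on older versions you would write the two methods by hand. RecordKeyDemo and Key are made-up names.

```java
import java.util.HashMap;
import java.util.Map;

// Assumes Java 16+: a record gets equals() and hashCode() derived from its components.
class RecordKeyDemo {
    record Key(int i) {}

    static String lookup() {
        Map<Key, String> map = new HashMap<>();
        map.put(new Key(1), "Value");
        return map.get(new Key(1)); // a different instance, but equal by value
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // Value
    }
}
```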
Facts
1. The basic data structure of a HashMap is an Entry[] (each Entry is a kind of linked list).
2. The key's hashCode is used to locate the position in this array.
3. Once the Entry has been retrieved using the hashCode, the key's equals is used to pick the correct entry (by iterating over the list).
4. The default hashCode returns an integer that is typically distinct for each instance.
And you agreed in the comment section:
"Is it possible that you'll store an object using one key,
and then try to retrieve it using a key which is an identical object,
but not the same object"
Even though both keys have the same value, the instances are different, so the two keys may well have different hash codes (Fact 4) and therefore different positions in the array (Fact 2).
map.put(new key(1),"first element");
Here the key class does not override hashCode, so it returns a hash code that is specific to the instance. (To keep things simple, assume the hash code returned is 000025, so Entry[25] holds "first element".)
map.get(new key(1))
Now this new key may return the hash code 000017, so the map would look for the value in Entry[17] and return null (which is not expected).
Note: I just used 000025 and 000017 for simplicity; in reality HashMap rehashes the hash code and reduces it based on the array size.
So far we have not discussed whether the key is mutable or immutable. Irrespective of whether the key is mutable or immutable:
If you store an object using one key,and then try to retrieve it using a key
which is an identical object,but not the same object
you need to override hashCode and make sure it returns the same integer, so that it locates the same bucket (position in the array) and finds the element. The same applies to equals, which picks the correct element from the Entry.
Have the following class:
public class Member {
private int x;
private long y;
private double d;
public Member(int x, long y, double d) {
this.x = x;
this.y = y;
this.d = d;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + x;
result = (int) (prime * result + y);
result = (int) (prime * result + Double.doubleToLongBits(d));
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj instanceof Member) {
Member other = (Member) obj;
return other.x == x && other.y == y
&& Double.compare(d, other.d) == 0;
}
return false;
}
public static void main(String[] args) {
Set<Member> test = new HashSet<Member>();
Member b = new Member(1, 2, 3);
test.add(b);
System.out.println(b.hashCode());
b.x = 0;
System.out.println(b.hashCode());
Member first = test.iterator().next();
System.out.println(test.contains(first));
System.out.println(b.equals(first));
System.out.println(test.add(first));
}
}
It produces the following results:
30814
29853
false
true
true
Because the hashCode depends on the state of the object, the object can no longer be retrieved properly, so the check for containment fails. The HashSet is no longer working properly. A solution would be to make Member immutable, but is that the only solution? Should all classes added to HashSets be immutable? Is there any other way to handle the situation?
Regards.
Objects in hashsets should either be immutable, or you need to exercise discipline in not changing them after they've been used in a hashset (or hashmap).
In practice I've rarely found this to be a problem - I rarely find myself needing to use complex objects as keys or set elements, and when I do it's usually not a problem just not to mutate them. Of course if you've exposed the references to other code by this time, it can become harder.
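When a change is unavoidable, one disciplined pattern is to take the element out of the set, mutate it, and put it back. The sketch below uses a made-up MutablePoint class and a hypothetical setX helper to show the idea.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical mutable element whose hashCode depends on its state.
class MutablePoint {
    private int x;

    MutablePoint(int x) { this.x = x; }

    @Override public boolean equals(Object o) {
        return o instanceof MutablePoint && ((MutablePoint) o).x == x;
    }

    @Override public int hashCode() { return Integer.hashCode(x); }

    // Changes x safely: take the element out, mutate it, put it back.
    static void setX(Set<MutablePoint> set, MutablePoint p, int newX) {
        boolean wasPresent = set.remove(p); // the old hash is still valid here
        p.x = newX;
        if (wasPresent) set.add(p);         // stored again under the new hash
    }

    public static void main(String[] args) {
        Set<MutablePoint> set = new HashSet<>();
        MutablePoint p = new MutablePoint(1);
        set.add(p);
        setX(set, p, 0);
        System.out.println(set.contains(p)); // true
        System.out.println(set.size());      // 1
    }
}
```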
Yes. While keeping your class mutable, you can compute the hashCode and equals methods from immutable values of the class (perhaps a generated id) to adhere to the hashCode contract defined in the Object class:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Depending on your situation this may be easier or not.
class Member {
private static long nextId = 0; // a static and an instance field cannot both be named id
private final long id = nextId++;
// other members here...
public int hashCode() { return Long.hashCode(id); } // hashCode must return an int
public boolean equals( Object o ) {
if( this == o ) { return true; }
if( o instanceof Member ) { return this.id == ((Member)o).id; }
return false;
}
...
}
If you need a thread-safe id, you may consider using AtomicLong instead, but again, it depends on how you are going to use your object.
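A minimal sketch of that AtomicLong variant follows. IdMember and NEXT_ID are illustrative names, and the mutable field x stands in for whatever state the real class carries.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of an id-based identity; the class and field names are illustrative.
class IdMember {
    private static final AtomicLong NEXT_ID = new AtomicLong();

    private final long id = NEXT_ID.getAndIncrement(); // unique even across threads
    int x; // mutable state that does not take part in equals/hashCode

    @Override public int hashCode() { return Long.hashCode(id); }

    @Override public boolean equals(Object o) {
        return o instanceof IdMember && ((IdMember) o).id == id;
    }

    public static void main(String[] args) {
        Set<IdMember> set = new HashSet<>();
        IdMember m = new IdMember();
        set.add(m);
        m.x = 99; // changing x does not disturb the set
        System.out.println(set.contains(m)); // true
    }
}
```

Note that with this design two members holding identical field values are never equal to each other, which may or may not be what you want.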
As already mentioned, one can accept the following three solutions:
1. Use immutable objects; even when your class is mutable, you may base your hashCode implementation and equals check on immutable identities, e.g. an ID-like value.
2. Similarly to the above, implement add/remove to work on a clone of the inserted object, not the actual reference. HashSet does not offer a get function (e.g. to allow you to alter the object later on); thus, you can be sure that no duplicates will exist.
3. Exercise discipline in not changing them after they've been used, as @Jon Skeet suggests.
But if for some reason you really need to modify objects after they have been inserted into a HashSet, you need to find a way of "informing" your collection of the changes. To achieve this functionality:
4. You can use the Observer design pattern, and extend HashSet to implement the Observer interface. Your Member objects must be Observable and update the HashSet on any setter or other method that affects hashCode and/or equals.
Note 1: Extending 3 by using 4: we may accept alterations, but only those that do not produce an already existing object (e.g. I updated a user's ID by assigning a new ID, not by setting it to an existing one). Otherwise, you have to consider the scenario where an object is transformed in such a way that it is now equal to another object already existing in the Set. If you accept this limitation, the 4th suggestion will work fine; otherwise you must be proactive and define a policy for such cases.
Note 2: You have to provide both the previous and the current state of the altered object in your update implementation, because you first have to remove the older element (e.g. use getClone() before setting new values) and then add the object with the new state. The following snippet is just an example implementation; it needs changes based on your policy for adding a duplicate.
@Override
public void update(Observable newItem, Object oldItem) {
remove(oldItem);
if (add(newItem))
newItem.addObserver(this);
}
I've used similar techniques on projects, where I require multiple indices on a class, so I can look up with O(1) for Sets of objects that share a common identity; imagine it as a MultiKeymap of HashSets (this is really useful, as you can then intersect/union indices and work similarly to SQL-like searching). In such cases I annotate methods (usually setters) that must fireChange-update each of the indices when a significant change occurs, so indices are always updated with the latest states.
Jon Skeet has listed all alternatives. As for why the keys in a Map or Set must not change:
The contract of a Set implies that at any time, there are no two objects o1 and o2 such that
o1 != o2 && set.contains(o1) && set.contains(o2) && o1.equals(o2)
Why that is required is especially clear for a Map. From the contract of Map.get():
More formally, if this map contains a mapping from a key
k to a value v such that (key==null ? k==null : key.equals(k)), then this method returns v, otherwise it returns null. (There can be at most one such mapping.)
Now, if you modify a key inserted into a map, you might make it equal to some other key already inserted. Moreover, the map can not know that you have done so. So what should the map do if you then do map.get(key), where key is equal to several keys in the map? There is no intuitive way to define what that would mean - chiefly because our intuition for these datatypes is the mathematical ideal of sets and mappings, which don't have to deal with changing keys, since their keys are mathematical objects and hence immutable.
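The situation described above is easy to produce. In this sketch (Box and the method name are made up), a HashSet ends up holding two distinct elements that are equal to each other, which the Set contract is supposed to rule out.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo: mutating an element makes it equal to another element.
class DuplicateAfterMutation {

    static final class Box {
        int value;
        Box(int value) { this.value = value; }

        @Override public boolean equals(Object o) {
            return o instanceof Box && ((Box) o).value == value;
        }

        @Override public int hashCode() { return Integer.hashCode(value); }
    }

    // Returns true if the set ends up holding two distinct elements that are equal.
    static boolean setHoldsTwoEqualElements() {
        Box a = new Box(1);
        Box b = new Box(2);
        Set<Box> set = new HashSet<>();
        set.add(a);
        set.add(b);
        a.value = 2; // a now equals b, but the set cannot notice
        return set.size() == 2 && a != b && a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(setHoldsTwoEqualElements()); // true
    }
}
```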
Theoretically (and more often than not practically too) your class either:
has a natural immutable identity that can be inferred from a subset of its fields, in which case you can use those fields to generate the hashCode from.
has no natural identity, in which case using a Set to store them is unnecessary, you could just as well use a List.
Never change a "hashable field" after putting the object into a hash-based container.
It is as if you (Member) registered your phone number (Member.x) in the yellow pages (the hash-based container) and then changed your number: no one can find you in the yellow pages any more.
What value does the hashCode() method return in java?
I read that it is the memory reference of an object, yet the hash value for new Integer(1) is 1 and the hash value for new String("a") is 97.
I am confused: is it ASCII, or what kind of value is it?
The value returned by hashCode() is by no means guaranteed to be the memory address of the object. I'm not sure of the implementation in the Object class, but keep in mind most classes will override hashCode() such that two instances that are semantically equivalent (but are not the same instance) will hash to the same value. This is especially important if the classes may be used within another data structure, such as Set, that relies on hashCode being consistent with equals.
There is no hashCode() that uniquely identifies an instance of an object no matter what. If you want a hashcode based on the underlying pointer (e.g. in Sun's implementation), use System.identityHashCode() - this will delegate to the default hashCode method regardless of whether it has been overridden.
Nevertheless, even System.identityHashCode() can return the same hash for multiple objects. See the comments for an explanation, but here is an example program that continuously generates objects until it finds two with the same System.identityHashCode(). When I run it, it quickly finds two System.identityHashCode()s that match, on average after adding about 86,000 Long wrapper objects (and Integer wrappers for the key) to a map.
public static void main(String[] args) {
Map<Integer,Long> map = new HashMap<>();
Random generator = new Random();
Collection<Integer> counts = new LinkedList<>();
Long object = generator.nextLong();
// We use the identityHashCode as the key into the map
// This makes it easier to check if any other objects
// have the same key.
int hash = System.identityHashCode(object);
while (!map.containsKey(hash)) {
map.put(hash, object);
object = generator.nextLong();
hash = System.identityHashCode(object);
}
System.out.println("Identical maps for size: " + map.size());
System.out.println("First object value: " + object);
System.out.println("Second object value: " + map.get(hash));
System.out.println("First object identityHash: " + System.identityHashCode(object));
System.out.println("Second object identityHash: " + System.identityHashCode(map.get(hash)));
}
Example output:
Identical maps for size: 105822
First object value: 7446391633043190962
Second object value: -8143651927768852586
First object identityHash: 2134400190
Second object identityHash: 2134400190
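The relationship between hashCode() and System.identityHashCode() can also be checked directly. In this sketch (the class and method names are made up), only the first two results are guaranteed; the identity hashes of two distinct instances usually differ, but as the run above shows they may collide.

```java
// Hypothetical demo of hashCode() versus System.identityHashCode().
class IdentityHashDemo {

    // Object does not override hashCode(), so both calls must agree.
    static boolean defaultMatchesIdentity() {
        Object plain = new Object();
        return plain.hashCode() == System.identityHashCode(plain);
    }

    // String overrides hashCode() based on content, so equal strings agree.
    static boolean equalStringsShareHash() {
        return new String("a").hashCode() == new String("a").hashCode();
    }

    public static void main(String[] args) {
        System.out.println(defaultMatchesIdentity()); // true
        System.out.println(equalStringsShareHash());  // true
        // Identity hashes belong to the individual instances and usually differ.
        System.out.println(System.identityHashCode(new String("a")) + " "
                + System.identityHashCode(new String("a")));
    }
}
```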
A hashcode is an integer value that represents the state of the object upon which it was called. That is why an Integer that is set to 1 will return a hashcode of 1: an Integer's hashcode and its value are the same thing. A Character's hashcode is equal to its character code (the ASCII code for ASCII characters). If you write a custom type, you are responsible for creating a good hashCode implementation that best represents the state of the current instance.
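The values from the question can be reproduced with a few lines. HashValuesDemo and hashOf are made-up names; the numbers follow from the documented Integer, Character and String hashCode definitions.

```java
// Hypothetical demo printing the hash codes mentioned in the question.
class HashValuesDemo {

    // Tiny helper so each value can be checked individually.
    static int hashOf(Object o) {
        return o.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(hashOf(Integer.valueOf(1)));     // 1
        System.out.println(hashOf(Character.valueOf('a'))); // 97
        System.out.println(hashOf("a"));                    // 97
        System.out.println(hashOf("ab"));                   // 3105 = 31 * 97 + 98
    }
}
```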
If you want to know how they are implemented, I suggest you read the source. If you are using an IDE, you can jump straight to the source of any method you are interested in and see how it is implemented. If you cannot do that, you can google for the source.
For example, Integer.hashCode() is implemented as
public int hashCode() {
return value;
}
and String.hashCode()
public int hashCode() {
int h = hash;
if (h == 0) {
int off = offset;
char val[] = value;
int len = count;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}
The hashCode() method is often used for identifying an object. I think the Object implementation returns the pointer (not a real pointer but a unique id or something like that) of the object. But most classes override the method, like the String class. Two String objects do not have the same pointer, but they are equal and have the same hash code:
new String("a").hashCode() == new String("a").hashCode()
I think the most common use for hashCode() is in Hashtable, HashSet, etc.
Java API Object hashCode()
Edit: (due to a recent downvote and based on an article I read about JVM parameters)
With the JVM parameter -XX:hashCode you can change how the hashCode is calculated (see Issue 222 of the Java Specialists' Newsletter).
HashCode==0: Simply returns random numbers with no relation to where
in memory the object is found. As far as I can make out, the global
read-write of the seed is not optimal for systems with lots of
processors.
HashCode==1: Counts up the hash code values, not sure at what value
they start, but it seems quite high.
HashCode==2: Always returns the exact same identity hash code of 1.
This can be used to test code that relies on object identity. The
reason why JavaChampionTest returned Kirk's URL in the example above
is that all objects were returning the same hash code.
HashCode==3: Counts up the hash code values, starting from zero. It
does not look to be thread safe, so multiple threads could generate
objects with the same hash code.
HashCode==4: This seems to have some relation to the memory location
at which the object was created.
HashCode>=5: This is the default algorithm for Java 8 and has a
per-thread seed. It uses Marsaglia's xor-shift scheme to produce
pseudo-random numbers.
I read that it is an memory reference of an object..
No. Object.hashCode() used to return a memory address about 14 years ago. Not since.
what type of value is
What it is depends entirely on what class you're talking about and whether or not it has overridden Object.hashCode().
From OpenJDK sources (JDK8):
Use default of 5 to generate hash codes:
product(intx, hashCode, 5,
"(Unstable) select hashCode generation algorithm")
Some constant data and a random generated number with a seed initiated per thread:
// thread-specific hashCode stream generator state - Marsaglia shift-xor form
_hashStateX = os::random() ;
_hashStateY = 842502087 ;
_hashStateZ = 0x8767 ; // (int)(3579807591LL & 0xffff) ;
_hashStateW = 273326509 ;
Then, this function creates the hashCode (defaulted to 5 as specified above):
static inline intptr_t get_next_hash(Thread * Self, oop obj) {
intptr_t value = 0 ;
if (hashCode == 0) {
// This form uses an unguarded global Park-Miller RNG,
// so it's possible for two threads to race and generate the same RNG.
// On MP system we'll have lots of RW access to a global, so the
// mechanism induces lots of coherency traffic.
value = os::random() ;
} else
if (hashCode == 1) {
// This variation has the property of being stable (idempotent)
// between STW operations. This can be useful in some of the 1-0
// synchronization schemes.
intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3 ;
value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
} else
if (hashCode == 2) {
value = 1 ; // for sensitivity testing
} else
if (hashCode == 3) {
value = ++GVars.hcSequence ;
} else
if (hashCode == 4) {
value = cast_from_oop<intptr_t>(obj) ;
} else {
// Marsaglia's xor-shift scheme with thread-specific state
// This is probably the best overall implementation -- we'll
// likely make this the default in future releases.
unsigned t = Self->_hashStateX ;
t ^= (t << 11) ;
Self->_hashStateX = Self->_hashStateY ;
Self->_hashStateY = Self->_hashStateZ ;
Self->_hashStateZ = Self->_hashStateW ;
unsigned v = Self->_hashStateW ;
v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
Self->_hashStateW = v ;
value = v ;
}
value &= markOopDesc::hash_mask;
if (value == 0) value = 0xBAD ;
assert (value != markOopDesc::no_hash, "invariant") ;
TEVENT (hashCode: GENERATE) ;
return value;
}
So we can see that, at least in JDK 8, the default is a pseudo-random number generated from thread-specific state, with no relation to the object's memory address.
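To make the default branch easier to follow, here is a rough Java transcription of the xor-shift step from the excerpt above. It is only a sketch: the class name XorShiftHashSketch is hypothetical, the seed for the first state word is supplied by the caller instead of os::random(), and the 31-bit HASH_MASK is an assumption standing in for markOopDesc::hash_mask, whose actual width is not shown in the excerpt.

```java
class XorShiftHashSketch {
    // Assumption: a 31-bit mask standing in for markOopDesc::hash_mask.
    static final int HASH_MASK = 0x7FFFFFFF;

    // Four state words, named after _hashStateX.._hashStateW in the excerpt.
    private int x;
    private int y = 842502087;
    private int z = 0x8767;
    private int w = 273326509;

    XorShiftHashSketch(int seed) {
        // The JVM seeds this word with os::random(); here the caller provides it.
        this.x = seed;
    }

    int nextHash() {
        int t = x;
        t ^= (t << 11);
        x = y;
        y = z;
        z = w;
        int v = w;
        // >>> is Java's unsigned right shift, matching the C++ "unsigned" arithmetic.
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8));
        w = v;
        int value = v & HASH_MASK;
        // Zero is replaced, as in the excerpt, so that it never appears as a hash.
        return value == 0 ? 0xBAD : value;
    }

    public static void main(String[] args) {
        XorShiftHashSketch generator = new XorShiftHashSketch(12345);
        for (int i = 0; i < 5; i++) {
            System.out.println(generator.nextHash());
        }
    }
}
```

Note that nothing in this computation reads the object or its address; the value depends only on the generator state, which is why two distinct objects can end up with the same identity hash.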
Definition: The String hashCode() method returns the hash code of the String as an int.
Syntax:
public int hashCode()
The hash code is calculated using the formula below:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
where:
s[i] is the ith character of the string
n is the length of the string
^ denotes exponentiation
Example:
To calculate the hash code for the string "abc", we have the following details:
s[] = {'a', 'b', 'c'}
n = 3
So the hashcode value will be calculated as:
s[0]*31^(2) + s[1]*31^1 + s[2]
= a*31^2 + b*31^1 + c*31^0
= (ASCII value of a = 97, b = 98 and c = 99)
= 97*961 + 98*31 + 99
= 93217 + 3038 + 99
= 96354
So the hashcode value for 'abc' is 96354
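The worked calculation above can be checked with a few lines of Java. This is a small sketch: polynomialHash is a hypothetical helper (not a JDK method) that evaluates the same formula using Horner's rule, with int arithmetic that wraps around on overflow.

```java
class StringHashDemo {
    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using Horner's rule.
    // int arithmetic wraps on overflow, so long strings can give negative results.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polynomialHash("abc")); // prints 96354
        System.out.println("abc".hashCode());      // prints 96354
    }
}
```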
Object.hashCode(), if memory serves correctly (check the JavaDoc for java.lang.Object), is implementation-dependent, and will change depending on the object (the Sun JVM derives the value from the value of the reference to the object).
Note that if you are implementing any nontrivial object and want instances that are logically equal to be treated as the same key in a HashMap or HashSet, you MUST override hashCode() and equals(). hashCode() can do whatever you like (it's entirely legal, but suboptimal, to have it return 1), but it's vital that if your equals() method returns true, then the values returned by hashCode() for both objects are equal.
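To illustrate the "legal but suboptimal" case, here is a minimal sketch with a hypothetical Key class (not from the original post) whose hashCode() always returns 1. Lookups still work, because HashMap falls back on equals() to tell colliding keys apart; the cost is that every entry lands in the same bucket.

```java
import java.util.HashMap;
import java.util.Map;

class ConstantHashDemo {
    // Hypothetical key type: equal names mean equal keys, and every key has hash code 1.
    static final class Key {
        final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return 1; // legal, but all keys collide
        }
    }

    static Map<Key, String> buildMap() {
        Map<Key, String> map = new HashMap<>();
        map.put(new Key("a"), "AAA");
        map.put(new Key("b"), "BBB");
        map.put(new Key("c"), "CCC");
        return map;
    }

    public static void main(String[] args) {
        Map<Key, String> map = buildMap();
        // A different Key instance with equal state finds the entry.
        System.out.println(map.get(new Key("b"))); // prints BBB
        System.out.println(map.size());            // prints 3
    }
}
```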
Confusion and lack of understanding of hashCode() and equals() is a big source of bugs. Make sure that you thoroughly familiarize yourself with the JavaDocs for Object.hashCode() and Object.equals(), and I guarantee that the time spent will pay for itself.
From the Javadoc:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#hashCode--
I'm surprised that no one mentioned this. Although it is obvious that for any non-Object class your first action should be to read the source code, for many classes hashCode() is simply inherited from Object, in which case several different interesting things may happen depending on your JVM implementation. Object.hashCode() returns the same value as System.identityHashCode(object).
Indeed, using the object's address in memory is ancient history, but many do not realise they can control how Object.hashCode() is computed via the JVM argument -XX:hashCode=N, where N is a number from 0 to 5:
0 – Park-Miller RNG (default in older JVMs, blocking)
1 – f(address, global state)
2 – constant 1
3 – serial counter
4 – object address
5 – Thread-local Xorshift (default in Java 8)
Depending on the application, you may see unexpected performance hits when hashCode() is called. When that happens, it is likely that you are using one of the algorithms that shares global state and/or blocks.
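One way to see the effect of the algorithms listed above is a tiny probe program that prints the identity hash codes of a few fresh objects. This is only a sketch: IdentityHashProbe and sample are hypothetical names, and it assumes your JVM accepts the -XX:hashCode=N option described above (some builds may require additional options or may not support it at all).

```java
class IdentityHashProbe {
    // Returns the identity hash codes of n freshly allocated objects.
    static int[] sample(int n) {
        int[] hashes = new int[n];
        for (int i = 0; i < n; i++) {
            hashes[i] = System.identityHashCode(new Object());
        }
        return hashes;
    }

    public static void main(String[] args) {
        Object o = new Object();
        // For a class that does not override hashCode(), both calls agree.
        System.out.println(o.hashCode() == System.identityHashCode(o));
        for (int h : sample(5)) {
            System.out.println(h);
        }
    }
}
```

Running the same class with different values of N and comparing the printed numbers (for example, whether they are constant, sequential, or scattered) gives a quick impression of which algorithm is active.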
According to the Javadoc of Object.hashCode(), the internal address of the object is "converted into an integer", so it is clear that hashCode() does not return the internal address of the object as it is. The link is provided below.
https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#hashCode--
To make this clear, please see the following sample code:
public class HashCodeDemo
{
public static void main(String[] args)
{
final int CAPACITY_OF_MAP = 10000000;
/**
* hashCode as key, and Object as value
*/
java.util.HashMap<Integer, Object> hm1 = new java.util.HashMap<Integer, Object>(CAPACITY_OF_MAP);
int noOfDistinceObject = 0;
Object obj = null;
for(int i = 0; i < CAPACITY_OF_MAP; i++)
{
obj = new Object();
hm1.put(obj.hashCode(), new Object());
}
System.out.println("hm1.size() = "+hm1.size());
/**
* hashCode as key, and Object as value
*/
java.util.HashMap<Integer, Object> hm2 = new java.util.HashMap<Integer, Object>(CAPACITY_OF_MAP);
for(int i = 0; i < CAPACITY_OF_MAP; i++)
{
obj = new Object();
/**
* Each live Object has a unique memory location,
* so if Object's hashCode were the memory location, every hashCode would also be unique
* and nothing would ever be put into hm2.
*
* If obj's hashCode does not exist in hm1, increment noOfDistinceObject; otherwise record its hashCode in hm2.
*/
if(hm1.get(obj.hashCode()) == null)
{
noOfDistinceObject++;
}
else
{
hm2.put(obj.hashCode(), new Object());
}
}
System.out.println("hm2.size() = "+hm2.size());
System.out.println("noOfDistinceObject = "+noOfDistinceObject);
}
}
Each Object has a unique memory location, so if Object's hashCode() method returned the memory location, the hash code of each Object would also be unique. But if we run the sample code above, some Objects have the same hash code value and some have unique hash code values.
So we can say that the hashCode() method of the Object class does not return the memory location.