Custom key generation and collision in a hashMap - java

I have a method that is expected to save an object in a hashmap (used as a cache) that has as a key a String.
The objects that are passed in the method have either fields that are “volatile” i.e. they can change on a next refresh from the data store or are the same across all objects except for 3 fields.
Those fields are 2 of type double and 1 field of type String.
What I am doing now is I use:
Objects.hash(doubleField1, doubleField2, theString)
and convert the result to String.
Basically I am generating a key for that object based on the immutable state.
This seems to work but I have the following question:
If we exclude the case of 2 objects with exactly the same fields (which is impossible), how likely is it that I end up with a collision that cannot be verified properly?
What I mean is: in a hashmap with ordinary keys (e.g. strings), if there is a collision on the hashCode, the actual value of the key is compared to verify whether it is the same object or just a collision.
Would using keys the way I have described create problems in such verification?
Update:
If I have e.g. a hashmap keyed by a Person, and the hashCode is generated from fullName and ssn or dateOfBirth, then on a collision the hashmap implementation uses equals to verify whether it is the actual object being searched for.
I was wondering whether the approach I describe could have an issue in that part, because I generate the actual key directly from the hash.

Here is a simple demo for a hashMap key implementation. When retrieving the object I construct the fields piecemeal to avoid any possibility of using cached Strings or Integers. It makes a more convincing demo.
Map<MyKey, Long> map = new HashMap<>();
map.put(new MyKey(10,"abc"), 1234556L);
map.put(new MyKey(400,"aefbc"), 548282L);
int n = 380;
long v = map.get(new MyKey(n + 20, "ae" + "fbc")); // Should get 548282
System.out.println(v);
prints
548282
The key class
class MyKey {
private int v;
private String s;
private int hashcode;
public MyKey(int v, String s) {
Objects.requireNonNull(s, "String must be provided");
this.v = v;
this.s = s;
// this class is immutable so no need to keep
// computing hashCode
hashcode = Objects.hash(s,v);
}
@Override
public int hashCode() {
return hashcode;
}
@Override
public boolean equals(Object o) {
if (o == this) {
return true;
}
if (o == null) {
return false;
}
if (o instanceof MyKey) {
MyKey mk = (MyKey)o;
return v == mk.v && s.equals(mk.s);
}
return false;
}
}
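Coming back to the original question, here is a minimal sketch of why a stringified hash is a risky cache key. It assumes the three identifying fields from the question (two doubles and a String); the names CacheKey, HashKeyCollisionDemo and hashKey are made up for illustration. "Aa" and "BB" are a well-known pair of strings with the same hashCode, so Objects.hash returns the same int for both field sets and the two String keys become identical. At that point the map has nothing left to compare, whereas a key class with its own equals keeps the entries apart.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class holding the three identifying fields from the question.
final class CacheKey {
    private final double d1;
    private final double d2;
    private final String s;

    CacheKey(double d1, double d2, String s) {
        this.d1 = d1;
        this.d2 = d2;
        this.s = Objects.requireNonNull(s, "String must be provided");
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CacheKey)) return false;
        CacheKey other = (CacheKey) o;
        // Double.compare is consistent with Double.hashCode (NaN, 0.0 vs -0.0)
        return Double.compare(d1, other.d1) == 0
                && Double.compare(d2, other.d2) == 0
                && s.equals(other.s);
    }

    @Override
    public int hashCode() {
        return Objects.hash(d1, d2, s);
    }
}

class HashKeyCollisionDemo {
    // The approach from the question: the hash itself, as a String, is the key.
    static String hashKey(double d1, double d2, String s) {
        return String.valueOf(Objects.hash(d1, d2, s));
    }

    static int sizeWithHashKeys() {
        Map<String, String> cache = new HashMap<>();
        cache.put(hashKey(1.0, 2.0, "Aa"), "first");
        cache.put(hashKey(1.0, 2.0, "BB"), "second"); // same key: overwrites "first"
        return cache.size();
    }

    static int sizeWithKeyObjects() {
        Map<CacheKey, String> cache = new HashMap<>();
        cache.put(new CacheKey(1.0, 2.0, "Aa"), "first");
        cache.put(new CacheKey(1.0, 2.0, "BB"), "second"); // same hash, but equals() tells them apart
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeWithHashKeys());   // prints 1
        System.out.println(sizeWithKeyObjects()); // prints 2
    }
}
```

How likely such a clash is in practice depends on the data, but with a 32-bit hash it cannot be ruled out, and with the String-of-hash key it goes unnoticed when it happens.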

Related

What immutable object works with List.contains?

I want to create a list of coordinates and be able to check if this list contains a new coordinate.
I have tried implementing Pair, but using List.contains() with a Pair doesn't work, always returning false.
What other objects can I use that I will be able to check against List.contains()?
This is probably because your implementation of Pair doesn't provide an overridden equals method. I was able to reproduce your issue using the following code:
//a plain POJO, in Pair.java
public class Pair<A, B> {
private A a;
private B b;
public Pair(A a, B b) {
this.a = a;
this.b = b;
}
public A getA() {
return a;
}
public B getB() {
return b;
}
}
//... Main.java
public class Main {
public static void main(String[] args) {
List<Pair<Integer, String>> list = new ArrayList<>();
Pair<Integer, String> one = new Pair<>(1, "hello");
Pair<Integer, String> two = new Pair<>(1, "hello");
list.add(one);
System.out.println(list.contains(two));
}
}
This prints false, because List.contains tests elements with equals, and Pair inherits the default equals from class Object, which only checks reference identity. With the above code, for example, one.equals(two) evaluates to false, because they're not the same object. To fix this, you have to provide an equals method that compares each field individually:
//in class Pair
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Pair<?, ?> pair = (Pair<?, ?>) o;
return Objects.equals(a, pair.a) &&
Objects.equals(b, pair.b);
}
You can delegate the somewhat tedious and error-prone task of writing this code to your IDE. I'm using IntelliJ, and it's only a matter of clicking Code/Generate/equals() and hashCode(). You don't need hashCode() for this particular case, but it's always a good idea to keep equals() and hashCode() together. Now, when List.contains tries to find an element that is equal to the one you provided, it will use this new and more appropriate method. If you run the main method again, you'll see it evaluates to true.
List.contains checks whether two objects are equal using the equals method. Every class inherits Object.equals, which compares references only, so contains works as expected only for classes that override it. If you are writing your own Pair class, make sure you give it an equals method.
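As a minimal sketch of the point made in these answers, here is an immutable coordinate class that overrides both equals and hashCode, together with a List.contains check. The class and method names (Coordinate, ContainsDemo, containsEqualCopy, containsSwapped) are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Immutable value class: two coordinates are equal when x and y match.
final class Coordinate {
    private final int x;
    private final int y;

    Coordinate(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Coordinate)) return false;
        Coordinate other = (Coordinate) o;
        return x == other.x && y == other.y;
    }

    // Not needed by List.contains, but kept consistent with equals
    // so the class also works in HashSet and HashMap.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class ContainsDemo {
    static boolean containsEqualCopy() {
        List<Coordinate> list = new ArrayList<>();
        list.add(new Coordinate(1, 2));
        // A different instance with the same values is found via equals()
        return list.contains(new Coordinate(1, 2));
    }

    static boolean containsSwapped() {
        List<Coordinate> list = new ArrayList<>();
        list.add(new Coordinate(1, 2));
        return list.contains(new Coordinate(2, 1));
    }

    public static void main(String[] args) {
        System.out.println(containsEqualCopy()); // prints true
        System.out.println(containsSwapped());   // prints false
    }
}
```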
I think your Pairs contain the same values, but they aren't the same object. That's why it doesn't work. You could use a Map instead.
How you can use a map:
Map<String, Integer> map = new HashMap<>();
map.put("Hello", 0); // add key and value to map (Key: "Hello", Value: 0)
map.put("World", 1); // add key and value to map (Key: "World", Value: 1)
if (map.containsKey("World")) { // check if map contains key: "World"
System.out.println(map.get("World")); // try return value by key: returns 1
System.out.println(map.get("world")); // try return value by key: returns null
}
if (map.containsKey("Hello")) { // check if map contains key: "Hello"
System.out.println(map.get("Hello")); // try return value by key: returns 0
System.out.println(map.get("hello")); // try return value by key: returns null
}

Who calls the equals method of the class while putting the elements into the HashMap?

I am new to Java (very new).
I am trying to understand HashMap and the equals method of a class and how it overrides the duplicates.
Please see following code:
public class Student {
Integer StudentId;
String Name;
String City;
public Student(Integer studentId, String name, String city) {
super();
StudentId = studentId;
Name = name;
City = city;
}
public Integer getStudentId() {
return StudentId;
}
public String getName() {
return Name;
}
public String getCity() {
return City;
}
@Override
public int hashCode() {
System.out.println("hashCode is called for " + this);
final int prime = 31;
int result = 1;
result = prime * result + ((StudentId == null) ? 0 : StudentId.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
System.out.println("equals is called for " + this);
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Student other = (Student) obj;
if (StudentId == null) {
if (other.StudentId != null)
return false;
} else if (!StudentId.equals(other.StudentId))
return false;
return true;
}
@Override
public String toString() {
return "\n Student [StudentId=" + StudentId + ", Name=" + Name + ", City=" + City + "] \n";
}
public static void main(String[] args) {
// TODO Auto-generated method stub
Map<Student, String> myMap = new HashMap<Student, String>();
myMap.put(new Student(1, "andy", "p"), "Great"); //Line 1
myMap.put(new Student(2, "sachin", "m"), "Better");
myMap.put(new Student(3, "dev", "s"), "Good");
myMap.put(new Student(1, "andy", "p"), "Excellent"); // Line 4
System.out.println(myMap);
}
}
Now, the code in main() calls the equals method only when I put the same key again, i.e. at "Line 4" (see the comments in my code).
Why is the equals method not called for "Line 2" and "Line 3" (the second and third put)?
It should call equals for every put line ... correct?
I am missing some understanding here and am left with questions:
(1) Why is every put not calling the equals method to check the equality of class members?
(2) Who triggers the call of the Student class equals method?
It should call equals for every put line .... correct ?
No. A HashMap will call equals only after it encounters a hash collision between an existing key and the one given in put.
Rephrased, it calls hashCode first to determine which "hash bucket" to put the key into, and if there are already keys inside the target bucket, it then uses equals to compare the keys in the bucket for equality.
Since the value of Student.hashCode() is based on ID alone, during insertion, the map only needs to call equals when it encounters a Student key with the same ID as what is being inserted. If no existing keys have the same hashCode as the one being inserted, there is no need to call equals.
This makes HashMap very efficient during insertion. This is also why there is a contract between hashCode and equals: If two objects are equal as defined by equals, they must also have the same hashCode (but not necessarily vice-versa).
equals() is not called if the hashCode() result is different. It's only the same for Line 1 and Line 4 (same student Id of 1), so equals() is called for that.
Note that hashCode() may be the same for two objects that aren't equals(), but two equals() objects must never have a different hashCode():
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
So the initially different hash code is enough to not call equals() afterwards.
The whole purpose of a hash-based map is to operate on hash values (for efficiency that is).
The Map first and foremost cares about different hash values. Thus, as long as an "incoming" key has a hash that the map has not seen so far, equality doesn't matter.
Only when you run into a conflicting hash, then it matters whether that incoming key is actually a different key, or the same key. In the first case, you add a new key/value pair to the map, in the second case, you update an already stored key with a potential new value!
Therefore calling equals() only happens for situations where the Map implementation has to decide whether two keys that have the same hash are equal, or not.
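The behaviour described in these answers can be observed with a small counting key. This is only a sketch: CountingKey and EqualsCallDemo are hypothetical names, and the exact call counts are an implementation detail of OpenJDK's HashMap rather than something the Map contract promises.

```java
import java.util.HashMap;
import java.util.Map;

// Key that counts how often the map calls hashCode() and equals().
final class CountingKey {
    static int equalsCalls = 0;
    static int hashCodeCalls = 0;

    private final int id;

    CountingKey(int id) {
        this.id = id;
    }

    @Override
    public int hashCode() {
        hashCodeCalls++;
        return id;
    }

    @Override
    public boolean equals(Object o) {
        equalsCalls++;
        return o instanceof CountingKey && ((CountingKey) o).id == id;
    }
}

class EqualsCallDemo {
    // Returns {equals calls after three distinct keys, total equals calls,
    //          total hashCode calls, final map size}
    static int[] run() {
        CountingKey.equalsCalls = 0;
        CountingKey.hashCodeCalls = 0;

        Map<CountingKey, String> map = new HashMap<>();
        map.put(new CountingKey(1), "Great");
        map.put(new CountingKey(2), "Better");
        map.put(new CountingKey(3), "Good");
        int equalsAfterDistinctKeys = CountingKey.equalsCalls;

        // Same hash as the first key: now the map has to ask equals()
        map.put(new CountingKey(1), "Excellent");

        return new int[] {
            equalsAfterDistinctKeys,
            CountingKey.equalsCalls,
            CountingKey.hashCodeCalls,
            map.size()
        };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("equals calls after 3 distinct keys: " + r[0]);
        System.out.println("equals calls in total: " + r[1]);
        System.out.println("hashCode calls in total: " + r[2]);
        System.out.println("map size: " + r[3]);
    }
}
```

On OpenJDK this reports no equals call for the three distinct keys, one equals call for the repeated key, one hashCode call per put, and a final size of 3.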
If hash code differs, then there is no case for calling equals. Look at the code for HashMap(). If the hash is the same, then equals is called.
As you can see while running your code, hashcode() is called for every .put() call.
Basically, hashCode() is called for every put() operation. If the hash code is unique, the new element can be placed in the map right away, because different hash codes always mean different objects (this follows from the hashCode() contract). However, different objects don't always have different hash codes. Because of that, if the hash codes of two objects are the same, the HashMap has to check object equality with equals().
If you want to watch how the hashCode and equals methods work for selected values,
create the following map:
Map<Key, Value> map = new HashMap<>();
Then create instances of the following classes and use them to populate the Map. I recommend using a String as the object of both classes since I used its equals method in both.
Notice that you supply the hashCode to be returned. This allows it to be the same or different so you can see how the map behaves in different situations.
class Value {
private Object obj;
private int hashCode;
public Value(Object obj, int hashCode) {
this.obj = obj;
this.hashCode = hashCode;
}
public int hashCode() {
System.out.println("Value: hashCode is called - " + hashCode);
return hashCode;
}
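public boolean equals(Object o) {
System.out.println("Value: equals is called - " + obj);
// compare the wrapped objects; obj.equals(o) would compare a String with a Value and always be false
return o instanceof Value && obj.equals(((Value) o).obj);
}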
public String toString() {
return "Value: obj = " + obj + ", hashCode = " + hashCode;
}
}
class Key {
private Object obj;
private int hashCode;
public Key(Object obj, int hashCode) {
this.obj = obj;
this.hashCode = hashCode;
}
public int hashCode() {
System.out.println("Key: hashCode is called - " + hashCode);
return hashCode;
}
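public boolean equals(Object o) {
System.out.println("Key: equals is called - " + obj);
// compare the wrapped objects; obj.equals(o) would compare a String with a Key and always be false
return o instanceof Key && obj.equals(((Key) o).obj);
}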
public String toString() {
return "Key: obj = " + obj + ", hashCode = " + hashCode;
}
}
You can read the source code of HashMap.java in JDK 1.7.
Then you will understand the answers to the questions you asked.
The other answers are more helpful after you have read the source code of HashMap.

Peculiar HashMap Behavior

I was reviewing one of Oracle’s Java Certification Practice Exams when I came across the following question:
Given:
class MyKeys {
Integer key;
MyKeys(Integer k) {
key = k;
}
public boolean equals(Object o) {
return ((MyKeys) o).key == this.key;
}
}
And this code snippet:
Map m = new HashMap();
MyKeys m1 = new MyKeys(1);
MyKeys m2 = new MyKeys(2);
MyKeys m3 = new MyKeys(1);
MyKeys m4 = new MyKeys(new Integer(2));
m.put(m1, "car");
m.put(m2, "boat");
m.put(m3, "plane");
m.put(m4, "bus");
System.out.print(m.size());
What is the result?
A) 2
B) 3
C) 4
D) Compilation fails
My guess was B because m1 and m3 are equal due to their key references being the same. To my surprise, the answer is actually C. Does put() do something that I am missing? Why wouldn’t "plane" replace "car"? Thank you!
With given definition of class i.e
class MyKeys {
Integer key;
MyKeys(Integer k) {
key = k;
}
public boolean equals(Object o) {
return ((MyKeys) o).key == this.key;
}
}
The result is 4, because the class overrides only equals. If you also add a definition of hashCode, the result is 3:
class MyKeys {
Integer key;
MyKeys(Integer k) {
this.key = k;
}
@Override
public boolean equals(Object o) {
return ((MyKeys) o).key == this.key;
}
@Override
public int hashCode(){
return key*key;
}
}
Contract of equals and hashCode:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result. If you only override equals() and not hashCode(), your class violates this contract.
The problem shows up in collections where uniqueness of elements is determined by both .equals() and .hashCode(), for instance keys in a HashMap.
If you have two objects which are .equals() but have different hash codes, you lose!
Let's keep this simple, since this is for a Java certification.
Notice that MyKeys doesn't override hashCode; that is a hint that the question is about it. I usually try to remember only one thing about Object.hashCode:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
Or in short, every instance will typically have a distinct hash code. This means that with this code, every new MyKeys adds a new pair to the map.
In reality, this is a bit more complex: the method still returns an int, so a risk of collision remains (an int cannot provide an infinite number of values).
This explains why the map ends up with a size of 4. Each key inserted is a different instance.
As answered by others, the answer is 4, the reason being that the hashCode method is not overridden.
More precisely, whenever an entry is added to a hash map, the hash code of the key is computed, and it decides the bucket in which the entry is stored. The 2 objects m1 and m3 have different hash codes because hashCode is not overridden (the default Object.hashCode behaviour). Since no existing key has the same hash code, a new entry is made.
The equals method, on the other hand, is called only when hashCode produces the same result for both keys.
The same holds for m2 and m4: the 2 objects have different hash codes, hence 2 different entries, with no call to the equals method.
Hence, for hash-based collections, it is necessary to override hashCode along with equals.
It becomes clearer when we look at the implementation of HashMap's put method.
// put() calls hash(key) to compute the hash of the key,
// and putVal() uses that int hash to find the right bucket in the map for the entry.
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
In your code you override only the equals method.
class MyKeys {
Integer key;
MyKeys(Integer k) {
key = k;
}
public boolean equals(Object o) {
return ((MyKeys) o).key == this.key;
}
}
To get the expected output you need to override both the hashCode() and equals() methods.
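A minimal sketch of a version that overrides both methods is shown below. Note that it deliberately differs from the exam code: it compares the Integer values with Objects.equals instead of comparing references with ==, so two keys wrapping the value 2 are equal no matter how the Integer was created, and the map ends up with 2 entries rather than 3. FixedKeys and FixedKeysDemo are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class FixedKeys {
    private final Integer key;

    FixedKeys(Integer k) {
        key = k;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FixedKeys)) return false;
        // Value comparison, safe for null and for non-cached Integer instances
        return Objects.equals(key, ((FixedKeys) o).key);
    }

    @Override
    public int hashCode() {
        // Equal keys produce equal hash codes, as the contract requires
        return Objects.hashCode(key);
    }
}

class FixedKeysDemo {
    static Map<FixedKeys, String> build() {
        Map<FixedKeys, String> m = new HashMap<>();
        m.put(new FixedKeys(1), "car");
        m.put(new FixedKeys(2), "boat");
        m.put(new FixedKeys(1), "plane"); // replaces "car"
        m.put(new FixedKeys(2), "bus");   // replaces "boat"
        return m;
    }

    public static void main(String[] args) {
        Map<FixedKeys, String> m = build();
        System.out.println(m.size());                // prints 2
        System.out.println(m.get(new FixedKeys(1))); // prints plane
    }
}
```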

get() method for HashMap

I'm trying to get a value by its key, using the get() method. My key is an object composed of an int and a String, so I create an object:
HashMap<Keys,String> test = readFile(fileName2);
Keys myKey = new Keys(2,"SM");
test.get(myKey);
And I received null. When I look in debugging mode or print the keySet, I get something like this
[Keys@d9c6e2, Keys@10285d8, Keys@c3dd7e]
although my keys should be
[1,"GM", 2,"SM", 3,"PM"]
Why does the key look like Keys@d9c6e2 instead of 2,"SM"? And how do I get the value for the key 2,"SM"?
I overrode the toString method in Keys. It looks better, but I still get a null value and I'm sure there is a value.
Keys myKey = new Keys(2,"GL-G--T");
System.out.println(myKey.toString());
System.out.println(test.get(myKey.toString()));
Set keyset = test.keySet();
System.out.println(keyset);
2,GL-G--T
null
[3,PNSN--G, 2,GL-G--T, 1,SM]
You need to override toString method on your Keys object. Otherwise you will get the default toString provided by java.lang.Object.
You could implement the toString method to look something like this:
public class Keys {
private final Integer i;
private final String s;
public Keys(Integer i, String s) {
this.i = i;
this.s = s;
}
@Override
public String toString() {
return i + "," + s;
}
}
if you want the quotes to be displayed then you'd need to provide those:
return i + ",\"" + s + "\"";
You'll also need to override the equals and hashCode for this object to be used as a key in a map:
@Override
public boolean equals(Object o) {
if (!(o instanceof Keys)) {
return false;
}
Keys other = (Keys)o;
return other.s.equals(s) && other.i.equals(i);
}
@Override
public int hashCode() {
return toString().hashCode();
}
If you don't override equals and hashcode, then the map uses the default implementations, which results in two Keys objects with the same values being unequal.
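Putting the pieces together, the following sketch shows a lookup that succeeds once equals and hashCode are in place, and one that still returns null because a String is passed to get() while the map is keyed by Keys objects, as in the second snippet of the question. The map contents and the names KeysLookupDemo, lookupByKey and lookupByString are invented for the example; the real map in the question comes from readFile.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class Keys {
    private final int i;
    private final String s;

    Keys(int i, String s) {
        this.i = i;
        this.s = Objects.requireNonNull(s);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Keys)) return false;
        Keys other = (Keys) o;
        return i == other.i && s.equals(other.s);
    }

    @Override
    public int hashCode() {
        return Objects.hash(i, s);
    }

    @Override
    public String toString() {
        return i + "," + s;
    }
}

class KeysLookupDemo {
    // Hypothetical contents standing in for the result of readFile(...)
    static Map<Keys, String> sampleMap() {
        Map<Keys, String> test = new HashMap<>();
        test.put(new Keys(1, "GM"), "first");
        test.put(new Keys(2, "SM"), "second");
        test.put(new Keys(3, "PM"), "third");
        return test;
    }

    static String lookupByKey() {
        // A new but equal Keys instance finds the entry
        return sampleMap().get(new Keys(2, "SM"));
    }

    static String lookupByString() {
        // Compiles because get() takes Object, but a String never equals a Keys
        return sampleMap().get(new Keys(2, "SM").toString());
    }

    public static void main(String[] args) {
        System.out.println(lookupByKey());    // prints second
        System.out.println(lookupByString()); // prints null
    }
}
```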
You could as you are doing use a special Keys object as the key to your hash map-- you then just need to correctly implement hashCode and equals on that Keys class as others have explained.
Unless you have a specific reason not to, though, you really could just use a String as the key to the hash map. Create some method such as the following:
private static String getHashMapKeyFor(int intKey, String stringKey) {
return stringKey + "|" + intKey;
}
and declare your hash map as taking a String as the key type. Then, whenever you want to put/find a value in the hash map, call the above method first to get the actual key to use with the hash map.
Using the custom object class may have a superficial air of "correctness" or "engineeredness" to it, but in reality, just using a String will generally perform equally well and if anything may even save slightly on memory.
In your Keys.java object override the toString method. Currently it's using the method defined in java.lang.Object#toString

HashSet behavior when changing field value

I just did the following code:
import java.util.HashSet;
import java.util.Set;
public class MyClass {
private static class MyObject {
private int field;
public int getField() {
return field;
}
public void setField(int aField) {
field = aField;
}
@Override
public boolean equals(Object other) {
boolean result = false;
if (other != null && other instanceof MyObject) {
MyObject that = (MyObject) other;
result = (this.getField() == that.getField());
}
return result;
}
@Override
public int hashCode() {
return field;
}
}
public static void main(String[] args) {
Set<MyObject> mySet = new HashSet<MyObject>();
MyObject object = new MyObject();
object.setField(3);
mySet.add(object);
object.setField(5);
System.out.println(mySet.contains(object));
MyObject firstElement = mySet.iterator().next();
System.out.println("The set object: " + firstElement + " the object itself: " + object);
}
}
It prints:
false
The set object: MyClass$MyObject@5 the object itself: MyClass$MyObject@5
Basically this means that the object is not considered to be in the set, while the instance itself apparently is in the set. So if I insert an object into a set and then change the value of a field that participates in the hashCode calculation, the HashSet will cease to work as expected. Isn't this too big a source of possible errors? How can someone defend against such cases?
Below is the quote from Set API. It explains everything.
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
http://docs.oracle.com/javase/7/docs/api/java/util/Set.html
HashSet is implemented on top of HashMap.
HashMap caches the hashCode of each key in its entry. So if you change a field so that hashCode changes, then even if the new hash happens to map to the same bucket as the original, the lookup will not find the object, because the cached hash is compared before object equality is even checked.
see the line:
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
And if the new hashCode maps to a different bucket than the one where the original object is stored, then obviously it can't be found.
So even though it is the same object, once its hashCode has changed the HashSet can't find it. Hope it helps.
So the key of a HashMap, or the object that you put into a HashSet, should be immutable or effectively immutable.
@fazomisiek
public HashSet() {
map = new HashMap<E,Object>();
}
Similarly if you check the source of HashSet you can find it.
This problem is just a limitation of the implementation of java.util.HashSet and the underlying java.util.HashMap. Fundamentally, you're trading off the ability to modify elements in the set for faster insert/lookup performance - it's just part of the contract of using a hash set / map data structure.
If you can't guarantee everybody will remember they can't modify the objects in the set, the only way to absolutely guard against this happening is to only insert immutable objects into the set in the first place.
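When the element type has to stay mutable, one common workaround is to take the element out of the set before changing it and add it back afterwards. The sketch below contrasts that with mutating in place; MutableItem and MutationDemo are hypothetical names, and the class mirrors MyObject from the question.

```java
import java.util.HashSet;
import java.util.Set;

// Mutable element whose hashCode depends on a mutable field.
final class MutableItem {
    private int field;

    int getField() {
        return field;
    }

    void setField(int aField) {
        field = aField;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof MutableItem
                && ((MutableItem) other).field == field;
    }

    @Override
    public int hashCode() {
        return field;
    }
}

class MutationDemo {
    // Changing the field while the object is in the set breaks lookups.
    static boolean mutateInPlace() {
        Set<MutableItem> set = new HashSet<>();
        MutableItem item = new MutableItem();
        item.setField(3);
        set.add(item);
        item.setField(5); // the set still remembers the old hash (3)
        return set.contains(item);
    }

    // Remove first, then change, then add again: the set stays consistent.
    static boolean removeMutateReAdd() {
        Set<MutableItem> set = new HashSet<>();
        MutableItem item = new MutableItem();
        item.setField(3);
        set.add(item);

        set.remove(item); // must happen before the field changes
        item.setField(5);
        set.add(item);    // stored under the new hash (5)
        return set.contains(item) && set.size() == 1;
    }

    public static void main(String[] args) {
        System.out.println(mutateInPlace());     // prints false
        System.out.println(removeMutateReAdd()); // prints true
    }
}
```

This only helps where every code path that modifies the object knows about the set, which is why the answers above recommend immutable elements as the safer option.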
