LinkedHashSet: hashCode() and equals() match, but contains() doesn't

LinkedHashSet: hashCode() and equals() match, but contains() doesn't - java

How is the following possible:
void contains(LinkedHashSet data, Object arg) {
System.out.println(data.getClass()); // java.util.LinkedHashSet
System.out.println(arg.hashCode() == data.iterator().next().hashCode()); // true
System.out.println(arg.equals(data.iterator().next())); // true
System.out.println(new ArrayList(data).contains(arg)); // true
System.out.println(new HashSet(data).contains(arg)); // true
System.out.println(new LinkedHashSet(data).contains(arg)); // true (!)
System.out.println(data.contains(arg)); // false
}
Am I doing something wrong?
Obviously, it doesn't always happen (if you create a trivial set of Objects, you won't reproduce it). But it does always happen in my case with more complicated class of arg.
EDIT: The main reason why I don't define arg here is that's it's fairly big class, with Eclipse-generated hashCode that spans 20 lines and equals twice as long. And I don't think it's relevant - as long as they're equal for the two objects.

When you build your own objects, and plan to use them in a collection you should always override the following methods:
boolean equals(Object o);
int hashCode();
The default implementation of equals checks whether the objects point to the same object, while you'd probably want to redefine it to check the contents.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. To respect the rules an hashCode of an object equals to another one should be the same, thus you've also to redefine hashCode.
EDIT: I was expecting a faulty hashCode or equals implementation, but since your answer, you revealed that you're mutating the keys after they are added to an HashSet or HashMap.
When you add an Object to an hash collection, its hashCode is computed and used to map it to a physical location in the Collection.
If some fields used to compute the hashCode are changed, the hashCode itself will change, so the HashSet implementation will become confused. When it tries to get the Object it will look at another physical location, and won't find the Object. The Object will still be present if you enumerate the set though.
For this reason, always make HashMap or HashSet keys Immutable.

Got it. Once you know it, the answer is so obvious you can only blush in embarrassment.
static class MyObj {
String s = "";
#Override
public int hashCode() {
return s.hashCode();
}
#Override
public boolean equals(Object obj) {
return ((MyObj) obj).s.equals(s);
}
}
public static void main(String[] args) {
LinkedHashSet set = new LinkedHashSet();
MyObj obj = new MyObj();
set.add(obj);
obj.s = "a-ha!";
contains(set, obj);
}
That is enough to reliably reproduce it.
Explanation: Thou Shalt Never Mutate Fields Used For hashCode()!

There seems to be something missing from your question. I have made some guesses:
private void testContains() {
LinkedHashSet set = new LinkedHashSet();
String hello = "Hello!";
set.add(hello);
contains(set, hello);
}
void contains(LinkedHashSet data, Object arg) {
System.out.println(data.getClass()); // java.util.LinkedHashSet
System.out.println(arg.hashCode() == data.iterator().next().hashCode()); // true
System.out.println(arg.equals(data.iterator().next())); // true
System.out.println(new ArrayList(data).contains(arg)); // true
System.out.println(new HashSet(data).contains(arg)); // true
System.out.println(new LinkedHashSet(data).contains(arg)); // true (!)
System.out.println(data.contains(arg)); // true (!!)
}
EDITED: To keep track of changing question!
I still get "true" for ALL but the first output. Please be more specific about the type of the "arg" parameter.

Related

Why does super.hashCode give different results on objects from the same Class?

I have a class DebugTo where if I have two equal instances el1, el2 a HashSet of el1 will not regard el2 as contained.
import java.util.Objects;
public class DebugTo {
public String foo;
public DebugTo(String foo) {
this.foo = foo;
}
#Override
public int hashCode() {
System.out.println(super.hashCode());
return Objects.hash(super.hashCode(), foo);
}
#Override
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
DebugTo that = (DebugTo) o;
return Objects.equals(foo, that.foo);
}
}
var el1 = new DebugTo("a");
var el2 = new DebugTo("a");
System.out.println("Objects.equals(el1, el2): " + Objects.equals(el1, el2));
System.out.println("Objects.equals(el2, el1): " + Objects.equals(el2, el1));
System.out.println("el1.hashCode(): " + el1.hashCode());
System.out.println("el2.hashCode(): " + el2.hashCode());
Objects.equals(el1, el2): true
Objects.equals(el2, el1): true
1205483858
el1.hashCode(): -1284705008
1373949107
el2.hashCode(): -357249585
From my analysis I have gathered that:
HashSet::contains calls hashCode not equals (relying on the Objects.equals(a, b) => a.hashSet() == b.hashSet())
super.hashCode() gives a different value both times.
Why does super.hashCode() give different results for el1 and el2? since they are of the same class, they have the same super class and so I expect super.hashCode() to give the same result for both.
The hashCode method was probably autogenerated by eclipse. If not answered above, why is super.hashCode used wrong here?

Because the default implementations of the equals and hashCode methods (which go hand in hand - you always override both or neither) treat any 2 different instances as not equal to each other. If you want different behaviour, you override equals and hashCode, and do not invoke super.equals / super.hashCode, or there'd be no point.
HashSets work as follows: They use .hashCode() to know which 'bucket' to put the object into, and if 2 objects end up in the same bucket, equals is used only on those very few objects to double check.
In other words, these are the rules:
If a.equals(b), then b.equals(a) must be true.
a.equals(a) must always be true.
If a.equals(b) and b.equals(c), a.equals(c) must be true.
If a.equals(b), a.hashCode() == b.hashCode() must be true.
The reverse of 4 does not hold: If a.hashCode() == b.hashCode(), that doesn't mean a.equals(b), and hashset does not require it.
Therefore, return 1; is a legal implementation of hashCode.
If a class has really bad hashcode spread (such as the idiotic but legal option listed in bullet 6), then the performance of hashset will be very bad. e.g. set.containsKey(k) which ordinarily takes constant time, will take linear time instead if your objects are all not-equal but have the same hashCode. Hence, do try to ensure hashcodes are as different as they can be.
HashSet and HashMap require stable objects, meaning, their behaviour when calling hashCode and equals cannot change over time.
From the above it naturally follows that overriding equals and not hashCode or vice versa is necessarily broken.
Breaking any of the above rules does not, generally, result in a compiler error. It often doesn't even result in an exception. But instead it results in bizarre behaviour with hashsets and hashmaps: You put an k/v pair in the map, and then immediately ask for the value back and you get null back instead of what you put in, or something completely different. Just an example.
NB: One weird effect of all this is that you cannot add equality-affecting state to subclasses, unless you apply a caveat that most classes including all classes in the core libraries don't apply.
Imagine as an example that we invent the notion of a 'coloured' arraylist. You could have a red '["Hello", "World"]' list, and a blue one:
class ColoredArrayList extends ArrayList {
Color color;
public ColoredArrayList(Color c) {
this.color = color;
}
}
You'd probably want an empty red list to not equal an empty blue one. However, that is impossible if you intend to follow the rules. That's because the equals/hashCode impl of ArrayList itself considers any other list equal to itself if it has the same items in the same order. Therefore:
List<String> a = new ArrayList<String>();
ColoredList<String> b = new ColoredList<String>(Color.RED);
a.equals(b); // this is true, and you can't change that!
Therefore, b.equals(a) must also be true (your impl of equals has to say that an empty red list is equal to an empty plain arraylist), and given that an empty arraylist is also equal to an empty blue one, given that a.equals(b) and b.equals(c) implies that a.equals(c), a red empty list has to be equal to a blue empty list.
There is an easy solution for this that brings in new problems, and a hard solution that is objectively better.
The easy solution is to define that you can't be equal to anything except exact instances of yourself, as in, any subclass is insta-disqualified. Imagine ArrayList's equals method returns false if you call it with an instance of a subclass of ArrayList. Then you could make your colored list just fine. But, this isn't necessarily great, for example, you probably want an empty LinkedList and an empty ArrayList to be equal.
The harder solution is to introduce a second method, canEqual, and call it. You override canEqual to return 'if other is instanceof the nearest class in my hierarchy that introduces equality-relevant state'. Thus, your ColoredList should have #Override public boolean canEqual(Object other) { return other instanceof ColoredList; }.
The problem is, all classes need to have that and use it, or it's not going to work, and ArrayList does not have it. And you can't change that.
Project Lombok can generate this for you if you prefer. It's not particularly common; I'd only use it if you really know you need it.

Duplicate elements being added in a set after I change the value after insertion

I have a set with two distinct objects added to it. After insertion, I change one of the objects in such a way that both the objects are equal (as verified by the overridden equals method in object class). At this point in time I have two duplicate elements in a set. Now I try to add these two duplicate objects in a new set and I am still able to add them even though the equals method returns true for them. Below is the code for the same. Can someone please tell me what exactly am I missing?
public class BasicSetImpl{
public int num; String entry;
public BasicSetImpl(int num, String entry){
this.num = num;
this.entry = entry;
}
#Override
public int hashCode() {
return Objects.hash(entry, num);
}
#Override
public boolean equals(Object obj) {
BasicSetImpl newObj = (BasicSetImpl)obj;
if (this.num == newObj.num)
return true;
else
return false;
}
public static void main(String[] args){
Set<BasicSetImpl> set = new HashSet<>();
BasicSetImpl k1 = new BasicSetImpl(1, "One");
BasicSetImpl k2 = new BasicSetImpl(2, "Two");
set.add(k1);
set.add(k2);
k2.num = 1;
System.out.println(k1.equals(k2)); //This line returns True
Set<BasicSetImpl> newSet = new HashSet<>();
newSet.add(k1);
newSet.add(k2);
//Set.size here is two

The hash based collection, in this case the HashSet uses the Object hashCode method to calculate the hash value as a function of contents of the object. Since you consider both the entry and num to determine the hashcode value of the object, these two objects has distinct hashcodes since they are different from entry. Thus they belong in two different hash buckets and never be recognized as the same.
However, if you set the entry as follows
k2.entry = "One";
then both k1 and k2 has the same hashcode value. Then they both belong to the same hash bucket and according to the equals method two objects are the same. Therefore, now the duplicates are ignored.
But, there's a problem lurking here. Ideally, equal objects must have equal hash codes. But, in your case it is not. Two objects are equal if they have the same num value, but they may produce different hashcode values. So the correct solution would be to just change your hashCode as follows.
#Override
public int hashCode() {
return Integer.hashCode(num);
}
Now, it should behave the way you would expect without the hack we added by setting k2.entry = "One";

hashCode and equals methods must be consistent.
From documentation
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
In your case despite they are equal their hashCodes are different.
When adding elements HashSet check hashe of the added object and only if there is existing object in set with the same hash, HashSet checks if these objects with same hashes are equal.

HashSet turns unreliable when modifying a field of a contained object. Why/When or how should I use a HashSet?

When I edit an object, which is contained within a HashSet, the hash of the object changes, but the HashSet is not updated internally. Therefor, I practically can add the same object twice:
TestObject testObject = new TestObject(1, "hello");
Set<TestObject> set = new HashSet<>();
set.add(testObject);
testObject.number = 2;
set.add(testObject);
set.forEach(System.out::println);
//will print
//{number:2, string:hello}
//{number:2, string:hello}
Full working code example:
import java.util.*;
public class Main {
public static void main(String[] args) {
TestObject testObject = new TestObject(1, "hello");
Set<TestObject> set = new HashSet<>();
// add initial object
set.add(testObject);
// modify object
testObject.number = 2;
testObject.string = "Bye";
// re-add same object
set.add(testObject);
set.forEach(System.out::println);
}
}
class TestObject {
public int number;
public String string;
public TestObject(int number, String string) {
this.number = number;
this.string = string;
}
#Override
public int hashCode() {
return Objects.hash(number, string);
}
#Override
public boolean equals(Object obj) {
if (!(obj instanceof TestObject)) {
return false;
}
TestObject o = (TestObject) obj;
return number == o.number && string.equals(o.string);
}
#Override
public String toString() {
return "{number:" + number + ", string:" + string + "}";
}
}
This means, after modifying an object which already is contained in a HashSet, theHashSet` turns unreliable or invalid.
Modifying an object that is somewhere contained in a Set (probably even without knowing) seems a regular use case to me . And something which I probably already have done a lot.
This throws me back and brings one basic question to me: When or why should I use a HashSet if it has such a behaviour?

Well, if you have a look at the HashSet source you'll see that it's basically a HashMap<E, Object> with the elements being the keys - and modifying keys of a hashmap is never a good idea. The map/set will not be updated if the hash would change, in fact the map/set wouldn't even know about that change.
In general keys of a HashMap or elements in a HashSet should be immutable in that their hash and equality doesn't change. In most cases the hash and equality are based on those object's (business) identity, so if number and string are both part of that object's identity then you shouldn't be able to change those.
Modifying an object that is somewhere contained in a Set (probably even without knowing) seems a regular use case to me . And something which I probably already have done a lot.
It's probably true that objects contained in sets are modified quite often but that normally would mean that data that's not used to generate the hashcode or to check equality are modified. As an example let's say a person's hashcode is based on their ID number. That would mean that hashCode() and equals() should only be based on that number and that everything else could be modified safely.
So you could modify elements in a HashSet as long as you're not modifying their "id".
When or why should I use a HashSet if it has such a behaviour?
If you need to store mutable objects in a HashSet you have a few options which basically revolve around using only the immutable parts for hashCode() and equals(). For sets that could be done by using a wrapper object that provides a customized implementation for those methods. Alternatively you could extract one or more immutable properties and use those as the key into a map (in case of multiple properties you'd need to build some sort of key object out of those)

You’re never supposed to compare strings with == use .equals instead

Adding an element that is already present, as you said, won't override the element that is already in the HashSet. Use a remove(), before calling the add(), to insure the new value to be inserted effectively.
Side note: as some users have noted, pay attention to the Strings' comparisons in your test.

Do java objects used as hash keys need to be 'fully' immutable?

If hashCode() calculation uses immutable fields and equals() uses all the fields would it be a problem when the class is used as a hash key? E.g.
import java.util.Objects;
public class Car {
protected final long vin;
protected String state;
protected String plateNumber;
public Car( long v, String s, String p ) {
vin = v; state = s; plateNumber = p;
}
public void move( String s, String p ) {
state = s; plateNumber = p;
}
public int hashCode() {
return (int)( vin % Integer.MAX_VALUE );
}
public boolean equals( Object other ) {
if (this == other) return true;
else if (!(other instanceof Car)) return false;
Car otherCar = (Car) other;
return vin == otherCar.vin
&& Objects.equals( state, otherCar.state )
&& Objects.equals( plateNumber, otherCar.plateNumber );
}
}
And move() is called on a car object after it is inserted into a hashset, possible via a reference kept elsewhere.
I am not after performance issues here. Only correctness.
I have read java hashCode contact, few answers on SO including this by venerable Jon Skeet and this from big blue. I feel that the last link gives the best explanation and imply that above code is correct.
Edit
Conclusion:
This class satisfy constraints placed on ‘equals()’ and ‘hashCode()’ in java. However it violates restrictions additional requirements placed on ‘equals()’ when used as keys in collections, hashed or not.
The additional requirement is that ‘equals()’ need to be consistent as long as the object is a key.
See the counter example by Louis Wasserman and the reference provided by Douglas below.
Few clarifications:
A) This class satisfy java object level constraints:
( carA == carB ) implies ( carA.hashCode() == carB.hashCode() )
( carA.hashCode() != carB.hashCode() ) implies ( carA != carB )
equals() need to be reflexive, symmetric, transitive.
hashCode() need to be consistent. i.e. Cannot change for an object during its lifetime.
equals() need to be consistent as long as neither object is modified.
Note that the reverse of ‘1.’ and ‘2.’ are not necessary. And the class above satisfies all the conditions.
Also java docs mention "equals() … implements the most discriminating possible equivalence relation on objects", but not sure if that is compulsory.
B) As for performance, the increment in collision avoidance probability decrease with each successive member variable we combine. Usually few well chosen member variables is sufficient.

It's correct if you never, ever call move after the Car is in the map. Otherwise it's wrong. Both hashCode and equals have to stay consistent after a key is in the map.

When considering only the hashCode and equals contracts, you are correct that this implementation satisfies their requirements. hashCode using a strict subset of the fields that equals uses is sufficient to guarantee that a.equals(b) implies a.hashCode() == b.hashCode() as required.
However, things change when you bring in Map. From the Map javadoc, "The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map."
After you call move on a Car that is a key in a Map, the behavior of that Map is now unspecified. In many cases it will in practice still work the way you want it to, but bizarre things could happen in ways that are hard to predict. While it would technically be valid for the Map to spontaneously empty itself or switch all lookups to use a random number generator, a more likely scenario might go like this:
Car car1 = ...
Car car2 = ... // a copy of car1
Map<Car, String> map1 = ...
map1.put(car1, "value");
assert map1.get(car2).equals("value"); // true
car1.move(...);
assert map1.get(car2).equals("value"); // NullPointerException on the equals call, car2 is no longer found
Notice that neither car2 nor the Map were changed themselves in any way, but the mapping of car2 changed (or rather, disappeared) anyway. This behavior is not officially specified, but I would guess most Map implementations do behave this way.

You may mutate your key candidates as much as you want, before or after (not during) they are used as keys.
In practice, it is very hard to enforce this rule. If you mutate objects you do not have a control if somebody uses them as keys or not.
Immutability for keys is just easier, removes source of subtle, hard-to-find bugs and just work better for key.
In your case I see no correctness issues. But why you ever bother not to include all fields in hashcode?

Short answer: it should be OK, but prepare for bizarre behavior.
Longer answer: when you change fields that participate in equals() on a key, the value keyed by that key will no longer be found.
Still longer answer: this looks as X/Y problem: you're asking about X, but you really need X to accomplish Y. Maybe you should ask about Y?
The car in your case is uniquely identified by vin. A car equals to itself. But, a car can be registered in different states. Maybe the answer is to have a Registration object (or a few of them) attached to the car? And then you can separate car.equals() from registration.equals().

Hash works by putting items into "buckets". Each bucket is calculated by the hashcode. After finding the bucket then the search continues comparing each item one by one using equals.
For example:
During insertion: an object whose id is 100 is placed in bucket 5 (the hashcode calculated 5).
During retrieval: you ask the hashmap to find the item 100. If the hash calculates 7 now then the algorithm will search for your object in bucket 7 but your object will never be found as it is dwelling in bucket 5.
In summary: the hash code and the actual key work together. The former is used to know in which bucket the item should be. The latter is used by the equals comparison seeking the actual item to return.

When your hashCode() implementation uses only limited number of fields (vs equals) you're reducing performance of almost any algorithm/data structure that uses hashing: HashMap, HashSet etc. You're increasing collision probability - it's the situation when two different objects (equals return false) have the same hash value.

The short answer is: No.
Long answer:
Fully immutability is not neccessary. BUT:
Equals must only depend on immutable values. Hashcode must depend on immutable values either a constant or a subset of the values used in equals or all values used in equals. Values that are not mentioned within equals mustn't be part of hashcode.
If you mutate values equals and hashcode rely on it is likely that you do not find your objects again in a hash based datastructure. Look at this:
public class Test {
private static class TestObject {
private String s;
public TestObject(String s) {
super();
this.s = s;
}
public void setS(String s) {
this.s = s;
}
#Override
public boolean equals(Object obj) {
boolean equals = false;
if (obj instanceof TestObject) {
TestObject that = (TestObject) obj;
equals = this.s.equals(that.s);
}
return equals;
}
#Override
public int hashCode() {
return this.s.hashCode();
}
}
public static void main(String[] args) {
TestObject a1 = new TestObject("A");
TestObject a2 = new TestObject("A");
System.out.println(a1.equals(a2)); // true
HashMap<TestObject, Object> hashSet = new HashMap<>(); // hash based datastructure
hashSet.put(a1, new Object());
System.out.println(hashSet.containsKey(a1)); // true
a1.setS("A*");
System.out.println(hashSet.containsKey(a1)); // false !!! // Object is not found as the hashcode used initially before mutation was used to determine the hash bucket
a2.setS("A*");
System.out.println(hashSet.containsKey(a2)); // false !!! Because a1 is in wrong hash bucket ...
System.out.println(a1.equals(a2)); // ... even if the objects are equals
}
}

Why should I override hashCode() when I override equals() method?

Ok, I have heard from many places and sources that whenever I override the equals() method, I need to override the hashCode() method as well. But consider the following piece of code
package test;
public class MyCustomObject {
int intVal1;
int intVal2;
public MyCustomObject(int val1, int val2){
intVal1 = val1;
intVal2 = val2;
}
public boolean equals(Object obj){
return (((MyCustomObject)obj).intVal1 == this.intVal1) &&
(((MyCustomObject)obj).intVal2 == this.intVal2);
}
public static void main(String a[]){
MyCustomObject m1 = new MyCustomObject(3,5);
MyCustomObject m2 = new MyCustomObject(3,5);
MyCustomObject m3 = new MyCustomObject(4,5);
System.out.println(m1.equals(m2));
System.out.println(m1.equals(m3));
}
}
Here the output is true, false exactly the way I want it to be and I dont care of overriding the hashCode() method at all. This means that hashCode() overriding is an option rather being a mandatory one as everyone says.
I want a second confirmation.

It works for you because your code does not use any functionality (HashMap, HashTable) which needs the hashCode() API.
However, you don't know whether your class (presumably not written as a one-off) will be later called in a code that does indeed use its objects as hash key, in which case things will be affected.
As per the documentation for Object class:
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

Because HashMap/Hashtable will lookup object by hashCode() first.
If they are not the same, hashmap will assert object are not the same and return not exists in the map.

The reason why you need to #Override neither or both, is because of the way they interrelate with the rest of the API.
You'll find that if you put m1 into a HashSet<MyCustomObject>, then it doesn't contains(m2). This is inconsistent behavior and can cause a lot of bugs and chaos.
The Java library has tons of functionalities. In order to make them work for you, you need to play by the rules, and making sure that equals and hashCode are consistent is one of the most important ones.

Most of the other comments already gave you the answer: you need to do it because there are collections (ie: HashSet, HashMap) that uses hashCode as an optimization to "index" object instances, an those optimizations expects that if: a.equals(b) ==> a.hashCode() == b.hashCode() (NOTE that the inverse doesn't hold).
But as an additional information you can do this exercise:
class Box {
private String value;
/* some boring setters and getters for value */
public int hashCode() { return value.hashCode(); }
public boolean equals(Object obj) {
if (obj != null && getClass().equals(obj.getClass()) {
return ((Box) obj).value.equals(value);
} else { return false; }
}
}
The do this:
Set<Box> s = new HashSet<Box>();
Box b = new Box();
b.setValue("hello");
s.add(b);
s.contains(b); // TRUE
b.setValue("other");
s.contains(b); // FALSE
s.iterator().next() == b // TRUE!!! b is in s but contains(b) returns false
What you learn from this example is that implementing equals or hashCode with properties that can be changed (mutable) is a really bad idea.

It is primarily important when searching for an object using its hashCode() value in a collection (i.e. HashMap, HashSet, etc.). Each object returns a different hashCode() value therefore you must override this method to consistently generate a hashCode value based on the state of the object to help the Collections algorithm locate values on the hash table.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.