Violating rule of reflexivity when overriding equals method - java

I have a doubt from the book Effective Java. The doubt is about violating the
reflexive rule of the equals method. The book says the following:
"If you were to violate it and then add an instance of your class to a collection, the collection's contains method would almost certainly say that the collection did not contain the instance that you just added."
To test it I wrote an example class, but the contains method doesn't return false; it returns true. Can anybody tell me what the problem is?

I agree that the result of this program is indeed puzzling:
import java.util.*;
class Item {
@Override
public boolean equals(Object obj) {
return false; // not even equal to itself.
}
}
class Test {
public static void main(String[] args) {
Collection<Item> items = new HashSet<Item>();
Item i = new Item();
items.add(i);
System.out.println(items.contains(i)); // Prints true!
}
}
The answer is that HashSet's contains implementation (at least in the standard JDK, where HashSet is backed by HashMap) checks argument == object before doing argument.equals(object). The result from contains is true since item == item holds, even though item.equals(item) returns false.
Assuming equals follows its contract (is reflexive), this way of implementing contains is correct.
If you read the quote you posted carefully, the author includes the word "almost" :) It seems you stumbled across one of the few exceptions to the rule.
Other collections (ArrayList for instance) use equals directly, and if you change from new HashSet<Item>() to new ArrayList<Item>() in the above program it prints false as expected.
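To see both behaviours side by side, here is a minimal sketch. BrokenItem and ReflexivityDemo are made-up names for illustration, and the HashSet result relies on the identity shortcut in OpenJDK's HashMap-backed implementation, so treat it as an observation about that implementation rather than a guarantee of the Collection contract.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

// Hypothetical class that deliberately breaks reflexivity.
class BrokenItem {
    @Override
    public boolean equals(Object obj) {
        return false; // not even equal to itself
    }
    // hashCode is deliberately left as Object's default.
}

class ReflexivityDemo {
    // Adds one item to the given collection and asks whether it is contained.
    static boolean addThenContains(Collection<BrokenItem> items) {
        BrokenItem item = new BrokenItem();
        items.add(item);
        return items.contains(item);
    }

    public static void main(String[] args) {
        // HashSet (OpenJDK): the == check succeeds before equals is consulted.
        System.out.println(addThenContains(new HashSet<>()));   // true
        // ArrayList: relies on equals alone, which always says false here.
        System.out.println(addThenContains(new ArrayList<>())); // false
    }
}
```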

Reflexive means x.equals(x) should return true
class Foo {
int i;
public boolean equals(Object obj) {
return ((Foo) obj).i < this.i;
}
}
Here foo.equals(foo) returns false, because foo.i < foo.i is never true. And when you put it into a list and call list.contains(foo) it will return false, because none of the elements in the list is equal to the one you passed. This is because list.contains(..) iterates over the elements and checks arg.equals(elem) for each of them.
See the docs of Collection.contains(..)

Related

Why does super.hashCode give different results on objects from the same Class?

I have a class DebugTo where if I have two equal instances el1, el2 a HashSet of el1 will not regard el2 as contained.
import java.util.Objects;
public class DebugTo {
public String foo;
public DebugTo(String foo) {
this.foo = foo;
}
@Override
public int hashCode() {
System.out.println(super.hashCode());
return Objects.hash(super.hashCode(), foo);
}
@Override
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
DebugTo that = (DebugTo) o;
return Objects.equals(foo, that.foo);
}
}
var el1 = new DebugTo("a");
var el2 = new DebugTo("a");
System.out.println("Objects.equals(el1, el2): " + Objects.equals(el1, el2));
System.out.println("Objects.equals(el2, el1): " + Objects.equals(el2, el1));
System.out.println("el1.hashCode(): " + el1.hashCode());
System.out.println("el2.hashCode(): " + el2.hashCode());
Objects.equals(el1, el2): true
Objects.equals(el2, el1): true
1205483858
el1.hashCode(): -1284705008
1373949107
el2.hashCode(): -357249585
From my analysis I have gathered that:
HashSet::contains calls hashCode, not equals (relying on Objects.equals(a, b) => a.hashCode() == b.hashCode())
super.hashCode() gives a different value both times.
Why does super.hashCode() give different results for el1 and el2? since they are of the same class, they have the same super class and so I expect super.hashCode() to give the same result for both.
The hashCode method was probably autogenerated by eclipse. If not answered above, why is super.hashCode used wrong here?
Because the default implementations of the equals and hashCode methods (which go hand in hand - you always override both or neither) treat any 2 different instances as not equal to each other. If you want different behaviour, you override equals and hashCode, and do not invoke super.equals / super.hashCode, or there'd be no point.
HashSets work as follows: They use .hashCode() to know which 'bucket' to put the object into, and if 2 objects end up in the same bucket, equals is used only on those very few objects to double check.
In other words, these are the rules:
1. If a.equals(b), then b.equals(a) must be true.
2. a.equals(a) must always be true.
3. If a.equals(b) and b.equals(c), a.equals(c) must be true.
4. If a.equals(b), a.hashCode() == b.hashCode() must be true.
5. The reverse of 4 does not hold: If a.hashCode() == b.hashCode(), that doesn't mean a.equals(b), and HashSet does not require it.
6. Therefore, return 1; is a legal implementation of hashCode.
7. If a class has really bad hashcode spread (such as the idiotic but legal option listed in rule 6), then the performance of HashSet will be very bad. E.g. set.contains(k), which ordinarily takes constant time, will take linear time instead if your objects are all not-equal but have the same hashCode. Hence, do try to ensure hashcodes are as different as they can be.
8. HashSet and HashMap require stable objects, meaning that the results of calling hashCode and equals on them cannot change over time.
9. From the above it naturally follows that overriding equals and not hashCode, or vice versa, is necessarily broken.
10. Breaking any of the above rules does not, generally, result in a compiler error. It often doesn't even result in an exception. Instead it results in bizarre behaviour with hashsets and hashmaps: you put a k/v pair in the map, then immediately ask for the value back, and you get null instead of what you put in, or something completely different. Just an example.
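Applied to the class from the question, the rules above suggest leaving super.hashCode() out, because Object's version is identity-based. A minimal sketch, assuming foo is the only state that matters for equality (FixedDebugTo and FixedDebugToDemo are made-up names):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical variant of DebugTo: hashCode uses only the state that equals compares.
class FixedDebugTo {
    private final String foo;

    FixedDebugTo(String foo) {
        this.foo = foo;
    }

    @Override
    public int hashCode() {
        // No super.hashCode() here: Object's version differs per instance.
        return Objects.hash(foo);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return Objects.equals(foo, ((FixedDebugTo) o).foo);
    }
}

class FixedDebugToDemo {
    public static void main(String[] args) {
        FixedDebugTo el1 = new FixedDebugTo("a");
        FixedDebugTo el2 = new FixedDebugTo("a");
        Set<FixedDebugTo> set = new HashSet<>();
        set.add(el1);
        System.out.println(el1.hashCode() == el2.hashCode()); // true
        System.out.println(set.contains(el2));                // true
    }
}
```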
NB: One weird effect of all this is that you cannot add equality-affecting state to subclasses, unless you apply a caveat that most classes including all classes in the core libraries don't apply.
Imagine as an example that we invent the notion of a 'coloured' arraylist. You could have a red '["Hello", "World"]' list, and a blue one:
class ColoredList<E> extends ArrayList<E> {
    Color color;
    public ColoredList(Color c) {
        this.color = c;
    }
}
You'd probably want an empty red list to not equal an empty blue one. However, that is impossible if you intend to follow the rules. That's because the equals/hashCode impl of ArrayList itself considers any other list equal to itself if it has the same items in the same order. Therefore:
List<String> a = new ArrayList<String>();
ColoredList<String> b = new ColoredList<String>(Color.RED);
a.equals(b); // this is true, and you can't change that!
Therefore, b.equals(a) must also be true (your impl of equals has to say that an empty red list is equal to an empty plain arraylist), and given that an empty arraylist is also equal to an empty blue one, given that a.equals(b) and b.equals(c) implies that a.equals(c), a red empty list has to be equal to a blue empty list.
There is an easy solution for this that brings in new problems, and a hard solution that is objectively better.
The easy solution is to define that you can't be equal to anything except exact instances of yourself, as in, any subclass is insta-disqualified. Imagine ArrayList's equals method returns false if you call it with an instance of a subclass of ArrayList. Then you could make your colored list just fine. But, this isn't necessarily great, for example, you probably want an empty LinkedList and an empty ArrayList to be equal.
The harder solution is to introduce a second method, canEqual, and have equals call it on the other object. You override canEqual to return whether other is an instance of the nearest class in your hierarchy that introduces equality-relevant state. Thus, your ColoredList should have @Override public boolean canEqual(Object other) { return other instanceof ColoredList; }.
The problem is, all classes need to have that and use it, or it's not going to work, and ArrayList does not have it. And you can't change that.
Project Lombok can generate this for you if you prefer. It's not particularly common; I'd only use it if you really know you need it.
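Since ArrayList cannot be retrofitted, here is a rough sketch of the canEqual idea on a hierarchy you fully control. Point, ColoredPoint and CanEqualDemo are hypothetical classes invented for this illustration; they are not part of the JDK and not what Lombok generates verbatim.

```java
import java.util.Objects;

// Hypothetical base class: equality is defined by x and y.
class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Subclasses that add equality-relevant state override this.
    public boolean canEqual(Object other) {
        return other instanceof Point;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point that = (Point) o;
        // Ask the other object whether it is willing to be equal to us.
        return that.canEqual(this) && x == that.x && y == that.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

// Hypothetical subclass that adds equality-relevant state (the color).
class ColoredPoint extends Point {
    final String color;

    ColoredPoint(int x, int y, String color) {
        super(x, y);
        this.color = color;
    }

    @Override
    public boolean canEqual(Object other) {
        return other instanceof ColoredPoint;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ColoredPoint)) return false;
        ColoredPoint that = (ColoredPoint) o;
        return that.canEqual(this) && super.equals(that) && color.equals(that.color);
    }

    @Override
    public int hashCode() {
        // super here is Point, whose hashCode is state-based, so this is safe.
        return 31 * super.hashCode() + color.hashCode();
    }
}

class CanEqualDemo {
    public static void main(String[] args) {
        Point plain = new Point(1, 2);
        Point red = new ColoredPoint(1, 2, "red");
        Point blue = new ColoredPoint(1, 2, "blue");
        System.out.println(plain.equals(red)); // false
        System.out.println(red.equals(plain)); // false, so symmetry holds
        System.out.println(red.equals(blue));  // false
        System.out.println(red.equals(new ColoredPoint(1, 2, "red"))); // true
    }
}
```

The point of the extra method is that a plain Point refuses to equal a ColoredPoint in both directions, so symmetry and transitivity survive, while a subclass that adds no state can still inherit canEqual and remain equal to plain points.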

Weird Set.contains() behavior

I initially started this as a test for a theory-based, best-practices question that I wanted to ask here, but in the process I found some interesting behavior in the java.util.Set interface. Initially, I wanted to know any potential pitfalls of this approach, but now that I can see it doesn't work at all, I'd like to know why.
I have some objects that are containers for database objects for my app. The objects all have unique integer id's, and hashCode() and equals() are defined by the integer ids (for storage in hashsets).
Well, I wanted the ability to check if a hashset contains the object given only the id.
Certainly, I could create a new instance of the object and check that way. But, just for kicks, I wanted to see if I could accomplish it. Of course, this is also trivial with a hashmap, so this is really not an important question, just for fun and knowledge.
So, I made a class, and tried to call contains() on an integer, instead of an instance of the object. Netbeans, of course, gives a fun warning for this
Suspicious call to java.util.Collection.contains:
Given object cannot contain instances of int (expected Person)
Ignoring the warning and running the code, I was shocked to find that Java does not even call the equals method. I placed debugging System.out.println()s in my equals method to verify, and yep, it's not even being called.
In the code posted below, the expected output should be (if my theory was correct):
Here
Yes
Here
Yes
or (if my theory was incorrect):
Here
Yes
Here
No
However, the output is:
Here
Yes
No
Notice, there's no "Here" before the "No" proving that the equals method is not even being called.
Can anyone shed light? I was always told to add this to equals() for efficiency:
if (!(obj instanceof Person))
return false;
But if equals() is not even called in such a situation, then that would be pointless.
Here is the SSCCE:
Thanks for your time.
import java.util.LinkedHashSet;
import java.util.Set;
/**
*
* @author Ryan
*/
public class Test7 {
public static void main(String[] args) {
class Person {
public final int id;
public final String name;
public Person(int id, String name) {
this.id = id;
this.name = name;
}
@Override
public boolean equals(Object obj) {
System.out.println("Here");
if (this == obj)
return true;
if (obj instanceof Person)
return id == ((Person)obj).id;
else if(obj instanceof Integer)
return id == (Integer)obj;
else {
System.out.println("Returning False");
return false;
}
}
@Override
public int hashCode() {
return id;
}
}
Set<Person> set = new LinkedHashSet<Person>();
set.add(new Person(1, "Bob"));
set.add(new Person(2, "George"));
set.add(new Person(3, "Sam"));
if(set.contains(new Person(1, "Bob")))
System.out.println("Yes");
else
System.out.println("No");
if(set.contains(1))
System.out.println("Yes");
else
System.out.println("No");
}
}
This is due to the fact that equals is invoked on the provided object, not on the elements in the set. From HashSet#contains(Object):
Returns true if this set contains the specified element. More formally, returns true if and only if this set contains an element e such that (o==null ? e==null : o.equals(e)).
So in your example, the comparison performed is integer.equals(person), never person.equals(integer). As long as your set contains Person objects and you pass an Integer, the if (obj instanceof Integer) branch is never reached. If instead the set contained Integer objects and you called contains with a Person, person.equals(integer) would be called and that branch would be checked.
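A small sketch of this, plus the map-based lookup the question mentions as the simple alternative. PersonById and LookupByIdDemo are made-up names, the call counter exists only for illustration, and the count of zero is what I would expect from OpenJDK's HashMap-backed sets rather than something the Set contract spells out.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the Person class from the question.
class PersonById {
    static int equalsCalls = 0; // counts calls to equals, for illustration only

    final int id;
    final String name;

    PersonById(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object obj) {
        equalsCalls++;
        return obj instanceof PersonById && id == ((PersonById) obj).id;
    }

    @Override
    public int hashCode() {
        return id;
    }
}

class LookupByIdDemo {
    public static void main(String[] args) {
        Set<PersonById> set = new LinkedHashSet<>();
        set.add(new PersonById(1, "Bob"));

        // The set evaluates Integer.equals(person), never person.equals(integer).
        System.out.println(set.contains(1));        // false
        System.out.println(PersonById.equalsCalls); // 0

        // Looking up by id is straightforward with a map keyed by the id.
        Map<Integer, PersonById> byId = new HashMap<>();
        for (PersonById p : set) {
            byId.put(p.id, p);
        }
        System.out.println(byId.containsKey(1));    // true
    }
}
```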

LinkedHashSet: hashCode() and equals() match, but contains() doesn't

How is the following possible:
void contains(LinkedHashSet data, Object arg) {
System.out.println(data.getClass()); // java.util.LinkedHashSet
System.out.println(arg.hashCode() == data.iterator().next().hashCode()); // true
System.out.println(arg.equals(data.iterator().next())); // true
System.out.println(new ArrayList(data).contains(arg)); // true
System.out.println(new HashSet(data).contains(arg)); // true
System.out.println(new LinkedHashSet(data).contains(arg)); // true (!)
System.out.println(data.contains(arg)); // false
}
Am I doing something wrong?
Obviously, it doesn't always happen (if you create a trivial set of Objects, you won't reproduce it). But it does always happen in my case with more complicated class of arg.
EDIT: The main reason why I don't define arg here is that it's a fairly big class, with an Eclipse-generated hashCode that spans 20 lines and an equals twice as long. And I don't think it's relevant, as long as they're equal for the two objects.
When you build your own objects, and plan to use them in a collection you should always override the following methods:
boolean equals(Object o);
int hashCode();
The default implementation of equals checks whether the objects point to the same object, while you'd probably want to redefine it to check the contents.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. To respect the contract, objects that are equal to each other must return the same hashCode, so you also have to redefine hashCode.
EDIT: I was expecting a faulty hashCode or equals implementation, but your answer revealed that you're mutating the keys after they are added to a HashSet or HashMap.
When you add an object to a hash collection, its hashCode is computed and used to map it to a physical location (a bucket) in the collection.
If some fields used to compute the hashCode are changed afterwards, the hashCode itself changes and the HashSet can no longer locate the object: a lookup goes to a different bucket and doesn't find it. The object will still be present if you enumerate the set, though.
For this reason, always make HashMap or HashSet keys immutable.
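As a rough illustration of what an immutable key can look like, here is a sketch in which the field used by equals and hashCode is final and a "change" yields a new object. ImmutableLabel, withText and ImmutableKeyDemo are hypothetical names chosen for this example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical immutable key: the state behind equals/hashCode can never change.
final class ImmutableLabel {
    private final String text;

    ImmutableLabel(String text) {
        this.text = text;
    }

    // "Changing" the label produces a new object and leaves the stored one untouched.
    ImmutableLabel withText(String newText) {
        return new ImmutableLabel(newText);
    }

    @Override
    public int hashCode() {
        return text.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof ImmutableLabel && text.equals(((ImmutableLabel) obj).text);
    }
}

class ImmutableKeyDemo {
    public static void main(String[] args) {
        Set<ImmutableLabel> set = new HashSet<>();
        ImmutableLabel label = new ImmutableLabel("");
        set.add(label);

        ImmutableLabel changed = label.withText("a-ha!");
        System.out.println(set.contains(label));   // true, the stored key is unchanged
        System.out.println(set.contains(changed)); // false, the new value was never added
    }
}
```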
Got it. Once you know it, the answer is so obvious you can only blush in embarrassment.
static class MyObj {
String s = "";
@Override
public int hashCode() {
return s.hashCode();
}
@Override
public boolean equals(Object obj) {
return ((MyObj) obj).s.equals(s);
}
}
public static void main(String[] args) {
LinkedHashSet set = new LinkedHashSet();
MyObj obj = new MyObj();
set.add(obj);
obj.s = "a-ha!";
contains(set, obj);
}
That is enough to reliably reproduce it.
Explanation: Thou Shalt Never Mutate Fields Used For hashCode()!
There seems to be something missing from your question. I have made some guesses:
private void testContains() {
LinkedHashSet set = new LinkedHashSet();
String hello = "Hello!";
set.add(hello);
contains(set, hello);
}
void contains(LinkedHashSet data, Object arg) {
System.out.println(data.getClass()); // java.util.LinkedHashSet
System.out.println(arg.hashCode() == data.iterator().next().hashCode()); // true
System.out.println(arg.equals(data.iterator().next())); // true
System.out.println(new ArrayList(data).contains(arg)); // true
System.out.println(new HashSet(data).contains(arg)); // true
System.out.println(new LinkedHashSet(data).contains(arg)); // true (!)
System.out.println(data.contains(arg)); // true (!!)
}
EDITED: To keep track of changing question!
I still get "true" for ALL but the first output. Please be more specific about the type of the "arg" parameter.

java why should equals method input parameter be Object

I'm going through a book on data structures. Currently I'm on graphs, and the below code is for the vertex part of the graph.
class Vertex<E>{
//bunch of methods
public boolean equals(Object o){
//some code
}
}
When I try to implement this equals method my compiler complains about not checking the type of the parameter and just allowing any object to be sent it. It also does seem a bit strange to me why that parameter shouldn't be a Vertex instead of an Object. Is there a reason why the author does this or is this some mistake or antiquated example?
@Override
public boolean equals(Object obj)
{
if (!(obj instanceof Vertex)) return false;
else return // blah blah
}
equals(Object) is the method defined in the root - Object. If you don't match the signature exactly, Object's version will be called when someone checks if two objects are equal. Not what you want.
You've probably seen other APIs (like Comparator) where you can use the exact type. That's because those APIs were generified in Java 5. equals can't be, because it is valid to call equals with two different types. It should return false, but it is valid.
equals is a method inherited from Object, is defined to be flexible enough so that you can take any object and test if it is equal to any other object (as it rightfully should be able to do), so how could it be any other way?
Edit 1
Comment from jhlu87:
so is it not good form to write an equals method that has an input parameter of vertex?
You are welcome to create your own overload to any method, including equals, but doing so without changing the name could risk confusing many who would assume that your equals is the one that inherits from Object. If it were my code and I wanted a more specific equals method, I'd name it slightly different from just "equals" just to avoid confusion.
If your method doesn't take an argument of type Object, it isn't overriding the default version of equals but rather overloading it. When this happens, both versions exist and Java decides which one to use based on the variable type (not the actual object type) of the argument. Thus, this program:
public class Thing {
private int x;
public Thing(int x) {
this.x = x;
}
public boolean equals(Thing that) {
return this.x == that.x;
}
public static void main(String[] args) {
Thing a = new Thing(1);
Thing b = new Thing(1);
Object c = new Thing(1);
System.out.println(a.equals(b));
System.out.println(a.equals(c));
}
}
confusingly prints true for the first comparison (because b is of type Thing) and false for the second (because c is of type Object, even though it happens to contain a Thing).
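For comparison, here is a sketch of the same class with a genuine override. FixedThing and FixedThingDemo are made-up names; the @Override annotation makes the compiler reject the method if its signature does not match Object's equals.

```java
// Hypothetical corrected version of Thing: equals takes Object, so it overrides.
class FixedThing {
    private final int x;

    FixedThing(int x) {
        this.x = x;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof FixedThing)) return false; // also covers null
        return x == ((FixedThing) obj).x;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(x); // kept consistent with equals
    }
}

class FixedThingDemo {
    public static void main(String[] args) {
        FixedThing a = new FixedThing(1);
        FixedThing b = new FixedThing(1);
        Object c = new FixedThing(1);
        System.out.println(a.equals(b)); // true
        System.out.println(a.equals(c)); // true, the static type of c no longer matters
    }
}
```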
It's because this method existed before generics, so for backward compatibility it has to stay this way.
The standard workaround to impose type is:
return obj instanceof MyClass && <some condition>;
It is because the author is overriding equals. equals is specified in java.lang.Object and is something that all classes inherit.
See the javadoc for java.lang.Object

Why should I override hashCode() when I override equals() method?

Ok, I have heard from many places and sources that whenever I override the equals() method, I need to override the hashCode() method as well. But consider the following piece of code
package test;
public class MyCustomObject {
int intVal1;
int intVal2;
public MyCustomObject(int val1, int val2){
intVal1 = val1;
intVal2 = val2;
}
public boolean equals(Object obj){
return (((MyCustomObject)obj).intVal1 == this.intVal1) &&
(((MyCustomObject)obj).intVal2 == this.intVal2);
}
public static void main(String a[]){
MyCustomObject m1 = new MyCustomObject(3,5);
MyCustomObject m2 = new MyCustomObject(3,5);
MyCustomObject m3 = new MyCustomObject(4,5);
System.out.println(m1.equals(m2));
System.out.println(m1.equals(m3));
}
}
Here the output is true, false, exactly the way I want it to be, and I didn't override the hashCode() method at all. This seems to mean that overriding hashCode() is optional rather than mandatory, contrary to what everyone says.
I want a second confirmation.
It works for you because your code does not use any functionality (HashMap, Hashtable) which needs the hashCode() API.
However, you don't know whether your class (presumably not written as a one-off) will be later called in a code that does indeed use its objects as hash key, in which case things will be affected.
As per the documentation for Object class:
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
Because HashMap/Hashtable look up an object by its hashCode() first.
If the hash codes are not the same, the map assumes the objects are not equal and reports that the key does not exist in the map.
The reason why you need to @Override neither or both is the way they interrelate with the rest of the API.
You'll find that if you put m1 into a HashSet<MyCustomObject>, then it doesn't contains(m2). This is inconsistent behavior and can cause a lot of bugs and chaos.
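Once hashCode is derived from the same two fields that equals compares, hash-based lookups behave as you would hope. A minimal sketch with a HashMap (HashedCustomObject and HashedCustomObjectDemo are made-up names for this example):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical variant of MyCustomObject with equals and hashCode kept consistent.
class HashedCustomObject {
    final int intVal1;
    final int intVal2;

    HashedCustomObject(int val1, int val2) {
        intVal1 = val1;
        intVal2 = val2;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof HashedCustomObject)) return false;
        HashedCustomObject other = (HashedCustomObject) obj;
        return intVal1 == other.intVal1 && intVal2 == other.intVal2;
    }

    @Override
    public int hashCode() {
        // Built from exactly the fields that equals compares.
        return Objects.hash(intVal1, intVal2);
    }
}

class HashedCustomObjectDemo {
    public static void main(String[] args) {
        Map<HashedCustomObject, String> labels = new HashMap<>();
        labels.put(new HashedCustomObject(3, 5), "first");
        System.out.println(labels.get(new HashedCustomObject(3, 5))); // first
        System.out.println(labels.get(new HashedCustomObject(4, 5))); // null
    }
}
```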
The Java library has tons of functionalities. In order to make them work for you, you need to play by the rules, and making sure that equals and hashCode are consistent is one of the most important ones.
Most of the other comments already gave you the answer: you need to do it because there are collections (e.g. HashSet, HashMap) that use hashCode as an optimization to "index" object instances, and those optimizations expect that a.equals(b) ==> a.hashCode() == b.hashCode() (NOTE that the inverse doesn't hold).
But as an additional information you can do this exercise:
class Box {
private String value;
/* some boring setters and getters for value */
public int hashCode() { return value.hashCode(); }
public boolean equals(Object obj) {
if (obj != null && getClass().equals(obj.getClass())) {
return ((Box) obj).value.equals(value);
} else { return false; }
}
}
Then do this:
Set<Box> s = new HashSet<Box>();
Box b = new Box();
b.setValue("hello");
s.add(b);
s.contains(b); // TRUE
b.setValue("other");
s.contains(b); // FALSE
s.iterator().next() == b // TRUE!!! b is in s but contains(b) returns false
What you learn from this example is that implementing equals or hashCode with properties that can be changed (mutable) is a really bad idea.
It is primarily important when searching for an object by its hashCode() value in a hash-based collection (e.g. HashMap, HashSet). By default, distinct objects typically return different hashCode() values even when equals says they are equal, so you must override this method to generate the hashCode consistently from the same state that equals compares. That is what lets the collection locate values in its hash table.
