Potential pitfalls when ignoring some fields in equals/hashCode? - java

If only some of the fields of an object represent its actual state, I suppose those could be ignored when overriding equals and hashCode...
I get an uneasy feeling about this though, and wanted to ask,
Is this common practice?
Are there any potential pitfalls with this approach?
Is there any documentation or guidelines when it comes to ignoring some fields in equals / hashCode?
In my particular situation, I'm exploring the state space of a problem. I'd like to keep a hash set of visited states, but I'm also considering including the path which led to each state. Obviously, two states are equal even if they were found through different paths.
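To make the scenario concrete, here is a minimal sketch of what I mean (class and field names are hypothetical): only the configuration participates in equals/hashCode, while the path is carried along but ignored.

import java.util.List;

final class State {
    final String configuration;  // the actual state: defines identity
    final List<String> path;     // how we got here: ignored below

    State(String configuration, List<String> path) {
        this.configuration = configuration;
        this.path = path;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof State
                && ((State) o).configuration.equals(configuration);
    }

    @Override
    public int hashCode() {
        return configuration.hashCode();
    }
}

A HashSet<State> of visited states would then treat the same configuration reached via different paths as a single state.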

This is based on how you define the uniqueness of a given object. If it has a primary (unique) key, then using that attribute alone is enough.
If you think the uniqueness is combination of 10 different attributes, then use all 10 attributes in the equals.
Then use exactly the attributes that you used in equals to generate the hash code, because equal objects must generate the same hash code.
Selecting the attribute(s) for equals and hashcode is how you define the uniqueness of a given object.
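As a sketch of the primary-key case (class and field names are made up for illustration): equals and hashCode use only the key attribute, and everything else is ignored.

final class Customer {
    final long id;    // primary (unique) key: defines uniqueness
    String name;      // detail field: ignored by equals/hashCode

    Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Customer && ((Customer) o).id == id;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id);  // derived from the same attribute as equals
    }
}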
Is this common practice? Yes
Are there any potential pitfalls with this approach? No
Is there any documentation or guidelines when it comes to ignoring some fields in equals / hashCode?
"The equals method for class Object implements the most discriminating
possible equivalence relation on objects;"
This is from the Object class Javadoc. But as the author of the class, you know how uniqueness is defined.

Ultimately, "equals" means what you want it to mean. There is the restriction that "equal" values must return the same hashcode, and, of course, if presented with two identical address "equals" must return true. But you could, eg, have an "equals" that compared the contents of two web pages (ignoring the issue of repeatability for the nonce), and, even though the URLs were different, said "equal" if the page contents matched in some way.

The best documentation/guidelines I have seen for overriding the methods on Object was in Josh Bloch's Effective Java. It has a whole chapter on "Methods Common to All Objects" which includes sections about "Obey the general contract when overriding equals" and "Always override hashCode when you override equals". It describes, in detail, the things you should consider when overriding these two methods. I won't give away the answer directly; the book is definitely worth the cost for every Java developer.

Related

Why does IntelliJ IDEA let us generate an incorrect equals()/hashCode() pair?

There is a generator in IntelliJ IDEA. You press Alt+Ins, choose 'equals() and hashCode()', and a wizard opens. You can choose fields for equals() and then you can choose fields for hashCode(). Why can we choose different field sets? Doesn't that contradict the equals/hashCode contract?
As per the Javadoc of the Object class:
"Note that it is generally necessary to override the {@code hashCode}
method whenever this method is overridden, so as to maintain the
general contract for the {@code hashCode} method, which states
that equal objects must have equal hash codes."
By default, the equals method returns true only for references to the same object instance. An overridden equals might return true even for completely different objects (even objects with different field values); that depends entirely on your implementation.
The contract enforces that if your equals logic is determining two
different object as same, your hashcode method should return the same
value for those two objects.
This does not mean that you must use all of the same fields for hashCode as well. That is all you need to take care of when overriding these methods.
Well, it doesn't really allow you to choose different field sets, it allows you to pick a subset of the fields for equals, to use for hashCode.
While this will probably lead to a poorer hash code since this will cause more hash collisions, it will technically still be correct. Note that the requirement is just that equal objects have equal hash codes, not that equal hash codes must be from equal objects. The latter would be impossible to achieve for classes that can have more different instances than there are ints (e.g. java.lang.Long).
There may be good reasons to use a hash with more collisions if calculating a better hash would be more expensive than simply dealing with the collisions.
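For instance (a sketch with made-up names): equals may compare two fields while hashCode uses only one of them; the contract still holds, at the price of more collisions.

final class Point {
    final int x, y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
    }

    @Override
    public int hashCode() {
        // Uses a subset of the equals fields: equal points still get equal
        // hash codes, but (1, 2) and (1, 3) now collide.
        return Integer.hashCode(x);
    }
}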

Why should two equal objects return equal hash codes if I don't want to use my object as a key in a hash table?

This is what the Java documentation of Object.hashCode() says:
If two objects are equal according to the equals(Object) method, then
calling the hashCode method on each of the two objects must produce
the same integer result.
But they don't explain why two equal objects must return equal hash codes. Why did Oracle engineers decide that hashCode must be overridden whenever equals is overridden?
The typical implementation of equals doesn't call the hashCode method:
@Override
public boolean equals(Object arg0) {
    if (this == arg0) {
        return true;
    }
    if (!(arg0 instanceof MyClass)) {
        return false;
    }
    MyClass another = (MyClass) arg0;
    // compare significant fields here; 'id' and 'name' are illustrative
    return this.id == another.id
            && java.util.Objects.equals(this.name, another.name);
}
In Effective Java (2nd Edition) I read:
Item 9: Always override hashCode when you override equals.
A common source of bugs is the failure to override the hashCode
method. You must override hashCode in every class that overrides
equals. Failure to do so will result in a violation of the general
contract for Object.hashCode, which will prevent your class from
functioning properly in conjunction with all hash-based collections,
including HashMap, HashSet, and Hashtable.
Suppose I don't need to use MyClass as a key of a hash table. Why do I need to override hashCode() in this case?
Of course, if you have a small program written only by yourself, and every time you use an external library you check that it does not rely on hashCode(), then you can ignore all these warnings. But when a software project grows you will use external libraries, these will rely on hashCode(), and you will lose a lot of time searching for bugs. Or in a newer Java version some other class will use hashCode() too, and your program will fail.
So it's a lot easier to just implement it and follow this simple rule, because a modern IDE can auto-generate equals and hashCode with one click.
Update, a little story: at work we ignored this rule too in a lot of classes and only implemented the method needed, mostly equals or compareTo. One day strange things happened, because a programmer had used a Hash* class in the GUI and our objects did not follow this rule. In the end an apprentice had to search all classes with equals and add the corresponding hashCode method.
As the text says, it's a violation of the general contract that is in use. Of course, if you never use the class anywhere that hashCode would be required, nobody is forcing you to implement it.
But what if some day in the future it is needed? What if the class is used by some other developer? That's why there is this contract that both of them have to be implemented, so there is no confusion.
Obviously if nobody ever calls your class's hashCode method, nobody will know that it's inconsistent with equals. Can you guarantee that for the duration of your project, including the years in maintenance, no one will need to, say, remove duplicate objects from a list or associate some extra bit of data with your objects?
You are probably just safer implementing hashCode so that it's consistent with equals. It's not particularly hard; always returning 0 is already a valid implementation.
(please don't just return 0 though)
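To see what goes wrong without the override, consider this sketch (the class is hypothetical): equals is overridden but hashCode is not, so a HashSet usually fails to find an equal instance.

import java.util.HashSet;
import java.util.Set;

final class Key {
    final int id;

    Key(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Key && ((Key) o).id == id;
    }
    // hashCode() NOT overridden: the inherited identity hash is used.
}

public class MissingHashCodeDemo {
    public static void main(String[] args) {
        Set<Key> set = new HashSet<>();
        set.add(new Key(42));
        // The two Key(42) instances are equal, but their identity hash codes
        // almost certainly differ, so HashSet looks in the wrong bucket:
        System.out.println(set.contains(new Key(42))); // usually false
    }
}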
The reason why hashCode has to be overridden to agree with equals is because of why and how hashCode is used. Hash codes are used as surrogates for values so that when mapping a key value to something hashing can be used to give near-constant lookup time with reasonable space. When two values compare equal (ie, they are the same value) then they have to map to the same something when used as a key for a hashed collection. That requires that they have the same hash code to get them there.
You have to override hashCode appropriately because the manual says you have to. It says you have to because the decision was made that libraries can assume that you are satisfying that contract so they (& you) can have the performance benefits of hashing when using a value that you gave it as a key in their functions' implementations.
The common wisdom dictates that the logic for // compare significant fields here is required for equals(), and also for compareTo() (in case you want to sort instances of MyClass), and also for using hash tables. So it makes sense to put this logic in hashCode() and have the other methods use the hash code.

Annotation.equals() vs. Object.equals()

Some frameworks (e.g. Guice) require you, in certain situations, to create a class implementing an annotation interface.
There seems to be a difference between the Annotation.equals(Object) and Object.equals(Object) definitions which needs to be respected in that case (the same applies to hashCode()).
Questions:
Why was it designed that way and what is the reason of the difference?
What side-effects can occur when using the Object.equals(Object) definition for annotation classes instead?
Update:
Additional questions:
What about the Annotation.hashCode() definition? Is it really required to implement it that way, especially the "(...)127 times the hash code of the member-name as computed by String.hashCode()) XOR the hash code(...)"-part?
What happens if a hashCode() method is implemented to be consistent to equals() but doesn't match the exact definition of Annotation.hashCode() (e.g. using 128 times the hash code of the member-name)?
The definitions are not different. The definition in Annotation is simply specialized for the annotation type.
The definition in Object basically says "If you decide to implement equals for your class, it should represent an equivalence relation that follows these rules".
In Annotation it defines an equivalence that follows those rules, which is meaningful specifically for Annotation instances.
In fact, the Annotation equivalence would work for many other classes. The point is that different classes have different meanings, and therefore their instances may have different equivalence relationships, and it's up to the programmer to decide which equivalence relation to use for his/her class. In Annotation, the contract is for this particular equivalence relation.
As for side effects: suppose an Annotation type inherited Object's equals. This is a mistake many people make when they try to use their own classes in maps or other equals()-dependent situations. Object has an equals() function that is the same as object identity: two references are equal only if they refer to the same object.
If you used that, then no two instances would be considered the same. You would not be able to create a second Annotation instance that would be equivalent to a previous one, despite them having the same values in their fields and semantically representing the same sort of behavior. So you wouldn't be able to tell if two items are annotated with the same annotation, when they have different instances of the same annotation.
As for the hashCode question, although Jeff Bowman has already answered that, I'll address that to make my answer more complete:
Basically, implementation of annotations is left to compilers, and the JLS doesn't dictate the exact implementation. It is also possible to create implementing classes, as your question itself mentions.
This means that annotation classes can come from different sources - different compilers (you are supposed to be able to run .class files anywhere, no matter which java compiler created them) and developer-created implementations.
The equals() and hashCode() methods are usually considered in a single-class context, not in an interface context. This is because interfaces are usually antithetical to implementation - they only define contracts. When you create these methods for a particular class, you know that the object you compare with is supposed to be of the same class, and thus have the same implementation. Once it has a hashCode method that returns the same value for objects that are equivalent under equals for the same class, then whatever that implementation is, it satisfies the contract.
However, in this particular case, you have an interface, and you are required to make equals() and hashCode() work not only for two instances of the same class, but for instances of different classes that implement the same interface. This means that if you don't agree on a single implementation across all possible classes, you might get two instances of the same annotation with the same element values but different hash codes. That would break the hashCode() contract.
As an example, imagine an annotation @SomeAnnotation that doesn't take parameters. Imagine that you implement it with a class SomeAnnotationImpl that returns 15 as the hash code. Two equal instances of SomeAnnotationImpl will have the same hash code, which is good. But the Java compiler would return 0 as the hash code when you check the returned instance of its own implementation of @SomeAnnotation. Therefore two objects of type Annotation are equal (they implement the same annotation interface and if they follow the equals() definition above, they should return true for equals), but have different hash codes. That breaks the contract.
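To make that concrete, here is a minimal sketch of what a hand-written implementation of such a parameterless annotation could look like while staying within the Annotation contract (the annotation itself is hypothetical):

import java.lang.annotation.Annotation;

@interface SomeAnnotation {}  // a parameterless (marker) annotation

final class SomeAnnotationImpl implements SomeAnnotation {
    @Override
    public Class<? extends Annotation> annotationType() {
        return SomeAnnotation.class;
    }

    @Override
    public boolean equals(Object o) {
        // Per Annotation.equals(): equal to any instance of the same
        // annotation type whose members are equal (there are none here).
        return o instanceof SomeAnnotation;
    }

    @Override
    public int hashCode() {
        // Per Annotation.hashCode(): the sum, over all members, of
        // (127 * memberName.hashCode()) ^ memberValueHashCode.
        // With no members the sum is 0, matching the compiler's value.
        return 0;
    }
}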
RealSkeptic's answer is great, but I'll put it a slightly different way.
This is a specific instance of a general problem:
You defined an interface (specifically an annotation).
Someone (javac) wrote a particular (built-in) implementation of that interface. You can't access that implementation, but need to be able to create equal instances, particularly for use in Sets and Maps. (Guice is one big Map<Key, Provider> after all.)
The implementor (javac) wrote a custom implementation of equals so that annotation instances with the same parameters pass equals. You need to match that implementation so that equals is symmetric (a.equals(b) if and only if b.equals(a), which is assumed in Java along with reflexivity, consistency, and transitivity).
Equal objects must have equal hashCodes because Java uses it as a shortcut for equality: if objects have unequal hashCodes then they cannot be equal. This comes in handy to make the efficient Map implementation HashMap, because you can use the hashCode to only check objects in the right hashCode-determined bucket. If you used a different or modified hashCode algorithm, you'd be breaking spec in theory, and in practice your annotation implementation wouldn't match others consistently in HashSet or HashMap (rendering it worthless to Guice). Many other features use hashCode, but those are the most obvious examples.
It would be much easier if Java let you instantiate their implementation, or generate an implementation automatically for your class, but here the best they've done is an exact spec for you to match.
So yes, you'll run into this with annotations more often than anything else, but these issues matter any time you're trying to be equal to instances of an implementation you can't control or use yourself.
The above answers are excellent general answers to the question, but since I haven't seen it mentioned I'll just add that using AnnotationLiteral to implement annotations takes care of the equals and hashCode issues properly. There are a couple of implementations to choose from, for example CDI's javax.enterprise.util.AnnotationLiteral.
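A minimal usage sketch, assuming CDI's javax.enterprise.util.AnnotationLiteral (jakarta.enterprise.util in newer releases); the qualifier annotation is hypothetical. The literal class extends AnnotationLiteral and implements the annotation type, inheriting spec-compliant equals() and hashCode():

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import javax.enterprise.util.AnnotationLiteral;

@Retention(RetentionPolicy.RUNTIME)
@interface MyQualifier {}

final class MyQualifierLiteral
        extends AnnotationLiteral<MyQualifier> implements MyQualifier {}

// Usage: MyQualifier q = new MyQualifierLiteral();
// equals()/hashCode() then follow the Annotation spec for free.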

What is the reason behind Enum.hashCode()?

The method hashCode() in class Enum is final and defined as super.hashCode(), which means it returns a number based on the address of the instance, which is a random number from the programmer's POV.
Defining it e.g. as ordinal() ^ getClass().getName().hashCode() would be deterministic across different JVMs. It would even work a bit better, since the least significant bits would "change as much as possible"; e.g., for an enum containing up to 16 elements and a HashMap of size 16, there'd be no collisions at all (sure, using an EnumMap is better, but sometimes not possible, e.g. there's no ConcurrentEnumMap). With the current definition you have no such guarantee, have you?
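If you want such a deterministic value, a small helper along these lines is easy to write (a sketch, not part of any standard API):

final class EnumHashes {
    private EnumHashes() {}

    // Deterministic across JVM runs, unlike Enum.hashCode() itself.
    static int stableHash(Enum<?> e) {
        // getDeclaringClass() avoids the anonymous subclass generated
        // for constants with constant-specific bodies.
        return e.ordinal() ^ e.getDeclaringClass().getName().hashCode();
    }
}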
Summary of the answers
Using Object.hashCode() compares to a nicer hashCode like the one above as follows:
PROS
simplicity
CONS
speed
more collisions (for any size of a HashMap)
non-determinism, which propagates to other objects, making them unusable for:
deterministic simulations
ETag computation
hunting down bugs depending e.g. on a HashSet iteration order
I'd personally prefer the nicer hashCode, but IMHO no reason weighs much, maybe except for speed.
UPDATE
I was curious about the speed and wrote a benchmark with surprising results. For the price of a single field per class you can get a deterministic hash code which is nearly four times faster. Storing the hash code in a field of each instance would be even faster, although negligibly.
The explanation why the standard hash code is not much faster is that it can't be the object's address, as objects get moved by the GC.
UPDATE 2
There are some strange things going on with hashCode performance in general. Even once I understand them, there's still the open question why System.identityHashCode (reading from the object header) is way slower than accessing a normal object field.
The only reason I can imagine for using Object's hashCode() and for making it final is to make me ask this question.
First of all, you should not rely on such mechanisms for sharing objects between JVMs. That's simply not a supported use case. When you serialize / deserialize you should rely on your own comparison mechanisms or only "compare" the results against objects within your own JVM.
The reason for letting an enum's hashCode be implemented as Object's hash code (based on identity) is that, within one JVM, there will be only one instance of each enum constant. This is enough to ensure that such an implementation makes sense and is correct.
You could argue: "Hey, String and the wrappers for the primitives (Long, Integer, ...) all have well-defined, deterministic specifications of hashCode! Why don't enums have one?" Well, to begin with, you can have several distinct string references representing the same string, which means that using super.hashCode would be an error, so these classes necessarily need their own hashCode implementations. For these core classes it made sense to give them well-defined, deterministic hashCodes.
Why did they choose to solve it like this?
Well, look at the requirements of the hashCode implementation. The main concern is to make sure that each object returns a distinct hash code (unless it is equal to another object). The identity-based approach is super efficient and guarantees this, while your suggestion does not. This requirement is apparently stronger than any "convenience bonus" about easing serialization etc.
I think that the reason they made it final is to avoid developers shooting themselves in the foot by rewriting a suboptimal (or even incorrect) hashCode.
Regarding the chosen implementation: it's not stable across JVMs, but it's very fast, avoids collisions, and doesn't need an additional field in the enum. Given the normally small number of instances of an enum class and the speed of the equals method, I wouldn't be surprised if the HashMap lookup time were higher with your algorithm than with the current one, due to its additional complexity.
I've asked the same question, because I did not see this one: why does Enum's hashCode() refer to the Object hashCode() implementation instead of the ordinal() function?
I encountered it as a sort of problem when defining my own hash function for an object relying on an enum's hashCode as one of its composites. When checking values in a Set of objects returned by that function, I checked them in an order which I expected to be the same, since I define the hashCode myself and so expected elements to fall on the same nodes of the tree. But since the hashCode returned by the enum changes from run to run, this assumption was wrong, and the test could fail once in a while.
So, when I figured out the problem, I started using ordinal instead. I am not sure everyone writing a hashCode for their object realizes this.
So basically, you can't define your own deterministic hashCode while relying on an enum's hashCode; you need to use ordinal instead.
P.S. This was too big for a comment :)
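A sketch of what that looks like in practice (field names hypothetical): mix in the enum's ordinal() rather than its hashCode() when composing a deterministic hash.

enum Color { RED, GREEN, BLUE }

final class Shape {
    final int size;
    final Color color;

    Shape(int size, Color color) {
        this.size = size;
        this.color = color;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Shape
                && ((Shape) o).size == size
                && ((Shape) o).color == color;
    }

    @Override
    public int hashCode() {
        // color.ordinal() instead of color.hashCode(): stable across runs.
        return 31 * size + color.ordinal();
    }
}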
The JVM enforces that for an enum constant, only one object will exist in memory. There is no way that you could end up with two different instance objects of the same enum constant within a single VM, not with reflection, not across the network via serialization/deserialization.
That being said, since it is the only object to represent this constant, it doesn't matter that its hash code is its address, since no other object can occupy the same address space at the same time. It is guaranteed to be unique and "deterministic" (in the sense that within the same VM the constant is always the same object, whatever its address happens to be).
There is no requirement for hash codes to be deterministic between JVMs and no advantage gained if they were. If you are relying on this fact you are using them wrong.
As only one instance of each enum value exists, Object.hashCode() is guaranteed never to collide, is good code reuse, and is very fast.
If equality is defined by identity, then Object.hashCode() will always give the best performance.
The determinism of other hash codes is just a side effect of their implementation. As their equality is usually defined by field values, mixing in non-deterministic values would be a waste of time.
As long as we can't send an enum object [1] to a different JVM, I see no reason for putting such a requirement on enums (and objects in general).
[1] I thought it was clear enough: an object is an instance of a class. A serialized object is a sequence of bytes, usually stored in a byte array. I was talking about an object.
One more reason I could imagine for implementing it like this is the requirement for hashCode() and equals() to be consistent, and the design goal that enums should be simple to use and compile-time constant (to use them as "case" constants). This also makes it legal to compare enum instances with "==", and you simply wouldn't want "equals" to behave differently from "==" for enums. This again ties hashCode to the default Object.hashCode() reference-based behavior.
As said before, I also don't expect equals() and hashCode() to consider two enum constants from different JVMs as equal. When talking about serialization: for instance fields typed as enums, the default binary serializer in Java has special behaviour that serializes only the name of the constant, and on deserialization the reference to the corresponding enum value in the deserializing JVM is re-created. JAXB and other XML-based serialization mechanisms work in a similar way. So: just don't worry.

Overriding the equals method vs creating a new method

I have always thought that the .equals() method in Java should be overridden to be made specific to the class you have created; in other words, to look for equivalence of two different instances rather than two references to the same instance. However, I have encountered other programmers who seem to think that the default object behavior should be left alone and a new method created for testing equivalence of two objects of the same class.
What are the arguments for and against overriding the equals method?
Overriding the equals method is necessary if you want to test equivalence in standard library classes (for example, ensuring a java.util.Set contains unique elements or using objects as keys in java.util.Map objects).
Note, if you override equals, ensure you honour the API contract as described in the documentation. For example, ensure you also override Object.hashCode:
If two objects are equal according to
the equals(Object) method, then
calling the hashCode method on each of
the two objects must produce the same
integer result.
EDIT: I didn't post this as a complete answer on the subject, so I'll echo Fredrik Kalseth's statement that overriding equals works best for immutable objects. To quote the API for Map:
Note: great care must be exercised if
mutable objects are used as map keys.
The behavior of a map is not specified
if the value of an object is changed
in a manner that affects equals
comparisons while the object is a key
in the map.
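A short sketch of how a mutable key gets "lost" (the class is hypothetical): after the mutation, the stored entry sits in a bucket that no longer matches the key's hash.

import java.util.HashMap;
import java.util.Map;

final class MutableKey {
    int value;

    MutableKey(int value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).value == value;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }
}

public class LostKeyDemo {
    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "hello");

        key.value = 2;  // mutate the object while it is a map key

        System.out.println(map.get(key));               // null: wrong bucket
        System.out.println(map.get(new MutableKey(1))); // null: equals fails
    }
}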
I would highly recommend picking up a copy of Effective Java and reading through Item 7 on obeying the equals contract. You need to be careful if you are overriding equals for mutable objects, as many of the collections such as Maps and Sets use equals to determine equivalence, and mutating an object contained in a collection could lead to unexpected results. Brian Goetz also has a pretty good overview of implementing equals and hashCode.
You should "never" override equals & getHashCode for mutable objects - this goes for .net and Java both. If you do, and use such an object as the key in f.ex a dictionary and then change that object, you'll be in trouble because the dictionary relies on the hashcode to find the object.
Here's a good article on the topic: http://weblogs.asp.net/bleroy/archive/2004/12/15/316601.aspx
@David Schlosnagle mentions Josh Bloch's Effective Java -- this is a must-read for any Java developer.
There is a related issue: for immutable value objects, you should also consider overriding compareTo. The standard wording for when they differ is in the Comparable API:
It is generally the case, but not strictly required that (compare(x, y)==0) == (x.equals(y)). Generally speaking, any comparator that violates this condition should clearly indicate this fact. The recommended language is "Note: this comparator imposes orderings that are inconsistent with equals."
The Equals method is intended to compare references. So it should not be overridden to change its behaviour.
You should create a new method to test for equivalence in different instances if you need to (or use the CompareTo method in some .NET classes)
To be honest, in Java there is not really an argument against overriding equals. If you need to compare instances for equality, then that is what you do.
As mentioned above, you need to be aware of the contract with hashCode, and similarly, watch out for the gotchas around the Comparable interface - in almost all situations you want the natural ordering as defined by Comparable to be consistent with equals (see the BigDecimal API doc for the canonical counterexample).
Creating a new method for deciding equality, quite apart from not working with the existing library classes, flies in the face of Java convention somewhat.
You should only need to override the equals() method if you want specific behaviour when adding objects to sorted data structures (SortedSet etc.)
When you do that you should also override hashCode().
See here for a complete explanation.
