Annotation.equals() vs. Object.equals() - java

Some frameworks (e.g. Guice) require you, in certain situations, to create a class that implements an annotation interface.
There seems to be a difference between the Annotation.equals(Object) and Object.equals(Object) definitions which needs to be respected in that case (the same applies to hashCode()).
Questions:
Why was it designed that way, and what is the reason for the difference?
What side-effects can occur when using the Object.equals(Object) definition for annotation classes instead?
Update:
Additional questions:
What about the Annotation.hashCode() definition? Is it really required to implement it that way, especially the "(...)127 times the hash code of the member-name as computed by String.hashCode()) XOR the hash code(...)"-part?
What happens if a hashCode() method is implemented to be consistent to equals() but doesn't match the exact definition of Annotation.hashCode() (e.g. using 128 times the hash code of the member-name)?

The definitions are not different. The definition in Annotation is simply specialized for the annotation type.
The definition in Object basically says "If you decide to implement equals for your class, it should represent an equivalence relation that follows these rules".
In Annotation it defines an equivalence that follows those rules, which is meaningful specifically for Annotation instances.
In fact, the Annotation equivalence would work for many other classes. The point is that different classes have different meanings, so their instances may have different equivalence relationships, and it's up to the programmer to decide which equivalence relation to use for his or her class. For Annotation, the contract mandates this particular equivalence relation.
As for side effects - suppose an Annotation type inherited Object's equals. This is a mistake many people make when they try to use their own classes in maps or other equals()-dependent situations. Object has an equals() implementation that is the same as object identity: two references are equal only if they refer to the same object.
If you used that, then no two distinct instances would ever be considered equal. You would not be able to create a second Annotation instance equivalent to a previous one, despite them having the same values in their members and semantically representing the same annotation. So you wouldn't be able to tell whether two items are annotated with the same annotation when they carry different instances of it.
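A minimal sketch of that asymmetry (class and method names are illustrative): the JDK's reflective annotation instance follows Annotation.equals(), so it accepts any implementation of the same annotation type with equal members, while a hand-written implementation that keeps Object's identity-based equals does not.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class IdentityEqualsDemo {
    @Retention(RetentionPolicy.RUNTIME)
    @interface Marker {}

    @Marker
    static class Annotated {}

    // Hand-written implementation that (wrongly) keeps Object's
    // identity-based equals()/hashCode() instead of the Annotation contract.
    static class MarkerImpl implements Marker {
        public Class<? extends Annotation> annotationType() { return Marker.class; }
    }

    static boolean[] check() {
        Marker reflected = Annotated.class.getAnnotation(Marker.class);
        Marker handMade = new MarkerImpl();
        // Reflective instance: member-based equality. Hand-made instance:
        // identity. The two disagree, breaking symmetry.
        return new boolean[] { reflected.equals(handMade), handMade.equals(reflected) };
    }

    public static void main(String[] args) {
        boolean[] r = check();
        System.out.println("reflected.equals(handMade) = " + r[0]); // true
        System.out.println("handMade.equals(reflected) = " + r[1]); // false
    }
}
```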
As for the hashCode question, although Jeff Bowman has already answered that, I'll address that to make my answer more complete:
Basically, implementation of annotations is left to compilers, and the JLS doesn't dictate the exact implementation. It is also possible to create implementing classes, as your question itself mentions.
This means that annotation classes can come from different sources - different compilers (you are supposed to be able to run .class files anywhere, no matter which java compiler created them) and developer-created implementations.
The equals() and hashCode() methods are usually considered in the context of a single class, not an interface. This is because interfaces normally stand apart from implementation - they only define contracts. When you create these methods for a particular class, you know that the object you compare with is supposed to be of the same class, and thus have the same implementation. As long as the class has a hashCode method that returns the same value for objects that are equivalent under its equals, then whatever that implementation is, it satisfies the contract.
However, in this particular case, you have an interface, and you are required to make equals() and hashCode() work not only for two instances of the same class, but also for instances of different classes that implement the same interface. This means that if you don't agree on a single implementation across all possible classes, you might get two instances of the same annotation with the same element values but different hash codes. That would break the hashCode() contract.
As an example, imagine an annotation @SomeAnnotation that doesn't take parameters. Imagine that you implement it with a class SomeAnnotationImpl that returns 15 as the hash code. Two equal instances of SomeAnnotationImpl will have the same hash code, which is good. But the Java-provided implementation of @SomeAnnotation would return 0 as its hash code (the sum over zero members). Therefore two objects of type Annotation are equal (they implement the same annotation interface, and if they follow the equals() definition above, they should return true for equals), but have different hash codes. That breaks the contract.
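The member-hashing rule from the question can be checked directly against the JDK's reflective implementation. A small sketch (the annotation and its member are illustrative), assuming a single String member named value:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AnnotationHashDemo {
    @Retention(RetentionPolicy.RUNTIME)
    @interface Named { String value(); }

    @Named("x")
    static class Target {}

    static boolean matchesSpec() {
        Named n = Target.class.getAnnotation(Named.class);
        // Annotation.hashCode(): the sum, over all members, of
        // (127 * memberName.hashCode()) XOR memberValue.hashCode()
        int expected = (127 * "value".hashCode()) ^ "x".hashCode();
        return n.hashCode() == expected;
    }

    public static void main(String[] args) {
        System.out.println(matchesSpec()); // true
    }
}
```

Using 128 instead of 127 would still be consistent with equals() within your own class, but it would disagree with the JDK's instances, which is exactly the cross-implementation breakage described above.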

RealSkeptic's answer is great, but I'll put it a slightly different way.
This is a specific instance of a general problem:
You defined an interface (specifically an annotation).
Someone (javac) wrote a particular (built-in) implementation of that interface. You can't access that implementation, but need to be able to create equal instances, particularly for use in Sets and Maps. (Guice is one big Map<Key, Provider> after all.)
The implementor (javac) wrote a custom implementation of equals so that annotation instances with the same parameters pass equals. You need to match that implementation so that equals is symmetric (a.equals(b) if and only if b.equals(a), which is assumed in Java along with reflexivity, consistency, and transitivity).
Equal objects must have equal hashCodes because Java uses it as a shortcut for equality: if objects have unequal hashCodes then they cannot be equal. This comes in handy to make the efficient Map implementation HashMap, because you can use the hashCode to only check objects in the right hashCode-determined bucket. If you used a different or modified hashCode algorithm, you'd be breaking spec in theory, and in practice your annotation implementation wouldn't match others consistently in HashSet or HashMap (rendering it worthless to Guice). Many other features use hashCode, but those are the most obvious examples.
It would be much easier if Java let you instantiate their implementation, or generate an implementation automatically for your class, but here the best they've done is an exact spec for you to match.
So yes, you'll run into this with annotations more often than anything else, but these concerns matter any time you're trying to be interchangeable, equality-wise, with an implementation you can't control or use yourself.

The above answers are excellent general answers to the question, but since I haven't seen them mentioned I'll just add that the use of AnnotationLiteral for implementing Annotations takes care of the equals and hashCode issues properly. There are a couple to choose from:
AnnotationLiteral
AnnotationLiteral

Related

Why is Set of java.util there in the API?

The interface Set in java.util has exactly the same structure
as Collection in the same package.
In the inheritance hierarchy, AbstractSet is a subtype of both Set and AbstractCollection, both of which are subtypes of Collection.
The other immediate descendant of Set is SortedSet, and SortedSet extends only Set.
What I'm wondering is, what's the gain of Set in java.util -- why is it there?
If I'm not missing anything, it's not adding anything
to the current structure or hierarchy of the API.
Everything would be the same if AbstractSet didn't
implement Set but just extended AbstractCollection, and SortedSet
directly extended Collection.
The only thing I can think of is that Set is there for documentation purposes.
It shouldn't be for further structuring/restructuring of the hierarchy -- that would mean
structural modifications of the descendants, which doesn't make sense.
I'm looking for verification or counter-arguments if I'm missing something here.
//===========================================
EDIT: The question is: "Why is Set there -- what is it adding to the structure of the APIs?"
It's obvious how a set is mathematically particular among collections.
The methods in Set and Collection have the same signatures and return types, but they have different behavioural contracts ... deriving from the fact that a set cannot contain "the same" element more than once. THAT is why they are distinct interfaces.
It is not just documentation. Since Java doesn't do "duck typing", the distinction between Collection and Set is visible in both compile time and runtime type checking.
And the distinction is a useful one. If there was only Collection, then you would not be able to write methods that require a collection with no duplicates as an argument.
You write:
Set is a copy/paste of Collection apart from the comments.
I know that. The comments are the behavioural contract. They are critical. There is no other way to specify how something will behave in Java (see notes 1 and 2 below).
Reference:
Design by contract
1 - In one or two languages, you can specify the behavioural aspect of the "contract" in the language itself. Eiffel is the classical example ... that gave rise to the "design by contract" paradigm.
2 - In fact, the JML system adds formal preconditions, postconditions and invariants to Java, and checks them using an automated theorem prover. The problem is that it would be difficult to fully integrate this with the Java language's type system / static type checker. (How do you statically type check something when the theorem prover says "I don't know" ... because it is not smart enough to prove/disprove the JML assertions in the code?)
A set can't contain duplicate elements. A collection can.
what's the gain in Set in java.lang.util-- why is it there?
Separating the Sets from the other Collections lets you write code so that only a Set can be passed in. Here's an example where it's useful:
public void sendMessageTo(Collection<String> addresses) {
    addresses.add("admin@example.com"); // The admin might now be on the list twice, and gets two emails, oops :(
    // do something
}
I want to change the interface to take a Set:
public void sendMessageTo(Set<String> addresses) {
    addresses.add("admin@example.com"); // This will add the admin if they were not already on the list, otherwise it won't, because Sets don't allow duplicates
    // do something
}
A Set is a Collection that contains no duplicates. For more info from the page:
More formally, sets contain no pair of
elements e1 and e2 such that e1.equals(e2), and at most one null
element. As implied by its name, this interface models the
mathematical set abstraction.
The Set interface places additional stipulations, beyond those
inherited from the Collection interface, on the contracts of all
constructors and on the contracts of the add, equals and hashCode
methods. Declarations for other inherited methods are also included
here for convenience. (The specifications accompanying these
declarations have been tailored to the Set interface, but they do not
contain any additional stipulations.)
The additional stipulation on constructors is, not surprisingly, that
all constructors must create a set that contains no duplicate elements
(as defined above).
If Set did not exist, there would be no way to enforce uniqueness in a Collection. It does not matter that the code is the same as Collection's: Set exists to enforce behavioural restrictions. Because of that defined behaviour, a class that implements Set must adhere to its behavioural contract(s).
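The behavioural (as opposed to structural) difference shows up directly in a small sketch (class name is illustrative):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

public class SetVsCollectionDemo {
    static int[] sizes() {
        Collection<String> list = new ArrayList<>();
        Collection<String> set = new HashSet<>();
        // Identical call sequence against the identical Collection interface...
        for (String s : new String[] { "admin@example.com", "admin@example.com" }) {
            list.add(s); // a List happily keeps the duplicate
            set.add(s);  // a Set silently rejects the second add
        }
        return new int[] { list.size(), set.size() };
    }

    public static void main(String[] args) {
        int[] s = sizes();
        System.out.println(s[0]); // 2
        System.out.println(s[1]); // 1
    }
}
```

Same method signatures, different behavioural contract: that contract is what the Set interface exists to name.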

Using multiple alternatives of hashCode() and equals() for sets

Suppose I have a simple POJO class Class1, and it has 2 fields of type int.
I've implemented its hashCode() and equals() methods to consider exactly those 2 fields, in order to put instances of the class into a set.
So far so good.
Now, I want to have a different set, which considers instances of Class1 to be equal if the first field is equal, making the equality condition weaker. I might even want to have another set which considers only the second field when checking for equality.
Is it possible? If so, how?
You can get that effect by using a TreeSet and providing a custom Comparator that only inspects the fields you're interested in.
Note, however, that strictly speaking such a TreeSet is no longer a "correct" Set, because it effectively ignores the equals() method of your objects:
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
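A sketch of that approach for the two-int-field Class1 from the question (field names are illustrative):

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

public class FieldComparatorSetDemo {
    static final class Class1 {
        final int first, second;
        Class1(int first, int second) { this.first = first; this.second = second; }
        // equals()/hashCode() over both fields omitted for brevity
    }

    static int sizeAfterAdds() {
        // Membership in this set is decided by the comparator, i.e. by the
        // first field only -- equals() is never consulted.
        Set<Class1> byFirstField = new TreeSet<>(Comparator.comparingInt((Class1 c) -> c.first));
        byFirstField.add(new Class1(1, 10));
        byFirstField.add(new Class1(1, 20)); // rejected: same first field
        byFirstField.add(new Class1(2, 10));
        return byFirstField.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAdds()); // 2
    }
}
```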
The standard Java libraries don't support this.
And (surprisingly) there doesn't appear to be a Map or Set class in the Apache Commons Collections or Guava libraries that supports this.
There are probably other libraries that support this if you look hard enough.
Alternatively, you could write your own ... starting with the standard HashMap code.
A cheap-and-cheerful alternative is to create a light-weight wrapper class for your element type that delegates most methods to the wrapped class and provides a different equals / hashCode pair than the original. There is a small runtime penalty in doing this ... but it is worth considering.
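The wrapper idea might look like this (wrapper and field names are illustrative); unlike the TreeSet route, it keeps HashSet's O(1) lookup:

```java
import java.util.HashSet;
import java.util.Set;

public class KeyWrapperDemo {
    static final class Class1 {
        final int first, second;
        Class1(int first, int second) { this.first = first; this.second = second; }
    }

    // Light-weight wrapper that redefines equality over the first field only.
    static final class ByFirstField {
        final Class1 inner;
        ByFirstField(Class1 inner) { this.inner = inner; }
        @Override public boolean equals(Object o) {
            return o instanceof ByFirstField && ((ByFirstField) o).inner.first == inner.first;
        }
        @Override public int hashCode() { return Integer.hashCode(inner.first); }
    }

    static int sizeAfterAdds() {
        Set<ByFirstField> set = new HashSet<>();
        set.add(new ByFirstField(new Class1(1, 10)));
        set.add(new ByFirstField(new Class1(1, 20))); // duplicate under the weaker equality
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAdds()); // 1
    }
}
```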
Joachim's suggestion is good too, unless your sets are likely to be particularly big. (TreeSet has O(logN) lookup compared with O(1) for a properly implemented hash table.)

Potential pitfalls when ignoring some fields in equals/hashCode?

If only some of the fields of an object represent its actual state, I suppose these others could be ignored when overriding equals and hashCode...
I get an uneasy feeling about this though, and wanted to ask,
Is this common practice?
Are there any potential pitfalls with this approach?
Is there any documentation or guidelines when it comes to ignoring some fields in equals / hashCode?
In my particular situation, I'm exploring the state-space of a problem. I'd like to keep a hash set of visited states, but I'm also considering including the path which led to the state. Obviously, two states are equal even if they were found through different paths.
This is based on how you would consider the uniqueness of a given object. If it has a primary key (a unique key), then using that attribute alone is enough.
If you think the uniqueness is combination of 10 different attributes, then use all 10 attributes in the equals.
Then use only the attributes that you used in equals() to generate the hashCode, because equal objects should generate the same hash codes.
Selecting the attribute(s) for equals and hashcode is how you define the uniqueness of a given object.
Is this common practice? Yes
Are there any potential pitfalls with this approach? No
Is there any documentation or guidelines when it comes to ignoring some fields in equals / hashCode?
"The equals method for class Object implements the most discriminating
possible equivalence relation on objects;"
This is from the Object class Javadoc. But as the author of the class, you know how its uniqueness is defined.
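For the asker's state-space scenario, the idea is a sketch like this (names are illustrative): the path field is deliberately left out of both equals and hashCode, so a state reached twice by different routes is deduplicated.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class VisitedStatesDemo {
    static final class State {
        final int x, y;          // the actual state
        final List<String> path; // how we got here: deliberately ignored below
        State(int x, int y, List<String> path) { this.x = x; this.y = y; this.path = path; }
        @Override public boolean equals(Object o) {
            return o instanceof State && ((State) o).x == x && ((State) o).y == y;
        }
        @Override public int hashCode() { return 31 * x + y; } // only equals-relevant fields
    }

    static boolean secondAddSucceeded() {
        Set<State> visited = new HashSet<>();
        visited.add(new State(0, 0, List.of("start")));
        // Same state reached via a different path: the set treats it as visited
        return visited.add(new State(0, 0, List.of("start", "left", "right")));
    }

    public static void main(String[] args) {
        System.out.println(secondAddSucceeded()); // false
    }
}
```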
Ultimately, "equals" means what you want it to mean. There is the restriction that "equal" values must return the same hash code and, of course, if presented with two identical addresses, "equals" must return true. But you could, e.g., have an "equals" that compared the contents of two web pages (ignoring the issue of repeatability for the nonce) and, even though the URLs were different, said "equal" if the page contents matched in some way.
The best documentation/guidelines I have seen for overriding the methods on Object was in Josh Bloch's Effective Java. It has a whole chapter on "Methods Common to All Objects" which includes sections about "Obey the general contract when overriding equals" and "Always override hashCode when you override equals". It describes, in detail, the things you should consider when overriding these two methods. I won't give away the answer directly; the book is definitely worth the cost for every Java developer.

How should one unit test the hashCode-equals contract?

In a nutshell, the hashCode contract, according to Java's Object.hashCode():
The hash code shouldn't change unless something affecting equals() changes
equals() implies hash codes are ==
Let's assume interest primarily in immutable data objects - their information never changes after they're constructed, so #1 is assumed to hold. That leaves #2: the problem is simply one of confirming that equals implies hash code ==.
Obviously, we can't test every conceivable data object unless that set is trivially small. So, what is the best way to write a unit test that is likely to catch the common cases?
Since the instances of this class are immutable, there are limited ways to construct such an object; this unit test should cover all of them if possible. Off the top of my head, the entry points are the constructors, deserialization, and constructors of subclasses (which should be reducible to the constructor call problem).
[I'm going to try to answer my own question via research. Input from other StackOverflowers is a welcome safety mechanism to this process.]
[This could be applicable to other OO languages, so I'm adding that tag.]
EqualsVerifier is a relatively new open source project and it does a very good job at testing the equals contract. It doesn't have the issues the EqualsTester from GSBase has. I would definitely recommend it.
My advice would be to think of why/how this might ever not hold true, and then write some unit tests which target those situations.
For example, let's say you had a custom Set class. Two sets are equal if they contain the same elements, but it's possible for the underlying data structures of two equal sets to differ if those elements are stored in a different order. For example:
MySet s1 = new MySet( new String[]{"Hello", "World"} );
MySet s2 = new MySet( new String[]{"World", "Hello"} );
assertEquals(s1, s2);
assertTrue( s1.hashCode()==s2.hashCode() );
In this case, the order of the elements in the sets might affect their hash, depending on the hashing algorithm you've implemented. So this is the kind of test I'd write, since it tests the case where I know it would be possible for some hashing algorithm to produce different results for two objects I've defined to be equal.
You should use a similar standard with your own custom class, whatever that is.
It's worth using the JUnit-addons for this. Check out the class EqualsHashCodeTestCase (http://junit-addons.sourceforge.net/). You can extend it and implement createInstance and createNotEqualInstance, and it will check that the equals and hashCode methods are correct.
I would recommend the EqualsTester from GSBase. It does basically what you want. I have two (minor) problems with it though:
The constructor does all the work, which I don't consider to be good practice.
It fails when an instance of class A equals an instance of a subclass of class A. This is not necessarily a violation of the equals contract.
[At the time of this writing, three other answers were posted.]
To reiterate, the aim of my question is to find standard cases of tests to confirm that hashCode and equals are agreeing with each other. My approach to this question is to imagine the common paths taken by programmers when writing the classes in question, namely, immutable data. For example:
Wrote equals() without writing hashCode(). This often means equality was defined to mean equality of the fields of two instances.
Wrote hashCode() without writing equals(). This may mean the programmer was seeking a more efficient hashing algorithm.
In the case of #2, the problem seems nonexistent to me. No additional pairs of instances have been made equal by equals(), so no additional instances are required to have equal hash codes. At worst, the hash algorithm may yield poorer performance for hash maps, which is outside the scope of this question.
In the case of #1, the standard unit test entails creating two instances of the same object with the same data passed to the constructor, and verifying equal hash codes. What about false positives? It's possible to pick constructor parameters that just happen to yield equal hash codes on a nonetheless unsound algorithm. A unit test that tends to avoid such parameters would fulfill the spirit of this question. The shortcut here is to inspect the source code for equals(), think hard, and write a test based on that, but while this may be necessary in some cases, there may also be common tests that catch common problems - and such tests also fulfill the spirit of this question.
For example, if the class to be tested (call it Data) has a constructor that takes a String, and instances constructed from equal Strings are themselves equals(), then a good test would probably test:
new Data("foo")
another new Data("foo")
We could even check the hash code for new Data(new String("foo")), to force the String to not be interned, although that's more likely to yield a correct hash code than Data.equals() is to yield a correct result, in my opinion.
Eli Courtwright's answer is an example of thinking hard of a way to break the hash algorithm based on knowledge of the equals specification. The example of a special collection is a good one, as user-made Collections do turn up at times, and are quite prone to muckups in the hash algorithm.
This is one of the only cases where I would have multiple asserts in a test. Since you need to test the equals method you should also check the hashCode method at the same time. So on each of your equals method test cases check the hashCode contract as well.
A one = new A(...);
A two = new A(...);
assertEquals("These should be equal", one, two);
int oneCode = one.hashCode();
assertEquals("HashCodes should be equal", oneCode, two.hashCode());
assertEquals("HashCode should not change", oneCode, one.hashCode());
And of course checking for a good hashCode is another exercise. Honestly I wouldn't bother to do the double check to make sure the hashCode wasn't changing in the same run, that sort of problem is better handled by catching it in a code review and helping the developer understand why that's not a good way to write hashCode methods.
You can also use something similar to http://code.google.com/p/guava-libraries/source/browse/guava-testlib/src/com/google/common/testing/EqualsTester.java
to test equals and hashCode.
If I have a class Thing, as most others do I write a class ThingTest, which holds all the unit tests for that class. Each ThingTest has a method
public static void checkInvariants(final Thing thing) {
...
}
and if the Thing class overrides hashCode and equals it has a method
public static void checkInvariants(final Thing thing1, Thing thing2) {
ObjectTest.checkInvariants(thing1, thing2);
... invariants that are specific to Thing
}
That method is responsible for checking all invariants that are designed to hold between any pair of Thing objects. The ObjectTest method it delegates to is responsible for checking all invariants that must hold between any pair of objects. As equals and hashCode are methods of all objects, that method checks that hashCode and equals are consistent.
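A sketch of what such an ObjectTest.checkInvariants method might check (the exact assertions are an assumption; the answer doesn't spell them out):

```java
public final class ObjectTest {
    // Invariants that must hold between ANY pair of objects.
    public static void checkInvariants(Object o1, Object o2) {
        // equals must be symmetric
        if (o1.equals(o2) != o2.equals(o1)) {
            throw new AssertionError("equals is not symmetric");
        }
        // equal objects must have equal hash codes
        if (o1.equals(o2) && o1.hashCode() != o2.hashCode()) {
            throw new AssertionError("equal objects with unequal hash codes");
        }
    }

    public static void main(String[] args) {
        checkInvariants("a", "a"); // equal pair: hash codes must match
        checkInvariants("a", "b"); // unequal pair: nothing further to check
        System.out.println("ok");
    }
}
```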
I then have some test methods that create pairs of Thing objects, and pass them to the pairwise checkInvariants method. I use equivalence partitioning to decide what pairs are worth testing. I usually create each pair to be different in only one attribute, plus a test that tests two equivalent objects.
I also sometimes have a 3-argument checkInvariants method, although I find it is less useful for finding defects, so I do not do this often.

Overriding the equals method vs creating a new method

I have always thought that the .equals() method in java should be overridden to be made specific to the class you have created. In other words to look for equivalence of two different instances rather than two references to the same instance. However I have encountered other programmers who seem to think that the default object behavior should be left alone and a new method created for testing equivalence of two objects of the same class.
What are the arguments for and against overriding the equals method?
Overriding the equals method is necessary if you want to test equivalence in standard library classes (for example, ensuring a java.util.Set contains unique elements or using objects as keys in java.util.Map objects).
Note, if you override equals, ensure you honour the API contract as described in the documentation. For example, ensure you also override Object.hashCode:
If two objects are equal according to
the equals(Object) method, then
calling the hashCode method on each of
the two objects must produce the same
integer result.
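A minimal sketch of the standard-library use case (class and field names are illustrative): once equals and hashCode are overridden together, a distinct but equal instance can be used as a Map key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class PointKeyDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }
        // Overridden together with equals, honouring the API contract quoted above
        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    static String lookup() {
        Map<Point, String> names = new HashMap<>();
        names.put(new Point(1, 2), "home");
        // A distinct but equal instance finds the entry; with Object's
        // identity-based equals/hashCode this lookup would return null.
        return names.get(new Point(1, 2));
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // home
    }
}
```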
EDIT: I didn't post this as a complete answer on the subject, so I'll echo Fredrik Kalseth's statement that overriding equals works best for immutable objects. To quote the API for Map:
Note: great care must be exercised if
mutable objects are used as map keys.
The behavior of a map is not specified
if the value of an object is changed
in a manner that affects equals
comparisons while the object is a key
in the map.
I would highly recommend picking up a copy of Effective Java and reading through item 7 obeying the equals contract. You need to be careful if you are overriding equals for mutable objects, as many of the collections such as Maps and Sets use equals to determine equivalence, and mutating an object contained in a collection could lead to unexpected results. Brian Goetz also has a pretty good overview of implementing equals and hashCode.
You should "never" override equals and hashCode (GetHashCode in .NET) for mutable objects -- this goes for both .NET and Java. If you do, and use such an object as the key in e.g. a dictionary and then change that object, you'll be in trouble because the dictionary relies on the hash code to find the object.
Here's a good article on the topic: http://weblogs.asp.net/bleroy/archive/2004/12/15/316601.aspx
@David Schlosnagle mentions Josh Bloch's Effective Java -- this is a must-read for any Java developer.
There is a related issue: for immutable value objects, you should also consider overriding compareTo. The standard wording for when they differ is in the Comparable API:
It is generally the case, but not strictly required that (compare(x, y)==0) == (x.equals(y)). Generally speaking, any comparator that violates this condition should clearly indicate this fact. The recommended language is "Note: this comparator imposes orderings that are inconsistent with equals."
The Equals method is intended to compare references, so it should not be overridden to change its behaviour.
You should create a new method to test for equivalence in different instances if you need to (or use the CompareTo method in some .NET classes)
To be honest, in Java there is not really an argument against overriding equals. If you need to compare instances for equality, then that is what you do.
As mentioned above, you need to be aware of the contract with hashCode, and similarly, watch out for the gotchas around the Comparable interface - in almost all situations you want the natural ordering as defined by Comparable to be consistent with equals (see the BigDecimal api doc for the canonical counter example)
Creating a new method for deciding equality, quite apart from not working with the existing library classes, flies in the face of Java convention somewhat.
You should only need to override the equals() method if you want specific behaviour when adding objects to sorted data structures (SortedSet etc.)
When you do that you should also override hashCode().
See here for a complete explanation.
