Using multiple alternatives of hashCode() and equals() for sets

Using multiple alternatives of hashCode() and equals() for sets - java

Suppose I have a simple POJO class Class1 , and it has 2 fields of type int.
I've implemented the hashCode() and equals() methods of it to handle exactly those 2 fields, in order to put instances of the class into a set.
So far so good.
Now, I want to have a different set, which considers instances of Class1 to be equal if the first field is equal , making the equality condition weaker. I might even want to have another set which considers only the second field as the one that checks for equality.
Is it possible? If so, how?

You can get that effect by using a TreeSet when providing a custom Comparator that only inspects the fields you're interested in.
Note, however, that strictly speaking such a TreeSet no longer is a "correct" Set because it effectively ignores the equal() method of your objects:
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.

The standard Java libraries don't support this.
And (surprisingly) there doesn't appear to be a Map or Set class in the Apache Commons Collections or Guava libraries that supports this.
There are probably other libraries that to support this if you look hard enough.
Alternatively, you could write your own ... starting with the standard HashMap code.
A cheap-and-cheerful alternative is to create a light-weight wrapper class for your element type that delegates most methods to the wrapped class and provides a different equals / hashcode pair to the original. There is a small runtime penalty in doing this ... but it is worth considering.
Joachim's suggestion is good too, unless your sets are likely to be particularly big. (TreeSet has O(logN) lookup compared with O(1) for a properly implemented hash table.)

Related

Why does Intelij Idea let us make incorrect pair equals()-hashcode() by generator?

There is a generator in IntelliJ IDEA. You press Alt+Ins, choose 'equal and hashCode' and a constructors opens. You can choose fields for equals and then you can choose fields for hashCode(). Why can we choose different field sets? Isn't it contradicted to equals-hashCode contracts?

As per Java Doc of Object Class -
Note that it is generally necessary to override the {#code hashCode}
method whenever this method is overridden, so as to maintain the
general contract for the {#code hashCode} method, which states
that equal objects must have equal hash codes.
By default equals method returns true for inputs referring to same object instance. An overridden equals might return true for completely different objects even (even the object with different field values) which totally depends on your implementation.
The contract enforces that if your equals logic is determining two
different object as same, your hashcode method should return the same
value for those two objects.
This does not mean that you should be using the same fields for hashcode also. This is all what you should be taking care of while overriding these functions.

Well, it doesn't really allow you to choose different field sets, it allows you to pick a subset of the fields for equals, to use for hashCode.
While this will probably lead to a poorer hash code since this will cause more hash collisions, it will technically still be correct. Note that the requirement is just that equal objects have equal hash codes, not that equal hash codes must be from equal objects. The latter would be impossible to achieve for classes that can have more different instances than there are ints (e.g. java.lang.Long).
There may be good motivations to use a suboptimal hash for collisions if calculating the best hash for collisions would be too expensive, compared to just dealing with the collision.

Annotation.equals() vs. Object.equals()

Some frameworks (e.g. guice) require in certain situations to create an implementing class of an annotation interface.
There seems to be a difference between the Annotation.equals(Object) and Object.equals(Object) definitions which need to be respected in that case (same applies for hashCode()).
Questions:
Why was it designed that way and what is the reason of the difference?
What side-effects can occur when using the Object.equals(Object) definition for annotation classes instead?
Update:
Additional questions:
What about the Annotation.hashCode() definition? Is it really required to implement it that way, especially the "(...)127 times the hash code of the member-name as computed by String.hashCode()) XOR the hash code(...)"-part?
What happens if a hashCode() method is implemented to be consistent to equals() but doesn't match the exact definition of Annotation.hashCode() (e.g. using 128 times the hash code of the member-name)?

The definitions are not different. The definition in Annotation is simply specialized for the annotation type.
The definition in Object basically says "If you decide to implement equals for your class, it should represent an equivalence relation that follows these rules".
In Annotation it defines an equivalence that follows those rules, which is meaningful specifically for Annotation instances.
In fact, the Annotation equivalence would work for many other classes. The point is that different classes have different meanings, and therefore their instances may have different equivalence relationships, and it's up to the programmer to decide which equivalence relation to use for his/her class. In Annotation, the contract is for this particular equivalence relation.
As for side effects - suppose an Annotation type inherited Object's equals. This is a mistake many people do when they try to use their own classes in maps or other equals()-dependent situations. Object has an equals() function that is the same as object identity: two references are equal only if they are references to the same object.
If you used that, then no two instances would be considered the same. You would not be able to create a second Annotation instance that would be equivalent to a previous one, despite them having the same values in their fields and semantically representing the same sort of behavior. So you wouldn't be able to tell if two items are annotated with the same annotation, when they have different instances of the same annotation.
As for the hashCode question, although Jeff Bowman has already answered that, I'll address that to make my answer more complete:
Basically, implementation of annotations is left to compilers, and the JLS doesn't dictate the exact implementation. It is also possible to create implementing classes, as your question itself mentions.
This means that annotation classes can come from different sources - different compilers (you are supposed to be able to run .class files anywhere, no matter which java compiler created them) and developer-created implementations.
The equals() and hashCode() methods are usually considered in a single class context, not in an interface context. This is because interfaces are usually antithetic to implementation - they only define contracts. When you create these methods for a particular class, you know that the object you compare with is supposed to be of the same class, and thus have the same implementation. Once it has a hashCode method that returns the same value for objects that are equivalent under equals for the same class, then whatever that implementation is, it satisfies the contract.
However, in this particular case, you have an interface, and you are required to make equals() and hashcode() to work not only for two instances of the same class, but for instances of different classes that implement of the same interface. This means that if you don't agree on a single implementation across all possible classes, you might get two instances of the same annotation with the same element values, and different hash codes. This would break the hashcode() contract.
As an example, imagine an annotation #SomeAnnotation that doesn't take parameters. Imagine that you implement it with a class SomeAnnotationImpl that returns 15 as the hash code. Two equal instances of SomeAnnotationImpl will have the same hash code, which is good. But the Java compiler would return 0 as the hash code when you check the returned instance of its own implementation of #SomeAnnotation. Therefore two objects of type Annotation are equal (they implement the same annotation interface and if they follow the equals() definition above, they should return true for equals), but have different hash codes. That breaks the contract.

RealSkeptic's answer is great, but I'll put it a slightly different way.
This is a specific instance of a general problem:
You defined an interface (specifically an annotation).
Someone (javac) wrote a particular (built-in) implementation of that interface. You can't access that implementation, but need to be able to create equal instances, particularly for use in Sets and Maps. (Guice is one big Map<Key, Provider> after all.)
The implementor (javac) wrote a custom implementation of equals so that annotation instances with the same parameters pass equals. You need to match that implementation so that equals is symmetric (a.equals(b) if and only if b.equals(a), which is assumed in Java along with reflexivity, consistency, and transitivity).
Equal objects must have equal hashCodes because Java uses it as a shortcut for equality: if objects have unequal hashCodes then they cannot be equal. This comes in handy to make the efficient Map implementation HashMap, because you can use the hashCode to only check objects in the right hashCode-determined bucket. If you used a different or modified hashCode algorithm, you'd be breaking spec in theory, and in practice your annotation implementation wouldn't match others consistently in HashSet or HashMap (rendering it worthless to Guice). Many other features use hashCode, but those are the most obvious examples.
It would be much easier if Java let you instantiate their implementation, or generate an implementation automatically for your class, but here the best they've done is an exact spec for you to match.
So yes, you'll run into this with annotations more often than anything else, but these matter any time you're trying to act equal with an implementation you can't control or use yourself.

The above answers are excellent general answers to the question, but since I haven't seen them mentioned I'll just add that the use of AnnotationLiteral for implementing Annotations takes care of the equals and hashCode issues properly. There are a couple to choose from:
AnnotationLiteral
AnnotationLiteral

Why doesn't ArrayDeque override equals() and hashCode()?

EDITED: Now only ArrayDeque is considered. (I originally thought LinkedList also doesn't override the two methods.)
Collection type ArrayDeque simply uses the hashCode and equals method implementations that it inherits from Object.
Why doesn't it instead override these methods with proper implementations (i.e. hash and equality test based on contained elements)?

LinkedList extends AbstractSequentialList which extends AbstractList which does override equals and hashCode - so the implementation is not inherited from Object.
ArrayDeque, on the other hand, really doesn't inherit anything other implementation as far as I can see. Its direct superclass (AbstractCollection) doesn't override them. This feels like an exception rather than the rule - I believe most collection implementations in Java "do the right thing".
I don't know of the justification for ArrayDeque choosing not to implement equality, but if you want to compare two deques you could easily just convert them into lists or arrays and do it that way.

They are overrided in AbstractList, that is present in LinkedList inheritance

It generally does not make sense for object instances which are going to be mutated to report themselves as equal to anything other than themselves. The primary reason that instances of some mutable collection types report themselves as equal to other collection instances that it is common for code to hold references to instances which, even though they "could" be mutated, won't be. Although code could hold references to two ArrayDequeue for the purpose of encapsulating all of the items that have ever been or are ever going to be put in them, and it might make sense to compare the contents of two ArrayDequeue instances which are held for that purpose, the whole purpose of the type is to facilitate the pushing and popping of items; in cases where it would make sense for equals to check for identical content, it would likely also make sense to extract the contents into a type whose purpose is to encapsulate a list.

According to official Javadoc - you're not correct. LinkedList use equals from AbstractList, that perform deep equals
For more information - look at this - http://docs.oracle.com/javase/6/docs/api/java/util/AbstractList.html#equals(java.lang.Object)

With Guava you can use the Iterables.elementsEqual method.

use of equals() method in comparator interface?

equals() method is available to all java collection classes from the Object class. This method is also declared in Comparator interface, so what is the purpose of declaring this method in Comparator? in which case is it used and how?

what is the purpose of declaring this method in Comparator?
I think it's the designer's way of highlighting the fact that Comparator.equals() imposes some additional requirements on any classes that implement the interface:
Additionally, this method can return true only if the specified object is also a comparator and it imposes the same ordering as this comparator. Thus, comp1.equals(comp2) implies that sgn(comp1.compare(o1, o2))==sgn(comp2.compare(o1, o2)) for every object reference o1 and o2.
The method can be used to establish whether or not two distinct comparators impose the same order.

I think that the main reason is to make it clear that equals method is for testing the Comparator itself. This is obvious when you think about it, but can I imagine that some people might expect equals(Object) to (somehow) be semantically related to the compare(T, T) method.
It also allows the documentation of some common-sense guidelines for when two comparators could be viewed as equal.
Either way, the presence of the equals(Object) method in the interface is solely for documentation purposes.

From the javadoc
Note that it is always safe not to override Object.equals(Object).
However, overriding this method may, in some cases, improve
performance by allowing programs to determine that two distinct
comparators impose the same order.
The idea is simply to be able to allow you to not sort a collection that has already been sorted by another comparator if you realize that the end result will be the same.
Generally it had little use, but when sorting very large collections it is something you might want to look into.

-when the declaring Comparator is compared to another Object (argument)

It's just an over-ridden form of the Object's equals method to let you know if two objects are of same comparator type.

As per your question I think It is used to compare objects after converting in string.
Object class eqlas methods chek both Object are eqls or not And Competres method chek object data like Hello.eqlas("hello")

Overriding the equals method vs creating a new method

I have always thought that the .equals() method in java should be overridden to be made specific to the class you have created. In other words to look for equivalence of two different instances rather than two references to the same instance. However I have encountered other programmers who seem to think that the default object behavior should be left alone and a new method created for testing equivalence of two objects of the same class.
What are the argument for and against overriding the equals method?

Overriding the equals method is necessary if you want to test equivalence in standard library classes (for example, ensuring a java.util.Set contains unique elements or using objects as keys in java.util.Map objects).
Note, if you override equals, ensure you honour the API contract as described in the documentation. For example, ensure you also override Object.hashCode:
If two objects are equal according to
the equals(Object) method, then
calling the hashCode method on each of
the two objects must produce the same
integer result.
EDIT: I didn't post this as a complete answer on the subject, so I'll echo Fredrik Kalseth's statement that overriding equals works best for immutable objects. To quote the API for Map:
Note: great care must be exercised if
mutable objects are used as map keys.
The behavior of a map is not specified
if the value of an object is changed
in a manner that affects equals
comparisons while the object is a key
in the map.

I would highly recommend picking up a copy of Effective Java and reading through item 7 obeying the equals contract. You need to be careful if you are overriding equals for mutable objects, as many of the collections such as Maps and Sets use equals to determine equivalence, and mutating an object contained in a collection could lead to unexpected results. Brian Goetz also has a pretty good overview of implementing equals and hashCode.

You should "never" override equals & getHashCode for mutable objects - this goes for .net and Java both. If you do, and use such an object as the key in f.ex a dictionary and then change that object, you'll be in trouble because the dictionary relies on the hashcode to find the object.
Here's a good article on the topic: http://weblogs.asp.net/bleroy/archive/2004/12/15/316601.aspx

#David Schlosnagle mentions mentions Josh Bloch's Effective Java -- this is a must-read for any Java developer.
There is a related issue: for immutable value objects, you should also consider overriding compare_to. The standard wording for if they differ is in the Comparable API:
It is generally the case, but not strictly required that (compare(x, y)==0) == (x.equals(y)). Generally speaking, any comparator that violates this condition should clearly indicate this fact. The recommended language is "Note: this comparator imposes orderings that are inconsistent with equals."

The Equals method is intended to compare references. So it should not be overriden to change its behaviour.
You should create a new method to test for equivalence in different instances if you need to (or use the CompareTo method in some .NET classes)

To be honest, in Java there is not really an argument against overriding equals. If you need to compare instances for equality, then that is what you do.
As mentioned above, you need to be aware of the contract with hashCode, and similarly, watch out for the gotchas around the Comparable interface - in almost all situations you want the natural ordering as defined by Comparable to be consistent with equals (see the BigDecimal api doc for the canonical counter example)
Creating a new method for deciding equality, quite apart from not working with the existing library classes, flies in the face of Java convention somewhat.

You should only need to override the equals() method if you want specific behaviour when adding objects to sorted data structures (SortedSet etc.)
When you do that you should also override hashCode().
See here for a complete explanation.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.