Why Comparable natural ordering needs to be consistent with equals method? - java

Back after a vacation :) with the questions. I am reading the Comparable interface documentation. I do understand that we use Comparable because it provides sorting and a natural ordering. The documentation says:
It is strongly recommended (though not required) that natural
orderings be consistent with equals. This is so because sorted sets
(and sorted maps) without explicit comparators behave "strangely" when
they are used with elements (or keys) whose natural ordering is
inconsistent with equals. In particular, such a sorted set (or sorted
map) violates the general contract for set (or map), which is defined
in terms of the equals method.
How is Comparable related to equals? Comparable has a compareTo method, so why does it need to be consistent with the equals method? I am unable to understand this concept.
Also, quoting from the same source, can someone elaborate on this point as well?
For example, if one adds two keys a and b such that (!a.equals(b) &&
a.compareTo(b) == 0) to a sorted set that does not use an explicit
comparator, the second add operation returns false (and the size of
the sorted set does not increase) because a and b are equivalent from
the sorted set's perspective.
Thanks.

The semantics of compareTo returning 0 is that the two objects are, well, equal. Having another definition of the same relation in another method can obviously cause many kinds of trouble, as documented in your quote: the typical algorithms in SortedSet implementations rely on compareTo, but the general contract of the Set interface specifies that a set must not contain two objects that are equal according to equals. When compareTo and equals disagree, you end up in exactly that situation.
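A minimal sketch of the situation from the quoted documentation, assuming Java 16 or later for records. `Person` is a hypothetical type: the record's generated equals compares both name and age, but the natural ordering compares age only, so the second add is rejected:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical type: equals/hashCode (generated by the record) use name AND age,
// but the natural ordering compares age only, so it is inconsistent with equals.
record Person(String name, int age) implements Comparable<Person> {
    @Override
    public int compareTo(Person other) {
        return Integer.compare(age, other.age);
    }
}

class InconsistentOrderingDemo {
    // Adds two people that are not equal but compare as 0, and reports the set size.
    static int sizeAfterAddingTwo() {
        Set<Person> people = new TreeSet<>();
        people.add(new Person("Ann", 30));
        boolean added = people.add(new Person("Bob", 30)); // compareTo == 0, so rejected
        System.out.println("second add returned " + added);
        return people.size();
    }

    public static void main(String[] args) {
        Person ann = new Person("Ann", 30);
        Person bob = new Person("Bob", 30);
        System.out.println(ann.equals(bob));         // false
        System.out.println(ann.compareTo(bob) == 0); // true
        System.out.println(sizeAfterAddingTwo());    // prints "second add returned false", then 1
    }
}
```

A HashSet given the same two objects would keep both, because it only consults equals and hashCode.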

Short answer: whenever a.compareTo(b) returns 0 for two objects a and b, a.equals(b) and b.equals(a) should be true, and vice versa.
Long answer: as stated in the Comparable documentation, implementations of Comparable impose a total ordering. One of the properties of a total ordering is antisymmetry: if we define a total ordering ≤, then:
If a ≤ b and b ≤ a then a = b
This is represented by compareTo() returning 0.
The methods and classes mentioned in the text you quoted exploit this property for correct and efficient behavior.
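One way to keep the two in agreement is to make compareTo compare exactly the fields that equals compares. Here is a sketch with a hypothetical `Point` record (Java 16+), where the generated equals uses x and y and the ordering uses x, then y:

```java
import java.util.Comparator;

// Hypothetical value type: the ordering compares exactly the fields that the
// record's generated equals() compares, so compareTo() == 0 iff equals() is true.
record Point(int x, int y) implements Comparable<Point> {
    private static final Comparator<Point> ORDER =
            Comparator.comparingInt(Point::x).thenComparingInt(Point::y);

    @Override
    public int compareTo(Point other) {
        return ORDER.compare(this, other);
    }
}

class ConsistentOrderingDemo {
    // True when the two notions of "sameness" agree for this pair.
    static boolean agrees(Point a, Point b) {
        return (a.compareTo(b) == 0) == a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(agrees(new Point(1, 2), new Point(1, 2))); // true (both say equal)
        System.out.println(agrees(new Point(1, 2), new Point(1, 3))); // true (both say different)
    }
}
```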

You can see it in TreeSet's documentation:
Note that the ordering maintained by a set (whether or not an
explicit comparator is provided) must be consistent with equals
if it is to correctly implement the Set interface. (See
Comparable or Comparator for a precise definition of
consistent with equals.) This is so because the Set
interface is defined in terms of the equals operation, but a
TreeSet instance performs all element comparisons using its
compareTo (or compare) method, so two elements that
are deemed equal by this method are, from the standpoint of the set,
equal. The behavior of a set is well-defined even if its
ordering is inconsistent with equals; it just fails to obey the
general contract of the Set interface.


Is it possible that TreeSet equals HashSet but not HashSet equals TreeSet

I had an interview today, and the interviewer puzzled me by asking whether it is possible that a TreeSet equals a HashSet but the HashSet does not equal the TreeSet. I said "no", but according to him the answer is "yes".
How is it even possible?
Your interviewer is right: in some specific cases equals does not behave as an equivalence relation. It is possible for a TreeSet to be equal to a HashSet but not vice versa. Here is an example:
TreeSet<String> treeSet = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
HashSet<String> hashSet = new HashSet<>();
treeSet.addAll(List.of("A", "b"));
hashSet.addAll(List.of("A", "B"));
System.out.println(hashSet.equals(treeSet)); // false
System.out.println(treeSet.equals(hashSet)); // true
The reason is that a TreeSet uses its comparator to determine whether an element is a duplicate, while a HashSet uses equals (and hashCode).
Quoting the TreeSet documentation:
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface.
It’s not possible without violating the contract of either equals or Set. The definition of equals in Java requires symmetry, i.e. a.equals(b) must be the same as b.equals(a).
In fact, the very documentation of Set says
Returns true if the specified object is also a set, the two sets have the same size, and every member of the specified set is contained in this set (or equivalently, every member of this set is contained in the specified set). This definition ensures that the equals method works properly across different implementations of the set interface.
NO, this is impossible without violating the general contract of the equals method of the Object class, which requires symmetry, i.e. x.equals(y) if and only if y.equals(x).
BUT, the TreeSet and HashSet classes implement the equals contract of the Set interface differently. This contract requires, among other things, that every member of the specified set be contained in this set. To determine whether an element is in the set, the contains method is called, which for TreeSet uses the Comparator (or the natural ordering) and for HashSet uses hashCode and equals.
And finally:
YES, this is possible in some cases.
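A small sketch of that mechanism, using the strings from the earlier CASE_INSENSITIVE_ORDER example. `setEqualsView` is a hypothetical helper that approximates what AbstractSet.equals does for a Set argument (compare sizes, then call containsAll on the receiver), so the receiver's own contains decides the outcome:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ContainsAllAsymmetryDemo {
    // Approximation of AbstractSet.equals once the argument is known to be a Set:
    // equal sizes, then every element of `other` must be contained in `self`.
    static boolean setEqualsView(Set<String> self, Set<String> other) {
        return self.size() == other.size() && self.containsAll(other);
    }

    static Set<String> treeSet() {
        Set<String> s = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        s.addAll(List.of("A", "b"));
        return s;
    }

    static Set<String> hashSet() {
        return new HashSet<>(List.of("A", "B"));
    }

    public static void main(String[] args) {
        // TreeSet.contains("B") consults the comparator: "B" compares as 0 with "b".
        System.out.println(setEqualsView(treeSet(), hashSet())); // true
        // HashSet.contains("b") consults hashCode/equals: "b" is not "B".
        System.out.println(setEqualsView(hashSet(), treeSet())); // false
    }
}
```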
This is a quote from the book Java Generics and Collections:
In principle, all that a client should need to know is how to keep to
its side of the contract; if it fails to do that, all bets are off and
there should be no need to say exactly what the supplier will do.
So the answer is: yes, it can happen, but only when you don't keep to your side of the contract with Java. You could say Java has violated the symmetric property of equality, but if that happens, be sure that you are the one who broke the contract of some other interface first. Java has already documented this behaviour.
Generally you should read documentation of Comparator and Comparable interfaces to use them correctly in sorted collections.
This question is addressed in Effective Java, Third Edition, Item 14, on pages 66-68.
This is a quote from the book where it defines the contract for implementing the Comparable interface (note that this is only part of the whole contract):
• It is strongly recommended, but not required, that
(x.compareTo(y) == 0) == (x.equals(y)). Generally speaking, any class that
implements the Comparable interface and violates this condition should clearly
indicate this fact. The recommended language is “Note: This class has
a natural ordering that is inconsistent with equals.”
It says "strongly recommended, but not required", which means you are allowed to have classes for which x.compareTo(y) == 0 does not imply x.equals(y) == true. (But if a class is implemented that way, you should be careful about using its instances as elements in sorted collections; this is exactly the case with BigDecimal.)
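A sketch of how a class could follow the book's advice. `Tag` is a hypothetical record (Java 16+) whose natural ordering ignores case while the generated equals does not, so its Javadoc carries the recommended note:

```java
/**
 * A hypothetical tag name.
 *
 * <p>Note: This class has a natural ordering that is inconsistent with equals.
 * The ordering ignores case, but equals (generated by the record) does not.
 */
record Tag(String value) implements Comparable<Tag> {
    @Override
    public int compareTo(Tag other) {
        return String.CASE_INSENSITIVE_ORDER.compare(value, other.value);
    }
}

class DocumentedInconsistencyDemo {
    public static void main(String[] args) {
        Tag lower = new Tag("java");
        Tag upper = new Tag("JAVA");
        System.out.println(lower.compareTo(upper)); // 0
        System.out.println(lower.equals(upper));    // false
    }
}
```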
The book's paragraph describing this part of the Comparable contract is worth quoting:
It is a strong suggestion rather than a true requirement, simply
states that the equality test imposed by the compareTo method should
generally return the same results as the equals method. If this
provision is obeyed, the ordering imposed by the compareTo method is
said to be consistent with equals. If it’s violated, the ordering is
said to be inconsistent with equals. A class whose compareTo method
imposes an order that is inconsistent with equals will still work, but
sorted collections containing elements of the class may not obey the
general contract of the appropriate collection interfaces
(Collection, Set, or Map). This is because the general contracts for
these interfaces are defined in terms of the equals method, but sorted
collections use the equality test imposed by compareTo in place of
equals. It is not a catastrophe if this happens, but it’s something to
be aware of.
Some classes in the JDK itself do not follow this recommendation. BigDecimal is one of them, and the book mentions it:
For example, consider the BigDecimal class, whose compareTo method is
inconsistent with equals. If you create an empty HashSet instance and
then add new BigDecimal("1.0") and new BigDecimal("1.00"), the set
will contain two elements because the two BigDecimal instances added
to the set are unequal when compared using the equals method. If,
however, you perform the same procedure using a TreeSet instead of a
HashSet, the set will contain only one element because the two
BigDecimal instances are equal when compared using the compareTo
method. (See the BigDecimal documentation for details.)
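The book's BigDecimal example can be reproduced directly; `sizeAfterAdding` is just an illustrative helper name:

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class BigDecimalSetDemo {
    // Adds 1.0 and 1.00 to the given set and returns the resulting size.
    static int sizeAfterAdding(Set<BigDecimal> set) {
        set.add(new BigDecimal("1.0"));   // scale 1
        set.add(new BigDecimal("1.00"));  // scale 2
        return set.size();
    }

    public static void main(String[] args) {
        // equals() compares value AND scale, so the HashSet keeps both.
        System.out.println(sizeAfterAdding(new HashSet<>())); // 2
        // compareTo() ignores scale, so the TreeSet treats them as duplicates.
        System.out.println(sizeAfterAdding(new TreeSet<>())); // 1
    }
}
```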
However, this behaviour is documented in the BigDecimal documentation. Here is the relevant part:
Note: care should be exercised if BigDecimal objects are used as keys
in a SortedMap or elements in a SortedSet since BigDecimal's natural
ordering is inconsistent with equals. See Comparable, SortedMap or
SortedSet for more information.
So although you can write code like the following, you should not, because the BigDecimal documentation warns against this usage:
Set<BigDecimal> treeSet = new TreeSet<>();
Set<BigDecimal> hashSet = new HashSet<>();
treeSet.add(new BigDecimal("1.00"));
treeSet.add(new BigDecimal("2.0"));
hashSet.add(new BigDecimal("1.00"));
hashSet.add(new BigDecimal("2.00"));
System.out.println(hashSet.equals(treeSet)); // false
System.out.println(treeSet.equals(hashSet)); // true
Note that Comparable is used as the natural ordering of the elements when you don't pass any comparator to TreeSet or TreeMap; the same thing can happen when you pass a Comparator to those classes' constructors. This is mentioned in the Comparator documentation:
The ordering imposed by a comparator c on a set of elements S is said
to be consistent with equals if and only if c.compare(e1, e2)==0 has
the same boolean value as e1.equals(e2) for every e1 and e2 in S.
Caution should be exercised when using a comparator capable of
imposing an ordering inconsistent with equals to order a sorted set
(or sorted map). Suppose a sorted set (or sorted map) with an explicit
comparator c is used with elements (or keys) drawn from a set S. If
the ordering imposed by c on S is inconsistent with equals, the sorted
set (or sorted map) will behave "strangely." In particular the sorted
set (or sorted map) will violate the general contract for set (or
map), which is defined in terms of equals.
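The definition above can be checked mechanically on a finite sample of elements. `isConsistentWithEquals` is a hypothetical helper, not a JDK method; note that a sample can only reveal an inconsistency, it cannot prove consistency for every possible element:

```java
import java.math.BigDecimal;
import java.util.Comparator;
import java.util.List;

class ConsistencyCheck {
    // Checks the documented definition pair by pair: the ordering is consistent
    // with equals on `sample` iff c.compare(e1, e2) == 0 has the same boolean
    // value as e1.equals(e2) for every e1 and e2 in the sample.
    static <T> boolean isConsistentWithEquals(Comparator<? super T> c, List<T> sample) {
        for (T e1 : sample) {
            for (T e2 : sample) {
                if ((c.compare(e1, e2) == 0) != e1.equals(e2)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> words = List.of("A", "a", "b");
        System.out.println(isConsistentWithEquals(Comparator.<String>naturalOrder(), words)); // true
        System.out.println(isConsistentWithEquals(String.CASE_INSENSITIVE_ORDER, words));    // false

        List<BigDecimal> numbers = List.of(new BigDecimal("1.0"), new BigDecimal("1.00"));
        System.out.println(isConsistentWithEquals(Comparator.<BigDecimal>naturalOrder(), numbers)); // false
    }
}
```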
So, given this Comparator documentation, the following example by @Aniket Sahrawat is not supposed to work:
TreeSet<String> treeSet = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
HashSet<String> hashSet = new HashSet<>();
treeSet.addAll(List.of("A", "b"));
hashSet.addAll(List.of("A", "B"));
System.out.println(hashSet.equals(treeSet)); // false
System.out.println(treeSet.equals(hashSet)); // true
In a nutshell, the answer is: yes, it can happen, but only when you break the documented contract of one of the aforementioned interfaces (SortedSet, Comparable, Comparator).
There already are good answers, but I would like to approach this from a bit more general perspective.
In mathematics and logic, and correspondingly in computer science, "is equal to" is a symmetric binary relation, which means that if A is equal to B, then B is equal to A.
So, if TreeSet X equals HashSet Y, then HashSet Y must equal TreeSet X, and that must always be true.
If, however, the symmetric property of equality is violated (i.e. equality is not implemented correctly), then x.equals(y) might not imply y.equals(x).
The documentation of the Object#equals method in Java explicitly states that:
The equals method implements an equivalence relation on non-null object references.
Hence equals must be symmetric; an implementation that is not symmetric violates equality in general, and the Object#equals contract specifically in Java.

What do the Guava JavaDocs mean by sets being based on different "equivalence relations"?

The Guava JavaDocs for Sets.SetView.union() (as well as intersection(), difference(), and symmetricDifference()) mention "equivalence relations":
Results are undefined if set1 and set2 are sets based on different equivalence relations (as HashSet, TreeSet, and the Map.keySet() of an IdentityHashMap all are).
I struggle to understand the meaning of that sentence.
The glossary defines "equivalence relation" as reflexive ("a.relation(a) is always true"), symmetric (a1.relation(a2) == a2.relation(a1)) and transitive (a1.relation(a2) && a2.relation(a3) implies a1.relation(a3)), and refers to the Object.equals() docs. (Unfortunately, the Guava wiki doesn't go into any detail.)
But how are the different types of Set different in that respect (i.e. equivalence relations)? They all seem to inherit equals() from AbstractSet? It doesn't have to do with the type of object a set holds (e.g. Set<Cow> vs. Set<Chicken>), does it?
It sounds like they are referring to when a Set doesn't use equals and hashCode to compare elements for some reason. The most common example of this is a TreeSet with a custom Comparator. For example, we could have something like this:
Set<String> a = new TreeSet<>();
Set<String> b = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
The union, intersection, etc. of a and b are undefined, because a and b have different equivalence relations defined between elements.
Java SE also makes mention of this kind of situation when it's talking about ordering which is inconsistent with equals (see TreeSet):
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
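A plain-JDK sketch (not Guava's implementation) of why mixing equivalence relations makes set operations ill-defined. `retainContained` is a hypothetical helper that keeps the elements of one set that the other set claims to contain; with one naturally ordered TreeSet and one case-insensitive TreeSet, the answer depends on which set's contains is asked:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class MixedEquivalenceDemo {
    // Keeps the elements of `source` that `other` claims to contain.
    // Which set's contains() is consulted depends on the argument order.
    static <T> Set<T> retainContained(Set<T> source, Set<T> other) {
        Set<T> result = new LinkedHashSet<>();
        for (T element : source) {
            if (other.contains(element)) {
                result.add(element);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> a = new TreeSet<>(List.of("A"));                  // natural ordering
        Set<String> b = new TreeSet<>(String.CASE_INSENSITIVE_ORDER); // case-insensitive
        b.add("a");

        System.out.println(retainContained(a, b)); // [A] because b.contains("A") is true
        System.out.println(retainContained(b, a)); // [] because a.contains("a") is false
    }
}
```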

Java's Comparator Contract

It says in the contract for the Comparator interface, that it must be consistent with equals.
Does this mean that Comparator = zero if equalsTo = true , or does it mean that Comparator = zero if and only if equalsTo = true?
I seem to remember that it is the second one, but I have come across lots of comparators which sort by non-unique sub properties.
For example, I might have objects which have a sub-property date, and I want to sort my list of objects by the date of submission. However, you can have several objects with the same date? What are the consequences of this? Surely there is a best practice solution to this problem already? How can I sort a collection by a property which is not guaranteed to be unique without violating the comparator contract? What are the consequences for this type of violation? Are they manageable?
It's not at all true that Comparator must be consistent with equals.
The docs merely warn for this situation:
Caution should be exercised when using a comparator capable of imposing an ordering inconsistent with equals to order a sorted set (or sorted map) (http://docs.oracle.com/javase/7/docs/api/java/util/Comparator.html)
If you have one ordering based on date, and another ordering based on date+time, you should simply implement multiple comparators.
Perhaps you are confusing Comparator with Comparable? For Comparable the docs strongly advice against this situation:
It is strongly recommended (though not required) that natural orderings be consistent with equals. (http://docs.oracle.com/javase/7/docs/api/java/lang/Comparable.html)
This difference makes sense if you realize that a class can have only one implementation of Comparable, but any number of Comparators. The whole idea of Comparator is to have multiple ways of comparing the same class.
Edit: you could have multiple Comparators, and as popovitsj stated, they don't necessarily have to be consistent with equals
(although I presume most of the time you have Comparator.compare(obj1, obj2) == 0 <=> obj1.equals(obj2) == true)
If you want specific sort results when sorting by a non-unique field, you need to customize your Comparator to account for ties.
For example, if while implementing compare() you find that obj1.date and obj2.date are equal, you should compare other important fields (name, age, etc.) to rank obj1 against obj2 and return the corresponding value.
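A sketch of that tie-breaking approach for the question's scenario, using a hypothetical `Submission` record (Java 16+). Comparator.comparing and thenComparing build the primary and secondary keys:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical type for the question's scenario: submissions sorted by date.
record Submission(String name, LocalDate date) {}

class TieBreakDemo {
    // Primary key: date. Secondary key: name, used only when the dates are equal.
    static final Comparator<Submission> BY_DATE_THEN_NAME =
            Comparator.comparing(Submission::date).thenComparing(Submission::name);

    static List<String> sortedNames(List<Submission> input) {
        List<Submission> copy = new ArrayList<>(input);
        copy.sort(BY_DATE_THEN_NAME);
        List<String> names = new ArrayList<>();
        for (Submission s : copy) {
            names.add(s.name());
        }
        return names;
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2020, 1, 15);
        List<Submission> input = List.of(
                new Submission("carol", day),
                new Submission("alice", day),
                new Submission("bob", day.minusDays(1)));
        System.out.println(sortedNames(input)); // [bob, alice, carol]
    }
}
```

Because the name is consulted only when the dates are equal, the list is still ordered by date first; the secondary key just makes the order among same-day submissions predictable.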
Hope that helps.
As you suspect, in order for compareTo() to be consistent with equals(), compareTo() must always return 0 when equals() returns true. Similarly, if equals returns false, then compareTo must not return 0.
However, as popovitsj has pointed out in his answer, consistency with equals() is not a requirement. As such, the above only applies when you are attempting to make the two methods consistent.
It says in the contract for the Comparator interface, that it must be consistent with equals.
That's not entirely correct; see @popovitsj's answer.
Does this mean that Comparator = zero if equalsTo = true , or does it mean that Comparator = zero if and only if equalsTo = true?
It means the latter. However, it is not actually a hard requirement for Comparator objects.
I seem to remember that it is the second one, but I have come across lots of comparators which sort by non-unique sub properties.
Well that's reasonable, given that it is not actually a hard requirement. In fact, a Comparator that is inconsistent with equals(Object) is just fine if you are going to use it with Arrays.sort(...). The problems only arise with TreeSet and TreeMap.
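A sketch of the difference that paragraph describes, using a comparator that compares strings only by length, which is inconsistent with equals (the helper names are illustrative):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class SortVersusTreeSetDemo {
    // Inconsistent with equals: different strings of equal length compare as 0.
    static final Comparator<String> BY_LENGTH = Comparator.comparingInt(String::length);

    static List<String> sortedList(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(BY_LENGTH); // stable sort: ties keep their original relative order
        return copy;
    }

    static int treeSetSize(List<String> input) {
        TreeSet<String> set = new TreeSet<>(BY_LENGTH);
        set.addAll(input); // "dog" is dropped: it compares as 0 with "cat"
        return set.size();
    }

    public static void main(String[] args) {
        List<String> words = List.of("cat", "horse", "dog");
        System.out.println(sortedList(words));  // [cat, dog, horse]
        System.out.println(treeSetSize(words)); // 2
    }
}
```

Sorting only needs a consistent order, so ties are harmless there; the TreeSet, by contrast, uses the comparator as its notion of identity and silently loses an element.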
For example, suppose that you have a Comparator<E> C that says e1 and e2 are not equal, but e1.equals(e2) returns true. Now suppose that you create a TreeSet<E> instance using the comparator, and then add e1 and e2 to that set. The set's tree is organized based on the comparator, and therefore e1 and e2 will slot into different places in the search tree, and will both be elements of the set. But that violates the primary invariant of a Set ... which is based on the equals method.
As the javadoc for TreeSet says:
"Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface."
And this answers the last part of your question.
What are the consequences for this type of violation?
If you use an inconsistent Comparator in a TreeSet or TreeMap, the collection will not obey the Set or Map contract.
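A sketch of the Map side of that consequence, using the case-insensitive String comparator (inconsistent with equals) with a TreeMap; `putBoth` is an illustrative helper:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class TreeMapKeyCollapseDemo {
    // Puts two keys that differ only in case and returns the map.
    static Map<String, Integer> putBoth(Map<String, Integer> map) {
        map.put("a", 1);
        map.put("A", 2);
        return map;
    }

    public static void main(String[] args) {
        // equals-based: two distinct keys.
        System.out.println(putBoth(new HashMap<>()).size()); // 2
        // comparator-based: "A" compares as 0 with "a", so the existing entry's
        // value is replaced and the originally inserted key "a" is kept.
        System.out.println(putBoth(new TreeMap<>(String.CASE_INSENSITIVE_ORDER))); // {a=2}
    }
}
```

The TreeMap treats the second put as an update of an existing key, which a client reasoning from the Map contract (keys compared with equals) would not expect.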
a and b may not be equal, but if the comparator returns zero when comparing a and b, it should also return zero when comparing b and a.
This answer says:
Typically, if 2 objects are equal from an equals perspective but not from a compareTo perspective, you can store both objects as keys in a TreeMap. This can lead to un-intuitive behaviour. It can also be done on purpose in specific situations.
But that is for specific situations; in general, nothing stops you from having equals and compareTo behave inconsistently.
One example: this morning someone asked "Move specific items to the end of a list".
Most of the answers have comparators that returns 0 for elements that are not equal.
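A sketch in the spirit of those answers (not taken from any of them): a comparator that only distinguishes a target value from everything else, and therefore returns 0 for many unequal elements. With a stable sort that is harmless; `moveToEnd` is a hypothetical helper:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MoveToEndDemo {
    // Moves every occurrence of `target` to the end of a copy of the list.
    // The comparator only distinguishes "is target" from "is not target", so it
    // returns 0 for many pairs of unequal elements; a stable sort keeps their order.
    static List<String> moveToEnd(List<String> input, String target) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.comparing((String s) -> s.equals(target))); // false sorts before true
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(moveToEnd(List.of("x", "c", "x", "a", "b"), "x")); // [c, a, b, x, x]
    }
}
```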

Overriding compareTo and Compare in java7

Do I need to override the equals() method whenever I override the compareTo and compare methods in Java, in order to satisfy the Comparable contract? Will this create any issues when I call Collections.sort or Arrays.sort?
From the Javadoc for Comparator
It is generally the case, but not strictly required that (compare(x, y)==0) == (x.equals(y)). Generally speaking, any comparator that violates this condition should clearly indicate this fact. The recommended language is "Note: this comparator imposes orderings that are inconsistent with equals."
This means that you don't generally need to override equals(). You probably shouldn't do this, unless you want your Comparator to return a non-zero comparison for two values which return true when compared with equals.
If you think that the existence of a comparison requires you to change the definition of what it means for two things to be equal, then you've probably designed something badly.
From Comparable:
The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C. Note that null is not an instance of any class, and e.compareTo(null) should throw a NullPointerException even though e.equals(null) returns false.
It is strongly recommended (though not required) that natural orderings be consistent with equals. This is so because sorted sets (and sorted maps) without explicit comparators behave "strangely" when they are used with elements (or keys) whose natural ordering is inconsistent with equals. In particular, such a sorted set (or sorted map) violates the general contract for set (or map), which is defined in terms of the equals method.
(emphasis mine)
So you don't need to override equals() (i.e. it will not cause a problem within the standard sort methods since they use only compareTo(), nor does it violate the contract of Comparable), but it certainly wouldn't hurt.

What are the implications for overriding compareTo?

I am aware that if one overrides equals, hashCode should also be overridden. Are there any similar rules that would apply to overriding compareTo?
This is a Java question.
The expectations of it can be read here: http://docs.oracle.com/javase/7/docs/api/java/lang/Comparable.html
The part that will be of the most interest to you is probably:
It is strongly recommended (though not required) that natural
orderings be consistent with equals. This is so because sorted sets
(and sorted maps) without explicit comparators behave "strangely" when
they are used with elements (or keys) whose natural ordering is
inconsistent with equals. In particular, such a sorted set (or sorted
map) violates the general contract for set (or map), which is defined
in terms of the equals method.
It is explained in the JavaDocs:
The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C
Note that it is not required, i.e. if two objects are equal according to compareTo(), they don't have to be equal according to equals(). This is fine because you can, for instance, sort people by age, so two people with the same age are considered equal with regard to a Comparator<Person>, but they obviously don't have to be equal.
However, in this particular case you might want to add secondary attributes to the comparator for when ages are equal (so that the ordering among same-aged people is always predictable), so after all, including the same attributes in compareTo() might be a good idea in some cases.
The documentation for Comparator has this cautionary note:
The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S.
Caution should be exercised when using a comparator capable of imposing an ordering inconsistent with equals to order a sorted set (or sorted map). Suppose a sorted set (or sorted map) with an explicit comparator c is used with elements (or keys) drawn from a set S. If the ordering imposed by c on S is inconsistent with equals, the sorted set (or sorted map) will behave "strangely." In particular the sorted set (or sorted map) will violate the general contract for set (or map), which is defined in terms of equals.
You should have a particular property or attribute in the objects that you use to compare two objects of the same type.
