I had an interview today, and the interviewer puzzled me by asking whether it is possible that a TreeSet equals a HashSet while the HashSet does not equal the TreeSet. I said "no", but according to him the answer is "yes".
How is it even possible?
Your interviewer is right: in some specific cases, equals between the two does not behave as an equivalence relation. It is possible for a TreeSet to be equal to a HashSet but not vice versa. Here is an example:
TreeSet<String> treeSet = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
HashSet<String> hashSet = new HashSet<>();
treeSet.addAll(List.of("A", "b"));
hashSet.addAll(List.of("A", "B"));
System.out.println(hashSet.equals(treeSet)); // false
System.out.println(treeSet.equals(hashSet)); // true
The reason is that a TreeSet uses its comparator to decide whether an element is a duplicate, while a HashSet uses hashCode and equals.
Quoting TreeSet:
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface.
It’s not possible without violating the contract of either equals or Set. The definition of equals in Java requires symmetry, i.e. a.equals(b) must be the same as b.equals(a).
In fact, the very documentation of Set says
Returns true if the specified object is also a set, the two sets have the same size, and every member of the specified set is contained in this set (or equivalently, every member of this set is contained in the specified set). This definition ensures that the equals method works properly across different implementations of the set interface.
NO, this is impossible without violating the general contract of the equals method of the Object class, which requires symmetry, i.e. x.equals(y) if and only if y.equals(x).
BUT, the classes TreeSet and HashSet implement the equals contract of the Set interface differently. This contract requires, among other things, that every member of the specified set is contained in this set. To determine whether an element is in the set, the contains method is called, which for TreeSet uses the Comparator and for HashSet uses hashCode and equals.
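To make the asymmetry concrete, here is a minimal sketch that restates the documented rule (same size, and every member of the other set is contained in this one) as a hypothetical helper named sketchEquals. It is not the JDK's source, only an illustration of which contains method ends up being asked:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetEqualsSketch {
    // Hypothetical restatement of the documented Set.equals rule:
    // same size, and every element of `other` is contained in `self`.
    static boolean sketchEquals(Set<String> self, Set<String> other) {
        if (self.size() != other.size()) {
            return false;
        }
        for (String element : other) {
            // TreeSet.contains consults the comparator;
            // HashSet.contains consults hashCode() and equals().
            if (!self.contains(element)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> treeSet = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        treeSet.addAll(List.of("A", "b"));
        Set<String> hashSet = new HashSet<>(List.of("A", "B"));

        System.out.println(sketchEquals(treeSet, hashSet)); // true
        System.out.println(sketchEquals(hashSet, treeSet)); // false
    }
}
```

The only thing that changes between the two calls is whose contains method is used, which is where the two notions of "same element" diverge.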
And finally:
YES, this is possible in some cases.
This is a quote from the book Java Generics and Collections:
In principle, all that a client should need to know is how to keep to
its side of the contract; if it fails to do that, all bets are off and
there should be no need to say exactly what the supplier will do.
So the answer is: yes, it can happen, but only when you don't keep to your side of the contract with Java. You could say that Java has violated the symmetric property of equality here, but if that happens, be sure that you are the one who first broke the contract of some other interface. Java has already documented this behaviour.
Generally you should read documentation of Comparator and Comparable interfaces to use them correctly in sorted collections.
This question is partly answered in Effective Java, Third Edition, Item 14, on pages 66-68.
This is a quote from the book's definition of the contract for implementing the Comparable interface (note that this is only part of the whole contract):
• It is strongly recommended, but not required, that
(x.compareTo(y) == 0) == (x.equals(y)). Generally speaking, any class
that implements the Comparable interface and violates this condition
should clearly indicate this fact. The recommended language is “Note:
This class has a natural ordering that is inconsistent with equals.”
It says "strongly recommended, but not required", which means you are allowed to have classes for which
x.compareTo(y) == 0 does not imply x.equals(y). (But if a class is implemented that way, sorted collections holding its instances may not obey the general Set or Map contract; this is exactly the case with BigDecimal.)
The paragraph of the book describing this part of the contract of Comparable interface is worth mentioning:
It is a strong suggestion rather than a true requirement, simply
states that the equality test imposed by the compareTo method should
generally return the same results as the equals method. If this
provision is obeyed, the ordering imposed by the compareTo method is
said to be consistent with equals. If it’s violated, the ordering is
said to be inconsistent with equals. A class whose compareTo method
imposes an order that is inconsistent with equals will still work, but
sorted collections containing elements of the class may not obey the
general contract of the appropriate collection interfaces
(Collection, Set, or Map). This is because the general contracts for
these interfaces are defined in terms of the equals method, but sorted
collections use the equality test imposed by compareTo in place of
equals. It is not a catastrophe if this happens, but it’s something to
be aware of.
Actually we have some classes in Java itself that did not follow this recommendation. BigDecimal is one of them and this is mentioned in the book.
For example, consider the BigDecimal class, whose compareTo method is
inconsistent with equals. If you create an empty HashSet instance and
then add new BigDecimal("1.0") and new BigDecimal("1.00"), the set
will contain two elements because the two BigDecimal instances added
to the set are unequal when compared using the equals method. If,
however, you perform the same procedure using a TreeSet instead of a
HashSet, the set will contain only one element because the two
BigDecimal instances are equal when compared using the compareTo
method. (See the BigDecimal documentation for details.)
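The experiment described in the quote can be reproduced with a few lines. This is a minimal sketch; the class and helper names are made up for illustration:

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class BigDecimalSetDemo {
    // Adds 1.0 and 1.00 to the given set and reports how many elements it keeps.
    static int sizeAfterAdding(Set<BigDecimal> set) {
        set.add(new BigDecimal("1.0"));
        set.add(new BigDecimal("1.00"));
        return set.size();
    }

    public static void main(String[] args) {
        // HashSet relies on equals(), which also compares the scale.
        System.out.println(sizeAfterAdding(new HashSet<>())); // 2
        // TreeSet relies on compareTo(), which ignores the scale.
        System.out.println(sizeAfterAdding(new TreeSet<>())); // 1
    }
}
```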
However this behaviour is documented in BigDecimal Documentation. Let's have a look at that part of the documentation:
Note: care should be exercised if BigDecimal objects are used as keys
in a SortedMap or elements in a SortedSet since BigDecimal's natural
ordering is inconsistent with equals. See Comparable, SortedMap or
SortedSet for more information.
So although you can write code like the following, you should avoid it, because the BigDecimal documentation warns against this usage:
Set<BigDecimal> treeSet = new TreeSet<>();
Set<BigDecimal> hashSet = new HashSet<>();
treeSet.add(new BigDecimal("1.00"));
treeSet.add(new BigDecimal("2.0"));
hashSet.add(new BigDecimal("1.00"));
hashSet.add(new BigDecimal("2.00"));
System.out.println(hashSet.equals(treeSet)); // false
System.out.println(treeSet.equals(hashSet)); // true
Note that Comparable supplies the natural ordering of the elements when you don't pass a comparator to TreeSet or TreeMap. The same thing can happen when you pass a Comparator to the constructor of those classes. This is mentioned in the Comparator documentation:
The ordering imposed by a comparator c on a set of elements S is said
to be consistent with equals if and only if c.compare(e1, e2)==0 has
the same boolean value as e1.equals(e2) for every e1 and e2 in S.
Caution should be exercised when using a comparator capable of
imposing an ordering inconsistent with equals to order a sorted set
(or sorted map). Suppose a sorted set (or sorted map) with an explicit
comparator c is used with elements (or keys) drawn from a set S. If
the ordering imposed by c on S is inconsistent with equals, the sorted
set (or sorted map) will behave "strangely." In particular the sorted
set (or sorted map) will violate the general contract for set (or
map), which is defined in terms of equals.
So, considering this documentation of Comparator, the following example given by @Aniket Sahrawat is not guaranteed to obey the Set contract:
TreeSet<String> treeSet = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
HashSet<String> hashSet = new HashSet<>();
treeSet.addAll(List.of("A", "b"));
hashSet.addAll(List.of("A", "B"));
System.out.println(hashSet.equals(treeSet)); // false
System.out.println(treeSet.equals(hashSet)); // true
In a nutshell, the answer is: yes, it can happen, but only when you break the documented contract of one of the aforementioned interfaces (SortedSet, Comparable, Comparator).
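One possible way to stay within the contract while still sorting case-insensitively is to add a tie-breaker, so that the comparator returns 0 only for strings that are equal. The sketch below assumes this is acceptable for the use case (it means "A" and "a" become distinct elements); the names are illustrative:

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ConsistentComparatorDemo {
    // Case-insensitive first, then exact (case-sensitive) order as a tie-breaker,
    // so compare(a, b) == 0 only when a.equals(b).
    static final Comparator<String> CASE_INSENSITIVE_THEN_EXACT =
            String.CASE_INSENSITIVE_ORDER.thenComparing(Comparator.naturalOrder());

    static Set<String> treeSetOf(List<String> items) {
        Set<String> set = new TreeSet<>(CASE_INSENSITIVE_THEN_EXACT);
        set.addAll(items);
        return set;
    }

    public static void main(String[] args) {
        Set<String> hashSet = new HashSet<>(List.of("A", "B"));

        Set<String> different = treeSetOf(List.of("A", "b"));
        System.out.println(different.equals(hashSet)); // false
        System.out.println(hashSet.equals(different)); // false

        Set<String> same = treeSetOf(List.of("A", "B"));
        System.out.println(same.equals(hashSet)); // true
        System.out.println(hashSet.equals(same)); // true
    }
}
```

With an ordering that is consistent with equals, both directions of the comparison agree again.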
There already are good answers, but I would like to approach this from a bit more general perspective.
In mathematics, logic and, correspondingly, computer science, "is equal to" is a symmetric binary relation, which means that if A is equal to B, then B is equal to A.
So, if TreeSet X equals HashSet Y, then HashSet Y must equal TreeSet X, and that must always be true.
If, however, the symmetric property of equality is violated (i.e. equality is not implemented correctly), then x.equals(y) might not imply y.equals(x).
The documentation of the Object#equals method in Java explicitly states that:
The equals method implements an equivalence relation on non-null object references.
Hence, equals must be symmetric; an implementation that is not violates equality in general, and the contract of Object#equals in Java specifically.
Related
The Guava JavaDocs for Sets.SetView.union() (as well as intersection(), difference(), and symmetricDifference()) mention "equivalence relations":
Results are undefined if set1 and set2 are sets based on different equivalence relations (as HashSet, TreeSet, and the Map.keySet() of an IdentityHashMap all are).
I struggle to understand the meaning of that sentence.
The glossary defines "equivalence relation" as reflexive ("a.relation(a) is always true"), symmetric (a1.relation(a2) == a2.relation(a1)) and transitive (a1.relation(a2) && a2.relation(a3) implies a1.relation(a3)), and refers to the Object.equals() docs. (Unfortunately, the Guava wiki doesn't go into any detail...)
But how are the different types of Set different in that respect (i.e. equivalence relations)? They all seem to inherit equals() from AbstractSet? It doesn't have to do with the type of object a set holds (e.g. Set<Cow> vs. Set<Chicken>), does it?
It sounds like they are referring to when a Set doesn't use equals and hashCode to compare elements for some reason. The most common example of this is a TreeSet with a custom Comparator. For example, we could have something like this:
Set<String> a = new TreeSet<>();
Set<String> b = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
The union, intersection, etc. of a and b are undefined, because a and b have different equivalence relations defined between elements.
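The ambiguity can be seen without Guava. In this plain-JDK sketch (hypothetical helper name, no Guava API involved), the "union" of the same two inputs has a different size depending on which set's notion of sameness is applied:

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class EquivalenceRelationDemo {
    // Copies both inputs into a target set; the target's own notion of
    // "same element" decides how many elements survive.
    static int unionSize(Set<String> target, List<String> first, List<String> second) {
        target.addAll(first);
        target.addAll(second);
        return target.size();
    }

    public static void main(String[] args) {
        List<String> first = List.of("apple");
        List<String> second = List.of("APPLE");

        // Natural String order: "apple" and "APPLE" are different elements.
        System.out.println(unionSize(new TreeSet<>(), first, second)); // 2
        // Case-insensitive order: they are the same element.
        System.out.println(
                unionSize(new TreeSet<>(String.CASE_INSENSITIVE_ORDER), first, second)); // 1
    }
}
```

Since neither answer is more "correct" than the other, a library that combines two such sets has no single well-defined result to return.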
Java SE also makes mention of this kind of situation when it's talking about ordering which is inconsistent with equals (see TreeSet):
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
When I'm looking at the Java Object Ordering tutorial, the last section 'Comparators' of the article confused me a little bit.
The tutorial defines a class Employee that is itself comparable by the employee's name, but it doesn't show whether this class overrides the equals method. It then uses a custom Comparator, which orders employees by seniority, to sort a list of employees, and that part I could understand.
Then the tutorial explains why this won't work for a sorted collection such as TreeSet (a SortedSet), and the reason is:
it generates an ordering that is not compatible with equals. This means that this Comparator equates objects that the equals method does not. In particular, any two employees who were hired on the same date will compare as equal. When you're sorting a List, this doesn't matter; but when you're using the Comparator to order a sorted collection, it's fatal. If you use this Comparator to insert multiple employees hired on the same date into a TreeSet, only the first one will be added to the set; the second will be seen as a duplicate element and will be ignored.
Now I'm confused, since I know List allows duplicate elements while Set doesn't based on equals method. So I wonder when the tutorial says the ordering generated by the Comparator is not compatible with equals, what does it mean? And it also says 'If you use this Comparator to insert multiple employees hired on the same date into a TreeSet, only the first one will be added to the set; the second will be seen as a duplicate element and will be ignored.' I don't understand how using a Comparator will affect the use of original equals method. I think my question is how the TreeSet will be produced and sorted in this case and when the compare and equals methods are used.
So I wonder when the tutorial says the ordering generated by the Comparator is not compatible with equals, what does it mean?
In this example, the Comparator compares two Employee objects based on their seniority alone. This comparison does not in any way use equals or hashCode. Keeping that in mind, when we pass this Comparator to a TreeSet, the set will consider any result of 0 from the Comparator as equality. Therefore, if any Employees share starting dates, only one will be added because the set thinks they are equal.
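A minimal sketch of that behaviour follows. The Employee class here is a hypothetical stand-in (the tutorial's own class is not reproduced), and it deliberately keeps the equals inherited from Object:

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

class SeniorityDemo {
    // Hypothetical stand-in for the tutorial's Employee class.
    static final class Employee {
        final String name;
        final LocalDate hireDate;

        Employee(String name, LocalDate hireDate) {
            this.name = name;
            this.hireDate = hireDate;
        }
    }

    // Compares hire dates only, so two employees hired on the same day compare as 0.
    static final Comparator<Employee> SENIORITY_ORDER =
            Comparator.comparing((Employee e) -> e.hireDate);

    // Adds all employees to a TreeSet ordered by seniority and returns its size.
    static int countAfterAdding(Employee... employees) {
        Set<Employee> set = new TreeSet<>(SENIORITY_ORDER);
        for (Employee e : employees) {
            set.add(e);
        }
        return set.size();
    }

    public static void main(String[] args) {
        LocalDate sameDay = LocalDate.of(2020, 1, 15);
        Employee alice = new Employee("Alice", sameDay);
        Employee bob = new Employee("Bob", sameDay);

        System.out.println(alice.equals(bob));            // false
        System.out.println(countAfterAdding(alice, bob)); // 1
    }
}
```

The two objects are not equal according to equals, yet the TreeSet keeps only the first, because the comparator returned 0 for the pair.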
Finally:
I think my question is how the TreeSet will be produced and sorted in this case and when the compare and equals methods are used.
For the TreeSet, if a Comparator is given, it uses the compare method to determine equality and ordering of objects. If no Comparator is given, then the set uses the compareTo method of the objects being sorted (they must implement Comparable).
The reason why the Java specification claims that the compare/compareTo method being used must be in line with equals is because the Set specification makes use of equals, even though this specific type of Set, the TreeSet, uses comparisons instead.
If you ever receive a Set from some method implementation, you can expect that there are no duplicates of the objects in that Set as defined by the equals method. Because TreeSet doesn't use this method, however, developers must be careful to ensure that the comparison method results in the same equality as equals does.
A TreeSet uses only the Comparator to determine if two elements are "equal":
https://docs.oracle.com/javase/7/docs/api/java/util/TreeSet.html
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
This means that the Comparator should return 0 if and only if the equals returns true, to get a consistent behaviour between TreeSet and other sets, like a HashSet. The HashSet indeed uses equals and the hash code to determine if two elements are "equal".
I have an object, Foo which inherits the default equals method from Object, and I don't want to override this because reference equality is the identity relation that I would like to use.
I now have a specific situation in which I would now like to compare these objects according to a specific field. I'd like to write a comparator, FooValueComparator, to perform this comparison. However, if my FooValueComparator returns 0 whenever two objects have the same value for this particular field, then it is incompatible with the equals method inherited from Object, along with all the problems that entails.
What I would like to do would be to have FooValueComparator compare the two objects first on their field value, and then on their references. Is this possible? What pitfalls might that entail (eg. memory locations being changed causing the relative order of Foo objects to change)?
The reason I would like my comparator to be compatible with equals is because I would like to have the option of applying it to SortedSet collections of Foo objects. I don't want a SortedSet to reject a Foo that I try to add just because it already contains a different object having the same value.
This is described in the documentation of Comparator:
The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S.
Caution should be exercised when using a comparator capable of imposing an ordering inconsistent with equals to order a sorted set (or sorted map). Suppose a sorted set (or sorted map) with an explicit comparator c is used with elements (or keys) drawn from a set S. If the ordering imposed by c on S is inconsistent with equals, the sorted set (or sorted map) will behave "strangely." In particular the sorted set (or sorted map) will violate the general contract for set (or map), which is defined in terms of equals.
In short, if the implementation of Comparator is not consistent with the equals method, then you should know what you're doing, and you're responsible for the side effects of this design; making the implementation consistent with Object#equals is not mandatory. Still, take into account that consistency is preferable, so as not to confuse future coders who will maintain the system. A similar concept applies when implementing Comparable.
An example of this in the JDK may be found in BigDecimal#compareTo, which explicitly states in javadoc that this method is not consistent with BigDecimal#equals.
If your intention is to use a SortedSet<YourClass>, then you're probably using the wrong approach. I would recommend using a SortedMap<TypeOfYourField, Collection<YourClass>> (or SortedMap<TypeOfYourField, YourClass>, in case there are no equal elements for the same key) instead. It may be more work, but it gives you more control over the data stored in and retrieved from the structure.
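As a rough sketch of the SortedMap suggestion, assuming a hypothetical Foo with a single int field named value, the grouping could look like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class GroupByFieldDemo {
    // Hypothetical Foo: keeps the identity-based equals inherited from Object.
    static final class Foo {
        final int value;

        Foo(int value) {
            this.value = value;
        }
    }

    // Groups Foo objects by their field; objects with the same value share a bucket
    // instead of being rejected as duplicates.
    static SortedMap<Integer, List<Foo>> groupByValue(List<Foo> foos) {
        SortedMap<Integer, List<Foo>> byValue = new TreeMap<>();
        for (Foo foo : foos) {
            byValue.computeIfAbsent(foo.value, key -> new ArrayList<>()).add(foo);
        }
        return byValue;
    }

    public static void main(String[] args) {
        List<Foo> foos = List.of(new Foo(2), new Foo(1), new Foo(2));
        SortedMap<Integer, List<Foo>> byValue = groupByValue(foos);

        System.out.println(byValue.keySet());      // [1, 2]
        System.out.println(byValue.get(2).size()); // 2
    }
}
```

The keys are ordered by the field, and no Foo is lost, because the map never needs to decide whether two Foo objects are "the same".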
You may have several comparators for a given class, e.g. one per field. In that case equals cannot be reused, so the answer is: not necessarily. You should make them consistent, however, if your collection is a sorted one (a sorted map or set) and the comparator is used to determine each element's position in that collection.
See documentation for details.
It says in the contract for the Comparator interface, that it must be consistent with equals.
Does this mean that Comparator = zero if equalsTo = true , or does it mean that Comparator = zero if and only if equalsTo = true?
I seem to remember that it is the second one, but I have come across lots of comparators which sort by non-unique sub properties.
For example, I might have objects which have a sub-property date, and I want to sort my list of objects by the date of submission. However, you can have several objects with the same date? What are the consequences of this? Surely there is a best practice solution to this problem already? How can I sort a collection by a property which is not guaranteed to be unique without violating the comparator contract? What are the consequences for this type of violation? Are they manageable?
It's not at all true that Comparator must be consistent with equals.
The docs merely warn for this situation:
Caution should be exercised when using a comparator capable of imposing an ordering inconsistent with equals to order a sorted set (or sorted map) (http://docs.oracle.com/javase/7/docs/api/java/util/Comparator.html)
If you have one ordering based on date, and another ordering based on date+time, you should simply implement multiple comparators.
Perhaps you are confusing Comparator with Comparable? For Comparable the docs strongly advise against this situation:
It is strongly recommended (though not required) that natural orderings be consistent with equals. (http://docs.oracle.com/javase/7/docs/api/java/lang/Comparable.html)
This difference makes sense if you realize that an object can only have 1 implementation of Comparable, but multiple of Comparator. The whole idea of Comparator is to have multiple ways of comparing the same class.
Edit: you could have multiple Comparators and, as popovitsj stated, they don't necessarily have to be consistent with equals
(although I presume that most of the time Comparator.compare(obj1, obj2) == 0 <=> obj1.equals(obj2) == true).
If you want specific sort results when sorting by a non-unique field, you need to customize your Comparator to account for ties.
For example, if while implementing compare() you find that obj1.date == obj2.date, then you should compare other important fields (name, age, etc.) to rank obj1 against obj2 accordingly and return the corresponding value.
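A tie-breaking comparator of this kind might be sketched as follows. The Submission type and its fields are hypothetical, chosen only to mirror the date example:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TieBreakDemo {
    // Hypothetical submission type; field names are illustrative only.
    static final class Submission {
        final String name;
        final LocalDate date;

        Submission(String name, LocalDate date) {
            this.name = name;
            this.date = date;
        }
    }

    // Orders by date first; submissions on the same date are ranked by name.
    static final Comparator<Submission> BY_DATE_THEN_NAME =
            Comparator.comparing((Submission s) -> s.date)
                      .thenComparing((Submission s) -> s.name);

    // Returns the names in sorted order without modifying the input list.
    static List<String> sortedNames(List<Submission> submissions) {
        List<Submission> copy = new ArrayList<>(submissions);
        copy.sort(BY_DATE_THEN_NAME);
        List<String> names = new ArrayList<>();
        for (Submission s : copy) {
            names.add(s.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Submission> submissions = List.of(
                new Submission("carol", LocalDate.of(2024, 1, 2)),
                new Submission("bob", LocalDate.of(2024, 1, 1)),
                new Submission("alice", LocalDate.of(2024, 1, 1)));

        System.out.println(sortedNames(submissions)); // [alice, bob, carol]
    }
}
```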
Hope that helps.
As you suspect, in order for compareTo() to be consistent with equals(), compareTo() must always return 0 when equals() returns true. Similarly, if equals returns false, then compareTo must not return 0.
However, as popovitsj has pointed out in his answer, consistency with equals() is not a requirement. As such, the above only applies when you are attempting to make the two methods consistent.
It says in the contract for the Comparator interface, that it must be consistent with equals.
That's not entirely correct; see @popovitsj's answer.
Does this mean that Comparator = zero if equalsTo = true , or does it mean that Comparator = zero if and only if equalsTo = true?
It means the latter. However, it is not actually a hard requirement for Comparator objects.
I seem to remember that it is the second one, but I have come across lots of comparators which sort by non-unique sub properties.
Well that's reasonable, given that it is not actually a hard requirement. In fact, a Comparator that is inconsistent with equals(Object) is just fine if you are going to use it with Arrays.sort(...). The problems only arise with TreeSet and TreeMap.
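The difference between sorting and using a sorted collection can be sketched with a comparator that orders strings by length only (an illustrative choice, not taken from the question):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class SortVersusTreeSetDemo {
    // Orders strings by length only, so "bb" and "dd" compare as 0
    // although they are not equal.
    static final Comparator<String> BY_LENGTH = Comparator.comparingInt(String::length);

    // Sorting a list keeps every element; ties are simply placed next to each other.
    static List<String> sortedCopy(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(BY_LENGTH);
        return copy;
    }

    // A TreeSet treats a comparison result of 0 as "already present" and drops the element.
    static List<String> viaTreeSet(List<String> words) {
        TreeSet<String> set = new TreeSet<>(BY_LENGTH);
        set.addAll(words);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<String> words = List.of("ccc", "bb", "a", "dd");

        System.out.println(sortedCopy(words)); // [a, bb, dd, ccc]
        System.out.println(viaTreeSet(words)); // [a, bb, ccc]
    }
}
```

The sorted list still contains all four words, while the TreeSet has silently lost "dd".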
For example, suppose that you have a Comparator<E> C that says e1 and e2 are not equal, but e1.equals(e2) returns true. Now suppose that you create a TreeSet<E> instance using the comparator, and then add e1 and e2 to that set. The set's tree is organized based on the comparator, and therefore e1 and e2 will slot into different places in the search tree, and will both be elements of the set. But that violates the primary invariant of a Set ... which is based on the equals method.
As the javadoc for TreeSet says:
"Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface."
And this answers the last part of your question.
What are the consequences for this type of violation?
If you use an inconsistent Comparator in a TreeSet or TreeMap, the collection will not obey the Set or Map contract.
a and b may not be equal. But if the comparator returns zero when comparing a and b, it should also return zero when comparing b and a.
This answer says:
Typically, if 2 objects are equal from an equals perspective but not from a compareTo perspective, you can store both objects as keys in a TreeMap. This can lead to un-intuitive behaviour. It can also be done on purpose in specific situations.
But this is for specific situations; in general, nothing stops you from having inconsistent behaviour, where equals and compareTo don't agree.
One example, this morning someone asked: Move specific items to the end of a list
Most of the answers have comparators that returns 0 for elements that are not equal.
I am aware that if one overrides equals, hashCode should also be overridden. Are there any similar rules that would apply to overriding compareTo?
This is a Java question.
The expectations of it can be read here: http://docs.oracle.com/javase/7/docs/api/java/lang/Comparable.html
The part that will be of the most interest to you is probably:
It is strongly recommended (though not required) that natural
orderings be consistent with equals. This is so because sorted sets
(and sorted maps) without explicit comparators behave "strangely" when
they are used with elements (or keys) whose natural ordering is
inconsistent with equals. In particular, such a sorted set (or sorted
map) violates the general contract for set (or map), which is defined
in terms of the equals method.
It is explained in the JavaDocs:
The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C
Note that it is not required, i.e. if two objects are equal according to compareTo(), they don't have to be equal according to equals(). This is fine because you can, for instance, sort people by age, so two people with the same age are considered equal by a Comparator<Person>, but they obviously don't have to be equal.
However, in this particular case you might want to add secondary attributes to the comparator for when ages are equal (so sorting is always stable and predictable across same-aged people), so including the same attributes in compareTo() might be a good idea after all in some cases.
The documentation for Comparator has this cautionary note:
The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S.
Caution should be exercised when using a comparator capable of imposing an ordering inconsistent with equals to order a sorted set (or sorted map). Suppose a sorted set (or sorted map) with an explicit comparator c is used with elements (or keys) drawn from a set S. If the ordering imposed by c on S is inconsistent with equals, the sorted set (or sorted map) will behave "strangely." In particular the sorted set (or sorted map) will violate the general contract for set (or map), which is defined in terms of equals.
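The definition above can be turned into a small check over a sample of elements. This is only a sketch with a hypothetical helper name; a sample can reveal an inconsistency, but passing the check does not prove consistency for all possible values:

```java
import java.math.BigDecimal;
import java.util.Comparator;
import java.util.List;

class ConsistencyCheck {
    // Returns true if, for every pair in the sample, compare(e1, e2) == 0
    // has the same boolean value as e1.equals(e2).
    static <T> boolean isConsistentWithEquals(Comparator<? super T> c, List<T> sample) {
        for (T e1 : sample) {
            for (T e2 : sample) {
                if ((c.compare(e1, e2) == 0) != e1.equals(e2)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> words = List.of("A", "a", "b");
        System.out.println(
                isConsistentWithEquals(Comparator.<String>naturalOrder(), words)); // true
        System.out.println(
                isConsistentWithEquals(String.CASE_INSENSITIVE_ORDER, words));     // false

        List<BigDecimal> numbers = List.of(new BigDecimal("1.0"), new BigDecimal("1.00"));
        System.out.println(
                isConsistentWithEquals(Comparator.<BigDecimal>naturalOrder(), numbers)); // false
    }
}
```

A check like this could be used in a unit test before handing a comparator to a TreeSet or TreeMap.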
I just want to point out that you should choose particular properties or attributes of your objects to use when comparing two objects of the same type.