How to implement java.util.Comparator that orders its elements according to a partial order relation?
For example given a partial order relation a ≺ c, b ≺ c; the order of a and b is undefined.
Since Comparator requires a total ordering, the implementation orders elements for which the partial ordering is undefined arbitrarily but consistently.
Would the following work?
interface Item {
boolean before(Item other);
}
class ItemPartialOrderComperator implements Comparator<Item> {
@Override
public int compare(Item o1, Item o2) {
if(o1.equals(o2)) { // Comparator returns 0 if and only if o1 and o2 are equal;
return 0;
}
if(o1.before(o2)) {
return -1;
}
if(o2.before(o1)) {
return +1;
}
return o1.hashCode() - o2.hashCode(); // Arbitrary order on hashcode
}
}
Is this comparator's ordering transitive?
(I fear that it is not)
Are Comparators required to be transitive?
(when used in a TreeMap)
How to implement it correctly?
(if the implementation above doesn't work)
(Hash codes can collide; for simplicity the example ignores collisions. See Damien B's answer to Impose a total ordering on all instances of *any* class in Java for a fail-safe ordering on hash codes.)
The problem is that, when you have incomparable elements, you need to fall back to something cleverer than comparing hash codes. For example, given a partial order {a < b, c < d}, the hash codes could satisfy h(d) < h(b) < h(c) < h(a), which means that a < b < c < d < a (where b < c and d < a are the ties broken by hash code). That cycle will cause problems with a TreeMap.
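The cycle can be reproduced with a small sketch. The `Node` class, its hand-picked hash values and the `before` relation are made up for illustration; the values are chosen so that h(d) < h(b) < h(c) < h(a).

```java
import java.util.Comparator;

class HashTieBreakCycle {
    // Hypothetical item: a name plus a hand-picked hash code.
    static final class Node {
        final String name;
        final int hash;
        Node(String name, int hash) { this.name = name; this.hash = hash; }
        @Override public int hashCode() { return hash; }
        @Override public boolean equals(Object o) { return this == o; }
    }

    // Hash codes chosen so that h(d) < h(b) < h(c) < h(a).
    static final Node A = new Node("a", 4);
    static final Node B = new Node("b", 2);
    static final Node C = new Node("c", 3);
    static final Node D = new Node("d", 1);

    // The partial order {a < b, c < d}; every other pair is incomparable.
    static boolean before(Node x, Node y) {
        return (x == A && y == B) || (x == C && y == D);
    }

    // Same shape as the comparator in the question, with hash codes as the fallback.
    static final Comparator<Node> CMP = (x, y) -> {
        if (x.equals(y)) return 0;
        if (before(x, y)) return -1;
        if (before(y, x)) return 1;
        return Integer.compare(x.hashCode(), y.hashCode());
    };

    public static void main(String[] args) {
        System.out.println(CMP.compare(A, B) < 0); // true: a < b from the partial order
        System.out.println(CMP.compare(B, C) < 0); // true: tie broken by hash code
        System.out.println(CMP.compare(C, D) < 0); // true: c < d from the partial order
        System.out.println(CMP.compare(D, A) < 0); // true: tie broken by hash code, closing the cycle
    }
}
```

All four comparisons report "less than", so no consistent sorted position exists for these four keys.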
In general, there's probably nothing for you to do except topologically sort the keys beforehand, so some details about the partial orders of interest to you would be welcome.
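As a rough illustration of sorting topologically beforehand, here is a minimal sketch of Kahn's algorithm driven by a `before` predicate. The class name `PartialOrderSort`, the method `topologicalSort` and the use of a `BiPredicate` are assumptions made for this example, not part of the question. It tests all n*n pairs, so it only suits small inputs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.BiPredicate;

class PartialOrderSort {
    // Returns the items in some order compatible with the strict partial order `before`.
    static <T> List<T> topologicalSort(List<T> items, BiPredicate<T, T> before) {
        int n = items.size();
        int[] inDegree = new int[n]; // number of items that must come before item j
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j && before.test(items.get(i), items.get(j))) inDegree[j]++;
            }
        }
        Deque<Integer> ready = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            if (inDegree[i] == 0) ready.add(i);
        }
        List<T> sorted = new ArrayList<>(n);
        while (!ready.isEmpty()) {
            int i = ready.remove();
            sorted.add(items.get(i));
            // Emitting item i releases every item that was waiting on it.
            for (int j = 0; j < n; j++) {
                if (i != j && before.test(items.get(i), items.get(j)) && --inDegree[j] == 0) {
                    ready.add(j);
                }
            }
        }
        if (sorted.size() != n) throw new IllegalArgumentException("relation contains a cycle");
        return sorted;
    }

    public static void main(String[] args) {
        // The relation from the question: a before c, b before c; a and b are incomparable.
        BiPredicate<String, String> before = (x, y) -> y.equals("c") && !x.equals("c");
        System.out.println(topologicalSort(Arrays.asList("c", "b", "a"), before)); // [b, a, c]
    }
}
```

The relative order of incomparable items (here a and b) simply follows the input order, which keeps the result deterministic.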
It seems to be more of an answer than a comment so I'll post it
The documentation says:
It follows immediately from the contract for compare that the quotient is an equivalence relation on S, and that the imposed ordering is a total order on S.
So no, a Comparator requires a total ordering. If you implement this with a partial ordering you're breaching the interface contract.
Even if it might work in some scenario, you should not attempt to solve your problem in a way that breaches the contract of the interface.
See this question about data structures that do fit a partial ordering.
Any time I've tried using hash codes for this sort of thing I've come to regret it. You will be much happier if your ordering is deterministic - for debuggability if nothing else. The following will achieve that, by creating a fresh index for any not previously encountered Item and using those indices for the comparison if all else fails.
Note that the ordering still is not guaranteed to be transitive.
class ItemPartialOrderComperator implements Comparator<Item> {
@Override
public int compare(Item o1, Item o2) {
if(o1.equals(o2)) {
return 0;
}
if(o1.before(o2)) {
return -1;
}
if(o2.before(o1)) {
return +1;
}
return getIndex(o1) - getIndex(o2);
}
private int getIndex(Item i) {
Integer result = indexMap.get(i);
if (result == null) {
indexMap.put(i, result = indexMap.size());
}
return result;
}
private Map<Item,Integer> indexMap = new HashMap<Item, Integer>();
}
In JDK 7, sorting with a comparator like this may throw a runtime exception. From the compatibility notes:
Area: API: Utilities
Synopsis: Updated sort behavior for Arrays and Collections may throw an IllegalArgumentException
Description: The sorting algorithm used by java.util.Arrays.sort and (indirectly) by java.util.Collections.sort has been replaced. The
new sort implementation may throw an IllegalArgumentException if it
detects a Comparable that violates the Comparable contract. The
previous implementation silently ignored such a situation.
If the previous behavior is desired, you can use the new system property, java.util.Arrays.useLegacyMergeSort, to restore previous
mergesort behavior.
Nature of Incompatibility: behavioral
RFE: 6804124
If a < b and b < c implies a < c, then you have made a total ordering by using the hashCodes. Take a < d, d < c. The partial order says that b and d are not necessarily ordered. By introducing hashCodes you provide an ordering.
Example: is-a-descendant-of(human, human).
Adam (hash 42) < Moses (hash 17), Adam < Joe (hash 9)
Implies
Adam < Joe < Moses
A negative example would be the same relation, but when time travel allows being your own descendant.
When one item is neither "before" nor "after" another, instead of returning a comparison of the hashcode, just return 0. The result will be "total ordering" and "arbitrary" ordering of coincident items.
Related
It's written in all decent java courses, that if you implement the Comparable interface, you should (in most cases) also override the equals method to match its behavior.
Unfortunately, in my current organization people try to convince me to do exactly the opposite. I am looking for the most convincing code example to show them all the evil that will happen.
I think you can beat them by showing the Comparable javadoc that says:
It is strongly recommended (though not required) that natural
orderings be consistent with equals. This is so because sorted sets
(and sorted maps) without explicit comparators behave "strangely" when
they are used with elements (or keys) whose natural ordering is
inconsistent with equals. In particular, such a sorted set (or sorted
map) violates the general contract for set (or map), which is defined
in terms of the equals method.
For example, if one adds two keys a and b such that (!a.equals(b) &&
a.compareTo(b) == 0) to a sorted set that does not use an explicit
comparator, the second add operation returns false (and the size of
the sorted set does not increase) because a and b are equivalent from
the sorted set's perspective.
So, especially with SortedSet (and SortedMap), if the compareTo method returns 0, the set treats the two elements as equal and doesn't add the second one even though the equals method returns false. This causes confusion, as specified in the SortedSet javadoc:
Note that the ordering maintained by a sorted set (whether or not an
explicit comparator is provided) must be consistent with equals if the
sorted set is to correctly implement the Set interface. (See the
Comparable interface or Comparator interface for a precise definition
of consistent with equals.) This is so because the Set interface is
defined in terms of the equals operation, but a sorted set performs
all element comparisons using its compareTo (or compare) method, so
two elements that are deemed equal by this method are, from the
standpoint of the sorted set, equal. The behavior of a sorted set is
well-defined even if its ordering is inconsistent with equals; it just
fails to obey the general contract of the Set interface.
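A minimal sketch of the behaviour the javadoc describes. The `Key` class is hypothetical: its compareTo looks only at `x`, while equals is inherited from Object and therefore compares identity.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class InconsistentWithEquals {
    // Hypothetical key whose natural ordering is inconsistent with equals.
    static final class Key implements Comparable<Key> {
        final int x;
        Key(int x) { this.x = x; }
        @Override public int compareTo(Key o) { return Integer.compare(x, o.x); }
    }

    // Adds two distinct instances with the same x and reports the resulting size.
    static int sizeAfterAdding(Set<Key> set) {
        set.add(new Key(3));
        set.add(new Key(3));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAdding(new HashSet<>())); // 2: equals() says the keys differ
        System.out.println(sizeAfterAdding(new TreeSet<>())); // 1: compareTo() == 0, second add is rejected
    }
}
```

The same data gives two different set sizes depending on the Set implementation, which is the "strange" behaviour the documentation warns about.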
If you don't override the equals method, it inherits its behaviour from the Object class.
This method returns true if and only if the specified object is not null and refers to the same instance.
Suppose the following class:
class VeryStupid implements Comparable<VeryStupid>
{
    public int x;

    @Override
    public int compareTo(VeryStupid o)
    {
        if (o != null)
            return Integer.compare(x, o.x); // avoids the overflow of x - o.x
        else
            return (1);
    }
}
We create 2 instances:
VeryStupid one = new VeryStupid();
VeryStupid two = new VeryStupid();
one.x = 3;
two.x = 3;
The call to one.compareTo(two) returns 0 indicating the instances are equal but the call to one.equals(two) returns false indicating they're not equal.
This is inconsistent.
Consistency of compareTo and equals is not required but strongly recommended.
I'll give it a shot with this example:
private static class Foo implements Comparable<Foo> {
@Override
public boolean equals(Object _other) {
System.out.println("equals");
return super.equals(_other);
}
@Override
public int compareTo(Foo _other) {
System.out.println("compareTo");
return 0;
}
}
public static void main (String[] args) {
Foo a, b;
a = new Foo();
b = new Foo();
a.compareTo(b); // prints 'compareTo', returns 0 => equal
a.equals(b); // just prints 'equals', returns false => not equal
}
You can see that your (maybe very important and complicated) comparison code is ignored when you use the default equals method.
The method int compareTo(T o) lets you know whether o is (in some way) greater or less than this, which is what allows you to order a list of T.
Inside int compareTo(T o) you have to answer:
is o an instance of this class? => true/false;
is o equal to this? => true/false;
is o greater than this? => true/false;
is o less than this? => true/false;
So you see that compareTo already contains an equality test, and the best way to avoid implementing equality twice is to put it in the boolean equals(Object obj) method.
The tutorial Object Ordering refers to the concept of "natural ordering":
If the List consists of String elements, it will be sorted into
alphabetical order. If it consists of Date elements, it will be sorted
into chronological order. How does this happen? String and Date both
implement the Comparable interface. Comparable implementations provide
a natural ordering for a class, which allows objects of that class to
be sorted automatically. The following table summarizes some of the
more important Java platform classes that implement Comparable.
Is the term "natural ordering" specific to Java, or language-independent? For example, could I talk about "natural ordering" in Ruby?
(Note: I'm not talking about Natural sort order, mentioned in Jeff Atwood's blog post Sorting for Humans : Natural Sort Order)
This is not a reference to the type of natural ordering where numbers inside of strings are sorted "naturally" instead of lexicographically digit-by-digit. Java defines the term differently.
Let's change the emphasis:
Comparable implementations provide a natural ordering for a class, which allows objects of that class to be sorted automatically.
The word "natural" means that if you implement Comparable then users of your class can sort it easily without needing a custom comparator. The sorting is natural; it's built in; it's free, no thinking required.
Is the term "natural ordering" specific to Java, or language-independent?
Yes. No. Both? It's specific to Java insofar as the documentation italicizes the term and there is a naturalOrder method. The concept is applicable to other languages, though, sure.
For example, could I talk about "natural ordering" in Ruby?
You could. If you were talking off the cuff you could use the term. If you were writing formally it would be prudent to define it. Because of the confusion with Atwood's use of the term, I'd prefer a different one. Say, "default ordering".
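To make the distinction concrete, here is a small sketch contrasting the natural ordering of String (its compareTo, reachable through `Comparator.naturalOrder()`) with an explicit custom comparator. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class NaturalOrderDemo {
    // Sorts by the natural ordering, i.e. String.compareTo.
    static List<String> sortedNaturally(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Sorts by an explicit comparator that has nothing to do with compareTo.
    static List<String> sortedByLength(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.comparingInt(String::length));
        return copy;
    }

    public static void main(String[] args) {
        List<String> fruit = Arrays.asList("pear", "fig", "banana");
        System.out.println(sortedNaturally(fruit)); // [banana, fig, pear]
        System.out.println(sortedByLength(fruit));  // [fig, pear, banana]
    }
}
```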
I believe the term has a specific meaning in Java
At least in the official documentation.
From the API doc for the Comparable interface:
This interface imposes a total ordering on the objects of each class that implements it. This ordering is referred to as the class's natural ordering, and the class's compareTo method is referred to as its natural comparison method.
Suppose we are very familiar with this line from the API doc. Then, when we see a class A that implements the Comparable interface and someone mentions the "natural order" of two instances of A, we know that they might be talking about the order imposed by the compareTo method. The question is whether the person using the term also knows about this specific meaning and is using it in that sense.
Of course the API doc is using the term that way because the first thing it does is to define it.
No, the term natural ordering is not Java specific. My opinion is it doesn't pertain to programming languages specifically at all; rather, likely all programming languages (although they may or may not use the term explicitly) rely on the concept.
For example, how would a method like max(a, b) or a.after(b) work if there wasn't a concept of natural ordering? We know the natural order of integers: 1, 2, 3, ...; dates 1/1/1990, 1/2/1990, 1/1/1991, ...; time: 12:00, 12:01, 1:01 ... These systems are human-defined but it's what we expect. If we had a number of integers ordered 1, 3, 2, 4 it would be unnatural.
Your quote, as I read it, suggests just that. A type that implements the Comparable interface provides a "default", well-defined, or expected ordering. For developer-defined types it's up to the developer to enforce the natural ordering (as Java developers did with Numbers) or define the natural ordering as we often do with complex types of our own.
When a class implements the Comparable interface it provides a compile-time (natural) ordering only to be altered by providing a custom Comparator. Still, there are a limited number of objects or systems we can represent in software that have a true, well-understood, and accepted natural order. Many types like Students, Cars, and Users can depend on one or a combination of attributes that determine their order which may not seem natural at all.
This can be achieved by implementing a new Comparator<String> and passing it to the Collections.sort(list, comparator) method.
@Override
public int compare(String s1, String s2) {
int len1 = s1.length();
int len2 = s2.length();
int lim = Math.min(len1, len2);
char v1[] = s1.toCharArray();
char v2[] = s2.toCharArray();
int k = 0;
while (k < lim) {
char c1 = v1[k];
char c2 = v2[k];
if (c1 != c2) {
if(this.isInteger(c1) && this.isInteger(c2)) {
int i1 = grabContinousInteger(v1, k);
int i2 = grabContinousInteger(v2, k);
return i1 - i2;
}
return c1 - c2;
}
k++;
}
return len1 - len2;
}
private boolean isInteger(char c) {
return c >= 48 && c <= 57; // ascii value of 0-9
}
private int grabContinousInteger(char[] arr, int k) {
int i = k;
while(i < arr.length && this.isInteger(arr[i])) {
i++;
}
return Integer.parseInt(new String(arr, k, i - k));
}
I'm sorting an array of objects. The objects have lots of fields but I only care about one of them. So, I wrote a comparator:
Collections.sort(details, new Comparator<MyObj>() {
@Override
public int compare(MyObj d1, MyObj d2) {
if (d1.getDate() == null && d2.getDate() == null) {
return 0;
} else if (d1.getDate() == null) {
return -1;
} else if (d2.getDate() == null) {
return 1;
}
if (d1.getDate().before(d2.getDate())) return 1;
else if (d1.getDate().after(d2.getDate())) return -1;
else return 0;
}
});
From the perspective of my use case, this Comparator does all it needs to, even if I might consider this sorting non-deterministic. However, I wonder if this is bad code. Through this Comparator, two very distinct objects could be considered "the same" ordering even if they are unequal objects. I decided to use hashCode as a tiebreaker, and it came out something like this:
Collections.sort(details, new Comparator<MyObj>() {
@Override
public int compare(MyObj d1, MyObj d2) {
if (d1.getDate() == null && d2.getDate() == null) {
return d1.hashCode();
} else if (d1.getDate() == null) {
return -1;
} else if (d2.getDate() == null) {
return 1;
}
if (d1.getDate().before(d2.getDate())) return 1;
else if (d1.getDate().after(d2.getDate())) return -1;
else return d1.hashCode() - d2.hashCode();
}
});
(what I return might be backwards, but that is not important to this question)
Is this necessary?
EDIT:
To anyone else looking at this question, consider using Google's ordering API. The logic above was replaced by:
return Ordering.<Date> natural().reverse().nullsLast().compare(d1.getDate(), d2.getDate());
Through this comparator, two very distinct objects could be considered "the same" ordering even if they are unequal objects.
That really doesn't matter; it's perfectly fine for two objects to compare as equal even if they are not "equal" in any other sense. Collections.sort is a stable sort, meaning objects that compare as equal come out in the same order they came in; that's equivalent to just using "the index in the input" as a tiebreaker.
(Also, your new Comparator is actually significantly more broken than the original. return d1.hashCode() is particularly nonsensical, and return d1.hashCode() - d2.hashCode() can lead to nontransitive orderings that will break Collections.sort, because of overflow issues. Unless both integers are definitely nonnegative, which hashCodes aren't, always use Integer.compare to compare integers.)
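The overflow is easy to see with two concrete values. This sketch compares the subtraction idiom with `Integer.compare`; the class and method names are made up for the example.

```java
class SubtractionOverflow {
    // The idiom from the question: the sign of a - b is used as the comparison result.
    static int bySubtraction(int a, int b) {
        return a - b;
    }

    public static void main(String[] args) {
        int a = Integer.MAX_VALUE;
        int b = -1;
        // a - b wraps around to Integer.MIN_VALUE, so subtraction claims a < b.
        System.out.println(bySubtraction(a, b) < 0);   // true (wrong answer)
        System.out.println(Integer.compare(a, b) > 0); // true (correct: a > b)
    }
}
```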
This mostly matters only if the objects implement Comparable.
It is strongly recommended (though not required) that natural orderings be consistent with equals. This is so because sorted sets (and sorted maps) without explicit comparators behave "strangely" when they are used with elements (or keys) whose natural ordering is inconsistent with equals. In particular, such a sorted set (or sorted map) violates the general contract for set (or map), which is defined in terms of the equals method.
For example, if one adds two keys a and b such that (!a.equals(b) && a.compareTo(b) == 0) to a sorted set that does not use an explicit comparator, the second add operation returns false (and the size of the sorted set does not increase) because a and b are equivalent from the sorted set's perspective.
However, you're not doing that, you're using a custom Comparator, probably for presentation reasons. Since this sorting metric isn't inherently attached to the object, it doesn't matter that much.
As an aside, why not just return 0 instead of messing with the hashCodes? Then they will preserve the original order if the dates match, because Collections.sort is a stable sort. I agree with @LouisWasserman that using hashCode in this way can have potentially very bizarre consequences, mostly relating to integer overflow. Consider the case where d1.hashCode() is positive and d2.hashCode() is negative, and vice versa.
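A sketch of the "no tiebreaker" approach using only the JDK (Java 8 or later). `Detail` is a hypothetical stand-in for MyObj; the comparator mirrors the question's original logic (null dates first, then newest first) and returns 0 for equal dates, leaving ties to the stable sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Date;
import java.util.List;

class StableDateSort {
    // Hypothetical stand-in for MyObj: a label plus a date that may be null.
    static final class Detail {
        final String name;
        final Date date;
        Detail(String name, Date date) { this.name = name; this.date = date; }
        Date getDate() { return date; }
    }

    // Null dates first, then newest first; equal dates compare as 0.
    static final Comparator<Detail> BY_DATE =
            Comparator.comparing(Detail::getDate,
                    Comparator.nullsFirst(Comparator.<Date>reverseOrder()));

    static List<String> sortedNames(List<Detail> details) {
        List<Detail> copy = new ArrayList<>(details);
        copy.sort(BY_DATE); // stable: ties keep their input order
        List<String> names = new ArrayList<>();
        for (Detail d : copy) names.add(d.name);
        return names;
    }

    public static void main(String[] args) {
        List<Detail> details = Arrays.asList(
                new Detail("x", new Date(100L)),
                new Detail("y", null),
                new Detail("z", new Date(100L)),
                new Detail("w", new Date(200L)));
        System.out.println(sortedNames(details)); // [y, w, x, z]
    }
}
```

x and z share a date and come out in the order they went in, which is the determinism the hashCode tiebreaker was trying to provide.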
I have a list of objects which implement Comparable.
I want to sort this list and that is why I used the Comparable.
Each object has a field, weight, that is composed of 3 other int member variables.
The compareTo returns 1 for the object with the most weight.
Having the most weight is not simply a matter of
weightObj1.member1 > weightObj2.member1
weightObj1.member2 > weightObj2.member2
weightObj1.member3 > weightObj2.member3
but is actually a little more complicated, and I end up with code with too many conditional ifs.
If weightObj1.member1 > weightObj2.member1 holds, then I care whether weightObj1.member2 > weightObj2.member2, and vice versa.
Else, if weightObj1.member2 > weightObj2.member2 holds, then I care whether weightObj1.member3 > weightObj2.member3, and vice versa.
Finally, if weightObj1.member3 > weightObj2.member3 holds AND a specific condition is met, then weightObj1 wins, and vice versa.
I was wondering: is there a design approach for something like this?
You can try with CompareToBuilder from Apache commons-lang:
public int compareTo(Object o) {
MyClass myClass = (MyClass) o;
return new CompareToBuilder()
.appendSuper(super.compareTo(o))
.append(this.field1, myClass.field1)
.append(this.field2, myClass.field2)
.append(this.field3, myClass.field3)
.toComparison();
}
See also
How write universal comparator which can make sorting through all necessary fields?
Group Comparator, Bean Comparator and Column Comparator
Similar to the above-mentioned Apache CompareToBuilder, but including generics support, Guava provides ComparisonChain:
public int compareTo(Foo that) {
return ComparisonChain.start()
.compare(this.aString, that.aString)
.compare(this.anInt, that.anInt)
.compare(this.anEnum, that.anEnum, Ordering.natural().nullsLast())
// you can specify comparators
.result();
}
The API for Comparable states:
It is strongly recommended (though not required) that natural
orderings be consistent with equals.
Since the values of interest are int values you should be able to come up with a single value that captures all comparisons and other transformations you need to compare two of your objects. Just update the single value when any of the member values change.
You can try using reflection, iterate over properties and compare them.
You can try something like this:
int c1 = o1.m1 - o2.m1;
if (c1 != 0) {
return c1;
}
int c2 = o1.m2 - o2.m2;
if (c2 != 0) {
return c2;
}
return o1.m3 - o2.m3;
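This works because compareTo need not return just -1, 0 or 1. It can return any integer value, and only the sign is considered. Note that the subtraction can overflow when the values are far apart; Integer.compare avoids that.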
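With Java 8 or later, the same field-by-field comparison can be sketched as a comparator chain. The `Weight` class and its fields `m1`, `m2`, `m3` are assumptions standing in for the question's weight object; the extra "specific condition" from the question is not modelled.

```java
import java.util.Comparator;

class Weight implements Comparable<Weight> {
    final int m1, m2, m3;

    Weight(int m1, int m2, int m3) {
        this.m1 = m1;
        this.m2 = m2;
        this.m3 = m3;
    }

    // Compare m1 first; fall through to m2, then m3, only on ties.
    private static final Comparator<Weight> ORDER =
            Comparator.comparingInt((Weight w) -> w.m1)
                    .thenComparingInt(w -> w.m2)
                    .thenComparingInt(w -> w.m3);

    @Override
    public int compareTo(Weight other) {
        return ORDER.compare(this, other);
    }

    public static void main(String[] args) {
        System.out.println(new Weight(1, 2, 3).compareTo(new Weight(1, 2, 4)) < 0);                  // true
        System.out.println(new Weight(Integer.MAX_VALUE, 0, 0).compareTo(new Weight(-1, 0, 0)) > 0); // true, no overflow
    }
}
```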
I have a bunch of objects of a class Puzzle. I have overridden equals() and hashCode(). When it comes time to present the solutions to the user, I'd like to filter out all the Puzzles that are "similar" (by the standard I have defined), so the user only sees one of each.
Similarity is transitive.
Example:
Result of computations:
A (similar to D)
B (similar to C)
C
D
In this case, only A or D and B or C would be presented to the user - but not two similar Puzzles. Two similar puzzles are equally valid. It is only important that they are not both shown to the user.
To accomplish this, I wanted to use an ADT that prohibits duplicates. However, I don't want to change the equals() and hashCode() methods to return a value about similarity instead. Is there some Equalator, like Comparator, that I can use in this case? Or is there another way I should be doing this?
The class I'm working on is a Puzzle that maintains a grid of letters. (Like Scrabble.) If a Puzzle contains the same words, but is in a different orientation, it is considered to be similar. So the following puzzle:
(2, 2): A
(2, 1): C
(2, 0): T
Would be similar to:
(1, 2): A
(1, 1): C
(1, 0): T
Okay you have a way of measuring similarity between objects. That means they form a Metric Space.
The question is, is your space also a Euclidean space like normal three dimensional space, or integers or something like that? If it is, then you could use a binary space partition in however many dimensions you've got.
(The question is, basically: is there a homomorphism between your objects and an n-dimensional real number vector? If so, then you can use techniques for measuring closeness of points in n-dimensional space.)
Now, if it's not a Euclidean space then you've got a bigger problem. An example of a non-Euclidean space that programmers might be most familiar with would be the Levenshtein distance between two strings.
If your problem is similar to seeing how similar a string is to a list of already existing strings then I don't know of any algorithms that would do that without O(n²) time. Maybe there are some out there.
But another important question is: how much time do you have? How many objects? If you have time or if your data set is small enough that an O(n²) algorithm is practical, then you just have to iterate through your list of objects to see if it's below a certain threshold. If so, reject it.
Just extend AbstractCollection and override the add method. Use an ArrayList or whatever as the backing store. Your code would look kind of like this:
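import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

abstract class SimilarityRejector<T> extends AbstractCollection<T> {
    private final List<T> base = new ArrayList<T>();
    private final double threshold;

    public SimilarityRejector(double threshold) {
        this.threshold = threshold;
    }

    // Assumed hook: distance between two items (smaller means more similar), supplied by the subclass.
    protected abstract double similarityComparison(T a, T b);

    @Override
    public boolean add(T t) {
        for (T compare : base) {
            if (similarityComparison(t, compare) < threshold) {
                return false; // too similar to an item already kept
            }
        }
        return base.add(t);
    }

    @Override
    public Iterator<T> iterator() {
        return base.iterator();
    }

    @Override
    public int size() {
        return base.size();
    }
}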
etc. Obviously T would need to be a subclass of some class that you can perform a comparison on. If you have a Euclidean metric, then you can use a space partition, rather than going through every other item.
I'd use a wrapper class that overrides equals and hashCode accordingly.
private static class Wrapper {
    public final Puzzle puzzle; // an instance field; a static final could not be assigned in the constructor

    public Wrapper(Puzzle puzzle) {
        this.puzzle = puzzle;
    }

    @Override
    public boolean equals(Object object) {
        // ...
    }

    @Override
    public int hashCode() {
        // ...
    }
}
and then you wrap all your puzzles, put them in a map, and get them out again…
public Collection<Collection<Puzzle>> method(Collection<Puzzle> puzzles) {
    // Groups puzzles by wrapper, i.e. by the wrapper's equals/hashCode.
    Map<Wrapper, Collection<Puzzle>> map = new HashMap<Wrapper, Collection<Puzzle>>();
    for (Puzzle each : puzzles) {
        Wrapper wrapper = new Wrapper(each);
        Collection<Puzzle> coll = map.get(wrapper);
        if (coll == null) map.put(wrapper, coll = new ArrayList<Puzzle>());
        coll.add(each);
    }
    return map.values();
}
Create a TreeSet using your Comparator
Add all elements into the set
All duplicates are stripped out
Normally "similarity" is not a transitive relationship. So the first step would be to think of this in terms of equivalence rather than similarity. Equivalence is reflexive, symmetric and transitive.
Easy approach here is to define a puzzle wrapper whose equals() and hashCode() methods are implemented according to the equivalence relation in question.
Once you have that, drop the wrapped objects into a java.util.Set and that filters out duplicates.
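One way this could look, as a sketch: the wrapper delegates equals() and hashCode() to a canonical key computed from the wrapped object. Here a "puzzle" is simplified to a list of words and the key is the set of words, which is an assumption made for the example; the names `EquivalenceFilter`, `distinctBy` and `wordSet` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

class EquivalenceFilter {
    // Wraps a value; equality and hashing are delegated to the canonical key.
    static final class Wrapper<T, K> {
        final T value;
        final K key;
        Wrapper(T value, K key) { this.value = value; this.key = key; }
        @Override public boolean equals(Object o) {
            return o instanceof Wrapper && key.equals(((Wrapper<?, ?>) o).key);
        }
        @Override public int hashCode() { return key.hashCode(); }
    }

    // Keeps the first representative of every equivalence class, in input order.
    static <T, K> List<T> distinctBy(Collection<T> items, Function<? super T, ? extends K> keyOf) {
        Set<Wrapper<T, K>> seen = new LinkedHashSet<>();
        for (T item : items) {
            seen.add(new Wrapper<T, K>(item, keyOf.apply(item)));
        }
        List<T> result = new ArrayList<>();
        for (Wrapper<T, K> w : seen) result.add(w.value);
        return result;
    }

    // Assumed canonical form: two puzzles are equivalent when they contain the same words.
    static Set<String> wordSet(List<String> puzzle) {
        return new TreeSet<>(puzzle);
    }

    public static void main(String[] args) {
        List<List<String>> puzzles = Arrays.asList(
                Arrays.asList("CAT", "DOG"),
                Arrays.asList("DOG", "CAT"),
                Arrays.asList("BIRD"));
        System.out.println(distinctBy(puzzles, EquivalenceFilter::wordSet)); // [[CAT, DOG], [BIRD]]
    }
}
```

The puzzle class itself keeps its own equals() and hashCode(); only the wrapper encodes the equivalence relation.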
IMHO, the most elegant way was described by Gili (TreeSet with a custom Comparator).
But if you'd like to do it yourself, this seems the easiest and clearest solution:
/**
* Distinct input list values (cuts duplications)
 * @param items items to process
 * @param comparator comparator to recognize equal items
 * @return new collection with unique values
*/
public static <T> Collection<T> distinctItems(List<T> items, Comparator<T> comparator) {
List<T> result = new ArrayList<>();
for (int i = 0; i < items.size(); i++) {
T item = items.get(i);
boolean exists = false;
for (int j = 0; j < result.size(); j++) {
if (comparator.compare(result.get(j), item) == 0) {
exists = true;
break;
}
}
if (!exists) {
result.add(item);
}
}
return result;
}