The NavigableSet interface offers a number of useful methods that a normal Set does not (specifically I'm thinking about methods like headSet and tailSet for instance). However, being a Set, it does not support duplicate elements. Also, being a SortedSet, the ordering must be consistent with equals and hashCode to avoid violating the contract of the Set interface.
Is there any good alternative data structure for when there might be duplicate elements or multiple elements that are "equal" according to the natural ordering or Comparator but not "equal" according to the equals method? As a motivating example, consider the following code that shows why a NavigableSet is not appropriate:
public class Foo implements Comparable<Foo>{
double x;
double y;
@Override
public int compareTo(Foo o) {
return Double.compare(x, o.x); // only x matters for sort order
}
public static void main(String...args){
Foo a = new Foo();
a.x = 1;
a.y = 2;
Foo b = new Foo();
b.x = 1;
b.y = 42;
Foo c = new Foo();
c.x = 2;
c.y = 12.34;
NavigableSet<Foo> set = new TreeSet<Foo>();
set.add(a);
set.add(a);
set.add(b);
set.add(c);
System.out.println(set.size());
}
}
Notice that element a only gets added once (of course, since this is a Set). Also, notice that b does not get added, since there is already an element for which the comparison returns 0.
I felt like this was probably a fairly common thing, so I hoped to find an existing implementation rather than rolling my own. Is there a good, widely-used data structure for my purposes?
I'll add that while writing this question I did come across the Biscotti Project, but a) I'm not convinced it solves the comparison/equals issue and b) the FAQ explicitly says it's not really safe to use.
Let me reformulate your question to make sure I understand it well.
The need for headSet and tailSet implies that the collection has to be sorted, which conflicts with the need to allow members that are duplicates according to compareTo.
The conflict comes from how such a collection is used efficiently. Adding a member to a sorted collection uses the compareTo method to find the insertion point in O(log n), essentially a binary search followed by an add. TreeSet is implemented on top of TreeMap, which can't hold two members that are equal according to compareTo.
What you are looking for won't be efficient.
You may try a simple ArrayList, sort it with Collections.sort, and then use the subList method. The problem with this is that it doesn't deal with duplicates at all.
You may also use a LinkedHashSet, which deals with duplicates (according to equals(); it is unaffected by compareTo()), but it is not sorted. Of course, you can convert the LinkedHashSet to a SortedSet by passing it to the SortedSet implementation's constructor.
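The sorted-list idea can be sketched as follows. This is only a sketch under assumptions: the class SortedListSketch, the two-argument Foo constructor, and the helpers lowerBound, head and tail are hypothetical names, not part of any library. It keeps every element (including ones that compare as equal) in a list sorted by x and emulates headSet and tailSet with subList.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortedListSketch {
    // Hypothetical stand-in for the Foo class from the question.
    static final class Foo {
        final double x, y;
        Foo(double x, double y) { this.x = x; this.y = y; }
    }

    // Builds a list sorted by x only; elements with equal x are all kept.
    static List<Foo> sample() {
        List<Foo> list = new ArrayList<>();
        list.add(new Foo(2, 12.34));
        list.add(new Foo(1, 2));
        list.add(new Foo(1, 42));
        list.sort(Comparator.comparingDouble(f -> f.x));
        return list;
    }

    // Index of the first element whose x is >= bound (binary search).
    static int lowerBound(List<Foo> sortedByX, double bound) {
        int lo = 0, hi = sortedByX.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedByX.get(mid).x < bound) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Elements with x strictly less than bound, similar to headSet(bound).
    static List<Foo> head(List<Foo> sortedByX, double bound) {
        return sortedByX.subList(0, lowerBound(sortedByX, bound));
    }

    // Elements with x greater than or equal to bound, similar to tailSet(bound).
    static List<Foo> tail(List<Foo> sortedByX, double bound) {
        return sortedByX.subList(lowerBound(sortedByX, bound), sortedByX.size());
    }

    public static void main(String[] args) {
        List<Foo> list = sample();
        System.out.println(list.size());            // 3
        System.out.println(head(list, 2.0).size()); // 2
        System.out.println(tail(list, 2.0).size()); // 1
    }
}
```

Inserting into such a list costs O(n) because elements have to shift, so this probably suits data that is built once and queried often.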
Related
I'm curious, is there any Set that only requires .equals() to determine the uniqueness?
When looking at Set classes from java.util, I can only find HashSet which needs .hashCode() and TreeSet (or generally SortedSet) which requires Comparator. I cannot find any class that use only .equals().
Does it make sense that if I have .equals() method, it is sufficient to use it to determine object uniqueness? Thus have a Set implementation that only need to use .equals()? Or did I miss something here that .equals() are not sufficient to determine object uniqueness in Set implementation?
Note that I am aware of Java practice that if we override .equals(), we should override .hashCode() as well to maintain contract defined in Object.
On its own, the equals method is perfectly sufficient to implement a set correctly, but not to implement it efficiently.
The point of a hash code or a comparator is that they provide ways to arrange objects in some ordered structure (a hash table or a tree) which allows for fast finding of objects. If you have only the equals method for comparing pairs of objects, you can't arrange the objects in any meaningful or clever order; you have only a loose jumble of objects.
For example, with only the equals method, ensuring that objects in a set are unique requires comparing each added object to every other object in the jumble. Adding n objects requires n * (n - 1) / 2 comparisons. For 5 objects that's 10 comparisons, which is fine, but for 1,000 objects that's 499,500 comparisons. It scales terribly.
Because it would not give scalable performance, no such set implementation is in the standard library.
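The comparison count can be checked with a small experiment. In this sketch the class EqualsCountDemo, the Key class and the counter are hypothetical names made up for illustration; it adds n distinct objects to an ArrayList, checking with contains before each add, and counts how often equals runs.

```java
import java.util.ArrayList;
import java.util.List;

public class EqualsCountDemo {
    // Counts every call to Key.equals (illustrative only, not thread-safe).
    static long equalsCalls = 0;

    static final class Key {
        final int id;
        Key(int id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override
        public int hashCode() { return id; }
    }

    // Adds n distinct keys to an unordered "jumble" and returns the number of equals calls.
    static long countComparisons(int n) {
        equalsCalls = 0;
        List<Key> jumble = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Key k = new Key(i);
            if (!jumble.contains(k)) { // compares k against every stored key
                jumble.add(k);
            }
        }
        return equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println(countComparisons(5));    // 10
        System.out.println(countComparisons(1000)); // 499500
    }
}
```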
If you don't care about hash table performance, this is a minimal implementation of the hashCode method which works for any class:
@Override
public int hashCode() {
return 0; // or any other constant
}
Although it is required that equal objects have equal hash codes, it is never required for correctness that unequal objects have unequal hash codes, so returning a constant is legal. If you put these objects in a HashSet or use them as HashMap keys, they will end up in a jumble in a single hash table bucket. Performance will be bad, but it will work correctly.
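To illustrate that a constant hash code is still correct, here is a minimal sketch. The class names ConstantHashDemo and Point are hypothetical; the point is only that HashSet falls back on equals inside the single bucket.

```java
import java.util.HashSet;
import java.util.Set;

public class ConstantHashDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }

        @Override
        public int hashCode() {
            return 0; // legal but slow: every Point lands in the same bucket
        }
    }

    // Adds two equal points and one different point; duplicates are still rejected.
    static int distinctCount() {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2));
        set.add(new Point(3, 4));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount()); // 2
    }
}
```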
Also, for what it's worth, a minimal working Set implementation which only ever uses the equals method would be:
public class ArraySet<E> extends AbstractSet<E> {
private final ArrayList<E> list = new ArrayList<>();
@Override
public boolean add(E e) {
if (!list.contains(e)) {
list.add(e);
return true;
}
return false;
}
@Override
public Iterator<E> iterator() {
return list.iterator();
}
@Override
public int size() {
return list.size();
}
}
The set stores objects in an ArrayList, and uses list.contains to call equals on objects. Inherited methods from AbstractSet and AbstractCollection provide the bulk of the functionality of the Set interface; for example its remove method gets implemented via the list iterator's remove method. Each operation to add or remove an object or test an object's membership does a comparison against every object in the set, so it scales terribly, but works correctly.
Is this useful? Maybe, in certain special cases. For sets that are known to be very tiny, the performance might be fine, and if you have millions of these sets, this could save memory compared to a HashSet.
In general, though, it is better to write meaningful hash code methods and comparators, so you can have sets and maps that scale efficiently.
You should always override hashCode() when you override equals(). The contract for Object clearly specifies that two equal objects have identical hash codes, and a surprising number of data structures and algorithms depend on this behavior. It's not difficult to add a hashCode(), and if you skip it now, you'll eventually get hard-to-diagnose bugs when your objects start getting put in hash-based structures.
It would mathematically make sense to have a set that requires nothing but .equals().
But such an implementation would be so slow (linear time for every operation) that it has been decided that you can always give a hint.
Anyway, if there is really no way you can write a hashCode(), just make it always return 0 and you will have a structure that is as slow as the one you hoped for!
Consider the following compareTo method, implementing the Comparable<T> interface:
@Override
public int compareTo(MyObject o)
{
if (o.value.equals(value))
return 0;
return 1;
}
Apparently, the programmer implemented compareTo as if it were equals(). Obviously a mistake. I would expect this to cause Collections.sort() to crash, but it doesn't. Instead it just gives an arbitrary result: the sorted result depends on the initial ordering.
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
public class MyObject implements Comparable<MyObject>
{
public static void main(String[] args)
{
List<MyObject> objects =
Arrays.asList(new MyObject[] {
new MyObject(1), new MyObject(2), new MyObject(3)
});
Collections.sort(objects);
System.out.println(objects);
List<MyObject> objects2 =
Arrays.asList(new MyObject[] {
new MyObject(3), new MyObject(1), new MyObject(2)
});
Collections.sort(objects2);
System.out.println(objects2);
}
public int value;
public MyObject(int value)
{
this.value = value;
}
@Override
public int compareTo(MyObject o)
{
if (value == o.value)
return 0;
return 1;
}
public String toString()
{
return "" + value;
}
}
Result:
[3, 2, 1]
[2, 1, 3]
Can we come up with a use case for this curious implementation of compareTo, or is it always invalid? And in case of the latter, should it throw an exception, or perhaps not even compile?
There's no reason for it to crash or throw an exception.
You're required to fulfil the contract when you implement the method, but if you don't, it just means that you'll get arbitrary results from anything that relies on it. Nothing is going to go out of its way to check the correctness of your implementation, because that would just slow everything down.
A sorting algorithm's efficiency is defined in terms of the number of comparisons it makes. That means that it's not going to add in extra comparisons just to check that your implementation is consistent, any more than a HashMap is going to call .hashCode() on everything twice just to check it gives the same result both times.
If it happens to spot a problem during the course of sorting, then it might throw an exception; but don't rely on it.
Violating the contract of Comparable or Comparator does not necessarily result in an exception. The sort method won’t spend additional efforts to detect such a situation. Therefore, it might result in an inconsistent order, an apparently correct result or in an exception being thrown.
The actual behavior depends on the input data and the current implementation. E.g. Java 7 introduced TimSort in its sort implementation, which is more likely to throw an exception for inconsistent Comparable or Comparator implementations than the implementations of earlier Java releases. This might spot errors that remained undetected when using previous Java versions; however, that’s not a feature to aid debugging in the first place, it’s just a side-effect of more sophisticated optimizations.
Note that it isn’t entirely impossible for a compiler or code audit tool to detect asymmetrical behavior of a compare method for simple cases like in your question. However, as far as I know, there are no compilers performing such a check automatically. If you want to be on the safe side, you should always implement unit tests for classes implementing Comparable or Comparator.
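Such a unit test can be as simple as checking the sign-symmetry rule of the contract, sgn(compare(a, b)) == -sgn(compare(b, a)), over a handful of sample values. The following is a minimal sketch; the class ComparatorContractCheck and the helper isAntisymmetric are hypothetical names, and the "broken" comparator mimics the compareTo from the question on plain Integers.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ComparatorContractCheck {
    // Returns true if sgn(compare(a, b)) == -sgn(compare(b, a)) for every pair of samples.
    static <T> boolean isAntisymmetric(Comparator<? super T> cmp, List<T> samples) {
        for (T a : samples) {
            for (T b : samples) {
                int ab = Integer.signum(cmp.compare(a, b));
                int ba = Integer.signum(cmp.compare(b, a));
                if (ab != -ba) {
                    return false;
                }
            }
        }
        return true;
    }

    // Mimics the questionable compareTo: 0 when equal, otherwise always 1.
    static final Comparator<Integer> BROKEN = (a, b) -> a.equals(b) ? 0 : 1;

    // A correct comparator for reference.
    static final Comparator<Integer> NATURAL = Comparator.naturalOrder();

    public static void main(String[] args) {
        List<Integer> samples = Arrays.asList(1, 2, 3);
        System.out.println(isAntisymmetric(BROKEN, samples));  // false
        System.out.println(isAntisymmetric(NATURAL, samples)); // true
    }
}
```

A fuller test would also check transitivity and, where relevant, consistency with equals; this sketch covers only the symmetry rule.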
According to the documentation, Collections.sort() uses a variant of merge sort, which divides the list into multiple sublists and then repeatedly merges those sublists. The sorting itself is done during the merging, so if your comparison method is arbitrary, the merging will be done in an arbitrary order.
As a result of this every element is bigger than all the other elements.
Depending on the mathematical group or order you want to represent it may be a valid case and therefore there is no reason to throw any errors. However the example you show does not represent the natural ordering of numbers as you know them by standard.
The presented order is not total.
==EDIT==
Thanks for the comment, indeed it is not allowed by the specification to use the compareTo method to implement non-total or non-antisymmetric orders.
I have a class with two variables named x and y. In this class I have overridden the equals and hashCode methods to compare two objects of this class. But our requirement is to compare two objects of this class sometimes on the basis of x and sometimes on the basis of y. Is this possible dynamically in Java?
Edit:
I have one more class named B with two methods, m1 and m2. I want to compare objects of the above class so that when the comparison is triggered from m1 (for sorting) the objects are compared on the basis of x, and when it is triggered from m2 (for sorting) they are compared on the basis of y.
Changing behavior based on which method called yours is possible, but you shouldn't do it, for several reasons:
It violates the equals contract, thus breaking several algorithms designed to handle collections.
The result of the comparison can no longer be known without knowing the caller, which is a hard dependency that's prone to break.
However, if you insist that you need it, you can do something like this:
StackTraceElement[] stackTraceElements = Thread.currentThread().getStackTrace();
if (stackTraceElements.length < 3)
{
// do something when last method to call is not available
// probably you'll want to return something
}
String callerMethod = stackTraceElements[2].getMethodName();
if (callerMethod.equals("m1"))
{
// something
} else
{
// something else
}
This example is simplified as it assumes the method calling this method is the candidate - it can be some method further down the call stack.
As noted, this is not recommended. Instead, use a different comparator for each purpose and pass the relevant comparator to the sort method, so that each context gets its own sort order.
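A minimal sketch of the comparator-per-context approach might look like this. The class PerContextSorting, the nested class A with int fields x and y, and the sample data are assumptions for illustration; m1 and m2 stand in for the two methods mentioned in the question.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PerContextSorting {
    // Hypothetical version of the class from the question.
    static final class A {
        final int x, y;
        A(int x, int y) { this.x = x; this.y = y; }
    }

    static final Comparator<A> BY_X = Comparator.comparingInt(a -> a.x);
    static final Comparator<A> BY_Y = Comparator.comparingInt(a -> a.y);

    static List<A> sample() {
        List<A> list = new ArrayList<>();
        list.add(new A(3, 1));
        list.add(new A(1, 3));
        list.add(new A(2, 2));
        return list;
    }

    // m1 sorts a copy of the input by x.
    static List<A> m1(List<A> input) {
        List<A> copy = new ArrayList<>(input);
        copy.sort(BY_X);
        return copy;
    }

    // m2 sorts a copy of the input by y.
    static List<A> m2(List<A> input) {
        List<A> copy = new ArrayList<>(input);
        copy.sort(BY_Y);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(m1(sample()).get(0).x); // 1
        System.out.println(m2(sample()).get(0).x); // 3
    }
}
```

With this design equals and hashCode stay fixed, and only the ordering varies with the caller.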
Depending on the complexity of the comparison, you can either do this within the class or use two separate comparator classes.
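public boolean equals(Object other){
if(!(other instanceof A)) return false; // A stands for the enclosing class (name assumed)
A that = (A) other;
if(condition){
return x == that.x;
}else{
return y == that.y;
}
}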
or
public boolean equals(Object other){
if(condition == true){
return new CompareX(this, other).compare();
}else{
return new CompareY(this, other).compare();
}
}
You have to extend the comparison logic to a valid one, of course.
Oh and, the same principle applies to the hashCode.
It's not possible to change the behaviour of equals dynamically. You have to use a Comparator to provide the comparison from outside the class.
Since Java 8, lambdas make Comparators easy to use.
There is a static method Comparator.comparing. It creates a Comparator from the method or field you want to compare on.
// A comparator comparing on x
Comparator<A> comp1 = comparing (a -> a.x);
// A comparator comparing on the output of m1
Comparator<A> comp2 = comparing (A::m1);
// A comparator comparing on the output of m1 and when equals, comparing on x
Comparator<A> comp3 = comparing (A::m1).thenComparing (a -> a.x);
From the outside you can decide which comparator to use.
There's a new way to sort your data in Java 8, too:
List<A> data;
List<A> sorted = data.stream ().sorted (comparing (a -> a.x)).collect (Collectors.toList ());
Of course you have to be allowed to use Java 8 for this.
If you can add flag-setting code to m1 and m2, you can modify eis's answer to get rid of the kludgy stack trace stuff.
It is still kludgy.
@SuppressWarnings("unchecked")
public static final Ordering<EmailTemplate> ARBITRARY_ORDERING = (Ordering)Ordering.arbitrary();
public static final Ordering<EmailTemplate> ORDER_BY_NAME = Ordering.natural().nullsFirst().onResultOf(GET_NAME);
public static final Ordering<EmailTemplate> ORDER_BY_NAME_SAFE = Ordering.allEqual().nullsFirst()
.compound(ORDER_BY_NAME)
.compound(ARBITRARY_ORDERING);
Here's the code I use to order EmailTemplate.
If I have a list of EmailTemplate, I want the null elements of the list to appear at the beginning, then the elements with a null name, then the rest in natural name order, and, if they have the same name, in an arbitrary order.
Is this how I am supposed to do it? It seems strange to start the comparator with "allEqual", I think...
I also wonder what's the best way to deal with Ordering.arbitrary(), since it's a static method that returns Ordering<Object>. Is there any elegant way to use it? I don't really like this kind of useless line that comes with a warning:
@SuppressWarnings("unchecked")
public static final Ordering<EmailTemplate> ARBITRARY_ORDERING = (Ordering)Ordering.arbitrary();
By the way, the documentation says:
Returns an arbitrary ordering over all objects, for which compare(a,
b) == 0 implies a == b (identity equality). There is no meaning
whatsoever to the order imposed, but it is constant for the life of the VM.
Does this mean that my object being compared with this Ordering will never be garbage collected?
Regarding the second question: no. Guava uses the identity hash codes of the objects to sort them arbitrarily.
Regarding the first question: I would use a comparison chain to sort by name, then by arbitrary order:
private static class ByNameThenArbitrary implements Comparator<EmailTemplate> {
@Override
public int compare(EmailTemplate e1, EmailTemplate e2) {
return ComparisonChain.start()
.compare(e1.getName(), e2.getName(), Ordering.natural().nullsFirst())
.compare(e1, e2, Ordering.arbitrary())
.result();
}
}
Then I would create the real ordering to order the templates with nulls first:
private static final Ordering<EmailTemplate> ORDER =
Ordering.from(new ByNameThenArbitrary()).nullsFirst();
Not tested, though.
I'm pretty sure you're making it too complicated:
Ordering.arbitrary() works with any Object, and compound doesn't require restricting it to EmailTemplate.
nullsFirst() takes priority whenever null gets compared, so I'd suggest applying it last.
You don't need to define multiple constants; it should all be easy.
I'd go for
public static final Ordering<EmailTemplate> ORDER_BY_NAME_SAFE = Ordering
.natural()
.onResultOf(GET_NAME)
.compound(Ordering.arbitrary())
.nullsFirst();
but I haven't tested it.
What's confusing here is how compound and nullsFirst work together. With the former, this takes precedence, while with the latter the null test wins. Both are logical:
compound works left to right
nullsFirst must test for null first, otherwise we'd get an exception
but taken together it's confusing.
Does this mean that my object being compared with this Ordering will never be garbage collected?
No, it uses weak references. Whenever an object isn't referenced elsewhere, it can be garbage collected. This does not contradict "the ordering is constant for the life of the VM", since an object that no longer exists can't be compared anymore.
Note that Ordering.arbitrary() is indeed arbitrary and based on object's identity rather than on equals, which means that
Ordering.arbitrary().compare(new String("a"), new String("a"))
doesn't return 0.
I wonder if an "equals-compatible arbitrary ordering" could be implemented.
I have a list/collection of objects that may or may not have the same property values. What's the easiest way to get a distinct list of the objects with equal properties? Is one collection type best suited for this purpose? For example, in C# I could do something like the following with LINQ.
var recipients = (from recipient in recipientList
select recipient).Distinct();
My initial thought was to use lambdaj, but it doesn't appear to support this.
return new ArrayList<>(new HashSet<>(recipients));
Use an implementation of the interface Set<T> (class T may need a custom .equals() method, and you may have to implement that .equals() yourself). Typically a HashSet does it out of the box: it uses the Object.hashCode() and Object.equals() methods to compare objects. That should be unique enough for simple objects. If not, you'll have to implement T.equals() and T.hashCode() accordingly.
See Gaurav Saini's comment below for libraries helping to implement equals and hashcode.
Place them in a TreeSet which holds a custom Comparator, which checks the properties you need:
SortedSet<MyObject> set = new TreeSet<MyObject>(new Comparator<MyObject>(){
public int compare(MyObject o1, MyObject o2) {
// return 0 if objects are equal in terms of your properties
}
});
set.addAll(myList); // eliminate duplicates
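As a runnable sketch of the TreeSet approach, assume a hypothetical Recipient class whose identity is its email address (the class names, fields and sample data below are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DistinctByProperty {
    // Hypothetical recipient type; only email decides whether two recipients are "the same".
    static final class Recipient {
        final String name, email;
        Recipient(String name, String email) { this.name = name; this.email = email; }
    }

    // Keeps one recipient per distinct email; the result is sorted by email.
    static List<Recipient> distinctByEmail(List<Recipient> input) {
        Set<Recipient> set = new TreeSet<>(Comparator.comparing((Recipient r) -> r.email));
        set.addAll(input); // elements whose comparator result is 0 are treated as duplicates
        return new ArrayList<>(set);
    }

    static List<Recipient> sample() {
        return Arrays.asList(
                new Recipient("Ann", "a@example.org"),
                new Recipient("Ann B.", "a@example.org"),
                new Recipient("Bob", "b@example.org"));
    }

    public static void main(String[] args) {
        List<Recipient> distinct = distinctByEmail(sample());
        System.out.println(distinct.size());      // 2
        System.out.println(distinct.get(0).name); // Ann
    }
}
```

Note that the result comes back in comparator order rather than in the original list order.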
Java 8:
recipients = recipients.stream()
.distinct()
.collect(Collectors.toList());
See java.util.stream.Stream#distinct.
Order-preserving version of the above response:
return new ArrayList<>(new LinkedHashSet<>(recipients));
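For illustration, a small self-contained sketch of the order-preserving variant (the class name OrderPreservingDistinct and the sample values are made up):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class OrderPreservingDistinct {
    // Removes duplicates (by equals/hashCode) while keeping first-occurrence order.
    static <T> List<T> distinct(List<T> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, 1, 3, 2, 1);
        System.out.println(distinct(values)); // [3, 1, 2]
    }
}
```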
If you're using Eclipse Collections, you can use the method distinct().
ListIterable<Integer> integers = Lists.mutable.with(1, 3, 1, 2, 2, 1);
Assert.assertEquals(
Lists.mutable.with(1, 3, 2),
integers.distinct());
The advantage of using distinct() instead of converting to a Set and then back to a List is that distinct() preserves the order of the original List, retaining the first occurrence of each element. It's implemented by using both a Set and a List.
MutableSet<T> seenSoFar = Sets.mutable.with();
int size = list.size();
for (int i = 0; i < size; i++)
{
T item = list.get(i);
if (seenSoFar.add(item))
{
targetCollection.add(item);
}
}
return targetCollection;
If you cannot convert your original List into an Eclipse Collections type, you can use ListAdapter to get the same API.
MutableList<Integer> distinct = ListAdapter.adapt(integers).distinct();
Note: I am a committer for Eclipse Collections.
You can use a Set. There are a couple of implementations:
HashSet uses an object's hashCode and equals.
TreeSet uses compareTo (defined by Comparable) or compare (defined by Comparator). Keep in mind that the comparison must be consistent with equals. See the TreeSet JavaDocs for more info.
Also keep in mind that if you override equals you must override hashCode such that two equal objects have the same hash code.
The ordinary way of doing this would be to convert to a Set, then back to a List. But you can get fancy with Functional Java. If you liked lambdaj, you'll love FJ.
recipients = recipients
.sort(recipientOrd)
.group(recipientOrd.equal())
.map(List.<Recipient>head_());
You'll need to have defined an ordering for recipients, recipientOrd. Something like:
Ord<Recipient> recipientOrd = ord(new F2<Recipient, Recipient, Ordering>() {
public Ordering f(Recipient r1, Recipient r2) {
return stringOrd.compare(r1.getEmailAddress(), r2.getEmailAddress());
}
});
Works even if you don't have control of equals() and hashCode() on the Recipient class.
Actually lambdaj implements this feature through the selectDistinctArgument method
http://lambdaj.googlecode.com/svn/trunk/html/apidocs/ch/lambdaj/Lambda.html#selectDistinctArgument(java.lang.Object,%20A)