Suppose I need a TreeSet whose elements are sorted by some domain logic. Under this logic the relative order of some unequal elements does not matter, so the compare method can return 0 for them, but in that case I can't put both of them in the TreeSet.
So, the question: what disadvantages will I get from code like this:
class Foo implements Comparable<Foo>{}
new TreeSet<Foo>(new Comparator<Foo>(){
@Override
public int compare(Foo o1, Foo o2) {
int res = o1.compareTo(o2);
if(res == 0 || !o1.equals(o2)){
return o1.hashCode() - o2.hashCode();
}
return res;
}
});
Update:
OK. So there should always be consistency between the methods equals(), hashCode() and compareTo(), as @S.P.Floyd - seanizer and others said.
Would it be better, or even good, if I removed the Comparable interface and moved this logic into a Comparator (I can do it without breaking encapsulation)? So it would be:
class Foo{}
new TreeSet<Foo>(new Comparator<Foo>(){
@Override
public int compare(Foo o1, Foo o2) {
//some logic start
if(strictliBigger(o1, o2)){ return 1;}
if(strictliBigger(o2, o1)){ return -1;}
//some logic end
if(res == 0 || !o1.equals(o2)){
return o1.hashCode() - o2.hashCode();
}
return res;
}
});
Update 2:
Would System.identityHashCode(x) be better than hashCode() if I don't need stable sort?
While this might work, it is far from being a best practice.
From the SortedSet docs:
Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface. (See the Comparable interface or Comparator interface for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a sorted set performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal. The behavior of a sorted set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
For objects that implement Comparable, there should always be consistency between the methods equals(), hashCode() and compareTo().
I'm afraid a SortedSet is just not what you want, nor will a Guava Multiset be adequate (because it will not let you independently retrieve multiple equal items). I think what you need is a SortedList. There is no such beast that I know of (maybe in commons-collections, but those are a bit on the legacy side), so I implemented one for you using Guava's ForwardingList as a base class. In short: this List delegates almost everything to an ArrayList it uses internally, but it uses Collections.binarySearch() in its add() method to find the right insertion position, and it throws an UnsupportedOperationException on all optional methods of the List and ListIterator interfaces that add or set values at a given position.
The Constructors are identical to those of ArrayList, but for each of them there is also a second version with a custom Comparator. If you don't use a custom Comparator, your list elements need to implement Comparable or RuntimeExceptions will occur during sorting.
public class SortedArrayList<E> extends ForwardingList<E> implements
RandomAccess{
private final class ListIteratorImpl extends ForwardingListIterator<E>{
// Create the underlying iterator once; building a new one on every
// delegate() call would restart the iteration each time.
private final ListIterator<E> delegate;
public ListIteratorImpl(final int start){
delegate = inner.listIterator(start);
}
@Override
public void set(E element){throw new UnsupportedOperationException();}
@Override
public void add(E element){throw new UnsupportedOperationException();}
@Override
protected ListIterator<E> delegate(){return delegate;}
}
private Comparator<? super E> comparator;
private List<E> inner;
public SortedArrayList(){this(null, null, null);}
@SuppressWarnings("unchecked")
private SortedArrayList(
final List<E> existing,
final Collection<? extends E> values,
final Comparator<? super E> comparator
){
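this.comparator =
(Comparator<? super E>)
(comparator == null
? Ordering.natural()
: comparator );
// Use the supplied backing list if there is one; otherwise copy the
// initial values (if any) into a fresh ArrayList.
inner = (
existing == null
? (values == null
? new ArrayList<E>()
: new ArrayList<E>(values)
)
: existing);
// Copied values arrive in arbitrary order, so sort them once up front.
if (existing == null && values != null) {
Collections.sort(inner, this.comparator);
}
}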
public SortedArrayList(final Collection<? extends E> c){
this(null, c, null);
}
public SortedArrayList(final Collection<? extends E> c,
final Comparator<? super E> comparator){
this(null, c, comparator);
}
public SortedArrayList(final Comparator<? super E> comparator){
this(null, null, comparator);
}
public SortedArrayList(final int initialCapacity){
this(new ArrayList<E>(initialCapacity), null, null);
}
public SortedArrayList(final int initialCapacity,
final Comparator<? super E> comparator){
this(new ArrayList<E>(initialCapacity), null, comparator);
}
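@Override
public boolean add(final E e){
// binarySearch returns the index of a match, or -(insertionPoint) - 1 if there is none.
final int index = Collections.binarySearch(inner, e, comparator);
inner.add(index >= 0 ? index + 1 : -(index + 1), e);
return true;
}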
@Override
public void add(int i, E e){throw new UnsupportedOperationException();}
@Override
public boolean addAll(final Collection<? extends E> collection){
return standardAddAll(collection);
}
@Override
public boolean addAll(int i,
Collection<? extends E> es){
throw new UnsupportedOperationException();
}
@Override
protected List<E> delegate(){ return inner; }
@Override
public List<E> subList(final int fromIndex, final int toIndex){
return new SortedArrayList<E>(
inner.subList(fromIndex, toIndex),
null,
comparator
);
}
@Override
public ListIterator<E> listIterator(){ return new ListIteratorImpl(0); }
@Override
public ListIterator<E> listIterator(final int index){
return new ListIteratorImpl(index);
}
@Override
public E set(int i, E e){ throw new UnsupportedOperationException(); }
}
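The insertion step is the part that is easiest to get wrong, so here is a minimal, JDK-only sketch of just that step. The class and method names (SortedInsertSketch, insertSorted, demo) are made up for illustration and are not part of the class above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortedInsertSketch {

    // Inserts element into an already sorted list, keeping it sorted.
    static <E> void insertSorted(List<E> list, E element, Comparator<? super E> cmp) {
        int idx = Collections.binarySearch(list, element, cmp);
        // idx >= 0: an element comparing equal sits at idx, so insert right after it.
        // idx <  0: idx encodes -(insertionPoint) - 1, so decode the insertion point.
        int insertionPoint = idx >= 0 ? idx + 1 : -(idx + 1);
        list.add(insertionPoint, element);
    }

    // Builds a list ordered by string length; "bb" and "dd" compare as equal
    // under this comparator, yet both are kept (unlike in a TreeSet).
    static List<String> demo() {
        List<String> list = new ArrayList<>();
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        for (String s : new String[] {"ccc", "a", "bb", "dd"}) {
            insertSorted(list, s, byLength);
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [a, bb, dd, ccc]
    }
}
```

Each insert costs O(log n) for the search plus O(n) for shifting the ArrayList, which is usually acceptable for moderately sized lists.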
Beware: even for two Foos f1, f2 with f1 != f2 you could get f1.hashCode() == f2.hashCode()! That means you won't get a stable sorting with your compare method.
There is no rule in Java which says that the hash codes of two objects must be different just because they aren't equal (so o1.hashCode() - o2.hashCode() could return 0 in your case).
Also, the behavior of equals() should be consistent with the results from compareTo(). This is not a must, but if you can't maintain this, it suggests that your design has a big flaw.
I strongly suggest looking at the other fields of the objects and using some of those to extend your comparison, so that you get a value != 0 for objects where equals() == false.
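To make the collision problem concrete, here is a small sketch using two well-known strings that happen to share a hash code. The class name HashCollisionDemo and the BY_HASH comparator are hypothetical, chosen only to mimic the hash-code fallback from the question.

```java
import java.util.Comparator;
import java.util.TreeSet;

class HashCollisionDemo {

    // Hypothetical comparator that orders strings purely by hash code difference.
    static final Comparator<String> BY_HASH = (a, b) -> a.hashCode() - b.hashCode();

    static int sizeAfterAddingColliders() {
        TreeSet<String> set = new TreeSet<>(BY_HASH);
        set.add("Aa");
        set.add("BB"); // same hash code as "Aa", so compare() returns 0 and the add is ignored
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("Aa".equals("BB"));                  // false
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println(sizeAfterAddingColliders());         // 1
    }
}
```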
The hashCode() method doesn't guarantee any less-than or greater-than ordering. compare() and equals() should yield the same meaning, but that is not strictly necessary.
As far as I can understand from your confusing code (no offence intended :)), you want to add duplicates to the TreeSet. For that reason you came up with this implementation. Here is the reason, you can't put them in the TreeSet, quoting from the docs,
The behavior of a set is well-defined
even if its ordering is inconsistent
with equals; it just fails to obey the
general contract of the Set interface.
So you need to do something with your equals() method so that it can never return true whatsoever. The best implementation would be,
public boolean equals(Object o) {
return false;
}
By the way, if I am right in my understanding, why not use a List instead and sort that?
Very interesting question.
As far as I understand your problem is duplicate elements.
I think that if o1.equals(o2) their hash codes might be equal too. It depends on the implementation of hashCode() in your Foo class. So, I'd suggest you to use System.identityHashCode(x) instead.
You have a Foo class which is comparable but want to use a different sorting in a TreeSet<Foo> structure. Then your idea is the correct way to do it. Use that constructor to "overrule" the natural sorting of Foo.
If you have no specific expected ordering for any two given elements, but still want to consider them unequal, then you have to return some specified ordering anyway.
As others have posted, hashCode() isn't a good candidate, because the hashCode() values of both elements can easily be equal. System.identityHashCode() might be a better choice, but still isn't perfect, as even identityHashCode() doesn't guarantee unique values.
The Guava arbitrary() Ordering implements a Comparator using System.identityHashCode().
Yes, as others said above, hashCode() is not safe to use here. But if you don't care about the ordering of objects that are equal in terms of o1.compareTo(o2) == 0, you could do something like:
public int compare(Foo o1, Foo o2) {
int res = o1.compareTo(o2);
if (res == 0 && !o1.equals(o2)) {
return -1;
}
return res;
}
int res = o1.compareTo(o2);
if(res == 0 || !o1.equals(o2)){
return o1.hashCode() - o2.hashCode();
}
This can be problematic: if two objects are equal (i.e. res == 0 in your code), they return the same hash code, so the comparator still returns 0. Hash codes are not unique for every object.
Edit @Stas: System.identityHashCode(Object x) still won't help you. The reason is described in the javadoc:
Returns the same hash code for the
given object as would be returned by
the default method hashCode(),
whether or not the given object's
class overrides hashCode(). The hash
code for the null reference is zero.
There are a couple of problems here:
Hash codes are not generally unique, and in particular System.identityHashCode will not be unique on vaguely modern JVMs.
This is not a question of stability. We are not sorting an array, but creating a tree structure. The hash code collisions will cause compare to return zero, which for TreeSet means one object wins and the other is discarded - it does not degrade to a linked list (the clue is having "Set" in the name).
There is generally an integer overflow issue with subtracting one hash code from another. This means the comparison won't be transitive (i.e. it is broken). As luck would have it, on the Sun/Oracle implementation, System.identityHashCode always returns positive values. This means that extensive testing will probably not find this particular sort of bug.
I don't believe there is a good way to achieve this using TreeSet.
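The overflow point can be checked in isolation. The sketch below compares plain subtraction with Integer.compare for one pair of values; the class and method names are made up for illustration.

```java
class HashSubtractionOverflow {

    // The pattern from the question: overflows when the operands are far apart.
    static int bySubtraction(int a, int b) {
        return a - b;
    }

    // Overflow-free alternative from the JDK.
    static int safe(int a, int b) {
        return Integer.compare(a, b);
    }

    public static void main(String[] args) {
        int a = Integer.MIN_VALUE;
        int b = 1;
        // a is clearly smaller than b, yet the subtraction wraps around to a positive number.
        System.out.println(bySubtraction(a, b) > 0); // true (wrong sign)
        System.out.println(safe(a, b) < 0);          // true (correct sign)
    }
}
```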
Two points may be relevant and these are that the return in one situation is shown as -1 and this depends if a negative value is permitted in the function parameter variable or in the relevant country of use, and also if the method you are using is permitted. There are standard data arranging methods like selector or selection sort and the paper description or code is usually available from a national authority if a copy is not in your workplace. Using comparisons like greater than or less than can speed up the code and avoids the use of a direct comparison for equality by implied dropthrough to later script or code.
Q.1) As written in the documentation of AbstractSet: "This class does not override any of the implementations from the AbstractCollection class." So it does not override or change add(Object o) or any other Collection interface contract implemented by the AbstractCollection class; it merely inherits them, and so does HashSet.
How do HashSet and other Set objects then enforce stipulations like the no-duplicates check or the Hashtable way of inserting elements, which is totally different from how List or other Collection objects add elements?
Q.2) In the doc for AbstractSet it is written that AbstractSet merely adds implementations for equals and hashCode. However, in the method details part, it is mentioned that these override the equals and hashCode methods of the Object class. Does AbstractSet only inherit these two methods without changing them? If so, what is the importance of the AbstractSet class? Please clarify.
Q1: How does HashSet enforce duplicate checks?
If you take a look at the implementation in java.util.HashSet, you'll see the following code:-
private static final Object PRESENT = new Object();
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
What happens is fairly simple; we use a private HashMap instance, which takes our provided value and inserts it as the key of the HashMap. The map's PRESENT value is never actually used or retrieved, but it allows us to use this backing map to verify whether or not the item exists in the Set.
If our provided value does not already exist in the map, the call to map.put() places it in the map and returns null, so add() returns true. Otherwise, put() returns the previous value (PRESENT), the set is effectively unchanged, and add() returns false. The HashMap is doing the hard work for the HashSet here.
This is different to the implementation provided by the AbstractCollection class, and hence the need to override.
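A quick way to convince yourself of the return values is to call both methods twice with the same key. This is only an illustrative sketch; HashSetAddDemo and its PRESENT constant are my own stand-ins, not the JDK's private field.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class HashSetAddDemo {

    // Stand-in for the dummy value HashSet stores against every key.
    static final Object PRESENT = new Object();

    // Returns the results of adding the same element twice to a HashSet.
    static boolean[] addTwice() {
        Set<String> set = new HashSet<>();
        return new boolean[] { set.add("x"), set.add("x") };
    }

    // HashMap.put returns null for a new key and the previous value otherwise.
    static boolean putBehavesAsDescribed() {
        Map<String, Object> map = new HashMap<>();
        Object first = map.put("x", PRESENT);
        Object second = map.put("x", PRESENT);
        return first == null && second == PRESENT;
    }

    public static void main(String[] args) {
        boolean[] results = addTwice();
        System.out.println(results[0]);              // true
        System.out.println(results[1]);              // false
        System.out.println(putBehavesAsDescribed()); // true
    }
}
```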
Q2: AbstractSet's use of equals() & hashCode()
I think you have slightly misunderstood what AbstractSet is doing here. The purpose of AbstractSet is to provide a collection-safe implementation of equals and hashCode.
Equals checks are performed by verifying that we are comparing two Set objects, that they are of equal size, and that they contain the same items.
public boolean equals(Object o) {
if (o == this)
return true;
if (!(o instanceof Set))
return false;
Collection<?> c = (Collection<?>) o;
if (c.size() != size())
return false;
try {
return containsAll(c);
} catch (ClassCastException unused) {
return false;
} catch (NullPointerException unused) {
return false;
}
}
The hashCode is produced by looping over the Set instance, and hashing each item iteratively:
public int hashCode() {
int h = 0;
Iterator<E> i = iterator();
while (i.hasNext()) {
E obj = i.next();
if (obj != null)
h += obj.hashCode();
}
return h;
}
Any class extending AbstractSet will use this implementation of equals() and hashCode() unless it overrides them explicitly. This implementation takes precedence over the default equals and hashCode methods defined in java.lang.Object.
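One visible consequence is that two different Set implementations holding the same elements are equal and share a hash code. The sketch below shows this for HashSet and TreeSet; the class name is made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class AbstractSetEqualityDemo {

    // A HashSet and a TreeSet with the same elements compare equal in both directions.
    static boolean equalAcrossImplementations() {
        Set<String> hash = new HashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> tree = new TreeSet<>(Arrays.asList("c", "b", "a"));
        return hash.equals(tree) && tree.equals(hash) && hash.hashCode() == tree.hashCode();
    }

    // The set's hash code is the sum of the element hash codes.
    static boolean hashIsSumOfElements() {
        Set<String> set = new HashSet<>(Arrays.asList("a", "b", "c"));
        return set.hashCode() == "a".hashCode() + "b".hashCode() + "c".hashCode();
    }

    public static void main(String[] args) {
        System.out.println(equalAcrossImplementations()); // true
        System.out.println(hashIsSumOfElements());        // true
    }
}
```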
The documentation you linked is for Java 7, and I was checking the Java 8 code, so what I found below may not be exactly the same in Java 7. Still, you can use the same approach of checking the code whenever the documentation isn't clear to you:
Q1: HashSet overrides the add method of AbstractCollection. You can easily check this if you open the HashSet code in an IDE. If a parent class doesn't override some methods, that doesn't mean its children can't do so.
Q2: Again, by checking the code we notice that AbstractSet defines its own implementation of the equals and hashCode methods. It also overrides the removeAll method of AbstractCollection.
The code shown below does output:
[b]
[a, b]
However I would expect it to print two identical lines in the output.
import java.util.*;
public class Test{
static void test(String... abc) {
Set<String> s = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
s.addAll(Arrays.asList("a", "b"));
s.removeAll(Arrays.asList(abc));
System.out.println(s);
}
public static void main(String[] args) {
test("A");
test("A", "C");
}
}
The spec clearly states that removeAll
"Removes all this collection's elements that are also contained in the
specified collection."
So, from my understanding, the current behavior is unpredictable. Please help me understand this.
You only read part of the documentation. You missed one important paragraph from TreeSet:
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
Now, the removeAll implementation comes from AbstractSet and uses the equals method. In your code "a".equals("A") is false, so the elements are not considered equal even though you provided a comparator that treats them as equal inside the TreeSet itself. If you try with a wrapper, the problem goes away:
import java.util.*;
import java.lang.*;
class Test
{
static class StringWrapper implements Comparable<StringWrapper>
{
public final String string;
public StringWrapper(String string)
{
this.string = string;
}
@Override public boolean equals(Object o)
{
return o instanceof StringWrapper &&
((StringWrapper)o).string.compareToIgnoreCase(string) == 0;
}
@Override public int compareTo(StringWrapper other) {
return string.compareToIgnoreCase(other.string);
}
@Override public String toString() { return string; }
}
static void test(StringWrapper... abc)
{
Set<StringWrapper> s = new TreeSet<>();
s.addAll(Arrays.asList(new StringWrapper("a"), new StringWrapper("b")));
s.removeAll(Arrays.asList(abc));
System.out.println(s);
}
public static void main(String[] args)
{
test(new StringWrapper("A"));
test(new StringWrapper("A"), new StringWrapper("C"));
}
}
This is because you are now providing consistent implementations of equals and compareTo for your object, so you never get incoherent behavior between how the objects are added to the sorted set and how the abstract behavior of the set uses them.
This is true in general, a sort of rule of three for Java code: if you implement compareTo, equals or hashCode, you should always implement all of them to avoid problems with standard collections (even if hashCode is less crucial unless you are using these objects in a hashed collection). This is stated many times throughout the Java documentation.
This is an inconsistency in the implementation of TreeSet<E>, bordering on a bug. The code will ignore the custom comparator when the number of items in the collection that you pass to removeAll is greater than or equal to the number of items in the set.
The inconsistency is caused by a small optimization: if you look at the implementation of removeAll, which is inherited from AbstractSet, the optimization goes as follows:
public boolean removeAll(Collection<?> c) {
boolean modified = false;
if (size() > c.size()) {
for (Iterator<?> i = c.iterator(); i.hasNext(); )
modified |= remove(i.next());
} else {
for (Iterator<?> i = iterator(); i.hasNext(); ) {
if (c.contains(i.next())) {
i.remove();
modified = true;
}
}
}
return modified;
}
you can see that the behavior is different when c has fewer items than this set (top branch) vs. when it has as many or more items (bottom branch).
Top branch uses the comparator associated with this set, while the bottom branch uses equals for comparison c.contains(i.next()) - all in the same method!
You can demonstrate this behavior by adding a few extra elements to the original tree set:
s.addAll(Arrays.asList("x", "z", "a", "b"));
Now the output for both test cases becomes identical, because remove(i.next()) utilizes the comparator of the set.
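The two branches can also be exercised separately, which avoids depending on how a particular JDK version implements removeAll. In this sketch (class and method names are made up), remove() on the TreeSet goes through the comparator, while contains() on the argument list goes through equals().

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class RemoveAllBranchesDemo {

    // What the top branch does: TreeSet.remove consults the case-insensitive comparator.
    static boolean removeUsesComparator() {
        Set<String> s = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        s.addAll(Arrays.asList("a", "b"));
        return s.remove("A");
    }

    // What the bottom branch does: the argument collection's contains() uses equals().
    static boolean containsUsesEquals() {
        List<String> c = Arrays.asList("A", "C");
        return c.contains("a");
    }

    public static void main(String[] args) {
        System.out.println(removeUsesComparator()); // true:  "a" is removed via the comparator
        System.out.println(containsUsesEquals());   // false: "a".equals("A") is false
    }
}
```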
The reason is because the comparator String.CASE_INSENSITIVE_ORDER you use is not consistent with equals.
As stated by TreeSet:
Note that the ordering maintained by a set (whether or not an explicit comparator is provided)
must be consistent with equals if it is to correctly implement the Set interface.
Consistency with equals as stated by Comparable:
The natural ordering for a class C is said to be consistent with equals if and only if
e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2)
for every e1 and e2 of class C.
And as an example for the case insensitive comparator you use:
String.CASE_INSENSITIVE_ORDER.compare("a", "A") == 0 => true
while
"a".equals("A") => false
It's written in all decent java courses, that if you implement the Comparable interface, you should (in most cases) also override the equals method to match its behavior.
Unfortunately, in my current organization people try to convince me to do exactly the opposite. I am looking for the most convincing code example to show them all the evil that will happen.
I think you can beat them by showing the Comparable javadoc that says:
It is strongly recommended (though not required) that natural
orderings be consistent with equals. This is so because sorted sets
(and sorted maps) without explicit comparators behave "strangely" when
they are used with elements (or keys) whose natural ordering is
inconsistent with equals. In particular, such a sorted set (or sorted
map) violates the general contract for set (or map), which is defined
in terms of the equals method.
For example, if one adds two keys a and b such that (!a.equals(b) &&
a.compareTo(b) == 0) to a sorted set that does not use an explicit
comparator, the second add operation returns false (and the size of
the sorted set does not increase) because a and b are equivalent from
the sorted set's perspective.
So especially with SortedSet (and SortedMap), if the compareTo method returns 0, the set treats the elements as equal and doesn't add the element a second time even if the equals method returns false. This causes confusion, as specified in the SortedSet javadoc:
Note that the ordering maintained by a sorted set (whether or not an
explicit comparator is provided) must be consistent with equals if the
sorted set is to correctly implement the Set interface. (See the
Comparable interface or Comparator interface for a precise definition
of consistent with equals.) This is so because the Set interface is
defined in terms of the equals operation, but a sorted set performs
all element comparisons using its compareTo (or compare) method, so
two elements that are deemed equal by this method are, from the
standpoint of the sorted set, equal. The behavior of a sorted set is
well-defined even if its ordering is inconsistent with equals; it just
fails to obey the general contract of the Set interface.
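A JDK class that shows this "strange" behavior out of the box is BigDecimal, whose natural ordering is inconsistent with equals. The sketch below is only an illustration; the class and method names are made up.

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class BigDecimalSetDemo {

    // compareTo ignores the scale, so the TreeSet treats 1.0 and 1.00 as one element.
    static int treeSetSize() {
        Set<BigDecimal> set = new TreeSet<>();
        set.add(new BigDecimal("1.0"));
        set.add(new BigDecimal("1.00"));
        return set.size();
    }

    // equals takes the scale into account, so the HashSet keeps both.
    static int hashSetSize() {
        Set<BigDecimal> set = new HashSet<>();
        set.add(new BigDecimal("1.0"));
        set.add(new BigDecimal("1.00"));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(treeSetSize()); // 1
        System.out.println(hashSetSize()); // 2
    }
}
```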
If you don't override the equals method, it inherits its behaviour from the Object class.
This method returns true if and only if the specified object is not null and refers to the same instance.
Suppose the following class:
class VeryStupid implements Comparable<VeryStupid>
{
public int x;
@Override
public int compareTo(VeryStupid o)
{
if (o != null)
return (x - o.x);
else
return (1);
}
}
We create 2 instances:
VeryStupid one = new VeryStupid();
VeryStupid two = new VeryStupid();
one.x = 3;
two.x = 3;
The call to one.compareTo(two) returns 0 indicating the instances are equal but the call to one.equals(two) returns false indicating they're not equal.
This is inconsistent.
Consistency of compareTo and equals is not required but strongly recommended.
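For contrast, here is a minimal sketch of a class where the three methods agree. ConsistentValue is a hypothetical name, and the class assumes that a single int field fully determines both ordering and equality.

```java
class ConsistentValue implements Comparable<ConsistentValue> {

    final int x;

    ConsistentValue(int x) {
        this.x = x;
    }

    // Ordering is based on x only; Integer.compare avoids subtraction overflow.
    @Override
    public int compareTo(ConsistentValue other) {
        return Integer.compare(x, other.x);
    }

    // Equality is based on the same field, so compareTo == 0 exactly when equals is true.
    @Override
    public boolean equals(Object o) {
        return o instanceof ConsistentValue && ((ConsistentValue) o).x == x;
    }

    // Equal objects must produce equal hash codes.
    @Override
    public int hashCode() {
        return Integer.hashCode(x);
    }

    public static void main(String[] args) {
        ConsistentValue one = new ConsistentValue(3);
        ConsistentValue two = new ConsistentValue(3);
        System.out.println(one.compareTo(two) == 0);            // true
        System.out.println(one.equals(two));                    // true
        System.out.println(one.hashCode() == two.hashCode());   // true
    }
}
```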
I'll give it a shot with this example:
private static class Foo implements Comparable<Foo> {
@Override
public boolean equals(Object _other) {
System.out.println("equals");
return super.equals(_other);
}
@Override
public int compareTo(Foo _other) {
System.out.println("compareTo");
return 0;
}
}
public static void main (String[] args) {
Foo a, b;
a = new Foo();
b = new Foo();
a.compareTo(b); // prints 'compareTo', returns 0 => equal
a.equals(b); // just prints 'equals', returns false => not equal
}
You can see that your (maybe very important and complicated) comparison code is ignored when you use the default equals method.
The method int compareTo(T o) lets you know whether o is (in some way) greater or less than this, so it allows you to order a list of T.
In int compareTo(T o) you have to answer the following:
is o an instance of this class? => true/false;
is o equal to this? => true/false;
is o greater than this? => true/false;
is o less than this? => true/false;
So you see that an equality test is involved, and the best way to avoid implementing equality twice is to put it in the boolean equals(Object obj) method.
I needed to sort my TreeMap based on its values. The requirements of what I'm doing are such that I have to use a sorted map. I tried the solution here: Sort a Map<Key, Value> by values (Java); however, as the comments say, this makes getting values from my map stop working. So, instead I did the following:
class sorter implements Comparator<String> {
Map<String, Integer> _referenceMap;
public boolean sortDone = false;
public sorter(Map<String, Integer> referenceMap) {
_referenceMap = referenceMap;
}
public int compare(String a, String b) {
return sortDone ? a.compareTo(b) : _referenceMap.get(a) >= _referenceMap.get(b) ? -1 : 1;
}
}
So I leave sortDone to false until I'm finished sorting my map, and then I switch sortDone to true so that it compares things as normal. Problem is, I still cannot get items from my map. When I do myMap.get(/anything/) it is always null still.
I also do not understand what the comparator inconsistent with equals even means.
I also do not understand what the comparator inconsistent with equals even means.
As per the contract of the Comparable interface:
The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C. Note that null is not an instance of any class, and e.compareTo(null) should throw a NullPointerException even though e.equals(null) returns false.
It is strongly recommended (though not required) that natural orderings be consistent with equals.
I believe you need to change the line :
_referenceMap.get(a) >= _referenceMap.get(b) ? -1 : 1;
to
_referenceMap.get(a).compareTo(_referenceMap.get(b));
Since if the Integer returned by _referenceMap.get(a) is actually == in value to the Integer returned by _referenceMap.get(b) then you should ideally return 0, not -1.
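One way to keep lookups working is a comparator that orders by value and falls back to the key, so it returns 0 only for identical keys. The following is a rough sketch under the assumption that the values live in a separate reference map that does not change afterwards; the names ValueSortedMapSketch and sortedByValueDesc are made up.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class ValueSortedMapSketch {

    // Returns a TreeMap whose keys are ordered by descending value, then by key.
    static TreeMap<String, Integer> sortedByValueDesc(Map<String, Integer> source) {
        // Snapshot of the values; the comparator must not read from the TreeMap itself.
        Map<String, Integer> reference = new HashMap<>(source);
        Comparator<String> cmp = (a, b) -> {
            // Note: keys missing from the reference map cause a NullPointerException here.
            int byValue = Integer.compare(reference.get(b), reference.get(a));
            // Tie-break on the key so that 0 is returned only for equal keys.
            return byValue != 0 ? byValue : a.compareTo(b);
        };
        TreeMap<String, Integer> sorted = new TreeMap<>(cmp);
        sorted.putAll(source);
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Integer> source = new HashMap<>();
        source.put("a", 1);
        source.put("b", 3);
        source.put("c", 3);
        TreeMap<String, Integer> sorted = sortedByValueDesc(source);
        System.out.println(sorted.keySet()); // [b, c, a]
        System.out.println(sorted.get("c")); // 3
    }
}
```

Because the comparator agrees with String.equals on the keys, get() finds every entry that was put in, including keys that share a value.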
It means you must implement, i.e. override, the equals() method to compare the same field(s) you are comparing in the compareTo() method.
It is good practice to override the hashCode() method to return a hash based on the same fields too.
I have just started using Google's Guava collections (ComparisonChain and Objects). In my POJO I am overriding the equals method, so I did this first:
return ComparisonChain.start()
.compare(this.id, other.id)
.result() == 0;
However, I then realized that I could also use this :
return Objects.equal(this.id, other.id);
And I fail to see when comparison chain would be better as you can easily add further conditions like so:
return Objects.equal(this.name, other.name)
&& Objects.equal(this.number, other.number);
The only benefit I can see is if you specifically need an int returned. It has two extra method calls (start and result) and is more complex to a noob.
Are there obvious benefits of ComparisonChain that I am missing?
(Yes, I am also overriding hashCode with an appropriate Objects.hashCode().)
ComparisonChain allows you to check whether an object is less than or greater than another object by comparing multiple properties (like sorting a grid by multiple columns).
It should be used when implementing Comparable or Comparator.
Objects.equal can only check for equality.
ComparisonChain is meant to be used in helping objects implement the Comparable or Comparator interfaces.
If you're just implementing Object.equals(), then you're correct; Objects.equal is all you need. But if you're trying to implement Comparable or Comparator -- correctly -- that is much easier with ComparisonChain than otherwise.
Consider:
class Foo implements Comparable<Foo> {
final String field1;
final int field2;
final String field3;
public boolean equals(@Nullable Object o) {
if (o instanceof Foo) {
Foo other = (Foo) o;
return Objects.equal(field1, other.field1)
&& field2 == other.field2
&& Objects.equal(field3, other.field3);
}
return false;
}
public int compareTo(Foo other) {
return ComparisonChain.start()
.compare(field1, other.field1)
.compare(field2, other.field2)
.compare(field3, other.field3)
.result();
}
}
as opposed to implementing compareTo as
int result = field1.compareTo(other.field1);
if (result == 0) {
result = Ints.compare(field2, other.field2);
}
if (result == 0) {
result = field3.compareTo(other.field3);
}
return result;
...let alone the trickiness of doing that correctly, which is higher than you'd guess. (I have seen more ways to mess up compareTo than you can imagine.)
In the context of overriding methods in your POJOs, I think of a few of Guava's tools matching with a few standard methods.
Object.equals is handled using Objects.equal in roughly the manner you mentioned
Object.hashCode is handled with Objects.hashCode like return Objects.hashCode(id, name);
Comparable.compareTo is handled with ComparisonChain as below:
public int compareTo(Chimpsky chimpsky) {
return ComparisonChain.start()
.compare(this.getId(), chimpsky.getId())
.compare(this.getName(), chimpsky.getName())
.result();
}
I would be careful when using Guava's ComparisonChain because it creates an instance per pair of elements being compared, so you would be looking at the creation of about N x log N comparison chains when sorting, or N instances when iterating and checking for equality.
I would instead create a static Comparator using the Java 8 API if possible, or Guava's Ordering API, which allows you to do the same. Here is an example with Java 8:
import java.util.Comparator;
import static java.util.Comparator.naturalOrder;
import static java.util.Comparator.nullsLast;
private static final Comparator<DomainObject> COMPARATOR=Comparator
.comparingInt(DomainObject::getId)
.thenComparing(DomainObject::getName,nullsLast(naturalOrder()));
@Override
public int compareTo(@NotNull DomainObject other) {
return COMPARATOR.compare(this,other);
}
Here is how to use the Guava's Ordering API: https://github.com/google/guava/wiki/OrderingExplained