Possible Duplicate of
When should a class be Comparable and/or Comparator?
I understand the difference that is given in this link.
The book I am referring to also says that we cannot use Comparable when we need to sort objects on the basis of more than one field.
My Question:
I just want an example where we cannot possibly use Comparable and have to use a Comparator instead. Please also show why, with Comparable, we can't compare on two different fields of an object.
If you find this question to be a duplicate, please provide a link. I have searched many questions, but none has the example that I wanted.
If a class implements Comparable, this defines what is usually considered the natural ordering of its elements. In some cases this is the only ordering that makes sense; in other cases it is simply the most widely used one. If you look at numbers, for example, there is probably only one (total) ordering that makes sense (except maybe for the reverse). As others have already pointed out, there are other objects that have several useful orderings. What the primary ordering is (or whether there is one at all) depends on your application. If you manage persons with addresses in your application, phonebook sort order could be considered the natural order if it is the most widely used one, and sorting by age could be a secondary ordering. Slightly OT: beware of cases where non-equal objects are considered equal with respect to the ordering; this may cause problems with sorted containers such as TreeSet.
Comparing apples with each other will result in classes of equal apples, like red ones, green ones, old ones and fresh ones. That's OK as long as you are only interested in a rather broad equality. But if you are going to receive a paycheck, you are very happy that you are identifiable within your equality class.
So compareTo is good for sorting and clustering, and equals/hashCode is good for identification.
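To make the warning about orderings that disagree with equals concrete, here is a small sketch using java.math.BigDecimal, whose Javadoc notes that its natural ordering is inconsistent with equals. The class and method names (OrderingVsEqualsDemo, hashSetSize, treeSetSize) are made up for illustration.

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class OrderingVsEqualsDemo {
    // HashSet identifies elements with equals/hashCode.
    // BigDecimal.equals also compares the scale, so 1.0 and 1.00 are different.
    static int hashSetSize() {
        Set<BigDecimal> set = new HashSet<>();
        set.add(new BigDecimal("1.0"));
        set.add(new BigDecimal("1.00"));
        return set.size();
    }

    // TreeSet identifies elements with compareTo.
    // BigDecimal.compareTo ignores the scale, so 1.0 and 1.00 are "the same".
    static int treeSetSize() {
        Set<BigDecimal> set = new TreeSet<>();
        set.add(new BigDecimal("1.0"));
        set.add(new BigDecimal("1.00"));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(hashSetSize()); // 2
        System.out.println(treeSetSize()); // 1
    }
}
```

The same two values end up as two elements in one container and one element in the other, which is the kind of surprise the remark above is about.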
Comparable is mostly used when there is a 'known' default sort order and the object or class that we are ordering is editable or owned by the developer making the change.
Comparator is suitable where the class or object being ordered is not owned by the developer making the change like a web service response. It is also preferred when the natural ordering doesn't fit the objective that needs to be accomplished.
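A minimal sketch of the difference, assuming a hypothetical Person class with name and age fields (none of these names come from the question). A single compareTo could combine several fields into one ordering, but a class can implement it only once, so a second, independent ordering has to be supplied from outside as a Comparator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical class, for illustration only.
class Person implements Comparable<Person> {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // The one and only natural ordering of Person: by name.
    @Override
    public int compareTo(Person other) {
        return name.compareTo(other.name);
    }
}

class PersonSortDemo {
    // A second ordering, defined without touching the Person class.
    static final Comparator<Person> BY_AGE = Comparator.comparingInt(p -> p.age);

    static List<String> namesSortedNaturally(List<Person> people) {
        List<Person> copy = new ArrayList<>(people);
        Collections.sort(copy); // uses Person.compareTo
        return names(copy);
    }

    static List<String> namesSortedByAge(List<Person> people) {
        List<Person> copy = new ArrayList<>(people);
        copy.sort(BY_AGE); // uses the external Comparator
        return names(copy);
    }

    private static List<String> names(List<Person> people) {
        List<String> result = new ArrayList<>();
        for (Person p : people) {
            result.add(p.name);
        }
        return result;
    }
}
```

If Person came from a library you cannot edit, only the Comparator route would be available.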
According to the contract for a Set in Java, "it is not permissible for a set to contain itself as an element" (source). However, this is possible in the case of a HashSet of Objects, as demonstrated here:
Set<Object> mySet = new HashSet<>();
mySet.add(mySet);
assertThat(mySet.size(), equalTo(1));
This assertion passes, but I would expect the behavior to be to either have the resulting set be 0 or to throw an Exception. I realize the underlying implementation of a HashSet is a HashMap, but it seems like there should be an equality check before adding an element to avoid violating that contract, no?
Others have already pointed out why it is questionable from a mathematical point of view, by referring to Russell's paradox.
This does not answer your question on a technical level, though.
So let's dissect this:
First, once more the relevant part from the JavaDoc of the Set interface:
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
Interestingly, the JavaDoc of the List interface makes a similar, although somewhat weaker, and at the same time more technical statement:
While it is permissible for lists to contain themselves as elements, extreme caution is advised: the equals and hashCode methods are no longer well defined on such a list.
And finally, the crux is in the JavaDoc of the Collection interface, which is the common ancestor of both the Set and the List interface:
Some collection operations which perform recursive traversal of the collection may fail with an exception for self-referential instances where the collection directly or indirectly contains itself. This includes the clone(), equals(), hashCode() and toString() methods. Implementations may optionally handle the self-referential scenario, however most current implementations do not do so.
(Emphasis by me)
The emphasized part (the words "directly or indirectly") is a hint at why the approach that you proposed in your question would not be sufficient:
it seems like there should be an equality check before adding an element to avoid violating that contract, no?
This would not help you here. The key point is that you'll always run into problems when the collection will directly or indirectly contain itself. Imagine this scenario:
Set<Object> setA = new HashSet<Object>();
Set<Object> setB = new HashSet<Object>();
setA.add(setB);
setB.add(setA);
Obviously, neither of the sets contains itself directly. But each of them contains the other - and therefore, itself indirectly. This could not be avoided by a simple referential equality check (using == in the add method).
Avoiding such an "inconsistent state" is basically impossible in practice. Of course it is possible in theory, using referential Reachability computations. In fact, the Garbage Collector basically has to do exactly that!
But it becomes impossible in practice when custom classes are involved. Imagine a class like this:
class Container {
    Set<Object> set;

    // hashCode() must be public to override Object.hashCode()
    @Override
    public int hashCode() {
        return set.hashCode();
    }
}
And messing around with this and its set:
Set<Object> set = new HashSet<Object>();
Container container = new Container();
container.set = set;
set.add(container);
The add method of the Set basically has no way of detecting whether the object that is added there has some (indirect) reference to the set itself.
Long story short:
You cannot prevent the programmer from messing things up.
Adding the collection into itself once causes the test to pass. Adding it twice causes the StackOverflowError which you were seeking.
From a personal developer standpoint, it doesn't make any sense to enforce a check in the underlying code to prevent this. The fact that you get a StackOverflowError in your code if you attempt to do this too many times, or calculate the hashCode - which would cause an instant overflow - should be enough to ensure that no sane developer would keep this kind of code in their code base.
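A rough sketch of that failure mode, assuming the usual OpenJDK HashSet, where hashCode() sums the hash codes of the elements. Since the Javadoc leaves this behavior unspecified, other implementations may behave differently. The class and method names are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class SelfContainingSetDemo {
    // Returns true if computing the hash code of a self-containing set
    // recurses until the stack overflows.
    static boolean hashCodeOverflows() {
        Set<Object> mySet = new HashSet<>();
        // The set is still empty when its hash code is computed for this
        // first add, so the insertion itself succeeds.
        mySet.add(mySet);
        try {
            // Now the set contains itself: hashing the set hashes its
            // element, which is the set again, and so on.
            mySet.hashCode();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(hashCodeOverflows());
    }
}
```

A second mySet.add(mySet) fails the same way, because add has to compute the hash code of its argument first.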
You need to read the full doc and quote it fully:
The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
The actual restriction is in the first sentence. The behavior is unspecified if an element of a set is mutated.
Since adding a set to itself mutates it, and adding it again mutates it again, the result is unspecified.
Note that the restriction is that the behavior is unspecified, and that a special case of that restriction is adding the set to itself.
So the doc says, in other words, that adding a set to itself results in unspecified behavior, which is what you are seeing. It's up to the concrete implementation to deal with (or not).
I agree with you that, from a mathematical perspective, this behavior really doesn't make sense.
There are two interesting questions here: first, to what extent were the designers of the Set interface trying to implement a mathematical set? Secondly, even if they weren't, to what extent does that exempt them from the rules of set theory?
For the first question, I will point you to the documentation of the Set:
A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element. As implied by its name, this interface models the mathematical set abstraction.
It's worth mentioning here that current formulations of set theory don't permit sets to be members of themselves. (See the Axiom of regularity). This is due in part to Russell's Paradox, which exposed a contradiction in naive set theory (which permitted a set to be any collection of objects - there was no prohibition against sets including themselves). This is often illustrated by the Barber Paradox: suppose that, in a particular town, a barber shaves all of the men - and only the men - who do not shave themselves. Question: does the barber shave himself? If he does, it violates the second constraint; if he doesn't, it violates the first constraint. This is clearly logically impossible, but it's actually perfectly permissible under the rules of naive set theory (which is why the newer "standard" formulation of set theory explicitly bans sets from containing themselves).
There's more discussion in this question on Math.SE about why sets cannot be an element of themselves.
With that said, this brings up the second question: even if the designers hadn't been explicitly trying to model a mathematical set, would this be completely "exempt" from the problems associated with naive set theory? I think not - I think that many of the problems that plagued naive set theory would plague any kind of a collection that was insufficiently constrained in ways that were analogous to naive set theory. Indeed, I may be reading too much into this, but the first part of the definition of a Set in the documentation sounds suspiciously like the intuitive concept of a set in naive set theory:
A collection that contains no duplicate elements.
Admittedly (and to their credit), they do place at least some constraints on this later (including stating that you really shouldn't try to have a Set contain itself), but you could question whether it's really "enough" to avoid the problems with naive set theory. This is why, for example, you have a "turtles all the way down" problem when trying to calculate the hash code of a HashSet that contains itself. This is not, as some others have suggested, merely a practical problem - it's an illustration of the fundamental theoretical problems with this type of formulation.
As a brief digression, I do recognize that there are, of course, some limitations on how closely any collection class can really model a mathematical set. For example, Java's documentation warns against the dangers of including mutable objects in a set. Some other languages, such as Python, at least attempt to ban many kinds of mutable objects entirely:
The set classes are implemented using dictionaries. Accordingly, the requirements for set elements are the same as those for dictionary keys; namely, that the element defines both __eq__() and __hash__(). As a result, sets cannot contain mutable elements such as lists or dictionaries. However, they can contain immutable collections such as tuples or instances of ImmutableSet. For convenience in implementing sets of sets, inner sets are automatically converted to immutable form, for example, Set([Set(['dog'])]) is transformed to Set([ImmutableSet(['dog'])]).
Two other major differences that others have pointed out are
Java sets are mutable
Java sets are finite. Obviously, this will be true of any collection class: apart from concerns about actual infinity, computers only have a finite amount of memory. (Some languages, like Haskell, have lazy infinite data structures; however, in my opinion, a lawlike choice sequence seems like a more natural way to model these than classical set theory, but that's just my opinion).
TL;DR No, it really shouldn't be permitted (or, at least, you should never do that) because sets can't be members of themselves.
This question already has answers here:
To store unique element in a collection with natural order
By chance, it has happened to me twice that I got the same Java question in a job interview test. To me it seems like nonsense. It goes something like this:
Which of these collections would you use if you needed a collection with no
duplicates and with natural ordering?
java.util.List
java.util.Map
java.util.Set
java.util.Collection
The closest answer would be Set. But as far as I know, these interfaces, with the exception of List, do not define any ordering in their contract; it is up to the implementing classes whether or not to define an ordering.
Was I right in pointing out in the test that the question is wrong?
The first major clue is "no duplicates." A mathematical set contains only unique items, which means no duplicates, so you are correct here.
In terms of ordering, perhaps the interviewer was looking for you to expand upon your answer. Just as a "Set" extends a "Collection" (in Java), there are more specific types of "Sets" possible in Java. See: HashSet, TreeSet, LinkedHashSet. For example, TreeSet implements the SortedSet interface.
However, it is most definitely true that the Set interface by itself does not guarantee any ordering. Frankly, I think this is a poorly worded question and you were right to point out the lack of precision.
Yes, you're correct that none of the answers given matches the requirements. A correct answer might have been SortedSet or its subinterface NavigableSet.
A Set with natural ordering is a SortedSet (which extends Set so it is-a Set), and a concrete implementation of that interface is TreeSet (which implements SortedSet so it is-a Set).
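As a quick illustration of a SortedSet doing both jobs at once, here is a small sketch; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class SortedSetDemo {
    // Copies the input into a TreeSet, which drops duplicates and keeps
    // the elements in their natural (Comparable) order.
    static List<String> sortedUnique(List<String> input) {
        SortedSet<String> set = new TreeSet<>(input);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        // prints [apple, fig, pear]
        System.out.println(sortedUnique(Arrays.asList("pear", "apple", "pear", "fig")));
    }
}
```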
The correct answer for that test is Set. Let's remember that it's asking for an interface that could provide that; given the right implementation, the Set interface could provide it.
The Map interface doesn't make any guarantees about the order in which
things are stored, as that's implementation specific. However, if you
use the right implementation (that is, TreeMap, as spelled out
by the docs), then you're guaranteed natural ordering of the keys and no
duplicate keys. However, the question doesn't say anything about
key-value pairs.
The Set interface also doesn't make any guarantees about the order
things are stored in, as that's implementation specific. But, like
TreeMap, TreeSet is a set that can be used to store things in
natural order with no duplicates. Here's how it'd look.
Set<String> values = new TreeSet<>();
The List interface will definitely allow duplicates, which
instantly rules it out.
The Collection interface doesn't have anything directly implementing
it, but it is the patriarch of the entire collections hierarchy.
So, in theory, code like this is legal:
Collection<String> values = new TreeSet<>();
...but you'd lose information about what
kind of collection it actually was, so I'd discourage its
usage.
If only some of the fields of an object represent the actual state, I suppose the other fields could be ignored when overriding equals and hashCode...
I get an uneasy feeling about this though, and wanted to ask,
Is this common practice?
Are there any potential pitfalls with this approach?
Is there any documentation or guidelines when it comes to ignoring some fields in equals / hashCode?
In my particular situation, I'm exploring the state space of a problem. I'd like to keep a hash set of visited states, but I'm also considering including the path which led to the state. Obviously, two states are equal, even though they are found through different paths.
This depends on how you define the uniqueness of a given object. If it has a primary key (unique key), then using that attribute alone is enough.
If you think the uniqueness is a combination of 10 different attributes, then use all 10 attributes in equals.
Then use only the attributes that you used in equals to generate the hash code, because equal objects must produce the same hash code.
Selecting the attribute(s) for equals and hashcode is how you define the uniqueness of a given object.
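For the state-space case from the question, this could look roughly like the sketch below. SearchState, its x/y fields and the path list are assumptions made up for illustration; the point is only that path is left out of both equals and hashCode.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical state class, for illustration only.
final class SearchState {
    final int x;
    final int y;
    final List<String> path; // how we got here; deliberately not part of identity

    SearchState(int x, int y, List<String> path) {
        this.x = x;
        this.y = y;
        this.path = path;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SearchState)) return false;
        SearchState other = (SearchState) o;
        return x == other.x && y == other.y; // path is ignored
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y); // same fields as equals, nothing more
    }
}

class VisitedDemo {
    public static void main(String[] args) {
        Set<SearchState> visited = new HashSet<>();
        System.out.println(visited.add(new SearchState(1, 2, Arrays.asList("up", "left")))); // true
        System.out.println(visited.add(new SearchState(1, 2, Arrays.asList("left", "up")))); // false
    }
}
```

One consequence to be aware of: the set keeps whichever instance arrived first, so the path stored with a visited state is the first one found, not necessarily the shortest.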
Is this common practice? Yes
Are there any potential pitfalls with this approach? No
Is there any documentation or guidelines when it comes to ignoring some fields in equals / hashCode?
"The equals method for class Object implements the most discriminating
possible equivalence relation on objects;"
This is from the Javadoc of the Object class. But as the author of the class, you know how the uniqueness is defined.
Ultimately, "equals" means what you want it to mean. There is the restriction that "equal" values must return the same hash code, and, of course, if presented with two identical references (i.e. the same object) "equals" must return true. But you could, e.g., have an "equals" that compared the contents of two web pages (ignoring the issue of repeatability for the nonce) and, even though the URLs were different, said "equal" if the page contents matched in some way.
The best documentation/guidelines I have seen for overriding the methods on Object was in Josh Bloch's Effective Java. It has a whole chapter on "Methods Common to All Objects" which includes sections about "Obey the general contract when overriding equals" and "Always override hashCode when you override equals". It describes, in detail, the things you should consider when overriding these two methods. I won't give away the answer directly; the book is definitely worth the cost for every Java developer.
I have a question about those two interfaces in Java.
Set extends Collection, but doesn't add anything. They are exactly the same.
Am I missing something here ?
Set doesn't allow duplicates.
It's a semantic difference, not a syntactic one.
From the documentation of Collection:
A collection represents a group of objects, known as its elements. Some collections allow duplicate elements and others do not. Some are ordered and others unordered.
From the documentation of Set:
A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element. As implied by its name, this interface models the mathematical set abstraction.
That should clarify the difference between a Set and the more general interface, Collection.
Good question. I guess the main purpose of explicitly having an interface for the concept of a Set as compared to the concept of a Collection is to actually formally distinguish the concepts. Let's say you're writing a method
void x(Collection<?> c);
You won't have the same idea of what arguments you want to get, as if you were writing
void x(Set<?> s);
The second method expects Collections that contain every element at most once (i.e. Sets). That's a big semantic difference from the first method, which doesn't care whether it receives Sets, Lists or any other type of Collection.
If you look closely, the Javadoc of the Set methods is different as well, explicitly showing the different notions that come into play when talking about Collection or Set.
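A small sketch of how the parameter type carries that intent. The method names countAll and countDistinct are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ParameterTypeDemo {
    // Accepts any Collection: duplicates may or may not be present.
    static int countAll(Collection<?> c) {
        return c.size();
    }

    // Accepts only Sets: the parameter type itself promises no duplicates.
    static int countDistinct(Set<?> s) {
        return s.size();
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "a");
        System.out.println(countAll(list));                     // 3
        System.out.println(countAll(new HashSet<>(list)));      // 2, a Set is a Collection
        // countDistinct(list) would not compile: a List is not a Set.
        System.out.println(countDistinct(new HashSet<>(list))); // 2
    }
}
```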
Collection is a more generic interface which encompasses Lists, Queues, Sets and many more.
Have a look at the 'All Known Subinterfaces' section here.
Everything is in the documentation:
Set - A collection that contains no
duplicate elements. More formally,
sets contain no pair of elements e1
and e2 such that e1.equals(e2), and at
most one null element. As implied by
its name, this interface models the
mathematical set abstraction.
and
Collection - The root interface in the
collection hierarchy. A collection
represents a group of objects, known
as its elements. Some collections
allow duplicate elements and others do
not. Some are ordered and others
unordered. The SDK does not provide
any direct implementations of this
interface: it provides implementations
of more specific subinterfaces like
Set and List. This interface is
typically used to pass collections
around and manipulate them where
maximum generality is desired.
It exists only to distinguish the implementation and future usage.
The terms come from set theory and the dictionary:
Collection - something that is collected; a group of objects or an amount of material accumulated in one location, especially for some purpose or as a result of some process
Set - is a collection of distinct objects
Additionally, the Set documentation defines a contract for .equals, which says "only other Sets may be equal to this Set". If we couldn't recognize the other Sets by their type (with instanceof), it would be impossible to implement this.
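A short sketch of that equals contract in action, using only standard JDK collections (the class and method names are made up): two Sets with the same elements are equal regardless of implementation, while a Set and a List never are.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetEqualsDemo {
    // Different Set implementations, same elements: equal.
    static boolean hashSetEqualsTreeSet() {
        Set<Integer> a = new HashSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> b = new TreeSet<>(Arrays.asList(3, 2, 1));
        return a.equals(b) && b.equals(a);
    }

    // A Set and a List with the same elements: not equal, in either direction.
    static boolean setEqualsList() {
        Set<Integer> set = new HashSet<>(Arrays.asList(1, 2, 3));
        List<Integer> list = Arrays.asList(1, 2, 3);
        return set.equals(list) || list.equals(set);
    }

    public static void main(String[] args) {
        System.out.println(hashSetEqualsTreeSet()); // true
        System.out.println(setEqualsList());        // false
    }
}
```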
If it were only for equals(), it would be possible to have an allowsDuplicates() method for Collection. But there are often cases where APIs want to say "please don't give me duplicates" or "I guarantee that this does not contain duplicates", and in Java there is no way to say in a method declaration "please give only collections whose allowsDuplicates() method returns false". Thus the additional type.
Is there any practical difference between a Set and Collection in Java, besides the fact that a Collection can include the same element twice? They have the same methods.
(For example, does Set give me more options to use libraries which accept Sets but not Collections?)
edit: I can think of at least 5 different situations to judge this question. Can anyone else come up with more? I want to make sure I understand the subtleties here.
1. designing a method which accepts an argument of Set or Collection. Collection is more general and accepts more possibilities of input. (if I'm designing a specific class or interface, I'm being nicer to my consumers and stricter on my subclassers/implementers if I use Collection.)
2. designing a method which returns a Set or Collection. Set offers more guarantees than Collection (even if it's just the guarantee not to include one element twice). (if I'm designing a specific class or interface, I'm being nicer to my consumers and stricter on my subclassers/implementers if I use Set.)
3. designing a class that implements the interface Set or Collection. Similar issues as #2. Users of my class/interface get more guarantees, subclassers/implementers have more responsibility.
4. designing an interface that extends the interface Set or Collection. Very similar to #3.
5. writing code that uses a Set or Collection. Here I might as well use Set; the only reasons for me to use Collection is if I get back a Collection from someone else's code, or if I have to handle a collection that contains duplicates.
Collection is also the supertype of List, Queue, Deque, and others, so it gives you more options. For example, I try to use Collection as a parameter to library methods that shouldn't explicitly depend on a certain type of collection.
Generally, you should use the right tool for the job. If you don't want duplicates, use Set (or SortedSet if you want ordering, or LinkedHashSet if you want to maintain insertion order). If you want to allow duplicates, use List, and so on.
I think you already have it figured out: use a Set when you want to specifically exclude duplicates. Collection is generally the lowest common denominator, and it's useful to specify APIs that accept/return this, which leaves you room to change details later on if needed. However, if the details of your application require unique entries, use Set to enforce this.
Also worth considering is whether order is important to you; if it is, use List, or LinkedHashSet if you care about order and uniqueness.
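To show the ordering choices side by side, here is a small sketch (class and method names are made up): LinkedHashSet keeps the order in which elements were first inserted, TreeSet keeps them sorted, and both drop duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

class SetOrderDemo {
    // Unique elements in first-insertion order.
    static List<String> insertionOrder(List<String> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    // Unique elements in natural (sorted) order.
    static List<String> sortedOrder(List<String> input) {
        return new ArrayList<>(new TreeSet<>(input));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b", "a", "b", "c");
        System.out.println(insertionOrder(input)); // [b, a, c]
        System.out.println(sortedOrder(input));    // [a, b, c]
    }
}
```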
See Java's Collection tutorial for a good walk-through of Collection usage. In particular, check out the class hierarchy.
As @mmyers states, Collection includes Set, as well as List.
When you declare something as a Set, rather than a Collection, you are saying that the variable cannot be a List or a Map. It will always be a Collection, though. So, any function that accepts a Collection will accept a Set, but a function that accepts a Set cannot take a Collection (unless you cast it to a Set).
One other thing to consider... Sets have extra overhead in time, memory, and coding in order to guarantee that there are no duplicates. (Time and memory because sets are usually backed by a HashMap or a Tree, which adds overhead over a list or an array. Coding because you have to implement the hashCode() and equals() methods.)
I usually use sets when I need a fast implementation of contains() and use Collection or List otherwise, even if the collection shouldn't have duplicates.
You should use a Set when that is what you want.
For example, a List without any order or duplicates. Methods like contains are quite useful.
A collection is much more generic. I believe that what mmyers wrote on their usage says it all.
The practical difference is that Set enforces the set logic, i.e. no duplicates and unordered, while Collection does not. So if you need a Collection and you have no particular requirement for avoiding duplicates, then use a Collection. If you have the requirement for a Set, then use Set. Generally, use the highest (most general) interface possible.
As Collection is a supertype of Set and SortedSet, these can be passed to a method which expects a Collection. Collection just means it may or may not be sorted, ordered, or allow duplicates.