Set interface and natural ordering [nonsensical interview test] [duplicate]


By chance, I have twice been given the same question in a Java test during job interviews. To me it seems like nonsense. It goes something like this:
Which of these collections would you use if you needed a collection with no
duplicates and with natural ordering?
java.util.List
java.util.Map
java.util.Set
java.util.Collection
The closest answer would be Set. But as far as I know, these interfaces, with the exception of List, do not define any ordering in their contract; it is up to the implementing classes whether or not they have a defined ordering.
Was I right in pointing out in the test that the question is wrong?

The first major clue is "no duplicates." A mathematical set contains only unique items, which means no duplicates, so you are correct here.
In terms of ordering, perhaps the interviewer was looking for you to expand upon your answer. Just as Set extends Collection in Java, there are more specific kinds of Set: see HashSet, TreeSet, LinkedHashSet. For example, TreeSet implements the SortedSet interface.
However, it is most definitely true that the Set interface itself does not guarantee any ordering. Frankly, I think this is a poorly worded question, and you were right to point out its lack of precision.

Yes, you're correct that none of the answers given matches the requirements. A correct answer might have been SortedSet or its subinterface NavigableSet.

A Set with natural ordering is a SortedSet (which extends Set so it is-a Set), and a concrete implementation of that interface is TreeSet (which implements SortedSet so it is-a Set).
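As a quick illustration of that is-a relationship (a minimal sketch; the class and method names are made up for the example, and List.of needs Java 9+), copying a list that contains duplicates into a TreeSet gives a SortedSet that drops the duplicates, keeps natural order, and can still be used as a plain Set:

```java
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class NaturalOrderSetDemo {
    // Copies the input into a TreeSet: duplicates are dropped and the
    // remaining elements are kept in their natural (compareTo) order.
    static SortedSet<String> sortedUnique(List<String> input) {
        return new TreeSet<>(input);
    }

    public static void main(String[] args) {
        SortedSet<String> fruits = sortedUnique(List.of("pear", "apple", "pear", "fig"));
        System.out.println(fruits);         // [apple, fig, pear]
        System.out.println(fruits.first()); // apple

        // A SortedSet is-a Set, so it can be passed wherever a Set is expected.
        Set<String> asPlainSet = fruits;
        System.out.println(asPlainSet.contains("fig")); // true
    }
}
```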

The correct answer for that test is Set. Remember that it's asking for an interface that could provide that; given the right implementation, the Set interface can provide it.
The Map interface doesn't make any guarantees about the order in which things are stored, as that's implementation specific. However, if you use the right implementation (that is, TreeMap, as spelled out by the docs), then you're guaranteed a natural ordering of the keys and no duplicate keys. However, nothing in the requirement calls for key-value pairs.
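To make the TreeMap point concrete, here is a minimal sketch (the class and method names are hypothetical): keys come back in natural order, and putting an existing key replaces its value instead of adding a second entry.

```java
import java.util.Map;
import java.util.TreeMap;

class TreeMapOrderDemo {
    // Builds a small TreeMap; the repeated "pear" key replaces the earlier
    // value rather than creating a second entry.
    static Map<String, Integer> buildStock() {
        Map<String, Integer> stock = new TreeMap<>();
        stock.put("pear", 3);
        stock.put("apple", 5);
        stock.put("pear", 7);
        return stock;
    }

    public static void main(String[] args) {
        Map<String, Integer> stock = buildStock();
        System.out.println(stock);        // {apple=5, pear=7}  (keys in natural order)
        System.out.println(stock.size()); // 2
    }
}
```

Note that the "no duplicates" guarantee applies to keys only; two different keys may still map to equal values.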
The Set interface also doesn't make any guarantees about the order in which things are stored, as that's implementation specific. But, like TreeMap, TreeSet is a set that can be used to store things in natural order with no duplicates. Here's how it'd look:
Set<String> values = new TreeSet<>();
The List interface will definitely allow duplicates, which instantly rules it out.
The Collection interface doesn't have anything in the JDK directly implementing it, but it is the patriarch of the entire collections hierarchy. So, in theory, code like this is legal:
Collection<String> values = new TreeSet<>();
...but you'd lose information about what kind of collection it actually was, so I'd discourage its usage.

Related

Which java data structures have a deterministic order of iteration?

In an interview, I was asked the following question:
Your application requires to store objects such that the order of
entries returned while iterating through the structure is
deterministic. In other words, if you iterate over the same
structure twice, the order of elements returned in both iterations
will be the same. Which of the following classes would you use?
Assume that structure is not mutated. (Check ANY that apply)
HashMap
LinkedHashSet
Hashtable
LinkedHashMap
TreeSet
TreeMap
I suggested using a LinkedHashSet. Was this the correct answer? Why or why not?

A deterministic order just means that it's consistently reproducible: the same input will always produce the same iteration order. In this case, the answer is "all of the above". Although the ordering of most Sets and Maps can't be relied on, it is still deterministic, and will remain the same until the underlying implementation is changed (e.g., if you change or upgrade JVMs).
A predictable order is something more, though: it means that the collection guarantees the order in which items are returned when iterating over it. Both "linked" types you mentioned above do that: the order in which items were inserted into the collection is the order in which they will be returned when iterating over it. The "tree" types also guarantee a predictable order of iteration: a sorted one.

As noted by @Elliot Frisch, they are all "deterministic" and will iterate in the same order if nothing has changed. That said, to paraphrase Animal Farm, some collections are more deterministic than others. :-)
Hash... collections have a deterministic iteration order which the JVM can "predict", but which is very hard for a human to predict and not worth the effort. In practice, they are not "predictable". As @Mureinik points out, the order is officially "unspecified" and subject to change if you change JVMs. The API docs describe this as "generally chaotic ordering", and all sane programmers would agree.
Linked... collections have a "predictable iteration order" in that they iterate in the order in which elements were inserted, with the important caveat that if you insert the same element twice, it keeps its original position. For example:
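A small sketch of the "deterministic but not predictable" distinction (class and method names are made up): the code never prints or relies on the actual HashMap order, it only checks that two passes over the same, unmodified map agree. This reflects how current JDK HashMap implementations behave in practice; as another answer points out, the specification itself makes no such promise.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RepeatIterationDemo {
    // Collects the keys of a map in whatever order its iterator yields them.
    static List<String> keysInIterationOrder(Map<String, Integer> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        for (String s : new String[] {"Tom", "Fred", "Anna", "Zoe"}) {
            map.put(s, s.length());
        }
        List<String> first = keysInIterationOrder(map);
        List<String> second = keysInIterationOrder(map);
        // The specific order is unspecified; we only compare the two passes.
        System.out.println(first.equals(second)); // true
    }
}
```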
add("Tom");
add("Fred");
add("Tom");
would iterate "Tom", "Fred", not "Fred", "Tom"
This is clearly "more predictable" than Hash..., but still a bit challenging if elements get inserted multiple times and ordering is crucial. For stuff like properties files, XML, or JSON, Linked... collections are generally a good choice as they maintain the original order for nicer human viewing and comparison.
Tree... collections iterate the "most predictably", using the ordering provided by a Comparator at construction time, or else the "natural ordering" if the elements are Comparable. Assuming you have a predictable comparison method, they are completely predictable. In the Tom/Fred example, it would always iterate as "Fred", "Tom", unless your Comparator is unusual.
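The Tom/Fred example above can be run directly against both kinds of set. This is a minimal sketch with made-up class and method names (List.of needs Java 9+):

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

class TomFredDemo {
    // Adds the names in the given order, mimicking add("Tom"); add("Fred"); add("Tom");
    static <C extends Collection<String>> C insertInOrder(C target, List<String> names) {
        for (String name : names) {
            target.add(name); // for a set, a repeat returns false and changes nothing
        }
        return target;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Tom", "Fred", "Tom");
        // Insertion order; the second "Tom" does not move the first one.
        System.out.println(insertInOrder(new LinkedHashSet<String>(), names)); // [Tom, Fred]
        // Natural (alphabetical) order.
        System.out.println(insertInOrder(new TreeSet<String>(), names));       // [Fred, Tom]
    }
}
```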

When answering this type of question, I would highly suggest doing so according to the Java API Specification and not based on your assumptions about the implementations. For example, you could argue that all of those collections would have a deterministic iteration order provided that they are not mutated between repeated iterations, because it would not make sense to implement them any other way. But the only answer (among the options you listed) that strictly adheres to the Java API Specification is all except HashMap and Hashtable.
The reason is that, of all of them, the only classes for which the Java API Specification gives no guarantee on the iteration order are the two I mentioned (HashMap and Hashtable). So, in general, when you program in Java, you should never assume a specific implementation of the API, the JVM, or anything else that is based on a specification, unless that specification guarantees it.
So, as an example, what could go wrong if you assume a particular implementation of the Hashtable collection? The JDK used to run your program (which is not necessarily the one you use for development) could implement Hashtable with certain optimizations, as long as they don't violate the Java API Specification. Such an optimization could be, for example, that if a Hashtable is not used for 10 minutes, it automatically calls the rehash method in order to reorganize and optimize access to its entries. This is a possible scenario, because, as noted here https://docs.oracle.com/javase/7/docs/api/java/util/Hashtable.html, "The exact details as to when and whether the rehash method is invoked are implementation-dependent".

Why can't we use only comparable in every situation?

Possible Duplicate of
When should a class be Comparable and/or Comparator?
I understand the difference that is given in this link.
The book I am reading also says that we cannot use Comparable when we need to sort objects on the basis of more than one field.
My Question:
I just want an example where we could not possibly use Comparable and have to use a Comparator instead, and please also show why, with Comparable, we can't compare on two different fields of an object.
If you find this question to be a duplicate, please provide a link; I have searched many questions, but none has the example that I wanted.

If a class implements Comparable, this defines what is usually considered the natural ordering of its elements. In some cases this is the only ordering that makes sense; in other cases it might be the most widely used ordering. If you look at numbers, for example, there is probably only one (total) ordering that makes sense (except maybe its reverse). As others have already pointed out, there are other objects that have several useful orderings. Which ordering is the primary one (or whether there is one at all) depends on your application. If you manage persons with addresses in your application, phonebook sort order could be considered the natural order if it is the most widely used one, and sorting by age could be a secondary one. Slightly off-topic: beware of cases where non-equal objects are considered equal with respect to the ordering; this may cause problems with containers like OrderedList etc.
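The persons example can be sketched as follows (a minimal sketch; Person, OrderingDemo, and BY_AGE are made-up names). It also addresses the question: a single compareTo can perfectly well look at several fields, so the real limitation of Comparable is that a class gets exactly one natural ordering, and any second ordering has to be supplied from outside as a Comparator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class Person implements Comparable<Person> {
    final String lastName;
    final String firstName;
    final int age;

    Person(String lastName, String firstName, int age) {
        this.lastName = lastName;
        this.firstName = firstName;
        this.age = age;
    }

    // Natural ordering: phonebook order (last name, then first name).
    // compareTo may use several fields; what it cannot do is offer a second ordering.
    @Override
    public int compareTo(Person other) {
        int byLast = lastName.compareTo(other.lastName);
        return byLast != 0 ? byLast : firstName.compareTo(other.firstName);
    }

    @Override
    public String toString() {
        return firstName + " " + lastName + " (" + age + ")";
    }
}

class OrderingDemo {
    // A secondary ordering, defined outside the Person class.
    static final Comparator<Person> BY_AGE = Comparator.comparingInt(p -> p.age);

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>(List.of(
                new Person("Smith", "Jane", 29),
                new Person("Doe", "John", 41),
                new Person("Smith", "Adam", 35)));

        Collections.sort(people);   // natural (phonebook) order
        System.out.println(people); // [John Doe (41), Adam Smith (35), Jane Smith (29)]

        people.sort(BY_AGE);        // same list, different ordering
        System.out.println(people); // [Jane Smith (29), Adam Smith (35), John Doe (41)]
    }
}
```

The same Comparator approach is what you would use for a class you cannot edit, since Comparable has to be implemented by the class itself.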

Comparing apples with each other will result in classes of equal apples, like red ones, green ones, old ones and fresh ones. That's OK as long as you are only interested in a rather broad equality. But if you are going to receive a paycheck, you will be very happy that you are identifiable within your equality class.
So compareTo is good for sorting and clustering, and equals/hashCode is good for identification.
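A sketch of what goes wrong when an ordering that only "clusters" is used where identity matters (Apple, AppleDemo, and BY_COLOR are hypothetical names; List.of needs Java 9+): two different red apples are distinct by equals/hashCode, so a HashSet keeps both, but a TreeSet ordered by color treats them as the same element and silently keeps only one.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class Apple {
    final int id;       // identity: the only field equals/hashCode look at
    final String color; // attribute used only for sorting/clustering

    Apple(int id, String color) {
        this.id = id;
        this.color = color;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Apple && ((Apple) o).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}

class AppleDemo {
    // Orders apples by color only, so two red apples compare as 0.
    static final Comparator<Apple> BY_COLOR = Comparator.comparing(a -> a.color);

    // Returns {size of a HashSet, size of a color-ordered TreeSet} for the same three apples.
    static int[] sizes() {
        List<Apple> apples = List.of(new Apple(1, "red"), new Apple(2, "red"), new Apple(3, "green"));
        Set<Apple> byIdentity = new HashSet<>(apples); // uses equals/hashCode
        Set<Apple> byColor = new TreeSet<>(BY_COLOR);  // uses the comparator only
        byColor.addAll(apples);
        return new int[] {byIdentity.size(), byColor.size()};
    }

    public static void main(String[] args) {
        int[] s = sizes();
        System.out.println(s[0] + " " + s[1]); // 3 2
    }
}
```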

Comparable is mostly used when there is a 'known' default sort order and the object or class that we are ordering is editable or owned by the developer making the change.
Comparator is suitable where the class or object being ordered is not owned by the developer making the change like a web service response. It is also preferred when the natural ordering doesn't fit the objective that needs to be accomplished.

Is there a way to create a List/Set which keeps insertion order and does not allow duplicates in Java?

What is the most efficient way of maintaining a list that does not allow duplicates, but maintains insertion order and also allows the retrieval of the last inserted element in Java?

Try LinkedHashSet, which keeps the order of input.
Note that re-inserting an element does not change its position in the insertion order, so if you want a re-added element to count as the most recent one, check whether it is already contained in the set and remove it before adding it again.
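If you also need the last inserted element cheaply, one alternative to digging into LinkedHashSet internals is to pair an ArrayList (for order) with a HashSet (for fast duplicate checks). This is a swapped-in technique, not something the JDK provides; UniqueInsertionList and its methods are hypothetical names for this sketch (List.of in the usage needs Java 9+).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical helper class, not a JDK or Commons Collections type.
class UniqueInsertionList<E> {
    private final List<E> order = new ArrayList<>(); // insertion order
    private final Set<E> seen = new HashSet<>();     // O(1) duplicate check

    // Adds e only if it is not already present; returns whether it was added.
    boolean add(E e) {
        if (!seen.add(e)) {
            return false;
        }
        order.add(e);
        return true;
    }

    // The most recently inserted element, in O(1).
    E last() {
        if (order.isEmpty()) {
            throw new NoSuchElementException("empty");
        }
        return order.get(order.size() - 1);
    }

    // Read-only view in insertion order.
    List<E> asList() {
        return Collections.unmodifiableList(order);
    }

    public static void main(String[] args) {
        UniqueInsertionList<String> list = new UniqueInsertionList<>();
        list.add("a");
        list.add("b");
        list.add("a");                     // ignored: already present
        System.out.println(list.asList()); // [a, b]
        System.out.println(list.last());   // b
    }
}
```

The trade-off is memory (every element is stored twice) and O(n) removal from the ArrayList, which this sketch does not implement.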
Edit:
You could also try the Apache Commons Collections class ListOrderedSet, which, according to the JavaDoc (if I didn't misread anything again :) ), decorates a set in order to keep insertion order and provides a get(index) method.
Thus, it seems you can get what you want by using new ListOrderedSet(new HashSet());
Unfortunately this class doesn't provide a generic parameter, but it might get you started.
Edit 2:
Here's a project that seems to represent commons collections with generics, i.e. it has a ListOrderedSet<E> and thus you could for example call new ListOrderedSet<String>(new HashSet<String>());

I don't think there's anything in the JDK which does this.
However, LinkedHashMap, which is used as the basis for LinkedHashSet, comes close: it maintains a circular doubly-linked list of the entries in the map. It only tracks the head of the list, not the tail, but because the list is circular, header.before is the tail (the most recently inserted element).
You could therefore implement what you need on top of this. LinkedHashMap has not been designed for extension, so this is somewhat awkward. You could copy the code into your own class and add a suitable last() method (be aware of licensing issues here), or you could extend the existing class, and add a method which uses reflection to get at the private header and before fields.
That would get you a Map, rather than a Set. However, HashSet is already a wrapper which makes a Map look like a Set. Again, it is not designed for general extension, but you could write a subclass whose constructor calls the super constructor, then uses more reflection to replace the superclass's value of map with an instance of your new map. From there on, the class should do exactly what you want.
As an aside, the library classes here were all written by Josh Bloch and Neal Gafter. Those guys are two of the giants of Java. And yet the code in there is largely horrible. Never meet your heroes.

Just use a TreeSet.

Why doesn't Java Map extend Collection?

I was surprised by the fact that Map<?,?> is not a Collection<?>.
I thought it'd make a LOT of sense if it was declared as such:
public interface Map<K,V> extends Collection<Map.Entry<K,V>>
After all, a Map<K,V> is a collection of Map.Entry<K,V>, isn't it?
So is there a good reason why it's not implemented as such?
Thanks to cletus for a most authoritative answer, but I'm still wondering why, if you can already view a Map<K,V> as a Set<Map.Entry<K,V>> (via entrySet()), it doesn't just extend that interface instead.
FAQ: "If a Map is a Collection, what are the elements? The only reasonable answer is 'Key-value pairs'"
Exactly, interface Map<K,V> extends Set<Map.Entry<K,V>> would be great!
FAQ: "...but this provides a very limited (and not particularly useful) Map abstraction."
But if that's the case, then why is entrySet specified by the interface? It must be useful somehow (and I think it's easy to argue for that position!).
FAQ: "You can't ask what value a given key maps to, nor can you delete the entry for a given key without knowing what value it maps to."
I'm not saying that's all there is to Map! It can and should keep all the other methods (except entrySet, which would then be redundant)!

From the Java Collections API Design FAQ:
Why doesn't Map extend Collection?
This was by design. We feel that mappings are not collections and collections are not mappings. Thus, it makes little sense for Map to extend the Collection interface (or vice versa).
If a Map is a Collection, what are the elements? The only reasonable answer is "Key-value pairs", but this provides a very limited (and not particularly useful) Map abstraction. You can't ask what value a given key maps to, nor can you delete the entry for a given key without knowing what value it maps to.
Collection could be made to extend Map, but this raises the question: what are the keys? There's no really satisfactory answer, and forcing one leads to an unnatural interface.
Maps can be viewed as Collections (of keys, values, or pairs), and this fact is reflected in the three "Collection view operations" on Maps (keySet, entrySet, and values). While it is, in principle, possible to view a List as a Map mapping indices to elements, this has the nasty property that deleting an element from the List changes the Key associated with every element before the deleted element. That's why we don't have a map view operation on Lists.
Update: I think the quote answers most of the questions. It's worth stressing the part about a collection of entries not being a particularly useful abstraction. For example:
Set<Map.Entry<String,String>>
would allow:
set.add(entry("hello", "world"));
set.add(entry("hello", "world 2"));
(assuming an entry() method that creates a Map.Entry instance)
Maps require unique keys so this would violate this. Or if you impose unique keys on a Set of entries, it's not really a Set in the general sense. It's a Set with further restrictions.
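The entry() helper assumed above exists since Java 9 as Map.entry, so the argument can be checked directly. In this minimal sketch (class and method names are made up), a plain Set of entries keeps both "hello" entries because entries are equal only when key and value both match, while a Map keeps one key and replaces its value:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class EntrySetVsMapDemo {
    // A plain Set of entries compares entries by key AND value,
    // so two entries with the same key are both kept.
    static int entrySetSize() {
        Set<Map.Entry<String, String>> set = new HashSet<>();
        set.add(Map.entry("hello", "world"));
        set.add(Map.entry("hello", "world 2"));
        return set.size();
    }

    // A Map enforces unique keys: the second put replaces the value.
    static Map<String, String> buildMap() {
        Map<String, String> map = new HashMap<>();
        map.put("hello", "world");
        map.put("hello", "world 2");
        return map;
    }

    public static void main(String[] args) {
        System.out.println(entrySetSize());             // 2
        System.out.println(buildMap().size());          // 1
        System.out.println(buildMap().get("hello"));    // world 2
    }
}
```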
Arguably, you could say the equals()/hashCode() relationship for Map.Entry should be based purely on the key, but even that has issues. More importantly, does it really add any value? You may find this abstraction breaks down once you start looking at the corner cases.
It's worth noting that HashSet is actually implemented on top of a HashMap, not the other way around. This is purely an implementation detail but is interesting nonetheless.
The main reason for entrySet() to exist is to simplify traversal so you don't have to traverse the keys and then do a lookup of the key. Don't take it as prima facie evidence that a Map should be a Set of entries (imho).

While you've gotten a number of answers that cover your question fairly directly, I think it might be useful to step back a bit and look at the question more generally. That is, not to look specifically at how the Java library happens to be written, but at why it's written that way.
The problem here is that inheritance only models one type of commonality. If you pick out two things that both seem "collection-like", you can probably pick out 8 or 10 things they have in common. If you pick out a different pair of "collection-like" things, they'll also have 8 or 10 things in common, but they won't be the same 8 or 10 things as the first pair.
If you look at a dozen or so different "collection-like" things, virtually every one of them will probably have something like 8 or 10 characteristics in common with at least one other one -- but if you look at what's shared across every one of them, you're left with practically nothing.
This is a situation that inheritance (especially single inheritance) just doesn't model well. There's no clean dividing line between which of those are really collections and which aren't -- but if you want to define a meaningful Collection class, you're stuck with leaving some of them out. If you leave only a few of them out, your Collection class will only be able to provide quite a sparse interface. If you leave more out, you'll be able to give it a richer interface.
Some also take the option of basically saying "this type of collection supports operation X, but you're not allowed to use it": they derive from a base class that defines X, but attempting to use X on the derived class fails (e.g., by throwing an exception).
That still leaves one problem: almost regardless of which you leave out and which you put in, you're going to have to draw a hard line between what classes are in and what are out. No matter where you draw that line, you're going to be left with a clear, rather artificial, division between some things that are quite similar.
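Java's own Collection interface takes the "supported but not allowed" route described above: methods such as add are documented as optional operations, and implementations that don't support them throw UnsupportedOperationException. A minimal sketch (class and method names are made up; List.of needs Java 9+):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class OptionalOperationDemo {
    // Returns true if the collection rejected add() with UnsupportedOperationException.
    static boolean addIsUnsupported(Collection<String> c) {
        try {
            c.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // List.of returns an unmodifiable list: it has an add method
        // because Collection declares one, but calling it throws.
        System.out.println(addIsUnsupported(List.of("a", "b")));   // true
        System.out.println(addIsUnsupported(new ArrayList<>()));   // false
    }
}
```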

I guess the why is subjective.
In C#, I think Dictionary extends or at least implements a collection:
public class Dictionary<TKey, TValue> : IDictionary<TKey, TValue>,
ICollection<KeyValuePair<TKey, TValue>>, IEnumerable<KeyValuePair<TKey, TValue>>,
IDictionary, ICollection, IEnumerable, ISerializable, IDeserializationCallback
In Pharo Smalltalk as well:
Collection subclass: #Set
Set subclass: #Dictionary
But there is an asymmetry with some methods. For instance, collect: takes an association (the equivalent of an entry), while do: takes the values. They provide another method, keysAndValuesDo:, to iterate over the dictionary by entry. add: takes an association, but remove: has been "suppressed":
remove: anObject
self shouldNotImplement
So it's definitively doable, but leads to some other issues regarding the class hierarchy.
What is better is subjective.

The answer from cletus is good, but I want to add a semantic approach. Combining the two makes no sense: think of the case where you add a key-value pair via the Collection interface and the key already exists. The Map interface allows only one value associated with the key. But if you automatically remove the existing entry with the same key, then after the add the collection has the same size as before, which is very unexpected for a collection.

Java collections are broken. There is a missing interface, that of Relation. Hence, Map extends Relation extends Set. Relations (also called multi-maps) have unique name-value pairs. Maps (aka "Functions"), have unique names (or keys) which of course map to values. Sequences extend Maps (where each key is an integer > 0). Bags (or multi-sets) extend Maps (where each key is an element and each value is the number of times the element appears in the bag).
This structure would allow intersection, union etc. of a range of "collections". Hence, the hierarchy should be:
Set
|
Relation
|
Map
/ \
Bag Sequence
Sun/Oracle/Java ppl - please get it right next time. Thanks.

Map<K,V> should not extend Set<Map.Entry<K,V>> since:
You can't add different Map.Entrys with the same key to the same Map, but
You can add different Map.Entrys with the same key to the same Set<Map.Entry>.

If you look at the respective data structures, you can easily guess why Map is not part of Collection. Each element of a Collection is a single value, whereas a Map stores key-value pairs. So the methods in the Collection interface are incompatible with the Map interface. For example, Collection has add(Object o). What would its implementation be in Map? It doesn't make sense to have such a method in Map. Instead, Map has a put(key, value) method.
The same argument goes for the addAll(), remove(), and removeAll() methods. So the main reason is the difference in the way data is stored in Map and Collection.
Also, recall that the Collection interface extends the Iterable interface, i.e. its iterator() method must return an iterator over the values stored in the Collection. Now, what would such a method return for a Map: a key iterator or a value iterator? This does not make sense either.
There are ways to iterate over the keys and values stored in a Map, and that is how Map is part of the Collections Framework.
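Those ways are the three collection views, keySet(), values(), and entrySet(). A minimal sketch (class and method names are made up; a LinkedHashMap is used only so that the output order is predictable):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MapViewsDemo {
    // Walks the same map through each of its three Collection views.
    static String describe(Map<String, Integer> map) {
        StringBuilder sb = new StringBuilder();
        for (String key : map.keySet()) {                     // keys only
            sb.append(key).append(' ');
        }
        for (Integer value : map.values()) {                  // values only
            sb.append(value).append(' ');
        }
        for (Map.Entry<String, Integer> e : map.entrySet()) { // key-value pairs
            sb.append(e.getKey()).append('=').append(e.getValue()).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Map<String, Integer> ages = new LinkedHashMap<>();
        ages.put("Tom", 30);
        ages.put("Fred", 25);
        System.out.println(describe(ages)); // Tom Fred 30 25 Tom=30 Fred=25
    }
}
```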

Exactly, interface Map<K,V> extends Set<Map.Entry<K,V>> would be great!
Actually, if it were implements Map<K,V>, Set<Map.Entry<K,V>>, then I tend to agree. It even seems natural. But that doesn't work very well, right? Let's say we have HashMap implements Map<K,V>, Set<Map.Entry<K,V>>, LinkedHashMap implements Map<K,V>, Set<Map.Entry<K,V>>, etc. That is all good, but with entrySet(), nobody will forget to implement that method, and you can be sure that you can get an entrySet for any Map, whereas you can't be sure of that if you are relying on the implementor having implemented both interfaces...
The reason I don't want interface Map<K,V> extends Set<Map.Entry<K,V>> is simply that there would be more methods. And after all, they are different things, right? Also, very practically, if I type map. in the IDE, I don't want to see both .remove(Object obj) and .remove(Map.Entry<K,V> entry), because then I can't just hit ctrl+space, r, return and be done with it.

Straight and simple.
Collection is an interface whose elements are single objects, whereas a Map entry requires two (a key and a value):
Collection.add(Object o);
Map.put(Object key, Object value);

when to use Set vs. Collection?

Is there any practical difference between a Set and Collection in Java, besides the fact that a Collection can include the same element twice? They have the same methods.
(For example, does Set give me more options to use libraries which accept Sets but not Collections?)
edit: I can think of at least 5 different situations to judge this question. Can anyone else come up with more? I want to make sure I understand the subtleties here.
1. Designing a method which accepts an argument of Set or Collection. Collection is more general and accepts more possibilities of input. (If I'm designing a specific class or interface, I'm being nicer to my consumers and stricter on my subclassers/implementers if I use Collection.)
2. Designing a method which returns a Set or Collection. Set offers more guarantees than Collection (even if it's just the guarantee not to include one element twice). (If I'm designing a specific class or interface, I'm being nicer to my consumers and stricter on my subclassers/implementers if I use Set.)
3. Designing a class that implements the interface Set or Collection. Similar issues as #2. Users of my class/interface get more guarantees; subclassers/implementers have more responsibility.
4. Designing an interface that extends the interface Set or Collection. Very similar to #3.
5. Writing code that uses a Set or Collection. Here I might as well use Set; the only reasons for me to use Collection are if I get back a Collection from someone else's code, or if I have to handle a collection that contains duplicates.
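Situations 1 and 2 combine naturally in a single method signature: accept the general type, return the more specific one. A minimal sketch (ParameterVsReturnDemo and distinct are made-up names; List.of needs Java 9+):

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ParameterVsReturnDemo {
    // Accepts the most general type (any Collection: List, Set, Queue, ...)
    // and returns the more specific Set, which promises "no duplicates".
    static Set<String> distinct(Collection<String> items) {
        return new LinkedHashSet<>(items); // keeps first-seen order
    }

    public static void main(String[] args) {
        System.out.println(distinct(List.of("a", "b", "a")));                     // [a, b]
        System.out.println(distinct(new ArrayDeque<>(List.of("c", "c", "d")))); // [c, d]
    }
}
```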

Collection is also the supertype of List, Queue, Deque, and others, so it gives you more options. For example, I try to use Collection as a parameter to library methods that shouldn't explicitly depend on a certain type of collection.
Generally, you should use the right tool for the job. If you don't want duplicates, use Set (or SortedSet if you want ordering, or LinkedHashSet if you want to maintain insertion order). If you want to allow duplicates, use List, and so on.

I think you already have it figured out- use a Set when you want to specifically exclude duplicates. Collection is generally the lowest common denominator, and it's useful to specify APIs that accept/return this, which leaves you room to change details later on if needed. However if the details of your application require unique entries, use Set to enforce this.
Also worth considering is whether order is important to you; if it is, use List, or LinkedHashSet if you care about order and uniqueness.

See Java's Collection tutorial for a good walk-through of Collection usage. In particular, check out the class hierarchy.

As @mmyers states, Collection includes Set, as well as List.
When you declare something as a Set, rather than a Collection, you are saying that the variable cannot be a List or a Map. It will always be a Collection, though. So, any function that accepts a Collection will accept a Set, but a function that accepts a Set cannot take a Collection (unless you cast it to a Set).

One other thing to consider... Sets have extra overhead in time, memory, and coding in order to guarantee that there are no duplicates. (Time and memory because sets are usually backed by a HashMap or a Tree, which adds overhead over a list or an array. Coding because you have to implement the hashCode() and equals() methods.)
I usually use sets when I need a fast implementation of contains() and use Collection or List otherwise, even if the collection shouldn't have duplicates.

You should use a Set when that is what you want.
For example, when you want something like a List, but without a defined order and without duplicates. Methods like contains are quite useful.
A collection is much more generic. I believe that what mmyers wrote on their usage says it all.

The practical difference is that Set enforces the set logic, i.e. no duplicates (and no guaranteed order unless the implementation provides one), while Collection does not. So if you need a Collection and have no particular requirement to avoid duplicates, use a Collection. If you have a requirement for a Set, use Set. Generally, use the most general interface possible.

As Collection is a supertype of Set and SortedSet, these can be passed to a method that expects a Collection. Collection just means the elements may or may not be sorted or ordered, and duplicates may or may not be allowed.
