Do containsAll() and retainAll() in the Collection interface address cardinality? - java

In Java, the javadocs of containsAll and retainAll in the AbstractCollection class describe implementations that do not respect cardinality; in other words, it does not matter how many instances of a value are on each side. Since most Java collections in the standard library extend AbstractCollection, it is commonly assumed that all of them work the same way.
However, the documentation of these methods in the Collection interface does not say anything. Is one supposed to infer from AbstractCollection, or was this left unspecified on purpose to allow one to define collections that work differently?
For example, Bag in apache-collections explicitly states that it does respect cardinality, and claims that it violates the contract of the version from Collection (even though it doesn't really).
So, what are the semantics of these operations in Collection rather than in AbstractCollection?
Edit: To those who are wondering why I would care: as part of my Ph.D. work I demonstrated that developers don't expect the conformance violation in Apache's Bag, and I'm trying to understand why the Collection interface was left so ambiguous.

The javadocs for containsAll (in Collection) say:
Returns: true if this collection contains all of the elements in the specified collection
and for retainAll (in Collection):
Retains only the elements in this collection that are contained in the specified collection (optional operation). In other words, removes from this collection all of its elements that are not contained in the specified collection.
I read containsAll's contract to mean that a.containsAll(b) returns true if and only if a.contains(bElem) would return true for each element bElem in b. I would also take it to imply that a.containsAll(someEmptyCollection) returns true. As you state, the javadocs for AbstractCollection say this more explicitly:
This implementation iterates over the specified collection, checking each element returned by the iterator in turn to see if it's contained in this collection. If all elements are so contained true is returned, otherwise false.
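To make that reading concrete, here is a small sketch using ArrayList. The class and method names are illustrative, not part of any API; it simply shows that one copy on the receiving side is enough to "contain" several copies on the argument side, and that an empty argument is always contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ContainsAllDemo {
    // true: containsAll only checks membership of each element, not how many copies exist
    static boolean oneCopyCoversThree() {
        List<String> a = new ArrayList<>(Arrays.asList("x"));
        List<String> b = Arrays.asList("x", "x", "x");
        return a.containsAll(b);
    }

    // true: every element of an empty collection is (vacuously) contained
    static boolean emptyArgument() {
        List<String> a = new ArrayList<>(Arrays.asList("x", "y"));
        return a.containsAll(Collections.emptyList());
    }

    public static void main(String[] args) {
        System.out.println(oneCopyCoversThree()); // true
        System.out.println(emptyArgument());      // true
    }
}
```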
I agree that the contract for containsAll in Collection should be more explicit to avoid any possibility of confusion. (And reading the javadocs for AbstractCollection should NOT have been necessary to confirm one's understanding of Collection.)
I would not have made an assumption with regard to number of duplicate elements after a call to retainAll. The stated contract in Collection (by my reading) doesn't imply either way how duplicates in either collection would be handled. Based on my reading of retainAll in collection multiple possible results of a.retainAll(b) are all reasonable:
1. The result contains one copy of each element that has at least one copy in both a and b.
2. The result contains each element (including duplicates) that was in a, except those that are not in b.
3. Or even: the result contains, for each element of a that is also in b, somewhere between 1 and the number of copies found in a.
I would have expected either #1 or #2, but would assume any of the three to be legal based on the contract.
The javadocs for AbstractCollection confirm that it uses #2:
This implementation iterates over this collection, checking each element returned by the iterator in turn to see if it's contained in the specified collection. If it's not so contained, it's removed from this collection with the iterator's remove method.
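A quick sketch of what option #2 looks like in practice with ArrayList (names are illustrative): every copy in a survives as long as the element occurs at least once in b, and the number of copies in b plays no role.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RetainAllDemo {
    // Keeps every copy of each element of a that also occurs in b (option #2),
    // regardless of how many copies b has.
    static List<String> retain() {
        List<String> a = new ArrayList<>(Arrays.asList("x", "x", "x", "y", "z"));
        List<String> b = Arrays.asList("x", "y", "y");
        a.retainAll(b); // removes "z" only
        return a;
    }

    public static void main(String[] args) {
        System.out.println(retain()); // [x, x, x, y]
    }
}
```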
However, since this isn't in my reading of the original Collection interface's contract, I wouldn't assume that every Collection behaves this way.
Perhaps you should consider submitting suggested updates to the JavaDoc once you're done.
As to 'why the Collection interface was left so ambiguous': I seriously doubt it was done intentionally. It was probably just something that wasn't given its due priority when that part of the API was being written.

I don't think Collection defines it one way or the other; it simply became something of a convention to follow AbstractCollection's behavior. Google Collections, for example, does so: see its Multiset documentation (Multiset is what it calls a Bag).

Related

Can/should one write a Comparator consistent with Object's equals method

I have an object, Foo which inherits the default equals method from Object, and I don't want to override this because reference equality is the identity relation that I would like to use.
I now have a specific situation in which I would like to compare these objects according to a specific field. I'd like to write a comparator, FooValueComparator, to perform this comparison. However, if my FooValueComparator returns 0 whenever two objects have the same value for this particular field, then it is inconsistent with the equals method inherited from Object, along with all the problems that entails.
What I would like to do would be to have FooValueComparator compare the two objects first on their field value, and then on their references. Is this possible? What pitfalls might that entail (eg. memory locations being changed causing the relative order of Foo objects to change)?
The reason I would like my comparator to be compatible with equals is because I would like to have the option of applying it to SortedSet collections of Foo objects. I don't want a SortedSet to reject a Foo that I try to add just because it already contains a different object having the same value.
This is described in the documentation of Comparator:
The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S.
Caution should be exercised when using a comparator capable of imposing an ordering inconsistent with equals to order a sorted set (or sorted map). Suppose a sorted set (or sorted map) with an explicit comparator c is used with elements (or keys) drawn from a set S. If the ordering imposed by c on S is inconsistent with equals, the sorted set (or sorted map) will behave "strangely." In particular the sorted set (or sorted map) will violate the general contract for set (or map), which is defined in terms of equals.
In short, if your Comparator implementation is not consistent with equals, you should know what you're doing and you're responsible for the side effects of that design, but making it consistent with Object#equals is not mandatory. Still, it is preferable to do so, in order not to confuse future developers who will maintain the system. The same concept applies when implementing Comparable.
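To see the "strange" behavior from the quoted javadoc, here is a minimal sketch with a hypothetical Foo that keeps Object's identity equals and is ordered by a comparator on its value field only. A HashSet (which uses equals) keeps two distinct objects with the same value, while a TreeSet (which uses only the comparator) silently rejects the second one.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class ComparatorConsistencyDemo {
    // Hypothetical Foo: inherits identity-based equals/hashCode from Object.
    static final class Foo {
        final int value;
        Foo(int value) { this.value = value; }
    }

    // Compares on the field only, so it is inconsistent with identity equals.
    static final Comparator<Foo> BY_VALUE = Comparator.comparingInt(f -> f.value);

    // Returns {HashSet size, TreeSet size} after adding two distinct Foos with equal values.
    static int[] sizes() {
        Foo first = new Foo(42);
        Foo second = new Foo(42); // different object, same field value

        Set<Foo> hashSet = new HashSet<>();          // uses equals/hashCode
        Set<Foo> treeSet = new TreeSet<>(BY_VALUE);  // uses the comparator only
        hashSet.add(first);
        hashSet.add(second);
        treeSet.add(first);
        treeSet.add(second); // not added: compare(first, second) == 0

        return new int[] {hashSet.size(), treeSet.size()};
    }

    public static void main(String[] args) {
        int[] s = sizes();
        System.out.println(s[0] + " " + s[1]); // 2 1
    }
}
```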
An example of this in the JDK may be found in BigDecimal#compareTo, which explicitly states in javadoc that this method is not consistent with BigDecimal#equals.
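The BigDecimal case can be reproduced in a few lines: 1.0 and 1.00 differ in scale, so equals returns false, yet compareTo returns 0. The same pair therefore yields two elements in a HashSet but only one in a TreeSet. The class name is just for illustration.

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class BigDecimalDemo {
    static final BigDecimal ONE_POINT_ZERO = new BigDecimal("1.0");       // scale 1
    static final BigDecimal ONE_POINT_ZERO_ZERO = new BigDecimal("1.00"); // scale 2

    // Returns {HashSet size, TreeSet size} after adding both values to each set.
    static int[] sizes() {
        Set<BigDecimal> hash = new HashSet<>(); // equals/hashCode consider value and scale
        Set<BigDecimal> tree = new TreeSet<>(); // compareTo considers numeric value only
        for (BigDecimal d : new BigDecimal[] {ONE_POINT_ZERO, ONE_POINT_ZERO_ZERO}) {
            hash.add(d);
            tree.add(d);
        }
        return new int[] {hash.size(), tree.size()};
    }

    public static void main(String[] args) {
        System.out.println(ONE_POINT_ZERO.equals(ONE_POINT_ZERO_ZERO));    // false
        System.out.println(ONE_POINT_ZERO.compareTo(ONE_POINT_ZERO_ZERO)); // 0
        int[] s = sizes();
        System.out.println(s[0] + " " + s[1]);                             // 2 1
    }
}
```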
If your intention is to use a SortedSet<YourClass>, then you're probably using the wrong approach. I would recommend a SortedMap<TypeOfYourField, Collection<YourClass>> instead (or SortedMap<TypeOfYourField, YourClass>, if no two elements share the same key). It may be more work, but it gives you more control over the data stored in and retrieved from the structure.
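A minimal sketch of the SortedMap<TypeOfYourField, Collection<YourClass>> approach, assuming a hypothetical Foo with an int field to sort on (the label field exists only to make the output readable). Objects that share a value land in the same bucket instead of being dropped, and the buckets iterate in key order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class GroupByFieldDemo {
    // Hypothetical Foo: identity equals, an int field to sort on, and a label for printing.
    static final class Foo {
        final int value;
        final String label;
        Foo(int value, String label) { this.value = value; this.label = label; }
        @Override public String toString() { return label; }
    }

    static SortedMap<Integer, List<Foo>> groupByValue(List<Foo> foos) {
        SortedMap<Integer, List<Foo>> byValue = new TreeMap<>();
        for (Foo f : foos) {
            // Foos with the same value share one list rather than replacing each other.
            byValue.computeIfAbsent(f.value, k -> new ArrayList<>()).add(f);
        }
        return byValue;
    }

    public static void main(String[] args) {
        List<Foo> foos = Arrays.asList(new Foo(2, "b"), new Foo(1, "a"), new Foo(2, "c"));
        System.out.println(groupByValue(foos)); // {1=[a], 2=[b, c]}
    }
}
```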
You may have several comparators for a given class, e.g. one per field. In that case they cannot all be consistent with equals, so the answer is: not necessarily. However, you should make a comparator consistent with equals if a sorted collection (such as a TreeSet or TreeMap) uses it to determine element positions in that collection.
See documentation for details.

Java collection interface that guarantees no duplicates as well as preservation of insertion order

Is there a java collection interface that guarantees no duplicates as well as the preservation of insertion order at the same time?
This is exactly what LinkedHashSet does. However, I am wondering whether there is also an interface guaranteeing the same thing, in order to avoid a direct dependency on a specific class.
SortedSet refers only to a sort order (natural or comparator-defined), not insertion order, and is not implemented by LinkedHashSet.
Essentially, I am looking for an interface that would indicate that the iteration order of elements is significant (and at the same time it contains no duplicates, i.e., List obviously would not apply).
Thanks!
UPDATE: This question is not asking for an implementation or a data structure (as in the question to which this was marked as a duplicate). As several people pointed out as clarification, I am looking for an interface that demands both properties (no duplicates and significant order) in its contract. The application for this would be that I can return objects of this type to clients without promising any specific implementation.
UPDATE 2: Moreover, the related question specifically asks for preserving duplicates, in contrast to this question. So I am pretty certain it is not a duplicate.
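No interface in the JDK collections framework provided that at the time of this answer. (Java 21 later added SequencedSet, which LinkedHashSet implements; note that it only guarantees a defined encounter order, which for a SortedSet is sort order rather than insertion order.)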
You could try to build it by combining Set and List. Any collection implementing Set should not allow duplicate elements, and any collection implementing List should maintain order.
But no class in the JDK collections framework implements both Set and List; unfortunately, LinkedHashSet does not implement List.
Of course, you could easily build such an implementation by wrapping a LinkedHashSet (using composition, not inheritance) and adding a get(int i) method, or by wrapping an ArrayList (again by composition) and throwing an IllegalArgumentException when trying to add an element that is already present.
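A minimal sketch of the "wrap an ArrayList" option. UniqueList is a hypothetical name, and the extra HashSet is a design choice of this sketch (it makes the duplicate check O(1) instead of scanning the list). To stay short it implements only Iterable, not the full List or Set interfaces.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical wrapper: composition over an ArrayList that rejects duplicates.
final class UniqueList<E> implements Iterable<E> {
    private final List<E> elements = new ArrayList<>(); // preserves insertion order
    private final Set<E> seen = new HashSet<>();         // fast duplicate check

    void add(E e) {
        if (!seen.add(e)) {
            throw new IllegalArgumentException("duplicate element: " + e);
        }
        elements.add(e);
    }

    E get(int index) { return elements.get(index); }

    int size() { return elements.size(); }

    @Override
    public Iterator<E> iterator() {
        // Read-only iterator, so callers cannot bypass the duplicate check via remove().
        return Collections.unmodifiableList(elements).iterator();
    }
}
```

Delegating to a private list (rather than extending ArrayList) is what keeps callers from reaching the unchecked add methods of the underlying list.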
The trickiest part IMHO would be the addAll method, as the two interfaces define it with different semantics (emphasis mine):
Set: Adds all of the elements in the specified collection to this set if they're not already present
List : Appends all of the elements in the specified collection to the end of this list, in the order that they are returned by the specified collection's iterator
As you cannot meet both requirements if the source collection contains duplicates, my advice would be for addAll to throw an IllegalArgumentException in that case, or, more simply, to always throw an UnsupportedOperationException, since addAll is an optional operation in both interfaces.
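The conflict is easy to see by calling addAll on a LinkedHashSet and an ArrayList with the same source containing duplicates. The Set adds only elements that are not already present, while the List appends everything in iteration order, so no single class can honor both contracts for this input. The class name is illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class AddAllSemanticsDemo {
    static String compare() {
        List<String> source = Arrays.asList("b", "a", "b");

        Set<String> set = new LinkedHashSet<>(Arrays.asList("a"));
        set.addAll(source);  // Set contract: adds only elements not already present

        List<String> list = new ArrayList<>(Arrays.asList("a"));
        list.addAll(source); // List contract: appends every element, duplicates included

        return set + " " + list;
    }

    public static void main(String[] args) {
        System.out.println(compare()); // [a, b] [a, b, a, b]
    }
}
```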

Is the order of HashMap elements reproducible?

First of all, I want to make it clear that I would never use a HashMap to do things that require some kind of order in the data structure and that this question is motivated by my curiosity about the inner details of Java HashMap implementation.
The Java documentation for Object describes the contract of the hashCode method.
I understand from there that hashCode implementation for classes such as String and basic types wrappers (Integer, Long,...) is predictable once the value contained by the object is given. An example of that would be that calls to hashCode for any String object containing the value hello should return always: 99162322
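The String javadoc specifies the hash formula, s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic, which is why the value for "hello" is the same on every run. The sketch below recomputes it by hand; manualHash is an illustrative helper, not a JDK method.

```java
class StringHashDemo {
    // Recomputes String.hashCode() using its documented formula (Horner's scheme, int overflow).
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println("hello".hashCode());  // 99162322
        System.out.println(manualHash("hello")); // 99162322
    }
}
```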
Suppose an algorithm always inserts the same String keys, with the same values and in the same order, into an empty Java HashMap. Then the order of its elements at the end should always be the same, am I wrong?
Since the hash code for a given value is always the same, if there are no collisions the order should be the same.
On the other hand, if there are collisions, I think (I don't know for sure) that collision resolution should result in the same order for exactly the same input elements.
So, isn't it right that two HashMap objects with the same elements, inserted in the same order should be traversed (by an iterator) giving the same elements sequence?
As far as I know, the order of the elements in a HashMap (where "order" means the order in which the values() iterator returns them) is kept until the map is rehashed. We can influence the probability of that happening by passing an initial capacity and/or load factor to the constructor.
Nevertheless, we should never rely on this statement because the internal implementation of HashMap is not a part of its public contract and is a subject to change in future.
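A small experiment in line with the above: build two HashMaps from the same String keys in the same order and compare their iteration orders. In practice, within one JVM and JDK version, they match, because String hash codes and HashMap's bucket logic are deterministic. That is observed behavior, not a documented guarantee, and the order itself is whatever the current implementation produces. Names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashMapOrderDemo {
    // Inserts the keys in the given order and returns the map's iteration order.
    static List<String> keysInIterationOrder(List<String> inserts) {
        Map<String, Integer> map = new HashMap<>();
        for (String key : inserts) {
            map.put(key, key.length());
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        List<String> inserts = Arrays.asList("pear", "apple", "fig", "banana", "kiwi");
        // Same keys, same order, same JVM: the two orders match (observed, not guaranteed).
        System.out.println(keysInIterationOrder(inserts).equals(keysInIterationOrder(inserts)));
        // Implementation-specific order; do not rely on it.
        System.out.println(keysInIterationOrder(inserts));
    }
}
```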
I think you are asking "Is HashMap non-deterministic?". The answer is "probably not" (look at the source code of your favourite implementation to find out).
However, bear in mind that because the Java standard does not guarantee a particular order, the implementation is free to change it at any time (e.g. in newer JRE versions), giving a different (yet still deterministic) result.
Whether or not that is true depends entirely on the implementation. What's more important is that it isn't guaranteed. If order is important to you, there are options: you could create your own Map implementation that preserves order, use a SortedMap or LinkedHashMap, or use something like the Apache Commons Collections OrderedMap: http://commons.apache.org/proper/commons-collections/javadocs/api-release/org/apache/commons/collections4/OrderedMap.html.
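For comparison, the two JDK options mentioned above do document their iteration order: LinkedHashMap iterates in insertion order, and TreeMap (a SortedMap) iterates in key order. A short sketch with illustrative names:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class OrderedMapDemo {
    static String orders() {
        Map<String, Integer> linked = new LinkedHashMap<>(); // documented: insertion order
        Map<String, Integer> sorted = new TreeMap<>();       // documented: natural key order
        for (String key : new String[] {"pear", "apple", "fig"}) {
            linked.put(key, key.length());
            sorted.put(key, key.length());
        }
        return linked.keySet() + " " + sorted.keySet();
    }

    public static void main(String[] args) {
        System.out.println(orders()); // [pear, apple, fig] [apple, fig, pear]
    }
}
```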

LinkedHashSet as a returning type of API public method

Let's consider the following example.
Suppose I am writing an API with a public method that returns a collection of unique objects.
I believe it is good to declare the method's return type as Set, to show the user that the items are unique.
When the items are both unique and ordered, is it a good idea to declare the return type as LinkedHashSet, or should it be Collection?
I know there are collections that are unique and sorted. I want to know whether it is a good idea, in terms of OOP, to use a type such as TreeSet, SortedSet or LinkedHashSet as a public method's return type.
You can return SortedSet - it means the items are sorted, and are unique.
You can also use SetUniqueList (from commons-collections) and return List (indicating in the javadoc that the elements are unique), or use any set and return Set (indicating the ordering property in the javadoc).
LinkedHashSet retains the insertion order, but since your object is likely doing the inserts, it means nothing to the client.
I'd recommend against returning LinkedHashSet (unless you have a very good justification for it). If you return Set, you can change the Set implementation as you see fit, e.g. HashSet, TreeSet, etc.
In this case, I think your suggestion of returning Set is a good one as it does indicate that the items are unique. This also indicates that contains will generally be fast (O(1) or O(log n)).
On the other hand, Collection is very generic, but all it tells the caller is that it is a plain old group of somethings without any special constraints on ordering or uniqueness. Specifying Set means that there isn't any confusion about uniqueness, and you can use it anywhere where a Collection can be used anyway.
If your items are unique, then I would return a Set, and state in the method's Javadoc that the items are guaranteed to be in sorted order.
To answer your question, you should ask yourself: "What is the most generic type that specifies the characteristics of what this method returns?"
If the method is characterized as returning a handful of unique objects, sorted in some way, then the most generic standard type that represents this is SortedSet.
If you return a TreeSet, then the method is giving away details of the implementation of what it returns (i.e., TreeSet is a concrete class, not an interface), which you often want to avoid in OOP.
If you return a Collection, you are not stating that the objects are unique, nor that they are sorted in some way.
If you return a LinkedHashSet, not only do you fail to state that the returned collection is somehow sorted, but you also fail at the abstraction aspect of OOP (this is a concrete class, so you are leaking implementation details; always try to return interfaces, unless you have a good reason not to).
I would only give a method the return type LinkedHashSet<...> if it's part of the method's contract that its return-value is a Set<...> with a consistent ordering. And even then I'd be a bit wary of doing that unless it's also part of the method's contract that its return-value is modifiable, since otherwise LinkedHashSet<...> also precludes the use of Collections.unmodifiableSet(...).
In most cases, I think Set<...> is a better return-type. Alternatively, if the consistent ordering is particularly important, then you can use SortedSet<...> and switch to one of the implementations of that (such as TreeSet<...>); that still allows Collections.unmodifiableSortedSet(...).
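A minimal sketch of the interface-typed approach, using a hypothetical TagService. tags() declares only Set, yet the unmodifiable view still iterates in the backing LinkedHashSet's insertion order (a property you would promise in Javadoc, not in the type); sortedTags() declares SortedSet and hides the TreeSet behind Collections.unmodifiableSortedSet.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical API class illustrating interface return types.
class TagService {
    private final Set<String> tags = new LinkedHashSet<>(); // implementation detail

    void addTag(String tag) { tags.add(tag); }

    // Declares uniqueness only; iteration follows the backing set's insertion order.
    Set<String> tags() {
        return Collections.unmodifiableSet(tags);
    }

    // Declares uniqueness and sort order without exposing the concrete TreeSet.
    SortedSet<String> sortedTags() {
        return Collections.unmodifiableSortedSet(new TreeSet<>(tags));
    }
}
```

Because callers only see Set or SortedSet, the backing implementation can later change without breaking them, and both views reject modification with UnsupportedOperationException.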
If you want to make it clear that the items are unique and are ordered purely by insertion order, you could return LinkedHashSet. However, Set is usually the better choice.

Is there a way to check if two Collections contain the same elements, independent of order?

I've been looking for a method that operates like Arrays.equals(a1, a2), but ignores the element order. I haven't been able to find it in either Google Collections (something like Iterables.elementsEqual(), but that does account for ordering) or JUnit (assertEquals() obviously just calls equals() on the Collection, which depends on the Collection implementation, and that's not what I want).
It would be best if such a method would take Iterables, but I'm also fine with simply taking Collections
Such a method would of course take into account any duplicate elements in the collection (so it can't simply test for containsAll()).
Note that I'm not asking how to implement such a thing, I was just wondering if any of the standard Collections libraries have it.
Apache commons-collections has CollectionUtils#isEqualCollection:
Returns true if the given Collections contain exactly the same elements with exactly the same cardinality.
That is, if the cardinality of e in a is equal to the cardinality of e in b, for each element e in a or b.
Which is, I think, exactly what you're after.
This is three method calls and uses Google Collections/Guava, but it is possibly as simple as it gets:
HashMultiset.create(c1).equals(HashMultiset.create(c2));
Creating the temporary Multisets may appear wasteful, but to compare the collections efficiently you need to index them somehow.
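For readers without Guava or Commons Collections on the classpath, the same idea (index both sides by element count, then compare the indexes) can be sketched with only the JDK. This is an illustrative stand-in for the multiset comparison, not a library API; the HashMap of counts plays the role of the temporary Multiset.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class SameElementsDemo {
    // Maps each element to the number of times it occurs.
    static Map<Object, Integer> counts(Iterable<?> items) {
        Map<Object, Integer> counts = new HashMap<>();
        for (Object item : items) {
            counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    // Equal ignoring order, but respecting how many times each element occurs.
    static boolean sameElements(Iterable<?> a, Iterable<?> b) {
        return counts(a).equals(counts(b));
    }

    public static void main(String[] args) {
        System.out.println(sameElements(Arrays.asList("a", "b", "b"), Arrays.asList("b", "a", "b"))); // true
        System.out.println(sameElements(Arrays.asList("a", "a", "b"), Arrays.asList("a", "b", "b"))); // false
    }
}
```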
If you want to ignore order, then how about testing sets for equality? (Note that this also ignores how many times each element occurs.)
new HashSet<>(c1).equals(new HashSet<>(c2))
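The set-based comparison is order-insensitive, but it also drops duplicates, so two collections with different cardinalities can compare as equal. A short sketch (names illustrative):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

class SetEqualityDemo {
    // Compares distinct elements only; HashSet discards duplicate counts.
    static boolean setsEqual(List<String> c1, List<String> c2) {
        return new HashSet<>(c1).equals(new HashSet<>(c2));
    }

    public static void main(String[] args) {
        // true, even though "a" occurs twice on the left and once on the right
        System.out.println(setsEqual(Arrays.asList("a", "a", "b"), Arrays.asList("a", "b", "b")));
    }
}
```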
