We are learning about the Collection interface, and I was wondering if you all have any good advice for its general use. What can you do with a Collection that you cannot do with an array? What can you do with an array that you cannot do with a Collection (besides allowing duplicates)?
The easy way to think of it is: Collections beat object arrays in basically every single way. Consider:
A collection can be mutable or immutable. A nonempty array must always be mutable.
A collection can allow or disallow null elements. An array must always permit null elements.
A collection can be thread-safe; even concurrent. An array is never safe to publish to multiple threads.
A list or set's equals, hashCode and toString methods do what users expect; on an array they are a common source of bugs.
A collection is type-safe; an array is not. Because arrays "fake" covariance, ArrayStoreException can result at runtime.
A collection can hold a non-reifiable type (e.g. List<Class<? extends E>> or List<Optional<T>>). An array will generate a warning for this.
A collection can have views (unmodifiable, subList...). No such luck for an array.
A collection has a full-fledged API; an array has only set-at-index, get-at-index, length and clone.
Type-use annotations like @Nullable are very confusing with arrays. I promise you can't guess what @A String @B [] @C [] means.
Because of all the reasons above, third-party utility libraries should not bother adding much additional support for arrays, focusing only on collections, so you also have a network effect.
Object arrays will never be first-class citizens in Java APIs.
A couple of the reasons above are covered -- but in much greater detail -- in Effective Java, Third Edition, Item 28, from page 126.
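Two of the points above (equals on arrays, and the "fake" covariance) are easy to see in a few lines. This is only a small sketch; the class name ArrayVsListPitfalls and its method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class ArrayVsListPitfalls {

    // Arrays inherit equals() from Object, so this compares identity,
    // not contents.
    static boolean arraysEqualByEquals() {
        String[] a = {"x", "y"};
        String[] b = {"x", "y"};
        return a.equals(b);
    }

    // Lists compare element by element.
    static boolean listsEqualByEquals() {
        List<String> a = Arrays.asList("x", "y");
        List<String> b = Arrays.asList("x", "y");
        return a.equals(b);
    }

    // Array covariance compiles, but the bad store fails at runtime.
    static boolean badStoreThrows() {
        Object[] objects = new String[1]; // legal: String[] is an Object[]
        try {
            objects[0] = Integer.valueOf(42);
            return false;
        } catch (ArrayStoreException expected) {
            return true;
        }
        // The generic equivalent, List<Object> l = new ArrayList<String>(),
        // is rejected by the compiler instead.
    }
}
```

If you do need content-based comparison of arrays, Arrays.equals and Arrays.toString exist, but you have to remember to call them.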
So, why would you ever use object arrays?
You're very tightly optimizing something
You have to interact with an API that uses them and you can't fix it
so convert to/from a List as close to that API as you can
Because varargs (but varargs is overused)
so ... same as previous
Obviously some collection implementations must be using them
I can't think of any other reasons, they suck bad
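As a rough sketch of "convert to/from a List as close to that API as you can": the legacy methods below (legacyFetchNames, legacyCount) are hypothetical stand-ins for an array-based API you cannot change, and the wrapper names are likewise invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArrayBoundary {

    // Hypothetical legacy API that hands out and accepts arrays.
    static String[] legacyFetchNames() {
        return new String[] {"ann", "bob"};
    }

    static int legacyCount(String[] names) {
        return names.length;
    }

    // Convert to a List right at the boundary. Copying into an ArrayList
    // gives a resizable list that is independent of the array.
    static List<String> fetchNames() {
        return new ArrayList<>(Arrays.asList(legacyFetchNames()));
    }

    // Convert back to an array only at the point of the legacy call.
    static int count(List<String> names) {
        return legacyCount(names.toArray(new String[0]));
    }
}
```

The rest of the code then only ever sees List<String>.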
It's basically a question of the desired level of abstraction.
Most collections can be implemented in terms of arrays, but they provide many more methods on top of it for your convenience. Most collection implementations I know of for instance, can grow and shrink according to demand, or perform other "high-level" operations which basic arrays can't.
Suppose for instance that you're loading strings from a file. You don't know how many new-line characters the file contains, thus you don't know what size to use when allocating the array. Therefore an ArrayList is a better choice.
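A minimal sketch of that file-loading case, assuming the input arrives as a java.io.Reader (the class and method names are made up). The list simply grows as lines come in, so no size has to be guessed up front.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineLoader {

    // Reads every line from the source into a list of unknown final size.
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // the ArrayList resizes itself as needed
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

With a plain array you would have to either read the file twice or write the grow-and-copy logic yourself.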
The details are in the interfaces of the collections framework, such as Set, List, and Map (Set and List extend Collection; Map belongs to the framework but does not extend Collection). Each of those types has its own semantics. A Set cannot contain duplicates and, following the mathematical concept of a set, has no notion of order (although some implementations do maintain one). A List is closest to an array. A Map has specific behavior for put and get: you put an object under a key, and you retrieve it with the same key.
There are even more details in the implementations of each collection type. For example, the hash-based collections (e.g. HashSet, HashMap) rely on the hashCode() method that every Java object has.
You could simulate the semantics of any collection type on top of an array, but you would have to write a lot of code to do it. For example, to back a Map with an array, you would need to write a method that puts any object entered into your Map into a specific bucket in the array. You would need to handle duplicates. For an array simulating a Set, you would need to write code to not allow duplicates.
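To give a feel for how much code that is, here is a deliberately tiny array-backed set. TinyArraySet is a made-up class, not anything from the JDK, and it only covers add, contains and size, with a linear scan instead of hashing.

```java
import java.util.Arrays;
import java.util.Objects;

class TinyArraySet {
    private Object[] elements = new Object[4];
    private int size = 0;

    // Linear scan; a real HashSet would use hashCode() to find a bucket.
    boolean contains(Object candidate) {
        for (int i = 0; i < size; i++) {
            if (Objects.equals(elements[i], candidate)) {
                return true;
            }
        }
        return false;
    }

    // Returns false and leaves the set unchanged if the element is a duplicate.
    boolean add(Object element) {
        if (contains(element)) {
            return false;
        }
        if (size == elements.length) {
            // Arrays cannot grow, so allocate a bigger one and copy.
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = element;
        return true;
    }

    int size() {
        return size;
    }
}
```

Removal, iteration, equals/hashCode and so on would all still be missing, which is the point: HashSet already gives you these.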
The Collection interface is just a base interface for specialised collections -- I am not aware yet of a class that simply just implements Collection; instead classes implement specialized interfaces which extend Collection. These specialized interfaces and abstract classes provide functionality for working with sets (unique objects), growing arrays (e.g. ArrayList), key-value maps etc -- all of which you cannot do out of the box with an array.
However, iterating through an array and setting/reading items from an array remains one of the fastest methods of dealing with data in Java.
One advantage is the Iterator interface. Every Collection is Iterable, which means it can hand out an Iterator. An Iterator is an object that knows how to iterate over the given collection and presents the programmer with a uniform interface regardless of the underlying implementation. That is, a linked list is traversed differently from a binary tree, but the iterator hides these differences from the programmer, making it easier to use one or the other collection.
This also leads to the ability to use various implementations of Collection interchangeably if the client code targets the Collection interface itself.
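For instance, a method written against Collection and its Iterator does not care what is behind it. The class and method names here are just for illustration.

```java
import java.util.Collection;
import java.util.Iterator;

class UniformIteration {

    // Works for any Collection<Integer>: linked list, tree set, array list...
    static int sum(Collection<Integer> numbers) {
        int total = 0;
        Iterator<Integer> it = numbers.iterator();
        while (it.hasNext()) {
            total += it.next();
        }
        return total;
    }
}
```

Passing a LinkedList or a TreeSet holding 1, 2 and 3 gives the same result, even though the traversal underneath is completely different. (A for-each loop compiles down to the same iterator calls.)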
Oracle site's definition of collection is:
A collection — sometimes called a container — is simply an object that groups multiple
elements into a single unit.
I know Java provides java.util.Collection. It includes Set, ArrayList, Queue, etc.
My question is: Would I be wrong to refer to an array of objects as a collection of objects? (Even though java.util.Collection presumably does not include arrays.)
Edit:
Something interesting I found. This is how Microsoft defines arrays & collections:
http://msdn.microsoft.com/en-sg/library/9ct4ey7x(v=vs.90).aspx
It's somewhat a question of semantics. The word "collection" in English could mean "more than one", and thus, yes, an array is a collection. However, in Java, you'd usually see it spelled as Collection, with a capital C, i.e., an object that is of a subtype of java.util.Collection. For this meaning, you cannot use an array, as arrays in Java do not implement this interface.
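You can check the capital-C meaning directly. In this small sketch (names invented), an array fails the instanceof test, while Arrays.asList wraps it in something that passes.

```java
import java.util.Arrays;
import java.util.Collection;

class ArrayIsNotCollection {

    static boolean isCollection(Object candidate) {
        return candidate instanceof Collection;
    }

    // Arrays.asList returns a fixed-size List view backed by the array,
    // which is a java.util.Collection.
    static Collection<String> asCollection(String[] array) {
        return Arrays.asList(array);
    }
}
```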
I don't see why this is a problem. In the literal sense an array is a 'collection'.
But when you dive into Java, the meaning of a collection changes from a mere group of objects to a set of mechanisms to manage (store, retrieve, operate on, etc.) the objects, as specified by the Collection interface.
So, since an array cannot implement that interface, your array isn't exactly a collection in the Java sense.
An array is an object with a fixed number (possibly 0) of slots for variables of the same type.
Nothing more.
If you want, you can say this is a kind of collection, or list, or sequence.
But all these terms are well known as names, or parts of names, of interfaces or classes which are much more than a simple array.
So, if you use these names for an array, other people may not understand correctly what you mean.
My recommendation: An array is just an array, so call it an array.
One difference that can help you with this dilemma: an array is a single object with a fixed length whose slots are laid out contiguously, while a Collection is an object that manages its elements through whatever internal structure its implementation chooses, so the elements may live at unrelated memory addresses.
Arrays are relatively faster in operations as compared to Collections.
Collections give you a lot of utility methods on top of Arrays.
I have often read in many places that one should avoid returning an iterable and return a collection instead. For example -
public Iterable<Maze> Portals() {
    // a list of some maze configurations
    List<Maze> mazes = createMazes();
    ...
    return Collections.unmodifiableList(mazes);
}
The argument is that returning an iterable is only useful for a foreach loop, while a collection already provides an iterator and gives much more control. Could you please tell me when it is beneficial to specifically return an iterable from a method? Or should we always return a collection instead?
Note : This question is not about Guava library
Returning an Iterable would be beneficial when we need to lazily load a collection that contains a lot of elements.
The following quote from Google Collections FAQ seems to support the idea of lazy loading:
Why so much emphasis on Iterators and Iterables?
In general, our methods do not require a Collection to be passed in
when an Iterable or Iterator would suffice. This distinction is
important to us, as sometimes at Google we work with very large
quantities of data, which may be too large to fit in memory, but which
can be traversed from beginning to end in the course of some
computation. Such data structures can be implemented as collections,
but most of their methods would have to either throw an exception,
return a wrong answer, or perform abysmally. For these situations,
Collection is a very poor fit; a square peg in a round hole.
An Iterator represents a one-way scrollable "stream" of elements, and
an Iterable is anything which can spawn independent iterators. A
Collection is much, much more than this, so we only require it when we
need to.
I can see advantages and disadvantages:
One advantage is that Iterable is a simpler interface than Collection. If you have a non-standard collection type, it may be easier to make it Iterable than Collection. Indeed, there are some kinds of collection for which some of the Collection methods are problematic to implement, for example lazy collection types and collections where you don't want to rely on the standard equals(Object) method to determine membership.
One disadvantage is that Iterable is poor in functionality. If you have a concrete type that implements Collection, and you return it as an Iterable, you are removing the possibility that the code can (directly) call a variety of useful collection methods.
There are some cases where neither Iterable nor Collection is a good fit; e.g. specialist collections of primitive types, where you need to avoid the overheads of using the primitive wrapper types.
You can't really say whether it is good or bad practice to return an Iterable. It depends on the circumstances; e.g. the purpose of the API you are designing, and the requirements or constraints that you want / need to place on it.
The problem is that if the underlying collection changes while the caller is iterating, you will be in trouble.
If the underlying collection is one that throws ConcurrentModificationException, the caller has to take care of that as well; returning a separate copy as a collection avoids such issues.
Return the most specific type that makes sense for the use in question. If you have a method that's creating a new collection, for example, or you can easily wrap the collection in an unmodifiable wrapper, returning the collection as a Collection, or even a List or Set, makes the client developer's life a little easier.
Returning Iterable makes sense for code where the values may be generated on-the-fly; you could imagine a Fibonacci generator, for example, that created an Iterator that calculated the next number instead of trying to store some lookup table. If you're writing framework or interface code where such a "streaming" sort of API might be useful (Guava and its functional classes do a good bit of this), then specifying Iterable instead of a collection type might be worth the loss of flexibility on the consumer side.
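The Fibonacci idea could look roughly like this. It is only a sketch: the class name is made up, the sequence is unbounded (hasNext is always true) and the long values will eventually overflow.

```java
import java.util.Iterator;

class Fibonacci implements Iterable<Long> {

    // Each call spawns an independent iterator that computes values on demand.
    @Override
    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            private long current = 0;
            private long next = 1;

            @Override
            public boolean hasNext() {
                return true; // unbounded: the caller decides when to stop
            }

            @Override
            public Long next() {
                long result = current;
                current = next;
                next = result + next;
                return result;
            }
        };
    }

    // Convenience helper: walks the sequence up to the given zero-based index.
    static long nth(int index) {
        int i = 0;
        for (long value : new Fibonacci()) {
            if (i++ == index) {
                return value;
            }
        }
        throw new AssertionError("unreachable: the sequence never ends");
    }
}
```

Nothing is stored, and methods like size() or contains() would have no sensible answer here, which is why Collection would be a poor fit for this kind of source.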
What are the fundamental differences between List and ArrayList? Is one more efficient? Does one have more methods?
List is an interface while ArrayList is a class.
See ArrayList, and List.
E.g., you can't do this:
List<String> list = new List<String>();
...because List is an interface.
However, this works:
ArrayList<String> arrayList = new ArrayList<String>();
Also... You can do as duffymo says below, which is more or less the same as implementing the List interface (making your own list implementation).
Consider a line like the following:
List<String> names = new ArrayList<String>();
If you're new to object-oriented architectures, you might have expected instead to see something like ArrayList<String> names = new ArrayList<String>();. After all, you've just said that it's a new ArrayList, so shouldn't you store it in a variable of type ArrayList?
Well, you certainly can do that. However, List is an interface (like a template of sorts) that ArrayList implements. It is a contract that says "anytime you use a List implementation, you can expect these methods to be available". In the case of List, the methods are things like add, get, etc.
But ArrayList is only one implementation of List. There are others, such as LinkedList. The two have the same interface, and can be used the same way, but work very differently behind the scenes. Where ArrayList offers "random" access, meaning that it jumps directly to a specific element of its backing array without iterating through the whole list, LinkedList has to start from one end and go node by node until it gets to the element you need.
The thing is, while you do need to specify which implementation you want when you create the object, the rest of your code generally needs to know no more than the fact that it is a List, so you simply say that's what it is. List communicates that you have a collection that is intended to be in the order that it is given. If you don't need to communicate that much, you might consider passing it around as a Collection, which is another interface (a super-interface of List). Or, if all you need to communicate is that you can iterate over it, you might even call it an Iterable.
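Here is a small sketch of that idea: each method asks only for the most general type it actually needs. The class and method names are invented for the example.

```java
import java.util.Collection;
import java.util.List;

class InterfaceTyping {

    // Only iterates, so Iterable is enough.
    static int totalLength(Iterable<String> words) {
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return total;
    }

    // Needs the size, so it asks for a Collection.
    static boolean hasAtLeast(Collection<?> items, int minimum) {
        return items.size() >= minimum;
    }

    // Needs positional access, so it asks for a List.
    static String last(List<String> names) {
        return names.get(names.size() - 1);
    }
}
```

A variable declared as List<String> names = new ArrayList<String>() can be passed to all three, and you can later switch it to new LinkedList<String>() without touching any of them.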
List is an interface; ArrayList is a class that implements the List interface.
Interfaces define the method signatures that are required, but say nothing about how they are implemented.
Classes that implement an interface promise to provide public implementations of methods with the identical signatures declared by the interface.
List defines the interface that ArrayList implements, which allows ArrayList and every other class that implements List to be used interchangeably or in a similar way. An ArrayList is always also a List, but a List isn't necessarily an ArrayList.
That is, ArrayList implements List (among a few other interfaces).
Using List together with ArrayList, or any other implementation of List, is an example of polymorphism and inheritance, which is also one of the main reasons for using a language such as Java.
In simplicity, Polymorphism is many forms while Inheritance is reuse.
There are many concrete, ready-to-use List implementations available to you, such as ArrayList, Vector, LinkedList and Stack. The decision of which to use is yours, and if you look at the List API, you will notice that all of these classes implement List in one way or another.
According to the Java docs, List is just an interface, and ArrayList is one of the classes that implement it. There is no inherent efficiency advantage to using ArrayList-typed references instead of List-typed references to an ArrayList object.
However, when it comes to "efficiency", there can be a difference between different implementations of the List interface. For instance there can be a small efficiency difference between a LinkedList and an ArrayList, depending on how you're using them.
To quote the java docs on the ArrayList page,
The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.
In other words, the performance difference will probably be negligible, but you may see some advantage from using an ArrayList (as opposed to a LinkedList).
In case you're interested, ArrayList is implemented with an array that is replaced by a larger one whenever its capacity is exceeded (the exact growth policy is an implementation detail; current OpenJDK versions grow it by about half), which is quite different from the implementation of a LinkedList (see Wikipedia for details).
Well, it seems to me ArrayLists make it easier to expand the code later on both because they can grow and because they make using Generics easier. However, for multidimensional arrays, I find the readability of the code is better with standard arrays.
Anyway, are there some guidelines on when to use one or the other? For example, I'm about to return a table from a function (int[][]), but I was wondering if it wouldn't be better to return a List<List<Integer>> or a List<int[]>.
Unless you have a strong reason otherwise, I'd recommend using Lists over arrays.
There are some specific cases where you will want to use an array (e.g. when you are implementing your own data structures, or when you are addressing a very specific performance requirement that you have profiled and identified as a bottleneck) but for general purposes Lists are more convenient and will offer you more flexibility in how you use them.
Where you are able to, I'd also recommend programming to the abstraction (List) rather than the concrete type (ArrayList). Again, this offers you flexibility if you decide to change the implementation details in the future.
To address your readability point: if you have a complex structure (e.g. ArrayList of HashMaps of ArrayLists) then consider either encapsulating this complexity within a class and/or creating some very clearly named functions to manipulate the structure.
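For the table in the question, encapsulation might look something like the sketch below. IntTable and all of its methods are hypothetical; the point is only that callers see clearly named operations, while the int[][] stays a private detail that could be swapped out later.

```java
import java.util.Arrays;

// Hypothetical wrapper; the name and methods are illustrative only.
final class IntTable {
    private final int[][] cells;

    IntTable(int rows, int cols) {
        cells = new int[rows][cols];
    }

    int rowCount() {
        return cells.length;
    }

    int get(int row, int col) {
        return cells[row][col];
    }

    void set(int row, int col, int value) {
        cells[row][col] = value;
    }

    // Returns a copy so callers cannot modify the internal storage.
    int[] row(int row) {
        return Arrays.copyOf(cells[row], cells[row].length);
    }
}
```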
Choose a data structure implementation and interface based on primary usage:
Random Access: use List for variable type and ArrayList under the hood
Appending: use Collection for variable type and LinkedList under the hood
Loop and process: use Iterable and see the above for use under the hood based on producer code
Use the most abstract interface possible when handing around data. That said, don't use Collection when you need random access. List has get(int), which is very useful when random access is needed.
Typed collections like List<String> make up for the syntactical convenience of arrays.
Don't use Arrays unless you have a qualified performance expert analyze and recommend them. Even then you should get a second opinion. Arrays are generally a premature optimization and should be avoided.
Generally speaking you are far better off using an interface rather than a concrete type. The concrete type makes it hard to rework the internals of the function in question. For example if you return int[][] you have to do all of the computation upfront. If you return List<List<Integer>> you can lazily do computation during iteration (or even concurrently in the background) if it is beneficial.
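One way to get that laziness, sketched under the assumption that each cell can be computed from its row and column alone: extend AbstractList and compute in get(). The multiplication table is just a placeholder computation, and the names are made up.

```java
import java.util.AbstractList;
import java.util.List;

class LazyTable {

    // Cell (row, col) holds (row + 1) * (col + 1), computed only when read.
    static List<List<Integer>> multiplicationTable(final int rows, final int cols) {
        return new AbstractList<List<Integer>>() {
            @Override
            public List<Integer> get(final int row) {
                if (row < 0 || row >= rows) {
                    throw new IndexOutOfBoundsException("row " + row);
                }
                return new AbstractList<Integer>() {
                    @Override
                    public Integer get(int col) {
                        if (col < 0 || col >= cols) {
                            throw new IndexOutOfBoundsException("col " + col);
                        }
                        return (row + 1) * (col + 1);
                    }

                    @Override
                    public int size() {
                        return cols;
                    }
                };
            }

            @Override
            public int size() {
                return rows;
            }
        };
    }
}
```

The caller still gets a read-only List<List<Integer>> it can index or loop over, but no int[rows][cols] is ever allocated. With an int[][] return type that option is gone.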
The List is more powerful:
You can resize the list after it has been created.
You can create a read-only view onto the data.
It can be easily combined with other collections, like Set or Map.
The array works on a lower level:
Its content can always be changed.
Its length can never be changed.
It uses less memory.
You can have arrays of primitive data types.
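The read-only view point is worth a quick illustration, since arrays have no equivalent. A rough sketch with invented names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ListViews {

    // Writes through the view are rejected...
    static boolean readOnlyViewRejectsWrites() {
        List<String> backing = new ArrayList<>(Arrays.asList("a", "b", "c"));
        List<String> readOnly = Collections.unmodifiableList(backing);
        try {
            readOnly.add("d");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    // ...but the view is not a copy, so it still reflects later changes
    // made to the backing list.
    static int viewSizeAfterBackingGrows() {
        List<String> backing = new ArrayList<>(Arrays.asList("a", "b", "c"));
        List<String> readOnly = Collections.unmodifiableList(backing);
        backing.add("d");
        return readOnly.size();
    }
}
```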
I wanted to point out that Lists can hold the wrappers for the primitive data types that would otherwise need to be stored in an array (i.e. a class Double that has only one field: a double). The newer versions of Java convert to and from these wrappers implicitly, at least most of the time, so the ability to put primitives in your Lists should not be a consideration for the vast majority of use cases.
For completeness: the only time that I have seen Java fail to implicitly convert from a primitive wrapper was when those wrappers were composed in a higher order structure: It could not convert a Double[] into a double[].
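Since there is no implicit conversion between Double[] and double[], it has to be done element by element. Two possible ways, as a sketch (the class name is made up; the stream version assumes Java 8 or later):

```java
import java.util.List;

class Unboxing {

    // Plain loop: each element is auto-unboxed individually.
    // A null element would cause a NullPointerException here.
    static double[] toPrimitive(Double[] boxed) {
        double[] result = new double[boxed.length];
        for (int i = 0; i < boxed.length; i++) {
            result[i] = boxed[i];
        }
        return result;
    }

    // Same idea for a List<Double>, using a stream.
    static double[] toPrimitive(List<Double> boxed) {
        return boxed.stream().mapToDouble(Double::doubleValue).toArray();
    }
}
```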
It mostly comes down to flexibility/ease of use versus efficiency. If you don't know how many elements will be needed in advance, or if you need to insert in the middle, ArrayLists are a better choice. They use Arrays under the hood, I believe, so you'll want to consider using the ensureCapacity method for performance. Arrays are preferred if you have a fixed size in advance and won't need inserts, etc.
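If you do know roughly how many elements are coming, ensureCapacity can be used like this (a sketch with made-up names). Note that the method lives on ArrayList, not on the List interface, so the variable has to be typed as ArrayList while you call it.

```java
import java.util.ArrayList;
import java.util.List;

class Presized {

    static List<Integer> squares(int count) {
        ArrayList<Integer> result = new ArrayList<>();
        // Reserve room once instead of letting the backing array grow repeatedly.
        result.ensureCapacity(count);
        for (int i = 0; i < count; i++) {
            result.add(i * i);
        }
        return result;
    }
}
```

Passing the expected size to the constructor, new ArrayList<>(count), has the same effect when the size is known at creation time. Whether either makes a measurable difference is something to profile rather than assume.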
Is there any practical difference between a Set and Collection in Java, besides the fact that a Collection can include the same element twice? They have the same methods.
(For example, does Set give me more options to use libraries which accept Sets but not Collections?)
edit: I can think of at least 5 different situations to judge this question. Can anyone else come up with more? I want to make sure I understand the subtleties here.
1. designing a method which accepts an argument of Set or Collection. Collection is more general and accepts more possibilities of input. (if I'm designing a specific class or interface, I'm being nicer to my consumers and stricter on my subclassers/implementers if I use Collection.)
2. designing a method which returns a Set or Collection. Set offers more guarantees than Collection (even if it's just the guarantee not to include one element twice). (if I'm designing a specific class or interface, I'm being nicer to my consumers and stricter on my subclassers/implementers if I use Set.)
3. designing a class that implements the interface Set or Collection. Similar issues as #2. Users of my class/interface get more guarantees, subclassers/implementers have more responsibility.
4. designing an interface that extends the interface Set or Collection. Very similar to #3.
5. writing code that uses a Set or Collection. Here I might as well use Set; the only reasons for me to use Collection are if I get back a Collection from someone else's code, or if I have to handle a collection that contains duplicates.
Collection is also the supertype of List, Queue, Deque, and others, so it gives you more options. For example, I try to use Collection as a parameter to library methods that shouldn't explicitly depend on a certain type of collection.
Generally, you should use the right tool for the job. If you don't want duplicates, use Set (or SortedSet if you want ordering, or LinkedHashSet if you want to maintain insertion order). If you want to allow duplicates, use List, and so on.
I think you already have it figured out: use a Set when you want to specifically exclude duplicates. Collection is generally the lowest common denominator, and it's useful to specify APIs that accept/return this, which leaves you room to change details later on if needed. However, if the details of your application require unique entries, use Set to enforce this.
Also worth considering is whether order is important to you; if it is, use List, or LinkedHashSet if you care about order and uniqueness.
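The difference between the ordered Set flavours shows up as soon as you print them. A minimal sketch (class and method names invented) that de-duplicates the same input two ways:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

class SetChoices {

    // LinkedHashSet: unique elements, first-seen order preserved.
    static List<String> distinctInInsertionOrder(List<String> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    // TreeSet (a SortedSet): unique elements, natural sort order.
    static List<String> distinctSorted(List<String> input) {
        return new ArrayList<>(new TreeSet<>(input));
    }
}
```

For the input b, a, b, c, a the first gives b, a, c and the second gives a, b, c. A plain HashSet would also drop the duplicates but promises no particular order.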
See Java's Collection tutorial for a good walk-through of Collection usage. In particular, check out the class hierarchy.
As @mmyers states, Collection includes Set, as well as List.
When you declare something as a Set, rather than a Collection, you are saying that the variable cannot be a List or any other non-Set collection. It will always be a Collection, though. So, any function that accepts a Collection will accept a Set, but a function that accepts a Set cannot take an arbitrary Collection (unless the object really is a Set and you cast it, or you copy it into a new Set).
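In code, that one-way compatibility looks roughly like this (names are made up). Copying into a new HashSet is used instead of a cast, because a cast only succeeds when the object really is a Set at runtime.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SetVersusCollection {

    static int countAll(Collection<String> items) {
        return items.size();
    }

    static int countDistinct(Set<String> items) {
        return items.size();
    }

    static int[] demo() {
        List<String> words = Arrays.asList("a", "b", "a");
        Set<String> unique = new HashSet<>(words);

        int all = countAll(words);     // a List is a Collection
        int viaSet = countAll(unique); // a Set is a Collection too
        // countDistinct(words);       // would not compile: a List is not a Set
        int distinct = countDistinct(new HashSet<>(words)); // copy instead of cast

        return new int[] {all, viaSet, distinct};
    }
}
```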
One other thing to consider... Sets have extra overhead in time, memory, and coding in order to guarantee that there are no duplicates. (Time and memory because sets are usually backed by a HashMap or a Tree, which adds overhead over a list or an array. Coding because you have to implement the hashCode() and equals() methods.)
I usually use sets when I need a fast implementation of contains() and use Collection or List otherwise, even if the collection shouldn't have duplicates.
You should use a Set when that is what you want.
For example, when you want something like a List but without any order or duplicates. Methods like contains are quite useful.
A collection is much more generic. I believe that what mmyers wrote on their usage says it all.
The practical difference is that Set enforces the set logic, i.e. no duplicates and no guaranteed order, while Collection does not. So if you need a Collection and you have no particular requirement for avoiding duplicates then use a Collection. If you have the requirement for Set then use Set. Generally use the most general interface possible.
As Collection is a supertype of Set and SortedSet, these can be passed to a method which expects a Collection. Collection just means it may or may not be sorted, be ordered, or allow duplicates.