Well, it seems to me ArrayLists make it easier to expand the code later on both because they can grow and because they make using Generics easier. However, for multidimensional arrays, I find the readability of the code is better with standard arrays.
Anyway, are there some guidelines on when to use one or the other? For example, I'm about to return a table from a function (int[][]), but I was wondering if it wouldn't be better to return a List<List<Integer>> or a List<int[]>.
Unless you have a strong reason otherwise, I'd recommend using Lists over arrays.
There are some specific cases where you will want to use an array (e.g. when you are implementing your own data structures, or when you are addressing a very specific performance requirement that you have profiled and identified as a bottleneck) but for general purposes Lists are more convenient and will offer you more flexibility in how you use them.
Where you are able to, I'd also recommend programming to the abstraction (List) rather than the concrete type (ArrayList). Again, this offers you flexibility if you decide to chenge the implementation details in the future.
To address your readability point: if you have a complex structure (e.g. ArrayList of HashMaps of ArrayLists) then consider either encapsulating this complexity within a class and/or creating some very clearly named functions to manipulate the structure.
Choose a data structure implementation and interface based on primary usage:
Random Access: use List for variable type and ArrayList under the hood
Appending: use Collection for variable type and LinkedList under the hood
Loop and process: use Iterable and see the above for use under the hood based on producer code
Use the most abstract interface possible when handing around data. That said don't use Collection when you need random access. List has get(int) which is very useful when random access is needed.
Typed collections like List<String> make up for the syntactical convenience of arrays.
Don't use Arrays unless you have a qualified performance expert analyze and recommend them. Even then you should get a second opinion. Arrays are generally a premature optimization and should be avoided.
Generally speaking you are far better off using an interface rather than a concrete type. The concrete type makes it hard to rework the internals of the function in question. For example if you return int[][] you have to do all of the computation upfront. If you return List> you can lazily do computation during iteration (or even concurrently in the background) if it is beneficial.
The List is more powerful:
You can resize the list after it has been created.
You can create a read-only view onto the data.
It can be easily combined with other collections, like Set or Map.
The array works on a lower level:
Its content can always be changed.
Its length can never be changed.
It uses less memory.
You can have arrays of primitive data types.
I wanted to point out that Lists can hold the wrappers for the primitive data types that would otherwise need to be stored in an array. (ie a class Double that has only one field: a double) The newer versions of Java convert to and from these wrappers implicitly, at least most of the time, so the ability to put primitives in your Lists should not be a consideration for the vast majority of use cases.
For completeness: the only time that I have seen Java fail to implicitly convert from a primitive wrapper was when those wrappers were composed in a higher order structure: It could not convert a Double[] into a double[].
It mostly comes down to flexibility/ease of use versus efficiency. If you don't know how many elements will be needed in advance, or if you need to insert in the middle, ArrayLists are a better choice. They use Arrays under the hood, I believe, so you'll want to consider using the ensureCapacity method for performance. Arrays are preferred if you have a fixed size in advance and won't need inserts, etc.
Related
I am trying to write a class using so that I can implement a circular buffer of any type of data. There are two options that I see right now. First, I create a class using generics where the generic type would be used to determine the type of array that I want to initialise for the circular buffer. Here, I would have to use the array of the class. Second, I create an abstract base class defining the common components and extend this class for the implementation of different types of buffers that I want to use. For instance, if I want to create an instance of circular buffer of type int, I can create it in the implementation of my child class. Here, I can use the primitive datatypes to create the arrays. My application is real-time and would access these buffers in real-time. So, I would want to know which is an efficient solution.
The second approach will perform better because unboxing and (in particular) autoboxing have a performance hit.
The collections will also have a considerably smaller memory footprint too, so that's another benefit.
These reasons are why collection frameworks like GS Collections (now Eclipse Collections I believe) and Guava include primitive implementations of their collections.
The downside is that you will have more code to write up-front, and you won't have a single class that can be used in all circumstances.
I am doing a project in Scala, but am fairly new to the language and have a Java background. I see that Scala doesn't have ArrayList, so I am wondering what Scala's equivalent of Java's ArrayList is called, and if there are any important differences between the Java and Scala versions.
EDIT: I'm not looking for a specific behavior so much as an internal representation (data stored in an array, but the whole array isn't visible, only the part you use).
I can think of 3 more specific questions to address yours:
What is Scala's default collection?
What Scala collection has characteristics similar to ArrayList?
What's a good replacement for Array in Scala?
So here are the answers for these:
What is Scala's default collection?
Scala's equivalent of Java's List interface is the Seq. A more general interface exists as well, which is the GenSeq -- the main difference being that a GenSeq may have operations processed serially or in parallel, depending on the implementation.
Because Scala allows programmers to use Seq as a factory, they don't often bother with defining a particular implementation unless they care about it. When they do, they'll usually pick either Scala's List or Vector. They are both immutable, and Vector has good indexed access performance. On the other hand, List does very well the operations it does well.
What Scala collection has characteristics similar to ArrayList?
That would be scala.collection.mutable.ArrayBuffer.
What's a good replacement for Array in Scala?
Well, the good news is, you can just use Array in Scala! In Java, Array is often avoided because of its general incompatibility with generics. It is a co-variant collection, whereas generics is invariant, it is mutable -- which makes its co-variance a danger, it accepts primitives where generics don't, and it has a pretty limited set of methods.
In Scala, Array -- which is still the same Array as in Java -- is invariant, which makes most problems go away. Scala accepts AnyVal (the equivalent of primitives) as types for its "generics", even though it will do auto-boxing. And through the "enrich my library" pattern, ALL of Seq methods are available to Array.
So, if you want a more powerful Array, just use an Array.
What about a collection that shrinks and grows?
The default methods available to all collections all produce new collections. For example, if I do this:
val ys = xs filter (x => x % 2 == 0)
Then ys will be a new collection, while xs will still be the same as before this command. This is true no matter what xs was: Array, List, etc.
Naturally, this has a cost -- after all, you are producing a new collection. Scala's immutable collections are much better at handling this cost because they are persistent, but it depends on what operation is executed.
No collection can do much about filter, but a List has excellent performance on generating a new collection by prepending an element or removing the head -- the basic operations of a stack, as a matter of fact. Vector has good performance on a bunch of operations, but it only pays if the collection isn't small. For collections of, say, up to a hundred elements, the overall cost might exceed the gains.
So you can actually add or remove elements to an Array, and Scala will produce a new Array for you, but you'll pay the cost of a full copy when you do that.
Scala mutable collections add a few other methods. In particular, the collections that can increase or decrease size -- without producing a new collection -- implement the Growable and Shrinkable traits. They don't guarantee good performance on these operations, though, but they'll point you to the collections you want to check out.
It's ArrayBuffer from scala.collection.mutable. You can find the scaladocs here.
It's hard to say exactly what you should do because you haven't said what behavior of ArrayList you're interested in using. It's more useful to think in terms of which scala traits you want to take advantage of. Here's a good explanation: http://grahamhackingscala.blogspot.com/2010/02/how-to-convert-java-list-to-scala-list.html.
That said, you probably want some sort of IndexedSeq.
We are learning about the Collection Interface and I was wondering if you all have any good advice for it's general use? What can you do with an Collection that you cannot do with an array? What can you do with an array that you cannot do with a Collection(besides allowing duplicates)?
The easy way to think of it is: Collections beat object arrays in basically every single way. Consider:
A collection can be mutable or immutable. A nonempty array must always be mutable.
A collection can allow or disallow null elements. An array must always permit null elements.
A collection can be thread-safe; even concurrent. An array is never safe to publish to multiple threads.
A list or set's equals, hashCode and toString methods do what users expect; on an array they are a common source of bugs.
A collection is type-safe; an array is not. Because arrays "fake" covariance, ArrayStoreException can result at runtime.
A collection can hold a non-reifiable type (e.g. List<Class<? extends E>> or List<Optional<T>>). An array will generate a warning for this.
A collection can have views (unmodifiable, subList...). No such luck for an array.
A collection has a full-fledged API; an array has only set-at-index, get-at-index, length and clone.
Type-use annotations like #Nullable are very confusing with arrays. I promise you can't guess what #A String #B [] #C [] means.
Because of all the reasons above, third-party utility libraries should not bother adding much additional support for arrays, focusing only on collections, so you also have a network effect.
Object arrays will never be first-class citizens in Java APIs.
A couple of the reasons above are covered -- but in much greater detail -- in Effective Java, Third Edition, Item 28, from page 126.
So, why would you ever use object arrays?
You're very tightly optimizing something
You have to interact with an API that uses them and you can't fix it
so convert to/from a List as close to that API as you can
Because varargs (but varargs is overused)
so ... same as previous
Obviously some collection implementations must be using them
I can't think of any other reasons, they suck bad
It's basically a question of the desired level of abstraction.
Most collections can be implemented in terms of arrays, but they provide many more methods on top of it for your convenience. Most collection implementations I know of for instance, can grow and shrink according to demand, or perform other "high-level" operations which basic arrays can't.
Suppose for instance that you're loading strings from a file. You don't know how many new-line characters the file contains, thus you don't know what size to use when allocating the array. Therefore an ArrayList is a better choice.
The details are in the sub interfaces of Collection, like Set, List, and Map. Each of those types has semantics. A Set typically cannot contain duplicates, and has no notion of order (although some implementations do), following the mathematical concept of a Set. A List is closest to an Array. A Map has specific behavior for push and get. You push an object by its key, and you retrieve with the same key.
There are even more details in the implementations of each collection type. For example, any of the hash based collections (e.g. HashSet, HasMap) are based on the hashcode() method that exists on any Java object.
You could simulate the semantics of any collection type based of an array, but you would have to write a lot of code to do it. For example, to back a Map with an array, you would need to write a method that puts any object entered into your Map into a specific bucket in the array. You would need to handle duplicates. For an array simulating a Set, you would need to write code to not allow duplicates.
The Collection interface is just a base interface for specialised collections -- I am not aware yet of a class that simply just implements Collection; instead classes implement specialized interfaces which extend Collection. These specialized interfaces and abstract classes provide functionality for working with sets (unique objects), growing arrays (e.g. ArrayList), key-value maps etc -- all of which you cannot do out of the box with an array.
However, iterating through an array and setting/reading items from an array remains one of the fastest methods of dealing with data in Java.
One advantage is the Iterator interface. That is all Collections implement an Iterator. An Iterator is an object that knows how to iterate over the given collection and present the programmer with a uniformed interface regardless of the underlying implementation. That is, a linked list is traversed differently from a binary tree, but the iterator hides these differences from the programmer making it easier for the programmer to use one or the other collection.
This also leads to the ability to use various implementations of Collections interchangeably if the client code targets the Collection interface iteself.
Quick question here: why not ALWAYS use ArrayLists in Java? They apparently have equal access speed as arrays, in addition to extra useful functionality. I understand the limitation in that it cannot hold primitives, but this is easily mitigated by use of wrappers.
Plenty of projects do just use ArrayList or HashMap or whatever to handle all their collection needs. However, let me put one caveat on that. Whenever you are creating classes and using them throughout your code, if possible refer to the interfaces they implement rather than the concrete classes you are using to implement them.
For example, rather than this:
ArrayList insuranceClaims = new ArrayList();
do this:
List insuranceClaims = new ArrayList();
or even:
Collection insuranceClaims = new ArrayList();
If the rest of your code only knows it by the interface it implements (List or Collection) then swapping it out for another implementation becomes much easier down the road if you find you need a different one. I saw this happen just a month ago when I needed to swap out a regular HashMap for an implementation that would return the items to me in the same order I put them in when it came time to iterate over all of them. Fortunately just such a thing was available in the Jakarta Commons Collections and I just swapped out A for B with only a one line code change because both implemented Map.
If you need a collection of primitives, then an array may well be the best tool for the job. Boxing is a comparatively expensive operation. For a collection (not including maps) of primitives that will be used as primitives, I almost always use an array to avoid repeated boxing and unboxing.
I rarely worry about the performance difference between an array and an ArrayList, however. If a List will provide better, cleaner, more maintainable code, then I will always use a List (or Collection or Set, etc, as appropriate, but your question was about ArrayList) unless there is some compelling reason not to. Performance is rarely that compelling reason.
Using Collections almost always results in better code, in part because arrays don't play nice with generics, as Johannes Weiß already pointed out in a comment, but also because of so many other reasons:
Collections have a very rich API and a large variety of implementations that can (in most cases) be trivially swapped in and out for each other
A Collection can be trivially converted to an array, if occasional use of an array version is useful
Many Collections grow more gracefully than an array grows, which can be a performance concern
Collections work very well with generics, arrays fairly badly
As TofuBeer pointed out, array covariance is strange and can act in unexected ways that no object will act in. Collections handle covariance in expected ways.
arrays need to be manually sized to their task, and if an array is not full you need to keep track of that yourself. If an array needs to be resized, you have to do that yourself.
All of this together, I rarely use arrays and only a little more often use an ArrayList. However, I do use Lists very often (or just Collection or Set). My most frequent use of arrays is when the item being stored is a primitive and will be inserted and accessed and used as a primitive. If boxing and unboxing every become so fast that it becomes a trivial consideration, I may revisit this decision, but it is more convenient to work with something, to store it, in the form in which it is always referenced. (That is, 'int' instead of 'Integer'.)
This is a case of premature unoptimization :-). You should never do something because you think it will be better/faster/make you happier.
ArrayList has extra overhead, if you have no need of the extra features of ArrayList then it is wasteful to use an ArrayList.
Also for some of the things you can do with a List there is the Arrays class, which means that the ArrayList provided more functionality than Arrays is less true. Now using those might be slower than using an ArrayList, but it would have to be profiled to be sure.
You should never try to make something faster without being sure that it is slow to begin with... which would imply that you should go ahead and use ArrayList until you find out that they are a problem and slow the program down. However there should be common sense involved too - ArrayList has overhead, the overhead will be small but cumulative. It will not be easy to spot in a profiler, as all it is is a little overhead here, and a little overhead there. So common sense would say, unless you need the features of ArrayList you should not make use of it, unless you want to die by a thousands cuts (performance wise).
For internal code, if you find that you do need to change from arrays to ArrayList the chance is pretty straight forward in most cases ([i] becomes get(i), that will be 99% of the changes).
If you are using the for-each look (for( value : items) { }) then there is no code to change for that as well.
Also, going with what you said:
1) equal access speed, depending on your environment. For instance the Android VM doesn't inline methods (it is just a straight interpreter as far as I know) so the access on that will be much slower. There are other operations on an ArrayList that can cause slowdowns, depends on what you are doing, regardless of the VM (which could be faster with a stright array, again you would have to profile or examine the source to be sure).
2) Wrappers increase the amount of memory being used.
You should not worry about speed/memory before you profile something, on the other hand you shouldn't choose what you know to be a slower option unless you have a good reason to.
Performance should not be your primary concern.
Use List interface where possible, choose concrete implementation based on actual requirements (ArrayList for random access, LinkedList for structural modifications, ...).
You should be concerned about performance.
Use arrays, System.arraycopy, java.util.Arrays and other low-level stuff to squeeze out every last drop of performance.
Well don't always blindly use something that is not right for the job. Always start off using Lists, choose ArrayList as your implementation. This is a more OO approach. If you don't know that you specifically need an array, you'll find that not tying yourself to a particular implementation of List will be much better for you in the long run. Get it working first, optimize later.
While creating classes in Java I often find myself creating instance-level collections that I know ahead of time will be very small - less than 10 items in the collection. But I don't know the number of items ahead of time so I typically opt for a dynamic collection (ArrayList, Vector, etc).
class Foo
{
ArrayList<Bar> bars = new ArrayList<Bar>(10);
}
A part of me keeps nagging at me that it's wasteful to use complex dynamic collections for something this small in size. Is there a better way of implementing something like this? Or is this the norm?
Note, I'm not hit with any (noticeable) performance penalties or anything like that. This is just me wondering if there isn't a better way to do things.
The ArrayList class in Java has only two data members, a reference to an Object[] array and a size—which you need anyway if you don't use an ArrayList. So the only advantage to not using an ArrayList is saving one object allocation, which is unlikely ever to be a big deal.
If you're creating and disposing of many, many instances of your container class (and by extension your ArrayList instance) every second, you might have a slight problem with garbage collection churn—but that's something to worry about if it ever occurs. Garbage collection is typically the least of your worries.
For the sake of keeping things simple, I think this is pretty much a non-issue. Your implementation is flexible enough that if the requirements change in the future, you aren't forced into a refactoring. Also, adding more logic to your code for a hybrid solution just isn't worth it taking into account your small data set and the high-quality of Java's Collection API.
Google Collections has collections optimized for immutable/small number of elements. See Lists.asList API as an example.
The overhead is very small. It is possible to write a hybrid array list that has fields for the first few items, and then falls back to using an array for longer list.
You can avoid the overhead of the list object entirely by using an array. To go even further hardcore, you can declare the field as Object, and avoid the array altogether for a single item.
If memory really is a problem, you might want to forget about using object instances at the low-level. Instead use a larger data structure at a larger level of granularity.