Java List toArray(T[] a) implementation


I was just looking at the method defined in the List interface:
Returns an array containing all of the elements in this list in the correct order; the runtime type of the returned array is that of the specified array. If the list fits in the specified array, it is returned therein. Otherwise, a new array is allocated with the runtime type of the specified array and the size of this list.
If the list fits in the specified array with room to spare (i.e., the array has more elements than the list), the element in the array immediately following the end of the collection is set to null. This is useful in determining the length of the list only if the caller knows that the list does not contain any null elements.
<T> T[] toArray(T[] a);
And I was wondering why it is implemented this way. Basically, if you pass it an array whose length is less than list.size(), it simply creates a new one and returns it, so creating the new array object for the method parameter is wasted.
Additionally, if you pass it an array that is long enough (sized from the list), it returns that same object filled with the elements. There is really no point in returning it since it is the same object, but OK, for clarity.
The problem is that I think this promotes slightly inefficient code. In my opinion, toArray should simply receive the class and return a new array with the contents.
Is there any reason why it is not coded that way?

public <T> T[] toArray(T[] a) {
    if (a.length < size)
        // Make a new array of a's runtime type, but my contents:
        return (T[]) Arrays.copyOf(elementData, size, a.getClass());
    System.arraycopy(elementData, 0, a, 0, size);
    if (a.length > size)
        a[size] = null;
    return a;
}
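The three branches of this implementation can be observed from the outside. The following is a minimal sketch, not part of the JDK source: the class name ToArrayCases and the helper copyInto are made up for illustration, and only List.toArray(T[]) is the real API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; only List.toArray(T[]) comes from the JDK.
final class ToArrayCases {

    // Thin wrapper so each case below reads the same way.
    static String[] copyInto(List<String> list, String[] target) {
        return list.toArray(target);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));

        // Case 1: target too small, so a new String[] of length 3 is allocated.
        String[] tooSmall = new String[0];
        System.out.println(copyInto(list, tooSmall) == tooSmall); // false

        // Case 2: exact fit, so the very same array comes back.
        String[] exact = new String[3];
        System.out.println(copyInto(list, exact) == exact); // true

        // Case 3: oversized, so only the slot right after the last element is nulled.
        String[] big = {"x", "x", "x", "x", "x"};
        copyInto(list, big);
        System.out.println(Arrays.toString(big)); // [a, b, c, null, x]
    }
}
```

Note that in the oversized case the stale "x" at the end survives, which is why the null marker is only useful when the list itself holds no nulls.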
Maybe so it has a runtime type?
From wiki:
Consequently, instantiating a Java class of a parameterized type is impossible because instantiation requires a call to a constructor, which is unavailable if the type is unknown.

As mentioned by others, there are a couple different reasons:
You need to pass in the type somehow, and passing in an array of the specified type isn't an unreasonable way to do it. Admittedly, it might be nice if there was a version that you pass in the Class of the type you want too, for speed.
If you want to reuse your array, you can keep passing in the same one, rather than needing to create a new one each time. This can save time and memory, and GC issues if you need to do it many, many times
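The reuse argument can be sketched roughly as follows. The names ReusedBuffer and fill are hypothetical, and the sketch assumes the lists never contain null, so the null written after the last element can serve as an end marker.

```java
import java.util.List;

// Hypothetical helper showing one buffer shared across many toArray calls.
final class ReusedBuffer {

    // Copies the batch into the caller's buffer and returns how many slots are in use.
    // Assumes no element of the batch is null.
    static int fill(List<String> batch, String[] buffer) {
        String[] result = batch.toArray(buffer);
        if (result != buffer) {
            // toArray allocated a fresh array because the buffer was too small.
            throw new IllegalArgumentException("buffer too small");
        }
        int used = 0;
        while (used < buffer.length && buffer[used] != null) {
            used++; // stop at the null marker written by toArray
        }
        return used;
    }
}
```

Slots after the null marker may still hold elements from an earlier, longer batch, so a caller should only read the first `used` entries.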

Most likely this is to allow you to reuse arrays, so you avoid (relatively costly) array allocation in some use cases. Another, much smaller, benefit is that the caller can instantiate the array slightly more efficiently, since toArray() must use the java.lang.reflect.Array.newInstance method.

This method is a holdover from pre-1.5 Java.
Back then it was the only way to convert a list to a reifiable array.
It is an obscure fact, but although you can store anything in an Object[] array, you cannot cast that array to a more specific type, e.g.
Object[] generic_array = { "string" };
String[] strings_array = (String[]) generic_array; // ClassCastException at runtime
The seemingly more efficient List.toArray() does just that: it creates a plain Object array.
Before Java generics, the only way to do a type-safe transfer was this kludge:
String[] stronglyTypedArrayFromList ( List strings )
{
return (String[]) strings.toArray( new String[0] );
// or another variant
// return (String[]) strings.toArray( new String[ strings.size( ) ] );
}
Thankfully generics made this kind of machination obsolete. The method was left there to provide backward compatibility with pre-1.5 code.

My guess is that if you already know the concrete type of T at the point you're calling toArray(T[]), it's more performant to just create an array of that type yourself than to make the List implementation call Array.newInstance() for you. Plus, in many cases you can re-use the array.
But if it annoys you, it's easy enough to write a utility method:
// Requires: import java.lang.reflect.Array; import java.util.Collection;
@SuppressWarnings("unchecked") // safe: the array is created with component type E
public static <E> E[] toArray(Collection<E> c, Class<E> componentType) {
    E[] array = (E[]) Array.newInstance(componentType, c.size());
    return c.toArray(array);
}
(Note that there's no way to write <E> E[] toArray(Collection<E> c), because there's no way to create an array of E at runtime without a Class object, and no way to get a Class object for E at runtime, because the generics have been erased.)

Related

Why does ArrayList.toArray take an array parameter instead of an array class parameter?

Here I have an ArrayList, List<String> list, and I want to convert it to a String[] array.
So I checked the ArrayList API and found the method toArray(T[] a), the only one with a generic type.
I wonder why ArrayList doesn't have a method like the one below:
public <T> T[] toArray(Class<T[]> type) {
    return Arrays.copyOf(elementData, size, type);
}
Then I could just call list.toArray(String[].class) without creating a new array. I tried it and it works, and I think it's simpler.
Of course, maybe I didn't get the author's ideas about ArrayList.toArray.
So I would like to ask everyone to point out my mistake or tell me about the advantages of the original method toArray(T[] a). Thank you! :)
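The proposed signature cannot be added to ArrayList from the outside, but the same idea can be approximated with a static helper. This is only a sketch: ClassTokenToArray and its toArray method are hypothetical names, while Collection.toArray() and the three-argument Arrays.copyOf are existing JDK methods.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical static helper mirroring the method proposed in the question.
final class ClassTokenToArray {

    static <T> T[] toArray(Collection<?> source, Class<T[]> arrayType) {
        // First copy: an untyped snapshot of the elements.
        Object[] raw = source.toArray();
        // Second copy: a new array whose runtime class is arrayType.
        // Throws ArrayStoreException if an element does not fit the component type.
        return Arrays.copyOf(raw, raw.length, arrayType);
    }
}
```

A call then looks like `String[] names = ClassTokenToArray.toArray(list, String[].class);`. Unlike a method inside ArrayList, this version copies the elements twice, which is part of the trade-off the answers discuss.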

Because the method pre-dates generics.
Back in Java 1.2 when the method was created, it was declared as:
public Object[] toArray(Object[] a)
The caller could then create the correct type of array, e.g. String[], and cast the return value. If the array was adequately sized, no reflection would be needed to re-size the array, so it was better for performance.
The performance landscape of Java has changed immensely since then, so some of the performance considerations that went into the design might seem unnecessary these days.
The designers also added that weird C-like quirk of nulling the element immediately after the last element from the list if the array is oversized. I doubt that would happen if the method were designed today.
When generics were added in Java 5, some classes were changed to be generic, such as all the Collection classes, and all the existing methods were changed as much as possible to accommodate that. Some methods could not be changed, for backwards compatibility, e.g. Map.get(Object key) wasn't changed to Map.get(K key) since it was previously valid to call get() with an object that couldn't be in the map.
The toArray method was changed, since it remained fully backwards compatible even with the change:
public <T> T[] toArray(T[] a)
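To illustrate the compatibility point, here is a small sketch with a pre-generics style call next to a generic one. The class and method names are invented for the example. With a raw List the generic method is erased as well, so the call still returns Object[] and needs the explicit cast, just as it did in Java 1.2.

```java
import java.util.List;

// Hypothetical example class contrasting the two call styles.
final class LegacyToArray {

    // Pre-generics style: raw List, explicit cast on the result.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String[] namesLegacy(List names) {
        return (String[]) names.toArray(new String[names.size()]);
    }

    // Generic style: T is inferred as String, no cast needed.
    static String[] namesGeneric(List<String> names) {
        return names.toArray(new String[0]);
    }
}
```

Both methods compile against the same generic signature and return a String[] with the same contents.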

If you are referring to java.util.ArrayList, it does in fact have such a function.
Internally, the function does the same thing you are doing. What is wrong with this?
public <T> T[] toArray(T[] a) {
if (a.length < size)
// Make a new array of a's runtime type, but my contents:
return (T[]) Arrays.copyOf(elementData, size, a.getClass());
System.arraycopy(elementData, 0, a, 0, size);
if (a.length > size)
a[size] = null;
return a;
}

Converting a generic List to an Array. Why do I need use clone?

I faced a problem yesterday while writing my homework. I finished the homework, but I still don't really understand why my code works. I had to write a sort function that takes a varargs list of any comparable generic objects as an argument and returns the argument. The problem was that I had to return an array of sorted objects. So I had to learn more about varargs, lists and arrays.
The function was defined like this.
public <T extends Comparable<T>> T[] stableSort(T ... items)
and inside the function I made a list, which I would sort and do all the work on.
List<T> list = new ArrayList<T>(Arrays.asList(items));
and at the end of the function I was returning list.toArray(...) so that it matched the output type T[].
list.toArray(items.clone());
My question is since I already made the list from the varargs, why do I have to do items.clone() inside the toArray function. That seemed like doing two same things to me. I thought arrays.asList() would clone the values of array to list and I don't get why am I doing it again at the end of the code in toArray(). I know that this was the correct way to write it, because I finished the homework yesterday and found out this way from forums of the class, but I still don't understand why.
EDIT
The task required me to create a new array with sorted files and return it instead. Due to type erasure, it is not possible to instantiate an array of a generic type without a reference to a class that fits the generic. However, the varargs array has type T[], so I should have cloned an array of a type that fits the generic constraints, which I didn't know how to do in time. So I decided to use a list to make things easier before the deadline.

My question is since I already made the list from the varargs, why do I have to do items.clone()
You are right. Unfortunately, the compiler will be unable to determine the type of the array if you simply use the no-argument toArray() method. You should get a compilation error saying Cannot convert from Object[] to T[]. The call to items.clone() is required to assist the compiler in type inference. An alternate approach would be to say return (T[]) list.toArray(), although that unchecked cast only silences the compiler: the array's runtime type is still Object[], so a ClassCastException follows at runtime.
That said, I would not recommend either of the approaches. It doesn't really make sense to convert an array to a list and convert it back to an array in the first place. I don't see any significant take-aways that you would even understand from this code.

It seems to me there are a few questions here, that may have come together to create some confusion as to why what needs to be done.
I thought arrays.asList() would clone the values of array to list and I don't get why am I doing it again at the end of the code in toArray().
This is probably just the way it is typed, but it should be made clear that you don't clone the objects in the array, but only make a new List with the references to the objects in the array. The objects themselves will be the same ones in the array as in the List. I believe that is probably what you meant, but terminology can be tricky here.
I thought arrays.asList() would clone the values of array to list...
Not really. Using Arrays.asList(T[] items) will provide a view onto the array items that implements the java.util.List interface. This is a fixed-size list. You can't add to it. Changes to it, such as replacing an element or sorting in-place, will pass through to the underlying array. So if you do this
List<T> l = Arrays.asList(items);
l.set(0, null);
... you've just set the element at index 0 of the actual array items to null.
The part of your code where you do this
List<T> list = new ArrayList<T>(Arrays.asList(items));
could be written as this:
List<T> temp = Arrays.asList(items);
List<T> list = new ArrayList<T>(temp);
The first line is the "view", the second line will effectively create a new java.util.ArrayList and fill it with the values of the view in the order they are returned in by their iterator (which is just the order in the array). So any changes to list that you make now don't change array items, but keep in mind that it's still just a list of references. items and list are referencing the same objects, just with their own order.
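The difference between the view and the copy can be checked with a short sketch. The class ViewVersusCopy and its two methods are hypothetical; Arrays.asList, the ArrayList copy constructor and Collections.sort are the real JDK calls being compared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo of sorting through a view versus sorting a copy.
final class ViewVersusCopy {

    // Sorts through the fixed-size view, so the caller's array is reordered.
    static void sortThroughView(String[] items) {
        List<String> view = Arrays.asList(items);
        Collections.sort(view);
    }

    // Sorts an independent ArrayList, so the caller's array keeps its order.
    static List<String> sortedCopy(String[] items) {
        List<String> copy = new ArrayList<>(Arrays.asList(items));
        Collections.sort(copy);
        return copy;
    }
}
```

In both cases the element objects themselves are shared; only the container that holds the references differs.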
My question is since I already made the list from the varargs, why do I have to do items.clone() inside the toArray function.
There could be two reasons here. The first is as CKing said in his/her answer. Because of type erasure and the way arrays are implemented in Java (there are separate array types depending on whether it's an array of primitives or references) the JVM would not know what type of array to create if you just called toArray() on the list, which is why that method has a return type of Object[]. So in order to get an array of a specific type, you must provide an array to the method that can be used at run-time to determine the type from. This is a piece of the Java API where the fact that generics work via type-erasure, aren't retained at run-time and the particular way in which arrays work all come together to surprise the developer. A bit of abstraction is leaking there.
But there might be a second reason. If you go check the toArray(T[] a) method in the Java API, you'll notice this part:
If the list fits in the specified array, it is returned therein. Otherwise, a new array is allocated with the runtime type of the specified array and the size of this list.
Suppose some code by another dev is using your stableSort method like this:
T[] items;
// items is created and filled...
T[] sortedItems = stableSort(items);
If you didn't do the clone, what would happen in your code would be this:
List<T> list = new ArrayList<T>(Arrays.asList(items));
// List is now a new ArrayList with the same elements as items
// Do some things with list, such as sorting
T[] result = list.toArray(items);
// Seeing how the list would fit in items, since it has the same number of elements,
// result IS in fact items
So now the caller of your code gets sortedItems back, but that array is the same array as the one he passed in, namely items. You see, varargs are nothing more than syntactic sugar for a method with an array argument, and are implemented as such. Perhaps the caller didn't expect the array he passed in as an argument to be changed, and might still need the array with the original order. Doing a clone first will avoid that and makes the effect of the method less surprising. Good documentation on your methods is crucial in situations like this.
It's possible that the code testing your assignment's implementation wants a different array back, and it's an actual requirement that your method adheres to that contract.
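The aliasing described above can be reproduced with a small sketch. SortAliasing and its two methods are hypothetical stand-ins for the homework's stableSort, written with a plain array parameter (which is what a varargs parameter is underneath) and using Collections.sort in place of a hand-written algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for stableSort, with and without the clone.
final class SortAliasing {

    // Without clone: the list fits exactly, so toArray writes into the caller's array.
    static <T extends Comparable<T>> T[] sortInto(T[] items) {
        List<T> list = new ArrayList<>(Arrays.asList(items));
        Collections.sort(list);
        return list.toArray(items);
    }

    // With clone: the result is a separate array of the same runtime type.
    static <T extends Comparable<T>> T[] sortIntoClone(T[] items) {
        List<T> list = new ArrayList<>(Arrays.asList(items));
        Collections.sort(list);
        return list.toArray(items.clone());
    }
}
```

With `Integer[] in = {3, 1, 2}`, sortIntoClone(in) leaves `in` in its original order and returns a different array, while sortInto(in) returns `in` itself, now sorted.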
EDIT:
Actually, your code could be much simpler. You'll achieve the same with:
T[] copy = items.clone();
Arrays.sort(copy);
return copy;
But your assignment might have been to actually implement a sorting algorithm yourself, so this point may be moot.

You need to use this:
List<T> list = new ArrayList<T>(Arrays.asList(items));
when you want to do an inline declaration.
For example:
List<String> list = new ArrayList<String>(Arrays.asList("aaa", "bbb", "ccc"));

By the way, you didn't have to use return list.toArray(items.clone()); You could have used, for example, return list.toArray(Arrays.copyOf(items, 0));, where you are passing to list.toArray() an empty array that contains none of the arguments from items.
The whole point of passing an argument to the version of list.toArray() that takes an argument, is to provide an array object whose actual runtime class is the actual runtime class of the array object it wants to return. This could have been achieved with items.clone(), or with items itself (though that would cause list.toArray() to write the resulting elements into the original array pointed to by items which you may not want to happen), or with, as I showed above, an empty array that has the same runtime class.
By the way, the need to pass the argument to list.toArray() is not a generics type issue at all. Even if you had written this with pre-generics Java, you would have had to do the same thing. This is because the version of List::toArray() that took no arguments always returns an array object whose actual runtime class is Object[], as the List doesn't know at runtime what its component type is. To have it return an array object whose actual runtime class is something different, you had to give it an example array object of the right runtime class to follow. That's why pre-generics Java also had the version of List::toArray() that took one argument; even though in pre-generics, both methods were declared to return Object[], they are different as the actual runtime class returned is different.
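The point about the runtime class can be made visible with getClass(). The following sketch uses invented names (RuntimeArrayClass and its two methods) and an ArrayList as the list implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that reports the runtime class of the arrays toArray returns.
final class RuntimeArrayClass {

    // The no-argument version has no example to follow.
    static Class<?> classOfNoArg(List<?> list) {
        return list.toArray().getClass();
    }

    // An empty copy of the example array carries only its runtime class.
    static <T> Class<?> classOfWithExample(List<? extends T> list, T[] example) {
        T[] empty = Arrays.copyOf(example, 0);
        return list.toArray(empty).getClass();
    }
}
```

For an ArrayList<String>, the first method reports Object[] and the second, given any String[] as the example, reports String[]; the contents of the example array play no role.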

Why is autoboxing not supported in Collection.toArray() in Java?

Is there no shortcut to do this nicely?
Are those ugly two loops (one to read pmList and a second to add to markupArray) the only option (apart from ArrayUtils)?
ArrayList<Double> pmList = new ArrayList<Double>();
pmList.add(0.1); // adding through a loop in real time.
pmList.add(0.1);
pmList.add(0.1);
pmList.add(0.1);
double[] markupArray = new double[pmList.size()];
markupArray = pmList.toArray(markupArray); // This says: The method toArray(T[]) in the type ArrayList<Double> is not applicable for the arguments (double[])

Simply use a Double[] array instead of double[], then everything works fine. If you know the size of the list ahead of time, you can also skip the list and insert directly into the array. It might even be worth traversing the input twice: once to retrieve its size and once for insertion.
Auto boxing only works for primitive types, not for arrays of primitive types. A double[] array is no T[] array, since a type parameter T must always be an Object. While a double may be autoboxed to T (with T=Double), a double[] cannot be autoboxed to T[].
The reason why arrays are not autoboxed is probably that this operation would be very costly: Boxing an array means creating a new array and boxing each element. For large arrays, this has a huge performance hit. You don't want such a costly operation to be done implicitly, hence no autoboxing for arrays. In addition, boxing a complete array would yield a new array. Thus, when writing to the new array, the changes would not write through to the old array. So you see, there are some semantics problems with array-boxing, so it is not supported.
If you must return a double[] array, then you must either write your own function or use a third-party library like Guava (see msandiford's answer). The Java Collections framework has no methods for (un)boxing of arrays.
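The write-through problem mentioned above shows up as soon as an array is boxed explicitly. This is a minimal sketch with a made-up class name, BoxedCopy, using the Java 8 stream API for the element-by-element boxing.

```java
import java.util.Arrays;

// Hypothetical helper that boxes a double[] into a new Double[].
final class BoxedCopy {

    // Allocates a new array and boxes every element: O(n) time and memory.
    static Double[] box(double[] values) {
        return Arrays.stream(values).boxed().toArray(Double[]::new);
    }
}
```

After `Double[] boxed = BoxedCopy.box(prim); boxed[0] = 9.0;` the original `prim[0]` is unchanged, because the boxed array is an independent copy. An implicit conversion with these semantics would be easy to misread as a view.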

You could use TDoubleArrayList or Guava's primitive list collections.
You could also determine the size in advance in one loop and add the values in another.

Why not make your own shortcut?
static double[] doubleListToArray(List<Double> list) {
int k = 0;
double[] result = new double[list.size()];
for(double value : list)
result[k++] = value;
return result;
}

Google guava has Doubles#asList(...) and Doubles#toArray(...) which provide conversions from double[] to List<Double> and from Collection<? extends Number> to double[] respectively.

You are right that this is not very intuitive at first look. However, this limitation is related to the way the Java language implements generic types and auto-boxing:
First, generic types are erased at runtime. This implies that any ArrayList<Double> is represented by a single compiled Java class ArrayList, which is shared with other generic representations of ArrayList such as ArrayList<String>. As a consequence, the compiled method ArrayList::toArray does not (and must not) know what generic type an instance represents, as the single compiled method must be applicable to any generic type. As the elements could therefore be anything, like String or Double, you need to provide an array to the method. The method can then check the type of the target array at runtime and check that the elements filled into the array are assignable to the array's component type. All this logic can be implemented by a single compiled method.
Secondly, auto-boxing and -unboxing is something that only exists at compile time. This means that the statements
Integer i = 42;
int j = i;
are compiled as if you wrote
Integer i = Integer.valueOf(42);
int j = i.intValue();
It is the Java compiler that adds the boxing instructions for you. The Java runtime applies a slightly different type system where boxing is not considered. As a consequence, the single compiled method ArrayList::toArray mentioned in the first point cannot know that this boxing needs to be applied, since the method must be applicable to any type T, which might not always represent a Double.
In theory, you could alter the implementation of ArrayList::toArray to explicitly check whether an array's component type and a list's element type are applicable for unboxing, but this approach would result in several branches which would add quite a runtime overhead to the method. Rather, write a small utility method that specializes on the Double type and applies the implicit unboxing due to the specialization. An iteration over all list items suffices for this purpose; this is how ArrayList::toArray is implemented as well. If your array is small, consider using an array of boxed values, Double[], instead of double[]. If your array is large, lives long, or you are restricted to primitive types in order to comply with a third-party API, use the utility. Also, look out for implementations of primitive collections if you want to avoid the overall boxing. With Java 8, use a Stream in order to inline the array conversion.
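One possible form of the Java 8 stream conversion mentioned above is sketched here. The class name UnboxWithStream and the method toPrimitive are hypothetical; Stream.mapToDouble and DoubleStream.toArray are standard API.

```java
import java.util.List;

// Hypothetical utility that unboxes a List<Double> into a double[].
final class UnboxWithStream {

    // Each element is unboxed via Double::doubleValue; a null element
    // would therefore cause a NullPointerException.
    static double[] toPrimitive(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).toArray();
    }
}
```

For the list in the question this would be used as `double[] markupArray = UnboxWithStream.toPrimitive(pmList);`.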

Why does Java ArrayList use per-element casting instead of per-array casting?

What happens inside Java's ArrayList<T> (and probably many other classes) is that there is an internal Object[] array = new Object[n];, to which T Objects are written. Whenever an element is read from it, a cast return (T) array[i]; is done. So, a cast on every single read.
I wonder why this is done. To me, it seems like they're just doing unnecessary casts. Wouldn't it be more logical and also slightly faster to just create a T[] array = (T[]) new Object[n]; and then just return array[i]; without cast? This is only one cast per array creation, which is usually far less than the number of reads.
Why is their method to be preferred? I fail to see why my idea isn't strictly better?

It's more complicated than that: generics are erased in byte code, and the erasure of T[] is Object[]. Likewise, the return value of get() becomes Object. To retain integrity of the type system, a checked cast is inserted when the class is actually used, i.e.
Integer i = list.get(0);
will be erased to
Integer i = (Integer) list.get(0);
That being the case, any type check within ArrayList is redundant. But it's really beside the point, because both (T) and (T[]) are unchecked casts, and incur no runtime overhead.
One could write a checked ArrayList that does:
T[] array = (T[]) Array.newInstance(tClass, n);
This would prevent heap pollution, but at the price of a redundant type check (you cannot suppress the synthetic cast in calling code). It would also require the caller to provide the ArrayList with the class object of the element type, which clutters its API and makes it harder to use in generic code.
Edit: Why is generic array creation forbidden?
One problem is that arrays are checked, while generics are unchecked. That is:
Object[] array = new String[1];
array[0] = 1; // throws ArrayStoreException
ArrayList list = new ArrayList<String>();
list.add(1); // causes heap pollution
Therefore, the component type of an array matters. I assume this is why the designers of the Java language require us to be explicit about which component type to use.
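The contrast between checked arrays and unchecked generics can be wrapped in a runnable sketch. The class and method names below are invented for the example; the behaviour shown is that of the array store check and of the cast the compiler inserts at the read site.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: arrays fail at the write, generic lists fail at a later read.
final class CheckedVersusUnchecked {

    static boolean arrayRejectsWrongType() {
        Object[] array = new String[1];
        try {
            array[0] = 1; // store check against the real component type String
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean listFailsOnlyOnRead() {
        List<String> strings = new ArrayList<>();
        List raw = strings;
        raw.add(1); // no exception here: the heap is now polluted
        try {
            int ignored = strings.get(0).length(); // compiler-inserted cast to String fails
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

Both methods return true: the array reports the bad element immediately, the list only when the element is read back as a String.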

Whenever an element is read from it, a cast return (T) array[i]; is done. So, a cast on every single read.
Generics are a compile-time check. At runtime, the type that T extends is used instead. In this case T implicitly extends Object, so what you have at runtime is effectively:
return (Object) array[i];
or
return array[i];
Wouldn't it be more logical and also slightly faster to just create a
T[] array = (T[]) new Object[n]
Not really. Again at runtime this becomes
Object[] array = (Object[]) new Object[n];
or
Object[] array = new Object[n];
What you are really fishing for is
T[] array = new T[n];
except this doesn't compile, mostly because T isn't known at runtime.
What you can do is
private final Class<T> tClass; // must be passed in the constructor
T[] array = (T[]) Array.newInstance(tClass, n);
only then will the array actually be the type expected. This could make reads faster, but at the cost of writes. The main benefit would be a fail fast check, i.e. you would stop a collection being corrupted rather than wait until you detect it was corrupted to throw an Exception.
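A rough sketch of that fail-fast behaviour follows. TypedBuffer is a hypothetical fixed-size container, not how ArrayList works; it only illustrates that an array created through Array.newInstance rejects a wrong element at the write.

```java
import java.lang.reflect.Array;

// Hypothetical fixed-size container backed by a genuinely typed array.
final class TypedBuffer<T> {
    private final T[] elements;

    @SuppressWarnings("unchecked") // the array really has component type T
    TypedBuffer(Class<T> type, int capacity) {
        elements = (T[]) Array.newInstance(type, capacity);
    }

    void set(int index, T value) {
        elements[index] = value; // the JVM checks the store against the real component type
    }

    T get(int index) {
        return elements[index];
    }

    // Pushes an Integer into a TypedBuffer<String> through a raw reference.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rejectsWrongType() {
        TypedBuffer raw = new TypedBuffer<String>(String.class, 1);
        try {
            raw.set(0, 42);
            return false;
        } catch (ArrayStoreException e) {
            return true; // fails at the write, not at some later read
        }
    }
}
```

The price is the Class<T> argument every caller has to supply, which is the API clutter the other answer mentions.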

I think it is more a matter of code style than of performance or type safety (because the backing array is private).
The Java 5 ArrayList was implemented the way you suggested, with an E[] array. If you look at the source code, you see that it contains 7 (E[]) casts. From Java 6 on, ArrayList changed to use an Object[] array, which resulted in only 3 (E) casts.

An array is an object too.
Here, in T[] array = (T[]) new Object[n], you cast only the array object itself to (T[]), not the elements in the array.

Casting an array of Objects into an array of my intended class

Just for review, can someone quickly explain what prevents this from working (on compile):
private HashSet data;
...
public DataObject[] getDataObjects( )
{
return (DataObject[]) data.toArray();
}
...and what makes this the way that DOES work:
public DataObject[] getDataObjects( )
{
return (DataObject[]) data.toArray( new DataObject[ data.size() ] );
}
I'm not clear on the mechanism at work with casting (or whatever it is) that makes this so.

Because toArray() creates an array of Object, and you can't make Object[] into DataObject[] just by casting it. toArray(DataObject[]) creates an array of DataObject.
And yes, it is a shortcoming of the Collections class and the way Generics were shoehorned into Java. You'd expect that Collection<E>.toArray() could return an array of E, but it doesn't.
Interesting thing about the toArray(DataObject[]) call: you don't have to make the "a" array big enough, so you can call it with toArray(new DataObject[0]) if you like.
Calling it like toArray(new DataObject[0]) is actually better if you use .length later to get the array length. If the initial length was large and the same array object you passed was returned, then you may face NullPointerExceptions later.
I asked a question earlier about Java generics, and was pointed to this FAQ that was very helpful: http://www.angelikalanger.com/GenericsFAQ/JavaGenericsFAQ.html
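The difference between the two calls in the question can be reproduced in a few lines. This sketch substitutes String for the asker's DataObject class so that it is self-contained; the class ToArrayCast and its methods are hypothetical. Note that the plain cast compiles and only fails when it runs.

```java
import java.util.Set;

// Hypothetical demo using String in place of the question's DataObject.
final class ToArrayCast {

    // toArray() builds an Object[]; casting that array object to String[] fails at runtime.
    static boolean plainCastFails(Set<String> data) {
        try {
            String[] result = (String[]) data.toArray();
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // toArray(T[]) builds (or fills) an array of the example's runtime type.
    static String[] typed(Set<String> data) {
        return data.toArray(new String[0]);
    }
}
```

For a HashSet<String>, plainCastFails returns true even though every element is a String, because the check is made on the array's own class, not on its contents.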

To ensure type safety when casting an array like you intended (DataObject[] dataArray = (DataObject[]) objectArray;), the JVM would have to inspect every single object in the array, so it's not actually a simple operation like a type cast. I think that's why you have to pass the array instance, which the toArray() operation then fills.

Since Java 8, with the introduction of streams and lambdas, you can also do the following:
To convert a plain array of objects:
Stream.of(dataArray).toArray(DataObject[]::new);
To convert a List:
dataList.stream().toArray(DataObject[]::new);
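A self-contained version of both calls might look like the sketch below, again with String standing in for DataObject and with invented class and method names. According to the Stream.toArray documentation, an element that does not fit the generated array's component type results in an ArrayStoreException.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo of Stream.toArray with an array constructor reference.
final class StreamToArray {

    // List<String> to String[]: the generator String[]::new allocates the typed array.
    static String[] fromList(List<String> list) {
        return list.stream().toArray(String[]::new);
    }

    // Object[] to String[]: this copies the elements into a new array,
    // it does not cast the original array.
    static String[] fromObjects(Object[] objects) {
        return Stream.of(objects).toArray(String[]::new);
    }
}
```

Both methods return a new array whose runtime class is String[], so no cast is needed at the call site.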
