I want to convert a set to a list. Which method should I prefer (I don't care whether the result is immutable or not):
version 1:
return List.of(myHashSet().toArray());
or version 2:
return new ArrayList<Strategy>(myHashSet());
Are there any other differences besides version 1 being immutable? For example, List.of is more space efficient compared to ArrayList.
The only differences I see between your 1 and 2 are:
List.of produces a shallowly immutable list. You cannot add or remove items from the list. Of course, the state within each object contained in the list would still be mutable if already so. Your # 1 results in an immutable list while the ArrayList of # 2 is mutable.
With # 2, you know and control the concrete class of the resulting List. With # 1, you neither know nor control the concrete class used behind the scenes when calling List.of. The List.of feature is free to use any class that implements List interface, possibly even a class not publicly available with the Java Collections Framework. The List.of feature might even be smart about choosing an optimized class appropriate to your particular collected objects.
You said:
List.of is more space efficient compared to ArrayList.
You cannot make that claim, if you mean usage of memory (RAM). As discussed above, you neither know nor control the choice of class used to implement List interface when calling List.of. The class used might vary depending on your data, and might vary by version of Java used at runtime.
As for memory used by the call to toArray, an array of objects is really just an array of references (pointers). Creating that array takes little memory and is fast. Object references are likely to be a value of four or eight octets (depending on your JVM being 32-bit or 64-bit), though this is not specified by Java. Creating the array does not duplicate the content of your element objects. So for most common apps the brief creation and disposal of that array would be insignificant.
And, as commented by Kuhn, Java 10 saw the arrival of a new factory method List.copyOf that takes a Collection. So no need to call toArray.
List<Strategy> myStrategyList = List.copyOf( myStrategySet ) ;
Conclusion:
If you need an immutable list, use List.copyOf.
If you need mutability or any other feature specific to a particular implementation of List, use that particular class.
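The two conclusions above can be sketched side by side. This is only an illustration: the class and method names (SetToListDemo, toImmutableList, toMutableList) are made up for this example, and List.copyOf assumes Java 10 or later. Note also that List.copyOf rejects null elements, while the ArrayList constructor accepts them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class; the names are illustrative and not from the question.
class SetToListDemo {

    // Unmodifiable snapshot of the set (List.copyOf requires Java 10+).
    static <T> List<T> toImmutableList(Set<T> source) {
        return List.copyOf(source);
    }

    // Independent, modifiable copy of the set.
    static <T> List<T> toMutableList(Set<T> source) {
        return new ArrayList<>(source);
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>(Set.of("a", "b", "c"));

        List<String> mutable = toMutableList(names);
        mutable.add("d"); // fine: ArrayList supports add

        List<String> immutable = toImmutableList(names);
        try {
            immutable.add("d"); // unmodifiable lists reject structural changes
        } catch (UnsupportedOperationException e) {
            System.out.println("The List.copyOf result rejects add()");
        }
    }
}
```

Neither call needs the intermediate toArray step from version 1, and both keep the element type instead of falling back to a list of Object.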
Related
I know that an instance of ArrayList can be declared in the two following ways:
ArrayList<String> list = new ArrayList<String>();
and
List<String> list = new ArrayList<String>();
I know that using the latter declaration provides the flexibility of changing the implementation from one List subclass to another (eg, from ArrayList to LinkedList).
But, what is the difference in the time and space complexity in the two? Someone told me the former declaration will ultimately make the heap memory run out. Why does this happen?
Edit: While performing basic operations like add, remove and contains does the performance differ in the two implementations?
The space complexity of your data structure and the time complexity of different operations on it are all implementation specific. For both of the options you listed above, you're using the same data structure implementation i.e. ArrayList<String>. What type you declare them as on the left side of the equal sign doesn't affect complexity. As you said, being more general with type on the left of the equal sign just allows for swapping out of implementations.
What determines the behaviour of an object is its actual class/implementation. In your example, the list is still an ArrayList, so the behaviour won't change.
Using a List<> declaration instead of an ArrayList<> means that you will only use the methods made visible by the List interface and that if later you need another type of list, it will be easy to change it (you just change the call to new). This is why we often prefer it.
Example: you first use an ArrayList but then find out that you often need to delete elements in the middle of the list. You would thus consider switching to a LinkedList. If you used the List interface everywhere (in getter/setter, etc.), then the only change in your code will be:
List<String> list = new LinkedList<>();
but if you used ArrayList, then you will need to refactor your getter/setter signatures, the potential public methods, etc.
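As a rough sketch of that point, the helper below is written against the List interface only, so the implementation can be swapped without touching it. The class and method names are hypothetical and chosen just for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo class; not taken from the question.
class ListDeclarationDemo {

    // Depends only on the List interface, so any implementation is accepted.
    static String joinAll(List<String> items) {
        return String.join(",", items);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        System.out.println(joinAll(list));

        // Switching implementations only changes the constructor call.
        list = new LinkedList<>(list);
        System.out.println(joinAll(list));
    }
}
```

The declared type affects which methods the compiler lets you call; the time and space behaviour still comes from the object actually constructed.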
In the JDK, there's Collections.emptyList() and Collections.emptySet(). Both are useful in their own right. But sometimes all that is needed is an empty, immutable instance of Collection. To me, there's no reason to choose one over the other, as both implement all operations of Collection in an efficient way and with the same results. Yet each time I need such an empty collection, I ponder which one to use for a second or two.
I do not expect to gain a deeper understanding of the collections framework from an answer to this question but maybe there's a subtle reason I could use to justify choosing one over the other without thinking about it ever again.
An answer should state at least one reason for preferring one of Collections.emptyList() and Collections.emptySet() over the other in a context where they're functionally equivalent. An answer is better if the stated reason is near the top of this list:
There's a case where the type system is happier with one over the other (e.g. type inference allows shorter code with one than the other).
There is a performance difference, maybe in some special case (e.g. if the empty collection is passed as an argument to some of the collection framework's static or instance methods like Collections.sort() or Collection.removeAll()).
Choosing one over the other "makes more sense" in the general case, if you think about it.
Examples where this question arises
To give some context, here are two examples where I am in need of an empty, unmodifiable collection.
This is an example of an API that allows creating some object by optionally specifying a collection of objects that are used in the creation. The second method just calls the first one with an empty collection:
static void createObjectWithTheseThings(Collection<Thing> things) {
...
}
static void createObjectWithoutAnyThings() {
createObjectWithTheseThings(Collections.emptyXXX());
}
This is an example of an Entity with state represented by an immutable collection stored in a non-final field. On initialization the field should be set to an empty collection:
class Example {
// Initialized to an empty collection.
private Collection<T> containedThings = Collections.emptyXXX();
...
}
Unfortunately I don't have an answer that will make the top of your priority list but if I were you I'd settle on
Collections.emptySet
Type inference was your first priority but I don't know if the choice can/should influence that given you were looking for an emptyCollection()
On the second priority, think about any api that takes in a collection which performs differently (accidentally/intentionally) based on the sub-interfaces of the concrete object passed in. Aren't they more likely to offer varied performance based on the concrete implementations (as with an ArrayList or LinkedList) instead? The empty set/list are not modeled on any empty data structures anyway; they are dummy implementations - hence no real difference
Based on java's modelling of these interfaces (which admittedly is not ideal), a Collection is very similar to a Set. In fact I think the methods are almost exactly the same. Logically too it looks OK with List being the specific-sub type that adds additional ordering concerns.
Now Collection and Set looking very similar(java-wise) brings up a question. If you are using a Collection type, it is clear it is not a list you want. Now the question is are you sure you don't mean a Set. If you don't, then are you using something like a Bag (surely there must be concrete instances which are not empty in the overall logic). So if you are concerned with say a Bag, then shouldn't it be up to the Bag api to provide an emptyBag() method? Just wondering. btw, I'd stick with emptySet() in the meantime :)
For the emptyXXX(), it really doesn't matter at all: both are empty, and since they are unmodifiable, they always stay empty. They will be equally suited to all operations Collection offers.
Take a look at what Collections really gives you there: Special implementations (the instances are shared across calls!). All relevant operations are dummy implementations that either return a constant result or immediately throw. Even iterator() is just a dummy with no state.
It won't make any notable difference at all.
Edit: You could say that for the special case of emptyList/emptySet, they are semantically and complexity-wise the same at the Collection interface level. All operations available on Collection are implemented by emptySet/emptyList as O(1) operations. And since both follow the contract defined by Collection, they are semantically identical too.
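A small check of that claim, as a sketch: both empty collections answer the basic Collection queries the same way and both reject modification. The class and helper names are invented for this example. Whether repeated calls return the very same instance is an implementation detail, so the code prints it rather than relying on it.

```java
import java.util.Collection;
import java.util.Collections;

// Hypothetical demo class; helper names are illustrative only.
class EmptyCollectionDemo {

    // True if the collection answers the basic Collection queries like an empty one.
    static boolean behavesAsEmpty(Collection<String> c) {
        return c.isEmpty()
                && c.size() == 0
                && !c.contains("x")
                && !c.iterator().hasNext();
    }

    // True if add() is rejected with UnsupportedOperationException.
    static boolean rejectsAdd(Collection<String> c) {
        try {
            c.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Collection<String> list = Collections.emptyList();
        Collection<String> set = Collections.emptySet();

        System.out.println(behavesAsEmpty(list) && behavesAsEmpty(set));
        System.out.println(rejectsAdd(list) && rejectsAdd(set));

        // Implementation detail: the JDK is allowed to share one instance across calls.
        System.out.println(Collections.emptyList() == Collections.emptyList());
    }
}
```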
The only situation I can imagine this making a difference is if the code that will use your Collection does something like this:
Collection<T> collection = ...
List<T> asAList;
if (collection instanceof List) {
asAList = (List<T>) collection;
} else {
asAList = new ArrayList<T>(collection);
}
Obviously in a case like this you would want to use emptyList(), while if the secret target type was a Set, you'd want emptySet().
Otherwise, in terms of what "makes more sense", I agree with @ac3's logic that a generic Collection is like a Bag, and an empty immutable Set and empty immutable Bag are pretty much the same thing. However, a person very used to using immutable lists might find those easier to think of.
First I should say that in my book (2005), Vector<E> is (extensively used) in place of arrays. At the same time there is no explanation with differences between the two. Checking the Oracle Doc for Vector class it's pretty easy to understand its usage.
Doing some additional research on StackOverflow and Google, I found that the Vector class is actually deprecated and that ArrayList should be used instead. Is this correct? I also found an extensive explanation about the differences between Array and ArrayList.
The part that I can't really understand: Is there a rule on where I should use ArrayList instead of simple arrays? It seems like I should always use ArrayList. It looks more efficient and should be easier to implement collections of values/objects, is there any down side with this approach?
Some history:
Vector exists since Java 1.0;
the List interface exists since Java 1.2, and so does ArrayList;
Vector has been retrofitted to implement the List interface at that same time;
Java 5, which introduced generics, was released in 2004.
Your course, dating back to 2005, should have had knowledge of ArrayList at the very list (sorry, least), and should have introduced generics too.
As to Array, there is java.lang.reflect.Array, which helps with reflections over arrays (ie, int[], etc).
Basically:
Vector synchronizes all operations, which is a waste in 90+% of cases;
if you want concurrent collections, Java 5 has introduced ConcurrentHashMap, CopyOnWriteArrayList etc, you should use those;
DO NOT use Vector anymore in any event; some code in the JDK still uses it, but it is for backwards compatibility reasons. In new code, there are better alternatives, as mentioned in the previous point;
since Java 1.2, Vector does not offer the same thread safety guarantees as it used to offer anyway.
The latter point is interesting. Prior to Iterator there was Enumeration, and Enumeration did not offer the possibility to remove elements; Iterator, however, does.
So, let us take two threads t1 and t2, a Vector, and those two threads having an Iterator over that vector. Thread t1 does:
while (it.hasNext())
it.next();
Thread t2 does:
// remember: different iterator
if (!it.hasNext())
it.remove();
With some unlucky timing, you have:
t1                      t2
------                  ------
hasNext(): true
                        .hasNext(): false
                        removes last element
.next() --> BOOM
Therefore, Vector is in fact not thread safe. And it is even less thread safe since Java 5's introduction of the "foreach loop", which creates a "hidden" iterator.
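The race above depends on timing, so it is hard to reproduce on demand. The sketch below is a deterministic, single-threaded stand-in, not the two-thread scenario itself. It shows that the hidden iterator of a for-each loop fails as soon as the Vector is structurally modified behind its back, even though every individual Vector method is synchronized. The class and method names are hypothetical.

```java
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.Vector;

// Hypothetical demo class; a single-threaded stand-in for the race described above.
class VectorIterationDemo {

    // Returns true if modifying the vector during a for-each loop makes the iterator fail.
    static boolean removalDuringForEachFails() {
        Vector<String> v = new Vector<>(List.of("a", "b", "c"));
        try {
            for (String s : v) {       // a hidden Iterator is created here
                if (s.equals("a")) {
                    v.remove(s);       // structural change not made through the iterator
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;               // the next call to next() detected the change
        }
    }

    public static void main(String[] args) {
        System.out.println(removalDuringForEachFails());
    }
}
```

With two threads the same failure can occur without either thread doing anything wrong on its own, which is why per-method synchronization is not enough to make iteration safe.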
The basic difference between an array and an ArrayList is that an array has a fixed size, whereas an ArrayList can dynamically grow in size as needed. So, if you are assured that your array size won't change, then you can use an array. But if you want to add elements later, then an ArrayList, which is an implementation of the List interface, is the way to go.
An ArrayList is itself backed by an array, though. Internally it uses a fixed-size array with a default initial capacity of 10 (which can be changed), but that detail is hidden. So, you don't have to bother about the changing size of the ArrayList.
Whenever you add more elements than the internal array of your ArrayList can currently hold, the internal array is extended. That means the repeated expansion can become an overhead if you are regularly inserting a large number of elements, although this is rarely a problem. Still, you can give your own initial capacity when creating an ArrayList. So, that's up to you to decide.
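A minimal sketch of supplying the initial capacity, assuming the number of elements is known up front. The class and method names are made up for this illustration. The constructor argument is only a capacity hint: the list still starts with size 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
class CapacityDemo {

    // Builds a list of n integers, pre-sizing the backing array to avoid regrowth.
    static List<Integer> filled(int n) {
        List<Integer> list = new ArrayList<>(n); // capacity hint, not the size
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(new ArrayList<String>(100).size()); // capacity does not change size
        System.out.println(filled(1000).size());
    }
}
```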
As for the Vector vs ArrayList discussion: yes, Vector is effectively obsolete (it is not formally deprecated, but its use is discouraged, as stated in the comments by @Luiggi), and you should use an ArrayList. The difference is that Vector synchronizes each operation, which is nearly never required. When you need synchronization, you can always create a synchronized list using Collections.synchronizedList.
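For the case where synchronization really is needed, here is a hedged sketch of Collections.synchronizedList with two writer threads. The class and method names are hypothetical. Individual calls such as add are synchronized by the wrapper; iterating over the list still requires the caller to synchronize on it manually, as the Collections documentation notes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
class SynchronizedListDemo {

    // Two threads each append perThread elements; returns the final size.
    static int fillFromTwoThreads(int perThread) {
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                shared.add(i); // each add is synchronized by the wrapper
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(fillFromTwoThreads(1000));
    }
}
```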
For more on this discussion, see this post.
An ArrayList is an implementation of List. There are other variations too. Like you also have a LinkedList, to get the functionality of a traditional linked list.
Vector Class is actually deprecated and to use ArrayList instead, is this correct?
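Mostly, yes. Vector is not formally marked as deprecated, but it and some other legacy collections (such as Hashtable) have been superseded by newer collections like ArrayList and HashMap, and its use in new code is discouraged. Here are a few reasons why Vector is considered obsolete.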
Is there a rule on where i should use ArrayList instead of simple Arrays?
Almost always. I can think of two reasons why you should use arrays:
Makes JNI calls easier. It is MUCH easier to send a simple array from C++ to Java than an object of ArrayList
You can gain a little bit of performance, since accessing an element of a plain array avoids the method call that ArrayList needs (array accesses are still bounds-checked by the JVM, although the JIT compiler can often eliminate those checks).
On the other hand, using ArrayList gives a lot of advantages. You do not need to think about controlling the array's size when you add a new element, you can use the simple API of ArrayList for adding/removing elements from your collection, etc.
I'll just add my two cents.
If you need a collection of primitive data and optimization matters, arrays will generally be faster, as they eliminate the need for auto-boxing and auto-unboxing.
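To make the boxing point concrete, here is a small sketch with two summing helpers (hypothetical names). The array version works on raw int values, while the list version has to unbox an Integer object on every iteration. It illustrates where the conversions happen; it is not a benchmark.

```java
import java.util.List;

// Hypothetical demo class; shows where boxing happens, makes no timing claims.
class PrimitiveArrayDemo {

    // Works directly on primitive values: no objects involved.
    static long sumArray(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Each element is an Integer object that is auto-unboxed in the loop.
    static long sumList(List<Integer> values) {
        long total = 0;
        for (int v : values) { // implicit Integer.intValue() here
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumArray(new int[] {1, 2, 3}));
        System.out.println(sumList(List.of(1, 2, 3))); // the literals are auto-boxed
    }
}
```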
Suppose there is an Integer array in my class:
public class Foo {
private Integer[] arr = new Integer[20];
.....
}
On a 64 bit architecture the space requirement for this is ~ (20*8+24) + 24*20 {space required for references + some array overhead + space required for objects}.
Why does Java store references to all 20 Integer objects? Wouldn't knowing the first memory location and the number of items in the array suffice? (I am assuming, as I read somewhere, that the objects in an array are placed contiguously anyway.) I want to know the reason for this sort of implementation. Sorry if this is a noobish question.
Like every other class, Integer is a reference type. This means it can only be accessed indirectly, via a reference. You cannot store an instance of a reference type in a field, a local variable, a slot in a collection, etc. -- you always have to store a reference and allocate the object itself separately. There are a variety of reasons for this:
You need to be able to represent null.
You need to be able to replace it with another instance of a subtype (assuming subtypes are possible, i.e. the class is not final). For example, an Object[] may actually store instances of any number of different classes with wildly varying sizes.
You need to preserve sharing, e.g. after a[0] = a[1] = someObject; all three must refer to the same object. This is much more important (vital even) if the object is mutable, but even with immutable objects the difference can be observed via reference equality checks (==).
You need reference assignment to be atomic (cf. Java memory model), so copying the whole instance is even more expensive than it seems.
With these and many other constraints, always storing references is the only feasible implementation strategy (in general). In very specific circumstances, a JIT compiler may avoid allocating an object entirely and store its contents directly (e.g. on the stack), but this is an obscure implementation detail, and not widely applicable. I only mention this for completeness and because it's a wonderful illustration of the as-if rule.
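The null and sharing requirements listed above can be observed directly. The sketch below uses StringBuilder as an arbitrary mutable element type; the class and method names are invented for the example.

```java
// Hypothetical demo class; StringBuilder is just a convenient mutable type.
class ReferenceSlotsDemo {

    // Returns true if two array slots refer to one shared object and an unset slot is null.
    static boolean slotsShareOneObject() {
        StringBuilder shared = new StringBuilder("x");
        StringBuilder[] a = new StringBuilder[3];
        a[0] = a[1] = shared;  // both slots hold a reference to the same object
        a[0].append("y");      // mutate through one slot

        return a[0] == a[1]                      // same identity
                && a[1].toString().equals("xy")  // change visible through the other slot
                && a[2] == null;                 // an unassigned slot holds null
    }

    public static void main(String[] args) {
        System.out.println(slotsShareOneObject());
    }
}
```

If the array stored the objects inline instead of references, the assignment would have to copy the object, and neither the shared mutation nor the null slot could be represented.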
I know this question has been asked before generics came out. Back then arrays won out a bit, given that an array enforces the element type and is therefore more type-safe.
But now, with the latest JDK 7, every time I design this type of API:
public String[] getElements(String type)
vs
public List<String> getElements(String type)
I always struggle to think of good reasons to return a collection over an array, or the other way around. What's the best practice when choosing String[] or List<String> as an API's return type? Or is it horses for courses?
I don't have a special case in my mind, I am more looking for a generic pros/cons comparison.
If you are writing a public API, then your clients will usually prefer collections because they are easier to manipulate and integrate with the rest of the codebase. On the other hand, if you expect your public API to be used in a highly performance-sensitive context, the raw array is preferred.
If you are writing this for your own use, it would be a best practice to start out with a collection type and only switch to an array if there is a definite performance issue involving it.
An array's element type can be determined at runtime through reflection, so if that particular feature is important to you, that would be another case to prefer an array.
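A short sketch of that reflection point, using Class.getComponentType. The class and helper names are hypothetical. A List, by contrast, only reports its implementation class at runtime, because its type argument is erased.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
class ComponentTypeDemo {

    // Returns the element type of an array, or null if the argument is not an array.
    static Class<?> elementTypeOf(Object array) {
        return array.getClass().getComponentType();
    }

    public static void main(String[] args) {
        System.out.println(elementTypeOf(new String[0])); // the array knows its element type
        System.out.println(elementTypeOf(new int[0]));

        // Generic type arguments are erased: both lists have the same runtime class.
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        System.out.println(strings.getClass() == numbers.getClass());
    }
}
```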
Here is a partial list
Advantages of array:
Fast
Mutable by nature
You know exactly "what you get" (what is the type) - so you know exactly how the returned object will behave.
Advantages of list:
Behavior varies depending on actual type returned (for example - can be mutable or immutable depending on the actual type)
Better hierarchy design
(Depending on actual type) might be dynamic size
More intuitive hashCode(), equals() - which might be critical if feeding to a hash based collection as a key.
Type safety:
String[] arr1 = new String[5];
Object[] arr2 = arr1;
arr2[0] = new Object(); //run time error :(
List<String> list1 = new LinkedList<String>();
List<Object> list2 = list1; //compilation error :)
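The type-safety and equals() items above can be checked with a runnable sketch. The class and method names are made up for this example. The generic-list assignment from the snippet is left out because it does not compile, which is exactly the point.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
class ArrayVsListSafetyDemo {

    // Arrays are covariant, so the bad store is only caught at run time.
    static boolean arrayStoreFailsAtRuntime() {
        String[] strings = new String[5];
        Object[] objects = strings;      // compiles: String[] is an Object[]
        try {
            objects[0] = new Object();   // not a String
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // Lists compare element by element, regardless of implementation class.
    static boolean listsCompareByContent() {
        List<String> first = List.of("a", "b");
        List<String> second = new ArrayList<>(List.of("a", "b"));
        return first.equals(second) && first.hashCode() == second.hashCode();
    }

    // Arrays inherit equals() from Object, which compares identity.
    static boolean arraysCompareByIdentity() {
        String[] first = {"a", "b"};
        String[] second = {"a", "b"};
        return !first.equals(second);
    }

    public static void main(String[] args) {
        System.out.println(arrayStoreFailsAtRuntime());
        System.out.println(listsCompareByContent());
        System.out.println(arraysCompareByIdentity());
    }
}
```

For content-based comparison of arrays, java.util.Arrays.equals and Arrays.hashCode exist, but they have to be called explicitly and are not used by hash-based collections.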
If I have a choice I would select the Collection because of the added behaviour I get for 'free' in Java.
Implement to Interfaces
The biggest benefit is that if you return the Collection API (or even a List, Set, etc) you can easily change the implementation (e.g. ArrayList, LinkedList, HashSet, etc) without needing to change the clients using the method.
Adding Behaviour
The Java Collections class provides many wrappers that can be applied to a collection including
Synchronising a collection
Making a collection immutable
Searching, reversing, etc...
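A brief sketch of a few of those helpers from java.util.Collections. The wrapper class and method names are hypothetical; the Collections calls themselves (unmodifiableList, reverse, binarySearch) are standard. Note that unmodifiableList returns a read-only view, so later changes to the backing list remain visible through it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class wrapping a few java.util.Collections helpers.
class CollectionsHelpersDemo {

    // Read-only view: callers cannot modify it, but it reflects the backing list.
    static List<String> readOnlyView(List<String> backing) {
        return Collections.unmodifiableList(backing);
    }

    // Reverses a copy so that the source list is left untouched.
    static List<Integer> reversedCopy(List<Integer> source) {
        List<Integer> copy = new ArrayList<>(source);
        Collections.reverse(copy);
        return copy;
    }

    // Binary search; the list must already be sorted in ascending order.
    static int indexInSorted(List<Integer> sorted, int key) {
        return Collections.binarySearch(sorted, key);
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(10, 20, 30);
        System.out.println(reversedCopy(numbers));
        System.out.println(indexInSorted(numbers, 20));
    }
}
```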
When you are exposing a public API, it makes much sense to return a Collection, as it makes life easier for the client, who can use all sorts of methods available on it. The client can always call toArray() if an array representation is needed for special cases.
Integration with other modules also becomes easier, as most APIs expect a collection.
In my point of view, it depends on what the returned value will be used for.
If the returned value will be iterated over and nothing else, an array is the best choice, but if the result will be manipulated, then go for the appropriate Collection.
But be careful because, for example, a List should allow duplicates while a Set should not, and a Stack should be LIFO while a Queue should be FIFO. Notice the SHOULD that I use: only the implementation can determine the real behavior.
Anyway, it really depends.