Since everyone praises Google Collections (e.g. in here), how come I can't find the equivalent of ArrayUtils.toObject() and ArrayUtils.toPrimitive()? Is it that unusable? Did I miss it?
To be honest, I'm not sure either of those methods should even qualify as a collection-related operation, so I'd wonder why they would be there in the first place.
To clarify a bit: a collection is generally a group of objects with some semantics binding them together, while an array is just a fixed-size sequence of something. These semantics may cover whether the collection accepts or rejects nulls, duplicates, objects of the wrong type, objects with unacceptable field values, etc.
Many collections use arrays internally, but an array itself isn't a collection. To qualify as a collection it needs some extra magic, such as adding and removing objects at arbitrary positions, and arrays can't do that. I very much doubt you'll ever see any kind of array support in Google Collections, since arrays are not collections.
However, since Google Collections is going to be part of Google's Guava libraries, which are a general-purpose utility library of sorts, you may find what you want in the com.google.common.primitives package, for example Booleans#asList(boolean... backingArray) and Booleans#toArray(Collection<Boolean> collection).
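If you only need the boxing and unboxing and are on Java 8 or later, the JDK's own streams can also convert between a primitive array and a list. A minimal sketch; the class and method names (PrimitiveConversions, asList, toIntArray) are hypothetical and are not Guava's API:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper: JDK-only conversions between int[] and List<Integer>.
final class PrimitiveConversions {
    private PrimitiveConversions() {}

    // Boxes every int into an Integer and collects the results into a new List.
    static List<Integer> asList(int[] values) {
        return IntStream.of(values).boxed().collect(Collectors.toList());
    }

    // Unboxes every element; throws NullPointerException if the list contains null.
    static int[] toIntArray(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

One design difference worth noting: Guava's Ints.asList returns a fixed-size view backed by the original array, whereas this asList copies the values into a new, independent list.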
If you really feel they should include equivalents of Apache Commons Lang's ArrayUtils.toObject() and ArrayUtils.toPrimitive(), you can always submit a feature request as a new issue.
ImmutableSet implements the Set interface. The methods that don't make sense for an ImmutableSet are called "optional operations" of Set, I assume for situations like this. So ImmutableSet throws an UnsupportedOperationException for many of the optional operations.
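The same behaviour can be reproduced with the JDK's own unmodifiable wrapper, which (like Guava's ImmutableSet) implements Set but refuses its mutators at run time. A small sketch; the class name OptionalOperationDemo is made up:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class OptionalOperationDemo {
    private OptionalOperationDemo() {}

    // Returns true if add() on an unmodifiable view throws UnsupportedOperationException.
    static boolean addIsRejected() {
        Set<String> backing = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> view = Collections.unmodifiableSet(backing);
        try {
            view.add("c"); // compiles fine: add() is part of the Set interface
            return false;
        } catch (UnsupportedOperationException e) {
            return true;   // the optional operation is refused only at run time
        }
    }
}
```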
This seems backwards to me. I was taught that an interface is a contract, so that you can impose functionality across different implementations. The approach of optional operations seems to fundamentally change (contradict?) what interfaces are meant to do. Implementing this today, I would break the Set interface into two interfaces: one for the non-mutating operations and a second one extending it with the mutators. (A very quick, off-the-cuff solution.)
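The split proposed here might look roughly like the following; ReadableSet, MutableSet and SimpleMutableSet are hypothetical names, not JDK types, and only a handful of methods are shown:

```java
import java.util.HashSet;
import java.util.Set;

// Read-only operations: every implementation can honour these.
interface ReadableSet<E> {
    boolean contains(Object o);
    int size();
}

// Mutators live in a sub-interface, so a read-only type simply doesn't implement them.
interface MutableSet<E> extends ReadableSet<E> {
    boolean add(E e);
    boolean remove(Object o);
}

// Minimal implementation that delegates to a HashSet.
final class SimpleMutableSet<E> implements MutableSet<E> {
    private final Set<E> delegate = new HashSet<>();

    @Override public boolean contains(Object o) { return delegate.contains(o); }
    @Override public int size() { return delegate.size(); }
    @Override public boolean add(E e) { return delegate.add(e); }
    @Override public boolean remove(Object o) { return delegate.remove(o); }
}
```

Code that only reads would accept a ReadableSet, and the compiler would reject any call to add() on it. The Java Collections API Design FAQ explains why this stops scaling once fixed-size, append-only and truly immutable variants are added.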
I understand that technology changes. I'm not saying it should be done one way or another. My question is: does this change reflect a change in some underlying philosophy of Java? Is it just more of a band-aid to keep things backwards compatible? Or did I have an incomplete understanding of interfaces?
The Java Collections API Design FAQ answers this question in detail:
Q: Why don't you support immutability directly in the core collection interfaces so that you can do away with optional operations (and UnsupportedOperationException)?
A: This is the most controversial design decision in the whole API. Clearly, static (compile time) type checking is highly desirable, and is the norm in Java. We would have supported it if we believed it were feasible. Unfortunately, attempts to achieve this goal cause an explosion in the size of the interface hierarchy, and do not succeed in eliminating the need for runtime exceptions (though they reduce it substantially).
Doug Lea, who wrote a popular Java collections package that did reflect mutability distinctions in its interface hierarchy, no longer believes it is a viable approach, based on user experience with his collections package. In his words (from personal correspondence) "Much as it pains me to say it, strong static typing does not work for collection interfaces in Java."
To illustrate the problem in gory detail, suppose you want to add the notion of modifiability to the Hierarchy. You need four new interfaces: ModifiableCollection, ModifiableSet, ModifiableList, and ModifiableMap. What was previously a simple hierarchy is now a messy heterarchy. Also, you need a new Iterator interface for use with unmodifiable Collections, that does not contain the remove operation. Now can you do away with UnsupportedOperationException? Unfortunately not.
Consider arrays. They implement most of the List operations, but not remove and add. They are "fixed-size" Lists. If you want to capture this notion in the hierarchy, you have to add two new interfaces: VariableSizeList and VariableSizeMap. You don't have to add VariableSizeCollection and VariableSizeSet, because they'd be identical to ModifiableCollection and ModifiableSet, but you might choose to add them anyway for consistency's sake. Also, you need a new variety of ListIterator that doesn't support the add and remove operations, to go along with unmodifiable List. Now we're up to ten or twelve interfaces, plus two new Iterator interfaces, instead of our original four. Are we done? No.
Consider logs (such as error logs, audit logs and journals for recoverable data objects). They are natural append-only sequences, that support all of the List operations except for remove and set (replace). They require a new core interface, and a new iterator.
And what about immutable Collections, as opposed to unmodifiable ones? (i.e., Collections that cannot be changed by the client AND will never change for any other reason). Many argue that this is the most important distinction of all, because it allows multiple threads to access a collection concurrently without the need for synchronization. Adding this support to the type hierarchy requires four more interfaces.
Now we're up to twenty or so interfaces and five iterators, and it's almost certain that there are still collections arising in practice that don't fit cleanly into any of the interfaces. For example, the collection-views returned by Map are natural delete-only collections. Also, there are collections that will reject certain elements on the basis of their value, so we still haven't done away with runtime exceptions.
When all was said and done, we felt that it was a sound engineering compromise to sidestep the whole issue by providing a very small set of core interfaces that can throw a runtime exception.
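Two of the in-between cases the FAQ mentions already exist in the JDK: Arrays.asList returns a fixed-size List (set works, add throws), and a Map's keySet() view supports removal but not addition. A small sketch; the class and method names are made up:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PartialMutabilityDemo {
    private PartialMutabilityDemo() {}

    // Fixed-size list: replacing elements is allowed, structural changes are not.
    static boolean fixedSizeListRejectsAdd() {
        List<String> fixed = Arrays.asList("a", "b");
        fixed.set(0, "z");                 // allowed: the size does not change
        try {
            fixed.add("c");                // optional operation, unsupported here
            return false;
        } catch (UnsupportedOperationException e) {
            return fixed.get(0).equals("z");
        }
    }

    // Delete-only view: removing a key from keySet() removes the entry from the map.
    static boolean keySetIsDeleteOnly() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        Set<String> keys = map.keySet();
        keys.remove("a");                  // supported: writes through to the map
        try {
            keys.add("c");                 // unsupported: there is no value to pair with "c"
            return false;
        } catch (UnsupportedOperationException e) {
            return !map.containsKey("a") && map.size() == 1;
        }
    }
}
```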
In short, having interfaces like Set with optional operations was done to prevent an exponential explosion in the number of different interfaces needed. It is not as simple as just "immutable" and "mutable". Guava's ImmutableSet then had to implement Set to be interoperable with all other code which uses Sets. It's not ideal but there is really no better way to do it.
I have read a lot about deep copying and serialization of Java Lists, Maps, etc., but I did not find good answers to some questions. I care about time and performance, so I am looking for a compromise. I list my questions below so that you can address the ones you have answers to.
What is better in terms of performance: deep copying by looping over a list, or using serialization? I have read a post (sorry, I lost the link) that says looping is four times faster than Java serialization. Does this mean that a third-party library like Kryo can be better than looping?
I am not sure how serialization works in different third-party libraries, but what happens if I have many levels of subclasses? Would serialization be better than looping then?
Is there any library in Java that copies raw memory? For instance, a library that uses memcpy()-like functions as in C. That would be much faster, since there is no need to care about class logic (of course it needs some handling for non-contiguous memory data). I am aware that Java is object oriented :), but I don't think this would violate its rules.
If I want to implement deepCopy(List<?>) and deepCopy(Map<?,?>) in Java, can I put them in a MyTools class that I have, or is there a neater way to do it in Java?
Looping will always be faster than serialization unless you are only serializing primitives.
Third-party libraries generally handle subclasses and all kinds of types. Some of them do this, at least in part, by using sun.misc.Unsafe.
Yes, you can actually copy raw memory using the sun.misc.Unsafe class (see its copyMemory methods). Check the Unsafe class implementation for details.
You can have MyTools, or let's say a Utils class, do that for you.
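A minimal looping sketch of such a helper. It assumes the caller supplies a function that deep-copies a single element, since Java has no generic way to copy an arbitrary object; MyTools and both deepCopy signatures are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

final class MyTools {
    private MyTools() {}

    // Copies the list structure and applies elementCopier to every element.
    static <T> List<T> deepCopy(List<T> source, UnaryOperator<T> elementCopier) {
        List<T> copy = new ArrayList<>(source.size());
        for (T element : source) {
            copy.add(elementCopier.apply(element));
        }
        return copy;
    }

    // Keys are assumed to be immutable (as map keys should be); only values are copied.
    static <K, V> Map<K, V> deepCopy(Map<K, V> source, UnaryOperator<V> valueCopier) {
        Map<K, V> copy = new HashMap<>();
        for (Map.Entry<K, V> entry : source.entrySet()) {
            copy.put(entry.getKey(), valueCopier.apply(entry.getValue()));
        }
        return copy;
    }
}
```

For example, MyTools.deepCopy(list, sb -> new StringBuilder(sb)) copies a List<StringBuilder> so that appending to a copied builder leaves the original untouched.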
Given your needs, there is no generic way to deep copy lists, maps and other generic containers. For this reason, you may end up using a good serialization library like Kryo.
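As an alternative to looping, a generic deep copy can go through the JDK's built-in serialization (this is not Kryo, whose API is different). It assumes every object in the graph is Serializable and is typically slower than a hand-written copy; SerializationCopier is a hypothetical name:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationCopier {
    private SerializationCopier() {}

    // Writes the object graph to a byte array, then reads it back as a new, independent graph.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T original) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        } catch (IOException e) {            // includes NotSerializableException
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```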
I'm trying to understand how the Java Collections Framework sorts its collections by default, and I got confused: I read that all collections are sorted using merge sort, but when I looked at the Arrays class I saw this: «Implementors should feel free to substitute other algorithms, so long as the specification itself is adhered to. (For example, the algorithm used by sort(Object[]) does not have to be a mergesort, but it does have to be stable.)» That means other sorting algorithms may be used as well. So how exactly are the collections being sorted?
The code to sort collections is delivered with the JRE/JDK.
Anyone who implements the JRE/JDK can choose to implement it in any way he wants, as long as it's conforming (i.e. it actually sorts the collection correctly and the sorting is stable).
Some implementations might choose merge-sort, others might choose something else. No specific implementation is required.
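Whatever algorithm an implementation picks, the stability requirement is observable: elements that compare as equal keep their original relative order. (For what it's worth, OpenJDK has used TimSort, a merge-sort hybrid, for sorting objects since Java 7.) A small sketch using List.sort; the class name is made up:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class StableSortDemo {
    private StableSortDemo() {}

    // Sorts words by length only; words of equal length must stay in their input order.
    static List<String> sortByLength(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Comparator.comparingInt(String::length));
        return copy;
    }
}
```

Sorting ["bb", "a", "cc", "d"] by length yields [a, d, bb, cc]: "a" stays ahead of "d" and "bb" ahead of "cc", exactly as in the input.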
Well, it seems to me ArrayLists make it easier to expand the code later on both because they can grow and because they make using Generics easier. However, for multidimensional arrays, I find the readability of the code is better with standard arrays.
Anyway, are there some guidelines on when to use one or the other? For example, I'm about to return a table from a function (int[][]), but I was wondering if it wouldn't be better to return a List<List<Integer>> or a List<int[]>.
Unless you have a strong reason otherwise, I'd recommend using Lists over arrays.
There are some specific cases where you will want to use an array (e.g. when you are implementing your own data structures, or when you are addressing a very specific performance requirement that you have profiled and identified as a bottleneck) but for general purposes Lists are more convenient and will offer you more flexibility in how you use them.
Where you are able to, I'd also recommend programming to the abstraction (List) rather than the concrete type (ArrayList). Again, this offers you flexibility if you decide to change the implementation details in the future.
To address your readability point: if you have a complex structure (e.g. ArrayList of HashMaps of ArrayLists) then consider either encapsulating this complexity within a class and/or creating some very clearly named functions to manipulate the structure.
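For example, a table like the one in the question could be wrapped in a small class so callers never see the nested lists. A sketch; IntTable and its methods are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hides a List<List<Integer>> behind clearly named operations.
final class IntTable {
    private final List<List<Integer>> rows = new ArrayList<>();

    // Appends a row; the values are copied so later changes to the caller's list don't leak in.
    void addRow(List<Integer> values) {
        rows.add(new ArrayList<>(values));
    }

    int get(int row, int column) {
        return rows.get(row).get(column);
    }

    int rowCount() {
        return rows.size();
    }

    // Read-only view of a single row.
    List<Integer> row(int index) {
        return Collections.unmodifiableList(rows.get(index));
    }
}
```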
Choose a data structure implementation and interface based on primary usage:
Random Access: use List for variable type and ArrayList under the hood
Appending: use Collection for variable type and LinkedList under the hood
Loop and process: use Iterable and see the above for use under the hood based on producer code
Use the most abstract interface possible when handing around data. That said, don't use Collection when you need random access: List has get(int), which is very useful when random access is needed.
Typed collections like List<String> make up for the syntactical convenience of arrays.
Don't use arrays unless you have a qualified performance expert analyze and recommend them. Even then you should get a second opinion. Arrays are generally a premature optimization and should be avoided.
Generally speaking, you are far better off using an interface rather than a concrete type. The concrete type makes it hard to rework the internals of the function in question. For example, if you return int[][] you have to do all of the computation upfront. If you return List<List<Integer>> you can lazily do computation during iteration (or even concurrently in the background) if it is beneficial.
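A sketch of the lazy option: an AbstractList whose get() computes each cell on demand, so nothing is computed upfront. The multiplication table is only a stand-in for whatever the real function computes, and LazyTables is a hypothetical name:

```java
import java.util.AbstractList;
import java.util.List;

final class LazyTables {
    private LazyTables() {}

    // Returns a read-only rows x cols table whose cells are computed only when accessed.
    static List<List<Integer>> multiplicationTable(int rows, int cols) {
        return new AbstractList<List<Integer>>() {
            @Override
            public List<Integer> get(int row) {
                if (row < 0 || row >= rows) {
                    throw new IndexOutOfBoundsException("row " + row);
                }
                return new AbstractList<Integer>() {
                    @Override
                    public Integer get(int col) {
                        if (col < 0 || col >= cols) {
                            throw new IndexOutOfBoundsException("col " + col);
                        }
                        return (row + 1) * (col + 1); // computed on each access
                    }

                    @Override
                    public int size() {
                        return cols;
                    }
                };
            }

            @Override
            public int size() {
                return rows;
            }
        };
    }
}
```

Callers only see List<List<Integer>>, so this could later be swapped for a precomputed ArrayList (or a cached version) without changing the method's signature.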
The List is more powerful:
You can resize the list after it has been created.
You can create a read-only view onto the data.
It can be easily combined with other collections, like Set or Map.
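A short sketch of the first two points (ListVsArrayDemo is a made-up name): the list grows after creation, and Collections.unmodifiableList hands out a read-only view that still reflects later changes to the underlying list:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ListVsArrayDemo {
    private ListVsArrayDemo() {}

    // Returns a read-only view of a list that was grown after the view was created.
    static List<String> grownReadOnlyView() {
        List<String> names = new ArrayList<>();
        List<String> view = Collections.unmodifiableList(names); // a view, not a copy
        names.add("alice");
        names.add("bob"); // the list resizes itself as needed
        return view;
    }
}
```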
The array works on a lower level:
Its content can always be changed.
Its length can never be changed.
It uses less memory.
You can have arrays of primitive data types.
I wanted to point out that Lists can hold the wrappers for the primitive data types that would otherwise need to be stored in an array (i.e. a class like Double, whose only field is a double). Since Java 5, the language converts to and from these wrappers implicitly (autoboxing), at least most of the time, so whether a List can hold primitives should not be a consideration for the vast majority of use cases.
For completeness: the only time that I have seen Java fail to implicitly convert from a primitive wrapper was when those wrappers were composed in a higher order structure: It could not convert a Double[] into a double[].
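Autoboxing applies to individual values, not to arrays, so a Double[] has to be unboxed element by element. A minimal sketch (ArrayUnboxing and toPrimitive are hypothetical names; how null elements are handled is a choice left to the caller):

```java
final class ArrayUnboxing {
    private ArrayUnboxing() {}

    // Unboxes each element; null elements are replaced with valueForNull.
    static double[] toPrimitive(Double[] boxed, double valueForNull) {
        double[] result = new double[boxed.length];
        for (int i = 0; i < boxed.length; i++) {
            Double value = boxed[i];
            result[i] = (value != null) ? value : valueForNull; // unboxes a single Double
        }
        return result;
    }
}
```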
It mostly comes down to flexibility/ease of use versus efficiency. If you don't know in advance how many elements will be needed, or if you need to insert in the middle, ArrayLists are a better choice. They use arrays under the hood, so you'll want to consider using the ensureCapacity method for performance. Arrays are preferred if you know the size in advance and won't need inserts, etc.
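A sketch of the capacity hint (CapacityDemo is a made-up name): when the final size is known, calling ensureCapacity before a bulk insert (or passing the size to the constructor) avoids repeated growth of the backing array. Neither affects size():

```java
import java.util.ArrayList;

final class CapacityDemo {
    private CapacityDemo() {}

    // Builds a list of the squares 0..count-1, reserving space up front.
    static ArrayList<Integer> squares(int count) {
        ArrayList<Integer> list = new ArrayList<>();
        list.ensureCapacity(count); // only sizes the backing array; size() is still 0
        for (int i = 0; i < count; i++) {
            list.add(i * i);
        }
        return list;
    }
}
```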
Quick question here: why not ALWAYS use ArrayLists in Java? They apparently have the same access speed as arrays, in addition to extra useful functionality. I understand the limitation that they cannot hold primitives, but this is easily mitigated by using wrappers.
Plenty of projects do just use ArrayList or HashMap or whatever to handle all their collection needs. However, let me put one caveat on that. Whenever you are creating classes and using them throughout your code, if possible refer to the interfaces they implement rather than the concrete classes you are using to implement them.
For example, rather than this:
ArrayList insuranceClaims = new ArrayList();
do this:
List insuranceClaims = new ArrayList();
or even:
Collection insuranceClaims = new ArrayList();
If the rest of your code only knows it by the interface it implements (List or Collection) then swapping it out for another implementation becomes much easier down the road if you find you need a different one. I saw this happen just a month ago when I needed to swap out a regular HashMap for an implementation that would return the items to me in the same order I put them in when it came time to iterate over all of them. Fortunately just such a thing was available in the Jakarta Commons Collections and I just swapped out A for B with only a one line code change because both implemented Map.
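The JDK's own java.util.LinkedHashMap (available since Java 1.4) provides the same insertion ordering. A small sketch (OrderedMapDemo is a made-up name); because the variable is declared as Map, the swap is confined to the constructor call:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OrderedMapDemo {
    private OrderedMapDemo() {}

    // Previously: Map<String, Integer> counts = new HashMap<>();  (iteration order unspecified)
    static List<String> keysInInsertionOrder() {
        Map<String, Integer> counts = new LinkedHashMap<>(); // iterates in insertion order
        counts.put("zebra", 1);
        counts.put("apple", 2);
        counts.put("mango", 3);
        return new ArrayList<>(counts.keySet());
    }
}
```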
If you need a collection of primitives, then an array may well be the best tool for the job. Boxing is a comparatively expensive operation. For a collection (not including maps) of primitives that will be used as primitives, I almost always use an array to avoid repeated boxing and unboxing.
I rarely worry about the performance difference between an array and an ArrayList, however. If a List will provide better, cleaner, more maintainable code, then I will always use a List (or Collection or Set, etc, as appropriate, but your question was about ArrayList) unless there is some compelling reason not to. Performance is rarely that compelling reason.
Using Collections almost always results in better code, in part because arrays don't play nice with generics, as Johannes Weiß already pointed out in a comment, but also because of so many other reasons:
Collections have a very rich API and a large variety of implementations that can (in most cases) be trivially swapped in and out for each other
A Collection can be trivially converted to an array, if occasional use of an array version is useful
Many Collections grow more gracefully than an array grows, which can be a performance concern
Collections work very well with generics, arrays fairly badly
As TofuBeer pointed out, array covariance is strange and can behave in unexpected ways that no other object does. Collections handle covariance in expected ways.
Arrays need to be manually sized to their task, and if an array is not full you need to keep track of that yourself. If an array needs to be resized, you have to do that yourself too.
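The covariance point in concrete terms: because arrays are covariant, a bad store compiles and only fails at run time with ArrayStoreException, while the equivalent generic code is rejected by the compiler. A small sketch (CovarianceDemo is a made-up name):

```java
final class CovarianceDemo {
    private CovarianceDemo() {}

    // String[] is a subtype of Object[], so the assignment below compiles.
    static boolean arrayStoreFails() {
        Object[] objects = new String[1];
        try {
            objects[0] = Integer.valueOf(42); // compiles, but the array really holds Strings
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // With generics, List<String> is NOT a List<Object>, so this would not compile:
    // List<Object> objects = new ArrayList<String>();
}
```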
All of this together means I rarely use arrays and only a little more often use an ArrayList. However, I do use Lists very often (or just Collection or Set). My most frequent use of arrays is when the item being stored is a primitive and will be inserted, accessed and used as a primitive. If boxing and unboxing ever become so fast that they are a trivial consideration, I may revisit this decision, but it is more convenient to store something in the form in which it is always referenced. (That is, 'int' instead of 'Integer'.)
This is a case of premature unoptimization :-). You should never do something because you think it will be better/faster/make you happier.
ArrayList has extra overhead; if you have no need for the extra features of ArrayList, then it is wasteful to use one.
Also, for some of the things you can do with a List there is the Arrays class, so the claim that ArrayList provides more functionality than arrays is less true. Using those methods might be slower than using an ArrayList, but it would have to be profiled to be sure.
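For reference, a few of the java.util.Arrays helpers that cover common List-style needs on plain arrays: copying with a new length, sorting, filling and binary search. ArraysHelpersDemo is a made-up name:

```java
import java.util.Arrays;

final class ArraysHelpersDemo {
    private ArraysHelpersDemo() {}

    // Sorts a copy of the input, grows it by one slot, and fills the new slot with -1.
    static int[] sortedAndGrown(int[] input) {
        int[] sorted = Arrays.copyOf(input, input.length);
        Arrays.sort(sorted);
        int[] grown = Arrays.copyOf(sorted, sorted.length + 1); // new slot starts as 0
        Arrays.fill(grown, sorted.length, grown.length, -1);
        return grown;
    }

    // Requires a sorted array; returns the index, or a negative value encoding the insertion point.
    static int indexOf(int[] sorted, int key) {
        return Arrays.binarySearch(sorted, key);
    }
}
```

Note that "growing" an array always means allocating a new one and copying, which is what ArrayList does for you internally.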
You should never try to make something faster without being sure that it is slow to begin with, which would imply that you should go ahead and use ArrayList until you find out that it is a problem and slows the program down. However, common sense should be involved too: ArrayList has overhead, and that overhead will be small but cumulative. It will not be easy to spot in a profiler, since it is just a little overhead here and a little overhead there. So common sense says that unless you need the features of ArrayList you should not use it, unless you want to die a death by a thousand cuts (performance-wise).
For internal code, if you find that you do need to change from arrays to ArrayList, the change is pretty straightforward in most cases ([i] becomes get(i); that will be 99% of the changes).
If you are using the for-each loop (for (T value : items) { }), there is no code to change for that either.
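A sketch of what that migration typically looks like (MigrationDemo is a made-up name): indexing changes from [i] to get(i) and length to size(), while a for-each loop reads the same either way:

```java
import java.util.List;

final class MigrationDemo {
    private MigrationDemo() {}

    static int sumArray(int[] items) {
        int total = 0;
        for (int i = 0; i < items.length; i++) {
            total += items[i];           // array indexing
        }
        return total;
    }

    static int sumList(List<Integer> items) {
        int total = 0;
        for (int i = 0; i < items.size(); i++) {
            total += items.get(i);       // [i] became get(i), length became size()
        }
        return total;
    }

    static int sumForEach(List<Integer> items) {
        int total = 0;
        for (int value : items) {        // identical syntax to a for-each over an int[]
            total += value;
        }
        return total;
    }
}
```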
Also, going with what you said:
1) Equal access speed depends on your environment. For instance, the Android VM doesn't inline methods (it is just a straight interpreter, as far as I know), so access there will be much slower. Regardless of the VM, other operations on an ArrayList can also cause slowdowns depending on what you are doing (these could be faster with a straight array; again, you would have to profile or examine the source to be sure).
2) Wrappers increase the amount of memory being used.
You should not worry about speed/memory before you profile something, on the other hand you shouldn't choose what you know to be a slower option unless you have a good reason to.
Performance should not be your primary concern.
Use the List interface where possible, and choose the concrete implementation based on actual requirements (ArrayList for random access, LinkedList for structural modifications, ...).
You should be concerned about performance.
Use arrays, System.arraycopy, java.util.Arrays and other low-level stuff to squeeze out every last drop of performance.
Well, don't blindly use something that is not right for the job. Always start off using Lists, choosing ArrayList as your implementation. This is a more OO approach. If you don't know that you specifically need an array, you'll find that not tying yourself to a particular implementation of List will be much better for you in the long run. Get it working first, optimize later.