How do we usually process sequences in Java?

In C++, the STL's iterators are very useful: I can write container-independent code to process sequences.
However, I find Iterator and ListIterator very limited in Java. They don't even support clone(). I think it's impossible to process sequences with them.
The only way to do this seems to be using arrays forever, but how can I reuse my code when I change arrays to Lists?
By processing sequences I mean running algorithms on a sequence of Objects: for example, sorting them, finding the maximum, or removing duplicate items.

List<Type> list = new ArrayList<Type>(); // List is an interface, so instantiate an implementation
// add the elements...
for (Type t : list) {
    // do your stuff with t
}
Normally you will not need to use the iterators explicitly in Java. Also, be careful with .clone() as it is rarely the most appropriate solution.
List itself is an interface that is implemented by different containers.

I would use List<Type>, and avoid clone() as it doesn't always do what you think, i.e. it can be shallow or deep depending on the implementation.
List is a basic interface. Perhaps if you give an example of what you are having trouble with, we can help you with that.
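As a hedged illustration of the answers above: container-independent code in Java is usually written against the Collection or List interface rather than against iterators. The sketch below assumes a hypothetical helper class named SequenceOps (not from the original posts) and covers the three tasks named in the question using only java.util calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
final class SequenceOps {

    // Finds the maximum of any Collection (ArrayList, LinkedList, HashSet, ...).
    static <T extends Comparable<? super T>> T largest(Collection<? extends T> items) {
        return Collections.max(items);
    }

    // Returns a sorted copy and leaves the input untouched.
    static <T extends Comparable<? super T>> List<T> sortedCopy(Collection<? extends T> items) {
        List<T> copy = new ArrayList<T>(items);
        Collections.sort(copy);
        return copy;
    }

    // Removes duplicates while keeping the first-seen order.
    static <T> List<T> withoutDuplicates(Collection<? extends T> items) {
        return new ArrayList<T>(new LinkedHashSet<T>(items));
    }

    public static void main(String[] args) {
        List<Integer> fromArray = Arrays.asList(3, 1, 3, 2);      // array-backed list
        List<Integer> linked = new LinkedList<Integer>(fromArray); // different container
        System.out.println(largest(fromArray));
        System.out.println(sortedCopy(linked));
        System.out.println(withoutDuplicates(linked));
    }
}
```

The same three methods accept an array wrapped with Arrays.asList, so switching from arrays to Lists does not require rewriting the algorithms.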

Related

Most efficient collection for filtering a Java Stream?

I'm storing several Things in a Collection. The individual Things are unique, but their types aren't. The order in which they are stored also doesn't matter.
I want to use Java 8's Stream API to search it for a specific type with this code:
Collection<Thing> things = ...;
// ... populate things ...
Stream<Thing> filtered = things.stream().filter(thing -> thing.type.equals(searchType));
Is there a particular Collection that would make the filter() more efficient?
I'm inclined to think no, because the filter has to iterate through the entire collection.
On the other hand, if the collection is some sort of tree that is indexed by the Thing.type then the filter() might be able to take advantage of that fact. Is there any way to achieve this?
Stream operations like filter are not specialized enough to take advantage of special cases. For example, IntStream.range(0, 1_000_000_000).filter(x -> x > 999_999_000) will actually iterate over all the input numbers; it cannot just "skip" the first 999_999_000. So your question reduces to finding the collection with the most efficient iteration.
The iteration is usually performed in the Spliterator.forEachRemaining method (for a non-short-circuiting stream) and in the Spliterator.tryAdvance method (for a short-circuiting stream), so you can look at the corresponding spliterator implementation and check how efficient it is. In my opinion the most efficient is an array (either bare or wrapped into a list with Arrays.asList): it has minimal overhead. ArrayList is also quite fast, but for a short-circuiting operation it will check the modCount (to detect concurrent modification) on every iteration, which adds a very slight overhead. Other types like HashSet or LinkedList are comparatively slower, though in most applications this difference is practically insignificant.
Note that parallel streams should be used with care. For example, the splitting of LinkedList is quite poor and you may experience worse performance than in sequential case.
The most important thing to understand, regarding this question, is that when you pass a lambda expression to a particular library like the Stream API, all the library receives is an implementation of a functional interface, e.g. an instance of Predicate. It has no knowledge about what that implementation will do and therefore has no way to exploit scenarios like filtering sorted data via comparison. The stream library simply doesn’t know that the Predicate is doing a comparison.
An implementation doing such an optimization would need cooperation between the JVM, which knows and understands the code, and the library, which knows the semantics. Such a thing does not happen in the current implementation and is far off, at least as far as I can see.
If the source is a tree or sorted list and you want to benefit from that for filtering, you have to do it using APIs operating on the source, before creating the stream. E.g. suppose, we have a TreeSet and want to filter it to get items within a particular range, like
// our made-up source
TreeSet<Integer> tree=IntStream.range(0, 100).boxed()
.collect(Collectors.toCollection(TreeSet::new));
// the naive implementation
tree.stream().filter(i -> i>=65 && i<91).forEach(i->System.out.print((char)i.intValue()));
We can do instead:
tree.tailSet(65).headSet(91).stream().forEach(i->System.out.print((char)i.intValue()));
which will utilize the sorted/tree nature. When we have a sorted list instead, say
List<Integer> list=new ArrayList<>(tree);
utilizing the sorted nature is more complex as the collection itself doesn’t know that it’s sorted and doesn’t offer operations utilizing that directly:
int ix=Collections.binarySearch(list, 65);
if(ix<0) ix=~ix;
if(ix>0) list=list.subList(ix, list.size());
ix=Collections.binarySearch(list, 91);
if(ix<0) ix=~ix;
if(ix<list.size()) list=list.subList(0, ix);
list.stream().forEach(i->System.out.print((char)i.intValue()));
Of course, the stream operations here are only exemplary and you don’t need a stream at all, when all you do then is forEach…
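The binary-search steps above can be folded into one reusable method. This is a minimal sketch, assuming a hypothetical helper named range, a list sorted in ascending order without duplicates, and from <= to; it returns a view of the half-open interval [from, to), which matches what subSet(from, to) gives on a TreeSet.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper; assumes the list is sorted ascending with no duplicates.
final class SortedRange {

    // Returns a view of the elements e with from <= e < to.
    static <T extends Comparable<? super T>> List<T> range(List<T> sorted, T from, T to) {
        int lo = insertionPoint(sorted, from);
        int hi = insertionPoint(sorted, to);
        return sorted.subList(lo, hi);
    }

    // Index of key if present, otherwise the index where it would be inserted.
    private static <T extends Comparable<? super T>> int insertionPoint(List<T> sorted, T key) {
        int ix = Collections.binarySearch(sorted, key);
        return ix < 0 ? ~ix : ix;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<Integer>();
        for (int i = 0; i < 100; i++) {
            list.add(i);
        }
        // Same bounds as the example in the answer: 65 inclusive, 91 exclusive.
        for (int i : range(list, 65, 91)) {
            System.out.print((char) i);
        }
        System.out.println();
        // The TreeSet equivalent of the two chained calls in the answer.
        System.out.println(new TreeSet<Integer>(list).subSet(65, 91).size());
    }
}
```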
As far as I am aware, there's no such differentiation for normal streaming.
However, with parallel streaming you might be better off using a collection that is easily divisible, like ArrayList, rather than LinkedList or any type of Set.
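The question's idea of a collection "indexed by Thing.type" can be approximated by building a Map once and looking types up in it, instead of filtering the whole collection each time. The following is only a sketch: the Thing class, its fields, and the method names indexByType and ofType are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ThingIndex {

    // Hypothetical stand-in for the question's Thing class.
    static final class Thing {
        final String type;
        final String name;

        Thing(String type, String name) {
            this.type = type;
            this.name = name;
        }
    }

    // Builds the index in one pass over the collection.
    static Map<String, List<Thing>> indexByType(Collection<Thing> things) {
        return things.stream().collect(Collectors.groupingBy(t -> t.type));
    }

    // Looks up one type without iterating over unrelated elements.
    static List<Thing> ofType(Map<String, List<Thing>> index, String type) {
        return index.getOrDefault(type, Collections.emptyList());
    }

    public static void main(String[] args) {
        List<Thing> things = Arrays.asList(
                new Thing("bolt", "b1"), new Thing("nut", "n1"), new Thing("bolt", "b2"));
        Map<String, List<Thing>> index = indexByType(things);
        ofType(index, "bolt").stream().forEach(t -> System.out.println(t.name));
    }
}
```

The index only pays off when the same collection is searched repeatedly, and it has to be rebuilt or updated whenever the underlying collection changes.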

Java 7 API design best practice - return Array or return Collection

I know this question was asked before generics came out. Back then, arrays won out a bit, given that an array enforces its element type and so is more type-safe.
But now, with the latest JDK 7, every time I design this type of API:
public String[] getElements(String type)
vs
public List<String> getElements(String type)
I always struggle to think of good reasons to return a Collection over an array, or the other way around. What's the best practice when choosing String[] or List<String> as the API's return type? Or is it horses for courses?
I don't have a special case in mind; I am looking for a generic pros/cons comparison.
If you are writing a public API, then your clients will usually prefer collections because they are easier to manipulate and integrate with the rest of the codebase. On the other hand, if you expect your public API to be used in a highly performance-sensitive context, the raw array is preferred.
If you are writing this for your own use, it would be a best practice to start out with a collection type and only switch to an array if there is a definite performance issue involving it.
An array's element type can be determined at runtime through reflection, so if that particular feature is important to you, that would be another case to prefer an array.
Here is a partial list
Advantages of array:
Fast
Mutable by nature
You know exactly "what you get" (what is the type) - so you know exactly how the returned object will behave.
Advantages of list:
Behavior varies depending on actual type returned (for example - can be mutable or immutable depending on the actual type)
Better hierarchy design
(Depending on actual type) might be dynamic size
More intuitive hashCode(), equals() - which might be critical if feeding to a hash based collection as a key.
Type safety:
String[] arr1 = new String[5];
Object[] arr2 = arr1;
arr2[0] = new Object(); //run time error :(
List<String> list1 = new LinkedList<String>();
List<Object> list2 = list1; //compilation error :)
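The hashCode()/equals() point in the list above can be checked with a small experiment. This is a sketch with a hypothetical class name, EqualityDemo: arrays inherit identity-based equals() and hashCode() from Object, while List implementations compare by contents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class EqualityDemo {

    // Arrays use Object.equals/hashCode, so an equal-looking array is not found.
    static boolean arrayFoundInSet() {
        Set<String[]> seen = new HashSet<String[]>();
        seen.add(new String[] {"a", "b"});
        return seen.contains(new String[] {"a", "b"});
    }

    // Lists compare by contents, even across different implementations.
    static boolean listFoundInSet() {
        Set<List<String>> seen = new HashSet<List<String>>();
        seen.add(new ArrayList<String>(Arrays.asList("a", "b")));
        return seen.contains(Arrays.asList("a", "b"));
    }

    public static void main(String[] args) {
        System.out.println(arrayFoundInSet()); // false
        System.out.println(listFoundInSet());  // true
        // Content comparison of arrays needs the static helper instead.
        System.out.println(Arrays.equals(new String[] {"a"}, new String[] {"a"})); // true
    }
}
```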
If I have a choice I would select the Collection because of the added behaviour I get for 'free' in Java.
Implement to Interfaces
The biggest benefit is that if you return the Collection API (or even a List, Set, etc) you can easily change the implementation (e.g. ArrayList, LinkedList, HashSet, etc) without needing to change the clients using the method.
Adding Behaviour
The Java Collections class provides many wrappers that can be applied to a collection including
Synchronising a collection
Making a collection immutable
Searching, reversing, etc...
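A short sketch of the wrappers and utilities listed above, using methods that exist on java.util.Collections. The class and method names (WrapperDemo, rejectsWrites, sortedDescending) are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class WrapperDemo {

    // Returns true if a read-only view of the list refuses modification.
    static boolean rejectsWrites(List<String> backing) {
        List<String> readOnly = Collections.unmodifiableList(backing);
        try {
            readOnly.add("x");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    // Sorts a copy of the input, then reverses it.
    static List<String> sortedDescending(List<String> input) {
        List<String> copy = new ArrayList<String>(input);
        Collections.sort(copy);
        Collections.reverse(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>(Arrays.asList("b", "a", "c"));

        // Synchronized view: individual calls are thread-safe.
        List<String> threadSafe = Collections.synchronizedList(names);
        threadSafe.add("d");

        System.out.println(rejectsWrites(names));
        System.out.println(sortedDescending(names));

        // binarySearch requires the list to be sorted first.
        Collections.sort(names);
        System.out.println(Collections.binarySearch(names, "c"));
    }
}
```

None of these is available for a plain array without first wrapping or copying it.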
When you are exposing a public API it makes a lot of sense to return a Collection, as it makes life easier for the client, who can use all sorts of methods available on it. The client can always call toArray() if it wants an array representation for special cases.
Also, integration with other modules becomes easier, as most APIs expect collections.
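To make the toArray() remark concrete, here is a sketch in which a List-returning method is converted to an array on the client side. The body of getElements is invented for the example; only its signature comes from the question.

```java
import java.util.ArrayList;
import java.util.List;

final class ElementsApi {

    // Hypothetical implementation; the question only gives the signature.
    static List<String> getElements(String type) {
        List<String> result = new ArrayList<String>();
        result.add(type + "-1");
        result.add(type + "-2");
        return result;
    }

    public static void main(String[] args) {
        // A client that really needs an array converts at the call site.
        String[] asArray = getElements("widget").toArray(new String[0]);
        for (String s : asArray) {
            System.out.println(s);
        }
    }
}
```

Passing a typed array to toArray yields a String[] rather than an Object[], so the caller keeps the element type.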
In my point of view, it depends on what the returned value will be used for.
If the returned value will only be iterated over and nothing else, an array is the best choice, but if the result will be manipulated, then go for the appropriate Collection.
But be careful: for example, List should allow duplicates while Set should not, and Stack should be LIFO while Queue should be FIFO. Notice the SHOULD: only the implementation determines the real behavior.
Anyway, it really depends.

How to convert List<String> to an ArrayList<String>

This API call returns a potentially large List<String>, which is not sorted. I need to sort it, search it, and access random elements. Currently the List is implemented by an ArrayList (I checked the source), but at some unknown point in the future the API developers may choose to switch to a LinkedList implementation (without changing the interface).
Sorting, searching, accessing a potentially large LinkedList would be extremely slow and unacceptable for my program. Therefore I need to convert the List to an ArrayList to ensure the practical efficiency of my program. However, since the List is most likely an ArrayList already, it would be inefficient to needlessly create a new ArrayList copy of the List.
Given these constraints, I have come up with the following method to convert a List into an ArrayList:
private static <T> ArrayList<T> asArrayList(List<T> list) {
if (list instanceof ArrayList) {
return (ArrayList<T>) (list);
} else {
return new ArrayList<T>(list);
}
}
My question is this: Is this the most efficient way to work with a List with an unknown implementation? Is there a better way to convert a List to an ArrayList? Is there a better option than converting the List into an ArrayList?
You can't really get much simpler than what you've got - looks about as efficient as it could possibly be to me.
That said, this sounds very much like premature optimisation - you should only really need to worry about this if and when the author of the API you're using changes to a LinkedList. If you worry about it now, you are likely to spend a lot of time and effort planning for a future scenario that may not even come to pass - this is time that might be better spent finding other issues to fix. Presumably the only time you'll be changing versions of the API is between versions of your own application - handle the issue at that point, if at all.
As you can see yourself, the code is simple and it is efficient inasmuch as it only creates a copy if it is necessary.
So the answer is that there is no significantly better option, save for a completely different type of solution, something that sorts your list as well, for example.
(Bear in mind that this level of optimisation is rarely required, so it's not a very frequent problem.)
Update: Just an afterthought: as a general rule, well-written APIs don't return data types that are inappropriate for the amount of data they are likely to contain. That is not to say you should trust them blindly, but it's not a completely unreasonable assumption.
"Sorting, searching, accessing a potentially large LinkedList would be extremely slow and unacceptable for my program."
Actually, it is not as bad as that. IIRC, the Collections.sort methods copy the list to a temporary array, sort the array, clear() the original list and copy the array back to it. For large enough lists, the O(NlogN) sorting phase will dominate the O(N) copying phases.
Java collections that support efficient random access implement the RandomAccess marker interface. For such a list you could just run Collections.sort on the list directly. For lists without random access you should probably dump the list to an array using one of its toArray methods, sort that array, then wrap it into a random-access List.
T[] array = (T[])list.toArray(); // just suppress the warning if the compiler worries about unsafe cast
Arrays.sort(array);
List<T> sortedList = Arrays.asList(array);
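Note that Arrays.asList returns a fixed-size list backed by the array. Its class is a private nested class of java.util.Arrays, not java.util.ArrayList, so casting the result to ArrayList fails with a ClassCastException; wrap it in new ArrayList<T>(...) if you need a real ArrayList.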
Collections.sort is efficient for all List implementations, as long as they provide an efficient ListIterator. The method dumps the list into an array, sorts that, then uses the ListIterator to copy the values back into the list in O(n).
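The RandomAccess marker interface mentioned above gives a slightly more general test than instanceof ArrayList. The following is a sketch with a hypothetical method name, ensureRandomAccess; it copies only when the list does not advertise efficient indexed access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

final class RandomAccessLists {

    // Returns the list itself if it supports fast get(i), otherwise an ArrayList copy.
    static <T> List<T> ensureRandomAccess(List<T> list) {
        if (list instanceof RandomAccess) {
            return list;
        }
        return new ArrayList<T>(list);
    }

    public static void main(String[] args) {
        List<String> array = new ArrayList<String>(Arrays.asList("a", "b"));
        List<String> linked = new LinkedList<String>(array);
        System.out.println(ensureRandomAccess(array) == array);   // true: no copy made
        System.out.println(ensureRandomAccess(linked) == linked); // false: copied
    }
}
```

Unlike the asArrayList method in the question, this also avoids copying other random-access lists, such as the one returned by Arrays.asList, at the cost of returning List rather than ArrayList.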

Extract elements from list based on object property type

Often, I have a list of objects. Each object has properties. I want to extract a subset of the list where a specific property has a predefined value.
Example:
I have a list of User objects. A User has a homeTown. I want to extract all users from my list with "Springfield" as their homeTown.
I normally see this accomplished as follows:
List<User> users = getTheUsers();
List<User> returnList = new ArrayList<User>();
for (User user : users) {
    if ("springfield".equalsIgnoreCase(user.getHomeTown()))
        returnList.add(user);
}
I am not particularly satisfied with this solution. Yes, it works, but it seems so slow. There must be a better-than-linear solution.
Suggestions?
Well, this operation is linear in nature unless you do something extreme like index the collection based on properties you expect to examine in this way. Short of that, you're just going to have to look at each object in the collection.
But there may be some things you can do to improve readability. For example, Groovy provides an each() method for collections. It would allow you to do something like this...
def returnList = new ArrayList();
users.each() {
    if ("springfield".equalsIgnoreCase(it.getHomeTown()))
        returnList.add(it);
};
You will need a custom solution for this. Create a custom collection that implements the List interface and add all elements from the original list to it.
Internally, this custom List class needs to maintain a Map for each attribute, which lets you look up values as you need. To populate these Maps you will have to use introspection to find all fields and their values.
This custom object will have to implement methods such as List findAllBy(String propertyName, String propertyValue); that use the Maps above to look up those values.
This is not an easy, straightforward solution. Furthermore, you will need to consider nested attributes like "user.address.city". Making this custom List immutable will help a lot.
However, even if you are iterating over a list of thousands of objects, plain iteration will likely still be faster than building such a structure, so you are better off simply iterating over the List for what you need.
As I have found out, if you are using a list, you have to iterate. Whether it's a for-each, lambda, or a FindAll, it is still being iterated. No matter how you dress up a duck, it's still a duck. As far as I know there are HashTables, Dictionaries, and DataTables that do not require iteration to find a value. I am not sure what the Java equivalent implementations are, but maybe this will give you some other ideas.
If you are really interested in performance here, I would also suggest a custom solution. My suggestion would be to create a Tree of Lists in which you can sort the elements.
If you are not interested in the ordering of the elements inside your list (and most people usually are not), you could also use a TreeMap (or HashMap) with the homeTown as key and a List of all matching entries as value. If you add a new element, just look up the corresponding list in the Map and append to it (for the first element of a town you need to create the list first). If you want to delete an element, do the same lookup and remove it from that list.
When you want a list of all users with a given homeTown, you just need to look up that list in the Map and return it (no copying of elements needed). I am not 100% sure about the Map implementations in Java, but the complete method should run in constant time (worst case logarithmic, depending on the Map implementation).
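The Map-of-Lists idea described above might look roughly like this. It is a sketch under several assumptions: the User class and the names UserIndex, add, remove, and livingIn are invented here, home towns are non-null, and keys are lower-cased to approximate the equalsIgnoreCase comparison from the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class UserIndex {

    // Hypothetical stand-in for the question's User class.
    static final class User {
        final String name;
        final String homeTown;

        User(String name, String homeTown) {
            this.name = name;
            this.homeTown = homeTown;
        }
    }

    private final Map<String, List<User>> byHomeTown = new HashMap<String, List<User>>();

    // Appends the user to the list for its town, creating the list on first use.
    void add(User user) {
        byHomeTown.computeIfAbsent(key(user.homeTown), k -> new ArrayList<User>()).add(user);
    }

    // Removes the user from its town's list; returns false if it was not indexed.
    boolean remove(User user) {
        List<User> bucket = byHomeTown.get(key(user.homeTown));
        return bucket != null && bucket.remove(user);
    }

    // Hash lookup instead of a scan; returns a read-only view, not a copy.
    List<User> livingIn(String town) {
        List<User> bucket = byHomeTown.get(key(town));
        if (bucket == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(bucket);
    }

    // Normalizes case so "Springfield" and "springfield" share one entry.
    private static String key(String town) {
        return town.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        UserIndex index = new UserIndex();
        index.add(new User("Homer", "Springfield"));
        index.add(new User("Lenny", "springfield"));
        index.add(new User("Fat Tony", "Shelbyville"));
        System.out.println(index.livingIn("SPRINGFIELD").size());
    }
}
```

Building the index is still linear in the number of users; the gain is that each later query avoids a full scan.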
I ended up using Predicates. Its readability looks similar to Drew's suggestion.
As far as performance is concerned, I found negligible speed improvements for small (< 100 items) lists. For larger lists (5k-10k), I found 20-30% improvements. Medium lists had benefits, but not quite as large as bigger lists. I did not test super large lists, but my testing made it seem that the larger the list, the better the results in comparison to the foreach process.
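The post does not say which Predicate library was used. As one possible reading, here is a sketch using Java 8's java.util.function.Predicate with a stream; the User class and the names filter and livesIn are assumptions for illustration. Like the loop in the question, it still visits every element once.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class UserFilter {

    // Hypothetical stand-in for the question's User class.
    static final class User {
        private final String homeTown;

        User(String homeTown) {
            this.homeTown = homeTown;
        }

        String getHomeTown() {
            return homeTown;
        }
    }

    // Generic filter: keeps the elements for which the predicate returns true.
    static <T> List<T> filter(Collection<T> items, Predicate<? super T> keep) {
        return items.stream().filter(keep).collect(Collectors.toList());
    }

    // Reusable predicate for the home-town test from the question.
    static Predicate<User> livesIn(String town) {
        return user -> town.equalsIgnoreCase(user.getHomeTown());
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User("Springfield"), new User("Shelbyville"), new User("springfield"));
        System.out.println(filter(users, livesIn("springfield")).size());
    }
}
```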

Printing out items in any Collection in reverse order?

I have the following problem in my Data Structures and Problem Solving using Java book:
Write a routine that uses the Collections API to print out the items in any Collection in reverse order. Do not use a ListIterator.
I'm not putting it up here because I want somebody to do my homework, I just can't seem to understand exactly what it is asking for me to code!
When it asks me to write a 'routine', is it looking for a single method? I don't really understand how I can make a single method work for all of the various types of Collections (linked list, queue, stack).
If anybody could guide me in the right direction, I would greatly appreciate it.
Regardless of the question not making much sense, as half of the collections have no stable ordering or have a fixed ordering (e.g. TreeSet or PriorityQueue), you can use the following statements to print the contents of a collection in reverse of its iteration order:
List temp = new ArrayList(src);
Collections.reverse(temp);
System.out.println(temp);
In essence, you create an ArrayList, as lists are the only structure that can be arbitrarily reordered. You pass the src collection to the constructor, which initializes the list with the contents of src in the collection's iteration order. Then you pass the list to the Collections.reverse() method, which reverses the list, and finally you print it.
First, I believe it is asking you to write a method. Like:
void printReverseList(Collection col) {}
Then there are many ways to do this. For example, only using the Collection API, use the toArray method and use a for loop to print out all the items from the end. Make sense?
As for the various classes using the Collection interface, it will automatically work for all of those since they must implement the interface (provided they implement it in a sane way;).
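The toArray approach described above could be sketched as follows. The class name ReversePrinter and the helper reversedItems are made up for this example; "reverse order" here means the reverse of the collection's own iteration order, and no ListIterator is involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;

final class ReversePrinter {

    // Copies the items into an array and walks it from the last index to the first.
    static List<Object> reversedItems(Collection<?> col) {
        Object[] items = col.toArray();
        List<Object> result = new ArrayList<Object>(items.length);
        for (int i = items.length - 1; i >= 0; i--) {
            result.add(items[i]);
        }
        return result;
    }

    // Prints one item per line, last item first.
    static void printReverse(Collection<?> col) {
        for (Object item : reversedItems(col)) {
            System.out.println(item);
        }
    }

    public static void main(String[] args) {
        // The same method accepts any Collection implementation.
        printReverse(new LinkedList<String>(Arrays.asList("a", "b", "c")));
        printReverse(new ArrayDeque<Integer>(Arrays.asList(1, 2, 3)));
    }
}
```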
Well you could have a routine that delegates to other routines based on the input type, however I'm not sure there is a generic enough collection type that can be encompassed into one argument. I guess you could just use method overloading (having multiple methods with the same name, but accept different args).
That could technically count as 1 routine (all have the same name).
I don't know much Java, but considering the "Collections API", I imagine all those objects implement an interface you could iterate through in some way. I suppose they could all have an itemAtIndex(int index) and length() or similar methods you could use.
You might want to read this.
Isn't there a base Collection class?
Probably worth looking here as a starting point: Collections.
