How is the implementation of Iterator in Java different from that in C++?
The current C++ (98) standard library (in particular the portion formerly known as the STL) defines iterators that are very close to C pointers (including arithmetic). As such, they just point somewhere. To be useful, you generally need two of them so that you can iterate between them. I understand C++0x introduces ranges, which act more like Java iterators.
Java introduced the Iterator (and ListIterator) interface in 1.2, largely taking over from the more verbose Enumeration. Java has no pointer arithmetic, so there is no need for iterators to behave like pointers. They have a hasNext method to see whether they have reached the end, instead of requiring two iterators. The downside is that they are less flexible: the system requires methods such as subList rather than iterating between two iterators at specific points in the containing list.
A general difference in style is that whereas C++ uses "static polymorphism" through templates, Java uses interfaces and the common dynamic polymorphism.
The concept of iterators is to provide the glue to allow separation of "algorithm" (really control flow) and data container. Both approaches do that reasonably well. In ideal situations "normal" code should barely see iterators.
C++ doesn't specify how iterators are implemented. It does however specify their interface and the minimal behaviour they must provide in order to work with other Standard Library components.
For example, for input iterators, the Standard specifies that an iterator must be dereferenceable via the * operator. It does not however specify how that operator is to be implemented.
In C++, you see people passing around iterators all the time. C++ iterators "point" to a specific element of a container. You can dereference an iterator to get the element (and you can do this over and over). You can erase the element that an iterator refers to efficiently. You can also make copies of an iterator (either by assigning to another variable, or passing an iterator by value to a function) to keep track of multiple places at the same time. Iterators in C++ can become "invalidated" by certain operations on the container, depending on the container. When an iterator becomes invalidated (the rules of which might be complex), operations with the iterator have undefined behavior, and may (inconsistently) crash your program or return incorrect results; although in some data structures (e.g. std::list), iterators remain valid through most modifications of the container.
In Java, you don't see that kind of usage. An Iterator in Java points "between" two elements, rather than "at" an element. The only way you get elements with an iterator is to move it forwards to get the element you moved over. Of course that changes the state of the iterator, and you can't go back; unless you have a ListIterator in which case you can move both forwards and backwards (but it is still annoying to have to move forwards and backwards just to remain still). You can't copy an iterator, because the interface does not expose a public copy method; so for example, you can't just pass a marker of a specific location to a function, without giving that function a reference to the same iterator that you have, and thereby allowing them to change the state of your iterator. In Java, an iterator will be invalidated (at least in the standard containers) by any modification of the container except through the iterator's own add() or remove() methods, which is overly conservative (e.g. most modifications on a LinkedList should not affect an iterator). When you try to use an invalidated iterator, it raises ConcurrentModificationException (confusingly named because it has nothing to do with Concurrency) instead of possibly causing undefined behavior, which is good.
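To make the Java side of this concrete, here is a minimal sketch (the class and method names are made up for illustration). It shows hasNext()/next() driving the traversal, removal through the iterator itself, and the fail-fast ConcurrentModificationException when an ArrayList is modified behind the iterator's back in a single thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class, not part of any library.
class IteratorInvalidationDemo {

    // Removes even numbers from a copy of the input, using the iterator's own remove().
    static List<Integer> removeEvens(List<Integer> source) {
        List<Integer> copy = new ArrayList<>(source);
        Iterator<Integer> it = copy.iterator();
        while (it.hasNext()) {            // plays the role of comparing against end() in C++
            if (it.next() % 2 == 0) {     // next() moves past an element and returns it
                it.remove();              // removes the element just returned; iterator stays valid
            }
        }
        return copy;
    }

    // Modifies the list directly while an iterator is live and reports
    // whether the next use of that iterator threw.
    static boolean failsFastAfterDirectModification() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        Iterator<Integer> it = list.iterator();
        it.next();
        list.add(4);                      // structural change not made through the iterator
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;                  // no threads involved, despite the name
        }
    }
}
```

Note that the fail-fast check is documented as best effort, so it is a debugging aid rather than something to build program logic on.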
C++ iterators (STL) try to mimic pointer syntax as much as possible, through operator overloading.
The standard specification defines the various iterator concepts (like forward, bidirectional, random access, input, output). Each concept should match a specific interface (e.g. ++ operator for forward iterator to go to the next element in sequence, -- for bidirectional, +, += for random access, etc).
Implementations are defined entirely by standard library vendors/JRE vendors in C++/Java respectively. They are free to implement them however they want, as long as their behaviour conforms to the respective standards.
Related
I have looked into writing streams-based code in Java 8, and have noticed a pattern, namely that I frequently have a list but need to transform it into another list by applying a trivial mapping to each element. After writing .stream().map(...).collect(Collectors.toList()) yet another time I remembered we have List.forEach, so I looked for List.map, but apparently this default method has not been added.
Why was List.map() (EDIT: or List.transform() or List.mumble()) not added (this is a history question), and is there a simple shorthand using other methods in the default runtime library that does the same thing that I have just not noticed?
As explained in “Why doesn't java.util.Collection implement the new Stream interface?” the design decision to separate the Collection API and the Stream API was made to separate eager and lazy operations.
In this regard, several bulk operations were added to the Collection API:
List.replaceAll(UnaryOperator)
List.sort(Comparator)
Map.replaceAll(BiFunction)
Collection.removeIf(Predicate)
Map.forEach(BiConsumer)
Iterable.forEach(Consumer)
Common to all these eager methods is that functions which evaluate to a result are used to modify the underlying Collection. A map method returning a new Iterable or Collection wouldn’t fit into the scheme.
Further, among these methods, forEach(Consumer) is the only one that happens to have a signature matching a Stream method. That is unfortunate, as these methods don’t even do the same thing; the closest equivalent to Iterable.forEach(Consumer) is Stream.forEachOrdered(Consumer). But it is also clear why there is a functional overlap.
Performing an action for its side effect for each element is the only bulk operation that doesn’t modify the source collection, hence can be offered by the Stream API as well (as a terminal operation). There, it would be chained after one or more lazily evaluated intermediate operations; using it without prepended intermediate operations, is a special case.
Since map isn’t a terminal operation, it wouldn’t fit into the scheme of Collection methods at all. The closest equivalent is List.replaceAll(UnaryOperator).
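As a rough illustration of that distinction (class and method names here are made up), List.replaceAll mutates the list in place and cannot change the element type, while a stream pipeline leaves the source alone and builds a new list:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class contrasting the two styles.
class EagerVersusStreamMapping {

    // Eager, in place: the same list instance is modified and stays a List<String>.
    static List<String> upperCaseInPlace(List<String> words) {
        words.replaceAll(String::toUpperCase);
        return words;
    }

    // Stream pipeline: the source is untouched and the element type may change.
    static List<Integer> lengths(List<String> words) {
        return words.stream()
                    .map(String::length)
                    .collect(Collectors.toList());
    }
}
```

replaceAll only works on modifiable lists; on an unmodifiable one it throws UnsupportedOperationException.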
Of course, I can't look into the head of the Java designers, but I can think of a number of reasons not to include a map (or other stream methods) on collections.
It's API bloat. The same thing can be done, in a more general way, with minor typing overhead using streams.
It leads to code bloat. If I called map on a list, I would expect the result to have the same runtime type (or at least the same runtime properties) as the list I called it on. So ArrayList.map would return an ArrayList, LinkedList.map a LinkedList, etc. That means that the same functionality would need to be implemented in all List implementations (with a suitable default implementation in the interface so old code will not be broken).
It would encourage code like list.map(function1).map(function2), which is considerably less efficient than list.stream().map(function1).map(function2).collect(Collectors.toList()) because the former constructs an auxiliary list which is immediately thrown away, while the latter applies both functions to the list elements and only then constructs the result list.
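To illustrate the last point, here is a sketch with a hypothetical eager helper (mapEagerly is invented for this example, it is not a JDK method). Chaining it allocates a throwaway intermediate list, whereas the stream version pushes each element through both functions and builds only the final list:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class; mapEagerly stands in for an imagined List.map.
class ChainedMapping {

    // Every call allocates and fills a complete result list.
    static <S, T> List<T> mapEagerly(List<S> list, Function<? super S, ? extends T> f) {
        List<T> result = new ArrayList<>(list.size());
        for (S s : list) {
            result.add(f.apply(s));
        }
        return result;
    }

    static List<Integer> viaIntermediateList(List<String> input) {
        List<String> trimmed = mapEagerly(input, String::trim); // discarded right after use
        return mapEagerly(trimmed, String::length);
    }

    static List<Integer> viaSingleStream(List<String> input) {
        return input.stream()
                    .map(String::trim)
                    .map(String::length)
                    .collect(Collectors.toList());
    }
}
```

Both methods return the same values; the difference is only in how many lists get built along the way.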
For a functional language like Scala the balance between advantages and disadvantages might be different.
I do not know of shortcuts in the Java standard library, but you can of course implement your own:
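// Needs java.util.List, java.util.function.Function and java.util.stream.Collectors.
public static <S, T> List<T> mapList(List<S> list, Function<? super S, ? extends T> function) {
    return list.stream().map(function).collect(Collectors.toList());
}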
From a conceptual point of view, the forEach method (which, as was said several times in the comments above, is declared in the java.lang.Iterable interface) and the map() method are very different.
The main difference is that java.lang.Iterable#forEach(...) returns nothing; it is void. So adding it to the Iterable interface with a default implementation doesn't break anything, and it fits well into the logic of this structure.
java.util.stream.Stream#map(...), on the other hand, returns <R> Stream<R>.
If I were the developer of the Iterable interface and were asked to add map there, I would first ask: what type should it return? If it is <R> Stream<R>, then Iterable is not the proper place for it. If not, what else?
I believe this is the reason.
UPD: @DavidtenHove asked why this map method could not simply return Iterable<B>, like:
default <A, B> Iterable<B> map(Function<A, B> f) {
//...
}
My opinion: because in that case the Iterable interface becomes an analogue of the Stream interface, which doesn't make sense.
forEach also duplicates logic from Stream, but it fits quite well into the logic of Iterable.
Simply put: they didn't want to put everything from the Stream class on all streamable classes. Those would simply get too big. They added forEach because it is so commonly used, but drew the line there.
As for shorthands, I do not believe there are any. You just have to use the collect() method with the collectors.
Is there any reasons/arguments not to implement a Java collection that restricts its members based on a predicate/constraint?
Given that such functionality should be needed often, I was expecting it to be implemented already in collections frameworks like apache-commons or Guava. But while apache indeed has it, Guava deprecated its version and recommends not using similar approaches.
The Collection interface contract states that a collection may place any restrictions on its elements as long as they are properly documented, so I'm unable to see why a guarded collection would be discouraged. What other option is there to, say, ensure an Integer collection never contains negative values without hiding the whole collection?
It is just a matter of preference (look at the thread about checking before vs checking after); I think that is what it boils down to. Also, checking only on add() is good enough only for immutable objects.
There can hardly be one ("acceptable") answer, so I'll just add some thoughts:
As mentioned in the comments, Collection#add(E) already allows for throwing an IllegalArgumentException, with the reason
if some property of the element prevents it from being added to this collection
So one could say that this case was explicitly considered in the design of the collection interface, and there is no obvious, profound, purely technical (interface-contract related) reason to not allow creating such a collection.
However, when thinking about possible application patterns, one quickly finds cases where the observed behavior of such a collection could be ... counterintuitive, to say the least.
One was already mentioned by dcsohl in the comments, and referred to cases where such a collection would only be a view on another collection:
List<Integer> listWithIntegers = new ArrayList<Integer>();
List<Integer> listWithPositiveIntegers =
createView(listWithIntegers, e -> e > 0);
//listWithPositiveIntegers.add(-1); // Would throw IllegalArgumentException
listWithIntegers.add(-1); // Fine
// This would be true:
assert(listWithPositiveIntegers.contains(-1));
However, one could argue that
Such a collection would not necessarily have to be only a view. Instead, one could enforce that only new collections with such constraints may be created
The behavior is similar to that of Collections.unmodifiableCollection(Collection), which is widely accepted as it is. (Although it serves a far broader and omnipresent use case, namely avoiding exposing the internal state of a class by returning a modifiable version of a collection via an accessor method)
But in this case, the potential for "inconsistencies" is much higher.
For example, consider a call to Collection#addAll(Collection). It also allows throwing an IllegalArgumentException "if some property of an element of the specified collection prevents it from being added to this collection". But there are no guarantees about things like atomicity. To phrase it that way: It is not specified what the state of the collection will be when such an exception was thrown. Imagine a case like this:
List<Integer> listWithPositiveIntegers = createList(e -> e > 0);
listWithPositiveIntegers.add(1); // Fine
listWithPositiveIntegers.add(2); // Fine
listWithPositiveIntegers.addAll(Arrays.asList(3,-4,5)); // Throws
assert(listWithPositiveIntegers.contains(3)); // True or false?
assert(listWithPositiveIntegers.contains(5)); // True or false?
(It may be subtle, but it may be an issue).
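To show one possible outcome, here is a minimal sketch of such a guarded list. GuardedList is a made-up class built on AbstractList, whose inherited addAll simply calls add for each element. With this particular implementation, 3 is added before -4 is rejected and 5 never arrives; another implementation could just as legitimately validate everything up front and add nothing.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical guarded list; only insertion is supported (set/remove are left unimplemented).
class GuardedList<E> extends AbstractList<E> {
    private final List<E> delegate = new ArrayList<>();
    private final Predicate<? super E> guard;

    GuardedList(Predicate<? super E> guard) {
        this.guard = guard;
    }

    @Override
    public E get(int index) {
        return delegate.get(index);
    }

    @Override
    public int size() {
        return delegate.size();
    }

    // add(E) and addAll(Collection) inherited from AbstractList/AbstractCollection
    // end up here, one element at a time.
    @Override
    public void add(int index, E element) {
        if (!guard.test(element)) {
            throw new IllegalArgumentException("Element rejected by guard: " + element);
        }
        delegate.add(index, element);
    }
}
```

After `addAll(Arrays.asList(3, -4, 5))` throws on a list already holding 1 and 2, this version is left containing 1, 2 and 3, which is exactly the kind of partial state the Collection contract leaves unspecified.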
All this might become even trickier when the condition changes after the collection has been created (regardless of whether it is only a view or not). For example, one could imagine a sequence of calls like this:
List<Integer> listWithPredicate = create(predicate);
listWithPredicate.add(-1); // Fine
someMethod();
listWithPredicate.add(-1); // Throws
Where in someMethod(), there is an innocent line like
predicate.setForbiddingNegatives(true);
One of the comments already mentioned possible performance issues. This is certainly true, but I think that this is not really a strong technical argument: There are no formal complexity guarantees for the runtime of any method of the Collection interface, anyhow. You don't know how long a collection.add(e) call takes. For a LinkedList it is O(1), but for a TreeSet it is O(log n) (and who knows what n is at this point in time).
Maybe the performance issue and the possible inconsistencies can be considered as special cases of a more general statement:
Such a collection would basically allow arbitrary code to be executed during many operations, depending on the implementation of the predicate.
This may literally have arbitrary implications, and makes reasoning about algorithms, performance and the exact behavior (in terms of consistency) impossible.
The bottom line is: There are many possible reasons to not use such a collection. But I can't think of a strong and general technical reason. So there may be application cases for such a collection, but the caveats should be kept in mind, considering how exactly such a collection is intended to be used.
I would say that such a collection would have too many responsibilities and violate SRP.
The main issue I see here is the readability and maintainability of the code that uses the collection. Suppose you have a collection to which you allow adding only positive integers (Collection<Integer>) and you use it throughout the code. Then the requirements change and you are only allowed to add odd positive integers to it. Because there are no compile time checks, it would be much harder for you to find all the occurrences in the code where you add elements to that collection than it would be if you had a separate wrapper class which encapsulates the collection.
Although of course not even close to such an extreme, it bears some resemblance to using Object reference for all objects in the application.
The better approach is to utilize compile time checks and follow the well-established OOP principles like type safety and encapsulation. That means creating a separate wrapper class or creating a separate type for collection elements.
For example, if you really want to make quite sure that you only work with positive integers in a context, you could create a separate type PositiveInteger extends Number and then add them to a Collection<PositiveInteger>. This way you get compile time safety and converting PositiveInteger to OddPositiveInteger requires much less effort.
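A minimal sketch of what such a type could look like follows. The name PositiveInteger comes from the paragraph above; the factory method `of` and everything else is my own assumption about how one might write it.

```java
// Hypothetical value type: an instance can only exist if the invariant holds.
final class PositiveInteger extends Number {
    private static final long serialVersionUID = 1L;
    private final int value;

    private PositiveInteger(int value) {
        this.value = value;
    }

    // The single place where the constraint is checked.
    static PositiveInteger of(int value) {
        if (value <= 0) {
            throw new IllegalArgumentException("Not positive: " + value);
        }
        return new PositiveInteger(value);
    }

    @Override public int intValue() { return value; }
    @Override public long longValue() { return value; }
    @Override public float floatValue() { return value; }
    @Override public double doubleValue() { return value; }

    @Override
    public boolean equals(Object other) {
        return other instanceof PositiveInteger && ((PositiveInteger) other).value == value;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }
}
```

A `Collection<PositiveInteger>` then needs no runtime guard of its own: the compiler rejects a plain Integer, and the check runs once, when the value is created.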
Enums are an excellent example of preferring dedicated types vs runtime-constrained values (constant strings or integers).
Iterator Pattern Definition: Provides a way to access the elements of an aggregate object sequentially without exposing its underlying representation. Wiki
What are the consequences of exposing the underlying representation?
To provide a more detailed answer: How is the iterator pattern preventing this?
As per: http://www.oodesign.com/iterator-pattern.html
The idea of the iterator pattern is to take the responsibility of accessing and passing through the objects of the collection and put it in the iterator object. The iterator object will maintain the state of the iteration, keeping track of the current item and having a way of identifying what elements are next to be iterated.
A few benefits that you can get from this pattern:
Using the Iterator pattern, the code designer can decide whether to allow one-way iteration (using next() only) or to allow reverse iteration as well (using previous() as in ListIterator).
Whether to allow object removal or not, if yes then how.
Maintain internal housekeeping when object is removed.
It allows you to expose common mechanism of traversing a collection rather than expecting your clients to understand underlying collections.
If the underlying representation were exposed, client code could couple to it. Then:
If the representation changes, it may be necessary to change all the code coupling to it.
If you want to iterate over a different type of container, it may be necessary to change the code coupling to the old container.
Data abstraction makes code more resilient to a change in the representation.
In short: all the code relying on the underlying representation will have to be changed if you decide to change the representation.
E.g., you decided to use TreeMap at first, but then you don't need ordering anymore (in most cases), so you change to HashMap. Somebody is looping through your map expecting to get an increasing list, and that code silently breaks.
Using iterator pattern, you could always give the user the ability to loop through something with a certain logic (or just random, which is a kind of logic) without knowing what it is under the hood.
Now, if you use HashMap instead of TreeMap, you could expose a sorted view to the user. If you provide this SortedIterator and tell user "using this will guarantee the result to be sorted, but I can't tell you anything about what's underneath", you can change the representation to be whatever you like, as long as the contract of this SortedIterator is maintained by you.
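Roughly, that idea could look like the sketch below. The Registry class is invented for illustration: callers only see an Iterator with a "sorted keys" contract, while the field behind it happens to be a HashMap today and could be swapped for a TreeMap without touching client code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical container that promises sorted iteration but hides its representation.
class Registry implements Iterable<String> {
    // Implementation detail; clients never see this type.
    private final Map<String, Integer> entries = new HashMap<>();

    void put(String key, int value) {
        entries.put(key, value);
    }

    // Contract: keys are returned in ascending order, whatever the storage is.
    @Override
    public Iterator<String> iterator() {
        List<String> keys = new ArrayList<>(entries.keySet());
        Collections.sort(keys);
        // Read-only snapshot, so callers cannot modify the registry through the iterator.
        return Collections.unmodifiableList(keys).iterator();
    }
}
```

The sort on every call is the price of keeping the promise with a HashMap underneath; whether that is acceptable depends on how often clients iterate.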
Is there any practical difference between a Set and Collection in Java, besides the fact that a Collection can include the same element twice? They have the same methods.
(For example, does Set give me more options to use libraries which accept Sets but not Collections?)
edit: I can think of at least 5 different situations to judge this question. Can anyone else come up with more? I want to make sure I understand the subtleties here.
designing a method which accepts an argument of Set or Collection. Collection is more general and accepts more possibilities of input. (if I'm designing a specific class or interface, I'm being nicer to my consumers and stricter on my subclassers/implementers if I use Collection.)
designing a method which returns a Set or Collection. Set offers more guarantees than Collection (even if it's just the guarantee not to include one element twice). (if I'm designing a specific class or interface, I'm being nicer to my consumers and stricter on my subclassers/implementers if I use Set.)
designing a class that implements the interface Set or Collection. Similar issues as #2. Users of my class/interface get more guarantees, subclassers/implementers have more responsibility.
designing an interface that extends the interface Set or Collection. Very similar to #3.
writing code that uses a Set or Collection. Here I might as well use Set; the only reasons for me to use Collection is if I get back a Collection from someone else's code, or if I have to handle a collection that contains duplicates.
Collection is also the supertype of List, Queue, Deque, and others, so it gives you more options. For example, I try to use Collection as a parameter to library methods that shouldn't explicitly depend on a certain type of collection.
Generally, you should use the right tool for the job. If you don't want duplicates, use Set (or SortedSet if you want ordering, or LinkedHashSet if you want to maintain insertion order). If you want to allow duplicates, use List, and so on.
I think you already have it figured out- use a Set when you want to specifically exclude duplicates. Collection is generally the lowest common denominator, and it's useful to specify APIs that accept/return this, which leaves you room to change details later on if needed. However if the details of your application require unique entries, use Set to enforce this.
Also worth considering is whether order is important to you; if it is, use List, or LinkedHashSet if you care about order and uniqueness.
See Java's Collection tutorial for a good walk-through of Collection usage. In particular, check out the class hierarchy.
As @mmyers states, Collection includes Set, as well as List.
When you declare something as a Set, rather than a Collection, you are saying that the variable cannot be a List or a Map. It will always be a Collection, though. So, any function that accepts a Collection will accept a Set, but a function that accepts a Set cannot take a Collection (unless you cast it to a Set).
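As a small sketch of that asymmetry (class and method names are made up), a method can accept the general Collection, so callers may pass a List or a Set, while returning Set to advertise the no-duplicates guarantee:

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: liberal in what it accepts, specific in what it returns.
class Tags {

    // Accepts any Collection; the Set return type tells callers there are no duplicates.
    static Set<String> distinctLowerCase(Collection<String> input) {
        Set<String> result = new LinkedHashSet<>(); // keeps first-seen order
        for (String tag : input) {
            result.add(tag.toLowerCase(Locale.ROOT));
        }
        return result;
    }
}
```

Passing a List such as `Arrays.asList("A", "b", "a")` works without a cast and yields a set with the two elements "a" and "b".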
One other thing to consider... Sets have extra overhead in time, memory, and coding in order to guarantee that there are no duplicates. (Time and memory because sets are usually backed by a HashMap or a Tree, which adds overhead over a list or an array. Coding because you have to implement the hashCode() and equals() methods.)
I usually use sets when I need a fast implementation of contains() and use Collection or List otherwise, even if the collection shouldn't have duplicates.
You should use a Set when that is what you want.
For example, a List without any order or duplicates. Methods like contains are quite useful.
A collection is much more generic. I believe that what mmyers wrote on their usage says it all.
The practical difference is that Set enforces set logic, i.e. no duplicates and no guaranteed order, while Collection does not. So if you need a Collection and you have no particular requirement for avoiding duplicates, then use a Collection. If you have a requirement for Set, then use Set. Generally use the highest interface possible.
As Collection is a supertype of Set and SortedSet, these can be passed to a method which expects a Collection. Collection just means it may or may not be sorted, ordered, or allow duplicates.
Why is Java Vector considered a legacy class, obsolete or deprecated?
Isn't its use valid when working with concurrency?
And if I don't want to manually synchronize objects and just want to use a thread-safe collection without needing to make fresh copies of the underlying array (as CopyOnWriteArrayList does), then is it fine to use Vector?
What about Stack, which is a subclass of Vector, what should I use instead of it?
Vector synchronizes on each individual operation. That's almost never what you want to do.
Generally you want to synchronize a whole sequence of operations. Synchronizing individual operations is both less safe (if you iterate over a Vector, for instance, you still need to take out a lock to prevent anyone else from changing the collection at the same time, which would cause a ConcurrentModificationException in the iterating thread) and slower (why take out a lock repeatedly when once will be enough?).
Of course, it also has the overhead of locking even when you don't need to.
Basically, it's a very flawed approach to synchronization in most situations. As Mr Brian Henk pointed out, you can decorate a collection using the calls such as Collections.synchronizedList - the fact that Vector combines both the "resized array" collection implementation with the "synchronize every operation" bit is another example of poor design; the decoration approach gives cleaner separation of concerns.
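A minimal sketch of what "synchronize a whole sequence" means in practice (the helper names are made up): both a check-then-add and an iteration need one lock held across the entire sequence. This works for a Vector and for a list from Collections.synchronizedList, since both lock on the list object itself.

```java
import java.util.List;

// Hypothetical helpers; 'list' is assumed to be a Vector or a Collections.synchronizedList wrapper.
class CompoundOperations {

    // contains() and add() are each synchronized already, but another thread could
    // slip in between them, so the pair is guarded by one lock.
    static <T> boolean addIfAbsent(List<T> list, T value) {
        synchronized (list) {
            if (list.contains(value)) {
                return false;
            }
            return list.add(value);
        }
    }

    // Iteration is a sequence of calls, so it needs the lock for its whole duration.
    static int sum(List<Integer> list) {
        synchronized (list) {
            int total = 0;
            for (int v : list) {
                total += v;
            }
            return total;
        }
    }
}
```

Inside these blocks the per-call locking of Vector buys nothing extra, which is the overhead the answer above is talking about.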
As for a Stack equivalent - I'd look at Deque/ArrayDeque to start with.
Vector was part of 1.0 -- the original implementation had two drawbacks:
1. Naming: vectors are really just lists which can be accessed as arrays, so it should have been called ArrayList (which is the Java 1.2 Collections replacement for Vector).
2. Concurrency: All of the get(), set() methods are synchronized, so you can't have fine grained control over synchronization.
There is not much difference between ArrayList and Vector, but you should use ArrayList.
From the API doc.
As of the Java 2 platform v1.2, this
class was retrofitted to implement the
List interface, making it a member of
the Java Collections Framework. Unlike
the new collection implementations,
Vector is synchronized.
Besides the already stated answers about using Vector, Vector also has a bunch of methods around enumeration and element retrieval which differ from the List interface, and developers (especially those who learned Java before 1.2) tend to use them if they are in the code. Although Enumerations are faster, they don't check whether the collection was modified during iteration, which can cause issues, and given that Vector might be chosen for its synchronization, with the attendant access from multiple threads, this makes it a particularly pernicious problem. Usage of these methods also couples a lot of code to Vector, such that it won't be easy to replace it with a different List implementation.
You can use the synchronizedCollection/synchronizedList methods in java.util.Collections to get a thread-safe collection from a non-thread-safe one.
java.util.Stack inherits the synchronization overhead of java.util.Vector, which is usually not justified.
It inherits a lot more than that, though. The fact that java.util.Stack extends java.util.Vector is a mistake in object-oriented design. Purists will note that it also offers a lot of methods beyond the operations traditionally associated with a stack (namely: push, pop, peek, size). It's also possible to do search, elementAt, setElementAt, remove, and many other random-access operations. It's basically up to the user to refrain from using the non-stack operations of Stack.
For these performance and OOP design reasons, the JavaDoc for java.util.Stack recommends ArrayDeque as the natural replacement. (A deque is more than a stack, but at least it's restricted to manipulating the two ends, rather than offering random access to everything.)
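For completeness, here is a short sketch of ArrayDeque used purely as a stack, through the Deque interface (the reverse method is just an invented example task):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical demo: push/pop on a Deque give LIFO behaviour without Vector's baggage.
class DequeAsStack {

    static String reverse(String text) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : text.toCharArray()) {
            stack.push(c);                 // adds at the head
        }
        StringBuilder reversed = new StringBuilder();
        while (!stack.isEmpty()) {
            reversed.append(stack.pop());  // removes from the head, last in first out
        }
        return reversed.toString();
    }
}
```

Keep in mind that ArrayDeque is unsynchronized and does not accept null elements, so it is not a drop-in replacement where code relied on either property of Stack.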