I came to know that in Java, LinkedList class implements both Deque and List interfaces.
And this was somewhat confusing to me.
In computer science syllabus, I was never taught that queue can be a list, or more precisely queue can behave like a list. That is, there is stuff that lists can do, but queues can't. But the list can behave like a queue. For example, List interface has the following methods:
add(E e)
add(int index, E element)
But Queue has only the following:
add(E e)
So clearly Queue is not allowed to insert at specific index, which is allowed in List. The same is the case with other operations like Queue.remove() vs. List.remove(int index), List.get(int index) vs. Queue.peek().
In other words, list is a more generalized data structure and can emulate Queue.
Now being capable of emulating is different from having a contract subset. That is, Queue disallows certain operations (indexing) of List and allows certain operations to be done only in a particular manner (insert only at the tail and remove only from the head). So Queue does not really do "addition" to the contracts of List. That is precisely why Queue does not extend List in the Java collections framework, but both extend the Collection interface. I believe that is also why it's incorrect for any class to implement both, as Queue's contract conflicts with the contract of List (which is why they fork out from the Collection interface separately). However, LinkedList implements both interfaces.
I also came across this answer:
The LinkedList implementation happens to satisfy the Deque contract, so why not make it implement the interface?
I still don't get how we can say "LinkedList implementation happens to satisfy the Deque contract". The concept of a queue does not allow insertion at an arbitrary index. Hence, the Queue interface does not have such methods.
However we can only enforce contracts through interfaces and cannot disallow implementation of certain methods. Being list (having "List" in its name), I feel it's not correct to have queue methods peek(), pop() and add(int index, E element) in LinkedList.
I believe, instead we should have separate class LinkedQueue which can have linked implementation for queue, similar to LinkedBlockingQueue which contains linked implementation of BlockingQueue.
Also note that LinkedList is the only class which inherits from both families of lists and queues, that is, there is no other class which implements both List and Queue (AFAIK). Can this be indication that LinkedList is ill done?
Am I plain wrong and thinking unnecessarily?
You're entirely missing the point of programming to interface.
If you need a Queue, you never write:
LinkedList<String> queue = new LinkedList<>();
Because, you're right, that would allow you to use non-queue methods. Instead, you program to the interface like this:
Queue<String> queue = new LinkedList<>();
Now you only have access to the 6 Queue methods (and all the Collection methods). So, even though LinkedList implements more methods, you no longer have access to them.
So, if you need a Queue, you choose the implementation of the Queue interface that best suits the performance, storage, and access characteristics you need, e.g.
LinkedList uses more memory, but it shrinks when queue is emptied.
ArrayDeque uses less memory, but it doesn't shrink.
PriorityQueue is a non-FIFO queue with element priority.
ConcurrentLinkedQueue and ConcurrentLinkedDeque support multi-threaded concurrent access.
and more...
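As a rough sketch of the point above: code written against Queue does not care which implementation it receives. The class name QueueChoiceDemo and the helper fillAndDrain are made up for this illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;

class QueueChoiceDemo {
    // Hypothetical helper: only Queue/Collection methods are visible here,
    // whatever the concrete class behind the reference is.
    static List<String> fillAndDrain(Queue<String> queue) {
        queue.add("c");
        queue.add("a");
        queue.add("b");
        List<String> drained = new ArrayList<>();
        while (!queue.isEmpty()) {
            drained.add(queue.remove()); // always removes the head
        }
        return drained;
    }

    public static void main(String[] args) {
        System.out.println(fillAndDrain(new LinkedList<>()));    // FIFO: [c, a, b]
        System.out.println(fillAndDrain(new ArrayDeque<>()));    // FIFO: [c, a, b]
        System.out.println(fillAndDrain(new PriorityQueue<>())); // priority order: [a, b, c]
    }
}
```

Swapping the implementation is a one-word change at the call site; the helper itself never changes.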
I was never taught that queue can be a list, or more precisely queue can behave like a list.
Remember that "implements" defines a "behaves like" relationship. A LinkedList behaves like a List. A LinkedList behaves like a Deque. A LinkedList behaves like a Queue.
But just because LinkedList behaves like all of those, doesn't mean that List behaves like a Queue or that Queue behaves like a List. They do not.
The "behaves like" relation only goes one way.
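To make the one-way relation concrete, here is a small sketch (ViewsDemo is a made-up name) in which a single LinkedList object is seen through three interface-typed references. Each reference exposes only its own interface's operations.

```java
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

class ViewsDemo {
    static String run() {
        LinkedList<String> linked = new LinkedList<>();
        List<String> asList = linked;   // same object, List view
        Queue<String> asQueue = linked; // same object, Queue view
        Deque<String> asDeque = linked; // same object, Deque view

        asQueue.add("b");      // Queue: insert at the tail
        asDeque.addFirst("a"); // Deque: insert at the head
        asList.add(1, "x");    // List: insert at an index

        // asQueue.get(0);     // does not compile: Queue has no indexed access
        // asList.peek();      // does not compile: List has no queue operations
        return linked.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // [a, x, b]
    }
}
```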
@Andreas's answer is excellent, so mine targets your arguments about what you were or were not taught:
In computer science syllabus, I was never taught that queue can be a list or more precisely queue can behave like a list
A queue is not just any list, but a special kind of list, with its own special properties and constraints.
That is, there is stuff that lists can do, but queues can't.
No, List can do nothing. It provides possibilities to be implemented by a class and if that class decides to implement them then that class can do all that stuff.
But the list can behave like a queue.
No, List does not behave; it only suggests behaviors and classes that implement it can accept all or a subset of them or they can define new ones.
LinkedList is a class that implements both List and Deque interfaces. Each one of these interfaces defines a contract with operations, and in these contracts it is specified what these operations must do. However, it is not specified how these operations are supposed to work.
LinkedList is a class that implements both List and Deque interfaces. So, despite the suffix List is part of the name of the LinkedList class, LinkedList is actually both a List and a Deque, because it implements all of the operations that are defined in the List and Deque interfaces.
So LinkedList is a List, and it is also a Deque. This doesn't mean that a List should be a Deque, or that a Deque should be a List.
For example, look at the following interfaces:
public interface BloodDrinker {
void drinkBlood();
}
public interface FlyingInsect {
void flyAround();
}
Each one of the interfaces above has a single operation and defines a contract. The drinkBlood operation defines what a BloodDrinker must do, but not how. Same applies for a FlyingInsect: its flyAround operation defines what it must do, but not how.
Now consider the Mosquito class:
public class Mosquito implements FlyingInsect, BloodDrinker {
public void flyAround() {
// fly by moving wings,
// buzzing and bothering everyone around
}
public void drinkBlood() {
// drink blood by biting other animals:
// suck their blood and inject saliva
}
}
Now, this means that a Mosquito is both a FlyingInsect and a BloodDrinker, but why would a blood drinker necessarily be a flying insect, or a flying insect necessarily be a blood drinker? For example, vampires are blood drinkers, but not flying insects, while butterflies are flying insects, but not blood drinkers.
Now, with regard to your argument about Queue disallowing certain List's operations (indexing), and only allowing addition/removal on its ends in a FIFO fashion... I don't think this rationale is correct, at least in the context of the Java Collections Framework. The contract of Deque doesn't explicitly mention that implementors will never ever be able to add/remove/check elements at any given index. It just says that a Deque is:
A linear collection that supports element insertion and removal at both ends.
And it also says that:
This interface defines methods to access the elements at both ends of the deque.
(Emphasis mine).
A few paragraphs later, it does explicitly say that:
Unlike the List interface, this interface does not provide support for indexed access to elements.
(Emphasis mine again).
The key part here is does not provide support. It never forbids implementors to access elements via indexes. It's just that indexed access is not supported through the Deque interface.
Think of my example above: why would a BloodDrinker disallow its implementors to drink something other than blood, e.g. water? Or why would a FlyingInsect disallow its implementors to move in a way different than flying, e.g. walking?
Bottom line, an implementation can adhere to as many contracts as it wishes, as long as these contracts don't contradict each other. And as it's worded in Java (a very careful and subtle wording, I must admit), Deque's contract doesn't contradict List's contract, so there can perfectly exist a class that implements both interfaces, and this happens to be LinkedList.
You are starting from a weak premise:
I was never taught that queue can be a list.
Let us go back to the basics. So what are data structures anyway? Here is how CLRS approaches that question [1]:
...Whereas mathematical sets are unchanging, the sets manipulated by algorithms can grow, shrink, or otherwise change over time.
Mathematically, data structures are just sets; dynamic sets. In that sense, a queue can be a list. In fact, there is a problem in CLRS (10.2-3) that explicitly asks you to implement a queue by using a linked list.
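For what it's worth, a queue built on a singly linked list can be sketched in a few lines. This is only an illustration of the textbook idea, not the JDK's LinkedList; the class name LinkedQueue and its method names are my own.

```java
import java.util.NoSuchElementException;

class LinkedQueue<E> {
    private static final class Node<E> {
        final E value;
        Node<E> next;
        Node(E value) { this.value = value; }
    }

    private Node<E> head; // next element to dequeue
    private Node<E> tail; // most recently enqueued element
    private int size;

    // O(1): link the new node behind the current tail.
    void enqueue(E value) {
        Node<E> node = new Node<>(value);
        if (tail == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
        size++;
    }

    // O(1): unlink the head node.
    E dequeue() {
        if (head == null) {
            throw new NoSuchElementException("queue is empty");
        }
        E value = head.value;
        head = head.next;
        if (head == null) {
            tail = null; // queue became empty
        }
        size--;
        return value;
    }

    int size() { return size; }

    public static void main(String[] args) {
        LinkedQueue<String> queue = new LinkedQueue<>();
        queue.enqueue("first");
        queue.enqueue("second");
        System.out.println(queue.dequeue()); // first
        System.out.println(queue.dequeue()); // second
    }
}
```

Note that this class exposes no indexed access at all, which is the "strict" queue the question has in mind.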
On the other hand, object-oriented programming is a paradigm that helps programmers solve problems by adhering to a certain philosophy about the problem and the data. Objects, interfaces, and contracts are all part of this philosophy. Using this paradigm helps us implement the abstract concept of dynamic sets. However, it comes with its own baggage, one of them being the very problem asked about here.
So if you are complaining that the data structures in the Java standard library do not strictly adhere to the conventions defined for elementary data structures, you are right. In fact, we do not even need to look further than java.util.Stack to see this [2]. You are also allowed to roll your own implementation in any way you want and use it instead of the standard library collections.
But to argue that Java, or its standard library for that matter, is broken or ill-done (an extraordinary claim), you need to be very specific about the use case and clearly demonstrate how the alleged flaw in the library prevents you from achieving the design goals.
[1] Introduction to Chapter III, p. 220
[2] Sedgewick and Wayne call java.util.Stack a "wide interface" (p. 160) because it allows random access to stack elements, something a stack, as defined in elementary data structures, is not supposed to be capable of.
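A quick way to see the "wide interface" of java.util.Stack for yourself (WideStackDemo is just a name for this sketch): because Stack extends Vector, all the List-style indexed operations are available on it.

```java
import java.util.Stack;

class WideStackDemo {
    static Stack<String> sample() {
        Stack<String> stack = new Stack<>();
        stack.push("bottom");
        stack.push("middle");
        stack.push("top");
        return stack;
    }

    public static void main(String[] args) {
        Stack<String> stack = sample();
        System.out.println(stack.peek()); // top
        System.out.println(stack.get(0)); // bottom, read without popping anything
        stack.add(1, "intruder");         // insert into the middle of the "stack"
        System.out.println(stack);        // [bottom, intruder, middle, top]
    }
}
```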
You are entirely correct in this and not missing the point at all. Java simply made a trade-off between correctness and ease. Making it implement both interfaces was the easy thing to do and the one that was the most useful for developers.
What subtyping means
Correct (sound) subtyping requires substitutability, which, according to the Liskov substitution principle (LSP), requires that:
Invariants of the supertype must be preserved in a subtype.
When we say in type theory "A LinkedList is a List and a Queue" we are actually saying that a LinkedList is both a list and a queue at the same time and not that a LinkedList can be thought of as either a list or a queue.
There is a violated invariant of a queue type here (that you cannot modify elements in its middle) so it is incorrect subtyping.
The actual argument that should be had is "whether or not a queue requires that elements can't be modified in the middle or only that they can be modified in the ends in FIFO".
One might argue that the invariants of a Queue are only that you can use it in a FIFO manner and not that you must. That is not the common interpretation of a queue.
The question is framed for List but easily applies to others in the java collections framework.
For example, I would say it is certainly appropriate to make a new List sub-type to store something like a counter of additions since it is an integral part of the list's operation and doesn't alter that it "is a list". Something like this:
public class ChangeTrackingList<E> extends ArrayList<E> {
private int changeCount;
...
@Override public boolean add(E e) {
changeCount++;
return super.add(e);
}
// other methods likewise overridden as appropriate to track change counter
}
However, what about adding additional functionality out of the knowledge of a list ADT, such as storing arbitrary data associated with a list element? Assuming the associated data was properly managed when elements are added and removed, of course. Something like this:
public class PayloadList<E> extends ArrayList<E> {
private Object[] payload;
...
public void setData(int index, Object data) {
... // manage 'payload' array
payload[index] = data;
}
public Object getData(int index) {
... // manage 'payload' array, error handling, etc.
return payload[index];
}
}
In this case I have altered that it is "just a list" by adding not only additional functionality (behavior) but also additional state. Certainly part of the purpose of type specification and inheritance, but is there an implicit restriction (taboo, deprecation, poor practice, etc.) on Java collections types to treat them specially?
For example, when referencing this PayloadList as a java.util.List, one will mutate and iterate as normal and ignore the payload. This is problematic when it comes to something like persistence or copying which does not expect a List to carry additional data to be maintained. I've seen many places that accept an object, check to see that it "is a list" and then simply treat it as java.util.List. Should they instead allow arbitrary application contributions to manage specific concrete sub-types?
Since this implementation would constantly produce issues in instance slicing (ignoring sub-type fields) is it a bad idea to extend a collection in this way and always use composition when there is additional data to be managed? Or is it instead the job of persistence or copying to account for all concrete sub-types including additional fields?
This is purely a matter of opinion, but personally I would advise against extending classes like ArrayList in almost all circumstances, and favour composition instead.
Even your ChangeTrackingList is rather dodgy. What does
list.addAll(Arrays.asList("foo", "bar"));
do? Does it increment changeCount twice, or not at all? It depends on whether ArrayList.addAll() calls add(), which is an implementation detail you should not have to worry about. You would also have to keep your methods in sync with the ArrayList methods. For example, suppose addAll(Collection<? extends E> collection) were implemented on top of add(), and a future release decided to check first whether collection instanceof ArrayList and, if so, use System.arraycopy to copy the data directly. You would then have to change your addAll() method to increment changeCount by collection.size() only if the collection is an ArrayList (otherwise it gets done in add()).
Also if a method is ever added to List (this happened with forEach() and stream() for example) this would cause problems if you were using that method name to mean something else. This can happen when extending abstract classes too, but at least an abstract class has no state, so you are less likely to be able to cause too much damage by overriding methods.
I would still use the List interface, and ideally extend AbstractList. Something like this
public final class PayloadList<E> extends AbstractList<E> implements RandomAccess {
private final ArrayList<E> list;
private final Object[] payload;
// details missing
}
That way you have a class that implements List and makes use of ArrayList without you having to worry about implementation details.
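Here is one way the same composition idea could look for the change counter from the question. It is a sketch under my own assumptions (a private ArrayList delegate and a changeCount() accessor), not a drop-in replacement. It relies on the documented behaviour that AbstractList.add(E) calls add(int, E) and that AbstractCollection.addAll() calls add(E) for each element, so every insertion passes through one method.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChangeTrackingList<E> extends AbstractList<E> {
    private final List<E> delegate = new ArrayList<>(); // composition, not inheritance
    private int changeCount;

    @Override public E get(int index) { return delegate.get(index); }

    @Override public int size() { return delegate.size(); }

    // Every add(E), add(int, E) and addAll(...) inherited from AbstractList ends up here.
    @Override public void add(int index, E element) {
        delegate.add(index, element);
        changeCount++;
        modCount++; // keeps the inherited iterators fail-fast
    }

    @Override public E set(int index, E element) {
        E previous = delegate.set(index, element);
        changeCount++;
        return previous;
    }

    @Override public E remove(int index) {
        E removed = delegate.remove(index);
        changeCount++;
        modCount++;
        return removed;
    }

    int changeCount() { return changeCount; }

    public static void main(String[] args) {
        ChangeTrackingList<String> list = new ChangeTrackingList<>();
        list.addAll(Arrays.asList("foo", "bar"));
        System.out.println(list + " changes=" + list.changeCount()); // [foo, bar] changes=2
    }
}
```

The count no longer depends on how ArrayList happens to implement addAll().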
(By the way, in my opinion, the class AbstractList is amazing. You only have to override get() and size() to have a functioning List implementation and methods like containsAll(), toString() and stream() all just work.)
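To illustrate how little AbstractList asks for, here is a tiny read-only list (Squares is a made-up example class) that overrides only get() and size() and gets the rest for free.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

final class Squares extends AbstractList<Integer> {
    private final int count;

    Squares(int count) { this.count = count; }

    @Override public Integer get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return index * index; // computed on demand, nothing is stored
    }

    @Override public int size() { return count; }

    public static void main(String[] args) {
        List<Integer> squares = new Squares(4);
        System.out.println(squares);                                  // [0, 1, 4, 9]
        System.out.println(squares.containsAll(Arrays.asList(1, 9))); // true
        System.out.println(squares.stream().mapToInt(Integer::intValue).sum()); // 14
    }
}
```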
One aspect you should consider is that all classes that inherit from AbstractList are value classes. That means that they have meaningful equals(Object) and hashCode() methods, therefore two lists are judged to be equal even if they are not the same instance of any class.
Furthermore, the equals() contract from AbstractList allows any list to be compared with the current list - not just a list with the same implementation.
Now, if you add a value item to a value class when you extend it, you need to include that value item in the equals() and hashCode() methods. Otherwise you will allow two PayloadList lists with different payloads to be considered "the same" when somebody uses them in a map, a set, or just a plain equals() comparison in any algorithm.
But in fact, it's impossible to extend a value class by adding a value item! You'll end up breaking the equals() contract, either by breaking symmetry (a plain ArrayList containing [1,2,3] will return true when compared with a PayloadList containing [1,2,3] with a payload of [a,b,c], but the reverse comparison won't return true) or by breaking transitivity.
This means that basically, the only proper way to extend a value class is by adding non-value behavior (e.g. a list that logs every add() and remove()). The only way to avoid breaking the contract is to use composition. And it has to be composition that does not implement List at all (because again, other lists will accept anything that implements List and gives the same values when iterating it).
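The symmetry problem can be reproduced in a few lines. The PayloadList below is a simplified, hypothetical version of the one in the question that includes its payload in equals().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BrokenEqualsDemo {
    // Hypothetical subclass that adds a value component and compares it in equals().
    static final class PayloadList<E> extends ArrayList<E> {
        final List<Object> payload = new ArrayList<>();

        @Override public boolean equals(Object o) {
            if (!(o instanceof PayloadList)) {
                return false;
            }
            PayloadList<?> other = (PayloadList<?>) o;
            return super.equals(other) && payload.equals(other.payload);
        }

        @Override public int hashCode() {
            return 31 * super.hashCode() + payload.hashCode();
        }
    }

    // Returns { plain.equals(withPayload), withPayload.equals(plain) }.
    static boolean[] compare() {
        List<Integer> plain = new ArrayList<>(Arrays.asList(1, 2, 3));
        PayloadList<Integer> withPayload = new PayloadList<>();
        withPayload.addAll(Arrays.asList(1, 2, 3));
        withPayload.payload.addAll(Arrays.asList("a", "b", "c"));
        return new boolean[] { plain.equals(withPayload), withPayload.equals(plain) };
    }

    public static void main(String[] args) {
        boolean[] result = compare();
        System.out.println(result[0]); // true: ArrayList only looks at the elements
        System.out.println(result[1]); // false: PayloadList insists on a PayloadList
    }
}
```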
This answer is based on item 8 of Effective Java, 2nd Edition, by Joshua Bloch
If the class is not final, you can always extend it. Everything else is subjective and a matter of opinion.
My opinion is to favor composition over inheritance, since in the long run, inheritance produces low cohesion and high coupling, which is the opposite of a good OO design. But this is just my opinion.
The following is all just opinion; the question invites opinionated answers (I think it's borderline inappropriate for SO).
While your approach is workable in some situations, I'd argue it's bad design and very brittle. It's also pretty complicated to cover all loopholes (maybe even impossible, for example when the list is sorted by means other than List.sort).
If you require extra data to be stored and managed it would be better integrated into the list items themselves, or the data could be associated using existing collection types like Map.
If you really need an association list type, consider making it not an instance of java.util.List, but a specialized type with specialized API. That way no accidents are possible by passing it where a plain list is expected.
Is there a java collection interface that guarantees no duplicates as well as the preservation of insertion order at the same time?
This is exactly what LinkedHashSet does. However, I am wondering if there is also an interface guaranteeing the same thing, in order to avoid a direct dependency on some specific class.
SortedSet is referring only to the natural order (and is not implemented by LinkedHashSet).
Essentially, I am looking for an interface that would indicate that the iteration order of elements is significant (and at the same time it contains no duplicates, i.e., List obviously would not apply).
Thanks!
UPDATE this question is not asking for an implementation or a data structure (as in the question to which this was marked as a duplicate). As several people pointed out as clarification, I am looking for an interface that demands both properties (no duplicates and significant order) in its contract. The application for this would be that I can return objects of this type to clients without promising any specific implementation.
UPDATE 2 Moreover, the related question specifically asks for preserving duplicates in contrast to this question. So I am pretty certain it is not a duplicate.
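Before Java 21, no interface in the JDK collections provided that. Since Java 21, java.util.SequencedSet (which LinkedHashSet implements) describes a Set with a well-defined encounter order, although it does not promise insertion order specifically.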
You could try to build it by combining Set and List. Any collection implementing Set should not allow duplicate elements, and any collection implementing List should maintain order.
But then, no class in the JDK collections implements both Set and List, because unfortunately LinkedHashSet does not implement List.
Of course, you could build one implementation easily by wrapping a LinkedHashSet (by the composition pattern, not by derivation) and adding a get(int i) method, or by wrapping an ArrayList (again by composition) and throwing an IllegalArgumentException when trying to add a duplicate element.
The trickiest part IMHO would be the addAll method, as both interfaces define it with different semantics (emphasis mine):
Set: Adds all of the elements in the specified collection to this set if they're not already present
List : Appends all of the elements in the specified collection to the end of this list, in the order that they are returned by the specified collection's iterator
As you cannot meet both requirements if the source collection contains duplicates, my advice would be that addAll throws an IllegalArgumentException in that case, or more simply that it always throws an UnsupportedOperationException, as addAll is an optional operation for both interfaces.
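A minimal sketch of the "wrap an ArrayList and reject duplicates" variant could look like the following. UniqueList is a hypothetical name, and for simplicity it implements only List (not Set) with set-like uniqueness. Only the single-argument addAll() is checked up-front here.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UniqueList<E> extends AbstractList<E> {
    private final List<E> order = new ArrayList<>(); // preserves insertion order
    private final Set<E> seen = new HashSet<>();     // fast duplicate check

    @Override public E get(int index) { return order.get(index); }

    @Override public int size() { return order.size(); }

    @Override public void add(int index, E element) {
        if (seen.contains(element)) {
            throw new IllegalArgumentException("duplicate element: " + element);
        }
        order.add(index, element); // throws before 'seen' is touched if index is bad
        seen.add(element);
        modCount++;
    }

    @Override public E remove(int index) {
        E removed = order.remove(index);
        seen.remove(removed);
        modCount++;
        return removed;
    }

    // Validate the whole batch first so a rejected call leaves the list unchanged.
    @Override public boolean addAll(Collection<? extends E> c) {
        Set<E> batch = new HashSet<>();
        for (E e : c) {
            if (seen.contains(e) || !batch.add(e)) {
                throw new IllegalArgumentException("duplicate element: " + e);
            }
        }
        return super.addAll(c);
    }

    public static void main(String[] args) {
        UniqueList<String> names = new UniqueList<>();
        names.add("a");
        names.add("b");
        System.out.println(names.get(1)); // b
        try {
            names.add("a");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```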
In the JDK, there's Collections.emptyList() and Collections.emptySet(). Both are useful in their own right. But sometimes all that is needed is an empty, immutable instance of Collection. To me, there's no reason to choose one over the other as both implement all operations of Collection in an efficient way and with the same results. Yet each time I need such an empty collection I ponder which one to use for a second or two.
I do not expect to gain a deeper understanding of the collections framework from an answer to this question but maybe there's a subtle reason I could use to justify choosing one over the other without thinking about it ever again.
An answer should state at least one reason for preferring one of Collections.emptyList() and Collections.emptySet() over the other in a context where they're functionally equivalent. An answer is better if the stated reason is near the top of this list:
There's a case where the type system is happier with one over the other (e.g. type inference allows shorter code with one than the other).
There is a performance difference, maybe in some special case (e.g. if the empty collection is passed as an argument to some of the collection framework's static or instance methods like Collections.sort() or Collection.removeAll()).
Choosing one over the other "makes more sense" in the general case, if you think about it.
Examples where this question arises
To give some context, here are two examples where I am in need of an empty, unmodifiable collection.
This is an example of an API that allows creating some object by optionally specifying a collection of objects that are used in the creation. The second method just calls the first one with an empty collection:
static void createObjectWithTheseThings(Collection<Thing> things) {
...
}
static void createObjectWithoutAnyThings() {
createObjectWithTheseThings(Collections.emptyXXX());
}
This is an example of an Entity with state represented by an immutable collection stored in a non-final field. On initialization the field should be set to an empty collection:
class Example {
// Initialized to an empty collection.
private Collection<T> containedThings = Collections.emptyXXX();
...
}
Unfortunately I don't have an answer that will make the top of your priority list but if I were you I'd settle on
Collections.emptySet
Type inference was your first priority but I don't know if the choice can/should influence that given you were looking for an emptyCollection()
On the second priority, think about any api that takes in a collection which performs differently (accidentally/intentionally) based on the sub-interfaces of the concrete object passed in. Aren't they more likely to offer varied performance based on the concrete implementations (as with an ArrayList or LinkedList) instead? The empty set/list are not modeled on any empty data structures anyway; they are dummy implementations - hence no real difference
Based on java's modelling of these interfaces (which admittedly is not ideal), a Collection is very similar to a Set. In fact I think the methods are almost exactly the same. Logically too it looks OK with List being the specific-sub type that adds additional ordering concerns.
Now Collection and Set looking very similar(java-wise) brings up a question. If you are using a Collection type, it is clear it is not a list you want. Now the question is are you sure you don't mean a Set. If you don't, then are you using something like a Bag (surely there must be concrete instances which are not empty in the overall logic). So if you are concerned with say a Bag, then shouldn't it be up to the Bag api to provide an emptyBag() method? Just wondering. btw, I'd stick with emptySet() in the meantime :)
For the emptyXXX(), it really doesn't matter at all: since they are both empty (and they are unmodifiable, so they always stay empty), they will be equally suited to all operations Collection offers.
Take a look at what Collections really gives you there: Special implementations (the instances are shared across calls!). All relevant operations are dummy implementations that either return a constant result or immediately throw. Even iterator() is just a dummy with no state.
It won't make any notable difference at all.
Edit: You could say that, for the special case of emptyList/Set, they are semantically and complexity-wise the same at the Collection interface level. All operations available on Collection are implemented by emptySet/List as O(1) operations. And since they both follow the contract defined by Collection, they are semantically identical too.
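If you want to convince yourself, a small check like the one below (EmptyCollectionsDemo and rejectsAdd are names I made up) shows both behaving the same through a Collection reference. The one observable difference I am aware of is equals(): List and Set define equality differently, so the two empty instances are not equal to each other.

```java
import java.util.Collection;
import java.util.Collections;

class EmptyCollectionsDemo {
    // Hypothetical helper: true if the collection refuses modification.
    static boolean rejectsAdd(Collection<String> c) {
        try {
            c.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Collection<String> asList = Collections.emptyList();
        Collection<String> asSet = Collections.emptySet();

        System.out.println(asList.size() + " " + asSet.size());         // 0 0
        System.out.println(asList.iterator().hasNext());                // false
        System.out.println(asSet.iterator().hasNext());                 // false
        System.out.println(rejectsAdd(asList) + " " + rejectsAdd(asSet)); // true true

        // A List is only ever equal to another List, a Set only to another Set.
        System.out.println(asList.equals(asSet)); // false
    }
}
```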
The only situation I can imagine this making a difference is if the code that will use your Collection does something like this:
Collection<T> collection = ...
List<T> asAList;
if (collection instanceof List) {
asAList = (List<T>) collection;
} else {
asAList = new ArrayList<T>(collection);
}
Obviously in a case like this you would want to use emptyList(), while if the secret target type was a Set, you'd want emptySet().
Otherwise, in terms of what "makes more sense", I agree with @ac3's logic that a generic Collection is like a Bag, and an empty immutable Set and empty immutable Bag are pretty much the same thing. However, a person very used to using immutable lists might find those easier to think of.
I don't understand difference between:
ArrayList<Integer> list = new ArrayList<Integer>();
Collection<Integer> list1 = new ArrayList<Integer>();
Class ArrayList extends class which implements interface Collection, so Class ArrayList implements Collection interface. Maybe list1 allows us to use static methods from the Collection interface?
An interface has no static methods [in Java 7]. list1 lets you access only the methods in Collection, whereas list lets you access all the methods in ArrayList.
It is preferable to declare a variable with its least specific possible type. So, for example, if you change ArrayList into LinkedList or HashSet for any reason, you don't have to refactor large portions of the code (for example, client classes).
Imagine you have something like this (just for illustrational purposes, not compilable):
class CustomerProvider {
public LinkedList<Customer> getAllCustomersInCity(City city) {
// retrieve and return all customers for that city
}
}
and you later decide to implement it by returning a HashSet. Maybe there is some client class that relies on the fact that you return a LinkedList, and calls methods that HashSet doesn't have (e.g. LinkedList.getFirst()).
That's why you'd better do it like this:
class CustomerProvider {
public Collection<Customer> getAllCustomersInCity(City city) {
// retrieve and return all customers for that city
}
}
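A compilable variation on the same idea, with String standing in for Customer and City, and with invented provider and helper names: the client method keeps working whichever collection the provider decides to return.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedList;

class CustomerProviderDemo {
    // Hypothetical provider contract: promises a Collection and nothing more.
    interface CustomerProvider {
        Collection<String> getAllCustomersInCity(String city);
    }

    // Client code that depends only on what Collection guarantees.
    static int countCustomers(CustomerProvider provider, String city) {
        return provider.getAllCustomersInCity(city).size();
    }

    public static void main(String[] args) {
        CustomerProvider listBacked = city -> new LinkedList<>(Arrays.asList("Ann", "Bob"));
        CustomerProvider setBacked = city -> new HashSet<>(Arrays.asList("Ann", "Bob"));

        System.out.println(countCustomers(listBacked, "Oslo")); // 2
        System.out.println(countCustomers(setBacked, "Oslo"));  // 2
    }
}
```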
What we're dealing with here is the difference between interface and implementation.
An interface is a set of methods without any regard to how those methods are implemented. When we instantiate an object as having a type that is actually an interface, what we're saying is that it is an object that implements all of the methods in that interface... but doesn't provide us with access to any of the methods in the class that actually provides those implementations.
When you instantiate an object with the type of an implementing class, then you have access to all of relevant methods of that class. Since that class is implementing an interface, you have access to the methods specified in the interface, plus any extras provided by the implementing class.
Why would you want to do this? Well, by restricting the type of your object to the interface, you can switch in new implementations without worrying about changing the rest of your code. This makes it a whole lot more flexible.
The difference, as others have said, is that you are limited to the methods defined by the Collection interface when you specify that as your variable type. But that doesn't answer the question of why you would want to do this.
The reason is that the choice of data type provides information to the people using the code. Especially when used as the parameter or return type from a function (where outside programmers may have no access to the internals).
In order of specificity, here is what different type choices might tell you:
Collection - a group of objects, with no further guarantees. The consumer of this object can iterate over the collection (with no guarantees as to iteration order), test membership, and learn its size, but cannot access elements by position.
List - a group of objects that have a specific order. When you iterate over these objects, you will always get them in the same order. You can also retrieve specific items from the collection by index, but you cannot make any assumptions about the performance of such retrieval.
ArrayList - a group of objects that have a specific order, and may be accessed by index in constant time.
And although you didn't ask about them, here are some other collection classes:
Set a group of objects that is guaranteed to contain no duplicates per the equals() method. There are no guarantees regarding the iteration order of these objects.
SortedSet a group of objects that contains no duplicates, and will always iterate in a specific order (although that specific order is not guaranteed by the collection).
TreeSet a group of ordered objects with no duplicates, that exhibits O(logN) insert and retrieval times.
HashSet a group of objects with no duplicates, that does not have an inherent order, but provides (amortized) constant-time access.
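The different guarantees are easy to observe by feeding the same input to a few of these types. GuaranteesDemo and its method names are only for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class GuaranteesDemo {
    static final List<Integer> INPUT = Arrays.asList(3, 1, 2, 3);

    // List: keeps insertion order and duplicates.
    static List<Integer> asList() { return new ArrayList<>(INPUT); }

    // SortedSet: drops duplicates, iterates in sorted order.
    static SortedSet<Integer> asSortedSet() { return new TreeSet<>(INPUT); }

    // Set: drops duplicates, makes no promise about iteration order.
    static Set<Integer> asSet() { return new HashSet<>(INPUT); }

    public static void main(String[] args) {
        System.out.println(asList());       // [3, 1, 2, 3]
        System.out.println(asSortedSet());  // [1, 2, 3]
        System.out.println(asSet().size()); // 3
    }
}
```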
The only difference is that you're providing access to list1 through the Collection interface, whereas you provide access to list through the ArrayList class. Sometimes, providing access through a restricted interface is useful, in that it promotes encapsulation and reduces dependence on implementation details.
When you perform operations on "list1", you'll only be able to access methods from the Collection interface (add, size, etc.). By declaring "list" as an ArrayList, you gain access to additional methods only defined in the ArrayList class (ensureCapacity and trimToSize, for example).
It's typically best practice to declare the variable as the least specific class you need. So, if you only need the methods from Collection, use it. Typically in this case, that would mean using List, which lets you know it's ordered and can handle duplicates.
Using the least specific class/interface allows you to freely change the implementation later. For example, if you later learn that a LinkedList would be a better implementation to use, you could change it without breaking all your code if you define the variable to be a List.
I just learned about how the Java Collections Framework implements data structures in linked lists. From what I understand, Iterators are a way of traversing through the items in a data structure such as a list. Why is this interface used? Why are the methods hasNext(), next() and remove() not directly coded to the data structure implementation itself?
From the Java website:
public interface Iterator<E>
An iterator over a collection. Iterator takes the place of Enumeration in the Java collections framework. Iterators differ from enumerations in two ways:
Iterators allow the caller to remove elements from the underlying collection during the iteration with well-defined semantics.
Method names have been improved.
This interface is a member of the Java Collections Framework.
I tried googling around and can't seem to find a definite answer. Can someone shed some light on why Sun chose to use them? Is it because of better design? Increased security? Good OO practice?
Any help will be greatly appreciated. Thanks.
Why is this interface used?
Because it supports the basic operations that would allow a client programmer to iterate over any kind of collection (note: not necessarily a Collection in the Object sense).
Why are the methods... not directly coded to the data structure implementation itself?
They are, they're just marked private so you can't reach into them and muck with them. More specifically:
You can implement or subclass an Iterator such that it does something the standard ones don't do, without having to alter the actual object it iterates over.
Objects that can be traversed over don't need to have their interfaces cluttered up with traversal methods, in particular any highly specialized methods.
You can hand out Iterators to however many clients you wish, and each client may traverse in their own time, at their own speed.
Java Iterators from the java.util package in particular will throw an exception (ConcurrentModificationException) if the storage that backs them is structurally modified, other than through the Iterator itself, while you still have an Iterator out. This exception lets you know that the Iterator may now be returning invalid objects.
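A small sketch of the last two points, using an ArrayList. The class and method names (IndependentIterators, twoCursors, failsFast) are invented for the example. Note that the Javadoc describes this fail-fast behaviour as best-effort, so it should not be relied on for correctness.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class IndependentIterators {
    // Two iterators over the same list keep separate positions.
    static String twoCursors(List<String> items) {
        Iterator<String> first = items.iterator();
        Iterator<String> second = items.iterator();
        first.next();                 // first cursor moves past element 0
        String a = first.next();      // element 1
        String b = second.next();     // second cursor is still at element 0
        return a + "," + b;
    }

    // Returns true if the iterator notices a change made behind its back.
    static boolean failsFast(List<String> items) {
        Iterator<String> it = items.iterator();
        items.add("extra");           // structural change not made through the iterator
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```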
For simple programs, none of this probably seems worthwhile. The kind of complexity that makes them useful will come up on you quickly, though.
You ask: "Why are the methods hasNext(), next() and remove() not directly coded to the data structure implementation itself?".
The Java Collections framework chooses to define the Iterator interface as externalized to the collection itself. Normally, since every Java collection implements the Iterable interface, a Java program will call iterator() to create its own iterator so that it can be used in a loop. As others have pointed out, Java 5 lets us avoid using the iterator directly, with a for-each loop.
Externalizing the iterator to its collection allows the client to control how one iterates through a collection. One use case that I can think of where this is useful is when one has an unbounded collection such as all the web pages on the Internet to index.
In the classic GoF book, the contrast between internal and external iterators is spelled out quite clearly.
A fundamental issue is deciding which party controls the iteration, the iterator or the client that uses the iterator. When the client controls the iteration, the iterator is called an external iterator, and when the iterator controls it, the iterator is an internal iterator. Clients that use an external iterator must advance the traversal and request the next element explicitly from the iterator. In contrast, the client hands an internal iterator an operation to perform, and the iterator applies that operation to every element ....
External iterators are more flexible than internal iterators. It's easy to compare two collections for equality with an external iterator, for example, but it's practically impossible with internal iterators ... But on the other hand, internal iterators are easier to use, because they define the iteration logic for you.
For an example of how internal iterators work, see Ruby's Enumerable API, which has internal iteration methods such as each. In Ruby, the idea is to pass a block of code (i.e. a closure) to an internal iterator so that a collection can take care of its own iteration.
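The same contrast can be sketched in Java itself, assuming Java 8 or later for Iterable.forEach. The names ExternalVsInternal, sameSequence and totalLength are made up for this illustration. The first method is the "compare two collections" case from the quote, which needs two external iterators advanced in lockstep; the second hands an operation to the collection, much like Ruby's each.

```java
import java.util.Iterator;
import java.util.Objects;

class ExternalVsInternal {
    // External iteration: the client advances both iterators itself.
    static boolean sameSequence(Iterable<?> left, Iterable<?> right) {
        Iterator<?> a = left.iterator();
        Iterator<?> b = right.iterator();
        while (a.hasNext() && b.hasNext()) {
            if (!Objects.equals(a.next(), b.next())) {
                return false;
            }
        }
        // Equal only if both ran out at the same time.
        return !a.hasNext() && !b.hasNext();
    }

    // Internal iteration: the collection drives the loop and applies our operation.
    static int totalLength(Iterable<String> words) {
        int[] total = {0};
        words.forEach(w -> total[0] += w.length());
        return total[0];
    }
}
```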
It is important to keep the collection apart from the pointer. The iterator points at a specific place in a collection, and thus is not an integral part of the collection. This way, for instance, you can use several iterators over the same collection.
The downside of this separation is that the iterator is not aware of changes made to the collection it iterates over. So you cannot change the collection's structure and expect the iterator to continue its work without "complaints".
Using the Iterator interface allows any class that implements its methods to act as an iterator. The notion of an interface in Java is a kind of contractual obligation: a class that implements the interface must provide certain functionality and behave in the way the interface requires. Since these obligations must be met for the class to be valid, other classes that see the class implements the interface can be reassured that it will have that functionality.
In this example, rather than exposing hasNext(), next() and remove() as methods of the LinkedList class itself, LinkedList implements the Iterable interface (through Collection), and its iterator() method returns an object that implements the Iterator interface. That object implements the methods from the Iterator interface (such as hasNext()), so others know it can be used as an iterator.
In other words, implementing an interface is an object-oriented programming notion to let others know that a certain class has what it takes to be what it claims to be.
This notion is enforced by having methods that must be implemented by a class that implements the interface. This assures other classes that want to use a class implementing the Iterator interface that it will indeed have the methods that Iterators should have, such as hasNext().
Also, it should be noted that since Java does not have multiple inheritance of classes, interfaces can be used to emulate that feature. By implementing multiple interfaces, a class can be a subclass that inherits some features, yet also "inherit" the features of another type by implementing an interface. One example would be, if I wanted to have a subclass of the LinkedList class called ReversibleLinkedList which could iterate in reverse order, I may create an interface called ReverseIterator and enforce that it provide a previous() method. Since LinkedList already hands out a standard Iterator, the new reversible list would support both the Iterator and ReverseIterator contracts.
You can read more about interfaces in "What is an Interface?" in The Java Tutorial from Sun.
Multiple instances of an iterator can be used concurrently. Approach them as local cursors for the underlying data.
BTW: favoring interfaces over concrete implementations loosens coupling
Look for the iterator design pattern, and here: http://en.wikipedia.org/wiki/Iterator
Because you may be iterating over something that's not a data structure. Let's say I have a networked application that pulls results from a server. I can return an Iterator wrapper around those results and stream them through any standard code that accepts an Iterator object.
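To make that concrete, here is a hedged sketch of an Iterator that is not backed by an in-memory data structure. PagedResults and drain are hypothetical names, and the fetchPage function merely stands in for a real server call; an empty page is assumed to signal the end of the results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

class PagedResults implements Iterator<String> {
    private final IntFunction<List<String>> fetchPage; // stand-in for a server call
    private Iterator<String> current = Collections.emptyIterator();
    private int nextPage = 0;
    private boolean exhausted = false;

    PagedResults(IntFunction<List<String>> fetchPage) {
        this.fetchPage = fetchPage;
    }

    @Override
    public boolean hasNext() {
        // Fetch lazily: only ask for another page when the current one runs out.
        while (!current.hasNext() && !exhausted) {
            List<String> page = fetchPage.apply(nextPage++);
            if (page.isEmpty()) {
                exhausted = true;     // assumption: an empty page means "no more results"
            } else {
                current = page.iterator();
            }
        }
        return current.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    // "Standard code" that only knows about Iterator, not where elements come from.
    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Fake "server" holding two pages of results.
        List<List<String>> pages = Arrays.asList(
                Arrays.asList("r1", "r2"), Arrays.asList("r3"));
        PagedResults results = new PagedResults(
                n -> n < pages.size() ? pages.get(n) : Collections.<String>emptyList());
        System.out.println(drain(results));
    }
}
```

The drain method never learns whether it is reading from a list, a network connection or anything else.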
Think of it as a key part of a good MVC design. The data has to get from the Model (i.e. data structure) to the View somehow. Using an Iterator as a go-between ensures that the implementation of the Model is never exposed. You could be keeping a LinkedList in memory, pulling information out of a decryption algorithm, or wrapping JDBC calls. It simply doesn't matter to the view, because the view only cares about the Iterator interface.
An interesting paper discussing the pros and cons of using iterators:
http://www.sei.cmu.edu/pacc/CBSE5/Sridhar-cbse5-final.pdf
I think it is just good OO practice. You can have code that deals with all kinds of iterators, and even gives you the opportunity to create your own data structures or just generic classes that implement the iterator interface. You don't have to worry about what kind of implementation is behind it.
Just M2C, if you weren't aware: you can avoid directly using the iterator interface in situations where the for-each loop will suffice.
Ultimately, because Iterator captures a control abstraction that is applicable to a large number of data structures. If you're up on your category theory fu, you can have your mind blown by this paper: The Essence of the Iterator Pattern.
Well it seems like the first bullet point allows for multi-threaded (or single threaded if you screw up) applications to not need to lock the collection for concurrency violations. In .NET for example you cannot enumerate and modify a collection (or list or any IEnumerable) at the same time without locking or inheriting from IEnumerable and overriding methods (we get exceptions).
Iterator simply adds a common way of going over a collection of items. One of the nice features is the i.remove() in which you can remove elements from the list that you are iterating over. If you just tried to remove items from a list normally it would have weird effects or throw an exception.
The interface is like a contract for all things that implement it. You are basically saying.. anything that implements an iterator is guaranteed to have these methods that behave the same way. You can also use it to pass around iterator types if that is all you care about dealing with in your code. (you might not care what type of list it is.. you just want to pass an Iterator) You could put all these methods independently in the collections but you are not guaranteeing that they behave the same or that they even have the same name and signatures.
Iterators are one of the many design patterns used in Java. Design patterns can be thought of as convenient building blocks or styles for structuring your code.
To read more about the Iterator design pattern, check out this website, which talks about Iterator as well as many other design patterns. Here is a snippet from the site on Iterator: http://www.patterndepot.com/put/8/Behavioral.html
The Iterator is one of the simplest and most frequently used of the design patterns. The Iterator pattern allows you to move through a list or collection of data using a standard interface without having to know the details of the internal representations of that data. In addition you can also define special iterators that perform some special processing and return only specified elements of the data collection.
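As one possible reading of "special iterators that ... return only specified elements", here is a minimal sketch of a filtering iterator. FilteringIterator and collect are names invented for this example, not part of the JDK.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> keep;
    private T pending;                 // next matching element, if already found
    private boolean hasPending = false;

    FilteringIterator(Iterator<T> source, Predicate<T> keep) {
        this.source = source;
        this.keep = keep;
    }

    @Override
    public boolean hasNext() {
        // Skip ahead in the underlying iterator until an element passes the filter.
        while (!hasPending && source.hasNext()) {
            T candidate = source.next();
            if (keep.test(candidate)) {
                pending = candidate;
                hasPending = true;
            }
        }
        return hasPending;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = pending;
        pending = null;
        hasPending = false;
        return result;
    }

    // Helper for the example: copies whatever an iterator yields into a list.
    static <E> List<E> collect(Iterator<E> it) {
        List<E> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

The underlying collection is untouched; the wrapper only changes which elements a client gets to see.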
Iterators can be used against any sort of collection. They allow you to define an algorithm against a collection of items regardless of the underlying implementation. This means you can process a List, Set, String, File, Array, etc.
Ten years from now you can change your List implementation to a better implementation and the algorithm will still run seamlessly against it.
Iterator is useful when you are dealing with Collections in Java.
Use the for-each loop (Java 1.5) for iterating over a collection, array or list.
The java.util.Iterator interface is used in the Java Collections Framework to allow modification of the collection while still iterating through it. If you just want to cleanly iterate over an entire collection, use a for-each instead, but an upside of Iterators is the functionality that you get: an optional remove() operation, and even better, the ListIterator interface, which offers add() and set() operations too. Both of these interfaces allow you to iterate over a collection and change it structurally at the same time. Trying to modify a collection directly while iterating through it with a for-each will usually throw a ConcurrentModificationException, because the collection was modified behind the iterator's back!
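A short sketch of the three cases just described, on copies of an ArrayList. The class and method names are made up for illustration. The exception in the last method is thrown on a best-effort basis, so treat it as a debugging aid rather than guaranteed behaviour.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.ListIterator;

class ModifyWhileIterating {
    // Safe: remove through the iterator.
    static List<Integer> dropOdd(List<Integer> input) {
        List<Integer> copy = new ArrayList<>(input);
        Iterator<Integer> it = copy.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 != 0) {
                it.remove();
            }
        }
        return copy;
    }

    // ListIterator additionally supports set() and add().
    static List<String> upperAndMark(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        ListIterator<String> it = copy.listIterator();
        while (it.hasNext()) {
            String s = it.next();
            it.set(s.toUpperCase());       // replaces the element just returned
            if (s.startsWith("b")) {
                it.add("<after-b>");       // inserted right after it; not visited by next()
            }
        }
        return copy;
    }

    // Unsafe: modifying the list directly inside a for-each.
    static boolean forEachRemovalFails(List<Integer> input) {
        List<Integer> copy = new ArrayList<>(input);
        try {
            for (Integer n : copy) {
                if (n % 2 != 0) {
                    copy.remove(n);        // remove(Object), bypassing the hidden iterator
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```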
Take a look at the ArrayList class. It has 2 private classes inside it (inner classes) called Itr and ListItr. They implement the Iterator and ListIterator interfaces respectively:
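public class ArrayList..... { // enclosing class (rough sketch, not the exact JDK source)

    private class Itr implements Iterator<E> {
        int cursor;                       // index of the next element to return
        int lastRet = -1;                 // index of the last element returned
        int expectedModCount = modCount;  // snapshot taken when the iterator is created

        // hasNext() omitted

        public E next() {
            checkForComodification();
            lastRet = cursor;
            return ArrayList.this.get(cursor++); // rough, not exact
        }

        // we have to use ArrayList.this.get() so the compiler will
        // know that we are referring to the methods in the
        // enclosing ArrayList class
        public void remove() {
            ArrayList.this.remove(lastRet);
            cursor = lastRet;
            lastRet = -1;
            expectedModCount = modCount;  // resync so our own change is not flagged
        }

        // checks for co-modification of the list (ListItr gets this method as well)
        final void checkForComodification() {
            if (ArrayList.this.modCount != expectedModCount) {
                throw new ConcurrentModificationException();
            }
        }
    }

    private class ListItr extends Itr implements ListIterator<E> {
        // methods inherited....

        public void add(E e) {
            ArrayList.this.add(cursor++, e);
            lastRet = -1;
            expectedModCount = modCount;
        }

        public void set(E e) {
            ArrayList.this.set(lastRet, e); // replaces the last element returned
        }
    }
}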
When you call the methods iterator() and listIterator(), they return a new instance of the private class Itr or ListItr. Since these inner classes are "within" the enclosing ArrayList class, they can modify the ArrayList without triggering a ConcurrentModificationException, unless you also change the list's structure at the same time (concurrently) through the add() or remove() methods of the ArrayList class itself.