What is meant by without exposing the 'internal structure'? (iterator)


One of the key attributes of an iterator is this:
The iterator pattern allows us to access the contents of a collection without exposing its internal structure.
What exactly is meant by that, what is the internal structure?

Every data structure is implemented differently. Some structures might use a linked design, some might be backed by a single array, or there could be some mix of the two. Imagine if every time you wanted to use a different List implementation, you needed to learn about how it works just to use it.
The Iterator interface (along with other interfaces) provides a consistent set of methods that allow you to use an iterable, even if you have no idea how it's implemented "under the hood".

The internal structure is the private member[s] that hold the contents of the collection.
For example, in ArrayList, the internal structure is a backing array holding the elements of the List:
transient Object[] elementData;
The Iterator returned by ArrayList's iterator() gives you access to the elements of the ArrayList without giving you access to the backing array, which means you cannot mutate the backing array directly.
For HashSet, the internal structure is a HashMap that holds the elements of the Set:
private transient HashMap<E,Object> map;
Again, the iterator gives you access to the elements of the Set without giving you access to that HashMap.
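To make that concrete, here is a minimal sketch of a home-made collection. SimpleBag and its fields are hypothetical names made up for illustration, not anything from the JDK. The backing array is private, and the only way for a caller to reach the elements is through the Iterator it hands out:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical fixed-capacity collection, used only for illustration.
class SimpleBag<E> implements Iterable<E> {
    // The "internal structure": a private backing array callers never see.
    private final Object[] elements;
    private int size;

    SimpleBag(int capacity) {
        elements = new Object[capacity];
    }

    void add(E e) {
        if (size == elements.length) {
            throw new IllegalStateException("bag is full");
        }
        elements[size++] = e;
    }

    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {
            private int cursor; // position of the next element to return

            @Override
            public boolean hasNext() {
                return cursor < size;
            }

            @Override
            @SuppressWarnings("unchecked")
            public E next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return (E) elements[cursor++];
            }
        };
    }

    public static void main(String[] args) {
        SimpleBag<String> bag = new SimpleBag<>(4);
        bag.add("a");
        bag.add("b");
        for (String s : bag) { // for-each calls iterator() behind the scenes
            System.out.println(s);
        }
    }
}
```

If the array were later swapped for a linked structure, code that only uses the iterator would not need to change.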

In this context internal structure means the list of member functions of the collection class or interface. But thinking further, as the member function list roughly makes up the class / interface, internal structure means the class / interface of the collection. And this statement is wide and can mean the sort of collection like array, bag, queue...
An iterator is an adapter often with a different interface than the collections it operates on. So using an iterator not only enables us to hide the collection's class declaration (which defines the collection's internal structure) but also enables us to hide the collection's public interface (which might be separate and smaller than the whole collection's class declaration). And this public interface is not internal structure. So your original statement without exposing its internal structure is ok but not completely comprehensive.
An iterator saves us not only from exposing internal structure but also from exposing the public interface of the collection. And this gives more value to the concept of iterator in context of code separation.

As a simple example, a TreeMap can be iterated to learn all members of the map. When using an iterator, I don't have to care whether the tree underlying the map is a binary tree, a trie, a b-tree, a red-black tree, etc. As a user I don't want to have to know how it maintains its internal links to read its data. I don't want to have to know anything about its implementation. Software that seeks maintainability should reduce the amount a user needs to know about its implementation.
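A small sketch of the same idea, assuming nothing beyond the standard java.util collections (the class name IteratorDemo and the method sum are made up for the example). The sum method only sees an Iterator, so it works the same whether the data sits in an array, in linked nodes, or in a tree:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.TreeSet;

class IteratorDemo {
    // Works for any source of Integers; knows nothing about its layout.
    static int sum(Iterator<Integer> it) {
        int total = 0;
        while (it.hasNext()) {
            total += it.next();
        }
        return total;
    }

    public static void main(String[] args) {
        ArrayList<Integer> array = new ArrayList<>(Arrays.asList(1, 2, 3));    // backing array
        LinkedList<Integer> linked = new LinkedList<>(Arrays.asList(1, 2, 3)); // linked nodes
        TreeSet<Integer> tree = new TreeSet<>(Arrays.asList(1, 2, 3));         // balanced tree

        System.out.println(sum(array.iterator()));
        System.out.println(sum(linked.iterator()));
        System.out.println(sum(tree.iterator()));
    }
}
```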
"Structure" in this sense is used in the same manner as in "data structure"...a way of organizing data in memory to permit efficient lookup, insertion, deletion, or other operations. An iterator hides the details of the organization scheme used to make the operations efficient, by providing a simple way to get the data, item by item, just by calling next().

Related

Java data structures: IndexedSet or HashList

Is there a Java data structure that satisfies the following requirements?
Indexed access. (i.e., i should be able to retrieve objects with get(i))
Constant time get() & contains() methods
Should enforce ordering (enforced by Comparator)
Should be mutable
I could not find any in Oracle's JDK or Guava that gives the above listed features out of the box
List provides indexed access & some kind of ordering but not constant-time contains(). SortedSet and SortedMap provide constant-time contains() and sorting but not indexed access!!
Am I missing something or is there any data structure out there that could be manipulated to provide the features listed above?
My current solution:
A combination of HashMap & ArrayList => I use ArrayList to store the sorted keys which provides indexed access and use HashMap for constant-time contains() method. I just wanna make sure that I am not trying to re-invent something that has already been done
Why I need this:
Let's call this data structure SortedDataStore
I am developing an Android app & most of the UI in that is a list of items and these items are pulled off the local db. (the local db gets its data from a remote server). The UI is populated using RecyclerView and I need constant-time indexed access to get the object from my SortedDataStore and populate the views. Since the order of items is decided based on their attributes, there is a need for sorting. Also the data gets updated a lot (items get modified, deleted and new items get added). When the new data comes in, I check against my SortedDataStore if it should be deleted, or added or modified (and moved to another index) for which I need constant-time contains() & mutability.

Based on what you've described as your expected data size, ArrayList seems like it would actually be just fine in practice -- your data isn't big enough for the linear-time factor to matter that much.
Otherwise, what you're doing is the right solution; there's no provided mutable data structure that does all that at once.
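For reference, the HashMap plus ArrayList combination from the question could look roughly like the sketch below. Everything here (the class name SortedDataStore, its methods, keying by some id) is an assumption made for illustration, not an existing library class. Note that get(index) and contains(key) are constant time, while put and remove are O(n) because the list has to shift or search; it also assumes the stored values have a sensible equals().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the HashMap + ArrayList combination; all names are hypothetical.
class SortedDataStore<K, V> {
    private final Map<K, V> byKey = new HashMap<>();  // O(1) average contains by key
    private final List<V> sorted = new ArrayList<>(); // O(1) get by index
    private final Comparator<? super V> order;

    SortedDataStore(Comparator<? super V> order) {
        this.order = order;
    }

    boolean contains(K key) {
        return byKey.containsKey(key);
    }

    V get(int index) {
        return sorted.get(index);
    }

    int size() {
        return sorted.size();
    }

    // Insert or replace; O(n) because the list must shift elements.
    void put(K key, V value) {
        remove(key);
        int pos = Collections.binarySearch(sorted, value, order);
        if (pos < 0) {
            pos = -pos - 1; // binarySearch encodes the insertion point this way
        }
        sorted.add(pos, value);
        byKey.put(key, value);
    }

    // O(n): the old value has to be located in the list via equals().
    void remove(K key) {
        V old = byKey.remove(key);
        if (old != null) {
            sorted.remove(old);
        }
    }

    public static void main(String[] args) {
        SortedDataStore<String, Integer> store =
            new SortedDataStore<>(Comparator.<Integer>naturalOrder());
        store.put("b", 20);
        store.put("a", 10);
        System.out.println(store.get(0) + " " + store.contains("b"));
    }
}
```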
If you can avoid mutation, Guava's ImmutableSet satisfies the rest of your demands. You can use ImmutableSet.asList().get(index) to get out elements by index in O(1) time, and otherwise it supports O(1) contains and insertion order.

An ArrayList satisfies the three requirements:
Indexed access, using get(int i)
Constant time access, using get(int i)
Order by insertion, using add(Object o)

Java's LinkedHashMap satisfies your requirements if you use an index as your key.
Indexed access: use get(i)
Constant time get() & contains() methods: use get(i) and containsKey()
Should enforce ordering (enforced by Comparator): see note
Should be mutable: yes
Note
If you want a custom ordering, implement the Comparable interface and @Override the compareTo() method on the class of the child object.
comparator for LinkedHashMap

Enforcing order in Java Iterator

I am providing a library for a different team. One of the methods I provide receives as argument an Iterator. I would like to somehow require a certain order of iteration. Is there any way to do this in code by extending Iterator?

Not directly. An iterator is made to give you one item at a time, which avoids storing and passing around a whole list of values in memory, something that can be infeasible at times.
Unless you have more knowledge on how the values are generated and which bounds have to be applied to the sorting of data, the only way is to get all elements from the iterator, store them in some list/vector/database, sort them and return another iterator using the sorted list.

You're being passed, as an argument, an instance of some concrete class which implements Iterator. You can't extend its class because its class is decided upon instantiation, which is done by the code that calls your method.
If you want to fail fast when the required order is not respected, try Guava's Ordering.isOrdered() method.
NB This will only work if your argument is an Iterable, rather than Iterator. You need it to be Iterable (an interface which allows you to retrieve the Iterator) so that you can iterate twice: once to check order, once to process.
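If all you have is a single-pass Iterator, one alternative (not the Guava approach above, just a plain-Java sketch) is to wrap it in another Iterator that checks the order as elements go by and fails on the first element that is out of place. OrderCheckingIterator is a hypothetical name, and the choice of IllegalStateException is an assumption:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;

// Hypothetical wrapper: passes elements through and fails fast when an
// element sorts before the one that preceded it.
class OrderCheckingIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Comparator<? super T> order;
    private T previous;
    private boolean hasPrevious;

    OrderCheckingIterator(Iterator<T> source, Comparator<? super T> order) {
        this.source = source;
        this.order = order;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        T current = source.next();
        if (hasPrevious && order.compare(previous, current) > 0) {
            throw new IllegalStateException(
                "out of order: " + current + " came after " + previous);
        }
        previous = current;
        hasPrevious = true;
        return current;
    }

    public static void main(String[] args) {
        Iterator<Integer> raw = Arrays.asList(1, 2, 5, 3).iterator();
        Iterator<Integer> checked =
            new OrderCheckingIterator<Integer>(raw, Comparator.<Integer>naturalOrder());
        try {
            while (checked.hasNext()) {
                System.out.println(checked.next());
            }
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The trade-off is that the violation is only discovered partway through processing, so earlier elements may already have been handled.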

Difference between ArrayList<Integer> list = new ArrayList<Integer>(); and Collection<Integer> list1 = new ArrayList<Integer>();

I don't understand difference between:
ArrayList<Integer> list = new ArrayList<Integer>();
Collection<Integer> list1 = new ArrayList<Integer>();
Class ArrayList extends class which implements interface Collection, so Class ArrayList implements Collection interface. Maybe list1 allows us to use static methods from the Collection interface?

An interface has no static methods [in Java 7]. list1 gives access only to the methods in Collection, whereas list gives access to all the methods in ArrayList.
It is preferable to declare a variable with its least specific possible type. So, for example, if you change ArrayList into LinkedList or HashSet for any reason, you don't have to refactor large portions of the code (for example, client classes).
Imagine you have something like this (just for illustrational purposes, not compilable):
class CustomerProvider {
    public LinkedList<Customer> getAllCustomersInCity(City city) {
        // retrieve and return all customers for that city
    }
}
and you later decide to implement it returning a HashSet. Maybe there is some client class that relies on the fact that you return a LinkedList, and calls methods that HashSet doesn't have (e.g. LinkedList.getFirst()).
That's why you better do like this:
class CustomerProvider {
    public Collection<Customer> getAllCustomersInCity(City city) {
        // retrieve and return all customers for that city
    }
}

What we're dealing with here is the difference between interface and implementation.
An interface is a set of methods without any regard to how those methods are implemented. When we declare an object as having a type that is actually an interface, what we're saying is that it is an object that implements all of the methods in that interface... but the declaration doesn't provide us with access to any of the other methods in the class that actually provides those implementations.
When you declare an object with the type of an implementing class, then you have access to all of the relevant methods of that class. Since that class is implementing an interface, you have access to the methods specified in the interface, plus any extras provided by the implementing class.
Why would you want to do this? Well, by restricting the type of your object to the interface, you can switch in new implementations without worrying about changing the rest of your code. This makes it a whole lot more flexible.

The difference, as others have said, is that you are limited to the methods defined by the Collection interface when you specify that as your variable type. But that doesn't answer the question of why you would want to do this.
The reason is that the choice of data type provides information to the people using the code. Especially when used as the parameter or return type from a function (where outside programmers may have no access to the internals).
In order of specificity, here is what different type choices might tell you:
Collection - a group of objects, with no further guarantees. The consumer of this object can iterate over the collection (with no guarantees as to iteration order), and can learn its size, but cannot do anything else.
List - a group of objects that have a specific order. When you iterate over these objects, you will always get them in the same order. You can also retrieve specific items from the collection by index, but you cannot make any assumptions about the performance of such retrieval.
ArrayList - a group of objects that have a specific order, and may be accessed by index in constant time.
And although you didn't ask about them, here are some other collection classes:
Set - a group of objects that is guaranteed to contain no duplicates per the equals() method. There are no guarantees regarding the iteration order of these objects.
SortedSet - a group of objects that contains no duplicates, and will always iterate in a specific order (although that specific order is not guaranteed by the collection).
TreeSet - a group of ordered objects with no duplicates, that exhibits O(log N) insert and retrieval times.
HashSet - a group of objects with no duplicates, that does not have an inherent order, but provides (amortized) constant-time access.

The only difference is that you're providing access to list1 through the Collection interface, whereas you provide access to list through the ArrayList type. Sometimes, providing access through a restricted interface is useful, in that it promotes encapsulation and reduces dependence on implementation details.

When you perform operations on "list1", you'll only be able to access methods from the Collection interface (add, size, etc.). By declaring "list" as an ArrayList, you gain access to additional methods only defined in the ArrayList class (ensureCapacity and trimToSize, for example).
It's typically best practice to declare the variable as the least specific class you need. So, if you only need the methods from Collection, use it. Typically in this case, that would mean using List, which lets you know it's ordered and can handle duplicates.
Using the least specific class/interface allows you to freely change the implementation later. For example, if you later learn that a LinkedList would be a better implementation to use, you could change it without breaking all your code if you define the variable to be a List.
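A short sketch of what that buys you in practice. The class and method names (DeclaredTypeDemo, total, last) are invented for the example; the point is only that each method asks for the least specific type it needs, so the implementation can be swapped on a single line:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;

class DeclaredTypeDemo {
    // Needs only Collection methods, so it accepts any Collection.
    static int total(Collection<Integer> numbers) {
        int sum = 0;
        for (int n : numbers) {
            sum += n;
        }
        return sum;
    }

    // Needs indexed access, so it asks for List, but not for ArrayList.
    static int last(List<Integer> numbers) {
        return numbers.get(numbers.size() - 1);
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        list.add(1);
        list.add(2);
        System.out.println(total(list) + " " + last(list));

        // Swapping the implementation touches only this one line.
        list = new LinkedList<>(list);
        System.out.println(total(list) + " " + last(list));

        Collection<Integer> list1 = list;
        // list1.get(0);  // does not compile: Collection declares no get(int)
        System.out.println(list1.size());
    }
}
```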

java-how to manage multiple lists of data- in a single variable- with easy access to each list

I have a scenario where I have to work with multiple lists of data in a java app...
Now each list can have any number of elements in it... Also, the number of such lists is also not known initially...
Which approach will suit my scenario best? I can think of arraylist of list, or list of list or list of arraylist etc(ie combinations of arraylist + list/ arraylist+arraylist/list+list)... what I would like to know is--
(1) Which of the above (or your own solution) will be easiest to manage- viz to store/fetch data
(2) Which of the above will use the least amount of memory?

I would declare my variable as:
List<List<DataType>> lists = new ArrayList<List<DataType>>();
There is a slight time penalty in accessing list methods through a variable of an interface type, but this, I think, is more than balanced by the flexibility you have of changing the type as you see fit. (For instance, if you decided to make lists immutable, you could do that through one of the methods in java.util.Collections, but not if you had declared it to be an ArrayList<List<DataType>>.)
Note that lists will have to hold instances of some concrete class that implements List<DataType>, since (as others have noted) List is an interface, not a class.

List is an interface. ArrayList is one implementation of List.
When you construct a List you must choose a specific concrete type (e.g. ArrayList). When you use the list it is better to program against the interface if possible. This prevents tight coupling between your code and the specific List implementation, allowing you to more easily change to another List implementation later if you wish.

If you know a way to identify which list you will be dealing with, use a map of lists.
Map<String, List<?>> lists = new HashMap<String, List<?>>();
This way you avoid having to loop through the outer elements to reach the actual list: a hash map lookup by key takes constant time on average, whereas finding a list inside an outer list means iterating over it.
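A rough sketch of the map-of-lists idea, assuming String keys and String elements purely for the sake of the example (NamedLists and its methods are hypothetical names). computeIfAbsent creates each inner list the first time its name is used, so neither the number of lists nor their sizes need to be known up front:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NamedLists {
    private final Map<String, List<String>> lists = new HashMap<>();

    // Creates the inner list on first use, then appends to it.
    void add(String listName, String item) {
        lists.computeIfAbsent(listName, k -> new ArrayList<>()).add(item);
    }

    // Direct lookup by name; returns an empty list for unknown names.
    List<String> get(String listName) {
        return lists.getOrDefault(listName, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        NamedLists data = new NamedLists();
        data.add("fruit", "apple");
        data.add("fruit", "pear");
        data.add("veg", "kale");
        System.out.println(data.get("fruit"));
        System.out.println(data.get("missing"));
    }
}
```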

What are the benefits of the Iterator interface in Java?

I just learned about how the Java Collections Framework implements data structures in linked lists. From what I understand, Iterators are a way of traversing through the items in a data structure such as a list. Why is this interface used? Why are the methods hasNext(), next() and remove() not directly coded to the data structure implementation itself?
From the Java website:
public interface Iterator<E>
An iterator over a collection. Iterator takes the place of Enumeration in the Java collections framework. Iterators differ from enumerations in two ways:
Iterators allow the caller to remove elements from the underlying collection during the iteration with well-defined semantics.
Method names have been improved.
This interface is a member of the Java Collections Framework.
I tried googling around and can't seem to find a definite answer. Can someone shed some light on why Sun chose to use them? Is it because of better design? Increased security? Good OO practice?
Any help will be greatly appreciated. Thanks.

Why is this interface used?
Because it supports the basic operations that would allow a client programmer to iterate over any kind of collection (note: not necessarily a Collection in the Object sense).
Why are the methods... not directly coded to the data structure implementation itself?
They are, they're just marked private so you can't reach into them and muck with them. More specifically:
You can implement or subclass an Iterator such that it does something the standard ones don't do, without having to alter the actual object it iterates over.
Objects that can be traversed over don't need to have their interfaces cluttered up with traversal methods, in particular any highly specialized methods.
You can hand out Iterators to however many clients you wish, and each client may traverse in their own time, at their own speed.
Java Iterators from the java.util package in particular will throw an exception if the storage that backs them is modified while you still have an Iterator out. This exception lets you know that the Iterator may now be returning invalid objects.
For simple programs, none of this probably seems worthwhile. The kind of complexity that makes them useful will come up on you quickly, though.
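Two of those points can be seen in a few lines. This is only a sketch using a plain ArrayList (IndependentCursors and demo are made-up names): two iterators over the same list keep their own positions, and once the list is modified behind their backs, the next call on an outstanding iterator throws.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class IndependentCursors {
    // Returns a short log of what two iterators over one list observe.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        List<String> items = new ArrayList<>(Arrays.asList("a", "b", "c"));

        Iterator<String> fast = items.iterator();
        Iterator<String> slow = items.iterator();
        fast.next();
        fast.next();
        log.add("fast=" + fast.next()); // third element
        log.add("slow=" + slow.next()); // still at the first element

        items.add("d"); // structural change made outside both iterators
        try {
            slow.next();
            log.add("no exception");
        } catch (ConcurrentModificationException e) {
            log.add("ConcurrentModificationException");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```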

You ask: "Why are the methods hasNext(), next() and remove() not directly coded to the data structure implementation itself?".
The Java Collections framework chooses to define the Iterator interface as externalized to the collection itself. Normally, since every Java collection implements the Iterable interface, a Java program will call iterator() to create its own iterator so that it can be used in a loop. As others have pointed out, Java 5 lets us use the iterator indirectly, with a for-each loop.
Externalizing the iterator to its collection allows the client to control how one iterates through a collection. One use case that I can think of where this is useful is when one has an unbounded collection such as all the web pages on the Internet to index.
In the classic GoF book, the contrast between internal and external iterators is spelled out quite clearly.
A fundamental issue is deciding which party controls the iteration, the iterator or the client that uses the iterator. When the client controls the iteration, the iterator is called an external iterator, and when the iterator controls it, the iterator is an internal iterator. Clients that use an external iterator must advance the traversal and request the next element explicitly from the iterator. In contrast, the client hands an internal iterator an operation to perform, and the iterator applies that operation to every element ....
External iterators are more flexible than internal iterators. It's easy to compare two collections for equality with an external iterator, for example, but it's practically impossible with internal iterators ... But on the other hand, internal iterators are easier to use, because they define the iteration logic for you.
For an example of how internal iterators work, see Ruby's Enumerable API, which has internal iteration methods such as each. In Ruby, the idea is to pass a block of code (i.e. a closure) to an internal iterator so that a collection can take care of its own iteration.
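Java has both styles too, so here is a sketch of the contrast in Java rather than Ruby (IterationStyles and its method names are made up; Iterable.forEach needs Java 8 or later). The first method is the equality comparison the quote mentions, which needs two external iterators advanced in lockstep; the second hands an operation to the collection and lets it drive the loop:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Objects;

class IterationStyles {
    // External iteration: the caller advances two iterators in lockstep.
    static boolean sameSequence(Iterable<?> left, Iterable<?> right) {
        Iterator<?> a = left.iterator();
        Iterator<?> b = right.iterator();
        while (a.hasNext() && b.hasNext()) {
            if (!Objects.equals(a.next(), b.next())) {
                return false;
            }
        }
        return !a.hasNext() && !b.hasNext(); // both must run out together
    }

    // Internal iteration: pass an operation in and let the collection drive.
    static int sumInternally(Iterable<Integer> numbers) {
        int[] total = {0}; // array so the lambda can update it
        numbers.forEach(n -> total[0] += n);
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(sameSequence(
            Arrays.asList(1, 2, 3), new LinkedList<Integer>(Arrays.asList(1, 2, 3))));
        System.out.println(sumInternally(Arrays.asList(1, 2, 3)));
    }
}
```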

It is important to keep the collection apart from the pointer. The iterator points at a specific place in a collection, and thus is not an integral part of the collection. This way, for instance, you can use several iterators over the same collection.
The downside of this separation is that the iterator is not aware of changes made to the collection it iterates over. So you cannot change the collection's structure and expect the iterator to continue its work without "complaints".

Using the Iterator interface allows any class that implements its methods to act as an iterator. The notion of an interface in Java is to have, in a way, a contractual obligation to provide certain functionalities in a class that implements the interface, to act in a way that is required by the interface. Since the contractual obligations must be met in order to be a valid class, other classes that see the class implements the interface are reassured that the class will have those functionalities.
In this example, rather than putting the methods (hasNext(), next(), remove()) on the LinkedList class itself, the LinkedList class implements Iterable (through Collection) and hands out, from its iterator() method, an object that implements the Iterator interface, so others know that a LinkedList can be traversed with an iterator. That object implements the methods from the Iterator interface (such as hasNext()), so it can function as an iterator.
In other words, implementing an interface is a object-oriented programming notion to let others know that a certain class has what it takes to be what it claims to be.
This notion is enforced by having methods that must be implemented by a class that implements the interface. This assures other classes that want to use a class implementing the Iterator interface that it will indeed have the methods that iterators should have, such as hasNext().
Also, it should be noted that since Java does not have multiple inheritance of classes, interfaces can be used to emulate that feature. By implementing multiple interfaces, one can have a class that is a subclass to inherit some features, yet also "inherit" the features of another by implementing an interface. One example would be, if I wanted to have a subclass of the LinkedList class called ReversibleLinkedList which could iterate in reverse order, I may create an interface called ReverseIterator and enforce that it provide a previous() method. Since LinkedList already provides an Iterator, the new reversible list would support both the Iterator and ReverseIterator interfaces.
You can read more about interfaces from What is an Interface? from The Java Tutorial from Sun.

Multiple instances of an iterator can be used concurrently. Approach them as local cursors for the underlying data.
BTW: favoring interfaces over concrete implementations loosens coupling
Look for the iterator design pattern, and here: http://en.wikipedia.org/wiki/Iterator

Because you may be iterating over something that's not a data structure. Let's say I have a networked application that pulls results from a server. I can return an Iterator wrapper around those results and stream them through any standard code that accepts an Iterator object.
Think of it as a key part of a good MVC design. The data has to get from the Model (i.e. data structure) to the View somehow. Using an Iterator as a go-between ensures that the implementation of the Model is never exposed. You could be keeping a LinkedList in memory, pulling information out of a decryption algorithm, or wrapping JDBC calls. It simply doesn't matter to the view, because the view only cares about the Iterator interface.
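As an illustration of iterating over something that is not an in-memory data structure, here is a sketch of an Iterator that pulls results page by page. PagedIterator is a hypothetical class, and fetchPage stands in for whatever network or JDBC call would really supply the data; the convention that an empty page means "no more results" is an assumption of this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Hypothetical iterator that fetches results one page at a time.
// An empty page is treated as "no more data".
class PagedIterator<T> implements Iterator<T> {
    private final IntFunction<List<T>> fetchPage;
    private Iterator<T> current = Collections.emptyIterator();
    private int nextPage;
    private boolean exhausted;

    PagedIterator(IntFunction<List<T>> fetchPage) {
        this.fetchPage = fetchPage;
    }

    @Override
    public boolean hasNext() {
        // Fetch lazily: only ask for another page when the current one runs out.
        while (!current.hasNext() && !exhausted) {
            List<T> page = fetchPage.apply(nextPage++);
            if (page.isEmpty()) {
                exhausted = true;
            } else {
                current = page.iterator();
            }
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    public static void main(String[] args) {
        // Fake "server" holding two pages of results.
        List<List<String>> pages = Arrays.asList(
            Arrays.asList("r1", "r2"), Arrays.asList("r3"));
        Iterator<String> results = new PagedIterator<String>(
            page -> page < pages.size() ? pages.get(page) : Collections.<String>emptyList());
        while (results.hasNext()) {
            System.out.println(results.next());
        }
    }
}
```

Code that consumes the Iterator cannot tell whether the results came from a list in memory or from a remote call.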

An interesting paper discussing the pros and cons of using iterators:
http://www.sei.cmu.edu/pacc/CBSE5/Sridhar-cbse5-final.pdf

I think it is just good OO practice. You can have code that deals with all kinds of iterators, and even gives you the opportunity to create your own data structures or just generic classes that implement the iterator interface. You don't have to worry about what kind of implementation is behind it.

Just M2C, if you weren't aware: you can avoid directly using the iterator interface in situations where the for-each loop will suffice.

Ultimately, because Iterator captures a control abstraction that is applicable to a large number of data structures. If you're up on your category theory fu, you can have your mind blown by this paper: The Essence of the Iterator Pattern.

Well it seems like the first bullet point allows for multi-threaded (or single threaded if you screw up) applications to not need to lock the collection for concurrency violations. In .NET for example you cannot enumerate and modify a collection (or list or any IEnumerable) at the same time without locking or inheriting from IEnumerable and overriding methods (we get exceptions).

Iterator simply adds a common way of going over a collection of items. One of the nice features is i.remove(), with which you can remove elements from the list that you are iterating over. If you just tried to remove items from a list normally, it would have weird effects or throw an exception.
The interface is like a contract for all things that implement it. You are basically saying.. anything that implements an iterator is guaranteed to have these methods that behave the same way. You can also use it to pass around iterator types if that is all you care about dealing with in your code. (you might not care what type of list it is.. you just want to pass an Iterator) You could put all these methods independently in the collections but you are not guaranteeing that they behave the same or that they even have the same name and signatures.
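Both the "weird effects" and the exception are easy to reproduce with an ArrayList. In the sketch below (class and method names are invented for the example), removeEvens goes through the iterator's remove(), while removeInForEachFails removes through the list itself inside a for-each and reports whether a ConcurrentModificationException was thrown. Whether it throws depends on which element is removed: taking out the second-to-last element makes the loop end early and silently skip the last one instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class RemoveWhileIterating {
    // Removes even numbers through the iterator, which is the supported way.
    static List<Integer> removeEvens(List<Integer> input) {
        List<Integer> numbers = new ArrayList<>(input);
        Iterator<Integer> it = numbers.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove(); // removes the element just returned by next()
            }
        }
        return numbers;
    }

    // Removes through the list inside a for-each and reports whether
    // the hidden iterator complained.
    static boolean removeInForEachFails(List<Integer> input) {
        List<Integer> numbers = new ArrayList<>(input);
        try {
            for (Integer n : numbers) {
                if (n % 2 == 0) {
                    numbers.remove(n); // remove(Object), bypassing the iterator
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeEvens(Arrays.asList(1, 2, 3, 4)));
        System.out.println(removeInForEachFails(Arrays.asList(1, 2, 3, 4)));
        System.out.println(removeInForEachFails(Arrays.asList(1, 2, 3)));
    }
}
```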

Iterators are one of the many design patterns available in java. Design patterns can be thought of as convenient building blocks, styles, usage of your code/structure.
To read more about the Iterator design pattern, check out this website, which talks about Iterator as well as many other design patterns. Here is a snippet from the site on Iterator: http://www.patterndepot.com/put/8/Behavioral.html
The Iterator is one of the simplest and most frequently used of the design patterns. The Iterator pattern allows you to move through a list or collection of data using a standard interface without having to know the details of the internal representations of that data. In addition you can also define special iterators that perform some special processing and return only specified elements of the data collection.

Iterators can be used against any sort of collection. They allow you to define an algorithm against a collection of items regardless of the underlying implementation. This means you can process a List, Set, String, File, Array, etc.
Ten years from now you can change your List implementation to a better implementation and the algorithm will still run seamlessly against it.

Iterator is useful when you are dealing with Collections in Java.
Use the for-each loop (Java 1.5) for iterating over a collection, array, or list.

The java.util.Iterator interface is used in the Java Collections Framework to allow modification of the collection while still iterating through it. If you just want to cleanly iterate over an entire collection, use a for-each instead, but an upside of iterators is the extra functionality you get: an optional remove() operation and, even better, the ListIterator interface, which offers add() and set() operations too. Both of these interfaces allow you to iterate over a collection and change it at the same time. Trying to modify a collection directly while iterating through it with a for-each will usually throw a ConcurrentModificationException, because the collection was modified behind the iterator's back.
Take a look at the ArrayList class. It has two private inner classes, Itr and ListItr, which implement the Iterator and ListIterator interfaces respectively:
public class ArrayList..... { // enclosing class
    private class Itr implements Iterator<E> {
        public E next() {
            return ArrayList.this.get(index++); // rough, not exact
        }
        // we have to use ArrayList.this.get() so the compiler will
        // know that we are referring to the methods in the
        // enclosing ArrayList class
        public void remove() {
            ArrayList.this.remove(prevIndex);
        }
        // checks for concurrent modification of the list
        final void checkForComodification() { // ListItr gets this method as well
            if (ArrayList.this.modCount != expectedModCount) {
                throw new ConcurrentModificationException();
            }
        }
    }
    private class ListItr extends Itr implements ListIterator<E> {
        // methods inherited....
        public void add(E e) {
            ArrayList.this.add(cursor, e);
        }
        public void set(E e) {
            ArrayList.this.set(cursor, e);
        }
    }
}
When you call the methods iterator() and listIterator(), they return a new instance of the private class Itr or ListItr, and since these inner classes are "within" the enclosing ArrayList class, they can freely modify the ArrayList without triggering a ConcurrentModificationException, unless you change the list's structure at the same time (concurrently) through the add() or remove() methods of the ArrayList class.
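From the caller's side, using those ListIterator operations might look like the sketch below (ListIteratorDemo and doubleAndPad are names made up for the example). It changes and grows the list during a single pass without any exception, because every change goes through the iterator:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class ListIteratorDemo {
    // Doubles every element in place and inserts a 0 after each one.
    static List<Integer> doubleAndPad(List<Integer> input) {
        List<Integer> numbers = new ArrayList<>(input);
        ListIterator<Integer> it = numbers.listIterator();
        while (it.hasNext()) {
            int n = it.next();
            it.set(n * 2); // replaces the element just returned by next()
            it.add(0);     // inserts after it; the new element is not revisited
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(doubleAndPad(Arrays.asList(1, 2, 3)));
    }
}
```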
