Why is exposing an iterator's underlying representation bad?

Iterator Pattern Definition: Provides a way to access the elements of an aggregate object sequentially without exposing its underlying representation. (Source: Wikipedia)
What are the consequences of exposing the underlying representation?
To allow a more detailed answer: how does the iterator pattern prevent this?

As per: http://www.oodesign.com/iterator-pattern.html
The idea of the iterator pattern is to take the responsibility of accessing and passing through the objects of the collection and put it in the iterator object. The iterator object will maintain the state of the iteration, keeping track of the current item and having a way of identifying what elements are next to be iterated.
A few benefits that you get from this pattern:
The designer can decide whether to allow only one-way iteration (using next() only) or reverse iteration as well (using previous(), as in ListIterator).
The designer can decide whether to allow element removal and, if so, how.
The iterator can maintain internal housekeeping when an element is removed.
It lets you expose a common mechanism for traversing a collection, rather than expecting your clients to understand the underlying collection.

If the underlying representation were exposed, client code could couple to it. Then:
If the representation changes, it may be necessary to change all the code coupling to it.
If you want to iterate over a different type of container, it may be necessary to change the code coupling to the old container.
Data abstraction makes code more resilient to a change in the representation.

In short: all the code relying on the underlying representation will have to be changed if you decide to change the representation.
E.g., you decided to use TreeMap at first, but then you don't need ordering anymore (in most cases), so you change to HashMap. Somebody who was looping through your map expecting the keys in increasing order now gets them in an arbitrary order!
Using the iterator pattern, you can always give the user the ability to loop through something in a certain order (or an arbitrary one, which is a kind of order too) without knowing what is under the hood.
Now, even if you use HashMap instead of TreeMap, you can still expose a sorted view to the user. If you provide a SortedIterator and tell the user "using this guarantees that the result is sorted, but I can't tell you anything about what's underneath", you can change the representation to whatever you like, as long as you maintain the contract of this SortedIterator.
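To make the SortedIterator idea concrete, here is a minimal sketch. The class WordCounts and its method sortedIterator() are hypothetical names made up for this illustration; the point is only that the map stays private and callers rely on the ordering contract, not on the storage.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical class for illustration: callers only ever see sortedIterator().
class WordCounts {
    // Private representation. It could be swapped for a TreeMap (or anything
    // else) without touching client code.
    private final Map<String, Integer> counts = new HashMap<>();

    void add(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    // Contract: keys come back in ascending order, whatever the storage is.
    // The iterator walks a sorted snapshot taken at the time of the call.
    Iterator<String> sortedIterator() {
        List<String> keys = new ArrayList<>(counts.keySet());
        Collections.sort(keys);
        return Collections.unmodifiableList(keys).iterator();
    }
}

class WordCountsDemo {
    public static void main(String[] args) {
        WordCounts wc = new WordCounts();
        wc.add("pear");
        wc.add("apple");
        wc.add("fig");
        for (Iterator<String> it = wc.sortedIterator(); it.hasNext(); ) {
            System.out.println(it.next()); // apple, fig, pear (one per line)
        }
    }
}
```

If the field were later changed to a TreeMap, sortedIterator() could simply return an iterator over its key set, and no caller would notice.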

Related

Purpose of the iterator-pattern. Have I understood it right?

I have recently tried to understand the so-called Iterator pattern.
I think I have understood its purpose, but I'm still not sure. So please correct me on this:
The purpose of the iterator pattern is to abstract away the underlying structure in which the data is kept. The data structure can be an array, a tree, a list ...
Its important methods are next() (returns an object), hasNext() (returns a boolean) and remove().
The methods are implemented in a way that is appropriate for the data structure used, so a developer who uses an iterator-implementing class doesn't have to care. They just use the provided methods, which are the same for every iterator-implementing class.
Have I got it right?

What you have summarized is correct. The Iterator design pattern hides the underlying complexity of a collection/aggregate by providing an iterator interface between the collection and the code retrieving the data.
Next, we can take this one level higher by defining an abstraction for the iterator. This means we can write iterators that traverse the same collection in multiple ways. For example, if we have a binary tree collection, we can write three iterators for inorder, postorder and preorder traversals.
Lastly, the Iterator pattern allows an abstraction for the collection being iterated as well. This implies that one can implement a family of iterators for a family of collections.
One more important thing to note is that an iterator knows enough about the inner structure of the collection to be able to iterate over it, and that it is the responsibility of the collection instance to create the correct iterator (out of the possible family of iterators) for itself and return it to the client.
If you are interested in reading more about the iterator pattern, I have explained the above points in depth in a writeup on my blog: http://www.javabrahman.com/design-patterns/iterator-design-pattern-in-java/
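As a rough sketch of the binary tree case described above: the class IntTree below is a hypothetical, minimal binary search tree written only for this illustration. It hands out two different iterators (in-order and pre-order) over the same nodes, and the Node type never leaks to the caller.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical minimal binary search tree of ints, for illustration only.
class IntTree {
    private static final class Node {
        final int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    private Node root;

    void insert(int value) {
        if (root == null) { root = new Node(value); return; }
        Node cur = root;
        while (true) {
            if (value < cur.value) {
                if (cur.left == null) { cur.left = new Node(value); return; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = new Node(value); return; }
                cur = cur.right;
            }
        }
    }

    // The collection creates its own iterators; callers never see Node.
    Iterator<Integer> inOrder() {
        return new Iterator<Integer>() {
            private final Deque<Node> stack = new ArrayDeque<>();
            { pushLeft(root); }
            private void pushLeft(Node n) {
                for (; n != null; n = n.left) stack.push(n);
            }
            @Override public boolean hasNext() { return !stack.isEmpty(); }
            @Override public Integer next() {
                if (stack.isEmpty()) throw new NoSuchElementException();
                Node n = stack.pop();
                pushLeft(n.right);
                return n.value;
            }
        };
    }

    Iterator<Integer> preOrder() {
        return new Iterator<Integer>() {
            private final Deque<Node> stack = new ArrayDeque<>();
            { if (root != null) stack.push(root); }
            @Override public boolean hasNext() { return !stack.isEmpty(); }
            @Override public Integer next() {
                if (stack.isEmpty()) throw new NoSuchElementException();
                Node n = stack.pop();
                // Push right first so that the left subtree is visited first.
                if (n.right != null) stack.push(n.right);
                if (n.left != null) stack.push(n.left);
                return n.value;
            }
        };
    }

    // Helper that collects everything an iterator yields into a list.
    static List<Integer> drain(Iterator<Integer> it) {
        List<Integer> out = new ArrayList<>();
        while (it.hasNext()) out.add(it.next());
        return out;
    }
}

class IntTreeDemo {
    public static void main(String[] args) {
        IntTree tree = new IntTree();
        for (int v : new int[] {4, 2, 6, 1, 3}) tree.insert(v);
        System.out.println(IntTree.drain(tree.inOrder()));  // [1, 2, 3, 4, 6]
        System.out.println(IntTree.drain(tree.preOrder())); // [4, 2, 1, 3, 6]
    }
}
```

Client code only depends on java.util.Iterator, so the tree could change its node layout or balancing strategy without breaking callers.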

Yes, your understanding covers most of it.
For a more technical treatment, refer to the Iterator Design Pattern article: https://sourcemaking.com/design_patterns/iterator

Is there a way to create a List/Set which keeps insertion order and does not allow duplicates in Java?

What is the most efficient way of maintaining a list that does not allow duplicates, but maintains insertion order and also allows the retrieval of the last inserted element in Java?

Try LinkedHashSet, which keeps the order of input.
Note that re-inserting an element that is already in the set does not change its position in the insertion order: adding a duplicate is a no-op, and add() returns false.
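A small sketch of how LinkedHashSet behaves here, per its Javadoc (insertion order is not affected when an element is re-inserted). The helper last() is a hypothetical name for this example; it walks the whole set, so it is O(n) rather than the efficient lookup the question asks for.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class InsertionOrderDemo {
    // Hypothetical helper: walks the set to find the most recently inserted
    // element. Returns null for an empty set. Linear time.
    static <T> T last(Set<T> set) {
        T result = null;
        for (T item : set) {
            result = item;
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> set = new LinkedHashSet<>();
        set.add("a");
        set.add("b");
        set.add("c");
        boolean added = set.add("a");  // duplicate: rejected, order unchanged
        System.out.println(added);     // false
        System.out.println(set);       // [a, b, c]
        System.out.println(last(set)); // c
    }
}
```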
Edit:
You could also try the Apache Commons Collections class ListOrderedSet, which according to the JavaDoc (if I didn't misread anything again :) ) decorates a set in order to keep insertion order and provides a get(index) method.
Thus, it seems you can get what you want by using new ListOrderedSet(new HashSet());
Unfortunately this class doesn't provide a generic parameter, but it might get you started.
Edit 2:
Here's a project that seems to represent commons collections with generics, i.e. it has a ListOrderedSet<E> and thus you could for example call new ListOrderedSet<String>(new HashSet<String>());

I don't think there's anything in the JDK which does this.
However, LinkedHashMap, which is used as the basis for LinkedHashSet, comes close: in the implementation I looked at (these are private details and may differ between JDK versions), it maintains a circular doubly-linked list of the entries in the map. It only tracks the head of the list, not the tail, but because the list is circular, header.before is the tail (the most recently inserted element).
You could therefore implement what you need on top of this. LinkedHashMap has not been designed for extension, so this is somewhat awkward. You could copy the code into your own class and add a suitable last() method (be aware of licensing issues here), or you could extend the existing class, and add a method which uses reflection to get at the private header and before fields.
That would get you a Map, rather than a Set. However, HashSet is already a wrapper which makes a Map look like a Set. Again, it is not designed for general extension, but you could write a subclass whose constructor calls the super constructor, then uses more reflection to replace the superclass's value of map with an instance of your new map. From there on, the class should do exactly what you want.
As an aside, the library classes here were all written by Josh Bloch and Neal Gafter. Those guys are two of the giants of Java. And yet the code in there is largely horrible. Never meet your heroes.

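Just use a TreeSet. (Note that a TreeSet rejects duplicates but keeps its elements in sorted order, not insertion order, so it only fits if sorted order is acceptable.)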

What is the difference between iterators in Java and C++?

How is the implementation of Iterator in Java different from that in C++?

The C++98 standard library (in particular the portion formerly known as STL) defines a form of iterators that are very close to C pointers (including arithmetic). As such they just point somewhere. To be useful, you generally need two of them so that you can iterate between them. Later standards hide the pair more often: C++11 added the range-based for loop, and C++20 added a ranges library that bundles the begin/end pair into one object.
Java introduced the Iterator (and ListIterator) interface in 1.2, largely taking over from the more verbose Enumeration. Java has no pointer arithmetic, so there is no need for iterators to behave like pointers. They have a hasNext method to check whether they have reached the end, instead of requiring two iterators. The downside is that they are less flexible: the system requires methods such as subList, rather than iterating between two iterators at specific points in the containing list.
A general difference in style is that whereas C++ uses "static polymorphism" through templates, Java uses interfaces and the common dynamic polymorphism.
The concept of iterators is to provide the glue to allow separation of "algorithm" (really control flow) and data container. Both approaches do that reasonably well. In ideal situations "normal" code should barely see iterators.

C++ doesn't specify how iterators are implemented. It does however specify their interface and the minimal behaviour they must provide in order to work with other Standard Library components.
For example, for input iterators, the Standard specifies that an iterator must be dereferenceable via the * operator. It does not however specify how that operator is to be implemented.

In C++, you see people passing around iterators all the time. C++ iterators "point" to a specific element of a container. You can dereference an iterator to get the element (and you can do this over and over). You can erase the element that an iterator refers to efficiently. You can also make copies of an iterator (either by assigning to another variable, or passing an iterator by value to a function) to keep track of multiple places at the same time. Iterators in C++ can become "invalidated" by certain operations on the container, depending on the container. When an iterator becomes invalidated (the rules of which might be complex), operations with the iterator have undefined behavior, and may (inconsistently) crash your program or return incorrect results; although in some data structures (e.g. std::list), iterators remain valid through most modifications of the container.
In Java, you don't see that kind of usage. An Iterator in Java points "between" two elements, rather than "at" an element. The only way you get elements with an iterator is to move it forwards to get the element you moved over. Of course that changes the state of the iterator, and you can't go back; unless you have a ListIterator in which case you can move both forwards and backwards (but it is still annoying to have to move forwards and backwards just to remain still). You can't copy an iterator, because the interface does not expose a public copy method; so for example, you can't just pass a marker of a specific location to a function, without giving that function a reference to the same iterator that you have, and thereby allowing them to change the state of your iterator. In Java, an iterator will be invalidated (at least in the standard containers) by any modification of the container except through the iterator's own add() or remove() methods, which is overly conservative (e.g. most modifications on a LinkedList should not affect an iterator). When you try to use an invalidated iterator, it raises ConcurrentModificationException (confusingly named because it has nothing to do with Concurrency) instead of possibly causing undefined behavior, which is good.
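The invalidation rule on the Java side can be sketched with an ArrayList. The class and method names below are made up for this example; the behavior shown (removal through the iterator is fine, removal behind its back makes the next call to next() throw) is what ArrayList's fail-fast iterator does in this single-threaded case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class IteratorRemovalDemo {
    // Safe: structural changes go through the iterator itself.
    static List<Integer> removeEvens(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return list;
    }

    // Unsafe: modifying the list directly invalidates the live iterator.
    // Returns true if the iterator detected the modification.
    static boolean directRemovalFails() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        Iterator<Integer> it = list.iterator();
        it.next();
        list.remove(Integer.valueOf(4)); // bypasses the iterator
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeEvens(Arrays.asList(1, 2, 3, 4))); // [1, 3]
        System.out.println(directRemovalFails());                   // true
    }
}
```

Note that no second thread is involved anywhere, which is why the exception name is confusing.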

C++ iterators (STL) try to mimic pointer syntax as much as possible, through operator overloading.
The standard defines the various iterator concepts (like forward, bidirectional, random access, input, output). Each concept should match a specific interface (e.g. the ++ operator for a forward iterator to go to the next element in sequence, -- for bidirectional, +, += for random access, etc).

Implementations are defined entirely by standard library vendors/JRE vendors in C++/Java respectively. They are free to implement them however they want, as long as their behaviour conforms to the respective standards.

Java hashmaps without the value?

Let's say I want to put words in a data structure and I want to have constant time lookups to see if the word is in this data structure. All I want to do is to see if the word exists. Would I use a HashMap (containsKey()) for this? HashMaps use key->value pairings, but in my case I don't have a value. Of course I could use null for the value, but even null takes space. It seems like there ought to be a better data structure for this application.
The collection could potentially be used by multiple threads, but since the objects contained by the collection would not change, I do not think I have a synchronization/concurrency requirement.
Can anyone help me out?

Use HashSet instead. It's a hash implementation of Set, which is used primarily for exactly what you describe (an unordered set of items).

You'd generally use an implementation of Set, and most usually HashSet. If you did need concurrent access, then a set backed by a ConcurrentHashMap (ConcurrentHashMap.newKeySet() in Java 8+, or Collections.newSetFromMap(new ConcurrentHashMap<>())) provides a drop-in replacement that provides safe, concurrent access, including safe iteration over the set.
I'd recommend in any case referring to it as simply a Set throughout your code, except in the one place where you construct it; that way, it's easier to drop in one implementation for the other if you later require it.
Even if the set is read-only, if it's used by a thread other than the one that creates it, you do need to think about safe publication (that is, making sure that any other thread sees the set in a consistent state: remember that memory writes, even in constructors, aren't guaranteed to be made available to other threads when or in the order you expect, unless you take steps to ensure this). This can be done by doing both of the following:
making sure the only reference(s) to the set are in final fields;
making sure that it really is true that no thread modifies the set.
You can help to ensure the latter by using the Collections.unmodifiableSet() wrapper. This gives you an unmodifiable view of the given set-- so provided no other "normal" reference to the set escapes, you're safe.
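A minimal sketch of those two steps together, assuming a read-only word list. The class name Dictionary is hypothetical: the set is copied once, wrapped with Collections.unmodifiableSet(), stored in a final field, and never modified afterwards.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical read-only dictionary that is shared between threads.
final class Dictionary {
    // final field + no later mutation: other threads that obtain this object
    // after construction see the fully populated set.
    private final Set<String> words;

    Dictionary(Collection<String> source) {
        // Defensive copy, so no "normal" reference to the backing set escapes.
        this.words = Collections.unmodifiableSet(new HashSet<>(source));
    }

    boolean contains(String word) {
        return words.contains(word);
    }
}

class DictionaryDemo {
    public static void main(String[] args) throws InterruptedException {
        Dictionary dict = new Dictionary(Arrays.asList("alpha", "beta"));
        Thread reader = new Thread(() -> System.out.println(dict.contains("beta"))); // true
        reader.start();
        reader.join();
    }
}
```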

You probably want to use a java.util.Set. Implementations include java.util.HashSet, which is the Set equivalent of HashMap.
Even if the objects contained in the collection do not change, you may need to do synchronization. Do new objects need to be added to the Set after the Set is passed to a different thread? If so, you can use Collections.synchronizedSet() to make the Set thread-safe.
If you have a Map with values, and you have some code that just wants to treat the Map as a Set, you can use Map.keySet() (though keep in mind that keySet returns a Set view of the keys in the Map; if the Map is mutable, the Map can be changed through the set returned by keySet).
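To treat a map's keys as a set, the method to reach for is Map.keySet(), and the "view" caveat can be seen in a few lines. The class and helper names are invented for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class KeySetViewDemo {
    // Removes a key through the key-set view and reports whether the
    // mapping is gone from the map itself afterwards.
    static boolean removeThroughView(Map<String, Integer> map, String key) {
        map.keySet().remove(key);
        return !map.containsKey(key);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        Set<String> keys = map.keySet();   // a live view, not a copy
        System.out.println(keys.contains("a"));            // true
        System.out.println(removeThroughView(map, "a"));   // true: mapping removed
        map.put("c", 3);
        System.out.println(keys.contains("c"));            // true: view sees new keys
    }
}
```

If callers should not be able to change the map this way, handing out a copy such as new HashSet<>(map.keySet()) avoids the problem.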

You want to use a Collection implementing the Set interface, probably HashSet to get the performance you stated. See http://java.sun.com/javase/6/docs/api/java/util/Set.html

Other than Sets, in some circumstances you might want to convert a Map into a Set with Collections.newSetFromMap(Map<E,Boolean>) (some Maps disallow null values, hence the Boolean).
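For instance, Collections.newSetFromMap() combined with a ConcurrentHashMap gives a thread-safe Set. The sketch below uses a made-up factory method name, newConcurrentSet(), and two writer threads with overlapping ranges.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class SetFromMapDemo {
    // The backing map must be empty when it is passed in, and should not be
    // accessed directly afterwards.
    static Set<String> newConcurrentSet() {
        return Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
    }

    public static void main(String[] args) throws InterruptedException {
        Set<String> words = newConcurrentSet();
        Thread a = new Thread(() -> {
            for (int i = 0; i < 1000; i++) words.add("w" + i);
        });
        Thread b = new Thread(() -> {
            for (int i = 500; i < 1500; i++) words.add("w" + i);
        });
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(words.size()); // 1500 distinct words
    }
}
```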

As everyone said, HashSet is probably the simplest solution, but lookup in a HashSet is constant time only on average, not guaranteed (entries whose hashes collide may be chained), and you will store a dummy object (always the same one) for every entry...
For information, here is a list of data structures; maybe you'll find one that better fits your needs.

when to use Set vs. Collection?

Is there any practical difference between a Set and Collection in Java, besides the fact that a Collection can include the same element twice? They have the same methods.
(For example, does Set give me more options to use libraries which accept Sets but not Collections?)
edit: I can think of at least 5 different situations to judge this question. Can anyone else come up with more? I want to make sure I understand the subtleties here.
1. designing a method which accepts an argument of Set or Collection. Collection is more general and accepts more possibilities of input. (if I'm designing a specific class or interface, I'm being nicer to my consumers and stricter on my subclassers/implementers if I use Collection.)
2. designing a method which returns a Set or Collection. Set offers more guarantees than Collection (even if it's just the guarantee not to include one element twice). (if I'm designing a specific class or interface, I'm being nicer to my consumers and stricter on my subclassers/implementers if I use Set.)
3. designing a class that implements the interface Set or Collection. Similar issues as #2. Users of my class/interface get more guarantees, subclassers/implementers have more responsibility.
4. designing an interface that extends the interface Set or Collection. Very similar to #3.
5. writing code that uses a Set or Collection. Here I might as well use Set; the only reasons for me to use Collection are if I get back a Collection from someone else's code, or if I have to handle a collection that contains duplicates.

Collection is also the supertype of List, Queue, Deque, and others, so it gives you more options. For example, I try to use Collection as a parameter to library methods that shouldn't explicitly depend on a certain type of collection.
Generally, you should use the right tool for the job. If you don't want duplicates, use Set (or SortedSet if you want ordering, or LinkedHashSet if you want to maintain insertion order). If you want to allow duplicates, use List, and so on.

I think you already have it figured out- use a Set when you want to specifically exclude duplicates. Collection is generally the lowest common denominator, and it's useful to specify APIs that accept/return this, which leaves you room to change details later on if needed. However if the details of your application require unique entries, use Set to enforce this.
Also worth considering is whether order is important to you; if it is, use List, or LinkedHashSet if you care about order and uniqueness.

See Java's Collection tutorial for a good walk-through of Collection usage. In particular, check out the class hierarchy.

As @mmyers states, Collection includes Set, as well as List.
When you declare something as a Set, rather than a Collection, you are saying that the variable cannot be, say, a List. It will always be a Collection, though. So, any function that accepts a Collection will accept a Set, but a function that accepts a Set cannot take an arbitrary Collection (unless you cast it to a Set, which only works if the object really is one).
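A short sketch of the parameter/return type trade-off, with invented method names: totalLength() takes the general type so any collection can be passed in, while distinct() returns the specific type so the caller gets the no-duplicates guarantee.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ParameterTypesDemo {
    // Accepts any Collection: a List, a Set, a Queue, ...
    static int totalLength(Collection<String> items) {
        int total = 0;
        for (String s : items) {
            total += s.length();
        }
        return total;
    }

    // Returns a Set: the caller is guaranteed there are no duplicates.
    static Set<String> distinct(Collection<String> items) {
        return new HashSet<>(items);
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "bb", "a");
        Set<String> set = distinct(list);

        System.out.println(totalLength(list)); // 4: the List is accepted
        System.out.println(totalLength(set));  // 3: so is the Set
        // A method declared as foo(Set<String>) would not accept `list`
        // without converting it first, e.g. via distinct(list).
    }
}
```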

One other thing to consider... Sets have extra overhead in time, memory, and coding in order to guarantee that there are no duplicates. (Time and memory because sets are usually backed by a HashMap or a Tree, which adds overhead over a list or an array. Coding because you have to implement the hashCode() and equals() methods.)
I usually use sets when I need a fast implementation of contains() and use Collection or List otherwise, even if the collection shouldn't have duplicates.
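The "coding" overhead mentioned above looks roughly like this for a hash-based set. GridPoint is a hypothetical value class; without the equals() and hashCode() overrides, two points with the same coordinates would be treated as different elements.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class used as a set element.
final class GridPoint {
    final int x;
    final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint p = (GridPoint) o;
        return x == p.x && y == p.y;
    }

    // Must be consistent with equals(): equal points give equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class GridPointSetDemo {
    public static void main(String[] args) {
        Set<GridPoint> points = new HashSet<>();
        System.out.println(points.add(new GridPoint(1, 2)));      // true
        System.out.println(points.add(new GridPoint(1, 2)));      // false: duplicate
        System.out.println(points.contains(new GridPoint(1, 2))); // true
        System.out.println(points.size());                        // 1
    }
}
```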

You should use a Set when that is what you want.
For example, when you want a collection without any order or duplicates. Methods like contains() are quite useful.
A collection is much more generic. I believe that what mmyers wrote on their usage says it all.

The practical difference is that Set enforces set logic, i.e. no duplicates and (in general) no guaranteed order, while Collection does not. So if you need a Collection and you have no particular requirement for avoiding duplicates, then use a Collection. If you have the requirement for a Set, then use Set. Generally, use the most general interface possible.

As Collection is a supertype of Set and SortedSet, these can be passed to a method which expects a Collection. Collection just means it may or may not be sorted, ordered, or allow duplicates.
