Can iterator.next() remove elements from the source? (Java)

There is a Java function
public <T> void batchWrite(Iterable<T> source, int number)
that writes a large number of items in a time-efficient way.
I want to use this batchWrite() on a ConcurrentLinkedQueue that may be written to while batchWrite() works. And I want batchWrite() to delete the items that it takes from the queue.
I can write an iterator (and wrap it into an Iterable) that will delete the returned items:
class IteratorThatRemovesReturnedValues<T> implements Iterator<T> {
    private final Queue<T> queue;
    IteratorThatRemovesReturnedValues(Queue<T> q) { queue = q; }
    @Override public boolean hasNext() { return queue.peek() != null; }
    @Override public T next() { return queue.poll(); } // poll() returns null when the queue is empty
}
The question is: would that be an abuse of the concept?
The description of remove() says:
The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this (remove()) method.
This can be read either way: a particular implementation of Iterator may or may not be allowed to define what happens when the underlying wait-free queue is modified in some specific way.
Will removing returned elements from the underlying queue contradict the iterator's contract?
(The alternative is to poll() an amount of items into an auxiliary ArrayList and invoke batchWrite() on that list.)
EDIT: The other side of this question: can an iterator be backed by a pipe?

Assuming that you're not considering the (many) caveats of the Iterator#remove() method, the behavior of an iterator, according to the contract, is simple. Particularly, it does not say anything about the underlying implementation.
A side note: Historically, Iterator was associated with iteration over a Collection, but this may just be a remedy for the legacy Enumeration class. There are many Iterator implementations that are not related to a collection (for example, Scanner, and many more). Nowadays, an Iterator is often really not much more than an "abstract source of things that allows you to query whether there are more things available".
But strictly speaking, your implementation already does not obey the contract. The documentation of the Iterator#next() method says:
Throws:
NoSuchElementException - if the iteration has no more elements
This is not the case for you. Your implementation would simply return null here. Of course, this could easily be fixed by replacing the poll() call with a call to remove(), which would conveniently throw a NoSuchElementException, but I guess that would contradict your current goals.
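A minimal sketch of the remove()-based variant mentioned above. The class name DrainingIterator is made up for illustration, and the gap between hasNext() and next() remains racy if another consumer drains the same queue.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical name; a sketch, not a drop-in replacement.
class DrainingIterator<T> implements Iterator<T> {
    private final Queue<T> queue;

    DrainingIterator(Queue<T> queue) {
        this.queue = queue;
    }

    @Override
    public boolean hasNext() {
        return queue.peek() != null;
    }

    @Override
    public T next() {
        // Queue.remove() throws NoSuchElementException when the queue is empty,
        // which is what the Iterator#next() contract asks for.
        return queue.remove();
    }

    public static void main(String[] args) {
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        queue.add("a");
        DrainingIterator<String> it = new DrainingIterator<>(queue);
        System.out.println(it.next());        // a
        try {
            it.next();
        } catch (NoSuchElementException e) {
            System.out.println("exhausted");  // exhausted
        }
    }
}
```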
I think that the crucial point is that your implementation of the next() method has a side effect that will be visible to the outside world. One example was already given in the comments: When you create two of these iterator instances, then both may behave according to their contract, but the behavior of both together may differ from what one would expect.
The latter only refers to single-threaded usage. When two threads are independently using these iterators, then an interweaving of calls to hasNext() and next() may induce a race condition. (Even more when two threads are using a single iterator, but this is the case for all iterators, because they usually are stateful anyhow).
The bottom line is: Although some details of the intended usage are still not clear (particularly regarding the question of how exactly the iterator is used in the batchWrite method, and how the multiple threads are supposed to interact here), this way of implementing it is likely to break sooner or later, and the bugs will be hard to detect and reproduce. I'd recommend considering alternative implementations here.
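One such alternative is the one already mentioned in the question: poll() a bounded number of items into an auxiliary list and hand that list to batchWrite(). A minimal sketch, where the class and method names (QueueDrainer, drain) are assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class QueueDrainer {
    // Removes up to maxItems elements from the queue and returns them in order.
    static <T> List<T> drain(Queue<T> queue, int maxItems) {
        List<T> batch = new ArrayList<>();
        T item;
        // Check the size first so that no extra element is polled and lost.
        while (batch.size() < maxItems && (item = queue.poll()) != null) {
            batch.add(item);
        }
        return batch;
    }

    public static void main(String[] args) {
        Queue<Integer> queue = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < 5; i++) {
            queue.add(i);
        }
        List<Integer> batch = drain(queue, 3);
        System.out.println(batch + " left: " + queue); // [0, 1, 2] left: [3, 4]
    }
}
```

The returned list is a private snapshot, so iterating over it has no side effects on the queue, and other threads can keep adding to the queue in the meantime.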

Related

Does the Internal Iterator use the hasNext() and next() to iterate in Java?

I'm learning about internal iterators and I understand that an internal iterator manages the iterations in the background:
public void internalIterator(){
List<String> namesList = Arrays.asList("Tom", "Dick", "Harry");
namesList.forEach(name -> System.out.println(name));
}
But I think that the enhanced for loop does the same thing:
public void enhancedForLoop(){
List<String> namesList = Arrays.asList("Tom", "Dick", "Harry");
for(String name : namesList){
System.out.println(name);
}
}
I know that the enhanced for loop uses the hasNext() and next() methods in the background, and that the enhanced for loop is an external iterator. Then why is the forEach() method an internal iterator? Doesn't forEach() also use hasNext() and next() in the background? How are the iterations managed in the background differently from the enhanced for loop? And is iteration faster with forEach() than with the enhanced for loop? Any feedback will be appreciated.

See also the definition of forEach on the interface Iterable.
Both constructs use different methods of the same interface.
Conceptually, the difference is that in an "enhanced for loop" the class implementing the Iterable only creates the Iterator, and the loop construct is responsible for advancing it (see also this related question).
When calling forEach, the class controls the entire process of iteration, and can pick whatever is most suitable for its underlying data structure. For example, it could avoid creating an Iterator object, and instead use some internal array index or similar.
Other than that, they should be equivalent. In most cases I wouldn't expect any difference in performance.
See this excellent answer for some additional reasons why you might want to use forEach, e.g. additional consistency guarantees when iterating synchronized collections, and this one mentioning things that it does not provide, such as flow control (e.g. short-circuiting using break) or support for checked exceptions.
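To illustrate the point about a class picking its own traversal in forEach, here is a small sketch. The class ArrayBackedNames is hypothetical: its iterator() serves the enhanced for loop, while its forEach override walks the backing array by index without creating an Iterator.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical class, for illustration only.
final class ArrayBackedNames implements Iterable<String> {
    private final String[] names;

    ArrayBackedNames(String... names) {
        this.names = names.clone();
    }

    // Used by the enhanced for loop: the caller advances the iterator.
    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < names.length;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return names[index++];
            }
        };
    }

    // Used by forEach: the class drives the loop itself, with a plain index.
    @Override
    public void forEach(Consumer<? super String> action) {
        Objects.requireNonNull(action);
        for (int i = 0; i < names.length; i++) {
            action.accept(names[i]);
        }
    }
}
```

Both `for (String n : names)` and `names.forEach(...)` visit the same elements in the same order; only the party that drives the traversal differs.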

The difference is what the user of the API has to do.
Differences
External
When using iterator(), it's the user's job (your job) to manage the traversal: when to call hasNext() and when to call next(). The iteration is handled externally from the API, by the user of the API. The user both iterates through the elements & consumes the elements.
Internal
When using forEach, it's the API's job to manage the traversal. The iteration is handled internally instead of by the user; the user only consumes the elements.
Conclusion
It doesn't matter if hasNext() and next() are being called. What does matter is who calls hasNext() and next() - who handles the iteration, who is in charge of the iterator.
forEach is internal because the user doesn't have control over how the elements are iterated through. The API handles it; the iterator is internal.
iterator() is external because the user must define how iteration will occur. The API passes you the iterator, and that's all the API does. The iterator is external.
Even though forEach uses iterator() in some cases, that iterator is still internal to the API you're using (the List). Whoever calls forEach still only worries about consuming elements. The user doesn't control how the elements are traversed, so it doesn't matter what forEach uses.
Suggestions on when to use a strategy
You'd use an enhanced loop when you need the most basic of sequential iteration.
You'd use an iterator() when you need more complex iteration.
You'd use forEach when you're only worried about consuming the elements, and don't mind if the API decides on how to traverse through the collection.

I don't know where you got that "enhanced for loop is an external iterator", but Iterable.forEach just does the same thing as your second example (could be overridden, though I don't see any reason to).
default void forEach(Consumer<? super T> action) {
Objects.requireNonNull(action);
for (T t : this) {
action.accept(t);
}
}

Is there a way to opt for "unspecified behavior" rather than ConcurrentModificationException?

I know that code like
for (Object o : collection) {
    if (condition(o)) {
        collection.remove(o);
    }
}
will throw a ConcurrentModificationException, and I understand why: modifying the collection directly could interfere with the Iterator's ability to keep track of its place, by, for instance, leaving it with a reference to an element that's no longer a part of the collection, or causing it to skip over one that's just been added. For code like the above, that's a reasonable concern. However, I would like to write something like
for (Object o: set){// set is an instance of java.util.LinkedHashSet
if (condition(o)){
set.remove(other(o));
}
}
Where other(o) is guaranteed to be "far" from o in the ordering of set. In my particular implementation it will never be less than 47 "steps" away from o. Additionally, if condition(o) is true, the loop in question will be guaranteed to short-circuit well before it reaches the place where other(o) was. Thus the entire portion of the set accessed by the iterator is thoroughly decoupled from the portion that is modified. Furthermore, the particular strengths of LinkedHashSet (fast random-access insertion and removal, guaranteed iteration order) seem particularly well-suited to this exact sort of operation.
I suppose my question is twofold: First of all, is such an operation still dangerous given the above constraints? The only way that I can think that it might be is that the Iterator values are preloaded far in advance and cached, which I suppose would improve performance in many applications, but seems like it would also reduce it in many others, and therefore be a strange choice for a general-purpose class from java.util. But perhaps I'm wrong about that. When it comes to things like caching, my intuition about efficiency is often suspect. Secondly, assuming this sort of thing is, at least in theory, safe, is there a way, short of completely re-implementing LinkedHashSet, or sacrificing efficiency, to achieve this operation? Can I tell Collections to ignore the fact that I'm modifying a different part of the Set, and just go about its business as usual? My current work-around is to add elements to an intermediate collection first, then add them to the main set once the loop is complete, but this is inefficient, since it has to add the values twice.

The ConcurrentModificationException is thrown because your collection may not be able to handle the removal (or addition) at all times. For example, what if the removal you performed meant that your LinkedHashSet had to reduce/increase the space the underlying HashMap takes under the hood? It would have to make a lot of changes, which would possibly render the iterator useless.
You have two options:
Use Iterator to iterate elements and remove them, e.g. calling Iterator iter = linkedHashSet.iterator() to get the iterator and then remove elements by iter.remove()
Use one of the concurrent collections available under the java.util.concurrent package, which are designed to allow concurrent modifications
This question contains nice details on using Iterator
UPDATE after comments:
You can use the following pattern in order to remove the elements you wish without causing a ConcurrentModificationException: gather the elements you wish to remove in a List while looping through the LinkedHashSet elements. Afterwards, loop through each toBeDeleted element in the list and remove it from the LinkedHashSet.
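A minimal sketch of that collect-then-remove pattern. The condition() and other() methods here are stand-ins invented for illustration; the question does not show the real ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DeferredRemoval {
    // Stand-ins for the question's condition(o) and other(o); purely illustrative.
    static boolean condition(int o) {
        return o == 1;
    }

    static int other(int o) {
        return o + 3;
    }

    static void removeOthers(Set<Integer> set) {
        List<Integer> toBeDeleted = new ArrayList<>();
        // First pass: only read from the set, remember what should go.
        for (Integer o : set) {
            if (condition(o)) {
                toBeDeleted.add(other(o));
            }
        }
        // Second pass: no iterator over the set is active any more.
        for (Integer doomed : toBeDeleted) {
            set.remove(doomed);
        }
    }

    public static void main(String[] args) {
        Set<Integer> set = new LinkedHashSet<>(Arrays.asList(1, 2, 3, 4, 5));
        removeOthers(set);
        System.out.println(set); // [1, 2, 3, 5]
    }
}
```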

What is a thread-safe List implementation that allows Collections.sort

I have to write a program that requires a list. This list needs to be thread-safe in its implementation (mostly to avoid ConcurrentModificationException) but ALSO needs to allow the Collections.sort() method to be applied, for API reasons.
CopyOnWriteArrayList fulfills the former, but not the latter, and other implementations I can find allow the latter but not the former.
Does Java have a list implementation that will work for me?
EDIT: An important point to note is that unfortunately my code needs to be Java 6 compatible.

I am wondering if this is actually possible on a conceptual level: for a sort operation to be consistent, I would expect that the whole list is blocked for any adds/removes while the sorting is going on.
But Collections.sort() has no idea that it would need to lock the whole list while doing its work. You give it a list, and if another thread is trying to modify the list at the same time ... good luck with that.
Or if you reverse the point of view: how should a "thread-safe" list understand that it is right now in the process of being sorted; so - some accesses (like swapping elements) are fine; but others (like adding/removing) elements are not?!
In other words: I think you can only do this: pick any of the "thread-safe" list implementations; and then you have to put your own wrapper in place that
"Locks" the list for changes
Does the sorting work
"Unlocks" the list
And of course, for "2."; you are free to turn to Collections.sort().
Or, if you are using Java8 - you use the CopyOnWriteArrayList and its already implemented sort() method (which is kind of proving my point: you can only do proper sorting if you own the list while running the sort operation!).
Given your latest comment: of course, you could manually "backport" the Java8 version of CopyOnWriteArrayList into your environment and use that; but of course, that won't help, as I understand that Java6-Collections.sort() will not call the new sort() method from that class.
So, it seems that the sum of your requirements can't be resolved; and you will have to bite the bullet and do most of that in your own code.
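The lock / sort / unlock steps above can be sketched with Collections.synchronizedList, which is available in Java 6. This is only one possible reading of "your own wrapper": the helper name sortLocked is an assumption, and other threads that iterate over the list must synchronize on it as well, as the Javadoc of synchronizedList requires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class LockedSort {
    // Expects a list obtained from Collections.synchronizedList(...).
    static <T extends Comparable<? super T>> void sortLocked(List<T> syncList) {
        // The synchronized wrapper uses itself as its lock, so holding its
        // monitor keeps other threads' add/remove calls out during the sort.
        synchronized (syncList) {
            Collections.sort(syncList);
        }
    }

    public static void main(String[] args) {
        List<Integer> list = Collections.synchronizedList(
                new ArrayList<Integer>(Arrays.asList(3, 1, 2)));
        sortLocked(list);
        System.out.println(list); // [1, 2, 3]
    }
}
```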

Well, CopyOnWriteArrayList locks the entire collection (for insertion) while sorting. No?
Looks like you are good with CopyOnWriteArrayList. Below is the snippet from this class -
public void sort(Comparator<? super E> c) {
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        Object[] elements = getArray();
        Object[] newElements = Arrays.copyOf(elements, elements.length);
        @SuppressWarnings("unchecked") E[] es = (E[])newElements;
        Arrays.sort(es, c);
        setArray(newElements);
    } finally {
        lock.unlock();
    }
}
Hmm.... since you've updated the question to say that the code needs to be Java6 compatible, I'd say that you should extend the normal list and make use of https://docs.oracle.com/javase/6/docs/api/java/util/concurrent/locks/ReadWriteLock.html. With this type of lock, 2 threads can hold the 'read' lock simultaneously, and readers are blocked only while some other thread holds the writeLock.
Btw, this technique will require your caller to know that Collections.sort(...) shouldn't be called, since you will have to expose an explicit sort() method on your list. Hmm.... not sure if this was helpful.
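A rough sketch of the ReadWriteLock idea, Java 6 compatible. It uses composition instead of extending a list class to stay short, and the class name RwLockedList and its three methods are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper; only the operations needed for the example are shown.
class RwLockedList<T extends Comparable<? super T>> {
    private final List<T> items = new ArrayList<T>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    void add(T item) {
        lock.writeLock().lock();
        try {
            items.add(item);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Several threads may read at once as long as nobody holds the write lock.
    T get(int index) {
        lock.readLock().lock();
        try {
            return items.get(index);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The explicit sort() that callers use instead of Collections.sort(list).
    void sort() {
        lock.writeLock().lock();
        try {
            Collections.sort(items);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```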

Looking for an unbounded, queue-based, concurrent implementation of java.util.Set

I'm looking for an implementation of java.util.Set with the following features:
Should be concurrent without relying on synchronized locking, so I obviously don't want to use Collections.synchronizedSet().
Should keep insertion order. So ConcurrentSkipListSet is not preferable, because it uses compareTo() as equals(), and requires providing either an implementation of Comparable or a Comparator. There is also a ConcurrentLinkedHashMap which, unlike LinkedHashMap, doesn't keep insertion order.
Should be unbounded.
Preferably a FIFO linked list, as my operations are done only on the first element of the queue.
As far as I could find the only proper impl is CopyOnWriteArraySet, but it states in the documentation that:
Mutative operations (add, set, remove, etc.) are expensive since they usually entail copying the entire underlying array.
In my case, I have lots of insertions at the end of the queue (set) and lots of deletions (and reads) from the head of the queue. So, any recommendation?

The following solution has a race condition on removal. It also behaves somewhat differently from standard JDK Set implementations.
However, it uses standard JDK objects, and is a simple implementation. Only you can decide whether this race condition is acceptable, or whether you're willing to invest the time to find/implement a solution without races.
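import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class FifoSet<T>
{
    private final ConcurrentHashMap<T,T> _map = new ConcurrentHashMap<T,T>();
    private final ConcurrentLinkedQueue<T> _queue = new ConcurrentLinkedQueue<T>();

    public void add(T obj)
    {
        // put() returns the previous mapping; non-null means obj is already queued
        if (_map.put(obj,obj) != null)
            return;
        _queue.add(obj);
    }

    // throws NoSuchElementException if the queue is empty
    public T removeFirst()
    {
        T obj = _queue.remove();
        _map.remove(obj);
        return obj;
    }
}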
Some more explanation: the ConcurrentHashMap exists solely as a guard on the ConcurrentLinkedQueue; its put() method is essentially a compare-and-swap. So you ensure that you don't have anything in the map before adding to the queue, and you don't remove from the map until you remove from the queue.
The race condition on remove is that there's a space of time between removing the item from the queue and removing it from the map. In that space of time, add will fail, because it still thinks the item is in the queue.
This is imo a relatively minor race condition. One that's far less important than the gap in time between removing the item from the queue and actually doing something with that item.

What are the benefits of the Iterator interface in Java?

I just learned about how the Java Collections Framework implements data structures in linked lists. From what I understand, Iterators are a way of traversing through the items in a data structure such as a list. Why is this interface used? Why are the methods hasNext(), next() and remove() not directly coded to the data structure implementation itself?
From the Java website:
public interface Iterator<E>
An iterator over a collection. Iterator takes the place of Enumeration in the Java collections framework. Iterators differ from enumerations in two ways:
Iterators allow the caller to remove elements from the underlying collection during the iteration with well-defined semantics.
Method names have been improved.
This interface is a member of the Java Collections Framework.
I tried googling around and can't seem to find a definite answer. Can someone shed some light on why Sun chose to use them? Is it because of better design? Increased security? Good OO practice?
Any help will be greatly appreciated. Thanks.

Why is this interface used?
Because it supports the basic operations that would allow a client programmer to iterate over any kind of collection (note: not necessarily a Collection in the Object sense).
Why are the methods... not directly coded to the data structure implementation itself?
They are; they're just marked private so you can't reach into them and muck with them. More specifically:
You can implement or subclass an Iterator such that it does something the standard ones don't do, without having to alter the actual object it iterates over.
Objects that can be traversed over don't need to have their interfaces cluttered up with traversal methods, in particular any highly specialized methods.
You can hand out Iterators to however many clients you wish, and each client may traverse in their own time, at their own speed.
Java Iterators from the java.util package in particular will throw an exception if the storage that backs them is modified while you still have an Iterator out. This exception lets you know that the Iterator may now be returning invalid objects.
For simple programs, none of this probably seems worthwhile. The kind of complexity that makes them useful will come up on you quickly, though.
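The exception mentioned in the last point can be reproduced in a few lines. This is a sketch using ArrayList; the class name FailFastDemo is made up, and the Javadoc describes this fail-fast check as best-effort, so it should not be relied on for correctness.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class FailFastDemo {
    // Returns true if the iterator noticed the modification made behind its back.
    static boolean failsFast() {
        List<String> names = new ArrayList<>(Arrays.asList("Tom", "Dick", "Harry"));
        Iterator<String> it = names.iterator();
        it.next();
        names.add("Sally"); // structural change made directly on the list
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsFast()); // true
    }
}
```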

You ask: "Why are the methods hasNext(), next() and remove() not directly coded to the data structure implementation itself?".
The Java Collections framework chooses to define the Iterator interface as externalized to the collection itself. Normally, since every Java collection implements the Iterable interface, a Java program will call iterator() to create its own iterator so that it can be used in a loop. As others have pointed out, Java 5 lets us avoid direct usage of the iterator by using a for-each loop.
Externalizing the iterator from its collection allows the client to control how one iterates through a collection. One use case that I can think of where this is useful is when one has an unbounded collection, such as all the web pages on the Internet to index.
In the classic GoF book, the contrast between internal and external iterators is spelled out quite clearly.
A fundamental issue is deciding which party controls the iteration, the iterator or the client that uses the iterator. When the client controls the iteration, the iterator is called an external iterator, and when the iterator controls it, the iterator is an internal iterator. Clients that use an external iterator must advance the traversal and request the next element explicitly from the iterator. In contrast, the client hands an internal iterator an operation to perform, and the iterator applies that operation to every element ....
External iterators are more flexible than internal iterators. It's easy to compare two collections for equality with an external iterator, for example, but it's practically impossible with internal iterators ... But on the other hand, internal iterators are easier to use, because they define the iteration logic for you.
For an example of how internal iterators work, see Ruby's Enumerable API, which has internal iteration methods such as each. In Ruby, the idea is to pass a block of code (i.e. a closure) to an internal iterator so that a collection can take care of its own iteration.
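The equality comparison that the GoF quote calls easy with external iterators might look like the following in Java. The class and method names (LockstepCompare, sameElements) are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Objects;

class LockstepCompare {
    // Advances two external iterators in lockstep, which the client can do
    // because it controls when each iterator moves.
    static boolean sameElements(Iterable<?> left, Iterable<?> right) {
        Iterator<?> a = left.iterator();
        Iterator<?> b = right.iterator();
        while (a.hasNext() && b.hasNext()) {
            if (!Objects.equals(a.next(), b.next())) {
                return false;
            }
        }
        // Equal only if both ran out at the same time.
        return !a.hasNext() && !b.hasNext();
    }

    public static void main(String[] args) {
        System.out.println(sameElements(
                Arrays.asList(1, 2, 3),
                new LinkedList<>(Arrays.asList(1, 2, 3)))); // true
        System.out.println(sameElements(
                Arrays.asList(1, 2),
                Arrays.asList(1, 2, 3)));                   // false
    }
}
```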

It is important to keep the collection apart from the pointer. The iterator points at a specific place in a collection, and thus is not an integral part of the collection. This way, for instance, you can use several iterators over the same collection.
The downside of this separation is that the iterator is not aware of changes made to the collection it iterates over. So you cannot change the collection's structure and expect the iterator to continue its work without "complaints".
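A short sketch of several iterators acting as independent cursors over one collection; the class name IndependentCursors and the demo() helper are made up for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IndependentCursors {
    static String demo() {
        List<String> names = Arrays.asList("Tom", "Dick", "Harry");
        Iterator<String> first = names.iterator();
        Iterator<String> second = names.iterator();

        // Advance only the first cursor; the second one stays at the start.
        first.next();
        first.next();

        return second.next() + "," + first.next();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // Tom,Harry
    }
}
```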

Using the Iterator interface allows any class that implements its methods to act as an iterator. The notion of an interface in Java is to have, in a way, a contractual obligation to provide certain functionalities in a class that implements the interface, to act in a way that is required by the interface. Since these contractual obligations must be met for the class to be valid, other classes that see that the class implements the interface can be assured that it provides those functionalities.
In this example, rather than implement the methods (hasNext(), next(), remove()) in the LinkedList class itself, the LinkedList class will declare that it implements the Iterator interface, so others know that the LinkedList can be used as an iterator. In turn, the LinkedList class will implement the methods from the Iterator interface (such as hasNext()), so it can function like an iterator.
In other words, implementing an interface is an object-oriented programming notion to let others know that a certain class has what it takes to be what it claims to be.
This notion is enforced by having methods that must be implemented by a class that implements the interface. This assures other classes that want to use a class implementing the Iterator interface that it will indeed have the methods that Iterators should have, such as hasNext().
Also, it should be noted that since Java does not have multiple inheritance, the use of interface can be used to emulate that feature. By implementing multiple interfaces, one can have a class that is a subclass to inherit some features, yet also "inherit" the features of another by implementing an interface. One example would be, if I wanted to have a subclass of the LinkedList class called ReversibleLinkedList which could iterate in reverse order, I may create an interface called ReverseIterator and enforce that it provide a previous() method. Since the LinkedList already implements Iterator, the new reversible list would have implemented both the Iterator and ReverseIterator interfaces.
You can read more about interfaces from What is an Interface? from The Java Tutorial from Sun.

Multiple instances of an iterator can be used concurrently. Approach them as local cursors for the underlying data.
BTW: favoring interfaces over concrete implementations loosens coupling.
Look for the iterator design pattern, and here: http://en.wikipedia.org/wiki/Iterator

Because you may be iterating over something that's not a data structure. Let's say I have a networked application that pulls results from a server. I can return an Iterator wrapper around those results and stream them through any standard code that accepts an Iterator object.
Think of it as a key part of a good MVC design. The data has to get from the Model (i.e. data structure) to the View somehow. Using an Iterator as a go-between ensures that the implementation of the Model is never exposed. You could be keeping a LinkedList in memory, pulling information out of a decryption algorithm, or wrapping JDBC calls. It simply doesn't matter to the view, because the view only cares about the Iterator interface.
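As a sketch of an Iterator that is not backed by an in-memory data structure, the following wraps a Reader and yields its lines one at a time. A StringReader stands in for the network or JDBC source in the demo; the class name LineIterator is an assumption, and UncheckedIOException requires Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical class: streams lines from any Reader through the Iterator interface.
class LineIterator implements Iterator<String> {
    private final BufferedReader reader;
    private String nextLine; // look-ahead buffer; null means the source is exhausted

    LineIterator(Reader source) {
        this.reader = new BufferedReader(source);
        advance();
    }

    private void advance() {
        try {
            nextLine = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public String next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        String current = nextLine;
        advance();
        return current;
    }

    public static void main(String[] args) {
        Iterator<String> lines = new LineIterator(new StringReader("first\nsecond"));
        while (lines.hasNext()) {
            System.out.println(lines.next());
        }
    }
}
```

Code that consumes the Iterator never learns where the lines come from, which is the decoupling described above.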

An interesting paper discussing the pros and cons of using iterators:
http://www.sei.cmu.edu/pacc/CBSE5/Sridhar-cbse5-final.pdf

I think it is just good OO practice. You can have code that deals with all kinds of iterators, and even gives you the opportunity to create your own data structures or just generic classes that implement the iterator interface. You don't have to worry about what kind of implementation is behind it.

Just M2C, if you weren't aware: you can avoid directly using the iterator interface in situations where the for-each loop will suffice.

Ultimately, because Iterator captures a control abstraction that is applicable to a large number of data structures. If you're up on your category theory fu, you can have your mind blown by this paper: The Essence of the Iterator Pattern.

Well it seems like the first bullet point allows for multi-threaded (or single threaded if you screw up) applications to not need to lock the collection for concurrency violations. In .NET for example you cannot enumerate and modify a collection (or list or any IEnumerable) at the same time without locking or inheriting from IEnumerable and overriding methods (we get exceptions).

Iterator simply adds a common way of going over a collection of items. One of the nice features is i.remove(), with which you can remove elements from the list that you are iterating over. If you just tried to remove items from a list normally, it would have weird effects or throw an exception.
The interface is like a contract for all things that implement it. You are basically saying that anything that implements an iterator is guaranteed to have these methods, and that they behave the same way. You can also use it to pass around iterator types if that is all you care about dealing with in your code (you might not care what type of list it is; you just want to pass an Iterator). You could put all these methods independently in the collections, but then you are not guaranteeing that they behave the same or that they even have the same names and signatures.
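A minimal sketch of the i.remove() feature mentioned above; the class and method names (RemoveWhileIterating, dropEven) are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class RemoveWhileIterating {
    // Returns a copy of the input with the even numbers removed.
    static List<Integer> dropEven(List<Integer> input) {
        List<Integer> numbers = new ArrayList<>(input);
        for (Iterator<Integer> it = numbers.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove(); // removes the element just returned by next()
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(dropEven(Arrays.asList(1, 2, 3, 4))); // [1, 3]
    }
}
```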

Iterators are one of the many design patterns available in java. Design patterns can be thought of as convenient building blocks, styles, usage of your code/structure.
To read more about the Iterator design pattern, check out this website, which talks about Iterator as well as many other design patterns. Here is a snippet from the site on Iterator: http://www.patterndepot.com/put/8/Behavioral.html
The Iterator is one of the simplest and most frequently used of the design patterns. The Iterator pattern allows you to move through a list or collection of data using a standard interface without having to know the details of the internal representations of that data. In addition you can also define special iterators that perform some special processing and return only specified elements of the data collection.

Iterators can be used against any sort of collection. They allow you to define an algorithm against a collection of items regardless of the underlying implementation. This means you can process a List, Set, String, File, Array, etc.
Ten years from now you can change your List implementation to a better implementation and the algorithm will still run seamlessly against it.

Iterator is useful when you are dealing with Collections in Java.
Use the for-each loop (Java 1.5) for iterating over a collection, array, or list.

The java.util.Iterator interface is used in the Java Collections Framework to allow modification of the collection while still iterating through it. If you just want to cleanly iterate over an entire collection, use a for-each instead, but an upside of Iterators is the functionality that you get: an optional remove() operation, and even more with the ListIterator interface, which offers add() and set() operations too. Both of these interfaces allow you to iterate over a collection and change it structurally at the same time. Trying to modify a collection while iterating through it with a for-each would throw a ConcurrentModificationException, usually because the collection is unexpectedly modified!
Take a look at the ArrayList class. It has 2 private classes inside it (inner classes) called Itr and ListItr. They implement the Iterator and ListIterator interfaces respectively.
public class ArrayList..... { //enclosing class
private class Itr implements Iterator<E> {
public E next() {
return ArrayList.this.get(index++); //rough, not exact
}
//we have to use ArrayList.this.get() so the compiler will
//know that we are referring to the methods in the
//enclosing ArrayList class
public void remove() {
ArrayList.this.remove(prevIndex);
}
//checks for...co mod of the list
final void checkForComodification() { //ListItr gets this method as well
if (ArrayList.this.modCount != expectedModCount) {
throw new ConcurrentModificationException();
}
}
}
private class ListItr extends Itr implements ListIterator<E> {
//methods inherited....
public void add(E e) {
ArrayList.this.add(cursor, e);
}
public void set(E e) {
ArrayList.this.set(cursor, e);
}
}
}
When you call the methods iterator() and listIterator(), they return a new instance of the private class Itr or ListItr, and since these inner classes are "within" the enclosing ArrayList class, they can freely modify the ArrayList without triggering a ConcurrentModificationException, unless you change the list at the same time (concurrently) through the set(), add() or remove() methods of the ArrayList class.
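A short sketch of the ListIterator operations described above, changing a list through its own iterator without triggering a ConcurrentModificationException. The class and method names (ListIteratorEdits, edit) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class ListIteratorEdits {
    // Returns a copy of the input in which "b" is replaced by "B" followed by "b2".
    static List<String> edit(List<String> input) {
        List<String> list = new ArrayList<>(input);
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            String s = it.next();
            if (s.equals("b")) {
                it.set("B");  // replaces the element just returned by next()
                it.add("b2"); // inserts right after it; this loop does not revisit it
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(edit(Arrays.asList("a", "b", "c"))); // [a, B, b2, c]
    }
}
```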
