Exception creating TreeSet from concurrently-modified ConcurrentSkipListSet - java

Generally, concurrent collections are safe to iterate; according to Javadoc: 'Iterators are weakly consistent, returning elements reflecting the state of the set at some point at or since the creation of the iterator. They do not throw ConcurrentModificationException, and may proceed concurrently with other operations.'
However, consider this:
import java.util.Random;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;
public class ConcurrencyProblem {
private static volatile boolean modifierIsAlive = true;
public static void main(String[] args) {
final ConcurrentSkipListSet<Integer> concurrentSet = new ConcurrentSkipListSet<>();
Thread modifier = new Thread() {
private final Random randomGenerator = new Random();
public void run() {
while (modifierIsAlive) {
concurrentSet.add(randomGenerator.nextInt(1000));
concurrentSet.remove(randomGenerator.nextInt(1000));
}
};
};
modifier.start();
int sum = 0;
while (modifierIsAlive) {
try {
TreeSet<Integer> sortedCopy = new TreeSet<>(concurrentSet);
// make sure the copy operation is not eliminated by the compiler
sum += sortedCopy.size();
} catch (RuntimeException rte) {
modifierIsAlive = false;
rte.printStackTrace();
}
}
System.out.println("Dummy output: " + sum);
}
}
The output is
java.util.NoSuchElementException
at java.util.concurrent.ConcurrentSkipListMap$Iter.advance(ConcurrentSkipListMap.java:2299)
at java.util.concurrent.ConcurrentSkipListMap$KeyIterator.next(ConcurrentSkipListMap.java:2334)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2559)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2547)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2504)
at java.util.TreeMap.addAllForTreeSet(TreeMap.java:2462)
at java.util.TreeSet.addAll(TreeSet.java:308)
at java.util.TreeSet.<init>(TreeSet.java:172)
at mtbug.ConcurrencyProblem.main(ConcurrencyProblem.java:27)
Dummy output: 44910
I'm wondering if this is a bug or a feature; we did not get a ConcurrentModificationException, but still, having to care about iteration (falling back to synchronized blocks or otherwise) kind of defeats the purpose of ConcurrentSkipListSet/Map. I've been able to reproduce this both with Java 7 and 8 (currently, 8u72 on my Linux box).

As far as I can understand from browsing the sources, the problem with TreeSet is that it calls size() before iterating and then uses it instead of calling hasNext(). This may be a bug, but I think it's just a consequence of red-black trees being complicated structures requiring careful balancing, and therefore knowing size in advance is needed to properly balance it in linear time during creation.
You may circumvent this by iterating manually and adding elements to the TreeSet, but this will lead to n log n complexity, which could be the reason why TreeSet's constructor doesn't do it that way (its API spec guarantees linear time). Of course it could still call hasNext() as it builds the tree, but then some additional actions may be required after the construction is finished to rebalance the tree, which could lead to amortized linear complexity. But red-black trees are a mess as they are, and that kind of hack would make the implementation even messier.
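The manual workaround described above can be sketched roughly as follows. The class and method names (ManualCopy, copyOf) are invented for this illustration and are not part of the JDK. The loop only ever asks the iterator whether another element exists, so it never relies on a size() value read earlier, at the cost of O(n log n) insertion.

```java
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical helper for illustration; not part of the JDK.
final class ManualCopy {
    private ManualCopy() {}

    static <E> TreeSet<E> copyOf(ConcurrentSkipListSet<E> source) {
        // Reuse the source's ordering (a null comparator means natural ordering).
        TreeSet<E> copy = new TreeSet<>(source.comparator());
        // The for-each loop drives the weakly consistent iterator through
        // hasNext()/next(), so a concurrent removal cannot run it off the end.
        for (E element : source) {
            copy.add(element); // O(log n) per element
        }
        return copy;
    }

    public static void main(String[] args) {
        ConcurrentSkipListSet<Integer> source = new ConcurrentSkipListSet<>();
        source.add(3);
        source.add(1);
        source.add(2);
        System.out.println(copyOf(source));
    }
}
```

The result reflects whatever the iterator happened to see, so it is a weakly consistent view rather than an atomic snapshot.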
Still, I think it's very confusing and should probably be documented somewhere in the API docs, but I'm not sure where exactly. Probably in the part where they explain what weakly consistent iterators are. Specifically, it should be mentioned that some library classes rely on the returned size and therefore may throw NoSuchElementException. Mentioning specific classes would also help.

I'm actually starting to lean towards this being a bug in TreeSet/TreeMap (update, it is). The issue, as Sergey alludes, is that TreeMap caches the result of ConcurrentSkipListSet.size() before reading its elements.
TreeSet.addAll() calls TreeMap.addAllForTreeSet() and passes the collection's current size and potentially concurrent Iterator to TreeMap.buildFromSorted(), which ultimately calls Iterator.next() size times.
In other words, it assumes the Collection it is passed will not be modified during construction, which is an erroneous assumption.
Note that even if buildFromSorted() did call Iterator.hasNext() its only option at that point would be to fail, since the backing data structure was modified mid-construction.
Other collections that could potentially have issues copying concurrent structures, including ArrayList, LinkedList, and CopyOnWriteArrayList, explicitly copy the provided collection to an array before doing any actual work in order to avoid this exact issue (most other collections I looked at simply for-each over the elements). I think TreeSet and TreeMap should be doing the same thing.
We actually don't have to accept O(n log n) performance due to this bug, but it's going to be a hack. We can't simply copy the values into an array or other data structure, because then inserting into the TreeSet won't be linear time. But we can lie to TreeSet by claiming the copy is a SortedSet.
public static class IterateOnlySortedSet<E>
        extends AbstractSet<E> implements SortedSet<E> {
    private final ArrayList<E> elements;
    private final Comparator<? super E> comparator;
    public IterateOnlySortedSet(SortedSet<E> source) {
        elements = new ArrayList<>(source);
        comparator = source.comparator();
    }
    @Override
    public Iterator<E> iterator() {
        return elements.iterator();
    }
    @Override
    public int size() {
        return elements.size();
    }
    @Override
    public Comparator<? super E> comparator() {
        return comparator;
    }
    // remaining methods simply throw UnsupportedOperationException
}
Changing your TreeSet construction line to the following now succeeds:
TreeSet<Integer> sortedCopy = new TreeSet<>(new IterateOnlySortedSet<>(concurrentSet));
Nice find :)
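For anyone who wants to try the wrapper above, a fuller version with the remaining SortedSet methods written out might look like the sketch below. It assumes Java 8 or later, and the demo class name SnapshotCopyDemo is made up for this example. Only iterator(), size() and comparator() are expected to be needed by the TreeSet constructor, so everything else just throws.

```java
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

final class IterateOnlySortedSet<E> extends AbstractSet<E> implements SortedSet<E> {
    private final List<E> elements;
    private final Comparator<? super E> comparator;

    IterateOnlySortedSet(SortedSet<E> source) {
        // Snapshot first: from here on, size() and iterator() always agree.
        elements = new ArrayList<>(source);
        comparator = source.comparator();
    }

    @Override public Iterator<E> iterator() { return elements.iterator(); }
    @Override public int size() { return elements.size(); }
    @Override public Comparator<? super E> comparator() { return comparator; }

    // The navigation methods are not supported by this iterate-only view.
    @Override public SortedSet<E> subSet(E from, E to) { throw new UnsupportedOperationException(); }
    @Override public SortedSet<E> headSet(E to) { throw new UnsupportedOperationException(); }
    @Override public SortedSet<E> tailSet(E from) { throw new UnsupportedOperationException(); }
    @Override public E first() { throw new UnsupportedOperationException(); }
    @Override public E last() { throw new UnsupportedOperationException(); }
}

// Hypothetical demo class for illustration.
final class SnapshotCopyDemo {
    public static void main(String[] args) {
        ConcurrentSkipListSet<Integer> concurrentSet = new ConcurrentSkipListSet<>();
        concurrentSet.add(2);
        concurrentSet.add(1);
        concurrentSet.add(3);
        TreeSet<Integer> sortedCopy = new TreeSet<>(new IterateOnlySortedSet<>(concurrentSet));
        System.out.println(sortedCopy);
    }
}
```

The wrapper trusts that the source iterates in comparator order, which holds for a ConcurrentSkipListSet; it would silently build a broken tree if handed elements in any other order.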

Related

creating custom iterator with unit tests

I am learning programming and am new to this domain, as I have a mechanical background.
Yesterday I received a problem statement from my professor, who provided us with a custom Iterator that is designed to iterate over the given elements alternately.
The AlternatingIterator code is as follows.
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Queue;
public class AlternatingIterator<E> implements Iterator {
    private final Queue<E> queue = new LinkedList<>();
    public AlternatingIterator(Iterator<E>... iterators) {
        for (Iterator<E> iterator : iterators) {
            while (iterator.hasNext())
                queue.add(iterator.next());
        }
    }
    @Override
    public boolean hasNext() {
        return queue.isEmpty() ? false : true;
    }
    @Override
    public Object next() {
        return queue.poll();
    }
}
Now, the AlternatingIterator should alternate in order between the iterators it receives in its constructor. For example, if constructed with three iterators [a,b,c], [1,2] and [x,y,z], the iterator should produce the elements in this order: 'a, 1, x, b, 2, y, c, z'.
I also have to write unit tests for the 'hasNext' and 'next' methods.
Can we use a data structure other than a queue?
I have spent a lot of effort trying to understand how to solve this challenge, but I am still very confused. If you can help me, I can learn an important concept very quickly.
Thank you in advance and any help is appreciated.
Of course, we can use anything we can imagine. The following implementation alternates dynamically. Instead of using a queue, I store all the received iterators in an array:
import java.util.Iterator;
/**Alternates on the given iterators.*/
public class AlternatingIterator<E> implements Iterator<E> {
    /**Stores the iterators which are to be alternated on.*/
    private Iterator<E>[] iterators;
    /**The index of the iterator which has the next element.*/
    private int nextIterator = 0;
    /**Initializes a new AlternatingIterator object.
     * Stores the iterators in the iterators field.
     * Finds the first iterator with an available element.
     * Expects at least one iterator to be passed in.*/
    public AlternatingIterator(Iterator<E>... iterators) {
        this.iterators = iterators;
        if (!iterators[0].hasNext())
            findNextIterator();
    }
    @Override
    public boolean hasNext() {
        return iterators[nextIterator].hasNext();
    }
    @Override
    public E next() {
        E element = iterators[nextIterator].next();
        findNextIterator();
        return element;
    }
    /**Steps through the iterators until one has a next element.
     * It does not loop forever: it stops when
     * the most recently used iterator is reached again.*/
    private void findNextIterator() {
        int currentIterator = nextIterator;
        // Find an iterator with an element remaining.
        do {
            stepNextIterator();
        } while (!iterators[nextIterator].hasNext() && nextIterator != currentIterator);
        // If it gets around to the same iterator, then no iterator has an element left.
    }
    /**Increases the nextIterator value without indexing out of bounds.*/
    private void stepNextIterator() {
        nextIterator = (nextIterator + 1) % iterators.length;
    }
}
But the same could be made statically using a queue, enqueuing all the elements from the iterators, having only that one queue as in your (your prof's) code.
@Andy Turner: Collecting the elements up front means the iterator no longer depends on the source collections for the whole time until the last element is obtained. Sure, since Java 8 we have Streams, which won't hand us concurrent modification exceptions, but before Java 8, buffering one or more iterators into a collection could have been safer, in my opinion.
EDIT: you wrote that we could then simply use the iterator of that collection. Yeah, I totally forgot that; implementing an iterator for a simple queue is for sure pointless :)
A queue is helpful here, but not in the way you are using it.
There is no real point in copying all the elements from the iterators provided to the constructor into a queue, and then implementing a custom iterator from this queue: if you are going to put the elements into a collection which already implements Iterable, you may as well just use that Iterable's iterator.
But this is also probably not the point of the exercise: you can do this lazily with respect to consuming the input iterators. (Besides, what if one of the iterators is infinite...)
The idea I would suggest is to make a queue of iterators, not elements. Here is a description of how you could do it; I don't want to give you code to spoil your learning experience:
In your constructor, put the iterators from the parameter into a queue.
To implement hasNext(), pop iterators off the head of the queue for which hasNext() is false; stop when the iterator at the head of the queue has a next element (in which case return true), or the queue is empty (in which case return false).
To implement next(), pop the head iterator out of the queue, and get its next element: this is what you will return. But, before you do, if the iterator has more elements, push it onto the tail of the queue (doing this means that you will look at the next iterator on the next iteration).
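For readers who have already made their own attempt and want something to compare against, the three steps above could be sketched as follows. The class name QueueAlternatingIterator is invented for this example, and using an ArrayDeque as the queue is just one possible choice.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical name; a lazy alternating iterator built on a queue of iterators.
final class QueueAlternatingIterator<E> implements Iterator<E> {
    private final Deque<Iterator<E>> queue = new ArrayDeque<>();

    @SafeVarargs
    QueueAlternatingIterator(Iterator<E>... iterators) {
        // Step 1: queue the iterators themselves, not their elements.
        queue.addAll(Arrays.asList(iterators));
    }

    @Override
    public boolean hasNext() {
        // Step 2: discard exhausted iterators sitting at the head of the queue.
        while (!queue.isEmpty() && !queue.peekFirst().hasNext()) {
            queue.pollFirst();
        }
        return !queue.isEmpty();
    }

    @Override
    public E next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Step 3: take the head iterator, read one element, and re-queue the
        // iterator at the tail if it still has elements left.
        Iterator<E> head = queue.pollFirst();
        E element = head.next();
        if (head.hasNext()) {
            queue.addLast(head);
        }
        return element;
    }
}
```

Because nothing is consumed until next() is called, this version also copes with an infinite input iterator, which the element-buffering approach cannot.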

thread-safe CopyOnWriteArrayList reverse iteration

Consider the following code snippet:
private List<Listener<E>> listenerList = new CopyOnWriteArrayList<Listener<E>>();
public void addListener(Listener<E> listener) {
if (listener != null) {
listenerList.add(listener);
}
}
public void removeListener(Listener<E> listener) {
if (listener != null) {
listenerList.remove(listener);
}
}
protected final void fireChangedForward(Event<E> event) {
for (Listener<E> listener : listenerList) {
listener.changed(event);
}
}
protected final void fireChangedReversed(Event<E> event) {
final ListIterator<Listener<E>> li = listenerList.listIterator(listenerList.size());
while (li.hasPrevious()) {
li.previous().changed(event);
}
}
There is a listener list that can be modified and iterated.
I think the forward iteration (see method #fireChangedForward)
should be safe.
The question is: is the reverse iteration (see method #fireChangedReversed) also safe in a multi-threaded environment?
I doubt that, because there are two calls involved: #size and #listIterator.
If it's not thread-safe, what is the most efficient way to implement #fireChangedReversed under the following circumstances:
optimize for traversal
avoid usage of locking if possible
avoid usage of javax.swing.event.EventListenerList
prefer solution without usage of third-party lib, e.g. implementation in own code possible
Indeed, listenerList.listIterator(listenerList.size()) is not thread-safe, for exactly the reason you suggested: the list could change size between the calls to size() and listIterator(), resulting in either the omission of an element from the iteration, or IndexOutOfBoundsException being thrown.
The best way to deal with this is to clone the CopyOnWriteArrayList before getting the iterator:
CopyOnWriteArrayList<Listener<E>> listenerList = ... ;
@SuppressWarnings("unchecked")
List<Listener<E>> copy = (List<Listener<E>>)listenerList.clone();
ListIterator<Listener<E>> li = copy.listIterator(copy.size());
The clone makes a shallow copy of the list. In particular, the clone shares the internal array with the original. This isn't entirely obvious from the specification, which says merely
Returns a shallow copy of this list. (The elements themselves are not copied.)
(When I read this, I thought "Of course the elements aren't copied; this is a shallow copy!" What this really means is that neither the elements nor the array that contains them are copied.)
This is fairly inconvenient, including the lack of a covariant override of clone(), requiring an unchecked cast.
Some potential enhancements are discussed in JDK-6821196 and JDK-8149509. The former bug also links to a discussion of this issue on the concurrency-interest mailing list.
One simple way to do that is to call #toArray method and iterate over the array in reverse order.
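A minimal sketch of the toArray idea, assuming Java 8 or later for Consumer; the helper name ReverseTraversal.forEachReversed is made up for this example. The single toArray() call takes a snapshot, so no second call can observe a different list size.

```java
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical helper for illustration.
final class ReverseTraversal {
    private ReverseTraversal() {}

    static <T> void forEachReversed(CopyOnWriteArrayList<T> list, Consumer<? super T> action) {
        // One call yields a private copy of the current contents.
        Object[] snapshot = list.toArray();
        for (int i = snapshot.length - 1; i >= 0; i--) {
            @SuppressWarnings("unchecked")
            T item = (T) snapshot[i]; // safe: the array only holds elements of the list
            action.accept(item);
        }
    }
}
```

In the listener example this would be called as ReverseTraversal.forEachReversed(listenerList, l -> l.changed(event)). Note that toArray() copies the backing array on every call, so each traversal pays an O(n) copy that the clone() approach avoids.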
You could always just get a ListIterator and "fast-forward" to the end of the list as such:
final ListIterator<Listener<E>> li = listenerList.listIterator();
if (li.hasNext()) {
do{
li.next();
} while (li.hasNext());
}
while (li.hasPrevious()) {
li.previous().changed(event);
}
EDIT I switched the quirky exception-handling of my previous answer for a do/while loop that places the cursor of the ListIterator after the last element, in order to be ready for the next previous call.
RE-EDIT As pointed out by @MikeFHay, a do/while loop on an iterator will throw a NoSuchElementException on an empty list. To prevent this from happening, I wrapped the do/while loop with if (li.hasNext()).

ArrayIndexOutOfBoundsException while converting linkedList to array list

I'm trying to convert a LinkedList into an ArrayList as shown below.
private LinkedList<myData> myLinkedList= new LinkedList<myData>();
public Collection<myData> getData()
{
return new ArrayList<myData>(myLinkedList);
}
The LinkedList might be updated by multiple threads. While running in production I get the error below. The error is not consistent: I get it maybe once a week, once a month or so.
java.lang.ArrayIndexOutOfBoundsException: 15
at java.util.LinkedList.toArray(LinkedList.java:866)
at java.util.ArrayList.<init>(ArrayList.java:131)
at org.xxx.yyy.zzz.getData(Data.java:291)
Is there any way it could be related to concurrent modification of the LinkedList? I appreciate any help on this.
toArray failing is only one symptom of you doing something fundamentally dangerous.
From the documentation of LinkedList:
If multiple threads access a linked list concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally.
You'll either need to add synchronization (not just for toArray, but basically all uses of the list) or use one of the concurrent collections which is designed to be thread-safe.
LinkedList#toArray() is implemented as such (Oracle JDK 7)
public Object[] toArray() {
Object[] result = new Object[size];
int i = 0;
for (Node<E> x = first; x != null; x = x.next)
result[i++] = x.item;
return result;
}
If you add to the LinkedList after the result array is constructed but before the for loop has finished, the loop walks more nodes than the array has slots, and the array access expression inside the for loop throws an ArrayIndexOutOfBoundsException when it reaches an index equal to the original size.
You should really put some synchronization barriers so that doesn't happen.
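As a rough sketch of what that synchronization could look like, the holder below guards every access to the list with one lock. The class name SynchronizedData and its methods are assumptions made for this example, not code from the question.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical holder class; the names are illustrative only.
final class SynchronizedData<T> {
    private final Object lock = new Object();
    private final LinkedList<T> items = new LinkedList<>();

    void add(T item) {
        synchronized (lock) {
            items.add(item);
        }
    }

    List<T> getData() {
        synchronized (lock) {
            // The copy runs while no writer can touch the list, so the
            // toArray() call inside the ArrayList constructor sees a stable size.
            return new ArrayList<>(items);
        }
    }
}
```

The important part is that writers and the copy use the same lock; synchronizing only getData() would not help.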

Updating a PriorityQueue when iterating it

I need to update some fixed-priority elements in a PriorityQueue based on their ID. I think it's quite a common scenario, here's an example snippet (Android 2.2):
for (Entry e : mEntries) {
if (e.getId().equals(someId)) {
e.setData(newData);
}
}
I've then made Entry "immutable" (no setter methods) so that a new Entry instance is created and returned by setData(). I modified my method into this:
for (Entry e : mEntries) {
if (e.getId().equals(someId)) {
Entry newEntry = e.setData(newData);
mEntries.remove(e);
mEntries.add(newEntry);
}
}
The code seems to work fine, but someone pointed out that modifying a queue while iterating over it is a bad idea: it may throw a ConcurrentModificationException and I'd need to add the elements I want to remove to an ArrayList and remove it later. He didn't explain why, and it looks quite an overhead to me, but I couldn't find any specific explanation on internet.
(This post is similar, but there priorities can change, which is not my case)
Can anyone clarify what's wrong with my code, how should I change it and - most of all - why?
Thanks,
Rippel
PS: Some implementation details...
PriorityQueue<Entry> mEntries = new PriorityQueue<Entry>(1, new Entry.EntryComparator());
with:
public static class EntryComparator implements Comparator<Entry> {
public int compare(Entry my, Entry their) {
if (my.mPriority < their.mPriority) {
return 1;
}
else if (my.mPriority > their.mPriority) {
return -1;
}
return 0;
}
}
This code is in the Java 6 implementation of PriorityQueue:
private class Itr implements Iterator<E> {
    // ...
    /**
     * The modCount value that the iterator believes that the backing
     * Queue should have. If this expectation is violated, the iterator
     * has detected concurrent modification.
     */
    private int expectedModCount = modCount;
    // ...
    public E next() {
        if (expectedModCount != modCount) {
            throw new ConcurrentModificationException();
        }
        // ... (rest of the method omitted)
    }
}
Now, why is this code here? If you look at the Javadoc for ConcurrentModificationException you will find that the behaviour of an iterator is undefined if modification occurs to the underlying collection before iteration completes. As such, many of the collections implement this modCount mechanism.
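The modCount check can be observed with a small single-threaded experiment. This is only a sketch: the class name FailFastDemo is invented, and the behaviour shown is what the OpenJDK PriorityQueue iterator does, not something the Javadoc promises for every implementation.

```java
import java.util.ConcurrentModificationException;
import java.util.PriorityQueue;

// Hypothetical demo class for illustration.
final class FailFastDemo {
    private FailFastDemo() {}

    static boolean modifyingWhileIteratingFails() {
        PriorityQueue<Integer> queue = new PriorityQueue<>();
        queue.add(1);
        queue.add(2);
        queue.add(3);
        try {
            for (Integer value : queue) {
                // A structural change bumps modCount behind the iterator's back.
                queue.add(value + 10);
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            // Raised by the iterator's next() on the following loop step.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(modifyingWhileIteratingFails());
    }
}
```

The check is best-effort: if the modification happens while the iterator is on its last element, the loop may simply end without any exception, which is one way code like this can appear to work.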
To fix your code
You need to ensure that you don't modify the collection mid-loop. If your code is single-threaded (as it appears to be), then you can simply do as your coworker suggested and collect the new entries in a list for later inclusion. Also, removing through the Iterator.remove() method is the documented way to avoid a ConcurrentModificationException. An example:
List<Entry> toAdd = new ArrayList<Entry>();
Iterator<Entry> it = mEntries.iterator();
while (it.hasNext()) {
    Entry e = it.next();
    if (e.getId().equals(someId)) {
        Entry newEntry = e.setData(newData);
        it.remove();
        toAdd.add(newEntry);
    }
}
mEntries.addAll(toAdd);
The Javadoc for PriorityQueue says explicitly:
"Note that this implementation is not synchronized. Multiple threads should not access a PriorityQueue instance concurrently if any of the threads modifies the list structurally. Instead, use the thread-safe PriorityBlockingQueue class."
This seems to be your case.
What's wrong with your code has already been explained. Implementing an iterator that can consistently iterate over a collection while modifications are interleaved is a rather hard task. You need to specify how to deal with removed items (will they be seen through the iterator?), added items, modified items... Even if you can do it consistently, the implementation will be rather complex and inefficient and, mostly, not very useful, since the "iterate without modifications" use case is much more common. So the Java architects chose to forbid modification during iteration, and most collections in the Java Collections API follow this and throw ConcurrentModificationException if such a modification is detected.
As for your code: in my view, you simply should not make the items immutable. Immutability is a great thing, but it should not be overused. If the Entry object you use here is some kind of domain object and you really want it to be immutable, you can create a temporary data holder (MutableEntry) object, use it inside your algorithm, and copy the data to an Entry before returning. From my point of view, that would be the best solution.
A slightly better implementation is:
List<Entry> toAdd = new ArrayList<Entry>();
for (Iterator<Entry> it= mEntries.iterator();it.hasNext();) {
Entry e = it.next();
if (e.getId().equals(someId)) {
Entry newEntry = e.setData(newData);
it.remove();
toAdd.add(newEntry);
}
}
mEntries.addAll(toAdd);
This uses the iterator's remove() and a bulk add afterwards.

Removing the "first" object from a Set

Under certain situations, I need to evict the oldest element in a Java Set. The set is implemented using a LinkedHashSet, which makes this simple: just get rid of the first element returned by the set's iterator:
Set<Foo> mySet = new LinkedHashSet<Foo>();
// do stuff...
if (mySet.size() >= MAX_SET_SIZE)
{
Iterator<Foo> iter = mySet.iterator();
iter.next();
iter.remove();
}
This is ugly: 3 lines to do something I could do with 1 line if I was using a SortedSet (for other reasons, a SortedSet is not an option here):
if (/*stuff*/)
{
mySet.remove(mySet.first());
}
So is there a cleaner way of doing this, without:
changing the Set implementation, or
writing a static utility method?
Any solutions leveraging Guava are fine.
I am fully aware that sets do not have inherent ordering. I'm asking about removing the first entry as defined by iteration order.
LinkedHashSet is a wrapper for LinkedHashMap which supports a simple "remove oldest" policy. To use it as a Set you can do
Set<String> set = Collections.newSetFromMap(new LinkedHashMap<String, Boolean>(){
protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
return size() > MAX_ENTRIES;
}
});
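To see the eviction policy in action, the snippet above can be wrapped in a small factory method. The names BoundedSets and newBoundedSet are made up for this sketch; the JDK calls used (Collections.newSetFromMap and LinkedHashMap.removeEldestEntry) are the same ones as in the answer.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical factory class for illustration.
final class BoundedSets {
    private BoundedSets() {}

    static <E> Set<E> newBoundedSet(final int maxEntries) {
        return Collections.newSetFromMap(new LinkedHashMap<E, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<E, Boolean> eldest) {
                // Called after each insertion; true evicts the oldest entry.
                return size() > maxEntries;
            }
        });
    }

    public static void main(String[] args) {
        Set<String> set = newBoundedSet(3);
        set.add("a");
        set.add("b");
        set.add("c");
        set.add("d"); // pushes the set over its limit, so "a" is evicted
        System.out.println(set);
    }
}
```

With this approach the eviction happens inside add(), so callers never need the explicit size check from the question.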
if (!mySet.isEmpty())
mySet.remove(mySet.iterator().next());
seems to be less than 3 lines.
You have to synchronize around it of course if your set is shared by multiple threads.
If you really need to do this at several places in your code, just write a static method.
The other solutions proposed are often slower since they imply calling the Set.remove(Object) method instead of the Iterator.remove() method.
@Nullable
public static <T> T removeFirst(Collection<? extends T> c) {
    Iterator<? extends T> it = c.iterator();
    if (!it.hasNext()) { return null; }
    T removed = it.next();
    it.remove();
    return removed;
}
With guava:
if (!set.isEmpty() && set.size() >= MAX_SET_SIZE) {
set.remove(Iterables.get(set, 0));
}
I will also suggest an alternative approach. Yes, it is changing the implementation, but not drastically: extend LinkedHashSet and have that condition in the add method:
public class LimitedLinkedHashSet<E> extends LinkedHashSet<E> {
    @Override
    public boolean add(E element) {
        boolean added = super.add(element);
        // your 5-line logic from above or my solution with guava
        return added;
    }
}
It's still 5 lines, but it is invisible to the code that's using it. And since this is actually a specific behaviour of the set, it is logical to have it within the set.
I think the way you're doing it is fine. Is this something you do often enough to be worth finding a shorter way? You could do basically the same thing with Guava like this:
Iterables.removeIf(Iterables.limit(mySet, 1), Predicates.alwaysTrue());
That adds the small overhead of wrapping the set and its iterator for limiting and then calling the alwaysTrue() predicate once... doesn't seem especially worth it to me though.
Edit: To put what I said in a comment in an answer, you could create a SetMultimap that automatically restricts the number of values it can have per key like this:
SetMultimap<K, V> multimap = Multimaps.newSetMultimap(map,
    new Supplier<Set<V>>() {
        public Set<V> get() {
            return Sets.newSetFromMap(new LinkedHashMap<V, Boolean>() {
                @Override protected boolean removeEldestEntry(Map.Entry<V, Boolean> eldestEntry) {
                    return size() > MAX_SIZE;
                }
            });
        }
    });
Quick and dirty one-line solution: mySet.remove(mySet.toArray(new Foo[mySet.size()])[0]) ;)
However, I'd still go for the iterator solution, since this would be more readable and should also be faster.
Edit: I'd go for Mike Samuel's solution. :)
