Removing the "first" object from a Set - java

In certain situations, I need to evict the oldest element in a Java Set. The set is implemented using a LinkedHashSet, which makes this simple: just get rid of the first element returned by the set's iterator:
Set<Foo> mySet = new LinkedHashSet<Foo>();
// do stuff...
if (mySet.size() >= MAX_SET_SIZE)
{
    Iterator<Foo> iter = mySet.iterator();
    iter.next();
    iter.remove();
}
This is ugly: 3 lines to do something I could do with 1 line if I were using a SortedSet (for other reasons, a SortedSet is not an option here):
if (/*stuff*/)
{
    mySet.remove(mySet.first());
}
So is there a cleaner way of doing this, without:
changing the Set implementation, or
writing a static utility method?
Any solutions leveraging Guava are fine.
I am fully aware that sets do not have inherent ordering. I'm asking about removing the first entry as defined by iteration order.

LinkedHashSet is backed by a LinkedHashMap, which supports a simple "remove eldest" policy. To use it as a Set you can do
Set<String> set = Collections.newSetFromMap(new LinkedHashMap<String, Boolean>() {
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
        return size() > MAX_ENTRIES;
    }
});
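For illustration, a quick sketch of the eviction in action (assuming MAX_ENTRIES is 3; this demo is not part of the original answer):
set.add("a");
set.add("b");
set.add("c");
set.add("d"); // size would now be 4, so the eldest entry "a" is evicted
System.out.println(set); // [b, c, d]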

if (!mySet.isEmpty())
    mySet.remove(mySet.iterator().next());
seems to be less than 3 lines.
You have to synchronize around it of course if your set is shared by multiple threads.
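A sketch of what that synchronization might look like (assuming every other access to the set synchronizes on the same lock):
// Guard the size check and the removal with one lock
synchronized (mySet) {
    if (mySet.size() >= MAX_SET_SIZE) {
        Iterator<Foo> iter = mySet.iterator();
        iter.next();
        iter.remove();
    }
}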

If you really need to do this at several places in your code, just write a static method.
The other solutions proposed are often slower since they imply calling the Set.remove(Object) method instead of the Iterator.remove() method.
@Nullable
public static <T> T removeFirst(Collection<? extends T> c) {
    Iterator<? extends T> it = c.iterator();
    if (!it.hasNext()) { return null; }
    T removed = it.next();
    it.remove();
    return removed;
}
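Usage at each call site is then a true one-liner:
Foo evicted = removeFirst(mySet); // returns null if the set was empty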

With Guava:
if (!set.isEmpty() && set.size() >= MAX_SET_SIZE) {
set.remove(Iterables.get(set, 0));
}
I will also suggest an alternative approach. Yes, it is changing the implementation, but not drastically: extend LinkedHashSet and have that condition in the add method:
public class LimitedLinkedHashSet<E> extends LinkedHashSet<E> {
    @Override
    public boolean add(E element) {
        boolean added = super.add(element);
        // your 5-line logic from above or my solution with guava
        return added;
    }
}
It's still 5 lines, but it is invisible to the code that's using it. And since this is actually a specific behaviour of the set, it is logical to have it within the set.

I think the way you're doing it is fine. Is this something you do often enough to be worth finding a shorter way? You could do basically the same thing with Guava like this:
Iterables.removeIf(Iterables.limit(mySet, 1), Predicates.alwaysTrue());
That adds the small overhead of wrapping the set and its iterator for limiting and then calling the alwaysTrue() predicate once... doesn't seem especially worth it to me though.
Edit: To put what I said in a comment in an answer, you could create a SetMultimap that automatically restricts the number of values it can have per key like this:
SetMultimap<K, V> multimap = Multimaps.newSetMultimap(map,
    new Supplier<Set<V>>() {
        public Set<V> get() {
            return Sets.newSetFromMap(new LinkedHashMap<V, Boolean>() {
                @Override
                protected boolean removeEldestEntry(Map.Entry<V, Boolean> eldestEntry) {
                    return size() > MAX_SIZE;
                }
            });
        }
    });
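Note that map here is the backing map you supply yourself; a plausible choice (an assumption on my part, not part of the answer) would be:
Map<K, Collection<V>> map = new HashMap<K, Collection<V>>();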

Quick and dirty one-line solution: mySet.remove(mySet.toArray(new Foo[mySet.size()])[0]) ;)
However, I'd still go for the iterator solution, since this would be more readable and should also be faster.
Edit: I'd go for Mike Samuel's solution. :)

Related

Exception creating TreeSet from concurrently-modified ConcurrentSkipListSet

Generally, concurrent collections are safe to iterate; according to Javadoc: 'Iterators are weakly consistent, returning elements reflecting the state of the set at some point at or since the creation of the iterator. They do not throw ConcurrentModificationException, and may proceed concurrently with other operations.'
However, consider this:
import java.util.Random;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

public class ConcurrencyProblem {
    private static volatile boolean modifierIsAlive = true;

    public static void main(String[] args) {
        final ConcurrentSkipListSet<Integer> concurrentSet = new ConcurrentSkipListSet<>();
        Thread modifier = new Thread() {
            private final Random randomGenerator = new Random();

            public void run() {
                while (modifierIsAlive) {
                    concurrentSet.add(randomGenerator.nextInt(1000));
                    concurrentSet.remove(randomGenerator.nextInt(1000));
                }
            }
        };
        modifier.start();
        int sum = 0;
        while (modifierIsAlive) {
            try {
                TreeSet<Integer> sortedCopy = new TreeSet<>(concurrentSet);
                // make sure the copy operation is not eliminated by the compiler
                sum += sortedCopy.size();
            } catch (RuntimeException rte) {
                modifierIsAlive = false;
                rte.printStackTrace();
            }
        }
        System.out.println("Dummy output: " + sum);
    }
}
The output is
java.util.NoSuchElementException
at java.util.concurrent.ConcurrentSkipListMap$Iter.advance(ConcurrentSkipListMap.java:2299)
at java.util.concurrent.ConcurrentSkipListMap$KeyIterator.next(ConcurrentSkipListMap.java:2334)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2559)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2547)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2579)
at java.util.TreeMap.buildFromSorted(TreeMap.java:2504)
at java.util.TreeMap.addAllForTreeSet(TreeMap.java:2462)
at java.util.TreeSet.addAll(TreeSet.java:308)
at java.util.TreeSet.<init>(TreeSet.java:172)
at mtbug.ConcurrencyProblem.main(ConcurrencyProblem.java:27)
Dummy output: 44910
I'm wondering if this is a bug or a feature; we did not get a ConcurrentModificationException, but still, having to care about iteration (falling back to synchronized blocks or otherwise) kind of defeats the purpose of ConcurrentSkipListSet/Map. I've been able to reproduce this both with Java 7 and 8 (currently, 8u72 on my Linux box).
As far as I can understand from browsing the sources, the problem with TreeSet is that it calls size() before iterating and then uses it instead of calling hasNext(). This may be a bug, but I think it's just a consequence of red-black trees being complicated structures requiring careful balancing, and therefore knowing size in advance is needed to properly balance it in linear time during creation.
You may circumvent this by iterating manually and adding elements to the TreeSet, but this will lead to n log n complexity, which could be the reason why TreeSet's constructor doesn't do it that way (its API spec guarantees linear time). Of course it could still call hasNext() as it builds the tree, but then some additional actions may be required after the construction is finished to rebalance the tree, which could lead to amortized linear complexity. But red-black trees are a mess as they are, and that kind of hack would make the implementation even messier.
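The manual copy would look something like this (a sketch; O(n log n), but it relies only on hasNext()/next() and never on size()):
TreeSet<Integer> sortedCopy = new TreeSet<>();
for (Integer i : concurrentSet) {
    sortedCopy.add(i); // weakly consistent iteration, no size assumption
}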
Still, I think it's very confusing and should probably be documented somewhere in the API docs, but I'm not sure where exactly. Probably in the part where they explain what weakly consistent iterators are. Specifically, it should be mentioned that some library classes rely on the returned size and therefore may throw NoSuchElementException. Mentioning specific classes would also help.
I'm actually starting to lean towards this being a bug in TreeSet/TreeMap (update: it is). The issue, as Sergey alludes, is that TreeMap caches the result of ConcurrentSkipListSet.size() before reading its elements.
TreeSet.addAll() calls TreeMap.addAllForTreeSet(), which passes the collection's current size and its (potentially concurrent) Iterator to TreeMap.buildFromSorted(), which ultimately calls Iterator.next() size-times.
In other words, it assumes the Collection it is passed will not be modified during construction, which is an erroneous assumption.
Note that even if buildFromSorted() did call Iterator.hasNext() its only option at that point would be to fail, since the backing data structure was modified mid-construction.
Other collections that could potentially have issues copying concurrent structures, including ArrayList, LinkedList, and CopyOnWriteArrayList (most other collections I looked at simply for-each over the elements), explicitly copy the provided collection to an array before doing any actual work, in order to avoid this exact issue. I think TreeSet and TreeMap should do the same.
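Roughly the idiom those classes use, as a simplified sketch (not the actual JDK code):
// Snapshot the source once; size and contents can no longer diverge
Object[] elements = concurrentSet.toArray();
int size = elements.length;
// ... build the new collection from the stable array ...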
We actually don't have to accept O(n log n) performance due to this bug, but it's going to be a hack. We can't simply copy the values into an array or other data structure, because then inserting into the TreeSet won't be linear time. But we can lie to TreeSet by claiming the copy is a SortedSet.
public static class IterateOnlySortedSet<E>
        extends AbstractSet<E> implements SortedSet<E> {
    private final ArrayList<E> elements;
    private final Comparator<? super E> comparator;

    public IterateOnlySortedSet(SortedSet<E> source) {
        elements = new ArrayList<>(source);
        comparator = source.comparator();
    }

    @Override
    public Iterator<E> iterator() {
        return elements.iterator();
    }

    @Override
    public int size() {
        return elements.size();
    }

    @Override
    public Comparator<? super E> comparator() {
        return comparator;
    }

    // remaining methods simply throw UnsupportedOperationException
}
Changing your TreeSet construction line to:
TreeSet<Integer> sortedCopy = new TreeSet<>(new IterateOnlySortedSet<>(concurrentSet));
Now succeeds.
Nice find :)

What is the use of LinkedHashMap.removeEldestEntry?

I am aware the answer to this question is easily available on the internet. I need to know what happens if I choose not to removeEldestEntry. Below is my code:
package collection;

import java.util.*;

public class MyLinkedHashMap {
    private static final int MAX_ENTRIES = 2;

    public static void main(String[] args) {
        LinkedHashMap lhm = new LinkedHashMap(MAX_ENTRIES, 0.75F, false) {
            protected boolean removeEldestEntry(Map.Entry eldest) {
                return false;
            }
        };
        lhm.put(0, "H");
        lhm.put(1, "E");
        lhm.put(2, "L");
        lhm.put(3, "L");
        lhm.put(4, "O");
        System.out.println("" + lhm);
    }
}
Even though I am not allowing the removeEldestEntry my code works fine.
So, internally what is happening?
removeEldestEntry is checked after every insertion. For example, if you override the method to always return true, the LinkedHashMap will always be empty, since after every put or putAll insertion the eldest element is removed, no matter what. The JavaDoc shows a very sensible example of how to use it:
protected boolean removeEldestEntry(Map.Entry eldest) {
    return size() > MAX_SIZE;
}
Alternatively, you might want to remove an entry only if it is unimportant:
protected boolean removeEldestEntry(Map.Entry eldest) {
    if (size() > MAX_ENTRIES) {
        if (isImportant(eldest)) {
            // Handle an important entry here, like reinserting it to the back of the list
            this.remove(eldest.getKey());
            this.put(eldest.getKey(), eldest.getValue());
            // removeEldestEntry will be called again, now with the next entry,
            // so the size should not exceed the MAX_ENTRIES value.
            // WARNING: If every element is important, this will loop indefinitely!
        } else {
            return true; // Element is unimportant
        }
    }
    return false; // Size not reached or eldest element was already handled otherwise
}
Why can't people just answer the OP's simple question!
If removeEldestEntry returns false then no items will ever be removed from the map and it will essentially behave like a normal Map.
Expanding on the answer by DavidNewcomb:
I'm assuming that you are learning how to implement a cache.
LinkedHashMap.removeEldestEntry is a method very commonly used in cache data structures, where the size of the cache is limited to a certain threshold. In such cases, the removeEldestEntry method can be set to automatically remove the oldest entry when the size exceeds the threshold (defined by the MAX_ENTRIES attribute), as in the example provided here.
On the other hand, when you override the removeEldestEntry method this way, you are ensuring that nothing ever happens when the MAX_ENTRIES threshold is exceeded. In other words, the data structure would not behave like a cache, but rather a normal map.
Your removeEldestEntry method is identical to the default implementation of LinkedHashMap.removeEldestEntry, so your LinkedHashMap will simply behave like a normal LinkedHashMap with no overridden methods, retaining whatever values and keys you put into it unless and until you explicitly remove them by calling remove, clear, etc. The advantage of using LinkedHashMap is that the collection views (keySet(), values(), entrySet()) always return Iterators that traverse the keys and/or values in the order they were added to the Map.
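For example:
Map<Integer, String> m = new LinkedHashMap<>();
m.put(2, "b");
m.put(1, "a");
m.put(3, "c");
System.out.println(m.keySet()); // [2, 1, 3], insertion order rather than key order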

Iterator retrieve first value and place it back on the same iterator

I have the following scenario: I have an existing iterator Iterator<String> it and I iterate over its head (say the first k elements, which are flagged elements, i.e. they start with '*'). The only way to know that the flagged elements have ended is by noticing that the (k+1)th element is not flagged.
The problem is that if I do that, the iterator it will not provide me the first value anymore on the next call to next().
I want to pass this iterator to a method as its only argument, and I would like to avoid changing its signature and its implementation. I know I could do this:
public void methodAcceptingIterator(Iterator<String> it) //current signature
//change it to
public void methodAcceptingIterator(String firstElement, Iterator<String> it)
But this looks like a workaround/hack that decreases the elegance and generality of the code, so I don't want to do this.
Any ideas how I could solve this problem ?
You could use Guava's PeekingIterator (the link contains the javadoc for a static method which, given an Iterator, will return a wrapping PeekingIterator). That includes a method T peek() which shows you the next element without advancing past it.
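A sketch of how that could look for the flagged-prefix case (hypothetical names):
// Consume the flagged prefix without losing the first unflagged element
PeekingIterator<String> pit = Iterators.peekingIterator(it);
while (pit.hasNext() && pit.peek().startsWith("*")) {
    String flagged = pit.next(); // safe to consume: it is flagged
    // ... process the flagged element ...
}
methodAcceptingIterator(pit); // the first unflagged element is still available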
The solution is to create your own Iterator implementation which stores the firstElement and uses the existing iterator as an underlying Iterator to delegate the requests for the rest of the elements to.
Something like:
public class IteratorMissingFirst<E> implements Iterator<E> {
    private Iterator<E> underlyingIterator;
    private E firstElement;
    private boolean firstElOffered;

    public IteratorMissingFirst(E firstElement, Iterator<E> it) {
        this.firstElement = firstElement;
        this.underlyingIterator = it;
    }

    public boolean hasNext() {
        if (!firstElOffered && firstElement != null) {
            return true;
        }
        return underlyingIterator.hasNext();
    }

    public E next() {
        if (!firstElOffered) {
            firstElOffered = true;
            return firstElement;
        }
        return underlyingIterator.next();
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }
}
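Usage would then look something like this (hypothetical names):
String lookahead = it.next(); // the (k+1)th element, already consumed
methodAcceptingIterator(new IteratorMissingFirst<String>(lookahead, it));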
Why don't you just have methodAcceptingIterator store the first element it gets out of the iterator in a variable? Or -- in a pinch -- just copy the contents of the Iterator into an ArrayList at the beginning of your method; now you can revisit elements as often as you like.
With Guava, you can implement Razvan's solution in an easier way by using some methods from the Iterators class:
Iterators.concat(Iterators.singletonIterator(firstElement), it)
This gives you an iterator working similar to IteratorMissingFirst, and it's easy to extend if you need to look at more than one element in front (but it creates two objects instead of only one).

Limited SortedSet

I'm looking for an implementation of SortedSet with a limited number of elements, so that if more elements than the specified maximum are added, the comparator decides whether to add the item and remove the last one from the set.
SortedSet<Integer> t1 = new LimitedSet<Integer>(3);
t1.add(5);
t1.add(3);
t1.add(1);
// [1,3,5]
t1.add(2);
// [1,2,3]
t1.add(9);
// [1,2,3]
t1.add(0);
// [0,1,2]
Is there an elegant way in the standard API to accomplish this?
I've written a JUnit test for checking implementations:
@Test
public void testLimitedSortedSet() {
    final LimitedSortedSet<Integer> t1 = new LimitedSortedSet<Integer>(3);
    t1.add(5);
    t1.add(3);
    t1.add(1);
    System.out.println(t1);
    // [1,3,5]
    t1.add(2);
    System.out.println(t1);
    // [1,2,3]
    t1.add(9);
    System.out.println(t1);
    // [1,2,3]
    t1.add(0);
    System.out.println(t1);
    // [0,1,2]
    Assert.assertTrue(3 == t1.size());
    Assert.assertEquals(Integer.valueOf(0), t1.first());
}
With the standard API you'd have to do it yourself, i.e. extend one of the sorted set classes and add the logic you want to the add() and addAll() methods. Shouldn't be too hard.
Btw, I don't fully understand your example:
t1.add(9);
// [1,2,3]
Shouldn't the set contain [1,2,9] afterwards?
Edit: I think now I understand: you want to only keep the smallest 3 elements that were added to the set, right?
Edit 2: An example implementation (not optimised) could look like this:
class LimitedSortedSet<E> extends TreeSet<E> {
    private int maxSize;

    LimitedSortedSet(int maxSize) {
        this.maxSize = maxSize;
    }

    @Override
    public boolean addAll(Collection<? extends E> c) {
        boolean added = super.addAll(c);
        if (size() > maxSize) {
            E firstToRemove = (E) toArray()[maxSize];
            removeAll(tailSet(firstToRemove));
        }
        return added;
    }

    @Override
    public boolean add(E o) {
        boolean added = super.add(o);
        if (size() > maxSize) {
            E firstToRemove = (E) toArray()[maxSize];
            removeAll(tailSet(firstToRemove));
        }
        return added;
    }
}
Note that tailSet() returns the subset including the parameter (if it is in the set). This means that if you can't calculate the next higher value (it doesn't need to be in the set) you'll have to re-add that element. This is done in the code above.
If you can calculate the next value, e.g. if you have a set of integers, something like tailSet(lastElement + 1) would be sufficient and you wouldn't have to re-add the last element.
Alternatively you can iterate over the set yourself and remove all elements that follow the last one you want to keep, as in the sketch below.
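A minimal sketch of that iterate-and-trim approach, as it might look inside the class above (using its maxSize field):
// Keep the first maxSize elements, drop everything after them
Iterator<E> it = iterator();
for (int i = 0; it.hasNext(); i++) {
    it.next();
    if (i >= maxSize) {
        it.remove();
    }
}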
Another alternative, although that might be more work, would be to check the size before inserting an element and remove accordingly.
Update: as msandiford correctly pointed out, the first element that should be removed is the one at index maxSize. Thus there's no need to re-add the last wanted element.
Important note:
As @DieterDP correctly pointed out, the implementation above violates the Collection#add() API contract, which states that if a collection refuses to add an element for any reason other than it being a duplicate, an exception must be thrown.
In the example above the element is first added but might be removed again due to size constraints, or other elements might be removed, so this violates the contract.
To fix that you might want to change add() and addAll() to throw exceptions in those cases (or maybe in any case, in order to make them unusable) and provide alternate methods to add elements which don't violate any existing API contract.
In any case the above example should be used with care, since using it with code that isn't aware of the violations might result in unwanted and hard-to-debug errors.
I'd say this is a typical application for the decorator pattern, similar to the decorator collections exposed by the Collections class: unmodifiableXXX, synchronizedXXX, singletonXXX etc. I would take Guava's ForwardingSortedSet as base class, and write a class that decorates an existing SortedSet with your required functionality, something like this:
public final class SortedSets {
    public static <T> SortedSet<T> maximumSize(
            final SortedSet<T> original, final int maximumSize) {
        return new ForwardingSortedSet<T>() {
            @Override
            protected SortedSet<T> delegate() {
                return original;
            }

            @Override
            public boolean add(final T e) {
                if (original.size() < maximumSize) {
                    return original.add(e);
                }
                return false;
            }
            // implement other methods accordingly
        };
    }
}
No, there is nothing like that in the existing Java library.
But yes, you can build one like the class below using composition. I believe it will be easy.
public class LimitedSet<E> implements SortedSet<E> {
    private int expectedSize;
    private TreeSet<E> treeSet = new TreeSet<E>();

    public boolean add(E e) {
        boolean result = treeSet.add(e);
        if (treeSet.size() >= expectedSize) {
            // remove the one you like ;)
        }
        return result;
    }

    // all other methods delegate to the "treeSet"
}
UPDATE
After reading your comment
As you need to always remove the last element:
you can consider maintaining a stack internally
it will increase memory complexity by O(n)
but it makes it possible to retrieve the last element in just O(1)... constant time
It should do the trick, I believe.

Updating a PriorityQueue when iterating it

I need to update some fixed-priority elements in a PriorityQueue based on their ID. I think it's quite a common scenario, here's an example snippet (Android 2.2):
for (Entry e : mEntries) {
    if (e.getId().equals(someId)) {
        e.setData(newData);
    }
}
I've then made Entry "immutable" (no setter methods) so that a new Entry instance is created and returned by setData(). I modified my method into this:
for (Entry e : mEntries) {
    if (e.getId().equals(someId)) {
        Entry newEntry = e.setData(newData);
        mEntries.remove(e);
        mEntries.add(newEntry);
    }
}
The code seems to work fine, but someone pointed out that modifying a queue while iterating over it is a bad idea: it may throw a ConcurrentModificationException, and I'd need to add the elements I want to remove to an ArrayList and remove them later. He didn't explain why, and it looks like quite an overhead to me, but I couldn't find any specific explanation on the internet.
(This post is similar, but there priorities can change, which is not my case)
Can anyone clarify what's wrong with my code, how I should change it and - most of all - why?
Thanks,
Rippel
PS: Some implementation details...
PriorityQueue<Entry> mEntries = new PriorityQueue<Entry>(1, new Entry.EntryComparator());
with:
public static class EntryComparator implements Comparator<Entry> {
    public int compare(Entry my, Entry their) {
        if (my.mPriority < their.mPriority) {
            return 1;
        } else if (my.mPriority > their.mPriority) {
            return -1;
        }
        return 0;
    }
}
This code is in the Java 6 implementation of PriorityQueue:
private class Itr implements Iterator<E> {
    /**
     * The modCount value that the iterator believes that the backing
     * Queue should have. If this expectation is violated, the iterator
     * has detected concurrent modification.
     */
    private int expectedModCount = modCount;

    public E next() {
        if (expectedModCount != modCount) {
            throw new ConcurrentModificationException();
        }
        // ... rest of next() omitted ...
    }
}
Now, why is this code here? If you look at the Javadoc for ConcurrentModificationException you will find that the behaviour of an iterator is undefined if modification occurs to the underlying collection before iteration completes. As such, many of the collections implement this modCount mechanism.
To fix your code
You need to ensure that you don't modify the collection mid-loop. If your code is single threaded (as it appears to be) then you can simply do as your coworker suggested and copy the elements into a list for later inclusion. Also, the use of the Iterator.remove() method is documented to prevent ConcurrentModificationExceptions. An example:
List<Entry> toAdd = new ArrayList<Entry>();
Iterator<Entry> it = mEntries.iterator();
while (it.hasNext()) {
    Entry e = it.next();
    if (e.getId().equals(someId)) {
        Entry newEntry = e.setData(newData);
        it.remove();
        toAdd.add(newEntry);
    }
}
mEntries.addAll(toAdd);
The Javadoc for PriorityQueue says explicitly:
"Note that this implementation is not synchronized. Multiple threads should not access a PriorityQueue instance concurrently if any of the threads modifies the list structurally. Instead, use the thread-safe PriorityBlockingQueue class."
This seems to be your case.
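A sketch of that swap (PriorityBlockingQueue's iterators are weakly consistent and never throw ConcurrentModificationException):
Queue<Entry> mEntries = new PriorityBlockingQueue<Entry>(1, new Entry.EntryComparator());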
What's wrong in your code was already explained: implementing an iterator that can consistently iterate through a collection while it is being modified is a rather hard task. You need to specify how to deal with removed items (will they be seen through the iterator?), added items, modified items... Even if you could do it consistently it would be a rather complex and inefficient implementation, and mostly not very useful, since the use case "iterate without modifications" is much more common. So the Java architects chose to deny modification while iterating, and most collections from the Java collections API follow this and throw ConcurrentModificationException if such a modification is detected.
As for your code: for me, you just should not make the items immutable. Immutability is a great thing, but it should not be overused. If the Entry objects you use here are some kind of domain objects, and you really want them to be immutable, you can just create some kind of temporary data holder (a MutableEntry) object, use it inside your algorithm, and copy the data to an Entry before returning. From my point of view that would be the best solution.
a slightly better implementation is
List<Entry> toAdd = new ArrayList<Entry>();
for (Iterator<Entry> it = mEntries.iterator(); it.hasNext();) {
    Entry e = it.next();
    if (e.getId().equals(someId)) {
        Entry newEntry = e.setData(newData);
        it.remove();
        toAdd.add(newEntry);
    }
}
mEntries.addAll(toAdd);
This uses the iterator's remove() and a bulk add afterwards.
