Summary of this post: I have a set of ordered items whose order may change over time. I need to be able to iterate through this set from multiple threads, each of which may also want to update the order of the items.
For example, multiple threads need to access String keys in some arbitrary sorted order. The strings are not sorted according to their natural ordering, but by some values that may change (hence, a custom Comparator). My original implementation was to use a TreeSet and synchronize on it. If any of the keys needed to be reordered, a thread would remove the key from the set, update the comparison value, and reinsert the key. To implement this, the keys are native Strings, but the comparator has access to the values. This is a weird arrangement where the order of keys may change over time, but since a changed key is always removed and reinserted when it changes, it seems to work. (I suppose it could also work if the Strings were wrapped inside another object.)
I recently became aware of the ConcurrentSkipListSet/ConcurrentSkipListMap implementations, which are basically thread-safe sorted sets (resp. maps). It seems like I can now iterate through the keys without having to lock the entire data structure. However, is there a way I can use them to atomically remove a key and replace it with another, like the operation I was doing above, so that other iterating threads don't miss the item, and without having to use synchronized blocks?
If anyone can suggest a better data structure for this type of operation, I'm all ears, too!
Is there a way I can use them to atomically remove a key and replace it with another, like the operation I was doing above, so that other iterating threads don't miss the item, and without having to use synchronized blocks?
The short answer is no. If you need to remove and reinsert, there is no atomic way to do this with any collection that I know of.
That said, one possibility would be for you to reinsert the item before deleting it from the skip list. This would cause a duplicate, but that may be easier to handle than a missing entry. You would reinsert it after you changed the object so that it sorts differently. This assumes that the object would then be non-equal as well. But if the other threads that are processing the list can't handle the duplicates, then I think you are SOL.
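A minimal sketch of the insert-before-remove idea, assuming the String key and its sort value are wrapped together in an immutable entry so that a changed value produces a non-equal entry. The names Ranked, BY_RANK and reorder are hypothetical and do not come from the question.

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListSet;

public class ReorderSketch {
    // Hypothetical immutable entry: the sort value is part of the entry itself.
    static final class Ranked {
        final String key;
        final int rank;
        Ranked(String key, int rank) { this.key = key; this.rank = rank; }
    }

    // Order by rank, then by key, so distinct keys with equal ranks can coexist.
    static final Comparator<Ranked> BY_RANK =
            Comparator.comparingInt((Ranked r) -> r.rank).thenComparing(r -> r.key);

    // Insert the re-ranked entry first, then remove the stale one.
    static Ranked reorder(ConcurrentSkipListSet<Ranked> set, Ranked old, int newRank) {
        if (newRank == old.rank) {
            return old; // nothing moves; add() would be a no-op and remove() would drop the key
        }
        Ranked updated = new Ranked(old.key, newRank);
        set.add(updated);  // both entries for the key are briefly visible
        set.remove(old);   // the set locates the stale entry through the comparator
        return updated;
    }

    // Small demo: returns the keys in iteration order after moving "a" to the end.
    static String demo() {
        ConcurrentSkipListSet<Ranked> set = new ConcurrentSkipListSet<>(BY_RANK);
        Ranked a = new Ranked("a", 1);
        set.add(a);
        set.add(new Ranked("b", 2));
        set.add(new Ranked("c", 3));
        reorder(set, a, 10);
        StringBuilder sb = new StringBuilder();
        for (Ranked r : set) {
            sb.append(r.key);
        }
        return sb.toString();
    }
}
```

With this ordering of the two calls the set never lacks the key at any instant. An iterator that is already underway can still miss an entry that moves from ahead of its position to behind it, so this is not a substitute for an atomic move.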
Related
I know that code like
for (Object o : collection) {
    if (condition(o)) {
        collection.remove(o);
    }
}
will throw a ConcurrentModificationException, and I understand why: modifying the collection directly could interfere with the Iterator's ability to keep track of its place, for instance by leaving it with a reference to an element that's no longer part of the collection, or by causing it to skip over one that's just been added. For code like the above, that's a reasonable concern. However, I would like to write something like
for (Object o: set){// set is an instance of java.util.LinkedHashSet
if (condition(o)){
set.remove(other(o));
}
}
Where other(o) is guaranteed to be "far" from o in the ordering of set. In my particular implementation it will never be less than 47 "steps" away from o. Additionally, if condition(o) is true, the loop in question will be guaranteed to short-circuit well before it reaches the place where other(o) was. Thus the entire portion of the set accessed by the iterator is thoroughly decoupled from the portion that is modified. Furthermore, the particular strengths of LinkedHashSet (fast random-access insertion and removal, guaranteed iteration order) seem particularly well-suited to this exact sort of operation.
I suppose my question is twofold: First of all, is such an operation still dangerous given the above constraints? The only way that I can think that it might be is that the Iterator values are preloaded far in advance and cached, which I suppose would improve performance in many applications, but seems like it would also reduce it in many others, and therefore be a strange choice for a general-purpose class from java.util. But perhaps I'm wrong about that. When it comes to things like caching, my intuition about efficiency is often suspect. Secondly, assuming this sort of thing is, at least in theory, safe, is there a way, short of completely re-implementing LinkedHashSet, or sacrificing efficiency, to achieve this operation? Can I tell Collections to ignore the fact that I'm modifying a different part of the Set, and just go about its business as usual? My current work-around is to add elements to an intermediate collection first, then add them to the main set once the loop is complete, but this is inefficient, since it has to add the values twice.
The ConcurrentModificationException is thrown because your collection may not be able to handle the removal (or addition) at all times. For example, what if the removal you performed meant that your LinkedHashSet had to reduce/increase the space the underlying HashMap takes under the hood? It would have to make a lot of changes, which would possibly render the iterator useless.
You have two options:
Use Iterator to iterate elements and remove them, e.g. calling Iterator iter = linkedHashSet.iterator() to get the iterator and then remove elements by iter.remove()
Use one of the concurrent collections available under the java.util.concurrent package, which are designed to allow concurrent modifications
This question contains nice details on using Iterator
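The first option can be sketched as follows. This is a minimal example with a made-up predicate (dropping even numbers); the class and method names are hypothetical.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

public class IteratorRemoval {
    // Removes every even number through the iterator itself,
    // which keeps the iterator's bookkeeping consistent.
    static Set<Integer> dropEvens(Set<Integer> set) {
        Iterator<Integer> iter = set.iterator();
        while (iter.hasNext()) {
            if (iter.next() % 2 == 0) {
                iter.remove(); // removes the element just returned by next()
            }
        }
        return set;
    }

    static Set<Integer> demo() {
        Set<Integer> set = new LinkedHashSet<>();
        for (int i = 1; i <= 6; i++) {
            set.add(i);
        }
        return dropEvens(set);
    }
}
```

Note that Iterator.remove() can only remove the element most recently returned by next(), so it does not directly cover removing a different element such as other(o).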
UPDATE after comments:
You can use the following pattern in order to remove the elements you wish without causing a ConcurrentModificationException: gather the elements you wish to remove in a List while looping through the LinkedHashSet elements. Afterwards, loop through each toBeDeleted element in the list and remove it from the LinkedHashSet.
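A minimal sketch of that pattern, assuming simple stand-ins for condition(o) and other(o) from the question (their bodies here are invented for illustration):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DeferredRemoval {
    // Hypothetical stand-ins for the question's condition(o) and other(o).
    static boolean condition(int o) { return o == 2; }
    static int other(int o) { return o + 50; }

    static Set<Integer> prune(Set<Integer> set) {
        List<Integer> toBeDeleted = new ArrayList<>();
        for (Integer o : set) {
            if (condition(o)) {
                toBeDeleted.add(other(o)); // only record it; the set is untouched while iterating
            }
        }
        set.removeAll(toBeDeleted); // safe: the iteration has finished
        return set;
    }

    static Set<Integer> demo() {
        Set<Integer> set = new LinkedHashSet<>();
        for (int i = 1; i <= 60; i++) {
            set.add(i);
        }
        return prune(set);
    }
}
```

The extra list only holds the elements to delete, so its cost is proportional to the number of removals rather than to the size of the set.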
I need a container that contains [key, Value] pair.
Here, key = Integer, Value = User Defined class object.
Multiple threads are trying to add [key, Value] pairs to the above container.
If key already present in the container, I want to update the value by checking some condition.
At the end I want container in sorted order, according to Key.
My efforts -
I used Collections.synchronizedSortedMap with a SortedMap for the above task.
SortedMap<Integer, USER_DEFINED_OBJECT> m = Collections.synchronizedSortedMap(new TreeMap<Integer, USER_DEFINED_OBJECT>());
This helps me to add pairs concurrently on above container.
And yes, if the key is already present, I check some condition and then proceed.
Is my approach always thread safe ? If not, please correct me.
Updated
USER_DEFINED_OBJECT has some field index.
At the time of adding, I check whether the key is already present, then compare the current USER_DEFINED_OBJECT with the already present USER_DEFINED_OBJECT on the basis of the above-mentioned (in point 1) field "index". If the current "index" is greater, then I update.
Use ConcurrentHashMap from the java.util.concurrent package; read the ConcurrentHashMap API documentation.
java.util.concurrent.ConcurrentSkipListMap
A scalable concurrent ConcurrentNavigableMap implementation. The map is sorted according to the natural ordering of its keys, or by a Comparator provided at map creation time, depending on which constructor is used.
This class implements a concurrent variant of SkipLists providing expected average log(n) time cost for the containsKey, get, put and remove operations and their variants. Insertion, removal, update, and access operations safely execute concurrently by multiple threads. Iterators are weakly consistent, returning elements reflecting the state of the map at some point at or since the creation of the iterator. They do not throw ConcurrentModificationException, and may proceed concurrently with other operations. Ascending key ordered views and their iterators are faster than descending ones.
The concurrent collections execute each individual call, such as put or remove, atomically, so single operations are thread safe. A sequence of calls (check, then update) is not atomic as a whole.
From what I understood your scenario for adding new [key, value] pair is as follows:
Check whether the mapping already exists
If not, just add it
If yes, update the existing value in the mapping based on some check
I doubt there is an implementation in place which does this for you in thread-safe way. In the case I understood your use-case correctly you will need to add some manual synchronization on your own to make the update steps transactional.
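For what it's worth, on Java 8 or later the check-then-update step can be expressed with merge, which ConcurrentSkipListMap provides. The sketch below assumes a hypothetical Item class standing in for USER_DEFINED_OBJECT with an int field index. The ConcurrentSkipListMap documentation states that the remapping function is not guaranteed to be applied once atomically, so the function should be free of side effects.

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class KeepHighestIndex {
    // Hypothetical stand-in for USER_DEFINED_OBJECT.
    static final class Item {
        final int index;
        Item(int index) { this.index = index; }
    }

    // Inserts the candidate if the key is absent; otherwise keeps whichever value has the larger index.
    static void offer(ConcurrentSkipListMap<Integer, Item> map, int key, Item candidate) {
        map.merge(key, candidate,
                (existing, incoming) -> incoming.index > existing.index ? incoming : existing);
    }

    // Four threads race to update the same key; returns the index that survives.
    static int demo() {
        ConcurrentSkipListMap<Integer, Item> map = new ConcurrentSkipListMap<>();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            final int base = t * 100;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 100; i++) {
                    offer(map, 7, new Item(base + i));
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.get(7).index;
    }
}
```

The map stays sorted by key throughout, so no separate sorting step is needed at the end. With a synchronizedSortedMap, the equivalent check-then-put would need to run inside a synchronized block on the map itself.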
First of all, I want to make it clear that I would never use a HashMap to do things that require some kind of order in the data structure and that this question is motivated by my curiosity about the inner details of Java HashMap implementation.
You can read in the java documentation on Object about the Object method hashCode.
I understand from there that hashCode implementation for classes such as String and basic types wrappers (Integer, Long,...) is predictable once the value contained by the object is given. An example of that would be that calls to hashCode for any String object containing the value hello should return always: 99162322
Suppose an algorithm always inserts the same values, in the same order, into an empty Java HashMap that uses Strings as keys. Then the order of its elements at the end should always be the same, am I wrong?
Since the hash code for a concrete value is always the same, if there are not collisions the order should be the same.
On the other hand, if there are collisions, I think (I don't know the facts) that the collisions resolutions should result in the same order for exactly the same input elements.
So, isn't it right that two HashMap objects with the same elements, inserted in the same order should be traversed (by an iterator) giving the same elements sequence?
As far as I know, the order of the elements in a HashMap (assuming we call "order" the order of elements as returned by the values() iterator) is kept until the map is rehashed. We can influence the probability of that event by providing a capacity and/or loadFactor to the constructor.
Nevertheless, we should never rely on this statement because the internal implementation of HashMap is not a part of its public contract and is a subject to change in future.
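A small sketch that checks both observations: the documented String hash for "hello", and that two HashMaps filled identically iterate in the same order. The second check reflects observed behaviour of a given implementation, not anything the HashMap contract promises; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SameInsertionSameOrder {
    // Builds a fresh HashMap with the same keys in the same order and
    // returns the keys in whatever order the map iterates them.
    static List<String> keysInIterationOrder() {
        Map<String, Integer> map = new HashMap<>();
        String[] words = {"hello", "world", "foo", "bar", "baz"};
        for (int i = 0; i < words.length; i++) {
            map.put(words[i], i);
        }
        return new ArrayList<>(map.keySet());
    }

    // Compares the iteration order of two independently built maps.
    static boolean sameOrderTwice() {
        return keysInIterationOrder().equals(keysInIterationOrder());
    }

    // String.hashCode() is specified in the Javadoc, so this value is stable.
    static int helloHash() {
        return "hello".hashCode();
    }
}
```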
I think you are asking "Is HashMap non-deterministic?". The answer is "probably not" (look at the source code of your favourite implementation to find out).
However, bear in mind that because the Java standard does not guarantee a particular order, the implementation is free to alter it at any time (e.g. in newer JRE versions), giving a different (yet deterministic) result.
Whether or not that is true is entirely dependent upon the implementation. What's more important is that it isn't guaranteed. If order is important to you, there are options. You could create your own implementation of Map that does preserve order, you can use a SortedMap/LinkedHashMap, or you can use something like the apache commons-collections OrderedMap: http://commons.apache.org/proper/commons-collections/javadocs/api-release/org/apache/commons/collections4/OrderedMap.html.
I have a multiset in guava and I would like to retrieve the number of instances of a given element without iterating over this multiset (I don't want to iterate because I assume that iterating takes quite some time, as it looks through all the collection).
To do that, I was thinking first to use the entrySet() method of the multiset, to obtain a set with single instances and their corresponding count. Then, transform this set into a hashmap (where keys are the elements of my set, and values are their instance count). Because then I can use the methods of hashmap to directly retrieve a value from its key - done! But this makes sense only if I can transform the set into the hashmap in a quick way (without iterating through all elements): is it possible?
(as I said I expect this question to be flawed on multiple counts, I'd be happy if you could shed light on the conceptual mistakes I probably make here. Thx!)
Simply invoke count(element) on your multiset -- voila!
You may know in Guava Multiset is an interface, not a class.
If you just want to know the repeated number of an element, call Multiset.count(Object element).
Please forget my following statement:
Then if you are using a popular implementation HashMultiset, there is already a HashMap<E, AtomicInteger> working under the scene.
That is, when the HashMultiset iterates, also a HashMap iterates. No need to transform into another HashMap.
If I am going to create a Java Collection, and only want to fill it with elements, and then iterate through it (without knowing the necessary size beforehand), i.e. all I need is Collection<E>.add(E) and Collection<E>.iterator(), which concrete class should I choose? Is there any advantage to using a Set rather than a List, for example? Which one would have the least overhead?
which concrete class should I choose?
I would probably just go with an ArrayList or a LinkedList. Both support the add and iterator methods, and neither of them has any considerable overhead.
Is there any advantage to using a Set rather than a List, for example?
No, I wouldn't say so. (Unless you rely on the order of the elements, in which case you must use a List, or want to disallow duplicates, in which case you should use a Set.)
(I don't see how any Set implementation could beat a list implementation for add / iterator methods, so I'd probably go with a List even if I don't care about order.)
Which one would have the least overhead?
Sounds like micro benchmarking here, but if I'd be forced to guess, I'd say ArrayList (or perhaps LinkedList in corner cases where ArrayLists need to reallocate memory often :-)
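Since only add and iterator are needed, the code can be written against Collection so that the concrete class is a one-line choice. A minimal sketch, with hypothetical class and method names:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedList;
import java.util.function.Supplier;

public class FillThenIterate {
    // Fills a collection of unknown final size, then iterates over it once.
    static int sumOfSquares(Supplier<Collection<Integer>> factory, int n) {
        Collection<Integer> items = factory.get();
        for (int i = 1; i <= n; i++) {
            items.add(i * i);
        }
        int total = 0;
        for (int value : items) {
            total += value;
        }
        return total;
    }

    static int withArrayList(int n) {
        return sumOfSquares(ArrayList::new, n);
    }

    static int withLinkedList(int n) {
        return sumOfSquares(LinkedList::new, n);
    }
}
```

Swapping the implementation then only touches the factory argument, which makes it easy to measure both if the overhead ever matters.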
Do not go with a Set. Sets and Lists differ in their purpose, which you should always consider when choosing the right Collection:
a List is there for maintaining elements in the order you added them; and if you insert the same element twice it will be kept twice
a Set is there for holding one specific element exactly once (uniqueness); order is only relevant for specific implementations (like TreeSet), but still elements that are 'the same' would not be added twice
A Set is only meaningful if you want to make sure no duplicate element is 'registered' (or, with a sorted implementation such as TreeSet, if you also want the elements kept sorted). Otherwise, an ArrayList is just fine.
However, if you want to add elements while iterating too, an ArrayBlockingQueue is better.
Here are some key points which can help you choose your collection according to your requirements -
List (ArrayList or LinkedList)
Duplicate values are allowed.
Insertion order is preserved.
Set
Duplicate values are not allowed.
Insertion order is not preserved in general (LinkedHashSet is an exception).
So according to your requirement List seems to be a suitable choice.
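The differences listed above can be seen by adding the same three strings to different collections. A minimal sketch, with hypothetical class and method names:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.TreeSet;

public class DuplicatesAndOrder {
    // Adds "b", "a", "b" and returns the collection's string form.
    static String fill(Collection<String> target) {
        target.add("b");
        target.add("a");
        target.add("b");
        return target.toString();
    }

    static String asList() {
        return fill(new ArrayList<>());      // keeps duplicates and insertion order
    }

    static String asLinkedHashSet() {
        return fill(new LinkedHashSet<>());  // drops the duplicate, keeps insertion order
    }

    static String asTreeSet() {
        return fill(new TreeSet<>());        // drops the duplicate, sorts the elements
    }
}
```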
Now, between ArrayList and LinkedList -
ArrayList is a random access list. Use it if your most frequent operation is the retrieval of elements by index.
LinkedList can be the better option if you frequently add or remove elements at the ends of the list or through an iterator.