I am having an issue with a simple thread-safe implementation of the observer pattern that uses a ConcurrentSkipListSet to keep track of observer priorities during insertion. The majority of observers have no special priority attributed to them, so under the following Comparable#compareTo method they compare as equal (priority is a value in an enum of five priorities ranging from highest to lowest):
public int compareTo(BaseLink<?> link) {
    return this.priority.compareTo(link.getPriority());
}
When I add observers of equal priority to the ConcurrentSkipListSet, some of the added objects are simply lost during the insertion process. Changing the priorities of any of the observers I have created while testing results in those observers being added to the set without issue, though I assume that given enough observers of the same priority the issue would arise again.
I am unsure what is causing this issue or what I should do to resolve it. Is there anything I can do? Alternatively, if this is an inherent problem with ConcurrentSkipListSet, are there other thread-safe data structures that can give me reasonably performant insertion and sorting times for unique objects?
I presume that you are instantiating the set like this: ConcurrentSkipListSet(myComparator), where myComparator implements the comparison you have shown us.
A ConcurrentSkipListSet is a Set. When you instantiate one using a Comparator, it will use it to:
order the set elements
determine when a new element is already in the set.
In your code, your comparison method is saying that every BaseLink with a given priority is equal to every other BaseLink with that priority. That's what is causing your problem: a Set silently discards elements it considers duplicates.
Solution:
public int compareTo(BaseLink<?> link) {
    int res = this.priority.compareTo(link.getPriority());
    if (res == 0) {
        // tie-breaker for different links with the same priority;
        // compare using some other key / identifier, for example:
        res = Integer.compare(System.identityHashCode(this),
                              System.identityHashCode(link));
    }
    return res;
}
If your BaseLink objects have a natural key or identifier, you could use that as the tie-breaker for objects with the same priority. Even the object's identity hash code would do ... if you are not worried about "fairness" or "reproducibility" in the ordering of the set.
Related
I need to implement a priority queue where the priority of an item in the queue can change and the queue adjusts itself so that items are always removed in the correct order. I have some ideas of how I could implement this but I'm sure this is quite a common data structure so I'm hoping I can use an implementation by someone smarter than me as a base.
Can anyone tell me the name of this type of priority queue so I know what to search for or, even better, point me to an implementation?
Priority queues such as this are typically implemented using a binary heap data structure as someone else suggested, which usually is represented using an array but could also use a binary tree. It actually is not hard to increase or decrease the priority of an element in the heap. If you know you are changing the priority of many elements before the next element is popped from the queue you can temporarily turn off dynamic reordering, insert all of the elements at the end of the heap, and then reorder the entire heap (at a cost of O(n)) just before the element needs to be popped. The important thing about heaps is that it only costs O(n) to put an array into heap order but O(n log n) to sort it.
I have used this approach successfully in a large project with dynamic priorities.
Here is my parameterized priority queue implementation in the Curl programming language.
A standard binary heap supports 5 operations (the operations below assume a max heap):
* find-max: return the maximum node of the heap
* delete-max: removing the root node of the heap
* increase-key: updating a key within the heap
* insert: adding a new key to the heap
* merge: joining two heaps to form a valid new heap containing all the elements of both.
As you can see, in a max heap you can increase an arbitrary key, and in a min heap you can decrease an arbitrary key. Unfortunately you can't change keys both ways, but will this do? If you need to change keys both ways, then you might want to think about using a min-max heap.
I would suggest first trying the head-on approach to updating a priority:
delete the item from the queue
re-insert it with the new priority
In C++, this could be done using a std::multimap; the important thing is that the object must remember where it is stored in the structure so it can delete itself efficiently. For the re-insert, it's difficult to do better, since you cannot presume to know anything about the other priorities.
#include <map>
#include <string>

class Item;

// A multimap keyed by priority stands in for the priority queue.
typedef std::multimap<int, Item*> priority_queue;

class Item
{
public:
    Item() : mPriority(0), mQueue(nullptr) {}
    void add(priority_queue& queue);
    void remove();
    int getPriority() const;
    void setPriority(int priority);
    std::string& accessData();
    const std::string& getData() const;
private:
    int mPriority;
    std::string mData;
    priority_queue* mQueue;
    priority_queue::iterator mIterator;
};
void Item::add(priority_queue& queue)
{
    mQueue = &queue;
    mIterator = queue.insert(std::make_pair(mPriority, this));
}
void Item::remove()
{
    mQueue->erase(mIterator);
    mQueue = nullptr;
    mIterator = priority_queue::iterator();
}
void Item::setPriority(int priority)
{
    mPriority = priority;
    if (mQueue)
    {
        priority_queue& queue = *mQueue;
        this->remove();
        this->add(queue);
    }
}
I am looking for exactly the same thing!
And here is my idea:
Since the priority of an item keeps changing, it's meaningless to sort the queue before retrieving an item. So we should forget about using a priority queue and instead "partially" sort the container while retrieving an item, choosing from the following STL algorithms:
a. partition
b. stable_partition
c. nth_element
d. partial_sort
e. partial_sort_copy
f. sort
g. stable_sort
partition, stable_partition and nth_element are linear-time algorithms, and should be our first choices.
BUT, it seems that those algorithms are not provided in the standard Java library. As a result, I suggest you use java.util.Collections.max/min to do what you want.
Google has a number of answers for you, including an implementation of one in Java.
However, this sounds like something that would be a homework problem, so if it is, I'd suggest trying to work through the ideas yourself first, then potentially referencing someone else's implementation if you get stuck somewhere and need a pointer in the right direction. That way, you're less likely to be "biased" towards the precise coding method used by the other programmer and more likely to understand why each piece of code is included and how it works. Sometimes it can be a little too tempting to do the paraphrasing equivalent of "copy and paste".
First of all, I want to make it clear that I would never use a HashMap to do things that require some kind of order in the data structure and that this question is motivated by my curiosity about the inner details of Java HashMap implementation.
You can read about the hashCode method in the Java documentation for Object.
I understand from there that the hashCode implementation for classes such as String and the primitive wrapper types (Integer, Long, ...) is predictable once the value contained by the object is given. For example, calls to hashCode for any String object containing the value "hello" should always return 99162322.
Suppose an algorithm always inserts the same values, in the same order, into an empty Java HashMap where Strings are used as keys. Then the order of its elements at the end should always be the same, am I wrong?
Since the hash code for a concrete value is always the same, if there are no collisions the order should be the same.
On the other hand, if there are collisions, I think (though I don't know the facts) that the collision resolution should result in the same order for exactly the same input elements.
So, isn't it right that two HashMap objects with the same elements, inserted in the same order, should be traversed (by an iterator) giving the same sequence of elements?
As far as I know, the order (assuming we call "order" the order of elements as returned by the values() iterator) of the elements in a HashMap is kept until a rehash is performed. We can influence the probability of that event by providing a capacity and/or loadFactor to the constructor.
Nevertheless, we should never rely on this, because the internal implementation of HashMap is not part of its public contract and is subject to change in the future.
I think you are asking "Is HashMap non-deterministic?". The answer is "probably not" (look at the source code of your favourite implementation to find out).
However, bear in mind that because the Java standard does not guarantee a particular order, the implementation is free to alter at any time (e.g. in newer JRE versions), giving a different (yet deterministic) result.
Whether or not that is true is entirely dependent upon the implementation. What's more important is that it isn't guaranteed. If the order is important to you, there are options: you could create your own implementation of Map that preserves order, you can use a SortedMap or LinkedHashMap, or you can use something like the Apache commons-collections OrderedMap: http://commons.apache.org/proper/commons-collections/javadocs/api-release/org/apache/commons/collections4/OrderedMap.html.
I have code in which for-each loops over a Set need to rely on the iterator always returning the elements in the same order, e.g.
for (ParameterObject parameter : parameters) {
    /* ... */
}
The iterators returned by HashSet are not guaranteed to have this property, however it is documented that the iterators of LinkedHashSet do have this property. So my code uses a LinkedHashSet and everything works fine.
However, I am wondering whether I could endow my code with a check that the set passed to it conforms to this requirement. It appears that this is not possible (except for a direct instanceof test on LinkedHashSet): there is no interface implemented by LinkedHashSet, or by LinkedHashSet.iterator(), that I could test against. It would be nice if there were an interface like OrderConsistentCollection or OrderConsistentIterator.
(I need this property here).
There isn't a way you can check for it -- but you can ensure it anyway, by simply copying the set into a collection that does have that property. A LinkedHashSet would do the trick, but if all you need is the iteration, an ArrayList would probably serve you better.
List<Foo> parameters = new ArrayList<>(parametersSet);
Now parameters will always return an iterator with the same ordering.
That said, you'd probably be fine with Evgeniy Dorofeev's suggestion, which points out that even the sets that don't guarantee a particular ordering usually do have a stable ordering (even if they don't guarantee it). HashSet acts that way, for instance. You'd actually have to have a pretty funky set, or take active randomization measures, to not have a stable ordering.
HashSet's ordering is not guaranteed, but it depends on the hash codes of its elements as well as the order in which they were inserted. The implementors don't want to guarantee anything because they don't want to lock themselves into any one strategy, and even that loose a contract would amount to essentially random order if the objects' hash codes came from Object.hashCode(). Rather than specifying an ordering with complex implications and then saying it's subject to change, they just said there are no guarantees. But those are the two factors that determine the ordering, and if the set isn't being modified, those two factors are stable from one iteration to the next.
'HashSet.iterator does not return elements in any particular order' means that the elements returned by the iterator are not sorted or ordered as in a List or LinkedHashSet. But HashSet.iterator will always return the elements in one and the same order as long as the HashSet itself is unchanged.
HashSet's iterator is actually predictable; see this:
HashSet<Integer> set = new HashSet<>();
set.add(9);
set.add(2);
set.add(5);
set.add(1);
System.out.println(set);
I can foretell the output: it will be [1, 2, 5, 9], because the elements end up ordered by their hash codes (an Integer's hash code is its value, so these small values land in bucket order).
Consider a class with a comparable field (consistent with equals) and a non-comparable field (of a class about which I do not know whether it overrides Object#equals or not).
The class's instances shall be compared, where the resulting order shall be consistent with equals, i.e. 0 is returned iff both fields are equal (as per Object#equals), and otherwise consistent with the order of the comparable field. I used System.identityHashCode to cover most of the cases not covered by these requirements (the order of instances with the same comparable value but different other values is arbitrary), but I am not sure whether this is the best approach.
public class MyClass implements Comparable<MyClass> {
    private Integer intField;
    private Object nonCompField;

    @Override
    public int compareTo(MyClass other) {
        int intFieldComp = this.intField.compareTo(other.intField);
        if (intFieldComp != 0)
            return intFieldComp;
        if (this.nonCompField.equals(other.nonCompField))
            return 0;
        // ...and now? My current approach:
        if (System.identityHashCode(this.nonCompField) < System.identityHashCode(other.nonCompField))
            return -1;
        else
            return 1;
    }
}
Two problems I see here:
If System.identityHashCode is the same for two objects, each compares greater than the other. (Can this happen at all?)
The order of instances with the same intField value and different nonCompField values need not be consistent between runs of the program, as far as I understand what System.identityHashCode does.
Is that correct? Are there more problems? Most importantly, is there a way around this?
The first problem, although highly unlikely, could happen (I think you would need an enormous amount of memory and very bad luck). But it's solved by Guava's Ordering.arbitrary(), which uses the identity hash code behind the scenes but maintains a cache of comparison results for the cases where two different objects have the same identity hash code.
Regarding your second question, no, the identity hash codes are not preserved between runs.
System.identityHashCode […] the same for two objects […] (Can this happen at all?)
Yes it can. Quoting from the Java API Documentation:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects.
identityHashCode(Object x) returns the same hash code for the given object as would be returned by the default method hashCode(), whether or not the given object's class overrides hashCode().
So you may encounter hash collisions, and with memory ever growing but hash codes staying fixed at 32 bit, they will become increasingly more likely.
The order of instances with the same intField value and different nonCompField values need not be consistent between runs of the program, as far as I understand what System.identityHashCode does.
Right. It might even be different during a single invocation of the same program: You could have (1,foo) < (1,bar) < (1,baz) even though foo.equals(baz).
Most importantly, is there a way around this?
You can maintain a map which maps each distinct value of the non-comparable type to a sequence number which you increase for each distinct value you encounter.
Memory management will be tricky, though: You cannot use a WeakHashMap as the code might make your key object unreachable but still hold a reference to another object of the same value. So either you maintain a list of weak references to all the objects of a given value, or you simply use strong references and accept the fact that any uncomparable value ever encountered will never be garbage collected.
Note that this scheme will still not result in reproducible sequence numbers unless you create values reproducibly in just the same order.
If the class of the nonCompField has implemented a reasonably good toString(), you might be able to use
return String.valueOf(this.nonCompField).compareTo(String.valueOf(other.nonCompField));
Unfortunately, the default Object.toString() uses the hashcode, which has potential issues as noted by others.
I'm trying to use a PriorityQueue to order objects using a Comparator.
This can be achieved easily, but the object's fields (with which the comparator calculates priority) may change after the initial insertion. Most people have suggested the simple solution of removing the object, updating its values, and reinserting it, since that is when the priority queue's comparator is put into action.
Is there a better way other than just creating a wrapper class around the PriorityQueue to do this?
You have to remove and re-insert, as the queue works by putting new elements in the appropriate position when they are inserted. This is much faster than the alternative of finding the highest-priority element every time you pull out of the queue. The drawback is that you cannot change the priority after the element has been inserted. A TreeMap has the same limitation (as does a HashMap, which also breaks when the hashcode of its elements changes after insertion).
If you want to write a wrapper, you can move the comparison code from enqueue to dequeue. You would not need to sort at enqueue time anymore (because the order it creates would not be reliable anyway if you allow changes).
But this will perform worse, and you want to synchronize on the queue if you change any of the priorities. Since you need to add synchronization code when updating priorities, you might as well just dequeue and enqueue (you need the reference to the queue in both cases).
I don't know if there is a Java implementation, but if you're changing key values a lot, you can use a Fibonacci heap, which has O(1) amortized cost to decrease the key value of an entry in the heap, rather than the O(log n) of an ordinary heap.
One easy solution is to just add that element to the priority queue again. It will not change the way you extract the elements; it will consume more space, but not enough to affect your running time.
To prove this, consider Dijkstra's algorithm below:
public int[] dijkstra() {
    int[] distance = new int[this.vertices];
    int[] previous = new int[this.vertices];
    for (int i = 0; i < this.vertices; i++) {
        distance[i] = Integer.MAX_VALUE;
        previous[i] = -1;
    }
    distance[0] = 0;
    previous[0] = 0;
    PriorityQueue<Node> pQueue = new PriorityQueue<>(this.vertices, new NodeComparison());
    addValues(pQueue, distance);
    while (!pQueue.isEmpty()) {
        Node n = pQueue.remove();
        List<Edge> neighbours = adjacencyList.get(n.position);
        for (Edge neighbour : neighbours) {
            if (distance[neighbour.destination] > distance[n.position] + neighbour.weight) {
                distance[neighbour.destination] = distance[n.position] + neighbour.weight;
                previous[neighbour.destination] = n.position;
                pQueue.add(new Node(neighbour.destination, distance[neighbour.destination]));
            }
        }
    }
    return previous;
}
Here our interest is in the line:
pQueue.add(new Node(neighbour.destination, distance[neighbour.destination]));
I am not changing the priority of a particular node by removing it and adding it again; rather, I am just adding a new node for the same vertex with a different priority.
Now at extraction time I will always get this new node first, because this is a min-heap and any stale node with a greater value (lower priority) is extracted afterwards; by that time all the neighbouring nodes have already been relaxed.
Without reimplementing the priority queue yourself (that is, using only java.util.PriorityQueue) you have essentially two main approaches:
1) Remove and put back
Remove the element, then put it back with the new priority. This is explained in the answers above. Removing an element is O(n), so this approach is quite slow.
2) Use a Map and keep stale items in the queue
Keep a HashMap of item -> priority. The keys of the map are the items (without their priority) and the values of the map are the priorities.
Keep it in sync with the PriorityQueue (i.e. every time you add or remove an item from the Queue, update the Map accordingly).
Now when you need to change the priority of an item, simply add the same item to the queue with a different priority (and update the map, of course). When you poll an item from the queue, check whether its priority is the same as in your map. If not, ditch it and poll again.
If you don't need to change the priorities too often, this second approach is faster. Your heap will be larger and you might need to poll more times, but you don't need to find your item.
The 'change priority' operation would be O(f(n) · log n*), with f(n) the number of 'change priority' operations per item and n* the actual size of your heap (which is n · f(n)).
I believe that if f(n) is O(n / log n) (for example f(n) = O(sqrt(n))), this is faster than the first approach.
Note: in the explanation above, by priority I mean all the variables that are used in your Comparator. Also, your items need to implement equals and hashCode, and neither method should use the priority variables.
It depends a lot on whether you have direct control of when the values change.
If you know when the values change, you can remove and reinsert (which is in fact fairly expensive, as removing requires a linear scan over the heap!).
Alternatively, you can use an UpdatableHeap structure (not in stock Java, though) for this situation. Essentially, that is a heap that tracks the position of its elements in a hashmap, so that when the priority of an element changes it can repair the heap. Third, you can look at a Fibonacci heap, which does the same.
Depending on your update rate, a linear scan / quicksort / QuickSelect each time might also work. In particular, if you have many more updates than pulls, this is the way to go. QuickSelect is probably best if you have batches of updates and then batches of pull operations.
To trigger reheapify try this:
if (!priorityQueue.isEmpty()) {
    priorityQueue.add(priorityQueue.remove());
}
Something I've tried that works so far is peeking to see whether the reference to the object you're changing is the same as the head of the PriorityQueue: if it is, you poll(), make the change, then re-insert; otherwise you can change it without polling, because when the head is eventually polled the heap is heapified anyway.
DOWNSIDE: This changes the priority for objects with the same priority.
Is there a better way other than just creating a wrapper class around the PriorityQueue to do this?
It depends on the definition of "better" and the implementation of the wrapper.
If the implementation of the wrapper is to re-insert the value using the PriorityQueue's .remove(...) and .add(...) methods, it's important to point out that .remove(...) runs in O(n) time. Depending on the heap implementation, updating the priority of a value can be done in O(log n) or even O(1) time, so this wrapper suggestion may fall short of common expectations.
If you want to minimize your implementation effort, as well as the risk of bugs of any custom solution, then a wrapper that performs re-insert looks easy and safe.
If you want the implementation to be faster than O(n), then you have some options:
Implement a heap yourself. The Wikipedia entry describes multiple variants with their properties. This approach is likely to get you the best performance; at the same time, the more code you write yourself, the greater the risk of bugs.
Implement a different kind of wrapper: handle updating the priority by marking the entry as removed and adding a new entry with the revised priority. This is relatively easy to do (less code), see below, though it has its own caveats.
I came across the second idea in Python's documentation, and applied it to implement a reusable data structure in Java (see caveats at the bottom):
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class UpdatableHeap<T> {
    private final PriorityQueue<Node<T>> pq = new PriorityQueue<>(Comparator.comparingInt(node -> node.priority));
    private final Map<T, Node<T>> entries = new HashMap<>();

    public void addOrUpdate(T value, int priority) {
        if (entries.containsKey(value)) {
            // mark the stale entry; it stays in the heap until popped
            entries.remove(value).removed = true;
        }
        Node<T> node = new Node<>(value, priority);
        entries.put(value, node);
        pq.add(node);
    }

    public T pop() {
        while (!pq.isEmpty()) {
            Node<T> node = pq.poll();
            if (!node.removed) {
                entries.remove(node.value);
                return node.value;
            }
        }
        throw new IllegalStateException("pop from empty heap");
    }

    public boolean isEmpty() {
        return entries.isEmpty();
    }

    private static class Node<T> {
        private final T value;
        private final int priority;
        private boolean removed = false;

        private Node(T value, int priority) {
            this.value = value;
            this.priority = priority;
        }
    }
}
Note some caveats:
Entries marked removed stay in memory until they are popped
This can be unacceptable in use cases with very frequent updates
The internal Node wrapper around each value adds a constant memory overhead per entry. There is also an internal Map, mapping every value currently in the priority queue to its Node wrapper.
Since the values are used as map keys, users must be aware of the usual cautions when using a map, and make sure to have appropriate equals and hashCode implementations.