PriorityQueue.poll() calls compareTo()? - java

I am implementing a PriorityQueue in my program. For that I have also implemented compareTo(). The compareTo() is being called when I perform add(), which is expected. But it is also called when I perform poll(). I thought that the function of poll() is just to remove the head. Why does it need to call compareTo()?

A priority queue is often implemented with a heap. Part of poll()ing requires restructuring the heap, which requires the heap to compare elements... hence compareTo(). This is just a guess, though (i.e. I have not dug into the source code to verify my claim).
Here's a quick search on how priority queues are implemented using heaps if you are interested: http://pages.cs.wisc.edu/~vernon/cs367/notes/11.PRIORITY-Q.html#imp
Actually, just for fun, I'll describe how this works in a non-rigorous fashion.

A heap is a tree satisfying the heap property: parents are always less than or equal to their children (min heap), or parents are always at least as large as their children (max heap). PriorityQueue is a min heap, so poll() removes the root (make sure you understand this).

But what happens to the tree if you remove the root? It's no longer a tree... So the way they fix this is by moving the root of the tree to a leaf node (where it can be plucked without destroying the tree/invalidating the heap property), and putting some other node in the root. But which node do you put into the root? Intuitively you might think they'd put the left or right child of the root (those are "almost as small as the original root"). You can do that, but you'd then need to fix the subtree rooted at that child (and the code is ugly). Instead they do the same thing (conceptually) but do it slightly differently to make the code nicer. In particular, they pluck a leaf node and stick it in the root (generally you swap the root and the leaf node to do both steps simultaneously).

However, the heap property is no longer necessarily satisfied (the leaf node we stuck in the root could be quite large!). To fix this, you "bubble down" the new root until you get it to its correct location. Specifically, you compare the new root with the left and right children and keep swapping (if the parent is larger than at least one of the children) until the heap property is satisfied. Notice that this swapping will indeed lead to a valid heap (you can prove this, but it's intuitive).
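As a non-authoritative sketch of that bubble-down step (the names and the int-array representation are my own, not the JDK's):

```java
// A rough sketch (not the JDK's code) of the bubble-down that poll()
// triggers, on an int-array min heap; heap[0] is the root and the
// children of index i live at 2i+1 and 2i+2.
class MinHeapSketch {
    // Restore the min-heap property below index i: compare the parent
    // with its children and swap with the smaller child until settled.
    // These comparisons are exactly where compareTo() gets invoked for
    // object elements.
    static void siftDown(int[] heap, int size, int i) {
        while (true) {
            int left = 2 * i + 1, right = 2 * i + 2, smallest = i;
            if (left < size && heap[left] < heap[smallest]) smallest = left;
            if (right < size && heap[right] < heap[smallest]) smallest = right;
            if (smallest == i) break;              // heap property holds
            int tmp = heap[i]; heap[i] = heap[smallest]; heap[smallest] = tmp;
            i = smallest;
        }
    }

    // poll(): take the root, move the last leaf into its place, sift down.
    static int poll(int[] heap, int size) {
        int root = heap[0];
        heap[0] = heap[size - 1];
        siftDown(heap, size - 1, 0);
        return root;
    }
}
```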

Everything is in the JavaDoc (emphasis mine):
An unbounded priority queue based on a priority heap.
And in the source code of poll() you'll find:
public E poll() {
    //...
    if (s != 0)
        siftDown(0, x);
    return result;
}
Where siftDown() is:
/**
 * Inserts item x at position k, maintaining heap invariant by
 * demoting x down the tree repeatedly until it is less than or
 * equal to its children or is a leaf.
 * [...]
 */
private void siftDown(int k, E x) {
    if (comparator != null)
        siftDownUsingComparator(k, x);
    else
        siftDownComparable(k, x);
}
The JavaDoc comment on siftDown() is crucial; read it carefully. Basically, the underlying implementation of PriorityQueue uses a heap, which has to be restructured every time you modify it by polling.
Why does this bother you? compareTo() should be a lightweight, idempotent and side-effect-free method, like equals(); you shouldn't make any assumptions about when or how often it is called.


LinkedList new reverse algorithm

I created an algorithm for reversing a linked list, which is mentioned below. Can someone tell me if it's efficient or not? It is taking O(n) time, though.
private void insertBegining(T data) {
    Node<T> newNode = new Node<T>(data);
    newNode.setNextNode(root);
    root = newNode;
}

private void reverseList() {
    Node<T> curr = root;
    while (curr.getNextNode() != null) {
        insertBegining(curr.getNextNode().getData());
        curr.setNextNode(curr.getNextNode().getNextNode());
    }
}
You don't need to create new nodes; just reuse the existing ones, changing the next field. Same complexity, O(n), but less heap usage.
private void reverseList() {
    Node<T> reversed = null;
    while (root != null) {
        Node<T> next = root.getNextNode();
        root.setNextNode(reversed);
        reversed = root;
        root = next;
    }
    root = reversed;
}
Can someone tells me if its efficient or not.
It is incredibly inefficient. There's no fixing this; linked lists just are, by nature. Don't use them in your code if you like efficiency.
There are two different 'kinds' of efficiency: academic/algorithmic (described, generally, in big-O notation), and pragmatic efficiency: how long it actually takes on actual, real-life, modern, commonly employed hardware, such as ARM and x86-64 architecture chips on Windows, Linux, and macOS.
If you want to make reversing a LinkedList algorithmically faster than O(n), your only option is to work on the original form. For example, if you have a doubly-linked list, where each node is not just aware of the node that follows it, but also aware of the node that precedes it, then reversing the list can be an O(1) operation: just create a wrapper that starts at the end and implements any attempt to 'go next' by actually invoking the 'getPrevious()' method. But this all demands that you have a doubly-linked list to start with. If you just do not have it, then it is obviously impossible to reverse the list without iterating through it once, which dooms you to O(n) or worse performance.
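A minimal sketch of that O(1) 'reversed view' wrapper, assuming a doubly-linked Node with data/next/prev fields (these names are illustrative, not java.util.LinkedList's API):

```java
import java.util.Iterator;

// Sketch: an O(1) "reversed view" over a doubly-linked list. Nothing is
// copied or relinked; iteration simply starts at the tail and follows
// the prev pointers. Node and its fields are assumed names, not JDK API.
class Node<T> {
    T data;
    Node<T> next, prev;
    Node(T data) { this.data = data; }
}

class ReversedView<T> implements Iterable<T> {
    private final Node<T> tail;   // constructing the view is O(1)
    ReversedView(Node<T> tail) { this.tail = tail; }

    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Node<T> curr = tail;
            public boolean hasNext() { return curr != null; }
            public T next() { T d = curr.data; curr = curr.prev; return d; }
        };
    }
}
```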
The reason that linked lists are so, so bad (and this makes it considerably worse) in pragmatic terms is in particular the cache issue.
Modern CPU design uses hierarchical layers of on-chip cache. The CPUs no longer operate on main memory, because main memory is waaay too slow; the CPU can process 500 cycles worth of instructions or more (and, generally, a bunch of them more or less in parallel because CPUs have pipelines, so it could do a heck of a lot of work in those 500 cycles), just in the time it takes to fetch some data from memory.
The solution is that nowadays CPUs can't even access memory anymore, at all. Instead, the CPU only operates on a page of memory loaded in a CPU cache. If the CPU needs to act on data that isn't loaded in a page that is in cache, then the CPU tells the memory controller to go fetch it, and will then go to sleep or do other work. A cache page is eventually 'saved' back into actual memory later by the controller when the CPU is done operating on it.
Whenever a CPU core needs to operate on memory that isn't loaded in a cache page that's called a cache miss. These are incredibly expensive (those 500+ cycles I mentioned).
The problem with linked lists, at least as implemented above, is the problem of fragmentation: You have not just the objects stored in your linked list (say, it's a linked list of strings - those strings), you also have these node objects.
The locations in memory of both the strings and the node objects are crucial.
The best possible situation is if all these node objects are all stored in a contiguous block of memory (all next to each other, nicely ordered). This way, if you are e.g. just iterating through a list to e.g. figure out how large it is, in memory you get the minimum amount of misses (you'd process an entire cache-page's worth of node objects and then move on to the next page). However, often you also interact with the objects these nodes are pointing at, and generally the strings are in a different place.
The worst possible situation is if the nodes are scattered throughout memory, causing a cache miss on every iteration. Often nodes and the data they contain are intermixed which is not good, especially if the data contained is large.
That's why node-based linked lists are inherently inefficient. It'd be slightly more efficient if the objects you are storing themselves contained the next/prev pointers, but Java doesn't make this easy, and design-wise it's annoying (it conflates ideas, and it means an object can only exist in one linked list at a time; Java doesn't allow you to create on-the-fly alternate definitions of objects that have mixed in a next and prev field).
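To make that 'intrusive' idea concrete, a minimal sketch (Task, TaskList, and their fields are made-up names, not a real API):

```java
// Sketch of the "intrusive" idea named above: the stored object itself
// carries the links, so element and node share one allocation instead of
// two. The trade-off stated above applies: a Task can only sit in one
// such list at a time.
class Task {
    final String name;
    Task next, prev;   // links embedded directly in the element
    Task(String name) { this.name = name; }
}

class TaskList {
    Task head;

    // Insert at the front: no separate wrapper Node object is allocated.
    void push(Task t) {
        t.next = head;
        if (head != null) head.prev = t;
        head = t;
    }
}
```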
ArrayList is what you generally want.
You don't need to create new nodes; just reuse the existing ones by changing the direction of the next pointer. Below is code with the same complexity, O(n), but less heap usage. It uses three pointers to reverse the list.
private void reverseList() {
    if (head == null) {
        throw new EmptyListException(EMPTY_LIST);
    } else if (head.next == null) {
        return;
    }
    Node<T> nextNode;
    Node<T> node = head;
    Node<T> prevNode = null;
    while (node != null) {
        nextNode = node.next;
        node.next = prevNode;
        prevNode = node;
        node = nextNode;
    }
    head = prevNode;
}

Dynamic Sorting Queue Java [duplicate]

I need to implement a priority queue where the priority of an item in the queue can change and the queue adjusts itself so that items are always removed in the correct order. I have some ideas of how I could implement this but I'm sure this is quite a common data structure so I'm hoping I can use an implementation by someone smarter than me as a base.
Can anyone tell me the name of this type of priority queue so I know what to search for or, even better, point me to an implementation?
Priority queues such as this are typically implemented using a binary heap data structure as someone else suggested, which usually is represented using an array but could also use a binary tree. It actually is not hard to increase or decrease the priority of an element in the heap. If you know you are changing the priority of many elements before the next element is popped from the queue you can temporarily turn off dynamic reordering, insert all of the elements at the end of the heap, and then reorder the entire heap (at a cost of O(n)) just before the element needs to be popped. The important thing about heaps is that it only costs O(n) to put an array into heap order but O(n log n) to sort it.
I have used this approach successfully in a large project with dynamic priorities.
Here is my implementation of a parameterized priority queue in the Curl programming language.
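In Java, the 'turn reordering off, rebuild once' idea described above can be sketched with java.util.PriorityQueue, whose copy constructor establishes heap order in O(n) (BulkRebuild and the Integer element type are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class BulkRebuild {
    // Sketch: instead of paying O(log n) per changed element, collect all
    // pending elements and rebuild the heap once. The PriorityQueue copy
    // constructor heapifies the whole collection in O(n), versus
    // O(n log n) for sorting it.
    static PriorityQueue<Integer> rebuild(PriorityQueue<Integer> stale,
                                          List<Integer> updated) {
        List<Integer> all = new ArrayList<>(stale);   // current contents
        all.addAll(updated);                          // batched changes
        return new PriorityQueue<>(all);              // one O(n) heapify
    }
}
```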
A standard binary heap supports 5 operations (the example below assume a max heap):
* find-max: return the maximum node of the heap
* delete-max: removing the root node of the heap
* increase-key: updating a key within the heap
* insert: adding a new key to the heap
* merge: joining two heaps to form a valid new heap containing all the elements of both.
As you can see, in a max heap you can increase an arbitrary key, and in a min heap you can decrease an arbitrary key. You can't change keys both ways, unfortunately, but will this do? If you need to change keys both ways, then you might want to think about using a min-max heap.
I would suggest first trying the head-on approach to updating a priority:
delete the item from the queue
re-insert it with the new priority
In C++, this could be done using a std::multimap; the important thing is that the object must remember where it is stored in the structure to be able to delete itself efficiently. For re-insert, it's difficult since you cannot presume you know anything about the priorities.
class Item;
typedef std::multimap<int, Item*> priority_queue;
class Item
{
public:
    void add(priority_queue& queue);
    void remove();
    int getPriority() const;
    void setPriority(int priority);
    std::string& accessData();
    const std::string& getData() const;
private:
    int mPriority;
    std::string mData;
    priority_queue* mQueue;
    priority_queue::iterator mIterator;
};
void Item::add(priority_queue& queue)
{
    mQueue = &queue;
    mIterator = queue.insert(std::make_pair(mPriority, this));
}

void Item::remove()
{
    mQueue->erase(mIterator);
    mQueue = 0;
    mIterator = priority_queue::iterator();
}
void Item::setPriority(int priority)
{
    mPriority = priority;
    if (mQueue)
    {
        priority_queue& queue = *mQueue;
        this->remove();
        this->add(queue);
    }
}
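The same delete-then-re-insert idea can be sketched in Java against java.util.PriorityQueue (Item and changePriority are illustrative names; note that remove(Object) is a linear scan):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch of the remove-and-re-insert approach with java.util.PriorityQueue.
// Item and its priority field are made-up names, not a real API.
class Item {
    int priority;
    Item(int priority) { this.priority = priority; }
}

class Reinsert {
    // remove() is a linear scan (O(n)) in java.util.PriorityQueue;
    // add() then re-sifts the element into position in O(log n).
    static void changePriority(PriorityQueue<Item> queue, Item item, int newPriority) {
        queue.remove(item);            // must happen BEFORE the mutation
        item.priority = newPriority;
        queue.add(item);
    }
}
```

The order matters: removing after mutating the priority can leave the heap unable to locate the element along the expected path.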
I am looking for just exactly the same thing!
And here is some of my idea:
Since the priority of an item keeps changing, it's meaningless to sort the queue before retrieving an item. So we should forget about using a priority queue, and instead "partially" sort the container while retrieving an item.
And choose from the following STL sort algorithms:
a. partition
b. stable_partition
c. nth_element
d. partial_sort
e. partial_sort_copy
f. sort
g. stable_sort
partition, stable_partition and nth_element are linear-time algorithms, which should be our first choices.
BUT, it seems that those algorithms are not provided in the official Java library. As a result, I suggest you use java.util.Collections.max/min to do what you want.
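A minimal sketch of that Collections.min-per-retrieval idea (pollMin is a made-up helper; each retrieval is an O(n) scan, so freshly changed priorities are always honoured):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MinScan {
    // Sketch: skip keeping the container sorted entirely and pay O(n)
    // per retrieval instead -- reasonable when priorities change so often
    // that any precomputed order is stale by the time you poll.
    static int pollMin(List<Integer> items) {
        int min = Collections.min(items);       // O(n) scan
        items.remove(Integer.valueOf(min));     // remove the element, not an index
        return min;
    }
}
```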
Google has a number of answers for you, including an implementation of one in Java.
However, this sounds like something that would be a homework problem, so if it is, I'd suggest trying to work through the ideas yourself first, then potentially referencing someone else's implementation if you get stuck somewhere and need a pointer in the right direction. That way, you're less likely to be "biased" towards the precise coding method used by the other programmer and more likely to understand why each piece of code is included and how it works. Sometimes it can be a little too tempting to do the paraphrasing equivalent of "copy and paste".

Java: strange order of queue made from priority queue

I wrote a maze-solving program which is supposed to support DFS, BFS, A*, Dijkstra's, and a greedy algorithm. Anyway, I chose PriorityQueue for my frontier data structure, since I thought a priority queue can behave like a queue, stack, or priority queue depending on the implementation of the comparator.
This is how I implemented my comparator to turn the priority queue into a queue:
/* Since the "natural ordering" of a priority queue has the least element at the head, and a conventional comparator returns -1 when the first is less than the second, the hacked comparator always returns 1 so that the current (last) square will be placed at the tail (this should work recursively) */
public int compare(Square square1, Square square2)
{
    return 1;
}
However, my solution for the maze was not optimal after I did a BFS.
The maze starts at top right corner with coordinate (35,1) and my program checks the left, then up, then down, then right neighbour.
Here are the println I did:
polled out (35,1)
added (34,1)
added (35,2)
polled out (34,1)
added (33,1)
added (34,2)
polled out (35,2)
added (35,3)
polled out (33,1)
added (32,1)
added (33,2)
polled out (34,2)
add (34,3)
poll out (32,1)
......
Notice in a BFS (35,3) should be polled out before (32,1) since the former is added into the queue before the latter. What really confused me is that the data structure behaved like a queue--all new members were added from the back--until I added (32,1), which was placed at the head of the queue.
I thought my comparator should force the priority queue to put new comers in the back. What is even stranger to me is that the data structure changed its nature from a queue to a stack in the middle.
Many thanks to you guys ahead and sorry about my poor English,
Sincerely,
Sean
The way you've implemented compare is wrong, and would only work if it's called only in a very specific way that you're assuming. However, you have no idea in what context the PriorityQueue actually calls compare. The compare function might well be called on an existing element inside the data structure, instead of the new one, or vice versa.
(Even if you did read the source code and traced it and found that this particular implementation works in a certain way, you shouldn't depend on that if you want your code to be maintainable. At the least, you'd be making yourself more work by having to explain why it works.)
You could just use some sort of counter and assign it as the value for each added item, then implement compare correctly based on the value.
A correct implementation of compare might look like this:
int compare(Object x, Object y) {
    // Integer.compare avoids the overflow risk of returning x - y directly
    return Integer.compare(x.getSomeProperty(), y.getSomeProperty());
}
Note that if you switch the order of the parameters, the answer will change as well. No, the int returned does not necessarily have to come from {-1, 0, 1}. The spec calls for 0, or a negative or positive integer. You can use any one you wish, so long as it's the correct sign.
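The counter idea suggested above can be sketched like this (FifoByCounter and Entry are made-up names): each element is tagged with a monotonically increasing sequence number, and compare() orders on that number alone, so the heap dequeues in insertion order no matter when, or on which pair of elements, compare() is invoked:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class FifoByCounter {
    // Each entry carries the value plus the sequence number assigned
    // at insertion time; the comparator looks only at the number.
    static class Entry<T> {
        final T value;
        final long seq;
        Entry(T value, long seq) { this.value = value; this.seq = seq; }
    }

    private long counter = 0;
    private final PriorityQueue<Entry<String>> pq =
        new PriorityQueue<>(Comparator.comparingLong((Entry<String> e) -> e.seq));

    void add(String s) { pq.add(new Entry<>(s, counter++)); }
    String poll() { return pq.poll().value; }
}
```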

Data structure to group the elements of equivalence classes

I have to implement a data structure that groups elements into equivalence classes.
The API:
interface Grouper<T> {
    void same(T l, T r);
    Set<EquivalenceClass<T>> equivalenceClasses();
}

interface EquivalenceClass<T> {
    Set<T> members();
}
For example the grouping behaves like this:
Grouper g;
g.same(a, b);
g.equivalenceClasses() -> [[a,b]]
g.same(b, a);
g.equivalenceClasses() -> [[a,b]]
g.same(b, c);
g.equivalenceClasses() -> [[a,b,c]]
g.same(d, e);
g.equivalenceClasses() -> [[a,b,c], [d,e]]
g.same(c, d);
g.equivalenceClasses() -> [[a,b,c,d]]
I'm looking for an implementation that works up to ~10 million entries. It should be optimized to fill it and get the equivalence classes once.
Take a look at Union-Find. The union ("same") can be done trivially in O(log N), and can be done in effectively O(1) with some optimizations. The "equivalenceClasses" is O(N), which is the cost of visiting everything anyways.
If you are only going to query the equivalences classes once, the best solution is to build an undirected graph over the elements. Each equivalence is an undirected edge between the two items, and the equivalence classes correspond to the connected components. The time and space complexity will both be linear if you do it right.
Alternatively, you can use a Union-Find data structure, which will give you almost-linear time complexity. It may also be considered simpler, because all the complexities are encapsulated into the data structure. The reason Union-Find is not linear comes down to supporting efficient queries while the classes are growing.
Union-find is the best data structure for your problem, as long you only care about total running time (some operations may be slow, but the total cost of all operations is guaranteed to be nearly linear). Enumerating the members of each set is not typically supported in the plain version of union-find in textbooks though. As the name suggests, union-find typically only supports union (i.e., same) and find, which returns an identifier guaranteed to be the same as the identifier returned by a call to find on an element in the same set. If you need to enumerate the members of each set, you may have to implement it yourself so you can add, for example, child pointers so that you can traverse each tree representing a set.
If you are implementing this yourself, you don't have to implement the full union-find data structure to achieve amortized O(lg n) time per operation. Essentially, in this "light" version of union-find, each set would be a singly linked list with an extra pointer inside each node that points to a set identifier node that can be used to test whether two nodes belong to the same list. When the same method is executed, you can just append the smaller list to the larger and update the set identifiers for the elements of the smaller list. The total cost is at most O(lg n) per element because an element can be a member of the smaller list involved in a same operation at most O(lg n) times.
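A compact union-find along those lines, with union by size and path halving (a sketch, not a drop-in implementation of the Grouper interface above; elements are identified by int indices):

```java
class UnionFind {
    private final int[] parent, size;

    UnionFind(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) { parent[i] = i; size[i] = 1; }
    }

    // find with path halving: near-constant amortized time
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];   // shortcut every other link
            x = parent[x];
        }
        return x;
    }

    // union by size: attach the smaller tree under the larger ("same")
    void union(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return;
        if (size[ra] < size[rb]) { int t = ra; ra = rb; rb = t; }
        parent[rb] = ra;
        size[ra] += size[rb];
    }
}
```

Enumerating the classes afterwards is one O(n) pass: group every element by its find() result.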

Updating Java PriorityQueue when its elements change priority

I'm trying to use a PriorityQueue to order objects using a Comparator.
This can be achieved easily, but the objects' fields (with which the comparator calculates priority) may change after the initial insertion. Most people have suggested the simple solution of removing the object, updating the values and reinserting it again, as this is when the priority queue's comparator is put into action.
Is there a better way other than just creating a wrapper class around the PriorityQueue to do this?
You have to remove and re-insert, as the queue works by putting new elements in the appropriate position when they are inserted. This is much faster than the alternative of finding the highest-priority element every time you pull out of the queue. The drawback is that you cannot change the priority after the element has been inserted. A TreeMap has the same limitation (as does a HashMap, which also breaks when the hash code of its keys changes after insertion).
If you want to write a wrapper, you can move the comparison code from enqueue to dequeue. You would not need to sort at enqueue time anymore (because the order it creates would not be reliable anyway if you allow changes).
But this will perform worse, and you want to synchronize on the queue if you change any of the priorities. Since you need to add synchronization code when updating priorities, you might as well just dequeue and enqueue (you need the reference to the queue in both cases).
I don't know if there is a Java implementation, but if you're changing key values a lot, you can use a Fibonacci heap, which has O(1) amortized cost to decrease a key value of an entry in the heap, rather than O(log n) as in an ordinary heap.
One easy solution you can implement is just adding that element again into the priority queue. It will not change the way you extract the elements; it will consume more space, but typically not enough to affect your running time.
To illustrate this, consider the Dijkstra implementation below:
public int[] dijkstra() {
    int distance[] = new int[this.vertices];
    int previous[] = new int[this.vertices];
    for (int i = 0; i < this.vertices; i++) {
        distance[i] = Integer.MAX_VALUE;
        previous[i] = -1;
    }
    distance[0] = 0;
    previous[0] = 0;
    PriorityQueue<Node> pQueue = new PriorityQueue<>(this.vertices, new NodeComparison());
    addValues(pQueue, distance);
    while (!pQueue.isEmpty()) {
        Node n = pQueue.remove();
        List<Edge> neighbours = adjacencyList.get(n.position);
        for (Edge neighbour : neighbours) {
            if (distance[neighbour.destination] > distance[n.position] + neighbour.weight) {
                distance[neighbour.destination] = distance[n.position] + neighbour.weight;
                previous[neighbour.destination] = n.position;
                pQueue.add(new Node(neighbour.destination, distance[neighbour.destination]));
            }
        }
    }
    return previous;
}
Here our interest is in line
pQueue.add(new Node(neighbour.destination, distance[neighbour.destination]));
I am not changing the priority of the particular node by removing it and adding it again; rather, I am just adding a new node with the same value but a different priority.
Now at extraction time I will always get this node first, because this is a min heap: the node with the greater value (lower priority) will always be extracted afterwards, and by that time all its neighbouring nodes will already have been relaxed.
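A common companion to this add-duplicates trick is an explicit guard that discards stale entries when they are polled, so each costs only an O(log n) pop instead of a neighbour scan. A self-contained sketch (StaleSkip, the {position, distance} int-array entries, and best[] are illustrative stand-ins for the Node and distance[] used above):

```java
import java.util.PriorityQueue;

// Sketch of the stale-entry guard. An entry is "stale" when a better
// distance was recorded for its position after it was enqueued; the
// guard discards it without touching any neighbours.
class StaleSkip {
    // Entries are {position, distanceWhenEnqueued}; best[] holds the
    // current best-known distance per position.
    static int pollFresh(PriorityQueue<int[]> pq, int[] best) {
        while (!pq.isEmpty()) {
            int[] e = pq.poll();
            if (e[1] <= best[e[0]]) {
                return e[0];       // entry still matches the best known distance
            }
            // otherwise: a better duplicate was already processed; discard
        }
        return -1;                 // queue exhausted
    }
}
```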
Without reimplementing the priority queue yourself (so by only using java.util.PriorityQueue) you have essentially two main approaches:
1) Remove and put back
Remove the element, then put it back with the new priority. This is explained in the answers above. Removing an element is O(n), so this approach is quite slow.
2) Use a Map and keep stale items in the queue
Keep a HashMap of item -> priority. The keys of the map are the items (without their priority) and the values of the map are the priorities.
Keep it in sync with the PriorityQueue (i.e. every time you add or remove an item from the Queue, update the Map accordingly).
Now when you need to change the priority of an item, simply add the same item to the queue with a different priority (and update the map, of course). When you poll an item from the queue, check if its priority is the same as in your map. If not, then ditch it and poll again.
If you don't need to change the priorities too often, this second approach is faster. Your heap will be larger and you might need to poll more times, but you don't need to find your item.
The 'change priority' operation would be O(f(n) log n*), with f(n) the number of 'change priority' operations per item and n* the actual size of your heap (which is n·f(n)).
I believe that if f(n) is O(n/log n) (for example, f(n) = O(sqrt(n))), this is faster than the first approach.
Note: in the explanation above, by priority I mean all the variables that are used in your Comparator. Your items also need to implement equals and hashCode, and neither method should use the priority variables.
It depends a lot on whether you have direct control of when the values change.
If you know when the values change, you can remove and reinsert (which in fact is fairly expensive, as removing requires a linear scan over the heap!).
Alternatively, you can use an UpdatableHeap structure (not in stock Java, though) for this situation. Essentially, that is a heap that tracks the position of elements in a hashmap; this way, when the priority of an element changes, it can repair the heap. Third, you can look for a Fibonacci heap, which does the same.
Depending on your update rate, a linear scan / quicksort / QuickSelect each time might also work. In particular, if you have many more updates than pulls, this is the way to go. QuickSelect is probably best if you have batches of updates and then batches of pull operations.
To trigger reheapify try this:
if (!priorityQueue.isEmpty()) {
    priorityQueue.add(priorityQueue.remove());
}
Something I've tried that works so far: peek to see if the reference to the object you're changing is the same as the head of the PriorityQueue. If it is, then poll(), change, and re-insert; otherwise you can change it without polling, because when the head is eventually polled, the heap is heapified anyway.
DOWNSIDE: This changes the priority for Objects with the same Priority.
Is there a better way other than just creating a wrapper class around the PriorityQueue to do this?
It depends on the definition of "better" and the implementation of the wrapper.
If the implementation of the wrapper is to re-insert the value using the PriorityQueue's .remove(...) and .add(...) methods,
it's important to point out that .remove(...) runs in O(n) time.
Depending on the heap implementation,
updating the priority of a value can be done in O(log n) or even O(1) time,
therefore this wrapper suggestion may fall short of common expectations.
If you want to minimize your effort to implement,
as well as the risk of bugs of any custom solution,
then a wrapper that performs re-insert looks easy and safe.
If you want the implementation to be faster than O(n),
then you have some options:
Implement a heap yourself. The Wikipedia entry describes multiple variants with their properties. This approach is likely to get you the best performance; at the same time, the more code you write yourself, the greater the risk of bugs.
Implement a different kind of wrapper: handle updating the priority by marking the entry as removed, and add a new entry with the revised priority.
This is relatively easy to do (less code), see below, though it has its own caveats.
I came across the second idea in Python's documentation,
and applied it to implement a reusable data structure in Java (see caveats at the bottom):
public class UpdatableHeap<T> {
    private final PriorityQueue<Node<T>> pq = new PriorityQueue<>(Comparator.comparingInt(node -> node.priority));
    private final Map<T, Node<T>> entries = new HashMap<>();

    public void addOrUpdate(T value, int priority) {
        if (entries.containsKey(value)) {
            entries.remove(value).removed = true;
        }
        Node<T> node = new Node<>(value, priority);
        entries.put(value, node);
        pq.add(node);
    }

    public T pop() {
        while (!pq.isEmpty()) {
            Node<T> node = pq.poll();
            if (!node.removed) {
                entries.remove(node.value);
                return node.value;
            }
        }
        throw new IllegalStateException("pop from empty heap");
    }

    public boolean isEmpty() {
        return entries.isEmpty();
    }

    private static class Node<T> {
        private final T value;
        private final int priority;
        private boolean removed = false;

        private Node(T value, int priority) {
            this.value = value;
            this.priority = priority;
        }
    }
}
Note some caveats:
Entries marked removed stay in memory until they are popped
This can be unacceptable in use cases with very frequent updates
The internal Node wrapped around the actual values is an extra memory overhead (constant per entry). There is also an internal Map, mapping all the values currently in the priority queue to their Node wrapper.
Since the values are used in a map, users must be aware of the usual cautions when using a map, and make sure to have appropriate equals and hashCode implementations.
