I'm working in Java with a large HashMap (millions of entries) built with a capacity of 10,000,000 and a load factor of .75, used to cache some values.
Since cached values become useless over time (they are not accessed anymore) but I can't remove the useless ones as I go, I would like to empty the cache entirely when its performance starts to degrade. How can I decide when it is a good time to do that?
For example, with a capacity of 10,000,000 and a load factor of .75, should I empty it when it reaches 7.5 million elements? I have tried various threshold values, but I would like an analytic one.
I've already verified that emptying it when it's quite full boosts performance (the first 2-3 iterations of the algorithm after the wipe just fill it back up; after that it runs faster than before the wipe).
EDIT: ADDITIONAL INFO
The HashMap has long keys and float values. It contains cached correlations of contents: each correlation is a dot product of tag vectors, so I wanted to cache them to increase performance.
So basically what I do is compute a long key using the hash codes of the 2 contents:
static private long computeKey(Object o1, Object o2)
{
    int h1 = o1.hashCode();
    int h2 = o2.hashCode();
    if (h1 < h2)
    {
        int swap = h1;
        h1 = h2;
        h2 = swap;
    }
    // Mask h2 so sign extension doesn't corrupt the upper 32 bits when h2 is negative.
    return ((long) h1) << 32 | (h2 & 0xFFFFFFFFL);
}
and use it to retrieve stored values. Since this is hierarchical clustering, contents get merged, and their correlation values with other contents are then no longer needed. That's why I want to wipe the HashMap from time to time: to avoid degradation due to useless values inside it.
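To be concrete, the lookup pattern is roughly this (dotProduct and the tag-vector accessors are illustrative names):

long key = computeKey(c1, c2);
Float correlation = cache.get(key);
if (correlation == null) {
    correlation = dotProduct(c1.getTagVector(), c2.getTagVector()); // illustrative
    cache.put(key, correlation);
}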
Using a WeakHashMap would unpredictably wipe out data even when it is still needed; I have no control over that.
Thanks
Why not use an LRU Cache?
From Java's LinkedHashMap documentation:
A special constructor is provided to create a linked hash map whose order of iteration is the order in which its entries were last accessed, from least-recently accessed to most-recently (access-order). This kind of map is well-suited to building LRU caches. Invoking the put or get method results in an access to the corresponding entry (assuming it exists after the invocation completes). The putAll method generates one entry access for each mapping in the specified map, in the order that key-value mappings are provided by the specified map's entry set iterator. No other methods generate entry accesses. In particular, operations on collection-views do not affect the order of iteration of the backing map.
So basically, every once in a while as your map gets too big, just delete the first x values that the iterator gives you.
See documentation for removeEldestEntry to have this done for you automatically.
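If you prune manually instead, here is a minimal sketch, assuming an access-ordered map cache with Long keys and Float values (as in the question) and a placeholder count x:

Iterator<Map.Entry<Long, Float>> it = cache.entrySet().iterator();
for (int i = 0; i < x && it.hasNext(); i++) {
    it.next();
    it.remove(); // evicts the least-recently-accessed entries first
}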
Here is code that demonstrates:
public static void main(String[] args) {
    class CacheMap extends LinkedHashMap<Integer, Integer> {
        private final int maxCapacity;

        public CacheMap(int initialCapacity, int maxCapacity) {
            super(initialCapacity, 0.75f, true); // true = access order
            this.maxCapacity = maxCapacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
            return size() > maxCapacity;
        }
    }

    int[] popular = {1, 2, 3, 4, 5};
    CacheMap myCache = new CacheMap(5, 10);
    for (int i = 0; i < 100; i++) {
        myCache.put(i, i);
        for (int p : popular) {
            myCache.get(p);
        }
    }
    System.out.println(myCache.toString());
    //{95=95, 96=96, 97=97, 98=98, 99=99, 1=1, 2=2, 3=3, 4=4, 5=5}
}
Have you investigated WeakHashMap? The garbage collector determines when to remove entries, and it may give you an acceptable substitute for coding something yourself.
This article has more useful information.
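A minimal sketch of the idea (Key and Value are placeholder types):

Map<Key, Value> cache = new WeakHashMap<Key, Value>(); // java.util.WeakHashMap
cache.put(key, value);
// Once 'key' is no longer strongly referenced anywhere else,
// the garbage collector may silently drop the entry.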
You may want to use Google Collections' MapMaker to make a map with soft references and a specific timeout.
Soft References "are cleared at the discretion of the garbage collector in response to memory demand."
Example:
ConcurrentMap<Long, ValueTypeHere> cacheMap = new MapMaker()
.concurrencyLevel(32)
.softValues()
.expiration(30, TimeUnit.MINUTES)
.makeMap();
You can also specify weakKeys if you want to make its keys act like ones in a WeakHashMap.
For example, I need 10 objects stored in the HashMap.
So it creates keys 1, 2, 3, 4, 5...
Then when I'm finished with '3', it deletes the whole entry and key for '3', making that key available for re-use by new object mappings - in case I ever run out of keys, via integer overflow or something.
Thoughts?
public static HashMap<Integer, GameState> myMap = new HashMap<Integer, GameState>();
int i = 0;

public void mapNewGameState(GameState gs) {
    myMap.put(i, gs); // HashMap uses put, not add
    i++;
}

myMap.remove(3); // keys are Integers, not Strings
// Now I want to be sure that mapNewGameState is eventually able to map a new GameState to the key 3 later on.
This is more a question about whether HashMaps can be used in this way.
As I understand it, you propose a key pool where you get a key and, when you don't need it anymore, put it back into the pool? If so, this doesn't really make much sense: it adds complexity to your code with no other benefit (usually you pool something that's expensive to create or hold). And usually you want to recycle the value, not the key.
To create truly (practically) unique keys, use UUID.randomUUID(); with this you don't have to worry about keys.
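For example (java.util.UUID):

String key = UUID.randomUUID().toString(); // e.g. "f47ac10b-58cc-4372-a567-0e02b2c3d479"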
I came across this old question because I have a use case in Mono Webassembly (where we have to code as though CPU speeds are back to 2000s levels) that warrants this, and was disappointed that no answer was actually given. So let me offer my solution that works for integer keys (and could be adapted to work with keys of any type by providing a parameterized way to generate a new key):
public class RecyclingDictionary<T> : Dictionary<int, T>
{
    // Keys freed by Remove, waiting to be handed out again.
    Queue<int> _freedKeys = new Queue<int>();
    int _lastAssignedKey = -1;

    public int Add(T item)
    {
        var key = GetNextKey();
        base[key] = item;
        return key;
    }

    public new void Remove(int key)
    {
        base.Remove(key);
        _freedKeys.Enqueue(key); // recycle the key
    }

    private int GetNextKey()
    {
        if (_freedKeys.Count > 0)
        {
            return _freedKeys.Dequeue();
        }
        return ++_lastAssignedKey;
    }
}
As for why one might want to do this, here are a couple of reasons:
GUIDs are expensive to generate and store in memory (36 bytes when stringified vs. 4 for an int). They're also more expensive to hash and thus more expensive to use as Dictionary keys.
Integer keys are usually much faster than string keys. However, there is an article (which I sadly cannot find) from years ago that analyzed the efficiency of different key types and concluded that simple incrementing integers were actually a POOR key type; as I recall, it had to do with the way dictionaries bucket their entries. Hence the impetus to recycle keys.
What is the use of ConcurrentHashMap in Java? What are its benefits? How does it work?
Sample code would be useful too.
The point is to provide an implementation of HashMap that is threadsafe. Multiple threads can read from and write to it without the chance of receiving out-of-date or corrupted data. ConcurrentHashMap provides its own synchronization, so you do not have to synchronize accesses to it explicitly.
Another feature of ConcurrentHashMap is that it provides the putIfAbsent method, which will atomically add a mapping if the specified key does not exist. Consider the following code:
ConcurrentHashMap<String, Integer> myMap = new ConcurrentHashMap<String, Integer>();
// some stuff
if (!myMap.containsKey("key")) {
    myMap.put("key", 3);
}
This code is not threadsafe, because another thread could add a mapping for "key" between the call to containsKey and the call to put. The correct implementation would be:
myMap.putIfAbsent("key", 3);
ConcurrentHashMap allows concurrent access to the map. Hashtable also offers synchronized access to a map, but the entire map is locked to perform any operation.
The logic behind ConcurrentHashMap is that the entire table is not locked, only portions of it (the segments). Each segment manages its own hash table. Locking is applied only for updates; retrievals allow full concurrency.
For example, if four threads are concurrently working on a map whose capacity is 32, the table could be partitioned into four segments, each managing a hash table of capacity 8. The collection maintains 16 segments by default, each of which guards (or locks on) its own portion of the map.
This effectively means that 16 threads can modify the collection at a single time. This level of concurrency can be increased using the optional concurrencyLevel constructor argument.
public ConcurrentHashMap(int initialCapacity,
float loadFactor, int concurrencyLevel)
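For illustration, a map expecting heavier write concurrency might be constructed like this (the values are arbitrary):

// 64 initial capacity, default load factor, 32 expected concurrent writers
ConcurrentHashMap<String, Integer> map =
    new ConcurrentHashMap<String, Integer>(64, 0.75f, 32);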
As the other answer stated, ConcurrentHashMap offers the new method putIfAbsent(), which is similar to put except that the value will not be overridden if the key already exists. It is roughly equivalent to the following check-then-act sequence, but atomic:

private static Map<String, String> aMap = new ConcurrentHashMap<String, String>();

if (!aMap.containsKey("key"))
    aMap.put("key", "value");

The new method is also faster, as it avoids traversing the table twice: containsKey has to locate the segment and traverse the table to find the key, and then put has to traverse the bucket again to insert it.
Really the big functional difference is it doesn't throw an exception and/or end up corrupt when someone else changes it while you're using it.
With regular collections, if another thread adds or removes an element while you're accessing it (via the iterator), it will throw an exception. ConcurrentHashMap lets them make the change and doesn't stop your thread.
Mind you it does not make any kind of synchronization guarantees or promises about the point-in-time visibility of the change from one thread to the other. (It's sort of like a read-committed database isolation, rather than a synchronized map which behaves more like a serializable database isolation. (old school row-locking SQL serializable, not Oracle-ish multiversion serializable :) )
The most common use I know of is in caching immutable derived information in App Server environments where many threads may be accessing the same thing, and it doesn't really matter if two happen to calculate the same cache value and put it twice because they interleave, etc. (e.g., it's used extensively inside the Spring WebMVC framework for holding runtime-derived config like mappings from URLs to Handler Methods.)
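As a small illustration of that tolerance (hypothetical setup), iterating while removing does not throw, unlike with a plain HashMap:

ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<String, Integer>();
map.put("a", 1);
map.put("b", 2);
for (String key : map.keySet()) {     // weakly consistent iterator
    map.remove("b");                  // structural change while iterating is tolerated:
    System.out.println(key);          // no ConcurrentModificationException here
}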
It can be used for memoization:
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// The cache must live outside the function, or it is rebuilt on every call.
static final Map<Integer, Integer> cache = new ConcurrentHashMap<>();

static int fib(int n) {
    if (n == 0 || n == 1) return n;
    Integer hit = cache.get(n);
    if (hit != null) return hit;
    int result = fib(n - 2) + fib(n - 1); // recurse outside computeIfAbsent to avoid recursive-update problems
    cache.putIfAbsent(n, result);
    return result;
}
1. ConcurrentHashMap is thread-safe: multiple threads can access it concurrently without corrupting its state or needing external synchronization.
2. ConcurrentHashMap synchronizes, or locks, only a certain portion of the map. To optimize performance, the map is divided into different partitions depending upon the concurrency level, so the whole map object does not need to be locked.
3. The default concurrency level is 16; accordingly, the map is divided into 16 parts, each governed by a different lock, which means 16 threads can operate on it at a time.
4. ConcurrentHashMap does not allow null values, and the key cannot be null either.
What is ConcurrentHashMap?
ConcurrentHashMap is a class introduced in Java 1.5 that implements the ConcurrentMap as well as the Serializable interface. ConcurrentHashMap enhances HashMap for use under multiple threads.
As we know, when an application is multithreaded, HashMap is not a good choice, because correctness and performance issues occur.
Here are some key points about ConcurrentHashMap.
The underlying data structure of ConcurrentHashMap is a hash table.
ConcurrentHashMap is thread-safe: multiple threads can operate on a single object without any complications.
A ConcurrentHashMap object is divided into a number of segments according to the concurrency level.
The default concurrency level of ConcurrentHashMap is 16.
Any number of threads can perform retrieval operations, but to update the object a thread must lock the particular segment in which it wants to operate.
This type of locking mechanism is known as segment locking or bucket locking.
Hence, 16 update operations can be performed at a time.
Null insertion is not possible in ConcurrentHashMap.
Here are the ConcurrentHashMap constructors.
ConcurrentHashMap m = new ConcurrentHashMap(); creates a new, empty map with a default initial capacity (16), load factor (0.75) and concurrencyLevel (16).
ConcurrentHashMap m = new ConcurrentHashMap(int initialCapacity); creates a new, empty map with the specified initial capacity, and with default load factor (0.75) and concurrencyLevel (16).
ConcurrentHashMap m = new ConcurrentHashMap(int initialCapacity, float loadFactor); creates a new, empty map with the specified initial capacity and load factor, and with the default concurrencyLevel (16).
ConcurrentHashMap m = new ConcurrentHashMap(int initialCapacity, float loadFactor, int concurrencyLevel); creates a new, empty map with the specified initial capacity, load factor and concurrency level.
ConcurrentHashMap m = new ConcurrentHashMap(Map m); creates a new map with the same mappings as the given map.
ConcurrentHashMap also has a method named putIfAbsent(), which avoids overwriting the value of a key that is already present; please refer to the example below.
import java.util.concurrent.*;

class ConcurrentHashMapDemo {
    public static void main(String[] args)
    {
        ConcurrentHashMap<Integer, String> m = new ConcurrentHashMap<>();
        m.put(1, "Hello");
        m.put(2, "Vala");
        m.put(3, "Sarakar");

        // Has no effect, because key 1 is already
        // present in the ConcurrentHashMap object.
        m.putIfAbsent(1, "Hello");

        // Removes the entry only if key 2 is currently
        // mapped to the value "Vala".
        m.remove(2, "Vala");

        // Key 4 is absent, so this adds "Vala".
        m.putIfAbsent(4, "Vala");

        System.out.println(m);
    }
}
Is there a simple, efficient Map implementation that allows a limit on the memory used by the map?
My use case is that I want to allocate most of the memory available at the time of its creation dynamically, but I don't want an OutOfMemoryError at any time in the future. Basically, I want to use this map as a cache, but I want to avoid heavy cache implementations like EHCache. My need is simple: at most an LRU algorithm.
I should further clarify that objects in my cache are char[] or similar primitives that will not hold references to other objects.
I can put an upper limit on the max size for each entry.
You can use a LinkedHashMap to limit the number of entries in the Map:
removeEldestEntry(Map.Entry<K,V> eldest): Returns true if this map should remove its eldest entry. This method is invoked by put and putAll after inserting a new entry into the map. It provides the implementor with the opportunity to remove the eldest entry each time a new one is added. This is useful if the map represents a cache: it allows the map to reduce memory consumption by deleting stale entries.
Sample use: this override will allow the map to grow up to 100 entries and then delete the eldest entry each time a new entry is added, maintaining a steady state of 100 entries.
private static final int MAX_ENTRIES = 100;
protected boolean removeEldestEntry(Map.Entry eldest) {
return size() > MAX_ENTRIES;
}
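Putting that override together with the access-order constructor, a minimal LRU sketch (types are illustrative, MAX_ENTRIES as above):

Map<String, String> lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > MAX_ENTRIES;
    }
};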
Related questions
How do I limit the number of entries in a java hashtable?
Easy, simple to use LRU cache in java
What is a data structure kind of like a hash table, but infrequently-used keys are deleted?
For caches, a SoftHashMap is much more appropriate than a WeakHashMap. A WeakHashMap is usually used when you want to maintain an association with an object for as long as that object is alive, but without preventing it from being reclaimed.
In contrast, a SoftReference is more closely involved with memory allocation. See No SoftHashMap? for details on the differences.
WeakHashMap is also not usually appropriate as it has the association around the wrong way for a cache - it uses weak keys and hard values. That is, the key and value are removed from the map when the key is cleared by the garbage collector. This is typically not what you want for a cache - where the keys are usually lightweight identifiers (e.g. strings, or some other simple value type) - caches usually operate such that the key/value is reclaimed when the value reference is cleared.
The Commons Collections has a ReferenceMap where you can plug in what types of references you wish to use for keys and values. For a memory-sensitive cache, you will probably use hard references for keys, and soft references for values.
To obtain LRU semantics for a given number of references N, maintain a list of the last N entries fetched from the cache - as an entry is retrieved from the cache it is added to the head of the list (and the tail of the list removed.) To ensure this does not hold on to too much memory, you can create a soft reference and use that as a trigger to evict a percentage of the entries from the end of the list. (And create a new soft reference for the next trigger.)
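A rough sketch of that trigger mechanism, with illustrative names and an arbitrary 10% eviction ratio (not a tested design):

import java.lang.ref.SoftReference;
import java.util.ArrayDeque;
import java.util.Deque;

class RecentList<V> {
    private final int n;
    private final Deque<V> recent = new ArrayDeque<V>(); // hard refs keep the last n values alive
    private SoftReference<Object> trigger = new SoftReference<Object>(new Object());

    RecentList(int n) { this.n = n; }

    void recordAccess(V value) {
        recent.addFirst(value);
        if (recent.size() > n) recent.removeLast();
        if (trigger.get() == null) {              // GC cleared it: memory is getting tight
            for (int i = 0; i < n / 10; i++) recent.pollLast(); // evict ~10% from the cold end
            trigger = new SoftReference<Object>(new Object());
        }
    }
}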
Java Platform Solutions
If all you're looking for is a Map whose keys can be cleaned up to avoid OutOfMemoryErrors, you might want to look into WeakHashMap. It uses WeakReferences in order to allow the garbage collector to reap the map entries. It won't enforce any sort of LRU semantics, though, except those present in the generational garbage collection.
There's also LinkedHashMap, which has this in the documentation:
A special constructor is provided to create a linked hash map whose order of iteration is the order in which its entries were last accessed, from least-recently accessed to most-recently (access-order). This kind of map is well-suited to building LRU caches. Invoking the put or get method results in an access to the corresponding entry (assuming it exists after the invocation completes). The putAll method generates one entry access for each mapping in the specified map, in the order that key-value mappings are provided by the specified map's entry set iterator. No other methods generate entry accesses. In particular, operations on collection-views do not affect the order of iteration of the backing map.
So if you use this constructor to make a map whose Iterator iterates in LRU, it becomes pretty easy to prune the map. The one (fairly big) caveat is that LinkedHashMap is not synchronized whatsoever, so you're on your own for concurrency. You can just wrap it in a synchronized wrapper, but that may have throughput issues.
Roll Your Own Solution
If I had to write my own data structure for this use-case, I'd probably create some sort of data structure with a map, queue, and ReadWriteLock along with a janitor thread to handle the cleanup when too many entries were in the map. It would be possible to go slightly over the desired max size, but in the steady-state you'd stay under it.
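One possible shape of such a structure, sketched with concurrent collections in place of an explicit ReadWriteLock (the names, the insertion-order eviction, and the sleep interval are all illustrative):

import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

class JanitorCache<K, V> {
    private final Map<K, V> map = new ConcurrentHashMap<K, V>();
    private final Queue<K> insertionOrder = new ConcurrentLinkedQueue<K>();

    JanitorCache(int maxSize) {
        Thread janitor = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                while (map.size() > maxSize) {        // may briefly overshoot before cleanup
                    K eldest = insertionOrder.poll();
                    if (eldest == null) break;
                    map.remove(eldest);
                }
                try { Thread.sleep(100); } catch (InterruptedException e) { return; }
            }
        });
        janitor.setDaemon(true);
        janitor.start();
    }

    public void put(K key, V value) {
        if (map.put(key, value) == null) insertionOrder.add(key);
    }

    public V get(K key) {
        return map.get(key);
    }
}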
WeakHashMap won't necessarily attain your purpose: if enough strong references to the keys are held by your app, you WILL see an OOME.
Alternatively you could look into SoftReference, which will null out the content once the heap is scarce. However, most of the comments I've seen indicate that it will not null out the reference until the heap is really, really low and a lot of GC kicks in with a severe performance hit (so I don't recommend using it for your purpose).
My recommendation is to use a simple LRU map, e.g. http://commons.apache.org/collections/apidocs/org/apache/commons/collections/LRUMap.html
Thanks for the replies, guys!
As jasonmp85 pointed out, LinkedHashMap has a constructor that allows access order. I missed that bit when I looked at the API docs. The implementation also looks quite efficient (see below). Combined with a max size cap for each entry, that should solve my problem.
I will also look closely at SoftReference. Just for the record, Google Collections seems to have a pretty good API for SoftKeys and SoftValues and Maps in general.
Here is a snippet from Java's LinkedHashMap class that shows how it maintains LRU behavior.
/**
 * Removes this entry from the linked list.
 */
private void remove() {
    before.after = after;
    after.before = before;
}

/**
 * Inserts this entry before the specified existing entry in the list.
 */
private void addBefore(Entry<K,V> existingEntry) {
    after = existingEntry;
    before = existingEntry.before;
    before.after = this;
    after.before = this;
}

/**
 * This method is invoked by the superclass whenever the value
 * of a pre-existing entry is read by Map.get or modified by Map.set.
 * If the enclosing Map is access-ordered, it moves the entry
 * to the end of the list; otherwise, it does nothing.
 */
void recordAccess(HashMap<K,V> m) {
    LinkedHashMap<K,V> lm = (LinkedHashMap<K,V>)m;
    if (lm.accessOrder) {
        lm.modCount++;
        remove();
        addBefore(lm.header);
    }
}
I have a problem similar to one discussed here, but with a stronger practical motivation.
For example, I have a Map<String, Integer>, and I have some function which is given a key and, in case the mapped integer value is negative, puts null into the map:
Map<String, Integer> map = new HashMap<String, Integer>();

public void nullifyIfNegative(String key) {
    Integer value = map.get(key);
    if (value != null && value.intValue() < 0) {
        map.put(key, null);
    }
}
In this case, the lookup (and hence, the hashCode calculation for the key) is done twice: once for the lookup and once for the replacement. It would be nice to have another method (HashMap already has one internally, it is just not exposed) that makes this more effective:
public void nullifyIfNegative(String key) {
    Map.Entry<String, Integer> entry = map.getEntry(key);
    if (entry != null && entry.getValue().intValue() < 0) {
        entry.setValue(null);
    }
}
The same concerns cases when you want to manipulate immutable objects which are map values:
Map<String, String>: I want to append something to the string value.
Map<String, int[]>: I want to insert a number into the array.
So the case is quite common. Solutions which might work, but not for me:
Reflection. It is nice, but I cannot sacrifice performance just for this feature.
Use org.apache.commons.collections.map.AbstractHashedMap (it at least has a protected getEntry() method), but unfortunately commons-collections does not support generics.
Use the generified commons-collections fork, but this library (AFAIK) is out of date (not in sync with the latest library version from Apache) and, critically, is not available in the central Maven repository.
Use value wrappers, which means making the values mutable (e.g. use mutable integers such as org.apache.commons.lang.mutable.MutableInt, or collections instead of arrays). This solution leads to memory overhead, which I would like to avoid.
Try to extend java.util.HashMap with a custom class implementation (which would have to live in the java.util package) and put it in the endorsed folder (java.lang.ClassLoader will refuse to load it in Class<?> defineClass(String name, byte[] b, int off, int len); see the sources), but I don't want to patch the JDK, and it seems the list of packages that can be endorsed does not include java.util.
A similar question has already been raised on Sun's bug tracker, but I would like to know the community's opinion and what the way out can be, keeping in mind maximum memory and performance effectiveness.
If you agree this is nice and beneficial functionality, please vote for that bug!
As a logical matter, you're right that a single getEntry would save you a hash lookup. As a practical matter, unless you have a specific use case where you have reason to be concerned about the performance hit (which seems pretty unlikely: a hash lookup is common, O(1), and well optimized), what you're worrying about is probably negligible.
Why don't you write a test? Create a hash map with a few tens of millions of objects, or whatever is an order of magnitude greater than what your application is likely to create, and average the time of a get() over a million or so iterations (hint: it's going to be a very small number).
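For example, a crude timing sketch in that spirit (not a rigorous benchmark; JIT warm-up and GC effects are ignored):

Map<Long, Long> map = new HashMap<Long, Long>();
for (long i = 0; i < 10_000_000L; i++) map.put(i, i);

long start = System.nanoTime();
long sink = 0;
for (int i = 0; i < 1_000_000; i++) sink += map.get((long) i % 10_000_000L);
System.out.println("avg ns per get: " + (System.nanoTime() - start) / 1_000_000
        + " (sink=" + sink + ")");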
A bigger issue with what you're doing is synchronization. You should be aware that if you're doing conditional alterations on a map you could run into issues, even if you're using a synchronized map, as you'd have to lock access to the key over the span of both the get() and put() operations.
It's not pretty, but you could use a lightweight object to hold a reference to the actual value and avoid the second lookup.

HashMap<String, String[]> map = ...;

// append value to the current value of key
String key = "key";
String value = "value";

// I use an array to hold a reference - even uglier than the whole idea itself ;)
String[] ref = new String[1]; // lightweight object
String[] prev = map.put(key, ref);
ref[0] = (prev != null) ? prev[0] + value : value;
I wouldn't worry about hash lookup performance too much, though (Steve B's answer is pretty good in pointing out why). Especially with String keys, I wouldn't worry too much about hashCode(), as its result is cached. You could worry about equals(), though, as it might be called more than once per lookup. But for short strings (which are often used as keys) this is negligible too.
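For what it's worth, on Java 8 and later the string-append case can be done in a single map operation with Map.merge (assuming a Map<String, String> here rather than the array trick above):

// If key is absent, stores value; otherwise replaces with oldValue.concat(value).
map.merge(key, value, String::concat);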
There is no performance gain from this proposal, because the performance of a Map lookup in the average case is already O(1). Enabling access to the raw Entry in such a case would also raise another problem: it would become possible to change the key in an entry (even if only via reflection) and therefore break the ordering of the internal array.
I'm trying to use a PriorityQueue to order objects using a Comparator.
This can be achieved easily, but the objects' class variables (with which the comparator calculates priority) may change after the initial insertion. Most people have suggested the simple solution of removing the object, updating the values, and reinserting it again, as this is when the priority queue's comparator is put into action.
Is there a better way other than just creating a wrapper class around the PriorityQueue to do this?
You have to remove and re-insert, as the queue works by putting new elements in the appropriate position when they are inserted. This is much faster than the alternative of finding the highest-priority element every time you pull out of the queue. The drawback is that you cannot change the priority after the element has been inserted. A TreeMap has the same limitation (as does a HashMap, which also breaks when the hashcode of its elements changes after insertion).
If you want to write a wrapper, you can move the comparison code from enqueue to dequeue. You would not need to sort at enqueue time anymore (because the order it creates would not be reliable anyway if you allow changes).
But this will perform worse, and you want to synchronize on the queue if you change any of the priorities. Since you need to add synchronization code when updating priorities, you might as well just dequeue and enqueue (you need the reference to the queue in both cases).
I don't know if there is a Java implementation, but if you're changing key values a lot, you can use a Fibonacci heap, which has O(1) amortized cost to decrease the key value of an entry in the heap, rather than the O(log n) of an ordinary heap.
One easy solution you can implement is to just add that element again into the priority queue. It will not change the way you extract the elements; it will consume more space, but not enough to affect your running time much.
To see why this works, consider the Dijkstra implementation below:
// Assumes the usual supporting pieces: this.vertices, adjacencyList,
// and the Node/Edge/NodeComparison classes.
public int[] dijkstra() {
    int distance[] = new int[this.vertices];
    int previous[] = new int[this.vertices];
    for (int i = 0; i < this.vertices; i++) {
        distance[i] = Integer.MAX_VALUE;
        previous[i] = -1;
    }
    distance[0] = 0;
    previous[0] = 0;

    PriorityQueue<Node> pQueue = new PriorityQueue<>(this.vertices, new NodeComparison());
    addValues(pQueue, distance);

    while (!pQueue.isEmpty()) {
        Node n = pQueue.remove();
        List<Edge> neighbours = adjacencyList.get(n.position);
        for (Edge neighbour : neighbours) {
            if (distance[neighbour.destination] > distance[n.position] + neighbour.weight) {
                distance[neighbour.destination] = distance[n.position] + neighbour.weight;
                previous[neighbour.destination] = n.position;
                // Add a fresh node instead of updating the old one's priority.
                pQueue.add(new Node(neighbour.destination, distance[neighbour.destination]));
            }
        }
    }
    return previous;
}
Here our interest is in the line:
pQueue.add(new Node(neighbour.destination, distance[neighbour.destination]));
I am not changing the priority of the particular node by removing it and adding it again; rather, I am just adding a new node for the same vertex with the new priority.
At extraction time I will always get the freshest node first, because this is a min heap; any stale node for the same vertex has a greater value (lower priority) and will be extracted afterwards, by which time all its neighbours have already been relaxed, so the stale entry is harmless.
Without reimplementing the priority queue yourself (that is, using only java.util.PriorityQueue) you have essentially two main approaches:
1) Remove and put back
Remove the element, then put it back with the new priority. This is explained in the answers above. Removing an element is O(n), so this approach is quite slow.
2) Use a Map and keep stale items in the queue
Keep a HashMap of item -> priority. The keys of the map are the items (without their priority) and the values of the map are the priorities.
Keep it in sync with the PriorityQueue (i.e. every time you add or remove an item from the Queue, update the Map accordingly).
Now when you need to change the priority of an item, simply add the same item to the queue with a different priority (and update the map, of course). When you poll an item from the queue, check whether its priority matches the one in your map. If not, ditch it and poll again.
If you don't need to change the priorities too often, this second approach is faster. Your heap will be larger and you might need to poll more times, but you don't need to find your item.
The 'change priority' operation would be O(f(n) log n*), with f(n) the number of 'change priority' operations per item and n* the actual size of your heap (which is n * f(n)).
I believe that if f(n) is O(n / log n) (for example f(n) = O(sqrt(n))), this is faster than the first approach.
Note: in the explanation above, by 'priority' I mean all the variables that are used in your Comparator. Also, your items need to implement equals and hashCode, and neither method should use the priority variables.
It depends a lot on whether you have direct control of when the values change.
If you know when the values change, you can remove and reinsert (which in fact is fairly expensive, as removing requires a linear scan over the heap!).
Alternatively, you can use an UpdatableHeap structure (not in stock Java, though) for this situation. Essentially, that is a heap that tracks the position of its elements in a hash map, so that when the priority of an element changes it can repair the heap. Third, you can look for a Fibonacci heap, which does the same.
Depending on your update rate, a linear scan / quicksort / QuickSelect each time might also work. In particular if you have many more updates than pulls, this is the way to go. QuickSelect is probably best if you have batches of updates and then batches of pull operations.
To trigger re-heapifying of the head element, try this:
if(!priorityQueue.isEmpty()) {
priorityQueue.add(priorityQueue.remove());
}
Something I've tried that works so far is peeking to see whether the reference to the object you're changing is the same as the head of the PriorityQueue. If it is, you poll(), change the object, then re-insert it; otherwise you can change it without polling, because when the head is eventually polled the heap is heapified anyway.
DOWNSIDE: this can change the relative order of objects that share the same priority.
Is there a better way other than just creating a wrapper class around the PriorityQueue to do this?
It depends on the definition of "better" and the implementation of the wrapper.
If the implementation of the wrapper is to re-insert the value using the PriorityQueue's .remove(...) and .add(...) methods, it's important to point out that .remove(...) runs in O(n) time. Depending on the heap implementation, updating the priority of a value can be done in O(log n) or even O(1) time, so this wrapper suggestion may fall short of common expectations.
If you want to minimize your implementation effort, as well as the risk of bugs in any custom solution, then a wrapper that performs re-insert looks easy and safe.
If you want the implementation to be faster than O(n), then you have some options:
Implement a heap yourself. The Wikipedia entry describes multiple variants with their properties. This approach is likely to get you the best performance; at the same time, the more code you write yourself, the greater the risk of bugs.
Implement a different kind of wrapper: handle updating the priority by marking the entry as removed and adding a new entry with the revised priority. This is relatively easy to do (less code), see below, though it has its own caveats.
I came across the second idea in Python's documentation, and applied it to implement a reusable data structure in Java (see caveats at the bottom):
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class UpdatableHeap<T> {
    private final PriorityQueue<Node<T>> pq = new PriorityQueue<>(Comparator.comparingInt(node -> node.priority));
    private final Map<T, Node<T>> entries = new HashMap<>();

    public void addOrUpdate(T value, int priority) {
        if (entries.containsKey(value)) {
            // Mark the stale entry as removed; it stays in the queue until popped.
            entries.remove(value).removed = true;
        }
        Node<T> node = new Node<>(value, priority);
        entries.put(value, node);
        pq.add(node);
    }

    public T pop() {
        while (!pq.isEmpty()) {
            Node<T> node = pq.poll();
            if (!node.removed) {
                entries.remove(node.value);
                return node.value;
            }
        }
        throw new IllegalStateException("pop from empty heap");
    }

    public boolean isEmpty() {
        return entries.isEmpty();
    }

    private static class Node<T> {
        private final T value;
        private final int priority;
        private boolean removed = false;

        private Node(T value, int priority) {
            this.value = value;
            this.priority = priority;
        }
    }
}
Note some caveats:
Entries marked removed stay in memory until they are popped
This can be unacceptable in use cases with very frequent updates
The internal Node wrapped around the actual values is an extra memory overhead (constant per entry). There is also an internal Map, mapping all the values currently in the priority queue to their Node wrapper.
Since the values are used in a map, users must be aware of the usual cautions when using a map, and make sure to have appropriate equals and hashCode implementations.
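For completeness, a small usage sketch of the class above (lower priority values pop first with the comparator used here):

UpdatableHeap<String> heap = new UpdatableHeap<>();
heap.addOrUpdate("a", 5);
heap.addOrUpdate("b", 3);
heap.addOrUpdate("a", 1);           // reprioritizes "a"; the stale entry stays until popped
System.out.println(heap.pop());     // a
System.out.println(heap.pop());     // b
System.out.println(heap.isEmpty()); // true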