LinkedHashMap LRU Cache - Determine what values are removed?


Background Information
You can make an LRU cache with a LinkedHashMap as shown at this link. Basically, you just:
Extend linked hash map.
Provide a capacity parameter.
Initialize the super class (LinkedHashMap) with parameters to tell it its capacity, its load factor (chosen so that a resize should never be triggered), and to keep items in access order rather than insertion order.
Override removeEldestEntry to remove the oldest entry when the capacity is breached.
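The four steps above can be sketched roughly as follows. This is a minimal sketch, not the code from the linked article: the class name SimpleLruCache and the initial capacity and load factor values are assumptions chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class name; a minimal sketch of the steps listed above.
class SimpleLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    SimpleLruCache(int capacity) {
        // accessOrder = true keeps entries ordered from least to most recently accessed.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by put/putAll after an insertion; returning true evicts the eldest entry.
        return size() > capacity;
    }

    public static void main(String[] args) {
        SimpleLruCache<String, Integer> cache = new SimpleLruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");      // "a" is now the most recently used entry
        cache.put("c", 3);   // exceeds the capacity, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Like LinkedHashMap itself, this sketch is not thread-safe.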
My Question
This is a pretty standard LRU cache implementation. But one thing that I can't figure out how to do is how to be notified when the LinkedHashMap removes an entry due to it not being used recently enough.
I know I can make removeEldestEntry provide some form of notification... but is there any way to retrieve the element that is removed from the cache right when a new one is inserted (put) into the underlying map? Alternatively, is there a way to query for the last item that was removed from the cache?

You can get it to work with some creative use of thread local storage:
import java.util.LinkedHashMap;
import java.util.Map;
class LRUCacheLHM<K,V> extends LinkedHashMap<K,V> {
    private int capacity;
    public LRUCacheLHM(int capacity) {
        //1 extra element as add happens before remove (101), and load factor big
        //enough to avoid triggering resize. True = keep in access order.
        super(capacity + 1, 1.1f, true);
        this.capacity = capacity;
    }
    private ThreadLocal<Map.Entry<K,V>> removed = new ThreadLocal<Map.Entry<K,V>>();
    //withInitial gives every thread a default of false; an instance initializer
    //would only set the flag for the thread that constructs the cache.
    private ThreadLocal<Boolean> report = ThreadLocal.withInitial(() -> false);
    @Override
    public boolean removeEldestEntry(Map.Entry<K,V> eldest) {
        boolean res = size() > capacity;
        if (res && report.get()) {
            removed.set(eldest);
        }
        return res;
    }
    //Like put, but returns the entry evicted by this insertion, or null if none was.
    public Map.Entry<K,V> place(K k, V v) {
        report.set(true);
        put(k, v);
        try {
            return removed.get();
        } finally {
            removed.set(null);
            report.set(false);
        }
    }
}
Demo.
The idea behind the place(K,V) method is to signal to removeEldestEntry that we would like to get the eldest entry by setting a thread-local report flag to true. When removeEldestEntry sees this flag and knows that an entry is being removed, it places the eldest entry in the removed variable, which is thread-local as well.
The call to removeEldestEntry happens inside the call to the put method. After that, the removed variable either holds null or holds the eldest entry, ready to be harvested.
Calling set(null) on removed is important to avoid lingering memory leaks.

is there any way to retrieve the element that is removed from the cache right when a new one is inserted (put) into the underlying map?
The removeEldestEntry is notified of the entry to be removed. You can add a listener which this method calls if you want to make it dynamically configurable.
From the Javadoc
protected boolean removeEldestEntry(Map.Entry eldest)
eldest - The least recently inserted entry in the map, or if this is an access-ordered map, the least recently accessed entry. This is the entry that will be removed if this method returns true. If the map was empty prior to the put or putAll invocation resulting in this invocation, this will be the entry that was just inserted; in other words, if the map contains a single entry, the eldest entry is also the newest.
is there a way to query for the last item that was removed from the cache?
The last item removed has been removed, however you could have the sub-class store this entry in a field you can retrieve later.
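Both suggestions (a listener called from removeEldestEntry, and a field that remembers the last removed entry) could be combined along these lines. This is only a sketch under assumptions: the class name NotifyingLruCache, the use of BiConsumer as the listener type, and the getLastEvicted accessor are all hypothetical choices, not part of LinkedHashMap.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical names throughout; a sketch, not a definitive implementation.
class NotifyingLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;
    private final BiConsumer<K, V> evictionListener; // assumed callback type
    private Map.Entry<K, V> lastEvicted;             // null until the first eviction

    NotifyingLruCache(int capacity, BiConsumer<K, V> evictionListener) {
        super(16, 0.75f, true); // true = access order
        this.capacity = capacity;
        this.evictionListener = evictionListener;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() <= capacity) {
            return false;
        }
        // Copy the entry, because the map removes the original after this method returns.
        lastEvicted = new AbstractMap.SimpleImmutableEntry<>(eldest);
        evictionListener.accept(eldest.getKey(), eldest.getValue());
        return true;
    }

    Map.Entry<K, V> getLastEvicted() {
        return lastEvicted;
    }

    public static void main(String[] args) {
        NotifyingLruCache<String, Integer> cache =
                new NotifyingLruCache<>(2, (k, v) -> System.out.println("evicted " + k + "=" + v));
        cache.put("a", 1);
        cache.put("b", 2);
        cache.put("c", 3);                          // prints "evicted a=1"
        System.out.println(cache.getLastEvicted()); // prints a=1
    }
}
```

The sketch is not thread-safe; if several threads share the cache, access would need external synchronization.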

Related

LRUCache entry reordering when using get

I checked out official Android documentation for LRUCache which says : Each time a value is accessed, it is moved to the head of a queue. When a value is added to a full cache, the value at the end of that queue is evicted and may become eligible for garbage collection.
I suppose this is the doubly linked list which is maintained by linkedhashmap which is used by the cache. To check this behavior, I checked out the source code for LruCache, and checked the get(K key) method. It further calls upon map's get method which gets the value from the underlying hashmap and calls upon recordAccess method.
public V get(Object key) {
LinkedHashMapEntry<K,V> e = (LinkedHashMapEntry<K,V>)getEntry(key);
if (e == null)
return null;
e.recordAccess(this);
return e.value;
}
recordAccess method in turn moves the accessed entry to the end of the list in case accessOrder is set to true (for my problem let's assume it is), else it does nothing.
/**
* This method is invoked by the superclass whenever the value
* of a pre-existing entry is read by Map.get or modified by Map.set.
* If the enclosing Map is access-ordered, it moves the entry
* to the end of the list; otherwise, it does nothing.
*/
void recordAccess(HashMap<K,V> m) {
LinkedHashMap<K,V> lm = (LinkedHashMap<K,V>)m;
if (lm.accessOrder) {
lm.modCount++;
remove();
addBefore(lm.header);
}
}
This sounds contradictory to the above statement where it's said that the element is moved to the head of the queue. Instead it's moved to the last element of the list (using head.before). Surely, I'm missing something here, any help?

You are not missing anything; you are just reading the LruCache documentation when the behavior in question belongs to LinkedHashMap. LinkedHashMap has its own documentation, in particular about its accessOrder (the Java docs say the same).
[...when accessOrder=true...] order of iteration is the order in which its entries were last accessed, from least-recently accessed to most-recently (access-order)
So LinkedHashMap puts the most recently used entries at the end, and it is documented.
In practice, the LruCache documentation describes how such a cache works conceptually, while LinkedHashMap shows how to implement it without adding separate backward-moving iterators: by putting the recent elements at the end, trimming can use the already available (forward-moving) iterator to access (and remove) old elements efficiently.
Though I could not say offhand what was wrong with using removeEldestEntry for the trimming. Perhaps it did not exist in the past.
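The documented iteration order can be checked with a small experiment. The sketch below uses a plain java.util.LinkedHashMap in access order (not Android's LruCache); the class and method names are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AccessOrderDemo {
    // Returns the key order after one of the entries has been read.
    static String orderAfterAccess() {
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, true); // access order
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a"); // moves "a" to the most-recently-accessed end
        return map.keySet().toString();
    }

    public static void main(String[] args) {
        System.out.println(orderAfterAccess()); // prints [b, c, a]
    }
}
```

The least recently accessed key ("b") comes first in iteration order and is therefore the first candidate for eviction, while the key that was just read ends up last.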

From the javadoc of LinkedHashMap:
If the three argument constructor is used, and accessOrder is specified as true, the iteration will be in the order that entries were accessed. The access order is affected by put, get, and putAll operations, but not by operations on the collection views.
This is exactly the case in LruCache:
public LruCache(int maxSize) {
if (maxSize <= 0) {
throw new IllegalArgumentException("maxSize <= 0");
}
this.maxSize = maxSize;
this.map = new LinkedHashMap<K, V>(0, 0.75f, true);
}
Let's see what recordAccess() does:
void recordAccess(HashMap<K,V> m) {
LinkedHashMap<K,V> lm = (LinkedHashMap<K,V>)m;
if (lm.accessOrder) { // true, because `LruCache` instantiated this
// map with `accessOrder = true`
lm.modCount++;
remove(); // remove this `LinkedHashMapEntry` from the map
addBefore(lm.header); // adds this entry before the current header of
// the map, thus this entry becomes the header
}
}
Instead it's moved to the last element of the list (using head.before).
I cannot see, how your statement is valid.

Behavior of entrySet().removeIf in ConcurrentHashMap

I would like to use ConcurrentHashMap to let one thread delete some items from the map periodically and other threads to put and get items from the map at the same time.
I'm using map.entrySet().removeIf(lambda) in the removing thread. I'm wondering what assumptions I can make about its behavior. I can see that removeIf method uses iterator to go through elements in the map, check the given condition and then remove them if needed using iterator.remove().
Documentation gives some info about ConcurrentHashMap iterators behavior:
Similarly, Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.
As the whole removeIf call happens in one thread I can be sure that the iterator is not used by more than one thread at the time. Still I'm wondering if the course of events described below is possible:
Map contains mapping: 'A'->0
Deleting Thread starts executing map.entrySet().removeIf(entry->entry.getValue()==0)
Deleting Thread calls .iterator() inside removeIf call and gets the iterator reflecting the current state of the collection
Another thread executes map.put('A', 1)
Deleting thread still sees 'A'->0 mapping (iterator reflects the old state) and because 0==0 is true it decides to remove A key from the map.
The map now contains 'A'->1 but deleting thread saw the old value of 0 and the 'A' ->1 entry is removed even though it shouldn't be. The map is empty.
I can imagine that the behavior may be prevented by the implementation in many ways. For example: maybe iterators are not reflecting put/remove operations but are always reflecting value updates or maybe the remove method of the iterator checks if the whole mapping (both key and value) is still present in the map before calling remove on the key. I couldn't find info about any of those things happening and I'm wondering if there's something which makes that use case safe.

I also managed to reproduce such case on my machine.
I think, the problem is that EntrySetView (which is returned by ConcurrentHashMap.entrySet()) inherits its removeIf implementation from Collection, and it looks like:
default boolean removeIf(Predicate<? super E> filter) {
Objects.requireNonNull(filter);
boolean removed = false;
final Iterator<E> each = iterator();
while (each.hasNext()) {
// `test` returns `true` for some entry
if (filter.test(each.next())) {
// entry has been just changed, `test` would return `false` now
each.remove(); // ...but we still remove
removed = true;
}
}
return removed;
}
In my humble opinion, this cannot be considered as a correct implementation for ConcurrentHashMap.

After discussion with user Zielu in comments below Zielu's answer I have gone deeper into the ConcurrentHashMap code and found out that:
ConcurrentHashMap implementation provides remove(key, value) method which calls replaceNode(key, null, value)
replaceNode checks if both key and value are still present in the map before removing so using it should be fine. Documentation says that it
Replaces node value with v, conditional upon match of cv if non-null.
In the case mentioned in the question ConcurrentHashMap's .entrySet() is called which returns EntrySetView class. Then removeIf method calls .iterator() which returns EntryIterator.
EntryIterator extends BaseIterator and inherits remove implementation that calls map.replaceNode(p.key, null, null) which disables conditional removal and just always removes the key.
The negative course of events could be still prevented if iterators always iterated over 'current' values and never returned old ones if some value is modified. I still don't know if that happens or not, but the test case mentioned below seems to verify the whole thing.
I think I have created a test case which shows that the behavior described in my question can really happen. Please correct me if there are any mistakes in the code.
The code starts two threads. One of them (DELETING_THREAD) removes all entries mapped to 'false' boolean value. Another one (ADDING_THREAD) randomly puts (1, true) or (1,false) values into the map. If it puts true in the value it expects that the entry will still be there when checked and throws an exception if it is not. It throws an exception quickly when I run it locally.
package test;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
public class MainClass {
private static final Random RANDOM = new Random();
private static final ConcurrentHashMap<Integer, Boolean> MAP = new ConcurrentHashMap<Integer, Boolean>();
private static final Integer KEY = 1;
private static final Thread DELETING_THREAD = new Thread() {
@Override
public void run() {
while (true) {
MAP.entrySet().removeIf(entry -> entry.getValue() == false);
}
}
};
private static final Thread ADDING_THREAD = new Thread() {
@Override
public void run() {
while (true) {
boolean val = RANDOM.nextBoolean();
MAP.put(KEY, val);
if (val == true && !MAP.containsKey(KEY)) {
throw new RuntimeException("TRUE value was removed");
}
}
}
};
public static void main(String[] args) throws InterruptedException {
DELETING_THREAD.setDaemon(true);
ADDING_THREAD.start();
DELETING_THREAD.start();
ADDING_THREAD.join();
}
}
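One way to sidestep the unconditional removal is to iterate yourself and delete through the remove(key, value) method mentioned above, which only removes the mapping if the key is still mapped to the value that was tested. The following is a sketch of that idea; the class name ConditionalRemoval and the helper removeIfValue are hypothetical, not part of the JDK.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

class ConditionalRemoval {
    // Hypothetical helper: removes entries whose value matches the filter, but only
    // if the value is still the one that was tested.
    static <K, V> boolean removeIfValue(ConcurrentHashMap<K, V> map, Predicate<? super V> filter) {
        boolean removed = false;
        for (Map.Entry<K, V> e : map.entrySet()) {
            V seen = e.getValue();
            // remove(key, value) is atomic: it does nothing if the mapping changed meanwhile.
            if (filter.test(seen) && map.remove(e.getKey(), seen)) {
                removed = true;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<Integer, Boolean> map = new ConcurrentHashMap<>();
        map.put(1, false);
        map.put(2, true);
        removeIfValue(map, v -> !v);
        System.out.println(map); // prints {2=true}
    }
}
```

With this approach, a concurrent put that replaces false with true between the test and the removal makes remove(key, value) return false, so the new mapping survives.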

What is the use of LinkedHashMap.removeEldestEntry?

I am aware the answer to this question is easily available on the internet. I need to know what happens if I choose not to remove the eldest entry, i.e. if removeEldestEntry always returns false. Below is my code:
package collection;
import java.util.*;
public class MyLinkedHashMap {
private static final int MAX_ENTRIES = 2;
public static void main(String[] args) {
LinkedHashMap lhm = new LinkedHashMap(MAX_ENTRIES, 0.75F, false) {
protected boolean removeEldestEntry(Map.Entry eldest) {
return false;
}
};
lhm.put(0, "H");
lhm.put(1, "E");
lhm.put(2, "L");
lhm.put(3, "L");
lhm.put(4, "O");
System.out.println("" + lhm);
}
}
Even though I am not allowing the removeEldestEntry my code works fine.
So, internally what is happening?

removeEldestEntry always gets checked after an element was inserted. For example, if you override the method to always return true, the LinkedHashMap will always be empty, since after every put or putAll insertion, the eldest element will be removed, no matter what. The JavaDoc shows a very sensible example on how to use it:
protected boolean removeEldestEntry(Map.Entry eldest){
return size() > MAX_SIZE;
}
In an alternative way, you might only want to remove an entry if it is unimportant:
protected boolean removeEldestEntry(Map.Entry eldest){
    if(size() > MAX_ENTRIES){
        if(isImportant(eldest)){
            //Handle an important entry here, like reinserting it to the back of the list
            this.remove(eldest.getKey());
            this.put(eldest.getKey(), eldest.getValue());
            //removeEldestEntry will be called again, now with the next entry
            //so the size should not exceed the MAX_ENTRIES value
            //WARNING: If every element is important, this will loop indefinitely!
        } else {
            return true; //Element is unimportant
        }
    }
    return false; //Size not reached or eldest element was already handled otherwise
}

Why can't people just answer the OP's simple question!
If removeEldestEntry returns false then no items will ever be removed from the map and it will essentially behave like a normal Map.

Expanding on the answer by DavidNewcomb:
I'm assuming that you are learning how to implement a cache.
The method LinkedHashMap.removeEldestEntry is a method very commonly used in cache data structures, where the size of the cache is limited to a certain threshold. In such cases, the removeEldestEntry method can be set to automatically remove the oldest entry when the size exceeds the threshold (defined by the MAX_ENTRIES attribute) - as in the example provided here.
On the other hand, when you override the removeEldestEntry method this way, you are ensuring that nothing ever happens when the MAX_ENTRIES threshold is exceeded. In other words, the data structure would not behave like a cache, but rather a normal map.

Your removeEldestEntry method is identical to the default implementation of LinkedHashMap.removeEldestEntry, so your LinkedHashMap will simply behave like a normal LinkedHashMap with no overridden methods, retaining whatever keys and values you put into it unless and until you explicitly remove them by calling remove, removeAll, clear, etc. The advantage of using LinkedHashMap is that the collection views (keySet(), values(), entrySet()) always return Iterators that traverse the keys and/or values in the order they were added to the Map.
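The difference between the two behaviors described in these answers can be seen side by side. The sketch below reuses the keys and values from the question; the class name RemoveEldestDemo and the bounded flag are invented for the comparison.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RemoveEldestDemo {
    static final int MAX_ENTRIES = 2;

    // Fills a map whose removeEldestEntry either always returns false
    // (bounded = false) or enforces MAX_ENTRIES (bounded = true).
    static Map<Integer, String> fill(boolean bounded) {
        Map<Integer, String> map = new LinkedHashMap<Integer, String>(MAX_ENTRIES, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return bounded && size() > MAX_ENTRIES;
            }
        };
        String[] letters = {"H", "E", "L", "L", "O"};
        for (int i = 0; i < letters.length; i++) {
            map.put(i, letters[i]);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(fill(false)); // prints {0=H, 1=E, 2=L, 3=L, 4=O}
        System.out.println(fill(true));  // prints {3=L, 4=O}
    }
}
```

Note that the first constructor argument is only an initial capacity hint, which is why the unbounded map happily grows past MAX_ENTRIES.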

How to get the next element of a SortedSet?

I have a SortedSet holding my ordered data.
I use the .first() method to return the first record, and pass it to another window.
When the other window finishes I get an event called, and I want to pass the next from the SortedSet to the window, so how to move to the next element?
launchWindow(this.set.first());
Then I have this:
onActivityResult(...) {
if (this.set.hasNext()) launchWindow(this.set.next());//hasNext/next don't exist in the current context for SortedSet
}
What options do I have?

Instead of the Set, you should pass the Iterator; the next consumer would then just call next().

Don't you want to use an Iterator on the SortedSet?

The iterator solution:
You should probably have something like this:
class WindowLauncherClass {
SortedSet set = null;
Iterator setIterator = null;
public WindowLauncherClass(SortedSet set) {
this.set = set; // or you can copy it if that's what you need.
}
protected void launchWindow(Object item) {
// impl
}
public void onActivityResult() {
if ( setIterator != null && setIterator.hasNext() )
{
launchWindow(setIterator.next());
}
}
public void start() {
setIterator = set.iterator();
onActivityResult();
}
}
The question of updates to the set came up in the comments: will the iterator see them?
The usual answer is that it depends on the application's requirements. In this case I don't have all the information, so I'll try to guess.
Up to JDK 1.5 there was only one SortedSet implementation (TreeSet), and it has a fail-fast iterator.
JDK 6 added a new implementation, ConcurrentSkipListSet. The iterator for this sorted set is not fail-fast.
If you add an element to the set that is "smaller" than the currently displayed element, you will not be able to see it anyway, even with a "good" (not fail-fast) iterator. If you add an element "bigger" than the currently displayed element, a proper iterator will see it.
The final solution is to actually reset the set and the iterator when a relevant change is made. With a ConcurrentSkipListSet you will initially see only the "bigger" changes, and with a TreeSet the iterator will fail at every update.
If you can afford to miss updates "smaller" than the current one, then go for JDK 6 and ConcurrentSkipListSet. If not, then you'll have to keep track of what you displayed and rebuild a proper set with the new items and the undisplayed items.
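The fail-fast versus non-fail-fast difference described above can be observed in a single thread. This is a small sketch with made-up names (IteratorBehaviourDemo, failsOnUpdate); it only checks whether the iterator throws, not which elements a ConcurrentSkipListSet iterator ends up seeing.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

class IteratorBehaviourDemo {
    // Returns true if adding to the set in the middle of an iteration makes the iterator throw.
    static boolean failsOnUpdate(SortedSet<Integer> set) {
        Iterator<Integer> it = set.iterator();
        it.next();
        set.add(100); // structural change while the iterator is live
        try {
            while (it.hasNext()) {
                it.next();
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsOnUpdate(new TreeSet<>(Arrays.asList(1, 2, 3))));               // prints true
        System.out.println(failsOnUpdate(new ConcurrentSkipListSet<>(Arrays.asList(1, 2, 3)))); // prints false
    }
}
```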

Unless you're using some SortedSet from a third-party library, your set is also a NavigableSet (every SortedSet in java.util also implements NavigableSet). If you can make the event pass back the element it just finished working on, NavigableSet has a method higher which will get the next element higher than the one you pass in:
public void onActivityResult(Event event) {
Element element = event.processedElement;
Element next = set.higher(element);
if(next != null)
launchWindow(next);
}
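To see higher in isolation, here is a self-contained sketch that walks a TreeSet one element at a time, the way successive events would. The class name HigherDemo and the method visitInOrder are hypothetical, and adding to a list stands in for launchWindow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class HigherDemo {
    // Visits every element in ascending order without holding on to an Iterator.
    static List<Integer> visitInOrder(NavigableSet<Integer> set) {
        List<Integer> visited = new ArrayList<>();
        Integer current = set.isEmpty() ? null : set.first();
        while (current != null) {
            visited.add(current);          // stands in for launchWindow(current)
            current = set.higher(current); // null when nothing greater remains
        }
        return visited;
    }

    public static void main(String[] args) {
        NavigableSet<Integer> set = new TreeSet<>(Arrays.asList(30, 10, 20));
        System.out.println(visitInOrder(set)); // prints [10, 20, 30]
    }
}
```

Because each step is a fresh lookup rather than a live iterator, elements added to the set between events are picked up as long as they are greater than the current one.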

What is a data structure kind of like a hash table, but infrequently-used keys are deleted?

I am looking for a data structure that operates similar to a hash table, but where the table has a size limit. When the number of items in the hash reaches the size limit, a culling function should be called to get rid of the least-retrieved key/value pairs in the table.
Here's some pseudocode of what I'm working on:
class MyClass {
private Map<Integer, Integer> cache = new HashMap<Integer, Integer>();
public int myFunc(int n) {
if(cache.containsKey(n))
return cache.get(n);
int next = . . . ; //some complicated math. guaranteed next != n.
int ret = 1 + myFunc(next);
cache.put(n, ret);
return ret;
}
}
What happens is that there are some values of n for which myFunc() will be called lots of times, but many other values of n which will only be computed once. So the cache could fill up with millions of values that are never needed again. I'd like to have a way for the cache to automatically remove elements that are not frequently retrieved.
This feels like a problem that must be solved already, but I'm not sure what the data structure is that I would use to do it efficiently. Can anyone point me in the right direction?
Update: I knew this had to be an already-solved problem. It's called an LRU cache and is easy to make by extending the LinkedHashMap class. Here is the code that incorporates the solution:
class MyClass {
private final static int SIZE_LIMIT = 1000;
private Map<Integer, Integer> cache =
new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest)
{
return size() > SIZE_LIMIT;
}
};
public int myFunc(int n) {
if(cache.containsKey(n))
return cache.get(n);
int next = . . . ; //some complicated math. guaranteed next != n.
int ret = 1 + myFunc(next);
cache.put(n, ret);
return ret;
}
}

You are looking for an LRUList/Map. Check out LinkedHashMap:
The removeEldestEntry(Map.Entry) method may be overridden to impose a policy for removing stale mappings automatically when new mappings are added to the map.

Googling "LRU map" and "I'm feeling lucky" gives you this:
http://commons.apache.org/proper/commons-collections//javadocs/api-release/org/apache/commons/collections4/map/LRUMap.html
A Map implementation with a fixed maximum size which removes the least recently used entry if an entry is added when full.
Sounds pretty much spot on :)

WeakHashMap will probably not do what you expect it to... read the documentation carefully and ensure that you know exactly what you get from weak and strong references.
I would recommend you have a look at java.util.LinkedHashMap and use its removeEldestEntry method to maintain your cache. If your math is very resource intensive, you might want to move entries to the front whenever they are used to ensure that only unused entries fall to the end of the set.

The Adaptive Replacement Cache policy is designed to keep one-time requests from polluting your cache. This may be fancier than you're looking for, but it does directly address your "filling up with values that are never needed again".

Take a look at WeakHashMap

You probably want to implement a Least-Recently Used policy for your map. There's a simple way to do it on top of a LinkedHashMap:
http://www.roseindia.net/java/example/java/util/LRUCacheExample.shtml
