I am new to programming in general and to Java in particular. I want to implement an LRU cache and would like to have O(1) complexity.
I have seen some implementations on the Internet using a manually implemented doubly linked list for the cache (two arrays, a Node class, previous, next, etc.) and a HashMap where the key is the item to be cached and the value is the timestamp.
I really don't see the reason to use timestamps: the inserted item goes to the head of the manually-implemented LinkedList, the evicted item is the cached item located at the tail, and in every insertion the previously cached items are shifted one position towards the tail.
The only problems that I see are the following:
For the cache lookup (to find whether we have a cache hit or miss for the requested item), we have to "scan" the cache list, which implies a for loop of some type (conventional, for-each, etc., I don't really care much at this point). Obviously, we don't want that. I believe this issue can be solved easily by using an array of boolean variables to indicate whether an item is in the cache or not (1: in, 0: out) - let's call it lookupArray - as follows. Say the items are distinguished by some numeric ID, i.e. an integer between 1 and N. Then this lookupArray of booleans will have size N+1 (because array indexing starts from zero) and will be initialized to all zeros. When the item with numeric ID k, where 1<=k<=N, enters the cache, we set the boolean value at index k of lookupArray to 1. That way, the cache lookup does not need any search in the cache: to check whether the item with numeric ID k is in the cache, we simply check whether the value of lookupArray at index k is 1 or 0. (We already have the index, i.e. we know where to look, so there is no need for a for loop.)
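A minimal sketch of this lookup-array idea, assuming item IDs in the range 1..N (the class and method names here are my own, for illustration only):

```java
// Sketch of the O(1) membership check described above. Item IDs are
// assumed to be integers between 1 and N.
class LookupArraySketch {
    private final boolean[] lookupArray; // index k is true iff item k is cached

    LookupArraySketch(int n) {
        this.lookupArray = new boolean[n + 1]; // index 0 unused; IDs start at 1
    }

    void markCached(int id)  { lookupArray[id] = true;  }
    void markEvicted(int id) { lookupArray[id] = false; }
    boolean isCached(int id) { return lookupArray[id];  } // O(1), no loop
}
```

Note that this only answers "is item k cached?"; it says nothing about where item k sits in the list, which is exactly the second problem below.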
The second problem, though, is not easily solvable. Let's say that we have a cache hit for an item. Then, if this item is not located at the head (i.e. if it is not the most recently used item), we have to locate it in the cache list and then put it at the head. As far as I understand, this implies searching the cache list, i.e. a for loop. Then we can't achieve the O(1) objective.
Am I right about (2)? Is there any way to do this without using a HashMap and timestamps?
Since I am relatively new to programming, as I stated at the beginning of the post, I would really appreciate, if possible, any code snippets demonstrating the implementation with a manually implemented doubly linked list.
Sorry for the long message, I hope it is not only detailed but also clear.
Thank you!
Consider using a queue. It allows you to remove an object and insert it at the beginning. It also has a size and can be used for caching.
http://docs.oracle.com/javase/7/docs/api/java/util/Queue.html
Or maybe you should not implement it yourself. There is an LRUMap available in the Apache Commons Collections library.
https://commons.apache.org/proper/commons-collections/javadocs/api-3.2.1/org/apache/commons/collections/map/LRUMap.html
Related
I have a list of numbers. In my program I would frequently be checking if a certain number is part of my list. If it is not part of my list, I add it to the list, otherwise I do nothing. I have found myself using a hashmap to store the items instead of an arraylist.
void add(Map<Integer, Integer> mp, int item) {
    if (!mp.containsKey(item)) {
        mp.put(item, 1);
    }
}
As you can see above, I put anything as the value, since I would not be using the values.
I have tested this process to be a lot faster than using an ArrayList. (Also, containsKey() for a HashMap is O(1), while contains() for an ArrayList is O(n).)
Although it works well for me, it feels awkward for the simple reason that it is not the right data structure. Is this a good practice? Is there a better DS that I can use? Is there not any list that utilizes hashing to store values?
I have a list of numbers. In my program I would frequently be checking if a certain number is part of my list. If it is not part of my list, I add it to the list, otherwise I do nothing.
You are describing a set. From the Javadoc, a java.util.Set is:
A collection that contains no duplicate elements.
Further, the operation you are describing is add():
Adds the specified element to this set if it is not already present.
In code, you would create a new Set (this example uses a HashSet):
Set<Integer> numbers = new HashSet<>();
Then any time you encounter a number you wish to keep track of, just call add(). If x is already present, the set will remain unchanged and will not throw any errors – you don't need to be careful about adding things, just add anything you see, and the set sort of "filters" out the duplicates for you.
numbers.add(x);
It's beyond your original question, but there are various things you can do with the data once you've populated a set - check if other numbers are present/absent, iterate the numbers in the set, etc. The Javadoc shows which functionality is available to use.
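For instance, add()'s boolean return value already tells you whether the element was newly added or was a duplicate, so no containsKey check or sentinel value is needed (the class name here is hypothetical):

```java
import java.util.HashSet;
import java.util.Set;

// Set.add already does "add only if absent" and reports whether it
// actually changed the set.
class SetAddDemo {
    static boolean[] run() {
        Set<Integer> numbers = new HashSet<>();
        boolean first = numbers.add(42);  // true: 42 was not present before
        boolean second = numbers.add(42); // false: already present, no-op
        return new boolean[] { first, second };
    }
}
```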
An alternative solution from the standard library is a java.util.BitSet. This is - in my opinion - only an option if the values of item are not too big and are relatively close together. If your values are not near each other (and do not start near zero), then it might be worthwhile looking for third-party solutions that offer sparse bit sets or other sparse data structures.
You can use a bit set like:
BitSet bits = new BitSet();
void add(int item) {
    bits.set(item);
}
And as suggested in the comments by Eritrean, you can also use a Set (e.g. HashSet). Internally, a HashSet uses a HashMap, so it will perform similarly to your current solution, but it does away with having to put in sentinel values yourself (you just add or remove the item itself).
As an added benefit, if you use Collection<Integer> as the type of parameters/fields in your code, you can easily switch between an ArrayList and a HashSet and test without having to change code all over the place.
I was reading some sample questions from the Enthuware exam simulator. I came across a question whose problem statement is as follows:
You are designing a class that will cache objects. It should be able
to store and retrieve an object when supplied with an object
identifier. Further, this class should work by tracking the "last
accessed times" of the objects. Thus, if its capacity is full, it
should remove only the object that hasn't been accessed the longest.
Which collection class would you use to store the objects?
The possible options given were
HashSet
ArrayList
LinkedHashMap
LinkedList
TreeMap
The correct answer given by the simulator is LinkedHashMap. I will quote the explanation given by the simulator:
The LinkedHashMap class maintains the elements in the order of their
insertion time. This property can be used to build the required cache
as follows:
Insert the key-value pairs as you do normally where key will be the object identifier and value will be the object to be cached.
When a key is requested, remove it from the LinkedHashMap and then insert it again. This will make sure that this pair is marked as inserted latest.
If the capacity is full, remove the first element.
Note that you cannot simply insert the key-value again (without first
removing it) because a reinsertion operation does not affect the
position of the pair.
I understand only the first point. Still, here are my questions:
Point 1 states that the value will be the object to be cached. How does caching apply like this?
I am not able to understand the explanation from point 2 onwards.
Can someone explain this concept to me? Thanks.
I believe you should take the 'caching' from the example with a grain of salt: it's meant to provide some context, but not entirely relevant.
The caching here is likely meant as retrieving a value from the collection instead of accessing a data source and getting it from there.
As to your second question:
When a key is requested, remove it from the LinkedHashMap and then insert it again. This will make sure that this pair is marked as inserted latest.
Consider the following Map:
ID | Value
1 | Jack
5 | John
3 | Jenny
In this situation Jack was entered first, then John and after that Jenny.
Now we want to retrieve the cached value of John. If we want to do so, we first retrieve the value for his unique identifier (5) and we get the object John as a result. Right now we have our cached value, but the requirement to track the last access time hasn't been fulfilled yet. Therefore we delete him and add him again, essentially placing him at the end.
ID | Value
1 | Jack
3 | Jenny
5 | John
John stays cached, but now his access time has been updated. Whenever the map is full, you remove the first item in line (which will essentially be the item that's not been accessed for the longest time).
If the map has a maximum size of 3 and we try to add Jeff, we get the following situation:
ID | Value
3 | Jenny
5 | John
7 | Jeff
The first item (Jack) and thus the least-recently accessed object will be removed, making place for the new object (most-recently accessed).
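The walkthrough above can be reproduced in a few lines (a sketch; the class name is mine, the IDs and names mirror the tables):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Reproduces the walkthrough above: re-inserting "John" moves him to the
// end of an insertion-ordered LinkedHashMap, and the eldest entry ("Jack")
// is the one evicted when capacity (3) is exceeded.
class LruWalkthrough {
    static Map<Integer, String> run() {
        Map<Integer, String> cache = new LinkedHashMap<>();
        cache.put(1, "Jack");
        cache.put(5, "John");
        cache.put(3, "Jenny");

        // "Access" John: remove and re-insert so he becomes the newest entry.
        String john = cache.remove(5);
        cache.put(5, john);

        // Capacity is 3, so adding Jeff means evicting the eldest (Jack).
        Integer eldestKey = cache.keySet().iterator().next(); // key 1 -> Jack
        cache.remove(eldestKey);
        cache.put(7, "Jeff");
        return cache; // iteration order is now Jenny, John, Jeff
    }
}
```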
Point 1 states that the value will be the object to be cached. How does caching apply like this?
Caching an object here means storing the created objects in some collection so that they can be retrieved later. Since the requirement is to store and retrieve objects using a key, clearly a Map is the option here, which will store the mapping from an object's key to the object itself.
Also, LinkedHashMap is suitable because it maintains insertion order. So the first object you create will be the first in that map.
When a key is requested, remove it from the LinkedHashMap and then insert it again. This will make sure that this pair is marked as inserted latest.
Again, take a look at the requirement. It says the elements that haven't been accessed for the longest should be removed. Now suppose an object at the first position hasn't been accessed for a long time. When you access it now, you wouldn't want it to still be in the first position, because in that case, when you remove the first element, you would be removing the element you just accessed.
That is why you should remove the element and insert it back, so that it is placed at the end.
If the capacity is full, remove the first element.
As is already clear, the first element is the one that was inserted first and has the oldest access time. So you should remove the first element only, as the requirement says:
if its capacity is full, it should remove only the object that hasn't been accessed the longest.
First step, determine if you need a Set, Map, or List.
Lists preserve order.
Maps allow fast, key based, look up of items.
Sets provide identity based membership, in other words, no duplicates.
You probably want lookup by key, so it's some sort of map. However, you also want to preserve order. At first glance, LinkedHashMap seems a winner, but it is not.
LinkedHashMap preserves insertion order, and you want to preserve access order. To twist one into another, you would have to remove and add back each element as it is accessed. This is very wasteful, and subject to timing issues (between the would-be-atomic add and read).
You could simplify both by maintaining two internal data structures.
A HashMap for fast access.
A linked list to quickly reorder based on access times.
As you insert, the hashmap stores a linked list node under the object's key, and the node itself holds the cached data. The node is added at the "newer" end of the list.
As you access, the hashmap pulls up the linked list node, which is then removed and re-inserted at the head of the linked list (and the data is returned).
As you delete, the hashmap pulls up the linked list node, and removes it from the linked list, and clears the hashmap entry.
When removing an expired entry, remove from the "older" end of the linked list, and don't forget to clear out the hashmap entry.
By doing this, you have built your own kind of LinkedHashMap, but one that tracks according to access time instead of insertion order.
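A minimal sketch of this two-structure design, assuming a fixed capacity (all names here are my own, not from any library; every operation is O(1)):

```java
import java.util.HashMap;
import java.util.Map;

// LRU cache built from a HashMap plus a manually implemented doubly
// linked list: the map gives O(1) lookup, the list gives O(1) reordering.
class LruCache<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> prev, next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final int capacity;
    private final Map<K, Node<K, V>> index = new HashMap<>();
    private Node<K, V> head, tail; // head = most recent, tail = least recent

    LruCache(int capacity) { this.capacity = capacity; }

    V get(K key) {
        Node<K, V> node = index.get(key); // O(1) lookup, no scanning
        if (node == null) return null;
        moveToHead(node);                 // O(1) relink, no searching
        return node.value;
    }

    void put(K key, V value) {
        Node<K, V> node = index.get(key);
        if (node != null) {
            node.value = value;
            moveToHead(node);
            return;
        }
        if (index.size() == capacity) {   // evict the least recently used
            index.remove(tail.key);
            unlink(tail);
        }
        node = new Node<>(key, value);
        index.put(key, node);
        linkAtHead(node);
    }

    private void moveToHead(Node<K, V> node) {
        if (node == head) return;
        unlink(node);
        linkAtHead(node);
    }

    private void unlink(Node<K, V> node) {
        if (node.prev != null) node.prev.next = node.next; else head = node.next;
        if (node.next != null) node.next.prev = node.prev; else tail = node.prev;
        node.prev = node.next = null;
    }

    private void linkAtHead(Node<K, V> node) {
        node.next = head;
        if (head != null) head.prev = node;
        head = node;
        if (tail == null) tail = node;
    }
}
```

The key trick is that the map's value is the list node itself, so a hit never needs to scan the list to find where the item lives.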
They are omitting three very important points:
Together with the LinkedHashMap, a mechanism to determine when to start removing objects is necessary. The most simple one is a counter availableCapacity initialized to the maximum capacity and decremented/incremented accordingly. An alternative is to compare the size() of the LinkedHashMap with a maximumCapacity variable.
The LinkedHashMap (specifically its values()) is assumed to contain the only pointers to the cached objects/structures. If any other pointers are kept, they are assumed to be transient.
The cache is to be administered under a LRU regime.
This said, and to answer your questions:
Yes.
By definition, the first item in a LinkedHashMap is the first inserted ("oldest"). If every time a cache entry is used it is removed and re-inserted into the map, it is placed at the end of the list and thus made the "newest". The first will always be the one that has not been used for the longest time, the second the next-longest, and so on. This is why elements are removed from the front.
LinkedHashMap stores items in the order they were inserted. They're using one to implement an LRU cache. The keys are the object identifiers. The values are the items to be cached. Maps have a very fast lookup time, which is what makes the map a cache: it's faster to do the lookup than to recompute the value or fetch it again from the original data source.
Inserting items into the map puts them at the end of the map. So every time you read something, you take it out and put it back on the end. Then, when you need more room in your cache, you chop off the first element. That's the one that hasn't been used in the longest time, because it made its way all the way to the front.
In Java, when you do this:
alist[0].remove();
What happens to the rest of the ArrayList? Do all of the objects move up one, or do they stay the same and there is just an empty index at [0]?
If not, is there an efficient way of moving each object's index closer down by one?
To clarify what I mean by more efficient: you could remove the first index and then iterate through the ArrayList, deleting each object and re-assigning it to a new index, but this seems very inefficient. It seems like there should be a way, but I have looked through the JavaDoc page for the ArrayList class and do not see anything that would accomplish what I am trying to do.
Assuming you actually meant to ask about aList.remove(0)...
As documented by Oracle:
public E remove(int index)
Removes the element at the specified position in this list. Shifts any subsequent elements to the left (subtracts one from their indices).
So remove does as you require. However, you may not consider the implementation efficient since it requires time proportional to the number of elements remaining in the list. For example, if you have a list with 1 million items in it and you remove the item at index 0, then the remaining 999,999 items will need to be moved in memory.
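A quick check of this behavior (the class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Demonstrates that remove(0) shifts the remaining elements left:
// no "hole" is left at index 0.
class RemoveShiftDemo {
    static List<String> run() {
        List<String> list = new ArrayList<>(List.of("a", "b", "c"));
        list.remove(0); // removes "a"; "b" and "c" shift left by one
        return list;
    }
}
```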
Ignoring the code you posted that is irrelevant to an ArrayList, if you were to look at the source for ArrayList you'd find that when calling ArrayList.remove(obj) it finds the index (or if using remove(int) it already knows) then does:
System.arraycopy(elementData, index + 1, elementData, index, numMoved);
An ArrayList is backed by an array and it shifts everything in that backing array to the left.
In that case, the lookup is O(1) if you're using remove(int) or O(n) if providing an object, and the remove operation is O(n).
If you were to use a LinkedList, the lookup is O(n) either way (whether by index or by object), but the removal itself is O(1) because it's a doubly linked list.
When choosing a data structure, it's important to consider how you're going to be using it; there are always trade-offs depending on your use pattern.
Is there an efficient method to remove a range - say the tail - of X elements from a List, e.g. LinkedList in Java?
It is obviously possible to remove the last elements one by one, which should result in O(X) performance. At least for LinkedList instances it should be possible to have O(1) performance (by updating the references around the first element to be removed and updating the head/tail references). Unfortunately I don't see any method in List or LinkedList to remove the last elements all at once.
Currently I am thinking of replacing the list using List.subList(), but I'm not sure if that has equal performance. At least it would be clearer within the code; on the other hand, I would lose the additional functionality that LinkedList provides.
I'm mainly using the List as a stack, for which LinkedList seems to be the best option, at least regarding semantics.
subList(list.size() - N, list.size()).clear() is the recommended way to remove the last N elements. Indeed, the Javadoc for subList specifically recommends this idiom:
This method eliminates the need for explicit range operations (of the sort that commonly exist for arrays). Any operation that expects a list can be used as a range operation by passing a subList view instead of a whole list. For example, the following idiom removes a range of elements from a list:
list.subList(from, to).clear();
Indeed, I suspect that this idiom might be more efficient (albeit by a constant factor) than calling removeLast() N times, just because once it finds the Nth-to-last node, it only needs to update a constant number of pointers in the linked list, rather than updating the pointers of each of the last N nodes one at a time.
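For example, to drop the last two elements of a five-element LinkedList with the recommended idiom (a sketch; the class name is mine):

```java
import java.util.LinkedList;
import java.util.List;

// Demonstrates the subList(...).clear() idiom for removing the last N
// elements of a list in one call.
class SubListClearDemo {
    static List<Integer> run() {
        List<Integer> list = new LinkedList<>(List.of(1, 2, 3, 4, 5));
        int n = 2; // remove the last two elements
        list.subList(list.size() - n, list.size()).clear();
        return list;
    }
}
```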
Be aware that subList() returns a view of the original list, meaning:
Any modification done to the view will be reflected in the original list
The returned list is not a LinkedList - it's an inner implementation of List that's not serializable
Anyway, using either removeFirst() or removeLast() should be efficient enough, because popping the first or last element of a linked list in Java is an O(1) operation - internally, LinkedList holds pointers to both ends of the list and removing either one is as simple as moving a pointer one position.
For removing m elements at once, you're stuck with O(m) performance with a LinkedList, although strangely enough an ArrayList might be a better option, because removing elements at the end of an ArrayList is as simple as moving an index pointer (denoting the end of the array) one position to its left, and no garbage nodes are left dangling as is the case with a LinkedList. The best choice? try both approaches, profile them and let the numbers speak for themselves.
I have a hashtable that is under heavy traffic. I want to add a timeout mechanism to the hashtable to remove records that are too old. My concerns are:
- It should be lightweight.
- The remove operation is not time-critical. I mean, with a timeout value of 1 hour, the removal can happen after 1 hour or even after 1 hour 15 minutes; that is no problem.
My idea is:
I create a big array (as a ring buffer) that stores the put time and the hashtable key.
When adding to the hashtable, I use an array index to find the next slot in the array for the put time.
If the array slot is empty, I store the insertion time and the hashtable key there.
If the array slot is not empty, I compare the insertion time to check whether a timeout has occurred.
If a timeout has occurred, I remove the entry from the hashtable (if it has not been removed yet).
If no timeout has occurred, I increment the index until I find an empty or timed-out array slot.
When removing from the hashtable, there is no operation on the big array.
In short, every add operation on the hashtable may remove one timed-out element from the hashtable, or do nothing.
What would be a more elegant and more lightweight solution?
Thanks for the help!
My approach would be to use the Guava MapMaker:
ConcurrentMap<String, MyValue> graphs = new MapMaker()
    .maximumSize(100)
    .expireAfterWrite(1, TimeUnit.HOURS)
    .makeComputingMap(
        new Function<String, MyValue>() {
            public MyValue apply(String string) {
                return calculateMyValue(string);
            }
        });
This might not be exactly what you're describing, but chances are it's close enough. And it's much easier to produce (plus it's using a well-tested code base).
Note that you can tweak the behaviour of the resulting Map by calling different methods before the make*() call.
You should rather consider using a LinkedHashMap or maybe a WeakHashMap.
The former has a constructor to set the iteration order of its elements to the order of last access; this makes it trivial to remove elements that are too old. And its removeEldestEntry method can be overridden to define your own policy for automatically removing the eldest entry after the insertion of a new one.
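A sketch of that LinkedHashMap approach, assuming a fixed maximum size (the class name and the maxEntries field are my own): the three-argument constructor with accessOrder = true keeps entries in last-access order, and overriding removeEldestEntry makes eviction automatic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LinkedHashMap in access order plus a removeEldestEntry override:
// the least recently accessed entry is evicted once the map grows
// past maxEntries.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries; // hypothetical capacity limit

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order, not insertion order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least recently accessed entry
    }
}
```

With accessOrder set, a plain get() already moves the entry to the "newest" end; no remove-and-reinsert is needed.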
The latter uses weak references to keys, so any key which has no other reference to it can be automatically garbage collected.
I think a much easier solution is to use LRUMap from Apache Commons Collections. Of course you can write your own data structures if you enjoy it or want to learn, but this problem is so common that numerous ready-made solutions exist. (I'm sure others will point you to other implementations too; after a while your problem will be choosing the right one from them :))
Under the assumption that the currently most heavily accessed items in your cache structure are in the significant minority, you may well get by with randomly selecting items for removal (you have a low probability of removing something very useful). I've used this technique and, in this particular application, it worked very well and took next to no implementation effort.