Cache management, how to auto-delete elements? - java

I'm implementing a cache in Java, but I have one last problem to solve: how to deal with deleting elements?
Elements are stored on disk; each element has a validity period (and therefore an expiration date) and a size, and the cache has a maximum total size and a maximum number of elements it may store.
I came up with three ways to perform deletion:
When a new element is inserted into the cache, a scheduled task (one per element) is set to run at its expiration time and delete it.
Run a thread every X minutes to check which elements can be deleted, and delete them.
When a limit (size or count) is reached, the oldest elements are deleted (or elements are deleted at random, which is faster).
With the third approach the cache keeps storing expired elements as well; when one of them is requested, a check is performed to verify that the element is still valid.
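A minimal sketch of what I mean by that check on retrieval (class and method names are only illustrative):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of option 3: keep expired entries around and check validity on read.
class LazyExpiryCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;
        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();

    void put(K key, V value, long ttlMillis) {
        entries.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (System.currentTimeMillis() > entry.expiresAtMillis) {
            entries.remove(key); // expired: delete it lazily and report a miss
            return null;
        }
        return entry.value;
    }
}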
What do you think? What is the common approach when managing a cache? Are there other solutions?
P.S. I'm developing this cache for Android, but I think this is not so important.

Basically you have to know how often your cached elements will be used, and in which order. A cache has to do the same thing an OS does to keep the best data in memory.
Have a look at these strategies and pick the one you need: http://en.wikipedia.org/wiki/Page_replacement_algorithm
A good choice would be LRU (Least Recently Used), but like all of these strategies it has weaknesses, which may make it unsuitable for your usage pattern.
Implementation tips for LRU:
Use a PriorityQueue to store the elements in addition to your map. Keep a global counter that is incremented every time you use one of your elements, and reinsert the corresponding entry into the PriorityQueue with the current value of the counter.
If you need to evict an item, you just remove the first or last element from the queue (depending on your implementation of the compareTo(...) method) and remove it from the map as well.
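A minimal sketch of that PriorityQueue-plus-counter idea (all names here are illustrative; stale queue entries left over from reinsertions are simply skipped at eviction time):

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class LruCache<K, V> {

    // Snapshot of a key's last-use counter; older snapshots for the same key become stale.
    private static final class Entry<K> implements Comparable<Entry<K>> {
        final K key;
        final long lastUse;
        Entry(K key, long lastUse) { this.key = key; this.lastUse = lastUse; }
        public int compareTo(Entry<K> other) { return Long.compare(lastUse, other.lastUse); }
    }

    private final int capacity;
    private long counter = 0;                             // global use counter
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Long> lastUse = new HashMap<>(); // current counter value per key
    private final PriorityQueue<Entry<K>> queue = new PriorityQueue<>();

    LruCache(int capacity) { this.capacity = capacity; }

    synchronized V get(K key) {
        V value = values.get(key);
        if (value != null) {
            touch(key);
        }
        return value;
    }

    synchronized void put(K key, V value) {
        if (!values.containsKey(key) && values.size() >= capacity) {
            evictOldest();
        }
        values.put(key, value);
        touch(key);
    }

    private void touch(K key) {
        long now = ++counter;
        lastUse.put(key, now);
        queue.add(new Entry<>(key, now)); // previous queue entries for this key are now stale
    }

    private void evictOldest() {
        while (!queue.isEmpty()) {
            Entry<K> oldest = queue.poll();
            Long current = lastUse.get(oldest.key);
            if (current != null && current == oldest.lastUse) { // skip stale entries
                values.remove(oldest.key);
                lastUse.remove(oldest.key);
                return;
            }
        }
    }
}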

Related

Can I use LinkedHashMap in Hazelcast?

Can I somehow use LinkedHashMap in Hazelcast (Java, Spring)? I need to get unique records from a Hazelcast shared in-memory cache, but in the order in which I inserted them. I found in the Hazelcast documentation (https://docs.hazelcast.org/docs/latest-dev/manual/html-single/) that they offer distributed implementations of common data structures, but the map doesn't preserve element order, and the list and queue don't remove duplicate data. Do you know if I can use LinkedHashMap, or somehow get unique data while preserving its order?
Ordered or linked storage isn't compatible with the goals of a data grid - highly concurrent and distributed storage.
Ordered retrieval is possible. Hazelcast's Paging Predicate with a comparator would do it. Or, if the volume is not too high, you could retrieve the entry set and sort it yourself.
The catch is, you have to provide the field to order upon.
If your data already has some sort of sequence number or timestamp that is always unique, this is easy.
If not, perhaps something like Atomic Long would do it. A getAndIncrement() would give you a unique number to use for each insert.
Watch out though, this has a race condition if two or more threads insert concurrently. To solve it you'd need some sort of singleton @Service running somewhere to do the "get next seqno; insert" step.
And if you restart the grid, the seqno in the atomic counter will need to be repositioned to the right place.
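A rough sketch of that idea, assuming a Hazelcast 4.x+ setup where IAtomicLong is obtained via the CP subsystem (map name, counter name, and record keys below are illustrative):

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.cp.IAtomicLong;
import com.hazelcast.map.IMap;

import java.util.Map;
import java.util.TreeMap;

public class OrderedPutExample {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // Distributed counter used to assign an insertion sequence number.
        IAtomicLong seq = hz.getCPSubsystem().getAtomicLong("record-seq");

        // Map keyed by the record itself to enforce uniqueness; the value is its sequence number.
        IMap<String, Long> records = hz.getMap("records");

        // putIfAbsent keeps the first sequence number assigned to a record, so duplicates are dropped.
        records.putIfAbsent("record-A", seq.getAndIncrement());
        records.putIfAbsent("record-B", seq.getAndIncrement());
        records.putIfAbsent("record-A", seq.getAndIncrement()); // duplicate, ignored

        // Re-establish insertion order by sorting on the sequence number.
        Map<Long, String> ordered = new TreeMap<>();
        for (Map.Entry<String, Long> e : records.entrySet()) {
            ordered.put(e.getValue(), e.getKey());
        }
        ordered.values().forEach(System.out::println); // record-A, record-B
    }
}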

Retrieve first document from collection using findOne() MongoDB

I am creating a Java program to process a MongoDB collection as a queue. So when I dequeue, I want the document that was inserted first.
To do that, I have a field called created, which represents the timestamp of document creation, and my initial idea was to use the aggregation $min to find the smallest document using the created field.
However, it occurred to me: why not use findOne() without any argument? It will always return the first document in the collection.
So my question is: should I do that? Would it be a good approach to use findOne() to dequeue the first record from the Mongo queue? And what are the drawbacks if I do so?
PS: The Mongo queue program is created to serve device requests on a first-come, first-served basis. It takes some time to execute a request, and a device can't accept another request while it is processing one, so to avoid dropping requests I am using the queue to process them one by one.
Interesting how many people here commented incorrectly, but you are right in that a raw .findOne() with a blank query or .findOne({}) will return the first document in the collection, that being "the document with the lowest _id value".
Ideally for a queue processing system, you want to remove the document at the same time as doing this. For this purpose the Java API supports a .findAndRemove() method:
DBCollection data = mongoOperation.getCollection("data");
// An empty query matches the first document in natural order; findAndRemove returns it and deletes it atomically.
DBObject removed = data.findAndRemove(new DBObject());
So that will return the first document in the collection as described and "remove" it from the collection so that no other operations can find it.
Alternatively, you can call .findAndModify() and set all the options yourself, but if all you are after is "oldest document first", which is what the _id guarantees, then this is all you need.
findOne returns elements in natural order. This is not necessarily the same as insertion order; it is the order in which documents appear on disk. It may appear that they are being retrieved in insertion order, but with deletes and inserts you will start seeing documents appear out of order.
One way to guarantee that elements always appear in insertion order is to use a capped collection. If your application is not impacted by its restrictions, it might be the simplest way to implement a queue.
Capped collections can also be used with a tailable cursor, so that the logic retrieving items from the queue can keep waiting for items when none are available to process.
Update: If you cannot use a capped collection, you would have to sort the result by _id (if it is an ObjectId), or keep a timestamp-based field in the collection and order the result by that field.
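With the legacy Java driver, a sketch of that fallback could look like the fragment below, in the same style as the snippet above (collection and variable names are illustrative):

// Fallback when a capped collection isn't an option: ask explicitly for the oldest _id.
DBCollection queue = mongoOperation.getCollection("queue");

DBCursor cursor = queue.find()
                       .sort(new BasicDBObject("_id", 1)) // ascending: oldest ObjectId first
                       .limit(1);
DBObject oldest = cursor.hasNext() ? cursor.next() : null;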
findOne returns documents in $natural order, i.e. the order in which MongoDB happens to store them behind the scenes.
The function does not, by default, sort by _id, nor will it pick the lowest _id.
If you find that it regularly returns the lowest _id, that is only because of how the documents happen to be positioned in $natural order.
Getting the first document of the collection and the first document of a sorted set are two totally different things.
If you wanted to use findAndModify to grab a document off the pile (though I would personally recommend an optimistic lock instead), you would need to use:
findAndModify({
    sort: {_id: -1},
    remove: true
})
The reason I would not recommend this approach is that if the process crashes or the server goes down in your distributed worker set, then you have lost that data point. Instead you want a temporary (optimistic-style) lock which can be released if the document has not been processed correctly.
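A sketch of that optimistic-lock idea with the legacy Java driver; the status and lockedAt fields and the collection name are illustrative assumptions, not from the answer:

// Optimistic lock instead of remove: claim the oldest pending document atomically.
DBCollection queue = mongoOperation.getCollection("queue");

DBObject query  = new BasicDBObject("status", "pending");
DBObject sort   = new BasicDBObject("_id", 1);              // oldest first for FIFO processing
DBObject update = new BasicDBObject("$set",
        new BasicDBObject("status", "processing").append("lockedAt", new java.util.Date()));

// findAndModify(query, fields, sort, remove, update, returnNew, upsert)
DBObject claimed = queue.findAndModify(query, null, sort, false, update, true, false);
// If the worker later crashes, a cleanup job can flip status back to "pending".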

Is a ConcurrentHashSet required if threads are using different keys?

Suppose I have a hash set of request IDs that I've sent from a client to a server. The server's response returns the request ID that I sent, which I can then remove from the hash set. This will be run in a multithreaded fashion, so multiple threads can be adding to and removing IDs from the hash set. However, since the IDs generated are unique (from a thread safe source, let's say an AtomicInteger for now that gets updated for each new request), does the HashSet need to be a ConcurrentHashSet?
I would think the only case this might cause a problem would be if the HashSet encounters collisions which require data structure changes to the underlying HashSet object, but it doesn't seem like this would occur in this use case.
Yes, you still need one. The underlying array of the hash table might need to be resized, for instance, and distinct IDs can still hash to the same bucket, so having different keys will not help at all.
However, since you know that the IDs are increasing, and if you can put an upper bound on the number of IDs outstanding (let's say 1000), you can work with an upper and lower bound and a fixed-size array indexed by offset from the lowest key, in which case you will not need any mutexes or concurrent data structures. Such a data structure is very fragile, however: if you ever have more than your upper bound outstanding, hell will break loose. So unless performance is a real concern, just use the ConcurrentHashSet.
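For reference, a minimal sketch of the concurrent-set route: standard Java has no ConcurrentHashSet class, but ConcurrentHashMap.newKeySet() (Java 8+) gives the equivalent. Class and method names below are illustrative:

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class OutstandingRequests {
    // Thread-safe set view backed by a ConcurrentHashMap (Java 8+).
    private final Set<Integer> outstanding = ConcurrentHashMap.newKeySet();
    private final AtomicInteger nextId = new AtomicInteger();

    // Called by the sending threads.
    public int send() {
        int id = nextId.getAndIncrement();
        outstanding.add(id);
        return id;
    }

    // Called by the response-handling threads.
    public void onResponse(int id) {
        outstanding.remove(id);
    }
}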

Bloom filter to store the last 50 data content alone

Hi, in my system there will be one master node and n slave nodes, where the master node distributes incoming requests to one of its slave nodes. In order to make use of cached content, I want to keep track of the last 50 requests (the hash of each incoming request) that a slave node has already served (on the assumption that those last 50 requests will still be in its cache, so that the node will serve them fast).
From what I have studied, deletion is difficult in a Bloom filter, although it can be done with a counting filter. Is it really possible to use a Bloom filter as a moving window (i.e. after 50 requests it should delete from the front to accommodate new requests)? Is it possible to do that, or is there another Bloom-filter-like structure (which should be fast enough to check for the presence of an element)?
If you have just 50 things that you're keeping track of, I don't think that a Bloom filter is an appropriate data structure. Bloom filters are good when you have a massive amount of data that can't be held in memory and want to do a prefiltering to eliminate unnecessary lookups in some remote data structure, such as a remote database. If you have just 50 elements, you're almost certainly better off using something like a hash table to store those values, since you can get exact answers in expected O(1) time with minimal space overhead.
If you want to track the last 50 elements you've seen, consider looking into a linked hash table, which supports the insert, lookup, delete, and delete-eldest all in O(1) time. Java's LinkedHashMap should be great here.
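A minimal sketch of that idea, keeping only the hashes of the last 50 requests (the window size and names are taken from the question; everything else is illustrative):

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class RecentRequests {
    private static final int WINDOW = 50; // window size from the question

    // Insertion-ordered map that drops its eldest entry once more than WINDOW hashes are stored.
    private final Set<Long> recentHashes = Collections.newSetFromMap(
            new LinkedHashMap<Long, Boolean>() {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                    return size() > WINDOW;
                }
            });

    public void recordServed(long requestHash) {
        recentHashes.add(requestHash);
    }

    public boolean probablyCached(long requestHash) {
        return recentHashes.contains(requestHash); // exact O(1) membership check
    }
}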
Hope this helps!

atomic writes to ehcache

Context
I am storing a java.util.List inside ehcache.
Key(String) --> List<UserDetail>
The ordered List contains a Top 10 ranking of my most active users.
Problem
Concurrent third-party clients might be requesting this list.
I have a requirement to be as current as possible with regard to the ranking. Thus if the ranking changes due to user activity, the ordered List in the cache must not be left stale for long. Once I've recalculated a new List, I want to replace the one in the cache immediately.
Consider a busy scenario in which multiple concurrent clients are requesting the ranking; how can I replace the cache item in such a fashion that clients can continue to pull a (possibly stale) snapshot and never get a null value?
There will only be 1 server thread that writes to the cache.
I don't see what the problem is. Once you've replaced a cache item, clients will pull that new cache item. Up until that point they will pull the old cache item.
There should never be a time when they return a null cache item, unless you actually remove the item from the cache and then replace it.
If EHCache worked like that I would consider it pretty fundamentally broken, given that it's meant to be thread-safe!
You can simply store the new list in the cache. The next call to get will return it.
All you must make sure of is that no one edits the list that is returned from the cache. For example, in the server thread, you must copy the list:
// Copy-on-write: never modify the list instance that readers may already hold.
List<UserDetail> workingCopy = new ArrayList<>((List<UserDetail>) cache.get(key));
// ... modify workingCopy ...
cache.put(key, workingCopy); // readers pick up the new list on their next get
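For reference, with the classic Ehcache 2.x API (net.sf.ehcache), where values are wrapped in an Element, the same copy-on-write update could look roughly like this sketch (cache name and key are illustrative; UserDetail comes from the question):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

import java.util.ArrayList;
import java.util.List;

public class RankingWriter {
    private final Cache cache = CacheManager.getInstance().getCache("rankings");

    public void updateRanking(String key, List<UserDetail> freshTop10) {
        // Work on a fresh copy so readers holding the old list are never affected.
        List<UserDetail> newValue = new ArrayList<>(freshTop10);

        // put() replaces the Element in one step; concurrent get() calls see either
        // the old list or the new one, never null (as long as the key is never removed).
        cache.put(new Element(key, newValue));
    }
}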
