Context
I am storing a java.util.List inside ehcache.
Key(String) --> List<UserDetail>
The ordered List contains a Top 10 ranking of my most active users.
Problem
Concurrent third-party clients might be requesting this list.
I have a requirement to be as current as possible with regard to the ranking. If the ranking changes due to user activity, the ordered List in the cache must not be left stale for long. Once I've recalculated a new List, I want to replace the one in the cache immediately.
Consider a busy scenario in which multiple concurrent clients are requesting the ranking. How can I replace the cache item in such a way that clients can continue to pull a possibly stale snapshot, but never get a null value?
There will only be 1 server thread that writes to the cache.
I don't see what the problem is. Once you've replaced a cache item, clients will pull that new cache item. Up until that point they will pull the old cache item.
There should never be a time when they return a null cache item, unless you actually remove the item from the cache and then replace it.
If EHCache worked like that I would consider it pretty fundamentally broken, given that it's meant to be thread-safe!
You can simply store the new list in the cache. The next call to get will return it.
The only thing you must ensure is that no one modifies the list returned from the cache. For example, in the server thread you must copy the list before changing it:
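// Copy the cached list; never modify the instance other threads may be reading
List<UserDetail> workingCopy = new ArrayList<>((List<UserDetail>) cache.get(key));
// ... modify workingCopy ...
cache.put(key, workingCopy);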
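A minimal sketch of the single-writer, replace-the-whole-list pattern, assuming a ConcurrentMap as a stand-in for the Ehcache instance (any thread-safe cache with get/put should behave the same way here) and plain String user names in place of UserDetail. RankingCache, KEY and recalculate are hypothetical names. The writer never touches a list that readers might hold; it builds a new one and publishes it with a single put, and List.copyOf makes each published snapshot unmodifiable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical class; the ConcurrentMap stands in for the Ehcache instance.
class RankingCache {
    private static final String KEY = "top10"; // hypothetical cache key
    private final ConcurrentMap<String, List<String>> cache = new ConcurrentHashMap<>();

    RankingCache(List<String> initialRanking) {
        // Populate before any reader starts, so read() never observes null.
        cache.put(KEY, List.copyOf(initialRanking));
    }

    // Called by any number of client threads; returns the current, possibly stale, snapshot.
    List<String> read() {
        return cache.get(KEY);
    }

    // Called only by the single writer thread.
    void recalculate(List<String> newRanking) {
        List<String> workingCopy = new ArrayList<>(newRanking); // private copy, invisible to readers
        // ... sort or trim workingCopy here ...
        cache.put(KEY, List.copyOf(workingCopy)); // one put swaps old snapshot for new, no null in between
    }
}
```

Because the entry is populated up front and only ever overwritten (never removed), a reader sees either the old snapshot or the new one.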
Related
I am using Java. I need to publish data to a FIFO queue. This queue will be processed by a separate thread. This way I avoid blocking the main thread.
My use case for publishing data is:
Each data object has a field that identifies it uniquely, so there are 50-odd such 'keys'. The other fields hold the rest of the object's data.
If a new data object comes along, it should not be blindly inserted into the queue; it should replace the old one, but only if their data differs based on field comparison etc. Otherwise it is simply discarded. Remember, one of the fields is the key; the rest are data and can differ wildly.
These data must be processed on a FIFO basis, so I need some kind of queue.
Needless to say, it should be thread-safe too.
Does anyone know of a data structure that satisfies these criteria? Thanks.
When I had to do something like this in C#, I created a custom data structure that contained a Dictionary and a Queue. The Dictionary was indexed by the item key, and the value contained the data associated with the key. The queue contained only the keys.
To insert an item into the queue, I did the following (pseudocode):
lock the data structure
if key exists in dictionary
    replace old item data in dictionary with new item data
else
    add new item to the dictionary, indexed by key
    add new item key to the queue
release lock
To dequeue an item:
lock
remove first item key from queue
lookup data for that key in dictionary
remove item from dictionary
release lock
process the item
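Translated to Java, the dictionary-plus-queue structure might look like the sketch below. CoalescingQueue is a hypothetical name; a HashMap plays the Dictionary role and an ArrayDeque of keys plays the Queue role, with every operation guarded by the object's own monitor. The blocking take() using wait/notifyAll is an addition not in the pseudocode, for a consumer thread that should sleep while the queue is empty.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch of the dictionary + key-queue structure.
class CoalescingQueue<K, V> {
    private final Map<K, V> latest = new HashMap<>();  // key -> most recent data
    private final Queue<K> order = new ArrayDeque<>(); // keys in FIFO order, each at most once

    synchronized void put(K key, V value) {
        if (latest.containsKey(key)) {
            latest.put(key, value); // replace the data; the key keeps its place in the queue
        } else {
            latest.put(key, value);
            order.add(key);
            notifyAll(); // wake a consumer blocked in take()
        }
    }

    // Blocks until an item is available, then removes key and data together.
    synchronized V take() throws InterruptedException {
        while (order.isEmpty()) {
            wait();
        }
        K key = order.remove();
        return latest.remove(key);
    }

    // Non-blocking variant: returns null when the queue is empty.
    synchronized V poll() {
        K key = order.poll();
        return key == null ? null : latest.remove(key);
    }
}
```

Replacing an entry keeps the key's original position, so an updated object is processed when its first version would have been. If the new data equals the old, the put changes nothing, which matches the question's "discard" rule.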
We did thousands of updates a second across multiple threads, and didn't encounter any performance problems. The locks just aren't held for very long. Your mileage may vary, of course.
You can avoid locks on the main thread by adding a lock-free concurrent queue that the main thread can push updates to. Another thread can service that queue and add items to the hybrid dictionary/queue structure I described above.
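One way to sketch that lock-free front end, assuming the main thread only ever calls publish and a single forwarding thread calls drainOnce in a loop (with some sleep or backoff when nothing was moved). java.util.concurrent.ConcurrentLinkedQueue is a non-blocking queue from the JDK; LockFreeFrontEnd and the BiConsumer sink are hypothetical, and the sink would typically be the insert method of the locked dictionary/queue structure.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.BiConsumer;

// Hypothetical front end: the main thread never takes a lock.
class LockFreeFrontEnd<K, V> {
    private final ConcurrentLinkedQueue<Map.Entry<K, V>> inbox = new ConcurrentLinkedQueue<>();
    private final BiConsumer<K, V> sink; // e.g. the insert method of the locked structure

    LockFreeFrontEnd(BiConsumer<K, V> sink) {
        this.sink = sink;
    }

    // Main thread: offer() on ConcurrentLinkedQueue is non-blocking.
    void publish(K key, V value) {
        inbox.offer(new AbstractMap.SimpleImmutableEntry<>(key, value));
    }

    // Forwarding thread: moves everything queued so far into the sink, returns how many.
    int drainOnce() {
        int moved = 0;
        Map.Entry<K, V> e;
        while ((e = inbox.poll()) != null) {
            sink.accept(e.getKey(), e.getValue());
            moved++;
        }
        return moved;
    }
}
```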
I am writing a Java program that uses a MongoDB collection as a queue. When I dequeue, I want the document that was inserted first.
To do so, I have a field called created that holds the document's creation timestamp, and my initial idea was to use the aggregation operator $min on the created field to find the oldest document.
However, it occurred to me: why not use findOne() without any argument? It will always return the first document in the collection.
So my question is: should I do that? Would it be a good approach to use findOne() to dequeue the first record from the Mongo queue? And what are the drawbacks if I do so?
PS: The Mongo queue program serves device requests on a first-come, first-served basis. Executing a request takes some time, and a device can't accept another request while it is processing one, so to avoid dropping requests I use the queue to process them one by one.
It is interesting how many people here commented incorrectly, but you are right that a raw .findOne() with a blank query, or .findOne({}), will return the first document in the collection, that being "the document with the lowest _id value".
Ideally for a queue processing system, you want to remove the document at the same time as doing this. For this purpose the Java API supports a .findAndRemove() method:
DBCollection data = mongoOperation.getCollection("data");
// DBObject is an interface, so pass an empty BasicDBObject as the match-anything query
DBObject removed = data.findAndRemove(new BasicDBObject());
So that will return the first document in the collection as described and "remove" it from the collection so that no other operations can find it.
Alternatively, you can call .findAndModify() and set all the options yourself, but if all you are after is "oldest document first", which is what the _id guarantees, then this is all you need.
findOne returns documents in natural order, which is not necessarily the same as insertion order: it is the order in which documents appear on disk. It may look as if documents are retrieved in insertion order, but after deletes and inserts you will start seeing documents appear out of order.
One way to guarantee that documents always appear in insertion order is to use capped collections. If your application is not affected by their restrictions, a capped collection might be the simplest way to implement a queue.
Capped collections can also be used with a tailable cursor, so that the logic retrieving items from the queue can keep waiting for new items when none are available to process.
Update: If you cannot use a capped collection, you have to sort the results by _id (if it is an ObjectId), or keep a timestamp field in the collection and order the results by that field.
findOne returns documents using the $natural order within the internal MongoDB B-tree that exists behind the scenes.
By default, the function does not sort by _id, nor will it pick the lowest _id.
If you find it returns the lowest _id regularly then that is because of document positioning within the $natural index.
Getting the first document of the collection and the first document of a sorted set are two totally different things.
If you want to use findAndModify to grab a document off the pile (although I would personally recommend an optimistic lock instead), you would need to use:
findAndModify({
    sort: {_id: 1},   // ascending _id = oldest first (FIFO); {_id: -1} would take the newest
    remove: true
})
The reason I would not recommend this approach is that if the process crashes, or a server in the distributed worker set goes down, you have lost that data point. Instead, you want a temporary (optimistic-type) lock that can be released if the document has not been processed correctly.
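The temporary-lock idea can be illustrated in memory. In MongoDB it would typically be a findAndModify that sets a lock or lease field instead of removing the document, plus a delete once processing succeeds; the sketch below only models that logic with Java collections, and LeaseQueue, claim, ack and lockedUntil are hypothetical names. Time is passed in explicitly so the behaviour is easy to test. A real implementation would also check that the worker calling ack still holds the lease, for example with a per-claim token.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of a lease-based ("temporary lock") queue.
class LeaseQueue<T> {
    private static final class Slot<T> {
        final T item;
        long lockedUntil; // 0 = never claimed; otherwise the lease expiry in ms
        Slot(T item) { this.item = item; }
    }

    private final Map<Long, Slot<T>> slots = new LinkedHashMap<>(); // insertion order = FIFO
    private long nextId = 0;

    synchronized long enqueue(T item) {
        long id = nextId++;
        slots.put(id, new Slot<>(item));
        return id;
    }

    // Claims the oldest item that is unclaimed or whose lease has expired; null if none.
    synchronized Map.Entry<Long, T> claim(long nowMillis, long leaseMillis) {
        for (Map.Entry<Long, Slot<T>> e : slots.entrySet()) {
            Slot<T> s = e.getValue();
            if (s.lockedUntil <= nowMillis) {
                s.lockedUntil = nowMillis + leaseMillis;
                return Map.entry(e.getKey(), s.item);
            }
        }
        return null;
    }

    // Only a successfully processed item is deleted; a crashed worker's item becomes claimable again.
    synchronized boolean ack(long id) {
        return slots.remove(id) != null;
    }
}
```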
Suppose I have a hash set of request IDs that I've sent from a client to a server. The server's response returns the request ID that I sent, which I can then remove from the hash set. This will run in a multithreaded fashion, so multiple threads can add IDs to and remove IDs from the hash set. However, since the generated IDs are unique (they come from a thread-safe source, say an AtomicInteger that is incremented for each new request), does the HashSet need to be a ConcurrentHashSet?
I would think the only case where this might cause a problem is if the HashSet encounters collisions that require changes to the underlying data structure, but it doesn't seem like this would occur in this use case.
Yes, because the underlying array of the hash table might need to be resized, for instance, and because distinct IDs can of course still collide in the same bucket. So having different keys will not help at all.
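In Java there is no class literally named ConcurrentHashSet; since Java 8, ConcurrentHashMap.newKeySet() returns a thread-safe Set backed by a ConcurrentHashMap. A minimal sketch of the request tracker from the question, with PendingRequests as a hypothetical name:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical tracker for in-flight request IDs, safe to call from many threads.
class PendingRequests {
    private final AtomicInteger nextId = new AtomicInteger();
    // Thread-safe Set view backed by a ConcurrentHashMap.
    private final Set<Integer> pending = ConcurrentHashMap.newKeySet();

    int send() {
        int id = nextId.incrementAndGet(); // unique across threads
        pending.add(id);
        return id;
    }

    // Returns true if the ID was outstanding and is now removed.
    boolean onResponse(int id) {
        return pending.remove(id);
    }

    int outstanding() {
        return pending.size();
    }
}
```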
However, since you know that the IDs are increasing, if you can put an upper bound on the number of outstanding IDs (let's say 1000), you can work with a lower and an upper bound and a fixed-size array indexed by offset from the lowest key, in which case you will not need any mutexes or concurrent data structures. Such a data structure is very fragile, however: if you ever have more IDs outstanding than your upper bound, all hell will break loose. So unless performance is a concern, just use a concurrent hash set.
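A sketch of the fixed-size-array alternative, assuming at most capacity requests are ever outstanding. It indexes by ID modulo the capacity, a ring-buffer variant of offsetting from the lowest key. One caveat: in Java, plain array writes from different threads are not guaranteed to be visible to each other, so this version uses java.util.concurrent.atomic.AtomicIntegerArray, which is lock-free. BoundedPendingRequests is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical bounded tracker: assumes at most `capacity` requests are outstanding at once.
class BoundedPendingRequests {
    private final int capacity;
    private final AtomicIntegerArray slots; // 1 = outstanding, 0 = free

    BoundedPendingRequests(int capacity) {
        this.capacity = capacity;
        this.slots = new AtomicIntegerArray(capacity);
    }

    private int slot(int id) {
        return Math.floorMod(id, capacity);
    }

    // Returns false if the slot is still occupied, i.e. the upper bound was exceeded.
    boolean markSent(int id) {
        return slots.compareAndSet(slot(id), 0, 1);
    }

    boolean markResponded(int id) {
        return slots.compareAndSet(slot(id), 1, 0);
    }

    boolean isPending(int id) {
        return slots.get(slot(id)) == 1;
    }
}
```

The compareAndSet in markSent fails when the slot is still occupied, which is exactly the "more than your upper bound outstanding" case; it detects that situation but cannot recover from it.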
I'm implementing a cache in Java, and I have one last problem to solve: how should I handle deleting elements?
Elements are stored on disk. Each element has a validity period (and therefore an expiration date) and a size, and my cache obviously has a maximum total size and a maximum number of elements that can be stored.
I have come up with three ways to delete elements:
1. When a new element is inserted into the cache, a scheduled thread (one per element) is set to start at the expiration time and delete the element.
2. Run a thread every X minutes that checks which elements can be deleted and deletes them.
3. When a limit (size or count) is reached, delete the oldest elements (or delete elements randomly, which is faster).
Regarding the third point: with this policy the cache will also keep storing expired elements. Obviously, when one of them is requested, a check is performed to see whether the element is still valid.
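Options 2 and 3 are often combined with a lazy validity check on read. A minimal sketch, assuming an in-memory ConcurrentHashMap index (the on-disk storage and the size limits are left out): get() treats an expired entry as absent, and sweep() is meant to be run by one scheduled task for the whole cache rather than one thread per element. ExpiringCache and its methods are hypothetical; the current time is passed in explicitly to keep the sketch testable.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical expiring cache: lazy check on read plus a periodic sweep.
class ExpiringCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAt; // absolute expiry time in ms
        Entry(T value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();

    void put(K key, V value, long ttlMillis, long nowMillis) {
        map.put(key, new Entry<>(value, nowMillis + ttlMillis));
    }

    // Lazy check: an expired entry is treated as absent and removed on the spot.
    V get(K key, long nowMillis) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        if (e.expiresAt <= nowMillis) {
            map.remove(key, e); // removes only this exact entry, not a concurrent replacement
            return null;
        }
        return e.value;
    }

    // Periodic sweep: one task for the whole cache; returns how many entries were removed.
    int sweep(long nowMillis) {
        int removed = 0;
        for (Iterator<Entry<V>> it = map.values().iterator(); it.hasNext(); ) {
            if (it.next().expiresAt <= nowMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

The sweep could be scheduled with a single ScheduledExecutorService, e.g. scheduleAtFixedRate(() -> cache.sweep(System.currentTimeMillis()), x, x, TimeUnit.MINUTES).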
What do you think? What is the usual approach when managing a cache? Are there other solutions?
P.S. I'm developing this cache for Android, but I don't think that matters much.
Basically, you have to know how often your cached elements will be used, and in which order. A cache has to solve the same problem an OS does when deciding which data to keep in memory.
Have a look at these strategies and pick the one you need: http://en.wikipedia.org/wiki/Page_replacement_algorithm
A good choice would be LRU (Least Recently Used). But like all these strategies it has weaknesses, which may make it unsuitable for your use case.
Implementation tips for LRU:
Use a PriorityQueue to store the elements in addition to your map. Keep it updated with a global counter that is incremented every time you use one of your elements, and reinsert the corresponding element into the PriorityQueue with the current counter value (removing its old queue entry first).
When you need to evict an item, remove the head of the queue; write your compareTo(...) so that the smallest counter value, i.e. the least recently used element, comes first. Remove that element from the map as well.
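A minimal, single-threaded sketch of the counter-plus-PriorityQueue scheme above, assuming eviction by entry count only. CounterLruCache is a hypothetical name; wrap its methods in synchronized for multithreaded use. Note that PriorityQueue.remove(Object) is linear in the queue size, so each access costs O(n).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical LRU cache: a map for lookup plus a PriorityQueue ordered by last-use counter.
class CounterLruCache<K, V> {
    private static final class Node<K, V> implements Comparable<Node<K, V>> {
        final K key;
        final V value;
        final long lastUse; // global counter value at the most recent use
        Node(K key, V value, long lastUse) { this.key = key; this.value = value; this.lastUse = lastUse; }

        @Override
        public int compareTo(Node<K, V> other) {
            return Long.compare(lastUse, other.lastUse); // least recently used at the head
        }
    }

    private final Map<K, Node<K, V>> map = new HashMap<>();
    private final PriorityQueue<Node<K, V>> queue = new PriorityQueue<>();
    private final int maxEntries;
    private long counter = 0; // global use counter

    CounterLruCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    V get(K key) {
        Node<K, V> node = map.get(key);
        if (node == null) return null;
        touch(key, node.value);
        return node.value;
    }

    void put(K key, V value) {
        touch(key, value);
        while (map.size() > maxEntries) {
            Node<K, V> eldest = queue.poll(); // least recently used element
            map.remove(eldest.key);
        }
    }

    // Reinsert with the current counter value, dropping the stale queue entry first.
    private void touch(K key, V value) {
        Node<K, V> old = map.get(key);
        if (old != null) queue.remove(old); // O(n) in a PriorityQueue
        Node<K, V> fresh = new Node<>(key, value, ++counter);
        map.put(key, fresh);
        queue.add(fresh);
    }
}
```

For plain LRU, the JDK also offers java.util.LinkedHashMap constructed with accessOrder set to true and an overridden removeEldestEntry, which keeps access order with O(1) updates.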
I'm hoping for some advice or suggestions on how best to handle multi-threaded access to a value store.
My local value storage is designed to hold onto objects which are currently in use. If the object is not in use then it is removed from the store.
A value is pumped into my store via thread1, its entry into the store is announced to listeners, and the value is stored. Values coming in on thread1 will either be totally new values or updates for existing values.
A timer periodically removes from the store any value that is not currently in use, so all that remains of that value is its ID, held locally by an intermediary.
Now, an active element on thread2 may wake up and try to access a set of values by passing a set of value IDs which it knows about. Some values will be stored already (great) and some may not (sadface). Those values which are not already stored will be retrieved from an external source.
My main issue is that items that have not been stored yet and are currently being queried for may arrive on thread1 before the query completes.
I'd like to try and avoid locking access to the store whilst a query is being made as it may take some time.
It seems that you are looking for some sort of cache. Have you looked into existing cache implementations? Maybe one of them will do.
For example, Guava's cache implementations seem to cover a lot of your requirements (see http://code.google.com/p/guava-libraries/wiki/CachesExplained).
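If adding Guava is not an option, the race described in the question (a value arriving on thread1 while thread2's external query is still running) can be sketched with one CompletableFuture per ID in a ConcurrentHashMap, so no lock is held during the external lookup. This assumes that a value arriving on thread1 should win over a result fetched externally; ValueStore, externalFetch and evictIfDone are hypothetical names, and listener notification is left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical value store: one future per ID, completed by whichever source answers first.
class ValueStore<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> values = new ConcurrentHashMap<>();
    private final Function<K, CompletableFuture<V>> externalFetch; // hypothetical async lookup

    ValueStore(Function<K, CompletableFuture<V>> externalFetch) {
        this.externalFetch = externalFetch;
    }

    // thread1: a new or updated value arrives.
    void onArrival(K id, V value) {
        CompletableFuture<V> existing = values.putIfAbsent(id, CompletableFuture.completedFuture(value));
        if (existing == null) return;       // first time this ID is seen
        if (!existing.complete(value)) {    // true = answered an in-flight query
            values.put(id, CompletableFuture.completedFuture(value)); // already had a value: update it
        }
    }

    // thread2: look up a value, starting an external fetch only if nobody else has.
    CompletableFuture<V> get(K id) {
        CompletableFuture<V> fresh = new CompletableFuture<>();
        CompletableFuture<V> existing = values.putIfAbsent(id, fresh);
        if (existing != null) return existing;
        // The fetch runs outside any lock; a later, losing completion is simply ignored.
        externalFetch.apply(id).whenComplete((v, err) -> {
            if (err != null) fresh.completeExceptionally(err);
            else fresh.complete(v);
        });
        return fresh;
    }

    // Timer: evict only completed values, so queries still in flight are not lost.
    void evictIfDone(K id) {
        values.computeIfPresent(id, (k, f) -> f.isDone() ? null : f);
    }
}
```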