I am writing a multithreaded RabbitMQ client in Java that will be processing files. However, I need a fast, large cache, mostly a read-only list of Maps. Data will be pulled from an SQL server on request, but I also want the cache to have an LRU algorithm built in.
I've found a half-functional site, http://cacheonix.org, that seems to deliver what I want; however, its download page doesn't work properly.
Do you have any hints?
I think most of my uses will be satisfied by a LinkedHashMap combined with an LRU caching mechanism/wrapper, but I wanted to ask first.
https://docs.oracle.com/javase/6/docs/api/java/util/LinkedHashMap.html
If you want to get your hands dirty and implement an LRU cache yourself, LinkedHashMap may be a good choice.
LinkedHashMap can iterate over its elements in insertion order or in access order (the default is insertion order); the ordering is maintained by a doubly-linked list.
In access-order mode, every get or put moves the accessed entry to the tail of the doubly-linked list.
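A quick way to see the two orderings is the three-argument LinkedHashMap constructor, whose last parameter selects access order. A minimal sketch (the keys and values are arbitrary):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AccessOrderDemo {
    /** Inserts a, b, c, reads a, and returns the resulting iteration order. */
    static List<String> orderAfterAccess(boolean accessOrder) {
        // Third constructor argument: true = access order, false (the default) = insertion order.
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, accessOrder);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a"); // in access order, this moves "a" to the tail of the linked list
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(orderAfterAccess(false)); // prints [a, b, c]
        System.out.println(orderAfterAccess(true));  // prints [b, c, a]
    }
}
```

The head of the list in access order is therefore always the least recently used entry, which is exactly what an LRU cache needs to evict.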
What's more, LinkedHashMap exposes the protected method below, which you can override in a subclass to define your own policy for removing the eldest entry:
protected boolean removeEldestEntry(Map.Entry<K,V> eldest) {
    // Default implementation: never remove the eldest entry, so the map grows without bound.
    return false;
}
With access order and the ability to remove the eldest entry, you can build your own LRU cache.
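A minimal sketch of such a cache, assuming a fixed maximum number of entries (LruCache and maxEntries are illustrative names, not part of any library):

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal LRU cache built on LinkedHashMap in access-order mode.
 * Not thread-safe: callers must synchronize externally.
 */
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        // accessOrder = true: get/put move the entry to the tail (most recently used)
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked by put/putAll after an insertion; returning true evicts the head,
        // which in access order is the least recently used entry.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "A");
        cache.put("b", "B");
        cache.get("a");      // "a" becomes the most recently used entry
        cache.put("c", "C"); // evicts "b", the least recently used entry
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```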
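BTW, LinkedHashMap is not thread-safe, so you have to synchronize access yourself, for example by guarding every get and put with a lock. In access-order mode this applies to reads too: get() reorders the linked list, so the LinkedHashMap javadoc counts it as a structural modification.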
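One simple way to get thread safety, sketched here under the assumption that a single lock around the whole map is acceptable for your throughput, is to wrap the LRU map with Collections.synchronizedMap (SynchronizedLru and create are illustrative names):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class SynchronizedLru {
    /** Returns a thread-safe LRU map holding at most maxEntries entries. */
    public static <K, V> Map<K, V> create(int maxEntries) {
        Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
        // Every method of the wrapper synchronizes on one mutex, so get(), which
        // reorders the linked list in access-order mode, is also protected.
        return Collections.synchronizedMap(lru);
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> cache = create(100);
        Runnable worker = () -> {
            for (int i = 0; i < 10_000; i++) {
                cache.put(i % 500, i);
                cache.get((i * 7) % 500);
            }
        };
        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(cache.size()); // prints 100
    }
}
```

Iterating over keySet() or entrySet() of the wrapper still requires holding `synchronized (cache)` manually, and a single global lock will serialize all readers, which may matter for a heavily multithreaded consumer.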
Related
I'm trying to remove an entry from the Caffeine cache manually. I have two attempts but I suspect that there are some problems with both of them:
This one seems like it could suffer from a race condition.
cache.get(key);
cache.invalidate(key);
This one seems to bypass the methods of the cache itself, so I'm not sure whether it causes strange side effects.
cache.asMap().remove(key);
Is there a standard way to do this or a method of the cache that I'm missing here?
You should use cache.asMap().remove(key), as you suspected. The other call, cache.invalidate(key), delegates to this but does not return the value, because that is not idiomatic for a cache.
The Cache interface is opinionated for how one should commonly use a cache, while the asMap() view is more raw to allow for advanced operations. For example, you generally wouldn't iterate over a cache (e.g. memcached doesn't allow this), but if you need to then the Map provides that support. All calls flow into the same backing structure, so there will be no inconsistency. The APIs merely try to nudge users towards best practices, but strive to not block a developer from getting their work done safely and correctly.
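Because Cache.asMap() exposes a ConcurrentMap, the race in the first attempt is the ordinary check-then-act problem of any concurrent map. The sketch below uses a plain ConcurrentHashMap as a stand-in so it runs without Caffeine on the classpath; the assumption is that the same ConcurrentMap methods behave the same way on cache.asMap():

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AtomicRemoveDemo {
    /** Racy: another thread may replace or remove the value between get and remove. */
    static <K, V> V getThenRemove(ConcurrentMap<K, V> map, K key) {
        V value = map.get(key);
        map.remove(key); // may remove a value different from the one just read
        return value;
    }

    /** Atomic: returns exactly the value that was removed, or null if absent. */
    static <K, V> V removeAtomically(ConcurrentMap<K, V> map, K key) {
        return map.remove(key);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
        map.put("k", "v1");
        System.out.println(removeAtomically(map, "k")); // prints v1
        System.out.println(map.containsKey("k"));       // prints false

        // Conditional removal: only removes if the key still maps to the expected value.
        map.put("k", "v2");
        System.out.println(map.remove("k", "stale"));   // prints false
        System.out.println(map.remove("k", "v2"));      // prints true
    }
}
```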
I'm developing a RESTful web service using Jersey, and I'm going to use a simple object cache that refreshes its data on a timer by retrieving records from a database. I plan on using a HashMap to store the retrieved data in this cache, but my question is: what is the best practice for updating this HashMap?
Right now my options are: make the HashMap reference volatile and, whenever an update comes in, create a new HashMap and assign it once it is complete; wrap the HashMap in a synchronized block; or use a ReentrantReadWriteLock while updating the HashMap directly. I have also thought about using ConcurrentHashMap, which seems to have some performance benefits. Are there any significant performance differences and/or drawbacks to using one of these approaches over the others?
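The first option (a volatile reference to a map that is rebuilt and swapped in whole) can be sketched as follows. This is a minimal sketch assuming the timer thread is the only writer; SnapshotCache and refresh are hypothetical names, and loading the rows from the database is left out:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SnapshotCache {
    // volatile guarantees readers see the fully built map once the reference is published
    private volatile Map<String, String> snapshot = Collections.emptyMap();

    /** Readers never block; each call sees one complete snapshot. */
    public String get(String key) {
        return snapshot.get(key);
    }

    /** Called by the refresh timer: build the new map off to the side, then swap it in. */
    public void refresh(Map<String, String> freshRows) {
        Map<String, String> next = new HashMap<>(freshRows);
        snapshot = Collections.unmodifiableMap(next);
    }

    public static void main(String[] args) {
        SnapshotCache cache = new SnapshotCache();
        Map<String, String> rows = new HashMap<>();
        rows.put("42", "record-42"); // hypothetical row loaded from the database
        cache.refresh(rows);
        System.out.println(cache.get("42")); // prints record-42
    }
}
```

The trade-off of this design is that reads are lock-free and never see a half-updated map, while each refresh pays for building a complete copy.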
Also, when a user updates or inserts a record through my web service API, is it best practice to update the local cache directly once I save the record to the DB, or to force the cache to do another full data retrieval?
Instead of a HashMap, consider using Guava's cache, which is much more configurable and documents your intention to cache data. Read this example.
Use ConcurrentLinkedHashMap; it uses the LIRS algorithm.
ConcurrentLinkedHashMap implementation is a tweaked version of the ConcurrentHashMap implementation originally coded by Doug Lea and found on OpenJDK 1.6.0_0. We present a concurrent hash map and linked list implementation of the ConcurrentMap interface, with predictable iteration order. This implementation differs from ConcurrentHashMap in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order). Note that insertion order is not affected if a key is re-inserted into the map.
I need to update a whole collection concurrently in a background thread, but read operations may take place at the same time. When I benchmark it, updating the collection takes about 3 seconds. Is there any way to lock a collection while updating it? I tried creating a new collection, inserting all the documents into it, and renaming it to the original collection with "dropTarget: true", but I am not sure how safe and stable that is in terms of sharding. I read that renameCollection is incompatible with sharding.
It would be great if someone could suggest a good approach.
Thanks.
You presented two possible strategies to update your collection: one inline, with a lock on it, and the other with a temporary collection.
As the MongoDB documentation clearly states, renameCollection does not work for sharded collections (http://docs.mongodb.org/manual/reference/command/renameCollection/). As I understand it, this means the collection you rename must not be sharded, and because the target collection is dropped before the actual rename, you will most likely lose any sharding information it had, so you would need to shard it again. I strongly discourage the two-collection approach, especially if you're sharding your data.
You would need to gather all the data from your sharded collection into one central, unsharded collection; once you're done with the update, you would need to rename the collection and shard it again. This causes a lot of I/O across your whole system, especially for the client doing the update.
Depending on your system architecture, if you have a single point of entry, you could simply keep a global flag indicating whether a collection update is currently running, and forbid other write operations while it is set.
If there are multiple entry points into your MongoDB, you might try $isolated, but this doesn't work with sharded collections either, and I'm not sure whether it still allows read operations; the documentation isn't very clear.
Is it strictly forbidden to write any data while the update is in progress? What type of updates do you perform? Can they influence each other, or would it be possible to allow concurrent writes?
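For the single-entry-point case, here is a minimal sketch assuming the entry point is a Java service that performs every write itself. It swaps the bare boolean flag for a ReentrantReadWriteLock, because checking a plain flag and then writing is itself racy; UpdateGate, tryWrite and runBulkUpdate are hypothetical names, and reads of the collection are not gated at all:

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * "Bulk update in progress" gate for a single entry point.
 * Ordinary writes share the read lock; the bulk update takes the write lock exclusively.
 */
public class UpdateGate {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /** Runs a normal write, or rejects it if a bulk update currently holds the gate. */
    public boolean tryWrite(Runnable write) {
        if (!lock.readLock().tryLock()) {
            return false; // bulk update running: reject (or queue/retry) the write
        }
        try {
            write.run();
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Runs the bulk update once all in-flight writes have finished. */
    public void runBulkUpdate(Runnable bulkUpdate) {
        lock.writeLock().lock();
        try {
            bulkUpdate.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        UpdateGate gate = new UpdateGate();
        // Hypothetical single-document write; no bulk update is running, so it is accepted.
        System.out.println(gate.tryWrite(() -> { })); // prints true
    }
}
```

This only helps if every writer really goes through the same process; with several entry points you are back to the MongoDB-level options below.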
I am trying to implement a servlet for GPS monitoring and am trying to create a simple cache, because I think it will be faster than an SQL request for every HTTP request. The simple scheme:
In the init() method, I read one point for each vehicle into a HashMap (key = vehicle id, value = location as JSON). After that, some requests read these points and some requests update them (each vehicle updates its own entry). Of course I want to minimize synchronization, so I read the javadoc:
http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)
If I am right, my task needs no synchronization at all, because I only do "not a structural modification", i.e. changing the value associated with a key that an instance already contains. Is that a correct statement?
Use ConcurrentHashMap: it doesn't rely on a single map-wide lock; reads are non-blocking, and updates use atomic operations and fine-grained locking.
Wrong. Adding an item to the hash map is a structural modification (and to implement a cache you must add items at some point).
Use java.util.concurrent.ConcurrentHashMap.
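Applied to the GPS scheme, a minimal sketch could look like this (VehicleLocationCache and its methods are hypothetical names; the JSON strings are placeholders):

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class VehicleLocationCache {
    // vehicle id -> last known location as a JSON string
    private final Map<Long, String> locations = new ConcurrentHashMap<>();

    /** Called once from init(): preload one point per vehicle. */
    public void preload(Map<Long, String> initial) {
        locations.putAll(initial);
    }

    /** Safe to call from any request thread; reads do not block. */
    public String lastLocation(long vehicleId) {
        return locations.get(vehicleId);
    }

    /** Safe to call concurrently, including for a vehicle that was not preloaded. */
    public void updateLocation(long vehicleId, String json) {
        locations.put(vehicleId, json);
    }

    public static void main(String[] args) {
        VehicleLocationCache cache = new VehicleLocationCache();
        cache.preload(Collections.singletonMap(1L, "{\"lat\":0.0,\"lon\":0.0}"));
        cache.updateLocation(1L, "{\"lat\":1.5,\"lon\":2.5}");
        cache.updateLocation(2L, "{\"lat\":3.0,\"lon\":4.0}"); // new key: structural change, still safe
        System.out.println(cache.lastLocation(1L)); // prints {"lat":1.5,"lon":2.5}
    }
}
```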
If all the entries are read into the HashMap in init() and are afterwards only read or modified, then yes, in theory the other threads do not need to synchronize, though problems can arise because, without synchronization, threads may keep seeing stale cached values, so ConcurrentHashMap would be better.
Perhaps, rather than implementing a cache yourself, use the simple implementation found in the Guava library.
Caching is not an easy problem, but it is a known one. Before starting, I would carefully measure whether you really do have a performance problem, and whether caching actually solves it. You may think it should, and you may be right. You may also be horrendously wrong depending on the situation ("Premature optimization is the root of all evil"), so measure.
That said, do not implement a cache yourself; use a library that does it for you. I personally have had good experience with Ehcache.
If I understand correctly, you have two types of request:
Read from cache
Write to cache (to update the value)
In this case, you may potentially try to write to the same map twice at the same time, which is what the docs are referring to.
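If all writes go through the same piece of code (e.g. an update method that can only be called from one thread), the writes cannot collide with each other, although readers on other threads still need some form of synchronisation (for example a ConcurrentHashMap or a volatile reference) to be guaranteed to see those writes.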
If your system is multi-threaded and you have more than one thread or piece of code that writes to the map, you will need to externally synchronise your map or use a ConcurrentHashMap.
For clarity, the reason you need synchronisation is this: if two threads both try to update the JSON value for the same key, who wins? The outcome is either left to chance, or it causes exceptions or, worse, buggy behaviour.
Any time you modify the same element from two threads, you need to synchronise on that code or, better still, use a thread-safe version of the data structure if that is applicable.
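A ConcurrentHashMap makes concurrent puts safe, but a plain put still means "last write wins". If the winner should follow a rule instead, an atomic merge (or compute) can decide it. A minimal sketch, assuming each GPS update carries a timestamp (NewestFixWins and Fix are hypothetical names):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class NewestFixWins {
    /** Hypothetical value type: a GPS fix with the time it was taken. */
    static final class Fix {
        final long timestampMillis;
        final String json;

        Fix(long timestampMillis, String json) {
            this.timestampMillis = timestampMillis;
            this.json = json;
        }
    }

    private final ConcurrentMap<Long, Fix> fixes = new ConcurrentHashMap<>();

    /** Atomically keeps whichever fix is newer, regardless of which thread arrives last. */
    public void offer(long vehicleId, Fix candidate) {
        fixes.merge(vehicleId, candidate,
                (current, incoming) ->
                        incoming.timestampMillis >= current.timestampMillis ? incoming : current);
    }

    public String latestJson(long vehicleId) {
        Fix f = fixes.get(vehicleId);
        return f == null ? null : f.json;
    }

    public static void main(String[] args) {
        NewestFixWins cache = new NewestFixWins();
        cache.offer(1L, new Fix(2_000L, "{\"lat\":2}"));
        cache.offer(1L, new Fix(1_000L, "{\"lat\":1}")); // an older fix arriving late is ignored
        System.out.println(cache.latestJson(1L)); // prints {"lat":2}
    }
}
```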
I am currently using a ConcurrentHashMap in my application but I need to add the ability to expire entries after a timeout period efficiently (expireAfterWrite) and notify a removal listener whenever an entry is removed.
I see that CacheBuilder can provide what I need, but I am hesitant to use it because my need is for a map, not a cache. I draw this distinction between a map and a cache because the Guava cache documentation says:
Generally, the Guava caching utilities are applicable whenever:
You are willing to spend some memory to improve speed.
You expect that keys will sometimes get queried more than once.
Your application would, in principle, work if every value was evicted from the cache immediately -- but you're trying to reduce duplicated work.
Specifically, the third bullet point does not hold for my application. I am storing values in the map/cache that I want to retrieve later (until they expire). Also, my keys generally get queried only once or twice, not often enough to see caching benefits. So, in a sense, my requirement is for a map, not a cache. Is it still a good idea to use CacheBuilder as a map that stores values which expireAfterWrite and provides removalListener capability? Does anybody know enough about the internals of the CacheBuilder implementation to offer advice?
EDIT: Of course MapMaker caching features are deprecated in favor of CacheBuilder, my bad. Don't hesitate to use it:
Cache<Key, Graph> graphs = CacheBuilder.newBuilder()
.concurrencyLevel(4) // read docs for more details
.expireAfterWrite(yourExpireTime, TimeUnit.MINUTES)
.build();
and then use Cache#asMap() if you want its view as a ConcurrentMap.
Use another utility from Guava - MapMaker. From docs:
A builder of ConcurrentMap instances having any combination of the following features:
keys or values automatically wrapped in weak or soft references
least-recently-used eviction when a maximum size is exceeded
time-based expiration of entries, measured since last access or last write
notification of evicted (or otherwise removed) entries
on-demand computation of values for keys not already present
(...)
The returned map is implemented as a hash table with similar performance characteristics to ConcurrentHashMap. It supports all optional operations of the ConcurrentMap interface. It does not permit null keys or values.