I am currently using a ConcurrentHashMap in my application but I need to add the ability to expire entries after a timeout period efficiently (expireAfterWrite) and notify a removal listener whenever an entry is removed.
I see that CacheBuilder can provide what I need, but I am hesitant to use it because my need is for a map, not a cache. I draw this distinction between a map and a cache because the Guava cache documentation says this:
Generally, the Guava caching utilities are applicable whenever:
You are willing to spend some memory to improve speed.
You expect that keys will sometimes get queried more than once.
Your application would, in principle, work if every value was evicted from the cache immediately -- but you're trying to reduce duplicated work.
Specifically, the third bullet point is not okay in my application. I am storing values in the map/cache that I want to retrieve later (until they expire). Also, my keys generally get queried only once or twice, not often enough to see caching benefits. So you see, my requirement is for a map, not a cache in that sense. Is it still a good idea to use CacheBuilder as a map to store values that will expireAfterWrite and provide removalListener capability? Does anybody know enough about the internals of the CacheBuilder implementation to offer advice?
EDIT: Of course MapMaker caching features are deprecated in favor of CacheBuilder, my bad. Don't hesitate to use it:
Cache<Key, Graph> graphs = CacheBuilder.newBuilder()
    .concurrencyLevel(4) // read docs for more details
    .expireAfterWrite(yourExpireTime, TimeUnit.MINUTES)
    .build();
and then use Cache#asMap() if you want its view as a ConcurrentMap.
Use another utility from Guava - MapMaker. From docs:
A builder of ConcurrentMap instances having any combination of the
following features:
keys or values automatically wrapped in weak or soft references
least-recently-used eviction when a maximum size is exceeded
time-based expiration of entries, measured since last access or last write
notification of evicted (or otherwise removed) entries
on-demand computation of values for keys not already present
(...)
The returned map is implemented as a hash table with similar
performance characteristics to ConcurrentHashMap. It supports all
optional operations of the ConcurrentMap interface. It does not permit
null keys or values.
Related
I need to implement an LRU cache with an expiration time of 600s in Java. I searched and found the built-in LinkedHashMap class. It can remove the oldest elements when the size exceeds a limit, but it doesn't have an expiration time for elements.
What I can think of is to store a timestamp alongside each element when putting it into the cache. When retrieving an element, check its timestamp; if the timestamp is older than 600s, remove the element from the cache and return 'not-found'.
Any better ideas? Any built-in solutions or best practice? I'd like to avoid reinventing the wheel.
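For reference, the timestamp idea described in the question could be sketched roughly as below, using only the JDK. The class name ExpiringLruCache, its constructor parameters, and the overloads that take an explicit nowMillis are all made up for illustration; coarse synchronization is assumed to be acceptable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache whose entries also expire a fixed time after being written. */
public class ExpiringLruCache<K, V> {

    /** Pairs a value with the time (in milliseconds) at which it was stored. */
    private static final class Timestamped<T> {
        final T value;
        final long writtenAtMillis;

        Timestamped(T value, long writtenAtMillis) {
            this.value = value;
            this.writtenAtMillis = writtenAtMillis;
        }
    }

    private final long ttlMillis;
    private final Map<K, Timestamped<V>> entries;

    public ExpiringLruCache(final int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // accessOrder = true keeps the least recently used entry first in iteration order.
        this.entries = new LinkedHashMap<K, Timestamped<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timestamped<V>> eldest) {
                // Called after every insertion; evicts the LRU entry once the limit is exceeded.
                return size() > maxEntries;
            }
        };
    }

    public void put(K key, V value) {
        put(key, value, System.currentTimeMillis());
    }

    public V get(K key) {
        return get(key, System.currentTimeMillis());
    }

    // The overloads below take the current time explicitly so expiry can be exercised in tests.
    public synchronized void put(K key, V value, long nowMillis) {
        entries.put(key, new Timestamped<V>(value, nowMillis));
    }

    public synchronized V get(K key, long nowMillis) {
        Timestamped<V> entry = entries.get(key);
        if (entry == null) {
            return null; // never stored, or already evicted
        }
        if (nowMillis - entry.writtenAtMillis >= ttlMillis) {
            entries.remove(key); // expired: drop it lazily and report 'not-found'
            return null;
        }
        return entry.value;
    }

    public synchronized int size() {
        return entries.size();
    }
}
```

Usage would look like new ExpiringLruCache<String, Object>(1000, 600_000L). Note that in this sketch expired entries are only removed when they are read or pushed out by the size limit, so stale values can occupy slots until then.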
How about just using Guava cache?
It supports all of these,
A builder of LoadingCache and Cache instances having any combination
of the following features:
automatic loading of entries into the cache
least-recently-used eviction when a maximum size is exceeded
time-based expiration of entries, measured since last access or last write
keys automatically wrapped in weak references
values automatically wrapped in weak or soft references
notification of evicted (or otherwise removed) entries
accumulation of cache access statistics
I suggest not implementing it yourself and looking at already available implementations instead:
Guava Cache is a pretty decent option (it was already recommended, so I won't add a link here)
Caffeine A very nice cache implementation.
In case you want to know the difference between the two, read this thread in SO
I believe both will get you covered feature wise.
In addition, if you're using a framework like Spring, it integrates with these caches (later versions use Caffeine, older ones stick to Guava):
Spring Cache
I've gone through javax.cache.Cache to understand its usage and behavior. It's stated that,
JCache is a Map-like data structure that provides temporary storage of
application data.
JCache and HashMap both store their elements in local heap memory and don't have persistence behavior by default. By implementing a custom CacheLoader and CacheWriter we can achieve persistence. Other than that, when should I use it?
Caches usually have more management logic than maps, which are nothing but a more or less simple data structure.
Some concepts, JCaches may implement
Expiration: Entries may expire and get removed from the cache after a certain period of time or since last use
Eviction: elements get removed from the cache if space is limited. There can be different eviction strategies, e.g. LRU, FIFO, ...
Distribution: i.e. in a cluster, while Maps are local to a JVM
Persistence: Elements in the cache can be persistent and present after restart, contents of a Map are just lost
More Memory: Cache implementations may use more memory than the JVM Heap provides, using a technique called BigMemory where objects are serialized into a separately allocated bytebuffer. This JVM-external memory is managed by the OS (paging) and not the JVM
option to store keys and values either by value or by reference (in maps you have to handle this yourself)
option to apply security
Some of these are more general concepts of JCache, some are specific implementation details of cache providers
Here are the five main differences between both objects.
Unlike java.util.Map, Cache :
do not allow null keys or values. Attempts to use null will result in a java.lang.NullPointerException
provide the ability to read values from a javax.cache.integration.CacheLoader (read-through-caching) when a
value being requested is not in a cache
provide the ability to write values to a javax.cache.integration.CacheWriter (write-through-caching) when a
value being created/updated/removed from a cache
provide the ability to observe cache entry changes
may capture and measure operational statistics
Source : GrepCode.com
Some caching implementations can keep cached objects off heap (outside the reach of the GC). The GC keeps track of each and every object allocated in Java. Imagine you have millions of objects in memory. If those objects are not off heap, the GC can seriously hurt your application's performance.
I'm developing a RESTful web service using Jersey and I'm going to be using a simple object cache that updates its data by retrieving records from a database on a timer. I plan on using a HashMap to store my retrieved data in this cache, but my question is: what is the best practice for updating this HashMap?
Right now my choices are making the HashMap volatile and anytime an update comes in, create a new HashMap and then assign it when it completes. I could also wrap the HashMap in a synchronized block or use a ReentrantReadWriteLock while updating the HashMap variable directly. I have also thought about using ConcurrentHashMap which seems to have some performance benefits. Are there any significant performance differences and/or drawbacks in using one of these approaches over the other?
Also, when a user updates or inserts a record through my web service API, is it best practice to update the local cache directly once I save the record in the DB or force the cache to do another big data retrieval?
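To make the first option in the question concrete, here is a minimal sketch of the "volatile reference, build a new map, then assign it" approach. The class name SnapshotCache and its methods are hypothetical, and it assumes reads vastly outnumber writes, since every single-record update copies the whole map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical read-mostly cache that is refreshed by swapping in a new immutable snapshot. */
public class SnapshotCache<K, V> {

    // volatile ensures readers see a fully built map once the new reference is assigned.
    private volatile Map<K, V> snapshot = Collections.emptyMap();

    /** Lock-free read against whichever snapshot is current. */
    public V get(K key) {
        return snapshot.get(key);
    }

    /** Full reload, e.g. called by the timer thread with freshly loaded records. */
    public synchronized void refresh(Map<K, V> freshData) {
        // Copy first, publish last: a map is never mutated after it becomes visible to readers.
        snapshot = Collections.unmodifiableMap(new HashMap<K, V>(freshData));
    }

    /** Single-record update, e.g. after saving one record to the database. */
    public synchronized void put(K key, V value) {
        // Copy-on-write: writers are serialized, readers keep using the old snapshot meanwhile.
        Map<K, V> next = new HashMap<K, V>(snapshot);
        next.put(key, value);
        snapshot = Collections.unmodifiableMap(next);
    }

    public int size() {
        return snapshot.size();
    }
}
```

With this shape, readers never block and never observe a half-updated map. If single-record writes are frequent, a ConcurrentHashMap would avoid the copy on every put.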
Instead of a HashMap, consider using Guava's cache, which is much more configurable and documents your intention to cache data. Read this example.
Use ConcurrentLinkedHashMap; it uses the LIRS algorithm.
ConcurrentLinkedHashMap implementation is a tweaked version of the ConcurrentHashMap implementation originally coded by Doug Lea and found on OpenJDK 1.6.0_0. We present a concurrent hash map and linked list implementation of the ConcurrentMap interface, with predictable iteration order. This implementation differs from ConcurrentHashMap in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order). Note that insertion order is not affected if a key is re-inserted into the map.
I want to implement a cache using Guava's caching mechanism.
I have a DB query which returns a map, I want to cache the entire map but let it expire after a certain amount of time.
I realize Guava caches work on a per-item basis: we provide a key, and the cache either returns the corresponding value or loads it.
Is there a way to use Guava to get everything, cache it, let it time out after a certain period of time, and then get everything again?
Many thanks
You can create an instance of Supplier<Map<K,V>> that fetches the entire map from the database, and then use Suppliers.memoizeWithExpiration to cache it.
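In Guava itself this should be a one-liner along the lines of Suppliers.memoizeWithExpiration(loader, 10, TimeUnit.MINUTES), where loader is your Supplier. To show what that does, here is a rough JDK-only stand-in, not Guava's actual implementation; the class name ExpiringSupplier and the injectable clock parameter are assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical JDK-only stand-in for the idea behind Guava's Suppliers.memoizeWithExpiration. */
public class ExpiringSupplier<T> implements Supplier<T> {

    private final Supplier<T> delegate; // e.g. the DB query that returns the whole map
    private final long ttlNanos;
    private final LongSupplier clock;   // nanosecond clock, injectable so expiry can be tested

    private T value;
    private long loadedAtNanos;
    private boolean loaded;

    public ExpiringSupplier(Supplier<T> delegate, long duration, TimeUnit unit) {
        this(delegate, duration, unit, System::nanoTime);
    }

    public ExpiringSupplier(Supplier<T> delegate, long duration, TimeUnit unit, LongSupplier clock) {
        this.delegate = delegate;
        this.ttlNanos = unit.toNanos(duration);
        this.clock = clock;
    }

    @Override
    public synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAtNanos >= ttlNanos) {
            // First call or the cached value is too old: fetch everything again in one go.
            value = delegate.get();
            loadedAtNanos = now;
            loaded = true;
        }
        return value;
    }
}
```

You would wrap the query once, for example new ExpiringSupplier<Map<K, V>>(() -> loadAllFromDb(), 10, TimeUnit.MINUTES) with loadAllFromDb being your own method, and call get() wherever the map is needed.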
Related:
Google Guava Supplier Example
http://google.github.io/guava/releases/snapshot/api/docs/com/google/common/base/Supplier.html
http://google.github.io/guava/releases/snapshot/api/docs/com/google/common/base/Suppliers.html
I need to update a whole collection in a background thread, but read operations might take place at the same time. In my benchmark it takes about 3 seconds to update the collection. Is there any way to lock a collection while updating it? I tried creating a new collection, inserting all the documents into it, and renaming it to the original collection with "dropTarget=true", but I am not sure how safe and stable this is in terms of sharding. I read that renameCollection is incompatible with sharding.
It would be great if someone could suggest a good approach.
Thanks.
So you presented two possible strategies to update your collection: one in place with a lock on it, and the other with a temporary collection?
As the MongoDB documentation clearly states, renameCollection will not work for sharded collections (http://docs.mongodb.org/manual/reference/command/renameCollection/). From my understanding this means that the collection you want to rename isn't sharded. Since you need to drop the other collection before you do the actual renaming, you'll most likely lose any previously kept sharding information, so you would need to reactivate the sharding. I strongly discourage the two-collection approach, especially if you're sharding your data.
You would need to get all the data from your sharded collection and store it centrally; once you're done updating, you need to rename the collection and shard it again. This will cause a lot of I/O for your whole system, especially for the client doing the update.
Depending on your system architecture (with a single point of entry), you could simply hold a global flag that tells you whether the collection update is currently running, and forbid other write operations while it is set.
For multi-entry points into your MongoDB you might try $isolated, but this doesn't work with sharded collections. And I'm not sure if it allows read operations, the documentation isn't very clear.
Is it strictly disallowed to write any data while the update is in progress? What type of updates do you perform? Can they influence each other? Or would it be possible to have concurrent writes?