Guava Cache CacheLoader.refreshAfterWrite() and .expireAfterAccess() in combination

We are using a Guava LoadingCache which is built with a CacheLoader.
We are looking for a cache that refreshes its content regularly, but also expires keys after a given (longer) time frame if they are no longer accessed.
Is it possible to use .refreshAfterWrite(30, TimeUnit.SECONDS) and also .expireAfterAccess(10, TimeUnit.MINUTES) on the same cache?
My experience is that the keys are never evicted because of the regular reload through refreshAfterWrite. The documentation leaves me a little uncertain about this point.

This should behave as you desire. From the CacheBuilder docs:
Currently automatic refreshes are performed when the first stale request for an entry occurs. The request triggering refresh will make a blocking call to CacheLoader.reload(K, V) and immediately return the new value if the returned future is complete, and the old value otherwise.
So if a key is queried 30 seconds after its last write, it will be refreshed; if it is not queried for 10 minutes after its last access, it will become eligible for expiration without being refreshed in the meantime.
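To make that timeline concrete, here is a toy model of the two rules in plain Java. It is only a sketch and not Guava's implementation: the class RefreshExpireModel, its injectable clock, and the loader are names made up for illustration, and the model ignores the concurrency and asynchronous reloads that the real cache handles. The Guava configuration being modeled is CacheBuilder.newBuilder().refreshAfterWrite(30, TimeUnit.SECONDS).expireAfterAccess(10, TimeUnit.MINUTES).build(loader).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Toy model of "refresh after write" plus "expire after access"; not Guava code. */
class RefreshExpireModel<K, V> {
    private static final class Entry<V> {
        V value;
        long writtenAtMs;
        long accessedAtMs;
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long refreshAfterWriteMs;
    private final long expireAfterAccessMs;
    private final LongSupplier clock;     // injectable so a timeline can be simulated
    private final Function<K, V> loader;  // hypothetical stand-in for CacheLoader.load

    RefreshExpireModel(long refreshAfterWriteMs, long expireAfterAccessMs,
                       LongSupplier clock, Function<K, V> loader) {
        this.refreshAfterWriteMs = refreshAfterWriteMs;
        this.expireAfterAccessMs = expireAfterAccessMs;
        this.clock = clock;
        this.loader = loader;
    }

    /** Reads a key; every refresh and expiry decision is made here, on access. */
    synchronized V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry != null && now - entry.accessedAtMs >= expireAfterAccessMs) {
            entries.remove(key);  // idle for too long: drop it and load from scratch
            entry = null;
        }
        if (entry == null) {
            entry = new Entry<>();
            entry.value = loader.apply(key);
            entry.writtenAtMs = now;
            entries.put(key, entry);
        } else if (now - entry.writtenAtMs >= refreshAfterWriteMs) {
            entry.value = loader.apply(key);  // stale: this read triggers the refresh
            entry.writtenAtMs = now;
        }
        entry.accessedAtMs = now;
        return entry.value;
    }

    /** True if the key is present and not expired; does not count as an access. */
    synchronized boolean isLive(K key) {
        Entry<V> entry = entries.get(key);
        return entry != null && clock.getAsLong() - entry.accessedAtMs < expireAfterAccessMs;
    }
}

class RefreshExpireDemo {
    public static void main(String[] args) {
        long[] nowMs = {0};
        int[] loads = {0};
        RefreshExpireModel<String, String> cache = new RefreshExpireModel<String, String>(
            30_000, 600_000, () -> nowMs[0], key -> key + "#" + (++loads[0]));

        System.out.println(cache.get("a"));     // prints a#1 (initial load)
        nowMs[0] += 31_000;                     // 31 s later the entry is stale
        System.out.println(cache.get("a"));     // prints a#2 (refreshed by this read)
        nowMs[0] += 600_000;                    // 10 min without any read
        System.out.println(cache.isLive("a"));  // prints false (expired, never refreshed meanwhile)
    }
}
```

Because nothing reloads the entry while it is idle, the access timer keeps running and the key eventually expires, which is the behavior the question asks for.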

Related

Guava thread management when pre-loading and refreshing cache entries

We want to use guava cache for caching third party data to have better response times. The cache needs to be pre-loaded by making a sequence of api calls (~ 4000 api calls are to be made). The api response contains the cache key and its value. These api calls need to be made in parallel from multiple threads (i.e. thread pool) to speed up the cache loading. Each cache entry would have an expiry time. This can be set using expireAfterAccess() call.
After a cache entry expires, it needs to be refreshed automatically in the background. Also there should be a way (api) through which we can stop this background cache refresh so that we do not keep making api calls endlessly. We will call this api once we stop getting user requests after a configured time interval.
Is it possible to delegate the thread management for cache loading and refresh to guava? i.e. Given the api call, the code to map the json response to java object and cache key-value design, can guava perform the pre-loading and refresh on its own?
Thanks.

Automatic refreshing in Guava can be enabled via CacheBuilder.refreshAfterWrite(). The relevant semantics are described as:
Specifies that active entries are eligible for automatic refresh once a fixed duration has elapsed after the entry's creation, or the most recent replacement of its value. [ ... ] Currently automatic refreshes are performed when the first stale request for an entry occurs.
When overriding the method CacheLoader.reload() you can use a thread pool to load values asynchronously.
The problem with this behavior is that a few reads always return the stale value before the new value has been loaded (assuming the reload succeeds at all). An alternative cache implementation such as cache2k starts refreshing immediately once the duration has elapsed. The latter approach leads to more recent data, but possibly more needless reads. See some recent discussion about that here: https://github.com/ben-manes/caffeine/issues/261

How exactly does memcache remove expired data?

I am using com.google.appengine.api.memcache.MemcacheService to work with Google Memcache.
When storing a value, I set an expiration of 60 seconds. But the value is still returned from the cache even after several minutes.
The code for working with Memcache:
// Config
int expirationSeconds = 60;
Expiration expiration = Expiration.byDeltaSeconds(expirationSeconds);
MemcacheService memcache = MemcacheServiceFactory.getMemcacheService();
// Write operation
memcache.put(id, value, expiration);
// Read operation
memcache.get(id);
I expect the value to be absent in this case because the Memcache documentation says: "An expired item is removed when someone unsuccessfully tries to retrieve it."
Why is the expired value still returned from Memcache?

The documentation uses two words, evicted and removed, which could be read as interchangeable, but they are not:
By default, all values remain in the cache as long as possible, until evicted due to memory pressure, removed explicitly by the app, or made unavailable for another reason (such as an outage).
And in the note here we can see how the removal process works:
The actual removal of expired cache data is handled lazily. An expired item is removed when someone unsuccessfully tries to retrieve it.
In the same place, eviction is explained like this:
The value is evicted no later than this time, though it can be evicted earlier for other reasons. Incrementing the value stored for an existing key does not update its expiration time.
Eviction is something akin to soft removal where the value is unavailable but is still in the Memcache. Removal does the actual removal.
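The quoted sentence about lazy removal can be pictured with a small model in plain Java. This is only an illustration of the documented wording, not App Engine's Memcache code: the class LazyExpiryStore, its clock parameter, and the storedCount() helper are hypothetical names. In this model an expired item keeps occupying memory until someone tries to read it, but that read comes back empty.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Toy model of lazy expiry; an assumption-based sketch, not the Memcache service. */
class LazyExpiryStore<K, V> {
    private static final class Item<V> {
        final V value;
        final long expiresAtMs;

        Item(V value, long expiresAtMs) {
            this.value = value;
            this.expiresAtMs = expiresAtMs;
        }
    }

    private final Map<K, Item<V>> items = new HashMap<>();
    private final LongSupplier clock;  // injectable so expiry can be simulated

    LazyExpiryStore(LongSupplier clock) {
        this.clock = clock;
    }

    /** Stores a value that becomes unavailable ttlMs milliseconds from now. */
    synchronized void put(K key, V value, long ttlMs) {
        items.put(key, new Item<V>(value, clock.getAsLong() + ttlMs));
    }

    /** Returns null for a missing or expired key; the failed read deletes an expired item. */
    synchronized V get(K key) {
        Item<V> item = items.get(key);
        if (item == null) {
            return null;
        }
        if (clock.getAsLong() >= item.expiresAtMs) {
            items.remove(key);  // the actual removal happens only now
            return null;
        }
        return item.value;
    }

    /** Number of items physically stored, including expired ones nobody has read yet. */
    synchronized int storedCount() {
        return items.size();
    }
}
```

With a simulated clock, putting a value with a 60-second expiration and advancing the clock past that point leaves storedCount() at 1 until the first get(), which returns null and drops the count to 0.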

Forced Non-blocking refresh of Cache

I want a cache refresh policy where, if cache values become stale after 5 minutes, a refresh of the value is triggered at, say, the 4th minute, so that the new value is available at the 5th minute and any request that fetches data at the 5th minute neither has to wait for the load nor gets stale data.
With Guava, if I use refreshAfterWrite(4, TimeUnit.MINUTES) and expireAfterWrite(5, TimeUnit.MINUTES) together, I can solve this problem for keys that are frequently queried. However, I have a few keys that are very rarely queried, and for them the request might have to wait for the load.
Is there a solution to this problem?

What you are describing exists as a feature in Ehcache 2.x under the name scheduled refresh ahead, which uses the Quartz scheduler.
This feature allows you to schedule a refresh of (a part of) the key set present in the cache at regular intervals. Note that if you refresh the whole key set, this can put quite a load on your system, depending on the key set size and the time it takes to refresh all entries.

The documentation explains that the cache doesn't do any work in the background for you (and that this is actually a feature). So if your cache is not high-throughput enough to ensure all desired keys are always recent, you should create a task (e.g. using ScheduledExecutorService) which occasionally refreshes all your keys.
This might look like:
// Assumes 'scheduler' is a ScheduledExecutorService and 'cache' is a LoadingCache.
scheduler.scheduleAtFixedRate(
    () -> cache.asMap().keySet().forEach(cache::refresh), 4, 4, TimeUnit.MINUTES);

Is it possible to configure Guava Cache (or other library) behaviour to be: If time to reload, return previous entry, reload in background (see specs)

I'd like to have a cache that works like this:
A. If request is not cached: load and return results.
B. If request is cached, has not expired: return results.
C. If request is cached, has expired: return old results immediately, start to reload results (async)
D. If request is cached, has expired, reload is already running: return old results immediately.
E. If reloading fails (Exception): continue to return previous successful load results to requests.
(After a failed reload (case E), next request is handled following case C.)
(If case A ends in Exception, Exception is thrown)
Does anyone know an existing implementation, or will I have to implement it myself?

In cache2k I implemented exactly the behavior you described above. Here is how you build a cache with these features:
CacheBuilder.newCache(Key.class, Value.class)
.name("myCache")
.source(new YourSourceImplementation())
.backgroundRefresh(true)
.suppressExceptions(true)
.maxSize(7777) // maximum entries in the cache
.expiryDuration(60, TimeUnit.SECONDS)
.exceptionExpiryDuration(15, TimeUnit.SECONDS)
.build();
The expiryDuration is the time a value is considered valid after it was inserted or modified. The separate exceptionExpiryDuration setting is the time until the next refresh is attempted after an exception.
If an exception happens, but there is no valid entry, the exception is cached and rethrown for the exceptionExpiryDuration time.
You can also compute the expiry duration dynamically, e.g. based on the exception type. More information is in the blog entry About caching exceptions.
With backgroundRefresh, an entry is refreshed after it has expired. If no access happens within the expiry time after a refresh, the entry is not refreshed again and is eventually evicted.
Unfortunately, I am really behind in documenting all these useful features properly. If you have more questions, you can use the tag cache2k. If you would like some improvements, open an issue on GitHub.
The code has been working well in our production applications for about a year now. Actually, suppressExceptions is always the default. It helps a lot, e.g. during a short network outage.
BTW: Meanwhile, I subsume these semantics under the term cache resiliency.

https://github.com/ben-manes/caffeine
Caffeine provides exactly the behavior I want out of the box using refreshAfterWrite:
LoadingCache<K, V> cache = Caffeine.newBuilder()
.refreshAfterWrite(expireTime, timeUnit)
.maximumSize(maxCountOfItems)
.build(k->loader.load(k));
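For anyone who ends up implementing cases A to E by hand, the following is a minimal sketch in plain Java. It is not how cache2k or Caffeine work internally: the class StaleWhileReloadCache, the ttlMs parameter, and the injectable clock are illustrative assumptions, and there is no size limit or eviction.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Sketch of cases A to E from the question; names are illustrative assumptions. */
class StaleWhileReloadCache<K, V> {
    private static final class Entry<V> {
        volatile V value;
        volatile long loadedAtMs;
        final AtomicBoolean reloading = new AtomicBoolean(false);

        Entry(V value, long loadedAtMs) {
            this.value = value;
            this.loadedAtMs = loadedAtMs;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final long ttlMs;
    private final Executor reloadExecutor;
    private final LongSupplier clock;

    StaleWhileReloadCache(Function<K, V> loader, long ttlMs,
                          Executor reloadExecutor, LongSupplier clock) {
        this.loader = loader;
        this.ttlMs = ttlMs;
        this.reloadExecutor = reloadExecutor;
        this.clock = clock;
    }

    V get(K key) {
        // Case A: the first load is synchronous. If the loader throws, the
        // exception reaches the caller and nothing is cached.
        Entry<V> entry = entries.computeIfAbsent(
            key, k -> new Entry<V>(loader.apply(k), clock.getAsLong()));
        V current = entry.value;  // read before any reload can replace it
        boolean expired = clock.getAsLong() - entry.loadedAtMs >= ttlMs;
        // Case B: not expired, nothing to do.
        // Case D: expired but a reload is already running, so the CAS fails.
        if (expired && entry.reloading.compareAndSet(false, true)) {
            // Case C: this caller starts the reload and does not wait for it.
            try {
                reloadExecutor.execute(() -> reload(key, entry));
            } catch (RuntimeException rejected) {
                entry.reloading.set(false);  // executor refused the task; retry on a later get
            }
        }
        return current;
    }

    private void reload(K key, Entry<V> entry) {
        try {
            entry.value = loader.apply(key);
            entry.loadedAtMs = clock.getAsLong();
        } catch (RuntimeException e) {
            // Case E: keep the previous value. The entry stays expired,
            // so the next get() starts another reload (case C again).
        } finally {
            entry.reloading.set(false);
        }
    }
}
```

A typical setup would pass a thread pool as reloadExecutor and System::currentTimeMillis as the clock; the injectable clock exists only so the expiry logic can be exercised without sleeping.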

Guava loading cache - keep fresh until expiry

I have a specific use case with a LoadingCache in Guava.
Expire keys that have not been accessed in 30m
As long as a key is in the cache, keep it fresh irrespective of access
I've been able to get to these semantics only through the use of some external kludge.
https://gist.github.com/kashyapp/5309855
Posting here to see if folks can give better ideas.
Problems
refreshAfterWrite() is triggered only on access
cache.refresh() -> CacheLoader.reload()
updates timers for accessed/written even if we return oldValue
returning an immediateCancelledFuture() causes ugly logging
basically no way for reload() to say that nothing changed
Solution
set expireAfterAccess on the cache
schedule a refreshJob for every key using an external executor service
refreshJob.run() checks if the cache still has the key
(asMap().containsKey()) doesn't update access times
queries upstream, and does a cache.put() only if there is a changed value
Almost there
This is not exactly what I set out to do, but it is close enough. If upstream is not changing, un-accessed keys expire away. Keys that keep changing upstream do not get expired in the cache.
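The steps listed under Solution could look roughly like the sketch below. It is not the code from the gist: KeyRefreshJob and its parameters are hypothetical names, and the job is written against ConcurrentMap so that it can take the view returned by cache.asMap(). It relies on the post's observation that containsKey() on that view does not update access times.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Per-key refresh job; a sketch with illustrative names, not the gist's code. */
class KeyRefreshJob<K, V> implements Runnable {
    private final K key;
    private final ConcurrentMap<K, V> cacheView;  // e.g. the map returned by cache.asMap()
    private final Function<K, V> upstream;        // hypothetical call to the upstream source
    private final ScheduledExecutorService scheduler;
    private final long periodSeconds;
    private V lastSeen;  // last upstream value; runs never overlap, so no locking

    KeyRefreshJob(K key, V initialValue, ConcurrentMap<K, V> cacheView,
                  Function<K, V> upstream, ScheduledExecutorService scheduler,
                  long periodSeconds) {
        this.key = key;
        this.lastSeen = initialValue;
        this.cacheView = cacheView;
        this.upstream = upstream;
        this.scheduler = scheduler;
        this.periodSeconds = periodSeconds;
    }

    @Override
    public void run() {
        // Once the key has expired out of the cache, stop rescheduling.
        if (!cacheView.containsKey(key)) {
            return;
        }
        try {
            V fresh = upstream.apply(key);
            // Write only when the value changed, so an unchanged and
            // un-accessed key keeps ageing towards expiry.
            if (!Objects.equals(fresh, lastSeen)) {
                cacheView.put(key, fresh);
                lastSeen = fresh;
            }
        } catch (RuntimeException e) {
            // Upstream failure: keep the cached value and try again next period.
        }
        scheduler.schedule(this, periodSeconds, TimeUnit.SECONDS);
    }
}
```

One job would be scheduled per key when the key is first loaded, for example from CacheLoader.load(), with the freshly loaded value passed as initialValue.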
