I want a cache refresh policy where if cache values become stale after 5 minutes, I need to trigger refresh for the value at say 4th minute so that the new value is available at 5th minute and any request that tries to fetch data at 5th minute neither has to wait for the load nor gets stale data.
With Guava, if I use refreshAfterWrite(4, TimeUnit.MINUTES) and expireAfterWrite(5, TimeUnit.MINUTES) together, I can solve this problem for keys that are queried frequently. However, I have a few keys that are queried very rarely, and for those a request might still have to wait for the load.
Is there a solution to this problem?
What you are describing exists as a feature in Ehcache 2.x under the name scheduled refresh ahead, which uses the Quartz scheduler.
This feature allows you to schedule a refresh of (a part of) the key set present in the cache at regular intervals. Note that if you refresh the whole key set, this can put quite a load on your system, depending on the key set size and the time it takes to refresh all entries.
The documentation explains that the cache doesn't do any work in the background for you (and that this is actually a feature). So if your cache is not read frequently enough to keep all desired keys recent via refreshAfterWrite, you should create a task (e.g. using a ScheduledExecutorService) that periodically refreshes all your keys.
This might look like:
scheduler.scheduleAtFixedRate(
    () -> cache.asMap().keySet().forEach(cache::refresh), 4, 4, TimeUnit.MINUTES);
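Put together, a self-contained sketch might look like this; the loader, key names, and value format are illustrative stand-ins for the real data source:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class RefreshAheadCache {
    public static void main(String[] args) {
        // Hypothetical loader standing in for the real (slow) data source.
        LoadingCache<String, String> cache = CacheBuilder.newBuilder()
                .expireAfterWrite(5, TimeUnit.MINUTES)
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String key) {
                        return "value-for-" + key;
                    }
                });

        // Refresh every key currently present, one minute before expiry,
        // so even rarely-read keys stay warm.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> cache.asMap().keySet().forEach(cache::refresh),
                4, 4, TimeUnit.MINUTES);

        System.out.println(cache.getUnchecked("a")); // prints value-for-a
        scheduler.shutdown();
    }
}
```

Note that refreshing keys nobody reads also keeps them from expiring, so this is only appropriate when you want the whole key set warm.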
AWS newbie here.
I have a DynamoDB table and 2+ nodes of Java apps reading/writing from/to it. My use case is as follow: the app should fetch N numbers of items every X seconds based on a timestamp, process them, then remove them from the DB. Because the app may scale, other nodes might be reading from the DB in the same time and I want to avoid processing the same items multiple times.
The question is: is there any way to implement something like a poll() method that fetches an item and immediately removes it (as an atomic operation), as if the table were a queue? As far as I checked, the delete-item methods that DynamoDBMapper offers do not return the removed item's data.
Consistency is a weak spot of DDB, but that's the price to pay for its scalability.
You said it yourself, you're looking for a queue, so why not use one?
I suggest:
Create a lambda that:
Reads the items
Publishes them to an SQS FIFO queue with message deduplication
Deletes the items from the DB
Create an EventBridge schedule to run the Lambda every n minutes
Have your nodes poll that queue instead of DDB
For this to work you have to consider a few things regarding timings:
DDB will typically be consistent in under a second, but this isn't guaranteed.
SQS deduplication only works for 5 minutes.
EventBridge only supports minute level granularity, not seconds.
So you can run your Lambda as frequently as once a minute, and your nodes can poll the queue as frequently (or infrequently) as you like.
If you run your Lambda less frequently than every 5 minutes, there is technically a chance of processing an item twice, but this is very unlikely ever to happen (it could technically also happen if DDB took more than 10 minutes to become consistent, but again, extremely unlikely).
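The publish step of that Lambda might be sketched as follows with the AWS SDK for Java v1; the queue URL, item id, and message body are hypothetical placeholders. The deduplication id is keyed on the item id, so a repeated publish of the same item within SQS's 5-minute deduplication interval is silently dropped:

```java
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.SendMessageRequest;

public class QueuePublisher {
    // Placeholder URL; FIFO queue names must end in ".fifo".
    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/items.fifo";

    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        String itemId = "item-42"; // hypothetical DynamoDB item key

        sqs.sendMessage(new SendMessageRequest()
                .withQueueUrl(QUEUE_URL)
                .withMessageBody("{\"id\":\"" + itemId + "\"}")
                .withMessageGroupId("items")
                // Dedup id keyed on the item: a second publish of the same
                // item within 5 minutes is dropped by SQS.
                .withMessageDeduplicationId(itemId));
    }
}
```

After a successful publish, the Lambda would delete the item from the table; your nodes then consume from the queue only.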
My understanding is that you want to read and delete an item atomically; that is indeed not possible with DynamoDB.
However, what is possible is deleting the item and having its old value returned, which is closer to a delete-then-read. As you correctly pointed out, the Mapper client does not support ReturnValues, but the low-level clients do.
// Low-level client (AWS SDK for Java v1); "id" is the table's hash key name.
Map<String, AttributeValue> keyToDelete =
        Collections.singletonMap("id", new AttributeValue("214141"));
DeleteItemRequest dir = new DeleteItemRequest()
        .withTableName("ABC")
        .withKey(keyToDelete)
        .withReturnValues("ALL_OLD");
Map<String, AttributeValue> deleted = client.deleteItem(dir).getAttributes();
More info here DeleteItemRequest
We want to use Guava cache for caching third-party data to get better response times. The cache needs to be pre-loaded by making a sequence of API calls (~4000 of them). Each API response contains a cache key and its value. These API calls need to be made in parallel from multiple threads (i.e. a thread pool) to speed up cache loading. Each cache entry would have an expiry time; this can be set using expireAfterAccess().
After a cache entry expires, it needs to be refreshed automatically in the background. There should also be a way (an API) to stop this background refresh so that we do not keep making API calls endlessly. We will call this API once we stop getting user requests for a configured time interval.
Is it possible to delegate the thread management for cache loading and refresh to Guava? That is, given the API call, the code that maps the JSON response to a Java object, and the cache key-value design, can Guava perform the pre-loading and refresh on its own?
Thanks.
Automatic refreshing in Guava can be enabled via CacheBuilder.refreshAfterWrite(). The relevant semantics are described as:
Specifies that active entries are eligible for automatic refresh once a fixed duration has elapsed after the entry's creation, or the most recent replacement of its value. [...] Currently automatic refreshes are performed when the first stale request for an entry occurs.
By overriding CacheLoader.reload() you can use a thread pool to load values asynchronously.
The problem with this behavior is that you always get a few reads of stale values before the new value has been loaded (assuming the load succeeds). An alternative cache implementation, like cache2k, starts refreshing immediately once the duration has elapsed. The latter approach yields more recent data, but possibly more needless loads. See some recent discussion about that here: https://github.com/ben-manes/caffeine/issues/261
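A minimal sketch of asynchronous reloading using Guava's CacheLoader.asyncReloading() wrapper, with an illustrative loader and interval:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;

public class AsyncRefreshExample {
    public static void main(String[] args) {
        ListeningExecutorService pool =
                MoreExecutors.listeningDecorator(Executors.newFixedThreadPool(4));

        // asyncReloading wraps a loader so reload() runs on the pool
        // instead of blocking the reading thread.
        CacheLoader<String, String> loader =
                CacheLoader.asyncReloading(
                        CacheLoader.from((String key) -> "v:" + key), pool);

        LoadingCache<String, String> cache = CacheBuilder.newBuilder()
                .refreshAfterWrite(4, TimeUnit.MINUTES)
                .build(loader);

        System.out.println(cache.getUnchecked("k")); // prints v:k
        pool.shutdown();
    }
}
```

Until the asynchronous reload completes, reads keep returning the old value, which is exactly the stale-read window described above.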
We are using a Guava LoadingCache which is build by a CacheLoader.
What we are looking for, is a Cache which will refresh its content regularly, but also expires keys after a given (longer) timeframe, if the key is not accessed anymore.
Is it possible to use .refreshAfterWrite(30, TimeUnit.SECONDS) and also .expireAfterAccess(10, TimeUnit.MINUTES) on the same CacheLoader?
My experience is that the keys are never evicted because of the regular reload through refreshAfterWrite. The documentation leaves me a little uncertain about this point.
This should behave as you desired. From the CacheBuilder docs:
Currently automatic refreshes are performed when the first stale request for an entry occurs. The request triggering refresh will make a blocking call to CacheLoader.reload(K, V) and immediately return the new value if the returned future is complete, and the old value otherwise.
So if a key is queried 30 seconds after its last write, it will be refreshed; if it is not queried for 10 minutes after its last access, it will become eligible for expiration without being refreshed in the meantime.
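A small sketch of the two settings combined (durations and the load counter are illustrative); because refresh only happens on a stale read, an entry loaded once and read again immediately is not reloaded:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class RefreshAndExpire {
    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        LoadingCache<String, Integer> cache = CacheBuilder.newBuilder()
                .refreshAfterWrite(30, TimeUnit.SECONDS)   // refresh on first stale read
                .expireAfterAccess(10, TimeUnit.MINUTES)   // evict if never touched
                .build(CacheLoader.from((String key) -> loads.incrementAndGet()));

        System.out.println(cache.getUnchecked("k")); // prints 1 (loaded once)
        System.out.println(cache.getUnchecked("k")); // prints 1 (still fresh, no reload)
    }
}
```

Since refreshing requires a read, an entry that is never queried is never refreshed, and the access clock eventually expires it.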
I would like to use Ehcache for the following task:
There's a routine that can be executed only n times a day. Each time it's invoked, a counter in the database is decreased. When it reaches 0, this fact is denoted in a shared hash map (filed under the current date), and there's no need to contact the database until the end of day. The database counter is reset to n at midnight by an asynchronous task, the hash map does not have appropriate entry for the new date, and database polling resumes.
Now I'd like to implement this behaviour in Ehcache, because we use it already for other caches, and because I'd like to be able to turn off all caching in one place. This poses the following problems:
The condition for cache activation is known only inside the @Cacheable method (when it's discovered that the DB counter is zero). This probably rules out declarative cache specification, correct?
The time to live needs to be specified as a point in time, not as duration. Is this possible?
You will probably have to implement your own eviction policy if you want to do that with Ehcache.
see here : http://ehcache.org/documentation/apis/cache-eviction-algorithms#plugging-in-your-own-eviction-algorithm
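For the point-in-time requirement, one workaround is to translate "expire at midnight" into a duration computed at insertion time, since per-element time-to-live in Ehcache 2.x is duration-based (e.g. Element.setTimeToLive(int), if that matches your version). A plain-Java sketch of the computation:

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

public class MidnightTtl {
    // Seconds remaining until the next midnight, usable as a per-entry TTL.
    static long secondsUntilMidnight(LocalDateTime now) {
        LocalDateTime midnight = now.toLocalDate().plusDays(1).atStartOfDay();
        return Duration.between(now, midnight).getSeconds();
    }

    public static void main(String[] args) {
        LocalDateTime now =
                LocalDateTime.of(LocalDate.of(2023, 1, 1), LocalTime.of(23, 0));
        System.out.println(secondsUntilMidnight(now)); // prints 3600
    }
}
```

This way the entry written when the counter reaches zero naturally disappears at the date boundary, and database polling resumes.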
I have a specific use case with a LoadingCache in Guava.
Expire keys that have not been accessed in 30m
As long as a key is in the cache, keep it fresh irrespective of access
I've been able to get to these semantics only through the use of some external kludge.
https://gist.github.com/kashyapp/5309855
Posting here to see if folks can give better ideas.
Problems
refreshAfterWrite() is triggered only on access
cache.refresh() -> CacheLoader.reload()
updates timers for accessed/written even if we return oldValue
returning an immediateCancelledFuture() causes ugly logging
basically no way for reload() to say that nothing changed
Solution
set expireAfterAccess on the cache
schedule a refreshJob for every key using an external executor service
refreshJob.run() checks if the cache still has the key
(asMap().containsKey() doesn't update access times)
queries upstream, and does a cache.put() only if there is a changed value
Almost there
But this is not exactly what I set out to do; close enough, though. If upstream is not changing, then un-accessed keys expire away. Keys that are changing upstream do not get expired from the cache.
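The solution steps above can be sketched roughly as follows; loadUpstream, the key, and the interval are hypothetical stand-ins, and the job keeps its own record of the last value it pushed so the comparison never touches the cache's access clock:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class ExternalRefresh {
    // Last value pushed per key, kept outside the cache so that
    // change detection doesn't count as a cache access.
    static final Map<String, String> lastPushed = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        Cache<String, String> cache = CacheBuilder.newBuilder()
                .expireAfterAccess(30, TimeUnit.MINUTES)
                .build();
        cache.put("k", "v1");
        lastPushed.put("k", "v1");

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        Runnable refreshJob = () -> {
            // containsKey() does not update access times,
            // so keys nobody reads can still expire away.
            if (cache.asMap().containsKey("k")) {
                String fresh = loadUpstream("k"); // hypothetical upstream query
                if (!fresh.equals(lastPushed.put("k", fresh))) {
                    cache.put("k", fresh); // put() only when the value changed
                }
            }
        };
        scheduler.scheduleAtFixedRate(refreshJob, 1, 1, TimeUnit.MINUTES);
        // In real code, shut the scheduler down when the app stops.
        scheduler.shutdown();
    }

    static String loadUpstream(String key) {
        return "v1"; // stand-in for the real upstream call
    }
}
```

Because put() resets the entry's clock, keys that keep changing upstream never expire, which matches the observed behavior above.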