Google Guava cache: difference between invalidateAll() and cleanUp()

Say I have a Cache that is defined like this:
private static Cache<String, Long> alertsUIDCache =
        CacheBuilder.newBuilder().expireAfterAccess(60, TimeUnit.SECONDS).build();
From what I read (please correct me if I am wrong):
If a value is written to the Cache at 0:00, it should be moved to "ready to be evicted" status after 60 seconds. The actual removal of the value from the Cache will happen at the next cache modification (what exactly counts as a cache modification?). Is that right?
Also, I am not sure what the difference between the invalidateAll() and cleanUp() methods is. Can someone provide an explanation?

The first part is from this answer: How does Guava expire entries in its CacheBuilder?
I'm going to focus on expireAfterAccess, but the procedure for expireAfterWrite is almost identical. In terms of the mechanics, when you specify expireAfterAccess in the CacheBuilder, then each segment of the cache maintains a linked list access queue for entries in order from least-recent-access to most-recent-access. The cache entries are actually themselves nodes in the linked list, so when an entry is accessed, it removes itself from its old position in the access queue, and moves itself to the end of the queue.
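To make the consequence of that access queue concrete, here is a small illustrative sketch (class and key names are mine, not from the question): under expireAfterAccess, every read moves the entry to the most-recently-accessed end of the queue, restarting its expiry clock, so a regularly-read entry never expires.

import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class AccessResetsExpiry {
    public static void main(String[] args) throws InterruptedException {
        Cache<String, Long> cache = CacheBuilder.newBuilder()
                .expireAfterAccess(100, TimeUnit.MILLISECONDS)
                .build();

        cache.put("k", 1L);
        for (int i = 0; i < 5; i++) {
            Thread.sleep(50);                            // under the expiry window
            System.out.println(cache.getIfPresent("k")); // 1 -- each read restarts the clock
        }
        Thread.sleep(150);                               // no access this time
        System.out.println(cache.getIfPresent("k"));     // null -- entry expired
    }
}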
The second part is from this answer: Guava CacheLoader - invalidate does not immediately invalidate entry if both expireAfterWrite and expireAfterAccess are set
invalidate should remove the entry immediately -- not waiting for another query -- and should force the value to get reloaded on the very next query to that key.
cleanUp: Performs any pending maintenance operations needed by the cache. Exactly which activities are performed -- if any -- is implementation-dependent.
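To see the two operations side by side, here is a minimal sketch using a hand-rolled Ticker so expiry can be simulated without sleeping (ManualTicker and all names are illustrative): cleanUp() only performs pending maintenance, such as physically removing already-expired entries, while invalidateAll() discards every entry immediately, expired or not.

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

import com.google.common.base.Ticker;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class InvalidateVsCleanUp {
    // Hand-rolled ticker so the demo can "advance time" deterministically.
    static class ManualTicker extends Ticker {
        final AtomicLong nanos = new AtomicLong();
        void advance(long duration, TimeUnit unit) { nanos.addAndGet(unit.toNanos(duration)); }
        @Override public long read() { return nanos.get(); }
    }

    public static void main(String[] args) {
        ManualTicker ticker = new ManualTicker();
        Cache<String, Long> cache = CacheBuilder.newBuilder()
                .expireAfterAccess(60, TimeUnit.SECONDS)
                .ticker(ticker)
                .build();

        cache.put("a", 1L);
        ticker.advance(61, TimeUnit.SECONDS);        // "a" is now expired

        System.out.println(cache.getIfPresent("a")); // null: expired entries are never served
        System.out.println(cache.size());            // likely still 1: not yet physically removed
        cache.cleanUp();                             // perform the pending maintenance
        System.out.println(cache.size());            // 0

        cache.put("b", 2L);
        cache.invalidateAll();                       // removes immediately, no cleanup needed
        System.out.println(cache.getIfPresent("b")); // null
    }
}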
From the Guava documentation: https://github.com/google/guava/wiki/CachesExplained
Explicit Removals
At any time, you may explicitly invalidate cache entries rather than waiting for entries to be evicted. This can be done:
individually, using Cache.invalidate(key)
in bulk, using Cache.invalidateAll(keys)
to all entries, using Cache.invalidateAll()
When Does Cleanup Happen?
Caches built with CacheBuilder do not perform cleanup and evict values "automatically," or instantly after a value expires, or anything of the sort. Instead, it performs small amounts of maintenance during write operations, or during occasional read operations if writes are rare.
The reason for this is as follows: if we wanted to perform Cache maintenance continuously, we would need to create a thread, and its operations would be competing with user operations for shared locks. Additionally, some environments restrict the creation of threads, which would make CacheBuilder unusable in that environment.
Instead, we put the choice in your hands. If your cache is high-throughput, then you don't have to worry about performing cache maintenance to clean up expired entries and the like. If your cache does writes only rarely and you don't want cleanup to block cache reads, you may wish to create your own maintenance thread that calls Cache.cleanUp() at regular intervals.
If you want to schedule regular cache maintenance for a cache which only rarely has writes, just schedule the maintenance using ScheduledExecutorService.
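A minimal sketch of that suggestion follows; the 30-second interval and the daemon-thread setup are illustrative choices, not prescribed by Guava.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class ScheduledCleanup {
    public static void main(String[] args) {
        Cache<String, Long> cache = CacheBuilder.newBuilder()
                .expireAfterAccess(60, TimeUnit.SECONDS)
                .build();

        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(runnable -> {
                    Thread t = new Thread(runnable, "cache-cleanup");
                    t.setDaemon(true); // don't keep the JVM alive just for maintenance
                    return t;
                });

        // Run pending maintenance every 30 seconds, regardless of cache traffic.
        scheduler.scheduleAtFixedRate(cache::cleanUp, 30, 30, TimeUnit.SECONDS);
    }
}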

Related

Guava CacheBuilder not clearing cache

I use the CacheBuilder with expireAfterWrite(2000, TimeUnit.MILLISECONDS). I send 10000 requests to my program and I expect the cache to call the RemovalListener 10000 times, 2 seconds after each request. I do not observe this behaviour; instead the RemovalListener is called only 1 or 2 times.
Can someone please explain to me what CacheBuilder is doing, because as I explained above it is doing something totally different from what the Guava documentation describes.
In the same spirit, I use maximumSize(1000), and after sending my program 10000 requests I expect the RemovalListener to be called 9000 times. But it's called only 1 or 2 times.
How does this module work in reality?
EDIT
I explicitly call cleanUp() each time I receive a request.
The removal behavior is documented and works as expected:
When Does Cleanup Happen?
Caches built with CacheBuilder do not perform cleanup and evict values "automatically," or instantly after a value expires, or anything of the sort. Instead, it performs small amounts of maintenance during write operations, or during occasional read operations if writes are rare.
The reason for this is as follows: if we wanted to perform Cache maintenance continuously, we would need to create a thread, and its operations would be competing with user operations for shared locks. Additionally, some environments restrict the creation of threads, which would make CacheBuilder unusable in that environment.
Instead, we put the choice in your hands. If your cache is high-throughput, then you don't have to worry about performing cache maintenance to clean up expired entries and the like. If your cache does writes only rarely and you don't want cleanup to block cache reads, you may wish to create your own maintenance thread that calls Cache.cleanUp() at regular intervals.
If you want more control over the cache, and a dedicated executor to take care of calling RemovalListeners, use Caffeine -- a high-performance, near-optimal caching library based on Java 8 -- which has an API similar to Guava's Cache (same author). Caffeine has more advanced removal handling:
You may specify a removal listener for your cache to perform some operation when an entry is removed, via Caffeine.removalListener(RemovalListener). The RemovalListener gets passed the key, value, and RemovalCause.
Removal listener operations are executed asynchronously using an Executor. The default executor is ForkJoinPool.commonPool() and can be overridden via Caffeine.executor(Executor). When the operation must be performed synchronously with the removal, use CacheWriter instead.
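A short sketch of this API (names are illustrative; the single-thread executor stands in for whatever executor you prefer over the default ForkJoinPool.commonPool()):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.RemovalCause;

public class CaffeineRemovalDemo {
    public static void main(String[] args) {
        ExecutorService listenerExecutor = Executors.newSingleThreadExecutor();

        Cache<String, Long> cache = Caffeine.newBuilder()
                .expireAfterWrite(2, TimeUnit.SECONDS)
                .executor(listenerExecutor) // overrides ForkJoinPool.commonPool()
                .removalListener((String key, Long value, RemovalCause cause) ->
                        System.out.printf("removed %s=%s (%s)%n", key, value, cause))
                .build();

        cache.put("a", 1L);
        cache.invalidate("a");       // listener fires asynchronously, cause = EXPLICIT

        listenerExecutor.shutdown(); // let the demo JVM exit once the listener has run
    }
}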

Guava CacheBuilder doesn't call removalListener immediately after cache expiry

From my application logs, it looks like the removalListener is not called immediately after a cache key expires. This creates a problem in the scenario below.
Cache configuration:
SimpleCacheManager simpleCacheManager = new SimpleCacheManager();
GuavaCache cache = new GuavaCache("cacheData", CacheBuilder.newBuilder()
        .expireAfterAccess(10, TimeUnit.MINUTES)
        .removalListener(expiredCacheListener())
        .build());
In the application logic, I see that when cache.get(key) is called and there is no value (because the entry has expired under the expireAfterAccess() time limit), a new value is put in the cache for the same key.
Immediately after this write operation, I believe the removal listener invokes the expiredCacheListener() method, which contains logic that changes the value for the expired key... but this actually changes the new value!
Now I have a valid key with an incorrect value in the cache.
If a thread is able to mark a key as expired, shouldn't the same thread call the removalListener immediately? How can I solve this?
That's just how Guava Cache works, see CachesExplained:
When Does Cleanup Happen?
Caches built with CacheBuilder do not perform cleanup and evict values "automatically," or instantly after a value expires, or anything of the sort. Instead, it performs small amounts of maintenance during write operations, or during occasional read operations if writes are rare.
Read on to learn that the Guava authors "put the choice in your hands"; you're free to maintain cleanup threads yourself.
For more advanced cache use cases, use Caffeine, which "provides an in-memory cache using a Google Guava inspired API." Its Removal wiki page mentions that for synchronous removal listeners you could use CacheWriter.
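For reference, a minimal sketch of that synchronous route, assuming Caffeine 2.x (CacheWriter was removed in Caffeine 3.x, where evictionListener covers the synchronous case); names are illustrative:

import java.util.concurrent.TimeUnit;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.CacheWriter;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.RemovalCause;

public class SynchronousRemovalDemo {
    public static void main(String[] args) {
        Cache<String, Long> cache = Caffeine.newBuilder()
                .expireAfterAccess(10, TimeUnit.MINUTES)
                .writer(new CacheWriter<String, Long>() {
                    @Override
                    public void write(String key, Long value) {
                        // invoked atomically with writes; nothing to do here
                    }

                    @Override
                    public void delete(String key, Long value, RemovalCause cause) {
                        // invoked atomically with the removal, so it cannot
                        // clobber a newer value written for the same key
                        System.out.printf("deleted %s (%s)%n", key, cause);
                    }
                })
                .build();

        cache.put("k", 1L);
        cache.invalidate("k"); // delete() runs synchronously, cause = EXPLICIT
    }
}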

How to configure write behind with Caffeine cache?

I want to use Caffeine for caching and I need to have a write-behind. I want to limit the number of times I write to the database. The documentation speaks of a write-back cache, so it should be possible, but there is no example of how to configure it. I have implemented a CacheWriter, but I don't understand how to configure it to, for example, only call the writer once every 10 seconds (if something in the cache changed, of course).
CacheWriter is an extension point, and the documentation describes the use cases where it may make sense. Those cases are beyond the scope of the library, and a built-in implementation could have been too rigid.
The writer is called atomically during a normal write operation (but not a computation). This ensures that a sequential order of changes is observed for a given key. For write-behind the writer would add the entry into a queue that is processed asynchronously, e.g. to batch the operations.
When implementing this capability you might want to consider (a rough sketch follows this list):
Coalescing the updates (e.g. collect into a LinkedHashMap)
Performing a batch prior to a periodic write-behind if it exceeds a threshold size
Loading from the write-behind buffer if the operations have not yet been flushed (this avoids an inconsistent view, e.g. due to eviction)
Handling retries, rate limiting, and striping depending on the characteristics of the external resource
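A rough, hand-rolled sketch of such a write-behind buffer under stated assumptions: your CacheWriter.write() would call add(), Database and batchWrite() are hypothetical stand-ins for the external resource, and the 10-second flush interval mirrors the question.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class WriteBehindBuffer<K, V> {
    private final Map<K, V> buffer = new LinkedHashMap<>(); // coalesces by key, keeps order
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final Database<K, V> database;

    public WriteBehindBuffer(Database<K, V> database) {
        this.database = database;
        scheduler.scheduleAtFixedRate(this::flush, 10, 10, TimeUnit.SECONDS);
    }

    // Called from CacheWriter.write(); repeated writes to a key coalesce.
    public synchronized void add(K key, V value) {
        buffer.put(key, value);
    }

    // Lets loads consult unflushed writes, avoiding the inconsistent view
    // mentioned above (e.g. after eviction).
    public synchronized V pending(K key) {
        return buffer.get(key);
    }

    private void flush() {
        Map<K, V> batch;
        synchronized (this) {
            if (buffer.isEmpty()) return;
            batch = new LinkedHashMap<>(buffer);
            buffer.clear();
        }
        database.batchWrite(batch); // one batched call to the external store
    }

    /** Hypothetical external resource. */
    interface Database<K, V> {
        void batchWrite(Map<K, V> batch);
    }
}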
Update:
Wim Deblauwe provided a nice example using RxJava.

Guava loading cache - keep fresh until expiry

I have a specific use case with a LoadingCache in Guava.
Expire keys that have not been accessed in 30m
As long as a key is in the cache, keep it fresh irrespective of access
I've been able to get to these semantics only through the use of some external kludge.
https://gist.github.com/kashyapp/5309855
Posting here to see if folks can give better ideas.
Problems
refreshAfterWrite() is triggered only on access
cache.refresh() -> CacheLoader.reload() updates the accessed/written timers even if we return oldValue
returning an immediateCancelledFuture() causes ugly logging
basically no way for reload() to say that nothing changed
Solution
set expireAfterAccess on the cache
schedule a refreshJob for every key using an external executor service
refreshJob.run() checks if the cache still has the key via asMap().containsKey(), which doesn't update access times
queries upstream, and does a cache.put() only if there is a changed value (see the sketch below)
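A sketch of this approach, with illustrative names (fetchUpstream() is a hypothetical upstream query; the last pushed value is tracked in the job itself so the refresh never reads the cache and never resets access time):

import java.util.Objects;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class KeepFreshUntilExpiry {
    private final Cache<String, Long> cache = CacheBuilder.newBuilder()
            .expireAfterAccess(30, TimeUnit.MINUTES)
            .build();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void scheduleRefresh(String key) {
        scheduler.scheduleAtFixedRate(new Runnable() {
            private Long lastValue; // tracked here so the job never reads the cache

            @Override
            public void run() {
                // containsKey() on the map view does not count as an access,
                // so a key nobody reads can still expire
                if (!cache.asMap().containsKey(key)) {
                    return; // entry expired; the job degenerates to a no-op
                }
                Long fresh = fetchUpstream(key);
                if (!Objects.equals(fresh, lastValue)) {
                    cache.put(key, fresh); // put() only when the value changed
                    lastValue = fresh;
                }
            }
        }, 1, 1, TimeUnit.MINUTES);
    }

    private Long fetchUpstream(String key) {
        return 42L; // hypothetical upstream query
    }
}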
Almost there
But this is not exactly what I set out to do, though it's close enough. If upstream is not changing, un-accessed keys expire away. Keys that are being changed upstream do not expire from the cache.

How can I run a background thread that cleans up some elements in list regularly?

I am currently implementing a cache. I have completed a basic implementation, shown below. What I want to do is run a thread that removes entries satisfying certain conditions.
class Cache {
    int timeLimit = 10;  // how long each entry needs to be kept after accessed (marked)
    int maxEntries = 10; // maximum number of Entries
    HashSet<String> set = new HashSet<String>();

    public void add(Entry t){
        ....
    }

    public Entry access(String key){
        // mark Entry that it has been used
        // Since it has been marked, background thread should remove this entry after timeLimit seconds.
        return set.get(key);
    }
    ....
}
My question is: how should I implement the background thread so that it goes through the entries in the set and removes the ones that have been marked and for which (now - last access time) > timeLimit?
EDIT
The above is just a simplified version of the code; I have not written the synchronized statements.
Why are you reinventing the wheel? EhCache (and any decent cache implementation) will do this for you. Also, the much more lightweight MapMaker Cache from Guava can automatically remove old entries.
If you really want to implement this yourself, it is not really that simple.
Remember about synchronization. You should use ConcurrentHashMap or the synchronized keyword to store entries. This might be really tricky.
You must somehow store the last access time of each entry. Every time you access an entry, you must update that timestamp.
Think about eviction policy. If there are more than maxEntries in your cache, which ones to remove first?
Do you really need a background thread?
This is surprising, but EhCache (enterprise-ready and proven) does not use a background thread to invalidate old entries. Instead it waits until the map is full and removes entries lazily. This looks like a good trade-off, as threads are expensive.
If you have a background thread, should there be one per cache or one global? Do you start a new thread while creating a new cache or have a global list of all caches? This is harder than you think...
Once you answer all these questions, the implementation is fairly simple: go through all the entries every second or so and if the condition you've already written is met, remove the entry.
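A bare-bones sketch of that sweep, with hypothetical names; the ConcurrentHashMap iterator tolerates concurrent modification, so it.remove() is safe here. The background thread would call sweep() every second or so.

import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SweptCache {
    static final long TIME_LIMIT_MS = 10_000; // mirrors timeLimit in the question

    static class Entry {
        volatile long lastAccess = System.currentTimeMillis(); // updated on access
        Object value;
    }

    final Map<String, Entry> map = new ConcurrentHashMap<>();

    void sweep() {
        long now = System.currentTimeMillis();
        for (Iterator<Entry> it = map.values().iterator(); it.hasNext(); ) {
            if (now - it.next().lastAccess > TIME_LIMIT_MS) {
                it.remove(); // ConcurrentHashMap iterators support remove()
            }
        }
    }
}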
I'd use Guava's Cache type for this, personally. It's already thread-safe and has methods built in for eviction from the cache based on some time limit. If you want a thread to periodically sweep it, you can just do something like this:
new Thread(new Runnable() {
    public void run() {
        // loop so the sweep repeats; a single cleanUp() call would run only once
        while (!Thread.currentThread().isInterrupted()) {
            cache.cleanUp();
            try { Thread.sleep(MY_SLEEP_DURATION); } catch (InterruptedException e) { return; }
        }
    }
}).start();
I don't imagine you really need a background thread. Instead you can just remove expired entries before or after performing a lookup. This simplifies the entire implementation, and it is very hard to tell the difference.
BTW: If you use a LinkedHashMap, you can use it as an LRU cache by overriding removeEldestEntry (see its javadocs for an example).
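For illustration, a minimal LRU sketch along those lines (the capacity of 10 mirrors the question's maxEntries and is otherwise arbitrary):

import java.util.LinkedHashMap;
import java.util.Map;

class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final int MAX_ENTRIES = 10; // arbitrary cap for illustration

    LruCache() {
        super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > MAX_ENTRIES; // evict the least-recently-accessed entry beyond the cap
    }
}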
First of all, your presented code is incomplete: there is no get(key) on HashSet (so I assume you mean some kind of Map instead), and your code does not mention any "marking." There are also many ways to do caching, and it is difficult to pick the best solution without knowing what you are trying to cache and why.
When implementing a cache, it is usually assumed that the data structure will be accessed concurrently by multiple threads. So the first thing you will need to do is use a backing data structure that is thread-safe. HashMap is not thread-safe, but ConcurrentHashMap is. There are also a number of other concurrent Map implementations out there, namely in Guava, Javolution, and high-scale lib. There are other ways to build caches besides maps, and their usefulness depends on your use case. Regardless, you will most likely need to make the backing data structure thread-safe, even if you decide you don't need the background thread and instead evict expired objects upon attempting to retrieve them from the cache, or let the GC remove entries by using SoftReferences.
Once you have made the internals of your cache thread-safe, you can simply fire up a new (most likely daemonized) thread that periodically sweeps/iterates the cache and removes old entries. The thread would do this in a loop (until interrupted, if you want to be able to stop it again) and then sleep for some amount of time after each sweep.
However, you should consider whether it is worth it for you to build your own cache implementation. Writing thread-safe code is not easy, and I recommend that you study it before endeavouring to write your own cache implementation. I can recommend the book Java Concurrency in Practice.
The easier way to go about this is, of course, to use an existing cache implementation. There are many options available in Java-land, all with their own unique set of trade-offs.
EhCache and JCS are both general purpose caches that fit most caching needs one would find in a typical "enterprise" application.
Infinispan is a cache that is optimised for distributed use, and can thus cache more data than what can fit on a single machine. I also like its ConcurrentMap based API.
As others have mentioned, Google's Guava library has a Cache API, which is quite useful for smallish in-memory caches.
Since you want to limit the number of entries in the cache, you might be interested in an object-pool instead of a cache.
Apache Commons-Pool is widely used, and has APIs that resemble what you are trying to build yourself.
Stormpot, on the other hand, has a rather different API, and I am pretty much only mentioning it because I wrote it. It's probably not what you want, but who can be sure without knowing what you are trying to cache and why?
First, make access to your collection synchronized, or use a Set backed by a ConcurrentHashMap (e.g. Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>())), as indicated in the comments.
Second, write your new thread and implement it as an endless loop that periodically iterates the collection and removes elements. Write this class so that it is initialized with the correct collection in the constructor; that way you do not have to worry about how to access the proper collection.
