Google Guava's CacheLoader loadAll() vs reload() semantics - java

Two things I really like about Guava 11's CacheLoader (thanks, Google!) are loadAll(), which allows me to load multiple keys at once, and reload(), which allows me to reload a key asynchronously when it's "stale" but an old value exists. I'm curious as to how they play together, since reload() operates on but a single key.
Concretely, extending the example from CachesExplained:
LoadingCache<Key, Graph> graphs = CacheBuilder.newBuilder()
    .maximumSize(1000)
    .refreshAfterWrite(1, TimeUnit.MINUTES)
    .build(
        new CacheLoader<Key, Graph>() {
          public Graph load(Key key) { // no checked exception
            return getGraphFromDatabase(key);
          }

          public Map<Key, Graph> loadAll(Iterable<? extends Key> keys) {
            return getAllGraphsFromDatabase(keys);
          }

          public ListenableFuture<Graph> reload(final Key key, Graph prevGraph) {
            if (neverNeedsRefresh(key)) {
              return Futures.immediateFuture(prevGraph);
            } else {
              // asynchronous! the task must actually be submitted to run,
              // or the returned future will never complete
              // ('executor' is assumed to be defined elsewhere)
              ListenableFutureTask<Graph> task =
                  ListenableFutureTask.create(new Callable<Graph>() {
                    public Graph call() {
                      return getGraphFromDatabase(key);
                    }
                  });
              executor.execute(task);
              return task;
            }
          }
        });
...where "getAllGraphsFromDatabase()" does an aggregate database query rather than length(keys) individual queries.
How do these two components of a LoadingCache play together? If some keys in my request to getAll() aren't present in the cache, they are loaded as a group with loadAll(), but if some need refreshing, do they get reloaded individually with load()? If so, are there plans to support a reloadAll()?

Here's how refreshing works.
Refreshing of a cache entry can be triggered in two ways:
1. Explicitly, with cache.refresh(key).
2. Implicitly, if the cache is configured with refreshAfterWrite and the entry is queried after the specified amount of time has passed since it was written.
If an entry that is eligible for reload is queried, then the old value is returned, and a (possibly asynchronous) refresh is triggered. The cache will continue to return the old value for the key while the refresh is in progress. (So if some keys in a getAll request are eligible for refresh, their old values will be returned, but the values for those keys will be (possibly asynchronously) reloaded.)
The default implementation of CacheLoader.reload(key, oldValue) just returns Futures.immediateFuture(load(key)), which (synchronously) recomputes the value. More sophisticated, asynchronous implementations are recommended if you expect to be doing cache refreshes.
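For reference, a minimal sketch of what that default amounts to (paraphrasing the behavior described above, not Guava's exact source):
// Default reload: recomputes synchronously before the future is returned,
// so the refreshing thread blocks on load().
public ListenableFuture<Graph> reload(Key key, Graph oldValue) throws Exception {
    return Futures.immediateFuture(load(key));
}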
I don't think we're inclined to provide reloadAll at the moment. I suspect it's possible, but things are complicated enough as it is, and I think we're inclined to wait until we see specific demand for such a thing.
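In the meantime, a per-key workaround under these semantics could look like the following sketch. Note that refreshAll here is a hypothetical helper, not a Guava API; each refresh goes through CacheLoader.reload individually, not loadAll:
// Triggers a (possibly asynchronous) refresh for each key individually.
static <K, V> void refreshAll(LoadingCache<K, V> cache, Iterable<? extends K> keys) {
    for (K key : keys) {
        cache.refresh(key);
    }
}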

Related

How to create a reusable Map

Is there a way to populate a Map once from the DB (through a Mongo repository) and reuse it when required from multiple classes, instead of hitting the database through the repository each time?
As per your comment, what you are looking for is a caching mechanism. Caches are components that keep data in memory, as opposed to files, databases or other media, allowing fast retrieval of information (at the cost of a higher memory footprint).
There are probably various tutorials online, but usually caches all have the following behaviour:
1. They are key-value pair structures.
2. Each entry living in the cache also has a Time To Live (TTL), that is, how long it will be considered valid.
You can implement this in the repository layer, so the cache mechanism will be transparent to the rest of your application (but you might want to consider exposing functionality that allows to clear/invalidate part or all the cache).
So basically, when a query comes to your repository layer, check the cache. If the key exists there, check its time to live; if it is still valid, return the cached value.
If the key does not exist or the TTL has expired, add/overwrite the data in the cache. Keep in mind that when you update the data model yourself, you also need to invalidate the cache accordingly, so that new/fresh data is pulled from the DB on the next call. A sketch of this get-or-load pattern follows.
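Here is a minimal sketch of that repository-layer pattern. All names (CachedRepository, fetchFromDb, the 30-second TTL) are illustrative assumptions, not part of the question:
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CachedRepository {
    private static final long TTL_MILLIS = 30_000; // assumed cache period

    // A cached value together with the time it was written
    private record Entry(Object value, long writtenAt) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    public Object get(String key) {
        Entry e = cache.get(key);
        if (e == null || System.currentTimeMillis() - e.writtenAt() > TTL_MILLIS) {
            Object fresh = fetchFromDb(key); // hypothetical repository call
            cache.put(key, new Entry(fresh, System.currentTimeMillis()));
            return fresh;
        }
        return e.value();
    }

    // Call this when you update the data model yourself
    public void invalidate(String key) { cache.remove(key); }

    private Object fetchFromDb(String key) { /* query the repository here */ return key; }
}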
You can declare the map field as public static, which would allow application-wide access via ClassLoadingData.mapField.
I think a better solution, if I understood the problem, would be a memoized function, that is, a function storing the result of its call. Here is a sketch of how this could be done (note this does not handle possible synchronization problems in a multi-threaded environment):
class ClassLoadingData {
    private static Map<KeyType, ValueType> memoizedData = new HashMap<>();

    public Map<KeyType, ValueType> getMyData() {
        if (memoizedData.isEmpty()) { // you can use a more complex check to handle data refresh
            populateData();
        }
        return memoizedData;
    }

    private void populateData() {
        // do your query, and assign the result to memoizedData
    }
}
Premise: I suggest using an object-relational mapping tool like Hibernate in your Java project to map the object-oriented domain model to a relational database and let the tool handle the cache mechanism implicitly. Hibernate specifically implements a multi-level caching scheme (take a look at the following link for more information: https://www.tutorialspoint.com/hibernate/hibernate_caching.htm).
Regardless of my suggestion in the premise, you can also manually create a singleton class that will be used by every class in the project that interacts with the DB:
public class MongoDBConnector {
    private static final Logger LOGGER = LoggerFactory.getLogger(MongoDBConnector.class);

    private static MongoDBConnector instance;

    // Cache period in seconds
    public static int DB_ELEMENTS_CACHE_PERIOD = 30;

    // Latest cache update time
    private DateTime latestUpdateTime;

    // The cached data layer from the DB
    private Map<KType, VType> elements;

    private MongoDBConnector() {
    }

    public static synchronized MongoDBConnector getInstance() {
        if (instance == null) {
            instance = new MongoDBConnector();
        }
        return instance;
    }
}
Here you can then define a load method that updates the map with values stored in the DB, and also a write method that writes values to the DB, with the following characteristics:
1- These methods should be synchronized in order to avoid issues when multiple calls are performed concurrently.
2- The load method should apply a cache-period logic (maybe with a configurable period) to avoid loading the data from the DB on each method call.
Example: Suppose your cache period is 30s. This means that if 10 reads are performed from different points of the code within 30s, you will load data from the DB only on the first call, while the others will read from the cached map, improving performance.
Note: The greater the cache period, the better the performance of your code; but if an insertion is performed externally (from another tool or manually), the cache will become inconsistent with the DB. So choose the best value for your case.
public synchronized Map<KType, VType> getElements() throws ConnectorException {
    final DateTime currentTime = new DateTime();
    if (latestUpdateTime == null
            || Seconds.secondsBetween(latestUpdateTime, currentTime).getSeconds() > DB_ELEMENTS_CACHE_PERIOD) {
        LOGGER.debug("Cache is expired. Reading values from DB");
        // Read from DB and update the cache
        // ...
        latestUpdateTime = currentTime;
    }
    return elements;
}
3- The store method should automatically update the cache if the insert is performed correctly, regardless of whether the cache period has expired:
public synchronized void storeElement(final VType object) throws ConnectorException {
    // Insert the object in the DB (throws a ConnectorException if the insert fails)
    // ...
    // Update the cache regardless of the cache period
    loadElementsIgnoreCachePeriod();
}
Then you can get elements from every point in your code as follows:
Map<KType, VType> liveElements = MongoDBConnector.getInstance().getElements();

Ignite - full sync configuration

I have two server Ignite nodes (each started in a Spring Boot application) in the cluster.
And I have two caches:
// Persistent cache
configuration.setReadThrough(true);
configuration.setWriteThrough(true);
configuration.setCacheStoreFactory(storeFactory);
configuration.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
configuration.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
configuration.setCacheMode(CacheMode.REPLICATED);
// In-memory cache
configuration.setIndexedTypes(String.class, SequenceReserve.class);
configuration.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
configuration.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
configuration.setCacheMode(CacheMode.REPLICATED);
Requests to update either cache can go to each node in parallel.
Every update is an atomic operation:
cache.invoke(...);
My main goal is to avoid inconsistent data at any cost. The in-memory cache can get lost, but it should not become inconsistent.
Any node should return an exception if the transaction was not committed on all nodes.
Can I write a configuration that guarantees this behavior with 100% probability?
UPDATED
I ran the test and got the following behavior:
Each request is always performed on the same node (invoke method). I believe this is correct behavior. When will the query be executed on the second node?
IgniteCache#invoke(...) is a transactional operation. The best way to tell is to check whether it throws TransactionException.
Your configuration seems to be enough to guarantee data consistency between nodes.
If you mean consistency between these two caches, then you can start explicit transactions and run the invoke calls within them, as sketched below.
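A minimal sketch of that cross-cache transaction (cache names and value types are illustrative assumptions; both caches must be TRANSACTIONAL):
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

// Runs two invoke() calls atomically across both caches; if the commit
// fails, a TransactionException propagates to the caller.
void updateBothCaches(Ignite ignite, String key) {
    IgniteCache<String, Long> persistentCache = ignite.cache("persistentCache"); // assumed name
    IgniteCache<String, Long> inMemoryCache = ignite.cache("inMemoryCache");     // assumed name

    try (Transaction tx = ignite.transactions().txStart(
            TransactionConcurrency.PESSIMISTIC, TransactionIsolation.REPEATABLE_READ)) {
        persistentCache.invoke(key, (entry, args) -> { entry.setValue(1L); return null; });
        inMemoryCache.invoke(key, (entry, args) -> { entry.setValue(1L); return null; });
        tx.commit();
    }
}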
UPD
Note that, as mentioned in the JavaDoc for the invoke(..) method, your EntryProcessor should be stateless. It may be called multiple times on different nodes, so it should return the same value each time.
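For instance, a stateless EntryProcessor in that spirit (illustrative; the cache and key are assumptions):
// Stateless: the result depends only on the entry's current value and the
// arguments, so repeated calls on different nodes produce the same result.
CacheEntryProcessor<String, Long, Long> increment = (entry, args) -> {
    long next = (entry.getValue() == null ? 0L : entry.getValue()) + 1;
    entry.setValue(next);
    return next;
};
cache.invoke("counter", increment);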
UPD 2
If you call the IgniteCache#invoke() method on a transactional cache, the provided EntryProcessor is called on every node that contains the needed partition of this cache. But if the cache is atomic, then the EntryProcessor is called on the primary node only.
But you shouldn't rely on this behaviour. It's not specified anywhere, so it may change in future versions. Ignite is free to make as many calls to EntryProcessor#process() as it's necessary to guarantee data consistency.
You can use the following code to verify my words:
public static void main(String[] args) throws IgniteException {
    Ignite ignite = Ignition.start("examples/config/example-ignite.xml");

    IgniteCache<Integer, String> atomicCache = ignite.getOrCreateCache(
        cacheConfiguration("atomic", CacheAtomicityMode.ATOMIC));
    IgniteCache<Integer, String> txCache = ignite.getOrCreateCache(
        cacheConfiguration("transactional", CacheAtomicityMode.TRANSACTIONAL));

    atomicCache.invoke(1, (entry, arguments) -> {
        System.out.println("Atomic invoke");
        return null;
    });

    txCache.invoke(1, (entry, arguments) -> {
        System.out.println("Transactional invoke");
        return null;
    });
}

private static <K, V> CacheConfiguration<K, V> cacheConfiguration(String name, CacheAtomicityMode atomicity) {
    CacheConfiguration<K, V> cacheCfg = new CacheConfiguration<>(name);
    cacheCfg.setAtomicityMode(atomicity);
    cacheCfg.setCacheMode(CacheMode.REPLICATED);
    cacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
    return cacheCfg;
}
"Transactional invoke" will be printed on every node, but "Atomic invoke" –– only on a single one.

Caffeine: Can't provide CacheWriter to AsyncLoadingCache

I'm trying to write an AsyncLoadingCache that accepts a CacheWriter, and I'm getting an IllegalStateException.
Here's my code:
CacheWriter<String, UUID> cacheWriter = new CacheWriter<String, UUID>() {
    @Override
    public void write(String key, UUID value) {
    }

    @Override
    public void delete(String key, UUID value, RemovalCause cause) {
    }
};
AsyncLoadingCache<String, UUID> asyncCache = Caffeine.newBuilder()
    .expireAfterWrite(60, TimeUnit.SECONDS)
    .writer(cacheWriter)
    .maximumSize(100L)
    .buildAsync((String s) -> { /* <== line 41, exception occurs here */
        return UUID.randomUUID();
    });
And I'm getting this trace
Exception in thread "main" java.lang.IllegalStateException
at com.github.benmanes.caffeine.cache.Caffeine.requireState(Caffeine.java:174)
at com.github.benmanes.caffeine.cache.Caffeine.buildAsync(Caffeine.java:854)
at com.mycompany.caffeinetest.Main.main(Main.java:41)
If I change the cache to a LoadingCache or remove .writer(cacheWriter), the code runs properly. What am I doing wrong? It seems I'm providing the right types to both objects.
Unfortunately these two features are incompatible. While the documentation states this, I have updated the exception to communicate this better. In Caffeine.writer it states:
This feature cannot be used in conjunction with {@link #weakKeys()} or {@link #buildAsync}.
A CacheWriter is a synchronous interceptor for a mutation of an entry. For example, it might be used to evict into a disk cache as a secondary layer, whereas a RemovalListener is asynchronous, and using it would leave a race where the entry is not present in either cache. The mechanism is to use ConcurrentHashMap's compute methods to perform the write or removal, and call into the CacheWriter within that block.
In an AsyncLoadingCache, the value materializes later, when the CompletableFuture completes successfully, or the entry is automatically removed if the future completes with null or an error. When the entry is modified within the hash table, this future may be in-flight. This means the CacheWriter would often be called without the materialized value and likely could not do very intelligent things.
From an API perspective, unfortunately telescoping builders (which use the type system to disallow incompatible chains) become more confusing than runtime exceptions. Sorry for not making the error clear; that should now be fixed.
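For illustration, two directions that do compile under the constraints described above (a sketch against Caffeine 2.x, where Caffeine.writer exists):
// 1) Keep the CacheWriter, but use a synchronous LoadingCache:
LoadingCache<String, UUID> syncCache = Caffeine.newBuilder()
    .expireAfterWrite(60, TimeUnit.SECONDS)
    .writer(cacheWriter) // fine on a synchronous cache
    .maximumSize(100L)
    .build(s -> UUID.randomUUID());

// 2) Keep the AsyncLoadingCache, but drop the writer; a RemovalListener is
//    the asynchronous (and therefore racy, as explained above) alternative:
AsyncLoadingCache<String, UUID> asyncCache = Caffeine.newBuilder()
    .expireAfterWrite(60, TimeUnit.SECONDS)
    .removalListener((String k, UUID v, RemovalCause cause) -> { /* secondary store */ })
    .maximumSize(100L)
    .buildAsync(s -> UUID.randomUUID());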

Guava Cache expireAfterWrite is only applicable with getIfPresent used?

This question is to validate an observed behavior and make sure Guava Cache is being used correctly.
I have set up two Guava caches (see code below), with and without a CacheLoader. As the Guava documentation states:
Caches built with CacheBuilder do not perform cleanup and evict values "automatically," or instantly after a value expires, or anything of the sort.
It appears that expiration is only observed if the getIfPresent() method is used, i.e. when a key is queried, null is returned once a period of time greater than the expiry interval has passed since the key/value was written to the cache. In the case of a cache built with a CacheLoader, using get() or getUnchecked() causes the CacheLoader.load() method to be executed, so expiry is not observed, i.e. a null value is never returned.
Is this the correct expectation?
Thank you for your patience and help.
// excerpt from test code
private static final FakeTicker fakeTicker = new FakeTicker();

private static LoadingCache<Integer, String> usingCacheLoader = CacheBuilder.newBuilder()
    .expireAfterWrite(2, TimeUnit.MINUTES)
    .ticker(fakeTicker)
    .build(new CacheLoader<Integer, String>() {
        public String load(Integer keyName) throws Exception {
            logger.info("Getting value for key: {}", keyName);
            return getValue(keyName, "with_cache_loader");
        }
    });

private static Cache<Integer, String> withoutCacheLoader = CacheBuilder.newBuilder()
    .expireAfterWrite(2, TimeUnit.MINUTES)
    .ticker(fakeTicker)
    .build();
It is true that if you call get or getUnchecked you will never get null.
The expiration can still be "observed", both in terms of performance (how long a get for a specific key takes, and whether the value has to be freshly computed) and in whether the value you actually get reflects possibly out-of-date information.
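For instance, using the FakeTicker from the question, the expiry is visible through getUnchecked() even though null never appears (the 3-minute advance is an assumption for illustration):
usingCacheLoader.getUnchecked(1);        // loads and logs "Getting value for key: 1"
fakeTicker.advance(3, TimeUnit.MINUTES); // now older than the 2-minute expiry
usingCacheLoader.getUnchecked(1);        // still never null, but load() runs (and logs) again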

Spring @Cacheable updating data

Example from SpringSource:
#Cacheable(value = "vets")
public Collection<Vet> findVets() throws DataAccessException {
return vetRepository.findAll();
}
How exactly does findVets() work?
The first time, it takes the data from vetRepository and saves the result in the cache. But what happens if a new vet is inserted into the database? Does the cache get updated (out-of-the-box behavior)? If not, can we configure it to update?
EDIT:
But what happens if the DB is updated from an external source (e.g. an application which uses the same DB) ?
#CachePut("vets")
public void save(Vet vet) {..}
You have to tell the cache that an object is stale. If data changes without going through your service methods then, of course, you have a problem. You can, however, clear the whole cache with
#CacheEvict(value = "vets", allEntries = true)
public void clearCache() {..}
It depends on the caching provider, though. If another app updates the database without notifying your app but uses the same cache, then the other app would probably update the cache too.
The cache would not do it automatically, and there is no way for it to know that data has been introduced externally.
Check @CacheEvict, which will help you invalidate the cache entry in case of any change to the underlying collections.
@CacheEvict(value = "vet", allEntries = true)
public void saveVet() {
    // Intentionally blank
}
allEntries
Whether all the entries inside the cache(s) are removed.
By default, only the value under the associated key is removed. Note that setting this parameter to true and specifying a key is not allowed.
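As a sketch of the key-based alternative the javadoc describes (the SpEL key expression and method signature are illustrative, not from the original answer):
// Evicts only the entry for this vet's id, leaving the rest of the cache intact.
@CacheEvict(value = "vet", key = "#vet.id")
public void saveVet(Vet vet) {
    vetRepository.save(vet);
}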
You can also use @CachePut on the method which creates the new entry. The return type has to be the same as in your @Cacheable method.
#CachePut(value = "vets")
public Collection<Vet> updateVets() throws DataAccessException {
return vetRepository.findAll();
}
In my opinion, an external service has to call the same methods.
