Guava caching using independent keys - java

When working with user objects coming from a database one has usually an id and a username and it's common to search a user by id or by username.
If I now want to find users and like to use Guava caches I have to create two caches. One is caching by id, one is caching by username.
But both point to the same object.
Is it possible to use one LoadingCache only?
I thought about using the User object itself as the key, LoadingCache<User, User>, and implementing equals and hashCode in the User object.
In the equals method it's easy to say two User objects are equal if either the id or the username is equal.
But how can I write a good hashCode method that works for this scenario?
Any ideas on that?

When working with user objects coming from a database one has usually an id and a username and it's common to search a user by id or by username.
Remark: "search" means something different to me than accessing. Maybe the id and the username have different usage patterns? Maybe the username is only needed at login time?
Avoid using two different concepts for referencing / accessing a user in your application. Decide on one and use it consistently. Is the username unique? Can it change?
Two caches: You can use two caches and populate the "sister cache" from the loader with name2user.put(user.getName(), user) or id2user.put(user.getId(), user). This way the identical user object is in both caches. Still, I don't like it, because of cleanliness and consistency issues.
The third issue is data duplication, if you decide to change to another solution. A cache may store the value not by reference but copy it into compact byte arrays and store it off-heap (EHCache3, Hazelcast, etc.). (Clean) code should not rely on the cache storing its data by reference in-heap if there is no real need for it.
As assumed above the two access paths will not be equal in usage. My recommendation:
One cache to cache the user data with: id -> User
Second cache for resolving the id only: name -> id
Don't mind the additional cache access in the case of the name. Of course, the loader of the second cache may already request a User for its purpose, so you might want to prepopulate the first cache with it.
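The recommended layout (id -> User, name -> id) can be sketched with plain maps. This is only an illustration under assumptions: the User fields, the UserLookup class and the two loader functions are hypothetical, and ConcurrentHashMap stands in for the Guava caches, so there is no eviction or expiry here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical value type; the field names are assumptions.
final class User {
    final long id;
    final String name;

    User(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Users are stored once (id -> User); names only resolve to ids.
final class UserLookup {
    private final Map<Long, User> usersById = new ConcurrentHashMap<>();
    private final Map<String, Long> idsByName = new ConcurrentHashMap<>();
    private final Function<Long, User> loadById;     // stand-in for a database query by id
    private final Function<String, User> loadByName; // stand-in for a database query by name

    UserLookup(Function<Long, User> loadById, Function<String, User> loadByName) {
        this.loadById = loadById;
        this.loadByName = loadByName;
    }

    User byId(long id) {
        return usersById.computeIfAbsent(id, loadById);
    }

    User byName(String name) {
        Long id = idsByName.get(name);
        if (id == null) {
            User user = loadByName.apply(name);
            if (user == null) {
                return null; // unknown name, nothing is cached
            }
            // The name query already produced a User, so prepopulate the primary map.
            usersById.putIfAbsent(user.id, user);
            idsByName.put(name, user.id);
            id = user.id;
        }
        return byId(id);
    }
}
```

Invalidation then only has to touch the id map for user data, and the name map only when a username changes.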

Thank you very much for the answers, especially from the Guava developer himself. The suggested solution was way too much work for me, I'm lazy ;).
So since I will nevertheless have two caches, I decided to solve it this way.
final LoadingCache<Serializable, Optional<ITemplate>> templatesById = CacheBuilder.newBuilder()
.maximumSize(MAX_CACHE_SIZE).expireAfterAccess(MAX_CACHE_LIFE_TIME, TimeUnit.MINUTES)
.build(new CacheLoader<Serializable, Optional<ITemplate>>() {
@Override
public Optional<ITemplate> load(final Serializable id) {
final ITemplate template = readInternal(id);
final Optional<ITemplate> optional = Optional.ofNullable(template);
if (template != null) {
templatesByKey.put(template.getKey(), optional);
}
return optional;
}
});
final LoadingCache<String, Optional<ITemplate>> templatesByKey = CacheBuilder.newBuilder()
.maximumSize(MAX_CACHE_SIZE).expireAfterAccess(MAX_CACHE_LIFE_TIME, TimeUnit.MINUTES)
.build(new CacheLoader<String, Optional<ITemplate>>() {
@Override
public Optional<ITemplate> load(final String key) {
final ITemplate template = byKeyInternal(key);
final Optional<ITemplate> optional = Optional.ofNullable(template);
if (template != null) {
templatesById.put(template.getId(), optional);
}
return optional;
}
});
This means that I don't waste memory by having two instances of a template in two caches. I just add a template to both caches when it is read from the database.
It works really well and is damn fast.
The only question was, when to tell the cache to refresh.
In my scenario it's only needed on delete or update.
@Override
@Transactional
public void update(final ITemplate template) {
super.update(new DBTemplate(template));
templatesById.invalidate(template.getId());
templatesByKey.invalidate(template.getKey());
}
That's it.
Any comments on that?

Related

Java ehCache, how to replace members

I have a simple method that I have annotated for caching.
@Cacheable(value = "devices", key = "#hardwareId", unless = "#result == null")
public Device get(String hardwareId)
I have a mechanism to know when someone changes the underlying database, so I know when to evict a member from the cache so that the next call goes back to the database.
getCache().remove(hardwareId);
What I would like to do is REPLACE the element in the cache. The reason for this is that the call back to the database can take 1000ms and I'd like to not have that blip in the performance of the method.
As far as I can tell I have two options.
Option 1:
When I evict the member, call back into the service at that time.
getCache().remove(hardwareId);
service.get(hardwareId);
Option 2:
Create an instance of 'net.sf.ehcache.bootstrap.BootstrapCacheLoader' that, on startup, registers the same class to be notified when an element is removed from a cache (notifyElementRemoved()).
On @PostConstruct, get all methods annotated with @Cacheable and create a Map of 'cacheName' to Method instance (java reflection Method).
When notifyElementRemoved() is triggered, use the cache name to get the Method instance and invoke it to trigger the cache to be repopulated.
Method method = map.get(cacheName);
// Add magic here to get the service.
Object serviceInstance = applicationContext.getBean("deviceService");
if (Proxy.isProxyClass(serviceInstance.getClass())) {
Proxy.getInvocationHandler(serviceInstance).invoke(serviceInstance, method, new Object[] {objectKey});
} else {
method.invoke(serviceInstance, objectKey);
}
The downside of option 1 is that I have to go modify 30+ classes to put in the logic to call back into the service.
The downside of option 2 is that it's a bit complex, it feels like it would be good if ehCache could provide this feature. It knows what method it wrapped, it knows what the key/parameters were that called into this method.
The downside of both options is that there will always be a time when the cache does not contain the member & could cause a blip in performance.
My question is, does ehCache provide the feature I want or is there another mechanism out there to do REPLACEMENT of members in the cache with zero time of the cache being empty?
Don't do option 2. It is too complicated. In general, the way to go is to have a @Cacheable method and a @CachePut method. Why not use that?
@Cacheable(value = "devices", key = "#hardwareId", unless = "#result == null")
public Device get(String hardwareId)
@CachePut(value = "devices", key = "#hardwareId", unless = "#result == null")
public Device update(String hardwareId)
It should cleanly solve your problem.
BTW, you don't need to specify the key. It is implicit.
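The idea behind a put-style update, as opposed to evict-then-reload, can be sketched without Spring or ehCache. The RefreshingCache class and its loader below are hypothetical stand-ins, not the Spring API: the point is only that the slow load runs first and the entry is overwritten afterwards, so readers keep seeing the old value instead of hitting an empty cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of "put instead of evict": the old value stays readable until the new one is ready.
final class RefreshingCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stand-in for the slow database call

    RefreshingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Read path: load only on a miss, similar in spirit to a @Cacheable method.
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    // Change-notification path: load first, then overwrite, similar in spirit to @CachePut.
    V refresh(K key) {
        V fresh = loader.apply(key);
        if (fresh == null) {
            entries.remove(key); // the row no longer exists, so drop the entry
            return null;
        }
        entries.put(key, fresh);
        return fresh;
    }
}
```

Calling refresh(hardwareId) from the change notification, instead of remove(hardwareId), means concurrent get calls never trigger the slow load themselves.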

Is my class thread safe or not?

The service must cache data in memory and save it to the database. getAmount(id) retrieves the current balance, or zero if addAmount() was not called before for the specified id.
addAmount(id, amount) increases the balance, or sets it if the method is called for the first time. The service must be thread-safe. Is my implementation thread-safe? What improvements can be made?
public class AccountServiceImpl implements AccountService {
private static final Logger LOGGER = LoggerFactory.getLogger(AccountServiceImpl.class);
private LoadingCache cache;
private AccountDAO accountDAO = new AccountDAOImpl();
public AccountServiceImpl() {
cache = CacheBuilder.newBuilder()
.expireAfterAccess(1, TimeUnit.HOURS)
.concurrencyLevel(4)
.maximumSize(10000)
.recordStats()
.build(new CacheLoader<Integer, Account>() {
@Override
public Account load(Integer id) throws Exception {
return new Account(id, accountDAO.getAmountById(id));
}
});
}
public Long getAmount(Integer id) throws Exception {
synchronized (cache.get(id)) {
return cache.get(id).getAmount();
}
}
public void addAmount(Integer id, Long value) throws Exception {
Account account = cache.get(id);
synchronized (account) {
accountDAO.addAmount(id, value);
account.setAmount(accountDAO.getAmountById(id));
cache.put(id, account);
}
}
}
A race condition could occur if the Account is evicted from the cache and multiple updates to that account are taking place. The eviction results in multiple Account instances, so the synchronization doesn't provide atomicity and a stale value could be inserted into the cache.
The race is more obvious if you change the settings, e.g. maximumSize(0). At the current settings likelihood of the race may be rare, but eviction may still occur even after the access. This is because the entry might be chosen for eviction but not yet removed, so a subsequent read succeeds even though the access is ignored from the policy's perspective.
The proper way to do this in Guava is to Cache.invalidate() the entry. The DAO is transactionally updating the system of record, so it ensures atomicity of the operation. The LoadingCache ensures atomicity of an entry being computed, so reads will be blocked while a fresh value is loaded. This results in an extra database lookup which seems unnecessary, but is negligible in practice. Unfortunately there is a tiny potential race even still, because Guava does not invalidate loading entries.
Guava doesn't support the write-through caching behavior you are trying to implement. Its successor, Caffeine, does by exposing Java 8's compute map methods and soon a CacheWriter abstraction. That said, the loading approach Guava expects is simple, elegant, and less error prone than manual updates.
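To illustrate what an atomic per-key update through the Java 8 map methods looks like, here is a minimal sketch using a plain ConcurrentHashMap. The Balances class is hypothetical, there is no eviction, and the database write is left out entirely, so this shows only the atomicity of merge, not a write-through cache.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory balance store; not a replacement for the DAO.
final class Balances {
    private final ConcurrentMap<Integer, Long> amounts = new ConcurrentHashMap<>();

    // Returns the current balance, or zero if nothing was added yet.
    long getAmount(int id) {
        return amounts.getOrDefault(id, 0L);
    }

    // merge applies the remapping function atomically for the given key,
    // so concurrent additions to the same id are not lost.
    long addAmount(int id, long delta) {
        return amounts.merge(id, delta, Long::sum);
    }
}
```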
There are two issues here to take care of:
The update of the amount value must be atomic.
If you have declared:
class Account { long amount; }
Writing a non-volatile long field is not guaranteed to be atomic: the Java Language Specification allows a JVM to split it into two 32-bit writes, which happens in practice on 32 bit systems. On 64 bit systems it is atomic in practice. See: Are 64 bit assignments in Java atomic on a 32 bit machine?
So, the best way would be to change the declaration to "volatile long amount;". Then each write of the value is always atomic and, in addition, volatile ensures that other threads/CPUs see the changed value.
That means for updating the single value, you don't need the synchronized block.
Race between inserting and modify
With your synchronized statements you just solve the first problem. But there are multiple races in your code.
See this code:
synchronized (cache.get(id)) {
return cache.get(id).getAmount();
}
You obviously assume that cache.get(id) returns the same object instance if called for the same id. That is not the case, since a cache does not guarantee this.
Guava Cache blocks until the loading is complete. Other caches may or may not block, meaning that if requests come in in parallel, the loader may be called multiple times, resulting in multiple changes of the stored cache value.
Still, Guava Cache is a cache, so the item may be evicted at any time, and the next get may return another instance.
Same problem here:
public void addAmount(Integer id, Long value) throws Exception {
Account account = cache.get(id);
/* what happens if lots of requests come in and another
threads evict the account object from the cache? */
synchronized (account) {
. . .
In general: Never synchronize on an object whose life cycle is not in your control. BTW: Other cache implementations may store just the serialized object value and return another instance on each request.
Since you have a cache.put after the modification, your solution will probably work. However, the synchronized block only fulfills the purpose of flushing the memory; it may or may not really do the locking.
The update of the cache happens after the value changed in the database. This means an application may read a former value even if it is changed in the database already. This may lead to inconsistencies.
Solution 1
Have a static set of lock objects that you choose by the key value, e.g. by locks[id % locks.length]. See my answer here: Guava Cache, how to block access while doing removal
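A minimal sketch of such a fixed lock set follows. The StripedAccounts class is hypothetical, the map stands in for the DAO plus cache, and Math.floorMod is used instead of a bare % so that negative ids still map to a valid index; the stripe count of 16 is an arbitrary assumption.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Lock striping: the lock objects live as long as the service, independent of cache eviction.
final class StripedAccounts {
    private final Object[] locks = new Object[16]; // stripe count is an arbitrary choice
    private final ConcurrentMap<Integer, Long> amounts = new ConcurrentHashMap<>(); // stand-in for DAO + cache

    StripedAccounts() {
        for (int i = 0; i < locks.length; i++) {
            locks[i] = new Object();
        }
    }

    // The same id always maps to the same lock; floorMod keeps the index non-negative.
    private Object lockFor(int id) {
        return locks[Math.floorMod(id, locks.length)];
    }

    long getAmount(int id) {
        synchronized (lockFor(id)) {
            return amounts.getOrDefault(id, 0L);
        }
    }

    void addAmount(int id, long delta) {
        synchronized (lockFor(id)) {
            // Read-modify-write is safe because every access to this id holds the same lock.
            long current = amounts.getOrDefault(id, 0L);
            amounts.put(id, current + delta);
        }
    }
}
```

Different ids may share a lock, which costs some parallelism but never correctness.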
Solution 2
Use database transactions, and update with the pattern:
Transaction.begin();
cache.invalidate(id);
accountDAO.addAmount(id, value);
Transaction.commit();
Do not update the value directly inside the cache. This will lead to update races and needs locking again.
If the transactions are solely handled in the DAO, this means for your software architecture that the caching should be implemented in the DAO and not outside.
Solution 3
Why not just store the amount value in the cache? If it is allowed that the cache results may be inconsistent with the database content while updating, the simplest solution is:
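// Assumes the field is declared as: private final LoadingCache<Integer, Long> cache;
public AccountServiceImpl() {
    // Cache the amount itself rather than a mutable Account object.
    cache = CacheBuilder.newBuilder()
            .expireAfterAccess(1, TimeUnit.HOURS)
            .concurrencyLevel(4)
            .maximumSize(10000)
            .recordStats()
            .build(new CacheLoader<Integer, Long>() {
                @Override
                public Long load(Integer id) throws Exception {
                    return accountDAO.getAmountById(id);
                }
            });
}
public Long getAmount(Integer id) throws Exception {
    return cache.get(id);
}
public void addAmount(Integer id, Long value) throws Exception {
    accountDAO.addAmount(id, value);
    // Drop the stale entry; the next getAmount() reloads it from the database.
    cache.invalidate(id);
}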
No.
private LoadingCache cache;
should be final.
cache.get(id)
must be synchronized. Are you using a library for that?
The cache itself must be synchronized. Otherwise, if two threads update the amount at the same time, you can never be sure of the final result. Check the implementation of the `put` method of the library you are using.

Is it appropriate to use AtomicReference.compareAndSet to set a reference to the results of a database call?

I am implementing a simple cache with the cache stored as an AtomicReference.
private AtomicReference<Map<String, String>> cacheData;
The cache object should be populated (lazily) from a database table.
I provide a method to return the cache data to a caller, but if the data is null (ie. not loaded), then the code needs to load the data from the database. To avoid synchronized I thought of using the compareAndSet() method:
public Object getCacheData() {
cacheData.compareAndSet(null, getDataFromDatabase()); // atomic reload only if data not set!
return Collections.unmodifiableMap(cacheData.get());
}
Is it ok to use compareAndSet in this way ie. to involve a database call as part of the atomic action? Is it any better/worse than just synchronizing the method?
Many thanks for any advice..
You do not achieve expected behaviour. This expression:
cacheData.compareAndSet(null, getDataFromDatabase())
will always call getDataFromDatabase() first. This means that it doesn't matter if the data was cached or not. If it was, you still call the database, but discard the results. The cache is working, but the performance is equally poor.
Consider this instead:
if(cacheData.get() == null) {
cacheData.compareAndSet(null, unmodifiableMap(getDataFromDatabase()));
}
return cacheData.get();
It's not perfect (still getDataFromDatabase() can be called multiple times at the beginning), but will work later as expected. Also I moved Collections.unmodifiableMap() earlier so that you don't have to wrap the same map over and over again.
Which brings us to even simpler implementation (no synchronized or AtomicReference needed):
private volatile Map<String, String> cacheData;
if(cacheData == null) {
cacheData = unmodifiableMap(getDataFromDatabase());
}
return cacheData;
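If it matters that the database is queried at most once, the volatile field can be combined with double-checked locking. This is a different technique from the lock-free version above, sketched under assumptions: the LazyCache class is hypothetical and the Supplier stands in for getDataFromDatabase().

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Supplier;

final class LazyCache {
    private final Supplier<Map<String, String>> loader; // stand-in for getDataFromDatabase()
    private volatile Map<String, String> cacheData;

    LazyCache(Supplier<Map<String, String>> loader) {
        this.loader = loader;
    }

    Map<String, String> getCacheData() {
        Map<String, String> data = cacheData; // one volatile read on the fast path
        if (data == null) {
            synchronized (this) {
                data = cacheData; // re-check: another thread may have loaded it meanwhile
                if (data == null) {
                    data = Collections.unmodifiableMap(loader.get());
                    cacheData = data;
                }
            }
        }
        return data;
    }
}
```

The lock is only taken while the field is still null, so steady-state reads stay as cheap as in the plain volatile version.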

Persist guava cache on shutdown

I use the following guava cache to store messages for a specific time waiting for a possible response. So I use the cache more like a timeout for messages:
Cache cache = CacheBuilder.newBuilder().expireAfterWrite(7, TimeUnit.DAYS).build();
cache.put(id,message);
...
cache.getIfPresent(id);
In the end I need to persist the messages together with their current 'timeout' information on shutdown
and restore them on startup with the already elapsed time per entry taken into account. I couldn't find any methods that give me access to the time information so that I could handle it myself.
The Guava wiki says:
Your application will not need to store more data than what would fit in RAM. (Guava caches are local to a single run of your application. They do not store data in files, or on outside servers. If this does not fit your needs, consider a tool like Memcached.)
Do you think this restriction also applies to persisting a 'timeout' map on shutdown?
I don't believe there's any way to recreate the cache with per-entry expiration values -- even if you do use reflection. You might be able to simulate it by using a DelayQueue in a separate thread that explicitly invalidates entries that should have expired, but that's the best I think you can do.
That said, if you're just interested in peeking at the expiration information, I would recommend wrapping your cache values in a class that remembers the expiration time, so you can look up the expiration time for an entry just by looking up its value and calling a getExpirationTime() method or what have you.
That approach, at least, should not break with new Guava releases.
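A minimal sketch of such a wrapper follows. The Expiring and ExpiringSnapshots names are hypothetical, and serialization is left out: the sketch only shows remembering the absolute expiration time per value and dropping entries that are already expired when a persisted snapshot is loaded again.

```java
import java.util.HashMap;
import java.util.Map;

// Value wrapper that remembers its own absolute expiration time.
final class Expiring<V> {
    final V value;
    final long expiresAtMillis;

    Expiring(V value, long expiresAtMillis) {
        this.value = value;
        this.expiresAtMillis = expiresAtMillis;
    }

    boolean isExpired(long nowMillis) {
        return nowMillis >= expiresAtMillis;
    }
}

final class ExpiringSnapshots {
    // Keeps only the entries that are still alive at nowMillis, e.g. when reloading after a restart.
    static <K, V> Map<K, Expiring<V>> restore(Map<K, Expiring<V>> persisted, long nowMillis) {
        Map<K, Expiring<V>> alive = new HashMap<>();
        for (Map.Entry<K, Expiring<V>> entry : persisted.entrySet()) {
            if (!entry.getValue().isExpired(nowMillis)) {
                alive.put(entry.getKey(), entry.getValue());
            }
        }
        return alive;
    }
}
```

Since expireAfterWrite counts from the moment an entry is written, entries put back into a fresh cache would get a new full timeout, so readers would still need to check isExpired(System.currentTimeMillis()) on the wrapper after a restore.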
Well, unfortunately Guava doesn't seem to expose this functionality, but if you feel adventurous and absolutely must have this, you could always use reflection. Just look at the sources and see which methods you need. As always, care should be taken, as your code might break when Guava's internal implementation changes. The code below seems to work with Guava 10.0.1:
Cache<Integer, String> cache = CacheBuilder.newBuilder().expireAfterWrite(7, TimeUnit.DAYS).build(new CacheLoader<Integer, String>() {
@Override
public String load(Integer key) throws Exception {
return "The value is "+key.toString();
}
});
Integer key_1 = Integer.valueOf(1);
Integer key_2 = Integer.valueOf(2);
System.out.println(cache.get(key_1));
System.out.println(cache.get(key_2));
ConcurrentMap<Integer, String> map = cache.asMap();
Method m = map.getClass().getDeclaredMethod("getEntry", Object.class);
m.setAccessible(true);
for(Integer key: map.keySet()) {
Object value = m.invoke(map, key);
Method m2 = value.getClass().getDeclaredMethod("getExpirationTime", null);
m2.setAccessible(true);
Long expirationTime = (Long)m2.invoke(value, null);
System.out.println(key+" expiration time is "+expirationTime);
}

Use of Google-collections MapMaker?

I just came across this answer on SO where it is mentioned that the Google-collections MapMaker is awesome. I went through the documentation but couldn't really figure out where I can use it. Can anyone point out some scenarios where it would be appropriate to use a MapMaker?
Here's a quick sample of one way I've used MapMaker:
private final ConcurrentMap<Long, Foo> fooCache = new MapMaker()
.softValues()
.makeComputingMap(new Function<Long, Foo>() {
public Foo apply(Long id) {
return getFooFromServer(id);
}
});
public Foo getFoo(Long id) {
return fooCache.get(id);
}
When get(id) is called on the map, it'll either return the Foo that is in the map for that ID or it'll retrieve it from the server, cache it, and return it. I don't have to think about that once it's set up. Plus, since I've set softValues(), the cache can't fill up and cause memory issues since the system is able to clear entries from it in response to memory needs. If a cached value is cleared from the map, though, it can just ask the server for it again the next time it needs it!
The thing is, this is just one way it can be used. The option to have the map use strong, weak or soft keys and/or values, plus the option to have entries removed after a specific amount of time, lets you do lots of things with it.
It may help if you look at the descriptions of SoftReference and WeakReference.
SoftReference is very useful for use in caches, as they will be specifically cleared when memory gets low.
WeakReference tells the Garbage Collector that it can collect the object referenced to it as long as there are no strong references to it elsewhere. This is typically used with things that can be quickly looked up again as needed.
So, consider using MapMaker to create a ConcurrentMap with softValues for a cache, and one with weakKeys for temporary lookup tables.
Edit: softValues uses an LRU policy.
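To make the soft-value idea concrete without MapMaker, here is a rough sketch built only on java.lang.ref.SoftReference. The SoftValueCache class and its loader are hypothetical, and unlike a computing map this version may call the loader more than once for the same key under concurrent access.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Values are held through SoftReferences, so the GC may clear them when memory gets low.
final class SoftValueCache<K, V> {
    private final ConcurrentMap<K, SoftReference<V>> refs = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stand-in for something like getFooFromServer(id)

    SoftValueCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = refs.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Either never loaded, or the referent was cleared by the garbage collector.
            value = loader.apply(key);
            refs.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```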
