Guava cache considering older key - java

I am facing an issue with Guava caches. When I have only one element in the cache, things are fine. But when I load a second element, it tries to load it with the key of the earlier entry.
private static LoadingCache<String, MyClass> cache = null;
....
public MyClass method(final String id1, final long id2) {
log.error("inside with "+id1);
final String cacheKey = id1+"-"+id2;
if(cache == null){
cache = CacheBuilder.newBuilder()
.maximumSize(1000)
.build(
new CacheLoader<String, MyClass>() {
@Override
public MyClass load(String key) {
return getValue(cacheKey);
}
}
);
}
try {
return cache.get(cacheKey);
} catch (ExecutionException ex) {
log.error("EEE missing entry",ex);
}
}
private MyClass getValue(String cacheKey){
log.error("not from cache "+cacheKey);
...
}
The log says:
inside with 129890038707408035563943963861595603358
not from cache 1663659699-315839912047403113610285801857400882820 // This is key for the earlier entry
For example, when I call method("1", 2), it loads the value into the cache and I can get it from the cache subsequently. When I then call method("3", 4), which is not in the cache, getValue() is called and the log prints the key for method("1", 2).
Where am I going wrong?

The problem is in how you create your CacheLoader. It is initialized with one specific cache key: the value of the local variable cacheKey at the moment the cache is lazily created. Its load method ignores the key parameter and calls getValue(cacheKey), so every load uses that first key, whatever key was actually requested. The loader should instead rely on the key passed to load.
It should be this:
new CacheLoader<String, MyClass>() {
@Override
public MyClass load(String key) {
return getValue(key); // instead of return getValue(cacheKey);
}
}
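The effect can be reproduced without Guava. The following is a minimal sketch that uses ConcurrentHashMap.computeIfAbsent as a stand-in for the LoadingCache; the class KeyCaptureDemo and its two methods are hypothetical names used only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Stand-in for a lazily created LoadingCache; this is not Guava code.
final class KeyCaptureDemo {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Created on the first call, like the cache in the question.
    private Function<String, String> loader;

    // Buggy variant: the loader closes over the cacheKey of the first call.
    String getBuggy(String id1, long id2) {
        final String cacheKey = id1 + "-" + id2;
        if (loader == null) {
            loader = key -> "value for " + cacheKey; // ignores the key argument
        }
        return cache.computeIfAbsent(cacheKey, loader);
    }

    // Fixed variant: the loader only uses the key it is given.
    String getFixed(String id1, long id2) {
        final String cacheKey = id1 + "-" + id2;
        if (loader == null) {
            loader = key -> "value for " + key;
        }
        return cache.computeIfAbsent(cacheKey, loader);
    }
}
```

On a fresh instance, calling getBuggy("1", 2) and then getBuggy("3", 4) returns "value for 1-2" for the second call, which matches the behaviour in the log. The same two calls through getFixed on another fresh instance return "value for 3-4".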
NB: The way you initialize your cache is not thread-safe. If the cache has not been initialized yet and method is called by several threads concurrently, it can be created several times instead of once.
One way to fix this is the double-checked locking idiom:
private static volatile LoadingCache<String, MyClass> cache = null;
public MyClass method(final String id1, final long id2) {
...
if(cache == null){
synchronized (MyClass.class) {
if(cache == null){
cache = ...
}
}
}
NB: Do not initialize a static cache with a CacheLoader based on a non-static method; it is much too error-prone. Make both static or both non-static, but don't mix them.
Assuming that you can make both static, the cache initialization becomes very simple:
private static final LoadingCache<String, MyClass> cache = CacheBuilder.newBuilder()...
There is no need to initialize it lazily, which also simplifies your method a lot, as it reduces to:
public MyClass method(final String id1, final long id2) {
log.error("inside with "+id1);
final String cacheKey = id1+"-"+id2;
try {
return cache.get(cacheKey);
} catch (ExecutionException ex) {
log.error("EEE missing entry",ex);
throw new IllegalStateException(ex); // a return or throw is needed here for the method to compile
}
}

Related

Java ConcurrentHashMap#computeIfAbsent equivalent for AtomicReference

I'm looking for code equivalent to the following:
ConcurrentHashMap<Integer, Object> map = new ConcurrentHashMap<>();
map.computeIfAbsent(key, n -> f(n));
Where f(n) is an HTTP network call that blocks for the result,
but referring to a single element held in an AtomicReference<Object>, where I need to ensure that f is called only once even if multiple threads access it concurrently.
I tried using compareAndSet but this doesn't allow lambda passing.
Does updateAndGet achieve that? Its documentation mentions
The function should be side-effect-free, since it may be re-applied when attempted updates fail due to contention among threads.
Which doesn't seem to fill the need of invoking f only once.
I believe you need something like a concurrent lazy initializer.
There are several ways to achieve this.
If you need only one instance in the application, you can use a thread-safe singleton based on the initialization-on-demand holder idiom: https://en.wikipedia.org/wiki/Initialization-on-demand_holder_idiom
public class Something {
private final Result result;
private Something() {
result = f();
}
private static class LazyHolder {
public static final Something INSTANCE = new Something();
}
public static Something getInstance() {
return LazyHolder.INSTANCE;
}
}
If you want to have it in different places of your application, you can use:
Apache Commons Lang ConcurrentInitializer like LazyInitializer:
ConcurrentInitializer<Result> lazyInitializer = new LazyInitializer<Result>() {
@Override
protected Result initialize() throws ConcurrentException {
return f();
}
};
To get the instance:
Result instance = lazyInitializer.get();
Google's Guava Suppliers.memoize:
Supplier<Result> resultSupplier = Suppliers.memoize(new Supplier<Result>() {
public Result get() {
return f();
}
});
You can create your own concurrent lazy initializer in a lock-free manner.
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;
public class LazyConcurrentSupplier<T> implements Supplier<T> {
static class Container<T> {
public static final int NULL_PHASE = -1, CREATING_PHASE = 0, CREATED = 1;
final int i;
final T value;
public Container(int i, T value) {
this.i = i;
this.value = value;
}
}
private final Container<T> NULL = new Container<>(Container.NULL_PHASE, null),
CREATING = new Container<>(Container.CREATING_PHASE, null);
private final AtomicReference<Container<T>> ref = new AtomicReference<>(NULL);
private final Supplier<T> supplier;
public LazyConcurrentSupplier(Supplier<T> supplier) {
this.supplier = supplier;
}
@Override
public T get() {
Container<T> prev;
do {
if (ref.compareAndSet(NULL, CREATING)) {
T res = supplier.get();
ref.set(new Container<>(Container.CREATED, res));
return res;
} else {
prev = ref.get();
if (prev.i == Container.CREATED) {
return prev.value;
}
}
} while (prev.i < Container.CREATED);
return prev.value;
}
}
From your question, I think you want to avoid doing the HTTP request multiple times.
You could have a map of FutureTask(s) that asynchronously performs the HTTP request for you. In this way, if a thread tries to computeIfAbsent it will see the FutureTask created by another thread even if the HTTP operation is not done yet.
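For the single-element case in the question, the same idea works with an AtomicReference holding one FutureTask. This is a minimal sketch under that assumption; OnceSupplier is a hypothetical name, and the supplier stands in for the blocking HTTP call f.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helper: runs the supplier at most once, even under contention.
final class OnceSupplier<T> implements Supplier<T> {
    private final AtomicReference<FutureTask<T>> ref = new AtomicReference<>();
    private final Supplier<T> supplier;

    OnceSupplier(Supplier<T> supplier) {
        this.supplier = supplier;
    }

    @Override
    public T get() {
        FutureTask<T> task = ref.get();
        if (task == null) {
            FutureTask<T> created = new FutureTask<T>(supplier::get);
            if (ref.compareAndSet(null, created)) {
                task = created;
                task.run(); // only the thread that won the CAS executes the supplier
            } else {
                task = ref.get(); // another thread installed its task first
            }
        }
        try {
            return task.get(); // blocks until the winning thread has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Unlike updateAndGet, the compareAndSet here only installs the cheap FutureTask wrapper, so a lost race never re-runs the expensive call. One design consequence to be aware of: if the supplier throws, the failure is stored in the task and every later get() reports it, because this sketch does not retry.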
You could use an AtomicBoolean with an initial value of true and have each thread call AtomicBoolean::getAndSet with the value false. If the return value is true, that thread executes your function.
This ensures that the call is made only once, since only the first thread gets true. The other threads still need a way to wait for and read the result.

How to implement Guava cache to store and get different types of objects?

Right now my cache looks like the following:
public class TestCache {
private LoadingCache<String, List<ObjectABC>> cache;
TestCache() {
cache = CacheBuilder.newBuilder().expireAfterAccess(10, TimeUnit.MINUTES).maximumSize(25)
.build(new CacheLoader<String, List<ObjectABC>>(
) {
@Override
public List<ObjectABC> load(String key) throws Exception {
// TODO Auto-generated method stub
return addCache(key);
}
});
}
private List<ObjectABC> addCache(String key) {
final JoiObjectMapper mapper = new JoiObjectMapper();
final Collection<File> allConfigFiles = FileUtils.listFiles(new File(key), null, true);
final List<ObjectABC> configsList = new ArrayList<>();
allConfigFiles.forEach(configFile -> {
try {
configsList.add(mapper.readValue(configFile, new TypeReference<ObjectABC>() {
}));
} catch (Exception e) {
throw new RuntimeException(e);
}
});
return configsList;
}
public List<ObjectABC> getEntry(String key) {
try {
return cache.get(key);
} catch (ExecutionException e) {
throw new NonRetriableException(String.format(
"Exception occurred while trying to get data from cache for the key: %s Exception: %s",
key, e)); // String.format uses %s placeholders, not {}
}
}
}
In the above code, when I pass a String key (which is path to a local folder) it takes all the files present in that location and maps them to ObjectABC using ObjectMapper.
Now my problem is that I want to instead have a generic loading cache like
LoadingCache<String, List<Object>>.
And I want to map files in different folders to different Objects, e.g. map files in /root/Desktop/folder1 to List<ObjectABC> and map files in /root/Desktop/folder2 to List<ObjectDEF> and be able to store and retrieve that information from the cache.
How can I pass to the cache the information of which object to use for mapping?
You can create a custom class wrapping a LoadingCache<Key<?>, Object> like that:
class HeterogeneousCache {
private final LoadingCache<Key<?>, Object> cache;
public <T> T get(Key<T> key) throws ExecutionException {
return key.getType().cast(cache.get(key));
}
}
@Value // provides constructor, getters, equals, hashCode
class Key<T> {
private final String identifier;
private final Class<T> type;
}
(I used Lombok's @Value annotation for simplicity)
Of course, this is just a stub and you might need to adapt this to your needs. The main problem might be that you can't get a Class<List<ObjectABC>> - you can only get a Class<List>. The easiest way out of this is to wrap the List<ObjectABC> in some custom type. The harder way (not recommended) is to use Guava's TypeToken.
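To make the pattern concrete without Guava or Lombok, here is a minimal sketch in which a ConcurrentHashMap stands in for the wrapped LoadingCache. TypedKey and TypedStore are hypothetical names, and the loader is passed per call only to keep the example self-contained.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical key type: an identifier plus the class of the value stored under it.
final class TypedKey<T> {
    final String identifier;
    final Class<T> type;

    TypedKey(String identifier, Class<T> type) {
        this.identifier = identifier;
        this.type = type;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TypedKey)) return false;
        TypedKey<?> other = (TypedKey<?>) o;
        return identifier.equals(other.identifier) && type.equals(other.type);
    }

    @Override
    public int hashCode() {
        return Objects.hash(identifier, type);
    }
}

// Plain-map stand-in for the wrapped LoadingCache<Key<?>, Object>.
final class TypedStore {
    private final Map<TypedKey<?>, Object> values = new ConcurrentHashMap<>();

    <T> T get(TypedKey<T> key, Function<TypedKey<T>, T> loader) {
        // The loader runs only if nothing is stored under this key yet.
        Object value = values.computeIfAbsent(key, k -> loader.apply(key));
        return key.type.cast(value); // checked cast, no unchecked warning
    }
}
```

Because the key carries its Class<T>, the cast is checked at runtime and callers get a typed result without casting. Keys need value semantics (equals and hashCode over both fields) so that two equal keys hit the same entry.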
Attribution: This answer is based on the post by Frank Appel entitled How to Map Distinct Value Types Using Java Generics, which itself is based on Joshua Bloch's typesafe heterogeneous containers from Effective Java.
Edit: A Complete Solution
Since the OP wants List<T> as result, and since he needs instances of TypeReference<T>, I replaced Class<T> with TypeReference<T> in Key<T>:
@Value // provides constructor, getters, equals, hashCode
class Key<T> {
private final String identifier;
private final TypeReference<T> typeReference;
}
Here's how CustomHeterogeneousCache looks now:
class CustomHeterogeneousCache {
private final LoadingCache<Key<?>, List<?>> cache = CacheBuilder.newBuilder()
.expireAfterAccess(10, TimeUnit.MINUTES)
.maximumSize(25)
.build(CacheLoader.from(this::computeEntry));
@SuppressWarnings("unchecked")
public <T> List<T> getEntry(Key<T> key) {
return (List<T>) cache.getUnchecked(key);
}
private <T> List<T> computeEntry(Key<T> key) {
final JoiObjectMapper mapper = new JoiObjectMapper();
final Collection<File> allConfigFiles = FileUtils.listFiles(new File(key.getIdentifier()), null, true);
return allConfigFiles.stream()
.map(configFile -> {
try {
return mapper.readValue(configFile, key.getTypeReference());
} catch (Exception e) {
throw new RuntimeException(e);
}
})
.collect(Collectors.toList());
}
}
Since implementations of TypeReference do not have value semantics, the user must make sure that every Key is created once, and then only referenced, e.g.:
class Keys {
public static final Key<ObjectABC> ABC = new Key<>("/root/Desktop/folder1", new TypeReference<ObjectABC>() {
});
public static final Key<ObjectDEF> DEF = new Key<>("/root/Desktop/folder2", new TypeReference<ObjectDEF>() {
});
}

ConcurrentHashMap dilemma in Java

ConcurrentHashMap provides a way to atomically check for and add an element if it is not present, via the putIfAbsent method, as shown in the example below:
xmlObject = new XMLObject(xmlId);
mapOfXMLs.putIfAbsent(xmlId, xmlObject);
However, my dilemma is that I have to create that xmlObject in advance. Is there a way to postpone the object creation until after the key-present check?
I want all three things below to happen atomically:
Check whether the key is present.
Create the object if the key is not present.
Add the object to the map.
I know I can achieve this using a synchronized block, but if I am using a synchronized block, why use a ConcurrentHashMap?
The Guava Caches offer such a functionality ( http://code.google.com/p/guava-libraries/wiki/CachesExplained ) though it's somewhat hidden.
If you can already use Java 8, then you can use computeIfAbsent. But I guess if you could use it, you would not have asked....
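A minimal sketch of the computeIfAbsent approach follows. XmlObjectStub is a hypothetical stand-in for the XMLObject in the question, with a counter added only to make the number of constructor calls visible; XmlRegistry is likewise an illustrative name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the XMLObject from the question.
final class XmlObjectStub {
    static final AtomicInteger CREATED = new AtomicInteger();
    final String id;

    XmlObjectStub(String id) {
        this.id = id;
        CREATED.incrementAndGet(); // counts constructor calls
    }
}

final class XmlRegistry {
    private final ConcurrentHashMap<String, XmlObjectStub> mapOfXmls = new ConcurrentHashMap<>();

    XmlObjectStub get(String xmlId) {
        // The mapping function runs only if xmlId has no value yet.
        return mapOfXmls.computeIfAbsent(xmlId, XmlObjectStub::new);
    }
}
```

ConcurrentHashMap documents computeIfAbsent as atomic, with the function applied at most once per key, so the check, the creation and the insertion happen as one step. The mapping function should be short and must not update the same map.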
The standard, almost perfect pattern is this:
Foo foo = map.get(key);
if(foo == null) {
map.putIfAbsent(key, new Foo());
foo = map.get(key);
}
It does sometimes result in an extra object, but extremely infrequently, so from a performance standpoint is certainly fine. It only wouldn't be fine if constructing your object inserted into a database or charged a user or some such.
I've encountered this scenario a couple of times, and they allowed for the value to be created lazily. It may not apply to your use case, but if it does, this is basically what I did:
static abstract class Lazy<T> {
private volatile T value;
protected abstract T initialValue();
public T get() {
T tmp = value;
if (tmp == null) {
synchronized (this) {
tmp = value;
if (tmp == null)
value = tmp = initialValue();
}
}
return tmp;
}
}
static ConcurrentHashMap<Integer, Lazy<XmlObject>> map = new ConcurrentHashMap<>();
and then populating the map:
final int id = 1;
map.putIfAbsent(id, new Lazy<XmlObject>() {
@Override
protected XmlObject initialValue() {
return new XmlObject(id);
}
});
System.out.println(map.get(id).get());
You can of course create a specialized LazyXmlObject for convenience:
static class LazyXmlObject extends Lazy<XmlObject> {
private final int id;
public LazyXmlObject(int id) {
super();
this.id = id;
}
@Override
protected XmlObject initialValue() {
return new XmlObject(id);
}
}
and the usage would be:
final int id = 1;
map.putIfAbsent(id, new LazyXmlObject(id));
System.out.println(map.get(id).get());

Ways to reduce boilerplate using guava cache

I have dozens of data access objects like PersonDao with methods like:
Person findById(String id) {}
List<Person> search(String firstName, LastName, Page) {}
int searchCount(String firstName, LastName) {}
I've experimented by adding guava cache with one of these classes and it's really nice, but there's a lot of boilerplate.
Here's an example of making findById look in the cache first:
private final LoadingCache<String, Person> cacheById = CacheBuilder.newBuilder()
.maximumSize(maxItemsInCache)
.expireAfterWrite(cacheExpireAfterMinutes, TimeUnit.MINUTES)
.build(new CacheLoader<String, Person>() {
@Override
public Person load(String key) {
return findByIdNoCache(key);
}
});
//.... and update findById to call the cache ...
@Override
public Person findById(String id) {
return cacheById.getUnchecked(id);
}
So, because each method has different params and return types, I end up creating a separate CacheLoader for every method!
I tried consolidating everything into a single CacheLoader that returns Object type and accepts a Map of objects, but then I end up with big ugly if/else to figure out which method to call to load the cache.
I'm struggling to find an elegant way to add caching to these data access objects, any suggestions? Maybe guava cache isn't meant for this use case?
Try this. Unfortunately, there are compiler warnings due to generics, but we can suppress them because we know nothing bad will happen.
public class CacheContainer {
private static final long maxItemsInCache = 42;
private static final long cacheExpireAfterMinutes = 42;
private final Map<String, LoadingCache> caches = Maps.newHashMap();
public <K, V> V getFromCache(String cacheId, K key, CacheLoader<K, V> loader) throws ExecutionException {
LoadingCache<K, V> cache = caches.get(cacheId);
if (cache == null) {
cache = CacheBuilder.newBuilder().
maximumSize(maxItemsInCache).
expireAfterWrite(cacheExpireAfterMinutes, TimeUnit.MINUTES).
build(loader);
caches.put(cacheId, cache);
}
return cache.get(key);
}
}
And then in your dao:
private final CacheContainer cacheContainer = new CacheContainer();
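public Person findById(String id) {
try {
return cacheContainer.getFromCache("personById", id, new CacheLoader<String, Person>() {
@Override
public Person load(String key) {
return findByIdNoCache(key);
}
});
} catch (ExecutionException e) {
throw new IllegalStateException(e.getCause()); // getFromCache declares ExecutionException
}
}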
Other methods in the same way. I don't think you can reduce boilerplate any more than that.
Creating a CacheLoader (and separate cache) for each method you want to cache the results of is necessary. You could simplify things a little by creating a single CacheBuilder with the cache configuration you want and then creating each of the caches like that:
private final CacheBuilder<Object, Object> builder = CacheBuilder.newBuilder()
.maximumSize(maxItemsInCache)
.expireAfterWrite(cacheExpireAfterMinutes, TimeUnit.MINUTES);
private final LoadingCache<String, Person> cacheById = builder.build(
new CacheLoader<String, Person>() {
// ...
});
private final LoadingCache<Search, List<Person>> searchCache = builder.build(
new CacheLoader<Search, List<Person>>() {
// ...
});
// etc.
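The per-method loader can also shrink to a single method reference. The sketch below shows the shape of that idea with the standard library only: Memoizer and PersonDaoSketch are hypothetical names, a ConcurrentHashMap stands in for the Guava cache, and there is no size limit or expiry, so this is not a replacement for the CacheBuilder configuration above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class Memoizer {
    private Memoizer() {}

    // Wraps a loader so each distinct key is loaded once and then served from memory.
    static <K, V> Function<K, V> memoize(Function<K, V> loader) {
        Map<K, V> cache = new ConcurrentHashMap<>();
        return key -> cache.computeIfAbsent(key, loader);
    }
}

// Hypothetical DAO showing one line of caching setup per method.
final class PersonDaoSketch {
    int loads = 0; // counts uncached lookups, for illustration only

    private final Function<String, String> cachedById = Memoizer.memoize(this::findByIdNoCache);

    String findById(String id) {
        return cachedById.apply(id);
    }

    private String findByIdNoCache(String id) {
        loads++;
        return "person-" + id;
    }
}
```

With Guava itself, CacheLoader.from accepts a function in the same way, so builder.build(CacheLoader.from(this::findByIdNoCache)) would replace the anonymous CacheLoader class. Methods with several parameters still need a key type that bundles them, such as the Search key used above.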

WeakMultiton: ensuring there's only one object for a specific database row

In my application I need to ensure that for an entity representing a data row in a database
I have at most one java object representing it.
Ensuring that they are equals() is not enough, since I could get caught by coherency problems.
So basically I need a multiton; moreover, I need not to keep this object in memory when it is not necessary, so I will be using weak references.
I have devised this solution:
package com.example;
public class DbEntity {
// a DbEntity holds a strong reference to its key, so as long as someone holds a
// reference to it the key won't be evicted from the WeakHashMap
private String key;
public void setKey(String key) {
this.key = key;
}
public String getKey() {
return key;
}
//other stuff that makes this object actually useful.
}
package com.example;
import java.lang.ref.WeakReference;
import java.util.WeakHashMap;
import java.util.concurrent.locks.ReentrantLock;
public class WeakMultiton {
private ReentrantLock mapLock = new ReentrantLock();
private WeakHashMap<String, WeakReference<DbEntity>> entityMap = new WeakHashMap<String, WeakReference<DbEntity>>();
private void fill(String key, DbEntity object) throws Exception {
// do slow stuff, typically fetch data from DB and fill the object.
}
public DbEntity get(String key) throws Exception {
DbEntity result = null;
WeakReference<DbEntity> resultRef = entityMap.get(key);
if (resultRef != null){
result = resultRef.get();
}
if (result == null){
mapLock.lock();
try {
resultRef = entityMap.get(key);
if (resultRef != null){
result = resultRef.get();
}
if (result == null){
result = new DbEntity();
synchronized (result) {
// A DbEntity holds a strong reference to its key, so the key won't be evicted from the map
// as long as result is reachable.
entityMap.put(key, new WeakReference<DbEntity>(result));
// I unlock the map, but result is still locked.
// Keeping the map locked while querying the DB would serialize database calls!
// If someone tries to get the same DbEntity the method will wait to return until I get out of this synchronized block.
mapLock.unlock();
fill(key, result);
// I need the key to be exactly this String, not just an equal one!!
result.setKey(key);
}
}
} finally {
// I have to check since I could have already released the lock.
if (mapLock.isHeldByCurrentThread()){
mapLock.unlock();
}
}
}
// I synchronize on result since some other thread could have instantiated it but still being busy initializing it.
// A performance penalty, but still better than synchronizing on the whole map.
synchronized (result) {
return result;
}
}
}
WeakMultiton will be instantiated only in the database wrapper (single point of access to the database) and its get(String key) will of course be the only way to retrieve a DbEntity.
Now, to the best of my knowledge this should work, but since this stuff is pretty new to me, I fear I could be overlooking something about the synchronization or the weak references!
Can you spot any flaw or suggest improvements?
I found out about guava's MapMaker and wrote this generic AbstractWeakMultiton:
package com.example;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;
import com.google.common.collect.MapMaker;
public abstract class AbstractWeakMultiton<K,V, E extends Exception> {
private ReentrantLock mapLock = new ReentrantLock();
private Map<K, V> entityMap = new MapMaker().concurrencyLevel(1).weakValues().<K,V>makeMap();
protected abstract void fill(K key, V value) throws E;
protected abstract V instantiate(K key);
protected abstract boolean isNullObject(V value);
public V get(K key) throws E {
V result = null;
result = entityMap.get(key);
if (result == null){
mapLock.lock();
try {
result = entityMap.get(key);
if (result == null){
result = this.instantiate(key);
synchronized (result) {
entityMap.put(key, result);
// I unlock the map, but result is still locked.
// Keeping the map locked while querying the DB would serialize database calls!
// If someone tries to get the same object the method will wait to return until I get out of this synchronized block.
mapLock.unlock();
fill(key, result);
}
}
} finally {
// I have to check since the exception could have been thrown after I had already released the lock.
if (mapLock.isHeldByCurrentThread()){
mapLock.unlock();
}
}
}
// I synchronize on result since some other thread could have instantiated it but still being busy initializing it.
// A performance penalty, but still better than synchronizing on the whole map.
synchronized (result) {
// I couldn't have a null result because I needed to synchronize on it,
// so now I check whether it's a mock object and return null in case.
return isNullObject(result)?null:result;
}
}
}
It has the following advantages over my earlier attempt:
It does not depend on the fact that values hold a strong reference to the key
It does not need to do the awkward double checking for expired weak references
It is reusable
On the other hand, it depends on the rather beefy Guava library, while the first solution used just classes from the runtime environment. I can live with that.
I'm obviously still looking for further improvements and error spotting, and basically everything that answers the most important question: will it work?
