Lazy initialization / memoization without volatile - java

It appears that the Java Memory Model does not actually define "refreshing" and "flushing" of a thread's local cache; people only describe it that way for simplicity, while the "happens-before" relationship somehow implies such refreshing and flushing (it would be great if you could explain that, but it is not directly part of the question).
This confuses me, especially since the section about the Java Memory Model in the JLS is not written in a way that makes it easy to understand.
Could you therefore tell me whether the assumptions I made in the following code are correct, and whether it is therefore guaranteed to run correctly?
It is partially based on the code in the Wikipedia article on double-checked locking; however, there the author used a wrapper class (FinalWrapper), and the reason for this is not entirely obvious to me. Maybe to support null values?
public class Memoized<T> {
private T value;
private volatile boolean _volatile;
private final Supplier<T> supplier;
public Memoized(Supplier<T> supplier) {
this.supplier = supplier;
}
public T get() {
/* Apparently have to use local variable here, otherwise return might use older value
* see https://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
*/
T tempValue = value;
if (tempValue == null) {
// Refresh
if (_volatile);
tempValue = value;
if (tempValue == null) {
// Entering refreshes, or have to use `if (_volatile)` again?
synchronized (this) {
tempValue = value;
if (tempValue == null) {
value = tempValue = supplier.get();
}
/*
* Exit should flush changes
* "Flushing" does not actually exists, maybe have to use
* `_volatile = true` instead to establish happens-before?
*/
}
}
}
return tempValue;
}
}
Also, I have read that the constructor call can be inlined and reordered, resulting in a reference to an uninitialized object (see this comment on a blog). Is it then safe to assign the result of the supplier directly, or does this have to be done in two steps?
value = tempValue = supplier.get();
Two steps:
tempValue = supplier.get();
// Reorder barrier, maybe not needed?
if (_volatile);
value = tempValue;
Edit: The title of this question is a little misleading; the goal was to reduce the usage of a volatile field. If the initialized value is already in a thread's cache, then value is accessed directly without needing to look in main memory again.

You can reduce usage of volatile if you have only a few singletons. Note: you have to repeat this code for each singleton.
enum LazyX {
;
static volatile Supplier<X> xSupplier; // set somewhere before use
static class Holder {
static final X x = xSupplier.get();
}
public static X get() {
return Holder.x;
}
}
If you know the Supplier, this becomes simpler:
enum LazyXpensive {
;
// called only once in a thread safe manner
static final Xpensive x = new Xpensive();
// after class initialisation, this is a non volatile read
public static Xpensive get() {
return x;
}
}
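A small sketch to check the claim that the initialization runs exactly once, even when several threads call get() concurrently. `Expensive`, `LazyExpensive` and the `CREATED` counter are hypothetical names used only for illustration; the guarantee itself comes from the JVM performing class initialization under a lock.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical expensive object; the counter only exists to observe how often it is built.
class Expensive {
    static final AtomicInteger CREATED = new AtomicInteger();

    Expensive() {
        CREATED.incrementAndGet();
    }
}

enum LazyExpensive {
    ;

    // Initialized when LazyExpensive itself is initialized, i.e. on the first get() call.
    // Class initialization is done under a JVM lock, so this constructor runs at most once.
    static final Expensive INSTANCE = new Expensive();

    // After initialization this is a plain read of a final static field.
    static Expensive get() {
        return INSTANCE;
    }
}
```

Accessing `Expensive.CREATED` does not initialize `LazyExpensive`, so the counter stays at zero until the first get() call.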
You can avoid making the field volatile by using Unsafe:
import sun.misc.Unsafe;
import java.lang.reflect.Field;
import java.util.function.Supplier;
public class LazyHolder<T> {
static final Unsafe unsafe = getUnsafe();
static final long valueOffset = getValueOffset();
Supplier<T> supplier;
T value;
public LazyHolder(Supplier<T> supplier) {
this.supplier = supplier;
}
public T get() {
T value = this.value;
if (value != null) return value;
return getOrCreate();
}
private T getOrCreate() {
T value;
value = (T) unsafe.getObjectVolatile(this, valueOffset);
if (value != null) return value;
synchronized (this) {
value = this.value;
if (value != null) return value;
this.value = supplier.get();
supplier = null;
return this.value;
}
}
public static Unsafe getUnsafe() {
try {
Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
theUnsafe.setAccessible(true);
return (Unsafe) theUnsafe.get(null);
} catch (NoSuchFieldException | IllegalAccessException e) {
throw new AssertionError(e);
}
}
private static long getValueOffset() {
try {
return unsafe.objectFieldOffset(LazyHolder.class.getDeclaredField("value"));
} catch (NoSuchFieldException e) {
throw new AssertionError(e);
}
}
}
However, the extra lookup is a micro-optimisation. If you are willing to take a synchronisation hit once per thread, you can avoid using volatile altogether.
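On Java 9 and later, the same acquire/release accesses can be expressed without `sun.misc.Unsafe` through `java.lang.invoke.VarHandle`. This is a swapped-in technique, not the code above, and `LazyRef` is a hypothetical name. Unlike the Unsafe version, this sketch also uses an acquire read on the fast path, because a plain racy read alone does not create a happens-before edge with the writing thread.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.Objects;
import java.util.function.Supplier;

// Sketch: lazy holder using VarHandle acquire/release instead of Unsafe (Java 9+).
class LazyRef<T> {
    private static final VarHandle VALUE;

    static {
        try {
            // The generic field 'T value' erases to Object.
            VALUE = MethodHandles.lookup().findVarHandle(LazyRef.class, "value", Object.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private Supplier<T> supplier; // cleared after use; only touched while holding 'this'
    private T value;              // read via VALUE outside the lock

    LazyRef(Supplier<T> supplier) {
        this.supplier = supplier;
    }

    @SuppressWarnings("unchecked")
    T get() {
        // Acquire read: if non-null, everything written before the release store is visible.
        T v = (T) VALUE.getAcquire(this);
        if (v != null) {
            return v;
        }
        synchronized (this) {
            v = value; // a plain read is fine while holding the lock
            if (v == null) {
                // Like the original, null results are not supported (null means "not yet set").
                v = Objects.requireNonNull(supplier.get(), "supplier returned null");
                // Release store publishes the fully constructed object.
                VALUE.setRelease(this, v);
                supplier = null; // let the supplier be garbage collected
            }
            return v;
        }
    }
}
```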

Your code is not thread safe, which can easily be shown by stripping off all irrelevant parts:
public class Memoized<T> {
private T value;
// irrelevant parts omitted
public T get() {
T tempValue = value;
if (tempValue == null) {
// irrelevant parts omitted
}
return tempValue;
}
}
So value has no volatile modifier, and you're reading it within the get() method without synchronization and, when it is non-null, proceeding to use it without any synchronization.
This code path alone already breaks the code, regardless of what you do when assigning value, as all thread-safe constructs require both ends, the reading and the writing side, to use a compatible synchronization mechanism.
The fact that you are using esoteric constructs like if (_volatile); becomes irrelevant then, as the code is already broken.
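To illustrate what "both ends" means, here is a minimal sketch of safe publication through a volatile flag. The writer stores the data and then writes the volatile field; a reader that observes the flag set is guaranteed, through happens-before, to see the data. This is roughly what the informal "flush on write, refresh on read" wording describes. `Publisher` and its fields are hypothetical, and the sketch assumes publish() is called only once.

```java
// Minimal sketch of safe publication: writer and reader use the same volatile field.
class Publisher {
    private int[] data;             // plain field
    private volatile boolean ready; // the synchronization point shared by both sides

    // Writer side: the plain write to 'data' happens-before the volatile write to 'ready'.
    void publish(int[] values) {
        data = values;
        ready = true;
    }

    // Reader side: a volatile read that sees 'true' synchronizes-with the write above,
    // so the following plain read of 'data' sees the published array and its contents.
    int[] tryRead() {
        if (ready) {
            return data;
        }
        return null; // not published yet
    }
}
```

If either side skipped the volatile access (for example a reader that only reads `data`), the happens-before edge would be gone and a reader could see a stale or partially initialized array.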
The reason why the Wikipedia example uses a wrapper with a final field is that immutable objects using only final fields are immune to data races; hence, they are the only construct that is safe to use when their reference has been read without a synchronization action.
Note that since lambda expressions fall into the same category, you can use them to simplify the example for your use case:
public class Memoized<T> {
private boolean initialized;
private Supplier<T> supplier;
public Memoized(Supplier<T> supplier) {
this.supplier = () -> {
synchronized(this) {
if(!initialized) {
T value = supplier.get();
this.supplier = () -> value;
initialized = true;
}
}
return this.supplier.get();
};
}
public T get() {
return supplier.get();
}
}
Here, supplier.get() within Memoized.get() may read an updated value of supplier without a synchronization action, in which case it will still read the correct value, because the captured value is implicitly final. If the method reads an outdated value for the supplier reference, it will end up in the synchronized(this) block, which uses the initialized flag to determine whether the evaluation of the original supplier is necessary.
Since the initialized field will only be accessed within the synchronized(this) block, it will always evaluate to the correct value. This block will be executed at most once for every thread, whereas only the first one will evaluate get() on the original supplier. Afterwards, each thread will use the () -> value supplier, returning the value without needing any synchronization actions.
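For comparison, a sketch of the FinalWrapper approach from the Wikipedia article, adapted to a Supplier. The class names are assumptions for illustration. Because the wrapper's field is final, a thread that sees a non-null wrapper through a racy read also sees the wrapped value. It also confirms the guess in the question: since initialization is signalled by the wrapper rather than by the value, a null result can be memoized too.

```java
import java.util.function.Supplier;

// Sketch of the FinalWrapper variant of double-checked locking, adapted to a Supplier.
class FinalWrapperMemoized<T> {
    // Immutable holder: final-field semantics make the racy read in get() safe.
    private static final class FinalWrapper<V> {
        final V value;

        FinalWrapper(V value) {
            this.value = value;
        }
    }

    private final Supplier<T> supplier;
    private FinalWrapper<T> wrapper; // deliberately not volatile

    FinalWrapperMemoized(Supplier<T> supplier) {
        this.supplier = supplier;
    }

    T get() {
        // Read the shared field exactly once, so the null check and the return use the same reference.
        FinalWrapper<T> w = wrapper;
        if (w == null) {
            synchronized (this) {
                if (wrapper == null) {
                    wrapper = new FinalWrapper<>(supplier.get());
                }
                w = wrapper;
            }
        }
        // A non-null wrapper means initialization is complete, even if the value itself is null.
        return w.value;
    }
}
```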

Related

Java ConcurrentHashMap#computeIfAbsent equivalent for AtomicReference

I'm looking for code equivalent to the following:
ConcurrentHashMap<Integer, Object> map = new ConcurrentHashMap<>();
map.computeIfAbsent(key, n -> f(n));
where f(n) is an HTTP network call that blocks for the result,
but referring to a single element held in an AtomicReference<Object>, where I need to ensure that f is called only once, even if multiple threads access it concurrently.
I tried using compareAndSet, but it doesn't allow passing a lambda.
Does updateAndGet achieve that? Its documentation mentions:
The function should be side-effect-free, since it may be re-applied when attempted updates fail due to contention among threads.
This doesn't seem to meet the requirement of invoking f only once.
I believe you need something like a concurrent lazy initializer.
It is possible to achieve this using:
If your requirement is to have only 1 instance in an application, you can use a thread-safe singleton. https://en.wikipedia.org/wiki/Initialization-on-demand_holder_idiom
public class Something {
private final Result result;
private Something() {
result = f();
}
private static class LazyHolder {
public static final Something INSTANCE = new Something();
}
public static Something getInstance() {
return LazyHolder.INSTANCE;
}
}
If you want to have it in different places of your application, you can use:
Apache Commons Lang's ConcurrentInitializer implementations, such as LazyInitializer:
ConcurrentInitializer<Result> lazyInitializer = new LazyInitializer<Result>() {
@Override
protected Result initialize() throws ConcurrentException {
return f();
}
};
Get the instance:
Result instance = lazyInitializer.get();
Google's Guava (Suppliers.memoize):
Supplier<Result> resultSupplier = Suppliers.memoize(new Supplier<Result>() {
public Result get() {
return f();
}
});
You can create your own concurrent lazy initializer without locks (threads that arrive while the value is being created spin until it is ready):
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;
public class LazyConcurrentSupplier<T> implements Supplier<T> {
static class Container<T> {
public static final int NULL_PHASE = -1, CREATING_PHASE = 0, CREATED = 1;
final int i;
final T value;
public Container(int i, T value) {
this.i = i;
this.value = value;
}
}
private final Container<T> NULL = new Container<>(Container.NULL_PHASE, null),
CREATING = new Container<>(Container.CREATING_PHASE, null);
private final AtomicReference<Container<T>> ref = new AtomicReference<>(NULL);
private final Supplier<T> supplier;
public LazyConcurrentSupplier(Supplier<T> supplier) {
this.supplier = supplier;
}
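@Override
public T get() {
Container<T> prev;
do {
if (ref.compareAndSet(NULL, CREATING)) {
T res;
try {
res = supplier.get();
} catch (RuntimeException | Error e) {
// Reset so that spinning threads do not wait forever; a later call retries.
ref.set(NULL);
throw e;
}
ref.set(new Container<>(Container.CREATED, res));
return res;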
} else {
prev = ref.get();
if (prev.i == Container.CREATED) {
return prev.value;
}
}
} while (prev.i < Container.CREATED);
return prev.value;
}
}
From your question, I think you want to avoid doing the HTTP request multiple times.
You could have a map of FutureTask(s) that asynchronously performs the HTTP request for you. In this way, if a thread tries to computeIfAbsent it will see the FutureTask created by another thread even if the HTTP operation is not done yet.
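A minimal sketch of that idea, along the lines of the well-known FutureTask-based memoizer. `FutureMemoizer` is a hypothetical name and `compute` stands in for the blocking HTTP call f(n). The first thread to install a FutureTask for a key runs it; every other thread asking for the same key blocks on that same task.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

// Sketch: at most one computation per key; concurrent callers wait for the same result.
class FutureMemoizer<K, V> {
    private final ConcurrentMap<K, FutureTask<V>> tasks = new ConcurrentHashMap<>();
    private final Function<K, V> compute; // e.g. the blocking network call f(n)

    FutureMemoizer(Function<K, V> compute) {
        this.compute = compute;
    }

    V get(K key) throws InterruptedException, ExecutionException {
        FutureTask<V> task = tasks.get(key);
        if (task == null) {
            FutureTask<V> created = new FutureTask<>(() -> compute.apply(key));
            // putIfAbsent returns the task another thread installed first, or null if we won.
            task = tasks.putIfAbsent(key, created);
            if (task == null) {
                task = created;
                created.run(); // run the computation in the calling thread
            }
        }
        // Blocks until the winning thread has finished; rethrows its failure, if any.
        return task.get();
    }
}
```

Note that a failed computation stays in the map in this sketch; removing the entry on failure would allow a retry.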
You could use an AtomicBoolean with an initial value of true and have each thread call AtomicBoolean::getAndSet with the value false. If the return value is true, then you execute your function.
This ensures that the call is made only once, since only the first thread will succeed.
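A sketch of that approach; `RunOnce` is a hypothetical name. It only guarantees that the action runs once: threads that lose the race return immediately and neither wait for nor receive a result, so on its own it does not replace computeIfAbsent.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: run an action at most once across all threads.
class RunOnce {
    private final AtomicBoolean pending = new AtomicBoolean(true);
    private final Runnable action;

    RunOnce(Runnable action) {
        this.action = action;
    }

    // Returns true only for the single caller that actually ran the action.
    boolean runIfFirst() {
        // getAndSet is atomic: exactly one thread observes the initial 'true'.
        if (pending.getAndSet(false)) {
            action.run();
            return true;
        }
        return false; // another thread already claimed it (it may still be running)
    }
}
```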

Guava cache considering older key

I am facing an issue with Guava caches. When I have only one element in the cache, things are fine. But when I load a second element, it tries to load it with the key of the earlier entry.
private static LoadingCache<String, MyClass> cache = null;
....
public MyClass method(final String id1, final long id2) {
log.error("inside with "+id1);
final String cacheKey = id1+"-"+id2;
if(cache == null){
cache = CacheBuilder.newBuilder()
.maximumSize(1000)
.build(
new CacheLoader<String, MyClass>() {
@Override
public MyClass load(String key) {
return getValue(cacheKey);
}
}
);
}
try {
return cache.get(cacheKey);
} catch (ExecutionException ex) {
log.error("EEE missing entry",ex);
}
}
private MyClass getValue(String cacheKey){
log.error("not from cache "+cacheKey);
...
}
The log says:
inside with 129890038707408035563943963861595603358
not from cache 1663659699-315839912047403113610285801857400882820 // This is key for the earlier entry
For example, when I call method("1", 2), it loads the value into the cache and I am able to get it from the cache subsequently. Now I call method("3", 4); this is not in the cache, so getValue() is called, and the log prints the key for method("1", 2).
Where am I going wrong?
Your problem is related to how you create your CacheLoader. If you look carefully, you will see that you initialize it with a fixed cache key (the value of the local variable cacheKey at the time the cache is lazily initialized). It should instead be generic and rely on the key passed as a parameter to the load method of your CacheLoader; otherwise it will always load by calling getValue with that same first key.
It should be this:
new CacheLoader<String, MyClass>() {
@Override
public MyClass load(String key) {
return getValue(key); // instead of return getValue(cacheKey);
}
}
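The effect can be reproduced without Guava. In this hypothetical sketch (`CaptureBugDemo` and its methods are illustrative names, single-threaded and not thread safe), the loader is created lazily on the first call and captures that call's local cacheKey, so every later call loads the first key instead of the one it was asked for.

```java
import java.util.function.Function;

// Hypothetical reproduction of the bug: a lazily created loader captures the first cacheKey.
class CaptureBugDemo {
    private static Function<String, String> buggyLoader;
    private static Function<String, String> fixedLoader;

    static String loadBuggy(String id1, long id2) {
        final String cacheKey = id1 + "-" + id2;
        if (buggyLoader == null) {
            // Wrong: ignores 'key' and always uses the cacheKey captured on the first call.
            buggyLoader = key -> "value for " + cacheKey;
        }
        return buggyLoader.apply(cacheKey);
    }

    static String loadFixed(String id1, long id2) {
        final String cacheKey = id1 + "-" + id2;
        if (fixedLoader == null) {
            // Right: depends only on the key passed in by the caller.
            fixedLoader = key -> "value for " + key;
        }
        return fixedLoader.apply(cacheKey);
    }
}
```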
NB: The way you initialize your cache is not thread safe: if it has not been initialized yet and method is called by several threads concurrently, the cache may be created several times instead of once.
One way to fix this is the double-checked locking idiom, as follows:
private static volatile LoadingCache<String, MyClass> cache = null;
public MyClass method(final String id1, final long id2) {
...
if(cache == null){
synchronized (MyClass.class) {
if(cache == null){
cache = ...
}
}
}
NB: Do not initialize a static cache with a CacheLoader based on a non-static method; it is far too error-prone. Make them both non-static or both static, but don't mix them.
Assuming that you can make both static, your cache initialization becomes very simple:
private static final LoadingCache<String, MyClass> cache = CacheBuilder.newBuilder()...
There is no need to initialize it lazily, which also greatly simplifies your method, as it can be reduced to:
public MyClass method(final String id1, final long id2) {
log.error("inside with "+id1);
final String cacheKey = id1+"-"+id2;
try {
return cache.get(cacheKey);
} catch (ExecutionException ex) {
log.error("EEE missing entry",ex);
throw new IllegalStateException("Could not load " + cacheKey, ex);
}
}

Rete object not freeing Values after reset

I am using Jess together with a FixedThreadPool to create several Rete engines that can be used to evaluate the performance of a system in parallel. Each Rete engine runs independently of the others; it takes as input a Java object containing the design of the system and outputs another Java object that contains its performance metrics.
Before evaluating each system, I reset the Rete engines to their original state. However, as my program runs, memory usage keeps growing, with more and more jess.Value objects being retained.
This is the class that I use to interface Jess with Java:
public class Variant {
private final Object value;
private final String type;
public Variant(Object value) {
this.value = cast2JavaObject(value);
this.type = (this.value instanceof List) ? "multislot" : "slot";
}
public Object getValue() {
return value;
}
public String getType() {
return type;
}
private Object cast2JavaObject(Object value) {
try {
if (value instanceof Value) {
return castJessValue((Value) value);
} else {
return value;
}
} catch (Exception e) {
System.out.println(e.getMessage());
e.printStackTrace();
return null;
}
}
private synchronized Object castJessValue(Value value) throws Exception {
if (value.type() == RU.LIST) {
List list = new ArrayList<Object>();
list.addAll(Arrays.asList((Object[]) RU.valueToObject(Object.class, value, null)));
return list;
} else {
return RU.valueToObject(Object.class, value, null);
}
}
public Value toJessValue() throws Exception {
Object val;
if (value instanceof List) {
val = ((List) value).toArray();
} else {
val = value;
}
return RU.objectToValue(val.getClass(), val);
}
}
Is it possible that the Object contained within the Variant is pointing to the contents of a jess.Value and therefore they are not being collected by the GC when I call rete.reset()?
I think this would be possible if the object passed to the constructor (be it a jess.Value or a plain POJO) references one or more jess.Value objects. Neither your cast2JavaObject nor RU.valueToObject recursively inspects nested objects.
However, what if jess.Value objects were contained? They are only decorations for Java objects, and even if they were unwrapped, the heap would still pile up with the bare objects alone, just more slowly.
If you use store/fetch I'd also call clearStorage in addition to reset.
I suggest an experiment to narrow the OOM problem down. Rather than reset, recreate the Rete object. If the problem persists, I daresay it is in some other nook or cranny of your application.

ConcurrentHashMap dilemma in Java

ConcurrentHashMap provides a method to atomically check for and add an element if it is not present, putIfAbsent, as shown in the example below:
xmlObject = new XMLObject(xmlId);
mapOfXMLs.putIfAbsent(xmlId, xmlObject);
However, my dilemma is that I have to create that xmlObject in advance. Is there a way to postpone the object creation until after the key-present check?
I want all three things below to happen atomically:
Check if the key present
Create object if key is not present.
Add the object to map.
I know I can achieve this using a synchronized block, but if I am using a synchronized block, why use a ConcurrentHashMap?
The Guava Caches offer such a functionality ( http://code.google.com/p/guava-libraries/wiki/CachesExplained ) though it's somewhat hidden.
If you can already use Java 8, then you can use computeIfAbsent. But I guess if you could use it, you would not have asked...
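With Java 8 or later, the three steps collapse into one call. A sketch with a hypothetical XmlObject whose constructor counts how often it runs; ConcurrentHashMap.computeIfAbsent is documented to perform the whole invocation atomically, so the mapping function is applied at most once per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical XmlObject; the counter only shows how many instances were created.
class XmlObject {
    static final AtomicInteger CREATED = new AtomicInteger();
    final int id;

    XmlObject(int id) {
        this.id = id;
        CREATED.incrementAndGet();
    }
}

class XmlRegistry {
    private final ConcurrentHashMap<Integer, XmlObject> mapOfXMLs = new ConcurrentHashMap<>();

    XmlObject get(int xmlId) {
        // Check, create and insert happen atomically; the constructor runs only if the key is absent.
        return mapOfXMLs.computeIfAbsent(xmlId, XmlObject::new);
    }
}
```

Other updates to the same part of the map may block while the function runs, so very slow constructors are better wrapped in something lazy.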
The standard, almost perfect pattern is this:
Foo foo = map.get(key);
if(foo == null) {
map.putIfAbsent(key, new Foo());
foo = map.get(key);
}
It does sometimes result in an extra object, but extremely infrequently, so from a performance standpoint is certainly fine. It only wouldn't be fine if constructing your object inserted into a database or charged a user or some such.
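Since putIfAbsent returns the previous value (or null if there was none), the second get can be dropped. A sketch with a hypothetical Foo; like the pattern above, it may construct a spare object under contention, which is then simply discarded.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical value type; assumed cheap and side-effect-free to construct.
class Foo {
}

class FooRegistry {
    private final ConcurrentMap<String, Foo> map = new ConcurrentHashMap<>();

    Foo getOrCreate(String key) {
        Foo foo = map.get(key);
        if (foo == null) {
            Foo created = new Foo();
            // Returns the existing value if another thread won the race, otherwise null.
            Foo existing = map.putIfAbsent(key, created);
            foo = (existing != null) ? existing : created;
        }
        return foo;
    }
}
```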
I've encountered this scenario a couple of times, and in those cases the value could be created lazily. It may not apply to your use case, but if it does, this is basically what I did:
static abstract class Lazy<T> {
private volatile T value;
protected abstract T initialValue();
public T get() {
T tmp = value;
if (tmp == null) {
synchronized (this) {
tmp = value;
if (tmp == null)
value = tmp = initialValue();
}
}
return tmp;
}
}
static ConcurrentHashMap<Integer, Lazy<XmlObject>> map = new ConcurrentHashMap<>();
and then populating the map:
final int id = 1;
map.putIfAbsent(id, new Lazy<XmlObject>() {
@Override
protected XmlObject initialValue() {
return new XmlObject(id);
}
});
System.out.println(map.get(id).get());
You can of course create a specialized LazyXmlObject for convenience:
static class LazyXmlObject extends Lazy<XmlObject> {
private final int id;
public LazyXmlObject(int id) {
super();
this.id = id;
}
@Override
protected XmlObject initialValue() {
return new XmlObject(id);
}
}
and the usage would be:
final int id = 1;
map.putIfAbsent(id, new LazyXmlObject(id));
System.out.println(map.get(id).get());

WeakMultiton: ensuring there's only one object for a specific database row

In my application I need to ensure that, for an entity representing a data row in a database, I have at most one Java object representing it.
Ensuring that they are equals() is not enough, since I could get caught by coherency problems.
So basically I need a multiton; moreover, I need not to keep this object in memory when it is not necessary, so I will be using weak references.
I have devised this solution:
package com.example;
public class DbEntity {
// a DbEntity holds a strong reference to its key, so as long as someone holds a
// reference to it the key won't be evicted from the WeakHashMap
private String key;
public void setKey(String key) {
this.key = key;
}
public String getKey() {
return key;
}
//other stuff that makes this object actually useful.
}
package com.example;
import java.lang.ref.WeakReference;
import java.util.WeakHashMap;
import java.util.concurrent.locks.ReentrantLock;
public class WeakMultiton {
private ReentrantLock mapLock = new ReentrantLock();
private WeakHashMap<String, WeakReference<DbEntity>> entityMap = new WeakHashMap<String, WeakReference<DbEntity>>();
private void fill(String key, DbEntity object) throws Exception {
// do slow stuff, typically fetch data from DB and fill the object.
}
public DbEntity get(String key) throws Exception {
DbEntity result = null;
WeakReference<DbEntity> resultRef = entityMap.get(key);
if (resultRef != null){
result = resultRef.get();
}
if (result == null){
mapLock.lock();
try {
resultRef = entityMap.get(key);
if (resultRef != null){
result = resultRef.get();
}
if (result == null){
result = new DbEntity();
synchronized (result) {
// A DbEntity holds a strong reference to its key, so the key won't be evicted from the map
// as long as result is reachable.
entityMap.put(key, new WeakReference<DbEntity>(result));
// I unlock the map, but result is still locked.
// Keeping the map locked while querying the DB would serialize database calls!
// If someone tries to get the same DbEntity the method will wait to return until I get out of this synchronized block.
mapLock.unlock();
fill(key, result);
// I need the key to be exactly this String, not just an equal one!!
result.setKey(key);
}
}
} finally {
// I have to check since I could have already released the lock.
if (mapLock.isHeldByCurrentThread()){
mapLock.unlock();
}
}
}
// I synchronize on result since some other thread could have instantiated it but still being busy initializing it.
// A performance penalty, but still better than synchronizing on the whole map.
synchronized (result) {
return result;
}
}
}
WeakMultiton will be instantiated only in the database wrapper (single point of access to the database) and its get(String key) will of course be the only way to retrieve a DbEntity.
Now, to the best of my knowledge this should work, but since this stuff is pretty new to me, I fear I could be overlooking something about the synchronization or the weak references!
Can you spot any flaw or suggest improvements?
I found out about guava's MapMaker and wrote this generic AbstractWeakMultiton:
package com.example;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;
import com.google.common.collect.MapMaker;
public abstract class AbstractWeakMultiton<K,V, E extends Exception> {
private ReentrantLock mapLock = new ReentrantLock();
private Map<K, V> entityMap = new MapMaker().concurrencyLevel(1).weakValues().<K,V>makeMap();
protected abstract void fill(K key, V value) throws E;
protected abstract V instantiate(K key);
protected abstract boolean isNullObject(V value);
public V get(K key) throws E {
V result = null;
result = entityMap.get(key);
if (result == null){
mapLock.lock();
try {
result = entityMap.get(key);
if (result == null){
result = this.instantiate(key);
synchronized (result) {
entityMap.put(key, result);
// I unlock the map, but result is still locked.
// Keeping the map locked while querying the DB would serialize database calls!
// If someone tries to get the same object the method will wait to return until I get out of this synchronized block.
mapLock.unlock();
fill(key, result);
}
}
} finally {
// I have to check since the exception could have been thrown after I had already released the lock.
if (mapLock.isHeldByCurrentThread()){
mapLock.unlock();
}
}
}
// I synchronize on result since some other thread could have instantiated it but still being busy initializing it.
// A performance penalty, but still better than synchronizing on the whole map.
synchronized (result) {
// I couldn't have a null result because I needed to synchronize on it,
// so now I check whether it's a mock object and return null in case.
return isNullObject(result)?null:result;
}
}
}
It has the following advantages over my earlier attempt:
It does not depend on the fact that values hold a strong reference to the key
It does not need to do the awkward double checking for expired weak references
It is reusable
On the other hand, it depends on the rather beefy Guava library, while the first solution used just classes from the runtime environment. I can live with that.
I'm obviously still looking for further improvements and error spotting, and basically everything that answers the most important question: will it work?
