I have a cache class which is based on ConcurrentHashMap. This cache is used to store results I get from a relatively slow reference data service.
One problem with this is that when multiple threads try to get a key that does not exist, both threads go and fetch the same key from the reference data service, resulting in two reference data calls.
I am thinking of improving the cache implementation so that only one of the threads queries the reference data service.
Is there any standard implementation for this?
Here is sample code that stores the unique keys in a List<Object> keyLocks; if an object equal to an existing key is passed in, it returns the same key object, which is then used in a synchronized block:
private final List<Object> keyLocks = new ArrayList<>(); // field in Cache

public Object get(Object key) {
    Object lock;
    synchronized (keyLocks) {
        int i = keyLocks.indexOf(key);
        if (i < 0) {
            keyLocks.add(key);
            lock = key;
        } else {
            lock = keyLocks.get(i); // reuse the canonical key object as the lock
        }
    }
    synchronized (lock) {
        if (innerCache.containsKey(key)) {
            return innerCache.get(key);
        } else {
            Object result = dataService.get(key);
            innerCache.put(key, result);
            return result;
        }
    }
}
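For reference, since Java 8 this per-key coordination comes out of the box: ConcurrentHashMap.computeIfAbsent runs the mapping function at most once per absent key and blocks concurrent callers for the same key while it runs. A minimal sketch, with the slow reference data service modeled as a Function (the class name is illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: computeIfAbsent invokes the loader at most once per absent key;
// other threads asking for the same key wait until the value is installed.
class ComputeIfAbsentCache<K, V> {
    private final ConcurrentHashMap<K, V> innerCache = new ConcurrentHashMap<>();
    private final Function<K, V> dataService; // stand-in for the slow reference data call

    ComputeIfAbsentCache(Function<K, V> dataService) {
        this.dataService = dataService;
    }

    V get(K key) {
        return innerCache.computeIfAbsent(key, dataService);
    }
}
```

One caveat: the mapping function runs while the map holds an internal lock for that bin, so it should be short and must not touch the same map itself.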
The situation:
I have a clearing table with multiple thousands of records. They are split into packages of e.g. 500 records. Each package is then sent to the AS via Message Driven Beans. The AS calculates a key depending on the contents (e.g. currency, validStart, validEnd) of each record and needs to store this key in the database (together with the combination of the contents).
The request:
To avoid duplicates I want a centralized "tool" which calculates the key and stores it, thus reducing communication with the database by caching those keys with the records.
Now I tried to use a local Infinispan cache, accessed through a utility-class implementation, for each package-processing thread. This resulted in multiple packages calculating the same key, so duplicates were inserted into the database; sometimes I also got deadlocks.
I tried to implement a "lock" via a static variable to block access to the cache during a database insert, but without success.
My next attempt was to use a replicated or, respectively, distributed Infinispan cache. This did not change the AS behavior.
My last idea is to implement a bean-managed singleton session bean that acquires a transaction lock while inserting into the database.
The AS currently runs in standalone mode, but will be moved to a cluster in near future, so a High Availability solution is preferred.
To summarize:
What's the correct way to lock Infinispan cache access during creation of (Key, Value) pairs to avoid duplicates?
Update:
@cruftex: My request is: I have a set of (Key, Value) pairs which shall be cached. If an insert of a new record should happen, an algorithm is applied to it and the Key is calculated. Then the cache shall be checked whether the Key already exists, and the Value will be appended to the new record. But if the Value does not exist, it shall be created and stored in the database.
The cache needs to be realized using Infinispan because the AS shall run in a cluster. The algorithm for creating the Keys exists. Inserting the Values into the database works too (via JDBC or entities). But I have the problem that, with Message Driven Beans (and thus multithreading in the AS), the same (Key, Value) pair is calculated in different threads, and each thread tries to insert the Values into the database (which I want to avoid!).
@Dave:
public class Cache {
private static final Logger log = Logger.getLogger(Cache.class);
private final Cache<Key, FullValueViewer> fullCache;
private HomeCache homes; // wraps EntityManager
private final Session session;
public Cache(Session session, EmbeddedCacheManager cacheContainer, HomeCache homes) {
this.session = session;
this.homes = homes;
fullCache = cacheContainer.getCache(Const.CACHE_CONDCOMBI);
}
public Long getId(FullValueViewer viewerWithoutId) {
Long result = null;
final Key key = new Key(viewerWithoutId);
FullValueViewer view = fullCache.get(key);
if(view == null) {
view = checkDatabase(viewerWithoutId);
if(view != null) {
fullCache.put(key, view);
}
}
if(view == null) {
view = createValue(viewerWithoutId);
// 1. Try
fullCache.put(key, view);
// 2. Try
// if(!fullCache.containsKey(key)) {
// fullCache.put(key, view);
// } else {
// try {
// homes.condCombi().remove(view.idnr);
// } catch (Exception e) {
// log.error("remove", e);
// }
// }
// 3. Try
// synchronized(fullCache) {
// view = createValue(viewerWithoutId);
// fullCache.put(key, view);
// }
}
result = view.idnr;
return result;
}
private FullValueViewer checkDatabase(FullValueViewer newView) {
FullValueViewer result = null;
try {
CondCombiBean bean = homes.condCombi().findByTypeAndKeys(_parameters_);
result = bean.getAsView();
} catch (FinderException e) {
}
return result;
}
private FullValueViewer createValue(FullValueViewer newView) {
FullValueViewer result = null;
try {
CondCombiBean bean = homes.condCombi().create(session.subpk);
bean.setFromView(newView);
result = bean.getAsView();
} catch (Exception e) {
log.error("createValue", e);
}
return result;
}
private class Key {
private final FullValueViewer view;
public Key(FullValueViewer v) {
this.view = v;
}
@Override
public int hashCode() {
_omitted_
}
@Override
public boolean equals(Object obj) {
_omitted_
}
}
}
The cache configurations I tried with WildFly:
<cache-container name="server" default-cache="default" module="org.wildfly.clustering.server">
<local-cache name="default">
<transaction mode="BATCH"/>
</local-cache>
</cache-container>
<cache-container name="server" default-cache="default" module="org.wildfly.clustering.server">
<transport lock-timeout="60000"/>
<distributed-cache name="default" mode="ASYNC"/>
</cache-container>
I'll respond only to the summarizing question:
You can't lock the whole cache; that wouldn't scale. The best way would be to use the cache.putIfAbsent(key, value) operation and generate a different key if the entry is already there (or use a list as the value and replace it using the conditional cache.replace(key, oldValue, newValue)).
If you want to really prohibit writes to some key, you can use transactional cache with pessimistic locking strategy, and issue cache.getAdvancedCache().lock(key). Note that there's no unlock: all locks are released when the transaction is committed/rolled back through transaction manager.
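The putIfAbsent duplicate check can be sketched against a plain ConcurrentMap, since Infinispan's Cache honors the same ConcurrentMap contract; the class and method names here are illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: putIfAbsent returns null only for the one caller that actually
// inserted the mapping, so exactly one thread "wins" the key and may then
// perform the database insert; losers see the winner's id instead.
class DuplicateGuard {
    final ConcurrentMap<String, Long> keys = new ConcurrentHashMap<>();

    boolean tryClaim(String key, long id) {
        return keys.putIfAbsent(key, id) == null; // true only for the winner
    }
}
```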
You cannot generate your own key and use it to detect duplicates at the same time.
Either each data row is guaranteed to arrive only once, or it needs to carry a unique identifier from the external system that generates it.
If there is a unique identifier in the data (in the worst case, when no id is present, just all properties concatenated), you need to use it to check for duplicates.
Now you can either use that unique identifier directly or generate your own internal identifier. If you do the latter, you need a translation from the external id to the internal id.
If duplicates arrive, you need to lock based on the external id when you generate the internal id, and then record what internal id you assigned.
To generate a unique sequence of long values, in a cluster, you can use the CAS-operations of the cache. For example something like this:
@NotThreadSafe
class KeyGeneratorForOneThread {
final String KEY = "keySequenceForXyRecords";
final int INTERVAL = 100;
Cache<String,Long> cache = ...;
long nextKey = 0;
long upperBound = -1;
void requestNewInterval() {
do {
nextKey = cache.get(KEY);
upperBound = nextKey + INTERVAL;
} while (!cache.replace(KEY, nextKey, upperBound));
}
long generateKey() {
if (nextKey >= upperBound) {
requestNewInterval();
}
return nextKey++;
}
}
Every thread has its own key generator and would generate 100 keys without needing coordination.
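The interval reservation above can be exercised against a plain ConcurrentMap (Infinispan's replace(key, oldValue, newValue) has the same CAS semantics); note the sequence entry must be seeded before the first replace can succeed. A self-contained sketch, with illustrative names:

```java
import java.util.concurrent.ConcurrentMap;

// Sketch of the interval-reservation idea: each generator reserves a block
// of INTERVAL keys via a compare-and-set replace, then hands them out
// locally without further coordination.
class IntervalKeyGenerator {
    static final String KEY = "keySequenceForXyRecords"; // assumed sequence entry
    static final int INTERVAL = 100;
    final ConcurrentMap<String, Long> cache;
    long nextKey = 0;
    long upperBound = 0;

    IntervalKeyGenerator(ConcurrentMap<String, Long> cache) {
        this.cache = cache;
        cache.putIfAbsent(KEY, 0L); // entry must exist before replace() can succeed
    }

    long generateKey() {
        while (nextKey >= upperBound) { // reserve a fresh interval via CAS
            long base = cache.get(KEY);
            if (cache.replace(KEY, base, base + INTERVAL)) {
                nextKey = base;
                upperBound = base + INTERVAL;
            }
        }
        return nextKey++;
    }
}
```

Two generators sharing the same map reserve disjoint intervals, so their keys never collide.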
You may need separate caches for:
locking by external id
lookup from external to internal id
the sequence number; note that this is actually not a cache, since it must still know the last number after a restart
internal id to data
We found a solution that works in our case and might be helpful for somebody else out there:
We have two main components, a cache-class and a singleton bean.
The cache contains a copy of all records currently present in the database and a lot of logic.
The singleton bean has access to the infinispan-cache and is used for creating new records.
Initially the cache fetches a copy of the infinispan-cache from the singleton bean. Then, when we look up a record in the cache, we first apply a kind of hash method which calculates a unique key for the record. Using this key we can determine whether the record needs to be added to the database.
If so, the cache calls the singleton bean's create method, which carries a @Lock(WRITE) annotation. The create method first checks whether the value is contained in the infinispan-cache and, if not, creates a new record.
Using this approach we can guarantee that, even if the cache is used by multiple threads and each thread sends a request to create the same record in the database, the create process is locked, and all following requests find the value already created by a previous request.
Description below the code...
// Singleton
public static final Map<String, Account> SHARED_ACCOUNT_HASHMAP =
Collections.synchronizedMap(new HashMap<>());
public void init(String[] credentials) {
Account account = null;
String uniqueID = uniqueAccountIdentifier(credentials);
if (SHARED_ACCOUNT_HASHMAP.containsKey(uniqueID)) {
account = SHARED_ACCOUNT_HASHMAP.get(uniqueID);
log("...retrieved Shared Account object: %s", uniqueID);
}
// create the Account object (if necessary)
if (account == null) {
account = new Account(credentials);
// Store it in the SHARED_ACCOUNT_HASHMAP
SHARED_ACCOUNT_HASHMAP.put(uniqueID, account);
log("...created Account object: %s",uniqueID);
}
}
What I want to achieve
There are multiple Threads accessing this Singleton HashMap
The goal of this HashMap is to only allow the creation of ONE Account per uniqueID
The account later can be retrieved by various threads for Account operations
Each Thread has this init() method and runs it once.
So the first Thread that cannot find an existing Account for a uniqueID creates a new one and places it in the HashMap. The next Thread finds that there is already an Account object for the same uniqueID, so it retrieves it for its own use later.
My problem...
How can I get the other Threads (second, third, etc.) to wait while the first Thread is inserting a new Account object?
To phrase it another way: there should never be two threads that receive a value of null when reading the HashMap for the same uniqueID key. The first thread may receive null, but the second should retrieve the Account object that the first placed there.
According to the docs for synchronizedMap()
Returns a synchronized (thread-safe) map backed by the specified map. In order to guarantee serial access, it is critical that all access to the backing map is accomplished through the returned map.
It is imperative that the user manually synchronize on the returned map when iterating over any of its collection views
In other words you still need to have synchronized access to SHARED_ACCOUNT_HASHMAP:
public void init(String[] credentials) {
Account account = null;
String uniqueID = uniqueAccountIdentifier(credentials);
synchronized (SHARED_ACCOUNT_HASHMAP) {
if (SHARED_ACCOUNT_HASHMAP.containsKey(uniqueID)) {
account = SHARED_ACCOUNT_HASHMAP.get(uniqueID);
log("...retrieved Shared Account object: %s", uniqueID);
}
// create the Account object (if necessary)
if (account == null) {
account = new Account(credentials);
// Store it in the SHARED_ACCOUNT_HASHMAP
SHARED_ACCOUNT_HASHMAP.put(uniqueID, account);
log("...created Account object: %s",uniqueID);
}
}
}
Consider using ReadWriteLock if you have multiple readers/writers (see ReadWriteLock example).
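A hedged sketch of that ReadWriteLock variant, with the Account modeled as a plain String value (names are illustrative): readers share the read lock, while creation takes the exclusive write lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: many threads may hold the read lock at once; getOrCreate takes
// the write lock, so at most one "account" is ever created per id.
class AccountStore {
    private final Map<String, String> accounts = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    String find(String id) {
        lock.readLock().lock();
        try {
            return accounts.get(id);
        } finally {
            lock.readLock().unlock();
        }
    }

    String getOrCreate(String id) {
        lock.writeLock().lock();
        try {
            return accounts.computeIfAbsent(id, k -> "Account:" + k);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```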
Generally, ConcurrentHashMap performs better than the synchronized HashMap you are using.
The following code smells of a check-then-act race condition, since you perform two separate operations on the synchronized map (containsKey and get):
if (SHARED_ACCOUNT_HASHMAP.containsKey(uniqueID)) {
account = SHARED_ACCOUNT_HASHMAP.get(uniqueID);
log("...retrieved Shared Account object: %s", uniqueID);
}
So to avoid the race condition you need to synchronize on the map:
synchronized (synchronizedMap) {
if (SHARED_ACCOUNT_HASHMAP.containsKey(uniqueID)) {
account = SHARED_ACCOUNT_HASHMAP.get(uniqueID);
log("...retrieved Shared Account object: %s", uniqueID);
}
// rest of the code.
}
Actually, the synchronizedMap can protect itself against internal race conditions that could corrupt the map data, but against external race conditions (like the one above) you need to guard yourself. If you find you are using synchronized blocks in many places, you can also consider using a regular map along with synchronized blocks. You will find this question also useful.
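If switching the singleton to a ConcurrentHashMap is an option, the whole check-then-act sequence collapses into one atomic call; a sketch with a hypothetical Account stand-in for the poster's class:

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch: computeIfAbsent makes lookup-or-create a single atomic step,
// so no external synchronized block is needed. Account is a hypothetical
// stand-in for the poster's class.
class AccountRegistry {
    static class Account {
        final String id;
        Account(String id) { this.id = id; }
    }

    private static final ConcurrentHashMap<String, Account> ACCOUNTS = new ConcurrentHashMap<>();

    static Account getOrCreate(String uniqueID) {
        // At most one Account is created per uniqueID, even under contention;
        // every racing thread receives the same instance.
        return ACCOUNTS.computeIfAbsent(uniqueID, Account::new);
    }
}
```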
I'm trying to create a method with a ConcurrentHashMap with the following behavior.
Read no lock
Write lock
prior to writing,
read to see if record exist,
if it still doesn't exist, save to database and add record to map.
if record exist from previous write, just return record.
My thoughts.
private Object lock1 = new Object();
private ConcurrentHashMap<String, Object> productMap;
private Object getProductMap(String name) {
if (productMap.isEmpty()) {
productMap = new ConcurrentHashMap<>();
}
if (productMap.containsKey(name)) {
return productMap.get(name);
}
synchronized (lock1) {
if (productMap.containsKey(name)) {
return productMap.get(name);
} else {
Product product = new Product(name);
session.save(product);
productMap.putIfAbsent(name, product);
}
}
}
Could someone help me to understand if this is a correct approach?
There are several bugs here.
If productMap isn't guaranteed to be initialized, you will get an NPE in your first statement to this method.
The method isn't guaranteed to return anything if the map is empty.
The method doesn't return on all paths.
The method is both poorly named and unnecessary; you're trying to emulate putIfAbsent which half accomplishes your goal.
You also don't need to do any synchronization; ConcurrentHashMap is thread safe for your purposes.
If I were to rewrite this, I'd do a few things differently:
Eagerly instantiate the ConcurrentHashMap
Bind it to ConcurrentMap instead of the concrete class (so ConcurrentMap<String, Product> productMap = new ConcurrentHashMap<>();)
Rename the method to putIfMissing and delegate to putIfAbsent, with some logic to return the same record I want to add if the result is null. This absolutely depends on Product having well-defined equals and hashCode methods, such that new Product(name) produces objects that are equal (with equal hash codes) when given the same name.
Use an Optional to avoid any NPEs with the result of putIfAbsent, and to provide easier to digest code.
A snippet of the above:
public Product putIfMissing(String key) {
Product product = new Product(key);
Optional<Product> result =
Optional.ofNullable(productMap.putIfAbsent(key, product));
session.save(result.orElse(product));
return result.orElse(product);
}
Basically, what is needed is to synchronize requests to each of the records.
Some code I can think of looks like this:
//member variable
ConcurrentHashMap<Long, Object> lockMap = new ConcurrentHashMap<Long, Object>();
//one method
private void maintainLockObjects(long id){
lockMap.putIfAbsent(id, new Object());
}
//the request method
bar(long id){
maintainLockObjects(id);
synchronized(lockMap.get(id)){
//logic here
}
}
Have a look at ClassLoader.getClassLoadingLock:
Returns the lock object for class loading operations. For backward compatibility, the default implementation of this method behaves as follows. If this ClassLoader object is registered as parallel capable, the method returns a dedicated object associated with the specified class name. Otherwise, the method returns this ClassLoader object.
Its implementation code may look familiar to you:
protected Object getClassLoadingLock(String className) {
Object lock = this;
if (parallelLockMap != null) {
Object newLock = new Object();
lock = parallelLockMap.putIfAbsent(className, newLock);
if (lock == null) {
lock = newLock;
}
}
return lock;
}
The first null check is only for the mentioned backwards compatibility. Besides that, the only difference between this heavily used code and your approach is that this code avoids calling get afterwards, as putIfAbsent already returns the old object if there is one.
So the simple answer is: it works, and the pattern has also proven itself within a really crucial part of Oracle's JRE implementation.
Is there a better way to cache some very large objects that can only be created once and therefore need to be cached? Currently, I have the following:
public enum LargeObjectCache {
INSTANCE;
private Map<String, LargeObject> map = new HashMap<...>();
public LargeObject get(String s) {
if (!map.containsKey(s)) {
map.put(s, new LargeObject(s));
}
return map.get(s);
}
}
There are several classes that can use the LargeObjects, which is why I decided to use a singleton for the cache, instead of passing LargeObjects to every class that uses it.
Also, the map doesn't contain many keys (one or two, but the keys can vary between runs of the program), so is there another, more efficient map to use in this case?
You may need thread-safety to ensure you don't get two instances for the same name.
It doesn't matter much for small maps, but you can avoid one call, which can make it faster.
public LargeObject get(String s) {
synchronized(map) {
LargeObject ret = map.get(s);
if (ret == null)
map.put(s, ret = new LargeObject(s));
return ret;
}
}
As it has been pointed out, you need to address thread-safety. Simply using Collections.synchronizedMap() doesn't make it completely correct, as the code entails compound operations. Synchronizing the entire block is one solution. However, using ConcurrentHashMap will result in a much more concurrent and scalable behavior if it is critical.
public enum LargeObjectCache {
INSTANCE;
private final ConcurrentMap<String, LargeObject> map = new ConcurrentHashMap<...>();
public LargeObject get(String s) {
LargeObject value = map.get(s);
if (value == null) {
value = new LargeObject(s);
LargeObject old = map.putIfAbsent(s, value);
if (old != null) {
value = old;
}
}
return value;
}
}
You'll need to use it exactly in this form to have the correct and the most efficient behavior.
If you must ensure only one thread gets to even instantiate the value for a given key, then it becomes necessary to turn to something like the computing map in Google Collections or the memoizer example in Brian Goetz's book "Java Concurrency in Practice".
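A condensed sketch of that memoizer idea (after the one in "Java Concurrency in Practice"): the FutureTask is published via putIfAbsent before it runs, so all threads racing on a key block on the same computation, and the expensive work happens exactly once.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

// Sketch of the JCiP-style memoizer: only the thread whose putIfAbsent
// wins runs the computation; everyone else waits on the same Future.
class Memoizer<K, V> {
    private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> compute;

    Memoizer(Function<K, V> compute) {
        this.compute = compute;
    }

    V get(K key) {
        Future<V> f = cache.get(key);
        if (f == null) {
            FutureTask<V> task = new FutureTask<>(() -> compute.apply(key));
            f = cache.putIfAbsent(key, task);
            if (f == null) { // we won the race: compute exactly once
                f = task;
                task.run();
            }
        }
        try {
            return f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```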