The situation:
I have a clearing table with several thousand records. They are split into packages of, e.g., 500 records, and each package is sent to the AS via Message Driven Beans. The AS calculates a key from the contents of each record (e.g. currency, validStart, validEnd) and needs to store this key in the database (together with the combination of the contents).
The request:
To avoid duplicates I want a centralized "tool" which calculates the key, stores it, and reduces communication with the database by caching those keys together with the records.
I first tried a local Infinispan cache accessed through a utility-class implementation in each package-processing thread. The result was that multiple packages calculated the same key, so duplicates were inserted into the database; sometimes I also got deadlocks.
Then I tried to implement a "lock" via a static variable to block access to the cache during a database insert, but without success.
My next attempt was a replicated respectively distributed Infinispan cache. This did not change the AS's behavior.
My last idea is to implement a bean-managed singleton session bean that acquires a transaction lock while inserting into the database.
The AS currently runs in standalone mode, but will be moved to a cluster in the near future, so a High Availability solution is preferred.
Summing up:
What's the correct way to lock Infinispan cache access during creation of (Key, Value) pairs to avoid duplicates?
Update:
@cruftex: My request is: I have a set of (Key, Value) pairs which shall be cached. When a new record is to be inserted, an algorithm is applied to it and the Key is calculated. Then the cache is checked for that key: if it already exists, the existing Value is appended to the new record; but if the Value does not exist, it shall be created and stored in the database.
The cache needs to be realized with Infinispan because the AS shall run in a cluster. The algorithm for creating the Keys already exists, as does the code for inserting the Value into the database (via JDBC or entities). My problem is that with Message Driven Beans (and thus multithreading in the AS) the same (Key, Value) pair is calculated in different threads, so each thread tries to insert the Value into the database (which I want to avoid!).
@Dave:
public class Cache {

    private static final Logger log = Logger.getLogger(Cache.class);

    private final Cache<Key, FullValueViewer> fullCache;
    private HomeCache homes; // wraps EntityManager
    private final Session session;

    public Cache(Session session, EmbeddedCacheManager cacheContainer, HomeCache homes) {
        this.session = session;
        this.homes = homes;
        fullCache = cacheContainer.getCache(Const.CACHE_CONDCOMBI);
    }

    public Long getId(FullValueViewer viewerWithoutId) {
        Long result = null;
        final Key key = new Key(viewerWithoutId);
        FullValueViewer view = fullCache.get(key);
        if (view == null) {
            view = checkDatabase(viewerWithoutId);
            if (view != null) {
                fullCache.put(key, view);
            }
        }
        if (view == null) {
            view = createValue(viewerWithoutId);
            // 1. Try
            fullCache.put(key, view);
            // 2. Try
            // if (!fullCache.containsKey(key)) {
            //     fullCache.put(key, view);
            // } else {
            //     try {
            //         homes.condCombi().remove(view.idnr);
            //     } catch (Exception e) {
            //         log.error("remove", e);
            //     }
            // }
            // 3. Try
            // synchronized (fullCache) {
            //     view = createValue(viewerWithoutId);
            //     fullCache.put(key, view);
            // }
        }
        result = view.idnr;
        return result;
    }

    private FullValueViewer checkDatabase(FullValueViewer newView) {
        FullValueViewer result = null;
        try {
            CondCombiBean bean = homes.condCombi().findByTypeAndKeys(_parameters_);
            result = bean.getAsView();
        } catch (FinderException e) {
        }
        return result;
    }

    private FullValueViewer createValue(FullValueViewer newView) {
        FullValueViewer result = null;
        try {
            CondCombiBean bean = homes.condCombi().create(session.subpk);
            bean.setFromView(newView);
            result = bean.getAsView();
        } catch (Exception e) {
            log.error("createValue", e);
        }
        return result;
    }

    private class Key {

        private final FullValueViewer view;

        public Key(FullValueViewer v) {
            this.view = v;
        }

        @Override
        public int hashCode() {
            _omitted_
        }

        @Override
        public boolean equals(Object obj) {
            _omitted_
        }
    }
}
The cache configurations I tried with WildFly:
<cache-container name="server" default-cache="default" module="org.wildfly.clustering.server">
    <local-cache name="default">
        <transaction mode="BATCH"/>
    </local-cache>
</cache-container>

<cache-container name="server" default-cache="default" module="org.wildfly.clustering.server">
    <transport lock-timeout="60000"/>
    <distributed-cache name="default" mode="ASYNC"/>
</cache-container>
I'll respond only to the question in the summary:
You can't lock the whole cache; that wouldn't scale. The best way is the cache.putIfAbsent(key, value) operation: generate a different key if the entry is already there (or use a list as the value and replace it with the conditional cache.replace(key, oldValue, newValue)).
If you really want to prohibit writes to a key, you can use a transactional cache with the pessimistic locking strategy and issue cache.getAdvancedCache().lock(key). Note that there is no unlock: all locks are released when the transaction is committed or rolled back through the transaction manager.
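A minimal sketch of both suggestions, reusing names from the question's Cache class (fullCache, key, viewerWithoutId, createValue(...) and the row removal are assumptions taken from that code; error handling is omitted):

// Option 1: atomic putIfAbsent - only one thread/node wins the race for a key.
FullValueViewer candidate = createValue(viewerWithoutId);       // inserts a row
FullValueViewer winner = fullCache.putIfAbsent(key, candidate); // returns the previous value, or null if we won
if (winner != null) {
    homes.condCombi().remove(candidate.idnr); // lost the race: drop our duplicate row
    candidate = winner;                       // and use the value that was already there
}

// Option 2: transactional cache with pessimistic locking, inside a JTA transaction.
// The lock is held until the transaction commits or rolls back; there is no unlock().
fullCache.getAdvancedCache().lock(key);
if (!fullCache.containsKey(key)) {
    fullCache.put(key, createValue(viewerWithoutId));
}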
You cannot generate your own key and use it to detect duplicates at the same time.
Either each data row is guaranteed to arrive only once, or it needs to carry a unique identifier from the external system that generates it.
If there is a unique identifier in the data (in the worst case, if no id is in there, it is simply all properties concatenated), then you need to use it to check for duplicates.
Now you can either go with that unique identifier directly or generate your own internal identifier. If you do the latter, you need a translation from the external id to the internal id.
If duplicates arrive, you need to lock on the external id while you generate the internal id, and then record which internal id you assigned.
To generate a unique sequence of long values in a cluster, you can use the CAS operations of the cache, for example something like this:
@NotThreadSafe
class KeyGeneratorForOneThread {

    final String KEY = "keySequenceForXyRecords";
    final int INTERVAL = 100;
    Cache<String, Long> cache = ...;
    long nextKey = 0;
    long upperBound = -1;

    void requestNewInterval() {
        do {
            // the sequence entry must be seeded before first use,
            // otherwise get(KEY) returns null
            nextKey = cache.get(KEY);
            upperBound = nextKey + INTERVAL;
        } while (!cache.replace(KEY, nextKey, upperBound)); // CAS: retry if someone else advanced it
    }

    long generateKey() {
        if (nextKey >= upperBound) {
            requestNewInterval();
        }
        return nextKey++;
    }
}
Every thread has its own key generator and would generate 100 keys without needing coordination.
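A small usage sketch, assuming cache is the same Cache<String, Long> as in the snippet above: the sequence entry has to be seeded once before any generator runs, otherwise the first get(KEY) returns null.

// One-time seeding, e.g. at application startup:
cache.putIfAbsent("keySequenceForXyRecords", 0L);

// Each worker thread then keeps its own generator instance:
KeyGeneratorForOneThread generator = new KeyGeneratorForOneThread();
long internalId = generator.generateKey();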
You may need separate caches for:
locking by external id
lookup from external to internal id
sequence number (attention: that is actually not a cache, since it must still know the last number after a restart)
internal id to data
We found a solution that works in our case and might be helpful for somebody else out there:
We have two main components, a cache-class and a singleton bean.
The cache contains a copy of all records currently present in the database and a lot of logic.
The singleton bean has access to the Infinispan cache and is used to create new records.
Initially the cache fetches a copy of the Infinispan cache from the singleton bean. Then, when we look up a record in the cache, we first apply a kind of hash method which calculates a unique key for the record. Using this key we can determine whether the record needs to be added to the database.
If so, the cache calls the singleton bean's create method, which is annotated with @Lock(WRITE). The create method first checks whether the value is already contained in the Infinispan cache and, if not, creates a new record.
Using this approach we can guarantee that, even if the cache is used by multiple threads and each thread asks to create the same record in the database, the create process is serialized by the lock and all subsequent requests are skipped because the value was already created by a previous request.
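A rough sketch of what such a singleton bean could look like; the class, method and cache names below are made up for illustration and are not the actual implementation described above:

import javax.ejb.Lock;
import javax.ejb.LockType;
import javax.ejb.Singleton;

@Singleton
public class RecordRegistry {

    // cluster-wide Infinispan cache (key -> database id); injection/lookup omitted
    private org.infinispan.Cache<String, Long> keyCache;

    // Container-managed concurrency: only one caller at a time enters this method,
    // every other thread blocks until it returns.
    @Lock(LockType.WRITE)
    public Long getOrCreate(String key) {
        Long id = keyCache.get(key);
        if (id == null) {
            id = insertIntoDatabase(key); // JDBC or entity insert
            keyCache.put(key, id);
        }
        return id;
    }

    private Long insertIntoDatabase(String key) {
        // placeholder for the real insert
        return null;
    }
}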
Related
I am trying to insert a list of rows (questions) into a table (let's say 'Question_Table').
The whole process is performed in a single transaction (i.e. either I insert all questions or none). I am using Spring's declarative transactions.
I have customized the ID generation for Question_Table (ref: Custom id generation).
It works for the first question, but it won't work for the second question because the first row is uncommitted and the table still appears empty. I am not able to inject the DAO class into the id generator as it is not a Spring-managed bean (which would let me add a method to the DAO class that reads uncommitted records).
What is the best approach to use in this situation?
Generator class
public class IdGenerator implements IdentifierGenerator, Configurable {
private String prefix = "";
private String queryKey = "";
#Override
public Serializable generate(SessionImplementor sessionImpl, Object arg1) throws HibernateException {
long count = (long)sessionImpl.getNamedQuery(queryKey).list().get(0);
System.out.println("COUNT >>> "+count);
long id = count + 1;
if(id == 4) throw new NullPointerException();
String generatedId = prefix + id;
return generatedId;
}
#Override
public void configure(Type arg0, Properties arg1, ServiceRegistry arg2) throws MappingException {
prefix=arg1.getProperty("PREFIX");
queryKey=arg1.getProperty("QUERY_KEY");
}
}
Query : select count(*) from Question_Table
As I stated in the comment, you may be able to use this approach if you have no problem combining a string and a sequence. The downside is that the value will keep increasing even after you delete all records in that table.
If you insist on using the count, then the solution is to set your entity id manually on save, e.g. .save(question, "QSTN_" + (row_count + i)); but you will need to be able to pass that row_count, which I think is not a problem since it all happens in one request.
I have no answer to your specific question, but I'd like to share some considerations.
If your id generation depends on the database state, then it must be done at the database level (the implementation is up to you: autoincrement, a custom function, sequences, etc.).
If you do it at the application level instead, you will necessarily run into concurrent-access problems and have to mitigate them with some lock or a dedicated transaction, which will have a significant impact on application performance and may become inconsistent later (when adding horizontal scalability or sharding, for example).
However, if you want to generate your ids in an application layer (which can be a very good idea), then you need a unique, distributed system dedicated to this task which is not part of your current unit of work.
@Transactional(isolation = Isolation.READ_COMMITTED)
public AccountDto saveAccount(AccountDto accountDto) {
    Long accountTypeId = accountDto.getAccountTypeId();
    AccountTypes accountTypes = accountTypesDao.getById( accountTypeId ).orElseThrow( NotFoundAppException::new );

    account.setAccountName( newAccountName );
    account.setAccountType( accountTypes );
    ...

    accountDao.save( account );
    accountDao.flush();

    // the newly inserted account id is available in the transaction now
    return createAccountDtoFrom( account );
}
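For the database-level option mentioned above, delegating id generation to an Oracle sequence through JPA could look roughly like this (entity and sequence names are placeholders, not taken from the question):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Entity
public class Question {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "question_seq")
    @SequenceGenerator(name = "question_seq", sequenceName = "QUESTION_SEQ", allocationSize = 1)
    private Long id;

    // remaining fields omitted; if the "QSTN_" prefix is still required,
    // it can be derived from the numeric id in application code.
}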
I have a cache class which is based on ConcurrentHashMap. This cache is used to store results I get from a relatively slow reference data service.
One problem with this is that when multiple threads try to get a key that does not exist, both threads will go and fetch the same key from the reference data service, resulting in two reference data calls.
I am thinking of improving the implementation of the cache so that only one of the threads queries the reference data service.
Is there any standard implementation for this?
Here is sample code which stores the unique keys in a List<Object> keyLocks; if an object with an equivalent value is passed in, it returns the same lock object for it, and then synchronizes on that lock:
private final List<Object> keyLocks = new ArrayList<>(); // field in Cache

public Object get(Object key) {
    Object lock;
    synchronized (keyLocks) {
        // reuse the first key object seen for this value as the lock
        if (!keyLocks.contains(key)) {
            keyLocks.add(key);
            lock = key;
        } else {
            lock = keyLocks.get(keyLocks.indexOf(key));
        }
    }
    synchronized (lock) {
        if (innerCache.containsKey(key)) {
            return innerCache.get(key);
        } else {
            Object result = dataService.get(key);
            innerCache.put(key, result);
            return result;
        }
    }
}
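As a standard, off-the-shelf alternative (not part of the answer above): Guava's LoadingCache already gives this behavior, loading a missing key at most once while other threads asking for the same key wait for the result. A minimal sketch, where dataService stands for the slow reference data service from the question:

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

LoadingCache<Object, Object> cache = CacheBuilder.newBuilder()
        .maximumSize(10000)
        .build(new CacheLoader<Object, Object>() {
            @Override
            public Object load(Object key) {
                return dataService.get(key); // called at most once per missing key
            }
        });

// concurrent callers for the same absent key block until the single load finishes
Object value = cache.getUnchecked(key);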
Description below the code...
// Singleton
public static final Map<String, Account> SHARED_ACCOUNT_HASHMAP =
        Collections.synchronizedMap(new HashMap<>());

public void init(String[] credentials) {
    Account account = null;
    String uniqueID = uniqueAccountIdentifier(credentials);
    if (SHARED_ACCOUNT_HASHMAP.containsKey(uniqueID)) {
        account = SHARED_ACCOUNT_HASHMAP.get(uniqueID);
        log("...retrieved Shared Account object: %s", uniqueID);
    }

    // create the Account object (if necessary)
    if (account == null) {
        account = new Account(credentials);
        // Store it in the SHARED_ACCOUNT_HASHMAP
        SHARED_ACCOUNT_HASHMAP.put(uniqueID, account);
        log("...created Account object: %s", uniqueID);
    }
}
What I want to achieve
There are multiple Threads accessing this Singleton HashMap
The goal of this HashMap is to only allow the creation of ONE Account per uniqueID
The account later can be retrieved by various threads for Account operations
Each Thread has this init() method and runs it once.
So the first Thread that cannot find an existing Account for a uniqueID creates a new one and places it in the HashMap. The next Thread finds that there is already an Account object for the same uniqueID, so it retrieves it for its own use later.
My problem...
How can I get the other Threads (second, third, etc.) to wait while the first Thread is inserting a new Account object?
To phrase it another way: no two threads should ever both receive null when reading the HashMap for the same uniqueID key. The first thread may receive null, but the second should retrieve the Account object that the first placed there.
According to the docs for synchronizedMap()
Returns a synchronized (thread-safe) map backed by the specified map. In order to guarantee serial access, it is critical that all access to the backing map is accomplished through the returned map.
It is imperative that the user manually synchronize on the returned map when iterating over any of its collection views
In other words, you still need to synchronize access to SHARED_ACCOUNT_HASHMAP:
public void init(String[] credentials) {
    Account account = null;
    String uniqueID = uniqueAccountIdentifier(credentials);
    synchronized (SHARED_ACCOUNT_HASHMAP) {
        if (SHARED_ACCOUNT_HASHMAP.containsKey(uniqueID)) {
            account = SHARED_ACCOUNT_HASHMAP.get(uniqueID);
            log("...retrieved Shared Account object: %s", uniqueID);
        }

        // create the Account object (if necessary)
        if (account == null) {
            account = new Account(credentials);
            // Store it in the SHARED_ACCOUNT_HASHMAP
            SHARED_ACCOUNT_HASHMAP.put(uniqueID, account);
            log("...created Account object: %s", uniqueID);
        }
    }
}
Consider using ReadWriteLock if you have multiple readers/writers (see ReadWriteLock example).
Generally, ConcurrentHashMap performs better than the synchronized hash map you are using.
The following code smells of a check-then-act race condition, as you perform two separate operations on the synchronized map (containsKey and get):
if (SHARED_ACCOUNT_HASHMAP.containsKey(uniqueID)) {
    account = SHARED_ACCOUNT_HASHMAP.get(uniqueID);
    log("...retrieved Shared Account object: %s", uniqueID);
}
So to avoid the race condition you need to synchronize on this map:
synchronized (SHARED_ACCOUNT_HASHMAP) {
    if (SHARED_ACCOUNT_HASHMAP.containsKey(uniqueID)) {
        account = SHARED_ACCOUNT_HASHMAP.get(uniqueID);
        log("...retrieved Shared Account object: %s", uniqueID);
    }
    // rest of the code.
}
Actually the synchronizedMap can protect itself against internal race conditions that could corrupt the map data, but external ones (like the above) you have to take care of yourself. If you find yourself using synchronized blocks in many places, you can also consider using a regular map together with synchronized blocks. You will also find this question useful.
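If Java 8 is available, here is a minimal sketch of that alternative (reusing the names from the question, so Account, uniqueAccountIdentifier and log are assumed from there): ConcurrentHashMap.computeIfAbsent performs the check-then-act atomically, so only one thread ever creates the Account for a given uniqueID.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

private static final ConcurrentMap<String, Account> SHARED_ACCOUNT_MAP = new ConcurrentHashMap<>();

public void init(String[] credentials) {
    String uniqueID = uniqueAccountIdentifier(credentials);
    // atomic: returns the existing Account, or creates it if this thread is first
    Account account = SHARED_ACCOUNT_MAP.computeIfAbsent(uniqueID, id -> {
        log("...created Account object: %s", id);
        return new Account(credentials);
    });
    log("...using Shared Account object: %s", uniqueID);
}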
I have this code that works just fine without HR:
protected Entity createEntity(Key key, Map<String, Object> props) {
    Entity result = null;
    try {
        Entity e = new Entity(key);
        Iterator it = props.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Object> entry = (Map.Entry<String, Object>) it.next();
            String propName = entry.getKey();
            Object propValue = entry.getValue();
            setProperty(e, propName, propValue);
        }
        key = _ds.put(e);
        if (key != null)
            result = _ds.get(key);
    } catch (EntityNotFoundException e1) {
    }
    return result;
}
This is just a simple method whose function is to create a new Entity from a given key, returning null otherwise. It works fine in JUnit without the HR configuration, but when I configure HR, I always get an error where _ds.get(key) can't find the key, throwing:
EntityNotFoundException: No entity was found matching the key:
Specifically when doing:
while (it.hasNext()) {
    // stuff
    createEntity(key, map);
    // stuff
}
I assume that the problem in my code is that it tries to fetch the entity too soon. If that is the case, how can I deal with this without resorting to Memcache or anything like that?
Update:
When createEntity is executed within a transaction, it fails. However, if I move it outside of the transaction, it fails miserably. I need to be able to run it within a transaction, since my higher-level API puts lots of objects that need to be there as a group.
Update:
I followed Strom's advice, however I found a weird side effect: not doing a _ds.get(key) in the method makes my PreparedQuery countEntities fail, whereas if I add a _ds.get(key), even if I don't do anything with or save the Entity returned from that get, countEntities returns the expected count. Why is that?
You try to create a new entity and then read back that entity within the same transaction? Can't be done.
Queries and gets inside transactions see a single, consistent snapshot of the datastore that lasts for the duration of the transaction. [1]
In a transaction, all reads reflect the current, consistent state of the Datastore at the time the transaction started. This does not include previous puts and deletes inside the transaction. Queries and gets inside a transaction are guaranteed to see a single, consistent snapshot of the Datastore as of the beginning of the transaction. [2]
This consistent snapshot view also extends to reads after writes inside transactions. Unlike with most databases, queries and gets inside a Datastore transaction do not see the results of previous writes inside that transaction. Specifically, if an entity is modified or deleted within a transaction, a query or get returns the original version of the entity as of the beginning of the transaction, or nothing if the entity did not exist then. [2]
PS: Your assumption is wrong; it's impossible to fetch an entity by key "too soon". Fetches by key are strongly consistent.
Also, why do you need to retrieve the entity again anyway? You just put it in the datastore yourself, so you already have its contents.
So change this part:
key = _ds.put(e);
if (key != null)
result = _ds.get(key);
To this:
key = _ds.put(e);
if (key != null)
result = e; // key.equals(e.getKey()) == true
Welcome to the GAE environment; try reading it a few more times before you give up:
int counter = 0;
while (counter < NUMBER_OF_TRIES) {
    try {
        // calling storage or any other non-reliable thing
        if (success) { break; } // escape away if success
    } catch (EntityNotFoundException e) {
        // log exception
    }
    counter++; // count the attempt whether it threw or simply did not succeed
}
Important note from the Google documentation: "the rate at which you can write to the same entity group is limited to 1 write to the entity group per second."
Source: https://developers.google.com/appengine/docs/java/gettingstarted/usingdatastore
I'm developing an application with a kind of 'Facebook like' feature. Every time content published by a user is 'liked', his punctuation (score) is increased. This app will be used by a large number of users around the company, so we are expecting a lot of concurrent updates to the same row.
simplified code
User punctuation table
Punctuation(
userId NVARCHAR2(32),
value NUMBER(10,0)
)/
Java code
public class Punctuation {

    private String userId;
    private int value;

    public Punctuation(final String userId, final int value) {
        this.userId = userId;
        this.value = value;
    }

    public String getUserId() { return userId; }

    public int getValue() { return value; }
}
// simplified code
public final class PunctuationController {

    private PunctuationController() {}

    public static void addPunctuation(final Punctuation punctuation) {
        final Transaction transaction = TransactionFactory.createTransaction();
        Connection conn = null;
        PreparedStatement statement = null;
        try {
            synchronized (punctuation) {
                transaction.begin();
                conn = transaction.getConnection();
                statement = conn.prepareStatement("UPDATE Punctuation SET value = value + ? WHERE userId = ?");
                statement.setInt(1, punctuation.getValue());      // parameter 1: the value to add
                statement.setString(2, punctuation.getUserId());  // parameter 2: the user id
                statement.executeUpdate();
                transaction.commit();
            }
        } catch (Exception e) {
            transaction.rollback();
        } finally {
            transaction.dispose();
            if (statement != null) {
                try {
                    statement.close();
                } catch (SQLException ignore) {
                }
            }
        }
    }
}
We are afraid of deadlocks during updates. Oracle allows the sum to be done in a single query, so I don't have to retrieve the value and issue a second query to update with a new value, which is good. Also, reading some other posts here, they said to create a synchronized block to lock an object and let Java handle the synchronization between different threads. I chose the punctuation instance the method receives; this way I imagine that different combinations of user and value will allow concurrent access to this method, but will block on an instance with the same values. (Do I have to implement equals() on Punctuation?)
Our database is Oracle 10g, the server is WebLogic 11g, Java 6 and Linux (I don't know which flavor).
Thank you in advance!
You're wrong on your synchronization strategy. synchronized uses the intrinsic lock of the object between parentheses. If you have two Punctuation instances that you might consider equal because they refer to the same user_id, Java doesn't care: 2 objects, so 2 locks, so no mutual exclusion.
I really don't see why the above, without the synchronized, could generate deadlocks: you're updating a single row in the table. You could have a deadlock if you had two concurrent transactions, one updating user1 then user2, and the other updating user2 then user1. But even then, the database would detect the deadlock and throw an exception for one of the transactions.
You need to use the optimistic lock pattern. Take a look here for more details: http://docs.jboss.org/jbossas/docs/Server_Configuration_Guide/4/html/The_CMP_Engine-Optimistic_Locking.html
And probably this, which has more low-level details: http://docs.jboss.org/jbossas/docs/Server_Configuration_Guide/4/html/The_CMP_Engine-Optimistic_Locking.html
After a concurrency issue has been identified via the optimistic lock, you may prefer to retry; you have full control over what to do.
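A rough sketch of how that could look with JPA optimistic locking (entity and helper names are placeholders, not taken from the question): a @Version column turns the update into a compare-and-swap, and the caller retries when a concurrent update is detected.

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.OptimisticLockException;
import javax.persistence.Version;

@Entity
public class PunctuationEntity {

    @Id
    private String userId;

    private int value;

    @Version
    private long version; // incremented by JPA on every successful update
}

// Caller side: retry a bounded number of times when somebody else updated the row first.
public void addPunctuation(String userId, int delta) {
    for (int attempt = 0; attempt < 3; attempt++) {
        try {
            // placeholder helper: loads the entity, adds delta and commits in a new transaction
            addInNewTransaction(userId, delta);
            return;
        } catch (OptimisticLockException e) {
            // concurrent update detected - loop and retry against the fresh version
        }
    }
    throw new IllegalStateException("could not update punctuation for " + userId);
}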