I have this code that works just fine without HR (the High Replication datastore):
protected Entity createEntity(Key key, Map<String, Object> props) {
    Entity result = null;
    try {
        Entity e = new Entity(key);
        Iterator<Map.Entry<String, Object>> it = props.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Object> entry = it.next();
            String propName = entry.getKey();
            Object propValue = entry.getValue();
            setProperty(e, propName, propValue);
        }
        key = _ds.put(e);
        if (key != null)
            result = _ds.get(key);
    } catch (EntityNotFoundException e1) {
        // swallowed: result stays null
    }
    return result;
}
This is just a simple method whose function is to create a new Entity out of a given key, and to return null otherwise. It works fine without the HR configuration in JUnit; however, once I configured HR, I always get an error where _ds.get(key) can't find the key, throwing:
EntityNotFoundException: No entity was found matching the key:
Specifically when doing:
while (it.hasNext()) {
    // stuff
    createEntity(key, map);
    // stuff
}
I assume that the problem in my code is that it tries to fetch the entity too soon. If that is the case, how can I deal with this without resorting to Memcache or anything like that?
Update:
When createEntity is executed within a transaction, it fails. However, if I move it outside of the transaction, it fails miserably as well. I need to be able to run it within a transaction, since my higher-level API puts lots of objects that need to be there as a group.
Update:
I followed Strom's advice, but I found a weird side effect: not doing a _ds.get(key) in the method makes my PreparedQuery countEntities fail, whereas if I add a _ds.get(key), even if I don't do anything with (or save) the Entity returned from that get, countEntities returns the expected count. Why is that?
You try to create a new entity and then read back that entity within the same transaction? Can't be done.
Queries and gets inside transactions see a single, consistent snapshot of the datastore that lasts for the duration of the transaction. [1]
In a transaction, all reads reflect the current, consistent state of the Datastore at the time the transaction started. This does not include previous puts and deletes inside the transaction. Queries and gets inside a transaction are guaranteed to see a single, consistent snapshot of the Datastore as of the beginning of the transaction. [2]
This consistent snapshot view also extends to reads after writes inside transactions. Unlike with most databases, queries and gets inside a Datastore transaction do not see the results of previous writes inside that transaction. Specifically, if an entity is modified or deleted within a transaction, a query or get returns the original version of the entity as of the beginning of the transaction, or nothing if the entity did not exist then. [2]
PS. Your assumption is wrong; it's impossible to fetch an entity by key "too soon". Fetches by key are strongly consistent.
Also, why do you need to retrieve the entity again anyway? You just put it in the datastore yourself, so you already have its contents.
So change this part:
key = _ds.put(e);
if (key != null)
result = _ds.get(key);
To this:
key = _ds.put(e);
if (key != null)
result = e; // key.equals(e.getKey()) == true
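Put together, the method could look roughly like this (a sketch, reusing _ds and the setProperty helper from the question; since put() returns the completed key, no read-back is needed and this also works inside a transaction under HR):
protected Entity createEntity(Key key, Map<String, Object> props) {
    // Build the entity from the given key and properties.
    Entity e = new Entity(key);
    for (Map.Entry<String, Object> entry : props.entrySet()) {
        setProperty(e, entry.getKey(), entry.getValue());
    }
    // put() returns the stored key; the entity we just built is the result.
    Key storedKey = _ds.put(e);
    return storedKey != null ? e : null;
}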
Welcome to the GAE environment. Try to read it a few more times before you give up:
int counter = 0;
while (counter < NUMBER_OF_TRIES) {
    try {
        // calling storage or any other non-reliable thing
        if (success) {
            break; // escape as soon as the call succeeds
        }
    } catch (EntityNotFoundException e) {
        // log the exception
    }
    counter++; // count every attempt so the loop always terminates
}
Important note from the Google documentation: "the rate at which you can write to the same entity group is limited to 1 write to the entity group per second."
Source: https://developers.google.com/appengine/docs/java/gettingstarted/usingdatastore
Related
How do I handle Hibernate queries that take too long to return a result? I have already configured a query timeout, but debugging shows that the DB responds and returns the data, while Hibernate then fails while mapping it.
I do not want this to happen in production, where my query might fail because Hibernate never responds.
I need a solution to get out of this scenario.
setProperty("javax.persistence.query.timeout", 180000);
JPAQuery query = queryFactory.select(....)
do{
List<Tuple> data = query.fetch().limit(5000);
//--------
} while(flag)
The above code works fine when the data is small, but for some data sets/conditions the data is huge and Hibernate eventually stops responding.
Try the following steps:
Use lazy fetching instead of eager fetching, e.g. @ManyToMany(mappedBy="authors", fetch=FetchType.LAZY)
Or maybe check whether you are making any of these mistakes:
You are using HibernateDaoSupport.getSession() without ever releasing the sessions via releaseSession() (as described in the javadocs). Instead:
a) use HibernateDaoSupport.getHibernateTemplate() to cleanly create/destroy sessions
b) use getSession()/releaseSession() in a finally block
c) forget about HibernateDaoSupport, define transactions and use sessionFactory.getCurrentSession()
Using session.refresh(entity), or entityManager.refresh(entity) if you use JPA, will give you fresh data from the DB (see the sketch below).
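As a minimal sketch of the lazy mapping and the refresh call (the Author/Book entities here are illustrative assumptions, not classes from the question):
import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.ManyToMany;

@Entity
class Book {
    @Id
    Long id;

    @ManyToMany
    Set<Author> authors;
}

@Entity
class Author {
    @Id
    Long id;

    // LAZY: the collection is loaded only when it is actually accessed,
    // so the initial query stays small and maps quickly.
    @ManyToMany(mappedBy = "authors", fetch = FetchType.LAZY)
    Set<Book> books;
}

// Later, to re-read the current database state into a managed entity:
// entityManager.refresh(author);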
1) To set a timeout on a Hibernate query you can set the hint "javax.persistence.query.timeout".
Code snippet:
List<Test> test = em.createQuery("SELECT t FROM Test t")
        .setHint("javax.persistence.query.timeout", 1)
        .getResultList();
2) In case some columns contain large data, you can use CLOB and BLOB types for the huge values.
Based on your last comment, you are looking for a way to manage a timeout for certain queries.
You can achieve this while creating your org.hibernate.Query with Hibernate:
Query queryObject = // initialize your query as you need;
queryObject.setTimeout(10); // the argument is the timeout in seconds
Hope this helps
In the end I couldn't find any direct way to get control over a Hibernate query call.
The timeout was not working for me because Postgres had already returned the result set, but Hibernate was taking a long time to map it (correct me if I am wrong) due to the data size.
The following piece of code saved me.
ExecutorService executor = Executors.newSingleThreadExecutor();
List<Future<List<Tuple>>> futureData =
        executor.invokeAll(Arrays.asList(new QueryService(params...)), 2, TimeUnit.MINUTES);
executor.shutdown();

for (Future future : futureData) {
    try {
        data = (List<Tuple>) future.get();
    } catch (CancellationException e) {
        // The query did not finish within 2 minutes: retry with a smaller limit.
        if (EXPORT_LIMIT > 1000) {
            EXPORT_LIMIT = 1000;
        } else if (EXPORT_LIMIT > 500) {
            EXPORT_LIMIT = 500;
        } else if (EXPORT_LIMIT > 100) {
            EXPORT_LIMIT = 100;
        } else {
            throw e;
        }
        isValid = false;
        break;
    }
}
So basically, if my default fetch limit of 5000 is not working, I keep reducing it down to 100. If a fetch size of 100 also fails, an exception is thrown.
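For reference, this is roughly what the QueryService task could look like (the class name, constructor, and limit handling here are assumptions for illustration, not the poster's actual code):
import java.util.List;
import java.util.concurrent.Callable;
import com.querydsl.core.Tuple;
import com.querydsl.jpa.impl.JPAQuery;

class QueryService implements Callable<List<Tuple>> {

    private final JPAQuery<Tuple> query;
    private final long limit;

    QueryService(JPAQuery<Tuple> query, long limit) {
        this.query = query;
        this.limit = limit;
    }

    @Override
    public List<Tuple> call() {
        // Runs on the executor thread; if it does not finish within the
        // invokeAll timeout, the future is cancelled and future.get()
        // throws CancellationException in the calling code above.
        return query.limit(limit).fetch();
    }
}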
Thank you.
I have the following domain object:
@Document
class Foo {
    @Id
    private final String bar;
    private final String baz;
    // getters, setters, constructor omitted
}
Which is inserted as follows:
Collection<Foo> foos = ...;
mongoTemplate.insert(foos, Foo.class);
How can I save all of them in one call, ignoring all duplicate key exceptions?
In my case it was not suitable to allow modification/overwriting of existing documents as in @marknorkin's answer. Instead, I only wanted to insert new documents. I came up with this using MongoOperations, which is injectable in Spring. The code below is in Kotlin.
try {
    // we do not want to overwrite existing documents, especially not behind the event horizon
    // we hence use unordered inserts and suppress the duplicate key exceptions
    // as described in: https://docs.mongodb.com/v3.2/reference/method/db.collection.insertMany/#unordered-inserts
    mongoOps.bulkOps(BulkOperations.BulkMode.UNORDERED, EventContainer::class.java)
            .insert(filtered)
            .execute()
} catch (ex: BulkOperationException) {
    if (!isDuplicateKeyException(ex)) {
        throw ex
    }
}
With this little helper
private fun isDuplicateKeyException(ex: BulkOperationException): Boolean {
    val duplicateKeyErrorCode = 11000
    return ex.errors.all { it.code == duplicateKeyErrorCode }
}
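If you prefer to stay in Java, a rough equivalent of the same unordered-insert idea with MongoTemplate could look like this (a sketch against the Foo documents from the question, not drop-in code):
import java.util.ArrayList;
import java.util.Collection;
import org.springframework.data.mongodb.BulkOperationException;
import org.springframework.data.mongodb.core.BulkOperations;
import org.springframework.data.mongodb.core.MongoTemplate;

public void insertIgnoringDuplicates(MongoTemplate mongoTemplate, Collection<Foo> foos) {
    try {
        // UNORDERED: MongoDB keeps inserting the remaining documents even after
        // it hits a duplicate key, and reports all errors at the end.
        mongoTemplate.bulkOps(BulkOperations.BulkMode.UNORDERED, Foo.class)
                .insert(new ArrayList<>(foos))
                .execute();
    } catch (BulkOperationException ex) {
        // Rethrow unless every reported error is a duplicate key error (code 11000).
        boolean onlyDuplicates = ex.getErrors().stream()
                .allMatch(error -> error.getCode() == 11000);
        if (!onlyDuplicates) {
            throw ex;
        }
    }
}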
I searched through the Spring Data MongoDB documentation and other resources, but didn't find the expected answer.
It seems that Mongo inserts batch documents until the unique key constraint is violated, and after that it's up to the DB to decide what happens.
So, for example, if you need to insert 100 documents and the document at position 50 already exists in the DB, the first 49 will be inserted and the remaining ones will not.
What I came up is the next solution:
Set<String> ids = foos.stream().map(Foo::getBar).collect(toSet()); // collect all ids from docs that will be inserted
WriteResult writeResult = mongoTemplate.remove(new Query(Criteria.where("_id").in(ids)), Foo.class); // perform remove with collected ids
mongoTemplate.insert(foos, Foo.class); // now can safely insert batch
So the DB will be called twice.
Also, since bar is an indexed field, the remove operation will be fast.
The situation:
I have a clearing table with multiple thousands of records. They are split into packages of e.g. 500 records. Each package is then sent to the AS via Message Driven Beans. The AS calculates a key depending on the contents (e.g. currency, validStart, validEnd) of each record and needs to store this key in the database (together with the combination of the contents).
The request:
To avoid duplicates I want a centralized "tool" which calculates and stores the keys, thus reducing communication with the database by caching those keys together with the records.
Now I tried to use a local Infinispan cache, accessed through a utility-class implementation, for each package-processing thread. The result was that multiple packages calculated the same key, so duplicates were inserted into the database; sometimes I also got deadlocks.
I tried to implement a "lock" via a static variable to block access to the cache during a database insert, but without success.
My next attempt was to use a replicated or distributed Infinispan cache, respectively. This did not change the AS behavior.
My last idea would be to implement a bean-managed singleton session bean that acquires a transaction lock while inserting into the database.
The AS currently runs in standalone mode, but will be moved to a cluster in the near future, so a High Availability solution is preferred.
To sum up:
What's the correct way to lock Infinispan cache access during creation of (Key, Value) pairs to avoid duplicates?
Update:
@cruftex: My request is: I have a set of (Key, Value) pairs which shall be cached. When a new record is to be inserted, an algorithm is applied to it and the Key is calculated. Then the cache shall be checked for whether the Key already exists, and if so the existing Value is appended to the new record. But if the Value does not exist, it shall be created and stored in the database.
The cache needs to be realized using Infinispan because the AS shall run in a cluster. The algorithm for creating the Keys exists, and so does the code for inserting the Values into the database (via JDBC or entities). But I have the problem that, with Message Driven Beans (and thus multithreading in the AS), the same (Key, Value) pair is calculated in different threads, and each thread then tries to insert the Values into the database (which I want to avoid!).
@Dave:
public class Cache {
private static final Logger log = Logger.getLogger(Cache.class);
private final Cache<Key, FullValueViewer> fullCache;
private HomeCache homes; // wraps EntityManager
private final Session session;
public Cache(Session session, EmbeddedCacheManager cacheContainer, HomeCache homes) {
this.session = session;
this.homes = homes;
fullCache = cacheContainer.getCache(Const.CACHE_CONDCOMBI);
}
public Long getId(FullValueViewer viewerWithoutId) {
Long result = null;
final Key key = new Key(viewerWithoutId);
FullValueViewer view = fullCache.get(key);
if(view == null) {
view = checkDatabase(viewerWithoutId);
if(view != null) {
fullCache.put(key, view);
}
}
if(view == null) {
view = createValue(viewerWithoutId);
// 1. Try
fullCache.put(key, view);
// 2. Try
// if(!fullCache.containsKey(key)) {
// fullCache.put(key, view);
// } else {
// try {
// homes.condCombi().remove(view.idnr);
// } catch (Exception e) {
// log.error("remove", e);
// }
// }
// 3. Try
// synchronized(fullCache) {
// view = createValue(viewerWithoutId);
// fullCache.put(key, view);
// }
}
result = view.idnr;
return result;
}
private FullValueViewer checkDatabase(FullValueViewer newView) {
FullValueViewer result = null;
try {
CondCombiBean bean = homes.condCombi().findByTypeAndKeys(_parameters_);
result = bean.getAsView();
} catch (FinderException e) {
}
return result;
}
private FullValueViewer createValue(FullValueViewer newView) {
FullValueViewer result = null;
try {
CondCombiBean bean = homes.condCombi().create(session.subpk);
bean.setFromView(newView);
result = bean.getAsView();
} catch (Exception e) {
log.error("createValue", e);
}
return result;
}
private class Key {
private final FullValueViewer view;
public Key(FullValueViewer v) {
this.view = v;
}
@Override
public int hashCode() {
_omitted_
}
@Override
public boolean equals(Object obj) {
_omitted_
}
}
}
The cache configurations I tried with WildFly:
<cache-container name="server" default-cache="default" module="org.wildfly.clustering.server">
<local-cache name="default">
<transaction mode="BATCH"/>
</local-cache>
</cache-container>
<cache-container name="server" default-cache="default" module="org.wildfly.clustering.server">
<transport lock-timeout="60000"/>
<distributed-cache name="default" mode="ASYNC"/>
</cache-container>
I'll respond only to the question in the summary:
You can't lock the whole cache; that wouldn't scale. The best way would be to use the cache.putIfAbsent(key, value) operation and generate a different key if the entry is already there (or use a list as the value and replace it with the conditional cache.replace(key, oldValue, newValue)).
If you really want to prohibit writes to some key, you can use a transactional cache with the pessimistic locking strategy and issue cache.getAdvancedCache().lock(key). Note that there is no unlock: all locks are released when the transaction is committed or rolled back through the transaction manager.
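Applied to the getId() method from the question, a putIfAbsent-based outline might look roughly like this (a sketch reusing the question's fullCache, Key and FullValueViewer; note that the database insert inside createValue() would still need its own cleanup if it loses the race):
public Long getId(FullValueViewer viewerWithoutId) {
    final Key key = new Key(viewerWithoutId);
    FullValueViewer view = fullCache.get(key);
    if (view == null) {
        view = checkDatabase(viewerWithoutId);
        if (view == null) {
            view = createValue(viewerWithoutId);
        }
        // Atomic: only the first writer stores its value; later callers get
        // the winner's value back and should use that instead of their own.
        FullValueViewer winner = fullCache.putIfAbsent(key, view);
        if (winner != null) {
            view = winner;
        }
    }
    return view.idnr;
}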
You cannot generate your own key and use it to detect duplicates at the same time.
Either each data row is guaranteed to arrive only once, or it needs to carry a unique identifier from the external system that generates it.
If there is a unique identifier in the data (worst case, if no id is in there, it is just all the properties concatenated), then you need to use it to check for duplicates.
Now you can either go with that unique identifier directly, or generate your own internal identifier. If you do the latter, you need a translation from the external id to the internal id.
If duplicates arrive, you need to lock on the external id while you generate the internal id, and then record which internal id you assigned.
To generate a unique sequence of long values, in a cluster, you can use the CAS-operations of the cache. For example something like this:
@NotThreadSafe
class KeyGeneratorForOneThread {

    final String KEY = "keySequenceForXyRecords";
    final int INTERVAL = 100;
    Cache<String, Long> cache = ...;

    long nextKey = 0;
    long upperBound = -1;

    void requestNewInterval() {
        do {
            // assumes the sequence entry was initialized once, e.g. via cache.putIfAbsent(KEY, 0L)
            nextKey = cache.get(KEY);
            upperBound = nextKey + INTERVAL;
        } while (!cache.replace(KEY, nextKey, upperBound));
    }

    long generateKey() {
        if (nextKey >= upperBound) {
            requestNewInterval();
        }
        return nextKey++;
    }
}
Every thread has its own key generator and would generate 100 keys without needing coordination.
You may need separate caches for:
locking by external id
lookup from external to internal id
sequence number (careful: that is actually not a cache, since it must know the last number after a restart)
internal id to data
We found a solution that works in our case and might be helpful for somebody else out there:
We have two main components, a cache-class and a singleton bean.
The cache contains a copy of all records currently present in the database and a lot of logic.
The singleton bean has access to the infinispan-cache and is used for creating new records.
Initially the cache fetches a copy of the Infinispan cache from the singleton bean. Then, when we look up a record in the cache, we first apply a kind of hash method, which calculates a unique key for the record. Using this key we can identify whether the record needs to be added to the database.
If so, the cache calls the singleton bean through a create method annotated with @Lock(WRITE). The create method first checks whether the value is contained in the Infinispan cache, and if not, it creates a new record.
Using this approach we can guarantee that, even if the cache is used by multiple threads and each thread sends a request to create the same record in the database, the create process is serialized and the later requests are not carried out, because the value was already created by an earlier request.
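A skeletal version of such a singleton bean might look like this (the class and method names, and the String-to-Long cache layout, are illustrative assumptions rather than our actual implementation):
import javax.ejb.Lock;
import javax.ejb.LockType;
import javax.ejb.Singleton;
import org.infinispan.Cache;

@Singleton
public class RecordCreator {

    private Cache<String, Long> keyCache; // the injected/looked-up Infinispan cache

    // Container-managed WRITE lock: only one caller at a time may enter this
    // method, so two threads cannot create the same record concurrently.
    @Lock(LockType.WRITE)
    public Long createIfAbsent(String key) {
        Long id = keyCache.get(key);
        if (id == null) {
            id = insertIntoDatabase(key);
            keyCache.put(key, id);
        }
        return id;
    }

    private Long insertIntoDatabase(String key) {
        // placeholder: do the real JDBC / entity insert here and return the new id
        throw new UnsupportedOperationException("not implemented in this sketch");
    }
}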
Deleting vertices from Titan leads to inconsistent read behavior. I'm testing this on a single machine running Cassandra, here's my conf.properties:
storage.backend=cassandra
storage.hostname=localhost
storage.cassandra.keyspace=test
The following method deletes the appropriate vertex:
public void deleteProfile(String uuid, String puuid) {
for(Person person : this.graph.getVertices("uuid", uuid, Person.class)) {
if (person != null) {
for (Profile profile : this.graph.getVertices("uuid", puuid, Profile.class)) {
person.removeProfile(profile);
graph.removeVertex(profile.asVertex());
}
}
}
this.graph.getBaseGraph().commit();
}
When the following method gets called it returns two different sets of results:
public Iterable<ProfileImpl> getProfiles(String uuid) {
List<ProfileImpl> profiles = new ArrayList<>();
for(Person person : this.graph.getVertices("uuid", uuid, Person.class)) {
if (person != null) {
for (Profile profile : person.getProfiles()) {
profiles.add(profile.toImpl());
}
}
}
return profiles;
}
One result will be as expected: it will not contain the deleted profile. However, when I run it enough times, it sometimes contains one extra profile, the one which was deleted.
Attempting to delete the same vertex again shows that no vertex exists with that 'uuid': the iterator's hasNext() returns false.
After the program is restarted, however, it never returns the deleted vertex. How can I fix this inconsistent behavior?
The problem is that on some threads, transactions had been opened for the graph already. Reading from the graph opens up a transaction, even if nothing is changed. These transactions need to be closed in order to ensure that the behavior is consistent.
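For example, the read method could close its transaction when it is done, along these lines (a sketch that assumes the same getBaseGraph() wrapper as in the question; rollback() is sufficient because nothing was modified):
public Iterable<ProfileImpl> getProfiles(String uuid) {
    List<ProfileImpl> profiles = new ArrayList<>();
    try {
        for (Person person : this.graph.getVertices("uuid", uuid, Person.class)) {
            if (person != null) {
                for (Profile profile : person.getProfiles()) {
                    profiles.add(profile.toImpl());
                }
            }
        }
    } finally {
        // Close the transaction that the reads opened implicitly, so the next
        // call starts from a fresh, up-to-date snapshot.
        this.graph.getBaseGraph().rollback();
    }
    return profiles;
}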
According to http://s3.thinkaurelius.com/docs/titan/0.9.0-M2/tx.html#tx-config you should set checkInternalVertexExistence.
I am trying to add a filter to check for duplicate values that a user might input. I am not sure where I am going wrong in my query.
My query does not enter the loop that checks whether the name already exists.
I am fairly new to Google Cloud. Can someone tell me how I can fix my problem, or whether there is a better solution?
else if ( commandEls[0].equals( "add_director" ) ) {
String name = commandEls[1];
String gender = commandEls[2];
String date_of_birth = commandEls[3];
boolean duplicate = false;
//add a director record with the given fields to the datastore, don't forget to check for duplicates
Entity addDirectorEntity = new Entity("Director");
// check if the entity already exits
// if !duplicate add, else "Already exisits"
Query directorExists = new Query("Movies");
// Director Name is the primary key
directorExists.addFilter("directorName",Query.FilterOperator.EQUAL, name);
System.out.print(name);
PreparedQuery preparedDirectorQuery = datastore.prepare(directorExists);
System.out.print("outside");
for(Entity directorResult : preparedDirectorQuery.asIterable()){
// result already exists in the database
String dName = (String) directorResult.getProperty(name);
System.out.print(dName);
System.out.print("finish");
duplicate = true;
}
if(!duplicate){
addDirectorEntity.setProperty("directorName",name);
addDirectorEntity.setProperty("directorGender",gender);
addDirectorEntity.setProperty("directorDOB",date_of_birth);
try{
datastore.put(addDirectorEntity);
results = "Command executed successfully!";
}
catch(Exception e){
results = "Error";
}
}
else {
results = "Director already exists!";
}
}
Non-ancestor queries (like the one in your example) are eventually consistent, so they cannot reliably detect duplicate property values. Ancestor queries are fully consistent, but they require structuring your data using entity groups, and that comes at the cost of write throughput.
If the directorName property in your example is truly unique, you could use it as the name in the key of your Director entities. Then, when you are inserting a new Director entity, you can first check if it already exists (inside of a transaction).
There's no general, built-in way in Datastore to ensure the uniqueness of a property value. This related feature request contains discussion of some possible strategies for approximating a uniqueness constraint.
I'd also recommend reading up on queries and consistency in the Datastore.
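A rough sketch of that approach, reusing the variables from the snippet above (datastore, name, gender, date_of_birth, results) and the low-level datastore API; treat it as an outline rather than a finished solution:
// Use the director's name as the key name, so uniqueness is enforced by the key itself.
Key directorKey = KeyFactory.createKey("Director", name);
Transaction txn = datastore.beginTransaction();
try {
    datastore.get(txn, directorKey); // strongly consistent lookup by key
    results = "Director already exists!";
    txn.rollback();
} catch (EntityNotFoundException notFound) {
    Entity addDirectorEntity = new Entity(directorKey); // same key => at most one such entity
    addDirectorEntity.setProperty("directorName", name);
    addDirectorEntity.setProperty("directorGender", gender);
    addDirectorEntity.setProperty("directorDOB", date_of_birth);
    datastore.put(txn, addDirectorEntity);
    txn.commit();
    results = "Command executed successfully!";
}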
That is a valid thing to do, but I figured out my problem.
I was making an Entity for Director, whereas it should have been for Movies.