Is Google App Engine datastore.get(key) consistent? - java

I've read the consistency page on
https://cloud.google.com/appengine/docs/java/datastore/structuring_for_strong_consistency
Now I know that for queries to be consistent you need to use ancestor queries.
What about a single-key lookup? For example:
Entity e = datastore.get(key)
Is it eventually consistent or strongly consistent?
Please cite references or links.

Yes, a get with a specific key is always consistent.
The documentation isn't as clear about this as it could be, but a get is not a query: it's a simple lookup in what is basically a key-value store. That will always return the correct data. It is only queries that can be inconsistent, because they must be done against indexes and the index update can lag.
The only reference I can give you is to point out that get is discussed on the Entities, Properties and Keys page whereas data consistency is discussed on the Datastore Queries page.
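To make the distinction concrete, here is a minimal sketch using the low-level Java Datastore API (the Person kind, ID and property are made up for illustration): a get by key is a direct lookup and never touches the query indexes.

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;

public class GetByKeyExample {

    public Entity loadPerson(long id) throws EntityNotFoundException {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

        // Build the key for a hypothetical "Person" entity.
        Key key = KeyFactory.createKey("Person", id);

        // Direct lookup by key: strongly consistent, no index involved.
        // Only queries (which read indexes) can return stale results.
        return datastore.get(key);
    }
}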

Related

SQL equivalent of Javax Cache 'put' (INSERT or UPDATE)

I am using javax.cache along with a database. I use the cache's APIs to get/put/delete entities, and the database sits behind this cache. For this, I am using a CacheLoader and a CacheWriter.
So, the following are the SQL constructs equivalent to the cache API:
SELECT -> get
INSERT -> put
DELETE -> delete
If an entry is already present in the cache and I update it, I will only receive that value in the 'write' method. But since the value is already present in the database, I need to use an UPDATE query.
How do I identify which database operation to perform in the cache's 'put' operation?
Note: UPSERT is not a good option from a performance point of view.
If you put a value in the cache, you can first check whether the key is already there; in that case you need an UPDATE. If the key was not present, you need an INSERT. It sounds like you could benefit from an ORM with an L2 cache, such as Hibernate, which handles all these scenarios (and many more) for you.
There are several ways I can think of. Basically these are variations of:
Metadata in the database
Within an entity I typically have additional fields for insert and update timestamps and a modification counter, which are handled by the object-relational mapper (ORM). That is very useful for debugging. The CacheWriter can check whether the insert timestamp is set: if yes, it is an update; if no, it is an insert.
It does not matter whether the value gets evicted in the meantime, as long as your application reads the latest contents through the cache and writes a modified version of it.
If your application does not read the data before modifying it, or if this happens very often, I suggest caching a flag like insertedAlready. That leads to three-way logic: inserted, not inserted, not in the cache = don't know yet. In the latter case you need to do a read before the update or insert in the cache writer.
Metadata in the cache only
The cached object stores additional data indicating whether the object was read from the database before. Like:
class CachedDbValue<V> {
    boolean insertedAlready;
    V databaseContent;
}
The code facing your application needs to wrap the database data into the cached value.
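Building on that wrapper, a minimal CacheWriter sketch (the Long key type, the DbCacheWriter name and the insertRow/updateRow/deleteRow DAO methods are hypothetical) could use the flag to pick the SQL statement:

import java.util.Collection;
import javax.cache.Cache;
import javax.cache.integration.CacheWriter;
import javax.cache.integration.CacheWriterException;

public class DbCacheWriter<V> implements CacheWriter<Long, CachedDbValue<V>> {

    @Override
    public void write(Cache.Entry<? extends Long, ? extends CachedDbValue<V>> entry)
            throws CacheWriterException {
        CachedDbValue<V> value = entry.getValue();
        if (value.insertedAlready) {
            updateRow(entry.getKey(), value.databaseContent);   // row already exists -> UPDATE
        } else {
            insertRow(entry.getKey(), value.databaseContent);   // never persisted -> INSERT
        }
    }

    @Override
    public void writeAll(Collection<Cache.Entry<? extends Long, ? extends CachedDbValue<V>>> entries)
            throws CacheWriterException {
        for (Cache.Entry<? extends Long, ? extends CachedDbValue<V>> entry : entries) {
            write(entry);
        }
    }

    @Override
    public void delete(Object key) throws CacheWriterException {
        deleteRow((Long) key);                                  // DELETE
    }

    @Override
    public void deleteAll(Collection<?> keys) throws CacheWriterException {
        for (Object key : keys) {
            delete(key);
        }
    }

    // Hypothetical DAO methods backed by JDBC or an ORM.
    private void insertRow(Long key, V content) { /* INSERT ... */ }
    private void updateRow(Long key, V content) { /* UPDATE ... */ }
    private void deleteRow(Long key)            { /* DELETE ... */ }
}

The application-facing code would set insertedAlready to true when it loads an existing row through the CacheLoader, and leave it false for brand-new values.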
Side note 1: Don't read the object from the cache and modify the instance directly, always make a copy. Modifying the object directly may have different unwanted effects with different JCache implementations. Also check my explanation here: javax.cache store by reference vs. store by value
Side note 2: You are building a caching ORM layer by yourself. Maybe use an existing one.

A hybrid of cache based and query based paging in hibernate/JPA

If the result set is large, then having the entire result set in memory (in a server cache, e.g. Hazelcast) will not be feasible. With large result sets, you cannot afford to have them in memory. In such a case, you have to fetch a chunk of data at a time (query-based paging). The downside of using query-based paging is that there will be multiple calls to the database for multiple page requests.
Can anyone suggest how to implement a hybrid of the two approaches?
I haven't put any sample code here since I think the question is more about a logic instead of specific code. Still if you need sample code I can put it.
Thanks in advance.
The most effective solution is to use the primary key as a paging criterion. This enables us to rely on first-class constructs like a BETWEEN range query, which is simple for the RDBMS to optimize, and the primary key of the queried entity will most likely be indexed already.
Retrieving data using a range query on the primary key is a two-step process. First, one has to retrieve the collection of primary keys, followed by a step that generates the intervals identifying a proper subset of the data, followed by the actual queries against the data.
This approach is almost as fast as the brute-force version. The memory consumption is about one tenth. By selecting the appropriate page size for this implementation, you may alter the ratio between execution time and memory consumption. This version is also stateless: it does not keep references to resources like the ScrollableResults version does, nor does it strain the database like the version using setFirstResult/setMaxResults.
Effective pagination using Hibernate
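A minimal sketch of that idea with Hibernate (the Value entity, its Long id property, and the page size are assumptions for illustration): fetch the sorted keys once, then issue one BETWEEN query per interval.

import java.util.List;
import org.hibernate.Session;

public class PrimaryKeyPager {

    private static final int PAGE_SIZE = 1000;  // tune to trade memory for round trips

    public void processAll(Session session) {
        // Step 1: retrieve only the primary keys, sorted.
        List<Long> ids = session
                .createQuery("select v.id from Value v order by v.id", Long.class)
                .list();

        // Step 2: build intervals over the key list and run a BETWEEN query per page.
        for (int start = 0; start < ids.size(); start += PAGE_SIZE) {
            int end = Math.min(start + PAGE_SIZE, ids.size()) - 1;
            List<Value> page = session
                    .createQuery("from Value v where v.id between :lo and :hi order by v.id",
                                 Value.class)
                    .setParameter("lo", ids.get(start))
                    .setParameter("hi", ids.get(end))
                    .list();
            process(page);
        }
    }

    private void process(List<Value> page) { /* per-page work */ }
}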

How strong consistency and eventual consistency work in Datastore

I'm no expert in databases, so what I know about queries is that they are the way to read from or write to a database.
With eventual consistency, a read may return stale data.
For a write, the first data node will be updated, but the other nodes will need some time to be updated.
With strong consistency, reads are blocked until the data has been updated to its latest version (really, I'm not sure about what I said here, so correct me if I got it wrong).
For a write, all read operations are blocked until the data node has been updated to its latest version.
So if I write data eventually and then try an ancestor query to get that data, will I get the latest version?
If I use an ancestor query to update, will all eventual read operations get the latest version?
Update
I think transactions are there so that if there are multiple modification requests to the same data, one will succeed and the others will fail. After that, the data that has been modified will take some time to be replicated to all datacenters, so a successful transaction does not mean all read queries will return the latest version (correct me if I'm wrong).
If you use what you call an "ancestor query", you're working in a transaction: either the transaction terminates successfully, in which case all subsequent reads will get the values as updated by the transaction, or else the transaction fails, in which case none of the changes made by the transaction will be seen (this all-or-nothing property is often referred to as a transaction being "atomic"). In particular, you do get strong consistency this way, not just eventual consistency.
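As a concrete illustration of that all-or-nothing behavior, here is a minimal sketch with the low-level Java Datastore API (the Task kind and done property are made up): an atomic write inside a transaction on one entity group, followed by a strongly consistent ancestor query against the same group.

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.Transaction;

public class AncestorExample {

    public void writeAndRead(Key parentKey) {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

        // All-or-nothing: either the commit succeeds and later reads see the
        // change, or it fails and nothing is visible.
        Transaction txn = datastore.beginTransaction();
        try {
            Entity task = new Entity("Task", parentKey);  // child of the common ancestor
            task.setProperty("done", false);
            datastore.put(txn, task);
            txn.commit();
        } finally {
            if (txn.isActive()) {
                txn.rollback();
            }
        }

        // Ancestor query: scoped to one entity group, strongly consistent.
        Query q = new Query("Task").setAncestor(parentKey);
        for (Entity e : datastore.prepare(q).asIterable()) {
            System.out.println(e.getKey() + " done=" + e.getProperty("done"));
        }
    }
}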
The cost can be large, in terms of performance and scalability. In particular, an application should not update an entity group (any and all entities descending from a common ancestor) more than once a second, which can be a very constraining limit for a highly scalable application.
The online docs include a large variety of tips, tricks and advice on how to deal with this -- you could start at https://cloud.google.com/datastore/docs/articles/balancing-strong-and-eventual-consistency-with-google-cloud-datastore/ and continue with the "additional resources" this article lists at the end.
One simple idea that often suffices is that (differently from queries) getting a specific entity from its key is strongly consistent without needing transactions, and memcache is also strongly consistent; writing a modified entity gives you its new key, so you can stash that key into memcache and have other parts of your code fetch the modified entity from that key, rather than relying on queries. This has limits, of course, because memcache doesn't give you unbounded space -- but it's a useful idea to keep in mind, nevertheless, in many practical cases.
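A minimal sketch of that pattern (the memcache slot name "latest-article-key" and the article entity are hypothetical):

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class LatestArticlePointer {

    private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
    private final MemcacheService memcache = MemcacheServiceFactory.getMemcacheService();

    // After writing the modified entity, stash its key under a well-known name.
    public void publish(Entity article) {
        Key key = datastore.put(article);         // the write returns the entity's key
        memcache.put("latest-article-key", key);  // memcache is strongly consistent
    }

    // Other code fetches by key instead of relying on a possibly lagging query.
    public Entity readLatest() throws EntityNotFoundException {
        Key key = (Key) memcache.get("latest-article-key");
        return key == null ? null : datastore.get(key);  // get by key: strongly consistent
    }
}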
With GAE the only way to be consistent is to use a transaction; within a transaction you can update and then query the latest update, but it's slower.
For me, using ancestors just composes the primary key, and that's all.

Java - Google App Engine - modelling graph structures in Google Datastore

Google App Engine offers the Google Datastore as the only NoSQL database (I think it is based on Bigtable).
In my application I have a social-like data structure and I want to model it as I would do in a graph database. My application must save heterogeneous objects (users,files,...) and relationships among them (such as user1 OWNS file2, user2 FOLLOWS user3, and so on).
I'm looking for a good way to model this typical situation, and I thought of two families of solutions:
List-based solutions: Any object contains a list of other related objects and the object presence in the list is itself the relationship (as Google said in the JDO part https://developers.google.com/appengine/docs/java/datastore/jdo/relationships).
Graph-based solution: Both nodes and relationships are objects. The objects exist independently from the relationships while each relationship contain a reference to the two (or more) connected objects.
What are strong and weak points of these two approaches?
About approach 1: This is the simplest approach one can think of, and it is also presented in the official documentation, but:
Each directed relationship makes the object record grow: are there any limitations on the number of possible relationships, given for instance the entity size limit?
Is that a JDO feature, or does the Datastore structure also allow that approach to be implemented naturally?
The relationship search time will increase with the list; is this solution suitable for large numbers (millions) of relationships?
About approach 2: Each relationship can have a higher level of characterization (it is an object and it can have properties). And I think memory size is not a problem for Google, but:
Each relationship requires its own record, so the search time for each related pair will increase as the total number of relationships increases. Is this suitable for large amounts of relationships (millions, billions)? I.e., does Google have good tricks for searching among records if they are well structured, or will I soon be in a situation where searching for a friend of User1 called User4 takes seconds?
On the other hand, each object doesn't grow in size as new relationships are added.
Could you help me find other important points about the two approaches so I can choose the best model?
First, the search time in the Datastore does not depend on the number of entities that you store, only on the number of entities that you retrieve. Therefore, if you need to find one relationship object out of a billion, it will take the same time as if you had just one object.
Second, the list approach has a serious limitation called "exploding indexes". You will have to index the property that contains a list to make it searchable. If you ever use a query that references more than just this property, you will run into this issue - google it to understand the implications.
Third, the list approach is much more expensive. Every time you add a new relationship, you will rewrite the entire entity at considerable writing cost. The reading costs will be higher too if you cannot use keys-only queries. With the object approach you can use keys-only queries to find relationships, and such queries are now free.
UPDATE:
If your relationships are directed, you may consider making Relationship entities children of User entities, and using an Object id as an id for a Relationship entity as well. Then your Relationship entity will have no properties at all, which is probably the most cost-efficient solution. You will be able to retrieve all objects owned by a user using keys-only ancestor queries.
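A minimal sketch of that layout with the low-level Java Datastore API (the Relationship kind name and ownership semantics are assumptions): the relationship lives as a property-less child of the User and reuses the owned object's id as its key name, so a keys-only ancestor query lists everything a user owns.

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Query;

public class OwnershipRelations {

    private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

    // "user OWNS object": a child entity with no properties at all.
    public void addOwnership(Key userKey, String ownedObjectId) {
        Entity relationship = new Entity("Relationship", ownedObjectId, userKey);
        datastore.put(relationship);
    }

    // Keys-only ancestor query: strongly consistent and cheap; the owned
    // object ids are read straight from the returned keys.
    public void listOwned(Key userKey) {
        Query q = new Query("Relationship").setAncestor(userKey).setKeysOnly();
        for (Entity e : datastore.prepare(q).asIterable()) {
            System.out.println("owned object id: " + e.getKey().getName());
        }
    }
}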
I have an AppEngine application and I use both approaches. Which is better depends on two things: the practical limits of how many relationships there can be and how often the relationships change.
NOTE 1: My answer is based on experience with Objectify and heavy use of caching. Mileage may vary with other approaches.
NOTE 2: I've used the term 'id' instead of the proper Datastore term 'name' here. Name would have been confusing, and id matches Objectify terms better.
Consider users linked to the schools they've attended and vice versa. In this case, you would do both. Link the users to schools with a variation of the 'List' method. Store the list of school ids the user attended as a UserSchoolLinks entity with a different type/kind but with the same id as the user. For example, if the user's id = '6h30n' store a UserSchoolLinks object with id '6h30n'. Load this single entity by key lookup any time you need to get the list of schools for a user.
However, do not do the reverse for the users that attended a school. For that relationship, insert a link entity. Use a combination of the school's id and the user's id for the id of the link entity. Store both id's in the entity as separate properties. For example, the SchoolUserLink for user '6h30n' attending school 'g3g0a3' gets id 'g3g0a3~6h30n' and contains the fields: school=g3g0a3 and user=6h30n. Use a query on the school property to get all the SchoolUserLinks for a school.
Here's why:
Users will see their schools frequently but change them rarely. Using this approach, the user's schools will be cached and won't have to be fetched every time they hit their profile.
Since you will be getting the user's schools via a key lookup, you won't be using a query. Therefore, you won't have to deal with eventual consistency for the user's schools.
Schools may have many users that attended them. By storing this relationship as link entities, we avoid creating a huge single object.
The users that attended a school will change a lot. This way we don't have to write a single, large entity frequently.
By using the id of the User entity as the id for the UserSchoolLinks entity we can fetch the links knowing just the id of the user.
By combining the school id and the user id as the id for the SchoolUserLink, we can do a key lookup to see if a user and school are linked. Once again, no need to worry about eventual consistency for that.
By including the user id as a property of the SchoolUserLink we don't need to parse the SchoolUserLink object to get the id of the user. We can also use this field to check consistency between both directions and have a fallback in case somehow people are attending hundreds of schools.
Downsides:
1. This approach violates the DRY principle. Seems like the least of evils here.
2. We still have to use a query to get the users who attended a school. That means dealing with eventual consistency.
Don't forget to update the UserSchoolLinks entity and add/remove the SchoolUserLink entity in a transaction.
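To make both directions concrete, here is a minimal sketch with the low-level Java Datastore API rather than Objectify (kind names and the '~' separator follow the answer; the schools list property on UserSchoolLinks is an assumption):

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.Query.FilterOperator;
import com.google.appengine.api.datastore.Query.FilterPredicate;
import java.util.List;

public class SchoolLinks {

    private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

    // User -> schools: one entity per user, with the same id as the user; fetched by key.
    public void saveUserSchools(String userId, List<String> schoolIds) {
        Entity links = new Entity("UserSchoolLinks", userId);
        links.setProperty("schools", schoolIds);  // hypothetical list property
        datastore.put(links);
    }

    // School -> users: one link entity per (school, user) pair, id = "schoolId~userId".
    public void linkUserToSchool(String schoolId, String userId) {
        Entity link = new Entity("SchoolUserLink", schoolId + "~" + userId);
        link.setProperty("school", schoolId);
        link.setProperty("user", userId);
        datastore.put(link);
    }

    // Key lookup: strongly consistent check whether a user and a school are linked.
    public boolean isLinked(String schoolId, String userId) {
        try {
            datastore.get(KeyFactory.createKey("SchoolUserLink", schoolId + "~" + userId));
            return true;
        } catch (EntityNotFoundException e) {
            return false;
        }
    }

    // Query on the indexed school property: eventually consistent list of attendees.
    public Iterable<Entity> usersForSchool(String schoolId) {
        Query q = new Query("SchoolUserLink")
                .setFilter(new FilterPredicate("school", FilterOperator.EQUAL, schoolId));
        return datastore.prepare(q).asIterable();
    }
}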
Your question is quite complex, but I'll try to explain the best solution (I will answer in Python, but the same can be done in Java).
class User(db.Model):  # db.Model, not db.User, is the model base class
    followers = db.StringListProperty()
Simply add a follower:
user = User.get(key)
user.followers.append(str(followerKey))
user.put()  # persist the updated followers list
This allows fast queries for a user's followers (directly from the list) and for who is followed:
User.all().filter('followers =', str(followerKey))  # -> users followed by followerKey
This query is I/O costly, so you can make it faster, but at the price of more complexity and more I/O on writes:
class User(db.Model):
    followers = db.StringListProperty()
    follows = db.StringListProperty()
However, this becomes complicated during changes, since deleting a user also requires updating the follows lists, so you need two writes.
You can also store relationships as separate entities, but that is the worst scenario since it is more complex than the second example with followers and follows. Keep in mind that an entity can hold 1 MB; that is usually not a limiting factor, but it can become one.

Can Hibernate return a collection of result objects OTHER than a List?

Does the Hibernate API support object result sets in the form of a collection other than a List?
For example, I have a process that runs hundreds of thousands of iterations in order to create some data for a client. This process uses records from a Value table (for example) in order to create this output for each iteration.
With a List I would have to iterate through the entire list in order to find a certain value, which is expensive. I'd like to be able to return a TreeMap and specify a key programmatically so I can search the collection for the specific value I need. Can Hibernate do this for me?
I assume you are referring to the Query.list() method. If so: no, there is no way to return top-level results other than a List. If you are receiving too many results, why not issue a more constrained query to the database? If the query is difficult to constrain, you can populate your own Map with the contents of Hibernate's List and then throw away the list.
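For that last suggestion, a minimal sketch (the Value entity and its getCode() business-key accessor are made up for illustration):

import java.util.List;
import java.util.TreeMap;
import org.hibernate.Session;

public class ValueLookup {

    // Run the query once, then index the results by the property you search on.
    public TreeMap<String, Value> loadValuesByCode(Session session) {
        List<Value> values = session.createQuery("from Value", Value.class).list();

        TreeMap<String, Value> byCode = new TreeMap<>();
        for (Value v : values) {
            byCode.put(v.getCode(), v);  // getCode() is a hypothetical business key
        }
        return byCode;                   // O(log n) lookups instead of scanning the List
    }
}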
If I understand correctly, you load a bunch of data from the database into memory and then use it locally by looking for certain objects in that list.
If this is the case, I see 2 options.
Don't load all the data; instead, for each iteration access the database with a query returning only the specific record that you need. This will make more database queries, so it will probably be slower, but with much less memory consumption. This solution could easily be improved by adding a cache, so that the most used values are retrieved quickly. It will of course need some performance measurement, but I usually favor a naive solution with good caching, as the cache can be implemented as a cross-cutting concern and be very transparent to the programmer.
If you really want to load all your data in memory (which is actually a form of caching), the time to transform your data from a list to a TreeMap (or any other efficient structure) will probably be small compared to the full processing. So you could do the data transformation yourself.
As I said, in the general case, I would favor a solution with caching ...
From Java Persistence with Hibernate:
A java.util.Map can be mapped with <map>, preserving key and value pairs. Use a java.util.HashMap to initialize a property.
A java.util.SortedMap can be mapped with the <map> element, and the sort attribute can be set to either a comparator or natural ordering for in-memory sorting. Initialize the collection with a java.util.TreeMap instance.
Yes, that can be done.
However, you'll probably have to have your domain class implement Comparable; I don't think you can do it using a Comparator.
Edit:
It seems like I misunderstood the question. If you're talking about the result of an ad hoc query, then the above will not help you. It might be possible to make it work by binding an object with a TreeMap property to a database view if the query is fixed.
And of course you can always build the map yourself with very little work and processing overhead.
