I am experimenting with Google App Engine (High Replication Datastore)
I understand that frequent writes to a single entity group can cause contention.
Precisely to avoid that, my entities are all root entities, i.e. each entity is a separate entity group.
I begin a transaction,
get the entity,
if it already exists, roll back the transaction,
else put the entity and commit the transaction (a rough Java sketch of this follows below).
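To make that flow concrete, here is a minimal sketch against the low-level Java Datastore API; the kind "MyKind" and the key name are placeholders, not anything from the original question:

```java
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Transaction;

public class InsertIfAbsent {

    // Returns true if the entity was newly created, false if it already existed.
    static boolean insertIfAbsent(String keyName) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        Transaction txn = ds.beginTransaction();
        try {
            // Get the entity by key; each entity is a root entity (its own entity group).
            ds.get(txn, KeyFactory.createKey("MyKind", keyName));
            txn.rollback();            // already exists: roll back
            return false;
        } catch (EntityNotFoundException notFound) {
            Entity entity = new Entity("MyKind", keyName);
            ds.put(txn, entity);       // put the new root entity
            txn.commit();              // commit the single-group transaction
            return true;
        } finally {
            if (txn.isActive()) {
                txn.rollback();        // safety net if neither commit nor rollback ran
            }
        }
    }
}
```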
So I thought I was leveraging App Engine's claimed strength of high throughput for unrelated entities (i.e. entities not in the same entity group).
However, sometimes on the put I get the dreaded exception 'too much contention on these entities'. Why should there be contention on entities that are not in the same group?
We are told that for a single entity group we can expect no more than 1 to 10 writes per second.
But I have not seen a figure for what App Engine can be expected to handle for writes to separate entity groups.
The contention seems to happen at what I consider to be quite low demand (around 100 writes per second).
Am I missing something? As well as having separate entity groups, are there other rules to conform to in order to get high throughput?
Or are my expectations of at least several hundred writes per second simply too high?
Your pseudocode looks correct if you are inserting a single entity per txn.
You are also correct that inserting unrelated root entities should prevent contention.
I am updating multiple root entities in a single request, so there is no chance of another request concurrently modifying the same data as my request

However, there is a limit of 5 entity groups within a txn. What can be done to solve your problem:
use a txn for a single entity update,
OR limit the txn to at most 5 entity groups (a cross-group sketch follows below).
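For reference, a minimal sketch of a cross-group (XG) transaction with the low-level Java API, assuming you genuinely need to touch a handful of root entities in one txn; the kind and key names are placeholders:

```java
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Transaction;
import com.google.appengine.api.datastore.TransactionOptions;

public class CrossGroupExample {

    // Sketch: an XG transaction may touch up to 5 entity groups in one txn.
    static void putTwoRoots() {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        Transaction txn = ds.beginTransaction(TransactionOptions.Builder.withXG(true));
        try {
            ds.put(txn, new Entity("MyKind", "a"));   // root entity, entity group 1
            ds.put(txn, new Entity("MyKind", "b"));   // root entity, entity group 2
            txn.commit();
        } finally {
            if (txn.isActive()) {
                txn.rollback();
            }
        }
    }
}
```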
-lp
Your first mistake is using entity groups. They are not at all intended for avoiding contention; it's the exact opposite. You can't update an entity in an entity group too often, see the docs. Entity groups are useful for consistent reads, not for reducing contention or improving speed.
Not sure how to delete an answer so I am editing this one.
The getMessage for the exception returns the 'too much contention' message, but the class of the exception is ConcurrentModificationException.
I am updating multiple root entities in a single request, so there is no chance of another request concurrently modifying the same data as my request.
So I don't understand where this 'contention' is coming from.
It seems that the single request is 'contending' with itself!
One idea is that, because of the asynchronous nature of put, the first operations are not fully complete before later ones come along?
Since these are separate entity groups, I think the problems must be caused by things like 'tablet splitting' etc. If that is the case, I think it is a shame that all of these failures are presented to the caller as ConcurrentModificationExceptions. It would be useful to be able to differentiate between an operation that failed because of internal processes in the datastore and more normal issues, such as another user having modified the data before you.
Related
Let's presume that we have an application "mail client" and a front-end for it.
If a user is typing a message or editing the subject or whatever, a REST call is made to update whatever the user was changing (e.g. the receivers) to keep the message in DRAFT, so a lot of PUTs happen to save the message. When the window is closed, an update of every editable field happens at the same time. Hibernate can't handle this concurrency: each of those calls retrieves the message, edits its own fields and tries to save the message again, while another call has already changed it.
I know I can add a REST call to save all fields at the same time, but I was wondering if there is a cleaner solution, or a decent strategy to handle such cases (for example, only updating one field, or some merge strategy if the object has already changed).
Thanks in advance!
The easiest solutions here would be to tweak the UI to either:
Submit a single REST call during email submission that does all the necessary work, or
Serialize the REST calls so they're chained rather than firing concurrently.
The concern I have here is that this will snowball at some point and become a bigger concurrency problem as more users interact with the application. Consider for a moment the number of concurrent REST calls your web infrastructure alone will have to support when you're faced with 100, 500, 1000, or even 10000 or more concurrent users.
Does it really make sense to beef up the volume of servers to handle that load when the load itself is a product of a design flaw in the first place?
Hibernate is designed to handle locking through two mechanisms, optimistic and pessimistic.
Optimistic Way
1. Read the entity from the data store.
2. Cache a copy of the fields you're going to modify in temporary variables.
3. Modify the field or fields based on your PUT operation.
4. Attempt to merge the changes.
5. If the save succeeds, you're done.
6. Should an OptimisticLockException occur, refresh the entity state from the data store.
7. Compare the cached values to the fields you must change.
8. If the values differ, you can assert or throw an exception.
9. If they don't differ, go back to 4.
The beautiful part of the optimistic approach is you avoid any form of deadlock happening, particularly if you're allowing multiple tables to be read and locked separately.
While you can use pessimistic lock options, optimistic locking is generally the best accepted way to handle concurrent operations as it has the least concurrency contention and performance impact.
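As a rough illustration of the optimistic flow above, here is a sketch assuming a resource-local JPA EntityManager and a hypothetical Message entity with a @Version column and a subject field (none of these names come from the question):

```java
import javax.persistence.EntityManager;
import javax.persistence.OptimisticLockException;

public class OptimisticRetry {

    // Steps 1-9 from above, for a single field; Message is a hypothetical @Version-ed entity.
    static void updateSubject(EntityManager em, long messageId, String newSubject) {
        while (true) {
            Message msg = em.find(Message.class, messageId);    // 1. read the entity
            String cached = msg.getSubject();                   // 2. cache the field we will modify
            msg.setSubject(newSubject);                         // 3. modify it
            try {
                em.getTransaction().begin();
                em.merge(msg);                                  // 4. attempt to merge
                em.flush();                                     //    version check happens here
                em.getTransaction().commit();
                return;                                         // 5. save succeeded, done
            } catch (OptimisticLockException e) {
                em.getTransaction().rollback();
                em.clear();                                     // 6. drop stale state, re-read below
                Message fresh = em.find(Message.class, messageId);
                if (!fresh.getSubject().equals(cached)) {       // 7./8. another writer changed our field
                    throw new IllegalStateException("Conflicting edit of subject", e);
                }
                // 9. our field was untouched by the other writer: loop and merge again
            }
        }
    }
}
```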
Sorry if this has already been answered, but I tried to find a solution and could not find anything clear yet.
My question is: is there any relationship between LAZY fetch relationships, which as I understand control whether to JOIN to other entities or to initialize them from the database if they're dirty in the session, and the principle of the optimistic locking VERSION?
As far as I understand, the optimistic locking VERSION is mostly necessary if we have to handle multiple transactions at the same time. Is that correct?
If all the transactions that our application performs are done sequentially, is it enough to use LAZY fetch to manage when JOINs have to be done? Or does adding a version give us any added value?
Thanks!
They are two completely disparate concepts. You only hit a "lazy load" if you get or set a lazily loaded relationship.
If you're using optimistic locking and your row is on V2, it just prevents one client from submitting a modified V2 (which bumps the version to 3) and then another client submitting a different V2; the second client is forced to reload the data and submit against the later version.
If your logic hits the lazily loaded relationships which then hit thousands of other relationships and ends up loading millions of rows, you will have a performance problem, not a versioning one. In which case you may need to up your batch sizes or maybe do some fetch joins to ensure whatever it is you want is loaded in one block rather than thousands of sequential SQL queries.
So, different problem spaces entirely.
If you're trying to update a very complex object graph, where your alterations go deep into that graph, you may hit interesting optimistic locking problems as ensuring an entire tree's "version" is the same is difficult.
UPDATE: For clarification.
Say you have a Car (which has singular properties such as make, model, and registration number) and multiple Wheels. This would be a 1:0..n relationship, represented as two tables, a Car table and a Wheel table, where the Wheel has an FK back to the Car. (For the purposes of this post, we will ignore many-to-many relationships.)
If you lazy load your wheels, then unless you're interested in whether you have rim spinners, tyres, locking nuts etc., you never need to load the Wheel records in; they're not relevant if you only need the registration number.
Your Car record is on V1 and has a registration number of AB1212.
If I, as the Vehicle Registrar of Moldova, update it to AC4545 and submit with V1 (the current version), I will succeed and the version number will be incremented. I will not hit the Wheels unless I need to. If, at the same time, the Subaltern Vehicle Registrar in the other room tries to do the same thing with V1, it will fail with a StaleObjectStateException, again not hitting the Wheels and thus not invoking a lazy load.
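A rough sketch of how that Car/Wheel mapping might look with JPA annotations; the field names and mappings are illustrative, and getters/setters are omitted:

```java
import java.util.ArrayList;
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;
import javax.persistence.Version;

@Entity
class Car {
    @Id @GeneratedValue
    Long id;

    @Version
    int version;                        // optimistic lock column, bumped on every update

    String make;
    String model;
    String registrationNumber;

    // Lazily loaded: never touched if you only change the registration number.
    @OneToMany(mappedBy = "car", fetch = FetchType.LAZY)
    List<Wheel> wheels = new ArrayList<>();
}

@Entity
class Wheel {
    @Id @GeneratedValue
    Long id;

    @ManyToOne(fetch = FetchType.LAZY)  // FK back to Car
    Car car;
}
```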
The lazy fetch proxy will throw a LazyInitializationException if it tries to fetch data that was altered by a different transaction (if that happens) with optimistic locking.
It's hard to help without any code or a good question, but as long as you keep all your initializations within a @Transactional code block, you shouldn't encounter much trouble.
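For example, a minimal sketch of forcing a lazy association to load while the transaction (and session) is still open; the Order entity and getLines() are made-up names used only for illustration:

```java
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.hibernate.Hibernate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
class OrderService {

    @PersistenceContext
    private EntityManager em;

    @Transactional(readOnly = true)
    public Order loadWithLines(long id) {
        Order order = em.find(Order.class, id);
        // Touch the lazy association while the session is still open, so that
        // later access outside the transaction cannot trigger a LazyInitializationException.
        Hibernate.initialize(order.getLines());
        return order;
    }
}
```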
That being said, you are trying to compare two (functionally) very different things...
Hope this helps.
I'm using JDO to access Datastore entities. I'm currently running into issues because different processes access the same entities in parallel, and I'm unsure how to go about solving this.
I have entities containing values and calculated values: (key, value1, value2, value3, calculated)
The calculation happens in a separate task queue.
The user can edit the values at any time.
If the values are updated, a new task is pushed to the queue that overwrites the old calculated value.
The problem I currently have is in the following scenario:
1. User creates entity
2. Task is started
3. User notices an error in his initial entry and quickly updates the entity
4. Task finishes based on the old data (from step 1) and overwrites the entire entity, also removing the newly entered values (from step 3)
5. User is not happy
So my questions:
Can I make the task fail on update in step 4? Wrapping the task in a transaction does not seem to solve this issue for all cases due to eventual consistency (or, quite possibly, my understanding of datastore transactions is just wrong)
Is using the low-level setProperty method the only way to update a single field of an entity and will this solve my problem?
If none of the above, what's the best way to deal with a use case like this?
Background:
At the moment, I don't mind trading performance for consistency. I will care about performance later.
This was my first AppEngine application, and because it was a learning process, it does not use some of the best practices. I'm well aware that, in hindsight, I should have thought longer and harder about my data schema. For instance, none of my entities use ancestor relationships where they would be appropriate. I come from a relational background and it shows.
I am planning a major refactoring, probably moving to Objectify, but in the meantime I have a few urgent issues that need to be solved ASAP. And I'd like to first fully understand the Datastore.
Obviously JDO comes with optimistic concurrency checking (should the user enable it) for transactions, which would prevent/reduce the chance of such things. Optimistic concurrency is equally applicable with relational datastores, so you likely know what it does.
Google's JDO plugin uses the low-level API setProperty() method obviously. The log even tells you what low level calls are made (in terms of PUT and GET). Moving to some other API will not on its own solve such problems.
Whenever you need to handle write conflicts in GAE, you almost always need transactions. However, it's not just as simple as "use a transaction":
First of all, make sure each logical unit of work can be defined in a transaction. There are limits on transactions: no queries without ancestors, and only a certain number of entity groups can be accessed. You might find you need to do some extra work prior to the transaction starting (i.e., look up keys of entities that will participate in the transaction).
Make sure each unit of work is idempotent. This is critical. Some units of work are automatically idempotent, for example "set my email address to xyz". Some units of work are not automatically idempotent, for example "move $5 from account A to account B". You can make transactions idempotent by creating an entity before the transaction starts, then deleting the entity inside the transaction. Check for existence of the entity at the start of the transaction and simply return (completing the txn) if it's been deleted.
When you run a transaction, catch ConcurrentModificationException and retry the process in a loop. Now when any txn gets conflicted, it will simply retry until it succeeds.
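A hedged sketch of such a retry loop with the low-level Java API, using the naturally idempotent "set my email address" unit of work mentioned above; the property and kind names are placeholders:

```java
import java.util.ConcurrentModificationException;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Transaction;

public class RetryOnConflict {

    // Retry an idempotent unit of work until it commits without a conflict.
    static void setEmail(Key userKey, String email) throws EntityNotFoundException {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        while (true) {
            Transaction txn = ds.beginTransaction();
            try {
                Entity user = ds.get(txn, userKey);
                user.setProperty("email", email);   // "set my email address": naturally idempotent
                ds.put(txn, user);
                txn.commit();
                return;                             // committed cleanly
            } catch (ConcurrentModificationException e) {
                // Conflicted with another txn: fall through and retry the whole unit of work.
            } finally {
                if (txn.isActive()) {
                    txn.rollback();
                }
            }
        }
    }
}
```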
The only bad thing about collisions here is that they slow the system down and waste effort during retries. However, you will still get at least one completed transaction per second of throughput (maybe a bit less if you have XG transactions).
Objectify4 handles the retries for you; just define your unit of work as a run() method and run it with ofy().transact(). Just make sure your work is idempotent.
The way I see it, you can either prevent the first task from updating the object because certain values have changed since the task was first launched, or you can embed the object's values within the task request so that the second calc task will restore the object state with consistent value and calculated members.
We are building a product, so from a performance point of view I need some help.
We are using the complete Spring stack (MVC, JPA, Security, etc.).
We have a requirement where say for a particular flow there can be 100 Business Rules getting executed at the same time. There can be n number of such flows and business rules.
These rules, when executed, actually fetch records from tables in the database; these will also contain a few LAZILY INITIALIZED ENTITIES.
I used Futures/Callables for multithreading, but the problem is that it fails to load the LAZY variables. It gives a Hibernate lazy loading exception, probably because the TRANSACTIONAL context does not get propagated across different threads.
Please let me know if there is another way to approach this.
If an entity or entity collection is lazily fetched and you are accessing it in another thread, you will face a LazyInitializationException, as lazy-loaded entities can be accessed only within a transaction, and a transaction won't span threads.
You can use the DTO pattern, or if you are sharing an entity across threads, call the getters of its lazily initialized collections within the transaction so that they are fetched within the transaction itself (a sketch follows below).
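A small sketch of that approach: map what the worker threads need into plain DTOs while still inside the transaction. Rule, RuleDto, getName(), and getConditions() are made-up names, not anything from the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
class RuleLoader {

    @PersistenceContext
    private EntityManager em;

    // Lazy collections are touched here, inside the transaction; the returned DTOs
    // are plain objects that can safely be handed to Callables on other threads.
    @Transactional(readOnly = true)
    public List<RuleDto> loadRules() {
        return em.createQuery("select r from Rule r", Rule.class)
                 .getResultList()
                 .stream()
                 .map(r -> new RuleDto(r.getName(), new ArrayList<>(r.getConditions())))
                 .collect(Collectors.toList());
    }
}
```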
If you really need async processing then I suggest you use the Java EE specified way i.e. using JMS/MDB.
Anyway, it seems like you want all the data to be loaded for your processing, so eagerly fetch all the required data and then submit it for parallel processing, or let each of the tasks (Callables) fetch the data it requires. Essentially, I am asking you to change your design so that transactional boundaries don't cross multiple threads; localize the boundaries within a single thread.
This is probably more a "design" or style question: I have just been considering how complex a Hibernate transaction should or could be. I am working with an application that persists messages to a database using Hibernate.
Building the message POJO involves factoring out one-to-many relationships from the message into their respective persistent objects. For example, the message contains a "city" field. The city is extracted from the message, the database is searched for an equivalent city object, and the resulting object is added to the message POJO. All of this is done within a single transaction (a rough sketch follows the steps below):
Start transaction
test for duplicate
retrieve city object
setCity(cityObject) in message object
retrieve country object
setCountry(countryObject) in message object
persist message object
commit (end transaction)
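A rough sketch of those steps with a Hibernate Session; the Message, City, and Country classes, and the duplicate check on an externalId field, are placeholders invented for illustration:

```java
import org.hibernate.Session;
import org.hibernate.Transaction;

public class MessagePersister {

    // All lookups and the persist share one transaction; a failure anywhere rolls
    // the whole message back.
    void persistMessage(Session session, Message message, String cityName, String countryName) {
        Transaction tx = session.beginTransaction();
        try {
            Long duplicates = session.createQuery(
                    "select count(m) from Message m where m.externalId = :id", Long.class)
                .setParameter("id", message.getExternalId())
                .uniqueResult();                                   // test for duplicate
            if (duplicates == 0) {
                City city = session.createQuery(
                        "from City c where c.name = :name", City.class)
                    .setParameter("name", cityName)
                    .uniqueResult();                               // retrieve city object
                message.setCity(city);                             // setCity in message object
                Country country = session.createQuery(
                        "from Country c where c.name = :name", Country.class)
                    .setParameter("name", countryName)
                    .uniqueResult();                               // retrieve country object
                message.setCountry(country);                       // setCountry in message object
                session.persist(message);                          // persist message object
            }
            tx.commit();                                           // end transaction
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        }
    }
}
```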
In fact the actual transactions are considerably more complex. Is this a reasonable structure or should each task be completed within a single transaction (rather than all tasks in one transaction)? I guess the second question relates to best practice in designing the tasks within a transaction. I understand that some tasks need to be grouped for referential integrity, however this is not always the case.
Whatever you put within your outer transaction boundary, the question is whether you can successfully roll back each action.
Bundle related actions within a boundary, and keep it as simple as possible.
Transactions should be grouped according to the business requirements, not technical complexity. If you have N operations that must succeed or fail together as a unit, then that's what the code should support. It should be more of a business consideration than a technical one.
Multiple transactions only make sense if the DB isn't left in a stupid state between them, because any single transaction could fail. Nested transactions may make sense if any block of activity must be atomic and the entire transaction depends on any of the other atomic units.