How to avoid StaleObjectStateException when transaction updates thousands of entities?

We are using Hibernate 3.6.0.Final with JPA 2 and Spring 3.0.5 for a large-scale enterprise application running on Tomcat 7 and MySQL 5.5. Most transactions in the application live for less than a second and update 5-10 entities, but in some use cases we need to update more than 10-20K entities in a single transaction, which takes a few minutes. As a result, more than 70% of the time such a transaction fails with a StaleObjectStateException because some of those entities were updated by another transaction in the meantime.
We maintain a version column in all tables, and on a StaleObjectStateException we normally retry, but since these transactions are so long anyway, I am not confident that retrying alone will let us escape the StaleObjectStateException.
Also, a lot of activity keeps updating these entities during busy hours, so we cannot go with a pessimistic approach, because it could potentially halt many activities in the system.
Please suggest how to fix this long-transaction issue. We cannot spawn thousands of small, independent transactions, because we cannot afford inconsistent data when some of them fail and some succeed.

Modifying 20,000 entities in one transaction is really a lot, much more than normal.
I can't give you a general solution, but here are some ideas for how to approach the problem.
1) Use LockMode.UPGRADE (see pessimistic locking). This explicitly generates a SELECT ... FOR UPDATE, which stops other users from modifying the rows while they are locked.
This should avoid your problem, but if you have too many large transactions it can produce deadlocks (depending on your code) or timeouts.
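For example, a minimal sketch with the Hibernate 3.x API (MyEntity, the id and the field being set are placeholders for your own code):

import org.hibernate.LockMode;
import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class PessimisticUpdateExample {

    public void updateWithRowLock(SessionFactory sessionFactory, Long id, String newValue) {
        Session session = sessionFactory.getCurrentSession();
        // Hibernate issues SELECT ... FOR UPDATE here, so concurrent writers block on this row.
        MyEntity entity = (MyEntity) session.get(MyEntity.class, id, LockMode.UPGRADE);
        entity.setName(newValue);
        // The row lock is released when the surrounding transaction commits or rolls back.
    }
}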
2) Change your data model to avoid these large transactions. Why do you have to update 10,000 rows? Perhaps you can put the information that is updated in so many rows into a new table and let it only be referenced, so you only have to update a few rows in the new table.
3) Use StatelessSession instead of Session. With a stateless session you are not forced to roll back after an exception; instead you can correct the problem and continue (in your case: reload the entity that was modified in the meantime and apply the large transaction's modification to the reloaded entity). This may give you the possibility to handle the critical event (row modified in the meantime) on a row-by-row basis instead of failing the complete large transaction.
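A rough sketch of what that row-by-row handling could look like (MyEntity and applyChange() are placeholders, and whether the stale check surfaces as a StaleObjectStateException can depend on your mapping):

import java.util.List;

import org.hibernate.SessionFactory;
import org.hibernate.StaleObjectStateException;
import org.hibernate.StatelessSession;
import org.hibernate.Transaction;

public class RowByRowUpdateExample {

    public void updateAll(SessionFactory sessionFactory, List<Long> ids) {
        StatelessSession session = sessionFactory.openStatelessSession();
        Transaction tx = session.beginTransaction();
        try {
            for (Long id : ids) {
                MyEntity entity = (MyEntity) session.get(MyEntity.class, id);
                applyChange(entity);
                try {
                    session.update(entity);
                } catch (StaleObjectStateException e) {
                    // The row was changed by another transaction: reload the current
                    // state and redo the modification for this row only.
                    MyEntity fresh = (MyEntity) session.get(MyEntity.class, id);
                    applyChange(fresh);
                    session.update(fresh);
                }
            }
            tx.commit();
        } finally {
            session.close();
        }
    }

    private void applyChange(MyEntity entity) {
        // your per-row modification
    }
}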

Hibernate best practices for fetching entities

1) When fetching entities with Hibernate I always close the session after fetching, and I often need to fetch the same entities again later (in different sessions).
I then need to perform some operations on the fetched entities, and that is where I run into problems when updating, since I perform different operations on different copies of the entities (which are exactly the same rows in the database).
Are there any good practices to avoid such problems?
2) When updating entities from software running on a network, two different computers often perform different operations on the same entities (the same rows in the database), but when updating, everything gets corrupted.
For example, consider updating the quantity of a product after a sale. After a sale the quantity of the product should be lower than it was, but once two different computers record a sale on a pre-fetched product, there will surely be a wrong value in the database, as I'm updating the product using the JPA update() function.
Are there any good practices for such problems as well?
Thanks, and sorry if this is too abstract and unclear.
On 1): you shouldn't do that without expecting concurrency problems when updating the entity information.
On 2): there are two strategies you can use to deal with it: optimistic locking or pessimistic locking. Hibernate supports both.
There is no single right answer to your question, but if you only want to prevent the problem, optimistic locking looks like a good solution: when two computers try to update the quantity, the last one to write to the database receives an error and its transaction is rolled back.
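A minimal sketch of what optimistic locking needs on the mapping side (Product and its fields are just an example matching your sale scenario):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Product {

    @Id
    private Long id;

    private int quantity;

    // Hibernate increments this on every update and adds "where version = ?" to the UPDATE;
    // if another transaction changed the row first, the update fails and is rolled back.
    @Version
    private long version;

    // getters and setters omitted
}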
I ended up updating the entities using native HQL update queries (for those who face the same issue), then fetching the entities again (performing another select query) once the update was finished.
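For reference, a sketch of that approach (Product, the property names and the parameters are assumptions, not the actual mapping); the decrement happens in the database itself, so two concurrent sales cannot overwrite each other with stale in-memory values:

import org.hibernate.Session;

public class SaleDao {

    public Product recordSale(Session session, Long productId, int soldQuantity) {
        session.createQuery(
                "update Product p set p.quantity = p.quantity - :sold "
              + "where p.id = :id and p.quantity >= :sold")
               .setParameter("sold", soldQuantity)
               .setParameter("id", productId)
               .executeUpdate();
        // Fetch the entity again afterwards if the fresh state is needed in memory.
        return (Product) session.get(Product.class, productId);
    }
}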

How to handle row lock contention at application level

I have two applications (Spring/Hibernate with Boot) using the same Oracle database (11g). Both apps hit a specific table constantly, and there is a huge number of hits on this table. We can see row lock contention exceptions in the DB logs, and the applications have to be restarted each time we get these or when a deadlock-like situation is created.
We are using the JPA EntityManager in these applications.
I need help with this issue.
According to this link:
http://www.dba-oracle.com/t_enq_tx_row_lock_contention.htm
this error occurs because a transaction is waiting for another transaction to commit or roll back. That behavior is correct from the database's point of view and makes sense if you think of data consistency. But if availability/fulfillment is a concern for you, you might need some workarounds, including:
1 Make separate tables for each application, then update the main table with the data offline (but you will sacrifice data consistency).
2 Add a separate thread to log and retry unsuccessful transactions.
3 Accept the availability issue (latency) if consistency is the bigger concern.
There are also some general tips to consider:
1 Keep the transaction minimal. Think about every step included in the transaction and whether it is mandatory or can be moved outside.
2 Tune your transaction demarcation. You might find transactions kept open for a long time for no reason other than bad coding.
3 Don't do read operations inside transactions.
4 Avoid an extended persistence context (prefer stateless processing) whenever possible.
5 You might choose to use a non-JTA transactional data source for reporting and read-only queries.
6 Check the lock types you are using and try to avoid, depending on your case, anything but OPTIMISTIC (see the sketch below).
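A minimal sketch of tip 6 (Account, its field and the id are assumptions; the entity needs a @Version column for this lock mode to work):

import javax.persistence.EntityManager;
import javax.persistence.LockModeType;

public class AccountService {

    public void updateName(EntityManager em, Long id, String name) {
        // OPTIMISTIC takes no database row lock; the version column is checked at commit time,
        // so the other application is never blocked waiting on this row.
        Account account = em.find(Account.class, id, LockModeType.OPTIMISTIC);
        account.setName(name);
        // If another transaction changed the row first, the commit fails with an
        // OptimisticLockException, which can then be retried (see tip 2).
    }
}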
But finally, I think you'll agree that we shouldn't blame the database for blocking two transactions that modify the same row.

Concurrency with Hibernate in Spring

I found a lot of posts regarding this topic, but all the answers were just links to documentation with no example code, i.e. how to use concurrency control in practice.
My situation: I have an entity House with (for simplification) two attributes: number (the id) and owner. The database is initialized with 10 houses, numbered 1-10, with owner always null.
I want to assign a new owner to the house that currently has no owner and the smallest number. My code looks like this:
@Transactional
void assignNewOwner(String newOwner) {
    // this is flagged as @Transactional too
    House tmp = houseDao.getHouseWithoutOwnerAndSmallestNumber();
    tmp.setOwner(newOwner);
    // this is flagged as @Transactional too
    houseDao.update(tmp);
}
As I understand it, although @Transactional is used, the same House could be assigned to two different owners if two requests fetch the same empty House as tmp. How do I ensure this cannot happen?
I know that including the update in the selection of the empty House would solve the issue, but in the near future I want to modify/work with the tmp object some more.
Optimistic
If you add a version column to your entity / table, then you can take advantage of a mechanism called optimistic locking. It is the most efficient way of making sure that the state of an entity has not changed since we obtained it in a transactional context.
Once you have created your query, you can call setLockMode(LockModeType.OPTIMISTIC).
Then, just before the transaction is committed, the persistence provider queries the current version of that entity and checks whether it has been incremented by another transaction. If so, you get an OptimisticLockException and a transaction rollback.
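A minimal sketch of that, assuming House has a @Version column (the query mirrors the DAO method from the question):

import javax.persistence.EntityManager;
import javax.persistence.LockModeType;

public class HouseDao {

    public House getHouseWithoutOwnerAndSmallestNumber(EntityManager em) {
        return em.createQuery(
                    "select h from House h where h.owner is null order by h.number asc", House.class)
                 .setLockMode(LockModeType.OPTIMISTIC)
                 .setMaxResults(1)
                 .getSingleResult();
    }
}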
Pessimistic
If you do not version your rows, then you are left with pessimistic locking, which basically means that you physically create a lock on the queried entities at the database level, and other transactions cannot read / update those particular rows.
You achieve that by setting this on the Query object:
setLockMode(LockModeType.PESSIMISTIC_READ);
or
setLockMode(LockModeType.PESSIMISTIC_WRITE);
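And the same DAO method as a pessimistic sketch: the selected row is locked in the database (typically via SELECT ... FOR UPDATE) until the transaction ends, so a second concurrent request blocks and then sees that the house already has an owner:

import javax.persistence.EntityManager;
import javax.persistence.LockModeType;

public class HouseDaoPessimistic {

    public House getHouseWithoutOwnerAndSmallestNumber(EntityManager em) {
        return em.createQuery(
                    "select h from House h where h.owner is null order by h.number asc", House.class)
                 .setLockMode(LockModeType.PESSIMISTIC_WRITE)
                 .setMaxResults(1)
                 .getSingleResult();
    }
}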
Actually it's pretty easy - at least in my opinion - and I am going to abstract away from what Hibernate will generate when you say pessimistic/optimistic. You might think this is SELECT FOR UPDATE, but that's not always the case; MSSQL AFAIK does not have that...
These are JPA lock modes, and they guarantee certain behavior, not a particular implementation.
Fundamentally, PESSIMISTIC and OPTIMISTIC locking are entirely different things. When you do pessimistic locking you get, at least logically, something like a synchronized block: you can do whatever you want and you are safe within the scope of the transaction. Now, whether the lock is held on the row, the table, or even a page is unspecified, so it is a bit dangerous. Databases may also escalate locks; MSSQL does that, if I recall correctly.
Obviously lock starvation is an issue, so you might think that OPTIMISTIC locking would help. As a side note, this is what transactional memory is in modern CPUs; they use the same thinking process.
So optimistic locking is like saying: I will mark this row with an ID/date etc., take a snapshot of it and work with that; before committing I will check whether that ID has changed. Obviously there is contention on that ID, but not on the data. If it has changed - abort (i.e. throw an OptimisticLockException), otherwise commit the work.
The thing that bothers everyone, IMO, is that OptimisticLockException - how do you recover from it? And here is something you are not going to like: it depends. There are apps where a simple retry would be enough, and there are apps where that would be impossible. I have used it in rare scenarios.
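For the cases where a retry is enough, the recovery is usually just a loop like this sketch (doUnitOfWork() is a placeholder for one complete, re-runnable transactional unit of work):

import javax.persistence.OptimisticLockException;

public class RetryingService {

    public void runWithRetry(int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                doUnitOfWork();   // open transaction, load fresh state, modify, commit
                return;           // committed successfully
            } catch (OptimisticLockException e) {
                if (attempt == maxAttempts) {
                    throw e;      // give up and let the caller decide
                }
                // somebody else won the race: loop and redo the work on fresh data
            }
        }
    }

    private void doUnitOfWork() {
        // placeholder for the real work
    }
}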
I usually go with pessimistic locking (unless optimistic is totally not an option). At the same time I would look at what Hibernate generates for that query. For example, you might need an index on the column the entry is retrieved by, so that the DB can actually lock just that row - because ultimately that is what you want.

app engine contention even on putting new root entities

I am experimenting with Google App Engine (High Replication Datastore)
I understand that frequent writes to a single entity group can cause contention.
Precisely to avoid that happening, my entities are all root entities, i.e. each entity is a separate entity group.
I begin a transaction
get the entity
if it already exists, roll back the transaction
else put the entity and commit the transaction (sketched in code below)
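In code, the steps are roughly the following sketch (this assumes the low-level Java API and that the entity already has a complete key):

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Transaction;

public class InsertIfAbsent {

    public boolean putIfAbsent(Entity candidate) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        Transaction txn = ds.beginTransaction();
        try {
            try {
                ds.get(txn, candidate.getKey());
                txn.rollback();              // it already exists, give up
                return false;
            } catch (EntityNotFoundException e) {
                ds.put(txn, candidate);      // not there yet, create it
                txn.commit();
                return true;
            }
        } finally {
            if (txn.isActive()) {
                txn.rollback();
            }
        }
    }
}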
So I thought I was leveraging App Engine's claimed strength of high throughput for non-related entities (i.e. entities not in the same entity group).
However, sometimes on the put I get the dreaded 'too much contention on these entities' exception. Why should there be contention on entities that are not in the same group?
We are told that for a single entity group we can expect no more than 1 to 10 writes per second.
But I have not seen a figure for what we can expect App Engine to handle for writes to separate entity groups.
The contention seems to happen at what I consider to be quite low demand (around 100 writes per second).
Am I missing something? As well as having separate entity groups, are there other rules to follow in order to get high throughput?
Or are my expectations of at least several hundred writes per second simply too high?
Your pseudo code looks correct if you are inserting a single entity per txn.
You are also correct that inserts of unrelated root entities should prevent contention.
"I am updating multiple root entities in a single request, so there is no chance of another request concurrently modifying the same data as my request"
However, there is a limit of 5 entity groups that can be touched within a single txn.
To solve your problem:
use a txn for a single entity update,
OR limit the txn to fewer than 5 entities.
-lp
Your first mistake is using entity groups. They are not at all intended for avoiding contention - it's the exact opposite: you can't update an item in an entity group too often (see the docs). Entity groups are useful for consistent reads, not for reducing contention or for speed.
Not sure how to delete an answer so I am editing this one.
The getMessage() of the exception returns the 'too much contention' message, but the class of the exception is ConcurrentModificationException.
I am updating multiple root entities in a single request, so there is no chance of another request concurrently modifying the same data as my request.
So I don't understand where this 'contention' is coming from.
It seems that the single request is 'contending' with itself!
One idea is that, because of the asynchronous nature of put, the first operations are not fully complete before later ones come along?
Since these are separate entity groups, I think the problems must be caused by things like 'tablet splitting' etc. If that is the case, I think it is a shame that all of these failures are presented to the caller as ConcurrentModificationExceptions. It would be useful to be able to differentiate between an operation that failed because of internal processes in the datastore and more normal issues such as another user having modified the data before you.

Parallel updates to different entity properties

I'm using JDO to access Datastore entities. I'm currently running into issues because different processes access the same entities in parallel, and I'm unsure how to go about solving this.
I have entities containing values and calculated values: (key, value1, value2, value3, calculated)
The calculation happens in a separate task queue.
The user can edit the values at any time.
If the values are updated, a new task is pushed to the queue that overwrites the old calculated value.
The problem I currently have is in the following scenario:
User creates entity
Task is started
User notices an error in his initial entry and quickly updates the entity
Task finishes based on the old data (from step 1) and overwrites the entire entity, also removing the newly entered values (from step 3)
User is not happy
So my questions:
Can I make the task fail on update in step 4? Wrapping the task in a transaction does not seem to solve this issue in all cases due to eventual consistency (or, quite possibly, my understanding of datastore transactions is just wrong).
Is using the low-level setProperty() method the only way to update a single field of an entity, and will this solve my problem?
If none of the above, what's the best way to deal with a use case like this?
Background:
At the moment, I don't mind trading performance for consistency. I will care about performance later.
This was my first AppEngine application, and because it was a learning process, it does not use some of the best practices. I'm well aware that, in hindsight, I should have thought longer and harder about my data schema. For instance, none of my entities use ancestor relationships where they would be appropriate. I come from a relational background and it shows.
I am planning a major refactoring, probably moving to Objectify, but in the meantime I have a few urgent issues that need to be solved ASAP. And I'd like to first fully understand the Datastore.
Obviously, JDO comes with optimistic concurrency checking for transactions (if you enable it), which would prevent or reduce the chance of such things happening. Optimistic concurrency is equally applicable with relational datastores, so you likely know what it does.
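A sketch of what enabling that looks like on a JDO class (the class and field names here just mirror the (key, value1..3, calculated) entity from the question):

import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;
import javax.jdo.annotations.Version;
import javax.jdo.annotations.VersionStrategy;

@PersistenceCapable
@Version(strategy = VersionStrategy.VERSION_NUMBER)
public class Calculation {

    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Long key;

    @Persistent private double value1;
    @Persistent private double value2;
    @Persistent private double value3;
    @Persistent private double calculated;

    // With the version strategy enabled, a transactional write based on a stale read fails with
    // a JDOOptimisticVerificationException instead of silently overwriting newer data.
}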
Google's JDO plugin uses the low-level setProperty() method under the hood, of course. The log even tells you what low-level calls are made (in terms of PUT and GET). Moving to some other API will not, on its own, solve such problems.
Whenever you need to handle write conflicts in GAE, you almost always need transactions. However, it's not as simple as just "use a transaction":
First of all, make sure each logical unit of work can be defined in a transaction. There are limits to transactions: no queries without ancestors, and only a certain number of entity groups can be accessed. You might find you need to do some extra work prior to the transaction starting (i.e. look up the keys of the entities that will participate in the transaction).
Make sure each unit of work is idempotent. This is critical. Some units of work are automatically idempotent, for example "set my email address to xyz". Some units of work are not automatically idempotent, for example "move $5 from account A to account B". You can make transactions idempotent by creating an entity before the transaction starts, then deleting the entity inside the transaction. Check for existence of the entity at the start of the transaction and simply return (completing the txn) if it's been deleted.
When you run a transaction, catch ConcurrentModificationException and retry the process in a loop. Then, whenever a txn hits a conflict, it simply retries until it succeeds.
The only bad thing about collisions here is that they slow the system down and waste effort during retries. However, you will still get at least one completed transaction per second (maybe a bit less if you have XG transactions) of throughput.
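A sketch of that retry loop with the low-level API (updateEntities() is a placeholder for your idempotent unit of work):

import java.util.ConcurrentModificationException;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Transaction;

public class TransactionRetry {

    public void runWithRetry(int maxAttempts) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Transaction txn = ds.beginTransaction();
            try {
                updateEntities(ds, txn);   // read-modify-write inside the transaction
                txn.commit();
                return;                    // success
            } catch (ConcurrentModificationException e) {
                // another transaction touched the same entity group: loop and try again
            } finally {
                if (txn.isActive()) {
                    txn.rollback();
                }
            }
        }
        throw new RuntimeException("Giving up after " + maxAttempts + " attempts");
    }

    private void updateEntities(DatastoreService ds, Transaction txn) {
        // placeholder for the idempotent work
    }
}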
Objectify 4 handles the retries for you: just define your unit of work as a run() method and execute it with ofy().transact(). Just make sure your work is idempotent.
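A minimal sketch of that idiom (Calculation here stands for an Objectify-annotated entity, and calcId / compute() are placeholders):

import static com.googlecode.objectify.ObjectifyService.ofy;

import com.googlecode.objectify.VoidWork;

public class Recalculate {

    public void recalc(final long calcId) {
        ofy().transact(new VoidWork() {
            @Override
            public void vrun() {
                Calculation calc = ofy().load().type(Calculation.class).id(calcId).now();
                calc.setCalculated(compute(calc));   // placeholder computation
                ofy().save().entity(calc).now();
                // On a collision Objectify retries vrun() for us, which is why
                // the work has to be idempotent.
            }
        });
    }

    private double compute(Calculation calc) {
        return 0;   // placeholder
    }
}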
The way I see it, you can either prevent the first task from updating the object when certain values have changed since the task was first launched, or you can embed the object's values in the task request itself, so that the second calc task restores the object's state with consistent value and calculated members.
