I found a lot of posts regarding this topic, but all answers were just links to documentation with no example code, i.e., nothing on how to use concurrency in practice.
My situation: I have an entity House with (for simplification) two attributes, number (the id) and owner. The database is initialized with 10 Houses with number 1-10 and owner always null.
I want to assign a new owner to the house with currently no owner, and the smallest number. My code looks like this:
@Transactional
void assignNewOwner(String newOwner) {
    // this is flagged as @Transactional too
    House tmp = houseDao.getHouseWithoutOwnerAndSmallestNumber();
    tmp.setOwner(newOwner);
    // this is flagged as @Transactional too
    houseDao.update(tmp);
}
As I understand it, even though @Transactional is used, the same House could be assigned twice to different owners if two requests fetch the same empty House as tmp. How do I ensure this cannot happen?
I know that including the update in the selection of the empty House would solve the issue, but in the near future I want to modify/work with the tmp object more.
Optimistic
If you add a version column to your entity / table then you can take advantage of a mechanism called Optimistic Locking. This is an efficient way of making sure that the state of an entity has not changed since we obtained it in a transactional context.
Once you createQuery using the session you can then call setLockMode(LockModeType.OPTIMISTIC);
Then, just before the transaction is committed, the persistence provider queries for the current version of that entity and checks whether it has been incremented by another transaction. If so, you get an OptimisticLockException and a transaction rollback.
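To make the version check concrete, here is a minimal in-memory sketch in plain Java. It is not JPA or Hibernate code: VersionedHouseStore, House and StaleVersionException are hypothetical names made up for illustration, and the store only imitates a write that is accepted when the version is unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a table with a version column (not JPA).
class VersionedHouseStore {

    // Immutable snapshot of one row as a transaction would have read it.
    static final class House {
        final int number;
        final String owner;
        final long version;

        House(int number, String owner, long version) {
            this.number = number;
            this.owner = owner;
            this.version = version;
        }
    }

    // Plays the role of OptimisticLockException in this sketch.
    static final class StaleVersionException extends RuntimeException {
        StaleVersionException(String message) {
            super(message);
        }
    }

    private final Map<Integer, House> rows = new HashMap<>();

    synchronized void insert(int number) {
        rows.put(number, new House(number, null, 0L));
    }

    synchronized House read(int number) {
        return rows.get(number);
    }

    // Writes only if the stored version still equals the snapshot's version,
    // then increments the version so that older snapshots become stale.
    synchronized House updateOwner(House snapshot, String newOwner) {
        House current = rows.get(snapshot.number);
        if (current == null || current.version != snapshot.version) {
            throw new StaleVersionException(
                    "House " + snapshot.number + " was modified concurrently");
        }
        House updated = new House(snapshot.number, newOwner, snapshot.version + 1);
        rows.put(updated.number, updated);
        return updated;
    }
}
```

If two callers read the same House and both call updateOwner, the first one succeeds and the second one gets the exception, which is the behaviour you would want for the assignNewOwner case in the question.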
Pessimistic
If you do not version your rows, then you are left with pessimistic locking, which basically means that you physically create a lock for the queried entities on the database level, so that other transactions cannot update those rows (or, depending on the lock mode, cannot read them either) until your transaction ends.
You achieve that by setting this on the Query object:
setLockMode(LockModeType.PESSIMISTIC_READ);
or
setLockMode(LockModeType.PESSIMISTIC_WRITE);
Actually it's pretty easy - at least in my opinion - and I am going to abstract away from what Hibernate will generate when you say Pessimistic/Optimistic. You might think this is SELECT FOR UPDATE - but that's not always the case, MSSQL AFAIK does not have that...
These are JPA lock modes and they guarantee some functionality, not the implementation.
Fundamentally they are entirely different things - PESSIMISTIC vs OPTIMISTIC locking. When you do pessimistic locking you sort of enter a synchronized block, at least logically - you can do whatever you want and you are safe within the scope of the transaction. Now, whether the lock is held on the row, the table or even the page is unspecified, so it is a bit dangerous. Databases may escalate locks; MSSQL does that if I recall correctly.
Obviously lock starvation is an issue, so you might think that OPTIMISTIC locking would help. As a side note, this is what transactional memory is in modern CPU; they use the same thinking process.
So optimistic locking is like saying - I will mark this row with an ID/date, etc., then I will take a snapshot of that and work with it - and before committing I will check if that ID has changed. Obviously there is contention on that ID, but not on the data. If it has changed - abort (aka throw OptimisticLockException), otherwise commit the work.
The thing that bothers everyone IMO is that OptimisticLockException - how do you recover from that? And here is something you are not going to like - it depends. There are apps where a simple retry would be enough, there are apps where this would be impossible. I have used it in rare scenarios.
I usually go with Pessimistic locking (unless Optimistic is totally not an option). At the same time I would look at what Hibernate generates for that query. For example, you might need an index on the column used to retrieve the entry for the DB to actually lock just the row - because ultimately that is what you would want.
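The "synchronized block" analogy can be sketched in plain Java. This is only an illustration under assumptions: PessimisticHouseRegistry is a hypothetical class, and a single ReentrantLock stands in for whatever lock the database takes (here it behaves like a table-level lock, which is coarser than the row lock you would normally hope for).

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical in-memory registry; the lock stands in for the database lock.
class PessimisticHouseRegistry {
    // Index i holds the owner of house number i + 1; null means "no owner".
    private final String[] owners;
    private final ReentrantLock lock = new ReentrantLock();

    PessimisticHouseRegistry(int houses) {
        owners = new String[houses];
    }

    // Returns the assigned house number, or -1 if every house is taken.
    int assignNewOwner(String newOwner) {
        lock.lock(); // other callers wait here, much like a locking read
        try {
            for (int i = 0; i < owners.length; i++) {
                if (owners[i] == null) {
                    // The read and the write both happen while the lock is held.
                    owners[i] = newOwner;
                    return i + 1;
                }
            }
            return -1;
        } finally {
            lock.unlock(); // comparable to commit/rollback releasing the lock
        }
    }

    String ownerOf(int number) {
        lock.lock();
        try {
            return owners[number - 1];
        } finally {
            lock.unlock();
        }
    }
}
```

Because the search for the free house and the assignment happen under the same lock, two callers can never pick the same house; the price is that everybody else waits while the lock is held.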
Related
There is one big phenomenon in the Spring environment, or I am terribly wrong.
The default Spring @Transactional annotation is not ACID but only ACD, lacking the isolation. That means that if you have the method:
@Transactional
public TheEntity updateEntity(TheEntity ent) {
    TheEntity storedEntity = loadEntity(ent.getId());
    storedEntity.setData(ent.getData());
    return saveEntity(storedEntity);
}
What would happen if 2 threads enter with different planned updates? They both load the entity from the DB and both apply their own changes; then the first is saved and committed, and when the second is saved and committed, the first UPDATE IS LOST. Is that really the case? With the debugger it works like that.
Losing data?
You're not losing data. Think of it like changing a variable in code.
int i = 0;
i = 5;
i = 10;
Did you "lose" the 5? Well, no, you replaced it.
Now, the tricky part that you alluded to with multi-threading is what if these two SQL updates happen at the same time?
From a pure update standpoint (forgetting the read), it's no different. Databases will use a lock to serialize the updates so one will still go before the other. The second one wins, naturally.
But, there is one danger here...
Update based on the current state
What if the update is conditional based on the current state?
public void updateEntity(UUID entityId) {
    Entity blah = getCurrentState(entityId);
    blah.setNumberOfUpdates(blah.getNumberOfUpdates() + 1);
    blah.save();
}
Now you have a problem of data loss because if two concurrent threads perform the read (getCurrentState), they will each add 1, arrive at the same number, and the second update will lose the increment of the previous one.
Solving it
There are two solutions.
1. Serializable isolation level - In most isolation levels, reads (selects) do not hold any exclusive locks and therefore do not block, regardless of whether they are in a transaction or not. In lock-based implementations, Serializable acquires a lock on every row read and holds it until the transaction commits or rolls back, so a conflicting concurrent update has to wait or fails.
2. Perform the update in a single statement - A single UPDATE statement should make this atomic for us, i.e. UPDATE entity SET number_of_updates = number_of_updates + 1 WHERE entity_id = ?.
Generally speaking, the latter is much more scalable. The more locks you hold and the longer you hold them, the more blocking you get and therefore less throughput.
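The difference between the read-modify-write sequence and the single-statement increment can be replayed deterministically in plain Java. This is just a sketch with a hypothetical LostUpdateDemo class: no threads and no database are involved, the interleaving is written out by hand.

```java
// Hypothetical demo class; "row" stands in for the database row.
class LostUpdateDemo {

    static final class Counter {
        int numberOfUpdates;
    }

    // Both "transactions" read before either one writes: one increment is lost.
    static int readModifyWriteInterleaved() {
        Counter row = new Counter();
        int readByA = row.numberOfUpdates; // A reads 0
        int readByB = row.numberOfUpdates; // B reads 0
        row.numberOfUpdates = readByA + 1; // A writes 1
        row.numberOfUpdates = readByB + 1; // B writes 1 and overwrites A's increment
        return row.numberOfUpdates;
    }

    // Each increment is applied to the current value, like "SET n = n + 1".
    static int incrementInPlace() {
        Counter row = new Counter();
        row.numberOfUpdates += 1; // A
        row.numberOfUpdates += 1; // B
        return row.numberOfUpdates;
    }
}
```

readModifyWriteInterleaved() ends with 1 although two updates ran, while incrementInPlace() ends with 2.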
To add to the comments above: this situation with @Transactional and "lost updates" is not wrong, but it may seem confusing because it does not meet our expectation that @Transactional protects against "lost updates".
The "lost update" problem can happen with the READ_COMMITTED isolation level, which is the default for most DBs and JPA providers as well.
To prevent it one can use @Transactional(isolation = Isolation.REPEATABLE_READ). There is no need for SERIALIZABLE; that would be overkill. Note that whether REPEATABLE_READ actually prevents lost updates depends on the database.
A very good explanation is given by the well-known JPA champion Vlad Mihalcea in his article: https://vladmihalcea.com/a-beginners-guide-to-database-locking-and-the-lost-update-phenomena/
It is also worth mentioning that a better solution is to use @Version, which can also prevent lost updates with an optimistic locking approach.
The confusion may come from the wiki page https://en.wikipedia.org/wiki/Isolation_(database_systems), where "lost update" is presented as a "weaker" problem than "dirty reads" that never occurs; however, the text below contradicts this:
You are not terribly wrong, your question is a very interesting observation. I believe (based on your comments) you are thinking about it in your very specific situation whereas this subject is much broader. Let's take it step by step.
ACID
The I in ACID indeed stands for isolation. But it does not mean that two or more transactions need to be executed one after another. They just need to be isolated to some level. Most relational databases allow you to set an isolation level on a transaction, even one that lets you read data from other uncommitted transactions. It is up to the specific application whether such a situation is fine or not. See for example the MySQL documentation:
https://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.html
You can of course set the isolation level to serializable and achieve what you expect.
Now, we also have NoSQL databases that don't support ACID. On top of that if you start working with a cluster of databases you may need to embrace eventual consistency of data which might even mean that the same thread that just wrote some data may not receive it when doing a read. Again this is a question very specific to a particular app - can I afford having inconsistent data for a moment in exchange for a fast write?
You would probably lean towards consistent data handled in serializable manner in banking or some financial system and you would probably be fine with less consistent data in a social app but achieving a higher performance.
Update is lost - is that the case?
Yes, that will be the case.
Are we scared of serializable?
Yes, it might get nasty :-) But it is important to understand how it works and what the consequences are. I don't know if this is still the case, but I had a situation in a project about 10 years ago where DB2 was used. Due to a very specific scenario, DB2 was performing a lock escalation to an exclusive lock on the whole table, effectively blocking any other connection from accessing the table even for reads. That meant only a single connection could be handled at a time.
So if you choose to go with the serializable level you need to be sure that your transactions are in fact fast and that it is in fact needed. Maybe it is fine that some other thread is reading the data while you are writing? Just imagine a scenario where you have a commenting system for your articles. Suddenly a viral article gets published and everyone starts commenting. A single write transaction for a comment takes 100ms. 100 new comment transactions get queued, which will effectively block reading the comments for the next 10s (100 x 100ms). I am sure that going with read committed here would be absolutely enough and would allow you to achieve two things: store the comments faster and read them while they are being written.
Long story short:
It all depends on your data access patterns and there is no silver bullet. Sometimes serializable will be required but it has its performance penalty and sometimes read uncommitted will be fine but it will bring inconsistency penalties.
As per "How to do optimistic locking in hibernate", we need to enable optimistic locking with the version element or the version annotation in Hibernate. I am clear up to here.
I am not sure what the purpose of the Optimistic lock mode is.
In what kind of scenario should a developer use it?
To understand why you would want optimistic locking, you first need to understand what no locking and pessimistic locking mean. I'm no Hibernate expert, so I'll explain it without a focus on Hibernate.
When 2 processes/users update the same object, the one who updates it last wins. So you need to find a way to prevent this. One way to do this is pessimistic locking. Here, you put a lock on the object at the moment you load it from the database ("select for update"). Until your transaction is committed or rolled back, nobody else can "select for update" this object. Now the problem is: when you load an entity via Hibernate, you don't specify anywhere whether you want to load it for read-only purposes or whether you want to modify it.
So here comes optimistic locking. This concept assumes optimistically that everything will go OK in most cases. When 2 processes/users update the same object, the second one will not win, but will get an exception on commit.
This seems like it would come up often, but I've Googled to no avail.
Suppose you have a Hibernate entity User. You have one User in your DB with id 1.
You have two threads running, A and B. They do the following:
A gets user 1 and closes its Session
B gets user 1 and deletes it
A changes a field on user 1
A gets a new Session and merges user 1
All my testing indicates that the merge attempts to find user 1 in the DB (it can't, obviously), so it inserts a new user with id 2.
My expectation, on the other hand, would be that Hibernate would see that the user being merged was not new (because it has an ID). It would try to find the user in the DB, which would fail, so it would not attempt an insert or an update. Ideally it would throw some kind of concurrency exception.
Note that I am using optimistic locking through @Version, and that does not help matters.
So, questions:
1. Is my observed Hibernate behaviour the intended behaviour?
2. If so, is it the same behaviour when calling merge on a JPA EntityManager instead of a Hibernate Session?
3. If the answer to 2. is yes, why is nobody complaining about it?
Please see the text from hibernate documentation below.
Copy the state of the given object onto the persistent object with the same identifier. If there is no persistent instance currently associated with the session, it will be loaded. Return the persistent instance. If the given instance is unsaved, save a copy of and return it as a newly persistent instance.
It clearly states that the state (data) of the object is copied onto the object in the database, and that if the object is not there, a copy of that data is saved. When it says "save a copy", Hibernate always creates a record with a new identifier.
The Hibernate merge function works roughly as follows.
It checks the status of the entity (attached to or detached from the session) and finds it detached.
Then it tries to load the entity by its identifier, but does not find it in the database.
As the entity is not found, it treats that entity as transient.
A transient entity always creates a new database record with a new identifier.
Locking is always applied to attached entities. If the entity is detached, Hibernate will always load it first, and the version value gets updated.
Locking is used to control concurrency problems. What you describe is not a concurrency issue.
I've been looking at JSR-220, from which Session#merge claims to get its semantics. The JSR is sadly ambiguous, I have found.
It does say:
Optimistic locking is a technique that is used to insure that updates
to the database data corresponding to the state of an entity are made
only when no intervening transaction has updated that data since the
entity state was read.
If you take "updates" to include general mutation of the database data, including deletes, and not just a SQL UPDATE, which I do, I think you can make an argument that the observed behaviour is not compliant with optimistic locking.
Many people agree, given the comments on my question and the subsequent discovery of this bug.
From a purely practical point of view, the behaviour, compliant or not, could lead to quite a few bugs, because it is contrary to many developers' expectations. There does not seem to be an easy fix for it. In fact, Spring Data JPA seems to ignore this issue completely by blindly using EM#merge. Maybe other JPA providers handle this differently, but with Hibernate this could cause issues.
I'm actually working around this by using Session#update currently. It's really ugly, and requires code to handle the case when you try to update an entity that is detached, and there's a managed copy of it already. But, it won't lead to spurious inserts either.
1. Is my observed Hibernate behaviour the intended behaviour?
The behavior is correct. You are just trying to do operations that are not protected against concurrent data modification :) If you have to split the operation into two sessions, just find the object to update again and check whether it is still there; throw an exception if it is not. If it is there, lock it by using em.find(class, primary key, LockModeType), or use @Version or @Entity(optimisticLock=OptimisticLockType.ALL/DIRTY/VERSION) to protect the object until the end of the transaction.
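The "find it again and throw if it is gone" advice can be sketched without any persistence framework. Everything here is hypothetical (DetachedUpdateGuard, User, EntityGoneException); a HashMap plays the role of the table, and the point is only that update never falls back to an insert.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory "table"; not Hibernate or JPA code.
class DetachedUpdateGuard {

    static final class User {
        final long id;
        String name;

        User(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Signals that the row was deleted after the detached copy was loaded.
    static final class EntityGoneException extends RuntimeException {
        EntityGoneException(String message) {
            super(message);
        }
    }

    private final Map<Long, User> table = new HashMap<>();

    synchronized void insert(User user) {
        // Store a copy so that callers only ever hold "detached" instances.
        table.put(user.id, new User(user.id, user.name));
    }

    synchronized void delete(long id) {
        table.remove(id);
    }

    synchronized int size() {
        return table.size();
    }

    // Copies the detached state onto the stored row only if it still exists.
    synchronized void update(User detached) {
        User stored = table.get(detached.id);
        if (stored == null) {
            throw new EntityGoneException("User " + detached.id + " no longer exists");
        }
        stored.name = detached.name;
    }
}
```

In the A/B scenario from the question, thread A's update would end in EntityGoneException instead of a new row with a new id.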
2. If so, is it the same behaviour when calling merge on a JPA EntityManager instead of a Hibernate Session?
Probably: yes
3. If the answer to 2. is yes, why is nobody complaining about it?
Because if you protect your operations using pessimistic or optimistic locking the problem will disappear:)
The problem you are trying to solve is called: Non-repeatable read
I'm using JDO to access Datastore entities. I'm currently running into issues because different processes access the same entities in parallel and I'm unsure how to go around solving this.
I have entities containing values and calculated values: (key, value1, value2, value3, calculated)
The calculation happens in a separate task queue.
The user can edit the values at any time.
If the values are updated, a new task is pushed to the queue that overwrites the old calculated value.
The problem I currently have is in the following scenario:
1. User creates entity
2. Task is started
3. User notices an error in his initial entry and quickly updates the entity
4. Task finishes based on the old data (from step 1) and overwrites the entire entity, also removing the newly entered values (from step 3)
5. User is not happy
So my questions:
Can I make the task fail on update in step 4? Wrapping the task in a transaction does not seem to solve this issue for all cases due to eventual consistency (or, quite possibly, my understanding of datastore transactions is just wrong)
Is using the low-level setProperty method the only way to update a single field of an entity and will this solve my problem?
If none of the above, what's the best way to deal with a use case like this
background:
At the moment, I don't mind trading performance for consistency. I will care about performance later.
This was my first AppEngine application, and because it was a learning process, it does not use some of the best practices. I'm well aware that, in hindsight, I should have thought longer and harder about my data schema. For instance, none of my entities use ancestor relationships where they would be appropriate. I come from a relational background and it shows.
I am planning a major refactoring, probably moving to Objectify, but in the meantime I have a few urgent issues that need to be solved ASAP. And I'd like to first fully understand the Datastore.
Obviously JDO comes with optimistic concurrency checking (should the user enable it) for transactions, which would prevent/reduce the chance of such things. Optimistic concurrency is equally applicable with relational datastores, so you likely know what it does.
Google's JDO plugin uses the low-level API setProperty() method obviously. The log even tells you what low level calls are made (in terms of PUT and GET). Moving to some other API will not on its own solve such problems.
Whenever you need to handle write conflicts in GAE, you almost always need transactions. However, it's not just as simple as "use a transaction":
First of all, make sure each logical unit of work can be defined in a transaction. There are limits to transactions; no queries without ancestors, only a certain number of entity groups can be accessed. You might find you need to do some extra work prior to the transaction starting (ie, lookup keys of entities that will participate in the transaction).
Make sure each unit of work is idempotent. This is critical. Some units of work are automatically idempotent, for example "set my email address to xyz". Some units of work are not automatically idempotent, for example "move $5 from account A to account B". You can make transactions idempotent by creating an entity before the transaction starts, then deleting the entity inside the transaction. Check for existence of the entity at the start of the transaction and simply return (completing the txn) if it's been deleted.
When you run a transaction, catch ConcurrentModificationException and retry the process in a loop. Now when any txn gets conflicted, it will simply retry until it succeeds.
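A retry loop along those lines might look like the sketch below. TransactionRetry and runWithRetry are hypothetical names, and the unit of work is passed in as a plain Supplier rather than a real Datastore transaction. Unlike "retry until it succeeds", this version gives up after maxAttempts, which is an assumption on my side to avoid spinning forever.

```java
import java.util.ConcurrentModificationException;
import java.util.function.Supplier;

// Hypothetical helper; the Supplier stands in for "begin txn, do work, commit".
class TransactionRetry {

    // Runs the unit of work and retries on conflicts, at most maxAttempts times.
    static <T> T runWithRetry(Supplier<T> unitOfWork, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ConcurrentModificationException lastConflict = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return unitOfWork.get();
            } catch (ConcurrentModificationException conflict) {
                // Another transaction won; remember the conflict and try again.
                lastConflict = conflict;
            }
        }
        // Every attempt collided, so surface the last conflict to the caller.
        throw lastConflict;
    }
}
```

Since the unit of work may run several times, this only behaves correctly if the work is idempotent, as described in the previous point.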
The only bad thing about collisions here is that it slows the system down and wastes effort during retries. However, you will get at least one completed transaction per second (maybe a bit less if you have XG transactions) throughput.
Objectify4 handles the retries for you; just define your unit of work as a run() method and run it with ofy().transact(). Just make sure your work is idempotent.
The way I see it, you can either prevent the first task from updating the object because certain values have changed since the task was first launched,
or you can embed the object's values within the task request, so that the 2nd calc task will restore the object state with consistent values and calculated members.
In my web application I have several threads that potentially access the same data concurrently, which is why I decided to implement optimistic (versioning) and pessimistic locking with Hibernate.
Currently I use the following pattern to lock an entity and perform write operations on it (using Spring's transaction manager and transaction demarcation with @Transactional):
@Transactional
public void doSomething(entity) {
    session.lock(entity, LockMode.UPGRADE);
    session.refresh(entity);
    // I change the entity itself as well as entities in a relationship.
    entity.setBar(...);
    for (Child childEntity : entity.getChildren()) {
        childEntity.setFoo(...);
    }
}
However, sometimes I get a StaleObjectStateException when the @Transactional method flushes, telling me that a ChildEntity has been modified concurrently and now has a wrong version.
I guess I am not correctly refreshing the entity and its children, so I am working with stale data. Can someone point out how to achieve this? My own thoughts included clearing the persistence context (the session) or calling session.lock(entity, LockMode.READ) again, but I am not sure what is correct here.
Thanks for your help!
You may want to take a look at this Hibernate issue: LockMode.Upgrade doesn't refresh entity values.
In short: Hibernate does NOT perform a select after a successful lock if the given entity was already preloaded. You need to call refresh on the entity yourself after you have acquired the lock.
Why do you make "LockMode.UPGRADE" and optimistic locking live together? They seem like contradictory things.
Hibernate never locks objects in memory and always uses the locking mechanism of the database. Also, "if the requested lock mode is not supported by the database, Hibernate uses an appropriate alternate mode instead of throwing an exception. This ensures that applications are portable." This means that if your database doesn't support SELECT ... FOR UPDATE, you will most probably get these exceptions.
Another possible reason is that you haven't used "org.hibernate.annotations.CascadeType.LOCK" for children.