Default @Transactional in Spring and the default lost update - java

There is one big phenomenon in the Spring environment, or I am terribly wrong.
The default Spring @Transactional annotation is not ACID but only ACD, lacking the isolation. That means that if you have the method:
@Transactional
public TheEntity updateEntity(TheEntity ent) {
    TheEntity storedEntity = loadEntity(ent.getId());
    storedEntity.setData(ent.getData());
    return saveEntity(storedEntity);
}
What would happen if two threads enter with different planned updates? They both load the entity from the DB, they both apply their own changes, then the first is saved and committed, and when the second is saved and committed the first UPDATE IS LOST. Is that really the case? With the debugger it works exactly like that.

Losing data?
You're not losing data. Think of it like changing a variable in code.
int i = 0;
i = 5;
i = 10;
Did you "lose" the 5? Well, no, you replaced it.
Now, the tricky part that you alluded to with multi-threading is what if these two SQL updates happen at the same time?
From a pure update standpoint (forgetting the read), it's no different. Databases will use a lock to serialize the updates so one will still go before the other. The second one wins, naturally.
But, there is one danger here...
Update based on the current state
What if the update is conditional based on the current state?
public void updateEntity(UUID entityId) {
    Entity blah = getCurrentState(entityId);
    blah.setNumberOfUpdates(blah.getNumberOfUpdates() + 1);
    blah.save();
}
Now you have a problem of data loss because if two concurrent threads perform the read (getCurrentState), they will each add 1, arrive at the same number, and the second update will lose the increment of the previous one.
Solving it
There are two solutions.
1. Serializable isolation level - In most isolation levels, reads (selects) do not hold any exclusive locks and therefore do not block, regardless of whether they are in a transaction or not. Serializable will actually acquire and hold a lock on every row read (typically a shared or range lock), and only release those locks when the transaction commits or rolls back.
2. Perform the update in a single statement - A single UPDATE statement makes this atomic for us, i.e. UPDATE entity SET number_of_updates = number_of_updates + 1 WHERE entity_id = ?.
Generally speaking, the latter is much more scalable. The more locks you hold and the longer you hold them, the more blocking you get and therefore less throughput.
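For illustration, here is a rough sketch of the single-statement approach with Spring Data JPA. The repository, entity name (TrackedEntity), and field names are hypothetical, not taken from the question; the point is only that the increment happens inside the UPDATE itself.
import java.util.UUID;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Modifying;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

// Hypothetical repository: the database performs the read-increment-write atomically
// under its own row lock, so concurrent callers cannot overwrite each other's increments.
public interface TrackedEntityRepository extends JpaRepository<TrackedEntity, UUID> {

    // Call this from a @Transactional method; @Modifying queries need a transaction.
    @Modifying
    @Query("update TrackedEntity e set e.numberOfUpdates = e.numberOfUpdates + 1 where e.id = :id")
    int incrementNumberOfUpdates(@Param("id") UUID id);
}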

To add to the answers above: this situation with @Transactional and "lost updates" is not wrong; however, it may seem confusing because it does not meet our expectation that @Transactional protects against "lost updates".
The "lost update" problem can happen at the READ_COMMITTED isolation level, which is the default for most databases and JPA providers as well.
To prevent it, one needs to use @Transactional(isolation = Isolation.REPEATABLE_READ). There is no need for SERIALIZABLE; that would be overkill.
A very good explanation is given by the well-known JPA champion Vlad Mihalcea in his article: https://vladmihalcea.com/a-beginners-guide-to-database-locking-and-the-lost-update-phenomena/
It is also worth mentioning that a better solution is to use @Version, which can also prevent lost updates through an optimistic locking approach.
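As a minimal sketch (the field names are illustrative, reusing TheEntity from the question), adding a @Version field is enough for the JPA provider to detect the conflict:
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

// Sketch of an entity with optimistic locking enabled.
@Entity
public class TheEntity {

    @Id
    private Long id;

    private String data;

    // Incremented by the JPA provider on every update; committing a stale copy
    // fails with an OptimisticLockException instead of silently losing the update.
    @Version
    private Long version;

    // getters and setters omitted
}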
The confusion may come from the wiki page https://en.wikipedia.org/wiki/Isolation_(database_systems), where "lost update" is shown as a "weaker" problem than "dirty reads" and supposedly never occurs at this level, yet the text below it contradicts that:

You are not terribly wrong; your question is a very interesting observation. I believe (based on your comments) you are thinking about it in your very specific situation, whereas this subject is much broader. Let's take it step by step.
ACID
The I in ACID indeed stands for isolation. But it does not mean that two or more transactions need to be executed one after another. They just need to be isolated to some level. Most relational databases allow you to set an isolation level per transaction, even one that lets you read data from other, uncommitted transactions. Whether such a situation is acceptable is up to the specific application. See for example the MySQL documentation:
https://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.html
You can of course set the isolation level to serializable and achieve what you expect.
Now, we also have NoSQL databases that don't support ACID. On top of that, if you start working with a cluster of databases, you may need to embrace eventual consistency, which might even mean that the same thread that just wrote some data may not see it when doing a read. Again, this is a question very specific to a particular app: can I afford having inconsistent data for a moment in exchange for a fast write?
You would probably lean towards consistent data handled in a serializable manner in a banking or financial system, and you would probably be fine with less consistent data in a social app in exchange for higher performance.
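For reference, a hedged sketch of how that per-transaction isolation choice looks with Spring's @Transactional (the service and method names are invented for the example):
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

// Hypothetical service showing how the isolation level is chosen per method.
@Service
public class EntityService {

    // SERIALIZABLE gives the strict one-after-another behaviour discussed above,
    // at the cost of more blocking; most methods can stay on the database default.
    @Transactional(isolation = Isolation.SERIALIZABLE)
    public void updateEntitySerializably(TheEntity ent) {
        // load, modify, save as in the question's example
    }
}
Note that the underlying database and JDBC driver still have to support the requested level.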
Update is lost - is that the case?
Yes, that will be the case.
Are we scared of serializable?
Yes, it might get nasty :-) But it is important to understand how it works and what the consequences are. I don't know if this is still the case, but I had a situation in a project about 10 years ago where DB2 was used. Due to a very specific scenario, DB2 performed a lock escalation to an exclusive lock on the whole table, effectively blocking any other connection from accessing the table, even for reads. That meant only a single connection could be handled at a time.
So if you choose to go with the serializable level, you need to be sure that your transactions are in fact fast and that the level is in fact needed. Maybe it is fine that some other thread is reading the data while you are writing? Just imagine a scenario where you have a commenting system for your articles. Suddenly a viral article gets published and everyone starts commenting. A single write transaction for a comment takes 100 ms. If 100 new comment transactions get queued, that will effectively block reading the comments for the next 10 s. I am sure that going with read committed here would be absolutely enough and would let you achieve two things: store the comments faster and read them while they are being written.
Long story short:
It all depends on your data access patterns and there is no silver bullet. Sometimes serializable will be required, but it comes with a performance penalty; sometimes read uncommitted will be fine, but it brings inconsistency penalties.

Related

What condition will cause LockAcquisitionException with SQLCODE=-911, SQLSTATE=40001, SQLERRMC=68?

As I understand it, LockAcquisitionException happens when a thread tries to update a row that is locked by another thread. (Please do correct me if I am wrong.)
So I tried to simulate it as follows:
I lock a row using dbVisualizer, then I use the application to run an update query on the same record. In the end, I just hit a global transaction timeout instead of a LockAcquisitionException with reason code 68.
Thus, I am thinking that my understanding is wrong and that LockAcquisitionException does not happen this way. Can you kindly advise or give some simple example to create the LockAcquisitionException?
You will get LockAcquisitionException (SQLCODE=-911 SQLERRMC=68) as a result of a lock timeout.
It may be unhelpful to compare the actions of dbViz with Hibernate because they may use different classes/methods and settings at the JDBC level, which can influence the exception details. What matters is that at the Db2 level both experienced SQLCODE=-911 with SQLERRMC=68, regardless of the exception name they report for the lock timeout.
You can get a lock-timeout on statements like UPDATE or DELETE or INSERT or SELECT (and others including DDL and commands), depending on many factors.
All lock-timeouts have one thing in common: one transaction waited too long and got rolled back because another transaction did not commit quickly enough.
Lock-timeout diagnosis and Lock-Timeout avoidance are different topics.
The length of time to wait for a lock can be set at the database level, connection level, or statement level, according to the chosen design, including a mix of these. You can also adjust how Db2 behaves for locking by tuning database parameters like CUR_COMMIT and LOCK_TIMEOUT, and by adjusting the isolation level at the statement or connection level.
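As a rough illustration of the connection-level option over plain JDBC (the JDBC URL and credentials are placeholders, and the SET CURRENT LOCK TIMEOUT special register is Db2-specific):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch: limit how long this connection waits for locks before Db2 returns
// SQLCODE=-911 with reason code 68. Connection details are placeholders.
public class LockTimeoutExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:db2://dbhost:50000/SAMPLE", "user", "password");
             Statement stmt = conn.createStatement()) {
            // Db2 special register: wait at most 5 seconds for a lock on this connection.
            stmt.execute("SET CURRENT LOCK TIMEOUT 5");
            // ... run the application's SELECT/UPDATE statements here ...
        }
    }
}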
It's wise to ensure accurate diagnosis before thinking about avoidance.
As you are running Db2-LUW v10.5.0.9, consider careful study of this page and all related links:
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.admin.trb.doc/doc/t0055072.html
There are many situations that can lead to a lock timeout, so it's better to know exactly which situation is relevant for your case(s).
Avoiding lock-conflicts is a matter of both configuration and transaction design so that is a bigger topic. The configuration can be at Db2 level or at application layer or both.
Sometimes bugs cause lock-timeouts, for example when app-server threads have a database-connection that is hung and has not committed and is not being cleaned up correctly by the application.
You should diagnose the participants in the lock timeout. There are different ways to do lock-conflict diagnosis on Db2-LUW so choose the one that works for you.
One simple diagnosis tool that still works on V10.5.0.9 is the Db2 registry variable DB2_CAPTURE_LOCKTIMEOUT=ON, even though the method is deprecated. You can set this variable (and unset it) on the fly without needing any service outage. So if you have a recreatable scenario that results in SQLCODE=-911 SQLERRMC=68 (lock timeout), you can switch on this variable, repeat the test, then switch off the variable. If the variable is switched on and a lock timeout happens, Db2 will write a new text file containing information about the participants in the locking situation, showing details that help you understand what is happening and letting you consider ways to resolve the issue once you have enough facts. You don't want to keep this variable permanently set because it can impact performance and fill up the Db2 diagnostics file system if you get a lot of lock timeouts, so you have to be careful. Read about this variable in the Knowledge Center at this page:
https://www.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.regvars.doc/doc/r0005657.html
You diagnose the lock-timeout by careful study of the contents of these files, although of course it's necessary to understand the details also. This is a regular DBA activity.
Another method is to use db2pdcfg -catch with a custom db2cos script, to decide what to do after Db2 throws the -911. This needs scripting skills and it lets you decide exactly what diagnostics to collect after the -911 and where to store those diagnostics.
Another method which involves much more work but potentially pays more dividends is to use an event monitor for locking. The documentation is at:
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0054074.html
Be sure to study the "Related concepts" and "Related tasks" pages also.

Concurrency with Hibernate in Spring

I found a lot of posts regarding this topic, but all answers were just links to documentation with no example code, i.e., how to use concurrency in practice.
My situation: I have an entity House with (for simplification) two attributes, number (the id) and owner. The database is initialized with 10 Houses with numbers 1-10 and owner always null.
I want to assign a new owner to the house that currently has no owner and the smallest number. My code looks like this:
@Transactional
void assignNewOwner(String newOwner) {
    // this is flagged as @Transactional too
    House tmp = houseDao.getHouseWithoutOwnerAndSmallestNumber();
    tmp.setOwner(newOwner);
    // this is flagged as @Transactional too
    houseDao.update(tmp);
}
To my understanding, although @Transactional is used, the same House could be assigned twice to different owners if two requests fetch the same empty House as tmp. How do I ensure this cannot happen?
I know that including the update in the selection of the empty House would solve the issue, but in the near future I want to modify/work with the tmp object more.
Optimistic
If you add a version column to your entity / table, then you can take advantage of a mechanism called optimistic locking. This is the most efficient way of making sure that the state of an entity has not changed since we obtained it in a transactional context.
Once you create a query using the session, you can then call setLockMode(LockModeType.OPTIMISTIC).
Then, just before the transaction is committed, the persistence provider will query for the current version of that entity and check whether it has been incremented by another transaction. If so, you get an OptimisticLockException and a transaction rollback.
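For instance, a minimal sketch (assuming House carries a @Version field; the direct EntityManager usage is an assumption, since the question goes through a DAO):
import javax.persistence.EntityManager;
import javax.persistence.LockModeType;

public class HouseReader {

    // Sketch: OPTIMISTIC asks the provider to re-check the House's @Version at commit
    // time, so a concurrent change triggers an OptimisticLockException instead of
    // silently winning.
    public House loadForUpdate(EntityManager em, Long houseId) {
        return em.find(House.class, houseId, LockModeType.OPTIMISTIC);
    }
}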
Pessimistic
If you do not version your rows, then you are left with pessimistic locking, which basically means that you physically create a lock on the queried entities at the database level, and other transactions cannot read / update those rows.
You achieve that by setting this on the Query object:
setLockMode(LockModeType.PESSIMISTIC_READ);
or
setLockMode(LockModeType.PESSIMISTIC_WRITE);
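Applied to the House example, a pessimistic version could look roughly like this (a sketch only; the JPQL is an assumption of what getHouseWithoutOwnerAndSmallestNumber() might do internally):
import javax.persistence.EntityManager;
import javax.persistence.LockModeType;

public class HouseAssigner {

    // Sketch: PESSIMISTIC_WRITE makes the database lock the selected row (often via
    // SELECT ... FOR UPDATE), so a second transaction looking for a free house blocks
    // until this one commits and then sees the new owner.
    public void assignNewOwner(EntityManager em, String newOwner) {
        House house = em.createQuery(
                    "select h from House h where h.owner is null order by h.number asc",
                    House.class)
                .setLockMode(LockModeType.PESSIMISTIC_WRITE)
                .setMaxResults(1)
                .getSingleResult();
        house.setOwner(newOwner);
        // the change is flushed when the surrounding transaction commits
    }
}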
Actually it's pretty easy - at least in my opinion - and I am going to abstract away from what Hibernate will generate when you say pessimistic/optimistic. You might think this is SELECT FOR UPDATE - but it's not always the case; MSSQL AFAIK does not have that...
These are JPA annotations and they guarantee some functionality, not the implementation.
Fundamentally they are entirely different things - PESSIMISTIC vs OPTIMISTIC locking. When you do pessimistic locking you effectively do a synchronized block, at least logically - you can do whatever you want and you are safe within the scope of the transaction. Now, whether the lock is held on the row, the table or even a page is unspecified, so it is a bit dangerous. Databases may also escalate locks; MSSQL does that, if I recall correctly.
Obviously lock starvation is an issue, so you might think that OPTIMISTIC locking would help. As a side note, this is what transactional memory is in modern CPUs; it uses the same thinking process.
So optimistic locking is like saying: I will mark this row with an ID/date, etc., then I will take a snapshot of it and work with it - before committing I will check if that ID has changed. Obviously there is contention on that ID, but not on the data. If it has changed - abort (i.e. throw an OptimisticLockException); otherwise commit the work.
The thing that bothers everyone, IMO, is that OptimisticLockException - how do you recover from it? And here is something you are not going to like - it depends. There are apps where a simple retry would be enough, and there are apps where this would be impossible. I have used it in rare scenarios.
I usually go with pessimistic locking (unless optimistic is totally not an option). At the same time, I would look at what Hibernate generates for that query. For example, you might need an index on the column used to retrieve the entry, so that the DB actually locks just the row - because ultimately that is what you want.

A good strategy/solution for concurrency with hibernate's optimistic/pessimistic locking

Let's presume that we have an application "mail client" and a front-end for it.
If a user is typing a message or editing the subject or whatever, a REST call is made to update whatever the user was changing (e.g. the receivers) to keep the message in DRAFT. So a lot of PUTs are happening to save the message. When the window is closed, an update of every editable field happens at the same time. Hibernate can't handle this concurrency: each of those calls retrieves the message, edits its own fields and tries to save the message again, while another call has already changed it.
I know I can add a REST call to save all fields at the same time, but I was wondering if there is a cleaner solution, or a decent strategy to handle such cases (for example only updating one field, or some merge strategy if the object has already changed).
Thanks in advance!
The easiest solutions here would be to tweak the UI to either:
1. Submit a single REST call during email submission that does all the tasks necessary, or
2. Serialize the REST calls so they're chained rather than firing concurrently.
The concern I have here is that this will snowball at some point and become a bigger concurrency problem as more users interact with the application. Consider for a moment the potential number of concurrent REST calls your web infrastructure will have to support when you're faced with 100, 500, 1000, or even 10000 or more concurrent users.
Does it really make sense to beef up the volume of servers to handle that load when the load itself is a product of a design flaw in the first place?
Hibernate is designed to handle locking through two mechanisms, optimistic and pessimistic.
Optimistic Way
1. Read the entity from the data store.
2. Cache a copy of the fields you're going to modify in temporary variables.
3. Modify the field or fields based on your PUT operation.
4. Attempt to merge the changes.
5. If the save succeeds, you're done.
6. Should an OptimisticLockException occur, refresh the entity state from the data store.
7. Compare the cached values to the fields you must change.
8. If the values differ, you can assert or throw an exception.
9. If they don't differ, go back to 4.
The beautiful part of the optimistic approach is that you avoid any form of deadlock, particularly if you're allowing multiple tables to be read and locked separately.
While you can use pessimistic lock options, optimistic locking is generally the accepted way to handle concurrent operations, as it has the least concurrency contention and performance impact.
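A minimal sketch of the retry loop described in the steps above, assuming a hypothetical EmailDraft entity with a @Version field (names invented for illustration; in a real application each attempt would normally run in its own transaction):
import javax.persistence.EntityManager;
import javax.persistence.OptimisticLockException;

public class DraftUpdater {

    private static final int MAX_RETRIES = 3;

    // Read, cache, modify, merge; on OptimisticLockException re-read and retry.
    public void updateSubject(EntityManager em, long draftId, String newSubject) {
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            EmailDraft draft = em.find(EmailDraft.class, draftId); // 1. read
            String cachedSubject = draft.getSubject();             // 2. cache the field we change
            draft.setSubject(newSubject);                          // 3. apply the PUT
            try {
                em.merge(draft);                                   // 4. attempt to merge
                em.flush();
                return;                                            // 5. success
            } catch (OptimisticLockException e) {
                EmailDraft fresh = em.find(EmailDraft.class, draftId); // 6. refresh the state
                if (!fresh.getSubject().equals(cachedSubject)) {       // 7-8. our field was changed too
                    throw new IllegalStateException("Subject was changed concurrently", e);
                }
                // 9. our field is untouched: loop back and merge again with the new version
            }
        }
        throw new IllegalStateException("Gave up after " + MAX_RETRIES + " attempts");
    }
}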

Parallel updates to different entity properties

I'm using JDO to access Datastore entities. I'm currently running into issues because different processes access the same entities in parallel and I'm unsure how to go around solving this.
I have entities containing values and calculated values: (key, value1, value2, value3, calculated)
The calculation happens in a separate task queue.
The user can edit the values at any time.
If the values are updated, a new task is pushed to the queue that overwrites the old calculated value.
The problem I currently have is in the following scenario:
1. User creates entity
2. Task is started
3. User notices an error in his initial entry and quickly updates the entity
4. Task finishes based on the old data (from step 1) and overwrites the entire entity, also removing the newly entered values (from step 3)
5. User is not happy
So my questions:
1. Can I make the task fail on the update in step 4? Wrapping the task in a transaction does not seem to solve this issue in all cases due to eventual consistency (or, quite possibly, my understanding of Datastore transactions is just wrong).
2. Is using the low-level setProperty method the only way to update a single field of an entity, and will this solve my problem?
3. If neither of the above, what's the best way to deal with a use case like this?
Background:
At the moment, I don't mind trading performance for consistency. I will care about performance later.
This was my first AppEngine application, and because it was a learning process, it does not use some of the best practices. I'm well aware that, in hindsight, I should have thought longer and harder about my data schema. For instance, none of my entities use ancestor relationships where they would be appropriate. I come from a relational background and it shows.
I am planning a major refactoring, probably moving to Objectify, but in the meantime I have a few urgent issues that need to be solved ASAP. And I'd like to first fully understand the Datastore.
Obviously JDO comes with optimistic concurrency checking for transactions (should the user enable it), which would prevent/reduce the chance of such things. Optimistic concurrency is equally applicable to relational datastores, so you likely know what it does.
Google's JDO plugin obviously uses the low-level API setProperty() method. The log even tells you what low-level calls are made (in terms of PUT and GET). Moving to some other API will not, on its own, solve such problems.
Whenever you need to handle write conflicts in GAE, you almost always need transactions. However, it's not just as simple as "use a transaction":
First of all, make sure each logical unit of work can be defined in a transaction. There are limits to transactions: no queries without ancestors, and only a certain number of entity groups can be accessed. You might find you need to do some extra work before the transaction starts (i.e., look up the keys of entities that will participate in the transaction).
Make sure each unit of work is idempotent. This is critical. Some units of work are automatically idempotent, for example "set my email address to xyz". Some units of work are not automatically idempotent, for example "move $5 from account A to account B". You can make such transactions idempotent by creating an entity before the transaction starts, then deleting that entity inside the transaction. Check for the existence of the entity at the start of the transaction and simply return (completing the txn) if it has already been deleted.
When you run a transaction, catch ConcurrentModificationException and retry the process in a loop. Now when any txn gets conflicted, it will simply retry until it succeeds.
The only bad thing about collisions here is that they slow the system down and waste effort during retries. However, you will still get at least one completed transaction per second (maybe a bit less if you have XG transactions) of throughput.
Objectify4 handles the retries for you; just define your unit of work as a run() method and run it with ofy().transact(). Just make sure your work is idempotent.
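As a rough illustration with the low-level Datastore API (the property name "calculated" and the retry count are assumptions; Objectify's ofy().transact() wraps essentially the same pattern):
import java.util.ConcurrentModificationException;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.Transaction;

public class CalculationTask {

    // Sketch: write the calculated value inside a transaction and retry on conflict,
    // so a task working from stale data cannot silently clobber a newer user edit.
    public void storeCalculated(Key entityKey, long calculated)
            throws EntityNotFoundException {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        for (int attempt = 0; attempt < 5; attempt++) {
            Transaction txn = ds.beginTransaction();
            try {
                Entity e = ds.get(txn, entityKey);        // re-read the current state in the txn
                e.setProperty("calculated", calculated);  // touch only the calculated property
                ds.put(txn, e);
                txn.commit();
                return;
            } catch (ConcurrentModificationException cme) {
                // another write hit the entity group first; loop and retry
            } finally {
                if (txn.isActive()) {
                    txn.rollback();
                }
            }
        }
        throw new ConcurrentModificationException("Too much contention on " + entityKey);
    }
}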
The way I see it, you can either prevent the first task from updating the object because certain values have changed since the task was first launched,
or you can embed the object's values within the task request so that the second calc task will restore the object state with consistent value and calculated members.

Using Java locks for database concurrency

I have the following scenario.
I have two tables. One stores multiple values that are counters for transactions. Through a Java application, the first table's value is read, incremented and written to the second table, and the new value is also written back to the first table. Obviously there is potential for this to go wrong as it's a multi-user system.
My solution, in Java, is to provide locks that have to, well, should, be acquired before any action can be taken on either table. These locks, ReentrantLocks, are static and there is one for each column in Table 1, as the values are completely independent of each other.
Is this a recommended approach?
Cheers.
No. Use implicit Database Locks1 for Database Concurrency. Relational databases support Transactions which are a vital part of ACID: use them.
Java-centric locks will not work cross-VM and as such will not help in multi-User/Server environments.
1 Databases are smart enough to acquire/release locks to ensure Consistency and Isolation and may even use "lock free" implementations such as MVCC. There are rare occasions when explicit database locks must be requested, but this is an advanced use-case.
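As a sketch of what the database-side approach could look like over plain JDBC (the table and column names, counters and tx_log, are invented for the example):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class CounterService {

    // Sketch: the whole read-increment-write happens in one database transaction.
    // SELECT ... FOR UPDATE locks the counter row, so concurrent users queue up at
    // the database instead of relying on JVM-local ReentrantLocks.
    public long nextTransactionNumber(Connection conn, String counterName) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try {
            long current;
            try (PreparedStatement select = conn.prepareStatement(
                    "SELECT value FROM counters WHERE name = ? FOR UPDATE")) {
                select.setString(1, counterName);
                try (ResultSet rs = select.executeQuery()) {
                    rs.next();
                    current = rs.getLong(1);
                }
            }
            long next = current + 1;
            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO tx_log (counter_name, value) VALUES (?, ?)")) {
                insert.setString(1, counterName);
                insert.setLong(2, next);
                insert.executeUpdate();
            }
            try (PreparedStatement update = conn.prepareStatement(
                    "UPDATE counters SET value = ? WHERE name = ?")) {
                update.setLong(1, next);
                update.setString(2, counterName);
                update.executeUpdate();
            }
            conn.commit();
            return next;
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}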
Whilst agreeing with some of the sentiments of @pst's answer, I would say this depends slightly.
If the sequence of events is, and probably always will be, essentially "SQL oriented", then you may as well do the locking at the database level (and indeed, probably implicitly via the use of transactions).
However if there is, or you are planning to build in, significant data manipulation logic within your app tier (either generally or in the case of this specific operation), then locking at the app level may be more appropriate. (In reality, you will probably still run your SQL in transactions so that you're actually locking at both levels.)
I don't think the issue of multiple VMs is necessarily a compelling issue on its own for relying on DB-level locking. If you have multiple server apps accessing the database, you will in any case want to establish a well-defined protocol for which data is accessed concurrently under what circumstances. And in a system of moderate complexity, you will in any case want to build in a system of running periodic sanity checks on the data. (Even if your server apps are perfectly behaved 100% of the time, will back end tech support never ever ever have to run some miscellaneous SQL on the database outside your app...?)
