I am trying to understand isolation levels and the various anomalies they deal with, i.e. dirty reads, non-repeatable reads, phantom reads and lost updates.
I was reading about non-repeatable reads.
I had also read about lost updates.
What confuses me is that both of these look very similar to me. In an NRR (non-repeatable read), Tx B updates the row between two reads of the same row by Tx A, so Tx A gets different results.
In the case of a lost update, Tx B overwrites changes committed by Tx A.
So it really seems to me that these two are quite similar and related.
Is that correct?
My understanding is that using 'optimistic locking' will prevent the 'lost update' problem
(based on some very good answers here).
My confusion:
Would it also mean that by using 'optimistic locking' we also eliminate the 'non-repeatable read' problem?
All of these questions pertain to a Java J2EE application with Oracle database.
NOTE: to avoid distractions, I am not looking for details about dirty reads and phantom reads; my focus at present is entirely on non-repeatable reads and lost updates.
Non-repeatable reads, lost updates, phantom reads, as well as dirty reads, are about transaction isolation levels, rather than pessimistic/optimistic locking. I believe Oracle's default isolation level is read committed, meaning that only dirty reads are prevented.
Non-repeatable reads and lost updates are indeed related, as they may or may not occur at the same isolation level. Neither can be avoided by locking alone unless you set the correct isolation level, but you can use versioning (a column value that every update checks and increments) to at least detect the issue and take the necessary action.
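A minimal in-memory sketch of the versioning idea, assuming a table with a numeric version column. VersionedTable and its methods are hypothetical stand-ins for that table: updateIfVersion models an UPDATE ... SET value = ?, version = version + 1 WHERE id = ? AND version = ?, and a returned row count of 0 is how the conflict gets detected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a table that has a VERSION column.
final class VersionedTable {
    record Row(String value, long version) {}

    private final Map<Long, Row> rows = new HashMap<>();

    synchronized void insert(long id, String value) {
        rows.put(id, new Row(value, 0L));
    }

    synchronized Row read(long id) {
        return rows.get(id);
    }

    // Models: UPDATE t SET value = ?, version = version + 1 WHERE id = ? AND version = ?
    // Returns the "row count": 1 if the version still matched, 0 if someone else got there first.
    synchronized int updateIfVersion(long id, String newValue, long expectedVersion) {
        Row current = rows.get(id);
        if (current == null || current.version() != expectedVersion) {
            return 0;
        }
        rows.put(id, new Row(newValue, expectedVersion + 1));
        return 1;
    }
}
```

If Tx A and Tx B both read version 0, B's update succeeds and bumps the version to 1, so A's update (still expecting 0) changes nothing and A knows it must re-read. Note that this only detects the lost update: a second plain read by A would still see B's value, so it does nothing against a non-repeatable read.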
The purpose of repeatable reads is to provide read-consistent data:
within a query, all the results should reflect the state of the data at a specific point in time.
within a transaction, the same query should return the same results even if it is repeated.
In Oracle, queries are read-consistent as of the moment the query started. If data changes during the query, the query reads the version of the data that existed at the start of the query. That version is available in the "UNDO".
Bottom line: Oracle by default has an isolation level of READ COMMITTED, which guarantees read-consistent data within a query, but not within a transaction.
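The statement-level versus transaction-level distinction can be sketched with a toy multi-version row. This is a hypothetical model, not Oracle's actual UNDO mechanism: MultiVersionRow keeps every committed value tagged with a made-up system change number (SCN); a READ COMMITTED query reads as of the SCN at the moment that query starts, whereas a SERIALIZABLE transaction reads every query as of the SCN captured when the transaction started.

```java
import java.util.ArrayList;
import java.util.List;

// Toy multi-version row: each commit appends (commitScn, value). Not Oracle's real format.
final class MultiVersionRow {
    private record Version(long commitScn, int value) {}

    private final List<Version> versions = new ArrayList<>();
    private long scn = 0; // hypothetical system change number counter

    MultiVersionRow(int initial) {
        versions.add(new Version(scn, initial));
    }

    long currentScn() {
        return scn;
    }

    // Another session commits a new value; the SCN advances and old versions are kept.
    void commit(int newValue) {
        scn++;
        versions.add(new Version(scn, newValue));
    }

    // Returns the latest value committed at or before the given snapshot SCN.
    int readAsOf(long snapshotScn) {
        int result = versions.get(0).value();
        for (Version v : versions) { // versions are stored in ascending SCN order
            if (v.commitScn() <= snapshotScn) {
                result = v.value();
            }
        }
        return result;
    }
}
```

Replaying a non-repeatable read: capture txStart = row.currentScn(), read, let another session call row.commit(200), then read again. readAsOf(row.currentScn()) now returns 200 (READ COMMITTED, a fresh snapshot per query), while readAsOf(txStart) still returns the original value (SERIALIZABLE, one snapshot per transaction).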
You talk about Tx A and Tx B. In Oracle, a session that does not change any data does not have a transaction.
Assume the default isolation level of READ COMMITTED. Assume the J2EE application uses a connection pool and is stateless.
app thread A connects to session X and reads a row.
app thread B connects to session Y and updates the row with commit.
app thread A connects to session Z and reads the same row, seeing a different result.
Notice that there is nothing any database can do here. Even if all the sessions had the SERIALIZABLE isolation level, session Z has no idea what is going on in session X. Besides, thread A cannot leave a transaction hanging in session X when it disconnects.
To your question, notice that app thread A never changed any data. The human user behind app thread A queried the same data twice and saw two different results, that is all.
Now let's do an update:
app thread A connects to session X and reads a row.
app thread B connects to session Y and updates the row with commit.
app thread A connects to session Z and updates the same row with commit.
Here the same row had three different values, not two. The human user behind thread A saw the first value and changed it to the third value without ever seeing the second value! That is what we mean by a "lost update".
The idea behind optimistic locking is to notify the human user that, between the time they queried the data and the time they asked to update it, someone else changed the data first. They should look at the most recent values before confirming the update.
To simplify:
"non-repeatable reads" happen if you query, then I update, then you query.
"lost updates" happen if you query, then I update, then you update. Notice that if you query the data again, you need to see the new value in order to decide what to do next.
Suggested reading: https://blogs.oracle.com/oraclemagazine/on-transaction-isolation-levels
Best regards, Stew Ashton
I am a little confused as to why optimistic locking is actually safe. If I compare the version at the time of retrieval with the version at the time of update, it seems that two requests can still enter the update block if the OS interrupts one process and swaps in the other before the commit actually occurs. For example:
latestVersion = vehicle.getVersion();
if (vehicle.getVersion() == latestVersion) {
// update record in database
} else {
// don't update record
}
In this example, I am trying to use optimistic locking manually in a Java application without JPA/Hibernate. However, it seems that two requests can enter the if block at the same time. Can you please help me understand how to do this properly? For context, I am using the Java Design Patterns website as an example.
Well... that's the optimistic part. The optimism is that it is safe. If you have to be certain it's safe, then that's not optimistic.
The example you show is definitely susceptible to a race condition, not only because of thread scheduling but also because of the transaction isolation level.
A simple read in MySQL, at the default transaction isolation level of REPEATABLE READ, reads from the snapshot of committed data established by the first read in your transaction.
Updating data, by contrast, acts on the data that is committed at the time of the update. If some other concurrent session has updated the row in the meantime and committed it, your update will "see" the latest committed row, not the row viewed by your get method.
The way to avoid the race condition is to not be optimistic. Instead, force exclusive access to the record. Doveryai, no proveryai: trust, but verify.
If you only have one app instance, you might use a critical section for this.
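For the single-instance case, here is a sketch of the question's check-then-update made into a critical section. GuardedVehicle is hypothetical and keeps its data in memory; synchronized only coordinates threads inside one JVM, not other app instances or other database sessions.

```java
// Hypothetical in-memory vehicle. synchronized makes each method a critical section,
// but only for threads inside this one JVM.
final class GuardedVehicle {
    private int version = 0;
    private int mileage = 0;

    // Check and update run under the same monitor, so no other thread
    // can slip in between the version comparison and the write.
    synchronized boolean updateIfVersion(int expectedVersion, int newMileage) {
        if (version != expectedVersion) {
            return false; // someone else updated first; re-read and decide again
        }
        mileage = newMileage;
        version++;
        return true;
    }

    synchronized int version() {
        return version;
    }

    synchronized int mileage() {
        return mileage;
    }
}
```

Two threads that both read version 0 and then call updateIfVersion(0, ...) can no longer both succeed: whichever acquires the monitor second sees version 1 and gets false.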
If you have multiple app instances, critical sections cannot coordinate across instances, so you need to coordinate in the database. You can do this with pessimistic locking: either read the record using a locking read query, or use MySQL's user-defined locks.
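A sketch of the locking-read approach with plain JDBC, assuming a hypothetical vehicle(id, mileage) table. SELECT ... FOR UPDATE is MySQL's locking read: the row stays locked until commit or rollback, so a second request running the same method waits at its SELECT instead of reading a value that is about to change.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class VehicleRepository {
    // Hypothetical table: vehicle(id BIGINT PRIMARY KEY, mileage INT)
    static void addMileage(Connection con, long vehicleId, int extra) throws SQLException {
        boolean previousAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // the lock is only held for the duration of a transaction
        try {
            int mileage;
            try (PreparedStatement select = con.prepareStatement(
                    "SELECT mileage FROM vehicle WHERE id = ? FOR UPDATE")) {
                select.setLong(1, vehicleId);
                try (ResultSet rs = select.executeQuery()) {
                    if (!rs.next()) {
                        throw new SQLException("vehicle " + vehicleId + " not found");
                    }
                    mileage = rs.getInt(1);
                }
            }
            // Other sessions trying to lock or update this row now wait for our commit/rollback.
            try (PreparedStatement update = con.prepareStatement(
                    "UPDATE vehicle SET mileage = ? WHERE id = ?")) {
                update.setInt(1, mileage + extra);
                update.setLong(2, vehicleId);
                update.executeUpdate();
            }
            con.commit(); // releases the row lock
        } catch (SQLException e) {
            con.rollback(); // also releases the row lock
            throw e;
        } finally {
            con.setAutoCommit(previousAutoCommit);
        }
    }
}
```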
I am building a microservice that uses a transaction manager based on the Java Transaction API (JTA).
My question is: does the transaction manager have the ability to handle concurrency issues in distributed database scenarios?
Scenario:
Assume there are multiple instances of a service running, and we get two requests to increase the balance of an account by $10. Initially the account has $100; the first instance reads it and increments it by $10 (to $110), but has not committed yet.
At the same time, the second instance also retrieves the account, which is still at $100, increments it by $10 and commits, updating the balance to $110. Then the first instance commits its own update, setting the balance to $110 again.
By now you will have figured out that the balance was supposed to be incremented by $20, not $10. Do I have to write some kind of optimistic-lock exception mechanism to prevent this scenario, or will a transaction manager based on the JTA specification already ensure that it cannot happen?
does the transaction manager have the ability to handle concurrency issues in distributed database scenarios?
Transactions and concurrency are two independent concepts. Although transactions matter most in contexts where we also see concurrency, transactions can be important without concurrency.
To answer your question: no, a transaction manager generally does not concern itself with handling issues that arise from concurrent updates. It takes a very naive and simple (and often the most meaningful) approach: if, after the start of a transaction, it detects that the state has become inconsistent (because of concurrent updates), it simply raises an exception and rolls back the transaction. Only if it can establish that all the conditions of the ACID properties still hold will it commit the transaction.
For this type of request, you can use optimistic concurrency, where a column in the database (e.g. a timestamp) serves as the version number.
Each time a change is committed, the timestamp value is modified.
If two requests try to commit a change at the same time, only one of them will succeed, because by then the version (timestamp) column will have changed, which prevents the other request from committing its changes.
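A minimal JDBC sketch of this, assuming a hypothetical account(id, balance, version) table and a numeric version instead of a timestamp (a timestamp works the same way as long as every commit changes it). The version check and the write are one conditional UPDATE, and the returned row count tells the caller whether its request won.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class AccountDao {
    // Hypothetical table: account(id BIGINT PRIMARY KEY, balance DECIMAL, version BIGINT)
    static final String UPDATE_SQL =
            "UPDATE account SET balance = ?, version = version + 1 WHERE id = ? AND version = ?";

    // Returns true if our update was applied, false if another request changed the row
    // (and therefore its version) after we read it.
    static boolean updateBalance(Connection con, long id, BigDecimal newBalance, long versionRead)
            throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(UPDATE_SQL)) {
            ps.setBigDecimal(1, newBalance);
            ps.setLong(2, id);
            ps.setLong(3, versionRead);
            return ps.executeUpdate() == 1; // 0 rows means the version no longer matched
        }
    }
}
```

When updateBalance returns false, the caller would typically re-read the row and either retry the calculation or report the conflict.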
The transaction manager (as an implementation of the JTA specification) makes work spanning multiple resources transparent. It ensures that all the operations happen as a single unit of work. "Work spanning multiple resources" means, for example, that the application inserts data into a database and, in the same unit of work, sends a message to a JMS broker. The transaction manager guarantees that the ACID properties hold for these two operations. Put simply, when the transaction finishes successfully, the application developer can be sure both operations were processed. When something goes wrong, it is up to the transaction manager to handle it, typically by throwing an exception and rolling back the data changes, so that neither operation is processed.
This is transparent to the application developer, who does not need to update the database first, then JMS, and then check whether all the data changes were really processed or a failure happened.
In general, the JTA specification was not written with microservice architecture in mind. It really depends on your system design(!), but if you have two microservices, each with its own attached transaction manager, then the transaction manager can't help you sort out your concurrency issue. Transaction managers do not (usually) synchronize with each other. You are not working with multiple resources from one microservice (which is the use case for a transaction manager), but with one resource from multiple microservices.
Since there is only one resource, it is the synchronization point for all your updates, and how concurrency is managed depends on it. If it is a SQL database, that depends on the isolation level it uses (ACID: I = isolation, see https://en.wikipedia.org/wiki/ACID_(computer_science)). Your particular example describes the lost update phenomenon (https://vladmihalcea.com/a-beginners-guide-to-database-locking-and-the-lost-update-phenomena/), as both microservices try to update the same record. One way to avoid the issue is optimistic or pessimistic locking (you can implement it yourself, e.g. with timestamps as stated above). Another is to use the serializable isolation level in your database. Or you can design your application so that it does not read data and then update it based on what it read first, but instead makes the update atomic in a single SQL statement (there may be other strategies for working with your data model to achieve the desired outcome).
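To make the last option concrete, here is a replay of the question's interleaving against a hypothetical in-memory AccountRow. setBalance stands for an UPDATE whose new value the application computed from an earlier read; addToBalance stands for UPDATE account SET balance = balance + ? WHERE id = ?, where the read and the write happen inside one statement executed by the database.

```java
// Hypothetical single-row "table" used to replay the question's interleaving.
final class AccountRow {
    private long balance;

    AccountRow(long balance) {
        this.balance = balance;
    }

    // SELECT balance FROM account WHERE id = ?
    synchronized long select() {
        return balance;
    }

    // UPDATE account SET balance = ? WHERE id = ?  (value computed by the application)
    synchronized void setBalance(long newBalance) {
        balance = newBalance;
    }

    // UPDATE account SET balance = balance + ? WHERE id = ?  (read and write in one statement)
    synchronized void addToBalance(long delta) {
        balance += delta;
    }
}

final class Replay {
    // Both instances read 100, then each writes back its own read value + 10.
    static long lostUpdate() {
        AccountRow row = new AccountRow(100);
        long readByInstance1 = row.select();
        long readByInstance2 = row.select();
        row.setBalance(readByInstance2 + 10); // instance 2 commits 110
        row.setBalance(readByInstance1 + 10); // instance 1 commits 110 again: one +10 is lost
        return row.select(); // returns 110
    }

    // Each instance sends an atomic increment instead of a computed value.
    static long atomicIncrements() {
        AccountRow row = new AccountRow(100);
        row.addToBalance(10); // instance 2
        row.addToBalance(10); // instance 1
        return row.select(); // returns 120
    }
}
```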
In summary, it depends on how your transaction manager is implemented; it can help you in some ways, but that is not its purpose. Your goal should be to check how the isolation level is set up in the shared storage, and to consider whether your application needs to handle the lost update phenomenon at the application level or your storage can manage it for you.
I came across this Oracle Java tutorial. As a beginner in the topic, I cannot grasp why it's necessary to call con.setAutoCommit(true); at the end of the transaction.
Here is Oracle's explanation:
The statement con.setAutoCommit(true); enables auto-commit mode, which means that each statement is once again committed automatically when it is completed. Then, you are back to the default state where you do not have to call the method commit yourself. It is advisable to disable the auto-commit mode only during the transaction mode. This way, you avoid holding database locks for multiple statements, which increases the likelihood of conflicts with other users.
Could you explain it in other words? Especially this bit:
This way, you avoid holding database locks for multiple statements, which increases the likelihood of conflicts with other users.
What do they mean with "holding database locks for multiple statements"?
Thanks in advance.
The database has to perform row-level or table-level locking (depending on the storage engine, in MySQL) to handle transactions. If you keep auto-commit mode off and keep executing statements, these locks won't be released until you commit the transaction. Depending on the lock type, other transactions won't be able to update the row or table that is currently locked. setAutoCommit(true) commits the current transaction, releases the locks currently held, and enables auto-commit; that is, until you turn it off again, each individual statement is executed and committed on its own.
Row-level locks protect the individual rows that take part in the transaction (InnoDB). Table-level locks prevent concurrent access to the entire table (MyISAM).
When one transaction updates a row in the database, other transactions cannot alter that row until the first one finishes (commits or rolls back). Therefore, if you do not need transactions, it is advisable to call con.setAutoCommit(true).
With most modern database systems you can batch together a series of SQL statements. Typically the ones you care about are inserts, as these will block out a portion of the space on disk that is being written to. In JDBC this corresponds to Statement.addBatch(sql). Where this becomes problematic is when you try to implement pessimistic or optimistic locks on tuples in the database. If you have a series of long-running transactions that execute multiple batches, you can find yourself in a situation where all reads get rejected because of these exclusive locks. I believe that in Oracle there is no such thing as a dirty read, so this can potentially be mitigated. But imagine a scenario where you are running a job that attempts to delete a record while I am updating it: this is the type of conflict they are referring to.
With auto-commit on, each part of the batch is saved before moving on to the next unit of work. This is what you see when you try to persist millions of records and things slow down considerably: the system is ensuring consistency with each insert statement. A quick way to get around this in Oracle (if you are using Oracle) is to use the oracle.sql package and look at the ARRAY class.
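A sketch of standard JDBC batching with auto-commit off, so the whole load is one transaction instead of one per row. It assumes a hypothetical car(name) table, and batchSize is an arbitrary tuning parameter, not a recommendation. The trade-off discussed in this thread still applies: the locks taken by the inserts are held until the single commit.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class CarBatchLoader {
    // Hypothetical table: car(name VARCHAR(100)).
    static void insertAll(Connection con, List<String> names, int batchSize) throws SQLException {
        boolean previousAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // one transaction for the whole load instead of one per row
        try (PreparedStatement ps = con.prepareStatement("INSERT INTO car (name) VALUES (?)")) {
            int pending = 0;
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch(); // send this group of rows in one round trip
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the last, partial group
            }
            con.commit(); // all rows become visible together; locks are released here
        } catch (SQLException e) {
            con.rollback();
            throw e;
        } finally {
            con.setAutoCommit(previousAutoCommit);
        }
    }
}
```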
Most databases will autoCommit by default. That means that as soon as you execute a statement the results will immediately appear in the database and everyone else using the database will immediately see them.
There are times, however, when you need to perform a number of changes on the database which must all be done at once and if one fails you want to back out of all of them.
Say you have a cars database and you come across a new car from a new manufacturer. Here you may wish to create the manufacturer entry in your database and the new car record and make sure they both appear at once for other users. Otherwise there may be a confusing moment in your database where one exists without the other.
To achieve this you switch autoCommit off, execute the statements, commit them and then set autoCommit back on. This last switch on of autoCommit is probably what you are seeing.
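The steps in this answer, sketched in JDBC for the manufacturer/car example. The manufacturer and car tables and their columns are hypothetical; the point is the order of calls: setAutoCommit(false), the statements, commit() (or rollback() on failure), and setAutoCommit(true) in a finally block so the connection returns to the default mode even when something fails.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class CarCatalog {
    // Hypothetical tables: manufacturer(id, name) and car(id, manufacturer_id, model).
    static void addCarWithNewManufacturer(Connection con, long manufacturerId, String manufacturer,
                                          long carId, String model) throws SQLException {
        con.setAutoCommit(false); // start a multi-statement transaction
        try (PreparedStatement m = con.prepareStatement(
                     "INSERT INTO manufacturer (id, name) VALUES (?, ?)");
             PreparedStatement c = con.prepareStatement(
                     "INSERT INTO car (id, manufacturer_id, model) VALUES (?, ?, ?)")) {
            m.setLong(1, manufacturerId);
            m.setString(2, manufacturer);
            m.executeUpdate();
            c.setLong(1, carId);
            c.setLong(2, manufacturerId);
            c.setString(3, model);
            c.executeUpdate();
            con.commit(); // both rows become visible to other users at the same moment
        } catch (SQLException e) {
            con.rollback(); // neither row is kept
            throw e;
        } finally {
            con.setAutoCommit(true); // back to the default: each statement commits on its own
        }
    }
}
```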
I'm unable to find documentation that fully explains what happens when entities are deleted from the datastore (I'm using JDO deletePersistent) without being in a transaction. For the sake of performance and avoiding contention, I can afford to lose some data accuracy during parallel updates when not using transactions.
But when my code is running on different machines at the same time, how can I make sure that a delete operation is not overridden by a later update/put of an entity that another machine read earlier? I'm letting the PersistenceManager take care of implicit updates to attached objects.
EDIT:
Trying to update the entity after deletePersistent results in an exception, but only when updating the exact same copy that was passed to deletePersistent. If it is a different copy on another machine, would the update be treated as an update to a deleted entity (not valid), or as an insert or update that puts the entity back?
This is taken from the GAE documentation:
Using Transactions
A transaction is a set of Datastore operations on one or more entity. Each transaction is guaranteed to be atomic, which means that transactions are never partially applied. Either all of the operations in the transaction are applied, or none of them are applied.
An operation may fail when:
Too many users try to modify an entity group simultaneously.
The application reaches a resource limit.
The Datastore encounters an internal error.
Since transactions are guaranteed to be atomic, an ATOMIC operation like a single delete operation will always work inside or outside a transaction.
The answer is yes: even after the object was deleted, if it had been read before the delete and the update was committed after the delete, it is put back, because, as Nick Johnson commented, inserts and updates are the same operation. I tested this by sleeping the thread for 20 seconds after getting the object for update, which allowed the object to be deleted, and the object was then put back.
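A minimal in-memory model of the behaviour observed above, assuming (as the answer states) that a put is an unconditional upsert. UpsertStore is hypothetical; it is not the GAE Datastore or JDO API, it only shows why a stale copy written after a delete brings the entity back.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical key-value store where put() is an upsert: insert and update are the same operation.
final class UpsertStore {
    private final Map<String, String> entities = new HashMap<>();

    synchronized void put(String key, String value) {
        entities.put(key, value); // creates the entity again if it has been deleted
    }

    synchronized void delete(String key) {
        entities.remove(key);
    }

    synchronized Optional<String> get(String key) {
        return Optional.ofNullable(entities.get(key));
    }
}
```

Replaying the test from the answer: machine B reads the entity, machine A deletes it, and machine B then puts its modified copy; because nothing checks whether the entity still exists, the put simply recreates it.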
I have the following flow in a multi-threaded environment:
start transaction
read n number of top rows (based on a column) from db
check some criteria
update those set of rows
commit/rollback the transaction
I am using optimistic locking to handle the multi-threaded scenario, but in the above situation the DB always returns the same set of rows, so if a second thread runs at the same time, it will always fail.
Is there a better way to handle this?
Could we force the DB to return a different set of rows for each transaction using some option?
The reason you are getting the same top n records for all your threads is because of the I in the ACID (atomicity, consistency, isolation, durability) principles of transactions. Isolation means other operations cannot access data that has been modified during a transaction that has not yet completed. So until your threads commit their transactions the other threads cannot see what they have done.
It is possible to change the Isolation level on most databases to one of the following:
SERIALIZABLE
REPEATABLE READ
READ COMMITTED
READ UNCOMMITTED
In your case you probably want READ UNCOMMITTED as it allows one transaction to see uncommitted changes made by some other transaction.
Note: This is almost certainly the wrong isolation level for most applications and could lead to data corruption. If applications other than the one you described here access the same database, you probably don't want to change the isolation level, as those applications may start to see unexpected and incorrect behaviour.
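If you do decide to change the level, JDBC lets you set it per connection. A hedged sketch: IsolationConfig.trySetIsolation is a hypothetical helper that first asks DatabaseMetaData.supportsTransactionIsolationLevel, because not every database offers every level (Oracle Database, for example, supports READ COMMITTED and SERIALIZABLE but not READ UNCOMMITTED). The JDBC documentation leaves the effect of changing the level in the middle of a transaction implementation-defined, so call it before the transaction starts.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.SQLException;

final class IsolationConfig {
    // Applies the requested level only if the driver reports support for it; returns whether it did.
    static boolean trySetIsolation(Connection con, int level) throws SQLException {
        DatabaseMetaData meta = con.getMetaData();
        if (!meta.supportsTransactionIsolationLevel(level)) {
            return false; // leave the connection at its current level
        }
        con.setTransactionIsolation(level); // e.g. Connection.TRANSACTION_READ_UNCOMMITTED
        return true;
    }
}
```

Usage: IsolationConfig.trySetIsolation(con, Connection.TRANSACTION_READ_UNCOMMITTED) on a freshly obtained connection, falling back to another strategy when it returns false.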