datastore: deleting entities outside transactions - java

I'm unable to find documentation that fully explains what happens when entities are deleted from the datastore (I'm using JDO deletePersistent) outside of a transaction. I can afford to lose data accuracy during parallel updates when not using transactions, for the sake of performance and to avoid contention.
But how can I make sure, when my code is running on different machines at the same time, that a delete operation will not be overridden by a later update/put based on an earlier read of that entity on another machine? I'm letting the PersistenceManager take care of implicit updates to attached objects.
EDIT:
Trying to update the entity after deletePersistent results in an exception, but only when updating the exact same copy that was passed to deletePersistent. If it is a different copy on another machine, will that be treated as an update of a deleted entity (and therefore invalid), or as an insert/update that puts the entity back?

This is taken from GAE Documentation:
Using Transactions
A transaction is a set of Datastore operations on one or more entities.
Each transaction is guaranteed to be atomic, which means that
transactions are never partially applied. Either all of the operations
in the transaction are applied, or none of them are applied.
An operation may fail when:
- Too many users try to modify an entity group simultaneously.
- The application reaches a resource limit.
- The Datastore encounters an internal error.
Since transactions are guaranteed to be atomic, an ATOMIC operation like a single delete operation will always work inside or outside a transaction.

The answer is yes: if the object was read before the delete and the update is committed after the delete is committed, the object will be put back, because, as @Nick Johnson commented, inserts and updates are the same operation. I tested this using a 20-second thread sleep after getting the object for update, allowing it to be deleted and then put back.
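For reference, here is a minimal JDO sketch of that test; the Account class, the key and the way the PersistenceManager is obtained are placeholders I'm assuming for illustration, not code from the question:

    import javax.jdo.PersistenceManager;

    public class DeleteVsUpdateDemo {
        // Hypothetical sketch: read an entity, wait while another instance deletes it,
        // then let the implicit update put it back when the PersistenceManager closes.
        public static void updateAfterDelay(PersistenceManager pm, Object key)
                throws InterruptedException {
            try {
                Account account = pm.getObjectById(Account.class, key); // placeholder entity
                Thread.sleep(20000); // window in which another machine calls deletePersistent
                // Attached object: the change is flushed implicitly on close, and the
                // low-level put re-creates the entity that was just deleted.
                account.setAmount(account.getAmount() + 1);
            } finally {
                pm.close();
            }
        }
    }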

Related

how does lost update differ from non-repeatable read?

I am trying to understand isolation levels and the various issues, i.e. dirty read, non-repeatable read, phantom read and lost update.
I was reading about non-repeatable read, and had also read about lost update.
What I am confused about is that the two look very similar to me: in a non-repeatable read (NRR), Tx B updates the row between two reads of the same row by Tx A, so Tx A gets different results.
In the case of a lost update, Tx B overwrites changes committed by Tx A.
So both of these seem quite similar and related.
Is that correct?
My understanding is that if we use 'optimistic locking' it will prevent the issue of 'lost update' (based on some very good answers here).
My confusion:
However, would it also imply/mean that by using 'optimistic locking' we also eliminate the issue of 'non-repeatable read'?
All of these questions pertain to a Java J2EE application with Oracle database.
NOTE: to avoid distractions I am not looking for details pertaining to dirty reads and phantom reads; my focus presently is entirely on non-repeatable reads and lost updates.
Non-repeatable reads, lost updates, phantom reads, as well as dirty reads, are about transaction isolation levels, rather than pessimistic/optimistic locking. I believe Oracle's default isolation level is read committed, meaning that only dirty reads are prevented.
Non-repeatable reads and lost updates are indeed somewhat related, as they may or may not occur at the same isolation level. Neither can be avoided by locking alone unless you set the correct isolation level, but you can use versioning (a column value that is checked against and incremented on every update) to at least detect the issue (and take the necessary action).
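To illustrate the versioning idea, here is a sketch in plain JDBC; the accounts table, its amount and version columns, and the exception used are assumptions for the example:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class VersionedUpdate {
        // Detect a lost update by only applying the change if the row still has
        // the version we read earlier; the version is incremented on every update.
        public static void updateAmount(Connection con, long id, long newAmount,
                                        long expectedVersion) throws SQLException {
            String sql = "UPDATE accounts SET amount = ?, version = version + 1 "
                       + "WHERE id = ? AND version = ?";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setLong(1, newAmount);
                ps.setLong(2, id);
                ps.setLong(3, expectedVersion);
                if (ps.executeUpdate() == 0) {
                    // Someone else changed (or deleted) the row since we read it.
                    throw new IllegalStateException("Row changed since it was read");
                }
            }
        }
    }

The lost update is detected rather than silently applied; what to do next (retry, merge, report to the user) is an application decision.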
The purpose of repeatable reads is to provide read-consistent data:
- within a query, all the results should reflect the state of the data at a specific point in time.
- within a transaction, the same query should return the same results even if it is repeated.
In Oracle, queries are read-consistent as of the moment the query started. If data changes during the query, the query reads the version of the data that existed at the start of the query. That version is available in the "UNDO".
Bottom line: Oracle by default has an isolation level of READ COMMITTED, which guarantees read-consistent data within a query, but not within a transaction.
You talk about Tx A and Tx B. In Oracle, a session that does not change any data does not have a transaction.
Assume the default isolation level of READ COMMITTED. Assume the J2EE application uses a connection pool and is stateless.
app thread A connects to session X and reads a row.
app thread B connects to session Y and updates the row with commit.
app thread A connects to session Z and reads the same row, seeing a different result.
Notice that there is nothing any database can do here. Even if all the sessions had the SERIALIZABLE isolation level, session Z has no idea what is going on in session X. Besides, thread A cannot leave a transaction hanging in session X when it disconnects.
To your question, notice that app thread A never changed any data. The human user behind app thread A queried the same data twice and saw two different results, that is all.
Now let's do an update:
app thread A connects to session X and reads a row.
app thread B connects to session Y and updates the row with commit.
app thread A connects to session Z and updates the same row with commit.
Here the same row had three different values, not two. The human user behind thread A saw the first value and changed it to the third value without ever seeing the second value! That is what we mean by a "lost update".
The idea behind optimistic locking is to notify the human user that, between the time they queried the data and the time they asked to update it, someone else changed the data first. They should look at the most recent values before confirming the update.
To simplify:
"non-repeatable reads" happen if you query, then I update, then you query.
"lost updates" happen if you query, then I update, then you update. Notice that if you query the data again, you need to see the new value in order to decide what to do next.
Suggested reading: https://blogs.oracle.com/oraclemagazine/on-transaction-isolation-levels
Best regards, Stew Ashton

Reliably tracking changes made by Hibernate

I'm using a PostUpdateEventListener registered via
registry.appendListeners(EventType.POST_COMMIT_UPDATE, listener)
and a few other listeners in order to track changes made by Hibernate. This works perfectly, however, I see a problem there:
Let's say, for tracking some amount by id, I simply execute
amountByIdConcurrentMap.put(id, amount);
on every POST_COMMIT_UPDATE (let's ignore other operations). The problem is that this call happens some time after the commit. So with two commits writing the same entity shortly one after the other, I can receive the events in the wrong order, ending up with the older amount stored.
Is this really possible or are the operations synchronized somehow?
Is there a way how to prevent or at least detect such situation?
Two questions first, then a proposal:
1. Are you sure that you need this optimization? Why not fetch the amount as it is written to the database by querying the database? What gives you a reason to work with caching?
2. How do you make sure that the calculation of the amount before writing it to the database is properly synchronized, so that multiple threads (or possibly nodes) do not use old data to calculate the amount and thereby overwrite the result of a later calculation?
I suppose you handle question number 2 correctly. Then you have two options:
Pessimistic locking: that means that immediately before the commit you can exclusively update your cache without concurrency issues.
Optimistic locking: in that case you have a kind of timestamp or counter in your database record that you can also put into the cache together with the amount. You can use this value to find out which value is the more recent one.
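To make the optimistic variant concrete, a small sketch of a version-aware cache; the class and field names are my own, not from the question:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class VersionedAmountCache {
        // Amount together with the version/counter read from the database record.
        public static final class VersionedAmount {
            final long version;
            final long amount;
            VersionedAmount(long version, long amount) {
                this.version = version;
                this.amount = amount;
            }
        }

        private final ConcurrentMap<Long, VersionedAmount> cache = new ConcurrentHashMap<>();

        // Called from the POST_COMMIT_UPDATE listener; an out-of-order event carrying
        // an older version is simply ignored, so the newer value always wins.
        public void put(long id, long version, long amount) {
            cache.merge(id, new VersionedAmount(version, amount),
                    (current, candidate) ->
                            candidate.version > current.version ? candidate : current);
        }
    }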
No, there are no ordering guarantees, so you'll have to take care to ensure proper synchronization manually.
If the real problem you are solving is caching of entity state and if it is suitable to use second-level cache for the entity in question, then you would get everything out of the box by enabling the L2 cache.
Otherwise, instead of updating the map from the update listeners directly, you could submit tasks to an Executor or messaging system that would asynchronously start a new transaction and select for update the amount for the given id from the database. Then update the map in the same transaction while holding the corresponding row lock in the db, so that map updates for the same id are done serially.
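A rough sketch of that last suggestion, assuming Spring's TransactionTemplate and JdbcTemplate are available and a hypothetical accounts table; it is only meant to show the shape of the approach:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.transaction.support.TransactionTemplate;

    public class AmountRefresher {
        private final ExecutorService executor = Executors.newSingleThreadExecutor();
        private final ConcurrentMap<Long, Long> amountById = new ConcurrentHashMap<>();
        private final TransactionTemplate txTemplate;
        private final JdbcTemplate jdbc;

        public AmountRefresher(TransactionTemplate txTemplate, JdbcTemplate jdbc) {
            this.txTemplate = txTemplate;
            this.jdbc = jdbc;
        }

        // Called from the POST_COMMIT_UPDATE listener instead of writing the map directly.
        public void refresh(long id) {
            executor.submit(() -> txTemplate.execute(status -> {
                // SELECT ... FOR UPDATE holds the row lock until this transaction ends,
                // so map updates for the same id are serialized against each other.
                Long amount = jdbc.queryForObject(
                        "SELECT amount FROM accounts WHERE id = ? FOR UPDATE",
                        Long.class, id);
                amountById.put(id, amount);
                return null;
            }));
        }
    }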

JPA Optimistic locking

I have some trouble understanding the OPTIMISTIC LockMode.
Let's consider in the following scenario: "Thread A creates a Transaction and reads a list of all Users from Table USERS. Thread B updates a user in Table USERS. Thread B commits. Thread A commits".
Assume I am using OPTIMISTIC locking. Would the second commit in this case cause an OptimisticLockException to be thrown?
Because according to this docu: "During commit (and flush), ObjectDB checks every database object that has to be updated or deleted, and compares the version number of that object in the database to the version number of the in-memory object being updated. The transaction fails and an OptimisticLockException is thrown if the version numbers do not match".
No exception should be thrown, because the version number is checked only for those entities which have to be updated or deleted.
BUT
This docu is saying: "JPA Optimistic locking allows anyone to read and update an entity, however a version check is made upon commit and an exception is thrown if the version was updated in the database since the entity was read. "
According to this description the exception should be thrown, because the version check is made upon commit (I assume they mean every commit, including commits that follow a read).
I want the described scenario not to throw any concurrency exception; it's no problem if Thread A returns a list of users that is not the most recent. So is it correct to use optimistic locking, and if not, which lock type should I use?
The two links you gave say the same thing. If an entity is being updated in TransactionA and it has been modified in the DB by TransactionB since TransactionA read the entity, then an OptimisticLockException will be thrown.
In your case you are retrieving a list of all Users in threadA, but you update only one. You will get the OptimisticLockException only if the same entity was changed and committed (or a commit was attempted) in threadB.
You would want an exception to be thrown in this case; otherwise only one of the updates would succeed: the last one to commit would simply override the earlier commit. But which one is last is somewhat indeterminate (sometimes threadA, sometimes threadB), and the DB contents would not be as intended. So locking prevents this undesired behaviour.
If your application transactions regularly collide on the same data, consider using pessimistic locking instead, as also described in https://blogs.oracle.com/carolmcdonald/entry/jpa_2_0_concurrency_and
Optimistic locking is very simple to understand:
Each entity has a timestamp / version number attribute.
Each time an entity is updated, the timestamp / version number is updated too.
When you update an entity, the current timestamp / version number is first read from the persistence layer (the database); if it has changed since the time you loaded the entity, an OptimisticLockException is thrown, otherwise the entity is updated along with a new timestamp / version number.
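As a minimal sketch, this is what the version attribute looks like with JPA's built-in optimistic locking (the entity and its fields are just an example):

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Version;

    @Entity
    public class User {
        @Id
        private Long id;

        private String name;

        // JPA increments this on every update and compares it at flush/commit time;
        // a mismatch raises an OptimisticLockException.
        @Version
        private Long version;

        // getters and setters omitted
    }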
If you have no risk of concurrent updates then you shouldn't use any locking mechanism, because even the optimistic one has a performance impact (you have to check the timestamp before updating the entity).
Pessimistic locking is a scalability issue because it allows only one update access at a time to a given resource (so other non-read-only accesses are blocked), but it prevents operations from failing. If you can't afford to lose an operation, go with pessimistic locking if scalability is not an issue; otherwise handle the concurrency mitigation at the business level.
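And if you do conclude that pessimistic locking fits better, a sketch with JPA 2.0 (again using the example User entity from above):

    import javax.persistence.EntityManager;
    import javax.persistence.LockModeType;

    public class UserService {
        // Sketch only: acquires a database row lock, blocking other writers
        // until the surrounding transaction commits or rolls back.
        public void rename(EntityManager em, Long id, String newName) {
            User user = em.find(User.class, id, LockModeType.PESSIMISTIC_WRITE);
            user.setName(newName);
        }
    }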

Ehcache getting out of sync with database

On a Java EE server using CMT, I am using ehcache to implement a caching layer between the business object layer (EJBs) and the Data Access layer (POJOs using JDBC). I seem to be experiencing a race condition between two threads accessing the same record while using a self-populating Ehcache. The cache is keyed on the primary key of the record.
The scenario is:
The first thread updates the record in the database and removes the record from the cache (but the database commit doesn't necessarily happen immediately; there may be other queries to follow).
The second thread reads the record, causing the cache to be re-populated.
The first thread commits the transaction.
This is all happening in a fraction of a second. It results in the cache being out of sync with the database, and subsequent reads of the record returning the stale cached data until another update is performed, or the entry expires from the cache. I can handle stale data for short periods (the typical length of a transaction), but not minutes, which is how long I would like to cache objects.
Any suggestions for avoiding this race condition?
UPDATE:
Clearing the cache after the transaction has committed would certainly be ideal. The question is, in a J2EE environment using CMT, when the caching layer is sandwiched between the business layer (stateless session EJBs) and the data access layer, how to do this?
To be clear about the constraints this imposes, the method call in question may or may not be in the same transaction as additional method calls that happen before or after. I can't force a commit (or do this work in a separate transaction) since that would change the transaction boundaries from what the client code expects. Any subsequent exceptions would not roll back the entire transaction (unnecessarily clearing the cache in this case is an acceptable side-effect). I can't control the entry points into the transaction, as it is essentially an API that clients can use. It is not reasonable to push the responsibility of clearing the cache to the client application.
I would like to be able to defer any cache clearing operations until the entire transaction is committed by the EJB container, but I have found no way to hook into that logic and run my own code with a stateless session bean.
UPDATE #2:
The most promising solution so far, short of a major design change, is to use ehcache 2.0's JTA support: http://ehcache.org/documentation/apis/jta
This means upgrading to ehcache 2.x and enabling XA transactions for the database as well, which could potentially have negative side-effects. But it seems like the "right" way.
You are using transactions, so it makes more sense to remove the cached entry after the commit; that is when the change really happens.
That way you see the old data only during the length of the transaction, and all reads afterwards have the latest view.
Update: since this is CMT-specific, you should look at the SessionSynchronization interface and its afterCompletion() method. This is shown in this tutorial.
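A sketch of that approach; note that SessionSynchronization only works with stateful session beans, so this assumes the cache-clearing call can live in a stateful bean, and the cache field and key handling are placeholders:

    import javax.ejb.SessionSynchronization;
    import javax.ejb.Stateful;

    import net.sf.ehcache.Cache;

    @Stateful
    public class RecordService implements SessionSynchronization {
        private Cache recordCache;          // placeholder for the injected/looked-up cache
        private Object pendingEvictionKey;  // primary key of the record changed in this tx

        public void updateRecord(Object key /*, new values... */) {
            // ... perform the update through the data access layer ...
            pendingEvictionKey = key;       // remember what to evict, but do not evict yet
        }

        @Override
        public void afterBegin() { }

        @Override
        public void beforeCompletion() { }

        @Override
        public void afterCompletion(boolean committed) {
            // Runs after the container has committed (or rolled back) the CMT transaction.
            if (committed && pendingEvictionKey != null) {
                recordCache.remove(pendingEvictionKey);
            }
            pendingEvictionKey = null;
        }
    }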

Hibernate/Ehcache: evicting collections from 2nd level cache not synchronized with other DB reads

I have an application using JPA, Hibernate and ehcache, as well as Spring's declarative
transactions. The load on DB is rather high so everything is cached to speed things up,
including collections. Now it is not a secret that collections are cached separately
from the entities that own them so if I delete an entity that is an element of such
cached collection, persist an entity that should be an element of one, or update an
entity such that it travels from one collection to another, I have to perform the eviction
by hand.
So I use a hibernate event listener which keeps track of entities being inserted, deleted
or updated and saves that info for a transaction synchronization registered with Spring's
transaction manager to act upon. The synchronization then performs the eviction once the
transaction is committed.
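For reference, a minimal sketch of the kind of synchronization described above (the actual code isn't shown in the question, so the names, the Hibernate 4-style SessionFactory.getCache() call, and the use of TransactionSynchronizationAdapter are my assumptions):

    import java.io.Serializable;

    import org.hibernate.SessionFactory;
    import org.springframework.transaction.support.TransactionSynchronizationAdapter;
    import org.springframework.transaction.support.TransactionSynchronizationManager;

    public class CollectionEvictionSynchronization extends TransactionSynchronizationAdapter {
        private final SessionFactory sessionFactory;
        private final String collectionRole;  // e.g. "com.example.Parent.children"
        private final Serializable ownerId;

        public CollectionEvictionSynchronization(SessionFactory sessionFactory,
                                                 String collectionRole,
                                                 Serializable ownerId) {
            this.sessionFactory = sessionFactory;
            this.collectionRole = collectionRole;
            this.ownerId = ownerId;
        }

        // Called from the Hibernate event listener while the transaction is still active.
        public void register() {
            TransactionSynchronizationManager.registerSynchronization(this);
        }

        @Override
        public void afterCommit() {
            // Evict the owner's cached collection only once the change has been committed.
            sessionFactory.getCache().evictCollection(collectionRole, ownerId);
        }
    }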
Now the problem is that quite often, some other concurrent transaction manages to find
a collection in the cache that has just been evicted (these events are usually tenths of a
second apart according to the log) and, naturally, causes an EntityNotFoundException to occur.
How do I synchronize this stuff correctly?
I tried doing the eviction in each of the 4 methods of TransactionSynchronization (which
are invoked at different points in time relative to transaction completion), but it didn't help.
Essentially what you need to do is force a read from the database in the event that a collection is in the process of being evicted or has just been evicted. One way to do this would be to mark the collection as dirty as soon as a request to evict it has been received, but before entering the transaction that changes it. Any concurrent transaction that comes along will check the dirty flag: if it is set to true, it should get the data from the database, otherwise it can read from the cache. You might need to change your DB transaction settings so that concurrent transactions block until the one updating the data finishes, so that the correct data is read from the DB. Once the transaction finishes, you can reset the dirty flag to false.
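A small sketch of the dirty-flag bookkeeping; the key type (collection role or owner id) and the method names are assumptions:

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    public class CollectionDirtyFlags {
        // Keys (collection roles or owner ids) currently being changed/evicted.
        private final Set<Object> dirty = ConcurrentHashMap.newKeySet();

        // Call before starting the transaction that will change and evict the collection.
        public void markDirty(Object key) {
            dirty.add(key);
        }

        // Call after the transaction has committed and the eviction is done.
        public void clearDirty(Object key) {
            dirty.remove(key);
        }

        // Readers consult this: if true, bypass the 2nd-level cache and go to the database.
        public boolean isDirty(Object key) {
            return dirty.contains(key);
        }
    }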
You can also create a lock on the cached collection when an update, insert or delete is due for as long as the eviction lasts. This will ensure that no other transaction can read/change the cached collection till the eviction process finishes.
Why can't you just keep the collections up to date? I.e. when you add an object, add it to the collection it belongs to; when you delete an object, remove it from the collection it is in. In my experience, when using a cache with Hibernate or JPA, the state of the object (not the state of the database) is cached, so you need to make sure your object model in memory is in sync with the object model in the database.
Or am I missing something? Why can't you simply keep the collections up to date?
I think you should refer to this link:
Hibernate: Clean collection's 2nd level cache while cascade delete items
Hibernate does not actually delete the object from the cache; you can get the rest of the answer from the link above.
