How should I handle deletion of objects in Object Relational Mapping - java

I have had this confusion about how to handle the different lifecycle events of an object when it is persisted in a database. I know this can vary with design, but I would like to know any standard / best practices that are followed in such cases.
Say I have a User class like this:
class User {
int id;
String name;
}
An object of this type represents a row in the database. The objects are accessed by multiple threads. So, how do you manage the object? What are its methods? How do you implement deletion of such an object?
Say there are two threads, A and B which are accessing the same user object.
// Thread A
User user = UserFactory.getUser(userId);

// Thread B
User user = UserFactory.getUser(userId);
user.delete(); // or UserFactory.delete(userId)
Then what happens to the user object in thread A? Any clarification would be very helpful.
Thanks!

Then what happens to the user object in thread A? Any clarification would be very helpful.
Nothing. Of course, that is probably your problem. What will happen if you try to persist in thread A after the deletion is highly dependent on the ORM you are using, but my guess, assuming you're using Hibernate, is that it will fail on the UPDATE/DELETE because the open Session in thread A does not know that the row is missing. This happens frequently.
Notice that thread A can always mutate the user object freely without error as long as it does not persist/delete it. The error will only happen when you go to persist/delete. In that case, whoever persists/deletes first wins (no error).
People mitigate this problem in a variety of ways:
1. Silently swallow the exception and either reinsert the object or ignore it.
2. Propagate the exception (usually an optimistic locking exception) and convert it to a user-friendly error.
3. Never allow deletes in the first place; use a boolean column to mark rows as deleted instead.
4. Make your application implement optimistic locking in sync with your ORM's optimistic locking.
5. Use transactions and/or synchronized (single JVM).
6. Use transactions and row locking (SELECT ... FOR UPDATE).
In most situations I go with number 2 and/or 3. Number 5 is the most pessimistic: it requires lots of resources and carries the potential of deadlocking.
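Option 2 above can be illustrated with a plain-Java sketch of the version check an ORM performs under the hood. UserRow and StaleObjectException are made-up illustration names, not Hibernate classes (Hibernate's real counterpart is org.hibernate.StaleObjectStateException on a versioned entity):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java sketch of the version check behind option 2. UserRow and
// StaleObjectException are hypothetical names for illustration only.
class StaleObjectException extends RuntimeException {}

class UserRow {
    final AtomicInteger version = new AtomicInteger(0);
    volatile String name;

    // The write succeeds only if nobody bumped the version since we read it;
    // the loser gets an exception it can convert to a user-friendly error.
    void update(int expectedVersion, String newName) {
        if (!version.compareAndSet(expectedVersion, expectedVersion + 1)) {
            throw new StaleObjectException();
        }
        this.name = newName;
    }
}
```

A thread that read the row at version 0 and persists after another thread has already bumped it to 1 gets the exception instead of silently overwriting the other thread's write.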

From what I understand, as far as threads accessing an object goes, it is a free-for-all: both of them may have access to the same object unless synchronization/locking is used.
From what I can gather from your pseudo code, deleting the object would remove it from the system entirely (including from any threads that are accessing it), but I believe it depends on how you delete it. If you delete an object within the thread itself, it should only be deleted from that thread, but if you delete the object outside of the threads it should be deleted from the entire system.
From what I see of your code, I am having trouble determining whether you are trying to delete the object within thread B or elsewhere in the code, but whichever you are trying to accomplish, I hope the explanation above helps.

Related

Concurrency with Hibernate in Spring

I found a lot of posts on this topic, but all the answers were just links to documentation with no example code, i.e., how to use concurrency in practice.
My situation: I have an entity House with (for simplification) two attributes: number (the id) and owner. The database is initialized with 10 Houses, with numbers 1-10 and owner always null.
I want to assign a new owner to the house with currently no owner and the smallest number. My code looks like this:
@Transactional
void assignNewOwner(String newOwner) {
    // this is flagged as @Transactional too
    House tmp = houseDao.getHouseWithoutOwnerAndSmallestNumber();
    tmp.setOwner(newOwner);
    // this is flagged as @Transactional too
    houseDao.update(tmp);
}
To my understanding, although @Transactional is used, the same House could be assigned twice to different owners if two requests fetch the same empty House as tmp. How do I ensure this cannot happen?
I know that including the update in the selection of the empty House would solve the issue, but in the near future I want to modify/work with the tmp object more.
Optimistic
If you add a version column to your entity/table, you can take advantage of a mechanism called optimistic locking. This is the most efficient way of making sure that the state of an entity has not changed since we obtained it in a transactional context.
Once you create a query using the session, you can call setLockMode(LockModeType.OPTIMISTIC);
Then, just before the transaction is committed, the persistence provider will query for the current version of that entity and check whether it has been incremented by another transaction. If so, you will get an OptimisticLockException and a transaction rollback.
Pessimistic
If you do not version your rows, then you are left with pessimistic locking, which basically means that you physically create a lock on the queried entities at the database level, and other transactions cannot read/update those rows.
You achieve that by setting this on the Query object:
setLockMode(LockModeType.PESSIMISTIC_READ);
or
setLockMode(LockModeType.PESSIMISTIC_WRITE);
Actually it's pretty easy - at least in my opinion - and I am going to abstract away from what Hibernate will generate when you say pessimistic/optimistic. You might think this is SELECT FOR UPDATE - but that's not always the case; MSSQL AFAIK does not have that...
These are JPA annotations and they guarantee some functionality, not the implementation.
Fundamentally they are entirely different things - PESSIMISTIC vs OPTIMISTIC locking. When you do pessimistic locking you sort of do a synchronized block, at least logically - you can do whatever you want and you are safe within the scope of the transaction. Now, whether the lock is held on the row, the table or even the page is unspecified, so it's a bit dangerous. Databases may also escalate locks; MSSQL does that if I recall correctly.
Obviously lock starvation is an issue, so you might think that OPTIMISTIC locking would help. As a side note, this is what transactional memory is in modern CPUs; they use the same thinking process.
So optimistic locking is like saying: I will mark this row with an ID/date, etc., take a snapshot of it and work with that - before committing I will check whether that ID has changed. Obviously there is contention on that ID, but not on the data. If it has changed - abort (i.e. throw an OptimisticLockException); otherwise commit the work.
The thing that bothers everyone, IMO, is that OptimisticLockException - how do you recover from it? And here is something you are not going to like - it depends. There are apps where a simple retry would be enough, and apps where that would be impossible. I have used it in rare scenarios.
I usually go with pessimistic locking (unless optimistic is totally not an option). At the same time I would look at what Hibernate generates for that query. For example, you might need an index on the column by which the entry is retrieved for the DB to actually lock just the row - because ultimately that is what you want.
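Following the synchronized-block analogy above, here is a minimal plain-Java sketch of how a PESSIMISTIC_WRITE row lock behaves. The House class and its ReentrantLock are stand-ins for what the database does internally, not actual Hibernate code:

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of what a PESSIMISTIC_WRITE lock behaves like. The ReentrantLock
// stands in for the row lock the database holds until the transaction commits.
class House {
    private final ReentrantLock rowLock = new ReentrantLock();
    String owner; // null means the house is unowned

    // Analogue of SELECT ... FOR UPDATE followed by UPDATE in one transaction:
    // a second caller blocks until the first "transaction" releases the lock.
    boolean assignIfFree(String newOwner) {
        rowLock.lock();
        try {
            if (owner != null) {
                return false; // another transaction assigned it first
            }
            owner = newOwner;
            return true;
        } finally {
            rowLock.unlock(); // commit releases the row lock
        }
    }
}
```

The second of two concurrent callers is guaranteed to see the first caller's committed write, which is exactly the double-assignment problem the question describes.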

A good strategy/solution for concurrency with hibernate's optimistic/pessimistic locking

Let's presume that we have an application "mail client" and a front-end for it.
If a user is typing a message or editing the subject or whatever, a REST call is made to update whatever the user was changing (e.g. the receivers) to keep the message in DRAFT. So a lot of PUTs happen to save the message. When closing the window, an update of every editable field happens at the same time. Hibernate can't handle this concurrency: each of those calls retrieves the message, edits its own fields and tries to save the message again, while another call has already changed it.
I know I can add a REST call to save all fields at the same time, but I was wondering if there is a cleaner solution, or a decent strategy for handling such cases (for example only updating one field, or some merge strategy if the object has already changed).
Thanks in advance!
The easiest solutions here would be to tweak the UI to either:
Submit a single REST call during email submission that does all the necessary tasks.
Serialize the REST calls so they're chained rather than firing concurrently.
The concern I have here is that this will snowball at some point and become a bigger concurrency problem as more users interact with the application. Consider for a moment the number of concurrent REST calls your web infrastructure will have to support when you're faced with 100, 500, 1000, or even 10000 or more concurrent users.
Does it really make sense to beef up the volume of servers to handle that load when the load itself is a product of a design flaw in the first place?
Hibernate is designed to handle locking through two mechanisms, optimistic and pessimistic.
Optimistic Way
1. Read the entity from the data store.
2. Cache a copy of the fields you're going to modify in temporary variables.
3. Modify the field or fields based on your PUT operation.
4. Attempt to merge the changes.
5. If the save succeeds, you're done.
6. Should an OptimisticLockException occur, refresh the entity state from the data store.
7. Compare the cached values to the fields you must change.
8. If the values differ, you can assert or throw an exception.
9. If they don't differ, go back to step 4.
The beautiful part of the optimistic approach is that you avoid any form of deadlock, particularly if you're allowing multiple tables to be read and locked separately.
While you can use pessimistic lock options, optimistic locking is generally the best-accepted way to handle concurrent operations, as it has the least contention and performance impact.
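The steps above can be sketched as a plain-Java retry loop. OptimisticLockException here is a simplified stand-in for javax.persistence.OptimisticLockException, and the Runnable plays the role of steps 1-4 (read, cache, modify, merge):

```java
// Sketch of the retry loop from the steps above; names are illustrative
// stand-ins, not a real JPA integration.
class OptimisticLockException extends RuntimeException {}

public class RetryLoop {
    static final int MAX_RETRIES = 3;

    // Retries the unit of work on a version conflict, rethrowing if the
    // conflict persists after MAX_RETRIES attempts.
    static void saveWithRetry(Runnable mergeAttempt) {
        for (int attempt = 1; ; attempt++) {
            try {
                mergeAttempt.run(); // steps 1-4
                return;             // step 5: the save succeeded
            } catch (OptimisticLockException e) {
                if (attempt == MAX_RETRIES) {
                    throw e; // step 8: give up and surface the conflict
                }
                // steps 6-7: refresh the entity and re-check fields,
                // then loop back to step 4
            }
        }
    }
}
```

In a real application the Runnable body would re-read the entity through the EntityManager before re-applying the change, so each retry merges against fresh state.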

Hibernate SaveOrUpdate - multiple workthreads

While I have been able to find information on how Hibernate's transactions work so the database doesn't get corrupted, it has been harder to understand how Hibernate treats an object which is shared between threads, where each thread tries to save it to the database.
This is my theoretical question:
1) I have a Person object with attributes (ssn, name, address).
2) Three threads have a reference to this person object and each thread calls the method savePersonToHibernate(...)
public void savePersonToHibernate(Person person)
{
    ...
    session.saveOrUpdate(person);
    ...
}
How does Hibernate cope with 3 threads writing the same object to storage? Does it put all the transactions in a queue, so that when the first thread creates the row and identifier (sets the id), the remaining two threads will only update it with (in this case) no changes? Or could I actually end up with 2 or 3 rows in the database, with my object only referring to the last identifier created?
I hope that makes some kind of sense... I'm building a queue system, and the data needs to be tied to categories which need to be created on the fly... and if two or more threads get data which needs the same category created, I'd hate to end up with duplicates.
I hope this makes sense... what would you do?
I'm assuming that all the mentioned threads use different sessions; otherwise you are in trouble, as a Hibernate session is not thread-safe.
Just to make things clear: if all three threads are using the same instance of person and it is a new object, you are in trouble, as Hibernate doesn't do any synchronization when accessing or modifying the object. Basically each thread works as though the other threads do not exist, so each will first check whether person has a non-null id, try to generate an id if it is null, and then assign it to the appropriate entity field. Depending on the timing of check-generate-assign in the different threads, and the visibility effects of the changes, the result of concurrent creation is unpredictable.
Let's see what happens if all threads are using different instances of person but with the same attribute values. In this case each thread will try to create its own row in the database, and if there are no unique constraints on the underlying table (like a unique name) they will all succeed.
Your particular scenario with category creation is not very straightforward to implement. The idea is to try to create the category, but catch the exception if it already exists. In the latter case, read the existing category from the database and use it. But keep in mind that the implementation of a conditional insert is not trivial and may be RDBMS-dependent. You may find slightly more complex but related examples of upsert operations for PostgreSQL and SQL Server.
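The try-insert-then-read pattern described above can be sketched in plain Java, with an in-memory map standing in for a table that has a unique constraint on the category name. DuplicateKeyException and the DAO shape are assumptions for illustration:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the conditional insert: try the INSERT, and if the unique
// constraint fires, re-read the row the other thread created. Names here
// are illustrative stand-ins, not a real DAO framework.
class DuplicateKeyException extends RuntimeException {}

class CategoryDao {
    private final Map<String, Integer> table = new ConcurrentHashMap<>();
    private final AtomicInteger ids = new AtomicInteger();

    // Simulates an INSERT against a table with a unique constraint on name.
    void insert(String name) {
        if (table.putIfAbsent(name, ids.incrementAndGet()) != null) {
            throw new DuplicateKeyException();
        }
    }

    int findByName(String name) {
        return table.get(name);
    }

    // The pattern itself: attempt the insert, and on a duplicate fall back
    // to reading the winner's row, so no thread ever creates a duplicate.
    int getOrCreate(String name) {
        try {
            insert(name);
        } catch (DuplicateKeyException alreadyExists) {
            // another thread created the category first; use theirs
        }
        return findByName(name);
    }
}
```

Against a real database the catch clause would trap the RDBMS's constraint-violation exception and re-query, which is why the answer notes the details are RDBMS-dependent.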

Lock a particular method in java

In my Java web application, there is an action which updates the order object and saves it to the DB through an ajax call (POST request).
The method saveOrder() performs this action.
If multiple users perform the same action, there should be a lock on this method, so that the write transaction is performed with the latest data.
The class file code is as follows
public class OrderLoader extends JSONProcessSimple {
    @Override
    public JSONObject exec(JSONObject jsonsent) throws JSONException, ServletException {
        JSONObject result = this.saveOrder(array);
        return result;
    }

    public JSONObject saveOrder(JSONArray jsonarray) throws JSONException {
        JSONObject jsonResponse = new JSONObject();
        // Write operation on DB
        return jsonResponse;
    }
}
Is this possible through a synchronized approach? Please suggest a solution.
Thanks in advance!
Depending on the architecture of your application (does it run in multiple parallel instances in a clustered environment?), there is no simple solution; if it is executed in only one VM, synchronized could be an approach. Also, have a look at the java.util.concurrent.locks package.
For a more sophisticated, distributed approach, you could implement a DB-based lock.
A better solution would be to check your database isolation and your SQL. Perhaps you need a SERIALIZABLE connection or a transaction manager. That's the server side.
It's easy to add a synchronized keyword, but I think it's more than that.
http://docs.oracle.com/javase/tutorial/jdbc/basics/transactions.html
http://www.precisejava.com/javaperf/j2ee/JDBC.htm
Synchronization alone would not do the job, since you'd block until the first request is saved and then you'd still have to check whether the other orders are newer or not.
As the others already stated, there's no easy solution, and it depends on your architecture and environment.
You might want to try an optimistic locking approach, i.e. each update checks a version column and increments it if the version matches. Something like ... SET version = x + 1 WHERE version = x, after which you check whether any rows were updated or not.
That would not guarantee the latest order is saved, but it would prevent lost updates. You could, however, adapt the approach to only update the database when you have newer data (maybe based on some date, using WHERE date > x).
EDIT:
Since you're using Hibernate, I'd look into Hibernate's optimistic locking. That would at least handle concurrent edits, since only the first one would succeed. If non-concurrent edits result in OptimisticLockExceptions, you are probably missing a (re)read somewhere.
By concurrent edit I mean: user A reads the object, changes it and then triggers the write. In between, user B has also read the object and triggers a write later. The writes are not concurrent, but user B didn't see the changes of user A, which might result in lost updates.
In your case it depends on what operations are done on an order. In some cases you might safely reread the order just before persisting the changes (e.g. when adding positions; it might even be OK to do so when deleting them - if the position doesn't exist you just do nothing), while in other cases you might want to report the concurrent edit (e.g. when two users edit the quantity of the same position, the order header, etc.).
If your method updates orders for different users, then you don't need any synchronization, as every thread hitting the method will have its own copy, so it's not a concern.
But if all users are acting on the same data, you can wrap the write to the DB in beginTransaction and endTransaction. This should make concurrent writes thread-safe.

Parallel updates to different entity properties

I'm using JDO to access Datastore entities. I'm currently running into issues because different processes access the same entities in parallel and I'm unsure how to go around solving this.
I have entities containing values and calculated values: (key, value1, value2, value3, calculated)
The calculation happens in a separate task queue.
The user can edit the values at any time.
If the values are updated, a new task is pushed to the queue that overwrites the old calculated value.
The problem I currently have is in the following scenario:
1. User creates the entity
2. Task is started
3. User notices an error in his initial entry and quickly updates the entity
4. Task finishes based on the old data (from step 1) and overwrites the entire entity, also removing the newly entered values (from step 3)
5. User is not happy
So my questions:
Can I make the task fail on update in step 4? Wrapping the task in a transaction does not seem to solve this issue in all cases due to eventual consistency (or, quite possibly, my understanding of Datastore transactions is just wrong).
Is using the low-level setProperty method the only way to update a single field of an entity, and will this solve my problem?
If none of the above, what's the best way to deal with a use case like this?
Background:
At the moment, I don't mind trading performance for consistency. I will care about performance later.
This was my first AppEngine application, and because it was a learning process, it does not use some of the best practices. I'm well aware that, in hindsight, I should have thought longer and harder about my data schema. For instance, none of my entities use ancestor relationships where they would be appropriate. I come from a relational background and it shows.
I am planning a major refactoring, probably moving to Objectify, but in the meantime I have a few urgent issues that need to be solved ASAP. And I'd like to first fully understand the Datastore.
Obviously JDO comes with optimistic concurrency checking (should the user enable it) for transactions, which would prevent/reduce the chance of such things. Optimistic concurrency is equally applicable to relational datastores, so you likely know what it does.
Google's JDO plugin uses the low-level API's setProperty() method, obviously. The log even tells you what low-level calls are made (in terms of PUT and GET). Moving to some other API will not, on its own, solve such problems.
Whenever you need to handle write conflicts in GAE, you almost always need transactions. However, it's not just as simple as "use a transaction":
First of all, make sure each logical unit of work can be defined in a transaction. There are limits to transactions; no queries without ancestors, only a certain number of entity groups can be accessed. You might find you need to do some extra work prior to the transaction starting (ie, lookup keys of entities that will participate in the transaction).
Make sure each unit of work is idempotent. This is critical. Some units of work are automatically idempotent, for example "set my email address to xyz". Some units of work are not automatically idempotent, for example "move $5 from account A to account B". You can make transactions idempotent by creating an entity before the transaction starts, then deleting the entity inside the transaction. Check for existence of the entity at the start of the transaction and simply return (completing the txn) if it's been deleted.
When you run a transaction, catch ConcurrentModificationException and retry the process in a loop. Now when any txn gets conflicted, it will simply retry until it succeeds.
The only bad thing about collisions here is that they slow the system down and waste effort during retries. However, you will still get at least one completed transaction per second (maybe a bit less if you have XG transactions) of throughput.
Objectify4 handles the retries for you; just define your unit of work as a run() method and run it with ofy().transact(). Just make sure your work is idempotent.
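The idempotency trick described above (create a marker entity before the transaction, delete it inside the transaction, and return early if it is already gone) can be sketched in plain Java. The marker set stands in for a datastore entity, and all names are hypothetical:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the idempotency marker: a record created before the transaction
// is deleted inside it, so a retried unit of work becomes a no-op.
class IdempotentWork {
    private final Set<String> pendingMarkers = ConcurrentHashMap.newKeySet();
    private final AtomicInteger seq = new AtomicInteger();
    private final AtomicInteger executions = new AtomicInteger();

    // Before the transaction: create the marker entity.
    String begin() {
        String token = "txn-" + seq.incrementAndGet();
        pendingMarkers.add(token);
        return token;
    }

    // Inside the transaction: check the marker still exists, delete it, then
    // do the real work. Safe to run any number of times with the same token.
    void runOnce(String token) {
        if (!pendingMarkers.remove(token)) {
            return; // marker already consumed: work was done, just return
        }
        executions.incrementAndGet(); // the real work, e.g. move $5 from A to B
    }

    int executions() {
        return executions.get();
    }
}
```

Because the marker delete and the real work happen in the same transaction, a retried transaction either sees the marker (and redoes everything atomically) or finds it gone (and commits a no-op), which is what makes non-idempotent work like a money transfer safe to retry.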
The way I see it, you can either prevent the first task from updating the object because certain values have changed since the task was first launched, or you can embed the object's values within the task request so that the second calc task restores the object's state with consistent values and calculated members.
