While I have been able to find information on how Hibernate's transactions work, so the database doesn't get corrupted, it has been harder to understand how Hibernate treats an object that is shared between threads when each thread tries to save it to the database.
This is my theoretical question:
1) I have a Person object with attributes (ssn, name, address).
2) Three threads have a reference to this person object and each thread calls the method savePersonToHibernate(...)
public void savePersonToHibernate(Person person)
{
    ...
    session.saveOrUpdate(person);
    ...
}
How does Hibernate cope with three threads writing the same object to storage? Does it put all the transactions in a queue, so that when the first thread creates the row and the identifier (sets the id), the remaining two threads will only update it with (in this case) no changes? Or will I actually end up with two or three rows in the database, with my object only referring to the last identifier created?
I hope it makes some kind of sense... I'm building a queue system, and the data needs to refer to categories that have to be created on the fly... and if two or more threads get data that needs the same category created, I'd hate to end up with duplicates.
I hope this makes sense... what would you do?
I'm assuming that all the threads mentioned use different sessions; otherwise you are in trouble, as a Hibernate Session is not thread-safe.
Just to make things clear: if all three threads are using the same instance of person and it is a new object, you are in trouble, as Hibernate doesn't do any synchronization when accessing or modifying the object. Basically, each thread works as though the other threads do not exist, so each will first check whether the person has a non-null id, try to generate an id if it is null, and then assign it to the appropriate entity field. Depending on the timing of check-generate-assign in the different threads, and on the visibility of each thread's changes to the others, the result of concurrent creation is unpredictable.
Now let's see what happens if all the threads use different instances of person, but with the same attribute values. In this case the three threads will try to create three different rows in the database, and if there are no unique constraints on the underlying table (such as a unique name) they will succeed.
Your particular scenario with category creation is not entirely straightforward to implement. The idea is to try to create the category but catch the exception if it already exists; in the latter case, read the existing category from the database and use it. Keep in mind that implementing a conditional insert is not trivial and may be RDBMS dependent. You can find slightly more complex but related examples of upsert operations for PostgreSQL and SQL Server.
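A minimal sketch of that pattern might look like the following; the Category entity, its unique name column, and the DAO wrapper are illustrative assumptions, and the exact exception you catch can vary by Hibernate version and dialect:

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.exception.ConstraintViolationException;

public class CategoryDao {

    // Sketch only: assumes a Category entity whose "name" column has a UNIQUE constraint.
    public Category getOrCreateCategory(SessionFactory sessionFactory, String name) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            Category category = new Category(name);
            session.save(category);
            tx.commit();                   // the insert succeeds for exactly one thread
            return category;
        } catch (ConstraintViolationException e) {
            tx.rollback();                 // another thread created the category first
        } finally {
            session.close();
        }

        // Fall through: read the row the winning thread created.
        Session readSession = sessionFactory.openSession();
        try {
            return (Category) readSession
                    .createQuery("from Category c where c.name = :name")
                    .setParameter("name", name)
                    .uniqueResult();
        } finally {
            readSession.close();
        }
    }
}

Whichever thread loses the race falls through to the query and reuses the category created by the winner, so no duplicates appear as long as the unique constraint is in place.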
Related
I have an application with Hibernate. There are two main threads: the first one collects/modifies data and the second one saves data into the database. In certain cases the program may try to modify and save the same entity at the same time.
Do I have to make all entities thread-safe (use only synchronized collections, atomic objects instead of primitives, ...), or does Hibernate take care of it automatically?
Hibernate instantiates objects per session, so classic synchronization is not needed (and would not be helpful).
The most common way to take care of concurrent data access and modifications is to use locks.
I am extracting the following lines from the famous book - Mastering Enterprise JavaBeans™ 3.0.
Concurrent Access and Locking: Concurrent access to data in the database is always protected by transaction isolation, so you need not design additional concurrency controls to protect your data in your applications if transactions are used appropriately. Unless you make specific provisions, your entities will be protected by container-managed transactions using the isolation levels that are configured for your persistence provider and/or EJB container’s transaction service. However, it is important to understand the concurrency control requirements and semantics of your applications.
Then it talks about Java Transaction API, Container Managed and Bean Managed Transaction, different TransactionAttributes, different Isolation Levels. It also states that -
The Java Persistence specification defines two important features that can be tuned for entities that are accessed concurrently:
1. Optimistic locking using a version attribute
2. Explicit read and write locks
OK - I read everything and understood it well. But the question is: in which scenarios do I need to use all these techniques? If I use container-managed transactions and they do everything for me, why do I need to bother about all these details? I know the significance of the transaction attributes (REQUIRED, REQUIRES_NEW) and know in which cases to use them, but what about the others? More specifically -
Why do I need Bean Managed transaction?
Why do we need Read and Write Lock on Entity classes?
Why do we need version attribute?
For Q2 and Q3 - I think entity classes are not thread-safe and hence we need locking there. But the database is managed at the EJB level by the JTA API (as stated in the first paragraph), so why do we need to manage the entity classes separately? I know how the lock and version mechanisms work and why they are required. But why do they come into the picture when JTA is already present?
Can you please provide any answer to them? If you give me some URLs even that will be very highly appreciated.
Many thanks in advance.
You don't need locking just because entity classes are not thread-safe; entities simply must not be shared between threads, that's all.
Your database comes with ACID guarantees, but that is not always sufficient, and you sometimes need to explicitly lock rows to get what you need. Imagine the following scenarios:
transaction A reads employee 1 from database
transaction B reads employee 1 from database
transaction A sets employee 1 salary to 3000
transaction B sets employee 1 salary to 4000
transaction A commits
transaction B commits
The end result is that the salary is 4000. The user that started transaction A is completely unaware that even though he set the salary to 3000, another user, concurrently, set it to 4000. Depending on which transaction writes last, the end result is different (and thus unpredictable). That's the kind of situation that can be avoided using optimistic locking.
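A hedged sketch of how a version attribute catches that lost update; the Employee mapping below is just the example entity from the scenario, with made-up field names:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Employee {

    @Id
    private Long id;

    private int salary;

    // Hibernate increments this column on every update and adds
    // "where version = ?" to the UPDATE statement it issues.
    @Version
    private int version;

    public void setSalary(int salary) {
        this.salary = salary;
    }
}

With this mapping, whichever of the two transactions commits last updates zero rows, because the version it originally read no longer matches, and Hibernate reports an optimistic locking failure (a StaleObjectStateException, surfaced as an OptimisticLockException under JPA) instead of silently overwriting the other user's change.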
Next scenario: you want to generate purely sequential invoice numbers, without gaps and without duplicates. You could imagine reading and incrementing a value in the database to do that. But two transactions might both read the same value concurrently and then increment it; you would thus get a duplicate. Taking a lock on the table row holding the next number avoids this situation.
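A sketch of that approach using JPA's pessimistic locking; InvoiceSequence and its single counter row are assumptions, and the method has to run inside an active transaction:

import javax.persistence.EntityManager;
import javax.persistence.LockModeType;

public class InvoiceNumberGenerator {

    // Reads the counter row with a PESSIMISTIC_WRITE lock (typically translated
    // to SELECT ... FOR UPDATE), so a concurrent transaction blocks on the same
    // row until this transaction commits.
    public long nextInvoiceNumber(EntityManager em) {
        InvoiceSequence seq = em.find(InvoiceSequence.class, 1L, LockModeType.PESSIMISTIC_WRITE);
        long next = seq.getNextValue();
        seq.setNextValue(next + 1);
        return next;
    }
}

The second transaction asking for the same row simply waits at the find() call until the first one commits, so both can never hand out the same value.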
The EntityManager maintains a first-level cache for retrieved objects, but if you want a thread-safe application you create and close an EntityManager for each transaction.
So what's the point of the level 1 cache if those EntityManagers are created and closed for every transaction? Or is the EntityManager's cache only useful if you're working in a single thread?
The point is to have an application that works like you expect it to work, and that wouldn't be slow as hell. Let's take an example:
Order order = em.find(Order.class, 3L);
Customer customer = em.find(Customer.class, 5L);
for (Order o : customer.getOrders()) { // line A
    if (o.getId().longValue() == 3L) {
        o.setComment("hello"); // line B
        o.setModifier("John");
    }
}
System.out.println(order.getComment()); // line C
for (Order o : customer.getOrders()) { // line D
    System.out.println(o.getComment()); // line E
}
At line A, JPA executes a SQL query to load all the orders of the customer.
At line C, what do you expect to be printed? null or "hello"? You expect "hello" to be printed, because the order you modified at line B has the same ID as the one loaded in the first line. That wouldn't be possible without the first-level cache.
At line D, you don't expect the orders to be loaded again from the database, because they have already been loaded at line A. That wouldn't be possible without the first-level cache.
At line E, you expect once again "hello" to be printed for the order 3. That wouldn't be possible without the first-level cache.
At line B, you don't expect an update query to be executed, because there might be many subsequent modifications (like in the next line), to the same entity. So you expect these modifications to be written to the database as late as possible, all in one go, at the end of the transaction. That wouldn't be possible without the first-level cache.
The first level cache serves other purposes. It is basically the context in which JPA places the entities retrieved from the database.
Performance
So, to start by stating the obvious, it avoids having to retrieve a record that has already been retrieved, serving as a form of cache during transaction processing and improving performance. Also, think about lazy loading: how could you implement it without a cache recording which entities have already been lazily loaded?
Cyclic Relationships
This caching purpose is vital to the implementation of a proper ORM framework. In object-oriented languages it is common for the object graph to have cyclic relationships: for instance, a Department that has Employee objects, and those Employee objects belong to a Department.
Without a context (also known as a Unit of Work) it would be difficult to keep track of which records you have already mapped; you would end up creating new objects, and in a case like this you might even end up in an infinite loop.
Keep Track of Changes: Commit and Rollback
Also, this context keeps track of the changes you make to the objects so that they can be persisted or rolled back later, when the transaction ends. Without a cache like this you would be forced to flush your changes to the database immediately as they happen, and then you could not roll back, nor could you choose the best moment to flush them to the store.
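A small sketch to illustrate; it reuses the Order entity from the example above and assumes an EntityManagerFactory named emf:

EntityManager em = emf.createEntityManager();
em.getTransaction().begin();

Order order = em.find(Order.class, 3L);
order.setComment("hello");   // tracked in the persistence context, no SQL yet
order.setModifier("John");   // still no SQL

// All tracked changes are written in one go here...
em.getTransaction().commit();
// ...or they could all be discarded by calling em.getTransaction().rollback() instead.
em.close();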
Object Identity
Object identity is also vital in ORM frameworks. That is, if you retrieve the Employee with ID 123, then any time you need that Employee within the same context, you should always get the same object, not a new object containing the same data.
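In code, within a single persistence context (a sketch, reusing the hypothetical Employee entity):

// Same persistence context, same identifier => the very same Java instance,
// not a second copy of the row.
Employee first  = em.find(Employee.class, 123L);
Employee second = em.find(Employee.class, 123L);
assert first == second;   // both references point to the same object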
This type of cache is not meant to be shared by multiple threads; if it were, you would compromise performance and force everyone to pay that penalty even when a single-threaded solution would be fine, besides ending up with a much more complex solution: like killing a fly with a bazooka.
That is the reason why if what you need is a shared cache, then you actually need a 2nd-level cache, and there are implementations for that as well.
This seems like it would come up often, but I've Googled to no avail.
Suppose you have a Hibernate entity User. You have one User in your DB with id 1.
You have two threads running, A and B. They do the following:
A gets user 1 and closes its Session
B gets user 1 and deletes it
A changes a field on user 1
A gets a new Session and merges user 1
All my testing indicates that the merge attempts to find user 1 in the DB (it can't, obviously), so it inserts a new user with id 2.
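For reference, a condensed sketch of that test; it assumes a User entity with a generated id and an ordinary name property, with transaction and error handling trimmed:

// Session/Thread A: load user 1, then close the session (the instance becomes detached)
Session sessionA = sessionFactory.openSession();
User detached = (User) sessionA.get(User.class, 1L);
sessionA.close();

// Session/Thread B: delete user 1 and commit
Session sessionB = sessionFactory.openSession();
sessionB.beginTransaction();
sessionB.delete(sessionB.get(User.class, 1L));
sessionB.getTransaction().commit();
sessionB.close();

// Session/Thread A again: modify the detached instance and merge it
detached.setName("changed");
Session sessionC = sessionFactory.openSession();
sessionC.beginTransaction();
sessionC.merge(detached);            // no row with id 1 exists any more...
sessionC.getTransaction().commit();  // ...so Hibernate INSERTs a fresh row (id 2)
sessionC.close();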
My expectation, on the other hand, would be that Hibernate would see that the user being merged was not new (because it has an ID). It would try to find the user in the DB, which would fail, so it would not attempt an insert or an update. Ideally it would throw some kind of concurrency exception.
Note that I am using optimistic locking through @Version, and that does not help matters.
So, questions:
Is my observed Hibernate behaviour the intended behaviour?
If so, is it the same behaviour when calling merge on a JPA EntityManager instead of a Hibernate Session?
If the answer to 2. is yes, why is nobody complaining about it?
Please see the text from the Hibernate documentation below.
Copy the state of the given object onto the persistent object with the same identifier. If there is no persistent instance currently associated with the session, it will be loaded. Return the persistent instance. If the given instance is unsaved, save a copy of and return it as a newly persistent instance.
It clearly states: copy the state (data) of the object onto the persistent object in the database; if the object is not there, then save a copy of that data. When Hibernate saves a copy, it always creates a record with a new identifier.
Hibernate's merge function works roughly as follows:
It checks the status of the entity (attached to or detached from the session) and finds it detached.
It then tries to load the entity by its identifier, but does not find it in the database.
Since the entity is not found, it treats the entity as transient.
A transient entity always results in a new database record with a new identifier.
Locking is always applied to attached entities. If the entity is detached, Hibernate will always load it first, and the version value gets updated.
Locking is used to control concurrency problems; it is not the concurrency issue itself.
I've been looking at JSR-220, from which Session#merge claims to get its semantics. The JSR is sadly ambiguous, I have found.
It does say:
Optimistic locking is a technique that is used to insure that updates
to the database data corresponding to the state of an entity are made
only when no intervening transaction has updated that data since the
entity state was read.
If you take "updates" to include general mutation of the database data, including deletes, and not just a SQL UPDATE, which I do, I think you can make an argument that the observed behaviour is not compliant with optimistic locking.
Many people agree, given the comments on my question and the subsequent discovery of this bug.
From a purely practical point of view, the behaviour, compliant or not, could lead to quite a few bugs, because it is contrary to many developers' expectations. There does not seem to be an easy fix for it. In fact, Spring Data JPA seems to ignore this issue completely by blindly using EM#merge. Maybe other JPA providers handle this differently, but with Hibernate this could cause issues.
I'm actually working around this by using Session#update currently. It's really ugly, and requires code to handle the case where you try to update an entity that is detached while a managed copy of it is already in the session. But at least it won't lead to spurious inserts.
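Roughly, the workaround looks like this sketch; the field copying and exception handling are simplified assumptions, not a polished recipe:

// update() re-attaches the detached instance and never falls back to an INSERT;
// if the row has been deleted, the flush fails with a StaleStateException
// instead of silently inserting a new row.
try {
    session.update(detached);
} catch (org.hibernate.NonUniqueObjectException e) {
    // A managed copy with the same id is already in this session:
    // copy the changed fields onto it by hand instead.
    User managed = (User) session.get(User.class, detached.getId());
    managed.setName(detached.getName());
}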
1. Is my observed Hibernate behaviour the intended behaviour?
The behavior is correct. You are just trying to perform operations that are not protected against concurrent data modification :) If you have to split the operation into two sessions, then find the object again in the second session and check that it is still there, throwing an exception if it is not. If it is there, lock it using em.find(Class, primaryKey, LockModeType), or protect the object until the end of the transaction using @Version or @Entity(optimisticLock=OptimisticLockType.ALL/DIRTY/VERSION).
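For illustration, a sketch of re-finding and locking the row in the second transaction, using a JPA EntityManager em; the getters/setters and the exception you throw are assumptions:

// Assumes "detached" is the stale copy loaded in the first session/transaction.
em.getTransaction().begin();

// Re-read the row inside the second transaction, taking a pessimistic write lock.
User current = em.find(User.class, detached.getId(), LockModeType.PESSIMISTIC_WRITE);
if (current == null) {
    // The row was deleted by a concurrent transaction in the meantime.
    throw new OptimisticLockException("User " + detached.getId() + " was deleted concurrently");
}
current.setName(detached.getName());   // copy over the changes you still want to keep

em.getTransaction().commit();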
2. If so, is it the same behaviour when calling merge on a JPA EntityManager instead of a Hibernate Session?
Probably: yes
3. If the answer to 2. is yes, why is nobody complaining about it?
Because if you protect your operations using pessimistic or optimistic locking, the problem will disappear :)
The problem you are trying to solve is called: Non-repeatable read
I have been confused about how to handle the different lifecycle events of an object when it is persisted in a database. I know this can vary with design, but I would like to know of any standard / best practice followed in such cases.
Say I have a User class like this -
class User {
    int id;
    String name;
}
An object of this type represents a row in the database. The objects are accessed by multiple threads. So, how do you manage the object? What are its methods? How do you implement deletion of such an object?
Say there are two threads, A and B which are accessing the same user object.
// Thread A
User user = UserFactory.getUser(userId);
// Thread B
User user = UserFactory.getUser(userId);
user.delete(); // Or UserFactory.delete(userId)
Then what happens to the user object in thread A? Any clarification would be very helpful.
Thanks!
Then what happens to the user object in thread A? Any clarification would be very helpful.
Nothing. Of course, that is probably your problem. What happens if you try to persist in thread A after the deletion is highly dependent on the ORM you are using, but my guess, assuming you're using Hibernate, is that it will fail when it tries to do an UPDATE/DELETE, as the open Session in thread A does not know that the row is missing. This happens frequently.
Notice that thread A will always be able to mutate the user object freely, without error, as long as it does not persist/delete. The error will only happen when you go to persist/delete. In that case, whichever thread persists/deletes first wins (no error).
People mitigate this problem in a variety of ways:
Silently swallow the exception and either reinsert the object or ignore it.
Promote the exception (usually an optimistic locking exception) and convert it to a user-friendly error.
Never allow deletes in the first place; use a boolean column to mark rows as deleted (see the sketch after this list).
Make your application implement Optimistic Locking in sync with your ORM's optimistic locking.
Use transactions and/or synchronized (single JVM)
Use transactions and row locking SELECT ... FOR UPDATE.
In most situations I go with number 2 and/or 3. Number 5 is the most pessimistic and requires lots of resources, with the potential for deadlocking.
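As an illustration of option 3, a sketch of a soft-delete mapping; the column name and helper method are made up:

import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class User {

    @Id
    private Long id;

    private String name;

    // Soft-delete flag: "deleting" a user just flips this column,
    // so stale references in other threads still point at an existing row.
    private boolean deleted;

    public void markDeleted() {
        this.deleted = true;
    }
}

UserFactory.delete(userId) would then just set the flag with an ordinary UPDATE, and queries would filter on deleted = false, so thread A's reference never points at a vanished row.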
From what I understand, as far as threads accessing an object goes, it is a free-for-all, which means both of them may have access to the same object unless synchronization/locking is used.
From what I am able to gather from your pseudocode, your deletion of the object would completely remove it from the system (including from any threads that are accessing it), but I believe it depends on the way you delete it. If you delete an object within the thread itself, it should only be deleted for that thread, but if you delete the object outside of the threads, it should be deleted from the entire system.
From what I see of your code, I am having trouble determining if you are trying to delete the object within Thread B or delete it in the rest of the code, but whichever you are trying to accomplish, I hope my above explanation helps.