JPA merge vs. persist [duplicate] - java

This question already has answers here:
JPA EntityManager: Why use persist() over merge()?
So far, my preference has been to always use EntityManager's merge() to take care of both insert and update. But I have also noticed that merge performs an additional SELECT query before the update/insert to check whether the record already exists in the database.
I am now working on a project that requires extensive (bulk) inserts into the database. From a performance point of view, does it make sense to use persist instead of merge in a scenario where I absolutely know that I am always creating a new instance of the objects to be persisted?

It's not a good idea to use merge when persist suffices, because merge does quite a lot more work. The topic has been discussed on StackOverflow before, and this article explains the differences in detail, with some nice flow diagrams to make things clear.

I would definitely go with persist() if, as you said:
(...) I absolutely know that I am always creating a new instance of objects to be persisted (...)
That's what this method is all about: it will protect you in cases where the entity already exists (and will roll back your transaction).

If you're using the assigned generator, using merge instead of persist can cause a redundant SQL statement, therefore affecting performance.
Calling merge for managed entities is also a mistake, since managed entities are automatically managed by Hibernate and their state is synchronized with the database record by the dirty checking mechanism upon flushing the Persistence Context.
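To make the redundant statement concrete, here is a minimal sketch in plain Java. It is a toy model, not Hibernate's implementation: Book, ToyEntityManager and the statement log are hypothetical names invented for illustration, and the toy runs its "statements" immediately instead of at flush time. It only mimics the pattern described above, where merge on an entity with an assigned identifier has to look the row up first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity whose identifier is assigned by the application.
class Book {
    final String isbn;
    String title;

    Book(String isbn, String title) {
        this.isbn = isbn;
        this.title = title;
    }
}

// Toy stand-in for an EntityManager; it only records which statements would run.
class ToyEntityManager {
    private final Map<String, String> table = new HashMap<>();  // fake table: isbn -> title
    private final Map<String, Book> managed = new HashMap<>();  // fake persistence context
    final List<String> sqlLog = new ArrayList<>();

    // persist: the caller declares the entity new, so no lookup is needed.
    void persist(Book book) {
        managed.put(book.isbn, book);
        table.put(book.isbn, book.title);
        sqlLog.add("INSERT");
    }

    // merge: an assigned id does not reveal whether a row exists, so look first.
    Book merge(Book source) {
        Book target = managed.get(source.isbn);
        if (target != null) {
            // Already managed: the toy issues no statement for this call.
            target.title = source.title;
            return target;
        }
        sqlLog.add("SELECT");
        String storedTitle = table.get(source.isbn);
        target = new Book(source.isbn, source.title);
        managed.put(target.isbn, target);
        if (storedTitle == null) {
            table.put(target.isbn, target.title);
            sqlLog.add("INSERT");
        } else if (!storedTitle.equals(target.title)) {
            table.put(target.isbn, target.title);
            sqlLog.add("UPDATE");
        }
        return target;
    }
}

class PersistVsMergeDemo {
    public static void main(String[] args) {
        ToyEntityManager viaPersist = new ToyEntityManager();
        viaPersist.persist(new Book("isbn-1", "First"));
        System.out.println("persist: " + viaPersist.sqlLog);

        ToyEntityManager viaMerge = new ToyEntityManager();
        viaMerge.merge(new Book("isbn-2", "Second"));
        System.out.println("merge:   " + viaMerge.sqlLog);
    }
}
```

In the toy, persisting a new Book logs a single INSERT, while merging a new Book logs a SELECT followed by an INSERT. For bulk inserts that extra lookup is paid once per entity.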
To understand how all this works, you should first know that Hibernate shifts the developer mindset from SQL statements to entity state transitions.
Once an entity is actively managed by Hibernate, all changes are going to be automatically propagated to the database.
Hibernate monitors currently attached entities. But for an entity to become managed, it must be in the right entity state.
First, we must define all entity states:
New (Transient)
A newly created object that hasn’t ever been associated with a Hibernate Session (a.k.a Persistence Context) and is not mapped to any database table row is considered to be in the New (Transient) state.
To become persisted, the entity must either be passed explicitly to the EntityManager#persist method or be reached through the transitive persistence mechanism.
Persistent (Managed)
A persistent entity has been associated with a database table row and it’s being managed by the current running Persistence Context. Any change made to such entity is going to be detected and propagated to the database (during the Session flush-time).
With Hibernate, we no longer have to execute INSERT/UPDATE/DELETE statements. Hibernate employs a transactional write-behind working style and changes are synchronized at the very last responsible moment, during the current Session flush-time.
Detached
Once the current running Persistence Context is closed all the previously managed entities become detached. Successive changes will no longer be tracked and no automatic database synchronization is going to happen.
To associate a detached entity to an active Hibernate Session, you can choose one of the following options:
Reattaching
Hibernate (but not JPA 2.1) supports reattaching through the Session#update method.
A Hibernate Session can only associate one Entity object for a given database row. This is because the Persistence Context acts as an in-memory cache (first level cache) and only one value (entity) is associated with a given key (entity type and database identifier).
An entity can be reattached only if there is no other JVM object (matching the same database row) already associated to the current Hibernate Session.
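The one-object-per-row rule can be sketched with a plain identity map. This is a toy model under stated assumptions, not the Hibernate Session: Account and ReattachSession are hypothetical names, and the IllegalStateException merely stands in for the error a real Session would raise (Hibernate reports this case with a NonUniqueObjectException).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity used only for this sketch.
class Account {
    final long id;
    String owner;

    Account(long id, String owner) {
        this.id = id;
        this.owner = owner;
    }
}

// Toy session: a first-level cache keyed by entity type and identifier.
class ReattachSession {
    private final Map<String, Object> identityMap = new HashMap<>();

    private static String key(Account account) {
        return Account.class.getName() + "#" + account.id;
    }

    // Simulates loading a row: always returns the single managed instance for that id.
    Account load(long id, String ownerFromDb) {
        Account candidate = new Account(id, ownerFromDb);
        Object existing = identityMap.putIfAbsent(key(candidate), candidate);
        return existing != null ? (Account) existing : candidate;
    }

    // Simulates reattaching: allowed only if no other object already occupies the key.
    void update(Account detached) {
        Object existing = identityMap.get(key(detached));
        if (existing != null && existing != detached) {
            throw new IllegalStateException(
                "a different object with id " + detached.id + " is already associated with this session");
        }
        identityMap.put(key(detached), detached);
    }

    boolean contains(Account account) {
        return identityMap.get(key(account)) == account;
    }
}

class ReattachDemo {
    public static void main(String[] args) {
        Account detached = new Account(1L, "alice");

        ReattachSession fresh = new ReattachSession();
        fresh.update(detached);                       // fine: the key was free
        System.out.println(fresh.contains(detached));

        ReattachSession busy = new ReattachSession();
        busy.load(1L, "alice");                       // another object now owns the key
        try {
            busy.update(detached);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```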
Merging
The merge is going to copy the detached entity state (source) to a managed entity instance (destination). If the merging entity has no equivalent in the current Session, one will be fetched from the database.
The detached object instance will continue to remain detached even after the merge operation.
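The copy semantics of merge can be illustrated with a small plain-Java model. It is only a sketch under stated assumptions: Post and MergeContext are hypothetical names, the map plays the role of the database, and the toy only covers rows that already exist.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity used only for this sketch.
class Post {
    final long id;
    String title;

    Post(long id, String title) {
        this.id = id;
        this.title = title;
    }
}

// Toy persistence context: merge copies state onto a managed instance and returns it.
class MergeContext {
    private final Map<Long, String> rows;                     // fake table: id -> title
    private final Map<Long, Post> managed = new HashMap<>();  // fake first-level cache

    MergeContext(Map<Long, String> rows) {
        this.rows = rows;
    }

    Post merge(Post detached) {
        Post target = managed.get(detached.id);
        if (target == null) {
            // No equivalent in the context: fetch one from the "database".
            if (!rows.containsKey(detached.id)) {
                throw new IllegalStateException("toy model only covers existing rows");
            }
            target = new Post(detached.id, rows.get(detached.id));
            managed.put(target.id, target);
        }
        target.title = detached.title;  // copy state from source to destination
        return target;                  // the argument itself is never attached
    }

    boolean contains(Post post) {
        return managed.get(post.id) == post;
    }

    // Writes the state of every managed instance back to the fake table.
    void flush() {
        for (Post post : managed.values()) {
            rows.put(post.id, post.title);
        }
    }

    String row(long id) {
        return rows.get(id);
    }
}

class MergeDemo {
    public static void main(String[] args) {
        Map<Long, String> rows = new HashMap<>();
        rows.put(1L, "old title");
        MergeContext context = new MergeContext(rows);

        Post detached = new Post(1L, "new title");
        Post managed = context.merge(detached);

        detached.title = "ignored";  // not tracked: still detached
        managed.title = "tracked";   // tracked: this is the managed copy
        context.flush();
        System.out.println(context.row(1L));
    }
}
```

Only changes made through the returned instance reach the table at flush; the object passed to merge stays outside the context.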
Removed
Although JPA demands that managed entities only are allowed to be removed, Hibernate can also delete detached entities (but only through a Session#delete method call).
A removed entity is only scheduled for deletion and the actual database DELETE statement will be executed during Session flush-time.
To understand the JPA state transitions better, you can visualize the following diagram:
[Diagram: JPA entity state transitions]
Or if you use the Hibernate specific API:
[Diagram: Hibernate entity state transitions]

Related

update() and merge behave differently in case of updating an item in OneToMany collection

I have a class like the one below:
@Entity
@Table(name="work")
public class Work {
    @Id
    @Column(name="id")
    private String id;

    @OneToMany(orphanRemoval=true, mappedBy="work", cascade=CascadeType.ALL, fetch=FetchType.EAGER)
    private List<PersonRole> personRoleList;
}
Mine is a web application. When I update a personRoleList item (the change comes from the client) and call:
session.update(work); // `work` is in detached state
it does not update the existing personRoleList item; it actually adds a new one.
Some other people are also having the same problem. REF:
using-saveorupdate-in-hibernate-creates-new-records-instead-of-updating-existi
jpa-onetomany-not-deleting-child
I tried all the suggested solutions, but none of them worked for me.
But then I just tried:
session.merge(work); // replacing session.update(work)
And it works as expected!
This is where I get confused, because I can't find any explanation for this difference in behavior in the case of a OneToMany relationship (or maybe I missed it). I read some threads to understand the differences between update() and merge() and went through the docs. REF:
what-are-the-differences-between-the-different-saving-methods-in-hibernate
differences-among-save-update-saveorupdate-merge-methods-in-session
But it is still not clear what behavioral pattern/logic/steps create this difference.
Merge attempts to associate a currently transient object with a persistent object currently under management by the session by 'merging' them into one entity. Its intended use is when you have a detached object and an attached object and wish to resolve them.
In a merge(), Hibernate will read the entity from the database if there isn't already a managed instance in the session. In your example, this will result in Hibernate eagerly loading the collection (due to fetch=FetchType.EAGER). Then when your session ends, Hibernate will check for changes in the collection (due to cascade=CascadeType.ALL) and will perform the appropriate UPDATE in the database.
This differs from the update() scenario because in an update Hibernate always (by default) assumes the object is dirty and schedules an UPDATE. This update is likely what's causing creation of a new element in your collection - Hibernate hasn't looked in the database to bring the collection into session before issuing the UPDATE.
I'd bet you can get the desired behavior of update() by setting
select-before-update="true"
in your class mapping or by using the lock method to re-attach your object to the session before making changes.
From Chapter 9 of Java Persistence with Hibernate
It doesn’t matter if the item object is modified before or after it’s passed to update(). The important thing here is that the call to update() is reattaching the detached instance to the new Session (and persistence context). Hibernate always treats the object as dirty and schedules an SQL UPDATE, which will be executed during flush. You can see the same unit of work in figure 9.8.
You may be surprised and probably hoped that Hibernate could know that you modified the detached item’s description (or that Hibernate should know you did not modify anything). However, the new Session and its fresh persistence context don’t have this information. Neither does the detached object contain some internal list of all the modifications you’ve made. [...] UPDATE in the database is needed. One way to avoid this UPDATE statement is to configure the class mapping of Item with the select-before-update="true" attribute. Hibernate then determines whether the object is dirty by executing a SELECT statement and comparing the object’s current state to the current database state.
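The difference between the default update() behaviour and select-before-update can be sketched as follows. This is a toy model in plain Java, not Hibernate code: DirtyCheckSession, its boolean flag and the statement log are hypothetical names that only mirror the behaviour the quote describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical entity used only for this sketch.
class Item {
    final long id;
    String description;

    Item(long id, String description) {
        this.id = id;
        this.description = description;
    }
}

// Toy session that reattaches detached items and decides at flush whether to UPDATE.
class DirtyCheckSession {
    // One reattached item plus the database snapshot, if one was read.
    private static final class Attached {
        final Item item;
        String snapshot;
        boolean snapshotKnown;

        Attached(Item item) {
            this.item = item;
        }
    }

    private final Map<Long, String> rows;  // fake table: id -> description
    private final boolean selectBeforeUpdate;
    private final List<Attached> attached = new ArrayList<>();
    final List<String> sqlLog = new ArrayList<>();

    DirtyCheckSession(Map<Long, String> rows, boolean selectBeforeUpdate) {
        this.rows = rows;
        this.selectBeforeUpdate = selectBeforeUpdate;
    }

    void update(Item detached) {
        Attached entry = new Attached(detached);
        if (selectBeforeUpdate) {
            // Read the current row so that flush can compare against it.
            sqlLog.add("SELECT");
            entry.snapshot = rows.get(detached.id);
            entry.snapshotKnown = true;
        }
        attached.add(entry);
    }

    void flush() {
        for (Attached entry : attached) {
            // Without a snapshot the session has to assume the object is dirty.
            boolean dirty = !entry.snapshotKnown
                    || !Objects.equals(entry.snapshot, entry.item.description);
            if (dirty) {
                rows.put(entry.item.id, entry.item.description);
                sqlLog.add("UPDATE");
            }
            entry.snapshot = entry.item.description;
            entry.snapshotKnown = true;
        }
    }
}

class SelectBeforeUpdateDemo {
    public static void main(String[] args) {
        Map<Long, String> rows = new java.util.HashMap<>();
        rows.put(1L, "unchanged");

        DirtyCheckSession plain = new DirtyCheckSession(rows, false);
        plain.update(new Item(1L, "unchanged"));
        plain.flush();
        System.out.println("default:              " + plain.sqlLog);

        DirtyCheckSession checked = new DirtyCheckSession(rows, true);
        checked.update(new Item(1L, "unchanged"));
        checked.flush();
        System.out.println("select-before-update: " + checked.sqlLog);
    }
}
```

For an unmodified item, the default toy session logs an UPDATE anyway, while the select-before-update variant logs only the SELECT. The trade-off is one extra read per reattached object.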

What is the correct CascadeType in @ManyToMany Hibernate annotation?

I am trying to model a transient operations solution schema in Hibernate and I am unsure how to get the object graph and behavior I want from the model.
The table structure uses a correlation table (many-to-many) to create lists of users for the operation:
Operation    OperationUsers    Users
op_id        op_id             user_id
...          user_id           ...
In modeling the persistent class Operation.java using hibernate annotations, I created:
@ManyToMany(fetch=FetchType.LAZY)
@JoinColumn(name="op_id")
public List<User> users() { return userlist; }
So far, I have the following questions:
1. When a user is removed from the list, how do I avoid Hibernate deleting the user from the Users table? It should just be removed from the correlation table, not the Users table. I cannot see a valid CascadeType to accomplish this.
2. Do I need to put anything more in the method body?
3. Do I need to add more annotation arguments?
I am expecting to do this without futzing with the User class.
Please tell me that I do not have to mess with User.java!
It's possible I'm overthinking this, but that's the nature of learning... Thanks in advance for any help you can offer!
From the documentation:
Hibernate defines and supports the following object states:
*Transient - an object is transient if it has just been instantiated using the new operator, and it is not associated with a Hibernate Session. It has no persistent representation in the database and no identifier value has been assigned. Transient instances will be destroyed by the garbage collector if the application does not hold a reference anymore. Use the Hibernate Session to make an object persistent (and let Hibernate take care of the SQL statements that need to be executed for this transition).
*Persistent - a persistent instance has a representation in the database and an identifier value. It might just have been saved or loaded, however, it is by definition in the scope of a Session. Hibernate will detect any changes made to an object in persistent state and synchronize the state with the database when the unit of work completes. Developers do not execute manual UPDATE statements, or DELETE statements when an object should be made transient.
*Detached - a detached instance is an object that has been persistent, but its Session has been closed. The reference to the object is still valid, of course, and the detached instance might even be modified in this state. A detached instance can be reattached to a new Session at a later point in time, making it (and all the modifications) persistent again. This feature enables a programming model for long running units of work that require user think-time. We call them application transactions, i.e., a unit of work from the point of view of the user.
As explained in this answer, you can detach your entity using Session.evict() to prevent Hibernate from updating the database, or simply clone it and make the needed changes on the copy.
It turns out that the specific answer to my primary question (#1 and the main topic) is: "Do not specify any CascadeType on the property."
The answer is mentioned sorta sideways in the answer to this question.

Should Hibernate Session#merge do an insert when receiving an entity with an ID?

This seems like it would come up often, but I've Googled to no avail.
Suppose you have a Hibernate entity User. You have one User in your DB with id 1.
You have two threads running, A and B. They do the following:
A gets user 1 and closes its Session
B gets user 1 and deletes it
A changes a field on user 1
A gets a new Session and merges user 1
All my testing indicates that the merge attempts to find user 1 in the DB (it can't, obviously), so it inserts a new user with id 2.
My expectation, on the other hand, would be that Hibernate would see that the user being merged was not new (because it has an ID). It would try to find the user in the DB, which would fail, so it would not attempt an insert or an update. Ideally it would throw some kind of concurrency exception.
Note that I am using optimistic locking through @Version, and that does not help matters.
So, questions:
1. Is my observed Hibernate behaviour the intended behaviour?
2. If so, is it the same behaviour when calling merge on a JPA EntityManager instead of a Hibernate Session?
3. If the answer to 2. is yes, why is nobody complaining about it?
Please see the text from hibernate documentation below.
Copy the state of the given object onto the persistent object with the same identifier. If there is no persistent instance currently associated with the session, it will be loaded. Return the persistent instance. If the given instance is unsaved, save a copy of and return it as a newly persistent instance.
It clearly states that merge copies the state (data) of the given object onto the persistent object with the same identifier. If that object is not there, it saves a copy of the data. When it saves a copy, Hibernate always creates a record with a new identifier.
Hibernate's merge function works roughly as follows:
It checks the status of the entity (attached to or detached from the session) and finds it detached.
It then tries to load the entity by its identifier, but does not find it in the database.
As the entity is not found, it treats the entity as transient.
A transient entity always creates a new database record with a new identifier.
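The steps above can be sketched in plain Java. This is a toy reconstruction of the behaviour reported in the question, not Hibernate's code: User, DeletedRowSession and the id counter are hypothetical, and other Hibernate versions or JPA providers may behave differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity used only for this sketch.
class User {
    final Long id;  // null would mean "never saved"
    String name;

    User(Long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Toy session that follows the merge steps listed above.
class DeletedRowSession {
    private final Map<Long, String> rows;  // fake users table shared by both "threads"
    private long nextId;                   // stands in for a generated identifier

    DeletedRowSession(Map<Long, String> rows, long nextId) {
        this.rows = rows;
        this.nextId = nextId;
    }

    User merge(User given) {
        // Steps 1-2: the instance is detached, so try to load the row by identifier.
        boolean found = given.id != null && rows.containsKey(given.id);
        if (found) {
            // Row exists: copy the state onto the loaded instance (written at flush).
            rows.put(given.id, given.name);
            return new User(given.id, given.name);
        }
        // Steps 3-4: nothing was loaded, so the instance is treated as transient
        // and saved as a new record under a newly generated identifier.
        long newId = nextId++;
        rows.put(newId, given.name);
        return new User(newId, given.name);
    }
}

class MergeAfterDeleteDemo {
    public static void main(String[] args) {
        Map<Long, String> rows = new HashMap<>();
        rows.put(1L, "alice");

        User detached = new User(1L, "alice");  // thread A read user 1, session closed
        rows.remove(1L);                        // thread B deleted user 1
        detached.name = "alice-renamed";        // thread A changes a field

        User merged = new DeletedRowSession(rows, 2L).merge(detached);
        System.out.println("merged id = " + merged.id + ", rows = " + rows);
    }
}
```

In the toy, the merged user comes back with id 2 and row 1 stays deleted, which matches the spurious insert described in the question.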
Locking is always applied to attached entities. If the entity is detached, Hibernate will always load it, and the version value gets updated.
Locking is used to control concurrency problems. This is not a concurrency issue.
I've been looking at JSR-220, from which Session#merge claims to get its semantics. The JSR is sadly ambiguous, I have found.
It does say:
Optimistic locking is a technique that is used to insure that updates
to the database data corresponding to the state of an entity are made
only when no intervening transaction has updated that data since the
entity state was read.
If you take "updates" to include general mutation of the database data, including deletes, and not just a SQL UPDATE, which I do, I think you can make an argument that the observed behaviour is not compliant with optimistic locking.
Many people agree, given the comments on my question and the subsequent discovery of this bug.
From a purely practical point of view, the behaviour, compliant or not, could lead to quite a few bugs, because it is contrary to many developers' expectations. There does not seem to be an easy fix for it. In fact, Spring Data JPA seems to ignore this issue completely by blindly using EM#merge. Maybe other JPA providers handle this differently, but with Hibernate this could cause issues.
I'm actually working around this by using Session#update currently. It's really ugly, and requires code to handle the case when you try to update an entity that is detached, and there's a managed copy of it already. But, it won't lead to spurious inserts either.
1. Is my observed Hibernate behaviour the intended behaviour?
The behavior is correct. You are just trying to do operations that are not protected against concurrent data modification :) If you have to split the operation into two sessions, just find the object to update again and check whether it is still there; throw an exception if it is not. If it is there, lock it by using em.(class, primary key, LockModeType), or use @Version or @Entity(optimisticLock=OptimisticLockType.ALL/DIRTY/VERSION) to protect the object until the end of the transaction.
2. If so, is it the same behaviour when calling merge on a JPA EntityManager instead of a Hibernate Session?
Probably: yes
3. If the answer to 2. is yes, why is nobody complaining about it?
Because if you protect your operations using pessimistic or optimistic locking, the problem will disappear :)
The problem you are trying to solve is called a non-repeatable read.

After performing a merge on a detached object in Hibernate, will changes to the object be tracked in the current session?

In a container-managed transaction I get a detached object and merge it so that it is brought into the managed state. My initial question is: which is the better way to get the object into the session, caching the POJOs and merging them, or fetching the data from the DB, in terms of the cost/time of the operation? If I perform a merge at the start to get the object into the session context and then modify the merged object, will Hibernate take care of generating all the required SQL statements at the end?
Please comment on which approach is better for getting the entity into the session: is merging the cached detached object or fetching the data from the DB less time-consuming?
When you call detach and then merge, merge returns the attached entity in the context. It's a common mistake to keep using the passed entity after the merge operation, hoping that it is now managed, but this is not the case. You have to use the entity returned from merge, which is managed by Hibernate; subsequent changes to it will be flushed automatically at transaction end.
It doesn't matter much when you load your entity, because Hibernate will fire a SELECT anyway if it is not already loaded in the context. Also, even if you keep making changes to your managed entity, Hibernate will fire the UPDATE only when you exit your transaction or call flush() explicitly.
Copy the state of the given object onto the persistent object with the same identifier. If there is no persistent instance currently associated with the session, it will be loaded. Return the persistent instance. If the given instance is unsaved, save a copy of and return it as a newly persistent instance. The given instance does not become associated with the session. This operation cascades to associated instances if the association is mapped with cascade="merge".
According to the API, it saves a copy when you perform the merge and then returns a new instance. In my experience it's always better to merge at the end, after you have performed all the updates on the objects in detached state. It's better because you call the merge operation only once, when the object state is ready to be persisted.
This will also perform better because the object is moved to the persistence context at the end, and hence Hibernate does not have to come into the picture until then.

Hibernate overriding database modifications with detached object state

I'm gonna go with this design:
create an object and keep it alive during all web-app session.
And I need to synchronize its state with database state.
What I want to achieve is that :
IF between my db operations, that is, modifications that I persist to a db
someone intentionally spoils table rows, then on next saving to a database
all those changes WOULD BE OVERWRITTEN with the object state, that always contains valid data.
What Hibernate methods do you recommend me to use to persist the modifications in a database?
saveOrUpdate() is a possible solution, but maybe there's something better?
Again, I repeat how it looks. First I create an object without collections. Persist it (save()).
Then user provides us with additional data. In a serviceLayer, again, we modify our object in memory (say, populate it with collections) and then, persist it again.
So every serviceLayer operation of the next step must simply guarantee that database contains the exact persistent copy of this object that we have in memory. If data in a database differ, it MUST BE OVERRIDDEN with the object (kept in memory) state.
What Session operations do you recommend?
FWIW saveOrUpdate() looks like the best option overall:
The saveOrUpdate() method is in practice more useful than update(),
save(), or lock(): In complex conversations, you don’t know if the item is in
detached state or if it’s new and transient and must be saved. The automatic
state-detection provided by saveOrUpdate() becomes even more useful when you
not only work with single instances, but also want to reattach or persist a network
of connected objects and apply cascading options.
However for your case, if you are sure the entity was modified in detached state, and/or don't mind occasionally hitting the DB with an unnecessary UPDATE, maybe update() is the safest choice:
The update() operation
on the Session reattaches the detached object to the persistence context and
schedules an SQL UPDATE. Hibernate must assume that the client modified the
object while it was detached. [...] The persistence context is flushed automatically
when the second transaction in the conversation commits, and any
modifications to the once detached and now persistent object are synchronized
with the database.
Quotes from Java Persistence with Hibernate, chapter 11.2.2.
