Working with JPA / Hibernate in an OSIV Web environment is driving me mad ;)
Following scenario: I have an entity A that is loaded via JPA and has a collection of B entities. Those B entities have a required field.
When the user adds a new B to A by pressing a link in the webapp, that required field is not set (since there is no sensible default value).
Upon the next HTTP request, the OSIV filter tries to merge the A entity, but this fails: Hibernate complains that the new B has a required field that is not set.
javax.persistence.PersistenceException: org.hibernate.PropertyValueException: not-null property references a null or transient value
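For concreteness, a mapping along these lines reproduces the failure (class and field names are illustrative, not the real model; the incomplete B is pulled into the merge by the cascade):

```java
import java.util.ArrayList;
import java.util.List;
import javax.persistence.*;

@Entity
class A {
    @Id @GeneratedValue
    Long id;

    // cascading means the freshly added, incomplete B is merged together with A
    @OneToMany(mappedBy = "a", cascade = CascadeType.ALL)
    List<B> bs = new ArrayList<B>();
}

@Entity
class B {
    @Id @GeneratedValue
    Long id;

    @ManyToOne
    A a;

    // the "required field": merging A while this is still null
    // produces the PropertyValueException quoted above
    @Column(nullable = false)
    String requiredField;
}
```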
Reading the JPA spec, I see no sign that those checks are required in the merge phase (I have no transaction active).
I can't keep the collection of B's outside of A and only add them to A when the user presses 'save' (i.e. entityManager.persist()), because the place where the save button lives only knows about A, not about the B's.
Also, A and B are only examples; I have similar situations all over the place.
Any ideas? Do other JPA implementations behave the same here?
Thanks in advance.
I did a lot of reading and testing. The problem comes from my misunderstanding of JPA / Hibernate: merge() always hits the DB and also schedules an update for the entity. I did not find any mention of this in the JPA spec, but the 'Java Persistence with Hibernate' book does mention it.
Looking through the EntityManager (and Session as fallback) API, it looks as if there is no way to simply attach an entity to the current persistence context WITHOUT scheduling an update. After all, what I want is to navigate the object graph, changing properties as needed, and trigger an update (with a version check if needed) later on. Something I think every webapp out there using ORM must do?
The basic workflow I'm looking for (sketched in code below):
load an entity from the DB (or create a new one)
let the entity (and all its associations) become detached, as the EntityManager closes at the end of an HTTP request
when the next HTTP request comes in, work again with those objects, navigating the tree without fear of LazyInitializationExceptions
call a method that persists all changes made during steps 1-3
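A minimal sketch of those four steps in plain JPA (in the real app the OSIV filter manages the per-request EntityManagers; the accessors on A are assumed):

```java
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;

// Sketch of the desired workflow, steps 1-4 from the list above
void workflow(EntityManagerFactory emf, Long id) {
    // step 1 (request 1): load; the entity detaches when this EntityManager closes
    EntityManager em1 = emf.createEntityManager();
    A a = em1.find(A.class, id);
    em1.close();

    // steps 2-3 (requests 2..n): navigate and modify the detached graph
    a.getBs().add(new B());

    // step 4 (save request): reattach and commit everything in one transaction
    EntityManager em2 = emf.createEntityManager();
    em2.getTransaction().begin();
    A managed = em2.merge(a); // continue with 'managed', not 'a'; version check happens here if @Version is mapped
    em2.getTransaction().commit();
    em2.close();
}
```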
With the OSIV filter from Spring in conjunction with an IModel implementation from Wicket, I thought I had achieved this.
I basically see 2 possible ways out of it:
a) load the entity and all the associations needed when entering a certain page (use case), letting them become detached, adding/changing them as needed over the course of several HTTP requests. Then reattach them when the user initiates a save (validators will ensure a valid state) and submit them to the database.
b) use the current setup, but make sure that all newly added entities have all their required fields set (probably using some wizard components). I would still have all the updates to the database for every merge(), but hopefully the database admin won't notice ;)
How do other people work with JPA in a web environment? Any other options for me?
This seems like it would come up often, but I've Googled to no avail.
Suppose you have a Hibernate entity User. You have one User in your DB with id 1.
You have two threads running, A and B. They do the following:
A gets user 1 and closes its Session
B gets user 1 and deletes it
A changes a field on user 1
A gets a new Session and merges user 1
All my testing indicates that the merge attempts to find user 1 in the DB (it can't, obviously), so it inserts a new user with id 2.
My expectation, on the other hand, would be that Hibernate would see that the user being merged was not new (because it has an ID). It would try to find the user in the DB, which would fail, so it would not attempt an insert or an update. Ideally it would throw some kind of concurrency exception.
Note that I am using optimistic locking through @Version, and that does not help matters.
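A condensed reproduction of the scenario (the two threads are shown linearly for clarity; User and its accessors are assumed):

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;

// Condensed reproduction of the four steps above
void reproduce(SessionFactory sessionFactory) {
    // Thread A: load user 1, then close the session -> 'u' is detached
    Session sA = sessionFactory.openSession();
    User u = (User) sA.get(User.class, 1L);
    sA.close();

    // Thread B: delete user 1
    Session sB = sessionFactory.openSession();
    sB.beginTransaction();
    sB.delete(sB.get(User.class, 1L));
    sB.getTransaction().commit();
    sB.close();

    // Thread A: modify the detached user and merge it in a new session
    u.setName("changed");
    Session sA2 = sessionFactory.openSession();
    sA2.beginTransaction();
    sA2.merge(u);                     // expected: a concurrency exception
    sA2.getTransaction().commit();    // observed: INSERT of a new row with a new id
    sA2.close();
}
```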
So, questions:
Is my observed Hibernate behaviour the intended behaviour?
If so, is it the same behaviour when calling merge on a JPA EntityManager instead of a Hibernate Session?
If the answer to 2. is yes, why is nobody complaining about it?
Please see the text from hibernate documentation below.
Copy the state of the given object onto the persistent object with the same identifier. If there is no persistent instance currently associated with the session, it will be loaded. Return the persistent instance. If the given instance is unsaved, save a copy of and return it as a newly persistent instance.
It clearly states that the state (data) of the given object is copied onto the persistent object; if no such object exists in the database, a copy of the data is saved. And when Hibernate saves a copy, it always creates a record with a new identifier.
The Hibernate merge function works roughly as follows (sketched in pseudocode after the list):
It checks the status of the entity (attached to or detached from the session) and finds it detached.
It then tries to load the entity by its identifier, but does not find it in the database.
Since the entity is not found, it treats the entity as transient.
A transient entity always creates a new database record with a new identifier.
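In pseudocode, my reading of that behaviour looks like this (NOT Hibernate's actual source; the helper names are invented for illustration):

```java
// Pseudocode sketch of the merge() decision logic described above
abstract class MergeSketch {
    Object merge(Object entity) {
        if (isAttached(entity)) {
            return entity;                         // already managed: nothing to do
        }
        Object managed = loadByIdentifier(entity); // step 2: try to load by id
        if (managed == null) {
            // steps 3-4: not in the database -> treated as transient,
            // so a copy is saved under a NEW identifier
            return saveCopyAsNew(entity);
        }
        copyState(entity, managed);                // normal case: copy state onto the managed instance
        return managed;
    }

    abstract boolean isAttached(Object entity);
    abstract Object loadByIdentifier(Object entity);
    abstract Object saveCopyAsNew(Object entity);
    abstract void copyState(Object source, Object target);
}
```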
Locking is only applied to attached entities; if the entity is detached, Hibernate will first load it, and the version value gets updated.
Locking is used to control concurrency problems, but what you are seeing here is not a concurrency issue.
I've been looking at JSR-220, from which Session#merge claims to get its semantics. The JSR is sadly ambiguous, I have found.
It does say:
Optimistic locking is a technique that is used to insure that updates
to the database data corresponding to the state of an entity are made
only when no intervening transaction has updated that data since the
entity state was read.
If you take "updates" to include general mutation of the database data, including deletes, and not just a SQL UPDATE, which I do, I think you can make an argument that the observed behaviour is not compliant with optimistic locking.
Many people agree, given the comments on my question and the subsequent discovery of this bug.
From a purely practical point of view, the behaviour, compliant or not, could lead to quite a few bugs, because it is contrary to many developers' expectations. There does not seem to be an easy fix for it. In fact, Spring Data JPA seems to ignore this issue completely by blindly using EM#merge. Maybe other JPA providers handle this differently, but with Hibernate this could cause issues.
I'm actually working around this by using Session#update currently. It's really ugly, and requires code to handle the case when you try to update an entity that is detached, and there's a managed copy of it already. But, it won't lead to spurious inserts either.
1. Is my observed Hibernate behaviour the intended behaviour?
The behavior is correct. You are just trying to do operations that are not protected against concurrent data modification :) If you have to split the operation into two sessions, find the object again before updating and check whether it is still there; throw an exception if not. If it is there, lock it using em.find(entityClass, primaryKey, lockModeType), or use @Version or @Entity(optimisticLock = OptimisticLockType.ALL/DIRTY/VERSION) to protect the object until the end of the transaction.
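A sketch of that re-find-and-lock approach (resource-local transaction shown for brevity; the entity and its accessors are assumed):

```java
import javax.persistence.EntityManager;
import javax.persistence.LockModeType;
import javax.persistence.OptimisticLockException;

// Re-find and lock the row inside the second transaction before updating
void safeUpdate(EntityManager em, Long id, String newName) {
    em.getTransaction().begin();
    // find again instead of trusting the detached copy; the lock holds until commit
    User u = em.find(User.class, id, LockModeType.PESSIMISTIC_WRITE);
    if (u == null) {
        // the row was deleted by a concurrent transaction -> fail loudly, do not re-insert
        throw new OptimisticLockException("User " + id + " no longer exists");
    }
    u.setName(newName);
    em.getTransaction().commit();
}
```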
2. If so, is it the same behaviour when calling merge on a JPA EntityManager instead of a Hibernate Session?
Probably: yes
3. If the answer to 2. is yes, why is nobody complaining about it?
Because if you protect your operations using pessimistic or optimistic locking, the problem will disappear :)
The problem you are trying to solve is called: Non-repeatable read
I get an entity 'A' using
getHibernateTemplate().get(A.class, 100)
from the database. Let's say this entity 'A' has a 'value' property of 200 in the database.
Now, in my Java code, I change a property on this entity. Let's say I change the 'value' property to 500 and then add it to some list.
Now, if I again do getHibernateTemplate().get(A.class, 100) for the same entity, I get the updated entity (the one with a value of 500). How do I force Hibernate to give me the entity from the database, not the one updated in my code?
Is this what is called 'first-level caching'?
Your assumption (about first level caching) is correct. As for example stated here: Interface Session:
The main runtime interface between a Java application and Hibernate.
This is the central API class abstracting the notion of a persistence service.
Or here Chapter 2. Architecture; 2.1. Overview
Extract: Session (org.hibernate.Session)
A single-threaded, short-lived object representing a conversation between the application and the persistent store. It wraps a JDBC
connection and is a factory for Transaction. Session holds a mandatory
first-level cache of persistent objects that are used when navigating
the object graph or looking up objects by identifier.
And also, you can see the methods available to us for removing an object from the session:
evict(Object object):
Remove this instance from the session cache.
refresh(Object object):
Re-read the state of the given instance from the underlying database.
clear():
Completely clear the session.
And many more. evict() should do the job in this case: we take the current instance ('A') and explicitly evict it from the session.
If we've already loaded more stuff and do not know what to evict(), we simply need fresh data, so we can call clear() to completely reset the session and start again.
This is a bit radical, because none of the objects in the session will be updated/inserted on session flush()... but it could be what we want in this scenario (this is very often used for testing: load, clear... change and flush).
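For illustration, with the A entity from the question (setter/getter names assumed; the same Hibernate session is assumed to be bound for all of these calls, as in the question):

```java
// Inside a DAO extending HibernateDaoSupport, as in the question
A a = (A) getHibernateTemplate().get(A.class, 100);
a.setValue(500);                                  // in-memory change only, not yet flushed

A same = (A) getHibernateTemplate().get(A.class, 100);
// same == a: the first-level cache returns the SAME instance, so its value is 500

// Option 1: overwrite the in-memory state with the database state
getHibernateTemplate().refresh(a);                // a's value is 200 again

// Option 2: evict the instance, then re-read a fresh copy from the database
getHibernateTemplate().evict(a);
A fresh = (A) getHibernateTemplate().get(A.class, 100);  // value is 200, new instance
```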
I suggest searching Google for hibernate commit, flush, and detach and reading up on when they write to the database. Better yet, I recommend reading a good book on Hibernate if you haven't already done so (search amazon.com for good reviews on a book) to get a good grasp of the technology.
My reason for responding to this post is not to answer your question directly, but to suggest that you edit your hibernate.cfg.xml file and set the following property to true:
<property name="hibernate.show_sql">true</property>
This will print to your console every SQL statement that is sent to the database. This way, you can see exactly when a write to the database occurs. You can then experiment with what you research/read and verify that it works as you expect.
We are creating a new web application backed by JPA to replace an old web application. As part of the migration we are converting the old application's database to a new, more sophisticated, JPA-managed database.
So I've written a 'script' that converts the old database to a set of JPA entities and subsequently saves them. It works like this:
Create an order of conversion based on the dependencies of the domain models
For each entity
Execute database query to legacy DB
Store new object for each obtained table row in a list in memory
Iterate over generated lists in the same order as the conversion, and persist each entity.
Now, the first two steps work well. Upon persisting, however, I get an exception. The exception occurs when one entity has a relation to another entity. For example, suppose one of our entities is a Book and another is a Chapter defining a @ManyToOne(optional=false) relation to Book. Upon persisting the Chapter, it throws java.lang.IllegalStateException: org.hibernate.TransientPropertyValueException: Not-null property references a transient value - transient instance must be saved before current operation: models.Chapter.book -> models.Book.
Of course, this indicates that something is wrong with the state of the book: it seems it is either not set or has not yet been persisted. However, I can verify that the Book is set properly in the conversion of the Chapter, and I can also verify that all entities of type Book are persisted by the EntityManager before the entities of type Chapter get persisted. Obviously, my JPA provider does not behave as expected and does not truly persist my Book objects for some reason.
What solution would allow me to save the entire graph of objects that I have converted to the database? I use Hibernate as my JPA provider and I also use Spring 3.1 for injection of dependencies and EntityManagers.
EDIT 1: Some additional info: I've again verified that entityManager.persist() is called on each of the book objects before entityManager.persist() is called on the chapters. However, the id of the book object remains null, meaning it is not properly persisted. The database also remains empty, despite not using transactions.
EDIT 2: Because I don't think it's clear from the text above: the Book and Chapter story is just an example. It happens for any entity that references another entity. This makes it seem as if I'm not using JPA/Hibernate properly as opposed to not setting the values of my entities properly.
EDIT 3: The core issue seems to be that despite persisting Book properly, having all the right annotations, book.getId() remains null. Basically, Hibernate is not setting the ids on my entities after persisting them, leading to problems when I need to use those entities later.
I once battled with such an error from Hibernate myself. It turned out to be a combination of a cycle in the object graph and the cascade settings that caused the problem.
It has been a while, so the following might not be 100% accurate, but maybe it is enough information to track down your problem:
Hibernate wants to insert the chapter and realizes it needs to insert the book first.
It wants to insert the book and realizes it needs to insert another entity first (e.g. the publisher).
It inserts the publisher and performs the cascades defined on the publisher (e.g. authors).
An author has, for example, a reference to his latestBook. Because Hibernate internally already marked the book as processed (in step 2), you would now get an exception stating that author.book references a transient instance.
To find out if this is your problem you can enable full hibernate debugging and follow the path hibernate is taking through your object graph.
I've found the answer thanks to the discussion I've had with user1888440.
The solution was that the Spring @Transactional annotation was nonfunctional in my application. This meant that nothing Hibernate did occurred in the context of a transaction, which meant Hibernate would not set ids after persisting, and so all conversions broke down.
The reason why @Transactional did not work is probably down to a fact I did not mention: this script is part of a Play 2.0 (actually 2.1) app and is thus built with SBT. SBT doesn't use a normal Java setup to build an application, but instead uses the Scala compiler to compile Java as well. My guess is that the Scala compiler did not work well with the AspectJ weaving that Spring requires to make @Transactional work.
Instead, I performed all of the database work involved in this conversion within a programmatically defined Spring transaction (section 11.6). Now everything behaves as expected.
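For reference, the programmatic variant looks roughly like this (a sketch; transactionManager, entityManager, and the two lists are assumed to be in scope):

```java
import org.springframework.transaction.TransactionStatus;
import org.springframework.transaction.support.TransactionCallback;
import org.springframework.transaction.support.TransactionTemplate;

// Programmatic Spring transaction replacing the broken @Transactional
TransactionTemplate tx = new TransactionTemplate(transactionManager);
tx.execute(new TransactionCallback<Void>() {
    @Override
    public Void doInTransaction(TransactionStatus status) {
        for (Book book : books) {
            entityManager.persist(book);      // ids get assigned: a real transaction is active now
        }
        for (Chapter chapter : chapters) {
            entityManager.persist(chapter);   // book references are no longer transient
        }
        return null;
    }
});
```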
Check the unsaved-value for your primary key/object ID in your hbm files. If ID creation is automated by the Hibernate framework and you are setting the ID somewhere yourself, it will throw this error. By default the unsaved-value is 0, so if you set the ID to 0 you will see this error.
Sounds like you are forgetting to assign a Book to each Chapter before persisting it. Even if you have persisted the Book, it needs to be assigned to the book property of the Chapter instance before you can persist the Chapter. This is because you have specified the relationship as non-optional: book can never be null.
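In other words, something along these lines before each persist (accessor name assumed):

```java
// Both conditions must hold before persisting the Chapter:
// the Book is persisted (or cascaded) AND it is assigned to the chapter
Chapter chapter = new Chapter();
chapter.setBook(book);          // without this line, @ManyToOne(optional = false) fails
entityManager.persist(chapter);
```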
My question is this: Is there ever a role for JPA merge in a stateless web application?
There is a lot of discussion on SO about the merge operation in JPA. There is also a great article on the subject which contrasts JPA merge via a more manual Do-It-Yourself process (where you find the entity via the entity manager and make your changes).
My application has a rich domain model (ala domain-driven design) that uses the #Version annotation in order to make use of optimistic locking. We have also created DTOs to send over the wire as part of our RESTful web services. The creation of this DTO layer also allows us to send to the client everything it needs and nothing it doesn't.
So far, I understand this is a fairly typical architecture. My question is about the service methods that need to UPDATE (i.e. HTTP PUT) existing objects. In this case we have these two approaches 1) JPA Merge, and 2) DIY.
What I don't understand is how JPA merge can even be considered an option for handling updates. Here's my thinking and I am wondering if there is something I don't understand:
1) In order to properly create a detached JPA entity from a wire DTO, the version number must be set correctly...else an OptimisticLockException is thrown. But the JPA spec says:
An entity may access the state of its version field or property or
export a method for use by the application to access the version, but
must not modify the version value[30]. Only the persistence provider
is permitted to set or update the value of the version attribute in
the object.
2) Merge doesn't handle bi-directional relationships ... the back-pointing fields always end up as null.
3) If any fields or data is missing from the DTO (due to a partial update), then the JPA merge will delete those relationships or null-out those fields. Hibernate can handle partial updates, but not JPA merge. DIY can handle partial updates.
4) The first thing the merge method will do is query the database for the entity ID, so there is no performance benefit over DIY to be had.
5) In a DIY update, we load the entity and make the changes according to the DTO; there is no call to merge, or to persist for that matter, because the JPA context implements the unit-of-work pattern out of the box.
Do I have this straight?
Edit:
6) Merge behavior with regards to lazy loaded relationships can differ amongst providers.
Using Merge does require you to either send and receive a complete representation of the entity, or maintain server side state. For trivial CRUD-y type operations, it is easy and convenient. I have used it plenty in stateless web apps where there is no meaningful security hazard to letting the client see the entire entity.
However, if you've already reduced operations to only passing the immediately relevant information, then you need to also manually write the corresponding services.
Just remember that when doing your 'DIY' update you still need to pass a Version number around on the DTO and manually compare it to the one that comes out of the database. Otherwise you don't get the Optimistic Locking that spans 'user think-time' that you would have if you were using the simpler approach with merge.
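A sketch of that manual version check (DTO shape and accessor names assumed):

```java
import javax.persistence.EntityManager;
import javax.persistence.OptimisticLockException;

// DIY update: re-read, compare the version carried by the DTO, then mutate
void update(EntityManager em, UserDto dto) {
    User entity = em.find(User.class, dto.getId());
    if (entity == null || !dto.getVersion().equals(entity.getVersion())) {
        // the row changed (or disappeared) during user think-time
        throw new OptimisticLockException("Stale update for user " + dto.getId());
    }
    entity.setName(dto.getName()); // dirty checking flushes this on commit; no merge() needed
}
```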
You can't change the version on an entity created by the provider, but when you have made your own instance of the entity class with the new keyword it is fine and expected to set the version on it.
It will make the persistent representation match the in-memory representation you provide; this can include making things null. Remember that when an object is merged, that object is supposed to be discarded and replaced with the one returned by merge. You are not supposed to merge an object and then continue using it. Its state is not defined by the spec.
True.
Most likely, as long as your DIY solution is also using the entity ID and not an arbitrary query. (There are other benefits to using the 'find' method over a query.)
True.
I would add:
7) Merge translates to an insert or an update depending on the existence of the record in the DB; hence it does not deal correctly with update-vs-delete optimistic concurrency. That is, if another user concurrently deletes the record and you update it, it must (1) throw a concurrency exception... but it does not; it just inserts the record as a new one.
(1) At least, in most cases, in my opinion, it should. I can imagine some cases where I would want this use case to trigger a new insert, but they are far from usual. At least, I would like the developer to think twice about it, not just accept that "merge() == updateWithConcurrencyControl()", because it is not.
I'm building an application using JPA 2.0 (Hibernate implementation), Spring, and Wicket. Everything works, but I'm concerned that my form behaviour is based around side effects.
As a first step, I'm using the OpenEntityManagerInViewFilter. My domain objects are fetched by a LoadableDetachableModel which performs entityManager.find() in its load method. In my forms, I wrap a CompoundPropertyModel around this model to bind the data fields.
My concern is the form submit actions. Currently my form submits pass the result of form.getModelObject() into a service method annotated with @Transactional. Because the entity inside the model is still attached to the entity manager, the @Transactional annotation is sufficient to commit the changes.
This is fine, until I have multiple forms that operate on the same entity, each of which changes a subset of the fields. And yes, they may be accessed simultaneously. I've thought of a few options, but I'd like to know any ideas I've missed and recommendations on managing this for long-term maintainability:
Fragment my entity into sub-components corresponding to the edit forms, and create a master entity linking these together in a @OneToOne relationship. Causes an ugly table design, and makes it hard to change forms later.
Detach the entity immediately after it's loaded by the LoadableDetachableModel, and manually merge the correct fields in the service layer. Hard to manage lazy loading; may need specialised versions of the model for each form to ensure the correct sub-entities are loaded.
Clone the entity into a local copy when creating the model for the form, then manually merge the correct fields in the service layer. Requires implementation of a lot of copy constructors / clone methods.
Use Hibernate's dynamicUpdate option to update only the changed fields of the entity (sketched below). Causes non-standard JPA behaviour throughout the application, is not visible in the affected code, and ties the application strongly to the Hibernate implementation.
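For option 4, a sketch of the Hibernate-specific mapping: recent Hibernate versions spell it @DynamicUpdate, while older ones used @org.hibernate.annotations.Entity(dynamicUpdate = true). Field names here are illustrative:

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;
import org.hibernate.annotations.DynamicUpdate;

// The generated UPDATE statement then contains only the columns
// that actually changed in the current session
@Entity
@DynamicUpdate
public class Customer {
    @Id private Long id;
    @Version private Long version;
    private String name;   // edited by form 1
    private String email;  // edited by form 2; no longer overwritten when form 1 saves
}
```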
EDIT
The obvious solution is to lock the entity (i.e. row) when you load it for form binding. This would ensure that the lock-owning request reads/binds/writes cleanly, with no concurrent writes taking place in the background. It's not ideal, so you'd need to weigh up the potential performance issues (level of concurrent writes).
Beyond that, assuming you're happy with 'last write wins' on your property sub-groups, Hibernate's dynamicUpdate would seem like the most sensible solution, unless you're thinking of switching ORMs anytime soon. I find it strange that JPA seemingly doesn't offer anything that lets you update only the dirty fields, and find it likely that it will in the future.
Additional (my original answer)
Orthogonal to this is how to ensure you have a transaction open when your Model loads an entity for form binding. The concern is that the entity's properties are updated at that point, and outside of a transaction this leaves the JPA entity in an uncertain state.
The obvious answer, as Adrian says in his comment, is to use a traditional transaction-per-request filter. This guarantees that all operations within the request occur in single transaction. It will, however, definitely use a DB connection on every request.
There's a more elegant solution, with code, here. The technique is to lazily instantiate the EntityManager and begin the transaction only when required (i.e. when the first EntityModel.getObject() call happens). If there is a transaction open at the end of the request cycle, it is committed. The benefit of this is that there are never any wasted DB connections.
The implementation given uses the Wicket RequestCycle object (note this is slightly different in v1.5 onwards), but the whole implementation is in fact fairly general, so you could use it (for example) outside Wicket via a servlet Filter.
After some experiments I've come up with an answer. Thanks to @artbristol, who pointed me in the right direction.
I have set a rule in my architecture: DAO save methods must only be called to save detached entities. If the entity is attached, the DAO throws an IllegalStateException. This helped track down any code that was modifying entities outside a transaction.
Next, I modified my LoadableDetachableModel to have two variants. The classic variant, for use in read-only data views, returns the entity from JPA, which will support lazy loading. The second variant, for use in form binding, uses Dozer to create a local copy.
I have extended my base DAO to have two save variants. One saves the entire object using merge, and the other uses Apache Beanutils to copy a list of properties.
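The property-copying save variant looks roughly like this (a sketch; DAO shape, accessor names, and exception handling are simplified):

```java
import javax.persistence.EntityManager;
import org.apache.commons.beanutils.BeanUtils;
import org.apache.commons.beanutils.PropertyUtils;

// Save variant: copy only the listed properties onto the managed instance
public void save(EntityManager em, A detached, String... properties) throws Exception {
    A managed = em.find(A.class, detached.getId());
    for (String property : properties) {
        // copies by name; this is where type safety is lost, as noted below
        BeanUtils.copyProperty(managed, property, PropertyUtils.getProperty(detached, property));
    }
    // no merge() call: dirty checking flushes the changes at commit
}
```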
This at least avoids repetitive code. The downsides are the requirement to configure Dozer so that it doesn't pull in the entire database by following lazy loaded references, and having yet more code that refers to properties by name, throwing away type safety.