There appear to be two patterns to implement business transactions that span several http requests with JPA:
entity-manager-per-request with detached entities
extended persistence context
What are the respective advantages of these patterns? When should which be preferred?
So far, I came up with:
an extended persistence context guarantees that object identity is equivalent to database identity, simplifying the programming model and potentially disspelling the need to implement equals for entities
detached entities require less memory than an extended persistence context, as the persistence context also has to store the previous state of the entity for change detection
no longer referenced detached entities become eligible for garbage collection; persistent objects must first be detached explicitly
However, not having any practical experience with JPA I am sure I have missed something of importance, hence this question.
In case it matters: We intend to use JPA 2.0 backed by Hibernate 3.6.
Edit: Our view technology is JSF 2.0, in an EJB 3.1 container, with CDI and possibly Seam 3.
Well, I can enumerate challenges with trying to use extended persistence contexts in a web environment. Some things also depend on what your view technology is and if it's binding entities or view level middlemen.
EntityManagers are not threadsafe.
You don't need one per user session.
You need one per user session per
browser tab.
When an exception comes out of an
EntityManager, it is considered
invalid and needs to be closed and
replaced. If you're planning to
write your own framework extensions
for managing the extended lifecycle,
the implementation of this needs to
be bullet proof. Generally in an
EM-per-request setup the exception
goes to some kind of error page and
then loading the next page creates a
new one anyway like it always would
have.
Object equality is not going to be
100% automagically safe. As above,
an exception may have invalidated
the context an object loaded earlier
was associated with, so one fetched
now will not be equal. Making that
assumption also assumes an extremely
high level of skill and
understanding of how JPA works and
what the EM does among the
developers using it. e.g.,
accidentally using merge when it
wasn't needed will return a new
object which will not satisfy ==
with its field-identical
predecessor. (treating merge like a
SQL 'update' is an extremely common
JPA noobie 'error' particularly
because it's just a no-op most of
the time so it slides past.)
If you're using a view technology
that binds POJOs (e.g., SpringMVC)
and you're planning to bind web form
data directly onto your Entities,
you'll get in trouble quick.
Changes to an attached entity will
become persistent on the next
flush/commit, regardless of whether
they were done in a transaction or
not. Common error is, web form
comes in and binds some invalid data
onto an entity, validation fails and
trys to return a screen to inform
user. Building error screen
involves running a query. Query
triggers flush/commit of persistence
context. Changes bound to attached
entity get flushed to database,
hopefully causing SQL exception, but
maybe just persisting corrupt data.
(Problem 4 can of course also happen with session per request if the programming is sloppy, but you're not forced to actively work hard at avoiding it.)
Related
We have a somewhat huge application which started a decade ago and is still under active development. So some parts are still in J2EE 1.4 architecture, others using Java EE 5/6.
While testing some new code, I realized that I had data inconsistency between information coming in through old and new code parts, where the old one uses the Hibernate session directly and the new one an injected EntityManager. This led to the problem, that one part couldn't see new data from the other part and thus also created a database record, resulting in primary key constraint violation.
It is planned to migrate the old code completely to get rid of J2EE, but in the meantime - what can I do to coordinate database access between the two parts? And shouldn't at some point within the application server both ways come together in the Hibernate layer, regardless if accessed via JPA or directly?
You can mix both Hibernate Session and Entity Manager in the same application without any problem. The EntityManagerImpl simply delegates calls the a private SessionImpl instance.
What you describe is a Transaction configuration anomaly. Every database transaction runs in isolation (unless you use REAN_UNCOMMITED which I guess it's not the case), but once you commit it the changes are available from any other transaction or connection. So once a transaction is committed you should see al changes in any other Hibernate Session, JDBC connection or even your database UI manager tool.
You said that there was a primary key conflict. This can't happen if you use Hibernate identity or sequence generator. For the old hi-lo generator you can have problems if an external connection tries to insert records in the same table Hibernate uses an old hi/lo identifier generator.
This problem can also occur if there is a master/master replication anomaly. If you have multiple nodes and there is no strict consistency replication you can end up with primar key constraint violations.
Update
Solution 1:
When coordinating the new and the old code trying to insert the same entity, you could have a slect-than-insert logic running in a SERIALIZABLE transaction. The SERIALIZABLE transaction acquires the appropriate locks on tour behalf and so you can still have a default READ_COMMITTED isolation level, while only the problematic Service methods are marked as SERIALIZABLE.
So both the old code and the new code have this logic running a select for checking if there is already a row satisfying the select constraint, only to insert it if nothing is found. The SERIALIZABLE isolation level prevents phantom reads so I think it should prevent constraint violations.
Solution 2:
If you are open to delegate this task to JDBC, you might also investigate the MERGE SQL statement, if your current database supports it. Basically, this is an upsert operation issuing an update or an insert behind the scenes. This command is much more attractive since you can still run it with even on READ_COMMITTED. The only drawback is that you can't use Hibernate for it, and only some databases support it.
If you instanciate separately a SessionFactory for the old code and an EntityManagerFactory for new code, that can lead to different value in first level cache. If during a single Http request, you change a value in old code, but do not immediately commit, the value will be changed in session cache, but it will not be available for new code until it is commited. Independentely of any transaction or database locking that would protect persistent values, that mix of two different Hibernate session can give weird things for in memory values.
I admit that the injected EntityManager still uses Hibernate. IMHO the most robust solution is to get the EntityManagerFactory for the PersistenceUnit and cast it to an Hibernate EntityManagerFactoryImpl. Then you can directly access the the underlying SessionFactory :
SessionFactory sessionFactory = entityManagerFactory.getSessionFactory();
You can then safely use this SessionFactory in your old code, because now it is unique in your application and shared between old and new code.
You still have to deal with the problem of session creation-close and transaction management. I suppose it is allready implemented in old code. Without knowing more, I think that you should port it to JPA, because I am pretty sure that if an EntityManager exists, sessionFactory.getCurrentSession() will give its underlying Session but I cannot affirm anything for the opposite.
I've run into a similar problem when I had a list of enumerated lookup values, where two pieces of code would check for the existence of a given value in the list, and if it didn't exist the code would create a new entry in the database. When both of them came across the same non-existent value, they'd both try to create a new one and one would have its transaction rolled back (throwing away a bunch of other work we'd done in the transaction).
Our solution was to create those lookup values in a separate transaction that committed immediately; if that transaction succeeded, then we knew we could use that object, and if it failed, then we knew we simply needed to perform a get to retrieve the one saved by another process. Once we had a lookup object that we knew was safe to use in our session, we could happily do the rest of the DB modifications without risking the transaction being rolled back.
It's hard to know from your description whether your data model would lend itself to a similar approach, where you'd at least commit the initial version of the entity right away, and then once you're sure you're working with a persistent object you could do the rest of the DB modifications that you knew you needed to do. But if you can find a way to make that work, it would avoid the need to share the Session between the different pieces of code (and would work even if the old and new code were running in separate JVMs).
My question is this: Is there ever a role for JPA merge in a stateless web application?
There is a lot of discussion on SO about the merge operation in JPA. There is also a great article on the subject which contrasts JPA merge via a more manual Do-It-Yourself process (where you find the entity via the entity manager and make your changes).
My application has a rich domain model (ala domain-driven design) that uses the #Version annotation in order to make use of optimistic locking. We have also created DTOs to send over the wire as part of our RESTful web services. The creation of this DTO layer also allows us to send to the client everything it needs and nothing it doesn't.
So far, I understand this is a fairly typical architecture. My question is about the service methods that need to UPDATE (i.e. HTTP PUT) existing objects. In this case we have these two approaches 1) JPA Merge, and 2) DIY.
What I don't understand is how JPA merge can even be considered an option for handling updates. Here's my thinking and I am wondering if there is something I don't understand:
1) In order to properly create a detached JPA entity from a wire DTO, the version number must be set correctly...else an OptimisticLockException is thrown. But the JPA spec says:
An entity may access the state of its version field or property or
export a method for use by the application to access the version, but
must not modify the version value[30]. Only the persistence provider
is permitted to set or update the value of the version attribute in
the object.
2) Merge doesn't handle bi-directional relationships ... the back-pointing fields always end up as null.
3) If any fields or data is missing from the DTO (due to a partial update), then the JPA merge will delete those relationships or null-out those fields. Hibernate can handle partial updates, but not JPA merge. DIY can handle partial updates.
4) The first thing the merge method will do is query the database for the entity ID, so there is no performance benefit over DIY to be had.
5) In a DYI update, we load the entity and make the changes according to the DTO -- there is no call to merge or to persist for that matter because the JPA context implements the unit-of-work pattern out of the box.
Do I have this straight?
Edit:
6) Merge behavior with regards to lazy loaded relationships can differ amongst providers.
Using Merge does require you to either send and receive a complete representation of the entity, or maintain server side state. For trivial CRUD-y type operations, it is easy and convenient. I have used it plenty in stateless web apps where there is no meaningful security hazard to letting the client see the entire entity.
However, if you've already reduced operations to only passing the immediately relevant information, then you need to also manually write the corresponding services.
Just remember that when doing your 'DIY' update you still need to pass a Version number around on the DTO and manually compare it to the one that comes out of the database. Otherwise you don't get the Optimistic Locking that spans 'user think-time' that you would have if you were using the simpler approach with merge.
You can't change the version on an entity created by the provider, but when you have made your own instance of the entity class with the new keyword it is fine and expected to set the version on it.
It will make the persistent representation match the in-memory representation you provide, this can include making things null. Remember when an object is merged that object is supposed to be discarded and replaced with the one returned by merge. You are not supposed to merge an object and then continue using it. Its state is not defined by the spec.
True.
Most likely, as long as your DIY solution is also using the entity ID and not an arbitrary query. (There are other benefits to using the 'find' method over a query.)
True.
I would add:
7) Merge translates to insert or to update depending on the existence of the record on DB, hence it does not deal correctly with update-vs-delete optimistic concurrency. That is, if another user concurrently deletes the record and you update it, it must (1) throw a concurrency exception... but it does not, it just inserts the record as new one.
(1) At least, in most cases, in my opinion, it should. I can imagine some cases where I would want this use case to trigger a new insert, but they are far from usual. At least, I would like the developer to think twice about it, not just accept that "merge() == updateWithConcurrencyControl()", because it is not.
I'm building an application using JPA 2.0 (Hibernate implementation), Spring, and Wicket. Everything works, but I'm concerned that my form behaviour is based around side effects.
As a first step, I'm using the OpenEntityManagerInViewFilter. My domain objects are fetched by a LoadableDetachableModel which performs entityManager.find() in its load method. In my forms, I wrap a CompoundPropertyModel around this model to bind the data fields.
My concern is the form submit actions. Currently my form submits pass the result of form.getModelObject() into a service method annotated with #Transactional. Because the entity inside the model is still attached to the entity manager, the #Transactional annotation is sufficient to commit the changes.
This is fine, until I have multiple forms that operate on the same entity, each of which changes a subset of the fields. And yes, they may be accessed simultaneously. I've thought of a few options, but I'd like to know any ideas I've missed and recommendations on managing this for long-term maintainability:
Fragment my entity into sub-components corresponding to the edit forms, and create a master entity linking these together into a #OneToOne relationship. Causes an ugly table design, and makes it hard to change forms later.
Detach the entity immediately it's loaded by the LoadableDetachableModel, and manually merge the correct fields in the service layer. Hard to manage lazy loading, may need specialised versions of the model for each form to ensure correct sub-entities are loaded.
Clone the entity into a local copy when creating the model for the form, then manually merge the correct fields in the service layer. Requires implementation of a lot of copy constructors / clone methods.
Use Hibernate's dynamicUpdate option to only update changed fields of the entity. Causes non-standard JPA behaviour throughout the application. Not visible in the affected code, and causes a strong tie to Hibernate implementation.
EDIT
The obvious solution is to lock the entity (i.e. row) when you load it for form binding. This would ensure that the lock-owning request reads/binds/writes cleanly, with no concurrent writes taking place in the background. It's not ideal, so you'd need to weigh up the potential performance issues (level of concurrent writes).
Beyond that, assuming you're happy with "last write wins" on your property sub-groups, then Hibernate's 'dynamicUpdate' would seem like the most sensible solution, unless your thinking of switching ORMs anytime soon. I find it strange that JPA seemingly doesn't offer anything that allows you to only update the dirty fields, and find it likely that it will in the future.
Additional (my original answer)
Orthogonal to this is how to ensure you have a transaction open when when your Model loads an entity for form binding. The concern being that the entities properties are updated at that point and outside of transaction this leaves a JPA entity in an uncertain state.
The obvious answer, as Adrian says in his comment, is to use a traditional transaction-per-request filter. This guarantees that all operations within the request occur in single transaction. It will, however, definitely use a DB connection on every request.
There's a more elegant solution, with code, here. The technique is to lazily instantiate the entitymanager and begin the transaction only when required (i.e. when the first EntityModel.getObject() call happens). If there is a transaction open at the end of the request cycle, it is committed. The benefit of this is that there are never any wasted DB connections.
The implementation given uses the wicket RequestCycle object (note this is slightly different in v1.5 onwards), but the whole implementation is in fact fairly general, so and you could use it (for example) outwith wicket via a servlet Filter.
After some experiments I've come up with an answer. Thanks to #artbristol, who pointed me in the right direction.
I have set a rule in my architecture: DAO save methods must only be called to save detached entities. If the entity is attached, the DAO throws an IllegalStateException. This helped track down any code that was modifying entities outside a transaction.
Next, I modified my LoadableDetachableModel to have two variants. The classic variant, for use in read-only data views, returns the entity from JPA, which will support lazy loading. The second variant, for use in form binding, uses Dozer to create a local copy.
I have extended my base DAO to have two save variants. One saves the entire object using merge, and the other uses Apache Beanutils to copy a list of properties.
This at least avoids repetitive code. The downsides are the requirement to configure Dozer so that it doesn't pull in the entire database by following lazy loaded references, and having yet more code that refers to properties by name, throwing away type safety.
I tend to use Hibernate in combination with Spring framework and it's declarative transaction demarcation capabilities (e.g., #Transactional).
As we all known, hibernate tries to be as non-invasive and as transparent as possible, however this proves a bit more challenging when employing lazy-loaded relationships.
I see a number of design alternatives with different levels of transparency.
Make relationships not lazy-loaded (e.g., fetchType=FetchType.EAGER)
This vioalites the entire idea of lazy loading ..
Initialize collections using Hibernate.initialize(proxyObj);
This implies relatively high-coupling to the DAO
Although we can define an interface with initialize, other implementations are not guaranteed to provide any equivalent.
Add transaction behaviour to the persistent Model objects themselves (using either dynamic proxy or #Transactional)
I've not tried the dynamic proxy approach, although I never seemed to get #Transactional working on the persistent objects themselves. Probably due to that hibernate is operation on a proxy to bein with.
Loss of control when transactions are actually taking place
Provide both lazy/non-lazy API, e.g, loadData() and loadDataWithDeps()
Forces the application to know when to employ which routine, again tight coupling
Method overflow, loadDataWithA(), ...., loadDataWithX()
Force lookup for dependencies, e.g., by only providing byId() operations
Requires alot of non-object oriented routines, e.g., findZzzById(zid), and then getYyyIds(zid) instead of z.getY()
It can be useful to fetch each object in a collection one-by-one if there's a large processing overhead between the transactions.
Make part of the application #Transactional instead of only the DAO
Possible considerations of nested transactions
Requires routines adapted for transaction management (e.g., suffiently small)
Small programmatic impact, although might result in large transactions
Provide the DAO with dynamic fetch profiles, e.g., loadData(id, fetchProfile);
Applications must know which profile to use when
AoP type of transactions, e.g., intercept operations and perform transactions when necessary
Requires byte-code manipulation or proxy usage
Loss of control when transactions are performed
Black magic, as always :)
Did I miss any option?
Which is your preferred approach when trying to minimize the impact of lazy-loaded relationships in your application design?
(Oh, and sorry for WoT)
As we all known, hibernate tries to be as non-invasive and as transparent as possible
I would say the initial assumption is wrong. Transaparent persistence is a myth, since application always should take care of entity lifecycle and of size of object graph being loaded.
Note that Hibernate can't read thoughts, therefore if you know that you need a particular set of dependencies for a particular operation, you need to express your intentions to Hibernate somehow.
From this point of view, solutions that express these intentions explicitly (namely, 2, 4 and 7) look reasonable and don't suffer from the lack of transparency.
I am not sure which problem (caused by lazyness) you're hinting to, but for me the biggest pain is to avoid losing session context in my own application caches. Typical case:
object foo is loaded and put into a map;
another thread takes this object from the map and calls foo.getBar() (something that was never called before and is lazy evaluated);
boom!
So, to address this we have a number of rules:
wrap sessions as transparently as possible (e.g. OpenSessionInViewFilter for webapps);
have common API for threads/thread pools where db session bind/unbind is done somewhere high in the hierarchy (wrapped in try/finally) so subclasses don't have to think about it;
when passing objects between threads, pass IDs instead of objects themselves. Receiving thread can load object if it needs to;
when caching objects, never cache objects but their ids. Have an abstract method in your DAO or manager class to load the object from 2nd level Hibernate cache when you know the ID. The cost of retrieving objects from 2nd level Hibernate cache is still far cheaper than going to DB.
This, as you can see, is indeed nowhere close to non-invasive and transparent. But the cost is still bearable, to compare with the price I'd have to pay for eager loading. The problem with latter is that sometimes it leads to the butterfly effect when loading single referenced object, let alone a collection of entities. Memory consumption, CPU usage and latency to mention the least are also far worse, so I guess I can live with it.
A very common pattern is to use OpenEntityManagerInViewFilter if you're building a web application.
If you're building a service, I would open the TX on the public method of the service, rather than on the DAOs, as very often a method requires to get or update several entities.
This will solve any "Lazy Load exception". If you need something more advanced for performance tuning, I think fetch profiles is the way to go.
I'm asking this question given my chosen development frameworks of JPA (Hibernate implementation of), Spring, and <insert MVC framework here - Struts 1, Struts 2, Spring MVC, Stripes...>.
I've been thinking a bit about relationships in my entity layer - for example I have an order entity that has many order lines. I've set up my app so that it eagerly loads the order lines for every order. Do you think this is a lazy way to get around the lazy initialization problems that I would come across if I was to set the fetch strategy to false?
The way I see it, I have the following alternatives when retrieving entities and their associations:
Use the Open Session In View pattern to create the session on each request and commit the transaction before returning the response.
Implement a DTO (Data Transfer Object) layer such that every DAO query I execute returns the correctly initialized DTO for my purposes. I don't really like this option much because in my experience I've found that it creates a lot of boilerplate copying code and becomes messy to maintain.
Don't map any associations in JPA so that every query I execute returns only the entities I'm interested in - this will probably require me to have DTOs anyway and will be a pain to maintain and I think defeats the purpose of having an ORM in the first place.
Eagerly fetch all (or most associations) - in the example above, always fetch all order lines when I retrieve an order.
So my question is, when and under what circumstances would you use which of these options? Do you always stick with one way of doing it?
I would ask a colleague but I think that if I even mentioned the term 'Open Session in View' I would be greeted with blank stares :( What I'm really looking for here is some advice from a senior or very experienced developer.
Thanks guys!
Open Session in View has some problems.
For example, if the transaction fails, you might know it too late at commit time, once you are nearly done rendering your page (possibly the response already commited, so you can't change the page !) ... If you had know that error before, you would have followed a different flow and ended up rendering a different page...
Other example, reading data on-demand might turn to many "N+1 select" problems, that kill your performance.
Many projects use the following path:
Maintain transactions at the business layer ; load at that point everything you are supposed to need.
Presentation layer runs the risk of LazyExceptions : each is considered a programming error, caught during tests, and corrected by loading more data in the business layer (you have the opportunity to do it efficiently, avoiding "N+1 select" problems).
To avoid creating extra classes for DTOs, you can load the data inside the entity objects themselves. This is the whole point of the POJO approach (uses by modern data-access layers, and even integration technologies like Spring).
I've successfully solved all my lazy initialization problems with Open Session In View -pattern (ie. the Spring implementation). The technologies I used were the exact same as you have.
Using this pattern allows me to fully map the entity relationships and not worry about fetching child entities in the dao. Mostly. In 90% of the cases the pattern solves the lazy initialization needs in the view. In some cases you'll have to "manually" initialize relationships. These cases were rare and always involved very very complex mappings in my case.
When using Open Entity Manager In View pattern it's important to define the entity relationships and especially propagation and transactional settings correctly. If these are not configured properly, there will be errors related to closed sessions when some entity is lazily initialized in the view and it fails due to the session having been closed already.
I definately would go with option 1. Option 2 might be needed sometimes, but I see absolutely no reason to use option 3. Option 4 is also a no no. Eagerly fetching everything kills the performance of any view that needs to list just a few properties of some parent entities (orders in tis case).
N+1 Selects
During development there will be N+1 selects as a result of initializing some relationships in the view. But this is not a reason to discard the pattern. Just fix these problems as they arise and before delivering the code to production. It's as easy to fix these problems with OEMIV pattern as it's with any other pattern: add the proper dao or service methods, fix the controller to call a different finder method, maybe add a view to the database etc.
I have successfully used the Open-Session-in-View pattern on a project. However, I recently read in "Spring In Practice" of an interesting potential problem with non-repeatable reads if you manage your transactions at a lower layer while keeping the Hibernate session open in the view layer.
We managed most of our transactions in the service layer, but kept the hibernate session open in the view layer. This meant that lazy reads in the view were resulting in separate read transactions.
We managed our transactions in our service layer to minimize transaction duration. For instance, some of our service calls resulted in both a database transaction and a web service call to an external service. We did not want our transaction to be open while waiting for a web service call to respond.
As our system never went into production, I am not sure if there were any real problems with it, but I suspect that there was the potential for the view to attempt to lazily load an object that has been deleted by someone else.
There are some benefits of DTO approach though. You have to think beforehand what information you need. In some cases this will prevent you from generating n+1 select statements. It helps also to see where to use eager fetching and/or optimized views.
I'll also throw my weight behind the Open-Session-in-View pattern, having been in the exact same boat before.
I work with Stripes without spring, and have created a manual filter before that tends to work well. Coding transaction logic on the backend turns messy really quick as you've mentioned. Eagerly fetching everything becomes TERRIBLE as you map more and more objects to each other.
One thing I want to add that you may not have come across is Stripersist and Stripernate - Stripersist being the more JPA flavor - auto-hydration filters that take a lot of the work off your shoulders.
With Stripersist you can say things like /appContextRoot/actions/view/3 and it will auto-hydrate the JPA Entity on the ActionBean with id of 3 before the event is executed.
Stripersist is in the stripes-stuff package on sourceforge. I now use this for all new projects, as it's clean and easily supports multiple datasources if necessary.
Does the Order and Order Lines compose a high volume of data? Do they take part in online processes where real-time response is required? If so, you might consider not using eager fetching - it does make a huge diference in performance. If the amount of data is small, there is no problem in eager fetching.
About using DTOs, it might be a viable implementation.
If your business layer is used internally by your own application (i.e a small web app and its business logic) it'd probably be best to use your own entities in your view with open session in view pattern since it's simpler.
If your entities are used by many applications (i.e a backend application providing a service in your corporation) it'd be interesting to use DTOs since you would not expose your model to your clients. Exposing it could mean you would have a harder time refactoring your model since it could mean breaking contracts with your clients. A DTO would make that easier since you have another layer of
abstraction. This can be a bit strange since EJB3 would theorically eliminate the need of DTOs.