Hibernate lazy-load application design - java

I tend to use Hibernate in combination with the Spring Framework and its declarative transaction demarcation capabilities (e.g., @Transactional).
As we all know, Hibernate tries to be as non-invasive and as transparent as possible; however, this proves a bit more challenging when employing lazy-loaded relationships.
I see a number of design alternatives with different levels of transparency.
Make relationships not lazy-loaded (e.g., fetch = FetchType.EAGER)
This violates the entire idea of lazy loading.
Initialize collections using Hibernate.initialize(proxyObj);
This implies relatively high coupling to the DAO
Although we can define an interface with an initialize operation, other implementations are not guaranteed to provide any equivalent (see the sketch after this list)
Add transaction behaviour to the persistent model objects themselves (using either dynamic proxies or @Transactional)
I've not tried the dynamic proxy approach, and I never seemed to get @Transactional working on the persistent objects themselves, probably because Hibernate operates on a proxy to begin with.
Loss of control over when transactions actually take place
Provide both a lazy and a non-lazy API, e.g., loadData() and loadDataWithDeps()
Forces the application to know when to employ which routine; again, tight coupling
Method explosion: loadDataWithA(), ..., loadDataWithX()
Force lookup for dependencies, e.g., by only providing byId() operations
Requires a lot of non-object-oriented routines, e.g., findZzzById(zid) and then getYyyIds(zid) instead of z.getY()
It can be useful to fetch each object in a collection one-by-one if there's a large processing overhead between the transactions.
Make part of the application @Transactional instead of only the DAO
Possible complications from nested transactions
Requires routines adapted for transaction management (e.g., sufficiently small ones)
Small programmatic impact, although it might result in large transactions
Provide the DAO with dynamic fetch profiles, e.g., loadData(id, fetchProfile);
Applications must know which profile to use when
AOP-style transactions, e.g., intercept operations and perform transactions when necessary
Requires byte-code manipulation or proxy usage
Loss of control over when transactions are performed
Black magic, as always :)
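For concreteness, option 2 might look like the following in a Spring-managed DAO. This is a minimal sketch with hypothetical Order/OrderLine entities and method names, not a recommendation:

```java
import org.hibernate.Hibernate;
import org.hibernate.SessionFactory;
import org.springframework.transaction.annotation.Transactional;

public class OrderDao {

    private final SessionFactory sessionFactory;

    public OrderDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    @Transactional(readOnly = true)
    public Order loadOrderWithLines(long id) {
        Order order = (Order) sessionFactory.getCurrentSession().get(Order.class, id);
        // Force the lazy collection to load while the session is still open,
        // so callers outside the transaction can use it safely. The
        // Hibernate-specific call is exactly the coupling in question.
        Hibernate.initialize(order.getOrderLines());
        return order;
    }
}
```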
Did I miss any option?
Which is your preferred approach when trying to minimize the impact of lazy-loaded relationships in your application design?
(Oh, and sorry for the wall of text)

As we all know, Hibernate tries to be as non-invasive and as transparent as possible
I would say the initial assumption is wrong: transparent persistence is a myth, since the application always has to take care of the entity lifecycle and of the size of the object graph being loaded.
Note that Hibernate cannot read your mind; if you know that you need a particular set of dependencies for a particular operation, you need to express your intentions to Hibernate somehow.
From this point of view, solutions that express these intentions explicitly (namely, 2, 4 and 7) look reasonable and don't suffer from the lack of transparency.
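To make option 7 concrete: Hibernate's fetch profiles (available since Hibernate 3.5) let the caller express those intentions per call. A minimal sketch, with hypothetical Order/OrderLine entities:

```java
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;
import org.hibernate.Session;
import org.hibernate.annotations.FetchMode;
import org.hibernate.annotations.FetchProfile;

@Entity
@FetchProfile(name = "order-with-lines", fetchOverrides = {
    @FetchProfile.FetchOverride(entity = Order.class,
                                association = "orderLines",
                                mode = FetchMode.JOIN)
})
public class Order {
    @Id private Long id;

    @OneToMany(mappedBy = "order") // lazy by default
    private List<OrderLine> orderLines;
}

@Entity
class OrderLine {
    @Id private Long id;

    @ManyToOne private Order order;
}

class OrderDao {
    // The caller names the profile it needs; the default load stays lazy.
    Order loadOrder(Session session, long id, String profile) {
        if (profile != null) {
            session.enableFetchProfile(profile);
        }
        return (Order) session.get(Order.class, id);
    }
}
```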

I am not sure which problem (caused by laziness) you're hinting at, but for me the biggest pain is avoiding the loss of session context in my own application caches. A typical case:
object foo is loaded and put into a map;
another thread takes the object from the map and calls foo.getBar() (something that was never called before and is lazily evaluated);
boom!
So, to address this we have a number of rules:
wrap sessions as transparently as possible (e.g. OpenSessionInViewFilter for webapps);
have a common API for threads/thread pools where the DB session bind/unbind is done somewhere high in the hierarchy (wrapped in try/finally) so subclasses don't have to think about it;
when passing objects between threads, pass IDs instead of the objects themselves; the receiving thread can load the object if it needs to;
when caching, never cache the objects themselves but their IDs. Have an abstract method in your DAO or manager class to load the object from Hibernate's second-level cache when you know the ID. The cost of retrieving objects from the second-level cache is still far cheaper than going to the DB.
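A minimal sketch of the last two rules, assuming a hypothetical Foo entity that is mapped into the second-level cache:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.hibernate.SessionFactory;

public class FooManager {

    private final SessionFactory sessionFactory;

    // Cache IDs, never entity instances: an ID is immutable and can
    // safely cross session and thread boundaries.
    private final Map<String, Long> fooIdCache = new ConcurrentHashMap<String, Long>();

    public FooManager(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public Foo lookupFoo(String key) {
        Long id = fooIdCache.get(key);
        if (id == null) {
            return null;
        }
        // get() consults the second-level cache before hitting the DB,
        // and the returned instance is attached to the current session.
        return (Foo) sessionFactory.getCurrentSession().get(Foo.class, id);
    }
}
```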
This, as you can see, is indeed nowhere close to non-invasive and transparent. But the cost is still bearable compared with the price I'd have to pay for eager loading. The problem with the latter is that it sometimes leads to a butterfly effect when loading a single referenced object, let alone a collection of entities. Memory consumption, CPU usage and latency, to mention just a few, are also far worse, so I guess I can live with it.

A very common pattern is to use OpenEntityManagerInViewFilter if you're building a web application.
If you're building a service, I would open the TX on the public methods of the service, rather than on the DAOs, as a method very often needs to get or update several entities.
This will solve any lazy-load exceptions. If you need something more advanced for performance tuning, I think fetch profiles are the way to go.
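For illustration, the service-level boundary might look like this (the service, DAOs and entities are all hypothetical names); both DAO calls and all lazy navigation happen inside one transaction:

```java
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class OrderService {

    private final OrderDao orderDao;
    private final CustomerDao customerDao;

    public OrderService(OrderDao orderDao, CustomerDao customerDao) {
        this.orderDao = orderDao;
        this.customerDao = customerDao;
    }

    // One transaction spans both DAO calls, so lazy associations can be
    // navigated anywhere in this method without a LazyInitializationException.
    @Transactional
    public void reassignOrder(long orderId, long customerId) {
        Order order = orderDao.findById(orderId);
        Customer customer = customerDao.findById(customerId);
        order.setCustomer(customer); // dirty checking flushes this at commit
    }
}
```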

Related

Is it a bad idea to keep an EntityManager open for the duration of the application lifetime?

I am writing an application in Java SE 8 and have recently migrated the database system from raw JDBC code to JPA. The interface itself is so much simpler, but I am running into an issue where the way I have designed my code does not work well with JPA, and I am unsure of how to proceed.
The primary issue I am having is that I cannot store references to my entities in code for any period of time anymore, because they immediately become out of date. I used to have a central persistence context where the one "true" instance of each of my entities was always stored in code, and changes made to them would always be reflected everywhere because there were no duplicate instances. I realize this is not smart design when it comes to memory efficiency, but it allowed me to, for instance, implement the observer pattern and guarantee that any entity updates would be immediately visible in GUIs. But now, as soon as I load an entity from the database using JPA and close the EntityManager (as I have read so often that you must do), that instance merely represents a snapshot in time from when it was loaded, and my GUIs will be waiting for updates from a dead object. Loading that entity from elsewhere in the code and making a change will do nothing, as it is a different instance altogether, with an empty (transient) list of subscribers. There are many more cases in my code where I attempt to hold a reference to an entity for one purpose or another, and a lot of them rely on those entities being up to date.
I know that the EntityManager is intended to be a short-lived object, but now I am thinking that it maybe wouldn't be such a bad idea after all to keep an EntityManager open for the lifetime of my program to replace the construct I had in my old code. I quite frankly don't understand what the point of closing the EntityManager so quickly is - isn't it beneficial to have your entities managed over a longer period of time? When I first read about how changes to managed entities are detected and persisted automatically, I hoped that it would allow me to completely detach my business logic from my persistence layer and trust that all my changes were being saved. It was rather disillusioning to discover that, in order for those entities to be managed in the first place, I would have to leave the EntityManager open for the duration of that business logic. And that would require the EntityManagers to be scoped higher than the method they are created in, so I could close them later. But all the literature implores the use of short-lived, low-scoped EntityManagers, which seems like a direct contradiction.
I am somewhat at a loss for how to proceed. I would love to make full use of JPA and all of its extremely useful features, but I feel like I might be missing the point of EntityManager being short-lived. It seems like it would be so much more convenient long-lived. Can anyone give me some guidance?
Your central 'cache' with a single instance of each piece of data is a common idea, but it is difficult to manage. Some ORM/JPA providers have caching built in and maintain something similar (check out EclipseLink's shared cache), but they usually have complex mechanisms for limiting and managing what could be an endless amount of data that can quickly become stale. EclipseLink has tie-ins to the database to get notifications when data changes, and can be configured for cache coordination when run on different servers. Without such capabilities, your cache will be stale, and worse, it will have great difficulty maintaining transactional isolation: any change to those cached objects is immediately visible to all processes, regardless of whether the transaction goes through to the database or rolls back. JPA is meant to guarantee that you only see committed data (excluding the changes you've made in the current transaction/unit of work).
To answer your specific question about keeping an EM open, as it applies to JPA providers generally: EntityManagers keep hooks to the entities read in through them so that they can track and manage all changes made to those entities. This can lead to very large amounts of data being held - check the forum for memory-leak questions, as keeping EMs open for an extended period is the cause of quite a few. You gain object identity, but it comes at the cost of tracking everything read in through them, so you will likely have to clear the persistence context (em.clear()) at key points, or find provider-specific mechanics to dereference what it is holding onto so GC can do its thing.
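For illustration, the periodic clearing mentioned above might look like this in a long-running job (the Item entity and the batch size are hypothetical):

```java
import java.util.List;
import javax.persistence.EntityManager;

public class PriceRefreshJob {

    private static final int BATCH_SIZE = 100;

    public void refresh(EntityManager em, List<Long> itemIds) {
        em.getTransaction().begin();
        int processed = 0;
        for (Long id : itemIds) {
            Item item = em.find(Item.class, id);
            item.setLastRefreshed(System.currentTimeMillis());
            // Push pending updates to the DB, then detach everything so the
            // persistence context doesn't accumulate thousands of entities.
            if (++processed % BATCH_SIZE == 0) {
                em.flush();
                em.clear();
            }
        }
        em.getTransaction().commit();
    }
}
```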
Another drawback is that the EntityManager itself then becomes very large and difficult to merge changes into. Depending on how you merge changes into your app, you'll need a way to get those changes into your database, and having JPA scan a very large set of entities that builds up over time to find changes to a small dataset is very inefficient. You'll still have to find ways to refresh these entities if changes are made through other EntityManagers or applications.

Multi-layered caching with Ehcache

A project I am working on has decided to implement Ehcache for our caching purposes in our web-based app. For the most part, we would be using it for Hibernate caching (of master tables) and query caching.
However, we are also considering method caching at the DAO layer. Honestly, I am a bit skeptical about that. Say we have a method at the DAO layer which in turn fires a query that is already cached: would it make any sense to cache that method as well? My feeling is that we should cache either the method or the query that method eventually fires, not both.
Please do let me know your inputs!
From my experience it very much depends on your application (and the kind and structure of your data). The application I'm currently working on has three built-in layers of cache (all backed by Ehcache): one as the Hibernate second-level cache, one for short-lived hot objects at the middle tier, and one for longer-lived fat value objects at the facade layer. The caches are tuned (cache query parameters, lifetimes, sizes, ...) to supplement each other well.
So a priori, I wouldn't say it doesn't work. There is a possibility you can skip the entire ORM layer with that. If you have some profiling in place (I like perf4j for that) you can at least optimize for "hot" objects that are expensive to get from your DAOs. If you are using the Spring Framework you can do this easily by applying e.g. @Cacheable annotations to your methods. Do your performance testing with (near) live data and requests if possible.
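A sketch of what that can look like with Spring's cache abstraction backed by Ehcache (the cache name, entity and query are illustrative):

```java
import org.hibernate.SessionFactory;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Repository;

@Repository
public class ReportDao {

    private final SessionFactory sessionFactory;

    public ReportDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    // On a cache hit the query (and the whole ORM layer) is skipped.
    @Cacheable("hotReports")
    public Report findMonthlyReport(int year, int month) {
        return (Report) sessionFactory.getCurrentSession()
                .createQuery("from Report r where r.year = :y and r.month = :m")
                .setParameter("y", year)
                .setParameter("m", month)
                .uniqueResult();
    }

    // Evict on writes so the method cache doesn't serve stale reports.
    @CacheEvict(value = "hotReports", allEntries = true)
    public void save(Report report) {
        sessionFactory.getCurrentSession().saveOrUpdate(report);
    }
}
```

Keep in mind that entities served from a method cache like this are detached from any Hibernate session, so the technique is safest for fully initialized, read-mostly objects.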
In fact, I think using the Hibernate second-level cache is the easy/lazy thing to do (a good first step, so to speak), but the performance gains are limited. With some more specific caching you can easily get speedups by factors of hundreds or thousands for parts (hopefully the important ones) of your application, usually at reduced load.

How to control JPA persistence in Wicket forms?

I'm building an application using JPA 2.0 (Hibernate implementation), Spring, and Wicket. Everything works, but I'm concerned that my form behaviour is based around side effects.
As a first step, I'm using the OpenEntityManagerInViewFilter. My domain objects are fetched by a LoadableDetachableModel which performs entityManager.find() in its load method. In my forms, I wrap a CompoundPropertyModel around this model to bind the data fields.
My concern is the form submit actions. Currently my form submits pass the result of form.getModelObject() into a service method annotated with #Transactional. Because the entity inside the model is still attached to the entity manager, the #Transactional annotation is sufficient to commit the changes.
This is fine, until I have multiple forms that operate on the same entity, each of which changes a subset of the fields. And yes, they may be accessed simultaneously. I've thought of a few options, but I'd like to know any ideas I've missed and recommendations on managing this for long-term maintainability:
Fragment my entity into sub-components corresponding to the edit forms, and create a master entity linking these together with @OneToOne relationships. Causes an ugly table design, and makes it hard to change forms later.
Detach the entity immediately after it's loaded by the LoadableDetachableModel, and manually merge the correct fields in the service layer. Hard to manage lazy loading; may need specialised versions of the model for each form to ensure the correct sub-entities are loaded.
Clone the entity into a local copy when creating the model for the form, then manually merge the correct fields in the service layer. Requires implementation of a lot of copy constructors / clone methods.
Use Hibernate's dynamicUpdate option to update only the changed fields of the entity. Causes non-standard JPA behaviour throughout the application, is not visible in the affected code, and ties the code strongly to the Hibernate implementation.
EDIT
The obvious solution is to lock the entity (i.e. the row) when you load it for form binding. This would ensure that the lock-owning request reads/binds/writes cleanly, with no concurrent writes taking place in the background. It's not ideal, so you'd need to weigh up the potential performance issues (the level of concurrent writes).
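In JPA 2.0 terms, that lock can be requested at load time; a one-method sketch with a hypothetical Order entity:

```java
import javax.persistence.EntityManager;
import javax.persistence.LockModeType;

public class OrderLoader {

    // PESSIMISTIC_WRITE takes a row lock (typically SELECT ... FOR UPDATE)
    // held until the surrounding transaction ends, so no other request can
    // write the row while this form is being bound and saved. Requires an
    // active transaction.
    public Order loadForEditing(EntityManager em, long orderId) {
        return em.find(Order.class, orderId, LockModeType.PESSIMISTIC_WRITE);
    }
}
```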
Beyond that, assuming you're happy with "last write wins" on your property sub-groups, Hibernate's dynamicUpdate would seem like the most sensible solution, unless you're thinking of switching ORMs any time soon. I find it strange that JPA seemingly doesn't offer anything that lets you update only the dirty fields, and I find it likely that it will in the future.
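For reference, the annotation form is @DynamicUpdate in Hibernate 4.1+ (older versions use @org.hibernate.annotations.Entity(dynamicUpdate = true)); the entity and fields below are illustrative:

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.DynamicUpdate;

@Entity
@DynamicUpdate // generated UPDATEs include only the columns that actually changed
public class Order {

    @Id
    private Long id;

    // With two forms each editing one of these fields, "last write wins"
    // now applies per column rather than per row.
    private String shippingNotes;
    private String billingNotes;
}
```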
Additional (my original answer)
Orthogonal to this is how to ensure you have a transaction open when your Model loads an entity for form binding. The concern is that the entity's properties are updated at that point, and outside of a transaction this leaves the JPA entity in an uncertain state.
The obvious answer, as Adrian says in his comment, is to use a traditional transaction-per-request filter. This guarantees that all operations within the request occur in a single transaction. It will, however, definitely use a DB connection on every request.
There's a more elegant solution, with code, here. The technique is to lazily instantiate the EntityManager and begin the transaction only when required (i.e. when the first EntityModel.getObject() call happens). If there is a transaction open at the end of the request cycle, it is committed. The benefit of this is that there are never any wasted DB connections.
The implementation given uses the Wicket RequestCycle object (note this is slightly different from v1.5 onwards), but the whole implementation is in fact fairly general, so you could use it (for example) outside Wicket via a servlet Filter.
After some experiments I've come up with an answer. Thanks to @artbristol, who pointed me in the right direction.
I have set a rule in my architecture: DAO save methods must only be called to save detached entities. If the entity is attached, the DAO throws an IllegalStateException. This helped track down any code that was modifying entities outside a transaction.
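A minimal sketch of that guard, assuming a JPA EntityManager-based DAO (the names are mine, not the original code):

```java
import javax.persistence.EntityManager;

public abstract class BaseDao<T> {

    protected EntityManager em;

    // save() is reserved for detached copies coming back from forms.
    // An attached entity here means some code mutated a managed instance
    // outside the intended transaction boundary: fail fast and find it.
    public T save(T entity) {
        if (em.contains(entity)) {
            throw new IllegalStateException(
                    "Refusing to save an attached entity: " + entity);
        }
        return em.merge(entity);
    }
}
```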
Next, I modified my LoadableDetachableModel to have two variants. The classic variant, for use in read-only data views, returns the entity from JPA, which will support lazy loading. The second variant, for use in form binding, uses Dozer to create a local copy.
I have extended my base DAO to have two save variants. One saves the entire object using merge; the other uses Apache Commons BeanUtils to copy a list of properties.
This at least avoids repetitive code. The downsides are the requirement to configure Dozer so that it doesn't pull in the entire database by following lazy loaded references, and having yet more code that refers to properties by name, throwing away type safety.

Open Session In View Pattern

I'm asking this question given my chosen development frameworks of JPA (Hibernate implementation), Spring, and <insert MVC framework here - Struts 1, Struts 2, Spring MVC, Stripes...>.
I've been thinking a bit about relationships in my entity layer - for example, I have an order entity that has many order lines. I've set up my app so that it eagerly loads the order lines for every order. Do you think this is a lazy way to get around the lazy initialization problems that I would come across if I set the fetch strategy to lazy?
The way I see it, I have the following alternatives when retrieving entities and their associations:
Use the Open Session In View pattern to create the session on each request and commit the transaction before returning the response.
Implement a DTO (Data Transfer Object) layer such that every DAO query I execute returns the correctly initialized DTO for my purposes. I don't really like this option much because in my experience I've found that it creates a lot of boilerplate copying code and becomes messy to maintain.
Don't map any associations in JPA, so that every query I execute returns only the entities I'm interested in - this will probably require me to have DTOs anyway, will be a pain to maintain, and I think defeats the purpose of having an ORM in the first place.
Eagerly fetch all (or most associations) - in the example above, always fetch all order lines when I retrieve an order.
So my question is, when and under what circumstances would you use which of these options? Do you always stick with one way of doing it?
I would ask a colleague, but I think that if I even mentioned the term 'Open Session in View' I would be greeted with blank stares :( What I'm really looking for here is some advice from a senior or very experienced developer.
Thanks guys!
Open Session in View has some problems.
For example, if the transaction fails, you might find out too late, at commit time, when you are nearly done rendering your page (possibly with the response already committed, so you can't change the page!). If you had known about the error earlier, you would have followed a different flow and ended up rendering a different page...
Another example: reading data on demand might turn into many "N+1 select" problems that kill your performance.
Many projects use the following path:
Maintain transactions at the business layer; at that point, load everything you are supposed to need.
The presentation layer runs the risk of LazyInitializationExceptions: each one is considered a programming error, caught during tests and corrected by loading more data in the business layer (where you have the opportunity to do it efficiently, avoiding "N+1 select" problems).
To avoid creating extra classes for DTOs, you can load the data inside the entity objects themselves. This is the whole point of the POJO approach (used by modern data-access layers, and even integration technologies like Spring).
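Loading everything you are supposed to need efficiently usually means a fetch join rather than touching each collection row by row; a JPQL sketch with hypothetical Order/OrderLine entities:

```java
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.TypedQuery;

public class OrderFinder {

    // One SQL statement loads the orders together with their lines,
    // instead of 1 query for the orders plus N for the lazy collections.
    public List<Order> findForCustomer(EntityManager em, long customerId) {
        TypedQuery<Order> query = em.createQuery(
                "select distinct o from Order o "
                + "join fetch o.orderLines "
                + "where o.customer.id = :customerId", Order.class);
        return query.setParameter("customerId", customerId).getResultList();
    }
}
```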
I've successfully solved all my lazy initialization problems with the Open Session In View pattern (i.e. the Spring implementation). The technologies I used were exactly the same as yours.
Using this pattern allows me to fully map the entity relationships and, mostly, not worry about fetching child entities in the DAO. In 90% of cases the pattern solves the lazy initialization needs in the view. In some cases you'll have to initialize relationships "manually"; for me these cases were rare and always involved very complex mappings.
When using the Open Entity Manager In View pattern it's important to define the entity relationships, and especially the propagation and transactional settings, correctly. If these are not configured properly, there will be errors about closed sessions when some entity is lazily initialized in the view after the session has already been closed.
I definitely would go with option 1. Option 2 might be needed sometimes, but I see absolutely no reason to use option 3, and option 4 is also a no-no: eagerly fetching everything kills the performance of any view that needs to list just a few properties of some parent entities (orders in this case).
N+1 Selects
During development there will be N+1 selects as a result of initializing some relationships in the view, but this is not a reason to discard the pattern. Just fix these problems as they arise, before delivering the code to production. It's as easy to fix them with the OEMIV pattern as with any other pattern: add the proper DAO or service methods, fix the controller to call a different finder method, maybe add a view to the database, etc.
I have successfully used the Open-Session-in-View pattern on a project. However, I recently read in "Spring In Practice" of an interesting potential problem with non-repeatable reads if you manage your transactions at a lower layer while keeping the Hibernate session open in the view layer.
We managed most of our transactions in the service layer, but kept the hibernate session open in the view layer. This meant that lazy reads in the view were resulting in separate read transactions.
We managed our transactions in our service layer to minimize transaction duration. For instance, some of our service calls resulted in both a database transaction and a web service call to an external service. We did not want our transaction to be open while waiting for a web service call to respond.
As our system never went into production, I am not sure whether there were any real problems with it, but I suspect there was the potential for the view to attempt to lazily load an object that had been deleted by someone else.
There are some benefits to the DTO approach, though. You have to think beforehand about what information you need, and in some cases this will prevent you from generating N+1 select statements. It also helps you see where to use eager fetching and/or optimized views.
I'll also throw my weight behind the Open-Session-in-View pattern, having been in the exact same boat before.
I work with Stripes without Spring, and have created a manual filter before that tends to work well. Coding transaction logic on the backend turns messy really quickly, as you've mentioned. Eagerly fetching everything becomes TERRIBLE as you map more and more objects to each other.
One thing I want to add, which you may not have come across, is Stripersist and Stripernate (Stripersist being the more JPA-flavoured of the two): auto-hydration filters that take a lot of the work off your shoulders.
With Stripersist you can say things like /appContextRoot/actions/view/3 and it will auto-hydrate the JPA Entity on the ActionBean with id of 3 before the event is executed.
Stripersist is in the stripes-stuff package on sourceforge. I now use this for all new projects, as it's clean and easily supports multiple datasources if necessary.
Do the Order and Order Lines comprise a high volume of data? Do they take part in online processes where real-time response is required? If so, you might consider not using eager fetching: it makes a huge difference in performance. If the amount of data is small, there is no problem with eager fetching.
About using DTOs, it might be a viable implementation.
If your business layer is used internally by your own application (i.e. a small web app and its business logic), it'd probably be best to use your own entities in your view with the Open Session In View pattern, since it's simpler.
If your entities are used by many applications (i.e. a backend application providing a service in your corporation), it'd be interesting to use DTOs, since you would not expose your model to your clients. Exposing it could mean a harder time refactoring your model, since that could break contracts with your clients; a DTO makes this easier because you have another layer of abstraction. This can seem a bit strange, since EJB3 theoretically eliminates the need for DTOs.

Three tier layered application using Wicket + Spring + Hibernate. How would you handle transactions?

I'm thinking about using the Open Session In View (OSIV) filter or interceptor that comes with Spring, as it seems like a convenient way for me as a developer. If that's what you recommend, do you recommend using a filter or an interceptor and why?
I'm also wondering how it will mix with HibernateTemplate, and whether I will lose the ability to mark methods as @Transactional(readOnly = true) etc. and thus lose some of the more fine-grained transaction control?
Is there some kind of best practice for how to integrate this kind of solution with a three tier architecture using Hibernate and Spring (as I suppose my decision to use Wicket for presentation shouldn't matter much)?
If I use OSIV I will at least never run into lazy loading exceptions; on the other hand, my transactions will live longer before committing, since they remain open in the view as well.
It's really a matter of personal taste.
Personally, I like to have transaction boundaries at the service layer. If you start thinking SOA, every call to a service should be independent. If your view layer has to call two different services (we could argue that this is already a code smell), then those two services should behave independently of each other, could have different transaction configurations, and so on. Having no transactions open outside of the services also helps ensure that no modification occurs outside of a service.
OTOH, you will have to think a bit more about what you do in your services (lazy loading, grouping functionality in the same service method when it needs common transactionality, etc.).
One pattern that can help reduce lazy-loading errors is to use Value Objects (VOs) outside of the service layer. The services should always load all the data needed and copy it into VOs. You lose the direct mapping between your persistent objects and your view layer (meaning you have to write more code), but you might find that you gain in clarity.
Edit: The decision will be based on trade-offs, so I still think it is at least partly a matter of personal taste. Transactions at the service layer feel cleaner to me (more SOA-like: the logic is clearly restrained to the service layer, and different calls are clearly separated). The problem with that approach is LazyInitializationExceptions, which can be resolved by using VOs. If the VOs are just copies of your persistent objects, then yes, it is clearly a breach of the DRY principle. But if you use VOs the way you would use a database view, then they are a simplification of your persistent objects. It will still be more code to write, but it will make your design clearer. This becomes especially useful if you need to plug in an authorization scheme: if certain fields are visible only to certain roles, you can put the authorization at the service level and never return data that should not be viewed.
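A sketch of the VO approach described above (all names are illustrative): the service resolves every lazy association it needs while the transaction is open, then hands a proxy-free object to the view:

```java
import org.springframework.transaction.annotation.Transactional;

// Plain value object: no proxies, safe to use after the session is closed.
public class OrderVO {

    private final long id;
    private final String customerName;
    private final int lineCount;

    public OrderVO(long id, String customerName, int lineCount) {
        this.id = id;
        this.customerName = customerName;
        this.lineCount = lineCount;
    }

    public long getId() { return id; }
    public String getCustomerName() { return customerName; }
    public int getLineCount() { return lineCount; }
}

class OrderService {

    private OrderDao orderDao;

    @Transactional(readOnly = true)
    public OrderVO loadOrderView(long orderId) {
        Order order = orderDao.findById(orderId);
        // Lazy associations are resolved here, inside the transaction;
        // the VO carries only plain values out to the view layer.
        return new OrderVO(order.getId(),
                           order.getCustomer().getName(),
                           order.getOrderLines().size());
    }
}
```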
If I use OSIV I will at least never run into lazy loading exceptions
That is not true; in fact, it's extremely easy to run into the infamous LazyInitializationException. Just load an object and try to read one of its attributes after the view: depending on your configuration, you WILL get the LIE.
