Hibernate - Manipulating collection post load - java

We are trying to filter/manipulate a persistent collection based upon the objects it contains (filter out specific entries).
Since this manipulation will be performed on a large number of different objects containing different collections, it must be as generic as possible.
Filtering at the HQL/SQL level is not an option since it would be impossible to maintain. This means it must be performed after the collection has been loaded and initialized.
We are currently using many of the Hibernate events to handle single objects,
so we tried listening for the InitializeCollectionEvent.
But as it turns out, most of our collections are initialized by an HQL fetch, so this event won't be raised for them.
Is there any other Hibernate Event that we can use?
Any other place where the Collections are processed after they are loaded?
We are using Hibernate 4.1.7.

I think it's not a good idea to filter a collection on the server side after it has been loaded. If you find yourself doing that, something has probably gone wrong; rethink your DB model or entity structure. One of the best ways to filter a collection is HQL. Alternatively, you can use
@Where, @Loader, or @Filter.
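As a minimal sketch of the @Filter approach (entity, filter, and column names here are invented for illustration), the filter condition is appended to the SQL that loads the collection once the filter has been enabled on the session. Note that whether an enabled filter is also applied to collections initialized through an HQL fetch join can depend on the Hibernate version, so it is worth verifying against 4.1.7:

```java
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import org.hibernate.annotations.Filter;
import org.hibernate.annotations.FilterDef;
import org.hibernate.annotations.ParamDef;

// The Child entity is assumed to have an integer "status" column and a
// "parent" back-reference; all names are placeholders.
@Entity
@FilterDef(name = "visibleChildren",
           parameters = @ParamDef(name = "minStatus", type = "integer"))
public class Parent {

    @Id
    private Long id;

    // The filter condition is added to the SQL that loads this collection.
    @OneToMany(mappedBy = "parent")
    @Filter(name = "visibleChildren", condition = "status >= :minStatus")
    private List<Child> children;
}
```

The filter then has to be enabled per session, for example: session.enableFilter("visibleChildren").setParameter("minStatus", 1);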

Related

Hibernate Envers in Java is using Proxies instead of Bags

Currently I'm working on a Java project which uses JPA, with Hibernate as its ORM under the hood.
Furthermore, we're using Envers as our auditing library.
Now, when I query an entity which contains a @ManyToMany or @OneToMany relation directly through Hibernate, the embedded collections get wrapped in Hibernate's collection types (PersistentBag for Lists and PersistentSet for Sets).
These wrappers implement the PersistentCollection interface, which handles lazy loading.
When the parent object is accessed, no database query is executed (verified in the debugger). Only when I access an element of the collection is a query submitted.
But when I query a revision through an AuditReader, the collections are wrapped in a ListProxy or a SetProxy, which leads to "unexpected" behaviour (since Envers is a core part of Hibernate). In this case, as soon as I access the parent object, every element in the embedded collection is queried directly. This causes an enormous performance drop.
The response from the revision endpoint is wrapped in a RepresentationModel (from the Spring HATEOAS project), which accesses the parent object and its collections to create the "_embedded" object of the response. With a PersistentBag there is no problem, but with the ListProxy every element gets queried from the database, leading to thousands of requests.
Is there a way to "teach" Envers to use PersistentBag (or PersistentSet) instead of its internal proxy collections, or is there another workaround to avoid loading the entire collection?
Ignoring the embedded resources in the revision boosts performance, but is not a solution since the references are lost.
Envers ignores the fetch configuration on the collection annotations (every collection is fetched lazily), so that does not do the trick either.
Since the revision endpoint is highly generic, using projections is not an option.
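For reference, a minimal sketch of the kind of Envers lookup being described (the entity name and parameters are hypothetical placeholders):

```java
import javax.persistence.EntityManager;
import org.hibernate.envers.AuditReader;
import org.hibernate.envers.AuditReaderFactory;

public class RevisionLookup {

    public Invoice loadAtRevision(EntityManager entityManager, Long invoiceId, Number revisionNumber) {
        AuditReader reader = AuditReaderFactory.get(entityManager);
        Invoice atRevision = reader.find(Invoice.class, invoiceId, revisionNumber);
        // With a plain Hibernate/JPA query the collections on the entity would be
        // lazy PersistentBags; through Envers they come back as ListProxy/SetProxy,
        // and reaching the collection from the parent loads every element.
        return atRevision;
    }
}
```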

Spring Data Neo4j memory consumption on "supernode" entities

As far as I know, once a NodeEntity in Spring Data Neo4j is loaded, the default behaviour is to lazily load its relations by fetching only ids of related nodes.
While it seems quite OK in most situations, I have doubts about it in the case of so-called "supernodes" - nodes that have numerous relations to other nodes. That kind of node, even if small by itself, will hold a huge collection of ids, using more memory than we would like and possibly not being "lazily loaded enough" in effect...
So my question is - how should I deal with that kind of supernode?
My first idea is to simply remove all @RelatedTo/@RelatedToVia mappings (or at least the ones with relation types that are "numerous") from that kind of node, bypass SDN when operations on those relations are needed, and use SDN in other cases.
Does that make sense? Do you have other suggestions or experience with this kind of situation?
I have not worked with SDN, but I would give the metanode approach a try. With this approach you build a structure that splits the total number of relationships across a number of metanodes (if a node has 1000 connections and you use 10 metanodes, each metanode will have 100 connections while the supernode has just 4). You can see a graphic representation in the following image: http://i.stack.imgur.com/DMQGs.png.
In this way you have good control over how many relationships a node can have, and therefore how many nodes SDN will load at most.
You can read more about it at http://neo4j.com/book-learning-neo4j/ and also in this similar post: Neo4j how to avoid supernodes
For supernodes I'd just not specify the relationship on the supernode entity, but only on the related nodes.
If you're interested in the relationship, you either look up the related node and follow it to the supernode,
or, if you really need to load the millions of relationships, use a Cypher statement.
You can also put the many relationships on a separate node for that purpose, or add a tree-like substructure, which also allows you to deal with subselections.
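If you go the Cypher route, a rough sketch of what that could look like through an SDN repository (entity, label, relationship type, and repository names are all invented; the exact Cypher and annotation syntax depends on your SDN/Neo4j versions):

```java
import org.springframework.data.neo4j.annotation.Query;
import org.springframework.data.neo4j.repository.GraphRepository;

// Instead of mapping the huge relationship on the supernode entity, page through
// it with an explicit Cypher statement so only a bounded slice is ever loaded.
public interface PersonRepository extends GraphRepository<Person> {

    @Query("START p=node({0}) MATCH (p)-[:FOLLOWS]->(friend) "
         + "RETURN friend SKIP {1} LIMIT {2}")
    Iterable<Person> findFollowedSlice(Long personNodeId, int skip, int limit);
}
```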
First, can you provide the version of SDN you are using, so we can target the question to the right maintainers of the library?
Secondly, while I don't really know the internals of SDN, I have worked heavily with other OGMs, and my understanding of lazy loading is quite different from the one you describe, for the simple reason that lazily loading the ids can be harmful: you can end up with corrupted data if another process deletes one of the nodes behind those ids.
Generally, and this is quite common in other OGMs, when an object has no annotations representing relationships, you would just recreate the object from its metadata and the loaded node.
However, if it has relationships, you would create a proxy of that object that extends the entity itself.
The entity values on the proxy are not instantiated at first; you override all getters and add to the proxy the methods for retrieving the related nodes (so the entity manager is injected into the proxy).
So basically, a proxy is empty until you call one of its getters.
You can also fine-tune this behaviour by creating custom repositories that extend the default one, so that you can choose to lazily load one type of relationship and eagerly load the others.
The method described by albert makes a lot of sense in some cases; however, it is hard to accomplish on the basic OGM side. You would be better off with a behaviour component that handles this for you during lifecycle events, or with some kind of pagination on the getter method, which I don't think is part of the OGM right now.

JPA merge in a RESTful web application with DTOs and Optimistic Locking?

My question is this: Is there ever a role for JPA merge in a stateless web application?
There is a lot of discussion on SO about the merge operation in JPA. There is also a great article on the subject which contrasts JPA merge with a more manual do-it-yourself (DIY) process (where you find the entity via the entity manager and make your changes).
My application has a rich domain model (à la domain-driven design) that uses the @Version annotation in order to make use of optimistic locking. We have also created DTOs to send over the wire as part of our RESTful web services. The creation of this DTO layer also allows us to send the client everything it needs and nothing it doesn't.
So far, I understand this is a fairly typical architecture. My question is about the service methods that need to UPDATE (i.e. HTTP PUT) existing objects. In this case we have these two approaches 1) JPA Merge, and 2) DIY.
What I don't understand is how JPA merge can even be considered an option for handling updates. Here's my thinking and I am wondering if there is something I don't understand:
1) In order to properly create a detached JPA entity from a wire DTO, the version number must be set correctly...else an OptimisticLockException is thrown. But the JPA spec says:
"An entity may access the state of its version field or property or export a method for use by the application to access the version, but must not modify the version value [30]. Only the persistence provider is permitted to set or update the value of the version attribute in the object."
2) Merge doesn't handle bi-directional relationships ... the back-pointing fields always end up as null.
3) If any fields or data is missing from the DTO (due to a partial update), then the JPA merge will delete those relationships or null-out those fields. Hibernate can handle partial updates, but not JPA merge. DIY can handle partial updates.
4) The first thing the merge method will do is query the database for the entity ID, so there is no performance benefit over DIY to be had.
5) In a DIY update, we load the entity and make the changes according to the DTO - there is no call to merge, or to persist for that matter, because the JPA context implements the unit-of-work pattern out of the box.
Do I have this straight?
Edit:
6) Merge behavior with regards to lazy loaded relationships can differ amongst providers.
Using merge does require you to either send and receive a complete representation of the entity, or maintain server-side state. For trivial CRUD-y type operations, it is easy and convenient. I have used it plenty in stateless web apps where there is no meaningful security hazard to letting the client see the entire entity.
However, if you've already reduced operations to only passing the immediately relevant information, then you need to also manually write the corresponding services.
Just remember that when doing your 'DIY' update you still need to pass a version number around on the DTO and manually compare it to the one that comes out of the database. Otherwise you don't get the optimistic locking that spans 'user think-time', which you would have if you were using the simpler approach with merge.
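A minimal sketch of that DIY update with a manual version check (entity, DTO, and field names are hypothetical):

```java
import java.util.Objects;
import javax.persistence.EntityManager;
import javax.persistence.EntityNotFoundException;
import javax.persistence.OptimisticLockException;
import javax.persistence.PersistenceContext;
import org.springframework.transaction.annotation.Transactional;

public class CustomerService {

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional
    public void update(CustomerDto dto) {
        Customer entity = entityManager.find(Customer.class, dto.getId());
        if (entity == null) {
            throw new EntityNotFoundException("Customer " + dto.getId() + " no longer exists");
        }
        // Manual optimistic-lock check spanning user think-time: the version the
        // client last saw must still match the row we just read.
        if (!Objects.equals(dto.getVersion(), entity.getVersion())) {
            throw new OptimisticLockException("Customer " + dto.getId() + " was modified concurrently");
        }
        // Copy only the fields this operation owns; the persistence context
        // flushes the dirty entity at commit - no merge() or persist() call needed.
        entity.setName(dto.getName());
        entity.setEmail(dto.getEmail());
    }
}
```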
You can't change the version on an entity created by the provider, but when you have made your own instance of the entity class with the new keyword it is fine and expected to set the version on it.
It will make the persistent representation match the in-memory representation you provide; this can include making things null. Remember that when an object is merged, that object is supposed to be discarded and replaced with the one returned by merge. You are not supposed to merge an object and then continue using it; its state is not defined by the spec.
True.
Most likely, as long as your DIY solution is also using the entity ID and not an arbitrary query. (There are other benefits to using the 'find' method over a query.)
True.
I would add:
7) Merge translates into an insert or an update depending on whether the record exists in the DB, hence it does not deal correctly with update-vs-delete optimistic concurrency. That is, if another user concurrently deletes the record and you update it, it must (1) throw a concurrency exception... but it does not; it just inserts the record as a new one.
(1) At least, in most cases, in my opinion, it should. I can imagine some cases where I would want this use case to trigger a new insert, but they are far from usual. At least, I would like the developer to think twice about it, not just accept that "merge() == updateWithConcurrencyControl()", because it is not.

Flex Blaze DS and JPA - lazy-loading issues

I am developing an application in Flex, using Blaze DS to communicate with a Java back-end, which provides persistence via JPA (Eclipse Link).
I am encountering issues when passing JPA entities to Flex via Blaze DS. Blaze DS uses reflection to convert the JPA entity into an ObjectProxy (effectively a HashMap) by calling all getter methods on the entity. This includes any lazy-initialised one/many-to-many relationships.
You can probably see where I am going. If I pass back a single entity, all of its one/many-to-many getters will be called; for each returned object that has its own one/many-to-many relationships, those getters will be called too. As such, by passing back a single JPA entity I actually end up making multiple database calls and sending all related entries back inside a single ObjectProxy instance!
My solution to date is to create a translator to convert each entity to an ObjectProxy and vice-versa. This is clearly cumbersome and there must be a better way.
Thoughts please?
As an alternative, you could consider using GraniteDS instead of BlazeDS: GraniteDS has a much more powerful data management stack than BlazeDS (it competes more with LCDS) and fully supports lazy loading for all major JPA engines: Hibernate, EclipseLink, OpenJPA, etc.
Moreover, GraniteDS has a great client-side transparent lazy loading feature and even a so-called reverse lazy-loading mechanism.
And you don't need any kind of intermediate DTOs: it serializes JPA entities as is and uses code-generated ActionScript beans on the client-side to keep their initialization states.
Unfortunately, lazy-loading is not easy to accomplish with Flash clients. There are some working solutions, like dpHibernate, but so far all the different solutions I have tested fall short of what you would expect in terms of performance and ease of use.
So in my experience, the best and most reliable solution is to always use DTOs, which adds the benefit of cleanly separating the database and view layers. This necessitates, though, that you implement either eager loading or a second server round trip to resolve your many-to-many relations, as well as a good deal more boilerplate code to copy field values between entities and DTOs.
Which one to choose depends on your use case: Sometimes getting only the main object's fields might be enough, then you could simply omit the List of related objects from your DTO (transfer only those values you need for your query). Sometimes you may actually need the entire list of related entities, and then you could get it via eager loading, or by setting up a second remote object to find only the list.
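As an illustration of the DTO route (all names are invented; whether to include the related ids is left to the caller, matching the two options above):

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class OrderDto implements Serializable {

    private Long id;
    private String customerName;
    private List<Long> itemIds;   // only ids; full items resolved by a second call if needed

    public static OrderDto from(Order entity, boolean includeItemIds) {
        OrderDto dto = new OrderDto();
        dto.id = entity.getId();
        dto.customerName = entity.getCustomerName();
        if (includeItemIds) {
            // touching the collection here forces it to load while the persistence
            // context is still open, instead of during BlazeDS serialization
            dto.itemIds = new ArrayList<Long>();
            for (Item item : entity.getItems()) {
                dto.itemIds.add(item.getId());
            }
        }
        return dto;
    }

    // getters/setters omitted for brevity
}
```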
EclipseLink also provides a copyObject() API that lets you pass a copy group specifying exactly which attributes you want. You could then use this copy to avoid carrying the relationships that you do not want.
If you have a detached object, you could just null out the fields that you do not want as well, or use a DTO.
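A rough sketch of the copy-group idea (the exact API depends on the EclipseLink version: newer releases expose it as JpaEntityManager.copy(entity, CopyGroup), older ones via Session.copyObject(); entity and attribute names are invented):

```java
import javax.persistence.EntityManager;
import org.eclipse.persistence.jpa.JpaEntityManager;
import org.eclipse.persistence.sessions.CopyGroup;

public class EntityCopier {

    public static Order copyWithoutRelationships(EntityManager em, Order order) {
        CopyGroup group = new CopyGroup();
        group.addAttribute("id");
        group.addAttribute("customerName");
        group.setShouldResetPrimaryKey(false);
        // attributes and relationships not listed in the group are left out of the
        // copy, so BlazeDS serialization cannot trigger their lazy loading
        JpaEntityManager jpaEm = em.unwrap(JpaEntityManager.class);
        return (Order) jpaEm.copy(order, group);
    }
}
```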

How to control JPA persistence in Wicket forms?

I'm building an application using JPA 2.0 (Hibernate implementation), Spring, and Wicket. Everything works, but I'm concerned that my form behaviour is based around side effects.
As a first step, I'm using the OpenEntityManagerInViewFilter. My domain objects are fetched by a LoadableDetachableModel which performs entityManager.find() in its load method. In my forms, I wrap a CompoundPropertyModel around this model to bind the data fields.
My concern is the form submit actions. Currently my form submits pass the result of form.getModelObject() into a service method annotated with @Transactional. Because the entity inside the model is still attached to the entity manager, the @Transactional annotation is sufficient to commit the changes.
This is fine, until I have multiple forms that operate on the same entity, each of which changes a subset of the fields. And yes, they may be accessed simultaneously. I've thought of a few options, but I'd like to know any ideas I've missed and recommendations on managing this for long-term maintainability:
Fragment my entity into sub-components corresponding to the edit forms, and create a master entity linking these together in a @OneToOne relationship. Causes an ugly table design, and makes it hard to change forms later.
Detach the entity immediately after it's loaded by the LoadableDetachableModel, and manually merge the correct fields in the service layer. Hard to manage lazy loading, and may need specialised versions of the model for each form to ensure the correct sub-entities are loaded.
Clone the entity into a local copy when creating the model for the form, then manually merge the correct fields in the service layer. Requires implementation of a lot of copy constructors / clone methods.
Use Hibernate's dynamicUpdate option to only update changed fields of the entity (see the sketch below). Causes non-standard JPA behaviour throughout the application, is not visible in the affected code, and creates a strong tie to the Hibernate implementation.
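For reference, a sketch of what option 4 looks like on a hypothetical entity (@DynamicUpdate requires Hibernate 4.1+; older versions use @org.hibernate.annotations.Entity(dynamicUpdate = true)):

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.DynamicUpdate;

// With dynamic updates enabled, Hibernate only includes dirty columns in the
// UPDATE statement, so two forms editing disjoint subsets of fields will not
// overwrite each other's columns on save.
@Entity
@DynamicUpdate
public class Account {

    @Id
    private Long id;

    private String name;     // edited by form A
    private String address;  // edited by form B

    // getters/setters omitted
}
```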
EDIT
The obvious solution is to lock the entity (i.e. row) when you load it for form binding. This would ensure that the lock-owning request reads/binds/writes cleanly, with no concurrent writes taking place in the background. It's not ideal, so you'd need to weigh up the potential performance issues (level of concurrent writes).
Beyond that, assuming you're happy with "last write wins" on your property sub-groups, then Hibernate's dynamicUpdate would seem like the most sensible solution, unless you're thinking of switching ORMs anytime soon. I find it strange that JPA seemingly doesn't offer anything that allows you to update only the dirty fields, and find it likely that it will in the future.
Additional (my original answer)
Orthogonal to this is how to ensure you have a transaction open when your Model loads an entity for form binding. The concern is that the entity's properties are updated at that point, and outside of a transaction this leaves a JPA entity in an uncertain state.
The obvious answer, as Adrian says in his comment, is to use a traditional transaction-per-request filter. This guarantees that all operations within the request occur in a single transaction. It will, however, definitely use a DB connection on every request.
There's a more elegant solution, with code, here. The technique is to lazily instantiate the entitymanager and begin the transaction only when required (i.e. when the first EntityModel.getObject() call happens). If there is a transaction open at the end of the request cycle, it is committed. The benefit of this is that there are never any wasted DB connections.
The implementation given uses the Wicket RequestCycle object (note this is slightly different in v1.5 onwards), but the whole implementation is in fact fairly general, so you could use it (for example) outside Wicket via a servlet Filter.
After some experiments I've come up with an answer. Thanks to @artbristol, who pointed me in the right direction.
I have set a rule in my architecture: DAO save methods must only be called to save detached entities. If the entity is attached, the DAO throws an IllegalStateException. This helped track down any code that was modifying entities outside a transaction.
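A sketch of such a guard in a generic base DAO (names are illustrative, not the actual code):

```java
import javax.persistence.EntityManager;

public abstract class BaseDao<T> {

    protected EntityManager entityManager;

    // Save must only ever see detached entities; an attached entity would be
    // flushed implicitly and indicates a modification happening outside the
    // intended service-layer transaction boundary.
    public T save(T entity) {
        if (entityManager.contains(entity)) {
            throw new IllegalStateException("save() called with an attached entity: " + entity);
        }
        return entityManager.merge(entity);
    }
}
```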
Next, I modified my LoadableDetachableModel to have two variants. The classic variant, for use in read-only data views, returns the entity from JPA, which will support lazy loading. The second variant, for use in form binding, uses Dozer to create a local copy.
I have extended my base DAO to have two save variants. One saves the entire object using merge, and the other uses Apache Commons BeanUtils to copy a list of properties.
This at least avoids repetitive code. The downsides are the requirement to configure Dozer so that it doesn't pull in the entire database by following lazy loaded references, and having yet more code that refers to properties by name, throwing away type safety.
